Naive Bayes Classifier

Naive Bayes, Probability, Classification, Supervised Learning


The Naive Bayes classifier is a fast and effective machine learning algorithm based on probability. It is called 'naive' because it makes a massive assumption: it treats every feature in your dataset as independent of the others (given the class). For example, it assumes 'Age' has absolutely no effect on 'Income'. While this is rarely true in the real world, the algorithm still performs surprisingly well, especially for things like spam filtering.

The Bayes' Theorem Formula (Adapted for Numerical Problems)

P(A|B) = \frac{P(B|A) \cdot P(A)}{P(B)}

What do these variables mean?

  • P(Class | Query), the Posterior Probability: the final chance that your new query belongs to this specific class.
  • P(Class), the Prior Probability: the base chance of this class occurring in the whole dataset (e.g., total 'Yes' / total rows).
  • P(Feature | Class), the Conditional Probability (Likelihood): how often did this specific feature occur when the class was true?
  • P(Query), the denominator (prior probability of the predictors). NOTE: in manual numericals, we completely skip dividing by this because it is the exact same number for every class!
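The formula above can be checked with a quick sketch. The numbers here are made up purely for illustration (a hypothetical spam filter where the word "free" appears in a message):

```python
# Bayes' theorem with made-up numbers (hypothetical spam example):
# suppose P(Spam) = 0.3, P("free" | Spam) = 0.8, and P("free") = 0.35.
p_spam = 0.3              # P(A): prior probability of the Spam class
p_free_given_spam = 0.8   # P(B|A): likelihood of seeing "free" in spam
p_free = 0.35             # P(B): overall probability of seeing "free"

# P(A|B) = P(B|A) * P(A) / P(B)
p_spam_given_free = p_free_given_spam * p_spam / p_free
print(round(p_spam_given_free, 4))  # posterior probability of Spam
```

Note how the posterior (about 0.69) is higher than the prior (0.3): observing the word "free" made the Spam class more likely.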

How Does it Work?

1. Calculate the 'Prior Probability' for each class (e.g., Count of 'Yes' / Total Rows, and Count of 'No' / Total Rows).
2. Look at the features in your Target Query. For each feature, calculate its 'Conditional Probability' against every class (e.g., how many times was Income='High' when Class='Yes'?).
3. Multiply the Prior Probability by all the Conditional Probabilities for that specific class.
4. Repeat the multiplication for the other classes.
5. Compare the final calculated numbers. The class with the highest score is your predicted answer!
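The five steps above can be sketched in a few lines of Python. The tiny (Income, Student, Buys) dataset and the query are invented for illustration; the counting logic is exactly the manual procedure:

```python
from collections import Counter

# Hypothetical toy dataset: each row is (Income, Student, Class) -- made up.
rows = [
    ("High",   "No",  "No"),
    ("High",   "No",  "No"),
    ("High",   "No",  "Yes"),
    ("Medium", "No",  "Yes"),
    ("Low",    "Yes", "Yes"),
    ("Low",    "Yes", "No"),
    ("Medium", "Yes", "Yes"),
    ("High",   "Yes", "Yes"),
]
query = ("High", "Yes")  # Target Query: Income='High', Student='Yes'

class_counts = Counter(r[-1] for r in rows)  # Step 1: counts for priors
total = len(rows)

scores = {}
for cls, cls_count in class_counts.items():
    score = cls_count / total                # prior P(Class)
    for i, value in enumerate(query):        # Steps 2-3: one likelihood per feature
        match = sum(1 for r in rows if r[i] == value and r[-1] == cls)
        score *= match / cls_count           # P(Feature=value | Class)
    scores[cls] = score                      # Step 4: repeat for every class

prediction = max(scores, key=scores.get)     # Step 5: highest score wins
print(scores, prediction)
```

Here P('Yes') = 5/8, P(High|'Yes') = 2/5, P(Student='Yes'|'Yes') = 3/5, giving a 'Yes' score of 0.15 versus roughly 0.083 for 'No', so the prediction is 'Yes'. Note that the denominator P(Query) is never computed, exactly as the exam trick below suggests.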

Important Rules & Conventions

  • Exam Trick: Skip dividing by the denominator P(B)! Since you divide every class by the same denominator, it won't change which class gets the highest score.
  • Watch out for the 'Zero Frequency' problem. If a specific feature never appears with a class in your training data, its probability is 0. Since you are multiplying everything together, one 0 turns the entire final score to 0!
  • If you hit a Zero Frequency in advanced problems, you have to use a technique called 'Laplace Smoothing' (adding 1 to your counts).
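A minimal sketch of the Zero Frequency problem and the Laplace (add-one) fix, with invented counts. The key detail is that the denominator also grows, by k (the number of distinct values the feature can take), so the smoothed probabilities still sum to 1:

```python
# Hypothetical counts: Income='Low' never occurred together with Class='Yes'.
count_feature_and_class = 0   # times Income='Low' AND Class='Yes' were seen
count_class = 5               # times Class='Yes' was seen
k = 3                         # distinct Income values: 'Low', 'Medium', 'High'

unsmoothed = count_feature_and_class / count_class            # 0.0 -- wipes out the whole product
smoothed = (count_feature_and_class + 1) / (count_class + k)  # add 1 to the count, k to the total
print(unsmoothed, smoothed)
```

With smoothing the probability becomes 1/8 = 0.125 instead of 0, so a single unseen feature/class combination no longer forces the entire class score to zero.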

Advantages

  • Extremely fast to train and quick at making predictions.
  • Performs exceptionally well on multi-class and text classification problems (like spam detection or sentiment analysis).
  • Only requires a small amount of training data to estimate the necessary probabilities.

Disadvantages

  • The 'naive' assumption that all features are independent is almost never true in real-world scenarios.
  • If a categorical variable has a category in the test dataset that wasn't observed in the training dataset, the model assigns it a 0 probability (Zero Frequency problem).
  • It is known as a bad estimator, so the final probability outputs shouldn't be taken too literally; only their rank matters.

Summary

Naive Bayes is a powerhouse of probability. By simply counting occurrences, building fractions, and multiplying them together, it can make highly accurate predictions without needing complex math. Just remember its blind spot: it naively thinks every feature acts completely on its own.