Naive Bayes Classifier: A Complete Solved Numerical Example

Scenario: Medical Symptom Diagnosis

The Objective: Predict whether a patient has a viral infection by combining the class prior with the probability of each of their symptoms, treating the symptoms as independent given the diagnosis.

Core Mechanics
  • The Missing Denominator: In the full Bayes' Theorem, you divide by the total probability of the evidence. In Naive Bayes, we skip that! Since the denominator is exactly the same for every class, we just compare the numerators and pick the biggest one. The resulting scores are not true probabilities (they don't have to sum to 1), but the winning class is the same.
  • The "Naive" Assumption: The algorithm gets its name because it assumes every feature is independent of every other feature once the class is known (conditional independence). It multiplies their individual probabilities together, ignoring how features might actually interact in the real world.
  • Multiply and Conquer: To make a prediction, you simply take the base probability of a class (the Prior) and multiply it by the probability of each feature matching that class. It's just one long chain of multiplication!
  • The Zero-Frequency Trap: If a specific feature value never appeared with a class in your data, its probability is zero. Because we multiply everything, that single zero wipes out the entire score! Note: While "Laplace Smoothing" is the standard industry fix for this, this solver runs pure, unsmoothed calculations to match introductory exams.
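The mechanics above can be condensed into one decision rule. This is a sketch in standard notation rather than anything specific to this solver: x_1, …, x_n stand for the target's feature values, c for a class, and \hat{c} (a symbol introduced here for illustration) for the predicted class.

```latex
P(c \mid x_1, \dots, x_n)
  = \frac{P(c)\, P(x_1, \dots, x_n \mid c)}{P(x_1, \dots, x_n)}
  \;\propto\; P(c) \prod_{i=1}^{n} P(x_i \mid c),
\qquad
\hat{c} = \arg\max_{c} \; P(c) \prod_{i=1}^{n} P(x_i \mid c)
```

The equality is Bayes' Theorem. The proportionality step drops the shared denominator and applies the naive assumption in one move, which is why a single zero factor forces the whole score to zero.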

Step 1: The Training Data

Before we can classify the new target point, we must analyze our historical data.

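Data Point | Fever  | Fatigue | Cough | Headache | Diagnosis
-----------+--------+---------+-------+----------+----------
P1         | High   | Yes     | Yes   | Yes      | Viral
P2         | High   | Yes     | No    | Yes      | Viral
P3         | Normal | No      | Yes   | No       | Healthy
P4         | Normal | No      | No    | No       | Healthy
P5         | High   | No      | Yes   | Yes      | Viral
P6         | Normal | Yes     | No    | Yes      | Healthy
P7         | High   | Yes     | Yes   | No       | Viral
P8         | Normal | No      | Yes   | Yes      | Healthy
P9         | Normal | Yes     | Yes   | No       | Healthy
P10        | High   | Yes     | No    | No       | Viral
Target     | High   | Yes     | Yes   | Yes      | ?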

Step 2: Prior Probabilities P (Class)

First, we calculate the baseline probability of each class occurring in our entire dataset before looking at any specific symptoms.

Formula: P(c) = \dfrac{\text{Count of Class } c}{\text{Total Data Points}}
P (Viral)
= 5 / 10
= 0.5
P (Healthy)
= 5 / 10
= 0.5
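
The same counting can be sketched in a few lines of Python. This is a minimal illustration, not the solver's own code: the `labels` list is simply the Diagnosis column of the training table typed in by hand, and the variable names are made up for this example.

```python
from collections import Counter
from fractions import Fraction

# Diagnosis column of the training table, rows P1..P10 in order.
labels = ["Viral", "Viral", "Healthy", "Healthy", "Viral",
          "Healthy", "Viral", "Healthy", "Healthy", "Viral"]

counts = Counter(labels)

# P(c) = count of class c / total data points
priors = {c: Fraction(n, len(labels)) for c, n in counts.items()}

for c, p in priors.items():
    print(f"P({c}) = {p} = {float(p)}")
```

Using `Fraction` keeps the arithmetic exact, which makes it easy to compare against hand-worked answers.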

Step 3: Conditional Probabilities P (Feature | Class)

Next, we look at our target point and calculate the "likelihood" of each specific feature appearing, assuming a specific class is true. We do this by isolating the rows for a specific class and counting how many times the feature matches.

Formula: P(x_i | c) = \dfrac{\text{Count of } x_i \text{ within Class } c}{\text{Total Count of Class } c}
For Class: Viral
P (Fever = High | Viral) = 5/5
P (Fatigue = Yes | Viral) = 4/5
P (Cough = Yes | Viral) = 3/5
P (Headache = Yes | Viral) = 3/5
For Class: Healthy
P (Fever = High | Healthy) = 0/5
P (Fatigue = Yes | Healthy) = 2/5
P (Cough = Yes | Healthy) = 3/5
P (Headache = Yes | Healthy) = 2/5
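
The "isolate the class rows, then count matches" procedure can be sketched as follows. The `rows` list is the training table from Step 1 re-entered by hand, and `likelihoods` is a hypothetical helper written for this illustration, not part of any library.

```python
from fractions import Fraction

FEATURES = ("Fever", "Fatigue", "Cough", "Headache")

# Training rows P1..P10: (Fever, Fatigue, Cough, Headache, Diagnosis)
rows = [
    ("High",   "Yes", "Yes", "Yes", "Viral"),
    ("High",   "Yes", "No",  "Yes", "Viral"),
    ("Normal", "No",  "Yes", "No",  "Healthy"),
    ("Normal", "No",  "No",  "No",  "Healthy"),
    ("High",   "No",  "Yes", "Yes", "Viral"),
    ("Normal", "Yes", "No",  "Yes", "Healthy"),
    ("High",   "Yes", "Yes", "No",  "Viral"),
    ("Normal", "No",  "Yes", "Yes", "Healthy"),
    ("Normal", "Yes", "Yes", "No",  "Healthy"),
    ("High",   "Yes", "No",  "No",  "Viral"),
]
target = ("High", "Yes", "Yes", "Yes")


def likelihoods(rows, target, cls):
    """Return P(feature = target value | cls) for each feature, as count ratios."""
    # Isolate the rows that belong to this class.
    in_class = [r for r in rows if r[-1] == cls]
    # For each feature, count how many of those rows match the target's value.
    return {
        name: Fraction(sum(r[i] == value for r in in_class), len(in_class))
        for i, (name, value) in enumerate(zip(FEATURES, target))
    }


for cls in ("Viral", "Healthy"):
    print(cls, {k: str(v) for k, v in likelihoods(rows, target, cls).items()})
```

Note that `Fraction` reduces its results, so 5/5 prints as 1 and 0/5 prints as 0.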

Step 4: Final Probabilities & Conclusion

Finally, we apply Bayes' Theorem (using the naive independence assumption). We multiply the Prior Probability (Step 2) by all the Conditional Probabilities (Step 3) for each class. The class that yields the highest score is our prediction.

Formula: P(A|B) = \dfrac{P(B|A) \cdot P(A)}{P(B)}
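P (Viral | Target)

∝ 5/10 * 5/5 * 4/5 * 3/5 * 3/5

= 0.144

P (Healthy | Target)

∝ 5/10 * 0/5 * 2/5 * 3/5 * 2/5

= 0

These are unnormalized scores (the numerators of Bayes' Theorem), so they need not sum to 1. Viral scores 0.144 against 0 for Healthy, so the prediction for the target is Viral.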
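
The two products above can be checked with exact fractions. This is a small sketch, assuming the prior and likelihood values are copied by hand from Steps 2 and 3; the dictionary names are illustrative only. It also shows what happens if the skipped denominator is restored by dividing each score by the sum of the scores.

```python
from fractions import Fraction
from math import prod

# Prior and per-feature likelihoods copied from Steps 2 and 3.
prior = {"Viral": Fraction(5, 10), "Healthy": Fraction(5, 10)}
likelihood = {
    "Viral":   [Fraction(5, 5), Fraction(4, 5), Fraction(3, 5), Fraction(3, 5)],
    "Healthy": [Fraction(0, 5), Fraction(2, 5), Fraction(3, 5), Fraction(2, 5)],
}

# Unnormalized score: prior times the product of the likelihoods.
score = {c: prior[c] * prod(likelihood[c]) for c in prior}

# Dividing by the sum of the scores restores the skipped denominator.
total = sum(score.values())
posterior = {c: s / total for c, s in score.items()}

# The prediction is the class with the largest score.
prediction = max(score, key=score.get)
print({c: float(s) for c, s in score.items()}, "->", prediction)
```

Normalizing changes the numbers but not the ranking, which is why comparing the raw scores is enough to classify the target.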

Final Takeaway

Look closely at the 'Healthy' calculation in Step 4! Because zero healthy patients in our dataset had a 'High' fever, that single 0/5 multiplier wipes out the entire score, no matter what the other symptoms say. This 'Zero-Frequency Problem' is a well-known weakness of unsmoothed Naive Bayes and exactly why exams will test you on Laplace Smoothing.
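
For comparison, here is a rough sketch of how add-one (Laplace) smoothing would change this example. It rests on several assumptions that are not part of the worked solution: a smoothing strength of 1, two possible values per feature (High/Normal or Yes/No), and priors left unsmoothed. The names `ALPHA`, `N_VALUES`, `CLASS_SIZE`, and `smoothed` are made up for this illustration.

```python
from fractions import Fraction
from math import prod

ALPHA = 1          # assumed add-one (Laplace) smoothing strength
N_VALUES = 2       # assumed: each feature here has two possible values
CLASS_SIZE = 5     # 5 Viral rows and 5 Healthy rows in the training table

# Raw match counts from Step 3: Fever=High, Fatigue=Yes, Cough=Yes, Headache=Yes
match_counts = {"Viral": [5, 4, 3, 3], "Healthy": [0, 2, 3, 2]}
prior = {"Viral": Fraction(5, 10), "Healthy": Fraction(5, 10)}  # left unsmoothed


def smoothed(count):
    """(count + alpha) / (class size + alpha * number of feature values)."""
    return Fraction(count + ALPHA, CLASS_SIZE + ALPHA * N_VALUES)


# Same multiply-and-compare rule, but with smoothed likelihoods.
score = {c: prior[c] * prod(smoothed(n) for n in match_counts[c]) for c in prior}
print({c: round(float(s), 4) for c, s in score.items()})
```

Under these assumptions the Healthy score becomes small but nonzero (18/2401, roughly 0.0075) while Viral still wins (240/2401, roughly 0.1), so the missing 'High' fever among healthy patients lowers the Healthy score instead of erasing it.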