Naive Bayes Classifier: A Complete Solved Numerical Example

Scenario: Medical Symptom Diagnosis

The Objective: Predict whether a patient has a viral infection based on reported clinical symptoms.

Step 1: The Training Data

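Before we can classify the new target point, we must analyze our historical data: ten patients (P1 to P10), each with four recorded symptoms and a known diagnosis of Viral or Healthy. The last row, Target, is the new patient whose diagnosis we want to predict.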

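| Data Point | Fever  | Fatigue | Cough | Headache | Diagnosis |
|------------|--------|---------|-------|----------|-----------|
| P1         | High   | Yes     | Yes   | Yes      | Viral     |
| P2         | High   | Yes     | No    | Yes      | Viral     |
| P3         | Normal | No      | Yes   | No       | Healthy   |
| P4         | Normal | No      | No    | No       | Healthy   |
| P5         | High   | No      | Yes   | Yes      | Viral     |
| P6         | Normal | Yes     | No    | Yes      | Healthy   |
| P7         | High   | Yes     | Yes   | No       | Viral     |
| P8         | Normal | No      | Yes   | Yes      | Healthy   |
| P9         | Normal | Yes     | Yes   | No       | Healthy   |
| P10        | High   | Yes     | No    | No       | Viral     |
| Target     | High   | Yes     | Yes   | Yes      | ?         |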
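A minimal sketch of how this table might be held in plain Python, assuming one tuple per patient with the diagnosis last (the names `FEATURES`, `TRAINING_DATA` and `frequency_tables` are our own, not from any library). It tallies, for each class, how often each symptom value occurs; these counts are the raw material for every later step.

```python
from collections import Counter, defaultdict

FEATURES = ("Fever", "Fatigue", "Cough", "Headache")

# Rows P1..P10 from the table above: (Fever, Fatigue, Cough, Headache, Diagnosis)
TRAINING_DATA = [
    ("High", "Yes", "Yes", "Yes", "Viral"),      # P1
    ("High", "Yes", "No", "Yes", "Viral"),       # P2
    ("Normal", "No", "Yes", "No", "Healthy"),    # P3
    ("Normal", "No", "No", "No", "Healthy"),     # P4
    ("High", "No", "Yes", "Yes", "Viral"),       # P5
    ("Normal", "Yes", "No", "Yes", "Healthy"),   # P6
    ("High", "Yes", "Yes", "No", "Viral"),       # P7
    ("Normal", "No", "Yes", "Yes", "Healthy"),   # P8
    ("Normal", "Yes", "Yes", "No", "Healthy"),   # P9
    ("High", "Yes", "No", "No", "Viral"),        # P10
]


def frequency_tables(rows):
    """Count symptom values per class: {class: {feature: Counter(value -> count)}}."""
    tables = defaultdict(lambda: {feature: Counter() for feature in FEATURES})
    for *values, label in rows:
        for feature, value in zip(FEATURES, values):
            tables[label][feature][value] += 1
    return dict(tables)


tables = frequency_tables(TRAINING_DATA)
for label, table in tables.items():
    print(label, {feature: dict(counts) for feature, counts in table.items()})
```

The printed tables show, for example, that all five Viral patients have Fever = High while none of the Healthy patients do.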

Step 2: Prior Probabilities P(Class)

First, we calculate the baseline probability of each class occurring in our entire dataset before looking at any specific symptoms.

Formula: $P(c) = \dfrac{\text{Count of Class } c}{\text{Total Data Points}}$
P(Viral) = 5/10 = 0.5000
P(Healthy) = 5/10 = 0.5000
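The prior calculation is a single count over the Diagnosis column. A minimal sketch in Python, assuming the labels are listed in row order P1 to P10 (`DIAGNOSES` and `priors` are illustrative names); `fractions.Fraction` keeps the values exact, so 5/10 is stored in lowest terms as 1/2.

```python
from collections import Counter
from fractions import Fraction

# Diagnosis column of rows P1..P10 from the Step 1 table
DIAGNOSES = ["Viral", "Viral", "Healthy", "Healthy", "Viral",
             "Healthy", "Viral", "Healthy", "Healthy", "Viral"]


def priors(labels):
    """P(c) = count of class c / total data points, as exact fractions."""
    counts = Counter(labels)
    total = len(labels)
    return {c: Fraction(n, total) for c, n in counts.items()}


p = priors(DIAGNOSES)
for c, value in p.items():
    print(f"P({c}) = {value} = {float(value):.4f}")
```

This prints `P(Viral) = 1/2 = 0.5000` followed by `P(Healthy) = 1/2 = 0.5000`.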

Step 3: Conditional Probabilities P(Feature | Class)

Next, for each feature value of the target point (Fever = High, Fatigue = Yes, Cough = Yes, Headache = Yes), we calculate its "likelihood" under each class: the probability of that value appearing, assuming the class is true. We do this by isolating the rows of that class and counting how many of them match the target's value.

Formula: $P(x_i \mid c) = \dfrac{\text{Count of } x_i \text{ within Class } c}{\text{Total Count of Class } c}$
For class Viral:
P(Fever = High | Viral) = 5/5
P(Fatigue = Yes | Viral) = 4/5
P(Cough = Yes | Viral) = 3/5
P(Headache = Yes | Viral) = 3/5

For class Healthy:
P(Fever = High | Healthy) = 0/5
P(Fatigue = Yes | Healthy) = 2/5
P(Cough = Yes | Healthy) = 3/5
P(Headache = Yes | Healthy) = 2/5
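The counting rule can be sketched directly, assuming the same tuple layout as the training table (the helper `likelihood_counts` is hypothetical, written only for this example). It filters the rows of one class and counts matches against each of the target's feature values, returning numerator and denominator separately so the output keeps the unreduced form used above.

```python
FEATURES = ("Fever", "Fatigue", "Cough", "Headache")
TARGET = ("High", "Yes", "Yes", "Yes")

# Rows P1..P10 from Step 1: (Fever, Fatigue, Cough, Headache, Diagnosis)
TRAINING_DATA = [
    ("High", "Yes", "Yes", "Yes", "Viral"),
    ("High", "Yes", "No", "Yes", "Viral"),
    ("Normal", "No", "Yes", "No", "Healthy"),
    ("Normal", "No", "No", "No", "Healthy"),
    ("High", "No", "Yes", "Yes", "Viral"),
    ("Normal", "Yes", "No", "Yes", "Healthy"),
    ("High", "Yes", "Yes", "No", "Viral"),
    ("Normal", "No", "Yes", "Yes", "Healthy"),
    ("Normal", "Yes", "Yes", "No", "Healthy"),
    ("High", "Yes", "No", "No", "Viral"),
]


def likelihood_counts(rows, feature_index, value, label):
    """Return (matches, class_size) so that P(x_i = value | label) = matches / class_size."""
    in_class = [row for row in rows if row[-1] == label]
    matches = sum(1 for row in in_class if row[feature_index] == value)
    return matches, len(in_class)


for label in ("Viral", "Healthy"):
    for index, (feature, value) in enumerate(zip(FEATURES, TARGET)):
        matches, size = likelihood_counts(TRAINING_DATA, index, value, label)
        print(f"P({feature} = {value} | {label}) = {matches}/{size}")
```

Running it prints the eight lines listed above, from P(Fever = High | Viral) = 5/5 down to P(Headache = Yes | Healthy) = 2/5.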

Step 4: Final Probabilities & Conclusion

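Finally, we apply Bayes' Theorem with the naive assumption that the features are conditionally independent given the class. For each class, we multiply the Prior Probability (Step 2) by all of the Conditional Probabilities (Step 3). The denominator of Bayes' Theorem, P(Target), is the same for every class, so it can be left out when we only need to compare classes: the class with the highest resulting score is our prediction.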

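Formula: $P(c \mid x) = \dfrac{P(x \mid c) \cdot P(c)}{P(x)}$, which under the naive independence assumption becomes $P(c \mid x) \propto P(c) \prod_{i} P(x_i \mid c)$, where $x = (x_1, \dots, x_n)$ are the target's feature values.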
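P(Viral | Target) ∝ 5/10 * 5/5 * 4/5 * 3/5 * 3/5 = 0.1440

P(Healthy | Target) ∝ 5/10 * 0/5 * 2/5 * 3/5 * 2/5 = 0.0000

These scores are only proportional to the posteriors, because the shared denominator of Bayes' Theorem, P(Target), has been left out. Dividing each score by their sum (0.1440 + 0.0000 = 0.1440) recovers the actual probabilities: P(Viral | Target) = 1.0000 and P(Healthy | Target) = 0.0000.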
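The arithmetic of this step can be checked with a short sketch using exact fractions (it assumes Python 3.8+ for `math.prod`). The priors and likelihoods are copied by hand from Steps 2 and 3 rather than recomputed, and the names `PRIORS`, `LIKELIHOODS`, `scores` and `posteriors` are illustrative. The normalization at the end shows how the unnormalized scores relate to actual posterior probabilities.

```python
from fractions import Fraction
from math import prod

# Values for the target (High, Yes, Yes, Yes), copied from Steps 2 and 3
PRIORS = {"Viral": Fraction(5, 10), "Healthy": Fraction(5, 10)}
LIKELIHOODS = {
    "Viral": [Fraction(5, 5), Fraction(4, 5), Fraction(3, 5), Fraction(3, 5)],
    "Healthy": [Fraction(0, 5), Fraction(2, 5), Fraction(3, 5), Fraction(2, 5)],
}

# Unnormalized score: P(c) * product of P(x_i | c), proportional to P(c | Target)
scores = {c: PRIORS[c] * prod(LIKELIHOODS[c]) for c in PRIORS}

# The sum of the scores plays the role of the dropped denominator P(Target)
evidence = sum(scores.values())
if evidence == 0:
    raise ValueError("every class scored zero; posteriors are undefined")
posteriors = {c: s / evidence for c, s in scores.items()}

# Pick the class with the largest score (same winner as the largest posterior)
prediction = max(scores, key=scores.get)

for c in scores:
    print(f"{c}: score = {float(scores[c]):.4f}, posterior = {float(posteriors[c]):.4f}")
print("Prediction:", prediction)
```

The Viral score comes out as exactly 18/125 (0.1440), matching the hand calculation, and the prediction is Viral.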

Final Prediction

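Because Viral has the higher score (0.1440 versus 0.0000), the Naive Bayes classifier predicts that the target patient belongs to the Viral class. Note that the Healthy score is exactly zero only because no Healthy patient in the training data has a high fever: a single zero likelihood eliminates a class regardless of its other features, which is why practical implementations usually apply a smoothing correction such as Laplace (add-one) smoothing to the counts.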