Confusion Matrix: A Complete Solved Numerical Example

Scenario: Fraudulent Transaction Detection

The Objective: A bank's fraud detection model was evaluated on 200 recent transactions. The results are recorded below.

Core Mechanics
  • Decoding the Acronyms: Read them backward! The second letter (P or N) is what your model guessed. The first letter (T or F) tells you if that guess was True (Right) or False (Wrong).
  • The Accuracy Trap: High accuracy is dangerously misleading if your classes are imbalanced. If 99% of your data belongs to one class, a lazy model can reach 99% accuracy by just guessing that majority class every single time, while missing every single rare case you actually care about.
  • Precision (Quality Control): Of everything your model claimed was Positive, how many actually were? It measures how much you can trust your model when it screams "Positive!"—too many false alarms destroy this score.
  • Recall (The Dragnet): Of all the actual Positives hidden in the real data, how many did you catch? Recall is about your model's ability to hunt down every single target; missing real ones destroys this score.
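The accuracy trap described above can be made concrete with a short sketch. The dataset here (1,000 transactions, 10 of them fraudulent) and the always-legitimate predictor are hypothetical, chosen only to illustrate the point; they are not the bank's data.

```python
# Hypothetical imbalanced dataset: 1 = fraud (positive), 0 = legitimate (negative).
actual = [1] * 10 + [0] * 990

# A "lazy" model that always predicts the majority class (legitimate).
predicted = [0] * len(actual)

# Tally the four confusion matrix cells.
tp = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)
tn = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 0)
fp = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 1)
fn = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 0)

accuracy = (tp + tn) / len(actual)
recall = tp / (tp + fn)

print(f"accuracy = {accuracy:.2%}")  # accuracy = 99.00%
print(f"recall   = {recall:.2%}")    # recall   = 0.00%
```

Precision is left out on purpose: this model never predicts Positive, so TP + FP is zero and the ratio is undefined.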

Step 1: The Confusion Matrix

Here are the raw results from evaluating the model. Positive Class = Fraudulent | Negative Class = Legitimate

Actual \ Predicted   Predicted Positive   Predicted Negative
Actual Positive      40 (TP)              10 (FN)
Actual Negative      15 (FP)              135 (TN)
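If you start from raw labels rather than a finished table, the four cells can be tallied as in the sketch below. The function name `confusion_counts` and the label lists are illustrative assumptions; the lists are constructed only so that they reproduce the counts in the matrix above.

```python
def confusion_counts(actual, predicted):
    """Tally (TP, FN, FP, TN) for binary labels where 1 is the positive class."""
    tp = fn = fp = tn = 0
    for a, p in zip(actual, predicted):
        if a == 1 and p == 1:
            tp += 1   # fraud, flagged as fraud
        elif a == 1 and p == 0:
            fn += 1   # fraud, missed
        elif a == 0 and p == 1:
            fp += 1   # legitimate, flagged as fraud
        else:
            tn += 1   # legitimate, passed
    return tp, fn, fp, tn

# Illustrative labels: 50 actual frauds followed by 150 legitimate transactions.
actual = [1] * 50 + [0] * 150
predicted = [1] * 40 + [0] * 10 + [1] * 15 + [0] * 135

tp, fn, fp, tn = confusion_counts(actual, predicted)
print(tp, fn, fp, tn)        # 40 10 15 135
print(tp + fn + fp + tn)     # 200
```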

Step 2: Step-by-Step Calculation Breakdown

Here is exactly how each metric was derived using the core confusion matrix formulas.

1. Accuracy

Out of all predictions, how many were correct?

Result: 87.5%

Formula String
$$\frac{TP + TN}{TP + TN + FP + FN} \Rightarrow \frac{40 + 135}{40 + 135 + 15 + 10}$$

Execution
$$\frac{175}{200} = 0.875$$
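The same arithmetic as a minimal Python sketch. The helper name `accuracy` is an assumption for illustration. The sketch also computes what a model that labels every transaction Legitimate would score on these 200 transactions, which gives the 87.5% figure some context.

```python
# Counts from the confusion matrix in Step 1.
TP, FN, FP, TN = 40, 10, 15, 135

def accuracy(tp, tn, fp, fn):
    """Share of all predictions that were correct."""
    return (tp + tn) / (tp + tn + fp + fn)

acc = accuracy(TP, TN, FP, FN)

# Baseline: predicting "legitimate" for everything is right for all 150
# actual negatives (TN + FP) and wrong for all 50 actual positives.
baseline = accuracy(tp=0, tn=TN + FP, fp=0, fn=TP + FN)

print(f"{acc:.1%}")       # 87.5%
print(f"{baseline:.1%}")  # 75.0%
```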

2. Precision

When the model predicted Positive (fraud), how often was it actually right?

Result: 72.727%

Formula String
$$\frac{TP}{TP + FP} \Rightarrow \frac{40}{40 + 15}$$

Execution
$$\frac{40}{55} \approx 0.727$$
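A small sketch of the Precision calculation. The helper name `precision` and the choice to return `None` when the model makes no Positive predictions are assumptions; other conventions for the 0/0 case exist.

```python
# Counts from the confusion matrix in Step 1.
TP, FP = 40, 15

def precision(tp, fp):
    """Share of predicted positives that are truly positive.

    Returns None when there are no positive predictions (0/0 is undefined).
    """
    predicted_positive = tp + fp
    if predicted_positive == 0:
        return None
    return tp / predicted_positive

p = precision(TP, FP)
print(f"{p:.3%}")        # 72.727%
print(precision(0, 0))   # None
```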

3. Recall (Sensitivity)

Out of all the actual Positive (fraudulent) cases, how many did the model successfully find?

Result: 80%

Formula String
$$\frac{TP}{TP + FN} \Rightarrow \frac{40}{40 + 10}$$

Execution
$$\frac{40}{50} = 0.8$$
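Recall can be sketched the same way. The helper name `recall` is an assumption. The complementary share of missed frauds is included because it is what a missed-fraud review would look at.

```python
# Counts from the confusion matrix in Step 1.
TP, FN = 40, 10

def recall(tp, fn):
    """Share of actual positives that the model caught.

    Returns None when the data contains no actual positives (0/0 is undefined).
    """
    actual_positive = tp + fn
    if actual_positive == 0:
        return None
    return tp / actual_positive

r = recall(TP, FN)
missed_share = FN / (TP + FN)   # frauds the model let through

print(f"{r:.0%}")             # 80%
print(f"{missed_share:.0%}")  # 20%
```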

4. F1 Score

The harmonic mean of Precision and Recall. It is high only when both are high, which forces a balance between them.

Result: 76.19% (computed from unrounded Precision; the rounded inputs below give 0.762)

Formula String
$$2 \times \frac{Precision \times Recall}{Precision + Recall} \Rightarrow 2 \times \frac{0.727 \times 0.8}{0.727 + 0.8}$$

Execution
$$2 \times \frac{0.582}{1.527} \approx 0.762$$
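To avoid rounding drift, the F1 Score can be checked with exact fractions, as in this sketch. It also shows the algebraically equivalent form 2TP / (2TP + FP + FN), which works directly from the confusion matrix counts. Variable names are illustrative.

```python
from fractions import Fraction

# Counts from the confusion matrix in Step 1.
TP, FN, FP = 40, 10, 15

precision = Fraction(TP, TP + FP)   # 40/55 = 8/11
recall = Fraction(TP, TP + FN)      # 40/50 = 4/5

# Harmonic mean of Precision and Recall.
f1 = 2 * precision * recall / (precision + recall)

# Equivalent form using the raw counts only.
f1_from_counts = Fraction(2 * TP, 2 * TP + FP + FN)

print(f1, f1 == f1_from_counts)   # 16/21 True
print(f"{float(f1):.4f}")         # 0.7619
```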

Step 3: Evaluation Metrics Summary

The four primary performance metrics calculated above for this classification model are summarized here.

Metric      Value
Accuracy    87.5%
Precision   72.727%
Recall      80%
F1 Score    76.19%

Final Takeaway

The positive class represents a Fraudulent transaction. A False Negative (missed fraud) is significantly more costly than a False Positive (a legitimate transaction flagged for review), so Recall (80%) deserves the closest attention here: 10 of the 50 real frauds slipped through. Notice the tension between the metrics! To combine Precision (72.7%) and Recall (80%) into one number, you cannot just add them up and divide by two. That arithmetic mean would give 76.4%, but the F1 Score is the harmonic mean, 76.19%, which always sits at or below the arithmetic mean and is pulled toward the lower of the two values. Mixing the two up is a classic exam trap!
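The gap between the two kinds of mean is small for this model but grows quickly when Precision and Recall drift apart. The sketch below uses Python's standard `statistics` module. The lopsided pair (Precision 1.0, Recall 0.1) is a hypothetical model, added only to show the effect.

```python
from statistics import harmonic_mean, mean

# Precision and Recall from the worked example.
precision, recall = 40 / 55, 40 / 50

arithmetic = mean([precision, recall])
f1 = harmonic_mean([precision, recall])

print(f"arithmetic mean = {arithmetic:.2%}")  # arithmetic mean = 76.36%
print(f"F1 (harmonic)   = {f1:.2%}")          # F1 (harmonic)   = 76.19%

# Hypothetical lopsided model: perfect Precision, very poor Recall.
lopsided = [1.0, 0.1]
lopsided_arithmetic = mean(lopsided)
lopsided_f1 = harmonic_mean(lopsided)

print(f"{lopsided_arithmetic:.2%} vs {lopsided_f1:.2%}")  # 55.00% vs 18.18%
```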