Confusion Matrix: A Complete Solved Numerical Example

Scenario: Fraudulent Transaction Detection

The Objective: A bank's fraud detection model was evaluated on 200 recent transactions. The results are recorded below.

Step 1: The Confusion Matrix

Here are the raw results from evaluating the model.
Positive Class = Fraudulent | Negative Class = Legitimate

                   Predicted Positive   Predicted Negative
Actual Positive    40 (TP)              10 (FN)
Actual Negative    15 (FP)              135 (TN)
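
As a minimal sketch, the four cells can be tallied from paired actual/predicted labels. The label pairs below are synthetic, constructed only to reproduce the counts in the matrix (the bank's real transactions are not given), and the names `pairs` and `tally` are illustrative assumptions.

```python
from collections import Counter

# Synthetic (actual, predicted) pairs built to match the matrix above.
# 1 = fraudulent (positive class), 0 = legitimate (negative class).
pairs = [(1, 1)] * 40 + [(1, 0)] * 10 + [(0, 1)] * 15 + [(0, 0)] * 135

tally = Counter(pairs)
tp = tally[(1, 1)]  # fraud correctly flagged
fn = tally[(1, 0)]  # fraud missed
fp = tally[(0, 1)]  # legitimate transaction wrongly flagged
tn = tally[(0, 0)]  # legitimate transaction correctly passed

total = tp + fn + fp + tn
print(tp, fn, fp, tn, total)
```

The total should come to 200, matching the number of transactions evaluated.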

Step 2: Step-by-Step Calculation Breakdown

Here is exactly how each metric was derived using the core confusion matrix formulas.

1. Accuracy

Out of all predictions, how many were correct?

87.5%
Formula String
$$\frac{TP + TN}{TP + TN + FP + FN} \Rightarrow \frac{40 + 135}{40 + 135 + 15 + 10}$$
Execution
$$\frac{175}{200} = 0.875$$
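
The same arithmetic can be checked in a few lines of Python. This is only a sketch; the variable names are illustrative and the counts are copied from the confusion matrix.

```python
# Counts from the confusion matrix (positive class = fraudulent).
tp, fn, fp, tn = 40, 10, 15, 135

correct = tp + tn            # predictions that matched the actual class
total = tp + tn + fp + fn    # every prediction made
accuracy = correct / total

print(f"{correct}/{total} = {accuracy:.3f}")
```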

2. Precision

When the model predicted fraud, how often was it actually right?

72.7%
Formula String
$$\frac{TP}{TP + FP} \Rightarrow \frac{40}{40 + 15}$$
Execution
$$\frac{40}{55} = 0.727$$
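
A small helper makes the denominator explicit: precision only looks at transactions the model flagged. The function name and the choice to return 0.0 when nothing was flagged are assumptions made for this sketch, not part of the original example.

```python
def precision(tp: int, fp: int) -> float:
    """Share of predicted positives that are truly positive."""
    predicted_positive = tp + fp
    if predicted_positive == 0:
        # Assumption for this sketch: report 0.0 when the model flags nothing.
        return 0.0
    return tp / predicted_positive


p = precision(tp=40, fp=15)
print(round(p, 3))
```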

3. Recall (Sensitivity)

Out of all the actual fraud cases, how many did the model successfully find?

80.0%
Formula String
$$\frac{TP}{TP + FN} \Rightarrow \frac{40}{40 + 10}$$
Execution
$$\frac{40}{50} = 0.800$$
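
Recall divides by the actual fraud cases, so its complement is the share of fraud that slipped through. The sketch below computes both; `miss_rate` is a name introduced here for illustration.

```python
tp, fn = 40, 10  # fraud caught and fraud missed, from the confusion matrix

actual_fraud = tp + fn           # all truly fraudulent transactions
recall = tp / actual_fraud       # fraction of fraud the model found
miss_rate = fn / actual_fraud    # fraction of fraud the model missed

print(f"recall={recall:.3f}, miss rate={miss_rate:.3f}")
```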

4. F1 Score

The harmonic mean of Precision and Recall. It is high only when both are high, which forces a balance between the two.

76.2%
Formula String
$$2 \times \frac{Precision \times Recall}{Precision + Recall} \Rightarrow 2 \times \frac{0.727 \times 0.800}{0.727 + 0.800}$$
Execution
$$2 \times \frac{0.582}{1.527} = 0.762$$
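
As a cross-check, F1 can also be computed directly from the counts as 2TP / (2TP + FP + FN), which is algebraically equivalent to the formula above and avoids rounding Precision and Recall first. The sketch below computes it both ways; the variable names are illustrative.

```python
tp, fn, fp = 40, 10, 15  # counts from the confusion matrix

precision = tp / (tp + fp)
recall = tp / (tp + fn)

# Route 1: harmonic mean of precision and recall.
f1_from_rates = 2 * precision * recall / (precision + recall)

# Route 2: equivalent form using the raw counts only.
f1_from_counts = 2 * tp / (2 * tp + fp + fn)

print(round(f1_from_rates, 3), round(f1_from_counts, 3))
```

Both routes should agree to within floating-point error.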

Step 3: Evaluation Metrics Summary

The four primary performance metrics calculated above for this classification model are:

Metric      Value
Accuracy    87.5%
Precision   72.7%
Recall      80.0%
F1 Score    76.2%
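
For reuse on other confusion matrices, the four formulas could be bundled into one helper that returns the summary as percentages. The function name `summarize` and the dictionary layout are assumptions of this sketch; it also assumes no denominator is zero.

```python
def summarize(tp: int, fn: int, fp: int, tn: int) -> dict:
    """Return the four metrics as fractions (assumes non-zero denominators)."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "Accuracy": (tp + tn) / (tp + tn + fp + fn),
        "Precision": precision,
        "Recall": recall,
        "F1 Score": 2 * precision * recall / (precision + recall),
    }


# Format each metric as a percentage with one decimal place.
report = {name: f"{value:.1%}" for name, value in summarize(40, 10, 15, 135).items()}
for name, value in report.items():
    print(f"{name:<10} {value}")
```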

Final Takeaway

The positive class represents a fraudulent transaction. A False Negative (missed fraud) is significantly more costly than a False Positive (a legitimate transaction flagged for review). With FN = 10 and FP = 15, this dataset is deliberately constructed so that Precision (72.7%) and Recall (80.0%) are not equal, which requires students to calculate and interpret the F1 Score as a harmonic mean rather than a simple average.
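
To see the difference between the harmonic mean and a simple average on these numbers, the sketch below uses Python's standard `statistics` module. The harmonic mean is never larger than the arithmetic mean and is pulled toward the lower of the two values, here Precision.

```python
from statistics import harmonic_mean, mean

tp, fn, fp = 40, 10, 15  # counts from the confusion matrix

precision = tp / (tp + fp)
recall = tp / (tp + fn)

f1 = harmonic_mean([precision, recall])   # what the F1 Score actually is
simple_average = mean([precision, recall])  # what it is not

print(f"F1 (harmonic mean): {f1:.4f}")
print(f"Simple average:     {simple_average:.4f}")
```

The gap is small here because Precision and Recall are close; it widens as the two metrics diverge.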