Precision vs. Recall: The Confusion Matrix Tradeoff Explained


TL;DR — Precision and Recall both come from the confusion matrix, but they ask different questions. Precision asks: 'Of everything I predicted as Positive, how many actually were?' — it measures prediction quality. Recall asks: 'Of everything that actually was Positive, how many did I catch?' — it measures coverage. Improving one typically hurts the other because lowering the classification threshold catches more positives (better recall) but also pulls in more false alarms (worse precision).

Feature Comparison

Feature	Precision	Recall (Sensitivity)
Core Question	Of all the points I predicted as Positive, what fraction was correct?	Of all the actual Positives in the dataset, what fraction did I successfully detect?
Formula	$Precision = \frac{TP}{TP + FP}$	$Recall = \frac{TP}{TP + FN}$
What It Penalizes	False Positives ( $FP$ ) — incorrectly labeling a negative as positive	False Negatives ( $FN$ ) — missing an actual positive by labeling it negative
Denominator Comes From	Everything the model predicted as Positive: $TP + FP$	Everything that actually is Positive: $TP + FN$
Range	$0 \leq Precision \leq 1$ ; higher is better	$0 \leq Recall \leq 1$ ; higher is better
Perfect Score Achieved By	A model that predicts Positive only when it is certain: it makes very few positive predictions, and none of them is a false positive	A model that predicts everything as Positive: it catches every real positive but generates massive false alarms
Effect of Lowering Classification Threshold	Usually decreases: more positives predicted typically means more false positives, hurting precision	Increases or stays the same: more positives predicted means more of the true positives are caught
Combined Metric	F1-score balances both: $F1 = 2 \times \frac{Precision \times Recall}{Precision + Recall}$	F1-score balances both: $F1 = 2 \times \frac{Precision \times Recall}{Precision + Recall}$
Also Known As	Positive Predictive Value ( $PPV$ )	Sensitivity, True Positive Rate ( $TPR$ ), Hit Rate
Related Metric to Watch	False Discovery Rate ( $FDR = 1 - Precision = \frac{FP}{TP + FP}$ )	False Negative Rate ( $FNR = 1 - Recall = \frac{FN}{TP + FN}$ )
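
The formulas in the table can be checked with a few lines of Python. This is a minimal sketch, not a library API: the function name `binary_metrics` and the example counts are made up for illustration, and returning `None` when a denominator is zero is one assumed convention among several.

```python
def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Compute the table's metrics from the four confusion-matrix counts."""
    def safe_div(num, den):
        # Return None instead of raising when the denominator is zero
        return num / den if den else None

    precision = safe_div(tp, tp + fp)   # TP / (TP + FP)
    recall = safe_div(tp, tp + fn)      # TP / (TP + FN)
    if precision is None or recall is None or precision + recall == 0:
        f1 = None
    else:
        f1 = 2 * precision * recall / (precision + recall)
    return {
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "fdr": None if precision is None else 1 - precision,  # FP / (TP + FP)
        "fnr": None if recall is None else 1 - recall,        # FN / (TP + FN)
    }

# Hypothetical counts, chosen only to illustrate the arithmetic
metrics = binary_metrics(tp=80, fp=20, tn=890, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Note that $TN$ is accepted but never used: neither metric looks at true negatives, which is why both stay informative when negatives dominate the dataset.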

Complexity Showdown

Training Time

Precision: N/A (Precision is a metric, not a model)

Recall: N/A (Recall is a metric, not a model)

Precision and Recall are evaluation metrics computed from the confusion matrix after predictions are made. They have no training cost of their own.

Prediction Time

Precision: $O(1)$, computed from two confusion matrix cells: $TP$ and $FP$

Recall: $O(1)$, computed from two confusion matrix cells: $TP$ and $FN$

Both metrics are simple arithmetic on the confusion matrix. Given the matrix, both are computed in constant time; building the matrix itself takes a single $O(n)$ pass over $n$ predictions.
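
To make the cost split concrete, here is a rough sketch of how the four counts might be tallied in one pass before the constant-time arithmetic. The helper name `confusion_counts` and the toy label lists are assumptions for illustration; labels are taken to be 1 for Positive and 0 for Negative.

```python
def confusion_counts(y_true, y_pred):
    """Tally TP, FP, TN, FN in a single pass over paired labels (1 = positive)."""
    tp = fp = tn = fn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == 1 and actual == 1:
            tp += 1
        elif predicted == 1 and actual == 0:
            fp += 1
        elif predicted == 0 and actual == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn

# Toy labels, invented for this example
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0, 1, 0]

tp, fp, tn, fn = confusion_counts(y_true, y_pred)  # one pass over the data
precision = tp / (tp + fp)                         # constant-time arithmetic
recall = tp / (tp + fn)
print(tp, fp, tn, fn, precision, recall)
```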

Space Complexity

Precision: $O(1)$, stores four integers from the confusion matrix

Recall: $O(1)$, stores four integers from the confusion matrix

For binary classification, the confusion matrix has exactly four cells: $TP$ , $FP$ , $TN$ , $FN$ . Both metrics derive from these four numbers with no additional storage.

When To Use Which?

Prioritize Precision when:

✓ False positives are costly: e.g., a spam filter that marks legitimate email as spam destroys user trust. Being wrong about a positive prediction is worse than missing some positives.
✓ You are making recommendations: a recommendation system that shows irrelevant items feels broken, even if it misses some good ones.
✓ Legal or financial decisions are involved: falsely flagging a legitimate transaction as fraudulent causes customer friction and legal liability.
✓ The positive class is common and you need to filter it carefully: high precision means the items you flag are genuinely positive.

Prioritize Recall when:

✓ False negatives are dangerous: e.g., a cancer screening test that misses a real tumor is a catastrophic failure. Missing a positive is worse than a false alarm.
✓ You are doing security threat detection: missing a real attack is far worse than flagging a benign event for further review.
✓ Legal compliance requires exhaustive detection: e.g., detecting all instances of prohibited content, where missing any is a liability.
✓ The positive class is rare and you must catch as many as possible: high recall ensures you're not systematically missing the rare-but-important cases.
✓ Downstream human review is available: if a human will check all predicted positives anyway, false positives are cheap and missing true positives is the real risk.
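
One way to make the two checklists above concrete is to put a price on each error type and see which threshold comes out cheapest. The sketch below is illustrative only: the scores, labels, candidate thresholds, and cost values are all hypothetical, and `error_counts` and `cheapest_threshold` are names invented for this example.

```python
# Toy data: model scores and true labels (1 = positive). All values are illustrative.
SCORES = [0.95, 0.85, 0.80, 0.70, 0.60, 0.55, 0.40, 0.30, 0.20, 0.10]
LABELS = [1, 1, 0, 1, 0, 1, 0, 0, 1, 0]

def error_counts(threshold, scores=SCORES, labels=LABELS):
    """Return (fp, fn) when predicting Positive for score >= threshold."""
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
    return fp, fn

def cheapest_threshold(cost_fp, cost_fn, thresholds=(0.9, 0.5, 0.15)):
    """Pick the candidate threshold with the lowest total error cost."""
    def total_cost(t):
        fp, fn = error_counts(t)
        return cost_fp * fp + cost_fn * fn
    return min(thresholds, key=total_cost)

# Spam-filter-like setting: a false positive costs 10x a false negative
print(cheapest_threshold(cost_fp=10, cost_fn=1))   # 0.9 (strict, favors Precision)
# Screening-like setting: a false negative costs 10x a false positive
print(cheapest_threshold(cost_fp=1, cost_fn=10))   # 0.15 (lenient, favors Recall)
```

When false positives are expensive the strict threshold wins, and when false negatives are expensive the lenient one wins, which mirrors the two lists.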

Common Exam Traps

⚠️

Confusing which metric uses $FP$ and which uses $FN$

This is the single most common error. Precision denominator = $TP + FP$ (what you predicted positive). Recall denominator = $TP + FN$ (what actually was positive). A useful mnemonic: Precision = 'how Precise were my Positive Predictions'; Recall = 'how many Real positives did I Recall/Retrieve?'

⚠️

Thinking high accuracy means the model is good

On an imbalanced dataset (e.g., $99\%$ negative class), a model that always predicts Negative achieves $99\%$ accuracy but has $Recall = 0$ for the positive class, and its Precision is undefined ( $\frac{0}{0}$ , since it never predicts Positive; by convention it is usually reported as $0$ ). Accuracy alone is misleading when classes are imbalanced, so report Precision, Recall, and F1 alongside it.
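
The trap is easy to reproduce. The following sketch builds a made-up dataset of 1,000 examples with 10 positives and scores an "always Negative" classifier. The sizes are arbitrary, and how to handle the empty precision denominator is left as a choice for the reader.

```python
# Hypothetical imbalanced dataset: 10 positives, 990 negatives
y_true = [1] * 10 + [0] * 990
# A degenerate model that always predicts Negative
y_pred = [0] * len(y_true)

tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

accuracy = (tp + tn) / len(y_true)
recall = tp / (tp + fn)
predicted_positives = tp + fp  # precision's denominator

print(f"accuracy = {accuracy:.2f}")   # 0.99
print(f"recall   = {recall:.2f}")     # 0.00
print(f"predicted positives = {predicted_positives}")  # 0, so TP / (TP + FP) is 0/0
```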

⚠️

Assuming you can maximize both Precision and Recall simultaneously

For a fixed model, they are usually in tension via the classification threshold. Lowering the threshold catches more positives (higher Recall) but also admits more false alarms (typically lower Precision). The Precision-Recall curve visualizes this tradeoff. F1-score, the harmonic mean of the two, summarizes how balanced a given operating point is; it does not choose the threshold for you.
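
A small threshold sweep shows the tension directly. This is a sketch on invented data: the scores, labels, and thresholds are hypothetical, `precision_recall_at` is a name made up here, and a score at or above the threshold is assumed to mean "predict Positive".

```python
# Toy model scores and true labels (1 = positive); values are illustrative only
scores = [0.95, 0.85, 0.80, 0.70, 0.60, 0.55, 0.40, 0.30, 0.20, 0.10]
labels = [1, 1, 0, 1, 0, 1, 0, 0, 1, 0]

def precision_recall_at(threshold, scores, labels):
    """Precision and recall when predicting Positive for score >= threshold."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
    precision = tp / (tp + fp) if tp + fp else None  # undefined with no positive predictions
    recall = tp / (tp + fn) if tp + fn else None     # undefined with no actual positives
    return precision, recall

for t in (0.9, 0.5, 0.15):
    p, r = precision_recall_at(t, scores, labels)
    print(f"threshold={t:.2f}  precision={p:.3f}  recall={r:.3f}")
# threshold=0.90  precision=1.000  recall=0.200
# threshold=0.50  precision=0.667  recall=0.800
# threshold=0.15  precision=0.556  recall=1.000
```

Each row is one point on the Precision-Recall curve for this toy model.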

⚠️

Not knowing why F1 uses harmonic mean instead of arithmetic mean

Arithmetic mean rewards models that score high on one metric and near zero on the other. For example: $Precision = 1.0$ , $Recall = 0.01$ gives arithmetic mean $\approx 0.505$ but $F1 \approx 0.02$ . The harmonic mean is dominated by the smaller of the two values, so it punishes extreme imbalances and only rewards balanced performance.
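
The gap between the two means is easiest to see side by side. In this sketch the precision/recall pairs are arbitrary illustrative values, and returning 0.0 when both inputs are zero is an assumed convention.

```python
def arithmetic_mean(p: float, r: float) -> float:
    return (p + r) / 2

def f1_score(p: float, r: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero (a convention)."""
    return 0.0 if p + r == 0 else 2 * p * r / (p + r)

# Illustrative (precision, recall) pairs, from lopsided to balanced
for p, r in [(1.0, 0.01), (0.9, 0.1), (0.8, 0.7), (0.5, 0.5)]:
    print(f"P={p:.2f} R={r:.2f}  arithmetic={arithmetic_mean(p, r):.3f}  F1={f1_score(p, r):.3f}")
```

The two means agree only when Precision equals Recall; the more lopsided the pair, the further F1 falls below the arithmetic mean.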

⚠️

Forgetting that Recall equals True Positive Rate ( $TPR$ ), which is the y-axis of the ROC curve

ROC curves plot $TPR$ (= Recall) on the y-axis vs. $FPR = \frac{FP}{FP + TN}$ on the x-axis. Exam questions often ask about ROC curves and expect you to know that $TPR = Recall = \frac{TP}{TP + FN}$ .
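
As a rough illustration, each threshold gives one ($FPR$, $TPR$) point on the ROC curve, and the $TPR$ coordinate is exactly Recall. The data and the helper name `roc_point` below are hypothetical, and the sketch assumes the dataset contains at least one positive and one negative.

```python
# Toy model scores and true labels (1 = positive); values are illustrative only
scores = [0.95, 0.85, 0.80, 0.70, 0.60, 0.55, 0.40, 0.30, 0.20, 0.10]
labels = [1, 1, 0, 1, 0, 1, 0, 0, 1, 0]

def roc_point(threshold, scores, labels):
    """Return (FPR, TPR) when predicting Positive for score >= threshold."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
    tn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 0)
    tpr = tp / (tp + fn)  # identical to Recall
    fpr = fp / (fp + tn)
    return fpr, tpr

for t in (0.9, 0.5, 0.15):
    fpr, tpr = roc_point(t, scores, labels)
    print(f"threshold={t:.2f}  FPR={fpr:.1f}  TPR={tpr:.1f}")
# threshold=0.90  FPR=0.0  TPR=0.2
# threshold=0.50  FPR=0.4  TPR=0.8
# threshold=0.15  FPR=0.8  TPR=1.0
```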

⚠️

Applying single-class Precision/Recall to a multi-class problem without specifying the averaging strategy

For multi-class problems, you must state whether you're using macro-average (equal weight per class), weighted-average (weight by class frequency), or micro-average (pool the counts across all classes). A single Precision/Recall number reported for a multi-class problem without an averaging strategy is ambiguous, because the strategies can give noticeably different values.
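
The sketch below shows how much the averaging choice can matter for Precision on a small invented three-class example. It also includes micro-averaging (pooling counts across classes) for comparison. The labels are made up, `averaged_precision` is a hypothetical helper, and scoring a never-predicted class as 0.0 is an assumed convention.

```python
from collections import Counter

def averaged_precision(y_true, y_pred):
    """Return macro-, micro-, and weighted-average precision for multi-class labels."""
    classes = sorted(set(y_true) | set(y_pred))
    support = Counter(y_true)  # number of true examples per class
    per_class = {}
    total_tp = total_predicted = 0
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if p == c and t == c)
        predicted = sum(1 for p in y_pred if p == c)  # TP + FP for class c
        # Assumed convention: a class that is never predicted gets precision 0.0
        per_class[c] = tp / predicted if predicted else 0.0
        total_tp += tp
        total_predicted += predicted
    macro = sum(per_class.values()) / len(classes)
    micro = total_tp / total_predicted
    weighted = sum(support[c] * per_class[c] for c in classes) / len(y_true)
    return macro, micro, weighted

# Invented labels: class "c" is rare and never predicted
y_true = ["a"] * 6 + ["b"] * 3 + ["c"] * 1
y_pred = ["a", "a", "a", "a", "a", "b", "b", "b", "a", "a"]

macro, micro, weighted = averaged_precision(y_true, y_pred)
print(f"macro={macro:.3f}  micro={micro:.3f}  weighted={weighted:.3f}")
# macro=0.460  micro=0.700  weighted=0.629
```

The same predictions yield three different "Precision" numbers, which is why the averaging strategy has to be stated.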

Final Verdict

Precision and Recall are two sides of the same coin, and which one matters more is entirely dictated by the cost of being wrong in each direction. When a false positive is costly (spam filters, fraud alerts), optimize for Precision. When a false negative is dangerous (medical screening, security detection), optimize for Recall. When you can't decide, F1-score gives a principled single-number balance — but always ask which type of error your application can afford.
