Apriori: Support vs. Confidence vs. Lift — Three Metrics, One Rule

TL;DR: The Apriori algorithm mines association rules of the form 'If a customer buys $A$, they also buy $B$'. Generating a rule is the easy part; deciding whether a rule is actually useful requires three separate metrics. Support tells you how often $A$ and $B$ appear together in the dataset; it filters out combinations so rare that the rule is statistically meaningless. Confidence tells you, given that a transaction contains $A$, how likely it is to also contain $B$; it measures the directional strength of the rule. Lift tells you whether the association between $A$ and $B$ is genuine or just a side effect of $B$ being very popular on its own. You need all three: high Support ensures statistical reliability, high Confidence ensures directional strength, and Lift $> 1$ ensures the relationship is real and not just driven by $B$'s base popularity.

Feature Comparison

Feature	Support & Confidence	Lift
What It Measures	Support: how often $A$ and $B$ co-occur in the dataset. Confidence: how often $B$ appears given $A$ is present	Lift: how much more often $A$ and $B$ appear together than you'd expect if they were statistically independent
Formula	$\text{Support}(A \Rightarrow B) = \dfrac{|A \cup B|}{N}$, where $|X|$ is the number of transactions containing every item in $X$ and $N$ is the total number of transactions. $\text{Confidence}(A \Rightarrow B) = \dfrac{|A \cup B|}{|A|}$	$\text{Lift}(A \Rightarrow B) = \dfrac{\text{Confidence}(A \Rightarrow B)}{\text{Support}(B)} = \dfrac{\text{Support}(A \cup B)}{\text{Support}(A) \times \text{Support}(B)}$
Output Range	Support: $[0, 1]$ — fraction of all transactions. Confidence: $[0, 1]$ — conditional probability	Lift: $[0, \infty)$ — a ratio. $\text{Lift} = 1$ means independence; $\text{Lift} > 1$ means positive association; $\text{Lift} < 1$ means negative association
When It's Misleading Alone	High Confidence alone is misleading if $B$ is already very popular: if 90% of baskets contain bread, any rule $A \Rightarrow \text{bread}$ where $A$ is unrelated to bread will still show about 90% Confidence, purely because of bread's base rate	High Lift alone is misleading if Support is very low: a $\text{Lift} = 10$ on a rule that appears in only 2 transactions out of 10,000 is not actionable; the pattern may be statistical noise
Role in the Apriori Algorithm	Support is used as the primary pruning threshold (Minimum Support) during candidate generation — infrequent itemsets are eliminated early. Confidence filters the final candidate rules	Lift is computed after candidate generation and rule filtering; it is used to rank and select the most genuinely useful rules from those that passed Support and Confidence thresholds
Directionality	Support is symmetric: $\text{Support}(A \Rightarrow B) = \text{Support}(B \Rightarrow A)$. Confidence is asymmetric: $\text{Confidence}(A \Rightarrow B) \neq \text{Confidence}(B \Rightarrow A)$ in general	Lift is symmetric: $\text{Lift}(A \Rightarrow B) = \text{Lift}(B \Rightarrow A)$. It measures association strength, not direction
What a Value of $1$ Means	Support $= 1$: the itemset appears in every single transaction. Confidence $= 1$: every transaction with $A$ also contains $B$ (perfect rule)	$\text{Lift} = 1$: knowing $A$ gives you zero additional information about $B$; they are statistically independent and the rule is useless
Practical Interpretation	'Bread and butter appear together in 30% of all transactions' (Support). 'When customers buy bread, they buy butter 75% of the time' (Confidence)	'Customers who buy bread are 2.5 times more likely to buy butter than a random customer is' (Lift $= 2.5$)
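
The three formulas in the table can be checked on a small example. The following is a minimal Python sketch, not a reference implementation: the helper names (`count`, `support`, `confidence`, `lift`) and the 20-basket dataset are made up for illustration, with the baskets chosen so the results match the bread-and-butter figures quoted above.

```python
def count(itemset, transactions):
    """Number of transactions that contain every item in `itemset`."""
    return sum(1 for t in transactions if itemset <= t)

def support(itemset, transactions):
    # Fraction of all transactions containing the itemset.
    return count(itemset, transactions) / len(transactions)

def confidence(a, b, transactions):
    # |A u B| / |A|: how often B appears among transactions that contain A.
    return count(a | b, transactions) / count(a, transactions)

def lift(a, b, transactions):
    # Support(A u B) / (Support(A) * Support(B)), written with raw counts.
    n = len(transactions)
    return count(a | b, transactions) * n / (count(a, transactions) * count(b, transactions))

# Hypothetical data: 6 baskets with both items, 2 with bread only, 12 with neither.
transactions = (
    [frozenset({"bread", "butter"})] * 6
    + [frozenset({"bread"})] * 2
    + [frozenset({"milk"})] * 12
)
bread, butter = frozenset({"bread"}), frozenset({"butter"})

print(support(bread | butter, transactions))    # 0.3
print(confidence(bread, butter, transactions))  # 0.75
print(lift(bread, butter, transactions))        # 2.5
```

Working from raw counts rather than dividing two already-rounded Support values keeps the arithmetic exact for small examples like this one.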

Complexity Showdown

Training Time

Support: $O(N \times |T|)$ per candidate pass, where $N$ is the number of transactions and $|T|$ is the average transaction size. Confidence: $O(1)$ per rule, computed from already-counted Support values.

Lift: $O(1)$ per rule. Lift is computed directly from $\text{Support}(A)$, $\text{Support}(B)$, and $\text{Support}(A \cup B)$, all of which were already computed.

All three metrics are computed from item counts collected during Apriori's candidate generation passes. Support requires the actual counting work. Confidence and Lift are both simple arithmetic operations on top of Support values, so neither adds meaningful overhead compared to the core counting step.
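
To make the cost split concrete, here is a rough Python sketch under simplifying assumptions: `count_itemsets` is a hypothetical brute-force counter (it does no Apriori pruning and only goes up to pairs), and `rule_metrics` is an illustrative helper. The point is only that the data is scanned once, after which each rule's metrics are a few dictionary lookups and divisions.

```python
from collections import Counter
from itertools import combinations

def count_itemsets(transactions, max_size=2):
    """Single scan of the data: count every itemset of up to `max_size` items.

    Brute force for illustration only; real Apriori counts pruned candidates.
    """
    counts = Counter()
    for t in transactions:
        for k in range(1, max_size + 1):
            for combo in combinations(sorted(t), k):
                counts[frozenset(combo)] += 1
    return counts

def rule_metrics(counts, n, a, b):
    """Constant-time arithmetic on stored counts; no further data scans."""
    n_ab, n_a, n_b = counts[a | b], counts[a], counts[b]
    return {
        "support": n_ab / n,
        "confidence": n_ab / n_a,
        "lift": n_ab * n / (n_a * n_b),
    }

# Hypothetical baskets.
transactions = [
    {"tea", "sugar"}, {"tea", "sugar"}, {"tea"}, {"sugar"}, {"coffee"},
]
counts = count_itemsets(transactions)
metrics = rule_metrics(counts, len(transactions), frozenset({"tea"}), frozenset({"sugar"}))
print(metrics)
```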

Prediction Time

Support: the Support threshold is applied during candidate generation. With $m$ distinct items there are up to $\binom{m}{k}$ candidate itemsets at level $k$ before pruning, and up to $2^m - 1$ across all levels. Confidence is applied at rule generation: $O(|\text{frequent itemsets}|)$.

Lift: $O(|\text{rules}|)$. Lift is computed once per surviving rule after Support and Confidence filtering; it adds no computational burden to candidate generation.

Support and Confidence are both gates inside the pipeline: Support limits how many candidate itemsets are generated and counted, and Confidence limits how many rules survive rule generation. Lift is computed at the very end on the surviving rule set only, so it is the cheapest metric to apply: it runs only on rules that already passed the other two filters.

Space Complexity

Support: $O(|\text{frequent itemsets}|)$. All itemsets that survive the minimum support threshold must be stored along with their counts.

Lift: $O(|\text{rules}|)$. Lift values are stored only for rules that passed the Support and Confidence thresholds; these rules are derived from the frequent itemsets.

Support requires storing all frequent itemsets and their counts throughout the algorithm. Lift is stored only for finalized rules, which is usually a smaller set once the Confidence threshold has been applied. In practice all three values are stored together per rule, but Lift alone requires less intermediate storage.

When To Use Which?

Focus on Support & Confidence when:

✓ You are filtering for statistical reliability: set a Minimum Support threshold (e.g., $\text{Support} \geq 0.05$) to ensure rules are based on enough data points to be meaningful, not just lucky coincidences in a small sample.
✓ You want directional 'if-then' predictions: Confidence directly answers 'if a customer buys $A$, how likely are they to buy $B$?', which is the core question for cross-selling and recommendation systems.
✓ You are in the Apriori candidate generation phase: Support is the pruning metric used inside the algorithm itself. Apriori's key optimization (the antimonotone property) states that if an itemset has low Support, all of its supersets will too, so they can be pruned immediately.
✓ You need asymmetric rule evaluation: Confidence lets you distinguish $A \Rightarrow B$ from $B \Rightarrow A$, which can tell you the direction of influence (e.g., buying diapers predicts beer more strongly than buying beer predicts diapers).
✓ You are reporting rules to a business stakeholder: Support and Confidence have intuitive, easy-to-explain interpretations as percentages and conditional probabilities.
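
The Support-based pruning described in the list above can be sketched as a level-wise search. This is a simplified illustration, not an optimized Apriori implementation: the function name `frequent_itemsets` and the five baskets are assumptions made for the example, and Support is recomputed by rescanning the data for clarity rather than speed.

```python
from itertools import combinations

def frequent_itemsets(transactions, min_support):
    """Level-wise search that prunes with the antimonotone property of Support."""
    n = len(transactions)

    def support(itemset):
        return sum(1 for t in transactions if itemset <= t) / n

    # Level 1: single items that meet the threshold.
    singles = {frozenset({item}) for t in transactions for item in t}
    level = {s: support(s) for s in singles}
    level = {s: v for s, v in level.items() if v >= min_support}

    frequent = {}
    k = 1
    while level:
        frequent.update(level)
        k += 1
        # Join step: merge frequent (k-1)-itemsets into size-k candidates.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: drop any candidate with an infrequent (k-1)-subset,
        # without ever counting it against the data.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in level for sub in combinations(c, k - 1))
        }
        counted = {c: support(c) for c in candidates}
        level = {c: v for c, v in counted.items() if v >= min_support}
    return frequent

# Hypothetical baskets.
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "milk"},
    {"bread", "butter"},
    {"milk", "eggs"},
]
result = frequent_itemsets(transactions, min_support=0.4)
for itemset, s in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), s)
```

In this run, `eggs` falls below the threshold at level 1, so no itemset containing it is ever generated, and `{bread, butter, milk}` is pruned without a data scan because its subset `{butter, milk}` is infrequent.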

Focus on Lift when:

✓ You want to filter out trivially obvious rules caused by item popularity: high-Confidence rules for universally popular items (milk, bread) are not interesting because those items appear everywhere anyway. $\text{Lift} > 1$ confirms the association is real.
✓ You need to rank rules by actual usefulness: after applying Support and Confidence thresholds, sort surviving rules by Lift to surface the most surprising and actionable patterns first.
✓ You are comparing rules with different base rates: Lift normalizes for item popularity, so you can fairly compare a rule involving a rare item with a rule involving a common item.
✓ You want symmetric association detection: since Lift is symmetric, it's useful when you care about the relationship between items regardless of which direction the recommendation goes.
✓ You are detecting negative associations: $\text{Lift} < 1$ reveals items that actively co-occur less than expected. This is invisible to Support and Confidence but can be practically useful (e.g., competing products that are rarely purchased together).
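
The filter-then-rank workflow from the list above might look like the following Python sketch. Everything here is illustrative: the rule names, the metric values, the thresholds, and the helper functions `select_rules` and `negative_associations` are all invented for the example.

```python
# Hypothetical, precomputed rule metrics (values are made up for illustration).
rules = [
    {"rule": "chips => salsa", "support": 0.08, "confidence": 0.60, "lift": 3.0},
    {"rule": "eggs => milk", "support": 0.30, "confidence": 0.85, "lift": 1.06},
    {"rule": "caviar => champagne", "support": 0.0002, "confidence": 0.90, "lift": 45.0},
    {"rule": "cola_a => cola_b", "support": 0.06, "confidence": 0.10, "lift": 0.5},
]

def select_rules(rules, min_support, min_confidence):
    """Apply the Support and Confidence gates, then rank survivors by Lift."""
    kept = [
        r for r in rules
        if r["support"] >= min_support and r["confidence"] >= min_confidence
    ]
    return sorted(kept, key=lambda r: r["lift"], reverse=True)

def negative_associations(rules):
    """Rules whose items co-occur less often than independence would predict."""
    return [r for r in rules if r["lift"] < 1]

ranked = select_rules(rules, min_support=0.05, min_confidence=0.5)
print([r["rule"] for r in ranked])                        # ['chips => salsa', 'eggs => milk']
print([r["rule"] for r in negative_associations(rules)])  # ['cola_a => cola_b']
```

The high-Lift `caviar => champagne` rule is dropped by the Support gate before ranking, which is why Lift is applied last rather than first.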

Common Exam Traps

⚠️

Saying a rule with Confidence $= 0.9$ is always a strong rule

Confidence $= 0.9$ means nothing if $\text{Support}(B) = 0.92$. In that case, $\text{Lift} = 0.9 / 0.92 \approx 0.98 < 1$, meaning buying $A$ actually slightly decreases the probability of buying $B$ relative to the base rate. The rule is worse than useless. Always check Lift before declaring a high-Confidence rule useful.

⚠️

Thinking Support is symmetric but Confidence is not, or vice versa

Support is symmetric: $\text{Support}(A \cup B) = \text{Support}(B \cup A)$. Lift is also symmetric. Confidence is the asymmetric one: $\text{Confidence}(A \Rightarrow B) = \text{Support}(A \cup B) / \text{Support}(A)$, which changes when you flip the rule to $B \Rightarrow A$. Exam questions often test this distinction.
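
A quick numeric check of the symmetry claims, using hypothetical counts for a diapers-and-beer style example (the numbers are invented, not taken from any real dataset):

```python
# Hypothetical counts: n baskets in total, with the given item frequencies.
n, n_diapers, n_beer, n_both = 1000, 100, 400, 80

# Confidence divides by the antecedent's count, so flipping the rule changes it.
conf_diapers_to_beer = n_both / n_diapers  # 80 / 100
conf_beer_to_diapers = n_both / n_beer     # 80 / 400

# Lift divides by both counts, so the direction does not matter.
lift_diapers_to_beer = n_both * n / (n_diapers * n_beer)
lift_beer_to_diapers = n_both * n / (n_beer * n_diapers)

print(conf_diapers_to_beer, conf_beer_to_diapers)  # 0.8 0.2
print(lift_diapers_to_beer, lift_beer_to_diapers)  # 2.0 2.0
```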

⚠️

Confusing the Apriori antimonotone property with Confidence or Lift

The antimonotone property applies only to Support: if itemset $X$ has low Support, every superset of $X$ also has low Support, so they can all be pruned. This property does NOT hold for Confidence or Lift — a subset rule can have low Confidence while a superset rule has high Confidence. Apriori's pruning efficiency is entirely based on Support.
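
A small counterexample makes the difference visible. In this Python sketch (the ten single-letter transactions and the helper names are made up for illustration), growing the antecedent from $\{a\}$ to $\{a, b\}$ lowers Support, as the antimonotone property guarantees, while Confidence goes up:

```python
def count(itemset, transactions):
    """Number of transactions that contain every item in `itemset`."""
    return sum(1 for t in transactions if itemset <= t)

def support(itemset, transactions):
    return count(itemset, transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    return count(antecedent | consequent, transactions) / count(antecedent, transactions)

# Hypothetical data: 2 x {a, b, c}, 1 x {a, c}, 7 x {a}.
transactions = [frozenset("abc")] * 2 + [frozenset("ac")] + [frozenset("a")] * 7
a, ab, c = frozenset("a"), frozenset("ab"), frozenset("c")

# Support can only stay the same or shrink as the itemset grows.
print(support(a | c, transactions), support(ab | c, transactions))  # 0.3 0.2

# Confidence is free to rise: the larger antecedent gives the stronger rule.
print(confidence(a, c, transactions), confidence(ab, c, transactions))  # 0.3 1.0
```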

⚠️

Claiming Lift $> 1$ is sufficient to call a rule useful

Lift $> 1$ confirms positive association, but the rule also needs adequate Support. A rule with $\text{Lift} = 50$ but $\text{Support} = 0.0001$ (3 transactions out of 30,000) is based on near-zero evidence and likely noise. A useful rule needs $\text{Lift} > 1$ AND Support above your minimum threshold.

⚠️

Saying $\text{Lift} = 1$ means the rule is wrong

$\text{Lift} = 1$ means $A$ and $B$ are statistically independent: knowing $A$ gives you zero additional information about $B$. The rule isn't 'wrong'; it's just useless for prediction. The items co-occur as often as chance would predict regardless of any association.

⚠️

Forgetting that Confidence can equal 1 without meaning a perfect rule in the absolute sense

$\text{Confidence} = 1$ means every transaction containing $A$ also contains $B$, a perfect rule directionally. But if $\text{Support}(A) = 0.001$, this perfect rule applies to only 0.1% of your data. It's a perfect rule for an extremely rare pattern. Support and Confidence together determine whether a rule is both strong and broadly applicable.

Final Verdict

Support, Confidence, and Lift are not competing metrics; they answer three different questions about the same rule. Support asks: 'Is this pattern common enough to trust?' Confidence asks: 'How reliably does $A$ predict $B$?' Lift asks: 'Is this prediction genuinely useful, or is $B$ just popular?' A rule worth acting on needs all three: enough Support to be statistically grounded, enough Confidence to be directionally reliable, and $\text{Lift} > 1$ to confirm the association is real and not just a reflection of $B$'s base popularity. Evaluate all three together. A rule that passes only one or two of these filters can easily mislead.
