Apriori Algorithm: A Complete Solved Numerical Example
Scenario: Market Basket Analysis
The Objective: Discover which book categories are frequently purchased together by ruthlessly filtering out unpopular combinations before generating recommendation rules.
Core Mechanics
- Support is the Entry Ticket: To survive, an itemset must appear frequently enough to pass your Minimum Support threshold. If it falls short, cross it off immediately—it is permanently discarded.
- The Pruning Rule (Apriori Property): If a small itemset is infrequent, every larger set containing it is also infrequent. If {A, B} fails, you never waste time scanning the data for {A, B, C}!
- Level-by-Level Growth: You must build itemsets strictly one step at a time. Frequent singles combine to form pairs; surviving pairs form triplets. You only ever build size $k$ using the winners from size $k-1$.
- Confidence Makes Rules: Finding popular itemsets is only half the job. To create an actual recommendation rule ($A \rightarrow B$), you filter by Confidence: "Given they already have A, how likely are they to also have B?"
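Support, the pruning rule, and confidence each reduce to a short function. This is a minimal sketch, not a library API: the names `support_count`, `has_infrequent_subset`, and `confidence` are hypothetical, and each transaction is assumed to be a Python `frozenset` of item labels.

```python
from itertools import combinations

# Minimal sketch: the helper names are hypothetical, not from any library.
# Each transaction is assumed to be a frozenset of item labels.


def support_count(itemset, transactions):
    """Entry ticket: how many transactions contain every item in `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def has_infrequent_subset(candidate, frequent_prev):
    """Pruning rule: a k-itemset with any (k-1)-subset missing from the
    previous level's frequent itemsets cannot be frequent."""
    k = len(candidate)
    return any(frozenset(sub) not in frequent_prev
               for sub in combinations(sorted(candidate), k - 1))


def confidence(antecedent, consequent, transactions):
    """Confidence(A -> B) = Sup(A ∪ B) / Sup(A)."""
    joint = support_count(antecedent | consequent, transactions)
    return joint / support_count(antecedent, transactions)


# Usage with the six transactions from Step 1.
transactions = [frozenset(t) for t in (
    {"A", "B", "C"}, {"A", "C", "D"}, {"B", "C", "E"},
    {"A", "B", "C", "D"}, {"A", "C", "E"}, {"B", "D", "E"},
)]
print(support_count(frozenset({"A", "B"}), transactions))  # 2
print(has_infrequent_subset(frozenset({"A", "B", "C"}),
                            {frozenset({"A", "C"}), frozenset({"B", "C"})}))  # True
print(confidence(frozenset({"B"}), frozenset({"C"}), transactions))  # 0.75
```

Using `frozenset` rather than `set` lets itemsets be stored inside other sets and used as dictionary keys, which the later steps rely on.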
Step 1: The Transaction Data
The Apriori algorithm requires a list of transactions to mine frequent patterns. In this scenario, we need a Minimum Support of 50% and a Minimum Confidence of 70%.
| Data Point | Items (Comma-Separated) |
|---|---|
| P1 | A, B, C |
| P2 | A, C, D |
| P3 | B, C, E |
| P4 | A, B, C, D |
| P5 | A, C, E |
| P6 | B, D, E |
Step 2: Frequent Itemset Generation
We iteratively generate Candidate Itemsets ($C_k$) and filter them by the absolute minimum support count of 3 (50% of the 6 transactions) to find Frequent Itemsets ($L_k$).
Iteration 1: Finding 1-Itemsets
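Candidate 1-itemsets ($C_1$):
| Itemset | Support Count | Status |
|---|---|---|
| {A} | 4 | Keep |
| {B} | 4 | Keep |
| {C} | 5 | Keep |
| {D} | 3 | Keep |
| {E} | 3 | Keep |

Frequent 1-itemsets ($L_1$), identical to $C_1$ because every count is at least 3:
| Itemset | Support Count |
|---|---|
| {A} | 4 |
| {B} | 4 |
| {C} | 5 |
| {D} | 3 |
| {E} | 3 |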
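Iteration 1 is a single pass that counts every item. A minimal sketch, assuming the transactions are held as Python sets; `MIN_SUPPORT_PCT` and `min_count` are illustrative names, not part of the original example.

```python
from collections import Counter

# Sketch of Iteration 1, assuming transactions are stored as Python sets.
transactions = [
    {"A", "B", "C"}, {"A", "C", "D"}, {"B", "C", "E"},
    {"A", "B", "C", "D"}, {"A", "C", "E"}, {"B", "D", "E"},
]
MIN_SUPPORT_PCT = 50

# Integer ceiling of 50% of 6 transactions -> 3 (avoids float rounding).
min_count = (MIN_SUPPORT_PCT * len(transactions) + 99) // 100

# C1: one scan of the data counts every single item.
c1 = Counter(item for t in transactions for item in t)

# L1: keep only the items that meet the minimum count.
l1 = {item: n for item, n in sorted(c1.items()) if n >= min_count}

print(min_count)  # 3
print(l1)         # {'A': 4, 'B': 4, 'C': 5, 'D': 3, 'E': 3}
```

The threshold is computed with integer arithmetic because a float product such as `0.7 * 10` can land slightly above the exact value and push `math.ceil` one count too high.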
Iteration 2: Finding 2-Itemsets
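Candidate 2-itemsets ($C_2$), built from every pair of items in $L_1$:
| Itemset | Support Count | Status |
|---|---|---|
| {A, B} | 2 | Drop |
| {A, C} | 4 | Keep |
| {A, D} | 2 | Drop |
| {A, E} | 1 | Drop |
| {B, C} | 3 | Keep |
| {B, D} | 2 | Drop |
| {B, E} | 2 | Drop |
| {C, D} | 2 | Drop |
| {C, E} | 2 | Drop |
| {D, E} | 1 | Drop |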
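Frequent 2-itemsets ($L_2$):
| Itemset | Support Count |
|---|---|
| {A, C} | 4 |
| {B, C} | 3 |

The search stops here. The only 3-itemset that can be built from $L_2$ is {A, B, C}, and it contains the infrequent pair {A, B}, so the Apriori property prunes it without another scan of the data. $L_2$ is the final level.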
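Every level after the first follows the same pattern: join, prune, count, filter. The sketch below starts from $L_1$ (all five items) and repeats that pattern until no candidates remain. The helper names `generate_candidates` and `count_and_filter` are hypothetical. It also swaps in a simple union-based join (any two frequent $(k-1)$-itemsets whose union has size $k$) instead of the textbook sorted-prefix join; after the subset-pruning step both joins leave exactly the same candidates.

```python
from itertools import combinations

# Sketch of the level-wise loop for k >= 2 (join + prune + count + filter).
# Helper names are hypothetical; the join is union-based, not prefix-based.
transactions = [frozenset(t) for t in (
    {"A", "B", "C"}, {"A", "C", "D"}, {"B", "C", "E"},
    {"A", "B", "C", "D"}, {"A", "C", "E"}, {"B", "D", "E"},
)]
MIN_COUNT = 3  # 50% of 6 transactions


def generate_candidates(frequent_prev, k):
    """Join frequent (k-1)-itemsets into k-itemsets, then prune any candidate
    with an infrequent (k-1)-subset (the Apriori property)."""
    joined = {a | b for a in frequent_prev for b in frequent_prev
              if len(a | b) == k}
    return {c for c in joined
            if all(frozenset(s) in frequent_prev
                   for s in combinations(c, k - 1))}


def count_and_filter(candidates):
    """Scan the data once and keep candidates meeting MIN_COUNT."""
    counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
    return {c: n for c, n in counts.items() if n >= MIN_COUNT}


# L1 from Iteration 1: every single item is frequent.
frequent = {frozenset({item}) for item in "ABCDE"}
levels = []  # levels[0] is L2, levels[1] is L3, ...
k = 2
while frequent:
    candidates = generate_candidates(frequent, k)
    survivors = count_and_filter(candidates)
    levels.append(survivors)
    frequent = set(survivors)
    k += 1

l2 = levels[0]
print(sorted("".join(sorted(c)) + f"={n}" for c, n in l2.items()))
# ['AC=4', 'BC=3']
print(levels[1])  # {}: {A, B, C} is pruned, so the k = 3 pass is empty
```

The output is sorted before printing because iteration order over a set of frozensets of strings can change between Python runs.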
Step 3: Association Rules Generation
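For every frequent itemset of size 2 or more, we generate all possible rules ($X \rightarrow Y$, where $X$ and $Y$ are non-empty, disjoint, and together make up the itemset) and calculate their confidence, $\text{Conf}(X \rightarrow Y) = \text{Sup}(X \cup Y) / \text{Sup}(X)$. If the confidence is $\geq 70\%$, the rule is kept.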
| Rule ($X \rightarrow Y$) | Sup($X \cup Y$) | Sup($X$) | Confidence | Status |
|---|---|---|---|---|
| {A}→{C} | 4 | 4 | 100% | Keep |
| {C}→{A} | 4 | 5 | 80% | Keep |
| {B}→{C} | 3 | 4 | 75% | Keep |
| {C}→{B} | 3 | 5 | 60% | Drop |
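The rule table can be reproduced from the support counts alone. A minimal sketch, assuming the counts from Step 2 are kept in a dictionary keyed by `frozenset`; `generate_rules` and `fmt` are hypothetical helper names.

```python
from itertools import combinations

# Sketch of Step 3. Assumption: the support counts from Step 2 are kept in a
# dict keyed by frozenset, so rule generation needs no new scan of the data.
support = {
    frozenset({"A"}): 4, frozenset({"B"}): 4, frozenset({"C"}): 5,
    frozenset({"D"}): 3, frozenset({"E"}): 3,
    frozenset({"A", "C"}): 4, frozenset({"B", "C"}): 3,
}
MIN_CONFIDENCE = 0.70


def generate_rules(support_counts, min_conf):
    """Split each frequent itemset (size >= 2) into antecedent X and
    consequent Y = itemset - X, scoring Conf(X -> Y) = Sup(X ∪ Y) / Sup(X).
    Every subset of a frequent itemset is frequent, so X is always a key."""
    rules = []
    for itemset, sup_xy in support_counts.items():
        if len(itemset) < 2:
            continue
        for r in range(1, len(itemset)):
            for combo in combinations(sorted(itemset), r):
                x = frozenset(combo)
                conf = sup_xy / support_counts[x]
                rules.append((x, itemset - x, conf, conf >= min_conf))
    return rules


def fmt(itemset):
    """Render a frozenset as {A, B} for printing."""
    return "{" + ", ".join(sorted(itemset)) + "}"


for x, y, conf, keep in generate_rules(support, MIN_CONFIDENCE):
    print(f"{fmt(x)} -> {fmt(y)}: {conf:.0%} {'Keep' if keep else 'Drop'}")
# {A} -> {C}: 100% Keep
# {C} -> {A}: 80% Keep
# {B} -> {C}: 75% Keep
# {C} -> {B}: 60% Drop
```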
Final Takeaway
Look at the final two rows in Step 3 to see the ultimate Apriori exam trap: rule asymmetry! {B} → {C} passes the 70% confidence threshold and is kept, yet the exact reverse rule {C} → {B} scores only 60% and is dropped. Confidence is directional, so a rule and its reverse must each be checked against the threshold on their own.
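Writing both confidences side by side shows where the asymmetry comes from: the numerator $\text{Sup}(\{B, C\}) = 3$ is shared, and only the antecedent's support in the denominator changes.

```latex
\begin{aligned}
\text{Conf}(\{B\} \rightarrow \{C\}) &= \frac{\text{Sup}(\{B, C\})}{\text{Sup}(\{B\})} = \frac{3}{4} = 75\% \;\geq\; 70\% \quad \text{(Keep)} \\
\text{Conf}(\{C\} \rightarrow \{B\}) &= \frac{\text{Sup}(\{B, C\})}{\text{Sup}(\{C\})} = \frac{3}{5} = 60\% \;<\; 70\% \quad \text{(Drop)}
\end{aligned}
```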