Decision Tree ID3: A Complete Solved Numerical Example
Scenario: Bank Loan Approval
The Objective: Determine whether a loan application should be approved by following the logical splits of an applicant's financial profile.
Core Mechanics▼
- Entropy Measures the Mess: It quantifies how "mixed up" your data is. A perfectly pure group has an entropy of . A 50/50 split is a total mess with an entropy of . Lower is always cleaner!
- Maximize Information Gain: Information Gain (IG) is simply the drop in entropy after a split. At every step, calculate the IG for all remaining features and greedily pick the one that cleans up the data the most.
- One and Done (No Re-use): Once you split on a feature, cross it off the list for that specific branch. You can never split on the exact same feature twice down the same path.
- The Stopping Conditions: Stop growing a branch when all remaining examples belong to the exact same class (a pure leaf), or when you completely run out of features to test.
Step 1: The Training Data
Before calculating recursive splits, we must define the dataset. The ID3 algorithm evaluates each feature to predict the target class (Approved).
| Data Point | Credit_History | Employment | Income_Level | Owns_Property | Approved |
|---|---|---|---|---|---|
| P1 | Good | Stable | High | Yes | Yes |
| P2 | Good | Stable | Medium | No | Yes |
| P3 | Poor | Unstable | Low | No | No |
| P4 | Poor | Stable | Medium | Yes | No |
| P5 | Good | Unstable | High | Yes | Yes |
| P6 | Poor | Unstable | High | No | No |
| P7 | Good | Stable | Low | No | Yes |
| P8 | Poor | Stable | Low | Yes | No |
| P9 | Good | Unstable | Low | No | No |
| Target | Good | Unstable | Medium | No | ? |
Step 2: Recursive Splitting (Entropy & Gain)
To build the tree, we recursively calculate the Entropy of the system and find the Information Gain for every available feature. The feature with the highest gain becomes the splitting node.
Iteration 1
Context: Root
1. Entropy of Target Class, Entropy(S)
Positives (P) for 'Yes' = 4
Negatives (N) for 'No' = 5
2. Subset Information Required
| Value | Pi | Ni | I (Pi, Ni) |
|---|---|---|---|
| Good | 4 | 1 | 0.722 |
| Poor | 0 | 4 | 0 |
| Value | Pi | Ni | I (Pi, Ni) |
|---|---|---|---|
| Stable | 3 | 2 | 0.971 |
| Unstable | 1 | 3 | 0.811 |
| Value | Pi | Ni | I (Pi, Ni) |
|---|---|---|---|
| High | 2 | 1 | 0.918 |
| Medium | 1 | 1 | 1 |
| Low | 1 | 3 | 0.811 |
| Value | Pi | Ni | I (Pi, Ni) |
|---|---|---|---|
| Yes | 2 | 2 | 1 |
| No | 2 | 3 | 0.971 |
3. Weighted Feature Entropy
4. Feature Information Gain
5. Feature Selection Decision
Credit_History generated the highest Information Gain (0.59).
It is selected as the optimal splitting node for this subset.
Iteration 2
Context: Credit_History = Good
| Data Point | Credit_History | Employment | Income_Level | Owns_Property | Approved |
|---|---|---|---|---|---|
| P1 | Good | Stable | High | Yes | Yes |
| P2 | Good | Stable | Medium | No | Yes |
| P5 | Good | Unstable | High | Yes | Yes |
| P7 | Good | Stable | Low | No | Yes |
| P9 | Good | Unstable | Low | No | No |
1. Entropy of Target Class, Entropy(S)
Positives (P) for 'Yes' = 4
Negatives (N) for 'No' = 1
2. Subset Information Required
| Value | Pi | Ni | I (Pi, Ni) |
|---|---|---|---|
| Stable | 3 | 0 | 0 |
| Unstable | 1 | 1 | 1 |
| Value | Pi | Ni | I (Pi, Ni) |
|---|---|---|---|
| High | 2 | 0 | 0 |
| Medium | 1 | 0 | 0 |
| Low | 1 | 1 | 1 |
| Value | Pi | Ni | I (Pi, Ni) |
|---|---|---|---|
| Yes | 2 | 0 | 0 |
| No | 2 | 1 | 0.918 |
3. Weighted Feature Entropy
4. Feature Information Gain
5. Feature Selection Decision
Employment generated the highest Information Gain (0.322).
It is selected as the optimal splitting node for this subset.
Iteration 3
Context: Employment = Unstable
| Data Point | Credit_History | Employment | Income_Level | Owns_Property | Approved |
|---|---|---|---|---|---|
| P5 | Good | Unstable | High | Yes | Yes |
| P9 | Good | Unstable | Low | No | No |
1. Entropy of Target Class, Entropy(S)
Positives (P) for 'Yes' = 1
Negatives (N) for 'No' = 1
2. Subset Information Required
| Value | Pi | Ni | I (Pi, Ni) |
|---|---|---|---|
| High | 1 | 0 | 0 |
| Low | 0 | 1 | 0 |
| Value | Pi | Ni | I (Pi, Ni) |
|---|---|---|---|
| Yes | 1 | 0 | 0 |
| No | 0 | 1 | 0 |
3. Weighted Feature Entropy
4. Feature Information Gain
5. Feature Selection Decision
Income_Level generated the highest Information Gain (1).
It is selected as the optimal splitting node for this subset.
Step 3: Final Computed Decision Tree
Combining all the recursive splits from Step 2 yields the final classification tree.
Final Takeaway
Notice how the algorithm places the most mathematically decisive feature (Credit_History) at the very root to split the data as fast as possible. If a profile has 'Poor' credit, it instantly hits a pure leaf (Class: No), proving that Decision Trees will completely ignore remaining features like Employment or Income once an outcome is guaranteed!