Decision Tree ID3: A Complete Solved Numerical Example

Scenario: Bank Loan Approval

The Objective: Determine whether a loan application should be approved by following the logical splits of an applicant's financial profile.

Core Mechanics
  • Entropy Measures the Mess: It quantifies how "mixed up" your data is. A perfectly pure group has an entropy of 0. With two classes, a 50/50 split is a total mess with an entropy of 1. Lower is always cleaner!
  • Maximize Information Gain: Information Gain (IG) is simply the drop in entropy after a split. At every step, calculate the IG for all remaining features and greedily pick the one that cleans up the data the most.
  • One and Done (No Re-use): Once you split on a feature, cross it off the list for that specific branch. You can never split on the exact same feature twice down the same path.
  • The Stopping Conditions: Stop growing a branch when all remaining examples belong to the exact same class (a pure leaf), or when you completely run out of features to test (the leaf then takes the majority class of the examples that are left).
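
The first two mechanics can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation; the function names `entropy` and `information_gain` and the toy label lists are assumptions made for this example.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    total = len(labels)
    # Counter only holds classes that actually occur, so log2(0) is never evaluated.
    return sum(-(c / total) * math.log2(c / total) for c in Counter(labels).values())

def information_gain(parent, children):
    """Parent entropy minus the size-weighted entropy of the child groups."""
    total = len(parent)
    weighted = sum(len(child) / total * entropy(child) for child in children)
    return entropy(parent) - weighted

pure = ["Yes", "Yes", "Yes", "Yes"]    # hypothetical toy data
mixed = ["Yes", "Yes", "No", "No"]     # hypothetical toy data

print(entropy(pure))     # zero for a pure group
print(entropy(mixed))    # 1.0 for a 50/50 group
# Splitting the mixed group into two pure halves removes all of its entropy.
print(information_gain(mixed, [["Yes", "Yes"], ["No", "No"]]))   # 1.0
```
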

Step 1: The Training Data

Before calculating recursive splits, we must define the dataset. The ID3 algorithm evaluates each feature to predict the target class (Approved).

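| Data Point | Credit_History | Employment | Income_Level | Owns_Property | Approved |
| --- | --- | --- | --- | --- | --- |
| P1 | Good | Stable | High | Yes | Yes |
| P2 | Good | Stable | Medium | No | Yes |
| P3 | Poor | Unstable | Low | No | No |
| P4 | Poor | Stable | Medium | Yes | No |
| P5 | Good | Unstable | High | Yes | Yes |
| P6 | Poor | Unstable | High | No | No |
| P7 | Good | Stable | Low | No | Yes |
| P8 | Poor | Stable | Low | Yes | No |
| P9 | Good | Unstable | Low | No | No |
| Target | Good | Unstable | Medium | No | ? |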

Step 2: Recursive Splitting (Entropy & Gain)

To build the tree, we recursively calculate the Entropy of the system and find the Information Gain for every available feature. The feature with the highest gain becomes the splitting node.

Iteration 1

Context: Root

1. Entropy of Target Class, Entropy(S)
Formula: $\text{Entropy}(S) = - \dfrac{P}{P+N} \log_2\left(\dfrac{P}{P+N}\right) - \dfrac{N}{P+N} \log_2\left(\dfrac{N}{P+N}\right)$

Positives (P) for 'Yes' = 4

Negatives (N) for 'No' = 5

Entropy(S) = 0.991
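
Substituting P = 4 and N = 5 into the formula reproduces this value (intermediate terms are rounded to three decimals):

```latex
\begin{aligned}
\text{Entropy}(S) &= -\tfrac{4}{9}\log_2\!\left(\tfrac{4}{9}\right) - \tfrac{5}{9}\log_2\!\left(\tfrac{5}{9}\right) \\
                  &\approx 0.520 + 0.471 \\
                  &= 0.991
\end{aligned}
```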

2. Subset Information Required
Formula: $\text{Entropy}(P_i, N_i) = - \dfrac{P_i}{P_i+N_i} \log_2\left(\dfrac{P_i}{P_i+N_i}\right) - \dfrac{N_i}{P_i+N_i} \log_2\left(\dfrac{N_i}{P_i+N_i}\right)$
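Evaluating Feature: Credit_History

| Value | Pi | Ni | Entropy(Pi, Ni) |
| --- | --- | --- | --- |
| Good | 4 | 1 | 0.722 |
| Poor | 0 | 4 | 0 |

Evaluating Feature: Employment

| Value | Pi | Ni | Entropy(Pi, Ni) |
| --- | --- | --- | --- |
| Stable | 3 | 2 | 0.971 |
| Unstable | 1 | 3 | 0.811 |

Evaluating Feature: Income_Level

| Value | Pi | Ni | Entropy(Pi, Ni) |
| --- | --- | --- | --- |
| High | 2 | 1 | 0.918 |
| Medium | 1 | 1 | 1 |
| Low | 1 | 3 | 0.811 |

Evaluating Feature: Owns_Property

| Value | Pi | Ni | Entropy(Pi, Ni) |
| --- | --- | --- | --- |
| Yes | 2 | 2 | 1 |
| No | 2 | 3 | 0.971 |
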
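
The per-value counts and entropies for each feature can be recomputed from the Step 1 rows. The sketch below is one way to do it; `ROWS`, `COLUMNS`, `binary_entropy`, and `subset_counts` are names chosen for this example, and the unlabeled Target row is left out because it has no class.

```python
import math
from collections import defaultdict

COLUMNS = ["Credit_History", "Employment", "Income_Level", "Owns_Property"]
# The nine labeled training rows P1..P9 from Step 1; the last field is Approved.
ROWS = [
    ("Good", "Stable", "High", "Yes", "Yes"),
    ("Good", "Stable", "Medium", "No", "Yes"),
    ("Poor", "Unstable", "Low", "No", "No"),
    ("Poor", "Stable", "Medium", "Yes", "No"),
    ("Good", "Unstable", "High", "Yes", "Yes"),
    ("Poor", "Unstable", "High", "No", "No"),
    ("Good", "Stable", "Low", "No", "Yes"),
    ("Poor", "Stable", "Low", "Yes", "No"),
    ("Good", "Unstable", "Low", "No", "No"),
]

def binary_entropy(p, n):
    """Entropy(P_i, N_i) in bits; a zero count contributes nothing."""
    total = p + n
    return sum(-(k / total) * math.log2(k / total) for k in (p, n) if k)

def subset_counts(rows, feature):
    """Map each value of `feature` to its [P_i, N_i] counts."""
    col = COLUMNS.index(feature)
    counts = defaultdict(lambda: [0, 0])
    for row in rows:
        counts[row[col]][0 if row[-1] == "Yes" else 1] += 1
    return dict(counts)

for feature in COLUMNS:
    for value, (p, n) in subset_counts(ROWS, feature).items():
        print(f"{feature:15} {value:9} P={p} N={n} entropy={binary_entropy(p, n):.3f}")
```
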
3. Weighted Feature Entropy
Formula: $\text{Entropy}(A) = \sum_i \dfrac{P_i + N_i}{P + N} \times \text{Entropy}(P_i, N_i)$
  • Credit_History: Entropy = 0.401
  • Employment: Entropy = 0.900
  • Income_Level: Entropy = 0.889
  • Owns_Property: Entropy = 0.984
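
As a worked instance of the weighted-entropy formula, take Credit_History: the Good subset holds 5 of the 9 rows with entropy 0.722, and the Poor subset holds 4 of the 9 rows with entropy 0.

```latex
\begin{aligned}
\text{Entropy}(\text{Credit\_History})
  &= \tfrac{5}{9}\,(0.722) + \tfrac{4}{9}\,(0) \\
  &\approx 0.401
\end{aligned}
```
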
4. Feature Information Gain
Formula: $\text{Gain}(S, A) = \text{Entropy}(S) - \text{Entropy}(A)$
  • Credit_History: Gain = 0.991 - 0.401 = 0.590
  • Employment: Gain = 0.991 - 0.900 = 0.091
  • Income_Level: Gain = 0.991 - 0.889 = 0.102
  • Owns_Property: Gain = 0.991 - 0.984 = 0.007
5. Feature Selection Decision

Credit_History generated the highest Information Gain (0.59). It is selected as the optimal splitting node for this subset.
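
The root-level gains and the resulting choice can be checked with a short script. This is a sketch under the assumption that the nine labeled rows are encoded as tuples; `ROWS`, `FEATURES`, `entropy`, and `gain` are illustrative names.

```python
import math
from collections import Counter

FEATURES = ["Credit_History", "Employment", "Income_Level", "Owns_Property"]
# P1..P9 from Step 1; the last field is Approved.
ROWS = [
    ("Good", "Stable", "High", "Yes", "Yes"),
    ("Good", "Stable", "Medium", "No", "Yes"),
    ("Poor", "Unstable", "Low", "No", "No"),
    ("Poor", "Stable", "Medium", "Yes", "No"),
    ("Good", "Unstable", "High", "Yes", "Yes"),
    ("Poor", "Unstable", "High", "No", "No"),
    ("Good", "Stable", "Low", "No", "Yes"),
    ("Poor", "Stable", "Low", "Yes", "No"),
    ("Good", "Unstable", "Low", "No", "No"),
]

def entropy(labels):
    total = len(labels)
    return sum(-(c / total) * math.log2(c / total) for c in Counter(labels).values())

def gain(rows, feature):
    """Entropy of the node minus the weighted entropy after splitting on `feature`."""
    col = FEATURES.index(feature)
    weighted = 0.0
    for value in sorted({row[col] for row in rows}):
        subset = [row[-1] for row in rows if row[col] == value]
        weighted += len(subset) / len(rows) * entropy(subset)
    return entropy([row[-1] for row in rows]) - weighted

gains = {feature: gain(ROWS, feature) for feature in FEATURES}
for feature, value in gains.items():
    print(f"{feature:15} gain = {value:.3f}")

best = max(gains, key=gains.get)
print("split on:", best)   # Credit_History
```
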

Resulting Split
Credit_History ?
  • Good → Class: ? (still mixed; split again)
  • Poor → Class: No

Iteration 2

Context: Credit_History = Good

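Current Data: Filtered by Credit_History = Good (5 rows)

| Data Point | Credit_History | Employment | Income_Level | Owns_Property | Approved |
| --- | --- | --- | --- | --- | --- |
| P1 | Good | Stable | High | Yes | Yes |
| P2 | Good | Stable | Medium | No | Yes |
| P5 | Good | Unstable | High | Yes | Yes |
| P7 | Good | Stable | Low | No | Yes |
| P9 | Good | Unstable | Low | No | No |
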
1. Entropy of Target Class, Entropy(S)
Formula: $\text{Entropy}(S) = - \dfrac{P}{P+N} \log_2\left(\dfrac{P}{P+N}\right) - \dfrac{N}{P+N} \log_2\left(\dfrac{N}{P+N}\right)$

Positives (P) for 'Yes' = 4

Negatives (N) for 'No' = 1

Entropy(S) = 0.722

2. Subset Information Required
Formula: $\text{Entropy}(P_i, N_i) = - \dfrac{P_i}{P_i+N_i} \log_2\left(\dfrac{P_i}{P_i+N_i}\right) - \dfrac{N_i}{P_i+N_i} \log_2\left(\dfrac{N_i}{P_i+N_i}\right)$
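Evaluating Feature: Employment

| Value | Pi | Ni | Entropy(Pi, Ni) |
| --- | --- | --- | --- |
| Stable | 3 | 0 | 0 |
| Unstable | 1 | 1 | 1 |

Evaluating Feature: Income_Level

| Value | Pi | Ni | Entropy(Pi, Ni) |
| --- | --- | --- | --- |
| High | 2 | 0 | 0 |
| Medium | 1 | 0 | 0 |
| Low | 1 | 1 | 1 |

Evaluating Feature: Owns_Property

| Value | Pi | Ni | Entropy(Pi, Ni) |
| --- | --- | --- | --- |
| Yes | 2 | 0 | 0 |
| No | 2 | 1 | 0.918 |
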
3. Weighted Feature Entropy
Formula: $\text{Entropy}(A) = \sum_i \dfrac{P_i + N_i}{P + N} \times \text{Entropy}(P_i, N_i)$
  • Employment: Entropy = 0.400
  • Income_Level: Entropy = 0.400
  • Owns_Property: Entropy = 0.551
4. Feature Information Gain
Formula: $\text{Gain}(S, A) = \text{Entropy}(S) - \text{Entropy}(A)$
  • Employment: Gain = 0.722 - 0.400 = 0.322
  • Income_Level: Gain = 0.722 - 0.400 = 0.322
  • Owns_Property: Gain = 0.722 - 0.551 = 0.171
5. Feature Selection Decision

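Employment and Income_Level tie for the highest Information Gain (0.322). ID3 itself does not prescribe a tie-break; this walkthrough takes Employment, the first of the two in column order, as the splitting node for this subset.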
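
Employment and Income_Level have the same gain here, so the selection depends on how ties are resolved. The sketch below assumes the gains are held in a Python dict in column order; it shows that `max` keeps the first maximal entry, which is one plausible reason Employment wins. The dict names are illustrative.

```python
# Iteration 2 gains, rounded as in the list above, in column order.
gains = {"Employment": 0.322, "Income_Level": 0.322, "Owns_Property": 0.171}

# max() returns the first item that reaches the maximum, so with equal gains
# the dict's insertion order decides the winner.
print(max(gains, key=gains.get))            # Employment

# Listing Income_Level first flips the choice even though no gain changed.
reordered = {"Income_Level": 0.322, "Employment": 0.322, "Owns_Property": 0.171}
print(max(reordered, key=reordered.get))    # Income_Level
```

Either choice is a valid ID3 tree for this data; the two trees differ only in which tied feature is tested first.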

Resulting Split
Employment ?
  • Stable → Class: Yes
  • Unstable → Class: ? (still mixed; split again)

Iteration 3

Context: Credit_History = Good, Employment = Unstable

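Current Data: Filtered by Credit_History = Good and Employment = Unstable (2 rows)

| Data Point | Credit_History | Employment | Income_Level | Owns_Property | Approved |
| --- | --- | --- | --- | --- | --- |
| P5 | Good | Unstable | High | Yes | Yes |
| P9 | Good | Unstable | Low | No | No |
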
1. Entropy of Target Class, Entropy(S)
Formula: $\text{Entropy}(S) = - \dfrac{P}{P+N} \log_2\left(\dfrac{P}{P+N}\right) - \dfrac{N}{P+N} \log_2\left(\dfrac{N}{P+N}\right)$

Positives (P) for 'Yes' = 1

Negatives (N) for 'No' = 1

Entropy(S) = 1

2. Subset Information Required
Formula: $\text{Entropy}(P_i, N_i) = - \dfrac{P_i}{P_i+N_i} \log_2\left(\dfrac{P_i}{P_i+N_i}\right) - \dfrac{N_i}{P_i+N_i} \log_2\left(\dfrac{N_i}{P_i+N_i}\right)$
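Evaluating Feature: Income_Level

| Value | Pi | Ni | Entropy(Pi, Ni) |
| --- | --- | --- | --- |
| High | 1 | 0 | 0 |
| Low | 0 | 1 | 0 |

Evaluating Feature: Owns_Property

| Value | Pi | Ni | Entropy(Pi, Ni) |
| --- | --- | --- | --- |
| Yes | 1 | 0 | 0 |
| No | 0 | 1 | 0 |
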
3. Weighted Feature Entropy
Formula: $\text{Entropy}(A) = \sum_i \dfrac{P_i + N_i}{P + N} \times \text{Entropy}(P_i, N_i)$
  • Income_Level: Entropy = 0
  • Owns_Property: Entropy = 0
4. Feature Information Gain
Formula: $\text{Gain}(S, A) = \text{Entropy}(S) - \text{Entropy}(A)$
  • Income_Level: Gain = 1 - 0 = 1
  • Owns_Property: Gain = 1 - 0 = 1
5. Feature Selection Decision

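Income_Level and Owns_Property tie for the highest Information Gain (1): each one separates the two remaining rows perfectly. This walkthrough takes Income_Level, the first of the two in column order, as the splitting node for this subset. No row in this subset has Income_Level = Medium, so the split produces only High and Low branches.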

Resulting Split
Income_Level ?
  • High → Class: Yes
  • Low → Class: No

Step 3: Final Computed Decision Tree

Combining all the recursive splits from Step 2 yields the final classification tree.

Credit_History ?
  • Good → Employment ?
      • Stable → Class: Yes
      • Unstable → Income_Level ?
          • High → Class: Yes
          • Low → Class: No
  • Poor → Class: No
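
The whole procedure can be reproduced with a compact recursive sketch. It is not a definitive ID3 implementation: it assumes ties are broken by column order, creates branches only for values that occur in the current subset, and uses illustrative names (`ROWS`, `FEATURES`, `entropy`, `gain`, `id3`) with the tree stored as nested dicts.

```python
import math
from collections import Counter

FEATURES = ["Credit_History", "Employment", "Income_Level", "Owns_Property"]
# P1..P9 from Step 1; the last field is Approved.
ROWS = [
    ("Good", "Stable", "High", "Yes", "Yes"),
    ("Good", "Stable", "Medium", "No", "Yes"),
    ("Poor", "Unstable", "Low", "No", "No"),
    ("Poor", "Stable", "Medium", "Yes", "No"),
    ("Good", "Unstable", "High", "Yes", "Yes"),
    ("Poor", "Unstable", "High", "No", "No"),
    ("Good", "Stable", "Low", "No", "Yes"),
    ("Poor", "Stable", "Low", "Yes", "No"),
    ("Good", "Unstable", "Low", "No", "No"),
]

def entropy(labels):
    total = len(labels)
    return sum(-(c / total) * math.log2(c / total) for c in Counter(labels).values())

def gain(rows, feature):
    col = FEATURES.index(feature)
    weighted = 0.0
    for value in sorted({row[col] for row in rows}):
        subset = [row[-1] for row in rows if row[col] == value]
        weighted += len(subset) / len(rows) * entropy(subset)
    return entropy([row[-1] for row in rows]) - weighted

def id3(rows, features):
    labels = [row[-1] for row in rows]
    # Stopping conditions: pure node, or no features left (majority class).
    if len(set(labels)) == 1 or not features:
        return Counter(labels).most_common(1)[0][0]
    # Greedy choice; max() keeps the first feature on ties (column order).
    best = max(features, key=lambda f: gain(rows, f))
    col = FEATURES.index(best)
    remaining = [f for f in features if f != best]   # no re-use down this path
    return {
        best: {
            value: id3([row for row in rows if row[col] == value], remaining)
            for value in sorted({row[col] for row in rows})
        }
    }

tree = id3(ROWS, FEATURES)
print(tree)
```

Under these assumptions the printed structure matches the tree above: Credit_History at the root, Employment under Good, and Income_Level under Unstable.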

Final Takeaway

Notice how the algorithm places the feature with the highest Information Gain (Credit_History) at the very root, because that split removes the most entropy in a single step. If a profile has 'Poor' credit, it immediately hits a pure leaf (Class: No): all four 'Poor' training rows were rejected, so this tree never consults Employment or Income_Level for such applicants. Owns_Property does not appear in the tree at all.
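
To see which features a prediction actually touches, the final tree can be walked by hand. The sketch below hard-codes the Step 3 tree as nested dicts; `TREE` and `classify` are illustrative names, and returning `None` for a value with no branch is an assumption of this example, not something the walkthrough specifies.

```python
# The Step 3 tree as nested dicts: {feature: {value: subtree_or_class}}.
TREE = {
    "Credit_History": {
        "Good": {
            "Employment": {
                "Stable": "Yes",
                "Unstable": {"Income_Level": {"High": "Yes", "Low": "No"}},
            }
        },
        "Poor": "No",
    }
}

def classify(tree, applicant):
    """Return (label, features consulted); label is None if a value has no branch."""
    consulted = []
    node = tree
    while isinstance(node, dict):
        feature, branches = next(iter(node.items()))
        consulted.append(feature)
        node = branches.get(applicant[feature])   # None when the value is unseen
    return node, consulted

# A 'Poor' credit history is decided at the root; nothing else is read.
print(classify(TREE, {"Credit_History": "Poor"}))   # ('No', ['Credit_History'])

# The Target row from Step 1.
target = {"Credit_History": "Good", "Employment": "Unstable",
          "Income_Level": "Medium", "Owns_Property": "No"}
print(classify(TREE, target))
```

The Target row follows Good, then Unstable, and arrives at Income_Level with the value Medium, for which this tree has no branch. The sketch returns `None` there rather than guessing; the walkthrough itself does not state a prediction for that row.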