Decision Tree ID3: A Complete Solved Numerical Example

Try Decision Tree Solver →Read Decision Tree Theory →

Scenario: Bank Loan Approval

The Objective: Determine whether a loan application should be approved by following the logical splits of an applicant's financial profile.

Core Mechanics

▼

Entropy Measures the Mess: It quantifies how "mixed up" your data is. A perfectly pure group has an entropy of $0$ . A 50/50 split is a total mess with an entropy of $1$ . Lower is always cleaner!
Maximize Information Gain: Information Gain (IG) is simply the drop in entropy after a split. At every step, calculate the IG for all remaining features and greedily pick the one that cleans up the data the most.
One and Done (No Re-use): Once you split on a feature, cross it off the list for that specific branch. You can never split on the exact same feature twice down the same path.
The Stopping Conditions: Stop growing a branch when all remaining examples belong to the exact same class (a pure leaf), or when you completely run out of features to test.

Step 1: The Training Data

Before calculating recursive splits, we must define the dataset. The ID3 algorithm evaluates each feature to predict the target class (Approved).

Data Point	Credit_History	Employment	Income_Level	Owns_Property	Approved
P1	Good	Stable	High	Yes	Yes
P2	Good	Stable	Medium	No	Yes
P3	Poor	Unstable	Low	No	No
P4	Poor	Stable	Medium	Yes	No
P5	Good	Unstable	High	Yes	Yes
P6	Poor	Unstable	High	No	No
P7	Good	Stable	Low	No	Yes
P8	Poor	Stable	Low	Yes	No
P9	Good	Unstable	Low	No	No
Target	Good	Unstable	Medium	No	?

Want to edit this data in the live solver?

Step 2: Recursive Splitting (Entropy & Gain)

To build the tree, we recursively calculate the Entropy of the system and find the Information Gain for every available feature. The feature with the highest gain becomes the splitting node.

Iteration 1

Context: Root

1. Entropy of Target Class, Entropy(S)

Formula:

\text{Entropy}(S) = - \dfrac{P}{P+N} \log_2\left(\dfrac{P}{P+N}\right) - \dfrac{N}{P+N} \log_2\left(\dfrac{N}{P+N}\right)

Positives (P) for 'Yes' = 4

Negatives (N) for 'No' = 5

$\text{Entropy}(S) = 0.991$

2. Subset Information Required

Formula:

\text{Entropy}(P_i, N_i) = - \dfrac{P_i}{P_i+N_i} \log_2\left(\dfrac{P_i}{P_i+N_i}\right) - \dfrac{N_i}{P_i+N_i} \log_2\left(\dfrac{N_i}{P_i+N_i}\right)

Evaluating Feature: Credit_History

Value	Pi	Ni	I (Pi, Ni)
Good	4	1	0.722
Poor	0	4	0

Evaluating Feature: Employment

Value	Pi	Ni	I (Pi, Ni)
Stable	3	2	0.971
Unstable	1	3	0.811

Evaluating Feature: Income_Level

Value	Pi	Ni	I (Pi, Ni)
High	2	1	0.918
Medium	1	1	1
Low	1	3	0.811

Evaluating Feature: Owns_Property

Value	Pi	Ni	I (Pi, Ni)
Yes	2	2	1
No	2	3	0.971

3. Weighted Feature Entropy

Formula:

\text{Entropy}(A) = \sum \left[ \dfrac{p_i + n_i}{P + N} \right] \times \text{Entropy}(P_i, N_i)

Credit_HistoryEntropy = 0.401

EmploymentEntropy = 0.9

Income_LevelEntropy = 0.889

Owns_PropertyEntropy = 0.984

4. Feature Information Gain

Formula:

\text{Gain}(S, A) = \text{Entropy}(S) - \text{Entropy}(A)

Credit_History0.991 - 0.401 =Gain:0.59

Employment0.991 - 0.9 =Gain:0.091

Income_Level0.991 - 0.889 =Gain:0.102

Owns_Property0.991 - 0.984 =Gain:0.007

5. Feature Selection Decision

Credit_History generated the highest Information Gain (0.59).
It is selected as the optimal splitting node for this subset.

Resulting Split

Credit_History ?

Good

Class: ?

Poor

Class: No

Iteration 2

Context: Credit_History = Good

Current Data: Filtered by Credit_History = Good5 Rows

Data Point	Credit_History	Employment	Income_Level	Owns_Property	Approved
P1	Good	Stable	High	Yes	Yes
P2	Good	Stable	Medium	No	Yes
P5	Good	Unstable	High	Yes	Yes
P7	Good	Stable	Low	No	Yes
P9	Good	Unstable	Low	No	No

1. Entropy of Target Class, Entropy(S)

Formula:

\text{Entropy}(S) = - \dfrac{P}{P+N} \log_2\left(\dfrac{P}{P+N}\right) - \dfrac{N}{P+N} \log_2\left(\dfrac{N}{P+N}\right)

Positives (P) for 'Yes' = 4

Negatives (N) for 'No' = 1

$\text{Entropy}(S) = 0.722$

2. Subset Information Required

Formula:

\text{Entropy}(P_i, N_i) = - \dfrac{P_i}{P_i+N_i} \log_2\left(\dfrac{P_i}{P_i+N_i}\right) - \dfrac{N_i}{P_i+N_i} \log_2\left(\dfrac{N_i}{P_i+N_i}\right)

Evaluating Feature: Employment

Value	Pi	Ni	I (Pi, Ni)
Stable	3	0	0
Unstable	1	1	1

Evaluating Feature: Income_Level

Value	Pi	Ni	I (Pi, Ni)
High	2	0	0
Medium	1	0	0
Low	1	1	1

Evaluating Feature: Owns_Property

Value	Pi	Ni	I (Pi, Ni)
Yes	2	0	0
No	2	1	0.918

3. Weighted Feature Entropy

Formula:

\text{Entropy}(A) = \sum \left[ \dfrac{p_i + n_i}{P + N} \right] \times \text{Entropy}(P_i, N_i)

EmploymentEntropy = 0.4

Income_LevelEntropy = 0.4

Owns_PropertyEntropy = 0.551

4. Feature Information Gain

Formula:

\text{Gain}(S, A) = \text{Entropy}(S) - \text{Entropy}(A)

Employment0.722 - 0.4 =Gain:0.322

Income_Level0.722 - 0.4 =Gain:0.322

Owns_Property0.722 - 0.551 =Gain:0.171

5. Feature Selection Decision

Employment generated the highest Information Gain (0.322).
It is selected as the optimal splitting node for this subset.

Resulting Split

Employment ?

Stable

Class: Yes

Unstable

Class: ?

Iteration 3

Context: Employment = Unstable

Current Data: Filtered by Employment = Unstable2 Rows

Data Point	Credit_History	Employment	Income_Level	Owns_Property	Approved
P5	Good	Unstable	High	Yes	Yes
P9	Good	Unstable	Low	No	No

1. Entropy of Target Class, Entropy(S)

Formula:

\text{Entropy}(S) = - \dfrac{P}{P+N} \log_2\left(\dfrac{P}{P+N}\right) - \dfrac{N}{P+N} \log_2\left(\dfrac{N}{P+N}\right)

Positives (P) for 'Yes' = 1

Negatives (N) for 'No' = 1

$\text{Entropy}(S) = 1$

2. Subset Information Required

Formula:

\text{Entropy}(P_i, N_i) = - \dfrac{P_i}{P_i+N_i} \log_2\left(\dfrac{P_i}{P_i+N_i}\right) - \dfrac{N_i}{P_i+N_i} \log_2\left(\dfrac{N_i}{P_i+N_i}\right)

Evaluating Feature: Income_Level

Value	Pi	Ni	I (Pi, Ni)
High	1	0	0
Low	0	1	0

Evaluating Feature: Owns_Property

Value	Pi	Ni	I (Pi, Ni)
Yes	1	0	0
No	0	1	0

3. Weighted Feature Entropy

Formula:

\text{Entropy}(A) = \sum \left[ \dfrac{p_i + n_i}{P + N} \right] \times \text{Entropy}(P_i, N_i)

Income_LevelEntropy = 0

Owns_PropertyEntropy = 0

4. Feature Information Gain

Formula:

\text{Gain}(S, A) = \text{Entropy}(S) - \text{Entropy}(A)

Income_Level1 - 0 =Gain:1

Owns_Property1 - 0 =Gain:1

5. Feature Selection Decision

Income_Level generated the highest Information Gain (1).
It is selected as the optimal splitting node for this subset.

Resulting Split

Income_Level ?

High

Class: Yes

Low

Class: No

Step 3: Final Computed Decision Tree

Combining all the recursive splits from Step 2 yields the final classification tree.

Credit_History ?

Good

Employment ?

Stable

Class: Yes

Unstable

Income_Level ?

High

Class: Yes

Low

Class: No

Poor

Class: No

Final Takeaway

Notice how the algorithm places the most mathematically decisive feature (Credit_History) at the very root to split the data as fast as possible. If a profile has 'Poor' credit, it instantly hits a pure leaf (Class: No), proving that Decision Trees will completely ignore remaining features like Employment or Income once an outcome is guaranteed!