Random Forest: A Complete Solved Numerical Example
Scenario: Hospital Readmission Risk
The Objective: Predict whether a discharged patient will be readmitted within 30 days based on clinical indicators.
Step 1: The Historical Data & Target Point
Random Forest improves upon a single Decision Tree by building an "ensemble" (a collection) of trees and combining their predictions. We will use the historical records below (P1 to P8) to predict the Readmitted status of our target patient.
| Data Point | Age_Group | Diagnosis_Severity | Num_Medications | Has_Support | Readmitted |
|---|---|---|---|---|---|
| P1 | Young | Mild | Few | Yes | No |
| P2 | Young | Severe | Many | No | Yes |
| P3 | Middle | Mild | Few | Yes | No |
| P4 | Middle | Severe | Many | Yes | Yes |
| P5 | Senior | Mild | Many | No | Yes |
| P6 | Senior | Severe | Few | No | Yes |
| P7 | Middle | Mild | Many | No | No |
| P8 | Young | Mild | Few | No | No |
| Target | Senior | Mild | Many | No | ? |
Step 2: Bootstrapping & Feature Selection
To ensure our trees don't all look identical, we give each tree a bootstrap sample of the data (rows drawn at random with replacement, so some patients appear more than once and others are left out) and restrict which features it is allowed to split on.
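The two sources of randomness described above can be sketched as follows. This is a minimal illustration, not the procedure that produced the samples in this example: the helper name `draw_tree_inputs`, the seed, and the choice of two features per tree (matching Tree 1's allowance) are assumptions. Note also that many Random Forest implementations re-draw the feature subset at every split rather than once per tree, as is done here.

```python
import random

PATIENTS = ["P1", "P2", "P3", "P4", "P5", "P6", "P7", "P8"]
FEATURES = ["Age_Group", "Diagnosis_Severity", "Num_Medications", "Has_Support"]


def draw_tree_inputs(rng, n_features=2):
    """Return (bootstrap_sample, allowed_features) for one tree (hypothetical helper)."""
    # Bootstrap: draw len(PATIENTS) rows with replacement, so repeats are possible.
    sample = sorted(rng.choices(PATIENTS, k=len(PATIENTS)))
    # Feature restriction: pick n_features distinct features without replacement.
    allowed = rng.sample(FEATURES, k=n_features)
    return sample, allowed


rng = random.Random(0)  # arbitrary seed, fixed only for reproducibility
for tree_id in (1, 2, 3):
    sample, allowed = draw_tree_inputs(rng)
    print(f"Tree {tree_id}: rows={sample} features={allowed}")
```

Each tree receives eight rows, the same size as the original data, but typically with duplicates and omissions.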
This example uses three trees: Tree 1, Tree 2, and Tree 3.
Step 3: Tree-by-Tree Construction
Each tree calculates its splits using its assigned data, builds its structure, and casts its vote for the target patient. The breakdown for Tree 1 follows.
Tree 1 Allowed Features: Age_Group, Diagnosis_Severity
Tree 1 Math Breakdown
Iteration 1
Context: Root (Tree 1's bootstrap sample; P1 and P6 were drawn twice, P2 and P7 were not drawn)
| Data Point | Age_Group | Diagnosis_Severity | Readmitted |
|---|---|---|---|
| P1 | Young | Mild | No |
| P1 | Young | Mild | No |
| P3 | Middle | Mild | No |
| P4 | Middle | Severe | Yes |
| P5 | Senior | Mild | Yes |
| P6 | Senior | Severe | Yes |
| P6 | Senior | Severe | Yes |
| P8 | Young | Mild | No |
1. Entropy of Target Class, Entropy(S)
Positives (P) for 'Yes' = 4
Negatives (N) for 'No' = 4
Entropy(S) = -(4/8) log2(4/8) - (4/8) log2(4/8) = 1.00
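2. Subset Information Required
Pi and Ni are the counts of 'Yes' and 'No' rows for each feature value; I (Pi, Ni) is the entropy of that subset.
Age_Group:
| Value | Pi | Ni | I (Pi, Ni) |
|---|---|---|---|
| Young | 0 | 3 | 0.00 |
| Middle | 1 | 1 | 1.00 |
| Senior | 3 | 0 | 0.00 |
Diagnosis_Severity:
| Value | Pi | Ni | I (Pi, Ni) |
|---|---|---|---|
| Mild | 1 | 4 | 0.72 |
| Severe | 3 | 0 | 0.00 |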
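3. Weighted Feature Entropy
E(Age_Group) = (3/8)(0.00) + (2/8)(1.00) + (3/8)(0.00) = 0.25
E(Diagnosis_Severity) = (5/8)(0.72) + (3/8)(0.00) = 0.45
4. Feature Information Gain
With Entropy(S) = 1.00 (4 'Yes', 4 'No'):
Gain(Age_Group) = 1.00 - 0.25 = 0.75
Gain(Diagnosis_Severity) = 1.00 - 0.45 = 0.55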
5. Feature Selection Decision
Age_Group generated the highest Information Gain (0.75).
It is selected as the optimal splitting node for this subset.
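The root-level calculation for Tree 1 can be checked with a short script. This is a sketch written for this example only: the row tuples are transcribed from Tree 1's root table, and the names `ROWS`, `FEATURE_INDEX`, `entropy`, and `information_gain` are illustrative choices, not part of any library.

```python
from collections import Counter
from math import log2

# Tree 1's bootstrap sample: (Age_Group, Diagnosis_Severity, Readmitted)
ROWS = [
    ("Young", "Mild", "No"),      # P1
    ("Young", "Mild", "No"),      # P1 (drawn twice)
    ("Middle", "Mild", "No"),     # P3
    ("Middle", "Severe", "Yes"),  # P4
    ("Senior", "Mild", "Yes"),    # P5
    ("Senior", "Severe", "Yes"),  # P6
    ("Senior", "Severe", "Yes"),  # P6 (drawn twice)
    ("Young", "Mild", "No"),      # P8
]
FEATURE_INDEX = {"Age_Group": 0, "Diagnosis_Severity": 1}


def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())


def information_gain(rows, feature):
    """Entropy of the whole set minus the weighted entropy after splitting."""
    col = FEATURE_INDEX[feature]
    groups = {}
    for row in rows:
        # Collect the class labels that fall under each value of the feature.
        groups.setdefault(row[col], []).append(row[-1])
    weighted = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy([row[-1] for row in rows]) - weighted


gains = {f: information_gain(ROWS, f) for f in FEATURE_INDEX}
best = max(gains, key=gains.get)
print({f: round(g, 2) for f, g in gains.items()}, "->", best)
```

Rounded to two decimals, the gains come out as 0.75 for Age_Group and 0.55 for Diagnosis_Severity, so Age_Group is chosen for the root.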
Iteration 2
Context: Age_Group = Middle (the Young and Senior branches are already pure, so only this branch needs a further split)
| Data Point | Age_Group | Diagnosis_Severity | Readmitted |
|---|---|---|---|
| P3 | Middle | Mild | No |
| P4 | Middle | Severe | Yes |
1. Entropy of Target Class, Entropy(S)
Positives (P) for 'Yes' = 1
Negatives (N) for 'No' = 1
Entropy(S) = -(1/2) log2(1/2) - (1/2) log2(1/2) = 1.00
2. Subset Information Required
Diagnosis_Severity (Age_Group is constant within this subset, so it cannot yield any gain):
| Value | Pi | Ni | I (Pi, Ni) |
|---|---|---|---|
| Mild | 0 | 1 | 0.00 |
| Severe | 1 | 0 | 0.00 |
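3. Weighted Feature Entropy
E(Diagnosis_Severity) = (1/2)(0.00) + (1/2)(0.00) = 0.00
4. Feature Information Gain
Gain(Diagnosis_Severity) = Entropy(S) - E(Diagnosis_Severity) = 1.00 - 0.00 = 1.00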
5. Feature Selection Decision
Diagnosis_Severity generated the highest Information Gain (1.00).
It is selected as the optimal splitting node for this subset.
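Tree 1 Resulting Decision Tree
Root: Age_Group
Young: No
Middle: split on Diagnosis_Severity (Mild: No, Severe: Yes)
Senior: Yes
The target patient is Senior, so Tree 1 votes 'Yes'.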
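The tree implied by the two iterations above (Age_Group at the root, Diagnosis_Severity under the Middle branch) can be written as a nested dictionary and walked to obtain Tree 1's vote. The dictionary layout and the `predict` helper are illustrative assumptions, not a prescribed representation.

```python
# Tree 1 as a nested dict: an internal node maps a feature name to its branches,
# and a leaf is simply the class label.
TREE_1 = {
    "Age_Group": {
        "Young": "No",
        "Middle": {"Diagnosis_Severity": {"Mild": "No", "Severe": "Yes"}},
        "Senior": "Yes",
    }
}

TARGET = {
    "Age_Group": "Senior",
    "Diagnosis_Severity": "Mild",
    "Num_Medications": "Many",
    "Has_Support": "No",
}


def predict(tree, patient):
    """Walk from the root to a leaf, following the patient's feature values."""
    node = tree
    while isinstance(node, dict):
        # Each internal node holds exactly one feature and its branches.
        feature, branches = next(iter(node.items()))
        node = branches[patient[feature]]
    return node


print(predict(TREE_1, TARGET))  # the Senior branch is a leaf
```

Because the Senior branch is pure, the target's Diagnosis_Severity is never consulted by this tree.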
Step 4: Forest Prediction (Majority Vote)
Each tree evaluates the Target Patient and casts a vote. The final prediction is simply the class that receives the most votes!
Voting Results
Class 'Yes': 2 votes
Class 'No': 1 vote
Final Prediction: 'Yes' (the target patient is predicted to be readmitted within 30 days)
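The vote count can be expressed in a few lines. In this sketch, Tree 1's 'Yes' follows from the target being Senior; the tally of 2 'Yes' to 1 'No' is taken from the results above, but which of Tree 2 and Tree 3 cast the 'No' vote is an assumption made only so the example runs.

```python
from collections import Counter

# Tree 1 votes 'Yes'. The assignment of the remaining 'Yes' and 'No'
# to Tree 2 and Tree 3 is hypothetical; only the 2-to-1 tally is given.
votes = {"Tree 1": "Yes", "Tree 2": "Yes", "Tree 3": "No"}


def majority_vote(labels):
    """Return the most common label and the full tally."""
    tally = Counter(labels)
    winner, _ = tally.most_common(1)[0]
    return winner, dict(tally)


prediction, tally = majority_vote(votes.values())
print(prediction, tally)
```

Using an odd number of trees, as here, avoids ties in a two-class problem.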