Random Forest: A Complete Solved Numerical Example

Scenario: Hospital Readmission Risk

The Objective: Predict whether a discharged patient will be readmitted within 30 days based on clinical indicators.

Step 1: The Historical Data & Target Point

Random Forest improves upon a single Decision Tree by building an "ensemble" (a collection) of multiple trees and combining their predictions. We will use the historical data below to predict the Readmitted status of our target patient (the last row of the table).

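| Data Point | Age_Group | Diagnosis_Severity | Num_Medications | Has_Support | Readmitted |
|---|---|---|---|---|---|
| P1 | Young | Mild | Few | Yes | No |
| P2 | Young | Severe | Many | No | Yes |
| P3 | Middle | Mild | Few | Yes | No |
| P4 | Middle | Severe | Many | Yes | Yes |
| P5 | Senior | Mild | Many | No | Yes |
| P6 | Senior | Severe | Few | No | Yes |
| P7 | Middle | Mild | Many | No | No |
| P8 | Young | Mild | Few | No | No |
| Target | Senior | Mild | Many | No | ? |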

Step 2: Bootstrapping & Feature Selection

To ensure our trees don't all look identical, we give each tree its own random sample of the rows, drawn with replacement (Bootstrapping, which is why some rows appear more than once and others not at all), and restrict which features it is allowed to split on.

Tree 1

Bootstrapped Rows: 1, 1, 3, 4, 5, 6, 6, 8
Allowed Features: Age_Group, Diagnosis_Severity

Tree 2

Bootstrapped Rows: 2, 3, 5, 6, 7, 7, 7, 8
Allowed Features: Num_Medications, Has_Support

Tree 3

Bootstrapped Rows: 1, 2, 4, 5, 6, 7, 8, 8
Allowed Features: Age_Group, Has_Support
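The sampling described in this step can be sketched in a few lines of Python. This is a minimal illustration, not the procedure that produced the rows above: the helper name `draw_tree_inputs` and the seed are assumptions, so the printed rows and features will differ from the three trees in this example. It also follows this example's simplification of fixing the allowed features once per tree.

```python
import random

def draw_tree_inputs(n_rows, features, n_features, rng):
    """Draw one tree's bootstrap sample and feature subset (hypothetical helper)."""
    # Rows are sampled WITH replacement, so duplicates are expected.
    rows = sorted(rng.choices(range(1, n_rows + 1), k=n_rows))
    # Features are sampled WITHOUT replacement.
    allowed = rng.sample(features, k=n_features)
    return rows, allowed

features = ["Age_Group", "Diagnosis_Severity", "Num_Medications", "Has_Support"]
rng = random.Random(0)  # arbitrary seed, chosen only for repeatability

for tree_id in (1, 2, 3):
    rows, allowed = draw_tree_inputs(8, features, 2, rng)
    print(f"Tree {tree_id}: rows={rows}, features={allowed}")
```

Each tree receives 8 row numbers (the same size as the original data) and 2 of the 4 features, matching the shape of the assignments listed above.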

Step 3: Tree-by-Tree Construction

Each individual tree calculates its splits using its assigned data, builds its structure, and casts its vote for the target patient. The full breakdown for Tree 1 is shown below.

Using Rows: 1, 1, 3, 4, 5, 6, 6, 8
Allowed Features: Age_Group, Diagnosis_Severity

Tree 1 Math Breakdown

Iteration 1

Context: Root

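Current Data: Full Bootstrap Sample (8 Rows)

| Data Point | Age_Group | Diagnosis_Severity | Readmitted |
|---|---|---|---|
| P1 | Young | Mild | No |
| P1 | Young | Mild | No |
| P3 | Middle | Mild | No |
| P4 | Middle | Severe | Yes |
| P5 | Senior | Mild | Yes |
| P6 | Senior | Severe | Yes |
| P6 | Senior | Severe | Yes |
| P8 | Young | Mild | No |
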
1. Entropy of Target Class, Entropy(S)
Formula: $\text{Entropy}(S) = - \dfrac{P}{P+N} \log_2\left(\dfrac{P}{P+N}\right) - \dfrac{N}{P+N} \log_2\left(\dfrac{N}{P+N}\right)$

Positives (P) for 'Yes' = 4

Negatives (N) for 'No' = 4

$\text{Entropy}(S) = - \dfrac{4}{8} \log_2\left(\dfrac{4}{8}\right) - \dfrac{4}{8} \log_2\left(\dfrac{4}{8}\right) = 1.00$
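As a quick check of the entropy value, here is a small Python sketch of the formula. The function name `entropy` is an illustrative choice, and the code assumes the usual convention that an empty class contributes 0 (that is, 0 · log2(0) is treated as 0).

```python
from math import log2

def entropy(p, n):
    """Binary entropy of a set with p positive ('Yes') and n negative ('No') rows."""
    total = p + n
    result = 0.0
    for count in (p, n):
        if count:  # skip empty classes: 0 * log2(0) is treated as 0
            frac = count / total
            result -= frac * log2(frac)
    return result

print(round(entropy(4, 4), 2))  # 1.0
```

An even 4/4 split gives the maximum possible value of 1.00, while a pure subset such as `entropy(3, 0)` gives 0.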

2. Subset Information Required
Formula: $\text{Entropy}(P_i, N_i) = - \dfrac{P_i}{P_i+N_i} \log_2\left(\dfrac{P_i}{P_i+N_i}\right) - \dfrac{N_i}{P_i+N_i} \log_2\left(\dfrac{N_i}{P_i+N_i}\right)$

Evaluating Feature: Age_Group

| Value | P_i | N_i | Entropy(P_i, N_i) |
|---|---|---|---|
| Young | 0 | 3 | 0.00 |
| Middle | 1 | 1 | 1.00 |
| Senior | 3 | 0 | 0.00 |

Evaluating Feature: Diagnosis_Severity

| Value | P_i | N_i | Entropy(P_i, N_i) |
|---|---|---|---|
| Mild | 1 | 4 | 0.72 |
| Severe | 3 | 0 | 0.00 |

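The per-value counts above can be reproduced by grouping Tree 1's bootstrap sample by feature value. The sketch below is one way to do this; the tuple layout and the helper name `subset_table` are assumptions made for illustration.

```python
from collections import defaultdict
from math import log2

# Tree 1's bootstrap sample as (Age_Group, Diagnosis_Severity, Readmitted).
sample = [
    ("Young", "Mild", "No"),      # P1
    ("Young", "Mild", "No"),      # P1 (drawn twice)
    ("Middle", "Mild", "No"),     # P3
    ("Middle", "Severe", "Yes"),  # P4
    ("Senior", "Mild", "Yes"),    # P5
    ("Senior", "Severe", "Yes"),  # P6
    ("Senior", "Severe", "Yes"),  # P6 (drawn twice)
    ("Young", "Mild", "No"),      # P8
]

def entropy(p, n):
    """Binary entropy; empty classes contribute 0."""
    total = p + n
    return sum(-(c / total) * log2(c / total) for c in (p, n) if c)

def subset_table(rows, col):
    """Map each value of column `col` to (P_i, N_i, entropy rounded to 2 places)."""
    counts = defaultdict(lambda: [0, 0])
    for row in rows:
        counts[row[col]][0 if row[-1] == "Yes" else 1] += 1
    return {value: (p, n, round(entropy(p, n), 2)) for value, (p, n) in counts.items()}

print(subset_table(sample, 0))  # Age_Group
print(subset_table(sample, 1))  # Diagnosis_Severity
```

The two printed dictionaries hold the same (P_i, N_i, entropy) triples as the two tables above.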
3. Weighted Feature Entropy
Formula: $\text{Entropy}(A) = \sum_i \dfrac{P_i + N_i}{P + N} \times \text{Entropy}(P_i, N_i)$

Age_Group: Entropy = (3/8)(0.00) + (2/8)(1.00) + (3/8)(0.00) = 0.25
Diagnosis_Severity: Entropy = (5/8)(0.72) + (3/8)(0.00) = 0.45
4. Feature Information Gain
Formula: $\text{Gain}(S, A) = \text{Entropy}(S) - \text{Entropy}(A)$

Age_Group: Gain = 1.00 - 0.25 = 0.75
Diagnosis_Severity: Gain = 1.00 - 0.45 = 0.55
5. Feature Selection Decision

Age_Group generated the highest Information Gain (0.75). It is selected as the optimal splitting node for this subset.
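The weighted entropy, gain, and selection steps for the root can be checked together. In this sketch the (P_i, N_i) pairs are copied by hand from the subset tables, and the function names are illustrative rather than part of any library.

```python
from math import log2

def entropy(p, n):
    """Binary entropy; empty classes contribute 0."""
    total = p + n
    return sum(-(c / total) * log2(c / total) for c in (p, n) if c)

def weighted_entropy(subsets):
    """Entropy(A): subset entropies weighted by subset size."""
    total = sum(p + n for p, n in subsets)
    return sum((p + n) / total * entropy(p, n) for p, n in subsets)

def information_gain(parent, subsets):
    """Gain(S, A) = Entropy(S) - Entropy(A)."""
    return entropy(*parent) - weighted_entropy(subsets)

root = (4, 4)  # (P, N) for Tree 1's bootstrap sample
candidates = {
    "Age_Group": [(0, 3), (1, 1), (3, 0)],      # Young, Middle, Senior
    "Diagnosis_Severity": [(1, 4), (3, 0)],     # Mild, Severe
}

gains = {name: information_gain(root, subsets) for name, subsets in candidates.items()}
best = max(gains, key=gains.get)

for name, value in gains.items():
    print(f"{name}: gain = {value:.2f}")
print("Best split:", best)
```

Run as written, this reports gains of 0.75 and 0.55 and selects Age_Group, in line with the decision above.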

Resulting Split
Age_Group?
  Young → Class: No
  Middle → Class: ?
  Senior → Class: Yes

Iteration 2

Context: Age_Group = Middle

Current Data: Filtered by Age_Group = Middle (2 Rows)

| Data Point | Age_Group | Diagnosis_Severity | Readmitted |
|---|---|---|---|
| P3 | Middle | Mild | No |
| P4 | Middle | Severe | Yes |

1. Entropy of Target Class, Entropy(S)
Formula: $\text{Entropy}(S) = - \dfrac{P}{P+N} \log_2\left(\dfrac{P}{P+N}\right) - \dfrac{N}{P+N} \log_2\left(\dfrac{N}{P+N}\right)$

Positives (P) for 'Yes' = 1

Negatives (N) for 'No' = 1

$\text{Entropy}(S) = - \dfrac{1}{2} \log_2\left(\dfrac{1}{2}\right) - \dfrac{1}{2} \log_2\left(\dfrac{1}{2}\right) = 1.00$

2. Subset Information Required
Formula: $\text{Entropy}(P_i, N_i) = - \dfrac{P_i}{P_i+N_i} \log_2\left(\dfrac{P_i}{P_i+N_i}\right) - \dfrac{N_i}{P_i+N_i} \log_2\left(\dfrac{N_i}{P_i+N_i}\right)$

Evaluating Feature: Diagnosis_Severity

| Value | P_i | N_i | Entropy(P_i, N_i) |
|---|---|---|---|
| Mild | 0 | 1 | 0.00 |
| Severe | 1 | 0 | 0.00 |

3. Weighted Feature Entropy
Formula: $\text{Entropy}(A) = \sum_i \dfrac{P_i + N_i}{P + N} \times \text{Entropy}(P_i, N_i)$

Diagnosis_Severity: Entropy = (1/2)(0.00) + (1/2)(0.00) = 0.00
4. Feature Information Gain
Formula: $\text{Gain}(S, A) = \text{Entropy}(S) - \text{Entropy}(A)$

Diagnosis_Severity: Gain = 1.00 - 0.00 = 1.00
5. Feature Selection Decision

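Diagnosis_Severity is the only allowed feature left for this branch (Age_Group has already been used), and it yields an Information Gain of 1.00. It is selected as the splitting node for this subset, and both resulting branches are pure.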

Resulting Split
Diagnosis_Severity?
  Mild → Class: No
  Severe → Class: Yes

Tree 1 Resulting Decision Tree

Age_Group?
  Young → Class: No
  Middle → Diagnosis_Severity?
    Mild → Class: No
    Severe → Class: Yes
  Senior → Class: Yes
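The two iterations above follow one recursive pattern: pick the feature with the highest gain, split, and repeat on any branch that is still mixed. The sketch below is a minimal version of that recursion, not a production implementation. The names `build`, `gain`, `SAMPLE`, and `FEATURES` are illustrative, and the rule of falling back to the majority class when no features remain is an assumption that is never triggered for Tree 1.

```python
from collections import Counter
from math import log2

# Column index of each feature Tree 1 is allowed to use.
FEATURES = {"Age_Group": 0, "Diagnosis_Severity": 1}

# Tree 1's bootstrap sample: (Age_Group, Diagnosis_Severity, Readmitted).
SAMPLE = [
    ("Young", "Mild", "No"), ("Young", "Mild", "No"),            # P1, P1
    ("Middle", "Mild", "No"), ("Middle", "Severe", "Yes"),       # P3, P4
    ("Senior", "Mild", "Yes"), ("Senior", "Severe", "Yes"),      # P5, P6
    ("Senior", "Severe", "Yes"), ("Young", "Mild", "No"),        # P6, P8
]

def entropy(labels):
    """Entropy of a list of class labels."""
    total = len(labels)
    return -sum(c / total * log2(c / total) for c in Counter(labels).values())

def gain(rows, col):
    """Information gain from splitting `rows` on column `col`."""
    groups = {}
    for row in rows:
        groups.setdefault(row[col], []).append(row[-1])
    child = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy([row[-1] for row in rows]) - child

def build(rows, features):
    """Return a class label (leaf) or {feature: {value: subtree}}."""
    labels = [row[-1] for row in rows]
    if len(set(labels)) == 1 or not features:
        # Pure node, or nothing left to split on: use the majority class.
        return Counter(labels).most_common(1)[0][0]
    best = max(features, key=lambda name: gain(rows, features[name]))
    col = features[best]
    remaining = {name: c for name, c in features.items() if name != best}
    branches = {}
    for value in dict.fromkeys(row[col] for row in rows):  # first-seen order
        subset = [row for row in rows if row[col] == value]
        branches[value] = build(subset, remaining)
    return {best: branches}

tree = build(SAMPLE, FEATURES)
print(tree)
```

The printed nested dictionary has the same shape as the tree above: Age_Group at the root, with only the Middle branch split again on Diagnosis_Severity.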
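What does Tree 1 predict?
The target patient has Age_Group = Senior, so the tree follows the Senior branch directly to a leaf.
Prediction: Yes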
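Reading a prediction off the tree amounts to following one branch per feature until a leaf is reached. A minimal sketch, assuming Tree 1 is stored as a nested dictionary (the names `TREE_1` and `predict` are illustrative):

```python
# Tree 1 as a nested dict: {feature: {value: subtree or class label}}.
TREE_1 = {
    "Age_Group": {
        "Young": "No",
        "Middle": {"Diagnosis_Severity": {"Mild": "No", "Severe": "Yes"}},
        "Senior": "Yes",
    }
}

def predict(tree, patient):
    """Walk the tree until a leaf (a plain class label) is reached."""
    while isinstance(tree, dict):
        feature, branches = next(iter(tree.items()))
        tree = branches[patient[feature]]
    return tree

target = {
    "Age_Group": "Senior",
    "Diagnosis_Severity": "Mild",
    "Num_Medications": "Many",
    "Has_Support": "No",
}

print(predict(TREE_1, target))  # Yes
```

Because the Senior branch is already a leaf, Diagnosis_Severity is never consulted for this patient.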

Step 4: Forest Prediction (Majority Vote)

Each tree evaluates the Target Patient and casts a vote. The final prediction is simply the class that receives the most votes!

Voting Results

Class 'Yes': 2 votes

Class 'No': 1 vote

Final Result
Yes
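The majority vote itself is a simple tally. In the sketch below the list of votes reproduces only the totals reported above (two 'Yes', one 'No'). The order of the entries is arbitrary and does not indicate which tree cast which vote.

```python
from collections import Counter

# One vote per tree; only the totals come from the example above.
votes = ["Yes", "Yes", "No"]

tally = Counter(votes)
final = tally.most_common(1)[0][0]  # class with the most votes

print(dict(tally))  # {'Yes': 2, 'No': 1}
print(final)        # Yes
```

Using an odd number of trees, as here, avoids ties in a two-class problem.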