Linear Regression: A Complete Solved Numerical Example
Scenario: Delivery Time Prediction
The Objective: Estimate the delivery time for a parcel by calculating the line of best fit through historical warehouse data and reading off its value at a new distance.
Core Mechanics
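- The Best-Fit Line: The model finds the single straight line that sits as close as possible to all your data points in the least-squares sense: it minimizes the sum of the squared vertical gaps between each point and the line. It is defined by the intercept ($b$) and the slope ($m$), giving $Y = mX + b$.
- Prediction via Interpolation: To make a prediction for any new input $X$, you simply plug it into the equation of the line. The model effectively treats the line as its "memory" of the data trend. Our target of 55 km lies inside the 10 to 80 km range of the historical data, which is what makes this interpolation rather than extrapolation.
- The Slope Meaning: The slope $m$ is the "rate of change." For every one-unit increase in your input $X$, your predicted output changes by exactly $m$ units. In this example, each extra kilometer adds about 0.212 hours to the predicted delivery time.
- The Linearity Trap: This model assumes the relationship between input and output is a straight line. If your real-world data curves or changes direction, the line will systematically over-predict in some regions and under-predict in others. No amount of extra data can fix that structural mismatch!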
Step 1: The Historical Data & Target Point
To predict a continuous value using Linear Regression, we first need to find the "line of best fit" through our historical data. We want to predict the Delivery_Hours when the Distance_km is exactly 55 km.
| Data Point | Distance_km | Delivery_Hours |
|---|---|---|
| P1 | 10 | 3 |
| P2 | 20 | 5 |
| P3 | 30 | 7 |
| P4 | 40 | 9 |
| P5 | 50 | 11 |
| P6 | 60 | 14 |
| P7 | 70 | 15 |
| P8 | 80 | 18 |
| Target | 55 | ? |
Step 2: Calculate the Means (Averages) of X and Y
First, we find the center point of all our data by taking the average of the independent variable (X) and the dependent variable (Y).
$$\bar{X} = \frac{\sum X}{N} = \frac{360}{8} = 45$$
$$\bar{Y} = \frac{\sum Y}{N} = \frac{82}{8} = 10.25$$
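A minimal Python sketch of Step 2, assuming the Step 1 table is stored as two plain lists (the variable names are illustrative, not from the original):

```python
# Step 1 data (variable names are illustrative)
distance_km = [10, 20, 30, 40, 50, 60, 70, 80]   # X
delivery_hours = [3, 5, 7, 9, 11, 14, 15, 18]    # Y

n = len(distance_km)               # N = 8
x_mean = sum(distance_km) / n      # 360 / 8
y_mean = sum(delivery_hours) / n   # 82 / 8

print(x_mean, y_mean)   # 45.0 10.25
```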
Step 3: Calculate Deviations, Products, and Squares
We need to see how far each point "deviates" (wanders away) from the averages we calculated in Step 2. For every row we compute the X deviation, the Y deviation, their product, and the squared X deviation, then sum the last two columns at the bottom of the table.
| X | Y | $X - \bar{X}$ | $Y - \bar{Y}$ | $(X - \bar{X})(Y - \bar{Y})$ | $(X - \bar{X})^2$ |
|---|---|---|---|---|---|
| 10 | 3 | -35 | -7.25 | 253.75 | 1225 |
| 20 | 5 | -25 | -5.25 | 131.25 | 625 |
| 30 | 7 | -15 | -3.25 | 48.75 | 225 |
| 40 | 9 | -5 | -1.25 | 6.25 | 25 |
| 50 | 11 | 5 | 0.75 | 3.75 | 25 |
| 60 | 14 | 15 | 3.75 | 56.25 | 225 |
| 70 | 15 | 25 | 4.75 | 118.75 | 625 |
| 80 | 18 | 35 | 7.75 | 271.25 | 1225 |
| Sums (Σ) | | | | 890 | 4200 |
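The same row-by-row arithmetic as a Python sketch; it rebuilds the Step 3 table and accumulates the two sums, which is a quick way to check hand calculations. Variable names are illustrative.

```python
# Step 1 data (variable names are illustrative)
distance_km = [10, 20, 30, 40, 50, 60, 70, 80]   # X
delivery_hours = [3, 5, 7, 9, 11, 14, 15, 18]    # Y

n = len(distance_km)
x_mean = sum(distance_km) / n      # 45.0
y_mean = sum(delivery_hours) / n   # 10.25

sum_products = 0.0   # running total of (X - X_mean)(Y - Y_mean)
sum_squares = 0.0    # running total of (X - X_mean)^2
for x, y in zip(distance_km, delivery_hours):
    dx = x - x_mean
    dy = y - y_mean
    sum_products += dx * dy
    sum_squares += dx * dx
    print(x, y, dx, dy, dx * dy, dx * dx)   # one row of the Step 3 table

print("Sums:", sum_products, sum_squares)   # Sums: 890.0 4200.0
```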
Step 4: Calculate Slope (m) and Intercept (b)
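Using the two sums from the bottom of our table, we can calculate how steep our line is (the slope) and where it crosses the Y-axis (the intercept). The least-squares formulas are:
$$m = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sum (X - \bar{X})^2}, \qquad b = \bar{Y} - m\bar{X}$$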
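$$m = \frac{890}{4200} \approx 0.2119 \;(\approx 0.212)$$
$$b = 10.25 - \left(\frac{890}{4200} \times 45\right) = 10.25 - 9.5357 \approx 0.714$$
Keep the unrounded slope when computing $b$: plugging in the rounded 0.212 instead gives $b \approx 0.71$, and that small error carries into the prediction.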
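A Python sketch of Step 4, with the Step 2 means and Step 3 sums hard-coded so it runs on its own. It also shows the rounding drift mentioned above.

```python
# Inputs carried over from earlier steps (hard-coded so the sketch is self-contained)
x_mean, y_mean = 45.0, 10.25     # Step 2
sum_products = 890.0             # Step 3: sum of (X - X_mean)(Y - Y_mean)
sum_squares = 4200.0             # Step 3: sum of (X - X_mean)^2

slope = sum_products / sum_squares      # m, kept at full precision
intercept = y_mean - slope * x_mean     # b = Y_mean - m * X_mean

print(round(slope, 4), round(intercept, 4))   # 0.2119 0.7143

# Rounding m to 0.212 *before* computing b shifts the intercept slightly.
intercept_from_rounded = y_mean - 0.212 * x_mean
print(round(intercept_from_rounded, 4))       # 0.71
```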
Step 5: Final Prediction
Now that we have the equation for our line, we simply plug in our target X value to predict the Y value.
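Line (coefficients rounded to three decimals): $Y = 0.212X + 0.714$
Plugging in Target X (55), using the unrounded slope and intercept:
$$Y = \left(\frac{890}{4200} \times 55\right) + 0.7143 = 11.6548 + 0.7143 \approx 12.369 \text{ hours}$$
With the rounded coefficients instead, $(0.212 \times 55) + 0.714 = 12.374$; the 0.005-hour gap is pure rounding error.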
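A Python sketch of Step 5; `predict_hours` is a hypothetical helper that simply evaluates the fitted line. It compares the full-precision prediction with the one made from rounded coefficients.

```python
def predict_hours(distance_km: float, slope: float, intercept: float) -> float:
    """Evaluate the fitted line Y = m * X + b at a new distance (hypothetical helper)."""
    return slope * distance_km + intercept

# Full-precision coefficients from Step 4
slope = 890 / 4200
intercept = 10.25 - slope * 45

print(round(predict_hours(55, slope, intercept), 3))   # 12.369
# Coefficients rounded to three decimals before predicting
print(round(predict_hours(55, 0.212, 0.714), 3))       # 12.374
```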
Final Takeaway
Notice how the massive table in Step 3 exists solely to generate two numbers at the very bottom: the sum of deviation products (890) and the sum of squared X deviations (4200). On an exam, your slope in Step 4 is simply the ratio of these two sums, and the intercept is built from that slope, so one tiny arithmetic mistake in a single row will derail your final prediction!