Linear Regression: A Complete Solved Numerical Example

Scenario: Delivery Time Prediction

The Objective: Predict the delivery time for a parcel by calculating the mathematical line of best fit through historical warehouse data. The result is an estimate read off the line, not an exact value.

Core Mechanics
  • The Best-Fit Line: The model finds the single straight line that sits as close as possible to all your data points, in the sense that it minimizes the sum of squared vertical distances to them. It is defined by the intercept ($\beta_0$) and the slope ($\beta_1$).
  • Prediction via Interpolation: To make a prediction for any new input $x$, you simply plug it into the equation of the line. The model effectively treats the line as its "memory" of the data trend.
  • The Slope Meaning: The coefficient $\beta_1$ is the "rate of change." For every one-unit increase in your input $x$, your predicted output $\hat{y}$ changes by exactly $\beta_1$ units.
  • The Linearity Trap: This model assumes the relationship is a straight line. If your real-world data curves or changes direction, the predictions will be systematically off. No amount of extra data can fix a structural mismatch!
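The slope meaning and the linearity trap can both be seen in a small sketch. The data here is hypothetical (a deliberately curved relationship, y = x²), and the helper names `fit_line` and `predict` are illustrative only; the helper uses the same least-squares formulas worked through in the steps of this example.

```python
def fit_line(xs, ys):
    """Return (beta0, beta1) for the least-squares line through the points."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sum_products = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sum_squares = sum((x - x_bar) ** 2 for x in xs)
    beta1 = sum_products / sum_squares
    beta0 = y_bar - beta1 * x_bar
    return beta0, beta1


def predict(x, beta0, beta1):
    """Plug x into the equation of the line."""
    return beta0 + beta1 * x


# Hypothetical curved data (not the delivery data): y = x^2
xs = [0, 1, 2, 3, 4]
ys = [x ** 2 for x in xs]

beta0, beta1 = fit_line(xs, ys)

# Slope meaning: moving x by one unit moves the prediction by beta1.
slope_step = predict(3, beta0, beta1) - predict(2, beta0, beta1)

# Linearity trap: residuals (actual minus predicted) follow a U-shaped pattern.
residuals = [y - predict(x, beta0, beta1) for x, y in zip(xs, ys)]

print(beta0, beta1)   # -2.0 4.0
print(slope_step)     # 4.0
print(residuals)      # [2.0, -1.0, -2.0, -1.0, 2.0]
```

The residuals are positive at both ends and negative in the middle. That pattern is what a structural mismatch looks like: the line is too low at the edges and too high in the center, and adding more points from the same curve would not change that.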

Step 1: The Historical Data & Target Point

To predict a continuous value using Linear Regression, we first need to find the "line of best fit" through our historical data. We want to predict the Delivery_Hours when the Distance_km is exactly 55 km.

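| Data Point | Distance_km | Delivery_Hours |
|------------|-------------|----------------|
| P1         | 10          | 3              |
| P2         | 20          | 5              |
| P3         | 30          | 7              |
| P4         | 40          | 9              |
| P5         | 50          | 11             |
| P6         | 60          | 14             |
| P7         | 70          | 15             |
| P8         | 80          | 18             |
| Target     | 55          | ?              |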

Step 2: Calculate Means (Average) for X and Y

First, we find the center point of all our data by taking the average of the independent variable (X) and the dependent variable (Y).

Formula: $\bar{x} = \dfrac{\sum X}{N} \quad | \quad \bar{y} = \dfrac{\sum Y}{N}$

Mean of X ($\bar{x}$)

Sum of X / N = 360 / 8

= 45

Mean of Y ($\bar{y}$)

Sum of Y / N = 82 / 8

= 10.25
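As a quick check of Step 2, the two means can be reproduced in a few lines. This is a minimal sketch; the variable names `distance_km` and `delivery_hours` are chosen here to mirror the table's column headers.

```python
# Historical data from Step 1 (P1 to P8)
distance_km = [10, 20, 30, 40, 50, 60, 70, 80]
delivery_hours = [3, 5, 7, 9, 11, 14, 15, 18]

n = len(distance_km)

# Mean = sum of the values divided by the number of points
x_bar = sum(distance_km) / n
y_bar = sum(delivery_hours) / n

print(sum(distance_km), sum(delivery_hours))  # 360 82
print(x_bar, y_bar)                           # 45.0 10.25
```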

Step 3: Calculate Deviations, Products, and Squares

We need to see how much each point "deviates" or wanders away from the averages we calculated in Step 2. We will sum these deviations up at the bottom of the table.

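Formulas: Dev. $(x') = X - \bar{x} \quad | \quad$ Dev. $(y') = Y - \bar{y}$

| X  | Y  | Dev $(x')$ | Dev $(y')$ | Dev $(x')$ × Dev $(y')$ | Dev $(x')^2$ |
|----|----|------------|------------|-------------------------|--------------|
| 10 | 3  | -35        | -7.25      | 253.75                  | 1225         |
| 20 | 5  | -25        | -5.25      | 131.25                  | 625          |
| 30 | 7  | -15        | -3.25      | 48.75                   | 225          |
| 40 | 9  | -5         | -1.25      | 6.25                    | 25           |
| 50 | 11 | 5          | 0.75       | 3.75                    | 25           |
| 60 | 14 | 15         | 3.75       | 56.25                   | 225          |
| 70 | 15 | 25         | 4.75       | 118.75                  | 625          |
| 80 | 18 | 35         | 7.75       | 271.25                  | 1225         |
| SUMS ($\Sigma$): |  |  |  | 890                     | 4200         |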
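The table can be regenerated row by row, which is a convenient way to check hand arithmetic. The sketch below assumes the means from Step 2 (45 and 10.25) and uses illustrative variable names.

```python
distance_km = [10, 20, 30, 40, 50, 60, 70, 80]
delivery_hours = [3, 5, 7, 9, 11, 14, 15, 18]

x_bar = sum(distance_km) / len(distance_km)        # 45.0
y_bar = sum(delivery_hours) / len(delivery_hours)  # 10.25

rows = []
for x, y in zip(distance_km, delivery_hours):
    dev_x = x - x_bar                 # deviation of X from its mean
    dev_y = y - y_bar                 # deviation of Y from its mean
    rows.append((x, y, dev_x, dev_y, dev_x * dev_y, dev_x ** 2))

# Column totals used in Step 4
sum_products = sum(row[4] for row in rows)
sum_squares = sum(row[5] for row in rows)

for row in rows:
    print(row)
print(sum_products, sum_squares)  # 890.0 4200.0
```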

Step 4: Calculate Slope (m) and Intercept (b)

Using the sums ($\Sigma$) from the bottom of our table, we can finally calculate the steepness of our line (Slope) and where it crosses the Y-axis (Intercept). The slope $m$ and intercept $b$ here are the same quantities called $\beta_1$ and $\beta_0$ under Core Mechanics.

Formulas: $m = \dfrac{\sum (x'y')}{\sum (x')^2} \quad | \quad b = \bar{y} - m\bar{x}$
Slope (m)

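m = 890 / 4200

m ≈ 0.212 (unrounded: 0.211905)

Intercept (b)

b = 10.25 - (0.211905 * 45)

b ≈ 0.714

Note: the intercept uses the unrounded slope. Plugging in the rounded 0.212 gives 0.710 instead.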
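A short sketch of Step 4, starting from the two table sums and the two means. It also shows what happens when the slope is rounded before being reused, which is an easy way to lose accuracy in the third decimal place. The variable names are illustrative.

```python
# Totals from the Step 3 table and means from Step 2
sum_products = 890.0
sum_squares = 4200.0
x_bar = 45.0
y_bar = 10.25

m = sum_products / sum_squares   # slope
b = y_bar - m * x_bar            # intercept, using the unrounded slope

# Same intercept formula, but with the slope rounded to 3 decimals first
b_from_rounded_m = y_bar - round(m, 3) * x_bar

print(round(m, 3))                 # 0.212
print(round(b, 3))                 # 0.714
print(round(b_from_rounded_m, 3))  # 0.71
```

Carrying the full-precision slope through and rounding only when reporting keeps the intercept at 0.714.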

Step 5: Final Prediction

Now that we have the equation for our line, we simply plug in our target X value to predict the Y value.

Formula: $Y = mx + b$
Equation of the Best-Fit Line

Line: Y ≈ 0.212 X + 0.714

Plugging in Target X (55), using the unrounded coefficients (the rounded 0.212 and 0.714 would give 12.374):

Y = (0.211905 * 55) + 0.714286

Y ≈ 12.369 hours

Final Takeaway

Notice how the massive table in Step 3 exists solely to generate two specific numbers at the very bottom: the sum of products (890) and the sum of squared deviations (4200). On an exam, your entire slope calculation in Step 4 relies completely on dividing these two sums, meaning one tiny arithmetic mistake in a single row will derail your entire final prediction!
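Two inexpensive checks can catch a slipped row before it reaches the final answer. First, the deviations in each column must sum to zero. Second, the slope can be recomputed from raw sums using the algebraically equivalent form $m = \dfrac{N\sum XY - \sum X \sum Y}{N\sum X^2 - (\sum X)^2}$. This second formula does not appear in the walkthrough above; it is included here only as an independent cross-check.

```python
import math

distance_km = [10, 20, 30, 40, 50, 60, 70, 80]
delivery_hours = [3, 5, 7, 9, 11, 14, 15, 18]

n = len(distance_km)
x_bar = sum(distance_km) / n
y_bar = sum(delivery_hours) / n

# Check 1: deviations from the mean always cancel out
dev_x_total = sum(x - x_bar for x in distance_km)
dev_y_total = sum(y - y_bar for y in delivery_hours)

# Slope from the deviation table (the method used in Steps 3 and 4)
m_table = (
    sum((x - x_bar) * (y - y_bar) for x, y in zip(distance_km, delivery_hours))
    / sum((x - x_bar) ** 2 for x in distance_km)
)

# Check 2: slope from raw sums (equivalent formula, no deviations needed)
sum_x = sum(distance_km)
sum_y = sum(delivery_hours)
sum_xy = sum(x * y for x, y in zip(distance_km, delivery_hours))
sum_xx = sum(x * x for x in distance_km)
m_raw = (n * sum_xy - sum_x * sum_y) / (n * sum_xx - sum_x ** 2)

print(dev_x_total, dev_y_total)      # 0.0 0.0
print(math.isclose(m_table, m_raw))  # True
```

If either deviation total is not zero, or the two slopes disagree, there is an arithmetic error somewhere in the table.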