Simple vs. Multiple Linear Regression: One Feature vs. Many

TL;DR: Simple Linear Regression (SLR) models the relationship between one input feature $x$ and a target $y$ as a straight line, $\hat{y} = \beta_0 + \beta_1 x$. Multiple Linear Regression (MLR) extends this to $d$ features: $\hat{y} = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_d x_d$. The core math is the same: both minimize the sum of squared residuals. The difference is dimensionality: SLR fits a line in 2D, while MLR fits a hyperplane in $(d+1)$-dimensional space. MLR also brings problems that are absent or negligible in SLR: multicollinearity, a much higher risk of overfitting, and the need for Adjusted $R^2$.

Feature Comparison

| Feature | Simple Linear Regression (SLR) | Multiple Linear Regression (MLR) |
|---|---|---|
| Model equation | $\hat{y} = \beta_0 + \beta_1 x$: two parameters, one intercept and one slope | $\hat{y} = \beta_0 + \sum_{j=1}^{d} \beta_j x_j$: $d + 1$ parameters, one intercept and $d$ slopes |
| Number of features | Exactly $1$ input feature | $d \geq 2$ input features |
| Geometric interpretation | Fits a straight line through a 2D scatter plot of $(x, y)$ pairs | Fits a hyperplane in $(d+1)$-dimensional space; impossible to visualize for $d > 2$ |
| Closed-form solution | $\beta_1 = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sum_i (x_i - \bar{x})^2}$, $\beta_0 = \bar{y} - \beta_1 \bar{x}$ | $\boldsymbol{\beta} = (X^T X)^{-1} X^T \mathbf{y}$, where $X$ is the $n \times (d+1)$ design matrix with a leading column of ones; requires inverting (or factorizing) the $(d+1) \times (d+1)$ matrix $X^T X$ |
| Multicollinearity risk | None: with only one feature, there is nothing for it to be correlated with | Real and dangerous: if two features $x_j$ and $x_k$ are highly correlated, $X^T X$ becomes near-singular, making the $\beta$ estimates unstable and unreliable |
| Goodness-of-fit metric | $R^2 = 1 - \frac{SS_{res}}{SS_{tot}}$: proportion of variance in $y$ explained by $x$ | Adjusted $R^2 = 1 - \frac{(1-R^2)(n-1)}{n-d-1}$: penalizes features that don't improve the model; plain $R^2$ never decreases as features are added, even useless ones |
| Overfitting risk | Very low: two parameters ($\beta_0$, $\beta_1$) leave the model little room to overfit | Grows with $d$: with enough features (typically once $d + 1 \geq n$), MLR can fit the training data perfectly while generalizing poorly; regularization (Ridge, Lasso) is often needed |
| Feature significance testing | One $t$-test for $\beta_1$: $t = \frac{\beta_1}{SE(\beta_1)}$ | One $t$-test per $\beta_j$, plus a global $F$-test for the overall model: $F = \frac{(SS_{tot} - SS_{res})/d}{SS_{res}/(n-d-1)}$ |
| Assumptions | Linearity, independence of errors, homoscedasticity ($\mathrm{Var}(\epsilon) = \sigma^2$), normality of residuals | The same four assumptions, plus no perfect multicollinearity among features ($\mathrm{rank}(X) = d + 1$) |
| Coefficient interpretation | $\beta_1$: each 1-unit increase in $x$ changes $\hat{y}$ by $\beta_1$ units | $\beta_j$: each 1-unit increase in $x_j$ changes $\hat{y}$ by $\beta_j$ units, holding all other features constant; this clause is essential and often forgotten |
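The SLR closed-form row translates almost line for line into code. This is a minimal sketch in plain Python with no third-party libraries; the function name `fit_slr` is hypothetical, not part of any library.

```python
def fit_slr(x, y):
    """Closed-form simple linear regression. Returns (beta0, beta1)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Numerator: sum of cross-deviations; denominator: sum of squared x-deviations
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x has zero variance, so the slope is undefined")
    beta1 = sxy / sxx
    beta0 = y_bar - beta1 * x_bar  # the fitted line passes through (x_bar, y_bar)
    return beta0, beta1


# Usage: points lying exactly on y = 1 + 2x
b0, b1 = fit_slr([0, 1, 2, 3], [1, 3, 5, 7])
print(b0, b1)  # 1.0 2.0
```

Only two sums over the data are needed, which is why SLR needs no matrix algebra at all.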

Complexity Showdown

Training Time

Simple: $O(n)$
Multiple: $O(n \times d^2 + d^3)$

SLR has a closed-form solution that can be computed in a single pass over the data using running sums: $O(n)$. MLR must form the $(d+1) \times (d+1)$ matrix $X^T X$, which costs $O(n \times d^2)$, and then invert or factorize it, which costs $O(d^3)$. Since OLS needs $n > d$ anyway, the $O(n \times d^2)$ term is usually the larger of the two; both grow quickly once $d$ reaches the hundreds or thousands.
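The two MLR cost terms can be seen directly in a minimal plain-Python sketch, assuming a small, well-conditioned problem. The helper names `solve` and `fit_mlr` are hypothetical. The sketch solves the normal equations $X^T X \boldsymbol{\beta} = X^T \mathbf{y}$ by Gaussian elimination rather than forming the inverse explicitly; that is the same $O(d^3)$ order and numerically kinder. Production libraries generally go further and use QR- or SVD-based least squares.

```python
def solve(A, b):
    """Solve A @ beta = b by Gaussian elimination with partial pivoting: O(p^3)."""
    p = len(A)
    M = [[float(v) for v in row] + [float(bi)] for row, bi in zip(A, b)]  # augmented matrix
    for col in range(p):
        # Swap in the row with the largest pivot to limit rounding error
        pivot = max(range(col, p), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:  # crude absolute threshold, for illustration only
            raise ValueError("X^T X is (near-)singular; check for multicollinearity")
        M[col], M[pivot] = M[pivot], M[col]
        # Eliminate the entries below the pivot
        for r in range(col + 1, p):
            factor = M[r][col] / M[col][col]
            for c in range(col, p + 1):
                M[r][c] -= factor * M[col][c]
    # Back substitution on the upper-triangular system
    beta = [0.0] * p
    for i in range(p - 1, -1, -1):
        s = sum(M[i][j] * beta[j] for j in range(i + 1, p))
        beta[i] = (M[i][p] - s) / M[i][i]
    return beta


def fit_mlr(X, y):
    """OLS via the normal equations. X: n rows of d features. Returns [beta_0, ..., beta_d]."""
    rows = [[1.0] + [float(v) for v in row] for row in X]  # prepend the intercept column
    p = len(rows[0])  # p = d + 1 parameters
    # Forming X^T X costs O(n * p^2); forming X^T y costs O(n * p)
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    return solve(xtx, xty)


# Usage: noiseless data generated from y = 1 + 2*x1 - 3*x2
X = [[0, 0], [1, 0], [0, 1], [1, 1], [2, 1]]
y = [1, 3, -2, 0, 2]
beta = fit_mlr(X, y)
print([round(b, 6) for b in beta])  # approximately [1.0, 2.0, -3.0]
```

Passing a single feature column, for example `fit_mlr([[0], [1], [2], [3]], [1, 3, 5, 7])`, returns the same intercept and slope as the SLR formulas: the $d = 1$ special case in action.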

Prediction Time

Simple: $O(1)$ (one multiplication and one addition: $\hat{y} = \beta_0 + \beta_1 x$)
Multiple: $O(d)$ (one dot product of length $d$: $\hat{y} = \beta_0 + \sum_{j=1}^{d} \beta_j x_j$)

SLR prediction takes two arithmetic operations, so it is effectively instantaneous. MLR prediction scales linearly with the number of features $d$. For small $d$, both are negligibly fast; for very high-dimensional data, the $O(d)$ cost can matter.

Space Complexity

Simple: $O(1)$ (stores exactly two values: $\beta_0$ and $\beta_1$)
Multiple: $O(d)$ (stores $d + 1$ coefficients: $\beta_0, \beta_1, \dots, \beta_d$)

Both models discard the training data after fitting. SLR stores 2 numbers; MLR stores $d + 1$ numbers. For any reasonable $d$, both are negligible, so memory is rarely the practical bottleneck.

When To Use Which?

Use Simple Linear Regression when:

  • You have exactly one meaningful input feature, or you are deliberately isolating the effect of a single variable on the target.
  • You want the simplest possible interpretable baseline: $\hat{y} = \beta_0 + \beta_1 x$ has two parameters and is completely transparent.
  • You are doing exploratory analysis — plotting SLR on each feature individually helps identify which features are linearly related to the target before building a full MLR model.
  • You are teaching or explaining a regression concept: SLR is the canonical starting point because it can be visualized as a line through a scatter plot.
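The exploratory step in the list above can be sketched as a per-feature screen: fit one SLR per feature and rank the features by $R^2$, which for SLR equals the squared Pearson correlation. This is a plain-Python sketch; the function names and the toy "house" data are hypothetical. Such a screen ignores correlations between features and can miss features that only matter jointly, so it is a starting point for MLR, not a substitute.

```python
def slr_r2(x, y):
    """R^2 of the simple regression of y on x; equals the squared Pearson correlation."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    syy = sum((b - y_bar) ** 2 for b in y)  # SS_tot
    return sxy * sxy / (sxx * syy)  # SS_reg / SS_tot, since SS_reg = sxy^2 / sxx


def screen_features(features, y):
    """Fit one SLR per feature and rank the features by R^2, highest first."""
    scores = {name: slr_r2(col, y) for name, col in features.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical toy data: 'area' is exactly linear in price, 'age' only weakly related
price = [10, 20, 30, 40]
features = {"age": [4, 1, 3, 2], "area": [1, 2, 3, 4]}
for name, r2 in screen_features(features, price):
    print(name, round(r2, 3))  # area 1.0, then age 0.16
```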

Use Multiple Linear Regression when:

  • Multiple features jointly predict the target — e.g., house price depends on area, number of rooms, location, and age simultaneously, not just one variable.
  • You need to control for confounders: in causal analysis, MLR lets you estimate the effect of one feature while holding the other included features constant. It can only adjust for confounders you have actually measured and put in the model.
  • Your SLR residuals are related to other available variables: if one feature leaves too much variance unexplained, adding the variables that track the residuals can reduce the residual error.
  • You want to perform feature selection: Lasso ($L_1$ regularization) applied to MLR can drive the coefficients of weak or irrelevant features to exactly zero.
  • You need formal statistical inference across several features: MLR provides $p$-values, confidence intervals, and $F$-tests to determine whether the overall model and individual features are statistically significant.
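The Lasso point above can be illustrated with cyclic coordinate descent, one standard way to fit Lasso. This plain-Python sketch is an assumption, not any library's implementation: it minimizes $\frac{1}{2n}\sum_i (y_i - \hat{y}_i)^2 + \lambda \sum_j |\beta_j|$ (the same scaling scikit-learn's `Lasso` documents), leaves the intercept unpenalized, and does not standardize features, which real use should do because the penalty is scale-dependent. The exact zeros come from the soft-thresholding step. The names `lasso_cd` and `soft_threshold` are hypothetical.

```python
def soft_threshold(a, lam):
    """sign(a) * max(|a| - lam, 0): any |a| <= lam maps to exactly 0.0."""
    if a > lam:
        return a - lam
    if a < -lam:
        return a + lam
    return 0.0


def lasso_cd(X, y, lam, n_sweeps=200):
    """Cyclic coordinate descent for Lasso. X: n rows of d features.
    Returns (intercept, [beta_1, ..., beta_d]); the intercept is not penalized."""
    n, d = len(X), len(X[0])
    # Center features and target so the intercept drops out of the updates
    x_means = [sum(row[j] for row in X) / n for j in range(d)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_means[j] for j in range(d)] for row in X]
    resid = [yi - y_mean for yi in y]  # residuals for beta = 0
    z = [sum(r[j] ** 2 for r in Xc) / n for j in range(d)]
    beta = [0.0] * d
    for _ in range(n_sweeps):
        for j in range(d):
            if z[j] == 0:
                continue  # a constant feature carries no information
            # Covariance of feature j with the partial residual (feature j added back)
            rho = sum(r[j] * (e + r[j] * beta[j]) for r, e in zip(Xc, resid)) / n
            new_bj = soft_threshold(rho, lam) / z[j]
            delta = new_bj - beta[j]
            if delta != 0.0:  # keep resid = y_c - X_c @ beta up to date
                for i, r in enumerate(Xc):
                    resid[i] -= r[j] * delta
            beta[j] = new_bj
    intercept = y_mean - sum(m * b for m, b in zip(x_means, beta))
    return intercept, beta


# Hypothetical data: y is 3*x1 plus small noise; x2 is irrelevant
X = [[1, 1], [2, -1], [3, 1], [4, -1], [5, 1], [6, -1]]
y = [3.2, 5.9, 9.0, 12.1, 14.9, 17.9]
_, beta_ols = lasso_cd(X, y, lam=0.0)    # lam = 0 reduces to ordinary least squares
_, beta_lasso = lasso_cd(X, y, lam=0.5)
print(beta_ols)    # x2 gets a small nonzero coefficient
print(beta_lasso)  # x2's coefficient is exactly 0.0; x1's is shrunk below 3
```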

Common Exam Traps

⚠️

Thinking a higher $R^2$ after adding features means a better MLR model

$R^2$ can never decrease when you add a feature to an MLR model: even a completely random, useless feature will keep it the same or nudge it upward. This is why Adjusted $R^2$ exists: $\bar{R}^2 = 1 - \frac{(1-R^2)(n-1)}{n-d-1}$. It penalizes each additional feature, so it can decrease when a new feature adds noise without predictive power.
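The penalty is easy to see numerically. In this sketch the $R^2$ values are hypothetical, chosen so that a fourth feature raises plain $R^2$ by only 0.001; the function name `adjusted_r2` is also hypothetical.

```python
def adjusted_r2(r2, n, d):
    """Adjusted R^2 for n observations and d features (the intercept is not counted in d)."""
    if n - d - 1 <= 0:
        raise ValueError("need n > d + 1")
    return 1 - (1 - r2) * (n - 1) / (n - d - 1)


# Hypothetical numbers: a 4th feature nudges plain R^2 from 0.800 to 0.801
before = adjusted_r2(0.800, n=50, d=3)
after = adjusted_r2(0.801, n=50, d=4)
print(round(before, 4), round(after, 4))  # 0.787 0.7833
```

Plain $R^2$ went up, but Adjusted $R^2$ went down: the small gain did not pay for the extra parameter.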

⚠️

Interpreting MLR coefficients without the 'holding others constant' clause

In SLR, $\beta_1$ is simply the slope of $y$ with respect to $x$. In MLR, $\beta_j$ is the partial effect of $x_j$ on $y$ while all other features are held fixed. If you ignore this, your interpretation is wrong: both the value and the meaning of a coefficient can change depending on what else is in the model.
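To see how much the "holding others constant" clause matters, this plain-Python sketch compares the SLR slope of $y$ on $x_1$ with the MLR coefficient on $x_1$ when $x_1$ is correlated with a second feature $x_2$. It computes the MLR coefficient via the Frisch-Waugh-Lovell theorem (regress $x_1$ on $x_2$, then regress $y$ on the part of $x_1$ that $x_2$ cannot explain), a technique not named in the text above. The data and function names are hypothetical.

```python
def slope(x, y):
    """SLR slope of y on x: sum of cross-deviations over sum of squared x-deviations."""
    n = len(x)
    xb, yb = sum(x) / n, sum(y) / n
    return (sum((a - xb) * (b - yb) for a, b in zip(x, y))
            / sum((a - xb) ** 2 for a in x))


def partial_slope(x1, x2, y):
    """MLR coefficient on x1 in y ~ x1 + x2, via Frisch-Waugh-Lovell."""
    n = len(x1)
    b = slope(x2, x1)
    a = sum(x1) / n - b * sum(x2) / n
    resid = [u - (a + b * v) for u, v in zip(x1, x2)]  # x1 with x2's linear influence removed
    return slope(resid, y)


x2 = [1, 2, 3, 4, 5]
x1 = [2, 1, 4, 3, 6]                              # correlated with x2
y = [1 + 2 * u + 3 * v for u, v in zip(x1, x2)]   # true partial effect of x1 is 2
print(round(slope(x1, y), 3), round(partial_slope(x1, x2, y), 3))  # 4.027 2.0
```

The SLR slope roughly doubles the true effect because it silently absorbs part of $x_2$'s contribution; the MLR coefficient recovers the effect of $x_1$ with $x_2$ held fixed.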

⚠️

Confusing multicollinearity with correlation between a feature and the target

Multicollinearity is correlation between two or more input features (for example $x_j$ and $x_k$); this is the problem. Correlation between a feature and the target $y$ is actually desirable: it is the signal the model learns from. The two are completely different concepts.
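A common way to quantify multicollinearity is the variance inflation factor (VIF), a standard diagnostic not mentioned in the text above: $\mathrm{VIF}_j = \frac{1}{1 - R_j^2}$, where $R_j^2$ comes from regressing $x_j$ on the other features. With exactly two features, $R_j^2$ is just their squared correlation. This plain-Python sketch (hypothetical names and data) makes the trap's point concrete: the target $y$ never appears in the calculation.

```python
import math


def pearson_r(a, b):
    """Pearson correlation between two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    saa = sum((u - ma) ** 2 for u in a)
    sbb = sum((v - mb) ** 2 for v in b)
    return sab / math.sqrt(saa * sbb)


def vif_two_features(x1, x2):
    """VIF when the model has exactly two features: 1 / (1 - r^2). Uses features only."""
    r = pearson_r(x1, x2)
    return 1.0 / (1.0 - r * r)


# Hypothetical features: x2 is nearly a copy of x1
x1 = [1, 2, 3, 4, 5]
x2 = [1.1, 1.9, 3.05, 3.95, 5.0]
print(vif_two_features(x1, x2))                          # roughly 427: severe
print(vif_two_features([-1, 0, 1, 0], [0, 1, 0, -1]))    # uncorrelated features give 1.0
```

A VIF of 1 means no inflation of the coefficient's variance; the larger the VIF, the less stable the estimate.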

⚠️

Assuming the $F$-test in MLR tests the same thing as the individual $t$-tests

The $F$-test checks whether the model as a whole explains significant variance, i.e., whether at least one $\beta_j \neq 0$. The individual $t$-tests check each coefficient separately. A model can have a significant $F$-test but no individually significant $t$-tests (due to multicollinearity), or significant $t$-tests with a borderline $F$-test.
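The global $F$-statistic from the table can be computed from the sums of squares, or equivalently from $R^2$ by dividing numerator and denominator by $SS_{tot}$. This plain-Python sketch uses hypothetical function names and numbers; turning $F$ into a $p$-value needs the $F$ distribution, for example SciPy's `scipy.stats.f.sf(F, d, n - d - 1)`.

```python
def f_statistic(ss_tot, ss_res, n, d):
    """Global F-statistic for H0: beta_1 = ... = beta_d = 0."""
    return ((ss_tot - ss_res) / d) / (ss_res / (n - d - 1))


def f_from_r2(r2, n, d):
    """The same statistic written in terms of R^2 = 1 - SS_res / SS_tot."""
    return (r2 / d) / ((1 - r2) / (n - d - 1))


# Hypothetical model: SS_tot = 100, SS_res = 20 (so R^2 = 0.8), n = 50, d = 3
print(round(f_statistic(100, 20, 50, 3), 3), round(f_from_r2(0.8, 50, 3), 3))  # 61.333 61.333
```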

⚠️

Saying SLR is a special case of MLR with $d = 1$, and stopping there

This is technically true and worth stating, but incomplete for an exam. The deeper point is that SLR has a simpler closed form ($\beta_1 = \frac{\mathrm{Cov}(x,y)}{\mathrm{Var}(x)}$), has no multicollinearity, doesn't need Adjusted $R^2$, and requires only one $t$-test (its global $F$-test is redundant, since $F = t^2$ when $d = 1$). Knowing what disappears when $d = 1$ is what the question is really testing.
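The claim that the SLR $F$-test adds nothing beyond the $t$-test can be checked numerically. This plain-Python sketch uses the standard OLS standard error $SE(\beta_1) = \sqrt{\hat{\sigma}^2 / \sum_i (x_i - \bar{x})^2}$ with $\hat{\sigma}^2 = SS_{res}/(n-2)$; the function name and data are hypothetical.

```python
import math


def slr_t_and_f(x, y):
    """For SLR, return (t, F): t tests beta_1 = 0; F is the global test with d = 1."""
    n = len(x)
    xb, yb = sum(x) / n, sum(y) / n
    sxx = sum((a - xb) ** 2 for a in x)
    beta1 = sum((a - xb) * (b - yb) for a, b in zip(x, y)) / sxx
    beta0 = yb - beta1 * xb
    ss_res = sum((b - (beta0 + beta1 * a)) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - yb) ** 2 for b in y)
    sigma2 = ss_res / (n - 2)           # residual variance estimate: n - d - 1 with d = 1
    se_beta1 = math.sqrt(sigma2 / sxx)  # standard error of the slope
    t = beta1 / se_beta1
    f = ((ss_tot - ss_res) / 1) / (ss_res / (n - 2))
    return t, f


t, f = slr_t_and_f([1, 2, 3, 4, 5], [2.1, 3.9, 6.2, 7.8, 10.1])
print(math.isclose(t * t, f))  # True
```

The identity holds because $SS_{tot} - SS_{res} = \beta_1^2 \sum_i (x_i - \bar{x})^2$ in SLR, so $F$ and $t^2$ are the same ratio.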

⚠️

Thinking that the lower training error from extra features means a better model

Adding features does always reduce or maintain training error (plain $R^2$ never decreases). But this does not imply better generalization. Once $d + 1$ reaches $n$, the model can typically interpolate the training data perfectly ($MSE = 0$) while failing badly on new data. This is the variance side of the bias-variance tradeoff in regression.
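A minimal plain-Python sketch of the interpolation effect, under hypothetical assumptions: $n = 6$ training points, $d = 5$ features of pure Gaussian noise, and a target that is also pure noise, unrelated to every feature. With $d + 1 = n$ parameters, OLS fits the training set essentially exactly, yet it has learned nothing that transfers. The helpers `solve`, `fit_mlr`, `predict`, and `mse` are hypothetical.

```python
import random


def solve(A, b):
    """Gaussian elimination with partial pivoting."""
    p = len(A)
    M = [[float(v) for v in row] + [float(bi)] for row, bi in zip(A, b)]
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("singular system")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, p):
            factor = M[r][col] / M[col][col]
            for c in range(col, p + 1):
                M[r][c] -= factor * M[col][c]
    beta = [0.0] * p
    for i in range(p - 1, -1, -1):
        s = sum(M[i][j] * beta[j] for j in range(i + 1, p))
        beta[i] = (M[i][p] - s) / M[i][i]
    return beta


def fit_mlr(X, y):
    """OLS via the normal equations, with an intercept column prepended."""
    rows = [[1.0] + list(row) for row in X]
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    return solve(xtx, xty)


def predict(beta, row):
    return beta[0] + sum(b * v for b, v in zip(beta[1:], row))


def mse(beta, X, y):
    return sum((yi - predict(beta, row)) ** 2 for row, yi in zip(X, y)) / len(y)


rng = random.Random(0)
n, d = 6, 5  # d + 1 = n parameters: just enough to interpolate n points


def noise_data(m):
    X = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(m)]
    y = [rng.gauss(0, 1) for _ in range(m)]  # target independent of every feature
    return X, y


X_train, y_train = noise_data(n)
X_test, y_test = noise_data(100)
beta = fit_mlr(X_train, y_train)
print(f"train MSE = {mse(beta, X_train, y_train):.2e}, test MSE = {mse(beta, X_test, y_test):.2f}")
```

The training MSE prints as essentially zero (floating-point noise), while the test MSE is typically well above the noise variance of 1: a perfect training fit on meaningless features.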

Final Verdict

Simple Linear Regression is a pedagogical foundation and a practical tool when one feature dominates. Multiple Linear Regression is the real-world workhorse for tabular data with many predictors. The math is identical at its core (both minimize $\sum (y_i - \hat{y}_i)^2$), but MLR introduces multicollinearity, the need for Adjusted $R^2$, $F$-tests, and regularization. Master SLR to understand the mechanics; master MLR to apply regression to real problems.