Multiple Linear Regression



Multiple Linear Regression extends simple linear regression to handle multiple input features at once. Instead of drawing a line through 2D points, it fits a "hyperplane" through multi-dimensional data: for example, predicting a house price based on both its size AND its age, not just one factor. The math uses matrices to solve for all the coefficients (b_0, b_1, b_2, ...) simultaneously, which is exactly the kind of numerical problem your 5th-semester exams will test.

The Prediction Model & Normal Equation

\begin{gathered}
Y = b_0 + b_1 x_1 + b_2 x_2 + \dots \\[0.5em]
\vec{b} = (X^T X)^{-1} X^T Y
\end{gathered}

What do these variables mean?

  • Y: The predicted output value. This is what we calculate at the very end.
  • b_0: The intercept (bias). The base value of Y when all features are exactly 0.
  • b_1, b_2: The coefficients (weights) for each feature. They tell you how much Y changes per 1-unit increase in that specific feature.
  • \vec{b}: The coefficient vector, a single column containing [b_0, b_1, b_2, ...].
  • X: The design matrix. Your dataset's feature values, with a leading column of all 1s added to account for b_0.
  • X^T: The transpose of X (rows and columns flipped).
  • (X^T X)^{-1}: The inverse of the product X^T X. This is the hardest part to calculate manually!
  • The Normal Equation: The bottom formula. It calculates every single coefficient in \vec{b} simultaneously in one mathematical sweep.
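The Normal Equation can be sketched in a few lines of NumPy. The dataset below (house size and age predicting price) is invented purely for illustration:

```python
import numpy as np

# Hypothetical dataset: size (sq ft / 100) and age (years) -> price (thousands)
X_features = np.array([[10.0,  5.0],
                       [15.0,  2.0],
                       [12.0, 10.0],
                       [20.0,  1.0]])
y = np.array([200.0, 320.0, 210.0, 420.0])

# Design matrix: prepend a column of 1s so b_0 gets estimated too
X = np.column_stack([np.ones(len(y)), X_features])

# Normal equation: b = (X^T X)^{-1} X^T y
b = np.linalg.inv(X.T @ X) @ X.T @ y
print(b)  # coefficient vector [b0, b1, b2]
```

In production code `np.linalg.lstsq` or `np.linalg.solve` is preferred over an explicit inverse for numerical stability, but the explicit form mirrors the hand calculation.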

How Does it Work?

1. Build the X matrix from your dataset: add a leading column of all 1s (for b_0), then your feature columns side by side.
2. Build the Y matrix: a single column of all your output/target values.
3. Calculate the transpose X^T by flipping the rows and columns of your X matrix.
4. Multiply X^T by X to get a square matrix. Use standard row-by-column matrix multiplication.
5. Find the inverse of (X^T X). For a 3 × 3 matrix, use the adjugate-and-determinant method.
6. Multiply (X^T X)^{-1} by X^T to get an intermediate matrix.
7. Multiply that result by Y to get your coefficient vector [b_0, b_1, b_2, ...].
8. Plug b_0, b_1, b_2 and your query values (x_1, x_2) into Y = b_0 + b_1 x_1 + b_2 x_2 to get the prediction.
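The eight steps above can be traced with NumPy on a small made-up dataset, computing the same intermediates you would write out by hand (NumPy inverts via LU decomposition rather than the adjugate, but the result matches the manual method):

```python
import numpy as np

# Tiny hypothetical dataset: 2 features, 4 samples
x1 = np.array([1.0, 2.0, 3.0, 4.0])
x2 = np.array([2.0, 1.0, 4.0, 3.0])
y  = np.array([5.0, 6.0, 11.0, 12.0])

# Steps 1-2: design matrix X (leading 1s column) and target column Y
X = np.column_stack([np.ones_like(x1), x1, x2])
Y = y.reshape(-1, 1)

# Step 3: transpose
Xt = X.T

# Step 4: X^T X is a square (3 x 3) matrix here
XtX = Xt @ X

# Step 5: inverse of X^T X
XtX_inv = np.linalg.inv(XtX)

# Steps 6-7: coefficient vector [b0, b1, b2]
b = XtX_inv @ Xt @ Y

# Step 8: predict for a query point (x1 = 2.5, x2 = 2.5)
b0, b1, b2 = b.ravel()
y_pred = b0 + b1 * 2.5 + b2 * 2.5
print(b.ravel(), y_pred)
```

This toy data was constructed to fit exactly (Y = 1 + 2·x_1 + 1·x_2), so the recovered coefficients come out clean, which makes it easy to check each intermediate by hand.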

Important Rules & Conventions

  • Exam Trick 1: Always write out the X matrix first with the column of 1s. Students who forget the leading 1s column get b_0 wrong and the entire solution falls apart.
  • Exam Trick 2: Double-check your transpose by verifying that the element at row i, col j in X^T equals the element at row j, col i in X.
  • Exam Trick 3: To verify your inverse is correct, multiply (X^T X)^{-1} by (X^T X): you must get the identity matrix (1s on the diagonal, 0s elsewhere).
  • The number of rows in your final \vec{b} vector always equals the number of features + 1 (for b_0). If you have 2 features, you get 3 coefficients: b_0, b_1, b_2.
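These checks are easy to automate when verifying hand work; a small sketch using a hypothetical design matrix:

```python
import numpy as np

# Hypothetical design matrix: leading 1s column plus two features
X = np.array([[1.0, 1.0, 2.0],
              [1.0, 2.0, 1.0],
              [1.0, 3.0, 4.0],
              [1.0, 4.0, 3.0]])

XtX = X.T @ X

# Trick 2: (X^T)[i, j] must equal X[j, i] for every i, j
assert np.array_equal(X.T, X.swapaxes(0, 1))

# Trick 3: (X^T X)^{-1} (X^T X) must be the identity matrix
XtX_inv = np.linalg.inv(XtX)
assert np.allclose(XtX_inv @ XtX, np.eye(3))

# Final rule: with 2 features, X^T X is 3 x 3, so b has 3 entries
print(XtX.shape)  # (3, 3)
```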

Advantages

  • Handles multiple features simultaneously — far more realistic than simple linear regression for real-world data.
  • The matrix formula works for any number of features, making it highly scalable.
  • Each coefficient directly tells you the individual impact of that feature on the output, assuming other features are held constant.

Disadvantages

  • × Multicollinearity problem: if two of your input features are strongly correlated (e.g., height in cm and height in inches), the X^T X matrix becomes nearly singular, so the inverse is numerically unreliable and the coefficients become meaningless.
  • × Sensitive to outliers: one extreme data point can shift all coefficients significantly.
  • × Requires more data points than features. If you have 3 features but only 2 data points, the system is underdetermined and has no unique solution.
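The multicollinearity point can be demonstrated numerically. In this invented example the two height columns differ only by a unit conversion, so the condition number of X^T X blows up:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
height_cm = rng.uniform(150.0, 190.0, n)
height_in = height_cm / 2.54          # same feature in different units
y = 0.5 * height_cm + rng.normal(0.0, 1.0, n)

X = np.column_stack([np.ones(n), height_cm, height_in])

# A huge condition number means the inverse of X^T X is numerically
# meaningless, even if np.linalg.inv does not raise an error.
print(np.linalg.cond(X.T @ X))

# lstsq sidesteps this by using a pseudo-inverse instead of (X^T X)^{-1}
b, *_ = np.linalg.lstsq(X, y, rcond=None)
```

The fix on an exam is to drop one of the redundant features; in software, least-squares solvers based on the pseudo-inverse (like `lstsq` here) degrade more gracefully.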

Summary

Multiple Linear Regression is the natural evolution of simple linear regression. By expressing the problem in matrix form, we can solve for all coefficients at once using the Normal Equation. While it demands careful attention to matrix operations, it is one of the most fundamental and interpretable tools in predictive modeling — and a guaranteed exam topic.