K-Nearest Neighbors (KNN) Regression
KNN, Euclidean Distance, Regression, Average, Continuous Variables
KNN Regression operates on the exact same distance-measuring principle as KNN Classification, but it solves a different type of problem. Instead of trying to guess a category (like 'Spam' or 'Not Spam'), it predicts a continuous numerical value (like predicting the price of a house based on its square footage and bedrooms). It finds the closest neighbors and simply calculates the average of their values.
Distance & Average Formulas

Euclidean distance between a new point x and an existing point p with n features:

d(x, p) = √((x₁ − p₁)² + (x₂ − p₂)² + … + (xₙ − pₙ)²)

Prediction: the average of the target values y₁ … y_k of the k nearest neighbors:

ŷ = (y₁ + y₂ + … + y_k) / k

What do these variables mean?
- d(x, p) — the Euclidean distance, calculated for every row just like in classification.
- ŷ — the final estimated value for your new data point.
- The Average Formula — once you find the top k nearest neighbors, you take their target numerical values, add them all up, and divide by k.
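The two formulas above can be sketched in a few lines of Python (the feature vectors and neighbor prices are made-up values for illustration):

```python
import math

def euclidean_distance(x, p):
    """Square root of the summed squared feature differences."""
    return math.sqrt(sum((xi - pi) ** 2 for xi, pi in zip(x, p)))

# Two hypothetical feature vectors: [square footage, bedrooms].
d = euclidean_distance([1500, 3], [1400, 2])

# The prediction is just the plain average of the k neighbors' targets.
neighbor_prices = [250_000, 265_000, 240_000]
prediction = sum(neighbor_prices) / len(neighbor_prices)
```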
How Does it Work?
Assign a value to k (the number of neighbors you want to check).
Calculate the Euclidean distance between your new data entry and all other existing data points.
Arrange the calculated distances in ascending order and pick the top k closest neighbors.
Look at the numerical target values of those neighbors. Calculate their Mean (average) to get your final predicted value.
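The four steps above can be put together as a minimal sketch, here using a hypothetical housing dataset ([square footage, bedrooms] → price):

```python
import math

def knn_regress(train_X, train_y, new_point, k):
    # Step 2: distance from the new point to every training row.
    dists = [
        (math.dist(new_point, row), target)
        for row, target in zip(train_X, train_y)
    ]
    # Step 3: sort ascending and keep the k closest neighbors.
    dists.sort(key=lambda pair: pair[0])
    nearest = dists[:k]
    # Step 4: average the neighbors' target values.
    return sum(target for _, target in nearest) / k

# Hypothetical training data.
X = [[1400, 3], [1600, 3], [1700, 4], [2100, 4], [1100, 2]]
y = [240_000, 260_000, 280_000, 330_000, 199_000]

price = knn_regress(X, y, [1500, 3], k=3)
```

Here the three nearest houses cost 240,000, 260,000, and 280,000, so the prediction is their mean, 260,000.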
Important Rules & Conventions
- Unlike classification, where you want an odd k to avoid voting ties, regression doesn't suffer from ties. An even k works perfectly fine here.
- Outliers are very dangerous in KNN Regression. If one of your nearest neighbors is a massive outlier, it will completely pull your average off course.
- Always scale or normalize your data before calculating distances, otherwise features with large numbers (like Salary) will overpower features with small numbers (like Age).
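One simple way to follow the scaling rule above is min-max normalization, which rescales every feature to the [0, 1] range (the salary and age values below are made up for illustration):

```python
def min_max_scale(column):
    """Rescale a list of values to the [0, 1] range."""
    lo, hi = min(column), max(column)
    return [(v - lo) / (hi - lo) for v in column]

# Before scaling, Salary would dwarf Age in any distance calculation.
salaries = [30_000, 50_000, 90_000]
ages = [25, 40, 55]

scaled_salaries = min_max_scale(salaries)  # [0.0, ~0.33, 1.0]
scaled_ages = min_max_scale(ages)          # [0.0, 0.5, 1.0]
```

After scaling, both features contribute on the same 0-to-1 scale, so neither overpowers the other.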
Advantages
- ✓ Incredibly easy to understand and transition to if you already know KNN Classification.
- ✓ No explicit training phase required (Lazy Learner).
- ✓ Can capture highly non-linear relationships in data that algorithms like Linear Regression might miss.
Disadvantages
- × Calculations are slow during prediction because it measures the distance to every single point.
- × Cannot extrapolate outside of its training data range. (e.g., It can never predict a house price higher than the highest price currently in its dataset).
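The extrapolation limit is easy to demonstrate: because the prediction is an average of existing target values, even an extreme query stays inside the training range. A small sketch with made-up prices:

```python
import math

def knn_regress(train_X, train_y, new_point, k):
    """Average the targets of the k nearest training points."""
    dists = sorted(
        (math.dist(new_point, row), target)
        for row, target in zip(train_X, train_y)
    )
    return sum(target for _, target in dists[:k]) / k

# Hypothetical data: prices top out at 330,000.
X = [[1400], [1600], [2100]]
y = [240_000, 260_000, 330_000]

# A 5,000 sq-ft mansion still gets an in-range prediction,
# because an average can never exceed the largest value averaged.
pred = knn_regress(X, y, [5000], k=2)
```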
Summary
KNN Regression is the numerical sibling of KNN Classification. By finding the closest data points and averaging their values, it provides a simple and intuitive way to predict continuous numbers without needing a complex mathematical equation.