K Nearest Neighbor (KNN) Regression is one of the simplest forms of regression in the machine learning toolbox. Training data is stored on the model and all computation is deferred until the time of prediction. When a new observation is provided it calculates the distance between the new observation and all training data in an n-dimensional space (where n is the number of variables). The predicted value is the average value of the k closest training records.
Below is a simple example of using KNN Regression in Lurn.
Suppose we have a dataset containing the age, years of college eduction and annual income for a set of people. We could use this as training data to predict people's annual income based on their age and years of eduction.
people = [
# age years of education annual income
[ 25, 4, 50000],
[ 35, 6, 60000],
[ 51, 2, 40000],
[ 45, 8, 90000],
[ 32, 4, 70000],
]
# extract age and eduction
predictors = people.map { |person| person[0..1] }
# extract annual income
target_var = people.map { |person| person[2]}
The model can be trained by passing the predictors and target values to an initialized instance of the KNNRegression model.
# initialize the model with a k of 2
model = Lurn::Neighbors::KNNRegression.new(2)
model.fit(predictors, target_var)
The model can now be used to predict the income of a person given his/her age and years of education.
# predict the income of a 31 year old person with 4 years of eduction
model.predict([31, 4])