Anniversary coming up? Find out if love really is everlasting by calculating the value of the diamond ring he’s bought you using K Nearest Neighbor regression

5 min read · May 21, 2021

In Part I, we discussed the basic concepts of the K nearest neighbors (KNN) algorithm and built a KNN classifier using XL8ML software. We also explored several handy XL8ML features that mirror what you would otherwise do in Python. KNN is most commonly used for classification problems, and I have seldom seen it applied to regression tasks. In this blog, we will build a regression model using KNN with the help of XL8ML software.

The KNN algorithm uses feature similarity to predict the values of new data points: a new point is assigned a value based on how closely it resembles the points in the training set. We saw that for classification problems, the final prediction is the mode of the labels of the most similar data points. In the case of regression, the mean of their target values is taken instead. Now an important question arises: on what basis are two data points considered close or far apart? This is where the methods of calculating the distance between data points come in.

1. Euclidean Distance: The square root of the sum of the squared differences between the coordinates of a new point (x) and an existing point (y).

2. Manhattan Distance: The distance between two real-valued vectors, calculated as the sum of the absolute differences of their coordinates.
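To make the two definitions concrete, here is a minimal Python sketch of both distances. It is only an illustration and not how XL8ML computes them internally; the function names and the two sample points are made up for this example.

```python
import math


def euclidean_distance(x, y):
    """Square root of the sum of squared coordinate differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def manhattan_distance(x, y):
    """Sum of absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(x, y))


# Two hypothetical points with two numeric features each (illustrative values)
new_point = (0.0, 0.0)
existing_point = (3.0, 4.0)

print(euclidean_distance(new_point, existing_point))  # 5.0
print(manhattan_distance(new_point, existing_point))  # 7.0
```

Note that both measures add up differences across features, so features on very different scales (for example carat versus price-like magnitudes) can dominate the distance unless they are scaled first.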

Using either of these methods, we calculate the distance between two data points and judge from their proximity whether they should be considered similar. Now that the intuition behind regression using KNN is clear, let’s move on to the actual implementation.

Precious stones like diamonds are in high demand in the investment market because of their monetary value. It is of utmost importance for diamond dealers to predict prices accurately. However, prediction is difficult because diamonds vary widely in size and other characteristics. We are going to predict diamond prices using the KNN algorithm. Let us have a look at the data first.

· Carat: Carat weight of the diamond

· Cut: Describes the cut quality of the diamond. Quality in increasing order: Fair, Good, Very Good, Premium, Ideal

· Color: Color of the diamond, with D being the best and J the worst

· Clarity: How obvious the inclusions are within the diamond (in order from best to worst; FL = flawless, I3 = level 3 inclusions): FL, IF, VVS1,

· Depth: The height of a diamond, measured from the culet to the table, divided by its average girdle diameter

· Table: The width of the diamond’s table expressed as a percentage of its average diameter

· X: length in mm

· Y: width in mm

· Z: height in mm

· Price: Price of the diamond

Let’s get started. Load the dataset into Excel. This is how the dataset looks after loading into Excel.

We need to separate out the individual columns in order to analyze the data. This can be done using the xl_s_split(input range, delimiter, trim) function from XL8ML software. Once we do that, we get the desired format.
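For readers who think in Python, the same idea can be sketched as below. This is only a rough analogue: the helper name split_rows is hypothetical, the sample rows are made up, and treating "trim" as stripping whitespace around each value is an assumption about what the XL8ML parameter does.

```python
def split_rows(rows, delimiter=",", trim=True):
    """Split each delimited text row into a list of column values."""
    result = []
    for row in rows:
        parts = row.split(delimiter)
        if trim:
            # Assumed meaning of "trim": drop whitespace around each value
            parts = [part.strip() for part in parts]
        result.append(parts)
    return result


# Made-up rows standing in for cells that each hold one delimited line
raw_rows = [
    "carat, cut, color",
    "0.5, Ideal, E",
]

for columns in split_rows(raw_rows):
    print(columns)
```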

In order to build the regression model, we need to use two functions from XL8ML software.

· xl_ml_knn_regression_train()

· xl_ml_knn_regression_predict()

The dataset is randomly split into training and testing sets in an 80:20 ratio. Let us give the trained model a name, say ‘TRAIN’, and apply the function xl_ml_knn_regression_train(model name, input range, output range, weight, k) over the range of cells. ‘k’ is the number of nearest training samples (neighbors) used to make each prediction, and ‘weight’ can take two values: uniform and distance. When weight is ‘uniform’, all points in the neighborhood are weighted equally. When weight is ‘distance’, closer neighbors of a query point have a greater influence than neighbors which are further away.
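The split and the two weighting schemes can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not XL8ML's implementation: the function names are hypothetical, Euclidean distance is assumed, and ‘distance’ weighting is assumed to mean inverse-distance weights (a common convention).

```python
import math
import random


def train_test_split(rows, test_ratio=0.2, seed=0):
    """Shuffle the rows and split them, e.g. 80:20, into train and test."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_ratio))
    return shuffled[n_test:], shuffled[:n_test]


def knn_regress(train_X, train_y, query, k=3, weight="uniform"):
    """Predict a value for `query` from its k nearest training points."""
    # Pair every training target with its Euclidean distance to the query
    by_distance = sorted(
        (math.dist(x, query), y) for x, y in zip(train_X, train_y)
    )
    neighbors = by_distance[:k]

    if weight == "uniform":
        # Plain mean: every neighbor counts equally
        return sum(y for _, y in neighbors) / len(neighbors)

    if weight == "distance":
        # An exact match would give an infinite weight, so average those
        exact = [y for d, y in neighbors if d == 0]
        if exact:
            return sum(exact) / len(exact)
        # Assumed scheme: weight each neighbor by 1 / distance
        weights = [1.0 / d for d, _ in neighbors]
        weighted_sum = sum(w * y for w, (_, y) in zip(weights, neighbors))
        return weighted_sum / sum(weights)

    raise ValueError("weight must be 'uniform' or 'distance'")


# Tiny made-up data set: one feature, one target
X = [(0.0,), (1.0,), (2.0,), (10.0,)]
y = [0.0, 10.0, 20.0, 100.0]

print(knn_regress(X, y, (0.25,), k=2, weight="uniform"))   # 5.0
print(knn_regress(X, y, (0.25,), k=2, weight="distance"))  # close to 2.5
```

With the query at 0.25, the neighbor at 0.0 is three times closer than the one at 1.0, so the distance-weighted prediction is pulled towards its target, while the uniform prediction stays at the plain mean.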

We get the trained model in cell L9. We will refer to it in the calculations that follow.

Now let’s predict the diamond prices from the features of the test data using xl_ml_knn_regression_predict(trained model, input range).

Now it’s time to evaluate the model we have built. We can use metrics like mean squared error and mean absolute error to evaluate its performance. We will take a closer look at these popular regression metrics and how to calculate them for our model. Understanding them is what lets you tell whether a regression model is accurate or misleading. Following a flawed model is a bad idea, so it is important that you can quantify how accurate your model is.

MAE (Mean Absolute Error) is the average of the absolute differences between the actual and predicted values over the data set. MSE (Mean Squared Error) is the average of the squared differences between the actual and predicted values over the data set.
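Both metrics are short enough to write out by hand. The sketch below is a plain Python illustration of the definitions, not the XL8ML functions themselves; the function names and the sample values are made up.

```python
def mean_absolute_error(actual, predicted):
    """Average of the absolute differences between actual and predicted."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mean_squared_error(actual, predicted):
    """Average of the squared differences between actual and predicted."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


# Made-up values: the errors are 1, 0 and -2
actual = [3.0, 5.0, 2.0]
predicted = [2.0, 5.0, 4.0]

print(mean_absolute_error(actual, predicted))  # 1.0
print(mean_squared_error(actual, predicted))   # 5/3, about 1.67
```

Because MSE squares each error before averaging, the single error of 2 contributes four times as much as the error of 1, which is why MSE penalizes large misses more heavily than MAE.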

XL8ML software offers these metrics for evaluation. The Mean Absolute Error can be calculated using xl_ml_mean_absolute_error(LHS, RHS). We get an MAE of 0.76. Our model has performed well.

The Mean Squared Error can be calculated using xl_ml_mean_squared_error(LHS, RHS). We get an MSE of 2.82, which is good. Both MAE and MSE range from 0 to positive infinity. A value of 0 means the actual and predicted values are exactly the same, so the lower the MAE and MSE, the better the model. Our model is doing a good job of predicting the diamond prices!

We have now built our own KNN classifier and regressor without Python, using just XL8ML software. Please do check it out at https://xl8ml.com. Now that you know all about the KNN algorithm, you are ready to start building predictive models. If you have not yet seen how to build a KNN classifier, please visit https://poojadurai1997.medium.com/lets-implement-lazy-learning-algorithm-using-xl8ml-part-i-48641adb2be4. Stay tuned for more interesting blogs!

Written by Poojadurai