What’s wrong with my K-Nearest Neighbor code in R?

Good afternoon,

Why does my KNN code in R produce "predictions" that are identical to the test set's own responses? That is, if I change the responses in the test set, the "forecast" changes along with them, and the confusion matrix shows 100% accuracy.

Code:

a <- read.csv2("Base Treino.csv")
b <- read.csv2("Base Teste.csv")
a_cl <- a[1:10, 4]
pr <- knn(a, b, a_cl, k = 2)
a_teste_cat <- b[, 4]
tab <- table(pr, a_teste_cat)
tab

Base Treino:

100 33  100 0
100 66  75  0
100 100 50  0
100 0   25  100
100 0   100 100
0   0   25  100
0   0   75  100
0   0   0   100
0   33  100 0
0   66  100 0

Base Teste:

100 33  100 0
100 66  75  0
0   0   25  100
0   0   75  100
0   0   0   100
0   66  100 0
  • Hi Victor, why did you include the test set when training the KNN? A quick tip: improve your variable names; instead of 'a', use something like 'data'.

  • Filipe, good evening. The knn syntax asks me to pass the test set as a parameter... How would you do it?

1 answer


Your accuracy is 100% because the rows of your test set are also present in the training set. Since KNN classifies by the distance between points, every test point has an exact copy of itself (at distance zero) in the training set, so it will always be predicted correctly. Note also that by passing a and b whole to knn(), the class column itself (column 4) is being used as a feature, which leaks the answer into the model and explains why the predictions track the test responses when you change them.
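As a minimal sketch of keeping the label out of the features, the call below passes only columns 1-3 to knn() from the class package. The data frames are transcribed from the "Base Treino" and "Base Teste" tables above (the column names v1-v3 and cl are illustrative); note that the train/test overlap described above is a separate problem that this alone does not fix.

```r
library(class)  # provides knn(); ships with standard R distributions

# training data transcribed from "Base Treino" above
a <- data.frame(
  v1 = c(100, 100, 100, 100, 100, 0, 0, 0, 0, 0),
  v2 = c(33, 66, 100, 0, 0, 0, 0, 0, 33, 66),
  v3 = c(100, 75, 50, 25, 100, 25, 75, 0, 100, 100),
  cl = c(0, 0, 0, 100, 100, 100, 100, 100, 0, 0)
)

# test data transcribed from "Base Teste" above
b <- data.frame(
  v1 = c(100, 100, 0, 0, 0, 0),
  v2 = c(33, 66, 0, 0, 0, 66),
  v3 = c(100, 75, 25, 75, 0, 100),
  cl = c(0, 0, 100, 100, 100, 0)
)

# pass ONLY the feature columns; the class column must not be a feature
pr <- knn(train = a[, 1:3], test = b[, 1:3], cl = factor(a[, 4]), k = 2)
table(pr, b[, 4])  # confusion matrix against the true test labels
```

With the original call, knn(a, b, a_cl, ...), column 4 is part of the distance computation, so two points with the same class are artificially close to each other.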

An extra tip: whenever you optimize a hyper-parameter of a model (such as the K in KNN), create a validation set and find the hyper-parameter values that work best on it. Only after this process should you apply the model to the test set. This simulates reality: you train and tune the model on known data, then test it on unknown data.
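The validation-set workflow above can be sketched as follows. This is a hedged example using the built-in iris dataset (the question's own CSVs are too small to split three ways); the split sizes and the range of K values tried are illustrative choices, not prescriptions.

```r
library(class)  # provides knn()

set.seed(42)
# shuffle the 150 iris rows, then split into train / validation / test
idx   <- sample(nrow(iris))
train <- iris[idx[1:90], ]
valid <- iris[idx[91:120], ]
test  <- iris[idx[121:150], ]

feats <- 1:4  # the numeric feature columns of iris

# try several values of K and measure accuracy on the VALIDATION set only
ks  <- 1:10
acc <- sapply(ks, function(k) {
  pr <- knn(train[, feats], valid[, feats], train$Species, k = k)
  mean(pr == valid$Species)
})
best_k <- ks[which.max(acc)]

# only now touch the test set, once, with the chosen K
pr_test  <- knn(train[, feats], test[, feats], train$Species, k = best_k)
test_acc <- mean(pr_test == test$Species)
```

Because the test set was never used while choosing K, test_acc is an honest estimate of how the model would perform on genuinely unseen data.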
