Cross validation n-fold


         W1        W2        W3        W4  A/N
0  0.543405  0.278369  0.424518  0.844776    1
1  0.121569  0.670749  0.825853  0.136707    1
2  0.891322  0.209202  0.185328  0.108377    1
3  0.978624  0.811683  0.171941  0.816225    0
4  0.431704  0.940030  0.817649  0.336112    0
5  0.372832  0.005689  0.252426  0.795663    0
6  0.598843  0.603805  0.105148  0.381943    1
7  0.890412  0.980921  0.059942  0.890546    1
8  0.742480  0.630184  0.581842  0.020439    1
9  0.544685  0.769115  0.250695  0.285896    1

I’m trying to split this data with k-fold:

   kf= KFold(len(df),n_folds=10)

Now I’m trying to store the train and test parts of each fold, for example:

for train,test in kf:
    xtr = X[col][train]   # where col = W1, W2, W3, W4
    ytr = X['A/N'][train]
    xtest = X[col][test]
    ytest = X['A/N'][test]

The problem: I can only select one column at a time. When I try to select W1 and W2 together, an index error is raised, i.e. only X[col[i]][train] works.

1 answer

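You are not accessing the DataFrame elements correctly. Chained indexing of the form X[...][...] is not recommended, because pandas evaluates it as (X[...])[...]: the second indexer is applied to the result of the first, not to X itself.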

In your example, you are doing X[['W1','W2']][2], which is understood as "create a new DataFrame with the W1 and W2 columns of X, then access column 2 of this new DataFrame". No column is named 2, so the lookup fails. See also Indexing.
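To make the difference concrete, here is a minimal sketch using the first three rows of the sample data from the question. The `rows` list is a hypothetical stand-in for the indices of one fold, and the `.loc` call assumes the default integer index, where row labels and row positions coincide.

```python
import pandas as pd

# First three rows of the sample data from the question.
df = pd.DataFrame({
    "W1": [0.543405, 0.121569, 0.891322],
    "W2": [0.278369, 0.670749, 0.209202],
    "A/N": [1, 1, 1],
})

rows = [0, 2]  # hypothetical row positions, standing in for one fold

# Chained indexing: the second [] is applied to the intermediate DataFrame
# and is read as a list of column labels, so the lookup fails.
try:
    df[["W1", "W2"]][rows]
    chained_ok = True
except KeyError:
    chained_ok = False

# Single indexer: select rows and columns in one step.
# With the default RangeIndex, row labels equal row positions.
subset = df.loc[rows, ["W1", "W2"]]
print(chained_ok)
print(subset)
```

The single `.loc` call selects rows and columns together, so no intermediate DataFrame is involved.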

I also think it is better to split X and y outside the loop, because otherwise you make new copies of the DataFrame on every iteration. I also recommend reading about the difference between a view and a copy (View vs Copy).

I don’t know which version of KFold you are using, but with sklearn.model_selection.KFold the code below performs the k-fold split correctly:

import pandas as pd
from sklearn.model_selection import KFold

df = pd.read_csv('kfold.csv')

# Split features and target once, outside the loop
X = df[['W1', 'W2', 'W3', 'W4']]
y = df['A/N']

kf = KFold(n_splits=10)
for train, test in kf.split(X):
    # train and test are arrays of row positions, so index with .iloc
    xtr = X.iloc[train]
    ytr = y.iloc[train]
    xtest = X.iloc[test]
    ytest = y.iloc[test]
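If you want to try the loop without a CSV file, the following self-contained sketch builds the DataFrame from the sample data in the question. Two things here are my assumptions rather than part of the original code: `n_splits=5` (chosen only so each test fold holds two rows) and the `fold_sizes` list, which exists purely to inspect the result. It indexes by position with `.iloc`, since `KFold.split` yields positional indices.

```python
import pandas as pd
from sklearn.model_selection import KFold

# Sample data copied from the question.
df = pd.DataFrame({
    "W1": [0.543405, 0.121569, 0.891322, 0.978624, 0.431704,
           0.372832, 0.598843, 0.890412, 0.742480, 0.544685],
    "W2": [0.278369, 0.670749, 0.209202, 0.811683, 0.940030,
           0.005689, 0.603805, 0.980921, 0.630184, 0.769115],
    "W3": [0.424518, 0.825853, 0.185328, 0.171941, 0.817649,
           0.252426, 0.105148, 0.059942, 0.581842, 0.250695],
    "W4": [0.844776, 0.136707, 0.108377, 0.816225, 0.336112,
           0.795663, 0.381943, 0.890546, 0.020439, 0.285896],
    "A/N": [1, 1, 1, 0, 0, 0, 1, 1, 1, 1],
})

# Split features and target once, outside the loop.
X = df[["W1", "W2", "W3", "W4"]]
y = df["A/N"]

kf = KFold(n_splits=5)  # assumption: 5 folds, so each test fold has 2 rows
fold_sizes = []         # hypothetical helper list, only for inspection
for train, test in kf.split(X):
    # train and test are arrays of row positions.
    xtr, ytr = X.iloc[train], y.iloc[train]
    xtest, ytest = X.iloc[test], y.iloc[test]
    fold_sizes.append((len(xtr), len(xtest)))

print(fold_sizes)
```

Each fold keeps all four feature columns at once, which is what the chained X[col][train] form could not do.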
