Best way to remove a sample from a DataFrame while saving it in a variable

Hello, I’m just starting out with supervised learning and want to set aside a random sample of my DataFrame for testing, leaving the rest for training.

I got the result I was hoping for, but I’m not sure whether my approach is more of a hack.

So if someone more experienced could take a look and show me a more reliable way, I would be grateful.

This is what my code looks like:

import pandas as pd 

uri = 'https://gist.githubusercontent.com/guilhermesilveira/2d2efa37d66b6c84a722ea627a897ced/raw/10968b997d885cbded1c92938c7a9912ba41c615/tracking.csv'

dados = pd.read_csv(uri)

teste = dados.sample(24)
treino = dados[~dados.isin(teste)].dropna()

1 answer

Hello, the scikit-learn library ("sklearn") provides a function that splits your data into training and test sets and lets you choose the size of the test set:

from sklearn.model_selection import train_test_split
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.3)

Here X is the data used to make the prediction (the features), Y is what will be predicted (the target), and 'test_size' is the fraction of the data set aside for testing (0.3 means 30%).
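To make X, Y and 'test_size' concrete, here is a minimal sketch. The toy DataFrame and its column names (feature_a, feature_b, target) are made up for illustration and stand in for the CSV from the question. Passing an integer as test_size requests that many test rows instead of a fraction, which matches the 24 rows sampled in the question; random_state is optional and only makes the split reproducible.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical toy data standing in for the CSV from the question.
dados = pd.DataFrame({
    "feature_a": range(100),
    "feature_b": [i % 2 for i in range(100)],
    "target": [i % 3 == 0 for i in range(100)],
})

X = dados[["feature_a", "feature_b"]]  # columns used to predict
Y = dados["target"]                    # column to be predicted

# An integer test_size is the number of test rows;
# a float between 0 and 1 is the fraction of rows.
X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, test_size=24, random_state=42
)

print(len(X_train), len(X_test))
```

The rows of X_test and Y_test share the same index labels, so features and targets stay aligned after the split.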

The function returns the splits in the order (X_train, X_test, Y_train, Y_test), so unpack them in that same order; otherwise the variables will hold the wrong data.
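If you would rather stay in pandas, as in the question, one possible sketch is to drop the sampled rows by their index labels. The toy DataFrame below is hypothetical, and this assumes the index labels are unique. Unlike the isin/dropna combination in the question, this keeps training rows that already contain missing values and does not change column dtypes.

```python
import pandas as pd

# Hypothetical toy data; column "b" has a missing value every 10th row.
dados = pd.DataFrame({
    "a": range(100),
    "b": [float("nan") if i % 10 == 0 else i for i in range(100)],
})

# Take 24 random rows for testing (random_state only makes it reproducible).
teste = dados.sample(n=24, random_state=42)

# Everything whose index label is not in the test sample goes to training.
treino = dados.drop(teste.index)

print(len(teste), len(treino))
```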

  • Exactly what I was looking for. Thank you very much!
