Question about creating a machine learning model in Python

I have a question about creating my machine learning model. I want to create a model that predicts PSS_Stress.

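import pandas as pd
from sklearn import model_selection
from sklearn.tree import DecisionTreeClassifier

columns = "ExamID;FinalGrade;PSS_Stress;StudyID;TotalQuestions;avg_durationperquestion;avg_tbd;decision_time_efficiency;good_decision_time_efficiency;maxduration;median_tbd;minduration;num_decisions_made;question_enter_count;ratio_decisions;ratio_good_decisions;totalduration;variance_tbd".split(";")
data = pd.read_csv("dataset.csv")
# Keep only the columns listed above
df = pd.DataFrame(data, columns=columns)
# Fill missing values with each column's mean (numeric columns only)
dfimp = df.fillna(df.mean(numeric_only=True))

# Features: every column except the target
X = dfimp.drop(['PSS_Stress'], axis=1)
# Target: the column the model should predict
Y = dfimp['PSS_Stress']
validation_size = 0.20
seed = 7
# Pass the seed so the split is reproducible
X_train, X_validation, Y_train, Y_validation = model_selection.train_test_split(
    X, Y, test_size=validation_size, random_state=seed)

cart = DecisionTreeClassifier()
cart.fit(X_train, Y_train)
score = cart.score(X_validation, Y_validation)
print(score)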

My question is about the variable X. Should it contain all the features of my dataset, or all the features except the target variable (in this case PSS_Stress), as I did in the code above?

1 answer

X_train should contain all the variables needed to predict Y_train. If Y_train holds only the PSS_Stress column, then X_train should contain every other column of your dataset, that is, everything except PSS_Stress.
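A minimal sketch of that split on a tiny synthetic DataFrame. Only the column names come from the question; the values and the reduced set of three columns are made up for illustration.

```python
import pandas as pd

# Hypothetical toy data reusing a few of the question's column names
df = pd.DataFrame({
    "FinalGrade": [14, 9, 17, 11],
    "PSS_Stress": [20, 31, 12, 25],
    "totalduration": [3600, 4100, 2900, 3800],
})

# Target: the single column we want to predict
Y = df["PSS_Stress"]
# Features: every other column (the target is dropped)
X = df.drop(columns=["PSS_Stress"])

print(list(X.columns))
print(Y.name)
```

X keeps the same rows as Y, so row i of X is still paired with row i of Y when both are passed to `train_test_split`.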

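It makes no sense to use PSS_Stress to predict itself. If the target is also among the features, the model can simply copy it and will look perfect on the validation set, but it would be useless in practice, because PSS_Stress is exactly the value you don't have when you want a prediction.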
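To illustrate this "target leakage" effect, here is a rough sketch with random, hypothetical data: the two features are pure noise, and PSS_Stress is coded as three made-up levels (0, 1, 2) rather than a real stress score, so every level appears in the training split. The helper `validation_score` is hypothetical; it just repeats the split-fit-score steps from the question.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
n = 200
# Hypothetical data: features unrelated to the target
df = pd.DataFrame({
    "FinalGrade": rng.integers(0, 21, n),
    "totalduration": rng.integers(1000, 5000, n),
    "PSS_Stress": rng.integers(0, 3, n),  # made-up 3-level target
})


def validation_score(X, Y, seed=7):
    """Split 80/20, fit a decision tree, return validation accuracy."""
    X_train, X_val, Y_train, Y_val = train_test_split(
        X, Y, test_size=0.2, random_state=seed)
    model = DecisionTreeClassifier(random_state=seed).fit(X_train, Y_train)
    return model.score(X_val, Y_val)


Y = df["PSS_Stress"]
# Leaky: X still contains the target column
leaky = validation_score(df, Y)
# Correct: the target is removed from the features
honest = validation_score(df.drop(columns=["PSS_Stress"]), Y)
print(leaky, honest)
```

The leaky version scores perfectly only because the tree splits on PSS_Stress itself; the honest score reflects how well the real features predict the target (here, no better than noise, by construction).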
