I have a question about building my machine learning model. I want to create a model that predicts PSS_Stress:
import pandas as pd
from sklearn import model_selection
from sklearn.tree import DecisionTreeClassifier

columns = "ExamID;FinalGrade;PSS_Stress;StudyID;TotalQuestions;avg_durationperquestion;avg_tbd;decision_time_efficiency;good_decision_time_efficiency;maxduration;median_tbd;minduration;num_decisions_made;question_enter_count;ratio_decisions;ratio_good_decisions;totalduration;variance_tbd".split(";")
data = pd.read_csv("dataset.csv")
df = pd.DataFrame(data, columns=columns)
# Fill missing values with each column's mean
dfimp = df.fillna(df.mean())
# Features: every column except the target
X = dfimp.drop(['PSS_Stress'], axis=1)
# Target variable
Y = dfimp['PSS_Stress']
validation_size = 0.20
seed = 7
# Pass the seed so the train/validation split is reproducible
X_train, X_validation, Y_train, Y_validation = model_selection.train_test_split(X, Y, test_size=validation_size, random_state=seed)
cart = DecisionTreeClassifier()
cart.fit(X_train, Y_train)
# Mean accuracy on the validation set
score = cart.score(X_validation, Y_validation)
print(score)
My question is about the variable X. Should it contain all the features of my dataset, including the target, or all the features except my target variable, which in this case is PSS_Stress? The second option is what I did in the code above.
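To make the split I mean concrete, here is a minimal sketch that uses only the column names from my code, with no real data. The names `target` and `feature_columns` are just mine for illustration. As far as I understand, this is the set of columns that ends up in X after the drop:

```python
# Column names copied from the code above
columns = "ExamID;FinalGrade;PSS_Stress;StudyID;TotalQuestions;avg_durationperquestion;avg_tbd;decision_time_efficiency;good_decision_time_efficiency;maxduration;median_tbd;minduration;num_decisions_made;question_enter_count;ratio_decisions;ratio_good_decisions;totalduration;variance_tbd".split(";")

# Hypothetical helper names, used only for this illustration
target = "PSS_Stress"
feature_columns = [c for c in columns if c != target]

# The target is kept out of the features and goes into Y instead
print(target in feature_columns)
print(len(columns), len(feature_columns))
```

If that is right, selecting `dfimp[feature_columns]` should give the same X as `dfimp.drop(['PSS_Stress'], axis=1)`.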