LinearSVC #python random_state: seed not recognized

random_state does not seem to be working: every time I run this in Jupyter, I get a different accuracy. Can anyone tell me what the error is? Thanks in advance :D

# pass/fail estimator based on the grades in each subject
# grades range from 0 to 100
# 1 means passed and 0 means failed
import pandas as pd
import seaborn as sns
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
dataset = pd.read_csv('Student.csv')
# missing values will be treated as a grade of 0, since the student did not sit the exam
dataset.dropna()
treinoX, testeX, treinoY, testeY = train_test_split(dataset.drop('Result', axis = 1),dataset['Result'].to_frame(),test_size = 0.25,random_state = 0)
estimador = LinearSVC()
estimador.fit(treinoX, treinoY)
previsao = estimador.predict(testeX)
precisao = print('accuracy of {}%'.format(accuracy_score(testeY,previsao)*100))
precisao
x = input('enter the Physics grade:')
y = input('enter the Mathematics grade:')
z = input('enter the Chemistry grade:')
k = pd.DataFrame(np.array([x,y,z]).reshape(1,-1))
teste_previsao = estimador.predict(k)
if teste_previsao[0] == 1:
    print('congratulations, you passed!')
elif teste_previsao[0] == 0:
    print('you did not pass, sorry :/')
  • Please [edit] your post and explain the problem in detail with a [mcve]. Studying the post available at this link can make a very positive difference in your use of the site: Stack Overflow Survival Guide in English

1 answer


I made the following change to your code:

...
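# dropna() returns a new DataFrame, so assign the result back
dataset = dataset.dropna()

# drop() without inplace returns the feature columns and leaves 'Result' in dataset
X = dataset.drop('Result', axis=1).values

Y = dataset['Result'].values

treinoX, testeX, treinoY, testeY = train_test_split(X, Y, test_size = 0.25, random_state = 42)
...

I pulled the arguments of train_test_split out into X and Y to make the call easier to read, and converted both to NumPy arrays by adding .values at the end of each. Do not pass inplace=True to drop() here: with inplace=True it returns None, so .values would fail, and 'Result' would no longer be in dataset when Y is built. scikit-learn accepts DataFrames as well, but a one-column DataFrame as the target (which is what to_frame() produces) makes fit() warn that a column vector was passed where a 1-D array was expected; a 1-D array avoids that.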
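As a rough illustration of what .values returns, here is a minimal sketch. The small DataFrame and its column names are hypothetical stand-ins for Student.csv, which is not shown in the question.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for Student.csv; the column names are assumptions.
dataset = pd.DataFrame({
    "Physics": [80, 35, 60, 20],
    "Math": [75, 40, 55, 30],
    "Chemistry": [90, 25, 65, 10],
    "Result": [1, 0, 1, 0],
})

# DataFrame -> 2-D NumPy array of features
X = dataset.drop("Result", axis=1).values
# Series -> 1-D NumPy array of labels
Y = dataset["Result"].values
# One-column DataFrame -> 2-D column vector, the shape fit() warns about
Y_frame = dataset["Result"].to_frame().values

# With inplace=True, drop() modifies the frame it is called on and returns None
dropped_inplace = dataset.copy().drop("Result", axis=1, inplace=True)

print(type(X).__name__, X.shape)
print(type(Y).__name__, Y.shape)
print(Y_frame.shape)
print(dropped_inplace)
```

The shapes make the difference visible: Y is one-dimensional, while the to_frame() version keeps an extra axis.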

A small detail: I changed random_state to a value greater than 0. I do not know whether this has any effect.
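On the seed value itself, a quick check along these lines suggests that train_test_split repeats the same split for any fixed integer, 0 included. This is only a sketch on made-up data; the helper name split_test_rows is hypothetical.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical data: 20 samples with 3 features and alternating 0/1 labels.
X = np.arange(60).reshape(20, 3)
Y = np.arange(20) % 2

def split_test_rows(seed):
    """Return the test-set rows produced by a split with the given seed."""
    _, testeX, _, _ = train_test_split(X, Y, test_size=0.25, random_state=seed)
    return testeX

# Splitting twice with the same seed should select the same rows each time.
same_seed_0 = np.array_equal(split_test_rows(0), split_test_rows(0))
same_seed_42 = np.array_equal(split_test_rows(42), split_test_rows(42))
print(same_seed_0, same_seed_42)
```

If both comparisons hold, the split is not the source of the run-to-run variation, whichever seed is used.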

I hope this solves your problem! We can discuss anything else here in the comments.

  • Thanks! A few questions: whenever I use .values on a DataFrame, does it become a NumPy array? Can I also do this with a Series? Do you think the seed problem was solely because the data was not in array format?

  • I made the recommended change, but the seed problem remains: I always get a different accuracy.

  • Answering your questions: 1 - Yes. Using .values on a DataFrame always returns a NumPy array. 2 - Yes, you can do the same with a Series. 3 - I don't think the problem is Series versus NumPy array. I'll test this today... Try the following: set the random_state parameter when you create the model, like this: estimador = LinearSVC(random_state=50), and see whether the accuracy continues to vary.

  • I believe the problem was inplace = False
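The suggestion in the comments to seed the model itself can be tried with a sketch like the one below. It assumes synthetic data from make_classification in place of Student.csv, and the helper name run and the max_iter value are illustrative choices. The scikit-learn documentation describes LinearSVC's random_state as controlling the shuffling used by the dual coordinate descent solver, so fixing it should make repeated fits agree.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC

# Synthetic stand-in for the student data (an assumption): 3 numeric features.
X, Y = make_classification(
    n_samples=200, n_features=3, n_informative=3, n_redundant=0, random_state=0
)
treinoX, testeX, treinoY, testeY = train_test_split(
    X, Y, test_size=0.25, random_state=0
)

def run(seed):
    """Fit a LinearSVC with a fixed seed and return its test accuracy."""
    estimador = LinearSVC(random_state=seed, max_iter=10000)
    estimador.fit(treinoX, treinoY)
    return accuracy_score(testeY, estimador.predict(testeX))

first = run(50)
second = run(50)
print(first == second)
```

With both the split and the estimator seeded, two runs over the same data should report the same accuracy.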
