I’m trying to classify text with sklearn, but I’m getting an error:
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LinearRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.naive_bayes import MultinomialNB
from sklearn.model_selection import train_test_split
from sklearn import metrics
X = df['texto'].values  # text used as the basis for classification
Y = df['sentimento'].values  # sentiment is the target to be learned. Note: the sentimento column is already filled in with the correct sentiment for each text (safe, unsafe, or neutral)
split_test_size = 0.30  # 30% for testing and 70% for training
# splitting the data into training and test sets
X_treino, X_teste, Y_treino, Y_teste = train_test_split(X, Y, test_size = split_test_size, random_state = 42)
modelo_v1 = GaussianNB()
# training the model
modelo_v1.fit(X_treino, Y_treino.ravel())
It returns this error:
Traceback (most recent call last):
  File "C: Users USUARIO workspacePython tests for exampleClassificacaTwitter2.py", line 280, in <module>
    main()
  File "C: Users USUARIO workspacePython tests for exampleClassificacaTwitter2.py", line 65, in main
    classificar3(df, "I’m afraid of violence")
  File "C: Users USUARIO workspacePython tests for exampleClassificacaTwitter2.py", line 277, in classificar3
    modelo_v1.fit(X_treino, Y_treino.ravel())
  File "C:\ProgramData\Anaconda3\lib\site-packages\sklearn\naive_bayes.py", line 182, in fit
    X, y = check_X_y(X, y)
  File "C:\ProgramData\Anaconda3\lib\site-packages\sklearn\utils\validation.py", line 521, in check_X_y
    ensure_min_features, warn_on_dtype, estimator)
  File "C:\ProgramData\Anaconda3\lib\site-packages\sklearn\utils\validation.py", line 382, in check_array
    array = np.array(array, dtype=dtype, order=order, copy=copy)
ValueError: could not convert string to float: 'I only feel comfortable in a quiet place'
Does it not work with strings? Or would I have to use word frequency counts instead?
Good afternoon, @André Nascimento. I believe you are passing the original texts (raw data) to the model instead of features extracted from the texts with a feature extraction technique. CountVectorizer is imported in your code, but I don't see it being used. Using it can help.
– Anderson Chaves
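To illustrate the suggestion in the comment, here is a minimal sketch of turning the raw texts into word counts with CountVectorizer before fitting. The toy `textos` and `sentimentos` lists are hypothetical stand-ins for `df['texto']` and `df['sentimento']`, and `vetorizador` is just an illustrative name. The sketch uses MultinomialNB (already imported in the question) rather than GaussianNB, on the assumption that a count-based model suits this data; MultinomialNB accepts the sparse matrix that CountVectorizer returns, while GaussianNB would need it converted to a dense array first (for example with `.toarray()`).

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB

# Hypothetical toy data standing in for df['texto'] and df['sentimento'].
textos = [
    "I only feel comfortable in a quiet place",
    "I am afraid of walking alone at night",
    "The street was calm and well lit",
    "There was a robbery near my house",
    "I went to the market today",
    "I feel safe in my neighborhood",
    "Violence here scares me a lot",
    "The bus arrived on time",
    "The police presence makes me feel safe",
    "I am scared of the gunshots nearby",
]
sentimentos = [
    "seguro", "inseguro", "seguro", "inseguro", "neutro",
    "seguro", "inseguro", "neutro", "seguro", "inseguro",
]

X_treino, X_teste, Y_treino, Y_teste = train_test_split(
    textos, sentimentos, test_size=0.30, random_state=42
)

# Learn the vocabulary on the training texts only, then reuse it for the test texts.
vetorizador = CountVectorizer()
X_treino_counts = vetorizador.fit_transform(X_treino)  # sparse matrix of word counts
X_teste_counts = vetorizador.transform(X_teste)

# The model now receives numeric features instead of raw strings.
modelo_v1 = MultinomialNB()
modelo_v1.fit(X_treino_counts, Y_treino)

previsoes = modelo_v1.predict(X_teste_counts)

# A new sentence must go through the same fitted vectorizer before predict().
nova_previsao = modelo_v1.predict(vetorizador.transform(["I'm afraid of violence"]))
```

The important detail is that `fit_transform` is called only on the training texts and `transform` on everything else, so the test set and any new sentence are encoded with the same vocabulary the model was trained on.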