0
I have a theoretical problem where a store owner wants to know the chance of a particular phrase generating a sale, I have in hand a dictionary with 20 random words and 10 phrases formed by exactly 10 randomly chosen words within my dictionary:
Word dictionary :
I organized my sentences in a table and exchanged the words for ID’s to perform a classification test:
I tested some sklearn algorithms and what gave me the best result was CART:
treino_x, valid_x, treino_y, valid_y = (model_selection.train_test_split
(x, y, test_size=valid, random_state=sementes))
pontuacao = 'accuracy'
modelos = []
modelos.append(('CART', DecisionTreeClassifier()))
resultado = []
nomes = []
for nome, modelo in modelos:
kfold = model_selection.KFold(n_splits=10, random_state=sementes)
cv_results = model_selection.cross_val_score(modelo, treino_x, treino_y, cv=kfold, scoring=pontuacao)
resultado.append(cv_results)
nomes.append(nome)
msg = "%s: %f (%f)" % (nome, cv_results.mean(), cv_results.std())
print(msg)
cart = DecisionTreeClassifier()
cart.fit(treino_x, treino_y)
predictions = cart.predict(valid_x)
print(accuracy_score(valid_y, predictions))
With this I can already simulate a new phrase any and use a Cart.predict
to tell me if this phrase is sale or not, however instead of just returning me Sale / N I would like to know the probability that this phrase has generated a sale (Ex: 78%), at this point enters the weight of each of the Features, as I do to calculate this weight using Python ?