Problem with a classifier in NLP

I’m developing a chatbot, and to pick the answer I’m using a Naive Bayes classifier to classify the questions and answers. For anyone who wants to see the full project code and settings, follow the link to GitHub.

For development I am using the TextBlob library for Python. The problem is that after training, my classifier always returns the same message, regardless of the input I use. The message is:

"All right?"

I still cannot identify the problem. I do not know whether it lies in the way my data is arranged for training or in the way I am training the classifier.

The class that carries out the classification is this:

#!/usr/bin/env python
# -*- coding: utf-8 -*-
from textblob.classifiers import NaiveBayesClassifier
from textblob import TextBlob
import logging

class Talk(object):
    """The Talk class returns the answer to a phrase, based on the
    training data it was given, using classification according to
    Bayes' theorem.
    """
    def __init__(self):
        """
        Class constructor.

        cl -> stores the classifier
        accuracy -> stores the accuracy of the algorithm
        """
        self.__cl = None
        self.__accuracy = 0


    def train(self, train_set):
        """
        Trains on a list of phrases paired with their
        respective classifications.
        """

        logging.debug('Starting intent prediction training')
        self.__cl = NaiveBayesClassifier(train_set)
        logging.debug('Intent prediction training finished')

    def test(self, test_set):
        """
        Runs tests on a list of phrases paired with their
        respective classifications to measure the accuracy.
        """

        logging.debug('Starting intent prediction test')
        self.__accuracy = self.__cl.accuracy(test_set)
        logging.debug('Intent prediction test finished')
        logging.info('Prediction accuracy: {}'.format(self.__accuracy))

    def response(self, phrase):
        """
        Returns the answer to the phrase according to the trained classifier.
        """
        logging.debug('Analyzing the phrase "{}"'.format(phrase))
        blob = TextBlob(phrase, classifier=self.__cl)
        result = blob.classify()
        logging.debug('Answer: "{}"'.format(result))
        return result

Here is the link to my file with the training and test data.

1 answer


After a lot of tests, I was able to figure out what the problem was.

The problem was the number of possible classes the classifier had to distinguish.

For example, take the following training set:

oie, oi
oi, oiee
olá, oii
tudo bem?, tudo certo
td bem?, tudo bom
tudo bom?, tudo tranquilo

In the case above, every answer is different from the others. It is obvious to a person that several of the answers mean the same thing, but the classifier cannot work that out on its own. In short, the example above has 6 entries and 6 output classes, one example per class, which gives the classifier almost nothing to learn from.
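A quick way to spot this situation is to count how many training examples each class has. The sketch below uses only the standard library and the six pairs from the training set above; the helper name `examples_per_class` is hypothetical and is not part of TextBlob or of the original project.

```python
from collections import Counter

# Training pairs copied from the example above: (input phrase, label).
train_set = [
    ("oie", "oi"),
    ("oi", "oiee"),
    ("olá", "oii"),
    ("tudo bem?", "tudo certo"),
    ("td bem?", "tudo bom"),
    ("tudo bom?", "tudo tranquilo"),
]


def examples_per_class(pairs):
    """Count how many training examples carry each label (hypothetical helper)."""
    return Counter(label for _, label in pairs)


counts = examples_per_class(train_set)
# Labels that appear only once give the classifier a single example to learn from.
singletons = [label for label, n in counts.items() if n == 1]

print(f"{len(train_set)} examples, {len(counts)} classes, "
      f"{len(singletons)} classes with a single example")
```

If the number of classes is close to the number of examples, as it is here, the labels probably need to be grouped before training.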

My solution was to define response classes:

oie, [oi]
oi, [oi]
olá, [oi]
tudo bem?, [resposta tudo bem]
td bem?, [resposta tudo bem]
tudo bom?, [resposta tudo bem]

Now the situation is completely different: I have 6 entries and only 2 output classes, and the accuracy of the answers improves dramatically.
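To see why grouping helps, here is a minimal sketch of a hand-written multinomial naive Bayes with add-one smoothing, trained on the six grouped pairs above. It is not TextBlob's implementation (which uses different features), and the tokenizer and function names are assumptions made for illustration only.

```python
import math
from collections import Counter, defaultdict

# Grouped training pairs from the answer above: (input phrase, response class).
train_set = [
    ("oie", "[oi]"),
    ("oi", "[oi]"),
    ("olá", "[oi]"),
    ("tudo bem?", "[resposta tudo bem]"),
    ("td bem?", "[resposta tudo bem]"),
    ("tudo bom?", "[resposta tudo bem]"),
]


def tokenize(text):
    """Crude tokenizer (an assumption): split on spaces, strip punctuation."""
    words = (w.strip("?!.,").lower() for w in text.split())
    return [w for w in words if w]


def train(pairs):
    """Collect per-class example counts, per-class word counts and the vocabulary."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in pairs:
        class_counts[label] += 1
        for word in tokenize(text):
            word_counts[label][word] += 1
            vocab.add(word)
    return class_counts, word_counts, vocab


def classify(model, text):
    """Return the class with the highest log-probability for the phrase."""
    class_counts, word_counts, vocab = model
    total = sum(class_counts.values())
    best_label, best_score = None, float("-inf")
    for label, n in class_counts.items():
        score = math.log(n / total)  # log prior
        denom = sum(word_counts[label].values()) + len(vocab)
        for word in tokenize(text):
            if word in vocab:  # words never seen in training are ignored
                score += math.log((word_counts[label][word] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


model = train(train_set)
print(classify(model, "oi"))
print(classify(model, "tudo bem?"))
```

With three examples per class, words such as "tudo" and "bem" are shared across examples of the same class, so each class accumulates evidence instead of relying on a single phrase.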
