How does an automatic categorization algorithm work?

I have a question.

I’ve noticed that sites like Yahoo Answers recognize the semantics of questions and categorize them automatically. There are mistakes, of course, but it works well most of the time.

Which method is used?

I’ve thought of ways to do it, but I’d like to hear from you here.

I thought of counting keywords in the submitted text and then assigning it to the category that contains those keywords. It would be a kind of scoring: each keyword found adds a point to the category that contains it, with the keywords stored in a field such as "cat_keywords" in the database.
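The scoring idea above could be sketched roughly like this in Python. The category names and keyword lists are made up for illustration, and the dictionary merely stands in for the "cat_keywords" field; this is a minimal sketch, not how any particular site does it.

```python
import re
from collections import Counter

# Hypothetical keyword lists; in the question these would come from a
# "cat_keywords" field in the database.
CAT_KEYWORDS = {
    "Cooking": {"recipe", "oven", "bake", "flour"},
    "Computers": {"keyboard", "software", "install", "driver"},
}

def score_categories(text):
    """Give each category one point per keyword occurrence in the text."""
    words = re.findall(r"\w+", text.lower())
    scores = Counter()
    for category, keywords in CAT_KEYWORDS.items():
        scores[category] = sum(1 for w in words if w in keywords)
    return scores

def best_category(text):
    """Return the highest-scoring category, or None when nothing matches."""
    scores = score_categories(text)
    category, points = max(scores.items(), key=lambda kv: kv[1])
    return category if points > 0 else None

print(best_category("How do I install a driver for my keyboard?"))
```

Each word costs one set lookup per category, so the work grows with the length of the text times the number of categories.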

Another question I have is about computational resources. Wouldn’t an algorithm of this type consume a lot of resources?

  • I think what you’re looking for falls under natural language processing. See this IME paper: https://www.ime.usp.br/~slago/IA-pln.pdf

  • Thank you. I’ll take a look.

2 answers

0

Hello,

There are libraries for written and spoken language, as well as studies, that can help with this. As with everything in technology, there are several ways to arrive at the same result. I will give my examples in terms of PHP.

1 - You can keep a "top 10" of the expressions most used in each category and check the submitted text against it to link a category. In this simpler approach, you can discard short expressions (fewer than 3 characters) and create a language hash. Each language will have its own.

2 - Lexical study. By studying the lexicon (parts of speech) of the language, you will arrive at more precise links. There are studies that catalog words, group them by closeness of sentiment, and also classify them as verb, adverb, etc. The vocabulary we use day to day (in English) is small, just over 2,000 words, so it’s easy to create relationships. This PUCRS study is the one I find most interesting: https://www.inf.pucrs.br/linatural/wordpress/recursos-e-ferramentas/wordnetaffectbr/

3 - Phonetic study, which compares words by how they sound. Ex: https://www.php.net/manual/en/function.metaphone.php

4 - Cognitive analysis. Cloud services provide text analysis for decision making. Ex: https://azure.microsoft.com/pt-br/services/cognitive-services/text-analytics/

These would be some options.
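As an illustration of option 1 above, here is a minimal sketch in Python (used here only for brevity; the answer itself is in terms of PHP). The sample texts and the function names are hypothetical, and "top 10" is approximated by the ten most frequent expressions in those samples.

```python
import re
from collections import Counter

MIN_LENGTH = 3  # drop expressions shorter than 3 characters, as in option 1

def tokenize(text):
    """Lowercase the text and keep only words of at least MIN_LENGTH characters."""
    return [w for w in re.findall(r"\w+", text.lower()) if len(w) >= MIN_LENGTH]

def top_expressions(texts, n=10):
    """Return the n most frequent expressions across a category's texts."""
    counts = Counter()
    for text in texts:
        counts.update(tokenize(text))
    return {word for word, _ in counts.most_common(n)}

def link_category(text, profiles):
    """Pick the category whose top expressions overlap most with the text."""
    words = set(tokenize(text))
    overlaps = {cat: len(words & top) for cat, top in profiles.items()}
    best = max(overlaps, key=overlaps.get)
    return best if overlaps[best] > 0 else None

# Hypothetical sample texts already filed under each category.
samples = {
    "PHP": ["Laravel routes in PHP", "PHP arrays and Laravel models"],
    "Ruby": ["Rails migrations in Ruby", "Ruby gems for Rails"],
}
profiles = {cat: top_expressions(texts) for cat, texts in samples.items()}
print(link_category("Is Laravel a good framework?", profiles))
```

The length filter alone still lets common words such as "and" or "for" into a profile, so a per-language stop-word list would probably be needed in practice.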

0

You could solve this easily by creating a table categories, which would hold the tags, and another table categories_words. In the second one you would insert the words related to each category. Then you would scan every word of the text. That way, what was typed can return more than one category, or only the category that has the most words associated with it.

Ex: categories:

id - category

1 - Ruby

2 - PHP

categories_words:

id - id_category - word

1 - 1 - Rails

2 - 1 - System

3 - 2 - Laravel

4 - 2 - System

text: I want to develop a system in Laravel

By occurrence: Categories - Ruby and PHP ("system" is listed under both)

By most matching words: Category - PHP ("system" and "Laravel" both match)
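A rough sketch of the two-table lookup above, using Python with an in-memory SQLite database. The schema mirrors the tables in this answer; storing the words in lowercase and the helper name match_categories are assumptions made for the example.

```python
import re
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE categories (id INTEGER PRIMARY KEY, category TEXT);
    CREATE TABLE categories_words (
        id INTEGER PRIMARY KEY,
        id_category INTEGER REFERENCES categories(id),
        word TEXT
    );
    INSERT INTO categories VALUES (1, 'Ruby'), (2, 'PHP');
    -- Words are stored in lowercase (an assumption of this sketch).
    INSERT INTO categories_words (id_category, word) VALUES
        (1, 'rails'), (1, 'system'), (2, 'laravel'), (2, 'system');
""")

def match_categories(text):
    """Return (category, matched word count) rows, best match first."""
    words = sorted(set(re.findall(r"\w+", text.lower())))
    if not words:
        return []
    placeholders = ",".join("?" * len(words))
    query = f"""
        SELECT c.category, COUNT(*) AS hits
        FROM categories_words AS w
        JOIN categories AS c ON c.id = w.id_category
        WHERE w.word IN ({placeholders})
        GROUP BY c.id, c.category
        ORDER BY hits DESC, c.category
    """
    return conn.execute(query, words).fetchall()

matches = match_categories("I want to develop a system in Laravel")
print([name for name, _ in matches])  # by occurrence: every category with a hit
print(matches[0][0])                  # by most words: the top-ranked category
```

Letting the database do the counting keeps the lookup to a single query per text, and an index on categories_words.word would likely help once the word list grows.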
