(Clarification) Artificial intelligence Word classification (Python)

I am trying to build something that does the following:

  • Read a word
  • Look it up in a dictionary
  • Try to complete the word (if possible)

For example:

INPUT: CSA
DICTIONARY: ["HOMEM", "CASA", "MULHER"]
OUTPUT: CASA

INPUT: OMEM
DICTIONARY: ["HOMEM", "CASA", "MULHER"]
OUTPUT: HOMEM

INPUT: MUHER
DICTIONARY: ["HOMEM", "CASA", "MULHER"]
OUTPUT: MULHER

INPUT: JOGO
DICTIONARY: ["HOMEM", "CASA", "MULHER"]
OUTPUT: Word undefined and/or unknown
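
A minimal sketch of the behaviour in these examples, assuming the standard-library `difflib` module is an acceptable stand-in for a learned model. The function name `complete_word`, the `cutoff` of 0.6, and the not-found message are assumptions made for illustration, not part of any code referenced in this question:

```python
from difflib import get_close_matches

DICTIONARY = ["HOMEM", "CASA", "MULHER"]
UNKNOWN = "Word undefined and/or unknown"

def complete_word(word, dictionary=DICTIONARY, cutoff=0.6):
    """Return the closest dictionary word, or UNKNOWN if nothing is similar enough."""
    # get_close_matches ranks entries by difflib's similarity ratio (0.0 to 1.0)
    # and drops anything scoring below `cutoff`.
    matches = get_close_matches(word.upper(), dictionary, n=1, cutoff=cutoff)
    return matches[0] if matches else UNKNOWN

for entry in ["CSA", "OMEM", "MUHER", "JOGO"]:
    print(entry, "->", complete_word(entry))
```

The cutoff decides when an input such as JOGO is rejected instead of being forced onto the nearest dictionary word, so it would probably need tuning for a real word list.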

I found some examples from scikit-learn and Google, but did not understand how to use them.

scikit-learn documentation

Google example


I would like help understanding what is happening in this code:

import re
from collections import Counter

def words(text):
    "Split `text` into lowercase words."
    return re.findall(r'\w+', text.lower())

# Word frequencies counted from a large text corpus.
WORDS = Counter(words(open('big.txt').read()))

def P(word, N=sum(WORDS.values())):
    "Probability of `word`."
    return WORDS[word] / N

def correction(word):
    "Most probable spelling correction for word."
    return max(candidates(word), key=P)

def candidates(word):
    "Generate possible spelling corrections for word."
    # Prefer the word itself, then known words one edit away, then two edits away.
    return (known([word]) or known(edits1(word)) or known(edits2(word)) or [word])

def known(words):
    "The subset of `words` that appear in the dictionary of WORDS."
    return set(w for w in words if w in WORDS)

def edits1(word):
    "All edits that are one edit away from `word`."
    letters    = 'abcdefghijklmnopqrstuvwxyz'
    splits     = [(word[:i], word[i:])    for i in range(len(word) + 1)]
    deletes    = [L + R[1:]               for L, R in splits if R]
    transposes = [L + R[1] + R[0] + R[2:] for L, R in splits if len(R)>1]
    replaces   = [L + c + R[1:]           for L, R in splits if R for c in letters]
    inserts    = [L + c + R               for L, R in splits for c in letters]
    return set(deletes + transposes + replaces + inserts)

def edits2(word):
    "All edits that are two edits away from `word`."
    return (e2 for e1 in edits1(word) for e2 in edits1(e1))
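
To see how the pieces of the code above interact, here is a reduced sketch that can run without `big.txt`. The word counts are invented for illustration, the two-edit step (`edits2`) is left out for brevity, and `probability` is just a renamed `P`; treat it as a reading aid rather than a replacement for the original:

```python
from collections import Counter

# Hypothetical counts standing in for Counter(words(open('big.txt').read())).
WORDS = Counter({"casa": 50, "cara": 20, "homem": 30, "mulher": 25})
LETTERS = "abcdefghijklmnopqrstuvwxyz"

def edits1(word):
    """All strings one delete, transpose, replace, or insert away from `word`."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [L + R[1:] for L, R in splits if R]
    transposes = [L + R[1] + R[0] + R[2:] for L, R in splits if len(R) > 1]
    replaces = [L + c + R[1:] for L, R in splits if R for c in LETTERS]
    inserts = [L + c + R for L, R in splits for c in LETTERS]
    return set(deletes + transposes + replaces + inserts)

def known(words):
    """Keep only the strings that actually occur in WORDS."""
    return {w for w in words if w in WORDS}

def probability(word, total=sum(WORDS.values())):
    """Relative frequency of `word` in the counted text."""
    return WORDS[word] / total

def correction(word):
    """Pick the most frequent known candidate, or return the input unchanged."""
    candidates = known([word]) or known(edits1(word)) or [word]
    return max(candidates, key=probability)

print(sorted(known(edits1("caa"))))  # known words one edit away from "caa"
print(correction("caa"))             # the candidate with the highest count
print(correction("jogo"))            # no known candidate: the input comes back
```

Two things stand out. When several dictionary words are one edit away, the word count is the only tie-breaker. And because the candidate list falls back to `[word]`, an unknown input such as "jogo" is returned as-is, so producing a "word unknown" message would need an explicit check against `WORDS`.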
  • Do you need to correct after typing or while typing? Should the corrector's behavior rely on implicit feedback, or does it not need to learn from its error rate? Will you remove stopwords? The code uses the frequency of each term to work out the most likely intended word. In short, it suggests the candidate that is most 'similar' and most probable.

  • The words will come from text that has already been typed, for example a spreadsheet. In the next column I intend to put the probability (as a decimal) that the word is a dictionary word, and in the column after that the dictionary word itself. Something like:

    |Word|Percentage|Possible word|
    |Casa|1|Casa|
    |Caa|%|Casa|

  • @Intrusion Thanks bro, I ended up studying a bit of statistics and probability on my own, which is what the code uses... But still, THANK YOU for the clarification. Now I can edit the code according to my needs.
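
The three-column layout described in the comments could be produced along these lines. This is only a sketch: it assumes `difflib.SequenceMatcher`'s similarity ratio is an acceptable stand-in for the "percentage" (it is a similarity score, not a probability), and the names `best_match` and `build_rows` are made up for the example:

```python
from difflib import SequenceMatcher

DICTIONARY = ["Homem", "Casa", "Mulher"]

def best_match(word, dictionary=DICTIONARY):
    """Return (similarity, dictionary_word) for the closest dictionary entry."""
    scored = [
        (SequenceMatcher(None, word.lower(), entry.lower()).ratio(), entry)
        for entry in dictionary
    ]
    return max(scored)  # highest similarity wins

def build_rows(words):
    """Build (word, score, possible_word) rows, one per input word."""
    rows = []
    for word in words:
        score, match = best_match(word)
        rows.append((word, round(score, 2), match))
    return rows

print("|Word|Percentage|Possible word|")
for row in build_rows(["Casa", "Caa", "Muher"]):
    print("|{}|{}|{}|".format(*row))
```

An exact match scores 1.0, and misspellings score lower. A minimum score would still be needed to decide when to leave the "Possible word" column empty.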

No answers
