I've seen some code examples that convert strings such as "four" into the integer 4, but they were always quite manual. Is there a more automatic way to do this using NLP?
One thing I noticed when running the test below is that spaCy recognizes both "quatro" (four) and "4" as numbers, which is a good start, but is there any way to use this to convert one form into the other? I don't know much about natural language processing, so I don't know the name of the technique that would be responsible for this.
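import spacy

# "pt" is the shortcut name from older spaCy releases; newer versions
# expect a full model package name such as "pt_core_news_sm".
nlp = spacy.load("pt")

doc = nlp("quatro vídeos")
for token in doc:
    print(token.text, token.pos_, token.dep_)
# returns
# quatro NUM nummod
# vídeos SYM ROOT

doc = nlp("4 vídeos")
for token in doc:
    print(token.text, token.pos_, token.dep_)
# returns
# 4 NUM nummod
# vídeos SYM ROOT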
Thank you!
There may be a "more robust" and "more consistent" way, but I doubt there is a "more automatic" one. NLP and other techniques involve configuration and parameterization that will always be more complex than a dictionary match:
{'zero': 0, 'um': 1, ...}
Reading larger numbers written out in words ("two thousand one hundred and seventy-three") is, I think, something NLP can do in a simpler (more automatic) way than a plain heuristic built only from if statements and dictionaries. But for a single digit it certainly won't be the simplest option (although it might be better for several other reasons).
– jsbueno
Got it. What I did to solve my problem was create a dictionary, but I would have to keep typing endlessly to cover all the possible cases. Can you give me an example of these more consistent techniques that actually use NLP?
– Giovana Morais
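On the worry about typing endlessly: a dictionary does not need one entry per number. A minimal sketch, assuming Brazilian Portuguese spellings and space-separated input, is shown below. The function name words_to_int and both tables are hypothetical, made up for illustration; this is a plain composition heuristic rather than an NLP model, and it does not check that the words form a grammatically valid number.

```python
# Hypothetical lookup tables: a few dozen words are enough to compose
# every number up to the millions.
VALUES = {
    "zero": 0, "um": 1, "uma": 1, "dois": 2, "duas": 2, "três": 3,
    "quatro": 4, "cinco": 5, "seis": 6, "sete": 7, "oito": 8, "nove": 9,
    "dez": 10, "onze": 11, "doze": 12, "treze": 13, "quatorze": 14,
    "catorze": 14, "quinze": 15, "dezesseis": 16, "dezessete": 17,
    "dezoito": 18, "dezenove": 19, "vinte": 20, "trinta": 30,
    "quarenta": 40, "cinquenta": 50, "sessenta": 60, "setenta": 70,
    "oitenta": 80, "noventa": 90, "cem": 100, "cento": 100,
    "duzentos": 200, "trezentos": 300, "quatrocentos": 400,
    "quinhentos": 500, "seiscentos": 600, "setecentos": 700,
    "oitocentos": 800, "novecentos": 900,
}
MULTIPLIERS = {"mil": 1_000, "milhão": 1_000_000, "milhões": 1_000_000}


def words_to_int(text):
    """Convert a number written out in Portuguese words into an int."""
    total = 0    # sum of the groups already closed by a multiplier
    current = 0  # value of the group being read (0 to 999)
    for word in text.lower().split():
        if word == "e":  # the connector "e" (and) carries no value
            continue
        if word in VALUES:
            current += VALUES[word]
        elif word in MULTIPLIERS:
            # A bare "mil" means one thousand, so treat an empty group as 1.
            total += max(current, 1) * MULTIPLIERS[word]
            current = 0
        else:
            raise ValueError(f"unknown number word: {word!r}")
    return total + current


print(words_to_int("quatro"))
print(words_to_int("dois mil cento e setenta e três"))
```

One way to combine this with the spaCy test from the question would be to pass only the tokens tagged NUM to a function like this one, so that spaCy finds the number and the small dictionary converts it.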