3
I am trying to remove unwanted words from a text, but my code also strips those letters out of the other words. For example:
remover_palavras = ["a", "e", "o"]
The program returns: btt (for batata), mns (for menos)
What should I do?
5
If this is just a simple exercise, you can write an algorithm that does the following:
Create a list of the words to be removed from the text (remover_palavras).
Create a list (lista_frase) in which each element is a word from the original phrase.
Create a second list (result) by selecting the items of the first list (lista_frase) that are not in the list of words to remove (remover_palavras).
Join all the elements of the resulting list, separated by a space.
Code example:
frase = 'Oi, eu sou Goku e estou indo para a minha casa'
remover_palavras = ['a', 'e']
lista_frase = frase.split()
result = [palavra for palavra in lista_frase if palavra.lower() not in remover_palavras]
retorno = ' '.join(result)
print(retorno)
The output will be:
Oi, eu sou Goku estou indo para minha casa
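One limitation of the split-and-filter approach above is that a token such as "a," (with punctuation attached) is not equal to "a" and so is kept. A minimal sketch of one way around that, assuming it is acceptable to ignore leading and trailing punctuation when comparing; the function name remover_stopwords is hypothetical and not part of the original answer:

```python
import string


def remover_stopwords(frase, remover_palavras):
    # Hypothetical helper: a set gives fast, case-insensitive membership tests.
    remover = {p.lower() for p in remover_palavras}
    mantidas = [
        palavra
        for palavra in frase.split()
        # Compare the token without surrounding punctuation, but keep the
        # original token (punctuation included) when it is not a stop word.
        if palavra.strip(string.punctuation).lower() not in remover
    ]
    return ' '.join(mantidas)


print(remover_stopwords('A batata, e o menos, a casa', ['a', 'e', 'o']))
```

Only whole tokens are dropped, so batata and menos keep their letters. Note that punctuation attached to a removed word (for example the comma in "a,") is removed along with it.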
0
For me the best way would be with Regular Expressions:
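import re
text = 'Oi, eu sou Goku e estou indo para a minha casa'
palavras = ['a', 'e']
for i in palavras:
    # Remove the word only when it has whitespace before it and whitespace, a comma or a period after it
    text = re.sub(r'\s' + re.escape(i) + r'([\s,.])', r'\1', text)
print(text)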
I think it is useful that any punctuation is preserved, but whether that matters depends on what you need.
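The pattern above requires whitespace before the word, so it misses a stop word at the very start or end of the text. A possible variation, offered only as a sketch, uses the \b word-boundary anchor and a single alternation instead of a loop; the function name remover_com_regex is hypothetical, and it assumes the stop words are plain words (a word joined by a hyphen, as in "a-b", would still be matched):

```python
import re


def remover_com_regex(text, palavras):
    # Hypothetical helper. With no stop words there is nothing to remove.
    if not palavras:
        return text
    # \b matches only at word boundaries, so the "a" inside "batata" is left alone.
    # re.escape protects any regex metacharacters in the stop words.
    alternativas = '|'.join(re.escape(p) for p in palavras)
    padrao = re.compile(r'\b(?:' + alternativas + r')\b\s*', flags=re.IGNORECASE)
    # Each stop word is removed together with the whitespace that follows it.
    return padrao.sub('', text).strip()


print(remover_com_regex('Oi, eu sou Goku e estou indo para a minha casa', ['a', 'e']))
```

Compiling one pattern means the text is scanned once, no matter how many stop words are in the list.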
-1
I'm a beginner in Python, but here is a function that solves your problem.
def remover_palavra(palavra, remover):
    remover_tamanho = len(remover)
    palavra_tamanho = len(palavra)
    while True:
        remover_posicao = palavra.find(remover)
        if remover_posicao != -1:
            palavra_inicio = palavra[0:remover_posicao]
            palavra_fim = palavra[remover_posicao+remover_tamanho:palavra_tamanho]
            palavra = palavra_inicio + palavra_fim
        else:
            break
    return palavra
palavras = ["batata", "menos"]
palavras_para_remover = ["a", "e", "o"]
for palavra in palavras:
    resultado = palavra
    for remover in palavras_para_remover:
        resultado = remover_palavra(resultado, remover)
    print(resultado)
btt
mns
This is exactly the result that is not expected. Note that you have not removed words, but the letters of the words. That is not the request. If the phrase is "the potato", the output should be only "potato", not "btt".
I think I understand what you mean now. I really hadn't understood the question. But the code does work (as I said, I'm just starting with Python). Using the variables palavras = ["the potato"] and palavras_para_remover = ["the "] it works, but the ideal code really should check the spaces between words...
See the jbueno solution above. It is very simple and does what you ask. It will be useful to study it.
Can you elaborate on your problem? Do you want to remove the letters "a", "e" and "o" only when they are alone? By the way, edit the question and add your code.
– Woss
Be careful with that kind of substitution; it does not cover many cases.
– Oralista de Sistemas
@Andersoncarloswoss, yes, only when they are alone. I want to give my program the stop words, such as: of, if, him, you, that, etc.
– Dúvida.Net
Can you give an example of an input and the output you want?
– MagicHat