Removing unwanted characters from a string - Python


One way to split a string into words is the method below:

entrada = input().split(" ")

Question: if I want to exclude several specific characters (" ,&ˆ*!!:"), which can appear in any order and are not necessarily grouped together, what resources could I use?

Example:

"João:saiu!! de%$ˆcasa" -------------> "João saiu de casa" 

3 answers


There is a very simple way to do it with regex:

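import re

text = "João:saiu!! de%$ˆcasa"  # renamed from "input" to avoid shadowing the built-in input()
# Raw string so "\s" reaches the regex engine unchanged; inside [...] "$" needs no escape
# and each character only has to be listed once.
pattern = r"[,&ˆ*!:%$\s]+"
repl = " "

output = re.sub(pattern, repl, text)  # every run of unwanted characters/whitespace becomes one space
print(output)
# João saiu de casa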

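To exclude more characters, add them between the brackets in the pattern variable. Characters that are special inside a character class (], \, ^ at the start, and - between two characters) must be escaped with a backslash.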
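Rather than escaping special characters by hand, the character class can be built from a plain string with re.escape. This is a minimal sketch; the function name remove_chars and its parameters are hypothetical, and the final strip() is an extra step not in the answer above, used to drop a space left by unwanted characters at the edges.

```python
import re


def remove_chars(text: str, unwanted: str) -> str:
    """Replace each run of unwanted characters or whitespace with one space.

    remove_chars is a hypothetical helper for illustration only.
    """
    # re.escape backslash-escapes regex metacharacters such as ], ^, - and \,
    # so they are taken literally inside the character class.
    pattern = "[" + re.escape(unwanted) + r"\s]+"
    # strip() removes a leading/trailing space produced by unwanted edge characters.
    return re.sub(pattern, " ", text).strip()


print(remove_chars("João:saiu!! de%$ˆcasa", ",&ˆ*!:%$"))  # João saiu de casa
print(remove_chars("a]b^c-d\\e", "]^-\\"))                # a b c d e
```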


It is also possible to remove unwanted characters without using regular expressions.

The str class has the method str.translate(), which returns a copy of the string in which each character has been mapped through a translation table. The table is built with the static method str.maketrans(); one of its forms accepts two strings of equal length, and in the resulting dictionary each character in x is mapped to the character at the same position in y.

texto = "João:saiu!! de%$ˆcasa"                              # Text to be cleaned.
indesejados = ":!%$ˆ"                                        # Characters to be removed.

tabela = str.maketrans(indesejados, " " * len(indesejados))  # Translation table mapping each unwanted character to a space.
novo_texto = " ".join(texto.translate(tabela).split())       # Clean the text, split it on whitespace and rejoin it without duplicate spaces.

print(novo_texto)                                            # Prints: João saiu de casa

Test the example on ideone.
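str.maketrans() has two other documented forms that may be handy here. This sketch reuses the input string from the answer; the variable names are illustrative. The one-argument form takes a dict, which avoids building a replacement string of matching length. The three-argument form deletes every character in the third string, which glues neighbouring words together, so on this input it is shown only for contrast.

```python
texto = "João:saiu!! de%$ˆcasa"
unwanted = ":!%$ˆ"

# One-argument form: a dict mapping each unwanted character to a space.
to_space = str.maketrans(dict.fromkeys(unwanted, " "))
spaced = " ".join(texto.translate(to_space).split())

# Three-argument form: characters in the third string are mapped to None (deleted).
to_delete = str.maketrans("", "", unwanted)
deleted = texto.translate(to_delete)

print(spaced)   # João saiu de casa
print(deleted)  # Joãosaiu decasa
```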


Another approach is to keep only the letters instead of removing the unwanted characters:

import re

inp = "João:saiu!! de%$ˆcasa"
pattern = "[a-zà-ú]+"
output = " ".join(re.findall(pattern, inp, re.I))

print(inp)
print(output)

Or, without the re.I (ignore case) flag:

import re

inp = "João:saiu!! de%$ˆcasa"
pattern = "[a-zA-ZÀ-ú]+"
output = " ".join(re.findall(pattern, inp))

print(inp)
print(output)
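One caveat about the range À-ú: it covers U+00C0 to U+00FA, so it also matches the symbols × (U+00D7) and ÷ (U+00F7) while missing û, ü, ý, þ and ÿ. A tighter class for Latin-1 letters splits the range around those two symbols. This is a sketch assuming only Latin-1 accented letters need to be kept; other scripts would need different ranges.

```python
import re

inp = "João:saiu!! de%$ˆcasa"

loose = "[a-zA-ZÀ-ú]+"            # range used in the answer above
strict = "[a-zA-ZÀ-ÖØ-öø-ÿ]+"     # Latin-1 letters, skipping × and ÷

print(" ".join(re.findall(strict, inp)))  # João saiu de casa
print(re.findall(loose, "2×3 ÷ 4"))       # ['×', '÷']
print(re.findall(strict, "2×3 ÷ 4"))      # []
print(re.findall(loose, "agüentar"))      # ['ag', 'entar']
print(re.findall(strict, "agüentar"))     # ['agüentar']
```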
