How to replace more than one character in the replace method in Python 3?


Viewed 9,012 times

10

I have a text and need to remove all punctuation marks using a single call to the replace() method, without using a loop.

My initial idea was to call texto.replace('.', '') once for each punctuation mark, but I would like something simpler.

  • Did you test it to see if it works?

  • 3

    I rolled back the tag addition, since the author did not necessarily ask for regex solutions. Remember that tags refer to the question, not to the answers given.

5 answers

15


You have to write a function that does this, that is, create an abstraction that satisfies the requirement of being a single command. Keep in mind that a single command in the language does not mean a single instruction will be executed. The naive way of doing this would be something like this:

def mulipleReplace(text):
    for char in ".!?,":
        text = text.replace(char, "")
    return text

But this is extremely inefficient because strings are immutable: each of these operations generates a new string, which means allocating memory (expensive), copying everything over (more expensive still), and later freeing the intermediate strings, which could put pressure on memory. The most idiomatic way to solve this would be:

def mulipleReplace(text):
    # Keep only the characters that are not punctuation, then join them.
    return "".join([char for char in text if char not in ".!?,"])

See it working on ideone and on repl.it. I also put it on GitHub for future reference.

This form builds a list of the characters that matter and then turns the list into a string.

You can also do it with regex, which I do not like, partly because it is usually inefficient. I can't guarantee that in this case because I have tested little in Python; test all the options before deciding, and there are several other ways to do the same thing, but I am confident that the previous one is very fast and probably the fastest. Something like this:

texto = re.sub(r"[.!?,]", "", texto)

Don’t forget to import the module re.
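For anyone who wants to measure instead of guessing, a rough timing sketch with timeit could look like the one below. The function names, the sample text and the repetition count are placeholders chosen only for illustration, and the numbers will vary with the machine and the Python version.

```python
import re
import timeit

PUNCTUATION = ".!?,"
# Compiled once; the character class matches any one of the four marks.
PATTERN = re.compile(r"[.!?,]")

def with_replace(text):
    # One full pass over the string per character to remove.
    for char in PUNCTUATION:
        text = text.replace(char, "")
    return text

def with_join(text):
    # Single pass, keeping only the characters that are not punctuation.
    return "".join([char for char in text if char not in PUNCTUATION])

def with_regex(text):
    # Single pass performed by the regex engine.
    return PATTERN.sub("", text)

# Hypothetical sample text, repeated to get a measurable size.
sample = "Hello, world! How are you? Fine. " * 1000

for func in (with_replace, with_join, with_regex):
    seconds = timeit.timeit(lambda: func(sample), number=20)
    print(f"{func.__name__}: {seconds:.4f} s")
```

All three functions return the same string, so only the timings differ.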

Moral of the story: programming is not lining up commands without understanding what they do. It is understanding the whole process that happens in the computer and in the technology you are using, and knowing that solving problems, which is the real profession we practice, requires methodology and technique. There are several ways to do it, but most require awareness. It can't just be copy and paste, and a ready-made solution may not exist, or the ready-made one may not be ideal.

7

TL;DR:

import re
...
novo_texto = re.sub(r"\W", "", texto_original)

Full answer:

The "replace" method can only exchange one fixed string of text for another. For more complex text operations there are the so-called "regular expressions".

Regular expressions are a mechanism that exists in several programming languages, including Python, precisely to find complex patterns within text and to perform some operation from there (e.g., substitution).

Their biggest drawback is a side effect of their greatest strength: to be expressive enough to describe all kinds of text patterns, they become complex, to the point of being practically another programming language, a "mini language" inside Python.

In Python, regular expressions are well behaved and well defined, and have no special syntax. Simply import the module re, and there you have several functions that mostly follow the same format: you pass the regular expression as a string in the first parameter, the text to search in a later parameter, and, depending on the operation, more parameters. For replacement you need re.sub, which takes the pattern, then the replacement, then the text. (In other languages, such as Perl and JavaScript, regular expressions use special syntax instead of being written as function calls, which does not make the code easier to read.)

So, if you read the documentation linked above, you will find that you can describe a "group of characters" within [] in a regular expression, and any character in the group will match in your search.

import re
...
novo_texto = re.sub(r"[,;&!?/.:]", "", texto_original)

Done: this replaces any of the characters ,;&!?/.: with the empty string ''.
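As a quick check of the character class, here is a minimal sketch on a made-up sample string (the sample text is an assumption, not part of the original answer):

```python
import re

# Hypothetical sample text containing several of the listed characters.
texto_original = "Wait, what?! Yes: it works; really."

# re.sub takes the pattern first, the replacement second and the text third.
novo_texto = re.sub(r"[,;&!?/.:]", "", texto_original)

print(novo_texto)  # Wait what Yes it works really
```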

If you want to eliminate every character other than letters, numbers and _, instead of listing all the characters you can use the expression below (note that \W also matches whitespace):

novo_texto = re.sub(r"\W", "", texto_original)
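One thing to watch with \W is that whitespace is not a word character either. The sketch below, on an invented sample string, compares \W with [^\w\s], a variant (not from the original answer) that leaves spaces alone:

```python
import re

# Hypothetical sample text.
texto_original = "Hello, world! 123_abc"

# \W matches anything that is not a letter, digit or underscore,
# so the spaces disappear together with the punctuation.
sem_nao_palavra = re.sub(r"\W", "", texto_original)
print(sem_nao_palavra)  # Helloworld123_abc

# [^\w\s] also excludes whitespace from the match, so the spaces survive.
so_pontuacao = re.sub(r"[^\w\s]", "", texto_original)
print(so_pontuacao)  # Hello world 123_abc
```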

Another detail is that we usually put an r prefix before the opening quote of the string passed as a regular expression: this makes Python not interpret the \ character inside that string as a Python escape. That is, the sequence \n, for example, will not be "translated" before the code even runs into the newline character \x0a (character code 10); instead it is kept as two literal characters (['\\', 'n']). This is necessary because \ is also widely used within regular expressions.
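The difference between a normal and a raw string literal can be seen directly in the interpreter; a small sketch (variable names are illustrative):

```python
normal = "\n"   # Python translates the escape: one newline character
raw = r"\n"     # raw string: a backslash followed by the letter n

print(len(normal), len(raw))  # 1 2
print(list(raw))              # ['\\', 'n']
print(hex(ord(normal)))       # 0xa
```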

Note that the regular expression documentation can seem very dense and confusing, and indeed it is! Unless you make intensive use of regexps, it can take years to become comfortable with them, to the point of writing your own without spending a long time testing first. This is normal. If you are going to use them a lot, the recommendation is to practice a little every day, like going to the gym, until it comes easily.

Without regular expressions

It is also possible to replace several characters programmatically, but then you really need a for loop that calls replace once for each character or sequence you want to replace. Depending on the occasion, building a regular expression that works well can be very time consuming, so you can use the syntax you are already used to.

The great disadvantage in this case is that the string will be processed once for each character to be replaced, while the regular expression does it only once. This can make a difference at a critical point of an application, for example a web application that needs to do this quickly while still responding to several other requests in the same second, and when the text is large (200_000 characters or more). For small texts (a few kilobytes) and a few characters, in an operation that runs only once (cleaning a text sample that will be passed to a database or an AI engine), it makes absolutely no difference to write the code in pure Python. The running time will be less than one tenth of a second anyway:

texto = ...
...
for caractere in "!@#$%*()<>:|/?":
    texto = texto.replace(caractere, "")
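To confirm that the loop and a regular expression give the same result, the same string of characters can be turned into a character class. This is only a sketch: the sample text is invented, and re.escape is used here as one way to neutralise characters that are special inside a pattern.

```python
import re

caracteres = "!@#$%*()<>:|/?"
# Hypothetical sample text.
texto = "Price: $10 (was <$15>) | 50% off?!"

# Loop version: one pass over the text per character.
resultado_loop = texto
for caractere in caracteres:
    resultado_loop = resultado_loop.replace(caractere, "")

# Regex version: re.escape protects the special characters, and the
# surrounding brackets turn them into a single character class.
padrao = re.compile("[" + re.escape(caracteres) + "]")
resultado_regex = padrao.sub("", texto)

print(resultado_loop == resultado_regex)  # True
```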

2

You can use a for loop to solve your problem:

texto = input("Digite um texto: ")

for c in ".!?,#@&%":
    texto = texto.replace(c, '')

2

You can import the string module and get the special characters from string.punctuation:

Example:

import string

for c in string.punctuation:
    texto = texto.replace(c, '')

This is very similar to Juliana Marques' code. string.punctuation contains all these characters:

!"#$%&'()*+,-./:;<=>?@[\]^_`{|}~
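A runnable sketch of this loop on a made-up sample string, passing the loop variable to replace and keeping the returned copy:

```python
import string

# Hypothetical sample text.
texto = "Hello, world! (It's 100% done...)"

# Pass the variable c, not the literal 'c', and reassign the result,
# because replace returns a new string instead of changing texto.
for c in string.punctuation:
    texto = texto.replace(c, "")

print(texto)  # Hello world Its 100 done
```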
  • Despite the interesting mention of "string.punctuation", which adds something that is not in the other answers, the code is incorrect: the variable texto remains unchanged, since .replace does not change the string in place but returns a copy. And even if it did, you would only be replacing the literal letter "c" in the text, not the contents of the variable c.

  • @jsbueno true!!! I had not noticed, I will edit. Thanks!!!

0

texto = '@#$#$@!#$@!#$ AB C D DFGSFGS DFGS #$%@#$%@#%$,..,.,..3452345.1'
caracter_esp = '.!?,@#%$ ' 

In caracter_esp = '.!?,@#%$ ' you list all the characters you want to remove. To remove spaces as well, just include a space (one is already included).

You can use list comprehension:

nova_string = "".join([ caracter for caracter in texto if caracter not in caracter_esp])
print(nova_string)

You can also use filter:

outra_string = "".join((filter(lambda caracter: caracter not in caracter_esp, texto)))
print(outra_string)
