Remove punctuation and symbols in Python

Viewed 16,389 times

0

I’m trying to remove punctuation and other symbols (the copyright sign, for example) from a string.

I want to keep letters and numbers, plus accented characters, the hyphen, the apostrophe ('), and spaces.

How can I do this in Python?

2 answers

3

You can list the characters you want removed and strip them one at a time.

def chr_remove(old, to_remove):
    new_string = old
    for x in to_remove:
        new_string = new_string.replace(x, '')
    return new_string

This way you remove only the characters you choose. For example:

> s = "string $com (caracteres#."
> print(chr_remove(s, "$(#"))  # removes $, # and ( from the string
string com caracteres.
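
The same removal can also be done in a single pass with str.translate. This is only a minimal sketch, assuming Python 3; the name chr_remove_translate is hypothetical and not part of the answer above.

```python
def chr_remove_translate(old, to_remove):
    """Return `old` without any of the characters listed in `to_remove`."""
    # str.maketrans('', '', chars) builds a table mapping each char to None,
    # which str.translate interprets as "delete this character".
    table = str.maketrans('', '', to_remove)
    return old.translate(table)


s = "string $com (caracteres#."
print(chr_remove_translate(s, "$(#"))  # string com caracteres.
```

Like the loop version, this still requires knowing in advance which characters to drop.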
  • Thanks for the tip, but with this approach it is practically impossible to anticipate every symbol that might appear.

2


Try using a regex:

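import re

# Python 2: string_velha is a UTF-8 byte string, so decode it to unicode first.
# Keeps letters, digits, the listed accented letters, ':', space, apostrophe and hyphen.
string_nova = re.sub(u"[^a-zA-Z0-9áéíóúÁÉÍÓÚâêîôÂÊÎÔãõÃÕçÇ: '-]", '', string_velha.decode('utf-8'))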
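
Listing every accented letter by hand can miss characters such as à or ü. As a hedged alternative, assuming Python 3 (where str is Unicode and \w matches any letter or digit), the sketch below keeps letters, digits, spaces, hyphens and apostrophes. The name keep_words is hypothetical.

```python
import re

# Match anything that is not a word character, space, apostrophe or hyphen;
# the "|_" branch also drops the underscore, which \w would otherwise keep.
_UNWANTED = re.compile(r"[^\w '\-]|_")


def keep_words(text):
    """Strip punctuation and symbols, keeping accented letters and digits."""
    return _UNWANTED.sub('', text)


print(keep_words("São Paulo, d'água: pão-de-ló ©2024!"))
# São Paulo d'água pão-de-ló 2024
```

This assumes the accented letters are precomposed. Text in decomposed form (NFD) would lose its combining accents, so normalizing with unicodedata.normalize('NFC', text) first may be needed.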
  • Your suggestion removed the punctuation but also removed the accented characters. I would like to keep them.

  • Oops! Sorry for the mistake. I've edited the code; it should solve your problem now.
