How to remove special characters and periods from a string column of a DataFrame?

import pandas as pd

raw_data = {'NAME': ['José L. da Silva',
                      'Ricardo Proença',
                      'Antônio de Morais']}

df = pd.DataFrame(raw_data, columns=['NAME'])

How can I transform the names in the NAME column into:

  • Jose L da Silva (no period or accent)
  • Ricardo Proenca (no cedilla) and
  • Antonio de Morais (no accent)?

1 answer

1

You can use the apply() method of a pandas Series. It calls a function on each element of the Series and builds a new Series from the values that function returns. So you can define a correction function and apply it to the column. For example:

def corrigir_nomes(nome):
    # Remove periods and replace each accented character with its plain equivalent
    nome = nome.replace('.', '').replace('ç', 'c').replace('ô', 'o').replace('é', 'e')
    return nome

Then apply it to the column you want:

df['NAME'] = df['NAME'].apply(corrigir_nomes)

The result will be something like:

0      Jose L da Silva
1      Ricardo Proenca
2    Antonio de Morais
Name: NAME, dtype: object
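
Note that the chained replace() calls only handle the characters listed, so a name containing, say, 'ã' or 'í' would pass through unchanged. A more general variant is sketched below. It is not part of the approach above: it swaps the explicit replacements for Unicode normalization from the standard library's unicodedata module, and the function name remove_accents_and_periods is only illustrative.

```python
import unicodedata


def remove_accents_and_periods(name):
    """Return `name` without periods and without diacritical marks."""
    # NFKD splits each accented letter into a base letter plus combining marks,
    # e.g. 'é' becomes 'e' followed by U+0301 (combining acute accent).
    decomposed = unicodedata.normalize('NFKD', name)
    # Keep everything except the combining marks, then drop the periods.
    stripped = ''.join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.replace('.', '')


names = ['José L. da Silva', 'Ricardo Proença', 'Antônio de Morais']
cleaned = [remove_accents_and_periods(n) for n in names]
print(cleaned)
```

It can be applied to the column in the same way, with df['NAME'] = df['NAME'].apply(remove_accents_and_periods). Letters that have no decomposition into a base letter plus a mark (for example 'ø' or 'ß') are left as they are, so those would still need an explicit replace().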
