Replace list items in a dataframe column

I’m trying to replace every name from a list with `<name>` wherever it appears in one column of a large dataframe. My attempt below does not work.

List of names (the full list is too long to show):

Jack
Liam
John
Ethan
George
...

Small example of dataframe:

       A          B                                   C
  French      house                Phone <phone_numbers>
 English      house                 email <adresse_mail>
  French  apartment                      my name is Liam
  French      house                         Hello George
 English  apartment   Ethan, my phone is <phone_numbers>

My script:

import re
import pandas as pd
from pandas import Series

df = pd.read_excel('data_frame.xlsx')
data = Series.to_string(df['Descricao'])

first_names = open('names_list.txt', 'r')
names_read = first_names.readlines()

def names_teste(no_names):

    list_to_string = ''.join(names_read)

    for l in list_to_string.split('\n'):
        replaces = no_names.replace([l, '<name>'], l)
    return replaces

result = names_teste(no_names)
print(result)

Running it produces this error:

runfile('C:/Users/marin/Desktop/Python/replaces.py', wdir='C:/Users/marin/Desktop/Python')
Traceback (most recent call last):

  File "<ipython-input-30-d10d01d4e428>", line 1, in <module>
runfile('C:/Users/marin/Desktop/Python/replaces.py', wdir='C:/Users/marin/Desktop/Python')

  File "C:\Programmes\lib\site-packages\spyder\utils\site\sitecustomize.py", line 705, in runfile
execfile(filename, namespace)

  File "C:\Programmes\lib\site-packages\spyder\utils\site\sitecustomize.py", line 102, in execfile
exec(compile(f.read(), filename, 'exec'), namespace)

  File "C:/Users/marin/Desktop/Python", line 121, in <module>
result = names_teste(no_names)

  File "C:/Users/marin/Desktop/Python", line 103, in names_teste
replaces = no_names.replace([l, '<name>'], l)

TypeError: replace() argument 1 must be str, not list

Expected output:

                                  C
              Phone <phone_numbers>
               email <adresse_mail>
                  my name is <name>
                       Hello <name>
<name>, my phone is <phone_numbers>

1 answer


This version uses a regular expression to replace all the names at once:

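import re
import pandas as pd

df = pd.read_excel('data_frame.xlsx')

with open('names_list.txt') as nomes:
    # Skip blank lines: an empty alternative would match at every position.
    re_nomes = re.compile('|'.join(re.escape(nome.strip())
        for nome in nomes if nome.strip()), flags=re.IGNORECASE)

# Pass regex=True explicitly; with regex=False pandas rejects a compiled pattern.
df['Descricao'] = df['Descricao'].str.replace(re_nomes, '<name>', regex=True)
print(df)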
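
To try the regular-expression approach without the Excel file, here is a minimal sketch that builds the question's sample rows in memory. The column name `C`, the inline `names` list, the extra "William called" row, and the `\b` word boundaries are assumptions added for illustration. The boundaries keep a short name such as Liam from matching inside a longer word like William.

```python
import re
import pandas as pd

# Hypothetical in-memory stand-ins for data_frame.xlsx and names_list.txt.
names = ["Jack", "Liam", "John", "Ethan", "George"]
df = pd.DataFrame({
    "C": [
        "Phone <phone_numbers>",
        "email <adresse_mail>",
        "my name is Liam",
        "Hello George",
        "Ethan, my phone is <phone_numbers>",
        "William called",  # extra row: must stay untouched
    ]
})

# One alternation of escaped names, wrapped in word boundaries.
pattern = re.compile(
    r"\b(?:" + "|".join(re.escape(n) for n in names) + r")\b",
    flags=re.IGNORECASE,
)

df["C"] = df["C"].str.replace(pattern, "<name>", regex=True)
print(df["C"].tolist())
```

If partial matches inside longer words are actually wanted, dropping the two `\b` pieces restores the behavior of the snippet above.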

This version, on the other hand, replaces the names one at a time:

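import pandas as pd

df = pd.read_excel('data_frame.xlsx')

with open('names_list.txt') as nomes:
    for nome in nomes:
        nome = nome.strip()  # each line ends with '\n', which would never match
        if nome:  # skip blank lines
            # regex=False replaces the name as literal text
            df['Descricao'] = df['Descricao'].str.replace(nome, '<name>', regex=False)

print(df)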
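
When replacing name by name, keep in mind that iterating over a file yields each line with its trailing newline, so a name read as `'Liam\n'` never matches the text `'Liam'`. The sketch below strips each line first and passes `regex=False` for a literal match. The `io.StringIO` object and the sample series are stand-ins assumed for illustration, not the real files.

```python
import io
import pandas as pd

# Hypothetical stand-in for names_list.txt (note the blank last line).
names_file = io.StringIO("Liam\nGeorge\n\n")
s = pd.Series(["my name is Liam", "Hello George"])

for line in names_file:
    name = line.strip()      # remove the trailing "\n"
    if not name:             # ignore blank lines
        continue
    # regex=False treats the name as literal text, not a pattern.
    s = s.str.replace(name, "<name>", regex=False)

print(s.tolist())
```
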
  • Thanks @nosklo, but my output is None :(

  • @marin I tested both versions and they worked perfectly here.
