Question about replace in pandas

0

Question about replace in a pandas DataFrame

import pandas as pd

texto = 'Vírus de computadores são uma lenda urbana.'

dado = {'texto': [texto]}
df = pd.DataFrame(dado)

df['nova_coluna'] = df.texto.str.replace('urbana', '')

After creating this new column we can verify that the word 'urbana' has been removed,

but if I put that expression inside a for loop, it does not work:

palavras = 'computadores', 'uma', 'lenda'

for palavra in palavras:

    df['nova_coluna'] = df.texto.str.replace(palavra, '')

The column is created, but the words are not replaced.

2 answers

2

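The problem is that when you do df['nova_coluna'] = df.texto.str.replace(palavra, '') you start again from the original text every time. In one iteration you remove 'computadores', but that result is not saved; the next iteration removes 'uma' from the original text and overwrites the column, and so on. Reassigning the result at each step fixes this: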

import pandas as pd

texto = 'Vírus de computadores são uma lenda urbana.'

dado = {'texto': [texto]}
df = pd.DataFrame(dado)

palavras = 'computadores', 'uma', 'lenda'

for palavra in palavras:
    # reassign texto so each removal carries over to the next iteration
    texto = texto.replace(palavra, '')

df['nova_coluna'] = texto
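
The same idea can be applied to the column itself instead of the plain string, which should also work when the DataFrame has more than one row. This is only a minimal sketch: it assumes the words are to be matched literally (regex=False), and the second row is made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({'texto': ['Vírus de computadores são uma lenda urbana.',
                             'uma lenda antiga']})  # second row is illustrative only
palavras = ['computadores', 'uma', 'lenda']

# Start from a copy of the original column, then keep replacing on the
# result so each removal is preserved for the next iteration.
df['nova_coluna'] = df['texto']
for palavra in palavras:
    df['nova_coluna'] = df['nova_coluna'].str.replace(palavra, '', regex=False)

print(df['nova_coluna'].tolist())
```

Note that the spaces around each removed word are left in place, so the result contains runs of blanks.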

0

First, palavras = 'computadores', 'uma', 'lenda' creates a tuple, not a list. The loop iterates over it just the same, but if a list is intended, the correct declaration is:

palavras = ['computadores', 'uma', 'lenda']

Then comes the next problem. df['texto'] holds only one value (strictly speaking, one row), and replacing each word of palavras separately would give it three different values, which is a problem. So I'll show you what happens, and then you can adapt it to what you really need :)

import pandas as pd


texto = 'Vírus de computadores são uma lenda urbana.'

dado = {'texto': [texto]}
df = pd.DataFrame(dado)

df['nova_coluna'] = df.texto.str.replace('urbana', '')

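palavras = ['computadores', 'uma', 'lenda']

for palavra in palavras:
    # each pass starts again from df.texto, so only the last replacement survives
    df['nova_coluna'] = df.texto.str.replace(palavra, '')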
df

In this case df.nova_coluna ends up with a single value. However, it took three different values, one per loop iteration:

  • 1: 'Vírus de  são uma lenda urbana.'
  • 2: 'Vírus de computadores são  lenda urbana.'
  • 3: 'Vírus de computadores são uma  urbana.'

This is because the value of df.texto ('Vírus de computadores são uma lenda urbana.') is always the one used to define the value of df.nova_coluna.
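
If the goal is to remove all the words at once, one possible alternative is a single regular expression built from the list. This is a sketch rather than part of the answers above: it assumes whole-word matching with \b is wanted, and it collapses the leftover spaces afterwards. The name padrao is introduced here only for illustration.

```python
import re
import pandas as pd

texto = 'Vírus de computadores são uma lenda urbana.'
df = pd.DataFrame({'texto': [texto]})
palavras = ['computadores', 'uma', 'lenda']

# Build one pattern such as r'\b(?:computadores|uma|lenda)\b';
# re.escape guards against words that contain regex metacharacters.
padrao = r'\b(?:' + '|'.join(re.escape(p) for p in palavras) + r')\b'

df['nova_coluna'] = (
    df['texto']
    .str.replace(padrao, '', regex=True)   # remove every listed word in one pass
    .str.replace(r'\s+', ' ', regex=True)  # collapse the gaps left behind
    .str.strip()
)
print(df['nova_coluna'][0])  # Vírus de são urbana.
```

Because the pattern is applied once to the original column, no loop is needed and the order of the words in the list does not matter.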
