Changing a value with the pandas library

Viewed 4,838 times

1

I'm opening a .csv file with the pandas library, but when the file is read I get a warning that a given column contains values of different types. I know that the character "/" was used in this file to denote missing data, and that is very likely the cause. My question is: how do I replace the "/" character with another value?

2 answers

1

Good afternoon Mauricio, how are you?

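I believe that if pandas is reading the '/' character as a value, you can use the apply function together with replace. Note that replace matches cells whose entire value is '/', not a '/' embedded in a longer string.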

novodf = antigodf.apply(lambda x: x.replace('/','0'))

This way you would replace '/' with '0'.

If you want to replace the '/' value with NaN, so that you can use the pandas functions for missing values, import the numpy library and replace the '0' in the apply call with np.nan, like this:

import numpy as np
novodf = antigodf.apply(lambda x: x.replace('/',np.nan))

If you later want to replace the NaN values, you can use the fillna() function, passing the value that will replace NaN. fillna() returns a new dataframe, so assign the result:

novodf = novodf.fillna(0)

This command replaces the NaN values with 0.
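The steps above can be put together in a small sketch. The data here is made up for illustration (the column names "a" and "b" and the variable preenchido are not from the question), and it calls DataFrame.replace directly instead of going through apply, which should give the same result for whole-cell matches:

```python
import numpy as np
import pandas as pd

# Hypothetical data: "/" marks a missing value, as described in the question.
antigodf = pd.DataFrame({"a": ["1.5", "/", "3.0"], "b": ["/", "2", "/2.53"]})

# replace() matches whole cell values by default,
# so the cell "/2.53" is left untouched.
novodf = antigodf.replace("/", np.nan)

# fillna() returns a new dataframe; the original is not modified.
preenchido = novodf.fillna("0")

print(novodf)
print(preenchido)
```

If the dataframe comes from pd.read_csv, its na_values parameter (for example na_values="/") is another way to mark "/" as missing while the file is being read.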

Here is a link to Paulo Vasconcellos' website, which has great tips on pandas and data: Paulovasconcellos

I hope I’ve helped.

Thank you.

Claudio.

  • Thanks Claudio!

  • I was unsuccessful using the expression novodf = antigodf.apply(lambda x: x.replace('/','0')). No execution error occurs, but it does not replace the character "/".

1


If the desired data type is numerical, you can, after opening the file, do this:

df['coluna']=pd.to_numeric(df['coluna'], errors='coerce')

This converts the existing values to numbers and turns the '/' entries into NaN in a single step. Then, if you want a number instead of NaN, you can use the fillna() function described by Claudio Gonçalves Filho.
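A minimal sketch of this approach, using made-up values for the column (the variable numerico is only there to show the intermediate result). Note that with errors='coerce' anything that cannot be parsed as a number becomes NaN, including a value such as "/2.53":

```python
import pandas as pd

# Hypothetical column mixing numbers, the "/" marker and a typo.
df = pd.DataFrame({"coluna": ["1.5", "/", "3", "/2.53"]})

# Unparseable values ("/" and "/2.53") are coerced to NaN.
numerico = pd.to_numeric(df["coluna"], errors="coerce")

# Optionally replace the NaN values with a number.
df["coluna"] = numerico.fillna(0)

print(df)
```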

  • Hi Paula, thank you so much.

  • It worked!!!! However, there are also cases where the field mixes "/" with a number because of a typo. Example: "/2.53". Is there any way to delete only the "/" characters? Or to replace "/" with an empty string ("")?

  • Hey Mauricio, you can use the lstrip() function, iterating over the columns of your dataframe. For a single column it would be: novodf['Coluna'] = novodf['Coluna'].str.lstrip('/') And iterating over all columns: for coluna in novodf.columns: 
 novodf[coluna] = novodf[coluna].str.lstrip('/')
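The loop in the comment above could look like the sketch below. The data and the column names "x" and "n" are made up, and the check that skips numeric columns is an addition of mine, since the .str accessor raises an error on columns that do not hold text. After stripping the leading "/", the values are converted with pd.to_numeric as in the answer:

```python
import pandas as pd

# Hypothetical data: "x" is text with stray "/" characters, "n" is already numeric.
novodf = pd.DataFrame({"x": ["/2.53", "/", "4"], "n": [1, 2, 3]})

for coluna in novodf.columns:
    # .str is only available for text columns, so skip numeric ones.
    if pd.api.types.is_numeric_dtype(novodf[coluna]):
        continue
    # Remove leading "/" characters: "/2.53" -> "2.53", "/" -> "".
    sem_barra = novodf[coluna].str.lstrip("/")
    # Convert to numbers; the empty string left from a lone "/" becomes NaN.
    novodf[coluna] = pd.to_numeric(sem_barra, errors="coerce")

print(novodf)
```

lstrip('/') only removes the character at the start of the string; if a "/" can appear elsewhere in the value, .str.replace('/', '', regex=False) would remove every occurrence.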
