How to shorten a chain of .replace calls in Python

Question

Asked 6 years, 6 months ago

Viewed 199 times

2

I have a DataFrame in which I would like to replace the 0/1 encoding with yes and no. Several columns of the df use this encoding, so I wrote the following command:

dados_trabalho = dados_trabalho.replace({"ASSINTOM": {0: "Sim", 1 : "Não"}}).replace({"DOR ATIPICA": {0: "Sim", 1 : "Não"}}).replace({"IAM": {0: "Sim", 1 : "Não"}}).replace({"HAS": {0: "Sim", 1 : "Não"}}).replace({"DM": {0: "Sim", 1 : "Não"}}).replace({"DISPLIP": {0: "Sim", 1 : "Não"}}).replace({"DOR TIPICA": {0: "Sim", 1 : "Não"}})

It runs correctly and replaces the values in the listed columns with the new encoding, but I would like to know whether there is a shorter way to write it so that the script doesn't get huge.

I tried to create the function:

def change_columns(df):
    c = df.columns
    df = df.replace({c: {0: "Sim", 1 : "Não"}})

The problem is that when I pass the DataFrame to this function, the following error occurs:

change_columns(dados_trabalho)

TypeError                                 Traceback (most recent call last)
<ipython-input-141-43eb9316b19b> in <module>
----> 1 change_columns(dados_trabalho)

<ipython-input-140-9fbbd4e9e293> in change_columns(df)
      1 def change_columns(df):
      2     c = df.columns
----> 3     df = df.replace({c: {0: "Sim", 1 : "Não"}})

/usr/lib/python3/dist-packages/pandas/core/indexes/base.py in __hash__(self)
   2060 
   2061     def __hash__(self):
-> 2062         raise TypeError("unhashable type: %r" % type(self).__name__)
   2063 
   2064     def __setitem__(self, key, value):

TypeError: unhashable type: 'Index'

I'm just starting out with Python, so I believe I'm missing something.

RESOLVED:

I was able to solve the problem with the following code:

import pandas

def change_columns(df, cols):
    for col_name in cols:
        df = df.replace({col_name: {0:'sim', 1:'nao'}})
    return df

# create sample data
df = pandas.DataFrame([[0, 0, 1, 0, 1, 1], [1, 0, 1, 0, 1, 0]])
print('Starting DataFrame:')
print(df)

# define columns to do the replacement
columns_to_replace = [0, 2, 3]
# perform the replacement
df = change_columns(df, columns_to_replace)

# see the result
print('After processing DataFrame: ')
print(df)

Running the code above should produce the result:

Starting DataFrame:
   0  1  2  3  4  5
0  0  0  1  0  1  1
1  1  0  1  0  1  0
After processing DataFrame:
     0  1    2    3  4  5
0  sim  0  nao  sim  1  1
1  nao  0  nao  sim  1  0

2 answers

2

Edit: I noticed only after posting this answer that you had already solved your problem. In any case, I'll leave the answer here. Maybe it will help someone in the future.

Your problem comes from the way you pass the columns to the replace function. The column name is used as a key of the dictionary, so it must be a hashable value. Because of this, you cannot use the whole collection of columns (df.columns, a pandas Index) as a single key.
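A minimal sketch of the point above, on a small made-up frame (the column names are borrowed from the question, the values are invented): the whole Index is rejected as a dictionary key, while one key per column label is accepted by replace.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df = pd.DataFrame({"IAM": [0, 1], "HAS": [1, 0]})

# Using the whole Index as a dictionary key fails, because Index is unhashable.
try:
    mapping = {df.columns: {0: "Sim", 1: "Não"}}
    index_key_failed = False
except TypeError:
    index_key_failed = True

# Each column label is a plain string, so one key per column works.
mapping = {col: {0: "Sim", 1: "Não"} for col in df.columns}
result = df.replace(mapping)

print(index_key_failed)
print(result)
```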

Try making the following change in your function:

def change_columns(df):
    columns = df.columns
    # one {column: mapping} entry per column; the keys 0 and 1 are integers, as in the data
    df.replace({c: {0: 'Sim', 1: 'Nao'} for c in columns}, inplace=True)

The previous function is also equivalent to:

def change_columns(df):
    for c in df.columns:
        # integer keys, so that they match the 0/1 values stored in the column
        df.replace({c: {0: 'Sim', 1: 'Nao'}}, inplace=True)

Note that you need to filter the columns where the substitution should be applied; otherwise it will be performed in every column of the DataFrame.
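One possible way to do that filtering, sketched on invented data: build the dictionary only for an explicit list of columns. The names flag_columns and IDADE are assumptions made for this illustration and do not come from the question.

```python
import pandas as pd

# Hypothetical sample data: two 0/1 flag columns and one numeric column
# ("IDADE" is an invented name) whose 0 and 1 values must stay untouched.
df = pd.DataFrame({"IAM": [0, 1, 1], "HAS": [1, 0, 0], "IDADE": [0, 1, 57]})

flag_columns = ["IAM", "HAS"]   # columns that use the 0/1 encoding
mapping = {0: "Sim", 1: "Não"}  # same mapping as in the question

# The nested dictionary restricts the replacement to the listed columns.
result = df.replace({col: mapping for col in flag_columns})

print(result)
```

Because replace is called without inplace=True here, df itself is left unchanged and the recoded frame is returned as result.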

Thank you very much Bruno

– Pedro Henrique

2019/01/27 at 21:54

by Igor Cavalcanti • **478** points · Answer 1 · 2019-01-23T19:28:17+00:00

Well, according to the pandas documentation, you can do the replacement across the whole DataFrame at once, as follows:

df = df.replace(["Sim", "Não"], [1, 0])

Note that this replaces every occurrence of the values "Sim" and "Não" with 1 and 0, respectively, throughout the DataFrame, so there is no need to go column by column.
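The same list form can be written in the direction the question asks for (0 to "Sim", 1 to "Não"). The sketch below uses invented data, with IDADE as a hypothetical non-flag column, to show that the replacement also reaches columns that merely happen to contain a 0 or a 1.

```python
import pandas as pd

# Hypothetical sample data; "IDADE" is an invented non-flag column.
df = pd.DataFrame({"IAM": [0, 1], "IDADE": [1, 40]})

# List form: the i-th value of the first list becomes the i-th value of the second.
result = df.replace([0, 1], ["Sim", "Não"])

print(result)
```

In this sketch the 1 stored in IDADE is recoded as well, so the whole-frame form is only safe when every column uses the 0/1 encoding.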