I have a dataframe in which I would like to replace the 0/1 encoding with yes and no. Several columns of the df
use this encoding, so I wrote the following command:
dados_trabalho = dados_trabalho.replace({"ASSINTOM": {0: "Sim", 1 : "Não"}}).replace({"DOR ATIPICA": {0: "Sim", 1 : "Não"}}).replace({"IAM": {0: "Sim", 1 : "Não"}}).replace({"HAS": {0: "Sim", 1 : "Não"}}).replace({"DM": {0: "Sim", 1 : "Não"}}).replace({"DISPLIP": {0: "Sim", 1 : "Não"}}).replace({"DOR TIPICA": {0: "Sim", 1 : "Não"}})
It runs correctly and replaces the values in the listed columns with the new encoding, but I would like to know if there is a shorter way to write this so that the script doesn't get huge.
I tried to create the function:
def change_columns(df):
    c = df.columns
    df = df.replace({c: {0: "Sim", 1 : "Não"}})
The problem is that when I pass the dataframe to this function, the following error occurs:
change_columns(dados_trabalho)
TypeError Traceback (most recent call last)
<ipython-input-141-43eb9316b19b> in <module>
----> 1 change_columns(dados_trabalho)
<ipython-input-140-9fbbd4e9e293> in change_columns(df)
1 def change_columns(df):
2 c = df.columns
----> 3 df = df.replace({c: {0: "Sim", 1 : "Não"}})
/usr/lib/python3/dist-packages/pandas/core/indexes/base.py in __hash__(self)
2060
2061 def __hash__(self):
-> 2062 raise TypeError("unhashable type: %r" % type(self).__name__)
2063
2064 def __setitem__(self, key, value):
TypeError: unhashable type: 'Index'
I'm just starting out with Python, so I believe I'm missing something.
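For context, the traceback points at the dictionary key: `df.columns` is a pandas `Index`, and dictionary keys must be hashable, which an `Index` is not. The individual column labels are hashable, so they can be used as keys one at a time. A minimal sketch of that difference, using a small made-up dataframe (the two column names are borrowed from the question, the values are illustrative):

```python
import pandas as pd

# Tiny illustrative dataframe; the values are made up.
df = pd.DataFrame({"IAM": [0, 1], "HAS": [1, 0]})

# The whole Index cannot be hashed, so it cannot be a dict key.
try:
    hash(df.columns)
    index_is_hashable = True
except TypeError:
    index_is_hashable = False

# Each individual column label (a plain string here) can be hashed.
labels_are_hashable = all(isinstance(hash(label), int) for label in df.columns)

print(index_is_hashable, labels_are_hashable)
```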
RESOLVED:
I was able to solve the problem with the following code:
import pandas
def change_columns(df, cols):
    for col_name in cols:
        df = df.replace({col_name: {0:'sim', 1:'nao'}})
    return df
# create sample data
df = pandas.DataFrame([[0, 0, 1, 0, 1, 1], [1, 0, 1, 0, 1, 0]])
print('Starting DataFrame:')
print(df)
# define columns to do the replacement
columns_to_replace = [0, 2, 3]
# perform the replacement
df = change_columns(df, columns_to_replace)
# see the result
print('After processing DataFrame: ')
print(df)
Running the code above should produce the result:
Starting DataFrame:
   0  1  2  3  4  5
0  0  0  1  0  1  1
1  1  0  1  0  1  0
After processing DataFrame:
     0  1    2    3  4  5
0  sim  0  nao  sim  1  1
1  nao  0  nao  sim  1  0
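As a possible variation on the loop above, the per-column dictionaries can be built with a dict comprehension and passed to `replace` in a single call. This is only a sketch: the extra `mapping` parameter is an addition for illustration and is not part of the original function.

```python
import pandas as pd

def change_columns(df, cols, mapping):
    # Build {column: mapping} for every selected column and replace in one call.
    return df.replace({col: mapping for col in cols})

# Same sample data as in the answer above.
df = pd.DataFrame([[0, 0, 1, 0, 1, 1], [1, 0, 1, 0, 1, 0]])

# Only columns 0, 2 and 3 are touched; the others keep their 0/1 values.
result = change_columns(df, [0, 2, 3], {0: "sim", 1: "nao"})
print(result)
```

Columns that are not listed in `cols` are left as they were, which is the behaviour asked for in the question.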
Igor, thanks for the answer. The problem is that I only want to apply this to these columns. The df is bigger and has more columns. When I run it this way, it replaces the values in all columns and not only in the ones I selected.
– Pedro Henrique
Wouldn't it solve the problem if you did
df[['coluna1', 'coluna2']].replace(["Sim", "Não"], [1, 0])
? That way you select only the columns you want.
– Igor Cavalcanti
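A sketch of how the column-selection suggestion might look for the direction asked in the question (0/1 to "Sim"/"Não"). Note that `replace` returns a new object, so the result has to be assigned back to the selected columns. The dataframe below is made up, and the `IDADE` column is a hypothetical extra column that should stay untouched.

```python
import pandas as pd

# Illustrative data; "IDADE" is a hypothetical column that must not be changed.
df = pd.DataFrame({"IAM": [0, 1, 1], "HAS": [1, 0, 0], "IDADE": [1, 0, 57]})

# Columns that use the 0/1 encoding.
cols = ["IAM", "HAS"]

# Replace only inside the selected columns and write the result back.
df[cols] = df[cols].replace({0: "Sim", 1: "Não"})
print(df)
```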