Why use double brackets in Pandas?

Viewed 158 times

1

Given the following DataFrames:

import pandas as pd

df = pd.DataFrame([[1, 2, 1], [4, 5, 2], [1, 2, 3]],
     columns=['coluna1', 'coluna2','id'])

df2 = pd.DataFrame([[1, 7, 1], [4, 'a', 2], [1, 'abc', 3]],
     columns=['coluna3','coluna4', 'id'])

I want to merge them, but bring in only coluna3 from df2.

In case I use:

df = df.merge(df2['coluna3','id'], on='id', how='left')

I get the following error:

KeyError: ('coluna3', 'id')

But if I use two pairs of brackets ( [[]] ) instead of one ( [] ) to select the columns I want, it works normally. Why is that?

df = df.merge(df2[['coluna3','id']], on='id', how='left')

1 answer

3


In practice, there is no difference between the two operators: both forms call the same method, __getitem__, according to the pandas documentation, which you can see here.

It is easier to understand this equivalence by looking at an example. To reproduce the result of df['coluna1'] by calling __getitem__ directly, just do:

df.__getitem__('coluna1')

returning:

0    1
1    4
2    1
Name: coluna1, dtype: int64

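Note that, because __getitem__ is a function, if you call it as df.__getitem__('coluna1', 'coluna2'), Python treats them as two arguments, but the function expects only one. That’s why doing so raises the error __getitem__() takes 2 positional arguments but 3 were given (the first argument is self, the DataFrame itself). With the bracket syntax, Python instead packs the comma-separated values into a single tuple, so df2['coluna3', 'id'] looks for one column labeled ('coluna3', 'id'), which does not exist. That is where the KeyError in the question comes from.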

However, although __getitem__ does not accept more than one string besides the DataFrame itself, it does accept a list of strings as a single argument, as you can see by running df.__getitem__(['coluna1','coluna2']).
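As a rough illustration of the point above, here is a small sketch using the df2 from the question. The variable names one_column, two_columns and tuple_lookup_failed are only made up for this example.

```python
import pandas as pd

df2 = pd.DataFrame([[1, 7, 1], [4, 'a', 2], [1, 'abc', 3]],
                   columns=['coluna3', 'coluna4', 'id'])

# A single string selects one column and returns a Series.
one_column = df2['coluna3']

# A list of strings is still ONE argument; it returns a DataFrame
# containing only the listed columns.
two_columns = df2[['coluna3', 'id']]

# Without the inner brackets, the key is the tuple ('coluna3', 'id'),
# which pandas treats as a single column label that does not exist.
tuple_lookup_failed = False
try:
    df2['coluna3', 'id']
except KeyError:
    tuple_lookup_failed = True

print(type(one_column).__name__, type(two_columns).__name__, tuple_lookup_failed)
```

The inner brackets in df2[['coluna3', 'id']] are therefore just an ordinary Python list, which is why that form works in the merge.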

Anyway, that’s basically the explanation. In a sense there is no separate [[]] operator: [] is only syntax that calls a function accepting a single argument, and in the double-bracket form that single argument happens to be a list.
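To see what the brackets actually hand over to __getitem__, here is a minimal sketch in plain Python with no pandas involved. The class ShowKey is hypothetical and exists only for this illustration; it simply returns whatever key it receives.

```python
class ShowKey:
    """Hypothetical helper: returns the key exactly as [] passed it in."""

    def __getitem__(self, key):
        return key


probe = ShowKey()

single = probe['coluna3']            # one string
pair = probe['coluna3', 'id']        # Python packs this into one tuple
listed = probe[['coluna3', 'id']]    # one list, written explicitly

print(type(single).__name__, type(pair).__name__, type(listed).__name__)
```

In every case __getitem__ receives exactly one key; only the type of that key changes, and it is up to the library (here, pandas) to decide what each type means.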

Read more about __getitem__ here
