How to find data based on another column with Pandas?

I have the dataframe dfa:

id   nome  
1    jose  
2    pedro  
3    maria  
3    maria  
2    pedro   
1    jose

And a list with ids:

ids = [2, 3]

I want a dataframe dfb containing the rows of dfa whose id is in ids:

id  nome  
2   pedro  
3   maria 

The duplicate rows in dfa should also be removed.

  • 3

    You are on Stack Overflow in English; please ask the question in English to avoid having it closed.

  • Why does your dfa have all values duplicated in reverse order, and why are these values not duplicated in the output, given that there are 4 rows with ids equal to 2 or 3?

  • Because one of the goals is to remove the duplicates from dfa and put in dfb only the rows whose id is in the list.

1 answer

1

Defining your dataframe:

>>> import pandas as pd

>>> d = {
...   'id': [1, 2, 3, 3, 2, 1], 
...   'name': ['jose', 'pedro', 'maria', 'maria', 'pedro', 'jose']
... }

>>> dfa = pd.DataFrame(data=d)
>>> print(dfa)

   id   name
0   1   jose
1   2  pedro
2   3  maria
3   3  maria
4   2  pedro
5   1   jose

Given a list of ids, ids = [2, 3], you can use isin to check which rows have an id in the list:

>>> ids = [2, 3]
>>> print(dfa['id'].isin(ids))

0    False
1     True
2     True
3     True
4     True
5    False
Name: id, dtype: bool

And with that, you can filter your original data:

>>> dfb = dfa[dfa['id'].isin(ids)]
>>> print(dfb)
   id   name
1   2  pedro
2   3  maria
3   3  maria
4   2  pedro

To remove duplicate records, simply use drop_duplicates:

>>> dfb = dfb.drop_duplicates()
>>> print(dfb)

   id   name
1   2  pedro
2   3  maria
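
The two steps above can also be chained into a single expression. This is only a sketch using the same dfa and ids as above; the reset_index(drop=True) call is an addition not in the steps above, and it is only needed if you want the result renumbered from 0 instead of keeping the original labels 1 and 2:

```python
import pandas as pd

dfa = pd.DataFrame({
    'id': [1, 2, 3, 3, 2, 1],
    'name': ['jose', 'pedro', 'maria', 'maria', 'pedro', 'jose'],
})
ids = [2, 3]

# Keep rows whose id is in ids, drop fully repeated rows,
# then (optionally) renumber the index starting at 0.
dfb = (
    dfa[dfa['id'].isin(ids)]
    .drop_duplicates()
    .reset_index(drop=True)
)
print(dfb)
```

Note that drop_duplicates() with no arguments compares whole rows. If two rows could share an id but differ in name and you want only one row per id, drop_duplicates(subset='id') keeps the first occurrence of each id.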
