First, define your DataFrame:
>>> import pandas as pd
>>> d = {
... 'id': [1, 2, 3, 3, 2, 1],
... 'name': ['jose', 'pedro', 'maria', 'maria', 'pedro', 'jose']
... }
>>> dfa = pd.DataFrame(data=d)
>>> print(dfa)
   id   name
0   1   jose
1   2  pedro
2   3  maria
3   3  maria
4   2  pedro
5   1   jose
Given a list of ids, ids = [2, 3], you can use isin to check which rows have an id that is in the list:
>>> ids = [2, 3]
>>> print(dfa['id'].isin(ids))
0    False
1     True
2     True
3     True
4     True
5    False
Name: id, dtype: bool
And with that, you can filter your original data:
>>> dfb = dfa[dfa['id'].isin(ids)]
>>> print(dfb)
   id   name
1   2  pedro
2   3  maria
3   3  maria
4   2  pedro
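As a side note, the boolean mask returned by isin can be stored in a variable and reused. The sketch below is only an illustration (the names mask, kept and dropped are not from the original answer): it applies the same filter through .loc and also negates the mask with ~ to get the rows whose id is not in the list.

```python
import pandas as pd

dfa = pd.DataFrame({
    'id': [1, 2, 3, 3, 2, 1],
    'name': ['jose', 'pedro', 'maria', 'maria', 'pedro', 'jose'],
})
ids = [2, 3]

# Boolean Series: True where the row's id is in the list
mask = dfa['id'].isin(ids)

kept = dfa.loc[mask]      # same rows as dfa[dfa['id'].isin(ids)]
dropped = dfa.loc[~mask]  # rows whose id is NOT in the list

print(kept)
print(dropped)
```

Both results keep the original index labels, so kept still carries the labels 1 to 4.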
To remove duplicate records, simply use drop_duplicates:
>>> dfb = dfb.drop_duplicates()
>>> print(dfb)
   id   name
1   2  pedro
2   3  maria
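The two steps can also be chained into one expression. The following is a minimal sketch, assuming you want them wrapped in a helper; the function name filter_unique and its column parameter are hypothetical, and the reset_index call is an optional extra that was not part of the steps above.

```python
import pandas as pd

def filter_unique(df, ids, column='id'):
    """Keep rows whose `column` value is in `ids`, then drop duplicate rows."""
    return (
        df[df[column].isin(ids)]
        .drop_duplicates()       # keeps the first occurrence of each repeated row
        .reset_index(drop=True)  # optional: renumber the index starting at 0
    )

dfa = pd.DataFrame({
    'id': [1, 2, 3, 3, 2, 1],
    'name': ['jose', 'pedro', 'maria', 'maria', 'pedro', 'jose'],
})

dfb = filter_unique(dfa, [2, 3])
print(dfb)
```

By default drop_duplicates compares all columns. If two rows should count as duplicates whenever their id matches, regardless of name, pass subset=['id'].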
You are on Stack Overflow in English; please ask the question in English to avoid having it closed. – Melissa
Why does your dfa have all its values duplicated in reverse order, and why are these values not duplicated in the output, given that there are 4 rows with ids equal to 2 or 3? – Woss
Because one of the goals is to remove the duplicates from dfa and keep in dfb only the rows whose id is in the list. – Richard Lopes