Remove less frequent rows from a pandas DataFrame

Question

Asked 5 years, 3 months ago

I have a DataFrame with more than 13,000 rows and would like to remove some of them based on how often their value in the column named variedade appears.

df.variedade.value_counts()

RB867515    5084
SP813250    2500
RB855453     981
others       849
RB855156     750
RB855536     633
SP832847     561
RB835054     541
SP801842     423
SP835073     326
RB835486     253
RB845210     199
SP803280     187
RB72454      164
RB966928     146
Name: variedade, dtype: int64

I would like to keep only the 3 most frequent varieties and drop the rest, which would reduce the DataFrame to just over 8,000 rows.

I tried the command:

v = df[['variedade']]
df[v.replace(v.apply(pd.Series.value_counts)).gt(900).all(1)]

However, when I run value_counts on the variedade column again, I still have more than 13,000 rows. Does anyone have any idea where I'm going wrong?

Rafael, if my answer solved your problem, you can mark it as accepted. To see why this matters, read: How and why to accept an answer

– Terry

2020/05/17 at 14:59

1 answer

by Terry • **889** points · Answer 1 · 2020-04-29T15:28:32+00:00

Combine value_counts with head(3).index to get the labels of the 3 most frequent values in the column. Then use isin to select the rows that contain one of those labels, and assign the result back to df so that the filter actually takes effect.

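# Labels of the 3 most frequent varieties (an Index of labels, not a boolean mask)
mask = df['variedade'].value_counts().head(3).index
# Keep only the rows whose variety is one of those labels, and reassign df
df = df.loc[df['variedade'].isin(mask)]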
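
The two lines above can be tried out on a small made-up DataFrame. This is only a sketch: the row counts are invented for illustration and are not the question's data, and the names top3 and filtered are hypothetical.

```python
import pandas as pd

# Toy data with invented counts: 5, 4, 3, 2 and 1 rows per variety
df = pd.DataFrame({
    "variedade": ["RB867515"] * 5 + ["SP813250"] * 4 + ["RB855453"] * 3
                 + ["others"] * 2 + ["RB855156"],
})

# value_counts sorts by frequency (descending), so head(3) holds the top 3
top3 = df["variedade"].value_counts().head(3).index

# isin builds a boolean Series; .loc keeps only the rows where it is True
filtered = df.loc[df["variedade"].isin(top3)]

print(filtered.shape)
print(filtered["variedade"].value_counts())
```

If several varieties tie for third place, head(3) keeps an arbitrary one of them. Assuming ties should all be kept, value_counts().nlargest(3, keep="all") is one option.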

Terry, I tried this command, but df.shape still shows more than 13,000 rows. I wanted to drop the rows that do not belong to the 3 most frequent varieties, but that did not happen.

– Rafael Mansini

2020/04/29 at 17:50

To reduce the number of rows in your DataFrame, you need to assign the result of .loc to a variable. See the edit to my answer :)

– Terry

2020/04/29 at 21:27

Thanks for the tip! It helped a lot!

– Rafael Mansini

2020/04/30 at 12:40

@Rafaelmansini If you accept my answer as the correct one, it also helps me :)

– Terry

2020/04/30 at 14:04
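
The point settled in the comments is that selecting rows returns a new DataFrame and leaves df untouched unless the result is assigned. It can be checked with a short sketch on made-up data. The threshold variant at the end is one possible reading of what the gt(900) attempt in the question was aiming for. The names min_count and per_row_count are hypothetical, and the threshold of 3 is chosen only to suit the toy data.

```python
import pandas as pd

# Toy data with invented counts: 5, 4, 3, 2 and 1 rows per label
original = pd.DataFrame(
    {"variedade": ["a"] * 5 + ["b"] * 4 + ["c"] * 3 + ["d"] * 2 + ["e"]}
)

df = original.copy()
top3 = df["variedade"].value_counts().head(3).index

# Without assignment the filtered result is built and then discarded
df.loc[df["variedade"].isin(top3)]
rows_without_assignment = len(df)

# With assignment, df now refers to the filtered rows
df = df.loc[df["variedade"].isin(top3)]
rows_with_assignment = len(df)

# Threshold variant: map each row to the frequency of its own label,
# then keep the rows whose label occurs at least min_count times
min_count = 3  # hypothetical threshold for this toy data
per_row_count = original["variedade"].map(original["variedade"].value_counts())
by_threshold = original[per_row_count >= min_count]

print(rows_without_assignment, rows_with_assignment, len(by_threshold))
```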