How do I delete rows based on a cell's content in pandas?

I am preprocessing my data using the Python pandas library.

This is a project to train an algorithm to predict "roles".

This is the result I get when I run the following:

print(vagas.role_name.value_counts())
Security Entry                                                     9300
Retail Entry                                                       6562
Healthcare                                                         5884
Food & Hospitality                                                 2559
Unmatched Role                                                     1922
Security Experienced                                               1481
Education                                                           541
Corporate Experienced                                               538
Retail Experienced                                                  309
Service Technician                                                  188
Transportation                                                      183
Sales                                                               175
Software & Technology                                               148
General Labor                                                       128
Corporate Entry                                                     110
Tire Sales & Service                                                 44
Insurance Sales Agent REFERRAL ONLY                                  33
Test and Referrals ONLY                                              29
Customer Service                                                     28
Insurance Sales Agent In Person                                      18
Insurance Sales Agent REFERRAL ONLY - Reliable Life Insurance        17
Security Officer                                                     12
Security Guard (Road Guard)                                           9
Insurance Sales Agent In Person - Reliable Life Insurance             8
Insurance Sales Agent Phone Interview - Reliable Life Insurance       5
Insurance Sales Agent In Person - Union National Life Insurance       4
Insurance Sales Agent Phone Interview                                 3
Guest Service Call Center Representative - example role only          3
Manager in Training - example role only                               3
Visual Merchandiser at Forever 21                                     2
Sales Associate at Forever 21 LIVE                                    2
DO NOT USE                                                            2
Truck Driver - CDL                                                    1
TRAINING ROLE ONLY                                                    1
Experienced Material Handler MCkesson  (4+years)                      1
Tire service Technician                                               1
Assistant Visual Manager at Forever 21                                1
test role MH                                                          1
Sales & Administration                                                1
Lead of Service - Fashion  LIVE                                       1
Delivery Professional                                                 1
Security Officer - Armed                                              1
Co Manager at Forever 21                                              1
Name: role_name, dtype: int64

I want to remove all rows whose "role" appears in fewer than 100 rows.

  • See if my answer solves your problem.

1 answer

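You can do it using groupby with transform('count'). transform returns, for every row, the number of rows in that row's group, so you can keep only the rows whose role occurs at least 100 times (df here is your DataFrame, vagas in the question):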

df = df.loc[df.groupby('role_name')['role_name'].transform('count') >= 100]
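To see what the one-liner does, here is a minimal sketch on made-up data. The toy `vagas` frame, the `applicant_id` column, and the `MIN_ROWS` threshold of 3 are assumptions for illustration only (the question uses 100). It also shows an alternative using `value_counts` with `isin`, which should select the same rows.

```python
import pandas as pd

# Hypothetical toy data standing in for the `vagas` DataFrame from the question.
vagas = pd.DataFrame({
    "role_name": ["Security Entry"] * 5 + ["Healthcare"] * 3
                 + ["DO NOT USE"] * 2 + ["Sales"],
    "applicant_id": range(11),  # made-up column, shows that whole rows are kept
})

MIN_ROWS = 3  # assumption for the toy example; the question uses 100

# Option 1: groupby + transform gives each row the size of its own group,
# aligned to the original index, so it can be used directly as a mask.
group_size = vagas.groupby("role_name")["role_name"].transform("count")
filtered = vagas.loc[group_size >= MIN_ROWS]

# Option 2: count each role once, then keep rows whose role is frequent enough.
counts = vagas["role_name"].value_counts()
keep = counts[counts >= MIN_ROWS].index
filtered_alt = vagas[vagas["role_name"].isin(keep)]

print(filtered["role_name"].value_counts())
```

Both options keep the original index and columns, so only the rows for rare roles are dropped. If you want the result in place of the original, assign it back, e.g. `vagas = filtered`.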
