I am preprocessing my data with the Python pandas library.
The project is to train an algorithm to predict "roles".
This is the output I get when I run:
print(vagas.role_name.value_counts())
Security Entry 9300
Retail Entry 6562
Healthcare 5884
Food & Hospitality 2559
Unmatched Role 1922
Security Experienced 1481
Education 541
Corporate Experienced 538
Retail Experienced 309
Service Technician 188
Transportation 183
Sales 175
Software & Technology 148
General Labor 128
Corporate Entry 110
Tire Sales & Service 44
Insurance Sales Agent REFERRAL ONLY 33
Test and Referrals ONLY 29
Customer Service 28
Insurance Sales Agent In Person 18
Insurance Sales Agent REFERRAL ONLY - Reliable Life Insurance 17
Security Officer 12
Security Guard (Road Guard) 9
Insurance Sales Agent In Person - Reliable Life Insurance 8
Insurance Sales Agent Phone Interview - Reliable Life Insurance 5
Insurance Sales Agent In Person - Union National Life Insurance 4
Insurance Sales Agent Phone Interview 3
Guest Service Call Center Representative - example role only 3
Manager in Training - example role only 3
Visual Merchandiser at Forever 21 2
Sales Associate at Forever 21 LIVE 2
DO NOT USE 2
Truck Driver - CDL 1
TRAINING ROLE ONLY 1
Experienced Material Handler MCkesson (4+years) 1
Tire service Technician 1
Assistant Visual Manager at Forever 21 1
test role MH 1
Sales & Administration 1
Lead of Service - Fashion LIVE 1
Delivery Professional 1
Security Officer - Armed 1
Co Manager at Forever 21 1
Name: role_name, dtype: int64
I want to remove all "roles" that have fewer than 100 rows.
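One possible way to do this filtering is sketched here. It is only a minimal sketch: the small `vagas` DataFrame is a made-up stand-in for the real data, and `MIN_ROWS`, `frequent_roles` and `vagas_filtered` are names chosen for illustration. The idea is to take the counts from `value_counts()`, keep the role names whose count reaches the threshold, and select only the rows with those roles using `isin`.

```python
import pandas as pd

# Made-up stand-in for the real `vagas` DataFrame (assumption for this sketch).
vagas = pd.DataFrame({
    "role_name": (
        ["Security Entry"] * 150
        + ["Healthcare"] * 120
        + ["DO NOT USE"] * 2
        + ["Sales & Administration"] * 1
    )
})

MIN_ROWS = 100  # roles with fewer rows than this are dropped

# Number of rows per role, as in the output shown in the question.
counts = vagas["role_name"].value_counts()

# Names of the roles that have at least MIN_ROWS rows.
frequent_roles = counts[counts >= MIN_ROWS].index

# Keep only the rows whose role is one of the frequent roles.
vagas_filtered = vagas[vagas["role_name"].isin(frequent_roles)]

print(vagas_filtered["role_name"].value_counts())
```

With the real data, only the first three statements after the stand-in DataFrame would be needed. Rows with a missing `role_name` are dropped as well, because `value_counts()` ignores missing values by default.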
See if my answer solves your problem.
– Terry