How to change duplicate data in a dataframe?

I am trying to automate a process that I currently do manually in Excel: extract the company's employee base from an Excel file, select some specific columns (because the file is too large), remove certain hierarchy levels, and filter some companies. So far it works, and any suggestions for improving it are very welcome. However, the name column contains some duplicate names that really belong to different people, so the duplicates must be kept. In Excel I put a "." at the end of each duplicate name to differentiate them. Can I do the same in Python? I am using Google Colab.

Note: when I run it, I get two warnings:

  1. WARNING *** file size (7827463) not 512 + multiple of sector size (512)
  2. /usr/local/lib/python3.6/dist-packages/ipykernel_launcher.py:6: UserWarning: Boolean Series key will be reindexed to match DataFrame index.

Is this normal?

import pandas as pd

view = pd.read_excel("/content/View.xls")
filtro = view['Emp'] < 4
filtro2 = view['Hierarquia Cargo'] > 3
view1 = view[filtro]
view5 = view1[filtro2]
bd = view5[['Nome', 'Emp', 'EST', 'Matr', 'Nome Estabelecimento', 'Descr Unid Lotacao', 'Descr CC', 'Desc Afast']]
bd = bd.sort_values(by='Nome', ascending=True)
display(bd)

1 answer


Yes, you can rename the duplicates. See below.

Creating a test DataFrame

>>> import pandas as pd

>>> df = pd.DataFrame({"frutas": ["banana", "goiaba", "laranja", "banana", "uva", "laranja", "banana"]})

>>> df
    frutas
0   banana
1   goiaba
2  laranja
3   banana
4      uva
5  laranja
6   banana

Renaming duplicates

>>> df["frutas"] = df.frutas.where(~df.frutas.duplicated(), df.frutas + '.')

>>> df
     frutas
0    banana
1    goiaba
2   laranja
3   banana.
4       uva
5  laranja.
6   banana.

Notice that there is now one banana and two banana. entries. This mirrors the case where several people share the same name.
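To see which rows still collide after one pass, a small check can help. This is only a sketch on the same test data; `still_duplicated` is a name chosen here for illustration.

```python
import pandas as pd

df = pd.DataFrame({"frutas": ["banana", "goiaba", "laranja", "banana", "uva", "laranja", "banana"]})

# One renaming pass: every repeat after the first occurrence gets a trailing dot.
df["frutas"] = df.frutas.where(~df.frutas.duplicated(), df.frutas + ".")

# keep=False flags every member of a duplicate group, not only the later ones.
still_duplicated = df[df.frutas.duplicated(keep=False)]
print(still_duplicated)
```

An empty result would mean every value is already unique.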

Running the same line once more separates the second banana. entry:

>>> df["frutas"] = df.frutas.where(~df.frutas.duplicated(), df.frutas + '.')

>>> df
     frutas
0    banana
1    goiaba
2   laranja
3   banana.
4       uva
5  laranja.
6  banana..
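If a name can repeat many times, re-running the line until nothing collides gets tedious. One possible alternative, not part of the approach above, is to count each repeat with `groupby(...).cumcount()` and append that many dots in a single pass. The sketch below applies it to a small made-up frame that borrows the `Nome`, `Emp` and `Hierarquia Cargo` column names from the question; the rows are invented for illustration. It also combines both filters into one mask built from the same DataFrame. That avoids the "Boolean Series key will be reindexed" warning, which appears when a mask computed on `view` is applied to the already filtered `view1`.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
view = pd.DataFrame({
    "Nome": ["Ana", "Bruno", "Ana", "Carla", "Ana", "Bruno"],
    "Emp": [1, 2, 3, 5, 1, 2],
    "Hierarquia Cargo": [4, 5, 6, 7, 2, 4],
})

# Build one mask on the original frame so its index matches the frame it filters.
mask = (view["Emp"] < 4) & (view["Hierarquia Cargo"] > 3)
bd = view.loc[mask, ["Nome", "Emp"]].sort_values(by="Nome")

# cumcount() numbers the rows within each name: 0 for the first, 1 for the second, ...
repeat_number = bd.groupby("Nome").cumcount()

# Append one dot per earlier occurrence, so every name ends up unique.
bd["Nome"] = bd["Nome"] + repeat_number.map(lambda n: "." * n)
print(bd)
```

On the fruit example this would give the same result as running the `where` line twice (banana, banana., banana..), without needing to know in advance how many passes are required.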
