How to remove duplicate names in rows from a python dataframe?

Question

Asked 4 years, 1 month ago

Hello, everybody.

I have the following dataframe:

import pandas as pd

df = pd.DataFrame({
    'Codigo': [1, 2, 3, 4],
    'Nomes': ['Alan Silva, Alan Silva', 'Carlos Santos, Carlos Santos', 'Joao Pedro', ' João Pedro'],
})
df

   Codigo                         Nomes
0       1        Alan Silva, Alan Silva
1       2  Carlos Santos, Carlos Santos
2       3                    Joao Pedro
3       4                    João Pedro

Is there any way to remove duplicate names? The output I need is:

   Codigo          Nomes
0       1     Alan Silva
1       2  Carlos Santos
2       3     Joao Pedro
3       4     João Pedro

1 answer

by Paulo Marques • **3,739** points · Answer 1 · 2021-06-25T20:06:21+00:00

I see two distinct cases here:

Case 1: Every value of Nomes that contains a comma (,) is guaranteed to be a repeated name.
Case 2: A value of Nomes that contains a comma is not necessarily a repeated name.

Solutions

Case 1

import pandas as pd

df = pd.DataFrame({'Codigo': [1, 2, 3, 4], 'Nomes': ['Alan Silva, Alan Silva', 'Carlos Santos, Carlos Santos', 'Joao Pedro', ' João Pedro']})

# Keep only the text before the first comma.
df['Nomes'] = df['Nomes'].apply(lambda nome: nome.split(',')[0])
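
The same Case 1 idea can also be sketched with pandas' vectorized string methods instead of apply. This is only an illustration under the Case 1 assumption (anything after a comma is a repeat). The final .str.strip() is an addition not present in the lambda above; it also removes the leading space in ' João Pedro'.

```python
import pandas as pd

df = pd.DataFrame({
    'Codigo': [1, 2, 3, 4],
    'Nomes': ['Alan Silva, Alan Silva', 'Carlos Santos, Carlos Santos', 'Joao Pedro', ' João Pedro'],
})

# Split on commas, keep the first piece, then trim surrounding whitespace.
# The .str.strip() step is an assumption about the desired output.
df['Nomes'] = df['Nomes'].str.split(',').str[0].str.strip()
```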

Case 2

>>> df = pd.DataFrame({'Codigo': [1, 2, 3, 4],'Nomes': ['Alan Silva, Alan Silva', 'Carlos Santos, Carlos Santana', 'Joao Pedro', ' João Pedro']})

>>> df
   Codigo                          Nomes
0       1         Alan Silva, Alan Silva
1       2  Carlos Santos, Carlos Santana     # Note that the second name is Santana
2       3                     Joao Pedro
3       4                     João Pedro

For this case, define a function that checks whether the two names are equal and returns the appropriate value.

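def verifica_repetido(nomes):
    # Split the cell into its comma-separated names.
    l_nomes = nomes.split(",")
    # No comma: there is nothing to deduplicate.
    if len(l_nomes) == 1:
        return nomes
    # Compare the first two names, ignoring surrounding whitespace.
    if l_nomes[0].strip() == l_nomes[1].strip():
        return l_nomes[0]
    return nomes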

Then proceed as in Case 1, but pass this function to apply instead of the lambda.

df['Nomes'] = df['Nomes'].apply(verifica_repetido)

Note: In Case 1 you can likewise use a named function in place of the lambda.
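
If a cell might hold more than two names, the check above could be generalized. The following is a minimal sketch, not part of the original answer: the function name remove_repetidos is hypothetical, and it assumes names are separated by commas and that two names count as duplicates only when they match exactly after trimming whitespace.

```python
def remove_repetidos(nomes):
    """Drop repeated names from a comma-separated string, keeping first-seen order."""
    # Trim each piece and skip empty ones (for example, from a trailing comma).
    partes = [parte.strip() for parte in nomes.split(',') if parte.strip()]
    # dict.fromkeys keeps the first occurrence of each name, in insertion order.
    return ', '.join(dict.fromkeys(partes))
```

It would be applied the same way as before: df['Nomes'] = df['Nomes'].apply(remove_repetidos). Because the comparison is exact, 'Joao Pedro' and 'João Pedro' are still treated as different names.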