I see two distinct cases here:
Case 1: Every value of Nomes that contains the comma delimiter (,) is necessarily a repeated name.
Case 2: A value of Nomes that contains a comma is not necessarily a repeated name.
Solutions
Case 1
import pandas as pd
df = pd.DataFrame({'Codigo': [1, 2, 3, 4], 'Nomes': ['Alan Silva, Alan Silva', 'Carlos Santos, Carlos Santos', 'Joao Pedro', ' João Pedro']})
# Keep only the text before the first comma
df['Nomes'] = df['Nomes'].apply(lambda nome: nome.split(',')[0])
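As a possible alternative for Case 1, the same idea can be sketched with pandas' vectorized string methods instead of apply. The extra .str.strip() is my own assumption, not part of the solution above: it trims stray whitespace such as the leading space in ' João Pedro', which the lambda version leaves in place.

```python
import pandas as pd

df = pd.DataFrame({
    'Codigo': [1, 2, 3, 4],
    'Nomes': ['Alan Silva, Alan Silva', 'Carlos Santos, Carlos Santos',
              'Joao Pedro', ' João Pedro'],
})

# Split each value on the comma, keep the first piece, then trim whitespace
# (the trimming step is an assumption added for this sketch)
df['Nomes'] = df['Nomes'].str.split(',').str[0].str.strip()
print(df['Nomes'].tolist())
```

Like the lambda, this is only safe when every comma marks a repeated name.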
Case 2
>>> df = pd.DataFrame({'Codigo': [1, 2, 3, 4],'Nomes': ['Alan Silva, Alan Silva', 'Carlos Santos, Carlos Santana', 'Joao Pedro', ' João Pedro']})
>>> df
Codigo Nomes
0 1 Alan Silva, Alan Silva
1 2 Carlos Santos, Carlos Santana # Note that the second surname is Santana
2 3 Joao Pedro
3 4 João Pedro
For this case, define a function that checks whether the two names are equal and returns the appropriate value.
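def verifica_repetido(nomes):
    # Split on the comma; a value without a comma is returned unchanged
    l_nomes = nomes.split(",")
    if len(l_nomes) == 1:
        return nomes
    # Compare the first two names, ignoring surrounding whitespace
    if l_nomes[0].strip() == l_nomes[1].strip():
        return l_nomes[0]
    return nomes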
Then proceed as in the first case, but pass the function to apply instead of the lambda.
df['Nomes'] = df['Nomes'].apply(verifica_repetido)
Note: A named function can be passed to apply in exactly the same way as the lambda.
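If a value might hold more than two comma-separated names, the pairwise check could be generalized. The sketch below is only one possible approach: remove_repetidos is a hypothetical helper name, and unlike the function above it also trims whitespace and rejoins the remaining names with ', '.

```python
import pandas as pd

def remove_repetidos(nomes):
    # Hypothetical helper: trim each comma-separated name
    partes = [parte.strip() for parte in nomes.split(',')]
    # dict.fromkeys keeps the first occurrence of each name, in order
    unicos = list(dict.fromkeys(partes))
    return ', '.join(unicos)

df = pd.DataFrame({
    'Codigo': [1, 2, 3, 4],
    'Nomes': ['Alan Silva, Alan Silva', 'Carlos Santos, Carlos Santana',
              'Joao Pedro', ' João Pedro'],
})

df['Nomes'] = df['Nomes'].apply(remove_repetidos)
print(df['Nomes'].tolist())
```

Names that differ, such as Carlos Santos and Carlos Santana, are both kept, so Case 2 is still respected.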
Perfect! Both solutions work perfectly. Thank you.
– Alan Teixeira