How to compare parts of two DataFrame columns to build a filter


0

I have a DataFrame with multiple columns. I need to filter it on two of them, keeping the rows where the end of the string in one column matches the end of the string in the other. For example:

Item 1           Item 2       Item 3
carro do joão    quitado     casa do joão
carro do josé    quitado     casa do antonio
carro do thiago  quitado     casa do thiago

I need to select only the rows where the owner is the same in Item 1 and Item 3.

I tried it this way, but it only returns a list [John, John], and I can't think of an alternative.

teste = []

for row in dados.iterrows():
    teste += dados['Item 1'].str.split('do ')[1][1] == dados['Item 3'].str.split('do ')[1][1]

2 answers

0


An alternative is to use apply to pull out the part you want from each column and then compare the two Series.

import pandas as pd

df = pd.DataFrame({'Item 1': ['carro do joão', 'carro do josé', 'carro do thiago'],
                   'Item 2': ['quitado', 'quitado', 'quitado'],
                   'Item 3': ['carro do joão', 'carro do marcelo', 'carro do thiago']})
# Boolean mask: True where the text after 'do ' is the same in both columns
teste = df['Item 1'].str.split('do ').apply(lambda x: x[1]) == df['Item 3'].str.split('do ').apply(lambda x: x[1])
df[teste]

Or use .str.extract to take everything after a do/da/de/das/dos and compare:

teste = df['Item 1'].str.extract('.*d.+ (.*)', expand=False) == df['Item 3'].str.extract('.*d.+ (.*)', expand=False)
df[teste]
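
As a rough, self-contained sketch of the same idea on the question's own data: if the owner is always the last word of the string (an assumption that breaks for multi-word names), the comparison can be done with the .str accessor alone, without apply or a regex. The helper name owner and the variables mask and resultado are made up for illustration.

```python
import pandas as pd

# Data copied from the table in the question.
dados = pd.DataFrame({
    'Item 1': ['carro do joão', 'carro do josé', 'carro do thiago'],
    'Item 2': ['quitado', 'quitado', 'quitado'],
    'Item 3': ['casa do joão', 'casa do antonio', 'casa do thiago'],
})


def owner(col: pd.Series) -> pd.Series:
    """Hypothetical helper: last whitespace-separated word of each string.

    Assumes the owner's name is a single word at the end of the text.
    """
    return col.str.split().str[-1]


# Element-wise comparison yields a boolean Series usable as a row filter.
mask = owner(dados['Item 1']) == owner(dados['Item 3'])
resultado = dados[mask]
print(resultado)
```

This keeps the rows for joão and thiago and drops the josé/antonio row. Because the comparison is element-wise over whole columns, no loop over iterrows() is needed.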

-1

I'm a little rusty, but would something like this work?

test += data[data['Item 1'].str.split('do')[1][1] == data['Item 3'].str.split('do')[1][1]]
