In Python, how do I check whether the contents of one column are present in another column?

I’m working with Python in a Jupyter Notebook.

My current dataframe has the following columns: data.columns = ['name', 'filename', 'text'].

[Image: the dataframe]

All columns are of string type. I want to take the 'name' column and check whether each of its values is found exactly somewhere in the string of the 'text' column. The result should include all checked records, whether found or not, as shown below.

[Image: the expected result]

Afterwards, I want to export this result to CSV.

  • This code gives an error. I have a script that returns only the records that were found (data[['name' in x for x in data['text']]]), but I need all of them.
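A possible reason the expression in the comment above returns only the matches, sketched with hypothetical data (the rows below are invented; only the column names come from the question): `'name' in x` searches each text for the literal string 'name' rather than for the values of the 'name' column, and indexing the dataframe with the resulting boolean list keeps only the rows marked True.

```python
import pandas as pd

# Hypothetical data; only the column names are taken from the question.
data = pd.DataFrame({
    "name": ["alpha", "beta"],
    "filename": ["a.txt", "b.txt"],
    "text": ["the name alpha appears", "no match in this row"],
})

# This tests for the literal string 'name' in each text,
# not for the value stored in the 'name' column of that row.
literal_mask = ['name' in x for x in data['text']]

# Indexing with a boolean list keeps only the True rows,
# so the records that were not found are dropped.
only_matches = data[literal_mask]
```

Storing the mask as a new column instead of indexing with it keeps every record, which is what the question asks for.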

1 answer

I don’t have your data, so I created my own. See below:

>>> import pandas as pd

>>> df = pd.DataFrame({"nome": ["teste 1", "teste 2", "teste 3"], "nome_arquivo": ["um arquivo", "dois arquivos", "tres adivinha"], "texto": ["Aqui vc encontra um arquivo", "Aqui nao tem o texto", "tres adivinha está aqui"]})

>>> df
      nome   nome_arquivo                        texto
0  teste 1     um arquivo  Aqui vc encontra um arquivo
1  teste 2  dois arquivos         Aqui nao tem o texto
2  teste 3  tres adivinha      tres adivinha está aqui

>>> df["encontrado"] = df.apply(lambda x: x.nome_arquivo in x.texto, axis=1)

>>> df
      nome   nome_arquivo                        texto  encontrado
0  teste 1     um arquivo  Aqui vc encontra um arquivo        True
1  teste 2  dois arquivos         Aqui nao tem o texto       False
2  teste 3  tres adivinha      tres adivinha está aqui        True
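The question also asks for a CSV export, which the session above does not show. A minimal sketch, reusing the same sample data and assuming the hypothetical output file name resultado.csv; it also builds the column with a list comprehension over `zip`, which is an alternative to `apply`, not the method used above:

```python
import pandas as pd

# Sample data copied from the answer above.
df = pd.DataFrame({
    "nome": ["teste 1", "teste 2", "teste 3"],
    "nome_arquivo": ["um arquivo", "dois arquivos", "tres adivinha"],
    "texto": ["Aqui vc encontra um arquivo", "Aqui nao tem o texto", "tres adivinha está aqui"],
})

# Pair each value with the text of its own row and test for a substring match.
df["encontrado"] = [
    arquivo in texto for arquivo, texto in zip(df["nome_arquivo"], df["texto"])
]

# Write every row, found or not. "resultado.csv" is a hypothetical file name;
# index=False leaves the row index out of the file.
df.to_csv("resultado.csv", index=False)
```

To check the 'name' column from the question instead, the same pattern should work with `zip(data["name"], data["text"])`.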

I hope this helps.

  • Hi @Paulo, thank you very much. It partly worked: this command generates an error, but it still produces the dataframe with the result column. The error follows: A value is trying to be set on a copy of a slice from a DataFrame. Try using .loc[row_indexer,col_indexer] = value instead. See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy """Entry point for launching an IPython kernel. I’m looking into whether I can fix this one. Thanks.

  • @Perciliano, it would be better to open another question with the error, the relevant code and, if possible, the dataset. Maybe just a sample, like the one I generated for the answer above.
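The message quoted in the comments is pandas' SettingWithCopyWarning, which is a warning rather than an error, so the column can still be created. One common cause, assumed here because the original code was not posted, is that the dataframe was itself selected from a larger one. A minimal sketch with hypothetical data, taking an explicit copy before adding the column:

```python
import pandas as pd

# Hypothetical stand-in for the original data; column names follow the question.
raw = pd.DataFrame({
    "name": ["alpha", "beta", "gamma"],
    "filename": ["a.txt", "b.txt", "c.txt"],
    "text": ["contains alpha here", "nothing relevant", "gamma at the start"],
    "extra": [1, 2, 3],
})

# .copy() makes the selection an independent dataframe, so assigning
# a new column to it is unambiguous.
data = raw[["name", "filename", "text"]].copy()

# row["name"] is used instead of row.name: on a row passed by apply,
# the .name attribute is the row's index label, not the 'name' column.
data["found"] = data.apply(lambda row: row["name"] in row["text"], axis=1)
```

Whether this matches the actual situation depends on how `data` was built, which the thread does not show.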
