How can I check if a column of a dataframe is contained in a column of another dataframe?

-1

Hello, I have the following question: I have two dataframes, and I want to check whether the values of a column in one are contained in a column of the other. The two columns do not have the same name, and their values are not in the same order. One column has 22,000 rows and the other has 48,000 rows. I want to check whether each id in one column appears in the other and, if so, return only the rows that match. For example, df1 has a column with the following values:

column = [ '1', '2', '3', '4', '5']

and df2 has a column with these values:

column2 = ['1', '3']

I want to return the rows of df1 whose values appear in the df2 column. This is what I tried:

curso = cursos.where(cursos['CÓDIGO UNIDADE DE ENSINO'] == cursoAtivo['CO_UNIDADE_ENSINO']).notna()

I received the following error for the above code: Can only compare identically-labeled Series objects

  • Can you provide a minimal reproducible example?

  • Hi Lucas, I tried to explain more clearly. The names of the columns are not identical.

  • Do you want to know whether all of coluna2 is contained, or only part of it?

  • I want to know the whole column

  • As a guess, it would be something like this: >>> df = cursos.where(cursos['CÓDIGO UNIDADE DE ENSINO'].isin(cursoAtivo['CO_UNIDADE_ENSINO'])), followed by >>> curso = df.dropna(subset=["coluna"]). Note that dropna should not be called with inplace=True here, because it would then return None. Please edit the question and add a sample of the two dataframes involved so that someone can provide an accurate and documented answer.

1 answer

1

Assuming that every value of coluna2 has to be present in coluna, I believe the best way is to use set.issubset().

Creating Test Dataframes

>>> import pandas as pd
>>> df1 = pd.DataFrame({"coluna": ['1', '2', '3', '4', '5']})
>>> df2 = pd.DataFrame({"coluna2": ['1', '3']})

Dataframes

>>> df1
  coluna
0      1
1      2
2      3
3      4
4      5

>>> df2
  coluna2
0       1
1       3

Creating sets

>>> c1 = set(df1["coluna"])
>>> c2 = set(df2["coluna2"])

Checking if one column is inside another

>>> c2.issubset(c1)
True
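
The question also asks for the matching rows of df1, not only a True/False result. A minimal sketch of that step, using Series.isin as hinted at in the comments and the same test dataframes (the names mask and matches are only illustrative):

```python
import pandas as pd

df1 = pd.DataFrame({"coluna": ["1", "2", "3", "4", "5"]})
df2 = pd.DataFrame({"coluna2": ["1", "3"]})

# Boolean mask: True where the value in df1["coluna"] also appears
# somewhere in df2["coluna2"], regardless of position or index label.
mask = df1["coluna"].isin(df2["coluna2"])

# Keep only the rows of df1 that matched.
matches = df1[mask]
print(matches)
```

Because isin compares values rather than index labels, it should not raise the "identically-labeled Series" error even when the two columns have different lengths. Both columns need to hold the same type (for example, all strings or all integers) for the values to match.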

Important note: a set contains no repeated elements. Thus, a column whose list is [1, 2, 1, 2, 2, 2, 2] will have a set equal to {1, 2}. So, if you want to test whether the sequence [1, 2, 1, 2, 2, 2, 2], repetitions included, is in another dataframe, the solution presented will not work.
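
To illustrate this limitation, here is a small sketch with made-up data. It assumes that "repetitions matter" means each value must occur at least as many times in coluna as in coluna2; checking for an exact ordered sequence would need a different approach. collections.Counter is swapped in for set here:

```python
from collections import Counter

import pandas as pd

# Hypothetical data: coluna2 needs three 1s, but coluna only has two.
df1 = pd.DataFrame({"coluna": [1, 2, 1, 2, 2, 2, 2]})
df2 = pd.DataFrame({"coluna2": [1, 1, 1]})

# The set-based check ignores how often each value occurs.
set_check = set(df2["coluna2"]).issubset(set(df1["coluna"]))

# Counter subtraction keeps only positive counts, so an empty result
# means every value of coluna2 occurs at least as often in coluna.
missing = Counter(df2["coluna2"]) - Counter(df1["coluna"])
count_check = not missing

print(set_check)    # True
print(count_check)  # False
```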
