Find common values in two different dataframes

Viewed 428 times

1

Editing the question:

I have two DataFrames of different sizes:

 df1 = pd.DataFrame({'bc': bc_1}, index=altura_1)
 df1.shape  # (73, 1)
 >>> print(df1)
             bc
  1.175441  0.002884
  1.115565  0.001905
  1.055689  0.003029
  0.995812  0.003366
  .
  .
  .

 df2 = pd.DataFrame({'bc': bc_2}, index=altura_2)
 df2.shape  # (18, 1)
 >>> print(df2)
            bc
  0.150  0.000005
  0.165  0.000007
  0.180  0.000010
  .
  .
  .

Both measure the same variable, but with two different instruments.

The plot of this data is shown below: [image]

Notice that in some places the curves (the points) are very close together. I need to find which points these are and store them in another DataFrame.

  • Could you edit the question by adding examples? Like a piece of each dataframe and the expected output?

1 answer

2


Starting from these two dataframes:

>>> print(df1)
       dados
1.1    1.123
2.2    1.567
3.3    2.001
4.4    2.345

>>> print(df2)
       dados
1.0     1.12
2.3     1.56
3.5     2.00

I compute the Euclidean distance between two points on the graph, using the index as x and the 'dados' column as y:

def calc_distancia(x1, y1, x2, y2):
    return ((x1-x2)**2 + (y1-y2)**2) ** (1/2.0)
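
As a quick sanity check, the same distance can be computed with the standard library's math.hypot. This is only a sketch: the name calc_distancia_hypot is hypothetical (it is not part of the original answer), and the sample values are the first rows of the example df1 and df2 above.

```python
import math

def calc_distancia_hypot(x1, y1, x2, y2):
    """Euclidean distance between (x1, y1) and (x2, y2)."""
    # math.hypot(dx, dy) returns sqrt(dx**2 + dy**2)
    return math.hypot(x1 - x2, y1 - y2)

# First row of df1 (1.1, 1.123) against first row of df2 (1.0, 1.12)
d = calc_distancia_hypot(1.1, 1.123, 1.0, 1.12)
print(round(d, 6))  # 0.100045
```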

Now we define df3 and compute the distance between every point of df1 and every point of df2. If the distance is below a chosen threshold, we add that pair to df3:

import pandas as pd

DISTANCIA_MAXIMA = 0.5  # maximum distance for two points to count as "close"

# Compare every point in df1 with every point in df2 and keep the close pairs
linhas = []
for index_df1, row_df1 in df1.iterrows():
    for index_df2, row_df2 in df2.iterrows():
        distancia = calc_distancia(index_df1, row_df1['dados'], index_df2, row_df2['dados'])
        if distancia < DISTANCIA_MAXIMA:
            linhas.append({'Distancia': distancia, 'X1': index_df1, 'X2': index_df2,
                           'Y1': row_df1['dados'], 'Y2': row_df2['dados']})

# DataFrame.append was removed in pandas 2.0, so build df3 from the collected rows
df3 = pd.DataFrame(linhas, columns=['Distancia', 'X1', 'X2', 'Y1', 'Y2'])

>>> print(df3)
   Distancia   X1   X2     Y1    Y2
0   0.100045  1.1  1.0  1.123  1.12
1   0.100245  2.2  2.3  1.567  1.56
2   0.200002  3.3  3.5  2.001  2.00
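
For larger inputs (the question mentions 73 and 18 rows), the nested loop can be replaced by a vectorized comparison. The following is a sketch rather than part of the original answer: it assumes NumPy is available and rebuilds the small example DataFrames so that it runs on its own.

```python
import numpy as np
import pandas as pd

# Example data from the answer above
df1 = pd.DataFrame({'dados': [1.123, 1.567, 2.001, 2.345]}, index=[1.1, 2.2, 3.3, 4.4])
df2 = pd.DataFrame({'dados': [1.12, 1.56, 2.00]}, index=[1.0, 2.3, 3.5])
DISTANCIA_MAXIMA = 0.5

x1, y1 = df1.index.to_numpy(), df1['dados'].to_numpy()
x2, y2 = df2.index.to_numpy(), df2['dados'].to_numpy()

# Broadcasting: dist[i, j] is the distance between point i of df1 and point j of df2
dist = np.hypot(x1[:, None] - x2[None, :], y1[:, None] - y2[None, :])

# Row/column positions of every pair closer than the threshold
i, j = np.nonzero(dist < DISTANCIA_MAXIMA)

df3 = pd.DataFrame({
    'Distancia': dist[i, j],
    'X1': x1[i], 'X2': x2[j],
    'Y1': y1[i], 'Y2': y2[j],
})
print(df3)
```

On the example data this selects the same three pairs as the loop version.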
  • I used the logic you suggested, but the resulting DataFrame (df3) had more than 400 rows, when it should actually be smaller than the other two. I edited the question with the data as requested. I think it is now easier to understand what I need to do. Thank you

  • @Luana it's easier now, haha. See whether you can apply the edit I made to your problem.

  • Yes! Thank you! I understood what you did, but now I need to do it another way... I need to find where the height difference (between Y1 and Y2) is minimal and store the x values corresponding to that point. My final DataFrame has to have the same number of points as my smallest DataFrame, in this case df2. Does that make sense? I'm having a hard time doing that too

  • @Luana if a new problem has arisen that needs a different answer, ask another question addressing the new problem.

  • Actually, the question is the same... it's the approach to solving it that differs a little from the one you suggested: instead of calculating the distance using x and y, I need to take only y into consideration... But thank you anyway
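
A minimal sketch of the y-only variant described in the comments above. It rests on one reading of the request, which is an assumption: for each point of the smaller DataFrame (df2), pick the df1 point whose y value is closest, so the result has exactly len(df2) rows. The names df_min and DiffY are hypothetical, and the example data from the answer is reused.

```python
import numpy as np
import pandas as pd

# Example data from the answer above
df1 = pd.DataFrame({'dados': [1.123, 1.567, 2.001, 2.345]}, index=[1.1, 2.2, 3.3, 4.4])
df2 = pd.DataFrame({'dados': [1.12, 1.56, 2.00]}, index=[1.0, 2.3, 3.5])

y1 = df1['dados'].to_numpy()
y2 = df2['dados'].to_numpy()

# diff[i, j] = |y of df2 point i - y of df1 point j|
diff = np.abs(y2[:, None] - y1[None, :])

# For each df2 point, the position of the df1 point with the smallest y difference
nearest = diff.argmin(axis=1)

df_min = pd.DataFrame({
    'X2': df2.index.to_numpy(),
    'Y2': y2,
    'X1': df1.index.to_numpy()[nearest],
    'Y1': y1[nearest],
    'DiffY': diff[np.arange(len(y2)), nearest],
})
print(df_min)
```

Note that under this reading the same df1 point can be matched to more than one df2 point; if each df1 point may be used only once, a different matching strategy would be needed.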
