Easy counting between columns - Python

2

I'm new to Python and am having trouble automating some calculations. I have a DataFrame with 8 columns [A, B, C, D, E, F, G, H] and 150 rows.

I need to count how many times two columns are equal to each other, row by row, for every pair of columns: A==B, A==C, A==D, ..., B==C, B==D, and so on. Then I need to divide each count by the total number of rows (150) and store the results in another table. So far I have this:

condicao = (df['A']==df['B'])
sum(condicao)

x = sum(condicao)/150
print(x)

This code already gives me the result I need for one pair, but I would have to write 28 such conditions, one for each pair of columns. Any idea how to condense this?

Cheers.

  • Do the columns need to be equal in every entry, or do you want to count the number of entries that two columns have in common?

1 answer

3


(Assuming you want to count the number of equal entries for each possible pair of columns.)

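import pandas as pd
import numpy as np

# df is the DataFrame from the question (columns A to H, 150 rows)
Colunas = ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H']

# All 28 unordered pairs as two-letter labels: 'AB', 'AC', ..., 'GH'
Duplas = [Colunas[i]+Colunas[j] for i in range(8) for j in range(i+1, 8)]
Duplas = np.array(Duplas)


@np.vectorize
def ContaDuplas(Dupla):
    # Dupla[0] and Dupla[1] are the two single-letter column names
    return np.sum(df[Dupla[0]] == df[Dupla[1]])


# Fraction of rows in which each pair of columns matches
NovaTabela = pd.DataFrame({'Contagem': ContaDuplas(Duplas)/len(df)}, index=Duplas)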
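
The two-letter labels above rely on every column name being a single character. As an alternative sketch (not the method used in this answer), the same table can be built with `itertools.combinations`, which works for column names of any length. The sample `df` below is a hypothetical stand-in for the asker's data, and `fractions` and `result` are illustrative names.

```python
from itertools import combinations

import numpy as np
import pandas as pd

# Hypothetical stand-in for the asker's data: 150 rows, 8 columns of small integers.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.integers(0, 3, size=(150, 8)), columns=list("ABCDEFGH"))

# For each unordered pair of columns, the mean of the boolean comparison
# is the fraction of rows in which the two columns are equal.
fractions = {
    f"{a}{b}": (df[a] == df[b]).mean()
    for a, b in combinations(df.columns, 2)
}

# One row per pair, indexed by the pair label, in the same order as above.
result = pd.Series(fractions, name="Contagem").to_frame()
print(result.head())
```

Taking `.mean()` of the boolean Series divides by the actual number of rows, so the row count does not need to be hard-coded.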
  • 1

    This was exactly what I needed. With this new table I can assemble an array. Thank you very much.
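
If the goal is a square array with one cell per pair of columns, one possible sketch uses NumPy broadcasting to compare every column with every other column at once. This is an assumption about what the array should look like. The sample `df` is hypothetical, and `match_matrix` is an illustrative name.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the asker's data: 150 rows, 8 columns of small integers.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.integers(0, 3, size=(150, 8)), columns=list("ABCDEFGH"))

values = df.to_numpy()

# Broadcasting produces a (rows, 8, 8) boolean array in which entry [r, i, j]
# says whether columns i and j are equal in row r. Averaging over the rows
# gives the fraction of matching rows for every pair of columns.
matrix = (values[:, :, None] == values[:, None, :]).mean(axis=0)

match_matrix = pd.DataFrame(matrix, index=df.columns, columns=df.columns)
print(match_matrix.round(3))
```

The result is symmetric, and its diagonal is 1.0 because every column equals itself (as long as the data contains no NaN values).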
