How to find the accuracy between two columns of a data frame?

I have a CSV with two columns in the same row order: one holds what was written and the other holds what should have been written. I have three questions: 1. How do I find the accuracy between them (the more rows in column 1 that match the same row in column 2, the higher the accuracy)? 2. How do I find the rows with the highest and the lowest accuracy? 3. How do I remove the rows with the lowest hit rate?

import pandas as pd
relat_int = pd.read_csv(...)  # file path not shown in the original post
relat_int.head()

When I try to use .score(), I get this error:

AttributeError                            Traceback (most recent call last)
<ipython-input-13-d63a2ac83383> in <module>
      4 e = relat_int['Intenções Reais/Esperadas']
      5 
----> 6 relat_int.score(r, e)

~\anaconda3\lib\site-packages\pandas\core\generic.py in __getattr__(self, name)
   5272             if self._info_axis._can_hold_identifiers_and_holds_name(name):
   5273                 return self[name]
-> 5274             return object.__getattribute__(self, name)
   5275 
   5276     def __setattr__(self, name: str, value) -> None:

AttributeError: 'DataFrame' object has no attribute 'score'
  • Please include a snippet of the CSV to make this easier to understand. What do you consider the accuracy between two columns? Which score method do you want to use? DataFrame does not have this method, which is why you are getting this error.

  • Thank you, Damian. It would be: Column A row 1 = Bola, Column B row 1 = Sola; Column A row 2 = Rua, Column B row 2 = Rua. In this case the program should recognize that row 1 has different values and row 2 has equal values, i.e. the hit rate (accuracy) is 50%. Did I explain it better? It's just that I'm new to programming...

1 answer

Consider a DataFrame in the following format, as in your example:

>>> df
   ColA  ColB
0  Bola  Sola
1   Rua   Rua

If you want to count how many rows have equal values in columns A and B, you could use NumPy:

import numpy

total_iguais = numpy.sum(df['ColA'] == df['ColB'])
total = len(df)
acuracia = total_iguais / total

print(f'Acuracia: {acuracia}')
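
As a possible alternative, the same ratio can be computed with pandas alone, since the mean of a boolean Series is the fraction of True values. This is only a minimal sketch: it rebuilds the two-row example above and assumes the column names ColA and ColB, which will differ in your real CSV.

```python
import pandas as pd

# Example frame rebuilt from the two rows shown above (assumed column names)
df = pd.DataFrame({'ColA': ['Bola', 'Rua'], 'ColB': ['Sola', 'Rua']})

# Element-wise comparison gives a boolean Series: True where the row matches
iguais = df['ColA'] == df['ColB']

# The mean of a boolean Series is the fraction of True values
acuracia = iguais.mean()

print(f'Acuracia: {acuracia:.0%}')  # Acuracia: 50%
```

The boolean Series `iguais` can also be used directly as a mask, for example `df[iguais]` to keep only the matching rows.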

If you instead want to know how close the two words are (Bola and Sola differ by just one letter), you would have to add some logic to compare each character, or use a similarity algorithm such as difflib's SequenceMatcher:

import difflib
s = difflib.SequenceMatcher(None, 'Bola', 'Sola')
print(s.ratio()) # 0.75
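
To cover the other two parts of the question (finding the most and least similar rows, and dropping the rows with a low hit rate), the ratio above could be applied to every row. The following is a sketch under assumptions, not a definitive solution: the helper `similaridade`, the new column of the same name, and the threshold `LIMIAR = 0.8` are all hypothetical choices made for illustration.

```python
import difflib
import pandas as pd

# Example frame rebuilt from the two rows shown above (assumed column names)
df = pd.DataFrame({'ColA': ['Bola', 'Rua'], 'ColB': ['Sola', 'Rua']})

def similaridade(a, b):
    """Hypothetical helper: SequenceMatcher ratio (0.0 to 1.0) of two values."""
    return difflib.SequenceMatcher(None, str(a), str(b)).ratio()

# One similarity score per row, stored in a new column
df['similaridade'] = [similaridade(a, b) for a, b in zip(df['ColA'], df['ColB'])]

# Rows with the highest and the lowest similarity
mais_parecida = df.loc[df['similaridade'].idxmax()]
menos_parecida = df.loc[df['similaridade'].idxmin()]

# Keep only the rows at or above an arbitrary, assumed threshold
LIMIAR = 0.8
df_filtrado = df[df['similaridade'] >= LIMIAR]

print(df)
print(df_filtrado)
```

With the example data, the Bola/Sola row scores 0.75 and is dropped by the filter, while the Rua/Rua row scores 1.0 and is kept. The right threshold depends on your data, so treat 0.8 as a starting point only.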
