I will try to explain the analysis step by step, so that you (or anyone else with the same problem) can understand how to work through it.
First, I’ll generate two vectors, target and predicted, that simulate the result of your classification. These vectors were built from the data you posted.
The classification_report says you have 56000 samples of class 0 and 119341 samples of class 1 in your classification, so I’ll generate a vector with 56000 zeros followed by 119341 ones.
import numpy as np
class0 = 56000
class1 = 119341
total = class0 + class1
target = np.zeros(total, dtype=int)
target[class0:] = np.ones(class1, dtype=int)
# to prove that the values are correct
sum(target == 0) == class0, sum(target == 1) == class1
With that, you have the target vector, containing what your classification should have predicted. Now let’s generate predicted, containing what your classification actually reported. These numbers were taken from your confusion matrix.
class0_hit = 52624    # how many class 0 samples were predicted correctly
class0_miss = 3376    # how many class 0 samples were predicted incorrectly
class1_miss = 45307   # how many class 1 samples were predicted incorrectly
class1_hit = 74034    # how many class 1 samples were predicted correctly
predicted = np.zeros(total, dtype=int)
predicted[class0_hit:class0_hit + class0_miss + class1_hit] = np.ones(class0_miss + class1_hit, dtype=int)
# to prove that the values are correct
sum(predicted == 0) == class0_hit + class1_miss, sum(predicted == 1) == class0_miss + class1_hit
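As a quick sanity check (this is my own addition, not part of your data), we can confirm that these two vectors reproduce every cell of your confusion matrix:
# each confusion-matrix cell should match one of the counts defined above
assert np.sum((target == 0) & (predicted == 0)) == class0_hit
assert np.sum((target == 0) & (predicted == 1)) == class0_miss
assert np.sum((target == 1) & (predicted == 0)) == class1_miss
assert np.sum((target == 1) & (predicted == 1)) == class1_hit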
Now we can look at sklearn’s classification_report and see what it tells us about these values:
from sklearn.metrics import classification_report
print(classification_report(target, predicted))
             precision    recall  f1-score   support

          0       0.54      0.94      0.68     56000
          1       0.96      0.62      0.75    119341

avg / total       0.82      0.72      0.73    175341
This is exactly the same classification_report you pasted. We’ve reached the same point you did.
Now let’s look at the confusion matrix:
from sklearn.metrics import confusion_matrix
print(confusion_matrix(target, predicted))
[[52624 3376]
[45307 74034]]
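As a side note (my own addition, not part of the original calculation), scikit-learn puts the true labels on the rows and the predicted labels on the columns, so the four numbers map directly onto the variables defined above:
cm = confusion_matrix(target, predicted)
# cm[0, 0] -> true 0, predicted 0: class0_hit  (52624)
# cm[0, 1] -> true 0, predicted 1: class0_miss ( 3376)
# cm[1, 0] -> true 1, predicted 0: class1_miss (45307)
# cm[1, 1] -> true 1, predicted 1: class1_hit  (74034)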
It’s still the same. Now let’s look at what accuracy_score says:
from sklearn.metrics import accuracy_score
accuracy_score(target, predicted)
> 0.7223524446649672
It returns 72%, just like the classification_report. So why is your own calculation giving 51% accuracy? Your calculation was this:
(TP + TN)/total
(74034 + 52624)/(52624 + 74034 + 45307 + 74034)*100 = 51%
If you look closely, the value 74034 appears twice in that denominator: it was used in place of class0_miss (3376). Doing the calculation with the values defined in the code, it looks like this:
acc = (class0_hit + class1_hit) / total
> 0.7223524446649672
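To make the mistake concrete, here is a small sketch (my own addition, reusing the variables defined earlier) that evaluates both denominators side by side:
wrong = (class0_hit + class1_hit) / (class0_hit + class1_hit + class1_miss + class1_hit)
right = (class0_hit + class1_hit) / (class0_hit + class0_miss + class1_miss + class1_hit)
print(wrong)  # ~0.51, your 51%: class1_hit was counted twice in the denominator
print(right)  # ~0.72, the same value accuracy_score reports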
That matches the value of accuracy_score. The precision and recall calculations are also correct:
from sklearn.metrics import precision_score
precision_score(target, predicted)
> 0.9563880635576799
from sklearn.metrics import recall_score
recall_score(target, predicted)
> 0.6203567927200208
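For reference (my own addition), both values can be reproduced by hand from the confusion-matrix counts, taking class 1 as the positive class:
# precision = TP / (TP + FP); the false positives are the class 0 samples predicted as 1
precision = class1_hit / (class1_hit + class0_miss)  # 74034 / 77410  ~= 0.9564
# recall = TP / (TP + FN); the false negatives are the class 1 samples predicted as 0
recall = class1_hit / (class1_hit + class1_miss)     # 74034 / 119341 ~= 0.6204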
Why, then, is the classification_report returning those different values in its last line? The answer is simple and is in its documentation:
The reported averages are a prevalence-weighted macro-average across classes (equivalent to precision_recall_fscore_support with average='weighted').
That is, it does not take a simple average; it weights each class’s metric by how many samples that class has (its support).
Let’s take a look at the precision_recall_fscore_support method. It has a parameter called average that controls how the calculation is done. Running it with the same value the classification_report uses, we get the same result:
from sklearn.metrics import precision_recall_fscore_support
precision_recall_fscore_support(target, predicted, average='weighted')
> (0.8225591977440773, 0.7223524446649672, 0.7305824989909749, None)
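To see where the 0.82 comes from, here is a sketch (my own addition) that reproduces the weighted precision by hand, weighting each class’s precision by its support:
prec_class0 = class0_hit / (class0_hit + class1_miss)  # 52624 / 97931 ~= 0.5374
prec_class1 = class1_hit / (class1_hit + class0_miss)  # 74034 / 77410 ~= 0.9564
weighted_precision = (class0 * prec_class0 + class1 * prec_class1) / total
print(weighted_precision)  # ~0.8226, the first value in the tuple above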
Now, since your classification has only 2 classes, the right thing is to ask it to calculate with average='binary'. Switching the parameter to binary, the result is:
precision_recall_fscore_support(target, predicted, average='binary')
> (0.9563880635576799, 0.6203567927200208, 0.75256542533456, None)
Which is exactly the result we found using sklearn’s individual metric functions or by doing the calculation by hand.
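Likewise (my own addition), the third value, the F1-score, is just the harmonic mean of the precision and recall computed by hand above:
f1 = 2 * precision * recall / (precision + recall)
print(f1)  # ~0.7526, the same F1 shown for class 1 in the classification_report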
What is the interpretation of the confusion matrix, please? I want to see if I got it right!
– Ed S