Calculate the weight of each class of an unbalanced multi-label dataset

I would like to calculate the weight of each class in a multi-label dataset so that I can pass it to Keras's fit_generator through the class_weight parameter. For a single-label dataset, where each sample belongs to exactly one class, I can calculate the weights as in the example below:

def calc_weight_single_label(label_count):
    max_value = max(label_count.values())
    class_weight = {}
    for key in label_count.keys():
        class_weight[key] = max_value/label_count[key]
    return class_weight

>>> # class A:10%, class B:50% and class C:40%
>>> labels_dict = {'A':10, 'B':50, 'C':40}
>>> calc_weight_single_label(labels_dict)
    {'A': 5.0, 'B': 1.0, 'C': 1.25}

This means the loss for misclassifying a sample of class A will be 5 times higher than the loss for misclassifying a sample of class B.

However, in a multi-label dataset, a sample can have labels such as: only A, A and B, A and C, and so on. How can I calculate the weight of each class in this case?

An example would be this dictionary of label occurrences, labels_dict = {'A':10, 'B':50, 'C':40, 'D':20}, with a total of 100 samples.
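One possible starting point, offered only as a sketch: treat each label as an independent yes/no decision and apply the single-label idea (largest count divided by count) inside each label, comparing how often the label is present with how often it is absent. The function name calc_weight_multi_label and the n_samples argument are hypothetical, and whether the resulting nested dictionary can be passed to class_weight as-is is not something this sketch assumes.

```python
def calc_weight_multi_label(label_count, n_samples):
    """Per-label weights for the absent (0) and present (1) case.

    Assumption: every label is an independent binary problem, so the
    single-label rule (max count / count) is applied within each label.
    A label that is present in no sample or in every sample would cause
    a division by zero and needs separate handling.
    """
    class_weight = {}
    for key, n_pos in label_count.items():
        n_neg = n_samples - n_pos          # samples where the label is absent
        max_value = max(n_pos, n_neg)
        class_weight[key] = {0: max_value / n_neg, 1: max_value / n_pos}
    return class_weight


labels_dict = {'A': 10, 'B': 50, 'C': 40, 'D': 20}
weights = calc_weight_multi_label(labels_dict, 100)
print(weights)
```

Under this assumption, missing a positive for label A would cost 9 times as much as a false positive for A, while label B (present in half the samples) stays balanced.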

  • How can a classification be both A and B? I don't understand! For example, say you want to classify whether a person is healthy, sick, or dead. By your description, could that person be classified as healthy and dead, or sick and dead? Strange, huh...

  • In that example of being healthy, sick, or dead, the classification is single-label. An example of multi-label would be classes such as: class A is 0 for age < 18 and 1 for age > 18; class B is 0 for not being a student and 1 for being one. So a sample can be classified as both A and B, only A, only B, or neither.

  • @Alexciuffa what Keras method are you using to do this training? I need to understand the multi-label strategy being used to give a more appropriate response.

  • I'm using .fit_generator(). The generator is similar to flow_from_dataframe(), but customized. My labels are in a DataFrame, in a column named labels. An example of two rows would be: [1,0,0,1] and [0,0,1,1]

  • I don't think I was very clear, but I want to know which classifier you are using. I'm not an expert on Keras, but from what I understand, fit_generator() is just a more flexible way to train with batches.

  • I don't quite understand the question. I'm using a CNN architecture followed by a fully connected network, and the last layer is an output with sigmoid as the activation function. If that doesn't answer the question, could you give me some examples of classifiers?
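With a sigmoid output like the one described in the last comment, each label is scored independently, so one option is to weight the binary cross-entropy per label. The block below is a minimal NumPy sketch of that idea, for illustration only: it is not a Keras loss, and the function name, the prediction values, and the pos_weight values are all hypothetical. A real Keras loss would have to be written with tensor operations instead.

```python
import numpy as np


def weighted_binary_crossentropy(y_true, y_pred, pos_weight, neg_weight, eps=1e-7):
    """Mean binary cross-entropy with separate per-label weights.

    pos_weight[j] scales the loss when label j is present (1),
    neg_weight[j] scales it when label j is absent (0).
    """
    y_true = np.asarray(y_true, dtype=float)
    # Clip predictions so that log() never receives exactly 0 or 1.
    y_pred = np.clip(np.asarray(y_pred, dtype=float), eps, 1.0 - eps)
    per_label = -(pos_weight * y_true * np.log(y_pred)
                  + neg_weight * (1.0 - y_true) * np.log(1.0 - y_pred))
    return per_label.mean()


# Two label rows in the format from the comments, with made-up predictions.
y_true = [[1, 0, 0, 1], [0, 0, 1, 1]]
y_pred = [[0.9, 0.2, 0.1, 0.6], [0.3, 0.1, 0.7, 0.8]]

pos_weight = np.array([9.0, 1.0, 1.5, 4.0])  # hypothetical per-label weights
neg_weight = np.ones(4)

loss = weighted_binary_crossentropy(y_true, y_pred, pos_weight, neg_weight)
unweighted = weighted_binary_crossentropy(y_true, y_pred, np.ones(4), np.ones(4))
print(loss, unweighted)
```

Setting every weight to 1 recovers the ordinary binary cross-entropy, which makes it easy to compare against the unweighted loss.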

No answers
