Keep the values of variables inside a class method between calls in Python (partial_fit)


I am creating a new algorithm to run alongside the algorithms provided by the sklearn package in Python. The dataset is extremely large, so I am using the partial_fit function (example: Naive Bayes link), which lets me take blocks of the dataset and run training/testing on each one. However, this function is called several times, and some variables must not lose their values after control returns to main, so that they can be updated with each new block.

How can I store these values inside the function so that they are not reset on every new call? Note that I do not want to return them to main; I want them stored within the function so that they can be recovered later, and without using global variables.

Excerpt from the code:

for i, (X_train_text, y_train) in enumerate(minibatch_iterators):
    tick = time.time()
    X_train = vectorizer.transform(X_train_text)
    total_vect_time += time.time() - tick

    for cls_name, cls in partial_fit_classifiers.items():
        tick = time.time()
        # update estimator with examples in the current mini-batch
        # (this function is called several times)
        cls.partial_fit(X_train, y_train, classes=all_classes)

        # accumulate test accuracy stats
        cls_stats[cls_name]['total_fit_time'] += time.time() - tick
        cls_stats[cls_name]['n_train'] += X_train.shape[0]
        cls_stats[cls_name]['n_train_pos'] += sum(y_train)
        tick = time.time()
        cls_stats[cls_name]['accuracy'] = cls.score(X_test, y_test)
        cls_stats[cls_name]['prediction_time'] = time.time() - tick
        acc_history = (cls_stats[cls_name]['accuracy'],
                       cls_stats[cls_name]['n_train'])
        cls_stats[cls_name]['accuracy_history'].append(acc_history)
        run_history = (cls_stats[cls_name]['accuracy'],
                       total_vect_time + cls_stats[cls_name]['total_fit_time'])
        cls_stats[cls_name]['runtime_history'].append(run_history)

NOTE: cls.partial_fit is called several times: once per classifier for each block of the dataset, and again for every classifier when the next block arrives. Even so, the variables inside those functions do not lose their values. Naive Bayes, for example, continues from the values of the previous call when it updates (examples of variables updated online in Naive Bayes: mean and standard deviation).
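
For illustration, here is a minimal sketch of how a mean and variance can survive between calls without any global variable: the values are stored as attributes on the object (self), which is also the usual scikit-learn convention for fitted state (attribute names ending in an underscore). The class name RunningStats and its attributes are hypothetical, and the merge formula is a standard batch-wise mean/variance update, not necessarily what the library does internally.

```python
import numpy as np


class RunningStats:
    """Hypothetical helper: per-feature mean and variance updated block by block."""

    def __init__(self):
        # State lives on the instance, so it survives between partial_fit calls.
        self.n_samples_ = 0
        self.mean_ = None
        self.var_ = None

    def partial_fit(self, X):
        X = np.asarray(X, dtype=float)
        n_new = X.shape[0]
        new_mean = X.mean(axis=0)
        new_var = X.var(axis=0)  # population variance of this block

        if self.n_samples_ == 0:
            # First block: nothing to merge with yet.
            self.n_samples_, self.mean_, self.var_ = n_new, new_mean, new_var
            return self

        n_old = self.n_samples_
        n_total = n_old + n_new
        delta = new_mean - self.mean_
        # Merge the sums of squared deviations of the old data and the new block.
        m2 = (self.var_ * n_old + new_var * n_new
              + delta ** 2 * n_old * n_new / n_total)
        self.mean_ = self.mean_ + delta * n_new / n_total
        self.var_ = m2 / n_total
        self.n_samples_ = n_total
        return self


# Feed the data in 10 blocks, as a mini-batch loop would.
rng = np.random.default_rng(0)
data = rng.normal(size=(1000, 3))
stats = RunningStats()
for block in np.array_split(data, 10):
    stats.partial_fit(block)
```

After the loop, stats.mean_ and stats.var_ should match the mean and variance computed over the full data in one pass, even though each call only saw one block.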

A video that helps explain the question: Link

  • Which variables do you want to keep the values of? Is this partial_fit your own implementation or the library's?

  • Hello. partial_fit is a standard method template in sklearn. In my case, I need to keep a NumPy matrix alive for the next iteration.

  • Here is a video to help with the explanation: https://youtu.be/J9UszAIIco4
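
One possible way to keep a NumPy matrix alive between iterations, sketched under assumptions: if the custom algorithm is written as a class, the matrix can be created on the first partial_fit call and stored on self, so every later call finds it again. The class MatrixAccumulator and the attributes gram_ and n_calls_ are hypothetical names, and accumulating X.T @ X is only a stand-in for whatever update the real algorithm needs.

```python
import numpy as np


class MatrixAccumulator:
    """Hypothetical estimator that keeps a NumPy matrix between partial_fit calls."""

    def partial_fit(self, X, y=None, classes=None):
        X = np.asarray(X, dtype=float)
        if not hasattr(self, "gram_"):
            # First call only: create the matrix and the call counter.
            n_features = X.shape[1]
            self.gram_ = np.zeros((n_features, n_features))
            self.n_calls_ = 0
        # Every call: update the stored matrix in place with the new block.
        self.gram_ += X.T @ X
        self.n_calls_ += 1
        return self


est = MatrixAccumulator()
blocks = [np.ones((2, 3)), 2 * np.ones((4, 3))]
for block in blocks:
    est.partial_fit(block)  # est.gram_ carries over from one call to the next
```

Because the matrix belongs to the object and not to the function's local scope, nothing has to be returned to main or declared global; each classifier instance in the loop keeps its own copy.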

No answers
