I am writing a new algorithm to run alongside the algorithms provided by the sklearn package in Python. The dataset is extremely large, so I am using the partial_fit function (Example: Naive Bayes link), which lets me read the dataset in blocks and run training/testing on each block. However, this function is called many times, and some variables must not lose their values when control returns to Main, so that they can be updated with each new block.
How can I store these values inside the function so that they are not reset on every new call? Note that I do not want to return them to Main; they should stay stored within the function so that they can be recovered on a later call. And this should be done without using a GLOBAL variable.
Excerpt from the code:
for i, (X_train_text, y_train) in enumerate(minibatch_iterators):
    tick = time.time()
    X_train = vectorizer.transform(X_train_text)
    total_vect_time += time.time() - tick
    for cls_name, cls in partial_fit_classifiers.items():
        tick = time.time()
        # update estimator with examples in the current mini-batch
        # this function is called several times
        cls.partial_fit(X_train, y_train, classes=all_classes)
        # accumulate test accuracy stats
        cls_stats[cls_name]['total_fit_time'] += time.time() - tick
        cls_stats[cls_name]['n_train'] += X_train.shape[0]
        cls_stats[cls_name]['n_train_pos'] += sum(y_train)
        tick = time.time()
        cls_stats[cls_name]['accuracy'] = cls.score(X_test, y_test)
        cls_stats[cls_name]['prediction_time'] = time.time() - tick
        acc_history = (cls_stats[cls_name]['accuracy'],
                       cls_stats[cls_name]['n_train'])
        cls_stats[cls_name]['accuracy_history'].append(acc_history)
        run_history = (cls_stats[cls_name]['accuracy'],
                       total_vect_time + cls_stats[cls_name]['total_fit_time'])
        cls_stats[cls_name]['runtime_history'].append(run_history)
NOTE: cls.partial_fit is called several times, once per classifier for each block of the dataset. When a new block arrives, the classifiers are called again, yet the variables inside their functions do not lose the values they were assigned. Naive Bayes, for example, continues from the values of the previous call when it updates. (Example of variables updated online in Naive Bayes: mean and standard deviation.)
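One way to get the behaviour described in the note is to keep the values as attributes of an object, so each call to partial_fit updates them instead of starting over. The sketch below is only an illustration under that assumption: the class name RunningStats and its attributes are hypothetical, not part of sklearn, and it tracks a mean and standard deviation over plain Python lists using Welford's update.

```python
class RunningStats:
    """Hypothetical estimator-like class: the state lives on the instance."""

    def __init__(self):
        self.n_ = 0        # number of samples seen so far
        self.mean_ = 0.0   # running mean
        self.m2_ = 0.0     # running sum of squared deviations from the mean

    def partial_fit(self, block):
        # Welford's update, one value at a time.
        # Nothing is reset here, so values carry over between calls.
        for x in block:
            self.n_ += 1
            delta = x - self.mean_
            self.mean_ += delta / self.n_
            self.m2_ += delta * (x - self.mean_)
        return self  # returning the object mirrors the partial_fit convention

    @property
    def std_(self):
        # Population standard deviation of everything seen so far.
        return (self.m2_ / self.n_) ** 0.5 if self.n_ else 0.0


stats = RunningStats()
for block in ([1.0, 2.0, 3.0], [4.0, 5.0]):  # two "blocks" of the dataset
    stats.partial_fit(block)
```

After both blocks, stats.mean_ and stats.std_ describe all five values, even though no block saw the whole dataset and no global variable was used.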
Video to help explain: Link
Which variables do you want to keep the values of? Is this partial_fit your own implementation or the library's? – Woss
Hello, partial_fit is a standard method convention that Sklearn estimators implement. In my case I need to keep a numpy matrix alive for the next iteration. – Lucas de Souza Rodrigues
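For the matrix case, a possible sketch is to allocate the matrix on the first call and reuse it on every later call, again as an attribute of the object. Everything here is an assumption for illustration: ClassFeatureSums and sums_ are hypothetical names, and a nested list stands in for the numpy matrix so the example has no dependencies (a numpy array stored on self would persist in the same way).

```python
class ClassFeatureSums:
    """Hypothetical estimator that keeps a matrix alive between calls."""

    def partial_fit(self, X, y, classes=None):
        # First call: allocate the matrix. Later calls: reuse it untouched.
        if not hasattr(self, "sums_"):
            if classes is None:
                raise ValueError("classes must be given on the first call")
            self.classes_ = list(classes)
            n_features = len(X[0])
            # One row per class, one column per feature.
            self.sums_ = [[0.0] * n_features for _ in self.classes_]
        # Accumulate this block's feature values into the row of each label.
        for row, label in zip(X, y):
            i = self.classes_.index(label)
            for j, value in enumerate(row):
                self.sums_[i][j] += value
        return self


clf = ClassFeatureSums()
clf.partial_fit([[1, 2], [3, 4]], [0, 1], classes=[0, 1])  # first block
clf.partial_fit([[10, 20]], [0])                           # second block
```

The second call does not pass classes and does not recreate the matrix; clf.sums_ still holds the contribution of the first block plus the new one.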
Here is a video to help with the explanation: https://youtu.be/J9UszAIIco4 – Lucas de Souza Rodrigues