TypeError: can't pickle _thread._local objects when I try to use scikit-learn RFE on a model created in TensorFlow

I'm trying to use scikit-learn's RFE on models I created with TensorFlow, but when I try to train I get TypeError: can't pickle _thread._local objects. The code and the error are below:

import tensorflow as tf
import pandas as pd
from sklearn.feature_selection import RFE

data = {'atributo1':[1,2,3,4,5],'atributo2':[1,2,3,4,5],'atributo3':[1,2,3,4,5],'atributo4':[1,2,3,4,5], 'target':[1,0,1,0,1]}

base = pd.DataFrame(data)

n_hidden1 = 100
n_hidden2 = 50
n_outputs = 2

def create_model():
    model = tf.keras.Sequential([tf.keras.layers.Dense(n_hidden1, activation='relu'),
                                 tf.keras.layers.Dense(n_hidden2, activation='relu'),
                                 tf.keras.layers.Dense(n_outputs, activation='softmax')])
    model.compile(optimizer='adam',
                  loss='sparse_categorical_crossentropy',
                  metrics=['accuracy'])

    return model

model = tf.keras.wrappers.scikit_learn.KerasClassifier(build_fn=create_model(), batch_size = 10, epochs = 20)
rank = RFE(estimator=model,verbose=1,n_features_to_select=2)
rank.fit(base.drop('target',axis=1),base['target'])

> runfile('C:/Users/panto/.spyder-py3/temp.py', wdir='C:/Users/panto/.spyder-py3')
Traceback (most recent call last):

  File "<ipython-input-5-4d89fbeba90e>", line 1, in <module>
    runfile('C:/Users/panto/.spyder-py3/temp.py', wdir='C:/Users/panto/.spyder-py3')

  File "C:\Users\panto\AppData\Local\Continuum\anaconda3\lib\site-packages\spyder_kernels\customize\spydercustomize.py", line 827, in runfile
    execfile(filename, namespace)

  File "C:\Users\panto\AppData\Local\Continuum\anaconda3\lib\site-packages\spyder_kernels\customize\spydercustomize.py", line 110, in execfile
    exec(compile(f.read(), filename, 'exec'), namespace)

  File "C:/Users/panto/.spyder-py3/temp.py", line 25, in <module>
    rank.fit(base.drop('target',axis=1),base['target'])

  File "C:\Users\panto\AppData\Local\Continuum\anaconda3\lib\site-packages\sklearn\feature_selection\rfe.py", line 144, in fit
    return self._fit(X, y)

  File "C:\Users\panto\AppData\Local\Continuum\anaconda3\lib\site-packages\sklearn\feature_selection\rfe.py", line 179, in _fit
    estimator = clone(self.estimator)

  File "C:\Users\panto\AppData\Local\Continuum\anaconda3\lib\site-packages\sklearn\base.py", line 64, in clone
    new_object_params[name] = clone(param, safe=False)

  File "C:\Users\panto\AppData\Local\Continuum\anaconda3\lib\site-packages\sklearn\base.py", line 55, in clone
    return copy.deepcopy(estimator)

  File "C:\Users\panto\AppData\Local\Continuum\anaconda3\lib\copy.py", line 180, in deepcopy
    y = _reconstruct(x, memo, *rv)

  File "C:\Users\panto\AppData\Local\Continuum\anaconda3\lib\copy.py", line 280, in _reconstruct
    state = deepcopy(state, memo)

  File "C:\Users\panto\AppData\Local\Continuum\anaconda3\lib\copy.py", line 150, in deepcopy
    y = copier(x, memo)

  File "C:\Users\panto\AppData\Local\Continuum\anaconda3\lib\copy.py", line 240, in _deepcopy_dict
    y[deepcopy(key, memo)] = deepcopy(value, memo)

  File "C:\Users\panto\AppData\Local\Continuum\anaconda3\lib\copy.py", line 169, in deepcopy
    rv = reductor(4)

TypeError: can't pickle _thread._local objects
  • Don't you need to fit the model before passing it to RFE?

  • No, I've used RFE with models created in sklearn itself and it worked this way; RFE exists precisely to fit the model several times and find the ranking of the attributes.

  • Try to make a minimal reproducible example; then it would be easier to understand what is happening. See the instructions here: https://answall.com/help/minimal-reproducible-example

1 answer

Your approach is fine, but the code does not run, and will not run, because Keras models wrapped in KerasClassifier are incompatible with sklearn's RFE (recursive feature elimination). As the RFE documentation explains:

First, the estimator is trained on the initial set of features and the importance of each feature is obtained either through a coef_ attribute or through a feature_importances_ attribute. Then, the least important features are pruned from current set of features. That procedure is recursively repeated on the pruned set until the desired number of features to select is eventually reached.

That is, for RFE to work, the underlying model must expose an attribute called coef_ or one called feature_importances_. That is not the case with KerasClassifier. You can check this with your own code and a few modifications. See:

import tensorflow as tf
import pandas as pd
from sklearn.feature_selection import RFE
from sklearn.svm import SVR  # this module will be needed for the next example

data = {'atributo1':[1,2,3,4,5],'atributo2':[1,2,3,4,5],'atributo3':[1,2,3,4,5],'atributo4':[1,2,3,4,5], 'target':[1,0,1,0,1]}

base = pd.DataFrame(data)

n_hidden1 = 100
n_hidden2 = 50
n_outputs = 2

def create_model():
    model = tf.keras.Sequential([tf.keras.layers.Dense(n_hidden1, activation='relu'),
                                 tf.keras.layers.Dense(n_hidden2, activation='relu'),
                                 tf.keras.layers.Dense(n_outputs, activation='softmax')])
    model.compile(optimizer='adam',
                  loss='sparse_categorical_crossentropy',
                  metrics=['accuracy'])

    return model

X = base.drop('target',axis=1).values
y = base['target'].values
model = tf.keras.wrappers.scikit_learn.KerasClassifier(build_fn=create_model, batch_size = 10, epochs = 20)
model.fit(X, y)

# Show all methods and attributes
print(dir(model))

This is the output; note that neither of the attributes mentioned in the RFE documentation is there:

['__class__', '__delattr__', '__dict__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__le__', '__lt__', '__module__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', '__weakref__', '_keras_api_names', '_keras_api_names_v1', 'build_fn', 'check_params', 'classes_', 'filter_sk_params', 'fit', 'get_params', 'model', 'n_classes_', 'predict', 'predict_proba', 'score', 'set_params', 'sk_params']

Or simply:

print('coef_' in dir(model))
print('feature_importances_' in dir(model))

Output:

False
False
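To make the requirement concrete, here is a simplified sketch of the loop described in the quoted documentation, with one feature removed per round. It is not sklearn's actual implementation: simple_rfe is a hypothetical helper, combining coefficients as a plain sum of absolute values is a simplification, and LinearRegression, KNeighborsRegressor and the synthetic data are assumptions chosen only for illustration. The point is the hasattr branch: an estimator that, like KerasClassifier, exposes neither attribute leaves the loop nothing to rank features by.

```python
import numpy as np
from sklearn.base import clone
from sklearn.linear_model import LinearRegression
from sklearn.neighbors import KNeighborsRegressor


def simple_rfe(estimator, X, y, n_features_to_select):
    """Hypothetical, simplified RFE (one feature per round).

    Returns the kept column indices and a ranking (1 = kept).
    """
    X = np.asarray(X)
    n_features = X.shape[1]
    remaining = list(range(n_features))
    ranking = np.ones(n_features, dtype=int)
    while len(remaining) > n_features_to_select:
        # Train a fresh copy of the estimator on the current set of features.
        est = clone(estimator).fit(X[:, remaining], y)
        # Read one importance value per remaining feature.
        if hasattr(est, "coef_"):
            coef = np.abs(np.asarray(est.coef_)).reshape(-1, len(remaining))
            importances = coef.sum(axis=0)
        elif hasattr(est, "feature_importances_"):
            importances = np.asarray(est.feature_importances_)
        else:
            raise RuntimeError(
                "estimator exposes neither coef_ nor feature_importances_"
            )
        # Prune the least important feature; earlier removals get worse ranks.
        worst = remaining[int(np.argmin(importances))]
        ranking[worst] = len(remaining) - n_features_to_select + 1
        remaining.remove(worst)
    return remaining, ranking


# Synthetic data (assumption): only columns 0 and 1 influence y.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))
y = 5 * X[:, 0] + 2 * X[:, 1]

kept, ranking = simple_rfe(LinearRegression(), X, y, n_features_to_select=2)
print(kept, ranking)

# An estimator without coef_ or feature_importances_ cannot be ranked.
try:
    simple_rfe(KNeighborsRegressor(), X, y, n_features_to_select=2)
except RuntimeError as exc:
    print(exc)
```

A linear model passes the check because it exposes coef_ after fitting; the nearest-neighbors regressor, like the Keras wrapper, does not.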

To confirm that the rest of your code works and that the problem lies with Keras, run the same code with a linear SVR model instead. Just import the module (see the code above) and replace the model with:

model = SVR(kernel="linear")
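As for the specific TypeError in the question: the traceback shows RFE calling clone(), which deep-copies every constructor parameter of the estimator. In the question, build_fn=create_model() passes an already-built Keras model instead of the function create_model (the version in this answer passes the function), and, as the last line of the traceback shows, something inside that model is a _thread._local object, which cannot be pickled or deep-copied. A minimal sketch of that mechanism without TensorFlow, assuming threading.local as a stand-in for the model's internal state; HoldsBuildFn and make_state are hypothetical names:

```python
import threading

from sklearn.base import BaseEstimator, clone


class HoldsBuildFn(BaseEstimator):
    """Hypothetical estimator that only stores its build_fn parameter."""

    def __init__(self, build_fn=None):
        self.build_fn = build_fn


def make_state():
    # Hypothetical builder; threading.local stands in for the
    # unpicklable thread-local state inside a built Keras model.
    return threading.local()


# Passing the function itself: clone() deep-copies it, and deepcopy
# returns functions unchanged, so this succeeds.
cloned = clone(HoldsBuildFn(build_fn=make_state))
print(cloned.build_fn is make_state)  # True

# Passing the result of calling the function: clone() must deep-copy
# the object itself, which fails with the same kind of TypeError.
error_message = ""
try:
    clone(HoldsBuildFn(build_fn=make_state()))
except TypeError as exc:
    error_message = str(exc)
print(error_message)
```

Even with build_fn=create_model, RFE should still stop, this time because KerasClassifier exposes neither coef_ nor feature_importances_.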
