Most Important Attributes in Random Forest Classifier


Good afternoon, guys. I'd like to know whether there is a way to get a percentage for each attribute used to train a Random Forest Classifier, to show which attributes are the most decisive.

2 answers



Reading the documentation is always a good first step.

In any case, from the manual:

feature_importances_ : array of shape = [n_features]
The feature importances (the higher, the more important the feature).

Which, in a free translation, means:

variable_importances_ : vector of shape = [number_of_attributes] The importance of each variable (the higher the value, the more important the variable).

Just to make it extremely clear: you initialize your model (1), train it (2), and then read the variable importances (3):

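from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

x, y = load_iris(return_X_y=True)   # example data; replace with your own x, y

clf = RandomForestClassifier()      # (1) initialize the model
clf.fit(x, y)                       # (2) train it
print(clf.feature_importances_)     # (3) one importance value per attribute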
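Since the question asks for a percentage per attribute, here is a minimal sketch that turns `feature_importances_` into a ranked percentage list. It relies on the fact that scikit-learn normalizes a forest's impurity-based importances so they sum to 1; the helper name `importance_percentages` and the iris data are illustrative assumptions, not part of the answer above.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier


def importance_percentages(clf, feature_names):
    """Hypothetical helper: (name, percent) pairs, most important first."""
    # feature_importances_ sums to 1 for a fitted forest, so *100 gives percentages
    percents = clf.feature_importances_ * 100
    order = np.argsort(percents)[::-1]  # indices sorted by descending importance
    return [(feature_names[i], float(percents[i])) for i in order]


iris = load_iris()
clf = RandomForestClassifier(n_estimators=200, random_state=0)
clf.fit(iris.data, iris.target)

ranking = importance_percentages(clf, iris.feature_names)
for name, pct in ranking:
    print(f"{name}: {pct:.1f}%")
```

The exact percentages vary with the data and `random_state`, so the example prints them rather than hard-coding values.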
  • Off topic, but you saved my life with a translation. Thank you!


This paper proposes a methodology for analyzing the predictions of this type of algorithm. Fortunately, there is a Python project, lime, that implements the methodology.

This link has a usage tutorial built around exactly a Random Forest. I'm copying the code below so there is no risk of the link breaking.

from __future__ import print_function  # must be the first statement in the file
import numpy as np
import sklearn
import sklearn.datasets
import sklearn.ensemble
import sklearn.model_selection
import lime
import lime.lime_tabular

np.random.seed(1)

# train the model
iris = sklearn.datasets.load_iris()
train, test, labels_train, labels_test = sklearn.model_selection.train_test_split(iris.data, iris.target, train_size=0.80)
rf = sklearn.ensemble.RandomForestClassifier(n_estimators=500)
rf.fit(train, labels_train)

# explain the predictions
explainer = lime.lime_tabular.LimeTabularExplainer(train, feature_names=iris.feature_names, class_names=iris.target_names, discretize_continuous=True)

# pick a random test instance and explain its most probable class with 2 features
i = np.random.randint(0, test.shape[0])
exp = explainer.explain_instance(test[i], rf.predict_proba, num_features=2, top_labels=1)

# (feature condition, weight) pairs for the top predicted label
print(exp.as_list(label=exp.available_labels()[0]))
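To give an idea of what `explain_instance` does under the hood, here is a simplified, hypothetical sketch of the core idea: perturb one instance, weight each perturbed sample by its proximity to the original, and fit a weighted linear model to the forest's predicted probability. The coefficients then act as local importances for that single prediction. This is not the lime package's implementation (lime also discretizes features when `discretize_continuous=True`, among other details); the helper `local_surrogate_weights`, the Gaussian sampling scaled by the training standard deviation, and the kernel width are all assumptions chosen for illustration, and the sketch assumes every feature has nonzero variance.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import Ridge


def local_surrogate_weights(predict_proba, x, X_train, label,
                            num_samples=2000, kernel_width=None, seed=0):
    """Hypothetical LIME-style helper: one local coefficient per feature."""
    rng = np.random.default_rng(seed)
    std = X_train.std(axis=0)  # assumed nonzero for every feature
    # Gaussian perturbations around x, in the original feature units
    samples = x + rng.normal(size=(num_samples, x.shape[0])) * std
    # standardized offsets so that every feature is on a comparable scale
    offsets = (samples - x) / std
    dist = np.sqrt((offsets ** 2).sum(axis=1))
    if kernel_width is None:
        kernel_width = 0.75 * np.sqrt(x.shape[0])  # assumed width
    # exponential kernel: nearby samples get weights close to 1
    weights = np.exp(-(dist ** 2) / kernel_width ** 2)
    # black-box probability of the class being explained
    # (assumes `label` is a column index of predict_proba's output)
    target = predict_proba(samples)[:, label]
    surrogate = Ridge(alpha=1.0)
    surrogate.fit(offsets, target, sample_weight=weights)
    return surrogate.coef_


iris = load_iris()
rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(iris.data, iris.target)
x0 = iris.data[0]
label = int(rf.predict(x0.reshape(1, -1))[0])
coefs = local_surrogate_weights(rf.predict_proba, x0, iris.data, label)
for name, c in sorted(zip(iris.feature_names, coefs), key=lambda t: -abs(t[1])):
    print(f"{name}: {c:+.3f}")
```

Unlike `feature_importances_`, which summarizes the whole forest, these coefficients describe only the neighborhood of `x0`, which is the point of this methodology.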
