PCA method for Feature Selection - How do I resolve the error raise Exception("Data must be 1-dimensional")?

I am trying to implement the PCA method for feature selection using the following functions:

# Function that ranks the most important features and plots them in a barh plot
def ranks_PCA (x_train, y_train, features_train, RESULT_PATH='Results'):
    print("\nPCA method")

    pca = PCA(n_components=58)
    pca.fit_transform(x_train)

    imp_array = np.array(pca.components_)
    imp_order = imp_array.argsort()
    ranks = imp_order.argsort()

    # Plot PCA
    imp = pd.Series(pca.components_, index=x_train.columns)
    imp = imp.sort_values()

    imp.plot(kind="barh")
    plt.xlabel("Importance")
    plt.ylabel("Features")
    plt.title("Feature importance using PCA")
    # plt.show()
    plt.savefig(RESULT_PATH + '/ranks_DT.png', bbox_inches='tight')

    return ranks

# Function to predict the features of the test data
def predict_PCA(x_test_sel, k_vetor, y_train):
    model = decomposition.PCA()
    model.fit(k_vetor, y_train)
    y_predict = model.predict(x_test_sel)
    return(y_predict)

# Function that calculates the ranking of the training data
ranks4 = frk.ranks_PCA(x_train, y_train, features_train, RESULT_PATH)

I am not sure whether this implementation is the correct way to get the most important features. When I run this code, I get the following error:

Traceback (most recent call last):
  File "feat_test.py", line 235, in <module>
    'Results/Pdbbind2018_f58_delta_pkd')
  File "feat_test.py", line 78, in run_experiment
    ranks4 = frk.ranks_PCA(x_train, y_train, features_train, RESULT_PATH)
  File "C:\Users\Patricia\Desktop\VT-58 - Copy\Feature-importance\feature_rank_ensemble\Scripts\feature_ranks.py", line 121, in ranks_PCA
    imp = pd.Series(pca.components_, index=x_train.columns)
  File "C:\Users\Patricia\Desktop\VT-58 - Copy\Feature-importance\feature_rank_ensemble\env\lib\site-packages\pandas\core\series.py", line 305, in __init__
    data = sanitize_array(data, index, dtype, copy, raise_cast_failure=True)
  File "C:\Users\Patricia\Desktop\VT-58 - Copy\Feature-importance\feature_rank_ensemble\env\lib\site-packages\pandas\core\construction.py", line 482, in sanitize_array
    raise Exception("Data must be 1-dimensional")

Could someone help me?

1 answer

The components_ attribute of sklearn.decomposition.PCA is an array of shape (n_components, n_features). In your case that is (58, x_train.shape[1]).

You are passing this 2-dimensional array to pandas.Series, which accepts only 1-dimensional data.
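The shape mismatch can be reproduced in a few lines. This is only a minimal sketch: x_demo is a hypothetical stand-in for x_train (random data, 5 made-up column names), and 3 components are used instead of 58 to keep it small. The exact exception type and message depend on the pandas version, so the sketch catches a generic Exception.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA

# Hypothetical stand-in for x_train: 100 samples, 5 named features.
rng = np.random.default_rng(0)
x_demo = pd.DataFrame(rng.normal(size=(100, 5)),
                      columns=[f"f{i}" for i in range(5)])

pca = PCA(n_components=3).fit(x_demo)

# components_ has one row per component and one column per feature.
components_shape = pca.components_.shape

# Passing the whole 2-D array to pd.Series reproduces the failure.
try:
    pd.Series(pca.components_, index=x_demo.columns)
    series_failed = False
except Exception as exc:  # exception type varies across pandas versions
    series_failed = True
    print("pd.Series rejected the 2-D array:", exc)

print("components_ shape:", components_shape)
```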

The mistake is in:

imp = pd.Series(pca.components_, index=x_train.columns)

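If you want to plot only one component, select its row, for example pca.components_[0]. Changing the number of PCA components to 1 is not enough on its own: components_ then has shape (1, n_features), which is still 2-dimensional, so it must be indexed or flattened before it is passed to pandas.Series.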
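One way this could look is sketched below. Again x_demo is a hypothetical stand-in for x_train, with one column deliberately scaled up so that it dominates the first component. Using the absolute loadings as "importance", and weighting them by explained_variance_ratio_ to combine several components, are common heuristics assumed here for illustration; they are not something PCA itself defines as feature importance.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA

# Hypothetical stand-in for x_train: 100 samples, 5 named features.
rng = np.random.default_rng(0)
x_demo = pd.DataFrame(rng.normal(size=(100, 5)),
                      columns=[f"f{i}" for i in range(5)])
x_demo["f0"] *= 10  # make f0 carry most of the variance

pca = PCA(n_components=3).fit(x_demo)

# Option 1: one row of components_ is 1-dimensional, so pd.Series accepts it.
first = pd.Series(np.abs(pca.components_[0]), index=x_demo.columns).sort_values()

# Option 2 (heuristic): combine all components, weighting each component's
# absolute loadings by the share of variance it explains.
weighted = np.abs(pca.components_).T @ pca.explained_variance_ratio_
imp_all = pd.Series(weighted, index=x_demo.columns).sort_values()

print(first)
print(imp_all)
# first.plot(kind="barh")  # same plotting call as in the question
```

Either Series can be passed to the plotting code from the question unchanged, since both have one value per column of the input data.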
