Sklearn - Difference between preprocessing.scale() and preprocessing.StandardScaler()


Hello!

I'm a beginner in Data Science and Machine Learning, so I'm sorry if this is a silly question.

I understand the importance of standardizing/normalizing features, and in my studies I always come across StandardScaler(). Reading the sklearn documentation, I saw that there is also preprocessing.scale(), and in my test both StandardScaler() and scale() produced the same result.

The documentation says that StandardScaler() implements the "Transformer API". What is that? And what is the difference between using preprocessing.scale() and preprocessing.StandardScaler()?

My tests:

from sklearn import preprocessing
import numpy as np

X_train = np.array([[ 1., -1.,  2.],
                    [ 2.,  0.,  0.],
                    [ 0.,  1., -1.]])

X_scaled = preprocessing.scale(X_train)
X_scaled  # standardized features

Out:

array([[ 0.        , -1.22474487,  1.33630621],
       [ 1.22474487,  0.        , -0.26726124],
       [-1.22474487,  1.22474487, -1.06904497]])

Using StandardScaler():

from sklearn.preprocessing import StandardScaler

scaler = StandardScaler().fit(X_train)

scaler.mean_

scaler.scale_

scaler.transform(X_train)

Out:

array([[ 0.        , -1.22474487,  1.33630621],
       [ 1.22474487,  0.        , -0.26726124],
       [-1.22474487,  1.22474487, -1.06904497]])

Thank you very much!

1 answer

1) What is the difference between using preprocessing.scale() and preprocessing.StandardScaler()?

Both StandardScaler().fit(X_train) and scale(X_train) perform the same operation, but the first returns a fitted estimator object that stores the learned statistics (mean_ and scale_), while the second simply scales the data and returns an array.

If you have an X_train and an X_test, the correct approach is to scale both data sets using the mean and standard deviation of X_train only. This is the scenario where StandardScaler() comes in:

scaler = StandardScaler()

X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

If you have only one data set and this scaling will be done only once, preprocessing.scale() is enough.
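
A minimal sketch of the train/test scenario above, with a hypothetical X_test, showing that scaler.transform() applies the statistics learned from X_train (StandardScaler uses the population standard deviation, ddof=0):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

X_train = np.array([[1., -1., 2.],
                    [2., 0., 0.],
                    [0., 1., -1.]])
X_test = np.array([[3., 1., 1.]])  # hypothetical held-out sample

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)  # uses X_train's mean_ and scale_

# The same result computed by hand from the *training* statistics:
manual = (X_test - X_train.mean(axis=0)) / X_train.std(axis=0)
print(np.allclose(X_test_scaled, manual))  # True
```

Note that scale(X_test) would instead standardize the test set by its own statistics, which is exactly the leakage this pattern avoids.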

2) The documentation says that StandardScaler() implements the "Transformer API". What is that?

The documentation says:

The preprocessing module further provides a utility class StandardScaler that implements the Transformer API to compute the mean and standard deviation on a training set so as to be able to later reapply the same transformation on the testing set.

This "Transformer API" refers to sklearn's transformer interface (objects with fit() and transform() methods), used to transform data. sklearn itself uses this interface extensively in its own classes, but we can also write a custom transformer.
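
As a sketch of what a custom transformer could look like (the class name MeanCenterer is hypothetical), it only needs fit() and transform(); inheriting from TransformerMixin gives it fit_transform() for free:

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin

class MeanCenterer(BaseEstimator, TransformerMixin):
    """Subtracts the column means learned during fit()."""

    def fit(self, X, y=None):
        self.mean_ = np.asarray(X).mean(axis=0)  # learn the statistics
        return self  # fit() must return self so fit_transform() works

    def transform(self, X):
        return np.asarray(X) - self.mean_  # reapply the learned statistics

X = np.array([[1., 2.], [3., 4.]])
centered = MeanCenterer().fit_transform(X)
print(centered.mean(axis=0))  # columns now have zero mean
```

Because it follows this interface, such a class can be dropped into a sklearn Pipeline just like StandardScaler.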
