How to normalize data? Any sklearn library?

Viewed 5,209 times

3

I need to normalize the data I have so that it stays between -1 and 1.

I used StandardScaler, but the range got wider instead.

Which other sklearn scaler could I use? There are several in sklearn that should make life easier, but I could not get them to work, and I believe I do not know how to use them.

What I tried was:

import pandas as pd
import matplotlib.pyplot as plt
df = pd.read_fwf('traco_treino.txt', header=None)
plt.plot(df)

Data in the range -4 to 4

After the normalisation attempt:

from sklearn.preprocessing import StandardScaler  
scaler = StandardScaler()  
scaler.fit(df)
dftrans = scaler.transform(df)
plt.plot(dftrans)

The data was not normalized successfully

The data is between -10 and 10.

1 answer

4


StandardScaler standardizes the data to zero mean and unit variance (var=1), not to a fixed range, so the results differ from what you expected.
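A minimal sketch of why this happens, using made-up sample values (the array below is an assumption, not the data from the question): after StandardScaler the column has mean 0 and variance 1, but nothing bounds the individual values, so a single large value can end up well outside [-1, 1].

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical sample: mostly small values plus one large value
sample = np.array([[0.0], [0.1], [-0.1], [0.2], [-0.2], [4.0]])

scaled = StandardScaler().fit_transform(sample)

# The mean is ~0 and the variance is ~1 after standardization...
print(scaled.mean(), scaled.var())

# ...but the largest absolute value is not limited to 1
print(np.abs(scaled).max())
```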

To scale the data to the interval [-1, 1], use MaxAbsScaler:

import numpy as np
from sklearn.preprocessing import MaxAbsScaler

# Define the data
dados = np.array([[0, 0], [300, -4], [400, 3.8], [1000, 0.5], [3000, 0]], dtype=np.float64)

dados
=> array([[  0.00000000e+00,   0.00000000e+00],
       [  3.00000000e+02,  -4.00000000e+00],
       [  4.00000000e+02,   3.80000000e+00],
       [  1.00000000e+03,   5.00000000e-01],
       [  3.00000000e+03,   0.00000000e+00]])

# Instantiate the MaxAbsScaler
p = MaxAbsScaler()

# Analyze the data and prepare the scaler
p.fit(dados)
=> MaxAbsScaler(copy=True)

# Transform the data
print(p.transform(dados))
=> [[ 0.          0.        ]
 [ 0.1        -1.        ]
 [ 0.13333333  0.95      ]
 [ 0.33333333  0.125     ]
 [ 1.          0.        ]]
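For intuition, the transform above appears to be just a division of each column by its largest absolute value. The sketch below reproduces it by hand with NumPy and checks it against MaxAbsScaler; the names `max_abs` and `manual` are illustrative only, and the sketch assumes no column is entirely zero.

```python
import numpy as np
from sklearn.preprocessing import MaxAbsScaler

dados = np.array([[0, 0], [300, -4], [400, 3.8], [1000, 0.5], [3000, 0]], dtype=np.float64)

# Largest absolute value in each column: 3000 and 4
max_abs = np.abs(dados).max(axis=0)

# Divide every column by its own largest absolute value
manual = dados / max_abs

# The hand-made result matches what MaxAbsScaler produces
assert np.allclose(manual, MaxAbsScaler().fit_transform(dados))

# Every value now lies between -1 and 1
print(manual.min(), manual.max())
```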

More information in the documentation or on Wikipedia: Feature scaling

  • Hello Gomiero, thanks for the help. However, it is not working. My data is a single column of values, so I turned it into a DataFrame with pandas and it ended up with two columns. Is that the problem? Does it only work with a 2D array?

  • The columns of a DataFrame are arrays, so it should work without trouble. Check whether the way you are creating the DataFrame is correct and whether the data types are OK.

  • Hello Gomiero, my data is in a .txt file. How should I do it? When I create the DataFrame, I get a column with indexes from 0 to 2999 (the data size is 3000) in addition to the values from the txt.

  • Assuming the data is in a column 'a' of the DataFrame, try converting the values into a column-shaped np.array (e.g. dd = dados['a'].values.reshape(-1,1)). After the reshape, run p.fit(dd) and then print(p.transform(dd)). I believe the reshape is what is needed for the scaler to work.

  • A 1D array does not work, it gives an error, but mine was 2D. I must have done something wrong the other time, because now it worked. Thanks!
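The reshape suggested in the comments can be sketched as follows. The DataFrame here is a hypothetical stand-in for the values read from the .txt file, and the column name 'a' follows the comment above; the real data and column label may differ.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import MaxAbsScaler

# Hypothetical stand-in for the values read from the .txt file
df = pd.DataFrame({'a': [0.5, -4.0, 3.8, 1.0, -2.0]})

# A single column is 1D, shape (5,); the scaler expects 2D input,
# so reshape it into one column with shape (5, 1)
dd = df['a'].values.reshape(-1, 1)

p = MaxAbsScaler()
scaled = p.fit_transform(dd)

# Flatten back to 1D for printing or plotting
print(scaled.ravel())
```

Selecting the column with double brackets, df[['a']], also yields a 2D object and should work without the reshape.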
