How do I normalize curves in Python?

Good afternoon everyone. I am doing an exploratory data analysis in Python using the classic kc_house_data DataFrame, and I came across the following problem.

I would like to make some comparisons to see whether or not there is a statistical difference between two sets of data:

d1 = data['price_m2_living'].loc[data['floors'] % 1 != 0]
d2 = data['price_m2_living'].loc[data['floors'] % 1 == 0]

First of all, I had to check the normality of the two sets. For that I made a histogram of each:

[Images: histograms of d1 and d2]

The histograms do not resemble a normal curve, so I ran skew and kstest on both series:

from scipy.stats import skew, kstest
print(skew(d1), skew(d2))

0.7013599441290586 1.3217600229412982

print(kstest(d1, 'norm'), kstest(d2,'norm'))

KstestResult(statistic=1.0, pvalue=0.0) KstestResult(statistic=1.0, pvalue=0.0)

That is, neither series can be considered normal, so I see two options: (1) normalize the curves, or (2) check whether the data differ using a test for non-normal data (in this case, Wilcoxon).

For the first option I tried using scipy.stats.lognorm, but I'm having difficulty. I don't know if this is the right way to normalize the curve. I've tried some variations, but the resulting histogram never looks normal. And even when skew(x) shows a lower value than for the original d1 and d2, the p-value of the Kolmogorov-Smirnov test is always 0.

df1 = lognorm.pdf(d1, d1.std())
df2 = lognorm.pdf(d2, d2.std())

I thought this would be the way to normalize these curves, but it's not working. As for the Wilcoxon test, as far as I can tell it only applies to samples with the same number n of values.
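On the sample-size point: scipy.stats.wilcoxon is the signed-rank test for paired samples, which is why it needs equal lengths. For two independent groups of different sizes, the usual rank-based alternative is the Mann-Whitney U test (also known as the Wilcoxon rank-sum test), available as scipy.stats.mannwhitneyu. A minimal sketch, using synthetic data in place of d1 and d2 (the sample names, sizes, and distribution parameters below are assumptions for illustration only):

```python
import numpy as np
from scipy.stats import mannwhitneyu

# Synthetic right-skewed samples standing in for d1 and d2 (assumption:
# the real kc_house_data values are not reproduced here).
# The two sizes differ on purpose.
rng = np.random.default_rng(0)
sample_a = rng.lognormal(mean=7.0, sigma=0.4, size=300)
sample_b = rng.lognormal(mean=7.3, sigma=0.4, size=500)

# Two-sided Mann-Whitney U test for two independent samples.
# No normality assumption and no equal-length requirement.
result = mannwhitneyu(sample_a, sample_b, alternative="two-sided")
print(result.statistic, result.pvalue)
```

With the real data, the call would presumably be mannwhitneyu(d1, d2, alternative="two-sided").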

If anyone can help me, I would be very grateful. I've been stuck on this for a few days.

  • Given the type of asymmetry, you can take the log of the data. NumPy has a log function.

  • See if that helps...
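The log suggestion in the comments can be sketched as follows. One detail worth knowing: kstest(x, 'norm') compares x against the standard normal N(0, 1), so unscaled prices (all far above zero) give statistic=1.0 no matter what shape they have. Standardizing before the test makes the comparison meaningful. The data below are synthetic (an assumption, standing in for price_m2_living), and because the mean and standard deviation are estimated from the same sample, the resulting p-value tends to be optimistic and should be read only as a rough guide.

```python
import numpy as np
from scipy.stats import skew, kstest

# Synthetic right-skewed "prices" (assumption: illustrative data only).
rng = np.random.default_rng(0)
prices = rng.lognormal(mean=7.0, sigma=0.5, size=1000)

# Log transform: reduces right skew for strictly positive data.
log_prices = np.log(prices)
print(skew(prices), skew(log_prices))

# kstest(..., "norm") tests against N(0, 1), so standardize first.
z = (log_prices - log_prices.mean()) / log_prices.std(ddof=1)
ks_result = kstest(z, "norm")
print(ks_result)

# For contrast: the raw, unscaled values against N(0, 1).
ks_raw = kstest(prices, "norm")
print(ks_raw)
```

If the log-transformed data still look skewed, other transforms (for example scipy.stats.boxcox) may be worth trying.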

1 answer

Have you tried using scikit-learn to perform the normalization? You can use the StandardScaler class, which does this as follows:

from sklearn.preprocessing import StandardScaler
norm = StandardScaler()

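# StandardScaler expects 2D input, so convert each Series to a one-column DataFrame
d1_norm = norm.fit_transform(d1.to_frame())
d2_norm = norm.fit_transform(d2.to_frame())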
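A self-contained sketch of the same idea, on synthetic data (an assumption, since the original dataset is not reproduced here). StandardScaler expects 2D input, so a pandas Series has to be converted first. Also keep in mind that standardizing only shifts and rescales the values to zero mean and unit variance: the shape of the distribution, including its skewness, stays the same, so it may need to be combined with a transform such as the log if the goal is a more symmetric histogram.

```python
import numpy as np
import pandas as pd
from scipy.stats import skew
from sklearn.preprocessing import StandardScaler

# Synthetic right-skewed series (assumption: illustrative data only).
rng = np.random.default_rng(0)
series = pd.Series(
    rng.lognormal(mean=7.0, sigma=0.5, size=1000),
    name="price_m2_living",
)

# fit_transform needs shape (n_samples, n_features); ravel() flattens
# the single-column result back to 1D.
scaler = StandardScaler()
scaled = scaler.fit_transform(series.to_frame()).ravel()

# Mean is ~0 and standard deviation ~1 after scaling.
print(scaled.mean(), scaled.std())

# Skewness is unchanged by standardization.
print(skew(series), skew(scaled))
```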
