Using only statsmodels:
With statsmodels you can write the desired formula directly, for example:
target ~ np.power(X1, 2) + X2
In this example, it means that we are looking for the parameters a1 and a2 that best approximate:
target = a1 * X1^2 + a2 * X2
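To make this concrete, here is a minimal self-contained sketch with synthetic data (the column names X1, X2 and the coefficients a1 = 3, a2 = -2 are made up for illustration); since the data is noiseless, the fit recovers them exactly:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Noiseless synthetic data built from known coefficients a1 = 3, a2 = -2
rng = np.random.default_rng(0)
df = pd.DataFrame({"X1": rng.uniform(0, 10, 50), "X2": rng.uniform(0, 10, 50)})
df["target"] = 3 * df["X1"] ** 2 - 2 * df["X2"]

# The formula names the terms; patsy evaluates np.power(X1, 2) column-wise
model = smf.ols("target ~ np.power(X1, 2) + X2", data=df).fit()
print(model.params)  # recovers a1 and a2 (plus an intercept near zero)
```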
A practical example in your case would be to write the formula and pass houses.to_dict('list') as the data (passing the houses DataFrame directly also works):
import statsmodels.formula.api as sm
import numpy as np
model = sm.ols(formula = 'AdjSalePrice ~ np.power(SqFtTotLiving, 2) + SqFtLot + Bathrooms + Bedrooms + BldgGrade', data = houses.to_dict('list')).fit()
Then to use the trained model, just do:
model.predict({
"SqFtTotLiving":[20],
"SqFtLot":[10],
"Bathrooms":[2],
"Bedrooms":[4],
"BldgGrade":[10]
})
I think it is worth mentioning that the formula API already includes an intercept (the bias, equivalent to a column of 1s) by default; adding "- 1" to the formula removes it.
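A small self-contained sketch of that default behavior (the data here is synthetic, built with a known intercept of 5 just for demonstration):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Tiny noiseless dataset with a known intercept of 5
df = pd.DataFrame({"X1": [1.0, 2.0, 3.0, 4.0], "X2": [2.0, 1.0, 4.0, 3.0]})
df["target"] = 3 * df["X1"] ** 2 - 2 * df["X2"] + 5

# The intercept is included automatically; "- 1" in the formula drops it
with_intercept = smf.ols("target ~ np.power(X1, 2) + X2", data=df).fit()
no_intercept = smf.ols("target ~ np.power(X1, 2) + X2 - 1", data=df).fit()
print(with_intercept.params["Intercept"])  # the fitted bias term
print(no_intercept.params.index.tolist())  # no "Intercept" entry
```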
Using only sklearn:
It is possible to generate polynomial inputs with the PolynomialFeatures preprocessor and then apply a linear regression. This transformer maps a vector such as [x1, x2] into [1, x1, x2, x1^2, x1*x2, x2^2].
import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn import linear_model
# Example inputs
X = [[0.99, 0.65, 0.35, 0.01], [0.6, 0.01, 0.5, 0.2]]
#   [[  X1,   X2,   X3,   X4]]
target = [1, 0]
poly = PolynomialFeatures(degree=2, include_bias=True)
X_polinomial = poly.fit_transform(X)
>>> print(np.round(X_polinomial[0], decimals=3))
[ 1, 0.99, 0.65, 0.35, 0.01, 0.98, 0.644, 0.346, 0.01, 0.423, 0.227, 0.007, 0.122, 0.003, 0.]
#[bias, X1, X2, X3, X4, X1*X1, X1*X2, X1*X3, X1*X4, X2*X2, X2*X3, X2*X4, X3*X3, X3*X4, X4*X4]
#[ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]
clf = linear_model.LinearRegression()
clf.fit(X_polinomial, target)
To choose which columns you want as input, for example to use only the bias, X1, X2 and X1^2, just do:
features_to_use = [0, 1, 2, 5]
clf = linear_model.LinearRegression()
clf.fit(X_polinomial[:, features_to_use], target)
Could you explain the statsmodels part a little more?
– Ed S