How to fit a quadratic regression model?

I’m trying to learn how to fit a quadratic regression model. The dataset can be downloaded at: https://filebin.net/ztr9har5nio7x78v

Let AdjSalePrice be the target variable and SqFtTotLiving, SqFtLot, Bathrooms, Bedrooms and BldgGrade the predictor variables.

Suppose SqFtTotLiving is the variable that should enter the model with degree 2. Here is my Python code:

import pandas as pd
import numpy as np
import statsmodels.api as sm
import sklearn


houses = pd.read_csv("house_sales.csv", sep='\t')  # the separator is a tab

colunas = ["AdjSalePrice", "SqFtTotLiving", "SqFtLot", "Bathrooms", "Bedrooms", "BldgGrade"]

houses1 = houses[colunas]


X = houses1.iloc[:, 1:]  # predictors
y = houses1.iloc[:, 0]   # target (AdjSalePrice)

How can I fit a quadratic regression model using sklearn and statsmodels? So far I only know how to fit a linear regression...

Using only statsmodels:

With statsmodels you can write the desired formula directly, for example:

target ~ np.power(X1, 2) + X2

In this example, we are looking for the parameters a0, a1 and a2 that best approximate:

target = a0 + a1 * X1^2 + a2 * X2

where a0 is the intercept, which the formula interface adds automatically.
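To see what the formula actually estimates, here is a minimal sketch on synthetic data (the column names X1, X2, target and the true coefficients 3, 2 and -1.5 are made up for illustration). The fit should recover coefficients close to those values, including the intercept that statsmodels adds on its own.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical synthetic data: target = 3 + 2*X1^2 - 1.5*X2 + small noise
rng = np.random.default_rng(0)
df = pd.DataFrame({"X1": rng.uniform(-3, 3, 200), "X2": rng.uniform(0, 5, 200)})
df["target"] = 3 + 2 * df["X1"] ** 2 - 1.5 * df["X2"] + rng.normal(0, 0.1, 200)

# np.power(X1, 2) is evaluated inside the formula; the Intercept term is added automatically
model = smf.ols("target ~ np.power(X1, 2) + X2", data=df).fit()
print(model.params)  # Intercept, then the squared X1 term, then X2
```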

A practical example in your case is to write the formula and pass the houses DataFrame as data (passing houses.to_dict('list') also works):

import statsmodels.formula.api as smf
import numpy as np

model = smf.ols(formula='AdjSalePrice ~ np.power(SqFtTotLiving, 2) + SqFtLot + Bathrooms + Bedrooms + BldgGrade', data=houses).fit()
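The formula above uses only the squared term of SqFtTotLiving. A full quadratic in that variable usually keeps the linear term as well. The sketch below assumes a synthetic stand-in for house_sales.csv (same column names as the question, made-up values and made-up true coefficients) and uses patsy's I(...) so that ** is treated as arithmetic rather than as formula syntax.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical stand-in for house_sales.csv: column names from the question, values invented
rng = np.random.default_rng(1)
n = 500
houses = pd.DataFrame({
    "SqFtTotLiving": rng.uniform(500, 5000, n),
    "SqFtLot": rng.uniform(1000, 20000, n),
    "Bathrooms": rng.integers(1, 4, n),
    "Bedrooms": rng.integers(1, 6, n),
    "BldgGrade": rng.integers(5, 12, n),
})
# Invented "true" relationship, only so the fit has something to recover
houses["AdjSalePrice"] = (
    50_000 + 40 * houses["SqFtTotLiving"] + 0.02 * houses["SqFtTotLiving"] ** 2
    + 2 * houses["SqFtLot"] + 10_000 * houses["Bathrooms"]
    - 5_000 * houses["Bedrooms"] + 20_000 * houses["BldgGrade"]
    + rng.normal(0, 1_000, n)
)

# Linear + squared term for SqFtTotLiving; I(...) makes ** plain arithmetic
formula = ("AdjSalePrice ~ SqFtTotLiving + I(SqFtTotLiving ** 2)"
           " + SqFtLot + Bathrooms + Bedrooms + BldgGrade")
model = smf.ols(formula, data=houses).fit()  # the DataFrame is passed directly
print(model.params)
```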

Then to use the trained model, just do:

model.predict({
    "SqFtTotLiving":[20],
    "SqFtLot":[10],
    "Bathrooms":[2],
    "Bedrooms":[4],
    "BldgGrade":[10]
})
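predict also accepts a DataFrame of new rows, and only the raw columns are needed because the formula applies the squaring itself. This small sketch (hypothetical columns x1, x2, y on synthetic data) checks that the predictions match the fitted equation computed by hand from model.params.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Small hypothetical dataset, just to have a fitted model
rng = np.random.default_rng(2)
train = pd.DataFrame({"x1": rng.uniform(0, 10, 100), "x2": rng.uniform(0, 10, 100)})
train["y"] = 1 + 0.5 * train["x1"] ** 2 + 2 * train["x2"] + rng.normal(0, 0.1, 100)
model = smf.ols("y ~ np.power(x1, 2) + x2", data=train).fit()

# New rows contain raw x1; the formula squares it during prediction
new = pd.DataFrame({"x1": [2.0, 3.0], "x2": [1.0, 4.0]})
pred = model.predict(new)

# Same numbers computed manually from the parameters (Intercept, x1^2 coef, x2 coef)
b0, b1, b2 = model.params
manual = b0 + b1 * new["x1"] ** 2 + b2 * new["x2"]
print(np.allclose(pred, manual))  # True
```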

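It is worth mentioning that the formula interface already includes a bias (intercept) term, i.e. a column of 1s, which usually improves the result; append "- 1" to the formula only if you really want to fit the model without it.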
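A quick way to confirm the default intercept is to compare the parameter names of two fits, with and without "- 1". The data here are synthetic and hypothetical (y = 4 + x^2 plus noise), so dropping the intercept should visibly worsen the fit.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical data with a nonzero intercept: y = 4 + x^2 + noise
rng = np.random.default_rng(3)
df = pd.DataFrame({"x": rng.uniform(0, 5, 50)})
df["y"] = 4 + df["x"] ** 2 + rng.normal(0, 0.1, 50)

with_intercept = smf.ols("y ~ np.power(x, 2)", data=df).fit()         # intercept added by default
without_intercept = smf.ols("y ~ np.power(x, 2) - 1", data=df).fit()  # "- 1" removes it

print(list(with_intercept.params.index))     # first entry is 'Intercept'
print(list(without_intercept.params.index))  # only the squared term
print(with_intercept.ssr < without_intercept.ssr)  # True: the intercept helps here
```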

Using only sklearn:

You can generate polynomial inputs with sklearn.preprocessing.PolynomialFeatures and then apply an ordinary linear regression. This transformer turns a vector such as [x1, x2] into [1, x1, x2, x1^2, x1*x2, x2^2].

from sklearn.preprocessing import PolynomialFeatures
from sklearn import linear_model
import numpy as np

# Example inputs
X = [[0.99, 0.65, 0.35, 0.01], [0.6, 0.01, 0.5, 0.2]]
#   [[  X1,   X2,   X3,   X4]
target = [1, 0]

poly = PolynomialFeatures(degree=2, include_bias=True)
X_polinomial = poly.fit_transform(X)

>>> print(np.round(X_polinomial[0], decimals=3))
 [   1, 0.99, 0.65, 0.35, 0.01,  0.98, 0.644, 0.346,  0.01, 0.423, 0.227, 0.007, 0.122, 0.003,    0.]
#[bias,   X1,   X2,   X3,   X4, X1*X1, X1*X2, X1*X3, X1*X4, X2*X2, X2*X3, X2*X4, X3*X3, X3*X4, X4*X4]
#[   0,    1,    2,    3,    4,     5,     6,     7,      8,    9,    10,    11,    12,    13,    14]

# LinearRegression fits its own intercept by default, so the bias column is redundant here
clf = linear_model.LinearRegression()
clf.fit(X_polinomial, target)

To choose which columns to use as input, for example only bias, X1, X2 and X1^2, select them by index:

features_to_use = [0, 1, 2, 5]

clf = linear_model.LinearRegression()
clf.fit(X_polinomial[:, features_to_use], target)

  • Could you explain the statsmodels part a little more?
