Application of linear regression

I have two lists

print(lista)
[970084.4148727012, 983104.7719906792, 996164.0, 1006426.5111488493, 1016687.0370821969, 1026941.5758164332, 1037185.9604590479, 1047415.8544247652, 1057626.746645888, 1067813.94679318, 1077972.5805253708, 1088097.584787312, 1098183.7031788095, 1147832.9385862947, 1195602.90322828, 1281768.5077875573]

print(new_list)
[3161, 3185, 3164, 3152, 3154, 3146, 3144, 3174, 0, 0, 0, 0, 0, 0, 0, 0]

I want to use linear regression to predict the values that are 0 in new_list. To do this, I selected only the first 8 items:

from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
X = lista[:8]
y = new_list[:8]

I split the data into training and test sets:

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=2)

And applied linear regression:

regr = LinearRegression() 
regr.fit(X_train, y_train)

But I got an error:

ValueError: Expected 2D array, got 1D array instead: array=[ 970084.4148727 983104.77199068 996164. 1006426.51114885 1016687.0370822 1026941.57581643 1037185.96045905 1047415.85442477]. Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample.

What should I do?

  • Hey, how about taking the [tour], as I already suggested on your other question? You still haven’t learned how to use all the tools on the site, yet you keep posting questions. Please take the [tour] to learn at least the basics of how the site works.

  • Sorry for my ignorance, but I don’t understand what is wrong with my question. I’ve taken the tour and, as far as I can see, nothing is wrong. Could you be more specific? If I am doing something wrong, I will fix it right away.

  • At the moment nothing is wrong, since the question has been edited. Use backticks only to format inline code. For code snippets, just indent with 4 spaces. To make this easier, the editor has a {} button that formats the selected code.

  • I appreciate the clarification, and I’m sorry for the error.

1 answer


sklearn expects your data X to be a list of lists (a 2D array), because otherwise it has no way to distinguish between a dataset with, for example, 8 features and 1 sample and a dataset with 1 feature and 8 samples.
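To make the ambiguity concrete, here is a small sketch using numpy only. The name values is a hypothetical, shortened sample of lista, used just for illustration; the same flat data can be read either as several samples of one feature or as one sample of several features.

```python
import numpy as np

# Hypothetical shortened sample of `lista`, for illustration only
values = [970084.4, 983104.8, 996164.0]

flat = np.array(values)            # 1D: sklearn cannot tell rows from columns
as_samples = flat.reshape(-1, 1)   # 2D: 3 samples, 1 feature each
as_features = flat.reshape(1, -1)  # 2D: 1 sample with 3 features

print(flat.shape)         # (3,)
print(as_samples.shape)   # (3, 1)
print(as_features.shape)  # (1, 3)
```

In this question each number is one observation of a single feature, so the (n_samples, 1) layout is the one that applies.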

To solve this, you can turn your list into a list of lists:

from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
lista = [[elemento] for elemento in lista]
X = lista[:8]
y = new_list[:8]

...

Or use numpy's reshape, as suggested by the error message (Reshape your data using array.reshape(-1, 1) if your data has a single feature):

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
X = lista[:8]
X = np.array(X).reshape(-1, 1)
y = new_list[:8]

...
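For reference, one possible end-to-end version of the reshape approach, taken through to the prediction step, could look like the sketch below. It makes two assumptions that are not in the question: the zeros in new_list are treated as missing values, and train_test_split is skipped so that all 8 known points are used for fitting. The names known, model, predicted and filled are illustrative.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

lista = [970084.4148727012, 983104.7719906792, 996164.0, 1006426.5111488493,
         1016687.0370821969, 1026941.5758164332, 1037185.9604590479,
         1047415.8544247652, 1057626.746645888, 1067813.94679318,
         1077972.5805253708, 1088097.584787312, 1098183.7031788095,
         1147832.9385862947, 1195602.90322828, 1281768.5077875573]
new_list = [3161, 3185, 3164, 3152, 3154, 3146, 3144, 3174,
            0, 0, 0, 0, 0, 0, 0, 0]

# One column (feature), one row per sample
X = np.array(lista).reshape(-1, 1)
y = np.array(new_list, dtype=float)

# Assumption: a 0 in new_list marks a value that still has to be predicted
known = y != 0

model = LinearRegression()
model.fit(X[known], y[known])

# Predict only the rows whose target is unknown
predicted = model.predict(X[~known])

# Copy of y with the zeros replaced by the predictions
filled = y.copy()
filled[~known] = predicted
print(filled)
```

With only 8 known points, any held-out test set would be very small, so this is only an illustration of the mechanics, not an assessment of how well a linear model fits this data.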
  • It worked perfectly, thank you.
