ValueError: Input contains NaN, infinity or a value too large for dtype('float64') (I checked and none of these apply)

Question

Asked 5 years, 3 months ago

Viewed 2,767 times

I'm working on a Python linear regression project, but I've run into a problem with the model's .fit() call. The following errors occur:

With the code I posted here:

ValueError: Input contains NaN, infinity or a value too large for dtype('float64').

when I try to set:

ValueError: could not convert string to float: 'e'

I've searched the Internet, but found nothing about the 'e' case. I have tried converting with int(float(x)), and the numbers are all floats in the number.0 format, with no fractional part. Some of the numbers are 0.0 and others are large values. Here is the code for analysis:

import pandas as pd
import seaborn as sns
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
#imports
dataset = pd.read_csv('movies_metadata.csv')
data = dataset.columns
data =dataset[['title','budget','revenue','vote_average']]
# data selection
custo = []
for i in data['budget']:
    try:
        custo.append(int(i))
    except ValueError:
        custo.append(0)
custo = pd.Series(custo)
data['custo'] = custo
data.drop(['budget'],axis = 1)
data.dropna()
# machine learning model
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
import sklearn.metrics 
X = data['custo'].values.reshape(-1,1)
Y =data['revenue'].values.reshape(-1,1)
treino_x, teste_x, treino_y, teste_y = train_test_split(X,Y,random_state = 101,train_size = 0.27)
lista = [treino_x, teste_x, treino_y, teste_y]
modelo = LinearRegression()
modelo.fit(teste_x,teste_y)

1 answer

by jfaccioni • **1,283** points · Answer 1 · 2020-04-12T15:23:18+00:00

Without access to the data I can't test whether this is the cause, but note that the lines

data.drop(['budget'],axis = 1)
data.dropna()

are not in-place operations. Both DataFrame.drop and DataFrame.dropna are methods that return a new DataFrame, which you are not assigning to any variable.
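To see the effect concretely, here is a minimal sketch on a tiny made-up DataFrame. The values are invented for illustration and are not taken from movies_metadata.csv. It shows that calling the two methods without keeping the result leaves the original frame, including its NaN, unchanged:

```python
import numpy as np
import pandas as pd

# Made-up stand-in for the movie data; the values are illustrative only.
data = pd.DataFrame({
    "budget": ["100", "200", "300"],
    "custo": [100, 200, 300],
    "revenue": [1000.0, np.nan, 3000.0],
})

# Calling the methods without keeping the result leaves `data` untouched.
data.drop(["budget"], axis=1)
data.dropna()
unchanged_columns = list(data.columns)
unchanged_nan_count = int(data["revenue"].isna().sum())

# Keeping the returned DataFrame is what actually removes the column and rows.
cleaned = data.drop(["budget"], axis=1).dropna()
cleaned_columns = list(cleaned.columns)
cleaned_nan_count = int(cleaned["revenue"].isna().sum())

print(unchanged_columns, unchanged_nan_count)
print(cleaned_columns, cleaned_nan_count)
```

If the real revenue column contains missing values in the same way, they would still be present when .fit() is called, which would be consistent with the first error message.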

Simply assign the returned DataFrame to a variable, e.g.:

data = data.drop(['budget'],axis = 1)
data = data.dropna()

Or add the argument inplace=True to change the behaviour of the methods, e.g.:

data.drop(['budget'],axis = 1, inplace=True)
data.dropna(inplace=True)
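As a rough way to confirm the cleanup worked before calling .fit(), the sketch below runs the reassignment fix on a few made-up rows and then checks that every value is finite. Two things here are assumptions and not part of the original code: the sample values are invented, and pd.to_numeric(..., errors="coerce") is swapped in for the try/except loop, so unparseable budgets such as 'e' become NaN and are dropped instead of being set to 0.

```python
import numpy as np
import pandas as pd

# Made-up rows standing in for movies_metadata.csv; not the real data.
data = pd.DataFrame({
    "budget": ["30000000", "0", "e", "65000000"],
    "revenue": [373554033.0, 0.0, 12.0, np.nan],
})

# Assumption: coerce unparseable strings to NaN instead of raising ValueError.
data["custo"] = pd.to_numeric(data["budget"], errors="coerce")

# Reassign so the dropped column and rows are really gone.
data = data.drop(["budget"], axis=1)
data = data.dropna()

X = data["custo"].to_numpy(dtype="float64").reshape(-1, 1)
Y = data["revenue"].to_numpy(dtype="float64").reshape(-1, 1)

# Roughly the condition the error message describes: no NaN and no infinity.
all_finite = bool(np.isfinite(X).all() and np.isfinite(Y).all())
print(X.shape, all_finite)
```

If all_finite comes out False on the real data, the offending rows can be located with np.isfinite on each column before fitting.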