ValueError: could not convert string to float: 'red'

Question

Asked 5 years, 4 months ago

Hello, I’m trying to build a model that classifies wines as white or red. This is my code:

from sklearn.model_selection import train_test_split
import keras
from keras.models import Sequential
from keras.layers import Dense 
import numpy as np

np.random.seed(2)

# number of wine classes
classifications = 2

# load dataset
dataset = np.loadtxt('/content/wine.csv', delimiter=",")

# split dataset into sets for testing and training
X = dataset[:,1:12]
Y = dataset[:,0:1]
x_train, x_test, y_train, y_test = train_test_split(X, Y, test_size=0.66, random_state=5)

# convert output values to one-hot
y_train = keras.utils.to_categorical(y_train-1, classifications)
y_test = keras.utils.to_categorical(y_test-1, classifications)


# creating model
model = Sequential()
model.add(Dense(10, input_dim=13, activation='relu'))
model.add(Dense(8, activation='relu'))
model.add(Dense(6, activation='relu'))
model.add(Dense(6, activation='relu'))
model.add(Dense(4, activation='relu'))
model.add(Dense(2, activation='relu'))
model.add(Dense(classifications, activation='softmax'))

# compile and fit model
model.compile(loss="categorical_crossentropy", optimizer="adam", metrics=['accuracy'])
model.fit(x_train, y_train, batch_size=15, epochs=2500, validation_data=(x_test, y_test))

My CSV file is fairly large, so I’ll only include a few rows:

7.4,0.7,0,1.9,76,11,34,9.978,3.51,0.56,9.4,5,red
7.8,0.88,0,2.6,98,25,67,9.968,3.2,0.68,9.8,5,red
7.8,0.76,0.04,2.3,92,15,54,997,3.26,0.65,9.8,5,red
11.2,0.28,0.56,1.9,75,17,60,998,3.16,0.58,9.8,6,red
7.4,0.7,0,1.9,76,11,34,9.978,3.51,0.56,9.4,5,red
7.4,0.66,0,1.8,75,13,40,9.978,3.51,0.56,9.4,5,red
7.9,0.6,0.06,1.6,69,15,59,9.964,3.3,0.46,9.4,5,red
8,0.27,0.25,19.1,45,50,208,100.051,03.05,0.5,9.2,6,white
6.3,0.38,0.17,8.8,0.08,50,212,99.803,3.47,0.66,9.4,4,white
7.1,0.21,0.28,2.7,34,23,111,99.405,3.35,0.64,10.2,4,white
6.2,0.38,0.18,7.4,95,28,195,99.773,3.53,0.71,9.2,4,white
8.2,0.24,0.3,2.3,0.05,23,106,99.397,2.98,0.5,10,5,white
7,0.16,0.26,6.85,47,30,220,99.622,3.38,0.58,10.1,6,white
7.3,815,0.09,11.4,44,45,204,99.713,3.15,0.46,9,5,white
6.3,0.41,0.16,0.9,32,25,98,99.274,3.16,0.42,9.5,5,white

and the error is as follows:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-7-942872a3fef1> in <module>()
     11 
     12 # load dataset
---> 13 dataset = np.loadtxt('/content/wine.csv', delimiter=",")
     14 
     15 # split dataset into sets for testing and training

/usr/local/lib/python3.6/dist-packages/numpy/lib/npyio.py in floatconv(x)
    792         if '0x' in x:
    793             return float.fromhex(x)
--> 794         return float(x)
    795 
    796     typ = dtype.type

ValueError: could not convert string to float: 'red'

Please help. Thank you!

1 answer

by Ricardo Tenorio • 86 points · Answer 1 · 2020-07-01T16:56:27+00:00

Hi, the problem is that each line of your dataset (wine.csv) contains both numbers and a string label, and np.loadtxt tries to convert every field to float by default. One way to read this data is to use pandas together with LabelEncoder from scikit-learn, which converts the categories (red and white) to integer codes that can then be one-hot encoded. I also noticed that you may be confusing your predictors with the target: the changes I made to the code assume that you want the numerical values as predictors and 'red' and 'white' as the target to be classified. I hope this helps.

from sklearn.model_selection import train_test_split
import keras
from keras.models import Sequential
from keras.layers import Dense 
import numpy as np
import pandas as pd
from sklearn.preprocessing import LabelEncoder

LE = LabelEncoder()

np.random.seed(2)

# number of wine classes
classifications = 2

# load dataset
dataset = pd.read_csv('/content/wine.csv', header=None)

dataset[12] = LE.fit_transform(dataset[12])

X = dataset.iloc[:,1:12].values
Y = dataset.iloc[:,12]
x_train, x_test, y_train, y_test = train_test_split(X, Y, test_size=0.66, 
                                                random_state=5)

# convert output values to one-hot (LabelEncoder codes already start at 0)
y_train = keras.utils.to_categorical(y_train, classifications)
y_test = keras.utils.to_categorical(y_test, classifications)

# creating model
model = Sequential()
model.add(Dense(10, input_dim=11, activation='relu'))
model.add(Dense(8, activation='relu'))
model.add(Dense(6, activation='relu'))
model.add(Dense(6, activation='relu'))
model.add(Dense(4, activation='relu'))
model.add(Dense(2, activation='relu'))
model.add(Dense(classifications, activation='softmax'))

model.compile(loss="categorical_crossentropy", optimizer="adam", metrics=['accuracy'])
model.fit(x_train, y_train, batch_size=15, epochs=2500, validation_data=(x_test, y_test))
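
For comparison, here is a minimal NumPy-only sketch of the same idea. It is not the pandas and LabelEncoder approach used in the code above, just an illustration of why np.loadtxt fails and how the numeric columns and the label column can be read separately. It assumes the file has 13 columns with the label last; the io.StringIO object and the four rows (copied from the question's sample) stand in for the real file.

```python
import io
import numpy as np

# Four rows copied from the question's sample; StringIO stands in for wine.csv.
sample = io.StringIO(
    "7.4,0.7,0,1.9,76,11,34,9.978,3.51,0.56,9.4,5,red\n"
    "7.8,0.88,0,2.6,98,25,67,9.968,3.2,0.68,9.8,5,red\n"
    "8,0.27,0.25,19.1,45,50,208,100.051,03.05,0.5,9.2,6,white\n"
    "6.3,0.38,0.17,8.8,0.08,50,212,99.803,3.47,0.66,9.4,4,white\n"
)

# np.loadtxt defaults to dtype=float, so the text column raises ValueError.
try:
    np.loadtxt(sample, delimiter=",")
    load_failed = False
except ValueError:
    load_failed = True

# Read the 12 numeric columns and the label column in two separate passes.
sample.seek(0)
features = np.loadtxt(sample, delimiter=",", usecols=list(range(12)))
sample.seek(0)
labels = np.loadtxt(sample, delimiter=",", usecols=12, dtype=str)

# np.unique sorts the class names; return_inverse gives one integer code per row.
classes, codes = np.unique(labels, return_inverse=True)

# One-hot encode by indexing rows of an identity matrix with the codes.
one_hot = np.eye(len(classes))[codes]

print(load_failed, features.shape, classes, codes)
```

The integer codes produced this way start at 0, one per class in sorted order of the class names.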