Normalisation of data in R

I'm trying to normalize my data and get the following error:

Error in if (colSd != 0) res[, i] <- (x[, i] - colMean)/colSd else res[,  : 
  missing value where TRUE/FALSE needed

The R code is this:

# Training a Multi-Layer Perceptron network model
library(RSNNS)

# Load the data (this part works)
Dados= read.delim("~/TrainingDataset.txt",
                  header = FALSE,
                  sep = ",",
                  quote = "\n\r")


# Shuffle the examples
Dados <- Dados[sample(1:nrow(Dados),length(1:nrow(Dados))),1:ncol(Dados)]

# Select the attributes and the class
DadosValues <- Dados[,1:30]
DadosTargets <- decodeClassLabels(Dados[,31])


# Split the data set into training and test sets using the chosen ratio
Dados <- splitForTrainingAndTest(DadosValues, DadosTargets, ratio=0.15)

# Normalize the data set so that all attributes share the same range
Dados <- normTrainingAndTestSet(Dados)

# Build and train the MLP model (Multi-Layer Perceptron with `size` neurons)
model <- mlp(Dados$inputsTrain, Dados$targetsTrain, size=5, learnFuncParams=c(0.1), 
             maxit=50, inputsTest=Dados$inputsTest, targetsTest=Dados$targetsTest)


summary(model)
#model
weightMatrix(model)
extractNetInfo(model)


par(mfrow=c(2,2))
plotIterativeError(model)


# Compute the outputs predicted by the MLP network
predictions <- predict(model,Dados$inputsTest)

plotRegressionError(predictions[,2], Dados$targetsTest[,2])

# Build the confusion matrix between the expected and the computed outputs
confusionMatrix(Dados$targetsTrain,fitted.values(model))
confusionMatrix(Dados$targetsTest,predictions)

A small sample of my data:

-1,1,1,1,-1,-1,-1,-1,-1,1,1,-1,1,-1,1,-1,-1,-1,0,1,1,1,1,-1,-1,-1,-1,1,1,-1,-1
1,1,1,1,1,-1,0,1,-1,1,1,-1,1,0,-1,-1,1,1,0,1,1,1,1,-1,-1,0,-1,1,1,1,-1
1,0,1,1,1,-1,-1,-1,-1,1,1,-1,1,0,-1,-1,-1,-1,0,1,1,1,1,1,-1,1,-1,1,0,-1,-1
1,0,1,1,1,-1,-1,-1,1,1,1,-1,-1,0,0,-1,1,1,0,1,1,1,1,-1,-1,1,-1,1,-1,1,-1
1,0,-1,1,1,-1,1,1,-1,1,1,1,1,0,0,-1,1,1,0,-1,1,-1,1,-1,-1,0,-1,1,1,1,1
-1,0,-1,1,-1,-1,1,1,-1,1,1,-1,1,0,0,-1,-1,-1,0,1,1,1,1,1,1,1,-1,1,-1,-1,1
1,0,-1,1,1,-1,-1,-1,1,1,1,1,-1,-1,0,-1,-1,-1,0,1,1,1,1,1,-1,-1,-1,1,0,-1,-1
1,0,1,1,1,-1,-1,-1,1,1,1,-1,-1,0,-1,-1,1,1,0,1,1,1,1,-1,-1,0,-1,1,0,1,-1
1,0,-1,1,1,-1,1,1,-1,1,1,-1,1,0,1,-1,1,1,0,1,1,1,1,1,-1,1,1,1,0,1,1
1,1,-1,1,1,-1,-1,1,-1,1,1,1,1,0,1,-1,1,1,0,1,1,1,1,1,-1,0,-1,1,0,1,-1
1,1,1,1,1,-1,0,1,1,1,1,1,-1,0,0,-1,-1,-1,0,1,1,1,1,-1,1,1,1,1,-1,-1,1
1,1,-1,1,1,-1,1,-1,-1,1,1,1,1,-1,-1,-1,-1,-1,0,1,1,1,1,-1,-1,-1,-1,1,0,-1,-1
-1,1,-1,1,-1,-1,0,0,1,1,1,-1,-1,-1,1,-1,1,1,0,-1,1,-1,1,1,-1,-1,-1,1,0,1,-1
1,1,-1,1,1,-1,0,-1,1,1,1,1,-1,-1,-1,-1,1,1,0,1,1,1,1,-1,-1,0,-1,1,1,1,-1
1,1,-1,1,1,1,-1,1,-1,1,1,-1,1,0,1,1,1,1,0,1,1,1,1,1,-1,1,-1,1,-1,1,1
1,-1,-1,-1,1,-1,0,0,1,1,1,1,-1,-1,0,-1,1,1,0,1,1,1,1,1,-1,-1,-1,1,0,1,-1
1,-1,-1,1,1,-1,1,1,-1,1,1,-1,1,0,-1,-1,-1,-1,0,1,1,1,1,1,-1,0,-1,1,1,-1,-1
1,-1,1,1,1,-1,-1,0,1,1,-1,1,1,0,-1,-1,-1,-1,0,1,1,1,1,-1,1,1,-1,1,1,-1,-1
1,1,1,1,1,-1,-1,1,1,1,1,-1,-1,0,-1,-1,-1,-1,0,1,1,1,1,1,-1,-1,1,1,-1,-1,1
1,1,1,1,1,-1,-1,1,-1,1,1,1,1,0,0,-1,-1,-1,0,-1,-1,-1,-1,1,-1,0,-1,1,0,-1,1
1,0,-1,1,1,-1,0,1,-1,1,1,1,1,0,0,-1,-1,-1,0,-1,1,-1,1,-1,1,1,-1,1,-1,-1,1
1,0,1,1,1,-1,0,1,1,1,1,-1,-1,0,-1,-1,-1,-1,0,1,1,1,1,-1,1,-1,-1,1,0,-1,1
1,1,1,1,1,-1,-1,-1,-1,1,1,-1,1,0,0,-1,1,1,0,1,1,1,1,1,1,0,-1,1,-1,1,1
1,1,1,1,1,-1,1,0,-1,1,1,1,1,0,0,-1,1,1,0,1,1,1,1,1,1,1,-1,1,-1,1,1
1,-1,-1,-1,1,-1,1,1,-1,1,1,-1,-1,0,0,-1,1,1,0,1,1,1,1,1,1,-1,-1,1,0,1,1
1,-1,1,1,1,-1,0,1,-1,1,1,1,1,1,0,-1,1,1,0,1,1,1,1,-1,1,1,-1,1,0,1,1
1,-1,1,1,1,-1,0,-1,1,1,1,-1,-1,-1,-1,-1,-1,-1,0,1,1,1,1,1,1,0,-1,1,-1,-1,-1

Any idea how to fix this?

  • Since we do not have access to your data, the example is not reproducible, so it is hard to give a definitive answer. You also don't say which function raises the error. Is it normTrainingAndTestSet? One suggestion is to look for columns in the data that have constant values. If a column's values are constant, its standard deviation is zero, and there is no point in applying a formula like (x[, i] - colMean)/colSd because it divides by zero.

  • @Marcusnunes I added a sample just to give you an idea of what the data look like; maybe it helps, hahaha. Thanks in advance! And yes, the function with the problem is the one you mentioned.

  • Instead of doing this, run the command apply(Dados, 2, sd). This gives you the standard deviation of each column of your data set. If any of the reported values are close to zero, your data has a problem. Pages 3 and 4 of this article (careful, PDF) give some ideas on how to solve this problem.
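The check suggested in the comments can be sketched roughly as follows. This is only a minimal illustration: the small data frame below is a hypothetical stand-in for `Dados` (it is not the questioner's file), and the names `colSds`, `constantCols` and `naCols` are made up for the example. It also looks for columns whose standard deviation is `NA`, because "missing value where TRUE/FALSE needed" is the message R gives when the condition of an `if` evaluates to `NA`.

```r
# Hypothetical stand-in for the questioner's `Dados`: three numeric columns,
# one constant (V2) and one containing a missing value (V3).
Dados <- data.frame(
  V1 = c(-1, 1, 1, 0),
  V2 = c(0, 0, 0, 0),
  V3 = c(1, NA, -1, 1)
)

# Standard deviation of each column, as suggested in the comments.
colSds <- apply(Dados, 2, sd)
print(colSds)

# Columns whose standard deviation is exactly zero (constant columns).
constantCols <- names(colSds)[!is.na(colSds) & colSds == 0]

# Columns whose standard deviation is NA, e.g. because they contain NA values.
naCols <- names(colSds)[is.na(colSds)]

# An `if` whose condition is NA stops with an error; capture its message.
result <- tryCatch(
  if (colSds[["V3"]] != 0) "scaled" else "left as is",
  error = function(e) conditionMessage(e)
)
print(result)
```

On the real data, the same `apply(Dados, 2, sd)` call would have to be run on the data frame before it is overwritten by the result of `splitForTrainingAndTest`, which is no longer a plain table of columns.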

No answers
