How to define the input array for neural network training in the rnn package?

The rnn package has an example of how to train a network, described in this link (example 1). In this package the data are given as a 3D array, where dim 1: samples; dim 2: time; dim 3: variables. However, the example does not make the split between inputs and targets explicit, which is the common approach in neural network packages. Moreover, according to the package description, the inputs and the targets must have the same dimensions. So, how can I define my data set for the recurrent neural network in the rnn package?

Here is an example of my training data in a data frame (inputs):

[image: enter image description here]

And these are the targets:

[image: enter image description here]

  • If I understand your question correctly, it is about how to arrange the data for training. You have variables recorded over time, 10 days, so the dimensions of your input are X = (n_samples, 10, 3) and of your target Y = (n_samples, 10, 1). You need to organize the dataset into an array and set these dimensions in dim.

  • Could you provide example code?

  • Share your data and I will write an example for you. You can use dput for that; put the output in your question.

  • OK, I added my data as a hyperlink.

  • The values in your data all came through as zero, so I generated some random ones; they should serve for the example.

  • @Rafaeltoledo ok, I’ll test it and give you feedback.

1 answer

It seems to me that your data are organized as follows: each row holds a single variable recorded at 4 consecutive time steps, where Q_t is the value you want to predict from the sequence of its last 3 values, Q_t_1, Q_t_2, Q_t_3.

So you have the following configuration:

n_samples <- nrow(data)
timesteps <- 3
n_variables <- 1

To shape the data properly for training, you have to create a three-dimensional array. You can then iterate over the original dataset and fill the array sample by sample.

data_X <- array(NA, dim=c(n_samples, timesteps, n_variables))
data_Y <- array(NA, dim=c(n_samples, 1, 1))

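for (i in seq_len(n_samples)) {
  # lagged values of Q form the input sequence for sample i
  data_X[i, , 1] <- unlist(data[i, c("Q_t_1", "Q_t_2", "Q_t_3")])
  # the current value Q_t is the single target for sample i
  data_Y[i, , 1] <- unlist(data[i, "Q_t"])
}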
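As an alternative to the loop, the same arrays can probably be built in one step, since as.matrix() already yields an n_samples x timesteps matrix and array() only adds the variables dimension. This is a minimal sketch in base R; the toy data frame and the name lag_cols are made up for illustration and stand in for your real data.

```r
# Toy data standing in for the real data frame (values are made up).
set.seed(1)
data <- data.frame(Q_t = runif(5), Q_t_1 = runif(5),
                   Q_t_2 = runif(5), Q_t_3 = runif(5))

n_samples <- nrow(data)
lag_cols <- c("Q_t_1", "Q_t_2", "Q_t_3")  # hypothetical helper name

# as.matrix() gives an n_samples x 3 matrix; array() adds the third
# (variables) dimension without changing the element order.
data_X <- array(as.matrix(data[, lag_cols]),
                dim = c(n_samples, length(lag_cols), 1))
data_Y <- array(data$Q_t, dim = c(n_samples, 1, 1))

dim(data_X)  # 5 3 1
dim(data_Y)  # 5 1 1
```

Row i of the data frame ends up in data_X[i, , 1], exactly as in the loop version.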

If you had one more variable, say R_t_*, it would go in data_X[i,,2], and so on.
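To make the two-variable case concrete, here is a small sketch in base R. The R_t_* columns, the toy values, and the helper names q_cols and r_cols are assumptions for illustration only; each variable fills its own slice along the third dimension.

```r
# Toy data with a second, hypothetical variable R (values are made up).
set.seed(2)
data <- data.frame(
  Q_t   = runif(4),
  Q_t_1 = runif(4), Q_t_2 = runif(4), Q_t_3 = runif(4),
  R_t_1 = runif(4), R_t_2 = runif(4), R_t_3 = runif(4)
)

n_samples <- nrow(data)
timesteps <- 3
q_cols <- paste0("Q_t_", 1:timesteps)
r_cols <- paste0("R_t_", 1:timesteps)

# dim 1: samples, dim 2: time, dim 3: variables (Q first, then R)
data_X <- array(NA_real_, dim = c(n_samples, timesteps, 2))
data_X[, , 1] <- as.matrix(data[, q_cols])
data_X[, , 2] <- as.matrix(data[, r_cols])

dim(data_X)  # 4 3 2
```

The target array is unaffected by the extra input variable and keeps its shape of (n_samples, 1, 1).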

To run the training, follow the same tutorial you mentioned, except that seq_to_seq_unsync must now be TRUE, because the model has to return a single value for the whole input sequence. This differs from the tutorial, where the model returns a sequence of the same length as the input.
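The difference between the two setups shows up in the time dimension of the target array. The sketch below only compares shapes in base R and does not call the rnn package; the array names and the helper same_time_length are hypothetical.

```r
n_samples <- 6
timesteps <- 3

# Input: one variable observed over 3 time steps per sample (random toy values).
X <- array(runif(n_samples * timesteps), dim = c(n_samples, timesteps, 1))

# Synchronised case (as in the tutorial): one target per time step.
Y_sync <- array(runif(n_samples * timesteps), dim = c(n_samples, timesteps, 1))

# Unsynchronised case (as in this answer): one target per sample.
Y_unsync <- array(runif(n_samples), dim = c(n_samples, 1, 1))

# Hypothetical helper: do X and Y share the same length along the time axis?
same_time_length <- function(X, Y) dim(X)[2] == dim(Y)[2]

same_time_length(X, Y_sync)    # TRUE
same_time_length(X, Y_unsync)  # FALSE
```

When the time lengths differ, as with Y_unsync, that is the situation in which the answer sets seq_to_seq_unsync to TRUE.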

library(rnn)

model <- trainr(Y=data_Y, X=data_X, hidden_dim=100,
                learningrate=0.1, batch_size=1, numepochs=100,
                seq_to_seq_unsync=TRUE)

plot(colMeans(model$error), type="l")

# predict on the training set
data_H <- predictr(model, data_X)

# compare the actual values with the predictions
head(cbind(data_Y, data_H))
