Neural network with a scalar input (time) and 3 output values that does not train properly


Hello! I am building a network that should map time values to a vector containing 3 reagent concentration values, simulating a chemical reaction of type A->B->C. The concentration values were obtained by solving the equations numerically with the odeint solver, giving a data set to feed the network.
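For context, here is a minimal sketch of how such a dataset can be generated with odeint (the rate constants k1 and k2 are illustrative assumptions, not the values I actually used):

import numpy as np
from scipy.integrate import odeint

k1, k2 = 1.0, 0.5   # assumed rate constants for A -> B -> C

def reaction(y, t):
    CA, CB, CC = y
    return [-k1 * CA,             # dCA/dt: A is consumed
            k1 * CA - k2 * CB,    # dCB/dt: B is formed from A and converted to C
            k2 * CB]              # dCC/dt: C accumulates

t = np.linspace(0, 10, 200)   # time grid
C0 = [1.0, 0.0, 0.0]          # initial concentrations [CA, CB, CC]
C = odeint(reaction, C0, t)   # array of shape (200, 3)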

I’m using Keras, with the following network:

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

def build_model():
    model = Sequential()

    model.add(Dense(1, activation='relu'))  # input layer [t]

    model.add(Dense(64, activation='relu'))
    model.add(Dense(64, activation='relu'))
    model.add(Dense(16, activation='relu'))

    model.add(Dense(3, activation='relu'))  # output layer [CA, CB, CC]

    model.compile(loss='mean_squared_error', optimizer='Adam', metrics=['mean_squared_error'])

    return model

After training the network, I get the following learning curve: [image: model learning curve]

However, when I test the network on the same t values used as input, with the following code:

C = []
for i in range(len(t)):
    C.append(model.predict([t[i]]))  # predict one time value at a time
C = np.array(C)

plt.plot(t, C[:, 0])
plt.legend(['A', 'B', 'C'])

I get the following curve:

[image: model concentration profile]

whereas the correct result should be something close to:

[image: actual concentrations]

I would like to know what may be causing this problem, and how to resolve it. Thank you!

  • Are you sure about the use of the mean_squared_error function? Is this a regression problem or a classification problem?

  • As far as I understand it, it’s a regression problem. Basically, my network should fit a set of 3 differential equations generated by the mass balances of each component.

  • There is little information to go on here, but your problem may actually be one of classification. If you expect your network to take the outputs [CA, CB, CC] and predict whether a reaction is of type CA, CB or CC, then you have a classification problem and should use categorical_crossentropy.

1 answer


It seems to me that you are modeling your problem the wrong way (correct me if I’m wrong, and I’ll delete the answer if that’s the case).

Although you are interested in the A/B/C concentrations over time, it is not time that should be your input variable. Your input variables should be the concentrations of the three reagents at time tn, and your outputs the concentrations at tn+1.

I’m not familiar with your problem, but I imagine you have three state equations like:

dA/dt = c1,1 A + c1,2 B + c1,3 C
dB/dt = c2,1 A + c2,2 B + c2,3 C
dC/dt = c3,1 A + c3,2 B + c3,3 C

Discretizing time, your update equations should be something like:

A(tn+1) = A(tn) + dt dA/dt
B(tn+1) = B(tn) + dt dB/dt
C(tn+1) = C(tn) + dt dC/dt

As far as I know, this is what differential equation solvers do behind the scenes: they translate the system into state equations and solve it step by step (potentially using an adaptive dt).
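To make this concrete, here is a minimal sketch of that explicit Euler update for an A->B->C system (the rate constants and dt are illustrative assumptions):

import numpy as np

k1, k2 = 1.0, 0.5          # assumed rate constants
dt = 0.05                  # fixed time step
A, B, C = 1.0, 0.0, 0.0    # initial concentrations

states = [(A, B, C)]
for _ in range(200):
    dA = -k1 * A           # A is consumed
    dB = k1 * A - k2 * B   # B is formed from A and converted to C
    dC = k2 * B            # C accumulates
    # x(tn+1) = x(tn) + dt * dx/dt
    A, B, C = A + dt * dA, B + dt * dB, C + dt * dC
    states.append((A, B, C))
states = np.array(states)  # each row is the state one step after the previous one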

If these assumptions are correct (or at least in the right direction), what I recommend is the following:

  • Create a dataset where X is the concentration at each step and Y is the concentration at the next step
  • If your original dataset does not have a constant dt, add dt as an input variable so that your model can learn to take the step duration into account when predicting the next concentrations
  • Set aside a percentage of this dataset for validation (e.g. the last 10% of the period you have available)
  • Train a neural network with 3 inputs (or 4 if dt is included) and 3 outputs, using MSE as the loss function (see the sketch after this list)
  • Try other activation functions in the output layer. Your outputs are restricted to the range [0, 1], so you may be able to use the sigmoid function as activation instead of ReLU. Remember that ReLU’s range is [0, ∞)
  • In your input layer, do not use 1 as the number of units. Remember that the number of units determines the output dimension of that layer, not the input; that is, the number of units does not need to match the dimensionality of your input. To tell Keras the size of your input, use the input_shape argument when declaring the layer, as below:
model.add(Dense(64, activation='relu', input_shape=[3]))

(Note input_shape=[3], indicating that 3 variables are used as input to your network.)
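Putting these recommendations together, a minimal sketch (assuming a fixed dt and that C is the array of concentrations over time obtained from the solver, as in the question) might look like this:

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

# X: concentrations at step n, Y: concentrations at step n+1
X = C[:-1]   # shape (N-1, 3)
Y = C[1:]    # shape (N-1, 3)

# hold out the last 10% of the trajectory for validation
split = int(0.9 * len(X))
X_train, Y_train = X[:split], Y[:split]
X_val, Y_val = X[split:], Y[split:]

model = Sequential()
model.add(Dense(64, activation='relu', input_shape=[3]))  # 3 inputs: [CA, CB, CC] at tn
model.add(Dense(64, activation='relu'))
model.add(Dense(3, activation='sigmoid'))                 # 3 outputs in [0, 1]: [CA, CB, CC] at tn+1
model.compile(loss='mean_squared_error', optimizer='adam')

model.fit(X_train, Y_train, validation_data=(X_val, Y_val), epochs=200)

At prediction time, you would give the network the current concentrations and feed its output back in as the next input, rolling the trajectory forward step by step.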

Let me know if it works!
