Recurrent Neural Networks
Partially recurrent neural networks are well suited to time series prediction because of their internal states. These states act as a short-term memory and can represent information about the preceding inputs [Sta97]. That information is particularly important when the goal is long-term prediction.
Partially recurrent networks are multilayer perceptrons in which a few recurrent connections are introduced. Generally, they have a special group of neurons in the input layer, called context neurons or state neurons. Thus, the input layer of a partially recurrent network contains two types of neurons: those that act as ordinary inputs, receiving signals from outside, and the context neurons, which receive the output values of one of the layers delayed by one step. Such networks have been used for time series prediction problems [Cho97], [Tsa02], [Gil01], [Sit02].
The Jordan Network
The Jordan neural network, proposed by Jordan in 1986 [Jor86a], [Jor86], is characterized by context neurons that receive a copy of the output neurons and of themselves. Thus, the Jordan network has as many context neurons as output neurons. The recurrent connections into the context neurons have an associated parameter, m, which generally takes a constant positive value smaller than 1.
For time series prediction, the network has a single output neuron that represents the predicted value of the time series at future instants. Hence, the network has only one context neuron, whose activation at instant t is given by the following expression:
c(t) = mc(t-1) + x(t-1)
where x(t-1) is the output of the network at instant t-1.
The remaining activations of the network are calculated as in a multilayer perceptron; it suffices to take as input vector the concatenation of the external input activations and the context neuron activations:
u(t) = (x(t),...,x(t-d), c(t))
Taking into account the expression for the context neuron activation, and assuming that the context neuron is zero at the initial instant, c(0) = 0, it is possible to write:
c(t) = \sum_{i=1}^{t} m^{i-1} x(t-i)
Therefore, the parameter m gives the context neurons of the Jordan network a certain inertia. The previous expression shows that the context neuron accumulates the outputs of the network at all previous instants, and the value of m determines how strongly the context neuron retains this information: the output from i steps in the past is weighted by m^{i-1}, so values of m closer to 1 give a longer memory.
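The recurrence c(t) = mc(t-1) + x(t-1) and its unrolled form can be compared numerically. The following is a minimal sketch, assuming c(0) = 0 and a hypothetical list of network outputs x(0), x(1), ...; the function names are illustrative and do not come from the cited works.

```python
def jordan_context(outputs, m):
    """Return c(1), ..., c(T) for network outputs x(0), ..., x(T-1), assuming c(0) = 0."""
    c = 0.0
    history = []
    for x_prev in outputs:
        c = m * c + x_prev  # c(t) = m * c(t-1) + x(t-1)
        history.append(c)
    return history


def jordan_context_unrolled(outputs, m, t):
    """Unrolled form: c(t) = sum over i = 1..t of m^(i-1) * x(t-i)."""
    return sum(m ** (i - 1) * outputs[t - i] for i in range(1, t + 1))


# Hypothetical network outputs x(0), ..., x(3) and memory parameter m.
outputs = [1.0, 0.5, -0.25, 2.0]
m = 0.5

recursive = jordan_context(outputs, m)            # [c(1), c(2), c(3), c(4)]
unrolled = [jordan_context_unrolled(outputs, m, t) for t in range(1, 5)]
print(recursive)
print(unrolled)
```

Both lists agree, which illustrates that older outputs are discounted geometrically by m.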
The Elman Network
The Elman network [Elm90] is characterized by context neurons that receive a copy of the hidden neurons of the network; these connections have no associated parameter. There are, therefore, as many context neurons as hidden neurons in the network.
As in the Jordan network, the remaining activations are calculated as in a multilayer perceptron, taking the concatenation of the external inputs and the context neurons as the input vector to the network.
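A single forward pass of an Elman-style network might look as follows. This is only a sketch under stated assumptions: one hidden layer with tanh activations, a linear output neuron, no bias terms, and hand-picked hypothetical weights; none of these choices is specified in the text.

```python
import math


def elman_step(external, context, w_hidden, w_out):
    """One forward pass. Returns the prediction and the context for the next instant."""
    u = list(external) + list(context)  # input vector: external inputs, then context
    hidden = [math.tanh(sum(w * v for w, v in zip(row, u))) for row in w_hidden]
    prediction = sum(w * a for w, a in zip(w_out, hidden))
    # The next context is a plain copy of the hidden activations (no parameter).
    return prediction, hidden


# Hypothetical weights: 2 external inputs (d = 1) and 2 hidden neurons,
# hence 2 context neurons and 4 weights per hidden neuron.
w_hidden = [[0.5, -0.3, 0.1, 0.2],
            [-0.4, 0.6, 0.3, -0.1]]
w_out = [0.7, -0.5]

context = [0.0, 0.0]            # context starts at zero
series = [0.2, 0.1, 0.4, 0.3]   # hypothetical time series values
for t in range(1, len(series)):
    external = [series[t], series[t - 1]]  # (x(t), x(t-1))
    prediction, context = elman_step(external, context, w_hidden, w_out)
```

The number of context neurons always equals the number of hidden neurons, because the context is simply the hidden vector of the previous step.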
The Training Process for Jordan and Elman Neural Networks
The training mechanism for Jordan and Elman networks can be summarized in the following steps:
1. The activations of the context neurons are initialised to zero at the initial instant.
2. The external input (x(t),...,x(t-d)) at instant t and the context neuron activations at instant t are concatenated to form the input vector u(t) to the network, which is propagated towards the output of the network, thereby obtaining the prediction for instant t+1.
3. The backpropagation algorithm is applied to modify the weights of the network.
4. The time variable t is increased by one unit and the procedure returns to step 2.
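The steps above can be sketched for a Jordan network with a single output and a single context neuron. The code below is an illustrative sketch, not the authors' implementation: it assumes one tanh hidden layer, a linear output, no biases, a squared-error loss, and a backpropagation step that treats the context activation as an ordinary input. The function name and the hyperparameters `n_hidden`, `lr`, `epochs` and `seed` are hypothetical.

```python
import math
import random


def train_jordan(series, d=2, n_hidden=3, m=0.5, lr=0.02, epochs=50, seed=0):
    """Online training of a small Jordan network; returns weights and per-epoch error."""
    rng = random.Random(seed)
    n_in = d + 2  # d+1 external inputs plus one context neuron
    w_h = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
    w_o = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
    errors = []
    for _ in range(epochs):
        c = 0.0       # step 1: context activation starts at zero
        total = 0.0
        for t in range(d, len(series) - 1):
            # step 2: u(t) = (x(t), ..., x(t-d), c(t)), propagated to the output
            u = series[t - d:t + 1][::-1] + [c]
            h = [math.tanh(sum(w * v for w, v in zip(row, u))) for row in w_h]
            y = sum(w * a for w, a in zip(w_o, h))  # prediction for instant t+1
            err = y - series[t + 1]
            total += 0.5 * err * err
            # step 3: one gradient step on the squared error
            for j in range(n_hidden):
                delta = err * w_o[j] * (1.0 - h[j] ** 2)  # uses w_o[j] before its update
                w_o[j] -= lr * err * h[j]
                for i in range(n_in):
                    w_h[j][i] -= lr * delta * u[i]
            # step 4: advance time; the context accumulates the network output
            c = m * c + y
        errors.append(total)
    return w_h, w_o, errors


# Hypothetical training series: a sampled sine wave.
series = [math.sin(0.3 * t) for t in range(60)]
w_h, w_o, errors = train_jordan(series)
```

An Elman network would follow the same loop, with the context replaced by a copy of the hidden activations `h` at the end of each step.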
The Multi-Step Recurrent Network
The Multi-Step recurrent network [Gal01] is a partially recurrent network in which the feedback connections run from the output neuron to the input layer. In this case, the context neurons memorise previous outputs of the network. For the purpose of long-term prediction, the number of context neurons, which receive predicted values, and the number of input units, which receive measured time series values, depend on the prediction horizon h. Moreover, the number of input and context neurons changes at every sampling time, as described below.
Assume that the prediction horizon is fixed to h and that at instant k the goal is to predict the time series values at instants k+1, k+2, ..., k+h+1. Over these instants the number of input units decreases from d+1 to d+1-h, while the number of context neurons increases from 0 to h. Thus, the values received by the external inputs and the context neurons at every instant k are given by the following sequence:
The number of context neurons is initialised to zero and the external inputs receive the sequence x(k),...,x(k-d).
The future instants k+i, for i=2,...,h+1, are not real but simulated. The input units now receive the vector x(k),...,x(k-d+i-1), and the i-1 context neurons memorise the previous i-1 outputs of the network, i.e. the predicted values for instants k+i-1,...,k+1.
After that, the external inputs and the context neurons are reset.
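The way external inputs shrink while context neurons grow can be made concrete by listing, for each simulated step, which instants feed the network. The sketch below assumes that at step i the external inputs are the d+2-i most recent measured values up to instant k, and that the context neurons hold the predictions for instants k+i-1 down to k+1; the function name `multistep_layout` is hypothetical.

```python
def multistep_layout(k, d, h):
    """For each target instant k+i (i = 1..h+1), list the instants of the measured
    values on the external inputs and of the predictions held by the context neurons."""
    layout = []
    for i in range(1, h + 2):
        n_context = i - 1
        external = [k - j for j in range(d + 1 - n_context)]   # x(k), ..., x(k-d+i-1)
        context = [k + j for j in range(n_context, 0, -1)]     # predictions for k+i-1, ..., k+1
        layout.append((k + i, external, context))
    return layout


# Hypothetical setting: k = 5, d = 3 (four external inputs at the start), horizon h = 2.
for target, external, context in multistep_layout(5, 3, 2):
    print(target, external, context)
```

In this setting the total number of values entering the network stays at d+1 = 4 for every step; only the split between measured and predicted values changes.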
The parameters of the Multi-Step recurrent model are determined so as to minimise the error along the interval [k+1, k+h+1]. Writing F for the function computed by the network and \tilde{x}(k+i) for its output at the simulated instant k+i, the procedure at each instant k, starting with k=d, is:
1. The number of context neurons is initialised to zero. The d+1 external input neurons receive the measured values of the time series, x(k),...,x(k-d). The output of the network is given by the following equation:
\tilde{x}(k+1) = F(x(k), \ldots, x(k-d), W_2)
2. The number of context neurons is increased by one unit and the number of external units is decreased by one unit. The context neuron memorises the previously calculated output of the network. Thus, the prediction at the simulated instant k+2 is given by:
\tilde{x}(k+2) = F(\tilde{x}(k+1), x(k), \ldots, x(k-d+1), W_2)
3. Step 2 is repeated until there are h context neurons. The outputs of the recurrent model at instants k+3,...,k+h+1 are given by the following equations, respectively:
\tilde{x}(k+3) = F(\tilde{x}(k+2), \tilde{x}(k+1), x(k), \ldots, x(k-d+2), W_2)
\vdots
\tilde{x}(k+h+1) = F(\tilde{x}(k+h), \ldots, \tilde{x}(k+1), x(k), \ldots, x(k-d+h), W_2)
4. At this point, the parameter set of the model, W_2, is updated. To train the model for long-term prediction, learning is based on the sum of the local errors along the prediction horizon, i.e. along the interval [k+1, k+h+1]. Hence, W_2 is updated following the negative gradient direction of the error function given by
E(k) = \sum_{i=1}^{h+1} e(k+i)
where e(k+i) is the local error at instant k+i, for example the squared error \frac{1}{2}(x(k+i) - \tilde{x}(k+i))^2.
5. The time variable k is then increased by one unit and the procedure returns to step 1.
The procedure is repeated over the complete training set until convergence is reached.
Once training is complete, the model can be used for multi-step prediction. In this case, only steps 1, 2, 3 and 5 are carried out.
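The prediction pass, together with the error summed over the horizon that drives training, can be sketched as follows. This is an illustration under assumptions: `model` is a hypothetical callable standing in for the trained network with its parameters fixed, it receives the context values first and the measured values after them, the local error is taken to be half the squared difference, and h must not exceed d+1 so that the number of external inputs never becomes negative.

```python
def multi_step_predict(model, series, k, d, h):
    """Predict instants k+1, ..., k+h+1, feeding each output back as context."""
    predictions = []
    for i in range(1, h + 2):
        n_context = i - 1
        context = predictions[::-1]  # most recent prediction first
        external = [series[k - j] for j in range(d + 1 - n_context)]  # measured values
        predictions.append(model(context + external))
    return predictions


def horizon_error(predictions, series, k):
    """Sum of local squared errors over the instants k+1, ..., k+h+1."""
    return sum(0.5 * (series[k + i] - p) ** 2
               for i, p in enumerate(predictions, start=1))


# Hypothetical stand-in for a trained network: linear extrapolation from the
# two most recent values in the input vector.
def model(u):
    return 2.0 * u[0] - u[1]


series = [float(v) for v in range(10)]  # a straight line, so extrapolation is exact
predictions = multi_step_predict(model, series, k=4, d=2, h=2)
error = horizon_error(predictions, series, k=4)
print(predictions, error)
```

During training, `horizon_error` corresponds to the quantity whose gradient would be followed in step 4; during prediction only `multi_step_predict` is needed, with k advanced after each call.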