FannTest.nb

Empirical Evaluation of the
Fann Training Algorithms

Objective

To compare the Fann training algorithms with published results on similar algorithms, in order to verify the implementation of the algorithms in Fann. The case chosen here is described in section 3.3 of 'Empirical Evaluation of the Improved Rprop Learning Algorithms' by C. Igel and M. Hüsken, Institut für Neuroinformatik, Ruhr-Universität Bochum. The case was chosen because the training data is readily available and fully defined.

Based on inspection of the source code, the Rprop implementation in Fann appears to be the iRprop⁻ algorithm, so results similar to those reported in the paper should be obtainable. The paper also includes results for Quickprop.

The 1.2.0 release version of Fann was used with the Fann for Mathematica extensions using double precision math.

Problem and Model Description

Section 3.3.1 in the paper states: 'The goal of this regression task is to reproduce the time series of the average number of sunspots observed per year. The data from the time steps t - 1, t - 2, t - 4 and t - 8 are used to predict the average number of spots at time t. The input data are normalized between 0.2 and 0.8. 289 patterns are used, the first pattern to predict is from year 1708. A 4-5-1 feed-forward neural network without shortcut connections is used.'

Training data

Historical sunspot data is available at the National Geophysical Data Center, www.ngdc.noaa.gov.

Yearly Sunspot Numbers from 1700 to Present

Implementation

Select and normalize the data

'The input data are normalized between 0.2 and 0.8. 289 patterns are used, the first pattern to predict is from year 1708.'

Implementation
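
The selection and scaling step can be illustrated with a minimal Python sketch (not the notebook's Fann for Mathematica code). It assumes a linear min-max rescaling over the whole selected series, which the paper's wording suggests but does not spell out. The series below is a placeholder, not real sunspot data, and the names normalize, FIRST_YEAR and N_PATTERNS are illustrative only. Since the inputs reach back 8 years, predicting 289 patterns starting at 1708 needs the years 1700 to 1996.

```python
def normalize(values, lo=0.2, hi=0.8):
    """Linearly rescale values so that the minimum maps to lo and the maximum to hi."""
    v_min, v_max = min(values), max(values)
    if v_max == v_min:
        raise ValueError("cannot normalize a constant series")
    scale = (hi - lo) / (v_max - v_min)
    return [lo + (v - v_min) * scale for v in values]


FIRST_YEAR = 1700          # earliest year needed (1708 - 8)
FIRST_TARGET_YEAR = 1708   # first year to predict, per the paper
N_PATTERNS = 289           # number of patterns, per the paper

last_year = FIRST_TARGET_YEAR + N_PATTERNS - 1

# Placeholder values only (NOT real sunspot numbers), one per year.
series = [float((7 * i) % 150) for i in range(last_year - FIRST_YEAR + 1)]
scaled = normalize(series)
```

With real data, series would hold the yearly sunspot numbers for the selected years in chronological order.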

Construct the training data

'The data from the time steps t - 1, t - 2, t - 4 and t - 8 are used to predict the average number of spots at time t.'

Implementation
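
A minimal Python sketch of building the input/target patterns from the lags quoted above. The function name make_patterns and the placeholder series are assumptions for illustration; the notebook's own code is in Mathematica.

```python
LAGS = (1, 2, 4, 8)  # time steps t-1, t-2, t-4 and t-8, per the paper


def make_patterns(series, lags=LAGS):
    """Return (inputs, targets).

    inputs[k] holds series[t - lag] for each lag, and targets[k] is series[t],
    for every t that has all lagged values available.
    """
    start = max(lags)
    inputs, targets = [], []
    for t in range(start, len(series)):
        inputs.append([series[t - lag] for lag in lags])
        targets.append(series[t])
    return inputs, targets


# Placeholder: the index stands in for the (normalized) values of 1700..1996.
series = [float(i) for i in range(297)]
inputs, targets = make_patterns(series)
```

With 297 yearly values and a maximum lag of 8, this yields the 289 patterns mentioned in the paper, the first of which predicts the value at index 8 (year 1708).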

Neural Network

'A 4-5-1 feed-forward neural network without shortcut connections is used.' In the paper it is further stated: 'In all experiments, we use the same parameters for the four Rprop algorithms. These parameters are set to η⁺ = 1.2, η⁻ = 0.5, Δ₀ = 0.5 (the initial value of the Δᵢⱼ), Δmin = 0 and Δmax = 50 [...]. The only exception is the training of the recurrent neural network, where we set Δ₀ = 0.0125 [...]' Note that in Fann the value of Δ₀ is fixed at 0.0125.

For Quickprop, good results were obtained with the learning rate set to 1.5; the paper used a value of 1.0. Batch backpropagation obtained good results with a learning rate (η) of 1.3. The activation functions chosen in the paper were sigmoid for all hidden units and linear for the outputs. However, this gives considerably poorer performance for the incremental backpropagation algorithm, so a sigmoid output is used for that algorithm. The paper does not state which error function was used; here the tanh error function is used for all networks.

Implementation
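
To make the role of the Rprop parameters concrete, here is a hedged Python sketch of a single-weight step-size update in the style of the iRprop⁻ variant from the paper. It is not taken from the Fann source; the function name and argument names are illustrative, and the defaults are the parameter values quoted above. The usage loop minimizes a toy quadratic, not the sunspot network.

```python
def irprop_minus_step(w, grad, prev_grad, delta,
                      eta_plus=1.2, eta_minus=0.5,
                      delta_min=0.0, delta_max=50.0):
    """One iRprop- style update for a single weight.

    Returns (new_w, stored_grad, new_delta); stored_grad is what the caller
    passes as prev_grad on the next call.
    """
    if prev_grad * grad > 0:
        # Same sign as last time: increase the step size.
        delta = min(delta * eta_plus, delta_max)
    elif prev_grad * grad < 0:
        # Sign change: decrease the step size and forget the gradient,
        # so no weight change happens in this iteration.
        delta = max(delta * eta_minus, delta_min)
        grad = 0.0
    # Move against the sign of the gradient by the current step size.
    if grad > 0:
        w -= delta
    elif grad < 0:
        w += delta
    return w, grad, delta


# Toy usage: minimize f(w) = (w - 3)^2, starting with a step size of 0.5.
w, prev_grad, delta = 0.0, 0.0, 0.5
for _ in range(100):
    grad = 2.0 * (w - 3.0)
    w, prev_grad, delta = irprop_minus_step(w, grad, prev_grad, delta)
```

Only the sign of the gradient is used for the weight change; its magnitude affects nothing, which is why the learning-rate settings discussed for Quickprop and batch backpropagation have no counterpart here.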

Weights

'To achieve a fair comparison, the [...] random weight initializations were the same for all the learning algorithms.'

Implementation
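
One way to give every algorithm the same starting point is to draw the weights once per run from a seeded generator and hand each algorithm its own copy. The Python sketch below assumes one bias per hidden and output neuron (31 weights for a 4-5-1 network) and a uniform range of -0.1 to 0.1; both are assumptions for illustration, as are the algorithm labels.

```python
import random

# 4 inputs + bias into 5 hidden units, 5 hidden units + bias into 1 output.
N_WEIGHTS = (4 + 1) * 5 + (5 + 1) * 1


def initial_weights(seed, n=N_WEIGHTS, low=-0.1, high=0.1):
    """Draw n weights uniformly from [low, high] using a run-specific seed."""
    rng = random.Random(seed)
    return [rng.uniform(low, high) for _ in range(n)]


ALGORITHMS = ("incremental", "batch", "rprop", "quickprop")  # illustrative labels

runs = []
for run in range(3):  # the evaluation itself uses 100 runs
    w0 = initial_weights(seed=run)
    # Each algorithm gets an independent copy of the same initial weights.
    runs.append({name: list(w0) for name in ALGORITHMS})
```

Copying matters because training modifies the weights in place; sharing one list would let the first algorithm's training leak into the next.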

Training

The graph in the paper shows training over 1000 propagations and plots the median of 100 runs. Here the networks are trained for 1000 epochs and the mean of 100 runs is shown. The paper found that on this problem Rprop performed best, followed by other algorithms such as Quickprop. In the results obtained here, Rprop likewise performs best among the batch algorithms, although Quickprop initially converges faster. However, as suggested by S. Nissen, when the data is shuffled the incremental backpropagation algorithm outperforms all other algorithms supported in the Fann library.

Implementation
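
Because the paper plots the median and this evaluation plots the mean, the two curves are not strictly comparable. The Python sketch below shows both aggregations over per-epoch error curves, plus one possible way to draw a shuffled pattern order for an incremental training epoch. The toy curves and helper names are made up for illustration.

```python
import random
import statistics


def aggregate(curves, how="mean"):
    """Combine runs epoch by epoch.

    curves is a list of runs, each a list of per-epoch errors of equal length.
    """
    reducer = statistics.mean if how == "mean" else statistics.median
    return [reducer(epoch_values) for epoch_values in zip(*curves)]


def shuffled_epoch_order(n_patterns, rng):
    """Return the pattern indices 0..n_patterns-1 in random order."""
    order = list(range(n_patterns))
    rng.shuffle(order)
    return order


# Toy error curves for three runs; the third run is an outlier.
curves = [[1.0, 0.5, 0.25], [1.0, 0.4, 0.2], [4.0, 3.0, 2.0]]
mean_curve = aggregate(curves, "mean")
median_curve = aggregate(curves, "median")

order = shuffled_epoch_order(289, random.Random(0))
```

In the toy data a single poorly converging run pulls the mean well above the median, so a mean-based curve can look worse than a median-based one for the same algorithm.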

Figure 1

[Graphics:HTMLFiles/FannTest_17.gif]

Forecasts

A representative sample of each of the trained networks can be executed so their output can be directly compared to the desired outputs.

Implementation

Figures 2-6

[Graphics:HTMLFiles/FannTest_19.gif]

[Graphics:HTMLFiles/FannTest_20.gif]

[Graphics:HTMLFiles/FannTest_21.gif]

[Graphics:HTMLFiles/FannTest_22.gif]

[Graphics:HTMLFiles/FannTest_23.gif]

Mean Square Error

To verify the results, the mean square error reported during training (trainmse) can be compared to a mean square error calculated directly from the desired outputs and the obtained outputs (calcmse). During the preparation of this analysis it was found that the mean square error used internally in the Fann library for the incremental backpropagation algorithm can deviate substantially from a correctly calculated mean square error. Fann for Mathematica was changed so that it does not return the internal values. Note, however, that this fix only resolves the issue when training data is passed from Mathematica, not when training on data in a file. It was also found that the Rprop and Quickprop implementations only work as expected when training is performed in one call, apparently because data structures are reset internally in the Fann library. This has a negative effect when these algorithms are used in multiple steps, such as in animations of the learning process. Users of the Fann library should be aware of these limitations to avoid surprising results.

Implementation
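
The independent check (calcmse) amounts to averaging the squared differences between desired and obtained outputs. A minimal Python sketch for a single-output network follows; the function name and the sample numbers, including the reported value, are illustrative and not results from this evaluation.

```python
def mean_square_error(desired, obtained):
    """Mean of squared differences between desired and obtained outputs."""
    if len(desired) != len(obtained) or not desired:
        raise ValueError("need two non-empty sequences of equal length")
    return sum((d - o) ** 2 for d, o in zip(desired, obtained)) / len(desired)


# Illustrative numbers only.
desired = [0.2, 0.5, 0.8]
obtained = [0.25, 0.45, 0.8]
calcmse = mean_square_error(desired, obtained)

trainmse = 0.0020  # hypothetical value as reported by a training call
difference = trainmse - calcmse
```

A difference close to zero indicates that the reported error agrees with the outputs of the trained network; a large difference points to the kind of deviation described above.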

Table 1 Difference between independently calculated mse values and the mse values returned by Fann for Mathematica

Conclusion

The objective of verifying that published results can be reproduced with Fann was achieved, in the sense that the Rprop algorithm outperformed the other batch algorithms, which is in accordance with the published results. However, as suggested by S. Nissen, shuffling the data enables the incremental backpropagation algorithm to outperform all other algorithms on this test. This scenario was not included in the published paper.

Created by freegoldbar (at) yahoo dot com (December 5, 2004)
