Dustin Stevens-Baier
Comp 572
8-5-2006
Assignment#4

A Report describing the problem and the general idea of the backpropagation learning method:

Assignment 5 required us to analyze points in a 4 by 4 square centered at the origin. A circle with a radius of one is located at the center of the square. If the points are located inside the circle, then they are classified as type "A". If the points are located outside the circle and inside the square, then they are classified as type "B".

1000 random points are generated to meet these criteria, and they are fed one at a time through the network. The network consists of 2 input nodes, 8 hidden nodes, and 2 output nodes. The picture above indicates the connectivity between the nodes. The points are fed through for 1000 iterations, or epochs, with the weights and biases updated continuously. Once this process is complete, the network is "trained." The network must then recognize 100 additional points and correctly identify whether each point is type A or type B. This is to see if the network generalizes correctly.

A. Copy of the code.

The source can be found here.

B. Describe how the training set was chosen, and how training was done.

1000 points within a 4 by 4 square centered at the origin were stored in an array. Each element in the array was represented as a Point data structure (consisting of the x and y coordinates, as well as the type of point). If a point was inside a unit circle centered at the origin, then the point was of type "A", else it was type "B". The training set was processed through the network 100 times as specified in the assignment. Each time a point was processed, the weights of the connections and the bias of each node in the network were updated. This file contains the original points in the graph: training.xls
The initialized state of the network is represented in the files: nodes.xls
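
The construction of the training set described above can be sketched as follows. This is a minimal illustration, not the original source: the names Point, make_points, and training_set are hypothetical, uniform sampling is an assumption (the report only says the points were random), and a point lying exactly on the circle is assumed to count as type B.

```python
import random
from dataclasses import dataclass


@dataclass
class Point:
    x: float
    y: float
    kind: str  # "A" inside the unit circle, "B" otherwise


def make_points(count, seed=None):
    """Draw `count` random points from the square [-2, 2] x [-2, 2]."""
    rng = random.Random(seed)
    points = []
    for _ in range(count):
        # Assumption: coordinates are sampled uniformly over the square.
        x = rng.uniform(-2.0, 2.0)
        y = rng.uniform(-2.0, 2.0)
        # Assumption: a point exactly on the circle counts as type B.
        kind = "A" if x * x + y * y < 1.0 else "B"
        points.append(Point(x, y, kind))
    return points


training_set = make_points(1000, seed=0)
```

Under uniform sampling the circle covers pi/16 of the square, so roughly one point in five would be expected to be type A.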

C. Value of "step size" (called "gain" in Lippmann).

Selecting a step size means trading speed against stability. We want the network to adapt quickly to its input, but given the randomness of the input we do not want to end up with a non-converging network. Several values were tested. Values below 0.1 did not seem to work well, and values above 0.3 seemed to generate more erratic results when examining the success rates of the training. Values around 0.2 seemed to work best.

D. Value of the "momentum" term.

The concept of a momentum term is to allow for quicker learning while minimizing unstable behavior. Since the network took very little time to learn, no momentum term was used.
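
For reference, the usual way the gain and the momentum term combine in a single weight change can be sketched as below. This is a generic illustration of the standard rule, not code from the submitted program; the function name weight_step and its parameter names are hypothetical. With momentum set to 0.0, as in this assignment, the rule reduces to the plain step-size update.

```python
def weight_step(gain, delta, activation, prev_step=0.0, momentum=0.0):
    """Return the change to apply to one connection weight.

    gain       -- step size
    delta      -- error term of the node the connection feeds into
    activation -- output of the node the connection comes from
    prev_step  -- change applied to this weight on the previous update
    momentum   -- fraction of prev_step carried over (0.0 disables it)
    """
    return gain * delta * activation + momentum * prev_step


# Without momentum only the current error drives the change.
plain = weight_step(0.2, 0.5, 1.0)

# With momentum, part of the previous change is added on top.
with_momentum = weight_step(0.2, 0.5, 1.0, prev_step=0.1, momentum=0.9)
```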

E. Explain how you updated the weights: after each point was shown, or after many points were shown (batch).

I updated the weights after every input. This appeared to be what most former students did.
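
Updating after every input can be sketched as follows for a 2-8-2 network. This is an illustration under stated assumptions rather than the submitted code: sigmoid activations, squared error, initial weights in the range -0.5 to 0.5, and the names make_net, forward, train_point, and train_epoch are all hypothetical. The targets (0 1) for type A and (1 0) for type B follow the assignment.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def make_net(rng, n_in=2, n_hid=8, n_out=2):
    """Each row holds one node's input weights followed by its bias."""
    hid = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)] for _ in range(n_hid)]
    out = [[rng.uniform(-0.5, 0.5) for _ in range(n_hid + 1)] for _ in range(n_out)]
    return hid, out


def forward(layer, inputs):
    # zip stops at the shorter sequence, so the trailing bias is added separately.
    return [sigmoid(sum(w * x for w, x in zip(row, inputs)) + row[-1]) for row in layer]


def train_point(net, x, y, target, gain):
    """Show one point and update every weight and bias immediately."""
    hid, out = net
    inputs = [x, y]
    h = forward(hid, inputs)
    o = forward(out, h)
    # Error terms for the output nodes.
    d_out = [(t - a) * a * (1.0 - a) for t, a in zip(target, o)]
    # Error terms for the hidden nodes, computed before the output weights change.
    d_hid = [a * (1.0 - a) * sum(d * out[k][j] for k, d in enumerate(d_out))
             for j, a in enumerate(h)]
    for k, d in enumerate(d_out):
        for j, a in enumerate(h):
            out[k][j] += gain * d * a
        out[k][-1] += gain * d
    for j, d in enumerate(d_hid):
        for i, v in enumerate(inputs):
            hid[j][i] += gain * d * v
        hid[j][-1] += gain * d
    return sum((t - a) ** 2 for t, a in zip(target, o))


def train_epoch(net, points, gain):
    """One pass over (x, y, kind) tuples; returns the mean squared error seen."""
    total = 0.0
    for x, y, kind in points:
        target = (0.0, 1.0) if kind == "A" else (1.0, 0.0)
        total += train_point(net, x, y, target, gain)
    return total / len(points)
```

A batch version would instead accumulate the weight changes inside train_epoch and apply them once after the loop.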

F. Describe how many iterations (shown points, or epochs) it took for the network to learn the difference between types A and B.

It took from 300 to 900 epochs for the network to show a 90% success rate while using a gain of 0.1 and a threshold of 0.1. If the activation value of an output node was greater than 1 minus the threshold, then it was considered a success for that particular point. I also noticed that the success rate increases drastically in the beginning, while later it takes many more epochs to get a smaller increase in success rate. In the example included I used 0.2 as the gain (step size). In this case it took about 150 epochs to reach the 90 percent success rate, then another 150 epochs to reach the 93 percent success rate. This file contains the success rate after each epoch: success.xls

G. After the network was trained, what % of the time did it get the classification correct? (test it on 100 randomly chosen points).

The network was successful between 85 and 90 percent of the time. I ran it 10-15 times and never had anything worse than 80 or better than 92 percent.

A sample run of the test points is in this file: testing.xls

H. What decision criteria did you use? How did you decide that the network output was classifying the input as "A" or "B"?

As suggested in the assignment, the network classified the input as type A or B based on the binary output of the two neuron activations in the output layer C: (0 1) for type A and (1 0) for type B. If the errors of both outputs are smaller than the threshold value of 0.1, the classification is counted as correct.
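
The decision rule above can be sketched as a small function. The names decide and success_rate are hypothetical, and returning None when neither target is matched within the threshold is an assumption made for illustration; the report only states when a classification counts as correct.

```python
def decide(outputs, threshold=0.1):
    """Map the two output activations to "A", "B", or None (undecided).

    Targets follow the report: (0, 1) is type A and (1, 0) is type B.
    """
    targets = {"A": (0.0, 1.0), "B": (1.0, 0.0)}
    for label, target in targets.items():
        # Both outputs must be within the threshold of their target value.
        if all(abs(t - o) < threshold for t, o in zip(target, outputs)):
            return label
    # Assumption: anything else is treated as undecided, hence not a success.
    return None


def success_rate(results):
    """Percentage of (predicted, actual) pairs that agree."""
    correct = sum(1 for predicted, actual in results if predicted == actual)
    return 100.0 * correct / len(results)
```

For example, decide((0.05, 0.97)) gives "A", while decide((0.6, 0.4)) gives None because neither output is close enough to 0 or 1.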
