In machine learning applications, we often have a collection of examples that consist of attributes and a class label. The goal of a learning algorithm is to ‘learn’ a function that maps a set of attributes (input values) to the corresponding class label (desired output). When using an artificial neural network, we feed the attributes into the network as inputs, and the output of the network should correspond to the class label associated with those inputs. When they do not correspond, we update the weights so that they will correspond more closely in the future. Because each update changes the weights only by an amount scaled by the learning rate parameter, the desired and actual outputs probably will not match after a single update. Thus, we train the network by repeatedly presenting inputs and updating weights. To evaluate how well the network has learned to classify the data, we set the inputs, compute the outputs, and compare them to the desired outputs given by the class labels.
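The train-then-evaluate cycle above can be sketched in Java. To keep it short, this sketch uses a single sigmoid unit rather than a full network, learns the logical AND function (an assumed toy data set), and borrows the gradient update described later in this assignment. The names `TrainLoopSketch`, `train` and `accuracy`, the all-zero starting weights, and the learning rate of 0.5 are illustrative assumptions, not part of the required interfaces.

```java
import java.util.Arrays;

/**
 * Illustrative sketch (not the assignment's NeuralNet): one sigmoid unit
 * trained on logical AND by repeatedly presenting the examples.
 * Input 0 is the fixed -1 "threshold input" described later in the text.
 */
class TrainLoopSketch {
    // Attributes (with x0 = -1 prepended) and their class labels.
    static final double[][] INPUTS = {
        {-1, 0, 0}, {-1, 0, 1}, {-1, 1, 0}, {-1, 1, 1}
    };
    static final int[] LABELS = {0, 0, 0, 1};

    static double sigmoid(double x) { return 1.0 / (1.0 + Math.exp(-x)); }

    // Activation of the unit for one example.
    static double activate(double[] w, double[] x) {
        double sum = 0.0;
        for (int i = 0; i < w.length; i++) sum += w[i] * x[i];
        return sigmoid(sum);
    }

    // Evaluate: fraction of examples whose thresholded output matches the label.
    static double accuracy(double[] w) {
        int correct = 0;
        for (int n = 0; n < INPUTS.length; n++) {
            int predicted = activate(w, INPUTS[n]) >= 0.5 ? 1 : 0;
            if (predicted == LABELS[n]) correct++;
        }
        return (double) correct / INPUTS.length;
    }

    // Train: many passes (epochs), nudging the weights after every example.
    static double[] train(int epochs, double rate) {
        double[] w = new double[3]; // all zeros, for reproducibility
        for (int epoch = 0; epoch < epochs; epoch++) {
            for (int n = 0; n < INPUTS.length; n++) {
                double y = activate(w, INPUTS[n]);
                double g = y * (1 - y) * (LABELS[n] - y); // error gradient
                for (int i = 0; i < w.length; i++) w[i] += rate * INPUTS[n][i] * g;
            }
        }
        return w;
    }

    public static void main(String[] args) {
        double[] w = train(5000, 0.5);
        System.out.println("weights = " + Arrays.toString(w) + ", accuracy = " + accuracy(w));
    }
}
```

Calling `train(0, 0.5)` returns the untrained all-zero weights, which classify only one of the four examples correctly; after 5000 passes the unit should classify all four. This is the "one update is not enough" point in miniature.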
The important characteristics of a NeuralNet include the number of input units, number of hidden units and number of output units. Typically, neural networks have an input layer of primitive units, a single hidden layer of units, and an output layer. Usually (and for our purposes), the network is fully connected between layers; that is, every input unit provides a signal to every hidden unit, and every hidden unit provides an input to every output unit. When initializing your NeuralNet, you should set the weights randomly. The NeuralNetI interface mandates a method to set inputs and determine outputs. Naturally, determining outputs requires a cascaded computation: the values at the input layer determine the activations of the hidden units, which in turn determine the activations of the output units.
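A minimal sketch of the cascaded computation, assuming the weights are stored in plain arrays rather than in LTU objects. It initializes the weights to small random values, applies the sigmoid activation introduced later in this assignment (a step function would slot into the same place), and follows that section's threshold convention (weight 0 paired with a fixed input of -1). `ForwardPassSketch` and its members are hypothetical names, not the NeuralNetI API.

```java
import java.util.Random;

/**
 * Hypothetical sketch of a fully connected input-hidden-output network using
 * plain weight arrays. Column 0 of each weight row is the threshold weight,
 * which is multiplied by a fixed input of -1.
 */
class ForwardPassSketch {
    final double[][] wHidden; // [hidden][1 + inputs]
    final double[][] wOutput; // [outputs][1 + hidden]

    ForwardPassSketch(int nIn, int nHidden, int nOut, long seed) {
        Random rng = new Random(seed);
        wHidden = randomWeights(nHidden, nIn + 1, rng);
        wOutput = randomWeights(nOut, nHidden + 1, rng);
    }

    // Small random weights in [-0.5, 0.5) break the symmetry between units.
    static double[][] randomWeights(int rows, int cols, Random rng) {
        double[][] w = new double[rows][cols];
        for (double[] row : w)
            for (int c = 0; c < cols; c++) row[c] = rng.nextDouble() - 0.5;
        return w;
    }

    static double sigmoid(double x) { return 1.0 / (1.0 + Math.exp(-x)); }

    // Every unit in a layer sees every activation from the previous layer.
    static double[] layer(double[][] w, double[] in) {
        double[] out = new double[w.length];
        for (int j = 0; j < w.length; j++) {
            double sum = -w[j][0]; // threshold weight times the -1 input
            for (int i = 0; i < in.length; i++) sum += w[j][i + 1] * in[i];
            out[j] = sigmoid(sum);
        }
        return out;
    }

    // Cascade: inputs -> hidden activations -> output activations.
    double[] outputs(double[] inputs) {
        return layer(wOutput, layer(wHidden, inputs));
    }

    public static void main(String[] args) {
        ForwardPassSketch net = new ForwardPassSketch(3, 4, 2, 42L);
        double[] y = net.outputs(new double[] {1, 0, 1});
        System.out.println(java.util.Arrays.toString(y));
    }
}
```

Passing a seed makes runs repeatable while debugging; an unseeded `new Random()` gives a different starting point each run.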
Before diving into the NeuralNet, we should refactor (and extend) our design and code for LTUs. You will want to download the revised interfaces from Eureka and refer to the updated API documentation for those interfaces.
Because the input-layer units differ somewhat from the hidden- and output-layer units, your LTU code should accommodate both kinds. (Hint: An elegant way to do this is to take advantage of the shared structure in the LTUInputI and LinearThresholdUnitI interfaces. Your input-layer units can be direct implementations of the former, while the hidden- and output-layer units are implementations of the latter.) Your constructor for generalized LTUs should take the number of inputs to the unit as an argument (if it does not already do so).
Modify your implementation of the LTU and write an implementation of the NeuralNet. Your NeuralNet should have a constructor that takes three integers representing the number of input units, hidden units, and output units, respectively. This constructor is also responsible for connecting the layers in the fully connected manner described above.
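One way the LTU refactoring and the three-integer constructor might fit together, sketched under assumptions: `Signal`, `InputUnit`, `SigmoidUnit` and `NetSketch` are hypothetical stand-ins and do not reproduce the method names in LTUInputI, LinearThresholdUnitI or NeuralNetI. The shared `Signal` interface lets a hidden or output unit read from input units and hidden units alike, and the constructor connects every unit to every unit of the previous layer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

/** Hypothetical shared abstraction: anything that can supply a signal. */
interface Signal {
    double value();
}

/** Input-layer unit: simply holds the value it was given. */
class InputUnit implements Signal {
    private double value;
    void set(double v) { value = v; }
    public double value() { return value; }
}

/** Hidden/output unit: weighted sum of its sources (plus threshold), squashed by a sigmoid. */
class SigmoidUnit implements Signal {
    private final List<Signal> sources;
    private final double[] weights; // weights[0] is the threshold weight

    SigmoidUnit(List<Signal> sources, Random rng) {
        this.sources = sources;
        this.weights = new double[sources.size() + 1]; // one per input, plus threshold
        for (int i = 0; i < weights.length; i++) weights[i] = rng.nextDouble() - 0.5;
    }

    int numInputs() { return sources.size(); }

    public double value() {
        double sum = -weights[0]; // threshold weight times the fixed -1 input
        for (int i = 0; i < sources.size(); i++) sum += weights[i + 1] * sources.get(i).value();
        return 1.0 / (1.0 + Math.exp(-sum));
    }
}

/** Wires three layers so that each layer is fully connected to the next. */
class NetSketch {
    final List<InputUnit> inputs = new ArrayList<>();
    final List<SigmoidUnit> hidden = new ArrayList<>();
    final List<SigmoidUnit> outputs = new ArrayList<>();

    NetSketch(int nIn, int nHidden, int nOut) {
        Random rng = new Random();
        List<Signal> inLayer = new ArrayList<>();
        for (int i = 0; i < nIn; i++) {
            InputUnit u = new InputUnit();
            inputs.add(u);
            inLayer.add(u);
        }
        List<Signal> hidLayer = new ArrayList<>();
        for (int j = 0; j < nHidden; j++) {
            SigmoidUnit u = new SigmoidUnit(inLayer, rng); // sees every input unit
            hidden.add(u);
            hidLayer.add(u);
        }
        for (int k = 0; k < nOut; k++) outputs.add(new SigmoidUnit(hidLayer, rng)); // sees every hidden unit
    }

    /** Sets the input values, then reads each output unit's activation. */
    double[] compute(double[] x) {
        for (int i = 0; i < x.length; i++) inputs.get(i).set(x[i]);
        double[] y = new double[outputs.size()];
        for (int k = 0; k < y.length; k++) y[k] = outputs.get(k).value();
        return y;
    }
}
```

Because each unit pulls its value from its sources on demand, hidden activations are recomputed for every output unit. That is acceptable for small networks; caching activations during a forward pass is a reasonable refinement, and likely a useful one once training needs the hidden activations.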
Although the foundation for training your NeuralNet is now in place, we need some additional machinery to propagate the error backwards through the network. (You are implementing what is known as the back-propagation algorithm.) To repeat and extend the previous instructions:
... in order to train a network, we need to distinguish between the actual activation and the symbolic value we associate with a particular activation. For example, an activation of 0.53 may be treated as an output of "1", but we will still need the 0.53 when it comes time to train the network.
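A tiny sketch of keeping the two values apart, assuming a cut-off of 0.5 between the symbols "0" and "1" (the assignment does not fix a cut-off). `ActivationSketch` and `symbol` are hypothetical names.

```java
/**
 * Sketch: keep the raw activation for training, and derive the symbolic
 * label only when reporting a classification. The 0.5 cut-off is an assumption.
 */
class ActivationSketch {
    static final double CUTOFF = 0.5; // assumed boundary between "0" and "1"

    /** Symbolic class value associated with an activation. */
    static int symbol(double activation) {
        return activation >= CUTOFF ? 1 : 0;
    }

    public static void main(String[] args) {
        double activation = 0.53;          // what the unit actually produced
        int label = symbol(activation);    // what we report as the output
        double error = 1 - activation;     // training uses the raw 0.53, not the symbol
        System.out.println("label = " + label + ", error = " + error);
    }
}
```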
... rather than using the step function we used before to determine a unit's output, we can use a sigmoid function, which has certain desirable properties (in particular, it is smooth and differentiable). A commonly used choice is $\sigma(x) = 1/(1+e^{-x})$, where $x$ is the weighted sum for a given node. For example, a hidden node $j$ would have weighted sum $X_j$ (including the threshold term, whose ‘input’ $x_0$ is fixed at $-1$) and output $y_j = \sigma(X_j) = 1/(1+e^{-X_j})$. The sigmoid's value lies strictly between 0 and 1 as its argument ranges from $-\infty$ to $+\infty$.
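The sigmoid and the weighted sum it is applied to can be written as below. The sketch assumes that index 0 of the weight array holds the threshold weight, paired with the fixed -1 input; `SigmoidSketch` and its methods are hypothetical names. The `slope` helper computes sigma(x)(1 - sigma(x)), the sigmoid's derivative, which is where the y(1 - y) factors in the error gradients come from.

```java
/**
 * Sketch of the sigmoid activation and the weighted sum it is applied to.
 * Convention assumed here: w[0] is the threshold weight, paired with a fixed input of -1.
 */
class SigmoidSketch {
    /** sigma(x) = 1 / (1 + e^(-x)); mathematically in (0, 1), though doubles round to 0 or 1 for large |x|. */
    static double sigmoid(double x) {
        return 1.0 / (1.0 + Math.exp(-x));
    }

    /** X_j: weighted sum of a node's inputs, including the threshold term w[0] * (-1). */
    static double weightedSum(double[] w, double[] inputs) {
        double sum = w[0] * -1.0;
        for (int i = 0; i < inputs.length; i++) sum += w[i + 1] * inputs[i];
        return sum;
    }

    /** Derivative of the sigmoid, sigma(x) * (1 - sigma(x)). */
    static double slope(double x) {
        double s = sigmoid(x);
        return s * (1 - s);
    }

    public static void main(String[] args) {
        double[] w = {0.5, 1.0, -1.0};   // threshold weight, then one weight per input
        double[] x = {2.0, 1.0};         // inputs to hidden node j
        double xj = weightedSum(w, x);   // -0.5 + 2.0 - 1.0 = 0.5
        System.out.println("X_j = " + xj + ", y_j = " + sigmoid(xj));
    }
}
```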
Now, back to training. The error signal for output node $k$ is the difference between the desired output and the actual output activation (as given by the sigmoid function): $e_k = d_k - y_k$, where $d_k$ is the desired output and $y_k$ the actual output of output unit $k$. We then use this error to follow the gradient toward better weight settings across the network. Because the sigmoid's derivative is $\sigma'(x) = \sigma(x)(1-\sigma(x))$, differentiating gives an error gradient $g_k = y_k(1-y_k)\,e_k$ for each output node $k$, and $g_j = y_j(1-y_j)\sum_k w_{jk}\,g_k$ for each hidden node $j$. Here, $w_{jk}$ is the weight on the network link between hidden unit $j$ and output unit $k$.
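A sketch of the two gradient formulas, assuming activations are held in arrays and that `w[j][k]` stores the weight from hidden unit j to output unit k (an indexing choice made for this example only). `GradientSketch` and its methods are hypothetical.

```java
/** Sketch of the output-layer and hidden-layer error gradients. */
class GradientSketch {
    /** Output layer: g_k = y_k (1 - y_k) e_k, with e_k = d_k - y_k. */
    static double[] outputGradients(double[] y, double[] d) {
        double[] g = new double[y.length];
        for (int k = 0; k < y.length; k++) {
            double e = d[k] - y[k];            // error signal e_k
            g[k] = y[k] * (1 - y[k]) * e;
        }
        return g;
    }

    /** Hidden layer: g_j = y_j (1 - y_j) * sum over k of w[j][k] * g_k. */
    static double[] hiddenGradients(double[] yHidden, double[][] w, double[] gOut) {
        double[] g = new double[yHidden.length];
        for (int j = 0; j < yHidden.length; j++) {
            double back = 0.0;                 // error flowing back from every output unit
            for (int k = 0; k < gOut.length; k++) back += w[j][k] * gOut[k];
            g[j] = yHidden[j] * (1 - yHidden[j]) * back;
        }
        return g;
    }

    public static void main(String[] args) {
        double[] gOut = outputGradients(new double[] {0.5}, new double[] {1.0});
        double[] gHid = hiddenGradients(new double[] {0.5}, new double[][] {{2.0}}, gOut);
        System.out.println(gOut[0] + " " + gHid[0]);
    }
}
```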
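Finally, during training, each weight $w_{ij}$ on the link between input unit $i$ and hidden unit $j$ is replaced by $w_{ij} + \alpha\, x_i\, g_j$, where $\alpha$ is a learning rate parameter that you can hold constant at some small value such as 0.05. Similarly, each weight $w_{jk}$ on the link between hidden node $j$ and output node $k$ becomes $w_{jk} + \alpha\, y_j\, g_k$, where $y_j$ is the sigmoid activation of node $j$ and $g_k$ is the error gradient for output node $k$. The threshold weights are updated the same way, using their fixed input of $-1$. Compute all of the error gradients before changing any weights, since $g_j$ depends on the current values of $w_{jk}$.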
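Putting the pieces together, one back-propagation step for a single training example might look like the sketch below. Assumptions: weights live in arrays where `wIH[j][i + 1]` links input i to hidden unit j and `wHO[k][j + 1]` links hidden unit j to output unit k, index 0 of each row is the threshold weight with its fixed -1 input, and all gradients are computed before any weight changes. `BackpropStepSketch` and its methods are hypothetical, and the sample weights in `main` are made up.

```java
/**
 * Sketch of one back-propagation step for a single example on a fully
 * connected input-hidden-output network. Index 0 of every weight row is
 * the threshold weight, whose input is fixed at -1.
 */
class BackpropStepSketch {
    static double sigmoid(double x) { return 1.0 / (1.0 + Math.exp(-x)); }

    // Sigmoid activations of one layer, given the previous layer's values.
    static double[] layer(double[][] w, double[] in) {
        double[] out = new double[w.length];
        for (int j = 0; j < w.length; j++) {
            double sum = -w[j][0];
            for (int i = 0; i < in.length; i++) sum += w[j][i + 1] * in[i];
            out[j] = sigmoid(sum);
        }
        return out;
    }

    /** Half the summed squared error on one example (used only to check progress). */
    static double error(double[][] wIH, double[][] wHO, double[] x, double[] d) {
        double[] y = layer(wHO, layer(wIH, x));
        double e = 0.0;
        for (int k = 0; k < y.length; k++) e += 0.5 * (d[k] - y[k]) * (d[k] - y[k]);
        return e;
    }

    /** Updates wIH and wHO in place using learning rate a. */
    static void step(double[][] wIH, double[][] wHO, double[] x, double[] d, double a) {
        double[] yH = layer(wIH, x);   // hidden activations y_j
        double[] yO = layer(wHO, yH);  // output activations y_k

        // Output gradients g_k = y_k (1 - y_k)(d_k - y_k).
        double[] gO = new double[yO.length];
        for (int k = 0; k < yO.length; k++) gO[k] = yO[k] * (1 - yO[k]) * (d[k] - yO[k]);

        // Hidden gradients g_j = y_j (1 - y_j) sum_k w_jk g_k, using the OLD w_jk.
        double[] gH = new double[yH.length];
        for (int j = 0; j < yH.length; j++) {
            double back = 0.0;
            for (int k = 0; k < gO.length; k++) back += wHO[k][j + 1] * gO[k];
            gH[j] = yH[j] * (1 - yH[j]) * back;
        }

        // w_jk += a * y_j * g_k (the threshold weight sees input -1).
        for (int k = 0; k < wHO.length; k++) {
            wHO[k][0] += a * -1.0 * gO[k];
            for (int j = 0; j < yH.length; j++) wHO[k][j + 1] += a * yH[j] * gO[k];
        }
        // w_ij += a * x_i * g_j (the threshold weight sees input -1).
        for (int j = 0; j < wIH.length; j++) {
            wIH[j][0] += a * -1.0 * gH[j];
            for (int i = 0; i < x.length; i++) wIH[j][i + 1] += a * x[i] * gH[j];
        }
    }

    public static void main(String[] args) {
        double[][] wIH = {{0.1, 0.4, -0.3}, {-0.2, 0.2, 0.5}};
        double[][] wHO = {{0.3, -0.4, 0.6}};
        double[] x = {1, 0};
        double[] d = {1};
        double before = error(wIH, wHO, x, d);
        step(wIH, wHO, x, d, 0.05);
        System.out.println(before + " -> " + error(wIH, wHO, x, d));
    }
}
```

With a small learning rate such as 0.05, a single step should lower the error on the example just presented; training repeats such steps over all examples for many passes.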
On the machine where you do your homework, create a folder called <your email name> followed by “HW8”. For example, someone with email address “cjones” would create a folder called “cjonesHW8”. Inside that folder, place plain text file(s) containing your answers to any exercises, along with whatever Java and documentation files are necessary for your programming project. Then tar or zip the folder so that when I extract it, the folder “<emailname>HW8” will be created. Finally, submit via Eureka.