Computational Intelligence SS14
Homework 2
Neural Networks

Zeno Jonke

Tutor: Anja Karl, [email protected]
Points to achieve: 16 pts
Extra points: 4* pts
Info hour: 05.05.2014, 16:00 - 17:00, HS i12
Deadline: 09.05.2014, 17:00
Hand-in mode: Hand in a hard-copy of your results at the IGI hand-in boxes (Inffeldgasse 16b, 1st floor) on 09.05.2014 between 9:00 and 17:00. Use the cover sheet from the website. Send your code by e-mail to the tutor.
Detailed hand-in instructions: https://www.spsc.tugraz.at/courses/computational-intelligence/
Newsgroup: tu-graz.lv.ci
Contents

1 Optional: Backpropagation for RBF networks [4 *points]
2 Regression with Neural Networks [7 points]
  2.1 Simple Regression with Neural Networks [3 points]
  2.2 Regularized Neural Networks [4 points]
  2.3 Hints
3 Face Recognition with Neural Networks [9 points]
  3.1 Pose Recognition
  3.2 Face Recognition
  3.3 Hints and Remarks
General remarks

Your submission will be graded based on:

• The correctness of your results (Is your code doing what it should be doing? Are your plots consistent with what algorithm XY should produce for the given task? Is your derivation of formula XY correct?)
• The depth and correctness of your interpretations (Keep your interpretations as short as possible, but as long as necessary to convey your ideas.)
• The quality of your plots (Is everything important clearly visible in the print-out? Are the axes labeled?)
1 Optional: Backpropagation for RBF networks [4 *points]
Consider the following 2-layer feedforward neural network with a 2-dimensional input, M hidden and K output neurons. The output of the k-th neuron in the output layer is given by:

    z_k(x) = σ( Σ_{j=1}^{M} w_kj^(2) · exp( −||x − µ_j||² / b_j² ) ),

where σ(·) denotes the sigmoid function. Your task is to derive a weight update rule for the weights w_kj, µ_j and b_j using the backpropagation algorithm. Only consider the sum of squared errors of a single example (x, y). Assume that the activation (the input to the activation function) of the j-th neuron in the hidden layer is given as

    a_j^(1) = (1/b_j²) · ||x − µ_j||².

Always use the same formalism and notation as discussed in the lecture/practical course. (A reminder of the generic backpropagation formalism follows the task list below.)
• Define the activation function f_j^(1) and the output z_j^(1) of the neurons in the hidden layer.
• Define the activation a_k^(2), the activation function f_k^(2) and the output z_k^(2) of the neurons in the output layer. Use the definitions from the hidden layer to simplify your equations.
• Calculate δ_k^(2) for the output neurons and the resulting weight update Δw_kj for the weights in the output layer.
• Calculate δ_j^(1) for the hidden neurons and the resulting weight updates Δµ_j and Δb_j for the parameters in the hidden layer.
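As a reminder, here is a sketch of the generic δ-formalism of backpropagation. It assumes the common convention δ_n^(l) = ∂E/∂a_n^(l), the squared error of a single example, and a gradient-descent step with learning rate η; the lecture's exact notation may differ in details, so adapt it accordingly:

    E        = (1/2) Σ_k (z_k^(2) − y_k)²
    δ_k^(2)  = ∂E/∂a_k^(2)
    Δw_kj    = −η ∂E/∂w_kj = −η δ_k^(2) z_j^(1)
    δ_j^(1)  = ∂E/∂a_j^(1) = (∂z_j^(1)/∂a_j^(1)) Σ_k δ_k^(2) ∂a_k^(2)/∂z_j^(1)

The updates Δµ_j and Δb_j then follow from one more application of the chain rule through a_j^(1), i.e. via ∂a_j^(1)/∂µ_j and ∂a_j^(1)/∂b_j.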
2 Regression with Neural Networks [7 points]

2.1 Simple Regression with Neural Networks [3 points]
In this task, a simple 1-dimensional function is to be learned with a feed-forward neural network. Use the data.mat dataset. In regressionNN_skeleton.m you can find the code for loading and normalizing the data and for plotting the results. Add your code to implement the required functionality.
• Train a neural network with n = [1, 2, 3, 4, 6, 8, 12, 20, 40] hidden neurons. Use the training algorithm 'trainscg' and train for 700 epochs.
• Plot the mean squared error (MSE) on the training and on the test set for the different numbers of hidden neurons n.
• Interpret your results. What is the best value of n?
• Plot the error on the test set and on the training set for n = 2, n = 8 and n = 40 during training (you can save these plots using the saveas function, or you can get the relevant data from the performance structure returned by the train function). Interpret your results: is the error on the training and the test set always decreasing?
• Plot the learned functions for n = 2, n = 8 and n = 40. Interpret your results and refer to the results from the previous plots! (A sketch of the training loop follows this list.)
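A minimal sketch of the training loop (not the provided skeleton code), assuming the older Neural Network Toolbox interface that the hints in Section 2.3 refer to (newff, sim, and the TV argument of train); the names Ptrain, Ttrain, Ptest, Ttest are illustrative placeholders for the normalized data:

    hidden = [1 2 3 4 6 8 12 20 40];
    mseTrain = zeros(size(hidden));
    mseTest  = zeros(size(hidden));
    TV.P = Ptest;                          % test set, passed to train via TV
    TV.T = Ttest;
    for i = 1:numel(hidden)
        net = newff(Ptrain, Ttrain, hidden(i));    % 2-layer net, hidden(i) hidden units
        net.trainFcn = 'trainscg';
        net.trainParam.epochs = 700;
        [net, tr] = train(net, Ptrain, Ttrain, [], [], [], TV);
        mseTrain(i) = mse(sim(net, Ptrain) - Ttrain);   % final training MSE
        mseTest(i)  = mse(sim(net, Ptest)  - Ttest);    % final test MSE
        % tr.perf and tr.tperf hold the per-epoch training/test error curves
    end
    plot(hidden, mseTrain, 'o-', hidden, mseTest, 's-');
    xlabel('number of hidden neurons n'); ylabel('MSE');
    legend('training set', 'test set');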
2.2 Regularized Neural Networks [4 points]

Now we want to investigate different regularization methods for neural networks, namely weight decay and early stopping. Use the same dataset as before.
• Weight Decay: Train a neural network with n = 40 hidden neurons. Use the training algorithm 'trainscg' and train for 700 epochs. Use the regularized error function msereg as net.performFcn. This performance function is equivalent to the standard loss function used for weight decay:

      msereg = α · mse + (1 − α) · Σ_i w_i²

  Use the different regularization factors (α, i.e. net.performParam.ratio in MATLAB)
  α = [0.9, 0.95, 0.975, 0.99, 0.995, 1.0].
• Plot the mean squared error on the training and on the test set for the given regularization factors.
• Interpret your results, i.e. explain the behavior of both curves. What is the best value of α?
• Plot the learned functions for the lowest, the highest, and the best value of α.
• Early Stopping: Now we want to test the performance of early stopping. Again, train a neural network with 40 hidden neurons, using the standard mse function as the performance function. Train the network for 700 epochs and determine the epoch epochES at which the error on the test set reaches its minimum. Then retrain a second neural network starting from the same initial weights as the first network, but this time train for only epochES epochs.
• Determine the error on the test set and plot the learned function. Compare the error and the plotted function to those of the fully trained network.
• Compare the performance (error on the test set) and the learned functions for early stopping, weight decay, and the different numbers of hidden neurons. Which type of regularization would you prefer? What are the advantages/disadvantages of these methods? (A code sketch for both regularization variants follows this list.)
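A minimal sketch of both experiments, again under the same assumptions as above (old toolbox API; Ptrain, Ttrain, Ptest, Ttest are illustrative names for the normalized data). Since network objects have value semantics, keeping a plain copy of the initialized network preserves its initial weights for the retraining step:

    TV.P = Ptest; TV.T = Ttest;

    % --- Weight decay ---
    net = newff(Ptrain, Ttrain, 40);
    net.trainFcn = 'trainscg';
    net.trainParam.epochs = 700;
    net.performFcn = 'msereg';
    ratios = [0.9 0.95 0.975 0.99 0.995 1.0];
    testErr = zeros(size(ratios));
    for i = 1:numel(ratios)
        net = init(net);                         % fresh random weights per run
        net.performParam.ratio = ratios(i);      % the alpha of msereg
        net = train(net, Ptrain, Ttrain, [], [], [], TV);
        % msereg mixes in the weight penalty, so compute the plain MSE explicitly
        testErr(i) = mse(sim(net, Ptest) - Ttest);
    end

    % --- Early stopping ---
    net = newff(Ptrain, Ttrain, 40);
    net.trainFcn = 'trainscg';
    net.trainParam.epochs = 700;
    netInit = net;                               % copy of the initial weights
    [net, tr] = train(net, Ptrain, Ttrain, [], [], [], TV);
    [dummy, idx] = min(tr.tperf);                % minimal test-set error
    epochES = tr.epoch(idx);                     % corresponding epoch number
    netInit.trainParam.epochs = epochES;
    netES = train(netInit, Ptrain, Ttrain, [], [], [], TV);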
2.3 Hints
• You can easily use the training record tr returned by the train function to get the error on the training set. If you supply the test set to the train function via the TV parameter, i.e. train(net,P,T,[],[],[],TV), you will also find the error on the test set in that structure. See doc train for more info.
• For weight decay you cannot use the performance values returned by the train function, because they contain the regularized error and not the MSE. Instead, you need to determine the MSE explicitly (e.g. using the function mse(errorvector)).
• Always normalize your training data to zero mean and unit variance (use the function mapstd; a short example follows this list).
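A short usage sketch of mapstd (variable names illustrative); note that the test set should be transformed with the normalization settings obtained from the training set:

    [Pn, ps] = mapstd(P);                  % inputs: zero mean, unit variance
    [Tn, ts] = mapstd(T);                  % same for the targets
    PtestN = mapstd('apply', Ptest, ps);   % apply training-set statistics to the test set
    % map network outputs back to the original target scale:
    Y = mapstd('reverse', sim(net, PtestN), ts);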
3 Face Recognition with Neural Networks [9 points]
In this task you work with the data file faces.mat, which contains face images. The dataset contains images of different persons, with different poses (straight/left/right/up), with/without sunglasses, and showing different emotions. It contains two datasets: dataset1 (input1, target1) with 60 data points and dataset2 (input2, target2) with 564 data points. The target matrices contain the class information: the first column codes the person, the second column the pose, the third column the emotion, and the last column indicates whether the person is wearing sunglasses. In faces_template.m you can find a script for training a network to recognize the presence of sunglasses; this script can be used as a template. Additionally, you need to download the file confmat.m, which is needed to calculate the confusion matrix.
3.1 Pose Recognition
• Train a 2-layer feed-forward neural network with 6 hidden units for pose recognition. Use dataset2 for training, trainscg as the training algorithm, and train for 300 epochs. Do not use any test set.
• State the confusion matrix on the training set. Are there any poses which can be separated better than others?
• Plot the weights of the hidden layer for every hidden unit. Can you find particular regions of the images which receive larger weights than others? Do particular units seem to be tuned to particular features of some sort? (A code sketch for this subsection follows the list.)
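A minimal sketch for this subsection. It assumes that the rows of input2 are examples, that the pose label is in column 2 of target2 with 4 pose classes, and that the images have height r and width c; the accumarray-based confusion matrix is only a stand-in, since the exact interface of the provided confmat.m is not shown here:

    X = mapstd(double(input2'));          % columns = examples (assumed layout)
    pose = target2(:, 2)';                % pose class labels, assumed in column 2
    T = full(ind2vec(pose));              % 1-out-of-n target coding
    net = newff(X, T, 6);
    net.trainFcn = 'trainscg';
    net.trainParam.epochs = 300;
    net = train(net, X, T);
    pred = vec2ind(sim(net, X));          % predicted poses on the training set
    C = accumarray([pose' pred'], 1, [4 4])   % rows: true class, cols: predicted
    % hidden-layer weights: one row of net.IW{1} per hidden unit; reshape to the
    % image dimensions to display (r, c are assumed; adjust to the actual data,
    % and note that the reshape orientation depends on how images were vectorized)
    for j = 1:6
        subplot(2, 3, j);
        imagesc(reshape(net.IW{1}(j, :), r, c));
        axis image off;
    end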
3.2 Face Recognition
• Train a 2-layer feed-forward neural network with 20 hidden units for recognizing the individuals. Use dataset1 for training, trainscg as the training algorithm, and train for 1000 epochs. Use dataset2 as the test set.
• Repeat the process 10 times, each time starting from a different initial weight vector. Plot the histograms of the resulting mean squared errors (MSE) on the training and on the test set.
• Interpret your results! Explain the variance in the results.
• Use the best network (with minimal MSE on the test set) to calculate the confusion matrix for the test set and the mean classification error (not the MSE!) on the test set. Plot a few misclassified images. Do they have anything in common? (A sketch of the restart loop follows this list.)
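A minimal sketch of the 10 random restarts and the classification error, assuming the person label is in column 1 of the targets and the same data layout as in the previous sketch:

    [Xtr, ps] = mapstd(double(input1'));   % columns = examples (assumed layout)
    Xte = mapstd('apply', double(input2'), ps);
    ytr = target1(:, 1)';                  % person labels, assumed in column 1
    yte = target2(:, 1)';
    Ttr = full(ind2vec(ytr));
    Tte = full(ind2vec(yte));
    for r = 1:10
        net = newff(Xtr, Ttr, 20);         % each call starts from new random weights
        net.trainFcn = 'trainscg';
        net.trainParam.epochs = 1000;
        net = train(net, Xtr, Ttr);
        mseTr(r) = mse(sim(net, Xtr) - Ttr);
        mseTe(r) = mse(sim(net, Xte) - Tte);
        nets{r} = net;                     % keep the nets to pick the best one
    end
    hist(mseTr); figure; hist(mseTe);      % histograms of training/test MSE
    [dummy, best] = min(mseTe);
    pred = vec2ind(sim(nets{best}, Xte));
    classErr = mean(pred ~= yte)           % mean classification error (not the MSE)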
3.3 Hints and Remarks
• Normalize your input data using mapstd.
• In the template script you can find the code for plotting an image and for plotting the weights of a hidden neuron.
• Be aware that the template script only covers the 2-class classification case!
• Use the functions full and ind2vec to get from the standard class coding to a 1-out-of-n coding, and vec2ind for the other way around (a short example follows this list).
• Present your results in a clear, structured, and legible way. Document them in such a way that anybody can reproduce them effortlessly.
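A short illustration of the class-coding conversions (the labels are made up for the example):

    labels = [2 1 3 2];          % standard class coding: one integer per example
    T = full(ind2vec(labels))    % 3x4 0/1 matrix: one column per example, with a
                                 % single 1 in the row of that example's class
    back = vec2ind(T)            % recovers [2 1 3 2]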