CPTS 570 Machine Learning, Fall 2014
Homework #1
Due Date: Sep 18

1. (5 points) Answer the following questions with a yes or no, along with proper justification.

   a. Is the decision boundary of the voted perceptron linear?
   b. Is the decision boundary of the averaged perceptron linear?

2. (10 points) In class, we saw the Passive-Aggressive (PA) update, which tries to achieve a margin equal to one after each update. Derive the PA weight update for achieving margin $M$.

3. (15 points) The standard perceptron algorithm we discussed in class performs a weight update whenever there is a prediction error ($\hat{y}_t \neq y_t$):

   $w_{t+1} = w_t + y_t \cdot x_t$   (1)

Suppose we know beforehand that the training data is separable with margin value $\gamma > 0$. We decide to incorporate this knowledge into the perceptron algorithm by changing the weights whenever there is a margin error:

   $0 < y_t \cdot \frac{w_t}{\|w_t\|} \cdot x_t \leq \gamma$   (2)

Let $B(x_t, \gamma)$ be a ball of radius $\gamma$ centered at $x_t$. Whenever there is a margin error, we want to update the weights such that all the points in the ball $B(x_t, \gamma)$ have the same label. Derive the weight update that achieves this criterion.

4. (70 points) Programming and empirical analysis question. Implement a multi-class online learning algorithm with both the perceptron and Passive-Aggressive (PA) weight updates, as shown below. Employ the single weight-vector representation (representation II, as discussed in class), defined as follows. Each training example is of the form $(x_t, y_t)$, where $x_t \in \mathbb{R}^d$ is the input and $y_t \in \{1, 2, \ldots, k\}$ is the class (output) label. In this representation, you have a single weight vector $w \in \mathbb{R}^{k \cdot d}$, and the augmented feature function $F(x_t, y) \in \mathbb{R}^{k \cdot d}$ has $k$ blocks of size $d$, with zeroes everywhere except for the $y$-th block, which contains $x_t$.

Algorithm 1: Online Multi-Class Learning Algorithm
Input: $D$ = training examples, $k$ = number of classes, $T$ = maximum number of training iterations
Output: $w$, the final weight vector

   Initialize the weights $w = 0$
   for each training iteration $itr \in \{1, 2, \ldots, T\}$ do
      for each training example $(x_t, y_t) \in D$ do
         $\hat{y}_t = \arg\max_{y \in \{1, 2, \ldots, k\}} \, w \cdot F(x_t, y)$   // predict using the current weights
         if mistake ($\hat{y}_t \neq y_t$) then
            $w = w + \tau \cdot (F(x_t, y_t) - F(x_t, \hat{y}_t))$   // update the weights
         end if
      end for
   end for
   return the final weight vector $w$

For the standard perceptron, you will use $\tau = 1$. For the Passive-Aggressive (PA) algorithm, you will compute the learning rate $\tau$ as follows:

   $\tau = \dfrac{1 - \left( w \cdot F(x_t, y_t) - w \cdot F(x_t, \hat{y}_t) \right)}{\|F(x_t, y_t) - F(x_t, \hat{y}_t)\|^2}$   (3)
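For concreteness, here is a minimal NumPy sketch of representation II and the two updates above. It is only an illustration, not the required implementation: the function names (F, train), the "perceptron"/"pa" flag, the 0-indexed labels, and the synthetic data are all assumptions of this sketch; the real training, development, and test files are provided with the assignment.

import numpy as np

def F(x, y, k):
    # Augmented feature vector: k blocks of size d, all zero except
    # block y (0-indexed here; the assignment uses labels 1..k), which holds x.
    d = x.shape[0]
    out = np.zeros(k * d)
    out[y * d:(y + 1) * d] = x
    return out

def train(D, k, d, T, algorithm="perceptron"):
    # Algorithm 1 with tau = 1 (perceptron) or the PA rate of Eq. (3).
    w = np.zeros(k * d)
    for itr in range(T):
        for x_t, y_t in D:
            scores = [w @ F(x_t, y, k) for y in range(k)]
            y_hat = int(np.argmax(scores))
            if y_hat != y_t:  # mistake
                delta = F(x_t, y_t, k) - F(x_t, y_hat, k)
                if algorithm == "pa":
                    # Eq. (3): loss over squared norm of the feature difference
                    tau = (1 - (scores[y_t] - scores[y_hat])) / (delta @ delta)
                else:
                    tau = 1.0
                w += tau * delta
    return w

# Tiny synthetic example: 3 classes, 4 features.
rng = np.random.default_rng(0)
D = [(rng.normal(size=4), int(rng.integers(3))) for _ in range(20)]
w = train(D, k=3, d=4, T=10, algorithm="pa")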
You are provided with a training set, a development (validation) set, and a testing set of classification examples. Run the perceptron and Passive-Aggressive algorithms on the training data to compute the following.

a. Compute the online learning curve for both the Perceptron and PA algorithms by plotting the number of training iterations (1 to 100) on the x-axis and the number of mistakes on the y-axis. Compare the two curves and list your observations.

b. Compute the accuracy of both the Perceptron and PA algorithms on the training data and the validation data as a function of the training iterations (1 to 100). You will thus have two accuracy curves for the Perceptron and another two accuracy curves for the PA algorithm. Compare the four curves and list your observations.

c. Tune the hyper-parameter (number of training iterations) on the validation data, and compute the test accuracy of the Perceptron and PA algorithms with the corresponding best weights. Which algorithm achieves the better test accuracy?

d. Repeat the Perceptron experiments (a, b, and c), shuffling the training data during each iteration. Compare the experiments with and without shuffling. What did you observe?

e. Repeat the Perceptron experiments (a, b, and c) with a variable learning rate ($\frac{1}{\text{iteration}}$) during each iteration. Compare the experiments with the fixed and the variable learning rate. What did you observe?

f. Repeat experiment (c) with the averaged perceptron (remember the efficient implementation trick I mentioned in class; see the sketch after this list). Compare the test accuracies of the plain perceptron and the averaged perceptron. What did you observe?

g. Compute the general learning curve (vary the number of training examples, starting from 1000, in increments of 1000) for a fixed number of training iterations (100). Plot the number of training examples on the x-axis and the testing accuracy on the y-axis. List your observations from this curve.
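The specific trick referenced in part (f) is not spelled out in this handout, so the following is an assumption: one standard efficient averaging scheme keeps, alongside w, a counter-weighted sum u of the updates, so that the average of all intermediate weight vectors is recovered at the end as w - u/c without storing a copy of w per step. A sketch in the multi-class setting, reusing F from the earlier snippet:

import numpy as np

def F(x, y, k):
    # Same block feature map as in the earlier sketch.
    d = x.shape[0]
    out = np.zeros(k * d)
    out[y * d:(y + 1) * d] = x
    return out

def averaged_perceptron(D, k, d, T):
    w = np.zeros(k * d)
    u = np.zeros(k * d)  # counter-weighted sum of updates
    c = 1                # counts examples seen so far
    for itr in range(T):
        for x_t, y_t in D:
            y_hat = int(np.argmax([w @ F(x_t, y, k) for y in range(k)]))
            if y_hat != y_t:
                delta = F(x_t, y_t, k) - F(x_t, y_hat, k)
                w += delta
                u += c * delta
            c += 1
    return w - u / c  # average of all intermediate weight vectors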