Solution to Assignment 1

Exercise 3.11 · VC Dimension · Bayes Classifier · Gaussian Bayes Classifier
Hong Chang
Institute of Computing Technology,
Chinese Academy of Sciences
Machine Learning Methods (Spring 2014)
Exercise 3.11
We have seen that, as the size of a data set increases, the uncertainty
associated with the posterior distribution over model parameters
decreases. Make use of the matrix identity
\[
(M + v v^T)^{-1} = M^{-1} - \frac{(M^{-1} v)(v^T M^{-1})}{1 + v^T M^{-1} v}
\]
to show that the uncertainty \(\sigma_N^2(x)\) associated with the linear regression function given by (3.59) satisfies
\[
\sigma_{N+1}^2(x) \le \sigma_N^2(x).
\]
(3.59): \(\sigma_N^2(x) = \frac{1}{\beta} + \phi(x)^T S_N \phi(x)\)
β is the precision (inverse variance) of the noise.
The first term represents the noise on the data; the second term reflects
the uncertainty associated with the parameters.
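The matrix identity above is a special case of the Sherman–Morrison formula, and it is easy to verify numerically. A quick NumPy sketch (the matrix M and vector v below are random and purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

# Random symmetric positive definite M and a random vector v.
A = rng.standard_normal((4, 4))
M = A @ A.T + 4 * np.eye(4)
v = rng.standard_normal(4)

# Left-hand side: direct inverse of the rank-one update.
lhs = np.linalg.inv(M + np.outer(v, v))

# Right-hand side: the Sherman-Morrison form from the identity.
Minv = np.linalg.inv(M)
rhs = Minv - np.outer(Minv @ v, v @ Minv) / (1 + v @ Minv @ v)

print(np.allclose(lhs, rhs))  # True
```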
Solution to Exercise 3.11
As, from (3.51),
\[
S_N^{-1} = S_0^{-1} + \beta \Phi^T \Phi = S_0^{-1} + \beta \sum_{n=1}^{N} \phi(x_n) \phi(x_n)^T. \tag{1}
\]
So,
\[
S_{N+1}^{-1} = S_0^{-1} + \beta \sum_{n=1}^{N+1} \phi(x_n) \phi(x_n)^T = S_N^{-1} + \beta \phi(x_{N+1}) \phi(x_{N+1})^T.
\]
From (3.59), we can express \(\sigma_{N+1}^2(x)\) as
\[
\sigma_{N+1}^2(x) = \frac{1}{\beta} + \phi(x)^T S_{N+1} \phi(x) = \sigma_N^2(x) + \phi(x)^T (S_{N+1} - S_N) \phi(x). \tag{2}
\]
Applying the matrix identity to Eqn (1), by setting \(M = S_N^{-1}\) and \(v = \sqrt{\beta}\,\phi(x_{N+1})\):
\[
S_{N+1} = S_N - \frac{\beta S_N \phi(x_{N+1}) \phi(x_{N+1})^T S_N}{1 + \beta \phi(x_{N+1})^T S_N \phi(x_{N+1})}. \tag{3}
\]
Substituting Eqn (3) into Eqn (2), we get:
\[
\sigma_{N+1}^2(x) = \sigma_N^2(x) - \frac{\beta \phi(x)^T S_N \phi(x_{N+1}) \phi(x_{N+1})^T S_N \phi(x)}{1 + \beta \phi(x_{N+1})^T S_N \phi(x_{N+1})}.
\]
Since \(S_N\) is positive definite and \(S_N \phi(x_{N+1}) \phi(x_{N+1})^T S_N\) is positive semidefinite, the second term above is nonnegative. Hence \(\sigma_{N+1}^2(x) \le \sigma_N^2(x)\).
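The monotonicity result can also be checked numerically by iterating the rank-one update of Eqn (3) on random data. A small NumPy sketch (the dimensions, precision value, and basis vectors are all illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
beta = 2.0   # noise precision (illustrative value)
M = 5        # number of basis functions

# Start from an arbitrary positive definite covariance S_N (here N = 0).
A = rng.standard_normal((M, M))
S = np.linalg.inv(A @ A.T + np.eye(M))

x_query = rng.standard_normal(M)  # phi(x) at a fixed query point

def predictive_var(S):
    # sigma_N^2(x) = 1/beta + phi(x)^T S_N phi(x), Eq. (3.59)
    return 1.0 / beta + x_query @ S @ x_query

variances = []
for _ in range(20):
    variances.append(predictive_var(S))
    phi = rng.standard_normal(M)  # phi(x_{N+1}) for a new data point
    # Rank-one update of S_N, the Sherman-Morrison form of Eq. (3).
    S = S - beta * np.outer(S @ phi, phi @ S) / (1 + beta * phi @ S @ phi)
variances.append(predictive_var(S))

# The predictive variance never increases as data points are added.
print(all(v2 <= v1 + 1e-12 for v1, v2 in zip(variances, variances[1:])))  # True
```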
Understanding by Illustration
[Figure: Predictive distribution for Bayesian linear regression with a linear combination of Gaussian basis functions, shown as panels of t against x. The mean (red curve) and standard deviation (red shaded region) are shown.]
VC Dimension
1. sign(x_1^2 + a): VC dimension = 1.
2. A Gaussian Bayes classifier with equal covariances: VC dimension = 4. The learner gives a linear decision plane in 3D space, and a hyperplane in R^3 can shatter at most 3 + 1 = 4 points.
3. Decision boundaries that are circles centered at the origin, of radius a, where the class value we predict inside the circle is specified by the parameter b: VC dimension = 2.
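For case 1, the answer can be confirmed by brute force: sign(x^2 + a) labels a point positive exactly when x^2 exceeds the threshold −a, so a single point can be shattered but no pair can, since the labeling is a monotone threshold on x^2. A Python sketch (the helper names are mine, not from the assignment):

```python
import itertools

def achievable_labelings(points):
    """All labelings of `points` realizable by h_a(x) = sign(x^2 + a).
    h_a labels x positive iff x^2 > -a, i.e. a threshold on x^2."""
    vals = [x * x for x in points]
    cuts = sorted(set(vals))
    # Thresholds below all values, between consecutive values, above all.
    thresholds = [cuts[0] - 1.0]
    thresholds += [(u + v) / 2.0 for u, v in zip(cuts, cuts[1:])]
    thresholds += [cuts[-1] + 1.0]
    return {tuple(1 if v > t else -1 for v in vals) for t in thresholds}

def shattered(points):
    want = set(itertools.product([-1, 1], repeat=len(points)))
    return achievable_labelings(points) >= want

# One point can be shattered...
print(shattered([1.0]))  # True
# ...but no pair of points can, over a grid of candidate pairs:
grid = [-2.0, -0.5, 0.5, 1.0, 3.0]
print(any(shattered([p, q]) for p, q in itertools.combinations(grid, 2)))  # False
```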
Joint Bayes Classifier (1)
We can compute p(y |x1 , x2 ) by estimating p(x1 , x2 |y ) and p(y ):
p(y = 1) = 8/16 = 1/2
and
\[
p(x_1, x_2 \mid y = 0) = [1, 1, 3, 3]/8 = [1/8, 1/8, 3/8, 3/8]
\]
\[
p(x_1, x_2 \mid y = 1) = [3, 3, 0, 2]/8 = [3/8, 3/8, 0, 1/4]
\]
where we list the probabilities for (x_1, x_2) = (0, 0), (0, 1), (1, 0), (1, 1) in that order.
Then,
\[
p(y = 1 \mid x_1, x_2) = \frac{p(x_1, x_2 \mid y = 1)\, p(y = 1)}{p(x_1, x_2 \mid y = 1)\, p(y = 1) + p(x_1, x_2 \mid y = 0)\, p(y = 0)}
\]
Joint Bayes Classifier (2)
For the test data points, we have
p(y = 1 | 0, 1) = 3/4 ⇒ predict 1
p(y = 1 | 1, 0) = 0 ⇒ predict 0
p(y = 1 | 1, 1) = 2/5 ⇒ predict 0
The error rate on this test set is 2/3.
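These posteriors follow directly from Bayes' rule applied to the tables above; a short Python check using exact fractions (the dictionaries simply encode the estimated distributions):

```python
from fractions import Fraction as F

# Estimated class prior and joint class-conditionals, indexed by (x1, x2).
prior = {0: F(1, 2), 1: F(1, 2)}
cond = {
    0: {(0, 0): F(1, 8), (0, 1): F(1, 8), (1, 0): F(3, 8), (1, 1): F(3, 8)},
    1: {(0, 0): F(3, 8), (0, 1): F(3, 8), (1, 0): F(0, 8), (1, 1): F(2, 8)},
}

def posterior_y1(x):
    # Bayes' rule: p(y=1|x) = p(x|y=1)p(y=1) / sum_y p(x|y)p(y)
    num = cond[1][x] * prior[1]
    den = num + cond[0][x] * prior[0]
    return num / den

print(posterior_y1((0, 1)))  # 3/4
print(posterior_y1((1, 0)))  # 0
print(posterior_y1((1, 1)))  # 2/5
```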
Naive Bayes Classifier (1)
Learn an individual distribution for each feature independently:
\[
p(x_1 \mid y = 0) = [2, 6]/8 = [1/4, 3/4]
\]
\[
p(x_2 \mid y = 0) = [4, 4]/8 = [1/2, 1/2]
\]
\[
p(x_1 \mid y = 1) = [6, 2]/8 = [3/4, 1/4]
\]
\[
p(x_2 \mid y = 1) = [3, 5]/8 = [3/8, 5/8]
\]
The error rate on this test set is 2/3.
Naive Bayes Classifier (2)
Predict y similarly, assuming p(x_1, x_2 | y) = p(x_1 | y) p(x_2 | y):
p(y = 1 | 0, 1) = 0.7895 ⇒ predict 1
p(y = 1 | 1, 0) = 0.2 ⇒ predict 0
p(y = 1 | 1, 1) = 0.2941 ⇒ predict 0
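The same Bayes-rule computation with the factorized likelihood reproduces these numbers; a Python check with exact fractions (the tables restate the per-feature estimates above):

```python
from fractions import Fraction as F

prior = {0: F(1, 2), 1: F(1, 2)}
# Per-feature conditionals p(x_i = v | y), indexed as px[y][i][v].
px = {
    0: [{0: F(1, 4), 1: F(3, 4)}, {0: F(1, 2), 1: F(1, 2)}],
    1: [{0: F(3, 4), 1: F(1, 4)}, {0: F(3, 8), 1: F(5, 8)}],
}

def nb_posterior_y1(x1, x2):
    # Naive Bayes: p(x1, x2 | y) factorizes as p(x1 | y) p(x2 | y).
    def joint(y):
        return px[y][0][x1] * px[y][1][x2] * prior[y]
    return joint(1) / (joint(1) + joint(0))

for x in [(0, 1), (1, 0), (1, 1)]:
    p = nb_posterior_y1(*x)
    print(x, float(p), 1 if p > F(1, 2) else 0)
# Posteriors ~ 0.7895, 0.2, 0.2941; predictions 1, 0, 0.
```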
Gaussian Bayes Classifier (1)
Estimate the mean and covariance
Plot the class-wise Gaussians
Gaussian Bayes Classifier (2)
Matlab code
equalCov = false;   % or true
learner = gaussBayesClassify(Xtr, Ytr, equalCov);
class2DPlot(learner, Xtr, Ytr);
Bayes classifier boundary with Gaussian class-conditional distribution:
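The functions above are course-provided Matlab routines. As an illustration of what such a classifier does internally (fit one Gaussian per class, optionally with a pooled covariance, then compare class posteriors), here is a hedged NumPy sketch; all names are mine, not the course code:

```python
import numpy as np

class GaussBayes:
    """Gaussian Bayes classifier sketch. With equal_cov=True a pooled
    covariance is shared by the classes, giving a linear boundary."""

    def fit(self, X, y, equal_cov=False):
        self.classes = np.unique(y)
        self.params = {}
        pooled = np.zeros((X.shape[1], X.shape[1]))
        for c in self.classes:
            Xc = X[y == c]
            self.params[c] = {"prior": len(Xc) / len(X),
                              "mu": Xc.mean(axis=0),
                              "cov": np.cov(Xc, rowvar=False)}
            pooled += (len(Xc) / len(X)) * self.params[c]["cov"]
        if equal_cov:
            for c in self.classes:
                self.params[c]["cov"] = pooled
        return self

    def predict(self, X):
        scores = []
        for c in self.classes:
            p = self.params[c]
            diff = X - p["mu"]
            inv = np.linalg.inv(p["cov"])
            # Log Gaussian density plus log prior (constants dropped).
            logp = (-0.5 * np.einsum("ij,jk,ik->i", diff, inv, diff)
                    - 0.5 * np.log(np.linalg.det(p["cov"]))
                    + np.log(p["prior"]))
            scores.append(logp)
        return self.classes[np.argmax(scores, axis=0)]

# Toy usage: two well-separated Gaussian blobs.
rng = np.random.default_rng(0)
X0 = rng.normal([-2, 0], 0.5, size=(50, 2))
X1 = rng.normal([2, 0], 0.5, size=(50, 2))
X = np.vstack([X0, X1])
y = np.array([0] * 50 + [1] * 50)
clf = GaussBayes().fit(X, y, equal_cov=True)
print((clf.predict(X) == y).mean())  # 1.0 on these well-separated blobs
```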
Gaussian Bayes Classifier (3)
Matlab code
useConstant = false;
equalCov = false;   % or true
for degree = 1:4
    learner = polyClassify(degree, useConstant, gaussBayesClassify());
    learner = train(learner, Xtr, Ytr, equalCov);
    class2DPlot(learner, Xtr, Ytr);
end
Gaussian Bayes Classifier (4)
Polynomial features, arbitrary covariances
Gaussian Bayes Classifier (5)
Polynomial features, equal covariances