
Intelligent Systems I — Exercise Sheet #12
Sampling
1. Two men looked through prison bars; one saw stars, the other tried to infer where the window frame was.¹
It is a dark night and you are sitting in a pitch-black room. In the darkness, you can see six stars twinkling:
[Figure: the six stars visible in the darkness]
You know that these stars are shining through a square window (a square with center (x_c, y_c) and width w). But where in the wall is this window (what are x_c, y_c), and how large is it (what is w)? For your computations, the locations of the stars are
\[
\begin{pmatrix} X \\ Y \end{pmatrix} =
\begin{pmatrix}
-0.2335 & -0.2123 & 0.3803 & 0.2564 & 0.1670 & 0.2709 \\
0.0586 & -0.4310 & 0.3358 & -0.3078 & -0.3221 & 0.2334
\end{pmatrix}
\]
and we will assume that stars are distributed uniformly at random across the night sky (that is actually
how this dataset was generated).
(a) What is the likelihood function p(X, Y | x_c, y_c, w)?
5 points
(b) What is the maximum likelihood estimate for (x_c, y_c, w)?
10 points
(c) Using a broad uniform prior p(x_c, y_c, w) = U(x_c; −2, 2) U(y_c; −2, 2) U(w; 0, 2) (that is, values x_c, y_c ∈ [−2, 2] and w ∈ [0, 2]; U(x; a, b) is the uniform distribution over [a, b]), implement a rejection sampler for windows that can produce samples from the posterior p(x_c, y_c, w | X, Y) (an illustrative implementation sketch is given after this exercise), such as these:
[Figure: the six stars, overlaid with the square outlines of several sampled windows]
(submit your code, with a plot)
40 points
(d) What are the average (i.e. expected) values for (x_c, y_c, w) found by your sampler?
5 points
(e) By considering the form of the likelihood from (a), can you construct an analytic sampling method drawing directly from the posterior p(x_c, y_c, w | X, Y)?
30 bonus points
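For part (c), here is a minimal sketch of one way such a rejection sampler could look; it is not the required solution. It assumes the uniform-in-the-window likelihood implied by the setup, uses the broad uniform prior itself as the proposal, and bounds the acceptance ratio by the likelihood of the smallest square that still covers all stars. Batch size, number of samples, random seed, and variable names are illustrative choices.

import numpy as np
import matplotlib.pyplot as plt

# Star locations from the exercise
X = np.array([-0.2335, -0.2123, 0.3803, 0.2564, 0.1670, 0.2709])
Y = np.array([0.0586, -0.4310, 0.3358, -0.3078, -0.3221, 0.2334])
N = len(X)

# Likelihood implied by the uniform-stars assumption: the stars are i.i.d. uniform
# on the window, so p(X, Y | x_c, y_c, w) = w^(-2N) if every star lies inside the
# square, else 0. It is maximised by the smallest square that still covers all stars:
w_min = max(X.max() - X.min(), Y.max() - Y.min())

rng = np.random.default_rng(0)
accepted = []
n_accepted = 0
while n_accepted < 1000:
    B = 1_000_000                       # vectorised batch of proposals
    xc = rng.uniform(-2.0, 2.0, B)      # proposals drawn from the uniform prior
    yc = rng.uniform(-2.0, 2.0, B)
    w = rng.uniform(0.0, 2.0, B)
    # Proposals whose window contains all six stars (nonzero likelihood)
    inside = ((np.abs(X - xc[:, None]) <= w[:, None] / 2).all(axis=1)
              & (np.abs(Y - yc[:, None]) <= w[:, None] / 2).all(axis=1))
    # Accept those with probability likelihood / max-likelihood = (w_min / w)^(2N)
    keep = np.zeros(B, dtype=bool)
    keep[inside] = rng.uniform(size=inside.sum()) < (w_min / w[inside]) ** (2 * N)
    accepted.append(np.column_stack([xc[keep], yc[keep], w[keep]]))
    n_accepted += keep.sum()

samples = np.concatenate(accepted)
print("posterior means (x_c, y_c, w):", samples.mean(axis=0))

# Plot the stars together with a few of the sampled windows
fig, ax = plt.subplots()
ax.plot(X, Y, "*", color="black")
for xc_s, yc_s, w_s in samples[:25]:
    ax.add_patch(plt.Rectangle((xc_s - w_s / 2, yc_s - w_s / 2), w_s, w_s,
                               fill=False, alpha=0.3))
ax.set_aspect("equal")
plt.show()

Using the prior as the proposal keeps the sketch simple but wasteful: the vast majority of proposals are rejected, which is why they are drawn in large vectorised batches rather than one at a time.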
2. A cautionary tale about importance sampling. It was shown in the lecture that estimators constructed
using importance sampling can have high variance. This exercise shows that this problem exists even in
one dimension, even for very well-behaved problems.
Assume we wanted to estimate the normalisation constant Z = ∫ p(x) dx of the Gaussian distribution p(x) = N(x; 0, 1) using importance sampling (of course, Z = 1, and there are much better sampling methods; but let's pretend we didn't know that). This amounts to evaluating the expected value of the constant function f(x) = 1 under p. As the proposal distribution, we choose q(x) = N(x; 0, σ²). So we draw samples x_s from q and evaluate the estimator
\[
\phi := \langle f \rangle_p = \int f(x)\, p(x)\, \mathrm{d}x
      = \int f(x)\, \frac{p(x)}{q(x)}\, q(x)\, \mathrm{d}x
 \approx \frac{1}{S} \sum_{s=1}^{S} f(x_s)\, \frac{p(x_s)}{q(x_s)}
     =: \hat{\phi}
\qquad (1)
\]
Show that the expected value of this estimator is ⟨φ̂⟩ = 1 = φ. What is the variance ⟨(φ̂ − ⟨φ̂⟩)²⟩ as a function of σ? What happens when σ ≤ 1/2? Implement this algorithm, and make a plot of the empirical variance of the sample weights for σ = 0.2 as S increases (an illustrative implementation sketch follows this exercise).
40 points
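For the implementation part, an illustrative sketch of the experiment; the use of scipy.stats.norm for the densities, the grid of sample sizes S, and the random seed are assumptions, not prescribed by the exercise.

import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm

sigma = 0.2                      # proposal standard deviation from the exercise
rng = np.random.default_rng(0)

# With f ≡ 1, the importance-sampling estimate of Z is just the mean of the
# weights p(x_s)/q(x_s); here we track the empirical variance of those weights.
S_values = np.unique(np.logspace(1, 6, 25).astype(int))
empirical_var = []
for S in S_values:
    xs = rng.normal(0.0, sigma, size=S)                         # samples from q
    weights = norm.pdf(xs, 0.0, 1.0) / norm.pdf(xs, 0.0, sigma) # p(x_s) / q(x_s)
    empirical_var.append(weights.var())

plt.loglog(S_values, empirical_var, "o-")
plt.xlabel("number of samples S")
plt.ylabel("empirical variance of the weights")
plt.show()

Because the variance of the weights under q is infinite for σ = 0.2, the empirical variance should not stabilise as S grows; it keeps being pushed upward whenever an unusually large weight is drawn.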
¹ This is an extended form of Exercise 22.10 in David MacKay’s book “Information Theory, Inference and Learning Algorithms”,
available at http://www.inference.phy.cam.ac.uk/itprnn/book.html. It is the two-dimensional (and more romantic) version of
the famous “German Tank Problem” of estimating the outer bounds of a (uniform) distribution from samples.