
Intelligent Systems I — Exercise Sheet #12
Sampling
1. Two men looked through prison bars; one saw stars, the other tried to infer where the window frame was.¹
It is a dark night and you are sitting in a pitch-black room. In the darkness, you can see six stars twinkling:
[Figure: the six stars visible in the darkness]
You know that these stars are shining through a square window (a square with center (x_c, y_c) and width w). But where in the wall is this window (what are x_c, y_c), and how large is it (what is w)? For your computations, the locations of the stars are
\[
\begin{pmatrix} X \\ Y \end{pmatrix} =
\begin{pmatrix}
-0.2335 & -0.2123 & 0.3803 & 0.2564 & 0.1670 & 0.2709 \\
0.0586 & -0.4310 & 0.3358 & -0.3078 & -0.3221 & 0.2334
\end{pmatrix}
\]
and we will assume that stars are distributed uniformly at random across the night sky (that is actually
how this dataset was generated).
(a) What is the likelihood function p(X, Y | x_c, y_c, w)?
5 points
(b) What is the maximum likelihood estimate for (x_c, y_c, w)?
10 points
(c) Using a broad uniform prior p(x_c, y_c, w) = U(x_c; −2, 2) U(y_c; −2, 2) U(w; 0, 2) (that is, values x_c, y_c ∈ [−2, 2] and w ∈ [0, 2]; U(x; a, b) is the uniform distribution over [a, b]), implement a rejection sampler for windows that can produce samples from the posterior p(x_c, y_c, w | X, Y) (an illustrative implementation sketch is given after this exercise), such as these:
[Figure: the six stars, overlaid with the square outlines of several sampled windows]
(submit your code, with a plot)
40 points
(d) What are the average (i.e. expected) values for (x_c, y_c, w) found by your sampler?
5 points
(e) By considering the form of the likelihood from (a), can you construct an analytic sampling method drawing directly from the posterior p(x_c, y_c, w | X, Y)?
30 bonus points
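For part (c), here is a minimal sketch of one way such a rejection sampler could look; it is not the required solution. It assumes the uniform-in-the-window likelihood implied by the setup, uses the broad uniform prior itself as the proposal, and bounds the acceptance ratio by the likelihood of the smallest square that still covers all stars. Batch size, number of samples, random seed, and variable names are illustrative choices.

import numpy as np
import matplotlib.pyplot as plt

# Star locations from the exercise
X = np.array([-0.2335, -0.2123, 0.3803, 0.2564, 0.1670, 0.2709])
Y = np.array([0.0586, -0.4310, 0.3358, -0.3078, -0.3221, 0.2334])
N = len(X)

# Likelihood implied by the uniform-stars assumption: the stars are i.i.d. uniform
# on the window, so p(X, Y | x_c, y_c, w) = w^(-2N) if every star lies inside the
# square, else 0. It is maximised by the smallest square that still covers all stars:
w_min = max(X.max() - X.min(), Y.max() - Y.min())

rng = np.random.default_rng(0)
accepted = []
n_accepted = 0
while n_accepted < 1000:
    B = 1_000_000                       # vectorised batch of proposals
    xc = rng.uniform(-2.0, 2.0, B)      # proposals drawn from the uniform prior
    yc = rng.uniform(-2.0, 2.0, B)
    w = rng.uniform(0.0, 2.0, B)
    # Proposals whose window contains all six stars (nonzero likelihood)
    inside = ((np.abs(X - xc[:, None]) <= w[:, None] / 2).all(axis=1)
              & (np.abs(Y - yc[:, None]) <= w[:, None] / 2).all(axis=1))
    # Accept those with probability likelihood / max-likelihood = (w_min / w)^(2N)
    keep = np.zeros(B, dtype=bool)
    keep[inside] = rng.uniform(size=inside.sum()) < (w_min / w[inside]) ** (2 * N)
    accepted.append(np.column_stack([xc[keep], yc[keep], w[keep]]))
    n_accepted += keep.sum()

samples = np.concatenate(accepted)
print("posterior means (x_c, y_c, w):", samples.mean(axis=0))

# Plot the stars together with a few of the sampled windows
fig, ax = plt.subplots()
ax.plot(X, Y, "*", color="black")
for xc_s, yc_s, w_s in samples[:25]:
    ax.add_patch(plt.Rectangle((xc_s - w_s / 2, yc_s - w_s / 2), w_s, w_s,
                               fill=False, alpha=0.3))
ax.set_aspect("equal")
plt.show()

Using the prior as the proposal keeps the sketch simple but wasteful: the vast majority of proposals are rejected, which is why they are drawn in large vectorised batches rather than one at a time.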
2. A cautionary tale about importance sampling. It was shown in the lecture that estimators constructed
using importance sampling can have high variance. This exercise shows that this problem exists even in
one dimension, even for very well-behaved problems.
Assume we wanted to estimate the normalisation constant Z = ∫ p(x) dx of the Gaussian distribution p(x) = N(x; 0, 1) using importance sampling (of course, Z = 1, and there are much better sampling methods; but let's pretend we didn't know that). This amounts to evaluating the expected value of the constant function f(x) = 1 under p. As the proposal distribution, we choose q(x) = N(x; 0, σ²). So we draw samples x_s from q and evaluate the estimator
\[
\phi := \langle f \rangle_p = \int f(x)\, p(x)\, \mathrm{d}x
      = \int f(x)\, \frac{p(x)}{q(x)}\, q(x)\, \mathrm{d}x
 \approx \frac{1}{S} \sum_{s=1}^{S} f(x_s)\, \frac{p(x_s)}{q(x_s)}
     =: \hat{\phi}
\qquad (1)
\]
Show that the expected value of this estimator is ⟨φ̂⟩ = 1 = φ. What is the variance ⟨(φ̂ − ⟨φ̂⟩)²⟩ as a function of σ? What happens when σ ≤ 1/2? Implement this algorithm, and make a plot of the empirical variance of the sample weights for σ = 0.2 as S increases (an illustrative implementation sketch follows this exercise).
40 points
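For the implementation part, an illustrative sketch of the experiment; the use of scipy.stats.norm for the densities, the grid of sample sizes S, and the random seed are assumptions, not prescribed by the exercise.

import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm

sigma = 0.2                      # proposal standard deviation from the exercise
rng = np.random.default_rng(0)

# With f ≡ 1, the importance-sampling estimate of Z is just the mean of the
# weights p(x_s)/q(x_s); here we track the empirical variance of those weights.
S_values = np.unique(np.logspace(1, 6, 25).astype(int))
empirical_var = []
for S in S_values:
    xs = rng.normal(0.0, sigma, size=S)                         # samples from q
    weights = norm.pdf(xs, 0.0, 1.0) / norm.pdf(xs, 0.0, sigma) # p(x_s) / q(x_s)
    empirical_var.append(weights.var())

plt.loglog(S_values, empirical_var, "o-")
plt.xlabel("number of samples S")
plt.ylabel("empirical variance of the weights")
plt.show()

Because the variance of the weights under q is infinite for σ = 0.2, the empirical variance should not stabilise as S grows; it keeps being pushed upward whenever an unusually large weight is drawn.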
¹ This is an extended form of Exercise 22.10 in David MacKay’s book “Information Theory, Inference and Learning Algorithms”,
available at http://www.inference.phy.cam.ac.uk/itprnn/book.html. It is the two-dimensional (and more romantic) version of
the famous “German Tank Problem” of estimating the outer bounds of a (uniform) distribution from samples.