Exercises
ETH Zurich, Dept. of Computer Science
Statistical Learning Theory, SS 2014
Prof. Dr. Joachim M. Buhmann, Alexey Gronskiy
E-mails: [email protected], [email protected]
Website: http://www.inf.ethz.ch/personal/alexeygr/slt14/

Series 6, March 31, 2014 (Approximate Inference)

Please prepare solutions until April 7, 2014. Send the solutions as a PDF (scanned or LaTeX-typeset) document to Alex's email address with the subject SLT14.

Problem 1 (Variational Approximation Inference):

In a fully Bayesian model, where all parameters are given prior distributions, we denote the set of all N observed variables by X. Moreover, we collectively denote latent variables and parameters by Z. The probabilistic model specifies the joint distribution p(X, Z), and the goal is to find an approximation for the posterior p(Z|X) and for the model evidence p(X). The log marginal probability can be decomposed as follows:

\[ \ln p(X) = \mathcal{L}(q) + \mathrm{KL}(q \,\|\, p), \]

where the lower bound \mathcal{L} and the KL divergence are given by

\[ \mathcal{L}(q) = \int q(Z) \ln \frac{p(X, Z)}{q(Z)} \, dZ, \qquad \mathrm{KL}(q \,\|\, p) = - \int q(Z) \ln \frac{p(Z \mid X)}{q(Z)} \, dZ. \]

1. Show the steps of the derivation from \ln p(X) to \mathcal{L}(q) + \mathrm{KL}(q \,\|\, p). (A hint is given at the end of this sheet.)

Maximizing the lower bound with respect to the distribution q(Z) is equivalent to minimizing the KL divergence. If we allow any possible choice for q(Z), the maximum of \mathcal{L} occurs when the KL divergence vanishes, which requires q(Z) = p(Z \mid X). However, we assume that working with the true posterior distribution is computationally intractable. Therefore, we must consider a restricted family of distributions q(Z) and seek the member of this family that minimizes the KL divergence. One way to restrict the family of approximating distributions is to use a parametric distribution q(Z \mid \omega), governed by a set of parameters \omega. An alternative approach is obtained by partitioning the elements of Z into M disjoint groups, each denoted by Z_i, and assuming that the q distribution factorizes with respect to these groups:

\[ q(Z) = \prod_{i=1}^{M} q_i(Z_i). \]

The optimal solution has the form

\[ q_j^*(Z_j) = \frac{\exp\left(\mathbb{E}_{i \neq j}[\ln p(X, Z)]\right)}{\int \exp\left(\mathbb{E}_{i \neq j}[\ln p(X, Z)]\right) dZ_j}. \]

2. Solve the factorized variational approximation using a univariate Gaussian distribution, with independent and identically distributed data assumed to be drawn from a Gaussian. Consider a Gaussian-Gamma conjugate prior distribution for the mean and the precision, and consider the factorization q(\mu, \tau) = q_\mu(\mu) q_\tau(\tau). (In this simple case, the posterior distribution can be found exactly and takes the form of a Gaussian-Gamma distribution. Note, however, that the true posterior does not factorize as proposed in this exercise.)
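
Hint (Problem 1, part 1): since q(Z) is normalized, \ln p(X) can be moved inside an expectation over q(Z); combined with the product rule p(X, Z) = p(Z \mid X) \, p(X), one possible chain of identities (a sketch only, the individual steps still need to be justified) is

\[ \ln p(X) = \int q(Z) \ln p(X) \, dZ = \int q(Z) \ln \frac{p(X, Z)}{p(Z \mid X)} \, dZ = \int q(Z) \ln \frac{p(X, Z)}{q(Z)} \, dZ - \int q(Z) \ln \frac{p(Z \mid X)}{q(Z)} \, dZ. \]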
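
For part 2, the following Python sketch illustrates numerically what the factorized approximation looks like once the updates for q_\mu(\mu) and q_\tau(\tau) have been derived. It assumes the standard parametrization p(\mu \mid \tau) = N(\mu \mid \mu_0, (\lambda_0 \tau)^{-1}), p(\tau) = Gam(\tau \mid a_0, b_0) (cf. Bishop, Pattern Recognition and Machine Learning, Sec. 10.1.3); the hyperparameter values and the synthetic data are arbitrary illustrations, and the update formulas in the comments should be checked against your own derivation.

    import numpy as np

    # Hyperparameters of the Gaussian-Gamma prior (illustrative values, not from the sheet).
    mu0, lam0, a0, b0 = 0.0, 1.0, 1.0, 1.0

    # Synthetic i.i.d. Gaussian data (mean 2.0, precision 4.0).
    rng = np.random.default_rng(0)
    x = rng.normal(loc=2.0, scale=0.5, size=100)
    N, xbar = len(x), x.mean()

    # Coordinate ascent: alternate between updating q_mu and q_tau.
    E_tau = a0 / b0  # initial guess for E[tau]
    for _ in range(50):
        # q_mu(mu) = N(mu | mu_N, lam_N^{-1})
        mu_N = (lam0 * mu0 + N * xbar) / (lam0 + N)
        lam_N = (lam0 + N) * E_tau
        # q_tau(tau) = Gam(tau | a_N, b_N), using the second moment of q_mu:
        # E_mu[(x_n - mu)^2] = (x_n - mu_N)^2 + 1/lam_N
        a_N = a0 + (N + 1) / 2
        E_sq = np.sum((x - mu_N) ** 2) + N / lam_N
        b_N = b0 + 0.5 * (E_sq + lam0 * ((mu_N - mu0) ** 2 + 1.0 / lam_N))
        E_tau = a_N / b_N

    print(f"q_mu: mean = {mu_N:.3f}, variance = {1.0 / lam_N:.5f}")
    print(f"q_tau: E[tau] = {E_tau:.3f}  (data generated with precision 4.0)")

Note that in this particular model the update for \mu_N does not depend on E[\tau], so only \lambda_N, a_N and b_N change across iterations; in the general mean-field scheme of part 1, each factor update depends on the current moments of all the other factors.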