Variational Approximation [pdf] - Department of Computer Science

ETH Zurich
Dept. of Computer Science
Statistical Learning Theory
SS 2014
Prof. Dr. Joachim M. Buhmann,
Alexey Gronskiy
E-Mails: [email protected]
[email protected]
Series 6, March 31, 2014
(Approximate Inference)
Please prepare solutions until April 7, 2014.
Send the solutions as a PDF (scanned or LaTeX-typeset) document to Alex’s email address with the subject
Problem 1 (Variational Approximation Inference):
In a fully Bayesian model, where all parameters are given prior distributions, we denote the set of all N observed
variables as X. Moreover, we collectively denote latent variables and parameters by Z. The probabilistic model
specifies the joint distribution p(X, Z) and the goal is to find an approximation for the posterior p(Z|X) and for
the model evidence p(X). The log marginal probability can be decomposed as follows:
\[ \ln p(X) = \mathcal{L}(q) + \mathrm{KL}(q \,\|\, p), \]
where the lower bound L and the KL divergence are given by
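\[ \mathcal{L}(q) = \int q(Z) \ln \frac{p(X, Z)}{q(Z)} \, dZ, \]
\[ \mathrm{KL}(q \,\|\, p) = - \int q(Z) \ln \frac{p(Z|X)}{q(Z)} \, dZ. \]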
1 Show the steps that lead from ln p(X) to L(q) + KL(q‖p).
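The decomposition above can be checked numerically before it is derived. The following is a minimal sketch, assuming a discrete latent variable so that the integrals become sums; the table `joint` and the distribution `q` are made-up illustrative numbers, and the function name `decomposition` is hypothetical.

```python
import math

# Hypothetical toy model: Z takes three values and X is fixed at its observed value.
# joint[z] stands for p(X, Z=z); the numbers are made up for illustration.
joint = [0.10, 0.25, 0.05]

def decomposition(q, joint):
    """Return (ln p(X), L(q), KL(q || p(Z|X))) for a discrete latent Z."""
    evidence = sum(joint)                      # p(X) = sum_z p(X, z)
    posterior = [p / evidence for p in joint]  # p(Z | X)
    # L(q) = sum_z q(z) ln[p(X, z) / q(z)]
    lower_bound = sum(qz * math.log(pz / qz)
                      for qz, pz in zip(q, joint) if qz > 0)
    # KL(q || p) = -sum_z q(z) ln[p(z | X) / q(z)]
    kl = -sum(qz * math.log(pz / qz)
              for qz, pz in zip(q, posterior) if qz > 0)
    return math.log(evidence), lower_bound, kl

q = [0.5, 0.3, 0.2]  # an arbitrary approximating distribution
log_evidence, lower_bound, kl = decomposition(q, joint)
```

For any valid `q`, `lower_bound + kl` should agree with `log_evidence` up to rounding, and `kl` should be non-negative. Passing the exact posterior as `q` should make the KL term vanish.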
Maximizing the lower bound by optimization with respect to the distribution q(Z) is equivalent to minimizing
the KL divergence. Allowing any possible choice for q(Z), the maximum of L occurs when the KL divergence
vanishes, which requires q(Z) = p(Z|X). However, we assume that working with the true posterior distribution is
computationally intractable. Therefore, we must consider a restricted family of distributions q(Z) and then seek
the member of this family which minimizes the KL divergence. One way to restrict the family of approximating
distributions is to use a parametric distribution q(Z|ω), governed by a set of parameters ω.
An alternative approach is obtained by partitioning the elements of Z into M disjoint groups, each denoted by
Z_i. Then we assume that the q distribution factorizes with respect to these groups, following
\[ q(Z) = \prod_{i=1}^{M} q_i(Z_i). \]
The optimal solution has the form
\[ q_j^{*}(Z_j) = \frac{\exp\left(\mathbb{E}_{i \neq j}[\ln p(X, Z)]\right)}{\int \exp\left(\mathbb{E}_{i \neq j}[\ln p(X, Z)]\right) dZ_j}. \]
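To make the optimal-factor formula concrete, here is a small sketch that applies it in turn to two binary latent variables. It assumes a discrete model, so the normalizing integral becomes a sum; the table `joint` (standing for p(X, Z1, Z2) at the observed X) and all function names are hypothetical and chosen only for illustration.

```python
import math

# Hypothetical table of p(X, Z1=a, Z2=b) for a fixed observed X (made-up numbers).
joint = [[0.30, 0.05],
         [0.10, 0.15]]

def normalise(weights):
    total = sum(weights)
    return [w / total for w in weights]

def lower_bound(q1, q2):
    """L(q) for the factorised q(Z1, Z2) = q1(Z1) q2(Z2)."""
    return sum(q1[a] * q2[b] * math.log(joint[a][b] / (q1[a] * q2[b]))
               for a in range(2) for b in range(2))

def update_q1(q2):
    # q1*(a) is proportional to exp(E_{q2}[ln p(X, Z1=a, Z2)])
    return normalise([math.exp(sum(q2[b] * math.log(joint[a][b]) for b in range(2)))
                      for a in range(2)])

def update_q2(q1):
    # q2*(b) is proportional to exp(E_{q1}[ln p(X, Z1, Z2=b)])
    return normalise([math.exp(sum(q1[a] * math.log(joint[a][b]) for a in range(2)))
                      for b in range(2)])

q1, q2 = [0.5, 0.5], [0.5, 0.5]   # uniform initial factors
bounds = [lower_bound(q1, q2)]
for _ in range(20):
    q1 = update_q1(q2)
    bounds.append(lower_bound(q1, q2))
    q2 = update_q2(q1)
    bounds.append(lower_bound(q1, q2))
```

Each update maximizes the lower bound with respect to one factor while the other is held fixed, so the values in `bounds` should never decrease and should stay below ln p(X), which here is the log of the sum of all entries of `joint`.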
2 Derive the factorized variational approximation for a univariate Gaussian model, assuming that the data are
independent and identically distributed draws from a Gaussian. Place a Gaussian-Gamma conjugate
prior on the mean µ and the precision τ, and use the factorization q(µ, τ) = q_µ(µ) q_τ(τ).
(In this simple case, the posterior distribution can be found exactly and takes the form of a Gaussian-Gamma
distribution. Note, however, that the true posterior does not factorize as proposed in this exercise.)
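A derived solution can be sanity-checked with a short numerical sketch such as the one below. It rests on assumptions the sheet does not fix: the prior is parameterized as p(µ|τ) = N(µ | µ0, (λ0 τ)^{-1}) and p(τ) = Gam(τ | a0, b0), the factors come out as a Gaussian q_µ with mean `mu_n` and precision `lambda_n` and a Gamma q_τ with shape `a_n` and rate `b_n`, and the updates coded here are the ones commonly stated for this model. The function name, the hyperparameter defaults, and the synthetic data are all hypothetical.

```python
import random

def fit_factorised_gaussian(x, mu0=0.0, lambda0=1.0, a0=1.0, b0=1.0, iterations=50):
    """Iterate the q_mu / q_tau updates; returns (mu_n, lambda_n, a_n, b_n)."""
    n = len(x)
    mean_x = sum(x) / n
    # Under the assumed prior, the mean of q_mu and the shape of q_tau
    # do not depend on the other factor, so they are computed once.
    mu_n = (lambda0 * mu0 + n * mean_x) / (lambda0 + n)
    a_n = a0 + (n + 1) / 2
    expected_tau = a0 / b0            # initial guess for E[tau]
    b_n = b0
    for _ in range(iterations):
        lambda_n = (lambda0 + n) * expected_tau
        # E_mu[(x_i - mu)^2] = (x_i - mu_n)^2 + 1 / lambda_n
        data_term = sum((xi - mu_n) ** 2 for xi in x) + n / lambda_n
        # E_mu[(mu - mu0)^2] = (mu_n - mu0)^2 + 1 / lambda_n
        prior_term = (mu_n - mu0) ** 2 + 1 / lambda_n
        b_n = b0 + 0.5 * (data_term + lambda0 * prior_term)
        expected_tau = a_n / b_n      # mean of Gam(a_n, b_n)
    lambda_n = (lambda0 + n) * expected_tau
    return mu_n, lambda_n, a_n, b_n

random.seed(0)
data = [random.gauss(2.0, 0.5) for _ in range(500)]  # synthetic data, true precision 4
mu_n, lambda_n, a_n, b_n = fit_factorised_gaussian(data)
expected_precision = a_n / b_n
```

With this much synthetic data, `mu_n` should land near the sample mean and `expected_precision` near the inverse sample variance. Comparing both against the exact Gaussian-Gamma posterior is a useful way to see what the factorization assumption gives up.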