
Putting Markov Random Fields on a Tensor Train
Alexander Novikov¹   Anton Rodomanov¹   Anton Osokin¹   Dmitry Vetrov¹,²
¹ Moscow State University, Russia
² Higher School of Economics, Russia
ICML, June 22, 2014
Summary
Many tasks in Markov random fields (MRFs) are hard.
The energy and the probability of an MRF are tensors.
The Tensor Train (TT) decomposition is a compact representation of high-dimensional tensors (Oseledets, 2011) that supports efficient operations.
We use the TT-format for:
  the partition function (normalization constant);
  marginal distributions;
  MAP-inference.
MRFs and tensors
Potential:  $E(x_1, \dots, x_n) = \sum_{\ell=1}^{m} \Theta_\ell(x^\ell)$

Factor:  $\hat{P}(x_1, \dots, x_n) = \prod_{\ell=1}^{m} \Psi_\ell(x^\ell)$

Here $x_i \in \{1, \dots, d\}$, so both $E$ and $\hat{P}$ are tensors (multidimensional arrays).

MAP-inference  ⇐⇒  minimal element of $E$.
Partition function  ⇐⇒  sum of all elements of $\hat{P}$.
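As a brute-force NumPy sketch of these two correspondences (our toy 3-node chain MRF with hypothetical random potentials, not an example from the talk), note that both operations touch all $d^n$ elements, which is exactly what becomes intractable for large $n$:

    import numpy as np

    n, d = 3, 4                            # three variables, each with d = 4 states
    rng = np.random.default_rng(0)
    theta = [rng.standard_normal((d, d)) for _ in range(n - 1)]   # chain potentials

    # Energy tensor E(x1, x2, x3) = theta_1(x1, x2) + theta_2(x2, x3)
    E = theta[0][:, :, None] + theta[1][None, :, :]

    x_map = np.unravel_index(E.argmin(), E.shape)  # MAP-inference: minimal element of E
    Z = np.exp(-E).sum()                           # partition function: sum of P-hat
    print("MAP:", x_map, "Z:", Z)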
TT-format
TT-format for a tensor $A$:

$A(x_1, \dots, x_n) = \underbrace{G_1^A[x_1]}_{1 \times r_1(A)} \underbrace{G_2^A[x_2]}_{r_1(A) \times r_2(A)} \cdots \underbrace{G_n^A[x_n]}_{r_{n-1}(A) \times 1}$

Terminology:
$G_i^A$ — TT-cores;
$r_i(A)$ — TT-ranks;
$r(A) = \max_{i=1,\dots,n-1} r_i(A)$ — maximal TT-rank.

The TT-format uses $O(n d r^2(A))$ memory to store $O(d^n)$ elements.
It is efficient only if the TT-ranks are small.
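A minimal sketch of the format (our layout convention, assuming each TT-core is stored as an array of shape $r_{i-1} \times d \times r_i$ with boundary ranks 1); evaluating one element is then a chain of small matrix products:

    import numpy as np

    def tt_element(cores, x):
        """A(x_1, ..., x_n) from TT-cores; cores[i] has shape (r_prev, d, r_next)."""
        v = cores[0][:, x[0], :]              # 1 x r_1 row vector
        for core, xi in zip(cores[1:], x[1:]):
            v = v @ core[:, xi, :]            # multiply by an r_prev x r_next slice
        return v.item()                       # final shape is 1 x 1

    def tt_full(cores):
        """Materialize all d^n elements (only feasible for tiny examples)."""
        shape = tuple(core.shape[1] for core in cores)
        A = np.empty(shape)
        for idx in np.ndindex(*shape):
            A[idx] = tt_element(cores, idx)
        return A

Storing the cores takes $\sum_i r_{i-1} d\, r_i = O(n d r^2)$ numbers, versus $d^n$ for the full tensor.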
Example
$A(x_1, x_2, x_3) = x_1 + x_2 + x_3,$
$A(x_1, x_2, x_3) = G_1^A[x_1] G_2^A[x_2] G_3^A[x_3],$

$G_1^A[x_1] = \begin{bmatrix} x_1 & 1 \end{bmatrix}, \quad
G_2^A[x_2] = \begin{bmatrix} 1 & 0 \\ x_2 & 1 \end{bmatrix}, \quad
G_3^A[x_3] = \begin{bmatrix} 1 \\ x_3 \end{bmatrix}.$

Indeed:

$\begin{bmatrix} x_1 & 1 \end{bmatrix}
\begin{bmatrix} 1 & 0 \\ x_2 & 1 \end{bmatrix}
\begin{bmatrix} 1 \\ x_3 \end{bmatrix}
= \begin{bmatrix} x_1 + x_2 & 1 \end{bmatrix}
\begin{bmatrix} 1 \\ x_3 \end{bmatrix}
= x_1 + x_2 + x_3.$
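These cores are easy to check numerically with the tt_element helper sketched on the previous slide (indexing cores by the value $x_i$ itself, 0-based here):

    import numpy as np

    d = 5
    G1, G2, G3 = np.zeros((1, d, 2)), np.zeros((2, d, 2)), np.zeros((2, d, 1))
    for x in range(d):
        G1[0, x] = [x, 1]                  # [x1  1]
        G2[:, x, :] = [[1, 0], [x, 1]]     # [[1, 0], [x2, 1]]
        G3[:, x, 0] = [1, x]               # [1; x3]

    for x in [(0, 1, 2), (4, 4, 4), (3, 0, 2)]:
        assert tt_element([G1, G2, G3], x) == sum(x)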
TT-format for probability
Consider the TT-format for the probability $P = \frac{1}{Z} \hat{P}$ of an MRF:

$P(x) = G_1[x_1] G_2[x_2] \cdots G_n[x_n]
= \sum_{\alpha_1, \dots, \alpha_{n-1}} \underbrace{G_1[x_1](\alpha_1)\, G_2[x_2](\alpha_1, \alpha_2) \cdots G_n[x_n](\alpha_{n-1})}_{P(x,\, \alpha)}.$

This constructs a chain-like model with hidden variables $\alpha_i = 1, \dots, r_i(P)$.

[Figure: chain-structured graphical model — visible variables $x_1, \dots, x_n$ attached to cores $G_1, \dots, G_n$, which are linked by the hidden variables $\alpha_1, \dots, \alpha_{n-1}$.]
Some efficient operations in the TT-format

    Operation                            Output rank
    C = A + B                            r(A) + r(B)
    C = A ⊙ B  (element-wise product)    r(A) · r(B)
    min A                                –
    sum A                                –
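The sum, for instance, never has to be formed element-wise: the cores of $C = A + B$ are block-diagonal stacks of the input cores, which is exactly where the $r(A) + r(B)$ output rank comes from. A sketch in the same core layout as before:

    import numpy as np

    def tt_add(a_cores, b_cores):
        """Cores of C = A + B; ranks add: r_i(C) = r_i(A) + r_i(B)."""
        n = len(a_cores)
        out = []
        for i, (a, b) in enumerate(zip(a_cores, b_cores)):
            ra1, d, ra2 = a.shape
            rb1, _, rb2 = b.shape
            if i == 0:                       # first core: stack horizontally
                c = np.concatenate([a, b], axis=2)
            elif i == n - 1:                 # last core: stack vertically
                c = np.concatenate([a, b], axis=0)
            else:                            # middle cores: block-diagonal
                c = np.zeros((ra1 + rb1, d, ra2 + rb2))
                c[:ra1, :, :ra2] = a
                c[ra1:, :, ra2:] = b
            out.append(c)
        return out

One can verify that tt_element(tt_add(A, B), x) equals tt_element(A, x) + tt_element(B, x) for every x.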
TT-approach for MRFs
MAP-inference  ⇐⇒  minimal element of $E$.
Partition function  ⇐⇒  sum of all elements of $\hat{P}$.

Both operations are provided by the TT-format.
Let's convert $E$ and $\hat{P}$ into the TT-format.
Finding a TT-representation of an MRF
TT-SVD (Oseledets, 2011): exact TT-representation, but only for small tensors.
  → No: the MRF tensor is too big.
AMEn-cross (Oseledets & Tyrtyshnikov, 2010): approximate TT-representation; uses only a small fraction of the tensor's elements.
  → Possible, but there is a much better way!
The algorithm & its theoretical guarantees
1. Convert the potentials $\Theta_\ell(x)$ (factors $\Psi_\ell(x)$) into the TT-format.
2. Use the TT-operations: $E(x) = \sum_{\ell=1}^{m} \Theta_\ell(x)$  ($\hat{P} = \bigodot_{\ell=1}^{m} \Psi_\ell$).
[Figure: maximal TT-rank (up to 1,000) of $P$ and of $E$ as a function of the number of nodes $n$ (up to 100).]
Theorem. The maximal TT-rank of $E$ constructed by the algorithm is polynomially bounded: $r(E) \le d^{p/2} m$, where $p$ is the order of the MRF.
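Step 1 is cheap because each potential depends on only a few variables. As an illustration (our sketch, covering only the special case of a pairwise potential on two adjacent variables; the talk's construction handles general factors), the exact cores have rank $d$ on the single link the factor straddles and rank 1 elsewhere:

    import numpy as np

    def pairwise_to_tt(theta, i, n):
        """Exact TT-cores of T(x_1, ..., x_n) = theta[x_i, x_{i+1}] (0-based i)."""
        d = theta.shape[0]
        cores = [np.ones((1, d, 1)) for _ in range(n)]   # rank-1 'do nothing' cores
        picker = np.zeros((1, d, d))
        for x in range(d):
            picker[0, x, x] = 1.0             # G_i[x] = e_x selects a row of theta
        cores[i] = picker
        cores[i + 1] = theta.reshape(d, d, 1) # G_{i+1}[x] is the column theta[:, x]
        return cores

Summing the $m$ per-factor tensors with a TT-addition like tt_add above yields $E$ exactly; ranks add across factors, which is where the factor $m$ in the bound comes from.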
TT-rounding
The TT-rounding procedure $\tilde{A} = \mathrm{round}(A, \varepsilon)$:
1. reduces the TT-ranks;
2. keeps the tensors close: $\|A - \tilde{A}\|_F \le \varepsilon \|A\|_F$.
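The slides do not show the procedure itself; the following is a NumPy sketch of the standard TT-rounding of Oseledets (2011), not the authors' implementation: orthogonalize the cores right-to-left, then truncate SVDs left-to-right, spreading the error budget $\varepsilon \|A\|_F$ over the $n-1$ truncations:

    import numpy as np

    def tt_round(cores, eps):
        """Reduce TT-ranks while keeping ||A - round(A)||_F <= eps * ||A||_F."""
        cores = [c.astype(float) for c in cores]
        n = len(cores)
        # Right-to-left sweep: make cores 2..n right-orthogonal via QR.
        for i in range(n - 1, 0, -1):
            r1, d, r2 = cores[i].shape
            q, rm = np.linalg.qr(cores[i].reshape(r1, d * r2).T)
            cores[i] = q.T.reshape(-1, d, r2)
            cores[i - 1] = np.einsum('adb,cb->adc', cores[i - 1], rm)
        # The whole Frobenius norm is now concentrated in the first core.
        delta = eps * np.linalg.norm(cores[0]) / max(np.sqrt(n - 1), 1.0)
        # Left-to-right sweep: truncated SVD of each core.
        for i in range(n - 1):
            r1, d, r2 = cores[i].shape
            u, s, vt = np.linalg.svd(cores[i].reshape(r1 * d, r2),
                                     full_matrices=False)
            tail = np.sqrt(np.cumsum(s[::-1] ** 2))[::-1]   # tail[k] = ||s[k:]||
            rank = max(1, int(np.sum(tail > delta)))        # smallest rank in budget
            cores[i] = u[:, :rank].reshape(r1, d, rank)
            carry = s[:rank, None] * vt[:rank]              # absorb S V^T rightward
            cores[i + 1] = np.einsum('ab,bdc->adc', carry, cores[i + 1])
        return cores

For example, tt_round(cores, 1e-8) performs the round(·, 10⁻⁸) operation used on the next slide.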
TT-rounding example
[Figure: maximal TT-rank (up to 1,000) as a function of the number of nodes $n$ (20 to 140) for four tensors: $P$, $\mathrm{round}(P, 10^{-8})$, $E$, and $\mathrm{round}(E, 10^{-8})$.]
The algorithm motivation
The TT-ranks of $\hat{P}$ are exponential.
We will compute the partition function $Z$ without explicitly building the TT-representation of $\hat{P}$.
Partition function estimation
$\hat{P}(x) = \prod_{\ell=1}^{m} \Psi_\ell(x)
= \prod_{\ell=1}^{m} \bigl( G_1^\ell[x_1] \cdots G_n^\ell[x_n] \bigr)
= \bigl( G_1^1[x_1] \otimes \cdots \otimes G_1^m[x_1] \bigr) \cdots \bigl( G_n^1[x_n] \otimes \cdots \otimes G_n^m[x_n] \bigr).$

Denote: $A_i[x_i] = G_i^1[x_i] \otimes \cdots \otimes G_i^m[x_i]$.

Finally,

$Z = \sum_x \hat{P}(x) = \sum_{x_1, \dots, x_n} A_1[x_1] \cdots A_n[x_n]
= \underbrace{\Bigl( \sum_{x_1} A_1[x_1] \Bigr)}_{B_1} \cdots \underbrace{\Bigl( \sum_{x_n} A_n[x_n] \Bigr)}_{B_n}
= B_1 \cdots B_n.$
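A dense sketch of this identity for tiny models, assuming each factor $\ell$ is given as a list of per-variable cores (e.g. as produced by pairwise_to_tt above). Unlike the talk's actual algorithm, this forms each $B_i$ and the running product as dense matrices:

    import numpy as np
    from functools import reduce

    def partition_function(factor_cores):
        """Z = B_1 ... B_n, where B_i = sum over x of kron(G_i^1[x], ..., G_i^m[x])."""
        n = len(factor_cores[0])
        Z = np.ones((1, 1))
        for i in range(n):
            d = factor_cores[0][i].shape[1]
            B = sum(reduce(np.kron, [g[i][:, x, :] for g in factor_cores])
                    for x in range(d))
            Z = Z @ B    # the talk keeps this product in TT-format and rounds it
        return Z.item()

For example, partition_function([pairwise_to_tt(np.exp(-theta), 0, n)]) computes $Z$ for a single hypothetical pairwise factor $\Psi = e^{-\Theta}$.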
The algorithm
$Z = B_1 \cdots B_n.$

Each matrix $B_i$ is huge but can be exactly represented in the TT-format.
The algorithm:
1. $f_1 := B_1$
2. $f_2 := \mathrm{round}(f_1 B_2, \varepsilon)$
3. $f_3 := \mathrm{round}(f_2 B_3, \varepsilon)$
4. ...
5. $f_n := \mathrm{round}(f_{n-1} B_n, \varepsilon)$
6. $\tilde{Z} := f_n$
Marginal distributions
Our approach generalizes to the marginal distributions as well:

$\hat{P}_i(x_i) = B_1 \cdots B_{i-1}\, A_i[x_i]\, B_{i+1} \cdots B_n.$
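Continuing the dense sketch from the previous slides (names and layout are our assumptions), the same $B_j$ matrices give any marginal via prefix and suffix products:

    import numpy as np
    from functools import reduce

    def marginal(factor_cores, i):
        """Unnormalized P-hat_i(x_i); divide by its sum (or by Z) to normalize."""
        n = len(factor_cores[0])
        def A(j, x):                   # A_j[x] = kron of all factors' cores at mode j
            return reduce(np.kron, [g[j][:, x, :] for g in factor_cores])
        def B(j):                      # B_j = sum over x of A_j[x]
            return sum(A(j, x) for x in range(factor_cores[0][j].shape[1]))
        left = [B(j) for j in range(i)]
        right = [B(j) for j in range(i + 1, n)]
        prefix = reduce(np.matmul, left) if left else np.ones((1, 1))
        suffix = reduce(np.matmul, right) if right else np.ones((1, 1))
        d = factor_cores[0][i].shape[1]
        return np.array([(prefix @ A(i, x) @ suffix).item() for x in range(d)])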
MAP-inference
The TT-method for MAP-inference:
1. Convert the energy into the TT-format;
2. Find the minimal element of the energy tensor.
We compare the TT-method with the popular TRW-S algorithm on several
real-world image segmentation problems from the OpenGM database.
Energies of the solutions found (lower is better) and runtime in seconds:

    Problem   Variables   Labels   TRW-S    TT       Time (sec)
    gm6       320         3        45.03    43.11    637
    gm29      212         3        56.81    56.21    224
    gm66      198         3        75.19    74.92    172
    gm105     237         3        67.81    67.71    230
    gm32      100         7        150.50   289.29   257
    gm70      122         7        121.78   163.60   399
    gm85      143         7        168.30   228.40   1912
    gm192     99          7        114.51   174.78   180
Partition function
Spin glass model:

$\hat{P}(x) = \prod_{i=1}^{n} \exp\Bigl( -\frac{1}{T} h_i x_i \Bigr) \prod_{(i,j) \in \mathcal{E}} \exp\Bigl( -\frac{1}{T} c_{ij} x_i x_j \Bigr),$

where $x_i \in \{-1, 1\}$.

Notation:
temperature $T$;
unary coefficients $h_i$;
pairwise coefficients $c_{ij}$.

We compare against methods from the libDAI library (Mooij, 2010).
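This setup is easy to reproduce in miniature. A brute-force reference sketch for a hypothetical 3 × 3 grid (2⁹ configurations), against which any of the estimators can be checked:

    import itertools
    import numpy as np

    rng = np.random.default_rng(0)
    side, T, f = 3, 1.0, 1.0
    n = side * side
    edges = [(i, i + 1) for i in range(n) if (i + 1) % side != 0] \
          + [(i, i + side) for i in range(n - side)]
    h = rng.standard_normal(n)                   # unary coefficients h_i
    c = {e: rng.uniform(-f, f) for e in edges}   # pairwise coefficients c_ij

    Z = 0.0
    for x in itertools.product([-1, 1], repeat=n):
        E = sum(h[i] * x[i] for i in range(n)) + \
            sum(c[i, j] * x[i] * x[j] for (i, j) in edges)
        Z += np.exp(-E / T)
    print("exact log Z:", np.log(Z))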
Comparison
[Figure: error $|\log \tilde{Z} - \log Z|$ (from about $10^{2}$ down to $10^{-13}$, log scale) vs. temperature $T$ ($10^{-1}$ to $10^{3}$) for TT, MCMC-AIS, TreeEP (Minka and Qi, 2004), MF, and BP.]

Comparison on the Ising model (all pairwise weights equal: $c_{ij} = 1$).
WISH
[Figure: error $|\log \tilde{Z} - \log Z|$ (from about $10^{1}$ down to $10^{-5}$, log scale) vs. strength of pairwise weights $f$ (0.5 to 3) for TT, WISH (Ermon et al., 2013), BP, MF, and TreeEP.]

Comparison on the data from the WISH paper; $T = 1$, $c_{ij} \sim U[-f, f]$.
Marginal distributions

[Figure: L1 error in the marginals (from about $10^{-2}$ down to $10^{-17}$, log scale) vs. strength of pairwise weights $f$ (0 to 3) for TT, Gibbs, TreeEP, MF, and BP.]

Spin glass models; $T = 1$, $c_{ij} \sim U[-f, f]$.
Conclusion
Our contributions:
An algorithm that finds an exact TT-representation of the MRF energy;
An algorithm that estimates the partition function and the marginals;
Theoretical guarantees for the proposed algorithms.

Source code is available online: https://github.com/bihaqo/TT-MRF.
See poster S38 tonight!
References

Ermon, S. et al. (2013). "Taming the Curse of Dimensionality: Discrete Integration by Hashing and Optimization". In: International Conference on Machine Learning (ICML).
Minka, T. and Y. Qi (2004). "Tree-structured Approximations by Expectation Propagation". In: Advances in Neural Information Processing Systems 16 (NIPS), pp. 193–200.
Mooij, J. M. (2010). "libDAI: A Free and Open Source C++ Library for Discrete Approximate Inference in Graphical Models". In: Journal of Machine Learning Research 11, pp. 2169–2173.