Putting Markov Random Fields on a Tensor Train Alexander Novikov1 1 Anton Rodomanov1 Dmitry Vetrov1,2 Moscow State University, Russia 2 Anton Osokin1 Higher School of Economics, Russia ICML, June 22, 2014 A. Novikov et al. Putting MRFs on a Tensor Train ICML, June 22, 2014 1 / 23 Summary Many tasks in Markov random fields (MRFs) are hard. Energy and probability of MRFs are tensors. Tensor Train (TT) decomposition: compact representation of high-dimensional tensors (Oseledets, 2011); efficient operations. We use TT-format for: partition function (normalization constant); marginal distributions; MAP-inference. A. Novikov et al. Putting MRFs on a Tensor Train ICML, June 22, 2014 2 / 23 MRFs and tensors Potential E(x1 , . . . , xn ) = Pb (x1 , . . . , xn ) = m X `=1 m Y Θ` (x` ) ` Ψ` (x ) `=1 tensors (multidimensional arrays) Factor xi ∈ {1, . . . , d}. MAP-inference ⇐⇒ minimal element in E Partition function ⇐⇒ sum of all elements of Pb A. Novikov et al. Putting MRFs on a Tensor Train ICML, June 22, 2014 3 / 23 TT-format TT-format for tensor A: A(x1 , . . . , xn ) = G1A [x1 ] G2A [x2 ] . . . GnA [xn ] | {z } | {z } 1×r1 (A) r1 (A)×r2 (A) | {z } rn−1 (A)×1 Terminology: GiA — TT-cores; ri (A) — TT-ranks; r (A) = max ri (A) — maximal TT-rank. i=1,...,n−1 TT-format uses O ndr 2 (A) memory to store O (d n ) elements. Efficient only if TT-ranks are small. A. Novikov et al. Putting MRFs on a Tensor Train ICML, June 22, 2014 4 / 23 Example A(x1 , x2 , x3 ) = x1 + x2 + x3 , A(x1 , x2 , x3 ) = G1A [x1 ]G2A [x2 ]G3A [x3 ], A. Novikov et al. Putting MRFs on a Tensor Train ICML, June 22, 2014 5 / 23 Example A(x1 , x2 , x3 ) = x1 + x2 + x3 , A(x1 , x2 , x3 ) = G1A [x1 ]G2A [x2 ]G3A [x3 ], = h A(x1 , x2 , x3 ) = h G1A [x1 ] x1 1 i x1 1 i " G2A [x2 ] = 1 0 x2 1 # " G3A [x3 ] = 1 x3 # Indeed: " 1 0 x2 1 = A. Novikov et al. h #" 1 x3 # = x1 + x2 1 Putting MRFs on a Tensor Train i " 1 x3 # = x1 + x2 + x3 . ICML, June 22, 2014 5 / 23 TT-format for probability Consider TT-format for probability P = 1 b ZP of MRF: P (x) = G1 [x1 ]G2 [x2 ] · · · Gn [xn ] X = G1 [x1 ](α1 )G2 [x2 ](α1 , α2 ) · · · Gn [xn ](αn−1 ) . α1 ,...,αn−1 | {z } P (x,α) Chain-like model with hidden variables αi = 1, . . . , ri (P ) is constructed. x1 G1 A. Novikov et al. x2 α1 G2 xn ... Putting MRFs on a Tensor Train αn−1 Gn ICML, June 22, 2014 6 / 23 Some efficient operations in the TT-format A. Novikov et al. Operation Output rank C =A+B C =AB min A sum A r(A)+r(B) r(A) r(B) – – Putting MRFs on a Tensor Train ICML, June 22, 2014 7 / 23 TT-approach for MRFs MAP-inference ⇐⇒ minimal element in E Partition function ⇐⇒ sum of all elements of Pb Both operations are provided by the TT-format. Let’s convert E and P into the TT-format. A. Novikov et al. Putting MRFs on a Tensor Train ICML, June 22, 2014 8 / 23 Finding a TT-representation of an MRF TT-SVD (Oseledets, 2011): exact TT-representation but only for small tensors No, MRF tensor is too big. AMEn-cross (Oseledets & Tyrtyshnikov, 2010): approximate TT-representation; uses only a small fraction of tensor’s elements Possible, but there is also a much better way! A. Novikov et al. Putting MRFs on a Tensor Train ICML, June 22, 2014 9 / 23 The algorithm & its theoretical guarantees 1 Convert the potentials Θ` (x) (factors Ψ` (x)) into the TT-format. 2 Use the TT-operations: E(x) = A. Novikov et al. Pm `=1 Θ` (x) Putting MRFs on a Tensor Train (Pb = Jm `=1 Ψ` ). ICML, June 22, 2014 10 / 23 The algorithm & its theoretical guarantees Convert the potentials Θ` (x) (factors Ψ` (x)) into the TT-format. 2 Use the TT-operations: E(x) = maximal TT-rank 1 Pm `=1 Θ` (x) (Pb = Jm `=1 Ψ` ). 1,000 r(P) r(E) 800 600 400 200 0 50 100 number of nodes n Theorem. The maximal TT-rank of E constructed by the algorithm is p polynomially bounded: r(E) ≤ d 2 m, where p is the order of MRF. A. Novikov et al. Putting MRFs on a Tensor Train ICML, June 22, 2014 10 / 23 TT-rounding e = round(A, ε): TT-rounding procedure A 1 reduces TT-ranks 2 tensors are close round ( A. Novikov et al. ) Putting MRFs on a Tensor Train ICML, June 22, 2014 11 / 23 TT-rounding example 1,000 maximal TT-rank P 800 round(P, 10−8 ) E round(E, 10−8 ) 600 400 200 0 20 40 60 80 100 120 140 number of nodes n A. Novikov et al. Putting MRFs on a Tensor Train ICML, June 22, 2014 12 / 23 The algorithm motivation TT-ranks of Pb are exponential; We will compute partition function Z without explicitly building the TT- representation of Pb . A. Novikov et al. Putting MRFs on a Tensor Train ICML, June 22, 2014 13 / 23 Partition function estimation Pb (x) = = m Y `=1 m O Ψ` (x) Ψ` (x) = `=1 = m O G1` [x1 ] · · · Gn` [xn ] `=1 G11 [x1 ] ⊗ · · · ⊗ G1m [x1 ] · · · Gn1 [xn ] ⊗ · · · ⊗ Gnm [xn ] . Denote: Ai [xi ] = Gi1 [xi ] ⊗ · · · ⊗ Gim [xi ]. Finally, X X Z= Pb (x) = A1 [x1 ] . . . An [xn ] x x1 ,...,xn ! ! X = X A1 [x1 ] . . . | A. Novikov et al. = B1 · · · Bn An [xn ] xn x1 {z B1 } | {z Bn Putting MRFs on a Tensor Train } ICML, June 22, 2014 14 / 23 The algorithm Z = B1 · · · Bn , Each matrix Bi is huge but can be exactly represented in the TT-format. The algorithm: 1 f1 := B1 2 f2 := round(f1 B2 , ε) 3 f3 := round(f2 B3 , ε) 4 ... 5 fn := round(fn−1 Bn , ε) 6 e := fn ; Z A. Novikov et al. Putting MRFs on a Tensor Train ICML, June 22, 2014 15 / 23 Marginal distributions Our approach can be generalized to the marginal distributions as well: Pˆi (xi ) = B1 . . . Bi−1 Ai [xi ] Bi+1 . . . Bn , A. Novikov et al. Putting MRFs on a Tensor Train ICML, June 22, 2014 16 / 23 MAP-inference The TT-method for the MAP-inference: 1 Convert the energy into the TT-format; 2 Find the minimal element in the energy tensor. We compare the TT-method with the popular TRW-S algorithm on several real-world image segmentation problems from the OpenGM database. Problem gm6 gm29 gm66 gm105 gm32 gm70 gm85 gm192 A. Novikov et al. Variables 320 212 198 237 100 122 143 99 Labels 3 3 3 3 7 7 7 7 TRW-S 45.03 56.81 75.19 67.81 150.50 121.78 168.30 114.51 Putting MRFs on a Tensor Train TT 43.11 56.21 74.92 67.71 289.29 163.60 228.40 174.78 Time (sec) 637 224 172 230 257 399 1 912 180 ICML, June 22, 2014 17 / 23 Partition function Spin glass model: Pb (x) = n Y 1 exp − hi xi T i=1 1 exp − cij xi xj T (i, j)∈E Y where xi ∈ {−1, 1}. Notation: xi Temperature T ; Unary coefficients hi ; Pairwise coefficients cij . We compare against methods from the LibDAI library (Mooij 2010). A. Novikov et al. Putting MRFs on a Tensor Train ICML, June 22, 2014 18 / 23 Comparison | log Z˜ − log Z | 102 TT MCMC – AIS TREEEP1 MF BP 10−3 10−8 10−13 10−1 100 101 102 103 temperature T Comparison on Ising model (all pairwise weights are equal cij = 1). 1 Minka and Qi 2004. A. Novikov et al. Putting MRFs on a Tensor Train ICML, June 22, 2014 19 / 23 WISH TT | log Z˜ − log Z | 101 WISH2 BP MF TREEEP 10−1 10−3 10−5 0.5 1 1.5 2 2.5 3 strength of pairwise weights f Comparison on the data from the WISH paper, T = 1, cij ∼ U[−f , f ]. 2 Ermon et al. 2013. A. Novikov et al. Putting MRFs on a Tensor Train ICML, June 22, 2014 20 / 23 L1 error in marginals Marginal distributions TT Gibbs TREEEP MF BP 10−2 10−7 10−12 10−17 0 1 2 3 strength of pairwise weights f Spin glass models, T = 1, cij ∼ U[−f , f ]. A. Novikov et al. Putting MRFs on a Tensor Train ICML, June 22, 2014 21 / 23 Conclusion Our contributions: Algorithm that finds an exact TT-representation of MRF energy; Algorithm that estimates the partition function and the marginals; Theoretical guaranties for the proposed algorithms. Source code is availible online: https://github.com/bihaqo/TT-MRF. See poster S38 tonight! A. Novikov et al. Putting MRFs on a Tensor Train ICML, June 22, 2014 22 / 23 References I Ermon, S. et al. (2013). “Taming the Curse of Dimensionality: Discrete Integration by Hashing and Optimization”. In: International Conference on Machine Learning (ICML). Minka, T. and Y. Qi (2004). “Tree-structured Approximations by Expectation Propagation”. In: Advances in Neural Information Processing Systems 16 (NIPS), pp. 193–200. Mooij, J. M. (2010). “libDAI: A Free and Open Source C++ Library for Discrete Approximate Inference in Graphical Models”. In: Journal of Machine Learning Research 11, pp. 2169–2173. A. Novikov et al. Putting MRFs on a Tensor Train ICML, June 22, 2014 23 / 23
© Copyright 2025 ExpyDoc