Slide 1: Efficient Parallel Software for Large-Scale Semidefinite Programs
Makoto Yamashita @ Tokyo-Tech, Katsuki Fujisawa @ Chuo University
MSC 2010 @ Yokohama [2010/09/08]

Slide 2: Outline
1. Semidefinite Programming
2. Conversion of the stability condition for differential inclusions to an SDP
3. Primal-Dual Interior-Point Methods and their parallel implementation
4. Numerical Results

Slide 3: Many Applications of SDP
Control theory: stability condition for differential inclusions, discrete-time optimal control problems.
Via SDP relaxation: polynomial optimization problems, sensor network problems, quadratic assignment problems.
Quantum chemistry/information.
Large SDPs ⇒ parallel solver.

Slide 4: Standard form of SDP
Primal: minimize C • X subject to A_k • X = b_k (k = 1, ..., m), X ⪰ 0.
Dual: maximize b^T y subject to Σ_k y_k A_k + S = C, S ⪰ 0.

Slide 5: Stability condition for differential inclusions to a standard SDP [Boyd et al.]
Does the solution remain in a bounded region?
Yes, if there exists a symmetric matrix P ≻ 0 with A_i^T P + P A_i ⪯ 0 for every vertex matrix A_i of the inclusion.

Slide 6: Conversion to SDP
To make this inequality hold, the condition number of P is bounded (e.g., I ⪯ P ⪯ κI while κ is minimized) ⇒ SDP.

Slide 7: SDP from SCDI
A feasible solution ⇒ boundedness of the solution.
Some translation into the standard form is needed, e.g., by YALMIP [J. Löfberg].

Slide 8: Discrete-Time Optimal Control Problems
This problem [Coleman et al.] can be formulated as an SDP via SparsePOP [Kim et al.].

Slide 9: Primal-Dual Interior-Point Methods
They solve both the primal and the dual simultaneously in polynomial time.
Many software packages have been developed: SDPA [Yamashita et al.], SDPT3 [Toh et al.], SeDuMi [Sturm], CSDP [Borchers].

Slide 10: Algorithmic Framework of Primal-Dual Interior-Point Methods
Starting from an initial point in the feasible region, the method follows the central path toward the optimal solution: each iteration computes a search direction toward a target point and chooses a step length that keeps the iterate in the interior.
Most of the computation time is consumed by computing the search direction.

Slide 11: Bottlenecks in PDIPM and SDPARA
To obtain the search direction, we solve a linear system whose coefficient matrix is the Schur complement matrix. The two bottlenecks are
1. ELEMENTS: forming the Schur complement matrix
2. CHOLESKY: its Cholesky factorization

Computation time (seconds; Xeon X5460, 3.16 GHz):
Problem   ELEMENTS   CHOLESKY   Total
SCDI         22228       1593   23986
DTOC           668       1992    2713

In SDPARA, parallel computation is applied to these two bottlenecks.

Slide 12: Nonzero pattern of the Schur complement matrix (B)
SCDI: fully dense Schur complement matrix.
DTOC: sparse Schur complement matrix.

Slide 13: Exploitation of Sparsity in SDPA
We change the formula row by row (three formulas F1, F2, F3, chosen according to the sparsity of the data matrices).
We keep this scheme in the parallel computation.

Slide 14: Row-wise distribution for the dense Schur complement matrix
When 4 CPUs are available, each CPU computes only its assigned rows.
No communication between CPUs; efficient memory management.

Slide 15: Formula-cost-based distribution for the sparse Schur complement matrix
Example: estimated row costs 147, 48, 29, 137, 21, 43, 22, 124, 98, 17, 53, 24.
Load on each CPU: CPU1: 195, CPU2: 187, CPU3: 189, CPU4: 192 (average: 190.75).
A sketch of such a cost-based assignment is given below.
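As a concrete illustration of the cost-based distribution, here is a minimal C++ sketch. It is my own illustration, not SDPARA's code: the contiguous block assignment, the "stop once the load is closest to the fair share of the remaining work" rule, and the helper name distributeRows are assumptions. With the example costs above it happens to reproduce the loads 195, 187, 189 and 192.

```cpp
// Sketch: distribute rows of the Schur complement matrix over CPUs so that
// the estimated per-CPU cost is balanced.  Rows stay contiguous; each CPU
// stops taking rows once its load is as close as possible to its fair share
// of the remaining work, and the last CPU takes whatever is left.
#include <cmath>
#include <cstddef>
#include <cstdio>
#include <vector>

std::vector<int> distributeRows(const std::vector<double>& cost, int numCpus) {
    std::vector<int> owner(cost.size());          // owner[i] = CPU assigned to row i
    double remaining = 0.0;
    for (double c : cost) remaining += c;

    std::size_t row = 0;
    for (int cpu = 0; cpu < numCpus; ++cpu) {
        const bool last = (cpu == numCpus - 1);
        const double target = remaining / (numCpus - cpu);  // fair share of remaining work
        double load = 0.0;
        while (row < cost.size()) {
            // Stop (except on the last CPU) if adding the next row would move
            // the load further from the target than leaving it as is.
            if (!last && load > 0.0 &&
                std::abs(load + cost[row] - target) > std::abs(load - target))
                break;
            load += cost[row];
            owner[row] = cpu;
            ++row;
        }
        remaining -= load;
        std::printf("CPU%d: load %.0f\n", cpu + 1, load);
    }
    return owner;
}

int main() {
    // Row costs taken from the example on the slide above.
    std::vector<double> cost = {147, 48, 29, 137, 21, 43, 22, 124, 98, 17, 53, 24};
    distributeRows(cost, 4);   // prints loads 195, 187, 189, 192
    return 0;
}
```

Keeping each CPU's rows contiguous keeps its part of the matrix adjacent in memory; a greedy non-contiguous assignment would balance the same costs slightly differently.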
Slide 16: Parallel Computation for CHOLESKY
We employ ScaLAPACK [Blackford et al.] for the dense case and MUMPS [Amestoy et al.] for the sparse case.
The different data storages enhance the parallel Cholesky factorization.

Slide 17: Problems for Numerical Results
Environment: 16 nodes, Xeon X5460 (3.16 GHz), 48 GB memory per node.

Slide 18: Computation time on SDP [SCDI1]
(seconds; Xeon X5460, 3.16 GHz, 48 GB memory/node)
#processors      1      2      4      8     16   speedup
ELEMENTS     21748  10992   5516   2755   1387   15.67x
CHOLESKY      2372    988    524    305    167   14.20x
Total        24410  12211   6186   3165   1625   15.02x
ELEMENTS attains high scalability.

Slide 19: Computation time on SDP [DTOC1]
(seconds; Xeon X5460, 3.16 GHz, 48 GB memory/node)
#processors      1      2      4      8     16   speedup
ELEMENTS       486    267    125     64     36   13.50x
CHOLESKY      2206   1297    965    807    508    4.34x
Total         2746   1601   1121    898    566    4.85x
Parallel sparse Cholesky is difficult; ELEMENTS is still enhanced.

Slide 20: Comparison with PCSDP [Ivanov et al.]
(Times in seconds; O.M. = out of memory.)
1. SDPARA is faster than PCSDP.
2. The scalability of SDPARA is higher.
3. Only SDPARA can solve DTOC.

Slide 21: Concluding Remarks & Future Works
1. SDP has many applications, including control theory.
2. SDPARA solves large-scale SDPs effectively by parallel computation.
3. Appropriate parallel computation is the key to the SDPARA implementation.
Future work: improvement of multi-threading for the sparse Schur complement matrix (a rough sketch of this direction follows below).
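To make the multi-threading item concrete, here is a minimal OpenMP sketch. It is entirely my own illustration, not SDPARA code, and the names are assumptions (computeElement is a hypothetical stand-in for the per-element formulas F1/F2/F3): threads share the rows of the Schur complement matrix, and dynamic scheduling absorbs the uneven per-row cost.

```cpp
// Sketch: multi-threaded, row-wise evaluation of a Schur complement matrix.
// Row costs are very uneven, so the OpenMP loop uses dynamic scheduling to
// balance the load among threads within one node.
#include <omp.h>
#include <cstdio>
#include <vector>

// Hypothetical stand-in for the real element formula; it only burns time
// proportional to the number of nonzeros associated with the row.
double computeElement(int row, int col, int nnzInRow) {
    double v = 0.0;
    for (int k = 0; k < nnzInRow; ++k) v += 1e-6 * (row + col + k);
    return v;
}

int main() {
    const int m = 1000;                       // order of the Schur complement matrix
    std::vector<int> nnz(m);                  // nonzeros per row => uneven row costs
    for (int i = 0; i < m; ++i) nnz[i] = 10 + (i % 50) * 40;

    std::vector<std::vector<double>> B(m);    // row-wise storage of the upper triangle

    // Each thread grabs one row at a time; dynamic scheduling compensates for
    // the uneven per-row work.
    #pragma omp parallel for schedule(dynamic)
    for (int i = 0; i < m; ++i) {
        B[i].resize(m - i);
        for (int j = i; j < m; ++j)
            B[i][j - i] = computeElement(i, j, nnz[i]);
    }

    std::printf("computed %d rows with up to %d threads\n", m, omp_get_max_threads());
    return 0;
}
```

Within a node, dynamic scheduling can stand in for a precomputed cost-based assignment because the threads share memory; across nodes the row-wise, cost-based distribution of the earlier slides still applies.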