
Efficient Parallel Software for
Large-Scale Semidefinite Programs
Makoto Yamashita @ Tokyo-Tech
Katsuki Fujisawa @ Chuo University
MSC 2010 @ Yokohama [2010/09/08]
Outline
1. Semidefinite Programming
2. Conversion of stability condition for differential inclusions to an SDP
3. Primal-Dual Interior-Point Methods and their parallel implementation
4. Numerical Results
Many Applications of SDP
• Control Theory
  • Stability Condition for Differential Inclusions
  • Discrete-Time Optimal Control Problem
• Via SDP relaxation
  • Polynomial Optimization Problem
  • Sensor Network Problem
  • Quadratic Assignment Problem
• Quantum Chemistry/Information
• Large SDP ⇒ Parallel Solver
Standard form of SDP
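The equations on this slide were lost in extraction; below is a standard reconstruction of the primal-dual pair, in one common convention (e.g., that of the SDPA literature); the slide's exact convention may differ.

```latex
% (P)  min  C . X
%      s.t. A_k . X = b_k  (k = 1, ..., m),   X >= O (positive semidefinite)
% (D)  max  sum_k b_k y_k
%      s.t. sum_k y_k A_k + Z = C,            Z >= O
\begin{align*}
\text{(P)}\quad & \min_{X}\; C \bullet X
  && \text{s.t.}\; A_k \bullet X = b_k \;(k = 1, \dots, m),\; X \succeq O,\\
\text{(D)}\quad & \max_{y,\,Z}\; \sum_{k=1}^{m} b_k y_k
  && \text{s.t.}\; \sum_{k=1}^{m} y_k A_k + Z = C,\; Z \succeq O,
\end{align*}
% where C, A_k, X, Z are symmetric matrices and
% A . B = Tr(A^T B) is the matrix inner product.
```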
Stability condition for differential inclusions to standard SDP [Boyd et al]
• Does the solution remain in a bounded region?
• Yes, if the LMI on the slide is feasible (see the sketch below)
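A hedged reconstruction of the lost equations, following Boyd et al. (Linear Matrix Inequalities in System and Control Theory); the symbols are the standard ones and may differ from the slide:

```latex
% Linear differential inclusion:
%   dx/dt = A(t) x(t),  A(t) \in Co{A_1, ..., A_L} for all t.
% Every solution stays bounded if a common quadratic Lyapunov
% function V(x) = x^T P x exists:
\exists\, P \succ O \;\text{such that}\;
A_i^{T} P + P A_i \preceq O \quad (i = 1, \dots, L),
% since then d/dt [x^T P x] = x^T (A(t)^T P + P A(t)) x <= 0.
```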
Conversion to SDP
• For this inequality to hold, [equation lost in extraction]
• Bounding the condition number ⇒ SDP (see the sketch below)
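The LMI above is homogeneous in P, so rescaling P does not change feasibility. A common normalization (a hedged sketch; the slide's exact formulation was lost) bounds the condition number of P, which yields a standard-form SDP:

```latex
% Normalize P and bound its condition number by kappa:
\min\; \kappa \quad \text{s.t.}\quad
I \preceq P \preceq \kappa I, \qquad
A_i^{T} P + P A_i \preceq O \quad (i = 1, \dots, L).
```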
SDP from SCDI
• A feasible solution ⇒ boundedness of the solution
• Some translation to the standard SDP form is needed, e.g., by YALMIP [J. Löfberg]
Discrete-Time Optimal Control Problems
• This problem [Coleman et al] can be formulated as an SDP via SparsePOP [Kim et al]
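The problem data were shown as equations on the slide (lost in extraction). For orientation, a generic discrete-time optimal control problem of the kind SparsePOP handles, written as a polynomial optimization problem; this is an illustrative form only, not necessarily the Coleman et al. instance:

```latex
% Minimize a polynomial stage cost subject to polynomial dynamics:
\min_{x,\,u}\; \sum_{t=0}^{N-1} f(x_t, u_t)
\quad \text{s.t.}\quad
x_{t+1} = g(x_t, u_t) \;\; (t = 0, \dots, N-1), \qquad x_0 \;\text{given},
% where f and g are polynomials; SparsePOP applies a sparse SDP
% relaxation to such problems [Kim et al].
```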
Primal-Dual Interior-Point Methods
• Solve both the primal and the dual simultaneously in polynomial time
• Many software packages have been developed:
  • SDPA [Yamashita et al]
  • SDPT3 [Toh et al]
  • SeDuMi [Sturm et al]
  • CSDP [Borchers et al]
Algorithmic Framework of Primal-Dual Interior-Point Methods
[Figure: the central path through the feasible region, from an initial point toward the optimal solution via target points, search directions, and step lengths chosen to keep the interior property]
• Most of the computation time is consumed by the search direction
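A hedged sketch of the framework behind the figure, in the standard PDIPM notation (not taken from the slide):

```latex
% Central path: solutions (X(mu), y(mu), Z(mu)) of the perturbed
% optimality conditions, for mu > 0:
A_k \bullet X = b_k, \qquad
\sum_{k} y_k A_k + Z = C, \qquad
XZ = \mu I, \qquad X, Z \succ O.
% Each iteration applies Newton's method toward a target point with a
% reduced mu (e.g., mu <- beta * (X . Z)/n) to obtain the search
% direction (dX, dy, dZ), then takes a step length alpha in (0, 1]
% keeping X + alpha dX and Z + alpha dZ positive definite;
% as mu -> 0 the iterates approach the optimal solution.
```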
Bottlenecks in PDIPM and SDPARA
• To obtain the direction, we solve two main steps:
  1. ELEMENTS
  2. CHOLESKY

Problem   ELEMENTS   CHOLESKY   Total
SCDI      22228      1593       23986
DTOC      668        1992       2713
(time in seconds; Xeon X5460, 3.16GHz)

• In SDPARA, parallel computation is applied to these two bottlenecks
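For reference, a hedged sketch of the two steps in the standard notation of the SDPA/SDPARA papers (the slide's own equations were lost in extraction):

```latex
% The search direction reduces to the Schur complement system
B \, dy = r,
% ELEMENTS: form the m x m positive definite matrix B with entries
B_{ij} = \mathrm{Tr}\!\left( A_i \, X A_j Z^{-1} \right)
\quad (i, j = 1, \dots, m),
% CHOLESKY: factorize B = L L^T and obtain dy by forward/backward
% substitution.
```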
Nonzero pattern of the Schur complement matrix (B)
[Figure: nonzero patterns of B]
• SCDI: fully dense Schur complement matrix
• DTOC: sparse Schur complement matrix
Exploitation of Sparsity in SDPA
• We choose among three formulas (F1, F2, F3) row-wise
• We keep this scheme in the parallel computation
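The three formulas appeared as equations on the slide. Roughly, following the sparsity-exploiting scheme behind SDPA [Fujisawa et al], they trade dense matrix products against accumulation over nonzeros; this is a paraphrase, not the slide's exact statement:

```latex
% All three evaluate B_{ij} = Tr(A_i X A_j Z^{-1}); for each row i,
% the cheapest formula is chosen from nonzero-count estimates:
% F1: compute U = X A_j Z^{-1} by dense matrix products,
%     then B_{ij} = A_i . U.
% F2: exploit the sparsity of A_j when forming X A_j Z^{-1}.
% F3: accumulate only over the nonzeros of the data matrices,
%     computing the needed entries of X A_j Z^{-1} on demand:
B_{ij} = \sum_{(\alpha,\beta):\,(A_i)_{\alpha\beta} \neq 0}
(A_i)_{\alpha\beta} \left( X A_j Z^{-1} \right)_{\beta\alpha}.
```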
Row-wise distribution for the dense Schur complement matrix
• Example: 4 CPUs are available
• Each CPU computes only its assigned rows (see the sketch below)
• No communication between CPUs
• Efficient memory management
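A minimal sketch of the row-wise idea with MPI. The names `m` and `compute_row` and the cyclic assignment rule are illustrative assumptions, not SDPARA's actual code:

```cpp
// Row-cyclic distribution of the Schur complement matrix B:
// process `rank` owns rows i with i % size == rank. Rows are
// independent, so ELEMENTS needs no inter-process communication.
#include <mpi.h>
#include <cstdio>
#include <vector>

// Placeholder for the ELEMENTS kernel B_{ij} = Tr(A_i X A_j Z^{-1}).
static void compute_row(int /*i*/, std::vector<double>& row) {
  for (std::size_t j = 0; j < row.size(); ++j) row[j] = 0.0;
}

int main(int argc, char** argv) {
  MPI_Init(&argc, &argv);
  int rank = 0, size = 1;
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  MPI_Comm_size(MPI_COMM_WORLD, &size);

  const int m = 1024;            // number of constraints = dimension of B
  std::vector<double> row(m);    // each process stores only its own rows

  int owned = 0;
  for (int i = rank; i < m; i += size) {  // only the assigned rows
    compute_row(i, row);
    ++owned;
  }
  std::printf("rank %d computed %d of %d rows\n", rank, owned, m);

  MPI_Finalize();
  return 0;
}
```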
Formula-cost-based distribution for the sparse Schur complement matrix
[Figure: each row of B labeled with its estimated cost: 147, 48, 29, 137, 21, 43, 22, 124, 98, 17, 53, 24]
• Load on each CPU: CPU1: 195, CPU2: 187, CPU3: 189, CPU4: 192 (average: 190.75; see the sketch below)
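The slide does not state the assignment rule. Below is a minimal sketch of one classic cost-based heuristic (LPT: give the largest remaining cost to the least-loaded CPU), using the row costs shown above; it balances well but does not necessarily reproduce the slide's exact loads:

```cpp
// Formula-cost-based load balancing: distribute rows of the sparse
// Schur complement matrix so that estimated per-CPU costs are even.
#include <algorithm>
#include <cstdio>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

int main() {
  // Estimated cost of each row, as shown on the slide.
  std::vector<int> cost = {147, 48, 29, 137, 21, 43, 22, 124, 98, 17, 53, 24};
  const int ncpu = 4;

  using Entry = std::pair<long, int>;  // (current load, CPU index)
  // Min-heap: the top is always the least-loaded CPU.
  std::priority_queue<Entry, std::vector<Entry>, std::greater<Entry>> heap;
  for (int c = 0; c < ncpu; ++c) heap.push({0L, c});

  // LPT heuristic: consider rows in decreasing cost order.
  std::vector<int> order(cost.size());
  for (std::size_t i = 0; i < order.size(); ++i) order[i] = static_cast<int>(i);
  std::sort(order.begin(), order.end(),
            [&](int a, int b) { return cost[a] > cost[b]; });

  std::vector<long> load(ncpu, 0);
  for (int i : order) {
    Entry e = heap.top();              // least-loaded CPU so far
    heap.pop();
    load[e.second] = e.first + cost[i];  // assign row i to that CPU
    heap.push({load[e.second], e.second});
  }
  for (int c = 0; c < ncpu; ++c)
    std::printf("CPU%d load: %ld\n", c + 1, load[c]);
  return 0;
}
```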
Parallel Computation for CHOLESKY
• We employ:
  • ScaLAPACK [Blackford et al] ⇒ dense
  • MUMPS [Amestoy et al] ⇒ sparse
• Different data storage schemes enhance the parallel Cholesky factorization
Problems for Numerical Results
• 16 nodes
• Xeon X5460 (3.16GHz)
• 48GB memory per node
Computation time on SDP [SCDI1]
[Figure: log-scale plot of computation time (seconds) vs. #processors; Xeon X5460 (3.16GHz), 48GB memory/node]

#processors   1      2      4     8     16    Speedup (16 CPUs)
Total         24410  12211  6186  3165  1625  15.02×
ELEMENTS      21748  10992  5516  2755  1387  15.67×
CHOLESKY      2372   988    524   305   167   14.20×

• ELEMENTS attains high scalability
Computation time on SDP [DTOC1]
[Figure: log-scale plot of computation time (seconds) vs. #processors; Xeon X5460 (3.16GHz), 48GB memory/node]

#processors   1     2     4     8    16   Speedup (16 CPUs)
Total         2746  1601  1121  898  566  4.85×
ELEMENTS      486   267   125   64   36   13.50×
CHOLESKY      2206  1297  965   807  508  4.34×

• Parallel sparse Cholesky is difficult
• ELEMENTS is still accelerated well (13.50×)
Comparison with PCSDP [Ivanov et al]
(times in seconds; O.M. = out of memory)
1. SDPARA is faster than PCSDP
2. The scalability of SDPARA is higher
3. Only SDPARA can solve DTOC
Concluding Remarks & Future Works
1. SDP has many applications, including control theory
2. SDPARA solves large-scale SDPs effectively by parallel computation
3. Appropriate parallel computation is the key to the SDPARA implementation
• Future work: improvement of multi-threading for the sparse Schur complement matrix