GPU-Friendly Pre-conditioners for FEA
Krishnan Suresh
Associate Professor
Mechanical Engineering
FEA for Structural Mechanics
Kd = f
Model → Discretize → Assemble/Solve → Postprocess
K: NxN sparse SPD
N: degrees of freedom (dof)
Concepts applicable to transient, contact, modal, buckling, …
Trends in FEA
[Figure: problem size in DOF (axis ticks 10^2, 10^4, 10^5, 10^7) versus year (1980's, 1990, 2000, 2014)]
Methods
Kd = f
Direct
$K = LL^T$
$d = L^{-T}(L^{-1} f)$
- Robust but memory hungry
Cholmod (Steve Rennich, Wed)
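The direct route above can be sketched on a tiny dense system. This is a minimal illustration only: the function names and the 3x3 matrix are assumptions made for the example, and a production sparse solver such as Cholmod works on sparse storage with fill-reducing orderings rather than dense lists.

```python
import math

def cholesky(K):
    """Return lower-triangular L with K = L L^T for a dense SPD matrix K."""
    n = len(K)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(K[i][i] - s)
            else:
                L[i][j] = (K[i][j] - s) / L[j][j]
    return L

def solve_cholesky(K, f):
    """Solve K d = f as d = L^{-T} (L^{-1} f)."""
    n = len(K)
    L = cholesky(K)
    # Forward substitution: L y = f
    y = [0.0] * n
    for i in range(n):
        y[i] = (f[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i]
    # Back substitution: L^T d = y
    d = [0.0] * n
    for i in reversed(range(n)):
        d[i] = (y[i] - sum(L[k][i] * d[k] for k in range(i + 1, n))) / L[i][i]
    return d

# Hypothetical 3-dof stiffness matrix (1D chain fixed at both ends)
K = [[2.0, -1.0, 0.0], [-1.0, 2.0, -1.0], [0.0, -1.0, 2.0]]
f = [1.0, 0.0, 1.0]
d_direct = solve_cholesky(K, f)
print(d_direct)
```

The factor L is what makes the approach memory hungry at scale: for a sparse K it generally holds many more nonzeros than K itself.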
Methods
Kd = f
Direct
Iterative
(Conjugate gradient)
Strengths
- Simple
- Scalable
Challenges
- Fast SpMV Kd
- Good preconditioner
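A minimal sketch of unpreconditioned conjugate gradient shows why the two challenges listed above dominate: each iteration needs exactly one product Kd, supplied here as a callable. The names `conjugate_gradient` and `spmv` and the small test matrix are assumptions for the example.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def conjugate_gradient(spmv, f, tol=1e-10, max_iter=1000):
    """Solve K d = f for SPD K, given only a function spmv(v) returning K v."""
    d = [0.0] * len(f)
    r = list(f)              # residual f - K d for the initial guess d = 0
    p = list(r)              # first search direction
    rr = dot(r, r)
    f_norm2 = dot(f, f)
    if rr == 0.0:
        return d, 0
    for it in range(1, max_iter + 1):
        Kp = spmv(p)         # the one SpMV per iteration
        alpha = rr / dot(p, Kp)
        d = [di + alpha * pi for di, pi in zip(d, p)]
        r = [ri - alpha * ki for ri, ki in zip(r, Kp)]
        rr_new = dot(r, r)
        if rr_new <= tol * tol * f_norm2:
            return d, it
        beta = rr_new / rr
        p = [ri + beta * pi for ri, pi in zip(r, p)]
        rr = rr_new
    return d, max_iter

# Hypothetical 3-dof SPD matrix, applied through a plain dense product
K = [[2.0, -1.0, 0.0], [-1.0, 2.0, -1.0], [0.0, -1.0, 2.0]]
f = [1.0, 0.0, 1.0]
d_cg, iterations = conjugate_gradient(
    lambda v: [sum(k * x for k, x in zip(row, v)) for row in K], f)
print(d_cg, iterations)
```

Because the solver only sees `spmv`, the matrix never has to exist in assembled form, which is what the assembly-free idea exploits.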
Idea-1
Mesh-aware SpMV Acceleration: Congruence
Idea-2
Physics-aware Preconditioner: Deflation w/ Curvature
Details in Publication
Idea-1
Mesh-aware SpMV Acceleration: Congruence
Element Congruency
Observation: Large-meshes contain many similar elements!
Elements are ‘rigid-body/scaling’ congruent
⇒ Identical element stiffness Ke
62350 elements
2780 distinct
95.5% congruent
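One way to count distinct elements is to hash a geometric signature of each element. The sketch below is a simplification and an assumption, not the method behind the figures above: its key detects translation congruence only (coordinates relative to the first node), a narrower notion than rigid-body/scaling congruence. The grid mesh and all names are hypothetical.

```python
def congruence_key(node_coords, ndigits=9):
    """Translation-invariant signature: coordinates relative to the first node."""
    first = node_coords[0]
    return tuple(
        tuple(round(c - c0, ndigits) for c, c0 in zip(point, first))
        for point in node_coords
    )

def group_congruent(elements, nodes):
    """Map each distinct signature to the indices of the elements sharing it."""
    groups = {}
    for e, conn in enumerate(elements):
        key = congruence_key([nodes[i] for i in conn])
        groups.setdefault(key, []).append(e)
    return groups

def congruent_percent(n_elements, n_distinct):
    """Share of elements that duplicate some distinct element."""
    return 100.0 * (1.0 - n_distinct / n_elements)

# Hypothetical 3 x 2 grid of unit-square quadrilaterals
nx, ny = 3, 2
nodes = [(float(i), float(j)) for j in range(ny + 1) for i in range(nx + 1)]
elements = [
    (j * (nx + 1) + i, j * (nx + 1) + i + 1,
     (j + 1) * (nx + 1) + i + 1, (j + 1) * (nx + 1) + i)
    for j in range(ny) for i in range(nx)
]
groups = group_congruent(elements, nodes)
print(len(elements), "elements,", len(groups), "distinct")
print(round(congruent_percent(62350, 2780), 1))
```

The percentage helper reproduces the quoted figure as 1 - distinct/total, for example 1 - 2780/62350 ≈ 95.5%.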
Large Meshes
Mesh size ↑ ⇒ Congruency ↑
6144 elements
68 distinct
98.9% congruent
83000 elements
300 distinct
99.64% congruent
Large Congruent Meshes
Implication: SpMV
Kd : Sparse Matrix-Vector Multiplication (SpMV)
Critical operation in ALL iterative solvers
Classic: $Kd = \Big(\sum_{\text{assemble}} K_e\Big)\, d$
Assembly-free: $Kd = \sum_{\text{assemble}} \big(K_e d_e\big)$ [Hughes 83]
Only store Ke of distinct elements
Congruency + Assembly-Free = Dramatic Reduction in Traffic
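The two ways of forming Kd can be compared on a small example. The sketch below assumes a hypothetical 1D chain of identical two-node elements, so a single stored Ke serves every element; the function names and the `ke_id` lookup array are illustrative choices, not taken from the slides.

```python
def assemble(n_dof, elements, ke_id, distinct_Ke):
    """Classic route: build the global matrix (dense here for brevity)."""
    K = [[0.0] * n_dof for _ in range(n_dof)]
    for conn, k in zip(elements, ke_id):
        Ke = distinct_Ke[k]
        for a, i in enumerate(conn):
            for b, j in enumerate(conn):
                K[i][j] += Ke[a][b]
    return K

def spmv_assembled(K, d):
    return [sum(kij * dj for kij, dj in zip(row, d)) for row in K]

def spmv_assembly_free(n_dof, elements, ke_id, distinct_Ke, d):
    """Assembly-free route: gather d_e, multiply by K_e, scatter-add."""
    out = [0.0] * n_dof
    for conn, k in zip(elements, ke_id):
        Ke = distinct_Ke[k]                 # only distinct matrices are stored
        de = [d[i] for i in conn]           # gather element dofs
        for a, i in enumerate(conn):        # scatter-add plays the role of assembly
            out[i] += sum(Ke[a][b] * de[b] for b in range(len(conn)))
    return out

# Hypothetical mesh: 5 nodes, 4 congruent two-node elements, one distinct Ke
elements = [(0, 1), (1, 2), (2, 3), (3, 4)]
ke_id = [0, 0, 0, 0]
distinct_Ke = [[[1.0, -1.0], [-1.0, 1.0]]]
d = [0.0, 1.0, 4.0, 9.0, 16.0]

Kd_classic = spmv_assembled(assemble(5, elements, ke_id, distinct_Ke), d)
Kd_free = spmv_assembly_free(5, elements, ke_id, distinct_Ke, d)
print(Kd_classic, Kd_free)
```

In this sketch the assembly-free loop reads one small Ke repeatedly instead of streaming every nonzero of K, which is the memory-traffic saving that congruence is meant to provide.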
Parallel Implementation
Experiment
N = 10^6; N elements, 1 distinct
- Assembled Kd
- Assembly-free Kd
- Identical result
- Reduced memory requests
[Figure: bar chart of SpMV Kd time (msec)]
  Assembled (CPU)        770
  Assembly-free (CPU)     37
  Assembly-free (GPU)      3.8
Idea-2
Physics-aware Preconditioner: Deflation w/ Curvature
Deflation
Solve: Kd = f
Given 1st ‘m’ eigen-vectors of K,
can accelerate CG
Computing eigen-vectors is impractical!
Deflation space
W: (N, m)
Deflation
[Nicolaides 87]
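For reference, one common way to write deflated CG for a given deflation space W is shown below. The symbols E, P and \hat{d} are assumed notation that does not appear on the slides, and this is the standard textbook form rather than necessarily the exact variant used in the talk.

```latex
% Assumed notation: E (coarse matrix), P (deflation projector), \hat{d} (deflated unknown).
\begin{aligned}
E &= W^{T} K W \quad (m \times m), \\
P &= I - K W E^{-1} W^{T}, \\
P K \hat{d} &= P f \quad \text{(solved with CG)}, \\
d &= W E^{-1} W^{T} f + P^{T} \hat{d}.
\end{aligned}
```

Since $K P^{T} = P K$, substituting gives $Kd = (I - P) f + P f = f$. If W spans the m lowest eigenvectors exactly, CG on the deflated system behaves as if the condition number were $\lambda_N / \lambda_{m+1}$ instead of $\lambda_N / \lambda_1$.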
Agglomeration
Computing eigen-vectors is impractical!
Agglomeration/Grouping
[Bulgakov 95]:
• Treat each group as rigid body
• Rigid body modes ~ low eigen-modes
Leads to …
Cheap construction of approximate W: (N, m)
Agglomeration
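Assembly-Free Agglomeration
$W = \sum_{\text{assemble}} W_n$
$$W_n = \begin{pmatrix} 1 & 0 & 0 & 0 & z & -y \\ 0 & 1 & 0 & -z & 0 & x \\ 0 & 0 & 1 & y & -x & 0 \end{pmatrix}$$
For a coarse vector $\mu$ of length $m$: $W\mu = \Big(\sum_{\text{assemble}} W_n\Big)\mu = \sum_{\text{assemble}} \big(W_n \mu_n\big)$
$W^T d = \sum_{\text{assemble}} \big(W_n^T d_n\big)$
[Yadav, Suresh 2014]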
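The products with W and its transpose can be accumulated node by node without ever storing W. The sketch below assumes that (x, y, z) are measured from the centroid of each group and that every node belongs to exactly one group; both are assumptions made for the example, and all function names are hypothetical.

```python
def rigid_body_block(x, y, z):
    """3 x 6 block: three translations plus three infinitesimal rotations."""
    return [
        [1.0, 0.0, 0.0, 0.0, z, -y],
        [0.0, 1.0, 0.0, -z, 0.0, x],
        [0.0, 0.0, 1.0, y, -x, 0.0],
    ]

def centroid(group, coords):
    return [sum(coords[i][k] for i in group) / len(group) for k in range(3)]

def restrict(groups, coords, d):
    """Coarse vector W^T d: six entries per group, summed over the group's nodes."""
    coarse = []
    for group in groups:
        c = centroid(group, coords)
        acc = [0.0] * 6
        for i in group:
            Wn = rigid_body_block(*[coords[i][k] - c[k] for k in range(3)])
            for col in range(6):
                acc[col] += sum(Wn[row][col] * d[i][row] for row in range(3))
        coarse.append(acc)
    return coarse

def prolong(groups, coords, coarse):
    """Fine vector W mu: each node receives its group's rigid-body motion."""
    d = [[0.0, 0.0, 0.0] for _ in coords]
    for group, mu in zip(groups, coarse):
        c = centroid(group, coords)
        for i in group:
            Wn = rigid_body_block(*[coords[i][k] - c[k] for k in range(3)])
            d[i] = [sum(Wn[row][col] * mu[col] for col in range(6)) for row in range(3)]
    return d

# Hypothetical group: four nodes at the corners of a unit square
coords = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0), (0.0, 1.0, 0.0)]
groups = [[0, 1, 2, 3]]
translation = [[1.0, 2.0, 3.0] for _ in coords]
coarse = restrict(groups, coords, translation)
spin = prolong(groups, coords, [[0.0, 0.0, 0.0, 0.0, 0.0, 1.0]])
print(coarse[0], spin[0])
```

Both loops are independent across groups, so each group could map to its own thread block on a GPU.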
Thin Structures
Rigid body: Not efficient
$$W_n = \begin{pmatrix} 1 & 0 & 0 & 0 & z & -y \\ 0 & 1 & 0 & -z & 0 & x \\ 0 & 0 & 1 & y & -x & 0 \end{pmatrix}$$
$$W_n = \begin{pmatrix} 1 & 0 & 0 & 0 & z & -y & -zx & 0 & -zy \\ 0 & 1 & 0 & -z & 0 & x & 0 & -zy & -zx \\ 0 & 0 & 1 & y & -x & 0 & \frac{x^2}{2} & \frac{y^2}{2} & xy \end{pmatrix}$$
Respects Kirchhoff-Love Theory
[Yadav, Suresh 2014]
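The three columns appended to the rigid-body block can be read as constant-curvature bending modes. The short derivation below assumes a plate whose mid-surface lies in the x-y plane with z through the thickness; that orientation is an assumption, chosen because it reproduces the entries shown.

```latex
% Kirchhoff-Love kinematics (assumed orientation: mid-surface in the x-y plane).
% In-plane displacements (u, v) follow from the transverse deflection w(x, y).
\begin{aligned}
u &= -z\,\frac{\partial w}{\partial x}, \qquad v = -z\,\frac{\partial w}{\partial y}, \\
w &= \tfrac{x^{2}}{2}: \quad (u, v, w) = \left(-zx,\; 0,\; \tfrac{x^{2}}{2}\right), \\
w &= \tfrac{y^{2}}{2}: \quad (u, v, w) = \left(0,\; -zy,\; \tfrac{y^{2}}{2}\right), \\
w &= xy: \quad (u, v, w) = \left(-zy,\; -zx,\; xy\right).
\end{aligned}
```

Each result matches one of the added columns, so the enlarged deflation space contains the two pure bending modes and the twisting mode in addition to the six rigid-body modes.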
Beams
$$W_n = \begin{pmatrix} 1 & 0 & 0 & 0 & z & -y & -zx \\ 0 & 1 & 0 & -z & 0 & x & 0 \\ 0 & 0 & 1 & y & -x & 0 & \frac{x^2}{2} \end{pmatrix}$$
Respects Euler-Bernoulli Theory
[Yadav, Suresh 2014]
Parallelization
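Prolongation: $W\mu$ (coarse vector $\mu$ of length $m$ to full length $N$)
Restriction: $W^T d$ (full vector $d$ of length $N$ to coarse length $m$)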
Details
Preliminary Results
• CPU:
  – AMD FX-8350, 4 GHz, 8 core
  – 16 GB
  – C/C++ Code (OpenMP)
• GPU:
  – GTX Titan (2688 cores)
  – 5.6 GB
  – CUDA 3.5
• Double-precision
• Timings include CPU-GPU transfer
Thick vs. Thin Solids
Thick Solids
Thin Solids
Challenges!
Thick Solid: Iterations
3.15 million DOF
Thick Solid: Timing
3.15 million DOF (1,000,000 elements)
Method     #Groups   CPU (sec)   GPU (sec)
Plain CG   --        2708        301
AF CG      --        134         38
AF DCG     50        43          28
AF DCG     100       38          26
AF DCG     200       32          25
AF DCG     400       29          25
Thick Solids
• Preconditioners are less critical on the GPU
Thin Solid: Iterations
3.15 million DOF
Thin Solids
3.15 million DOF (850,000 voxel-elements)
Method     #Groups   CPU (sec)   GPU (sec)
Plain CG   --        14400       1300
AF CG      --        710         106
AF DCG     50        196         80
AF DCG     100       158         45
AF DCG     200       93          27
AF DCG     400       62          24
Thin Solids
• SpMV & CG acceleration: critical
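The gain from deflation alone can be read off the thick- and thin-solid timings by dividing the AF CG time by the AF DCG time at 400 groups. The snippet below only restates the reported numbers; the dictionary layout and function name are illustrative.

```python
# Timings in seconds, copied from the thick- and thin-solid results above.
timings = {
    "thick": {"AF CG": {"CPU": 134, "GPU": 38},
              "AF DCG 400": {"CPU": 29, "GPU": 25}},
    "thin":  {"AF CG": {"CPU": 710, "GPU": 106},
              "AF DCG 400": {"CPU": 62, "GPU": 24}},
}

def deflation_gain(case, device):
    """Speed-up of deflated CG (400 groups) over assembly-free CG."""
    t = timings[case]
    return t["AF CG"][device] / t["AF DCG 400"][device]

gains = {
    (case, device): round(deflation_gain(case, device), 1)
    for case in timings
    for device in ("CPU", "GPU")
}
for key, gain in gains.items():
    print(key, gain)
```

On the GPU, deflation gives roughly 1.5x for the thick solid but about 4.4x for the thin solid, which is consistent with the two slide conclusions.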
Transient: ANSYS, CPU
Large-Scale Static Analysis
• 50 million DOF
• CPU
  – 8 GB
  – 4 hours
• GPU
  – 3 GB
  – 24 minutes
Recap
Kd = f
Model → Discretize → Assemble/Solve → Postprocess
Acknowledgements
• Graduate Students
• NSF
• UW-Madison
• Design Concepts
• Luvata
• Trek Bicycles
Publications available at
www.ersl.wisc.edu
Email
[email protected]