GPU-Friendly Pre-conditioners for FEA
Krishnan Suresh
Associate Professor, Mechanical Engineering

FEA for Structural Mechanics
$Kd = f$
Model → Discretize → Assemble/Solve → Postprocess
- $K$: $N \times N$, sparse, SPD
- $N$: degrees of freedom (DOF)
- Concepts applicable to transient, contact, modal, buckling, …

Trends in FEA
[Plot: DOF versus year; axis ticks $10^2$, $10^4$, $10^5$, $10^7$ and 1980s, 1990, 2000, 2014]

Methods ($Kd = f$)
Direct:
$K = LL^T$, $d = L^{-T}(L^{-1} f)$
- Robust but memory hungry
- Cholmod (Steve Rennich, Wed)

Methods ($Kd = f$)
Direct | Iterative (Conjugate gradient)
Strengths
- Simple
- Scalable
Challenges
- Fast SpMV $Kd$
- Good preconditioner

Idea-1
$Kd = f$: Model → Discretize → Assemble/Solve → Postprocess
Mesh-aware SpMV Acceleration: Congruence

Idea-2
$Kd = f$: Model → Discretize → Assemble/Solve → Postprocess
Physics-aware Preconditioner: Deflation w/ Curvature

Details in Publication

Idea-1: Mesh-aware SpMV Acceleration: Congruence

Element Congruency
Observation: Large meshes contain many similar elements!
Elements are 'rigid-body/scaling' congruent ⇒ identical element stiffness $K_e$
- 62350 elements
- 2780 distinct
- 95.5% congruent

Large Meshes
Mesh size ⇒ Congruency
- 6144 elements, 68 distinct, 98.9% congruent
- 83000 elements, 300 distinct, 99.64% congruent

Large Congruent Meshes

Implication: SpMV
$Kd$: Sparse Matrix-Vector Multiplication (SpMV)
Critical operation in ALL iterative solvers
Classic: $Kd = \left(\sum_{\text{assemble}} K_e\right) d$
Assembly-free: $Kd = \sum_{\text{assemble}} K_e d_e$ [Hughes 83]
Only store $K_e$ of distinct elements
Congruency + Assembly-Free = Dramatic Reduction in Traffic

Parallel Implementation

Experiment
$N = 10^6$; $N$ elements, 1 distinct
- Assembled $Kd$
- Assembly-free $Kd$

Method             SpMV Kd (msec)
Assembled (CPU)    770
AF-CPU             37
AF-GPU             3.8

- Identical result
- Reduced memory requests

Idea-2: Physics-aware Preconditioner: Deflation w/ Curvature

Deflation
Solve: $Kd = f$
Given the first $m$ eigenvectors of $K$, CG can be accelerated
Deflation space $W$: $(N, m)$
Deflation [Nicolaides 87]
Computing eigenvectors is impractical!

Agglomeration
Agglomeration/Grouping [Bulgakov 95]:
• Treat each group as a rigid body
• Rigid body modes ~ low eigenmodes
Leads to … cheap construction of approximate $W$: $(N, m)$

Assembly-Free Agglomeration
$W$ is assembled from per-node blocks $W_n$:
$$W_n = \begin{pmatrix} 1 & 0 & 0 & 0 & z & -y \\ 0 & 1 & 0 & -z & 0 & x \\ 0 & 0 & 1 & y & -x & 0 \end{pmatrix}$$
Prolongation ($W$ times a coarse vector) and restriction ($W^T d = \sum_{\text{assemble}} W_n^T d_n$) are evaluated node by node from $W_n$
[Yadav, Suresh 2014]

Thin Structures
Rigid body (the $3 \times 6$ block $W_n$ above): Not efficient
With curvature modes:
$$W_n = \begin{pmatrix} 1 & 0 & 0 & 0 & z & -y & -zx & 0 & -zy \\ 0 & 1 & 0 & -z & 0 & x & 0 & -zy & -zx \\ 0 & 0 & 1 & y & -x & 0 & x^2/2 & y^2/2 & xy \end{pmatrix}$$
Respects Kirchhoff-Love Theory
[Yadav, Suresh 2014]

Beams
$$W_n = \begin{pmatrix} 1 & 0 & 0 & 0 & z & -y & -zx \\ 0 & 1 & 0 & -z & 0 & x & 0 \\ 0 & 0 & 1 & y & -x & 0 & x^2/2 \end{pmatrix}$$
Respects Euler-Bernoulli Theory
[Yadav, Suresh 2014]

Parallelization
- Prolongation: $W$ applied to a coarse vector
- Restriction: $W^T d$

Details

Preliminary Results
CPU:
- AMD FX-8350, 4 GHz, 8 core
- 16 GB
- C/C++ Code (OpenMP)
GPU:
- GTX Titan (2688 cores)
- 5.6 GB
- CUDA 3.5
Double-precision
Timings include CPU-GPU transfer

Thick vs. Thin Solids
- Thick Solids
- Thin Solids: Challenges!
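As a rough illustration of the Assembly-Free Agglomeration slide, the sketch below builds the rigid-body block $W_n$ per node and applies restriction ($W^T d$) and prolongation node by node, without ever forming $W$. It assumes node coordinates are given relative to a reference point of their group and that DOFs are stored as a flat `[ux, uy, uz, ...]` list; the function names `rigid_body_block`, `restrict`, and `prolong` are illustrative and do not come from the talk.

```python
def rigid_body_block(x, y, z):
    """3x6 block W_n: columns are 3 translations and 3 rotations."""
    return [
        [1.0, 0.0, 0.0, 0.0, z, -y],
        [0.0, 1.0, 0.0, -z, 0.0, x],
        [0.0, 0.0, 1.0, y, -x, 0.0],
    ]


def restrict(nodes, groups, d, num_groups):
    """Assembly-free W^T d: add W_n^T d_n into the 6 coefficients of each node's group."""
    coarse = [[0.0] * 6 for _ in range(num_groups)]
    for n, (pos, g) in enumerate(zip(nodes, groups)):
        Wn = rigid_body_block(*pos)
        dn = d[3 * n:3 * n + 3]  # the three displacement DOFs of node n
        for j in range(6):
            coarse[g][j] += sum(Wn[i][j] * dn[i] for i in range(3))
    return coarse


def prolong(nodes, groups, coarse):
    """Assembly-free W*mu: node displacement = W_n times its group's 6 coefficients."""
    d = []
    for pos, g in zip(nodes, groups):
        Wn = rigid_body_block(*pos)
        d.extend(sum(Wn[i][j] * coarse[g][j] for j in range(6)) for i in range(3))
    return d


# Hypothetical toy data: four nodes split into two groups.
nodes = [(1.0, 0.0, 0.0), (0.0, 2.0, 0.0), (0.0, 0.0, 3.0), (1.0, 1.0, 1.0)]
groups = [0, 0, 1, 1]
# Per group: (tx, ty, tz, rx, ry, rz) rigid-body coefficients.
mu = [[1.0, 2.0, 3.0, 0.0, 0.0, 1.0], [0.0, 0.0, 0.0, 1.0, 2.0, 3.0]]

fine = prolong(nodes, groups, mu)                              # length 3 * len(nodes)
coarse = restrict(nodes, groups, fine, num_groups=2)           # 6 values per group
print(len(fine), len(coarse), len(coarse[0]))
```

In this sketch each node touches only its own three DOFs and its group's six coefficients, so the prolongation loop is independent per node; a parallel restriction would presumably need a per-group reduction or atomic adds, which the slides do not detail.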
Thick Solid: Iterations
3.15 million DOF

Thick Solid: Timing
3.15 million DOF (1,000,000 elements)

Method      #Groups   CPU (sec)   GPU (sec)
Plain CG    --        2708        301
AF CG       --        134         38
AF DCG      50        43          28
AF DCG      100       38          26
AF DCG      200       32          25
AF DCG      400       29          25

Thick Solids
• Preconditioners less critical on GPU

Thin Solid: Iterations
3.15 million DOF

Thin Solids
3.15 million DOF (850,000 voxel-elements)

Method      #Groups   CPU (sec)   GPU (sec)
Plain CG    --        14400       1300
AF CG       --        710         106
AF DCG      50        196         80
AF DCG      100       158         45
AF DCG      200       93          27
AF DCG      400       62          24

Thin Solids
• SpMV & CG acceleration: critical

Transient: ANSYS, CPU

Large-Scale Static Analysis
• 50 million DOF
• CPU
  • 8 GB
  • 4 hours
• GPU
  • 3 GB
  • 24 minutes

Recap
$Kd = f$: Model → Discretize → Assemble/Solve → Postprocess

Acknowledgements
• Graduate Students
• NSF
• UW-Madison
• Design Concepts
• Luvata
• Trek Bicycles
Publications available at www.ersl.wisc.edu
Email: [email protected]