Many-core computing for Big Biomedical Data
Marc A. Suchard
Departments of Biomathematics and Human Genetics
David Geffen School of Medicine at UCLA
Department of Biostatistics
UCLA Fielding School of Public Health
IDRE July 2014
[Joint work with David Madigan, Patrick Ryan, Martijn Schumie and
Trevor Shaddox]
Marc Suchard
Massively parallel statistical computing
Drug Safety using Massive Observational Databases
Merck recalls Vioxx
Post-market surveillance:
- ∼ 100K - 10M patients
- ∼ 1K - 10K drugs
- Rare adverse events
- Major concern for Big Pharma/FDA
Goals: a model-based approach that:
- Controls for other drugs and covariates
- Includes temporal information
- Assumes most drugs are without effect (priors)
Paradigm Shift Required
“Bootstrapping is computationally infeasible with > 20K patients. . . ”
–, August 2011, ICPE
Throws out asymptotics
Some truth: ≈ 100 days in my hands (preliminary implementation in R)
Self-Controlled Case Series
Basic model: Events ← inhomogeneous Poisson process
[Figure: example patient timeline split into exposure eras, with drugs (Vioxx, Olanzapine, Quetiapine, Celecoxib) and myocardial infarction (MI) events marked along the time axis]
- # of events y_ik for subject i during era k; N subjects with Σ_i K_i = K ∼ 10^7 eras; condition on the totals n_i
- drug exposure indicators x_ik = {x_ikj}′ linking era k to drug j
- drug effects β = {β_j}, J ∼ 1000
log posterior: log (conditional) likelihood + log lasso prior,

Σ_{i=1}^{N} [ Σ_{k=1}^{K_i} y_ik x′_ik β − n_i log( Σ_{k=1}^{K_i} t_ik e^{x′_ik β} ) ] − λ ‖β‖₁
Convex optimization → Many choices!
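The penalized objective above is straightforward to prototype. A minimal NumPy sketch (illustrative only: the dense design matrix and all names are assumptions; the real implementation exploits sparsity in X):

```python
import numpy as np

def sccs_log_posterior(beta, X, y, t, subject, lam):
    """Lasso-penalized SCCS log posterior (up to a constant).

    X       : (K, J) drug-exposure indicators, row k = era k
    y       : (K,)   event counts y_ik per era
    t       : (K,)   era lengths t_ik
    subject : (K,)   subject index i for each era
    lam     : lasso penalty weight
    """
    eta = X @ beta                        # x'_ik beta for every era
    n_subjects = subject.max() + 1
    # n_i = total events per subject (conditioned on in the SCCS likelihood)
    n = np.bincount(subject, weights=y, minlength=n_subjects)
    # per-subject denominator: sum_k t_ik exp(x'_ik beta)
    denom = np.bincount(subject, weights=t * np.exp(eta), minlength=n_subjects)
    loglik = np.dot(y, eta) - np.dot(n, np.log(denom))
    return loglik - lam * np.abs(beta).sum()
```

With β = 0 and unit-length eras, the likelihood term reduces to −Σ_i n_i log K_i, a handy sanity check.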
Block Relaxation via Cyclic Coordinate Descent (CCD)
Lasso forces many β_j = 0
CCD, with a shrunken Newton step, cycles over coordinates j: β_1, β_2, . . . , β_J, β_1, . . .
- Good convergence
- No matrix inversion
- Very serial: de Leeuw (1994)
1-D gradient/Hessian require column-wise operations:
- e^x dominates when dense
- Reductions dominate when sparse
Solution: exploit sparsity (limited previous work) and compute in parallel using GPUs
log posterior: log (conditional) likelihood + log lasso prior,

Σ_{i=1}^{N} [ Σ_{k=1}^{K_i} y_ik x′_ik β − n_i log( Σ_{k=1}^{K_i} t_ik e^{x′_ik β} ) ] − λ ‖β‖₁
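One CCD sweep with a shrunken Newton step can be sketched generically. The following Python is a hypothetical illustration for any lasso-penalized smooth objective; the `grad_hess` callback and the `max_step` trust-region cap are assumptions, not the authors' implementation:

```python
import numpy as np

def ccd_sweep(beta, grad_hess, lam, max_step=1.0):
    """One cycle of coordinate descent with a shrunken (soft-thresholded)
    Newton step for a lasso-penalized objective.

    grad_hess(j, beta) -> (g, h): 1-D gradient and Hessian of the smooth
    negative log-likelihood in coordinate j (h > 0); lam is the lasso weight.
    """
    for j in range(beta.size):
        g, h = grad_hess(j, beta)
        # Minimize g*d + 0.5*h*d^2 + lam*|beta_j + d| in d: take the
        # unpenalized Newton target z, then soft-threshold it.
        z = beta[j] - g / h
        new = np.sign(z) * max(abs(z) - lam / h, 0.0)   # lasso shrinkage
        d = np.clip(new - beta[j], -max_step, max_step) # cap the step
        beta[j] += d
    return beta
```

The soft-threshold is what forces exact zeros, so most coordinates cost almost nothing once they leave the active set.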
Getting Down and Dirty with SCCS
Let L(β) be the SCCS log (conditional) likelihood (the same form holds for conditional logistic regression and the Cox model):

L(β) = Y′Xβ − N′ log{M [T × exp(Xβ)]}.

Then one-dimensional gradients and Hessians are

∂L/∂β_j = Y′X_j − N′W  and  ∂²L/∂β_j² = −N′ [W × (1 − W)],

where

W = M [T × exp(Xβ) × X_j] / M [T × exp(Xβ)],

M is a sparse {0,1} matrix, X_j is a sparse {0,1} column, and × denotes the element-wise product.
Forming W requires column-wise operations and sparse matrix-vector multiplication, just as in (sparse) linear/logistic regression.
When β_j changes → update Xβ. Further propagation?
Fused transformation and reduction for the gradient and Hessian: ridiculously parallel statistics.
Genkin, Madigan et al. (2007); Park and Hastie (2007); Wu and Lange (2009)
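As a sanity check on the algebra, a NumPy sketch of the gradient and Hessian expressions (all names illustrative; a real implementation keeps M and X_j sparse and runs on the GPU):

```python
import numpy as np

def sccs_grad_hess(j, beta, X, Y, T, M):
    """1-D gradient and Hessian of L(b) = Y'Xb - N' log{M[T x exp(Xb)]}.

    X : (K, J) design matrix with {0,1} columns X_j
    Y : (K,)  event counts per era;  T : (K,) era lengths
    M : (I, K) {0,1} matrix aggregating eras to subjects; N = M Y
    """
    N = M @ Y
    u = T * np.exp(X @ beta)            # T x exp(X beta), element-wise
    W = (M @ (u * X[:, j])) / (M @ u)   # per-subject weight ratio
    grad = Y @ X[:, j] - N @ W
    hess = -(N @ (W * (1.0 - W)))       # uses X_j binary, so X_j^2 = X_j
    return grad, hess
```

Checking the gradient against a finite difference of L(β) on random binary data confirms the W-based formula.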
Vectorization of Sparse Updates
CUDA kernel for updating Xβ, T × exp(Xβ) and M [T × exp(Xβ)] given X_j and ∆β_j:
- Start 1 thread per non-zero element in sparse column X_j (compressed column storage): Xβ ← Xβ + ∆β_j X_j
- Element-wise e^x and ×, mapping each non-zero k to its thread n
- M in coordinate (COO) storage, columns same as X_j; store the (n, k) pairs
- Atomic adds of the per-era deltas ∆_k into M [T × exp(Xβ)] to avoid race conditions

[Figure: thread-level view of the update: thread n takes era k, computes Xβ_k^new = Xβ_k^old + ∆β_j, forms ∆ exp(Xβ_k) = t_k × (e^{Xβ_k^new} − e^{Xβ_k^old}), and scatter-adds it into that subject's denominator]
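The kernel's per-thread logic can be mimicked on the CPU. A NumPy sketch under the assumption that `rows_j` holds the compressed-column indices of X_j's non-zeros and `subj` the matching COO subject indices, with `np.add.at` standing in for the atomic add:

```python
import numpy as np

def sparse_update(Xbeta, t, denom, rows_j, subj, d_beta):
    """Update Xbeta and the per-subject denominator M[T x exp(Xbeta)]
    after beta_j changes by d_beta.

    rows_j : era indices of the non-zeros of column X_j (one GPU thread each)
    subj   : subject index for each of those eras (COO row of M)
    """
    old = t[rows_j] * np.exp(Xbeta[rows_j])   # contributions before the step
    Xbeta[rows_j] += d_beta                   # Xbeta <- Xbeta + d_beta * X_j
    new = t[rows_j] * np.exp(Xbeta[rows_j])
    # Scatter-add the deltas into each subject's denominator; on the GPU
    # this is the atomic add that avoids race conditions.
    np.add.at(denom, subj, new - old)
    return Xbeta, denom
```

Because only the non-zeros of X_j are touched, the cost of a coordinate step is proportional to the column's sparsity, not to K.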
Observational Health Data Sciences and Informatics (OHDSI)
Public-private collaborative between government, industry and
academia to inform active medical product surveillance
Example Databases
MarketScan Commercial Claims and Encounters (CCAE) –
59 million privately insured lives
MarketScan Lab Database (MSLR) – 1.5 million privately
insured lives with laboratory results
Example Health Outcomes
Acute liver injury
Acute renal failure
Bleeding
Upper gastrointestinal tract ulcer
Can We Fit in Reasonable Time?
Cases-Only Dataset Ranges
N = 115K to 3.6M patients, taking J = 1224 to 1428
different drugs in K = 3.8M to 75M exposure eras.
Fitting the largest originally drained 51 hours (point estimate)
[Figure: fitting time (secs) versus number of patients (10^5 to 10^7), log-log scale; Genkin/Wu (white circles), all sparse on CPU (black circles), all sparse on GPU (black squares)]

51 hours → 29 seconds
↑ hopes for cross-validation and full Bayesian inference
Off by an order of magnitude on the hyperparameter λ
Ran on Amazon EC2 Cloud
“Bootstrapping Is Computationally Infeasible . . . "
Pharmacological Causes of Angioedema
[Figure: bootstrap 95% confidence intervals on effect size β̂ for the 441 drugs with ρ̂ > 0.5]
Non-parametric bootstrap 95% confidence intervals (CIs) for the 441 drug effects that demonstrated non-zero coefficients in at least 50% of the bootstrap replicates.
Drotrecogin alfa (a treatment for severe sepsis): 4th largest coefficient, but its CI covers 0. Medical controversy!
Hardware Comparison (via OpenCL)
What about the Intel Phi? A dense problem (from a different domain):

[Figure: GFLOPS and speedup factor versus number of unique site patterns (100 to 6×10^4) for GPU: AMD Radeon HD 7970 GHz Edition; GPU: NVIDIA GeForce GTX 580 (CUDA); GPU: NVIDIA Tesla K20m; MIC: Intel Xeon Phi SE10P; CPU: Intel Xeon E5-2680 ×2 (16 cores); CPU: Intel Xeon E5-2680 (single core)]
60+ async CPUs, expensive cache, large memory bus
Modest performance: highly stereotyped computation,
memory-bandwidth limited (like most inference problems)
Acknowledgments
Joint work with
D. Madigan (Columbia)
P. Ryan and M. Schumie (J&J)
T. Shaddox (UCLA Ph.D. student)
Financial support
- Alfred P. Sloan Fellowship
- John Simon Guggenheim Fellowship
- NSF IIS 1251151 (BigData) and DMS 1264153
- NIH R01 AI107034 and R01 HG006139
- Google Research Gift