A machine learning approach to pharmacogenomics: application to therapy of Myasthenia Gravis
Dimos Kapetis
Bioinformatics, Scientific Direction
Neurological Institute “Carlo Besta”, Italy
Email: [email protected]
PGx of Myasthenia Gravis
• Myasthenia gravis (MG) is a rare, antibody-mediated autoimmune disease leading to
fluctuating muscle weakness and fatigability
• A single-gene approach based on thiopurine methyltransferase (TPMT)
has uncovered a genotype-phenotype association
• Azathioprine (AZA) is a purine antagonist used as an immunosuppressant to block
T- and B-cell proliferation
• Intolerance to AZA can occur in the absence of intolerance-associated
TPMT alleles [Colleoni et al 2012 J Clin Pharmacol]
Aim of Study
• Establish and refine a machine learning-based pipeline, applied to pathway-based
microarray data, to analyze combinations of SNPs that affect metabolic
pathways in the context of drug response
Hypothesis
• Machine learning methods can model the genotype-phenotype relationship and
detect SNP interactions
• SNP combinations can reveal minor associations that would not be detected
with a single-SNP approach
Design/Methods to study combinations of SNPs that
impact on metabolic pathways in the context of drug
response
Study Population
• Responders (control group): n=60
• Non-responders: n=40
• Intolerant: n=39
AZA Response
• AZA dose: 100-200 mg per day
• Responders: showed benefit after 1 year of treatment
• Intolerant: experienced persistent side effects upon treatment
• Non-responders: no pharmacological effect
Design
• Genomic DNA extracted from peripheral blood
• 1,936 drug metabolism markers in ~230 pharmacogenes
Pathway-Based PGx Data Mining Pipeline
• Feature selection: InfoGain, Relief, Chi-squared, wrappers
• Model building: Multifactor Dimensionality Reduction, BayesNet, Logistic
function, Random Forest
• True positive metrics were used to compare classification
accuracy
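As an illustration of the evaluation metrics, here is a minimal sketch that computes the true positive rate and the overall accuracy from predicted and observed classes. It assumes that "true positive metrics" refers to the true positive rate of one class; the function names and the toy labels are hypothetical and are not taken from the study.

```python
def true_positive_rate(y_true, y_pred, positive):
    """Fraction of samples of the `positive` class that were predicted as such."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    return tp / (tp + fn)


def accuracy(y_true, y_pred):
    """Fraction of all samples whose predicted class matches the observed class."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy labels (hypothetical): R = responder, NR = non-responder.
y_true = ["R", "R", "R", "NR", "NR"]
y_pred = ["R", "R", "NR", "NR", "R"]

tpr_responders = true_positive_rate(y_true, y_pred, positive="R")
overall = accuracy(y_true, y_pred)
```

In this toy case two of the three responders are recovered and three of the five predictions are correct.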
Methods
Data Mining Pipeline
• Pre-processing: 1936 SNPs (235 genes)
• Feature Selection (FS): InfoGain, Relief, ChiSquared, Wrapper
• Model Building (MB): Random Forest, BayesNet, Logistic, Multifactor Dimensionality
Reduction (MDR)
• Performance evaluation in cross-validation
MDR (Jason H. Moore et al 2006)
FS and MB are performed in WEKA (Mark Hall et al 2009)
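To make the filter-style feature selection step concrete, the sketch below ranks SNPs by the Pearson chi-squared statistic of their genotype-by-class contingency table. It is a plain-Python illustration of the idea only, not the WEKA implementation used in the study; the function names, the SNP names, and the toy genotypes are hypothetical.

```python
from collections import Counter


def chi_squared(genotypes, labels):
    """Pearson chi-squared statistic of the genotype-by-class contingency table."""
    n = len(labels)
    observed = Counter(zip(genotypes, labels))
    genotype_totals = Counter(genotypes)
    class_totals = Counter(labels)
    stat = 0.0
    for g, g_n in genotype_totals.items():
        for c, c_n in class_totals.items():
            expected = g_n * c_n / n
            stat += (observed[(g, c)] - expected) ** 2 / expected
    return stat


def rank_snps(snp_table, labels):
    """Return SNP names sorted from most to least associated with the class."""
    scores = {name: chi_squared(col, labels) for name, col in snp_table.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Toy data (hypothetical): genotypes coded as 0/1/2 copies of the minor allele.
labels = ["R", "R", "R", "R", "NR", "NR", "NR", "NR"]
snp_table = {
    "snp_a": [0, 0, 0, 1, 2, 2, 1, 2],  # genotype tracks the class closely
    "snp_b": [0, 1, 2, 0, 0, 1, 2, 0],  # same genotype distribution in both classes
}
ranking = rank_snps(snp_table, labels)
```

A filter like this scores each SNP on its own, which is why the pipeline pairs it with methods that can capture combinations of SNPs.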
Responders vs Non-Responders MG patients
Classification performance comparison (accuracy, %)

                   GS wrapper+BayesNet   GS wrapper+Random Forest   Relief+MDR   GS wrapper+Logistic
                   (7 SNPs)              (8 SNPs)                   (4 SNPs)     (7 SNPs)
3-Folds            88.8                  87.9                       99.3         90.5
5-Folds            87.9                  89.4                       98.3         92.48
10-Folds           88.8                  89.4                       99.3         90.5
Average Accuracy   88.5                  88.9                       98.9         91.16

GS=GreedyStepwise
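The 3-, 5- and 10-fold figures above come from k-fold cross-validation. The sketch below shows one way such an estimate can be computed, assuming a generic fit/predict interface. The helper names and the majority-class placeholder learner are hypothetical; only the group sizes (60 responders, 40 non-responders) are taken from the study.

```python
import random
from collections import Counter


def k_fold_indices(n_samples, k, seed=0):
    """Shuffle sample indices once and split them into k nearly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_validate(X, y, k, fit, predict, seed=0):
    """Return the mean test accuracy (in percent) over k folds."""
    accuracies = []
    for test_idx in k_fold_indices(len(y), k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(y)) if i not in held_out]
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        hits = sum(predict(model, X[i]) == y[i] for i in test_idx)
        accuracies.append(100.0 * hits / len(test_idx))
    return sum(accuracies) / len(accuracies)


# Placeholder learner (hypothetical): always predicts the majority training class.
def fit_majority(X_train, y_train):
    return Counter(y_train).most_common(1)[0][0]


def predict_majority(model, x):
    return model


# Toy cohort with the study's group sizes: 60 responders, 40 non-responders.
y = ["R"] * 60 + ["NR"] * 40
X = [[0]] * len(y)  # genotypes are ignored by the placeholder learner
results = {k: cross_validate(X, y, k, fit_majority, predict_majority) for k in (3, 5, 10)}
```

The placeholder learner gives a baseline of about 60% for this class balance. The "Average Accuracy" row is the mean of the three cross-validation settings; for Relief+MDR, (99.3 + 98.3 + 99.3) / 3 ≈ 98.97, reported as 98.9.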
Responders vs Intolerant MG patients
Classification performance comparison (accuracy, %)

                   GS wrapper+BayesNet   GS wrapper+RandomForest   Relief+MDR   GS wrapper+Logistic
                   (8 SNPs)              (5 SNPs)                  (4 SNPs)     (8 SNPs)
3-Folds            87.9                  86.26                     95.4         87.2
5-Folds            89.47                 87.96                     94.5         87.2
10-Folds           90.2                  84.9                      93.5         87.2
Average Accuracy   89                    86.37                     94.4         87.2

GS=GreedyStepwise
Two fourth-order SNP combinations predict AZA
response in MG patients

A) Responders vs Non-Responders
Overall accuracy = 98.9%
SNPs in the model:
• SLCO1B1* (rs2291075), MAF: 0.45
• SLC22A2 (rs624249), MAF: 0.29
• ABCB1** (rs2032582), MAF: 0.34
• UGT2B4 (rs1131878), MAF: 0.27
Labeled SNP pairs:
• SLCO1B1 (rs2291075) + ABCB1 (rs2032582)
• ABCB1 (rs2032582) + UGT2B4 (rs1131878)
Values shown on the interaction graph: 4.84%, 0.51%, 4.17%, -1.33%, 6.46%
* SLCO1B1 (rs11045879) influences AZA response efficacy in
acute lymphoblastic leukemia [Stocco G et al 2012]
** ABCB1 (rs2032582) missense mutation 2677TT was found in
non-responders with Crohn’s disease [Mendoza JL et al 2007]

B) Responders vs Intolerant
Overall accuracy = 94%
SNPs in the model:
• ABCC6 (rs8058694), MAF: 0.35
• SLC7A8 (rs2268873), MAF: 0.27
• CBR3 (rs8133052), MAF: 0.4
• CHST7*** (rs735716), MAF: 0.30
Labeled SNP pairs:
• SLC7A8 (rs2268873) + CBR3 (rs8133052)
• CHST7 (rs735716) + CBR3 (rs8133052)
Values shown on the interaction graph: -4.56%, -2.23%, 1.44%, 1.55%, -1.52%, -3.27%

Graph legend: Redundancy, Interaction
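The signed percentages and the redundancy/interaction legend are consistent with entropy-based interaction measures, where a positive value indicates synergy between two SNPs and a negative value indicates redundancy. The sketch below is an illustration under that assumption; it is not the computation used to produce the figure. It computes the interaction information I(A,B;C) - I(A;C) - I(B;C) in bits, and all names and toy data are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def mutual_information(feature, labels):
    """I(feature; class) = H(class) + H(feature) - H(feature, class)."""
    return entropy(labels) + entropy(feature) - entropy(list(zip(feature, labels)))


def interaction_gain(snp_a, snp_b, labels):
    """Information gained about the class from the pair beyond both SNPs alone."""
    joint = list(zip(snp_a, snp_b))
    return (mutual_information(joint, labels)
            - mutual_information(snp_a, labels)
            - mutual_information(snp_b, labels))


# Toy data (hypothetical): the class is the XOR of two SNPs, a purely synergistic effect.
snp_a = [0, 0, 1, 1, 0, 0, 1, 1]
snp_b = [0, 1, 0, 1, 0, 1, 0, 1]
labels = [0, 1, 1, 0, 0, 1, 1, 0]
synergy = interaction_gain(snp_a, snp_b, labels)

# Two identical copies of a perfectly predictive SNP carry fully redundant information.
snp_c = list(labels)
redundancy = interaction_gain(snp_c, snp_c, labels)
```

In the XOR case neither SNP is informative alone but the pair determines the class, so the gain is positive; the duplicated SNP gives a negative gain.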
Conclusions
• We introduce a two-step data mining analysis (feature selection followed by model
building) that classifies MG patients, separating the responder (control) group from
the non-responder and intolerant groups.
• The MDR method outperformed the other ML methods and predicted response to AZA in
MG patients with high accuracy (98.9% for responders vs non-responders, 94% for
responders vs intolerant).
• The two four-SNP models identified synergistic SNP interactions in relatively
small sample sizes.
• This study shows that a multi-locus approach (in contrast to a single-SNP approach)
may amplify the effects of single SNPs and is a powerful approach for
pharmacogenomics studies.
Next Steps
• Recruit more MG patients to validate the SNP interaction model
• Use this approach to identify the inherited basis for inter-individual
differences in response to AZA in patients with other
immune-mediated diseases such as Multiple Sclerosis
• Develop an R package implementing the ML pipeline
ACKNOWLEDGEMENTS
Neurologia IV Unit
Dr. Renato Mantegazza
Dr. Carlo Antozzi
Dr. Pia Bernasconi
Dr. Lara Colleoni
Dr. Lorenzo Maggi
Dr. Fulvio Baggi
Bioinformatics
Dr. Barbara Galbardi
Genopolis Consortium
Dr. Maria Foti
Questions ?
Wrappers
• “Wrap around” the learning algorithm
• Must therefore always evaluate subsets
• Return the best subset of attributes
• Apply for each learning algorithm
• Use same search methods as before
Wrapper loop
• Select a subset of attributes
• Induce learning algorithm on this subset
• Evaluate the resulting model (e.g., accuracy)
• Stop? No: select another subset. Yes: return the best subset.
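The wrapper loop can be sketched as a greedy stepwise (forward) search: at each pass the attribute that most improves the model's score is added, and the search stops when no attribute improves it. This is a minimal illustration, not WEKA's implementation. The evaluator here is a genotype lookup table scored on its own training data, chosen only to keep the example short (a real wrapper would use cross-validated accuracy of the chosen learner); all names and toy data are hypothetical.

```python
from collections import Counter


def make_evaluator(data, labels):
    """Build a scoring function: training accuracy of a per-genotype majority vote."""
    def evaluate(subset):
        cells = {}
        for i, label in enumerate(labels):
            key = tuple(data[f][i] for f in subset)
            cells.setdefault(key, Counter())[label] += 1
        correct = sum(max(counts.values()) for counts in cells.values())
        return correct / len(labels)
    return evaluate


def greedy_stepwise(features, evaluate):
    """Forward selection: add the best attribute until the score stops improving."""
    selected, best_score = [], evaluate([])
    while True:
        scored = [(evaluate(selected + [f]), f) for f in features if f not in selected]
        if not scored:
            break
        score, feature = max(scored)
        if score <= best_score:   # Stop? Yes: no candidate improves the model
            break
        selected.append(feature)  # Stop? No: keep the attribute and search again
        best_score = score
    return selected, best_score


# Toy data (hypothetical): the class is 1 when snp_1 or snp_2 is 1; snp_3 is noise.
data = {
    "snp_1": [0, 0, 0, 0, 1, 1, 1, 1],
    "snp_2": [0, 0, 0, 1, 0, 1, 0, 1],
    "snp_3": [0, 1, 1, 0, 0, 1, 1, 0],
}
labels = [0, 0, 0, 1, 1, 1, 1, 1]
selected, best = greedy_stepwise(list(data), make_evaluator(data, labels))
```

Because every candidate subset requires training and scoring a model, wrappers are more expensive than filter methods but are tuned to the specific learner.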
Method
Multifactor dimensionality reduction (MDR):
• A data mining approach to detect and characterize combinations of
SNPs that interact to influence a dependent or class variable.
• MDR identifies interactions among discrete variables that influence
a binary outcome (e.g. treated vs control) and is considered a nonparametric
alternative to logistic regression analysis.
• MDR is a good predictive machine learning method for analyzing the
data. The Naive Bayes classifier is used.
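The core MDR idea can be sketched as follows: each multilocus genotype cell is labeled high-risk or low-risk by comparing its case:control ratio with the overall ratio, which reduces several SNPs to a single binary attribute. This sketch uses the plain cell lookup rather than the Naive Bayes classifier mentioned above, and training accuracy rather than cross-validation; function names and toy data are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def mdr_fit(genotypes, labels, snps):
    """Label each multilocus genotype cell high-risk (1) or low-risk (0)."""
    n_cases = sum(labels)
    threshold = n_cases / (len(labels) - n_cases)  # overall case:control ratio
    counts = defaultdict(lambda: [0, 0])           # cell -> [controls, cases]
    for i, label in enumerate(labels):
        counts[tuple(genotypes[s][i] for s in snps)][label] += 1
    return {cell: int(cases >= threshold * controls)
            for cell, (controls, cases) in counts.items()}


def mdr_accuracy(genotypes, labels, snps):
    """Training accuracy of the high-risk/low-risk rule for one SNP combination."""
    model = mdr_fit(genotypes, labels, snps)
    hits = sum(model[tuple(genotypes[s][i] for s in snps)] == label
               for i, label in enumerate(labels))
    return hits / len(labels)


def best_combination(genotypes, labels, order):
    """Exhaustively search all SNP combinations of the given order."""
    return max(combinations(sorted(genotypes), order),
               key=lambda snps: mdr_accuracy(genotypes, labels, snps))


# Toy data (hypothetical): label 1 = case, 0 = control; the class is snp_a XOR snp_b.
genotypes = {
    "snp_a": [0, 0, 1, 1, 0, 0, 1, 1],
    "snp_b": [0, 1, 0, 1, 0, 1, 0, 1],
    "snp_c": [0, 0, 0, 0, 1, 1, 1, 1],
}
labels = [0, 1, 1, 0, 0, 1, 1, 0]
best_pair = best_combination(genotypes, labels, order=2)
```

In the toy data neither SNP predicts the class on its own, yet the pair does, which is the kind of interaction a single-SNP analysis would miss.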



Metabolism of 6-MP
L Wang and R Weinshilboum, Oncogene 25, 1629-1638 (2006)
Azathioprine Pathway metabolism