Evaluation of methods for identifying exposure-related differentially methylated regions in human blood DNA

Matthew Suderman
Postdoctoral Research Assistant
MRC Integrative Epidemiology Unit
University of Bristol, UK

CpG site dependence and distance
[Figure: CpG site correlation (R) against CpG site distance (bp)]

Repression and DNA methylation
[Figure: regions labelled upstream and downstream]
Medvedeva et al. BMC Genomics 2014 15:119

Classical differentially methylated region
[Figure]
Jaffe A E et al. Int. J. Epidemiol. 2012;41:200-209

Questions
• How to calculate statistical significance?
• Predefine regions?
• Sliding window?
• 450K data vs whole genome bisulfite sequencing (~20M)

Predefining regions
• Regions defined by the gap between CpG sites: bumphunter::clusterMaker(maxgap)

Different behaviour across genes
[Figure]
Madeleine et al. Nature Biotechnology, 2009

Bumphunting
Step 1: linear regression for each CpG site
Step 2: smooth regression coefficients across the genome
Step 3: identify candidate DMRs; "area" is the statistic
Step 4: construct null distribution from permutations
Jaffe A E et al. Int. J. Epidemiol. 2012;41:200-209

Combining site statistics
[Figure: CpG sites CG1, CG2, CG3, ..., CGn near a TSS]
• Fisher's method for combining p-values
  $x = -2 \sum_{i=1}^{n} \log(p_i)$
  $x$ has a $\chi^2_{2n}$ distribution
• Stouffer's method for combining z-scores
  $z = \sum_{i=1}^{n} z_i / \sqrt{n}$
  $z = \sum_{i=1}^{n} w_i z_i / \sqrt{\sum_{i=1}^{n} w_i^2}$

Stouffer: choosing weights
$z = \sum_{i=1}^{n} w_i z_i / \sqrt{\sum_{i=1}^{n} w_i^2}$
Weights can be chosen to emphasize:
• independence: $w_i$ = 1 / avg correlation between site i and others
• agreement: $w_i$ = avg correlation between site i and others
• variability: $w_i$ = variance of site i methylation

Stouffer: non-independence
1. $z = \sum_{i=1}^{n} w_i z_i / \sqrt{\sum_{i=1}^{n} w_i^2 + 2 \sum_{i<j} w_i w_j r_{ij}}$
   Estimate $r_{ij}$ (correlation of $z_i$ and $z_j$) by bootstrapping.
2. Comb-p: transform to independent z-scores
   a. Estimate correlation between CpG sites at given distances
   b. Construct corresponding correlation matrix
   c. Compute Cholesky factor
   d. Use factor to generate independent z-scores
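The combination formulas above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the code used for the analyses in these slides: the function names (fisher_combined_p, stouffer_z, cholesky_lower, decorrelate) are hypothetical, the correlation matrix is passed in directly rather than estimated by bootstrapping or from CpG distances, and the decorrelation step shows only the general Cholesky idea, not Comb-p's actual implementation.

```python
import math


def fisher_combined_p(pvalues):
    """Fisher's method: x = -2 * sum(log p_i); x ~ chi-square with 2n df."""
    n = len(pvalues)
    x = -2.0 * sum(math.log(p) for p in pvalues)
    # For even degrees of freedom (2n) the chi-square upper tail has the
    # closed form exp(-x/2) * sum_{k=0}^{n-1} (x/2)^k / k!
    half = x / 2.0
    term, total = 1.0, 1.0
    for k in range(1, n):
        term *= half / k
        total += term
    return x, math.exp(-half) * total


def stouffer_z(zscores, weights=None, corr=None):
    """Weighted Stouffer z; if corr (matrix of r_ij) is given, the
    denominator includes the 2 * sum_{i<j} w_i w_j r_ij term."""
    n = len(zscores)
    w = [1.0] * n if weights is None else list(weights)
    numerator = sum(wi * zi for wi, zi in zip(w, zscores))
    variance = sum(wi * wi for wi in w)
    if corr is not None:
        variance += 2.0 * sum(
            w[i] * w[j] * corr[i][j] for i in range(n) for j in range(i + 1, n)
        )
    return numerator / math.sqrt(variance)


def cholesky_lower(c):
    """Lower-triangular L with L * L^T = c (c must be positive definite)."""
    n = len(c)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(c[i][i] - s)
            else:
                low[i][j] = (c[i][j] - s) / low[j][j]
    return low


def decorrelate(zscores, corr):
    """Solve L y = z by forward substitution; if z has correlation matrix
    corr = L * L^T, the entries of y are uncorrelated."""
    low = cholesky_lower(corr)
    out = []
    for i, zi in enumerate(zscores):
        s = sum(low[i][k] * out[k] for k in range(i))
        out.append((zi - s) / low[i][i])
    return out


# Two perfectly correlated sites carry no more evidence than one site:
z_dependent = stouffer_z([2.0, 2.0], corr=[[1.0, 1.0], [1.0, 1.0]])  # 2.0
# Treating the same two sites as independent inflates the statistic:
z_independent = stouffer_z([2.0, 2.0])
```

The usage lines show why the non-independence correction matters: with r_12 = 1 the combined z equals the single-site z, whereas the uncorrected formula returns a larger value for the same data.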

Linear regression
• OLS: $y = X\beta + \epsilon$, minimizing $|X\beta - y|^2$
  y = phenotype/exposure variable
  X = methylation levels (rows = samples, columns = CpG sites)
• Lasso: minimizing $|X\beta - y|^2 + \lambda |\beta|$
• Ridge: minimizing $|X\beta - y|^2 + \lambda |\beta|^2$
The globaltest::gt() function is designed for this.

The Avon Longitudinal Study of Parents and Children (Children of the 90s)
Time points: Antenatal, <1y, 1y, 2y, 3y, 4y, 5y, 6y, 7y, 8y, 9y, 10y, 11y, 12y, 13y, 16+
• Child's health (e.g. medical history, anthropometry)
• Child's demographics (e.g. ethnicity, social background)
• Environmental health (e.g. pollutant exposure)
• Child's lifestyle (e.g. physical activity, diet)
• Child's school & education
• Child's behaviour and psychology
• Child's development (e.g. cognitive, motor skills, puberty)
• Parent's lifestyle (e.g. smoking & drinking)
• Parent's psychological well-being
• Biological samples (blood, plasma, serum, cells, tissue, hair, urine)

Accessible Resource for Integrated Epigenomic Studies (ARIES)
cord blood

Number of associations

Exposure/phenotype        Time point  single-site  bumphunter  globaltest  stouffer (cor)  stouffer (ind)  lasso (ChAMP)
Recreational drug         cord                  0           0           0               0               0              0
Depression                cord                  0           0           0               0               0              0
Miscarriages              cord                  0           0           0               0               0              1
Blood metal               cord                  0           1           0               0               0              1
Pain medication (early)   cord                  0           1           0               0               0              1
SEP (income)              cord                  0           0           0               2               0              0
Sensory phenotype         cord                  0           0           0               0               0              2
SEP (home)                cord                  0           0           0               1               1              1
Pain medication (late)    cord                  0           1           1               3               4              0
Diet                      cord                  0           0           1              13              13              0
Home air quality          cord                  1           1           0               1               1              0
Blood metal               cord                  1           0           5               2               2              1
SEP (house)               cord                  3           0           0               0               0              3
Air pollution             cord                  3           0           4               3               2              0
Tobacco                   cord                 32           1          26              27              27              1
Birth characteristic      cord                221           1          66              46              50              5
Pregnancy                 cord               2013           1         606             435             398             17

DMR methods contribute
[Table as in "Number of associations"]

Stouffer (cor) DMRs
[Table as in "Number of associations"]
[Figure: SEP (income) DMR; panels: methylation levels, linear model coefficients; annotations: "variable", "~35kb upstream"]

Globaltest DMRs
[Table as in "Number of associations"]
[Figure: Blood metal levels DMR; labels: model fit, predicted, measurements, linear model coefficients; annotation: "~65kb from binding gene"]

New tobacco exposure DMRs?
[Table as in "Number of associations"]

New tobacco exposure DMRs
[Figure: Venn diagram comparing single-site, globaltest, stouffer (ind) and stouffer (cor) results; label "11/32 sites"; region counts 1, 13, 12, 12, 3, 3 (assignment to regions not recoverable)]

DMRs and replication
• 1000 samples (probes), random split into sets of 500 samples
• Identify associated sites/DMRs in each set
• Compare associated sites/DMRs
• Repeat 10x

Tobacco exposure replication
[Figure: number of sites/DMRs, number replicated and % replicated for single site, global test, stouffer (cor) and stouffer (ind)]

Pregnancy (lots of associations)
[Figure: number of sites/DMRs, number replicated and % replicated for single site, global test, stouffer (cor) and stouffer (ind)]

Blood metal levels replication
[Figure: number of sites/DMRs, number replicated and % replicated for single site, global test, stouffer (cor) and stouffer (ind)]

Two-stage analysis
• 1000 samples (probes), random split into sets of 500 samples
• Stage 1: identify sites/DMRs with p < threshold
• Stage 2: test only those that pass the threshold
• Repeat 10x

Two-stage analysis sensitivity

                                      Full dataset analysis                                                               Two-stage analysis (mean, n=10 splits)
Exposure/phenotype        Time point  single-site  bumphunter  globaltest  stouffer (cor)  stouffer (ind)  lasso (ChAMP)  single-site  globaltest  stouffer (cor)  stouffer (ind)
Recreational drug         cord                  0           0           0               0               0              0            0         0.2             0.3             0.3
Depression                cord                  0           0           0               0               0              0          0.1         0.1             0.7             0.5
Miscarriages              cord                  0           0           0               0               0              1            0         0.4             0.5             0.3
Blood metal               cord                  0           1           0               0               0              1            0         0.5             0.3             0.3
Pain medication (early)   cord                  0           1           0               0               0              1            0         0.4             0.2             0.2
SEP (income)              cord                  0           0           0               2               0              0            0         0.6             0.3             0.4
Sensory phenotype         cord                  0           0           0               0               0              2            0           0             0.2             0.3
SEP (home)                cord                  0           0           0               1               1              1            0         0.1             0.5             0.5
Pain medication (late)    cord                  0           1           1               3               4              0            0         0.3             0.4             0.5
Diet                      cord                  0           0           1              13              13              0            0         0.3             0.4             0.6
Home air quality          cord                  1           1           0               1               1              0            0           0             0.5             0.4
Blood metal               cord                  1           0           5               2               2              1          0.9         1.5             0.6             0.7
SEP (house)               cord                  3           0           0               0               0              3          0.4         0.2             0.3             0.2
Air pollution             cord                  3           0           4               3               2              0            1         1.4             0.6             0.4
Tobacco                   cord                 32           1          26              27              27              1          2.4         3.7             4.3             4.4
Birth characteristic      cord                221           1          66              46              50              5          4.9         6.7             4.9             5.4
Pregnancy                 cord               2013           1         606             435             398             17         37.2        24.3            19.1            18.5

Exposure/phenotype distributions
[Figure: distributions for exposures/phenotypes 1 to 17]

Future directions
• Stouffer weighting schemes
• Stouffer without bootstrapping (e.g. Comb-p)
• Bumphunter
• Blockfinder
• Lasso (ChAMP)
• DMRs and exposure prediction
• Grand theory about association numbers

Acknowledgements
Caroline Relton, George Davey Smith
Phenotypes/exposures: Jean Golding, Kate Northstone, Rebecca Richmond
Stouffer's method: Andrew Simpkins
ALSPAC participants
ARIES: Sue Ring, Wendy McArdle, Tom Gaunt, Geoff Woodward, Oliver Lyttleton