False Discovery Rate, Part I: introduction and challenges

E. Roquain
Laboratory LPMA, Université Pierre et Marie Curie (Paris 6), France
Point de Vue, 3rd February 2014

E. Roquain. FDR: introduction, challenges and perspectives. Part I.

Outline
1 Introduction
2 False discovery rate control
3 FDR in other statistical issues

1 Introduction

A "multiple testing joke" (http://xkcd.com)
[Figure: xkcd comic strip, revealed over several overlays.]

Multiplicity problem:
$P(\text{make at least one false discovery}) \gg P(\text{the } i\text{-th is a false discovery})$
A correction is needed to assess significance!

Some other examples
- Paradoxes due to large scale experiments
- Probable facts appear significant

Science-wise multiplicity issue [Ioannidis (2005, PLoS Medicine)]
[Figure: first page of the essay "Why Most Published Research Findings Are False" by John P. A. Ioannidis (open access, freely available online). The legible excerpts follow; "[...]" marks text cut off in the image.]

Summary. There is increasing concern that most current published research findings are false. The probability that a research claim is true may depend on study power and bias, the number of other studies on the same question, and, importantly, the ratio of true to no relationships among the relationships probed in each scientific field. In this framework, a research finding is less likely to be true when the studies conducted in a field are smaller; when effect sizes are smaller; when there is a greater number and lesser preselection of tested relationships; where there is greater flexibility in designs, definitions, outcomes, and analytical modes; when there is greater financial and other interest and prejudice; and when more teams are involved in a scientific field in chase of statistical significance. Simulations show that for most study designs and settings, it is more likely for a research claim to be false than true. Moreover, for many current scientific fields, claimed research findings may often be simply accurate measures of the prevailing bias. In this essay, I discuss the implications of these problems for the conduct and interpretation of research.

Pull quote: "It can be proven that most claimed research findings are false."

Published research findings are sometimes refuted by subsequent [...] factors that influence this problem and some corollaries thereof.

Modeling the Framework for False Positive Findings. Several methodologists have pointed out [9–11] that the high rate of nonreplication (lack of confirmation) of research discoveries is a consequence of the convenient, yet ill-founded strategy of claiming conclusive research findings solely on the basis of a single study assessed by formal statistical significance, typically for a p-value less than 0.05. Research is not most appropriately represented and summarized by p-values, but, unfortunately, there is a widespread notion that medical research articles should be interpreted based only on p-values. Research findings are defined here as any relationship reaching formal statistical significance, e.g., effective interventions, informative predictors, risk factors, or associations. "Negative" research is also very useful. "Negative" is actually a misnomer, and the misinterpretation is widespread. However, here we will target relationships that investigators claim [...] is characteristic of the field and can vary a lot depending on whether the field targets highly likely relationships or searches for only one or a few true relationships among thousands and millions of hypotheses that may be postulated. Let us also consider, for computational simplicity, circumscribed fields where either there is only one true relationship (among many that can be hypothesized) or the power is similar to find any of the several existing true relationships. The pre-study probability of a relationship being true is $R/(R+1)$. The probability of a study finding a true relationship reflects the power $1-\beta$ (one minus the Type II error rate). The probability of claiming a relationship when none truly exists reflects the Type I error rate, $\alpha$. Assuming that $c$ relationships are being probed in the field, the expected values of the 2 × 2 table are given in Table 1. After a research finding has been claimed based on achieving formal statistical significance, the post-study probability that it is true is the positive predictive value, PPV. The PPV is also the complementary probability of what Wacholder et al. have called the false positive report probability [10]. According to the 2 × 2 table, one gets $\mathrm{PPV} = (1-\beta)R/(R - \beta R + \alpha)$. A research finding is thus [...]

Science-wise multiplicity issue [Ioannidis (2005, PLoS Medicine)], [Talk Benjamini Southampton (2013)]
Modeling: try many experiments
  ⇓ 1000 pure noise, 30 perfect signal
  ⇓ publish results with a p-value ≤ 0.05
  ⇓ ≃ 50 false discoveries, 30 true discoveries
- Jager and Leek (2013). An estimate of the science-wise false discovery rate and application to the top medical literature: ≃ 14%
- Ioannidis (2014). Discussion: Why "An estimate of the science-wise false discovery rate and application to the top medical literature" is false
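The arithmetic behind the slides above can be checked with a few lines of Python. This is only a sketch: the function names are hypothetical, the "at least one false discovery" formula assumes independent tests (an assumption not stated on the slide), and `m = 20` in the demo is an arbitrary illustrative value.

```python
def expected_discoveries(n_null, n_signal, alpha, power=1.0):
    """Expected counts when every result with p-value <= alpha is published.

    power=1.0 encodes the slide's "perfect signal" (every true effect is found).
    """
    false_disc = n_null * alpha      # nulls that pass the threshold by chance
    true_disc = n_signal * power     # true effects that are detected
    fdp = false_disc / (false_disc + true_disc)
    return false_disc, true_disc, fdp


def ppv(R, alpha, beta):
    """Positive predictive value from the Ioannidis excerpt: (1-beta)R / (R - beta R + alpha)."""
    return (1 - beta) * R / (R - beta * R + alpha)


def prob_at_least_one_false_discovery(m, alpha):
    """P(at least one false discovery) among m true nulls, ASSUMING independent tests."""
    return 1 - (1 - alpha) ** m


if __name__ == "__main__":
    false_disc, true_disc, fdp = expected_discoveries(1000, 30, 0.05)
    print(false_disc, true_disc, fdp)          # about 50 false vs 30 true discoveries
    print(ppv(R=30 / 1000, alpha=0.05, beta=0.0))
    print(prob_at_least_one_false_discovery(m=20, alpha=0.05))
```

In the toy model, the share of false discoveries among published results and the PPV with $R = 30/1000$, $\beta = 0$ are complementary, which ties the Benjamini toy model to the Ioannidis formula.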
Multiplicity in microarray [Hedenfalk et al. (2001)]
BRCA1 vs BRCA2 genes
- expression level (activity)
- genes differentially activated?
- 1 test for each gene
- thousands of genes
- number of replications ≪ dimension
- correlations

Other applications
- Neuroimaging (fMRI): activated regions?
- Econometrics: winning strategies?
- Astronomy: directions with stars?

Canonical setting
- $X_i$ = avg group 2 − avg group 1 (rescaled) for gene $i$
- Gaussian model:
  $\begin{pmatrix} X_1 \\ X_2 \\ \vdots \\ X_m \end{pmatrix} = \mu \begin{pmatrix} H_1 \\ H_2 \\ \vdots \\ H_m \end{pmatrix} + \begin{pmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_m \end{pmatrix}$,
  with $\mu > 0$, $H \in \{0,1\}^m$ (fixed) and $\varepsilon \sim N(0, \Gamma)$ ($\Gamma_{i,i} = 1$).
- $\Gamma$ = dependence structure = $I_m$ for now
Question: for each $i$, $H_i = 0$ or $H_i = 1$?
Multiple testing: favors the "0" decision

Individual decision and errors
- Test statistic: $X_i$
- p-value: $p_i = \bar\Phi(X_i)$, with $\bar\Phi(z) = P(Z \ge z)$, $Z \sim N(0,1)$, such that
  if $H_i = 0$, $p_i \sim U(0,1)$;
  if $H_i = 1$, $p_i \sim \bar\Phi(\bar\Phi^{-1}(\cdot) - \mu)$.
- Choose $\hat H_i = 1\{p_i \le t\}$ for some threshold $t$
- Two errors:

                     H_i = 0           H_i = 1
  \hat H_i = 0       true negative     false negative
  \hat H_i = 1       false positive    true positive

- False positive more annoying

Picture 1. $m = 100$; $m_0 = 50$; $\mu = 2$; $\Gamma = I_m$
Picture 2. $m = 100$; $m_0 = 95$; $\mu = 3$; $\Gamma = I_m$
Picture 3. $m = 100$; $m_0 = 50$; $\mu = 0.01$; $\Gamma = I_m$
Picture 4. $m = 100$; $m_0 = 95$; $\mu = 0.01$; $\Gamma = I_m$
[Figures: one plot per picture, both axes running from 0 to 1, points marked 0 or 1.]

Doing like for 1 test? $t \equiv \alpha = 0.1$
[Figure: plot with points marked 0 or 1, shown full scale (axes from 0 to 1) and zoomed on the vertical range 0 to 0.2.]

Union bound Bonferroni? $t \equiv \alpha/m = 0.1/100$
[Figure: two plots with points marked 0 or 1, vertical range 0 to 0.2.]
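To make the contrast between the two fixed thresholds concrete, here is a minimal simulation sketch of the canonical Gaussian model with $\Gamma = I_m$. The function names, the seed, and the choice to place the $m_0$ true nulls first are illustrative assumptions, not part of the talk.

```python
import math
import random


def upper_tail(z):
    """bar-Phi(z) = P(Z >= z) for Z ~ N(0, 1)."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def simulate_p_values(m, m0, mu, seed=0):
    """Draw X_i = mu * H_i + eps_i with independent N(0, 1) noise; return (H, p-values)."""
    rng = random.Random(seed)
    H = [0] * m0 + [1] * (m - m0)          # first m0 coordinates are true nulls
    X = [mu * h + rng.gauss(0.0, 1.0) for h in H]
    return H, [upper_tail(x) for x in X]


def count_errors(H, p, t):
    """Return (false positives, true positives) for the decision 1{p_i <= t}."""
    false_pos = sum(1 for h, pi in zip(H, p) if h == 0 and pi <= t)
    true_pos = sum(1 for h, pi in zip(H, p) if h == 1 and pi <= t)
    return false_pos, true_pos


if __name__ == "__main__":
    m, alpha = 100, 0.1
    H, p = simulate_p_values(m=m, m0=50, mu=2.0)
    print("t = alpha    :", count_errors(H, p, alpha))       # many discoveries, some false
    print("t = alpha / m:", count_errors(H, p, alpha / m))   # few false, but few discoveries
```

The uncorrected threshold tends to let false positives through, while the Bonferroni threshold tends to miss much of the signal when $\mu$ is moderate.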
Do something in between! $t_\ell = \alpha \ell / m = 0.1\,\ell/100$
[Figure: three plots with points marked 0 or 1, vertical range 0 to 0.2.]

Smart! ... and rigorous?

2 False discovery rate control

BH procedure
- p-value view: $\hat k = \max\{0 \le i \le m : p_{(i)} \le \alpha i/m\}$, $\hat t = \alpha \hat k / m$
- c.d.f. view: $\hat t = \max\{t \in [0,1] : \hat G_m(t) \ge t/\alpha\}$
- Decision: $\hat H_i = 1\{p_i \le \hat t\} = 1\{X_i \ge \bar\Phi^{-1}(\hat t)\}$

False discovery rate control
For a decision $\hat H_i = 1\{p_i \le \hat t\}$ ($\forall i$),
$\mathrm{FDP}(\hat t) = \dfrac{\#\{i : H_i = 0, \hat H_i = 1\}}{\#\{i : \hat H_i = 1\}}$ (with the convention $0/0 = 0$),
$\mathrm{FDR}(\hat t) = E[\mathrm{FDP}(\hat t)]$.

Theorem [Benjamini and Hochberg (1995)], [Benjamini and Yekutieli (2001)]
If $\Gamma = I_m$ and $\hat t$ is the threshold of the BH procedure, then $\forall \mu, H$: $\mathrm{FDR}(\hat t) = (m_0/m)\,\alpha \le \alpha$.

Simulations. $m = 50$; $m_0 = 25$; $\mu = 3$; $\Gamma = I_m$
FDP(BH) over ten successive draws: 0, 0.16, 0.0833, 0.08, 0.12, 0.167, 0.0435, 0, 0, 0.04
[Figures: one plot per draw with points marked 0 or 1; horizontal axis from 0 to 1, vertical axis from 0 to 0.4.]

Benjamini and Hochberg (1995). Controlling the false discovery rate: a practical and powerful approach to multiple testing
[Figure, from Benjamini (2010, JRSSB): "Fig. 1. Rapidly increasing number of citations of Benjamini and Hochberg (1995), suggesting that its influence has not yet reached its peak (note that the figure for 2009 is only partially shown)". Horizontal axis: year, 1996 to 2008; vertical axis: number of citations, 0 to 1200.]
Excerpt visible around the figure: "[...] application areas, such as neural imaging, are on the rise also, showing its ability to be applied in many diverse types of application. The list of the 10 highest cited papers that cite Benjamini and Hochberg (1995), which is shown in Table 1, is particularly interesting, because it includes six statistical papers, suggesting that further theoretical and methodological developments of the method have had significant influence." "Table 1. 10 most cited papers that cite Benjamini and Hochberg (1995)" [table cut off].
And now > 20,000 citations on Google Scholar!
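The BH procedure (p-value view), the FDP, and the simulation setting above can be sketched in plain Python as follows. The function names are hypothetical, and the level `alpha = 0.2` in the demo is an assumption: the slides do not state which $\alpha$ was used for the simulations.

```python
import math
import random


def upper_tail(z):
    """bar-Phi(z) = P(Z >= z) for Z ~ N(0, 1)."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def bh_threshold(p_values, alpha):
    """t_hat = alpha * k_hat / m, with k_hat = max{0 <= i <= m : p_(i) <= alpha * i / m}."""
    m = len(p_values)
    k_hat = 0
    for i, p in enumerate(sorted(p_values), start=1):
        if p <= alpha * i / m:
            k_hat = i                      # keep the LARGEST index that passes (step-up)
    return alpha * k_hat / m


def fdp(H, p_values, t):
    """False discovery proportion of the decision 1{p_i <= t}, with 0/0 = 0."""
    rejected = [h for h, p in zip(H, p_values) if p <= t]
    if not rejected:
        return 0.0
    return rejected.count(0) / len(rejected)


def estimate_fdr(m, m0, mu, alpha, n_rep=2000, seed=0):
    """Monte Carlo estimate of FDR(BH) in the independent Gaussian model."""
    rng = random.Random(seed)
    H = [0] * m0 + [1] * (m - m0)
    total = 0.0
    for _ in range(n_rep):
        p = [upper_tail(mu * h + rng.gauss(0.0, 1.0)) for h in H]
        total += fdp(H, p, bh_threshold(p, alpha))
    return total / n_rep


if __name__ == "__main__":
    # alpha = 0.2 is an assumed level; the theorem predicts FDR = (m0 / m) * alpha.
    print(estimate_fdr(m=50, m0=25, mu=3.0, alpha=0.2))
```

Individual draws of the FDP fluctuate, as on the simulation slides; only their average is pinned down by the theorem.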
3 FDR in other statistical issues

Why should FDR thresholding be adaptive to sparsity?
[Figure: plots with points marked 0 or 1, vertical range 0 to 0.2.]

[Linnemann] - increasing signal strength
[Figure: sequence of five images.]

Adaptation to unknown sparsity
$\hat t$ seems "adaptive" to the "quantity" of signal in the data
- Classification: where is the signal? [Bogdan et al. (2011)], [Neuvial and R. (2012)]
- Detection: is there some signal? [Ingster (2002)], [Donoho and Jin (2004)], etc.
- Estimation: what is the value $EX$ of the signal? $(\widehat{EX})_i = X_i\,1\{|X_i| \ge \hat t\}$ (hard thresholding) [Abramovich et al. (2006)], [Donoho and Jin (2006)]

Classification
$X_i \sim \pi_{0,m}\,N(0,1) + (1 - \pi_{0,m})\,N(\mu_m, 1)$, $1 \le i \le m$, i.i.d., but $\pi_{0,m} \to 1$ (sparse) and $\mu_m \to \infty$ (compensates sparsity).
- training set = null distribution known (one-class classification)
- Classification rule $\hat h_m : \mathbb{R} \to \{0, 1\}$
- Risk: $R_m(\hat h_m) = (1 - \pi_{0,m})^{-1}\, E\left[ m^{-1} \sum_{i=1}^{m} 1\{\hat h_m(X_i) \ne H_i\} \right]$
- Classification boundary in (sparsity, signal) space such that
  above the boundary, $\exists \hat h_m : R_m(\hat h_m) \to 0$ (perfect classification);
  under the boundary, $\forall \hat h_m$, $R_m(\hat h_m) \to 1$ (unclassifiable).

Classification boundary
$\mu_m = \sqrt{2 r \log m}$, $\pi_{0,m} = 1 - m^{-\beta}$
BH: $\hat h_m^{BH}(x) = 1\{x \ge \bar\Phi^{-1}(\hat t)\}$ with $\alpha_m \propto (\log m)^{-1/2}$
- Classification boundary attained by BH.
- On the boundary: risk BH ∼ Bayes risk.
[Figure: $(\beta, r)$ plane, $\beta$ from 0.5 to 1.0, $r$ from 0 to 1; region "Perfect classification" above the boundary, "Unclassifiable" below.]
[Bogdan et al. (2011)], [Neuvial and R. (2012)]

Detection: is there some signal?
Same model: $X_i \sim \pi_{0,m}\,N(0,1) + (1 - \pi_{0,m})\,N(\mu_m, 1)$, $1 \le i \le m$, i.i.d., but $\pi_{0,m} \to 1$ (sparse) and $\mu_m \to \infty$ (compensates sparsity).
- Test $H_0$: "$N(0, I_m)$" against $H_1$: "mixture".
- Risk: $R_m(T) = P_{H_0}(T(X) = 1) + P_{H_1}(T(X) = 0)$
- Detection boundary in (sparsity, signal) space such that
  above the boundary, $\exists T : R_m(T) \to 0$ (perfect detection);
  under the boundary, $\forall T$, $R_m(T) \to 1$ (undetectable).

Detection boundary
$\mu_m = \sqrt{2 r \log m}$, $\pi_{0,m} = 1 - m^{-\beta}$
$T^{BH} = 1\{\exists i : p_{(i)} \le \alpha_m i/m\}$ with $\alpha_m \propto (\log m)^{-1/2}$
- Detection boundary attained by BH when $\beta \in (3/4, 1)$
- Better to use "higher criticism": $\max_i \left\{ \sqrt{m}\, \dfrac{i/m - p_{(i)}}{\sqrt{p_{(i)}(1 - p_{(i)})}} \right\}$
[Figure: $(\beta, r)$ plane, $\beta$ from 0.5 to 1.0, $r$ from 0 to 1; regions "Perfect classification", "Perfect detection" and "Undetectable".]
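The two detection statistics above can be computed on data drawn from the sparse mixture model, as in the following hedged sketch. The function names are hypothetical, and skipping p-values equal to 0 or 1 in the higher criticism statistic is a numerical safeguard added here, not something stated on the slide.

```python
import math
import random


def upper_tail(z):
    """bar-Phi(z) = P(Z >= z) for Z ~ N(0, 1)."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def sparse_mixture_p_values(m, beta, r, rng):
    """p-values for X_i ~ pi0 N(0,1) + (1 - pi0) N(mu,1), pi0 = 1 - m^(-beta), mu = sqrt(2 r log m)."""
    eps = m ** (-beta)                                  # proportion of signal, 1 - pi0
    mu = math.sqrt(2.0 * r * math.log(m))
    p = []
    for _ in range(m):
        h = 1 if rng.random() < eps else 0              # latent label H_i
        p.append(upper_tail(mu * h + rng.gauss(0.0, 1.0)))
    return p


def bh_detects(p_values, alpha):
    """T^BH = 1{exists i : p_(i) <= alpha * i / m}."""
    m = len(p_values)
    return any(p <= alpha * i / m for i, p in enumerate(sorted(p_values), start=1))


def higher_criticism(p_values):
    """max_i sqrt(m) (i/m - p_(i)) / sqrt(p_(i) (1 - p_(i))), skipping p in {0, 1}."""
    m = len(p_values)
    best = -math.inf
    for i, p in enumerate(sorted(p_values), start=1):
        if 0.0 < p < 1.0:
            best = max(best, math.sqrt(m) * (i / m - p) / math.sqrt(p * (1.0 - p)))
    return best


if __name__ == "__main__":
    rng = random.Random(0)
    m = 10_000
    alpha_m = 1.0 / math.sqrt(math.log(m))              # alpha_m proportional to (log m)^(-1/2)
    for r in (0.0, 0.9):                                # r = 0 gives mu = 0, i.e. pure noise
        p = sparse_mixture_p_values(m, beta=0.6, r=r, rng=rng)
        print(r, bh_detects(p, alpha_m), higher_criticism(p))
```

Turning the higher criticism statistic into a test requires a critical value, which the slide does not give and which is therefore left out of the sketch.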
LASSO and FDR
Regression with orthogonal design: $X \sim N(\beta, I_m)$
[Bogdan et al. (2013)]: sorted $\ell_1$ penalized estimator (SLOPE)
$\hat\beta = \arg\min_{\beta \in \mathbb{R}^m} \left\{ \frac{1}{2}\|X - \beta\|^2 + \sum_{k=1}^{m} \lambda_k |\beta|_{(k)} \right\}$
where $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_m$; $|\beta|_{(1)} \ge |\beta|_{(2)} \ge \cdots \ge |\beta|_{(m)}$
Selection with $\{i : \hat\beta_i \ne 0\}$:
- $\lambda_k = \lambda = \bar\Phi^{-1}(\alpha/(2m)) \simeq \sqrt{2 \log m}$: Bonferroni
- $\lambda_k = \bar\Phi^{-1}(\alpha k/(2m)) \simeq \sqrt{2 \log(m/k)}$: BH!

Outlook
Some conclusions for FDR
⊕ Very simple
⊕ Trade-off type I / power
⊕ Adaptive to sparsity
Some issues
! Sensitive to null hypothesis
! Choosing α
! Calibrating test statistics
Main challenge
What about dependence?
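Returning to the two SLOPE weight sequences of the "LASSO and FDR" slide, they can be tabulated numerically. This sketch only computes the weights and does not solve the SLOPE optimization problem; the function names and the values `m = 1000`, `alpha = 0.1` are illustrative assumptions.

```python
import math
from statistics import NormalDist


def upper_quantile(q):
    """bar-Phi^{-1}(q): the z with P(Z >= z) = q for Z ~ N(0, 1), computed by symmetry."""
    return -NormalDist().inv_cdf(q)


def bonferroni_weights(m, alpha):
    """Constant sequence lambda_k = bar-Phi^{-1}(alpha / (2m))."""
    lam = upper_quantile(alpha / (2 * m))
    return [lam] * m


def bh_weights(m, alpha):
    """Decreasing sequence lambda_k = bar-Phi^{-1}(alpha * k / (2m)), k = 1..m."""
    return [upper_quantile(alpha * k / (2 * m)) for k in range(1, m + 1)]


if __name__ == "__main__":
    m, alpha = 1000, 0.1
    lam_bh = bh_weights(m, alpha)
    lam_bonf = bonferroni_weights(m, alpha)
    for k in (1, 10, 100):
        # Third column: the asymptotic approximation sqrt(2 log(m / k)) from the slide.
        print(k, lam_bonf[k - 1], lam_bh[k - 1], math.sqrt(2 * math.log(m / k)))
```

The BH-type sequence starts at the Bonferroni weight for $k = 1$ and then decreases, which is how the penalty relaxes as more coefficients are selected.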