False Discovery Rate
Part I: introduction and challenges

E. Roquain
Laboratory LPMA, Université Pierre et Marie Curie (Paris 6), France

Point de Vue, 3rd February 2014

1 Introduction
2 False discovery rate control
3 FDR in other statistical issues

1 Introduction

A “multiple testing joke” (http://xkcd.com)

Multiplicity problem:

    P( make at least one false discovery )  ≫  P( the i-th test is a false discovery )

A correction is needed to assess significance! (A worked example follows.)

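To quantify the multiplicity problem in the simplest case (my own worked example, assuming m independent tests; m = 20 matches the 20 jelly-bean colors tested in the xkcd strip):

    alpha, m = 0.05, 20                   # 20 independent tests, each at level 5%
    p_any_false = 1 - (1 - alpha) ** m    # P(at least one false discovery) under independence
    print(p_any_false)                    # about 0.64: an error somewhere is more likely than not,
                                          # while each single test has only a 5% false-positive chance
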
Some other examples
Paradoxes due to large scale experiments
Probable facts appear significant
Science-wise multiplicity issue - [Ioannidis (2005, PLoS Medicine)]

[Slide reproduces the opening page of the essay “Why Most Published Research Findings Are False” by John P. A. Ioannidis, with its pull quote “It can be proven that most claimed research findings are false.” The essay models the post-study probability that a claimed finding is true as the positive predictive value PPV = (1 − β)R / (R − βR + α), where R/(R + 1) is the pre-study probability of a true relationship, 1 − β the power and α the type I error rate.]

[Talk Benjamini, Southampton (2013)]; modeling:

    Try many experiments
    ⇓
    1000 pure noise + 30 perfect signal
    ⇓
    publish results with a p-value ≤ 0.05
    ⇓
    ≈ 50 false discoveries + 30 true discoveries

▸ Jager and Leek (2013). An estimate of the science-wise false discovery rate and application to the top medical literature (estimate ≈ 14%)
▸ Ioannidis (2014). Discussion: Why “An estimate of the science-wise false discovery rate and application to the top medical literature” is false

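A back-of-the-envelope check of the counts above (my own arithmetic from the slide's numbers; the resulting 62.5% is not stated in the talk and contrasts with the empirical ≈ 14% estimate of Jager and Leek cited above):

    false_disc = 1000 * 0.05          # expected false discoveries among the 1000 pure-noise experiments
    true_disc = 30                    # the 30 perfect-signal experiments all yield true discoveries
    print(false_disc, false_disc / (false_disc + true_disc))   # 50 false discoveries, i.e. FDP = 0.625
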
Multiplicity in microarrays [Hedenfalk et al. (2001)]: BRCA1 vs BRCA2

▸ expression level (activity) measured for each gene
▸ which genes are differentially activated?
▸ 1 test for each gene
▸ thousands of genes
▸ number of replications ≪ dimension
▸ correlations

Other applications

▸ Neuroimaging (fMRI): which regions are activated?
▸ Econometrics: which strategies are winning?
▸ Astronomy: which directions contain stars?

Canonical setting

▸ Xi = average of group 2 − average of group 1 (rescaled), for gene i
▸ Gaussian model:

    (X1, X2, . . . , Xm)ᵀ = µ (H1, H2, . . . , Hm)ᵀ + (ε1, ε2, . . . , εm)ᵀ,   i.e.   Xi = µHi + εi,

  with µ > 0, H ∈ {0, 1}^m (fixed) and ε ∼ N(0, Γ) (Γi,i = 1).
▸ Γ = dependence structure = Im for now

Question: for each i, is Hi = 0 or Hi = 1?

Multiple testing: favors the “0” decision. (A simulation sketch of this model follows.)

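A minimal simulation sketch of this canonical model (my own illustration, not code from the talk), with independent noise Γ = Im and the parameters of Picture 1 below:

    import numpy as np

    rng = np.random.default_rng(0)
    m, m0, mu = 100, 50, 2.0          # m0 true nulls, m - m0 signals, as in Picture 1
    H = np.zeros(m, dtype=int)        # H_i = 0 (null) or 1 (signal), fixed
    H[m0:] = 1
    eps = rng.standard_normal(m)      # epsilon ~ N(0, I_m), i.e. Gamma = I_m
    X = mu * H + eps                  # observed statistics X_i = mu * H_i + eps_i
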
Individual decision and errors

▸ Test statistic: Xi
▸ p-value: pi = Φ̄(Xi), with Φ̄(z) = P(Z ≥ z), Z ∼ N(0, 1); pi is such that
      if Hi = 0, pi ∼ U(0, 1),
      if Hi = 1, pi has c.d.f. t ↦ Φ̄(Φ̄⁻¹(t) − µ).
▸ Choose Ĥi = 1{pi ≤ t} for some threshold t
▸ Two errors:

                      Hi = 0            Hi = 1
      Ĥi = 0      true negative     false negative
      Ĥi = 1      false positive    true positive

▸ False positives are the more annoying errors

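Continuing the sketch above (still my own illustration), the p-values and the four error counts at a fixed threshold t can be computed as follows; scipy.stats.norm.sf plays the role of Φ̄:

    import numpy as np
    from scipy.stats import norm

    rng = np.random.default_rng(0)
    m, m0, mu = 100, 50, 2.0
    H = np.concatenate([np.zeros(m0, dtype=int), np.ones(m - m0, dtype=int)])
    X = mu * H + rng.standard_normal(m)

    p = norm.sf(X)                    # p_i = Phi-bar(X_i); uniform on (0, 1) when H_i = 0

    t = 0.1                           # a fixed threshold, as on the next slides
    H_hat = (p <= t).astype(int)      # decision: declare a discovery iff p_i <= t

    false_pos = np.sum((H == 0) & (H_hat == 1))
    false_neg = np.sum((H == 1) & (H_hat == 0))
    true_pos = np.sum((H == 1) & (H_hat == 1))
    true_neg = np.sum((H == 0) & (H_hat == 0))
    print(false_pos, false_neg, true_pos, true_neg)
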
Picture 1. m = 100; m0 = 50; µ = 2; Γ = Im   [plot of the p-values in [0, 1], each point labeled by its Hi (0 or 1)]

Picture 2. m = 100; m0 = 95; µ = 3; Γ = Im   [same type of plot]

Picture 3. m = 100; m0 = 50; µ = 0.01; Γ = Im   [same type of plot]

Picture 4. m = 100; m0 = 95; µ = 0.01; Γ = Im   [same type of plot]

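The exact layout of Pictures 1-4 is not recoverable from the extraction; the following sketch (my own) produces plots in the same spirit, showing each p-value labeled by its Hi for the four parameter settings listed above:

    import numpy as np
    import matplotlib.pyplot as plt
    from scipy.stats import norm

    rng = np.random.default_rng(1)

    def picture(m, m0, mu, ax):
        """Simulate the canonical model and plot the p-values, labeled by H_i."""
        H = rng.permutation(np.concatenate([np.zeros(m0, dtype=int), np.ones(m - m0, dtype=int)]))
        p = norm.sf(mu * H + rng.standard_normal(m))
        x = np.arange(1, m + 1) / m
        for lab, col in [(0, "black"), (1, "red")]:
            ax.scatter(x[H == lab], p[H == lab], marker=f"${lab}$", color=col, s=40)
        ax.set_title(f"m={m}, m0={m0}, mu={mu}")
        ax.set_xlabel("i/m")
        ax.set_ylabel("p-value")

    fig, axes = plt.subplots(2, 2, figsize=(8, 8))
    for ax, (m0, mu) in zip(axes.ravel(), [(50, 2.0), (95, 3.0), (50, 0.01), (95, 0.01)]):
        picture(100, m0, mu, ax)
    plt.tight_layout()
    plt.show()
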
Doing like for 1 test? t ≡ α = 0.1
[plots: the p-values against the constant threshold t = α; several null p-values fall below it]

Union bound (Bonferroni)? t ≡ α/m = 0.1/100
[plots: the same p-values against the much smaller constant threshold α/m]

Do something in between! t_ℓ = αℓ/m = 0.1ℓ/100
[plots: the same p-values against the increasing thresholds t_ℓ; a comparison sketch follows]

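A sketch (my own, not from the talk) comparing how many p-values pass each of the three rules on one simulated sparse data set; the “in between” rule counts how many order statistics satisfy p(ℓ) ≤ αℓ/m:

    import numpy as np
    from scipy.stats import norm

    rng = np.random.default_rng(2)
    m, m0, mu, alpha = 100, 95, 3.0, 0.1
    H = np.concatenate([np.zeros(m0, dtype=int), np.ones(m - m0, dtype=int)])
    p = norm.sf(mu * H + rng.standard_normal(m))

    ell = np.arange(1, m + 1)
    n_uncorrected = np.sum(p <= alpha)                  # t = alpha: liberal, many false positives
    n_bonferroni = np.sum(p <= alpha / m)               # t = alpha/m: conservative, few discoveries
    n_between = np.sum(np.sort(p) <= alpha * ell / m)   # order statistics below t_ell; the BH procedure
                                                        # (next slide) takes the largest such ell
    print(n_uncorrected, n_bonferroni, n_between)
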
Smart! . . . and rigorous?

2 False discovery rate control

BH procedure

p-value view:   k̂ = max{0 ≤ i ≤ m : p(i) ≤ αi/m},   t̂ = αk̂/m
c.d.f. view:    t̂ = max{t ∈ [0, 1] : Ĝm(t) ≥ t/α},   where Ĝm(t) = m⁻¹ #{i : pi ≤ t} is the empirical c.d.f. of the p-values

Ĥi = 1{pi ≤ t̂} = 1{Xi ≥ Φ̄⁻¹(t̂)}

(An implementation sketch follows.)

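A minimal implementation sketch of the BH procedure in the p-value view (my own code; the function name bh_threshold is mine):

    import numpy as np

    def bh_threshold(p, alpha):
        """Benjamini-Hochberg step-up: return (k_hat, t_hat, rejection indicator)."""
        m = len(p)
        p_sorted = np.sort(p)
        below = np.nonzero(p_sorted <= alpha * np.arange(1, m + 1) / m)[0]   # p_(i) <= alpha*i/m ?
        k_hat = below[-1] + 1 if below.size else 0
        t_hat = alpha * k_hat / m
        return k_hat, t_hat, p <= t_hat

    # usage, on p-values as in the earlier sketches:
    # k_hat, t_hat, H_hat = bh_threshold(p, alpha=0.1)
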
False discovery rate control

For a decision Ĥi = 1{pi ≤ t̂} (∀i),

    FDP(t̂) = #{i : Hi = 0, Ĥi = 1} / #{i : Ĥi = 1}    (with the convention 0/0 = 0),
    FDR(t̂) = E[ FDP(t̂) ].

Theorem [Benjamini and Hochberg (1995)], [Benjamini and Yekutieli (2001)]
If Γ = Im and t̂ is the threshold of the BH procedure, then for all µ, H,

    FDR(t̂) = (m0/m) α ≤ α.

(A Monte Carlo check of this identity follows.)

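A Monte Carlo sketch (my own) checking the identity FDR(t̂) = (m0/m)α for BH under independence, with the parameters of the simulation slide below (m = 50, m0 = 25, µ = 3, α = 0.1, so (m0/m)α = 0.05):

    import numpy as np
    from scipy.stats import norm

    rng = np.random.default_rng(3)
    m, m0, mu, alpha, n_rep = 50, 25, 3.0, 0.1, 2000
    H = np.concatenate([np.zeros(m0, dtype=int), np.ones(m - m0, dtype=int)])
    ell = np.arange(1, m + 1)

    fdp = np.empty(n_rep)
    for r in range(n_rep):
        p = norm.sf(mu * H + rng.standard_normal(m))
        below = np.nonzero(np.sort(p) <= alpha * ell / m)[0]    # BH step-up, as on the previous slide
        t_hat = alpha * (below[-1] + 1) / m if below.size else 0.0
        rejected = p <= t_hat
        n_rej = rejected.sum()
        fdp[r] = ((H == 0) & rejected).sum() / n_rej if n_rej > 0 else 0.0

    print(fdp.mean(), m0 / m * alpha)     # Monte Carlo FDR vs (m0/m) * alpha = 0.05
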
Simulations. m = 50; m0 = 25; µ = 3; Γ = Im
[repeated realizations of the BH procedure on simulated p-values; realized FDP(BH) over the displayed runs: 0, 0.16, 0.0833, 0.08, 0.12, 0.167, 0.0435, 0, 0, 0.04]

[Benjamini (2010, JRSSB)]

[Fig. 1: rapidly increasing number of citations per year (1996-2009) of Benjamini and Hochberg (1995), “Controlling the false discovery rate: a practical and powerful approach to multiple testing”, suggesting that its influence has not yet reached its peak. Table 1: the 10 most cited papers that cite Benjamini and Hochberg (1995); the list includes six statistical papers, suggesting that further theoretical and methodological developments of the method have had significant influence.]

Now > 20,000 citations on Google Scholar!

3 FDR in other statistical issues

Why should FDR thresholding be adaptive to sparsity?
[plots: the zoomed p-value pictures from the previous slides, one dense case and one sparse case, shown again with the threshold lines]

[Linnemann] - increasing signal strength
[sequence of figures with increasing signal strength, omitted]

Adaptation to unknown sparsity

t̂ seems “adaptive” to the “quantity” of signal in the data:

▸ Classification: where is the signal?
  [Bogdan et al. (2011)], [Neuvial and R. (2012)]
▸ Detection: is there some signal?
  [Ingster (2002)], [Donoho and Jin (2004)], etc.
▸ Estimation: what is the value EX of the signal?
  ÊXi = Xi 1{|Xi| ≥ t̂} (hard thresholding)
  [Abramovich et al. (2006)], [Donoho and Jin (2006)]

(A sketch of the hard-thresholding estimator follows.)

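One concrete way to instantiate the hard-thresholding estimator above (my own sketch in the spirit of Abramovich et al. (2006); their exact construction may differ in details): apply BH to two-sided p-values, to match the |Xi| in the formula, and keep only the rejected coordinates. In the one-sided setting of the earlier slides one could use pi = Φ̄(Xi) instead.

    import numpy as np
    from scipy.stats import norm

    def fdr_hard_threshold(X, alpha=0.1):
        """Keep X_i only where BH applied to the two-sided p-values rejects; zero elsewhere."""
        m = len(X)
        p = 2 * norm.sf(np.abs(X))                                  # two-sided p-values
        below = np.nonzero(np.sort(p) <= alpha * np.arange(1, m + 1) / m)[0]
        t_hat = alpha * (below[-1] + 1) / m if below.size else 0.0  # BH threshold
        return X * (p <= t_hat)                                     # estimate of EX
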
Classification

Xi ∼ π0,m N(0, 1) + (1 − π0,m) N(µm, 1), 1 ≤ i ≤ m, i.i.d.,
with π0,m → 1 (sparse) and µm → ∞ (compensating for the sparsity).

▸ training set = null distribution known (one-class classification)
▸ classification rule ĥm : R → {0, 1}
▸ risk

    Rm(ĥm) = (1 − π0,m)⁻¹ E[ m⁻¹ Σ_{i=1}^m 1{ĥm(Xi) ≠ Hi} ]

▸ Classification boundary in the (sparsity, signal) space such that
    above the boundary, ∃ ĥm : Rm(ĥm) → 0 (perfect classification);
    below the boundary, ∀ ĥm, Rm(ĥm) → 1 (unclassifiable).

Classification boundary

µm = √(2 r log m),   π0,m = 1 − m^{-β}

[figure: classification boundary in the (β, r) plane, with the “perfect classification” region above the curve and the “unclassifiable” region below]

▸ BH rule: ĥm^BH(x) = 1{x ≥ Φ̄⁻¹(t̂)}, with αm ∝ (log m)^{-1/2}
▸ Classification boundary attained by BH.
▸ On the boundary: risk of BH ∼ Bayes risk.

[Bogdan et al. (2011)], [Neuvial and R. (2012)]

Detection: is there some signal?

Same model: Xi ∼ π0,m N(0, 1) + (1 − π0,m) N(µm, 1), 1 ≤ i ≤ m, i.i.d.,
with π0,m → 1 (sparse) and µm → ∞ (compensating for the sparsity).

▸ Test H0: “N(0, Im)” against H1: “mixture”.
▸ Risk: Rm(T) = P_{H0}(T(X) = 1) + P_{H1}(T(X) = 0)
▸ Detection boundary in the (sparsity, signal) space such that
    above the boundary, ∃ T : Rm(T) → 0 (perfect detection);
    below the boundary, ∀ T, Rm(T) → 1 (undetectable).

Detection boundary

µm = √(2 r log m),   π0,m = 1 − m^{-β}

[figure: detection boundary in the (β, r) plane, with the “perfect detection” region above the curve, the “undetectable” region below, and the “perfect classification” region higher up]

▸ BH-based test: T^BH = 1{∃ i : p(i) ≤ αm i/m}, with αm ∝ (log m)^{-1/2}
▸ Detection boundary attained by BH when β ∈ (3/4, 1)
▸ Better to use “higher criticism”:

    HC = max_i √m (i/m − p(i)) / √( p(i) (1 − p(i)) )

(A sketch of both test statistics follows.)

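A sketch (my own, with hypothetical function names) of the two detection statistics built from the sorted p-values; calibrating the tests (the constant in αm ∝ (log m)^{-1/2}, or a critical value for HC) is not addressed here:

    import numpy as np

    def bh_detection_test(p, alpha_m):
        """T^BH = 1 iff some sorted p-value falls below the line alpha_m * i / m."""
        m = len(p)
        return int(np.any(np.sort(p) <= alpha_m * np.arange(1, m + 1) / m))

    def higher_criticism(p):
        """HC = max_i sqrt(m) * (i/m - p_(i)) / sqrt(p_(i) * (1 - p_(i))), for p-values in (0, 1)."""
        m = len(p)
        p_sorted = np.sort(p)
        i = np.arange(1, m + 1)
        return np.max(np.sqrt(m) * (i / m - p_sorted) / np.sqrt(p_sorted * (1 - p_sorted)))
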
LASSO and FDR

Regression with orthogonal design: X ∼ N(β, Im)

[Bogdan et al. (2013)]: sorted ℓ1 penalized estimator (SLOPE)

    β̂ = argmin_{β ∈ R^m} { (1/2) ||X − β||² + Σ_{k=1}^m λk |β|(k) },

where λ1 ≥ λ2 ≥ · · · ≥ λm and |β|(1) ≥ |β|(2) ≥ · · · ≥ |β|(m).

Selection with {i : β̂i ≠ 0}:
▸ λk = λ = Φ̄⁻¹(α/(2m)) ≃ √(2 log m)      →  Bonferroni
▸ λk = Φ̄⁻¹(αk/(2m)) ≃ √(2 log(m/k))      →  BH!

(A sketch comparing the two λ sequences follows.)

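A small sketch (my own) of the two penalty sequences, using scipy.stats.norm.isf as Φ̄⁻¹; it only computes and compares the λk's and does not solve the SLOPE optimization problem itself:

    import numpy as np
    from scipy.stats import norm

    m, alpha = 1000, 0.1
    k = np.arange(1, m + 1)

    lam_bonf = np.full(m, norm.isf(alpha / (2 * m)))   # constant lambda, Bonferroni-like
    lam_bh = norm.isf(alpha * k / (2 * m))             # decreasing lambda_k, BH-like

    print(lam_bonf[0], np.sqrt(2 * np.log(m)))         # both grow like sqrt(2 log m)
    print(lam_bh[:5])
    print(np.sqrt(2 * np.log(m / k[:5])))              # lambda_k roughly tracks sqrt(2 log(m/k))
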
Outlook

Some conclusions for FDR
⊕ Very simple
⊕ Trade-off between type I error and power
⊕ Adaptive to sparsity

Some issues
! Sensitive to the null hypothesis
! Choosing α
! Calibrating the test statistics

Main challenge
What about dependence?