
NONPARAMETRIC STATISTICAL THEORY
Part III.
Example Sheet 1 (of 3)
RJS & AKHK/Lent 2014
Comments and corrections to [email protected]
[Notation: For a square-integrable function $g : \mathbb{R} \to \mathbb{R}$, define $R(g) = \int_{-\infty}^{\infty} g(x)^2\,dx$; for a kernel $K$, define $\mu_2(K) = \int_{-\infty}^{\infty} x^2 K(x)\,dx$.]
1. Let $U_1, \dots, U_n \stackrel{\mathrm{iid}}{\sim} U(0,1)$, and let $Y_1, \dots, Y_{n+1} \stackrel{\mathrm{iid}}{\sim} \mathrm{Exp}(1)$. Writing $S_j = \sum_{i=1}^{j} Y_i$ for $j = 1, \dots, n+1$, show that
\[ U_{(j)} \stackrel{d}{=} \frac{S_j}{S_{n+1}} \sim \mathrm{Beta}(j, n-j+1), \]
for $j = 1, \dots, n$.
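[Aside, not part of the question: a minimal Monte Carlo sketch, assuming NumPy and SciPy are available, comparing the law of $U_{(j)}$, the law of $S_j/S_{n+1}$ and the $\mathrm{Beta}(j, n-j+1)$ distribution; the choices of $n$, $j$ and the number of replications are arbitrary.]
\begin{verbatim}
# Monte Carlo check: the j-th uniform order statistic, the ratio S_j/S_{n+1}
# and the Beta(j, n-j+1) law should all agree (Kolmogorov-Smirnov tests).
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, j, reps = 20, 7, 50_000

u_order = np.sort(rng.uniform(size=(reps, n)), axis=1)[:, j - 1]   # U_(j)

y = rng.exponential(size=(reps, n + 1))                            # Y_1,...,Y_{n+1}
s = np.cumsum(y, axis=1)
ratio = s[:, j - 1] / s[:, n]                                      # S_j / S_{n+1}

beta = stats.beta(j, n - j + 1)
print(stats.kstest(u_order, beta.cdf))   # should not signal a discrepancy
print(stats.kstest(ratio, beta.cdf))     # should not signal a discrepancy
\end{verbatim}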
2. (Hoeffding's inequality) (a) Let $Y$ be a random variable with mean zero and $a \le Y \le b$. Use convexity to show that for every $t \in \mathbb{R}$, we have
\[ \log E(e^{tY}) \le -\alpha u + \log(\beta + \alpha e^{u}), \]
where $u = t(b-a)$ and $\alpha = 1 - \beta = -a/(b-a)$. Using a second-order Taylor expansion about the origin, deduce that $\log E(e^{tY}) \le t^2(b-a)^2/8$.
(b) Now let $Y_1, \dots, Y_n$ be independent with $E(Y_i) = 0$ and $a_i \le Y_i \le b_i$ for $i = 1, \dots, n$. Use Markov's inequality to show that, for every $\epsilon > 0$, we have
\[ P\biggl( \Bigl| \sum_{i=1}^{n} Y_i \Bigr| > \epsilon \biggr) \le 2 \exp\biggl( -\frac{2\epsilon^2}{\sum_{i=1}^{n} (b_i - a_i)^2} \biggr). \]
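[Aside: a small numerical illustration of the bound in (b), assuming NumPy; the uniform summands on $[-1,1]$ and the values of $\epsilon$ are arbitrary choices.]
\begin{verbatim}
# Empirical tail probability of a sum of bounded, mean-zero variables versus
# the bound 2*exp(-2*eps^2 / sum_i (b_i - a_i)^2), here with Y_i ~ U[-1, 1].
import numpy as np

rng = np.random.default_rng(1)
n, reps = 50, 200_000
Y = rng.uniform(-1.0, 1.0, size=(reps, n))   # a_i = -1, b_i = 1, mean zero
S = Y.sum(axis=1)

for eps in (10.0, 14.0, 18.0):
    empirical = np.mean(np.abs(S) > eps)
    bound = 2 * np.exp(-2 * eps**2 / (n * 4.0))   # (b_i - a_i)^2 = 4 for each i
    print(f"eps={eps}: empirical={empirical:.2e}, Hoeffding bound={bound:.2e}")
\end{verbatim}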
3. Let X1 , . . . , Xn be independent with distribution P on a measurable space
$(\mathcal{X}, \mathcal{A})$, and let $\hat{P}_n$ be the empirical measure of $X_1, \dots, X_n$; thus $\hat{P}_n(A) = n^{-1} \sum_{i=1}^{n} 1_{\{X_i \in A\}}$ for $A \in \mathcal{A}$. Show that, for all $\epsilon > 0$ and $A \in \mathcal{A}$, we have
\[ P\bigl( |\hat{P}_n(A) - P(A)| > \epsilon \bigr) \le 2 e^{-2n\epsilon^2}. \]
4. (a) Let $X_1, \dots, X_n \stackrel{\mathrm{iid}}{\sim} F$, and let $\hat{F}_n$ denote their empirical distribution function. For $t_1 < \dots < t_k$, write down the distribution of
\[ n\bigl( \hat{F}_n(t_1),\, \hat{F}_n(t_2) - \hat{F}_n(t_1),\, \dots,\, \hat{F}_n(t_k) - \hat{F}_n(t_{k-1}),\, 1 - \hat{F}_n(t_k) \bigr). \]
(b) Find the asymptotic distribution of $n^{1/2}\bigl( \hat{F}_n(t_1) - F(t_1), \dots, \hat{F}_n(t_k) - F(t_k) \bigr)$.
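[Aside: a simulation sketch, assuming NumPy and SciPy, that tabulates the count vector in (a) for $F = N(0,1)$ so that its empirical mean and covariance can be compared with the distribution you write down; the choice of $F$, $n$ and the points $t_j$ is arbitrary.]
\begin{verbatim}
# Simulate the vector of cell counts
#   n*(F_n(t_1), F_n(t_2)-F_n(t_1), ..., 1-F_n(t_k))
# and summarise it, for comparison with the distribution found in (a).
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n, reps = 100, 20_000
t = np.array([-1.0, 0.0, 1.5])                     # t_1 < t_2 < t_3, with F = N(0,1)
edges = np.concatenate(([-np.inf], t, [np.inf]))

X = rng.standard_normal(size=(reps, n))
counts = np.stack([((X > lo) & (X <= hi)).sum(axis=1)
                   for lo, hi in zip(edges[:-1], edges[1:])], axis=1)

p = np.diff(stats.norm.cdf(edges))                 # cell probabilities under F
print("empirical mean of counts :", counts.mean(axis=0))
print("n * cell probabilities   :", n * p)
print("empirical covariance of counts:\n", np.round(np.cov(counts.T), 2))
\end{verbatim}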
5. (Continuation) We say a continuous process $(B_t)_{t \in [0,1]}$ is a standard Brownian motion on $[0,1]$ if $B_0 = 0$, and if, for $0 \le s_1 \le t_1 \le \dots \le s_k \le t_k \le 1$, we have $(B_{t_1} - B_{s_1}, \dots, B_{t_k} - B_{s_k}) \sim N_k(0, \Sigma)$, where $\Sigma := \mathrm{diag}(t_1 - s_1, \dots, t_k - s_k)$. The process $(W_t)_{t \in [0,1]}$ defined by $W_t = B_t - tB_1$ is called a Brownian bridge, or tied-down Brownian motion, because $W_0 = W_1 = 0$. Compute the distribution of $(W_{t_1}, \dots, W_{t_k})$.
[These last two questions suggest that ``$n^{1/2}\bigl( \hat{F}_n(t) - F(t) \bigr) \stackrel{d}{\to} W_{F(t)}$ as $n \to \infty$''. Care is required to make this statement and its proof precise.]
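[Aside: a simulation sketch, assuming NumPy, that generates approximate Brownian bridge paths on a grid and compares their empirical covariance with $\min(s,t) - st$, the standard Brownian-bridge covariance that the computation above should produce; the grid size and time points are arbitrary.]
\begin{verbatim}
# Simulate W_t = B_t - t*B_1 on a fine grid and compare the empirical covariance
# at chosen time points with the Brownian-bridge covariance min(s,t) - s*t.
import numpy as np

rng = np.random.default_rng(3)
m, reps = 500, 20_000
grid = np.linspace(0, 1, m + 1)

dB = rng.standard_normal(size=(reps, m)) * np.sqrt(1.0 / m)   # independent increments
B = np.concatenate([np.zeros((reps, 1)), np.cumsum(dB, axis=1)], axis=1)
W = B - grid * B[:, [-1]]                                     # tie down at t = 1

ts = np.array([0.2, 0.5, 0.8])
idx = (ts * m).astype(int)
print(np.round(np.cov(W[:, idx].T), 3))                       # empirical covariance
print(np.round(np.minimum.outer(ts, ts) - np.outer(ts, ts), 3))
\end{verbatim}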
6. (a) Verify the algebraic identity
\[ \phi_{\sigma}(x - \mu)\,\phi_{\sigma'}(x - \mu') = \phi_{\sigma\sigma'/(\sigma^2 + \sigma'^2)^{1/2}}(x - \mu^*)\,\phi_{(\sigma^2 + \sigma'^2)^{1/2}}(\mu - \mu'), \]
where $\mu^* = (\sigma'^2 \mu + \sigma^2 \mu')/(\sigma^2 + \sigma'^2)$, and $\phi_{\sigma}(x)$ is the $N(0, \sigma^2)$ density.
(b) Let $X_1, \dots, X_n$ be independent $N(0, \sigma^2)$ random variables. Taking $K$ to be the $N(0,1)$ density, show that the mean integrated squared error of the kernel density estimate $\hat{f}_h$ with kernel $K$ and bandwidth $h$ can be expressed exactly as
\[ \mathrm{MISE}(\hat{f}_h) = \frac{1}{2\pi^{1/2}} \biggl\{ \frac{1}{nh} + \Bigl( 1 - \frac{1}{n} \Bigr) \frac{1}{(h^2 + \sigma^2)^{1/2}} - \frac{2^{3/2}}{(h^2 + 2\sigma^2)^{1/2}} + \frac{1}{\sigma} \biggr\}. \]
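[Aside: a Monte Carlo check of the exact expression above, assuming NumPy and SciPy; the values of $n$, $\sigma$, $h$ and the number of replications are arbitrary, and the integral is approximated by a Riemann sum on a grid.]
\begin{verbatim}
# Monte Carlo estimate of MISE for a Gaussian-kernel KDE with N(0, sigma^2) data,
# compared against the closed-form expression displayed above.
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
n, sigma, h, reps = 50, 1.0, 0.4, 400
grid = np.linspace(-6, 6, 2001)
dx = grid[1] - grid[0]
f = stats.norm.pdf(grid, scale=sigma)

ise = []
for _ in range(reps):
    x = rng.normal(scale=sigma, size=n)
    fhat = stats.norm.pdf(grid[:, None], loc=x[None, :], scale=h).mean(axis=1)
    ise.append(np.sum((fhat - f) ** 2) * dx)          # integrated squared error

exact = (1 / (2 * np.sqrt(np.pi))) * (1 / (n * h)
        + (1 - 1 / n) / np.sqrt(h**2 + sigma**2)
        - 2**1.5 / np.sqrt(h**2 + 2 * sigma**2) + 1 / sigma)
print("Monte Carlo MISE:", np.mean(ise), " exact formula:", exact)
\end{verbatim}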
7. (Continuation) Now suppose that $h = h_n$ satisfies $h \to 0$ as $n \to \infty$ and $nh \to \infty$ as $n \to \infty$. Derive an appropriate asymptotic expansion of the MISE computed above, and deduce that the asymptotically optimal bandwidth with respect to the MISE criterion is given by
\[ h_{\mathrm{AMISE}} = \Bigl( \frac{4}{3n} \Bigr)^{1/5} \sigma. \]
Check that the same expression is obtained from the general formula for the asymptotically optimal bandwidth for a second-order kernel.
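[Aside: a numerical sketch, assuming NumPy and SciPy, that minimises the exact MISE of Question 6 over $h$ and compares the minimiser with $h_{\mathrm{AMISE}}$ for increasing $n$.]
\begin{verbatim}
# Minimise the exact MISE of Question 6 over h numerically and compare the
# minimiser with h_AMISE = (4/(3n))^(1/5) * sigma as n grows.
import numpy as np
from scipy.optimize import minimize_scalar

def exact_mise(h, n, sigma):
    return (1 / (2 * np.sqrt(np.pi))) * (1 / (n * h)
            + (1 - 1 / n) / np.sqrt(h**2 + sigma**2)
            - 2**1.5 / np.sqrt(h**2 + 2 * sigma**2) + 1 / sigma)

sigma = 1.0
for n in (50, 500, 5_000, 50_000):
    h_star = minimize_scalar(exact_mise, bounds=(1e-3, 2.0),
                             args=(n, sigma), method="bounded").x
    h_amise = (4 / (3 * n)) ** 0.2 * sigma
    print(n, round(h_star, 4), round(h_amise, 4))
\end{verbatim}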
8. Let $X_1, \dots, X_n \stackrel{\mathrm{iid}}{\sim} f$, where $f''$ is bounded. Write $\tilde{f}_b$ for the histogram estimator of $f$ with binwidth $b$. Assume $b = b_n \to 0$ and $nb \to \infty$ as $n \to \infty$. For $x \in \mathbb{R}$, let $I_b(x)$ denote the bin containing $x$ and $p_b(x) = P\bigl( X_1 \in I_b(x) \bigr)$ denote the bin probability.
Show that
\[ p_b(x) = b f(x) + \tfrac{1}{2} f'(x)\bigl[ b^2 - 2b\{x - t_b(x)\} \bigr] + O(b^3) \]
as $n \to \infty$, where $t_b(x)$ is the left-hand endpoint of $I_b(x)$. Deduce that
\[ \mathrm{MSE}\{\tilde{f}_b(x)\} = \frac{f(x)}{nb} + \frac{1}{4} b^2 f'(x)^2 + f'(x)^2 \{x - t_b(x)\}^2 - b f'(x)^2 \{x - t_b(x)\} + O\Bigl( \frac{1}{n} + b^3 \Bigr). \]
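[Aside: a quick numerical check of the expansion of $p_b(x)$, assuming NumPy and SciPy, taking $f$ to be the standard normal density and bins of the form $[kb, (k+1)b)$; the point $x$ and the binwidths are arbitrary.]
\begin{verbatim}
# Check the bin-probability expansion for f the standard normal density, with
# bins [kb, (k+1)b): the error of the two-term approximation should be O(b^3).
import numpy as np
from scipy import stats

f = stats.norm.pdf
fprime = lambda x: -x * stats.norm.pdf(x)   # derivative of the N(0,1) density

x = 0.37
for b in (0.2, 0.1, 0.05, 0.025):
    t = b * np.floor(x / b)                            # left endpoint t_b(x)
    exact = stats.norm.cdf(t + b) - stats.norm.cdf(t)  # p_b(x)
    approx = b * f(x) + 0.5 * fprime(x) * (b**2 - 2 * b * (x - t))
    print(b, exact - approx)                           # shrinks roughly like b^3
\end{verbatim}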
9. (Continuation) Assuming in addition that $R(f') < \infty$, argue informally that
\[ \mathrm{MISE}(\tilde{f}_b) = \frac{1}{nb} + \frac{1}{12} b^2 R(f') + o\Bigl( \frac{1}{nb} + b^2 \Bigr). \]
Hence derive the AMISE-optimal binwidth $b_{\mathrm{AMISE}}$ and find $\mathrm{AMISE}(\tilde{f}_{b_{\mathrm{AMISE}}})$.
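[Aside: a sketch, assuming NumPy and SciPy, that minimises the displayed leading terms $1/(nb) + b^2 R(f')/12$ over $b$ numerically, for comparison with the $b_{\mathrm{AMISE}}$ and $\mathrm{AMISE}$ value derived by hand; here $R(f')$ is taken for the standard normal density.]
\begin{verbatim}
# Minimise the two leading terms 1/(nb) + b^2 R(f')/12 over b numerically, for
# comparison with the b_AMISE and AMISE value derived by hand.
import numpy as np
from scipy.optimize import minimize_scalar

def leading_terms(b, n, Rfprime):
    return 1 / (n * b) + b**2 * Rfprime / 12

n = 1_000
Rfprime = 1 / (4 * np.sqrt(np.pi))   # R(f') when f is the standard normal density
res = minimize_scalar(leading_terms, bounds=(1e-4, 5.0),
                      args=(n, Rfprime), method="bounded")
print("numerical binwidth:", res.x, " value at minimum:", res.fun)
\end{verbatim}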
10. (Scheffé's theorem) Let $(f_n)$ be a sequence of densities and $f$ be another density such that $f_n \to f$ almost everywhere. By integrating $g_n = f - f_n$ separately over $\{x : g_n(x) > 0\}$ and $\{x : g_n(x) \le 0\}$ and using dominated convergence, show that
\[ \int_{-\infty}^{\infty} |f_n(x) - f(x)|\,dx \to 0. \]
11. Assume the standard conditions on $f$, $h$ and $K$ from lectures, and also that $f''$ is continuous with $R(f'') < \infty$. Use Fubini's theorem to show that $h \int_{-\infty}^{\infty} (K_h^2 * f)(x)\,dx = R(K)$.
Use the dominated convergence theorem to show that $(K_h * f)(x) \to f(x)$ for each $x \in \mathbb{R}$, and show that $\sup_{n \in \mathbb{N}} \sup_{x \in \mathbb{R}} (K_h * f)(x) < \infty$. Apply Scheffé's theorem to deduce that $\int_{-\infty}^{\infty} (K_h * f)^2(x)\,dx \to \int_{-\infty}^{\infty} f(x)^2\,dx$.
Finally, deduce that
\[ \int_{-\infty}^{\infty} \mathrm{Var}\{\hat{f}_h(x)\}\,dx = \frac{1}{nh} R(K) + O(n^{-1}). \]
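[Aside: a Riemann-sum check of the two integral facts above, assuming NumPy and SciPy, with $K$ the standard normal kernel and $f$ a normal density; the grid and the bandwidths are arbitrary.]
\begin{verbatim}
# Riemann-sum check, for the standard normal kernel and f the N(0, 1.5^2) density:
# h * int (K_h^2 * f) dx equals R(K), and int (K_h * f)^2 dx tends to R(f).
import numpy as np
from scipy import stats

grid = np.linspace(-10, 10, 8001)
dx = grid[1] - grid[0]
sigma = 1.5
f = stats.norm.pdf(grid, scale=sigma)
R_K = 1 / (2 * np.sqrt(np.pi))              # R(K), standard normal kernel
R_f = 1 / (2 * sigma * np.sqrt(np.pi))      # R(f), N(0, sigma^2) density

for h in (0.5, 0.1, 0.05):
    Kh = stats.norm.pdf(grid, scale=h)
    Kh2_f = np.convolve(Kh**2, f) * dx      # (K_h^2 * f) on an extended grid
    Kh_f = np.convolve(Kh, f) * dx          # (K_h * f)  on an extended grid
    print(h, h * np.sum(Kh2_f) * dx, R_K,   # first fact
          np.sum(Kh_f**2) * dx, R_f)        # second fact (as h -> 0)
\end{verbatim}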
12. (Continuation) Show that $\int_{-\infty}^{\infty} \bigl[ E\{\hat{f}_h(x)\} - f(x) \bigr]^2\,dx = h^4 \int_{-\infty}^{\infty} A_n^2(x)\,dx$, where
\[ A_n(x) = \int_{-\infty}^{\infty} \int_0^1 (1 - t)\, f''(x - thz)\, z^2 K(z)\, dt\, dz. \]
Apply Cauchy-Schwarz twice, firstly to the innermost integral with $(1-t)^{1/2} |z| K^{1/2}(z)$ as one term of the product, and secondly to the middle integral, and then use Fubini's theorem to evaluate the $x$-integral first, to show that
\[ \int_{-\infty}^{\infty} A_n^2(x)\,dx \le \frac{1}{4} R(f'')\,\mu_2^2(K) \]
for all $n$. Use dominated convergence to show that $A_n(x) \to \frac{1}{2} f''(x)\,\mu_2(K)$ for each $x \in \mathbb{R}$. Apply Fatou's lemma and combine the previous results to conclude that
\[ \mathrm{MISE}(\hat{f}_h) = \frac{1}{nh} R(K) + \frac{1}{4} h^4 R(f'')\,\mu_2^2(K) + o\Bigl( \frac{1}{nh} + h^4 \Bigr). \]
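[Aside: a numerical sketch, assuming NumPy, comparing the two leading terms of this expansion with the exact MISE of Question 6 for the Gaussian kernel, $N(0,1)$ data and $h = h_{\mathrm{AMISE}}$; $R(f'')$ is computed by a Riemann sum, and the agreement improves only slowly with $n$ since the remainder is roughly of relative order $n^{-1/5}$ at this bandwidth.]
\begin{verbatim}
# Compare the two leading terms R(K)/(nh) + h^4 R(f'') mu_2(K)^2 / 4 with the
# exact MISE of Question 6, for the Gaussian kernel, N(0,1) data and h = h_AMISE.
import numpy as np

sigma = 1.0
grid = np.linspace(-8, 8, 4001)
dx = grid[1] - grid[0]
phi = np.exp(-grid**2 / (2 * sigma**2)) / (sigma * np.sqrt(2 * np.pi))
f2 = (grid**2 - sigma**2) / sigma**4 * phi      # f'' for the N(0, sigma^2) density
R_f2 = np.sum(f2**2) * dx
R_K, mu2_K = 1 / (2 * np.sqrt(np.pi)), 1.0      # standard normal kernel

def exact_mise(h, n):
    return (1 / (2 * np.sqrt(np.pi))) * (1 / (n * h)
            + (1 - 1 / n) / np.sqrt(h**2 + sigma**2)
            - 2**1.5 / np.sqrt(h**2 + 2 * sigma**2) + 1 / sigma)

for n in (100, 10_000, 1_000_000):
    h = (4 / (3 * n)) ** 0.2 * sigma
    approx = R_K / (n * h) + 0.25 * h**4 * R_f2 * mu2_K**2
    print(n, exact_mise(h, n), approx)          # ratio drifts towards 1 as n grows
\end{verbatim}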