Chebyshev Inequality with estimated Mean and Variance

Chebyshev Inequality
with estimated Mean and
Variance
John G. Saw, Mark C.K. Yang, Tse Chin Mo
The American Statistician, 1984
Coffeetalk, 29 Jan 2014
woensdag 29 januari 2014
Chebyshev Inequality
• The inequality
1
P [|X − µ| > λσ] ≤ 2
λ
• Holds for finite variances
• Useful to check that values far away from the mean are
very unlikely:
pX
µ
1
λ2
λσ
2
woensdag 29 januari 2014
Chebyshev Ineq. on estimated vars
• For Chebychev
1
P [|X − µ| > λσ] ≤ 2
λ
you need the true µ and σ .
¯ and Q (estimated on n points),
• If you have estimates X
the paper shows:
2
2
g
(nλ
/(n
−
1
+
λ
))
n+1
¯ > λQ] ≤
P [|X − X|
n+1
• Uhm, what is gn+1 (.) ?
3
woensdag 29 januari 2014
The function g_m
• Define integer n ≥ 2 , real value x ≥ 1
• Define integer m = n + 1
• Define integer ν = �m/x�


if ν is even
ν
2
gm (x) = ν
if ν is odd and x < a


2
ν − 1 if ν is odd and x ≥ a
• where:
m(m − ν)
a =
1 + ν(m − ν)
2
• Huh?!?!?!
4
woensdag 29 januari 2014
The function g_m
• Ok, just show it, for n=10, m=11:
g
11
10
9
8
g11 (x)
7
gm
6
5
4
3
2
1
0
0
5
10
x
x
15
5
woensdag 29 januari 2014
And the inequality...
• And we used it in:
2
2
g
(nλ
/(n
−
1
+
λ
))
n+1
¯ > λQ] ≤
P [|X − X|
n+1
bound on P(|.|>h Q) for n=10
1
0.9
• Gives
(n=10):
0.8
bound on P(|.|>h Q)
0.7
0.6
0.5
0.4
P [|...| > λQ]
0.3
0.2
0.1
0
1
1.5
2
2.5
3
h
3.5
4
4.5
5
λ
6
woensdag 29 januari 2014
And the inequality...
• And we used it in:
2
2
g
(nλ
/(n
−
1
+
λ
))
n+1
¯ > λQ] ≤
P [|X − X|
n+1
bound on P(|.|>h Q) for n=10
1
0.9
• Gives
(n=10):
0.8
bound on P(|.|>h Q)
0.7
0.6
the prob.
does not go
to 0...
0.5
0.4
P [|...| > λQ]
0.3
0.2
0.1
0
1
1.5
2
2.5
3
h
3.5
4
4.5
5
λ
6
woensdag 29 januari 2014
But for larger and larger n...
• For larger n it becomes standard Chebyshev:
bound on P(|.|>h Q) for n=100
bound on P(|.|>h Q) for n=1000
1
1
0.9
0.9
n = 100
0.8
0.7
bound on P(|.|>h Q)
bound on P(|.|>h Q)
0.7
0.6
0.5
0.4
0.6
0.5
0.4
0.3
0.3
0.2
0.2
0.1
0.1
0
n = 1000
0.8
1
1.5
2
2.5
3
h
3.5
4
4.5
5
0
1
1.5
2
2.5
3
h
3.5
4
4.5
5
7
woensdag 29 januari 2014
Limit for large n...
• If you ignore some details (like rounding, and even/odd
distinction) we sort-of have:
n+1
gn+1 (x) ≈
x
gn+1 (nλ2 /(n − 1 + λ2 ))
1
n+1
≈
• So:
n+1
n + 1 nλ2 /(n − 1 + λ2 )
n − 1 + λ2
n−1 1
1
=
=
+
2
2
nλ
n λ
n
• Hurray:
1
n−1 1
1
+
lim (
)
=
n→∞
n λ2
n
λ2
8
woensdag 29 januari 2014
Chebyshev on training sample
• It is possible!
2
2
gn+1 (nλ /(n − 1 + λ ))
¯
P [|X − X| > λQ] ≤
n+1
• The proof is quite hard (but only one page long...)
• The probability for λ → ∞ does not go to zero!
• I wanted to use to find out:
how many samples n do I need to estimate an average
{error, 1-NN distance, ...}?
(But apparently
9
woensdag 29 januari 2014