Problem Set 2

Fall 2014 Randomized Algorithms
Deadline: Oct 22, 2014
Prof. Friedrich Eisenbrand
Assistant: Chidambaram
1. Let X be a random variable with mean µ and variance σ². The statement
P[|X − µ| ≥ a] ≤ σ²/a² is called Chebyshev's inequality and is easily
derived from Markov's inequality. Prove the "one-sided" version of it, i.e.,
for a > 0,

    P[X ≥ µ + a] ≤ σ²/(σ² + a²).

[4 pts]
Solution: Let Y = X − µ and t > 0. Then P[X ≥ µ + a] = P[Y + t ≥ a + t] ≤
P[(Y + t)² ≥ (a + t)²]. The last expression equals P[(Y + t)²/(a + t)² ≥ 1],
to which we apply Markov's inequality:

    P[X ≥ µ + a] ≤ E[(Y + t)²]/(a + t)² = (σ² + t²)/(a + t)²,

since E[Y] = 0 and E[Y²] = σ². Setting t = σ²/a (which minimizes the
right-hand side) gives (σ² + σ⁴/a²)/(a + σ²/a)² = σ²/(σ² + a²), the desired
result.
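As a quick numerical sanity check (not part of the graded solution; the
exponential test distribution below is an arbitrary choice), one can compare
the empirical tail against the bound:

```python
import numpy as np

# Test case: exponential X with mean 1 and variance 1 (an arbitrary choice);
# compare the empirical tail P[X >= mu + a] with sigma^2/(sigma^2 + a^2).
rng = np.random.default_rng(0)
x = rng.exponential(scale=1.0, size=1_000_000)
mu, var = 1.0, 1.0
for a in (0.5, 1.0, 2.0):
    empirical = np.mean(x >= mu + a)
    bound = var / (var + a**2)
    print(f"a = {a}: P[X >= mu + a] ~= {empirical:.4f} <= {bound:.4f}")
```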
2. Consider a collection X1, . . . , Xn of n independent integers chosen
uniformly from the set {0, 1, 2}. Let X = Σ_{i=1}^{n} Xi and 0 < δ < 1.
Derive a Chernoff bound for P[X ≥ (1 + δ)n] and P[X ≤ (1 − δ)n].

[4 pts]
Solution: As in the previous exercise, define the zero-mean random variables
Yi = Xi − E[Xi] = Xi − 1 and let Y = Σ_i Yi. Now, P[X ≥ (1 + δ)n] =
P[Y ≥ δn] = P[e^{tY} ≥ e^{tδn}] for any t > 0. Using Markov's inequality and
the independence of the Yi,

    P[e^{tY} ≥ e^{tδn}] ≤ E[e^{tY}]/e^{tδn} = e^{−tδn} · Π_i E[e^{tYi}] = e^{−tδn} · Π_i (1/3)(e^{−t} + e⁰ + e^t).

We now make use of the inequality cosh(t) = (e^{−t} + e^t)/2 ≤ e^{t²/2}
(one way to see it is by comparing the series expansions of e^{x²/2} and
cosh(x) term by term, using (2k)! ≥ 2^k k!). Thus,

    (1/3)(e^{−t} + e⁰ + e^t) ≤ (1/3)(1 + 2e^{t²/2}) ≤ e^{t²/2},

where the last step uses 1 ≤ e^{t²/2}. Putting things together we have

    P[X ≥ (1 + δ)n] ≤ e^{−tδn} · Π_i e^{t²/2} = e^{−tδn + t²n/2} = e^{−δ²n/2},

where we have made the choice t = δ to maximize the absolute value of the
(negative) term in the exponent. The Chernoff bound for P[X ≤ (1 − δ)n] is
identical: apply the same argument to −Y, which has the same distribution
as Y.
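One can check the bound empirically; in the sketch below (an illustration
only, with n and δ chosen arbitrarily) the empirical tail sits below
e^{−δ²n/2}:

```python
import numpy as np

# X = sum of n integers drawn uniformly from {0, 1, 2}; compare the
# empirical upper tail with the derived bound exp(-delta^2 * n / 2).
rng = np.random.default_rng(1)
n, delta, trials = 100, 0.15, 100_000
X = rng.integers(0, 3, size=(trials, n)).sum(axis=1)
empirical = np.mean(X >= (1 + delta) * n)
bound = np.exp(-delta**2 * n / 2)
print(f"empirical tail {empirical:.4f} <= bound {bound:.4f}")
```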
3. We can often make better estimates of the expected value of a random
variable when we have access to higher-moment information. In the first
part of the exercise we will obtain a lower bound on the expected absolute
value of any random variable using its second and fourth moments. In the
second part we will use this to prove a discrepancy result.
(a) Using the fact that for all x ≥ 0 and q > 0,

        x ≥ (3√3)/(2√q) · (x² − x⁴/q),

    prove that for any random variable S,

        E[|S|] ≥ (E[S²])^{3/2} / (E[S⁴])^{1/2}.

(b) Let X1, . . . , Xn be a collection of 4-wise independent random variables
    taking −1 or 1 with equal probability. Let ε1, . . . , εn ∈ R. Show that
    if S = ε1 X1 + · · · + εn Xn then

        √((ε1² + · · · + εn²)/3) ≤ E[|S|] ≤ √(ε1² + · · · + εn²).
[6 pts]
Solution: Applying the given inequality pointwise with x = |S| and taking
expectations (using linearity and |S|² = S², |S|⁴ = S⁴) we have

    E[|S|] ≥ (3√3)/(2√q) · (E[S²] − E[S⁴]/q).

Substituting q = 3E[S⁴]/E[S²] we obtain the result of the first part.
For the second part, observe that E[S²] = E[Σ_{i,j} εi εj Xi Xj] = Σ_i εi²,
since E[Xi Xj] = 0 when i ≠ j and 1 when i = j. Similarly, E[S⁴] =
E[Σ_{i,j,k,l} εi εj εk εl Xi Xj Xk Xl], where only the monomials of the form
Xi⁴ and Xi² Xj² for i < j contribute to the expectation (4-wise independence
suffices for this). Thus,

    E[S⁴] = E[Σ_i εi⁴ Xi⁴ + 6 Σ_{i<j} εi² εj² Xi² Xj²] = Σ_i εi⁴ + 6 Σ_{i<j} εi² εj²,

where 6 = 4!/(2!·2!) counts the orderings producing each monomial Xi² Xj².
The above can be upper bounded by 3(Σ_i εi²)², since 3(Σ_i εi²)² =
3 Σ_i εi⁴ + 6 Σ_{i<j} εi² εj². Using the inequality from the first part we
can therefore prove

    E[|S|] ≥ (E[S²])^{3/2} / (E[S⁴])^{1/2} ≥ (Σ_i εi²)^{3/2} / (√3 · Σ_i εi²) = √((ε1² + · · · + εn²)/3).
The upper bound on E[|S|] follows from the fact that E[|S|]² ≤ E[S²]
(Jensen's inequality), together with E[S²] = ε1² + · · · + εn².
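Both bounds are easy to check numerically. The sketch below (an illustration
only; the weights εi are arbitrary, and fully independent signs are used,
which are in particular 4-wise independent) estimates E[|S|] by simulation:

```python
import numpy as np

# S = eps_1*X_1 + ... + eps_n*X_n with independent +-1 signs;
# compare the empirical E|S| with the two bounds of part (b).
rng = np.random.default_rng(2)
eps = np.array([0.5, 1.0, 2.0, 3.0])   # arbitrary weights
X = rng.choice([-1, 1], size=(500_000, eps.size))
S = X @ eps
lower = np.sqrt(np.sum(eps**2) / 3)
upper = np.sqrt(np.sum(eps**2))
print(f"{lower:.3f} <= E|S| ~= {np.abs(S).mean():.3f} <= {upper:.3f}")
```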
4. Suppose that we can obtain independent samples X1, X2, . . . of a random
variable X and that we want to use these samples to estimate E[X]. Using
t samples, we use Σ_{i=1}^{t} Xi/t as our estimate of E[X]. We want the
estimate to be within εE[X] of the true value of E[X] with probability at
least 1 − δ. We may not be able to use Chernoff's bound directly to bound
how good our estimate is if X is not a 0–1 random variable (as we saw in
the lecture), and we do not know its moment generating function. We develop
an alternative approach that requires only a bound on the variance of X.
Let r = √(Var[X])/E[X].

(a) Show using Chebyshev's inequality that O(r²/ε²δ) samples are sufficient
    to solve the problem.

(b) Suppose that we need only a weak estimate that is within εE[X] of E[X]
    with probability at least 3/4. Argue that O(r²/ε²) samples are enough
    for this weak estimate.

(c) Show that, by taking the median of O(log(1/δ)) weak estimates, we can
    obtain an estimate within εE[X] of E[X] with probability at least
    1 − δ. Conclude that O(r² log(1/δ)/ε²) samples suffice in total.

[6 pts]
Solution: Define µ := E[X] and σ² := Var[X], so that E[X²] = σ² + µ².
Let Y = (Σ_i Xi)/n be the random variable corresponding to our estimate
from n samples. Chebyshev's inequality states that

    P[|Y − µ| ≥ εµ] ≤ E[(Y − µ)²]/(ε²µ²).

Now E[(Y − µ)²] = E[Y²] − E[Y]² = (1/n²)E[(Σ_i Xi)²] − µ². Expanding the
first term,

    E[(Σ_i Xi)²] = E[Σ_i Xi² + 2 Σ_{i<j} Xi Xj] = n(σ² + µ²) + n(n − 1)µ².

Therefore,

    E[(Y − µ)²] = (σ² + µ²)/n + ((n − 1)/n)µ² − µ² < (σ² + µ²)/n.

By Chebyshev's inequality, P[|Y − µ| ≥ εµ] < δ when

    n ≥ (1 + r²)/(ε²δ).
Clearly, the solution to the second part is obtained by simply substituting
δ = 1/4 in the above expression.

Now there are two ways to solve the final part. One way is to use a
Chernoff bound to argue that with high probability at least half of the
weak estimates are going to be in the correct range, so that the median is
only εµ away from the mean µ.

Another way is the following: take Z1, . . . , Zn to be 7/8-weak estimates
(we are still using only O(r²/ε²) samples for each of these slightly
improved weak estimates). That is, for each Zi, P[|Zi − µ| ≥ εµ] ≤ 1/8. Let
Z be the median of Z1, . . . , Zn. Now P[|Z − µ| ≥ εµ] is at most the
probability that some n/2 weak estimates are outside the range
[µ − εµ, µ + εµ]. This quantity is at most

    (n choose n/2) · (1/8)^{n/2} < 2^n · (1/8)^{n/2} = (1/2)^{n/2},

which is less than δ for n > 2 log₂(1/δ).
Note: In the first draft of solutions I missed the additional
(n choose n/2) factor in the second method. Thanks to Alfonso for catching
this error.
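The two-level scheme is short to implement. The sketch below (an
illustration only; the function name, the constants, and the exponential
test distribution are choices made here, not prescribed by the exercise)
follows the second method with 7/8-weak estimates:

```python
import numpy as np

# Median-of-means: each weak estimate averages ~8(1+r^2)/eps^2 samples
# (part (a) with delta = 1/8), and we take the median of ~2*log2(1/delta)
# weak estimates, per the second method above.
def median_of_means(sample, eps, delta, r, rng):
    m = int(np.ceil(8 * (1 + r**2) / eps**2))   # samples per weak estimate
    k = int(np.ceil(2 * np.log2(1 / delta)))    # number of weak estimates
    weak = [np.mean([sample(rng) for _ in range(m)]) for _ in range(k)]
    return np.median(weak)

# Example: exponential X with mean 5 has r = sigma/mu = 1.
rng = np.random.default_rng(3)
est = median_of_means(lambda g: g.exponential(5.0),
                      eps=0.1, delta=0.01, r=1.0, rng=rng)
print(f"estimate of E[X] = 5: {est:.3f}")
```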
5. Consider a set S of n points on a line. Pick a point uniformly at random
from S. Repeat this procedure a total of r times and let R be the set of
points that were picked at least once. Show that with constant probability
every interval of points in S defined by consecutive points in R is of size
at most O(n log r/r).

[5 pts]
Solution: Split the set S into intervals I1, . . . , It of size cn log r/r
each, so that the number of such intervals is t = r/(c log r). The
probability that in all r trials we never pick from some fixed interval Ij
is

    (1 − c log r/r)^r ≤ e^{−c log r} = 1/r^c.

By the union bound, the probability that at least one of the intervals
I1, . . . , It contains no point of R is at most t/r^c =
1/(c r^{c−1} log r) ≤ 1/c. This implies that with probability at least
1 − 1/c all intervals of S defined by consecutive points of R have size at
most 2cn log r/r, since any interval of more than 2cn log r/r consecutive
points of S entirely contains some block Ij, which would then contain no
point of R.
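A quick simulation (an illustration only; identifying S with
{0, . . . , n − 1} and the values of n and r are arbitrary choices) suggests
the largest gap indeed stays around n log r/r:

```python
import numpy as np

# Identify S with {0, ..., n-1}, draw r uniform picks, and measure the
# largest gap between consecutive picked points (plus the two boundary
# gaps), comparing against n*log(r)/r.
rng = np.random.default_rng(4)
n, r = 100_000, 1_000
picks = np.unique(rng.integers(0, n, size=r))          # the set R, sorted
gaps = np.diff(np.concatenate(([0], picks, [n - 1])))  # interval sizes
print(f"max gap {gaps.max()} vs n*log(r)/r = {n * np.log(r) / r:.0f}")
```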