Fall 2014
Randomized Algorithms, Problem Set 2
Prof. Friedrich Eisenbrand
Assistant: Chidambaram
Deadline: Oct 22, 2014

1. Let $X$ be a random variable with mean $\mu$ and variance $\sigma^2$. The statement $P[|X - \mu| \ge a] \le \sigma^2/a^2$ is called Chebyshev's inequality and is easily derived from Markov's inequality. Prove the "one-sided" version of it, i.e.,

$$P[X \ge \mu + a] \le \frac{\sigma^2}{\sigma^2 + a^2}.$$

[4 pts]

Solution: Let $Y = X - \mu$ and $t > 0$. Then

$$P[X \ge \mu + a] = P[Y + t \ge a + t] \le P[(Y + t)^2 \ge (a + t)^2].$$

The last expression is equal to $P\big[(Y + t)^2/(a + t)^2 \ge 1\big]$, to which we apply Markov's inequality. Using $E[Y] = 0$ and $E[Y^2] = \sigma^2$, this gives

$$P[X \ge \mu + a] \le \frac{E[(Y + t)^2]}{(a + t)^2} = \frac{\sigma^2 + t^2}{(a + t)^2}.$$

Setting $t = \sigma^2/a$, which minimizes the right-hand side, yields the claimed bound $\sigma^2/(\sigma^2 + a^2)$. (A numerical sanity check appears after Problem 3.)

2. Consider a collection $X_1, \dots, X_n$ of $n$ independent integers chosen uniformly from the set $\{0, 1, 2\}$. Let $X = \sum_{i=1}^n X_i$ and $0 < \delta < 1$. Derive a Chernoff bound for $P[X \ge (1+\delta)n]$ and $P[X \le (1-\delta)n]$. [4 pts]

Solution: As in the previous exercise, define the zero-mean random variables $Y_i = X_i - E[X_i] = X_i - 1$ and let $Y = \sum_i Y_i$. Now, for any $t > 0$,

$$P[X \ge (1+\delta)n] = P[Y \ge \delta n] = P[e^{tY} \ge e^{t\delta n}].$$

Using Markov's inequality and the independence of the $Y_i$,

$$P[e^{tY} \ge e^{t\delta n}] \le e^{-t\delta n}\,E[e^{tY}] = e^{-t\delta n}\prod_i E[e^{tY_i}] = e^{-t\delta n}\prod_i \frac{1}{3}\big(e^{-t} + e^0 + e^t\big).$$

We now make use of the inequality $\cosh(t) = (e^{-t} + e^t)/2 \le e^{t^2/2}$ (one way to see it is by comparing the series expansions of $e^x$ and $\cosh(x)$ term by term). Thus,

$$\frac{1}{3}\big(e^{-t} + e^0 + e^t\big) \le \frac{1}{3}\big(1 + 2e^{t^2/2}\big) \le e^{t^2/2}.$$

Putting things together we have

$$P[X \ge (1+\delta)n] \le e^{-t\delta n}\prod_i e^{t^2/2} = e^{-t\delta n + t^2 n/2} = e^{-\delta^2 n/2},$$

where we have made the choice $t = \delta$, which minimizes the exponent $-t\delta n + t^2 n/2$. The Chernoff bound for $P[X \le (1-\delta)n]$ is identical by symmetry, since each $Y_i$ has the same distribution as $-Y_i$. (A numerical check of this bound also appears after Problem 3.)

3. We can often make better estimates on the expected value of a random variable when we have access to higher moment information. In the first part of the exercise we will obtain good estimates on the absolute value of any random variable using the fourth moment. In the second part we will use this to prove a discrepancy result.

(a) Using the information that for all $x \ge 0$ and $q > 0$,

$$x \ge \frac{3\sqrt{3}}{2\sqrt{q}}\left(x^2 - \frac{x^4}{q}\right),$$

prove that for any random variable $S$,

$$E[|S|] \ge \frac{(E[S^2])^{3/2}}{(E[S^4])^{1/2}}.$$

(b) Let $X_1, \dots, X_n$ be a collection of 4-wise independent random variables taking $-1$ or $1$ with equal probability. Let $\varepsilon_1, \dots, \varepsilon_n \in \mathbb{R}$. Show that if $S = \varepsilon_1 X_1 + \cdots + \varepsilon_n X_n$ then

$$\sqrt{\frac{\varepsilon_1^2 + \cdots + \varepsilon_n^2}{3}} \le E[|S|] \le \sqrt{\varepsilon_1^2 + \cdots + \varepsilon_n^2}.$$

[6 pts]

Solution: By linearity of expectation we have

$$E[|S|] \ge \frac{3\sqrt{3}}{2\sqrt{q}}\left(E[S^2] - \frac{E[S^4]}{q}\right).$$

Substituting $q = 3E[S^4]/E[S^2]$, the factor in parentheses becomes $E[S^2] - \frac{1}{3}E[S^2] = \frac{2}{3}E[S^2]$ and the prefactor becomes $\frac{3}{2}(E[S^2]/E[S^4])^{1/2}$; their product is $(E[S^2])^{3/2}/(E[S^4])^{1/2}$, which is the result in the first part.

For the second part, observe that $E[S^2] = E[\sum_{i,j}\varepsilon_i\varepsilon_j X_i X_j] = \sum_i \varepsilon_i^2$, since $E[X_i X_j] = 0$ when $i \ne j$ and $1$ when $i = j$. Similarly, $E[S^4] = E[\sum_{i,j,k,l}\varepsilon_i\varepsilon_j\varepsilon_k\varepsilon_l X_i X_j X_k X_l]$, where only the monomials of the form $X_i^4$ and $X_i^2 X_j^2$ for $i < j$ contribute to the expectation (every other monomial contains some $X_i$ with an odd exponent, whose expectation vanishes by 4-wise independence). Thus,

$$E[S^4] = E\Big[\sum_i \varepsilon_i^4 X_i^4 + 6\sum_{i<j}\varepsilon_i^2\varepsilon_j^2 X_i^2 X_j^2\Big] = \sum_i \varepsilon_i^4 + 6\sum_{i<j}\varepsilon_i^2\varepsilon_j^2.$$

The above can be upper bounded by $3\big(\sum_i \varepsilon_i^2\big)^2 = 3\sum_i \varepsilon_i^4 + 6\sum_{i<j}\varepsilon_i^2\varepsilon_j^2$. Using the inequality from the first part we can therefore prove

$$E[|S|] \ge \frac{(E[S^2])^{3/2}}{(E[S^4])^{1/2}} \ge \frac{\big(\sum_i \varepsilon_i^2\big)^{3/2}}{\sqrt{3\big(\sum_i \varepsilon_i^2\big)^2}} = \sqrt{\frac{\varepsilon_1^2 + \cdots + \varepsilon_n^2}{3}}.$$

The upper bound on $E[|S|]$ follows from the fact that $E[|S|]^2 \le E[S^2]$ (Jensen's inequality).
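As a numerical sanity check of the one-sided bound in Problem 1, the following minimal Python sketch compares the empirical tail of an Exp(1) variable (mean $\mu = 1$, variance $\sigma^2 = 1$; an arbitrary test distribution, not one prescribed by the exercise) against $\sigma^2/(\sigma^2 + a^2)$:

    import random

    def one_sided_chebyshev_check(a, trials=200_000):
        mu, var = 1.0, 1.0  # Exp(1) has mean 1 and variance 1
        # Empirical frequency of the event {X >= mu + a}
        hits = sum(random.expovariate(1.0) >= mu + a for _ in range(trials))
        print(f"a={a}: empirical tail {hits / trials:.4f} "
              f"<= bound {var / (var + a * a):.4f}")

    for a in (0.5, 1.0, 2.0):
        one_sided_chebyshev_check(a)

For Exp(1) the true tail $e^{-(1+a)}$ sits comfortably below the bound, as the printout confirms.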
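The bound of Problem 2 can be checked the same way; this sketch estimates $P[X \ge (1+\delta)n]$ for a sum of uniform $\{0,1,2\}$ summands and compares it with $e^{-\delta^2 n/2}$ (the values $n = 100$, $\delta = 0.3$ and the trial count are arbitrary choices):

    import math, random

    def chernoff_check(n=100, delta=0.3, trials=50_000):
        threshold = (1 + delta) * n  # E[X] = n, so this is the upper tail
        hits = 0
        for _ in range(trials):
            x = sum(random.randrange(3) for _ in range(n))  # uniform {0,1,2}
            if x >= threshold:
                hits += 1
        print(f"empirical {hits / trials:.6f} "
              f"<= bound {math.exp(-delta**2 * n / 2):.6f}")

    chernoff_check()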
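For Problem 3(b), the sketch below checks the two-sided estimate $\sqrt{m_2/3} \le E[|S|] \le \sqrt{m_2}$ with $m_2 = \sum_i \varepsilon_i^2$. It uses fully independent signs, which are in particular 4-wise independent; the coefficient vector is an arbitrary choice:

    import math, random

    def mean_abs_check(eps, trials=100_000):
        m2 = sum(e * e for e in eps)  # E[S^2]
        total = 0.0
        for _ in range(trials):
            # S = sum of eps_i * X_i with X_i uniform in {-1, +1}
            total += abs(sum(e * random.choice((-1, 1)) for e in eps))
        mean_abs = total / trials
        print(f"{math.sqrt(m2 / 3):.4f} <= E|S| ~ {mean_abs:.4f} "
              f"<= {math.sqrt(m2):.4f}")

    mean_abs_check([1.0, 2.0, 0.5, 1.5])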
4. Suppose that we can obtain independent samples $X_1, X_2, \dots$ of a random variable $X$ and that we want to use these samples to estimate $E[X]$. Using $t$ samples, we use $\sum_{i=1}^t X_i/t$ as our estimate of $E[X]$. We want the estimate to be within $\varepsilon E[X]$ of the true value of $E[X]$ with probability at least $1 - \delta$. We may not be able to use the Chernoff bound directly to bound how good our estimate is if $X$ is not a 0-1 random variable (as we saw in the lecture), and we do not know its moment generating function. We develop an alternative approach that requires only a bound on the variance of $X$. Let $r = \sqrt{\mathrm{Var}[X]}/E[X]$.

(a) Show using Chebyshev's inequality that $O(r^2/(\varepsilon^2\delta))$ samples are sufficient to solve the problem.

(b) Suppose that we need only a weak estimate that is within $\varepsilon E[X]$ of $E[X]$ with probability at least $3/4$. Argue that $O(r^2/\varepsilon^2)$ samples are enough for this weak estimate.

(c) Show that, by taking the median of $O(\log(1/\delta))$ weak estimates, we can obtain an estimate within $\varepsilon E[X]$ of $E[X]$ with probability at least $1 - \delta$. Conclude that we need only $O(r^2\log(1/\delta)/\varepsilon^2)$ samples.

[6 pts]

Solution: Define $\mu := E[X]$ and $\sigma^2 := \mathrm{Var}[X]$, so that $E[X^2] = \sigma^2 + \mu^2$. Let $Y = (\sum_{i=1}^n X_i)/n$ be our estimate from $n$ samples. Chebyshev's inequality states that

$$P[|Y - \mu| \ge \varepsilon\mu] \le \frac{E[(Y - \mu)^2]}{\varepsilon^2\mu^2}.$$

Now $E[(Y - \mu)^2] = E[Y^2] - E[Y]^2 = (1/n^2)\,E[(\sum_i X_i)^2] - \mu^2$. Expanding the first term,

$$E\Big[\Big(\sum_i X_i\Big)^2\Big] = E\Big[\sum_i X_i^2 + 2\sum_{i<j} X_i X_j\Big] = n(\sigma^2 + \mu^2) + 2\binom{n}{2}\mu^2.$$

Therefore,

$$E[(Y - \mu)^2] = \frac{\sigma^2 + \mu^2}{n} + \frac{n-1}{n}\mu^2 - \mu^2 = \frac{\sigma^2}{n},$$

since the $\mu^2$ terms cancel. By Chebyshev's inequality, $P[|Y - \mu| \ge \varepsilon\mu] \le \sigma^2/(n\varepsilon^2\mu^2) = r^2/(n\varepsilon^2) \le \delta$ when

$$n \ge \frac{r^2}{\varepsilon^2\delta}.$$

Clearly, the solution to the second part is obtained by simply substituting $\delta = 1/4$ in the above expression, giving $n \ge 4r^2/\varepsilon^2 = O(r^2/\varepsilon^2)$ samples.

Now there are two ways to solve the final part. One way is to use Chernoff bounds to argue that with high probability at least half of the weak estimates are going to be in the correct range, so that the median is only an $\varepsilon$ factor away from the mean $\mu$. Another way is the following: let $Z_1, \dots, Z_n$ be $7/8$-weak estimates (we are still using only $O(r^2/\varepsilon^2)$ samples for each of these slightly improved weak estimates). That is, for each $Z_i$, $P[|Z_i - \mu| \ge \varepsilon\mu] \le 1/8$. Let $Z$ be the median of $Z_1, \dots, Z_n$. Now $P[|Z - \mu| \ge \varepsilon\mu]$ is at most the probability that some $n/2$ of the weak estimates are outside the range $[\mu - \varepsilon\mu, \mu + \varepsilon\mu]$. By a union bound over the subsets of size $n/2$, this quantity is at most

$$\binom{n}{n/2}(1/8)^{n/2} < 2^n (1/8)^{n/2} = 2^{-n/2},$$

which is less than $\delta$ for

$$n \ge 2\log_2\frac{1}{\delta}.$$

In total we use $O(\log(1/\delta))$ weak estimates of $O(r^2/\varepsilon^2)$ samples each, i.e., $O(r^2\log(1/\delta)/\varepsilon^2)$ samples. (A code sketch of this scheme appears after Problem 5.)

Note: In the first draft of solutions I missed the additional $\binom{n}{n/2}$ factor in the second method. Thanks to Alfonso for catching this error.

5. Consider a set $S$ of $n$ points on a line. Pick a point uniformly at random from $S$. Repeat this procedure a total of $r$ times and let $R$ be the set of points that was picked at least once. Show that with constant probability every interval of points in $S$ defined by consecutive points in $R$ is of size at most $O(n\log r/r)$. [5 pts]

Solution: Split the set $S$ into intervals $I_1, \dots, I_t$ of size $cn\log r/r$ each, so that the number of such intervals is $t = r/(c\log r)$. The probability that in all $r$ trials we don't pick from some fixed interval $I_j$ is

$$\Big(1 - \frac{c\log r}{r}\Big)^r \le e^{-c\log r} = 1/r^c.$$

By the union bound, the probability that at least one of the intervals $I_1, \dots, I_t$ contains no picked point is at most $t/r^c = 1/(c\,r^{c-1}\log r) \le 1/c$. This implies that with probability at least $1 - 1/c$, every interval of $S$ defined by consecutive points of $R$ has size at most $2cn\log r/r$: if every $I_j$ contains a point of $R$, then an interval between consecutive points of $R$ cannot contain any $I_j$ entirely, and hence overlaps at most two of them.
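A short Python sketch of the median-of-weak-estimates scheme from Problem 4. The per-estimate sample size $\lceil 8r^2/\varepsilon^2\rceil$ and the count of weak estimates are illustrative constants consistent with the analysis above, not values fixed by the exercise; the Exp(1) test variable has $r = \sqrt{\mathrm{Var}[X]}/E[X] = 1$:

    import math, random, statistics

    def median_of_means(sample, eps, delta, r2):
        # One weak estimate: mean of O(r^2/eps^2) samples (constant 8 from delta=1/8)
        weak_size = math.ceil(8 * r2 / eps ** 2)
        # Odd number of weak estimates, O(log(1/delta)) of them
        k = 2 * math.ceil(math.log2(1 / delta)) + 1
        weak = [statistics.fmean(sample() for _ in range(weak_size))  # Python 3.8+
                for _ in range(k)]
        return statistics.median(weak)

    est = median_of_means(lambda: random.expovariate(1.0),
                          eps=0.1, delta=0.01, r2=1.0)
    print(f"estimate {est:.4f} (true mean 1.0)")

With these parameters the estimate lands within $\varepsilon\mu = 0.1$ of the true mean in the vast majority of runs.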
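Finally, a simulation of Problem 5: sample $r$ points with replacement from $S = \{0, \dots, n-1\}$ and compare the largest gap between consecutive sampled points with $n\ln r/r$. The constant in front of $n\log r/r$ is left at 1 here, so the empirical maximum may exceed the printed reference value by a small constant factor:

    import math, random

    def max_gap(n=100_000, r=1_000):
        # R = set of points picked at least once, in sorted order
        picked = sorted(set(random.randrange(n) for _ in range(r)))
        gaps = [b - a for a, b in zip(picked, picked[1:])]  # consecutive picks
        gaps += [picked[0], n - 1 - picked[-1]]             # boundary stretches
        print(f"max gap {max(gaps)} vs n*ln(r)/r = {n * math.log(r) / r:.0f}")

    max_gap()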