MS&E 223: Simulation
Peter J. Haas
Lecture Notes #5, Spring Quarter 2005-06

Generation of Non-Uniform Random Numbers

Refs: L&P, Chs. 6 and 7; L&K, Ch. 7; and Devroye, Non-Uniform Random Variate Generation (watch for typos).
Problem: Given a uniform random variable $U$, generate a random variable $X$ having a prescribed distribution function $F_X(\cdot)$.

We previously discussed the inversion method. While inversion is a very general method, it may be computationally expensive: in many cases, computing $F_X^{-1}(\cdot)$ must be implemented via a numerical root-finding method. We therefore now describe other methods for non-uniform random number generation.
1. Acceptance-Rejection Method
We now discuss a method that is well-suited to generating random variates with an easily-calculated
density. This method is called acceptance-rejection.
Goal: Generate a random variate $X$ having a given probability density $f_X(x)$, where $f_X(x)$ is positive only on the interval $[a, b]$ (with $-\infty < a < b < \infty$).

Enclose the density in a rectangle $R$ having base $(b - a)$ and height $m$, where
$$m = \sup_{a \le x \le b} f_X(x).$$
[Figure: the density $f_X(x)$ on $[a, b]$, enclosed in the rectangle $R$ of height $m$.]
Suppose we throw down points uniformly in the rectangle $R$, and throw away (reject) the points that land above the density $f_X(x)$.

Claim: The x-coordinate of each accepted point has density $f_X(x)$.
Proof: Let $(Z_1, Z_2)$ be the $(x, y)$ coordinates of a random point distributed uniformly in $R$. Then, for $a \le x \le b$,
$$P(X \le x) = P(Z_1 \le x \mid \text{acceptance}) = \frac{P(Z_1 \le x,\ Z_2 \le f_X(Z_1))}{P(Z_2 \le f_X(Z_1))}.$$
But $P(Z_1 \le x,\ Z_2 \le f_X(Z_1))$ is just the probability that $(Z_1, Z_2)$ falls into the shaded region below:

[Figure: the density $f_X(t)$ on $[a, b]$, with the region under the curve between $a$ and $x$ shaded.]

This probability is just
$$P(Z_1 \le x,\ Z_2 \le f_X(Z_1)) = \frac{\int_a^x f_X(t)\,dt}{\text{area of } R}.$$
Putting $x = b$ and using $\int_a^b f_X(t)\,dt = 1$, we conclude that $P(Z_2 \le f_X(Z_1)) = (\text{area of } R)^{-1}$. So
$$P(X \le x) = \int_a^x f_X(t)\,dt,$$
as desired. Hence, acceptance-rejection works. A more precise specification of the algorithm is:
1. Generate $U_1, U_2$ as two independent U(0,1) random variates.
2. Set $Z_1 = a + (b - a)U_1$ and $Z_2 = mU_2$. [We are essentially using the inversion method here.]
3. If $Z_2 \le f_X(Z_1)$, return $X = Z_1$. Else, return to step 1.
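The following is a minimal Python sketch of these three steps (an illustration, not code from the notes); the density `f`, the endpoints `a` and `b`, and the bound `m` are assumed to be supplied by the caller.

```python
import random

def accept_reject(f, a, b, m):
    """Sample from density f, supported on [a, b] and bounded above by m,
    by rejecting uniform points in the rectangle [a, b] x [0, m]."""
    while True:
        z1 = a + (b - a) * random.random()  # step 2: uniform x-coordinate
        z2 = m * random.random()            # step 2: uniform y-coordinate
        if z2 <= f(z1):                     # step 3: accept points under f
            return z1

# Example: the triangular density f(x) = 2x on [0, 1], with m = 2.
x = accept_reject(lambda t: 2.0 * t, 0.0, 1.0, 2.0)
```

Each loop iteration consumes one pair $(U_1, U_2)$, so the expected number of iterations is $(b-a)m$, as computed next.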
Note that the number $N$ of pairs of uniform random variates $(U_1, U_2)$ required to generate $X$ is geometrically distributed with mass function
$$P(N = k) = p(1-p)^{k-1} \quad \text{for } k \ge 1,$$
where $p = (\text{area of } R)^{-1} = 1/((b-a)m)$. The mean number of pairs required is then
$$E(N) = \frac{1}{p} = (b-a)m.$$
Hence, a good measure of how fast such an acceptance-rejection algorithm will be is the closeness of the quantity $(b-a)m$ to 1. (So we want $m$ to be as small as possible.)
2. Generalized Acceptance-Rejection
There are many random variates having densities that are positive over infinite intervals, or such that
(b−a)m is very large. In such cases, the following generalization of acceptance-rejection may be useful.
Goal: Generate a random variate $X$ having a given probability density $f_X(\cdot)$.
We assume that there exists a random variable $Y$, having known density $g(\cdot)$, such that:

1. We have a means of efficiently generating replicates of $Y$ (e.g., $Y$ is exponential).
2. There exists a constant $c > 0$ such that $f_X(x) \le c\,g(x)$ for all $x$.
[Figure: the density $f_X(x)$ lying everywhere below the majorizing curve $c\,g(x)$; points thrown uniformly under $c\,g(x)$ are rejected if they land above $f_X(x)$.]
A random variable Y having the above properties is called a majorizing random variable and the density
g(⋅) is called a majorizing density.
Let
$$c = \sup_x \frac{f_X(x)}{g(x)}$$
be the smallest such constant.
Generalized acceptance-rejection works as follows:

1. Generate the random variable $Y$ having density $g(\cdot)$.
2. Generate a uniform(0,1) random variable $U$, independent of $Y$.
3. If $U c\,g(Y) \le f_X(Y)$, return $X = Y$. Else, return to step 1.
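A Python sketch of generalized acceptance-rejection follows (an illustration, not code from the notes). The half-normal example attached to it is the classical one: its density is majorized by the exponential(1) density with $c = \sqrt{2e/\pi}$; these particular f, g, and c are assumptions of the example, not taken from the notes.

```python
import math
import random

def generalized_accept_reject(f, sample_g, g, c):
    """Sample from density f given a majorizing density g with
    f(x) <= c*g(x), a sampler for g, and the constant c."""
    while True:
        y = sample_g()                # step 1: draw Y ~ g
        u = random.random()           # step 2: U ~ uniform(0,1), independent of Y
        if u * c * g(y) <= f(y):      # step 3: accept with probability f(Y)/(c g(Y))
            return y

# Classical illustration: half-normal density f(x) = sqrt(2/pi) exp(-x^2/2),
# x >= 0, majorized by the exponential(1) density g(x) = exp(-x),
# with c = sup f/g = sqrt(2e/pi), roughly 1.32.
f = lambda y: math.sqrt(2.0 / math.pi) * math.exp(-y * y / 2.0)
g = lambda y: math.exp(-y)
sample_exp = lambda: -math.log(1.0 - random.random())  # inversion for exponential(1)
x = generalized_accept_reject(f, sample_exp, g, math.sqrt(2.0 * math.e / math.pi))
```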
An argument similar to that used in the case of simple acceptance-rejection shows that the above algorithm produces variates with the required density $f_X(\cdot)$. In this case, using the law of total expectation and the independence of $U$ and $Y$, we have
$$P(Y \le x,\ \text{acceptance}) = P\big(Y \le x,\ U \le f_X(Y)/cg(Y)\big) = \int_{-\infty}^{x} P\big(U \le f_X(Y)/cg(Y) \mid Y = y\big)\, g(y)\, dy$$
$$= \int_{-\infty}^{x} P\big(U \le f_X(y)/cg(y)\big)\, g(y)\, dy = \int_{-\infty}^{x} \frac{f_X(y)}{c\,g(y)}\, g(y)\, dy = \frac{1}{c}\int_{-\infty}^{x} f_X(y)\, dy.$$
Taking $x = \infty$ in the above calculation yields $P(\text{acceptance}) = 1/c$, so that
$$P(X \le x \mid \text{acceptance}) = \frac{P(Y \le x,\ \text{acceptance})}{P(\text{acceptance})} = \frac{(1/c)\int_{-\infty}^{x} f_X(y)\, dy}{1/c} = \int_{-\infty}^{x} f_X(y)\, dy.$$
The following figure provides some intuition; it is obtained by scaling the previous figure by the factor $1/c$. The majorizing density curve plays the role of the bounding rectangle in the simple A/R method.

[Figure: the scaled density $f_X(x)/c$ lying below the majorizing density $g(x)$.]

Note that the number $N$ of random variates $Y$ required to generate a single replicate of $X$ is geometric with mean $c$. Hence, $c$ is a measure of the efficiency of the algorithm: the closer $c$ is to 1, the better.
3. Convolution Method
To generate a random variable $X$ that can be represented as
$$X = Y_1 + \cdots + Y_m,$$
where the $Y_i$'s are independent random variables, an obvious strategy is to generate the $Y_i$'s independently and sum them to obtain $X$. This is known as the convolution method because the distribution function of $X$ can be computed analytically as the convolution of the distributions of the $Y_i$'s.
Example: Suppose $X$ is binomially distributed with parameters $n$ and $p$ ($0 \le p \le 1$). Then it is well known that
$$X \stackrel{D}{=} Y_1 + \cdots + Y_n,$$
where the $Y_i$'s are i.i.d. Bernoulli($p$) random variables having mass function
$$P(Y_i = 0) = 1 - p \quad \text{and} \quad P(Y_i = 1) = p.$$
If $n$ isn't too large, the convolution method can be a reasonably efficient way of generating $X$. Otherwise, one can turn to more sophisticated algorithms; see Devroye.
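As a concrete illustration (a sketch, not code from the notes), the binomial example translates directly into Python:

```python
import random

def binomial_by_convolution(n, p):
    """Generate a Binomial(n, p) variate as the sum of n i.i.d.
    Bernoulli(p) variates, one uniform per Bernoulli."""
    return sum(1 for _ in range(n) if random.random() < p)

x = binomial_by_convolution(10, 0.3)
```

The cost grows linearly in n, which is why the method is attractive only when n is moderate.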
4. Composition Method
Suppose that either the distribution $F_X(\cdot)$ or the probability density $f_X(\cdot)$ can be represented in one of the following two forms:

(1) $F_X(x) = p_1 F_{Y_1}(x) + p_2 F_{Y_2}(x) + \cdots + p_m F_{Y_m}(x)$
(2) $f_X(x) = p_1 f_{Y_1}(x) + p_2 f_{Y_2}(x) + \cdots + p_m f_{Y_m}(x)$
where $p_1, \ldots, p_m$ are non-negative and sum to one (so that they form a probability mass function). Then, assuming that the $Y_i$'s are relatively easy to generate, we can generate $X$ as follows:

1. Generate a discrete random variable $I$ on $\{1, \ldots, m\}$, where $P(I = j) = p_j$ for $1 \le j \le m$.
2. Generate $Y_I$ from $F_{Y_I}(\cdot)$ (or $f_{Y_I}(\cdot)$).
3. Return $X = Y_I$.
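A generic Python sketch of these three steps (an illustration under the assumption that each component is supplied as a zero-argument sampler; none of these names come from the notes):

```python
import random

def composition_sample(probs, samplers):
    """Composition method: draw I with P(I = j) = probs[j] by linear-search
    inversion, then return a variate from component I's sampler."""
    u = random.random()
    cum = 0.0
    for p_j, sampler in zip(probs, samplers):
        cum += p_j
        if u <= cum:
            return sampler()
    return samplers[-1]()  # guard against floating-point round-off in the p_j's

# Example (hypothetical): an equal mixture of uniform(0,1) and uniform(2,3).
x = composition_sample([0.5, 0.5],
                       [random.random, lambda: 2.0 + random.random()])
```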
This approach can sometimes lead to very fast algorithms for non-uniform variate generation. To see
why, suppose we approximate the density fX (⋅) (of the random variable X to be generated) from below by
a piecewise-linear function.
[Figure: a density $f_X(x)$ on $[a, c]$, approximated from below by two triangles: region $R_1$ over $[a, b]$ and region $R_2$ over $[b, c]$; region $R_3$ lies between the piecewise-linear approximation and $f_X(x)$.]
The area underneath the piecewise-linear function consists of two regions $R_1$ and $R_2$; the region above the approximation and below $f_X(\cdot)$ is $R_3$. If we let $p_i$ denote the area of $R_i$, then it is clear that $p_1$, $p_2$, and $p_3$ form a probability mass function. Furthermore,
$$f_X(x) = p_1 f_{Y_1}(x) + p_2 f_{Y_2}(x) + p_3 f_{Y_3}(x),$$
where
$$f_{Y_1}(x) = \begin{cases} \dfrac{2(x-a)}{(b-a)^2}, & a \le x \le b \\ 0, & \text{otherwise,} \end{cases}$$
$$f_{Y_2}(x) = \begin{cases} \dfrac{2(c-x)}{(c-b)^2}, & b \le x \le c \\ 0, & \text{otherwise,} \end{cases}$$
$$f_{Y_3}(x) = \frac{1}{p_3}\big(f_X(x) - p_1 f_{Y_1}(x) - p_2 f_{Y_2}(x)\big).$$
To see this, set $g_i(x) = p_i f_{Y_i}(x)$ for $i = 1, 2, 3$ and observe that $g_2(x) = p_2 f_{Y_2}(x) = 0$ for $x \in [a, b]$. Now pick a value $z \in [a, b]$ and observe that
$$P(X \le z) = \int_a^z f_X(x)\,dx = \int_a^z \big[g_1(x) + \big(f_X(x) - g_1(x)\big)\big]\,dx = \int_a^z \big(p_1 f_{Y_1}(x) + p_3 f_{Y_3}(x)\big)\,dx$$
$$= \int_a^z \big(p_1 f_{Y_1}(x) + p_2 f_{Y_2}(x) + p_3 f_{Y_3}(x)\big)\,dx,$$
so that the desired result follows after taking derivatives. A similar argument establishes the result when $z \in [b, c]$.
Generating random variables from these triangular densities is easy and very fast. Inversion can be used, or one can take advantage of the following:

Exercise: Let $Z_1, Z_2$ be two independent uniform random variables on $[a, b]$. Prove that the random variable $\max(Z_1, Z_2)$ has density $f_{Y_1}(\cdot)$. (If inversion is used, a square root needs to be computed; this approach avoids the square root, at the cost of generating an additional random variable.)
Exercise: Let $Z_1, Z_2$ be two independent uniform random variables on $[b, c]$. Prove that the random variable $\min(Z_1, Z_2)$ has density $f_{Y_2}(\cdot)$.
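A sketch of samplers based on these two exercises (assuming the max/min results hold, as the exercises ask you to prove):

```python
import random

def sample_y1(a, b):
    """Increasing triangular density 2(x-a)/(b-a)^2 on [a, b]:
    the max of two independent uniforms on [a, b] (first exercise)."""
    return max(a + (b - a) * random.random(),
               a + (b - a) * random.random())

def sample_y2(b, c):
    """Decreasing triangular density 2(c-x)/(c-b)^2 on [b, c]:
    the min of two independent uniforms on [b, c] (second exercise)."""
    return min(b + (c - b) * random.random(),
               b + (c - b) * random.random())
```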
Hence, so long as p3 is small, this composition approach to generating the random variable X will be
efficient, because whatever routine is used to generate Y3 will only be used infrequently (and hence we
don’t care much if the routine is relatively inefficient).
Very complex densities fX (⋅) may be approximated by combinations of "triangles" (i.e. triangular
densities of the above form) and "rectangles" (i.e. uniform random variables). One then needs some
(possibly inefficient) algorithm to generate variates from what is "left over" after the approximation. But,
as in the above example, this will only be called infrequently. This approach, while time-consuming to
code, can lead to very fast variate generation schemes.
5. Generating Discrete Random Variables: The Alias Method
We return now to the question of how to efficiently generate discrete random variables with a given
probability mass function. Specifically, suppose our goal is to generate a random variable X for which
$$P(X = x_i) = p_i \quad \text{for } 1 \le i \le n.$$
The alias method relies upon the idea that if the $p_i$'s are uniform (so that $p_i = 1/n$ for $1 \le i \le n$), then $X$ can be rapidly generated. Conceptually, we can think of dropping points randomly in a rectangle that is $1/n$ units tall and $n$ units wide. The rectangle is divided into $n$ vertical regions, each 1 unit wide, numbered $1, 2, \ldots, n$ from left to right. E.g., for n = 4 we have:
[Figure: a rectangle of height 1/4 and width 4, divided into four unit-width vertical regions numbered 1 through 4.]
If the generated point lands in region $i$, then we return the value $x_i$. In this simple case we don't need to actually generate the y-coordinate of our random point, and can implement our technique efficiently by generating a uniform(0,1) random variable $U$ and setting
$$X = x_{\lceil nU \rceil},$$
where $\lceil z \rceil$ denotes the smallest integer greater than or equal to $z$. (We don't generate an explicit x value; we merely determine in which region our x value falls.)
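In Python this region lookup is one line (a sketch, not code from the notes); the max() guards the measure-zero case $U = 0$, for which $\lceil nU \rceil$ would be 0:

```python
import math
import random

def uniform_discrete(xs):
    """Return each element of xs with equal probability 1/n via X = x_ceil(nU)."""
    n = len(xs)
    i = max(1, math.ceil(n * random.random()))  # region index in {1, ..., n}
    return xs[i - 1]                            # xs is 0-indexed in Python
```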
For non-uniform $p_i$'s, the idea is to "shift" probability mass around so as to make a rectangular region (where the vertical regions may now be horizontally partitioned into subregions) and use essentially the above idea. In shifting mass to make the rectangle, it turns out that the $i$-th vertical region never needs to be chopped up into more than two parts: one part is associated with the original $x_i$ value, and the other part is associated with an alternate value $x_j$ for some $j \ne i$. The alternate value is called the alias value. (See the figures in the examples at the end of these notes.)
The x-coordinate of the random point is handled as before, i.e., we don't generate explicit x-values but merely choose the region in which the point falls. Similarly, we don't generate explicit y-values, but rather choose which of the two parts of the region the point falls in. The point falls in the lower part of vertical region $i$ with probability $r_i$ = (height of lower part of region $i$)/(height of region $i$).
In general, to use the alias method, we first need to execute a set-up algorithm to generate the alias and probability tables. For $1 \le i \le n$, this yields a set of $a_i$'s (the alias values) and $r_i$'s (the aliasing probabilities).
We can then employ the following algorithm:

1. Generate two independent uniform(0,1) random variables $U_1$ and $U_2$.
2. Set $I = \lceil nU_1 \rceil$.
3. If $U_2 \le r_I$, then return $x_I$. Otherwise, return $a_I$.
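A Python sketch of this sampling step (assuming the tables xs, alias, and r have already been built by the set-up algorithm discussed next):

```python
import math
import random

def alias_sample(xs, alias, r):
    """Alias method: pick a vertical region uniformly, then return x_I with
    probability r[I] (lower part) and its alias a_I otherwise (upper part)."""
    n = len(xs)
    i = max(1, math.ceil(n * random.random())) - 1  # region I, 0-indexed
    return xs[i] if random.random() <= r[i] else alias[i]
```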
Setting up the alias table requires running an algorithm that executes in O(n) time, i.e., in time essentially linear in n; see Appendix 8B in L&K for details. When n is large, some care has to be taken when programming to avoid numerical round-off errors.
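The notes defer the set-up details to L&K; for concreteness, here is a sketch of one standard O(n) construction (a variant often attributed to Vose), which may differ in detail from the algorithm in Appendix 8B and can produce a different, but equally valid, table for a given mass function.

```python
def build_alias_table(xs, probs):
    """O(n) alias-table construction (Vose-style sketch). Returns (alias, r),
    where region i returns xs[i] with probability r[i] and alias[i] otherwise."""
    n = len(probs)
    scaled = [p * n for p in probs]                  # scale so the average mass is 1
    small = [i for i, s in enumerate(scaled) if s < 1.0]
    large = [i for i, s in enumerate(scaled) if s >= 1.0]
    alias, r = list(xs), [1.0] * n                   # defaults: region i is all xs[i]
    while small and large:
        i, j = small.pop(), large.pop()              # deficit region i, donor region j
        r[i] = scaled[i]                             # lower part of region i keeps xs[i]
        alias[i] = xs[j]                             # upper part takes mass from xs[j]
        scaled[j] -= 1.0 - scaled[i]                 # donor gives up its surplus
        (small if scaled[j] < 1.0 else large).append(j)
    return alias, r
```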
Example 1:

[Figure: the mass function $p_1 = 1/6$, $p_2 = 1/2$, $p_3 = 1/3$ rearranged into three vertical regions of height 1/3; the excess 1/6 of $x_2$'s mass sits on top of $x_1$'s region.]

 i    x_i    p_i    a_i    r_i
 1    x1     1/6    x2     0.5
 2    x2     1/2    x2     1
 3    x3     1/3    x3     1
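For instance, feeding Example 1's table to the alias_sample sketch above (the string labels are stand-ins for the $x_i$'s):

```python
xs = ["x1", "x2", "x3"]
alias = ["x2", "x2", "x3"]      # the a_i column of Example 1
r = [0.5, 1.0, 1.0]             # the r_i column of Example 1
x = alias_sample(xs, alias, r)  # x1 w.p. 1/6, x2 w.p. 1/2, x3 w.p. 1/3
```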
Example 2:

[Figure: the mass function $p_1 = 3/7$, $p_2 = 3/7$, $p_3 = 1/7$ rearranged into three vertical regions of height 1/3 = 7/21; region 2 carries 2/21 of $x_1$'s mass on top, and region 3 carries 4/21 of $x_2$'s mass on top.]

 i    x_i    p_i    a_i    r_i
 1    x1     3/7    x1     1
 2    x2     3/7    x1     5/7
 3    x3     1/7    x2     3/7