Ch3. Generating Random Variates with Non-Uniform Distributions


ST4231, Semester I, 2003-2004

This chapter focuses on methods for generating non-uniform random numbers from the built-in standard uniform random number generator.

Outline of the Chapter:
- Inversion Method
- Rejection Method
- Composition Method
- Polar Method
- Multivariate Random Variable Generation

1 Inversion Method

Proposition 1.1 (The Foundation Theorem of the Inversion Method) Let $F$ be the cdf of a random variable, and let $U$ be a random variable with the $U[0,1]$ distribution. Then $F^{-1}(U) \sim F$.

Proof: Let $F_X$ denote the distribution function of $X = F^{-1}(U)$. Then

$$F_X(x) = P\{X \le x\} = P\{F^{-1}(U) \le x\}.$$

Now since $F$ is a distribution function, $F(x)$ is a monotone increasing function of $x$, and so the inequality $a \le b$ is equivalent to the inequality $F(a) \le F(b)$. Hence, from the equation above, we see that

$$F_X(x) = P\{F(F^{-1}(U)) \le F(x)\} = P\{U \le F(x)\} = F(x),$$

since $U$ is uniform on $(0, 1)$.

Remark: When $F : \mathbb{R} \to [0, 1]$ is continuous and strictly increasing, $F^{-1} : [0, 1] \to \mathbb{R}$ is also continuous and strictly increasing. More generally, we only know that $F$ is right-continuous and non-decreasing; in that case we define $F^{-1}$ by

$$F^{-1}(u) = \inf\{z \in \mathbb{R} : F(z) \ge u\}, \quad u \in [0, 1].$$

1.1 Discrete Random Number Generators

Suppose we want to generate the value of a discrete random variable $X$ having probability mass function

$$P(X = x_j) = p_j, \quad j = 0, 1, \ldots, \qquad \sum_j p_j = 1.$$

Algorithm:
- Generate a random number $U$.
- If $U < p_0$, set $X = x_0$ and stop.
- If $U < p_0 + p_1$, set $X = x_1$ and stop.
- If $U < p_0 + p_1 + p_2$, set $X = x_2$ and stop.
- In general, if $U < \sum_{i=0}^{j} p_i$, set $X = x_j$ and stop.

Remark: If the $x_i$, $i \ge 0$, are ordered so that $x_0 < x_1 < \cdots$ and if we let $F$ denote the distribution function of $X$, then $F(x_k) = \sum_{i=0}^{k} p_i$, and so $X$ will be equal to $x_j$ if

$$F(x_{j-1}) \le U < F(x_j).$$

In other words, after generating a random number $U$ we determine the value of $X$ by finding the interval $[F(x_{j-1}), F(x_j))$ in which $U$ lies [or, equivalently, by finding $F^{-1}(U)$]. It is for this reason that the above is called the inversion method for generating $X$.
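As an illustration, here is a minimal Python sketch of the discrete inversion algorithm above; the function name and signature are ours, not part of the original notes.

```python
import random

def discrete_inverse_transform(xs, ps):
    """Generate X with P(X = xs[j]) = ps[j] by inversion."""
    u = random.random()   # U ~ Uniform(0, 1)
    cum = 0.0
    for x, p in zip(xs, ps):
        cum += p          # cum = p_0 + ... + p_j = F(x_j)
        if u < cum:       # U falls in [F(x_{j-1}), F(x_j))
            return x
    return xs[-1]         # guard against floating-point round-off
```

For instance, Example 1 below corresponds to the call discrete_inverse_transform([1, 2, 3, 4], [0.2, 0.15, 0.25, 0.4]).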

Example 1. A Simple Discrete Random Number Generator

Suppose we want to simulate a random variable $X$ such that

$$p_1 = 0.2, \quad p_2 = 0.15, \quad p_3 = 0.25, \quad p_4 = 0.4,$$

where $p_j = P(X = j)$. We could generate $U$ and do the following:
- If $U < 0.2$, set $X = 1$ and stop.
- If $U < 0.35$, set $X = 2$ and stop.
- If $U < 0.6$, set $X = 3$ and stop.
- Otherwise, set $X = 4$.

However, a more efficient procedure checks the outcomes in decreasing order of probability:
- If $U < 0.4$, set $X = 4$ and stop.
- If $U < 0.65$, set $X = 3$ and stop.
- If $U < 0.85$, set $X = 1$ and stop.
- Otherwise, set $X = 2$.

Example 2. Geometric Random Number Generator

$$P(X = i) = p q^{i-1}, \quad i \ge 1, \text{ where } q = 1 - p.$$

Since

$$\sum_{i=1}^{j-1} P(X = i) = 1 - P(X > j-1) = 1 - P(\text{the first } j-1 \text{ trials are all failures}) = 1 - q^{j-1},$$

we can generate the value of $X$ by generating a random number $U$ and setting $X$ equal to that value $j$ for which

$$1 - q^{j-1} \le U < 1 - q^j,$$

or, equivalently, for which

$$q^j < 1 - U \le q^{j-1}.$$

Generator 1: Thus, we can define $X$ by

$$X = \min\{j : q^j < 1 - U\} = \min\{j : j \log(q) < \log(1 - U)\} = \min\left\{j : j > \frac{\log(1 - U)}{\log(q)}\right\},$$

where the inequality reverses because $\log(q) < 0$. Hence, we can express $X$ as

$$X = \operatorname{int}\left(\frac{\log(1 - U)}{\log(q)}\right) + 1,$$

where $\operatorname{int}(x)$ denotes the integer part of $x$.

Generator 2: We can also write $X$ as

$$X = \operatorname{int}\left(\frac{\log(U)}{\log(q)}\right) + 1,$$

because $1 - U$ is also uniformly distributed on $(0,1)$.
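A minimal Python sketch of Generator 2 (illustrative; the function name is an assumption):

```python
import math
import random

def geometric(p):
    """Geometric(p) on {1, 2, ...} via X = int(log(U)/log(q)) + 1, q = 1 - p."""
    u = random.random()
    return int(math.log(u) / math.log(1.0 - p)) + 1
```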

Example 3. Poisson Random Number Generator

The random variable $X \sim \text{Poisson}(\lambda)$ has

$$p_i = P(X = i) = e^{-\lambda} \frac{\lambda^i}{i!}, \quad i = 0, 1, \ldots$$

For $p_i$ and $p_{i+1}$, we have the following recursive relationship:

$$p_{i+1} = \frac{\lambda}{i+1} p_i, \quad i \ge 0.$$

Generator Algorithm:
1. Generate a random number $U$.
2. Set $i = 0$, $p = e^{-\lambda}$, $F = p$.
3. If $U < F$, set $X = i$ and stop.
4. Set $p = \lambda p/(i+1)$, $F = F + p$, $i = i + 1$. Go to step 3.
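The same algorithm as a short Python sketch (function name ours):

```python
import math
import random

def poisson(lam):
    """Poisson(lam) via sequential inversion with p_{i+1} = lam/(i+1) * p_i."""
    u = random.random()
    i, p = 0, math.exp(-lam)  # p = p_0
    F = p                     # F = P(X <= i)
    while u >= F:
        p = lam * p / (i + 1)
        F += p
        i += 1
    return i
```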

Remark: Based on the properties of the Poisson distribution, if $\lambda$ is very large, one efficient generation algorithm is as follows. Let $I = \operatorname{int}(\lambda)$. First generate a random number to determine whether $X$ is larger or smaller than $I$; then search downward starting from $X = I$ in the case where $X \le I$, and upward starting from $X = I + 1$ otherwise.

$$\text{Average number of searches} = 1 + E[|X - \lambda|] = 1 + \sqrt{\lambda}\, E\left[\frac{|X - \lambda|}{\sqrt{\lambda}}\right] \approx 1 + \sqrt{\lambda}\, E[|Z|], \quad \text{where } Z \sim N(0,1),$$

which equals $1 + 0.798\sqrt{\lambda}$.

Example 4. Binomial Random Number Generator

$X \sim \text{Binomial}(n, p)$, with

$$P(X = i) = \frac{n!}{i!(n-i)!} p^i (1-p)^{n-i}, \quad i = 0, 1, \ldots, n.$$

The recursive identity:

$$P(X = i+1) = \frac{n-i}{i+1} \cdot \frac{p}{1-p}\, P(X = i).$$

Generator Algorithm:
1. Generate a random number $U$.
2. Set $c = p/(1-p)$, $i = 0$, $pr = (1-p)^n$, $F = pr$.
3. If $U < F$, set $X = i$ and stop.
4. Set $pr = \frac{c(n-i)}{i+1} pr$, $F = F + pr$, $i = i + 1$. Go to step 3.

Remark: As in the Poisson case, when the mean $np$ is large it is better to first determine whether the generated value is less than or equal to $I = \operatorname{int}(np)$ or larger than $I$, and then decide whether to search downward or upward.
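A Python sketch of the binomial generator above (assumes $0 < p < 1$; function name ours):

```python
import random

def binomial(n, p):
    """Binomial(n, p) via sequential inversion using the recursive identity."""
    u = random.random()
    c = p / (1.0 - p)
    i, pr = 0, (1.0 - p) ** n  # pr = P(X = 0)
    F = pr                     # F = P(X <= i)
    while u >= F:
        pr = c * (n - i) / (i + 1) * pr
        F += pr
        i += 1
    return i
```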

1.2 Continuous Random Number Generators

Example 1. A Simple Continuous Random Number Generator

Suppose we want to generate $X$ from the distribution

$$F(x) = x^n, \quad 0 < x < 1.$$

Solution: Let $x = F^{-1}(u)$; then $u = F(x) = x^n$ or, equivalently, $x = u^{1/n}$. Hence we have the following algorithm for generating a random variable from $F(x)$:
- Generate a random number $U \sim U(0, 1)$.
- Set $X = U^{1/n}$.
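In Python, this inversion is a one-liner (sketch, name ours):

```python
import random

def power_dist(n):
    """X with cdf F(x) = x**n on (0, 1) via inversion: X = U**(1/n)."""
    return random.random() ** (1.0 / n)
```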

Example 2. Exponential Random Number Generator

If $X$ is an exponential random variable with rate 1, then its distribution function is given by

$$F(x) = 1 - e^{-x}.$$

Let $x = F^{-1}(u)$; we have $u = F(x) = 1 - e^{-x}$ or, taking logarithms, $x = -\log(1 - u)$. Hence we can generate an exponential with parameter 1 by generating a random number $U$ and then setting

$$X = F^{-1}(U) = -\log(1 - U) = -\log(U) \text{ (in distribution)},$$

since $1 - U$ is also uniform on $(0, 1)$. In general, an exponential random variable with rate $\lambda$ can be generated by generating a random number $U$ and setting $X = -\frac{1}{\lambda}\log(U)$.
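A Python sketch (function name ours):

```python
import math
import random

def exponential(rate):
    """Exponential with the given rate via inversion: X = -log(U)/rate."""
    return -math.log(random.random()) / rate
```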

Example 3. Gamma Random Number Generator

Suppose we want to generate the value of a gamma$(n, \lambda)$ random variable, with

$$F(x) = \int_0^x \frac{\lambda e^{-\lambda y} (\lambda y)^{n-1}}{(n-1)!}\, dy.$$

Remark 1: It is not possible to give a closed-form expression for the inverse $F^{-1}$, so we cannot use the inverse transform method here.

Remark 2: By using the result that a gamma$(n, \lambda)$ random variable $X$ can be regarded as the sum of $n$ independent exponentials, each with rate $\lambda$, we can make use of Example 2 to generate $X$.

Generator Algorithm: Based on the above idea, we can generate a gamma$(n, \lambda)$ random variable by generating $n$ random numbers $U_1, \ldots, U_n$ and then setting

$$X = -\frac{1}{\lambda}\log U_1 - \cdots - \frac{1}{\lambda}\log U_n = -\frac{1}{\lambda}\log(U_1 \cdots U_n),$$

where the use of the identity $\sum_{i=1}^n \log(x_i) = \log(x_1 \cdots x_n)$ is computationally time-saving in that it requires only one rather than $n$ logarithm computations.
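A Python sketch for integer shape $n$ (note that the product of many uniforms can underflow for large $n$, in which case summing the logarithms is safer; function name ours):

```python
import math
import random

def gamma_int_shape(n, lam):
    """Gamma(n, lam) for integer n, as X = -log(U_1 * ... * U_n) / lam."""
    prod = 1.0
    for _ in range(n):
        prod *= random.random()
    return -math.log(prod) / lam
```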

2 Rejection Method

The Problem: Suppose we want to generate a $d$-dimensional random vector $X$ from a known target pdf $f(x)$ on $\mathbb{R}^d$. Assume that $g(x)$ is another pdf on $\mathbb{R}^d$ satisfying two conditions:
- We already know how to generate a random vector $V$ from $g(x)$, and
- There is a constant $\alpha$ such that $f(x) \le \alpha g(x)$ for every $x \in \mathbb{R}^d$.

Then we can apply the following general Rejection Method Algorithm:
(1) Generate $V \sim g$.
(2) Generate $Y \sim U[0, \alpha g(V)]$.
(3) If $Y \le f(V)$, then accept: set $X = V$ and stop. Otherwise, reject: return to step (1).

We call $g$ the trial or proposal pdf.
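The general algorithm translates directly into Python (a sketch; the argument names are ours):

```python
import random

def rejection_sample(f, g_sampler, g_pdf, alpha):
    """Rejection method; requires f(x) <= alpha * g(x) for all x."""
    while True:
        v = g_sampler()                            # (1) V ~ g
        y = random.uniform(0.0, alpha * g_pdf(v))  # (2) Y ~ U[0, alpha*g(V)]
        if y <= f(v):                              # (3) accept if Y <= f(V)
            return v
```

Steps (2)-(3) are equivalent to accepting when $U \le f(V)/(\alpha g(V))$ for $U \sim U(0,1)$, which is the form used in the examples below.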

Theorem 2.1 (The Foundation Theorem of the Rejection Method) If $X$ is generated via steps 1-3 of the rejection method above, then $X \sim f$.

Proof: Define the following two subsets of $\mathbb{R}^d \times \mathbb{R}$:

$$S = \{(x, y) : 0 \le y \le \alpha g(x)\} \quad \text{and} \quad B = \{(x, y) : 0 \le y \le f(x)\},$$

which are the regions below the graphs of $\alpha g$ and $f$, respectively. Our first observation is that steps 1 and 2 generate a random point $(V, Y)$ that is uniformly distributed on $S$: if $h(x, y)$ denotes the joint density of $(V, Y)$, then

$$h(x, y) = g(x)\, h(y \mid x) = g(x) \cdot \frac{1}{\alpha g(x)} = \frac{1}{\alpha} \quad \text{if } (x, y) \in S,$$

and $h(x, y) = 0$ otherwise. Let $(V^*, Y^*)$ be the accepted point, i.e., the first $(V, Y)$ that is in $B$. Then our first observation implies that $(V^*, Y^*)$ is uniformly distributed on $B$; i.e., its pdf is identically 1 on $B$ (since the volume of $B$ is 1). Hence, the marginal pdf of $X = V^*$ is

$$k(x) = \int_0^{f(x)} 1\, dy = f(x).$$

Efficiency of the General Rejection Method

For each proposal $(V, Y)$ obtained via steps 1 and 2,

$$P\{(V, Y) \text{ is accepted}\} = \frac{\operatorname{area}(B)}{\operatorname{area}(S)} = \frac{1}{\alpha}.$$

Therefore, the expected number of proposals needed is $\alpha$. In fact, the number of proposals needed has the geometric distribution with parameter $1/\alpha$. Thus, in the interests of efficiency, we would like to choose $g$ so that $\alpha$ is small (i.e., close to 1). Clearly, taking $\alpha = \sup_x f(x)/g(x)$ is the optimal $\alpha$, given $f$ and $g$.

Example 1: Simulate a distribution with probabilities

$$P = \{0.11, 0.12, 0.09, 0.08, 0.12, 0.10, 0.09, 0.09, 0.10, 0.10\}.$$

Whereas one possibility is to use the inverse transform algorithm, a better approach is to use the rejection method with $q$ being the discrete uniform density on $1, 2, \ldots, 10$; that is, $q_j = 1/10$, $j = 1, 2, \ldots, 10$. Choose $c = \max_j\{p_j/q_j\} = 1.2$, so the algorithm is as follows:
1. Generate a random number $U_1$ and set $Y = \operatorname{int}(10 U_1) + 1$.
2. Generate a second random number $U_2$.
3. If $U_2 \le p_Y/(c\, q_Y) = 10 p_Y/1.2$, set $X = Y$ and stop. Otherwise, return to step 1.
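A Python sketch of this discrete rejection sampler (names ours):

```python
import random

P = [0.11, 0.12, 0.09, 0.08, 0.12, 0.10, 0.09, 0.09, 0.10, 0.10]

def discrete_rejection():
    """Sample X in {1,...,10} with P(X = j) = P[j-1] by rejection."""
    c_q = 1.2 * 0.1  # c * q_j = 0.12
    while True:
        y = int(10 * random.random()) + 1      # Y uniform on {1,...,10}
        if random.random() <= P[y - 1] / c_q:  # accept w.p. p_Y / (c q_Y)
            return y
```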

Example 2: Use the rejection method to generate a random variable having density function

$$f(x) = 20x(1 - x)^3, \quad 0 < x < 1.$$

Step 1: Specify the proposal pdf. Since this random variable (Beta(2,4)) is concentrated in the interval $(0,1)$, let us consider the rejection method with $g(x) = 1$, $0 < x < 1$.

Step 2: Find the optimal $c$. To determine the constant $c$, we maximize the ratio

$$\frac{f(x)}{g(x)} = 20x(1 - x)^3.$$

Differentiating,

$$\frac{d}{dx}\left(20x(1 - x)^3\right) = 20\left[(1 - x)^3 - 3x(1 - x)^2\right].$$

Step 3: Specify the rejection function. Setting this equal to 0 shows that the maximal value is attained when $x = 1/4$, and thus

$$\frac{f(x)}{g(x)} \le 20\left(\frac{1}{4}\right)\left(1 - \frac{1}{4}\right)^3 = \frac{135}{64} \equiv c.$$

Hence,

$$\frac{f(x)}{c\, g(x)} = \frac{256}{27} x(1 - x)^3.$$

Step 4: Write down the generator algorithm. The rejection procedure is as follows:
1. Generate random numbers $U_1$ and $U_2$.
2. If $U_2 \le \frac{256}{27} U_1 (1 - U_1)^3$, stop and set $X = U_1$. Otherwise, return to step 1.

The average number of times that step 1 will be performed is $c = 135/64 \approx 2.11$.
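In Python (sketch, name ours):

```python
import random

def beta_2_4():
    """Beta(2,4), f(x) = 20x(1-x)**3, by rejection from Uniform(0,1)."""
    while True:
        u1, u2 = random.random(), random.random()
        if u2 <= 256.0 / 27.0 * u1 * (1.0 - u1) ** 3:  # f(V) / (c g(V))
            return u1
```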

Example 3: Suppose we want to generate a random variable having the Gamma(1.5, 1) density

$$f(x) = K x^{1/2} e^{-x}, \quad x > 0,$$

where $K = 1/\Gamma(1.5) = 2/\sqrt{\pi}$. Because such a random variable is concentrated on the positive axis and has mean 1.5, it is natural to try the rejection technique with an exponential random variable with the same mean. Hence, let

$$g(x) = \frac{2}{3} e^{-2x/3}, \quad x > 0.$$

We have

$$\frac{f(x)}{g(x)} = \frac{3K}{2} x^{1/2} e^{-x/3}.$$

Maximizing the ratio (the maximum is attained at $x = 3/2$), we get $c = 3^{3/2}(2\pi e)^{-1/2}$, and

$$\frac{f(x)}{c\, g(x)} = (2e/3)^{1/2} x^{1/2} e^{-x/3}.$$

So the algorithm is as follows:
1. Generate a random number $U_1$ and set $Y = -\frac{3}{2}\log U_1$.
2. Generate a random number $U_2$.
3. If $U_2 < (2eY/3)^{1/2} e^{-Y/3}$, set $X = Y$; otherwise, return to step 1.
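A Python sketch (name ours):

```python
import math
import random

def gamma_1_5():
    """Gamma(1.5, 1) by rejection from an exponential with mean 1.5."""
    while True:
        y = -1.5 * math.log(random.random())  # Y ~ Exp(rate 2/3)
        u2 = random.random()
        if u2 < math.sqrt(2.0 * math.e * y / 3.0) * math.exp(-y / 3.0):
            return y
```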

3 The Composition Method

Take a pdf $f$ and divide the region under the graph of $f$ into a finite number of subregions, say $S_1, \ldots, S_M$, with respective areas $\alpha_1, \ldots, \alpha_M$, so that $\sum_{i=1}^{M} \alpha_i = 1$. To generate $X \sim f$ via the composition method:
(1) Generate $I \in \{1, \ldots, M\}$ with pmf $(\alpha_1, \ldots, \alpha_M)$.
(2) Generate $(V, W)$ uniformly on $S_I$.
(3) Set $X = V$.

One can also describe this method by expressing the target pdf as a mixture of other pdfs $f_1, \ldots, f_M$, that is,

$$f = \sum_{i=1}^{M} \alpha_i f_i.$$

For example, suppose we want to simulate a random variable from the following distribution:

$$P(X = j) = \alpha p_j^{(1)} + (1 - \alpha) p_j^{(2)},$$

where $0 < \alpha < 1$. The algorithm is as follows:
1. Generate a random number $U$.
2. If $U < \alpha$, generate $X$ from $P^{(1)}$ and stop. Otherwise, go to step 3.
3. Generate $X$ from $P^{(2)}$ and stop.
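A two-component mixture sampler in Python (sketch; the component samplers are passed in as functions):

```python
import random

def mixture(alpha, sample1, sample2):
    """Sample from f = alpha * f1 + (1 - alpha) * f2, given samplers for f1, f2."""
    if random.random() < alpha:
        return sample1()
    return sample2()
```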

Example 1: Simulate a random variable from the following distribution:

$$p_j = P(X = j) = \begin{cases} 0.05 & \text{for } j = 1, 2, 3, 4, 5, \\ 0.15 & \text{for } j = 6, 7, 8, 9, 10. \end{cases}$$

Note that $p_j = 0.5\, p_j^{(1)} + 0.5\, p_j^{(2)}$, where

$$p_j^{(1)} = 0.1, \quad j = 1, \ldots, 10, \qquad p_j^{(2)} = \begin{cases} 0 & \text{for } j = 1, 2, 3, 4, 5, \\ 0.2 & \text{for } j = 6, 7, 8, 9, 10. \end{cases}$$

The algorithm is as follows:
1. Generate a random number $U_1$.
2. Generate a random number $U_2$.
3. If $U_1 < 0.5$, set $X = \operatorname{int}(10 U_2) + 1$; otherwise, set $X = \operatorname{int}(5 U_2) + 6$.
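The same example in Python (name ours):

```python
import random

def composed_discrete():
    """P(X = j) = 0.05 for j = 1..5 and 0.15 for j = 6..10, via composition."""
    u1, u2 = random.random(), random.random()
    if u1 < 0.5:
        return int(10 * u2) + 1  # p^(1): uniform on {1,...,10}
    return int(5 * u2) + 6       # p^(2): uniform on {6,...,10}
```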

4 The Polar Method for Generating Normal R.V.s

Let $X$ and $Y$ be independent standard normal random variables and let $R$ and $\theta$ denote the polar coordinates of the vector $(X, Y)$. That is,

$$R^2 = X^2 + Y^2, \qquad \tan\theta = \frac{Y}{X}.$$

Since $X$ and $Y$ are independent, we have the joint density

$$f(x, y) = \frac{1}{\sqrt{2\pi}} e^{-x^2/2} \cdot \frac{1}{\sqrt{2\pi}} e^{-y^2/2} = \frac{1}{2\pi} e^{-(x^2 + y^2)/2}.$$

To determine the joint density of $R^2$ and $\theta$, we make the change of variables

$$r = x^2 + y^2, \qquad \theta = \tan^{-1}\left(\frac{y}{x}\right).$$

We have (with the Jacobian of the transformation $J = 2$)

$$f(r, \theta) = \frac{1}{2} \cdot \frac{1}{2\pi} e^{-r/2}, \quad 0 < r < \infty, \; 0 < \theta < 2\pi. \tag{1}$$

However, as this is equal to the product of an exponential density having mean 2 (namely, $\frac{1}{2} e^{-r/2}$) and the uniform density on $(0, 2\pi)$, it follows that $R^2$ and $\theta$ are independent, with $R^2$ being exponential with mean 2 and $\theta$ being uniformly distributed over $(0, 2\pi)$.

We can now generate a pair of independent standard normal random variables $X$ and $Y$ by using (1) to first generate their polar coordinates and then transforming back to rectangular coordinates. The algorithm is as follows.

Box-Muller Algorithm:
1. Generate random numbers $U_1$ and $U_2$.
2. Set $R^2 = -2\log(U_1)$ and $\theta = 2\pi U_2$.
3. Let

$$X = R\cos\theta = \sqrt{-2\log U_1}\,\cos(2\pi U_2), \qquad Y = R\sin\theta = \sqrt{-2\log U_1}\,\sin(2\pi U_2). \tag{2}$$

The above transformation is known as the Box-Muller transformation. However, the algorithm is not very efficient, the reason being the need to compute the sine and cosine trigonometric functions. There is a way to get around this time-consuming difficulty by an indirect computation of the sine and cosine of a random angle, as follows.
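Transformation (2) in Python (sketch, name ours):

```python
import math
import random

def box_muller():
    """Return a pair of independent standard normals via Box-Muller."""
    u1, u2 = random.random(), random.random()
    r = math.sqrt(-2.0 * math.log(u1))  # R, where R^2 ~ Exp(mean 2)
    theta = 2.0 * math.pi * u2          # theta ~ Uniform(0, 2*pi)
    return r * math.cos(theta), r * math.sin(theta)
```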

An Efficient Generator: Generate $U_1$ and $U_2$ from $U(0,1)$, and set $V_1 = 2U_1 - 1$, $V_2 = 2U_2 - 1$. Then $(V_1, V_2)$ is uniformly distributed on the square of area 4 centered at $(0, 0)$. Suppose now that we continually generate such pairs $(V_1, V_2)$ until we obtain one that is contained in the circle of radius 1 centered at $(0, 0)$, that is, until $V_1^2 + V_2^2 \le 1$. It now follows that such a pair $(V_1, V_2)$ is uniformly distributed in the circle. If we let $R$ and $\theta$ denote the polar coordinates of this pair, then it is not difficult to verify that $R$ and $\theta$ are independent, with $R^2$ being uniformly distributed on $(0,1)$ and $\theta$ being uniformly distributed over $(0, 2\pi)$.

Since $\theta$ is thus a random angle, it follows that we can generate the sine and cosine of a random angle by generating a random point $(V_1, V_2)$ in the circle and setting

$$\sin\theta = \frac{V_2}{R} = \frac{V_2}{(V_1^2 + V_2^2)^{1/2}}, \qquad \cos\theta = \frac{V_1}{R} = \frac{V_1}{(V_1^2 + V_2^2)^{1/2}}.$$

Following the Box-Muller transformation, we can generate independent unit normals as follows:

$$X = (-2\log U)^{1/2}\, \frac{V_1}{(V_1^2 + V_2^2)^{1/2}}, \qquad Y = (-2\log U)^{1/2}\, \frac{V_2}{(V_1^2 + V_2^2)^{1/2}}. \tag{3}$$

Since $R^2 = V_1^2 + V_2^2$ is itself uniformly distributed over $(0,1)$ and is independent of the random angle $\theta$, we can use it as the random number $U$ needed in equation (3). Therefore, letting $S = R^2$, we obtain that

$$X = (-2\log S / S)^{1/2}\, V_1, \qquad Y = (-2\log S / S)^{1/2}\, V_2 \tag{4}$$

are independent unit normals when $(V_1, V_2)$ is a randomly chosen point in the circle of radius 1 centered at the origin, and $S = V_1^2 + V_2^2$. Summing up, the algorithm is as follows.

The Improved Box-Muller Algorithm:
1. Generate random numbers $U_1$ and $U_2$.
2. Set $V_1 = 2U_1 - 1$, $V_2 = 2U_2 - 1$, $S = V_1^2 + V_2^2$.
3. If $S > 1$, return to step 1.
4. Return the independent unit normals

$$X = (-2\log S / S)^{1/2}\, V_1, \qquad Y = (-2\log S / S)^{1/2}\, V_2.$$
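The improved algorithm in Python (sketch; the check $S > 0$ guards against the measure-zero event $S = 0$):

```python
import math
import random

def polar_method():
    """Return a pair of independent standard normals via the polar method."""
    while True:
        v1 = 2.0 * random.random() - 1.0
        v2 = 2.0 * random.random() - 1.0
        s = v1 * v1 + v2 * v2
        if 0.0 < s <= 1.0:  # (V1, V2) falls inside the unit circle
            factor = math.sqrt(-2.0 * math.log(s) / s)
            return factor * v1, factor * v2
```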

5 Multivariate Distributions

5.1 Multivariate Normal Distribution

$X \sim N_d(\mu, \Sigma)$ has the following density function:

$$p(x) = \frac{1}{(2\pi)^{d/2} |\Sigma|^{1/2}} \exp\left\{-\frac{(x - \mu)' \Sigma^{-1} (x - \mu)}{2}\right\}.$$

A direct way of generating random vectors from this distribution is to generate a $d$-vector of i.i.d. standard normal deviates $z = (z_1, z_2, \ldots, z_d)'$ and then form the vector

$$x = T' z + \mu,$$

where $T$ is a $d \times d$ matrix such that $T' T = \Sigma$. ($T$ could be a Cholesky factor of $\Sigma$, for example.) Then $x$ has a $N_d(\mu, \Sigma)$ distribution.

Another approach for generating the $d$-vector $x$ from $N_d(\mu, \Sigma)$ is to generate $x_1$ from $N_1(\mu_1, \sigma_{11})$, generate $x_2$ conditionally on $x_1$, generate $x_3$ conditionally on $x_1$ and $x_2$, and so on.
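The direct (Cholesky) approach as a Python sketch using NumPy; note that numpy.linalg.cholesky returns a lower-triangular $L$ with $L L' = \Sigma$, so $x = L z + \mu$ plays the role of $x = T' z + \mu$ above (function name ours):

```python
import numpy as np

def mvn_sample(mu, sigma, rng=None):
    """Draw one N_d(mu, Sigma) vector as x = L z + mu, where L L' = Sigma."""
    rng = np.random.default_rng() if rng is None else rng
    L = np.linalg.cholesky(np.asarray(sigma))  # lower-triangular factor
    z = rng.standard_normal(len(mu))           # d i.i.d. standard normals
    return L @ z + np.asarray(mu)
```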

5.2 Multinomial Distribution

The probability function for the $d$-variate multinomial distribution is

$$p(x) = \frac{n!}{\prod_j x_j!} \prod_j \pi_j^{x_j},$$

for $\sum_j \pi_j = 1$, $x_j \ge 0$, and $\sum_j x_j = n$.

To generate a multinomial, a simple way is to work with the marginals; they are binomials. The generation is done sequentially: each succeeding conditional marginal is binomial. For efficiency, the first marginal considered would be the one with the largest probability.
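A sequential-binomial sketch in Python (names ours; for clarity the binomial draw sums Bernoulli trials, though the inversion generator of Section 1.1 would also serve):

```python
import random

def binomial_draw(n, p):
    """Binomial(n, p) as a sum of n Bernoulli(p) trials (simple, O(n))."""
    return sum(random.random() < p for _ in range(n))

def multinomial(n, pis):
    """Multinomial(n; pi_1, ..., pi_d) via sequential conditional binomials."""
    counts = []
    remaining_n, remaining_p = n, 1.0
    for pi in pis[:-1]:
        # Conditionally on earlier counts, X_j ~ Binomial(remaining_n, pi/remaining_p).
        x = binomial_draw(remaining_n, min(1.0, pi / remaining_p))
        counts.append(x)
        remaining_n -= x
        remaining_p -= pi
    counts.append(remaining_n)  # the last count is determined
    return counts
```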