CPSC 405 Random Variate Generation 2007W T1 Handout

These notes present techniques to generate samples of desired probability distributions, and some fundamental results and techniques. Some of this material can be found in Chapter 8 (9 in 2nd ed.) of the book.

1 Inverse Transform

Suppose we can generate a uniform random number r on [0, 1]. How can we generate numbers x with a given pdf P(x)?

To warm up our brain, let's first think about something else. Suppose we generate a uniform random number 0 < r < 1 and square it. So we have x = r². Clearly we also have 0 < x < 1. What is the pdf P(x) for x? A wild guess might be that it is just the square of the pdf for r, so x would also be uniform. It is however easy to see that this can't be true. Consider the probability p that x < 1/2. If x were uniform on [0, 1] we would get p = 1/2. But in order to get an x < 1/2 we must have gotten an r < 1/√2. The probability for this is p = 1/√2, not p = 1/2. So P(x) can't be uniform.

To figure out what P(x) is, consider the probability p that x falls in the interval [x, x + Δx]. In the limit Δx → 0 we have, according to the definition of a pdf, p = P(x)Δx. So if we can figure out p, we can compute P(x). For x to fall in [x, x + Δx], we must generate r in the range [√x, √(x + Δx)]. Since r is uniform, the probability for this (which is also p) is just the length of this interval. So we get p = √(x + Δx) − √x, and we now use p = P(x)Δx and solve for P(x), obtaining

    P(x) = (√(x + Δx) − √x) / Δx.

Taking the limit Δx → 0 we thus obtain

    P(x) = lim_{Δx→0} (√(x + Δx) − √x) / Δx = d/dx √x = 1/(2√x).

With this result, we now have an instance of the inverse transform method to generate random numbers with pdf 1/(2√x): generate a uniform random number r and square it.

The general form of the inverse transform method is obtained by computing the pdf of x = g(r) for some function g, and then trying to find a function g such that the desired pdf is obtained. Let us assume that g(r) is invertible with inverse g⁻¹(x).
The chance that x lies in the interval [x, x + dx] is P(x)dx, for infinitesimal dx. What values of r should we have gotten to get this? (Remember we are generating values x by calling a uniform random number generator to get r and then setting x = g(r).) We should have gotten an r in the interval [r, r + dr], with r = g⁻¹(x) and r + dr = g⁻¹(x + dx). Writing out the last formula to first order in dx gives

    r + dr = g⁻¹(x) + (g⁻¹)′(x) dx,
where the prime denotes the derivative. Using r = g⁻¹(x) we can simplify this to

    dr = (g⁻¹)′(x) dx.    (1)

The probability for the r value to be in the interval [r, r + dr] is just dr. This is also the probability for x to be in [x, x + dx], which is P(x)dx. Using Eq. 1, we thus get

    P(x)dx = (g⁻¹)′(x) dx,    so    P(x) = (g⁻¹)′(x).

Integrating both sides and remembering that F(x) = ∫_{−∞}^{x} P(y)dy gives us

    F(x) = g⁻¹(x),    or    g(x) = F⁻¹(x),

provided F(x) has an inverse. In summary, to generate a number x with pdf P(x) using the inverse transform method, we first figure out the cdf F(x) from P(x). We then invert that by solving r = F(x) for x, which gives the function F⁻¹(r). We then generate a uniform random number 0 < r < 1 and compute x = F⁻¹(r).

2 Pdf of a function of a random variable

Suppose x has pdf P(x). What is the pdf Q(y) of y = g(x)? The chance for x to be in [x, x + dx] is P(x)dx. Then y is in [y, y + dy], with y = g(x) and dy = g′(x)dx. The chance for y to be in that interval is by definition Q(y)dy. So we get, noting that g′ can be negative,

    Q(y)|g′(x)|dx = P(x)dx,    so    Q(y) = P(x)/|g′(x)| = P(g⁻¹(y)) |(g⁻¹)′(y)|.    (2)

Note that we have made an implicit assumption here that g(x) is monotone. If it is not, several intervals in x can map onto the same values of y, and the inverse does not exist. This would complicate matters, and we shall not deal with this problem. If g(x) is monotone, the derivative is either everywhere negative or everywhere positive (or zero), which means the derivative of the absolute value is the absolute value of the derivative, which was used in Eq. 2.

Let's try it on a familiar example: P(x) is 1 on [0, 1] (i.e., the uniform distribution) and g(x) = (b − a)x + a. We know already that the result should be a uniform pdf on [a, b]. Inverting g(x) gives g⁻¹(y) = (y − a)/(b − a), and (g⁻¹)′(y) = 1/(b − a), so Q(y) = 1/(b − a), which is correct.
A linear transformation is often used to get a normal distribution with given µ and σ from the standard normal with µ = 0 and σ = 1,

    N(t) = (1/√(2π)) e^(−t²/2).

If z denotes a standard normal variate, then x = µ + σz is normally distributed with that mean and standard deviation, as can be verified easily by using Eq. 2.

3 Constructing the pdf from measured data

In this section I will present an algorithm to generate samples from a continuous distribution for which we only know a finite number of measured data points. Suppose we have measured some parameter x in a system, and have recorded N + 1 values which we have sorted in increasing order: x_1, x_2, ..., x_{N+1}. Our task is now to create samples from the unknown pdf P(x) underlying this data. There are several ways to do this, and since we have only limited data we will have to make some guesses.

Let us denote the N intervals by I_k = [x_k, x_{k+1}] for k = 1, ..., N. We now want to assign equal probabilities to each of the intervals. Regions with small interval sizes will then have a higher P(x), as expected: there are more intervals per unit length there, and all intervals are treated as equally probable.

Let's begin with our old friend, the uniform random number generator on [0, 1], and divide the unit interval into N equal intervals R_k = [(k − 1)/N, k/N], with k = 1, ..., N. The plan is now to generate r, figure out which interval R_k it is in, look up the corresponding interval I_k, and generate an appropriate value of x in that interval I_k, depending on the location of r within R_k. If r is in the left part of R_k, a value from the left side of I_k will be generated, and the other way around. See Figure 1.

Here is MATLAB code (empgen.m) that does it. It takes a vector data with a (sorted) data sample and returns a random number y generated from the empirical pdf. Note that the empirical pdf is never explicitly constructed.
function y = empgen(data)
N = length(data)-1;
r = rand;
% k is the interval index
k = 1 + floor(N*r);
% relative offset in interval (0-1)
offset = r*N - (k-1);
% map to appropriate value of data
y = data(k)*(1-offset) + data(k+1)*offset;
Figure 1: Mapping from the uniform random number r to the value x based on the measured data points.

4 Convolutions

Let

    y = Σ_{k=1}^{K} r_k,

where the r_k are taken from some (fixed) distribution P(r). The pdf Q(y) of y is called a convolution of the distribution P(r). Note that y is just a sum, and a sum is the average up to a constant. There is in general no simple way to compute Q(y), but it can be written down formally. It goes like this. Consider the K-dimensional space R spanned by the r_k. Let −∞ < r_k < ∞, where P(r) is possibly zero on big regions of R. The equation Σ_{k=1}^{K} r_k = y, for fixed y, defines a hyperplane H in R. So Q(y) is just the probability density that the r_k lie on the hyperplane H, which is

    Q(y) = ∫_H P(r_1)P(r_2) ... P(r_K) d^{K−1}r,    (3)

which is a hypersurface integral.

For example, consider K = 2 and take P(r) to be uniform on [0, 1]. The domain H is defined by the equation r_1 + r_2 = y, together with the conditions 0 < r_1 < 1 and 0 < r_2 < 1. This defines a straight line segment, which intersects the r_1 and r_2 axes at y, with 0 < y < 2. Equation 3 now reads

    Q(y) = ∫_H ds,

which is the length of the line segment times some constant we don't worry about here. The length of the line segment plotted as a function of y is just a triangle with peak at y = 1: the triangular distribution.
(a) Integral over H    (b) Resulting triangular distribution

Figure 2: The sum of two uniform random numbers obeys a triangular distribution.

Convolution gives us an easy way to generate the Erlang distribution, as it is defined as the distribution of a sum of exponentially distributed variables. Note that if K is large, we will always generate an approximately normal distribution (by the central limit theorem), as the convolution is just the mean up to a multiplicative constant.

5 Acceptance-rejection

This is sometimes an easy and fast method to program. Suppose we want to generate uniform random numbers on [c, 1]. We could generate r on [0, 1], accept it if r ≥ c, and reject it and try again otherwise. In pseudo-C, assuming rand() returns a uniform double on [0, 1):

double f1(double c)
{
    double r;
    do {
        r = rand();
    } while (r < c);
    return r;
}

Let's compare this to our inverse transform technique:

double f2(double c)
{
    double r;
    r = c + (1 - c)*rand();
    return r;
}

Which one is faster? The chance that f1 will generate a wrong value precisely n times is given by p = (1 − c)cⁿ: the chance of getting n wrong values is cⁿ, and the chance of
getting the right value at the end is 1 − c. The expected value of w, the number of times a wrong value is generated, is thus

    ⟨w⟩ = Σ_{n=0}^{∞} n(1 − c)cⁿ = c/(1 − c).

Suppose now the functions are called N times. On average we have to call rand() N(1 + ⟨w⟩) = N/(1 − c) times, and every time we have to do a compare. Method 2 (f2), on the other hand, always needs to call rand() only once, but it has to do an addition, subtraction, and a multiplication every time. Which is faster clearly depends on c. Let's work it out. Let T_R be the computation time for rand(), T_A the time to do an addition, subtraction, and a multiplication, and T_C the time to do the comparison. If we denote the time spent per call for the two algorithms by T_1 and T_2, we have

    T_1 = (T_R + T_C)/(1 − c)    and    T_2 = T_R + T_A.

The acceptance-rejection algorithm is faster if T_1 < T_2, which we can rewrite as

    (c/(1 − c)) T_R + T_C/(1 − c) < T_A.

For example, if c = 0.1 we get approximately

    T_C + 0.1 T_R < T_A,

which is probably satisfied if we use an LCG (linear congruential generator) for rand(), as it requires about the same time as T_A.

This method is of course not only applicable to uniform distributions. Here's a more realistic example. IQs are normally distributed with mean 100 and standard deviation 15. This is of course an approximation, and in particular N(x, 100, 15) can generate negative IQs. So if we want to generate a sample of IQs we could try to use the inverse transform technique for a cut-off normal distribution. However, it is much simpler and faster to use acceptance-rejection here and just try again if you should get a negative value. In fact, the chance of this happening is only about 10⁻¹¹!