MTH739U/P: Topics in Scientific Computing Autumn 16 Week 6

4.5 Generic algorithms for non-uniform variates

We have seen that sampling from a uniform distribution in [0, 1] is a relatively straightforward process, and all the programming languages, libraries, and environments for scientific computing normally provide either a function to sample uniformly in [0, 1] (that function is called rand() in Octave/Matlab) or a function to sample a value uniformly at random among a set of integers (as for instance the function randi() in Octave/Matlab). However, in many practical situations one needs to draw random samples from other distributions, not from uniform distributions. In this section we review a few algorithms to sample from non-uniform distributions. In particular, we first focus on two algorithms that can be used (at least in principle) to sample from any generic distribution, namely the Inverse Function method and the Acceptance-Rejection method. Then we present two algorithms which allow us to sample efficiently from a Gaussian distribution, namely the Box-Muller algorithm and the Marsaglia polar method. Incidentally, all these algorithms are based on the possibility of drawing samples from a uniform distribution, which are then appropriately transformed in order to guarantee that the resulting values are distributed according to the desired probability density function (or probability mass function, in the case of discrete event spaces).

4.6 Inverse Function method

Let us consider a continuous random variable X with probability density function f(x) on a sample space Ω. The cumulative distribution function (CDF) is defined as the integral of the probability density function f(x):

F(x) = P(X ≤ x) = ∫_{-∞}^{x} f(t) dt.

Note that, since f(x) is a probability density function, F(x) takes values between 0 and 1, and is monotonically increasing (see Fig. 1).
Similarly, if the random variable X is discrete, the cumulative distribution function is:

F(x) = P(X ≤ x) = Σ_{x_i ≤ x} P(X = x_i).

We now consider F as a function of the random variable X,

Y = F(X),

which gives us a new random variable Y. Note that for any random variable X with probability density function f(x), the random variable Y spans the interval [0, 1] uniformly at random. Therefore, if we find the inverse of F, and apply that to a uniform
distribution, we can generate a random variable with density f(x). In particular, we mean that if we consider X = F⁻¹(Y) with Y ~ U(0, 1), then the value X that we obtain has probability density function f(x). Notice that we know that F⁻¹ always exists, by definition, since F is a monotonically increasing function.

Figure 1: Probability density function (PDF) and cumulative distribution function (CDF) of the Normal distribution.

For the sake of simplicity, and without loss of generality, we will focus in the following on the case of scalar real-valued probability density functions, i.e. f: ℝ → ℝ. The Inverse Function method to generate a random variable with probability density function f(x) consists of the following steps:

1. Calculate the CDF associated to f(x) as F(x) = ∫_{-∞}^{x} f(t) dt.
2. Find the inverse function X = F⁻¹(Y).
3. Sample the random variable u from U(0, 1), i.e. from the uniform distribution in [0, 1].
4. Return the sample Z = F⁻¹(u).
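As an illustration, the four steps above can be sketched in a few lines of Python (a hedged sketch, not part of the original notes; the density f(x) = 2x on [0, 1] and its inverse CDF F⁻¹(u) = √u are our own toy example):

```python
import math
import random

def inverse_function_sample(inv_cdf):
    """Inverse Function method: draw one sample from the distribution
    whose inverse CDF is inv_cdf (a function of u in (0, 1))."""
    u = random.random()    # step 3: u ~ U(0, 1)
    return inv_cdf(u)      # step 4: Z = F^{-1}(u)

# Toy example: f(x) = 2x on [0, 1] has CDF F(x) = x^2,
# hence F^{-1}(u) = sqrt(u); the mean of this density is 2/3.
random.seed(0)
samples = [inverse_function_sample(math.sqrt) for _ in range(100000)]
mean = sum(samples) / len(samples)
```

The empirical mean of the draws should be close to 2/3, the mean of f(x) = 2x on [0, 1].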
4.6.1 Example 1: exponential distribution

As an example, let us consider the exponential distribution:

f(x) = λ e^{-λx}, x ≥ 0.

It is easy to verify that f(x) satisfies all the properties of a probability density function, namely f(x) ≥ 0 for all x ≥ 0, and:

∫_0^{∞} λ e^{-λx} dx = [-e^{-λx}]_0^{∞} = 1.

So, following the procedure sketched above:

1. We compute the cumulative distribution function (CDF) associated to f(x):

F(x) = ∫_0^{x} f(t) dt = λ ∫_0^{x} e^{-λt} dt = [-e^{-λt}]_0^{x} = 1 - e^{-λx}.

2. We find the inverse of the CDF, x = F⁻¹(u):

u = F(x) = 1 - e^{-λx}
e^{-λx} = 1 - u
x = -(1/λ) log(1 - u)
F⁻¹(u) = -(1/λ) log(1 - u).

3. We sample a variable u ~ U(0, 1), and compute Z = F⁻¹(u).

4. The random variable Z has probability density function f(x)!

A possible Octave/Matlab implementation of the algorithm for generating a random variable with an exponential distribution will look like:

function z = exp_sample(lambda)
  u = rand;
  z = -1/lambda * log(1-u);
end
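As a quick sanity check (a Python translation of the same sampler, not part of the original notes), the exponential distribution with rate λ = 2 should give sample mean ≈ 1/λ = 0.5 and sample variance ≈ 1/λ² = 0.25:

```python
import math
import random

def exp_sample(lam):
    """Inverse-transform sampler for the exponential distribution,
    mirroring the Octave exp_sample above."""
    u = random.random()                  # u ~ U(0, 1)
    return -math.log(1.0 - u) / lam     # Z = F^{-1}(u)

# For lambda = 2: mean should be 1/lambda = 0.5, variance 1/lambda^2 = 0.25.
random.seed(42)
samples = [exp_sample(2.0) for _ in range(200000)]
mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / len(samples)
```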
4.6.2 Example 2: Rayleigh distribution

Another example in which the Inverse Function method works seamlessly is that of the Rayleigh distribution, whose probability density function is:

f(x) = x e^{-x²/2}, x ≥ 0.

It is easy to verify that f(x) has all the properties of a probability density function, namely f(x) ≥ 0 for all x ≥ 0, and

∫_0^{+∞} x e^{-x²/2} dx = [-e^{-x²/2}]_0^{+∞} = 1.

We apply the same procedure to sample from the Rayleigh distribution using the Inverse Function method, namely:

i) We first compute the CDF of the Rayleigh distribution:

F(x) = ∫_0^{x} f(t) dt = ∫_0^{x} t e^{-t²/2} dt = 1 - e^{-x²/2}.

ii) Compute the inverse of the CDF:

u = F(x) = 1 - e^{-x²/2}
1 - u = e^{-x²/2}
x = √(-2 ln(1 - u)).

iii) Then we sample u from the uniform distribution U(0, 1).

iv) The random variable Z = F⁻¹(u) is Rayleigh distributed.

A possible Octave/Matlab implementation of a function to sample from the Rayleigh distribution would look like this:

function z = sample_rayleigh()
  u = rand;
  z = sqrt(-2 * log(1-u));
end

4.7 Acceptance-Rejection method

The Inverse Function method looks indeed pretty general, and in principle it could be used to sample from any probability density function, since the existence (and invertibility) of the CDF associated to a certain PDF is guaranteed by the definition of CDF. The only caveat is that, although the CDF can always be computed, it might not be possible
to express the CDF in a closed form. This happens, for instance, for all the probability density functions which do not have an elementary primitive function. A typical example is that of the Gaussian distribution:

f(x) = (1/(σ√(2π))) e^{-(x-µ)²/(2σ²)},

which does not have a simple primitive function. In fact, the cumulative distribution function is the integral function:

F(x) = ∫_{-∞}^{x} (1/(σ√(2π))) e^{-(t-µ)²/(2σ²)} dt,

which does not have an explicit inverse. Even if the cumulative distribution function cannot be expressed in terms of elementary functions, it is still possible to estimate the inverse of the CDF by numerically solving the corresponding integral, as happens for instance for the Gaussian distribution, but this would in general require additional heavy computations.

Figure 2: Graphical interpretation of the Acceptance-Rejection method to sample from continuous probability density functions.

The Acceptance-Rejection method is a concrete alternative to the Inverse Function method whenever computing the inverse of the CDF is difficult. Assuming that we want to draw random samples from the probability density function f(x), x ∈ Ω, the Acceptance-Rejection method consists of the following steps:

1. Find a function g(x) such that g(x) ≥ f(x) for all x ∈ Ω, and define A_g = ∫_Ω g(x) dx.
2. Sample a random variable x̄ from the probability density function g(x)/A_g, e.g. using the Inverse Function method.
3. Sample a random variable ȳ uniformly in [0, g(x̄)].
4. If ȳ ≤ f(x̄), then accept the sample and return x̄; otherwise, reject the sample and start again from step 2 above.

5. The accepted samples will be distributed according to f(x).

Despite looking quite obscure at first glance, this procedure has a very intuitive geometrical interpretation, as sketched in Fig. 2. Let us call F and G, respectively, the regions between the x-axis and the functions f(x) and g(x). Recall that the normalisation condition for the probability density function f(x),

∫_Ω f(x) dx = 1,

implies that the area of the region F is A_f = 1 (light-grey area in Fig. 2). In general, if we sample a point (x̄, ȳ) uniformly at random within the region G, then the abscissa x̄ will be distributed according to g(x)/A_g. In fact, by definition, the probability that the abscissa lies within the interval [x̄ - δx, x̄ + δx] is equal to the ratio between the area of the shaded dark-grey region in Fig. 2, which is equal to 2δx g(x̄), and the area of the whole region G under the function g(x), which is equal to A_g. Hence we have P(x̄ - δx < x < x̄ + δx) = 2δx g(x̄)/A_g. In particular, the probability that the point (x̄, ȳ) sampled uniformly at random in G lies between the x-axis and the curve f(x) is just equal to the ratio f(x̄)/g(x̄). Hence, if we sample a point (x̄, ȳ) uniformly at random in G, and ȳ < f(x̄), then the abscissa x̄ will be distributed with probability density function f(x). Notice that, in order to sample a point uniformly at random in the region G, we just need to sample its first coordinate x̄ from g(x)/A_g, and its second coordinate ȳ uniformly at random between 0 and g(x̄). Then, the Acceptance-Rejection method accepts the sample (i.e., the abscissa x̄) only if ȳ < f(x̄). The fundamental hypothesis of the Acceptance-Rejection method is that we are able to draw random samples from g(x)/A_g, for instance by using the Inverse Function method.
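The procedure above can be sketched in Python as follows (an illustrative example, not from the notes: the target density f(x) = 2(1 - x) on [0, 1] and the constant envelope g(x) = 2 are our own choices; with this envelope, g(x)/A_g is just U(0, 1) and the acceptance rate is 1/A_g = 1/2):

```python
import random

def accept_reject(f, g, sample_g):
    """Acceptance-Rejection sampling: g(x) >= f(x) on the support,
    and sample_g() draws from the normalised density g(x)/A_g."""
    while True:
        x = sample_g()                    # step 2: x ~ g(x)/A_g
        y = random.uniform(0.0, g(x))     # step 3: y ~ U(0, g(x))
        if y <= f(x):                     # step 4: accept, else retry
            return x

# Toy target: f(x) = 2(1 - x) on [0, 1], constant envelope g(x) = 2,
# so g/A_g = U(0, 1), A_g = 2, and the mean of f is 1/3.
random.seed(3)
f = lambda x: 2.0 * (1.0 - x)
g = lambda x: 2.0
samples = [accept_reject(f, g, random.random) for _ in range(100000)]
mean = sum(samples) / len(samples)
```

Note that roughly half of the proposed points are rejected here, consistent with the acceptance rate 1/A_g = 1/2 discussed below.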
This means that, in principle, we could set g(x) equal to any multiple of a distribution function having the same event space Ω as f(x), such that g(x) ≥ f(x) for all x ∈ Ω. However, it is evident that the only points which will lead to an accepted (successful) sample are those which fall in the region F below the function f(x) (indicated in light grey in Fig. 2). This means that the acceptance rate, i.e. the fraction of points which yield an accepted sample, is equal to 1/A_g. Consequently, in order to maximise the acceptance rate and to avoid performing too many useless computations, we would have to minimise the area A_g, i.e. we would like g(x) to be as close as possible to f(x). In principle, it would be possible to use the simple Acceptance-Rejection method to sample from a Gaussian distribution; in practice, however, that would not be very efficient. In the following sections we will review two methods specifically tailored to sample random variables from a Gaussian distribution.

4.8 A note on the Gaussian distribution

Recall the Gaussian PDF with mean µ and variance σ²:

f(x) = (1/(σ√(2π))) e^{-(x-µ)²/(2σ²)}.
Figure 3: Gaussian probability density with mean µ and standard deviation σ.

Let us now define a new variable z = (x - µ)/σ. Using the fact that probability is always conserved under a transformation of variables, we can show that

∫ f(x) dx = ∫ (1/(σ√(2π))) e^{-((x-µ)/σ)²/2} dx = ∫ (1/√(2π)) e^{-z²/2} dz = ∫ f(z) dz,

noting that dx/dz = σ. So, f(z) is a Gaussian with µ = 0 and σ = 1. In other words, the random variable Z ~ N(0, 1). Therefore, the random variable X = σZ + µ follows a Gaussian PDF with mean µ and variance σ². This means that if we can generate a random sample Z from the distribution N(0, 1) (i.e., from a Gaussian with µ = 0 and σ = 1), then we can generate a sample X from a Gaussian distribution with expected value µ and variance σ² by computing:

X = σZ + µ.

So the problem of generating random samples from a generic Gaussian distribution is reduced to the problem of sampling from N(0, 1). But how can we generate Z ~ N(0, 1)? Using the inversion method does not work, as the Gaussian CDF is not analytically tractable and can be inverted only numerically. Several methods have been proposed to generate Gaussian random variables in an efficient way, and here we will consider two of them.

4.9 Box-Muller method

Let us consider two random variables independently drawn from the Gaussian distribution, X ~ N(0, 1) and Y ~ N(0, 1). If we consider X and Y as coordinates in two dimensions,
we can define a further two random variables R and θ, corresponding to the polar representation of the coordinates X and Y:

R = √(X² + Y²), θ = arctan(Y/X),

and, conversely,

X = R cos θ, Y = R sin θ.

It can be shown that R is drawn from a Rayleigh distribution and has PDF

f(r) = r e^{-r²/2},

and that θ follows a uniform distribution U(0, 2π).

Figure 4: Rayleigh distribution, f(x) = x e^{-x²/2}.

To prove the above statements, let us consider the joint PDF f(x, y). The integral ∫_a^b ∫_c^d f(x, y) dy dx is the probability that X lies between a and b and Y lies between c and d. We want to transform the variables X, Y to polar coordinates R, θ. Since the two variables X and Y are drawn independently, the joint probability density function f(x, y) factorises into the product of the two corresponding marginal densities. In other words, f(x, y) = f(x) f(y), hence:

f(x, y) dx dy = (1/(2π)) e^{-(x²+y²)/2} dx dy.
Then we change variables, i.e. we write:

f(x, y) dx dy = (1/(2π)) e^{-r²/2} J(x, y; θ, r) dr dθ,

where the determinant of the Jacobian is

J(x, y; θ, r) = det | ∂x/∂r  ∂y/∂r | = det |  cos θ     sin θ   |
                    | ∂x/∂θ  ∂y/∂θ |       | -r sin θ   r cos θ |,

noting that x = r cos θ and y = r sin θ. So, the determinant is

J(x, y; θ, r) = r cos²θ + r sin²θ = r.

Therefore,

f(x, y) dx dy = f(r, θ) dr dθ, with f(r, θ) = (r/(2π)) e^{-r²/2}.

The marginal PDFs are:

f(r) = ∫_0^{2π} f(r, θ) dθ = r e^{-r²/2},
f(θ) = ∫_0^{∞} f(r, θ) dr = 1/(2π).

The Box-Muller method uses these results, and works as follows:

1. Sample R from the Rayleigh distribution, e.g. by using the Inverse Function method described in the previous section.
2. Sample θ from the uniform distribution U(0, 2π) in [0, 2π]. This is simply done by multiplying a uniformly distributed variable by 2π: θ = 2π U(0, 1) ~ U(0, 2π).
3. Compute X = R cos θ and Y = R sin θ.

This gives two independent random variables X and Y that both follow the Gaussian distribution N(0, 1).
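As a numerical sanity check (a Python sketch, not part of the original notes, using only the standard library), we can verify that the three steps above produce samples with approximately zero mean and unit variance:

```python
import math
import random

def box_muller():
    """One Box-Muller draw: returns two independent N(0, 1) samples."""
    u1 = random.random()
    u2 = random.random()
    r = math.sqrt(-2.0 * math.log(1.0 - u1))   # step 1: R ~ Rayleigh, via F^{-1}
    theta = 2.0 * math.pi * u2                 # step 2: theta ~ U(0, 2*pi)
    return r * math.cos(theta), r * math.sin(theta)

# Empirical check: both outputs should have mean ~ 0 and variance ~ 1.
random.seed(1)
pairs = [box_muller() for _ in range(100000)]
xs = [p[0] for p in pairs]
mean_x = sum(xs) / len(xs)
var_x = sum((v - mean_x) ** 2 for v in xs) / len(xs)
```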
Algorithm

A possible Octave/Matlab implementation of the algorithm for generating two Gaussian-distributed random variables using the Box-Muller method is as follows:

function [x,y] = box_muller()
  u1 = rand;
  u2 = rand;
  r = sqrt( -2 * log(u1) );
  v = 2 * pi * u2;
  x = r * cos(v);
  y = r * sin(v);
end

4.10 Marsaglia polar method

The Box-Muller method is not very efficient in practice, since it relies on the computation of trigonometric functions (which are normally relatively slow). To get around this, Marsaglia proposed a polar method for computing Gaussian random variables in the following way. Let us consider two variables uniformly distributed between -1 and 1:

X ~ U(-1, 1), Y ~ U(-1, 1).

Let us draw pairs of these, and accept both only if X² + Y² ≤ 1, or reject both otherwise. We will then have random points that are uniformly distributed on a disk with radius 1. See figure below.

Figure 5: Marsaglia's polar method: unit disc.

Expressing these in polar coordinates, we get two random variables:

ω = X² + Y² ~ U(0, 1), θ = arctan(Y/X) ~ U(0, 2π).
Since ω is uniformly distributed in [0, 1], R = √(-2 ln ω) is Rayleigh distributed, and we can compute two variables that are Gaussian distributed:

z₁ = R cos θ = √(-2 ln ω) X/√ω,
z₂ = R sin θ = √(-2 ln ω) Y/√ω,

where X = √ω cos θ and Y = √ω sin θ. A possible Octave/Matlab implementation of the algorithm to generate Gaussian random variables following Marsaglia's polar method is the following:

function [z1,z2] = marsaglia()
  w = 2;
  while w > 1
    x = 2 * rand - 1;   %% U(-1,1)
    y = 2 * rand - 1;   %% U(-1,1)
    w = x*x + y*y;
  end
  z1 = x * sqrt( -2 * log(w) / w );
  z2 = y * sqrt( -2 * log(w) / w );
end

Notice that the Marsaglia polar method is a specific instance of an acceptance-rejection procedure. In fact, the method requires sampling in the square [-1, 1] × [-1, 1], but accepts only the points within the unit disc. It is easy to realise that the acceptance ratio of the Marsaglia polar method is equal to the ratio between the area of the unit disc and the area of the circumscribed square, i.e. π/4 ≈ 0.785. This means that the Marsaglia polar method rejects about 21.5% of the samples.
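For completeness, here is a hedged Python translation of the Octave code above (standard library only, not part of the original notes), together with an empirical check of the output statistics:

```python
import math
import random

def marsaglia():
    """One Marsaglia polar draw: returns two independent N(0, 1) samples,
    with no trigonometric function calls."""
    while True:
        x = 2.0 * random.random() - 1.0    # X ~ U(-1, 1)
        y = 2.0 * random.random() - 1.0    # Y ~ U(-1, 1)
        w = x * x + y * y
        if 0.0 < w <= 1.0:                 # keep points inside the unit disc
            break                          # (w == 0 would break log(w)/w)
    factor = math.sqrt(-2.0 * math.log(w) / w)
    return x * factor, y * factor

# Empirical check: the outputs should have mean ~ 0 and variance ~ 1.
random.seed(7)
values = [z for pair in (marsaglia() for _ in range(50000)) for z in pair]
mean = sum(values) / len(values)
var = sum((v - mean) ** 2 for v in values) / len(values)
```

The rejection loop discards about 21.5% of the candidate pairs, as computed above, but each accepted pair yields two Gaussian samples without any call to cos or sin.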