MTH739U/P: Topics in Scientific Computing Autumn 2016 Week 6


4.5 Generic algorithms for non-uniform variates

We have seen that sampling from a uniform distribution in [0, 1] is a relatively straightforward process, and all programming languages, libraries, and environments for scientific computing normally provide either a function to sample uniformly in [0, 1] (that function is called rand() in Octave/Matlab) or a function to sample uniformly at random from a set of integers (for instance randi() in Octave/Matlab). However, in many practical situations one needs to draw random samples from distributions other than the uniform one. In this section we review a few algorithms to sample from non-uniform distributions. In particular, we first focus on two algorithms that can be used (at least in principle) to sample from any generic distribution, namely the Inverse Function method and the Acceptance-Rejection method. Then we present two algorithms which allow us to sample efficiently from a Gaussian distribution, namely the Box-Muller algorithm and the Marsaglia polar method. All these algorithms rely on the ability to draw samples from a uniform distribution, which are then appropriately transformed so that the resulting values are distributed according to the desired probability density function (or probability mass function, in the case of discrete sample spaces).

4.6 Inverse Function method

Let us consider a continuous random variable X with probability density function f(x) on a sample space Ω. The cumulative distribution function (CDF) is defined as the integral of the probability density function f(x):

F(x) = P(X ≤ x) = ∫_{-∞}^{x} f(t) dt.

Note that, since f(x) is a probability density function, F(x) takes values between 0 and 1 and is monotonically increasing (see Fig. 1). Similarly, if the random variable X is discrete, the cumulative distribution function is:

F(x) = P(X ≤ x) = Σ_{x_i ≤ x} P(X = x_i).

We now consider F as a function of the random variable X:

Y = F(X),

which gives us a new random variable Y. Note that for any random variable X with probability density function f(x), the random variable Y is uniformly distributed on the interval [0, 1]. Therefore, if we find the inverse of F and apply it to a uniform distribution, we can generate a random variable with density f(x).

Figure 1: Probability density function (PDF) and cumulative distribution function (CDF) of the normal distribution.

In particular, we mean that if we consider

X = F⁻¹(Y), with Y ∼ U(0, 1),

then the value X that we obtain has probability density function f(x). Notice that F⁻¹ always exists, by definition, since F is a monotonically increasing function. For the sake of simplicity, and without loss of generality, in the following we will focus on the case of scalar real-valued probability density functions, i.e. f : R → R.

The Inverse Function method to generate a random variable with probability density function f(x) consists of the following steps (a generic implementation sketch is given after the list):

1. Calculate the CDF associated with f(x) as F(x) = ∫_{-∞}^{x} f(t) dt
2. Find the inverse function X = F⁻¹(Y)
3. Sample a random variable u from U(0, 1), i.e. from the uniform distribution on [0, 1]
4. Return the sample Z = F⁻¹(u).
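The four steps above translate almost literally into code. The following is a minimal sketch (not part of the original notes) which assumes that the inverse CDF is already available as an Octave/Matlab function handle, here called Finv (a hypothetical name):

function z = inverse_sample(Finv)
  % Finv: function handle for the inverse CDF F^{-1} (assumed given)
  u = rand;       % step 3: u ~ U(0,1)
  z = Finv(u);    % step 4: z is distributed with PDF f(x)
end

For instance, for the exponential distribution of the next example one could call inverse_sample(@(u) -log(1-u)/lambda) for a given value of lambda.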

4.6.1 Example 1: exponential distribution

As an example, let us consider the exponential distribution:

f(x) = λ e^{-λx}, x ≥ 0.

It is easy to verify that f(x) satisfies all the properties of a probability density function, namely f(x) ≥ 0 for all x ≥ 0 and:

∫_0^∞ λ e^{-λx} dx = [-e^{-λx}]_0^∞ = 1.

So, following the procedure sketched above:

1. We compute the cumulative distribution function (CDF) associated with f(x):

   F(x) = ∫_0^x f(t) dt = λ ∫_0^x e^{-λt} dt = [-e^{-λt}]_0^x = 1 - e^{-λx}

2. We find the inverse of the CDF, x = F⁻¹(u):

   u = F(x) = 1 - e^{-λx}  ⟹  e^{-λx} = 1 - u  ⟹  x = -(1/λ) log(1 - u),

   so that F⁻¹(u) = -(1/λ) log(1 - u)

3. We sample a variable u ∼ U(0, 1) and compute Z = F⁻¹(u)

4. The random variable Z has probability density function f(x)!

A possible Octave/Matlab implementation of the algorithm for generating a random variable with an exponential distribution will look like:

function z = exp_sample(lambda)
  u = rand;
  z = -1/lambda * log(1-u);
end
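As a quick sanity check (not in the original notes), one can compare a normalised histogram of many samples produced by exp_sample with the analytic density λ e^{-λx}; the rate lambda and the sample size below are arbitrary choices:

% Sketch of a check that exp_sample produces the intended density.
lambda = 2;                                % arbitrary rate
N = 1e5;                                   % arbitrary sample size
z = zeros(N, 1);
for k = 1:N
  z(k) = exp_sample(lambda);
end
[counts, centres] = hist(z, 50);           % histogram of the samples
binwidth = centres(2) - centres(1);
empirical = counts / (N * binwidth);       % normalise counts to a density
plot(centres, empirical, 'o', centres, lambda * exp(-lambda * centres), '-');
legend('sampled', 'analytic PDF');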

4.6.2 Example 2: Rayleigh distribution

Another example in which the Inverse Function method works seamlessly is that of the Rayleigh distribution, whose probability density function is:

f(x) = x e^{-x²/2}, x ≥ 0.

It is easy to verify that f(x) has all the properties of a probability density function, namely f(x) ≥ 0 for x ≥ 0 and:

∫_0^{+∞} x e^{-x²/2} dx = [-e^{-x²/2}]_0^{+∞} = 1.

We apply the same procedure to sample from the Rayleigh distribution using the Inverse Function method, namely:

i) We first compute the CDF of the Rayleigh distribution:

   F(x) = ∫_0^x f(t) dt = ∫_0^x t e^{-t²/2} dt = 1 - e^{-x²/2},

ii) Compute the inverse of the CDF:

   u = F(x) = 1 - e^{-x²/2}  ⟹  1 - u = e^{-x²/2}  ⟹  x = √(-2 ln(1 - u))

iii) Then we sample u from the uniform distribution U(0, 1)

iv) The random variable Z = F⁻¹(u) is Rayleigh distributed.

A possible Octave/Matlab implementation of a function to sample from the Rayleigh distribution would look like this:

function z = sample_rayleigh()
  u = rand;
  z = sqrt(-2 * log(1-u));
end

4.7 Acceptance-Rejection method

The Inverse Function method looks indeed quite general, and in principle it could be used to sample from any probability density function, since the existence (and invertibility) of the CDF associated with a given PDF is guaranteed by the definition of CDF. The only caveat is that, although the CDF can always be computed, it might not be possible to express the CDF in a closed form.

This happens, for instance, for all the probability density functions which do not have an elementary primitive function. A typical example is that of the Gaussian distribution:

f(x) = (1/(√(2π) σ)) e^{-(x-µ)²/(2σ²)},

which does not have a simple primitive function. In fact, the cumulative distribution function is the integral function:

F(x) = ∫_{-∞}^{x} (1/(√(2π) σ)) e^{-(t-µ)²/(2σ²)} dt,

which does not have an explicit inverse. Even if the cumulative distribution function cannot be expressed in terms of elementary functions, it is still possible to estimate the inverse of the CDF by solving the corresponding integral numerically, as happens for instance for the Gaussian distribution, but this would in general require additional heavy computation.

Figure 2: Graphical interpretation of the Acceptance-Rejection method to sample from continuous probability density functions.

The Acceptance-Rejection method is a concrete alternative to the Inverse Function method whenever computing the inverse of the CDF is difficult. Assuming that we want to draw random samples from the probability density function f(x), x ∈ Ω, the Acceptance-Rejection method consists of the following steps:

1. Find a function g(x) such that g(x) ≥ f(x) for all x ∈ Ω, and define A_g = ∫_Ω g(x) dx
2. Sample a random variable x̄ from the probability density function g(x)/A_g, e.g. using the Inverse Function method
3. Sample a random variable ȳ uniformly in [0, g(x̄)]

4. If ȳ ≤ f(x̄) then accept the sample and return x̄; otherwise, reject the sample and start again from step (2) above
5. The accepted samples will be distributed according to f(x).

Despite looking quite obscure at first glance, this procedure has a very intuitive geometrical interpretation, as sketched in Fig. 2. Let us call F and G, respectively, the regions between the x-axis and the functions f(x) and g(x). Recall that the normalisation condition for the probability density function f(x),

∫_Ω f(x) dx = 1,

implies that the area of the region F is A_f = 1 (light-grey area in Fig. 2). In general, if we sample a point (x̄, ȳ) uniformly at random within the region G, then the abscissa x̄ will be distributed according to g(x)/A_g. In fact, by definition, the probability that the abscissa lies within the interval [x̄ - δx, x̄ + δx] is equal to the ratio between the area of the shaded dark-grey region in Fig. 2, which is equal to 2δx g(x̄), and the area of the whole region G under the function g(x), which is equal to A_g. Hence we have P(x̄ - δx < x < x̄ + δx) = 2δx g(x̄)/A_g. In particular, the probability that the point (x̄, ȳ) sampled uniformly at random in G lies between the x-axis and the curve f(x) is just equal to the ratio f(x̄)/g(x̄). Hence, if we sample a point (x̄, ȳ) uniformly at random in G, and ȳ < f(x̄), then the abscissa x̄ will be distributed with probability density function f(x). Notice that, in order to sample a point uniformly at random in the region G, we just need to sample its first coordinate x̄ from g(x)/A_g, and its second coordinate ȳ uniformly at random between 0 and g(x̄). Then, the Acceptance-Rejection method accepts the sample (i.e., the abscissa x̄) only if ȳ < f(x̄).

The fundamental hypothesis of the Acceptance-Rejection method is that we are able to draw random samples from g(x)/A_g, for instance by using the Inverse Function method. This means that, in principle, we could set g(x) equal to any multiple of a probability density function defined on the same sample space Ω as f(x), such that g(x) ≥ f(x) for all x ∈ Ω. However, it is evident that the only points which will lead to an accepted (successful) sample are those which fall in the region F below the function f(x) (light-grey in Fig. 2). This means that the acceptance rate, i.e. the fraction of points which yield an accepted sample, is equal to 1/A_g. Consequently, in order to maximise the acceptance rate and avoid performing too many useless computations, we should minimise the area A_g, i.e. we would like g(x) to be as close as possible to f(x).

In principle, one could even use the simple Acceptance-Rejection method to sample from a Gaussian distribution; however, that would not be very efficient. In the following sections we will review two methods specifically tailored to sample random variables from a Gaussian distribution.
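Before moving on to those methods, here is a minimal Acceptance-Rejection sketch (not included in the original notes) for an illustrative target density, f(x) = 6x(1-x) on [0, 1], with the constant envelope g(x) = 3/2; both choices are my own, not taken from the lecture. With this envelope g(x) ≥ f(x) everywhere and g(x)/A_g is simply the uniform density on [0, 1], so step 2 reduces to a call to rand.

function z = ar_sample()
  % Illustrative Acceptance-Rejection sampler for f(x) = 6*x*(1-x) on [0,1],
  % using the constant envelope g(x) = 3/2 (so A_g = 3/2).
  while true
    x = rand;                 % step 2: sample x from g(x)/A_g = U(0,1)
    y = 1.5 * rand;           % step 3: sample y uniformly in [0, g(x)]
    if y <= 6 * x * (1 - x)   % step 4: accept if y <= f(x)
      z = x;
      return;
    end
  end
end

With this envelope the acceptance rate is 1/A_g = 2/3, i.e. on average three uniform pairs are needed for every two accepted samples.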

4.8 A note on the Gaussian distribution

Recall the Gaussian PDF with mean µ and variance σ²:

f(x) = (1/(√(2π) σ)) e^{-(x-µ)²/(2σ²)}.

Figure 3: Gaussian probability density with mean µ and standard deviation σ.

Let us now define a new variable z = (x - µ)/σ. Using the fact that probability is always conserved under a transformation of variables, we can show that

∫ f(x) dx = ∫ (1/(√(2π) σ)) e^{-((x-µ)/σ)²/2} dx = ∫ (1/(√(2π) σ)) e^{-z²/2} (dx/dz) dz = ∫ (1/√(2π)) e^{-z²/2} dz = ∫ f(z) dz,

noting that dx/dz = σ. So f(z) is a Gaussian with µ = 0 and σ = 1; in other words, the random variable Z ∼ N(0, 1). Therefore, the random variable X = σZ + µ follows a Gaussian PDF with mean µ and variance σ². This means that if we can generate a random sample Z from the distribution N(0, 1) (i.e., from a Gaussian with µ = 0 and σ = 1), then we can generate a sample X from a Gaussian distribution with expected value µ and variance σ² by computing:

X = σZ + µ.

So the problem of generating random samples from a generic Gaussian distribution is reduced to the problem of sampling from N(0, 1). But how can we generate Z ∼ N(0, 1)? Using the inversion method does not work, as the Gaussian CDF is not analytically tractable and can be inverted only numerically. Several methods have been proposed to generate Gaussian random variables efficiently, and here we will consider two of them.
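As a small illustration (not in the original notes), once a standard normal generator is available — for instance the box_muller or marsaglia functions discussed in the next sections — the shift-and-scale relation X = σZ + µ is a one-liner; gaussian_sample, mu and sigma are hypothetical names:

function x = gaussian_sample(mu, sigma)
  % Turn a standard normal sample z ~ N(0,1) into x ~ N(mu, sigma^2),
  % assuming a standard normal generator such as box_muller (defined below).
  z = box_muller();      % keep only the first of the two returned values
  x = sigma * z + mu;
end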

4.9 Box-Muller method

Let us consider two random variables independently drawn from the Gaussian distribution, X ∼ N(0, 1) and Y ∼ N(0, 1). If we consider X and Y as coordinates in two dimensions, we can define two further random variables, R and θ, corresponding to the polar representation of the coordinates X and Y:

R = √(X² + Y²),   θ = arctan(Y/X),

and, conversely,

X = R cos θ,   Y = R sin θ.

It can be shown that R is drawn from a Rayleigh distribution, with PDF f(r) = r e^{-r²/2}, and that θ follows a uniform distribution U(0, 2π).

Figure 4: The Rayleigh distribution f(x) = x e^{-x²/2}.

To prove the above statements, let us consider the joint PDF f(x, y). The integral of f(x, y) over the rectangle a ≤ x ≤ b, c ≤ y ≤ d is the probability that X lies between a and b and Y lies between c and d. We want to transform the variables X, Y to polar coordinates R, θ. Since the two variables X and Y are drawn independently, the joint probability density function f(x, y) factorises into the product of the two corresponding marginal densities. In other words, f(x, y) = f(x)f(y), hence:

f(x, y) dx dy = (1/(2π)) e^{-(x²+y²)/2} dx dy.

Then we change variables, i.e. we write:

f(x, y) dx dy = (1/(2π)) e^{-r²/2} J(x, y; r, θ) dr dθ,

where the determinant of the Jacobian is

J(x, y; r, θ) = det | ∂x/∂r  ∂y/∂r |
                    | ∂x/∂θ  ∂y/∂θ |

              = det |  cos θ     sin θ  |
                    | -r sin θ  r cos θ |,

noting that x = r cos θ and y = r sin θ. So the determinant is

J(x, y; r, θ) = r cos²θ + r sin²θ = r.

Therefore,

f(x, y) dx dy = f(r, θ) dr dθ,   with   f(r, θ) = (r/(2π)) e^{-r²/2}.

The marginal PDFs are:

f(r) = ∫_0^{2π} f(r, θ) dθ = r e^{-r²/2},
f(θ) = ∫_0^{∞} f(r, θ) dr = 1/(2π).

The Box-Muller method uses these results, and works as follows:

1. Sample R from the Rayleigh distribution, e.g. by using the Inverse Function method described in the previous section.
2. Sample θ from the uniform distribution U(0, 2π) on [0, 2π]. This is simply done by multiplying a uniformly distributed variable by 2π: θ = 2π · U(0, 1) ∼ U(0, 2π).
3. Compute X = R cos θ and Y = R sin θ.

This gives two independent random variables X and Y that both follow the Gaussian distribution N(0, 1).

Algorithm

A possible Octave/Matlab implementation of the algorithm for generating two Gaussian distributed random variables using the Box-Muller method is as follows:

function [x,y] = box_muller()
  u1 = rand;
  u2 = rand;
  r = sqrt(-2 * log(u1));
  v = 2 * pi * u2;
  x = r * cos(v);
  y = r * sin(v);
end

4.10 Marsaglia polar method

The Box-Muller method is not very efficient in practice, since it relies on the computation of trigonometric functions (which are normally relatively slow). To get around this, Marsaglia proposed a polar method for generating Gaussian random variables in the following way. Let us consider two variables uniformly distributed between -1 and 1:

X ∼ U(-1, 1),   Y ∼ U(-1, 1).

We draw pairs of these, and accept both only if X² + Y² ≤ 1, rejecting both otherwise. We then have random points that are uniformly distributed on a disc of radius 1; see the figure below.

Figure 5: Marsaglia's polar method: the unit disc.

Expressing these in polar coordinates, we get two random variables:

ω = X² + Y² ∼ U(0, 1),   θ = arctan(Y/X) ∼ U(0, 2π).

Since ω is uniformly distributed in [0, 1], R = √(-2 ln ω) is Rayleigh distributed, and we can compute two variables that are Gaussian distributed:

z₁ = R cos θ = √(-2 ln ω) · X/√ω,
z₂ = R sin θ = √(-2 ln ω) · Y/√ω,

where X = √ω cos θ and Y = √ω sin θ.

A possible Octave/Matlab implementation of the algorithm to generate Gaussian random variables following Marsaglia's polar method is the following:

function [z1,z2] = marsaglia()
  w = 2;
  while w > 1
    x = 2 * rand - 1;  %% U(-1,1)
    y = 2 * rand - 1;  %% U(-1,1)
    w = x*x + y*y;
  end
  z1 = x * sqrt(-2 * log(w) / w);
  z2 = y * sqrt(-2 * log(w) / w);
end

Notice that the Marsaglia polar method is a specific instance of an acceptance-rejection procedure. In fact, the method requires sampling in the square [-1, 1] × [-1, 1], but accepts only the points within the unit disc. It is easy to realise that the acceptance ratio of the Marsaglia polar method is equal to the ratio between the area of the unit disc and the area of the circumscribed square, i.e. π/4 ≈ 0.785. This means that the Marsaglia polar method rejects about 21.5% of the samples.
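As a final sanity check (not part of the original notes), one can draw a large number of pairs with marsaglia and verify that the sample mean and variance are close to 0 and 1; the sample size below is an arbitrary choice:

% Sketch of a check on the marsaglia generator (assumed to be on the path).
N = 1e5;                         % arbitrary number of pairs
z = zeros(2 * N, 1);
for k = 1:N
  [z1, z2] = marsaglia();
  z(2*k - 1) = z1;
  z(2*k)     = z2;
end
fprintf('mean = %f, variance = %f\n', mean(z), var(z));   % expect about 0 and 1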