Simulation - Lectures - Part I


Simulation - Lectures - Part I Julien Berestycki (adapted from François Caron's slides) Part A Simulation and Statistical Programming Hilary Term 2017 Part A Simulation. HT 2017. J. Berestycki. 1 / 66

Simulation and Statistical Programming Lectures on Simulation (Prof. J. Berestycki): Mondays 2-3pm Weeks 1-8. LG.01, 24-29 St Giles. Computer Lab on Statistical Programming (Prof. R. Evans): Wednesday 9-11am Weeks 2,3,5,8. LG.02, 24-29 St Giles. Departmental problem classes: Thursdays 9-10am or 10-11am - Weeks 3,5,7,8. LG.04, 24-29 St Giles. Hand in problem sheet solutions by Monday 12am of same week at the reception of 24-29 St Giles. Webpage: http://www.stats.ox.ac.uk/~berestyc/teaching/a12.html This course builds upon the notes and slides of Geoff Nicholls, Arnaud Doucet, Yee Whye Teh and Matti Vihola. Part A Simulation. HT 2017. J. Berestycki. 2 / 66

Outline Introduction Monte Carlo integration Random variable generation Inversion Method Transformation Methods Rejection Sampling Part A Simulation. HT 2017. J. Berestycki. 3 / 66

Outline Introduction Monte Carlo integration Random variable generation Inversion Method Transformation Methods Rejection Sampling Part A Simulation. HT 2017. J. Berestycki. 4 / 66

Monte Carlo Simulation Methods Computational tools for the simulation of random variables and the approximation of integrals/expectations. These simulation methods, aka Monte Carlo methods, are used in many fields including statistical physics, computational chemistry, statistical inference, genetics, finance etc. The Metropolis algorithm was named the top algorithm of the 20th century by a committee of mathematicians, computer scientists & physicists. With the dramatic increase of computational power, Monte Carlo methods are increasingly used. Part A Simulation. HT 2017. J. Berestycki. 5 / 66

Objectives of the Course Introduce the main tools for the simulation of random variables and the approximation of multidimensional integrals: Integration by Monte Carlo, inversion method, transformation method, rejection sampling, importance sampling, Markov chain Monte Carlo including Metropolis-Hastings. Understand the theoretical foundations and convergence properties of these methods. Learn to derive and implement specific algorithms for given random variables. Part A Simulation. HT 2017. J. Berestycki. 6 / 66

Computing Expectations Let X be either a discrete random variable (r.v.) taking values in a countable or finite set Ω, with p.m.f. f_X, or a continuous r.v. taking values in Ω = R^d, with p.d.f. f_X. Assume you are interested in computing θ = E(φ(X)) = ∑_{x∈Ω} φ(x) f_X(x) if X is discrete, or ∫_Ω φ(x) f_X(x) dx if X is continuous, where φ : Ω → R. It is impossible to compute θ exactly in most realistic applications. Even if it is possible (for Ω finite), the number of elements may be so huge that it is practically impossible. Example: Ω = R^d, X ∼ N(µ, Σ) and φ(x) = I(∑_{k=1}^d x_k^2 ≥ α). Example: Ω = R^d, X ∼ N(µ, Σ) and φ(x) = I(x_1 < 0, ..., x_d < 0). Part A Simulation. HT 2017. J. Berestycki. 7 / 66

Example: Queuing Systems Customers arrive at a shop and queue to be served. Their requests require varying amounts of time. The manager cares about customer satisfaction and about not excessively exceeding the 9am-5pm working day of his employees. Mathematically we could set up stochastic models for the arrival process of customers and for the service time based on past experience. Question: If the shop assistants continue to deal with all customers in the shop at 5pm, what is the probability that they will have served all the customers by 5.30pm? If we call X ∈ N the number of customers in the shop at 5.30pm then the probability of interest is P(X = 0) = E(I(X = 0)). For realistic models, we typically do not know analytically the distribution of X. Part A Simulation. HT 2017. J. Berestycki. 8 / 66

Example: Particle in a Random Medium A particle (X_t)_{t=1,2,...} evolves according to a stochastic model on Ω = R^d. At each time step t, it is absorbed with probability 1 - G(X_t), where G : Ω → [0, 1]. Question: What is the probability that the particle has not yet been absorbed at time T? The probability of interest is P(not absorbed at time T) = E[G(X_1)G(X_2)···G(X_T)]. For realistic models, we cannot compute this probability. Part A Simulation. HT 2017. J. Berestycki. 9 / 66

Example: Ising Model The Ising model serves to model the behavior of a magnet and is the best known/most researched model in statistical physics. The magnetism of a material is modelled by the collective contribution of dipole moments of many atomic spins. Consider a simple 2D Ising model on a finite lattice G = {1, 2, ..., m} × {1, 2, ..., m}, where each site σ = (i, j) hosts a particle with a +1 or -1 spin modeled as a r.v. X_σ. The distribution of X = {X_σ}_{σ∈G} on {-1, 1}^{m^2} is given by π(x) = exp(-βU(x)) / Z_β, where β > 0 is the inverse temperature and the potential energy is U(x) = -J ∑_{σ∼σ'} x_σ x_{σ'}, the sum running over neighbouring sites. Physicists are interested in computing E[U(X)] and Z_β. Part A Simulation. HT 2017. J. Berestycki. 10 / 66

Example: Ising Model Sample from an Ising model for m = 250. Part A Simulation. HT 2017. J. Berestycki. 11 / 66

Bayesian Inference Suppose (X, Y) are both continuous r.v. with a joint density f_{X,Y}(x, y). Think of Y as data, and X as unknown parameters of interest. We have f_{X,Y}(x, y) = f_X(x) f_{Y|X}(y|x) where, in many statistics problems, f_X(x) can be thought of as a prior and f_{Y|X}(y|x) as a likelihood function for a given Y = y. Using Bayes' rule, we have f_{X|Y}(x|y) = f_X(x) f_{Y|X}(y|x) / f_Y(y). For most problems of interest, f_{X|Y}(x|y) does not admit an analytic expression and we cannot compute E(φ(X)|Y = y) = ∫ φ(x) f_{X|Y}(x|y) dx. Part A Simulation. HT 2017. J. Berestycki. 12 / 66

Outline Introduction Monte Carlo integration Random variable generation Inversion Method Transformation Methods Rejection Sampling Part A Simulation. HT 2017. J. Berestycki. 13 / 66

Monte Carlo Integration Definition (Monte Carlo method) Let X be either a discrete r.v. taking values in a countable or finite set Ω, with p.m.f. f_X, or a continuous r.v. taking values in Ω = R^d, with p.d.f. f_X. Consider θ = E(φ(X)) = ∑_{x∈Ω} φ(x) f_X(x) if X is discrete, or ∫_Ω φ(x) f_X(x) dx if X is continuous, where φ : Ω → R. Let X_1, ..., X_n be i.i.d. r.v. with p.d.f. (or p.m.f.) f_X. Then θ̂_n = (1/n) ∑_{i=1}^n φ(X_i) is called the Monte Carlo estimator of the expectation θ. Monte Carlo methods can be thought of as a stochastic way to approximate integrals. Part A Simulation. HT 2017. J. Berestycki. 14 / 66

Monte Carlo Integration Algorithm 1 Monte Carlo Algorithm Simulate independent X_1, ..., X_n with p.m.f. or p.d.f. f_X. Return θ̂_n = (1/n) ∑_{i=1}^n φ(X_i). Part A Simulation. HT 2017. J. Berestycki. 15 / 66
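As a minimal illustration of Algorithm 1 (an added sketch, not part of the original slides), the following R code estimates θ = E(φ(X)) for X ∼ N(0, 1) and φ(x) = x^2, whose exact value is 1.

# Monte Carlo estimate of E[X^2] for X ~ N(0,1); exact value is 1
n <- 10000
x <- rnorm(n)            # simulate X_1, ..., X_n i.i.d. from f_X
theta_hat <- mean(x^2)   # theta_hat_n = (1/n) * sum_i phi(X_i)
print(theta_hat)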

Computing Pi with Monte Carlo Methods Consider the 2 × 2 square, say S ⊂ R^2, with inscribed disk D of radius 1. [Figure: a 2 × 2 square S with inscribed disk D of radius 1.] Part A Simulation. HT 2017. J. Berestycki. 16 / 66

Computing Pi with Monte Carlo Methods We have ∫_D dx_1 dx_2 / ∫_S dx_1 dx_2 = π/4. How could you estimate this quantity through simulation? ∫_D dx_1 dx_2 / ∫_S dx_1 dx_2 = ∫_S I((x_1, x_2) ∈ D) (1/4) dx_1 dx_2 = E[φ(X_1, X_2)] = θ, where the expectation is w.r.t. the uniform distribution on S and φ(X_1, X_2) = I((X_1, X_2) ∈ D). To sample uniformly on S = (-1, 1) × (-1, 1), simply use X_1 = 2U_1 - 1, X_2 = 2U_2 - 1 where U_1, U_2 ∼ U(0, 1). Part A Simulation. HT 2017. J. Berestycki. 17 / 66

Computing Pi with Monte Carlo Methods
n <- 1000
x <- matrix(0, nrow = 2, ncol = n)
phi <- numeric(n)
for (i in 1:n) {
  # generate a point uniformly in the square S = (-1,1) x (-1,1)
  x[1, i] <- 2 * runif(1) - 1
  x[2, i] <- 2 * runif(1) - 1
  # compute phi(x): test whether the point lies in the disk D
  phi[i] <- as.numeric(x[1, i]^2 + x[2, i]^2 <= 1)
}
# Monte Carlo estimate of pi = 4 * theta
print(4 * sum(phi) / n)
Part A Simulation. HT 2017. J. Berestycki. 18 / 66

Computing Pi with Monte Carlo Methods [Figure: the 2 × 2 square S with inscribed disk D of radius 1 and Monte Carlo samples.] Part A Simulation. HT 2017. J. Berestycki. 19 / 66

Computing Pi with Monte Carlo Methods [Figure: θ̂_n - θ as a function of the number of samples n.] Part A Simulation. HT 2017. J. Berestycki. 20 / 66

Computing Pi with Monte Carlo Methods [Figure: θ̂_n - θ as a function of the number of samples n, for 100 independent realizations.] Part A Simulation. HT 2017. J. Berestycki. 21 / 66

Applications Toy example: simulate a large number n of independent r.v. X_i ∼ N(µ, Σ) and compute θ̂_n = (1/n) ∑_{i=1}^n I(∑_{k=1}^d X_{k,i}^2 ≥ α). Queuing: simulate a large number n of days using your stochastic models for the arrival process of customers and for the service time, and compute θ̂_n = (1/n) ∑_{i=1}^n I(X_i = 0), where X_i is the number of customers in the shop at 5.30pm for the ith sample. Particle in Random Medium: simulate a large number n of particle paths (X_{1,i}, X_{2,i}, ..., X_{T,i}), where i = 1, ..., n, and compute θ̂_n = (1/n) ∑_{i=1}^n G(X_{1,i})G(X_{2,i})···G(X_{T,i}). Part A Simulation. HT 2017. J. Berestycki. 22 / 66

Monte Carlo Integration: Properties Proposition: Assume θ = E(φ(X)) exists. Then the Monte Carlo estimator θ̂_n has the following properties. Unbiasedness: E(θ̂_n) = θ. Strong consistency: θ̂_n → θ almost surely as n → ∞. Proof: We have E(θ̂_n) = (1/n) ∑_{i=1}^n E(φ(X_i)) = θ. Strong consistency is a consequence of the strong law of large numbers applied to Y_i = φ(X_i), which is applicable as θ = E(φ(X)) is assumed to exist. Part A Simulation. HT 2017. J. Berestycki. 23 / 66

Monte Carlo Integration: Central Limit Theorem Proposition: Assume θ = E(φ(X)) and σ^2 = V(φ(X)) exist. Then E((θ̂_n - θ)^2) = V(θ̂_n) = σ^2/n and √n (θ̂_n - θ)/σ → N(0, 1) in distribution. Proof: We have E((θ̂_n - θ)^2) = V(θ̂_n) as E(θ̂_n) = θ, and V(θ̂_n) = (1/n^2) ∑_{i=1}^n V(φ(X_i)) = σ^2/n. The CLT applied to Y_i = φ(X_i) tells us that (Y_1 + ··· + Y_n - nθ)/(σ√n) → N(0, 1) in distribution, so the result follows as θ̂_n = (1/n)(Y_1 + ··· + Y_n). Part A Simulation. HT 2017. J. Berestycki. 24 / 66

Monte Carlo Integration: Variance Estimation Proposition: Assume σ^2 = V(φ(X)) exists. Then S^2_{φ(X)} = (1/(n-1)) ∑_{i=1}^n (φ(X_i) - θ̂_n)^2 is an unbiased sample variance estimator of σ^2. Proof: Let Y_i = φ(X_i), Y = φ(X) and Ȳ = (1/n) ∑_{i=1}^n Y_i. Then E(S^2_{φ(X)}) = (1/(n-1)) E(∑_{i=1}^n (Y_i - Ȳ)^2) = (1/(n-1)) E(∑_{i=1}^n Y_i^2 - nȲ^2) = (n(V(Y) + θ^2) - n(V(Ȳ) + θ^2))/(n-1) = V(Y) = V(φ(X)). Part A Simulation. HT 2017. J. Berestycki. 25 / 66

How Good is The Estimator? Chebyshev's inequality yields the bound P(|θ̂_n - θ| > c σ/√n) ≤ V(θ̂_n)/(c^2 σ^2/n) = 1/c^2. Another estimate follows from the CLT for large n: √n (θ̂_n - θ)/σ → N(0, 1) in distribution, so P(|θ̂_n - θ| > c σ/√n) ≈ 2(1 - Φ(c)). Hence, by choosing c = c_α s.t. 2(1 - Φ(c_α)) = α, an approximate (1 - α)100% confidence interval for θ is (θ̂_n ± c_α σ/√n) ≈ (θ̂_n ± c_α S_{φ(X)}/√n). Part A Simulation. HT 2017. J. Berestycki. 26 / 66
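As an added illustration (not from the slides), the approximate 95% confidence interval can be computed in R for the π/4 example above, assuming phi is the vector of disk indicators produced by the earlier code.

# Approximate 95% CI for theta = pi/4 from the indicators phi of the earlier example
n <- length(phi)
theta_hat <- mean(phi)       # Monte Carlo estimate of pi/4
s_phi <- sd(phi)             # sample standard deviation S_phi(X)
c_alpha <- qnorm(0.975)      # c_alpha such that 2 * (1 - Phi(c_alpha)) = 0.05
print(theta_hat + c(-1, 1) * c_alpha * s_phi / sqrt(n))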

Monte Carlo Integration Whatever the space Ω, e.g. Ω = R or Ω = R^1000, the error is still of order σ/√n. This is in contrast with deterministic methods: the error in a product trapezoidal rule in d dimensions is O(n^{-2/d}) for twice continuously differentiable integrands. It is sometimes said, erroneously, that Monte Carlo beats the curse of dimensionality, but this is generally not true as σ^2 typically depends on dim(Ω). Part A Simulation. HT 2017. J. Berestycki. 27 / 66

Pseudo-random numbers The aim of the game is to be able to generate complicated random variables and stochastic models. Henceforth, we will assume that we have access to a sequence of independent random variables (U_i, i ≥ 1) that are uniformly distributed on (0, 1); i.e. U_i ∼ U[0, 1]. In R, the command u <- runif(100) returns 100 realizations of uniform r.v. in (0, 1). Strictly speaking, we only have access to pseudo-random (deterministic) numbers. The behaviour of modern random number generators (constructed on number theory) resembles mathematical random numbers in many respects: standard statistical tests for uniformity, independence, etc. do not show significant deviations. Part A Simulation. HT 2017. J. Berestycki. 28 / 66

Part A Simulation. HT 2017. J. Berestycki. 29 / 66

Outline Introduction Monte Carlo integration Random variable generation Inversion Method Transformation Methods Rejection Sampling Part A Simulation. HT 2017. J. Berestycki. 30 / 66

Outline Introduction Monte Carlo integration Random variable generation Inversion Method Transformation Methods Rejection Sampling Part A Simulation. HT 2017. J. Berestycki. 31 / 66

Generating Random Variables Using Inversion A function F : R → [0, 1] is a cumulative distribution function (cdf) if - F is increasing; i.e. if x ≤ y then F(x) ≤ F(y); - F is right continuous; i.e. F(x + ɛ) → F(x) as ɛ → 0 (ɛ > 0); - F(x) → 0 as x → -∞ and F(x) → 1 as x → +∞. A random variable X ∈ R has cdf F if P(X ≤ x) = F(x) for all x ∈ R. If F is differentiable on R, with derivative f, then X is continuously distributed with probability density function (pdf) f. Part A Simulation. HT 2017. J. Berestycki. 32 / 66

Generating Random Variables Using Inversion Proposition. Let F be a continuous and strictly increasing cdf on R, with inverse F^{-1} : [0, 1] → R. Let U ∼ U[0, 1]; then X = F^{-1}(U) has cdf F. Proof. We have P(X ≤ x) = P(F^{-1}(U) ≤ x) = P(U ≤ F(x)) = F(x). Proposition. Let F be a cdf on R and define its generalized inverse F^{-1} : [0, 1] → R, F^{-1}(u) = inf{x ∈ R; F(x) ≥ u}. Let U ∼ U[0, 1]; then X = F^{-1}(U) has cdf F. Part A Simulation. HT 2017. J. Berestycki. 33 / 66

Illustration of the Inversion Method [Figure. Top: pdf of a Gaussian r.v.; bottom: the associated cdf, with u ∼ U(0,1) mapped to x = F^{-1}(u).] Part A Simulation. HT 2017. J. Berestycki. 34 / 66

Examples Weibull distribution. Let α, λ > 0; then the Weibull cdf is given by F(x) = 1 - exp(-λx^α), x ≥ 0. We calculate: u = F(x) ⟺ -log(1 - u) = λx^α ⟺ x = (-log(1 - u)/λ)^{1/α}. As (1 - U) ∼ U[0, 1] when U ∼ U[0, 1], we can use X = (-log U / λ)^{1/α}. Part A Simulation. HT 2017. J. Berestycki. 35 / 66
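An added R sketch of this Weibull inversion step (illustrative only; the function name rweibull_inv and the parameter values are ours, not from the slides):

# Inversion sampler for the Weibull(alpha, lambda) cdf F(x) = 1 - exp(-lambda * x^alpha)
rweibull_inv <- function(n, alpha, lambda) {
  u <- runif(n)                      # U ~ U(0,1); 1 - U is also U(0,1)
  (-log(u) / lambda)^(1 / alpha)     # X = (-log(U) / lambda)^(1/alpha)
}
x <- rweibull_inv(10000, alpha = 2, lambda = 1)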

Examples Cauchy distribution. It has pdf and cdf f(x) = 1/(π(1 + x^2)), F(x) = 1/2 + arctan(x)/π. We have u = F(x) ⟺ x = tan(π(u - 1/2)). Logistic distribution. It has pdf and cdf f(x) = exp(-x)/(1 + exp(-x))^2, F(x) = 1/(1 + exp(-x)), and u = F(x) ⟺ x = log(u/(1 - u)). Practice: Derive an algorithm to simulate from an Exponential random variable with rate λ > 0. Part A Simulation. HT 2017. J. Berestycki. 36 / 66

Generating Discrete Random Variables Using Inversion If X is a discrete N-valued r.v. with P(X = n) = p(n), we get F(x) = ∑_{j=0}^{⌊x⌋} p(j), and F^{-1}(u) is the x ∈ N such that ∑_{j=0}^{x-1} p(j) < u ≤ ∑_{j=0}^{x} p(j), with the LHS = 0 if x = 0. Note: the mapping at the values F(n) is irrelevant. Note: the same method is applicable to any discrete valued r.v. X, P(X = x_n) = p(n). Part A Simulation. HT 2017. J. Berestycki. 37 / 66

Illustration of the Inversion Method: Discrete case Part A Simulation. HT 2017. J. Berestycki. 38 / 66

Example: Geometric Distribution If 0 < p < 1 and q = 1 - p and we want to simulate X ∼ Geom(p), then p(x) = p q^{x-1}, F(x) = 1 - q^x, for x = 1, 2, 3, ... The smallest x ∈ N giving F(x) ≥ u is the smallest x ≥ 1 satisfying x ≥ log(1 - u)/log(q), and this is given by x = F^{-1}(u) = ⌈log(1 - u)/log(q)⌉, where ⌈·⌉ rounds up and we could replace 1 - u with u. Part A Simulation. HT 2017. J. Berestycki. 39 / 66
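An added R sketch of this geometric inversion formula (the function name rgeom_inv is ours):

# Geom(p) sampler via inversion: X = ceiling(log(1 - U) / log(q)) with q = 1 - p
rgeom_inv <- function(n, p) {
  u <- runif(n)
  ceiling(log(1 - u) / log(1 - p))
}
x <- rgeom_inv(10000, p = 0.3)   # values in {1, 2, 3, ...}; mean(x) should be close to 1/p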

Outline Introduction Monte Carlo integration Random variable generation Inversion Method Transformation Methods Rejection Sampling Part A Simulation. HT 2017. J. Berestycki. 40 / 66

Transformation Methods Suppose we have a random variable Y ∼ Q, Y ∈ Ω_Q, which we can simulate (e.g., by inversion), and some other variable X ∼ P, X ∈ Ω_P, which we wish to simulate. Suppose we can find a function ϕ : Ω_Q → Ω_P with the property that X = ϕ(Y). Then we can simulate from X by first simulating Y ∼ Q, and then setting X = ϕ(Y). Inversion is a special case of this idea. We may generalize this idea to take functions of collections of variables with different distributions. Part A Simulation. HT 2017. J. Berestycki. 41 / 66

Transformation Methods Example: Let Y_i, i = 1, 2, ..., α, be i.i.d. variables with Y_i ∼ Exp(1) and X = β^{-1} ∑_{i=1}^α Y_i; then X ∼ Gamma(α, β). Proof: The MGF of the random variable X is E(e^{tX}) = ∏_{i=1}^α E(e^{β^{-1} t Y_i}) = (1 - t/β)^{-α}, which is the MGF of a Gamma(α, β) variate. Incidentally, the Gamma(α, β) density is f_X(x) = (β^α/Γ(α)) x^{α-1} e^{-βx} for x > 0. Practice: A generalized gamma variable Z with parameters a > 0, b > 0, σ > 0 has density f_Z(z) = (σ b^a/Γ(a/σ)) z^{a-1} e^{-(bz)^σ}. Derive an algorithm to simulate from Z. Part A Simulation. HT 2017. J. Berestycki. 42 / 66
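For integer α the construction above translates directly into R (an added sketch; the function name rgamma_sum and the parameter values are ours):

# Gamma(alpha, beta) for integer alpha as beta^{-1} times a sum of alpha i.i.d. Exp(1) draws
rgamma_sum <- function(n, alpha, beta) {
  y <- matrix(rexp(n * alpha, rate = 1), nrow = alpha)   # alpha x n matrix of Exp(1) variables
  colSums(y) / beta                                      # X = (1/beta) * sum_{i=1}^alpha Y_i
}
x <- rgamma_sum(10000, alpha = 3, beta = 2)              # mean(x) should be close to alpha/beta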

Transformation Methods: Box-Muller Algorithm For continuous random variables, a tool is the transformation/change of variables formula for pdfs. Proposition. If R^2 ∼ Exp(1/2) and Θ ∼ U[0, 2π] are independent, then X = R cos Θ, Y = R sin Θ are independent with X ∼ N(0, 1), Y ∼ N(0, 1). Proof: We have f_{R^2,Θ}(r^2, θ) = (1/2) exp(-r^2/2) · 1/(2π) and f_{X,Y}(x, y) = f_{R^2,Θ}(r^2, θ) |det ∂(r^2, θ)/∂(x, y)|, where |det ∂(r^2, θ)/∂(x, y)|^{-1} = |det(∂x/∂r^2, ∂y/∂r^2; ∂x/∂θ, ∂y/∂θ)| = |det(cos θ/(2r), sin θ/(2r); -r sin θ, r cos θ)| = 1/2. Part A Simulation. HT 2017. J. Berestycki. 43 / 66

Transformation Methods: Box-Muller Algorithm Let U_1 ∼ U[0, 1] and U_2 ∼ U[0, 1]; then R^2 = -2 log(U_1) ∼ Exp(1/2) and Θ = 2πU_2 ∼ U[0, 2π], so X = R cos Θ ∼ N(0, 1) and Y = R sin Θ ∼ N(0, 1). This still requires evaluating log, cos and sin. Part A Simulation. HT 2017. J. Berestycki. 44 / 66
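The Box-Muller algorithm in R (an added sketch; the function name box_muller is ours):

# Box-Muller: pairs of independent N(0,1) variables from pairs of independent U(0,1) variables
box_muller <- function(n) {
  u1 <- runif(n); u2 <- runif(n)
  r <- sqrt(-2 * log(u1))          # R^2 = -2 * log(U1) ~ Exp(1/2)
  theta <- 2 * pi * u2             # Theta = 2 * pi * U2 ~ U[0, 2*pi]
  cbind(x = r * cos(theta), y = r * sin(theta))
}
z <- box_muller(10000)             # 10000 rows, each an independent (X, Y) pair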

Simulating Multivariate Normal Let us consider X ∈ R^d, X ∼ N(µ, Σ), where µ is the mean and Σ is the (positive definite) covariance matrix: f_X(x) = (2π)^{-d/2} det(Σ)^{-1/2} exp(-(1/2)(x - µ)^T Σ^{-1}(x - µ)). Proposition. Let Z = (Z_1, ..., Z_d) be a collection of d independent standard normal random variables. Let L be a real d × d matrix satisfying LL^T = Σ, and X = LZ + µ. Then X ∼ N(µ, Σ). Part A Simulation. HT 2017. J. Berestycki. 45 / 66

Simulating Multivariate Normal Proof. We have f_Z(z) = (2π)^{-d/2} exp(-(1/2) z^T z). The joint density of the new variables is f_X(x) = f_Z(z) |det ∂z/∂x|, where ∂z/∂x = L^{-1}, and det(L) = det(L^T) so det(L)^2 = det(Σ), and det(L^{-1}) = 1/det(L) so |det(L^{-1})| = det(Σ)^{-1/2}. Also z^T z = (x - µ)^T (L^{-1})^T L^{-1} (x - µ) = (x - µ)^T Σ^{-1} (x - µ). If Σ = V D V^T is the eigendecomposition of Σ, we can pick L = V D^{1/2}. Cholesky factorization: Σ = LL^T where L is a lower triangular matrix; see numerical analysis. Part A Simulation. HT 2017. J. Berestycki. 46 / 66
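An added R sketch of the proposition (the function name rmvnorm_chol and the example Σ are ours; note that R's chol() returns the upper-triangular factor, so its transpose gives the lower-triangular L with LL^T = Σ):

# Simulate n draws from N(mu, Sigma) via X = L Z + mu with L L^T = Sigma
rmvnorm_chol <- function(n, mu, Sigma) {
  d <- length(mu)
  L <- t(chol(Sigma))                     # lower-triangular L such that L %*% t(L) = Sigma
  Z <- matrix(rnorm(n * d), nrow = d)     # d x n matrix of independent N(0,1) variables
  t(L %*% Z + mu)                         # n x d matrix: each row is one draw from N(mu, Sigma)
}
Sigma <- matrix(c(2, 0.5, 0.5, 1), 2, 2)  # example covariance matrix
X <- rmvnorm_chol(10000, mu = c(0, 1), Sigma = Sigma)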

Outline Introduction Monte Carlo integration Random variable generation Inversion Method Transformation Methods Rejection Sampling Part A Simulation. HT 2017. J. Berestycki. 47 / 66

Rejection Sampling Let X be a continuous r.v. on Ω with pdf f_X. Consider a continuous r.v. U > 0 such that the conditional pdf of U given X = x is f_{U|X}(u|x) = 1/f_X(x) if u < f_X(x), and 0 otherwise. The joint pdf of (X, U) is f_{X,U}(x, u) = f_X(x) f_{U|X}(u|x) = f_X(x) · (1/f_X(x)) · I(0 < u < f_X(x)) = I(0 < u < f_X(x)): the uniform distribution on the set A = {(x, u) : 0 < u < f_X(x), x ∈ Ω}. Part A Simulation. HT 2017. J. Berestycki. 48 / 66

Rejection Sampling Theorem (Fundamental Theorem of simulation) Let X be a r.v. on Ω with pdf or pmf f_X. Simulating X is equivalent to simulating (X, U) ∼ Unif({(x, u) : x ∈ Ω, 0 < u < f_X(x)}). Part A Simulation. HT 2017. J. Berestycki. 49 / 66

Rejection Sampling Direct sampling of (X, U) uniformly over the set A is in general challenging. Let S ⊇ A be a bigger set such that simulating a uniform r.v. on S is easy. Rejection sampling technique: 1. Simulate (Y, V) ∼ Unif(S), with simulated values y and v. 2. If (y, v) ∈ A then stop and return X = y, U = v; 3. otherwise go back to 1. The resulting r.v. (X, U) is uniformly distributed on A, and X is marginally distributed from f_X. Part A Simulation. HT 2017. J. Berestycki. 50 / 66

Example: Beta density Let X ∼ Beta(5, 5) be a continuous r.v. with pdf f_X(x) = (Γ(α + β)/(Γ(α)Γ(β))) x^{α-1}(1 - x)^{β-1}, 0 < x < 1, where α = β = 5. f_X(x) is upper bounded by 3 on [0, 1]. Part A Simulation. HT 2017. J. Berestycki. 51 / 66

Example: Beta density Let S = {(y, v) : y ∈ [0, 1], v ∈ [0, 3]}. 1. Simulate Y ∼ U([0, 1]) and V ∼ U([0, 3]), with simulated values y and v. 2. If v < f_X(y), return X = y. 3. Otherwise go back to Step 1. This only requires simulating uniform random variables and evaluating the pdf pointwise. Part A Simulation. HT 2017. J. Berestycki. 52 / 66
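An added R sketch of these three steps for Beta(5, 5), using the bound 3 from the previous slide (the function name rbeta55_once is ours):

# One Beta(5,5) draw by rejection from the uniform distribution on S = [0,1] x [0,3]
rbeta55_once <- function() {
  repeat {
    y <- runif(1)                        # Y ~ U([0,1])
    v <- runif(1, min = 0, max = 3)      # V ~ U([0,3]); 3 upper bounds the Beta(5,5) pdf
    if (v < dbeta(y, 5, 5)) return(y)    # accept if (y, v) falls under the pdf, i.e. in A
  }
}
x <- replicate(10000, rbeta55_once())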

Rejection Sampling Consider X a random variable on Ω with a pdf/pmf f(x), the target distribution. We want to sample from f using a proposal pdf/pmf q which we can sample. Proposition. Suppose we can find a constant M such that f(x)/q(x) ≤ M for all x ∈ Ω. The following rejection algorithm returns X ∼ f. Algorithm 2 Rejection sampling Step 1 - Simulate Y ∼ q and U ∼ U[0, 1], with simulated values y and u respectively. Step 2 - If u ≤ f(y)/(M q(y)) then stop and return X = y; Step 3 - otherwise go back to Step 1. Part A Simulation. HT 2017. J. Berestycki. 53 / 66
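Algorithm 2 written generically in R (an added sketch; the function name rejection_sample is ours, and the caller supplies the target density f, a sampler rq and density dq for the proposal, and the bound M):

# Generic rejection sampler: one draw X ~ f using proposal q, assuming f(x) <= M * q(x)
rejection_sample <- function(f, rq, dq, M) {
  repeat {
    y <- rq(1)                                 # Step 1: Y ~ q
    u <- runif(1)                              #         U ~ U[0,1]
    if (u <= f(y) / (M * dq(y))) return(y)     # Step 2: accept if u <= f(y)/(M q(y))
  }
}
# Example: N(0,1) target with a standard Cauchy proposal and M = sqrt(2*pi/e), as on a later slide
x <- replicate(10000, rejection_sample(dnorm, rcauchy, dcauchy, M = sqrt(2 * pi / exp(1))))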

Illustrations f(x) is the pdf of a Beta(5, 5) r.v.; the proposal density q is the pdf of a uniform r.v. on [0, 1]. Part A Simulation. HT 2017. J. Berestycki. 54 / 66

Illustrations X ∈ R with multimodal pdf; the proposal density q is the pdf of a standardized normal. Part A Simulation. HT 2017. J. Berestycki. 55 / 66

Rejection Sampling: Proof for discrete rv We have Pr(X = x) = ∑_{n=1}^∞ Pr(reject n - 1 times, draw Y = x and accept it) = ∑_{n=1}^∞ Pr(reject Y)^{n-1} Pr(draw Y = x and accept it). We have Pr(draw Y = x and accept it) = Pr(draw Y = x) Pr(accept Y | Y = x) = q(x) Pr(U ≤ f(Y)/(M q(Y)) | Y = x) = q(x) · f(x)/(M q(x)) = f(x)/M. Part A Simulation. HT 2017. J. Berestycki. 56 / 66

The probability of having a rejection is Pr(reject Y) = ∑_{x∈Ω} Pr(draw Y = x and reject it) = ∑_{x∈Ω} q(x) Pr(U > f(Y)/(M q(Y)) | Y = x) = ∑_{x∈Ω} q(x)(1 - f(x)/(q(x)M)) = 1 - 1/M. Hence we have Pr(X = x) = ∑_{n=1}^∞ Pr(reject Y)^{n-1} Pr(draw Y = x and accept it) = ∑_{n=1}^∞ (1 - 1/M)^{n-1} f(x)/M = f(x). Note the number of accept/reject trials has a geometric distribution with success probability 1/M, so the mean number of trials is M. Part A Simulation. HT 2017. J. Berestycki. 57 / 66

Rejection Sampling: Proof for continuous scalar rv Here is an alternative proof, given for a continuous scalar variable X; the rejection algorithm still works but f, q are now pdfs. We accept the proposal Y, where (U, Y) ∼ f_{U,Y} with f_{U,Y}(u, y) = q(y) I_{(0,1)}(u), whenever U ≤ f(Y)/(M q(Y)). We have Pr(X ≤ x) = Pr(Y ≤ x | U ≤ f(Y)/(M q(Y))) = Pr(Y ≤ x, U ≤ f(Y)/(M q(Y))) / Pr(U ≤ f(Y)/(M q(Y))) = [∫_{-∞}^x ∫_0^{f(y)/(M q(y))} f_{U,Y}(u, y) du dy] / [∫_{-∞}^{∞} ∫_0^{f(y)/(M q(y))} f_{U,Y}(u, y) du dy] = [∫_{-∞}^x ∫_0^{f(y)/(M q(y))} q(y) du dy] / [∫_{-∞}^{∞} ∫_0^{f(y)/(M q(y))} q(y) du dy] = ∫_{-∞}^x f(y) dy. Part A Simulation. HT 2017. J. Berestycki. 58 / 66

Example: Beta Density Assume you have, for α, β ≥ 1, f(x) = (Γ(α + β)/(Γ(α)Γ(β))) x^{α-1}(1 - x)^{β-1}, 0 < x < 1, which is upper bounded on [0, 1]. We propose to use as a proposal q(x) = I_{(0,1)}(x), the uniform density on [0, 1]. We need to find a bound M s.t. f(x)/(M q(x)) = f(x)/M ≤ 1. The smallest M is M = max_{0<x<1} f(x), and we obtain by solving f'(x) = 0: M = (Γ(α + β)/(Γ(α)Γ(β))) ((α - 1)/(α + β - 2))^{α-1} ((β - 1)/(α + β - 2))^{β-1}, the last two factors being denoted M̃, which gives f(y)/(M q(y)) = y^{α-1}(1 - y)^{β-1}/M̃. Part A Simulation. HT 2017. J. Berestycki. 59 / 66

Dealing with Unknown Normalising Constants In most practical scenarios, we only know f(x) and q(x) up to some normalising constants; i.e. f(x) = f̃(x)/Z_f and q(x) = q̃(x)/Z_q where f̃(x), q̃(x) are known but Z_f = ∫_Ω f̃(x)dx, Z_q = ∫_Ω q̃(x)dx are unknown/expensive to compute. Rejection can still be used: indeed f(x)/q(x) ≤ M for all x ∈ Ω iff f̃(x)/q̃(x) ≤ M̃, with M̃ = Z_f M/Z_q. Practically, this means we can ignore the normalising constants from the start: if we can find M̃ to bound f̃(x)/q̃(x) then it is correct to accept with probability f̃(x)/(M̃ q̃(x)) in the rejection algorithm. In this case the mean number N of accept/reject trials will equal Z_q M̃/Z_f (that is, M again). Part A Simulation. HT 2017. J. Berestycki. 60 / 66

Simulating Gamma Random Variables We want to simulate a random variable X ∼ Gamma(α, β), which works for any α ≥ 1 (not just integers); f(x) = x^{α-1} exp(-βx)/Z_f for x > 0, with Z_f = Γ(α)/β^α, so f̃(x) = x^{α-1} exp(-βx) will do as our unnormalised target. When α = a is a positive integer we can simulate X ∼ Gamma(a, β) by adding a independent Exp(β) variables: Y_i ∼ Exp(β), X = ∑_{i=1}^a Y_i. Hence we can sample densities close in shape to Gamma(α, β), since we can sample Gamma(⌊α⌋, β). Perhaps this, or something like it, would make an envelope/proposal density? Part A Simulation. HT 2017. J. Berestycki. 61 / 66

Let a = ⌊α⌋ and let's try to use Gamma(a, b) as the envelope, so Y ∼ Gamma(a, b) for integer a ≥ 1 and some b > 0. The density of Y is q(x) = x^{a-1} exp(-bx)/Z_q for x > 0, with Z_q = Γ(a)/b^a, so q̃(x) = x^{a-1} exp(-bx) will do as our unnormalised envelope function. We have to check whether the ratio f̃(x)/q̃(x) = x^{α-a} exp(-(β - b)x) is bounded over R_+. Consider (a) x → 0 and (b) x → ∞. For (a) we need a ≤ α, so a = ⌊α⌋ is indeed fine. For (b) we need b < β (not b = β, since we need the exponential to kill off the growth of x^{α-a}). Part A Simulation. HT 2017. J. Berestycki. 62 / 66

Given that we have chosen a = ⌊α⌋ and b < β for the ratio to be bounded, we now compute the bound. d/dx (f̃(x)/q̃(x)) = 0 at x = (α - a)/(β - b) (and this must be a maximum at x ≥ 0 under our conditions on a and b), so f̃(x)/q̃(x) ≤ M̃ for all x ≥ 0 if M̃ = ((α - a)/(β - b))^{α-a} exp(-(α - a)). Accept Y at step 2 of the rejection sampler if U ≤ f̃(Y)/(M̃ q̃(Y)), where f̃(Y)/(M̃ q̃(Y)) = Y^{α-a} exp(-(β - b)Y)/M̃. Part A Simulation. HT 2017. J. Berestycki. 63 / 66
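Putting the pieces together, an added R sketch of this Gamma rejection sampler (the function name rgamma_rej is ours; it uses b = βa/α, the optimal choice derived on the next slide, and sums of exponentials to draw from the Gamma(a, b) envelope):

# Gamma(alpha, beta) sampler, alpha >= 1, by rejection from a Gamma(a, b) envelope
rgamma_rej <- function(alpha, beta) {
  a <- floor(alpha)
  if (a == alpha) return(sum(rexp(a, rate = beta)))   # integer alpha: no rejection needed
  b <- beta * a / alpha                               # optimal 0 < b < beta (next slide)
  M <- ((alpha - a) / (beta - b))^(alpha - a) * exp(-(alpha - a))   # bound on f~/q~
  repeat {
    y <- sum(rexp(a, rate = b))                       # Y ~ Gamma(a, b), a integer
    u <- runif(1)
    if (u <= y^(alpha - a) * exp(-(beta - b) * y) / M) return(y)    # accept w.p. f~(y)/(M q~(y))
  }
}
x <- replicate(10000, rgamma_rej(alpha = 2.5, beta = 1))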

Simulating Gamma Random Variables: Best choice of b Any 0 < b < β will do, but is there a best choice of b? Idea: choose b to minimize the expected number of simulations of Y per sample X output. Since the number N of trials is Geometric, with success probability Z_f/(M̃ Z_q), the expected number of trials is E(N) = Z_q M̃/Z_f. Now Z_f = Γ(α)/β^α, where Γ is the Gamma function related to the factorial. Practice: Show that the optimal b solves d/db (b^{-a} (β - b)^{-(α-a)}) = 0, so deduce that b = β(a/α) is the optimal choice. Part A Simulation. HT 2017. J. Berestycki. 64 / 66

Simulating Normal Random Variables Let f(x) = (2π)^{-1/2} exp(-x^2/2) and q(x) = 1/(π(1 + x^2)). We have f(x)/q(x) = √(π/2) (1 + x^2) exp(-x^2/2) ≤ √(2π/e) = M, which is attained at ±1. Hence the probability of acceptance is P(U ≤ f(Y)/(M q(Y))) = Z_f/(M Z_q) = 1/M = √(e/(2π)) ≈ 0.66 (here Z_f = Z_q = 1 since f and q are normalised), and the mean number of trials to success is approximately 1/0.66 ≈ 1.52. Part A Simulation. HT 2017. J. Berestycki. 65 / 66

Rejection Sampling in High Dimension Consider f(x_1, ..., x_d) = exp(-(1/2) ∑_{k=1}^d x_k^2) and q(x_1, ..., x_d) = exp(-(1/(2σ^2)) ∑_{k=1}^d x_k^2) (both unnormalised). For σ > 1, we have f(x_1, ..., x_d)/q(x_1, ..., x_d) = exp(-(1/2)(1 - 1/σ^2) ∑_{k=1}^d x_k^2) ≤ 1 = M. The acceptance probability of a proposal for σ > 1 is P(U ≤ f(X_1, ..., X_d)/(M q(X_1, ..., X_d))) = Z_f/(M Z_q) = σ^{-d}. The acceptance probability goes exponentially fast to zero with d. Part A Simulation. HT 2017. J. Berestycki. 66 / 66