Stat 451 Lecture Notes 05: Simulating Random Variables


Ryan Martin, UIC (www.math.uic.edu/~rgmartin)
Based on Chapter 6 in Givens & Hoeting, Chapter 22 in Lange, and Chapter 2 in Robert & Casella.
Updated: March 7, 2016.

Outline
1. Introduction
2. Direct sampling techniques
3. Fundamental theorem of simulation
4. Indirect sampling techniques
5. Sampling importance resampling
6. Summary

Motivation

Simulation is a very powerful tool for statisticians. It allows us to investigate the performance of statistical methods before delving deep into difficult theoretical work. At a more practical level, integrals themselves are important for statisticians: p-values are nothing but integrals, and Bayesians need to evaluate integrals to produce posterior probabilities, point estimates, and model selection criteria. Therefore, there is a need to understand simulation techniques and how they can be used for integral approximation.

Basic Monte Carlo

Suppose we have a function ϕ(x) and we'd like to compute

    E{ϕ(X)} = ∫ ϕ(x) f(x) dx,

where f(x) is a density. There is no guarantee that the techniques we learn in calculus are sufficient to evaluate this integral analytically. Thankfully, the law of large numbers (LLN) is here to help: if X_1, X_2, ... are iid samples from f(x), then

    (1/n) Σ_{i=1}^n ϕ(X_i) → ∫ ϕ(x) f(x) dx   with probability 1.

This suggests a generic approximation of the integral: sample lots of X_i's from f(x) and replace integration with averaging. This is the heart of the Monte Carlo method.
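For concreteness, here is a minimal R sketch of this recipe, using ϕ(x) = x² and f the Exp(1) density, so the true value E{ϕ(X)} = 2 is known. The choices of ϕ, f, and sample size are illustrative, not from the notes.

    set.seed(451)                 # for reproducibility
    phi <- function(x) x^2        # the integrand phi
    x <- rexp(1e5, rate = 1)      # iid draws from f
    mean(phi(x))                  # Monte Carlo estimate; should be close to 2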

What follows?

Here we focus mostly on simulation techniques. Some of these will be familiar, others probably not. As soon as we know how to produce samples from a distribution, the basic Monte Carlo method above can be used to approximate any expectation. But there are problems where it is not possible to sample from a distribution exactly. We'll discuss this point more later.

Part 2: Direct sampling techniques

Generating uniform RVs

Generating a single U from a uniform distribution on [0, 1] seems simple enough. However, there are a number of concerns to be addressed. For example, is it even possible for a computer, which is precise but ultimately discrete, to produce any number between 0 and 1? Furthermore, how can a deterministic computer possibly generate anything that's really random? While it's important to understand that these questions are out there, we will side-step them and assume that calls of runif in R produce bona fide uniform RVs.

Inverse cdf transform

Suppose we want to simulate X whose distribution has a given cdf F, i.e., (d/dx) F(x) = f(x). If F is continuous and strictly increasing, then F^{−1} exists. Then sampling U ~ Unif(0, 1) and setting X = F^{−1}(U) does the job (can you prove it?). This method is (sometimes) called the inversion method. The assumptions above can be weakened to some extent.

Example: exponential distribution

For an exponential distribution with rate λ, we have

    f(x) = λ e^{−λx}   and   F(x) = 1 − e^{−λx}.

It is easy to check that the inverse cdf is

    F^{−1}(u) = −log(1 − u)/λ,   u ∈ (0, 1).

Therefore, to sample X from an exponential distribution:
1. Sample U ~ Unif(0, 1).
2. Set X = −log(1 − U)/λ.
This is easily vectorized to get samples of size n, and it is what the R function rexp implements; be careful about the rate vs. scale parametrization.
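A small R sketch of the inversion recipe (the rate value here is arbitrary):

    lambda <- 2
    u <- runif(1e4)                # step 1, vectorized
    x <- -log(1 - u) / lambda      # step 2: X = F^{-1}(U)
    c(mean(x), 1 / lambda)         # sample mean should be near the true mean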

Example: Cauchy distribution

The standard Cauchy distribution has pdf and cdf

    f(x) = 1 / [π(1 + x²)]   and   F(x) = 1/2 + arctan(x)/π.

This distribution has a shape similar to the normal, but its tails are much heavier; the Cauchy has no finite moments. But its cdf can be inverted:

    F^{−1}(u) = tan[π(u − 1/2)],   u ∈ (0, 1).

To generate X from the standard Cauchy:
1. Sample U ~ Unif(0, 1).
2. Set X = tan[π(U − 1/2)].
For a non-standard Cauchy (location µ and scale σ), take µ + σX. In R, use rt(n, df=1).
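The same inversion pattern in R (the values of µ and σ are arbitrary):

    u <- runif(1e4)
    x <- tan(pi * (u - 1/2))     # standard Cauchy
    mu <- 1; sigma <- 2
    y <- mu + sigma * x          # Cauchy with location mu and scale sigma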

Example: discrete uniform distribution

Suppose we want X to be sampled uniformly from {1, ..., N}. Here is an example where the cdf is neither continuous nor strictly increasing. The idea is as follows:
1. Divide the interval [0, 1] into N equal subintervals, i.e., [0, 1/N), [1/N, 2/N), and so forth.
2. Sample U ~ Unif(0, 1).
3. If i/N ≤ U < (i + 1)/N, then X = i + 1.
More simply, set X = ⌊NU⌋ + 1. This is equivalent to sample(N, size=1) in R.
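As a one-line R illustration of the floor trick:

    N <- 10
    x <- floor(N * runif(1)) + 1   # same distribution as sample(N, size = 1)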

Example: triangular distribution

The (symmetric!) pdf of X is given by

    f(x) = 1 + x  if −1 ≤ x < 0,
    f(x) = 1 − x  if 0 ≤ x ≤ 1.

If we restrict X to [0, 1], then the cdf is simply

    F(x) = 1 − (1 − x)²,   x ∈ [0, 1].

For this sub-problem the inverse is

    F^{−1}(u) = 1 − √(1 − u),   u ∈ [0, 1].

To sample X from the triangular distribution:
1. Sample U ~ Unif(0, 1).
2. Set X* = 1 − √(1 − U).
3. Take X = ±X* based on a flip of a coin.
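In R, the three steps might look like this (a sketch):

    n <- 1e4
    u <- runif(n)
    xstar <- 1 - sqrt(1 - u)              # draw from the [0, 1] half
    s <- ifelse(runif(n) < 0.5, -1, 1)    # coin flip for the sign
    x <- s * xstar                        # triangular on [-1, 1]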

Sampling normal RVs

While normal RVs can, in principle, be generated using the cdf transform method, this requires evaluation of the standard normal inverse cdf, which is a non-trivial calculation. There are a number of fast and efficient alternatives for generating normal RVs. The one below, due to Box and Muller, is based on some trigonometric transformations.

Box-Muller method

This method generates a pair of normal RVs X and Y. It is based on the following facts: the Cartesian coordinates (X, Y) are equivalent to the polar coordinates (Θ, R), and the polar coordinates have joint pdf

    (2π)^{−1} r e^{−r²/2},   (θ, r) ∈ [0, 2π] × [0, ∞).

Then Θ ~ Unif(0, 2π) and R² ~ Exp(2) are independent. So to generate independent normal X and Y:
1. Sample U, V ~ Unif(0, 1).
2. Set R² = −2 log V and Θ = 2πU.
3. Finally, take X = R cos Θ and Y = R sin Θ.
Take a linear function to get a different mean and variance.
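A direct R transcription of the three steps:

    u <- runif(1e4); v <- runif(1e4)
    r <- sqrt(-2 * log(v))       # R^2 ~ Exp(2)
    theta <- 2 * pi * u          # Theta ~ Unif(0, 2*pi)
    x <- r * cos(theta)
    y <- r * sin(theta)          # x and y are independent N(0, 1)
    # mu + sigma * x would give N(mu, sigma^2)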

Bernoulli RVs

Perhaps the simplest RVs are Bernoulli RVs, ones that take only the values 0 or 1. To generate X ~ Ber(p):
1. Sample U ~ Unif(0, 1).
2. If U ≤ p, then set X = 1; otherwise set X = 0.
In R, use rbinom(n, size=1, prob=p).

Binomial RVs

Since X ~ Bin(n, p) is distributionally the same as X_1 + ... + X_n, where the X_i's are independent Ber(p) RVs, the previous slide gives a natural strategy to sample X. That is, to sample X ~ Bin(n, p), generate X_1, ..., X_n independently from Ber(p) and set X equal to their sum.
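In R, one such draw might be (a sketch; rbinom does this more efficiently):

    n <- 20; p <- 0.3
    x <- sum(runif(n) <= p)   # sum of n Bernoulli(p) indicators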

Poisson RVs

Poisson RVs can be constructed from a Poisson process, an integer-valued continuous-time stochastic process. By definition, the number of events of a Poisson process in a fixed interval of time is a Poisson RV with mean proportional to the length of the interval, and the times between events are independent exponentials. Therefore, if Y_1, Y_2, ... are independent Exp(1) RVs, then

    X = max{k : Y_1 + ... + Y_k ≤ λ}

satisfies X ~ Pois(λ). In R, use rpois(n, lambda).
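A literal R implementation of this waiting-time construction (rpois is the practical choice; the function name here is made up):

    rpois_one <- function(lambda) {
      k <- 0
      total <- rexp(1)              # first arrival time
      while (total <= lambda) {     # count arrivals up to "time" lambda
        k <- k + 1
        total <- total + rexp(1)
      }
      k
    }
    x <- replicate(1e4, rpois_one(5))  # compare with rpois(1e4, 5)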

Chi-square RVs

The chi-square RV X (with n degrees of freedom) is defined as follows: let Z_1, ..., Z_n be independent N(0, 1) RVs, and take X = Z_1² + ... + Z_n². Therefore, to sample X ~ ChiSq(n), take the sum of squares of n independent standard normal RVs. The independent normals can be sampled using Box-Muller.
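In R, a single draw by this definition:

    n <- 5
    x <- sum(rnorm(n)^2)   # one ChiSq(n) variate; compare with rchisq(1, df = n)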

Student-t RVs

A Student-t RV X (with ν degrees of freedom) is defined as the ratio of a standard normal to the square root of an independent (normalized) chi-square RV. More formally, let Z ~ N(0, 1) and Y ~ ChiSq(ν); then

    X = Z / √(Y/ν)

is a t_ν RV. Remember the scale mixture of normals representation...? In R, use rt(n, df=nu).
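The definition translates directly (a sketch; ν chosen arbitrarily):

    nu <- 3
    z <- rnorm(1e4)
    y <- rchisq(1e4, df = nu)
    x <- z / sqrt(y / nu)    # t with nu degrees of freedom; compare with rt(1e4, df = nu)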

Multivariate normal RVs

The p-dimensional normal distribution has a mean vector µ and a p × p variance-covariance matrix Σ. The techniques above can be used to sample a vector Z = (Z_1, ..., Z_p) of independent normal RVs. But how to incorporate the dependence contained in Σ? Let Σ = ΩΩᵀ be the Cholesky decomposition of Σ. It can be shown that X = µ + ΩZ has the desired p-dimensional normal distribution. Can you prove it?
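A small R sketch with an arbitrary 2-dimensional example; note that chol() returns the upper-triangular factor, so we transpose to get Ω:

    mu <- c(0, 1)
    Sigma <- matrix(c(2, 0.5, 0.5, 1), 2, 2)
    Omega <- t(chol(Sigma))    # lower triangular, Sigma = Omega %*% t(Omega)
    z <- rnorm(2)              # independent standard normals
    x <- mu + Omega %*% z      # one draw from N(mu, Sigma)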

Part 3: Fundamental theorem of simulation

Intuition

Let f be a density function on an arbitrary space X; the goal is to simulate from f. Note the trivial identity:

    f(x) = ∫₀^{f(x)} du.

This identity implicitly introduces an auxiliary variable U with a conditionally uniform distribution. The intuition behind this viewpoint is that simulating from the joint distribution of (X, U) might be easy, and then we can just throw away U to get a sample of X ~ f...

The theorem

Theorem. Simulating X ~ f is equivalent to simulating (X, U) ~ Unif({(x, u) : 0 < u < f(x)}) and then throwing away U.

Proof: Write the density of (X, U) and integrate out U.

How to implement this? One kind-of-silly idea (not actually silly; it's used in the accept-reject method): X ~ f and U | (X = x) ~ Unif(0, f(x)). A better idea: conditioning preserves uniformity, i.e., Z ~ Unif(𝒵) implies Z | (Z ∈ 𝒵₀) ~ Unif(𝒵₀).

More on implementation

The "conditioning preserves uniformity" point can be interpreted as follows. Suppose that A is a set that contains {(x, u) : 0 < u < f(x)}. Simulate (X, U) uniformly on A, and keep (X, U) only if U < f(X). Such a pair (X, U) is uniformly distributed on the constraint set, so X has the desired distribution f. Efficiency of sampling depends on how tightly A fits the desired constraint set. For a one-dimensional X with bounded support and bounded f, a reasonable choice for A is a rectangle; see below. The idea extends; this is what the next sections are about!

Example: beta simulation

Simulate from a Beta(2.7, 6.3) distribution. Start with uniforms in a box, but keep only those that satisfy the constraint. Simulated 2000 uniforms; kept only 744 betas.

[Figure: left, the accepted (Z, U) pairs lying under the Beta(2.7, 6.3) density curve; right, a histogram of the kept X values with the density overlaid.]
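A minimal R version of this experiment. The box height M = 2.7 is an assumption here; any M at least as large as the maximum of the density works, and smaller M means fewer wasted draws:

    M <- 2.7
    n <- 2000
    z <- runif(n)                     # horizontal coordinate
    u <- runif(n, 0, M)               # vertical coordinate, uniform on [0, M]
    keep <- u < dbeta(z, 2.7, 6.3)    # the constraint u < f(z)
    x <- z[keep]                      # approximately Beta(2.7, 6.3) draws
    length(x)                         # roughly n/M of the draws survive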

Part 4: Indirect sampling techniques (acceptance rejection sampling; ratio method)


Majorization

Suppose we want to sample from a distribution with pdf f(x). Suppose, further, that f(x) ≤ h(x) := M g(x), where g(x) is a (simple) pdf and M > 1 is a constant. We say that h(x) majorizes f(x). The goal is to use samples from the (easy-to-sample) pdf g(x) as approximate samples from f(x). But it is clear that, unless g(x) = f(x), there will be samples from g that are not representative of f. The idea is to throw away those bad samples.

Acceptance rejection method

The rule that determines when a sample is thrown away is called the acceptance rejection method. To sample X from f(x):
1. Sample U ~ Unif(0, 1).
2. Sample X* ~ g(x).
3. Keep X = X* if U ≤ f(X*)/h(X*).
Try to prove the following: accept-reject returns a sample from f (see the "fundamental theorem"), and the acceptance probability is 1/M. The goal is to make the acceptance probability high, i.e., M ≈ 1... (?) Note: it is not necessary to know f exactly; it's enough to know f only up to a proportionality constant.
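A generic R sketch (the function name and interface are made up): the user supplies the target density f (possibly unnormalized), the proposal density g, a sampler rg from g, and a constant M with f ≤ M g on the same scale as f:

    accept_reject <- function(n, f, g, rg, M) {
      out <- numeric(0)
      while (length(out) < n) {
        x <- rg(n)                                # candidate draws from g
        u <- runif(n)
        out <- c(out, x[u <= f(x) / (M * g(x))])  # keep with prob f/(M*g)
      }
      out[1:n]
    }
    # e.g., Beta(2.7, 6.3) with a uniform proposal:
    x <- accept_reject(1e3, function(x) dbeta(x, 2.7, 6.3),
                       dunif, runif, M = 2.7)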

Example: Robert & Casella book cover

The true density is a weird trig function (see code). The majorant is a normal. Histogram of n = 2000 samples from the trig density.

[Figure: left, the trig target density with its normal majorant; right, a histogram of the accepted samples with the density overlaid.]

Example: von Mises distribution

Consider the von Mises distribution with pdf

    f(x) = exp{κ cos x} / [2π I₀(κ)],   x ∈ [0, 2π],

where I₀(κ) is a Bessel function. This distribution is often referred to as a circular normal distribution and is a popular model in directional statistics. To sample from the von Mises distribution for fixed κ, we implement the acceptance-rejection method with a majorant made out of exponential pdfs. The only trick is to get a good envelope/majorant. See code for more details.

von Mises distribution (cont.)

von Mises distribution with κ = 2. Uses two oppositely oriented exponentials for the envelope. n = 5000 samples; the acceptance rate (for κ = 2) is 0.81. (For some reason, the majorant construction in my code only works for κ > 1...)

[Figure: left, the (un-normalized) density with its majorant; right, a histogram of the samples with the density overlaid.]

Example: small-shape gamma distribution

Suppose we want X ~ Gamma(α, 1), with α ≤ 0.001. Try using rgamma for this; you'll get lots of exact zeros! Can we develop a better/more efficient method? Towards an accept-reject method, we have the following: −α log X → Exp(1) in distribution as α → 0. This suggests that we can get good samples of log X by doing accept-reject with (basically) an exponential envelope. This is some work of mine: a version of the paper is at arXiv:1302.1884; R code is on my research website.

Example: small-shape gamma (cont.)

The left panel shows the (un-normalized) density h_α(z) of Z = −α log X, for α = 0.005, along with the proposed envelope. It turns out that this approach (rgamss) is more efficient than other accept-reject methods for this problem (Ahrens-Dieter, Best, Kundu-Gupta), based on acceptance rate as a function of α.

[Figure: left, h_α(z) with its envelope; right, acceptance rates of the four methods as α ranges over (0, 0.1].]

Part 4 (cont.): Ratio method

Bounding set

Suppose we want to sample from a pdf f(x). All that matters is the shape of f(x), so we can remove any constants and consider h(x) = c f(x), for some c > 0. Define the set in R²:

    S_h = {(u, v) : 0 < u ≤ [h(v/u)]^{1/2}}.

If S_h is bounded, then we can find a bounding set that encloses it. Ideally, the bounding set should be simple, e.g., a rectangle.

Ratio method

Suppose S_h is bounded, in which case we can find a rectangle that encloses it. To sample X from f(x), the ratio method goes as follows:
1. Sample (U, V) uniformly from the bounding rectangle.
2. If (U, V) ∈ S_h, then X = V/U is a sample from f(x).
The proof is not too hard, but requires some tedious Jacobian-type calculations; see Lange. Naturally, some draws from the rectangle will be rejected; the efficiency of the sampling algorithm depends on how closely the bounding set matches S_h.

Example: gamma distribution

Sample X ~ Gamma(α, 1) for non-integer α. To apply the ratio method, take h(x) = x^{α−1} e^{−x} for x > 0. It can be shown that, in general, if h(x) = 0 for x < 0, then the rectangle [0, k_u] × [0, k_v], with

    k_u = sup_x [h(x)]^{1/2}   and   k_v = sup_x x [h(x)]^{1/2},

encloses S_h. For the gamma case,

    k_u = [(α − 1)/e]^{(α−1)/2}   and   k_v = [(α + 1)/e]^{(α+1)/2}.

Example: gamma distribution (cont.)

The ratio method samples X ~ Gamma(α, 1) as follows:
1. Sample U′, V′ ~ Unif(0, 1); set U = k_u U′ and V = k_v V′.
2. Set X = V/U.
3. If U ≤ X^{(α−1)/2} e^{−X/2}, then accept X.
Example: α = 7.7. The ratio method has acceptance probability 0.44.

[Figure: histogram of the accepted draws with the Gamma(7.7, 1) density overlaid.]
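These steps in R, vectorized (a sketch following the slide's formulas):

    alpha <- 7.7
    ku <- ((alpha - 1) / exp(1))^((alpha - 1) / 2)
    kv <- ((alpha + 1) / exp(1))^((alpha + 1) / 2)
    n <- 1e4
    u <- ku * runif(n)                              # step 1
    v <- kv * runif(n)
    x <- v / u                                      # step 2
    keep <- u <= x^((alpha - 1) / 2) * exp(-x / 2)  # step 3
    x <- x[keep]                                    # Gamma(alpha, 1) draws
    mean(keep)                                      # acceptance rate, about 0.44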

Part 5: Sampling importance resampling

Motivation

The previous methods are exact in the sense that the distribution of the draw X (given that it's accepted) is the target distribution f. Is it necessary for the sampling to be exact? An interesting idea is to sample from a different distribution g(x), which is similar to f(x), and weight these samples in such a way that a resample according to the given weights looks like a sample from f(x). This is the idea of sampling importance resampling (SIR). It is not unlike acceptance rejection sampling.

SIR algorithm

Suppose that the target pdf is f(x), which may be known only up to a proportionality constant, and let g(x) be another pdf with the same support as f(x). The SIR algorithm:
1. Take an independent sample Y_1, ..., Y_m from g.
2. Calculate the standardized importance weights

    w(Y_j) ∝ f(Y_j)/g(Y_j),   j = 1, ..., m.

3. Resample X_1, ..., X_n with replacement from {Y_1, ..., Y_m} with probabilities w(Y_1), ..., w(Y_m).
The resulting sample X_1, ..., X_n is approximately distributed according to f(x).
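A compact R sketch (the function name and interface are made up): the user supplies f (possibly unnormalized), the envelope density g, and a sampler rg from g:

    sir <- function(n, m, f, g, rg) {
      y <- rg(m)                       # step 1: sample from the envelope
      w <- f(y) / g(y)                 # step 2: importance weights
      w <- w / sum(w)                  # standardize
      sample(y, size = n, replace = TRUE, prob = w)   # step 3: resample
    }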

Remarks

One can prove that, as m → ∞,

    P(X_1 ∈ A) → ∫_A f(x) dx,

and it is in this sense that the sampling is approximate. The choice of envelope g is not trivial: we need f(x)/g(x) to not be too large. If this is violated, then it may happen that one weight w(Y_j) is almost identically 1, meaning that the X-sample would all be the same value. Theory suggests m should be large, but "large" here depends on the desired size n of the sample from the target. In general, Monte Carlo estimates based on the SIR sample will have larger variances than could be obtained with other methods (e.g., importance sampling, next).

Example: Bayesian inference via SIR

Consider the problem from an old homework, where the likelihood function is

    L(θ) ∝ ∏_{i=1}^n {1 − cos(X_i − θ)},   −π ≤ θ ≤ π.

The observed data (X_1, ..., X_n) are given in the code. Assume that θ is given a Unif(−π, π) prior distribution. Use the SIR algorithm with the prior as the envelope; N = 10³. Plots below.

Example: Bayesian inference via SIR (cont.)

Left is a histogram of the importance weights. Right is a histogram of the SIR sample (posterior density overlaid).

[Figure: left, histogram of the weights w; right, histogram of the sampled θ values.]

Part 6: Summary

Remarks

Simulating random variables is important for various applications of the Monte Carlo method. Some distributions can be easily simulated via the inversion method, while others require more care. The fundamental theorem provides a general strategy for simulating from non-standard distributions, though its implementation may not be straightforward. The accept-reject method is a clever implementation; its efficiency relies on how close the envelope function is to the target density. It is not easy to make a good choice by hand, but various automatic/adaptive methods are available. The accept-reject idea appeared in the SIR context, and will appear again later...