Simulating Random Variables


Simulating Random Variables. Timothy Hanson, Department of Statistics, University of South Carolina. Stat 740: Statistical Computing.

R has many built-in random number generators... Beta, gamma (also χ² and exponential), normal (also Student's t & Cauchy, F, log-normal), Weibull, logistic, binomial, geometric, hypergeometric, Poisson, etc. For each distribution, R provides the pdf/pmf, quantile function, cdf, and an independent random number generator. R also has the distributions of various test statistics, e.g. Tukey's studentized range, the Wilcoxon rank sum statistic, etc. There are packages to sample the multivariate normal, Wishart and inverse Wishart, multivariate t, Pareto, etc. Google is your friend. We will discuss methods for simulating random variables anyway, for when you run into non-standard ones.
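
As a quick illustration of the d/p/q/r naming convention, here is a minimal sketch evaluating the normal pdf, cdf, and quantile function and drawing a few variates; any distribution with a base-R generator works the same way.

dnorm(1.96)                       # pdf evaluated at 1.96
pnorm(1.96)                       # cdf: P(Z <= 1.96), about 0.975
qnorm(0.975)                      # quantile function: about 1.96
rnorm(5, mean = 0, sd = 1)        # five independent N(0,1) draws
rgamma(5, shape = 2, rate = 1)    # five Gamma(2,1) draws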

Everything starts with uniform... Simulating U_1, U_2, U_3, ... iid U(0,1) is the main building block for all that follows. Random uniform generators are not truly random. In R, try set.seed(1) then runif(10) several times. They are said to be pseudo-random: they satisfy certain statistical tests we'd expect independent uniforms to pass, e.g. Kolmogorov-Smirnov. Look up the Diehard tests on Wikipedia. Try ?RNG to see what R is capable of and what the default is. Historically common: congruential generators, see pp. 72-75. By default, R sets the seed from the current time and process ID.
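
A minimal sketch of the point about pseudo-randomness: re-setting the seed reproduces exactly the same "random" uniforms, while continuing without a reset advances the stream.

set.seed(1)
runif(10)      # ten U(0,1) draws
set.seed(1)
runif(10)      # identical to the first ten draws
runif(10)      # different: the stream continues from the current state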

Inverse transformation. Important result (p. 39): $U \sim U(0,1)$ and $X = F^{-}(U)$ implies $X \sim F(\cdot)$. The generalized inverse of a non-decreasing cdf $F(\cdot)$ is $F^{-}(u) = \inf\{x : F(x) \geq u\}$. If $F(\cdot)$ is monotone increasing and continuous over its support, representing a continuous random variable, then $F^{-}(u) = F^{-1}(u)$; you just need to find the inverse function. Proof of the result is straightforward (board). (p. 44) If $F(\cdot)$ is a stair function with jumps at $x_1, x_2, x_3, \ldots$, representing a discrete random variable, then $U \sim U(0,1)$ and $X = x_j$ whenever $F(x_{j-1}) < U \leq F(x_j)$ implies $X \sim F(\cdot)$. Here $F(x_0) = 0$. sample automates this last result; ddiscrete, pdiscrete, qdiscrete, and rdiscrete are in the e1071 package.
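
A minimal sketch of inversion for the exponential case: if $X \sim \exp(\lambda)$ then $F(x) = 1 - e^{-\lambda x}$, so $F^{-1}(u) = -\log(1-u)/\lambda$ (and since $1-U$ is also uniform, $-\log(U)/\lambda$ works too).

lambda <- 2
u <- runif(10000)
x <- -log(1 - u) / lambda                       # inverse-cdf transform of uniforms
c(mean(x), mean(rexp(10000, rate = lambda)))    # both near 1/lambda = 0.5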

Examples... Inverses easily derived in closed form: exp(λ) (Ex. 2.5, p. 39), Weibull(α, β), Pareto, Cauchy. Inverses not available in closed form: normal (although R uses inversion as the default!), beta, gamma, F (is there another way?).

Section 2.2: Tricks and relationships... Lots of clever tricks, relationships among variables, etc. on pp. 42-46: Box-Muller for normal r.v.s (rnorm can optionally use it), Poisson via waiting times, beta from order statistics, gamma from beta & exponential, etc. These can be used but are often not optimal. There are a few that are good for MCMC (coming up).
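
As one example from this list, a minimal sketch of the Box-Muller transform: two independent uniforms become two independent standard normals.

u1 <- runif(5000); u2 <- runif(5000)
r  <- sqrt(-2 * log(u1))            # radius
z1 <- r * cos(2 * pi * u2)          # first N(0,1) draw
z2 <- r * sin(2 * pi * u2)          # second, independent N(0,1) draw
# optional: RNGkind(normal.kind = "Box-Muller") makes rnorm use this method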

Sampling multivariate normals. Want to simulate $y \sim N_p(\mu, \Sigma)$. Recall if $z_1, \ldots, z_p$ iid $N(0,1)$, $z = (z_1, \ldots, z_p)'$, $a \in \mathbb{R}^m$ and $A \in \mathbb{R}^{m \times p}$, then $a + Az \sim N_m(a, AA')$. A Cholesky decomposition produces a $C$ such that $\Sigma = C'C$ where $C$ is upper triangular. Thus $\mu + C'z \sim N_p(\mu, \Sigma)$.
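
A minimal sketch of this recipe in R, with an illustrative mean vector and covariance matrix; R's chol() returns the upper-triangular C satisfying Sigma = C'C.

mu <- c(1, -2)
Sigma <- matrix(c(2, 0.5, 0.5, 1), 2, 2)
C <- chol(Sigma)                      # upper triangular; t(C) %*% C equals Sigma
z <- rnorm(2)                         # iid N(0,1)
y <- mu + drop(t(C) %*% z)            # one draw from N_2(mu, Sigma)
# many draws at once: t(mu + t(C) %*% matrix(rnorm(2 * 1000), 2, 1000))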

Some other multivariate distributions. To sample $(q_1, \ldots, q_p) \sim \text{Dirichlet}(\alpha_1, \ldots, \alpha_p)$, take independent $y_i \sim \Gamma(\alpha_i, 1)$ and set $q_i = y_i / \sum_{j=1}^{p} y_j$ for $i = 1, \ldots, p$. To sample $\Sigma \sim \text{Wishart}_p(k, S_0)$, the definition $\Sigma = \sum_{i=1}^{k} x_i x_i'$ with $x_1, \ldots, x_k$ iid $N_p(0, S_0)$ is impractical when $k$ is large; Odell and Feiveson (1966, JASA) give an efficient method based on $\chi^2$ and normal r.v.s. To sample $\mathbf{n} \sim \text{mult}(n, q)$, independently sample discrete $Y_1, \ldots, Y_n$ where $P(Y_i = k) = q_k$ for $k = 1, \ldots, p$ and set $n_k = \sum_{i=1}^{n} I\{Y_i = k\}$, $k = 1, \ldots, p$. Examples: multivariate normal; Dirichlet.
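
A minimal sketch of the Dirichlet recipe (normalized independent gammas) and the multinomial recipe via repeated discrete draws, using illustrative parameter values:

alpha <- c(2, 3, 5)
y <- rgamma(length(alpha), shape = alpha, rate = 1)
q <- y / sum(y)                          # one draw from Dirichlet(2, 3, 5)

n <- 20
Y <- sample(seq_along(q), size = n, replace = TRUE, prob = q)
n_k <- tabulate(Y, nbins = length(q))    # one draw from mult(n, q)
# sanity check against base R: rmultinom(1, size = n, prob = q)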

Some other multivariate distributions. There are algorithms to simulate many other multivariate distributions (e.g. multivariate t); Google is your friend. R has rWishart and rmultinom; rdirichlet is in MCMCpack, rmvnorm is in mvtnorm, etc. Many more versions of all of these are floating around in different packages, as well as functions to evaluate the pdf/pmf/cdf, etc. IMSL is a library of numerical routines for FORTRAN 90/95 that includes various random number generators, pmf/pdf/cdf/quantile functions, etc.

Fundamental theorem of simulation. Back to univariate simulation... Over pp. 47-50 is a general idea that can be paraphrased as follows. To simulate from a (possibly unnormalized) density $Y \sim f(\cdot)$, find a density $g(x)$ such that $f(x) \leq M g(x)$ for all $x$, then (a) simulate $X \sim g(\cdot)$ and (b) accept $Y = X$ with probability $f(X)/[M g(X)]$. If $Y$ is not accepted, repeat (a) and (b). This is the same as drawing $X \sim g(\cdot)$ independently of $U \sim U(0,1)$ and accepting $Y = X$ if $U \leq f(X)/[M g(X)]$. This is called the accept-reject algorithm. Read Section 2.3.2 for examples and implementation notes. In particular, the probability of accepting is $1/M$ when both densities are normalized.
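
A minimal sketch of accept-reject for an illustrative target, the Beta(2,2) density f(x) = 6x(1-x), using a U(0,1) proposal g and the bound M = 1.5 (the maximum of f):

f <- function(x) 6 * x * (1 - x)     # Beta(2,2) density (already normalized)
M <- 1.5                             # f(x) <= M * g(x) with g = dunif
draw_one <- function() {
  repeat {
    x <- runif(1)                    # proposal X ~ g
    u <- runif(1)                    # independent U ~ U(0,1)
    if (u <= f(x) / M) return(x)     # accept with probability f(x)/(M g(x))
  }
}
y <- replicate(10000, draw_one())
mean(y)                              # near 0.5; acceptance rate is 1/M = 2/3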

Some simple ideas. Show that $(X_1, X_2)$ with joint density $h(x_1, x_2) = I\{0 < x_2 < g(x_1)\}$ implies $X_1 \sim g(\cdot)$. This proves the fundamental theorem of simulation. Show that if $X_1 \sim g(\cdot)$ and $X_2 \mid X_1 \sim U(0, g(X_1))$, then the joint density is $h(x_1, x_2) = I\{0 < x_2 < g(x_1)\}$. This is how the pair $(X_1, X_2)$ is sampled. Finally, if $f(x) \leq M g(x)$ for all $x$, then $X_1 \sim g(\cdot)$, $X_2 \mid X_1 \sim U(0, M g(X_1))$, and conditioning on $X_2 < f(X_1)$ gives $X_1 \sim f(\cdot)$.

Accept-reject. Direct, unintuitive proof that it works:
$$P\left(Y \leq x \,\middle|\, U \leq \tfrac{f(Y)}{Mg(Y)}\right) = \frac{P\left(Y \leq x,\, U \leq \tfrac{f(Y)}{Mg(Y)}\right)}{P\left(U \leq \tfrac{f(Y)}{Mg(Y)}\right)} = \frac{\int_{-\infty}^{x} \int_{0}^{f(y)/[Mg(y)]} du\, g(y)\, dy}{\int_{-\infty}^{\infty} \int_{0}^{f(y)/[Mg(y)]} du\, g(y)\, dy} = \frac{\int_{-\infty}^{x} \{f(y)/[Mg(y)]\}\, g(y)\, dy}{\int_{-\infty}^{\infty} \{f(y)/[Mg(y)]\}\, g(y)\, dy} = \frac{\int_{-\infty}^{x} f(y)\, dy}{\int_{-\infty}^{\infty} f(y)\, dy}.$$
Example in R: multimodal density on p. 50.

Envelope accept-reject. If $f(\cdot)$ is costly to evaluate we can add a lower squeezing function. Say $g_l(x) \leq f(x) \leq M g_m(x)$ for all $x$. (1) Draw $X \sim g_m(\cdot)$ independently of $U \sim U(0,1)$; (2) accept $Y = X$ if $U \leq g_l(X)/[M g_m(X)]$; (3) otherwise accept $Y = X$ if $U \leq f(X)/[M g_m(X)]$. Repeat if necessary until $Y$ is accepted. Then $Y \sim f(\cdot)$. Only more efficient if evaluating $f(\cdot)$ is costly. See examples pp. 54-55.
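
A minimal sketch of the squeeze step, reusing the Beta(2,2) example above with an assumed (crude) lower bound g_l(x) = 4x(1-x) <= f(x); the squeeze test avoids evaluating f when the cheaper bound already accepts. Here f is trivially cheap, so this is purely illustrative.

f  <- function(x) 6 * x * (1 - x)    # "expensive" target (illustrative only)
gl <- function(x) 4 * x * (1 - x)    # cheap lower squeeze, gl(x) <= f(x)
M  <- 1.5                            # f(x) <= M * dunif(x)
draw_one <- function() {
  repeat {
    x <- runif(1); u <- runif(1)
    if (u <= gl(x) / M) return(x)    # accepted by the squeeze: f never evaluated
    if (u <= f(x) / M) return(x)     # otherwise fall back to the usual test
  }
}
y <- replicate(5000, draw_one())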

Adaptive rejection sampling. A widely applicable, adaptive version of envelope accept-reject is available for (possibly unnormalized) densities $f(\cdot)$ that are log-concave, i.e. $\frac{d^2}{dx^2} \log f(x) < 0$ for all $x$; the algorithm is called adaptive rejection sampling (ARS). This method iteratively builds piecewise-linear envelope functions $\log g_l(x)$ and $\log g_m(x)$ around $\log f(x)$ and performs envelope accept-reject until acceptance. The rejected values $x_1, x_2, \ldots$ are where $\log f(x)$ is evaluated. Your book tersely describes the algorithm on pp. 56-57; I'll attempt to illustrate on the board. Wikipedia also has a nice explanation.

Adaptive rejection sampling. Each rejected $x_j$ is incorporated into the upper and lower envelopes, making them tighter where they need to be. Eventually $g_l$ and $g_m$ will be close enough to $f(x)$ to accept easily. Sampling from $g_m$ amounts to sampling piecewise truncated exponential distributions; easy! There is a derivative-free version and a slightly more efficient version that requires $\frac{d}{dx} \log f(x)$. For non-log-concave densities, i.e. any $f(x)$, one can use the adaptive rejection Metropolis sampling (ARMS) algorithm; more later. Coding it by hand is possible (see Wild & Gilks 1993, Applied Statistics) but a pain. Tim did it for his dissertation work.

R packages that perform ARS... ars: requires $\frac{d}{dx} \log f(x)$. MfUSampler: also does ARMS, slice sampling, and Metropolis-Hastings with a Gaussian proposal. There are others not on CRAN; Google "adaptive rejection R package". C and FORTRAN subroutines are also posted online. Example: ARS for N(0,1).
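
A rough sketch of the N(0,1) example using the ars package. This assumes ars::ars takes the log-density (up to a constant) as f, its derivative as fprima, and a few starting abscissae x, which matches the package's documented examples but should be checked against the current help page.

library(ars)                          # adaptive rejection sampling
logf  <- function(x) -x^2 / 2         # log N(0,1) density, up to a constant
dlogf <- function(x) -x               # its derivative (required by ars)
draws <- ars(n = 1000, f = logf, fprima = dlogf, x = c(-1, 0, 1))
c(mean(draws), sd(draws))             # should be near 0 and 1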

Metropolis-Hastings. We will cover Metropolis-Hastings (MH) in more detail later when we discuss MCMC for obtaining Bayesian inference from $\pi(\theta \mid x)$, but for now let's briefly introduce it as another method for simulating from a (possibly unnormalized) $f(\cdot)$. The MH algorithm produces a dependent sample $Y_1, \ldots, Y_n$ from $f(\cdot)$ that, if we are careful, we can use like an iid sample. Or we can simply take the last one, $Y = Y_n \sim f(\cdot)$.

Metropolis-Hastings. Here's one version, called an independence sampler. (0) Initialize $Y_0 = y_0$ for some $y_0$. Then for $j = 1, \ldots, n$ repeat (1) through (3): (1) generate $X \sim g(\cdot)$ independently of $U \sim U(0,1)$; (2) compute $\rho = 1 \wedge \frac{f(X)\, g(Y_{j-1})}{f(Y_{j-1})\, g(X)}$; (3) if $U \leq \rho$ accept $Y_j = X$, otherwise $Y_j = Y_{j-1}$. With positive probability successive values can be tied! Algorithm efficiency has to do with this probability. What is the probability of accepting a new value if $g(x) \propto f(x)$? Example in R: multimodal density on p. 50.
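
A minimal sketch of the independence sampler for an illustrative unnormalized target, an equal-weight mixture of N(-2, 0.5^2) and N(2, 0.5^2), with a heavy-tailed Cauchy proposal; the book's p. 50 multimodal example would slot in the same way.

f <- function(x) dnorm(x, -2, 0.5) + dnorm(x, 2, 0.5)   # unnormalized bimodal target
g <- function(x) dcauchy(x)                             # independence proposal density
n <- 10000
y <- numeric(n); y_cur <- 0                             # initialize Y_0 = 0
for (j in 1:n) {
  x <- rcauchy(1)                                       # X ~ g, independent of U
  rho <- min(1, f(x) * g(y_cur) / (f(y_cur) * g(x)))    # acceptance probability
  if (runif(1) <= rho) y_cur <- x                       # accept, else stay put
  y[j] <- y_cur
}
# hist(y, breaks = 60) shows the two modes near -2 and 2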

Method of composition. For a joint distribution $(X, Y) \sim f(x, y) = f_{X|Y}(x \mid y)\, f_Y(y)$ you can sample (a) $Y \sim f_Y(\cdot)$, then (b) $X \mid Y = y \sim f_{X|Y}(\cdot \mid y)$. The pair $(X, Y) \sim f(x, y)$ as required. This works when it is easy to sample $Y$ marginally, and easy to sample $X \mid Y$. Use this to get $(X_1, Y_1), \ldots, (X_n, Y_n)$. This is useful in many situations, but here's a common one. We are interested in $X_1, \ldots, X_n$ iid $f_X(\cdot)$ where $f_X(x) = \int \underbrace{f_{X|Y}(x \mid y)\, f_Y(y)}_{f(x,y)}\, dy$. The method of composition allows us to get an iid sample $X_1, \ldots, X_n$; we just throw away $Y_1, \ldots, Y_n$.

Two examples... It works the same for discrete mixtures: $f_X(x) = \sum_{j} f_{X|Y}(x \mid y_j)\, \underbrace{P(Y = y_j)}_{\pi_j}$. A finite mixture of univariate normals has density $f(x) = \sum_{j=1}^{J} \pi_j\, \phi(x \mid \mu_j, \sigma_j^2)$. Sampling $X \sim f(\cdot)$ is easily carried out via the method of composition: first sample $Y$ where $P(Y = j) = \pi_j$, then sample $X \mid Y \sim N(\mu_Y, \sigma_Y^2)$.
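
A minimal sketch of composition for an illustrative three-component normal mixture: draw the component label, then draw the normal given the label, and discard the labels.

pi_j <- c(0.3, 0.5, 0.2)             # mixture weights
mu_j <- c(-3, 0, 4)                  # component means
sd_j <- c(1, 0.5, 2)                 # component sds
n <- 10000
y <- sample(1:3, n, replace = TRUE, prob = pi_j)   # Y: component labels
x <- rnorm(n, mean = mu_j[y], sd = sd_j[y])        # X | Y ~ N(mu_Y, sigma_Y^2)
# x is an iid sample from the mixture density; y is thrown away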

$t_\nu$. Another example, the t distribution: $Y \sim \chi^2_\nu$ and $X \mid Y \sim N(0, \nu/Y)$ imply $X \sim t_\nu$. Note that this is just the definition. Let $Y \sim \chi^2_\nu$ independent of $Z \sim N(0,1)$ and let $X = \frac{Z}{\sqrt{Y/\nu}} = Z\sqrt{\nu/Y}$. Then $X \mid Y \sim N(0, \nu/Y)$.
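
A minimal sketch of the composition recipe for $t_\nu$, using an illustrative $\nu = 5$ and comparing against R's built-in generator:

nu <- 5
n <- 10000
y <- rchisq(n, df = nu)                       # Y ~ chi-square_nu
x <- rnorm(n, mean = 0, sd = sqrt(nu / y))    # X | Y ~ N(0, nu/Y)
# compare tail quantiles with the built-in t quantiles
quantile(x, c(0.05, 0.95)); qt(c(0.05, 0.95), df = nu)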

Frequentist properties of estimators. Most papers in JASA, Biometrics, Statistics in Medicine, JRSS-B, etc. have a simulation section. Data are generated under known conditions and then an estimator/inferential procedure is applied. That is, $x_1, \ldots, x_n \sim f(x_1, \ldots, x_n \mid \theta)$ is generated $M$ times with known $\theta_0 = (\theta_{01}, \ldots, \theta_{0k})$, producing $M$ estimates $\hat\theta^1, \ldots, \hat\theta^M$, $M$ sets of $k$ SEs or posterior SDs, and $M$ sets of $k$ CIs. Typically $M = 500$ or $M = 1000$, and $n$ reflects common sample sizes found in clinical settings or in the data analysis section, e.g. $n = 100$, $n = 500$, $n = 1000$, $n = 5000$, etc. Often two or three sample sizes are chosen.

Frequentist properties of estimators. Common things to look at are: the $k$ biases, $\frac{1}{M}\sum_{m=1}^{M} \hat\theta_j^m - \theta_{0j}$; the $k$ average SEs or SDs, $\frac{1}{M}\sum_{m=1}^{M} se(\hat\theta_j^m)$ (sometimes the average lengths of the $k$ CIs are reported instead; this gets at the same thing); the $k$ MSEs, $\frac{1}{M}\sum_{m=1}^{M} (\hat\theta_j^m - \theta_{0j})^2$; the $k$ SDs of the point estimates, $\sqrt{\frac{1}{M}\sum_{m=1}^{M} \bigl(\hat\theta_j^m - \frac{1}{M}\sum_{s=1}^{M} \hat\theta_j^s\bigr)^2}$; and the $k$ coverage probabilities of the CIs, $\frac{1}{M}\sum_{m=1}^{M} I\{L_j^m < \theta_{0j} < U_j^m\}$.
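
A minimal sketch of such a simulation study for an illustrative setting: estimating the mean $\theta$ of a normal population with the sample mean and a 95% t-interval, reporting bias, average SE, MSE, SD of the estimates, and coverage.

M <- 1000; n <- 100; theta0 <- 2; sigma <- 3
est <- se <- cover <- numeric(M)
for (m in 1:M) {
  x <- rnorm(n, mean = theta0, sd = sigma)             # data generated under known theta0
  est[m] <- mean(x)                                    # point estimate
  se[m]  <- sd(x) / sqrt(n)                            # its standard error
  ci <- est[m] + c(-1, 1) * qt(0.975, n - 1) * se[m]   # 95% CI
  cover[m] <- (ci[1] < theta0) & (theta0 < ci[2])
}
c(bias = mean(est) - theta0,
  avg_se = mean(se),
  mse = mean((est - theta0)^2),
  sd_est = sd(est),
  coverage = mean(cover))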