Monte Carlo Integration, the Method of Composition, the Method of Inversion, and Acceptance/Rejection Sampling. Direct Simulation Methods. Econ 690. Purdue University.


Outline
1. Monte Carlo Integration
2. The Method of Composition
3. The Method of Inversion
4. Acceptance/Rejection Sampling

Monte Carlo Integration

Suppose you wish to calculate a posterior moment of the form:

E[g(θ)|y] = ∫_Θ g(θ)p(y|θ)p(θ) dθ / ∫_Θ p(y|θ)p(θ) dθ.

With Monte Carlo Integration, we assume that we can draw directly from the posterior p(θ|y).

If this is true, then, under reasonable conditions, a law of large numbers guarantees that

(1/n) Σ_{i=1}^{n} g(θ^(i)) → E[g(θ)|y] as n → ∞.

In the above, θ^(i) ~ iid p(θ|y). Note that n here is under our control, as it refers to the Monte Carlo sample size rather than the number of observations. Thus, we can estimate the desired (finite-sample) moment with arbitrary accuracy. This technique has a very demanding prerequisite: that we can draw directly from p(θ|y).
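For concreteness, here is a minimal MATLAB sketch of the idea; the posterior and the function g are purely illustrative choices of mine (θ|y taken to be N(1, 0.5²) and g(θ) = θ², so the true moment is 1.25):

n = 100000;                    % Monte Carlo sample size (under our control)
theta = 1 + 0.5*randn(n,1);    % iid draws from the (illustrative) posterior N(1, 0.5^2)
g = theta.^2;                  % evaluate g at each draw
ghat = mean(g);                % Monte Carlo estimate of E[g(theta)|y]
se = std(g)/sqrt(n);           % numerical standard error of the estimate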

Method of Composition

The method of composition provides a convenient way of drawing from p(θ|y) when the joint distribution is decomposed into a product of marginals and conditionals, and each of these component pieces can be easily drawn from. For example, suppose θ = [θ_1 θ_2]′ and that p(θ_2|y) and p(θ_1|θ_2, y) are well-known densities that are easily sampled.

Under these conditions, one can obtain a draw from p(θ|y) in the following way:
1. Draw θ_2 from the marginal posterior p(θ_2|y).
2. Draw θ_1 from the conditional posterior p(θ_1|θ_2, y), conditioning on the θ_2 obtained in step 1.

Why this works is, perhaps, obvious, but consider a set A × B ⊆ Θ_1 × Θ_2. Then, under the two-step scheme above,

Pr(θ_1 ∈ A, θ_2 ∈ B) = ∫_B ∫_A p(θ_1|θ_2, y) p(θ_2|y) dθ_1 dθ_2 = ∫_B ∫_A p(θ_1, θ_2|y) dθ_1 dθ_2,

which is exactly the posterior probability assigned to A × B.

Consider the linear regression model

y = Xβ + u,  u|X ~ N(0, σ²I_n),

under the prior p(β, σ²) ∝ σ⁻². In our linear regression model notes, we showed

σ²|y ~ IG( (n − k)/2, 2[(y − Xβ̂)′(y − Xβ̂)]⁻¹ )

and

β|σ², y ~ N( β̂, σ²(X′X)⁻¹ ).

Thus, we can sample from the joint posterior p(β, σ²|y) by first sampling σ² from its marginal posterior, and then sampling β from the conditional Normal posterior.
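A minimal MATLAB sketch of one such joint draw, assuming the data y and X are already in memory and using gamrnd from the Statistics Toolbox (the precision 1/σ² is drawn from its Gamma distribution and inverted):

% One draw from p(beta, sigma2 | y) by the method of composition.
[n, k] = size(X);
betahat = (X'*X)\(X'*y);                 % OLS estimate / posterior mean of beta
SSE = (y - X*betahat)'*(y - X*betahat);  % sum of squared residuals
% sigma2 | y ~ IG((n-k)/2, 2/SSE), i.e., 1/sigma2 ~ Gamma((n-k)/2, 2/SSE) (shape, scale)
sigma2 = 1/gamrnd((n-k)/2, 2/SSE);
% beta | sigma2, y ~ N(betahat, sigma2*(X'X)^(-1))
C = chol(sigma2*inv(X'*X));              % upper triangular factor, C'*C equals the covariance
beta = betahat + C'*randn(k,1);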

The method of composition can also prove to be a very valuable tool for problems of (posterior) prediction. To this end, consider an out-of-sample value y_f which is presumed to be generated by our regression model:

y_f = X_f β + u_f,  u_f|X_f ~ N(0, σ²).

1. Note that y_f|β, σ² does not depend on y (although y_f does depend on y through β and σ²).
2. The goal is to simulate draws from the posterior predictive p(y_f|y), which does not depend on any of the model's parameters.

To generate draws from this posterior predictive, we first consider the joint posterior distribution p(y_f, β, σ²|y). If we can draw from this distribution, we can use only the y_f draws (and ignore those associated with β and σ²) as draws from the marginal p(y_f|y). How can we do this?

Note that

p(y_f, β, σ²|y) = p(y_f|β, σ², y) p(β|σ², y) p(σ²|y) = p(y_f|β, σ²) p(β|σ², y) p(σ²|y).

This suggests that draws from the marginal posterior predictive distribution can be obtained by composition:
1. Draw σ² from its marginal posterior σ²|y.
2. Draw β from the conditional posterior β|σ², y.
3. Draw u_f from N(0, σ²).
4. Set y_f = X_f β + u_f. Note, of course, this requires that X_f is known.
5. Doing this many times will produce a set of draws from the posterior predictive y_f|y.

Let's apply this method to generate draws from the posterior predictive using our log wage example:

log(wage)_i = β_0 + β_1 Ed_i + u_i.

The method just described could be applied directly to sample from the predictive distribution of (log) hourly wages. However, the wage density itself is actually more interpretable. To sample from the posterior predictive of wages (in levels), we can consider drawing from an augmented density of the form p(w_f, y_f, β, σ²|y), where w_f = exp(y_f).

We can write this joint distribution p(w_f, y_f, β, σ²|y) as follows:

p(w_f, y_f, β, σ²|y) = p(w_f|y_f, β, σ², y) p(y_f|β, σ², y) p(β|σ², y) p(σ²|y)
= p(w_f|y_f) p(y_f|β, σ²) p(β|σ², y) p(σ²|y),

where the last line follows since the distribution of w_f only depends on y_f and, in fact, w_f = exp(y_f) with probability one, so a draw of y_f immediately yields a draw of w_f.

Thus, within the context of our example, we can generate draws from the posterior predictive distribution of hourly wages w_f as follows:
1. Generate σ² from its marginal posterior σ²|y.
2. Generate β from the conditional posterior β|σ², y.
3. Generate y_f from N(X_f β, σ²).
4. Generate w_f as exp(y_f).

We apply this technique to our data set and generate 10,000 draws from the posterior predictive distribution of hourly wages for two cases: Ed = 12 and Ed = 16. The 10,000 draws are then smoothed nonparametrically via a kernel density estimator. [I will provide a MATLAB file for you that does these calculations]. Graphs of these densities are provided on the following page.
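I have not seen the course's MATLAB file, so the following is only a rough sketch of how these calculations might be organized (variable names are mine; X and y are assumed to be in memory, Xf = [1 12] corresponds to a high school graduate, and ksdensity is the Statistics Toolbox kernel smoother):

% Posterior predictive draws of hourly wages for Ed = 12, via composition.
iter = 10000;
[n, k] = size(X);
betahat = (X'*X)\(X'*y);
SSE = (y - X*betahat)'*(y - X*betahat);
Xf = [1 12];                                              % intercept and education level
wf = zeros(iter,1);
for i = 1:iter
    sigma2 = 1/gamrnd((n-k)/2, 2/SSE);                    % sigma2 from its marginal posterior
    beta = betahat + chol(sigma2*inv(X'*X))'*randn(k,1);  % beta | sigma2, y
    yf = Xf*beta + sqrt(sigma2)*randn;                    % log-wage draw from N(Xf*beta, sigma2)
    wf(i) = exp(yf);                                      % wage in levels
end
[dens, grid] = ksdensity(wf);                             % kernel-smoothed predictive density
prob15 = mean(wf > 15);                                   % estimate of Pr(wf > 15 | Ed = 12, y)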

Figure: Posterior Predictive Hourly Wage Densities (kernel density estimates of the hourly wage, in dollars, for Education = 12 and Education = 16).

The (posterior predictive) mean hourly wage for high school graduates is approximately $11.07. The mean hourly wage for those with a BA is approximately $15.88. The posterior probability that a high school graduate will receive an hourly wage greater than $15 is Pr(w_f > 15|Ed_f = 12, y) ≈ .19. The posterior probability that an individual with a BA will receive an hourly wage greater than $15 is Pr(w_f > 15|Ed_f = 16, y) ≈ .44. If you are curious, doing the same exercise for someone with a Ph.D., i.e., Ed = 20, gives Pr(w_f > 15|Ed_f = 20, y) ≈ .72.

The Method of Inversion

Suppose that X is a continuous random variable with distribution function F and density f. Further, assume that the distribution function F can be easily calculated. Let U ~ U(0, 1), a Uniform random variable on the unit interval, and define Y = F⁻¹(U). Derive the distribution of the random variable Y.

U ~ U(0, 1), Y = F⁻¹(U). We can establish the desired result using a change of variables. First, note that

p_U(u) = I(0 ≤ u ≤ 1),

with I(·) denoting an indicator function, and U = F(Y). Thus, by the change-of-variables formula,

p_Y(y) = p_U(F(y)) |dF(y)/dy| = I(0 ≤ F(y) ≤ 1) f(y) = f(y).

Therefore Y has distribution function F.

How is the result we just established useful? This result is extremely useful in cases where the cdf F and its inverse are easily calculated, because it provides a way to generate draws from f. Specifically, we can:
1. Draw u from a U(0, 1) distribution.
2. Set Y = F⁻¹(u).

It follows that Y is a draw from f. We now provide several examples of this method.

Consider an exponential random variable with density function

p(x|θ) = θ⁻¹ exp(−x/θ),  x > 0.

Show how the inverse transform method can be used to generate draws from the exponential density.

Note that, for x > 0,

F(x) = ∫_0^x (1/θ) exp(−t/θ) dt = 1 − exp(−x/θ).

The results of our previous derivation show that if we can solve for x in the equation

u = 1 − exp(−x/θ),

with u denoting a realized draw from a U(0, 1) distribution, then x has the desired exponential density. A little algebra provides

x = −θ ln(1 − u)

as the solution.
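A two-line MATLAB illustration of this inversion (the value θ = 2 is an arbitrary choice of mine):

theta = 2;                  % illustrative value of the exponential mean
u = rand(10000,1);          % uniform draws
x = -theta*log(1 - u);      % solves u = 1 - exp(-x/theta); mean(x) should be near theta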

Let x ~ TN_[a,b](µ, σ²) denote that x is a truncated Normal random variable. Specifically, this notation indicates that x is generated from a Normal density with mean µ and variance σ², which is truncated to lie in the interval [a, b]. The density function for x in this case is given as

p(x) = (1/σ) φ((x − µ)/σ) / [ Φ((b − µ)/σ) − Φ((a − µ)/σ) ],  a ≤ x ≤ b,

where φ and Φ denote the standard Normal density and cdf, respectively. Show how the inverse transform method can be used to generate draws from this truncated Normal density.

For a ≤ x ≤ b, the c.d.f. of the truncated Normal random variable is

F(x) = [ Φ((x − µ)/σ) − Φ((a − µ)/σ) ] / [ Φ((b − µ)/σ) − Φ((a − µ)/σ) ].

Therefore, if x is a solution to the equation

u = [ Φ((x − µ)/σ) − Φ((a − µ)/σ) ] / [ Φ((b − µ)/σ) − Φ((a − µ)/σ) ],

where u is a realized draw from a U(0, 1) distribution, then x ~ TN_[a,b](µ, σ²).

With a little algebra it follows that

x = µ + σ Φ⁻¹( Φ((a − µ)/σ) + u[ Φ((b − µ)/σ) − Φ((a − µ)/σ) ] )

is a solution. When b = ∞, so that the random variable is truncated from below only, we substitute Φ[(b − µ)/σ] with 1 in the above expression. When a = −∞, so that the random variable is truncated from above only, we substitute Φ[(a − µ)/σ] with 0 in the above expression.
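A short MATLAB sketch of this recipe, using normcdf and norminv from the Statistics Toolbox (the values of µ, σ, a, and b are illustrative):

mu = 0; sigma = 1; a = 0.5; b = 2;             % illustrative values
u  = rand(10000,1);
Fa = normcdf((a - mu)/sigma);                  % replace with 0 when a = -Inf
Fb = normcdf((b - mu)/sigma);                  % replace with 1 when b = +Inf
x  = mu + sigma*norminv(Fa + u*(Fb - Fa));     % every draw lies in [a, b]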

Suppose y|µ, σ² ~ LN(µ, σ²), implying

p(y) = (1/(√(2πσ²) y)) exp( −(1/(2σ²)) [ln y − µ]² ),  y > 0.

We seek to use inversion to generate draws from the lognormal distribution.

Note, for c > 0,

F(c) = Pr(y ≤ c) = Pr(ln y ≤ ln c) = Φ((ln c − µ)/σ).

Let u denote a realized draw from a U(0, 1) distribution. Then we seek the y that solves

u = Φ((ln y − µ)/σ).

Thus, setting

y = exp( µ + σΦ⁻¹(u) )

produces a draw from the desired lognormal distribution.
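And a corresponding MATLAB sketch (µ and σ chosen arbitrarily):

mu = 2; sigma = 0.5;               % illustrative values
u = rand(10000,1);
y = exp(mu + sigma*norminv(u));    % solves u = Phi((ln y - mu)/sigma) for y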

Rejection Sampling

Rejection sampling provides another way of obtaining draws from some density of interest. Generally, it proceeds as follows:
1. Draw a candidate from some approximating density.
2. Compute a statistic, like a critical value.
3. If some condition related to the size of the critical value is met, keep the draw as a draw from the target density. Otherwise, start over until the needed condition is satisfied.

Consider the following strategy for drawing from a density f(x) defined over the compact support a ≤ x ≤ b, with M ≥ max_{a ≤ x ≤ b} f(x):
1. Generate two independent Uniform random variables U_1 and U_2 as follows: U_i ~ iid U(0, 1), i = 1, 2.
2. If MU_2 > f(a + [b − a]U_1), start over. That is, go back to the first step and generate new values for U_1 and U_2, and again determine whether MU_2 > f(a + [b − a]U_1).
3. When MU_2 ≤ f(a + [b − a]U_1), set x = a + (b − a)U_1 as a draw from f(x).

We will answer the following two questions regarding this algorithm: (a) What is the probability that any particular iteration in the above algorithm will produce a draw that is accepted? (b) Sketch a proof as to why x, when it is accepted, has distribution function F(x) = ∫_a^x f(t) dt.

First, we will consider question (a) and investigate the probability of acceptance. Note that

Pr(accept) = Pr( MU_2 ≤ f(a + (b − a)U_1) )
= Pr( U_2 ≤ f(a + (b − a)U_1)/M )
= E_{U_1}[ Pr( U_2 ≤ f(a + (b − a)U_1)/M | U_1 ) ]
= ∫_0^1 [ f(a + (b − a)u_1)/M ] du_1
= [M(b − a)]⁻¹ ∫_a^b f(t) dt
= [M(b − a)]⁻¹.

The third line uses the fact that U_1 and U_2 are independent. The fourth and fifth lines follow from the fact that U_i ~ U(0, 1), i = 1, 2. The fifth line also applies a change of variable, setting t = a + (b − a)u_1. Thus the probability of accepting a candidate draw in the algorithm is [M(b − a)]⁻¹. Note that, when using this method to sample from a Uniform distribution on [a, b] (in which case f(x) = 1/(b − a) and one takes M = 1/(b − a)), all candidates from the algorithm are accepted.

Let us now move on to part (b), which seeks to establish why this algorithm works. For any c ∈ [a, b], note that

Pr(x ≤ c | accept) = Pr( a + (b − a)U_1 ≤ c, MU_2 ≤ f(a + (b − a)U_1) ) / Pr(accept)
= ( ∫_0^{(c − a)/(b − a)} [ f(a + (b − a)u_1)/M ] du_1 ) / [M(b − a)]⁻¹
= ( [M(b − a)]⁻¹ ∫_a^c f(t) dt ) / [M(b − a)]⁻¹    (setting t = a + (b − a)u_1)
= F(c).

Therefore, a candidate draw which is accepted from the acceptance/rejection method has distribution function F, as desired.

Let us now consider an application of this rejection sampling algorithm. Consider the triangular density function, given as p(x) = 1 − |x|, x ∈ [−1, 1]. Describe how the rejection sampling algorithm can be used to generate draws from this density function.

For this simple example, note that M = 1 and b − a = 2, so that the overall acceptance rate is one-half. That is, we would expect that, say, 50,000 pairs of independent Uniform variates in the acceptance/rejection algorithm would be needed in order to produce a final sample of 25,000 draws. Here is a small MATLAB program that does this:

iter = 10000;                        % number of candidate pairs
U2 = rand(iter,1);                   % vertical uniforms
U1 = rand(iter,1);                   % horizontal uniforms
fff = U2 - 1 + abs(2*U1 - 1);        % accept when M*U2 <= f(-1 + 2*U1) = 1 - |2*U1 - 1|
points = find(fff <= 0);             % indices of accepted candidates
draws = -1 + 2*U1(points);           % accepted draws from the triangular density
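As a quick sanity check (my addition, not part of the original program), the empirical acceptance rate should be close to the theoretical value [M(b − a)]⁻¹ = 1/2:

accept_rate = length(points)/iter;   % should be roughly 0.5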