Monte Carlo Methods: Lecture 3: Importance Sampling


Monte Carlo Methods: Lecture 3: Importance Sampling. Nick Whiteley, 16.10.2008. Course material originally by Adam Johansen and Ludger Evers, 2007.

Overview of this lecture. What we have seen: rejection sampling. This lecture will cover importance sampling: basic importance sampling, importance sampling using self-normalised weights, finite variance estimators, optimal proposals, and an example.

Recall rejection sampling. Algorithm 2.1: Rejection sampling. Given two densities f, g with f(x) < M g(x) for all x, we can generate a sample from f by:
1. Draw X ~ g.
2. Accept X as a sample from f with probability f(X) / (M g(X)); otherwise go back to step 1.
Drawbacks: we need that f(x) < M g(x), and on average we need to repeat the first step M times before we can accept a value proposed by g.
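
A minimal Python sketch of rejection sampling (an illustration added here, not part of the original slides), assuming a normalised target density f, a proposal density g with a sampler, and an envelope constant M with f(x) < M g(x):

import numpy as np

def rejection_sample(f, g_pdf, g_sample, M, n, rng=None):
    """Draw n samples from density f via rejection sampling with proposal g."""
    rng = np.random.default_rng() if rng is None else rng
    samples = []
    while len(samples) < n:
        x = g_sample(rng)                            # step 1: propose X ~ g
        if rng.uniform() < f(x) / (M * g_pdf(x)):    # step 2: accept w.p. f(X)/(M g(X))
            samples.append(x)
    return np.array(samples)

# Hypothetical usage: target Beta(2, 2) with a Uniform(0, 1) proposal; M = 1.6 bounds f/g.
# from scipy.stats import beta
# xs = rejection_sample(lambda x: beta.pdf(x, 2, 2), lambda x: 1.0,
#                       lambda rng: rng.uniform(), M=1.6, n=1000)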

The fundamental identities behind importance sampling (1). Assume that g(x) > 0 for (almost) all x with f(x) > 0. Then for a measurable set A:
$$P(X \in A) = \int_A f(x)\,dx = \int_A g(x)\,\underbrace{\frac{f(x)}{g(x)}}_{=:w(x)}\,dx = \int_A g(x)\,w(x)\,dx.$$
For some integrable test function h, assume that g(x) > 0 for (almost) all x with f(x) h(x) ≠ 0. Then
$$E_f(h(X)) = \int f(x)\,h(x)\,dx = \int g(x)\,\underbrace{\frac{f(x)}{g(x)}}_{=:w(x)}\,h(x)\,dx = \int g(x)\,w(x)\,h(x)\,dx = E_g(w(X)\,h(X)).$$

The fundamental identities behind importance sampling (2). How can we make use of $E_f(h(X)) = E_g(w(X)\,h(X))$? Consider $X_1, \dots, X_n \sim g$ and $E_g|w(X)\,h(X)| < +\infty$. Then
$$\frac{1}{n}\sum_{i=1}^{n} w(X_i)\,h(X_i) \xrightarrow{a.s.} E_g(w(X)\,h(X)) \quad \text{(law of large numbers)},$$
which implies
$$\frac{1}{n}\sum_{i=1}^{n} w(X_i)\,h(X_i) \xrightarrow{a.s.} E_f(h(X)).$$
Thus we can estimate $\mu := E_f(h(X))$ by:
1. Sample $X_1, \dots, X_n \sim g$.
2. Set $\tilde\mu := \frac{1}{n}\sum_{i=1}^{n} w(X_i)\,h(X_i)$.

The importance sampling algorithm. Algorithm 2.1a: Importance sampling. Choose g such that supp(g) ⊇ supp(f · h).
1. For i = 1, ..., n:
   i. Generate $X_i \sim g$.
   ii. Set $w(X_i) = \frac{f(X_i)}{g(X_i)}$.
2. Return $\tilde\mu = \frac{1}{n}\sum_{i=1}^{n} w(X_i)\,h(X_i)$ as an estimate of $E_f(h(X))$.
Contrary to rejection sampling, importance sampling does not yield realisations from f, but a weighted sample $(X_i, W_i)$. The weighted sample can be used for estimating expectations $E_f(h(X))$ (and thus probabilities, etc.).
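
As a minimal Python sketch of Algorithm 2.1a (an illustration added here, not from the slides), assuming a normalised target density f, an instrumental density g with a sampler, and a vectorised test function h:

import numpy as np

def importance_sampling(f, g_pdf, g_sample, h, n, rng=None):
    """Basic importance sampling estimate of E_f[h(X)] with proposal g."""
    rng = np.random.default_rng() if rng is None else rng
    x = g_sample(rng, n)          # X_1, ..., X_n ~ g
    w = f(x) / g_pdf(x)           # weights w(X_i) = f(X_i) / g(X_i)
    return np.mean(w * h(x))      # (1/n) * sum_i w(X_i) h(X_i)

Note that this unnormalised estimator requires f to be a properly normalised density; the next slides relax this requirement.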

Basic properties of the importance sampling estimate. We have already seen that $\tilde\mu$ is consistent if supp(g) ⊇ supp(f · h) and $E_g|w(X)\,h(X)| < +\infty$, as
$$\tilde\mu := \frac{1}{n}\sum_{i=1}^{n} w(X_i)\,h(X_i) \xrightarrow{a.s.} E_f(h(X)).$$
The expected value of the weights is $E_g(w(X)) = 1$, and $\tilde\mu$ is unbiased (see theorem below).
Theorem 2.2: Bias and variance of importance sampling.
$$E_g(\tilde\mu) = \mu, \qquad \operatorname{Var}_g(\tilde\mu) = \frac{\operatorname{Var}_g(w(X)\,h(X))}{n}.$$

Is it enough to know f up to a multiplicative constant? Assume f(x) = C π(x). Then
$$\tilde\mu = \frac{1}{n}\sum_{i=1}^{n} w(X_i)\,h(X_i) = \frac{1}{n}\sum_{i=1}^{n} \frac{C\,\pi(X_i)}{g(X_i)}\,h(X_i).$$
C does not cancel out, so knowing π(·) is not enough. Idea: replace normalisation by n with normalisation by $\sum_{i=1}^{n} w(X_i)$, i.e. consider the self-normalised estimator
$$\hat\mu = \frac{\sum_{i=1}^{n} w(X_i)\,h(X_i)}{\sum_{i=1}^{n} w(X_i)}.$$
Now we have that
$$\hat\mu = \frac{\sum_{i=1}^{n} w(X_i)\,h(X_i)}{\sum_{i=1}^{n} w(X_i)} = \frac{\sum_{i=1}^{n} \frac{\pi(X_i)}{g(X_i)}\,h(X_i)}{\sum_{i=1}^{n} \frac{\pi(X_i)}{g(X_i)}},$$
so $\hat\mu$ does not depend on C: it is enough to know f up to a multiplicative constant.

The importance sampling algorithm (2). Algorithm 2.1b: Importance sampling using self-normalised weights. Choose g such that supp(g) ⊇ supp(f · h).
1. For i = 1, ..., n:
   i. Generate $X_i \sim g$.
   ii. Set $w(X_i) = \frac{f(X_i)}{g(X_i)}$.
2. Return $\hat\mu = \frac{\sum_{i=1}^{n} w(X_i)\,h(X_i)}{\sum_{i=1}^{n} w(X_i)}$ as an estimate of $E_f(h(X))$.
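
The self-normalised version differs only in the final step; here is a corresponding sketch (again an added illustration, not from the slides), where pi only needs to be proportional to f:

import numpy as np

def self_normalised_is(pi, g_pdf, g_sample, h, n, rng=None):
    """Self-normalised IS estimate of E_f[h(X)], with pi proportional to f."""
    rng = np.random.default_rng() if rng is None else rng
    x = g_sample(rng, n)                  # X_1, ..., X_n ~ g
    w = pi(x) / g_pdf(x)                  # weights known only up to the constant C
    return np.sum(w * h(x)) / np.sum(w)   # C cancels in the ratio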

Basic properties of the self-normalised estimate. $\hat\mu$ is consistent, as
$$\hat\mu = \frac{\frac{1}{n}\sum_{i=1}^{n} w(X_i)\,h(X_i)}{\frac{1}{n}\sum_{i=1}^{n} w(X_i)} \xrightarrow{a.s.} \frac{E_f(h(X))}{1} = E_f(h(X))$$
(provided supp(g) ⊇ supp(f · h) and $E_g|w(X)\,h(X)| < +\infty$). $\hat\mu$ is biased, but asymptotically unbiased (see theorem below).
Theorem 2.2: Bias and variance (ctd.).
$$E_g(\hat\mu) = \mu + \frac{\mu\,\operatorname{Var}_g(w(X)) - \operatorname{Cov}_g(w(X),\, w(X)\,h(X))}{n} + O(n^{-2}),$$
$$\operatorname{Var}_g(\hat\mu) = \frac{\operatorname{Var}_g(w(X)\,h(X)) - 2\mu\,\operatorname{Cov}_g(w(X),\, w(X)\,h(X)) + \mu^2\,\operatorname{Var}_g(w(X))}{n} + O(n^{-2}).$$

Finite variance estimators. The importance sampling estimate is consistent for a large choice of g (we only need that supp(g) ⊇ supp(f · h) and $E_g|w(X)\,h(X)| < +\infty$). More important in practice are finite variance estimators, i.e.
$$\operatorname{Var}(\tilde\mu) = \operatorname{Var}\!\left(\frac{1}{n}\sum_{i=1}^{n} w(X_i)\,h(X_i)\right) < +\infty.$$
Sufficient (albeit very restrictive) conditions for finite variance of $\tilde\mu$ are: f(x) < M g(x) and $\operatorname{Var}_f(h(X)) < \infty$; or E is compact, f is bounded above on E, and g is bounded below on E. Note: if f has heavier tails than g, then the weights will have infinite variance!

Optimal proposals. Theorem 2.3: Optimal proposal. The proposal distribution g that minimises the variance of $\tilde\mu$ is
$$g^*(x) = \frac{|h(x)|\,f(x)}{\int |h(t)|\,f(t)\,dt}.$$
The theorem is of little practical use: the optimal proposal involves $\int |h(t)|\,f(t)\,dt$, which is essentially the integral we want to estimate! Practical relevance of Theorem 2.3: choose g such that it is close to $|h(x)|\,f(x)$.
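
For completeness, a brief sketch of the standard argument behind Theorem 2.3 (not spelled out on the slide): only the second moment of $w(X)\,h(X)$ depends on g, and
$$\int \frac{h(x)^2 f(x)^2}{g(x)}\,dx = E_g\!\left[\left(\frac{|h(X)|\,f(X)}{g(X)}\right)^{2}\right] \ge \left(E_g\!\left[\frac{|h(X)|\,f(X)}{g(X)}\right]\right)^{2} = \left(\int |h(x)|\,f(x)\,dx\right)^{2}$$
by Jensen's inequality, with equality when $|h(x)|\,f(x)/g(x)$ is constant in x, i.e. when $g(x) \propto |h(x)|\,f(x)$, which is exactly $g^*$.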

Super-efficiency of importance sampling. For the optimal $g^*$ we have that
$$\operatorname{Var}_f\!\left(\frac{h(X_1) + \dots + h(X_n)}{n}\right) > \operatorname{Var}_{g^*}(\tilde\mu)$$
if h is not almost surely constant. Super-efficiency of importance sampling: the variance of the importance sampling estimate can be less than the variance obtained when sampling directly from the target f. Intuition: importance sampling allows us to choose g such that we focus on the areas which contribute most to the integral $\int h(x)\,f(x)\,dx$. Even sub-optimal proposals can be super-efficient.

Example 2.5: Setup. Compute $E_f|X|$ for $X \sim t_3$ by (a) sampling directly from $t_3$; (b) using a $t_1$ distribution as instrumental distribution; (c) using a N(0, 1) distribution as instrumental distribution.
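
A small Python sketch reproducing this comparison (my own reconstruction, not part of the slides), using scipy.stats for the densities; the true value is $E_f|X| = 2\sqrt{3}/\pi \approx 1.10$:

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n = 1500
f = stats.t(df=3)        # target: t_3
h = np.abs               # test function h(x) = |x|

# (a) direct sampling from t_3
x_a = f.rvs(size=n, random_state=rng)
est_a = np.mean(h(x_a))

# (b) IS with a t_1 (Cauchy) instrumental distribution (heavier tails than f)
g_b = stats.t(df=1)
x_b = g_b.rvs(size=n, random_state=rng)
est_b = np.mean(f.pdf(x_b) / g_b.pdf(x_b) * h(x_b))

# (c) IS with a N(0, 1) instrumental distribution (lighter tails than f:
#     the weights have infinite variance and the estimate is unstable)
g_c = stats.norm()
x_c = g_c.rvs(size=n, random_state=rng)
est_c = np.mean(f.pdf(x_c) / g_c.pdf(x_c) * h(x_c))

print(est_a, est_b, est_c)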

Example 2.5: Densities. [Figure: the target density f(x) of the $t_3$ distribution (used for direct sampling) together with the two instrumental densities $g_{t_1}(x)$ (IS with $t_1$) and $g_{N(0,1)}(x)$ (IS with N(0, 1)), plotted for x in [-4, 4].]

Example 2.5: Estimates obtained. [Figure: the estimate as a function of the iteration (0 to 1500) for the three approaches: sampling directly from $t_3$, IS using $t_1$ as instrumental distribution, and IS using N(0, 1) as instrumental distribution.]

Example 2.5: Weights. [Figure: the weights $W_i$ plotted against the samples $X_i$ from the instrumental distribution, for the three approaches: sampling directly from $t_3$, IS using $t_1$ as instrumental distribution, and IS using N(0, 1) as instrumental distribution.]