Monte Carlo Integration

In these notes we first review basic numerical integration methods (using Riemann approximation and the trapezoidal rule) and their limitations for evaluating multidimensional integrals. Next we introduce stochastic integration methods based on Monte Carlo and importance sampling. We conclude with a section on computationally efficient generation of random numbers when the sampling density is known up to a normalizing constant. An excellent reference for this material is the book by Robert and Casella [1]. These stochastic methods have found numerous applications in engineering; see for instance the papers in the 2002 special issue of the IEEE Transactions on Signal Processing [2].

1 Riemann Integration

Consider the problem of evaluating an integral $I = \int_a^b \phi(x)\,dx$. The Riemann approximation to $I$ is given by

$$\hat{I}_n = \sum_{i=1}^{n} (x_i - x_{i-1})\,\phi(x_{i-1}) \qquad (1)$$

where $a = x_0 < x_1 < x_2 < \cdots < x_n = b$. This may be viewed as approximating $\phi(x)$ with a piecewise-constant function $\hat{\phi}_n(x)$ which is equal to $\phi(x_{i-1})$ for all $x \in [x_{i-1}, x_i)$ and $1 \le i \le n$; indeed $\hat{I}_n = \int_a^b \hat{\phi}_n$. Assuming that the derivative $\phi'(x)$ is bounded, and that $x_i = a + (b-a)\,i/n$, the maximum absolute error due to this approximation is upper bounded as

$$\bigl|\phi(x) - \hat{\phi}_n(x)\bigr| \le \frac{b-a}{n}\,\|\phi'\|_\infty,$$

with equality if $\phi(x)$ is an affine function. Hence the error incurred by approximating the integral with a Riemann sum is at most $|\hat{I}_n - I| \le C_1/n$ for some constant $C_1 = (b-a)^2\,\|\phi'\|_\infty$ independent of $n$.

2 Trapezoidal Rule

The approximation formula (1) can be improved by replacing $\phi(x_{i-1})$ with $\frac{1}{2}[\phi(x_{i-1}) + \phi(x_i)]$:

$$\hat{I}_n = \sum_{i=1}^{n} \frac{x_i - x_{i-1}}{2}\,\bigl[\phi(x_{i-1}) + \phi(x_i)\bigr]. \qquad (2)$$

This is the so-called trapezoidal rule, which is extensively used for numerical integration. For instance, if $\phi(x)$ is an affine function, the approximation is exact. For general functions $\phi(x)$, the approximation error is due to the curvature of $\phi$. If the second derivative $\phi''(x)$ exists and is bounded, it may be shown (by application of Taylor's theorem again) that $|\hat{I}_n - I| \le C_2/n^2$ for some constant $C_2$.
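To make the two rules concrete, here is a minimal Python sketch; the test integrand $\phi(x) = e^x$ on $[0, 1]$ and the grid sizes are our own choices, not from the notes. The printed errors should shrink roughly like $1/n$ for the Riemann sum and $1/n^2$ for the trapezoidal rule.

```python
import numpy as np

def riemann(phi, a, b, n):
    """Left-endpoint Riemann sum (1) on a uniform grid with n cells."""
    x = np.linspace(a, b, n + 1)
    return np.sum((x[1:] - x[:-1]) * phi(x[:-1]))

def trapezoid(phi, a, b, n):
    """Trapezoidal rule (2) on a uniform grid with n cells."""
    x = np.linspace(a, b, n + 1)
    return np.sum((x[1:] - x[:-1]) * (phi(x[:-1]) + phi(x[1:])) / 2)

phi = np.exp
I = np.e - 1.0  # exact value of the integral of e^x over [0, 1]
for n in (10, 100, 1000):
    print(n, abs(riemann(phi, 0, 1, n) - I),    # error ~ C1/n
             abs(trapezoid(phi, 0, 1, n) - I))  # error ~ C2/n^2
```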

3 Multidimensional Integration

For $d$-dimensional integrals, the domain $\mathcal{X}$ is a subset of $\mathbb{R}^d$. An integral can be approximated by a Riemann sum, similarly to Sec. 1, or using a trapezoidal rule as in Sec. 2. If an $n$-point approximation is used, the trapezoidal rule yields an approximation error $|\hat{I}_n - I| \le C\,n^{-2/d}$ for some constant $C$. This is the same formula as in 1-D, except that $n$ is replaced with $n^{1/d}$ (the number of points per coordinate in case $\mathcal{X}$ is discretized using a cubic lattice). Hence $n$ needs to increase exponentially with $d$ to achieve a target approximation error. This phenomenon is known as the curse of dimensionality. Stochastic methods for numerical integration avoid the curse of dimensionality, as the resulting integrals may be approximated with an accuracy of the order of $1/\sqrt{n}$, where $n$ is the number of samples $X_1, \ldots, X_n$ taken from $\mathcal{X}$. Hence the stochastic methods outperform the deterministic ones for dimensions $d > 4$ and are worse for $d < 4$ (compare the rates $n^{-2/d}$ and $n^{-1/2}$: the exponents coincide at $d = 4$).

4 Classical Monte Carlo Integration

The basic problem considered in this section and the following one is as follows. Given a pdf $f(x)$, $x \in \mathcal{X}$, and a function $h(x)$, $x \in \mathcal{X}$, evaluate the integral

$$\mu = E_f[h(X)] = \int_{\mathcal{X}} h(x)\,f(x)\,dx.$$

Note these methods can be used to evaluate any integral $I = \int_{\mathcal{X}} \phi(x)\,dx$ by expressing $\phi$ as the product of a pdf $f$ and another function $h$. Given $X_1, X_2, \ldots, X_n$ drawn iid from the pdf $f$, estimate $\mu$ by the empirical average

$$\hat{\mu}_n = \frac{1}{n} \sum_{i=1}^{n} h(X_i).$$

By the strong law of large numbers, we have $\hat{\mu}_n \xrightarrow{\text{a.s.}} \mu$ as $n \to \infty$. The variance of $\hat{\mu}_n$ is

$$\mathrm{Var}(\hat{\mu}_n) = \frac{1}{n}\,\mathrm{Var}[h(X)] = \frac{1}{n} \int_{\mathcal{X}} (h(x) - \mu)^2\,f(x)\,dx.$$

We will henceforth assume that $E_f[h^2(X)] < \infty$.
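A minimal sketch of the classical Monte Carlo estimator; the toy choice of $f$ standard normal and $h(x) = x^2$, for which $\mu = E[X^2] = 1$, is ours. The fluctuation of the printed estimates around 1 shrinks like $1/\sqrt{n}$.

```python
import numpy as np

rng = np.random.default_rng(0)

def mc_estimate(h, sampler, n):
    """Classical Monte Carlo: empirical average of h over n iid draws from f."""
    x = sampler(n)
    return np.mean(h(x))

# Toy check (our choice): f standard normal, h(x) = x^2, so mu = 1.
for n in (100, 10_000, 1_000_000):
    print(n, mc_estimate(lambda x: x**2, rng.standard_normal, n))
```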

Example 1. Let $f$ be the Cauchy density, $f(x) = \frac{1}{\pi(1+x^2)}$, $x \in \mathbb{R}$, and $h(x)$ the indicator function of the interval $[0, 2]$. We have

$$\mu = \int_0^2 \frac{dx}{\pi(1+x^2)} \approx 0.35.$$

The estimator of $\mu$ is given by

$$\hat{\mu}_n = \frac{1}{n} \sum_{i=1}^{n} \mathbb{1}\{0 \le X_i \le 2\}.$$

Its variance is

$$\mathrm{Var}(\hat{\mu}_n) = \frac{\mu(1-\mu)}{n} \approx \frac{0.23}{n}.$$

This method is intuitively inefficient because only 35% of the samples contribute to the sum giving $\hat{\mu}_n$. Can we do better?

5 Importance Sampling

The idea here is to draw samples not from $f$, but from an auxiliary pdf $g$ (often called the instrumental density). Specifically, given $X_1, X_2, \ldots, X_n$ drawn iid from the pdf $g$, estimate $\mu$ by the empirical average

$$\hat{\mu}_n = \frac{1}{n} \sum_{i=1}^{n} \frac{f(X_i)}{g(X_i)}\,h(X_i).$$

Clearly this method reduces to standard Monte Carlo if $g = f$. It is required that $\mathrm{supp}\{f\} \subseteq \mathrm{supp}\{g\}$, i.e., $f(x) > 0 \Rightarrow g(x) > 0$. By the strong law of large numbers, we have

$$\hat{\mu}_n \xrightarrow{\text{a.s.}} E_g\!\left[\frac{f(X)}{g(X)}\,h(X)\right] = \int_{\mathcal{X}} f(x)\,h(x)\,dx = \mu \quad \text{as } n \to \infty.$$

Hence the estimator remains unbiased. Its variance is

$$\mathrm{Var}_g(\hat{\mu}_n) = \frac{1}{n}\,\mathrm{Var}_g\!\left[\frac{f(X)}{g(X)}\,h(X)\right] = \frac{1}{n}\left\{E_g\!\left[\left(\frac{f(X)}{g(X)}\,h(X)\right)^{\!2}\,\right] - \mu^2\right\} = \frac{1}{n}\left\{\int_{\mathcal{X}} \frac{f^2(x)}{g(x)}\,h^2(x)\,dx - \mu^2\right\},$$

which generally differs from $\mathrm{Var}_f(\hat{\mu}_n)$. The idea of importance sampling is to find a good $g$ such that $\mathrm{Var}_g(\hat{\mu}_n) < \mathrm{Var}_f(\hat{\mu}_n)$. For the Cauchy example above, consider the uniform pdf over $[0, 2]$:

$$g(x) = \frac{1}{2}\,\mathbb{1}\{0 \le x \le 2\} = \frac{1}{2}\,h(x).$$

Then we have

$$\hat{\mu}_n = \frac{1}{n} \sum_{i=1}^{n} \frac{2}{\pi(1+X_i^2)}.$$

The variance of this estimator is

$$\mathrm{Var}_g(\hat{\mu}_n) = \frac{1}{n}\left\{\int_0^2 2 f^2(x)\,dx - \mu^2\right\} \approx \frac{0.029}{n},$$

i.e., about 8 times smaller than $\mathrm{Var}_f(\hat{\mu}_n)$!
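A minimal sketch comparing the two estimators on this example; the sample size and seed are our choices. The sample variances of the individual terms should come out near $0.23$ and $0.029$, matching the calculation above.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Plain Monte Carlo: X_i ~ Cauchy, terms h(X_i) = indicator of [0, 2].
x = rng.standard_cauchy(n)
terms_mc = ((x >= 0) & (x <= 2)).astype(float)
print("plain MC  :", terms_mc.mean(), "term variance:", terms_mc.var())

# Importance sampling: X_i ~ Uniform[0, 2], terms f(X_i)/g(X_i) * h(X_i) = 2 f(X_i).
u = rng.uniform(0.0, 2.0, n)
terms_is = 2.0 / (np.pi * (1.0 + u**2))
print("importance:", terms_is.mean(), "term variance:", terms_is.var())
```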

In principle one may seek the $g$ that minimizes $\mathrm{Var}_g(\hat{\mu}_n)$ over all possible pdf's. The solution is obtained using the method of Lagrange multipliers: minimize the Lagrangian

$$L(g, \lambda) = \mathrm{Var}_g(\hat{\mu}_n) + \lambda\left(\int g(x)\,dx - 1\right) = \frac{1}{n}\int \frac{v(x)}{g(x)}\,dx + \lambda \int g(x)\,dx + \text{const},$$

where $\lambda$ is the Lagrange multiplier, and we have used the shorthand $v(x) = f^2(x)\,h^2(x)$. Taking the Fréchet derivative of $L(g, \lambda)$ with respect to $g(x)$ and setting it to zero, we obtain

$$0 = \frac{\partial L(g, \lambda)}{\partial g(x)} = -\frac{1}{n}\,\frac{v(x)}{g^2(x)} + \lambda,$$

whence

$$g(x) = \sqrt{v(x)/(n\lambda)} = \frac{f(x)\,|h(x)|}{\int_{\mathcal{X}} f(x)\,|h(x)|\,dx},$$

where the value of $\lambda$ was selected to ensure that $\int g = 1$. The expression above is elegant; however, evaluating $g(x)$ requires computation of the integral in the denominator, which is as hard as the original problem! In practice one is thus content to find a good $g$ that assigns high probability to regions where $f(x)\,|h(x)|$ is large. Ideally the ratio $f(x)\,h(x)/g(x)$ would be roughly constant over $\mathcal{X}$.

6 Random Number Generation

A classical method for generating a real random variable with an arbitrary cdf $F(x)$ is to generate a random variable $U$ uniformly distributed over $[0, 1]$ and then apply the inverse cdf to $U$, resulting in $X = F^{-1}(U)$ with the desired distribution. Indeed,

$$\Pr[X \le x] = \Pr[U \le F(x)] = F(x).$$
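A minimal sketch of the inverse-cdf method. The exponential distribution is our toy choice, since its cdf $F(x) = 1 - e^{-x}$ inverts in closed form as $F^{-1}(u) = -\log(1-u)$.

```python
import numpy as np

rng = np.random.default_rng(0)

# Inverse-cdf method for Exp(1): F(x) = 1 - exp(-x), so X = F^{-1}(U) = -log(1 - U).
u = rng.uniform(size=100_000)
x = -np.log1p(-u)

print(x.mean(), x.var())  # both should be close to 1, the mean and variance of Exp(1)
```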
Now suppose the pdf $f(x)$ is known up to a normalization constant which is difficult or expensive to compute. An example is when samples have to be generated from a posterior distribution

$$f(x \mid y) = \frac{p(y \mid x)\,p(x)}{\int p(y \mid x)\,p(x)\,dx},$$

where the integral in the denominator is the normalization constant. A good method in this case is the so-called accept-reject method [1, Ch. 2.3]. We are given an auxiliary pdf $g(x)$ which is easy to sample from, and a constant $M$ such that $\frac{f(x)}{M g(x)} \le 1$ holds and is easy to evaluate for all $x \in \mathrm{supp}(f)$. The accept-reject method works as follows:

(1) Generate independent random variables $X \sim g$ and $U \sim \mathrm{Uniform}[0, 1]$.

(2) Accept $Y = X$ if $U \le \frac{f(X)}{M g(X)}$. Return to (1) otherwise.

Claim: $Y \sim f$.

Proof: The cdf of $Y$ is

$$\Pr[Y \le y] = \Pr\!\left[X \le y \,\middle|\, U \le \frac{f(X)}{M g(X)}\right] = \frac{\Pr\!\left[X \le y,\; U \le \frac{f(X)}{M g(X)}\right]}{\Pr\!\left[U \le \frac{f(X)}{M g(X)}\right]} = \frac{N(y)}{N(\infty)}. \qquad (3)$$

The numerator of (3) takes the form

$$N(y) = \int_{-\infty}^{y} \left(\int_0^{f(x)/(M g(x))} du\right) g(x)\,dx = \frac{1}{M} \int_{-\infty}^{y} f(x)\,dx,$$

hence $N(\infty) = 1/M$. Substituting back into (3), we obtain $\Pr[Y \le y] = \int_{-\infty}^{y} f(x)\,dx$, which proves the claim.

As a final observation, in step (2) of the accept-reject algorithm, the probability of acceptance is equal to $N(\infty) = 1/M$. If $\mathcal{X} = \mathbb{R}$, this forces the tails of $g$ to be heavier than those of $f$; otherwise the ratio $f/g$ would be unbounded, and so would $M$.
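A minimal sketch of the accept-reject loop on a toy target; the choices here are ours: $f$ the Beta(2, 2) density $6x(1-x)$ on $[0, 1]$, $g$ the uniform pdf on $[0, 1]$, and $M = 1.5 = \max_x f(x)/g(x)$, so the acceptance probability is $1/M = 2/3$.

```python
import numpy as np

rng = np.random.default_rng(0)

def f(x):
    """Target density: Beta(2, 2), i.e. 6x(1 - x) on [0, 1]."""
    return 6.0 * x * (1.0 - x)

M = 1.5  # max of f(x)/g(x) with g = Uniform[0, 1], attained at x = 1/2

def accept_reject(n):
    """Draw n samples from f by accept-reject with proposal g = Uniform[0, 1]."""
    out = []
    while len(out) < n:
        x = rng.uniform()      # step (1): X ~ g
        u = rng.uniform()      #           U ~ Uniform[0, 1]
        if u <= f(x) / M:      # step (2): accept with probability f(X)/(M g(X))
            out.append(x)
    return np.array(out)

y = accept_reject(50_000)
print(y.mean(), y.var())  # Beta(2, 2) has mean 1/2 and variance 1/20 = 0.05
```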
References

[1] C. P. Robert and G. Casella, Monte Carlo Statistical Methods, Springer, New York, 1999.

[2] IEEE Transactions on Signal Processing, special issue on Monte Carlo methods for statistical signal processing, Vol. 50, No. 2, Feb. 2002.