Monte Carlo Integration

In these notes we first review basic numerical integration methods (using Riemann approximation and the trapezoidal rule) and their limitations for evaluating multidimensional integrals. Next we introduce stochastic integration methods based on Monte Carlo and importance sampling. We conclude with a section on computationally efficient generation of random numbers when the sampling density is known up to a normalizing constant. An excellent reference for this material is the book by Robert and Casella [1]. These stochastic methods have found numerous applications in engineering; see for instance the papers in the 2002 special issue of the IEEE Transactions on Signal Processing [2].

1 Riemann Integration

Consider the problem of evaluating an integral $I = \int_a^b \varphi(x)\,dx$. The Riemann approximation to $I$ is given by

$$\hat{I}_n = \sum_{i=1}^{n} (x_i - x_{i-1})\,\varphi(x_i) \tag{1}$$

where $a = x_0 < x_1 < x_2 < \cdots < x_n = b$. This may be viewed as approximating $\varphi(x)$ with a piecewise-constant function $\hat{\varphi}_n(x)$ which is equal to $\varphi(x_i)$ for all $x \in (x_{i-1}, x_i]$ and $1 \le i \le n$. Indeed $\hat{I}_n = \int_a^b \hat{\varphi}_n(x)\,dx$. Assuming that the derivative $\varphi'(x)$ is bounded, and that $x_i = a + (b-a)\,\frac{i}{n}$, the maximum absolute error due to this approximation is upper bounded as

$$\sup_{a \le x \le b} |\varphi(x) - \hat{\varphi}_n(x)| \le \frac{b-a}{n}\,\|\varphi'\|_\infty$$

with equality if $\varphi(x)$ is an affine function. Hence the error incurred by approximating the integral with a Riemann sum is at most

$$|\hat{I}_n - I| \le \frac{C}{n}$$

for some constant $C = (b-a)^2\,\|\varphi'\|_\infty$ independent of $n$.

2 Trapezoidal Rule

The approximation formula (1) can be improved by replacing $\varphi(x_i)$ with $\frac{1}{2}[\varphi(x_{i-1}) + \varphi(x_i)]$:

$$\hat{I}_n = \sum_{i=1}^{n} (x_i - x_{i-1})\,\frac{1}{2}\,[\varphi(x_{i-1}) + \varphi(x_i)]. \tag{2}$$

This is the so-called trapezoidal rule, which is extensively used for numerical integration. For instance, if $\varphi(x)$ is an affine function, the approximation is exact. For general functions $\varphi(x)$, the approximation error is due to the curvature of $\varphi$. If the second derivative $\varphi''(x)$ exists and is bounded, it may be shown (by application of Taylor's theorem again) that

$$|\hat{I}_n - I| \le \frac{C}{n^2}$$

for some constant $C$.
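The $1/n$ and $1/n^2$ error rates of the Riemann and trapezoidal approximations can be checked numerically. The following is a minimal sketch, not part of the original notes: the function names `riemann` and `trapezoid` and the test integrand $\sin x$ on $[0, 1]$ are illustrative choices, and a uniform grid $x_i = a + (b-a)\,i/n$ is assumed.

```python
import math


def riemann(phi, a, b, n):
    """Right-endpoint Riemann sum (1) on the uniform grid x_i = a + (b - a) * i / n."""
    dx = (b - a) / n
    return sum(phi(a + i * dx) for i in range(1, n + 1)) * dx


def trapezoid(phi, a, b, n):
    """Trapezoidal rule (2) on the same uniform grid."""
    dx = (b - a) / n
    # Interior points carry weight 1, the two endpoints carry weight 1/2.
    interior = sum(phi(a + i * dx) for i in range(1, n))
    return (0.5 * (phi(a) + phi(b)) + interior) * dx


if __name__ == "__main__":
    # Illustrative integrand (an assumption, not from the notes): sin on [0, 1].
    exact = 1.0 - math.cos(1.0)
    for n in (10, 100, 1000):
        err_r = abs(riemann(math.sin, 0.0, 1.0, n) - exact)
        err_t = abs(trapezoid(math.sin, 0.0, 1.0, n) - exact)
        print(f"n={n:5d}  Riemann error={err_r:.2e}  trapezoid error={err_t:.2e}")
```

Each tenfold increase in $n$ should shrink the Riemann error by roughly a factor of 10 and the trapezoidal error by roughly a factor of 100, in line with the two bounds.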
3 Multidimensional Integration

For $d$-dimensional integrals, the domain of integration $\mathcal{X}$ is a subset of $\mathbb{R}^d$. An integral $I = \int_{\mathcal{X}} \varphi(x)\,dx$ can be approximated by a Riemann sum, similarly to Sec. 1, or using a trapezoidal rule as in Sec. 2. If an $n$-point approximation is used, the trapezoidal rule yields an approximation error

$$|\hat{I}_n - I| \le \frac{C}{n^{2/d}}$$

for some constant $C$. This is the same formula as in 1D, except that $n$ is replaced with $n^{1/d}$ (the number of points per coordinate in case $\mathcal{X}$ is discretized using a cubic lattice). Hence $n$ needs to increase exponentially with $d$ to achieve a target approximation error. This phenomenon is known as the curse of dimensionality.

The stochastic methods for numerical integration avoid the curse of dimensionality, as the resulting integrals may be approximated with an accuracy of the order of $1/\sqrt{n}$, where $n$ is the number of samples $X_1, \ldots, X_n$ taken from $\mathcal{X}$. Comparing $n^{-1/2}$ with $n^{-2/d}$, the stochastic methods outperform the deterministic ones for dimensions $d > 4$ and are worse for $d < 4$.

4 Classical Monte Carlo Integration

The basic problem considered in this section and the following one is as follows. Given a pdf $f(x)$, $x \in \mathcal{X}$, and a function $h(x)$, $x \in \mathcal{X}$, evaluate the integral

$$\mu = E_f[h(X)] = \int_{\mathcal{X}} h(x)\,f(x)\,dx.$$

Note these methods can be used to evaluate any integral $I = \int_{\mathcal{X}} \varphi(x)\,dx$ by expressing $\varphi$ as the product of a pdf $f$ and another function $h$.

Given $X_1, X_2, \ldots, X_n$ drawn iid from the pdf $f$, estimate $\mu$ by the empirical average

$$\hat{\mu}_n = \frac{1}{n} \sum_{i=1}^{n} h(X_i).$$

By the strong law of large numbers, we have $\hat{\mu}_n \xrightarrow{\text{a.s.}} \mu$ as $n \to \infty$. The variance of $\hat{\mu}_n$ is

$$\mathrm{Var}(\hat{\mu}_n) = \frac{1}{n}\,\mathrm{Var}[h(X)] = \frac{1}{n} \int_{\mathcal{X}} (h(x) - \mu)^2\,f(x)\,dx.$$

We will henceforth assume that $E_f[h^2(X)] < \infty$.

Example. Let $f$ be the Cauchy distribution, $f(x) = \frac{1}{\pi(1+x^2)}$, $x \in \mathbb{R}$, and $h(x)$ the indicator function for the interval $[0, 2]$. We have

$$\mu = \int_0^2 \frac{dx}{\pi(1+x^2)} = \frac{\arctan 2}{\pi} \approx 0.35.$$
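The estimator of $\mu$ is given by

$$\hat{\mu}_n = \frac{1}{n} \sum_{i=1}^{n} \mathbf{1}\{0 \le X_i \le 2\}.$$

Its variance is

$$\mathrm{Var}(\hat{\mu}_n) = \frac{\mu(1-\mu)}{n} \approx \frac{0.23}{n}.$$

This method is intuitively inefficient because only 35% of the samples contribute to the sum giving $\hat{\mu}_n$. Can we do better?

5 Importance Sampling

The idea here is to draw samples not from $f$, but from an auxiliary pdf $g$ (often called the instrumental density). Specifically, given $X_1, X_2, \ldots, X_n$ drawn iid from the pdf $g$, estimate $\mu$ by the empirical average

$$\hat{\mu}_n = \frac{1}{n} \sum_{i=1}^{n} \frac{f(X_i)}{g(X_i)}\,h(X_i).$$

Clearly this method reduces to standard Monte Carlo if $g = f$. It is required that $\mathrm{supp}\{f\} \subseteq \mathrm{supp}\{g\}$, i.e., $f(x) > 0 \Rightarrow g(x) > 0$. By the strong law of large numbers, we have

$$\hat{\mu}_n \xrightarrow{\text{a.s.}} E_g\!\left[\frac{f(X)}{g(X)}\,h(X)\right] = \int_{\mathcal{X}} f(x)\,h(x)\,dx = \mu \quad \text{as } n \to \infty.$$

Since each term of the average has expectation $\mu$ under $g$, the estimator remains unbiased. Its variance is

$$\mathrm{Var}_g(\hat{\mu}_n) = \frac{1}{n}\,\mathrm{Var}_g\!\left[\frac{f(X)}{g(X)}\,h(X)\right] = \frac{1}{n}\left\{ E_g\!\left[\left(\frac{f(X)}{g(X)}\,h(X)\right)^{2}\right] - \mu^2 \right\} = \frac{1}{n}\left\{ \int_{\mathcal{X}} \frac{f^2(x)}{g(x)}\,h^2(x)\,dx - \mu^2 \right\}$$

which generally differs from $\mathrm{Var}_f(\hat{\mu}_n)$. The idea of importance sampling is to find a good $g$ such that $\mathrm{Var}_g(\hat{\mu}_n) < \mathrm{Var}_f(\hat{\mu}_n)$.

For the Cauchy example above, consider the uniform pdf over $[0, 2]$: $g(x) = \frac{1}{2}\,\mathbf{1}\{0 \le x \le 2\} = \frac{1}{2}\,h(x)$. Then we have

$$\hat{\mu}_n = \frac{1}{n} \sum_{i=1}^{n} \frac{2}{\pi(1 + X_i^2)}.$$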
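The two estimators for the Cauchy example can be compared by simulation. The sketch below is illustrative rather than part of the original notes: the function names are hypothetical, Cauchy samples are drawn by inverting the Cauchy cdf (an implementation choice made here so that only the Python standard library is needed), and the seed and sample size are arbitrary.

```python
import math
import random
import statistics


def cauchy_pdf(x):
    """Standard Cauchy density f(x) = 1 / (pi * (1 + x^2))."""
    return 1.0 / (math.pi * (1.0 + x * x))


def classical_mc_terms(n, rng):
    """Terms h(X_i) with X_i ~ Cauchy and h the indicator of [0, 2]."""
    terms = []
    for _ in range(n):
        # Inverse-cdf draw from the standard Cauchy distribution.
        x = math.tan(math.pi * (rng.random() - 0.5))
        terms.append(1.0 if 0.0 <= x <= 2.0 else 0.0)
    return terms


def importance_terms(n, rng):
    """Terms f(X_i) h(X_i) / g(X_i) with X_i ~ Uniform[0, 2], where g = 1/2 on [0, 2]."""
    return [2.0 * cauchy_pdf(rng.uniform(0.0, 2.0)) for _ in range(n)]


if __name__ == "__main__":
    rng = random.Random(0)  # arbitrary seed, for reproducibility only
    n = 100_000
    exact = math.atan(2.0) / math.pi
    print(f"exact mu = {exact:.4f}")
    for name, terms in (("classical", classical_mc_terms(n, rng)),
                        ("importance", importance_terms(n, rng))):
        estimate = statistics.fmean(terms)
        per_sample_var = statistics.pvariance(terms)  # estimates n * Var(mu_hat_n)
        print(f"{name:10s}  estimate={estimate:.4f}  per-sample variance={per_sample_var:.4f}")
```

Both estimates should land close to $\arctan(2)/\pi$, with the per-sample variance of the importance sampling terms noticeably smaller than that of the indicator terms.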
The variance of the importance sampling estimator with uniform $g$ is

$$\mathrm{Var}_g(\hat{\mu}_n) = \frac{1}{n}\left\{ 2 \int_0^2 f^2(x)\,dx - \mu^2 \right\} \approx \frac{0.0285}{n},$$

i.e., about 8 times smaller than $\mathrm{Var}_f(\hat{\mu}_n)$!

In principle one may seek the $g$ that minimizes $\mathrm{Var}_g(\hat{\mu}_n)$ over all possible pdf's. With the shorthand $v(x) = f^2(x)\,h^2(x)$, we have $n\,\mathrm{Var}_g(\hat{\mu}_n) = \int_{\mathcal{X}} \frac{v(x)}{g(x)}\,dx - \mu^2$, so it suffices to minimize $\int_{\mathcal{X}} v/g$ subject to $\int_{\mathcal{X}} g = 1$. The solution is obtained using the method of Lagrange multipliers: minimize the Lagrangian

$$L(g, \lambda) = \int_{\mathcal{X}} \frac{v(x)}{g(x)}\,dx + \lambda \int_{\mathcal{X}} g(x)\,dx$$

where $\lambda$ is the Lagrange multiplier. Taking the Fréchet derivative of $L(g, \lambda)$ with respect to $g(x)$, we obtain

$$0 = \frac{\partial L(g, \lambda)}{\partial g(x)} = -\frac{v(x)}{g^2(x)} + \lambda,$$

whence

$$g(x) = \sqrt{v(x)/\lambda} = \frac{f(x)\,|h(x)|}{\int_{\mathcal{X}} f(x')\,|h(x')|\,dx'}$$

where the value of $\lambda$ was selected to ensure that $\int_{\mathcal{X}} g = 1$. The expression above is elegant; however, evaluating $g(x)$ requires computation of the integral in the denominator, which is as hard as the original problem! In practice one is thus content to find a good $g$ that assigns high probability to regions where $f(x)\,|h(x)|$ is large. Ideally the ratio $\frac{f(x)\,|h(x)|}{g(x)}$ would be roughly constant over $\mathcal{X}$.

6 Random Number Generation

A classical method for generating a real random variable $X$ from an arbitrary cdf $F(x)$ is to generate a random variable $U$ uniformly distributed over $[0, 1]$ and then apply the inverse cdf to $U$, resulting in $X = F^{-1}(U)$ with the desired distribution. Indeed $\Pr[X \le x] = \Pr[U \le F(x)] = F(x)$.

Now suppose the pdf $f(x)$ is known up to a normalization constant which is difficult or expensive to compute. An example is when samples have to be generated from a posterior distribution

$$f(x \mid y) = \frac{p(y \mid x)\,p(x)}{\int p(y \mid x)\,p(x)\,dx},$$

where the integral in the denominator is the normalization constant. A good method in this case is the so-called Accept-reject method [1, Ch. 2.3]. We are given an auxiliary pdf $g(x)$ which is easy to sample, and a constant $M$ such that $\frac{f(x)}{M g(x)} \le 1$ holds, and the ratio is easy to evaluate, for all $x \in \mathrm{supp}(f)$. The Accept-reject method works as follows:
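(1) Generate independent random variables $X \sim g$ and $U \sim \mathrm{Uniform}[0, 1]$.
(2) Accept $Y = X$ if $U \le \frac{f(X)}{M g(X)}$. Return to (1) otherwise.

Claim: $Y \sim f$.

Proof: The cdf for $Y$ is

$$\Pr[Y \le y] = \Pr\!\left[ X \le y \,\middle|\, U \le \frac{f(X)}{M g(X)} \right] = \frac{\Pr\!\left[ X \le y,\; U \le \frac{f(X)}{M g(X)} \right]}{\Pr\!\left[ X \le \infty,\; U \le \frac{f(X)}{M g(X)} \right]} = \frac{N(y)}{N(\infty)}. \tag{3}$$

The numerator of (3) takes the form

$$N(y) = \int_{-\infty}^{y} g(x) \int_0^{f(x)/(M g(x))} du\,dx = \int_{-\infty}^{y} g(x)\,\frac{f(x)}{M g(x)}\,dx = \frac{1}{M} \int_{-\infty}^{y} f(x)\,dx,$$

hence $N(\infty) = \frac{1}{M}$. Substituting back into (3), we obtain $\Pr[Y \le y] = \int_{-\infty}^{y} f(x)\,dx$, which proves the claim.

As a final observation, in Step 2 of the Accept-reject algorithm, the probability of acceptance is equal to $N(\infty) = \frac{1}{M}$. If $\mathcal{X} = \mathbb{R}$, this forces the tails of $g$ to be heavier than those of $f$; otherwise the ratio $f/g$ would be unbounded, and so would $M$.

References

[1] C. P. Robert and G. Casella, Monte Carlo Statistical Methods, Springer, New York, 1999.
[2] IEEE Transactions on Signal Processing, special issue on Monte Carlo methods for statistical signal processing, Vol. 50, No. 2, Feb. 2002.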
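As a numerical illustration of Steps (1) and (2) of the Accept-reject method in Sec. 6, the following minimal sketch draws standard normal samples using a standard Cauchy instrumental density. The choice of target and instrumental densities, the function names, and the seed are assumptions made here for illustration and do not come from the notes. For this pair, $f(x)/g(x) = \sqrt{\pi/2}\,(1+x^2)\,e^{-x^2/2}$ is maximized at $x = \pm 1$, which gives $M = \sqrt{2\pi/e} \approx 1.52$.

```python
import math
import random


def normal_pdf(x):
    """Target density f: standard normal (illustrative choice)."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def cauchy_pdf(x):
    """Instrumental density g: standard Cauchy, whose tails are heavier than those of f."""
    return 1.0 / (math.pi * (1.0 + x * x))


# Smallest constant with f(x) <= M * g(x) for all x; the bound is attained at x = +/-1.
M = math.sqrt(2.0 * math.pi / math.e)


def accept_reject(rng):
    """Run Steps (1)-(2) until acceptance; return (sample, number of proposals used)."""
    proposals = 0
    while True:
        proposals += 1
        x = math.tan(math.pi * (rng.random() - 0.5))  # Step (1): X ~ g via inverse cdf
        u = rng.random()                              # Step (1): U ~ Uniform[0, 1)
        if u <= normal_pdf(x) / (M * cauchy_pdf(x)):  # Step (2): acceptance test
            return x, proposals


if __name__ == "__main__":
    rng = random.Random(0)  # arbitrary seed, for reproducibility only
    draws = [accept_reject(rng) for _ in range(50_000)]
    samples = [x for x, _ in draws]
    total_proposals = sum(k for _, k in draws)
    mean = sum(samples) / len(samples)
    var = sum((x - mean) ** 2 for x in samples) / len(samples)
    print(f"sample mean={mean:.3f}  sample variance={var:.3f}")
    print(f"acceptance rate={len(samples) / total_proposals:.3f}  1/M={1.0 / M:.3f}")
```

The accepted samples should have mean near 0 and variance near 1, and the empirical acceptance rate should be close to $1/M$, consistent with the final observation of Sec. 6.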