STOCHASTIC SIMULATION FOR BLOCKED DATA
Stochastic System Analysis and Bayesian Model Updating

Topics: Monte Carlo simulation, rejection sampling, importance sampling, Markov chain Monte Carlo.

Monte Carlo simulation

Introduction: If we know how to directly sample from f(x) to obtain i.i.d. samples {X_i : i = 1, ..., N}, then according to the Law of Large Numbers,

  E_{f(x)}[g(X)] \approx \frac{1}{N} \sum_{i=1}^{N} g(X_i) \equiv \hat{g}_{MCS}

Procedure: skipped since it is trivial.

Analysis:
1. The MCS estimator \hat{g}_{MCS} is unbiased, and its variance decays at the 1/N rate:

  E[\hat{g}_{MCS}] = \frac{1}{N} \sum_{i=1}^{N} E_{f(x)}[g(X_i)] = E_{f(x)}[g(X)]

  Var[\hat{g}_{MCS}] = \frac{1}{N^2} \sum_{i=1}^{N} Var_{f(x)}[g(X_i)] = \frac{Var_{f(x)}[g(X)]}{N}

2. Confidence interval: by the Central Limit Theorem,

  \hat{g}_{MCS} = \frac{1}{N} \sum_{i=1}^{N} g(X_i) \sim \mathrm{Normal}\!\left( E_{f(x)}[g(X)], \; \frac{Var_{f(x)}[g(X)]}{N} \right)

To build a confidence interval we need Var_{f(x)}[g(X)], which can be estimated simply as the sample variance of {g(X_i) : i = 1, ..., N}, i.e.

  \widehat{Var}_{f(x)}[g(X)] = \frac{1}{N} \sum_{i=1}^{N} \left( g(X_i) - \hat{g}_{MCS} \right)^2

Therefore, the following 95.4% confidence interval (\pm 2 standard errors for a Gaussian) can be built:

  \hat{g}_{MCS} - 2\sqrt{\frac{\widehat{Var}_{f(x)}[g(X)]}{N}} \;\le\; E_{f(x)}[g(X)] \;\le\; \hat{g}_{MCS} + 2\sqrt{\frac{\widehat{Var}_{f(x)}[g(X)]}{N}}

Remarks:
1. "If we know how to directly sample from f(x)" is a very strict premise, since it usually requires knowledge of F^{-1}, the inverse of the cumulative distribution function, which is available to us only for certain standard probability density functions, e.g. uniform, Gaussian, Gamma, etc. In the case that we know F^{-1}, we can sample from f(x) using the following steps: draw U ~ unif[0, 1] and let X = F^{-1}(U). It can be shown that X ~ f(x). Proof:

  P(X \le x) = P(F^{-1}(U) \le x) = P(U \le F(x)) = F(x)

2. The accuracy of MCS does not depend on the dimension of X: \hat{g}_{MCS} is robust against the dimension of X, since Var[\hat{g}_{MCS}] = Var_{f(x)}[g(X)]/N does not explicitly involve the dimension.
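A minimal sketch of the two ideas above (inverse-transform sampling and the MCS estimator with its 95.4% confidence interval). The concrete choices here are illustrative assumptions, not from the notes: an Exponential(1) target, for which F^{-1}(u) = -log(1-u), and g(x) = x^2, so the true value E[g(X)] = 2.

```python
import numpy as np

rng = np.random.default_rng(0)

# Inverse-transform sampling: for Exponential(1), F(x) = 1 - exp(-x),
# so F^{-1}(u) = -log(1 - u).  Draw U ~ unif[0,1] and set X = F^{-1}(U).
N = 100_000
U = rng.uniform(size=N)
X = -np.log(1.0 - U)                     # X ~ Exponential(1)

# Plain Monte Carlo estimator of E[g(X)] with g(x) = x^2 (true value is 2).
g = X**2
g_mcs = g.mean()                         # unbiased MCS estimator
var_g = g.var()                          # sample variance of g(X_i)
half_width = 2.0 * np.sqrt(var_g / N)    # +-2 standard errors ~ 95.4% CI

print(f"estimate = {g_mcs:.3f}, 95.4% CI = "
      f"[{g_mcs - half_width:.3f}, {g_mcs + half_width:.3f}]")
```

Doubling the accuracy requires four times as many samples, reflecting the 1/N decay of the estimator variance.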
Rejection sampling

Introduction:
Suppose we don't know how to sample from f(x) directly, but we do know how to evaluate f(x) up to a constant:

  f(x) = a \cdot h(x)

where a is an unknown constant and h(x) is a function we know how to calculate. Note that this is the usual situation in Bayesian analysis, where the posterior PDF has the following form:

  f(x|Y) = \frac{f(Y|x) f(x)}{f(Y)} \propto \underbrace{f(Y|x) f(x)}_{h(x)}, \quad a = \frac{1}{f(Y)}

Assume that we know how to sample from and evaluate a chosen PDF q(x).

Procedure:
1. Let K be a number such that K q(x) \ge h(x) for all x.
2. Draw a candidate X_C ~ q(x).
3. Accept X = X_C with probability h(X_C) / (K q(X_C)).
4. Cycle 2-3 until we get enough accepted samples.

Analysis: same as Monte Carlo simulation, except that N should be the number of accepted samples.

Remarks:
1. Different from MCMC, when a sample is rejected we do not repeat the previous sample.
2. The number K can be found as follows: let x* = argmax_x h(x)/q(x) and K = h(x*)/q(x*); i.e. solving for K usually requires optimization.
3. How do we do Step 3? Draw U ~ unif[0, 1] and let X = X_C if U \le h(X_C)/(K q(X_C)).
4. If the shapes of q(x) and h(x) are very different, the efficiency of rejection sampling will be poor. However, it is often the case that the shape of h(x) is unknown to us, so it is hard to choose an efficient q(x).
5. When the dimension of X is high, rejection sampling can be extremely inefficient. This is because it can be much harder to find a good q(x) in a high-dimensional space.

Adaptive rejection sampling

Suppose log h(x) is concave.

(Figure: log-scale plot of h(x) with piecewise-exponential envelopes K_1 q_1(x), K_2 q_2(x), K_3 q_3(x) tangent to log h(x).)

Procedure: Let q_1(x) be an exponential PDF, used as the initial rejection sampling PDF and chosen so that K_1 q_1(x) is tangent to h(x) in the log space. Draw X_C ~ q_1(x). If X_C is accepted, keep on using q_1(x) to generate the next sample. However, if X_C is rejected, let q_2(x) be a second exponential PDF such that K_2 q_2(x) is tangent to h(x) at X_C in the log space. Generate the next sample using the envelope PDF formed by K_1 q_1(x) and K_2 q_2(x). Continue doing so until we get enough accepted samples.

Remarks:
1. Not applicable to non-log-concave PDFs.
2. Still not good for high-dimensional X.
3. There are variants of adaptive rejection sampling that do not require evaluating gradients [1] and do not require log-concavity [2].
Importance sampling

Introduction: Suppose we don't know how to directly sample from f(x), but we know how to evaluate it. We would like to compute the expected value of some function g(X) with respect to f(x), i.e. to compute

  E_{f(x)}[g(X)] = \int g(x) f(x) \, dx

Let q(x), called the importance sampling PDF, be a PDF that we know how to sample from and evaluate, and whose support region contains the support region of f(x). Then

  E_{f(x)}[g(X)] = \int g(x) f(x) \, dx = \int g(x) \frac{f(x)}{q(x)} q(x) \, dx = E_{q(x)}\!\left[ g(X) \frac{f(X)}{q(X)} \right]

Procedure: Draw {X_i : i = 1, ..., N} i.i.d. from q(x). According to the Law of Large Numbers, we have

  E_{f(x)}[g(X)] = E_{q(x)}\!\left[ g(X) \frac{f(X)}{q(X)} \right] \approx \frac{1}{N} \sum_{i=1}^{N} g(X_i) \frac{f(X_i)}{q(X_i)} \equiv \hat{g}_{IS}

Analysis:
1. The IS estimator \hat{g}_{IS} is unbiased, and its variance decays at the 1/N rate:

  E[\hat{g}_{IS}] = \frac{1}{N} \sum_{i=1}^{N} E_{q(x)}\!\left[ g(X_i) \frac{f(X_i)}{q(X_i)} \right] = E_{q(x)}\!\left[ g(X) \frac{f(X)}{q(X)} \right] = \int g(x) \frac{f(x)}{q(x)} q(x) \, dx = E_{f(x)}[g(X)]

  Var[\hat{g}_{IS}] = \frac{1}{N^2} \sum_{i=1}^{N} Var_{q(x)}\!\left[ g(X_i) \frac{f(X_i)}{q(X_i)} \right] = \frac{1}{N} Var_{q(x)}\!\left[ g(X) \frac{f(X)}{q(X)} \right]

2. Confidence interval: by the Central Limit Theorem,

  \hat{g}_{IS} \sim \mathrm{Normal}\!\left( E_{f(x)}[g(X)], \; \frac{1}{N} Var_{q(x)}\!\left[ g(X) \frac{f(X)}{q(X)} \right] \right)

To build a confidence interval we need Var_{q(x)}[g(X) f(X)/q(X)], which can be estimated simply as the sample variance of {g(X_i) f(X_i)/q(X_i) : i = 1, ..., N}, i.e.

  \widehat{Var}_{q(x)}\!\left[ g(X) \frac{f(X)}{q(X)} \right] = \frac{1}{N} \sum_{i=1}^{N} \left( g(X_i) \frac{f(X_i)}{q(X_i)} - \hat{g}_{IS} \right)^2

Therefore, the following 95.4% confidence interval for a Gaussian can be built:

  \hat{g}_{IS} - 2\sqrt{\frac{\widehat{Var}_{q(x)}}{N}} \;\le\; E_{f(x)}[g(X)] \;\le\; \hat{g}_{IS} + 2\sqrt{\frac{\widehat{Var}_{q(x)}}{N}}

Remarks:
1. Recall that the variance of the MCS estimator is Var_{f(x)}[g(X)]/N. Here we have seen that the variance of the IS estimator is Var_{q(x)}[g(X) f(X)/q(X)]/N. It can be shown that if the importance sampling PDF q(x) \propto g(x) f(x), the variance of the IS estimator is 0. We call this the optimal importance sampling PDF. However, this PDF usually cannot be directly sampled and evaluated. It is also quite possible to choose an importance sampling PDF such that the variance of the resulting IS estimator is larger than that of the MCS estimator.
2. Importance weights. Let us define w_i \equiv f(X_i)/q(X_i) as the importance weight of the i-th sample. One can see that the IS estimator is simply

  \hat{g}_{IS} = \frac{1}{N} \sum_{i=1}^{N} w_i \, g(X_i)

Note that E_{q(x)}[w_i] = 1. Also, by the Central Limit Theorem,

  \frac{1}{N} \sum_{i=1}^{N} w_i \sim \mathrm{Normal}\!\left( 1, \; \frac{1}{N} Var_{q(x)}\!\left[ \frac{f(X)}{q(X)} \right] \right)

3. Importance sampling is inefficient when the dimension of X is high. This is because when the dimension is high, the importance weights may become highly non-uniform. The consequence is that the number of effective samples is small.

Modified importance sampling

Introduction: For Bayesian analysis, we usually only know how to evaluate the posterior PDF f(x|Y) up to a constant:

  f(x|Y) = \frac{f(Y|x) f(x)}{f(Y)}

Note that we usually don't know the normalizing constant f(Y). So the importance sampling technique cannot be directly applied, since it requires full evaluation of f(x|Y). Modified importance sampling (MIS) only requires the evaluation of f(x|Y) up to a constant:

  f(x|Y) = a \cdot h(x)

where a is unknown and h(x) is a function we know how to evaluate. Now let the normalized importance weight be

  w_i = \frac{h(X_i)/q(X_i)}{\sum_{j=1}^{N} h(X_j)/q(X_j)}, \qquad \sum_{i=1}^{N} w_i = 1

The MIS estimator for E[g(X)|Y] is simply

  \hat{g}_{MIS} = \sum_{i=1}^{N} w_i \, g(X_i)

Observation: when N \to \infty, MIS is the same as IS. Proof: when N \to \infty,

  \frac{1}{N} \sum_{j=1}^{N} \frac{h(X_j)}{q(X_j)} \to E_{q(x)}\!\left[ \frac{h(X)}{q(X)} \right] = \frac{1}{a}

Therefore, the normalized importance weight in MIS and the importance weight in IS (divided by N) become identical. The consequence is that MIS is only asymptotically unbiased.

Procedure:
1. Draw {X_i : i = 1, ..., N} ~ q(x).
2. Compute \hat{g}_{MIS} = \sum_i w_i g(X_i), with w_i = [h(X_i)/q(X_i)] / \sum_j [h(X_j)/q(X_j)].

Remarks:
1. When N is large, the statistical properties of the MIS estimator are similar to those of the IS estimator. In fact, MIS is only asymptotically unbiased.
2. MIS can be used to estimate the normalizing constant f(Y) in Bayesian analysis. Let \alpha_i = h(X_i)/q(X_i) be the non-normalized importance weight, with h(x) = f(Y|x) f(x). It can be shown that \bar{\alpha} = \frac{1}{N} \sum_i \alpha_i is an unbiased estimator of f(Y).
Sample-importance resampling

Introduction: Both importance sampling and modified importance sampling are mainly for estimating E[g(X)|Y]. If we would like to obtain samples from f(x|Y), we can adopt SIR, which is just one step beyond MIS. The idea is to resample the MIS samples according to their normalized weights.

Procedure:
1. Do MIS to obtain {(X_i, w_i) : i = 1, ..., N}. Recall that \sum_{i=1}^{N} w_i = 1.
2. Resampling: let X_j^{new} = X_i with probability w_i, for j = 1, ..., M. Then

  {X_j^{new} : j = 1, ..., M} ~ f(x|Y) (approximately)

Remarks:
1. There may be repeated samples in {X_j^{new} : j = 1, ..., M}.
2. M is better taken no larger than N.
3. If N is small, {X_j^{new} : j = 1, ..., M} may be poorly distributed as f(x|Y). This is because MIS is only asymptotically unbiased.
Markov Chain Monte Carlo
1. Metropolis-Hastings algorithm
2. Gibbs sampler
3. Hybrid Monte Carlo

Metropolis-Hastings algorithm

Introduction: Suppose the target PDF is the posterior PDF in a Bayesian analysis:

  f(x|Y) = \frac{f(Y|x) f(x)}{f(Y)} = a \cdot h(x)

The idea of MH is to create a Markov chain whose stationary distribution is the same as the target PDF.

Procedure:
1. Let H(x'|x) be the chosen proposal PDF.
2. Initialize X^0 at any place.
3. Draw the candidate X_C ~ H(x|X^0) and compute

  r = \frac{h(X_C) \, H(X^0 | X_C)}{h(X^0) \, H(X_C | X^0)}

4. Let X^1 = X_C with probability min(1, r), and X^1 = X^0 with probability 1 - min(1, r).
5. Cycle 3-4 to obtain Markov chain samples {X^t : t = 0, ..., N}. These samples will be asymptotically distributed as f(x|Y) if the Markov chain is ergodic.

First note that the MH algorithm creates a Markov chain that satisfies detailed balance:

  F(z|x) h(x) = B(x|z) h(z) \quad \forall x, z

where F(z|x) and B(x|z) are the forward and backward probability transition kernels of the Markov chain. The forward and backward kernels in the MH algorithm are

  F(z|x) = \min(1, r(z|x)) H(z|x) + [1 - A(x)] \, \delta(z - x)

  B(x|z) = \min(1, r(x|z)) H(x|z) + [1 - A(z)] \, \delta(x - z)

where A(x) = \int \min(1, r(z'|x)) H(z'|x) \, dz' is the total acceptance probability from x, and

  r(z|x) = \frac{h(z) H(x|z)}{h(x) H(z|x)} = \frac{f(z|Y) H(x|z)}{f(x|Y) H(z|x)}, \qquad r(x|z) = \frac{1}{r(z|x)}

For z \ne x the delta terms vanish and

  F(z|x) h(x) = \min(1, r(z|x)) H(z|x) h(x) = \min\big( h(x) H(z|x), \; h(z) H(x|z) \big) = \min(1, r(x|z)) H(x|z) h(z) = B(x|z) h(z)

while for z = x detailed balance holds trivially. Therefore, as long as the Markov chain is ergodic, the stationary distribution will be unique and equal to the target f(x|Y).

Analysis:
1. The MCMC estimator for E[g(X)|Y] is simply

  \hat{g}_{MCMC} = \frac{1}{N} \sum_{t=1}^{N} g(X^t)

The MCMC estimator is asymptotically unbiased because

  E[\hat{g}_{MCMC}] = \frac{1}{N} \sum_{t=1}^{N} E[g(X^t)|Y] \to E[g(X)|Y]

Note that if we ignore the MC samples in the burn-in period, the estimator is unbiased.

  Var[\hat{g}_{MCMC}] = \frac{1}{N^2} \sum_{t=1}^{N} Var[g(X^t)|Y] + \frac{2}{N^2} \sum_{s < t} \mathrm{cov}[g(X^s), g(X^t)|Y]

Note that after the MC reaches its stationary state, the time origin is forgotten, so

  \mathrm{cov}[g(X^t), g(X^{t+k})|Y] = R(k)

depends only on the lag k. Note that R(-k) = R(k) and R(0) = Var[g(X)|Y]. Then

  Var[\hat{g}_{MCMC}] = \frac{R(0)}{N} + \frac{2}{N^2} \sum_{s=1}^{N-1} (N - s) R(s) \approx \frac{R(0)}{N} \left[ 1 + \gamma \right], \qquad \gamma = \frac{2}{R(0)} \sum_{s \ge 1} R(s)

where \gamma quantifies the degree of dependence of the MC samples. Note that R(0)/N is the variance of the estimator if the MC samples were all independent. Because the MC samples are dependent, the actual variance is larger than R(0)/N by a factor of [1 + \gamma]. One can say that the equivalent number of independent samples is N / (1 + \gamma). Note that R(k) can be estimated from the MC samples as follows:

  \hat{R}(k) = \frac{1}{N - k} \sum_{t=1}^{N-k} \left( g(X^t) - \hat{g}_{MCMC} \right) \left( g(X^{t+k}) - \hat{g}_{MCMC} \right)
2. Confidence interval: note that the Central Limit Theorem and the Law of Large Numbers still hold for weakly dependent samples, i.e.

  \hat{g}_{MCMC} = \frac{1}{N} \sum_{t=1}^{N} g(X^t) \sim \mathrm{Normal}\!\left( E[g(X)|Y], \; \frac{R(0)}{N} [1 + \gamma] \right)

So the following 95.4% confidence interval can be built:

  \hat{g}_{MCMC} - 2\sqrt{\frac{\hat{R}(0)}{N} [1 + \hat{\gamma}]} \;\le\; E[g(X)|Y] \;\le\; \hat{g}_{MCMC} + 2\sqrt{\frac{\hat{R}(0)}{N} [1 + \hat{\gamma}]}

Remarks:
1. Recall that f(x|Y) = a \cdot h(x). We don't need to know how to evaluate the normalizing constant a, since it is cancelled out as we compute r.
2. When the initial state of the Markov chain is chosen arbitrarily, the chain may need some time to reach its stationary state. We call this period the burn-in period. The MC samples in the burn-in period are not distributed as f(x|Y); we can choose to discard these samples. The usual way of identifying the burn-in period is to plot the time history of the sample values (or of certain chosen statistics) and identify it visually.

(Figure: time history of X^t; the samples in the initial burn-in period can be discarded.)

3. How do we know the resulting MC is ergodic? In many cases, we don't know it for sure. But if we conduct several MCMC runs to obtain independent MCs and find that their behavior is similar, we can argue that the MC may be ergodic. There are cases where we can immediately see whether the MC is ergodic or not:

(Figure: proposal/target pairs H(y|x), f(x) for which the chain is, or is not, able to reach the entire support of f.)

4. After reaching the stationary state, the MC samples become identically distributed but dependent. The dependency between two samples decays with the time difference.
5. When H(z|x) = H(x|z), the resulting algorithm is called the Metropolis algorithm. Note that then

  r = \frac{h(X_C)}{h(X^0)}

6. The choice of the proposal PDF H(z|x) can significantly affect the efficiency of MCMC. From our experience, the type of H(z|x) is not critical, but the width of H(z|x) matters. If H(z|x) is very narrow, r \approx 1 and we always accept with almost no rejection, but adjacent samples are highly correlated. If H(z|x) is very wide, r is often small, with lots of rejections and repeated samples, so the samples can again be highly correlated. Therefore, H(z|x) should be neither too narrow nor too wide. The rule of thumb is to take the width of H(z|x) to be of the same order as the width of f(x|Y). Usually, the performance is best when the rejection rate is around 50%.
7. In principle, the MH algorithm should work even when the dimension of X is high. In practice, an efficient proposal PDF can be difficult to build, because we usually lack knowledge of the geometry of the target PDF f(x|Y). There are also issues for local-random-walk MH, e.g. MH with a Gaussian proposal PDF. Consider a Gaussian target PDF, and let \sigma_{max} and \sigma_{min} be the maximum and minimum eigenvalue scales (standard deviations) of the covariance matrix. Suppose we choose a Gaussian proposal PDF H(z|x) whose standard deviation in all directions is identical and equal to \sigma_{min}. One can see that in order to have the MC samples travel throughout the significant region of the target PDF, we need at least on the order of \sigma_{max}/\sigma_{min} MC time steps. This difficulty is due to a high aspect ratio and may occur regardless of the dimension of X; however, from my experience, it occurs more frequently when the dimension of X is high. It is definitely not an issue when X is a scalar.
8. MH may not work for multi-modal PDFs, since the resulting MC may be non-ergodic.
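The MH procedure above, in its symmetric-proposal (Metropolis) special case from Remark 5, can be sketched as follows. The target is an illustrative assumption: an unnormalized N(2, 0.5^2) density, whose normalizing constant is never needed because it cancels in r.

```python
import numpy as np

rng = np.random.default_rng(4)

# Unnormalized target h(x) for a N(2, 0.5^2) "posterior"; the constant a
# cancels in the ratio r, so it is never evaluated.
def h(x):
    return np.exp(-0.5 * ((x - 2.0) / 0.5)**2)

def metropolis(n_steps, x0=0.0, step=0.5):
    """Metropolis algorithm: symmetric Gaussian proposal H(z|x) = N(x, step^2),
    so the proposal terms cancel and r = h(X_C) / h(X^t)."""
    x = x0
    chain = np.empty(n_steps)
    for t in range(n_steps):
        xc = x + step * rng.normal()        # candidate X_C ~ H(.|x)
        r = h(xc) / h(x)
        if rng.uniform() < min(1.0, r):
            x = xc                          # accept the candidate
        # on rejection, the previous sample is repeated (unlike rejection
        # sampling, which simply discards rejected candidates)
        chain[t] = x
    return chain

chain = metropolis(100_000, x0=0.0)
samples = chain[1000:]                      # discard a visually chosen burn-in
print(f"mean = {samples.mean():.3f}, std = {samples.std():.3f}")
```

Here the proposal width (`step=0.5`) is taken to be of the same order as the target width, following the rule of thumb in Remark 6; the starting point x0 = 0 is deliberately off-target so the burn-in transient is visible in a trace plot.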
Gibbs sampler

Introduction: Is it possible to conduct MCMC without rejecting samples? It turns out to be possible. This can happen when the uncertain variable X can be divided into m groups {X_i : i = 1, ..., m} and, moreover, when it is feasible to directly sample from f(x_i | {x \ x_i}, Y), where {x \ x_i} denotes {x_1, ..., x_{i-1}, x_{i+1}, ..., x_m}.

Procedure:
1. Initialize X^0 = (X_1^0, X_2^0, ..., X_m^0) at any place.
2. Sample

  X_1^1 ~ f(x_1 | x_2^0, x_3^0, ..., x_m^0, Y)
  X_2^1 ~ f(x_2 | x_1^1, x_3^0, ..., x_m^0, Y)
  X_3^1 ~ f(x_3 | x_1^1, x_2^1, x_4^0, ..., x_m^0, Y)
  ...
  X_m^1 ~ f(x_m | x_1^1, ..., x_{m-1}^1, Y)

3. Repeat step 2 to get {X^t : t = 0, 1, ..., N}. These samples will be asymptotically distributed as f(x|Y) if the Markov chain is ergodic.

Note that the Gibbs sampler algorithm creates a Markov chain that satisfies detailed balance:

  F(z|x) f(x|Y) = B(x|z) f(z|Y) \quad \forall x, z

where F(z|x) and B(x|z) are the forward and backward probability transition kernels of the Markov chain. Use the following sub-step transition as an example, in which only the third group is updated:

  x = (x_1, x_2, x_3^t, x_4, ..., x_m) \;\to\; z = (x_1, x_2, x_3^{t+1}, x_4, ..., x_m)

The forward and backward kernels in the Gibbs sampler algorithm are:
  F(z|x) = f(z_3 | x_1, x_2, x_4, ..., x_m, Y) \prod_{i \ne 3} \delta(z_i - x_i)

  B(x|z) = f(x_3 | z_1, z_2, z_4, ..., z_m, Y) \prod_{i \ne 3} \delta(x_i - z_i)

Then, writing x_{\setminus 3} for (x_1, x_2, x_4, ..., x_m),

  F(z|x) f(x|Y) = \prod_{i \ne 3} \delta(z_i - x_i) \cdot f(z_3 | x_{\setminus 3}, Y) \, f(x_3 | x_{\setminus 3}, Y) \, f(x_{\setminus 3} | Y)

The delta functions allow x_{\setminus 3} to be replaced by z_{\setminus 3} everywhere, and the remaining product is symmetric in x_3 and z_3; hence the expression equals

  \prod_{i \ne 3} \delta(x_i - z_i) \cdot f(x_3 | z_{\setminus 3}, Y) \, f(z_3 | z_{\setminus 3}, Y) \, f(z_{\setminus 3} | Y) = B(x|z) f(z|Y)

Remarks:
1. Unlike MH, the Gibbs sampler needs no proposal PDF, so there is no rejection. The tradeoff is that we need to know how to sample from f(x_i | {x \ x_i}, Y).
2. In the case that we don't know how to directly sample from f(x_i | {x \ x_i}, Y), we can sample it using sample-importance resampling, MCMC, rejection sampling, etc. A popular choice is to use adaptive rejection sampling to sample from f(x_i | {x \ x_i}, Y) [3].
3. The advantages and disadvantages of the Gibbs sampler are similar to those of MH. The analysis is also the same.
4. The Gibbs sampler will not work well when some uncertain parameters are highly correlated conditional on the data. The consequence is that the MC samples move very slowly when sampling these highly correlated variables. There are variants of the Gibbs sampler that enjoy larger jumps in the MC samples even when such high correlation exists, e.g. Gibbs sampling with overrelaxation [4, 5].
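The Gibbs procedure above can be sketched for an illustrative two-group case (m = 2): a bivariate Gaussian with correlation rho, where both full conditionals are known Gaussians, so every draw is direct and nothing is ever rejected. The target choice is an assumption for demonstration, not from the notes.

```python
import numpy as np

rng = np.random.default_rng(5)

# Bivariate standard Gaussian with correlation rho.  The full conditionals are
#   f(x1 | x2) = N(rho * x2, 1 - rho^2),  f(x2 | x1) = N(rho * x1, 1 - rho^2),
# both directly sampleable, so the Gibbs sampler needs no accept/reject step.
rho = 0.8

def gibbs(n_steps, x0=(0.0, 0.0)):
    x1, x2 = x0
    chain = np.empty((n_steps, 2))
    for t in range(n_steps):
        x1 = rho * x2 + np.sqrt(1.0 - rho**2) * rng.normal()
        x2 = rho * x1 + np.sqrt(1.0 - rho**2) * rng.normal()
        chain[t] = (x1, x2)
    return chain

chain = gibbs(100_000)[1000:]               # drop burn-in
corr = np.corrcoef(chain.T)[0, 1]
print(f"sample correlation = {corr:.3f} (target rho = {rho})")
```

Pushing rho toward 1 in this example makes the chain crawl along the narrow diagonal ridge, which is exactly the slow-mixing pathology described in Remark 4.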
Hybrid Monte Carlo

Introduction: Both MH and the Gibbs sampler create MCs with local random walk behavior. However, this behavior is not desirable: with it, it may take a long time for the resulting MC to travel throughout the significant region of f(x|Y), especially when the dimension of X is high. Hybrid Monte Carlo (HMC) is an MCMC algorithm that does not use a local random walk. The basic idea is to add an auxiliary uncertain variable Z to the process, where the new target PDF is

  f(x, z) \propto h(x) \, e^{-\frac{1}{2} \sum_{i=1}^{n} z_i^2}

where f(x|Y) = a \cdot h(x) and n is the dimension of X. If we are able to sample from f(x, z), it is clear that the X part of the samples will be distributed as f(x|Y).

How do we sample from f(x, z)? We employ MCMC. The main trick in Hybrid Monte Carlo is to give h(x) and z physical meaning: -log h(x) is considered to be the potential energy of a ball with unit mass, where x is the location of the ball (think of -log h(x) as the profile of a valley), and z is the velocity of the ball. So the total energy of the ball is

  H(x, z) = -\log h(x) + \frac{1}{2} \sum_{i=1}^{n} z_i^2 = -\log f(x, z) + \mathrm{const}

If there is no friction when the ball rolls on the valley, the total energy H(x, z) is conserved, i.e. f(x, z) is conserved, and the ball rolls according to the following equations:

  \frac{dx_i}{dt} = \frac{\partial H}{\partial z_i} = z_i, \qquad \frac{dz_i}{dt} = -\frac{\partial H}{\partial x_i} = \frac{\partial \log h(x)}{\partial x_i}, \qquad i = 1, ..., n

Procedure:
1. Initialize X^0, Z^0.
2. Solve for x(t), z(t) according to the governing equations above,
with initial condition x(0) = X^0 and z(0) = Z^0. Evolve the solution for a randomized duration R. Let X^1 = x(R).
3. Resample Z^1 ~ e^{-\frac{1}{2} \sum_{i=1}^{n} z_i^2} (up to a constant), i.e. n independent standard Gaussians.
4. Cycle 2-3 to get {X^t : t = 0, ..., N}. These samples will be asymptotically distributed as f(x|Y) if the Markov chain is ergodic.

Analysis: same as MH.

Remarks:
1. The HMC algorithm indeed samples from f(x, z). To see this, let us view the end product of Step 2 in the algorithm as a candidate, i.e. X_C = x(R), Z_C = z(R) is the candidate of the MCMC algorithm. One can verify that

  r = \frac{f(X_C, Z_C)}{f(X^0, Z^0)} = 1

This is because the total energy H(x, z), and hence f(x, z), remains constant along the solution of the Hamiltonian equations. Therefore, the candidate (X_C, Z_C) is always accepted as the next MC sample.
2. The duration R in Step 2 is better randomized to avoid a periodic MC, although the chance of the MC being periodic is very small.
3. Step 3 is necessary, since without it f(x, z) would stay constant forever, and hence the MC would not explore the entire phase space.
4. In practice, Step 2 cannot be solved analytically and must be solved approximately. In doing so, the resulting r ratio will be only approximately equal to 1. To handle this issue, we can simply add an accept/reject step, i.e. change Step 2 to the following: solve x(t), z(t) approximately according to the following governing equations,
  \frac{dx_i}{dt} = z_i, \qquad \frac{dz_i}{dt} = \frac{\partial \log h(x)}{\partial x_i}, \qquad i = 1, ..., n

with initial condition x(0) = X^0 and z(0) = Z^0. Evolve the solution for a randomized duration R. Let X_C = x(R) and Z_C = z(R). Accept the candidate, i.e. take (X^1, Z^1) = (X_C, Z_C), with probability min(1, r), where

  r = \frac{f(X_C, Z_C)}{f(X^0, Z^0)}

If not accepted, repeat the previous sample, i.e. take (X^1, Z^1) = (X^0, Z^0). The approximate solution algorithm for the Hamiltonian equations cannot be arbitrary. In fact, the chosen algorithm must be reversible in time: if, starting from the initial condition, we obtain the candidate solution, a reversible algorithm must have the property that, starting from that candidate and solving the equations backward in time, we obtain the same initial condition. This is so because the resulting HMC algorithm must satisfy the so-called detailed balance (reversibility) condition. The following leapfrog finite difference algorithm is reversible and is accurate to 2nd order in the Taylor series expansion:

  z_i\!\left(t + \tfrac{\Delta t}{2}\right) = z_i(t) + \frac{\Delta t}{2} \left. \frac{\partial \log h(x)}{\partial x_i} \right|_{x = x(t)}

  x_i(t + \Delta t) = x_i(t) + \Delta t \, z_i\!\left(t + \tfrac{\Delta t}{2}\right)

  z_i(t + \Delta t) = z_i\!\left(t + \tfrac{\Delta t}{2}\right) + \frac{\Delta t}{2} \left. \frac{\partial \log h(x)}{\partial x_i} \right|_{x = x(t + \Delta t)}

5. One can see that HMC does not do a local random walk, and it is possible for HMC to make large jumps. Of course, the tradeoff is that HMC requires solving the Hamiltonian equations, which may require much computation. According to the experience of other researchers, it seems that the cost is usually worthwhile, compared to MH and the Gibbs sampler.
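The HMC procedure with the leapfrog integrator and the correcting accept/reject step of Remark 4 can be sketched as follows. The target is an illustrative assumption: a 1-D unnormalized h(x) = exp(-(x-3)^2/2), so grad log h(x) = -(x - 3); the step size and maximum trajectory length are also arbitrary demonstration choices.

```python
import numpy as np

rng = np.random.default_rng(6)

# Unnormalized 1-D target h(x) = exp(-(x-3)^2/2), so log h(x) = -(x-3)^2/2.
def log_h(x):
    return -0.5 * (x - 3.0)**2

def grad_log_h(x):
    return -(x - 3.0)

def hmc(n_samples, x0=0.0, dt=0.1, max_steps=20):
    x = x0
    chain = np.empty(n_samples)
    for t in range(n_samples):
        z = rng.normal()                     # Step 3: resample velocity Z ~ N(0,1)
        x_c, z_c = x, z
        L = rng.integers(1, max_steps + 1)   # randomized duration R = L * dt
        for _ in range(L):                   # Step 2: leapfrog integration
            z_c += 0.5 * dt * grad_log_h(x_c)
            x_c += dt * z_c
            z_c += 0.5 * dt * grad_log_h(x_c)
        # accept/reject corrects the discretization error: r = f(Xc,Zc)/f(X0,Z0)
        log_r = (log_h(x_c) - 0.5 * z_c**2) - (log_h(x) - 0.5 * z**2)
        if np.log(rng.uniform()) < log_r:
            x = x_c
        chain[t] = x
    return chain

chain = hmc(50_000)[500:]                    # drop burn-in
print(f"mean = {chain.mean():.3f}, std = {chain.std():.3f}")
```

Because the leapfrog scheme nearly conserves H(x, z), the acceptance probability stays close to 1 even though each candidate lies many proposal widths away from the current state, which is exactly the non-random-walk behavior described in Remark 5.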
References:
[1] Gilks, W. R. (1992). Derivative-free adaptive rejection sampling for Gibbs sampling. In Bayesian Statistics 4, eds. Bernardo, J. M., Berger, J. O., Dawid, A. P., and Smith, A. F. M. Oxford University Press.
[2] Gilks, W. R., Best, N. G., and Tan, K. K. C. (1995). Adaptive rejection Metropolis sampling. Applied Statistics, 44, 455-472.
[3] Gilks, W. R., and Wild, P. (1992). Adaptive rejection sampling for Gibbs sampling. Applied Statistics, 41, 337-348.
[4] Adler, S. L. (1981). Over-relaxation method for the Monte Carlo evaluation of the partition function for multiquadratic actions. Physical Review D - Particles and Fields, 23, 2901-2904.
[5] Neal, R. M. (1995). Suppressing random walks in Markov chain Monte Carlo using ordered overrelaxation. Technical Report 9508, Dept. of Statistics, University of Toronto.

In general: MacKay, D. J. C. "Introduction to Monte Carlo methods."