Target tracking example

Filtering: [X_t | Y_{1:t}] (main interest)
Smoothing: [X_{1:t} | Y_{1:t}] (also given with SIS)

However, as we have seen, the estimate of this distribution breaks down when t gets large due to the weights becoming degenerate (if we don't resample). If we resample, most of the values sampled for X_1 will disappear when t gets large (related to the weight breakdown). So SIS isn't useful for all problems.

Gibbs sampling

A special case of Markov Chain Monte Carlo (MCMC). Instead of generating independent samples, it generates dependent samples via a Markov chain:

    X^1 → X^2 → X^3 → ...

Useful for a wide range of problems.
Popular for Bayesian analyses, but it is a general sampling procedure. For example, it can be used to do smoothing in the target tracking example.

Similar to SIS in that the random variable X is decomposed into X = {X_1, X_2, ..., X_k} and each piece is simulated separately. However, the conditioning structure is different. When sampling X_j, it is drawn conditional on all other components of X.

Gibbs sampler

A) Starting value: X^0 = {X_1^0, X_2^0, ..., X_k^0}, picked by some mechanism.

B) Sample X^t = {X_1^t, X_2^t, ..., X_k^t} by

1) X_1^t ~ [X_1 | X_2^{t-1}, X_3^{t-1}, ..., X_k^{t-1}]
2) X_2^t ~ [X_2 | X_1^t, X_3^{t-1}, ..., X_k^{t-1}]
    ⋮
j) X_j^t ~ [X_j | X_1^t, ..., X_{j-1}^t, X_{j+1}^{t-1}, ..., X_k^{t-1}]
    ⋮
k) X_k^t ~ [X_k | X_1^t, ..., X_{k-1}^t]

Under certain regularity conditions, the realizations X^1, X^2, X^3, ... form a Markov chain with stationary distribution [X]. Thus the realizations can be treated as dependent samples from the desired distribution.

Example: (Nuclear pump failure)
Gaver & O'Muircheartaigh (Technometrics, 1987)
Gelfand & Smith (JASA, 1990)

Observed 10 nuclear reactor pumps and counted the number of failures for each pump.
Pump   Failures (s_i)   Obs Time (t_i)   Obs Rate (λ̂_i)
1      5                94.320           0.053
2      1                15.720           0.064
3      5                62.880           0.080
4      14               125.760          0.111
5      3                5.240            0.573
6      19               31.440           0.604
7      1                1.048            0.954
8      1                1.048            0.954
9      4                2.096            1.910
10     22               10.480           2.099

(Obs Time in 1000s of hours; Obs Rate = Failures / Time)
Want to determine the true failure rate for each pump with the following hierarchical model:

    s_i ~ Poisson(λ_i t_i)
    λ_i ~ Gamma(α, β)
    β ~ IGamma(γ, δ)

Note: β ~ IGamma(γ, δ) is equivalent to 1/β ~ Gamma(γ, 1/δ), with prior density

    π(β) = δ^γ e^{-δ/β} / (Γ(γ) β^{γ+1})

Want to determine

1) [λ_i | S] for each pump i = 1, ..., 10
2) [β | S], where S = {s_1, ..., s_10}
Note that both sets of these distributions are hard to get analytically. Can show that

    p(β | S) ∝ (e^{-δ/β} / β^{10α+γ+1}) ∏_{i=1}^{10} Γ(α + s_i) / (t_i + 1/β)^{α+s_i}

where S = {s_1, ..., s_10}. Note that the λ_i are correlated (given S), and trying to get the marginal for each looks to be intractable analytically.

Run a Gibbs sampler to determine [λ, β | S]. From this sampler we can get the desired distributions [λ_i | S] and [β | S].

A possible Gibbs scheme

Step 1)  Sample λ_1 ~ [λ_1 | λ_{(1)}, β, S]
    ⋮
Step 10) Sample λ_10 ~ [λ_10 | λ_{(10)}, β, S]
Step 11) Sample β ~ [β | λ, S]

where λ_{(j)} = {λ_1, ..., λ_{j-1}, λ_{j+1}, ..., λ_10}.
Need the following conditional distributions:

    [λ_j | λ_{(j)}, β, S] = [λ_j | β, s_j] = Gamma(α + s_j, 1/(t_j + 1/β))

    [β | λ, S] = [β | λ] = IGamma(γ + 10α, δ + ∑_{i=1}^{10} λ_i)

These can be gotten from the joint distribution by including only the terms in the product that contain the random variable of interest:

    [λ, β, S] = { ∏_{i=1}^{10} [(λ_i t_i)^{s_i} e^{-λ_i t_i} / s_i!] [λ_i^{α-1} e^{-λ_i/β} / (Γ(α) β^α)] } × δ^γ e^{-δ/β} / (Γ(γ) β^{γ+1})

e.g. for λ_j: which terms above have a λ_j in them?
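Since both full conditionals are standard distributions, the whole sampler is a few lines of NumPy. The sketch below uses the data from the table and the parameter values of the example run (α = 1.8, γ = 0.1, δ = 1); the function name and starting rule are ours, and the inverse gamma draw is taken as the reciprocal of a gamma draw:

```python
import numpy as np

# Pump data: failures s_i and observation times t_i (1000s of hours)
s = np.array([5, 1, 5, 14, 3, 19, 1, 1, 4, 22])
t = np.array([94.320, 15.720, 62.880, 125.760, 5.240,
              31.440, 1.048, 1.048, 2.096, 10.480])

def gibbs_pump(alpha=1.8, gamma=0.1, delta=1.0, n_iter=1000, seed=0):
    """Gibbs sampler for the pump model:
       lambda_j | beta, s_j ~ Gamma(alpha + s_j, scale = 1/(t_j + 1/beta))
       beta | lambda        ~ IGamma(gamma + 10*alpha, delta + sum(lambda))"""
    rng = np.random.default_rng(seed)
    beta = np.mean(s / t)            # start at the mean observed rate
    lam_draws = np.empty((n_iter, 10))
    beta_draws = np.empty(n_iter)
    for it in range(n_iter):
        # Steps 1-10: conditionally independent gamma draws for the rates
        lam = rng.gamma(alpha + s, 1.0 / (t + 1.0 / beta))
        # Step 11: IGamma(a, b) draw via the reciprocal of a Gamma(a, 1/b) draw
        beta = 1.0 / rng.gamma(gamma + 10 * alpha, 1.0 / (delta + lam.sum()))
        lam_draws[it] = lam
        beta_draws[it] = beta
    return lam_draws, beta_draws

lam_draws, beta_draws = gibbs_pump()
print(lam_draws.mean(axis=0))    # posterior means of the ten failure rates
print(beta_draws.mean())         # posterior mean of beta
```

With these settings, the sample means of the draws land near the posterior summaries tabulated later in the notes.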
Equivalently, you can do this by looking at the graph structure of the model, including only the terms that correspond to edges joining the node of interest. e.g. for β: which edges connect with the node for β?

[Figure: directed graph of the model, with β pointing to each λ_i and each λ_i pointing to its s_i.]

Example Run:

    α = 1.8,  δ = 1,  γ = 0.1,  n = 1000,  β^0 = λ̄ (the mean observed rate)
[Figure: estimated posterior densities from the Gibbs output for λ_1, λ_2, λ_7, λ_8, λ_10, and β.]
Pump   Mean     Median   Std Dev
1      0.0702   0.0668   0.0268
2      0.1542   0.1363   0.0925
3      0.1039   0.0988   0.0399
4      0.1233   0.1206   0.0310
5      0.6263   0.5805   0.2924
6      0.6136   0.6040   0.1351
7      0.8241   0.7102   0.5267
8      0.8268   0.7129   0.5309
9      1.2949   1.2040   0.5776
10     1.8404   1.8121   0.3903

       Mean     Median   Std Dev
Beta   0.4372   0.4161   0.1315
[Figure: scatterplot of β^{i+1} vs β^i.]

    Cor(β^i, β^{i+1}) = 0.302
[Figure: scatterplots of λ^{i+1} vs λ^i for Pumps 1 and 9.]

    Cor(λ_1^i, λ_1^{i+1}) = 0.012
    Cor(λ_9^i, λ_9^{i+1}) = 0.091
[Figure: scatterplots of λ^{i+1} vs λ^i for Pumps 7 and 8.]

    Cor(λ_7^i, λ_7^{i+1}) = 0.063
    Cor(λ_8^i, λ_8^{i+1}) = 0.142
Target tracking with the Gibbs sampler

As mentioned last time, the smoothing problem, [X_{1:k} | Y_{1:k}], isn't solved very well with SIS. However, it can be done very easily with Gibbs sampling.

Step j, j = 1, ..., k-1:  Draw X_j ~ [X_j | X_{j-1}, X_{j+1}, Y_j]
Step k:                   Draw X_k ~ [X_k | X_{k-1}, Y_k]

As all the components involved in these conditional distributions are normal, each of these conditional distributions is normal and thus easily sampled.

In the SIS analysis, it was assumed that all of the parameters of the movement and measurement error distributions (all variances) and the starting point were known. This can easily be relaxed by putting priors on X_0, Λ, and Σ and sampling them as well as part of the Markov chain.
[Figure: directed graph with Λ pointing into the state chain X_0 → X_1 → X_2 → X_3, each X_j pointing to its observation Z_j, and Σ pointing into the Z_j.]

The sampler needs to be modified as

Step 0:                   Draw X_0 ~ [X_0 | X_1, Λ]
Step j, j = 1, ..., k-1:  Draw X_j ~ [X_j | X_{j-1}, X_{j+1}, Y_j, Λ]
Step k:                   Draw X_k ~ [X_k | X_{k-1}, Y_k, Λ]
Step k+1: Draw Λ ~ [Λ | X_{0:k}]
Step k+2: Draw Σ ~ [Σ | X_{0:k}, Y_{1:k}]

This can be performed by Gibbs sampling if the prior on X_0 is Normal and the priors on Λ and Σ are IGamma.

Conditions for Gibbs Sampling to work

While you can always run the chain, it may not give the answer you want. That is, the realizations may not have the desired stationary distribution.

One-step transitions: p(x | y)
n-step transitions: p_n(x | y)
Stationary distribution: π(x) = lim_{n→∞} p_n(x | y)
If it exists, it satisfies

    π(x) = ∫ p(x | y) π(y) dy

A stronger condition, which shows that π(x) is the density of the stationary distribution, is

    π(x) p(y | x) = π(y) p(x | y)   for all x & y   (detailed balance)

Note that detailed balance implies stationarity, but stationarity doesn't imply detailed balance.

If the following two conditions hold, the chain will have the desired stationary distribution.

Irreducibility: The chain generated must be irreducible. That is, it must be possible to get from each state to every other state in a finite number of steps.
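The stationarity identity π(x) = ∫ p(x | y) π(y) dy can be checked numerically for a small discrete chain, where the integral becomes the matrix identity π = πP. The 3-state transition matrix below is made up purely for illustration:

```python
import numpy as np

# Illustrative 3-state transition matrix; row y holds p(. | y), rows sum to 1
P = np.array([[0.5, 0.3, 0.2],
              [0.2, 0.6, 0.2],
              [0.3, 0.3, 0.4]])

# n-step transitions: for an irreducible aperiodic chain, every row of P^n
# converges to the stationary distribution as n grows
Pn = np.linalg.matrix_power(P, 50)
pi = Pn[0]                        # any row works once the chain has converged

# Stationarity: pi(x) = sum_y pi(y) p(x | y), i.e. pi = pi P
print(np.allclose(pi, pi @ P))    # True
```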
Not all problems lead to irreducible chains.

Example: ABO blood types

The children's data (one child with blood type AB, one with blood type O) imply that the child with blood type AB must have genotype AB and that the child with blood type O must have genotype OO. The only possible way for the two children to inherit those genotypes is for one parent to have genotype AO and the other parent genotype BO. However, it is not possible to say which parent has which genotype with certainty. By a simple symmetry argument,

    P[Dad = AO & Mom = BO] = P[Dad = BO & Mom = AO] = 0.5
Let's try running a Gibbs sampler, first generating Mom's genotype given Dad's and then Dad's given Mom's. Start the chain with Dad = AO.

Step 1: Generate Mom
    P[Mom = AO | Dad = AO] = 0
    P[Mom = BO | Dad = AO] = 1
so Mom = BO.

Step 2: Generate Dad
    P[Dad = AO | Mom = BO] = 1
    P[Dad = BO | Mom = BO] = 0
so Dad = AO.

This implies that every realization of the chain has Mom = BO & Dad = AO. If the chain is started with Dad = BO, every realization of that chain will have Mom = AO & Dad = BO.
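The stuck chain is easy to simulate. Since the full conditionals here are degenerate (probability 0 or 1), no randomness is actually needed; the sketch below hard-codes them, and the function name is ours:

```python
def gibbs_genotypes(start_dad, n_iter=100):
    """Gibbs sampler for the parents' genotypes. The conditionals are
    deterministic, so the chain can never leave its starting configuration."""
    dad = start_dad
    states = []
    for _ in range(n_iter):
        mom = "BO" if dad == "AO" else "AO"   # P[Mom = BO | Dad = AO] = 1, etc.
        dad = "AO" if mom == "BO" else "BO"   # P[Dad = AO | Mom = BO] = 1, etc.
        states.append((dad, mom))
    return states

# Two starting points give two chains that each visit only one of the two
# equally likely configurations -- the chain is reducible:
print(set(gibbs_genotypes("AO")))   # {('AO', 'BO')}
print(set(gibbs_genotypes("BO")))   # {('BO', 'AO')}
```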
The reducible chain in this case does not have the correct stationary distribution. (Reducible chains don't really have a unique stationary distribution anyway.) Running the described Gibbs sampler will not correctly describe the distribution of the mother's and father's genotypes.

Aperiodicity: Don't want a periodic chain (e.g. one where certain states can only occur when t is even). This violates the idea that each state has a long-run frequency marginally.

Starting Points

For every chain you need to specify a starting point. There are a number of approaches for choosing this.

1) Prior means
   In the pump example, set β^0 = δ/γ, a prior-based value for β.
2) Estimate from data
   In the pump example, E[λ_i] = αβ, so set β^0 = λ̄/α.
   In the target tracking example, set the starting positions at each time to the average observed positions, and use the differences of these to get the velocities.

3) Sample from the prior

4) Ad hoc choices
   In the pump example, set β^0 = ∞ or β^0 = 0.

For many problems, this choice can be important. The stationary distribution is an asymptotic property, and it may take a long time for the chain to converge.
[Figure: trace plots of β over the first 50 imputations under three starting values: β^0 = λ̄/α, β^0 = ∞, and β^0 = 0.]
Starting with β^0 = ∞ (actually 10^100), the initial draws are not consistent with the stationary distribution seen later in the chain. While for this example the problem clears up quickly, for other problems it can take a while. This is more common with larger problems, which might have millions, or maybe billions, of variables being sampled in a complete single scan through the data. This can occur with large space-time problems, such as the Tropical Pacific sea surface temperature predictions discussed at <http://www.stat.ohio-state.edu/~sses/collab_enso.php>.
[Figures: forecast map for December 2002 based on data from January 1970 to May 2002, and the observed December 2002 map.]

The usual approach is to have a burn-in period where the initial samples are thrown away, since they may not be representative of samples from the stationary distribution.
The following table contains estimates of the posterior means of the 11 parameters in the pump example with 3 different starting points. The first 200 imputations were discarded and then the next 1000 imputations were sampled.

Pump   β^0 = λ̄/α   β^0 = ∞    β^0 = 0
1      0.0688       0.0704     0.0715
2      0.1531       0.1531     0.1575
3      0.1064       0.1024     0.1050
4      0.1234       0.1236     0.1221
5      0.6008       0.6198     0.6319
6      0.6116       0.6145     0.6163
7      0.7744       0.8501     0.8118
8      0.8173       0.8224     0.8190
9      1.2584       1.2748     1.2857
10     1.8393       1.8536     1.8409
β      0.4256       0.4358     0.4334
Often the bigger the problem, the longer the burn-in period desired. However, those are exactly the problems where time considerations will limit the total number of imputations that can be done. So you do want to think about starting values for your chain.

Gibbs sampling and Bayes: Choice of priors

For Gibbs sampling to be efficient, the draws in each step of the procedure need to be feasible. That suggests that conjugate distributions be used as part of the hierarchical model, as was done in the pump and target tracking examples.

However, conjugacy is not strictly required, as rejection sampling with log-concave distributions might be usable in some problems. This idea is sometimes used in the software package WinBUGS (Bayesian analysis Using Gibbs Sampling).
However, for some problems the model you want to analyze is not conjugate and the tricks to get around non-conjugacy won't work. For example, let's change the model for the pump example to

    s_i ~ Poisson(λ_i t_i)
    λ_i ~ LogN(μ, σ²)
    μ ~ Logistic(ν, τ)
    σ² ~ Weibull(α, β)

Good luck running a Gibbs sampler on this model (I think). Other sampling techniques are needed for this and other more complicated problems.
Metropolis-Hastings Algorithm (M-H)

A general approach for constructing a Markov chain that has the desired stationary distribution π = (π_j).

1) Proposal distribution: Assume that X^t = i. Propose a new state j with distribution q_i = (q_{ij}).

2) Calculate the Hastings ratio and acceptance probability

    a_{ij} = min( (π_j q_{ji}) / (π_i q_{ij}), 1 )

3) Acceptance/rejection step: Generate U ~ U(0, 1) and set

    X^{t+1} = j     if U ≤ a_{ij}
    X^{t+1} = X^t   otherwise
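The three steps translate directly into code. The sketch below is a generic M-H sampler on the log scale (to avoid underflow); the function names are ours, and the example target and proposal are illustrative, not from the notes:

```python
import numpy as np

def metropolis_hastings(log_pi, x0, proposal, log_q, n_iter, seed=0):
    """Generic M-H sampler. log_pi: unnormalized log target; proposal(rng, x)
    draws a candidate; log_q(y, x) = log q(y | x) (return 0 for a symmetric
    proposal, since the q terms then cancel)."""
    rng = np.random.default_rng(seed)
    x = x0
    draws = np.empty(n_iter)
    for it in range(n_iter):
        y = proposal(rng, x)
        # Hastings ratio on the log scale: pi(y) q(x|y) / (pi(x) q(y|x))
        log_a = log_pi(y) - log_pi(x) + log_q(x, y) - log_q(y, x)
        if np.log(rng.uniform()) < log_a:   # accept with probability min(HR, 1)
            x = y
        draws[it] = x
    return draws

# Example: random-walk Metropolis (symmetric proposal) targeting N(3, 1)
draws = metropolis_hastings(
    log_pi=lambda x: -0.5 * (x - 3.0) ** 2,
    x0=0.0,
    proposal=lambda rng, x: x + rng.normal(scale=1.0),
    log_q=lambda y, x: 0.0,
    n_iter=20000)
print(draws[5000:].mean())   # roughly 3 after discarding a burn-in
```

Note that `log_pi` only needs the target up to a constant, exactly as in point 3) of the notes that follow.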
Notes:

1) Gibbs sampling is a special case of M-H, as for each step

       (π_j q_{ji}) / (π_i q_{ij}) = 1

   which implies the relationship also holds for a complete scan through all the variables.

2) The Metropolis algorithm (Metropolis et al., 1953) was based on a symmetric proposal distribution (q_{ij} = q_{ji}), giving

       a_{ij} = min(π_j / π_i, 1)

   So a higher probability state will always be accepted.

3) As with many other sampling procedures, π and q only need to be known up to normalizing constants, as these cancel when calculating the Hastings ratio.
4) Periodicity usually isn't a problem. For many proposals, q_{ii} > 0 for all i. Also, if a_{ij} < 1, then P[X^{t+1} = i | X^t = i] > 0, so some states have period 1, which implies the chain is aperiodic.

5) q_{ij} a_{ij} gives the 1-step transition probabilities of the chain for j ≠ i (i.e. its p(x | y) in the earlier notation).

6) Detailed balance is easy. Without loss of generality, assume that

       (π_j q_{ji}) / (π_i q_{ij}) < 1

   (which implies a_{ij} < 1 and a_{ji} = 1). Then

       π_i q_{ij} a_{ij} = π_i q_{ij} × (π_j q_{ji}) / (π_i q_{ij}) = π_j q_{ji} = π_j q_{ji} a_{ji}
7) The big problem is irreducibility. However, choosing a proposal that corresponds to an irreducible chain solves this.

Proposal distribution ideas:

1) Approximate the distribution. For example, use a normal with similar mean and variance, or a t with a moderate number of degrees of freedom.

2) Random walk: q(y | x) = q(y - x)
   For a continuous state process, you could use
       y = x + ε;  ε ~ q
   For a discrete process, you could use
       q(j | i) = 0.4 if j = i - 1
                  0.2 if j = i
                  0.4 if j = i + 1
3) Autoregressive chain:
       y = a + B(x - a) + z;  z ~ q
   For the random walk and autoregressive chains, q does not need to correspond to a symmetric distribution (though that is common).

4) Independence sampler: q(y | x) = q(y)
   For an independence sampler you want q to be similar to π:

       a_{ij} = min( (π_j q_i) / (π_i q_j), 1 )

   If they are too different, q_i / π_i could get very small, making it difficult to move from state i. (The chain mixes slowly.)
5) Block at a time
   Deal with variables in blocks like the Gibbs sampler. Sometimes referred to as Metropolis within Gibbs.
   Allows complex problems to be broken down into simpler ones. Any M-H style update can be used within each block (e.g. random walk for one block, independence sampler for the next, Gibbs for the one after that). Allows for a Gibbs-style sampler, but without the worry about needing conjugate distributions in the model to make sampling easy.

Pump Example:

    s_i ~ Poisson(λ_i t_i)
    λ_i ~ LogN(μ, σ²)
    μ ~ N(ν, τ²)
    σ² ~ IGamma(γ, δ)
Can perform Gibbs on μ and σ², but not on the λ_i, due to the non-conjugacy of the Poisson and log-normal distributions.

Step i, i = 1, ..., 10 (M-H):
Sample λ_i from [λ_i | s_i, μ, σ²] with proposal

    λ_i* ~ logN(log λ_i, θ²)   (multiplicative random walk)

The Hastings ratio is

    HR = { (λ_i* t_i)^{s_i} e^{-λ_i* t_i} × (1/(λ_i* σ)) φ((log λ_i* - μ)/σ) } / { (λ_i t_i)^{s_i} e^{-λ_i t_i} × (1/(λ_i σ)) φ((log λ_i - μ)/σ) }
         × { (1/(λ_i θ)) φ((log λ_i - log λ_i*)/θ) } / { (1/(λ_i* θ)) φ((log λ_i* - log λ_i)/θ) }

    a_{ij} = min(HR, 1)
Step 11 (Gibbs):
Sample μ from [μ | λ, σ², ν, τ²] ~ N(mean, var), where

    mean = var × ( ∑_i log λ_i / σ² + ν/τ² )
    var = ( n/σ² + 1/τ² )^{-1}

Step 12 (Gibbs):
Sample σ² from

    [σ² | λ, μ, γ, δ] ~ IGamma( γ + 5, δ + ½ ∑_i (log λ_i - μ)² )
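The twelve steps can be sketched in NumPy. This is our sketch, not the course code; the hyperparameter defaults are the run parameters given in the notes (ν = -50, τ² = 100, γ = 1, δ = 100, θ² = 0.01), the starting values follow the rules given there, and each λ_i is accepted or rejected independently, as discussed under "Other options". Note that the 1/λ Jacobian factors of the log-normal prior and of the log-normal proposal cancel exactly in the Hastings ratio, which simplifies the acceptance computation:

```python
import numpy as np

# Pump data: failures s_i and observation times t_i (1000s of hours)
s = np.array([5, 1, 5, 14, 3, 19, 1, 1, 4, 22])
t = np.array([94.320, 15.720, 62.880, 125.760, 5.240,
              31.440, 1.048, 1.048, 2.096, 10.480])

def mwg_pump(nu=-50.0, tau2=100.0, gamma=1.0, delta=100.0, theta2=0.01,
             n_iter=20_000, burn=2_000, seed=0):
    """Metropolis-within-Gibbs for the Poisson/log-normal pump model."""
    rng = np.random.default_rng(seed)
    lam = s / t                              # lambda_i^0 = observed rates
    mu = np.log(lam).mean()                  # mu^0 = mean log rate
    sigma2 = np.log(lam).var(ddof=1)         # (sigma^2)^0 = sample variance
    keep_lam, keep_mu, keep_s2 = [], [], []
    for it in range(n_iter):
        # Steps 1-10 (M-H): multiplicative random walk lam* ~ logN(log lam, theta^2)
        lam_star = lam * np.exp(np.sqrt(theta2) * rng.normal(size=10))
        log_hr = (s * np.log(lam_star / lam) - t * (lam_star - lam)   # Poisson
                  - (np.log(lam_star) - mu) ** 2 / (2 * sigma2)       # logN prior
                  + (np.log(lam) - mu) ** 2 / (2 * sigma2))
        accept = np.log(rng.uniform(size=10)) < log_hr
        lam = np.where(accept, lam_star, lam)
        # Step 11 (Gibbs): mu | lam, sigma^2 is normal
        var = 1.0 / (10 / sigma2 + 1 / tau2)
        mu = rng.normal(var * (np.log(lam).sum() / sigma2 + nu / tau2),
                        np.sqrt(var))
        # Step 12 (Gibbs): sigma^2 | lam, mu ~ IGamma(gamma + 5, delta + SS/2)
        ss = ((np.log(lam) - mu) ** 2).sum()
        sigma2 = 1.0 / rng.gamma(gamma + 5, 1.0 / (delta + 0.5 * ss))
        if it >= burn:
            keep_lam.append(lam); keep_mu.append(mu); keep_s2.append(sigma2)
    return np.array(keep_lam), np.array(keep_mu), np.array(keep_s2)

lam_d, mu_d, s2_d = mwg_pump()
print(lam_d.mean(axis=0), mu_d.mean(), s2_d.mean())
```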
Parameters for the run:

    Burn-in: 1000
    Imputations: 100,000
    ν = -50,  τ² = 100,  γ = 1,  δ = 100,  θ² = 0.01

Starting values:

    λ_i^0 = λ̂_i
    μ^0 = (1/10) ∑_i log λ̂_i
    (σ²)^0 = (1/9) ∑_i (log λ̂_i - μ^0)²
Other options

1) Combine steps 1-10 into a single draw. With this option, all the λ_i change or none do. In the sampler used, whether each λ_i changes is independent of the others. The option used is probably preferable, as it should lead to better mixing of the chain.

2) Combine sampling λ, μ, and σ² into a single M-H step. Probably suboptimal, as the proposal distribution won't be a great match for the joint posterior distribution of λ, μ, and σ².
Rejection rates

Having some rejection can be good. With the multiplicative random walk sampler used, if θ² is too small, there will be very few rejections, but the sampler will move too slowly through the space.

Increasing θ² will lead to better mixing, as bigger jumps can be made, though it will lead to higher rejection rates. You need to find a balance between rejection rates, mixing of the chain, and coverage of the state space. For some problems, a rejection rate of 50% is fine, and I've seen reports that for large problems using normal random walk proposals, rejection rates around 75% are optimal.
Rejection rates for the failure rate proposals under different random walk variances θ²:

Pump   0.000001   0.0001    0.01      0.04
1      0.00012    0.00613   0.07045   0.13776
2      0.00009    0.00531   0.03141   0.06130
3      0.00034    0.00784   0.07107   0.13754
4      0.00043    0.01126   0.11705   0.22482
5      0.00028    0.00691   0.05521   0.10705
6      0.00126    0.01442   0.13511   0.26028
7      0.00012    0.00148   0.03027   0.05735
8      0.00007    0.00414   0.02854   0.05824
9      0.00024    0.00559   0.06105   0.12131
10     0.00070    0.01461   0.14790   0.27735
[Figure: trace plots of λ_1 over 100,000 iterations for θ² = 0.000001, 0.0001, 0.01, and 0.04.]
Standard errors in MCMC

As discussed before, the correlation of the chain must be taken into account when determining standard errors of quantities estimated by the sampler. Suppose we use x̄ to estimate E[x] and that the burn-in period was long enough to get into the stationary distribution. Then

    Var(x̄) = σ²/n + (2σ²/n²) ∑_{j=1}^{n-1} (n - j) ρ_j

For a reasonable chain, the autocorrelations will die off, so let's assume they are negligible for j > K. Then the above reduces to

    Var(x̄) ≈ σ²/n + (2σ²/n²) ∑_{j=1}^{K} (n - j) ρ_j

If the autocorrelations die off fairly quickly, σ² and the ρ_j can be estimated consistently (though with some bias) by the usual empirical moments.
Another approach is blocking. Assume that n = Jm for integers J and m. Then let

    x̄_j = (1/m) ∑_{i=(j-1)m+1}^{jm} x_i,   j = 1, ..., J

Note that the average of the x̄_j equals x̄. If m is large relative to K, the correlations between the x̄_j should be negligible and the variance can be estimated as if the x̄_j were independent.

If the correlation is slightly larger, it might be reasonable to assume that the correlation between x̄_j and x̄_{j+1} is some value ρ to be determined, but that the correlations at larger lags are negligible. In this case

    Var(x̄) ≈ (1 + 2ρ) Var(x̄_j) / J
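The blocking (batch means) estimator, with the lag-1 correction, can be sketched as below. The function name is ours, and an AR(1) series with known autocorrelation stands in for MCMC output so the effect of ignoring correlation is visible:

```python
import numpy as np

def batch_means_se(x, m):
    """Batch-means standard error for the mean of a correlated chain.
    Splits x (using n = J*m values) into J batches of size m, treats the batch
    means as approximately independent, and also returns the (1 + 2*rho)
    lag-1-corrected SE from the notes."""
    J = len(x) // m
    bm = x[:J * m].reshape(J, m).mean(axis=1)   # batch means xbar_j
    var_hat = bm.var(ddof=1) / J                # as if batch means independent
    rho = np.corrcoef(bm[:-1], bm[1:])[0, 1]    # lag-1 correlation of batch means
    return np.sqrt(var_hat), np.sqrt(var_hat * (1 + 2 * rho)), rho

# AR(1) chain with autocorrelation 0.9 as a stand-in for sampler output
rng = np.random.default_rng(0)
x = np.empty(100_000)
x[0] = 0.0
for i in range(1, len(x)):
    x[i] = 0.9 * x[i - 1] + rng.normal()

se_iid = x.std(ddof=1) / np.sqrt(len(x))        # wrong: ignores correlation
se_bm, se_bm_adj, rho = batch_means_se(x, m=1000)
print(se_iid, se_bm)    # the batch-means SE is several times the iid SE
```

For an AR(1) chain with coefficient 0.9, the true variance of the mean is about 19 times the iid formula's value, and the batch-means estimate recovers roughly that inflation.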
Estimates with m = 100

Parameter    x̄          SE        ρ̂
λ_1          0.05290    0.00071    0.36116
λ_2          0.06926    0.00277    0.66197
λ_3          0.07837    0.00106    0.35354
λ_4          0.11053    0.00056    0.10520
λ_5          0.56167    0.01119    0.46975
λ_6          0.60546    0.00237    0.10960
λ_7          0.92318    0.04068    0.67346
λ_8          0.90361    0.03766    0.63510
λ_9          1.82900    0.02884    0.33629
λ_10         2.10188    0.00726    0.05263
μ           -2.52492    0.01384    0.41517
σ²          27.15958    0.09967    0.07579
Estimates with m = 1000

Parameter    x̄          SE        ρ̂
λ_1          0.05290    0.00075    0.13239
λ_2          0.06926    0.00399    0.18756
λ_3          0.07837    0.00088   -0.13079
λ_4          0.11053    0.00045   -0.15794
λ_5          0.56167    0.01205   -0.00838
λ_6          0.60546    0.00226   -0.07845
λ_7          0.92318    0.06081    0.12201
λ_8          0.90361    0.04822    0.04495
λ_9          1.82900    0.03303    0.07779
λ_10         2.10188    0.00757    0.06487
μ           -2.52492    0.01981    0.15224
σ²          27.15958    0.13956    0.29726
Standard error estimates for the pump example

Parameter   m = 1000   m = 100    Independent
λ_1         0.000752   0.000710   0.000075
λ_2         0.003992   0.002769   0.000205
λ_3         0.000885   0.001063   0.000111
λ_4         0.000446   0.000555   0.000094
λ_5         0.012051   0.011193   0.001009
λ_6         0.002258   0.002373   0.000439
λ_7         0.060813   0.040679   0.002970
λ_8         0.048219   0.037656   0.002807
λ_9         0.033030   0.028835   0.002945
λ_10        0.007568   0.007264   0.001428
μ           0.019808   0.013840   0.005729
σ²          0.139560   0.099674   0.056767