STA4502 (Monte Carlo Estimation) Lecture Notes, Jan/Feb 2013


by Jeffrey S. Rosenthal, University of Toronto
(Last updated: February 5, 2013.)

Note: I will update these notes regularly (on the course web page). However, they are just rough, point-form notes, with no guarantee of completeness or accuracy. They should in no way be regarded as a substitute for attending the lectures, doing the homework exercises, or reading the reference books.

INTRODUCTION:

Introduction to course, handout, references, prerequisites, etc.

Course web page: probability.ca/sta4502

Six weeks only; counts for QUARTER-credit only. Bahen Centre room 1200, Wednesdays and Fridays 2.

If not a Stat Dept grad student, must REQUEST enrolment; need advanced undergraduate probability/statistics background, plus basic computer programming experience. Conversely, if you already know lots about MCMC etc., then this course might not be right for you, since it's an INTRODUCTION to these topics.

How many of you are stat grad students? undergrads? math? computer science? physics? economics? management? engineering? other? Auditing??

Theme of the course: use (pseudo)randomness on a computer to simulate, and hence estimate, important/interesting quantities.

Example: Suppose we want to estimate m := E[Z^4 cos(Z)], where Z ~ Normal(0,1).

Monte Carlo solution: replicate a large number z_1, ..., z_n of Normal(0,1) random variables, and let x_i = z_i^4 cos(z_i). Their mean, x̄ = (1/n) sum_{i=1}^n x_i, is an unbiased estimate of E[X] ≡ E[Z^4 cos(Z)].

R: Z = rnorm(100); X = Z^4 * cos(Z); mean(X) [file "RMC"]

unstable ... but if we replace 100 by a much larger number, then x̄ is close to the true value, about −1.21.
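(Aside: a minimal self-contained R sketch of this estimate; the actual course file "RMC" is not reproduced here, so the details are illustrative.)

    n <- 10^6                 # number of Monte Carlo replications
    Z <- rnorm(n)             # Z_1, ..., Z_n ~ Normal(0,1)
    X <- Z^4 * cos(Z)         # X_i = Z_i^4 cos(Z_i)
    mean(X)                   # unbiased estimate of m = E[Z^4 cos(Z)]; about -1.21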

Variability?? Well, can estimate the standard deviation of x̄ by the standard error of x̄, which is:

se = n^{−1/2} sd(x_1, ..., x_n) ≈ n^{−1/2} sqrt(var(x)) = n^{−1/2} sqrt( (1/(n−1)) sum_{i=1}^n (x_i − x̄)^2 ). [file "RMC"]

Then what is, say, a 95% confidence interval for m? Well, by the central limit theorem (CLT), for large n, x̄ ≈ N(m, v) ≈ N(m, se^2). (Strictly speaking, should use the t distribution, not the normal distribution ... but if n is large that doesn't really matter; ignore it for now.) So (m − x̄)/se ≈ N(0,1). So, P(−1.96 < (m − x̄)/se < 1.96) ≈ 0.95. So, P(x̄ − 1.96 se < m < x̄ + 1.96 se) ≈ 0.95, i.e. an approximate 95% confidence interval is (x̄ − 1.96 se, x̄ + 1.96 se). [file "RMC"]

Alternatively, could compute the expectation as ∫ z^4 cos(z) e^{−z^2/2} (2π)^{−1/2} dz. Analytic? Numerical? Better? Worse? [file "RMC": −1.213]

What about higher-dimensional versions? Can't do numerical integration!

Note: In this course we will just use R to automatically sample from simple distributions like Normal, Uniform, Exponential, etc. How does it work? Discussed in e.g. a Statistical Computing course.

What if the distribution is too complicated to sample from? MCMC! ... including Metropolis, Gibbs, tempered, trans-dimensional, ...
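(Aside: continuing the R sketch above (same Z, X, n), the standard error and approximate 95% confidence interval just derived; again an illustration, not the course's actual "RMC" file.)

    se <- sd(X) / sqrt(n)            # se = n^{-1/2} sd(x_1, ..., x_n)
    mean(X) + c(-1.96, 1.96) * se    # interval (xbar - 1.96 se, xbar + 1.96 se)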

MONTE CARLO INTEGRATION:

How to compute an integral with Monte Carlo? Re-write it as an expectation!

EXAMPLE: Want to compute ∫_0^1 ∫_0^1 g(x,y) dx dy. Regard this as E[g(X,Y)], where X, Y are i.i.d. Uniform[0,1]. e.g. g(x,y) = cos(sqrt(xy)). [file "RMCint"] Easy! Get about 0.88, with small standard error; Mathematica gives the same.

e.g. estimate I = ∫_0^5 ∫_0^4 g(x,y) dy dx, where g(x,y) = cos(sqrt(xy)). Here ∫_0^5 ∫_0^4 g(x,y) dy dx = 5 · 4 · ∫_0^5 ∫_0^4 g(x,y) (1/4) dy (1/5) dx = E[5 · 4 · g(X,Y)], where X ~ Uniform[0,5] and Y ~ Uniform[0,4]. So, let X_i ~ Uniform[0,5], and Y_i ~ Uniform[0,4] (all independent). Estimate I by (1/M) sum_{i=1}^M 5 · 4 · g(X_i, Y_i). Standard error: se = M^{−1/2} sd( 5·4·g(X_1,Y_1), ..., 5·4·g(X_M,Y_M) ). With M = 10^6, get about −4.1. [file "RMCint2"]

e.g. estimate ∫_0^1 ∫_0^∞ h(x,y) dy dx, where h(x,y) = e^{−y^2} cos(sqrt(xy)). Can't use Uniform expectations. Instead, write this as ∫_0^1 ∫_0^∞ e^{y} h(x,y) e^{−y} dy dx. This is the same as E[e^Y h(X,Y)], where X ~ Uniform[0,1] and Y ~ Exponential(1) are independent. So, estimate it by (1/M) sum_{i=1}^M e^{Y_i} h(X_i, Y_i), where X_i ~ Uniform[0,1] and Y_i ~ Exponential(1) (i.i.d.). With M = 10^6 get about 0.77, with very small standard error: very accurate! [file "RMCint3"] Check: numerical integration [Mathematica] gives the same.

Alternatively, could write this as ∫_0^1 ∫_0^∞ (1/5) e^{5y} h(x,y) · 5 e^{−5y} dy dx = E[(1/5) e^{5Y} h(X,Y)], where X ~ Uniform[0,1] and Y ~ Exponential(5) (indep.).
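(Aside: a sketch of this exponential importance recipe, with the rate λ left as a parameter so both the Exponential(1) and Exponential(5) versions can be tried; the helper name est is an illustrative choice, and the course files "RMCint3"/"RMCint4" are not reproduced.)

    # Estimate int_0^1 int_0^inf e^{-y^2} cos(sqrt(x*y)) dy dx,
    # written as E[(1/lambda) e^{lambda Y} h(X,Y)], X ~ U[0,1], Y ~ Exponential(lambda).
    est <- function(lambda, M = 10^6) {
        X <- runif(M)
        Y <- rexp(M, rate = lambda)
        vals <- exp(lambda * Y) / lambda * exp(-Y^2) * cos(sqrt(X * Y))
        c(estimate = mean(vals), se = sd(vals) / sqrt(M))
    }
    est(1)   # the Exponential(1) version
    est(5)   # the Exponential(5) version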

Then, estimate it by (1/M) sum_{i=1}^M (1/5) e^{5 y_i} h(x_i, y_i), where x_i ~ Uniform[0,1] and y_i ~ Exponential(5) (i.i.d.). With M = 10^6, get a larger standard error ... [file "RMCint4"] If we replace 5 by 1/5, get about the same. So which choice is best? Whichever one minimises the standard error! (λ = 1.5? se = ?)

[END WEDNESDAY #1]

In general, to evaluate I ≡ E_π[h(Y)] = ∫ h(y) π(y) dy, where Y has density π, could instead re-write this as I = ∫ h(x) (π(x)/f(x)) f(x) dx, where f is easily sampled from, with f(x) > 0 whenever π(x) > 0. Then I = E_f[ h(X) π(X)/f(X) ], where X has density f. ("Importance Sampling")

Can then do classical iid Monte Carlo integration, get standard errors etc. Good if it is easier to sample from f than from π, and/or if the function h(x) π(x)/f(x) is less variable than h itself. In general, best to make h(x) π(x)/f(x) approximately constant.

e.g. extreme case: if I = ∫_0^∞ e^{−3x} dx, then I = ∫_0^∞ (1/3) · 3 e^{−3x} dx = E[1/3] where X ~ Exponential(3), so I = 1/3 (error = 0; no MC needed).

UNNORMALISED DENSITIES:

Suppose now that π(y) = c g(y), where we know g but don't know c or π. ("Unnormalised density", e.g. Bayesian posterior.) Obviously, c = 1 / ∫ g(y) dy, but this might be hard to compute.

Still, I = ∫ h(x) π(x) dx = c ∫ h(x) g(x) dx = ∫ h(x) g(x) dx / ∫ g(x) dx.

If we sample {x_i} ~ f i.i.d., then ∫ h(x) g(x) dx = ∫ [h(x) g(x)/f(x)] f(x) dx = E[ h(X) g(X)/f(X) ] where X ~ f. So, ∫ h(x) g(x) dx ≈ (1/M) sum_{i=1}^M h(x_i) g(x_i)/f(x_i).

Similarly, ∫ g(x) dx ≈ (1/M) sum_{i=1}^M g(x_i)/f(x_i). So,

I ≈ [ sum_{i=1}^M h(x_i) g(x_i)/f(x_i) ] / [ sum_{i=1}^M g(x_i)/f(x_i) ]. ("Importance Sampling": weighted average)

Not unbiased, and standard errors are less clear, but still consistent. Good to use the same sample {x_i} for both numerator and denominator: easier computationally, and smaller variance.

Example: compute I ≡ E(Y^2) where Y has density c y^3 sin(y^4) cos(y^5) 1_{0<y<1}, where c > 0 is unknown (and hard to compute!). Here g(y) = y^3 sin(y^4) cos(y^5) 1_{0<y<1}, and h(y) = y^2.

Let f(y) = 6 y^5 1_{0<y<1}. [Fact (check!): if U ~ Uniform[0,1], then X = U^{1/6} ~ f.] Then

I ≈ [ sum h(x_i) g(x_i)/f(x_i) ] / [ sum g(x_i)/f(x_i) ] = [ sum sin(x_i^4) cos(x_i^5) ] / [ sum sin(x_i^4) cos(x_i^5) / x_i^2 ]. [file "Rimp"] ... get about 0.76.

Or, let f(y) = 4 y^3 1_{0<y<1}. [Then if U ~ Uniform[0,1], X = U^{1/4} ~ f.] Then

I ≈ [ sum h(x_i) g(x_i)/f(x_i) ] / [ sum g(x_i)/f(x_i) ] = [ sum sin(x_i^4) cos(x_i^5) x_i^2 ] / [ sum sin(x_i^4) cos(x_i^5) ]. [file "Rimp2"]

[END FRIDAY #1]

With importance sampling, is it important to use the same samples {x_i} in both numerator and denominator? What if independent samples are used instead? Let's try it! [file "Rimpind"] Both ways work, but usually the same samples work better.

What other methods are available to iid sample from π?

REJECTION SAMPLER:

Assume π(x) = c g(x), with π and c unknown, and g known but hard to sample from. Want to sample X ~ π. (Then if X_1, X_2, ..., X_M ~ π iid, can estimate E_π(h) by (1/M) sum_{i=1}^M h(X_i), etc.)

Find some other, easily-sampled density f, and known K > 0, such that K f(x) ≥ g(x) for all x. Sample X ~ f, and U ~ Uniform[0,1] (indep.). If U ≤ g(X) / (K f(X)), then "accept" X as a draw from π. Otherwise, "reject" X and start over again.

Now, P(U ≤ g(X)/(K f(X)) | X = x) = g(x)/(K f(x)), so conditional on accepting, we have that

P( X ≤ y | U ≤ g(X)/(K f(X)) ) = P( X ≤ y, U ≤ g(X)/(K f(X)) ) / P( U ≤ g(X)/(K f(X)) )
= ∫_{−∞}^y [g(x)/(K f(x))] f(x) dx / ∫_{−∞}^∞ [g(x)/(K f(x))] f(x) dx
= ∫_{−∞}^y g(x) dx / ∫_{−∞}^∞ g(x) dx = ∫_{−∞}^y π(x) dx.

So, conditional on accepting, X ~ π. Good! iid! However, the prob. of accepting may be very small; then get very few samples.

Example: π = N(0,1), i.e. g(x) = π(x) = (2π)^{−1/2} exp(−x^2/2). Want: E_π(X^4), i.e. h(x) = x^4. Let f be the double-exponential distribution, i.e. f(x) = (1/2) e^{−|x|}. If K = 8, then:

For |x| ≤ 2: K f(x) = 8 (1/2) exp(−|x|) ≥ 8 (1/2) exp(−2) ≥ (2π)^{−1/2} ≥ π(x) = g(x).
For |x| ≥ 2: K f(x) = 8 (1/2) exp(−|x|) ≥ 8 (1/2) exp(−x^2/2) ≥ (2π)^{−1/2} exp(−x^2/2) = π(x) = g(x).

So, can apply the rejection sampler with this f and K, to get samples, estimate of E[X], estimate of E[h(X)], estimate of P[X < 1], etc. [file "Rrej"]

For the Rejection Sampler, P(accept) = E[P(accept | X)] = E[ g(X)/(K f(X)) ] = ∫ [g(x)/(K f(x))] f(x) dx = (1/K) ∫ g(x) dx = 1/(cK). Only depends on K (and c), not f. So, in M attempts, get about M/(cK) iid samples.

"Rrej" example: c = 1, K = 8, M = 10,000, so get about M/8 = 1250 samples.

Since c is fixed, try to minimise K.
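(Aside: a sketch of this rejection sampler, in the spirit of the file "Rrej", which is not reproduced here.)

    M <- 10^4
    g <- function(x) exp(-x^2/2) / sqrt(2*pi)         # target N(0,1); normalised, so c = 1
    f <- function(x) exp(-abs(x)) / 2                 # double-exponential density
    K <- 8
    X <- rexp(M) * sample(c(-1, 1), M, replace=TRUE)  # draws from f: Exp(1) magnitude, random sign
    U <- runif(M)
    keep <- X[U <= g(X) / (K * f(X))]                 # accepted samples
    length(keep)                                      # about M/(cK) = 10000/8 = 1250
    mean(keep^4)                                      # estimate of E[X^4] (true value 3)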

Extreme case: f(x) = π(x), so g(x) = π(x)/c = f(x)/c, and can take K = 1/c, whence P(accept) = 1: iid sampling; optimal.

Note: these algorithms all work in the discrete case too. Can let π, f, etc. be probability functions, i.e. probability densities with respect to counting measure. Then the algorithms proceed exactly as before.

AUXILIARY VARIABLE APPROACH: (related: "slice sampler")

Suppose π(x) = c g(x), and (X, Y) is chosen uniformly under the graph of g, i.e. (X, Y) ~ Uniform{ (x,y) ∈ R^2 : 0 ≤ y ≤ g(x) }. Then X ~ π, i.e. we have sampled from π.

Why? For a < b, P(a < X < b) = (area with a < x < b) / (total area) = ∫_a^b g(x) dx / ∫ g(x) dx = ∫_a^b π(x) dx.

So, if we repeat, we get i.i.d. samples from π, and can estimate E_π(h) etc.

Auxiliary Variable rejection sampler: If the support of g is contained in [L, R], and g(x) ≤ K, then can first sample (X, Y) ~ Uniform( [L,R] × [0,K] ), then reject if Y > g(X), otherwise accept as a sample with (X, Y) ~ Uniform{ (x,y) : 0 ≤ y ≤ g(x) }, hence X ~ π.

Example: g(y) = y^3 sin(y^4) cos(y^5) 1_{0<y<1}. Then L = 0, R = 1, K = 1. So, sample X, Y ~ Uniform[0,1], then keep X iff Y ≤ g(X). If h(y) = y^2, could compute e.g. E_π(h) as the mean of the squares of the accepted samples. [file "Raux"]
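(Aside: a sketch of this auxiliary-variable sampler, in the spirit of the file "Raux", which is not reproduced here.)

    M <- 10^6
    g <- function(y) y^3 * sin(y^4) * cos(y^5)  # unnormalised density on (0,1); 0 <= g <= 1 there
    X <- runif(M); Y <- runif(M)                # (X,Y) ~ Uniform([0,1] x [0,1]); L=0, R=1, K=1
    keep <- X[Y <= g(X)]                        # accept iff Y <= g(X); then keep is iid ~ pi
    mean(keep^2)                                # estimate of E_pi(h), h(y) = y^2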

Can iid / importance / rejection / auxiliary sampling solve all problems? No! Many challenging cases arise, e.g. from Bayesian statistics (later). Some are high-dimensional, and the above algorithms fail. Alternative algorithm: MCMC!

MARKOV CHAIN MONTE CARLO (MCMC):

Suppose we have a complicated, high-dimensional density π = c g. Want samples X_1, X_2, ... ~ π. (Then can do Monte Carlo.) Define a Markov chain (random process) X_0, X_1, X_2, ..., so that for large n, X_n ≈ π.

METROPOLIS ALGORITHM (1953): Choose some initial value X_0 (perhaps random). Then, given X_{n−1}, choose a proposal move Y_n ~ MVN(X_{n−1}, σ^2 I) (say). Let A_n = π(Y_n)/π(X_{n−1}) = g(Y_n)/g(X_{n−1}), and U_n ~ Uniform[0,1]. Then, if U_n < A_n, set X_n = Y_n ("accept"); otherwise set X_n = X_{n−1} ("reject"). Repeat, for n = 1, 2, 3, ..., M.

Note: only need to compute the ratios π(Y_n)/π(X_{n−1}), so multiplicative constants cancel.

Fact: Then, for large n, have X_n ≈ π. ["rwm.html" Java applet]

[END WEDNESDAY #2]

Handouts: class homework, project, participation. (Also on course web page.)

Then can estimate E_π(h) ≡ ∫ h(x) π(x) dx by: E_π(h) ≈ (1/(M−B)) sum_{i=B+1}^M h(X_i), where B ("burn-in") is chosen large enough so X_B ≈ π, and M is chosen large enough to get good Monte Carlo estimates.

Aside: if we accepted all proposals, then we would have a random walk Markov chain. So, this is called the random walk Metropolis (RWM) algorithm.

How large B? Difficult to say! Some theory ... active area of research [see e.g. Rosenthal, "Quantitative convergence rates of Markov chains: A simple account", Elec Comm Prob 2002, on instructor's web page] ... usually use trial-and-error ...
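(Aside: a minimal sketch of a random walk Metropolis run, for the one-dimensional unnormalised density g used in the example below; in the spirit of the file "Rmet", which is not reproduced. The run length, burn-in, and initial value are illustrative choices.)

    g <- function(y) ifelse(y > 0 & y < 1, y^3 * sin(y^4) * cos(y^5), 0)  # unnormalised target
    M <- 10^5; B <- 10^3                  # run length and burn-in (illustrative)
    x <- 0.5                              # initial value X_0
    xlist <- numeric(M)
    for (n in 1:M) {
        y <- x + rnorm(1)                 # proposal Y_n ~ N(X_{n-1}, 1)
        if (runif(1) < g(y) / g(x))       # A_n = g(Y_n)/g(X_{n-1}); constants cancel
            x <- y                        # accept; otherwise X_n = X_{n-1}
        xlist[n] <- x
    }
    mean(xlist[(B+1):M]^2)                # estimate of E_pi(h), h(y) = y^2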

What initial value X_0? Virtually any one will do, but central ones are best. Ideal: "overdispersed starting distribution", i.e. choose X_0 randomly from some distribution that covers the "important" part of the state space.

EXAMPLE: g(y) = y^3 sin(y^4) cos(y^5) 1_{0<y<1}. Want to compute (again!) E_π(h) where h(y) = y^2. Use the Metropolis algorithm with proposal Y ~ N(X, 1). [file "Rmet"] Works pretty well, but lots of variability! Plot: appears to have good "mixing" ...

EXAMPLE: π(x_1, x_2) = C cos(sqrt(x_1 x_2)) I(0 ≤ x_1 ≤ 5, 0 ≤ x_2 ≤ 4). Want to compute E_π(h), where h(x_1, x_2) = e^{x_1} + x_2^2. Metropolis algorithm ... works ... gets between about 34 and 43, but with large uncertainty ... [file "Rmet2"] Mathematica gets 38.7. Individual plots appear to have good mixing ... Joint plot shows fewer samples where sqrt(x_1 x_2) ≈ π/2 ≈ 1.57 (where the density is near zero).

[END FRIDAY #2]

(e.g. if π(x) = exp( −sum_{i<j} |x_j − x_i| ), then log π(x) = −sum_{i<j} |x_j − x_i|: often easier to compute; more on logarithms later.)

OPTIMAL SCALING:

Can change the proposal distribution to Y_n ~ MVN(X_{n−1}, σ^2 I) for any σ > 0. Which is best? If σ is too small, then we usually accept, but the chain won't move much. If σ is too large, then we will usually reject proposals, so the chain still won't move much. Optimal: need σ "just right" to avoid both extremes. ("Goldilocks Principle") Can experiment ... ["rwm.html" applet; files "Rmet", "Rmet2" ...]

Some theory ... limited ... active area of research ... General principle: the acceptance rate should be far from 0 and far from 1. In a certain idealised high-dimensional limit, the optimal acceptance rate is 0.234 (!).

[Roberts et al., Ann Appl Prob 1997; Roberts and Rosenthal, Stat Sci 2001]

MCMC STANDARD ERROR:

What about the standard error, i.e. uncertainty? It's usually larger than in the iid case, due to correlations, and harder to quantify.

Simplest: re-run the chain many times, with the same M and B, with different initial values drawn from some overdispersed starting distribution, and compute the standard error of the sequence of estimates. Then can analyse the estimates obtained as iid ...

But how to estimate the standard error from a single run? i.e., how to estimate v ≡ Var( (1/(M−B)) sum_{i=B+1}^M h(X_i) )?

Let h̄(x) = h(x) − E_π(h), so E_π(h̄) = 0. And assume B is large enough that X_i ≈ π for i > B. Then, for large M−B,

v ≈ E_π[ ( (1/(M−B)) sum_{i=B+1}^M h(X_i) − E_π(h) )^2 ] = E_π[ ( (1/(M−B)) sum_{i=B+1}^M h̄(X_i) )^2 ]
= (M−B)^{−2} [ (M−B) E_π( h̄(X_i)^2 ) + 2(M−B−1) E_π( h̄(X_i) h̄(X_{i+1}) ) + 2(M−B−2) E_π( h̄(X_i) h̄(X_{i+2}) ) + ... ]
≈ (1/(M−B)) [ Var_π(h) + 2 Cov_π( h(X_i), h(X_{i+1}) ) + 2 Cov_π( h(X_i), h(X_{i+2}) ) + ... ]
= (1/(M−B)) Var_π(h) [ 1 + 2 Corr_π( h(X_i), h(X_{i+1}) ) + 2 Corr_π( h(X_i), h(X_{i+2}) ) + ... ]
= (1/(M−B)) Var_π(h) · varfact = (iid variance) × varfact,

where varfact = 1 + 2 sum_{k=1}^∞ Corr_π( h(X_0), h(X_k) ) ≡ 1 + 2 sum_{k=1}^∞ ρ_k

(the "integrated auto-correlation time"). Also varfact = 2 sum_{k=0}^∞ ρ_k − 1.

Then can estimate both the iid variance, and varfact, from the sample run, as usual. Note: to compute varfact, don't sum over all k; just sum e.g. until, say, ρ_k < 0.05, or ρ_k < 0, or ... Can use R's built-in "acf" function, or can write your own (better!).

Then standard error = se = sqrt(v) = (iid-se) × sqrt(varfact). e.g. the files "Rmet" and "Rmet2" [modified]. Recall: the true answers are about 0.76 and 38.7, respectively.

Usually varfact > 1; try to get "better" chains so varfact is smaller. (Sometimes even try to design the chain to get varfact < 1: "antithetic".)

CONFIDENCE INTERVALS:

Suppose we estimate u ≡ E_π(h) by the quantity e = (1/(M−B)) sum_{i=B+1}^M h(X_i), and obtain an estimate e and an approximate variance (as above) v. Then what is, say, a 95% confidence interval for u?

Well, if we have a central limit theorem (CLT), then for large M−B, e ≈ N(u, v). So (e−u)/v^{1/2} ≈ N(0,1). So, P(−1.96 < (e−u)/v^{1/2} < 1.96) ≈ 0.95. So, P(−1.96 sqrt(v) < e−u < 1.96 sqrt(v)) ≈ 0.95, i.e. with prob ≈ 95%, the interval (e − 1.96 sqrt(v), e + 1.96 sqrt(v)) will contain u. (Again, strictly speaking, should use the t distribution, not the normal distribution ... but if M−B is large that doesn't really matter; ignore it for now.) e.g. the files "Rmet" and "Rmet2" [modified]. Recall: the true answers are about 0.76 and 38.7, respectively.

But does a CLT even hold??

[END WEDNESDAY #3]
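(Aside: a sketch of the single-run standard error computation above, continuing the Metropolis sketch earlier (it assumes xlist, B, M from there); summing the sample autocorrelations until ρ_k < 0.05 is just one of the cutoffs mentioned above.)

    varfact <- function(hvals, maxlag = 1000) {
        rho <- acf(hvals, lag.max = maxlag, plot = FALSE)$acf[-1]  # rho_1, rho_2, ...
        cut <- which(rho < 0.05)[1]                 # first k with rho_k < 0.05
        if (!is.na(cut)) rho <- rho[seq_len(cut)]   # sum only up to there
        1 + 2 * sum(rho)                            # varfact = 1 + 2 sum_k rho_k
    }
    hvals <- xlist[(B+1):M]^2                       # h(X_i) values from the earlier run
    se <- sd(hvals) / sqrt(length(hvals)) * sqrt(varfact(hvals))  # se = iid-se * sqrt(varfact)
    mean(hvals) + c(-1.96, 1.96) * se               # approximate 95% confidence interval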

But does a CLT even hold?? Does not follow from the classical i.i.d. CLT. Does not always hold. But often does. For example, a CLT holds if the chain is "geometrically ergodic" (later!) and E_π|h|^{2+δ} < ∞ for some δ > 0. (If the chain is also reversible, then don't need the δ: Roberts and Rosenthal, "Geometric ergodicity and hybrid Markov chains", ECP 1997.)

So MCMC is more complicated than standard Monte Carlo. Why should we bother? Some problems are too challenging for other methods. For example ...

BAYESIAN STATISTICS:

Have unknown parameters θ, and a statistical model ("likelihood function") for how the distribution of the data Y depends on θ: L(Y | θ). Have a "prior" distribution, representing our initial (subjective?) probabilities for θ: L(θ). Combining these gives a full joint distribution for θ and Y, i.e. L(θ, Y). The "posterior" distribution of θ, π(θ), is then the conditional distribution of θ, conditioned on the observed data y, i.e. π(θ) = L(θ | Y = y).

In terms of densities: if we have prior density f_θ(θ), and likelihood f_{Y|θ}(y, θ), then the joint density is f_{θ,Y}(θ, y) = f_θ(θ) f_{Y|θ}(y, θ), and the posterior density is π(θ) = f_{θ,Y}(θ, y) / f_Y(y) = c f_{θ,Y}(θ, y) = c f_θ(θ) f_{Y|θ}(y, θ).

Bayesian Statistics Example: VARIANCE COMPONENTS MODEL (a.k.a. "random effects model"):

Suppose some population has overall mean µ (unknown). The population consists of K groups. Observe Y_{i1}, ..., Y_{iJ_i} from group i, for 1 ≤ i ≤ K. Assume Y_{ij} ~ N(θ_i, W) (cond. ind.), where the θ_i and W are unknown.

Assume the different θ_i are linked by θ_i ~ N(µ, V) (cond. ind.), with µ and V also unknown. Want to estimate some or all of V, W, µ, θ_1, ..., θ_K.

Bayesian approach: use prior distributions, e.g. (conjugate): V ~ IG(a_1, b_1); W ~ IG(a_2, b_2); µ ~ N(a_3, b_3), where the a_i, b_i are known constants, and IG(a, b) is the inverse gamma distribution, with density (b^a / Γ(a)) e^{−b/x} x^{−a−1} for x > 0.

Combining the above dependencies, we see that the joint density is (for V, W > 0):

f(V, W, µ, θ_1, ..., θ_K, Y_{11}, Y_{12}, ..., Y_{K J_K})
= C_1 e^{−b_1/V} V^{−a_1−1} e^{−b_2/W} W^{−a_2−1} e^{−(µ−a_3)^2 / 2b_3} prod_{i=1}^K [ V^{−1/2} e^{−(θ_i−µ)^2 / 2V} ] prod_{i=1}^K prod_{j=1}^{J_i} [ W^{−1/2} e^{−(Y_{ij}−θ_i)^2 / 2W} ]
= C_2 e^{−b_1/V} V^{−a_1−1} e^{−b_2/W} W^{−a_2−1} e^{−(µ−a_3)^2 / 2b_3} V^{−K/2} W^{−(1/2) sum_{i=1}^K J_i} exp( −sum_{i=1}^K (θ_i−µ)^2 / 2V − sum_{i=1}^K sum_{j=1}^{J_i} (Y_{ij}−θ_i)^2 / 2W ).

Then π(V, W, µ, θ_1, ..., θ_K) = C_3 e^{−b_1/V} V^{−a_1−1} e^{−b_2/W} W^{−a_2−1} e^{−(µ−a_3)^2 / 2b_3} prod_{i=1}^K [ V^{−1/2} e^{−(θ_i−µ)^2 / 2V} ] prod_{i=1}^K prod_{j=1}^{J_i} [ W^{−1/2} e^{−(Y_{ij}−θ_i)^2 / 2W} ].

COMMENT: For big complicated π, often better to work with the LOGARITHMS, i.e. accept if log(U_n) < log(A_n) = log π(Y_n) − log π(X_{n−1}). Then only need to compute log π(x), which could be easier / finite.

[END FRIDAY #3]

Bayesian Statistics Example: VARIANCE COMPONENTS MODEL (cont'd):

After a bit of simplifying,

π(V, W, µ, θ_1, ..., θ_K) = C e^{−b_1/V} V^{−a_1−1} e^{−b_2/W} W^{−a_2−1} e^{−(µ−a_3)^2 / 2b_3} V^{−K/2} W^{−(1/2) sum_{i=1}^K J_i} exp( −sum_{i=1}^K (θ_i−µ)^2 / 2V − sum_{i=1}^K sum_{j=1}^{J_i} (Y_{ij}−θ_i)^2 / 2W ).

Better to program on the log scale: log π(V, W, µ, θ_1, ..., θ_K) = ....

Dimension: d = K + 3; e.g. K = 19, d = 22.

How to compute/estimate, say, E_π(W/V)? Or the sensitivity to the choice of e.g. b_1? Numerical integration? No: too high-dimensional! Importance sampling? Perhaps, but what f? Not very efficient! Rejection sampling? What f? What K? Virtually no samples!

Many applications, e.g.: Predicting success at law school (D. Rubin, JASA 1980), K = 82 schools. Melanoma recurrence (.../s24/Analysing Clinicals24 Bartolucci.pdf), K = 9 patient categories. Comparing baseball home-run hitters (J. Albert, The American Statistician 1992), K = 12 players. Analysing fabric dyes (Davies 1967; Box/Tiao 1973; Gelfand/Smith JASA 1990), K = 6 batches of dyestuff. [data in file "Rdye"]

INDEPENDENCE SAMPLER:

Recall: with random-walk Metropolis (RWM), propose Y_n ~ MVN(X_{n−1}, σ^2 I_d), then accept if U_n < A_n, where U_n ~ Uniform[0,1] and A_n = π(Y_n)/π(X_{n−1}).

One alternative (of many; later) is the independence sampler: propose {Y_n} ~ q, i.e. the {Y_n} are i.i.d. from some fixed density q, independent of X_{n−1}. (e.g. Y_n ~ MVN(0, I_d))

Then accept if U_n < A_n, where U_n ~ Uniform[0,1] and A_n = [π(Y_n) q(X_{n−1})] / [π(X_{n−1}) q(Y_n)].

(Special case of the "Metropolis-Hastings algorithm", where Y_n ~ q(X_{n−1}, ·), and A_n = [π(Y_n) q(Y_n, X_{n−1})] / [π(X_{n−1}) q(X_{n−1}, Y_n)]; later.)

Very special case: if q(y) = π(y), i.e. propose exactly from the target density π, then A_n ≡ 1, i.e. make great proposals, and always accept them (iid).

EXAMPLE: independence sampler with π(x) = e^{−x} and q(x) = k e^{−kx} (for x > 0). Then if X_{n−1} = x and Y_n = y, then A_n = [e^{−y} k e^{−kx}] / [e^{−x} k e^{−ky}] = e^{(k−1)(y−x)}. [file "Rind"]

k = 1: iid sampling; great. k = 0.01: proposals way too large; so-so. k = 5: proposals somewhat too small; terrible. And with k = 5, confidence intervals often miss. [file "Rind2"]

Why is large k so much worse than small k?
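(Aside: a sketch of this independence sampler for π(x) = e^{−x} with q = Exponential(k); in the spirit of the files "Rind"/"Rind2", which are not reproduced, and with illustrative run length and initial value.)

    indep <- function(k, M = 10^5) {
        x <- 1                                # initial value
        xlist <- numeric(M)
        for (n in 1:M) {
            y <- rexp(1, rate = k)            # proposal Y_n ~ q, independent of X_{n-1}
            if (runif(1) < exp((k-1)*(y-x)))  # A_n = e^{(k-1)(y-x)}
                x <- y
            xlist[n] <- x
        }
        mean(xlist)                           # estimate of E_pi(X) = 1
    }
    indep(1); indep(0.01); indep(5)           # compare k = 1, 0.01, 5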

MCMC CONVERGENCE RATES, PART I:

{X_n}: Markov chain on X, with stationary distribution Π. Let P^n(x, S) = P[X_n ∈ S | X_0 = x]. Hope that for large n, P^n(x, S) ≈ Π(S). Let D(x, n) = ||P^n(x, ·) − Π|| ≡ sup_{S ⊆ X} |P^n(x, S) − Π(S)|.

DEFN: the chain is "ergodic" if lim_{n→∞} D(x, n) = 0, for Π-a.e. x ∈ X.

DEFN: the chain is "geometrically ergodic" if there is ρ < 1, and M: X → [0, ∞] which is Π-a.e. finite, such that D(x, n) ≤ M(x) ρ^n for all x ∈ X and n ∈ N.

DEFN: a "quantitative bound on convergence" is an actual number n* such that D(x, n*) < 0.01 (say). [Then sometimes say the chain "converges in n* iterations".]

Quantitative bounds are often difficult (though I've worked on them a lot), but geometric ergodicity is often easier to verify.

Fact: a CLT holds for (1/n) sum_{i=1}^n h(X_i) if the chain is geometrically ergodic and E_π|h|^{2+δ} < ∞ for some δ > 0. (If the chain is also reversible, then don't need the δ: Roberts and Rosenthal, "Geometric ergodicity and hybrid Markov chains", ECP 1997.)

If a CLT holds, then have the 95% confidence interval (e − 1.96 sqrt(v), e + 1.96 sqrt(v)).

So what do we know about ergodicity? Theorem (later): if the chain is irreducible and aperiodic, and Π is stationary, then the chain is ergodic.

What about convergence rates of the independence sampler? By the Theorem, the independence sampler is ergodic provided q(x) > 0 whenever π(x) > 0. But is that sufficient? No: e.g. the previous "Rind" example with k = 5 is ergodic of course, but not geometrically ergodic; the CLT does not hold; confidence intervals often miss. [file "Rind2"]

FACT: the independence sampler is geometrically ergodic IF AND ONLY IF there is δ > 0 such that q(x) ≥ δ π(x) for π-a.e. x ∈ X, in which case D(x, n) ≤ (1−δ)^n for π-a.e. x ∈ X.

So, if π(x) = e^{−x} and q(x) = k e^{−kx} for x > 0, where 0 < k ≤ 1, then can take δ = k, so D(x, n) ≤ (1−k)^n. e.g. if k = 0.01, then D(x, 459) ≤ (0.99)^{459} ≈ 0.0099 < 0.01 for all x > 0, i.e. the chain converges after 459 iterations.

But if k > 1, then not geometrically ergodic. Fact: if k = 5, then D(0, n) > 0.01 for all n ≤ 4,000,000, while D(0, n) < 0.01 for all n ≥ 14,000,000, i.e. convergence takes between 4 million and 14 million iterations. Slow! [Roberts and Rosenthal, "Quantitative Non-Geometric Convergence Bounds for Independence Samplers", MCAP 2011.]

What about other chains besides the independence sampler? Coming soon!

VARIABLE-AT-A-TIME MCMC:

Propose to move just one coordinate at a time, leaving all the other coordinates fixed (since changing all coordinates at once may be difficult). e.g. the proposal Y_n has Y_{n,i} ~ N(X_{n−1,i}, σ^2), with Y_{n,j} = X_{n−1,j} for j ≠ i.

(Here Y_{n,i} is the i-th coordinate of Y_n.)

Then accept/reject with the usual Metropolis rule (symmetric case: "Metropolis-within-Gibbs") or Metropolis-Hastings rule (general case: "Metropolis-Hastings-within-Gibbs").

Need to choose which coordinate to update each time ... Could choose the coordinates in sequence 1, 2, ..., d, 1, 2, ... ("systematic-scan"). Or, choose a coordinate ~ Uniform{1, 2, ..., d} each time ("random-scan"). Note: one systematic-scan iteration corresponds to d random-scan ones ...

EXAMPLE: again π(x_1, x_2) = C cos(sqrt(x_1 x_2)) I(0 ≤ x_1 ≤ 5, 0 ≤ x_2 ≤ 4), and h(x_1, x_2) = e^{x_1} + x_2^2. Recall: Mathematica gives E_π(h) ≈ 38.7. Works with systematic-scan [file "Rmwg"] or random-scan [file "Rmwg2"].

GIBBS SAMPLER:

Special case of Metropolis-Hastings-within-Gibbs (later). The proposal distribution for the i-th coordinate is equal to the conditional distribution of that coordinate according to π, conditional on the current values of all the other coordinates. Then, always accept. (Reason: later.) Can use either systematic or random scan, just like above.

[END WEDNESDAY #4]

EXAMPLE: Variance Components Model: The update of µ (say) should be from the conditional density of µ, conditional on the current values of all the other coordinates: L(µ | V, W, θ_1, ..., θ_K, Y_{11}, ..., Y_{K J_K}). This conditional density is proportional to the full joint density, but with everything except µ treated as constant. Recall: the full joint density is:

C e^{−b_1/V} V^{−a_1−1} e^{−b_2/W} W^{−a_2−1} e^{−(µ−a_3)^2 / 2b_3} V^{−K/2} W^{−(1/2) sum_{i=1}^K J_i} exp( −sum_{i=1}^K (θ_i−µ)^2 / 2V − sum_{i=1}^K sum_{j=1}^{J_i} (Y_{ij}−θ_i)^2 / 2W ).

So, the conditional density of µ is

C_2 e^{−(µ−a_3)^2 / 2b_3} exp( −sum_{i=1}^K (θ_i−µ)^2 / 2V ).

This equals (verify this! HW!)

C_3 exp( −µ^2 [ 1/(2b_3) + K/(2V) ] + µ [ a_3/b_3 + (1/V) sum_{i=1}^K θ_i ] ).

Side calculation: if µ ~ N(m, v), then the density is ∝ e^{−(µ−m)^2 / 2v} ∝ e^{−µ^2/2v + µm/v}. Hence, here µ ~ N(m, v), where 1/(2v) = 1/(2b_3) + K/(2V) and m/v = a_3/b_3 + (1/V) sum_{i=1}^K θ_i. Solve: v = b_3 V / (V + K b_3), and m = ( a_3 V + b_3 sum_{i=1}^K θ_i ) / ( V + K b_3 ).

So, in the Gibbs Sampler, each time µ is updated, we sample it from N(m, v) for this m and v (and always accept).

Similarly (HW!), the conditional distribution for V is:

C_4 e^{−b_1/V} V^{−a_1−1} V^{−K/2} exp( −sum_{i=1}^K (θ_i−µ)^2 / 2V ), V > 0.

Recall that IG(r, s) has density (s^r / Γ(r)) e^{−s/x} x^{−r−1} for x > 0. So, the conditional distribution for V equals IG( a_1 + K/2, b_1 + (1/2) sum_{i=1}^K (θ_i−µ)^2 ).

Can similarly compute the conditional distributions for W and the θ_i (HW).

So, in this case, the systematic-scan Gibbs sampler proceeds (HW!) by: Update V from its conditional distribution IG(..., ...). Update W from its conditional distribution IG(..., ...). Update µ from its conditional distribution N(..., ...). Update θ_i from its conditional distribution N(..., ...), for i = 1, 2, ..., K. Repeat all of the above M times.

Or, the random-scan Gibbs sampler proceeds by choosing one of V, W, µ, θ_1, ..., θ_K uniformly at random, and then updating that coordinate from its corresponding conditional distribution. Then repeat this step M times [or M(K+3) times?].
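(Aside: a sketch of the µ and V updates just derived, as R code for a single Gibbs step. The constants and "current values" here are illustrative assumptions, and the IG draw uses the fact that the reciprocal of a Gamma(shape r, rate s) random variable is IG(r, s).)

    K <- 6; a1 <- 1; b1 <- 1; a3 <- 0; b3 <- 1       # illustrative constants (assumptions)
    theta <- rnorm(K); V <- 1                        # illustrative current values
    v  <- b3 * V / (V + K * b3)                      # conditional variance of mu
    m  <- (a3 * V + b3 * sum(theta)) / (V + K * b3)  # conditional mean of mu
    mu <- rnorm(1, m, sqrt(v))                       # Gibbs update: mu ~ N(m, v); always accept
    V  <- 1 / rgamma(1, shape = a1 + K/2,
                     rate = b1 + sum((theta - mu)^2) / 2)
        # Gibbs update: V ~ IG(a1 + K/2, b1 + (1/2) sum (theta_i - mu)^2)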

MCMC CONVERGENCE RATES, PART II:

FACT: if the state space is finite, and the chain is irreducible and aperiodic, then it is always geometrically ergodic.

What about the random-walk Metropolis algorithm (RWM), i.e. where {Y_n − X_{n−1}} ~ q for some fixed symmetric density q? (e.g. Y_n ~ N(X_{n−1}, σ^2 I), or Y_n ~ Uniform[X_{n−1} − δ, X_{n−1} + δ].)

FACT: RWM is geometrically ergodic essentially if and only if π has exponential tails, i.e. there are a, b, c > 0 such that π(x) ≤ a e^{−b|x|} whenever |x| > c. (Requires a few technical conditions: π and q continuous and positive; q has finite first moment; and π non-increasing in the tails, with (in higher dims) bounded Gaussian curvature.) [Mengersen and Tweedie, Ann Stat 1996; Roberts and Tweedie, Biometrika 1996]

EXAMPLES: RWM on R with the usual proposals: Y_n ~ N(X_{n−1}, σ^2).

CASE #1: Π = N(5, 4^2), and functional h(y) = y^2, so E_π(h) = 5^2 + 4^2 = 41. [file "Rnorm"] ... try σ = 1, σ = 4, σ = 16, and compare the varfacts ... Does a CLT hold? Yes! (geometrically ergodic, and E_π|h|^p < ∞ for all p). Indeed, confidence intervals usually contain 41. [file "Rnorm2"]

CASE #2: π(y) = c/(1+y^4), and functional h(y) = y^2, so E_π(h) = ∫ y^2 (1+y^4)^{−1} dy / ∫ (1+y^4)^{−1} dy = (π/√2)/(π/√2) = 1. Not exponential tails, so no CLT; the estimates are less stable, and confidence intervals often miss 1. [file "Rheavy"]

[END FRIDAY #4]

CASE #3: π(y) = 1/(π(1+y^2)) (Cauchy), and functional h(y) = 1_{−10<y<10}, so E_π(h) = Π(|X| < 10) = 2 arctan(10)/π ≈ 0.9365. [Π(0 < X < x) = arctan(x)/π] Not geometrically ergodic.

Confidence intervals often miss. [file "Rcauchy"]

CASE #4: π(y) = 1/(π(1+y^2)) (Cauchy), and functional h(y) = min(y^2, 100). [Numerical integration: E_π(h) ≈ 11.77] Again, not geometrically ergodic, and 95% CIs often miss 11.77, though iid MC does better. [file "Rcauchy2"]

NOTE: Even when a CLT holds, it's rather unstable, e.g. it requires that the chain has converged to Π, and might underestimate v. So, the estimate of v is very important! varfact not always reliable? Repeated runs! Another approach is "batch means", whereby the chain is broken into m large batches, which are assumed to be approximately i.i.d. ...

SO WHY DOES MCMC WORK?:

Need Markov chain theory ... (STA447 ... already know?)

Basic fact: if a Markov chain is irreducible and aperiodic, with stationary distribution π, then L(X_n) → π as n → ∞. More precisely ...

THEOREM: If a Markov chain is irreducible, with stationary probability density π, then for π-a.e. initial value X_0 = x:
(a) if E_π|h| < ∞, then lim_{n→∞} (1/n) sum_{i=1}^n h(X_i) = E_π(h) ≡ ∫ h(x) π(x) dx; and
(b) if the chain is aperiodic, then also lim_{n→∞} P(X_n ∈ S) = ∫_S π(x) dx for all S ⊆ X.

Let's figure out what this all means ...

Notation: P(i, j) = P(X_{n+1} = j | X_n = i) (discrete case), or P(x, A) = P(X_{n+1} ∈ A | X_n = x) (general case). Also Π(A) = ∫_A π(x) dx.

Well, "irreducible" means that you have positive probability of eventually getting from anywhere to anywhere else. Discrete case: for all i, j ∈ X, there is n ∈ N such that P(X_n = j | X_0 = i) > 0. General case: for all x ∈ X, and for all A ⊆ X with Π(A) > 0, there is n ∈ N such that P(X_n ∈ A | X_0 = x) > 0.

Usually satisfied for MCMC.

And, "aperiodic" means there are no forced cycles, i.e. there do not exist disjoint nonempty subsets X_1, X_2, ..., X_d, for d ≥ 2, such that P(x, X_{i+1}) = 1 for all x ∈ X_i (1 ≤ i ≤ d−1), and P(x, X_1) = 1 for all x ∈ X_d. (Diagram.) This is true for virtually any Metropolis algorithm, e.g. it's true if P(x, {x}) > 0 for any one state x ∈ X, e.g. if there is positive prob of rejection. Also true if P(x, ·) has positive density throughout S, for all x ∈ S, for some S ⊆ X with Π(S) > 0. Not quite guaranteed, e.g. X = {0, 1, 2, 3}, and π uniform on X, and Y_n = X_{n−1} ± 1 (mod 4). But almost always holds.

What about Π being a stationary distribution??

Begin with the DISCRETE CASE (e.g. "rwm.html"). Assume for simplicity that π(x) > 0 for all x ∈ X. Let q(x, y) = P(Y_n = y | X_{n−1} = x) be the proposal distribution, e.g. q(x, x+1) = q(x, x−1) = 1/2. Always chosen to be symmetric, i.e. q(x, y) = q(y, x). The acceptance probability is min(1, π(y)/π(x)). The state space is X, e.g. X = {1, 2, 3, 4, 5, 6}. Then, for i, j ∈ X with i ≠ j, P(i, j) = q(i, j) min(1, π(j)/π(i)).

It follows that the chain is "reversible": for all i, j ∈ X, by symmetry,

π(i) P(i, j) = q(i, j) min(π(i), π(j)) = q(j, i) min(π(i), π(j)) = π(j) P(j, i).

Intuition: if X_0 ~ π, i.e. P(X_0 = i) = π(i) for all i ∈ X, then P(X_0 = i, X_1 = j) = P(X_0 = j, X_1 = i) ... "time reversible" ...

We then compute that if X_0 ~ π, i.e. P(X_0 = i) = π(i) for all i ∈ X, then:

P(X_1 = j) = sum_{i∈X} P(X_0 = i) P(i, j) = sum_{i∈X} π(i) P(i, j) = sum_{i∈X} π(j) P(j, i)

= π(j) sum_{i∈X} P(j, i) = π(j),

i.e. X_1 ~ π too! So, the Markov chain "preserves" π, i.e. π is a stationary distribution. This is true for any Metropolis algorithm! It then follows from the Theorem (i.e., the Basic Fact) that as n → ∞, L(X_n) → π, i.e. lim_{n→∞} P(X_n = i) = π(i) for all i ∈ X. [file "rwm.html"] It also follows that if E_π|h| < ∞, then lim_{n→∞} (1/n) sum_{i=1}^n h(X_i) = E_π(h) ≡ sum_x h(x) π(x). (LLN)

SO WHAT ABOUT THE MORE GENERAL, CONTINUOUS CASE?

Some notation: Let X be the state space of all possible values. Usually X ⊆ R^d; e.g. for the Variance Components Model, X = (0,∞) × (0,∞) × R × R^K ⊆ R^{K+3}.

Let q(x, y) be the proposal density for y given x. So, in the above case, q(x, y) = (2πσ^2)^{−d/2} exp( −sum_{i=1}^d (y_i − x_i)^2 / 2σ^2 ). Symmetric: q(x, y) = q(y, x).

Let α(x, y) be the probability of accepting a proposed move from x to y, i.e. α(x, y) = P(U_n < A_n | X_{n−1} = x, Y_n = y) = P(U_n < π(y)/π(x)) = min(1, π(y)/π(x)).

Let P(x, S) = P(X_1 ∈ S | X_0 = x) be the transition probabilities. Then if x ∉ S, then P(x, S) = P(Y_1 ∈ S, U_1 < A_1 | X_0 = x) = ∫_S q(x, y) min(1, π(y)/π(x)) dy. Shorthand: for x ≠ y, P(x, dy) = q(x, y) min(1, π(y)/π(x)) dy. Then for x ≠ y,

P(x, dy) π(x) dx = q(x, y) min(1, π(y)/π(x)) π(x) dy dx = q(x, y) min(π(x), π(y)) dy dx = P(y, dx) π(y) dy (by symmetry).

It follows that P(x, dy) π(x) dx = P(y, dx) π(y) dy for all x, y ∈ X. ("reversible") Shorthand: P(x, dy) Π(dx) = P(y, dx) Π(dy).

How does reversibility help?

Well, suppose X_0 ~ Π, i.e. we start in stationarity. Then

P(X_1 ∈ S) = ∫_{x∈X} P(X_1 ∈ S | X_0 = x) π(x) dx = ∫_{x∈X} ∫_{y∈S} P(x, dy) π(x) dx = ∫_{y∈S} ∫_{x∈X} P(y, dx) π(y) dy = ∫_{y∈S} π(y) dy = Π(S),

so also X_1 ~ π. So, the chain preserves π, i.e. π is a stationary distribution. So, again, the Theorem applies.

Note: the key facts about q(x, y) are symmetry, and irreducibility. So, could replace Y_n ~ N(X_{n−1}, 1) by e.g. Y_n ~ Uniform[X_{n−1} − 1, X_{n−1} + 1], or (on a discrete space) Y_n = X_{n−1} ± 1 with prob. 1/2 each, etc.

METROPOLIS-HASTINGS ALGORITHMS: (Hastings [Canadian!], Biometrika 1970; see ...)

Now that we understand the theory, we can consider more general algorithms too ... The previous Metropolis algorithm works provided the proposal distribution is symmetric, i.e. q(x, y) = q(y, x). But what if it isn't?

For Metropolis, the key was that q(x, y) α(x, y) π(x) was symmetric (to make the Markov chain reversible). If instead A_n = [π(Y_n) q(Y_n, X_{n−1})] / [π(X_{n−1}) q(X_{n−1}, Y_n)], i.e. acceptance prob. α(x, y) = min(1, [π(y) q(y, x)] / [π(x) q(x, y)]), then:

q(x, y) α(x, y) π(x) = q(x, y) π(x) min(1, [π(y) q(y, x)] / [π(x) q(x, y)]) = min( π(x) q(x, y), π(y) q(y, x) ).

So, still symmetric, even if q wasn't.

So, for the Metropolis-Hastings algorithm, replace A_n = π(Y_n)/π(X_{n−1}) by A_n = [π(Y_n) q(Y_n, X_{n−1})] / [π(X_{n−1}) q(X_{n−1}, Y_n)]; then the chain is still reversible, and everything else remains the same. (i.e., still accept if U_n < A_n, otherwise reject.)

Intuition: if q(x, y) >> q(y, x), then the Metropolis chain would spend too much time at y and not enough at x, so we need to accept fewer moves x → y.

(Do require that q(x, y) > 0 iff q(y, x) > 0.)

INDEPENDENCE SAMPLER (mentioned earlier):

Proposals {Y_n} i.i.d. from some fixed distribution (say, Y_n ~ MVN(0, I)). Another special case of the Metropolis-Hastings algorithm. Then q(x, y) = q(y) depends only on y. So, now A_n = [π(Y_n) q(X_{n−1})] / [π(X_{n−1}) q(Y_n)]. [files "Rind", "Rind2" from before]

[END WEDNESDAY #5]

GIBBS SAMPLER (mentioned earlier):

Special case of Metropolis-Hastings-within-Gibbs. The proposal distribution for the i-th coordinate is equal to the conditional distribution of that coordinate according to π, conditional on the current values of all the other coordinates. That is, q_i(x, y) = C(x^{−i}) π(y) whenever x^{−i} = y^{−i}, where x^{−i} means all coordinates except the i-th one. Here C(x^{−i}) is the appropriate normalising constant (which depends on x^{−i}). So C(X_{n−1}^{−i}) = C(Y_n^{−i}). Then

A_n = [π(Y_n) q_i(Y_n, X_{n−1})] / [π(X_{n−1}) q_i(X_{n−1}, Y_n)] = [π(Y_n) C(Y_n^{−i}) π(X_{n−1})] / [π(X_{n−1}) C(X_{n−1}^{−i}) π(Y_n)] = 1.

So, always accept.

LANGEVIN ALGORITHM: Y_n ~ MVN( X_{n−1} + (σ^2/2) ∇log π(X_{n−1}), σ^2 I ). Special case of the Metropolis-Hastings algorithm. Intuition: tries to move in the direction where π is increasing. Based on a discrete approximation to a "Langevin diffusion". Usually more efficient, but requires knowledge and computation of ∇log π. (Hard.)

EXAMPLE: again π(x_1, x_2) = C cos(sqrt(x_1 x_2)) I(0 ≤ x_1 ≤ 5, 0 ≤ x_2 ≤ 4), and h(x_1, x_2) = e^{x_1} + x_2^2. Recall: Mathematica gives E_π(h) ≈ 38.7. Proposal distribution: Y_n ~ MVN( X_{n−1}, σ^2 (1 + |X_{n−1}|^2) I ).

Intuition: larger proposal variance if farther from the centre. So, q(x, y) = C (1 + |x|^2)^{−1} exp( −|y − x|^2 / (2 σ^2 (1 + |x|^2)) ).

So, can run the Metropolis-Hastings algorithm for this example. [file "RMH"] Usually get between 34 and 43, with claimed standard error ≈ 2. Recall: Mathematica gets 38.7.

EXAMPLES RE WHY DOES MCMC WORK:

EXAMPLE #1: Metropolis algorithm where X = Z, π(x) = 2^{−|x|}/3, and q(x, y) = 1/2 if |x − y| = 1, otherwise 0. Reversible? Yes, it's a Metropolis algorithm! π stationary? Yes, follows from reversibility! Aperiodic? Yes, since P(x, {x}) > 0! Irreducible? Yes: π(x) > 0 for all x ∈ X, so can get from x to y in |x − y| steps. So, by the theorem, probabilities and expectations converge to those of π; good.

EXAMPLE #2: Same as #1, except now π(x) ∝ 2^{−|x|} for x ≠ 0, with π(0) = 0. Still reversible, π stationary, aperiodic, same as before. Irreducible? No: can't go from positive to negative (proposals into 0 are always rejected)!

EXAMPLE #3: Same as #2, except now q(x, y) = 1/4 if 1 ≤ |x − y| ≤ 2, otherwise 0. Still reversible, π stationary, aperiodic, same as before. Irreducible? Yes: can jump over 0 to get from positive to negative, and back!

[END FRIDAY #5]

EXAMPLE #4: Metropolis algorithm with X = R, and π(x) = C e^{−x^6}, and proposals Y_n ~ Uniform[X_{n−1} − 1, X_{n−1} + 1]. Reversible? Yes, since q(x, y) is still symmetric. π stationary? Yes, since reversible! Irreducible? Yes, since P^n(x, dy) has positive density whenever |y − x| ≤ n.

Aperiodic? Yes: if it were periodic then, since e.g. X_1 ∩ [0,1] has positive measure, it would be possible to go from X_1 directly back to X_1 (i.e., if x ∈ X_1 ∩ [0,1], then P(x, X_1) > 0): contradiction. Or, even simpler: since P(x, {x}) > 0 for all x ∈ X. So, by the theorem, probabilities and expectations converge to those of π; good.

EXAMPLE #5: Same as #4, except now π(x) = C e^{−x^6} (1_{x<2} + 1_{x>4}). Still reversible and stationary and aperiodic, same as before. But no longer irreducible: cannot jump from [4, ∞) to (−∞, 2], or back. So, does not converge.

EXAMPLE #6: Same as #5, except now the proposals are Y_n ~ Uniform[X_{n−1} − 5, X_{n−1} + 5]. Still reversible and stationary and aperiodic, same as before. And now irreducible, too: now can jump from [4, ∞) to (−∞, 2], or back.

EXAMPLE #7: Same as #6, except now Y_n ~ Uniform[X_{n−1} − 5, X_{n−1} + 10]. Makes no sense: proposals not symmetric, so not a Metropolis algorithm! (Not even "symmetrically zero", for a Metropolis-Hastings algorithm.)

OPTIMAL RWM PROPOSALS:

Consider RWM on X = R^d, where Y_n ~ MVN(X_{n−1}, Σ) for some d × d proposal covariance matrix Σ. What is the best choice of Σ? Usually we take Σ = σ^2 I_d for some σ > 0, and then choose σ so the acceptance rate is not too small, not too large (e.g. ≈ 0.234). But can we do better?

Suppose for now that Π = MVN(µ_0, Σ_0) for some fixed µ_0 and Σ_0, in dim = 5. Try RWM with various proposal distributions [file "Ropt"]:

first version: Y_n ~ MVN(X_{n−1}, I_d): acc ≈ 0.06; varfact ≈ 220.
second version: Y_n ~ MVN(X_{n−1}, 0.1 I_d): acc ≈ 0.234; varfact ≈ 300.
third version: Y_n ~ MVN(X_{n−1}, Σ_0): acc ≈ 0.3; varfact ≈ 15.

fourth version: Y_n ~ MVN(X_{n−1}, 1.4 Σ_0): acc ≈ 0.234; varfact ≈ 7.

Or in dim = 20 [file "Ropt2", with file "targ20"]:
Y_n ~ MVN(X_{n−1}, I_d): acc ≈ 0.234; varfact ≈ 400 or more.
Y_n ~ MVN(X_{n−1}, Σ_0): acc ≈ 0.234; varfact ≈ 50.

Conclusion: acceptance rates near 0.234 are better. But also, proposals "shaped like the target" are better. This has been proved for targets which are orthogonal transformations of independent components (Roberts et al., Ann Appl Prob 1997; Roberts and Rosenthal, Stat Sci 2001; Bédard, Ann Appl Prob 2007). It is approximately true for most unimodal targets ...

Problem: Σ_0 would usually be unknown; then what? Can perhaps adapt!

ADAPTIVE MCMC:

What if the target covariance Σ_0 is unknown?? Can estimate the target covariance based on the run so far, to get the empirical covariance Σ_n. Then update the proposal covariance "on the fly", by using proposal Y_n ~ MVN(X_{n−1}, Σ_n) [or Y_n ~ MVN(X_{n−1}, 1.4 Σ_n), or Y_n ~ MVN(X_{n−1}, (2.38^2/d) Σ_n)]. Hope that for large n, Σ_n ≈ Σ_0, so the proposals are nearly optimal. (Usually also add ɛ I_d to the proposal covariance, to improve stability, for some small ɛ > 0.)

Try the R version, for the same MVN example as in "Ropt" [file "Radapt"]: Need much longer burn-in, e.g. B = 20,000, for the adaption to work. Get varfact of the last 4000 iterations of about 8 ... nearly optimal ... competitive with "Ropt". The longer the run, the more benefit from adaptation.

Can also compute the "slow-down factor", s_n = d ( sum_{i=1}^d λ_{in}^{−2} ) / ( sum_{i=1}^d λ_{in}^{−1} )^2, where {λ_{in}} are the eigenvalues of Σ_n^{1/2} Σ_0^{−1/2}. Starts large, should converge to 1. [Motivation: if Σ_n = Σ_0, then λ_{in} ≡ 1, so s_n = d · d / d^2 = 1.]
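(Aside: a sketch of this adaptation scheme: RWM whose proposal covariance is periodically reset to (2.38^2/d) times the empirical covariance of the run so far, plus ɛ I_d. The target here is a simple stand-in, not the actual "Radapt"/"Ropt" target, and the adaptation interval and ɛ are illustrative choices. Requires the MASS package.)

    library(MASS)                                  # for mvrnorm()
    d <- 5
    logpi <- function(x) -sum(x^2 / (2 * (1:d)))   # stand-in target: independent N(0, i) coords
    M <- 25000
    X <- matrix(0, M, d)
    x <- rep(0, d)
    Sigma <- 0.1 * diag(d)                         # initial (non-adapted) proposal covariance
    for (n in 1:M) {
        if (n %% 100 == 0)                         # adapt every 100 iterations:
            Sigma <- (2.38^2/d) * cov(X[1:(n-1), ]) + 0.05 * diag(d)  # empirical cov + eps*I_d
        y <- mvrnorm(1, x, Sigma)                  # proposal Y_n ~ MVN(X_{n-1}, Sigma_n)
        if (log(runif(1)) < logpi(y) - logpi(x)) x <- y  # Metropolis accept/reject, log scale
        X[n, ] <- x
    }
    colMeans(X[5001:M, ])                          # estimates of the target coordinate means (all 0)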

Higher dimensions: figure "plotamx200.png" (dim = 200). Works well, but it takes many iterations before the adaption is helpful.

BUT IS ADAPTIVE MCMC A VALID ALGORITHM?? Not in general: see e.g. "adapt.html". The algorithm is now non-Markovian, and doesn't preserve stationarity at each step. However, it still converges to Π provided that the adaption (i) is "diminishing", and (ii) satisfies a technical condition called "containment". For details see e.g. Roberts & Rosenthal, "Coupling and Convergence of Adaptive MCMC", J. Appl. Prob. 2007.

TEMPERED MCMC:

Suppose Π is "multi-modal", i.e. has distinct "parts", e.g. Π = (1/2) N(0,1) + (1/2) N(20,1). Usual RWM (with Y_n ~ N(X_{n−1}, 1), say) can explore well within each mode, but how to get from one mode to the other?

Idea: if Π were flatter, e.g. (1/2) N(0, 10^2) + (1/2) N(20, 10^2), then much easier to get between modes. So: define a sequence Π_1, Π_2, ..., Π_m, where Π_1 = Π ("cold"), and Π_τ is flatter for larger τ ("hot"). e.g. Π_τ = (1/2) N(0, τ^2) + (1/2) N(20, τ^2). [file "Rtempered"]

Then define a joint Markov chain in (x, τ) on X × {1, 2, ..., m}. How? (In the end, only count those samples where τ = 1.)

[END WEDNESDAY #6]

Then define the joint Markov chain (x, τ) on X × {1, 2, ..., m}, with stationary distribution Π̄ defined by Π̄(S × {τ}) = (1/m) Π_τ(S). (Can also use other weights besides 1/m.) Define a new Markov chain with both "spatial" moves (change x) and "temperature" moves (change τ).

e.g. perhaps the chain alternates between:

(a) propose x' ~ N(x, 1); accept with prob min(1, π̄(x',τ)/π̄(x,τ)) = min(1, π_τ(x')/π_τ(x)).
(b) propose τ' = τ ± 1 (prob 1/2 each); accept with prob min(1, π̄(x,τ')/π̄(x,τ)) = min(1, π_{τ'}(x)/π_τ(x)).

The chain should converge to Π̄. In the end, only count those samples where τ = 1.

EXAMPLE: Π = (1/2) N(0,1) + (1/2) N(20,1). Assume the proposals are Y_n ~ N(X_{n−1}, 1). Mixing for Π: terrible! [file "Rtempered" with dotempering=FALSE and temp=1; note the small claimed standard error!]

Define Π_τ = (1/2) N(0, τ^2) + (1/2) N(20, τ^2), for τ = 1, 2, ..., 10. Mixing is better for larger τ! [file "Rtempered" with dotempering=FALSE and temp=1,2,3,4,...,10] Compare the graphs of π_1 and π_10: [plot commands at bottom of "Rtempered"] ...

So, use the above (a)-(b) algorithm; converges fairly well to Π̄. [file "Rtempered", with dotempering=TRUE] So, conditional on τ = 1, converges to Π. ["Rtempered" points command at end of file] So, the average of those h(x) with τ = 1 gives a good estimate of E_π(h).

HOW TO FIND THE TEMPERED DENSITIES π_τ?

Usually won't know about e.g. Π_τ = (1/2) N(0, τ^2) + (1/2) N(20, τ^2). Instead, can e.g. let π_τ(x) = c_τ (π(x))^{1/τ}. (Sometimes write β = 1/τ.) Then Π_1 = Π, and π_τ is flatter for larger τ; good. e.g. if π(x) is the density of N(µ, σ^2), then c_τ (π(x))^{1/τ} is the density of N(µ, τσ^2).

Then the temperature acceptance probability is:

min(1, π_{τ'}(x)/π_τ(x)) = min(1, [c_{τ'} (π(x))^{1/τ'}] / [c_τ (π(x))^{1/τ}]).

This depends on the c_τ, which are usually unknown; bad.
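(Aside: a sketch of the tempering moves (a) and (b) above, for the two-mode example with the known flattened densities Π_τ = (1/2) N(0, τ^2) + (1/2) N(20, τ^2); in the spirit of the file "Rtempered", which is not reproduced, with illustrative run length.)

    m <- 10                                        # number of temperatures
    pitau <- function(x, tau) 0.5*dnorm(x, 0, tau) + 0.5*dnorm(x, 20, tau)
    M <- 10^5
    x <- 0; tau <- 1
    xlist <- numeric(M); taulist <- numeric(M)
    for (n in 1:M) {
        y <- x + rnorm(1)                          # (a) spatial move at current temperature
        if (runif(1) < pitau(y, tau)/pitau(x, tau)) x <- y
        taup <- tau + sample(c(-1, 1), 1)          # (b) temperature move tau' = tau +- 1
        if (taup >= 1 && taup <= m &&
            runif(1) < pitau(x, taup)/pitau(x, tau)) tau <- taup
        xlist[n] <- x; taulist[n] <- tau
    }
    mean(xlist[taulist == 1])                      # only count samples with tau = 1; about 10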

What to do?

PARALLEL TEMPERING: (a.k.a. Metropolis-Coupled MCMC, or MCMCMC)

Alternative to tempered MCMC. Instead, use state space X^m, with m chains, i.e. one chain for each temperature. So, the state at time n is X_n = (X_{n,1}, X_{n,2}, ..., X_{n,m}), where X_{n,τ} is "at temperature τ". The stationary distribution is now Π̄ = Π_1 × Π_2 × ... × Π_m, i.e. Π̄(X_1 ∈ S_1, X_2 ∈ S_2, ..., X_m ∈ S_m) = Π_1(S_1) Π_2(S_2) ... Π_m(S_m).

Then, can update the chain X_{n,τ} at temperature τ, for each 1 ≤ τ ≤ m, by proposing e.g. Y_{n,τ} ~ N(X_{n−1,τ}, 1), and accepting with probability min(1, π_τ(Y_{n,τ}) / π_τ(X_{n−1,τ})).

And, can also choose temperatures τ and τ' (e.g., at random), and propose to "swap" the values X_{n,τ} and X_{n,τ'}, and accept this with probability min(1, [π_τ(X_{n,τ'}) π_{τ'}(X_{n,τ})] / [π_τ(X_{n,τ}) π_{τ'}(X_{n,τ'})]).

Now, the normalising constants cancel: e.g. if π_τ(x) = c_τ (π(x))^{1/τ}, then the acceptance probability is:

min(1, [c_τ π(X_{n,τ'})^{1/τ} c_{τ'} π(X_{n,τ})^{1/τ'}] / [c_τ π(X_{n,τ})^{1/τ} c_{τ'} π(X_{n,τ'})^{1/τ'}]) = min(1, [π(X_{n,τ'})^{1/τ} π(X_{n,τ})^{1/τ'}] / [π(X_{n,τ})^{1/τ} π(X_{n,τ'})^{1/τ'}]),

so c_τ and c_{τ'} are not required.

EXAMPLE: suppose again that Π_τ = (1/2) N(0, τ^2) + (1/2) N(20, τ^2), for τ = 1, 2, ..., 10. Can run parallel tempering ... works pretty well. [file "Rpara"]

[END FRIDAY #6]

SUMMARY: Monte Carlo can be used for nearly everything! Good luck with your project, and with the rest of your studies!

STA3431 (Monte Carlo Methods) Lecture Notes, Winter 2011

STA3431 (Monte Carlo Methods) Lecture Notes, Winter 2011 STA3431 Monte Carlo Methods) Lecture Notes, Winter 2011 by Jeffrey S. Rosenthal, University of Toronto Last updated: April 4, 2011.) Note: I will update these notes regularly on the course web page). However,

More information

STA3431 (Monte Carlo Methods) Lecture Notes, Winter 2010

STA3431 (Monte Carlo Methods) Lecture Notes, Winter 2010 STA3431 Monte Carlo Methods Lecture Notes, Winter 2010 by Jeffrey S. Rosenthal, University of Toronto Last updated: March 29, 2010. Note: I will update these notes regularly on-line. However, they are

More information

STA3431 (Monte Carlo Methods) Lecture Notes, Fall by Jeffrey S. Rosenthal, University of Toronto (Last updated: November 16, 2018)

STA3431 (Monte Carlo Methods) Lecture Notes, Fall by Jeffrey S. Rosenthal, University of Toronto (Last updated: November 16, 2018) STA343 (Monte Carlo Methods) Lecture Notes, Fall 208 by Jeffrey S. Rosenthal, University of Toronto (Last updated: November 6, 208) Note: I will update these notes regularly (on the course web page). However,

More information

Markov Chain Monte Carlo (MCMC)

Markov Chain Monte Carlo (MCMC) Markov Chain Monte Carlo (MCMC Dependent Sampling Suppose we wish to sample from a density π, and we can evaluate π as a function but have no means to directly generate a sample. Rejection sampling can

More information

University of Toronto Department of Statistics

University of Toronto Department of Statistics Norm Comparisons for Data Augmentation by James P. Hobert Department of Statistics University of Florida and Jeffrey S. Rosenthal Department of Statistics University of Toronto Technical Report No. 0704

More information

17 : Markov Chain Monte Carlo

17 : Markov Chain Monte Carlo 10-708: Probabilistic Graphical Models, Spring 2015 17 : Markov Chain Monte Carlo Lecturer: Eric P. Xing Scribes: Heran Lin, Bin Deng, Yun Huang 1 Review of Monte Carlo Methods 1.1 Overview Monte Carlo

More information

CSC 2541: Bayesian Methods for Machine Learning

CSC 2541: Bayesian Methods for Machine Learning CSC 2541: Bayesian Methods for Machine Learning Radford M. Neal, University of Toronto, 2011 Lecture 3 More Markov Chain Monte Carlo Methods The Metropolis algorithm isn t the only way to do MCMC. We ll

More information

An introduction to adaptive MCMC

An introduction to adaptive MCMC An introduction to adaptive MCMC Gareth Roberts MIRAW Day on Monte Carlo methods March 2011 Mainly joint work with Jeff Rosenthal. http://www2.warwick.ac.uk/fac/sci/statistics/crism/ Conferences and workshops

More information

Markov Chain Monte Carlo

Markov Chain Monte Carlo Markov Chain Monte Carlo Recall: To compute the expectation E ( h(y ) ) we use the approximation E(h(Y )) 1 n n h(y ) t=1 with Y (1),..., Y (n) h(y). Thus our aim is to sample Y (1),..., Y (n) from f(y).

More information

Stat 451 Lecture Notes Markov Chain Monte Carlo. Ryan Martin UIC

Stat 451 Lecture Notes Markov Chain Monte Carlo. Ryan Martin UIC Stat 451 Lecture Notes 07 12 Markov Chain Monte Carlo Ryan Martin UIC www.math.uic.edu/~rgmartin 1 Based on Chapters 8 9 in Givens & Hoeting, Chapters 25 27 in Lange 2 Updated: April 4, 2016 1 / 42 Outline

More information

Computational statistics

Computational statistics Computational statistics Markov Chain Monte Carlo methods Thierry Denœux March 2017 Thierry Denœux Computational statistics March 2017 1 / 71 Contents of this chapter When a target density f can be evaluated

More information

STA 294: Stochastic Processes & Bayesian Nonparametrics

STA 294: Stochastic Processes & Bayesian Nonparametrics MARKOV CHAINS AND CONVERGENCE CONCEPTS Markov chains are among the simplest stochastic processes, just one step beyond iid sequences of random variables. Traditionally they ve been used in modelling a

More information

6 Markov Chain Monte Carlo (MCMC)

6 Markov Chain Monte Carlo (MCMC) 6 Markov Chain Monte Carlo (MCMC) The underlying idea in MCMC is to replace the iid samples of basic MC methods, with dependent samples from an ergodic Markov chain, whose limiting (stationary) distribution

More information

CS281A/Stat241A Lecture 22

CS281A/Stat241A Lecture 22 CS281A/Stat241A Lecture 22 p. 1/4 CS281A/Stat241A Lecture 22 Monte Carlo Methods Peter Bartlett CS281A/Stat241A Lecture 22 p. 2/4 Key ideas of this lecture Sampling in Bayesian methods: Predictive distribution

More information

Sampling from complex probability distributions

Sampling from complex probability distributions Sampling from complex probability distributions Louis J. M. Aslett (louis.aslett@durham.ac.uk) Department of Mathematical Sciences Durham University UTOPIAE Training School II 4 July 2017 1/37 Motivation

More information

Bayesian Computation via Markov chain Monte Carlo

Bayesian Computation via Markov chain Monte Carlo Bayesian Computation via Markov chain Monte Carlo Radu V. Craiu Department of Statistics University of Toronto Jeffrey S. Rosenthal Department of Statistics University of Toronto (Version of June 6, 2013.)

More information

STA205 Probability: Week 8 R. Wolpert

STA205 Probability: Week 8 R. Wolpert INFINITE COIN-TOSS AND THE LAWS OF LARGE NUMBERS The traditional interpretation of the probability of an event E is its asymptotic frequency: the limit as n of the fraction of n repeated, similar, and

More information

Bayesian Methods for Machine Learning

Bayesian Methods for Machine Learning Bayesian Methods for Machine Learning CS 584: Big Data Analytics Material adapted from Radford Neal s tutorial (http://ftp.cs.utoronto.ca/pub/radford/bayes-tut.pdf), Zoubin Ghahramni (http://hunch.net/~coms-4771/zoubin_ghahramani_bayesian_learning.pdf),

More information

Lecture 8: The Metropolis-Hastings Algorithm

Lecture 8: The Metropolis-Hastings Algorithm 30.10.2008 What we have seen last time: Gibbs sampler Key idea: Generate a Markov chain by updating the component of (X 1,..., X p ) in turn by drawing from the full conditionals: X (t) j Two drawbacks:

More information

Examples of Adaptive MCMC

Examples of Adaptive MCMC Examples of Adaptive MCMC by Gareth O. Roberts * and Jeffrey S. Rosenthal ** (September, 2006.) Abstract. We investigate the use of adaptive MCMC algorithms to automatically tune the Markov chain parameters

More information

STA 4273H: Statistical Machine Learning

STA 4273H: Statistical Machine Learning STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Computer Science! Department of Statistical Sciences! rsalakhu@cs.toronto.edu! h0p://www.cs.utoronto.ca/~rsalakhu/ Lecture 7 Approximate

More information

Introduction to Machine Learning CMU-10701

Introduction to Machine Learning CMU-10701 Introduction to Machine Learning CMU-10701 Markov Chain Monte Carlo Methods Barnabás Póczos & Aarti Singh Contents Markov Chain Monte Carlo Methods Goal & Motivation Sampling Rejection Importance Markov

More information

Stat 516, Homework 1

Stat 516, Homework 1 Stat 516, Homework 1 Due date: October 7 1. Consider an urn with n distinct balls numbered 1,..., n. We sample balls from the urn with replacement. Let N be the number of draws until we encounter a ball

More information

Adaptive Monte Carlo methods

Adaptive Monte Carlo methods Adaptive Monte Carlo methods Jean-Michel Marin Projet Select, INRIA Futurs, Université Paris-Sud joint with Randal Douc (École Polytechnique), Arnaud Guillin (Université de Marseille) and Christian Robert

More information

Quantitative Non-Geometric Convergence Bounds for Independence Samplers

Quantitative Non-Geometric Convergence Bounds for Independence Samplers Quantitative Non-Geometric Convergence Bounds for Independence Samplers by Gareth O. Roberts * and Jeffrey S. Rosenthal ** (September 28; revised July 29.) 1. Introduction. Markov chain Monte Carlo (MCMC)

More information

Winter 2019 Math 106 Topics in Applied Mathematics. Lecture 9: Markov Chain Monte Carlo

Winter 2019 Math 106 Topics in Applied Mathematics. Lecture 9: Markov Chain Monte Carlo Winter 2019 Math 106 Topics in Applied Mathematics Data-driven Uncertainty Quantification Yoonsang Lee (yoonsang.lee@dartmouth.edu) Lecture 9: Markov Chain Monte Carlo 9.1 Markov Chain A Markov Chain Monte

More information

Variance Bounding Markov Chains

Variance Bounding Markov Chains Variance Bounding Markov Chains by Gareth O. Roberts * and Jeffrey S. Rosenthal ** (September 2006; revised April 2007.) Abstract. We introduce a new property of Markov chains, called variance bounding.

More information

Computer intensive statistical methods

Computer intensive statistical methods Lecture 13 MCMC, Hybrid chains October 13, 2015 Jonas Wallin jonwal@chalmers.se Chalmers, Gothenburg university MH algorithm, Chap:6.3 The metropolis hastings requires three objects, the distribution of

More information

Minicourse on: Markov Chain Monte Carlo: Simulation Techniques in Statistics

Minicourse on: Markov Chain Monte Carlo: Simulation Techniques in Statistics Minicourse on: Markov Chain Monte Carlo: Simulation Techniques in Statistics Eric Slud, Statistics Program Lecture 1: Metropolis-Hastings Algorithm, plus background in Simulation and Markov Chains. Lecture

More information

Probabilistic Graphical Models Lecture 17: Markov chain Monte Carlo

Probabilistic Graphical Models Lecture 17: Markov chain Monte Carlo Probabilistic Graphical Models Lecture 17: Markov chain Monte Carlo Andrew Gordon Wilson www.cs.cmu.edu/~andrewgw Carnegie Mellon University March 18, 2015 1 / 45 Resources and Attribution Image credits,

More information

GENERAL STATE SPACE MARKOV CHAINS AND MCMC ALGORITHMS

GENERAL STATE SPACE MARKOV CHAINS AND MCMC ALGORITHMS GENERAL STATE SPACE MARKOV CHAINS AND MCMC ALGORITHMS by Gareth O. Roberts* and Jeffrey S. Rosenthal** (March 2004; revised August 2004) Abstract. This paper surveys various results about Markov chains

More information

Computer intensive statistical methods

Computer intensive statistical methods Lecture 11 Markov Chain Monte Carlo cont. October 6, 2015 Jonas Wallin jonwal@chalmers.se Chalmers, Gothenburg university The two stage Gibbs sampler If the conditional distributions are easy to sample

More information

SC7/SM6 Bayes Methods HT18 Lecturer: Geoff Nicholls Lecture 2: Monte Carlo Methods Notes and Problem sheets are available at http://www.stats.ox.ac.uk/~nicholls/bayesmethods/ and via the MSc weblearn pages.

More information

MARKOV CHAIN MONTE CARLO

MARKOV CHAIN MONTE CARLO MARKOV CHAIN MONTE CARLO RYAN WANG Abstract. This paper gives a brief introduction to Markov Chain Monte Carlo methods, which offer a general framework for calculating difficult integrals. We start with

More information

Metropolis Hastings. Rebecca C. Steorts Bayesian Methods and Modern Statistics: STA 360/601. Module 9

Metropolis Hastings. Rebecca C. Steorts Bayesian Methods and Modern Statistics: STA 360/601. Module 9 Metropolis Hastings Rebecca C. Steorts Bayesian Methods and Modern Statistics: STA 360/601 Module 9 1 The Metropolis-Hastings algorithm is a general term for a family of Markov chain simulation methods

More information

Stat 451 Lecture Notes Monte Carlo Integration

Stat 451 Lecture Notes Monte Carlo Integration Stat 451 Lecture Notes 06 12 Monte Carlo Integration Ryan Martin UIC www.math.uic.edu/~rgmartin 1 Based on Chapter 6 in Givens & Hoeting, Chapter 23 in Lange, and Chapters 3 4 in Robert & Casella 2 Updated:

More information

Brief introduction to Markov Chain Monte Carlo

Brief introduction to Markov Chain Monte Carlo Brief introduction to Department of Probability and Mathematical Statistics seminar Stochastic modeling in economics and finance November 7, 2011 Brief introduction to Content 1 and motivation Classical

More information

University of Toronto Department of Statistics

University of Toronto Department of Statistics Optimal Proposal Distributions and Adaptive MCMC by Jeffrey S. Rosenthal Department of Statistics University of Toronto Technical Report No. 0804 June 30, 2008 TECHNICAL REPORT SERIES University of Toronto

More information

Lecture 7 and 8: Markov Chain Monte Carlo

Lecture 7 and 8: Markov Chain Monte Carlo Lecture 7 and 8: Markov Chain Monte Carlo 4F13: Machine Learning Zoubin Ghahramani and Carl Edward Rasmussen Department of Engineering University of Cambridge http://mlg.eng.cam.ac.uk/teaching/4f13/ Ghahramani

More information

Monte Carlo Methods. Leon Gu CSD, CMU

Monte Carlo Methods. Leon Gu CSD, CMU Monte Carlo Methods Leon Gu CSD, CMU Approximate Inference EM: y-observed variables; x-hidden variables; θ-parameters; E-step: q(x) = p(x y, θ t 1 ) M-step: θ t = arg max E q(x) [log p(y, x θ)] θ Monte

More information

INTRODUCTION TO BAYESIAN STATISTICS

INTRODUCTION TO BAYESIAN STATISTICS INTRODUCTION TO BAYESIAN STATISTICS Sarat C. Dass Department of Statistics & Probability Department of Computer Science & Engineering Michigan State University TOPICS The Bayesian Framework Different Types

More information

Dynamic models. Dependent data The AR(p) model The MA(q) model Hidden Markov models. 6 Dynamic models

Dynamic models. Dependent data The AR(p) model The MA(q) model Hidden Markov models. 6 Dynamic models 6 Dependent data The AR(p) model The MA(q) model Hidden Markov models Dependent data Dependent data Huge portion of real-life data involving dependent datapoints Example (Capture-recapture) capture histories

More information

Markov Chain Monte Carlo (MCMC)

Markov Chain Monte Carlo (MCMC) School of Computer Science 10-708 Probabilistic Graphical Models Markov Chain Monte Carlo (MCMC) Readings: MacKay Ch. 29 Jordan Ch. 21 Matt Gormley Lecture 16 March 14, 2016 1 Homework 2 Housekeeping Due

More information

MCMC Methods: Gibbs and Metropolis

MCMC Methods: Gibbs and Metropolis MCMC Methods: Gibbs and Metropolis Patrick Breheny February 28 Patrick Breheny BST 701: Bayesian Modeling in Biostatistics 1/30 Introduction As we have seen, the ability to sample from the posterior distribution

More information

MCMC algorithms for fitting Bayesian models

MCMC algorithms for fitting Bayesian models MCMC algorithms for fitting Bayesian models p. 1/1 MCMC algorithms for fitting Bayesian models Sudipto Banerjee sudiptob@biostat.umn.edu University of Minnesota MCMC algorithms for fitting Bayesian models

More information

MSc MT15. Further Statistical Methods: MCMC. Lecture 5-6: Markov chains; Metropolis Hastings MCMC. Notes and Practicals available at

MSc MT15. Further Statistical Methods: MCMC. Lecture 5-6: Markov chains; Metropolis Hastings MCMC. Notes and Practicals available at MSc MT15. Further Statistical Methods: MCMC Lecture 5-6: Markov chains; Metropolis Hastings MCMC Notes and Practicals available at www.stats.ox.ac.uk\ nicholls\mscmcmc15 Markov chain Monte Carlo Methods

More information

F denotes cumulative density. denotes probability density function; (.)

F denotes cumulative density. denotes probability density function; (.) BAYESIAN ANALYSIS: FOREWORDS Notation. System means the real thing and a model is an assumed mathematical form for the system.. he probability model class M contains the set of the all admissible models

More information

Some Results on the Ergodicity of Adaptive MCMC Algorithms

Some Results on the Ergodicity of Adaptive MCMC Algorithms Some Results on the Ergodicity of Adaptive MCMC Algorithms Omar Khalil Supervisor: Jeffrey Rosenthal September 2, 2011 1 Contents 1 Andrieu-Moulines 4 2 Roberts-Rosenthal 7 3 Atchadé and Fort 8 4 Relationship

More information

April 20th, Advanced Topics in Machine Learning California Institute of Technology. Markov Chain Monte Carlo for Machine Learning

April 20th, Advanced Topics in Machine Learning California Institute of Technology. Markov Chain Monte Carlo for Machine Learning for for Advanced Topics in California Institute of Technology April 20th, 2017 1 / 50 Table of Contents for 1 2 3 4 2 / 50 History of methods for Enrico Fermi used to calculate incredibly accurate predictions

More information

The simple slice sampler is a specialised type of MCMC auxiliary variable method (Swendsen and Wang, 1987; Edwards and Sokal, 1988; Besag and Green, 1

The simple slice sampler is a specialised type of MCMC auxiliary variable method (Swendsen and Wang, 1987; Edwards and Sokal, 1988; Besag and Green, 1 Recent progress on computable bounds and the simple slice sampler by Gareth O. Roberts* and Jerey S. Rosenthal** (May, 1999.) This paper discusses general quantitative bounds on the convergence rates of

More information

MAS223 Statistical Inference and Modelling Exercises

MAS223 Statistical Inference and Modelling Exercises MAS223 Statistical Inference and Modelling Exercises The exercises are grouped into sections, corresponding to chapters of the lecture notes Within each section exercises are divided into warm-up questions,

More information

A Search and Jump Algorithm for Markov Chain Monte Carlo Sampling. Christopher Jennison. Adriana Ibrahim. Seminar at University of Kuwait

A Search and Jump Algorithm for Markov Chain Monte Carlo Sampling. Christopher Jennison. Adriana Ibrahim. Seminar at University of Kuwait A Search and Jump Algorithm for Markov Chain Monte Carlo Sampling Christopher Jennison Department of Mathematical Sciences, University of Bath, UK http://people.bath.ac.uk/mascj Adriana Ibrahim Institute

More information

Pattern Recognition and Machine Learning. Bishop Chapter 11: Sampling Methods

Pattern Recognition and Machine Learning. Bishop Chapter 11: Sampling Methods Pattern Recognition and Machine Learning Chapter 11: Sampling Methods Elise Arnaud Jakob Verbeek May 22, 2008 Outline of the chapter 11.1 Basic Sampling Algorithms 11.2 Markov Chain Monte Carlo 11.3 Gibbs

More information

Hastings-within-Gibbs Algorithm: Introduction and Application on Hierarchical Model

Hastings-within-Gibbs Algorithm: Introduction and Application on Hierarchical Model UNIVERSITY OF TEXAS AT SAN ANTONIO Hastings-within-Gibbs Algorithm: Introduction and Application on Hierarchical Model Liang Jing April 2010 1 1 ABSTRACT In this paper, common MCMC algorithms are introduced

More information

Robert Collins CSE586, PSU Intro to Sampling Methods

Robert Collins CSE586, PSU Intro to Sampling Methods Robert Collins Intro to Sampling Methods CSE586 Computer Vision II Penn State Univ Robert Collins A Brief Overview of Sampling Monte Carlo Integration Sampling and Expected Values Inverse Transform Sampling

More information

MCMC Review. MCMC Review. Gibbs Sampling. MCMC Review

MCMC Review. MCMC Review. Gibbs Sampling. MCMC Review MCMC Review http://jackman.stanford.edu/mcmc/icpsr99.pdf http://students.washington.edu/fkrogsta/bayes/stat538.pdf http://www.stat.berkeley.edu/users/terry/classes/s260.1998 /Week9a/week9a/week9a.html

More information

A quick introduction to Markov chains and Markov chain Monte Carlo (revised version)

A quick introduction to Markov chains and Markov chain Monte Carlo (revised version) A quick introduction to Markov chains and Markov chain Monte Carlo (revised version) Rasmus Waagepetersen Institute of Mathematical Sciences Aalborg University 1 Introduction These notes are intended to

More information

CPSC 540: Machine Learning

CPSC 540: Machine Learning CPSC 540: Machine Learning MCMC and Non-Parametric Bayes Mark Schmidt University of British Columbia Winter 2016 Admin I went through project proposals: Some of you got a message on Piazza. No news is

More information

16 : Approximate Inference: Markov Chain Monte Carlo

16 : Approximate Inference: Markov Chain Monte Carlo 10-708: Probabilistic Graphical Models 10-708, Spring 2017 16 : Approximate Inference: Markov Chain Monte Carlo Lecturer: Eric P. Xing Scribes: Yuan Yang, Chao-Ming Yen 1 Introduction As the target distribution

More information

Markov Chain Monte Carlo The Metropolis-Hastings Algorithm

Markov Chain Monte Carlo The Metropolis-Hastings Algorithm Markov Chain Monte Carlo The Metropolis-Hastings Algorithm Anthony Trubiano April 11th, 2018 1 Introduction Markov Chain Monte Carlo (MCMC) methods are a class of algorithms for sampling from a probability

More information

Markov Chains and MCMC

Markov Chains and MCMC Markov Chains and MCMC CompSci 590.02 Instructor: AshwinMachanavajjhala Lecture 4 : 590.02 Spring 13 1 Recap: Monte Carlo Method If U is a universe of items, and G is a subset satisfying some property,

More information

Markov chain Monte Carlo methods in atmospheric remote sensing

Markov chain Monte Carlo methods in atmospheric remote sensing 1 / 45 Markov chain Monte Carlo methods in atmospheric remote sensing Johanna Tamminen johanna.tamminen@fmi.fi ESA Summer School on Earth System Monitoring and Modeling July 3 Aug 11, 212, Frascati July,

More information

Optimizing and Adapting the Metropolis Algorithm

Optimizing and Adapting the Metropolis Algorithm 6 Optimizing and Adapting the Metropolis Algorithm Jeffrey S. Rosenthal University of Toronto, Toronto, ON 6.1 Introduction Many modern scientific questions involve high-dimensional data and complicated

More information

Computer Vision Group Prof. Daniel Cremers. 10a. Markov Chain Monte Carlo

Computer Vision Group Prof. Daniel Cremers. 10a. Markov Chain Monte Carlo Group Prof. Daniel Cremers 10a. Markov Chain Monte Carlo Markov Chain Monte Carlo In high-dimensional spaces, rejection sampling and importance sampling are very inefficient An alternative is Markov Chain

More information

Control Variates for Markov Chain Monte Carlo

Control Variates for Markov Chain Monte Carlo Control Variates for Markov Chain Monte Carlo Dellaportas, P., Kontoyiannis, I., and Tsourti, Z. Dept of Statistics, AUEB Dept of Informatics, AUEB 1st Greek Stochastics Meeting Monte Carlo: Probability

More information

Ergodic Theorems. Samy Tindel. Purdue University. Probability Theory 2 - MA 539. Taken from Probability: Theory and examples by R.

Ergodic Theorems. Samy Tindel. Purdue University. Probability Theory 2 - MA 539. Taken from Probability: Theory and examples by R. Ergodic Theorems Samy Tindel Purdue University Probability Theory 2 - MA 539 Taken from Probability: Theory and examples by R. Durrett Samy T. Ergodic theorems Probability Theory 1 / 92 Outline 1 Definitions

More information

Principles of Bayesian Inference

Principles of Bayesian Inference Principles of Bayesian Inference Sudipto Banerjee University of Minnesota July 20th, 2008 1 Bayesian Principles Classical statistics: model parameters are fixed and unknown. A Bayesian thinks of parameters

More information

Reversible Markov chains

Reversible Markov chains Reversible Markov chains Variational representations and ordering Chris Sherlock Abstract This pedagogical document explains three variational representations that are useful when comparing the efficiencies

More information

Simultaneous drift conditions for Adaptive Markov Chain Monte Carlo algorithms

Simultaneous drift conditions for Adaptive Markov Chain Monte Carlo algorithms Simultaneous drift conditions for Adaptive Markov Chain Monte Carlo algorithms Yan Bai Feb 2009; Revised Nov 2009 Abstract In the paper, we mainly study ergodicity of adaptive MCMC algorithms. Assume that

More information

An introduction to Bayesian statistics and model calibration and a host of related topics

An introduction to Bayesian statistics and model calibration and a host of related topics An introduction to Bayesian statistics and model calibration and a host of related topics Derek Bingham Statistics and Actuarial Science Simon Fraser University Cast of thousands have participated in the

More information

Quantifying Uncertainty

Quantifying Uncertainty Sai Ravela M. I. T Last Updated: Spring 2013 1 Markov Chain Monte Carlo Monte Carlo sampling made for large scale problems via Markov Chains Monte Carlo Sampling Rejection Sampling Importance Sampling

More information

ECE531 Lecture 8: Non-Random Parameter Estimation

ECE531 Lecture 8: Non-Random Parameter Estimation ECE531 Lecture 8: Non-Random Parameter Estimation D. Richard Brown III Worcester Polytechnic Institute 19-March-2009 Worcester Polytechnic Institute D. Richard Brown III 19-March-2009 1 / 25 Introduction

More information

Markov Chain Monte Carlo

Markov Chain Monte Carlo Chapter 5 Markov Chain Monte Carlo MCMC is a kind of improvement of the Monte Carlo method By sampling from a Markov chain whose stationary distribution is the desired sampling distributuion, it is possible

More information

Monte Carlo methods for sampling-based Stochastic Optimization

Monte Carlo methods for sampling-based Stochastic Optimization Monte Carlo methods for sampling-based Stochastic Optimization Gersende FORT LTCI CNRS & Telecom ParisTech Paris, France Joint works with B. Jourdain, T. Lelièvre, G. Stoltz from ENPC and E. Kuhn from

More information

Markov chain Monte Carlo

Markov chain Monte Carlo Markov chain Monte Carlo Feng Li feng.li@cufe.edu.cn School of Statistics and Mathematics Central University of Finance and Economics Revised on April 24, 2017 Today we are going to learn... 1 Markov Chains

More information

A short diversion into the theory of Markov chains, with a view to Markov chain Monte Carlo methods

A short diversion into the theory of Markov chains, with a view to Markov chain Monte Carlo methods A short diversion into the theory of Markov chains, with a view to Markov chain Monte Carlo methods by Kasper K. Berthelsen and Jesper Møller June 2004 2004-01 DEPARTMENT OF MATHEMATICAL SCIENCES AALBORG

More information

References. Markov-Chain Monte Carlo. Recall: Sampling Motivation. Problem. Recall: Sampling Methods. CSE586 Computer Vision II

References. Markov-Chain Monte Carlo. Recall: Sampling Motivation. Problem. Recall: Sampling Methods. CSE586 Computer Vision II References Markov-Chain Monte Carlo CSE586 Computer Vision II Spring 2010, Penn State Univ. Recall: Sampling Motivation If we can generate random samples x i from a given distribution P(x), then we can

More information

19 : Slice Sampling and HMC

19 : Slice Sampling and HMC 10-708: Probabilistic Graphical Models 10-708, Spring 2018 19 : Slice Sampling and HMC Lecturer: Kayhan Batmanghelich Scribes: Boxiang Lyu 1 MCMC (Auxiliary Variables Methods) In inference, we are often

More information

Lecture 6: Markov Chain Monte Carlo

Lecture 6: Markov Chain Monte Carlo Lecture 6: Markov Chain Monte Carlo D. Jason Koskinen koskinen@nbi.ku.dk Photo by Howard Jackman University of Copenhagen Advanced Methods in Applied Statistics Feb - Apr 2016 Niels Bohr Institute 2 Outline

More information

Spring 2012 Math 541B Exam 1

Spring 2012 Math 541B Exam 1 Spring 2012 Math 541B Exam 1 1. A sample of size n is drawn without replacement from an urn containing N balls, m of which are red and N m are black; the balls are otherwise indistinguishable. Let X denote

More information

SAMPLING ALGORITHMS. In general. Inference in Bayesian models

SAMPLING ALGORITHMS. In general. Inference in Bayesian models SAMPLING ALGORITHMS SAMPLING ALGORITHMS In general A sampling algorithm is an algorithm that outputs samples x 1, x 2,... from a given distribution P or density p. Sampling algorithms can for example be

More information

CS242: Probabilistic Graphical Models Lecture 7B: Markov Chain Monte Carlo & Gibbs Sampling

CS242: Probabilistic Graphical Models Lecture 7B: Markov Chain Monte Carlo & Gibbs Sampling CS242: Probabilistic Graphical Models Lecture 7B: Markov Chain Monte Carlo & Gibbs Sampling Professor Erik Sudderth Brown University Computer Science October 27, 2016 Some figures and materials courtesy

More information

Part 4: Multi-parameter and normal models

Part 4: Multi-parameter and normal models Part 4: Multi-parameter and normal models 1 The normal model Perhaps the most useful (or utilized) probability model for data analysis is the normal distribution There are several reasons for this, e.g.,

More information

Statistics & Data Sciences: First Year Prelim Exam May 2018

Statistics & Data Sciences: First Year Prelim Exam May 2018 Statistics & Data Sciences: First Year Prelim Exam May 2018 Instructions: 1. Do not turn this page until instructed to do so. 2. Start each new question on a new sheet of paper. 3. This is a closed book

More information

Introduction to Bayesian Statistics and Markov Chain Monte Carlo Estimation. EPSY 905: Multivariate Analysis Spring 2016 Lecture #10: April 6, 2016

Introduction to Bayesian Statistics and Markov Chain Monte Carlo Estimation. EPSY 905: Multivariate Analysis Spring 2016 Lecture #10: April 6, 2016 Introduction to Bayesian Statistics and Markov Chain Monte Carlo Estimation EPSY 905: Multivariate Analysis Spring 2016 Lecture #10: April 6, 2016 EPSY 905: Intro to Bayesian and MCMC Today s Class An

More information

Monte-Carlo MMD-MA, Université Paris-Dauphine. Xiaolu Tan

Monte-Carlo MMD-MA, Université Paris-Dauphine. Xiaolu Tan Monte-Carlo MMD-MA, Université Paris-Dauphine Xiaolu Tan tan@ceremade.dauphine.fr Septembre 2015 Contents 1 Introduction 1 1.1 The principle.................................. 1 1.2 The error analysis

More information

Examples of Adaptive MCMC

Examples of Adaptive MCMC Examples of Adaptive MCMC by Gareth O. Roberts * and Jeffrey S. Rosenthal ** (September 2006; revised January 2008.) Abstract. We investigate the use of adaptive MCMC algorithms to automatically tune the

More information

Math 456: Mathematical Modeling. Tuesday, April 9th, 2018

Math 456: Mathematical Modeling. Tuesday, April 9th, 2018 Math 456: Mathematical Modeling Tuesday, April 9th, 2018 The Ergodic theorem Tuesday, April 9th, 2018 Today 1. Asymptotic frequency (or: How to use the stationary distribution to estimate the average amount

More information

Lecture 21: Convergence of transformations and generating a random variable

Lecture 21: Convergence of transformations and generating a random variable Lecture 21: Convergence of transformations and generating a random variable If Z n converges to Z in some sense, we often need to check whether h(z n ) converges to h(z ) in the same sense. Continuous

More information

Stat 535 C - Statistical Computing & Monte Carlo Methods. Lecture February Arnaud Doucet

Stat 535 C - Statistical Computing & Monte Carlo Methods. Lecture February Arnaud Doucet Stat 535 C - Statistical Computing & Monte Carlo Methods Lecture 13-28 February 2006 Arnaud Doucet Email: arnaud@cs.ubc.ca 1 1.1 Outline Limitations of Gibbs sampling. Metropolis-Hastings algorithm. Proof

More information

Bayesian GLMs and Metropolis-Hastings Algorithm

Bayesian GLMs and Metropolis-Hastings Algorithm Bayesian GLMs and Metropolis-Hastings Algorithm We have seen that with conjugate or semi-conjugate prior distributions the Gibbs sampler can be used to sample from the posterior distribution. In situations,

More information

Faithful couplings of Markov chains: now equals forever

Faithful couplings of Markov chains: now equals forever Faithful couplings of Markov chains: now equals forever by Jeffrey S. Rosenthal* Department of Statistics, University of Toronto, Toronto, Ontario, Canada M5S 1A1 Phone: (416) 978-4594; Internet: jeff@utstat.toronto.edu

More information

Bayesian Methods with Monte Carlo Markov Chains II

Bayesian Methods with Monte Carlo Markov Chains II Bayesian Methods with Monte Carlo Markov Chains II Henry Horng-Shing Lu Institute of Statistics National Chiao Tung University hslu@stat.nctu.edu.tw http://tigpbp.iis.sinica.edu.tw/courses.htm 1 Part 3

More information

Markov chain Monte Carlo

Markov chain Monte Carlo Markov chain Monte Carlo Karl Oskar Ekvall Galin L. Jones University of Minnesota March 12, 2019 Abstract Practically relevant statistical models often give rise to probability distributions that are analytically

More information

Markov Chain Monte Carlo methods

Markov Chain Monte Carlo methods Markov Chain Monte Carlo methods Tomas McKelvey and Lennart Svensson Signal Processing Group Department of Signals and Systems Chalmers University of Technology, Sweden November 26, 2012 Today s learning

More information

Introduction to Computational Biology Lecture # 14: MCMC - Markov Chain Monte Carlo

Introduction to Computational Biology Lecture # 14: MCMC - Markov Chain Monte Carlo Introduction to Computational Biology Lecture # 14: MCMC - Markov Chain Monte Carlo Assaf Weiner Tuesday, March 13, 2007 1 Introduction Today we will return to the motif finding problem, in lecture 10

More information

ON CONVERGENCE RATES OF GIBBS SAMPLERS FOR UNIFORM DISTRIBUTIONS

ON CONVERGENCE RATES OF GIBBS SAMPLERS FOR UNIFORM DISTRIBUTIONS The Annals of Applied Probability 1998, Vol. 8, No. 4, 1291 1302 ON CONVERGENCE RATES OF GIBBS SAMPLERS FOR UNIFORM DISTRIBUTIONS By Gareth O. Roberts 1 and Jeffrey S. Rosenthal 2 University of Cambridge

More information

ST 740: Markov Chain Monte Carlo

ST 740: Markov Chain Monte Carlo ST 740: Markov Chain Monte Carlo Alyson Wilson Department of Statistics North Carolina State University October 14, 2012 A. Wilson (NCSU Stsatistics) MCMC October 14, 2012 1 / 20 Convergence Diagnostics:

More information

Simple Techniques for Improving SGD. CS6787 Lecture 2 Fall 2017

Simple Techniques for Improving SGD. CS6787 Lecture 2 Fall 2017 Simple Techniques for Improving SGD CS6787 Lecture 2 Fall 2017 Step Sizes and Convergence Where we left off Stochastic gradient descent x t+1 = x t rf(x t ; yĩt ) Much faster per iteration than gradient

More information

Recent Advances in Regional Adaptation for MCMC

Recent Advances in Regional Adaptation for MCMC Recent Advances in Regional Adaptation for MCMC Radu Craiu Department of Statistics University of Toronto Collaborators: Yan Bai (Statistics, Toronto) Antonio Fabio di Narzo (Statistics, Bologna) Jeffrey

More information