Bayesian Inference and Decision Theory

1 Bayesian Inference and Decision Theory
Instructor: Kathryn Blackmond Laskey
Room 2214 ENGR (703)
Office Hours: Tuesday and Thursday 4:30-5:30 PM, or by appointment
Spring 2018
Unit 6: Gibbs Sampling

2 Learning Objectives for Unit 6
- Describe how Gibbs sampling works
- Implement a simple Gibbs sampler
- Use JAGS to perform Gibbs sampling
- Estimate posterior quantities from the output of a Gibbs sampler for the posterior distribution
- Describe some MCMC diagnostics
- Apply diagnostics to assess adequacy of MCMC sampler output

3 Review: Steps in Bayesian Data Analysis
1. Determine the question: We are concerned with understanding the process by which a data set x = x1, ..., xn was generated
2. Specify the likelihood: f(x | θ) expresses the probability distribution of the data conditional on the parameter
3. Specify the prior distribution: g(θ) represents beliefs about the parameter prior to seeing the observations x
4. Find the (exact or approximate) posterior distribution: For a Bayesian, the posterior distribution is everything needed to draw conclusions about θ. Once we have specified the likelihood and the prior, the posterior distribution is completely determined. Approximation is needed when the posterior distribution is intractable.
5. Summarize the posterior distribution and draw conclusions: We report posterior summaries such as the mean, credible intervals, or predictive probabilities. Summaries are chosen to address the original question. We also do analyses to check model adequacy.

4 Step 4: Find / Approximate the Posterior Distribution
- When the prior and likelihood form a conjugate pair, we have a closed-form expression for the posterior distribution and many posterior quantities
- There is no closed-form expression for some posterior quantities
  - Example: difference in defect rates for two plants, where the defect rates are independent Gamma random variables
  - Sometimes we can estimate these quantities using direct Monte Carlo
- For many interesting problems no exact posterior distribution can be found
  - We cannot use direct Monte Carlo
  - We need another way to approximate the posterior distribution
- Markov chain Monte Carlo (MCMC) is a class of methods for taking correlated (not iid) draws from the posterior distribution
  - MCMC can be applied to many problems for which direct Monte Carlo cannot be used
  - Gibbs sampling is the simplest MCMC method

5 Example: Normal Random Variable with Independent Mean and Precision
- Problem: infer the mean and precision of normal data
- In Unit 5 we used the normal-gamma conjugate prior
  - Prior knowledge about the mean Θ and precision Ρ are dependent: the greater the precision of an observation, the more sure we are about the prior mean
  - This might not be a faithful representation of our prior information
- Consider a prior distribution in which Θ and Ρ are independent a priori and
  - Ρ has a gamma distribution with shape α and scale β
  - Θ has a normal distribution with mean µ and standard deviation τ
- This is not a conjugate distribution: there is no closed-form expression for the posterior distribution
[Figure: graphical models for the normal-gamma conjugate prior and the independent normal and gamma priors, with sufficient statistics X̄ = (1/n) Σᵢ xᵢ and Y = Σᵢ (xᵢ − x̄)²]

6 A Semi-Conjugate Prior Distribution
- A prior distribution for two (or more) parameters is semi-conjugate if the prior distribution for each parameter given the others is conjugate
- The independent normal and gamma prior distribution is semi-conjugate
- Observations: X1, ..., Xn | Θ, Ρ ~ Normal(Θ, Ρ^(-1/2))
- Distribution for Θ given Ρ = ρ and X1:n:
  - Prior distribution: Θ ~ Normal(µ, τ) is independent of Ρ
  - Posterior distribution: Θ | Ρ=ρ, X1:n ~ Normal(µ*, τ*)  (see Unit 5)
    µ* = (µ/τ² + ρ Σᵢ xᵢ) / (1/τ² + nρ)
    τ* = (1/τ² + nρ)^(-1/2)
- Distribution for Ρ given Θ = θ and X1:n:
  - Prior distribution: Ρ ~ Gamma(α, β) is independent of Θ
  - Posterior distribution: Ρ | Θ=θ, X1:n ~ Gamma(α*, β*)  (see next page for derivation)
    α* = α + n/2
    β* = (β⁻¹ + ½ Σᵢ (xᵢ − θ)²)⁻¹

7 Semi-Conjugate Prior Distribution: Details
- The distribution for Θ given Ρ = ρ and X1:n is just the case of known standard deviation from Unit 5:
  Θ | Ρ=ρ, X1:n ~ Normal(µ*, τ*), with
  µ* = (µ/τ² + ρ Σᵢ xᵢ) / (1/τ² + nρ)
  τ* = (1/τ² + nρ)^(-1/2)
- We find the distribution for Ρ given Θ = θ and X1:n by considering the limiting case of the normal-gamma distribution as the precision multiplier k tends to infinity (i.e., the mean has infinite precision a priori)
- If X1:n are iid Normal(Θ, Ρ^(-1/2)) and the prior distribution for (Θ, Ρ) is Normal-Gamma(µ, k, α, β), then the posterior distribution for Ρ is Gamma(α*, β*) with
  α* = α + n/2
  β* = ( β⁻¹ + ½ Σᵢ (xᵢ − x̄)² + (kn / (2(k+n))) (x̄ − µ)² )⁻¹
     → ( β⁻¹ + ½ Σᵢ (xᵢ − x̄)² + (n/2) (x̄ − µ)² )⁻¹  as k → ∞
     = ( β⁻¹ + ½ Σᵢ (xᵢ − µ)² )⁻¹
     = ( β⁻¹ + ½ Σᵢ (xᵢ − θ)² )⁻¹
- Conditioning on Θ = θ and X1:n means assuming that Θ = θ is known with infinite precision (k → ∞) to be equal to the prior mean µ

8 Approximating the Posterior Distribution for a Semi-Conjugate Prior
- We have no closed-form expression for the posterior distribution of (Θ, Ρ) given X1:n
  - We cannot approximate it with direct Monte Carlo
- We can approximate it using Gibbs sampling (a minimal R sketch follows this slide):
  - INITIALIZE: Choose arbitrary initial parameter values θ(0), ρ(0)
  - SAMPLE: For k = 1, ..., M
    - Sample θ(k) from g(θ | ρ(k-1), x)
    - Sample ρ(k) from g(ρ | θ(k), x)
- Facts about Gibbs sampling:
  - Successive draws are correlated: (θ(k), ρ(k)) depends on (θ(k-1), ρ(k-1))
  - The sequence (θ(1), ρ(1)), (θ(2), ρ(2)), ... is a Markov chain
  - This Markov chain has a unique stationary distribution equal to the posterior distribution of (Θ, Ρ) given X1:n
  - We can use the samples (θ(1), ρ(1)), (θ(2), ρ(2)), ... to approximate posterior quantities of interest
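As a concrete illustration, here is a minimal R sketch of this Gibbs sampler for the semi-conjugate normal model, using the full conditionals from slide 6. The function name, the starting values, and the hyperparameter arguments (mu0, tau0 for the normal prior on the mean; alpha0, beta0 for the gamma prior on the precision, with beta0 a scale parameter) are illustrative choices, not the course's code.

# Gibbs sampler for the semi-conjugate normal model (illustrative sketch).
gibbs_normal <- function(x, mu0, tau0, alpha0, beta0, M = 10000) {
  n <- length(x)
  theta <- numeric(M); rho <- numeric(M)
  th <- mean(x); rh <- 1 / var(x)          # arbitrary starting values
  for (k in 1:M) {
    # Full conditional for the mean (slide 6): Normal(mu.star, tau.star)
    prec.star <- 1 / tau0^2 + n * rh
    mu.star   <- (mu0 / tau0^2 + rh * sum(x)) / prec.star
    th <- rnorm(1, mean = mu.star, sd = 1 / sqrt(prec.star))
    # Full conditional for the precision (slide 6): Gamma(alpha.star, scale = beta.star)
    alpha.star <- alpha0 + n / 2
    beta.star  <- 1 / (1 / beta0 + 0.5 * sum((x - th)^2))
    rh <- rgamma(1, shape = alpha.star, scale = beta.star)
    theta[k] <- th; rho[k] <- rh
  }
  data.frame(theta = theta, rho = rho, sigma = 1 / sqrt(rho))
}

Calling it on the reaction-time data, e.g. out <- gibbs_normal(x, mu0 = 0, tau0 = 10, alpha0 = 1, beta0 = 1), gives draws whose quantiles, e.g. quantile(out$theta, c(0.025, 0.975)), approximate posterior credible intervals; the numerical results depend on the prior hyperparameters chosen.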

9 Review: Markov Chain
- A Markov chain (of order 1) is a sequence of random variables X1, X2, ... such that Xi is independent of all lower-numbered Xj (j < i−1) given Xi−1
  - The Xi can be univariate or multivariate
  - Pr(Xi | X1, X2, ..., Xi−1) = Pr(Xi | Xi−1)
- In an order-k Markov chain, Xi is independent of all lower-numbered Xj (j < i−k) given Xi−1, ..., Xi−k
- Under fairly general conditions a Markov chain has a unique stationary distribution π(x)
  - If Xi has distribution π(x) then so does Xi+1
[Figure: chain graph X1 → X2 → X3 → X4]

10 Example of a Markov Chain
- States: Cold, Exposed, Healthy
- Allowable transitions:
  - Cold → Cold (p=0.12); Cold → Healthy (p=0.88)
  - Exposed → Cold (p=0.75); Exposed → Healthy (p=0.25)
  - Healthy → Exposed (p=0.12); Healthy → Healthy (p=0.88)
- Unique stationary distribution: Pst(Cold), Pst(Exposed), Pst(Healthy) (values computed in the sketch below)
- All initial distributions evolve to the stationary distribution
[Figure: bar charts of the Cold/Exposed/Healthy distribution at time steps 1-5, starting from three different initial distributions]
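The stationary distribution can be computed numerically from the transition probabilities listed above; the R sketch below does this by evolving an arbitrary initial distribution forward. The matrix name, the state ordering, and the rounded values in the final comment come from this illustration, not from the original slide.

# Stationary distribution of the Cold/Exposed/Healthy chain (illustrative sketch).
# Rows of P are the current state, columns the next state; order is (Cold, Exposed, Healthy).
P <- matrix(c(0.12, 0.00, 0.88,
              0.75, 0.00, 0.25,
              0.00, 0.12, 0.88),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("Cold", "Exposed", "Healthy"),
                            c("Cold", "Exposed", "Healthy")))

pi0 <- c(1, 0, 0)                   # start everyone in the Cold state
for (t in 1:200) pi0 <- pi0 %*% P   # evolve the distribution forward in time
round(drop(pi0), 3)                 # approximately (0.084, 0.098, 0.818)

Starting from (0, 0, 1) or any other initial distribution gives the same limit, which is the point of the slide.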

11 Markov Chain of Gibbs Samples for (θ, ρ)
- INITIALIZE: Choose arbitrary initial parameter values θ(0), ρ(0)
- SAMPLE: For k = 1, ..., M
  - Sample θ(k) from g(θ | ρ(k-1), X=x), the posterior distribution of Θ given ρ(k-1) and X=x
    - Normal with mean (µ/τ² + ρ(k-1) Σ xᵢ) / (1/τ² + nρ(k-1)) and precision (1/τ² + nρ(k-1))
  - Sample ρ(k) from g(ρ | θ(k), X=x), the posterior distribution of Ρ given θ(k) and X=x
    - Gamma with shape α + n/2 and scale (β⁻¹ + ½ Σ (xᵢ − θ(k))²)⁻¹
- This process gives a Markov chain with states (θ(k), ρ(k))
  - (θ(k), ρ(k)) is independent of the past given (θ(k-1), ρ(k-1))
[Figure: dependency diagram linking (θ(0), ρ(0)) → (θ(1), ρ(1)) → (θ(2), ρ(2)) → (θ(3), ρ(3)), with θ(k) drawn from g(θ | ρ(k-1), X=x) and ρ(k) drawn from g(ρ | θ(k), X=x)]

12 Reaction Time Example
- We analyzed a data set of reaction times in Unit 5 using a noninformative conjugate normal-gamma distribution
  - g(θ, ρ) ∝ ρ⁻¹, i.e., Normal-Gamma(µ, k, α, β) with µ = 0, k = 0, α = −1/2, β = ∞
- Although we can find the posterior distribution exactly, we will use Gibbs sampling to illustrate the method
- Posterior distribution of Θ given Ρ=ρ and x1, ..., xn is normal with
  - Mean µ* = x̄ = 5.73 and precision ρ* = nρ
  - (posterior distribution from the normal conjugate prior with known variance and an uninformative prior on the mean, i.e., k = 0)
- Posterior distribution of Ρ given Θ=θ and x1, ..., xn is gamma with
  - Shape α* = n/2 − ½
  - Scale β* = (½ Σᵢ (xᵢ − θ)²)⁻¹
  - (known-mean formula from page 6 with β = ∞)
- We sample repeatedly from these distributions to find the Gibbs sampling estimate of the posterior distribution

13 Results: Gibbs Sampling for Reaction Times with Semi-Conjugate Prior
- 10,000 samples were drawn from the Gibbs sampler for the posterior distribution given the 30 reaction time observations:
  - 95% credible interval for Θ: [5.68, 5.78]
  - 95% credible interval for Σ: [0.102, 0.171]
[Figure: kernel density plots for the marginal posterior densities g(θ | x) and g(σ | x)]
- R code is available on Blackboard

14 Scatterplots for Gibbs Sampler Output
- Scatterplots for 10,000 samples from the Gibbs sampler for the posterior distribution given 30 observations on the first non-schizophrenic subject
  - Left: joint distribution of mean and precision
  - Right: joint distribution of mean and standard deviation
[Figure: two scatterplots of the Gibbs sampler output]

15 Comparison: Exact, Direct MC and Gibbs for Normal Model with Conjugate Prior
- The Unit 5 example used a normal-gamma (µ=0, k=0, α=−0.5, β=∞) conjugate prior distribution for (Θ, Ρ)
- The exact posterior distribution for (Θ, Ρ) given X is a normal-gamma (µ*=5.73, k*=30, α*=14.5, β*=4.30) distribution
  - Marginal distribution of Ρ is Gamma(α*, β*)
  - Marginal density for Σ can be found with a bit of calculus (see next page)
  - Marginal distribution of Θ is nonstandard t with center µ* and spread (k*α*β*)^(-1/2)
- We can approximate this distribution by simulating iid normal-gamma (µ*, k*, α*, β*) observations:
  - Simulate ρm from a Gamma(α*, β*) distribution, and simulate θm from a Normal(µ*, (k*ρm)^(-1/2)) distribution
- In this unit we approximated this distribution by Gibbs sampling:
  - Sample θm from a normal distribution with mean µ* and standard deviation (k*ρm-1)^(-1/2)
  - Sample ρm from a gamma distribution with shape α* and scale (β⁻¹ + ½ Σᵢ (xᵢ − θm)²)⁻¹
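As a sketch of the direct Monte Carlo approach described on this slide, the R code below draws iid (θ, ρ) pairs from the normal-gamma posterior using the hyperparameter values given above; β* is treated as a scale parameter, matching the course's Gamma parameterization, and the variable names are illustrative.

# Direct Monte Carlo from the normal-gamma posterior (illustrative sketch).
mu.star <- 5.73; k.star <- 30; alpha.star <- 14.5; beta.star <- 4.30
M <- 10000
rho.mc   <- rgamma(M, shape = alpha.star, scale = beta.star)           # precision draws
theta.mc <- rnorm(M, mean = mu.star, sd = 1 / sqrt(k.star * rho.mc))   # mean draws
sigma.mc <- 1 / sqrt(rho.mc)                                           # standard deviation
quantile(theta.mc, c(0.025, 0.975))   # approximate 95% interval for the mean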

16 Comparison of Posterior Density Estimates
- Plot compares exact and approximate posterior density functions for the mean of the reaction time distribution
  - Dashed green line shows the kernel density estimate from 10,000 direct Monte Carlo samples
  - Dotted blue line shows the kernel density estimate from 10,000 Gibbs samples
  - Solid red line shows the posterior t density with center µ* = 5.732, spread 1/(k*α*β*)^(1/2), and degrees of freedom 2α* = 29
- R code is available on Blackboard
[Figure: overlaid posterior density estimates for Θ — direct MC kernel density, Gibbs kernel density, and theoretical t density]

17 Gibbs Sampling in General
- Suppose we wish to estimate g(y | x) = g(y1, y2, ..., yp | x)
- Sometimes we cannot sample directly from g(y | x), but we can sample from each of the full conditional distributions g(yi | y1, ..., yi-1, yi+1, ..., yp, x)
- In such a case, we can apply Gibbs sampling as follows (see the generic sketch after this slide):
  - INITIALIZE: Choose initial parameter values y1(0), y2(0), ..., yp(0)
  - SAMPLE: For m = 1, ..., M
    - Sample y1(m) from g(y1 | y2(m-1), ..., yp(m-1), x)
    - Sample y2(m) from g(y2 | y1(m), y3(m-1), ..., yp(m-1), x)
    - ...
    - Sample yi(m) from g(yi | y1(m), ..., yi-1(m), yi+1(m-1), ..., yp(m-1), x)
    - ...
    - Sample yp(m) from g(yp | y1(m), ..., yp-1(m), x)
- This sampling process is a Markov chain, because the distribution of y1(m), ..., yp(m) is independent of the past given y1(m-1), ..., yp(m-1)
- Under fairly general conditions g(y | x) is the unique stationary distribution
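The cycle above can be written once as a generic routine that is handed the full-conditional samplers. The sketch below is illustrative only: 'conditionals' is assumed to be a list of p functions, each returning a single draw of its component given the current values of the others and the data.

# Generic Gibbs sampler skeleton (illustrative).
# conditionals[[i]](y, x) must return one draw of component i from its full
# conditional, given the current vector y and the data x.
gibbs <- function(conditionals, y0, x, M = 1000) {
  p <- length(y0)
  out <- matrix(NA_real_, nrow = M, ncol = p)
  y <- y0
  for (m in 1:M) {
    for (i in 1:p) {
      y[i] <- conditionals[[i]](y, x)   # update component i given all the others
    }
    out[m, ] <- y
  }
  out
}

For the semi-conjugate normal model, the list would contain the normal and gamma full-conditional samplers from slide 6.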

18 Markov Chain Monte Carlo
- General-purpose class of Monte Carlo algorithms originating in statistical physics
- Often applied to problems for which exact computation of the posterior distribution is intractable
- Goal: estimate a target distribution by Monte Carlo sampling
- Method:
  - Construct a Markov chain with a unique stationary distribution equal to the target distribution P(X)
  - Sample from this Markov chain
  - Estimate P(X) by the frequency of X in the sample (often discarding a burn-in period)
- Remarks:
  - MCMC takes correlated draws from a distribution constructed to have the target distribution as its stationary distribution
  - We use MCMC when we cannot take iid draws from the target distribution
  - The most common MCMC samplers are the Gibbs sampler (this unit) and the Metropolis-Hastings sampler (a generalization of the Gibbs sampler we will study later)

19 MCMC Computation
- MCMC is an active area of research and is widely used in applications
- Software for doing MCMC is available for free use
  - R packages
  - MATLAB tools
  - Python tools
  - Bayesian Inference Using Gibbs Sampling (BUGS): WinBUGS, JAGS
  - Stan
- Many people write custom code for specific applications

20 BUGS: Bayesian Inference Using Gibbs Sampling
- BUGS is:
  - A high-level language for defining Bayesian models
  - A library of sampling routines
  - An interface for running the sampler
  - An output processor for processing and interpreting results
- BUGS is intended to free the modeler to focus on the problem without worrying about details of inference implementation
- Incarnations of BUGS:
  - Classic BUGS: developed 1995, cross-platform, not maintained
  - WinBUGS: Windows-only GUI, creates coda files for input to R, latest news on blog is dated 2012
  - JAGS: cross-platform, interfaces directly to R with rjags and R2jags
- We will focus on JAGS because it is cross-platform, can be called from R, and is currently being maintained

21 Quick Guide to Installing and Running JAGS
- JAGS runs on Linux, Mac, and Windows and interfaces with R through the rjags and R2jags packages
- To install JAGS and set it up to be used from R:
  - If necessary, download and install R, and potentially a user interface to R like RStudio (see here for tips on getting started with R)
  - Download and install JAGS as per operating system requirements
  - Install additional R packages, e.g.:
    - rjags to interface with JAGS
    - R2jags to call JAGS from R (depends on rjags)
    - coda to process MCMC output
    - superdiag for MCMC convergence diagnostics
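The R-side setup amounts to a single install.packages call; note that the JAGS program itself must still be installed separately with its own installer.

# Install the R packages used in this unit (JAGS itself is installed separately).
install.packages(c("rjags", "R2jags", "coda", "superdiag"))
library(rjags)   # loading rjags reports an error if JAGS itself is not installed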

22 JAGS Example: Reaction Times (1 of 2)
1. Specify the model in the BUGS language and save it as a .jags file
2. Run the model from R
[Figure: the model file and the R code that runs it]
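The course's model file and R script are not reproduced in this transcript. The following is a minimal sketch of what steps 1 and 2 could look like with rjags, assuming a data vector x of log reaction times; the vague priors and variable names are illustrative choices, not the course's model.

# Minimal rjags sketch for the reaction-time model (illustrative, not the course file).
library(rjags)

model_string <- "
model {
  for (i in 1:n) {
    x[i] ~ dnorm(theta, rho)        # JAGS parameterizes the normal by precision
  }
  theta ~ dnorm(0, 1.0E-6)          # vague normal prior on the mean
  rho   ~ dgamma(0.001, 0.001)      # vague gamma prior on the precision
  sigma <- 1 / sqrt(rho)            # report the standard deviation as well
}"

jm <- jags.model(textConnection(model_string),
                 data = list(x = x, n = length(x)), n.chains = 3)
update(jm, 1000)                                      # burn-in
samp <- coda.samples(jm, c("theta", "sigma"), n.iter = 10000)
summary(samp)

Writing the model text to a .jags file and passing the file name to jags.model works the same way as the textConnection used here.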

23 JAGS Example: Reaction Times (2 of 2)
3. Analyze the output using the coda package
- It is common to thin the chain by keeping only every kth observation. This reduces the serial correlation.
- Summary table: Deviance = −2 log p(y | θ, ρ) is a measure of how well the observations fit the model
[Figure: coda summary table for the fitted model]
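Continuing the sketch above, thinning and summarizing with coda might look like this; the thinning interval of 10 is an arbitrary choice.

# Post-process the coda output 'samp' from the sketch on the previous slide.
library(coda)
thinned <- window(samp, thin = 10)   # keep every 10th draw to reduce serial correlation
summary(thinned)                     # posterior means, standard deviations, quantiles
effectiveSize(samp)                  # effective sample size of the unthinned chains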

24 Traceplots
[Figure: traceplots of the deviance and the model parameters]
- Deviance: D(θ, ρ) = −2 log P(x | θ, ρ)
  - Measures how well the model fits
  - Used to compare models

25 Kernel Density Plots
[Figure: kernel density plots of the posterior samples]

26 MCMC Diagnostics
- Direct Monte Carlo generates iid samples from the distribution of interest
- MCMC generates a Markov chain in which successive realizations are correlated
- When the target distribution is multimodal, an MCMC sampler can get stuck in regions near a local mode, yielding a poor approximation to the target distribution
- MCMC diagnostics help us to identify problems with the sampler and to assess whether we have collected enough samples to get a good approximation to the target distribution
- Some MCMC diagnostics:
  - Traceplot: plots a parameter against iteration number to help diagnose whether the sampler is getting stuck in a local region
  - Sample autocorrelation function (acf): evaluates correlation between elements of the sequence as a function of the time separation
  - Effective sample size: uses the acf to estimate the number of independent MC draws needed to achieve the same precision as the MCMC samples
  - Convergence diagnostics: run parallel MCMC chains and evaluate convergence using within- and between-chain variance

27 Example: Weaver Ants
- Body lengths of weaver ant workers show a bimodal distribution
  - Minor workers are a little more than half the size of major workers
  - There is very little overlap in the size distributions
- A mixture of two normal distributions provides a good model for the body length data
  - Minor workers (36%): mean 4.8 mm, std dev 0.36 mm
  - Major workers (64%): mean 7.7 mm, std dev 0.61 mm
[Figure: mixture density and sample frequencies versus body length (mm)]
- Weber, N.A. (1946). Dimorphism in the African Oecophylla worker and an anomaly (Hym.: Formicidae). Annals of the Entomological Society of America 39.

28 R Code for Ant Mixture Density Plot
[Figure: graphical model Z (ant type) → X (ant length)]

# Ants example
# Data from Weber, 1946, "Dimorphism in the African Oecophylla Worker and an Anomaly"
lengthcounts <- c(8,41,52,7,6,11,32,56,59,23,5)
lengths <- c(4.0,4.5,5.0,5.5,6.0,6.5,7.0,7.5,8.0,8.5,9.0)

# Mixture model
mu1 <- 4.8
sd1 <- 0.36
mu2 <- 7.7
sd2 <- 0.61
pr1 <- 0.36
pr2 <- (1-pr1)

xvals <- 350:1100/100
mixdens <- pr1*dnorm(xvals,mu1,sd1)+pr2*dnorm(xvals,mu2,sd2)

# Sample Frequencies and Mixture Density Plot
plot(xvals,mixdens,col="red",type="l",main="",ylab="",xlab="Body Length (mm)")
lines(lengths,lengthcounts/(300*.5),col="darkcyan",type="h",lwd=3)
legend(8.2,0.42,c("Mixture Density","Sample Frequency"),col=c("red","darkcyan"),lty=c(1,1),lwd=c(1,3))

29 Direct Monte Carlo for Ant Lengths
- The distribution for ant lengths can be simulated directly:
  - Simulate z = 1 with probability 0.36 and z = 2 with probability 0.64
  - If z = 1, simulate the length from Normal(4.8, 0.36)
  - If z = 2, simulate the length from Normal(7.7, 0.61)
- This produces an iid sample of ant lengths
[Figure: histogram and trace plot of 1000 direct MC simulated body lengths (mm)]

30 R Code for Direct MC

# Direct Monte Carlo
numSim <- 1000           # number of simulated ant lengths
xd <- NULL
zd <- NULL
for (i in 1:numSim) {
  zval <- rbinom(1,1,pr1)
  if (zval==1) zd[i] <- 1 else zd[i] <- 2
  if (zval==1) xd[i] <- rnorm(1,mu1,sd1) else xd[i] <- rnorm(1,mu2,sd2)
}

# Trace plot and Histogram
library(lattice)         # provides histogram()
plot(1:numSim,xd,main="",ylab="Simulated Body Length (mm)",xlab="Iteration")
histogram(xd,xlab="Direct MC Simulated Body Length (mm)",ylab="Frequency")

31 Gibbs Sampling for Ant Lengths
- This example illustrates problems that can occur with MCMC (you do not want to do this!)
- Gibbs sampling to simulate the ant length distribution:
  - Initialize length x0
  - For each k:
    - Calculate L1 = f(xk-1 | 4.8, 0.36) (normal density with mean 4.8, sd 0.36)
    - Calculate L2 = f(xk-1 | 7.7, 0.61) (normal density with mean 7.7, sd 0.61)
    - Calculate p1 = 0.36 L1 / (0.36 L1 + 0.64 L2)
    - Simulate zk = 1 or 2, with probabilities p1 and 1−p1 respectively
    - If zk = 1, simulate xk from Normal(4.8, 0.36)
    - If zk = 2, simulate xk from Normal(7.7, 0.61)
- This produces a sample of ant lengths
  - Consecutive observations are correlated
  - This is a Markov chain with stationary distribution equal to the target mixture distribution
  - Due to correlation between successive samples, this is a very inefficient way to simulate samples from the target distribution

32 Gibbs Sampling Results (1000 Samples)
[Figure: histogram and trace plot of 1000 Gibbs samples of simulated body length (mm)]

33 Gibbs Sampling Results (10,000 Samples)
[Figure: histogram and trace plot of 10,000 Gibbs samples of simulated body length (mm)]

34 Comparison: Gibbs Sampling for Parameters of Reaction Time Distribution
[Figure: histogram and trace plot of 10,000 Gibbs samples of the mean log reaction time]

35 R Code for Gibbs Sampling

# Gibbs sampling
xval <- mu1              # starting value
xg <- NULL
zg <- NULL
for (i in 1:numSim) {
  likz1 <- pr1*dnorm(xval,mu1,sd1)
  likz2 <- pr2*dnorm(xval,mu2,sd2)
  prz1 <- likz1/(likz1+likz2)
  zval <- rbinom(1,1,prz1)
  if (zval==1) zg[i] <- 1 else zg[i] <- 2
  if (zval==1) xval <- rnorm(1,mu1,sd1) else xval <- rnorm(1,mu2,sd2)
  xg[i] <- xval
}

# Trace plot and Histogram
plot(1:numSim,xg,main="",ylab="Simulated Body Length (mm)",xlab="Iteration")
histogram(xg,xlab="Gibbs Simulated Body Length (mm)",ylab="Frequency")

36 Autocorrelation Function
- The lag-k autocorrelation function (acf) estimates the correlation between observations k steps apart
- The lag-1 autocorrelation is far higher for the 10,000 Gibbs simulations than for the 1,000 direct MC simulations (compare the plots)
- R command: acf(x)
[Figure: ACF for 10,000 Gibbs samples and ACF for 1000 direct MC samples, plotted against lag]
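A small illustration using the ant-length vectors from the earlier scripts (xg from slide 35, xd from slide 30); the lag-1 value sits in the second element of the acf object because the first element is lag 0.

# Lag-1 autocorrelation of the Gibbs and direct MC ant-length samples.
acf_gibbs  <- acf(xg, plot = FALSE)
acf_direct <- acf(xd, plot = FALSE)
acf_gibbs$acf[2]    # lag-1 autocorrelation of the Gibbs chain (large)
acf_direct$acf[2]   # lag-1 autocorrelation of the iid direct MC draws (near zero)
acf(xg)             # the usual ACF plot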

37 Effective Sample Size
- We can use the autocorrelation function to estimate the number of independent draws we would need to give the same precision as our MCMC sample
- The effectiveSize function in R calculates such an estimate
  - To use this function, you must load the coda package
  - This package provides output analysis and diagnostics for MCMC samples
- Effective sample sizes for the reaction time and ant length simulations:
  - For 10,000 Gibbs samples of the reaction time mean and standard deviation, the effective size was 9650 for the mean and 9711 for the standard deviation
  - For 1000 direct MC samples of ant lengths, the effective size was 1000
  - For 10,000 Gibbs samples of ant lengths, the effective size was 28.5
- For the ant length simulation, we clearly prefer direct MC to Gibbs sampling
- For many problems, MCMC is necessary because we cannot sample directly from the target distribution of interest
  - We must be careful to draw enough samples for reliable inference
  - We must be on the lookout for problems such as multimodality
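A short illustration of the call, again on the ant-length vectors from the earlier scripts:

# Effective sample size of the ant-length simulations.
library(coda)
effectiveSize(mcmc(xg))   # Gibbs chain: far smaller than the 10,000 raw draws
effectiveSize(mcmc(xd))   # direct MC: close to the nominal number of iid draws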

38 Other MCMC Diagnostics
- Potential scale reduction (Gelman and Rubin, 1992)
  - Compares within and between variance components for multiple MCMC chains
  - Chains are started at overdispersed starting points; convergence occurs when the output of all chains is indistinguishable
  - Large values (above 1.1 or 1.2) suggest the chain has not converged
- Geweke (1992) z-score
  - Test for equality of means of the first and last part of a Markov chain
  - The burn-in period may be discarded, but more than half the chain should be retained
- Heidelberger and Welch (1983) diagnostic
  - Uses the Cramer-von Mises statistic to test the null hypothesis that the sampled values come from the stationary distribution
- Raftery and Lewis (1992) diagnostic
  - Use on a short pilot run of the chain
  - Provides information on the sample size for a chain with no correlation between successive samples
- These diagnostics are available as part of the coda package in R; see the package documentation
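The corresponding coda functions can be applied directly to multi-chain output such as the mcmc.list samp produced by the earlier rjags sketch:

# Convergence diagnostics from the coda package, applied to the mcmc.list 'samp'.
library(coda)
gelman.diag(samp)    # potential scale reduction factor (requires 2 or more chains)
geweke.diag(samp)    # z-scores comparing the early and late portions of each chain
heidel.diag(samp)    # stationarity and half-width tests
raftery.diag(samp)   # run-length recommendations based on a pilot run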

39 Caveat
- In hard problems it can be very difficult to assess whether the realizations of an MCMC sampler provide an adequate approximation to the target distribution
- Although MCMC diagnostics are helpful, it is possible for a chain to be stuck near a local optimum without this being detected by the diagnostics
- For highly multimodal problems, it may be that the best we can do is find a good local mode of the posterior distribution

40 References
- JAGS
- MCMC Diagnostics
  - Gelman, A. and Rubin, D. B. (1992). Inference from iterative simulation using multiple sequences. Statistical Science, 7.
  - Geweke, J. (1992). Evaluating the accuracy of sampling-based approaches to the calculation of posterior moments. In Bayesian Statistics 4, Bernardo, J. M., Berger, J. O., Dawid, A. P. and Smith, A. F. M. (eds.), Oxford: Oxford University Press.
  - Heidelberger, P. and Welch, P. D. (1983). Simulation run length control in the presence of an initial transient. Operations Research, 31.
  - Raftery, A. E. and Lewis, S. M. (1992). [Practical Markov Chain Monte Carlo]: Comment: One long run with diagnostics: Implementation strategies for Markov chain Monte Carlo. Statistical Science, 7(4).

41 Summary and Synthesis
- Gibbs sampling is a Markov chain Monte Carlo (MCMC) approximation method that can be applied to problems in which it is possible to sample from the full conditional distribution of each target variable given all the others
- Gibbs sampling can be applied in cases for which direct Monte Carlo is infeasible
- Like all MCMC methods, Gibbs sampling yields a correlated sequence of draws
- A number of MCMC diagnostic tools can help assess the severity of autocorrelation in the chain and evaluate whether enough samples have been collected
  - Although these diagnostics are useful, they can be deceptive
  - For very hard problems, it may be infeasible to obtain an accurate estimate of the posterior distribution, and good local optima are the best that can be done
- Gibbs sampling and other MCMC algorithms have proven very useful for otherwise intractable estimation problems
- Tools are available for running MCMC samplers from R
