Structured Markov Chain Monte Carlo


by Daniel J. Sargent, James S. Hodges, and Bradley P. Carlin

Section of Biostatistics, Mayo Clinic; Division of Biostatistics, School of Public Health, University of Minnesota

July 13, 1999

Abstract

In this paper we introduce a general method for Bayesian computing in richly-parameterized models, Structured Markov Chain Monte Carlo (SMCMC), that is based on a blocked hybrid of the Gibbs sampling and Metropolis-Hastings algorithms. SMCMC speeds algorithm convergence by using the structure that is present in the problem to suggest an appropriate Metropolis-Hastings candidate distribution. While the approach is easiest to describe for hierarchical normal linear models, we show that its extension to both non-normal and nonlinear cases is straightforward. After describing the method in detail we compare its performance (in terms of runtime and autocorrelation in the samples) to other existing methods, including the single-site updating Gibbs sampler available in the popular BUGS software package. Our results suggest significant improvements in convergence for many problems using SMCMC, as well as broad applicability of the method, including previously intractable hierarchical nonlinear model settings.

KEY WORDS: Blocking; Convergence acceleration; Gibbs sampling; Hierarchical model; Metropolis-Hastings algorithm.

1 Introduction

The past decade has seen an explosion in the use of advanced Bayesian methods, largely due to Markov chain Monte Carlo (MCMC) computational methods for estimating posterior distributions. These methods sample from a Markov chain whose stationary distribution is the posterior, producing a correlated sample from this distribution. Compared to quadrature and independent, identically distributed Monte Carlo approaches, MCMC methods are typically easier to implement and more broadly applicable, but they require a convergence "diagnosis," i.e., a decision as to when the samples may be safely viewed as draws from the chain's stationary distribution. While many authors (e.g., Tierney 1994; Roberts and Smith 1993; Roberts and Tweedie 1996) have investigated theoretical convergence properties of MCMC methods, assessing convergence in practice is problematic, because strictly speaking this can only be determined from a sampled chain of infinite length. Most practitioners use a variety of diagnostics to isolate convergence difficulties; see Cowles and Carlin (1996) or Mengersen et al. (1999) for reviews.

The difficulty of assessing convergence has led many authors to refocus on convergence acceleration, on the grounds that a sampler that traverses the parameter space more quickly is typically easier and safer to use. Acceleration methods include reparameterization (see e.g. Hills and Smith 1992; Gelfand, Sahu, and Carlin 1995; Gilks and Roberts 1996), auxiliary variables (Swendsen and Wang 1987; Besag and Green 1993; Damien et al. 1999; Mira and Tierney 1997), and multichain annealing or tempering (Geyer and Thompson 1995; Neal 1996). Gilks and Roberts (1996) give an overview of these and other acceleration methods.

Another approach to accelerating convergence is blocking, or updating multivariate blocks of (typically highly correlated) parameters. This contrasts with the most common approach to MCMC, in which each parameter is updated separately according to its conditional distribution given the data and every other parameter in the model.

The latter approach is used by BUGS (Spiegelhalter et al. 1995a), the most general and easy-to-use MCMC software package to date. BUGS has been used on a wide range of models (Spiegelhalter et al. 1995b), but high posterior correlations are a major hindrance to its univariate updating algorithms. Blocking, which can be done using Gibbs updates or multivariate Metropolis-Hastings (M-H) updates (Hastings 1970; Carlin and Louis 1996), often solves this problem. Recent work by Liu (1994) and Liu et al. (1994) confirms its good performance for a broad class of models, though Liu et al. (1994, Sec. 5) and Roberts and Sahu (1997) give examples where blocking actually slows a sampler's convergence. Unfortunately, success stories in blocking are often application-specific, and general rules have been hard to find.

In this paper, we introduce a simple, general, and flexible blocked MCMC method for a large class of richly-parameterized linear and nonlinear models. The method, which we term Structured MCMC (SMCMC), accelerates convergence for many of these models by blocking groups of similar parameters while taking full advantage of the posterior correlation structure induced by the model and data. For linear models, this structure yields closed-form full conditionals, which may be used as candidate distributions in Hastings independence chain updates. For nonlinear models, Gaussian distributions can be used to approximate the data's contribution to the posterior. When combined with Gaussian prior distributions, this yields effective approximations to the dependence structure in the posterior, which can in turn produce efficient candidate distributions for a Hastings algorithm. In this respect, our algorithm is reminiscent of a "linearization" approach used by Gelfand et al. (1997), though our approach applies much more broadly.

The remainder of the paper is organized as follows. Section 2 reviews and illustrates the basis of SMCMC, the constraint case formulation of Hodges (1998). Section 3 lays out the SMCMC algorithm for richly-parameterized linear models, while Section 4 considers the more difficult nonlinear case.

Section 5 gives three examples spanning a range of modeling and computational complexity, from a hierarchical linear model with normal errors to a hierarchical Cox proportional hazards model. Besides illustrating SMCMC's use, we compare its runtimes and effective sample sizes to those of standard algorithms, including BUGS, when such alternatives are feasible. Section 6 offers concluding remarks, as well as directions for future research.

2 Constraint case framework for richly-parameterized models

"Richly-parameterized models" includes hierarchical and other multilevel models (Lindley and Smith 1972; Bryk and Raudenbush 1992), dynamic models (West and Harrison 1989), variance component models (Searle et al. 1992), some spatial models (Besag et al. 1991), and others. We introduce the constraint case framework using hierarchical linear models. Hierarchical models are usually used for data structures with a natural hierarchical structure, e.g., students within classrooms within a school. In this example, a separate regression could be fit to the students in each classroom; each classroom's vector of regression parameters could then be treated as an outcome in a regression for the whole school. A hierarchical model is thus a hierarchy of simple models conforming to the hierarchy in the data. As no standard terminology for richly-parameterized models has evolved, we use the terminology in Hodges (1998), as outlined in the following example.

The simplest hierarchical model is the balanced one-way random-effects model. Suppose we have JK observations y_ij, i = 1, ..., K, j = 1, ..., J; the model assumes that for each i the y_ij have a common mean θ_i, and that the θ_i are in turn draws from a distribution with mean μ.

If we take ε_ij ~ N(0, σ²) and θ_i − μ ~ N(0, τ²) (where N(a, b) denotes the normal distribution with mean a and variance b), the model can be represented by:

\[ y_{ij} = \theta_i + \epsilon_{ij} \qquad (1) \]
\[ \theta_i = \mu + \delta_i \qquad (2) \]
\[ \mu = M + \xi \qquad (3) \]

where (3) represents a N(M, s²) prior for μ. A Bayesian analysis adds prior distributions for σ² and τ². If we rewrite (2) and (3) as

\[ 0 = -\theta_i + \mu + \delta_i \qquad (4) \]
\[ M = \mu - \xi \qquad (5) \]

the model can be expressed in the form of a linear model by combining (1), (4), and (5):

\[
\begin{bmatrix} y \\ 0 \\ M \end{bmatrix}
=
\begin{bmatrix} I_K \otimes 1_J & 0_{JK \times 1} \\ -I_K & 1_K \\ 0_{1 \times K} & 1 \end{bmatrix}
\begin{bmatrix} \theta \\ \mu \end{bmatrix}
+
\begin{bmatrix} \epsilon \\ \delta \\ -\xi \end{bmatrix}
\qquad (6)
\]

where y = {y_ij}, ε = {ε_ij}, θ = {θ_i}, and I_m, 1_m, and 0_{m×p} are the identity matrix, a column vector of ones, and a matrix of zeros of the specified dimensions, respectively. Hodges (1998) shows that Bayesian inferences based on (6) are identical to those drawn from the standard formulation.

A wide variety of richly-parameterized models can be re-expressed in a general form of which (6) is a special case, namely

\[
\begin{bmatrix} y \\ 0 \\ M \end{bmatrix}
=
\begin{bmatrix} X_1 & 0 \\ H_1 & H_2 \\ G_1 & G_2 \end{bmatrix}
\begin{bmatrix} \Theta_1 \\ \Theta_2 \end{bmatrix}
+
\begin{bmatrix} \epsilon \\ \delta \\ \xi \end{bmatrix}
\qquad (7)
\]

In more compact notation, (7) can be expressed as

\[ Y = X\Theta + E \qquad (8) \]

where X and Y are known, Θ is unknown, and E is an error term having Cov(E) = Γ, where Γ is block diagonal with blocks corresponding to the covariance matrices of ε, δ, and ξ, i.e.

\[
\mathrm{Cov}(E) = \Gamma =
\begin{bmatrix} \mathrm{Cov}(\epsilon) & 0 & 0 \\ 0 & \mathrm{Cov}(\delta) & 0 \\ 0 & 0 & \mathrm{Cov}(\xi) \end{bmatrix}
\qquad (9)
\]

Note that in our simple example (6), Γ is actually a full (not just block) diagonal matrix.

Following Hodges (1998), we use the term "data cases" to refer to rows of X, Y, and E in (8) corresponding to X_1 in (7). The data cases are the terms in the joint posterior into which the outcomes y enter directly. Rows of X, Y, and E corresponding to the H_i are "constraint cases." They place restrictions (stochastic constraints) on possible values of Θ_1. Finally, we call the rows of X, Y, and E corresponding to the G_i "prior cases," this label being reserved for cases with known (specified) error variances. Constraint cases being common to both Bayesian and non-Bayesian analyses based on this formulation, we will henceforth call it "the constraint case formulation" of a richly-parameterized model.
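To make the formulation concrete, the following is a minimal sketch (not code from the paper; the function name and arguments are hypothetical) that assembles Y, X, and Γ of (8)-(9) for the balanced one-way random-effects example (6), given fixed values of σ², τ², M, and s².

```python
import numpy as np

def one_way_constraint_case(y, M, s2, sigma2, tau2):
    """Stack the data, constraint, and prior cases of the balanced one-way
    random-effects model into Y = X*Theta + E with Cov(E) = Gamma.
    y is a K x J array of observations; Theta = (theta_1,...,theta_K, mu)."""
    K, J = y.shape
    # Data cases: y_ij = theta_i + eps_ij
    X_data = np.hstack([np.kron(np.eye(K), np.ones((J, 1))), np.zeros((K * J, 1))])
    # Constraint cases: 0 = -theta_i + mu + delta_i
    X_con = np.hstack([-np.eye(K), np.ones((K, 1))])
    # Prior case: M = mu - xi
    X_pri = np.hstack([np.zeros((1, K)), np.ones((1, 1))])

    X = np.vstack([X_data, X_con, X_pri])
    Y = np.concatenate([y.reshape(-1), np.zeros(K), [M]])
    # Gamma is fully diagonal here: Var(eps) = sigma2, Var(delta) = tau2, Var(xi) = s2
    Gamma = np.diag(np.concatenate([np.full(K * J, sigma2),
                                    np.full(K, tau2), [s2]]))
    return Y, X, Gamma
```

Stacking the cases this way is all that is needed before applying the SMCMC updates described in Section 3.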

Nonlinear models cannot be expressed in the general form (7). However, we can usually represent some levels of the hierarchy as linear models, typically the levels corresponding to equations (4) and (5). We defer further discussion on this point to Section 4.

3 Structured MCMC for richly-parameterized linear models

3.1 Normal errors

For models with normal errors for the data, constraint, and prior cases, Gibbs sampling and (8) provide a multivariate method for generating draws from the marginal posterior of Θ. From (8), Θ has conditional posterior density

\[ \Theta \mid Y, X, \Gamma \sim N\left( (X'\Gamma^{-1}X)^{-1} X'\Gamma^{-1}Y, \; (X'\Gamma^{-1}X)^{-1} \right). \qquad (10) \]

If we use conjugate priors for the variance components (i.e., gamma or Wishart priors for their inverses), then the full conditional for the variance components factors into a product of gamma and/or Wishart distributions. Convergence for such samplers is virtually immediate; see Hodges (1998, Sec. 5) and the associated discussion by Wakefield (1998).

The constraint case formulation works by using, in each MCMC draw, all the information in the model and data about the posterior correlations among the elements of Θ. The MCMC algorithm outlined in the preceding paragraph has three key features:

1. It samples Θ, that is, all of the mean-structure parameters, as a single block.

2. It does so using information about the conditional posterior covariance of Θ, supplied by (a) the mean structure of the richly-parameterized model as expressed in the constraint-case formulation (in (10), this structure is captured by X), and (b) the covariance structure of the richly-parameterized model as expressed in the constraint-case formulation (in (10), this structure is captured by Γ).

3. It does so using the conditional distribution in (10), for suitable definitions of X and Γ.

We define a SMCMC algorithm for a richly-parameterized model to be any algorithm having these three features. For linear models with normal errors, the suitable values of X and Γ are those in (8) and (9). These permit at least three different implementations of a SMCMC algorithm. The conceptually simplest implementation is the blocked Gibbs sampler discussed above. However, although this Gibbs implementation may converge in few iterations, computing may be slow due to the need to invert a large matrix, X'Γ⁻¹X, at each iteration.

A second SMCMC implementation uses (10) as a candidate distribution in a Hastings independence chain algorithm. Here we might update Γ occasionally during the algorithm's pre-convergence "burn-in" period. Such a pilot adaptation scheme (Gilks et al. 1998) is simple but forces us to use a single (possibly imperfect) Γ for the post-convergence samples that are summarized for posterior inference.

A third SMCMC implementation for linear models with normal errors updates Γ continually at the algorithm's regeneration times. Regeneration times are points in the algorithm which divide the Markov chain into sections whose sample paths are independent. This allows adaptation of Γ to occur repeatedly without disturbing the chain's stationary distribution, or the consistency of point estimates made from its sampled values. Gilks, Roberts, and Sahu (1998) provide a straightforward method for identifying regeneration times in Hastings independence chains of the sort used by SMCMC. The pilot and regenerative adaptive schemes typically sacrifice a small amount of efficiency for a substantial saving in computing time compared to the Gibbs implementation of a SMCMC algorithm. This possibility is illustrated in the examples of Section 5.
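The core update shared by these implementations can be sketched as follows. This is a minimal illustration, not code from the paper: `log_post` is a hypothetical function returning the log of the target conditional density of Θ given the current variance components and data, and `rng` is a NumPy random Generator. In the normal-errors case, (10) is the exact full conditional, so the Hastings ratio is identically one and every candidate is accepted; in the non-normal and nonlinear settings of Sections 3.2 and 4 the same draw serves as an independence-chain proposal.

```python
import numpy as np
from scipy import stats

def smcmc_theta_update(theta, Y, X, Gamma, log_post, rng):
    """One Hastings independence-chain update of the mean-structure block,
    using the structured normal distribution (10) as the proposal."""
    Ginv = np.linalg.inv(Gamma)
    prec = X.T @ Ginv @ X                     # X' Gamma^{-1} X
    cov = np.linalg.inv(prec)
    mean = cov @ X.T @ Ginv @ Y
    proposal = stats.multivariate_normal(mean, cov, allow_singular=True)

    cand = proposal.rvs(random_state=rng)
    # Hastings ratio for an independence proposal q:
    # min{1, [pi(cand) q(theta)] / [pi(theta) q(cand)]}; log_post is assumed
    # to give the (unnormalized) log target density.
    log_ratio = (log_post(cand) - log_post(theta)
                 + proposal.logpdf(theta) - proposal.logpdf(cand))
    if np.log(rng.uniform()) < log_ratio:
        return cand, True
    return theta, False
```

In this sketch, the Gibbs implementation simply rebuilds Γ (and hence the proposal) from freshly drawn variance components at every iteration, the pilot-adaptive implementation freezes Γ after burn-in, and the regenerative implementation refreshes Γ only at regeneration times.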

3.2 Non-normal errors

An advantage of MCMC algorithms is that they expand the class of models a statistician can feasibly consider. Carlin and Polson (1991) and Evans and Swartz (1995) use Gibbs sampling with latent variables for non-normal error distributions that can be written as normal scale mixtures, such as the t, double exponential, and exponential power distributions. Using such auxiliary variables, these non-normal models can be written in the normal-errors form (7), making these models accessible to SMCMC algorithms like those described in Section 3.1. As for models with normal errors, at least three implementations of a SMCMC algorithm are possible. When this use of auxiliary variables is not available, a SMCMC algorithm must use a more general M-H implementation, and an appropriate candidate distribution may be obtained by writing the model as in (7) and using (10) as the candidate distribution. Of course, (10) is not the full conditional distribution, but it can still be a good candidate distribution for a Hastings independence chain.

4 Structured MCMC for richly-parameterized nonlinear models

By "nonlinear models," we mean models in which the outcome y is not linearly related to the parameters in the mean structure. The data cases of such models do not fit the form (7), but the constraint and prior cases can often be written as a linear model; we consider such models here. Because of the nonlinearity in the data cases, it is less straightforward to specify the "suitable definitions of X and Γ" needed for a SMCMC algorithm than it was for the linear models of Section 3. This section discusses specification of suitable X and Γ in nonlinear settings.

To implement a SMCMC algorithm in this setting, we must supply a linear structure approximating the data's contribution to the posterior. We do so by constructing artificial outcome data ỹ with the property that E(ỹ | Θ_1) is roughly equal to Θ_1.

It is only necessary to have rough equality because we only use ỹ in a M-H implementation to generate candidate draws for Θ. Specifically, for a given ỹ with covariance matrix V, we can write the approximate linear model

\[
\begin{bmatrix} \tilde{y} \\ 0 \\ M \end{bmatrix}
=
\begin{bmatrix} I & 0 \\ H_1 & H_2 \\ G_1 & G_2 \end{bmatrix}
\begin{bmatrix} \Theta_1 \\ \Theta_2 \end{bmatrix}
+
\begin{bmatrix} \nu \\ \delta \\ \xi \end{bmatrix}
\qquad (11)
\]

where ν ~ N(0, V). Equation (11) supplies the necessary X matrix for a SMCMC algorithm, and the necessary Γ uses Cov(ν) = V and the appropriate covariance matrices for δ and ξ in (9).

The artificial outcome data ỹ can be supplied by two general strategies, producing (respectively) "unshrunk estimates" and "low-shrinkage estimates" of Θ_1. The former strategy makes use of crude parameter estimates from the nonlinear part of the model (without the prior or constraint cases), provided they are available. For example, in the pharmacokinetic example of Section 5.2, we have five to eight observations per subject, so a standard nonlinear regression will give point estimates and associated variance estimates for the two parameters per subject. Such crude estimates are often available when the ratio of data elements to parameters is large and the constraint and prior cases are included solely to induce shrinkage in estimation. We call such estimates "unshrunk estimates," which they literally are, and use them and an estimate of their variance as ỹ and V, respectively.

In certain problems, however, such unshrunk estimates may be unstable, or the model may not be identified without constraint and prior cases. This is typically true when the ratio of data elements to parameters approaches 1. For example, in Section 5.3 we fit a model with only one observation per element of Θ_1, and the entire purpose of fitting the model is to shrink the individual parameters cleverly (or so we hope). In such cases the constraint and prior cases are added to the model not only to encourage shrinkage, but to ensure identifiability.

To obtain ỹ and V in these cases, we recommend running a simple univariate Metropolis algorithm for a small number of iterations, using prior distributions for the variance components that insist on little shrinkage. That is, use just enough prior information on the variances to identify Θ_1, but not enough to induce much shrinkage. The posterior mean of Θ_1 approximated by this algorithm is a type of artificial data ỹ which, when used with the constraint and prior cases in (11), approximates the nonlinear fit well enough for the present purpose. We call the results of this univariate algorithm "low shrinkage" estimates, and use them and an estimate of their variance as ỹ and V, respectively.

Often, setting the low shrinkage priors is straightforward: one can consider ranges of measurable values for the elements of Θ_1 and use priors whose sole purpose is to force those elements to stay in that range. Moreover, definitive convergence is generally not necessary in this preliminary chain; in our experience rough estimates provided a M-H candidate density suitable for a SMCMC algorithm. These preliminary chains tend to converge quickly for two related reasons. First, the elements of Θ_1 have low posterior correlations by construction. Second, the marginal posterior for each element of Θ_1 is determined almost entirely by the data to which it is directly related. Still, a careful choice of prior in the preliminary algorithm may be necessary to produce acceptable M-H candidates for the second-stage SMCMC algorithm.

Finally, in the common case of generalized linear mixed models (GLMMs), a simple transformation can often improve matters further. Suppose our model for individual i is η_i = x_i'β + z_i, where z_i is an individual-specific random effect, and η = g(μ), where g is the link function relating the linear predictor to the expected value of a data point y.

As is detailed by Besag et al. (1995) in the binomial case (logit link) and by Waller et al. (1997) in the Poisson case (log link), reparameterizing from (z_i, β) to (η_i, β) produces exactly a normal/inverse Wishart model conditional on η_i (assuming the usual conjugate prior structures). While equation (11) still requires artificial data ỹ_i, these are naturally obtained as rough estimates of the individual parameters η_i, a second benefit provided by the transformed scale.

5 Numerical Illustrations

5.1 Linear model with normal errors

A recent AIDS clinical trial, Community Programs for Clinical Research on AIDS (CPCRA) trial 002 (Abrams et al. 1994), compared didanosine (ddI) and zalcitabine (ddC) in patients with HIV infection who were intolerant to or had treatment failure on zidovudine (ZDV, also known as AZT). For the present purpose, the response variable Y_i for patient i is the change in CD4 count between baseline and the two-month follow-up, for the K patients who had both measurements. Three binary predictor variables and their interactions are of interest: treatment group (x_{1i}: 1 = ddC, 0 = ddI), reason for eligibility (x_{2i}: 1 = failed ZDV, 0 = intolerant to ZDV), and baseline Karnofsky score (x_{3i}: 1 = score > 85, 0 = score ≤ 85; higher scores are better). Following Sargent and Hodges (1997), consider the saturated model for this 2³ factorial design:

\[ Y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \beta_3 x_{3i} + \beta_{12} x_{1i} x_{2i} + \beta_{13} x_{1i} x_{3i} + \beta_{23} x_{2i} x_{3i} + \beta_{123} x_{1i} x_{2i} x_{3i} + \epsilon_i \qquad (12) \]

where ε_i ~ iid N(0, σ²), i = 1, ..., K. We use flat priors for the intercept β_0 and main effects (β_1, β_2, β_3), but place hierarchical constraints on the two- and three-way interaction terms, namely β_l ~ N(0, τ_l²) for l = 12, 13, 23, and 123. This linear model is easily written in the form (7).

Adopting a vague G(0.0001, 0.0001) prior for 1/σ² (i.e., having mean 1 and variance 10⁴) and independent G(1, 1) priors for the h_l ≡ 1/τ_l², l = 12, 13, 23, 123, completes the specification. Our model is fully conjugate, so BUGS handles it easily. Besides the univariate Gibbs sampler, BUGS allows blocking of the fixed effects (β_0, β_1, β_2, β_3) into a single vector, for which we specified a multivariate normal prior having near-zero precision. BUGS allows no further blocking. For a comparison based on speed alone, we also ran a univariate Gibbs sampler coded in Fortran.

For comparison, equation (10) yields a SMCMC implementation that alternately samples from the multivariate normal full conditional p(Θ | h, σ², y) and the gamma full conditionals of 1/σ² and the h_l. As previously mentioned, this Gibbs sampler is a SMCMC implementation that updates the candidate Γ at every iteration. We also consider a pilot-adaptive SMCMC implementation, in which we update Γ at iterations 1, ..., 10, then every 10th iteration until iteration 1000, and then use the value of Γ at iteration 1000 in the "production" run, discarding the 1000-iteration burn-in period. In this chain, p(Θ | h̃, σ̃², y) is used as a candidate distribution for Θ in a Hastings independence subchain, where h̃ and σ̃² are the components of Γ at iteration 1000.

The implementations outlined above illustrate the MCMC trade-off between short runtimes and low autocorrelation in the generated samples. An implementation that is fast per iteration may produce highly autocorrelated samples, which are less useful for posterior summarization. To make a fair comparison among the various implementations, we use the notion of effective sample size, or ESS (Kass et al. 1998, p. 99). ESS is defined for each parameter as the number of MCMC samples drawn, N, divided by the parameter's so-called autocorrelation time, κ = 1 + 2 Σ_{k≥1} ρ(k), where ρ(k) is the autocorrelation at lag k. We estimate κ using sample autocorrelations estimated from the MCMC chain, cutting off the summation when these drop below 0.1 in magnitude. The comparisons between chains presented here are not sensitive to the method of calculating the ESS.
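As a concrete illustration (ours, not from the paper), the ESS just described can be computed from a sampled chain along the following lines.

```python
import numpy as np

def effective_sample_size(chain):
    """ESS = N / kappa, with kappa = 1 + 2 * sum_k rho(k); the sum is cut off
    at the first lag whose sample autocorrelation drops below 0.1 in magnitude."""
    x = np.asarray(chain, dtype=float)
    n = len(x)
    x = x - x.mean()
    var = x @ x / n
    kappa = 1.0
    for k in range(1, n):
        rho = (x[:-k] @ x[k:]) / (n * var)
        if abs(rho) < 0.1:
            break
        kappa += 2.0 * rho
    return n / kappa
```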

For each chain, we obtain N = 5000 post-burn-in iterations, compute ESS values, and divide by the chain's runtime in seconds. The resulting "effective samples generated per second" (ES/sec) provides a fair basis for comparing chains.

[Table 1: Effective sample sizes (ESS) and effective samples drawn per second (ES/sec) for the CPCRA 002 example, for five implementations: BUGS (univariate), BUGS (partial blocking), univariate Gibbs (Fortran), SMCMC Gibbs (full blocking), and SMCMC (pilot adaptation).]

Table 1 shows the results for the five implementations. While we caution against putting too much stock in crude runtimes (which are machine-, compiler-, and programmer-dependent), we recorded the runtime in seconds needed by each implementation to obtain 5000 post-burn-in samples on an Ultra Sparc workstation; univariate BUGS took 3.4 seconds, and the fully blocked Gibbs SMCMC implementation was the slowest of the five. Pilot-adaptive SMCMC dominates the others in terms of generation rate (ES/sec), and the advantage is substantial. The BUGS chains are fast but produce highly autocorrelated samples, hurting their ESS. By contrast, SMCMC implemented as a fully blocked Gibbs sampler produces essentially uncorrelated draws, but at the cost of long runtimes because of repeated matrix inversions.

We have tried other SMCMC implementations in this setting, including updating Γ less frequently (say, the first 10 iterations, then only every 100th iteration until iteration 1000) or during a shorter pilot period (say, the first 100 iterations).

Both of these options produce results similar to those in the final column of Table 1, with little effect on speed. An alternative method for this problem would be to estimate the precision parameters via pilot adaptation, and then use plain importance sampling for Θ, using the structured distribution, equation (10), as the importance distribution.

As a cautionary note, we have experimented with more extreme priors for the h_l (e.g., a gamma having mean 1000 and standard deviation 10,000) and found that a SMCMC implementation with infrequent Γ updates during the pilot period performs less well. With this extreme prior, the posteriors for the h_l are very skewed and highly dispersed, so the values of the h_l at each update are not in any way typical of draws from the posterior; hence convergence of these chains suffers. We discuss this issue further in Section 6.

5.2 Nonlinear pharmacokinetic model

Wakefield et al. (1994) presented the data in Figure 1, plasma concentrations Y_ij of the drug Cadralazine at various times t_ij after administration of a single dose of 30 mg, in 10 heart failure patients. Here i = 1, ..., 10 indexes patients, while j = 1, ..., n_i indexes observations within patient, 5 ≤ n_i ≤ 8. Wakefield et al. (1994) suggested a "one-compartment" pharmacokinetic model in which the mean plasma concentration η_i(t_ij) at time t_ij is

\[ \eta_i(t_{ij}) = 30\, \phi_i^{-1} \exp(-\psi_i t_{ij} / \phi_i). \]

Later unpublished work by these authors suggests this model is best fit on the log scale. Defining Z_ij ≡ log Y_ij, the model for Z_ij is then

\[ Z_{ij} = \log 30 - a_i - \exp(b_i - a_i)\, t_{ij} + \epsilon_{ij} \]

where a_i = log φ_i, b_i = log ψ_i, and ε_ij ~ ind N(0, σ_i²).
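In this example the "unshrunk estimates" strategy of Section 4 applies directly: each patient's (a_i, b_i) can be estimated by a separate nonlinear regression on the log scale, and those estimates and their covariances supply ỹ and V for (11). The paper obtains these fits with SAS PROC NLIN; the sketch below (ours, with hypothetical names and arbitrary starting values) does the analogous computation with SciPy.

```python
import numpy as np
from scipy.optimize import curve_fit
from scipy.linalg import block_diag

def log_conc(t, a, b):
    """Log-scale one-compartment mean: log(30) - a - exp(b - a) * t."""
    return np.log(30.0) - a - np.exp(b - a) * t

def unshrunk_estimates(times, log_conc_obs, start=(1.0, -1.0)):
    """Fit each patient separately by nonlinear least squares to get unshrunk
    estimates (a_hat_i, b_hat_i) and their estimated covariances; stacked,
    these play the roles of y-tilde and V in (11).  `times` and `log_conc_obs`
    are lists with one array per patient; `start` holds arbitrary,
    illustrative starting values for the optimizer."""
    y_tilde, v_blocks = [], []
    for t, z in zip(times, log_conc_obs):
        est, cov = curve_fit(log_conc, t, z, p0=list(start))
        y_tilde.append(est)
        v_blocks.append(cov)
    # Patients are fit independently, so V is block diagonal in 2x2 blocks.
    return np.concatenate(y_tilde), block_diag(*v_blocks)
```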

[Figure 1: Cadralazine concentration pharmacokinetic data: (a) original scale; (b) log scale.]

Following the analysis by Wakefield et al. (1994), we assume the subject-specific effects θ_i ≡ (a_i, b_i)' are i.i.d. N(μ, Σ), where μ = (μ_a, μ_b)'. These authors recommend, and we use, the usual conjugate priors, namely μ ~ N(μ_0, C), gamma priors for the σ_i^{-2}, and Σ^{-1} ~ Wishart((ρR)^{-1}, ρ). Our SMCMC implementations for these data used prior values recommended by Wakefield et al. (1994), including μ_0 = 0, C^{-1} = Diag(0.01, 0.01), and R = Diag(0.04, 0.04).

Any MCMC algorithm for this model needs a Metropolis-Hastings step for the random effects θ_i because their full conditional distributions are neither conjugate nor necessarily log-concave. BUGS Version 0.6 for UNIX allows such steps, using Metropolis rejection from a grid-based proposal distribution (Ritter and Tanner 1992); see Spiegelhalter et al. (1996). This form of proposal requires us to place bounds on the individual elements of θ_i, but we can create a BUGS specification quite close to the MCMC specification above by using the product formulation of the bivariate normal, namely a_i ~ N(μ_a, σ_a²) I(L_a, U_a) and b_i | a_i ~ N(k_0 + k_1(a_i − c), σ_b²) I(L_b, U_b), where (L_a, U_a) and (L_b, U_b) are broad truncation regions to enable the grid-based Metropolis algorithm, and c is a constant that roughly centers the a_i's (thus reducing correlation between the intercept k_0 and slope k_1).

In this formulation, we approximate the SMCMC specification by taking G(0.0001, 0.0001) priors for the σ_i^{-2}, N(0, 0.0001) priors for μ_a, k_0, and k_1, and G(1, 0.04) priors for σ_a^{-2} and σ_b^{-2}.

We considered four MCMC implementations: BUGS Version 0.6, a standard univariate Metropolis implementation, and two SMCMC implementations. The first SMCMC implementation used unshrunk estimates of the θ_i obtained by fitting separate models to each patient using SAS PROC NLIN. We used these unshrunk estimates and estimates of their variances as ỹ and V in (11), and implemented SMCMC as described in Section 4. The second SMCMC implementation used low-shrinkage estimates from a preliminary univariate Metropolis run. Low-shrinkage estimates are not required here, because we have at least 5 observations per patient; we include them for illustration and comparison. To obtain low-shrinkage estimates, we used a univariate Metropolis chain as above except that we fixed Σ, with its elements determined by a single value v, and left μ to be estimated. The plausible ranges of the a_i and b_i span only a few units, suggesting that v = 1 might be appropriate. We ran a univariate Metropolis algorithm with this v for 5000 iterations to produce low-shrinkage estimates for the (a_i, b_i) pairs, using the posterior means of a_i and b_i for ỹ and the posterior covariance matrix of the (a_i, b_i) pairs for V in equation (11). The two SMCMC implementations updated Γ at iterations 1, ..., 10, then at every 10th iteration until iteration 1000, with the value of Γ at iteration 1000 used to set the candidate distribution for the production run.

Table 2 shows the ESS and effective samples per second for the four chains. Each chain was run for 5000 post-burn-in iterations.

[Table 2: Effective sample sizes (ESS) and effective samples drawn per second (ES/s) for MCMC algorithms for the PK data: BUGS V0.6 (univariate), Metropolis (univariate), SMCMC with unshrunk ỹ (pilot adaptation), and SMCMC with low-shrinkage ỹ (pilot adaptation).]

SMCMC with pilot adaptation does better than all the competitors except for the b parameters. The univariate Metropolis algorithm took 8.5 seconds and SMCMC with pilot adaptation took 1.0 seconds to produce the 5000 post-burn-in samples; the BUGS runtime was longer than SMCMC's. The SAS PROC NLIN run (required for SMCMC with unshrunk ỹ) used a negligible amount of CPU time (approximately 0.3 CPU seconds). These results again highlight the trade-off between the efficiency and speed of each iteration: BUGS produced less correlated samples (larger ESS), but when we account for runtimes, SMCMC's greater speed prevails.

We see at least two reasons why SMCMC did not have a larger advantage compared to BUGS in this problem. First, the 20-dimensional joint posterior of the {(a_i, b_i)} may be non-normal. However, a thorough investigation of this distribution revealed no gross departures from normality. Thus, it appears that SMCMC's modest improvement over BUGS arises because SMCMC exploits posterior correlation, but the constraint cases induce little correlation in the posteriors of the mean-structure parameters.

As Figure 1 shows, each patient (except #8) has data more or less on a straight line on the log scale, so neither the a_i nor the b_i will shrink much. There is high correlation within the (a_i, b_i) pairs, and SMCMC and BUGS provide substantial benefit in ESS over the univariate Metropolis algorithm by using this correlation. But once BUGS' runtime is taken into account, the SMCMC algorithm based on the unshrunk ỹ emerges as the clear winner (ES/s columns in Table 2).

5.3 Cox model with time-varying coefficients

Kalbfleisch and Prentice (1980) gave data from a clinical trial in which 137 men with cancer were randomized between experimental chemotherapy and a standard treatment; five covariates were measured. The endpoint was time to death in days; there were K unique death times. Previous analyses (Lin 1991; Grambsch and Therneau 1994) showed no significant treatment effect, but strong evidence of a non-proportional effect of one covariate, Karnofsky score, which measures a patient's functional status, ranging from 100 (normal) to zero (dead). We revisit these data to demonstrate SMCMC in a relatively difficult problem, one that BUGS cannot handle.

In the Cox proportional hazards model (Cox 1972), covariates have constant multiplicative effects on the hazard function λ(t; x), where the hazard function for an individual's death time T_i, given the individual's covariate vector x_i, is given by

\[ \lambda(t; x_i) = \lim_{\Delta t \to 0} \frac{\Pr[\,T_i \in (t, t + \Delta t) \mid T_i \ge t, x_i\,]}{\Delta t}. \]

Consider instead a model that allows a covariate's coefficient to take a different value at each unique event time, but which smooths these coefficients through a simple random-walk smoother (Sargent 1997). This smoother assumes that the difference between the coefficient values at adjacent event times t_i and t_{i−1}, i.e., β_i − β_{i−1}, is drawn from a distribution with mean zero and variance τ²(t_i − t_{i−1}).

Specifically, for patient j at time t_i, the model is

\[ \lambda(t_i; x_j) = \lambda_0(t_i) \exp(x_j \beta_i), \qquad \beta_i - \beta_{i-1} \sim N\left(0,\; \tau^2 (t_i - t_{i-1})\right), \]

where x_j is the Karnofsky score for patient j and β_i is specific to t_i. This smoother induces very high posterior correlations between adjacent β_i's, causing great difficulty for univariate MCMC implementations.

Sargent (1997) analyzed these data using a Cox model with a time-varying coefficient for Karnofsky score, i.e., with a separate β_i at each of the K unique death times. He used a gamma prior for h = 1/τ² with mean and standard deviation 10⁵, and a univariate Metropolis algorithm. This algorithm converged very slowly: posterior summaries were based on three chains of 50,000 iterations each, which required approximately 3.5 hours of computer time. The long chains were necessitated by extremely slow mixing; for example, the median lag 8 sample autocorrelation over the β_i remained very large. Attempts to use a multivariate Metropolis algorithm, using a candidate distribution with covariance matrix based on an estimate of the within-chain covariance obtained from a pilot adaptive scheme, had no success due to extremely high posterior correlations among the β_i. Even when the chains were started in the needle-thin ellipsoid of high probability, computations with the covariance matrix were unstable because it was nearly singular.

To make a SMCMC algorithm, we need artificial data ỹ_i with expectations that are roughly β_i. Unshrunk estimates are not available here: if we fit separate β_i for each unique event time without smoothing, many β_i have an unbounded posterior mode. Instead, we obtained low-shrinkage estimates by running a small number of iterations of a univariate algorithm for the model that smooths the β_i, but using a prior for τ² that forces little smoothing.

We then used the posterior means of the β_i from this algorithm as our low-shrinkage estimates. This avoids the usual problem of univariate algorithms because the low-smoothing prior forces low posterior correlations among the β_i. In this case, we used a gamma prior for h with mean and standard deviation 1/10. (Means and standard deviations smaller by up to 4 orders of magnitude do not change the performance of our SMCMC implementations.) Define ỹ_i and σ_i² to be the posterior mean and variance for each β_i from this univariate low-shrinkage run. With these low-shrinkage estimates, we used this linear model in a SMCMC algorithm:

\[
\begin{bmatrix} \tilde{y} \\ 0_{(K-1)} \end{bmatrix}
=
\begin{bmatrix} I_{K \times K} \\ D \end{bmatrix}
\beta + E
\qquad (13)
\]

where β = (β_1, ..., β_K)', D is the (K−1) × K matrix whose ith row has a −1 in column i and a 1 in column i+1 (so that Dβ collects the adjacent differences β_{i+1} − β_i), and E is a (2K−1)-vector with mean zero and diagonal covariance matrix, the first K diagonal elements being σ_i² and the last K−1 elements being τ²(t_{i+1} − t_i). In the language of Section 2, equation (13) is a special case of (7) with no prior cases.
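A short sketch (ours, with hypothetical names) of how the pieces of (13) fit together: `y_tilde` and `s2` are the low-shrinkage posterior means and variances just defined, `event_times` are the K unique death times, and `tau2` is the current value of τ².

```python
import numpy as np

def cox_tvc_design(y_tilde, s2, event_times, tau2):
    """Assemble Y, X, Gamma for equation (13): artificial data y-tilde with
    variances s2 over an identity block, plus K-1 random-walk constraint
    cases whose variances are tau2 * (t_{i+1} - t_i)."""
    K = len(y_tilde)
    D = np.zeros((K - 1, K))
    for i in range(K - 1):
        D[i, i], D[i, i + 1] = -1.0, 1.0   # encodes beta_{i+1} - beta_i
    X = np.vstack([np.eye(K), D])
    Y = np.concatenate([np.asarray(y_tilde, dtype=float), np.zeros(K - 1)])
    gaps = np.diff(np.asarray(event_times, dtype=float))
    Gamma = np.diag(np.concatenate([np.asarray(s2, dtype=float), tau2 * gaps]))
    return Y, X, Gamma
```

The resulting Y, X, and Γ can then be plugged into the candidate distribution (10), exactly as in the linear case.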

Figure 2 displays ES/sec for selected parameters for three MCMC implementations using this model. The first algorithm is the simple univariate Metropolis algorithm mentioned above; we note that to obtain convergence with this algorithm a run of 50,000 iterations was necessary, which in the figure we have normalized to 5000 iterations by dividing the ESS by 10. In an attempt to improve on these results, we considered three SMCMC implementations. First, we used an implementation that used the value of Γ at each iteration as the basis for the M-H proposal distribution. A chain of 5000 iterations of this implementation is shown in Figure 2 with the label "SMCMC update every iteration."

[Figure 2: Effective samples drawn per second (ES/sec) for three algorithms for the time-varying coefficient Cox example.]

This chain converged quickly (i.e., it had excellent convergence diagnostics within a few thousand iterations); however, its run time was elevated because the proposal distribution was re-computed at each iteration. In an attempt to speed run time, we tried other implementations to avoid this re-computation. The first attempt was a pilot-adaptive implementation, updating Γ every 10th iteration for the first 1000 iterations, but this chain converged poorly. In this problem, the posterior standard deviation of h = 1/τ² is sufficiently large (on the order of 10⁵) that no single value of Γ gives consistently good M-H proposals. The second SMCMC implementation used the adaptive approach of Gilks et al. (1998), where the proposal distribution (10) is updated at the chain's regeneration times with the current value of Γ. This chain converged moderately quickly (apparent convergence after 5000 iterations), with regenerations occurring frequently enough to allow Γ to be refreshed often. The results from this chain are shown in Figure 2 with the label "SMCMC adaptive."

The run times per 5000 iterations for the three chains shown in Figure 2 were: univariate Metropolis, roughly 7 minutes (about 70 minutes for 50,000 iterations, consistent with the 3.5 hours reported above for three such chains); SMCMC updating every iteration, 45 minutes; and SMCMC adaptive, 5 minutes. Based on the data in Figure 2, both SMCMC implementations provided substantial improvements in ES/sec compared to the univariate algorithm.

6 Conclusions

We have presented a general method of MCMC computing for richly-parameterized models. Based on Hodges (1998), SMCMC uses linear structure implied by the model and data to suggest multivariate candidate distributions for a M-H algorithm. This speeds computing by avoiding problems created by high posterior correlations and by requiring fewer likelihood evaluations. We have demonstrated our approach with both linear and nonlinear examples. While these examples are far too complicated for analytical evaluation of convergence, investigations in simpler settings are available and instructive; see e.g. Liu et al. (1994, Sec. 5) and Roberts and Sahu (1997).

SMCMC improves convergence by blocking parameters with high posterior correlations, specifically by exploiting structure induced in the mean-structure parameters by the constraint cases. In our experience, in linear models this structure is ample enough that SMCMC pays dividends. In nonlinear cases, however, matters are less clear, and we have found examples in which SMCMC is no better than univariate algorithms. The difficulty appears to have two sources. First, SMCMC needs artificial data cases, either directly from individual-level data or from a preliminary M-H algorithm with a low-shrinkage prior. Poor selection of these artificial data cases can hamper a SMCMC algorithm. For example, in the pharmacokinetic problem of Section 5.2, we originally created low-shrinkage estimates by running a univariate M-H algorithm with a Σ fixed to have large variances but zero correlation.

This choice restricted the very posterior correlations (in this case, between a_i and b_i for each i) that SMCMC exploits. The resulting SMCMC implementation was no better than a univariate algorithm, but a better choice of low-shrinkage estimates, shown in Section 5.2, led to a SMCMC implementation with superior ESS compared to the univariate algorithm.

A second example where SMCMC may provide no advantage is when Θ_1 is of high dimension and V is not diagonal. In these cases, equation (11) can require the user to manipulate huge matrices. In one longitudinal binary-outcome problem, Θ_1 had 541 elements and SMCMC was actually slower than the univariate alternative because of the matrix manipulations. In some cases, it may be possible to avoid this problem by using special structure in the design and/or covariance matrices.

If a hierarchical model induces little shrinkage, SMCMC will have little structure to exploit in creating candidate densities. This situation can arise either because the constraint cases cannot induce exploitable structure, or more often because the constraint-case variances happen to be so large that the constraint cases induce little shrinkage. This happened in a frailty model we fit, treating the log frailties as a random effect: convergence of a SMCMC implementation took as many iterations as the univariate alternative, although run time was shorter because the SMCMC implementation made fewer likelihood evaluations. As a general approach, we recommend first attempting to use BUGS or a simple univariate algorithm to simulate draws from the joint posterior. If these approaches suffer from poor convergence or lengthy runtimes, SMCMC may provide substantial improvement.

Two apparently reasonable suggestions do not improve SMCMC algorithms. First, one might consider replacing the mean of (10) with the current location of the chain, i.e., a pure Metropolis instead of an M-H approach. This strategy radically slowed convergence in several examples, including those in Sections 5.1 and 5.3. This might be because convergence of the pure Metropolis form relies solely on the mixing of the Markov chain, while the M-H form is similar to importance sampling when the proposal and target densities are nearly alike.

Secondly, one might suggest using the mean of the covariance matrix obtained from a pilot adaptive scheme as the covariance matrix in (10). Again, this suggestion has proven detrimental to convergence in several of the examples considered here. The failure of both of these suggestions may also arise because in essence SMCMC works by mimicking a Gibbs sampler; substituting the chain's current location for the mean of (10), or using the covariance matrix from a pilot adaptive scheme, weakens this analogy.

Pilot-adaptive SMCMC implementations may be unsatisfactory in cases where the posterior distributions of the variance components are highly dispersed. Thus, such implementations are only recommended in cases where the analyst can feel comfortable that the values for the components of Γ at a given iteration are in some sense typical of the posterior distribution. When this is not the case, an adaptive implementation using regeneration points (Gilks et al. 1998) has proven helpful.

In summary, SMCMC appears very competitive with univariate algorithms when they are available, and can offer a feasible solution in harder problems. We have had success using SMCMC algorithms in some problems that were otherwise infeasible, such as complex hierarchical proportional hazards models. Further investigation of SMCMC and other MCMC blocking methods is warranted, so more experience can be gained in choosing good strategies in particular situations and in designing general-purpose SMCMC algorithms for large classes of problems.

Acknowledgments

The second author was supported in part by the Minnesota Oral Health Clinical Research Center, NIH/NIDR P30-DE093, while the third author was supported in part by National Institute of Allergy and Infectious Diseases (NIAID) Grant R01-AI419. Jon Wakefield graciously supplied us with the data used in Section 5.2.

We are grateful for the assistance of three diligent referees whose comments led to substantial improvements in the manuscript.

References

Abrams, D.I., Goldman, A.I., Launer, C., et al. (1994), "Comparative Trial of Didanosine and Zalcitabine in Patients with Human Immunodeficiency Virus Infection Who Are Intolerant of or Have Failed Zidovudine Therapy," New England Journal of Medicine, 330.

Besag, J. and Green, P.J. (1993), "Spatial Statistics and Bayesian Computation" (with discussion), Journal of the Royal Statistical Society, Series B, 55, 25-37.

Besag, J., Green, P., Higdon, D., and Mengersen, K. (1995), "Bayesian Computation and Stochastic Systems" (with discussion), Statistical Science, 10, 3-66.

Besag, J., York, J.C., and Mollié, A. (1991), "Bayesian Image Restoration, with Two Applications in Spatial Statistics" (with discussion), Annals of the Institute of Statistical Mathematics, 43, 1-59.

Bryk, A.S. and Raudenbush, S.W. (1992), Hierarchical Linear Models: Applications and Data Analysis Methods, Newbury Park, CA: Sage Publications.

Carlin, B.P. and Louis, T.A. (1996), Bayes and Empirical Bayes Methods for Data Analysis, London: Chapman and Hall.

Carlin, B.P. and Polson, N.G. (1991), "Inference for Nonconjugate Bayesian Models Using the Gibbs Sampler," Canadian Journal of Statistics, 19, 399-405.

Cowles, M.K. and Carlin, B.P. (1996), "Markov Chain Monte Carlo Convergence Diagnostics: A Comparative Review," Journal of the American Statistical Association, 91, 883-904.

Cox, D.R. (1972), "Regression Models and Life-Tables" (with discussion), Journal of the Royal Statistical Society, Series B, 34, 187-220.

Damien, P., Wakefield, J., and Walker, S. (1999), "Gibbs Sampling for Bayesian Nonconjugate and Hierarchical Models Using Auxiliary Variables," to appear, Journal of the Royal Statistical Society, Series B.

Evans, M. and Swartz, T. (1995), "Methods for Approximating Integrals in Statistics with Special Emphasis on Bayesian Integration Problems," Statistical Science, 10, 254-272.

Gelfand, A.E., Mallick, B.K., and Polasek, W. (1997), "Broken Biological Size Relationships: A Truncated Semiparametric Regression Approach with Measurement Error," Journal of the American Statistical Association, 92, 836-845.

Gelfand, A.E. and Sahu, S.K. (1994), "On Markov Chain Monte Carlo Acceleration," Journal of Computational and Graphical Statistics, 3, 261-276.

Gelfand, A.E., Sahu, S.K., and Carlin, B.P. (1995), "Efficient Parametrisations for Normal Linear Mixed Models," Biometrika, 82, 479-488.

Gelman, A. and Rubin, D.B. (1992), "Inference from Iterative Simulation Using Multiple Sequences" (with discussion), Statistical Science, 7, 457-511.

Geyer, C.J. and Thompson, E.A. (1995), "Annealing Markov Chain Monte Carlo with Applications to Ancestral Inference," Journal of the American Statistical Association, 90, 909-920.

Gilks, W.R. and Roberts, G.O. (1996), "Strategies for Improving MCMC," in Markov Chain Monte Carlo in Practice, eds. W.R. Gilks, S. Richardson, and D.J. Spiegelhalter, London: Chapman and Hall, pp. 89-114.

Gilks, W.R., Roberts, G.O., and Sahu, S.K. (1998), "Adaptive Markov Chain Monte Carlo through Regeneration," Journal of the American Statistical Association, 93, 1045-1054.

Grambsch, P.M. and Therneau, T.M. (1994), "Proportional Hazards Tests and Diagnostics Based on Weighted Residuals," Biometrika, 81, 515-526.

Hastings, W.K. (1970), "Monte Carlo Sampling Methods Using Markov Chains and Their Applications," Biometrika, 57, 97-109.

Hills, S.E. and Smith, A.F.M. (1992), "Parametrization Issues in Bayesian Inference," in Bayesian Statistics 4, eds. J.M. Bernardo, J.O. Berger, A.P. Dawid, and A.F.M. Smith, Oxford: Oxford University Press.

Hodges, J.S. (1998), "Some Algebra and Geometry for Hierarchical Models, Applied to Diagnostics" (with discussion), Journal of the Royal Statistical Society, Series B, 60, 497-536.

Kalbfleisch, J.D. and Prentice, R.L. (1980), The Statistical Analysis of Failure Time Data, New York: Wiley.

Kass, R.E., Carlin, B.P., Gelman, A., and Neal, R. (1998), "Markov Chain Monte Carlo in Practice: A Roundtable Discussion," The American Statistician, 52, 93-100.

Lin, D.Y. (1991), "Goodness-of-Fit Analysis for the Cox Regression Model Based on a Class of Parameter Estimators," Journal of the American Statistical Association, 86, 725-728.

Lindley, D.V. and Smith, A.F.M. (1972), "Bayes Estimates for the Linear Model" (with discussion), Journal of the Royal Statistical Society, Series B, 34, 1-41.

Liu, J.S. (1994), "The Collapsed Gibbs Sampler in Bayesian Computations with Applications to a Gene Regulation Problem," Journal of the American Statistical Association, 89, 958-966.

Liu, J.S., Wong, W.H., and Kong, A. (1994), "Covariance Structure of the Gibbs Sampler with Applications to the Comparisons of Estimators and Augmentation Schemes," Biometrika, 81, 27-40.

Mengersen, K.L., Robert, C.P., and Guihenneuc-Jouyaux, C. (1999), "MCMC Convergence Diagnostics: A 'RevieWWW'" (with discussion), to appear in Bayesian Statistics 6, eds. J.M. Bernardo, J.O. Berger, A.P. Dawid, and A.F.M. Smith, Oxford: Oxford University Press.

Mira, A. and Tierney, L. (1997), "On the Use of Auxiliary Variables in Markov Chain Monte Carlo Sampling," technical report, School of Statistics, University of Minnesota.

Neal, R.M. (1996), "Sampling from Multimodal Distributions Using Tempered Transitions," Statistics and Computing, 6, 353-366.

Ritter, C. and Tanner, M.A. (1992), "Facilitating the Gibbs Sampler: The Gibbs Stopper and the Griddy-Gibbs Sampler," Journal of the American Statistical Association, 87, 861-868.

Roberts, G.O. and Sahu, S.K. (1997), "Updating Schemes, Correlation Structure, Blocking and Parameterization for the Gibbs Sampler," Journal of the Royal Statistical Society, Series B, 59, 291-317.

Roberts, G.O. and Smith, A.F.M. (1993), "Simple Conditions for the Convergence of the Gibbs Sampler and Metropolis-Hastings Algorithms," Stochastic Processes and Their Applications, 49, 207-216.

Roberts, G.O. and Tweedie, R.L. (1996), "Geometric Convergence and Central Limit Theorems for Multidimensional Hastings and Metropolis Algorithms," Biometrika, 83, 95-110.

Sargent, D.J. (1997), "A Flexible Approach to Time-Varying Coefficients in the Cox Regression Setting," Lifetime Data Analysis, 3, 13-25.

Sargent, D.J. and Hodges, J.S. (1997), "Smoothed ANOVA with Application to Subgroup Analysis," research report, Division of Biostatistics, University of Minnesota.

Searle, S.R., Casella, G., and McCulloch, C.E. (1992), Variance Components, New York: Wiley.

Spiegelhalter, D.J., Thomas, A., Best, N., and Gilks, W.R. (1995a), "BUGS: Bayesian Inference Using Gibbs Sampling, Version 0.50," technical report, Medical Research Council Biostatistics Unit, Institute of Public Health, Cambridge University.

Spiegelhalter, D.J., Thomas, A., Best, N., and Gilks, W.R. (1995b), "BUGS Examples, Version 0.50," technical report, Medical Research Council Biostatistics Unit, Institute of Public Health, Cambridge University.

Spiegelhalter, D.J., Thomas, A., Best, N., and Gilks, W.R. (1996), "BUGS 0.6: Bayesian Inference Using Gibbs Sampling (Addendum to Manual)," technical report, Medical Research Council Biostatistics Unit, Institute of Public Health, Cambridge University.

Swendsen, R.H. and Wang, J.-S. (1987), "Nonuniversal Critical Dynamics in Monte Carlo Simulations," Physical Review Letters, 58, 86-88.

Tierney, L. (1994), "Markov Chains for Exploring Posterior Distributions" (with discussion), Annals of Statistics, 22, 1701-1762.

Wakefield, J.C. (1998), Discussion of "Some Algebra and Geometry for Hierarchical Models, Applied to Diagnostics," by J.S. Hodges, Journal of the Royal Statistical Society, Series B, 60.

Wakefield, J.C., Smith, A.F.M., Racine-Poon, A., and Gelfand, A.E. (1994), "Bayesian Analysis of Linear and Non-linear Population Models by Using the Gibbs Sampler," Applied Statistics, 43, 201-221.

Waller, L.A., Carlin, B.P., Xia, H., and Gelfand, A.E. (1997), "Hierarchical Spatio-temporal Mapping of Disease Rates," Journal of the American Statistical Association, 92, 607-617.

West, M. and Harrison, J. (1989), Bayesian Forecasting and Dynamic Models, New York: Springer-Verlag.


Markov Chain Monte Carlo A Contribution to the Encyclopedia of Environmetrics

Markov Chain Monte Carlo A Contribution to the Encyclopedia of Environmetrics Markov Chain Monte Carlo A Contribution to the Encyclopedia of Environmetrics Galin L. Jones and James P. Hobert Department of Statistics University of Florida May 2000 1 Introduction Realistic statistical

More information

Metropolis-Hastings Algorithm

Metropolis-Hastings Algorithm Strength of the Gibbs sampler Metropolis-Hastings Algorithm Easy algorithm to think about. Exploits the factorization properties of the joint probability distribution. No difficult choices to be made to

More information

Rank Regression with Normal Residuals using the Gibbs Sampler

Rank Regression with Normal Residuals using the Gibbs Sampler Rank Regression with Normal Residuals using the Gibbs Sampler Stephen P Smith email: hucklebird@aol.com, 2018 Abstract Yu (2000) described the use of the Gibbs sampler to estimate regression parameters

More information

A Nonparametric Approach Using Dirichlet Process for Hierarchical Generalized Linear Mixed Models

A Nonparametric Approach Using Dirichlet Process for Hierarchical Generalized Linear Mixed Models Journal of Data Science 8(2010), 43-59 A Nonparametric Approach Using Dirichlet Process for Hierarchical Generalized Linear Mixed Models Jing Wang Louisiana State University Abstract: In this paper, we

More information

Prerequisite: STATS 7 or STATS 8 or AP90 or (STATS 120A and STATS 120B and STATS 120C). AP90 with a minimum score of 3

Prerequisite: STATS 7 or STATS 8 or AP90 or (STATS 120A and STATS 120B and STATS 120C). AP90 with a minimum score of 3 University of California, Irvine 2017-2018 1 Statistics (STATS) Courses STATS 5. Seminar in Data Science. 1 Unit. An introduction to the field of Data Science; intended for entering freshman and transfers.

More information

Markov Chain Monte Carlo

Markov Chain Monte Carlo Markov Chain Monte Carlo Recall: To compute the expectation E ( h(y ) ) we use the approximation E(h(Y )) 1 n n h(y ) t=1 with Y (1),..., Y (n) h(y). Thus our aim is to sample Y (1),..., Y (n) from f(y).

More information

Comparison of Three Calculation Methods for a Bayesian Inference of Two Poisson Parameters

Comparison of Three Calculation Methods for a Bayesian Inference of Two Poisson Parameters Journal of Modern Applied Statistical Methods Volume 13 Issue 1 Article 26 5-1-2014 Comparison of Three Calculation Methods for a Bayesian Inference of Two Poisson Parameters Yohei Kawasaki Tokyo University

More information

POSTERIOR ANALYSIS OF THE MULTIPLICATIVE HETEROSCEDASTICITY MODEL

POSTERIOR ANALYSIS OF THE MULTIPLICATIVE HETEROSCEDASTICITY MODEL COMMUN. STATIST. THEORY METH., 30(5), 855 874 (2001) POSTERIOR ANALYSIS OF THE MULTIPLICATIVE HETEROSCEDASTICITY MODEL Hisashi Tanizaki and Xingyuan Zhang Faculty of Economics, Kobe University, Kobe 657-8501,

More information

Stat 451 Lecture Notes Markov Chain Monte Carlo. Ryan Martin UIC

Stat 451 Lecture Notes Markov Chain Monte Carlo. Ryan Martin UIC Stat 451 Lecture Notes 07 12 Markov Chain Monte Carlo Ryan Martin UIC www.math.uic.edu/~rgmartin 1 Based on Chapters 8 9 in Givens & Hoeting, Chapters 25 27 in Lange 2 Updated: April 4, 2016 1 / 42 Outline

More information

Summary of Talk Background to Multilevel modelling project. What is complex level 1 variation? Tutorial dataset. Method 1 : Inverse Wishart proposals.

Summary of Talk Background to Multilevel modelling project. What is complex level 1 variation? Tutorial dataset. Method 1 : Inverse Wishart proposals. Modelling the Variance : MCMC methods for tting multilevel models with complex level 1 variation and extensions to constrained variance matrices By Dr William Browne Centre for Multilevel Modelling Institute

More information

On a multivariate implementation of the Gibbs sampler

On a multivariate implementation of the Gibbs sampler Note On a multivariate implementation of the Gibbs sampler LA García-Cortés, D Sorensen* National Institute of Animal Science, Research Center Foulum, PB 39, DK-8830 Tjele, Denmark (Received 2 August 1995;

More information

MCMC algorithms for fitting Bayesian models

MCMC algorithms for fitting Bayesian models MCMC algorithms for fitting Bayesian models p. 1/1 MCMC algorithms for fitting Bayesian models Sudipto Banerjee sudiptob@biostat.umn.edu University of Minnesota MCMC algorithms for fitting Bayesian models

More information

Geometric ergodicity of the Bayesian lasso

Geometric ergodicity of the Bayesian lasso Geometric ergodicity of the Bayesian lasso Kshiti Khare and James P. Hobert Department of Statistics University of Florida June 3 Abstract Consider the standard linear model y = X +, where the components

More information

Multilevel Statistical Models: 3 rd edition, 2003 Contents

Multilevel Statistical Models: 3 rd edition, 2003 Contents Multilevel Statistical Models: 3 rd edition, 2003 Contents Preface Acknowledgements Notation Two and three level models. A general classification notation and diagram Glossary Chapter 1 An introduction

More information

Alternative implementations of Monte Carlo EM algorithms for likelihood inferences

Alternative implementations of Monte Carlo EM algorithms for likelihood inferences Genet. Sel. Evol. 33 001) 443 45 443 INRA, EDP Sciences, 001 Alternative implementations of Monte Carlo EM algorithms for likelihood inferences Louis Alberto GARCÍA-CORTÉS a, Daniel SORENSEN b, Note a

More information

A Search and Jump Algorithm for Markov Chain Monte Carlo Sampling. Christopher Jennison. Adriana Ibrahim. Seminar at University of Kuwait

A Search and Jump Algorithm for Markov Chain Monte Carlo Sampling. Christopher Jennison. Adriana Ibrahim. Seminar at University of Kuwait A Search and Jump Algorithm for Markov Chain Monte Carlo Sampling Christopher Jennison Department of Mathematical Sciences, University of Bath, UK http://people.bath.ac.uk/mascj Adriana Ibrahim Institute

More information

University of Toronto Department of Statistics

University of Toronto Department of Statistics Norm Comparisons for Data Augmentation by James P. Hobert Department of Statistics University of Florida and Jeffrey S. Rosenthal Department of Statistics University of Toronto Technical Report No. 0704

More information

Default Priors and Effcient Posterior Computation in Bayesian

Default Priors and Effcient Posterior Computation in Bayesian Default Priors and Effcient Posterior Computation in Bayesian Factor Analysis January 16, 2010 Presented by Eric Wang, Duke University Background and Motivation A Brief Review of Parameter Expansion Literature

More information

Bayesian data analysis in practice: Three simple examples

Bayesian data analysis in practice: Three simple examples Bayesian data analysis in practice: Three simple examples Martin P. Tingley Introduction These notes cover three examples I presented at Climatea on 5 October 0. Matlab code is available by request to

More information

Approaches for Multiple Disease Mapping: MCAR and SANOVA

Approaches for Multiple Disease Mapping: MCAR and SANOVA Approaches for Multiple Disease Mapping: MCAR and SANOVA Dipankar Bandyopadhyay Division of Biostatistics, University of Minnesota SPH April 22, 2015 1 Adapted from Sudipto Banerjee s notes SANOVA vs MCAR

More information

Frailty Modeling for Spatially Correlated Survival Data, with Application to Infant Mortality in Minnesota By: Sudipto Banerjee, Mela. P.

Frailty Modeling for Spatially Correlated Survival Data, with Application to Infant Mortality in Minnesota By: Sudipto Banerjee, Mela. P. Frailty Modeling for Spatially Correlated Survival Data, with Application to Infant Mortality in Minnesota By: Sudipto Banerjee, Melanie M. Wall, Bradley P. Carlin November 24, 2014 Outlines of the talk

More information

STAT 425: Introduction to Bayesian Analysis

STAT 425: Introduction to Bayesian Analysis STAT 425: Introduction to Bayesian Analysis Marina Vannucci Rice University, USA Fall 2017 Marina Vannucci (Rice University, USA) Bayesian Analysis (Part 2) Fall 2017 1 / 19 Part 2: Markov chain Monte

More information

Bayesian time series classification

Bayesian time series classification Bayesian time series classification Peter Sykacek Department of Engineering Science University of Oxford Oxford, OX 3PJ, UK psyk@robots.ox.ac.uk Stephen Roberts Department of Engineering Science University

More information

Bayesian Inference for the Multivariate Normal

Bayesian Inference for the Multivariate Normal Bayesian Inference for the Multivariate Normal Will Penny Wellcome Trust Centre for Neuroimaging, University College, London WC1N 3BG, UK. November 28, 2014 Abstract Bayesian inference for the multivariate

More information

Bayesian Inference on Joint Mixture Models for Survival-Longitudinal Data with Multiple Features. Yangxin Huang

Bayesian Inference on Joint Mixture Models for Survival-Longitudinal Data with Multiple Features. Yangxin Huang Bayesian Inference on Joint Mixture Models for Survival-Longitudinal Data with Multiple Features Yangxin Huang Department of Epidemiology and Biostatistics, COPH, USF, Tampa, FL yhuang@health.usf.edu January

More information

Bayesian nonparametric estimation of finite population quantities in absence of design information on nonsampled units

Bayesian nonparametric estimation of finite population quantities in absence of design information on nonsampled units Bayesian nonparametric estimation of finite population quantities in absence of design information on nonsampled units Sahar Z Zangeneh Robert W. Keener Roderick J.A. Little Abstract In Probability proportional

More information

Bayesian Inference. Chapter 1. Introduction and basic concepts

Bayesian Inference. Chapter 1. Introduction and basic concepts Bayesian Inference Chapter 1. Introduction and basic concepts M. Concepción Ausín Department of Statistics Universidad Carlos III de Madrid Master in Business Administration and Quantitative Methods Master

More information

Multivariate Slice Sampling. A Thesis. Submitted to the Faculty. Drexel University. Jingjing Lu. in partial fulfillment of the

Multivariate Slice Sampling. A Thesis. Submitted to the Faculty. Drexel University. Jingjing Lu. in partial fulfillment of the Multivariate Slice Sampling A Thesis Submitted to the Faculty of Drexel University by Jingjing Lu in partial fulfillment of the requirements for the degree of Doctor of Philosophy June 2008 c Copyright

More information

CTDL-Positive Stable Frailty Model

CTDL-Positive Stable Frailty Model CTDL-Positive Stable Frailty Model M. Blagojevic 1, G. MacKenzie 2 1 Department of Mathematics, Keele University, Staffordshire ST5 5BG,UK and 2 Centre of Biostatistics, University of Limerick, Ireland

More information

STAT 518 Intro Student Presentation

STAT 518 Intro Student Presentation STAT 518 Intro Student Presentation Wen Wei Loh April 11, 2013 Title of paper Radford M. Neal [1999] Bayesian Statistics, 6: 475-501, 1999 What the paper is about Regression and Classification Flexible

More information

Probabilistic Graphical Models Lecture 17: Markov chain Monte Carlo

Probabilistic Graphical Models Lecture 17: Markov chain Monte Carlo Probabilistic Graphical Models Lecture 17: Markov chain Monte Carlo Andrew Gordon Wilson www.cs.cmu.edu/~andrewgw Carnegie Mellon University March 18, 2015 1 / 45 Resources and Attribution Image credits,

More information

Contents. Part I: Fundamentals of Bayesian Inference 1

Contents. Part I: Fundamentals of Bayesian Inference 1 Contents Preface xiii Part I: Fundamentals of Bayesian Inference 1 1 Probability and inference 3 1.1 The three steps of Bayesian data analysis 3 1.2 General notation for statistical inference 4 1.3 Bayesian

More information

Supplement to A Hierarchical Approach for Fitting Curves to Response Time Measurements

Supplement to A Hierarchical Approach for Fitting Curves to Response Time Measurements Supplement to A Hierarchical Approach for Fitting Curves to Response Time Measurements Jeffrey N. Rouder Francis Tuerlinckx Paul L. Speckman Jun Lu & Pablo Gomez May 4 008 1 The Weibull regression model

More information

STA 4273H: Statistical Machine Learning

STA 4273H: Statistical Machine Learning STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! rsalakhu@utstat.toronto.edu! http://www.utstat.utoronto.ca/~rsalakhu/ Sidney Smith Hall, Room 6002 Lecture 7 Approximate

More information

Modelling geoadditive survival data

Modelling geoadditive survival data Modelling geoadditive survival data Thomas Kneib & Ludwig Fahrmeir Department of Statistics, Ludwig-Maximilians-University Munich 1. Leukemia survival data 2. Structured hazard regression 3. Mixed model

More information

Introduction to Machine Learning CMU-10701

Introduction to Machine Learning CMU-10701 Introduction to Machine Learning CMU-10701 Markov Chain Monte Carlo Methods Barnabás Póczos & Aarti Singh Contents Markov Chain Monte Carlo Methods Goal & Motivation Sampling Rejection Importance Markov

More information

Lecture 5: Spatial probit models. James P. LeSage University of Toledo Department of Economics Toledo, OH

Lecture 5: Spatial probit models. James P. LeSage University of Toledo Department of Economics Toledo, OH Lecture 5: Spatial probit models James P. LeSage University of Toledo Department of Economics Toledo, OH 43606 jlesage@spatial-econometrics.com March 2004 1 A Bayesian spatial probit model with individual

More information

Factorization of Seperable and Patterned Covariance Matrices for Gibbs Sampling

Factorization of Seperable and Patterned Covariance Matrices for Gibbs Sampling Monte Carlo Methods Appl, Vol 6, No 3 (2000), pp 205 210 c VSP 2000 Factorization of Seperable and Patterned Covariance Matrices for Gibbs Sampling Daniel B Rowe H & SS, 228-77 California Institute of

More information

Bayes: All uncertainty is described using probability.

Bayes: All uncertainty is described using probability. Bayes: All uncertainty is described using probability. Let w be the data and θ be any unknown quantities. Likelihood. The probability model π(w θ) has θ fixed and w varying. The likelihood L(θ; w) is π(w

More information

Implementing componentwise Hastings algorithms

Implementing componentwise Hastings algorithms Computational Statistics & Data Analysis 48 (2005) 363 389 www.elsevier.com/locate/csda Implementing componentwise Hastings algorithms Richard A. Levine a;, Zhaoxia Yu b, William G. Hanley c, John J. Nitao

More information

MONTE CARLO METHODS. Hedibert Freitas Lopes

MONTE CARLO METHODS. Hedibert Freitas Lopes MONTE CARLO METHODS Hedibert Freitas Lopes The University of Chicago Booth School of Business 5807 South Woodlawn Avenue, Chicago, IL 60637 http://faculty.chicagobooth.edu/hedibert.lopes hlopes@chicagobooth.edu

More information

Part 8: GLMs and Hierarchical LMs and GLMs

Part 8: GLMs and Hierarchical LMs and GLMs Part 8: GLMs and Hierarchical LMs and GLMs 1 Example: Song sparrow reproductive success Arcese et al., (1992) provide data on a sample from a population of 52 female song sparrows studied over the course

More information

Weighted tests of homogeneity for testing the number of components in a mixture

Weighted tests of homogeneity for testing the number of components in a mixture Computational Statistics & Data Analysis 41 (2003) 367 378 www.elsevier.com/locate/csda Weighted tests of homogeneity for testing the number of components in a mixture Edward Susko Department of Mathematics

More information

Bayesian linear regression

Bayesian linear regression Bayesian linear regression Linear regression is the basis of most statistical modeling. The model is Y i = X T i β + ε i, where Y i is the continuous response X i = (X i1,..., X ip ) T is the corresponding

More information

A Note on Bayesian Inference After Multiple Imputation

A Note on Bayesian Inference After Multiple Imputation A Note on Bayesian Inference After Multiple Imputation Xiang Zhou and Jerome P. Reiter Abstract This article is aimed at practitioners who plan to use Bayesian inference on multiplyimputed datasets in

More information

Stat 5101 Lecture Notes

Stat 5101 Lecture Notes Stat 5101 Lecture Notes Charles J. Geyer Copyright 1998, 1999, 2000, 2001 by Charles J. Geyer May 7, 2001 ii Stat 5101 (Geyer) Course Notes Contents 1 Random Variables and Change of Variables 1 1.1 Random

More information

On Reparametrization and the Gibbs Sampler

On Reparametrization and the Gibbs Sampler On Reparametrization and the Gibbs Sampler Jorge Carlos Román Department of Mathematics Vanderbilt University James P. Hobert Department of Statistics University of Florida March 2014 Brett Presnell Department

More information

NONLINEAR APPLICATIONS OF MARKOV CHAIN MONTE CARLO

NONLINEAR APPLICATIONS OF MARKOV CHAIN MONTE CARLO NONLINEAR APPLICATIONS OF MARKOV CHAIN MONTE CARLO by Gregois Lee, B.Sc.(ANU), B.Sc.Hons(UTas) Submitted in fulfilment of the requirements for the Degree of Doctor of Philosophy Department of Mathematics

More information

Generalized Linear Models for Non-Normal Data

Generalized Linear Models for Non-Normal Data Generalized Linear Models for Non-Normal Data Today s Class: 3 parts of a generalized model Models for binary outcomes Complications for generalized multivariate or multilevel models SPLH 861: Lecture

More information

Doing Bayesian Integrals

Doing Bayesian Integrals ASTR509-13 Doing Bayesian Integrals The Reverend Thomas Bayes (c.1702 1761) Philosopher, theologian, mathematician Presbyterian (non-conformist) minister Tunbridge Wells, UK Elected FRS, perhaps due to

More information

Markov Chain Monte Carlo (MCMC) and Model Evaluation. August 15, 2017

Markov Chain Monte Carlo (MCMC) and Model Evaluation. August 15, 2017 Markov Chain Monte Carlo (MCMC) and Model Evaluation August 15, 2017 Frequentist Linking Frequentist and Bayesian Statistics How can we estimate model parameters and what does it imply? Want to find the

More information

Bayesian Analysis of Vector ARMA Models using Gibbs Sampling. Department of Mathematics and. June 12, 1996

Bayesian Analysis of Vector ARMA Models using Gibbs Sampling. Department of Mathematics and. June 12, 1996 Bayesian Analysis of Vector ARMA Models using Gibbs Sampling Nalini Ravishanker Department of Statistics University of Connecticut Storrs, CT 06269 ravishan@uconnvm.uconn.edu Bonnie K. Ray Department of

More information

A Bayesian Nonparametric Approach to Monotone Missing Data in Longitudinal Studies with Informative Missingness

A Bayesian Nonparametric Approach to Monotone Missing Data in Longitudinal Studies with Informative Missingness A Bayesian Nonparametric Approach to Monotone Missing Data in Longitudinal Studies with Informative Missingness A. Linero and M. Daniels UF, UT-Austin SRC 2014, Galveston, TX 1 Background 2 Working model

More information

Longitudinal + Reliability = Joint Modeling

Longitudinal + Reliability = Joint Modeling Longitudinal + Reliability = Joint Modeling Carles Serrat Institute of Statistics and Mathematics Applied to Building CYTED-HAROSA International Workshop November 21-22, 2013 Barcelona Mainly from Rizopoulos,

More information

Bayesian Networks in Educational Assessment

Bayesian Networks in Educational Assessment Bayesian Networks in Educational Assessment Estimating Parameters with MCMC Bayesian Inference: Expanding Our Context Roy Levy Arizona State University Roy.Levy@asu.edu 2017 Roy Levy MCMC 1 MCMC 2 Posterior

More information

Appendix: Modeling Approach

Appendix: Modeling Approach AFFECTIVE PRIMACY IN INTRAORGANIZATIONAL TASK NETWORKS Appendix: Modeling Approach There is now a significant and developing literature on Bayesian methods in social network analysis. See, for instance,

More information

Markov chain Monte Carlo

Markov chain Monte Carlo Markov chain Monte Carlo Karl Oskar Ekvall Galin L. Jones University of Minnesota March 12, 2019 Abstract Practically relevant statistical models often give rise to probability distributions that are analytically

More information

Gaussian process for nonstationary time series prediction

Gaussian process for nonstationary time series prediction Computational Statistics & Data Analysis 47 (2004) 705 712 www.elsevier.com/locate/csda Gaussian process for nonstationary time series prediction Soane Brahim-Belhouari, Amine Bermak EEE Department, Hong

More information

Univariate Normal Distribution; GLM with the Univariate Normal; Least Squares Estimation

Univariate Normal Distribution; GLM with the Univariate Normal; Least Squares Estimation Univariate Normal Distribution; GLM with the Univariate Normal; Least Squares Estimation PRE 905: Multivariate Analysis Spring 2014 Lecture 4 Today s Class The building blocks: The basics of mathematical

More information

Efficient MCMC Sampling for Hierarchical Bayesian Inverse Problems

Efficient MCMC Sampling for Hierarchical Bayesian Inverse Problems Efficient MCMC Sampling for Hierarchical Bayesian Inverse Problems Andrew Brown 1,2, Arvind Saibaba 3, Sarah Vallélian 2,3 CCNS Transition Workshop SAMSI May 5, 2016 Supported by SAMSI Visiting Research

More information

Statistical Practice

Statistical Practice Statistical Practice A Note on Bayesian Inference After Multiple Imputation Xiang ZHOU and Jerome P. REITER This article is aimed at practitioners who plan to use Bayesian inference on multiply-imputed

More information

MULTILEVEL IMPUTATION 1

MULTILEVEL IMPUTATION 1 MULTILEVEL IMPUTATION 1 Supplement B: MCMC Sampling Steps and Distributions for Two-Level Imputation This document gives technical details of the full conditional distributions used to draw regression

More information

Advanced Statistical Modelling

Advanced Statistical Modelling Markov chain Monte Carlo (MCMC) Methods and Their Applications in Bayesian Statistics School of Technology and Business Studies/Statistics Dalarna University Borlänge, Sweden. Feb. 05, 2014. Outlines 1

More information

Bayesian Nonparametric Regression for Diabetes Deaths

Bayesian Nonparametric Regression for Diabetes Deaths Bayesian Nonparametric Regression for Diabetes Deaths Brian M. Hartman PhD Student, 2010 Texas A&M University College Station, TX, USA David B. Dahl Assistant Professor Texas A&M University College Station,

More information

Computationally Efficient Estimation of Multilevel High-Dimensional Latent Variable Models

Computationally Efficient Estimation of Multilevel High-Dimensional Latent Variable Models Computationally Efficient Estimation of Multilevel High-Dimensional Latent Variable Models Tihomir Asparouhov 1, Bengt Muthen 2 Muthen & Muthen 1 UCLA 2 Abstract Multilevel analysis often leads to modeling

More information

Improved Robust MCMC Algorithm for Hierarchical Models

Improved Robust MCMC Algorithm for Hierarchical Models UNIVERSITY OF TEXAS AT SAN ANTONIO Improved Robust MCMC Algorithm for Hierarchical Models Liang Jing July 2010 1 1 ABSTRACT In this paper, three important techniques are discussed with details: 1) group

More information

Part 7: Hierarchical Modeling

Part 7: Hierarchical Modeling Part 7: Hierarchical Modeling!1 Nested data It is common for data to be nested: i.e., observations on subjects are organized by a hierarchy Such data are often called hierarchical or multilevel For example,

More information

The STS Surgeon Composite Technical Appendix

The STS Surgeon Composite Technical Appendix The STS Surgeon Composite Technical Appendix Overview Surgeon-specific risk-adjusted operative operative mortality and major complication rates were estimated using a bivariate random-effects logistic

More information

A quick introduction to Markov chains and Markov chain Monte Carlo (revised version)

A quick introduction to Markov chains and Markov chain Monte Carlo (revised version) A quick introduction to Markov chains and Markov chain Monte Carlo (revised version) Rasmus Waagepetersen Institute of Mathematical Sciences Aalborg University 1 Introduction These notes are intended to

More information

Separate and Joint Modeling of Longitudinal and Event Time Data Using Standard Computer Packages

Separate and Joint Modeling of Longitudinal and Event Time Data Using Standard Computer Packages Separate and Joint Modeling of Longitudinal and Event Time Data Using Standard Computer Packages Xu GUO and Bradley P. CARLIN Many clinical trials and other medical and reliability studies generate both

More information

Estimating marginal likelihoods from the posterior draws through a geometric identity

Estimating marginal likelihoods from the posterior draws through a geometric identity Estimating marginal likelihoods from the posterior draws through a geometric identity Johannes Reichl Energy Institute at the Johannes Kepler University Linz E-mail for correspondence: reichl@energieinstitut-linz.at

More information

Bayesian Linear Regression

Bayesian Linear Regression Bayesian Linear Regression Sudipto Banerjee 1 Biostatistics, School of Public Health, University of Minnesota, Minneapolis, Minnesota, U.S.A. September 15, 2010 1 Linear regression models: a Bayesian perspective

More information

Principles of Bayesian Inference

Principles of Bayesian Inference Principles of Bayesian Inference Sudipto Banerjee and Andrew O. Finley 2 Biostatistics, School of Public Health, University of Minnesota, Minneapolis, Minnesota, U.S.A. 2 Department of Forestry & Department

More information