
Implementation and Performance Issues in the Bayesian and Likelihood Fitting of Multilevel Models

William J. Browne (1) and David Draper (2)

(1) Institute of Education, University of London, 20 Bedford Way, London WC1H 0AL, England
(2) Department of Mathematical Sciences, University of Bath, Claverton Down, Bath BA2 7AY, England

Summary

We use simulation studies (a) to compare Bayesian and likelihood fitting methods, in terms of validity of conclusions, in two-level random-slopes regression (RSR) models, and (b) to compare several Bayesian estimation methods based on Markov chain Monte Carlo, in terms of computational efficiency, in random-effects logistic regression (RELR) models. We find (a) that the Bayesian approach with a particular choice of diffuse inverse Wishart prior distribution for the (co)variance parameters performs at least as well, in terms of bias of estimates and actual coverage of nominal 95% intervals, as maximum likelihood methods in RSR models with medium sample sizes (expressed in terms of the number J of level 2 units), but neither approach performs as well as might be hoped with small J; and (b) that an adaptive hybrid Metropolis-Gibbs sampling method we have developed for use in the multilevel modeling package MLwiN outperforms adaptive rejection Gibbs sampling in the RELR models we have considered, sometimes by a wide margin.

Keywords: Adaptive Metropolis Sampling, Diffuse Prior Distributions, Educational Data, Gibbs Sampling, Hierarchical Modeling, IGLS, Markov Chain Monte Carlo (MCMC), MCMC Efficiency, Maximum Likelihood Methods, Random-Effects Logistic Regression, Random-Slopes Regression, RIGLS, Variance Components.

1 Introduction

Multilevel models, for data having a nested or hierarchical structure, have become an important component of the applied statistician's tool-chest in the past 15 years (e.g., Bryk and Raudenbush 1992, Goldstein 1995, Draper 2000). Examples include variance-components (VC), random-slopes regression (RSR), and random-effects logistic regression (RELR) models, all of which we will visit in what follows. In the early days of multilevel modeling the only available fitting methods were based on maximum likelihood: iterative generalized least squares (IGLS) and restricted IGLS (RIGLS), or related methods such as Fisher scoring (Longford 1987), restricted maximum likelihood (REML), and empirical Bayes estimation (Bryk et al. 1988), for models with Gaussian outcomes (Goldstein 1986, 1989); and marginal quasi-likelihood (MQL) and penalized (or predictive) quasi-likelihood (PQL) for data sets with dichotomous outcomes (e.g., Breslow and Clayton 1993). More recently fully Bayesian analyses based on Markov chain Monte Carlo (MCMC) methods have become possible in packages such as BUGS (Spiegelhalter et al. 1997) and MLwiN (Rasbash et al. 1999). Recent alternatives for fitting multilevel models, which we do not pursue here, include integrated-likelihood approaches based on Gaussian quadrature (e.g., Pinheiro and Bates 1995) and Laplace approximations (e.g., Raudenbush et al. 2000).

We (the authors of this article) are the co-developers of the Bayesian MCMC capabilities in MLwiN. Below we examine (a) the relative performance, in the sense of point and interval estimation accuracy, of likelihood and Bayesian fitting methods in RSR models, and (b) some performance comparisons in RELR models, in the sense of required CPU time to achieve a given accuracy of posterior summary, between several MCMC fitting methods, including adaptive rejection sampling (Gilks and Wild 1992) and an approach we have developed specifically for MLwiN based on adaptive hybrid Metropolis-Gibbs sampling. In a companion article to this one (Browne and Draper 1999, hereafter BD99) we compare likelihood and Bayesian fitting methods in VC and RELR models (also see Hoijtink 2000 for an MCMC investigation of a random-intercept model).

2 Random-slopes regression (RSR) models

A multilevel modeling data set which we have found useful in fixing ideas was collected by Mortimore et al. (1988) in a study called the Junior School Project (JSP). This was a longitudinal investigation of roughly 2,000 pupils from 50 primary schools chosen randomly from the 636 Inner London Education Authority (ILEA) schools. Woodhouse et al. (1995) examined a random subsample of N = 887 students at J = 48 of these schools; here we will refer to this subsample as the JSP data. One focus of principal interest was the relationship between mathematics test scores at year 3 (math3) and year 5 (math5).

Table 1: A comparison of IGLS/RIGLS and Bayesian fitting (with the diffuse prior labeled Wishart prior 1 in Section 2.1.2) in model (1) applied to the JSP data. Figures in parentheses in the upper table are SEs (for the ML methods) or posterior SDs (for the Bayesian method).

Point Estimates
Method      β_0         β_1         σ²_u0      σ_u01      σ²_u1      σ²_e
IGLS        (0.366)     (0.043)     (1.30)     (0.119)    (0.017)    (1.34)
RIGLS       (0.370)     (0.043)     (1.33)     (0.122)    (0.017)    (1.34)
Bayesian    (0.371)     (0.058)     (1.39)     (0.153)    (0.029)    (1.35)

95% Interval Estimates
Method            β_0             β_1               σ²_u0           σ_u01               σ²_u1             σ²_e
RIGLS (Gaussian)  (29.8, 31.3)    (0.529, 0.697)    (2.13, 7.35)    (-0.611, -0.133)    (0.004, 0.070)    (24.3, 29.6)
Bayesian          (29.9, 31.3)    (0.505, 0.732)    (2.36, 7.77)    (-0.660, 0.061)     (0.058, 0.170)    (24.3, 29.7)

These two variables have marginal distributions which are not far from Gaussian, and the scatterplot of the entire data set indicates approximate linearity, but it is quite possible that the slopes and intercepts of the best-fitting linear models at the school level (one regression line per school) are different. However, the numbers n_j of pupils per school vary from 5 to 62 in this data set, with fully 1/3 of the schools having 12 pupils or fewer, so it would be unwise to attempt to fit regressions local to each school. A natural approach that strikes a balance between global fitting (which ignores the clustered character of the data) and local regression (which would be unstable) is based on a random-slopes regression model of the form

    y_ij = (β_0 + u_0j) + (β_1 + u_1j)(x_ij - x̄) + e_ij,
    u_j = (u_0j, u_1j)^T ~ IID N_2(0, V_u),   V_u = [ σ²_u0  σ_u01 ; σ_u01  σ²_u1 ],   e_ij ~ IID N(0, σ²_e),        (1)

where i = 1, ..., n_j; j = 1, ..., J; Σ_{j=1}^J n_j = N; y_ij and x_ij are the math5 and math3 scores for pupil i in school j, respectively; and x̄ is the mean of math3 over all N pupils (centering the predictor in this way improves convergence of the iterative estimation methods discussed below). This model regards the schools as having been drawn randomly from a population of schools, each having its own slope and intercept, and the result of fitting (1) will be to shrink the local estimates of these parameters toward the global (population) regression.
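
To fix ideas computationally, the short sketch below (Python/NumPy; not part of the original analysis) simulates one data set from model (1). The predictor scale and the balanced design are hypothetical stand-ins, and the parameter values echo those used in the simulation study of Section 2.2.

```python
import numpy as np

def simulate_rsr(J=48, n_per_school=18, beta=(30.0, 0.5),
                 V_u=((5.0, -0.5), (-0.5, 0.5)), sigma2_e=30.0, seed=1):
    """Draw one data set from the RSR model (1).  The covariance sigma_u01 = -0.5 is one of the
    settings considered in Section 2.2; the math3-like predictor scale is a hypothetical choice."""
    rng = np.random.default_rng(seed)
    school = np.repeat(np.arange(J), n_per_school)                      # level-2 identifier per pupil
    x = rng.normal(25.0, 6.0, size=J * n_per_school)                    # stand-in for the math3 scores
    xc = x - x.mean()                                                   # centred predictor, as in (1)
    u = rng.multivariate_normal([0.0, 0.0], np.asarray(V_u), size=J)    # (u_0j, u_1j) for each school
    e = rng.normal(0.0, np.sqrt(sigma2_e), size=J * n_per_school)       # level-1 residuals
    y = (beta[0] + u[school, 0]) + (beta[1] + u[school, 1]) * xc + e    # math5-like outcome
    return y, x, school

y, x, school = simulate_rsr()
```

A balanced design is used only for brevity; the study designs of Section 2.2 also include unbalanced school sizes.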

Table 1 presents a comparison of maximum likelihood (IGLS/RIGLS) and Bayesian (posterior mean) estimates of model (1) applied to the JSP data (the Bayesian results use a particular diffuse prior to be discussed below in Section 2.1.2). Packages such as MLwiN (which produced these summaries) often report only point estimates and standard errors when maximum likelihood (ML) fitting is employed, thereby tacitly encouraging users to build large-sample Gaussian interval estimates of the form θ̂ ± 1.96 ŜE(θ̂); we quote these intervals in the table. It may be seen that the two fitting methods produce similar results for many of the point estimates (with the notable exception of σ²_u1, where the Bayesian estimate is almost three times the size of the RIGLS value), but the Bayesian intervals are all wider than their likelihood counterparts (up to 70% wider in the case of σ²_u1). An MLwiN user writing a paper based on the JSP data might well wonder which set of results to report (bearing in mind that the ML and diffuse-prior Bayesian approaches are estimating different quantities, namely, in Bayesian language, the modes and means of the marginal posterior distributions, respectively). We offer a suggested answer to this question below.

2.1 Methods for fitting RSR models

2.1.1 IGLS and RIGLS

IGLS is an iterative maximum likelihood method based on generalized least squares (GLS). For the general algorithm see Goldstein (1995); here we present a sketch of this fitting method in the special case of the RSR model (1). This model may be expressed in the usual general linear model form

    Y = Xβ + e*        (2)

by means of the following steps: (a) stack the values y_ij in the order (y_11, ..., y_{n_1 1}, ..., y_{1J}, ..., y_{n_J J}) into the N × 1 vector Y; (b) create a vector x out of the (x_ij - x̄) values analogously, let X be the N × 2 matrix whose first column is a vector of 1s and whose second column is x, and define the 2 × 1 vector β = (β_0, β_1)^T; and (c) stack the values e*_ij = u_0j + u_1j (x_ij - x̄) + e_ij into the N × 1 vector e*. Under the standard assumptions this vector has mean 0 and covariance matrix V whose diagonal elements are

    V(e*_ij) = σ²_u0 + 2 (x_ij - x̄) σ_u01 + (x_ij - x̄)² σ²_u1 + σ²_e        (3)

and whose off-diagonal elements may be computed as in Goldstein (1995). The idea underlying IGLS is that (i) if V were known, β could be estimated by GLS, yielding

    β̂ = (X^T V^{-1} X)^{-1} X^T V^{-1} Y        (4)

with covariance matrix (X^T V^{-1} X)^{-1}.
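
A minimal NumPy sketch of the GLS step (4), assuming the (co)variance parameters, and hence V, are known, might look as follows; it builds V school by school from its block-diagonal structure, which is equivalent to using (3) together with the corresponding within-school off-diagonal terms.

```python
import numpy as np

def gls_beta(y, x, school, V_u, sigma2_e):
    """One GLS step, equation (4), for model (1) with known (co)variance parameters."""
    N = len(y)
    X = np.column_stack([np.ones(N), x - np.mean(x)])
    V = np.zeros((N, N))
    for j in np.unique(school):
        ix = np.where(school == j)[0]
        Zj = X[ix]                                         # random-effects design equals X here
        V[np.ix_(ix, ix)] = Zj @ V_u @ Zj.T + sigma2_e * np.eye(len(ix))
    Vinv = np.linalg.inv(V)
    cov_beta = np.linalg.inv(X.T @ Vinv @ X)               # (X^T V^{-1} X)^{-1}
    beta_hat = cov_beta @ X.T @ Vinv @ y                   # equation (4)
    return beta_hat, cov_beta
```

Inverting the full N × N matrix V is wasteful; a production implementation would exploit the block-diagonal structure, but the dense version keeps the sketch short.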

Conversely, (ii) if β were known, one could form the residuals Ỹ = Y - Xβ, calculate Y* = Ỹ Ỹ^T, stack the columns of Y* into one long column vector Y**, and define a linear model

    Y** = Z* θ + ε,        (5)

where Z* is the design matrix for the random-effects parameters θ = (σ²_u0, σ_u01, σ²_u1)^T in model (1). Another application of GLS then yields

    θ̂ = (Z*^T V*^{-1} Z*)^{-1} Z*^T V*^{-1} Y**,        (6)

where V* = V ⊗ V, ⊗ is the Kronecker product, and the covariance matrix of θ̂ is 2 (Z*^T V*^{-1} Z*)^{-1}. Starting with initial estimates of the fixed effects β from ordinary least squares, IGLS iterates between equations (4) and (6) to convergence, which is judged to occur when two successive sets of estimates differ by no more than a given tolerance (on a component-by-component basis; this is not guaranteed to occur in RSR models, as will be seen in Section 2.2).

As with many ML procedures, IGLS produces biased estimates in small samples, often in particular underestimating random-effects variances because the sampling variation of β̂ is not accounted for in the algorithm above. Defining the residuals instead as Ỹ* = Y - X β̂ and Ŷ* = Ỹ* Ỹ*^T, Goldstein (1989) showed that

    E(Ŷ*) = V - X (X^T V^{-1} X)^{-1} X^T,        (7)

so that the IGLS estimates can be bias-adjusted by adding the second term in (7) to Ŷ* at each iteration. This is restricted IGLS (RIGLS), which coincides with restricted maximum likelihood (REML) in Gaussian models such as (1). Standard errors of IGLS and RIGLS estimates are based on the final values of the covariance matrices mentioned above at convergence.
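
The RIGLS adjustment in (7) is easy to sketch. The helper below (an illustrative fragment with names of our own choosing, not MLwiN code) forms the bias-corrected cross-product matrix that replaces Ŷ* in the GLS step for the random parameters.

```python
import numpy as np

def rigls_crossproduct(y, X, V, beta_hat):
    """Bias-adjusted cross-product matrix based on equation (7), given the current V and beta_hat."""
    resid = y - X @ beta_hat                                   # Y~* = Y - X beta_hat
    Y_star = np.outer(resid, resid)                            # ^Y* = resid resid^T
    Vinv = np.linalg.inv(V)
    correction = X @ np.linalg.inv(X.T @ Vinv @ X) @ X.T       # second term in (7)
    return Y_star + correction                                 # adjusted so its expectation is V
```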

2.1.2 Prior distributions in multilevel modeling

The Bayesian fitting of multilevel models requires, as usual in Bayesian work, a joint prior distribution for the parameters, or a series of marginal priors if a priori independence is assumed. If substantive information about the parameters is available* then it should naturally be used, although there are risks to the validity of one's conclusions if strong prior information that is after the fact seen to have been out of step with reality is employed (see, e.g., BD99 for an example). When developing the Bayesian capabilities in MLwiN we took the view that, in addition to being provided with a facility for specifying informative priors, users should be given the option of selecting among one or more diffuse priors for those occasions when little was known a priori.

* For instance, from expert judgment (see, e.g., Madigan et al. for a method of eliciting a "prior data set" in the context of graphical models) or previous studies judged relevant to the current inquiry.

The reason for the phrase "one or more" in the last sentence is that, while the literature is unanimous in recommending Gaussian priors with huge variances (effectively, improper U(-∞, ∞) priors) as the diffuse choice for fixed effects, and this is what we use in MLwiN and in what follows, there are several possibilities for diffuse priors on (co)variance parameters and matrices. The conjugate choice for the level 1 variance σ²_e in model (1) is the scaled inverse chi-square χ^{-2}(ν, s²) family (e.g., Gelman, Carlin, et al. 1995); this is equivalent to an inverse gamma Γ^{-1}(ν/2, νs²/2) distribution, where ν is the prior effective sample size and s² is a prior estimate of σ²_e. In the results below we use two diffuse members of this family:

• A (proper) locally uniform prior for σ²_e on (0, 1/ε) for small positive ε (Gelman and Rubin 1992, Carlin 1992), which is equivalent to a Pareto(1, ε) prior for the precision σ_e^{-2} (Spiegelhalter et al. 1997); and

• A (proper) Γ^{-1}(ε, ε) prior for σ²_e (Spiegelhalter et al. 1997), for small positive ε.

These priors are specified within the χ^{-2}(ν, s²) family by the choices (ν, s²) = (-2, 0) and (2ε, 1), respectively. We have found that results are generally insensitive to the specific choice of ε in the region of 0.001 (the default setting recommended by the developers of the BUGS package for the Γ^{-1}(ε, ε) prior); we report findings with this value.

All the remaining parameters in model (1) are contained in the covariance matrix V_u, for which the conjugate choice is the inverse Wishart W^{-1}(ν_p, S_p) family, where, in parallel with the χ^{-2}(ν, s²) distribution, ν_p is the prior effective sample size and S_p is a prior estimate for V_u. We examine three diffuse settings of the Wishart parameters below: (ν_p, S_p) = (2, I_2) (labeled Wishart prior 1 in Tables 4 and 5 below), (4, Σ̂_u) (Wishart prior 2 in what follows), and (-3, 0), where I_2 is the 2 × 2 identity matrix and Σ̂_u is the RIGLS estimate of V_u. The third of these settings corresponds to an improper uniform prior on V_u, and it is perhaps worth noting that the second is gently data-determined.
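
To make these settings concrete, the snippet below (illustrative only; it uses SciPy's inverse-gamma and inverse-Wishart parameterizations, which match the Γ^{-1}(ε, ε) and W^{-1}(ν_p, S_p) forms above) draws from the two proper diffuse priors; the matrix standing in for the RIGLS estimate Σ̂_u is hypothetical, and the improper settings (ν, s²) = (-2, 0) and (ν_p, S_p) = (-3, 0) cannot be sampled from.

```python
import numpy as np
from scipy.stats import invgamma, invwishart

eps = 0.001                                                    # diffuse setting discussed above
sigma2_e_draws = invgamma.rvs(a=eps, scale=eps, size=5)        # Gamma^{-1}(eps, eps) prior draws,
                                                               # i.e. chi^{-2}(nu, s^2) with (2*eps, 1)

Vu_wishart1 = invwishart.rvs(df=2, scale=np.eye(2), size=3)    # Wishart prior 1: W^{-1}(2, I_2)
Sigma_hat_u = np.array([[5.0, -0.5], [-0.5, 0.5]])             # hypothetical stand-in for the RIGLS estimate
Vu_wishart2 = invwishart.rvs(df=4, scale=Sigma_hat_u, size=3)  # Wishart prior 2: W^{-1}(4, Sigma_hat_u);
                                                               # for a 2x2 matrix its prior mean is Sigma_hat_u
```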

2.1.3 Gibbs sampling: full conditionals

With the conjugate prior choices in the previous subsection, all of the full conditional distributions in model (1) have familiar closed-form expressions, making Gibbs sampling a natural MCMC choice with this model. It is helpful notationally (a) to re-express the first line of the model as

    y_ij = β_0 x_0ij + β_1 x_1ij + u_0j x_0ij + u_1j x_1ij + e_ij,        (8)

where x_0ij = 1 and x_1ij = (x_ij - x̄) in the previous notation, and (b) to let X_ij stand for the row vector (x_0ij, x_1ij). Gibbs sampling in model (1) or (8) proceeds most smoothly by regarding the level 2 residuals u_j = (u_0j, u_1j) as latent variables to be sampled along with the model parameters. It is then natural to split the unknowns into four groups: the 2 × 1 column vector of fixed effects β, the level 2 residuals u_j, the level 2 variance matrix V_u, and the level 1 variance σ²_e. The full conditionals with this blocking of unknowns and the priors as above are then as follows (cf. Zeger and Karim 1991):

    β | y, σ²_e, u         ~  N( D̂ [ σ_e^{-2} Σ_{j=1}^{J} Σ_{i=1}^{n_j} X_ij^T (y_ij - X_ij u_j) ],  D̂ ),
    u_j | y, V_u, σ²_e, β  ~  N_2( D̂_j [ σ_e^{-2} Σ_{i=1}^{n_j} X_ij^T (y_ij - X_ij β) ],  D̂_j ),
    V_u | u                ~  W^{-1}( J + ν_p,  Σ_{j=1}^{J} u_j u_j^T + S_p ),
    σ²_e | y, β, u         ~  Γ^{-1}( (N + ν_e)/2,  [ν_e s²_e + Σ_{j=1}^{J} Σ_{i=1}^{n_j} e²_ij] / 2 ),        (9)

where D̂ = ( σ_e^{-2} Σ_{ij} X_ij^T X_ij )^{-1}, D̂_j = ( σ_e^{-2} Σ_{i=1}^{n_j} X_ij^T X_ij + V_u^{-1} )^{-1}, and e_ij = y_ij - X_ij (β + u_j).

2.1.4 Gibbs sampling: computational efficiency gains

RSR models are a special case of the general L-level model, in which Gibbs sampling has the same basic four steps as in equation (9) above (also see Section 3.1). By analyzing the allocation of CPU time across these steps in a naive initial coding of the algorithm, we were able to identify two efficiency gains which considerably reduced execution time without creating undue storage burden.

• Quantities such as M_j = Σ_{i=1}^{n_j} X_ij^T X_ij and M = Σ_{j=1}^{J} M_j involve only the fixed matrix X of predictors and can be calculated once and stored for use in each iteration.

• Let X*_li be the vector of predictor variables at level l for observation i in the L-level formulation, where l = 1 refers to the variables associated with the fixed effects (the β_k in (8)), and let e_i be the level 1 residual for observation i. Considerable use is made in the general algorithm of quantities of the form d_li = e_i + c_li, where c_li is the product of a vector β*_l of fixed or random effects and the predictor vector X*_li (note, for example, the appearance of (y_ij - X_ij u_j) and (y_ij - X_ij β) in the first two steps of (9)). If the e_i are stored, then whenever d_li is needed the current value of β*_l times X*_li can be added to the residual for use in sampling the new value of β*_l, and once this is done the new level 1 residual is available by subtraction of the updated product c_li from the new d_li.
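
A self-contained sketch of the resulting sampler for model (1)/(8) is given below (Python/NumPy/SciPy; a simplified re-implementation for illustration, not the MLwiN source). It cycles through the four blocks in (9), precomputes M_j = Σ_i X_ij^T X_ij as suggested above, and uses a Γ^{-1}(ε, ε)-type prior for σ²_e and a W^{-1}(ν_p, S_p) prior for V_u.

```python
import numpy as np
from scipy.stats import invwishart

def gibbs_rsr(y, x, school, n_iter=5000, nu_p=2, S_p=None, nu_e=0.002, s2_e=1.0, seed=0):
    """Illustrative Gibbs sampler for model (1)/(8), following the four blocks in (9).
    Priors: flat on beta, V_u ~ W^{-1}(nu_p, S_p), sigma^2_e ~ scaled-inv-chi^2(nu_e, s2_e)."""
    rng = np.random.default_rng(seed)
    S_p = np.eye(2) if S_p is None else S_p
    y = np.asarray(y, float)
    N = len(y)
    X = np.column_stack([np.ones(N), x - np.mean(x)])            # rows are X_ij = (x_0ij, x_1ij)
    groups = [np.where(school == j)[0] for j in np.unique(school)]
    J = len(groups)
    M_j = [X[g].T @ X[g] for g in groups]                        # precomputed once (Section 2.1.4)
    M = sum(M_j)
    beta, u, V_u, sig2e = np.zeros(2), np.zeros((J, 2)), np.eye(2), 1.0
    out = np.empty((n_iter, 6))
    for t in range(n_iter):
        # 1. Fixed effects beta | y, sigma^2_e, u
        D = np.linalg.inv(M / sig2e)
        rhs = sum(X[g].T @ (y[g] - X[g] @ u[j]) for j, g in enumerate(groups)) / sig2e
        beta = rng.multivariate_normal(D @ rhs, D)
        # 2. Level-2 residuals u_j | y, V_u, sigma^2_e, beta
        Vinv = np.linalg.inv(V_u)
        for j, g in enumerate(groups):
            Dj = np.linalg.inv(M_j[j] / sig2e + Vinv)
            u[j] = rng.multivariate_normal(Dj @ (X[g].T @ (y[g] - X[g] @ beta)) / sig2e, Dj)
        # 3. Level-2 covariance matrix V_u | u ~ W^{-1}(J + nu_p, sum_j u_j u_j^T + S_p)
        V_u = invwishart.rvs(df=J + nu_p, scale=u.T @ u + S_p, random_state=rng)
        # 4. Level-1 variance sigma^2_e | y, beta, u
        sse = sum(np.sum((y[g] - X[g] @ (beta + u[j])) ** 2) for j, g in enumerate(groups))
        sig2e = 1.0 / rng.gamma((N + nu_e) / 2.0, 2.0 / (nu_e * s2_e + sse))
        out[t] = [beta[0], beta[1], V_u[0, 0], V_u[0, 1], V_u[1, 1], sig2e]
    return out   # columns: beta_0, beta_1, sigma^2_u0, sigma_u01, sigma^2_u1, sigma^2_e
```

Applied to data simulated as above (or to the JSP data), the returned draws would then be summarized using the monitoring diagnostics described next.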

2.1.5 Starting values and burn-in/monitoring strategy

Normally in MCMC analyses considerable attention needs to be given to the combined choice of starting values and burn-in strategy, to ensure that the equilibrium distribution has been reached before monitoring begins. This is far less of a problem in MLwiN, where likelihood estimates provide initial values of sufficient quality that burn-ins of 500 iterations (the default) or less are typically more than adequate†.

† Jim Hodges (personal communication) has recently noted that there may be more potential problems with multimodality of posterior distributions in hierarchical models than is commonly believed; see Liu and Hodges (1999) for details. This may be investigated in MLwiN by making parallel runs with widely dispersed starting values, as in Gelman and Rubin (1992).

MLwiN release 1.1 provides a range of MCMC diagnostics to aid in determining the minimum length of monitoring run to achieve the user's accuracy goals in summarizing the marginal and joint posterior distributions of interest. Time series traces, or trajectories, may be displayed for all unknowns in the model, and clicking on any of these trajectories produces a pop-up window with the following diagnostics and summaries: a marginal kernel density trace, the autocorrelation and partial autocorrelation functions, a plot of the Monte Carlo standard error of the posterior mean as a function of the length n_M of the monitoring run, the Raftery-Lewis (1992) and Brooks-Draper (2000) diagnostics, and the posterior mean, standard deviation (SD), median, and (by default) 2.5% and 97.5% quantiles. With its default settings the Raftery-Lewis diagnostic estimates how large n_M needs to be so that the actual posterior probability content of the nominal central 95% interval for the parameter in question is between 94% and 96% with Monte Carlo probability at least 95%. By contrast the Brooks-Draper diagnostic estimates the value of n_M required so that the posterior mean of a parameter θ may be quoted to k significant figures with at least 100(1 - α)% Monte Carlo probability. In the typical case in which the trajectory for θ approximates that of an autoregressive time series of order 1 with estimated first-order autocorrelation ρ̂, and writing the estimated posterior mean in the form θ̂ = a × 10^b for 1 ≤ a < 10, the required n̂_M satisfies

    n̂_M ≈ 4 [ Φ^{-1}(1 - α/2) ]² σ̂² 10^{-2(b-k+1)} (1 + ρ̂) / (1 - ρ̂),        (10)

where σ̂ is the estimated posterior SD of θ. It is presumed that the user has thought carefully about the scale on which results are to be reported; for example, diagnostic (10) applied to the JSP data would produce a much larger value of n̂_M for β_0 if 30 were subtracted from all the observations with k held fixed‡.

‡ For example, if the user wished to report β̂_0 = 30.6 = 3.06 × 10^1, i.e., k = 3, (10) would be applied with b = 1; whereas if 30 were subtracted from all data values and the user still insisted on k = 3, the estimate would now be 6.00 × 10^{-1}, (10) would now be invoked with all the same inputs except b = -1, and the new n̂_M value would be 10,000 times larger than before. In effect, in the presence of Monte Carlo uncertainty, it is just as hard to accurately announce a posterior mean of 0.600 as it is to quote a posterior mean of 30.600 with the same posterior SD.
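
The Brooks-Draper calculation (10) is simple to transcribe; the helper below (an illustrative sketch with argument names of our own choosing) reproduces the footnoted example, in which subtracting 30 from the data inflates the required run length by a factor of 10,000.

```python
import numpy as np
from scipy.stats import norm

def brooks_draper_nhat(post_mean, post_sd, rho1, k=3, alpha=0.05):
    """Monitoring-run length from (10): quote the posterior mean to k significant figures with
    Monte Carlo probability at least 100*(1 - alpha)%, for a nonzero posterior mean whose
    trajectory behaves like an AR(1) series with lag-1 autocorrelation rho1."""
    b = int(np.floor(np.log10(abs(post_mean))))          # post_mean = a * 10^b with 1 <= a < 10
    z = norm.ppf(1.0 - alpha / 2.0)
    return 4.0 * z**2 * post_sd**2 * 10.0**(-2 * (b - k + 1)) * (1.0 + rho1) / (1.0 - rho1)

# Posterior SD and autocorrelation here are arbitrary illustrative values; the ratio is 10,000.
print(brooks_draper_nhat(30.6, 0.37, 0.5), brooks_draper_nhat(0.6, 0.37, 0.5))
```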

Table 2: Summary of initial study designs for the RSR model (1) simulations.

Design    Number of schools (J)    Pupils per school (n_j)    Total number of pupils (N)
3         12                       unbalanced                 216
4         12                       18 for all schools         216
7         48                       unbalanced                 864
8         48                       18 for all schools         864

2.2 Simulation study design

Our interest in conducting the simulation study described here focused upon the effects of three aspects of model (1) on the performance of methods used to fit the model: the total numbers of level 1 and 2 units in the design (pupils and schools in the JSP data, respectively), the degree of imbalance in the numbers of pupils per school, and the strength of correlation between the intercept and slope random effects at level 2, which is governed by the covariance parameter σ_u01. As in our study (BD99) of the variance-components model

    y_ij = β_0 + u_j + e_ij,   i = 1, ..., n_j,   j = 1, ..., J,   Σ_{j=1}^J n_j = N,
    u_j ~ IID N(0, σ²_u),   e_ij ~ IID N(0, σ²_e),        (11)

which is a special case of (1) without the covariate x, we therefore initially examined four different study designs with respect to J and the n_j, crossing this factor with five values of σ_u01, in all cases holding the other parameters in (1) constant at values similar to those in a version of the JSP data with greater between-school variation in the effect of x on y: β_0 = 30.0, β_1 = 0.5, σ²_u0 = 5.0, σ²_u1 = 0.5, and σ²_e = 30.0. Table 2 summarizes the initial designs considered, which are numbered 3, 4, 7, and 8 for consistency with BD99. Design 7 was arrived at by removing one pupil at random from the 23 largest schools in the JSP data, to produce a value of N (864) which was an integer multiple of 18, the average number of pupils per school in all designs. Designs 4 and 8 are balanced, with 12 and 48 schools respectively, and design 3 is imbalanced in a way that mimics the actual JSP distribution of pupils per school. The chosen values of σ_u01 were ±1.4, ±0.5, and 0, corresponding to correlations between the slope and intercept random effects of ρ = ±0.89, ±0.32, and 0, respectively.

Table 3: Summary of convergence results when ML methods are used to fit model (1). Entries m_1/m_2 give numbers of simulated data sets out of 1,000 with indicated results for IGLS (m_1) and RIGLS (m_2). Study design is given in terms of the number of level 2 units (12 or 48) and whether the design is balanced (B) or unbalanced (U). PD = estimate of V_u positive definite. An asterisk (*) marks the eight designs on which detailed results are reported below.

Design       ρ        Failed to converge    Converged but not PD    Converged and PD
3 (12U)     -0.89     356/349               21/77                    623/574
3 (12U)     -0.32      93/124                5/19                    902/857
3 (12U) *    0.0       71/116                2/7                     927/877
3 (12U)      0.32      91/118                3/11                    906/871
3 (12U)      0.89     367/366               12/76                    621/558
4 (12B)     -0.89      83/74                 3/23                    914/903
4 (12B)     -0.32      13/14                 1/1                     986/985
4 (12B) *    0.0        9/9                  0/1                     991/990
4 (12B)      0.32       6/7                  0/2                     994/991
4 (12B)      0.89      85/72                 3/25                    912/903
7 (48U) *   -0.89      13/14                 1/2                     986/984
7 (48U) *   -0.32       2/2                  0/0                     998/998
7 (48U) *    0.0        0/0                  0/0                    1000/1000
7 (48U) *    0.32       0/0                  0/0                    1000/1000
7 (48U) *    0.89      16/15                 0/2                     984/983
8 (48B)     -0.89       6/6                  0/2                     994/992
8 (48B)     -0.32       1/1                  0/0                     999/999
8 (48B) *    0.0        0/0                  0/0                    1000/1000
8 (48B)      0.32       0/0                  0/0                    1000/1000
8 (48B)      0.89       8/8                  0/0                     992/992

In each cell of the 4 × 5 layout crossing design and correlation, we simulated 1,000 data sets according to model (1), holding the predictor x fixed throughout at its values in the 864-pupil version of the JSP data. Table 3 summarizes the sorts of convergence problems to which the ML methods IGLS and RIGLS are susceptible in RSR models. It may be seen that in unbalanced multilevel data sets with relatively few level 2 units and strong correlation between the slope and intercept random effects, convergence of both IGLS and RIGLS can fail to occur up to 37% of the time, and even when convergence occurs the resulting estimate of the covariance matrix V_u in model (1) may fail to be positive definite on a significant number of occasions (up to 8% of the simulated data sets). As is intuitively reasonable, problems of this kind occur more readily with increasing |ρ|, decreasing J, and increasing imbalance. Figure 1 presents a trajectory plot for a data set in which IGLS has failed to converge; this fitting method appears to be cycling between two sets of parameter estimates, even though Bayesian analysis of the same data showed that the posterior distribution with a diffuse prior was unimodal (it is possible that direct application of the EM algorithm instead of IGLS would yield the ML estimates with these data while avoiding problems like those in Figure 1).

Figure 1: Trajectory plot arising from IGLS fitting of model (1) to a simulated data set in which convergence is not achieved.

To avoid ML convergence problems and concentrate on other performance measures, such as bias and coverage of point and interval estimates respectively, we focus in reporting our main results on the designs marked with a (*) in Table 3. This subset of eight designs includes two with a small J value (12), six unbalanced designs, and four with nonzero ρ, so that something may be said about the effects of all three of these factors on the outcomes of interest. The subsets of the 1,000 replications in each design for which both ML methods converged and produced positive definite V_u estimates are used in what follows, making the simulation sample size at least 877 in all designs examined in detail.

To decide how long to monitor the Gibbs-sampling output we estimated time per iteration and calculated Raftery-Lewis diagnostics as a function of the total number of pupils N. This revealed that the smaller designs in Table 2 needed longer monitoring runs to satisfy Raftery-Lewis default accuracy constraints but took less time per iteration, leading to the following monitoring run lengths: 30,000 in studies 3 and 4, and 10,000 in studies 7 and 8. After verifying that MLwiN and BUGS gave identical results (up to Monte Carlo noise) with RSR models specified with the same priors, for computational convenience we used MLwiN for the IGLS/RIGLS and uniform-Wishart-prior Bayesian results and BUGS for the other two sets of Wishart findings.

2.3 RSR validity results

Tables 4 and 5 and Figures 2-5 summarize the performance, in RSR model (1), of the two ML methods (IGLS and RIGLS, which do not use prior distributions) and the Bayesian approach with the three prior distributions described in Section 2.1.2, in terms of bias of point estimates and coverage and length of nominal 95% interval estimates. For the sake of brevity we report full numerical results about interval estimates only for two of the eight (*) designs in Table 3 (chosen to be typical), although bias results from all eight designs are displayed in the figures. Bias is reported in relative terms (as 100 (θ̂ - θ)/θ %) except when the true value of the parameter was 0, in which case it is presented in absolute terms (θ̂ - θ). To simulate the behavior of users of the ML features in packages such as MLwiN, where point estimates and standard errors are typically reported rather than intervals, the ML results in the middle and bottom parts of Tables 4 and 5 are based on intervals of the form θ̂ ± 1.96 ŜE(θ̂) for all parameters of model (1). The performance of intervals for variance parameters based on the gamma distribution (see BD99 for examples and formulae) was only marginally better than that for the large-sample Gaussian intervals examined here; we again omit details for brevity. As noted in Section 2.1.2, we obtained results using both U(0, 1/ε) and Γ^{-1}(ε, ε) priors for σ²_e, which in all cases were so close that there was little value in presenting both. We examined the behavior of posterior means, medians, and modes for a number of the Bayesian estimation methods but confine our reporting here largely to results for posterior means. Bayesian 95% intervals were obtained as the 2.5% and 97.5% points in the empirical distributions of the MCMC draws for each parameter. The following conclusions are evident from the tables and plots given here and from additional detailed results in Browne (1998), which is available on the web:

• All methods produced estimates of β_0, β_1, and σ²_e which were close to unbiased in all simulation design configurations (Figures 2 and 4), and the actual coverage of nominal 95% intervals for these parameters was close to 95% for all methods in the designs with J = 48 (as was also the case with J = 12 for σ²_e).

• Imbalance in the design had a smaller effect on the results than the number of level 2 units (Figures 4 and 5), and with a few exceptions (see Figures 2 and 3) the effect of ρ on performance was also modest.

Table 4: Summary of RSR results, unbalanced design (3) with J = 12 schools and ρ = 0, based on 877 simulated data sets. Monte Carlo standard errors are in parentheses; bias in the top table is relative (in percentage points) except for σ_u01, where absolute bias is reported [in brackets] since the true value is zero.

Relative Bias (%)
Method             β_0        β_1        σ²_u0      σ_u01        σ²_u1      σ²_e
IGLS               (0.15)     (1.5)      (2.2)      [(0.02)]     (1.6)      (0.40)
RIGLS              (0.15)     (1.5)      (2.2)      [(0.02)]     (1.7)      (0.41)
Wishart prior 1    (0.09)     (1.5)      (2.3)      [(0.02)]     (1.8)      (0.36)
Wishart prior 2    (0.09)     (1.5)      (2.1)      [(0.02)]     (1.7)      (0.36)
Uniform prior 3    (0.09)     (1.5)      (5.1)      [(0.05)]     (3.7)      (0.35)

95% Interval Coverage
Method             β_0        β_1        σ²_u0      σ_u01        σ²_u1      σ²_e
IGLS
RIGLS
Wishart prior 1
Wishart prior 2
Uniform prior 3

95% Interval Length
Method             β_0        β_1        σ²_u0      σ_u01        σ²_u1      σ²_e
IGLS               (0.02)     (0.006)    (0.2)      (0.03)       (0.01)     (0.04)
RIGLS              (0.03)     (0.007)    (0.2)      (0.03)       (0.01)     (0.05)
Wishart prior 1    (0.02)     (0.005)    (0.2)      (0.04)       (0.02)     (0.04)
Wishart prior 2    (0.02)     (0.007)    (0.2)      (0.03)       (0.01)     (0.04)
Uniform prior 3    (0.04)     (0.01)     (0.7)      (0.1)        (0.06)     (0.04)

Note: Monte Carlo SEs for coverage rates in the middle table ranged from 0.7% (for estimates near 95%) to 1.4% (for estimates near 80%).

Table 5: Summary of RSR results, unbalanced design (7) with J = 48 schools and ρ = -0.89, based on 984 simulated data sets. Monte Carlo standard errors are in parentheses; bias in the top table is relative (in percentage points).

Relative Bias (%)
Method             β_0        β_1        σ²_u0      σ_u01      σ²_u1      σ²_e
IGLS               (0.04)     (0.69)     (0.92)     (0.79)     (0.72)     (0.16)
RIGLS              (0.04)     (0.69)     (0.94)     (0.79)     (0.73)     (0.16)
Wishart prior 1    (0.04)     (0.69)     (0.94)     (0.82)     (0.74)     (0.16)
Wishart prior 2    (0.04)     (0.69)     (0.90)     (0.82)     (0.75)     (0.16)
Uniform prior 3    (0.04)     (0.69)     (1.1)      (0.93)     (0.84)     (0.16)

95% Interval Coverage
Method             β_0        β_1        σ²_u0      σ_u01      σ²_u1      σ²_e
IGLS
RIGLS
Wishart prior 1
Wishart prior 2
Uniform prior 3

95% Interval Length
Method             β_0        β_1        σ²_u0      σ_u01      σ²_u1      σ²_e
IGLS               (0.005)    (0.001)    (0.04)     (0.009)    (0.003)    (0.01)
RIGLS              (0.005)    (0.001)    (0.04)     (0.009)    (0.003)    (0.001)
Wishart prior 1    (0.005)    (0.001)    (0.04)     (0.01)     (0.003)    (0.01)
Wishart prior 2    (0.005)    (0.001)    (0.04)     (0.009)    (0.003)    (0.01)
Uniform prior 3    (0.005)    (0.002)    (0.05)     (0.01)     (0.004)    (0.01)

Note: Monte Carlo SEs for coverage rates in the middle table ranged from 0.7% (for estimates near 95%) to 1.0% (for estimates near 90%).

Figure 2: Relative bias (in %) as a function of ρ for β_0, β_1, and σ²_e, design 7 (curves shown for IGLS, RIGLS, Wishart prior 1, Wishart prior 2, and the uniform prior).

Figure 3: Relative bias (in %) as a function of ρ for σ²_u0, σ_u01, and σ²_u1, design 7.

Figure 4: Relative bias (in %) as a function of study design (symbols as in Table 3) for β_0, β_1, and σ²_e, with ρ = 0.

Figure 5: Bias as a function of study design (symbols as in Table 3) for σ²_u0, σ_u01, and σ²_u1, with ρ = 0. The top and bottom panels display relative bias (in %); the middle panel gives absolute bias (the true value is 0).

• IGLS underestimated σ²_u0 and σ²_u1 in all settings examined (Figures 3 and 5), by up to 13 ± 2 percentage points in the case of design *4 (the ± figures denote Monte Carlo standard errors here and below). RIGLS often corrected this bias substantially, although the correction resulted in positive biases of up to 10 ± 2 percentage points (for σ²_u0 in design *3; see Table 4).

• While uniform priors on location parameters in many Bayesian settings, including RSR model (1), perform well even in fairly small-sample situations, a uniform prior on the covariance matrix V_u yielded disastrous results, with (a) routinely large biases on the high side when J = 48 and biases of up to 219 percentage points with J = 12 (Figure 5), and (b) intervals up to 4.8 times as wide as those from the other methods, which nevertheless yielded coverages as low as 82 ± 1% (in design *3, Table 4). Using the posterior median or mode instead of the mean improves the performance to some extent (see BD99), but this prior should be regarded as a failure in calibration terms even with a fairly large sample size.

• The two Wishart priors performed reasonably well with respect to bias in the J = 48 designs (median |relative bias| across these six designs and the three parameters in V_u of 3.9 and 2.8 percentage points, and maximum |relative bias| of 9.0 and 9.3 percentage points, for the W^{-1}(2, I_2) and W^{-1}(4, Σ̂_u) priors, respectively; Table 5), but bias was higher in the J = 12 designs: median and maximum |relative bias| of (7.8, 34) and (4.9, 20) percentage points for these two priors, respectively. Overall the gently data-determined W^{-1}(4, Σ̂_u) prior performed a bit better than the other Wishart prior we examined on bias grounds, but identifying a diffuse prior for covariance matrices in multilevel models with excellent bias properties in small samples is a subject of continuing investigation.

• The actual coverage of nominal 95% IGLS intervals for parameters in the covariance matrix V_u was systematically and substantially below 95% in all designs examined, with values for the diagonal elements as low as 91 ± 1% for J = 48 and 81 ± 1% with J = 12. RIGLS achieved some improvement but still produced intervals with undercoverage; for example, the corresponding RIGLS figures were 92 ± 1% and 85 ± 1%.

• The coverage behavior of the W^{-1}(4, Σ̂_u) prior was similar to that of RIGLS with J = 48 but noticeably inferior when J = 12, with coverage as low as 78 ± 1% for some components of V_u. The W^{-1}(2, I_2) prior matched or exceeded RIGLS (when Monte Carlo noise was considered) in coverage performance in every design examined, but still yielded coverage as low as 88 ± 1% for σ²_u0 in design *4. The W^{-1}(2, I_2) intervals for the covariance components achieved their improved performance by being appropriately wider than the RIGLS intervals (an average of 26% wider in the J = 12 designs, for instance); their lack of perfect coverage, when present, was directly traceable to bias.

The other dimension along which ML and Bayesian methods may be compared is computation time, where maximum likelihood has a clear advantage over MCMC-based approaches (e.g., at 333 MHz, RIGLS takes 8 seconds in real time to fit model (1) to the JSP data, versus about 6 minutes for 50,000 MCMC iterations). To summarize our findings, therefore, we are unable to make a strong recommendation at this time on validity grounds for two-level RSR models with a small number J of level 2 units, but with moderate or large J, echoing the conclusion in BD99 for VC and RELR models, we would recommend (for computational speed) the use of RIGLS estimation in the exploratory stages of the analysis, when a number of models are typically examined, followed by Bayesian fitting with a prior similar to W^{-1}(2, I_2) on the variance components to produce publishable point and interval estimates with the final model chosen§.

§ This is a potentially dangerous strategy in small-sample settings on grounds of failure to propagate model uncertainty (e.g., Draper 1995), but the corrections required to adjust for having performed model selection and fitting on the same data set with, e.g., 48 schools and 887 students (as in the JSP data) should be modest.

3 Alternatives to Gibbs sampling in multilevel models

Gibbs sampling is a natural approach to MCMC fitting in Gaussian variance-components (VC) and random-slopes regression (RSR) models such as (11) and (1), respectively, because the full conditional distributions have simple recognizable forms when the residuals at levels higher than 1 in the hierarchy are treated as latent variables and sampled along with the parameters. However, (a) this is no longer true in random-effects logistic regression (RELR) models, and (b) even when it is true it is possible that other MCMC methods will be more efficient than Gibbs sampling. The main natural alternative to Gibbs in multilevel models is a hybrid of Metropolis and Gibbs sampling, in which Metropolis updates are calculated for some of the unknowns and Gibbs sampling is employed for the remainder. Furthermore, there are two additional kinds of flexibility in the Metropolis updates: (i) these may either be performed on the unknowns one at a time or in blocks, and (ii) the Metropolis proposal distributions may either be fixed throughout the run or chosen adaptively at the beginning of the sampling. In the rest of this section we elaborate on these alternatives and present some MCMC efficiency comparisons.

3.1 A hybrid Metropolis-Gibbs sampler with univariate updates

The general three-level Gaussian model may be written

    y_ijk = X_1ijk β_1 + X_2ijk β_2jk + X_3ijk β_3k + e_ijk,
    e_ijk ~ N(0, σ²),   β_2jk ~ N_{p_2}(0, V_2),   β_3k ~ N_{p_3}(0, V_3),        (12)

in which β_1 collects together the fixed effects and β_2 and β_3 are the level 2 and 3 residuals, with covariance matrices V_2 and V_3, respectively. A general L-level model has one set of fixed effects, L sets of residuals (although the residuals at level 1 are available by subtraction and do not need to be sampled), and L sets of (co)variance parameters. As in equation (9) all of these quantities may be naturally split into four groups: the fixed effects, the (L - 1) sets of residuals (excluding those at level 1), the level 1 scalar variance σ², and the (L - 1) higher-level covariance matrices. Assuming uniform priors for the fixed effects and inverse gamma and inverse Wishart prior distributions for σ² and the covariance matrices, respectively, the Gibbs sampling algorithm for the L-level model involves generalizations of the full conditional distributions in (9). An alternative Metropolis-Gibbs hybrid approach which generalizes naturally to RELR models uses (a) univariate-update random-walk Metropolis sampling on the fixed effects and residuals and (b) Gibbs sampling for σ² and the covariance matrices. This method requires specification of the proposal distribution variances, a choice which affects the MCMC efficiency of the combined algorithm.

In more detail, the idea in generating a proposed move β_1k^(t) for element k of the fixed-effects vector β_1 at iteration t is to draw from the N(β_1k^(t-1), κ_k σ̂²_k) distribution, where σ̂²_k is an estimate of the posterior variance of β_1k and κ_k is a well-chosen scale factor. In Gaussian settings simpler than multilevel models, Gelman, Roberts, and Gilks (1995) showed that the optimal value for κ_k is approximately 5.8 for univariate parameters when σ²_k is known, leading to an optimal acceptance rate of about 44%, with lower optimal κ_k values as the dimensionality of the parameter of interest increases. Estimates of σ²_k are readily available in the MLwiN context from maximum likelihood, but this leaves open the question of what to use for κ_k in multilevel analyses.

We performed a small simulation study to address this issue, with results as summarized in Table 6 and Figure 6. The first five models in Table 6 were variations on (1) and (11) applied to the JSP data described at the beginning of Section 2 (VC1 is model (11); to this model VC2 adds math3 as a linear predictor with nonrandom slope, and VC3 includes in addition the student's gender (also treated nonrandomly); RSR2 is model (1), and RSR3 adds gender with random slopes). The last row in Table 6, SCH1, pertains to model (1) applied to a different educational data set from Rasbash et al. (1999) with 4,059 students in 65 schools, in which the outcome is an examination score at age 16 and the predictor is the student's London Reading Test score at age 11.
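
As a concrete (and generic) illustration of such an update, a single-component random-walk Metropolis step might be sketched as follows; log_post stands for whatever log full-conditional or log-posterior evaluation the sampler uses, and is a placeholder rather than an MLwiN routine.

```python
import numpy as np

def rw_metropolis_step(theta, k, log_post, kappa_k, sd_hat_k, rng):
    """Univariate random-walk Metropolis update for component k of theta: propose from
    N(theta_k, kappa_k * sd_hat_k^2), where sd_hat_k^2 estimates the posterior variance
    (e.g., from IGLS/RIGLS) and kappa_k is the scale factor discussed in the text."""
    prop = theta.copy()
    prop[k] = theta[k] + rng.normal(0.0, np.sqrt(kappa_k) * sd_hat_k)
    if np.log(rng.uniform()) < log_post(prop) - log_post(theta):
        return prop, True                 # accept the proposed move
    return theta, False                   # reject and keep the current value
```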

Figure 6: Effect of scale factor and acceptance rate on default Raftery-Lewis n̂_M value for parameters β_0 and β_1 in model (1) applied to the JSP data. Solid curves are robust (lowess) smooths; the horizontal scale in the top plots is logarithmic.

Table 6: Optimal scale factors and ranges of near-optimal acceptance rates for various VC and RSR models (values are approximate averages across three Monte Carlo repetitions).

          Optimal Scale Factor            Near-Optimal Acceptance Rate
Model     β_0      β_1      β_2           β_0      β_1        β_2
VC1
VC2                                                40-60%
VC3                                                40-60%     40-65%
RSR2                                               35-65%
RSR3                                               40-70%     40-65%
SCH1                                               40-70%

Figure 6 is based on (1) applied to the JSP data, and uses the default Raftery-Lewis recommended length of monitoring run n̂_M as the measure of MCMC efficiency. In all cases three runs (each employing a burn-in of 500 from RIGLS starting values and a monitoring run of 50,000) were made with different random seeds at each of a variety of scale factors from 0.05 to 20, and the results averaged (Table 6) or displayed (Figure 6). It is clear that (a) the optimal scale factors vary considerably from one model and parameter to another (while always remaining substantially below 5.8), but (b) the near-optimal acceptance rate is quite flat in the region 45-60% for a wide variety of models and parameters. In view of these findings we decided to provide MLwiN with a hybrid Metropolis-Gibbs option that chooses the scale factors adaptively by monitoring the acceptance rates of all the parameters.

3.2 An adaptive hybrid Metropolis-Gibbs sampler for random-effects logistic regression (RELR) models

We present the idea behind our adaptive hybrid Metropolis-Gibbs sampler in the context of RELR models, where Gibbs sampling by itself is not straightforward because the full conditionals for the fixed effects and residuals do not have simple recognizable forms (see Browne 1998 for details on the adaptive method with Gaussian outcomes, and see, e.g., Müller 1993 for alternative approaches to adaptive MCMC sampling). Consider for illustration a two-level data set with a dichotomous outcome and a model of the form

    (y_ij | p_ij) ~ independent Bernoulli(p_ij)   with   logit(p_ij) = β_0 + Σ_{k=1}^{p} β_k (x_ijk - x̄_k) + u_j,        (13)

where u_j ~ IID N(0, σ²_u). As was the case with model (12), our adaptive hybrid sampler uses Metropolis updates on the fixed effects β = (β_0, ..., β_p) and residuals u_j and Gibbs updates on the variance σ²_u (see BD99 for a detailed description of the updating with a three-level RELR model); the difference is in how the Metropolis proposal distribution (PD) variances are calculated.
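
To make the ingredients of these updates concrete for model (13), the sketch below (illustrative only; the function and argument names are our own, and Xc is assumed to hold a leading column of ones followed by centred predictors) shows the Bernoulli log-likelihood and a random-walk Metropolis update for a single level-2 residual u_j, the kind of step for which no standard closed-form full conditional exists.

```python
import numpy as np

def loglik_relr(beta, u, y, Xc, group):
    """Bernoulli log-likelihood for model (13): logit(p_ij) = Xc beta + u_j."""
    eta = Xc @ beta + u[group]
    return np.sum(y * eta - np.log1p(np.exp(eta)))

def update_uj(j, u, beta, sigma2_u, y, Xc, group, prop_sd, rng):
    """Random-walk Metropolis update for the level-2 residual u_j, with its N(0, sigma2_u) prior."""
    ix = group == j
    def log_cond(uj):
        eta = Xc[ix] @ beta + uj
        return np.sum(y[ix] * eta - np.log1p(np.exp(eta))) - 0.5 * uj ** 2 / sigma2_u
    prop = u[j] + rng.normal(0.0, prop_sd)                 # proposal SD tuned adaptively (below)
    if np.log(rng.uniform()) < log_cond(prop) - log_cond(u[j]):
        u[j] = prop                                        # accept: store the new residual
        return True
    return False                                           # reject: keep the current value
```

The fixed effects β_k would be updated in the same single-component way, and σ²_u by the Gibbs step mentioned above.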

From maximum likelihood starting values we first employ a sampling period of random length (but with an upper bound set by the user) during which the PD variances are adaptively tuned and then eventually fixed for the remainder of the run; this is followed by the usual burn-in period (see Section 2.1.5); and then the main monitoring run, from which posterior summaries are calculated, occurs. The tuning of the PD variances is based on achieving an acceptance rate for each parameter that lies within a user-specified tolerance interval (r - δ, r + δ). The algorithm examines empirical acceptance rates in batches of 100 iterations, comparing them for each parameter with the tolerance interval and modifying the proposal distribution appropriately before going on to the next batch of 100. With r* as the acceptance rate in the most recent batch and σ²_p as the PD variance for a given parameter, the modification performed at the end of each batch is as follows:

    if r* ≥ r:   σ_p → σ_p [ 2 - (1 - r*)/(1 - r) ];   otherwise:   σ_p → σ_p / [ 2 - r*/r ].        (14)

This alters the PD variance by a greater amount the farther the empirical acceptance rate is from the target r. If r* is too low, the proposed moves are too big, so σ²_p is decreased; if r* is too high, the parameter space is being explored with moves that are too small, and σ²_p is increased. If the r* values are within the tolerance interval during three successive batches of 100 iterations, the parameter is marked as satisfying the tolerance conditions, and once all parameters have been marked the overall tolerance condition is satisfied and adapting stops (after a parameter has been marked it is still modified as before until all parameters are marked). To bound the time spent in the adapting procedure an upper limit is set (in MLwiN the default is 5,000 iterations), and when this limit is reached the adapting period ends regardless of whether the tolerance conditions are met (in practice this occurs rarely). Values of (r, δ) = (0.5, 0.1) appear to give near-optimal Metropolis performance for a wide variety of multilevel models and are used as the defaults.
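
Rule (14) is a one-line adjustment per parameter; an illustrative transcription (not the MLwiN source) is:

```python
def adapt_proposal_sd(sigma_p, r_star, r=0.5):
    """Per-batch tuning of a proposal SD following rule (14): inflate the SD when the empirical
    acceptance rate r_star exceeds the target r, shrink it otherwise; no change when r_star == r."""
    if r_star >= r:
        return sigma_p * (2.0 - (1.0 - r_star) / (1.0 - r))
    return sigma_p / (2.0 - r_star / r)
```

In use this would be applied to each parameter's proposal SD after every batch of 100 iterations until the acceptance rates have stayed inside (r - δ, r + δ) for three successive batches.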

Block updating. In both Gaussian and dichotomous-outcome models another approach involves the use of block rather than univariate Metropolis updating. The advantage of block updating (e.g., Gilks et al. 1996) is that it can account for the correlation structure of the unknowns in sets of sensibly chosen blocks, potentially increasing MCMC efficiency. A natural strategy in the L-level generalization of model (12) is to create L sets of blocks, one consisting of all of the fixed effects and the other (L - 1) groups of n_l blocks of size n_rl comprising all of the residuals at levels 2, ..., L, respectively, where n_l is the number of blocks at level l and n_rl is the number of residuals per block at level l. Multivariate normal proposal distributions may then be used for each block, for example with covariance matrices of the form κ Σ̂, in which initially (for instance) Σ̂ is the maximum-likelihood estimate of the block covariance matrix (of dimension p, say). In simple non-hierarchical Gaussian settings Gelman, Roberts, and Gilks (1995) find that κ = 5.8/p is optimal, leading to acceptance rates of the approximate form 0.22 + 0.31/p - 0.09/p². It is also possible to apply a version of the adaptive algorithm described above at the block level, in which Σ̂ is updated during the adaptation period along with κ, bearing in mind that the target value of r should decrease with increasing p. MLwiN release 1.1 offers both fixed and adaptive κ options for block updating.

3.3 RELR computational efficiency results

We have performed a small simulation to compare the adaptive methods of Section 3.2, in terms of MCMC efficiency in RELR models, with Gibbs sampling via adaptive rejection as implemented in the package WinBUGS. Adaptive rejection sampling (ARS; Gilks and Wild 1992) avoids the problem of nonstandard full conditional distributions in models with dichotomous outcomes by using a version of rejection sampling (e.g., Ripley 1987) in which the upper and lower envelopes evolve adaptively depending on the points sampled so far in the run. Results in this section are anecdotal but representative of many similar comparisons we have made.

Tables 7 and 8 compare the MLwiN univariate and multivariate adaptive hybrid Metropolis-Gibbs approaches with ARS in the fitting of model (13) to two data sets. In Table 7 the outcome variable is an indicator in the JSP data (N = 887) of whether or not the student's math5 score was 30 or above (this occurred 67% of the time), and the predictors include a centered version of math3 (x_1) and dummy variables for gender (x_2) and whether the principal wage-earner in the student's family was a manual or nonmanual worker (x_3). The data for Table 8 are taken from the British Election Study (Heath et al. 1996), a longitudinal survey of the determinants of voting behavior. The sample studied here includes N = 800 people, chosen representatively from a total of 110 constituencies (the grouping variable), who were asked (among other things) to report how they voted in the 1983 British general election. The outcome in Table 8 is an indicator of whether the person reported voting Conservative or not (44% of the sample did so), and the predictors were centered versions of variables, each on a 21-point scale, measuring attitudes toward nuclear weapons (x_1), high unemployment as a means to lower inflation (x_2), tax cuts (x_3), and privatization (x_4).

MCMC efficiency is measured with default Raftery-Lewis estimates of required length of monitoring run; results in both tables are based on an average of five chains using different random seeds, each with a burn-in of 500 from good starting values and a monitoring run of 50,000 iterations. Table 7 shows that (a) Gibbs sampling via ARS is the most efficient method per iteration, by a considerable margin, with n̂_M values across the


More information

Part 6: Multivariate Normal and Linear Models

Part 6: Multivariate Normal and Linear Models Part 6: Multivariate Normal and Linear Models 1 Multiple measurements Up until now all of our statistical models have been univariate models models for a single measurement on each member of a sample of

More information

Package lmm. R topics documented: March 19, Version 0.4. Date Title Linear mixed models. Author Joseph L. Schafer

Package lmm. R topics documented: March 19, Version 0.4. Date Title Linear mixed models. Author Joseph L. Schafer Package lmm March 19, 2012 Version 0.4 Date 2012-3-19 Title Linear mixed models Author Joseph L. Schafer Maintainer Jing hua Zhao Depends R (>= 2.0.0) Description Some

More information

Bayes: All uncertainty is described using probability.

Bayes: All uncertainty is described using probability. Bayes: All uncertainty is described using probability. Let w be the data and θ be any unknown quantities. Likelihood. The probability model π(w θ) has θ fixed and w varying. The likelihood L(θ; w) is π(w

More information

ECO 513 Fall 2009 C. Sims HIDDEN MARKOV CHAIN MODELS

ECO 513 Fall 2009 C. Sims HIDDEN MARKOV CHAIN MODELS ECO 513 Fall 2009 C. Sims HIDDEN MARKOV CHAIN MODELS 1. THE CLASS OF MODELS y t {y s, s < t} p(y t θ t, {y s, s < t}) θ t = θ(s t ) P[S t = i S t 1 = j] = h ij. 2. WHAT S HANDY ABOUT IT Evaluating the

More information

STAT 425: Introduction to Bayesian Analysis

STAT 425: Introduction to Bayesian Analysis STAT 425: Introduction to Bayesian Analysis Marina Vannucci Rice University, USA Fall 2017 Marina Vannucci (Rice University, USA) Bayesian Analysis (Part 2) Fall 2017 1 / 19 Part 2: Markov chain Monte

More information

Latent Variable Centering of Predictors and Mediators in Multilevel and Time-Series Models

Latent Variable Centering of Predictors and Mediators in Multilevel and Time-Series Models Latent Variable Centering of Predictors and Mediators in Multilevel and Time-Series Models Tihomir Asparouhov and Bengt Muthén August 5, 2018 Abstract We discuss different methods for centering a predictor

More information

16 : Approximate Inference: Markov Chain Monte Carlo

16 : Approximate Inference: Markov Chain Monte Carlo 10-708: Probabilistic Graphical Models 10-708, Spring 2017 16 : Approximate Inference: Markov Chain Monte Carlo Lecturer: Eric P. Xing Scribes: Yuan Yang, Chao-Ming Yen 1 Introduction As the target distribution

More information

Univariate Normal Distribution; GLM with the Univariate Normal; Least Squares Estimation

Univariate Normal Distribution; GLM with the Univariate Normal; Least Squares Estimation Univariate Normal Distribution; GLM with the Univariate Normal; Least Squares Estimation PRE 905: Multivariate Analysis Spring 2014 Lecture 4 Today s Class The building blocks: The basics of mathematical

More information

Ronald Christensen. University of New Mexico. Albuquerque, New Mexico. Wesley Johnson. University of California, Irvine. Irvine, California

Ronald Christensen. University of New Mexico. Albuquerque, New Mexico. Wesley Johnson. University of California, Irvine. Irvine, California Texts in Statistical Science Bayesian Ideas and Data Analysis An Introduction for Scientists and Statisticians Ronald Christensen University of New Mexico Albuquerque, New Mexico Wesley Johnson University

More information

Bayesian Analysis of Latent Variable Models using Mplus

Bayesian Analysis of Latent Variable Models using Mplus Bayesian Analysis of Latent Variable Models using Mplus Tihomir Asparouhov and Bengt Muthén Version 2 June 29, 2010 1 1 Introduction In this paper we describe some of the modeling possibilities that are

More information

A Search and Jump Algorithm for Markov Chain Monte Carlo Sampling. Christopher Jennison. Adriana Ibrahim. Seminar at University of Kuwait

A Search and Jump Algorithm for Markov Chain Monte Carlo Sampling. Christopher Jennison. Adriana Ibrahim. Seminar at University of Kuwait A Search and Jump Algorithm for Markov Chain Monte Carlo Sampling Christopher Jennison Department of Mathematical Sciences, University of Bath, UK http://people.bath.ac.uk/mascj Adriana Ibrahim Institute

More information

LINEAR MULTILEVEL MODELS. Data are often hierarchical. By this we mean that data contain information

LINEAR MULTILEVEL MODELS. Data are often hierarchical. By this we mean that data contain information LINEAR MULTILEVEL MODELS JAN DE LEEUW ABSTRACT. This is an entry for The Encyclopedia of Statistics in Behavioral Science, to be published by Wiley in 2005. 1. HIERARCHICAL DATA Data are often hierarchical.

More information

An Introduction to Path Analysis

An Introduction to Path Analysis An Introduction to Path Analysis PRE 905: Multivariate Analysis Lecture 10: April 15, 2014 PRE 905: Lecture 10 Path Analysis Today s Lecture Path analysis starting with multivariate regression then arriving

More information

Specifying Latent Curve and Other Growth Models Using Mplus. (Revised )

Specifying Latent Curve and Other Growth Models Using Mplus. (Revised ) Ronald H. Heck 1 University of Hawai i at Mānoa Handout #20 Specifying Latent Curve and Other Growth Models Using Mplus (Revised 12-1-2014) The SEM approach offers a contrasting framework for use in analyzing

More information

Linear Mixed Models. One-way layout REML. Likelihood. Another perspective. Relationship to classical ideas. Drawbacks.

Linear Mixed Models. One-way layout REML. Likelihood. Another perspective. Relationship to classical ideas. Drawbacks. Linear Mixed Models One-way layout Y = Xβ + Zb + ɛ where X and Z are specified design matrices, β is a vector of fixed effect coefficients, b and ɛ are random, mean zero, Gaussian if needed. Usually think

More information

Lecture 5: Spatial probit models. James P. LeSage University of Toledo Department of Economics Toledo, OH

Lecture 5: Spatial probit models. James P. LeSage University of Toledo Department of Economics Toledo, OH Lecture 5: Spatial probit models James P. LeSage University of Toledo Department of Economics Toledo, OH 43606 jlesage@spatial-econometrics.com March 2004 1 A Bayesian spatial probit model with individual

More information

Item Parameter Calibration of LSAT Items Using MCMC Approximation of Bayes Posterior Distributions

Item Parameter Calibration of LSAT Items Using MCMC Approximation of Bayes Posterior Distributions R U T C O R R E S E A R C H R E P O R T Item Parameter Calibration of LSAT Items Using MCMC Approximation of Bayes Posterior Distributions Douglas H. Jones a Mikhail Nediak b RRR 7-2, February, 2! " ##$%#&

More information

Introducing Generalized Linear Models: Logistic Regression

Introducing Generalized Linear Models: Logistic Regression Ron Heck, Summer 2012 Seminars 1 Multilevel Regression Models and Their Applications Seminar Introducing Generalized Linear Models: Logistic Regression The generalized linear model (GLM) represents and

More information

MCMC algorithms for fitting Bayesian models

MCMC algorithms for fitting Bayesian models MCMC algorithms for fitting Bayesian models p. 1/1 MCMC algorithms for fitting Bayesian models Sudipto Banerjee sudiptob@biostat.umn.edu University of Minnesota MCMC algorithms for fitting Bayesian models

More information

Introduction to Matrix Algebra and the Multivariate Normal Distribution

Introduction to Matrix Algebra and the Multivariate Normal Distribution Introduction to Matrix Algebra and the Multivariate Normal Distribution Introduction to Structural Equation Modeling Lecture #2 January 18, 2012 ERSH 8750: Lecture 2 Motivation for Learning the Multivariate

More information

Estimating a Piecewise Growth Model with Longitudinal Data that Contains Individual Mobility across Clusters

Estimating a Piecewise Growth Model with Longitudinal Data that Contains Individual Mobility across Clusters Estimating a Piecewise Growth Model with Longitudinal Data that Contains Individual Mobility across Clusters Audrey J. Leroux Georgia State University Piecewise Growth Model (PGM) PGMs are beneficial for

More information

A Note on Bayesian Inference After Multiple Imputation

A Note on Bayesian Inference After Multiple Imputation A Note on Bayesian Inference After Multiple Imputation Xiang Zhou and Jerome P. Reiter Abstract This article is aimed at practitioners who plan to use Bayesian inference on multiplyimputed datasets in

More information

ML estimation: Random-intercepts logistic model. and z

ML estimation: Random-intercepts logistic model. and z ML estimation: Random-intercepts logistic model log p ij 1 p = x ijβ + υ i with υ i N(0, συ) 2 ij Standardizing the random effect, θ i = υ i /σ υ, yields log p ij 1 p = x ij β + σ υθ i with θ i N(0, 1)

More information

Stat 5101 Lecture Notes

Stat 5101 Lecture Notes Stat 5101 Lecture Notes Charles J. Geyer Copyright 1998, 1999, 2000, 2001 by Charles J. Geyer May 7, 2001 ii Stat 5101 (Geyer) Course Notes Contents 1 Random Variables and Change of Variables 1 1.1 Random

More information

Markov Chain Monte Carlo (MCMC) and Model Evaluation. August 15, 2017

Markov Chain Monte Carlo (MCMC) and Model Evaluation. August 15, 2017 Markov Chain Monte Carlo (MCMC) and Model Evaluation August 15, 2017 Frequentist Linking Frequentist and Bayesian Statistics How can we estimate model parameters and what does it imply? Want to find the

More information

Multilevel Statistical Models: 3 rd edition, 2003 Contents

Multilevel Statistical Models: 3 rd edition, 2003 Contents Multilevel Statistical Models: 3 rd edition, 2003 Contents Preface Acknowledgements Notation Two and three level models. A general classification notation and diagram Glossary Chapter 1 An introduction

More information

MCMC Methods: Gibbs and Metropolis

MCMC Methods: Gibbs and Metropolis MCMC Methods: Gibbs and Metropolis Patrick Breheny February 28 Patrick Breheny BST 701: Bayesian Modeling in Biostatistics 1/30 Introduction As we have seen, the ability to sample from the posterior distribution

More information

STA 294: Stochastic Processes & Bayesian Nonparametrics

STA 294: Stochastic Processes & Bayesian Nonparametrics MARKOV CHAINS AND CONVERGENCE CONCEPTS Markov chains are among the simplest stochastic processes, just one step beyond iid sequences of random variables. Traditionally they ve been used in modelling a

More information

Markov Chain Monte Carlo in Practice

Markov Chain Monte Carlo in Practice Markov Chain Monte Carlo in Practice Edited by W.R. Gilks Medical Research Council Biostatistics Unit Cambridge UK S. Richardson French National Institute for Health and Medical Research Vilejuif France

More information

Bayesian Inference in GLMs. Frequentists typically base inferences on MLEs, asymptotic confidence

Bayesian Inference in GLMs. Frequentists typically base inferences on MLEs, asymptotic confidence Bayesian Inference in GLMs Frequentists typically base inferences on MLEs, asymptotic confidence limits, and log-likelihood ratio tests Bayesians base inferences on the posterior distribution of the unknowns

More information

Bayesian Inference. Chapter 4: Regression and Hierarchical Models

Bayesian Inference. Chapter 4: Regression and Hierarchical Models Bayesian Inference Chapter 4: Regression and Hierarchical Models Conchi Ausín and Mike Wiper Department of Statistics Universidad Carlos III de Madrid Advanced Statistics and Data Mining Summer School

More information

The Mixture Approach for Simulating New Families of Bivariate Distributions with Specified Correlations

The Mixture Approach for Simulating New Families of Bivariate Distributions with Specified Correlations The Mixture Approach for Simulating New Families of Bivariate Distributions with Specified Correlations John R. Michael, Significance, Inc. and William R. Schucany, Southern Methodist University The mixture

More information

CS242: Probabilistic Graphical Models Lecture 7B: Markov Chain Monte Carlo & Gibbs Sampling

CS242: Probabilistic Graphical Models Lecture 7B: Markov Chain Monte Carlo & Gibbs Sampling CS242: Probabilistic Graphical Models Lecture 7B: Markov Chain Monte Carlo & Gibbs Sampling Professor Erik Sudderth Brown University Computer Science October 27, 2016 Some figures and materials courtesy

More information

Contents. Part I: Fundamentals of Bayesian Inference 1

Contents. Part I: Fundamentals of Bayesian Inference 1 Contents Preface xiii Part I: Fundamentals of Bayesian Inference 1 1 Probability and inference 3 1.1 The three steps of Bayesian data analysis 3 1.2 General notation for statistical inference 4 1.3 Bayesian

More information

Ron Heck, Fall Week 8: Introducing Generalized Linear Models: Logistic Regression 1 (Replaces prior revision dated October 20, 2011)

Ron Heck, Fall Week 8: Introducing Generalized Linear Models: Logistic Regression 1 (Replaces prior revision dated October 20, 2011) Ron Heck, Fall 2011 1 EDEP 768E: Seminar in Multilevel Modeling rev. January 3, 2012 (see footnote) Week 8: Introducing Generalized Linear Models: Logistic Regression 1 (Replaces prior revision dated October

More information

Markov Chain Monte Carlo (MCMC)

Markov Chain Monte Carlo (MCMC) Markov Chain Monte Carlo (MCMC Dependent Sampling Suppose we wish to sample from a density π, and we can evaluate π as a function but have no means to directly generate a sample. Rejection sampling can

More information

Model Estimation Example

Model Estimation Example Ronald H. Heck 1 EDEP 606: Multivariate Methods (S2013) April 7, 2013 Model Estimation Example As we have moved through the course this semester, we have encountered the concept of model estimation. Discussions

More information

Gibbs Sampling in Linear Models #2

Gibbs Sampling in Linear Models #2 Gibbs Sampling in Linear Models #2 Econ 690 Purdue University Outline 1 Linear Regression Model with a Changepoint Example with Temperature Data 2 The Seemingly Unrelated Regressions Model 3 Gibbs sampling

More information

Bayesian linear regression

Bayesian linear regression Bayesian linear regression Linear regression is the basis of most statistical modeling. The model is Y i = X T i β + ε i, where Y i is the continuous response X i = (X i1,..., X ip ) T is the corresponding

More information

401 Review. 6. Power analysis for one/two-sample hypothesis tests and for correlation analysis.

401 Review. 6. Power analysis for one/two-sample hypothesis tests and for correlation analysis. 401 Review Major topics of the course 1. Univariate analysis 2. Bivariate analysis 3. Simple linear regression 4. Linear algebra 5. Multiple regression analysis Major analysis methods 1. Graphical analysis

More information

Eco517 Fall 2013 C. Sims MCMC. October 8, 2013

Eco517 Fall 2013 C. Sims MCMC. October 8, 2013 Eco517 Fall 2013 C. Sims MCMC October 8, 2013 c 2013 by Christopher A. Sims. This document may be reproduced for educational and research purposes, so long as the copies contain this notice and are retained

More information

MLMED. User Guide. Nicholas J. Rockwood The Ohio State University Beta Version May, 2017

MLMED. User Guide. Nicholas J. Rockwood The Ohio State University Beta Version May, 2017 MLMED User Guide Nicholas J. Rockwood The Ohio State University rockwood.19@osu.edu Beta Version May, 2017 MLmed is a computational macro for SPSS that simplifies the fitting of multilevel mediation and

More information

Principles of Bayesian Inference

Principles of Bayesian Inference Principles of Bayesian Inference Sudipto Banerjee University of Minnesota July 20th, 2008 1 Bayesian Principles Classical statistics: model parameters are fixed and unknown. A Bayesian thinks of parameters

More information

Statistical Practice

Statistical Practice Statistical Practice A Note on Bayesian Inference After Multiple Imputation Xiang ZHOU and Jerome P. REITER This article is aimed at practitioners who plan to use Bayesian inference on multiply-imputed

More information

Bayesian non-parametric model to longitudinally predict churn

Bayesian non-parametric model to longitudinally predict churn Bayesian non-parametric model to longitudinally predict churn Bruno Scarpa Università di Padova Conference of European Statistics Stakeholders Methodologists, Producers and Users of European Statistics

More information

Penalized Loss functions for Bayesian Model Choice

Penalized Loss functions for Bayesian Model Choice Penalized Loss functions for Bayesian Model Choice Martyn International Agency for Research on Cancer Lyon, France 13 November 2009 The pure approach For a Bayesian purist, all uncertainty is represented

More information

An Introduction to Mplus and Path Analysis

An Introduction to Mplus and Path Analysis An Introduction to Mplus and Path Analysis PSYC 943: Fundamentals of Multivariate Modeling Lecture 10: October 30, 2013 PSYC 943: Lecture 10 Today s Lecture Path analysis starting with multivariate regression

More information

Supplement to A Hierarchical Approach for Fitting Curves to Response Time Measurements

Supplement to A Hierarchical Approach for Fitting Curves to Response Time Measurements Supplement to A Hierarchical Approach for Fitting Curves to Response Time Measurements Jeffrey N. Rouder Francis Tuerlinckx Paul L. Speckman Jun Lu & Pablo Gomez May 4 008 1 The Weibull regression model

More information

Bayesian inference for factor scores

Bayesian inference for factor scores Bayesian inference for factor scores Murray Aitkin and Irit Aitkin School of Mathematics and Statistics University of Newcastle UK October, 3 Abstract Bayesian inference for the parameters of the factor

More information

A Re-Introduction to General Linear Models (GLM)

A Re-Introduction to General Linear Models (GLM) A Re-Introduction to General Linear Models (GLM) Today s Class: You do know the GLM Estimation (where the numbers in the output come from): From least squares to restricted maximum likelihood (REML) Reviewing

More information

CTDL-Positive Stable Frailty Model

CTDL-Positive Stable Frailty Model CTDL-Positive Stable Frailty Model M. Blagojevic 1, G. MacKenzie 2 1 Department of Mathematics, Keele University, Staffordshire ST5 5BG,UK and 2 Centre of Biostatistics, University of Limerick, Ireland

More information

27 : Distributed Monte Carlo Markov Chain. 1 Recap of MCMC and Naive Parallel Gibbs Sampling

27 : Distributed Monte Carlo Markov Chain. 1 Recap of MCMC and Naive Parallel Gibbs Sampling 10-708: Probabilistic Graphical Models 10-708, Spring 2014 27 : Distributed Monte Carlo Markov Chain Lecturer: Eric P. Xing Scribes: Pengtao Xie, Khoa Luu In this scribe, we are going to review the Parallel

More information

TAKEHOME FINAL EXAM e iω e 2iω e iω e 2iω

TAKEHOME FINAL EXAM e iω e 2iω e iω e 2iω ECO 513 Spring 2015 TAKEHOME FINAL EXAM (1) Suppose the univariate stochastic process y is ARMA(2,2) of the following form: y t = 1.6974y t 1.9604y t 2 + ε t 1.6628ε t 1 +.9216ε t 2, (1) where ε is i.i.d.

More information

STA414/2104 Statistical Methods for Machine Learning II

STA414/2104 Statistical Methods for Machine Learning II STA414/2104 Statistical Methods for Machine Learning II Murat A. Erdogdu & David Duvenaud Department of Computer Science Department of Statistical Sciences Lecture 3 Slide credits: Russ Salakhutdinov Announcements

More information

Bayesian Inference for the Multivariate Normal

Bayesian Inference for the Multivariate Normal Bayesian Inference for the Multivariate Normal Will Penny Wellcome Trust Centre for Neuroimaging, University College, London WC1N 3BG, UK. November 28, 2014 Abstract Bayesian inference for the multivariate

More information

Lecture 5. G. Cowan Lectures on Statistical Data Analysis Lecture 5 page 1

Lecture 5. G. Cowan Lectures on Statistical Data Analysis Lecture 5 page 1 Lecture 5 1 Probability (90 min.) Definition, Bayes theorem, probability densities and their properties, catalogue of pdfs, Monte Carlo 2 Statistical tests (90 min.) general concepts, test statistics,

More information

Bayesian Inference. Chapter 4: Regression and Hierarchical Models

Bayesian Inference. Chapter 4: Regression and Hierarchical Models Bayesian Inference Chapter 4: Regression and Hierarchical Models Conchi Ausín and Mike Wiper Department of Statistics Universidad Carlos III de Madrid Master in Business Administration and Quantitative

More information

Introduction to Bayesian methods in inverse problems

Introduction to Bayesian methods in inverse problems Introduction to Bayesian methods in inverse problems Ville Kolehmainen 1 1 Department of Applied Physics, University of Eastern Finland, Kuopio, Finland March 4 2013 Manchester, UK. Contents Introduction

More information

Markov Chain Monte Carlo

Markov Chain Monte Carlo Markov Chain Monte Carlo Recall: To compute the expectation E ( h(y ) ) we use the approximation E(h(Y )) 1 n n h(y ) t=1 with Y (1),..., Y (n) h(y). Thus our aim is to sample Y (1),..., Y (n) from f(y).

More information

Ages of stellar populations from color-magnitude diagrams. Paul Baines. September 30, 2008

Ages of stellar populations from color-magnitude diagrams. Paul Baines. September 30, 2008 Ages of stellar populations from color-magnitude diagrams Paul Baines Department of Statistics Harvard University September 30, 2008 Context & Example Welcome! Today we will look at using hierarchical

More information

Performance of Likelihood-Based Estimation Methods for Multilevel Binary Regression Models

Performance of Likelihood-Based Estimation Methods for Multilevel Binary Regression Models Performance of Likelihood-Based Estimation Methods for Multilevel Binary Regression Models Marc Callens and Christophe Croux 1 K.U. Leuven Abstract: By means of a fractional factorial simulation experiment,

More information

The Wishart distribution Scaled Wishart. Wishart Priors. Patrick Breheny. March 28. Patrick Breheny BST 701: Bayesian Modeling in Biostatistics 1/11

The Wishart distribution Scaled Wishart. Wishart Priors. Patrick Breheny. March 28. Patrick Breheny BST 701: Bayesian Modeling in Biostatistics 1/11 Wishart Priors Patrick Breheny March 28 Patrick Breheny BST 701: Bayesian Modeling in Biostatistics 1/11 Introduction When more than two coefficients vary, it becomes difficult to directly model each element

More information

Chapter 1 Statistical Inference

Chapter 1 Statistical Inference Chapter 1 Statistical Inference causal inference To infer causality, you need a randomized experiment (or a huge observational study and lots of outside information). inference to populations Generalizations

More information

STAT 518 Intro Student Presentation

STAT 518 Intro Student Presentation STAT 518 Intro Student Presentation Wen Wei Loh April 11, 2013 Title of paper Radford M. Neal [1999] Bayesian Statistics, 6: 475-501, 1999 What the paper is about Regression and Classification Flexible

More information

Estimation in Generalized Linear Models with Heterogeneous Random Effects. Woncheol Jang Johan Lim. May 19, 2004

Estimation in Generalized Linear Models with Heterogeneous Random Effects. Woncheol Jang Johan Lim. May 19, 2004 Estimation in Generalized Linear Models with Heterogeneous Random Effects Woncheol Jang Johan Lim May 19, 2004 Abstract The penalized quasi-likelihood (PQL) approach is the most common estimation procedure

More information

Multilevel Modeling: When and Why 1. 1 Why multilevel data need multilevel models

Multilevel Modeling: When and Why 1. 1 Why multilevel data need multilevel models Multilevel Modeling: When and Why 1 J. Hox University of Amsterdam & Utrecht University Amsterdam/Utrecht, the Netherlands Abstract: Multilevel models have become popular for the analysis of a variety

More information

Use of Bayesian multivariate prediction models to optimize chromatographic methods

Use of Bayesian multivariate prediction models to optimize chromatographic methods Use of Bayesian multivariate prediction models to optimize chromatographic methods UCB Pharma! Braine lʼalleud (Belgium)! May 2010 Pierre Lebrun, ULg & Arlenda Bruno Boulanger, ULg & Arlenda Philippe Lambert,

More information

Bayesian Inference for Regression Parameters

Bayesian Inference for Regression Parameters Bayesian Inference for Regression Parameters 1 Bayesian inference for simple linear regression parameters follows the usual pattern for all Bayesian analyses: 1. Form a prior distribution over all unknown

More information

Pattern Recognition and Machine Learning

Pattern Recognition and Machine Learning Christopher M. Bishop Pattern Recognition and Machine Learning ÖSpri inger Contents Preface Mathematical notation Contents vii xi xiii 1 Introduction 1 1.1 Example: Polynomial Curve Fitting 4 1.2 Probability

More information