
Chapter 1

NON-HIERARCHICAL MULTILEVEL MODELS

Jon Rasbash and William Browne

1. INTRODUCTION

In the models discussed in this book so far we have assumed that the structures of the populations from which the data have been drawn are hierarchical. This assumption is sometimes not justified. In this chapter two main types of non-hierarchical model are considered. Firstly, cross-classified models; the notion of cross-classification is probably reasonably familiar to most readers. Secondly, we consider multiple membership models, where lower level units are influenced by more than one higher level unit from the same classification. For example, some pupils may attend more than one school. We also consider situations that contain a mixture of hierarchical, crossed and multiple membership relationships.

2. CROSS-CLASSIFIED MODELS

This section is divided into three parts. In the first part we look at situations that can give rise to a two-way cross-classification, introduce some diagrams for describing the population structure, and discuss notation for constructing a statistical model. In the second part we discuss some of the possible methods for estimating cross-classified models and give an example analysis of an educational data set. In the third part we describe some more complex cross-classified structures and give an example analysis of a medical data set.

2.1 TWO WAY CROSS-CLASSIFICATIONS: A BASIC MODEL

Suppose we have data on a large number of patients attending many hospitals, and we also know the neighbourhood in which each patient lives and regard patient, neighbourhood and hospital all as important

sources of variation for the patient-level outcome measure we wish to study. Now, typically, hospitals will draw patients from many different neighbourhoods and the inhabitants of a neighbourhood will go to many hospitals. No pure hierarchy can be found and patients are said to be contained within a cross-classification of hospitals by neighbourhoods. This can be represented schematically, for the case of twelve patients contained within a cross-classification of three neighbourhoods by four hospitals, as in table 1.1.

Table 1.1 Patients cross-classified by hospital and neighbourhood.

                 Neighbourhood 1   Neighbourhood 2   Neighbourhood 3
    Hospital 1         XX                X
    Hospital 2         X                 X
    Hospital 3                           XX                X
    Hospital 4         X                                   XXX

In this example we have patients at level 1, and neighbourhood and hospital are cross-classified at level 2. The characteristic pattern of a cross-classification is shown: some rows contain multiple entries and some columns contain multiple entries. In a nested relationship, if the row classification is nested within the column classification then all the entries across a row will fall under a single column, and vice versa if the column classification is nested within the row classification. For example, if hospitals are nested within neighbourhoods we might observe the pattern in table 1.2.

Table 1.2 Patients nested within hospitals within neighbourhoods.

                       Hospital 1   Hospital 2   Hospital 3   Hospital 4
    Neighbourhood 1       XXX                                    XXXX
    Neighbourhood 2                    XX
    Neighbourhood 3                                 XXX
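The distinction between the two tables can be stated mechanically: a classification A is nested within a classification B exactly when every unit of A appears under a single unit of B. A small Python sketch of that check (the index vectors are illustrative, following the pattern of the tables above):

```python
def is_nested(inner, outer):
    """inner[i], outer[i]: classification labels for observation i.
    inner is nested in outer iff each inner unit appears under one outer unit."""
    seen = {}
    for a, b in zip(inner, outer):
        if seen.setdefault(a, b) != b:
            return False  # inner unit 'a' has two distinct parents -> crossed
    return True

# Table 1.2-style structure: hospitals nested within neighbourhoods
hosp = [1, 1, 1, 2, 2, 3, 3, 3, 4, 4, 4, 4]
nbhd = [1, 1, 1, 2, 2, 3, 3, 3, 1, 1, 1, 1]
print(is_nested(hosp, nbhd))   # True: each hospital draws from one neighbourhood

# Crossed structure: hospital 1 now draws patients from two neighbourhoods
nbhd2 = [1, 1, 2, 1, 2, 2, 2, 3, 1, 3, 3, 3]
print(is_nested(hosp, nbhd2))  # False
```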

Many studies follow this simple two-way crossed structure; here are a few examples:

Education: students cross-classified by primary school and secondary school.
Health: patients cross-classified by general practice and hospital.
Survey data: individuals cross-classified by interviewer and area of residence.

Diagrams for representing the relationship between classifications.

We find two types of diagram useful in expressing the nature of relationships between classifications. Firstly, unit diagrams, where we draw every unit (patient, hospital and neighbourhood, in the case of our first example) and connect each lowest level unit (patient) to its parent units (hospital, neighbourhood). Such a representation of the data in table 1.1 is shown in figure 1.1.

[Figure 1.1 Unit diagram for the crossed structure given in table 1.1: hospitals H1-H4 and neighbourhoods N1-N3, each connected to their patients P1-P12.]

Note that we have two hierarchies present, patients within hospitals and patients within neighbourhoods; we have organised the topology of the diagram such that patients are nested within hospitals. However,

when we come to add neighbourhoods to the diagram we see that the connecting lines cross, indicating that we have a cross-classification. Drawing the hierarchical structure shown in table 1.2 gives the representation shown in figure 1.2.

[Figure 1.2 Unit diagram for the completely nested structure given in table 1.2: neighbourhoods N1-N3 above hospitals H1-H4 above patients P1-P12, with no crossing lines.]

Clearly, to draw such diagrams including all units is not practical for large data sets, as there will be far too many nodes on the diagram to fit into a reasonable area. However, they can be used in schematic form to convey the structure of the relationship between classifications. Even so, when we have four or five random classifications present (as commonly occurs with social data) schematic forms of these diagrams can become hard to read. There is a more minimal diagram, the classification diagram, which has one node for each classification. Nodes connected by an arrow indicate a nested relationship, nodes connected by a double arrow indicate a multiple-membership relationship (examples are given later) and unconnected nodes indicate a crossed relationship. Thus the crossed structure in figure 1.1 and the completely nested structure of figure 1.2 are drawn as

shown in figure 1.3.

[Figure 1.3 Classification diagrams for (i) the crossed structure, with hospital and neighbourhood as unconnected nodes above patient, and (ii) the nested structure, with neighbourhood above hospital above patient.]

Some notation for constructing a statistical model.

The matrix notation used in this book for describing hierarchical models, that is,

    y_j = X_j \beta + Z_j u_j + e_j

does not readily extend to the case of cross-classifications. This is because the notation assumes a unique hierarchy, where we write down the generic equation for the jth level two unit. In a simple cross-classification we have two sets of level two units, for example hospitals and neighbourhoods, so which classification is j indexing?

We can extend the basic scalar notation to handle cross-classified structures. Assume we have patients nested within a cross-classification of neighbourhoods by hospitals, that is, the case illustrated in figure 1.3(i). Suppose we want to estimate a simple variance components model giving estimates of the mean response and patient, hospital and neighbourhood level variation. In this case we can write the model in scalar notation as

    y_{i(j_1 j_2)} = \beta_0 + u_{j_1} + u_{j_2} + e_{i(j_1 j_2)}

where \beta_0 estimates the mean response, j_1 indexes the neighbourhood classification, j_2 indexes the hospital classification, u_{j_1} is the random

effect for neighbourhood j_1, u_{j_2} is the random effect for hospital j_2, y_{i(j_1 j_2)} is the response for the ith patient from the cell in the cross-classification defined by neighbourhood j_1 and hospital j_2, and finally e_{i(j_1 j_2)} is the patient-level residual for the ith patient from that cell. Details of how this notation extends to represent more complex models and patterns of cross-classification are given in Rasbash and Browne.

One problem with this notation is that as we fit models with more classifications and more complex patterns of crossing, the subscript notation that describes the patterns becomes very cumbersome and difficult to read. We therefore prefer an alternative notation introduced in Browne et al., 2000. We can write the same model as

    y_i = \beta_0 + u^{(2)}_{nbhd(i)} + u^{(3)}_{hosp(i)} + e_i

where i indexes the observation level, in this case patients, and nbhd(i) and hosp(i) are functions that return the unit number of the neighbourhood and hospital, respectively, that patient i belongs to. Thus for the data structure drawn in figure 1.1 the values of nbhd(i) and hosp(i) are given in table 1.3.

Table 1.3 Indexing table for neighbourhoods and hospitals for patients given in figure 1.1.

     i    nbhd(i)   hosp(i)
     1       1         1
     2       1         1
     3       2         1
     4       1         2
     5       2         2
     6       2         3
     7       2         3
     8       3         3
     9       1         4
    10       3         4
    11       3         4
    12       3         4

Therefore the model for patient 3 would be

    y_3 = \beta_0 + u^{(2)}_2 + u^{(3)}_1 + e_3

and for patient 5 would be

    y_5 = \beta_0 + u^{(2)}_2 + u^{(3)}_2 + e_5

We number the classifications from 2 upwards, as we use classification number 1 to represent the identity classification that applies to the observation level (like level 1 in a hierarchical model); this classification simply returns the row numbers in the data matrix. As can be seen, random effects require bracketed superscripting with their classification number to avoid ambiguity. This simplified notation has the advantage that the subscripting does not increase in complexity as we add more classifications. This simplification is achieved because the notation makes no attempt to describe the patterns of crossing and nesting present. That is useful information, and we therefore advocate the use of this notation in conjunction with the classification diagrams, as shown in figure 1.3, which display these patterns explicitly.

2.2 ESTIMATION ALGORITHMS

We will describe three estimation algorithms for fitting cross-classified models in detail and mention other alternatives. Each of these three methods has advantages and disadvantages in terms of speed, memory usage and bias, and these will be discussed later. All three methods have been implemented in versions of the MLwiN software package (Rasbash et al., 2000) and all results in this paper are produced by this package.

An IGLS algorithm for estimating cross-classified models.

The iterative generalized least squares estimates for a multilevel model are those estimates which simultaneously satisfy both of the following equations:

    \hat{\beta} = (X^T V^{-1} X)^{-1} (X^T V^{-1} y)
    \hat{\theta} = (Z^{*T} (V^*)^{-1} Z^*)^{-1} Z^{*T} (V^*)^{-1} y^*

where \hat{\beta} are the estimated fixed coefficients and \hat{\theta} is a vector containing the estimated variances and covariances of the sets of random effects in the model. V = Cov(y | X\beta), and an estimate of V is constructed from the elements of \hat{\theta}. y^* is the vector of elements of (y - X\hat{\beta})(y - X\hat{\beta})^T

and therefore has length n^2 (n is the length of the data set). V^* is the covariance matrix of y^* and Z^* is the design matrix linking y^* to V^* in the regression of y^* on Z^*. V^* has the form V^* = V \otimes V. See Goldstein, 1986 for more details.

Some of these matrices are massive: for example, (V^*)^{-1} has dimension n^2 by n^2, making a direct software implementation of these estimating equations extremely resource intensive, both in terms of CPU time and memory consumed. However, in hierarchical models V and V^* have a block diagonal structure which can be exploited by customised algorithms (see Goldstein and Rasbash, 1996) to allow efficient computation. The problem presented by cross-classified models is that V (and therefore V^*) no longer has the block diagonal structure which the efficient algorithm requires.

Structure of V for cross-classified models.

Let us take a look at the structure of V, the covariance matrix of y, for cross-classified models, and see how we can adapt the standard IGLS algorithm to handle cross-classifications. The basic two-level cross-classified model (with hospitals and neighbourhoods) can be written as:

    y_i = X\beta + u^{(2)}_{hosp(i)} + u^{(3)}_{nbhd(i)} + e_i
    u^{(2)}_{hosp(i)} ~ N(0, \sigma^2_{u(2)}),   u^{(3)}_{nbhd(i)} ~ N(0, \sigma^2_{u(3)}),   e_i ~ N(0, \sigma^2_e)

The variance of our response is now

    var(y_i) = var(u^{(2)}_{hosp(i)} + u^{(3)}_{nbhd(i)} + e_i) = \sigma^2_{u(2)} + \sigma^2_{u(3)} + \sigma^2_e.

The covariance between individuals a and b is

    cov(y_a, y_b) = cov(u^{(2)}_{hosp(a)} + u^{(3)}_{nbhd(a)} + e_a,  u^{(2)}_{hosp(b)} + u^{(3)}_{nbhd(b)} + e_b)

which simplifies to \sigma^2_{u(2)} for two individuals from the same hospital but different neighbourhoods, \sigma^2_{u(3)} for two individuals from the same neighbourhood but different hospitals, \sigma^2_{u(2)} + \sigma^2_{u(3)} for two individuals from the same neighbourhood and the same hospital, and zero for two individuals from both different neighbourhoods and different hospitals. Take a toy example of five patients in two hospitals, crossed with two neighbourhoods, as shown in table 1.4. This generates a 5 by 5 covariance matrix for the responses of the five patients with the following structure:

Table 1.4 Indexing table for hospitals and neighbourhoods for 5 patients.

    i    hosp(i)   nbhd(i)
    1       1         1
    2       1         2
    3       1         1
    4       2         2
    5       2         1

    V = ( h+n+p   h       h+n     0       n     )
        ( h       h+n+p   h       n       0     )
        ( h+n     h       h+n+p   0       n     )
        ( 0       n       0       h+n+p   h     )
        ( n       0       n       h       h+n+p )

where h = \sigma^2_{u(2)}, n = \sigma^2_{u(3)} and p = \sigma^2_e. Here the data are sorted patients within hospitals; this allows us to split the covariance matrix into two components: a component for patients within hospitals which has a block diagonal structure (P), and a component for neighbourhoods which is not block diagonal (Q):

    V = P + Q

where

    P = ( h+p  h    h    0    0   )        Q = ( n  0  n  0  n )
        ( h    h+p  h    0    0   )            ( 0  n  0  n  0 )
        ( h    h    h+p  0    0   )            ( n  0  n  0  n )
        ( 0    0    0    h+p  h   )            ( 0  n  0  n  0 )
        ( 0    0    0    h    h+p )            ( n  0  n  0  n )

Splitting V into a hierarchical, block-diagonal part that the IGLS algorithm can handle in an efficient way and a non-hierarchical, non-block-diagonal part forms the basis of a relatively efficient algorithm for handling cross-classified models. If we take the dummy variable indicator matrix of neighbourhoods, Z, then we have Q = Z Z^T \sigma^2_{u(3)}.
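The covariance rule and the split V = P + Q can be checked numerically for the toy example of table 1.4. A sketch using numpy (assumed available; the variance values are invented):

```python
import numpy as np

# Toy example of table 1.4: hosp(i) and nbhd(i) for the five patients
hosp = np.array([1, 1, 1, 2, 2])
nbhd = np.array([1, 2, 1, 2, 1])
h, n, p = 0.5, 0.3, 1.0   # sigma2_u(2), sigma2_u(3), sigma2_e (invented values)

# Covariance rule: a shared hospital contributes h, a shared neighbourhood n,
# and the patient-level variance p sits on the diagonal only.
same_h = (hosp[:, None] == hosp[None, :]).astype(float)
same_n = (nbhd[:, None] == nbhd[None, :]).astype(float)
V = h * same_h + n * same_n + p * np.eye(5)

# Split into the block-diagonal within-hospital part P and the crossed part Q
P = h * same_h + p * np.eye(5)
Z = (nbhd[:, None] == np.array([1, 2])[None, :]).astype(float)  # neighbourhood dummies
Q = n * (Z @ Z.T)
print(np.allclose(V, P + Q))  # True
```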

    Z = ( 1  0 )            Z Z^T \sigma^2_{u(3)} = ( n  0  n  0  n )
        ( 0  1 )                                    ( 0  n  0  n  0 )
        ( 1  0 )                                    ( n  0  n  0  n )
        ( 0  1 )                                    ( 0  n  0  n  0 )
        ( 1  0 )                                    ( n  0  n  0  n )

We can define a `pseudo-unit' that spans the entire data set (in our toy example, all five patients) and declare this pseudo-unit to be level three in the model, removing the neighbourhood classification from the model. We can now form the three-level hierarchical model

    y_i = \beta_0 + u^{(2)}_{hosp(i)} + u^{(3)}_{punit(i),1} z_{1i} + u^{(3)}_{punit(i),2} z_{2i} + e_i

    (u^{(3)}_{punit(i),1}, u^{(3)}_{punit(i),2})^T ~ N(0, \Omega^{(3)}),   \Omega^{(3)} = diag(\sigma^2_{u(3)1}, \sigma^2_{u(3)2})

    u^{(2)}_{hosp(i)} ~ N(0, \sigma^2_{u(2)}),   e_i ~ N(0, \sigma^2_e)

Here the level structure is patients within hospitals within the pseudo-unit level. z_1 and z_2 are columns 1 and 2 of Z. \sigma^2_{u(3)1} and \sigma^2_{u(3)2} are both estimates of the between-neighbourhood variation, and we therefore constrain them to be equal. Thus we can use the standard IGLS hierarchical algorithm to define and estimate the correct covariance structure for a cross-classified model. If we had 200 hospitals and 100 neighbourhoods, we would have to form 100 dummy variables for the neighbourhoods, allow them all to have variances at level 3 and constrain those variances to be equal. Details of this algorithm are given in Rasbash and Goldstein, 1994 and Bull et al., 1999; it will be referred to as the RG algorithm in later sections.

MCMC.

MCMC estimation methods (see Chapter 3 of this book for a fuller description) aim to generate samples from the joint posterior distribution of all unknown parameters, and then use these samples to calculate point and interval estimates for each individual parameter. The Gibbs sampler produces samples from the joint posterior by generating in turn from the conditional posterior distributions of groups of unknown parameters. In chapter 3 the Gibbs sampling algorithm for a Normally distributed response hierarchical model is given.

As we have seen in the notation section, we can describe our model as a set of additive terms, one for the fixed part of the model and one

for each of the random classifications. The MCMC algorithm works on each of these terms separately, and consequently the algorithm for a cross-classified model is no more complicated than for a hierarchical model. For illustration we present the steps for the following cross-classified model, based on the variance components hospitals by neighbourhoods model, and refer the interested reader to Browne et al., 2000 for more general algorithms. Note that if the response is dichotomous or a count then, as in chapter 3, we can use the Metropolis-Gibbs hybrid method discussed there. The basic two-level cross-classified model (with hospitals and neighbourhoods) can be written as:

    y_i = X\beta + u^{(2)}_{hosp(i)} + u^{(3)}_{nbhd(i)} + e_i
    u^{(2)}_{hosp(i)} ~ N(0, \sigma^2_{u(2)}),   u^{(3)}_{nbhd(i)} ~ N(0, \sigma^2_{u(3)}),   e_i ~ N(0, \sigma^2_e)

We can split our unknown parameters into 6 distinct sets: the fixed effects \beta, the hospital random effects u^{(2)}, the neighbourhood random effects u^{(3)}, the hospital variance \sigma^2_{u(2)}, the neighbourhood variance \sigma^2_{u(3)} and the residual variance \sigma^2_e. We then need to generate random draws from the conditional distribution of each of these six groups of unknowns. MCMC algorithms are generally used in a Bayesian context and consequently we need to define prior distributions for our unknown parameters. For generality we will use a multivariate Normal prior for the fixed effects, \beta ~ N_{p_f}(\mu_p, S_p), and scaled inverse chi-squared (SI\chi^2) priors for the three variances: for the hospital variance \sigma^2_{u(2)} ~ SI\chi^2(\nu_2, s^2_2), for the neighbourhood variance \sigma^2_{u(3)} ~ SI\chi^2(\nu_3, s^2_3), and for the residual variance \sigma^2_e ~ SI\chi^2(\nu_e, s^2_e).
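Under the model and priors just specified, a Gibbs sampler cycles through these six groups of unknowns in turn. The sketch below is an illustrative implementation (not the MLwiN one), using numpy, an intercept-only fixed part with a vague Normal prior, and diffuse scaled-inverse-chi-squared priors; all data are simulated:

```python
import numpy as np

rng = np.random.default_rng(1)

def gibbs_crossed(y, hosp, nbhd, n_iter=2000, nu=0.001, s2=1.0):
    """Gibbs sampler for y_i = b0 + u2_{hosp(i)} + u3_{nbhd(i)} + e_i.
    hosp, nbhd are 0-based index arrays; b0 has a vague N(0, 1e6) prior and
    each variance a scaled-inverse-chi^2(nu, s2) prior. Illustrative only."""
    N = len(y)
    n2, n3 = hosp.max() + 1, nbhd.max() + 1
    b0, u2, u3 = 0.0, np.zeros(n2), np.zeros(n3)
    s2_2, s2_3, s2_e = 1.0, 1.0, 1.0
    out = []
    for _ in range(n_iter):
        # Step 1: fixed effect (intercept only)
        d = y - u2[hosp] - u3[nbhd]
        D = 1.0 / (N / s2_e + 1e-6)
        b0 = rng.normal(D * d.sum() / s2_e, np.sqrt(D))
        # Step 2: hospital residuals u2_k
        d = y - b0 - u3[nbhd]
        for k in range(n2):
            m = hosp == k
            Dk = 1.0 / (m.sum() / s2_e + 1.0 / s2_2)
            u2[k] = rng.normal(Dk * d[m].sum() / s2_e, np.sqrt(Dk))
        # Step 3: neighbourhood residuals u3_k
        d = y - b0 - u2[hosp]
        for k in range(n3):
            m = nbhd == k
            Dk = 1.0 / (m.sum() / s2_e + 1.0 / s2_3)
            u3[k] = rng.normal(Dk * d[m].sum() / s2_e, np.sqrt(Dk))
        # Steps 4-6: Gamma draws for the three precisions
        s2_2 = 1.0 / rng.gamma((n2 + nu) / 2, 2.0 / ((u2 ** 2).sum() + nu * s2))
        s2_3 = 1.0 / rng.gamma((n3 + nu) / 2, 2.0 / ((u3 ** 2).sum() + nu * s2))
        e = y - b0 - u2[hosp] - u3[nbhd]
        s2_e = 1.0 / rng.gamma((N + nu) / 2, 2.0 / ((e ** 2).sum() + nu * s2))
        out.append((b0, s2_2, s2_3, s2_e))
    return np.array(out)

# Simulated crossed data: 400 patients, 8 hospitals, 6 neighbourhoods
hosp = rng.integers(0, 8, 400)
nbhd = rng.integers(0, 6, 400)
y = 5.0 + rng.normal(0, 0.7, 8)[hosp] + rng.normal(0, 0.5, 6)[nbhd] + rng.normal(0, 1.0, 400)
chain = gibbs_crossed(y, hosp, nbhd)
print(chain[1000:, 0].mean())  # posterior mean of b0, should be near the true value 5
```

Note that numpy's `gamma` takes a scale parameter, so the rate of each full conditional is inverted before drawing.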
The steps are then as follows. In step 1 the conditional posterior distribution in the Gibbs update for the fixed effects parameter vector is multivariate Normal with dimension p_f (the number of fixed effects):

    p(\beta | y, u^{(2)}, u^{(3)}, \sigma^2_{u(2)}, \sigma^2_{u(3)}, \sigma^2_e) ~ N_{p_f}(\hat{\beta}, \hat{D})

where

    \hat{D} = ( \sum_{i=1}^{N} X_i^T X_i / \sigma^2_e + S_p^{-1} )^{-1},
    \hat{\beta} = \hat{D} ( \sum_{i=1}^{N} X_i^T d_i / \sigma^2_e + S_p^{-1} \mu_p )

and d_i = y_i - u^{(2)}_{hosp(i)} - u^{(3)}_{nbhd(i)}.

In step 2 we update the hospital residuals, u^{(2)}_k, using Gibbs sampling with a univariate Normal full conditional distribution:

    p(u^{(2)}_k | y, \beta, u^{(3)}, \sigma^2_{u(2)}, \sigma^2_{u(3)}, \sigma^2_e) ~ N(\hat{u}^{(2)}_k, \hat{D}^{(2)}_k)

where

    \hat{D}^{(2)}_k = ( n^{(2)}_k / \sigma^2_e + 1/\sigma^2_{u(2)} )^{-1},
    \hat{u}^{(2)}_k = \hat{D}^{(2)}_k \sum_{i: hosp(i)=k} d_i / \sigma^2_e

and d_i = y_i - X_i\beta - u^{(3)}_{nbhd(i)}.

In step 3 we update the neighbourhood residuals, u^{(3)}_k, using Gibbs sampling with a univariate Normal full conditional distribution:

    p(u^{(3)}_k | y, \beta, u^{(2)}, \sigma^2_{u(2)}, \sigma^2_{u(3)}, \sigma^2_e) ~ N(\hat{u}^{(3)}_k, \hat{D}^{(3)}_k)

where

    \hat{D}^{(3)}_k = ( n^{(3)}_k / \sigma^2_e + 1/\sigma^2_{u(3)} )^{-1},
    \hat{u}^{(3)}_k = \hat{D}^{(3)}_k \sum_{i: nbhd(i)=k} d^{(3)}_i / \sigma^2_e

and d^{(3)}_i = y_i - X_i\beta - u^{(2)}_{hosp(i)}. Note that in the above two steps n^{(c)}_k refers to the number of individuals in the kth unit of classification c.

In step 4 we update the hospital variance \sigma^2_{u(2)} using Gibbs sampling and a Gamma full conditional distribution for 1/\sigma^2_{u(2)}:

    p(1/\sigma^2_{u(2)} | y, \beta, u^{(2)}, u^{(3)}, \sigma^2_{u(3)}, \sigma^2_e) ~ Gamma( (n_2 + \nu_2)/2,  (\sum_{j=1}^{n_2} (u^{(2)}_j)^2 + \nu_2 s^2_2)/2 )

In step 5 we update the neighbourhood variance \sigma^2_{u(3)} using Gibbs sampling and a Gamma full conditional distribution for 1/\sigma^2_{u(3)}:

    p(1/\sigma^2_{u(3)} | y, \beta, u^{(2)}, u^{(3)}, \sigma^2_{u(2)}, \sigma^2_e) ~ Gamma( (n_3 + \nu_3)/2,  (\sum_{j=1}^{n_3} (u^{(3)}_j)^2 + \nu_3 s^2_3)/2 )

In step 6 we update the observation-level variance \sigma^2_e using Gibbs sampling and a Gamma full conditional distribution for 1/\sigma^2_e:

    p(1/\sigma^2_e | y, \beta, u^{(2)}, u^{(3)}, \sigma^2_{u(2)}, \sigma^2_{u(3)}) ~ Gamma( (N + \nu_e)/2,  (\sum_i e_i^2 + \nu_e s^2_e)/2 )

The above six steps are repeatedly sampled in sequence to produce correlated chains of parameter estimates, from which point and interval estimates can be created as in chapter 3.

AIP method.

The Alternating Imputation Prediction (AIP) method is a data augmentation algorithm for estimating cross-classified

models with large numbers of random effects. Comprehensive details of this algorithm are given in Clayton and Rasbash, 1999; we now give an overview. Data augmentation has been reviewed by Schafer. Tanner and Wong, 1987 introduced the idea of data augmentation as a stochastic version of the EM algorithm for maximum likelihood estimation in problems involving missing data. Corresponding to the E and M steps of Tanner and Wong we have:

I(mputation) step - impute the missing data by sampling from the distribution of the missing data conditional upon the observed data and current values of the model parameters.

P(osterior) step - sample parameter values from the complete data posterior distribution; these will be used for the next I-step.

In the context of random effect models, the random effects play the role of missing data. If the observed data are denoted by y, the random effects by u and the model parameters by \theta, and if we denote the probability distribution of y conditional on X as p(y | X), then the algorithm is specified (at step t) by

    I step - draw a sample u^{(t)} from p(u | y, \theta = \theta^{(t-1)})
    P step - draw a sample \theta^{(t)} from p(\theta | y, u = u^{(t)})

Repeated application of these two steps delivers a stochastic chain with equilibrium distribution p(\theta | y), in a similar way to the MCMC algorithm. Now let us look at how we can adapt this method to fit a crossed random effects model when the only estimating engine at our disposal is one optimized for fitting nested random effects. An n-way cross-classified model can be broken down into n sub-models, each of which is a two-level hierarchical model. For example, patients nested within a cross-classification of neighbourhood by hospital can be broken down into a patient-within-hospital sub-model and a patient-within-neighbourhood sub-model. Take the simple model

    y_i = X_i\beta + u^{(2)}_{nbhd(i)} + u^{(3)}_{hosp(i)} + e_i

where neighbourhood and hospital are cross-classified.
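The alternation between the two sub-models — fit one, offset its sampled residuals from the response, fit the other — can be sketched end-to-end. Since no real two-level estimation engine is available here, the function below is a crude stand-in that produces moment-based estimates and approximate posterior draws; everything about it (the names, the simulated data, the approximations) is illustrative only:

```python
import numpy as np

rng = np.random.default_rng(7)

def fit_and_sample(y, g):
    """Crude stand-in for a two-level estimation engine: moment estimates of
    (beta0, s2_u, s2_e), then an approximate P-step draw of beta0 and an
    I-step draw of the group residuals. Illustrative, not a real fitter."""
    k = g.max() + 1
    gm = np.array([y[g == j].mean() for j in range(k)])
    s2_e = np.mean([y[g == j].var() for j in range(k)])
    s2_u = max(gm.var() - s2_e * k / len(y), 1e-6)
    beta0 = rng.normal(gm.mean(), np.sqrt(s2_u / k))               # P-step (approximate)
    nj = np.bincount(g, minlength=k)
    Dk = 1.0 / (nj / s2_e + 1.0 / s2_u)
    rsum = np.bincount(g, weights=y - beta0, minlength=k)
    u = rng.normal(Dk * rsum / s2_e, np.sqrt(Dk))                  # I-step
    return beta0, s2_u, s2_e, u

# Simulated data: patients within a cross-classification of 10 hospitals by
# 10 neighbourhoods, true intercept 3
hosp = rng.integers(0, 10, 500)
nbhd = rng.integers(0, 10, 500)
y = 3.0 + rng.normal(0, 0.6, 10)[hosp] + rng.normal(0, 0.6, 10)[nbhd] + rng.normal(0, 1.0, 500)

# Alternate: fit model N with the sampled hospital residuals offset from y,
# then model H with the sampled neighbourhood residuals offset, and repeat.
u_h = np.zeros(10)
chain = []
for _ in range(200):
    b_n, s2_n, s2_en, u_n = fit_and_sample(y - u_h[hosp], nbhd)   # model N
    b_h, s2_h, s2_eh, u_h = fit_and_sample(y - u_n[nbhd], hosp)   # model H
    chain.append((b_n, b_h))
print(np.mean(chain[100:], axis=0))  # two intercept chains, both near the true 3
```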
This cross-classified model can be partitioned into two hierarchical sub-models: patients within neighbourhoods (model N) and patients within hospitals (model H). An informal description of the AIP algorithm is:

1. Start by fitting model N using an estimation procedure for two-level models.

2. Sample the model parameters from an approximation to their joint posterior distribution; that is, sample the fixed effects, the neighbourhood-level variance and the patient-level variance, and denote these samples by \beta_{[0,2]}, \sigma^2_{u[0,2]} and \sigma^2_{e[0,2]} respectively. Here [0,2] labels a term as belonging to AIP iteration 0, for classification number 2, that is, neighbourhood. This is the P-step for the neighbourhood classification.

3. Next sample a set of neighbourhood-level random effects (o_{[0,2]}) from p(u_{[0,2]} | y, \beta_{[0,2]}, \sigma^2_{u[0,2]}, \sigma^2_{e[0,2]}). This is the I-step for the neighbourhood classification.

4. Offset o_{[0,2]} from y, that is, form y^* = y - o_{[0,2]}, re-sort the data according to hospitals and fit model H using the new offset response y^*.

5. Next sample \beta_{[0,3]}, \sigma^2_{u[0,3]} and \sigma^2_{e[0,3]} from this second model, H. This is the P-step for the hospital classification.

6. Sample a set of hospital-level random effects (o_{[0,3]}) from p(u_{[0,3]} | y, \beta_{[0,3]}, \sigma^2_{u[0,3]}, \sigma^2_{e[0,3]}). This is the I-step for the hospital classification.

This completes one iteration of the AIP algorithm; it is an Imputation-Posterior algorithm that Alternates between the neighbourhood and hospital classifications. We proceed by forming y^* = y - o_{[0,3]}, that is, offsetting the sampled hospital residuals from y, and using that as the response in step 1. After T iterations the procedure delivers the following two chains, which can be used for inference:

    {\beta_{[0,2]}, \sigma^2_{u[0,2]}, \sigma^2_{e[0,2]}}, {\beta_{[1,2]}, \sigma^2_{u[1,2]}, \sigma^2_{e[1,2]}}, ..., {\beta_{[T,2]}, \sigma^2_{u[T,2]}, \sigma^2_{e[T,2]}}
    {\beta_{[0,3]}, \sigma^2_{u[0,3]}, \sigma^2_{e[0,3]}}, {\beta_{[1,3]}, \sigma^2_{u[1,3]}, \sigma^2_{e[1,3]}}, ..., {\beta_{[T,3]}, \sigma^2_{u[T,3]}, \sigma^2_{e[T,3]}}

Note that we get two sets of estimates for both the fixed effects and the level 1 variance with the AIP algorithm; these should be approximately equal.

Other methods.

Raudenbush, 1993 considers an empirical Bayes approach to fitting cross-classified models based on the EM algorithm.
He considers the specific case of two classifications where one classification has many units whilst the other has far fewer, and shows two educational examples to illustrate the method. Two other recent approaches that can be used for fitting cross-classified models, in particular with non-Normal responses, are Gauss-Hermite quadrature within PQL estimation (Pan and Thompson, 2000) and the

HGLM model framework as described in Lee and Nelder. Neither of these approaches has been designed with speed of estimation in mind, and so they are currently not feasible for the size of some of the problems considered in practice.

Comparison of estimation methods.

The RG method, when it works, is generally fairly quick to converge where all, or all but one, of the crossed classifications have small numbers of units. When there are multiple crossed classifications with large numbers of units the speed of the RG algorithm deteriorates and memory usage is greatly increased, often exhausting the available memory. The AIP method does not have these memory problems but will be slower for structures that are almost hierarchical. Although this method works reasonably well, if the response is a binary variable and quasi-likelihood methods need to be used, then this method, like the RG method, is still affected by the bias that is inherent in quasi-likelihood methods for binary response multilevel models (see Goldstein and Rasbash, 1996). The MCMC methods have no such bias problems, although there are still issues over which prior distributions to use for the variance parameters. They also, like the AIP method, do not have any memory problems. They are, however, generally computationally a lot slower, as they estimate the whole posterior distribution and not simply the mode, although as the structure of the data becomes more complex the speed difference is reduced.

An example analysis of a two-way cross-classification: primary schools crossed with secondary schools.

We will here consider fitting the RG method using the IGLS algorithm, the MCMC method based on Gibbs sampling (Browne et al., 2000) and the AIP method to an educational example from Fife in Scotland. Here we have as a response the exam results of 3,435 children at age 16.
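Whether a structure is "almost hierarchical" in the sense that matters for the RG and AIP methods can be checked directly from the index vectors: for each unit of one classification, count the distinct units of the other classification that its observations fall under. A small sketch (the labels are invented, mimicking primary schools feeding secondary schools):

```python
from collections import defaultdict

def crossing_profile(inner, outer):
    """For each inner unit, the number of distinct outer units its
    observations fall under; all 1s means inner is nested in outer."""
    parents = defaultdict(set)
    for a, b in zip(inner, outer):
        parents[a].add(b)
    return {a: len(bs) for a, bs in parents.items()}

# Toy version of the Fife situation: most primary schools feed a single
# secondary school, a few feed more than one.
primary   = ["p1", "p1", "p1", "p2", "p2", "p3", "p3", "p3"]
secondary = ["s1", "s1", "s2", "s1", "s1", "s2", "s2", "s2"]
print(crossing_profile(primary, secondary))  # {'p1': 2, 'p2': 1, 'p3': 1}
```

Here p1 is the only crossed unit; the closer the profile is to all 1s, the nearer the structure is to a pure hierarchy.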
We know for each child both the primary school and the secondary school that they attended, and we are interested in partitioning the variance between these two sources and individual pupil-level variation. The classification diagram is shown in figure 1.4. There are 148 primary schools that feed into 19 secondary schools in the dataset. Of the 148 primary schools, 59 are nested within a single secondary school, whilst another 62 have at most 3 pupils that do not go to the main secondary school, so we have an almost nested structure. This structure is particularly suited to the RG algorithm. We will fit the following model to the dataset:

[Figure 1.4 Classification diagram for the Fife educational example: primary school and secondary school crossed, above pupil.]

    y_i = \beta_0 + u^{(2)}_{SEC(i)} + u^{(3)}_{PRIM(i)} + e_i
    u^{(2)}_{SEC(i)} ~ N(0, \sigma^2_{u(2)}),   u^{(3)}_{PRIM(i)} ~ N(0, \sigma^2_{u(3)}),   e_i ~ N(0, \sigma^2_e)

Table 1.5 Point estimates for the Fife educational dataset.

    Parameter                                     IGLS          MCMC           AIP
    Mean achievement (\beta_0)                    5.50 (0.17)   5.50 (0.18)    5.51 (0.19)
    Secondary school variance (\sigma^2_{u(2)})   0.35 (0.16)   0.41 (0.21)    0.34 (0.15)
    Primary school variance (\sigma^2_{u(3)})     1.12 (0.20)   1.15 (0.213)   1.11 (0.20)
    Individual level variance (\sigma^2_e)        — (0.20)      8.12 (0.20)    8.11 (0.20)

The results are shown in table 1.5, from which we can see that in this example there is more variation between primary schools than between secondary schools. The MCMC

estimates replicate the IGLS estimates with slightly greater higher-level variances (mean versus mode estimates) due to the skewness of the posterior distribution. The AIP method gives very similar results to the IGLS method. A further discussion of these results is given in Goldstein.

2.3 MODELS FOR MORE COMPLEX POPULATION STRUCTURES

In this section we consider expanding the simple two-way cross-classified structure to accommodate more classifications and more complex structures.

Example scenarios.

Let us take the situation described in the classification diagram drawn in figure 1.3(i), where patients lie within a cross-classification of hospitals by neighbourhoods. We may have information on the doctor that treated each patient, and doctors may be nested within hospitals. The classification diagram for this structure is shown in figure 1.5.

[Figure 1.5 Classification diagram for two crossed hierarchies: (patients within doctors within hospitals) * (patients within neighbourhoods).]

A variance components model for this structure is written as

    y_i = \beta_0 + u^{(2)}_{nbhd(i)} + u^{(3)}_{hosp(i)} + u^{(4)}_{doct(i)} + e_i

If doctors work across hospitals, and are therefore not nested within hospitals, we then have a three-way cross-classification, which is drawn in figure 1.6.

[Figure 1.6 Classification diagram for three crossed hierarchies: (patients within hospitals) * (patients within doctors) * (patients within neighbourhoods).]

Note that the variance components model for the structure in figure 1.6 is described by the same equation. This reflects the fact that the model notation for describing the random effects simply lists the classifications that are sources of variation for the response we are modelling. In the variance components model we only have an intercept term, which varies across all four classifications present. Suppose we had another explanatory variable, x_1, and we wished to allow its coefficient to vary across the doctor classification; we would write this model as

    y_i = \beta_0 + u^{(2)}_{nbhd(i)} + u^{(3)}_{hosp(i)} + u^{(4)}_{doct(i),0} + \beta_1 x_{1i} + u^{(4)}_{doct(i),1} x_{1i} + e_i

or alternatively we can express the model as:

    y_i = \beta_{0i} + \beta_{1i} x_{1i} + e_i
    \beta_{0i} = \beta_0 + u^{(2)}_{nbhd(i)} + u^{(3)}_{hosp(i)} + u^{(4)}_{doct(i),0}
    \beta_{1i} = \beta_1 + u^{(4)}_{doct(i),1}

It may be that the scenario described in figure 1.6 is further complicated because hospitals, doctors and neighbourhoods are all nested within regions. In this case the classification diagram becomes as in figure 1.7.

[Figure 1.7 Classification diagram for three crossed hierarchies nested within a higher-level classification (region).]

Extending the last model to incorporate a simple random effect for the region classification we have

    y_i = \beta_{0i} + \beta_{1i} x_{1i} + e_i
    \beta_{0i} = \beta_0 + u^{(2)}_{nbhd(i)} + u^{(3)}_{hosp(i)} + u^{(4)}_{doct(i),0} + u^{(5)}_{reg(i)}
    \beta_{1i} = \beta_1 + u^{(4)}_{doct(i),1}

These few example scenarios indicate how the classification diagrams and the simplified notation extend to describe patterns of crossing of arbitrary complexity.

An example analysis of a complex cross-classified structure: artificial insemination data.

We consider a data set concerning artificial insemination by donor. A detailed description of this data set and the substantive research questions addressed by modelling it within a cross-classified framework are given in Ecochard and Clayton. The data were re-analysed in Clayton and Rasbash, 1999 as an example case study demonstrating the properties of the AIP algorithm for estimating cross-classified models. The data consist of 1901 women who were inseminated by sperm donations from 279 donors. Each donor made multiple donations; there were 1328 donations in all. A single donation is used for multiple inseminations. Each woman receives a series of monthly inseminations, one insemination per ovulatory cycle; the data contain the cycles within the 1901 women. There are two crossed hierarchies, a hierarchy for donors and a hierarchy for women. Level 1 corresponds to measures made at each ovulatory cycle. The response we analyse is the binary variable indicating whether conception occurs in a given cycle. The hierarchy for women is cycles within women. The hierarchy for donors is cycles within donations within donors. Within a series of cycles a woman may receive sperm from multiple donors/donations. The classification diagram for this structure is given in figure 1.8.

[Figure 1.8 Classification diagram for the artificial insemination example: donor above donation; donation and woman both above cycle.]

The model fitted to the data is

    y_i ~ Bernoulli(\pi_i)
    logit(\pi_i) = \beta_0 + azoo_i \beta_1 + semenq_i \beta_2 + age>35_i \beta_3 + spermcount_i \beta_4
                   + spermmot_i \beta_5 + iearly_i \beta_6 + ilate_i \beta_7
                   + u^{(2)}_{woman(i)} + u^{(3)}_{donation(i)} + u^{(4)}_{donor(i)}                  (1.1)
    u^{(2)}_{woman(i)} ~ N(0, \sigma^2_{u(2)}),  u^{(3)}_{donation(i)} ~ N(0, \sigma^2_{u(3)}),  u^{(4)}_{donor(i)} ~ N(0, \sigma^2_{u(4)})

Note that azoospermia (azoo) is a dichotomous variable indicating whether the fecundability of the woman is impaired (0 impaired, 1 not impaired). The results of fitting this model by the MCMC and AIP estimation procedures are given in table 1.6.

Table 1.6 Results for the artificial insemination example.

    Parameter                                MCMC            AIP
    Intercept (\beta_0)                      — (0.21)        — (0.21)
    Azoospermia (\beta_1)                    0.21 (0.09)     0.22 (0.10)
    Semen quality (\beta_2)                  0.18 (0.03)     0.18 (0.03)
    Woman's age > 35 (\beta_3)               — (0.12)        — (0.12)
    Sperm count (\beta_4)                    — (0.001)       — (0.001)
    Sperm motility (\beta_5)                 — (0.0001)      — (0.0001)
    Insemination too early (\beta_6)         — (0.17)        — (0.17)
    Insemination too late (\beta_7)          — (0.09)        — (0.09)
    Donor variance (\sigma^2_{u(4)})         0.11 (0.06)     0.10 (0.06)
    Donation variance (\sigma^2_{u(3)})      0.36 (0.074)    0.34 (0.065)
    Women variance (\sigma^2_{u(2)})         1.02 (0.15)     1.01 (0.11)

This model could not be fitted using the RG algorithm. This is because, if the data are sorted according to women, we need to fit 279 dummy variables for donors and 1328 dummy variables for donations; alternatively, if we sort the data according to donations within donors, we have to fit 1901 dummy variables for women. Either way, the size of these data matrices causes problems of insufficient memory. Even if these memory problems could be worked around, the numerical instability of the constraining procedure, which attempts to constrain over a thousand separately estimated variances to be equal, causes the adapted IGLS algorithm to fail to converge.

After inclusion of covariates there is considerably more variation in the probability of a successful insemination attributable to the women

hierarchy than to the donor hierarchy. Both the AIP and MCMC methods give similar estimates for all parameters. The fixed effect estimates show that the probability of conception is increased with azoospermia and with increased semen quality, sperm count and motility, but decreased with the age of the woman and with inseminations that are too early or too late.

3. MULTIPLE MEMBERSHIP MODELS

As we have seen from the previous section, allowing classifications to be crossed gives rise to a large family of additional model structures that can be estimated. The other main restriction of the basic multilevel model is the need for observations to belong to a unique classification unit, i.e. every pupil belongs to a particular class, every patient is treated at a particular hospital. Often, however, over time a patient may be treated at several hospitals, and depending on the response of interest all of these hospitals may have an influence. In this section we will first introduce the idea of multiple membership and give some example scenarios where it may occur. We will then discuss the possible estimation procedures that can be used to fit multiple membership models, and finish the chapter with a simulated example from the field of education.

3.1 A BASIC STRUCTURE FOR TWO-LEVEL MULTIPLE MEMBERSHIP

Suppose we have data on a large number of patients who attend their local hospital and during the course of their hospital stay are treated by several nurses, and we regard the nurses as an important factor in the patients' outcome of interest. Typically each patient will be seen by more than one nurse during their stay (although some will see only one), but there are many nurses, so we will treat nurse as a random classification rather than as a set of fixed effects. To illustrate this, table 1.7 shows the nurses seen by the first 4 patients. We can consider this structure in a unit diagram as shown in figure 1.9. Here each line in the diagram corresponds to a tick mark in the table.
Again, as our dataset gets larger, such unit diagrams become impractical as there will be too many nodes, and so we will resort to the classification diagrams introduced earlier for cross-classified models. To include multiple membership classifications in such diagrams, we use the convention of a double arrow to represent multiple membership. This leads to the classification diagram shown in figure 1.10 for the above patients and nurses example.

Table 1.7 Nurses seen by the first four patients (each tick marks a patient-nurse pairing; as described below, patient 1 is seen by nurses 1 and 3, while patient 2 is seen by nurse 1 only).

Figure 1.9 Unit diagram for the multiple membership patients within nurses example (nurses N1-N3 at level 2, patients P1-P4 at level 1; each line corresponds to a tick mark in table 1.7).

3.1.1 Example scenarios. Many studies have a multiple membership structure; here are a few examples:

Education: pupils change school/class over the course of their education, and each school/class has an effect on their education.

Health: patients are seen by several doctors and nurses during the course of their treatment.

Survey data: over their lifetime individuals move household, and each household has a bearing on their lifestyle, health, salary, etc.

3.1.2 Constructing a statistical model. Returning to our example of patients being seen by multiple nurses, we have patient 1's response being affected by nurses 1 and 3, while patient 2 is affected only by nurse 1. As we are treating nurse as a random classification

we would like each patient's response to have an equal effect on the nurse classification variance, so we generally weight the random effects for each patient to sum to 1. For example, let's assume patient 1 has been treated by nurse 1 for 2 days and nurse 3 for 1 day. Then we may give nurse 1 a weight of 2/3 and nurse 3 a weight of 1/3. Often we do not have information on the amount of time patients are seen by each nurse, and so we commonly allocate equal weights (in this case 1/2) to each nurse.

Figure 1.10 Classification diagram for the multiple membership patients within nurses example.

We can then write down a general two-level multiple membership model as

y_i = X_i\beta + \sum_{j \in \text{nurse}(i)} w_{i,j} u_j + e_i
u_j \sim N(0, \sigma^2_u), \quad e_i \sim N(0, \sigma^2_e)

where nurse(i) is the set of nurses seen by patient i and w_{i,j} is the weight given to nurse j for patient i. Here we assume that \sum_{j \in \text{nurse}(i)} w_{i,j} = 1 \; \forall i.
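The weighting rule just described can be sketched in a few lines of code. This is purely illustrative: the day counts are those of the running example, while the dictionary layout is an assumption of the sketch.

```python
# Weights for patient 1: treated by nurse 1 for 2 days and nurse 3
# for 1 day, normalised so the multiple membership weights sum to 1.
days = {1: 2, 3: 1}                       # nurse id -> days of care
total = sum(days.values())
weights = {j: d / total for j, d in days.items()}   # nurse 1: 2/3, nurse 3: 1/3

# Check the sum-to-one constraint on the weights.
assert abs(sum(weights.values()) - 1.0) < 1e-12

# With no timing information, fall back to equal weights (here 1/2 each).
nurses_seen = sorted(days)
equal = {j: 1 / len(nurses_seen) for j in nurses_seen}
```

The same normalisation applies whatever quantity (days, visits, proportions) is used to measure each unit's share of the membership.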

Writing out this model for the example patients, we get for patients 1 and 2

y_1 = X_1\beta + w_{1,1}u_1 + w_{1,3}u_3 + e_1
y_2 = X_2\beta + w_{2,1}u_1 + e_2

with the equations for patients 3 and 4 built in the same way from their rows of table 1.7.

3.2 ESTIMATION ALGORITHMS

There are two main algorithms for multiple membership models: an adaptation of the Rasbash and Goldstein, 1994 algorithm described earlier, and the MCMC method. The AIP method has not been extended to cater for multiple membership models.

3.2.1 An IGLS algorithm for multiple membership models. Earlier we described how to fit a cross-classified model by absorbing one of the cross-classifications into a set of dummy variables (the RG method). A slight modification allows this technique to be used to fit multiple membership models. First let's consider a two-level hierarchical model for patients within nurses:

y_i = \beta_0 + u_{\text{nurse}(i)} + e_i, \quad u_{\text{nurse}(i)} \sim N(0, \sigma^2_u), \quad e_i \sim N(0, \sigma^2_e).

We can reparameterise this simple two-level model as

y_i = \beta_0 + z_{i,1}u_1 + z_{i,2}u_2 + z_{i,3}u_3 + \dots + z_{i,J}u_J + e_i
(u_1, u_2, u_3, \dots, u_J)^T \sim N(0, \Omega_u), \quad \Omega_u = \sigma^2_u I_J, \quad e_i \sim N(0, \sigma^2_e)

where z_{i,j} is a dummy variable which is 1 if patient i is seen by nurse j and 0 otherwise, and J is the total number of nurses. We also add the constraint \sigma^2_{u_1} = \sigma^2_{u_2} = \dots = \sigma^2_{u_J}. These two models deliver the same estimates; however, the second formulation takes much longer to compute. The advantage of the second formulation is that it is straightforward to extend to the multiple membership case. Suppose patients are not nested within a single nurse but are multiple members of nurses with membership weights w_{i,j}. We can simply replace z_{i,j} with w_{i,j} in the second formulation, and estimation proceeds in an identical fashion but now delivers estimates for the multiple membership model.

3.2.2 MCMC. Once again we use a Gibbs sampling algorithm that relies on updating groups of parameters in turn from their conditional posterior distributions. For illustration we present the steps for the following simple multiple membership model, based on the variance components model for patients within nurses described earlier. We again refer the interested reader to Browne et al., 2000 for more general algorithms, and note that if the response is dichotomous or a count then, as in chapter 3, we can use the Metropolis-Gibbs hybrid method discussed there. The basic two-level multiple membership model (patients within nurses) can be written as

y_i = X_i\beta + \sum_{j \in \text{nurse}(i)} w_{i,j} u_j + e_i, \quad u_j \sim N(0, \sigma^2_u), \quad e_i \sim N(0, \sigma^2_e).

We can split our unknown parameters into 4 distinct sets: the fixed effects \beta, the nurse random effects u_j, the nurse level variance \sigma^2_u, and the patient level residual variance \sigma^2_e. We then need to generate random draws from the conditional distribution of each of these four groups of unknowns. We define prior distributions for our unknown parameters as follows: for generality we use a multivariate Normal prior for the fixed effects, \beta \sim N_{p_f}(\mu_p, S_p), and scaled inverse \chi^2 priors for the two variances.
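The switch from nesting indicators to membership weights amounts to replacing a 0/1 design matrix by a row-stochastic one. A small numerical sketch (the memberships and dimensions are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
N, J = 4, 3                       # 4 patients, 3 nurses

# Hierarchical case: z[i, j] = 1 if patient i is nested in nurse j.
nurse_of = [0, 0, 1, 2]           # each patient belongs to one nurse
Z = np.zeros((N, J))
Z[np.arange(N), nurse_of] = 1.0

# Multiple membership case: the 0/1 indicators become weights that
# sum to 1 across each row of the design matrix.
W = np.array([[0.5, 0.0, 0.5],    # patient seen by nurses 1 and 3
              [1.0, 0.0, 0.0],    # patient seen by nurse 1 only
              [0.0, 1.0, 0.0],
              [1/3, 1/3, 1/3]])

u = rng.normal(0.0, 1.0, size=J)  # nurse random effects
random_part = W @ u               # nurse contribution to each response
```

Estimation then treats W exactly as it would Z, which is why the dummy-variable formulation extends so directly.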
For the nurse level variance, \sigma^2_u \sim SI\chi^2(\nu_u, s^2_u), and for the patient level variance, \sigma^2_e \sim SI\chi^2(\nu_e, s^2_e). The steps are then as follows. In step 1 of the algorithm, the conditional posterior distribution in the Gibbs update for the fixed effects parameter vector is multivariate Normal with dimension p_f (the number of fixed effects):

p(\beta \mid y, u, \sigma^2_u, \sigma^2_e) \sim N_{p_f}(\hat{\beta}, \hat{D}), where

\hat{D} = \left[ \sum_{i=1}^{N} \frac{X_i^T X_i}{\sigma^2_e} + S_p^{-1} \right]^{-1}, \quad \hat{\beta} = \hat{D} \left[ \sum_{i=1}^{N} \frac{X_i^T d_i}{\sigma^2_e} + S_p^{-1}\mu_p \right],

and where d_i = y_i - \sum_{j \in \text{nurse}(i)} w_{i,j} u_j.

In step 2 we update the nurse residuals u_k using Gibbs sampling with a univariate Normal full conditional distribution:

p(u_k \mid y, \beta, \sigma^2_u, \sigma^2_e) \sim N(\hat{u}_k, \hat{D}_k), where

\hat{D}_k = \left[ \sum_{i : k \in \text{nurse}(i)} \frac{(w_{i,k})^2}{\sigma^2_e} + \frac{1}{\sigma^2_u} \right]^{-1}, \quad \hat{u}_k = \hat{D}_k \left[ \sum_{i : k \in \text{nurse}(i)} \frac{w_{i,k} d_{i,k}}{\sigma^2_e} \right],

and where d_{i,k} = y_i - X_i\beta - \sum_{j \in \text{nurse}(i), j \neq k} w_{i,j} u_j.

In step 3 we update the nurse level variance \sigma^2_u using Gibbs sampling and a Gamma full conditional distribution for 1/\sigma^2_u:

p(1/\sigma^2_u \mid y, u) \sim \text{Gamma}\left( \frac{J + \nu_u}{2}, \frac{\sum_{j=1}^{J} u_j^2 + \nu_u s^2_u}{2} \right).

In step 4 we update the patient level variance \sigma^2_e using Gibbs sampling and a Gamma full conditional distribution for 1/\sigma^2_e:

p(1/\sigma^2_e \mid y, \beta, u) \sim \text{Gamma}\left( \frac{N + \nu_e}{2}, \frac{\sum_{i=1}^{N} e_i^2 + \nu_e s^2_e}{2} \right),

where e_i = y_i - X_i\beta - \sum_{j \in \text{nurse}(i)} w_{i,j} u_j.

These 4 steps are repeatedly sampled from in sequence to produce correlated chains of parameter estimates, from which point and interval estimates can be created as described earlier.

3.2.3 Comparison of estimation methods. As in the comparison for cross-classified models, there are benefits to both methods. The RG method is fairly quick, but the number of level 2 units determines the size of some of the matrices involved and the number of constraints that the method has to apply. These dependencies lead to numerical instability or memory exhaustion in situations with more than a few hundred level 2 units. The MCMC methods, although again computationally slower, do not suffer from these memory problems.

3.2.4 An example analysis of a two-level multiple membership model: children moving school. We consider a simulated
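The four update steps can be sketched directly in code as a check on the algebra. This is a minimal illustration rather than the authors' implementation: the function name `gibbs_mm`, the diffuse priors (\mu_p = 0, S_p = 10^6 I, \nu = 1, s^2 = 1) and the simulated data are all assumptions of the sketch.

```python
import numpy as np

def gibbs_mm(y, X, W, n_iter=2000, seed=0):
    """Gibbs sampler for y = X beta + W u + e, with W holding the
    multiple membership weights (each row sums to 1)."""
    rng = np.random.default_rng(seed)
    N, p = X.shape
    J = W.shape[1]
    mu_p, S_p_inv = np.zeros(p), np.eye(p) * 1e-6   # diffuse Normal prior
    nu_u = nu_e = 1.0                               # scaled inv-chi^2 priors
    s2_u = s2_e = 1.0
    beta, u = np.zeros(p), np.zeros(J)
    sig2_u = sig2_e = 1.0
    out = {"beta": [], "sig2_u": [], "sig2_e": []}

    for _ in range(n_iter):
        # Step 1: fixed effects from their multivariate Normal conditional.
        d = y - W @ u
        D = np.linalg.inv(X.T @ X / sig2_e + S_p_inv)
        beta = rng.multivariate_normal(D @ (X.T @ d / sig2_e + S_p_inv @ mu_p), D)

        # Step 2: each nurse effect u_k from its univariate Normal conditional.
        r = y - X @ beta - W @ u           # full residual, updated in place
        for k in range(J):
            r += W[:, k] * u[k]            # remove u_k: r is now d_{i,k}
            Dk = 1.0 / ((W[:, k] ** 2).sum() / sig2_e + 1.0 / sig2_u)
            u[k] = rng.normal(Dk * (W[:, k] @ r) / sig2_e, np.sqrt(Dk))
            r -= W[:, k] * u[k]            # restore with the new draw

        # Step 3: nurse level precision from its Gamma conditional.
        sig2_u = 1.0 / rng.gamma((J + nu_u) / 2, 2.0 / (u @ u + nu_u * s2_u))

        # Step 4: patient level precision from its Gamma conditional.
        e = y - X @ beta - W @ u
        sig2_e = 1.0 / rng.gamma((N + nu_e) / 2, 2.0 / (e @ e + nu_e * s2_e))

        out["beta"].append(beta.copy())
        out["sig2_u"].append(sig2_u)
        out["sig2_e"].append(sig2_e)
    return {k: np.array(v) for k, v in out.items()}

# Tiny demonstration on simulated data (values invented):
rng = np.random.default_rng(42)
N, J = 200, 10
X = np.column_stack([np.ones(N), rng.normal(size=N)])
W = np.zeros((N, J))
for i in range(N):
    a, b = rng.choice(J, size=2, replace=False)  # two units, weight 1/2 each
    W[i, a] = W[i, b] = 0.5
y = X @ np.array([1.0, 2.0]) + W @ rng.normal(0, 0.7, J) + rng.normal(0, 1.0, N)
res = gibbs_mm(y, X, W, n_iter=500)
```

The resulting chains in `res` can then be summarised by their means and quantiles after discarding a burn-in, exactly as for the other MCMC analyses discussed in this chapter.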

Table 1.8 Results for the multiple membership schools example (standard errors in parentheses).

Parameter                        RG (RIGLS) estimates    MCMC estimates
intercept (\beta_0)              (0.040)                 (0.040)
LRT effect (\beta_1)             (0.012)                 (0.013)
School variance (\sigma^2_u)     (0.018)                 (0.020)
Pupil variance (\sigma^2_e)      (0.013)                 (0.013)

data example based on the problem in education of adjusting for the fact that pupils move school during the course of their studies. We consider a study with 4059 students from 65 schools, taken from Rasbash et al., 2000. In the actual data each child belongs to one school, but we will assume that over their education 10% of children moved school, and so for 10% of the children we choose a second school at random. We assume that information about when the move occurred is unavailable, and so for these children we allocate equal weights of 0.5 to each school. Browne et al., 2000 used this as the basis for a simulation experiment, generating 1000 datasets with this structure to show the bias and coverage properties of the MCMC method. We will instead consider the true response on our modified structure. We have as a response the pupil's total (normalised) exam score in all GCSE exams taken at age 16, and as a predictor the pupil's (standardised) score in a reading test taken at age 11. As we are interested in progress from age 11 to age 16, it makes sense to consider the effect of all schools attended in this period. We will consider the following model:

\text{normexam}_i = \beta_0 + \text{standlrt}_i\beta_1 + \sum_{j \in \text{school}(i)} w_{i,j} u_j + e_i, \quad u_j \sim N(0, \sigma^2_u), \quad e_i \sim N(0, \sigma^2_e).

We fit this model using both the RG and MCMC methods, and the results can be seen in table 1.8. From the table we can see that both methods give similar results. If we compare the results here with those in Rasbash et al., 2000, we see only slight changes to the estimates, with the level 2 variance slightly decreased and the level 1 variance slightly increased. However, in cases where there is a greater amount of multiple membership the
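The modified membership structure described above (a randomly chosen second school for 10% of pupils, with equal weights of 0.5) can be generated as follows. The pupil and school counts match the example; the random assignment mechanism shown is an assumption of the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)
n_pupils, n_schools = 4059, 65
school = rng.integers(0, n_schools, size=n_pupils)   # original single school

# Weight matrix: start with each pupil wholly in one school ...
W = np.zeros((n_pupils, n_schools))
W[np.arange(n_pupils), school] = 1.0

# ... then give a random 10% of pupils a second, different school,
# splitting the weight equally (0.5 each) as no timing data exists.
movers = rng.choice(n_pupils, size=n_pupils // 10, replace=False)
for i in movers:
    second = rng.integers(0, n_schools - 1)
    second += second >= school[i]          # guarantee a *different* school
    W[i, school[i]] = W[i, second] = 0.5

# Every row of W still satisfies the sum-to-one weight constraint.
assert np.allclose(W.sum(axis=1), 1.0)
```

The matrix `W` can then be supplied as the multiple membership design in a model such as the one above.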

variance estimates can be altered if the multiple membership is ignored; for example, if we randomly assigned every pupil to a second school, the level 1 and level 2 variance estimates both change.

4. COMBINING MULTIPLE MEMBERSHIP AND CROSS-CLASSIFIED STRUCTURES IN A SINGLE MODEL

Consider two of our earlier examples from the field of education: firstly, pupils in a crossing of primary schools and secondary schools, and secondly, pupils who move from school to school. We could assume that these two structures occur simultaneously, and we then end up with a model structure that contains both a multiple membership classification (secondary schools) and a second classification (primary schools) that is crossed with the first. This scenario can be represented by a classification diagram as in figure 1.11. Browne et al., 2000 refer to models that contain both multiple memberships and cross-classifications as multiple membership multiple classification (MMMC) models.

Figure 1.11 Classification diagram (nodes: P. School, S. School, Pupil) for the primary/secondary schools multiple membership model.

4.1 EXAMPLE SCENARIOS

Many studies have both cross-classified and multiple membership classifications in their structure; a few examples are the following:

Education: pupils can be affected by the crossing of the neighbourhood they live in and the school they attend. They could also change class over their period of education, and so this multiple membership class classification will be crossed with the neighbourhood classification.

Health: patients are seen by several doctors during their treatment and may visit several hospitals. Doctors who are specialists may move from hospital to hospital and so are crossed with the hospitals.

Survey data: individuals will belong to many households over the course of their lives and will reside in several properties. An entire household may move to a new property, so households can be crossed with properties, and all the households/properties can have an effect on the individual. See Goldstein et al., 2000 for more details.

Spatial data: individuals will belong to a particular area but will also be affected by multiple neighbouring areas.

4.2 CONSTRUCTING A STATISTICAL MODEL

If we return to our example of pupils attending multiple secondary schools but coming from one primary school, we need to combine the multiple membership and cross-classified model structures into one model. As we are treating the secondary schools as a random classification, we would like each pupil to have an equal effect on the secondary school classification, so we will use weights that sum to 1 when a pupil attends more than one secondary school. We will let second(i) be the list of secondary schools that child i has attended. We can then write down a general two-classification MMMC model as

y_i = X_i\beta + \sum_{j \in \text{second}(i)} w_{i,j} u^{(2)}_j + u^{(3)}_{\text{prim}(i)} + e_i
u^{(2)}_j \sim N(0, \sigma^2_{u(2)}), \quad u^{(3)}_{\text{prim}(i)} \sim N(0, \sigma^2_{u(3)}), \quad e_i \sim N(0, \sigma^2_e).

Here w_{i,j} is the weight given to secondary school j for pupil i, and we assume that \sum_{j \in \text{second}(i)} w_{i,j} = 1 \; \forall i. Both the RG algorithm and the MCMC method can be used to fit these models that combine multiple membership and cross-classification.

4.3 AN EXAMPLE ANALYSIS: DANISH POULTRY FARMING

Rasbash and Browne, 2001 consider an example from veterinary epidemiology concerning outbreaks of salmonella typhimurium in flocks of chickens in poultry farms in Denmark between 1995 and 1997. The response of interest is whether salmonella typhimurium is present in a flock; in the data collected, 6.3% of flocks had the disease. Each observation represents a flock of chickens, and for each flock the response variable is whether or not there was an instance of salmonella in that flock. The basic data have a simple hierarchical structure, as each flock is kept in a house on a farm until slaughter. As flocks live for a short time before they are slaughtered, several flocks will stay in the same house each year. The hierarchy consists of 10,127 child flocks within 725 houses on 304 farms. Each child flock is created from a mixture of parent flocks (up to 6), of which there are 200 in Denmark, and so we have a crossing between the child flock hierarchy and the multiple membership parent flock classification. The classification diagram can be seen in figure 1.12. We also know the exact makeup of each child flock (in terms of parent flocks) and so can use these proportions as weights for the parent flocks. We are interested in assessing how much of the variability in salmonella incidence can be attributed to houses, farms and parent flocks. There are also 4 hatcheries in which all the eggs from the parent flocks are hatched.
We will therefore fit a variance components model that allows for different average rates of salmonella for each year, with hatchery included in the fixed part, as follows:

\text{salmonella}_i \sim \text{Bernoulli}(\pi_i)
\text{logit}(\pi_i) = \beta_0 + \text{Y96}_i\beta_1 + \text{Y97}_i\beta_2 + \text{hatch2}_i\beta_3 + \text{hatch3}_i\beta_4 + \text{hatch4}_i\beta_5 + u^{(2)}_{\text{House}(i)} + u^{(3)}_{\text{Farm}(i)} + \sum_{j \in \text{p.flock}(i)} w^{(4)}_{i,j} u^{(4)}_j
u^{(2)}_{\text{House}(i)} \sim N(0, \sigma^2_{u(2)}), \quad u^{(3)}_{\text{Farm}(i)} \sim N(0, \sigma^2_{u(3)}), \quad u^{(4)}_j \sim N(0, \sigma^2_{u(4)})    (1.2)
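To make the structure of model (1.2) concrete, the linear predictor for a single child flock can be evaluated as below. This is an illustrative sketch only: the function name `mmmc_logit` and all numerical values are invented, and only the form of the model follows the text.

```python
import math

def mmmc_logit(beta, x, u_house, u_farm, u_parent, parent_weights):
    """Linear predictor and probability for one child flock: the fixed
    part x . beta, plus its house and farm effects, plus a weighted sum
    of the effects of the parent flocks it was created from."""
    eta = sum(b * xv for b, xv in zip(beta, x))
    eta += u_house + u_farm
    eta += sum(w * u for u, w in zip(u_parent, parent_weights))
    return eta, 1.0 / (1.0 + math.exp(-eta))

# A flock hatched in 1996 in hatchery 3, created 70/30 from two
# parent flocks (all numbers hypothetical):
beta = [-3.0, 0.5, 0.2, 0.1, -0.4, 0.3]   # intercept, Y96, Y97, hatch2-4
x = [1, 1, 0, 0, 1, 0]
eta, p = mmmc_logit(beta, x, u_house=0.2, u_farm=-0.1,
                    u_parent=[0.6, -0.2], parent_weights=[0.7, 0.3])
```

Here the composition proportions of the child flock serve directly as the multiple membership weights, as described in the text.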

Figure 1.12 Classification diagram (nodes: Farm, House, Parent Flock, Child Flock) for the Danish poultry model.

The results of fitting model 1.2, using both the Rasbash and Goldstein method with 1st order MQL estimation and the MCMC method, can be seen in table 1.9. The quasi-likelihood methods are numerically rather unstable, and we could not get either 2nd order MQL or PQL to fit this model. We can see that there are large effects for the year the chickens were born, suggesting that salmonella was more prevalent in 1995 than in the other years. The hatchery effects were also large, suggesting that chickens produced in hatcheries 1 and 3 had a higher incidence of salmonella. There is large variability in the parent flock effects and in the farm effects, which are of similar magnitude; there is less variability between houses within farms.

4.3.1 Method comparison. The MCMC results were run for 50,000 iterations after a burn-in of 20,000 (this took just under 2 hours on a 733MHz PC); as we used arbitrary starting values, the chain took a while to converge. From table 1.9 we can see reasonable agreement between the two methods, although the fixed effects in MQL are all smaller, as is the farm level variance. This behaviour was shown in simulations on a nested 3-level binary response data structure in Rodriguez


MLMED. User Guide. Nicholas J. Rockwood The Ohio State University Beta Version May, 2017

MLMED. User Guide. Nicholas J. Rockwood The Ohio State University Beta Version May, 2017 MLMED User Guide Nicholas J. Rockwood The Ohio State University rockwood.19@osu.edu Beta Version May, 2017 MLmed is a computational macro for SPSS that simplifies the fitting of multilevel mediation and

More information

Bayesian non-parametric model to longitudinally predict churn

Bayesian non-parametric model to longitudinally predict churn Bayesian non-parametric model to longitudinally predict churn Bruno Scarpa Università di Padova Conference of European Statistics Stakeholders Methodologists, Producers and Users of European Statistics

More information

H-LIKELIHOOD ESTIMATION METHOOD FOR VARYING CLUSTERED BINARY MIXED EFFECTS MODEL

H-LIKELIHOOD ESTIMATION METHOOD FOR VARYING CLUSTERED BINARY MIXED EFFECTS MODEL H-LIKELIHOOD ESTIMATION METHOOD FOR VARYING CLUSTERED BINARY MIXED EFFECTS MODEL Intesar N. El-Saeiti Department of Statistics, Faculty of Science, University of Bengahzi-Libya. entesar.el-saeiti@uob.edu.ly

More information

STA 216, GLM, Lecture 16. October 29, 2007

STA 216, GLM, Lecture 16. October 29, 2007 STA 216, GLM, Lecture 16 October 29, 2007 Efficient Posterior Computation in Factor Models Underlying Normal Models Generalized Latent Trait Models Formulation Genetic Epidemiology Illustration Structural

More information

Gibbs Sampling in Latent Variable Models #1

Gibbs Sampling in Latent Variable Models #1 Gibbs Sampling in Latent Variable Models #1 Econ 690 Purdue University Outline 1 Data augmentation 2 Probit Model Probit Application A Panel Probit Panel Probit 3 The Tobit Model Example: Female Labor

More information

A graph contains a set of nodes (vertices) connected by links (edges or arcs)

A graph contains a set of nodes (vertices) connected by links (edges or arcs) BOLTZMANN MACHINES Generative Models Graphical Models A graph contains a set of nodes (vertices) connected by links (edges or arcs) In a probabilistic graphical model, each node represents a random variable,

More information

Prediction of ordinal outcomes when the association between predictors and outcome diers between outcome levels

Prediction of ordinal outcomes when the association between predictors and outcome diers between outcome levels STATISTICS IN MEDICINE Statist. Med. 2005; 24:1357 1369 Published online 26 November 2004 in Wiley InterScience (www.interscience.wiley.com). DOI: 10.1002/sim.2009 Prediction of ordinal outcomes when the

More information

Lecture 13: Data Modelling and Distributions. Intelligent Data Analysis and Probabilistic Inference Lecture 13 Slide No 1

Lecture 13: Data Modelling and Distributions. Intelligent Data Analysis and Probabilistic Inference Lecture 13 Slide No 1 Lecture 13: Data Modelling and Distributions Intelligent Data Analysis and Probabilistic Inference Lecture 13 Slide No 1 Why data distributions? It is a well established fact that many naturally occurring

More information

Business Statistics. Tommaso Proietti. Linear Regression. DEF - Università di Roma 'Tor Vergata'

Business Statistics. Tommaso Proietti. Linear Regression. DEF - Università di Roma 'Tor Vergata' Business Statistics Tommaso Proietti DEF - Università di Roma 'Tor Vergata' Linear Regression Specication Let Y be a univariate quantitative response variable. We model Y as follows: Y = f(x) + ε where

More information

STA414/2104. Lecture 11: Gaussian Processes. Department of Statistics

STA414/2104. Lecture 11: Gaussian Processes. Department of Statistics STA414/2104 Lecture 11: Gaussian Processes Department of Statistics www.utstat.utoronto.ca Delivered by Mark Ebden with thanks to Russ Salakhutdinov Outline Gaussian Processes Exam review Course evaluations

More information

Bagging During Markov Chain Monte Carlo for Smoother Predictions

Bagging During Markov Chain Monte Carlo for Smoother Predictions Bagging During Markov Chain Monte Carlo for Smoother Predictions Herbert K. H. Lee University of California, Santa Cruz Abstract: Making good predictions from noisy data is a challenging problem. Methods

More information

Bayesian Methods in Multilevel Regression

Bayesian Methods in Multilevel Regression Bayesian Methods in Multilevel Regression Joop Hox MuLOG, 15 september 2000 mcmc What is Statistics?! Statistics is about uncertainty To err is human, to forgive divine, but to include errors in your design

More information

The Basic Two-Level Regression Model

The Basic Two-Level Regression Model 7 Manuscript version, chapter in J.J. Hox, M. Moerbeek & R. van de Schoot (018). Multilevel Analysis. Techniques and Applications. New York, NY: Routledge. The Basic Two-Level Regression Model Summary.

More information

Stat 5101 Lecture Notes

Stat 5101 Lecture Notes Stat 5101 Lecture Notes Charles J. Geyer Copyright 1998, 1999, 2000, 2001 by Charles J. Geyer May 7, 2001 ii Stat 5101 (Geyer) Course Notes Contents 1 Random Variables and Change of Variables 1 1.1 Random

More information

analysis of incomplete data in statistical surveys

analysis of incomplete data in statistical surveys analysis of incomplete data in statistical surveys Ugo Guarnera 1 1 Italian National Institute of Statistics, Italy guarnera@istat.it Jordan Twinning: Imputation - Amman, 6-13 Dec 2014 outline 1 origin

More information

Lecture 12: Effect modification, and confounding in logistic regression

Lecture 12: Effect modification, and confounding in logistic regression Lecture 12: Effect modification, and confounding in logistic regression Ani Manichaikul amanicha@jhsph.edu 4 May 2007 Today Categorical predictor create dummy variables just like for linear regression

More information

Latent Variable Models for Binary Data. Suppose that for a given vector of explanatory variables x, the latent

Latent Variable Models for Binary Data. Suppose that for a given vector of explanatory variables x, the latent Latent Variable Models for Binary Data Suppose that for a given vector of explanatory variables x, the latent variable, U, has a continuous cumulative distribution function F (u; x) and that the binary

More information

Panel Data. March 2, () Applied Economoetrics: Topic 6 March 2, / 43

Panel Data. March 2, () Applied Economoetrics: Topic 6 March 2, / 43 Panel Data March 2, 212 () Applied Economoetrics: Topic March 2, 212 1 / 43 Overview Many economic applications involve panel data. Panel data has both cross-sectional and time series aspects. Regression

More information

Bayesian Linear Regression

Bayesian Linear Regression Bayesian Linear Regression Sudipto Banerjee 1 Biostatistics, School of Public Health, University of Minnesota, Minneapolis, Minnesota, U.S.A. September 15, 2010 1 Linear regression models: a Bayesian perspective

More information

The Bias-Variance dilemma of the Monte Carlo. method. Technion - Israel Institute of Technology, Technion City, Haifa 32000, Israel

The Bias-Variance dilemma of the Monte Carlo. method. Technion - Israel Institute of Technology, Technion City, Haifa 32000, Israel The Bias-Variance dilemma of the Monte Carlo method Zlochin Mark 1 and Yoram Baram 1 Technion - Israel Institute of Technology, Technion City, Haifa 32000, Israel fzmark,baramg@cs.technion.ac.il Abstract.

More information

Multilevel Modeling: When and Why 1. 1 Why multilevel data need multilevel models

Multilevel Modeling: When and Why 1. 1 Why multilevel data need multilevel models Multilevel Modeling: When and Why 1 J. Hox University of Amsterdam & Utrecht University Amsterdam/Utrecht, the Netherlands Abstract: Multilevel models have become popular for the analysis of a variety

More information

Bayesian Methods for Machine Learning

Bayesian Methods for Machine Learning Bayesian Methods for Machine Learning CS 584: Big Data Analytics Material adapted from Radford Neal s tutorial (http://ftp.cs.utoronto.ca/pub/radford/bayes-tut.pdf), Zoubin Ghahramni (http://hunch.net/~coms-4771/zoubin_ghahramani_bayesian_learning.pdf),

More information

Implementing componentwise Hastings algorithms

Implementing componentwise Hastings algorithms Computational Statistics & Data Analysis 48 (2005) 363 389 www.elsevier.com/locate/csda Implementing componentwise Hastings algorithms Richard A. Levine a;, Zhaoxia Yu b, William G. Hanley c, John J. Nitao

More information

ABC random forest for parameter estimation. Jean-Michel Marin

ABC random forest for parameter estimation. Jean-Michel Marin ABC random forest for parameter estimation Jean-Michel Marin Université de Montpellier Institut Montpelliérain Alexander Grothendieck (IMAG) Institut de Biologie Computationnelle (IBC) Labex Numev! joint

More information

Online Appendix to: Marijuana on Main Street? Estimating Demand in Markets with Limited Access

Online Appendix to: Marijuana on Main Street? Estimating Demand in Markets with Limited Access Online Appendix to: Marijuana on Main Street? Estating Demand in Markets with Lited Access By Liana Jacobi and Michelle Sovinsky This appendix provides details on the estation methodology for various speci

More information

bound on the likelihood through the use of a simpler variational approximating distribution. A lower bound is particularly useful since maximization o

bound on the likelihood through the use of a simpler variational approximating distribution. A lower bound is particularly useful since maximization o Category: Algorithms and Architectures. Address correspondence to rst author. Preferred Presentation: oral. Variational Belief Networks for Approximate Inference Wim Wiegerinck David Barber Stichting Neurale

More information

Lab 3: Two levels Poisson models (taken from Multilevel and Longitudinal Modeling Using Stata, p )

Lab 3: Two levels Poisson models (taken from Multilevel and Longitudinal Modeling Using Stata, p ) Lab 3: Two levels Poisson models (taken from Multilevel and Longitudinal Modeling Using Stata, p. 376-390) BIO656 2009 Goal: To see if a major health-care reform which took place in 1997 in Germany was

More information

Contents. Part I: Fundamentals of Bayesian Inference 1

Contents. Part I: Fundamentals of Bayesian Inference 1 Contents Preface xiii Part I: Fundamentals of Bayesian Inference 1 1 Probability and inference 3 1.1 The three steps of Bayesian data analysis 3 1.2 General notation for statistical inference 4 1.3 Bayesian

More information

ROBUSTNESS OF MULTILEVEL PARAMETER ESTIMATES AGAINST SMALL SAMPLE SIZES

ROBUSTNESS OF MULTILEVEL PARAMETER ESTIMATES AGAINST SMALL SAMPLE SIZES ROBUSTNESS OF MULTILEVEL PARAMETER ESTIMATES AGAINST SMALL SAMPLE SIZES Cora J.M. Maas 1 Utrecht University, The Netherlands Joop J. Hox Utrecht University, The Netherlands In social sciences, research

More information

Exploiting TIMSS and PIRLS combined data: multivariate multilevel modelling of student achievement

Exploiting TIMSS and PIRLS combined data: multivariate multilevel modelling of student achievement Exploiting TIMSS and PIRLS combined data: multivariate multilevel modelling of student achievement Second meeting of the FIRB 2012 project Mixture and latent variable models for causal-inference and analysis

More information

Marginal Specifications and a Gaussian Copula Estimation

Marginal Specifications and a Gaussian Copula Estimation Marginal Specifications and a Gaussian Copula Estimation Kazim Azam Abstract Multivariate analysis involving random variables of different type like count, continuous or mixture of both is frequently required

More information

ML estimation: Random-intercepts logistic model. and z

ML estimation: Random-intercepts logistic model. and z ML estimation: Random-intercepts logistic model log p ij 1 p = x ijβ + υ i with υ i N(0, συ) 2 ij Standardizing the random effect, θ i = υ i /σ υ, yields log p ij 1 p = x ij β + σ υθ i with θ i N(0, 1)

More information

8 Nominal and Ordinal Logistic Regression

8 Nominal and Ordinal Logistic Regression 8 Nominal and Ordinal Logistic Regression 8.1 Introduction If the response variable is categorical, with more then two categories, then there are two options for generalized linear models. One relies on

More information

Estimating a Piecewise Growth Model with Longitudinal Data that Contains Individual Mobility across Clusters

Estimating a Piecewise Growth Model with Longitudinal Data that Contains Individual Mobility across Clusters Estimating a Piecewise Growth Model with Longitudinal Data that Contains Individual Mobility across Clusters Audrey J. Leroux Georgia State University Piecewise Growth Model (PGM) PGMs are beneficial for

More information

var D (B) = var(b? E D (B)) = var(b)? cov(b; D)(var(D))?1 cov(d; B) (2) Stone [14], and Hartigan [9] are among the rst to discuss the role of such ass

var D (B) = var(b? E D (B)) = var(b)? cov(b; D)(var(D))?1 cov(d; B) (2) Stone [14], and Hartigan [9] are among the rst to discuss the role of such ass BAYES LINEAR ANALYSIS [This article appears in the Encyclopaedia of Statistical Sciences, Update volume 3, 1998, Wiley.] The Bayes linear approach is concerned with problems in which we want to combine

More information

WU Weiterbildung. Linear Mixed Models

WU Weiterbildung. Linear Mixed Models Linear Mixed Effects Models WU Weiterbildung SLIDE 1 Outline 1 Estimation: ML vs. REML 2 Special Models On Two Levels Mixed ANOVA Or Random ANOVA Random Intercept Model Random Coefficients Model Intercept-and-Slopes-as-Outcomes

More information

Multilevel Analysis, with Extensions

Multilevel Analysis, with Extensions May 26, 2010 We start by reviewing the research on multilevel analysis that has been done in psychometrics and educational statistics, roughly since 1985. The canonical reference (at least I hope so) is

More information

Notes on Discriminant Functions and Optimal Classification

Notes on Discriminant Functions and Optimal Classification Notes on Discriminant Functions and Optimal Classification Padhraic Smyth, Department of Computer Science University of California, Irvine c 2017 1 Discriminant Functions Consider a classification problem

More information

Bayesian linear regression

Bayesian linear regression Bayesian linear regression Linear regression is the basis of most statistical modeling. The model is Y i = X T i β + ε i, where Y i is the continuous response X i = (X i1,..., X ip ) T is the corresponding

More information

0 o 1 i B C D 0/1 0/ /1

0 o 1 i B C D 0/1 0/ /1 A Comparison of Dominance Mechanisms and Simple Mutation on Non-Stationary Problems Jonathan Lewis,? Emma Hart, Graeme Ritchie Department of Articial Intelligence, University of Edinburgh, Edinburgh EH

More information

Default Priors and Effcient Posterior Computation in Bayesian

Default Priors and Effcient Posterior Computation in Bayesian Default Priors and Effcient Posterior Computation in Bayesian Factor Analysis January 16, 2010 Presented by Eric Wang, Duke University Background and Motivation A Brief Review of Parameter Expansion Literature

More information

Bayesian methods for missing data: part 1. Key Concepts. Nicky Best and Alexina Mason. Imperial College London

Bayesian methods for missing data: part 1. Key Concepts. Nicky Best and Alexina Mason. Imperial College London Bayesian methods for missing data: part 1 Key Concepts Nicky Best and Alexina Mason Imperial College London BAYES 2013, May 21-23, Erasmus University Rotterdam Missing Data: Part 1 BAYES2013 1 / 68 Outline

More information

Sparse Linear Models (10/7/13)

Sparse Linear Models (10/7/13) STA56: Probabilistic machine learning Sparse Linear Models (0/7/) Lecturer: Barbara Engelhardt Scribes: Jiaji Huang, Xin Jiang, Albert Oh Sparsity Sparsity has been a hot topic in statistics and machine

More information

Local Likelihood Bayesian Cluster Modeling for small area health data. Andrew Lawson Arnold School of Public Health University of South Carolina

Local Likelihood Bayesian Cluster Modeling for small area health data. Andrew Lawson Arnold School of Public Health University of South Carolina Local Likelihood Bayesian Cluster Modeling for small area health data Andrew Lawson Arnold School of Public Health University of South Carolina Local Likelihood Bayesian Cluster Modelling for Small Area

More information

ECLT 5810 Linear Regression and Logistic Regression for Classification. Prof. Wai Lam

ECLT 5810 Linear Regression and Logistic Regression for Classification. Prof. Wai Lam ECLT 5810 Linear Regression and Logistic Regression for Classification Prof. Wai Lam Linear Regression Models Least Squares Input vectors is an attribute / feature / predictor (independent variable) The

More information

Bayes Estimation in Meta-analysis using a linear model theorem

Bayes Estimation in Meta-analysis using a linear model theorem University of Wollongong Research Online Applied Statistics Education and Research Collaboration (ASEARC) - Conference Papers Faculty of Engineering and Information Sciences 2012 Bayes Estimation in Meta-analysis

More information

Ronald Christensen. University of New Mexico. Albuquerque, New Mexico. Wesley Johnson. University of California, Irvine. Irvine, California

Ronald Christensen. University of New Mexico. Albuquerque, New Mexico. Wesley Johnson. University of California, Irvine. Irvine, California Texts in Statistical Science Bayesian Ideas and Data Analysis An Introduction for Scientists and Statisticians Ronald Christensen University of New Mexico Albuquerque, New Mexico Wesley Johnson University

More information

Computational statistics

Computational statistics Computational statistics Markov Chain Monte Carlo methods Thierry Denœux March 2017 Thierry Denœux Computational statistics March 2017 1 / 71 Contents of this chapter When a target density f can be evaluated

More information

Basics of Modern Missing Data Analysis

Basics of Modern Missing Data Analysis Basics of Modern Missing Data Analysis Kyle M. Lang Center for Research Methods and Data Analysis University of Kansas March 8, 2013 Topics to be Covered An introduction to the missing data problem Missing

More information

Logistic Regression: Regression with a Binary Dependent Variable

Logistic Regression: Regression with a Binary Dependent Variable Logistic Regression: Regression with a Binary Dependent Variable LEARNING OBJECTIVES Upon completing this chapter, you should be able to do the following: State the circumstances under which logistic regression

More information

Hierarchical Generalized Linear Models. ERSH 8990 REMS Seminar on HLM Last Lecture!

Hierarchical Generalized Linear Models. ERSH 8990 REMS Seminar on HLM Last Lecture! Hierarchical Generalized Linear Models ERSH 8990 REMS Seminar on HLM Last Lecture! Hierarchical Generalized Linear Models Introduction to generalized models Models for binary outcomes Interpreting parameter

More information

Metropolis-Hastings Algorithm

Metropolis-Hastings Algorithm Strength of the Gibbs sampler Metropolis-Hastings Algorithm Easy algorithm to think about. Exploits the factorization properties of the joint probability distribution. No difficult choices to be made to

More information

2E Pattern Recognition Solutions to Introduction to Pattern Recognition, Chapter 2 a: Conditional Probability and Bayes Rule

2E Pattern Recognition Solutions to Introduction to Pattern Recognition, Chapter 2 a: Conditional Probability and Bayes Rule 2E1395 - Pattern Recognition Solutions to Introduction to Pattern Recognition, Chapter 2 a: Conditional Probability and Bayes Rule Exercise 2A1 We can call X the observation (X i indicates that the program

More information

Department of. Computer Science. Functional Implementations of. Eigensolver. December 15, Colorado State University

Department of. Computer Science. Functional Implementations of. Eigensolver. December 15, Colorado State University Department of Computer Science Analysis of Non-Strict Functional Implementations of the Dongarra-Sorensen Eigensolver S. Sur and W. Bohm Technical Report CS-9- December, 99 Colorado State University Analysis

More information

Linear Regression. S. Sumitra

Linear Regression. S. Sumitra Linear Regression S Sumitra Notations: x i : ith data point; x T : transpose of x; x ij : ith data point s jth attribute Let {(x 1, y 1 ), (x, y )(x N, y N )} be the given data, x i D and y i Y Here D

More information

Managing Uncertainty

Managing Uncertainty Managing Uncertainty Bayesian Linear Regression and Kalman Filter December 4, 2017 Objectives The goal of this lab is multiple: 1. First it is a reminder of some central elementary notions of Bayesian

More information