
Chapter 1

NON-HIERARCHICAL MULTILEVEL MODELS

Jon Rasbash and William Browne

1. INTRODUCTION

In the models discussed in this book so far we have assumed that the structures of the populations from which the data have been drawn are hierarchical. This assumption is sometimes not justified. In this chapter two main types of non-hierarchical model are considered. Firstly, cross-classified models; the notion of cross-classification is probably reasonably familiar to most readers. Secondly, we consider multiple membership models, where lower level units are influenced by more than one higher level unit from the same classification. For example, some pupils may attend more than one school. We also consider situations that contain a mixture of hierarchical, crossed and multiple membership relationships.

2. CROSS-CLASSIFIED MODELS

This section is divided into three parts. In the first part we look at situations that can give rise to a two-way cross-classification, introduce some diagrams for describing the population structure, and discuss notation for constructing a statistical model. In the second part we discuss some of the possible methods for estimating cross-classified models and give an example analysis of an educational data set. In the third part we describe some more complex cross-classified structures and give an example analysis of a medical data set.

2.1 TWO WAY CROSS-CLASSIFICATIONS: A BASIC MODEL

Suppose we have data on a large number of patients attending many hospitals, and we also know the neighbourhood in which each patient lives and regard patient, neighbourhood and hospital all as important

sources of variation for the patient-level outcome measure we wish to study. Now, typically, hospitals will draw patients from many different neighbourhoods and the inhabitants of a neighbourhood will go to many hospitals. No pure hierarchy can be found and patients are said to be contained within a cross-classification of hospitals by neighbourhoods. This can be represented schematically, for the case of twelve patients contained within a cross-classification of three neighbourhoods by four hospitals, as in table 1.1.

Table 1.1 Patients cross-classified by hospital and neighbourhood.

                 Neighbourhood 1   Neighbourhood 2   Neighbourhood 3
    Hospital 1         XX                X
    Hospital 2         X                 X
    Hospital 3                           XX                X
    Hospital 4         X                                   XXX

In this example we have patients at level 1, and neighbourhood and hospital are cross-classified at level 2. The characteristic pattern of a cross-classification is shown: some rows contain multiple entries and some columns contain multiple entries. In a nested relationship, if the row classification is nested within the column classification then all the entries across a row will fall under a single column, and vice versa if the column classification is nested within the row classification. For example, if hospitals are nested within neighbourhoods we might observe the pattern in table 1.2.

Table 1.2 Patients nested within hospitals within neighbourhoods.

                       Hospital 1   Hospital 2   Hospital 3   Hospital 4
    Neighbourhood 1       XXX                                    XXXX
    Neighbourhood 2                    XX
    Neighbourhood 3                                 XXX
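The distinction between the two tables can be stated mechanically: a classification A is nested within a classification B exactly when every unit of A appears under a single unit of B. A small Python sketch of that check (the index vectors are illustrative, following the pattern of the tables above):

```python
def is_nested(inner, outer):
    """inner[i], outer[i]: classification labels for observation i.
    inner is nested in outer iff each inner unit appears under one outer unit."""
    seen = {}
    for a, b in zip(inner, outer):
        if seen.setdefault(a, b) != b:
            return False  # inner unit 'a' has two distinct parents -> crossed
    return True

# Table 1.2-style structure: hospitals nested within neighbourhoods
hosp = [1, 1, 1, 2, 2, 3, 3, 3, 4, 4, 4, 4]
nbhd = [1, 1, 1, 2, 2, 3, 3, 3, 1, 1, 1, 1]
print(is_nested(hosp, nbhd))   # True: each hospital draws from one neighbourhood

# Crossed structure: hospital 1 now draws patients from two neighbourhoods
nbhd2 = [1, 1, 2, 1, 2, 2, 2, 3, 1, 3, 3, 3]
print(is_nested(hosp, nbhd2))  # False
```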

Many studies follow this simple two-way crossed structure; here are a few examples:

Education: students cross-classified by primary school and secondary school.
Health: patients cross-classified by general practice and hospital.
Survey data: individuals cross-classified by interviewer and area of residence.

Diagrams for representing the relationship between classifications.

We find two types of diagram useful in expressing the nature of relationships between classifications. Firstly, unit diagrams, where we draw every unit (patient, hospital and neighbourhood, in the case of our first example) and connect each lowest level unit (patient) to its parent units (hospital, neighbourhood). Such a representation of the data in table 1.1 is shown in figure 1.1.

[Figure 1.1 Unit diagram for the crossed structure given in table 1.1: hospitals H1-H4 and neighbourhoods N1-N3, each connected to their patients P1-P12.]

Note that we have two hierarchies present, patients within hospitals and patients within neighbourhoods; we have organised the topology of the diagram such that patients are nested within hospitals. However,

when we come to add neighbourhoods to the diagram we see that the connecting lines cross, indicating that we have a cross-classification. Drawing the hierarchical structure shown in table 1.2 gives the representation shown in figure 1.2.

[Figure 1.2 Unit diagram for the completely nested structure given in table 1.2: neighbourhoods N1-N3 above hospitals H1-H4 above patients P1-P12, with no crossing lines.]

Clearly, to draw such diagrams including all units is not practical for large data sets, as there will be far too many nodes on the diagram to fit into a reasonable area. However, they can be used in schematic form to convey the structure of the relationship between classifications. Even so, when we have four or five random classifications present (as commonly occurs with social data) schematic forms of these diagrams can become hard to read. There is a more minimal diagram, the classification diagram, which has one node for each classification. Nodes connected by an arrow indicate a nested relationship, nodes connected by a double arrow indicate a multiple-membership relationship (examples are given later) and unconnected nodes indicate a crossed relationship. Thus the crossed structure in figure 1.1 and the completely nested structure of figure 1.2 are drawn as

shown in figure 1.3.

[Figure 1.3 Classification diagrams for (i) the crossed structure, with hospital and neighbourhood as unconnected nodes above patient, and (ii) the nested structure, with neighbourhood above hospital above patient.]

Some notation for constructing a statistical model.

The matrix notation used in this book for describing hierarchical models, that is,

    y_j = X_j \beta + Z_j u_j + e_j

does not readily extend to the case of cross-classifications. This is because the notation assumes a unique hierarchy, where we write down the generic equation for the jth level two unit. In a simple cross-classification we have two sets of level two units, for example hospitals and neighbourhoods, so which classification is j indexing?

We can extend the basic scalar notation to handle cross-classified structures. Assume we have patients nested within a cross-classification of neighbourhoods by hospitals, that is, the case illustrated in figure 1.3(i). Suppose we want to estimate a simple variance components model giving estimates of the mean response and patient, hospital and neighbourhood level variation. In this case we can write the model in scalar notation as

    y_{i(j_1 j_2)} = \beta_0 + u_{j_1} + u_{j_2} + e_{i(j_1 j_2)}

where \beta_0 estimates the mean response, j_1 indexes the neighbourhood classification, j_2 indexes the hospital classification, u_{j_1} is the random

effect for neighbourhood j_1, u_{j_2} is the random effect for hospital j_2, y_{i(j_1 j_2)} is the response for the ith patient from the cell in the cross-classification defined by neighbourhood j_1 and hospital j_2, and finally e_{i(j_1 j_2)} is the patient-level residual for the ith patient from that cell. Details of how this notation extends to represent more complex models and patterns of cross-classification are given in Rasbash and Browne.

One problem with this notation is that as we fit models with more classifications and more complex patterns of crossing, the subscript notation that describes the patterns becomes very cumbersome and difficult to read. We therefore prefer an alternative notation introduced in Browne et al., 2000. We can write the same model as

    y_i = \beta_0 + u^{(2)}_{nbhd(i)} + u^{(3)}_{hosp(i)} + e_i

where i indexes the observation level, in this case patients, and nbhd(i) and hosp(i) are functions that return the unit number of the neighbourhood and hospital, respectively, that patient i belongs to. Thus for the data structure drawn in figure 1.1 the values of nbhd(i) and hosp(i) are given in table 1.3.

Table 1.3 Indexing table for neighbourhoods and hospitals for patients given in figure 1.1.

     i    nbhd(i)   hosp(i)
     1       1         1
     2       1         1
     3       2         1
     4       1         2
     5       2         2
     6       2         3
     7       2         3
     8       3         3
     9       1         4
    10       3         4
    11       3         4
    12       3         4

Therefore the model for patient 3 would be

    y_3 = \beta_0 + u^{(2)}_2 + u^{(3)}_1 + e_3

and for patient 5 would be

    y_5 = \beta_0 + u^{(2)}_2 + u^{(3)}_2 + e_5

We number the classifications from 2 upwards, as we use classification number 1 to represent the identity classification that applies to the observation level (like level 1 in a hierarchical model); this classification simply returns the row numbers in the data matrix. As can be seen, random effects require bracketed superscripting with their classification number to avoid ambiguity. This simplified notation has the advantage that the subscripting does not increase in complexity as we add more classifications. This simplification is achieved because the notation makes no attempt to describe the patterns of crossing and nesting present. That is useful information, and we therefore advocate the use of this notation in conjunction with the classification diagrams, as shown in figure 1.3, which display these patterns explicitly.

2.2 ESTIMATION ALGORITHMS

We will describe three estimation algorithms for fitting cross-classified models in detail and mention other alternatives. Each of these three methods has advantages and disadvantages in terms of speed, memory usage and bias, and these will be discussed later. All three methods have been implemented in versions of the MLwiN software package (Rasbash et al., 2000) and all results in this paper are produced by this package.

An IGLS algorithm for estimating cross-classified models.

The iterative generalized least squares estimates for a multilevel model are those estimates which simultaneously satisfy both of the following equations:

    \hat{\beta} = (X^T V^{-1} X)^{-1} (X^T V^{-1} y)
    \hat{\theta} = (Z^{*T} (V^*)^{-1} Z^*)^{-1} Z^{*T} (V^*)^{-1} y^*

where \hat{\beta} are the estimated fixed coefficients and \hat{\theta} is a vector containing the estimated variances and covariances of the sets of random effects in the model. V = Cov(y | X\beta), and an estimate of V is constructed from the elements of \hat{\theta}. y^* is the vector of elements of (y - X\hat{\beta})(y - X\hat{\beta})^T

and therefore has length n^2 (n is the length of the data set). V^* is the covariance matrix of y^* and Z^* is the design matrix linking y^* to V^* in the regression of y^* on Z^*. V^* has the form V^* = V \otimes V. See Goldstein, 1986 for more details.

Some of these matrices are massive: for example, (V^*)^{-1} has dimension n^2 by n^2, making a direct software implementation of these estimating equations extremely resource intensive, both in terms of CPU time and memory consumed. However, in hierarchical models V and V^* have a block diagonal structure which can be exploited by customised algorithms (see Goldstein and Rasbash, 1996) to allow efficient computation. The problem presented by cross-classified models is that V (and therefore V^*) no longer has the block diagonal structure which the efficient algorithm requires.

Structure of V for cross-classified models.

Let us take a look at the structure of V, the covariance matrix of y, for cross-classified models, and see how we can adapt the standard IGLS algorithm to handle cross-classifications. The basic two-level cross-classified model (with hospitals and neighbourhoods) can be written as:

    y_i = X\beta + u^{(2)}_{hosp(i)} + u^{(3)}_{nbhd(i)} + e_i
    u^{(2)}_{hosp(i)} ~ N(0, \sigma^2_{u(2)}),   u^{(3)}_{nbhd(i)} ~ N(0, \sigma^2_{u(3)}),   e_i ~ N(0, \sigma^2_e)

The variance of our response is now

    var(y_i) = var(u^{(2)}_{hosp(i)} + u^{(3)}_{nbhd(i)} + e_i) = \sigma^2_{u(2)} + \sigma^2_{u(3)} + \sigma^2_e.

The covariance between individuals a and b is

    cov(y_a, y_b) = cov(u^{(2)}_{hosp(a)} + u^{(3)}_{nbhd(a)} + e_a,  u^{(2)}_{hosp(b)} + u^{(3)}_{nbhd(b)} + e_b)

which simplifies to \sigma^2_{u(2)} for two individuals from the same hospital but different neighbourhoods, \sigma^2_{u(3)} for two individuals from the same neighbourhood but different hospitals, \sigma^2_{u(2)} + \sigma^2_{u(3)} for two individuals from the same neighbourhood and the same hospital, and zero for two individuals from both different neighbourhoods and different hospitals. Take a toy example of five patients in two hospitals, crossed with two neighbourhoods, as shown in table 1.4. This generates a 5 by 5 covariance matrix for the responses of the five patients with the following structure:

Table 1.4 Indexing table for hospitals and neighbourhoods for 5 patients.

    i    hosp(i)   nbhd(i)
    1       1         1
    2       1         2
    3       1         1
    4       2         2
    5       2         1

    V = ( h+n+p   h       h+n     0       n     )
        ( h       h+n+p   h       n       0     )
        ( h+n     h       h+n+p   0       n     )
        ( 0       n       0       h+n+p   h     )
        ( n       0       n       h       h+n+p )

where h = \sigma^2_{u(2)}, n = \sigma^2_{u(3)} and p = \sigma^2_e. Here the data are sorted patients within hospitals; this allows us to split the covariance matrix into two components: a component for patients within hospitals which has a block diagonal structure (P), and a component for neighbourhoods which is not block diagonal (Q):

    V = P + Q

where

    P = ( h+p  h    h    0    0   )        Q = ( n  0  n  0  n )
        ( h    h+p  h    0    0   )            ( 0  n  0  n  0 )
        ( h    h    h+p  0    0   )            ( n  0  n  0  n )
        ( 0    0    0    h+p  h   )            ( 0  n  0  n  0 )
        ( 0    0    0    h    h+p )            ( n  0  n  0  n )

Splitting V into a hierarchical, block-diagonal part that the IGLS algorithm can handle in an efficient way and a non-hierarchical, non-block-diagonal part forms the basis of a relatively efficient algorithm for handling cross-classified models. If we take the dummy variable indicator matrix of neighbourhoods, Z, then we have Q = Z Z^T \sigma^2_{u(3)}.
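The covariance rule and the split V = P + Q can be checked numerically for the toy example of table 1.4. A sketch using numpy (assumed available; the variance values are invented):

```python
import numpy as np

# Toy example of table 1.4: hosp(i) and nbhd(i) for the five patients
hosp = np.array([1, 1, 1, 2, 2])
nbhd = np.array([1, 2, 1, 2, 1])
h, n, p = 0.5, 0.3, 1.0   # sigma2_u(2), sigma2_u(3), sigma2_e (invented values)

# Covariance rule: a shared hospital contributes h, a shared neighbourhood n,
# and the patient-level variance p sits on the diagonal only.
same_h = (hosp[:, None] == hosp[None, :]).astype(float)
same_n = (nbhd[:, None] == nbhd[None, :]).astype(float)
V = h * same_h + n * same_n + p * np.eye(5)

# Split into the block-diagonal within-hospital part P and the crossed part Q
P = h * same_h + p * np.eye(5)
Z = (nbhd[:, None] == np.array([1, 2])[None, :]).astype(float)  # neighbourhood dummies
Q = n * (Z @ Z.T)
print(np.allclose(V, P + Q))  # True
```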

    Z = ( 1  0 )            Z Z^T \sigma^2_{u(3)} = ( n  0  n  0  n )
        ( 0  1 )                                    ( 0  n  0  n  0 )
        ( 1  0 )                                    ( n  0  n  0  n )
        ( 0  1 )                                    ( 0  n  0  n  0 )
        ( 1  0 )                                    ( n  0  n  0  n )

We can define a `pseudo-unit' that spans the entire data set (in our toy example, all five patients) and declare this pseudo-unit to be level three in the model, removing the neighbourhood classification from the model. We can now form the three-level hierarchical model

    y_i = \beta_0 + u^{(2)}_{hosp(i)} + u^{(3)}_{punit(i),1} z_{1i} + u^{(3)}_{punit(i),2} z_{2i} + e_i

    (u^{(3)}_{punit(i),1}, u^{(3)}_{punit(i),2})^T ~ N(0, \Omega^{(3)}),   \Omega^{(3)} = diag(\sigma^2_{u(3)1}, \sigma^2_{u(3)2})

    u^{(2)}_{hosp(i)} ~ N(0, \sigma^2_{u(2)}),   e_i ~ N(0, \sigma^2_e)

Here the level structure is patients within hospitals within the pseudo-unit level. z_1 and z_2 are columns 1 and 2 of Z. \sigma^2_{u(3)1} and \sigma^2_{u(3)2} are both estimates of the between-neighbourhood variation, and we therefore constrain them to be equal. Thus we can use the standard IGLS hierarchical algorithm to define and estimate the correct covariance structure for a cross-classified model. If we had 200 hospitals and 100 neighbourhoods, we would have to form 100 dummy variables for the neighbourhoods, allow them all to have variances at level 3 and constrain those variances to be equal. Details of this algorithm are given in Rasbash and Goldstein, 1994 and Bull et al., 1999; it will be referred to as the RG algorithm in later sections.

MCMC.

MCMC estimation methods (see Chapter 3 of this book for a fuller description) aim to generate samples from the joint posterior distribution of all unknown parameters, and then use these samples to calculate point and interval estimates for each individual parameter. The Gibbs sampler produces samples from the joint posterior by generating in turn from the conditional posterior distributions of groups of unknown parameters. In chapter 3 the Gibbs sampling algorithm for a Normally distributed response hierarchical model is given.

As we have seen in the notation section, we can describe our model as a set of additive terms, one for the fixed part of the model and one

for each of the random classifications. The MCMC algorithm works on each of these terms separately, and consequently the algorithm for a cross-classified model is no more complicated than for a hierarchical model. For illustration we present the steps for the following cross-classified model, based on the variance components hospitals by neighbourhoods model, and refer the interested reader to Browne et al., 2000 for more general algorithms. Note that if the response is dichotomous or a count then, as in chapter 3, we can use the Metropolis-Gibbs hybrid method discussed there. The basic two-level cross-classified model (with hospitals and neighbourhoods) can be written as:

    y_i = X\beta + u^{(2)}_{hosp(i)} + u^{(3)}_{nbhd(i)} + e_i
    u^{(2)}_{hosp(i)} ~ N(0, \sigma^2_{u(2)}),   u^{(3)}_{nbhd(i)} ~ N(0, \sigma^2_{u(3)}),   e_i ~ N(0, \sigma^2_e)

We can split our unknown parameters into 6 distinct sets: the fixed effects \beta, the hospital random effects u^{(2)}, the neighbourhood random effects u^{(3)}, the hospital variance \sigma^2_{u(2)}, the neighbourhood variance \sigma^2_{u(3)} and the residual variance \sigma^2_e. We then need to generate random draws from the conditional distribution of each of these six groups of unknowns. MCMC algorithms are generally used in a Bayesian context and consequently we need to define prior distributions for our unknown parameters. For generality we will use a multivariate Normal prior for the fixed effects, \beta ~ N_{p_f}(\mu_p, S_p), and scaled inverse chi-squared (SI\chi^2) priors for the three variances: for the hospital variance \sigma^2_{u(2)} ~ SI\chi^2(\nu_2, s^2_2), for the neighbourhood variance \sigma^2_{u(3)} ~ SI\chi^2(\nu_3, s^2_3), and for the residual variance \sigma^2_e ~ SI\chi^2(\nu_e, s^2_e).
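Under the model and priors just specified, a Gibbs sampler cycles through these six groups of unknowns in turn. The sketch below is an illustrative implementation (not the MLwiN one), using numpy, an intercept-only fixed part with a vague Normal prior, and diffuse scaled-inverse-chi-squared priors; all data are simulated:

```python
import numpy as np

rng = np.random.default_rng(1)

def gibbs_crossed(y, hosp, nbhd, n_iter=2000, nu=0.001, s2=1.0):
    """Gibbs sampler for y_i = b0 + u2_{hosp(i)} + u3_{nbhd(i)} + e_i.
    hosp, nbhd are 0-based index arrays; b0 has a vague N(0, 1e6) prior and
    each variance a scaled-inverse-chi^2(nu, s2) prior. Illustrative only."""
    N = len(y)
    n2, n3 = hosp.max() + 1, nbhd.max() + 1
    b0, u2, u3 = 0.0, np.zeros(n2), np.zeros(n3)
    s2_2, s2_3, s2_e = 1.0, 1.0, 1.0
    out = []
    for _ in range(n_iter):
        # Step 1: fixed effect (intercept only)
        d = y - u2[hosp] - u3[nbhd]
        D = 1.0 / (N / s2_e + 1e-6)
        b0 = rng.normal(D * d.sum() / s2_e, np.sqrt(D))
        # Step 2: hospital residuals u2_k
        d = y - b0 - u3[nbhd]
        for k in range(n2):
            m = hosp == k
            Dk = 1.0 / (m.sum() / s2_e + 1.0 / s2_2)
            u2[k] = rng.normal(Dk * d[m].sum() / s2_e, np.sqrt(Dk))
        # Step 3: neighbourhood residuals u3_k
        d = y - b0 - u2[hosp]
        for k in range(n3):
            m = nbhd == k
            Dk = 1.0 / (m.sum() / s2_e + 1.0 / s2_3)
            u3[k] = rng.normal(Dk * d[m].sum() / s2_e, np.sqrt(Dk))
        # Steps 4-6: Gamma draws for the three precisions
        s2_2 = 1.0 / rng.gamma((n2 + nu) / 2, 2.0 / ((u2 ** 2).sum() + nu * s2))
        s2_3 = 1.0 / rng.gamma((n3 + nu) / 2, 2.0 / ((u3 ** 2).sum() + nu * s2))
        e = y - b0 - u2[hosp] - u3[nbhd]
        s2_e = 1.0 / rng.gamma((N + nu) / 2, 2.0 / ((e ** 2).sum() + nu * s2))
        out.append((b0, s2_2, s2_3, s2_e))
    return np.array(out)

# Simulated crossed data: 400 patients, 8 hospitals, 6 neighbourhoods
hosp = rng.integers(0, 8, 400)
nbhd = rng.integers(0, 6, 400)
y = 5.0 + rng.normal(0, 0.7, 8)[hosp] + rng.normal(0, 0.5, 6)[nbhd] + rng.normal(0, 1.0, 400)
chain = gibbs_crossed(y, hosp, nbhd)
print(chain[1000:, 0].mean())  # posterior mean of b0, should be near the true value 5
```

Note that numpy's `gamma` takes a scale parameter, so the rate of each full conditional is inverted before drawing.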
The steps are then as follows. In step 1 the conditional posterior distribution in the Gibbs update for the fixed effects parameter vector is multivariate Normal with dimension p_f (the number of fixed effects):

    p(\beta | y, u^{(2)}, u^{(3)}, \sigma^2_{u(2)}, \sigma^2_{u(3)}, \sigma^2_e) ~ N_{p_f}(\hat{\beta}, \hat{D})

where

    \hat{D} = ( \sum_{i=1}^{N} X_i^T X_i / \sigma^2_e + S_p^{-1} )^{-1},
    \hat{\beta} = \hat{D} ( \sum_{i=1}^{N} X_i^T d_i / \sigma^2_e + S_p^{-1} \mu_p )

and d_i = y_i - u^{(2)}_{hosp(i)} - u^{(3)}_{nbhd(i)}.

In step 2 we update the hospital residuals, u^{(2)}_k, using Gibbs sampling with a univariate Normal full conditional distribution:

    p(u^{(2)}_k | y, \beta, u^{(3)}, \sigma^2_{u(2)}, \sigma^2_{u(3)}, \sigma^2_e) ~ N(\hat{u}^{(2)}_k, \hat{D}^{(2)}_k)

where

    \hat{D}^{(2)}_k = ( n^{(2)}_k / \sigma^2_e + 1/\sigma^2_{u(2)} )^{-1},
    \hat{u}^{(2)}_k = \hat{D}^{(2)}_k \sum_{i: hosp(i)=k} d_i / \sigma^2_e

and d_i = y_i - X_i\beta - u^{(3)}_{nbhd(i)}.

In step 3 we update the neighbourhood residuals, u^{(3)}_k, using Gibbs sampling with a univariate Normal full conditional distribution:

    p(u^{(3)}_k | y, \beta, u^{(2)}, \sigma^2_{u(2)}, \sigma^2_{u(3)}, \sigma^2_e) ~ N(\hat{u}^{(3)}_k, \hat{D}^{(3)}_k)

where

    \hat{D}^{(3)}_k = ( n^{(3)}_k / \sigma^2_e + 1/\sigma^2_{u(3)} )^{-1},
    \hat{u}^{(3)}_k = \hat{D}^{(3)}_k \sum_{i: nbhd(i)=k} d^{(3)}_i / \sigma^2_e

and d^{(3)}_i = y_i - X_i\beta - u^{(2)}_{hosp(i)}. Note that in the above two steps n^{(c)}_k refers to the number of individuals in the kth unit of classification c.

In step 4 we update the hospital variance \sigma^2_{u(2)} using Gibbs sampling and a Gamma full conditional distribution for 1/\sigma^2_{u(2)}:

    p(1/\sigma^2_{u(2)} | y, \beta, u^{(2)}, u^{(3)}, \sigma^2_{u(3)}, \sigma^2_e) ~ Gamma( (n_2 + \nu_2)/2,  (\sum_{j=1}^{n_2} (u^{(2)}_j)^2 + \nu_2 s^2_2)/2 )

In step 5 we update the neighbourhood variance \sigma^2_{u(3)} using Gibbs sampling and a Gamma full conditional distribution for 1/\sigma^2_{u(3)}:

    p(1/\sigma^2_{u(3)} | y, \beta, u^{(2)}, u^{(3)}, \sigma^2_{u(2)}, \sigma^2_e) ~ Gamma( (n_3 + \nu_3)/2,  (\sum_{j=1}^{n_3} (u^{(3)}_j)^2 + \nu_3 s^2_3)/2 )

In step 6 we update the observation-level variance \sigma^2_e using Gibbs sampling and a Gamma full conditional distribution for 1/\sigma^2_e:

    p(1/\sigma^2_e | y, \beta, u^{(2)}, u^{(3)}, \sigma^2_{u(2)}, \sigma^2_{u(3)}) ~ Gamma( (N + \nu_e)/2,  (\sum_i e_i^2 + \nu_e s^2_e)/2 )

The above six steps are repeatedly sampled in sequence to produce correlated chains of parameter estimates, from which point and interval estimates can be created as in chapter 3.

AIP method.

The Alternating Imputation Prediction (AIP) method is a data augmentation algorithm for estimating cross-classified

models with large numbers of random effects. Comprehensive details of this algorithm are given in Clayton and Rasbash, 1999; we now give an overview. Data augmentation has been reviewed by Schafer. Tanner and Wong, 1987 introduced the idea of data augmentation as a stochastic version of the EM algorithm for maximum likelihood estimation in problems involving missing data. Corresponding to the E and M steps of Tanner and Wong we have:

I(mputation) step - impute the missing data by sampling from the distribution of the missing data conditional upon the observed data and current values of the model parameters.

P(osterior) step - sample parameter values from the complete data posterior distribution; these will be used for the next I-step.

In the context of random effect models, the random effects play the role of missing data. If the observed data are denoted by y, the random effects by u and the model parameters by \theta, and if we denote the probability distribution of y conditional on X as p(y | X), then the algorithm is specified (at step t) by

    I step - draw a sample u^{(t)} from p(u | y, \theta = \theta^{(t-1)})
    P step - draw a sample \theta^{(t)} from p(\theta | y, u = u^{(t)})

Repeated application of these two steps delivers a stochastic chain with equilibrium distribution p(\theta | y), in a similar way to the MCMC algorithm. Now let us look at how we can adapt this method to fit a crossed random effects model when the only estimating engine at our disposal is one optimized for fitting nested random effects. An n-way cross-classified model can be broken down into n sub-models, each of which is a two-level hierarchical model. For example, patients nested within a cross-classification of neighbourhood by hospital can be broken down into a patient-within-hospital sub-model and a patient-within-neighbourhood sub-model. Take the simple model

    y_i = X_i\beta + u^{(2)}_{nbhd(i)} + u^{(3)}_{hosp(i)} + e_i

where neighbourhood and hospital are cross-classified.
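The alternation between the two sub-models — fit one, offset its sampled residuals from the response, fit the other — can be sketched end-to-end. Since no real two-level estimation engine is available here, the function below is a crude stand-in that produces moment-based estimates and approximate posterior draws; everything about it (the names, the simulated data, the approximations) is illustrative only:

```python
import numpy as np

rng = np.random.default_rng(7)

def fit_and_sample(y, g):
    """Crude stand-in for a two-level estimation engine: moment estimates of
    (beta0, s2_u, s2_e), then an approximate P-step draw of beta0 and an
    I-step draw of the group residuals. Illustrative, not a real fitter."""
    k = g.max() + 1
    gm = np.array([y[g == j].mean() for j in range(k)])
    s2_e = np.mean([y[g == j].var() for j in range(k)])
    s2_u = max(gm.var() - s2_e * k / len(y), 1e-6)
    beta0 = rng.normal(gm.mean(), np.sqrt(s2_u / k))               # P-step (approximate)
    nj = np.bincount(g, minlength=k)
    Dk = 1.0 / (nj / s2_e + 1.0 / s2_u)
    rsum = np.bincount(g, weights=y - beta0, minlength=k)
    u = rng.normal(Dk * rsum / s2_e, np.sqrt(Dk))                  # I-step
    return beta0, s2_u, s2_e, u

# Simulated data: patients within a cross-classification of 10 hospitals by
# 10 neighbourhoods, true intercept 3
hosp = rng.integers(0, 10, 500)
nbhd = rng.integers(0, 10, 500)
y = 3.0 + rng.normal(0, 0.6, 10)[hosp] + rng.normal(0, 0.6, 10)[nbhd] + rng.normal(0, 1.0, 500)

# Alternate: fit model N with the sampled hospital residuals offset from y,
# then model H with the sampled neighbourhood residuals offset, and repeat.
u_h = np.zeros(10)
chain = []
for _ in range(200):
    b_n, s2_n, s2_en, u_n = fit_and_sample(y - u_h[hosp], nbhd)   # model N
    b_h, s2_h, s2_eh, u_h = fit_and_sample(y - u_n[nbhd], hosp)   # model H
    chain.append((b_n, b_h))
print(np.mean(chain[100:], axis=0))  # two intercept chains, both near the true 3
```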
This cross-classified model can be partitioned into two hierarchical sub-models: patients within neighbourhoods (model N) and patients within hospitals (model H). An informal description of the AIP algorithm is:

1. Start by fitting model N using an estimation procedure for two-level models.

2. Sample the model parameters from an approximation to their joint posterior distribution; that is, sample the fixed effects, the neighbourhood-level variance and the patient-level variance, and denote these samples by \beta_{[0,2]}, \sigma^2_{u[0,2]} and \sigma^2_{e[0,2]} respectively. Here [0,2] labels a term as belonging to AIP iteration 0, for classification number 2, that is, neighbourhood. This is the P-step for the neighbourhood classification.

3. Next sample a set of neighbourhood-level random effects (o_{[0,2]}) from p(u_{[0,2]} | y, \beta_{[0,2]}, \sigma^2_{u[0,2]}, \sigma^2_{e[0,2]}). This is the I-step for the neighbourhood classification.

4. Offset o_{[0,2]} from y, that is, form y^* = y - o_{[0,2]}, re-sort the data according to hospitals and fit model H using the new offset response y^*.

5. Next sample \beta_{[0,3]}, \sigma^2_{u[0,3]} and \sigma^2_{e[0,3]} from this second model, H. This is the P-step for the hospital classification.

6. Sample a set of hospital-level random effects (o_{[0,3]}) from p(u_{[0,3]} | y, \beta_{[0,3]}, \sigma^2_{u[0,3]}, \sigma^2_{e[0,3]}). This is the I-step for the hospital classification.

This completes one iteration of the AIP algorithm; it is an Imputation-Posterior algorithm that Alternates between the neighbourhood and hospital classifications. We proceed by forming y^* = y - o_{[0,3]}, that is, offsetting the sampled hospital residuals from y, and using that as the response in step 1. After T iterations the procedure delivers the following two chains, which can be used for inference:

    {\beta_{[0,2]}, \sigma^2_{u[0,2]}, \sigma^2_{e[0,2]}}, {\beta_{[1,2]}, \sigma^2_{u[1,2]}, \sigma^2_{e[1,2]}}, ..., {\beta_{[T,2]}, \sigma^2_{u[T,2]}, \sigma^2_{e[T,2]}}
    {\beta_{[0,3]}, \sigma^2_{u[0,3]}, \sigma^2_{e[0,3]}}, {\beta_{[1,3]}, \sigma^2_{u[1,3]}, \sigma^2_{e[1,3]}}, ..., {\beta_{[T,3]}, \sigma^2_{u[T,3]}, \sigma^2_{e[T,3]}}

Note that we get two sets of estimates for both the fixed effects and the level 1 variance with the AIP algorithm; these should be approximately equal.

Other methods.

Raudenbush, 1993 considers an empirical Bayes approach to fitting cross-classified models based on the EM algorithm.
He considers the specific case of two classifications where one classification has many units whilst the other has far fewer, and shows two educational examples to illustrate the method. Two other recent approaches that can be used for fitting cross-classified models, in particular with non-Normal responses, are Gauss-Hermite quadrature within PQL estimation (Pan and Thompson, 2000) and the

HGLM model framework as described in Lee and Nelder. Neither of these approaches has been designed with speed of estimation in mind, and so they are currently not feasible for the size of some of the problems considered in practice.

Comparison of estimation methods.

The RG method, when it works, is generally fairly quick to converge where all, or all but one, of the crossed classifications have small numbers of units. When there are multiple crossed classifications with large numbers of units the speed of the RG algorithm deteriorates and memory usage is greatly increased, often exhausting the available memory. The AIP method does not have these memory problems but will be slower for structures that are almost hierarchical. Although this method works reasonably well, if the response is a binary variable and quasi-likelihood methods need to be used, then this method, like the RG method, is still affected by the bias that is inherent in quasi-likelihood methods for binary response multilevel models (see Goldstein and Rasbash, 1996). The MCMC methods have no such bias problems, although there are still issues over which prior distributions to use for the variance parameters. They also, like the AIP method, do not have any memory problems. They are, however, generally computationally a lot slower, as they estimate the whole posterior distribution and not simply the mode, although as the structure of the data becomes more complex the speed difference is reduced.

An example analysis of a two-way cross-classification: primary schools crossed with secondary schools.

We will here consider fitting the RG method using the IGLS algorithm, the MCMC method based on Gibbs sampling (Browne et al., 2000) and the AIP method to an educational example from Fife in Scotland. Here we have as a response the exam results of 3,435 children at age 16.
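Whether a structure is "almost hierarchical" in the sense that matters for the RG and AIP methods can be checked directly from the index vectors: for each unit of one classification, count the distinct units of the other classification that its observations fall under. A small sketch (the labels are invented, mimicking primary schools feeding secondary schools):

```python
from collections import defaultdict

def crossing_profile(inner, outer):
    """For each inner unit, the number of distinct outer units its
    observations fall under; all 1s means inner is nested in outer."""
    parents = defaultdict(set)
    for a, b in zip(inner, outer):
        parents[a].add(b)
    return {a: len(bs) for a, bs in parents.items()}

# Toy version of the Fife situation: most primary schools feed a single
# secondary school, a few feed more than one.
primary   = ["p1", "p1", "p1", "p2", "p2", "p3", "p3", "p3"]
secondary = ["s1", "s1", "s2", "s1", "s1", "s2", "s2", "s2"]
print(crossing_profile(primary, secondary))  # {'p1': 2, 'p2': 1, 'p3': 1}
```

Here p1 is the only crossed unit; the closer the profile is to all 1s, the nearer the structure is to a pure hierarchy.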
We know for each child both the primary school and the secondary school that they attended, and we are interested in partitioning the variance between these two sources and individual pupil-level variation. The classification diagram is shown in figure 1.4. There are 148 primary schools that feed into 19 secondary schools in the dataset. Of the 148 primary schools, 59 are nested within a single secondary school, whilst another 62 have at most 3 pupils that do not go to the main secondary school, so we have an almost nested structure. This structure is particularly suited to the RG algorithm. We will fit the following model to the dataset:

[Figure 1.4 Classification diagram for the Fife educational example: primary school and secondary school crossed, above pupil.]

    y_i = \beta_0 + u^{(2)}_{SEC(i)} + u^{(3)}_{PRIM(i)} + e_i
    u^{(2)}_{SEC(i)} ~ N(0, \sigma^2_{u(2)}),   u^{(3)}_{PRIM(i)} ~ N(0, \sigma^2_{u(3)}),   e_i ~ N(0, \sigma^2_e)

Table 1.5 Point estimates for the Fife educational dataset.

    Parameter                                     IGLS          MCMC           AIP
    Mean achievement (\beta_0)                    5.50 (0.17)   5.50 (0.18)    5.51 (0.19)
    Secondary school variance (\sigma^2_{u(2)})   0.35 (0.16)   0.41 (0.21)    0.34 (0.15)
    Primary school variance (\sigma^2_{u(3)})     1.12 (0.20)   1.15 (0.213)   1.11 (0.20)
    Individual level variance (\sigma^2_e)        — (0.20)      8.12 (0.20)    8.11 (0.20)

The results are shown in table 1.5, from which we can see that in this example there is more variation between primary schools than between secondary schools. The MCMC

estimates replicate the IGLS estimates with slightly greater higher-level variances (mean versus mode estimates) due to the skewness of the posterior distribution. The AIP method gives very similar results to the IGLS method. A further discussion of these results is given in Goldstein.

2.3 MODELS FOR MORE COMPLEX POPULATION STRUCTURES

In this section we consider expanding the simple two-way cross-classified structure to accommodate more classifications and more complex structures.

Example scenarios.

Let us take the situation described in the classification diagram drawn in figure 1.3(i), where patients lie within a cross-classification of hospitals by neighbourhoods. We may have information on the doctor that treated each patient, and doctors may be nested within hospitals. The classification diagram for this structure is shown in figure 1.5.

[Figure 1.5 Classification diagram for two crossed hierarchies: (patients within doctors within hospitals) * (patients within neighbourhoods).]

A variance components model for this structure is written as

    y_i = \beta_0 + u^{(2)}_{nbhd(i)} + u^{(3)}_{hosp(i)} + u^{(4)}_{doct(i)} + e_i

If doctors work across hospitals, and are therefore not nested within hospitals, we then have a three-way cross-classification, which is drawn in figure 1.6.

[Figure 1.6 Classification diagram for three crossed hierarchies: (patients within hospitals) * (patients within doctors) * (patients within neighbourhoods).]

Note that the variance components model for the structure in figure 1.6 is described by the same equation. This reflects the fact that the model notation for describing the random effects simply lists the classifications that are sources of variation for the response we are modelling. In the variance components model we only have an intercept term, which varies across all four classifications present. Suppose we had another explanatory variable, x_1, and we wished to allow its coefficient to vary across the doctor classification; we would write this model as

    y_i = \beta_0 + u^{(2)}_{nbhd(i)} + u^{(3)}_{hosp(i)} + u^{(4)}_{doct(i),0} + \beta_1 x_{1i} + u^{(4)}_{doct(i),1} x_{1i} + e_i

or alternatively we can express the model as:

    y_i = \beta_{0i} + \beta_{1i} x_{1i} + e_i
    \beta_{0i} = \beta_0 + u^{(2)}_{nbhd(i)} + u^{(3)}_{hosp(i)} + u^{(4)}_{doct(i),0}
    \beta_{1i} = \beta_1 + u^{(4)}_{doct(i),1}

It may be that the scenario described in figure 1.6 is further complicated because hospitals, doctors and neighbourhoods are all nested within regions. In this case the classification diagram becomes as in figure 1.7.

[Figure 1.7 Classification diagram for three crossed hierarchies nested within a higher-level classification (region).]

Extending the last model to incorporate a simple random effect for the region classification we have

    y_i = \beta_{0i} + \beta_{1i} x_{1i} + e_i
    \beta_{0i} = \beta_0 + u^{(2)}_{nbhd(i)} + u^{(3)}_{hosp(i)} + u^{(4)}_{doct(i),0} + u^{(5)}_{reg(i)}
    \beta_{1i} = \beta_1 + u^{(4)}_{doct(i),1}

These few example scenarios indicate how the classification diagrams and the simplified notation extend to describe patterns of crossing of arbitrary complexity.

An example analysis of a complex cross-classified structure: artificial insemination data.

We consider a data set concerning artificial insemination by donor. A detailed description of this data set and the substantive research questions addressed by modelling it within a cross-classified framework are given in Ecochard and Clayton. The data were re-analysed in Clayton and Rasbash, 1999 as an example case study demonstrating the properties of the AIP algorithm for estimating cross-classified models. The data consist of 1901 women who were inseminated by sperm donations from 279 donors. Each donor made multiple donations; there were 1328 donations in all. A single donation is used for multiple inseminations. Each woman receives a series of monthly inseminations, one insemination per ovulatory cycle; the data contain the cycles within the 1901 women. There are two crossed hierarchies, a hierarchy for donors and a hierarchy for women. Level 1 corresponds to measures made at each ovulatory cycle. The response we analyse is the binary variable indicating whether conception occurs in a given cycle. The hierarchy for women is cycles within women. The hierarchy for donors is cycles within donations within donors. Within a series of cycles a woman may receive sperm from multiple donors/donations. The classification diagram for this structure is given in figure 1.8.

[Figure 1.8 Classification diagram for the artificial insemination example: donor above donation; donation and woman both above cycle.]

The model fitted to the data is

    y_i ~ Bernoulli(\pi_i)
    logit(\pi_i) = \beta_0 + azoo_i \beta_1 + semenq_i \beta_2 + age>35_i \beta_3 + spermcount_i \beta_4
                   + spermmot_i \beta_5 + iearly_i \beta_6 + ilate_i \beta_7
                   + u^{(2)}_{woman(i)} + u^{(3)}_{donation(i)} + u^{(4)}_{donor(i)}                  (1.1)
    u^{(2)}_{woman(i)} ~ N(0, \sigma^2_{u(2)}),  u^{(3)}_{donation(i)} ~ N(0, \sigma^2_{u(3)}),  u^{(4)}_{donor(i)} ~ N(0, \sigma^2_{u(4)})

Note that azoospermia (azoo) is a dichotomous variable indicating whether the fecundability of the woman is impaired (0 impaired, 1 not impaired). The results of fitting this model by the MCMC and AIP estimation procedures are given in table 1.6.

Table 1.6 Results for the artificial insemination example.

    Parameter                                MCMC            AIP
    Intercept (\beta_0)                      — (0.21)        — (0.21)
    Azoospermia (\beta_1)                    0.21 (0.09)     0.22 (0.10)
    Semen quality (\beta_2)                  0.18 (0.03)     0.18 (0.03)
    Woman's age > 35 (\beta_3)               — (0.12)        — (0.12)
    Sperm count (\beta_4)                    — (0.001)       — (0.001)
    Sperm motility (\beta_5)                 — (0.0001)      — (0.0001)
    Insemination too early (\beta_6)         — (0.17)        — (0.17)
    Insemination too late (\beta_7)          — (0.09)        — (0.09)
    Donor variance (\sigma^2_{u(4)})         0.11 (0.06)     0.10 (0.06)
    Donation variance (\sigma^2_{u(3)})      0.36 (0.074)    0.34 (0.065)
    Women variance (\sigma^2_{u(2)})         1.02 (0.15)     1.01 (0.11)

This model could not be fitted using the RG algorithm. This is because, if the data are sorted according to women, we need to fit 279 dummy variables for donors and 1328 dummy variables for donations; alternatively, if we sort the data according to donations within donors, we have to fit 1901 dummy variables for women. Either way, the size of these data matrices causes problems of insufficient memory. Even if these memory problems could be worked around, the numerical instability of the constraining procedure, which attempts to constrain over a thousand separately estimated variances to be equal, causes the adapted IGLS algorithm to fail to converge.

After inclusion of covariates there is considerably more variation in the probability of a successful insemination attributable to the women

hierarchy than to the donor hierarchy. Both the AIP and MCMC methods give similar estimates for all parameters. The fixed effect estimates show that the probability of conception is increased with azoospermia and with increased semen quality, sperm count and motility, but decreased with the age of the woman and with inseminations that are too early or too late.

3. MULTIPLE MEMBERSHIP MODELS

As we have seen from the previous section, allowing classifications to be crossed gives rise to a large family of additional model structures that can be estimated. The other main restriction of the basic multilevel model is the need for observations to belong to a unique classification unit, i.e. every pupil belongs to a particular class, every patient is treated at a particular hospital. Often, however, over time a patient may be treated at several hospitals, and depending on the response of interest all of these hospitals may have an influence. In this section we will first introduce the idea of multiple membership and give some example scenarios where it may occur. We will then discuss the possible estimation procedures that can be used to fit multiple membership models, and finish the chapter with a simulated example from the field of education.

3.1 A BASIC STRUCTURE FOR TWO-LEVEL MULTIPLE MEMBERSHIP

Suppose we have data on a large number of patients who attend their local hospital and during the course of their hospital stay are treated by several nurses, and we regard the nurses as an important factor in the patients' outcome of interest. Typically each patient will be seen by more than one nurse during their stay (although some will see only one), but there are many nurses, so we will treat nurse as a random classification rather than as a set of fixed effects. To illustrate this, table 1.7 shows the nurses seen by the first 4 patients. We can consider this structure in a unit diagram as shown in figure 1.9. Here each line in the diagram corresponds to a tick mark in the table.
Again, as our dataset gets larger, such unit diagrams become impractical as there will be too many nodes, and so we will resort to the classification diagrams introduced earlier for cross-classified models. To include multiple membership classifications in such diagrams, we use the convention of a double arrow to represent multiple membership. This leads to the classification diagram shown in figure 1.10 for the above patients and nurses example.

Table 1.7 Nurses seen by the first four patients (each tick marks a patient-nurse pairing; as described below, patient 1 is seen by nurses 1 and 3, while patient 2 is seen by nurse 1 only).

Figure 1.9 Unit diagram for the multiple membership patients within nurses example (nurses N1-N3 at level 2, patients P1-P4 at level 1; each line corresponds to a tick mark in table 1.7).

3.1.1 Example scenarios. Many studies have a multiple membership structure; here are a few examples:

Education: pupils change school/class over the course of their education, and each school/class has an effect on their education.

Health: patients are seen by several doctors and nurses during the course of their treatment.

Survey data: over their lifetime individuals move household, and each household has a bearing on their lifestyle, health, salary, etc.

3.1.2 Constructing a statistical model. Returning to our example of patients being seen by multiple nurses, we have patient 1's response being affected by nurses 1 and 3, while patient 2 is affected only by nurse 1. As we are treating nurse as a random classification

we would like each patient's response to have an equal effect on the nurse classification variance, so we generally weight the random effects for each patient to sum to 1. For example, let's assume patient 1 has been treated by nurse 1 for 2 days and nurse 3 for 1 day. Then we may give nurse 1 a weight of 2/3 and nurse 3 a weight of 1/3. Often we do not have information on the amount of time patients are seen by each nurse, and so we commonly allocate equal weights (in this case 1/2) to each nurse.

Figure 1.10 Classification diagram for the multiple membership patients within nurses example.

We can then write down a general two-level multiple membership model as

y_i = X_i\beta + \sum_{j \in \text{nurse}(i)} w_{i,j} u_j + e_i
u_j \sim N(0, \sigma^2_u), \quad e_i \sim N(0, \sigma^2_e)

where nurse(i) is the set of nurses seen by patient i and w_{i,j} is the weight given to nurse j for patient i. Here we assume that \sum_{j \in \text{nurse}(i)} w_{i,j} = 1 \; \forall i.
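The weighting rule just described can be sketched in a few lines of code. This is purely illustrative: the day counts are those of the running example, while the dictionary layout is an assumption of the sketch.

```python
# Weights for patient 1: treated by nurse 1 for 2 days and nurse 3
# for 1 day, normalised so the multiple membership weights sum to 1.
days = {1: 2, 3: 1}                       # nurse id -> days of care
total = sum(days.values())
weights = {j: d / total for j, d in days.items()}   # nurse 1: 2/3, nurse 3: 1/3

# Check the sum-to-one constraint on the weights.
assert abs(sum(weights.values()) - 1.0) < 1e-12

# With no timing information, fall back to equal weights (here 1/2 each).
nurses_seen = sorted(days)
equal = {j: 1 / len(nurses_seen) for j in nurses_seen}
```

The same normalisation applies whatever quantity (days, visits, proportions) is used to measure each unit's share of the membership.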

Writing out this model for the example patients, we get for patients 1 and 2

y_1 = X_1\beta + w_{1,1}u_1 + w_{1,3}u_3 + e_1
y_2 = X_2\beta + w_{2,1}u_1 + e_2

with the equations for patients 3 and 4 built in the same way from their rows of table 1.7.

3.2 ESTIMATION ALGORITHMS

There are two main algorithms for multiple membership models: an adaptation of the Rasbash and Goldstein, 1994 algorithm described earlier, and the MCMC method. The AIP method has not been extended to cater for multiple membership models.

3.2.1 An IGLS algorithm for multiple membership models. Earlier we described how to fit a cross-classified model by absorbing one of the cross-classifications into a set of dummy variables (the RG method). A slight modification allows this technique to be used to fit multiple membership models. First let's consider a two-level hierarchical model for patients within nurses:

y_i = \beta_0 + u_{\text{nurse}(i)} + e_i, \quad u_{\text{nurse}(i)} \sim N(0, \sigma^2_u), \quad e_i \sim N(0, \sigma^2_e).

We can reparameterise this simple two-level model as

y_i = \beta_0 + z_{i,1}u_1 + z_{i,2}u_2 + z_{i,3}u_3 + \dots + z_{i,J}u_J + e_i
(u_1, u_2, u_3, \dots, u_J)^T \sim N(0, \Omega_u), \quad \Omega_u = \sigma^2_u I_J, \quad e_i \sim N(0, \sigma^2_e)

where z_{i,j} is a dummy variable which is 1 if patient i is seen by nurse j and 0 otherwise, and J is the total number of nurses. We also add the constraint \sigma^2_{u_1} = \sigma^2_{u_2} = \dots = \sigma^2_{u_J}. These two models deliver the same estimates; however, the second formulation takes much longer to compute. The advantage of the second formulation is that it is straightforward to extend to the multiple membership case. Suppose patients are not nested within a single nurse but are multiple members of nurses with membership weights w_{i,j}. We can simply replace z_{i,j} with w_{i,j} in the second formulation, and estimation proceeds in an identical fashion but now delivers estimates for the multiple membership model.

3.2.2 MCMC. Once again we use a Gibbs sampling algorithm that relies on updating groups of parameters in turn from their conditional posterior distributions. For illustration we present the steps for the following simple multiple membership model, based on the variance components model for patients within nurses described earlier. We again refer the interested reader to Browne et al., 2000 for more general algorithms, and note that if the response is dichotomous or a count then, as in chapter 3, we can use the Metropolis-Gibbs hybrid method discussed there. The basic two-level multiple membership model (patients within nurses) can be written as

y_i = X_i\beta + \sum_{j \in \text{nurse}(i)} w_{i,j} u_j + e_i, \quad u_j \sim N(0, \sigma^2_u), \quad e_i \sim N(0, \sigma^2_e).

We can split our unknown parameters into 4 distinct sets: the fixed effects \beta, the nurse random effects u_j, the nurse level variance \sigma^2_u, and the patient level residual variance \sigma^2_e. We then need to generate random draws from the conditional distribution of each of these four groups of unknowns. We define prior distributions for our unknown parameters as follows: for generality we use a multivariate Normal prior for the fixed effects, \beta \sim N_{p_f}(\mu_p, S_p), and scaled inverse \chi^2 priors for the two variances.
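The switch from nesting indicators to membership weights amounts to replacing a 0/1 design matrix by a row-stochastic one. A small numerical sketch (the memberships and dimensions are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
N, J = 4, 3                       # 4 patients, 3 nurses

# Hierarchical case: z[i, j] = 1 if patient i is nested in nurse j.
nurse_of = [0, 0, 1, 2]           # each patient belongs to one nurse
Z = np.zeros((N, J))
Z[np.arange(N), nurse_of] = 1.0

# Multiple membership case: the 0/1 indicators become weights that
# sum to 1 across each row of the design matrix.
W = np.array([[0.5, 0.0, 0.5],    # patient seen by nurses 1 and 3
              [1.0, 0.0, 0.0],    # patient seen by nurse 1 only
              [0.0, 1.0, 0.0],
              [1/3, 1/3, 1/3]])

u = rng.normal(0.0, 1.0, size=J)  # nurse random effects
random_part = W @ u               # nurse contribution to each response
```

Estimation then treats W exactly as it would Z, which is why the dummy-variable formulation extends so directly.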
For the nurse level variance, \sigma^2_u \sim SI\chi^2(\nu_u, s^2_u), and for the patient level variance, \sigma^2_e \sim SI\chi^2(\nu_e, s^2_e). The steps are then as follows. In step 1 of the algorithm, the conditional posterior distribution in the Gibbs update for the fixed effects parameter vector is multivariate Normal with dimension p_f (the number of fixed effects):

p(\beta \mid y, u, \sigma^2_u, \sigma^2_e) \sim N_{p_f}(\hat{\beta}, \hat{D}), where

\hat{D} = \left[ \sum_{i=1}^{N} \frac{X_i^T X_i}{\sigma^2_e} + S_p^{-1} \right]^{-1}, \quad \hat{\beta} = \hat{D} \left[ \sum_{i=1}^{N} \frac{X_i^T d_i}{\sigma^2_e} + S_p^{-1}\mu_p \right],

and where d_i = y_i - \sum_{j \in \text{nurse}(i)} w_{i,j} u_j.

In step 2 we update the nurse residuals u_k using Gibbs sampling with a univariate Normal full conditional distribution:

p(u_k \mid y, \beta, \sigma^2_u, \sigma^2_e) \sim N(\hat{u}_k, \hat{D}_k), where

\hat{D}_k = \left[ \sum_{i : k \in \text{nurse}(i)} \frac{(w_{i,k})^2}{\sigma^2_e} + \frac{1}{\sigma^2_u} \right]^{-1}, \quad \hat{u}_k = \hat{D}_k \left[ \sum_{i : k \in \text{nurse}(i)} \frac{w_{i,k} d_{i,k}}{\sigma^2_e} \right],

and where d_{i,k} = y_i - X_i\beta - \sum_{j \in \text{nurse}(i), j \neq k} w_{i,j} u_j.

In step 3 we update the nurse level variance \sigma^2_u using Gibbs sampling and a Gamma full conditional distribution for 1/\sigma^2_u:

p(1/\sigma^2_u \mid y, u) \sim \text{Gamma}\left( \frac{J + \nu_u}{2}, \frac{\sum_{j=1}^{J} u_j^2 + \nu_u s^2_u}{2} \right).

In step 4 we update the patient level variance \sigma^2_e using Gibbs sampling and a Gamma full conditional distribution for 1/\sigma^2_e:

p(1/\sigma^2_e \mid y, \beta, u) \sim \text{Gamma}\left( \frac{N + \nu_e}{2}, \frac{\sum_{i=1}^{N} e_i^2 + \nu_e s^2_e}{2} \right),

where e_i = y_i - X_i\beta - \sum_{j \in \text{nurse}(i)} w_{i,j} u_j.

These 4 steps are repeatedly sampled from in sequence to produce correlated chains of parameter estimates, from which point and interval estimates can be created as described earlier.

3.2.3 Comparison of estimation methods. As in the comparison for cross-classified models, there are benefits to both methods. The RG method is fairly quick, but the number of level 2 units determines the size of some of the matrices involved and the number of constraints that the method has to apply. These dependencies lead to numerical instability or memory exhaustion in situations with more than a few hundred level 2 units. The MCMC methods, although again computationally slower, do not suffer from these memory problems.

3.2.4 An example analysis of a two-level multiple membership model: children moving school. We consider a simulated
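The four update steps can be sketched directly in code as a check on the algebra. This is a minimal illustration rather than the authors' implementation: the function name `gibbs_mm`, the diffuse priors (\mu_p = 0, S_p = 10^6 I, \nu = 1, s^2 = 1) and the simulated data are all assumptions of the sketch.

```python
import numpy as np

def gibbs_mm(y, X, W, n_iter=2000, seed=0):
    """Gibbs sampler for y = X beta + W u + e, with W holding the
    multiple membership weights (each row sums to 1)."""
    rng = np.random.default_rng(seed)
    N, p = X.shape
    J = W.shape[1]
    mu_p, S_p_inv = np.zeros(p), np.eye(p) * 1e-6   # diffuse Normal prior
    nu_u = nu_e = 1.0                               # scaled inv-chi^2 priors
    s2_u = s2_e = 1.0
    beta, u = np.zeros(p), np.zeros(J)
    sig2_u = sig2_e = 1.0
    out = {"beta": [], "sig2_u": [], "sig2_e": []}

    for _ in range(n_iter):
        # Step 1: fixed effects from their multivariate Normal conditional.
        d = y - W @ u
        D = np.linalg.inv(X.T @ X / sig2_e + S_p_inv)
        beta = rng.multivariate_normal(D @ (X.T @ d / sig2_e + S_p_inv @ mu_p), D)

        # Step 2: each nurse effect u_k from its univariate Normal conditional.
        r = y - X @ beta - W @ u           # full residual, updated in place
        for k in range(J):
            r += W[:, k] * u[k]            # remove u_k: r is now d_{i,k}
            Dk = 1.0 / ((W[:, k] ** 2).sum() / sig2_e + 1.0 / sig2_u)
            u[k] = rng.normal(Dk * (W[:, k] @ r) / sig2_e, np.sqrt(Dk))
            r -= W[:, k] * u[k]            # restore with the new draw

        # Step 3: nurse level precision from its Gamma conditional.
        sig2_u = 1.0 / rng.gamma((J + nu_u) / 2, 2.0 / (u @ u + nu_u * s2_u))

        # Step 4: patient level precision from its Gamma conditional.
        e = y - X @ beta - W @ u
        sig2_e = 1.0 / rng.gamma((N + nu_e) / 2, 2.0 / (e @ e + nu_e * s2_e))

        out["beta"].append(beta.copy())
        out["sig2_u"].append(sig2_u)
        out["sig2_e"].append(sig2_e)
    return {k: np.array(v) for k, v in out.items()}

# Tiny demonstration on simulated data (values invented):
rng = np.random.default_rng(42)
N, J = 200, 10
X = np.column_stack([np.ones(N), rng.normal(size=N)])
W = np.zeros((N, J))
for i in range(N):
    a, b = rng.choice(J, size=2, replace=False)  # two units, weight 1/2 each
    W[i, a] = W[i, b] = 0.5
y = X @ np.array([1.0, 2.0]) + W @ rng.normal(0, 0.7, J) + rng.normal(0, 1.0, N)
res = gibbs_mm(y, X, W, n_iter=500)
```

The resulting chains in `res` can then be summarised by their means and quantiles after discarding a burn-in, exactly as for the other MCMC analyses discussed in this chapter.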

Table 1.8 Results for the multiple membership schools example (standard errors in parentheses).

Parameter                        RG (RIGLS) estimates    MCMC estimates
intercept (\beta_0)              (0.040)                 (0.040)
LRT effect (\beta_1)             (0.012)                 (0.013)
School variance (\sigma^2_u)     (0.018)                 (0.020)
Pupil variance (\sigma^2_e)      (0.013)                 (0.013)

data example based on the problem in education of adjusting for the fact that pupils move school during the course of their studies. We consider a study with 4059 students from 65 schools, taken from Rasbash et al., 2000. In the actual data each child belongs to one school, but we will assume that over their education 10% of children moved school, and so for 10% of the children we choose a second school at random. We assume that information about when the move occurred is unavailable, and so for these children we allocate equal weights of 0.5 to each school. Browne et al., 2000 used this as the basis for a simulation experiment, generating 1000 datasets with this structure to show the bias and coverage properties of the MCMC method. We will instead consider the true response on our modified structure. We have as a response the pupil's total (normalised) exam score in all GCSE exams taken at age 16, and as a predictor the pupil's (standardised) score in a reading test taken at age 11. As we are interested in progress from age 11 to age 16, it makes sense to consider the effect of all schools attended in this period. We will consider the following model:

\text{normexam}_i = \beta_0 + \text{standlrt}_i\beta_1 + \sum_{j \in \text{school}(i)} w_{i,j} u_j + e_i, \quad u_j \sim N(0, \sigma^2_u), \quad e_i \sim N(0, \sigma^2_e).

We fit this model using both the RG and MCMC methods, and the results can be seen in table 1.8. From the table we can see that both methods give similar results. If we compare the results here with those in Rasbash et al., 2000, we see only slight changes to the estimates, with the level 2 variance slightly decreased and the level 1 variance slightly increased. However, in cases where there is a greater amount of multiple membership the
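The modified membership structure described above (a randomly chosen second school for 10% of pupils, with equal weights of 0.5) can be generated as follows. The pupil and school counts match the example; the random assignment mechanism shown is an assumption of the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)
n_pupils, n_schools = 4059, 65
school = rng.integers(0, n_schools, size=n_pupils)   # original single school

# Weight matrix: start with each pupil wholly in one school ...
W = np.zeros((n_pupils, n_schools))
W[np.arange(n_pupils), school] = 1.0

# ... then give a random 10% of pupils a second, different school,
# splitting the weight equally (0.5 each) as no timing data exists.
movers = rng.choice(n_pupils, size=n_pupils // 10, replace=False)
for i in movers:
    second = rng.integers(0, n_schools - 1)
    second += second >= school[i]          # guarantee a *different* school
    W[i, school[i]] = W[i, second] = 0.5

# Every row of W still satisfies the sum-to-one weight constraint.
assert np.allclose(W.sum(axis=1), 1.0)
```

The matrix `W` can then be supplied as the multiple membership design in a model such as the one above.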

variance estimates can be altered if the multiple membership is ignored; for example, if we randomly assigned every pupil to a second school, the level 1 and level 2 variance estimates both change.

4. COMBINING MULTIPLE MEMBERSHIP AND CROSS-CLASSIFIED STRUCTURES IN A SINGLE MODEL

Consider two of our earlier examples from the field of education: firstly, pupils in a crossing of primary schools and secondary schools, and secondly, pupils who move from school to school. We could assume that these two structures occur simultaneously, and we then end up with a model structure that contains both a multiple membership classification (secondary schools) and a second classification (primary schools) that is crossed with the first. This scenario can be represented by a classification diagram as in figure 1.11. Browne et al., 2000 refer to models that contain both multiple memberships and cross-classifications as multiple membership multiple classification (MMMC) models.

Figure 1.11 Classification diagram (nodes: P. School, S. School, Pupil) for the primary/secondary schools multiple membership model.

4.1 EXAMPLE SCENARIOS

Many studies have both cross-classified and multiple membership classifications in their structure; a few examples are the following:

Education: pupils can be affected by the crossing of the neighbourhood they live in and the school they attend. They could also change class over their period of education, and so this multiple membership class classification will be crossed with the neighbourhood classification.

Health: patients are seen by several doctors during their treatment and may visit several hospitals. Doctors who are specialists may move from hospital to hospital and so are crossed with the hospitals.

Survey data: individuals will belong to many households over the course of their lives and will reside in several properties. An entire household may move to a new property, so households can be crossed with properties, and all the households/properties can have an effect on the individual. See Goldstein et al., 2000 for more details.

Spatial data: individuals will belong to a particular area but will also be affected by multiple neighbouring areas.

4.2 CONSTRUCTING A STATISTICAL MODEL

If we return to our example of pupils attending multiple secondary schools but coming from one primary school, we need to combine the multiple membership and cross-classified model structures into one model. As we are treating the secondary schools as a random classification, we would like each pupil to have an equal effect on the secondary school classification, so we will use weights that sum to 1 when a pupil attends more than one secondary school. We will let second(i) be the list of secondary schools that child i has attended. We can then write down a general two-classification MMMC model as

y_i = X_i\beta + \sum_{j \in \text{second}(i)} w_{i,j} u^{(2)}_j + u^{(3)}_{\text{prim}(i)} + e_i
u^{(2)}_j \sim N(0, \sigma^2_{u(2)}), \quad u^{(3)}_{\text{prim}(i)} \sim N(0, \sigma^2_{u(3)}), \quad e_i \sim N(0, \sigma^2_e).

Here w_{i,j} is the weight given to secondary school j for pupil i, and we assume that \sum_{j \in \text{second}(i)} w_{i,j} = 1 \; \forall i. Both the RG algorithm and the MCMC method can be used to fit these models that combine multiple membership and cross-classification.

4.3 AN EXAMPLE ANALYSIS: DANISH POULTRY FARMING

Rasbash and Browne, 2001 consider an example from veterinary epidemiology concerning outbreaks of salmonella typhimurium in flocks of chickens in poultry farms in Denmark between 1995 and 1997. The response of interest is whether salmonella typhimurium is present in a flock; in the data collected, 6.3% of flocks had the disease. Each observation represents a flock of chickens, and for each flock the response variable is whether or not there was an instance of salmonella in that flock. The basic data have a simple hierarchical structure, as each flock is kept in a house on a farm until slaughter. As flocks live for a short time before they are slaughtered, several flocks will stay in the same house each year. The hierarchy consists of 10,127 child flocks within 725 houses on 304 farms. Each child flock is created from a mixture of parent flocks (up to 6), of which there are 200 in Denmark, and so we have a crossing between the child flock hierarchy and the multiple membership parent flock classification. The classification diagram can be seen in figure 1.12. We also know the exact makeup of each child flock (in terms of parent flocks) and so can use these proportions as weights for the parent flocks. We are interested in assessing how much of the variability in salmonella incidence can be attributed to houses, farms and parent flocks. There are also 4 hatcheries in which all the eggs from the parent flocks are hatched.
We will therefore fit a variance components model that allows for different average rates of salmonella for each year, with hatchery included in the fixed part, as follows:

\text{salmonella}_i \sim \text{Bernoulli}(\pi_i)
\text{logit}(\pi_i) = \beta_0 + \text{Y96}_i\beta_1 + \text{Y97}_i\beta_2 + \text{hatch2}_i\beta_3 + \text{hatch3}_i\beta_4 + \text{hatch4}_i\beta_5 + u^{(2)}_{\text{House}(i)} + u^{(3)}_{\text{Farm}(i)} + \sum_{j \in \text{p.flock}(i)} w^{(4)}_{i,j} u^{(4)}_j
u^{(2)}_{\text{House}(i)} \sim N(0, \sigma^2_{u(2)}), \quad u^{(3)}_{\text{Farm}(i)} \sim N(0, \sigma^2_{u(3)}), \quad u^{(4)}_j \sim N(0, \sigma^2_{u(4)})    (1.2)
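To make the structure of model (1.2) concrete, the linear predictor for a single child flock can be evaluated as below. This is an illustrative sketch only: the function name `mmmc_logit` and all numerical values are invented, and only the form of the model follows the text.

```python
import math

def mmmc_logit(beta, x, u_house, u_farm, u_parent, parent_weights):
    """Linear predictor and probability for one child flock: the fixed
    part x . beta, plus its house and farm effects, plus a weighted sum
    of the effects of the parent flocks it was created from."""
    eta = sum(b * xv for b, xv in zip(beta, x))
    eta += u_house + u_farm
    eta += sum(w * u for u, w in zip(u_parent, parent_weights))
    return eta, 1.0 / (1.0 + math.exp(-eta))

# A flock hatched in 1996 in hatchery 3, created 70/30 from two
# parent flocks (all numbers hypothetical):
beta = [-3.0, 0.5, 0.2, 0.1, -0.4, 0.3]   # intercept, Y96, Y97, hatch2-4
x = [1, 1, 0, 0, 1, 0]
eta, p = mmmc_logit(beta, x, u_house=0.2, u_farm=-0.1,
                    u_parent=[0.6, -0.2], parent_weights=[0.7, 0.3])
```

Here the composition proportions of the child flock serve directly as the multiple membership weights, as described in the text.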

Figure 1.12 Classification diagram (nodes: Farm, House, Parent Flock, Child Flock) for the Danish poultry model.

The results of fitting model 1.2, using both the Rasbash and Goldstein method with 1st order MQL estimation and the MCMC method, can be seen in table 1.9. The quasi-likelihood methods are numerically rather unstable, and we could not get either 2nd order MQL or PQL to fit this model. We can see that there are large effects for the year the chickens were born, suggesting that salmonella was more prevalent in 1995 than in the other years. The hatchery effects were also large, suggesting that chickens produced in hatcheries 1 and 3 had a higher incidence of salmonella. There is large variability in the parent flock effects and in the farm effects, which are of similar magnitude; there is less variability between houses within farms.

4.3.1 Method comparison. The MCMC results were run for 50,000 iterations after a burn-in of 20,000 (this took just under 2 hours on a 733MHz PC); as we used arbitrary starting values, the chain took a while to converge. From table 1.9 we can see reasonable agreement between the two methods, although the fixed effects in MQL are all smaller, as is the farm level variance. This behaviour was shown in simulations on a nested 3-level binary response data structure in Rodriguez


MLMED. User Guide. Nicholas J. Rockwood The Ohio State University Beta Version May, 2017

MLMED. User Guide. Nicholas J. Rockwood The Ohio State University Beta Version May, 2017 MLMED User Guide Nicholas J. Rockwood The Ohio State University rockwood.19@osu.edu Beta Version May, 2017 MLmed is a computational macro for SPSS that simplifies the fitting of multilevel mediation and

More information

Bayesian non-parametric model to longitudinally predict churn

Bayesian non-parametric model to longitudinally predict churn Bayesian non-parametric model to longitudinally predict churn Bruno Scarpa Università di Padova Conference of European Statistics Stakeholders Methodologists, Producers and Users of European Statistics

More information

H-LIKELIHOOD ESTIMATION METHOOD FOR VARYING CLUSTERED BINARY MIXED EFFECTS MODEL

H-LIKELIHOOD ESTIMATION METHOOD FOR VARYING CLUSTERED BINARY MIXED EFFECTS MODEL H-LIKELIHOOD ESTIMATION METHOOD FOR VARYING CLUSTERED BINARY MIXED EFFECTS MODEL Intesar N. El-Saeiti Department of Statistics, Faculty of Science, University of Bengahzi-Libya. entesar.el-saeiti@uob.edu.ly

More information

STA 216, GLM, Lecture 16. October 29, 2007

STA 216, GLM, Lecture 16. October 29, 2007 STA 216, GLM, Lecture 16 October 29, 2007 Efficient Posterior Computation in Factor Models Underlying Normal Models Generalized Latent Trait Models Formulation Genetic Epidemiology Illustration Structural

More information

Gibbs Sampling in Latent Variable Models #1

Gibbs Sampling in Latent Variable Models #1 Gibbs Sampling in Latent Variable Models #1 Econ 690 Purdue University Outline 1 Data augmentation 2 Probit Model Probit Application A Panel Probit Panel Probit 3 The Tobit Model Example: Female Labor

More information

A graph contains a set of nodes (vertices) connected by links (edges or arcs)

A graph contains a set of nodes (vertices) connected by links (edges or arcs) BOLTZMANN MACHINES Generative Models Graphical Models A graph contains a set of nodes (vertices) connected by links (edges or arcs) In a probabilistic graphical model, each node represents a random variable,

More information

Prediction of ordinal outcomes when the association between predictors and outcome diers between outcome levels

Prediction of ordinal outcomes when the association between predictors and outcome diers between outcome levels STATISTICS IN MEDICINE Statist. Med. 2005; 24:1357 1369 Published online 26 November 2004 in Wiley InterScience (www.interscience.wiley.com). DOI: 10.1002/sim.2009 Prediction of ordinal outcomes when the

More information

Lecture 13: Data Modelling and Distributions. Intelligent Data Analysis and Probabilistic Inference Lecture 13 Slide No 1

Lecture 13: Data Modelling and Distributions. Intelligent Data Analysis and Probabilistic Inference Lecture 13 Slide No 1 Lecture 13: Data Modelling and Distributions Intelligent Data Analysis and Probabilistic Inference Lecture 13 Slide No 1 Why data distributions? It is a well established fact that many naturally occurring

More information

Business Statistics. Tommaso Proietti. Linear Regression. DEF - Università di Roma 'Tor Vergata'

Business Statistics. Tommaso Proietti. Linear Regression. DEF - Università di Roma 'Tor Vergata' Business Statistics Tommaso Proietti DEF - Università di Roma 'Tor Vergata' Linear Regression Specication Let Y be a univariate quantitative response variable. We model Y as follows: Y = f(x) + ε where

More information

STA414/2104. Lecture 11: Gaussian Processes. Department of Statistics

STA414/2104. Lecture 11: Gaussian Processes. Department of Statistics STA414/2104 Lecture 11: Gaussian Processes Department of Statistics www.utstat.utoronto.ca Delivered by Mark Ebden with thanks to Russ Salakhutdinov Outline Gaussian Processes Exam review Course evaluations

More information

Bagging During Markov Chain Monte Carlo for Smoother Predictions

Bagging During Markov Chain Monte Carlo for Smoother Predictions Bagging During Markov Chain Monte Carlo for Smoother Predictions Herbert K. H. Lee University of California, Santa Cruz Abstract: Making good predictions from noisy data is a challenging problem. Methods

More information

Bayesian Methods in Multilevel Regression

Bayesian Methods in Multilevel Regression Bayesian Methods in Multilevel Regression Joop Hox MuLOG, 15 september 2000 mcmc What is Statistics?! Statistics is about uncertainty To err is human, to forgive divine, but to include errors in your design

More information

The Basic Two-Level Regression Model

The Basic Two-Level Regression Model 7 Manuscript version, chapter in J.J. Hox, M. Moerbeek & R. van de Schoot (018). Multilevel Analysis. Techniques and Applications. New York, NY: Routledge. The Basic Two-Level Regression Model Summary.

More information

Stat 5101 Lecture Notes

Stat 5101 Lecture Notes Stat 5101 Lecture Notes Charles J. Geyer Copyright 1998, 1999, 2000, 2001 by Charles J. Geyer May 7, 2001 ii Stat 5101 (Geyer) Course Notes Contents 1 Random Variables and Change of Variables 1 1.1 Random

More information

analysis of incomplete data in statistical surveys

analysis of incomplete data in statistical surveys analysis of incomplete data in statistical surveys Ugo Guarnera 1 1 Italian National Institute of Statistics, Italy guarnera@istat.it Jordan Twinning: Imputation - Amman, 6-13 Dec 2014 outline 1 origin

More information

Lecture 12: Effect modification, and confounding in logistic regression

Lecture 12: Effect modification, and confounding in logistic regression Lecture 12: Effect modification, and confounding in logistic regression Ani Manichaikul amanicha@jhsph.edu 4 May 2007 Today Categorical predictor create dummy variables just like for linear regression

More information

Latent Variable Models for Binary Data. Suppose that for a given vector of explanatory variables x, the latent

Latent Variable Models for Binary Data. Suppose that for a given vector of explanatory variables x, the latent Latent Variable Models for Binary Data Suppose that for a given vector of explanatory variables x, the latent variable, U, has a continuous cumulative distribution function F (u; x) and that the binary

More information

Panel Data. March 2, () Applied Economoetrics: Topic 6 March 2, / 43

Panel Data. March 2, () Applied Economoetrics: Topic 6 March 2, / 43 Panel Data March 2, 212 () Applied Economoetrics: Topic March 2, 212 1 / 43 Overview Many economic applications involve panel data. Panel data has both cross-sectional and time series aspects. Regression

More information

Bayesian Linear Regression

Bayesian Linear Regression Bayesian Linear Regression Sudipto Banerjee 1 Biostatistics, School of Public Health, University of Minnesota, Minneapolis, Minnesota, U.S.A. September 15, 2010 1 Linear regression models: a Bayesian perspective

More information

The Bias-Variance dilemma of the Monte Carlo. method. Technion - Israel Institute of Technology, Technion City, Haifa 32000, Israel

The Bias-Variance dilemma of the Monte Carlo. method. Technion - Israel Institute of Technology, Technion City, Haifa 32000, Israel The Bias-Variance dilemma of the Monte Carlo method Zlochin Mark 1 and Yoram Baram 1 Technion - Israel Institute of Technology, Technion City, Haifa 32000, Israel fzmark,baramg@cs.technion.ac.il Abstract.

More information

Multilevel Modeling: When and Why 1. 1 Why multilevel data need multilevel models

Multilevel Modeling: When and Why 1. 1 Why multilevel data need multilevel models Multilevel Modeling: When and Why 1 J. Hox University of Amsterdam & Utrecht University Amsterdam/Utrecht, the Netherlands Abstract: Multilevel models have become popular for the analysis of a variety

More information

Bayesian Methods for Machine Learning

Bayesian Methods for Machine Learning Bayesian Methods for Machine Learning CS 584: Big Data Analytics Material adapted from Radford Neal s tutorial (http://ftp.cs.utoronto.ca/pub/radford/bayes-tut.pdf), Zoubin Ghahramni (http://hunch.net/~coms-4771/zoubin_ghahramani_bayesian_learning.pdf),

More information

Implementing componentwise Hastings algorithms

Implementing componentwise Hastings algorithms Computational Statistics & Data Analysis 48 (2005) 363 389 www.elsevier.com/locate/csda Implementing componentwise Hastings algorithms Richard A. Levine a;, Zhaoxia Yu b, William G. Hanley c, John J. Nitao

More information

ABC random forest for parameter estimation. Jean-Michel Marin

ABC random forest for parameter estimation. Jean-Michel Marin ABC random forest for parameter estimation Jean-Michel Marin Université de Montpellier Institut Montpelliérain Alexander Grothendieck (IMAG) Institut de Biologie Computationnelle (IBC) Labex Numev! joint

More information

Online Appendix to: Marijuana on Main Street? Estimating Demand in Markets with Limited Access

Online Appendix to: Marijuana on Main Street? Estimating Demand in Markets with Limited Access Online Appendix to: Marijuana on Main Street? Estating Demand in Markets with Lited Access By Liana Jacobi and Michelle Sovinsky This appendix provides details on the estation methodology for various speci

More information

bound on the likelihood through the use of a simpler variational approximating distribution. A lower bound is particularly useful since maximization o

bound on the likelihood through the use of a simpler variational approximating distribution. A lower bound is particularly useful since maximization o Category: Algorithms and Architectures. Address correspondence to rst author. Preferred Presentation: oral. Variational Belief Networks for Approximate Inference Wim Wiegerinck David Barber Stichting Neurale

More information

Lab 3: Two levels Poisson models (taken from Multilevel and Longitudinal Modeling Using Stata, p )

Lab 3: Two levels Poisson models (taken from Multilevel and Longitudinal Modeling Using Stata, p ) Lab 3: Two levels Poisson models (taken from Multilevel and Longitudinal Modeling Using Stata, p. 376-390) BIO656 2009 Goal: To see if a major health-care reform which took place in 1997 in Germany was

More information

Contents. Part I: Fundamentals of Bayesian Inference 1

Contents. Part I: Fundamentals of Bayesian Inference 1 Contents Preface xiii Part I: Fundamentals of Bayesian Inference 1 1 Probability and inference 3 1.1 The three steps of Bayesian data analysis 3 1.2 General notation for statistical inference 4 1.3 Bayesian

More information

ROBUSTNESS OF MULTILEVEL PARAMETER ESTIMATES AGAINST SMALL SAMPLE SIZES

ROBUSTNESS OF MULTILEVEL PARAMETER ESTIMATES AGAINST SMALL SAMPLE SIZES ROBUSTNESS OF MULTILEVEL PARAMETER ESTIMATES AGAINST SMALL SAMPLE SIZES Cora J.M. Maas 1 Utrecht University, The Netherlands Joop J. Hox Utrecht University, The Netherlands In social sciences, research

More information

Exploiting TIMSS and PIRLS combined data: multivariate multilevel modelling of student achievement

Exploiting TIMSS and PIRLS combined data: multivariate multilevel modelling of student achievement Exploiting TIMSS and PIRLS combined data: multivariate multilevel modelling of student achievement Second meeting of the FIRB 2012 project Mixture and latent variable models for causal-inference and analysis

More information

Marginal Specifications and a Gaussian Copula Estimation

Marginal Specifications and a Gaussian Copula Estimation Marginal Specifications and a Gaussian Copula Estimation Kazim Azam Abstract Multivariate analysis involving random variables of different type like count, continuous or mixture of both is frequently required

More information

ML estimation: Random-intercepts logistic model. and z

ML estimation: Random-intercepts logistic model. and z ML estimation: Random-intercepts logistic model log p ij 1 p = x ijβ + υ i with υ i N(0, συ) 2 ij Standardizing the random effect, θ i = υ i /σ υ, yields log p ij 1 p = x ij β + σ υθ i with θ i N(0, 1)

More information

8 Nominal and Ordinal Logistic Regression

8 Nominal and Ordinal Logistic Regression 8 Nominal and Ordinal Logistic Regression 8.1 Introduction If the response variable is categorical, with more then two categories, then there are two options for generalized linear models. One relies on

More information

Estimating a Piecewise Growth Model with Longitudinal Data that Contains Individual Mobility across Clusters

Estimating a Piecewise Growth Model with Longitudinal Data that Contains Individual Mobility across Clusters Estimating a Piecewise Growth Model with Longitudinal Data that Contains Individual Mobility across Clusters Audrey J. Leroux Georgia State University Piecewise Growth Model (PGM) PGMs are beneficial for

More information

var D (B) = var(b? E D (B)) = var(b)? cov(b; D)(var(D))?1 cov(d; B) (2) Stone [14], and Hartigan [9] are among the rst to discuss the role of such ass

var D (B) = var(b? E D (B)) = var(b)? cov(b; D)(var(D))?1 cov(d; B) (2) Stone [14], and Hartigan [9] are among the rst to discuss the role of such ass BAYES LINEAR ANALYSIS [This article appears in the Encyclopaedia of Statistical Sciences, Update volume 3, 1998, Wiley.] The Bayes linear approach is concerned with problems in which we want to combine

More information

WU Weiterbildung. Linear Mixed Models

WU Weiterbildung. Linear Mixed Models Linear Mixed Effects Models WU Weiterbildung SLIDE 1 Outline 1 Estimation: ML vs. REML 2 Special Models On Two Levels Mixed ANOVA Or Random ANOVA Random Intercept Model Random Coefficients Model Intercept-and-Slopes-as-Outcomes

More information

Multilevel Analysis, with Extensions

Multilevel Analysis, with Extensions May 26, 2010 We start by reviewing the research on multilevel analysis that has been done in psychometrics and educational statistics, roughly since 1985. The canonical reference (at least I hope so) is

More information

Notes on Discriminant Functions and Optimal Classification

Notes on Discriminant Functions and Optimal Classification Notes on Discriminant Functions and Optimal Classification Padhraic Smyth, Department of Computer Science University of California, Irvine c 2017 1 Discriminant Functions Consider a classification problem

More information

Bayesian linear regression

Bayesian linear regression Bayesian linear regression Linear regression is the basis of most statistical modeling. The model is Y i = X T i β + ε i, where Y i is the continuous response X i = (X i1,..., X ip ) T is the corresponding

More information

0 o 1 i B C D 0/1 0/ /1

0 o 1 i B C D 0/1 0/ /1 A Comparison of Dominance Mechanisms and Simple Mutation on Non-Stationary Problems Jonathan Lewis,? Emma Hart, Graeme Ritchie Department of Articial Intelligence, University of Edinburgh, Edinburgh EH

More information

Default Priors and Effcient Posterior Computation in Bayesian

Default Priors and Effcient Posterior Computation in Bayesian Default Priors and Effcient Posterior Computation in Bayesian Factor Analysis January 16, 2010 Presented by Eric Wang, Duke University Background and Motivation A Brief Review of Parameter Expansion Literature

More information

Bayesian methods for missing data: part 1. Key Concepts. Nicky Best and Alexina Mason. Imperial College London

Bayesian methods for missing data: part 1. Key Concepts. Nicky Best and Alexina Mason. Imperial College London Bayesian methods for missing data: part 1 Key Concepts Nicky Best and Alexina Mason Imperial College London BAYES 2013, May 21-23, Erasmus University Rotterdam Missing Data: Part 1 BAYES2013 1 / 68 Outline

More information

Sparse Linear Models (10/7/13)

Sparse Linear Models (10/7/13) STA56: Probabilistic machine learning Sparse Linear Models (0/7/) Lecturer: Barbara Engelhardt Scribes: Jiaji Huang, Xin Jiang, Albert Oh Sparsity Sparsity has been a hot topic in statistics and machine

More information

Local Likelihood Bayesian Cluster Modeling for small area health data. Andrew Lawson Arnold School of Public Health University of South Carolina

Local Likelihood Bayesian Cluster Modeling for small area health data. Andrew Lawson Arnold School of Public Health University of South Carolina Local Likelihood Bayesian Cluster Modeling for small area health data Andrew Lawson Arnold School of Public Health University of South Carolina Local Likelihood Bayesian Cluster Modelling for Small Area

More information

ECLT 5810 Linear Regression and Logistic Regression for Classification. Prof. Wai Lam

ECLT 5810 Linear Regression and Logistic Regression for Classification. Prof. Wai Lam ECLT 5810 Linear Regression and Logistic Regression for Classification Prof. Wai Lam Linear Regression Models Least Squares Input vectors is an attribute / feature / predictor (independent variable) The

More information

Bayes Estimation in Meta-analysis using a linear model theorem

Bayes Estimation in Meta-analysis using a linear model theorem University of Wollongong Research Online Applied Statistics Education and Research Collaboration (ASEARC) - Conference Papers Faculty of Engineering and Information Sciences 2012 Bayes Estimation in Meta-analysis

More information

Ronald Christensen. University of New Mexico. Albuquerque, New Mexico. Wesley Johnson. University of California, Irvine. Irvine, California

Ronald Christensen. University of New Mexico. Albuquerque, New Mexico. Wesley Johnson. University of California, Irvine. Irvine, California Texts in Statistical Science Bayesian Ideas and Data Analysis An Introduction for Scientists and Statisticians Ronald Christensen University of New Mexico Albuquerque, New Mexico Wesley Johnson University

More information

Computational statistics

Computational statistics Computational statistics Markov Chain Monte Carlo methods Thierry Denœux March 2017 Thierry Denœux Computational statistics March 2017 1 / 71 Contents of this chapter When a target density f can be evaluated

More information

Basics of Modern Missing Data Analysis

Basics of Modern Missing Data Analysis Basics of Modern Missing Data Analysis Kyle M. Lang Center for Research Methods and Data Analysis University of Kansas March 8, 2013 Topics to be Covered An introduction to the missing data problem Missing

More information

Logistic Regression: Regression with a Binary Dependent Variable

Logistic Regression: Regression with a Binary Dependent Variable Logistic Regression: Regression with a Binary Dependent Variable LEARNING OBJECTIVES Upon completing this chapter, you should be able to do the following: State the circumstances under which logistic regression

More information

Hierarchical Generalized Linear Models. ERSH 8990 REMS Seminar on HLM Last Lecture!

Hierarchical Generalized Linear Models. ERSH 8990 REMS Seminar on HLM Last Lecture! Hierarchical Generalized Linear Models ERSH 8990 REMS Seminar on HLM Last Lecture! Hierarchical Generalized Linear Models Introduction to generalized models Models for binary outcomes Interpreting parameter

More information

Metropolis-Hastings Algorithm

Metropolis-Hastings Algorithm Strength of the Gibbs sampler Metropolis-Hastings Algorithm Easy algorithm to think about. Exploits the factorization properties of the joint probability distribution. No difficult choices to be made to

More information

2E Pattern Recognition Solutions to Introduction to Pattern Recognition, Chapter 2 a: Conditional Probability and Bayes Rule

2E Pattern Recognition Solutions to Introduction to Pattern Recognition, Chapter 2 a: Conditional Probability and Bayes Rule 2E1395 - Pattern Recognition Solutions to Introduction to Pattern Recognition, Chapter 2 a: Conditional Probability and Bayes Rule Exercise 2A1 We can call X the observation (X i indicates that the program

More information

Department of. Computer Science. Functional Implementations of. Eigensolver. December 15, Colorado State University

Department of. Computer Science. Functional Implementations of. Eigensolver. December 15, Colorado State University Department of Computer Science Analysis of Non-Strict Functional Implementations of the Dongarra-Sorensen Eigensolver S. Sur and W. Bohm Technical Report CS-9- December, 99 Colorado State University Analysis

More information

Linear Regression. S. Sumitra

Linear Regression. S. Sumitra Linear Regression S Sumitra Notations: x i : ith data point; x T : transpose of x; x ij : ith data point s jth attribute Let {(x 1, y 1 ), (x, y )(x N, y N )} be the given data, x i D and y i Y Here D

More information

Managing Uncertainty

Managing Uncertainty Managing Uncertainty Bayesian Linear Regression and Kalman Filter December 4, 2017 Objectives The goal of this lab is multiple: 1. First it is a reminder of some central elementary notions of Bayesian

More information