Linear Mixed Effects Models WU Weiterbildung SLIDE 1
Outline: 1 Estimation: ML vs. REML; 2 Special Models On Two Levels (Mixed/Random ANOVA, Random Intercept Model, Random Coefficients Model, Intercept-and-Slopes-as-Outcomes Model, Hierarchical Linear Model); 3 Variance Structures; 4 Testing Hypotheses for LMM; 5 Predicting Values With LMM (Best Linear Unbiased Estimators, BLUE; Best Linear Unbiased Predictors, BLUP); 6 Centering; 7 Generalisations SLIDE 2
What Are Mixed Effect Models? Models with mixed effects, i.e. fixed and random effects Very similar mixed effects models are known under many different names Mixed (effect) models (statistics esp. biostatistics) Multilevel models (sociology) Hierarchical models (psychology) Random coefficient models (econometrics)... SLIDE 3
Types Of Mixed Effect Models We can distinguish between: Linear mixed effect models (LMM), linear models for metric variables with normal error structure (t-test, ANOVA, regression, ANCOVA, ...); Generalized linear mixed effect models (GLMM), linear models for expected values of exponential families (linear models, logistic models, Poisson models, gamma models, ...); Nonlinear mixed effect models, nonlinear models with nonlinear influence of the random effects (GLMM, growth curve models). We will only look at the first class of models. SLIDE 4
Linear Mixed Effect Models We will consider models with the following properties Normal error structure (i.e. the errors are normally distributed) The dependent variable is a linear combination of the explanatory variables The explanatory variables can have fixed or random effects or both More than one source of variation is present They are for data that need not be i.i.d. sampled from the same population SLIDE 5
Types Of Effects - I Fixed effects are the classic type of effects They influence the mean of the dependent variable They are usually of interest, hence inference is done for them They are controlled All levels or realisations that are of interest are included Usually the substantive theory refers to them Small number of levels for factors All levels/realisations under consideration are represented in the coding Examples: Experimental conditions (treatment/control), causal variables (education on income), control factors (age), uncontrollable conditions in quasi-experimental designs (gender),... SLIDE 6
Types Of Effects - II Random effects are the classy type of effects They influence the covariance structure of the dependent variable They are effects associated with the sampling We have only a random sample of all possible levels/realisations (in our coding) They influence how accurate our fixed effect model predictions are We are usually interested in their variance, not their actual values Usually a large number of possible levels/realisations Examples: Error, experimental conditions that are not fully realised (all chairs of a company), confounding variables, mediators, individuality of subjects,... Whether an effect is random or fixed is determined by the study design! SLIDE 7
Nested Random Effects In Models (Hierarchically) Nested random effects arise if we have some type of clustering structure, either by design or because of our interest. The idea is that observations within a cluster are not independent Each random effect level is some kind of subpopulation Independence holds between clusters and their nested structures (i.e. subjects, realisations or other clusters) Can be multilevel, i.e. a cluster in a cluster in a cluster Very often the case in social science Can correct sampling errors to a certain degree Should be the method of choice for quasi-experiments We will mostly focus on this type of random effects. SLIDE 8
Crossed Random Effects In Models Crossed random effects (non-hierarchically nested) arise if there is no hierarchically nested structure, either by design or because of our interest. We are interested in a fixed effect but want to correct for possible sources of bias (or random fluctuations) Subjects belong to more than one factor that are (factorially) crossed If nesting is not strict but dynamic, crossed effects also arise (cross-over designs for example) Subjects may be cross-classified (partially crossed) Rarely the case in social science but potentially very useful SLIDE 9
Types Of Grouped Data Some examples of natural hierarchical data Longitudinal data Multicenter data Meta analysis Questionnaires Representative surveys data Block designs Panels Matched pairs SLIDE 10
Conceptual Differences LMM Vs. LM Classic Linear Models They have one source of randomness, the error The error structure is usually specified as independent identically distributed Sampling has to be totally random for all subjects Mixed Models They have more than one source of randomness, the random effects (including error) The errors no longer need to be independent identically distributed Sampling can be less restrictive They can account for heteroscedasticity SLIDE 11
Model Formulation - LM In matrix notation, the linear model is y = Xβ + ɛ, with ɛ ∼ N(0, Σ). Here y is the dependent variable vector, X is the n × p design matrix, ɛ the vector of errors and β the fixed effect parameter vector of interest. Often, Σ is assumed to be σ²I with I being the n × n identity matrix. The distribution of y is then normal, y ∼ N(Xβ, Σ). SLIDE 12
Model Formulation - LMM In matrix notation, the linear mixed model is y = Xβ + Zu + ɛ, with ɛ ∼ N(0, Σ) and u ∼ N(0, Ω). The notation is as before; Z is an n × q design matrix for the random effects, u is the vector of random effect parameters and Ω is the variance-covariance matrix of the random effects (in which we are primarily interested). The random vectors u and ɛ are independent. The distribution of y is then y ∼ N(Xβ, V) with V = ZΩZ′ + Σ. SLIDE 13
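The marginal covariance V = ZΩZ′ + Σ can be worked out by hand for a tiny case. The sketch below (plain Python, hypothetical variance values) builds V for a random-intercept model with two clusters of two observations each:

```python
# Sketch: marginal covariance V = Z Ω Z' + Σ for a random-intercept model
# with 2 clusters of 2 observations each (hypothetical variance values).
omega = 4.0   # random-intercept variance ω
sigma2 = 1.0  # error variance σ²

# Z maps each observation to its cluster's random intercept (q = 2 clusters).
Z = [[1, 0],
     [1, 0],
     [0, 1],
     [0, 1]]
Omega = [[omega, 0.0], [0.0, omega]]   # independent cluster effects
n = len(Z)

# V[i][j] = sum_k sum_l Z[i][k] * Omega[k][l] * Z[j][l] + (σ² if i == j else 0)
V = [[sum(Z[i][k] * Omega[k][l] * Z[j][l] for k in range(2) for l in range(2))
      + (sigma2 if i == j else 0.0)
      for j in range(n)] for i in range(n)]

for row in V:
    print(row)
# Observations in the same cluster share covariance ω = 4, the diagonal is
# ω + σ² = 5, and observations from different clusters are uncorrelated.
```

This makes visible how the random effects induce correlation between observations of the same cluster while the fixed effects only shift the mean.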
Software For LMM - General SPSS offers the MIXED procedure to fit models as described above. In R we have a number of packages that can calculate these models, but lme4 or nlme are recommended. These procedures can Fit variance components, random coefficients, slopes-as-outcomes etc. models They need data to be in the long format (see below) Additional variables code the nesting structure SLIDE 14
Software For LMM - Data SPSS or R need a certain data structure. The data restructure wizard can do that for us in SPSS, in R the package reshape helps. Let us look at the didactic examples: Example 1 (firms): Growth of firm size measured at 9 occasions (longitudinal data) Example 2 (groceries): Amount spent on grocery items for different retailers (clustered data) Example 3 (testmarket): Coupon campaign of a retailer (multilevel clusters and longitudinal) Example 4 (bdf): Data on language scores of pupils (multilevel clusters and cross-section/longitudinal structure with many explanatory variables) Please try to restructure firmwide from wide to long format. You may want to look at groceries to see how that should look. SLIDE 15
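The wide-to-long restructuring asked for in the exercise can be illustrated in plain Python. The data below are hypothetical (three occasions instead of the nine in firmwide); the point is only the shape of the result, one row per firm-occasion pair:

```python
# Hedged sketch of a wide -> long restructuring with hypothetical firm data
# (the real firmwide file has 9 measurement occasions, not 3).
wide = [
    {"firm": 1, "emp_1": 140, "emp_2": 143, "emp_3": 147},
    {"firm": 2, "emp_1": 155, "emp_2": 152, "emp_3": 158},
]

# Long format: one row per firm-occasion pair, with an extra variable
# coding the occasion (this codes the nesting structure).
long = [
    {"firm": row["firm"], "occasion": t, "employees": row[f"emp_{t}"]}
    for row in wide
    for t in (1, 2, 3)
]

for rec in long:
    print(rec)
```

In practice the same reshaping is done by the SPSS restructure wizard or the R reshape package mentioned above; the long table is what the MIXED/lme4 procedures expect.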
First Examples - Grocery Sales The groceries data contain the amounts spent on groceries in different stores. 60 stores, 351 customers Various explanatory variables Let us calculate an overall mean model and an ANOVA with/without random effect. SLIDE 16
First Examples - Grocery Sales [Figure: boxplots of amount spent (0 to 800) by store ID for all 60 stores; a few outlying customers are labelled] SLIDE 17
First Examples - Employees In Firms Let us look at a regression model of the number of employees explained by the revenue for the firms data. Measured at 9 occasions Not necessarily measured at the same time/revenue level Revenue is centered Let us calculate a regression of employees on revenue with/without a random effect over the firms. SLIDE 18
Firm Employees - LM [Figure: scatterplot of employees (roughly 130 to 170) against centered revenue (-1.0 to 1.0), pooled over all firms] SLIDE 19
Firm Employees - LM [Figure: employees plotted against centered revenue for the pooled data] SLIDE 20
Firm Employees - LMM [Figure: trellis plot of employees against centered revenue with one panel per firm (firms 1 to 26)] SLIDE 21
Estimation One can estimate LMM parameters (fixed effects, error, variances and covariances) in many different ways, either frequentist or Bayesian. Frequentist approaches Full Information Maximum Likelihood Restricted Maximum Likelihood Least Squares (ordinary or weighted) Bayesian MCMC approaches The two ML procedures are most popular. SLIDE 22
Estimation - FIML Full information maximum likelihood uses the whole information of the joint likelihood of β, Ω and Σ and includes a penalty term for the magnitude of the random effects; it is a shrinkage estimator. The problem is that those estimates are biased, since the estimates for the fixed effects have to be plugged in to calculate the variance components and vice versa. This reduces the degrees of freedom available, but that loss is not accounted for. Intuitively: the FIML fixed effects are WLS estimates with the FIML variance components used for weighting. SLIDE 23
Estimation - REML This approach restricts the likelihood to error contrasts and maximizes that restricted (residual) likelihood, a function of Ω and Σ. This takes the loss of degrees of freedom into account and has therefore smaller bias. It can be shown that it is the FIML with an additional bias correction. The problem is that the fixed effects no longer occur in the restricted likelihood and model comparison with likelihood ratio tests can only be done if two models have the same fixed effect specification. Intuitively: The REML fixed effects are WLS estimates with the REML variance components used for weighting. SLIDE 24
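The bias that REML corrects is already visible in the simplest "mixed" model of all, a grand mean plus error: ML estimates σ² with denominator n, REML with n − 1, accounting for the degree of freedom used up by estimating the mean. A sketch in plain Python with simulated data:

```python
import random
import statistics

# Simplest case: y_i = µ + ɛ_i. ML divides the sum of squares by n (biased
# downward because µ had to be estimated first); REML divides by n - 1.
random.seed(1)
n = 10
sample = [random.gauss(0, 2) for _ in range(n)]
mean = sum(sample) / n
ss = sum((x - mean) ** 2 for x in sample)

var_ml = ss / n          # FIML-style estimate: ignores the lost df
var_reml = ss / (n - 1)  # REML-style estimate: bias-corrected

print(var_ml < var_reml)   # True: ML shrinks the variance estimate
# The stdlib sample variance uses the n - 1 denominator, matching REML here.
print(abs(var_reml - statistics.variance(sample)) < 1e-9)
```

In a full LMM the correction is less explicit (it comes from restricting the likelihood to error contrasts), but the direction of the effect is the same.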
ANOVA With Random Effect - I This is an ANOVA for which there is more than one random effect, the within-group error and the between-group error. For groups j = 1, ..., J the model is y_ij = µ + u_j + ɛ_ij. It can be easier to specify such models on each level separately: Level 1: y_ij = β_j + ɛ_ij; Level 2: β_j = µ + u_j. This model is very useful for calculating the amount of variance that can be explained by higher level covariates (via the identity Var(y_ij) = Var(u_j + ɛ_ij) = ω + σ²) and the intraclass correlation as ω/(ω + σ²). SLIDE 25
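For balanced data the variance components and the intraclass correlation can be estimated from the classic ANOVA mean squares (σ̂² = MSW, ω̂ = (MSB − MSW)/n). A simulation sketch in plain Python, with hypothetical true values ω = 3 and σ² = 1 so the true ICC is 0.75:

```python
import random

# Simulated one-way random-effects data: J groups, n observations each.
random.seed(42)
J, n = 30, 8
omega, sigma2 = 3.0, 1.0   # hypothetical true variance components

groups = []
for _ in range(J):
    u = random.gauss(0, omega ** 0.5)          # group effect u_j
    groups.append([10 + u + random.gauss(0, sigma2 ** 0.5) for _ in range(n)])

grand = sum(sum(g) for g in groups) / (J * n)
means = [sum(g) / n for g in groups]
msb = n * sum((m - grand) ** 2 for m in means) / (J - 1)          # between
msw = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g) / (J * (n - 1))

sigma2_hat = msw
omega_hat = max((msb - msw) / n, 0.0)   # truncate: a variance cannot be negative
icc = omega_hat / (omega_hat + sigma2_hat)
print(round(icc, 3))   # typically lands near the true ω/(ω + σ²) = 0.75
```

The same quantities drop out of the fitted mixed model in SPSS or R; the moment estimators above are only the hand-computable special case for balanced data.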
ANOVA With Random Effect - II We already calculated an example, the mean amount spent on groceries. Let us also calculate the intraclass correlation. SLIDE 26
Random Intercept Model - I This model adds a metric predictor at the first level. Level 1: y_ij = β_0j + β_1j X_ij + ɛ_ij. That predictor's coefficient is fixed over all groups. Level 2: β_0j = µ + u_j, β_1j = γ_10. These models are very popular, especially for panel data. SLIDE 27
Random Intercept Model - II We already calculated an example, the number of employees in firms. SLIDE 28
Random Coefficients Model - I This model generalizes the idea of groupwise random variation to all coefficients. Level 1: y_ij = β_0j + β_1j X_ij + ɛ_ij. This model specifies the metric predictor's coefficient as randomly varying over groups too. Level 2: β_0j = µ + u_0j, β_1j = γ_10 + u_1j SLIDE 29
Random Coefficients Model - II Let us fit a random coefficient model for the firm data. SLIDE 30
Intercept-and-Slopes-as-Outcomes - I This is a very flexible generalisation. The level 1 model remains as before. Level 1: y_ij = β_0j + β_1j X_ij + ɛ_ij. On the second level, however, we explain the random variation in the coefficients by second level covariates. Level 2: β_0j = γ_00 + γ_01 W_j + u_0j, β_1j = γ_10 + γ_11 W_j + u_1j SLIDE 31
Intercept-and-Slopes-as-Outcomes - II A company ran a promotion campaign over a number of weeks at different stores. At level 1 we have repeated sales and an explanatory factor. At level 2 we have the store location and the store location age as an explanatory variable. Let us calculate an intercept and slopes as outcomes model for the testmarket data. SLIDE 32
Multilevel Model Also known as hierarchical linear model. Maximum flexibility of this specification can be achieved by allowing for more levels and multiple explanatory variables on all levels. This is easiest by specifying each level model separately, but becomes notationally tedious. The matrix formulation y = Xβ + Zu + ɛ is most parsimonious. However, it does not really show what is happening. SLIDE 33
A Three Level RC Model Let us continue with the testmarket. We have three levels: the store's sales nested in the stores nested in a market. The Level 1 model is: y_ijk = β_0jk + β_1jk X_ijk + ɛ_ijk. On Level 2 we specify random slopes and intercepts: β_0jk = γ_00k + u_0jk, β_1jk = γ_10k + u_1jk. The Level 3 model is then: γ_00k = δ_000 + u_00k, γ_10k = δ_100 + u_10k SLIDE 34
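One way to internalise what the level equations say is to simulate from them. A hedged sketch in plain Python, with hypothetical fixed effects and variance values chosen only for illustration:

```python
import random

# Simulating from the three-level random-coefficients model: markets (level 3)
# contain stores (level 2), which contain repeated sales occasions (level 1).
# All numeric values below are hypothetical.
random.seed(7)
delta000, delta100 = 20.0, 2.0        # fixed effects δ_000, δ_100
sd_l3, sd_l2, sd_eps = 1.5, 1.0, 0.5  # level 3, level 2 and error SDs

data = []
for k in range(4):                          # markets
    g00 = delta000 + random.gauss(0, sd_l3)     # γ_00k
    g10 = delta100 + random.gauss(0, sd_l3)     # γ_10k
    for j in range(5):                      # stores within market k
        b0 = g00 + random.gauss(0, sd_l2)       # β_0jk
        b1 = g10 + random.gauss(0, sd_l2)       # β_1jk
        for i in range(6):                  # occasions within store jk
            x = random.uniform(-1, 1)           # X_ijk
            y = b0 + b1 * x + random.gauss(0, sd_eps)
            data.append((k, j, i, x, y))

print(len(data))   # 4 markets x 5 stores x 6 occasions = 120 rows
```

Estimation then runs this generative story in reverse: from the 120 rows, the software recovers the fixed effects and the three variance components.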
A Three Level IaSaO Model Now we additionally want to include predictors on levels 2 and 3. On level 2 it is store location age and on level 3 it is market size. Let us first specify the level two and level three models (level 1 remains unchanged) Fit it with the software of your choice SLIDE 35
Variance Structure for Ω So far we assumed the structure of Ω to be completely free. This means an awful lot of parameters to be estimated (the upper/lower triangular part of Ω, p + p(p − 1)/2 parameters). We can however restrict entries of Ω to be zero or to have a certain structure. Popular choices are Identity: All covariances between the random effects are zero AR process: Autoregressive structures based on the response ARMA: Autoregressive moving average structures Spatial: Different spatial correlation structures Certain structures are useful for certain data (e.g. panel data, spatial data) SLIDE 36
Variance Structure for Σ - I Usually Σ is the standard σ²I. This can be too restrictive, especially for repeated measurements. Hence we can specify different error assumptions, similar to classic linear models Identity: All covariances between the errors are zero Compound symmetry: Constant variance and within-group correlations AR process: Autoregressive structures MA process: Moving average structure ARMA: Autoregressive moving average structures Spatial: Different spatial correlation structures SLIDE 37
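As an example, an AR(1) structure for one cluster's errors has entries σ²ρ^|i−j|, replacing the T(T + 1)/2 parameters of an unstructured covariance with just two. A sketch with hypothetical values for σ² and ρ:

```python
# Sketch: AR(1) error covariance for one cluster with T = 4 repeated
# measurements, Σ[i][j] = σ² ρ^|i-j| (σ² and ρ are hypothetical values).
sigma2, rho = 2.0, 0.6
T = 4
Sigma = [[sigma2 * rho ** abs(i - j) for j in range(T)] for i in range(T)]

for row in Sigma:
    print([round(v, 3) for v in row])
# Correlation decays geometrically with the time lag, and only the two
# parameters σ² and ρ have to be estimated instead of T(T+1)/2 = 10.
```

The same matrix applies to every cluster, so the structure stays parsimonious no matter how many clusters the data contain.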
Variance Structure For Σ - II Let us now specify a first-order autoregressive error process (AR(1)) for the residuals of the testmarket data (or the firm data) In SPSS you can do that in the first tab, in R you must use lme with a corStruct object. SLIDE 38
Testing Hypotheses For LMM The best thing is that testing works exactly as in all other maximum likelihood models, with the likelihood ratio test: −2 log(L_0/L_A) ∼ χ² with p_A − p_0 degrees of freedom Fixed effects can also be tested with F-tests If REML has been used inference can only be done with the same fixed effect specification Tests on variance components are conservative (boundary solution) SLIDE 39
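The test statistic is easy to compute by hand from two fitted log-likelihoods. The sketch below uses hypothetical log-likelihood values and tests one extra variance component (1 df); for 1 df the chi-square survival function has the closed form erfc(√(x/2)), so no statistics library is needed:

```python
import math

# Hedged sketch of a likelihood-ratio test for one extra variance component.
# The two log-likelihoods are hypothetical values, not fitted results.
loglik_null = -512.4   # model without the random effect (p_0 parameters)
loglik_alt = -508.1    # model with it (one extra parameter)

lr = -2.0 * (loglik_null - loglik_alt)
# For 1 df: P(χ²₁ > x) = erfc(sqrt(x / 2)).
p_value = math.erfc(math.sqrt(lr / 2.0))
print(round(lr, 2), round(p_value, 4))

# The null value of a variance (0) sits on the boundary of the parameter
# space, so this p-value is conservative; a common correction halves it.
p_boundary = p_value / 2.0
```

With REML fits, remember from the slide above that this comparison is only valid when both models share the same fixed effect specification.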
Best Linear Unbiased Estimators (BLUE) Estimators of fixed effects β are best linear unbiased estimators, BLUE. This holds of course if Ω is known (the estimator is then the GLS) The MLE for the fixed effects is unbiased in small (and certainly in large) samples regardless of ˆΩ The MLE for the fixed effects is BLUE in most relevant cases when estimating the variances of u Under randomness, estimators for β are asymptotically consistent, efficient and normal Not always identifiable We can predict a person's value based on the fixed effects alone if we want, but the predictions will not be subject specific. SLIDE 40
Best Linear Unbiased Predictors (BLUP) Estimators of random effects u are not really estimators; since the effects are random, we call them predictors. This holds if ˆβ is the GLS estimator (Ω known) Under randomness, predictors for u are asymptotically consistent, efficient and normal The MLE for the random effects is BLUP in most practical cases when estimating the variances of u It is a shrinkage estimator and has a strongly Bayesian flavour It can also be represented as a posterior mode (or mean) We can make predictions of a person's value as the sum of the fixed and random effects. Thus we can even make individual cluster predictions if the number of observations for the cluster is less than the number of fixed effects. SLIDE 41
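For a random-intercept model the BLUP shrinkage has a simple closed form: the cluster mean's raw deviation from the fixed mean, multiplied by n_jω/(n_jω + σ²). A sketch with hypothetical variance values:

```python
# BLUP shrinkage for a random-intercept model (all values hypothetical):
# the predicted cluster effect u_j is the deviation of the cluster mean
# from µ, shrunk by a factor that grows with the cluster size n_j.
omega, sigma2 = 1.0, 4.0   # between- and within-cluster variances
mu = 100.0                 # fixed overall mean

def blup_intercept(cluster_mean, n_j):
    """Predict u_j; shrinkage factor is n_j*ω / (n_j*ω + σ²)."""
    w = n_j * omega / (n_j * omega + sigma2)
    return w * (cluster_mean - mu)

# A cluster seen only once is pulled strongly toward the overall mean...
print(blup_intercept(105.0, 1))    # 0.2 * 5 = 1.0
# ...while a cluster seen 50 times keeps most of its raw deviation of +5.
print(blup_intercept(105.0, 50))
```

This is the shrinkage and Bayesian flavour mentioned above: the factor is exactly the posterior weighting of the cluster's data against the population distribution of the random effects.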
Grand Mean Centering Centering explanatory variables on the lowest level can be important for a good interpretation, especially of the intercepts. This applies even more when we have longitudinal data. Grand mean centering: This is centering around the overall predictor mean This will make the intercept the grand mean This will avoid high negative correlations between intercept and coefficient Regression coefficients are the average change over the whole level 1 support Such a translation can be compensated by the regression coefficients, hence there is invariance of the expected values (for non-zero regression coefficients) There is also invariance of the covariance matrix parameters SLIDE 42
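The invariance claim is easy to verify in the simplest regression setting: grand mean centering shifts the intercept to the grand mean of y but leaves the slope (and all fitted values) unchanged. A sketch with hypothetical data:

```python
# Sketch: grand mean centering changes the intercept, not the slope
# (hypothetical data; x has mean 5, y has mean 10).
x = [2.0, 4.0, 6.0, 8.0]
y = [5.0, 9.0, 11.0, 15.0]

def ols(xs, ys):
    """Simple least squares; returns (intercept, slope)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b1 = (sum((a - mx) * (b - my) for a, b in zip(xs, ys))
          / sum((a - mx) ** 2 for a in xs))
    return my - b1 * mx, b1

xbar = sum(x) / len(x)
x_centered = [a - xbar for a in x]

b0_raw, b1_raw = ols(x, y)
b0_c, b1_c = ols(x_centered, y)

print(b1_raw == b1_c)   # True: the slope is invariant to the translation
print(b0_c)             # the intercept is now the grand mean of y (10.0)
```

In a mixed model the same logic applies per equation, which is why the expected values and covariance parameters are invariant under grand mean centering but the intercept's meaning changes.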
Group Mean Centering This is centering around the mean of each level 2 group Necessary if there is systematic variability between the level 1 covariates Invariance as before does not hold We cannot compensate for group-based translations by changing the coefficients and covariance matrix It is a different model compared to the non-centered one Including the group means as predictors is recommended SLIDE 43
Generalisations Here we fit a GLM on level 1, everything else remains as before. The GLMM is defined as E(y_i) = µ_i = g⁻¹(Xβ + Zu) or g(µ_i) = Xβ + Zu, where g(·) is a link function and the distribution of y_i is an exponential family Metric data (normal distribution) Binary data (binomial distribution) Count data (Poisson distribution) Survival data (gamma distribution) In R via glmer and in SPSS via GLMMIXED. SLIDE 44
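The link idea can be made concrete for binary data: the linear predictor Xβ + Zu lives on the whole real line, and the inverse logit link g⁻¹ maps it back to a probability. A sketch with a hypothetical linear predictor:

```python
import math

# Sketch of the GLMM link for binary data: g is the logit, g⁻¹ the inverse
# logit. The coefficients and random effect below are hypothetical.
def logit(p):          # link g
    return math.log(p / (1 - p))

def inv_logit(eta):    # inverse link g⁻¹
    return 1.0 / (1.0 + math.exp(-eta))

eta = 0.4 + 1.2 * 0.5 + (-0.3)   # fixed part Xβ plus a random effect u
p = inv_logit(eta)               # expected value µ on the probability scale
print(round(p, 3))
```

The random effect thus shifts the cluster's probability on the linear (logit) scale, not additively on the probability scale, which is what distinguishes a GLMM from simply adding a cluster offset to µ.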
Thank you for your Attention Thomas Rusch Department of Finance, Accounting and Statistics Institute for Statistics and Mathematics email: thomas.rusch@wu.ac.at URL: http://statmath.wu.ac.at/~tr WU Wirtschaftsuniversität Wien Augasse 2–6, A-1090 Wien SLIDE 45