Multilevel Models in Matrix Form. Lecture 7 July 27, 2011 Advanced Multivariate Statistical Methods ICPSR Summer Session #2


1 Multilevel Models in Matrix Form Lecture 7 July 27, 2011 Advanced Multivariate Statistical Methods ICPSR Summer Session #2

2 Today's Lecture
Linear models from a matrix perspective. An example of how to do MANOVA in linear mixed models: a more modern twist on the classic technique.

3 THE MULTIVARIATE NORMAL DISTRIBUTION

4 Multivariate Normal Distribution
The generalization of the univariate normal distribution to multiple variables is called the multivariate normal distribution (MVN). Many multivariate techniques rely on this distribution in some manner, including multilevel/mixed models.

5 Univariate Normal Distribution
The univariate normal distribution function is:
$$f(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$$
The mean is $\mu$ and the variance is $\sigma^2$. Standard notation for a random variable $x$ following a normal distribution is $x \sim N(\mu, \sigma^2)$.

6 Univariate Normal Distribution

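7 Multivariate Normal
The multivariate normal distribution function (with $x$ and $\mu$ written as $1 \times p$ row vectors) is:
$$f(x) = \frac{1}{(2\pi)^{p/2}\,|\Sigma|^{1/2}} \exp\left(-\frac{(x-\mu)\,\Sigma^{-1}\,(x-\mu)^T}{2}\right)$$
The mean vector is $\mu$ and the covariance matrix is $\Sigma$. Standard notation for the MVN distribution of $p$ variables is $N_p(\mu, \Sigma)$.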
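As a rough illustration of the density formula above, the following NumPy sketch evaluates it directly and checks two special cases. The function names and the numeric values are made up for illustration and are not part of the lecture.

```python
import numpy as np

def mvn_density(x, mu, sigma):
    """Evaluate the p-variate normal density at a single point x."""
    x = np.asarray(x, dtype=float)
    mu = np.asarray(mu, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    p = mu.shape[0]
    dev = x - mu
    # Quadratic form (x - mu) Sigma^{-1} (x - mu)^T, computed without an explicit inverse
    quad = dev @ np.linalg.solve(sigma, dev)
    norm_const = (2.0 * np.pi) ** (p / 2.0) * np.sqrt(np.linalg.det(sigma))
    return np.exp(-0.5 * quad) / norm_const

def univariate_density(x, mu, sigma2):
    """Evaluate the univariate normal density with mean mu and variance sigma2."""
    return np.exp(-(x - mu) ** 2 / (2.0 * sigma2)) / np.sqrt(2.0 * np.pi * sigma2)

# With p = 1 the MVN formula reduces to the univariate formula.
d_mvn = mvn_density([0.5], [0.0], [[2.0]])
d_uni = univariate_density(0.5, 0.0, 2.0)

# With a diagonal Sigma the joint density is the product of univariate densities.
d_joint = mvn_density([0.5, -1.0], [0.0, 1.0], [[2.0, 0.0], [0.0, 3.0]])
d_prod = univariate_density(0.5, 0.0, 2.0) * univariate_density(-1.0, 1.0, 3.0)
```

Solving a linear system instead of inverting $\Sigma$ is a numerical convenience only; the quantity computed is the same quadratic form.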

8 Picturing the Multivariate Normal

9 Contour Plot (View From Above)

10 Another Multivariate Normal Plot

11 MVN Properties
The MVN distribution has some convenient properties for mixed models. If x has a multivariate normal distribution, then:
1. Linear combinations of x are normally distributed.
2. All subsets of the components of x have an MVN distribution.
3. Zero covariance implies that the corresponding components are independently distributed.
4. The conditional distributions of the components are MVN.
Especially important for our models.
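The properties above can be made concrete with a small numeric sketch. It uses the standard results that $Ax \sim N(A\mu, A\Sigma A^T)$ and, for two components, that $x_1 \mid x_2$ has mean $\mu_1 + \frac{\sigma_{12}}{\sigma_{22}}(x_2 - \mu_2)$ and variance $\sigma_{11} - \frac{\sigma_{12}^2}{\sigma_{22}}$. These formulas and all numbers below are supplied here for illustration; the slides state the properties without them.

```python
import numpy as np

# Hypothetical bivariate normal: mean vector and covariance matrix
mu = np.array([1.0, 2.0])
sigma = np.array([[4.0, 2.0],
                  [2.0, 3.0]])

# Property 1: a linear combination A x is normal with mean A mu and covariance A Sigma A^T.
A = np.array([[1.0, -1.0]])          # the difference x1 - x2
combo_mean = A @ mu
combo_cov = A @ sigma @ A.T

# Property 2: a subset of components keeps the matching entries of mu and Sigma.
marg_mean_x1 = mu[0]
marg_var_x1 = sigma[0, 0]

# Property 4: x1 given x2 = x2_obs is normal with the mean and variance below.
x2_obs = 3.5
cond_mean = mu[0] + sigma[0, 1] / sigma[1, 1] * (x2_obs - mu[1])
cond_var = sigma[0, 0] - sigma[0, 1] ** 2 / sigma[1, 1]
```

The conditional variance is smaller than the marginal variance whenever the covariance is nonzero, which is the sense in which one component carries information about another.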

12 LINEAR MODELS IN MATRICES

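13 Linear Models with Matrices
Recall our basic linear model (here a regression model) for observation i (of N):
$$Y_i = \beta_0 + \beta_1 X_{i1} + \cdots + \beta_k X_{ik} + e_i$$
The equation above can be expressed more compactly by a set of matrices:
$$Y = X\beta + e$$
Y is of size (N x 1), X is of size (N x (1 + k)), β is of size ((1 + k) x 1), and e is of size (N x 1).

14 Unpacking the Equation
$$\underbrace{\begin{bmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_N \end{bmatrix}}_{Y\ (N \times 1)} = \underbrace{\begin{bmatrix} 1 & X_{11} & \cdots & X_{1k} \\ 1 & X_{21} & \cdots & X_{2k} \\ \vdots & \vdots & & \vdots \\ 1 & X_{N1} & \cdots & X_{Nk} \end{bmatrix}}_{X\ (N \times (1+k))} \underbrace{\begin{bmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_k \end{bmatrix}}_{\beta\ ((1+k) \times 1)} + \underbrace{\begin{bmatrix} e_1 \\ e_2 \\ \vdots \\ e_N \end{bmatrix}}_{e\ (N \times 1)}$$
For the first observation: $Y_1 = \beta_0 + \beta_1 X_{11} + \cdots + \beta_k X_{1k} + e_1$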

15 Notes on Matrices
The use of matrices allows for a compact form of the model equation in which all observations are included. The matrix of predictors, X, has a first column containing all ones, which corresponds to (multiplies) the intercept $\beta_0$. This shows how design matrices can be used in linear models: think about categorical predictors (dummy coding/effect coding).

16 Linear Model Assumptions
Recall that we assumed the error terms to be independent and normally distributed: $e_i \sim N(0, \sigma_e^2)$. With matrices, we can now talk about the joint distribution of the error terms (for everyone): $e \sim N_N(0, \sigma_e^2 I_N)$

17 Error Covariance Matrix
The fixed effects linear model assumes the following structure for the errors:
$$\sigma_e^2 I_N = \begin{bmatrix} \sigma_e^2 & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & \sigma_e^2 \end{bmatrix}$$
In multilevel analyses, this assumption is not valid, so our models introduce terms to relax it.

18 Estimation in Linear Models
Regression estimates are typically found via least squares (called $L_2$ estimates). In least squares regression, the estimates are found by minimizing the sum of squared errors:
$$\sum_{i=1}^{N} e_i^2 = \sum_{i=1}^{N} \left(Y_i - \hat{Y}_i\right)^2 = \sum_{i=1}^{N} \left(Y_i - \beta_0 - \beta_1 X_{i1} - \cdots - \beta_k X_{ik}\right)^2$$
As you could guess, we could do this with matrices:
$$\sum_{i=1}^{N} e_i^2 = \sum_{i=1}^{N} \left(Y_i - x_i^T\beta\right)^2 = (Y - X\beta)^T (Y - X\beta) = e^T e$$

19 The Estimator
The equation for β that minimizes $e^T e$ is:
$$\hat\beta = (X^T X)^{-1} X^T Y$$
The nice thing about this equation is that it is simultaneously the MLE (it maximizes the likelihood function under the normality assumptions).
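The least squares estimator can be checked numerically. The sketch below simulates data (the sample size, coefficients, and noise level are arbitrary assumptions), solves the normal equations, and compares the result with NumPy's own least-squares routine.

```python
import numpy as np

rng = np.random.default_rng(0)
N, k = 50, 2
predictors = rng.normal(size=(N, k))
X = np.column_stack([np.ones(N), predictors])   # first column of ones for the intercept
beta_true = np.array([1.0, 2.0, -0.5])          # made-up values for the simulation
Y = X @ beta_true + rng.normal(scale=0.3, size=N)

# Solve (X^T X) beta = X^T Y rather than forming the inverse explicitly
beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)

# Cross-check against NumPy's least-squares solver
beta_lstsq, *_ = np.linalg.lstsq(X, Y, rcond=None)

residuals = Y - X @ beta_hat
sse = residuals @ residuals                      # e^T e at the minimizer

# Any other coefficient vector gives a larger sum of squared errors
other = Y - X @ (beta_hat + np.array([0.1, 0.0, 0.0]))
sse_other = other @ other
```

At the minimizer the residuals are orthogonal to every column of X, which is exactly what the normal equations state.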

20 Model Assumptions
The conditional distribution of Y is multivariate normal. The mean vector is the predicted values of Y, and the covariance matrix is the error covariance matrix:
$$f(Y \mid X) \sim N_N(X\beta, \sigma_e^2 I_N)$$

21 Variance of Estimates
The covariance matrix of $\hat\beta$ contains useful information regarding the standard errors of the estimates (which are the square roots of the diagonal elements). Under the linear model, this is given by:
$$\mathrm{Var}(\hat\beta) = \sigma_e^2 (X^T X)^{-1}$$
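A short sketch of how standard errors follow from this covariance matrix. It assumes $\sigma_e^2$ is estimated by the residual sum of squares divided by $N - (1 + k)$, which is the usual unbiased estimate but is not stated on the slide; the simulated data are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 40
x1 = rng.normal(size=N)
X = np.column_stack([np.ones(N), x1])            # intercept plus one predictor
Y = 2.0 + 0.5 * x1 + rng.normal(size=N)          # made-up data-generating values

XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ Y
resid = Y - X @ beta_hat

# Assumed estimator of the error variance: SSE / (N - number of columns of X)
sigma2_hat = resid @ resid / (N - X.shape[1])

cov_beta = sigma2_hat * XtX_inv                  # estimated Var(beta_hat)
std_errors = np.sqrt(np.diag(cov_beta))          # square roots of the diagonal
```

For a single predictor, the slope's standard error from this matrix matches the familiar textbook expression $\sqrt{\hat\sigma_e^2 / \sum_i (x_i - \bar{x})^2}$.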

22 MULTILEVEL MODELS IN MATRICES (GENERAL LINEAR MIXED MODELS)

23 Multilevel (Mixed) Models
The general linear mixed model is given by:
$$Y = X\beta + Z\gamma + e$$
Y is of size (N x 1), X is of size (N x (1 + k)), β is of size ((1 + k) x 1), Z is of size (N x r*g) (r random effects; g groups), γ is of size (r*g x 1), and e is of size (N x 1).

24 The New Terms
The Z matrix is analogous to the X matrix: it contains the predictors of the random effects (i.e., random intercepts, slopes, etc.). The γ vector contains the random effects for each group. Because of the number of observations, these matrices are rather large. They can be notated differently, though.

25 Z and γ for a Random Intercept
For a model with a random intercept, Z and γ are laid out as follows. In Z, rows represent observations and columns represent groups. In γ, rows represent group effect values and columns represent the type of effect:
$$\gamma = \begin{bmatrix} \gamma_{10} \\ \gamma_{20} \\ \gamma_{30} \end{bmatrix}$$
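To make the layout concrete, here is a minimal sketch of a random-intercept Z matrix. The group membership vector and the effect values are hypothetical, chosen only to show the shapes; they are not the values from the original slide.

```python
import numpy as np

# Hypothetical grouping: 6 observations nested in 3 groups (labels 0, 1, 2)
group = np.array([0, 0, 1, 1, 2, 2])
g = group.max() + 1

# Z has one row per observation and one column per group;
# each row has a single 1 in the column of that observation's group.
Z = np.zeros((group.size, g))
Z[np.arange(group.size), group] = 1.0

# gamma has one row per group: the random intercept for that group (made-up values)
gamma = np.array([0.4, -0.1, -0.3])

# Z @ gamma hands each observation the intercept deviation of its own group
Z_gamma = Z @ gamma
```

With a random slope added, each group would contribute a second column to Z holding that group's predictor values, which is where the r*g column count comes from.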

26 Multilevel (Mixed) Model Assumptions
Assumptions in multilevel (mixed) models involve the random effects and the error terms.
Random effect assumptions: multivariate normal (across the r random effects), with mean vector 0 and covariance matrix G (block diagonal within a group): $\gamma \sim N_r(0, G)$
Error term assumptions: multivariate normal (within a group), with mean vector 0 and covariance matrix R: $e \sim N_N(0, R)$

27 Model Assumptions
The conditional distribution of Y is multivariate normal. The mean vector is the predicted values of Y. The covariance matrix is a combination of the random effect and error term covariance matrices, which allows for correlated observations:
$$f(Y \mid X, Z) \sim N_N(X\beta, ZGZ^T + R)$$

28 New Covariance Matrix
Because of the grouping structure of the data, the new covariance matrix is block-diagonal. Blocks represent the covariance matrix for a group/cluster of observations.
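The block-diagonal structure can be seen by building $ZGZ^T + R$ for a small random-intercept model. The sketch below assumes $G = \tau^2 I$ and $R = \sigma^2 I$ with made-up variance values and a hypothetical grouping of five observations into two groups.

```python
import numpy as np

group = np.array([0, 0, 1, 1, 1])       # hypothetical: 2 + 3 observations in 2 groups
N, g = group.size, 2

Z = np.zeros((N, g))
Z[np.arange(N), group] = 1.0            # random-intercept design matrix

tau2, sigma2 = 2.0, 1.0                 # made-up random-intercept and error variances
G = tau2 * np.eye(g)                    # covariance of the random effects
R = sigma2 * np.eye(N)                  # covariance of the error terms

V = Z @ G @ Z.T + R                     # marginal covariance of Y

# Boolean mask that is True where two observations share a group
same_group = group[:, None] == group[None, :]
```

Observations in the same group share covariance `tau2`, observations in different groups have covariance zero, and every variance equals `tau2 + sigma2`.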

29 Model Estimation
Because of the inclusion of random effects (which are not directly observable), the model no longer has a single estimation equation. Rather, we now must use an iterative process to estimate model parameters. Two estimators are commonly used: maximum likelihood (ML) and residual maximum likelihood (REML). I will introduce ML first, then REML.

30 ML Estimation of Mixed Models
The goal in ML estimation is to pick a set of parameters that maximize the likelihood function. Typically the log-likelihood is used. Here, the unknowns are β, γ, G, and R, although γ isn't a part of the function below. The log-likelihood function is the log of the model-assumed MVN: $Y \sim N_N(X\beta, V)$, with $V = ZGZ^T + R$.

31 Simplifying Things
Because of the wonders of math, we can use a technique called estimated generalized least squares. Use some method to find estimates of G and R: $\hat{G}$ and $\hat{R}$. Given $\hat{G}$ and $\hat{R}$, we can find $\hat\beta$. Here, we will define $V = ZGZ^T + R$. Specifically:
$$\hat\beta = (X^T V^{-1} X)^{-1} X^T V^{-1} Y$$

32 The ML Log Likelihood
The goal is to pick G and R and then substitute them into the log likelihood function, producing a log likelihood value. Picking G and R can be done using Newton-Raphson (as is done in SAS). The function value is:
$$l(G, R) = -\frac{1}{2}\log|V| - \frac{1}{2} r^T V^{-1} r - \frac{n}{2}\log(2\pi)$$
Where:
$$r = Y - X\hat\beta = Y - X (X^T V^{-1} X)^{-1} X^T V^{-1} Y$$
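The two steps above (generalized least squares for the fixed effects given V, then evaluating the log likelihood) can be sketched in NumPy as follows. This only evaluates the function at candidate values of G and R; it does not perform the Newton-Raphson search. The function names, the simulated data, and the candidate variances are all assumptions made for illustration.

```python
import numpy as np

def gls_beta(X, Y, V):
    """Generalized least squares estimate (X^T V^-1 X)^-1 X^T V^-1 Y."""
    Vinv_X = np.linalg.solve(V, X)
    Vinv_Y = np.linalg.solve(V, Y)
    return np.linalg.solve(X.T @ Vinv_X, X.T @ Vinv_Y)

def ml_loglik(X, Y, Z, G, R):
    """ML log likelihood l(G, R) with the fixed effects profiled out by GLS."""
    V = Z @ G @ Z.T + R
    r = Y - X @ gls_beta(X, Y, V)
    n = Y.shape[0]
    _, logdet_V = np.linalg.slogdet(V)           # log|V| without overflow
    quad = r @ np.linalg.solve(V, r)             # r^T V^-1 r
    return -0.5 * logdet_V - 0.5 * quad - 0.5 * n * np.log(2.0 * np.pi)

# Simulated random-intercept data: 3 groups of 3 observations (made-up values)
group = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2])
n = group.size
rng = np.random.default_rng(2)
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
Z = np.zeros((n, 3))
Z[np.arange(n), group] = 1.0
Y = 1.0 + 0.5 * x + Z @ np.array([0.8, -0.2, -0.6]) + rng.normal(scale=0.5, size=n)

# Evaluate the log likelihood at two candidate (G, R) pairs
ll_candidate = ml_loglik(X, Y, Z, G=1.0 * np.eye(3), R=0.25 * np.eye(n))
ll_no_random = ml_loglik(X, Y, Z, G=np.zeros((3, 3)), R=0.25 * np.eye(n))
```

An optimizer would repeat this evaluation over many candidate pairs and keep the pair with the largest value. With G set to zero, the function collapses to the ordinary regression log likelihood, which gives a convenient sanity check.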

33 Issues with ML Estimates
ML estimation is a common choice and performs well when sample sizes are large. However, estimates of the variances will be biased, similar to the basic statistics phenomenon of using N versus N-1 in the variance/standard deviation. Therefore, the residual ML estimator, called REML, was developed.

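34 REML Estimator
The REML estimator maximizes the likelihood of the residuals. The likelihood function comes from stating the likelihood of the data as a function of the likelihood of the estimated fixed effects and the residuals. Here, we take the estimated residuals to be $\hat{e} = Y - X\hat\beta$, where $\hat\beta = (X^T V^{-1} X)^{-1} X^T V^{-1} Y$.

35 Deriving REML
Because Y is multivariate normal, $\hat\beta$ and $\hat{e}$ are linear functions of Y that are normally distributed (see properties of MVN) and independent. Therefore, with independence we can re-express the likelihood of Y as a product of the likelihoods of $\hat\beta$ and $\hat{e}$:
$$L(Y \mid V) = L(\hat\beta \mid V)\, L(\hat{e} \mid V)$$

36 More Deriving REML
Further, due to the consistency of the estimates, we know that $\hat\beta \sim N\left(\beta, (X^T V^{-1} X)^{-1}\right)$. Therefore, it is now our goal to maximize the log-likelihood of the residuals, or $\log L(\hat{e} \mid V)$.

37 Step 1: Taking the Log
We now take the log of our original likelihood function:
$$L(Y \mid V) = L(\hat\beta \mid V)\, L(\hat{e} \mid V)$$
Yielding:
$$\log L(Y \mid V) = \log L(\hat\beta \mid V) + \log L(\hat{e} \mid V)$$
Which gives us:
$$\log L(\hat{e} \mid V) = \log L(Y \mid V) - \log L(\hat\beta \mid V)$$

38 Step 2:
We know that $Y \sim N_N(X\beta, V)$ with $V = ZGZ^T + R$, and that $\hat\beta \sim N\left(\beta, (X^T V^{-1} X)^{-1}\right)$. We can then put the MVN density associated with each into our log likelihood of the residuals:
$$\log L(\hat{e} \mid V) = \log L(Y \mid V) - \log L(\hat\beta \mid V)$$

39 Even More
Up to the additive constant $-\frac{n-p}{2}\log(2\pi)$:
$$\log L(\hat{e} \mid V) = \log L(Y \mid V) - \log L(\hat\beta \mid V) = -\frac{1}{2}\left[\log|X^T V^{-1} X| + \log|V| + (Y - X\beta)^T V^{-1} (Y - X\beta) - (\hat\beta - \beta)^T X^T V^{-1} X (\hat\beta - \beta)\right]$$
Here:
$$(Y - X\beta)^T V^{-1} (Y - X\beta) = (Y - X\hat\beta)^T V^{-1} (Y - X\hat\beta) + (\hat\beta - \beta)^T X^T V^{-1} X (\hat\beta - \beta)$$
Meaning we can cancel the last term.

40 The REML Log Likelihood
After all the slides before, we can now present the REML log likelihood:
$$\log L(\hat{e} \mid V) = -\frac{1}{2}\left[\log|X^T V^{-1} X| + \log|V| + \hat{e}^T V^{-1} \hat{e}\right] - \frac{n-p}{2}\log(2\pi)$$
Here $\hat{e} = Y - X\hat\beta$, n is the number of observations, and p is the number of fixed effects (the columns of X).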
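A minimal sketch of the REML log likelihood as a function of V, together with a check of the N versus N-1 analogy in the simplest case. The check assumes $V = \sigma^2 I$ (no random effects), where the REML maximizer is known to be the residual sum of squares divided by $n - p$ rather than by $n$. The function name and simulated data are illustrative assumptions.

```python
import numpy as np

def reml_loglik(X, Y, V):
    """REML log likelihood of the residuals for a given covariance matrix V."""
    n, p = X.shape
    XtVinvX = X.T @ np.linalg.solve(V, X)
    beta_hat = np.linalg.solve(XtVinvX, X.T @ np.linalg.solve(V, Y))
    e_hat = Y - X @ beta_hat
    _, logdet_V = np.linalg.slogdet(V)
    _, logdet_XtVinvX = np.linalg.slogdet(XtVinvX)
    quad = e_hat @ np.linalg.solve(V, e_hat)     # e_hat^T V^-1 e_hat
    return (-0.5 * (logdet_XtVinvX + logdet_V + quad)
            - 0.5 * (n - p) * np.log(2.0 * np.pi))

# Simulated regression data with independent errors (made-up values)
rng = np.random.default_rng(3)
n = 12
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
Y = 1.0 + 2.0 * x + rng.normal(size=n)

beta_ols, *_ = np.linalg.lstsq(X, Y, rcond=None)
rss = float(np.sum((Y - X @ beta_ols) ** 2))

s2_ml = rss / n                      # ML variance estimate (divides by n)
s2_reml = rss / (n - X.shape[1])     # REML variance estimate (divides by n - p)

# The REML criterion prefers the n - p version
ll_at_reml = reml_loglik(X, Y, s2_reml * np.eye(n))
ll_at_ml = reml_loglik(X, Y, s2_ml * np.eye(n))
```

The extra $\log|X^T V^{-1} X|$ term is what shifts the maximizer from the $n$ divisor to the $n - p$ divisor in this special case.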

41 Uses of ML and REML
ML can be used for deviance tests whether the fixed effects are the same or different. REML can be used for deviance tests only when the fixed effects are the same, because the residuals change when the fixed effects change.

42 Demonstrating Through an Example REVISITING MANOVA FROM A LINEAR MODELS PERSPECTIVE

43 MANOVA Revisited
The classical MANOVA model can be rephrased so as to fit into a multilevel or mixed-effects model framework. The new framework can allow for:
A different (smaller) set of covariances to be estimated, which is useful for approximating a full matrix when you do not have a lot of data.
Predictor variables that vary by outcome, which is useful for repeated measures designs.
Synchronization with more modern methods (multilevel models).
The new framework does not provide an overall MANOVA hypothesis test (such as Wilks' Lambda).

44 Rearranging Data
The first step to using a linear model framework is to convert our data from wide to long. Here we take two scores and put both into one column. We also must add two dummy-coded variables indicating which score is represented by a row of the data. (Figure: wide data and long data.)
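The wide-to-long step can be sketched in plain Python. The column names (`id`, `score1`, `score2`) and the values are hypothetical stand-ins, since the original data tables did not survive; the lab itself does this step in SAS.

```python
# Hypothetical wide data: one row per person, one column per score
wide = [
    {"id": 1, "score1": 10.0, "score2": 12.0},
    {"id": 2, "score1": 8.0, "score2": 9.5},
]
outcomes = ["score1", "score2"]

# Long data: one row per person per score, with the score in a single column
long_rows = []
for row in wide:
    for name in outcomes:
        record = {"id": row["id"], "score": row[name]}
        # Two dummy-coded variables flag which score this row holds
        for other in outcomes:
            record["is_" + other] = 1 if other == name else 0
        long_rows.append(record)
```

Each person now contributes two rows, and exactly one dummy variable equals 1 in every row.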

45 The Analysis - MANOVA Previously, we used MANOVA to test the multivariate hypothesis that the mean vectors were the same across all conditions:

46 More From MANOVA The Error SSCP Matrix:

47 Converting the Error SSCP Matrix to an Error Covariance Matrix
Because the Error SSCP matrix is not a covariance matrix, we can obtain the covariance matrix by dividing the Error SSCP matrix by the error degrees of freedom (here 109):
$$S_e = \frac{1}{df_e} E$$
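The conversion is a single division, sketched below. The SSCP entries are invented for illustration because the matrix from the original slide did not survive; only the degrees of freedom (109) come from the lecture.

```python
import numpy as np

# Hypothetical 2 x 2 error SSCP matrix (NOT the values from the lecture)
E = np.array([[545.0, 218.0],
              [218.0, 872.0]])
df_e = 109                        # error degrees of freedom stated on the slide

S_e = E / df_e                    # error covariance matrix

# The implied error correlation between the two outcomes
corr_12 = S_e[0, 1] / np.sqrt(S_e[0, 0] * S_e[1, 1])
```

The same covariance matrix is what the mixed-model analysis estimates directly as its error covariance structure.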

48 Univariate Results from GLM Once we rejected our null hypothesis we then became interested in univariate ANOVAs for each outcome variable:

49 Using the MIXED Procedure However, we can now do *most* of the univariate procedures from our MANOVA within proc mixed

50 MIXED: The Error Covariance Matrix

51 MIXED: The Univariate Hypothesis Tests

52 Secondary Phrasing: *almost* MANOVA

53 Final Thoughts
Today we discussed the matrix form of linear models with mixed effects (multilevel models). The matrix form can be useful for reading about these models in papers and presentations. This class was meant to be an introduction to the technical side of the modeling framework. Much more time can be spent on just this alone.

54 Next Time (Friday)
Lab: Meet in Helen Newberry. We'll discuss how to do MANOVA and discriminant analysis in SAS.
