Multilevel Models in Matrix Form. Lecture 7 July 27, 2011 Advanced Multivariate Statistical Methods ICPSR Summer Session #2



Today's Lecture Linear models from a matrix perspective An example of how to do MANOVA in linear mixed models A more modern twist on the classic technique

THE MULTIVARIATE NORMAL DISTRIBUTION

Multivariate Normal Distribution The generalization of the univariate normal distribution to multiple variables is called the multivariate normal distribution (MVN) Many multivariate techniques, such as multilevel/mixed models, rely on this distribution in some manner

Univariate Normal Distribution The univariate normal distribution function is: $f(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$ The mean is $\mu$ The variance is $\sigma^2$ Standard notation for a random variable $x$ following a normal distribution is $x \sim N(\mu, \sigma^2)$

Univariate Normal Distribution

Multivariate Normal The multivariate normal distribution function is: $f(\mathbf{x}) = (2\pi)^{-p/2} \, |\boldsymbol{\Sigma}|^{-1/2} \exp\left(-\frac{1}{2}(\mathbf{x}-\boldsymbol{\mu})^T \boldsymbol{\Sigma}^{-1} (\mathbf{x}-\boldsymbol{\mu})\right)$ The mean vector is $\boldsymbol{\mu}$ The covariance matrix is $\boldsymbol{\Sigma}$ Standard notation for the MVN distribution of $p$ variables is $\mathbf{x} \sim N_p(\boldsymbol{\mu}, \boldsymbol{\Sigma})$
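The density above can be checked numerically. A minimal NumPy sketch (all names here — `mvn_pdf`, `univ_pdf`, the example `mu` and `Sigma` — are illustrative, not from the lecture): with a diagonal covariance matrix, the MVN density factors into a product of univariate normal densities.

```python
import numpy as np

def mvn_pdf(x, mu, Sigma):
    """MVN density via the formula on the slide."""
    p = len(mu)
    dev = x - mu
    norm_const = (2 * np.pi) ** (-p / 2) * np.linalg.det(Sigma) ** (-0.5)
    return norm_const * np.exp(-0.5 * dev @ np.linalg.inv(Sigma) @ dev)

def univ_pdf(x, m, v):
    """Univariate normal density with mean m and variance v."""
    return np.exp(-(x - m) ** 2 / (2 * v)) / np.sqrt(2 * np.pi * v)

mu = np.array([0.0, 1.0])
Sigma = np.array([[2.0, 0.0], [0.0, 3.0]])  # diagonal: components independent
x = np.array([0.5, 0.5])

joint = mvn_pdf(x, mu, Sigma)
product = univ_pdf(0.5, 0.0, 2.0) * univ_pdf(0.5, 1.0, 3.0)
```

With zero covariances the two values agree, which previews MVN property 3 on the next slide.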

Picturing the Multivariate Normal

Contour Plot (View From Above)

Another Multivariate Normal Plot

MVN Properties The MVN distribution has some convenient properties for mixed models If x has a multivariate normal distribution, then: 1. Linear combinations of x are normally distributed. 2. All subsets of the components of x have an MVN distribution. 3. Zero covariance implies that the corresponding components are independently distributed. 4. The conditional distributions of the components are MVN. Especially important for our models
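Property 1 can be illustrated by simulation. A sketch under assumed example values (the `mu`, `Sigma`, and combination weights `A` are made up): if $\mathbf{x} \sim N(\boldsymbol{\mu}, \boldsymbol{\Sigma})$, then $\mathbf{A}\mathbf{x}$ is normal with mean $\mathbf{A}\boldsymbol{\mu}$ and variance $\mathbf{A}\boldsymbol{\Sigma}\mathbf{A}^T$.

```python
import numpy as np

rng = np.random.default_rng(0)
mu = np.array([1.0, 2.0])
Sigma = np.array([[1.0, 0.5], [0.5, 2.0]])
A = np.array([[1.0, 1.0]])          # linear combination: x1 + x2

# Monte Carlo draws from the MVN
draws = rng.multivariate_normal(mu, Sigma, size=200_000)
combo = (draws @ A.T).ravel()

implied_mean = (A @ mu)[0]              # 1 + 2 = 3
implied_var = (A @ Sigma @ A.T)[0, 0]   # 1 + 2 + 2(0.5) = 4
```

The sample mean and variance of the combination should match the implied values up to simulation error.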

LINEAR MODELS IN MATRICES

Linear Models with Matrices Recall our basic linear model (here a regression model) for observation $i$ (of $N$): $Y_i = \beta_0 + \beta_1 X_{i1} + \dots + \beta_k X_{ik} + e_i$ The equation above can be expressed more compactly by a set of matrices: $\mathbf{Y} = \mathbf{X}\boldsymbol{\beta} + \mathbf{e}$ Y is of size (N x 1) X is of size (N x (1 + k)) β is of size ((1+k) x 1) e is of size (N x 1)

Unpacking the Equation $$\underbrace{\begin{bmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_N \end{bmatrix}}_{(N \times 1)} = \underbrace{\begin{bmatrix} 1 & X_{11} & \cdots & X_{1k} \\ 1 & X_{21} & \cdots & X_{2k} \\ \vdots & \vdots & & \vdots \\ 1 & X_{N1} & \cdots & X_{Nk} \end{bmatrix}}_{(N \times (1+k))} \underbrace{\begin{bmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_k \end{bmatrix}}_{((1+k) \times 1)} + \underbrace{\begin{bmatrix} e_1 \\ e_2 \\ \vdots \\ e_N \end{bmatrix}}_{(N \times 1)}$$ For the first observation: $Y_1 = \beta_0 + \beta_1 X_{11} + \dots + \beta_k X_{1k} + e_1$

Notes on Matrices The use of matrices allows for a compact form of the model equation All observations are included The matrix of predictors, X, has a first column containing all ones Corresponds to (multiplies) the intercept $\beta_0$ Shows how design matrices can be used in linear models Think about categorical predictors (dummy coding/effect coding)

Linear Model Assumptions Recall that the error terms were assumed to be independent and normally distributed: $e_i \sim N(0, \sigma_e^2)$ With matrices, we can now talk about the joint distribution of error terms (for everyone): $\mathbf{e} \sim N_N(\mathbf{0}, \sigma_e^2 \mathbf{I}_N)$

Error Covariance Matrix The fixed effects linear model assumes the following structure for the errors: $$\sigma_e^2 \mathbf{I}_N = \begin{bmatrix} \sigma_e^2 & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & \sigma_e^2 \end{bmatrix}$$ In multilevel analyses this assumption is not valid, so our models introduce terms to relax it

Estimation in Linear Models Regression estimates are typically found via least squares (called $L_2$ estimates) In least squares regression, the estimates are found by minimizing the sum of squared errors: $$\sum_{i=1}^{N} e_i^2 = \sum_{i=1}^{N} (Y_i - \hat{Y}_i)^2 = \sum_{i=1}^{N} (Y_i - \beta_0 - \beta_1 X_{i1} - \dots - \beta_k X_{ik})^2$$ As you might guess, we can do this with matrices: $$\sum_{i=1}^{N} e_i^2 = \sum_{i=1}^{N} (Y_i - \mathbf{x}_i^T \boldsymbol{\beta})^2 = (\mathbf{Y} - \mathbf{X}\boldsymbol{\beta})^T (\mathbf{Y} - \mathbf{X}\boldsymbol{\beta}) = \mathbf{e}^T \mathbf{e}$$

The Estimator The equation for $\boldsymbol{\beta}$ that minimizes $\mathbf{e}^T \mathbf{e}$ is: $\hat{\boldsymbol{\beta}} = (\mathbf{X}^T \mathbf{X})^{-1} \mathbf{X}^T \mathbf{Y}$ The nice thing about this equation is that $\hat{\boldsymbol{\beta}}$ is simultaneously the MLE (it maximizes the likelihood function under normality assumptions)
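The closed-form estimator can be computed directly. A minimal NumPy sketch with simulated data (the sample size, true coefficients, and error scale are all made-up assumptions); it also computes the standard errors from the diagonal of $\hat{\sigma}_e^2 (\mathbf{X}^T\mathbf{X})^{-1}$, previewing the variance-of-estimates slide below.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 500
x1 = rng.normal(size=N)
X = np.column_stack([np.ones(N), x1])    # first column of ones -> intercept
beta_true = np.array([2.0, -1.5])
Y = X @ beta_true + rng.normal(scale=0.5, size=N)

# beta_hat = (X'X)^{-1} X'Y, via a linear solve rather than explicit inverse
beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)

# Residual variance estimate and standard errors of the coefficients
resid = Y - X @ beta_hat
sigma2_hat = resid @ resid / (N - 2)
se = np.sqrt(np.diag(sigma2_hat * np.linalg.inv(X.T @ X)))
```

Using `np.linalg.solve` instead of forming $(\mathbf{X}^T\mathbf{X})^{-1}$ explicitly is the numerically safer idiom.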

Model Assumptions The conditional distribution of $\mathbf{Y}$ is multivariate normal: the mean vector is the predicted values of $\mathbf{Y}$, and the covariance matrix is the error covariance matrix: $\mathbf{Y} \mid \mathbf{X} \sim N_N(\mathbf{X}\boldsymbol{\beta}, \sigma_e^2 \mathbf{I}_N)$

Variance of Estimates The covariance matrix of $\hat{\boldsymbol{\beta}}$ contains useful information regarding the standard errors of the estimates (which are found along the diagonal) Under the linear model, this is given by: $\text{Var}(\hat{\boldsymbol{\beta}}) = \sigma_e^2 (\mathbf{X}^T \mathbf{X})^{-1}$

MULTILEVEL MODELS IN MATRICES (GENERAL LINEAR MIXED MODELS)

Multilevel (Mixed) Models The general linear mixed model is given by: Y = Xβ + Zγ + e Y is of size (N x 1) X is of size (N x (1+k)) β is of size ((1+k) x 1) Z is of size (N x r*g) (r random effects; g groups) γ is of size (r*g x 1) e is of size (N x 1)

The New Terms The Z matrix is analogous to the X matrix: it contains the predictors of the random effects (i.e., random intercepts, slopes, etc.) The γ matrix contains the random effects for each group Because of the number of observations, these matrices are rather large Can be notated differently, though

Z and γ for a Random Intercept For a model with a random intercept, this is how Z and γ appear (rows of Z represent observations; columns of Z represent groups; rows of γ hold the effect value for each group): $$\mathbf{Z} = \begin{bmatrix} 1 & 0 & 0 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \\ \vdots & & \end{bmatrix}; \quad \boldsymbol{\gamma} = \begin{bmatrix} \gamma_{10} \\ \gamma_{20} \\ \gamma_{30} \end{bmatrix}$$
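Building this indicator-style Z from a vector of group labels is mechanical. A small NumPy sketch (the group labels are an illustrative example): each column of Z marks membership in one group, so each row has exactly one 1.

```python
import numpy as np

group = np.array([0, 0, 1, 1, 1, 2])   # 6 observations nested in 3 groups
n = len(group)
g = group.max() + 1

# Z[i, j] = 1 if observation i belongs to group j, else 0
Z = np.zeros((n, g))
Z[np.arange(n), group] = 1.0
```

For a random intercept there is one column per group; a random slope would add a second set of columns holding that group's predictor values.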

Multilevel (Mixed) Model Assumptions Assumptions in multilevel (mixed) models involve the random effects and the error terms Random effect assumptions: multivariate normal (across the $r$ random effects), mean vector $\mathbf{0}$, covariance matrix $\mathbf{G}$ (block diagonal within a group): $\boldsymbol{\gamma} \sim N_r(\mathbf{0}, \mathbf{G})$ Error term assumptions: multivariate normal (within a group), mean vector $\mathbf{0}$, covariance matrix $\mathbf{R}$: $\mathbf{e} \sim N_N(\mathbf{0}, \mathbf{R})$

Model Assumptions The conditional distribution of $\mathbf{Y}$ is multivariate normal: the mean vector is the predicted values of $\mathbf{Y}$, and the covariance matrix combines the random effect and error term covariance matrices, which allows for correlated observations: $\mathbf{Y} \mid \mathbf{X}, \mathbf{Z} \sim N_N(\mathbf{X}\boldsymbol{\beta}, \mathbf{Z}\mathbf{G}\mathbf{Z}^T + \mathbf{R})$
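The marginal covariance $\mathbf{V} = \mathbf{Z}\mathbf{G}\mathbf{Z}^T + \mathbf{R}$ can be assembled directly. A sketch for a random-intercept model with two groups of two observations each (the variance values 4.0 and 1.0 are made-up assumptions): observations in the same group share covariance equal to the random-intercept variance, while observations in different groups are uncorrelated.

```python
import numpy as np

# Z for 2 groups of 2 observations each
Z = np.array([[1.0, 0.0],
              [1.0, 0.0],
              [0.0, 1.0],
              [0.0, 1.0]])
tau2, sigma2 = 4.0, 1.0        # illustrative: intercept and residual variances
G = tau2 * np.eye(2)           # random-intercept covariance
R = sigma2 * np.eye(4)         # independent residuals

V = Z @ G @ Z.T + R            # block-diagonal marginal covariance
```

The resulting V has 5.0 on the diagonal, 4.0 within a group, and 0.0 across groups — exactly the block-diagonal structure described on the next slide.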

New Covariance Matrix Because of the grouping structure of data, the new covariance matrix is block-diagonal Blocks represent the covariance matrix for a group/cluster of observations

Model Estimation Because of the inclusion of random effects (which are not directly observable), the model no longer has a single estimation equation Rather, we now must use an iterative process to estimate model parameters Two estimators are commonly used: maximum likelihood (ML) and residual maximum likelihood (REML) I will introduce ML first then REML

ML Estimation of Mixed Models The goal in ML estimation is to pick a set of parameters that maximizes the likelihood function (typically the log-likelihood is used) Here we have to know $\boldsymbol{\beta}$, $\mathbf{G}$, and $\mathbf{R}$; $\boldsymbol{\gamma}$ is not a part of the function below The log-likelihood function is the log of the model-assumed MVN: $\mathbf{Y} \sim N_N(\mathbf{X}\boldsymbol{\beta}, \mathbf{V})$, where $\mathbf{V} = \mathbf{Z}\mathbf{G}\mathbf{Z}^T + \mathbf{R}$

Simplifying Things Because of the wonders of math, we can use a technique called estimated generalized least squares Use some method to find estimates $\hat{\mathbf{G}}$ and $\hat{\mathbf{R}}$ Given $\hat{\mathbf{G}}$ and $\hat{\mathbf{R}}$, we can find $\hat{\boldsymbol{\beta}}$ Here, we define $\mathbf{V} = \mathbf{Z}\mathbf{G}\mathbf{Z}^T + \mathbf{R}$ Specifically: $\hat{\boldsymbol{\beta}} = (\mathbf{X}^T \mathbf{V}^{-1} \mathbf{X})^{-1} \mathbf{X}^T \mathbf{V}^{-1} \mathbf{Y}$
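The GLS step can be sketched in a few lines of NumPy. Here V is taken as known and uses the block-diagonal random-intercept structure from earlier (all numbers — the design, V, and Y — are illustrative assumptions, not data from the lecture):

```python
import numpy as np

# Design: intercept plus a 0/1 within-group covariate; 2 groups of 2
X = np.column_stack([np.ones(4), [0.0, 1.0, 0.0, 1.0]])
V = np.array([[5.0, 4.0, 0.0, 0.0],
              [4.0, 5.0, 0.0, 0.0],
              [0.0, 0.0, 5.0, 4.0],
              [0.0, 0.0, 4.0, 5.0]])
Y = np.array([1.0, 2.0, 3.0, 4.0])

# beta_hat = (X' V^{-1} X)^{-1} X' V^{-1} Y
Vinv = np.linalg.inv(V)
beta_hat = np.linalg.solve(X.T @ Vinv @ X, X.T @ Vinv @ Y)
```

For this balanced toy example the GLS solution is intercept 2 and slope 1, matching the ordinary group means — a property of balanced designs, not of GLS in general.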

The ML Log Likelihood The goal is to pick $\mathbf{G}$ and $\mathbf{R}$ and then substitute them into the log likelihood function, producing a log likelihood value Picking $\mathbf{G}$ and $\mathbf{R}$ can be done using Newton-Raphson (as is done in SAS) The function value is: $$l(\mathbf{G}, \mathbf{R}) = -\frac{1}{2} \log|\mathbf{V}| - \frac{1}{2} \mathbf{r}^T \mathbf{V}^{-1} \mathbf{r} - \frac{n}{2} \log(2\pi)$$ where $\mathbf{r} = \mathbf{Y} - \mathbf{X}\hat{\boldsymbol{\beta}} = \mathbf{Y} - \mathbf{X}(\mathbf{X}^T \mathbf{V}^{-1} \mathbf{X})^{-1} \mathbf{X}^T \mathbf{V}^{-1} \mathbf{Y}$
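A direct implementation of this log likelihood makes the profiling over $\boldsymbol{\beta}$ concrete: for a fixed V, plugging in the GLS $\hat{\boldsymbol{\beta}}$ yields a higher log likelihood than any other $\boldsymbol{\beta}$. The data below are the same illustrative toy numbers as above, not from the lecture.

```python
import numpy as np

def loglik(beta, Y, X, V):
    """MVN log likelihood of Y at mean X @ beta, covariance V."""
    r = Y - X @ beta
    n = len(Y)
    _, logdet = np.linalg.slogdet(V)
    return -0.5 * logdet - 0.5 * (r @ np.linalg.solve(V, r)) \
           - 0.5 * n * np.log(2 * np.pi)

X = np.column_stack([np.ones(4), [0.0, 1.0, 0.0, 1.0]])
V = np.array([[5.0, 4.0, 0.0, 0.0],
              [4.0, 5.0, 0.0, 0.0],
              [0.0, 0.0, 5.0, 4.0],
              [0.0, 0.0, 4.0, 5.0]])
Y = np.array([1.0, 2.0, 3.0, 4.0])

Vinv = np.linalg.inv(V)
beta_hat = np.linalg.solve(X.T @ Vinv @ X, X.T @ Vinv @ Y)
ll_at_hat = loglik(beta_hat, Y, X, V)
```

In real software the outer Newton-Raphson loop varies G and R; here only the inner evaluation is sketched.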

Issues with ML Estimates ML estimation is a common choice and performs well when sample sizes are large However, estimates of the variances will be biased Similar to the basic-statistics phenomenon of using $N$ versus $N-1$ in the variance/standard deviation Therefore, the residual ML estimator was developed Called REML

REML Estimator The REML estimator maximizes the likelihood of the residuals The likelihood function comes from stating the likelihood of the data as a function of the likelihood of the estimated fixed effects and the residuals Here, we take the estimated residuals to be $\hat{\mathbf{e}} = \mathbf{Y} - \mathbf{X}\hat{\boldsymbol{\beta}}$, where $\hat{\boldsymbol{\beta}} = (\mathbf{X}^T \mathbf{V}^{-1} \mathbf{X})^{-1} \mathbf{X}^T \mathbf{V}^{-1} \mathbf{Y}$

Deriving REML Because $\mathbf{Y}$ is multivariate normal, $\hat{\boldsymbol{\beta}}$ and $\hat{\mathbf{e}}$ are linear functions of $\mathbf{Y}$ that are: normally distributed (see properties of the MVN) and independent Therefore, with independence, we can re-express the likelihood of $\mathbf{Y}$ as a product of the likelihoods of $\hat{\boldsymbol{\beta}}$ and $\hat{\mathbf{e}}$: $L(\mathbf{Y} \mid \mathbf{V}) = L(\hat{\boldsymbol{\beta}} \mid \mathbf{V}) \, L(\hat{\mathbf{e}} \mid \mathbf{V})$

More Deriving REML Further, because $\hat{\boldsymbol{\beta}}$ is a linear function of $\mathbf{Y}$, we know that $\hat{\boldsymbol{\beta}} \sim N(\boldsymbol{\beta}, (\mathbf{X}^T \mathbf{V}^{-1} \mathbf{X})^{-1})$ Therefore, it is now our goal to maximize the log-likelihood of the residuals, $L(\hat{\mathbf{e}} \mid \mathbf{V})$

Step 1: Taking the Log We now take the log of our original likelihood function: $L(\mathbf{Y} \mid \mathbf{V}) = L(\hat{\boldsymbol{\beta}} \mid \mathbf{V}) \, L(\hat{\mathbf{e}} \mid \mathbf{V})$ Yielding: $\log L(\mathbf{Y} \mid \mathbf{V}) = \log L(\hat{\boldsymbol{\beta}} \mid \mathbf{V}) + \log L(\hat{\mathbf{e}} \mid \mathbf{V})$ Which gives us: $\log L(\hat{\mathbf{e}} \mid \mathbf{V}) = \log L(\mathbf{Y} \mid \mathbf{V}) - \log L(\hat{\boldsymbol{\beta}} \mid \mathbf{V})$

Step 2: We know that $\mathbf{Y} \sim N_N(\mathbf{X}\boldsymbol{\beta}, \mathbf{V})$ with $\mathbf{V} = \mathbf{Z}\mathbf{G}\mathbf{Z}^T + \mathbf{R}$, and $\hat{\boldsymbol{\beta}} \sim N(\boldsymbol{\beta}, (\mathbf{X}^T \mathbf{V}^{-1} \mathbf{X})^{-1})$ We can then put the MVN density associated with each into our log likelihood of the residuals: $\log L(\hat{\mathbf{e}} \mid \mathbf{V}) = \log L(\mathbf{Y} \mid \mathbf{V}) - \log L(\hat{\boldsymbol{\beta}} \mid \mathbf{V})$

Even More $$\log L(\hat{\mathbf{e}} \mid \mathbf{V}) = \log L(\mathbf{Y} \mid \mathbf{V}) - \log L(\hat{\boldsymbol{\beta}} \mid \mathbf{V}) = -\frac{1}{2}\Big[\log|\mathbf{X}^T \mathbf{V}^{-1} \mathbf{X}| + \log|\mathbf{V}| + (\mathbf{Y} - \mathbf{X}\boldsymbol{\beta})^T \mathbf{V}^{-1} (\mathbf{Y} - \mathbf{X}\boldsymbol{\beta}) - (\hat{\boldsymbol{\beta}} - \boldsymbol{\beta})^T \mathbf{X}^T \mathbf{V}^{-1} \mathbf{X} (\hat{\boldsymbol{\beta}} - \boldsymbol{\beta})\Big] - \frac{n-p}{2}\log(2\pi)$$ Here: $$(\mathbf{Y} - \mathbf{X}\boldsymbol{\beta})^T \mathbf{V}^{-1} (\mathbf{Y} - \mathbf{X}\boldsymbol{\beta}) = (\mathbf{Y} - \mathbf{X}\hat{\boldsymbol{\beta}})^T \mathbf{V}^{-1} (\mathbf{Y} - \mathbf{X}\hat{\boldsymbol{\beta}}) + (\hat{\boldsymbol{\beta}} - \boldsymbol{\beta})^T \mathbf{X}^T \mathbf{V}^{-1} \mathbf{X} (\hat{\boldsymbol{\beta}} - \boldsymbol{\beta})$$ Meaning we can cancel the last term, leaving $\hat{\mathbf{e}}^T \mathbf{V}^{-1} \hat{\mathbf{e}}$.

The REML Log Likelihood After all the slides before, we can now present the REML log likelihood: $$\log L(\hat{\mathbf{e}} \mid \mathbf{V}) = -\frac{1}{2}\Big[\log|\mathbf{X}^T \mathbf{V}^{-1} \mathbf{X}| + \log|\mathbf{V}| + \hat{\mathbf{e}}^T \mathbf{V}^{-1} \hat{\mathbf{e}}\Big] - \frac{n-p}{2}\log(2\pi)$$
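This expression can be implemented directly, and doing so makes one consequence for the next slide visible: because it depends on Y only through the residuals, shifting Y by any $\mathbf{X}\boldsymbol{\delta}$ leaves the REML log likelihood unchanged. The data below are illustrative made-up numbers with V treated as known.

```python
import numpy as np

def reml_loglik(V, Y, X):
    """REML log likelihood from the slide, with V taken as known."""
    n, p = X.shape
    Vinv = np.linalg.inv(V)
    beta_hat = np.linalg.solve(X.T @ Vinv @ X, X.T @ Vinv @ Y)
    e = Y - X @ beta_hat                       # estimated residuals
    _, ld_XVX = np.linalg.slogdet(X.T @ Vinv @ X)
    _, ld_V = np.linalg.slogdet(V)
    return -0.5 * (ld_XVX + ld_V + e @ Vinv @ e) \
           - 0.5 * (n - p) * np.log(2 * np.pi)

X = np.column_stack([np.ones(4), [0.0, 1.0, 0.0, 1.0]])
V = np.array([[5.0, 4.0, 0.0, 0.0],
              [4.0, 5.0, 0.0, 0.0],
              [0.0, 0.0, 5.0, 4.0],
              [0.0, 0.0, 4.0, 5.0]])
Y = np.array([1.0, 2.0, 3.0, 4.0])

ll = reml_loglik(V, Y, X)
# Adding X @ delta changes beta_hat but not the residuals
ll_shifted = reml_loglik(V, Y + X @ np.array([10.0, -3.0]), X)
```

This invariance is exactly why REML deviance tests are only valid when the fixed-effects part of the model is held constant.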

Uses of ML and REML ML can be used for deviance tests whether the fixed effects are the same or different REML can be used for deviance tests only when the fixed effects are the same The residuals change when the fixed effects change

Demonstrating Through an Example REVISITING MANOVA FROM A LINEAR MODELS PERSPECTIVE

MANOVA Revisited The classical MANOVA model can be rephrased so as to fit into a multilevel or mixed-effects model framework The new framework allows for: A different (smaller) set of covariances to be estimated Useful for approximating a full matrix when you do not have a lot of data Predictor variables that vary by outcome Useful for repeated measures designs Synchronization with more modern methods Multilevel models The new framework does not provide an overall MANOVA hypothesis test (i.e., Wilks' Lambda)

Rearranging Data The first step to using a linear model framework is to convert our data from wide to long Here we take the two scores and put both into one column We also must add two dummy-coded variables indicating which score is represented by a row of the data (tables of the Wide Data and Long Data layouts shown on the slide)
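The wide-to-long restructuring can be sketched without any special software. A minimal NumPy version (the two-person, two-score data and all variable names are made up for illustration): each person contributes one row per outcome, with dummy codes flagging which outcome the row holds.

```python
import numpy as np

# Wide data: one row per person, one column per score
wide = np.array([[10.0, 20.0],
                 [12.0, 21.0]])
n_person, n_out = wide.shape

# Long data: one row per (person, outcome) pair
person = np.repeat(np.arange(n_person), n_out)   # person id for each long row
d1 = np.tile([1.0, 0.0], n_person)               # dummy: this row is score 1
d2 = np.tile([0.0, 1.0], n_person)               # dummy: this row is score 2
score = wide.ravel()                             # both scores in one column
```

In SAS this restructuring would typically be done in a DATA step before calling proc mixed.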

The Analysis - MANOVA Previously, we used MANOVA to test the multivariate hypothesis that the mean vectors were the same across all conditions:

More From MANOVA The Error SSCP Matrix:

Converting the Error SSCP Matrix to an Error Covariance Matrix Because the Error SSCP matrix is not a covariance matrix, we obtain the covariance matrix by dividing the Error SSCP matrix by its degrees of freedom (here 109): $$\mathbf{S}_e = \frac{1}{df_e}\mathbf{E} = \frac{1}{109}\begin{bmatrix} 283.57 & 69.11 \\ 69.11 & 347.92 \end{bmatrix} = \begin{bmatrix} 2.60 & 0.63 \\ 0.63 & 3.19 \end{bmatrix}$$
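This conversion is a single elementwise division, shown here with the slide's own numbers:

```python
import numpy as np

# Error SSCP matrix and its degrees of freedom, from the slide
E = np.array([[283.57, 69.11],
              [69.11, 347.92]])
df_e = 109

# Error covariance matrix: SSCP divided by its degrees of freedom
S_e = E / df_e
```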

Univariate Results from GLM Once we rejected our null hypothesis we then became interested in univariate ANOVAs for each outcome variable:

Using the MIXED Procedure However, we can now do *most* of the univariate procedures from our MANOVA within proc mixed

MIXED: The Error Covariance Matrix

MIXED: The Univariate Hypothesis Tests

Secondary Phrasing: *almost* MANOVA

Final Thoughts Today we discussed the matrix form of linear models with mixed effects (multilevel models) The matrix form can be useful for reading about these models in papers and presentations This class was meant to be an introduction to the technical side of the modeling framework Much more time could be spent on just this alone

Next Time (Friday) Lab: Meet in Helen Newberry We'll discuss how to do MANOVA and discriminant analysis in SAS