Multilevel Analysis, with Extensions

May 26, 2010

We start by reviewing the research on multilevel analysis that has been done in psychometrics and educational statistics, roughly since 1985. The canonical reference (at least I hope so) is De Leeuw and Meijer (eds), Handbook of Multilevel Analysis, Springer, 2008. Although I have been asked to specifically review the social science applications, I will also try to establish some links with environmental statistics and space-time analysis. This is actually easier than expected, because the multilevel model is a special case of the mixed linear model or the hierarchical linear model. Here, and throughout the Handbook, we use the Van Dantzig (or Dutch) Convention: random variables are underlined.

Random Coefficient Models: Basics

Let's start simple. Suppose we have $m$ groups, $n_j$ observations in group $j$, and $p$ predictors in each of the groups. Then
$$y_j = X_j b_j + \epsilon_j, \qquad b_j = \beta + \delta_j.$$
Thus $y_j = X_j \beta + X_j \delta_j + \epsilon_j$, and
$$E(y_j) = X_j \beta, \qquad V(y_j) = X_j \Omega_j X_j' + \Sigma_j.$$
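As a concrete illustration, here is a minimal R sketch that simulates data from this random coefficient model and fits it with the lme4 package (not the leopold package mentioned later in these slides); the variable names and parameter values are illustrative assumptions.

```r
## Simulate y_j = X_j(beta + delta_j) + epsilon_j and recover the
## variance components with lme4; numbers are made up for illustration.
library(lme4)

set.seed(1)
m <- 50; n_j <- 20                        # m groups, n_j observations each
group <- factor(rep(1:m, each = n_j))
x     <- rnorm(m * n_j)                   # one level-1 predictor
beta  <- c(2, 0.5)                        # fixed intercept and slope
delta <- cbind(rnorm(m, sd = 1.0),        # group deviations delta_j
               rnorm(m, sd = 0.3))
y <- (beta[1] + delta[group, 1]) +
     (beta[2] + delta[group, 2]) * x +
     rnorm(m * n_j)                       # epsilon_j, here with sigma = 1

fit <- lmer(y ~ x + (x | group))          # random intercept and slope
summary(fit)
```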

Random Coefficient Model: Stacked

$$\begin{pmatrix} y_1 \\ \vdots \\ y_m \end{pmatrix} =
\begin{pmatrix} X_1 \\ \vdots \\ X_m \end{pmatrix} \beta +
\begin{pmatrix} X_1 & 0 & \cdots & 0 \\ 0 & X_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & X_m \end{pmatrix}
\begin{pmatrix} \delta_1 \\ \vdots \\ \delta_m \end{pmatrix} +
\begin{pmatrix} \epsilon_1 \\ \vdots \\ \epsilon_m \end{pmatrix}.$$

An RC model is a random intercept (RI) model if only the intercept has a random component. Thus the first column of every $X_j$ is all ones (equal to $u_j$), and only the first element of $\delta_j$ has nonzero variance:

$$\begin{pmatrix} y_1 \\ \vdots \\ y_m \end{pmatrix} =
\begin{pmatrix} X_1 \\ \vdots \\ X_m \end{pmatrix} \beta +
\begin{pmatrix} \delta_1 u_1 \\ \vdots \\ \delta_m u_m \end{pmatrix} +
\begin{pmatrix} \epsilon_1 \\ \vdots \\ \epsilon_m \end{pmatrix}.$$
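The random intercept special case drops all random slopes. Reusing the simulated y, x, and group from the sketch above, the corresponding lme4 call is a one-liner:

```r
## Random intercept only: the random part reduces to delta_j * u_j.
fit_ri <- lmer(y ~ x + (1 | group))       # reuses y, x, group from above
VarCorr(fit_ri)                           # a single intercept variance
```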

Random Coefficient Models: Identification

It is common, and in fact necessary, to make some additional assumptions:
$\beta_j = \beta$;
$\Omega_j = \Omega$, often diagonal;
$\Sigma_j = \Sigma$, often scalar.
In addition we often assume that the disturbances $\epsilon_j$ and $\delta_j$ are jointly multivariate normal and mutually uncorrelated. This is needed for likelihood inference, and it is an excuse to stay away from higher-order moments.

Slopes-as-Outcomes Models: Basics

In educational statistics multilevel analysis was introduced as a way to formalize contextual analysis by embedding it in the hierarchical linear model. The leading example is $n_j$ students in each of $m$ schools. Suppose we have $p$ student-level predictors and $q$ school-level predictors. We assume
$$y_{ij} = \sum_{s=1}^{p} x_{ijs} b_{js} + \epsilon_{ij}, \qquad b_{js} = \sum_{r=1}^{q} z_{jr} \beta_{sr} + \delta_{js}.$$
This elementwise formulation of the model shows how to extend SAO to more than two levels (students in classrooms in schools in districts).
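In lme4 syntax this two-level SAO model is a mixed model whose fixed part contains the cross-level products $x_{ijs} z_{jr}$. A minimal simulated sketch, with all names and effect sizes assumed for illustration:

```r
## Slopes-as-outcomes: a school-level z predicts the student-level
## intercept and slope; x * z generates the cross-level interactions.
library(lme4)

set.seed(2)
m <- 40; n_j <- 25
school <- factor(rep(1:m, each = n_j))
z <- rnorm(m)[school]                     # school-level predictor, expanded
x <- rnorm(m * n_j)                       # student-level predictor
b0 <- 1.0 + 0.8 * z + rnorm(m)[school]            # b_j0 with delta_j0
b1 <- 0.5 + 0.4 * z + 0.3 * rnorm(m)[school]      # b_j1 with delta_j1
y  <- b0 + b1 * x + rnorm(m * n_j)
fit_sao <- lmer(y ~ x * z + (x | school))
fixef(fit_sao)                            # includes the x:z cross-level term
```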

Slopes-as-Outcomes Models: Matrix Form

Define $Z_j$ as the $p \times pq$ matrix $Z_j = I_p \otimes z_j'$, where all $z_j$ have length $q$. Also do some obvious stacking. Then we can write
$$y_j = X_j b_j + \epsilon_j, \qquad b_j = Z_j \beta + \delta_j.$$
Using the columns $x_s^j$ of $X_j$ and the vectors $z_j$, it follows that the fixed part of the model has the block-rank-one structure given by cross-level interactions:
$$\begin{pmatrix} E(y_1) \\ \vdots \\ E(y_m) \end{pmatrix} =
\begin{pmatrix} X_1 B z_1 \\ \vdots \\ X_m B z_m \end{pmatrix} =
\begin{pmatrix} x_1^1 z_1' & x_2^1 z_1' & \cdots & x_p^1 z_1' \\ x_1^2 z_2' & x_2^2 z_2' & \cdots & x_p^2 z_2' \\ \vdots & \vdots & & \vdots \\ x_1^m z_m' & x_2^m z_m' & \cdots & x_p^m z_m' \end{pmatrix}
\begin{pmatrix} \beta_1 \\ \vdots \\ \beta_p \end{pmatrix}.$$
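A small numeric check of this construction in base R, where %x% is the Kronecker product; note that the form $Z_j = I_p \otimes z_j'$ is reconstructed here from the elementwise model above:

```r
## Check that Z_j = I_p %x% t(z_j) turns b_js = sum_r z_jr beta_sr
## into the matrix form b_j = Z_j beta (ignoring delta_j).
p <- 3; q <- 2
z_j  <- c(1, 0.7)                         # q school-level predictors
B    <- matrix(rnorm(p * q), p, q)        # beta_sr, one row per coefficient
Z_j  <- diag(p) %x% t(z_j)                # p x pq
beta <- as.vector(t(B))                   # rows of B stacked: (beta_1', ..., beta_p')'
all.equal(as.vector(Z_j %*% beta), as.vector(B %*% z_j))   # TRUE
```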

Slopes-as-Outcomes Models: Essence

This is the core of SAO models: we fit cross-level interactions to a ragged array of data $Y$ in which the rows are uncorrelated and the columns have covariance structure $V(y_j) = X_j \Omega_j X_j' + \Sigma_j$. And then one searches over the submodels that simultaneously set various cross-level interaction effects and various covariances in $\Omega_j$ and $\Sigma_j$ equal to zero. The default in most cases is to use $\Omega_j = \mathrm{diag}(\omega)$ and $\Sigma_j = \sigma^2 I$. The next step is to become state-of-the-art and get applied researchers to buy our expensive software package.

Slopes-as-Outcomes Models: Truth?

It would be foolhardy, maybe even insane, to pretend that models such as SAO actually describe what goes on in classrooms. We merely have a descriptive device that gives one formalization of the dependence between students in the same classroom, as well as of the cross-level interactions from contextual analysis. Of course, if we have reliable prior information we should incorporate it into the device to reduce both bias and variance, to be able to talk to our colleagues, and to get published. But in this case the prior information is simply that some students are in the same class and that some variables may be related to school outcomes. SAO provides a framework, maybe even a language, to embed this (rather minimal) prior information. There are many functionally similar frameworks around in statistics.

Slopes-as-Outcomes Models: Generalizations

Some fairly straightforward generalizations follow, once we realize that the SAO model is just a mixed linear model. We can generalize to nonlinear mixed models, and to generalized linear mixed models. Both generalizations are straightforward, although computationally far from trivial. Somewhat more intricate are non-nested (crossed) designs, multivariate outcomes, correlated observations, and covariance structures.

Slopes-as-Outcomes Models: Non-nested Designs

Non-nested designs occur in educational statistics, for example, if we keep track of both the student's primary and secondary schools. Clearly those classifications are not hierarchical, and the multilevel model becomes more complicated. Suppose, for example, we measure PM-10 in a number of years and at a number of observation points located in some cells of a rectangular grid. A simple random intercept model would be
$$y_i = \sum_{s=1}^{p} x_{is} \beta_s + \delta_{t(i)} + \nu_{l(i)} + \epsilon_i,$$
where $t(i)$ is the year and $l(i)$ is the location of observation $i$. This can be extended to SAO models if we have regressors to describe time and space.
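Crossed random intercepts of exactly this form are routine in lme4; the PM-10 setting below is simulated, so treat all numbers as placeholders:

```r
## Crossed (non-nested) random intercepts for year t(i) and location l(i).
library(lme4)

set.seed(3)
N <- 500
year     <- factor(sample(2000:2009, N, replace = TRUE))
location <- factor(sample(1:30, N, replace = TRUE))
x <- rnorm(N)
y <- 10 + 2 * x + rnorm(10)[year] + rnorm(30)[location] + rnorm(N)
fit_crossed <- lmer(y ~ x + (1 | year) + (1 | location))
VarCorr(fit_crossed)                      # separate year and location variances
```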

Growth Curve Models: Basics

Before we make matters even more complicated, let us first treat a simpler special case. Suppose all $X_j$ are the same. An example could be $m$ objects measured at $n$ time points. $X$ could contain a basis of polynomials or Fourier coefficients to code time points, while $Z$ would have characteristics of individuals. The data are a realization of an $n \times m$ matrix-valued random variable $Y$ (often assumed matrix-variate normal). Then
$$E(\mathrm{vec}(Y)) = (Z \otimes X)\beta, \qquad V(\mathrm{vec}(Y)) = I \otimes (X \Omega X' + \Sigma).$$
This is a generalization of the classical Potthoff-Roy model, which has $\Omega = 0$. There can be missing data, of course.
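A quick base-R check of the Kronecker mean structure, under the usual column-stacking vec convention, where $\mathrm{vec}(XBZ') = (Z \otimes X)\,\mathrm{vec}(B)$; all dimensions are arbitrary:

```r
## Verify vec(X B Z') == (Z %x% X) vec(B) for random matrices.
set.seed(4)
n <- 5; m <- 4; p <- 2; q <- 3
X <- matrix(rnorm(n * p), n, p)           # time-point basis
Z <- matrix(rnorm(m * q), m, q)           # individual characteristics
B <- matrix(rnorm(p * q), p, q)
all.equal(as.vector(X %*% B %*% t(Z)),
          as.vector((Z %x% X) %*% as.vector(B)))            # TRUE
```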

Growth Curve Models: Loss Function

The negative log-likelihood loss function for an even more general growth curve model is
$$D = m \log\det(\Sigma) + n \log\det(\Omega) + \mathrm{tr}\,\Sigma^{-1}(Y - XBZ')\,\Omega^{-1}(Y - XBZ')',$$
which shows that the dispersions of groups (objects) and individuals (time points) are separable. In growth curve analysis we usually model the expectations and keep the covariances simple. But we can also work the other way around and use elaborate dispersion models with simple expectation models. That is the tradition of structural equation and multivariate time series modeling. The two are combined in the R package leopold, with Wei-Tan Tsai.
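Coded directly, D is a few lines of base R; a sketch assuming Σ (n × n) and Ω (m × m) are given positive definite matrices, and with no claim that this is how leopold implements it:

```r
## Negative log-likelihood D for the separable growth curve model.
growth_loss <- function(Y, X, B, Z, Sigma, Omega) {
  n <- nrow(Y); m <- ncol(Y)
  R <- Y - X %*% B %*% t(Z)               # residual matrix
  as.numeric(m * determinant(Sigma)$modulus +   # determinant() returns logs
             n * determinant(Omega)$modulus) +
    sum(diag(solve(Sigma, R) %*% solve(Omega, t(R))))       # the trace term
}
```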

Separable Models: Elaborate Means

The mean structure $E(Y) = XBZ'$ in the growth curve model has both $X$ and $Z$ known. If $X$ and/or $Z$ are (partially) unknown, we can incorporate principal component analysis (perhaps non-negative), reduced rank regression analysis, correspondence analysis, canonical correspondence analysis, and fixed score factor analysis. Although these models seem very different, they are all basically matrix approximation methods minimizing the same loss function. And they naturally fit into the same block relaxation or alternating least squares algorithm.
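To make the block relaxation / alternating least squares idea concrete, here is a bare-bones rank-r matrix approximation in base R; it is a generic sketch of the algorithm family, not any one of the specific methods listed above:

```r
## Alternate least squares fits of Z (for fixed X) and X (for fixed Z)
## to minimize ||Y - X Z'||^2; converges to the rank-r SVD fit.
als_rank <- function(Y, r, iters = 100) {
  X <- matrix(rnorm(nrow(Y) * r), ncol = r)   # random start
  for (i in seq_len(iters)) {
    Z <- t(qr.solve(X, Y))                # regress Y on X, update Z
    X <- t(qr.solve(Z, t(Y)))             # regress Y' on Z, update X
  }
  list(X = X, Z = Z, fit = X %*% t(Z))
}
```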

Separable Models: Elaborate Dispersions

In linear models, such as growth curve models, we typically have simple dispersion structures such as $\Sigma = \sigma^2 I$. But it also makes sense to try, for example, AR(1) or more general Toeplitz forms for $\Sigma$. Thus we can have, say, an AR(1) form for $\Sigma$ and some factor-analytic or spatial covariance structure for $\Omega$. There is an obvious trade-off between allocating parameters to the means and allocating them to the dispersions. If there are too many parameters in both modes, maximum likelihood runs into incidental parameter problems, such as Neyman-Scott bias or Anderson-Rubin degeneracy. The boundary, where the log-likelihood becomes unbounded, is interesting.
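Building such a Σ is one line in R; a minimal sketch of the AR(1) Toeplitz form, with σ and ρ as free parameters:

```r
## AR(1) covariance: entries sigma^2 * rho^|s - t|, a Toeplitz matrix.
ar1_cov <- function(n, rho, sigma = 1) {
  sigma^2 * rho^abs(outer(1:n, 1:n, "-"))
}
ar1_cov(5, 0.6)                           # 5 x 5 example
```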

Separable Models: Higher Order

The growth curve model with
$$E(y) = (X \otimes Z)\beta, \qquad V(y) = \Sigma \otimes \Omega,$$
easily generalizes to $K$-dimensional arrays as
$$E(y) = (X_1 \otimes X_2 \otimes \cdots \otimes X_K)\beta, \qquad V(y) = \Sigma_1 \otimes \Sigma_2 \otimes \cdots \otimes \Sigma_K,$$
where we can have covariance structures for each of the $\Sigma_k$, and elaborate mean structures for the $X_k$ and $\beta$ as well. This can be used to cover various forms of array decomposition, such as ICA. The BR/ALS algorithm, in the next version of leopold, remains basically the same.
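Separable covariances of this kind compose mechanically with Kronecker products; a small base-R sketch for K = 3, reusing the ar1_cov helper defined above:

```r
## V(y) = Sigma_1 %x% Sigma_2 %x% Sigma_3 for a 4 x 3 x 2 array.
Sigmas <- list(ar1_cov(4, 0.5), diag(3), ar1_cov(2, 0.8))
V <- Reduce(`%x%`, Sigmas)                # Kronecker product of all factors
dim(V)                                    # 24 x 24
```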

Separable Models: Truth?

The modes of the multi-array can be defined in various ways. One mode can be cross-sectional, if multiple variables are measured on the same individuals, or at the same time and location. One mode can be spatial, another temporal. Or, alternatively, one mode can be longitude and another latitude. As in the students-in-schools context, most processes observed in nature are not separable. But neither are regressions linear, distributions normal, or variables conditionally independent. Models are false, but means and variances are still useful summaries; the same holds for regression, GLM, and PCA/ICA. Truth is elusive. The question is whether the matrix approximations help to make description simpler and/or prediction better, and how much bias is traded off against how much variance. Ultimately, the client/scientist should decide, preferably in a reproducible way.