Review of Unconditional Multilevel Models for Longitudinal Data

Ronald Norton
5 years ago

1 Review of Unconditional Multilevel Models for Longitudinal Data
Topics:
- Course (and MLM) overview
- Concepts in longitudinal multilevel modeling
- Model comparisons and significance testing
- Describing within-person fluctuation using ACS models
- Describing within-person change using random effects
- Describing nonlinear patterns of change
CLP 945: Lecture 1

2 What is CLP 945 about?
- Longitudinal data (still): the same individual units of analysis measured at different occasions (which can range from milliseconds to decades)
- Repeated measures data: the same individual units of analysis measured via different items, using different stimuli, or under different conditions
- Clustered and cross-classified data: the same individual units of analysis (one or more kinds of groups) measured via different people
- All of these fall under a more general category of multivariate data of varying complexity
- What they have in common is the use of random effects to describe sources of dependency among outcomes from the same unit (only the kind of unit differs)

3 What is a Multilevel Model (MLM)?
- Same as other terms you have heard of:
  - General Linear Mixed Model (if you are from statistics): Mixed = Fixed and Random effects
  - Random Coefficients Model (also if you are from statistics): Random coefficients = Random effects = latent variables/factors
  - Hierarchical Linear Model (if you are from education): not the same as hierarchical regression
- Special cases of MLM:
  - Random Effects ANOVA or Repeated Measures ANOVA
  - (Latent) Growth Curve Model (where "Latent" implies SEM)
  - Within-Person Fluctuation Model (e.g., for daily diary data)
  - Clustered/Nested Observations Model (e.g., for kids in schools)
  - Cross-Classified Models (e.g., value-added models)
  - Psychometric Models (e.g., factor analysis, item response theory)

4 The Two Sides of Any Model
- Model for the Means: aka Fixed Effects, the Structural Part of the Model
  - What you are used to caring about for testing hypotheses
  - How the expected outcome for a given observation varies as a function of values on predictor variables
- Model for the Variance: aka Random Effects and Residuals, the Stochastic Part of the Model
  - What you are used to making assumptions about instead
  - How residuals are distributed and related across observations (persons, groups, time, etc.); these relationships are called "dependency", and this is the primary way that multilevel models differ from general linear models (e.g., regression)

5 Review: Variances and Covariances
- Variance: dispersion of y
$$\text{Variance}(y_t) = \frac{\sum_{i=1}^{N}(y_{ti} - \hat{y}_{ti})^2}{N - k}$$
- Covariance: how y's go together, unstandardized
$$\text{Covariance}(y_1, y_2) = \frac{\sum_{i=1}^{N}(y_{1i} - \hat{y}_{1i})(y_{2i} - \hat{y}_{2i})}{N - k}$$
- Correlation: how y's go together, standardized (-1 to 1)
$$\text{Correlation}(y_1, y_2) = \frac{\text{Covariance}(y_1, y_2)}{\sqrt{\text{Variance}(y_1) \cdot \text{Variance}(y_2)}}$$
N = # people, t = time, i = person, k = # fixed effects, $\hat{y}_{ti}$ = y predicted from fixed effects
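
As a quick numerical check of these three formulas, here is a minimal Python sketch. It assumes the residuals $y - \hat{y}$ are already available; the helper `residuals_from_mean` is hypothetical and uses each occasion's sample mean as $\hat{y}$, so k = 1. The outcome values are made up.

```python
import math


def residuals_from_mean(y):
    """Residuals y - y-hat when y-hat is just the sample mean (k = 1 fixed effect)."""
    mean = sum(y) / len(y)
    return [value - mean for value in y]


def variance(resid, k):
    """Sum of squared residuals divided by N - k."""
    return sum(r * r for r in resid) / (len(resid) - k)


def covariance(resid1, resid2, k):
    """Sum of cross-products of paired residuals (same persons) divided by N - k."""
    return sum(a * b for a, b in zip(resid1, resid2)) / (len(resid1) - k)


def correlation(resid1, resid2, k):
    """Covariance standardized by both standard deviations (the N - k terms cancel)."""
    return covariance(resid1, resid2, k) / math.sqrt(variance(resid1, k) * variance(resid2, k))


# Hypothetical outcomes for N = 4 people at two occasions
r1 = residuals_from_mean([3, 5, 7, 9])
r2 = residuals_from_mean([2, 4, 4, 6])
print(variance(r1, 1), covariance(r1, r2, 1), correlation(r1, r2, 1))
```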

6 Dimensions for Organizing Models
- Outcome type: General (normal) vs. Generalized (not normal)
- Dimensions of sampling: One (so one variance term per outcome) vs. Multiple (so multiple variance terms per outcome): OUR WORLD
- General Linear Models: conditionally normal outcome distribution, fixed effects (identity link; only one dimension of sampling)
  - Note: Least Squares is only for GLM (General Linear Models)
- Generalized Linear Models: any conditional outcome distribution, fixed effects through link functions, no random effects (one dimension)
- General Linear Mixed Models: conditionally normal outcome distribution, fixed and random effects (identity link, but multiple sampling dimensions)
- Generalized Linear Mixed Models: any conditional outcome distribution, fixed and random effects through link functions (multiple dimensions)
- "Linear" means that the fixed effects predict the link-transformed conditional mean of the DV as a linear combination: (effect*predictor) + (effect*predictor) + ...

7 What can MLM do for you?
1. Model dependency across observations
   - Longitudinal, clustered, and/or cross-classified data? No problem!
   - Tailor your model of sources of correlation to your data
2. Include categorical or continuous predictors at any level
   - Time-varying, person-level, and group-level predictors for each variance
   - Explore reasons for dependency, don't just control for dependency
3. Does not require the same data structure for each person
   - Unbalanced or missing data? No problem!
4. You already know how (or you will soon)!
   - Use SPSS Mixed, SAS Mixed, Stata, Mplus, R, HLM, MLwiN
   - What's an intercept? What's a slope? What's a pile of variance?

8 Review of Unconditional Multilevel Models for Longitudinal Data
Topics:
- Course (and MLM) overview
- Concepts in longitudinal multilevel modeling
- Model comparisons and significance testing
- Describing within-person fluctuation using ACS models
- Describing within-person change using random effects
- Describing nonlinear patterns of change

9 Levels of Analysis in Longitudinal Data
- Between-Person (BP) Variation: Level 2, INTER-individual differences, time-invariant
  - All longitudinal studies begin as cross-sectional studies
- Within-Person (WP) Variation: Level 1, INTRA-individual differences, time-varying
  - Only longitudinal studies can provide this extra information
- Longitudinal studies allow examination of both types of relationships simultaneously (and their interactions)
  - Any variable measured over time usually has both BP and WP variation
  - BP = more/less than other people; WP = more/less than one's own average
- I use "person" here, but level 2 can be anything that is measured repeatedly (like animals, schools, countries...)

10 A Longitudinal Data Continuum
- Within-Person Change: systematic change
  - Magnitude or direction of change can be different across individuals
  - Growth curve models: time is meaningfully sampled
- Within-Person Fluctuation: no systematic change
  - Outcome just varies/fluctuates over time (e.g., emotion, stress)
  - Time is just a way to get lots of data per individual
[Figure: two panels, "Pure WP Change" and "Pure WP Fluctuation", each plotted against Time]

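11 The Simplest Possible Model: The Empty GLM
$y_i = \beta_0 + e_i$
[Figure: distribution of the outcome (N = 1,334) annotated with its mean and standard deviation; the filled-in values did not survive extraction]
- Model for the Means: $\beta_0$, which is also called $\hat{y}$ ("y-hat")
- Error variance: the variance of the residuals $e_i = y_i - \hat{y}_i$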

12 The Two Sides of a (BP) GLM
$y_i = \beta_0 + \beta_1 X_i + \beta_2 Z_i + \beta_3 X_i Z_i + e_i$
- Model for the Means (Predicted Values):
  - Each person's expected (predicted) outcome is a weighted linear function of his/her values on X and Z (and here, their interaction), each measured once per person (i.e., this is a BP, univariate model)
  - Estimated parameters are called fixed effects (here, $\beta_0$, $\beta_1$, $\beta_2$, and $\beta_3$)
- Model for the Variance ("Piles" of Variance), our main focus:
  - $e_i \sim N(0, \sigma^2_e)$: ONE source of residual (unexplained) deviation
  - $e_i$ has a mean of 0 with some estimated constant variance $\sigma^2_e$, is normally distributed, is unrelated to X and Z, and is unrelated across people (across all observations, just people here)
  - The estimated parameter is the residual variance only (not each $e_i$)
  - Proportion of residual variance reduced relative to the empty model = $R^2$

13 Empty +Within-Person Model
- Start off with the mean of Y as the best guess for any value: Grand Mean = Fixed Intercept
- Can make a better guess by taking advantage of repeated observations: Person Mean → Random Intercept

14 Empty +Within-Person Model
Variance of Y → 2 sources:
- Between-Person (BP) Variance: differences from GRAND mean (INTER-individual differences)
- Within-Person (WP) Variance: differences from OWN mean (INTRA-individual differences); this part is only observable through longitudinal data
Now we have 2 piles of variance in Y to predict.

15 Hypothetical Longitudinal Data
[Figure: hypothetical longitudinal data]

16 Error in a BP Model for the Variance: Single-Level Model
- $e_{ti}$ represents all Y variance
[Figure: one person's five observations, with residuals $e_{1i}$ through $e_{5i}$ marked]

17 Error in a +WP Model for the Variance: Multilevel Model
- $U_{0i}$ = random intercept that represents BP variance in mean Y
- $e_{ti}$ = residual that represents WP variance in Y
- $U_{0i}$ also represents constant dependency (covariance) due to mean differences in Y across persons
[Figure: the same five observations, now with the person's random intercept $U_{0i}$ and residuals $e_{1i}$ through $e_{5i}$ marked]

18 Empty +Within-Person Model
Variance of Y → 2 sources:
- Level 2 Random Intercept Variance (of $U_{0i}$, as $\tau^2_{U_0}$):
  - Between-Person Variance
  - Differences from GRAND mean
  - INTER-Individual Differences
- Level 1 Residual Variance (of $e_{ti}$, as $\sigma^2_e$):
  - Within-Person Variance
  - Differences from OWN mean
  - INTRA-Individual Differences
[Figure: hypothetical data with $U_{0i}$ and $e_{ti}$ deviations marked]

19 BP vs. +WP Empty Models
- Empty Between-Person Model (used for 1 occasion): $y_i = \beta_0 + e_i$
  - $\beta_0$ = fixed intercept = grand mean
  - $e_i$ = residual deviation from GRAND mean
- Empty +Within-Person Model (>1 occasions): $y_{ti} = \beta_0 + U_{0i} + e_{ti}$
  - $\beta_0$ = fixed intercept = grand mean
  - $U_{0i}$ = random intercept = individual deviation from GRAND mean
  - $e_{ti}$ = time-specific residual deviation from OWN mean
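
To see the two empty-model piles in numbers, the sketch below uses the classic ANOVA (method-of-moments) estimators for complete, balanced data. It is only an illustration of the decomposition: it is not the ML/REML estimation that MIXED performs, it assumes every person has the same occasions with no missing data, and the function name and data are hypothetical.

```python
def empty_model_moments(data):
    """Method-of-moments (ANOVA) estimates for y_ti = beta_0 + U_0i + e_ti.

    data: one list per person, each with the same number of occasions (complete, balanced).
    Returns (beta_0, tau2_u0, sigma2_e).
    """
    n_persons, n_occ = len(data), len(data[0])
    grand_mean = sum(sum(person) for person in data) / (n_persons * n_occ)  # beta_0
    person_means = [sum(person) / n_occ for person in data]
    # Within-person: deviations from each person's OWN mean (the e_ti part)
    ss_within = sum((y - m) ** 2 for person, m in zip(data, person_means) for y in person)
    ms_within = ss_within / (n_persons * (n_occ - 1))
    # Between-person: person means around the GRAND mean (the U_0i part)
    ss_between = n_occ * sum((m - grand_mean) ** 2 for m in person_means)
    ms_between = ss_between / (n_persons - 1)
    sigma2_e = ms_within
    # Observed person means also carry residual noise, so remove it; truncate at 0
    tau2_u0 = max((ms_between - ms_within) / n_occ, 0.0)
    return grand_mean, tau2_u0, sigma2_e


# Hypothetical data: 3 persons x 2 occasions
beta_0, tau2_u0, sigma2_e = empty_model_moments([[1, 3], [4, 6], [7, 9]])
print(beta_0, tau2_u0, sigma2_e)
```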

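20 Intraclass Correlation (ICC)
$$ICC = \frac{BP}{BP + WP} = \frac{\text{Intercept Var.}}{\text{Intercept Var.} + \text{Residual Var.}} = \frac{\tau^2_{U_0}}{\tau^2_{U_0} + \sigma^2_e}$$
$$ICC = \text{Corr}(y_1, y_2) = \frac{\text{Cov}(y_1, y_2)}{\sqrt{\text{Var}(y_1) \cdot \text{Var}(y_2)}} = \frac{\tau^2_{U_0}}{\sqrt{(\tau^2_{U_0} + \sigma^2_e)(\tau^2_{U_0} + \sigma^2_e)}}$$
R matrix and RCORR matrix (3 occasions):
$$R = \begin{bmatrix} \tau^2_{U_0}+\sigma^2_e & \tau^2_{U_0} & \tau^2_{U_0} \\ \tau^2_{U_0} & \tau^2_{U_0}+\sigma^2_e & \tau^2_{U_0} \\ \tau^2_{U_0} & \tau^2_{U_0} & \tau^2_{U_0}+\sigma^2_e \end{bmatrix} \qquad R_{CORR} = \begin{bmatrix} 1 & ICC & ICC \\ ICC & 1 & ICC \\ ICC & ICC & 1 \end{bmatrix}$$
- ICC = proportion of total variance that is between persons
- ICC = average correlation among occasions (in RCORR)
- ICC is a standardized way of expressing how much we need to worry about dependency due to person mean differences (i.e., ICC is an effect size for constant person dependency)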
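
A minimal sketch of the ICC and the compound symmetry matrix it implies, using hypothetical variance components; it shows that the correlation between any two occasions equals $\tau^2_{U_0} / (\tau^2_{U_0} + \sigma^2_e)$.

```python
import math


def icc(tau2_u0, sigma2_e):
    """Proportion of total variance that is between persons."""
    return tau2_u0 / (tau2_u0 + sigma2_e)


def cs_total_matrix(tau2_u0, sigma2_e, n):
    """Predicted n x n total matrix: tau2_u0 everywhere, plus sigma2_e on the diagonal."""
    return [[tau2_u0 + (sigma2_e if t == s else 0.0) for s in range(n)] for t in range(n)]


# Hypothetical variance components
tau2_u0, sigma2_e = 8.0, 2.0
r = cs_total_matrix(tau2_u0, sigma2_e, 3)
corr_12 = r[0][1] / math.sqrt(r[0][0] * r[1][1])  # Corr(y1, y2) read off the matrix
print(icc(tau2_u0, sigma2_e), corr_12)
```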

21 Counter-Intuitive: Between-Person Variance is in the numerator, but the ICC is the correlation over time!
- ICC = BTW / (BTW + within) → Large ICC → Large correlation over time
- ICC = btw / (btw + WITHIN) → Small ICC → Small correlation over time

22 BP and +WP Conditional Models
- Multiple Regression, Between-Person ANOVA: 1 PILE
  $y_i = (\beta_0 + \beta_1 X_i + \beta_2 Z_i) + e_i$
  - $e_i$: ONE residual, assumed uncorrelated with equal variance across observations (here, just persons) → BP (all) variation
- Repeated Measures, Within-Person ANOVA: 2 PILES
  $y_{ti} = (\beta_0 + \beta_1 X_i + \beta_2 Z_i) + U_{0i} + e_{ti}$
  - $U_{0i}$: a random intercept for differences in person means, assumed uncorrelated with equal variance across persons → BP (mean) variation = $\tau^2_{U_0}$, which is now what is left over after predictors
  - $e_{ti}$: a residual that represents remaining time-to-time variation, usually assumed uncorrelated with equal variance across observations (now, persons and time) → WP variation = $\sigma^2_e$, which is also now what is left over after predictors

23 ANOVA for longitudinal data?
- There are 3 possible kinds of ANOVAs we could use: Between-Persons/Groups, Univariate RM, and Multivariate RM
- NONE OF THEM ALLOW:
  - Missing occasions (they do listwise deletion due to least squares)
  - Time-varying predictors (covariates are BP predictors only)
- Each includes the same model for the means for time: all possible mean differences (so 4 parameters to get to 4 means)
  - "Saturated means model": $\beta_0 + \beta_1(T_1) + \beta_2(T_2) + \beta_3(T_3)$
  - The Time variable must be balanced and discrete in ANOVA!
- These ANOVAs differ by what they predict for the correlation across outcomes from the same person in the model for the variance, i.e., how they handle dependency due to persons, or what they say the variance and covariance of the $y_{ti}$ residuals should look like

24 1. Between-Groups ANOVA
- Uses $e_{ti}$ only (total variance = a single variance term of $\sigma^2_e$)
- Assumes no covariance at all among observations from the same person: Dependency? What dependency?
- Will usually be very, very wrong for longitudinal data
  - WP effects are tested against the wrong residual variance (significance tests will often be way too conservative)
- Will also tend to be wrong for clustered data, but less so (because the correlation among persons from the same group is not as strong as the correlation among occasions from the same person)
- Predicts a variance-covariance matrix over time (here, 4 occasions) like this, called Variance Components (R matrix is TYPE=VC on REPEATED):
$$R = \begin{bmatrix} \sigma^2_e & 0 & 0 & 0 \\ 0 & \sigma^2_e & 0 & 0 \\ 0 & 0 & \sigma^2_e & 0 \\ 0 & 0 & 0 & \sigma^2_e \end{bmatrix}$$

25 2a. Univariate Repeated Measures
- Separates total variance into two sources:
  - Between-Person (mean differences due to $U_{0i}$, or $\tau^2_{U_0}$)
  - Within-Person (remaining variance due to $e_{ti}$, or $\sigma^2_e$)
- Predicts a variance-covariance matrix over time (here, 4 occasions) like this, called Compound Symmetry (R matrix is TYPE=CS on REPEATED):
$$R = \begin{bmatrix} \tau^2_{U_0}+\sigma^2_e & \tau^2_{U_0} & \tau^2_{U_0} & \tau^2_{U_0} \\ \tau^2_{U_0} & \tau^2_{U_0}+\sigma^2_e & \tau^2_{U_0} & \tau^2_{U_0} \\ \tau^2_{U_0} & \tau^2_{U_0} & \tau^2_{U_0}+\sigma^2_e & \tau^2_{U_0} \\ \tau^2_{U_0} & \tau^2_{U_0} & \tau^2_{U_0} & \tau^2_{U_0}+\sigma^2_e \end{bmatrix}$$
- Mean differences from $U_{0i}$ are the only reason why occasions are correlated
- Will usually be at least somewhat wrong for longitudinal data
  - If people change at different rates, the variances and covariances over time have to change, too
[Figure: two panels plotted against Time]

26 The Problem with Univariate RM ANOVA
- Univ. RM ANOVA ($\tau^2_{U_0} + \sigma^2_e$) predicts compound symmetry:
  - All variances and all covariances are equal across occasions
  - In other words, the amount of error observed should be the same at any occasion, so a single, pooled error variance term makes sense
  - If not, tests of fixed effects may be biased (i.e., sometimes tested against too much or too little error, if error is not really constant over time)
- COMPOUND SYMMETRY RARELY FITS FOR LONGITUDINAL DATA
- But to get the correct tests of the fixed effects, the data must only meet a less restrictive assumption of sphericity:
  - In English: the differences between each pair of occasions have equal variances (satisfied by default with only 2 occasions)
  - If compound symmetry is satisfied, so is sphericity (but see above)
  - A significance test is provided in ANOVA for whether the data meet the sphericity assumption
  - Other RM ANOVA approaches are used when sphericity fails
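
One way to see what sphericity asks for is to compute the variance of every pairwise difference score implied by a covariance matrix, using $\text{Var}(y_t - y_s) = \text{Var}(y_t) + \text{Var}(y_s) - 2\,\text{Cov}(y_t, y_s)$. This informal sketch is not the formal sphericity test (e.g., Mauchly's test) that ANOVA software reports, and both matrices are hypothetical.

```python
from itertools import combinations


def difference_variances(cov):
    """Var(y_t - y_s) = Var(y_t) + Var(y_s) - 2*Cov(y_t, y_s) for every pair of occasions t < s."""
    return {
        (t + 1, s + 1): cov[t][t] + cov[s][s] - 2 * cov[t][s]
        for t, s in combinations(range(len(cov)), 2)
    }


# Compound symmetry (tau2 = 8, sigma2_e = 2): every difference has variance 2 * sigma2_e = 4
print(difference_variances([[10, 8, 8], [8, 10, 8], [8, 8, 10]]))
# Covariances that decay with lag: the (1, 3) difference is more variable, so not spherical
print(difference_variances([[10, 8, 6], [8, 10, 8], [6, 8, 10]]))
```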

27 The Other Repeated Measures ANOVAs
2b. Univariate RM ANOVA with sphericity corrections
- Based on $\hat{\varepsilon}$: how far off sphericity the data are (from 0 to 1, 1 = spherical)
- Applies an overall correction for model df based on the estimated $\varepsilon$, but it doesn't really address the problem that data ≠ model
3. Multivariate Repeated Measures ANOVA
- All variances and covariances are estimated separately over time (here, 4 occasions), called Unstructured (R matrix is TYPE=UN on REPEATED); it's not a model, it IS the data:
$$R = \begin{bmatrix} \sigma^2_{T_1} & \sigma_{T_{12}} & \sigma_{T_{13}} & \sigma_{T_{14}} \\ \sigma_{T_{21}} & \sigma^2_{T_2} & \sigma_{T_{23}} & \sigma_{T_{24}} \\ \sigma_{T_{31}} & \sigma_{T_{32}} & \sigma^2_{T_3} & \sigma_{T_{34}} \\ \sigma_{T_{41}} & \sigma_{T_{42}} & \sigma_{T_{43}} & \sigma^2_{T_4} \end{bmatrix}$$
- Because it can never be wrong, UN can be useful for complete and balanced longitudinal data with few occasions (e.g., 2-4)
- Parameters = $\frac{n(n+1)}{2}$ for n occasions (n variances plus all covariances), so it can be hard to estimate
- Unstructured can also be specified to include the random intercept variance $\tau^2_{U_0}$
- Every other model for the variances is nested within Unstructured (we can do model comparisons to see if all other models are NOT WORSE)

28 Summary: ANOVA approaches for longitudinal data are "one size fits most"
- Saturated Model for the Means (balanced time required)
  - All possible mean differences
  - Unparsimonious, but best-fitting (is a description, not a model)
- 3 kinds of Models for the Variances (complete data required)
  - BP ANOVA ($\sigma^2_e$ only): assumes independence and constant variance over time
  - Univ. RM ANOVA ($\tau^2_{U_0} + \sigma^2_e$): assumes constant variance and covariance
  - Multiv. RM ANOVA (whatever): no assumptions; is a description, not a model, since there is no structure that shows up in a scalar equation (i.e., the way $U_{0i} + e_{ti}$ does)
- MLM will give us more flexibility in both parts of the model:
  - Fixed effects that predict the pattern of means (polynomials, pieces)
  - Random intercepts and slopes and/or alternative covariance structures that predict intermediate patterns of variance and covariance over time

29 Review of Unconditional Multilevel Models for Longitudinal Data
Topics:
- Course (and MLM) overview
- Concepts in longitudinal multilevel modeling
- Model comparisons and significance testing
- Describing within-person fluctuation using ACS models
- Describing within-person change using random effects
- Describing nonlinear patterns of change

30 Relative Model Fit by Model Side
- Nested models (i.e., in which one is a subset of the other) can now differ from each other in two important ways:
- Model for the Means: which predictors and which fixed effects of them are included in the model
  - Does not require assessment of relative model fit using -2ΔLL (or ΔLL); can still use univariate or multivariate Wald tests for this
- Model for the Variance: what the pattern of variance and covariance of residuals from the same unit should be
  - DOES require assessment of relative model fit using -2ΔLL (or ΔLL)
  - Cannot use the Wald test p-values that show up on the output for testing significance of variances, because those p-values use a two-sided sampling distribution for what the variance could be (but variances cannot be negative, so those p-values are not valid)

31 Comparing Models for the Variance
- Requires assessing relative model fit via model comparison: how well does the model fit relative to other possible models?
- Relative fit is indexed by the overall model log-likelihood (LL):
  - Log of the likelihood for each person's outcomes given the model parameters
  - Sum the log-likelihoods across all independent persons = model LL
  - Two flavors: Maximum Likelihood (ML) or Restricted ML (REML)
- What you get for this on your output varies by software:
  - Given as -2*log likelihood (-2LL) in SAS or SPSS MIXED: -2LL gives BADNESS of fit, so smaller value = better model
  - Given as just log-likelihood (LL) in STATA MIXED and Mplus: LL gives GOODNESS of fit, so bigger value = better model

32 Comparing Models for the Variance
- Two main questions in choosing a model for the variance:
  - How does the variance of the residuals differ across time?
  - How does the covariance of the residuals differ across time?
- Nested models are compared using a "likelihood ratio test": the -2ΔLL test (aka the χ² test in SEM; the deviance difference test in MLM)
  - "fewer" = from the model with fewer parameters; "more" = from the model with more parameters
1. Calculate -2ΔLL: if given -2LL, do -2ΔLL = (-2LL_fewer) - (-2LL_more); if given LL, do -2ΔLL = -2*(LL_fewer - LL_more)
2. Calculate Δdf = (# Parms_more) - (# Parms_fewer)
3. Compare -2ΔLL to a χ² distribution with df = Δdf
4. Get the p-value from CHIDIST in Excel or the LRTEST option in STATA
The results of steps 1 and 2 must be positive values!
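
Steps 1 to 4 can be sketched in Python as follows. Because the standard library has no chi-square distribution, `chi2_sf` (a helper written for this sketch) computes the upper-tail p-value with the closed-form recursion for integer degrees of freedom; in practice you would use CHIDIST, LRTEST, or a statistics library. The -2LL values in the usage line are made up.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(chi-square(df) > x) for a positive integer df.

    Uses Q(x; k + 2) = Q(x; k) + (x/2)^(k/2) e^(-x/2) / Gamma(k/2 + 1),
    starting from Q(x; 1) = erfc(sqrt(x/2)) or Q(x; 2) = e^(-x/2).
    """
    if df < 1 or int(df) != df:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        k, q = 2, math.exp(-x / 2)
        term = (x / 2) * math.exp(-x / 2)  # recursion term at k = 2
    else:
        k, q = 1, math.erfc(math.sqrt(x / 2))
        term = math.sqrt(2 * x / math.pi) * math.exp(-x / 2)  # recursion term at k = 1
    while k < df:
        q += term
        k += 2
        term *= x / k  # term(k) = term(k - 2) * (x/2) / (k/2)
    return q


def lrt(m2ll_fewer, m2ll_more, parms_fewer, parms_more):
    """-2 delta LL test from -2LL values (as printed by SAS or SPSS MIXED).

    If your software prints LL instead (Stata, Mplus), pass -2 * LL for each model.
    Returns (-2 delta LL, delta df, p-value).
    """
    m2dll = m2ll_fewer - m2ll_more
    ddf = parms_more - parms_fewer
    if m2dll < 0 or ddf <= 0:
        raise ValueError("-2 delta LL and delta df must be positive: check which model is 'fewer'")
    return m2dll, ddf, chi2_sf(m2dll, ddf)


# Hypothetical -2LL values: a 2-parameter model vs. a 5-parameter model
print(lrt(2510.4, 2498.2, 2, 5))
```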

33 Comparing Models for the Variance
- What your p-value for the -2ΔLL test means:
  - If you ADD parameters, then your model can get better (if the -2ΔLL test is significant) or not better (not significant)
  - If you REMOVE parameters, then your model can get worse (if the -2ΔLL test is significant) or not worse (not significant)
- Nested or non-nested models can also be compared by Information Criteria that also reflect model parsimony
  - No significance tests or critical values, just "smaller is better"
  - AIC = Akaike IC = -2LL + 2*(#parameters)
  - BIC = Bayesian IC = -2LL + log(n)*(#parameters)
  - What "parameters" means depends on the flavor (except in STATA): ML = ALL parameters; REML = variance model parameters only
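
The two information criteria are simple functions of -2LL, as the sketch below shows. The -2LL values and parameter counts are hypothetical. Software differs in which n enters BIC (e.g., number of persons vs. number of observations), so this sketch takes n as an argument; under REML, count only the variance-model parameters.

```python
import math


def aic(m2ll, n_parms):
    """AIC = -2LL + 2 * (# parameters)."""
    return m2ll + 2 * n_parms


def bic(m2ll, n_parms, n):
    """BIC = -2LL + log(n) * (# parameters), using the natural log."""
    return m2ll + math.log(n) * n_parms


# Hypothetical REML fits of two models for the variance (same fixed effects), n = 200 persons
fits = {"CS": (2510.4, 2), "TOEP": (2499.8, 4)}
for name, (m2ll, k) in fits.items():
    print(name, aic(m2ll, k), round(bic(m2ll, k, 200), 2))  # smaller is better
```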

34 Flavors of Maximum Likelihood
Remember that Maximum likelihood comes in 2 flavors:
- Restricted (or residual) maximum likelihood (REML)
  - Only available for general linear models or general linear mixed models (that assume normally distributed residuals)
  - Is the same as LS given complete outcomes, but it doesn't require them
  - Estimates variances the same way as in LS (accurate): $\frac{\sum (y_i - \hat{y}_i)^2}{N - k}$
- Maximum likelihood (ML; also called FIML*)
  - Is more general: is available for the above plus for non-normal outcomes and latent variable models (CFA/SEM/IRT)
  - Is NOT the same as LS: it under-estimates variances by not accounting for the # of estimated fixed effects: $\frac{\sum (y_i - \hat{y}_i)^2}{N}$
*FI = Full information: it uses all original data (they both do)

35 Flavors of Full-Information Maximum Likelihood
- Restricted maximum likelihood (REML; used in MIXED)
  - Provides unbiased variances: $\frac{\sum (y_i - \hat{y}_i)^2}{N - k}$
  - Especially important for small N (< 100 units)
  - The -2ΔLL test cannot be used to compare models differing in fixed effects (no biggie; we can do this using univariate or multivariate Wald tests)
  - The -2ΔLL test MUST be used to compare different models for the variance
- Maximum likelihood (ML; also used in MIXED)
  - Variances (and SEs) are too small in small samples: $\frac{\sum (y_i - \hat{y}_i)^2}{N}$
  - Is the only option in most software for path models and SEM
  - The -2ΔLL test can be used to compare any nested model; it must be used to compare different models for the variance

36 ML vs. REML in a nutshell
- Remember population vs. sample formulas for calculating variance?
  - Population → ML: $\frac{\sum (y_i - \hat{y}_i)^2}{N}$
  - Sample → REML: $\frac{\sum (y_i - \hat{y}_i)^2}{N - k}$
- All comparisons must have the same N!!!
- To select, type: ML → METHOD=ML (-2 log likelihood); REML → METHOD=REML, the default (-2 res log likelihood)
- In estimating variances, it treats fixed effects as: ML → Known (df for having to also estimate fixed effects is not factored in); REML → Unknown (df for having to estimate fixed effects is factored in)
- So, in small samples, L2 variances will be: ML → Too small (less difference after N = 30-50 or so); REML → Unbiased (correct)
- But because it indexes the fit of the: ML → Entire model (means + variances); REML → Variances model only
- You can compare models differing in: ML → Fixed and/or random effects (either/both); REML → Random effects only (same fixed effects)
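
The population vs. sample denominators can be compared directly in a small sketch. For a single-level linear model with complete data, these closed forms are what ML and REML give for the residual variance; in a multilevel model the estimates come from iterative maximization, so treat this only as an illustration of why ML variances are too small when N is small. The data are hypothetical.

```python
def residual_variance(y, y_hat, k, method="REML"):
    """Residual variance with the ML (divide by N) or REML (divide by N - k) denominator."""
    ss = sum((obs - pred) ** 2 for obs, pred in zip(y, y_hat))
    n = len(y)
    if method == "ML":
        return ss / n        # treats the k fixed effects as known
    if method == "REML":
        return ss / (n - k)  # accounts for having estimated the k fixed effects
    raise ValueError("method must be 'ML' or 'REML'")


y = [2, 4, 4, 4, 5, 5, 7, 9]          # hypothetical outcomes
y_hat = [sum(y) / len(y)] * len(y)    # empty model: k = 1 fixed intercept
ml = residual_variance(y, y_hat, 1, "ML")
reml = residual_variance(y, y_hat, 1, "REML")
print(ml, reml, ml / reml)  # ML is smaller by the factor (N - k) / N
```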

37 Rules for Comparing Models
All observations must be the same across models!
Nested models? YES, can do significance tests via:
- Differing in the Means Model (Fixed) only: fixed effect p-values from ML or REML -- OR -- ML -2ΔLL only (NO REML -2ΔLL)
- Differing in the Variance Model (Random) only: NO Wald p-values; REML -2ΔLL (ML -2ΔLL is ok if big N)
- Differing in both the Means and Variance Models (Fixed and Random): ML -2ΔLL only (NO REML -2ΔLL)
Non-nested models? NO significance tests; instead see:
- Differing in the Means Model (Fixed) only: ML AIC, BIC (NO REML AIC, BIC)
- Differing in the Variance Model (Random) only: REML AIC, BIC (ML ok if big N)
- Differing in both the Means and Variance Models (Fixed and Random): ML AIC, BIC only (NO REML AIC, BIC)
Nested = one model is a direct subset of the other
Non-Nested = one model is not a direct subset of the other

38 Review of Unconditional Multilevel Models for Longitudinal Data
Topics:
- Course (and MLM) overview
- Concepts in longitudinal multilevel modeling
- Model comparisons and significance testing
- Describing within-person fluctuation using ACS models
- Describing within-person change using random effects
- Describing nonlinear patterns of change

39 Modeling Change vs. Fluctuation
[Figure: two panels, "Pure WP Change" and "Pure WP Fluctuation" (our focus right now), each plotted against Time]
- Model for the Means:
  - WP Change: describe the pattern of average change (over "time")
  - WP Fluctuation: *may* not need anything (if no systematic change)
- Model for the Variances:
  - WP Change: describe individual differences in change (random effects); this allows variances and covariances to differ over time
  - WP Fluctuation: describe the pattern of variances and covariances over time

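40 Big Picture Framework: Models for the Variance in Longitudinal Data
What is the pattern of variance and covariance over time?
- Parsimony: Compound Symmetry (CS), as in Univariate RM ANOVA
$$\begin{bmatrix} \tau^2_{U_0}+\sigma^2_e & \tau^2_{U_0} & \tau^2_{U_0} & \tau^2_{U_0} \\ \tau^2_{U_0} & \tau^2_{U_0}+\sigma^2_e & \tau^2_{U_0} & \tau^2_{U_0} \\ \tau^2_{U_0} & \tau^2_{U_0} & \tau^2_{U_0}+\sigma^2_e & \tau^2_{U_0} \\ \tau^2_{U_0} & \tau^2_{U_0} & \tau^2_{U_0} & \tau^2_{U_0}+\sigma^2_e \end{bmatrix}$$
- Good fit: Unstructured (UN), as in Multivariate RM ANOVA
$$\begin{bmatrix} \sigma^2_{T_1} & \sigma_{T_{12}} & \sigma_{T_{13}} & \sigma_{T_{14}} \\ \sigma_{T_{21}} & \sigma^2_{T_2} & \sigma_{T_{23}} & \sigma_{T_{24}} \\ \sigma_{T_{31}} & \sigma_{T_{32}} & \sigma^2_{T_3} & \sigma_{T_{34}} \\ \sigma_{T_{41}} & \sigma_{T_{42}} & \sigma_{T_{43}} & \sigma^2_{T_4} \end{bmatrix}$$
- Most useful model: likely somewhere in between! NAME...THAT...STRUCTURE!
CS and UN are just two of the many, many options available within MLM, including random effects models (for change) and alternative covariance structure models (for fluctuation).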

41 Alternative Covariance Structure Models
- Useful in predicting patterns of variance and covariance that arise from fluctuation in the outcome over time:
  - Variances: Same ("homogeneous") or different ("heterogeneous")?
  - Covariances: Same or different? If different, what is the pattern?
  - Models with heterogeneous variances predict correlation instead of covariance
- Often don't need any fixed effects for systematic effects of time in the model for the means (although this is always an empirical question)
- Limitations for most of the ACS models:
  - Require equal-interval occasions (they are based on the idea of "time lag")
  - Require balanced time across persons (no intermediate time values)
  - But do not require complete data (unlike when CS and UN are estimated via least squares in ANOVA instead of ML/REML in MLM)
- ACS models do require some new terminology to introduce...

42 Two Families of ACS Models
- So far, we've referred to the variance and covariance matrix of the longitudinal outcomes as the R matrix
  - We now refer to these as "R-only" models (use the REPEATED statement only)
  - Although the R matrix is actually specified per individual, ACS models usually assume the same R matrix for everyone
  - The R matrix is symmetric with dimensions n x n, in which n = # occasions per person (although people can have missing data, the same set of possible occasions is required across people to use most R-only models)
- 3 other matrices we'll see in "G and R combined" ACS models:
  - G = matrix of random effects variances and covariances (stay tuned)
  - Z = matrix of values for predictors that have random effects (stay tuned)
  - V = symmetric n x n matrix of total variance and covariance over time
- If the model includes random effects, then G and Z get combined with R to make V as $V = ZGZ^T + R$ (accomplished by adding the RANDOM statement)
- If the model does NOT include random effects in G, then $V = R$, so "R-only"

43 R-Only ACS Models
- The R-only models to be presented next are all specified using the REPEATED statement only (no RANDOM statement)
- They are explained by showing their predicted R matrix, which provides the total variances and covariances across occasions
  - Total variance per occasion on the diagonal
  - Total covariances across occasions on the off-diagonals
  - I've included the labels SAS uses for each parameter
- Correlations across occasions can be calculated given variances and covariances, which would be shown in the RCORR matrix (available in SAS PROC MIXED)
  - 1's on the diagonal (standardized variables), correlations on the off-diagonals
- Unstructured (TYPE=UN) will always fit best by -2LL
  - All ACS models are nested within Unstructured (UN = the data)
  - Goal: find an ACS model that is simpler but not worse fitting than UN
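
The conversion from a predicted R matrix to its RCORR counterpart can be sketched in plain Python (this is not SAS syntax); the 3-occasion matrix is hypothetical.

```python
import math


def cov_to_corr(cov):
    """RCORR-style matrix: 1's on the diagonal, Cov(t, s) / (SD_t * SD_s) off the diagonal."""
    n = len(cov)
    sds = [math.sqrt(cov[t][t]) for t in range(n)]
    return [[cov[t][s] / (sds[t] * sds[s]) for s in range(n)] for t in range(n)]


# Hypothetical 3-occasion R matrix with heterogeneous variances
for row in cov_to_corr([[4, 2, 1], [2, 9, 3], [1, 3, 16]]):
    print([round(v, 3) for v in row])
```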

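44 R-Only ACS Models: CS/CSH
Compound Symmetry: TYPE=CS
- 2 parameters: 1 residual variance ($\sigma^2_e$), 1 CS covariance across occasions
- Constant total variance: CS + $\sigma^2_e$; constant total covariance: CS
$$\begin{bmatrix} \text{CS}+\sigma^2_e & \text{CS} & \text{CS} & \text{CS} \\ \text{CS} & \text{CS}+\sigma^2_e & \text{CS} & \text{CS} \\ \text{CS} & \text{CS} & \text{CS}+\sigma^2_e & \text{CS} \\ \text{CS} & \text{CS} & \text{CS} & \text{CS}+\sigma^2_e \end{bmatrix}$$
Compound Symmetry Heterogeneous: TYPE=CSH
- n+1 parameters: n separate Var(n) total variances, 1 CSH total correlation across occasions
- Separate total variances are estimated directly
$$\begin{bmatrix} \sigma^2_{T_1} & \text{CSH}\,\sigma_{T_1}\sigma_{T_2} & \text{CSH}\,\sigma_{T_1}\sigma_{T_3} & \text{CSH}\,\sigma_{T_1}\sigma_{T_4} \\ \text{CSH}\,\sigma_{T_2}\sigma_{T_1} & \sigma^2_{T_2} & \text{CSH}\,\sigma_{T_2}\sigma_{T_3} & \text{CSH}\,\sigma_{T_2}\sigma_{T_4} \\ \text{CSH}\,\sigma_{T_3}\sigma_{T_1} & \text{CSH}\,\sigma_{T_3}\sigma_{T_2} & \sigma^2_{T_3} & \text{CSH}\,\sigma_{T_3}\sigma_{T_4} \\ \text{CSH}\,\sigma_{T_4}\sigma_{T_1} & \text{CSH}\,\sigma_{T_4}\sigma_{T_2} & \text{CSH}\,\sigma_{T_4}\sigma_{T_3} & \sigma^2_{T_4} \end{bmatrix}$$
- Still a constant total correlation: CSH (but non-constant covariances)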
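
A minimal sketch of the two predicted R matrices, assuming the parameter values are already known (here hypothetical values rather than estimates). The argument names mirror the SAS labels, but the functions are illustrative Python, not SAS.

```python
import math


def cs(cs_cov, sigma2_e, n):
    """TYPE=CS pattern: total variance CS + sigma2_e, total covariance CS."""
    return [[cs_cov + (sigma2_e if t == s else 0.0) for s in range(n)] for t in range(n)]


def csh(variances, r):
    """TYPE=CSH pattern: separate total variances, one common correlation r."""
    sds = [math.sqrt(v) for v in variances]
    n = len(variances)
    return [[variances[t] if t == s else r * sds[t] * sds[s] for s in range(n)] for t in range(n)]


print(cs(8.0, 2.0, 4))
print(csh([4.0, 9.0, 16.0, 25.0], 0.5))  # covariances differ, but every correlation is 0.5
```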

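45 R-Only ACS Models: AR1/ARH1
1st Order Auto-Regressive: TYPE=AR(1)
- 2 parameters: 1 constant total variance (mislabeled "residual"), 1 AR1 total auto-correlation $r_T$ across occasions
$$\begin{bmatrix} \sigma^2_T & r_T^1\sigma^2_T & r_T^2\sigma^2_T & r_T^3\sigma^2_T \\ r_T^1\sigma^2_T & \sigma^2_T & r_T^1\sigma^2_T & r_T^2\sigma^2_T \\ r_T^2\sigma^2_T & r_T^1\sigma^2_T & \sigma^2_T & r_T^1\sigma^2_T \\ r_T^3\sigma^2_T & r_T^2\sigma^2_T & r_T^1\sigma^2_T & \sigma^2_T \end{bmatrix}$$
1st Order Auto-Regressive Heterogeneous: TYPE=ARH(1)
- n+1 parameters: n separate Var(n) total variances, 1 ARH1 total auto-correlation $r_T$ across occasions
$$\begin{bmatrix} \sigma^2_{T_1} & r_T^1\sigma_{T_1}\sigma_{T_2} & r_T^2\sigma_{T_1}\sigma_{T_3} & r_T^3\sigma_{T_1}\sigma_{T_4} \\ r_T^1\sigma_{T_2}\sigma_{T_1} & \sigma^2_{T_2} & r_T^1\sigma_{T_2}\sigma_{T_3} & r_T^2\sigma_{T_2}\sigma_{T_4} \\ r_T^2\sigma_{T_3}\sigma_{T_1} & r_T^1\sigma_{T_3}\sigma_{T_2} & \sigma^2_{T_3} & r_T^1\sigma_{T_3}\sigma_{T_4} \\ r_T^3\sigma_{T_4}\sigma_{T_1} & r_T^2\sigma_{T_4}\sigma_{T_2} & r_T^1\sigma_{T_4}\sigma_{T_3} & \sigma^2_{T_4} \end{bmatrix}$$
In both, $r_T^1$ is the lag-1 correlation, $r_T^2$ is the lag-2 correlation, and $r_T^3$ is the lag-3 correlation.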
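
A sketch of the AR(1) and ARH(1) patterns with hypothetical parameter values. In both, the correlation between occasions t and s is $r_T^{|t-s|}$, so it shrinks geometrically with lag. The function names are illustrative, not SAS syntax.

```python
import math


def ar1(sigma2_t, r_t, n):
    """TYPE=AR(1) pattern: constant total variance, correlation r_t ** lag."""
    return [[sigma2_t * r_t ** abs(t - s) for s in range(n)] for t in range(n)]


def arh1(variances, r_t):
    """TYPE=ARH(1) pattern: separate total variances, correlation r_t ** lag."""
    sds = [math.sqrt(v) for v in variances]
    n = len(variances)
    return [[r_t ** abs(t - s) * sds[t] * sds[s] for s in range(n)] for t in range(n)]


# First row of AR(1): lag 0, 1, 2, 3 -> 10.0, 5.0, 2.5, 1.25
for row in ar1(10.0, 0.5, 4):
    print(row)
print(arh1([4.0, 9.0, 16.0, 25.0], 0.5)[0])
```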

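46 R-Only ACS Models: TOEPn/TOEPHn
Toeplitz(n): TYPE=TOEP(n)
- n parameters: 1 constant total variance (mislabeled "residual"), n-1 TOEP(lag) $c_{T_n}$ banded total covariances across occasions
$$\begin{bmatrix} \sigma^2_T & c_{T_1} & c_{T_2} & c_{T_3} \\ c_{T_1} & \sigma^2_T & c_{T_1} & c_{T_2} \\ c_{T_2} & c_{T_1} & \sigma^2_T & c_{T_1} \\ c_{T_3} & c_{T_2} & c_{T_1} & \sigma^2_T \end{bmatrix}$$
- $c_{T_1}$ is the lag-1 covariance, $c_{T_2}$ is the lag-2 covariance, $c_{T_3}$ is the lag-3 covariance
Toeplitz Heterogeneous(n): TYPE=TOEPH(n)
- n + (n-1) parameters: n separate Var(n) total variances, n-1 TOEPH(lag) $r_{T_n}$ banded total correlations across occasions
$$\begin{bmatrix} \sigma^2_{T_1} & r_{T_1}\sigma_{T_1}\sigma_{T_2} & r_{T_2}\sigma_{T_1}\sigma_{T_3} & r_{T_3}\sigma_{T_1}\sigma_{T_4} \\ r_{T_1}\sigma_{T_2}\sigma_{T_1} & \sigma^2_{T_2} & r_{T_1}\sigma_{T_2}\sigma_{T_3} & r_{T_2}\sigma_{T_2}\sigma_{T_4} \\ r_{T_2}\sigma_{T_3}\sigma_{T_1} & r_{T_1}\sigma_{T_3}\sigma_{T_2} & \sigma^2_{T_3} & r_{T_1}\sigma_{T_3}\sigma_{T_4} \\ r_{T_3}\sigma_{T_4}\sigma_{T_1} & r_{T_2}\sigma_{T_4}\sigma_{T_2} & r_{T_1}\sigma_{T_4}\sigma_{T_3} & \sigma^2_{T_4} \end{bmatrix}$$
- $r_{T_1}$ is the lag-1 correlation, $r_{T_2}$ is the lag-2 correlation, $r_{T_3}$ is the lag-3 correlation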
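
A sketch of the Toeplitz patterns with hypothetical values. It also shows how TOEP can reproduce simpler structures: banded covariances that shrink by a constant factor per lag give AR(1), and all-equal banded covariances give CS. The function names are illustrative, not SAS syntax.

```python
import math


def toep(sigma2_t, lag_covs):
    """TYPE=TOEP pattern: constant variance, one covariance per lag (lag_covs[0] = lag 1)."""
    n = len(lag_covs) + 1
    return [[sigma2_t if t == s else lag_covs[abs(t - s) - 1] for s in range(n)] for t in range(n)]


def toeph(variances, lag_corrs):
    """TYPE=TOEPH pattern: separate variances, one correlation per lag (lag_corrs[0] = lag 1)."""
    n = len(variances)
    if len(lag_corrs) != n - 1:
        raise ValueError("need one correlation for each lag 1..n-1")
    sds = [math.sqrt(v) for v in variances]
    return [[variances[t] if t == s else lag_corrs[abs(t - s) - 1] * sds[t] * sds[s]
             for s in range(n)] for t in range(n)]


as_ar1 = toep(10.0, [5.0, 2.5, 1.25])  # covariances shrink by r = 0.5 per lag: AR(1)
as_cs = toep(10.0, [8.0, 8.0, 8.0])    # all covariances equal: CS
print(as_ar1)
print(as_cs)
print(toeph([4.0, 9.0, 16.0, 25.0], [0.5, 0.25, 0.125]))
```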

47 Comparing R-only ACS Models
- Baseline models: CS = simplest, UN = most complex
  - Relative to CS, more complex models fit better or not better
  - Relative to UN, less complex models fit worse or not worse
- Other rules of nesting and model comparisons:
  - Homogeneous variance models are nested within heterogeneous variance models (e.g., CS in CSH, AR1 in ARH1, TOEP in TOEPH)
  - CS and AR1 are each nested within TOEP (i.e., TOEP can become CS or AR1 through restrictions of its covariance patterns)
  - CS and AR1 are not nested (because both have 2 parameters)
  - R-only models differ in unbounded parameters, so they can be compared using regular -2ΔLL tests (instead of mixture -2ΔLL tests)
- Good idea to start by assuming heterogeneous variances until you settle on the covariance pattern, then test whether heterogeneous variances are still necessary
- When in doubt, just compare AIC and BIC (useful even with -2ΔLL tests)
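
The parameter counts behind these comparisons (and the Δdf for a -2ΔLL test) can be tabulated with a short sketch. The nesting pairs are the ones listed above plus those that follow from them (e.g., CS within TOEPH via TOEP, and every structure within UN); the function and set names are hypothetical.

```python
def n_parms(structure, n):
    """Number of R-matrix parameters for an R-only model with n occasions."""
    counts = {
        "CS": 2,
        "CSH": n + 1,
        "AR(1)": 2,
        "ARH(1)": n + 1,
        "TOEP": n,
        "TOEPH": n + (n - 1),
        "UN": n * (n + 1) // 2,
    }
    return counts[structure]


# (fewer, more) pairs in which the first model is a restricted version of the second
NESTED = {("CS", "CSH"), ("AR(1)", "ARH(1)"), ("TOEP", "TOEPH"), ("CS", "TOEP"), ("AR(1)", "TOEP"),
          ("CS", "TOEPH"), ("AR(1)", "TOEPH"), ("CSH", "TOEPH"), ("ARH(1)", "TOEPH")}
NESTED |= {(m, "UN") for m in ("CS", "CSH", "AR(1)", "ARH(1)", "TOEP", "TOEPH")}


def delta_df(fewer, more, n):
    """Degrees of freedom for a -2 delta LL test between two nested R-only models."""
    if (fewer, more) not in NESTED:
        raise ValueError(f"{fewer} is not nested in {more}: compare AIC/BIC instead")
    return n_parms(more, n) - n_parms(fewer, n)


for pair in [("CS", "CSH"), ("CS", "TOEP"), ("TOEP", "UN")]:
    print(pair, delta_df(*pair, n=4))
```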

48 The Other Family of ACS Models
- R-only models directly predict the total variance and covariance
- G and R models indirectly predict the total variance and covariance through between-person (BP) and within-person (WP) sources of variance and covariance
- So, for this model: $y_{ti} = \beta_0 + U_{0i} + e_{ti}$
- BP = G matrix of level-2 random effect ($U_{0i}$) variances and covariances
  - Which effects get to be random (whose variances and covariances are then included in G) is specified using the RANDOM statement (always TYPE=UN)
  - Our ACS models have a random intercept only, so G is a 1x1 scalar of $\tau^2_{U_0}$
- WP = R matrix of level-1 ($e_{ti}$) residual variances and covariances
  - The n x n R matrix of residual variances and covariances that remain after controlling for random intercept variance is then modeled with REPEATED
- Total = V = n x n matrix of total variance and covariance over time that results from putting G and R together: $V = ZGZ^T + R$
  - Z is a matrix that holds the values of predictors with random effects, but Z will be an n x 1 column of 1's for now (random intercept only)

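49 A Random Intercept (G and R) Model
Total Predicted Data Matrix is called the V Matrix:
$$V = \begin{bmatrix} \tau^2_{U_0}+\sigma^2_e & \tau^2_{U_0} & \tau^2_{U_0} & \tau^2_{U_0} \\ \tau^2_{U_0} & \tau^2_{U_0}+\sigma^2_e & \tau^2_{U_0} & \tau^2_{U_0} \\ \tau^2_{U_0} & \tau^2_{U_0} & \tau^2_{U_0}+\sigma^2_e & \tau^2_{U_0} \\ \tau^2_{U_0} & \tau^2_{U_0} & \tau^2_{U_0} & \tau^2_{U_0}+\sigma^2_e \end{bmatrix}$$
- Level 2, BP Variance: Unstructured G Matrix (RANDOM statement)
  - Each person has the same 1 x 1 G matrix (no covariance across persons in a two-level model)
  - Random Intercept Variance only: $G = [\tau^2_{U_0}]$
- Level 1, WP Variance: Diagonal (VC) R Matrix (REPEATED statement)
  - Each person has the same n x n R matrix: equal variances and 0 covariances across time (no covariance across persons)
  - Residual Variance only:
$$R = \begin{bmatrix} \sigma^2_e & 0 & 0 & 0 \\ 0 & \sigma^2_e & 0 & 0 \\ 0 & 0 & \sigma^2_e & 0 \\ 0 & 0 & 0 & \sigma^2_e \end{bmatrix}$$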

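50 CS as a Random Intercept Model
RI and DIAG: the total predicted data matrix is called the V matrix, created from the G [TYPE=UN] and R [TYPE=VC] matrices as follows (Z holds a column of n 1's per person):
$$V = ZGZ^T + R = \begin{bmatrix} 1 \\ 1 \\ 1 \\ 1 \end{bmatrix} \begin{bmatrix} \tau^2_{U_0} \end{bmatrix} \begin{bmatrix} 1 & 1 & 1 & 1 \end{bmatrix} + \begin{bmatrix} \sigma^2_e & 0 & 0 & 0 \\ 0 & \sigma^2_e & 0 & 0 \\ 0 & 0 & \sigma^2_e & 0 \\ 0 & 0 & 0 & \sigma^2_e \end{bmatrix} = \begin{bmatrix} \tau^2_{U_0}+\sigma^2_e & \tau^2_{U_0} & \tau^2_{U_0} & \tau^2_{U_0} \\ \tau^2_{U_0} & \tau^2_{U_0}+\sigma^2_e & \tau^2_{U_0} & \tau^2_{U_0} \\ \tau^2_{U_0} & \tau^2_{U_0} & \tau^2_{U_0}+\sigma^2_e & \tau^2_{U_0} \\ \tau^2_{U_0} & \tau^2_{U_0} & \tau^2_{U_0} & \tau^2_{U_0}+\sigma^2_e \end{bmatrix}$$
Does the end result V look familiar? It should: it is CS, with CS = $\tau^2_{U_0}$:
$$\begin{bmatrix} \text{CS}+\sigma^2_e & \text{CS} & \text{CS} & \text{CS} \\ \text{CS} & \text{CS}+\sigma^2_e & \text{CS} & \text{CS} \\ \text{CS} & \text{CS} & \text{CS}+\sigma^2_e & \text{CS} \\ \text{CS} & \text{CS} & \text{CS} & \text{CS}+\sigma^2_e \end{bmatrix}$$
So if the R-only CS model (the simplest baseline) can be specified equivalently using G and R, can we do the same for the R-only UN model (the most complex baseline)? Absolutely!...with one small catch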
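
The matrix algebra $V = ZGZ^T + R$ can be checked with plain nested lists. This is a minimal sketch with hypothetical $\tau^2_{U_0} = 8$ and $\sigma^2_e = 2$; the result has the compound symmetry pattern.

```python
def matmul(a, b):
    """Matrix product of nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a):
    return [list(row) for row in zip(*a)]


def v_matrix(z, g, r):
    """V = Z * G * Z^T + R."""
    zgz = matmul(matmul(z, g), transpose(z))
    return [[x + y for x, y in zip(row_zgz, row_r)] for row_zgz, row_r in zip(zgz, r)]


n = 4
z = [[1.0] for _ in range(n)]   # n x 1 column of 1's (random intercept only)
g = [[8.0]]                     # hypothetical tau2_U0
r = [[2.0 if t == s else 0.0 for s in range(n)] for t in range(n)]  # hypothetical sigma2_e on the diagonal
v = v_matrix(z, g, r)
for row in v:
    print(row)  # 10.0 on the diagonal, 8.0 elsewhere: compound symmetry
```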

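51 UN via a Random Intercept Model
RI and UN(n-1): the total predicted data matrix is called the V matrix, created from the G [TYPE=UN] and R [TYPE=UN(n-1)] matrices as $V = ZGZ^T + R$, where
$$R = \begin{bmatrix} \sigma^2_{e_1} & \sigma_{e_{12}} & \sigma_{e_{13}} & 0 \\ \sigma_{e_{21}} & \sigma^2_{e_2} & \sigma_{e_{23}} & \sigma_{e_{24}} \\ \sigma_{e_{31}} & \sigma_{e_{32}} & \sigma^2_{e_3} & \sigma_{e_{34}} \\ 0 & \sigma_{e_{42}} & \sigma_{e_{43}} & \sigma^2_{e_4} \end{bmatrix}$$
$$V = \begin{bmatrix} \tau^2_{U_0}+\sigma^2_{e_1} & \tau^2_{U_0}+\sigma_{e_{12}} & \tau^2_{U_0}+\sigma_{e_{13}} & \tau^2_{U_0} \\ \tau^2_{U_0}+\sigma_{e_{21}} & \tau^2_{U_0}+\sigma^2_{e_2} & \tau^2_{U_0}+\sigma_{e_{23}} & \tau^2_{U_0}+\sigma_{e_{24}} \\ \tau^2_{U_0}+\sigma_{e_{31}} & \tau^2_{U_0}+\sigma_{e_{32}} & \tau^2_{U_0}+\sigma^2_{e_3} & \tau^2_{U_0}+\sigma_{e_{34}} \\ \tau^2_{U_0} & \tau^2_{U_0}+\sigma_{e_{42}} & \tau^2_{U_0}+\sigma_{e_{43}} & \tau^2_{U_0}+\sigma^2_{e_4} \end{bmatrix}$$
This RI and UN(n-1) model is equivalent to (makes the same predictions as) the R-only UN model, but its R matrix shows the residual (not total) covariances. Because we can't estimate all possible variances and covariances in the R matrix and also estimate the random intercept variance $\tau^2_{U_0}$ in the G matrix, we have to eliminate the last R matrix covariance by setting it to 0. Accordingly, in the RI and UN(n-1) model, the random intercept variance $\tau^2_{U_0}$ takes on the value of the covariance for the first and last occasions.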
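
Going the other way makes the "one small catch" concrete: any total UN matrix can be split into a random intercept variance and a UN(n-1) residual matrix by letting $\tau^2_{U_0}$ absorb the first-last covariance. This sketch uses a hypothetical V and assumes the banded UN(q) convention implied by these slides (UN(1) = no residual covariances, so UN(q) fixes covariances with $|t-s| \ge q$ at 0); it does not check that the resulting R is positive definite.

```python
def split_un(v):
    """Split a total UN matrix V into G = [tau2_U0] and a UN(n-1) residual matrix R.

    tau2_U0 is set to the covariance of the first and last occasions,
    which forces that residual covariance to 0.
    """
    n = len(v)
    tau2_u0 = v[0][n - 1]
    r = [[v[t][s] - tau2_u0 for s in range(n)] for t in range(n)]
    return tau2_u0, r


def is_banded(r, q):
    """True if every covariance with |t - s| >= q is 0 (the banded UN(q) pattern)."""
    n = len(r)
    return all(r[t][s] == 0 for t in range(n) for s in range(n) if abs(t - s) >= q)


v = [[10, 7, 6, 5], [7, 11, 8, 6], [6, 8, 12, 7], [5, 6, 7, 13]]  # hypothetical total UN matrix
tau2_u0, r = split_un(v)
print(tau2_u0, is_banded(r, len(v) - 1))
for row in r:
    print(row)
```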

52 Rationale for G and R ACS models
Modeling WP fluctuation traditionally involves using R only (no G): the total BP + WP variance is described by just the R matrix (so R = V). Correlations would still be expected even at distant time lags because of constant individual differences (i.e., the BP random intercept), so the resulting R-only model may require many estimated parameters. For example, with 8 time points you would probably need a 7-lag Toeplitz(8) model.
Why not take out the primary reason for the covariance across occasions (the random intercept variance) and see what's left? The random intercept variance in G controls for person mean differences; R THEN predicts just the residual variance/covariance, not the total. The resulting model may be more parsimonious (e.g., maybe only lag1 or lag2 occasions are still related after removing $\tau^2_{U_0}$ as a source of covariance). It also has the advantage of still distinguishing BP from WP variance (useful for descriptive purposes and for calculating effect sizes later).

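53 Random Intercept + Diagonal R Models
RI and DIAG: V is created from G [TYPE=UN] and R [TYPE=VC], giving homogeneous residual variances and no residual covariances. This has the same fit as R-only CS:
$$\mathbf{V}=\mathbf{Z}\mathbf{G}\mathbf{Z}^{T}+\mathbf{R}=\begin{bmatrix}\tau^2_{U_0}+\sigma^2_e&\tau^2_{U_0}&\tau^2_{U_0}&\tau^2_{U_0}\\ \tau^2_{U_0}&\tau^2_{U_0}+\sigma^2_e&\tau^2_{U_0}&\tau^2_{U_0}\\ \tau^2_{U_0}&\tau^2_{U_0}&\tau^2_{U_0}+\sigma^2_e&\tau^2_{U_0}\\ \tau^2_{U_0}&\tau^2_{U_0}&\tau^2_{U_0}&\tau^2_{U_0}+\sigma^2_e\end{bmatrix}$$
RI and DIAGH: V is created from G [TYPE=UN] and R [TYPE=UN(1)], giving heterogeneous residual variances and no residual covariances. This does NOT have the same fit as R-only CSH, because here the covariances stay constant (so the correlations change as the variances change), whereas CSH holds the correlation constant:
$$\mathbf{V}=\mathbf{Z}\mathbf{G}\mathbf{Z}^{T}+\mathbf{R}=\begin{bmatrix}\tau^2_{U_0}+\sigma^2_{e_1}&\tau^2_{U_0}&\tau^2_{U_0}&\tau^2_{U_0}\\ \tau^2_{U_0}&\tau^2_{U_0}+\sigma^2_{e_2}&\tau^2_{U_0}&\tau^2_{U_0}\\ \tau^2_{U_0}&\tau^2_{U_0}&\tau^2_{U_0}+\sigma^2_{e_3}&\tau^2_{U_0}\\ \tau^2_{U_0}&\tau^2_{U_0}&\tau^2_{U_0}&\tau^2_{U_0}+\sigma^2_{e_4}\end{bmatrix}$$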
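A small sketch of the RI and DIAGH model's implied correlations, using hypothetical values ($\tau^2_{U_0}=4$ and residual variances 2, 4, 6, 8). Every covariance stays at $\tau^2_{U_0}$, but the correlations shrink as the variances grow, which is why this model does not reproduce CSH's constant correlation.

```python
import math


def ri_diagh_v(tau2_u0, sigma2_e):
    """Total V for a random intercept plus a heterogeneous diagonal R (UN(1))."""
    n = len(sigma2_e)
    return [[tau2_u0 + (sigma2_e[i] if i == j else 0.0) for j in range(n)]
            for i in range(n)]


def to_corr(v):
    """Rescale a covariance matrix into a correlation matrix."""
    n = len(v)
    return [[v[i][j] / math.sqrt(v[i][i] * v[j][j]) for j in range(n)]
            for i in range(n)]


# Hypothetical: tau2 = 4, residual variances that increase over time
v = ri_diagh_v(4.0, [2.0, 4.0, 6.0, 8.0])
corr = to_corr(v)
print([round(c, 3) for c in corr[0]])  # [1.0, 0.577, 0.516, 0.471]
```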

54 Random Intercept + AR1 R Models
RI and AR1: V is created from G [TYPE=UN] and R [TYPE=AR(1)], giving homogeneous residual variances and auto-regressive lagged residual covariances:
$$\mathbf{V}=\mathbf{Z}\mathbf{G}\mathbf{Z}^{T}+\mathbf{R},\quad \mathbf{R}=\begin{bmatrix}\sigma^2_e&r_e\sigma^2_e&r_e^2\sigma^2_e&r_e^3\sigma^2_e\\ r_e\sigma^2_e&\sigma^2_e&r_e\sigma^2_e&r_e^2\sigma^2_e\\ r_e^2\sigma^2_e&r_e\sigma^2_e&\sigma^2_e&r_e\sigma^2_e\\ r_e^3\sigma^2_e&r_e^2\sigma^2_e&r_e\sigma^2_e&\sigma^2_e\end{bmatrix},\quad V_{ts}=\tau^2_{U_0}+r_e^{|t-s|}\sigma^2_e$$
RI and ARH1: V is created from G [TYPE=UN] and R [TYPE=ARH(1)], giving heterogeneous residual variances and auto-regressive lagged residual covariances:
$$\mathbf{R}=\begin{bmatrix}\sigma^2_{e_1}&r_e\sigma_{e_1}\sigma_{e_2}&r_e^2\sigma_{e_1}\sigma_{e_3}&r_e^3\sigma_{e_1}\sigma_{e_4}\\ r_e\sigma_{e_2}\sigma_{e_1}&\sigma^2_{e_2}&r_e\sigma_{e_2}\sigma_{e_3}&r_e^2\sigma_{e_2}\sigma_{e_4}\\ r_e^2\sigma_{e_3}\sigma_{e_1}&r_e\sigma_{e_3}\sigma_{e_2}&\sigma^2_{e_3}&r_e\sigma_{e_3}\sigma_{e_4}\\ r_e^3\sigma_{e_4}\sigma_{e_1}&r_e^2\sigma_{e_4}\sigma_{e_2}&r_e\sigma_{e_4}\sigma_{e_3}&\sigma^2_{e_4}\end{bmatrix},\quad V_{ts}=\tau^2_{U_0}+r_e^{|t-s|}\sigma_{e_t}\sigma_{e_s}$$
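A sketch of the RI and AR1 total matrix, $V_{ts}=\tau^2_{U_0}+r_e^{|t-s|}\sigma^2_e$, with hypothetical values $\tau^2_{U_0}=3$, $\sigma^2_e=5$, and $r_e=0.5$. It shows that the implied correlation decays toward $\tau^2_{U_0}/(\tau^2_{U_0}+\sigma^2_e)$ rather than toward 0, because the random intercept keeps contributing to every lag.

```python
def ri_ar1_v(tau2_u0, sigma2_e, rho, n):
    """Total V for a random intercept plus AR(1) residuals.

    V[t][s] = tau2_u0 + sigma2_e * rho ** |t - s|
    """
    return [[tau2_u0 + sigma2_e * rho ** abs(t - s) for s in range(n)]
            for t in range(n)]


def correlations_with_first(v):
    """Correlation of occasion 1 with each occasion (variances are equal here)."""
    return [v[0][s] / v[0][0] for s in range(len(v))]


# Hypothetical values: tau2 = 3, sigma2_e = 5, lag-1 residual correlation 0.5
v = ri_ar1_v(tau2_u0=3.0, sigma2_e=5.0, rho=0.5, n=4)
corr = correlations_with_first(v)
floor = 3.0 / (3.0 + 5.0)   # limit of the correlation at very distant lags
print(corr, floor)          # [1.0, 0.6875, 0.53125, 0.453125] 0.375
```

In an R-only AR1 model the same correlations would fall toward 0; the random intercept supplies the constant part that distant occasions still share.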

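55 Random Intercept + TOEP(n-1) R Models
RI and TOEP(n-1): V is created from G [TYPE=UN] and R [TYPE=TOEP(n-1)], giving homogeneous residual variances and banded residual covariances. This has the same fit as R-only TOEP(n):
$$\mathbf{R}=\begin{bmatrix}\sigma^2_e&c_{e_1}&c_{e_2}&0\\ c_{e_1}&\sigma^2_e&c_{e_1}&c_{e_2}\\ c_{e_2}&c_{e_1}&\sigma^2_e&c_{e_1}\\ 0&c_{e_2}&c_{e_1}&\sigma^2_e\end{bmatrix},\quad \mathbf{V}=\mathbf{Z}\mathbf{G}\mathbf{Z}^{T}+\mathbf{R}=\begin{bmatrix}\tau^2_{U_0}+\sigma^2_e&\tau^2_{U_0}+c_{e_1}&\tau^2_{U_0}+c_{e_2}&\tau^2_{U_0}\\ \tau^2_{U_0}+c_{e_1}&\tau^2_{U_0}+\sigma^2_e&\tau^2_{U_0}+c_{e_1}&\tau^2_{U_0}+c_{e_2}\\ \tau^2_{U_0}+c_{e_2}&\tau^2_{U_0}+c_{e_1}&\tau^2_{U_0}+\sigma^2_e&\tau^2_{U_0}+c_{e_1}\\ \tau^2_{U_0}&\tau^2_{U_0}+c_{e_2}&\tau^2_{U_0}+c_{e_1}&\tau^2_{U_0}+\sigma^2_e\end{bmatrix}$$
RI and TOEPH(n-1): V is created from G [TYPE=UN] and R [TYPE=TOEPH(n-1)], giving heterogeneous residual variances and banded residual correlations. This does NOT have the same fit as R-only TOEPH(n):
$$\mathbf{R}=\begin{bmatrix}\sigma^2_{e_1}&r_{e_1}\sigma_{e_1}\sigma_{e_2}&r_{e_2}\sigma_{e_1}\sigma_{e_3}&0\\ r_{e_1}\sigma_{e_2}\sigma_{e_1}&\sigma^2_{e_2}&r_{e_1}\sigma_{e_2}\sigma_{e_3}&r_{e_2}\sigma_{e_2}\sigma_{e_4}\\ r_{e_2}\sigma_{e_3}\sigma_{e_1}&r_{e_1}\sigma_{e_3}\sigma_{e_2}&\sigma^2_{e_3}&r_{e_1}\sigma_{e_3}\sigma_{e_4}\\ 0&r_{e_2}\sigma_{e_4}\sigma_{e_2}&r_{e_1}\sigma_{e_4}\sigma_{e_3}&\sigma^2_{e_4}\end{bmatrix},\quad V_{ts}=\tau^2_{U_0}+R_{ts}$$
Because of $\tau^2_{U_0}$, the highest-lag covariance in R must be set to 0 for the model to be identified.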
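A sketch of a banded Toeplitz R for 4 occasions, assuming the band counting used on these slides (TOEP(q) keeps lags 0 through q-1 and fixes higher lags at 0). The hypothetical helper takes the free lag covariances directly, and all values are made up. Adding $\tau^2_{U_0}$ to the RI and TOEP(n-1) residual matrix gives a total V that is itself an unrestricted 4-occasion Toeplitz matrix, which is why it matches R-only TOEP(n).

```python
def toep_r(sigma2_e, lag_covs, n):
    """Banded Toeplitz R: lag_covs[k - 1] is the residual covariance at lag k.

    Lags beyond len(lag_covs) are fixed at 0.
    """
    def cov(lag):
        if lag == 0:
            return sigma2_e
        return lag_covs[lag - 1] if lag <= len(lag_covs) else 0.0

    return [[cov(abs(t - s)) for s in range(n)] for t in range(n)]


n = 4
tau2 = 3.0                                           # hypothetical G value
r = toep_r(sigma2_e=5.0, lag_covs=[2.0, 1.0], n=n)   # TOEP(n-1): lags 1..n-2 free
v = [[tau2 + r[t][s] for s in range(n)] for t in range(n)]
print(r[0])  # [5.0, 2.0, 1.0, 0.0]
print(v[0])  # [8.0, 5.0, 4.0, 3.0]

# TOEP(2): only the lag-1 residual covariance is estimated
r_lag1 = toep_r(sigma2_e=5.0, lag_covs=[2.0], n=n)
print(r_lag1[0])  # [5.0, 2.0, 0.0, 0.0]
```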

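56 Random Intercept + TOEP(2) R Models
RI and TOEP(2): V is created from G [TYPE=UN] and R [TYPE=TOEP(2)], giving homogeneous residual variances and a banded residual covariance at lag1 only:
$$\mathbf{R}=\begin{bmatrix}\sigma^2_e&c_{e_1}&0&0\\ c_{e_1}&\sigma^2_e&c_{e_1}&0\\ 0&c_{e_1}&\sigma^2_e&c_{e_1}\\ 0&0&c_{e_1}&\sigma^2_e\end{bmatrix},\quad \mathbf{V}=\mathbf{Z}\mathbf{G}\mathbf{Z}^{T}+\mathbf{R}=\begin{bmatrix}\tau^2_{U_0}+\sigma^2_e&\tau^2_{U_0}+c_{e_1}&\tau^2_{U_0}&\tau^2_{U_0}\\ \tau^2_{U_0}+c_{e_1}&\tau^2_{U_0}+\sigma^2_e&\tau^2_{U_0}+c_{e_1}&\tau^2_{U_0}\\ \tau^2_{U_0}&\tau^2_{U_0}+c_{e_1}&\tau^2_{U_0}+\sigma^2_e&\tau^2_{U_0}+c_{e_1}\\ \tau^2_{U_0}&\tau^2_{U_0}&\tau^2_{U_0}+c_{e_1}&\tau^2_{U_0}+\sigma^2_e\end{bmatrix}$$
RI and TOEPH(2): V is created from G [TYPE=UN] and R [TYPE=TOEPH(2)], giving heterogeneous residual variances and a banded residual correlation at lag1 only:
$$\mathbf{R}=\begin{bmatrix}\sigma^2_{e_1}&r_{e_1}\sigma_{e_1}\sigma_{e_2}&0&0\\ r_{e_1}\sigma_{e_2}\sigma_{e_1}&\sigma^2_{e_2}&r_{e_1}\sigma_{e_2}\sigma_{e_3}&0\\ 0&r_{e_1}\sigma_{e_3}\sigma_{e_2}&\sigma^2_{e_3}&r_{e_1}\sigma_{e_3}\sigma_{e_4}\\ 0&0&r_{e_1}\sigma_{e_4}\sigma_{e_3}&\sigma^2_{e_4}\end{bmatrix},\quad V_{ts}=\tau^2_{U_0}+R_{ts}$$
Now we can test the need for residual covariances at higher lags.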

57 Map of R-only and G and R ACS Models
In the original figure, arrows indicate nesting (the arrow's end is the more complex model), and "=" marks models that make the same predictions. Models are listed below by number of variance parameters, for n occasions.
Homogeneous variance over time:
2 parameters: R-only Compound Symmetry = Random Intercept & Diagonal R
2 parameters: R-only First-Order Auto-Regressive
3 parameters: Random Intercept & First-Order Auto-Regressive R; Random Intercept & 1-Lag Toeplitz R
4 parameters: Random Intercept & 2-Lag Toeplitz R
n parameters: R-only n-1 Lag Toeplitz = Random Intercept & n-2 Lag Toeplitz R
Heterogeneous (Het.) variance over time:
n+1 parameters: R-only Compound Symmetry Het.; Random Intercept & Diagonal Het. R; R-only First-Order Auto-Regressive Het.
n+2 parameters: Random Intercept & First-Order Auto-Regressive Het. R; Random Intercept & 1-Lag Toeplitz Het. R
n+3 parameters: Random Intercept & 2-Lag Toeplitz Het. R
n+n-1 parameters: Het. R-only n-1 Lag Toeplitz; Random Intercept & n-2 Lag Toeplitz Het. R
Both families lead to the most complex model, with n*(n+1)/2 parameters: R-only n-order Unstructured = Random Intercept & n-1 order Unstructured R.
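The parameter counts in the map follow from a simple tally: 1 for $\tau^2_{U_0}$ (if there is a random intercept), 1 or n residual variances (homogeneous or heterogeneous), plus one term per estimated lag covariance or correlation. A sketch of that tally, where the model labels and the helper name are shorthand invented for this example:

```python
def n_params(random_intercept, heterogeneous, n_lag_terms, n):
    """Variance parameters = tau2 (if any) + residual variances + lag terms."""
    variances = n if heterogeneous else 1
    return int(random_intercept) + variances + n_lag_terms


def param_table(n):
    # (label, random intercept?, heterogeneous?, number of lag terms)
    specs = [
        ("R-only CS = RI & Diagonal R", True, False, 0),
        ("R-only AR1", False, False, 1),
        ("RI & AR1 R", True, False, 1),
        ("RI & 1-lag Toeplitz R", True, False, 1),
        ("RI & 2-lag Toeplitz R", True, False, 2),
        ("R-only (n-1)-lag Toeplitz = RI & (n-2)-lag Toeplitz R", True, False, n - 2),
        ("R-only CSH", False, True, 1),
        ("RI & Diagonal Het. R", True, True, 0),
        ("Het. R-only AR1", False, True, 1),
        ("RI & Het. AR1 R", True, True, 1),
        ("RI & 1-lag Het. Toeplitz R", True, True, 1),
        ("RI & 2-lag Het. Toeplitz R", True, True, 2),
        ("RI & (n-2)-lag Het. Toeplitz R", True, True, n - 2),
        ("Het. R-only (n-1)-lag Toeplitz", False, True, n - 1),
        # RI & UN(n-1): all n(n-1)/2 covariances except the one at lag n-1
        ("R-only UN = RI & UN(n-1) R", True, True, n * (n - 1) // 2 - 1),
    ]
    return {label: n_params(ri, het, lags, n) for label, ri, het, lags in specs}


for label, count in param_table(8).items():
    print(f"{count:>3}  {label}")
```

With 8 occasions, the R-only 7-lag Toeplitz model needs 8 parameters, while a random intercept plus a 2-lag Toeplitz R needs only 4.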

58 Stuff to Watch Out For
If using a random intercept, don't forget to drop 1 parameter in:
n-1 order UN R: you can't get all possible elements in R, plus $\tau^2_{U_0}$ in G.
TOEP(n-1) R: you have to eliminate the last lag covariance.
If using a random intercept, you can't do RI + CS R: you can't get a constant in R and then another constant in G.
You can often test whether the random intercept helps (e.g., AR1 is nested within RI + AR1).
If time is treated as continuous in the fixed effects, you will need another variable for time that is categorical to use in the syntax: continuous time on the MODEL statement, categorical time on the CLASS and REPEATED statements.
Most alternative covariance structure models assume time is balanced across persons, with equal intervals across occasions. If not, holding correlations of the same lag equal doesn't make sense. Other structures can be used for unbalanced time: SP(POW)(time) = AR1 for unbalanced time (see the SAS REPEATED statement documentation for others).
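A sketch of the spatial power idea behind SP(POW)(time), assuming (as we read the SAS documentation) that the correlation is $\rho^{d}$ with d the distance between two time values. With integer times one unit apart this reproduces AR1; with unequal gaps the exponent is the actual elapsed time. The value $\rho=0.6$ and the time values are hypothetical.

```python
def sp_pow_corr(times, rho):
    """Spatial power correlation matrix: corr(t, s) = rho ** |time_t - time_s|.

    With equally spaced times one unit apart this reduces to AR(1).
    """
    return [[rho ** abs(a - b) for b in times] for a in times]


# Hypothetical correlation of 0.6 per unit of time
balanced = sp_pow_corr([0, 1, 2, 3], rho=0.6)      # same as AR(1)
unbalanced = sp_pow_corr([0, 0.5, 2, 5], rho=0.6)  # exponent = actual time gap
print(round(balanced[0][1], 3), round(unbalanced[0][1], 3))  # 0.6 0.775
```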

59 Summary: Two Families of ACS Models
R-only models: specify the R model on the REPEATED statement without any random effects variances in G (so no RANDOM statement is used). These include UN, CS, CSH, AR1, ARH1, TOEP(n), and TOEPH(n), among others. Total variance and total covariance are kept in R, so R = V. Other than CS, these models do not partition total variance into BP vs. WP.
G and R combined models (G and R combine to form V): specify the random intercept variance $\tau^2_{U_0}$ in G using the RANDOM statement, then specify the R model using the REPEATED statement. The G matrix = Level-2 BP variance and covariance due to $U_{0i}$, so R = Level-1 WP variance and covariance of the $e_{ti}$ residuals. R models what's left after accounting for mean differences between persons (via the random intercept variance $\tau^2_{U_0}$ in G).

60 Syntax for Models for the Variance
Does your model include random intercept variance (for $U_{0i}$)? Use the RANDOM statement → G matrix. The random intercept models BP interindividual differences in mean Y.
What about residual variance (for $e_{ti}$)? Use the REPEATED statement → R matrix.
WITHOUT a RANDOM statement: R is BP and WP variance together = total variances and covariances (to model all variation, so R = V).
WITH a RANDOM statement: R is WP variance only = residual variances and covariances, to model WP intraindividual variation. G and R put back together = V matrix of total variances and covariances.
The REPEATED statement is always there implicitly: any model always has at least one residual variance in the R matrix. But the RANDOM statement is only there if you write it: a G matrix isn't always necessary (you don't always need a random intercept).

61 Wrapping Up: ACS Models
Even if you expect only fluctuation over time rather than change, you still should be concerned about accurately predicting the variances and covariances across occasions.
The baseline models (from ANOVA least squares) are CS and UN. Compound Symmetry: equal variance and covariance over time. Unstructured: all variances and covariances estimated separately. Estimating CS and UN via ML or REML allows missing data.
MLM gives us choices in the middle. Goal: get as close to the fit of UN as possible, as parsimoniously as possible.
R-only: structure the TOTAL variation in one matrix (R only).
G+R: put the constant covariance due to the random intercept in G, then structure the RESIDUAL covariance in R (so that G and R combine into V, the TOTAL).

62 Review of Unconditional Multilevel Models for Longitudinal Data Topics: Course (and MLM) overview Concepts in longitudinal multilevel modeling Model comparisons and significance testing Describing within-person fluctuation using ACS models Describing within-person change using random effects Describing nonlinear patterns of change

63 Modeling Change vs. Fluctuation
[Figure: two panels of outcome plotted against time. Pure WP change: our focus when using random effects models. Pure WP fluctuation: uses alternative covariance structure models instead.]
Model for the Means: for WP change, describe the pattern of average change (over "time"); for WP fluctuation, you *may* not need anything (if there is no systematic change).
Model for the Variance: for WP change, describe individual differences in change (random effects), which allows variances and covariances to differ over time; for WP fluctuation, describe the pattern of variances and covariances over time.

64 The Big Picture of Longitudinal Data: Models for the Means
What kind of change occurs on average over time? There are two baseline models to consider:
Empty: only a fixed intercept (predicts no change).
Saturated: all occasion mean differences from time 0 (ANOVA model that uses # fixed effects = n). *** This may not be possible in unbalanced data.
[Figure: a continuum from parsimony to good fit.] Empty model: predicts NO change over time, with 1 fixed effect. In-between options: polynomial slopes, piecewise slopes, nonlinear models ("Name that Trajectory!"). Saturated means: reproduce the mean at each occasion, with # fixed effects = # occasions.

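65 The Big Picture of Longitudinal Data: Models for the Variance
What is the pattern of variance and covariance over time? [Figure: a continuum from parsimony to good fit.]
Parsimony end: Compound Symmetry (CS), as in univariate RM ANOVA:
$$\begin{bmatrix}\tau^2_{U_0}+\sigma^2_e&\tau^2_{U_0}&\tau^2_{U_0}&\tau^2_{U_0}\\ \tau^2_{U_0}&\tau^2_{U_0}+\sigma^2_e&\tau^2_{U_0}&\tau^2_{U_0}\\ \tau^2_{U_0}&\tau^2_{U_0}&\tau^2_{U_0}+\sigma^2_e&\tau^2_{U_0}\\ \tau^2_{U_0}&\tau^2_{U_0}&\tau^2_{U_0}&\tau^2_{U_0}+\sigma^2_e\end{bmatrix}$$
Good-fit end: Unstructured (UN), as in multivariate RM ANOVA:
$$\begin{bmatrix}\sigma^2_{T_1}&\sigma_{T_{12}}&\sigma_{T_{13}}&\sigma_{T_{14}}\\ \sigma_{T_{21}}&\sigma^2_{T_2}&\sigma_{T_{23}}&\sigma_{T_{24}}\\ \sigma_{T_{31}}&\sigma_{T_{32}}&\sigma^2_{T_3}&\sigma_{T_{34}}\\ \sigma_{T_{41}}&\sigma_{T_{42}}&\sigma_{T_{43}}&\sigma^2_{T_4}\end{bmatrix}$$
Most useful model: likely somewhere in between! ("Name...that Structure!") CS and UN are just two of the many, many options available within MLM, including random effects models (for change) and alternative covariance structure models (for fluctuation).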

66 Empty + Within-Person Model
[Figure: outcomes by person, illustrating the two sources of variance in Y.]
Variance of Y has 2 sources:
Level 2 random intercept variance (of $U_{0i}$, as $\tau^2_{U_0}$): between-person variance, i.e., differences from the GRAND mean (INTER-individual differences).
Level 1 residual variance (of $e_{ti}$, as $\sigma^2_e$): within-person variance, i.e., differences from one's OWN mean (INTRA-individual differences).

67 Empty Means, Random Intercept Model
GLM empty model: $y_i=\beta_0+e_i$
MLM empty model:
Level 1: $y_{ti}=\beta_{0i}+e_{ti}$
Level 2: $\beta_{0i}=\gamma_{00}+U_{0i}$
Composite equation: $y_{ti}=(\gamma_{00}+U_{0i})+e_{ti}$
3 total parameters. Model for the Means (1): fixed intercept $\gamma_{00}$. Model for the Variance (2): Level-1 variance of $e_{ti}$ ($\sigma^2_e$) and Level-2 variance of $U_{0i}$ ($\tau^2_{U_0}$).
Residual = time-specific deviation from the individual's predicted outcome. Fixed intercept = grand mean (because there are no predictors yet). Random intercept = individual-specific deviation from the predicted intercept.

68 Augmenting the empty means, random intercept model with time
2 questions about the possible effects of time:
1. Is there an effect of time on average? Is the line describing the sample means not flat? → Significant FIXED effect of time
2. Does the average effect of time vary across individuals? Does each individual need his or her own line? → Significant RANDOM effect of time

69 Fixed and Random Effects of Time
(Note: The intercept is random in every figure.)
[Figure: four panels of individual trajectories over time: No Fixed, No Random; Yes Fixed, No Random; No Fixed, Yes Random; Yes Fixed, Yes Random.]

70 Fixed Linear Time, Random Intercept Model (4 total parameters: effect of time is FIXED only)
Multilevel Model
Level 1: $y_{ti}=\beta_{0i}+\beta_{1i}(Time_{ti})+e_{ti}$
Level 2: $\beta_{0i}=\gamma_{00}+U_{0i}$; $\beta_{1i}=\gamma_{10}$
Composite Model: $y_{ti}=(\gamma_{00}+U_{0i})+(\gamma_{10})(Time_{ti})+e_{ti}$
Residual = time-specific deviation from the individual's predicted outcome (estimated variance $\sigma^2_e$). Fixed intercept = predicted mean outcome at time 0. Fixed linear time slope = predicted mean rate of change per unit time. Random intercept = individual-specific deviation from the fixed intercept (estimated variance $\tau^2_{U_0}$).
Because the effect of time is fixed, everyone is predicted to change at exactly the same rate.

71 Explained Variance from Fixed Linear Time
The most common measure of effect size in MLM is Pseudo-R², which is supposed to be the variance accounted for by predictors. Multiple piles of variance mean multiple possible values of pseudo-R² (it can be calculated per variance component or per model level).
A fixed linear effect of time will reduce the level-1 residual variance $\sigma^2_e$ in R. By how much is the residual variance reduced? Comparing the model with fewer predictors to the model with more:
$$\text{Pseudo }R^2_e=\frac{\sigma^2_{e,\text{fewer}}-\sigma^2_{e,\text{more}}}{\sigma^2_{e,\text{fewer}}}$$
If time varies between persons, then the level-2 random intercept variance $\tau^2_{U_0}$ in G may also be reduced:
$$\text{Pseudo }R^2_{U_0}=\frac{\tau^2_{U_0,\text{fewer}}-\tau^2_{U_0,\text{more}}}{\tau^2_{U_0,\text{fewer}}}$$
But you are likely to see a (net) INCREASE in $\tau^2_{U_0}$ instead. Here's why:
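A minimal sketch of the pseudo-R² formula for any one variance component. The example numbers (a residual variance dropping from 7.06 to 2.17 after adding fixed linear time) are used only as an illustration; a negative result signals that the component grew rather than shrank.

```python
def pseudo_r2(var_fewer, var_more):
    """Proportion reduction in one variance component.

    var_fewer: estimate from the model with fewer predictors
    var_more:  estimate from the model with more predictors
    A negative value means the component got larger, not smaller.
    """
    return (var_fewer - var_more) / var_fewer


print(round(pseudo_r2(7.06, 2.17), 2))  # 0.69
```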

72 Increases in Random Intercept Variance
The level-2 random intercept variance $\tau^2_{U_0}$ will often increase as a consequence of reducing the level-1 residual variance $\sigma^2_e$.
The observed level-2 $\tau^2_{U_0}$ is NOT just between-person variance; it also has a small part of within-person variance (level-1 $\sigma^2_e$):
$$\text{Observed }\tau^2_{U_0}=\text{True }\tau^2_{U_0}+\frac{\sigma^2_e}{n}$$
As the number of occasions n increases, this bias from level-1 $\sigma^2_e$ is minimized. Likelihood-based estimates of the true $\tau^2_{U_0}$ use $\sigma^2_e/n$ as a correction factor:
$$\text{True }\tau^2_{U_0}=\text{Observed }\tau^2_{U_0}-\frac{\sigma^2_e}{n}$$
For example: observed level-2 $\tau^2_{U_0}$ = 4.65, level-1 $\sigma^2_e$ = 7.06, n = 4.
True $\tau^2_{U_0}$ = 4.65 - (7.06/4) ≈ 2.88 in the empty means model.
Adding a fixed linear time slope reduces $\sigma^2_e$ from 7.06 to 2.17 (pseudo-R² = .69).
But now true $\tau^2_{U_0}$ = 4.65 - (2.17/4) ≈ 4.1 in the fixed linear time model.
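A sketch of the correction $\text{True }\tau^2_{U_0}=\text{Observed }\tau^2_{U_0}-\sigma^2_e/n$, plugging in an observed value of 4.65 with n = 4 and residual variances of 7.06 (empty means) and 2.17 (fixed linear time). It also shows that the pseudo-R² for $\tau^2_{U_0}$ comes out negative, i.e., a net increase. The function name is illustrative.

```python
def true_tau2(observed_tau2, sigma2_e, n):
    """Remove the within-person share: true = observed - sigma2_e / n."""
    return observed_tau2 - sigma2_e / n


empty_means = true_tau2(4.65, 7.06, 4)    # about 2.885
fixed_linear = true_tau2(4.65, 2.17, 4)   # about 4.1075

# Pseudo-R2 for the random intercept variance: negative means tau2 went UP
change = (empty_means - fixed_linear) / empty_means
print(round(empty_means, 2), round(fixed_linear, 2), round(change, 2))
```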

73 Random Intercept Models Imply...
People differ from each other systematically in only ONE way, in the intercept ($U_{0i}$), which implies ONE kind of BP variance, which translates to ONE source of person dependency (covariance or correlation in the outcomes from the same person).
If so, after controlling for BP intercept differences (by estimating the variance of $U_{0i}$ as $\tau^2_{U_0}$ in the G matrix), the $e_{ti}$ residuals (whose variance and covariance are estimated in the R matrix) should be uncorrelated, with homogeneous variance across time, as shown:
Level-2 G matrix (RANDOM TYPE=UN): $\mathbf{G}=[\tau^2_{U_0}]$
Level-1 R matrix (REPEATED TYPE=VC):
$$\mathbf{R}=\begin{bmatrix}\sigma^2_e&0&0&0\\0&\sigma^2_e&0&0\\0&0&\sigma^2_e&0\\0&0&0&\sigma^2_e\end{bmatrix}$$
The G and R matrices combine to create a total V matrix with a CS pattern:
$$\mathbf{V}=\begin{bmatrix}\tau^2_{U_0}+\sigma^2_e&\tau^2_{U_0}&\tau^2_{U_0}&\tau^2_{U_0}\\ \tau^2_{U_0}&\tau^2_{U_0}+\sigma^2_e&\tau^2_{U_0}&\tau^2_{U_0}\\ \tau^2_{U_0}&\tau^2_{U_0}&\tau^2_{U_0}+\sigma^2_e&\tau^2_{U_0}\\ \tau^2_{U_0}&\tau^2_{U_0}&\tau^2_{U_0}&\tau^2_{U_0}+\sigma^2_e\end{bmatrix}$$

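74 Matrices in a Random Intercept Model
The total predicted data matrix, V, is created from the G [TYPE=UN] and R [TYPE=VC] matrices as follows:
$$\mathbf{V}=\mathbf{Z}\mathbf{G}\mathbf{Z}^{T}+\mathbf{R}=\begin{bmatrix}1\\1\\1\\1\end{bmatrix}\begin{bmatrix}\tau^2_{U_0}\end{bmatrix}\begin{bmatrix}1&1&1&1\end{bmatrix}+\begin{bmatrix}\sigma^2_e&0&0&0\\0&\sigma^2_e&0&0\\0&0&\sigma^2_e&0\\0&0&0&\sigma^2_e\end{bmatrix}=\begin{bmatrix}\tau^2_{U_0}+\sigma^2_e&\tau^2_{U_0}&\tau^2_{U_0}&\tau^2_{U_0}\\ \tau^2_{U_0}&\tau^2_{U_0}+\sigma^2_e&\tau^2_{U_0}&\tau^2_{U_0}\\ \tau^2_{U_0}&\tau^2_{U_0}&\tau^2_{U_0}+\sigma^2_e&\tau^2_{U_0}\\ \tau^2_{U_0}&\tau^2_{U_0}&\tau^2_{U_0}&\tau^2_{U_0}+\sigma^2_e\end{bmatrix}$$
VCORR then provides the intraclass correlation, calculated as $ICC=\tau^2_{U_0}/(\tau^2_{U_0}+\sigma^2_e)$, which assumes a constant correlation over time:
$$\begin{bmatrix}1&ICC&ICC&ICC\\ ICC&1&ICC&ICC\\ ICC&ICC&1&ICC\\ ICC&ICC&ICC&1\end{bmatrix}$$
For any random effects model: G matrix = BP variances/covariances; R matrix = WP variances/covariances; Z matrix = values of predictors with random effects (just the intercept here), which can vary per person; V matrix = total variance/covariance.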
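A sketch of what a VCORR-style rescaling does in the random intercept model: divide each element of V by the square root of the product of its two diagonal elements. With hypothetical estimates $\tau^2_{U_0}=4$ and $\sigma^2_e=6$, every off-diagonal correlation equals the ICC. The helper names are illustrative, not SAS output.

```python
import math


def ri_v(tau2_u0, sigma2_e, n):
    """V = Z G Z^T + R with an intercept-only Z and a diagonal R."""
    return [[tau2_u0 + (sigma2_e if t == s else 0.0) for s in range(n)]
            for t in range(n)]


def vcorr(v):
    """Rescale a covariance matrix into a correlation matrix."""
    n = len(v)
    return [[v[t][s] / math.sqrt(v[t][t] * v[s][s]) for s in range(n)]
            for t in range(n)]


tau2, sigma2 = 4.0, 6.0          # hypothetical estimates
corr = vcorr(ri_v(tau2, sigma2, n=4))
icc = tau2 / (tau2 + sigma2)
print(icc)                       # 0.4
```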

75 Random Linear Time Model (6 total parameters)
Multilevel Model
Level 1: $y_{ti}=\beta_{0i}+\beta_{1i}(Time_{ti})+e_{ti}$
Level 2: $\beta_{0i}=\gamma_{00}+U_{0i}$; $\beta_{1i}=\gamma_{10}+U_{1i}$
Composite Model: $y_{ti}=(\gamma_{00}+U_{0i})+(\gamma_{10}+U_{1i})(Time_{ti})+e_{ti}$
Residual = time-specific deviation from the individual's predicted outcome (estimated variance $\sigma^2_e$). Fixed intercept = predicted mean outcome at time 0. Fixed linear time slope = predicted mean rate of change per unit time. Random intercept = individual-specific deviation from the fixed intercept at time 0 (estimated variance $\tau^2_{U_0}$). Random linear time slope = individual-specific deviation from the fixed linear time slope (estimated variance $\tau^2_{U_1}$). The model also has an estimated covariance of random intercepts and slopes, $\tau_{U_{01}}$.
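A sketch of the random linear time model's implied V, with Z now holding a column of ones and a column of time values. Each total variance works out to $\tau^2_{U_0}+2(Time)\tau_{U_{01}}+Time^2\tau^2_{U_1}+\sigma^2_e$, so, unlike the random intercept model, the variance can change over time. All parameter values are hypothetical, and R is assumed diagonal.

```python
def random_slope_v(tau2_u0, tau2_u1, tau_u01, sigma2_e, times):
    """V = Z G Z^T + R with Z = [1, time] per occasion and a diagonal R."""
    g = [[tau2_u0, tau_u01],
         [tau_u01, tau2_u1]]               # 2 x 2 G: intercept and slope
    z = [[1.0, t] for t in times]          # n x 2 Z matrix
    n = len(times)
    v = []
    for i in range(n):
        row = []
        for j in range(n):
            # (Z G Z^T)[i][j] = sum over a, b of z[i][a] * g[a][b] * z[j][b]
            zgz = sum(z[i][a] * g[a][b] * z[j][b]
                      for a in range(2) for b in range(2))
            row.append(zgz + (sigma2_e if i == j else 0.0))
        v.append(row)
    return v


# Hypothetical values: negative intercept-slope covariance
v = random_slope_v(tau2_u0=4.0, tau2_u1=1.0, tau_u01=-0.5, sigma2_e=2.0,
                   times=[0, 1, 2, 3])
print([v[t][t] for t in range(4)])  # [6.0, 6.0, 8.0, 12.0]
```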

76 Random Linear Time Model
$y_{ti}=(\gamma_{00}+U_{0i})+(\gamma_{10}+U_{1i})(Time_{ti})+e_{ti}$: fixed intercept plus random intercept deviation, plus (fixed slope plus random slope deviation) times time, plus error for person i at time t.
[Figure: outcome by time, showing the mean linear trajectory and one person's (P) linear trajectory, with $\gamma_{00}$ = 10, $\gamma_{10}$ = 6, $U_{0i}$ = -4, $U_{1i}$ = +2, and $e_{ti}$ = -1.]
6 parameters. Fixed effects: $\gamma_{00}$ (intercept) and $\gamma_{10}$ (slope). Random effects variances: $\tau^2_{U_0}$ (intercept variance), $\tau^2_{U_1}$ (slope variance), and $\tau_{U_{01}}$ (intercept-slope covariance). Residual variance: $\sigma^2_e$.
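A sketch of how one person's predicted trajectory is built from the composite equation. The values $\gamma_{00}=10$, $\gamma_{10}=6$, and $U_{0i}=-4$ follow the slide; $U_{1i}=2$, the time point for the residual, and the residual value itself are hypothetical choices for the illustration.

```python
def predicted(gamma00, gamma10, u0i=0.0, u1i=0.0, time=0.0):
    """Composite prediction without the residual: (g00 + U0i) + (g10 + U1i) * time."""
    return (gamma00 + u0i) + (gamma10 + u1i) * time


for t in range(4):
    mean_line = predicted(10, 6, time=t)                   # fixed effects only
    person_line = predicted(10, 6, u0i=-4, u1i=2, time=t)  # person-specific line
    print(t, mean_line, person_line)

# An observed score adds the time-specific residual (hypothetical e_ti = -1 at time 1)
observed = predicted(10, 6, u0i=-4, u1i=2, time=1) + (-1)
print(observed)
```

The person starts 4 units below the mean line but climbs 2 units faster per unit time, so the two lines cross at time 2.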

77 Quantification of Random Effects Variances
We can test whether a random effect variance is significant, but the variance estimates are not likely to have inherent meaning. E.g., I have a significant fixed linear time effect of $\gamma_{10}$ = 1.72, so people increase by 1.72 per unit time on average. I also have a significant random linear time slope variance of $\tau^2_{U_1}$ = 0.91, so people need their own slopes (people change differently). But how much is a variance of 0.91, really?
95% random effects confidence intervals can tell you. They can be calculated for each effect that is random in your model, and they provide a range around the fixed effect within which 95% of your sample is predicted to fall, based on your random effect variance:
$$\text{Random effect 95\% CI}=\text{fixed effect}\pm1.96\sqrt{\text{random variance}}$$
$$\text{Linear time slope 95\% CI}=\gamma_{10}\pm1.96\sqrt{\tau^2_{U_1}}=1.72\pm1.96\sqrt{0.91}=-0.15\text{ to }3.59$$
So although people improve on average, individual slopes are predicted to range from -0.15 to 3.59 (so some people may actually decline).
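A minimal sketch of the random effect 95% CI, fixed effect ± 1.96 × √(random variance), plugging in $\gamma_{10}$ = 1.72 and $\tau^2_{U_1}$ = 0.91. The function name and its `z` parameter are illustrative choices.

```python
import math


def random_effect_ci(fixed, variance, z=1.96):
    """Range within which about 95% of individual effects are predicted to fall."""
    half_width = z * math.sqrt(variance)
    return fixed - half_width, fixed + half_width


low, high = random_effect_ci(1.72, 0.91)
print(round(low, 2), round(high, 2))  # -0.15 3.59
```

Note that this interval describes the spread of person-specific slopes implied by the variance estimate, not the uncertainty of the fixed effect itself.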
