Longitudinal Data Analysis


1 Longitudinal Data Analysis
Mike Allerhand. This document has been produced for the CCACE short course: Longitudinal Data Analysis. No part of this document may be reproduced, in any form or by any means, without permission in writing from the CCACE. The CCACE is jointly funded by the University of Edinburgh and four of the United Kingdom's research councils (BBSRC, EPSRC, ESRC, MRC) under the Lifelong Health and Wellbeing initiative. The document was generated using the R knitr package and typeset using MiKTeX (LaTeX for Windows) with the beamer and TikZ packages. 2016, Dr Mike Allerhand, CCACE Statistician.

2 Straight-line growth
[Figure: the same five measurements shown in wide format (one row, five occasions), in long format (columns j, y, x), and plotted as y against time x.]
Most longitudinal analysis programs require data in long format. Wide format is one row per case, and each row is a complete record. Here there's just one case: measures of something on 5 successive occasions. Long format is wide format reshaped so that repeated measures of a variable are stacked into a column, here y. It also needs a column to indicate which measurement occasion (wave or time-point) each measure belongs to, here j = 1,...,5. The measurement times are x = 1,...,5. Here x represents units of time. The coding assumes equal time intervals between successive measurements. The graph shows the measures y plotted against time x.

3 Straight-line growth
Equation of the straight line:
y_j = β_0 + β_1 x_j,   j = 1,...,5
[Figure: the fitted line, with intercept β_0 where x = 0 and slope β_1 the rise per unit x.]
The straight line is a model of how the measurements y change over time x. The parameters of the model are the intercept β_0 and slope β_1. The intercept β_0 is y when x = 0. The slope β_1 is the change in y per unit x: the change in y when x increases by 1. This particular model assumes a constant growth rate: y changes by the same amount for ANY unit increase of x. The model does not show how the rate of growth might tail off. This model is a perfect fit to these data.

4 Regression by ordinary least squares (OLS)
[Figure: left, data falling exactly on a line (r = 1); right, data scattered around a line (−1 < r < 1), with the residual error e_j the vertical distance from y_j to the line.]
Pearson correlation r = 1 indicates a perfect fit to a straight line. Correlation −1 < r < 1 indicates there is some residual error: a straight line is not a perfect fit. How do we choose the best line if no straight line is a perfect fit? OLS is a procedure for estimating the parameters of a regression model, such as a straight line. OLS estimates β_0 and β_1 for the line with the smallest residual variance. The residual variance σ²_e is the variance of the residual errors e_j around the line.

5 Residual variance
Straight line regression model:
y_j = β_0 + β_1 x_j + e_j
where β_0 + β_1 x_j is the fixed part and e_j the random part. The residuals e_j are assumed to be random measurement errors, as if drawn at random from a normal population with mean 0 and variance σ²_e:
e_j ~ N(0, σ²_e)
The errors have zero mean because they vary symmetrically around the line. The consequences of that assumption are:
1. Any regression line always passes through the point (x̄, ȳ): ȳ = β_0 + β_1 x̄.
2. If β_1 = 0 the intercept β_0 is ȳ, the mean response. y_j = β_0 + e_j is a model of the mean.
To see why, average both sides of the model equation over the sample:
ȳ = (1/n) Σ_j (β_0 + β_1 x_j + e_j) = β_0 + β_1 x̄ + ē
But ē = 0 because the errors have zero mean, so:
ȳ = β_0 + β_1 x̄
Therefore the point (x̄, ȳ) always lies on the regression line, and if β_1 = 0 then β_0 = ȳ. This is also true for multiple regression: ȳ = β_0 + β_1 x̄_1 + β_2 x̄_2 + ...

6 Unconditional and conditional models
Unconditional model of the mean: y_j = β_0 + e_j. Conditional model of the mean: y_j = β_0 + β_1 x_j + e_j.
[Figure: left, a flat line at the mean with residual variance σ²_e around it; right, a sloped line with smaller residual variance σ²_e around it.]
The unconditional model is intercept-only. There is no slope (it is flat), so the intercept β_0 estimates the mean response ȳ. Here the mean response is assumed not to depend upon x. The conditional model is conditional upon an explanatory variable x. Here the mean response is assumed to be different at different values of x. The intercept β_0 estimates the mean response when x = 0. The slope β_1 estimates the change in the mean per unit increase of x. If x is mean-centered so that x̄ = 0, the intercept β_0 = ȳ. In the unconditional model, the residual variance σ²_e equals the response variance Var(y). The conditional model has less residual variance: part of Var(y) is explained by x. Var(y) is decomposed into two parts: (a) the part explained by the straight-line relationship with x, and (b) the part that is unexplained residual variance σ²_e. Residual variance of 0 would indicate all of Var(y) is explained by x. In that sense the size of the residual variance tells how closely the data fit the regression line (how well x explains the variation in y).
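A minimal sketch of this decomposition, assuming a long-format data frame named data with columns y and x (the names are illustrative, not from the slides):
R: fit0 = lm(y ~ 1, data)    # unconditional model: intercept only
   fit1 = lm(y ~ x, data)    # conditional model: straight line in x
   var(resid(fit0))          # equals var(data$y): nothing is explained
   var(resid(fit1))          # smaller: the part of Var(y) explained by x is removed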

7 OLS assumptions (1)
Residuals should be:
1. Homoskedastic: the same residual variance σ²_e at all x.
2. Uncorrelated: residuals should not depend upon each other.
These assumptions are used to derive a formula for the variance of the slope estimator:
Var(β̂_1) = σ²_e / ((n − 1) σ²_x)
Its square root is the slope standard error. If these assumptions are violated, standard errors will be incorrect. Then confidence intervals and p-values will also be incorrect.
[Figure: a regression line with a box around it representing the slope standard error.]
The standard error is the standard deviation of the estimator's sampling distribution. A small standard error indicates greater precision: results are more repeatable. A small standard error (represented by a long thin box) is given by low residual variance, more degrees of freedom, and a greater range of x. Including more explanatory variables in the model does not always improve it. Overfitting a specific sample loses generality: residual variance goes down, but standard errors may increase. More parameters to estimate loses degrees of freedom, and explanatory variables may confound each other, reducing each variable's unique variance. Aim for a parsimonious model with acceptable fit.

8 OLS assumptions (2)
You have to assume a functional form for the relationship. A straight line is not the only model. It may be mis-specified.
[Figure: two scatterplots contrived to have identical variances and covariance: one roughly linear, one clearly curved.]
Correlation is blind to the difference: linear correlation only knows about straight lines. Fitting a straight-line regression model to both datasets, the estimates, standard errors, and p-values are identical. You have to compare different models fitted to the same data: compare their goodness-of-fit and test the difference. Compared with a straight-line model, a quadratic model is a much better fit for the data on the right. It also has much lower standard errors.
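A minimal sketch of such a comparison, assuming a data frame named data with columns y and x:
R: fit.lin = lm(y ~ x, data)              # straight-line model
   fit.quad = lm(y ~ x + I(x^2), data)    # quadratic model, nested over the line
   anova(fit.lin, fit.quad)               # F-test of the improvement in fit
   AIC(fit.lin, fit.quad)                 # information criteria: lower is better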

9 A small dataset
[Table: wide format, one row per child and one column per measurement occasion; long format with columns i, j, y, where i indexes the person and j the measurement occasion.]
These data are Table 11.5 in: Maxwell & Delaney (1990) Designing Experiments and Analyzing Data. 12 children were tested at age 30, 36, 42, and 48 months (McCarthy scale of children's abilities).
1. Is there, on average, systematic growth in ability over time?
2. Is there variability in growth over time?
In wide format each row is a case: one subject's record of observations. Long format is wide format reshaped so that repeated measures are stacked. Long format needs extra columns for indicator variables: i and j indicate which person and which time-point each measurement belongs to. y_ij denotes a measurement of the i-th person at the j-th time-point. Time-points j are repeated within each person i (and vice versa). Each pair (i, j) is unique because the indices are nested.
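A minimal reshaping sketch using base R, assuming the wide data frame is named wide with repeated-measures columns y.1 to y.4 (the names are assumptions for illustration):
R: long = reshape(wide,
                  varying = c("y.1","y.2","y.3","y.4"),
                  v.names = "y",        # stack repeated measures into one column
                  timevar = "j",        # indicator for measurement occasion
                  idvar = "i",          # indicator for person
                  direction = "long")
   long$x = c(30, 36, 42, 48)[long$j]   # chronological age in months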

10 Time as an independent variable
[Table: long format with columns i, j, y, x, where x holds the measurement times.]
Metrics for time: coded time-points (eg. 0, 1, 2, 3); chronological age (eg. 30, 36, 42, 48 months); time since baseline (eg. 0, 6, 12, 18 months); or any meaningful non-decreasing measure.
Mixed effects models treat time as data: time enters the model as an independent variable. Here variable x is time as chronological age in months. The growth rate is the slope of the response per unit time (per month). These data are strongly balanced. Everyone has the same time-points: the same baseline times and intervals. (Here the intervals are all equal, but that is not strictly necessary.) No-one has any missing time-points.

11 Pooled and subject-specific data
[Figure: left, all measurements plotted against x irrespective of subject; right, a spaghetti plot joining each subject's own measurements.]
Pooled data are irrespective of grouping by subject. Subject-specific data are indicated by a spaghetti plot: joining the dots that belong to a specific subject.

12 Pooled and subject-specific data
[Figure: left, the pooled regression line; right, the subject-specific regression lines.]
Subject-specific regression lines often show growth fan-in or fan-out. Here there is fan-in (except for some unusual subjects). If the data are strongly balanced (same time-points, none missing), the pooled regression line is the average of the subject-specific regression lines: the intercept of the pooled line is the average subject-specific intercept, and the slope is the average subject-specific slope.

13 Fitting a straight-line model
R: fit = lm(y ~ x, data)
   summary(fit)
Coefficients:
             Estimate  Std. Error  t value  Pr(>|t|)
(Intercept)
x
R: fit = lmer(y ~ x + (x | i), data)
   summary(fit)
Random effects:
 Groups   Name         Variance  Std.Dev.  Corr
 i        (Intercept)
          x
 Residual
Fixed effects:
             Estimate  Std. Error  df  t value  Pr(>|t|)
(Intercept)
x
The upper table shows a regression model fitted to pooled data by OLS. The lower table shows a mixed-effects model fitted to subject-specific data by REML (restricted maximum likelihood). The coefficients of the pooled analysis are the same as the fixed effects of the mixed-effects model (because these data are strongly balanced). But the standard errors, and hence p-values, are different. The growth rate (x) is non-significant in the pooled analysis, but its standard errors are incorrect because these data violate OLS assumptions. It is (just) significant in the mixed-effects model. This is achieved by accounting for individual growth (blocking on persons). The mixed-effects model has some additional parameters: the random effects. These represent variation around the average effects due to subject-specific differences. The intercept estimate is the expected response when time x = 0. The intercept variance is the variation in intercepts between subjects. These things have no meaning for a subject age 0. Centre time to give meaning to the intercept and its variance.

14 Centering time
Centering time gives meaning to the intercept. The centre is 0 on a continuous scale. Centre time by subtracting a value from the time variable. Centre x on the average baseline age:
x*_ij = x_ij − x̄_1
Here that means subtracting 30 months from each x value.
Long format makes centering and scaling easy. Subtract a mean, or some substantively meaningful time value close to the mean. Choose the centre to give meaning to the intercept: for example the expected response at the average baseline age, at the overall average age, or at some particular age. Note: if both time x and the response y are mean-centered (eg. standardized), then the intercept becomes 0 (at the point (x̄, ȳ)). Centering can change intercept variance and intercept-slope covariance, depending upon fan-in/out of the subject-specific slopes. Centering on a time where fan-in/out is large makes the intercept variance large. Changing the intercept variance also changes the intercept-slope covariance. Some other reasons for centering time are:
1. It reduces collinearity in quadratic (and higher-order polynomial) models.
2. It can change the size and direction of a TIC direct effect (if the TIC has a significant interaction with time).
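A minimal sketch of centering in long format, assuming a data frame named data with age in months in column x:
R: library(lme4)
   data$xc = data$x - 30                  # centre on the average baseline age
   # or centre on the overall mean age: data$xc = data$x - mean(data$x)
   fit = lmer(y ~ xc + (xc | i), data)    # the intercept now refers to 30 months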

15 Fitting a straight-line model
R: fit = lm(y ~ x, data)
   summary(fit)
Coefficients:
             Estimate  Std. Error  t value  Pr(>|t|)
(Intercept)
x
R: fit = lmer(y ~ x + (x | i), data)
   summary(fit)
Random effects:
 Groups   Name         Variance  Std.Dev.  Corr
 i        (Intercept)
          x
 Residual
Fixed effects:
             Estimate  Std. Error  df  t value  Pr(>|t|)
(Intercept)
x
Re-fitting the same models as before. The only difference is that x is now centered on the average baseline age. Now the intercept represents the expected response at 30 months. Again the coefficients of the regression model are the same as the fixed effects of the mixed-effects model, but the standard errors, and hence p-values, are different.

16 Variation in the data
Variance-covariance matrix of repeated measures:
          [ σ²_1  σ_12  σ_13  σ_14 ]
Cov(Y) =  [ σ_21  σ²_2  σ_23  σ_24 ]
          [ σ_31  σ_32  σ²_3  σ_34 ]
          [ σ_41  σ_42  σ_43  σ²_4 ]
Variances on the diagonal, covariances off the diagonal. Heteroskedasticity: different variance at different time-points. Serial correlation: non-zero covariance across time. OLS assumptions: equal variance on the diagonal, 0 covariance off the diagonal.
1. Why are data serially correlated? Because the same panel is measured repeatedly over time. Some individuals' measures are all relatively high, others relatively low. (The more so when there is greater difference between than within persons.) Dependency upon previous observations may also come from practice effects.
2. Why are data heteroskedastic? Growth trajectories tend to fan-in or fan-out. (Typically fan-in during development, fan-out during decline.) This makes the variance of measures different at different time-points.
Highly differential growth leads to independent measures. Consistent growth patterns lead to variance-covariance structure. The aim is to exploit patterns to account for individual growth and change in the context of many different individuals.
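Before modelling, the observed variance-covariance of the repeated measures can be inspected directly. A minimal sketch, assuming the wide-format data frame is named wide with the occasions in columns y.1 to y.4 (illustrative names):
R: cov(wide[, c("y.1","y.2","y.3","y.4")])   # variances on the diagonal, covariances off it
   cor(wide[, c("y.1","y.2","y.3","y.4")])   # serial correlation across time-points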

17 OLS regression of pooled data
Longitudinal data violate a statistical assumption for OLS regression: residuals must be IID (independent and identically distributed). Longitudinal residuals are:
Heteroskedastic: residual variance is not identical at each time point.
Serially correlated: residuals depend upon previous residuals.
Consequences for OLS regression of pooled longitudinal data: the estimate of the slope may be correct (provided the data are strongly balanced), but the slope standard error is incorrect, so the confidence interval and p-value are incorrect. How to account for the variation in the data? Decompose the total variance into between-person and within-person variance. Further decompose the between-person variance into variance of the growth parameters (intercept, slopes).

18 Linear mixed-effects
Preliminary assumptions: Subjects are a random sample of a population. Results are conditional upon the sample; if the sample is random the results are unbiased population estimates. Everyone's growth curve has the same functional form, but different people may have different values for the growth parameters. Assuming straight-line growth, for example, different people could have different intercepts and slopes.
A straight line is not the only model for the average person's growth trajectory. It's just the simplest.

19 Between-person variation
Everyone has a growth curve, for example a straight line. Different people have different parameters, for example different intercepts and slopes. Two kinds of parameters: the average intercept and slope, and the variation in intercepts and slopes.
[Figure: subject-specific regression lines with different intercepts and slopes.]

20 Subject-specific means
Each subject has a mean of their own repeated measures. The grand mean β_0 is the average of the subject-specific means. Each subject's own mean may deviate from the grand mean.
[Figure: each subject's mean level, with the grand mean β_0 as a horizontal line.]

21 Unconditional model of the mean
An unconditional regression line is a model of the grand mean (β_0 estimates ȳ):
y_j = β_0 + e_j
Suppose the i-th subject's mean deviates from the grand mean by u_0i. A model of the i-th subject's mean, incorporating the grand mean, is:
y_ij = β_0 + u_0i + e_ij
Re-write as a 2-level model, where π_0i represents the i-th subject's mean:
y_ij = π_0i + e_ij
π_0i = β_0 + u_0i
The second level is another model of the mean. Its outcome is the subject-specific means π_0i, so its intercept β_0 estimates the mean of those means. π_0i are random effects: here they are subject-specific means (the means of each subject's repeated measures). β_0 is a fixed effect, an average of random effects: here it is the grand mean, the mean of the subject-specific means.
If everyone had the same average there would be no need for random effects. The fixed effects would be ordinary regression coefficients where one size fits all. β_0 is the grand mean in the equation y_j = β_0 + e_j. β_0 is also the average of the subject-specific means π_0i in the equation π_0i = β_0 + u_0i. The point of estimating the grand mean as the average of subject-specific means is to divide the total variance into homogeneous subgroups. It is the same idea as ANOVA with a blocking factor in a split-plot design. The aim is to get a more correct estimate of the standard error.

22 Decomposing variance
y_ij = β_0 + u_0i + e_ij,   e_ij ~ N(0, σ²_e),   u_0i ~ N(0, σ²_0)
Deviations from β_0 are divided into two parts: u_0i is the deviation of the i-th subject's mean from β_0; e_ij is the deviation of the i-th subject at the j-th time-point from their own mean. Within and between-person variance:
Var(y_ij) = Var(β_0 + u_0i + e_ij) = σ²_0 + σ²_e + 2 Cov(u_0i, e_ij) = σ²_0 + σ²_e
σ²_0 is the between-person variance of the subject-specific means. σ²_e is the within-person residual variance. Collectively these are called the variance components.
Between-person variation is composed of deviations u_0i of subject-specific means π_0i from the grand mean β_0. Within-person variation is composed of deviations e_ij of a person's scores from their own mean π_0i. Modelling the subject-specific regressions decomposes the total variation into between-person and within-person components. These variance components are independent of each other (Cov(u_0i, e_ij) = 0). This decomposition is fundamental to mixed-effects models.

23 Fitting the unconditional model of the mean
R: fit = lmer(y ~ 1 + (1 | i), data)
   summary(fit)
Random effects:
              Variance  Std.Dev.
 (Intercept)  (b)
 Residual     (c)
Fixed effects:
             Estimate  Std. Error  t value  Pr(>|t|)
(Intercept)  (a)
a. β_0: average subject-specific intercept (the average of the subjects' means).
b. σ²_0: between-person variation in intercepts (means).
c. σ²_e: average within-person variation.
In the R model formula y ~ 1 + (1 | i), the 1 denotes the intercept. Read this as: regress y on the intercept, but treat the intercept as a random effect grouped by i. In other words, calculate intercepts by fitting the model y ~ 1 individually to the repeated measures of each subject i.

24 Longitudinal intra-class correlation
Where is most of the variation? Within groups (people), or between, or somewhere in the middle.
ICC = σ²_0 / (σ²_0 + σ²_e)
The proportion of total variation that is between-persons. ICC = 0: no variation between-persons (σ²_0 = 0); no difference from regression of pooled data. ICC = 1: no change within-person (σ²_e = 0); people differ only in their mean level. Here the ICC gives the percentage of the total response variation due to differences in mean level between-persons.
The purpose of the unconditional model is to decompose variance. Low ICC (< 0.2) suggests people are very similar, as if one person: there is no advantage to grouping. High ICC (> 0.8) suggests growth curves are flat and there is little change over time: then there is little to be gained from repeated measures over time. Medium ICC (say between 0.2 and 0.8) suggests there is within-person change over time, and it is also worth grouping by persons to account for variation in change between-persons.
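A minimal sketch of computing the ICC from the fitted unconditional model (data frame and column names as assumed before):
R: library(lme4)
   fit = lmer(y ~ 1 + (1 | i), data)
   vc = as.data.frame(VarCorr(fit))      # one row per variance component
   v0 = vc$vcov[vc$grp == "i"]           # between-person intercept variance
   ve = vc$vcov[vc$grp == "Residual"]    # within-person residual variance
   v0 / (v0 + ve)                        # the ICC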

25 Between-person variation in intercepts and slopes
[Figure: left, subject-specific intercepts and slopes; right, one subject's deviations from the average.]
π_0i is the i-th subject's intercept. β_0 is the average of the subject-specific intercepts.
The left plot shows each person's subject-specific regression line. The right plot highlights one person's repeated measures and their subject-specific regression line. Between-person differences make each subject-specific regression line deviate from the average: the i-th subject's regression line deviates from the average intercept β_0 by u_0i, and from the average slope β_1 by u_1i. Within-person residuals e_ij are deviations of a subject's repeated measures from their own regression line.

26 Straight-line model of the mean conditional upon time
The conditional regression line upon time x is:
y_j = β_0 + β_1 x_j + e_j
A model of the i-th person with subject-specific deviations from the average intercept and slope:
y_ij = (β_0 + u_0i) + (β_1 + u_1i) x_ij + e_ij
Re-write as a 2-level model:
Level-1: y_ij = π_0i + π_1i x_ij + e_ij
Level-2: π_0i = β_0 + u_0i
         π_1i = β_1 + u_1i
The second-level models are again models of means. The outcomes are subject-specific intercepts π_0i and slopes π_1i, so their intercepts β_0 and β_1 estimate the mean intercept and mean slope. Random effects π_0i and π_1i are the i-th subject's intercept and slope. Fixed effects β_0 and β_1 are the averages of the subject-specific intercepts and slopes.
Compared with the unconditional model, this model has more random effects. In the unconditional model π_0i were subject-specific means. In the conditional model π_0i are subject-specific intercepts and π_1i are subject-specific slopes. To specify the model, you choose which level-1 coefficients you want to treat as random effects. (It doesn't have to be all of them.) Each random effect has some variance (due to individual differences). These are collectively called the variance components. The complete set of model parameters includes both the fixed effects and the variance components. You may mainly be interested in the fixed effects; then the variance components are nuisance parameters, used just to decompose variance so that the fixed effects have correct standard errors. Or the variance components may be of interest in their own right.

27 Variance components
Variance is decomposed by subject-specific deviations into between-person variance:
[u_0i]      ( [0]   [ σ²_0  σ_01 ] )
[u_1i]  ~ N ( [0] , [ σ_01  σ²_1 ] )
leaving residual within-person variance:
e_ij ~ N(0, σ²_e)
σ²_0 is the variance of subject-specific intercepts. σ²_1 is the variance of subject-specific slopes. σ_01 is the intercept-slope covariance. σ²_e is the within-person residual variance.
[Figure: subject-specific (intercept, slope) pairs scattered around the point (β_0, β_1).]
The between-person variance components are drawn from a bivariate normal to allow the random effects to covary. Their covariance is an additional variance component. (Generally the variance components include variances and covariances.) The plot indicates intercept-slope covariance. Covariance implies a fan-in or fan-out pattern of trajectories. For example, with negative covariance people with higher intercepts have a more negative slope; that suggests fan-in. When there is fan-in/out the intercept variance, and hence the intercept-slope covariance, depends upon centering. Slopes converge and cross over at some point. Re-centering can change the size and sign of the intercept-slope covariance.

28 Shrinkage estimators
An efficient estimator for average subject-specific parameters: shrink the subject-specific estimates towards their mean. The random effects are the estimates after shrinkage. The fixed effects are the averages of the random effects. The amount a subject shrinks depends upon their reliability: the distance to the mean, the subject's residual variance, and the number of non-missing observations of the subject. Unreliable estimates are shrunk more towards the mean. Individuals borrow strength from others in that population. Unreliable estimates have less influence on the fixed effects and their standard errors.
[Figure: subject-specific (intercept, slope) estimates with arrows showing the direction and amount of shrinkage towards the averages β_0 and β_1.]
Shrinkage estimators are efficient in the statistical sense of having lowest variance in the long run of repeated sampling. The blue dots on the plot are subject-specific estimates, the grey lines are their averages (β_0 and β_1), and the arrows show the direction and amount of shrinkage. Subject-specific estimates are considered unreliable when they are distant from the mean, with large residual errors and missing observations; these are shrunk more. As a result the mean and variance of the whole cloud of points becomes a more reliable estimator of the population. Shrinkage enables subjects with missing values to contribute, by allowing them to borrow strength from other subjects.
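A minimal sketch of inspecting shrinkage with lme4, comparing the shrunken subject-specific estimates against per-subject OLS fits (data frame and column names as assumed before):
R: library(lme4)
   fit = lmer(y ~ x + (x | i), data)
   fixef(fit)     # fixed effects: the average intercept and slope
   ranef(fit)$i   # subject-specific deviations, after shrinkage
   coef(fit)$i    # subject-specific intercepts and slopes (fixed + random)
   # per-subject OLS lines for comparison: no shrinkage, no borrowing of strength
   t(sapply(split(data, data$i), function(d) coef(lm(y ~ x, d))))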

29 Fitting the standard mixed effects model
R: fit = lmer(y ~ x + (x | i), data)
   summary(fit)
Random effects:
 Groups   Name         Variance  Std.Dev.  Corr
 i        (Intercept)  (c)
          x            (d)                 (e)
 Residual              (f)
Fixed effects:
             Estimate  Std. Error  df  t value  Pr(>|t|)
(Intercept)  (a)
x            (b)
a. β_0: average intercept.
b. β_1: average slope.
c. σ²_0: intercept variance.
d. σ²_1: slope variance.
e. σ_01: intercept-slope covariance (reported as a correlation coefficient r_01).
f. σ²_e: average within-person residual variance.
Slope-on-intercept regression coefficient: r_01 σ_1 / σ_0.
The complete set of model parameters includes both the fixed effects and the variance components. The R summary function reports the variance components as random effects in the upper part of the table. The R model formula has an implied intercept. It could be written as: y ~ 1 + x + (1 + x | i). Read as: regress y on the intercept and slope of x, but treat both the intercept and slope as random effects grouped by i, and allow them to covary. The same model with the covariance fixed at 0 could be specified as: y ~ x + (x || i). The intercept-slope covariance is given as a correlation coefficient. To convert between correlation and covariance: r_01 = σ_01 / (σ_0 σ_1). Confidence intervals for the variance components are provided by: confint(fit)

30 Fitting the standard mixed effects model
STATA: . mixed y x || i: x, covariance(unstructured) reml
         cog    Coef.   Std. Err.    z    P>|z|   [95% Conf. Interval]
           x
       _cons
Random-effects Parameters    Estimate   Std. Err.   [95% Conf. Interval]
id: Unstructured
        var(x)
    var(_cons)
  cov(x,_cons)
 var(residual)
Different programs have their own syntax for specifying models, and report the same results in their own way. Stata calls the intercept _cons. The option covariance(unstructured) specifies that no structural constraints be applied to the variance-covariance of the random effects; here this allows intercept-slope covariance. The option reml specifies that parameter estimation use the REML procedure (restricted maximum likelihood). This is the default for R.

31 Fitting the standard mixed effects model
Mplus: VARIABLE: NAMES = i j y x ;
         USEVARIABLES = i y x ;
         WITHIN = x ;
         CLUSTER = i ;
       ANALYSIS: TYPE = TWOLEVEL RANDOM ;
       MODEL: %WITHIN%
         s | y ON x ;
         %BETWEEN%
         y WITH s ;
Within Level                 Estimate   S.E.   Est./S.E.   Two-Tailed P-Value
 Residual Variances  Y
Between Level
 Y WITH S
 Means     Y  S
 Variances Y  S
Here Mplus calls the intercept Y and the slope of x S. For continuous outcome variables (as here), Mplus uses FIML (full information maximum likelihood) estimation.

32 REML versus FIML
These are methods for estimating parameters and fitting models to data. FIML = full information maximum likelihood. REML = restricted maximum likelihood.
Why REML? Variance components estimated by FIML are biased (under-estimated) in small samples, because the calculation uses the sample regression coefficients β. REML aims to correct small-sample bias. It estimates variance components by maximizing the likelihood of residuals without using β; the β are calculated afterwards.
REML versus FIML: REML estimates of variance components are more accurate than FIML in small samples; they become similar in larger samples. Model comparisons based on a likelihood calculated by REML cannot tell a difference in the β, so the fixed-effects specification must be tested under FIML.
Program defaults: R uses REML (function lmer); Stata uses FIML (command mixed); SAS uses REML (proc mixed); Mplus uses FIML.
The REML procedure is analogous to the correction factor 1/(n − 1) used for estimating a population variance from a random sample. Estimating a population variance is biased in small samples because the calculation uses the sample mean. Estimating population variance components is similarly biased because the calculation uses the sample regression coefficients β. Variance is corrected using n − 1 in the denominator for the average. Variance components are corrected by avoiding β in the REML calculation. To specify FIML using R: lmer(y ~ x + (x | i), data, REML=FALSE)

33 Fitting a latent growth curve model (LGC)
Mplus: VARIABLE: NAMES = y1 y2 y3 y4 ;
       MODEL: i s | y1@0 y2@6 y3@12 y4@18 ;
         y1 (err) ;
         y2 (err) ;
         y3 (err) ;
         y4 (err) ;
                              Estimate   S.E.   Est./S.E.   Two-Tailed P-Value
 S WITH I
 Means     I  S
 Variances I  S
 Residual Variances  Y1  Y2  Y3  Y4
Mplus can be used to fit growth curve models in the structural equation modelling framework. These are called latent growth curve models because the estimated growth parameters (here intercept and slope) are latent variables (factors). For equivalent results between LGC and mixed effects:
1. Hold the residual variances equal across time-points.
2. Use the same coding for the time-points.
3. Fit the mixed effects model using FIML.

34 Latent growth curve model
[Path diagram: observed outcomes y1-y4 as squares; latent growth factors i and s as circles; loadings from i fixed at 1 and from s at 0, 6, 12, 18; residual variance arrows at each y; a covariance arrow between i and s.]
Squares are observed variables (the outcome at each wave). Circles are latent variables for growth factors, equivalent to random effects: i = intercept, s = slope. Single-headed arrows point to a regression outcome. Four regression equations are solved simultaneously:
y1 = i
y2 = i + 6s
y3 = i + 12s
y4 = i + 18s
The regression coefficients (factor loadings) are fixed. They represent time-points coded to contrive a growth curve. Double-headed arrows are variances or covariances: the arrows at each y are residual variances, and the arrow between i and s is their covariance.

35 LGC models versus mixed-effects models
Advantages of mixed-effects models: treats time as data in a natural way; allows individually varying baseline times and intervals.
Advantages of latent growth curve models: provides several goodness-of-fit measures; can link multivariate measurement models into a growth model; can link growth models into a multivariate structural model.
The main disadvantage is that LGC models don't treat time as a variable, but as a structural constraint. It is difficult to allow individually varying time-points. Another disadvantage is that LGC models are more susceptible to convergence problems. Mixed-effects models handle missing values straightforwardly; LGC models can have convergence problems here. The main advantage is that LGC models are relatively easy to link into more complicated path models.

36 Multivariate measurement models
[Path diagram: at each of three time-points, indicators load on a factor (f_1, f_2, f_3); the factors are linked into a growth model with intercept i and slope s.]
Each time-point is a multivariate measurement model. These are linked into a latent growth curve model. The aim of these models is greater reliability through multivariate measurement: the measurement models measure what is common to the set of indicators at each time-point, and reject differential sources of measurement error. But it is necessary to establish longitudinal measurement invariance, to be sure the measurement models measure the same thing in the same way at each time-point.

37 Bivariate (cross-lagged) LGC model
[Path diagram: two LGC models, one for y1-y4 with factors i_y and s_y, one for x1-x4 with factors i_x and s_x, linked by cross-lagged paths γ_y and γ_x between the growth factors.]
Two growth processes, each modelled by a LGC model. The models are linked by cross-lagged regressions. These specify association at the level of growth factors: is the slope of one process determined by the baseline level of the other process?

38 Laird-Ware mixed-effects model
General mixed-effects model for the i-th person:
Y_i = X_i β + Z_i u_i + e_i
X_i is a design matrix for the fixed effects β; Z_i is a design matrix for the random effects u_i. The columns of Z_i are a subset of the columns of X_i (your choice of random effects). Z_i must contain only TVCs (time-varying within-subject covariates, such as time itself). The remaining columns of X_i must contain only TICs (between-subject covariates that are constant over time). The standard model (random intercepts and slopes) in Laird-Ware form, with no TICs included so that the columns of Z_i are all the columns of X_i:
[y_i1]   [1  x_i1]          [1  x_i1]          [e_i1]
[y_i2] = [1  x_i2] [β_0]  + [1  x_i2] [u_0i] + [e_i2]
[y_i3]   [1  x_i3] [β_1]    [1  x_i3] [u_1i]   [e_i3]
[y_i4]   [1  x_i4]          [1  x_i4]          [e_i4]

         [β_0 + β_1 x_i1]   [u_0i + u_1i x_i1]   [e_i1]
       = [β_0 + β_1 x_i2] + [u_0i + u_1i x_i2] + [e_i2]
         [β_0 + β_1 x_i3]   [u_0i + u_1i x_i3]   [e_i3]
         [β_0 + β_1 x_i4]   [u_0i + u_1i x_i4]   [e_i4]
Row by row:
y_ij = β_0 + β_1 x_ij + u_0i + u_1i x_ij + e_ij = (β_0 + u_0i) + (β_1 + u_1i) x_ij + e_ij
The random terms collected together form the composite residual:
y_ij = β_0 + β_1 x_ij + u_0i + u_1i x_ij + e_ij = β_0 + β_1 x_ij + ε_ij
The composite residuals of a mixed-effects model are more complicated than the independent residuals assumed for an OLS regression model. They depend upon time x. This gives the residuals a variance-covariance structure.
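The design matrices that lme4 builds can be inspected directly; a minimal sketch, assuming the standard model fitted as before:
R: library(lme4)
   fit = lmer(y ~ x + (x | i), data)
   head(getME(fit, "X"))   # fixed-effects design matrix: a column of 1s and x
   getME(fit, "Z")         # sparse random-effects design matrix, one block per subject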

39 The composite residual
The standard model has a composite residual that depends upon time:
y_ij = β_0 + β_1 x_ij + u_0i + u_1i x_ij + e_ij,   where ε_ij = u_0i + u_1i x_ij + e_ij
For the standard mixed-effects model, the residual variance (the diagonal elements of the variance-covariance matrix) is:
Var(ε_ij) = σ²_e + σ²_0 + 2 σ_01 x_ij + σ²_1 x²_ij
and the residual covariance between measurement occasions j and j′ (the off-diagonal elements) is:
Cov(ε_ij, ε_ij′) = σ²_0 + σ_01 (x_ij + x_ij′) + σ²_1 x_ij x_ij′
For the general mixed-effects model:
Cov(Y_i) = Z_i Cov(u_i) Z_i′ + σ²_e I_n
Random effects in the model induce a residual variance-covariance structure.
Residual variance depends upon time: it is heteroskedastic, and may be different at different time-points. Random effects in a model induce a correlation structure (a pattern of variances and covariances amongst the residuals). Without random effects the variance-covariance matrix reduces to:
Cov(Y_i) = σ²_e I_n = diag(σ²_e, ..., σ²_e)
This represents the OLS assumptions: homoskedasticity (identical variances on the diagonal) and uncorrelated residuals (zero covariances off the diagonal). Random effects induce structure (patterns) in the variance-covariance matrix. The model-implied correlation structure depends upon your choice of random effects. The aim is to choose a structure that reflects correlations in the observed data.
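A minimal sketch of computing the model-implied residual variance as a function of time from the estimated variance components (names as before; the selection of VarCorr rows by their var1/var2 labels is an assumption about that table's layout):
R: library(lme4)
   fit = lmer(y ~ x + (x | i), data)
   vc = as.data.frame(VarCorr(fit))
   v0  = vc$vcov[which(vc$var1 == "(Intercept)" & is.na(vc$var2))]   # sigma0^2
   v1  = vc$vcov[which(vc$var1 == "x" & is.na(vc$var2))]             # sigma1^2
   c01 = vc$vcov[which(vc$var1 == "(Intercept)" & vc$var2 == "x")]   # sigma01
   ve  = vc$vcov[which(vc$grp == "Residual")]                        # sigma_e^2
   resvar = function(x) ve + v0 + 2*c01*x + v1*x^2   # Var(epsilon_ij) at time x
   resvar(0:18)     # residual variance across the observed time range
   -c01 / v1        # the time at which the variance is smallest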

40 Time-dependent variance and heteroskedasticity
The induced variance-covariance structure depends upon time, and consequently it can reflect heteroskedasticity. The residual variance
σ²_e + σ²_0 + 2 σ_01 x_ij + σ²_1 x²_ij
is a parabola in time, with its minimum at x = −σ_01/σ²_1 and curvature 2σ²_1. Either side of the minimum the variance changes monotonically with time. The location of the minimum determines where variance increases or decreases. Increasing variance with time reflects a fan-out pattern of growth curves; decreasing variance reflects fan-in. The smaller the slope variance σ²_1, the less the curvature and the more homoskedastic the residuals.
The diagonal of Cov(Y_i) is the residual variance at different time-points. Homoskedasticity assumes it is the same at all time-points; heteroskedasticity means it changes over time. Residual variance depends upon time when the model includes random effects. The form this takes provides some account of heteroskedasticity in the data. The standard model (random intercepts and slope of time) induces a parabola. This accounts for the typical fan-in/fan-out patterns of growth curves. The time location of the minimum variance depends upon the slope variance and the intercept-slope covariance; the slope variance is usually dominant. Any TVCs added to the model must be added to Z_i so they appear at level-1. But the induced variance-covariance also depends upon Z_i. Therefore adding further TVCs makes the variance-covariance structure more complex and time-dependent.

41 Correlation structure
Correlation structure is a pattern in the variance-covariance matrix. The pattern in the block of the i-th subject's residuals is assumed the same for all subjects. Unstructured assumes no pattern: all variances and covariances may be different.
             [ σ²_1  σ_12  ...  σ_1n ]
Cov(Y_i) =   [ σ_21  σ²_2  ...       ]
             [  ...        ...       ]
             [ σ_n1  ...        σ²_n ]
Independence assumes a strong pattern (the OLS assumption): all variances are equal and all covariances are 0.
Cov(Y_i) = σ²_e I_n = diag(σ²_e, ..., σ²_e)
The i-th subject has n repeated measures, j = 1,...,n. Time-dependent variance on the diagonal reflects heteroskedasticity. Time-dependent covariance off the diagonal reflects serial correlation.

42 Covariance patterns and serial correlation
Two ways to add covariance structure:
1. Your choice of random effects induces a certain correlation structure.
2. Some programs provide options for a range of correlation structures. These aim to account for patterns of serial correlation.
Independence: the matrix has a diagonal structure; all variances are equal and all covariances are 0. Exchangeable (compound symmetry): all variances are equal, and all covariances are equal. Toeplitz: all variances are equal; covariance is the same across equal time intervals, which leads to a diagonally banded structure. AR(1): a first-order autoregressive relationship between successive time-points, e_ij = ρ e_i,j−1 + w_ij; all variances are equal, and covariance decreases as the time interval increases. Unstructured: no constraints; every variance and covariance is free to be estimated.
Correlation structure exploits stable patterns of residual variance-covariance in order to apply constraints and reduce the number of parameters to estimate. It is a trade-off between model fit and degrees of freedom. Unstructured correlation may give a better fit, but the model may be unestimable: there may not be enough unique bits of information in the data to estimate all the required parameters.
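lme4 does not provide residual correlation structures, but the nlme package does. A minimal sketch (assuming long-format data with time-point indicator j and person indicator i, names as before):
R: library(nlme)
   # random intercept with AR(1) serial correlation among the residuals
   fit.ar1 = lme(y ~ x, random = ~ 1 | i,
                 correlation = corAR1(form = ~ j | i), data = data)
   # exchangeable (compound symmetry) residual correlation instead
   fit.cs = lme(y ~ x, random = ~ 1 | i,
                correlation = corCompSymm(form = ~ 1 | i), data = data)
   anova(fit.ar1, fit.cs)   # compare the two structures by AIC/BIC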

43 Parallel slopes model
Regression equation for the i-th person with subject-specific deviations from the average intercept:
y_ij = (β_0 + u_0i) + β_1 x_ij + e_ij = β_0 + β_1 x_ij + ε_ij,   where ε_ij = u_0i + e_ij
Var(ε_ij) = σ²_e + σ²_0
Cov(ε_ij, ε_ij′) = σ²_0
Compound symmetry (sphericity): the residual variance-covariance is not time-dependent. Variance is constant at all time-points (homoskedasticity). Covariance is equal between any pair of time-points.
R: fit = lmer(y ~ x + (1 | i), data)
   summary(fit)
Random effects:
 Groups   Name         Variance  Std.Dev.
 id       (Intercept)
 Residual
Fixed effects:
             Estimate  Std. Error  df  t value  Pr(>|t|)
(Intercept)
x
The parallel slopes model induces an exchangeable correlation structure: all variances are equal (σ²_e + σ²_0 on the diagonal) and all covariances across time-points are equal (σ²_0 off the diagonal). This is also called compound symmetry (or sphericity). Observations separated in time are assumed to be correlated, but the correlation is assumed to be the same between any pair of time-points regardless of how far apart in time. Compared with the standard model (random intercepts and slopes): the fixed effects are the same; the within-person residual variance is higher, because the restriction of parallel slopes does not fit so well and the slope variation has been lumped into the residual variance; and the standard errors for the fixed effects are lower, because there are fewer parameters to estimate (no slope variance or intercept-slope covariance), so more degrees of freedom.

44 Model specification
Two sides of model specification:
1. Specify the functional form of the growth model: for example a straight line, or a quadratic curve, etc.
2. Specify the residual variance-covariance. This has two sides:
a. Choose random effects; your choice induces a variance-covariance structure.
b. Specify program options for variance-covariance structure, if provided.
How to choose random effects? This can be guided by model goodness-of-fit and comparison.

45 Model comparison
Assess random effects specifications by comparing nested models fitted to the same data using FIML. AIC, BIC, and deviance (lowest is best). Likelihood ratio test (chi-squared test of the difference in goodness-of-fit).
R: fit1 = lmer(y ~ x + (1 | i), data, REML=FALSE)    # a
   fit2 = lmer(y ~ x + (x || i), data, REML=FALSE)   # b
   fit3 = lmer(y ~ x + (x | i), data, REML=FALSE)    # c
   anova(fit1, fit2, fit3)
(The anova table reports Df, AIC, BIC, logLik, deviance, and chi-squared tests between successive models.)
a. Random intercept only (parallel slopes).
b. Independent random intercepts and slopes.
c. Covarying random intercepts and slopes (unstructured).
Which model fits best? BIC suggests model (a). AIC suggests model (b). Model comparison by the LR test suggests there is no significant difference between models (a) and (b), or between models (b) and (c). Conclusion: if the fixed effects are the main interest and the variance components are nuisance parameters, the random-intercept-only model (a) might be preferred. If the variance components are of interest, the random intercept and slope model (b) might be preferred. There is no significant benefit to allowing intercept-slope covariance.

46 Model comparison
Stata: mixed y x || i:                                  (a)
       estimates store fit1
       mixed y x || i: x                                (b)
       estimates store fit2
       mixed y x || i: x, covariance(unstructured)      (c)
       estimates store fit3
       lrtest fit1 fit2
       lrtest fit2 fit3
Likelihood-ratio test    LR chi2(1) = 3.51    (Assumption: fit1 nested in fit2)
Likelihood-ratio test    LR chi2(1) = 1.16    (Assumption: fit2 nested in fit3)
a. Random intercept only (parallel slopes).
b. Independent random intercepts and slopes.
c. Covarying random intercepts and slopes.
The same model comparison procedure using LR tests in Stata.

47 Including covariates to explain away residual variance
In the standard 2-level model, level-1 is the within-person or individual level, and level-2 is the between-person or group level. The levels decompose within and between-person variance. Between-person variance is further decomposed into variance of the growth parameters. One kind of variance might be the research interest, the others a nuisance to be controlled. Either way, variance is explained by including covariates. Covariates are classified according to the kinds of variation they can explain. Time-varying covariates (TVCs) are variables that change over time (eg. age); they explain variation within-person. Time-invariant covariates (TICs) are variables that are constant over time (eg. sex); they explain variation between-persons.
Level-1 describes change in the i-th person using variables that change over time. Level-2 describes differences in change between-persons using time-invariant variables that have different levels for different people. Time-invariant variables have time-invariant effects: they explain individual differences that are constant over time. This does not imply there is no differential growth. A straight-line model with random slopes, for example, allows people to grow differently with a constant difference in their growth rates. The order of the difference is determined by the model: a quadratic model, for example, allows the growth rate to change but assumes a constant 2nd-order difference in curvature.

48 Longitudinal dataset with a TIC
[Table: wide format with columns y.1-y.4 (the four occasions) and z (the TIC); long format with columns i, j, y, x, z.]
Willett J. B. (1988) Review of Research in Education.
A panel of 35 subjects were assessed at baseline for their cognitive function (z). The subjects were given an opposites-naming task on each of four consecutive days: they were given a long list of words and had to name the opposite of each word as quickly as possible. The data were the count of how many opposites they could name in 10 minutes. y.1 are the counts of the 35 persons on day 1, and so forth. The researcher was interested in whether clever people's performance improved at a faster rate. Their baseline cognitive function was assumed not to change over the four days. These data are strongly balanced: everyone has the same measurement times x with no missing time-points. Variable z is a TIC: by definition it does not change over time. It needs only to be measured once in each person, for example at baseline. TICs in long format must be repeated within-person at each time-point.

49 TICs and residual between-person variance
A TIC can only explain between-person variation. It cannot explain within-person variation because it is constant within-person. Between-person variation is further decomposed into growth parameters (eg. intercept and slopes). A TIC can be used to explain some or all of these parts, depending upon how it enters the model.
TIC effect on the intercept only:
y_ij = (β_00 + β_01 z_i + u_0i) + (β_10 + u_1i) x_ij + e_ij
As a 2-level model:
Level-1: y_ij = π_0i + π_1i x_ij + e_ij
Level-2: π_0i = β_00 + β_01 z_i + u_0i
         π_1i = β_10 + u_1i
TIC effects on the intercept and slope:
y_ij = (β_00 + β_01 z_i + u_0i) + (β_10 + β_11 z_i + u_1i) x_ij + e_ij
As a 2-level model:
Level-1: y_ij = π_0i + π_1i x_ij + e_ij
Level-2: π_0i = β_00 + β_01 z_i + u_0i
         π_1i = β_10 + β_11 z_i + u_1i
TICs appear as level-2 covariates. TICs are assumed to stay constant within subjects. It makes no sense for TICs to have random effects within subjects: they don't change within subjects, so they can't change differently between subjects. One subject's constant TIC value may be different from another's, so there may be a TIC effect between subjects. For example, sex may have a fixed effect upon the slope: the slope may be different between female and male.

50 TIC direct effect and cross-level interaction
A TIC effect on the intercept enters the model as a direct (main) effect:
y_ij = (β_00 + β_01 z_i) + β_10 x_ij + ε_ij = β_00 + β_10 x_ij + β_01 z_i + ε_ij
where β_01 z_i is the direct effect.
R: lmer(y ~ 1 + x + z + (1 + x | i), data)
   lmer(y ~ x + z + (x | i), data)   # shorthand (implied intercept)
TIC effects on the intercept and slope of time enter the model as a direct effect and a cross-level interaction with time (a product term):
y_ij = (β_00 + β_01 z_i) + (β_10 + β_11 z_i) x_ij + ε_ij = β_00 + β_10 x_ij + β_01 z_i + β_11 z_i x_ij + ε_ij
where β_01 z_i is the direct effect and β_11 z_i x_ij is the interaction.
R: lmer(y ~ 1 + x + z + z:x + (1 + x | i), data)
   lmer(y ~ x * z + (x | i), data)   # shorthand
Error terms are collected into a composite residual ε_ij for convenience. The models include a random intercept and slope of time (x) and their covariance. The R formula syntax tries to look like the model equation. The cross-level interaction describes how an individual-level variable such as the slope of time (at level-1) is moderated by a group-level variable such as a TIC (at level-2). Interactions depend upon their constituent direct effects and how they are centered. If the interaction x:z is significant, the effect of x is conditional upon the value z is centered on, and vice versa.
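A minimal sketch of centering the TIC before fitting the cross-level interaction, so the effect of time is interpreted at the average level of z (names as assumed before):
R: library(lme4)
   data$zc = data$z - mean(data$z)          # grand-mean centre the TIC
   fit = lmer(y ~ x * zc + (x | i), data)   # direct effects plus cross-level interaction
   summary(fit)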


A brief introduction to mixed models A brief introduction to mixed models University of Gothenburg Gothenburg April 6, 2017 Outline An introduction to mixed models based on a few examples: Definition of standard mixed models. Parameter estimation.

More information

Outline. Statistical inference for linear mixed models. One-way ANOVA in matrix-vector form

Outline. Statistical inference for linear mixed models. One-way ANOVA in matrix-vector form Outline Statistical inference for linear mixed models Rasmus Waagepetersen Department of Mathematics Aalborg University Denmark general form of linear mixed models examples of analyses using linear mixed

More information

WU Weiterbildung. Linear Mixed Models

WU Weiterbildung. Linear Mixed Models Linear Mixed Effects Models WU Weiterbildung SLIDE 1 Outline 1 Estimation: ML vs. REML 2 Special Models On Two Levels Mixed ANOVA Or Random ANOVA Random Intercept Model Random Coefficients Model Intercept-and-Slopes-as-Outcomes

More information

Longitudinal Data Analysis Using SAS Paul D. Allison, Ph.D. Upcoming Seminar: October 13-14, 2017, Boston, Massachusetts

Longitudinal Data Analysis Using SAS Paul D. Allison, Ph.D. Upcoming Seminar: October 13-14, 2017, Boston, Massachusetts Longitudinal Data Analysis Using SAS Paul D. Allison, Ph.D. Upcoming Seminar: October 13-14, 217, Boston, Massachusetts Outline 1. Opportunities and challenges of panel data. a. Data requirements b. Control

More information

Course Introduction and Overview Descriptive Statistics Conceptualizations of Variance Review of the General Linear Model

Course Introduction and Overview Descriptive Statistics Conceptualizations of Variance Review of the General Linear Model Course Introduction and Overview Descriptive Statistics Conceptualizations of Variance Review of the General Linear Model PSYC 943 (930): Fundamentals of Multivariate Modeling Lecture 1: August 22, 2012

More information

For more information about how to cite these materials visit

For more information about how to cite these materials visit Author(s): Kerby Shedden, Ph.D., 2010 License: Unless otherwise noted, this material is made available under the terms of the Creative Commons Attribution Share Alike 3.0 License: http://creativecommons.org/licenses/by-sa/3.0/

More information

Modelling the Covariance

Modelling the Covariance Modelling the Covariance Jamie Monogan Washington University in St Louis February 9, 2010 Jamie Monogan (WUStL) Modelling the Covariance February 9, 2010 1 / 13 Objectives By the end of this meeting, participants

More information

36-463/663: Hierarchical Linear Models

36-463/663: Hierarchical Linear Models 36-463/663: Hierarchical Linear Models Lmer model selection and residuals Brian Junker 132E Baker Hall brian@stat.cmu.edu 1 Outline The London Schools Data (again!) A nice random-intercepts, random-slopes

More information

1 A Review of Correlation and Regression

1 A Review of Correlation and Regression 1 A Review of Correlation and Regression SW, Chapter 12 Suppose we select n = 10 persons from the population of college seniors who plan to take the MCAT exam. Each takes the test, is coached, and then

More information

Longitudinal Invariance CFA (using MLR) Example in Mplus v. 7.4 (N = 151; 6 items over 3 occasions)

Longitudinal Invariance CFA (using MLR) Example in Mplus v. 7.4 (N = 151; 6 items over 3 occasions) Longitudinal Invariance CFA (using MLR) Example in Mplus v. 7.4 (N = 151; 6 items over 3 occasions) CLP 948 Example 7b page 1 These data measuring a latent trait of social functioning were collected at

More information

Econometrics Summary Algebraic and Statistical Preliminaries

Econometrics Summary Algebraic and Statistical Preliminaries Econometrics Summary Algebraic and Statistical Preliminaries Elasticity: The point elasticity of Y with respect to L is given by α = ( Y/ L)/(Y/L). The arc elasticity is given by ( Y/ L)/(Y/L), when L

More information

Multilevel/Mixed Models and Longitudinal Analysis Using Stata

Multilevel/Mixed Models and Longitudinal Analysis Using Stata Multilevel/Mixed Models and Longitudinal Analysis Using Stata Isaac J. Washburn PhD Research Associate Oregon Social Learning Center Summer Workshop Series July 2010 Longitudinal Analysis 1 Longitudinal

More information

SAS Syntax and Output for Data Manipulation:

SAS Syntax and Output for Data Manipulation: CLP 944 Example 5 page 1 Practice with Fixed and Random Effects of Time in Modeling Within-Person Change The models for this example come from Hoffman (2015) chapter 5. We will be examining the extent

More information

Regression with a Single Regressor: Hypothesis Tests and Confidence Intervals

Regression with a Single Regressor: Hypothesis Tests and Confidence Intervals Regression with a Single Regressor: Hypothesis Tests and Confidence Intervals (SW Chapter 5) Outline. The standard error of ˆ. Hypothesis tests concerning β 3. Confidence intervals for β 4. Regression

More information

Longitudinal Modeling with Logistic Regression

Longitudinal Modeling with Logistic Regression Newsom 1 Longitudinal Modeling with Logistic Regression Longitudinal designs involve repeated measurements of the same individuals over time There are two general classes of analyses that correspond to

More information

Monday 7 th Febraury 2005

Monday 7 th Febraury 2005 Monday 7 th Febraury 2 Analysis of Pigs data Data: Body weights of 48 pigs at 9 successive follow-up visits. This is an equally spaced data. It is always a good habit to reshape the data, so we can easily

More information

Lecture 4: Multivariate Regression, Part 2

Lecture 4: Multivariate Regression, Part 2 Lecture 4: Multivariate Regression, Part 2 Gauss-Markov Assumptions 1) Linear in Parameters: Y X X X i 0 1 1 2 2 k k 2) Random Sampling: we have a random sample from the population that follows the above

More information

Lecture 3.1 Basic Logistic LDA

Lecture 3.1 Basic Logistic LDA y Lecture.1 Basic Logistic LDA 0.2.4.6.8 1 Outline Quick Refresher on Ordinary Logistic Regression and Stata Women s employment example Cross-Over Trial LDA Example -100-50 0 50 100 -- Longitudinal Data

More information

Draft Proof - Do not copy, post, or distribute. Chapter Learning Objectives REGRESSION AND CORRELATION THE SCATTER DIAGRAM

Draft Proof - Do not copy, post, or distribute. Chapter Learning Objectives REGRESSION AND CORRELATION THE SCATTER DIAGRAM 1 REGRESSION AND CORRELATION As we learned in Chapter 9 ( Bivariate Tables ), the differential access to the Internet is real and persistent. Celeste Campos-Castillo s (015) research confirmed the impact

More information

Final Exam - Solutions

Final Exam - Solutions Ecn 102 - Analysis of Economic Data University of California - Davis March 19, 2010 Instructor: John Parman Final Exam - Solutions You have until 5:30pm to complete this exam. Please remember to put your

More information

Stat 579: Generalized Linear Models and Extensions

Stat 579: Generalized Linear Models and Extensions Stat 579: Generalized Linear Models and Extensions Linear Mixed Models for Longitudinal Data Yan Lu April, 2018, week 12 1 / 34 Correlated data multivariate observations clustered data repeated measurement

More information

Lab 11 - Heteroskedasticity

Lab 11 - Heteroskedasticity Lab 11 - Heteroskedasticity Spring 2017 Contents 1 Introduction 2 2 Heteroskedasticity 2 3 Addressing heteroskedasticity in Stata 3 4 Testing for heteroskedasticity 4 5 A simple example 5 1 1 Introduction

More information

Interactions among Continuous Predictors

Interactions among Continuous Predictors Interactions among Continuous Predictors Today s Class: Simple main effects within two-way interactions Conquering TEST/ESTIMATE/LINCOM statements Regions of significance Three-way interactions (and beyond

More information

Introduction and Background to Multilevel Analysis

Introduction and Background to Multilevel Analysis Introduction and Background to Multilevel Analysis Dr. J. Kyle Roberts Southern Methodist University Simmons School of Education and Human Development Department of Teaching and Learning Background and

More information

ECON2228 Notes 2. Christopher F Baum. Boston College Economics. cfb (BC Econ) ECON2228 Notes / 47

ECON2228 Notes 2. Christopher F Baum. Boston College Economics. cfb (BC Econ) ECON2228 Notes / 47 ECON2228 Notes 2 Christopher F Baum Boston College Economics 2014 2015 cfb (BC Econ) ECON2228 Notes 2 2014 2015 1 / 47 Chapter 2: The simple regression model Most of this course will be concerned with

More information

Any of 27 linear and nonlinear models may be fit. The output parallels that of the Simple Regression procedure.

Any of 27 linear and nonlinear models may be fit. The output parallels that of the Simple Regression procedure. STATGRAPHICS Rev. 9/13/213 Calibration Models Summary... 1 Data Input... 3 Analysis Summary... 5 Analysis Options... 7 Plot of Fitted Model... 9 Predicted Values... 1 Confidence Intervals... 11 Observed

More information

Course Introduction and Overview Descriptive Statistics Conceptualizations of Variance Review of the General Linear Model

Course Introduction and Overview Descriptive Statistics Conceptualizations of Variance Review of the General Linear Model Course Introduction and Overview Descriptive Statistics Conceptualizations of Variance Review of the General Linear Model EPSY 905: Multivariate Analysis Lecture 1 20 January 2016 EPSY 905: Lecture 1 -

More information

ECON3150/4150 Spring 2016

ECON3150/4150 Spring 2016 ECON3150/4150 Spring 2016 Lecture 4 - The linear regression model Siv-Elisabeth Skjelbred University of Oslo Last updated: January 26, 2016 1 / 49 Overview These lecture slides covers: The linear regression

More information

Statistical Modelling in Stata 5: Linear Models

Statistical Modelling in Stata 5: Linear Models Statistical Modelling in Stata 5: Linear Models Mark Lunt Arthritis Research UK Epidemiology Unit University of Manchester 07/11/2017 Structure This Week What is a linear model? How good is my model? Does

More information

Introducing Generalized Linear Models: Logistic Regression

Introducing Generalized Linear Models: Logistic Regression Ron Heck, Summer 2012 Seminars 1 Multilevel Regression Models and Their Applications Seminar Introducing Generalized Linear Models: Logistic Regression The generalized linear model (GLM) represents and

More information

De-mystifying random effects models

De-mystifying random effects models De-mystifying random effects models Peter J Diggle Lecture 4, Leahurst, October 2012 Linear regression input variable x factor, covariate, explanatory variable,... output variable y response, end-point,

More information

An overview of applied econometrics

An overview of applied econometrics An overview of applied econometrics Jo Thori Lind September 4, 2011 1 Introduction This note is intended as a brief overview of what is necessary to read and understand journal articles with empirical

More information

Review of Multiple Regression

Review of Multiple Regression Ronald H. Heck 1 Let s begin with a little review of multiple regression this week. Linear models [e.g., correlation, t-tests, analysis of variance (ANOVA), multiple regression, path analysis, multivariate

More information

Correlation & Simple Regression

Correlation & Simple Regression Chapter 11 Correlation & Simple Regression The previous chapter dealt with inference for two categorical variables. In this chapter, we would like to examine the relationship between two quantitative variables.

More information

Value Added Modeling

Value Added Modeling Value Added Modeling Dr. J. Kyle Roberts Southern Methodist University Simmons School of Education and Human Development Department of Teaching and Learning Background for VAMs Recall from previous lectures

More information

Binary Dependent Variables

Binary Dependent Variables Binary Dependent Variables In some cases the outcome of interest rather than one of the right hand side variables - is discrete rather than continuous Binary Dependent Variables In some cases the outcome

More information

Introduction to SAS proc mixed

Introduction to SAS proc mixed Faculty of Health Sciences Introduction to SAS proc mixed Analysis of repeated measurements, 2017 Julie Forman Department of Biostatistics, University of Copenhagen 2 / 28 Preparing data for analysis The

More information

UNIVERSITY OF TORONTO. Faculty of Arts and Science APRIL 2010 EXAMINATIONS STA 303 H1S / STA 1002 HS. Duration - 3 hours. Aids Allowed: Calculator

UNIVERSITY OF TORONTO. Faculty of Arts and Science APRIL 2010 EXAMINATIONS STA 303 H1S / STA 1002 HS. Duration - 3 hours. Aids Allowed: Calculator UNIVERSITY OF TORONTO Faculty of Arts and Science APRIL 2010 EXAMINATIONS STA 303 H1S / STA 1002 HS Duration - 3 hours Aids Allowed: Calculator LAST NAME: FIRST NAME: STUDENT NUMBER: There are 27 pages

More information

Mixed effects models

Mixed effects models Mixed effects models The basic theory and application in R Mitchel van Loon Research Paper Business Analytics Mixed effects models The basic theory and application in R Author: Mitchel van Loon Research

More information

Lecture 4: Multivariate Regression, Part 2

Lecture 4: Multivariate Regression, Part 2 Lecture 4: Multivariate Regression, Part 2 Gauss-Markov Assumptions 1) Linear in Parameters: Y X X X i 0 1 1 2 2 k k 2) Random Sampling: we have a random sample from the population that follows the above

More information

Covariance Models (*) X i : (n i p) design matrix for fixed effects β : (p 1) regression coefficient for fixed effects

Covariance Models (*) X i : (n i p) design matrix for fixed effects β : (p 1) regression coefficient for fixed effects Covariance Models (*) Mixed Models Laird & Ware (1982) Y i = X i β + Z i b i + e i Y i : (n i 1) response vector X i : (n i p) design matrix for fixed effects β : (p 1) regression coefficient for fixed

More information

Greene, Econometric Analysis (7th ed, 2012)

Greene, Econometric Analysis (7th ed, 2012) EC771: Econometrics, Spring 2012 Greene, Econometric Analysis (7th ed, 2012) Chapters 2 3: Classical Linear Regression The classical linear regression model is the single most useful tool in econometrics.

More information

Applied Statistics and Econometrics

Applied Statistics and Econometrics Applied Statistics and Econometrics Lecture 5 Saul Lach September 2017 Saul Lach () Applied Statistics and Econometrics September 2017 1 / 44 Outline of Lecture 5 Now that we know the sampling distribution

More information

Correlated Data: Linear Mixed Models with Random Intercepts

Correlated Data: Linear Mixed Models with Random Intercepts 1 Correlated Data: Linear Mixed Models with Random Intercepts Mixed Effects Models This lecture introduces linear mixed effects models. Linear mixed models are a type of regression model, which generalise

More information

Topic 20: Single Factor Analysis of Variance

Topic 20: Single Factor Analysis of Variance Topic 20: Single Factor Analysis of Variance Outline Single factor Analysis of Variance One set of treatments Cell means model Factor effects model Link to linear regression using indicator explanatory

More information

Lecture 3 Linear random intercept models

Lecture 3 Linear random intercept models Lecture 3 Linear random intercept models Example: Weight of Guinea Pigs Body weights of 48 pigs in 9 successive weeks of follow-up (Table 3.1 DLZ) The response is measures at n different times, or under

More information

Applied Statistics and Econometrics

Applied Statistics and Econometrics Applied Statistics and Econometrics Lecture 6 Saul Lach September 2017 Saul Lach () Applied Statistics and Econometrics September 2017 1 / 53 Outline of Lecture 6 1 Omitted variable bias (SW 6.1) 2 Multiple

More information

1 The basics of panel data

1 The basics of panel data Introductory Applied Econometrics EEP/IAS 118 Spring 2015 Related materials: Steven Buck Notes to accompany fixed effects material 4-16-14 ˆ Wooldridge 5e, Ch. 1.3: The Structure of Economic Data ˆ Wooldridge

More information

Estimation and Centering

Estimation and Centering Estimation and Centering PSYED 3486 Feifei Ye University of Pittsburgh Main Topics Estimating the level-1 coefficients for a particular unit Reading: R&B, Chapter 3 (p85-94) Centering-Location of X Reading

More information

Statistical Methods III Statistics 212. Problem Set 2 - Answer Key

Statistical Methods III Statistics 212. Problem Set 2 - Answer Key Statistical Methods III Statistics 212 Problem Set 2 - Answer Key 1. (Analysis to be turned in and discussed on Tuesday, April 24th) The data for this problem are taken from long-term followup of 1423

More information

ECON Introductory Econometrics. Lecture 5: OLS with One Regressor: Hypothesis Tests

ECON Introductory Econometrics. Lecture 5: OLS with One Regressor: Hypothesis Tests ECON4150 - Introductory Econometrics Lecture 5: OLS with One Regressor: Hypothesis Tests Monique de Haan (moniqued@econ.uio.no) Stock and Watson Chapter 5 Lecture outline 2 Testing Hypotheses about one

More information

Regression. ECO 312 Fall 2013 Chris Sims. January 12, 2014

Regression. ECO 312 Fall 2013 Chris Sims. January 12, 2014 ECO 312 Fall 2013 Chris Sims Regression January 12, 2014 c 2014 by Christopher A. Sims. This document is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License What

More information

Mixed models in R using the lme4 package Part 2: Longitudinal data, modeling interactions

Mixed models in R using the lme4 package Part 2: Longitudinal data, modeling interactions Mixed models in R using the lme4 package Part 2: Longitudinal data, modeling interactions Douglas Bates Department of Statistics University of Wisconsin - Madison Madison January 11, 2011

More information

Serial Correlation. Edps/Psych/Stat 587. Carolyn J. Anderson. Fall Department of Educational Psychology

Serial Correlation. Edps/Psych/Stat 587. Carolyn J. Anderson. Fall Department of Educational Psychology Serial Correlation Edps/Psych/Stat 587 Carolyn J. Anderson Department of Educational Psychology c Board of Trustees, University of Illinois Fall 017 Model for Level 1 Residuals There are three sources

More information

An Introduction to Path Analysis

An Introduction to Path Analysis An Introduction to Path Analysis PRE 905: Multivariate Analysis Lecture 10: April 15, 2014 PRE 905: Lecture 10 Path Analysis Today s Lecture Path analysis starting with multivariate regression then arriving

More information

Econometrics I KS. Module 1: Bivariate Linear Regression. Alexander Ahammer. This version: March 12, 2018

Econometrics I KS. Module 1: Bivariate Linear Regression. Alexander Ahammer. This version: March 12, 2018 Econometrics I KS Module 1: Bivariate Linear Regression Alexander Ahammer Department of Economics Johannes Kepler University of Linz This version: March 12, 2018 Alexander Ahammer (JKU) Module 1: Bivariate

More information

Chapter 1 Statistical Inference

Chapter 1 Statistical Inference Chapter 1 Statistical Inference causal inference To infer causality, you need a randomized experiment (or a huge observational study and lots of outside information). inference to populations Generalizations

More information

Introduction to SAS proc mixed

Introduction to SAS proc mixed Faculty of Health Sciences Introduction to SAS proc mixed Analysis of repeated measurements, 2017 Julie Forman Department of Biostatistics, University of Copenhagen Outline Data in wide and long format

More information

Statistical Inference with Regression Analysis

Statistical Inference with Regression Analysis Introductory Applied Econometrics EEP/IAS 118 Spring 2015 Steven Buck Lecture #13 Statistical Inference with Regression Analysis Next we turn to calculating confidence intervals and hypothesis testing

More information

Economics 308: Econometrics Professor Moody

Economics 308: Econometrics Professor Moody Economics 308: Econometrics Professor Moody References on reserve: Text Moody, Basic Econometrics with Stata (BES) Pindyck and Rubinfeld, Econometric Models and Economic Forecasts (PR) Wooldridge, Jeffrey

More information

Restricted Maximum Likelihood in Linear Regression and Linear Mixed-Effects Model

Restricted Maximum Likelihood in Linear Regression and Linear Mixed-Effects Model Restricted Maximum Likelihood in Linear Regression and Linear Mixed-Effects Model Xiuming Zhang zhangxiuming@u.nus.edu A*STAR-NUS Clinical Imaging Research Center October, 015 Summary This report derives

More information

Confidence Intervals, Testing and ANOVA Summary

Confidence Intervals, Testing and ANOVA Summary Confidence Intervals, Testing and ANOVA Summary 1 One Sample Tests 1.1 One Sample z test: Mean (σ known) Let X 1,, X n a r.s. from N(µ, σ) or n > 30. Let The test statistic is H 0 : µ = µ 0. z = x µ 0

More information

Biostatistics Workshop Longitudinal Data Analysis. Session 4 GARRETT FITZMAURICE

Biostatistics Workshop Longitudinal Data Analysis. Session 4 GARRETT FITZMAURICE Biostatistics Workshop 2008 Longitudinal Data Analysis Session 4 GARRETT FITZMAURICE Harvard University 1 LINEAR MIXED EFFECTS MODELS Motivating Example: Influence of Menarche on Changes in Body Fat Prospective

More information