Models for longitudinal data

Faculty of Health Sciences

Models for longitudinal data
Analysis of repeated measurements, NFA 016
Julie Lyng Forman & Lene Theil Skovgaard
Department of Biostatistics, University of Copenhagen

Contents
- Longitudinal designs
- Models for the mean
- Models for the covariance
- Linear growth model (aka random regression)
Suggested reading: FLW chapters 6, 7, and 8.

Outline
- Longitudinal designs
- Models for the mean
- Covariance pattern models
- Analysis of summary statistics
- The linear growth model (random regression)
- Unbalanced data

Typical set-up for longitudinal measurements
- Two or more groups of subjects, e.g. two different treatments, possibly randomized.
- Repeated measurements over time for each subject.
  - Time may be calendar time, age, or duration of treatment.
  - Times of measurement may be planned or ad hoc.
ATT: statistical results may be biased unless we account for correlation between measurements on the same subject.

Pros and cons of longitudinal designs
Pros
- They have much more power for detecting changes over time (data are paired, with the subject as its own control).
- We may discover that subjects have different time courses. In designs with only cross-sectional data this may also be the case, but we have no way of knowing!
Cons
- Analysis by traditional ANOVA or GLM models is infeasible because the independence assumption is violated.
- We need to model both the mean and the covariance.

Longitudinal vs cross-sectional effect
Example: Reading ability, as a function of age and cohort.
[Figure: reading ability as a function of age and cohort]

Characterizing your longitudinal study
- Type of study: Randomized or observational?
- Time schedule: Fixed or ad hoc observations?
- Type of data: Continuous, binary, count, ordinal, or categorical?
- Data structure: Wide or long format?
- Sample size: How many subjects? How many time points?

Study types
Randomized:
- One homogeneous population is studied.
- Randomization to two (or more) treatment groups.
- One or more follow-up measurements, usually plus a measurement at baseline.
Observational:
- Two (or more) populations are studied, e.g. men and women, or two different groups of patients.
- A well-defined starting point (e.g. diagnosis).

Observation schedules
Fixed time points (balanced design):
- Measurements collected at prespecified time points.
- Equidistant: 5, 10, 15, 20 minutes, or every month.
- Non-equidistant: 5, 10, 20, and 60 minutes.
Ad hoc time points (unbalanced design):
- Subjects were seen e.g. when the doctor decided or when the patient felt the need.
- Beware of selection bias. What are the time points representative of?
BUT: In practice, designs planned to be balanced often turn out more or less unbalanced...

Case: Calcium supplements
- Randomized study: including 11 girls at age 11.
- Treatment: calcium supplement or placebo.
- Outcome: BMD = bone mineral density, in g/cm³.
- Planned follow-up: every 6 months in two years, 5 visits in total including baseline.
Does calcium increase bone gain in adolescent women?

Time points in the calcium study
- visit: Number 1, 2, 3, 4, or 5.
- time: Scheduled follow-up 0, 1/2, 1, 1 1/2, 2 years.
- obstime: Actual time of follow-up (individual).

    Analysis Variable : obstime
    time  NObs  Mean       Std Dev    Minimum    Maximum
    0     11    0          0          0          0
    0.5   11    0.519530   0.041849   0.4161533  0.7796
    1     11    0.9606814  0.061761   0.8487337  1.1887
    1.5   11    1.4997597  0.08988    1.336071   1.776865
    2     11    1.9760965  0.0807148  1.7960301  .34086

Visit as time line (ignore deviations from time schedule)

Repetition: Analysis of response profiles
Comparison of change over n time points (visits) within g groups (treatment) of subjects.
- Similar to two-way ANOVA, only with correlated data.
- An unstructured covariance is assumed.
Drawbacks:
- Can only handle balanced designs.
- Not good with many groups or time points, because the model has too many parameters.
- Does not make use of a priori known data patterns, e.g. correlation decreasing with time, or monotone growth.
Today: Parametric models for the mean and the covariance.

Repetition: Analysis of response profiles
Note: Do baseline adjustment since treatment was randomized.

    title1 'Response profiles (clmm)';
    data calcium;
      set calcium;
      treat = grp;
      /* at baseline (visit 1) nobody has been treated yet */
      if visit EQ 1 then treat = 'P';
    run;

    proc mixed data=calcium plots=all;
      class grp girl time (ref='0') treat (ref='P');
      model bmd = time treat*time / ddfm=kr solution cl;
      repeated time / type=un subject=girl;
    run;

Repetition: Analysis of response profiles

    Solution for Fixed Effects
    Effect treat time Estimate StdError DF tValue Pr>|t| Alpha Lower Upper
    Intercept 0.875 0.005933 111 147.50 <.0001 0.05 0.8634 0.8869
    time 0.5 0.0038 0.00343 106 8.70 <.0001 0.05 0.01573 0.050
    time 1 0.04445 0.00333 101 13.34 <.0001 0.05 0.03784 0.05106
    time 1.5 0.0709 0.003936 101 18.0 <.0001 0.05 0.0631 0.07873
    time 2 0.08706 0.004363 98.5 19.95 <.0001 0.05 0.07841 0.0957
    time 0 0 . . . . . . .
    time*treat C 0.5 0.00639 0.003304 103 1.9 0.058 0.05 -0.000 0.0188
    time*treat P 0.5 0 . . . . . . .
    time*treat C 1 0.0117 0.004759 101.46 0.0154 0.05 0.0085 0.0116
    time*treat P 1 0 . . . . . . .
    time*treat C 1.5 0.0117 0.005611 101.09 0.0393 0.05 0.000587 0.085
    time*treat P 1.5 0 . . . . . . .
    time*treat C 2 0.01896 0.00669 99.4 3.0 0.003 0.05 0.00650 0.03140
    time*treat P 2 0 . . . . . . .
    time*treat P 0 0 . . . . . . .

    Type 3 Tests of Fixed Effects
    Effect NumDF DenDF FValue Pr>F
    time 4 93.9 48.70 <.0001
    time*treat 4 93.7.74 0.033

Conclusion
Mean gain in BMD (g/cm³) since baseline with calcium supplement or placebo:

    Years  Calcium group        Placebo group        Difference
    1/2    0.07 (0.00;0.033)    0.00 (0.014;0.07)    0.006 (0.000;0.013)
    1      0.056 (0.047;0.065)  0.045 (0.035;0.054)  0.01 (0.00;0.01)
    1 1/2  0.083 (0.07;0.094)   0.071 (0.060;0.08)   0.01 (0.001;0.03)
    2      0.106 (0.094;0.118)  0.087 (0.075;0.099)  0.019 (0.006;0.031)

We see a significantly higher gain in BMD with calcium at last follow-up (P=0.003).

Models for the mean
- Changes over time usually appear gradually and often follow a distinct pattern.
- We gain power by incorporating this in our models.
- We want to estimate key parameters such as growth rates.
Model the mean as a continuous function of time:
- Linear
- Exponential (log-linear)
- Piecewise linear (linear spline)
- Flexible (polynomial, smoothing spline, etc.)
- Nonlinear (cyclic, dose-response, etc.)
(Many possibilities, not all treated in this course.)

Examples of mean curves
[Figure: examples of mean curves]

Calcium: estimated response profiles
[Figure: estimated response profiles for the calcium study]
It looks as if mean BMD increases linearly with time.

Calcium: linear model
To fit a linear model for the mean, use the continuous variable time (scheduled visit in years since baseline).

    title1 'Linear trend';
    proc mixed data=calcium method=ml plots=all;
      class grp (ref='P') girl visit;
      model bmd = time grp*time / ddfm=kr solution cl;
      repeated visit / type=un subject=girl r rcorr;
    run;

- Main effect of grp omitted (no difference at baseline).
- Use the categorical variable visit to specify the unstructured model for the covariance.
- Use the method=ml option for model comparisons.

Calcium: Estimates

    Fit Statistics
    -2 Log Likelihood          -430.4
    AIC (Smaller is Better)    -394.4
    AICC (Smaller is Better)   -393.0
    BIC (Smaller is Better)    -345.5

    Solution for Fixed Effects
    Effect grp Estimate StdError DF tValue Pr>|t| Alpha Lower Upper
    Intercept 0.870 0.00571 111 15.66 <.0001 0.05 0.8607 0.8834
    time 0.04410 0.00185 98.3 0.18 <.0001 0.05 0.03976 0.04843
    time*grp C 0.00888 0.003141 98.9.81 0.0060 0.05 0.00596 0.01506
    time*grp P 0 . . . . . . .

    Type 3 Tests of Fixed Effects
    Effect NumDF DenDF FValue Pr>F
    time 1 99.1 969.59 <.0001
    time*grp 1 98.9 7.90 0.0060

Extra increase in BMD with calcium of 0.0088 g/cm³ per year (95% CI: 0.006;0.0151, P=0.006).

Calcium: Estimated response profiles
[Figure: estimated response profiles under the linear model]

Comparison of models for the mean
- Goodness of fit is measured by the likelihood. Better fitting models have large values of the likelihood and therefore small values of the deviance: -2 log likelihood.
- ATT: Use the method=ml option in proc mixed.
- Compute the difference in deviances (called -2 log Q) and compare it to a χ²-distribution with df = difference in number of parameters.
- Only nested models can be compared this way.
Example: linear trend vs unrestricted response profiles.
    -2 log Q = 444.1 - 430.4 = 13.7 ~ χ²(9 - 3) = χ²(6), P = 0.033
BUT: Is linear evolution at all plausible?
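The likelihood ratio comparison above can be checked by hand. The following is a minimal sketch in plain Python, not part of the SAS analysis. The helper name `chi2_sf_even_df` is made up for illustration, and it relies on the closed-form upper tail of the chi-square distribution, which only holds for an even number of degrees of freedom (as is the case here with df = 6).

```python
import math

def chi2_sf_even_df(x, df):
    """Upper tail P(X > x) for a chi-square variable with an even df.

    Uses the closed form exp(-x/2) * sum_{i < df/2} (x/2)^i / i!,
    which is valid only when df is an even positive integer.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form needs an even, positive df")
    half = x / 2.0
    total = sum(half ** i / math.factorial(i) for i in range(df // 2))
    return math.exp(-half) * total

# Deviance difference and parameter counts as quoted on the slide:
# unrestricted response profiles (9 mean parameters) vs linear trend (3).
minus2_log_q = 444.1 - 430.4
df = 9 - 3
p_value = chi2_sf_even_df(minus2_log_q, df)
print(round(minus2_log_q, 1), df, round(p_value, 3))
```

Under these assumptions the p-value should agree with the P = 0.033 reported above. For odd df a general chi-square routine would be needed instead.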

Technical note on likelihood-types
When testing a submodel for the mean, the deviances of the compared models must be computed with the full (or conventional) likelihood

    proc mixed data=calcium method=ml;

not the residual likelihood (reml), which is the default in proc mixed.

But don't forget: Most hypotheses about the mean can be tested using just the default F-tests (optimal choice).

    proc mixed data=calcium;
      class grp girl visit treat;
      model bmd = time grp*time treat*visit / ddfm=kr solution;
      repeated visit / type=un subject=girl;
    run;

    Type 3 Tests of Fixed Effects
    Effect NumDF DenDF FValue Pr>F
    time 0 . . .
    time*grp 0 . . .
    visit*treat 6 16.9 0.039

Residuals for linear trend model
[Figure: residuals for the linear trend model]
Deviation from linearity is not that pronounced.

The unstructured covariance
So far we have made no assumptions about the covariance.
Advantages
- We make no wrong assumptions about the covariance of our observations. No need to think more about them.
- We gain insight into the actual structure of the covariance by looking at the estimates.
Drawbacks
- We use quite a lot of parameters to describe the covariance structure. Thus our analysis becomes less efficient.
- No good with small data sets; the results may be unstable.
- It can only be used with balanced data, i.e. all subjects have to be measured at identical times.

To get a view of the covariance
To get an unbiased view of the covariance we must ensure that our model for the mean is correct. We can use an unrestricted model as a starting point.
Now we want to see the estimated covariances and correlations, so we use the r and rcorr options in proc mixed:

    repeated visit / type=un subject=girl r rcorr;

Output for unstructured covariance

    Estimated R Matrix for girl 101
    Row  Col1      Col2      Col3      Col4      Col5
    1    0.003943  0.004186  0.004163  0.00438   0.003947
    2    0.004186  0.00475   0.004709  0.004808  0.004519
    3    0.004163  0.004709  0.004961  0.00504   0.00476
    4    0.00438   0.004808  0.00504   0.00536   0.004980
    5    0.003947  0.004519  0.00476   0.004980  0.004894

    Estimated R Correlation Matrix for girl 101
    Row  Col1    Col2    Col3    Col4    Col5
    1    1.0000  0.9698  0.9413  0.949   0.8985
    2    0.9698  1.0000  0.977   0.9585  0.9398
    3    0.9413  0.977   1.0000  0.9809  0.9591
    4    0.949   0.9585  0.9809  1.0000  0.9755
    5    0.8985  0.9398  0.9591  0.9755  1.0000

Models for the covariance
- Most often covariances display distinct features, e.g. decreasing correlation with increasing time span between observations, as in the calcium data.
- We gain power by incorporating these features in our model.
Possibilities:
- Unstructured covariance (lecture 1)
- Covariance pattern models (lecture 2)
- Variance components / random effects (lecture 3)
A huge selection is available with proc mixed.

Stationary covariance patterns
Large selection of models for equidistant observations.
Assumption: variances and correlations are stationary:
- The variance is constant over time.
- Correlations depend only on the time-distance between the observations, not on the specific times of measurement.
Examples: compound symmetry, autoregressive, autoregressive moving average, and the Toeplitz models.

    proc mixed type   Cov(Y_ij, Y_ik)                                                 no. par
    CS                $\sigma^2 [I\{j = k\} + \rho\, I\{j \neq k\}]$                  2
    AR(1)             $\sigma^2 \rho^{|k-j|}$                                         2
    ARMA(1,1)         $\sigma^2 [I\{j = k\} + \gamma \rho^{|k-j|-1} I\{j \neq k\}]$   3
    TOEP              $\sigma^2 [I\{j = k\} + \rho_{|k-j|} I\{j \neq k\}]$            n

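Examples of stationary correlations
[Figure: examples of stationary correlation patterns]

Example: Compound symmetry
Also called exchangeable covariance (more on this in lecture 3).
- The variance $\sigma^2$ is the same at all time points.
- All paired observations are correlated with the same correlation $\rho$, regardless of the specific time points.
The 5x5 covariance and correlation matrices are

$$\mathrm{Cov} = \begin{pmatrix}
\sigma^2 & \sigma^2\rho & \sigma^2\rho & \sigma^2\rho & \sigma^2\rho \\
\sigma^2\rho & \sigma^2 & \sigma^2\rho & \sigma^2\rho & \sigma^2\rho \\
\sigma^2\rho & \sigma^2\rho & \sigma^2 & \sigma^2\rho & \sigma^2\rho \\
\sigma^2\rho & \sigma^2\rho & \sigma^2\rho & \sigma^2 & \sigma^2\rho \\
\sigma^2\rho & \sigma^2\rho & \sigma^2\rho & \sigma^2\rho & \sigma^2
\end{pmatrix}, \qquad
\mathrm{Cor} = \begin{pmatrix}
1 & \rho & \rho & \rho & \rho \\
\rho & 1 & \rho & \rho & \rho \\
\rho & \rho & 1 & \rho & \rho \\
\rho & \rho & \rho & 1 & \rho \\
\rho & \rho & \rho & \rho & 1
\end{pmatrix}$$

In proc mixed we write:

    repeated visit / type=cs subject=girl r rcorr;

Output for Compound Symmetry

    Estimated R Matrix for girl 101
    Row  Col1      Col2      Col3      Col4      Col5
    1    0.004660  0.00445   0.00445   0.00445   0.00445
    2    0.00445   0.004660  0.00445   0.00445   0.00445
    3    0.00445   0.00445   0.004660  0.00445   0.00445
    4    0.00445   0.00445   0.00445   0.004660  0.00445
    5    0.00445   0.00445   0.00445   0.00445   0.004660

    Estimated R Correlation Matrix for girl 101
    Row  Col1    Col2    Col3    Col4    Col5
    1    1.0000  0.9496  0.9496  0.9496  0.9496
    2    0.9496  1.0000  0.9496  0.9496  0.9496
    3    0.9496  0.9496  1.0000  0.9496  0.9496
    4    0.9496  0.9496  0.9496  1.0000  0.9496
    5    0.9496  0.9496  0.9496  0.9496  1.0000

    Covariance Parameter Estimates
    Cov Parm  Subject  Estimate
    CS        girl     0.00445
    Residual           0.00035

Note: The Cov Parms are not $\sigma^2 = 0.004660$ and $\rho = 0.9496$... (lecture 3).

Example: AR(1)-covariance matrix
The so-called autoregressive covariance structure has
- Constant variance $\sigma^2$ over time.
- Correlation decreasing exponentially with the distance between the observations, $\mathrm{Cor}(Y_{ij}, Y_{ik}) = \rho^{|k-j|}$.
The 5x5 covariance and correlation matrices are:

$$\mathrm{Cov} = \begin{pmatrix}
\sigma^2 & \sigma^2\rho & \sigma^2\rho^2 & \sigma^2\rho^3 & \sigma^2\rho^4 \\
\sigma^2\rho & \sigma^2 & \sigma^2\rho & \sigma^2\rho^2 & \sigma^2\rho^3 \\
\sigma^2\rho^2 & \sigma^2\rho & \sigma^2 & \sigma^2\rho & \sigma^2\rho^2 \\
\sigma^2\rho^3 & \sigma^2\rho^2 & \sigma^2\rho & \sigma^2 & \sigma^2\rho \\
\sigma^2\rho^4 & \sigma^2\rho^3 & \sigma^2\rho^2 & \sigma^2\rho & \sigma^2
\end{pmatrix}, \qquad
\mathrm{Cor} = \begin{pmatrix}
1 & \rho & \rho^2 & \rho^3 & \rho^4 \\
\rho & 1 & \rho & \rho^2 & \rho^3 \\
\rho^2 & \rho & 1 & \rho & \rho^2 \\
\rho^3 & \rho^2 & \rho & 1 & \rho \\
\rho^4 & \rho^3 & \rho^2 & \rho & 1
\end{pmatrix}$$

In proc mixed we write:

    repeated visit / type=ar(1) subject=girl r rcorr;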
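To make the two stationary patterns concrete, here is a small sketch in plain Python that builds the compound symmetry and AR(1) correlation matrices for n equidistant visits. The function names are hypothetical helpers for illustration only; they are not part of proc mixed.

```python
def cs_corr(n, rho):
    """Compound symmetry: the same correlation rho for every pair of visits."""
    return [[1.0 if j == k else rho for k in range(n)] for j in range(n)]

def ar1_corr(n, rho):
    """AR(1): correlation rho^|k-j| decays with the number of visits apart."""
    return [[rho ** abs(k - j) for k in range(n)] for j in range(n)]

def to_cov(corr, sigma2):
    """Scale a correlation matrix by a common (stationary) variance sigma2."""
    return [[sigma2 * c for c in row] for row in corr]

# Example with 5 visits; rho = 0.9708 is the AR(1) estimate quoted in the
# SAS output for the calcium data, used here purely as an illustration.
for row in ar1_corr(5, 0.9708):
    print(" ".join(f"{c:.4f}" for c in row))
```

With rho = 0.9708 the correlation between the first and the last visit is rho to the fourth power, about 0.888, whereas compound symmetry would force it to equal the lag-one correlation.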

Output from TYPE=AR(1) structure

    Estimated R Matrix for girl 101
    Row  Col1      Col2      Col3      Col4      Col5
    1    0.004401  0.0047    0.004148  0.00406   0.003909
    2    0.0047    0.004401  0.0047    0.004148  0.00406
    3    0.004148  0.0047    0.004401  0.0047    0.004148
    4    0.00406   0.004148  0.0047    0.004401  0.0047
    5    0.003909  0.00406   0.004148  0.0047    0.004401

    Estimated R Correlation Matrix for girl 101
    Row  Col1    Col2    Col3    Col4    Col5
    1    1.0000  0.9708  0.944   0.9148  0.8881
    2    0.9708  1.0000  0.9708  0.944   0.9148
    3    0.944   0.9708  1.0000  0.9708  0.944
    4    0.9148  0.944   0.9708  1.0000  0.9708
    5    0.8881  0.9148  0.944   0.9708  1.0000

    Covariance Parameter Estimates
    Cov Parm  Subject  Estimate
    AR(1)     girl     0.9708
    Residual           0.004401

Here $\sigma^2 = 0.004401$ and $\rho = 0.9708$.

Heterogeneous covariance patterns
The assumption that the variance does not change with time can be dropped by assuming a heterogeneous covariance pattern.
- No restrictions on the variances $\sigma_1^2, \sigma_2^2, \ldots$
- Correlations are assumed to be stationary; they depend only on the time-distance between observations, not on the specific time points.
Examples: the heterogeneous compound symmetry, heterogeneous autoregressive, heterogeneous Toeplitz, and antedependence covariance structures.

    proc mixed type   Cov(Y_ij, Y_ik)                                                        no. par
    CSH               $\sigma_j \sigma_k [I\{j = k\} + \rho\, I\{j \neq k\}]$                n + 1
    ARH(1)            $\sigma_j \sigma_k \rho^{|k-j|}$                                       n + 1
    TOEPH             $\sigma_j \sigma_k [I\{j = k\} + \rho_{|k-j|} I\{j \neq k\}]$          2n - 1
    ANTE(1)           $\sigma_j \sigma_k \prod_{l=j}^{k-1} \rho_l$                           2n - 1

Output ARH(1) structure

    Estimated R Matrix for girl 101
    Row  Col1      Col2      Col3      Col4      Col5
    1    0.004013  0.00465   0.00415   0.0041    0.003939
    2    0.00465   0.004774  0.004719  0.00475   0.004409
    3    0.00415   0.004719  0.00491   0.004919  0.004590
    4    0.0041    0.00475   0.004919  0.005188  0.004841
    5    0.003939  0.004409  0.004590  0.004841  0.004758

    Estimated R Correlation Matrix for girl 101
    Row  Col1    Col2    Col3    Col4    Col5
    1    1.0000  0.9744  0.9494  0.951   0.9014
    2    0.9744  1.0000  0.9744  0.9494  0.951
    3    0.9494  0.9744  1.0000  0.9744  0.9494
    4    0.951   0.9494  0.9744  1.0000  0.9744
    5    0.9014  0.951   0.9494  0.9744  1.0000

    Covariance Parameter Estimates
    Cov Parm  Subject  Estimate
    Var(1)    girl     0.004013
    Var(2)    girl     0.004774
    Var(3)    girl     0.00491
    Var(4)    girl     0.005188
    Var(5)    girl     0.004758
    ARH(1)    girl     0.9744

Estimates of $\sigma_1^2, \sigma_2^2, \ldots, \sigma_5^2$, and $\rho$.

Comparison of covariance structures

    Model   -2 log L  par.  -2 log Q  df  P-value
    UN      -35.6     15
    ARH(1)  -343.4    6     9.        9   0.4
    AR(1)   -34.8           7.8       13  0.0096
    CS      -195.1          157.5     13  < 0.0001

For comparison of covariance patterns, use either of the two likelihood types (ml or reml); just don't compare one to the other.

Predicted response profiles
[Figure: predicted response profiles under the different covariance patterns]
The estimated response profiles are almost identical for all our choices of covariance patterns!
If the data were balanced and complete we would get perfect agreement (all would be equal to the treat*time averages).

Tests of treatment effect
BUT: Confidence intervals and tests depend on the covariance.
Tests of treatment effect at last follow-up:

    Covariance pattern        Estimate (95% CI)        P-value
    Assumed independence      0.099 (0.000;0.0578)     0.036
    Compound symmetry         0.0196 (0.0110;0.083)    < 0.0001
    Autoregressive            0.001 (0.0078;0.034)     0.0014
    Heterogeneous autoregr.   0.0194 (0.0073;0.0315)   0.0018
    Unstructured              0.0190 (0.0065;0.0314)   0.034

Modeling strategy
As suggested by Fitzmaurice et al. (2011):
1. Put up a plausible (e.g. saturated) model for the mean.
2. Fit the data, so far ignoring correlation (GLM).
3. Check the residuals to assess the adequacy of the model for the mean and to get an impression of the error covariance.
4. Pick a reasonable model for the covariance (if possible, test it against the unstructured model).
5. Re-check the model fit.
6. Do the analysis.
Note: Lecture 4 covers model diagnostics, among other things.

Alternative: Robust standard errors
- A goodness-of-fit test for the covariance might select an incorrect model due to lack of power.
- If data are balanced and complete, estimates are unbiased but the standard errors may be biased.
Instead one could use the robust sandwich covariance estimator:

    proc mixed empirical data=calcium;

However, the sandwich covariance estimator...
- is useless in small samples because it is anti-conservative.
- needs additional weighting to get unbiased estimates when there are missing data.
(More about sandwich covariance estimators in lectures 5 and 6.)

Many time points and few subjects
In this situation choosing a reliable model for the covariance is just about impossible.
- Unstructured covariance has too many parameters.
- Compound symmetry underestimates correlation between observations close in time and overestimates correlation between observations far apart in time.
- We have no crystal ball for choosing a simple yet correct covariance pattern.
- Robust standard errors are anti-conservative.
So what can we do?

Reduction to independent data
By analyzing carefully chosen characteristics for each individual we can resort to simple analyses which have no repeated measurements issues. Maybe not optimal, but feasible and easy!
Examples of useful summary statistics:
- The changes from baseline to endpoint
- The slopes for individual time effects
- The area under the curve (AUC)
- The time to peak or peak value
Note: Comparing measurements for each time point in turn is not recommended without adjustment for multiple testing.

Case: Calcium supplements
[Figure: individual BMD profiles in the calcium study]
The overall time course looks reasonably linear, but maybe the girls have individual growth rates.

Individual regression
Fit an ordinary linear regression for each girl:

$$Y_{ij} = A_i + B_i t_j + \varepsilon_{ij}, \qquad \varepsilon_{ij} \sim N(0, \sigma^2)$$

Are the slopes of the calcium group systematically bigger?

Analysis of summary statistics
Comparison of individual intercepts and slopes, just using the two-sample t-test (for independent data):

    Group       Level at baseline (g/cm³)   Slope (g/cm³ per year)
    P           0.8697 (0.8515;0.8880)      0.0411 (0.0361;0.046)
    C           0.8817 (0.8650;0.8985)      0.0497 (0.0438;0.0556)
    Difference  0.010 (-0.015;0.0365)       0.00854 (0.0009;0.016)
    P-value     0.33                        0.09

Limitations of summary statistics
- No baseline adjustment, hence less power.
- Some of the girls in the calcium study dropped out:
  - We get less accurate slope estimates from girls with few observations.
  - No slope at all if drop-out was right after baseline.
  - And maybe those with low BMD are more likely to drop out; they think they need a supplement but might have got the placebo.
Can we make better use of the full data?
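The summary-statistics approach (one ordinary regression per girl, then a comparison of the slopes between groups) can be sketched as follows. This is a minimal illustration in plain Python with made-up toy data; the dictionary `toy`, the helper `ols_fit`, and all numbers in it are hypothetical and are not taken from the calcium study.

```python
from statistics import mean

def ols_fit(times, values):
    """Return (intercept, slope) of the least-squares line through the points."""
    t_bar, y_bar = mean(times), mean(values)
    sxx = sum((t - t_bar) ** 2 for t in times)
    sxy = sum((t - t_bar) * (y - y_bar) for t, y in zip(times, values))
    slope = sxy / sxx
    return y_bar - slope * t_bar, slope

# Hypothetical toy data: girl id -> (group, visit times in years, BMD values).
toy = {
    101: ("C", [0.0, 0.5, 1.0, 1.5, 2.0], [0.880, 0.905, 0.930, 0.955, 0.980]),
    102: ("C", [0.0, 0.5, 1.0], [0.870, 0.894, 0.918]),
    103: ("P", [0.0, 0.5, 1.0, 1.5, 2.0], [0.870, 0.890, 0.910, 0.930, 0.950]),
    104: ("P", [0.0], [0.860]),  # dropped out right after baseline
}

slopes = {}
for girl, (grp, times, bmd) in toy.items():
    if len(times) < 2:
        continue  # a single observation gives no slope at all
    _, slope = ols_fit(times, bmd)
    slopes.setdefault(grp, []).append(slope)

mean_slopes = {grp: mean(vals) for grp, vals in slopes.items()}
print(mean_slopes)
```

The per-girl slopes collected in `slopes` would then be compared between groups with a two-sample t-test. Note how girl 104 contributes nothing and girl 102 contributes a slope based on only three visits, which is exactly the limitation listed above.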

Random regression
We let each girl have her own level $A_i$ and her own slope $B_i$.
We assume these individual parameters ($A_i$ and $B_i$) follow a bivariate normal distribution in the population:

$$\begin{pmatrix} A_i \\ B_i \end{pmatrix} \sim N\left( \begin{pmatrix} \alpha_{g(i)} \\ \beta_{g(i)} \end{pmatrix}, \begin{pmatrix} \tau_a^2 & \omega_{ab} \\ \omega_{ab} & \tau_b^2 \end{pmatrix} \right)$$

The covariance is the so-called G-matrix.
- G describes the population variance of the regression parameters, i.e. the inter-individual variation.
- Note the correlation $\rho_{ab} = \omega_{ab} / (\tau_a \tau_b)$; subjects with lower baseline might tend to also have smaller (or larger) growth rates.

PROC MIXED: random regression

    proc mixed data=calcium plots=all;
      class grp girl;
      model bmd = time grp*time / ddfm=kr solution cl;
      random intercept time / type=un subject=girl g v vcorr;
    run;

- Individual intercepts and slopes are so-called random effects. They must be specified in a random statement.
- Note: The intercept refers to time=0 (baseline).
- Here type=un refers to an unstructured specification of the G-matrix. If it is omitted, we may experience convergence problems and sometimes totally incomprehensible results.
- Option g prints the estimated G-matrix.
(More about random effects in lecture 3.)

Output from random regression

    Estimated G Matrix
    Row  Effect     girl  Col1      Col2
    1    Intercept  101   0.004155  0.000103
    2    time       101   0.000103  0.000191

    Covariance Parameter Estimates
    Cov Parm  Subject  Estimate
    UN(1,1)   girl     0.004155
    UN(2,1)   girl     0.000103
    UN(2,2)   girl     0.000191
    Residual           0.00015

    Fit Statistics
    -2 Res Log Likelihood      -350.4
    AIC (smaller is better)    -34.4
    AICC (smaller is better)   -34.3
    BIC (smaller is better)    -331.5

Note: $\rho_{ab} = 0.000103 / \sqrt{0.004155 \cdot 0.000191} \approx 0.1$.

Output from random regression

    Solution for Fixed Effects
    Effect grp Estimate StdError DF tValue Pr>|t| Alpha Lower Upper
    Intercept 0.875 0.006149 111 14.3 <.0001 0.05 0.8630 0.8874
    time 0.04490 0.0008 96 0.33 <.0001 0.05 0.04051 0.0498
    time*grp C 0.008859 0.003174 96.5.79 0.0063 0.05 0.00559 0.01516
    time*grp P 0 . . . . . . .

    Type 3 Tests of Fixed Effects
    Effect NumDF DenDF FValue Pr>F
    time 1 96.4 98.47 <.0001
    time*grp 1 96.5 7.79 0.0063

We find an extra increase in BMD of 0.0089 g/cm³ per year with calcium supplement (95% CI: 0.006-0.015, P=0.0063).
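It may help to see how the random regression maps onto the model and random statements in proc mixed. The following is a sketch of the usual decomposition; the symbols $a_i$ and $b_i$ for the subject-specific deviations are introduced here for illustration and do not appear on the slides.

```latex
\begin{aligned}
A_i &= \alpha_{g(i)} + a_i, \qquad B_i = \beta_{g(i)} + b_i, \qquad
\begin{pmatrix} a_i \\ b_i \end{pmatrix} \sim
N\!\left( \begin{pmatrix} 0 \\ 0 \end{pmatrix},
\begin{pmatrix} \tau_a^2 & \omega_{ab} \\ \omega_{ab} & \tau_b^2 \end{pmatrix} \right), \\[4pt]
Y_{ij} &= \underbrace{\alpha_{g(i)} + \beta_{g(i)}\, t_j}_{\text{fixed part (model statement)}}
 \;+\; \underbrace{a_i + b_i\, t_j}_{\text{random part (random statement)}}
 \;+\; \varepsilon_{ij}, \qquad \varepsilon_{ij} \sim N(0, \sigma^2).
\end{aligned}
```

In the SAS code shown for this model the main effect of grp is left out of the model statement, which corresponds to a common intercept $\alpha_{g(i)} = \alpha$ in both groups, while grp*time allows the mean slope $\beta_{g(i)}$ to differ between them.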

Implied covariance
The random regression implies the covariance pattern

$$\mathrm{Cov}(Y_{ij}, Y_{ik}) = \tau_a^2 + (t_j + t_k)\,\omega_{ab} + t_j t_k\, \tau_b^2$$

as a function of the G-matrix parameters and the time points (for $j \neq k$; the residual variance $\sigma^2$ is added on the diagonal).
Does this fit the data well?
Random regression vs UN (linear trend for the mean):
    -2 log Q = 400.9 - 350.4 = 50.5 ~ χ²(15 - 3) = χ²(12), P = 0.00001
No, apparently not.

Implied covariance
Options v and vcorr print the covariance and correlation.

    Estimated V Matrix for girl 101
    Row  Col1      Col2      Col3      Col4      Col5
    1    0.00480   0.00407   0.00458   0.004309  0.004360
    2    0.00407   0.004430  0.004405  0.004503  0.00460
    3    0.00458   0.004405  0.004676  0.004698  0.004844
    4    0.004309  0.004503  0.004698  0.005017  0.005086
    5    0.004360  0.00460   0.004844  0.005086  0.005453

    Estimated V Correlation Matrix for girl 101
    Row  Col1    Col2    Col3    Col4    Col5
    1    1.0000  0.9660  0.9518  0.999   0.906
    2    0.9660  1.0000  0.9677  0.955   0.9364
    3    0.9518  0.9677  1.0000  0.9699  0.9594
    4    0.999   0.955   0.9699  1.0000  0.974
    5    0.906   0.9364  0.9594  0.974   1.0000

Nonequidistant time points
In the calcium study the girls are only seen approximately twice a year.
- Perhaps we get better estimates of the slopes when replacing the planned time of visit with the actual individual times?
- But we lose the option of an unstructured covariance.
- Other covariance patterns can still be fitted, e.g. the autoregressive pattern, the random regression model, and the compound symmetry pattern.
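The implied covariance formula can be evaluated directly from the G-matrix estimates. Below is a small sketch in plain Python; the function name `implied_cov` is a hypothetical helper, the G-matrix values are those printed in the random regression output, and the residual variance `sigma2` is a placeholder assumption chosen only for illustration.

```python
def implied_cov(times, tau_a2, omega_ab, tau_b2, sigma2):
    """Marginal covariance of one subject's measurements under random regression.

    Off-diagonal: tau_a2 + (t_j + t_k) * omega_ab + t_j * t_k * tau_b2.
    Diagonal: the same expression plus the residual variance sigma2.
    """
    n = len(times)
    v = [[0.0] * n for _ in range(n)]
    for j, tj in enumerate(times):
        for k, tk in enumerate(times):
            v[j][k] = tau_a2 + (tj + tk) * omega_ab + tj * tk * tau_b2
            if j == k:
                v[j][k] += sigma2
    return v

scheduled = [0.0, 0.5, 1.0, 1.5, 2.0]  # scheduled visit times in years
v = implied_cov(scheduled,
                tau_a2=0.004155,    # UN(1,1) from the G-matrix output
                omega_ab=0.000103,  # UN(2,1)
                tau_b2=0.000191,    # UN(2,2)
                sigma2=0.0001)      # placeholder residual variance (assumption)
for row in v:
    print(" ".join(f"{x:.6f}" for x in row))
```

Under these inputs the covariance between baseline and the two-year visit is 0.004155 + 2 × 0.000103 = 0.004361, in line with the corresponding off-diagonal entry of the V matrix above. The variance grows with time because of the $t_j t_k \tau_b^2$ term, so the implied pattern is not stationary.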

BMD vs actual time of visit
[Figure: BMD vs actual time of visit]

Non-equidistant observations
Only a limited number of covariance patterns are available when time points are individual or non-equidistant. They are all stationary:
- The variance is constant over time.
- The correlation depends only on the time-distance between the observations, not on the specific time points.
The ctime variable must be a numerical variable in SAS.

    proc mixed type=   Cov(Y_ij, Y_ik)                                                              no. param
    CS                 $\sigma^2 [I\{j = k\} + \rho\, I\{j \neq k\}]$                               2
    SP(POW)(ctime)     $\sigma^2 \rho^{|t_{ij} - t_{ik}|}$                                          2
    SP(GAU)(ctime)     $\sigma^2 e^{-(t_{ij} - t_{ik})^2 / \gamma^2}$                               2
    SP(LIN)(ctime)     $\sigma^2 (1 - \rho |t_{ik} - t_{ij}|)\, I\{\rho |t_{ik} - t_{ij}| \leq 1\}$  2

Random regression in observed time
The syntax is exactly the same as with scheduled time points; only the name of the time variable changes.

    title1 'random regression (actual time)';
    proc mixed data=calcium plots=all;
      class grp girl;
      model bmd = obstime grp*obstime / ddfm=kr solution cl;
      random intercept obstime / type=un subject=girl g;
    run;

Random regression, using actual duration

    Estimated G Matrix
    Row  Effect     girl  Col1      Col2
    1    Intercept  101   0.004174  0.000103
    2    obstime    101   0.000103  0.000179

    Solution for Fixed Effects
    Effect grp Estimate StdError DF tValue Pr>|t| Alpha Lower Upper
    Intercept 0.8751 0.006163 111 141.99 <.0001 0.05 0.869 0.8873
    obstime 0.04538 0.0016 96.3 0.99 <.0001 0.05 0.04109 0.04968
    obstime*grp C 0.008758 0.003104 96.9.8 0.0058 0.05 0.00596 0.0149
    obstime*grp P 0 . . . . . . .

    Type 3 Tests of Fixed Effects
    Effect NumDF DenDF FValue Pr>F
    obstime 1 96.6 1045.11 <.0001
    obstime*grp 1 96.9 7.96 0.0058

We find an extra increase in BMD of 0.0088 g/cm³ per year with calcium supplement, 95% CI: 0.006 to 0.0149, P=0.0058.
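The spatial power structure SP(POW) is the continuous-time analogue of AR(1): the exponent is the actual time difference rather than the number of visits apart. A minimal sketch in plain Python follows, with a hypothetical helper name and made-up observation times for one subject.

```python
def sp_pow_cov(times, sigma2, rho):
    """Spatial power covariance: sigma2 * rho^|t_j - t_k| for arbitrary times."""
    return [[sigma2 * rho ** abs(tj - tk) for tk in times] for tj in times]

# Made-up actual visit times (years) for one subject, slightly off schedule.
obstime = [0.0, 0.52, 0.96, 1.50, 1.98]
cov = sp_pow_cov(obstime, sigma2=1.0, rho=0.9)  # illustrative parameter values
for row in cov:
    print(" ".join(f"{x:.4f}" for x in row))

# With equally spaced times one unit apart this reduces to the AR(1) pattern.
equi = sp_pow_cov([0, 1, 2], sigma2=1.0, rho=0.5)
print(equi[0])  # correlations 1, rho, rho squared
```

Because the matrix is built from each subject's own time points, every subject may have a different covariance matrix while the model still has only two parameters.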

Tests of treatment effect
Comparison of estimates for different covariance structures:

    Covariance pattern     Estimate (95% CI)       P-value
    Assumed independence   0.017 (0.0071;0.07)     0.0009
    Compound symmetry      0.009 (0.0053;0.0138)   < 0.0001
    Autoregressive         0.0100 (0.0038;0.016)   0.0017
    Random regression      0.0088 (0.006;0.0149)   0.0058

Estimates and tests depend on the covariance!

Scheduled vs observed times
Estimated slopes from the two random regressions:

    Group       Scheduled time           Actual time
    P           0.0449 (0.0405;0.0493)   0.0454 (0.0411;0.0497)
    C           0.0537 (0.0493;0.058)    0.0541 (0.0498;0.0585)
    Difference  0.0089 (0.006;0.01516)   0.0088 (0.006;0.0149)
    P-value     0.0063                   0.0058

- Slightly steeper mean slopes in both groups with actual times.
- Slightly smaller standard errors with actual times.
This would have been more pronounced if the actual times of visit had deviated more from the scheduled times.

Concluding remarks
Results depend on the choice of covariance pattern:
- Obvious bias for unrealistic models (independence, CS).
- More similar results for the more complex models.
Not much impact of using exact times of measurement instead of planned times (visit) - WHY?
- Gain in modeling flexibility by rounding the times.
- There are sophisticated statistical arguments implying that rounding to the nearest scheduled time does not cause bias (Berkson error).
- But it increases variance.
Choosing an appropriate model is a compromise between practical feasibility, realistic model assumptions, and interpretable results.