Faculty of Health Sciences

Introduction to SAS proc mixed
Analysis of repeated measurements, 2017
Julie Forman
Department of Biostatistics, University of Copenhagen


Preparing data for analysis

Most often raw data is stored in the wide format (e.g. in Excel):
  - one row per subject
  - several columns with the outcomes for different occasions

Example:

    id  sex  age  group  aix0  aix1  aix2
     1    1   57      0  10.5  17.5  25.0
     2    1   48      0  -2.5   8.0   8.5
     3    2   54      1  18.0  24.0  23.5
    ...

To fit a linear mixed model with any statistical software, data must be in the so-called long format.


The long format

  - Each row contains only one observation of the outcome.
  - A time-variable identifies the time of measurement.
  - An id-variable identifies measurements from the same subject.

    Obs  id  sex  age  group  week   aix
      1   1    1   57      0     0  10.5
      2   1    1   57      0    12  17.5
      3   1    1   57      0    24  25.0
      4   2    1   48      0     0  -2.5
      5   2    1   48      0    12   8.0
      6   2    1   48      0    24   8.5
      7   3    2   54      1     0  18.0
      8   3    2   54      1    12  24.0
      9   3    2   54      1    24  23.5
     10   4    2   46      1     0  26.0
    ...
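To try the data steps on a small scale, the three subjects shown in the wide-format example can be typed in directly. This is only a sketch: the dataset name ckd_wide matches the one used in the transformation step, but the real course data contain more subjects than the three rows listed here.

```sas
/* Toy version of the wide data: only the three subjects shown in the example */
DATA ckd_wide;
  INPUT id sex age group aix0 aix1 aix2;
  DATALINES;
1 1 57 0 10.5 17.5 25.0
2 1 48 0 -2.5 8.0 8.5
3 2 54 1 18.0 24.0 23.5
;
RUN;

/* Check that there is one row per subject */
PROC PRINT DATA=ckd_wide;
RUN;
```

Three subjects are far too few to fit the models in these slides, so this is useful only for checking the data handling.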
From wide to long format

Data is transformed from the wide to the long format with:

    DATA ckd (DROP = aix1-aix2);
      SET ckd_wide;
      week = 0;  aix = aix0; OUTPUT;
      week = 12; aix = aix1; OUTPUT;
      week = 24; aix = aix2; OUTPUT;
    RUN;

Note: We keep the baseline variable aix0 for the ANCOVA.


Spaghettiplots

The spaghettiplots from the lecture were made with:

    PROC SGPANEL DATA=ckd;
      PANELBY group;
      SERIES x = week y = aix / GROUP=id;
    RUN;

Note: Applies to data in the long format.


Summary statistics and pairwise scatterplots

    PROC SORT DATA=ckd_wide;
      BY group;
    RUN;

    ODS GRAPHICS ON;
    PROC CORR DATA=ckd_wide PLOTS=MATRIX(HISTOGRAM) NOPROB;
      BY group;
      VAR aix0-aix2;
    RUN;

Note: Applies to data in the wide format.
Plotting averages over time

The plot of the group-time averages was made with:

    PROC MEANS DATA=ckd NWAY;
      CLASS group week;
      VAR aix;
      OUTPUT OUT=ckdmeans MEAN=average;
    RUN;

    PROC SGPLOT DATA=ckdmeans;
      SERIES x = week y = average / GROUP = group MARKERS;
    RUN;

Note: Applies to data in the long format.


Syntax: Analysis of response profiles

    PROC MIXED DATA=ckd PLOTS=all;
      CLASS id week (ref='0') group (ref='0');
      MODEL aix = week group group*week / SOLUTION CL DDFM=KR OUTPM=ckdfit;
      REPEATED week / SUBJECT=id TYPE=UN R RCORR;
    RUN;

  - Syntax is similar to PROC GLM, with a MODEL statement specifying the (linear) relation between outcome and covariates.
  - Categorical variables must be declared with CLASS.
  - The model for the covariance (TYPE=UN, unstructured) is specified in a separate REPEATED statement.
  - Fitted values and residuals are saved in the dataset ckdfit.
  - Use the PLOTS option to get some residual plots.


The option DDFM=KENWARDROGER (aka KR)

(or DDFM=SATTERTHWAITE)

  - A technical option intended to improve the statistical performance of the t-tests and F-tests.
  - It has no effect on balanced data.
  - In unbalanced situations (i.e. for almost all observational studies and in case of missing observations) the degrees of freedom are computed by a more complicated formula.
  - The computations may require a little more time, but in most cases this will not be noticeable.
  - When in doubt, use it!
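As a sketch of the alternative mentioned in parentheses, the same response-profile model can be run with DDFM=SATTERTHWAITE instead of DDFM=KR. The REPEATED statement with SUBJECT=id and TYPE=UN is assumed here, matching the unstructured covariance described for this model.

```sas
/* Response-profile model with Satterthwaite degrees of freedom */
PROC MIXED DATA=ckd;
  CLASS id week (REF='0') group (REF='0');
  MODEL aix = week group group*week / SOLUTION CL DDFM=SATTERTHWAITE;
  REPEATED week / SUBJECT=id TYPE=UN;   /* assumed covariance specification */
RUN;
```

Comparing the DF column of the two runs shows how much the choice matters for a given dataset. Kenward-Roger additionally adjusts the standard errors of the fixed effects, so small differences in the StdError column are also possible.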
Estimated response profiles

Use the output data (ckdfit) to plot the estimated profiles:

    PROC SORT DATA=ckdfit;
      BY group week id;
    RUN;

    PROC SGPLOT DATA=ckdfit;
      SERIES x = week y = pred / GROUP = group MARKERS;
    RUN;


Alternative model specifications

The same model can be phrased differently to highlight differences between groups at specific time points or changes over time. Below, time denotes the categorical time variable (week in the ckd data).

To compare change over time between groups: include both main effects and the interaction term.

    MODEL aix = time group time*group / SOLUTION CL;

To get mean differences between groups at each time point: omit the main effect of group and the intercept.

    MODEL aix = time time*group / NOINT SOLUTION CL;

To get the means for all combinations of group and time: include only the interaction term and omit the intercept.

    MODEL aix = time*group / NOINT SOLUTION CL;

Note: Usually combined with LSMEANS (on the next slide).


LSMEANS

Estimates the means for all combinations of time and group, and all possible differences between them (DIFF option).

    PROC MIXED DATA=ckd;
      CLASS id week group;
      MODEL aix = group*week / NOINT DDFM=KR;
      REPEATED week / SUBJECT=id TYPE=UN;
      LSMEANS group*week / DIFF SLICE=week CL;
    RUN;

  - NOINT means that the model does not include an intercept (so there is no need to specify reference groups).
  - Use SLICE=week to test for overall differences between multiple groups at each time separately (like one-way ANOVA).
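The LSMEANS tables can become long, so it may be convenient to save them as datasets. The sketch below uses ODS OUTPUT for this. The dataset names ckd_lsmeans, ckd_diffs and ckd_slices are made up for illustration, and the REPEATED statement is assumed to be the unstructured one used for the response-profile model.

```sas
/* Save the LSMEANS results as datasets (output dataset names are hypothetical) */
ODS OUTPUT LSMeans=ckd_lsmeans Diffs=ckd_diffs Slices=ckd_slices;
PROC MIXED DATA=ckd;
  CLASS id week group;
  MODEL aix = group*week / NOINT DDFM=KR;
  REPEATED week / SUBJECT=id TYPE=UN;   /* assumed covariance specification */
  LSMEANS group*week / DIFF SLICE=week CL;
RUN;

/* Inspect the pairwise differences; most are comparisons across different weeks */
PROC PRINT DATA=ckd_diffs;
RUN;
```

Saving the differences makes it easy to keep only the comparisons of interest, for example the group differences within the same week.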
First we get a summary of what data and methods proc mixed has used (some we have specified, others are SAS defaults).

    The Mixed Procedure

    Model Information
    Data Set                     WORK.CKD
    Dependent Variable           aix
    Covariance Structure         Unstructured
    Subject Effect               id
    Estimation Method            REML
    Residual Variance Method     None
    Fixed Effects SE Method      Kenward-Roger
    Degrees of Freedom Method    Kenward-Roger

    Class Level Information
    Class   Levels  Values
    id          51  1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
                    21 22 23 24 25 26 28 29 30 31 32 33 34 35 36 37 38
                    39 40 41 42 43 45 46 47 48 49 51 52 53 54
    week         3  12 24 0
    group        2  1 0

    Dimensions
    Covariance Parameters      6
    Columns in X              12
    Columns in Z               0
    Subjects                  51
    Max Obs Per Subject        3

This is a summary of the mathematical model specification, which is explained in lecture 4.

    Number of Observations
    Number of Observations Read        153
    Number of Observations Used        144
    Number of Observations Not Used      9

Note: Missing data are due to drop-out and failed measurements.


In contrast to ordinary linear models, no explicit formulae for the maximum likelihood estimates exist for linear mixed models in general. Therefore SAS uses numerical optimisation to compute estimates of the mean and covariance parameters.

    Iteration History
    Iteration  Evaluations  -2 Res Log Like    Criterion
            0            1    1070.85454941
            1            2     982.86560047   0.00144735
            2            1     982.26253864   0.00009905
            3            1     982.22468047   0.00000061
            4            1     982.22445749   0.00000000

    Convergence criteria met.

Always check that the numerical optimisation has converged!


The options R and RCORR make SAS print the estimated covariance and correlation matrices.

    Estimated R Matrix for id 1
    Row      Col1      Col2      Col3
      1    106.23   96.3802   80.1893
      2   96.3802    159.64    106.48
      3   80.1893    106.48    106.38

    Estimated R Correlation Matrix for id 1
    Row      Col1      Col2      Col3
      1    1.0000    0.7401    0.7544
      2    0.7401    1.0000    0.8171
      3    0.7544    0.8171    1.0000
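As a small worked check of how the two matrices relate, each correlation is the covariance divided by the product of the two standard deviations. For weeks 0 and 12 (rows 1 and 2), using the entries of the estimated R matrix printed above:

```latex
\hat\rho_{12}
  = \frac{\hat\sigma_{12}}{\sqrt{\hat\sigma_{11}\,\hat\sigma_{22}}}
  = \frac{96.3802}{\sqrt{106.23 \times 159.64}}
  \approx 0.7401
```

This agrees with the entry in row 1, column 2 of the estimated R correlation matrix.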
Fit statistics can be used for comparison of different models.

    Fit Statistics
    -2 Res Log Likelihood          982.2
    AIC (smaller is better)        994.2
    AICC (smaller is better)       994.9
    BIC (smaller is better)       1005.8

    Null Model Likelihood Ratio Test
    DF   Chi-Square   Pr > ChiSq
     5        88.63       <.0001

The test of "all means are the same" is hardly ever of interest.

Make sure to use the METHOD=ML option in the PROC MIXED statement if you want to use these statistics to test nested models for the mean structure (lecture 2).


At last, what is most interesting: estimates and tests.

    Solution for Fixed Effects
    Effect      week  group  Estimate  Std Error    DF  t Value  Pr > |t|
    Intercept                 24.3431     2.0793  49.4    11.71    <.0001
    week        12             1.0887     1.7694  46.2     0.62    0.5414
    week        24             3.0895     1.4995  44.5     2.06    0.0452
    week        0              0          .        .        .       .
    group             1       -2.0547     2.8999  48.9    -0.71    0.4820
    group             0        0          .        .        .       .
    week*group  12    1       -1.9493     2.4871  45.8    -0.78    0.4372
    week*group  12    0        0          .        .        .       .
    week*group  24    1       -3.6078     2.1298  45.3    -1.69    0.0971
    week*group  24    0        0          .        .        .       .
    week*group  0     1        0          .        .        .       .
    week*group  0     0        0          .        .        .       .

(confidence intervals omitted due to lack of space)

    Type 3 Tests of Fixed Effects
    Effect      Num DF  Den DF  F Value  Pr > F
    week             2    44.5     0.99  0.3794
    group            1      47     1.84  0.1817
    week*group       2    44.5     1.43  0.2490


Standardized (aka Studentized) residuals: Normal distribution?

[Figure: residual plots for the standardized residuals]

(Other residuals and boxplots of residuals vs time and group omitted)
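Residual plots by time and group can also be drawn by hand from the OUTPM dataset. The sketch below assumes that ckdfit was created with OUTPM=ckdfit as in the response-profile analysis, so that it contains the variable Resid. These are raw residuals, not the standardized ones shown by the PLOTS option.

```sas
/* Boxplots of raw residuals by week and group (assumes ckdfit from OUTPM=ckdfit) */
PROC SGPLOT DATA=ckdfit;
  VBOX resid / CATEGORY=week GROUP=group;
RUN;

/* Histogram of raw residuals with a fitted normal density */
PROC SGPLOT DATA=ckdfit;
  HISTOGRAM resid;
  DENSITY resid / TYPE=NORMAL;
RUN;
```

With an unstructured covariance the residual variance may differ between time points, so a histogram of raw residuals pooled over time is only a rough check of normality.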
Which model should I choose?

Results from the clmm and the ANCOVA model are usually very similar.

We recommend the clmm:
  - Programming and interpretation is easier.
  - It is slightly better at handling missing data.

Exception: If randomization was performed conditionally on baseline measurements, then the ANCOVA is a valid model while the clmm is not.


The constrained linear mixed model (clmm)

To fit the constrained model:
  1. Define a new treatment variable by joining groups at baseline.
  2. Leave out the main term treat in the model statement.

    DATA ckd;
      SET ckd;
      treat = group;
      IF week = 0 THEN treat = 0;
    RUN;

    PROC MIXED DATA=ckd;
      CLASS id week (ref='0') treat (ref='0');
      MODEL aix = week treat*week / SOLUTION CL DDFM=KR;
      REPEATED week / SUBJECT=id TYPE=UN;
    RUN;


ANCOVA

To prepare for the analysis:
  - Baseline must be included as a covariate in the data.
  - Only follow-up times are used when running the analysis.
  - For ease of interpretation and numerical stability we center the baseline variable around its mean.
  - For ease of quantification we use change since baseline as outcome.

    DATA followup;
      SET ckd;
      IF week > 0;
      baseline = aix0 - xxxx;    /* xxxx stands for the mean of aix0 */
      aixchange = aix - aix0;
    RUN;


ANCOVA

To run the analysis with proc mixed:
  - Include the baseline*time interaction in the model.
  - Since the analysis is based on follow-up data, the most natural reference point for time is now the last follow-up.
  - The treatment effect (at last follow-up) is estimated by the group effect.

    PROC MIXED DATA=followup;
      CLASS id week (ref='24') group (ref='0');
      MODEL aixchange = group week group*week baseline*week / SOLUTION CL DDFM=KR;
      REPEATED week / SUBJECT=id TYPE=UN;
    RUN;
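The centering constant written as xxxx in the data step for followup has to be filled in by hand. One possible way to avoid typing it is to compute the mean of aix0 and store it in a macro variable, as sketched below. The macro variable name aix0mean is made up for illustration. The mean is taken over all subjects in ckd_wide, which is an assumption: the slides do not say which subjects the mean is based on.

```sas
/* Mean baseline over all subjects in the wide data (one row per subject) */
PROC SQL NOPRINT;
  SELECT MEAN(aix0) INTO :aix0mean   /* aix0mean is a hypothetical name */
  FROM ckd_wide;
QUIT;

/* Follow-up data with centered baseline and change since baseline */
DATA followup;
  SET ckd;
  IF week > 0;
  baseline  = aix0 - (&aix0mean);
  aixchange = aix - aix0;
RUN;
```

Taking the mean from the wide data gives each subject the same weight. Taking it from the long follow-up data would count subjects once per follow-up row.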