Introduction to SAS proc mixed


Faculty of Health Sciences

Introduction to SAS proc mixed
Analysis of repeated measurements, 2017
Julie Forman, Department of Biostatistics, University of Copenhagen

Preparing data for analysis

Most often raw data is stored in the wide format (e.g. in Excel):
- one row per subject
- several columns with the outcomes for the different occasions

Example:

  id  sex  age  group  aix0  aix1  aix2
   1    1   57      0  10.5  17.5  25.0
   2    1   48      0  -2.5   8.0   8.5
   3    2   54      1  18.0  24.0  23.5
  ...

To fit a linear mixed model with any statistical software, data must be in the so-called long format.

The long format

- Each row contains only one observation of the outcome.
- A time variable identifies the time of measurement.
- An id variable identifies measurements from the same subject.

  Obs  id  sex  age  group  week   aix
    1   1    1   57      0     0  10.5
    2   1    1   57      0    12  17.5
    3   1    1   57      0    24  25.0
    4   2    1   48      0     0  -2.5
    5   2    1   48      0    12   8.0
    6   2    1   48      0    24   8.5
    7   3    2   54      1     0  18.0
    8   3    2   54      1    12  24.0
    9   3    2   54      1    24  23.5
   10   4    2   46      1     0  26.0
  ...

From wide to long format

Data is transformed from the wide to the long format with:

  DATA ckd (DROP = aix1-aix2);
    SET ckd_wide;
    week = 0;  aix = aix0; OUTPUT;
    week = 12; aix = aix1; OUTPUT;
    week = 24; aix = aix2; OUTPUT;
  RUN;

Note: We keep the baseline variable aix0 for the ANCOVA.

Spaghettiplots

The spaghettiplots from the lecture were made with:

  PROC SGPANEL DATA=ckd;
    PANELBY group;
    SERIES X=week Y=aix / GROUP=id;
  RUN;

Note: Applies to data in the long format.

Summary statistics and pairwise scatterplots

  PROC SORT DATA=ckd_wide;
    BY group;
  RUN;

  ODS GRAPHICS ON;
  PROC CORR DATA=ckd_wide PLOTS=MATRIX(HISTOGRAM) NOPROB;
    BY group;
    VAR aix0-aix2;
  RUN;

Note: Applies to data in the wide format.
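As a small self-contained illustration of the wide-to-long step, the sketch below types in the three example subjects from the wide-format table and reshapes them with an ARRAY loop instead of three explicit OUTPUT statements. The data set names ckd_wide_demo and ckd_demo and the loop index j are hypothetical, chosen only for this example; the result should have nine rows, three per subject.

```sas
/* Hypothetical toy data: the three example subjects from the wide-format table */
DATA ckd_wide_demo;
  INPUT id sex age group aix0 aix1 aix2;
  DATALINES;
1 1 57 0 10.5 17.5 25.0
2 1 48 0 -2.5 8.0 8.5
3 2 54 1 18.0 24.0 23.5
;
RUN;

/* One output row per subject and occasion; aix0 is kept as the baseline */
DATA ckd_demo (DROP = aix1 aix2 j);
  SET ckd_wide_demo;
  ARRAY outcome{3} aix0 aix1 aix2;        /* the three repeated measurements */
  ARRAY weeks{3} _TEMPORARY_ (0 12 24);   /* the corresponding time points   */
  DO j = 1 TO 3;
    week = weeks{j};
    aix  = outcome{j};
    OUTPUT;
  END;
RUN;

PROC PRINT DATA=ckd_demo;
RUN;
```

The loop version does the same thing as the explicit statements; it is mainly convenient when there are many occasions.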

Plotting averages over time

The plot of group-time averages was made with:

  PROC MEANS DATA=ckd NWAY;
    CLASS group week;
    VAR aix;
    OUTPUT OUT=ckdmeans MEAN=average;
  RUN;

  PROC SGPLOT DATA=ckdmeans;
    SERIES X=week Y=average / GROUP=group MARKERS;
  RUN;

Note: Applies to data in the long format.

Syntax: Analysis of response profiles

  PROC MIXED DATA=ckd PLOTS=ALL;
    CLASS id week (REF='0') group (REF='0');
    MODEL aix = week group group*week / SOLUTION CL DDFM=KR OUTPM=ckdfit;
    REPEATED week / SUBJECT=id TYPE=UN R RCORR;
  RUN;

- Syntax is similar to PROC GLM, with a MODEL statement specifying the (linear) relation between outcome and covariates.
- Categorical variables must be declared with CLASS.
- The model for the covariance (UN = unstructured) is specified in a separate REPEATED statement.
- Fitted values and residuals are saved in the data set ckdfit.
- Use the PLOTS option to get some residual plots.

The option DDFM=KENWARDROGER (aka KR)

(or DDFM=SATTERTHWAITE)

- A technical option intended to improve the statistical performance of the t-tests and F-tests.
- It has no effect on balanced data.
- In unbalanced situations (i.e. for almost all observational studies and in case of missing observations) degrees of freedom are computed by a more complicated formula.
- The computations may require a little more time, but in most cases this will not be noticeable.
- When in doubt, use it!
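To see what the DDFM option changes in practice, one could fit the same response profile model twice and put the denominator degrees of freedom side by side. This is only a sketch: it assumes the long-format data set ckd from the slides is available and that the covariance is unstructured within id; the data set names tests_kr, tests_satt and ddf_compare are hypothetical. Tests3 is the ODS table holding the Type 3 tests of fixed effects.

```sas
/* Assumes the long-format data set ckd described above exists */
ODS OUTPUT Tests3=tests_kr;
PROC MIXED DATA=ckd;
  CLASS id week (REF='0') group (REF='0');
  MODEL aix = week group group*week / DDFM=KR;
  REPEATED week / SUBJECT=id TYPE=UN;
RUN;

ODS OUTPUT Tests3=tests_satt;
PROC MIXED DATA=ckd;
  CLASS id week (REF='0') group (REF='0');
  MODEL aix = week group group*week / DDFM=SATTERTHWAITE;
  REPEATED week / SUBJECT=id TYPE=UN;
RUN;

/* One-to-one merge: both tables list the effects in the same order */
DATA ddf_compare;
  MERGE tests_kr   (KEEP=Effect DenDF FValue RENAME=(DenDF=DenDF_kr   FValue=F_kr))
        tests_satt (KEEP=Effect DenDF FValue RENAME=(DenDF=DenDF_satt FValue=F_satt));
RUN;

PROC PRINT DATA=ddf_compare;
RUN;
```

Kenward-Roger also adjusts the standard errors, so the F values can differ slightly between the two fits, not only the degrees of freedom.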

Estimated response profiles

Use the output data (ckdfit) to plot the estimated profiles:

  PROC SORT DATA=ckdfit;
    BY group week id;
  RUN;

  PROC SGPLOT DATA=ckdfit;
    SERIES X=week Y=pred / GROUP=group MARKERS;
  RUN;

Alternative model specifications

The same model can be phrased differently to highlight differences between groups at specific time points or changes over time.

To compare change over time between groups: include both main effects and the interaction term.

  MODEL aix = time group time*group / SOLUTION CL;

To get mean differences between groups at each time point: omit the main effect of group and the intercept.

  MODEL aix = time time*group / NOINT SOLUTION CL;

To get the means for all combinations of group and time: include only the interaction term and omit the intercept.

  MODEL aix = time*group / NOINT SOLUTION CL;

Note: Usually combined with LSMEANS (on the next slide).

LSMEANS

Estimates the means for all time and group combinations, and all possible differences between them (DIFF option).

  PROC MIXED DATA=ckd;
    CLASS id week group;
    MODEL aix = group*week / NOINT DDFM=KR;
    REPEATED week / SUBJECT=id TYPE=UN;
    LSMEANS group*week / DIFF SLICE=week CL;
  RUN;

- NOINT means that the model does not include an intercept (so there is no need to specify reference groups).
- Use SLICE=week to test for overall differences between multiple groups at each time separately (like one-way ANOVA).
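The three MODEL specifications listed under "Alternative model specifications" describe the same set of cell means; only the parameters that SAS prints differ. As a sketch for the two-group case, with the notation below introduced here for illustration (it is not from the slides) and with group 0 and time 0 as reference levels, write \mu_{g,t} for the mean outcome in group g at time t:

```latex
\begin{align*}
\text{(1) time, group, time*group:} \quad
  & \mu_{g,t} = \alpha + \beta_t + \gamma_g + \delta_{g,t},
  && \beta_0 = \gamma_0 = \delta_{0,t} = \delta_{g,0} = 0, \\
\text{(2) time, time*group, no intercept:} \quad
  & \mu_{g,t} = m_t + d_t \,\mathbf{1}\{g = 1\}, \\
\text{(3) time*group, no intercept:} \quad
  & \mu_{g,t} \text{ estimated directly for each cell.}
\end{align*}

\begin{align*}
m_t &= \mu_{0,t} = \alpha + \beta_t, \\
d_t &= \mu_{1,t} - \mu_{0,t} = \gamma_1 + \delta_{1,t}.
\end{align*}
```

In (1) the interaction \delta_{1,t} is the difference between groups in change since time 0, in (2) d_t is the group difference at time t, and in (3) each parameter is a cell mean. The fitted means are identical in all three.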

First we get a summary of what data and methods proc mixed has used (some we have specified and others are SAS defaults).

  The Mixed Procedure

  Model Information
  Data Set                     WORK.CKD
  Dependent Variable           aix
  Covariance Structure         Unstructured
  Subject Effect               id
  Estimation Method            REML
  Residual Variance Method     None
  Fixed Effects SE Method      Kenward-Roger
  Degrees of Freedom Method    Kenward-Roger

  Class Level Information
  Class   Levels   Values
  id          51   1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
                   21 22 23 24 25 26 28 29 30 31 32 33 34 35 36 37 38
                   39 40 41 42 43 45 46 47 48 49 51 52 53 54
  week         3   12 24 0
  group        2   1 0

  Dimensions
  Covariance Parameters     6
  Columns in X             12
  Columns in Z              0
  Subjects                 51
  Max Obs Per Subject       3

This is a summary of the mathematical model specification, which is explained in lecture 4.

  Number of Observations
  Number of Observations Read        153
  Number of Observations Used        144
  Number of Observations Not Used      9

Attention: data are missing due to dropout and failed measurements.

In contrast to ordinary linear models, no explicit formulae for the maximum likelihood estimates exist for linear mixed models in general. Therefore SAS uses numerical optimisation to compute estimates of the mean and covariance parameters.

  Iteration History
  Iteration   Evaluations   -2 Res Log Like    Criterion
          0             1     1070.85454941
          1             2      982.86560047   0.00144735
          2             1      982.26253864   0.00009905
          3             1      982.22468047   0.00000061
          4             1      982.22445749   0.00000000

  Convergence criteria met.

Always check that the numerical optimisation has converged!

The options R and RCORR make SAS print the estimated covariance and correlation matrices.

  Estimated R Matrix for id 1
  Row      Col1      Col2      Col3
    1    106.23   96.3802   80.1893
    2   96.3802    159.64    106.48
    3   80.1893    106.48    106.38

  Estimated R Correlation Matrix for id 1
  Row     Col1     Col2     Col3
    1   1.0000   0.7401   0.7544
    2   0.7401   1.0000   0.8171
    3   0.7544   0.8171   1.0000

Fit statistics can be used for comparison of different models.

  Fit Statistics
  -2 Res Log Likelihood         982.2
  AIC (smaller is better)       994.2
  AICC (smaller is better)      994.9
  BIC (smaller is better)      1005.8

  Null Model Likelihood Ratio Test
  DF   Chi-Square   Pr > ChiSq
   5        88.63       <.0001

The null model likelihood ratio test, which compares the fitted covariance structure with a model that has independent errors with a common variance, is hardly ever of interest.

Make sure to use the PROC MIXED option METHOD=ML if you want to use the likelihood to test nested models for the mean structure (lecture 2).

At last, what is most interesting: estimates and tests.

  Solution for Fixed Effects
  Effect       week   group   Estimate   StdError     DF   t Value   Pr > |t|
  Intercept                    24.3431     2.0793   49.4     11.71     <.0001
  week         12               1.0887     1.7694   46.2      0.62     0.5414
  week         24               3.0895     1.4995   44.5      2.06     0.0452
  week         0                     0          .      .         .          .
  group               1        -2.0547     2.8999   48.9     -0.71     0.4820
  group               0              0          .      .         .          .
  week*group   12     1        -1.9493     2.4871   45.8     -0.78     0.4372
  week*group   12     0              0          .      .         .          .
  week*group   24     1        -3.6078     2.1298   45.3     -1.69     0.0971
  week*group   24     0              0          .      .         .          .
  week*group   0      1              0          .      .         .          .
  week*group   0      0              0          .      .         .          .

(confidence intervals omitted due to lack of space)

  Type 3 Tests of Fixed Effects
  Effect       Num DF   Den DF   F Value   Pr > F
  week              2     44.5      0.99   0.3794
  group             1       47      1.84   0.1817
  week*group        2     44.5      1.43   0.2490

[Figure: standardized (aka Studentized) residuals. Normal distribution? Other residuals and boxplots of residuals vs time and group omitted.]
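The fit statistics reported above can be checked by hand from the rest of the output. The sketch below assumes the definitions SAS documents for REML in PROC MIXED: q = 6 covariance parameters, n* = 144 - 6 = 138 (observations used minus the rank of the design matrix, here the six cell means), and the number of subjects, 51, in the BIC. The symbols \ell_R, q and n^* are notation introduced here, not taken from the slides.

```latex
\begin{align*}
-2\ell_R &= 982.22 \\
\mathrm{AIC}  &= -2\ell_R + 2q = 982.22 + 12 = 994.22 \\
\mathrm{AICC} &= -2\ell_R + \frac{2q\,n^*}{n^* - q - 1}
               = 982.22 + \frac{12 \cdot 138}{131} \approx 982.22 + 12.64 = 994.86 \\
\mathrm{BIC}  &= -2\ell_R + q \ln(51) \approx 982.22 + 23.59 = 1005.81 \\
\chi^2_{\text{null}} &= 1070.85 - 982.22 = 88.63, \qquad \mathrm{df} = q - 1 = 5
\end{align*}
```

All values agree with the printed output after rounding to one decimal. The chi-square statistic is simply the drop in -2 Res Log Like from iteration 0 to convergence in the iteration history.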

Which model should I choose?

Results from the clmm and the ANCOVA model are usually very similar. We recommend the clmm:
- Programming and interpretation are easier.
- It is slightly better at handling missing data.

Exception: If randomization was performed conditionally on baseline measurements, then the ANCOVA is a valid model while the clmm is not.

The constrained linear mixed model (clmm)

To fit the constrained model:
1. Define a new treatment variable by joining the groups at baseline.
2. Leave out the main term treat in the MODEL statement.

  DATA ckd;
    SET ckd;
    treat = group;
    IF week = 0 THEN treat = 0;
  RUN;

  PROC MIXED DATA=ckd;
    CLASS id week (REF='0') treat (REF='0');
    MODEL aix = week treat*week / SOLUTION CL DDFM=KR;
    REPEATED week / SUBJECT=id TYPE=UN;
  RUN;

ANCOVA

To prepare for the analysis:
- Baseline must be included as a covariate in the data.
- Only follow-up times are used when running the analysis.
- For ease of interpretation and numerical stability we center the baseline variable around its mean.
- For ease of quantification we use change since baseline as outcome.

  DATA followup;
    SET ckd;
    IF week > 0;
    baseline = aix0 - xxxx;    /* xxxx = mean of aix0 */
    aixchange = aix - aix0;
  RUN;

To run the analysis with proc mixed:
- Include the baseline*time interaction in the model.
- Since the analysis is based on follow-up data, the most natural reference point for time is now the last follow-up.
- The treatment effect (at last follow-up) is estimated by the group effect.

  PROC MIXED DATA=followup;
    CLASS id week (REF='24') group (REF='0');
    MODEL aixchange = group week group*week baseline*week / SOLUTION CL DDFM=KR;
    REPEATED week / SUBJECT=id TYPE=UN;
  RUN;
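The ANCOVA data step above leaves the baseline mean as a placeholder (xxxx). One way to avoid typing the number by hand is to let SAS compute it and pass it on in a macro variable. This is only a sketch: it assumes the wide data set ckd_wide (one row per subject) and the long data set ckd with aix0 retained, as built earlier; the macro variable name aix0_mean is hypothetical. It centers around the mean over all subjects with a baseline value, which may or may not be the mean the author had in mind.

```sas
/* Mean baseline over subjects, taken from the wide data (one row per subject) */
PROC SQL NOPRINT;
  SELECT MEAN(aix0) FORMAT=BEST32.
    INTO :aix0_mean TRIMMED
    FROM ckd_wide;
QUIT;

%PUT NOTE: mean baseline aix0 = &aix0_mean;

/* Follow-up rows only, with centered baseline and change since baseline */
DATA followup;
  SET ckd;
  IF week > 0;
  baseline  = aix0 - &aix0_mean;
  aixchange = aix - aix0;
RUN;
```

Computing the mean from ckd_wide rather than from the long data set avoids counting each subject's baseline three times, although with the same baseline repeated on every row the two means only differ if rows are missing.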