Residuals in the Analysis of Longitudinal Data Jemila Hamid, PhD (Joint work with WeiLiang Huang) Clinical Epidemiology and Biostatistics & Pathology and Molecular Medicine McMaster University
Outline
1. Introduction
2. Residuals in the analysis of longitudinal data
3. Transformed residuals
4. The growth curve model and decomposed residuals
5. Real data application
6. Discussion
1. Introduction
Statistical modeling plays an important role in understanding the relationship between two or more variables.
Statistical modeling is used in a wide range of applications, from finance, banking and weather forecasting to clinical medicine and public health, to name a few.
In medical and biological research in particular, modeling has proven to be an essential tool for enhancing our understanding of a variety of common as well as rare diseases affecting the public.
Introduction (cont'd)
Statistical modeling also plays an important role in disease diagnosis, prognosis and management, as well as in disease prevention and health promotion.
Statistical models are also commonly used to identify risk factors associated with diseases, thereby allowing effective diagnosis, treatment and prevention mechanisms.
The terms evidence-based medicine, evidence-based diagnosis and evidence-based decision making highlight the importance of statistical methods in medicine and public health.
Introduction (cont'd)
At the initial stages of modeling, we specify the model and the model assumptions.
We then estimate the model parameters based on the specified model and the underlying assumptions.
Statistical models often rely on several assumptions, including:
Distributional assumptions: mostly on the outcome variables.
Relational assumptions: quantify the relationship between outcome and predictors.
Introduction (cont'd)
However, modeling is not complete without investigating model-data agreement, that is, model diagnostics.
We need to ask questions like:
Do the data support the model assumptions?
Does the model fit the data?
Are the mean and the covariance modeled properly?
Do we need to add or remove variables?
Are there outliers and/or influential observations that influence our estimation and affect the generalizability of our model?
Model diagnostics is, therefore, a crucial component of any model fitting problem.
Residuals
We cannot talk about model diagnostics without residuals.
Residuals are not only used to check the adequacy of model fit; they are also excellent tools for validating model assumptions and identifying outliers and/or influential observations.
Residuals in univariate models are relatively simple to explore and have been studied extensively. They are routinely used for model diagnostics.
Different types of residuals have been proposed: ordinary residuals, standardized residuals, studentized residuals and jackknife residuals.
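Residuals (cont'd)
Consider the model: $Y = X\beta + \varepsilon$
Parameter estimates: $\hat{\beta} = (X'X)^{-1}X'Y$
Ordinary residuals: $R = Y - \hat{Y} = (I - X(X'X)^{-1}X')Y = (I - H)Y$
[Figure: $Y$, the column space $C(X)$ and its orthogonal complement $C(X)^{\perp}$]
Note: $R = (I - H)\varepsilon$, $E(R) = 0$ and $\mathrm{Var}(r_i) = (1 - h_{ii})\sigma^2$
Residuals represent the part of the data that is left unexplained after a model has been fitted to the data.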
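The quantities on this slide can be sketched numerically. The example below is a minimal illustration on simulated data, not the authors' code; the function name `ols_residuals`, the sample size and the true coefficients are all assumptions made for the example.

```python
import numpy as np

def ols_residuals(X, Y):
    """Return (beta_hat, H, R) for the linear model Y = X beta + eps."""
    XtX = X.T @ X
    beta_hat = np.linalg.solve(XtX, X.T @ Y)   # (X'X)^{-1} X'Y
    H = X @ np.linalg.solve(XtX, X.T)          # hat matrix X (X'X)^{-1} X'
    R = (np.eye(len(Y)) - H) @ Y               # ordinary residuals (I - H) Y
    return beta_hat, H, R

# Simulated data (illustrative only): intercept plus one predictor.
rng = np.random.default_rng(0)
n = 50
X = np.column_stack([np.ones(n), rng.normal(size=n)])
Y = X @ np.array([1.0, 2.0]) + rng.normal(size=n)

beta_hat, H, R = ols_residuals(X, Y)
h = np.diag(H)  # leverages h_ii, which enter Var(r_i) = (1 - h_ii) sigma^2
```

Because H is a projection onto the column space of X, the residuals are orthogonal to every column of X and the leverages sum to the number of columns.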
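Residuals (cont'd)
Standardized residuals: $r_i / s$, where $s^2 = \frac{1}{n-p-1}\sum_{i=1}^{n} r_i^2$
Studentized residuals: $r_i / \left(s\sqrt{1 - h_{ii}}\right)$
Jackknife residuals: $r_i / \left(s_{(i)}\sqrt{1 - h_{ii}}\right)$, where $s_{(i)}$ is computed with the $i$-th observation deleted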
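A minimal sketch of the three scaled residuals follows, assuming the design matrix includes an intercept column (so that it has p + 1 columns) and that the studentized and jackknife forms divide by the square root of 1 - h_ii. The function name `residual_types` and the simulated data are hypothetical. The deleted variance is obtained from the standard leave-one-out identity rather than by refitting n times.

```python
import numpy as np

def residual_types(X, Y):
    """Standardized, studentized and jackknife residuals for Y = X beta + eps."""
    n, k = X.shape                       # k = p + 1 columns (intercept included)
    H = X @ np.linalg.solve(X.T @ X, X.T)
    h = np.diag(H)
    r = Y - H @ Y                        # ordinary residuals
    s2 = r @ r / (n - k)                 # s^2 = sum r_i^2 / (n - p - 1)
    standardized = r / np.sqrt(s2)
    studentized = r / np.sqrt(s2 * (1 - h))
    # Leave-one-out variance s_(i)^2 without refitting the model n times.
    s2_del = ((n - k) * s2 - r**2 / (1 - h)) / (n - k - 1)
    jackknife = r / np.sqrt(s2_del * (1 - h))
    return standardized, studentized, jackknife

# Simulated data (illustrative only).
rng = np.random.default_rng(0)
n = 40
X = np.column_stack([np.ones(n), rng.normal(size=n)])
Y = X @ np.array([1.0, 2.0]) + rng.normal(size=n)
standardized, studentized, jackknife = residual_types(X, Y)
```

A jackknife residual is large when an observation is poorly predicted by the model fitted without it, which is why it is often preferred for outlier screening.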
Residuals (cont'd)
How are the residuals used? Graphically:
Checking normality: Q-Q plots
Checking model fit: scatter plots
Checking independence: scatter plots
Checking for outliers and/or influential observations: leverage plots, plots of Cook's distance, plots of DFBETAS and DFFITS
Residuals (cont'd)
How are the residuals used? Formal tests based on residuals:
Normality: Shapiro-Wilk test, Kolmogorov-Smirnov test
Constant variance (homoscedasticity): White's test
Independence: Durbin-Watson test
Outliers and/or influential observations: Wald's test using Cook's distance
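As one concrete instance of a residual-based test, the Durbin-Watson statistic is the sum of squared successive differences of the residuals divided by their sum of squares. The sketch below computes only the statistic, not a p-value; the function name `durbin_watson` and the toy residual sequences are assumptions for illustration.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic: sum (e_t - e_{t-1})^2 / sum e_t^2."""
    num = sum((b - a) ** 2 for a, b in zip(residuals, residuals[1:]))
    den = sum(e ** 2 for e in residuals)
    return num / den

# Toy residual sequences (illustrative only).
alternating = durbin_watson([1.0, -1.0, 1.0, -1.0])  # strong negative autocorrelation
constant = durbin_watson([2.0, 2.0, 2.0])            # strong positive autocorrelation
```

The statistic lies between 0 and 4; values near 2 are consistent with no first-order autocorrelation, values near 0 with positive and values near 4 with negative autocorrelation.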
Residuals (cont'd)
[Figure: normal Q-Q plots (sample quantiles against theoretical quantiles) for data from the normal distribution and data from the lognormal distribution]
Residuals (cont'd)
[Figure: normal Q-Q plots and plots of resid(fitsq) against fitted(fitsq), data from the normal distribution, for a wrongly fitted model and a correctly fitted model]
Residuals (cont'd)
[Figure: scatter plots of Y against X and Xnew, and index plots of DFBETAS and Cook's distance]
2. Residuals in the Analysis of Longitudinal Data
Residuals are correlated.
Residuals are not necessarily normally distributed.
[Figure: residuals from the analysis of longitudinal data where there is no systematic component (no effect of time)]
Residuals in the Analysis of Longitudinal Data
When there is time dependency, with the mean represented by a function of time, it is not obvious how to use ordinary residuals obtained as the difference between the observed and fitted values.
[Figure: correctly fitted model and wrongly fitted model]
3. Transformed Residuals
Cholesky decomposition
Recall: the estimated covariance matrix of the residuals is $\hat{\Sigma}$.
Consider the Cholesky decomposition $\hat{\Sigma} = LL'$, where $L$ is lower triangular.
Transform the residuals to get $R^{*} = L^{-1}R$ (Fitzmaurice, 2004).
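The Cholesky-based transformation can be sketched as follows, assuming each individual's residual vector shares a common estimated covariance matrix with lower-triangular Cholesky factor L, and that the transformed residuals are obtained by solving L z = r. The function name `cholesky_transform`, the AR(1)-type covariance and the simulated residuals are hypothetical choices for the example.

```python
import numpy as np

def cholesky_transform(resid, sigma_hat):
    """Decorrelate residual vectors.

    resid: (n, p) array, one row of p repeated-measure residuals per individual.
    sigma_hat: (p, p) estimated covariance matrix of a residual vector.
    """
    L = np.linalg.cholesky(sigma_hat)        # lower triangular, sigma_hat = L L'
    return np.linalg.solve(L, resid.T).T     # each row becomes L^{-1} r_i

# Simulated correlated residuals (illustrative only).
rng = np.random.default_rng(1)
p, n = 4, 200
sigma = 0.7 ** np.abs(np.subtract.outer(np.arange(p), np.arange(p)))
resid = rng.multivariate_normal(np.zeros(p), sigma, size=n)

sigma_hat = np.cov(resid, rowvar=False)
z = cholesky_transform(resid, sigma_hat)     # transformed residuals
```

When the sample covariance of the residuals is used as the estimate, the transformed residuals have exactly the identity matrix as their sample covariance, so they can be pooled into a single univariate Q-Q plot.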
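Transformed Residuals (cont'd)
Small's graphical method
The idea behind Small's graphical approach is to reduce the multivariate data to univariate data.
Suppose $x_1, x_2, \ldots, x_n$ are independently distributed as $N_p(\mu, \Sigma)$. Then the statistic $b_i = \frac{n}{(n-1)^2}(x_i - \bar{x})'S^{-1}(x_i - \bar{x})$ has a Beta distribution with parameters $\alpha = \frac{1}{2}p$ and $\beta = \frac{1}{2}(n-p-1)$, where $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$ and $S = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})(x_i - \bar{x})'$.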
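A minimal sketch of the reduction to a univariate statistic is given below. It assumes the statistic is the squared Mahalanobis distance from the sample mean, computed with the unbiased sample covariance and scaled by n/(n-1)^2, which is the commonly stated form with a Beta(p/2, (n-p-1)/2) reference distribution. The function name `small_statistic` and the simulated data are hypothetical.

```python
import numpy as np

def small_statistic(x):
    """Scaled squared Mahalanobis distances b_i for an (n, p) data matrix."""
    n, p = x.shape
    centered = x - x.mean(axis=0)
    S = np.cov(x, rowvar=False)                       # unbiased sample covariance
    d2 = np.einsum("ij,ij->i", centered @ np.linalg.inv(S), centered)
    return n * d2 / (n - 1) ** 2

# Simulated multivariate normal data (illustrative only).
rng = np.random.default_rng(2)
n, p = 60, 3
x = rng.normal(size=(n, p))
b = np.sort(small_statistic(x))   # ordered values for a Beta probability plot
```

The ordered values would then be plotted against quantiles of the reference Beta distribution (for example with `scipy.stats.beta.ppf`, if SciPy is available); the sample mean of the statistic equals p/(n-1), which is the mean of that Beta distribution.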
Transformed Residuals (cont'd)
[Figure: normal Q-Q plots of the residuals and of the residuals after Fitzmaurice's transformation. Multivariate normal data, independent (left) and correlated (right); model correctly fitted]
Transformed Residuals (cont'd)
[Figure: normal Q-Q plots of the residuals and Beta probability plots after Small's transformation. Multivariate normal data, independent (left) and correlated (right); model correctly fitted]
Transformed Residuals (cont'd)
[Figure: normal Q-Q plots of R and of Fitzmaurice's transformed residuals. Multivariate lognormal data, independent (left) and correlated (right); model correctly fitted]
Transformed Residuals (cont'd)
[Figure: normal Q-Q plots of R and Beta probability plots of Small's transformed residuals. Multivariate lognormal data, independent (left) and correlated (right); model correctly fitted]
Transformed Residuals (cont'd)
[Figure: normal Q-Q plots of R and of Fitzmaurice's transformed residuals. Multivariate normal, correlated data; model wrongly fitted]
Transformed Residuals (cont'd)
Limitations of the above two transformations in multivariate analysis:
They are meant for checking distributional assumptions and do not allow assessment of model fit.
If the model is not properly fitted, they also perform poorly for checking multivariate normality.
This is particularly important in the analysis of longitudinal data, where a within-individual assumption has to be prespecified to describe the mean growth/change over time.
Residuals that allow us to check both the within- and between-individual assumptions are better suited to these situations.
4. The GCM and Decomposed Residuals
The Growth Curve Model
Suppose that we have $m$ different groups, where repeated measurements are taken from each individual at $p$ different time points.
Suppose also that the mean for the $i$-th group follows a polynomial curve of degree $q$ over time, which can be described as $\mu_i(t) = \beta_{0i} + \beta_{1i}t + \cdots + \beta_{qi}t^q$.
Then the Growth Curve Model (GCM) can be formulated as $X = ABC + E$.
The Growth Curve Model
$A$ ($p \times (q+1)$): within-individual design matrix
$B$ ($(q+1) \times m$): parameter matrix
$C$ ($m \times n$): between-individual design matrix
$X$ ($p \times n$): observation matrix, with $n = n_1 + n_2$
The Growth Curve Model (cont'd)
Example: Dental measurements on eleven girls and sixteen boys were taken at four different ages (8, 10, 12 and 14). Each measurement is the distance, in millimeters, from the center of the pituitary to the pterygomaxillary fissure.
[Data matrix $X$]
The Growth Curve Model (cont'd)
Objectives
Should the growth curves be represented by second-degree equations in time ($t$), or are linear equations adequate?
Should two separate curves be used for boys and girls, or do both have the same growth curve?
We may also be interested in estimating the growth curve(s) and obtaining confidence band(s) for the expected growth curve(s).
The Growth Curve Model (cont'd)
Example: Glucose data
A standard glucose tolerance test was administered to 13 control and 20 obese patients.
Plasma inorganic phosphate measurements were determined from blood samples taken at 0, 0.5, 1, 1.5, 2, 3, 4 and 5 hours after a standard oral dose of glucose.
The objective of the study was to determine whether there is a significant difference between the control and obese groups of patients.
A second-degree polynomial is used to model both groups.
The Growth Curve Model (cont'd)
The matrices for the model
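The design matrices for the glucose example can be sketched from the description of the study: eight time points, a second-degree polynomial, and two groups of 13 and 20 patients. The ordering of individuals (controls first) and the use of raw rather than orthogonal polynomial columns are assumptions made for this illustration.

```python
import numpy as np

# Measurement times in hours and polynomial degree, as described for the glucose data.
times = np.array([0, 0.5, 1, 1.5, 2, 3, 4, 5])
q = 2

# Within-individual design matrix A (p x (q+1)): columns 1, t, t^2.
A = np.vander(times, q + 1, increasing=True)

# Between-individual design matrix C (m x n): one indicator row per group.
# Assumption: the 13 control patients come first, then the 20 obese patients.
n_control, n_obese = 13, 20
C = np.zeros((2, n_control + n_obese))
C[0, :n_control] = 1
C[1, n_control:] = 1
```

With these matrices the parameter matrix B is 3 x 2, one column of polynomial coefficients per group, and the observation matrix X is 8 x 33.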
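Decomposed Residuals (cont'd)
The maximum likelihood estimator of the parameter matrix $B$ in the GCM is given by Khatri (1966): $\hat{B} = (A'S^{-1}A)^{-1}A'S^{-1}XC'(CC')^{-1}$, where $S = X(I - C'(CC')^{-1}C)X'$.
The predicted value is given as $\hat{X} = A\hat{B}C$.
Therefore, ordinary residuals can be calculated by $R = X - \hat{X} = X - A\hat{B}C$.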
Decomposed Residuals (cont'd)
Recall the MANOVA model: $X = BC + E$
The MLE of $B$ is $\hat{B} = XC'(CC')^{-1}$
Residuals are therefore given by $R = X(I - C'(CC')^{-1}C)$
Decomposed Residuals (cont'd)
$R_1 = (I - P_A)X(I - P_C)$, $R_2 = P_A X(I - P_C)$ and $R_3 = (I - P_A)XP_C$, where $P_C = C'(CC')^{-1}C$ and $P_A$ is the projection for which $A\hat{B}C = P_A X P_C$.
Note that $R_1 + R_2 = X(I - C'(CC')^{-1}C)$. It can be used to check between-individual assumptions such as the normality assumption.
$R_3 = XC'(CC')^{-1}C - A\hat{B}C$. It can be used to check the within-individual assumption, that is, whether the fitted curve over time is adequate to represent the change over time.
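The decomposition can be sketched numerically as follows. The example assumes the usual form of Khatri's estimator, with S = X(I - P_C)X' and P_A = A(A'S^{-1}A)^{-1}A'S^{-1}, and uses simulated data laid out like the dental example (four ages, groups of 11 and 16). It is not the real dental data; the function name and the true coefficients are hypothetical.

```python
import numpy as np

def gcm_decomposed_residuals(X, A, C):
    """Return (B_hat, R1, R2, R3) for the growth curve model X = A B C + E."""
    p, n = X.shape
    I_p, I_n = np.eye(p), np.eye(n)
    P_C = C.T @ np.linalg.solve(C @ C.T, C)        # C'(CC')^{-1} C
    S = X @ (I_n - P_C) @ X.T
    S_inv_A = np.linalg.solve(S, A)                # S^{-1} A
    G = np.linalg.solve(A.T @ S_inv_A, S_inv_A.T)  # (A'S^{-1}A)^{-1} A'S^{-1}
    P_A = A @ G
    B_hat = G @ X @ C.T @ np.linalg.inv(C @ C.T)
    R1 = (I_p - P_A) @ X @ (I_n - P_C)
    R2 = P_A @ X @ (I_n - P_C)
    R3 = (I_p - P_A) @ X @ P_C
    return B_hat, R1, R2, R3

# Simulated data (illustrative only): linear growth, two groups of 11 and 16.
rng = np.random.default_rng(3)
ages = np.array([8.0, 10.0, 12.0, 14.0])
A = np.column_stack([np.ones(4), ages])
C = np.zeros((2, 27))
C[0, :11] = 1
C[1, 11:] = 1
B_true = np.array([[17.0, 16.0], [0.5, 0.8]])      # hypothetical coefficients
X = A @ B_true @ C + rng.normal(size=(4, 27))

B_hat, R1, R2, R3 = gcm_decomposed_residuals(X, A, C)
```

The three components add up to the ordinary residuals X - A B_hat C, and R1 + R2 coincides with the MANOVA residuals, so the between-individual and within-individual checks can be run on separate pieces.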
Decomposed Residuals (cont'd)
[Figure: normal Q-Q plot of R and scatter plot of $R_3$. Multivariate normal, correlated data; model correctly fitted (left) and wrongly fitted (right)]
Decomposed Residuals (cont'd)
[Figure: normal Q-Q plot of $R_1 + R_2$ after Fitzmaurice's transformation (multivariate normal data) and Beta probability plot of $R_1 + R_2$ after Small's transformation (GCM with normal error); perfectly fitted (left) and misfitted (right)]
5. Real Data Application
Recall: Dental data
Real Data Application (cont'd)
[Figure: normal Q-Q plot of $R_1 + R_2$ (left) and scatter plot of $R_3$ (right); normal Q-Q plot of Fitzmaurice's $R_1 + R_2$ (left) and Beta quantile plot of Small's $R_1 + R_2$ (right)]
Real Data Application (cont'd)
Dental data after outliers have been removed
[Figure: normal Q-Q plot of Fitzmaurice's $R_1 + R_2$ (left) and Beta quantile plot of Small's $R_1 + R_2$ (right)]
Real Data Application (cont'd)
Recall: Glucose data
Real Data Application (cont'd)
[Figure: normal Q-Q plot of $R_1 + R_2$ (left) and scatter plot of $R_3$ (right); normal Q-Q plot of Fitzmaurice's $R_1 + R_2$ (left) and Beta quantile plot of Small's $R_1 + R_2$ (right)]
Real Data Application (cont'd)
Glucose data with and without a higher-order polynomial fit
[Figure: scatter plot of decomposed $R_3$ for the quadratic fit (left) and for the third-degree fit (right)]
6. Discussion
Residuals play an important role in checking the adequacy of model fit, validating assumptions and identifying outliers and/or influential observations.
Residuals in the analysis of longitudinal data are correlated and not necessarily normally distributed.
Both Fitzmaurice's transformation and Small's graphical method successfully removed the correlation structure.
However, Fitzmaurice's transformation did not perform well when the data were not normally distributed, with the transformed residuals leading to wrong decisions.
Discussion (cont'd)
Residuals based on the growth curve model provide separate components that are useful for model diagnostics and for checking multivariate normality.
The scatter plot of $R_3$ is able to identify systematic error in model fitting.
$R_1 + R_2$ provides a reliable basis for checking the normality assumption.
The results are consistent for small as well as large sample sizes, and for different covariance structures.
Thank you!