Analysis of Longitudinal Data. Patrick J. Heagerty PhD Department of Biostatistics University of Washington

Size: px

Start display at page:

Download "Analysis of Longitudinal Data. Patrick J. Heagerty PhD Department of Biostatistics University of Washington"

Lucas Shields
5 years ago
Views:

1 Analsis of Longitudinal Data Patrick J. Heagert PhD Department of Biostatistics Universit of Washington 1 Auckland 2008

2 Session Three Outline Role of correlation Impact proper standard errors Used to weight individuals (clusters) Models for correlation / covariance Regression: Group-to-Group variation Random effects: Individual-to-Individual variation Serial correlation: Observation-to-Observation variation 2 Auckland 2008

3 Longitudinal Data Analsis INTRODUCTION to CORRELATION and WEIGHTING 3 Auckland 2008

4 Treatment of Lead-Exposed Children (TLC) Trial: In the 1990 s a placebo-controlled randomized trial of a new chelating agent, succimer, was conducted among children with lead levels 20-44µg/dL. Children received up to three 26-da courses of succimer or placebo and were followed for 3 ears. Data set with 100 children. m = 50 placebo; m = 50 active. Illustrate: naive analses and the impact of correlation. 4 Auckland 2008

5 TLC Trial Means Blood Lead Placebo Active Weeks 5 Auckland 2008

6 Simple (naive?) Analsis of Treatment Post Data Onl compare the mean blood lead after baseline in the TX and control groups using 3 measurements/person, and all 100 subjects. Issue(s) = Pre/Post Data compare the mean blood lead after baseline to the mean blood lead at baseline for the treatment subjects onl using 4 measurements/person, and onl 50 subjects. Issue(s) = 6 Auckland 2008

7 6-1 Auckland 2008 Simple Analsis: Post Onl Data week 0 week 1 week 4 week 6 Control β 0 β 0 β 0 Treatment β 0 + β 1 β 0 + β 1 β 0 + β 1

8 6-2 Auckland 2008 Post Data Onl. *** Analsis using POST DATA at weeks 1, 4, and 6.. regress tx if week>=1 Number of obs = Coef. Std. Err. t P> t [95% Conf. Interval] tx _cons

9 6-3 Auckland regress tx if week>=1, cluster(id) Number of obs = 300 Regression with robust standard errors Number of clusters (id) = Robust Coef. Std. Err. t P> t [95% Conf. Interval] tx _cons

10 Treatment β 0 β 0 + β 1 β 0 + β 1 β 0 + β Auckland 2008 Simple Analsis: Pre/Post for Treatment Group Onl Control week 0 week 1 week 4 week 6

11 6-5 Auckland 2008 Pre/Post Data, TX Group Onl.. *** Analsis using PRE/POST for treatment subjects.. regress post if tx==1 Number of obs = Coef. Std. Err. t P> t [95% Conf. Interval] post _cons

12 6-6 Auckland regress post if tx==1, cluster(id) Number of obs = 200 Regression with robust standard errors Number of clusters (id) = Robust Coef. Std. Err. t P> t [95% Conf. Interval] post _cons

13 Dependent Data and Proper Variance Estimates Let X ij = 0 denote placebo assignment and X ij = 1 denote active treatment. (1) Consider (Y i1, Y i2 ) with (X i1, X i2 ) = (0, 0) for i = 1 : n and (X i1, X i2 ) = (1, 1) for i = (n + 1) : 2n ˆµ 0 = 1 2n ˆµ 1 = 1 2n n 2 i=1 j=1 2n Y ij 2 i=n+1 j=1 var(ˆµ 1 ˆµ 0 ) = 1 n {σ2 (1 + ρ)} Y ij 7 Auckland 2008

14 7-1 Auckland 2008 Scenario 1 subject control treatment time 1 time 2 time 1 time 2 ID = 101 Y 1,1 Y 1,2 ID = 102 Y 2,1 Y 2,2 ID = 103 Y 3,1 Y 3,2 ID = 104 Y 4,1 Y 4,2 ID = 105 Y 5,1 Y 5,2 ID = 106 Y 6,1 Y 6,2

15 Dependent Data and Proper Variance Estimates (2) Consider (Y i1, Y i2 ) with (X i1, X i2 ) = (0, 1) for i = 1 : n and (X i1, X i2 ) = (1, 0) for i = (n + 1) : 2n ˆµ 0 = 1 2n ˆµ 1 = 1 2n { n Y i1 + i=1 { n Y i2 + i=1 2n i=n+1 2n i=n+1 Y i2 } Y i1 } var(ˆµ 1 ˆµ 0 ) = 1 n {σ2 (1 ρ)} 8 Auckland 2008

16 8-1 Auckland 2008 Scenario 2 subject control treatment time 1 time 2 time 1 time 2 ID = 101 Y 1,1 Y 1,2 ID = 102 Y 2,1 Y 2,2 ID = 103 Y 3,1 Y 3,2 ID = 104 Y 4,2 Y 4,1 ID = 105 Y 5,2 Y 5,1 ID = 106 Y 6,2 Y 6,1

17 Dependent Data and Proper Variance Estimates If we simpl had 2n independent observations on treatment (X = 1) and 2n independent observations on control then we d obtain var(ˆµ 1 ˆµ 0 ) = σ2 2n + σ2 2n = 1 n σ2 Q: What is the impact of dependence relative to the situation where all (2n + 2n) observations are independent? (1) positive dependence, ρ > 0, results in a loss of precision. (2) positive dependence, ρ > 0, results in an improvement in precision! 9 Auckland 2008

18 Therefore: Dependent data impacts proper statements of precision. Dependent data ma increase or decrease standard errors depending on the design. 10 Auckland 2008

19 Weighted Estimation Consider the situation where subjects report both the number of attempts and the number of successes: (Y i, N i ). Examples: live born (Y i ) in a litter (N i ) condoms used (Y i ) in sexual encounters (N i ) SAEs (Y i ) among total surgeries (N i ) Q: How to combine these data from i = 1 : m subjects to estimate a common rate (proportion) of successes? 11 Auckland 2008

20 Weighted Estimation Proposal 1: ˆp 1 = i Y i / i N i Proposal 2: Simple Example: ˆp 2 = 1 m Y i /N i i Data : (1, 10) (2, 100) ˆp 1 = (2 + 1)/(110) = ˆp 2 = 1 {1/10 + 2/100} = Auckland 2008

21 Weighted Estimation Note: Each of these estimators, ˆp 1, and ˆp 2, can be viewed as weighted estimators of the form: ˆp w = { i w i Y i N i } / i We obtain ˆp 1 b letting w i = N i, corresponding to equal weight given each to binar outcome, Y ij, Y i = N i j=1 Y ij. We obtain ˆp 2 b letting w i = 1, corresponding to equal weight given to each subject. w i Q: What s optimal? 13 Auckland 2008

22 Weighted Estimation A: Whatever weights are closest to 1/variance of Y i /N i (stat theor called Gauss-Markov ). If subjects are perfectl homogeneous then and ˆp 1 is best. V (Y i ) = N i p(1 p) If subjects are heterogeneous then, for example V (Y i ) = N i p(1 p){1 + (N i 1)ρ} and an estimator closer to ˆp 2 is best. 14 Auckland 2008

23 Summar: Role of Correlation Statistical inference must account for the dependence. correlation impacts standard errors! Consideration as to the choice of weighting will depend on the variance/covariance of the response variables. correlation impacts regression estimates! 15 Auckland 2008

24 Longitudinal Data Analsis INTRODUCTION to REGRESSION APPROACHES 16 Auckland 2008

25 Statistical Models Regression model: Groups mean response as a function of covariates. sstematic variation Random effects: Individuals variation from subject-to-subject in trajector. random between-subject variation Within-subject variation: Observations variation of individual observations over time random within-subject variation 17 Auckland 2008

26 Groups: Scientific Questions as Regression Questions concerning the rate of decline refer to the time slope for FEV1: E[ FEV1 X = age, gender, f508] = β 0 (X) + β 1 (X) time Time Scales Let age0 = age-at-entr, age i1 Let agel = time-since-entr, age ij age i1 18 Auckland 2008

27 CF Regression Model Model: E[FEV X i ] = β 0 +β 1 age0 + β 2 agel +β 3 female +β 4 f508 = 1 + β 5 f508 = 2 +β 6 female agel +β 7 f508 = 1 agel + β 8 f508 = 2 agel = β 0 (X i ) + β 1 (X i ) agel 19 Auckland 2008

28 19-1 Auckland 2008 Intercept f508=0 f508=1 f508=2 male β 0 + β 1 age0 β 0 + β 1 age0 β 0 + β 1 age0 +β 4 +β 5 female β 0 + β 1 age0 β 0 + β 1 age0 β 0 + β 1 age0 +β 3 +β 3 + β 4 +β 3 + β 5

29 19-2 Auckland 2008 Slope f508=0 f508=1 f508=2 male β 2 β 2 + β 7 β 2 + β 8 female β 2 β 2 + β 7 β 2 + β 8 +β 6 +β 6 +β 6

30 19-3 Auckland 2008 Gender Groups (f508==0) FEV Male Female Age (ears)

31 19-4 Auckland 2008 Genotpe Groups (male) FEV f508=0 f508=1 f508= Age (ears)

32 Define Y ij = FEV1 for subject i at time t ij X i = (X ij,..., X ini ) X ij = (X ij,1, X ij,2,..., X ij,p ) age0, agel, gender, genotpe Issue: Response variables measured on the same subject are correlated. cov(y ij, Y ik ) 0 20 Auckland 2008

33 Some Notation It is useful to have some notation that can be used to discuss the stack of data that correspond to each subject. Let n i denote the number of observations for subject i. Define: Y i = Y i1 Y i2. Y ini If the subjects are observed at a common set of times t 1, t 2,..., t m then E(Y ij ) = µ j denotes the mean of the population at time t j. 21 Auckland 2008

34 Dependence and Correlation Recall that observations are termed independent when deviation in one variable does not predict deviation in the other variable. Given two subjects with the same age and gender, then the blood pressure for patient ID=212 is not predictive of the blood pressure for patient ID=334. Observations are called dependent or correlated when one variable does predict the value of another variable. The LDL cholesterol of patient ID=212 at age 57 is predictive of the LDL cholesterol of patient ID=212 at age Auckland 2008

35 Dependence and Correlation Recall: The variance of a variable, Y ij (fix time t j for now) is defined as: σ 2 j = E [ (Y ij µ j ) 2] = E [(Y ij µ j )(Y ij µ j )] The variance measures the average distance that an observation falls awa from the mean. 23 Auckland 2008

36 Dependence and Correlation Define: The covariance of two variables, Y ij, and Y ik (fix t j and t k ) is defined as: σ jk = E [(Y ij µ j )(Y ik µ k )] The covariance measures whether, on average, departures in one variable, Y ij µ j, go together with departures in a second variable, Y ik µ k. In simple linear regression of Y ij on Y ik the regression coefficient β 1 in E(Y ij Y ik ) = β 0 + β 1 Y ik is the covariance divided b the variance of Y ik : β 1 = σ jk σ 2 k 24 Auckland 2008

37 Dependence and Correlation Define: The correlation of two variables, Y ij, and Y ik (fix t j and t k ) is defined as: ρ jk = E [(Y ij µ j )(Y ik µ k )] σ j σ k The correlation is a measure of dependence that takes values between -1 and +1. Recall that a correlation of 0.0 implies that the two measures are unrelated (linearl). Recall that a correlation of 1.0 implies that the two measures fall perfectl on a line one exactl predicts the other! 25 Auckland 2008

38 Wh interest in covariance and/or correlation? Recall that on earlier pages our standard error for the sample mean difference µ 1 µ 0 depends on ρ. In general a statistical model for the outcomes Y i = (Y i1, Y i2,..., Y ini ) requires the following: Means: µ j Variances: σ 2 j Covariances: σ jk, or correlations ρ jk. Therefore, one approach to making inferences based on longitudinal data is to construct a model for each of these three components. 26 Auckland 2008

39 Something new to model... cov(y i ) = var(y i1 ) cov(y i1, Y i2 )... cov(y i1, Y ini ) cov(y i2, Y i1 ) var(y i2 )... cov(y i2, Y ini ) cov(y ini, Y i1 ) cov(y ini, Y i2 )... var(y ini ) = σ 2 1 σ 1 σ 2 ρ σ 1 σ ni ρ 1ni σ 2 σ 1 ρ 21 σ σ 2 σ ni ρ 2ni σ ni σ 1 ρ ni 1 σ ni σ 2 ρ ni 2... σ 2 n i 27 Auckland 2008

40 TLC Trial Covariances Placebo Active Auckland 2008

41 TLC Trial Correlations Placebo Active Auckland 2008

42 Mean and Covariance Models for FEV1 Models: E(Y ij X i ) = µ ij (regression) Groups cov(y i X i ) = Σ i = between-subjects }{{} + within-subjects }{{} individual to individual observation to observation Q: What are appropriate covariance models for the FEV1 data? Individual-to-Individual variation? Observation-to-Observation variation? 30 Auckland 2008

43 age age 10 age age 14 age age 18 age age *10^ *10^1 6*10^1 age Auckland 2008

44 How to build models for correlation? Mixed models random effects between-subject variabilit within-subject similarit due to sharing trajector Serial correlation close in time implies strong similarit correlation decreases as time separation increases 31 Auckland 2008

45 Toward the Linear Mixed Model Regression model: mean response as a function of covariates. sstematic variation Random effects: Individuals variation from subject-to-subject in trajector. random between-subject variation Within-subject variation: variation of individual observations over time random within-subject variation 32 Auckland 2008

46 32-1 Auckland 2008 Two Subjects Response Months

47 Levels of Analsis We first consider the distribution of measurements within subjects: Y ij = β 0,i + β 1,i t ij + e ij e ij N (0, σ 2 ) E[Y i X i, β i ] = β 0,i + β 1,i t ij = [1, time ij ] β 0,i β 1,i = X i β i 33 Auckland 2008

48 Levels of Analsis We can equivalentl separate the subject-specific regression coefficients into the average coefficient and the specific departure for subject i: β 0,i = β 0 + b 0,i β 1,i = β 1 + b 1,i This allows another perspective: Y ij = β 0,i + β 1,i t ij + e ij = (β 0 + β 1 t ij ) + (b 0,i + b 1,i t ij ) + e ij E[Y i X i, β i ] = X i β }{{} + X i b }{{} i mean model between-subject 34 Auckland 2008

49 34-1 Auckland 2008 Sample of Lines FEV Age (ears)

50 34-2 Auckland 2008 Intercepts and Slopes Slope Deviation (b1) Intercept Deviation (b0)

51 Levels of Analsis Next we consider the distribution of patterns (parameters) among subjects: equivalentl β i N (β, D) b i N (0, D) Y i = X i β }{{} + X i b }{{} i + e }{{} i mean model between-subject within-subject 35 Auckland 2008

52 35-1 Auckland 2008 Fixed intercept, Fixed slope Random intercept, Fixed slope beta0 + beta1 * time beta0 + beta1 * time time Fixed intercept, Random slope time Random intercept, Random slope beta0 + beta1 * time beta0 + beta1 * time time time

53 Between-subject Variation We can use the idea of random effects to allow different tpes of between-subject heterogeneit: The magnitude of heterogeneit is characterized b D: b i = var(b i ) = b 0,i b 1,i D 11 D 12 D 21 D Auckland 2008

54 Between-subject Variation The components of D can be interpreted as: D 11 the tpical subject-to-subject deviation in the overall level of the response. D 22 the tpical subject-to-subject deviation in the change (time slope) of the response. D 12 the covariance between individual intercepts and slopes. If positive then subjects with high levels also have high rates of change. If negative then subjects with high levels have low rates of change. 37 Auckland 2008

55 37-1 Auckland 2008 Fixed intercept, Fixed slope Random intercept, Fixed slope beta0 + beta1 * time beta0 + beta1 * time time Fixed intercept, Random slope time Random intercept, Random slope beta0 + beta1 * time beta0 + beta1 * time time time

56 Between-subject Variation: Examples No random effects: Y ij = β 0 + β 1 t ij + e ij = [1, time ij ]β + e ij Random intercepts: Y ij = (β 0 + β 1 t ij ) + b 0,i + e ij = [1, time ij ]β + [ 1 ]b 0,i + e ij Random intercepts and slopes: Y ij = (β 0 + β 1 t ij ) + b 0,i + b 1,i t ij + e ij = [1, time ij ]β + [1, time ij ]b i + e ij 38 Auckland 2008

57 Toward the Linear Mixed Model Regression model: mean response as a function of covariates. sstematic variation Random effects: variation from subject-to-subject in trajector. random between-subject variation Within-subject variation: Observation variation of individual observations over time random within-subject variation 39 Auckland 2008

58 Covariance Models Serial Models Linear mixed models assume that each subject follows his/her own line. In some situations the dependence is more local meaning that observations close in time are more similar than those far apart in time. 40 Auckland 2008

59 Covariance Models Define e ij = ρ e ij 1 + ɛ ij ɛ ij N (0, σ 2 (1 ρ 2 ) ) ɛ i0 N (0, σ 2 ) This leads to autocorrelated errors: cov(e ij, e ik ) = σ 2 ρ j k 41 Auckland 2008

60 ID=1, rho=0.5 ID=2, rho=0.5 ID=3, rho=0.5 ID=4, rho=0.5 ID=5, rho=0.5 ID=6, rho=0.5 ID=7, rho=0.5 ID=8, rho=0.5 ID=9, rho=0.5 ID=10, rho=0.5 ID=11, rho=0.5 ID=12, rho=0.5 ID=13, rho=0.5 ID=14, rho=0.5 ID=15, rho=0.5 ID=16, rho=0.5 ID=17, rho=0.5 ID=18, rho=0.5 ID=19, rho=0.5 ID=20, rho= Auckland 2008

61 ID=1, rho=0.9 ID=2, rho=0.9 ID=3, rho=0.9 ID=4, rho=0.9 ID=5, rho=0.9 ID=6, rho=0.9 ID=7, rho=0.9 ID=8, rho=0.9 ID=9, rho=0.9 ID=10, rho=0.9 ID=11, rho=0.9 ID=12, rho=0.9 ID=13, rho=0.9 ID=14, rho=0.9 ID=15, rho=0.9 ID=16, rho=0.9 ID=17, rho=0.9 ID=18, rho=0.9 ID=19, rho=0.9 ID=20, rho= Auckland 2008

62 41-3 Auckland 2008 Two Subjects Response line for subject 1 average line line for subject 2 subject 1 dropout Months

63 Toward the Linear Mixed Model Regression model: mean response as a function of covariates. sstematic variation Random effects: variation from subject-to-subject in trajector. random between-subject variation Within-subject variation: variation of individual observations over time random within-subject variation 42 Auckland 2008

64 Session Three Summar Role of correlation Impact proper standard errors Used to weight individuals (clusters) Models for correlation / covariance Regression: Group-to-Group variation Random effects: Individual-to-Individual variation Serial correlation: Observation-to-Observation variation 43 Auckland 2008

65 43-1 Auckland 2008 EXTRA: Mixed Models and Covariances/Correlation Q: What is the correlation between outcomes Y ij and Y ik under these random effects models? Random Intercept Model Y ij = β 0 + β 1 t ij + b 0,i + e ij Y ik = β 0 + β 1 t ik + b 0,i + e ik var(y ij ) = var(b 0,i ) + var(e ij ) = D 11 + σ 2 cov(y ij, Y ik ) = cov(b 0,i + e ij, b 0,i + e ik ) = D 11

66 43-2 Auckland 2008 EXTRA: Mixed Models and Covariances/Correlation Random Intercept Model corr(y ij, Y ik ) = D 11 D11 + σ 2 D 11 + σ 2 = D 11 D 11 + σ 2 = between var between var + within var Therefore, an two outcomes have the same correlation. Doesn t depend on the specific times, nor on the distance between the measurements. Exchangeable correlation model. Assuming: var(e ij ) = σ 2, and cov(e ij, e ik ) = 0.

67 43-3 Auckland 2008 EXTRA: Mixed Models and Covariances/Correlation Random Intercept and Slope Model Y ij = (β 0 + β 1 t ij ) + (b 0,i + b 1,i t ij ) + e ij Y ik = (β 0 + β 1 t ik ) + (b 0,i + b 1,i t ik ) + e ik var(y ij ) = var(b 0,i + b 1,i t ij ) + var(e ij ) = D D 12 t ij + D 22 t 2 ij + σ 2 cov(y ij, Y ik ) = cov[(b 0,i + b 1,i t ij + e ij ), (b 0,i + b 1,i t ik + e ik )] = D 11 + D 12 (t ij + t ik ) + D 22 t ij t ik

68 43-4 Auckland 2008 EXTRA: Mixed Models and Covariances/Correlation Random Intercept and Slope Model ρ ijk = corr(y ij, Y ik ) = D 11 + D 12 (t ij + t ik ) + D 22 t ij t ik D D 12 t ij + D 22 t 2 ij + σ2 D D 12 t ik + D 22 t 2 ik + σ2 Therefore, two outcomes ma not have the same correlation. Correlation depends on the specific times for the observations, and does not have a simple form. Assuming: var(e ij ) = σ 2, and cov(e ij, e ik ) = 0.

Stat 579: Generalized Linear Models and Extensions

Stat 579: Generalized Linear Models and Extensions Linear Mixed Models for Longitudinal Data Yan Lu April, 2018, week 12 1 / 34 Correlated data multivariate observations clustered data repeated measurement