Analysis of variance and regression. May 13, 2008

1 Analysis of variance and regression May 13, 2008

2 Repeated measurements over time
- Presentation of data
- Traditional ways of analysis
- Variance component model (the dogs revisited)
- Random regression
- Baseline considerations

3 Lene Theil Skovgaard, Dept. of Biostatistics, Institute of Public Health, University of Copenhagen

4 Repeated measurements, May
Traditional presentation of longitudinal data
Ex: Aspirin absorption for healthy and ill subjects (Matthews et al., 1990)
Comparison of groups for each time:
- mass significance problem
- tests are not independent
- interpretation may be difficult

5 What is the purpose of the investigation?
- Description of time course
- Comparison of groups: in which respect? Level, trend, ... overall pattern

6 Why is this difficult? Or at least different from usual analyses?
- We have several measurements on each individual
  - the traditional independence assumption is violated
  - repeated observations on the same individual are correlated (look alike)
  - ignoring this correlation may lead to bias, wrong standard errors and therefore potentially misleading conclusions
- The time course may be quite irregular, with no obvious structure, and is then treated as a class variable (using many parameters) in ANOVA-type models (variance component models)
- The time course may vary between individuals: random regression

7 Notation from multi-level models:
level  unit                 covariate
1      single observations  time effects
2      individuals          treatment effects
If we fail to take the correlation between measurements on the same individual into account, we will experience:
- possible bias in the mean value structure
- low efficiency (type 2 error) for evaluation of level 1 covariates (time-related effects)
- too small standard errors (type 1 error) for estimates of level 2 effects (treatments)

8 Possible bias?
[Figure: individual time courses and the average curve]
Sometimes referred to as the healthy worker effect

9 Missing values
- MCAR: missing completely at random
- MAR: missing at random; may depend on past observations
- NR: informative missing (non-random); depends on the missing value itself

10 Level 1 covariates (unit: single observations), i.e.
- time itself
- covariates varying with time: blood pressure, heart rate, age
If correlation is not taken into account, we ignore the paired situation, leading to low efficiency, i.e. too large P-values (type 2 error).
Effects may go undetected!

11 Level 2 covariates (unit: individuals), i.e.
- treatment
- gender, age
If correlation is ignored, we act as if we have (a lot) more information than we actually have, leading to too small P-values (type 1 error).
Noise may be taken to be real effects!
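A rough worked illustration of why ignoring correlation hurts level 1 and level 2 effects in opposite directions. It assumes (this is not stated at this point in the slides) that the T measurements on one individual all have variance $\sigma^2$ and the same pairwise correlation $\rho > 0$:

```latex
% Within-individual contrast (level 1): difference between two time points
\mathrm{Var}(Y_{t_1} - Y_{t_2}) = 2\sigma^2(1-\rho) \;<\; 2\sigma^2
% Individual mean over T time points (level 2): basis for comparing treatments
\mathrm{Var}(\bar{Y}) = \frac{\sigma^2}{T}\,\bigl[1 + (T-1)\rho\bigr] \;>\; \frac{\sigma^2}{T}
```

The right-hand sides are what an analysis assuming independence would use. The true variance of a time contrast is smaller (so the naive P-values are too large), whereas the true variance of an individual's mean is larger (so the naive P-values are too small).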

12 Average curves may hide important structures!
- They give no indication of the variation in the time profiles
- Comparisons between groups should not be performed for each time point separately
- Comparisons between time points cannot be judged from the curves (they are paired)

13 The model must describe the characteristic differences between individuals, and the rest (noise, error) should be of an unsystematic, random nature.
Do not average over individual profiles unless these have identical shapes, i.e. only shifts in level are seen between individuals.
Alternative: calculate individual characteristics

14 Individual time profiles (spaghettiogram), divided into groups
Do we see time profiles of identical shape?
Are the averages representative?

15 Commonly used characteristics
- The response for selected times, e.g. endpoint
- Average over a specific period of time
- The slope, perhaps for a specific period
- Peak value
- Time to peak
- The area under the curve (AUC)
- A measure of cyclic behaviour
These are analysed as new observations.
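The characteristics listed above can be computed per individual before any group comparison. The following is a minimal sketch in Python; the function name `summary_measures`, the trapezoidal rule for the AUC, the least-squares slope and the example profile are all illustrative assumptions, not taken from the slides.

```python
def summary_measures(times, values):
    """Reduce one individual's time profile to a few summary characteristics."""
    if len(times) != len(values) or len(times) < 2:
        raise ValueError("need at least two (time, value) pairs of equal length")
    n = len(times)
    peak = max(values)
    time_to_peak = times[values.index(peak)]  # first time the peak is reached
    # Area under the curve by the trapezoidal rule (an assumed choice)
    auc = sum(
        (times[i + 1] - times[i]) * (values[i] + values[i + 1]) / 2.0
        for i in range(n - 1)
    )
    # Ordinary least-squares slope of value on time
    t_bar = sum(times) / n
    y_bar = sum(values) / n
    sxx = sum((t - t_bar) ** 2 for t in times)
    sxy = sum((t - t_bar) * (y - y_bar) for t, y in zip(times, values))
    return {
        "endpoint": values[-1],
        "average": y_bar,
        "slope": sxy / sxx,
        "peak": peak,
        "time_to_peak": time_to_peak,
        "auc": auc,
    }


# Hypothetical profile for a single subject (not data from the slides)
profile = summary_measures([0, 1, 2, 4], [0.0, 4.0, 6.0, 2.0])
```

Each individual then contributes one number per characteristic, and the groups can be compared with an ordinary two-sample method.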

16 Ex: Aspirin
[Figure panels: time to peak; peak value]
Conclusion: P=0.02 for identity of peak values. Quantifications!

17 Example: 2 groups of dogs (5 and 6 dogs, respectively).
Average profiles of osmolality, measured 4 times (with treatments along the way)

18 Do we have identical repetitions (except for level)?

19 Model control
Residual plot for 2-way ANOVA in (dog, treatment)
We see a clear trumpet shape, because dogs with a high level also vary more than dogs with a low level.
Multiplicative structure
Solution: make a logarithmic transformation!

20 Profiles on logarithmic scale, with corresponding residual plot:

21 Multilevel model structure:
level        1                     2
unit         single measurements   individuals
variation    within individuals    between individuals
             $\sigma_W^2$          $\omega_B^2$
covariates   x                     z
             time, grp*time        grp
Multilevel models are part of the broader class of variance component models (which are not necessarily hierarchical).

22 Two-level model:
- Observations $Y_{gdt}$ (group, dog, time)
- Random dog level, $\mathrm{Var}(a_{gd}) = \omega_B^2$
- Residual variation, within dogs, $\mathrm{Var}(\varepsilon_{gdt}) = \sigma_W^2$
- Systematic effect of time and grp

    proc mixed data=dog;
      class grp time_no dog;
      model losmol = grp time_no grp*time_no / ddfm=satterth;
      random dog(grp);
    run;

23 This model assumes the so-called compound symmetry, i.e. that all measurements on the same individual are equally correlated:
$\mathrm{Corr}(Y_{gdt_1}, Y_{gdt_2}) = \rho = \frac{\omega_B^2}{\omega_B^2 + \sigma_W^2}$
This means that the distance in time is not taken into account!!

24 Two-level model with random dog level:
Class Level Information
Class: grp, time_no, dog
Covariance Parameter Estimates
Cov Parm    Estimate   Standard Error   Z Value   Pr Z
dog(grp)
Residual
Type 3 Tests of Fixed Effects
Effect        Num DF   Den DF   F Value   Pr > F
grp
time_no                                   <.0001
grp*time_no
P=0.08 for test of interaction, i.e. no convincing indication of this.

25 Factor diagram:
[I] = [Dog*Time], [Dog], Grp*Time, Grp, Time
We have used the notation [ ] for the random effects, corresponding to variance components. We may note the following:
- The effect of Grp*Time is evaluated against Dog*Time
- If Grp*Time is not considered significant, we thereafter evaluate
  - Time against Dog*Time
  - Grp against Dog(Grp)

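26 The variance component model with random dog level specifies the covariance structure:
\[
\begin{pmatrix}
\omega_B^2+\sigma_W^2 & \omega_B^2 & \omega_B^2 & \omega_B^2 \\
\omega_B^2 & \omega_B^2+\sigma_W^2 & \omega_B^2 & \omega_B^2 \\
\omega_B^2 & \omega_B^2 & \omega_B^2+\sigma_W^2 & \omega_B^2 \\
\omega_B^2 & \omega_B^2 & \omega_B^2 & \omega_B^2+\sigma_W^2
\end{pmatrix}
= (\omega_B^2+\sigma_W^2)
\begin{pmatrix}
1 & \rho & \rho & \rho \\
\rho & 1 & \rho & \rho \\
\rho & \rho & 1 & \rho \\
\rho & \rho & \rho & 1
\end{pmatrix}
\]
called the compound symmetry structure. The correlation $\rho$ is here estimated to
$\hat\rho = \mathrm{Corr}(Y_{gdt_1}, Y_{gdt_2}) = \frac{\omega_B^2}{\omega_B^2 + \sigma_W^2} = 0.65$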
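As a small numerical sketch of the compound symmetry structure, the covariance matrix and the implied correlation can be built directly from the two variance components. The function name and the component values below are hypothetical; they are chosen only so that the correlation equals the 0.65 quoted on the slide, since the actual estimates are not shown here.

```python
def compound_symmetry(omega2_b, sigma2_w, n_times):
    """Covariance matrix and intra-individual correlation under compound symmetry.

    omega2_b: between-individual variance, sigma2_w: within-individual variance.
    """
    total = omega2_b + sigma2_w
    cov = [
        [total if i == j else omega2_b for j in range(n_times)]
        for i in range(n_times)
    ]
    rho = omega2_b / total
    return cov, rho


# Hypothetical variance components giving rho = 0.65 for 4 time points
cov, rho = compound_symmetry(omega2_b=0.65, sigma2_w=0.35, n_times=4)
```

Every off-diagonal entry equals the between-individual variance, which is why the correlation does not depend on how far apart two measurements are in time.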

27 Note that the specification
    random dog(grp);
can be written in two other ways:
    random intercept / subject=dog(grp);
    repeated time / type=cs subject=dog(grp);
In the following, we shall see generalisations of the constructions above.

28 Compound symmetry analysis

    proc mixed data=dog;
      class grp time dog;
      model losmol = grp time grp*time / ddfm=satterth;
      repeated time / type=cs subject=dog(grp) rcorr;
    run;

Covariance Parameter Estimates
Cov Parm   Subject    Estimate
CS         dog(grp)
Residual
Fit Statistics
-2 Res Log Likelihood       14.8
AIC (smaller is better)     18.8
Estimated R Correlation Matrix for dog(grp) 1 1
Row   Col1   Col2   Col3   Col4
Type 3 Tests of Fixed Effects
Effect        Num DF   Den DF   F Value   Pr > F
grp
time_no                                   <.0001
grp*time_no

29 The option ddfm=satterth (or kenwardroger):
- It has no effect in balanced situations, where the distributions are exact
- When approximations are necessary, these are considered the best, i.e. in unbalanced situations (almost all observational designs) and in case of missing observations
- It may give rise to fractional degrees of freedom
- The computations may require a little more time, but in most cases this will not be noticeable
When in doubt, use it!

30 Since the interaction was not significant, we omit it from the model:
Covariance Parameter Estimates
Cov Parm    Estimate   Standard Error   Z Value   Pr Z
dog(grp)
Residual                                          <.0001
Solution for Fixed Effects
Effect      grp   time_no   Estimate   Standard Error   DF   t Value   Pr > |t|
Intercept
grp
grp
time_no
time_no
time_no                                                                <.0001
time_no
Type 3 Tests of Fixed Effects
Effect     Num DF   Den DF   F Value   Pr > F
grp
time_no                                <.0001

31 The variance component model (compound symmetry) with random dog level specifies the covariance structure:
\[
(\omega_B^2+\sigma_W^2)
\begin{pmatrix}
1 & \rho & \rho & \rho \\
\rho & 1 & \rho & \rho \\
\rho & \rho & 1 & \rho \\
\rho & \rho & \rho & 1
\end{pmatrix}
\]
But: the assumption of equal correlation for all pairs of observations taken on the same individual is not necessarily reasonable!
Observations taken close to each other in time will often be more closely correlated than observations taken further apart!

32 In the dog example, the empirical correlation matrix is
[Empirical correlation matrix]
Rather large differences are seen between individual correlations. So what?

33 Unstructured covariance
If we do not assume any special structure for the covariance, we may let it be arbitrary = unstructured.
This is done in MIXED by using type=un and remembering the option hlm:

    proc mixed data=dog;
      class grp dog time_no;
      model losmol = grp time_no grp*time_no / ddfm=satterth;
      repeated time_no / type=un hlm subject=dog(grp) rcorr;
    run;

34 Estimated R Correlation Matrix for dog(grp) 1 1
Row   Col1   Col2   Col3   Col4
Fit Statistics
-2 Res Log Likelihood       2.3
AIC (smaller is better)     22.3
Type 3 Hotelling-Lawley-McKeon Statistics
Effect        Num DF   Den DF   F Value   Pr > F
time                                      <.0001
grp*time_no

35 Advantages of unstructured covariance
- We do not force a wrong covariance structure upon our observations.
- We gain some insight into the actual structure of the covariance.
Drawbacks of unstructured covariance
- We use quite a lot of parameters to describe the covariance structure. The result may therefore be unstable.
- It cannot be used for small data sets
- It can only be used in case of balanced data (all subjects have to be measured at identical times)
Can we do something in between?

36 Comparison of models, using the likelihood
- Default likelihood is the REML likelihood, where the mean value structure has been eliminated
- The traditional likelihood may be obtained using an extra option: proc mixed method=ml;
- Use differences in -2 log L (= -2 log Q) and compare to $\chi^2$ with degrees of freedom equal to the difference in the number of parameters
- Comparison of covariance structures: use either of the two likelihoods
- Comparison of mean value structures: use only the traditional likelihood (ML)
Comparison of compound symmetry and unstructured covariance:
-2 log Q = 14.8 - 2.3 = 12.5, compared to $\chi^2(10-2) = \chi^2(8)$: P = 0.13.
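The P-value quoted for the comparison of compound symmetry and unstructured covariance can be checked by hand. The sketch below uses the closed-form upper tail of the chi-square distribution, which is valid for an even number of degrees of freedom; the function names are illustrative, and only the test statistic 12.5 and the parameter counts (10 versus 2) come from the slide.

```python
import math


def chi2_upper_tail_even_df(x, df):
    """P(X > x) for a chi-square variable with an even number of degrees of freedom."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("closed form used here requires a positive even df")
    half = x / 2.0
    term = 1.0
    total = 1.0
    # exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return math.exp(-half) * total


def n_unstructured_cov_params(n_times):
    """Number of free parameters in an unstructured covariance matrix."""
    return n_times * (n_times + 1) // 2


# 4 time points: 10 parameters (unstructured) against 2 (compound symmetry)
df = n_unstructured_cov_params(4) - 2
p_value = chi2_upper_tail_even_df(12.5, df)
```

With 8 degrees of freedom the tail probability comes out at about 0.13, in agreement with the slide.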

37 Autoregressive structure of first order
In case of equidistant times, this specifies the following covariance structure
\[
\sigma^2
\begin{pmatrix}
1 & \rho & \rho^2 & \rho^3 \\
\rho & 1 & \rho & \rho^2 \\
\rho^2 & \rho & 1 & \rho \\
\rho^3 & \rho^2 & \rho & 1
\end{pmatrix}
\]
i.e. the correlation decreases (in powers) with the distance between observations.
The non-equidistant analogue is
$\mathrm{Corr}(Y_{gdt_1}, Y_{gdt_2}) = \rho^{|t_1 - t_2|}$
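A short sketch of the autoregressive correlation pattern, covering both the equidistant case and its non-equidistant analogue, where the exponent is the absolute time difference. The function name, the value of rho and the time points are hypothetical choices for illustration.

```python
def power_correlation(times, rho):
    """Correlation matrix with entries rho ** |t1 - t2|.

    For equally spaced integer times this is the AR(1) correlation matrix.
    """
    return [[rho ** abs(t1 - t2) for t2 in times] for t1 in times]


# Equidistant times: first row is 1, rho, rho^2, rho^3
equidistant = power_correlation([0, 1, 2, 3], rho=0.5)

# Unequally spaced times (hypothetical): correlation depends on the actual gap
uneven = power_correlation([0, 0.5, 2.0], rho=0.5)
```

In contrast to compound symmetry, measurements far apart in time are less correlated than neighbouring ones.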

38 Autoregressive structure of first order (TYPE=AR(1))
Estimated R Correlation Matrix for dog(grp) 1 1
Row   Col1   Col2   Col3   Col4
Covariance Parameter Estimates
Cov Parm   Subject    Estimate   Standard Error   Z Value   Pr Z
AR(1)      dog(dog)                                         <.0001
Residual
Fit Statistics
-2 Res Log Likelihood       9.8
AIC (smaller is better)     13.8
Type 3 Tests of Fixed Effects
Effect        Num DF   Den DF   F Value   Pr > F
grp
time_no                                   <.0001
grp*time_no

39 Note: Comparison of models with different covariance structures using a $\chi^2$-test on -2 log Q (the difference between the two values of -2 log L) requires that the models are nested.
This is not the case for CS and AR(1)!
Therefore, we have to compare both of them with the model which combines the two covariance structures:

    proc mixed data=dog;
      class grp dog time_no;
      model losmol = grp time_no grp*time_no / ddfm=satterth;
      random intercept / subject=dog(grp) vcorr;
      repeated time_no / type=ar(1) subject=dog(grp);
    run;

40 Estimated V Correlation Matrix for dog(grp) 1 1
Row   Col1   Col2   Col3   Col4
Covariance Parameter Estimates
Cov Parm    Subject    Estimate
dog(grp)
AR(1)       dog(grp)
Residual
Fit Statistics
-2 Res Log Likelihood       9.8
AIC (smaller is better)     15.8
Type 3 Tests of Fixed Effects
Effect        Num DF   Den DF   F Value   Pr > F
grp
time_no                                   <.0001
grp*time_no

41 Comparison of covariance structures
Model             -2 log L   cov. par.   df   P
CS = random dog   14.8       2
AR(1)             9.8        2
both              9.8        3
UN                2.3        10
Conclusions?
- The autoregressive structure is probably the best!
- Our data set is too small!

42 What if we had had double or triple measurements at each time?
- If we always have the same number of repetitions, the correct and optimal approach is to analyze averages
- If the number of repetitions varies, analysis of averages may still be valid (depending on the reason for the imbalance), although not optimal
- The optimal approach is to modify the random-statement to:
    random dog dog*time_no;

43 Actually, the times are not equidistant! Then what??
The non-equidistant analogue to the autoregressive structure is
$\mathrm{Corr}(Y_{gdt_1}, Y_{gdt_2}) = \rho^{|t_1 - t_2|}$
which is written as TYPE=SP(POW)(time). The code here uses TYPE=SP(EXP), which is a reparameterisation of the same structure.
For technical reasons, we have to rescale time to hours=time/60

    proc mixed covtest data=dog;
      class grp hours dog;
      model losmol = grp hours grp*hours / s ddfm=satterth;
      repeated hours / subject=dog(grp) type=sp(exp)(hours) r;
    run;
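The slide names TYPE=SP(POW), while its code uses type=sp(exp). As a brief worked note (using the standard PROC MIXED parameterisations of the two structures, with $d = |t_1 - t_2|$), the two describe the same family of correlations:

```latex
% SP(POW): power structure
\mathrm{Cov}(Y_{t_1}, Y_{t_2}) = \sigma^2 \rho^{d}
% SP(EXP): exponential structure
\mathrm{Cov}(Y_{t_1}, Y_{t_2}) = \sigma^2 \exp(-d/\theta)
% They coincide when
\rho = \exp(-1/\theta) \quad\Longleftrightarrow\quad \theta = -\frac{1}{\log \rho}
```

The fit and the tests of fixed effects should therefore agree between the two; only the reported covariance parameter differs ($\rho$ versus $\theta$).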

44 Class Level Information
Class: grp, hours, dog
Estimated R Matrix for dog(grp) 1 1
Row   Col1   Col2   Col3   Col4
Type 3 Tests of Fixed Effects
Effect      Num DF   Den DF   F Value   Pr > F
grp
hours                                   <.0001
grp*hours

45 Example: Calcium supplement for adolescent women, to improve the rate of bone gain
Girls about 11 years old were randomized to receive either calcium or placebo.
Outcome: BMD = bone mineral density, in $\mathrm{g}/\mathrm{cm}^2$, measured every 6 months (5 visits)

46 Factor diagram:
[I] = [Girl*Visit], [Girl], Grp*Visit, Grp, Visit
Two-level model with:
- Observations $Y_{git}$ (group, girl = individual, visit)
- Random girl level, $\mathrm{Var}(a_{gi}) = \omega_B^2$
- Residual variation, within girls, $\mathrm{Var}(\varepsilon_{git}) = \sigma_W^2$

47 Variance component model:
$Y_{git} = \mu + \alpha_g + \beta_t + \gamma_{gt} + a_{gi} + \varepsilon_{git}$
where $\mathrm{Var}(a_{gi}) = \omega_B^2$, $\mathrm{Var}(\varepsilon_{git}) = \sigma_W^2$
As previously, we have assumed compound symmetry, i.e. that all measurements on the same girl are equally correlated:
$\mathrm{Corr}(Y_{git_1}, Y_{git_2}) = \rho = \frac{\omega_B^2}{\omega_B^2 + \sigma_W^2}$

48 Empirical correlation structure:
Row   COL1   COL2   COL3   COL4   COL5
Is compound symmetry reasonable?
Other possibilities:
- Unstructured: $T(T+1)/2$ covariance parameters
- Patterned, e.g. an autoregressive structure
- Random regression

49 Compound symmetry results for the calcium example:
Covariance Parameter Estimates (REML)
Cov Parm     Estimate
GIRL(GRP)
Residual
Tests of Fixed Effects
Source      NDF   DDF   Type III F   Pr > F
GRP
VISIT
GRP*VISIT
No doubt, we see an interaction GRP*VISIT, or?

50 Autoregressive covariance structure:
\[
\sigma^2
\begin{pmatrix}
1 & \rho & \rho^2 & \rho^3 & \rho^4 \\
\rho & 1 & \rho & \rho^2 & \rho^3 \\
\rho^2 & \rho & 1 & \rho & \rho^2 \\
\rho^3 & \rho^2 & \rho & 1 & \rho \\
\rho^4 & \rho^3 & \rho^2 & \rho & 1
\end{pmatrix}
\]
Results:
Covariance Parameter Estimates (REML)
Cov Parm   Subject     Estimate
AR(1)      GIRL(GRP)
Residual
Tests of Fixed Effects
Source      NDF   DDF   Type III F   Pr > F
GRP
VISIT
GRP*VISIT

51 Comparison of test results for the test of no interaction GRP*VISIT:
Covariance structure   Test statistic   Distribution   P value
Independence           0.35             F(4,491)       0.84
Compound symmetry      5.30             F(4,382)
Autoregressive         5.30             F(4,381)
Unstructured           2.72             F(4,107)       0.034

52 Predicted profiles for the unstructured covariance:
- The evolution over time looks pretty linear
- Include time=visit as a quantitative covariate?
- What about the baseline difference?

53 Test of linear time trend (time=visit):

    proc mixed data=calcium;
      class grp girl visit;
      model bmd = grp time grp*time visit grp*visit / ddfm=satterth;
      repeated visit / type=un subject=girl(grp) r;
    run;

    proc mixed data=calcium;
      class grp girl visit;
      model bmd = grp time grp*time visit / s ddfm=satterth;
      repeated visit / type=un subject=girl(grp) r;
    run;

Type 3 Tests of Fixed Effects
Effect      Num DF   Den DF   F Value   Pr > F
grp
time        0        .        .         .
time*grp    0        .        .         .
visit
grp*visit

54 Solution for Fixed Effects
Effect      grp   visit   Estimate   Standard Error   DF   t Value   Pr > |t|
Intercept                                                            <.0001
grp         C
grp         P
time                                                                 <.0001
time*grp    C
time*grp    P
visit
visit
visit
visit
visit
Type 3 Tests of Fixed Effects
Effect      Num DF   Den DF   F Value   Pr > F
grp
time        0        .        .         .
time*grp
visit
There is some deviation from linearity (P=0.0151)

55 The time course is reasonably linear, but maybe the girls have different growth rates (slopes)?
If we let $Y_{gpt}$ denote BMD for the $p$th girl (in the $g$th group) at time $t$ ($t = 1, \ldots, 5$), we have the model:
$Y_{gpt} = a_{gp} + b_{gp}\, t + \varepsilon_{gpt}, \quad \varepsilon_{gpt} \sim N(0, \sigma_W^2)$

56 We generalize the idea of a random level to random regression
We let each individual (girl) have
- her own level $a_{gp}$
- her own slope $b_{gp}$

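57 But we bind these individual parameters ($a_{gp}$ and $b_{gp}$) together by normal distributions
\[
\begin{pmatrix} a_{gp} \\ b_{gp} \end{pmatrix}
\sim N_2\left( \begin{pmatrix} \alpha_g \\ \beta_g \end{pmatrix},\; G \right),
\qquad
G = \begin{pmatrix} \tau_a^2 & \omega \\ \omega & \tau_b^2 \end{pmatrix}
  = \begin{pmatrix} \tau_a^2 & \rho\tau_a\tau_b \\ \rho\tau_a\tau_b & \tau_b^2 \end{pmatrix}
\]
$G$ describes the population variation of the lines, i.e. the inter-individual variation.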
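As a worked consequence of this specification, the marginal mean and covariance of the observations follow directly, assuming (as is usual, though not spelled out on the slide) that $\varepsilon_{gpt}$ is independent of $(a_{gp}, b_{gp})$ and of the other residuals:

```latex
\mathrm{E}(Y_{gpt}) = \alpha_g + \beta_g\, t
\mathrm{Var}(Y_{gpt}) = \tau_a^2 + 2t\,\omega + t^2 \tau_b^2 + \sigma_W^2
\mathrm{Cov}(Y_{gps}, Y_{gpt}) = \tau_a^2 + (s+t)\,\omega + s\,t\,\tau_b^2, \qquad s \neq t
```

So random regression implies a covariance structure in which both variances and correlations change with time, unlike compound symmetry. This marginal covariance is what the estimated V matrix in the PROC MIXED output represents.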

58 We estimate in this model by writing:

    proc mixed covtest data=calcium;
      class grp girl;
      model bmd = grp time time*grp / ddfm=satterth s;
      random intercept time / type=un subject=girl(grp) g v vcorr;
    run;

Estimated G Matrix
Row   Effect      grp   girl   Col1   Col2
1     Intercept   C                   E-6
2     time        C            E
Estimated V Matrix for girl(grp) 101 C
Row   Col1   Col2   Col3   Col4   Col5

59 Estimated V Correlation Matrix for girl(grp) 101 C
Row   Col1   Col2   Col3   Col4   Col5
Covariance Parameter Estimates
Cov Parm   Subject     Estimate   Standard Error   Z Value   Pr Z
UN(1,1)    girl(grp)                                         <.0001
UN(2,1)    girl(grp)   3.733E
UN(2,2)    girl(grp)   E                                     <.0001
Residual                                                     <.0001
Fit Statistics
-2 Res Log Likelihood
AIC (smaller is better)

60 Solution for Fixed Effects
Effect      grp   Estimate   Standard Error   DF   t Value   Pr > |t|
Intercept                                                    <.0001
grp         C
grp         P
time                                                         <.0001
time*grp    C
time*grp    P
Type 3 Tests of Fixed Effects
Effect      Num DF   Den DF   F Value   Pr > F
grp
time                                    <.0001
time*grp
Thus, we find an extra increase in BMD of (0.0016) g per cm^3 per half year when giving calcium supplement.

61 Note concerning MIXED notation
- It is necessary to use TYPE=UN in the RANDOM-statement in order to allow intercept and slope to be arbitrarily correlated
- Default option in RANDOM is TYPE=VC, which only specifies variance components with different variances
- If TYPE=UN is omitted, we may experience convergence problems and sometimes totally incomprehensible results.
In this particular case, the correlation between intercept and slope is not that impressive (intercept is not completely out of range in this example).

62 It turns out that
- the girls are only seen approximately twice a year
  - the actual dates are available and are translated into ctime, the internal date representation in SAS, denoting days since ...
  - We can no longer use the construction type=un, but still the random-statement (CS). A lot of other covariance structures will still be possible, e.g. the autoregressive type=ar(1)
- the girls were not precisely 11 years at the first visit
  - As a covariate, we ought to have the specific age of the girl, but unfortunately, these are not available.

63 If we use the newly constructed ctime:

    proc mixed covtest data=calcium;
      class grp girl;
      model bmd = grp ctime ctime*grp / ddfm=satterth s;
      random intercept ctime / type=un subject=girl(grp) g;
    run;

Iteration History
Iteration   Evaluations   -2 Res Log Like   Criterion
WARNING: Did not converge.

64 The variable ctime has much too large values, with a very small range, and we get numerical instability.
We normalise, to approximate age or age11:
    age=(ctime-11475)/ ;
    age11=age-11;
Variable   N   Mean   Std Dev   Minimum   Maximum
ctime
bmd
visit
age
age11

65 Random regression, covariate age:
$Y_{gpt} = a_{gp} + b_{gp}\,(\mathrm{age} - 11) + \varepsilon_{gpt}$

66 Random regression, using actual age (age11=age-11):

    proc mixed covtest data=calcium;
      class grp girl;
      model bmd = grp age11 age11*grp / ddfm=satterth s outpm=predicted_mean;
      random intercept age11 / type=un subject=girl(grp) g vcorr;
    run;

Estimated G Matrix
Row   Effect      grp   girl   Col1   Col2
1     Intercept   C
2     age11       C
Estimated V Correlation Matrix for girl(grp) 101 C
Row   Col1   Col2   Col3   Col4   Col5

67 Covariance Parameter Estimates
Cov Parm   Subject     Estimate   Standard Error   Z Value   Pr Z
UN(1,1)    girl(grp)                                         <.0001
UN(2,1)    girl(grp)
UN(2,2)    girl(grp)                                         <.0001
Residual                                                     <.0001
Solution for Fixed Effects
Effect       grp   Estimate   Standard Error   DF   t Value   Pr > |t|
Intercept                                                     <.0001
grp          C
grp          P
age11                                                         <.0001
age11*grp    C
age11*grp    P
In this model, we quantify the effect of a calcium supplement to (0.0031) g per cm^3 per year.

68 Results from random regression (standard errors in parentheses):
Group        Level at age 11   Slope
P            (0.0087)          (0.0022)
C            (0.0088)          (0.0022)
difference   (0.0124)          (0.0031)
P

69 Comparison of slopes for different covariance structures:
Covariance structure           -2 log L   cov. par.   AIC   Difference of slopes   P
Independence                                                (0.0086)               0.27
Compound symmetry                                           (0.0020)               <
Exponential (autoregressive)                                (0.0032)
Random regression                                           (0.0031)

70 Predicted values from random regression
It looks as if there is a difference right from the start (although we have previously seen this to be insignificant, P=0.37).
Baseline adjustment?

71 If the first visit is a baseline measurement:
- The two groups are known to be equal at baseline
- To include this measurement in the comparison between groups may therefore weaken a possible difference between these (type 2 error)
- Dissimilarities may be present in small studies
- For slowly varying outcomes, even a small difference may produce non-treatment related differences, i.e. bias

72 Approaches for handling baseline differences:
- Use follow-up data only (exclude baseline from analysis): only reasonable if correlation between repeated measurements is very low
- Subtract baseline from successive measurements: only reasonable if correlation between repeated measurements is very high
- Use baseline measurement as a covariate: may be used for any degree of correlation
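Why the choice depends on the correlation can be sketched with the standard variance comparison of the three approaches. This is a simplified illustration under assumptions not stated on the slide: equal variance at baseline and follow-up, a single follow-up measurement, and a known correlation `rho` between the two. The function name is hypothetical, and variances are expressed relative to the follow-up-only analysis.

```python
def relative_variances(rho):
    """Outcome variance per subject, relative to using follow-up data only.

    Assumes equal variances at baseline and follow-up and correlation rho.
    """
    return {
        "follow_up": 1.0,               # Var(Y1)
        "change": 2.0 * (1.0 - rho),    # Var(Y1 - Y0)
        "ancova": 1.0 - rho ** 2,       # residual variance of Y1 given Y0
    }


# Hypothetical correlations spanning low to high
table = {rho: relative_variances(rho) for rho in (0.1, 0.5, 0.9)}
```

Under these assumptions, change scores are less precise than follow-up data when the correlation is below 0.5 and more precise above it, while adjusting for baseline is never worse than either.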

73 Baseline included as a covariate
- will hardly change the results for the slopes, since these are within-individual quantities. A small change is expected because of the exclusion of visit 1 from the analysis.
- may affect the difference between groups at fixed ages, e.g. endpoint age of 13 years

74 Excluding baseline (4 visits only), without baseline as covariate:

    proc mixed covtest noclprint data=calcium;
      where visit>1;
      class grp girl;
      model bmd = grp age13 grp*age13 / ddfm=satterth s;
      random intercept age13 / type=un subject=girl(grp) g;
    run;

Solution for Fixed Effects
Effect       grp   Estimate   Standard Error   DF   t Value   Pr > |t|
Intercept                                                     <.0001
grp          C
grp          P
age13                                                         <.0001
age13*grp    C
age13*grp    P
Estimated gain at the age 13: (0.0138) g per cm^3

75 Including baseline as covariate

    proc mixed covtest noclprint data=calcium;
      where visit>1;
      class grp girl;
      model bmd = baseline grp age13 grp*age13 / ddfm=satterth s;
      random intercept age13 / type=un subject=girl(grp) g;
    run;

Solution for Fixed Effects
Effect       grp   Estimate   Standard Error   DF   t Value   Pr > |t|
Intercept
baseline                                                      <.0001
grp          C
grp          P
age13                                                         <.0001
age13*grp    C
age13*grp    P
Estimated gain at the age 13: (0.0062) g per cm^3

76 Including baseline as covariate
- explains some (but not all) of the difference between groups at age 13
  without baseline: (0.0138)
  baseline as covariate: (0.0062)
- increases the precision of the estimated difference (standard error becomes smaller)

77 Example from Vickers, A.J. & Altman, D.G.: Analysing controlled clinical trials with baseline and follow-up measurements. British Medical Journal 2001; 323.
52 patients with shoulder pain are randomized to either
- Acupuncture (n=25)
- Placebo (n=27)
Pain is evaluated on a 100 point scale before and after treatment.
High scores are good

78 Results: Pain score (mean and SD)
                   placebo (n=27)   acupuncture (n=25)   difference (95% CI)   P-value
baseline           53.9 (14.0)      60.4 (12.3)
Type of analysis
follow-up          62.3 (17.9)      79.6 (17.1)          17.3 (7.5; 27.1)
changes*           8.4 (14.6)       19.2 (16.1)          10.8 (2.3; 19.4)
ancova                                                   12.7 (4.1; 21.3)
* results published

79 Baseline
- The placebo group lies somewhat below acupuncture
Follow-up
- We would expect the placebo group to be lower also after treatment
- Therefore, the comparison is unreasonable (we see too big a difference)
Change
- Low baseline implies an expected large positive change (regression to the mean)
- The placebo group is therefore expected to increase the most
- Therefore, the comparison is unreasonable (we see too small a difference)
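The direction of these two biases can be read off the published numbers. A short worked calculation, where $b$ denotes the common ANCOVA slope on baseline; its value is not reported on the slides and is only implied here by the standard ANCOVA identity:

```latex
% Baseline imbalance
\Delta_{\text{baseline}} = 60.4 - 53.9 = 6.5
% Follow-up analysis: no correction for the imbalance
\Delta_{\text{follow-up}} = 79.6 - 62.3 = 17.3
% Change analysis: subtracts the full imbalance
\Delta_{\text{change}} = 17.3 - 1 \cdot 6.5 = 10.8
% ANCOVA: subtracts the fraction b of the imbalance
\Delta_{\text{ancova}} = 17.3 - b \cdot 6.5 = 12.7
\quad\Rightarrow\quad b = \frac{17.3 - 12.7}{6.5} \approx 0.71
```

The follow-up analysis corresponds to $b = 0$ and the change analysis to $b = 1$, so the ANCOVA estimate lies between the two.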

80 What to do in such a situation?
Ancova: analysis of covariance, a special case of multiple regression:
- Outcome: follow-up data
- Covariates:
  - treatment (factor: acupuncture/placebo)
  - baseline measurement (quantitative)
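A minimal sketch of such an analysis of covariance as a multiple regression, solved by least squares with the standard library only. The data are hypothetical and noise-free (constructed as follow-up = 10 + 0.7 x baseline + 12 x treatment), chosen so that the treated group starts 5 units higher at baseline; the helper names are illustrative and not from the slides.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def ancova(baseline, treated, follow_up):
    """Least-squares fit of follow_up = intercept + slope*baseline + effect*treated."""
    design = [[1.0, float(b), float(t)] for b, t in zip(baseline, treated)]
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * y for r, y in zip(design, follow_up)) for i in range(3)]
    return tuple(solve_linear_system(xtx, xty))


# Hypothetical data: treated group has a higher baseline on average
baseline = [40, 50, 60, 70, 45, 55, 65, 75]
treated = [0, 0, 0, 0, 1, 1, 1, 1]
follow_up = [10 + 0.7 * b + 12 * t for b, t in zip(baseline, treated)]

intercept, slope, effect = ancova(baseline, treated, follow_up)

# Unadjusted comparison of follow-up means, for contrast
mean_treated = sum(y for y, t in zip(follow_up, treated) if t == 1) / treated.count(1)
mean_control = sum(y for y, t in zip(follow_up, treated) if t == 0) / treated.count(0)
raw_difference = mean_treated - mean_control
```

In this constructed example the adjusted effect recovers the 12 units built into the data, while the raw follow-up difference is inflated by the baseline imbalance.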

81 When can we use follow-up data?
- when we have a control group and proper randomisation
- when the correlation is low
When can we use differences?
- when we have a control group and proper randomisation
- when the correlation is large
When can we use analysis of covariance?
- always

82 Specification of mixed models:
Systematic variation:
- Between-individual covariates: treatment, sex, age, baseline value, ...
- Within-individual covariates: time, cumulative dose, temperature, ...
Random variation

83 Sources of random variation:
1. Random effects
2. Serial correlation
3. Measurement error

84 SAS, PROC MIXED
- model describes the systematic part (fixed effects, mean value structure)
- random describes the random effects
- repeated describes the process covariance structure
  - local adds an additional measurement error

85 Explained variation in percent, $R^2$
We have two (or more) different variances to explain!
- residual variation (variation within individuals, $\sigma_W^2$)
  - decreases when we include an important x covariate (level 1)
  - may decrease when we include an important z covariate (level 2)
- variation between individuals, $\omega_B^2$
  - decreases when we include an important z covariate (level 2)
  - may increase when we include an important x covariate (level 1)

86 Hypothetical example:
The x's vary between individuals, but the average outcomes ($\bar{y}$) are almost identical:
Levels of y, for fixed x, are very different!
