Correlated data. Variance component models. Example: Evaluate vaccine. Traditional assumption so far. Faculty of Health Sciences


Faculty of Health Sciences

Variance component models, I
Lene Theil Skovgaard
November 29, 2013
Home pages: ltsk@sund.ku.dk

Outline
- Definitions and motivation: correlated data
- One-way anova with random variation: the rabbit example
- Hierarchical models with several levels
- Comparing measurement devices
- Crossed random effects (interactions): the visual acuity example

Traditional assumption so far
Independence: one observation per individual (unit), no twins/siblings...
Why is this important? Otherwise:
- some observations (typically on the same individual) will look alike (be correlated);
- the variation between these will not reflect the population variation (the variation between individuals);
- the number of observations will seem misleadingly high.

Example: Evaluate vaccine
More precisely, we seek an estimate of the swelling due to the vaccine.
Experiment: 6 rabbits, each vaccinated in 6 spots on the back.
Outcome $y_{rs}$: swelling in cm², where $r = 1, \dots, R = 6$ denotes the rabbit and $s = 1, \dots, S = 6$ denotes the spot.
We have observed a total of 36 swelling areas, but we must expect swelling to be specific to the individual rabbit.

Scatter plot
[Figure: swelling for each rabbit; x-axis: arbitrary numbering of rabbits]

Naive quantification of swelling
[PROC MEANS output for swelling: N, Mean, Std Dev, Std Error]
What is wrong here? Imagine that all measurements on a rabbit resulted in the same value...

Example: Number of infections
[Figure: number of positive swabs in 5 family members from each of 18 families]

Neglecting correlation will lead to errors
Typical errors:
- wrong standard errors (too small or too big);
- wrong confidence intervals (too narrow or too wide);
- wrong conclusions (type I or type II errors).
The type of error depends on the kind of question asked. This will be explained further.
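A minimal Python sketch of the question "What is wrong here?", assuming six hypothetical rabbit-level values (not the slide data) and the extreme case in which all six spots on a rabbit give the same value. Pooling the 36 copies shrinks the naive standard error by a factor of √7 for this 6 × 6 layout, although no information has been added. The helper name `naive_se` is hypothetical.

```python
import statistics

# Hypothetical rabbit-level swellings (cm^2); NOT the data from the slides.
rabbit_values = [6.0, 7.0, 8.0, 7.5, 6.5, 9.0]
S = 6  # spots per rabbit

def naive_se(xs):
    """Standard error of the mean, treating all xs as independent."""
    return statistics.stdev(xs) / len(xs) ** 0.5

# One value per rabbit: the honest amount of information (n = 6).
honest = naive_se(rabbit_values)

# Extreme case: every spot on a rabbit repeats that rabbit's value (n = 36).
pooled = [v for v in rabbit_values for _ in range(S)]
naive = naive_se(pooled)

print(f"mean: {statistics.mean(pooled):.3f}")
print(f"SE from 6 rabbit values:  {honest:.3f}")
print(f"SE from 36 pooled values: {naive:.3f}")
print(f"ratio: {honest / naive:.3f}")
```

The mean is unchanged, but the pooled standard error pretends we have 36 independent observations.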

Terminology for correlated measurements
- Cluster/multilevel design: the same outcome (response) measured on all individuals in a number of families/villages/school classes.
- Repeated measurements: the same outcome (response) measured in different situations (or at different spots) for the same individual.
- Longitudinal measurements: the same outcome (response) measured consecutively over time for each individual.
- Multivariate outcome: several outcomes (responses) for each individual, e.g. a number of hormone measurements that we want to study simultaneously.

Mixed models
A mix of systematic and random effects. Two basic types of generalizations:
- Categorical covariates (family, school): Anova → variance component models
- Quantitative covariates (time, age): Regression → random regression
SAS: PROC MIXED

Variance component models
Hierarchical designs, cluster designs: generalizations of ANOVA-type models, involving several sources of random variation (variance components):
- geographical/environmental variation: between regions, hospitals, schools or countries
- biological variation: between individuals, families or animals
- within-individual variation: between arms, teeth, injection sites, days
- variation due to uncontrollable circumstances: time of day, temperature, observer
- measurement error
Example: School, School class and Pupil, with factor diagram [I] = [S*C*P] → [S*C] → [S]

Examples of hierarchies

    level 1     level 2      level 3
    subjects    twin pairs   countries
    subjects    families     regions
    students    classes      schools
    spots       rabbits
    fields      sections     rats
    visits      subjects     centres

Measurements belonging together in the same cluster look alike (are correlated).

Hierarchical designs, with covariates
On all levels, we may have random variation (variance components) as well as covariates:
[S*C*P] → [S*C] → [S], with covariates Gender (pupil level), Class grade (class level) and School type (school level).

Merits of cluster designs
- Certain effects may be estimated more precisely, since some sources of variation are eliminated, e.g. by making comparisons within a family. This is analogous to the paired comparison situation.
- When planning subsequent investigations, knowledge of the relative sizes of the variance components will help in deciding the number of repetitions needed at each level.

Drawbacks of cluster designs
Bias may result if one or more sources of variation are disregarded:
- possible bias in the mean value structure;
- low efficiency (type II error) for evaluation of level 1 covariates (within-cluster effects);
- too small standard errors (type I error) for estimates of level 2 effects (between-cluster effects).
When making inference (estimation and testing), it is important to take all sources of variation into account, and effects have to be evaluated using the relevant variation!

The vaccine in rabbits example
Traditional model: swelling = rabbit level + variation,
$y_{rs} = \alpha_r + \varepsilon_{rs}, \quad \varepsilon_{rs} \sim N(0, \sigma^2)$
The variation ($\sigma$) can be regarded either as within-rabbit variation or as measurement error (probably a combination of the two).
In computer terms, each rabbit has its own level, i.e. rabbit is a factor:

    proc glm data=rabbit;
      class rabbit;
      model swelling=rabbit / solution;
    run;

Output from anova model
[PROC GLM output, dependent variable swelling: ANOVA table (Model, Error, Corrected Total), R-Square, Coeff Var, Root MSE, swelling Mean, and the Type III test for rabbit, giving the mean squares $MS_B$ (between rabbits) and $MS_W$ (within rabbits)]

Output from anova model, II
[Parameter estimates with standard errors for the Intercept and rabbits 1-6, all flagged B (not uniquely estimable)]
But: do we get any useful information from this? We are not interested in these particular 6 rabbits, only in rabbits in general, as a species! We assume these 6 rabbits to have been randomly selected from the species.

We choose to model rabbit variation instead of rabbit levels:
swelling = grand mean + between-rabbit variation + within-rabbit variation,
$y_{rs} = \mu + a_r + \varepsilon_{rs}$,
where the $a_r$'s and the $\varepsilon_{rs}$'s are assumed independent and normally distributed with $\mathrm{Var}(a_r) = \omega_B^2$ and $\mathrm{Var}(\varepsilon_{rs}) = \sigma_W^2$.
The variation between rabbits is now a random factor. $\omega_B^2$ and $\sigma_W^2$ are variance components, and the model is also called a two-level model.
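To see what the random rabbit effect implies, here is a minimal Python simulation sketch of $y_{rs} = \mu + a_r + \varepsilon_{rs}$. It assumes $\omega_B^2 = 0.33$, an illustrative made-up $\sigma_W^2 = 0.58$, $\mu = 7.367$, and many simulated rabbits with two spots each. The empirical correlation between the two spots on a rabbit should be close to the intra-class correlation $\omega_B^2/(\omega_B^2 + \sigma_W^2)$. The function names are hypothetical.

```python
import random

def simulate_two_level(R, S, mu, omega2_b, sigma2_w, seed=1):
    """Simulate y[r][s] = mu + a_r + e_rs, a_r ~ N(0, omega2_b), e_rs ~ N(0, sigma2_w)."""
    rng = random.Random(seed)
    data = []
    for _ in range(R):
        a_r = rng.gauss(0.0, omega2_b ** 0.5)  # one rabbit effect, shared by all its spots
        data.append([mu + a_r + rng.gauss(0.0, sigma2_w ** 0.5) for _ in range(S)])
    return data

def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

omega2_b, sigma2_w = 0.33, 0.58  # illustrative values, not SAS estimates
data = simulate_two_level(R=20000, S=2, mu=7.367, omega2_b=omega2_b, sigma2_w=sigma2_w)
r_hat = pearson([row[0] for row in data], [row[1] for row in data])
rho = omega2_b / (omega2_b + sigma2_w)
print(f"empirical correlation {r_hat:.3f}, intra-class correlation {rho:.3f}")
```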

Fixed vs. random effects?
Fixed:
- all values of the factor are present (typically only a few, e.g. treatment);
- allows inference for these particular factor values only;
- must include a reasonable number of observations for each factor value.
Random:
- a representative sample of values of the factor is present;
- allows inference to be extended beyond the values in the experiment (e.g. geographical areas, classes, rabbits);
- is necessary when we have a covariate for this level, e.g. class grade, or treatment.

Formulation in terms of correlation
All swelling observations have common mean and variance:
$y_{rs} \sim N(\mu, \omega_B^2 + \sigma_W^2)$
But measurements made on the same rabbit are correlated, with the intra-class correlation
$\mathrm{Corr}(y_{r1}, y_{r2}) = \rho = \dfrac{\omega_B^2}{\omega_B^2 + \sigma_W^2}$
Measurements made on the same rabbit tend to look more alike than measurements made on different rabbits, and all measurements on the same rabbit look equally much alike. This correlation structure is called compound symmetry (CS) or exchangeability.

Estimation of variance components
In balanced situations:
- Within-rabbit variation = residual variation, as usual: $\hat\sigma_W^2 = MS_W$
- Between-rabbit variation (not quite the variation between averages):
  $\hat\omega_B^2 = \dfrac{MS_B - MS_W}{S} = 0.33$, where $S$ denotes the number of spots, here 6.

Variances are positive!
But note: it may happen that $\hat\omega_B^2$ becomes negative:
- by coincidence, or
- as a result of competition between units belonging together, e.g. when measuring yield for plants grown in the same pot.
In such a case, it will be reported as zero.
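The balanced-data moment estimators above can be sketched in a few lines of Python. This is an illustration of the ANOVA (method-of-moments) estimates, truncated at zero as described, not a reimplementation of PROC MIXED, whose default REML fit can differ in unbalanced data. The function name and the toy data are hypothetical.

```python
def variance_components(data):
    """Balanced one-way random-effects ANOVA estimates.

    data: list of R groups (rabbits), each a list of S observations (spots).
    Returns (ms_b, ms_w, omega2_b, sigma2_w, icc).
    """
    R, S = len(data), len(data[0])
    if any(len(g) != S for g in data):
        raise ValueError("design must be balanced")
    group_means = [sum(g) / S for g in data]
    grand = sum(group_means) / R
    ss_b = S * sum((m - grand) ** 2 for m in group_means)
    ss_w = sum((y - m) ** 2 for g, m in zip(data, group_means) for y in g)
    ms_b = ss_b / (R - 1)          # between-rabbit mean square
    ms_w = ss_w / (R * (S - 1))    # within-rabbit (residual) mean square
    sigma2_w = ms_w
    omega2_b = max(0.0, (ms_b - ms_w) / S)  # a negative estimate is reported as zero
    total = omega2_b + sigma2_w
    icc = omega2_b / total if total > 0 else 0.0
    return ms_b, ms_w, omega2_b, sigma2_w, icc

print(variance_components([[1.0, 3.0], [5.0, 7.0]]))
```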

Reading in data in SAS
The data come with one row per spot (a-f) and one column per rabbit (y1-y6), and are restructured to one row per observation:

    data a0;
      input spot $ y1-y6;
      datalines;
    a ...
    b ...
    c ...
    d ...
    e ...
    f ...
    ;
    run;

    data rabbit;
      set a0;
      rabbit=1; swelling=y1; output;
      rabbit=2; swelling=y2; output;
      rabbit=3; swelling=y3; output;
      rabbit=4; swelling=y4; output;
      rabbit=5; swelling=y5; output;
      rabbit=6; swelling=y6; output;
    run;

Estimation in SAS

    proc mixed data=rabbit;
      class rabbit;
      model swelling = / s;
      random rabbit;
    run;

[Output: Covariance Parameter Estimates (rabbit, Residual); Solution for Fixed Effects (Intercept, with standard error, DF, t value and P-value)]

Interpretation of variance components
[Table: estimates of the between-rabbit component $\omega_B^2$, the within-rabbit component $\sigma_W^2$ and the total $\omega_B^2 + \sigma_W^2$, with the proportion of the variation each accounts for]
Typical differences (95% prediction intervals):
- for spots on the same rabbit: $\pm 2\sqrt{2\hat\sigma_W^2} = \pm 2.16$ cm²
- for spots on different rabbits: $\pm 2\sqrt{2(\hat\omega_B^2 + \hat\sigma_W^2)} = \pm 2.70$ cm²

Interpretation of variance components, cont'd
Approximately 2/3 of the variation in the measurements comes from the variation within rabbits, i.e. between injection sites on the same rabbit. Why? Could there be a systematic difference between the injection sites?
Two-way anova: [Type III tests for rabbit and spot]
This does not seem to be the case (P = 0.26).
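A small Python consistency check of the typical differences. It assumes that ±2.16 cm² was computed exactly as $\pm 2\sqrt{2\hat\sigma_W^2}$. Back-calculating $\hat\sigma_W^2$ from it and combining with $\hat\omega_B^2 = 0.33$ reproduces ±2.70 cm² for spots on different rabbits, and gives a within-rabbit share of roughly 2/3. The function names are hypothetical.

```python
import math

def pi_half_width_same_cluster(sigma2_w):
    """Approx. 95% limit for the difference between two spots on the same rabbit."""
    return 2 * math.sqrt(2 * sigma2_w)

def pi_half_width_different_clusters(omega2_b, sigma2_w):
    """Approx. 95% limit for the difference between spots on different rabbits."""
    return 2 * math.sqrt(2 * (omega2_b + sigma2_w))

# Back-calculated from the reported +/-2.16 (assumption about how it was computed).
sigma2_w = (2.16 / 2) ** 2 / 2
omega2_b = 0.33  # moment estimate quoted in the notes

print(round(pi_half_width_same_cluster(sigma2_w), 2))                  # 2.16
print(round(pi_half_width_different_clusters(omega2_b, sigma2_w), 2))  # 2.7
print(round(sigma2_w / (omega2_b + sigma2_w), 2))                       # 0.64, share within rabbits
```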

Design considerations, precision of overall mean
$\mathrm{Var}(\bar y) = \dfrac{\omega_B^2}{R} + \dfrac{\sigma_W^2}{RS}$
[Figures: precision of the overall mean for R = number of rabbits varying from 3 to 20, and for S = number of spots varying from 1 to 10]

Effective sample size
If we had only one observation for each of $k$ rabbits, how many rabbits would we then need to obtain the same precision?
$k = \dfrac{RS}{1 + \rho(S - 1)}$
We have here $\rho = \dfrac{\omega_B^2}{\omega_B^2 + \sigma_W^2}$, giving $k = 12.8$.
Effectively, we have only approximately two independent observations from each rabbit!

Quantification of overall swelling
- Forget rabbit: pool all 36 measurements, wrongly assuming independence: $\hat\mu = 7.367$ (0.155)
- Rabbit averages: start out by taking averages for each rabbit: $\hat\mu = 7.367$ (0.267)
- Random rabbit: estimate the mean swelling of rabbits as a species (in general the correct approach, using mixed models): $\hat\mu = 7.367$ (0.267)

Estimation of individual rabbit means...?
Two different approaches:
- traditional averages $\bar y_{r\cdot}$;
- BLUPs (best linear unbiased predictors), which rely on the assumption that individuals come from the same population, and become weighted averages
  $\dfrac{\omega_B^2}{\omega_B^2 + \sigma_W^2/S}\,\bar y_{r\cdot} + \dfrac{\sigma_W^2/S}{\omega_B^2 + \sigma_W^2/S}\,\bar y_{\cdot\cdot}$,
  which have been shrunk towards the overall mean $\bar y_{\cdot\cdot}$.
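The two design formulas are two views of the same quantity: $\mathrm{Var}(\bar y) = (\omega_B^2 + \sigma_W^2)/k$. A minimal Python sketch, with hypothetical function names and an illustrative $\rho$, that also shows the diminishing return from adding spots when the number of rabbits is fixed:

```python
def var_grand_mean(omega2_b, sigma2_w, R, S):
    """Var(ybar) = omega2_b / R + sigma2_w / (R * S) for a balanced two-level design."""
    return omega2_b / R + sigma2_w / (R * S)

def effective_sample_size(R, S, rho):
    """Number of independent observations giving the same precision: RS / (1 + rho(S - 1))."""
    return R * S / (1 + rho * (S - 1))

# With R = 6 rabbits, extra spots help less and less (rho = 0.36 is illustrative).
for S in (1, 2, 6, 10):
    print(S, round(effective_sample_size(6, S, 0.36), 1))
```

However many spots are added, $k$ can never exceed $R/\rho$. Adding rabbits is the only way to drive $\mathrm{Var}(\bar y)$ below $\omega_B^2/R$.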

BLUPs vs. averages, shrinkage
[Figure: rabbit averages and BLUPs]
BLUPs are used for ranking, e.g. schools.

Quantification for reduced dataset
When the 3 smallest measurements from rabbit 2 (the rabbit with the largest level) are omitted, the results become (full dataset → reduced dataset):
- Forget rabbit: we have omitted some of the largest observations: $\hat\mu$ = 7.367 (0.155) → 7.291 (0.163)
- Random rabbit: rabbit 2 has a lower weight in the average due to a larger standard error: $\hat\mu$ = 7.367 (0.267) → 7.390 (0.298)

Quantification for reduced dataset, cont'd
- Unweighted rabbit averages: the average for rabbit 2 has increased: $\hat\mu$ = 7.367 (0.267) → 7.436 (0.333)
- Weighted rabbit averages: rabbit 2 has a lower weight in the average due to only 3 observations: $\hat\mu$ = 7.367 (0.267) → 7.291 (0.265)

BLUPs for the reduced data set
[Figure: rabbit averages and BLUPs, reduced data]
Larger shrinkage than before for the rabbit with reduced data.
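The shrinkage formula can be sketched in Python. It assumes the group-specific number of observations $n_r$ replaces $S$, which is why a rabbit with only 3 observations is shrunk harder. Following the slide, the sketch shrinks towards a supplied overall mean. Mixed-model software uses the estimated $\hat\mu$ instead, which coincides with $\bar y_{\cdot\cdot}$ in the balanced case. The function name and numbers are hypothetical.

```python
def blup(group_mean, n_obs, overall_mean, omega2_b, sigma2_w):
    """Shrink a group average towards the overall mean.

    Weight on the group's own average: omega2_b / (omega2_b + sigma2_w / n_obs).
    """
    w = omega2_b / (omega2_b + sigma2_w / n_obs)
    return w * group_mean + (1 - w) * overall_mean

# Illustrative: a rabbit averaging 9.0 when the overall mean is 7.0.
print(round(blup(9.0, 6, 7.0, 0.33, 0.58), 3))  # all 6 spots
print(round(blup(9.0, 3, 7.0, 0.33, 0.58), 3))  # only 3 spots: pulled further towards 7.0
```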

Confidence limits for the variance components... just as a warning
- Intra-individual variation $\sigma_W^2$: [confidence interval]
- Inter-individual variation $\omega_B^2$: confidence interval with upper limit 2.48
So we should take care not to over-interpret variance components for small datasets...

Now imagine that the rabbits are grouped in two (grp=1,2):

    proc mixed data=rabbit;
      class grp rabbit;
      model swelling = grp / s;
      random rabbit(grp);
    run;

[Output: Covariance Parameter Estimates: rabbit(grp) (this changes) and Residual (this stays the same); Solution for Fixed Effects: Intercept, grp 1, grp 2]

Such a comparison cannot be performed in the usual way (ignoring the rabbits), since we would then compare groups as if we had much more information than we actually have. Type I errors will occur!

    proc glm data=rabbit;
      class grp;
      model swelling=grp / solution;
    run;

[Output: parameter estimates for Intercept and grp, with t values and standard errors]

Two-level model

    level   unit                        variation                       covariates
    1       individual observations     within rabbit (rabbit*spot)     spot
    2       individuals/clusters        between rabbits (rabbit)        group, overall mean

Errors if the random rabbit variation is ignored:
- low efficiency (type II error) for evaluation of level 1 covariates (a systematic spot effect?);
- too small standard errors (type I error) for estimates of level 2 effects (group, overall mean).
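Why ignoring rabbits inflates type I errors for the group comparison can be made concrete. The sketch below assumes a hypothetical design with 3 rabbits per group and $S$ spots each, and illustrative variance components. It also assumes that the naive analysis effectively uses $\omega_B^2 + \sigma_W^2$ as the variance of every observation, which holds only approximately. The squared ratio of the two standard errors is the design effect $1 + \rho(S - 1)$. Function names are hypothetical.

```python
import math

def se_group_difference(omega2_b, sigma2_w, rabbits_per_group, S):
    """SE of the difference between two group means in the two-level model."""
    var_one_group = omega2_b / rabbits_per_group + sigma2_w / (rabbits_per_group * S)
    return math.sqrt(2 * var_one_group)

def se_group_difference_naive(omega2_b, sigma2_w, rabbits_per_group, S):
    """Same difference, treating all rabbits_per_group * S observations as independent."""
    n = rabbits_per_group * S
    return math.sqrt(2 * (omega2_b + sigma2_w) / n)

omega2_b, sigma2_w, S = 0.3, 0.6, 6  # illustrative values
correct = se_group_difference(omega2_b, sigma2_w, 3, S)
naive = se_group_difference_naive(omega2_b, sigma2_w, 3, S)
print(f"correct SE {correct:.3f}, naive SE {naive:.3f}")
```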

Factor diagrams
In the traditional one-way anova: [I] = [R*S] → [R] → 0
In case of grouping: [I] = [R*S] → [R] → G → 0
We have here used the notation:
- arrows, indicating simplifications/groupings;
- [ ] for the random effects, corresponding to variance components on the various levels.

Example of 3-level model
Number of nuclei per cell in the rat pancreas, used for the evaluation of cytostatic drugs (Henrik Winther Nielsen, Inst. Med. Anat.):
- 4 rats (R)
- 3 sections for each rat (S)
- 5 randomly chosen fields from each section (F)
Hierarchy: fields ($\sigma^2$) within sections ($\tau^2$) within rats ($\omega^2$).
Factor diagram: [I] = [R*S*F] → [R*S] → [R] → 0

Scatter plot, with jitter
[Figure: number of nuclei per rat; symbols indicate sections]

3-level model in SAS

    proc mixed data=nuclei;
      class rat section;
      model nuclei= / s;
      random rat section(rat);
    run;

[Output: Covariance Parameter Estimates (rat, section(rat), Residual); Solution for Fixed Effects (Intercept)]

Estimates of variance components
[Table: estimates of $\omega^2$ (rats), $\tau^2$ (sections), $\sigma^2$ (fields) and the total $\omega^2 + \tau^2 + \sigma^2$, with the proportion of variation for each]
Almost all variation is on the lowest level!

Typical differences between two measurements:
- for different fields on the same section: $\pm 2\sqrt{2\sigma^2} = \pm 1.255$
- for different sections on the same rat: $\pm 2\sqrt{2(\tau^2 + \sigma^2)} = \pm 1.264$
- for sections on different rats: $\pm 2\sqrt{2(\omega^2 + \tau^2 + \sigma^2)}$

Correlations vary, depending on which measurements are compared:
- measurements on the same section: $\mathrm{Corr}(y_{rs1}, y_{rs2}) = \dfrac{\omega^2 + \tau^2}{\omega^2 + \tau^2 + \sigma^2}$
- measurements on different sections of the same rat: $\mathrm{Corr}(y_{r11}, y_{r22}) = \dfrac{\omega^2}{\omega^2 + \tau^2 + \sigma^2}$
- measurements from different rats are independent.

Comparing measurement devices
Example: peak expiratory flow rate (l/min) for 17 subjects, 2 measurement devices (Wright and mini Wright), each measured twice (Bland and Altman, 1986).
[Table: subject id; Wright measurements $Y_{1p1}$, $Y_{1p2}$ and mini Wright measurements $Y_{2p1}$, $Y_{2p2}$; column averages and SDs]
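The three-level limits and correlations follow one pattern: a variance component contributes to a difference unless both measurements share it. A minimal Python sketch with hypothetical function names and illustrative component values:

```python
import math

def typical_differences(omega2, tau2, sigma2):
    """Approx. 95% limits (+/-) for the difference between two measurements."""
    return {
        "same section": 2 * math.sqrt(2 * sigma2),
        "same rat, different sections": 2 * math.sqrt(2 * (tau2 + sigma2)),
        "different rats": 2 * math.sqrt(2 * (omega2 + tau2 + sigma2)),
    }

def correlations(omega2, tau2, sigma2):
    """Correlation between two measurements, by how much of the hierarchy they share."""
    total = omega2 + tau2 + sigma2
    return {
        "same section": (omega2 + tau2) / total,
        "same rat, different sections": omega2 / total,
        "different rats": 0.0,
    }

print(typical_differences(0.25, 0.25, 0.5))
print(correlations(0.25, 0.25, 0.5))
```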

Illustration of all data
[Figure: all peak flow measurements, by subject and device]

Aim of investigation
- Quantify the precision of each measuring device: compare the two repetitions.
- Quantify the agreement between the two devices: compare individual measurements, or averages.
- Give practical advice for clinical use: can we trust the devices, and can we use them interchangeably?

Variance component model
Subjects $p = 1, \dots, 17$; methods $m = 1, 2$; repetitions $j = 1, 2$:
$Y_{pmj} = \beta_m + A_p + C_{pm} + \varepsilon_{pmj}$,
where $A_p \sim N(0, \omega^2)$, $C_{pm} \sim N(0, \tau^2)$, $\varepsilon_{pmj} \sim N(0, \sigma^2)$.
Note: the patients need not be random here... why?

Correlation structure
In the above model, the covariance matrix for each subject, with ordering (Wright1, Wright2, MiniWright1, MiniWright2), is
$$\begin{pmatrix}
\omega^2+\tau^2+\sigma^2 & \omega^2+\tau^2 & \omega^2 & \omega^2 \\
\omega^2+\tau^2 & \omega^2+\tau^2+\sigma^2 & \omega^2 & \omega^2 \\
\omega^2 & \omega^2 & \omega^2+\tau^2+\sigma^2 & \omega^2+\tau^2 \\
\omega^2 & \omega^2 & \omega^2+\tau^2 & \omega^2+\tau^2+\sigma^2
\end{pmatrix}$$
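The block pattern of this matrix can be generated rather than typed. The Python sketch below is an illustration with a hypothetical function name. Each entry is the sum of the variance components that two measurements share. Setting $\omega^2 = 0$ gives the block-diagonal form with 2 × 2 blocks $\begin{pmatrix}\tau^2+\sigma^2 & \tau^2\\ \tau^2 & \tau^2+\sigma^2\end{pmatrix}$ that applies when subjects are treated as systematic.

```python
def wright_covariance(omega2, tau2, sigma2):
    """4x4 covariance of (Wright1, Wright2, MiniWright1, MiniWright2) for one subject."""
    method = [0, 0, 1, 1]  # device used for each of the four measurements
    cov = []
    for i in range(4):
        row = []
        for j in range(4):
            if i == j:
                row.append(omega2 + tau2 + sigma2)  # variance of a single measurement
            elif method[i] == method[j]:
                row.append(omega2 + tau2)           # same device: shares A_p and C_pm
            else:
                row.append(omega2)                  # different devices: share only A_p
        cov.append(row)
    return cov

for row in wright_covariance(1.0, 2.0, 4.0):
    print(row)
```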

Correlation structure if subjects are considered systematic
For each subject*method combination, i.e. for the two repetitions, the covariance matrix is
$\begin{pmatrix} \tau^2 + \sigma^2 & \tau^2 \\ \tau^2 & \tau^2 + \sigma^2 \end{pmatrix}$

SAS programming

    proc mixed data=wright;
      class method id;
      model wr=method / s;
      random intercept method / subject=id;
    run;

or maybe, with subjects as systematic effects:

    proc mixed data=wright;
      class method id;
      model wr=id method / s;
      random id*method;
    run;

Output
[Output: Class Level Information (method: 2 levels, mini and wright; id); Covariance Parameter Estimates (Intercept and method, with Subject id; Residual); Fit Statistics (-2 Res Log Likelihood, AIC); Solution for Fixed Effects (Intercept, method mini, method wright)]

Estimates
Variance components: the estimates of $\omega^2$, $\tau^2$ and $\sigma^2$ are read off the Covariance Parameter Estimates.
Systematic difference between measuring devices: $\hat\beta_1 - \hat\beta_2 = 6.03$ (8.05), P = 0.46.
How can we use these?

Precision of the methods
The precisions are assumed identical for the two devices.
Difference between double measurements (identical repetitions):
$D_{pm} = Y_{pmj_1} - Y_{pmj_2} = \varepsilon_{pmj_1} - \varepsilon_{pmj_2} \sim N(0, 2\sigma^2)$
Limits of agreement: $\pm 2\sqrt{2\sigma^2} = \pm 50.23$

Agreement between the two methods
Difference between single measurements by the two methods:
$D_p = Y_{p1j_1} - Y_{p2j_2} = \beta_1 - \beta_2 + C_{p1} - C_{p2} + \varepsilon_{p1j_1} - \varepsilon_{p2j_2} \sim N(\beta_1 - \beta_2,\ 2\tau^2 + 2\sigma^2)$
Limits of agreement: $\pm 2\sqrt{2(\tau^2 + \sigma^2)} = \pm 75.31$
(where we have ignored the nonsignificant systematic difference between the two devices; otherwise add 6.03).

Agreement between averages
This can be obtained by direct calculation of the SD of the difference between averages, or from
$D_p = \bar Y_{p1\cdot} - \bar Y_{p2\cdot} = \beta_1 - \beta_2 + C_{p1} - C_{p2} + \bar\varepsilon_{p1\cdot} - \bar\varepsilon_{p2\cdot} \sim N(\beta_1 - \beta_2,\ 2\tau^2 + \sigma^2)$
Limits of agreement: $\pm 2\sqrt{2\tau^2 + \sigma^2} = \pm 66.41$
This is only reasonable if averages are the standard for clinical use!

Difference in precision??
New model (with systematic subject effects):
$Y_{pmj} = \mu + \beta_m + \alpha_p + C_{pm} + \varepsilon_{pmj}$, with $C_{pm} \sim N(0, \tau^2)$ and $\varepsilon_{pmj} \sim N(0, \sigma_m^2)$

    proc mixed data=wright;
      class method id;
      model wr=id method / ddfm=satterth s;
      random id*method;
      repeated / group=method type=simple subject=id*method;
    run;
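The three limits of agreement (±50.23, ±75.31, ±66.41) are mutually consistent. This minimal Python check assumes that the first two were computed exactly as $\pm 2\sqrt{2\sigma^2}$ and $\pm 2\sqrt{2(\tau^2 + \sigma^2)}$. Back-calculating $\sigma^2$ and $\tau^2$ from them reproduces ±66.41 for averages of two repetitions, up to rounding. The back-calculated values are only as precise as the rounded limits. Function names are hypothetical.

```python
import math

def loa_repeatability(sigma2):
    """Limits for two repetitions on the same device: +/-2 sqrt(2 sigma^2)."""
    return 2 * math.sqrt(2 * sigma2)

def loa_single(tau2, sigma2):
    """Limits for single measurements on the two devices: +/-2 sqrt(2(tau^2 + sigma^2))."""
    return 2 * math.sqrt(2 * (tau2 + sigma2))

def loa_averages_of_two(tau2, sigma2):
    """Limits for averages of two repetitions per device: +/-2 sqrt(2 tau^2 + sigma^2)."""
    return 2 * math.sqrt(2 * tau2 + sigma2)

# Back-calculated from the reported limits (an assumption about how they were computed).
sigma2 = (50.23 / 2) ** 2 / 2
tau2 = (75.31 / 2) ** 2 / 2 - sigma2

print(round(loa_averages_of_two(tau2, sigma2), 2))  # 66.41
```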

Output, systematic subject effect
[Output: Covariance Parameter Estimates (method*id; Residual for group method mini; Residual for group method wright); Fit Statistics (-2 Res Log Likelihood, AIC); Solution for Fixed Effects (Intercept, id, method mini, method wright); Type 3 Tests of Fixed Effects (id: P < .0001; method)]

Results
Precisions: $\hat\sigma_1^2$ (Wright) and $\hat\sigma_2^2$ (mini Wright), the two group-specific Residual estimates.
Conclusion: Wright is better than mini Wright, but is it significantly better?
$F = \hat\sigma_2^2 / \hat\sigma_1^2 = 1.69 \sim F(17, 17)$, P = 0.14. No...
Alternative test: $-2\log Q = 1.2 \sim \chi^2(1)$.

Dubious/incorrect Bland-Altman approaches
- Calculate agreement between averages: we have seen that these estimate $2\tau^2 + \sigma^2$ instead of $2\tau^2 + 2\sigma^2$. In general, with $k$ repetitions: $2\tau^2 + \frac{2}{k}\sigma^2$.
- Calculate all possible differences between pairs:
  $D_{pj} = Y_{p1j_1} - Y_{p2j_2} = \beta_1 - \beta_2 + C_{p1} - C_{p2} + \varepsilon_{p1j_1} - \varepsilon_{p2j_2}$
  These have the correct variance, but they are correlated due to the $C$'s (the same individual has several differences), and can give erroneous results.

Measurements taken in pairs
Here the devices react to subject characteristics that vary, e.g. over time ($E_{pt}$):
$Y_{pmt} = \beta_m + A_p + C_{pm} + E_{pt} + \varepsilon_{pmt}$
Estimating the precision becomes impossible, since we have no true replications.
Agreement: $D_{pt} = Y_{p1t} - Y_{p2t} = \beta_1 - \beta_2 + C_{p1} - C_{p2} + \varepsilon_{p1t} - \varepsilon_{p2t}$, but these differences will again be correlated due to the $C$'s.
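How much the averaging approach understates the variance for single measurements is a one-line function of $k$. The Python sketch below uses illustrative values of $\tau^2$ and $\sigma^2$ and a hypothetical function name. With $k = 1$ it gives the correct $2\tau^2 + 2\sigma^2$. As $k$ grows, the measurement-error part vanishes and only $2\tau^2$ remains.

```python
def var_diff_of_averages(tau2, sigma2, k):
    """Variance of the difference between two device averages of k repetitions each."""
    return 2 * tau2 + 2 * sigma2 / k

tau2, sigma2 = 1.0, 2.0  # illustrative values
correct = var_diff_of_averages(tau2, sigma2, 1)  # single measurements: 2 tau^2 + 2 sigma^2
for k in (1, 2, 4, 10):
    v = var_diff_of_averages(tau2, sigma2, k)
    print(f"k={k}: variance {v:.2f}, fraction of single-measurement variance {v / correct:.2f}")
```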

Example: Visual acuity
7 individuals are looking at a screen where a light flash appears. They are looking through 4 lenses, with powers 6/6, 6/18, 6/36 and 6/60, i.e. 4 magnifications: 1, 3, 6 and 10, using each of their 2 eyes.
Outcome: visual acuity, the time lag (milliseconds) between the stimulus and the electrical response at the back of the cortex.
Crowder & Hand (1990)

Data
[Table of the visual acuity measurements]

Factors to take into account
Main effects:
- 7 individuals (person)
- 2 eyes for each individual (eye)
- 4 lens magnifications (power)
Interactions?
- person*eye
- person*power
- eye*power
- second-order interaction person*eye*power = residual

Model ingredients
Outcome: visual acuity
Effects:
- Systematic: mean value $\mu_{em}$: eye $\alpha_e$, power $\beta_m$, eye*power $\gamma_{em}$
- Random effects: patient $A_p$, patient*eye $B_{pe}$, patient*power $C_{pm}$
- Residual: patient*eye*power $\varepsilon_{pem}$

Model formulation
$p = 1, \dots, 7$, $e = 1, 2$, $m = 1, 2, 3, 4$:
$Y_{pem} = \mu_{em} + A_p + B_{pe} + C_{pm} + \varepsilon_{pem}$,
where $A_p \sim N(0, \omega^2)$, $B_{pe} \sim N(0, \tau_e^2)$, $C_{pm} \sim N(0, \tau_m^2)$, $\varepsilon_{pem} \sim N(0, \sigma^2)$.

Factor diagram
[I] = [Pa*Ey*Po] → [Pa*Ey], Ey*Po, [Pa*Po]
[Pa*Ey] → [Pa], Ey;  Ey*Po → Ey, Po;  [Pa*Po] → [Pa], Po
[Pa], Ey, Po → 0

Not quite a multilevel model, but still a variance component model:

    level   unit                        covariates
    1       single measurements         Ey*Po
    2e      interactions [Pa*Ey]        Ey
    2m      interactions [Pa*Po]        Po
    3       individuals [Pa]            overall level


SAS code, and output

    ods graphics on;
    proc mixed plots=all data=visual covtest;  /* plots=all assumed */
      class patient eye power;
      model acuity=eye power eye*power / outpm=udpm outp=udp s residual influence;
      random intercept eye power / subject=patient;
    run;
    ods graphics off;

[Output: Covariance Parameter Estimates with standard errors and Z tests (Intercept, eye and power, each with Subject patient; Residual); Solution for Fixed Effects (Intercept, eye left/right, power levels, eye*power combinations); Type 3 Tests of Fixed Effects (eye, power, eye*power)]

Predicted mean profiles
[Figure]

Individual predictions
[Figure]


Model checks
The model was:
$Y_{pem} = \mu_{em} + A_p + B_{pe} + C_{pm} + \varepsilon_{pem}$,
where $A_p \sim N(0, \omega^2)$, $B_{pe} \sim N(0, \tau_e^2)$, $C_{pm} \sim N(0, \tau_m^2)$, $\varepsilon_{pem} \sim N(0, \sigma^2)$.
Two types of residuals:
- ordinary residuals: $Y_{pem} - \hat\mu_{em}$
- conditional residuals: $\hat\varepsilon_{pem}$

Ordinary residual plot
[Figure]

Conditional residual plot
[Figure]

Influence diagnostics
[Figure]

Omit the interaction eye*power
[Output: Covariance Parameter Estimates with standard errors and Z tests (Intercept, eye and power, each with Subject patient; Residual); Solution for Fixed Effects (Intercept, eye left/right, power levels); Type 3 Tests of Fixed Effects (eye, power)]

Eye comparisons
Difference between eye averages:
$\bar Y_{\cdot e_1 \cdot} - \bar Y_{\cdot e_2 \cdot} = \text{(systematic part)} + \bar B_{\cdot e_1} - \bar B_{\cdot e_2} + \bar\varepsilon_{\cdot e_1 \cdot} - \bar\varepsilon_{\cdot e_2 \cdot}$
$\mathrm{Var}(\bar Y_{\cdot e_1 \cdot} - \bar Y_{\cdot e_2 \cdot}) = \dfrac{2}{7}\left(\tau_e^2 + \dfrac{\sigma^2}{4}\right)$
$\hat\tau_e^2$ is rather large (people have different eye preferences), but we have four measurements on each eye. Still, the difference has to be larger in order for us to detect it.

Magnification comparisons
Difference between magnification averages:
$\bar Y_{\cdot\cdot m_1} - \bar Y_{\cdot\cdot m_2} = \text{(systematic part)} + \bar C_{\cdot m_1} - \bar C_{\cdot m_2} + \bar\varepsilon_{\cdot\cdot m_1} - \bar\varepsilon_{\cdot\cdot m_2}$
$\mathrm{Var}(\bar Y_{\cdot\cdot m_1} - \bar Y_{\cdot\cdot m_2}) = \dfrac{2}{7}\left(\tau_m^2 + \dfrac{\sigma^2}{2}\right)$
$\hat\tau_m^2 = 3.97$ is not that large (people react more or less identically to the different magnifications), and we have only two measurements for each magnification. So we can detect smaller differences than for eye comparisons.

If we ignore correlations
If we use a model with no random effects, we get:
[Output: Covariance Parameter Estimates (Residual only); Solution for Fixed Effects (Intercept, eye, power levels); Type 3 Tests of Fixed Effects (eye, power) with F and chi-square tests]
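The two variance formulas differ only in which random interaction survives the averaging and in how many residuals are averaged: $M = 4$ powers per eye average, $E = 2$ eyes per magnification average. The Python sketch below writes them for general $P$, $E$, $M$, using hypothetical function names and illustrative inputs. Setting $\tau_e^2 = \tau_m^2 = 0$ gives $\sigma^2/14$ and $\sigma^2/7$, the variances a model without random effects would use.

```python
def var_eye_difference(tau2_e, sigma2, P=7, M=4):
    """Var of the difference between two eye averages (over P patients and M powers)."""
    return 2 * tau2_e / P + 2 * sigma2 / (P * M)

def var_power_difference(tau2_m, sigma2, P=7, E=2):
    """Var of the difference between two magnification averages (over P patients and E eyes)."""
    return 2 * tau2_m / P + 2 * sigma2 / (P * E)

# Ignoring the random interactions (tau^2 = 0) leaves only sigma^2/14 and sigma^2/7.
print(var_eye_difference(0.0, 14.0), var_power_difference(0.0, 7.0))
```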

Comparisons in wrong model
New $\hat\sigma^2$; here $\tau_e^2 = \tau_m^2 = 0$:
- Eye differences: $\mathrm{Var}(\bar Y_{\cdot e_1 \cdot} - \bar Y_{\cdot e_2 \cdot}) = \dfrac{\sigma^2}{14}$
- Magnification differences: $\mathrm{Var}(\bar Y_{\cdot\cdot m_1} - \bar Y_{\cdot\cdot m_2}) = \dfrac{\sigma^2}{7}$

Systematic vs. random effects
Could the patients be treated as systematic here? Yes:
[Output: Covariance Parameter Estimates (eye and power, each with Subject patient; Residual); Type 3 Tests of Fixed Effects (patient, eye, power, eye*power)]
Can you think why?
