Faculty of Health Sciences. Correlated data. Variance component models. Lene Theil Skovgaard & Julie Lyng Forman.


1 Correlated data: Variance component models
Lene Theil Skovgaard & Julie Lyng Forman, Faculty of Health Sciences
November 28

2 Overview
- One-way ANOVA with random variation: the rabbit example
- Hierarchical models with several levels
- Crossed random effects (interactions): the visual acuity example
- Random regression
Home pages: RepeatedMeasures2017.html

3 Terminology for correlated measurements
- Cluster/multilevel design: the same outcome (response) measured on all individuals in a number of families, villages or school classes.
- Repeated measurements: the same outcome (response) measured in different situations (or at different spots) on the same individual.
- Longitudinal measurements: the same outcome (response) measured consecutively over time for each individual.
- Multivariate outcome: several outcomes (responses) for each individual, e.g. a number of hormone measurements that we want to study simultaneously.

4 Variance component models
Models involving several sources of random variation:
- geographical/environmental: variation between regions, hospitals, schools or countries
- biological: variation between individuals, families or animals
- within-individual: variation between arms, teeth, injection sites, days
- uncontrollable circumstances: time of day, temperature, observer
- measurement error
Of course, such models may also include fixed effects, such as treatment, gender etc.

5 Example: Swelling due to vaccine
Research question: How much swelling can be expected in relation to a vaccination?
Experiment: 6 rabbits, each vaccinated at 6 (randomly?) selected spots on the back.
Outcome $y_{rs}$: swelling in cm², where $r = 1, \dots, R = 6$ denotes the rabbit and $s = 1, \dots, S = 6$ denotes the spot.
We have observed a total of 36 swelling areas, but we must expect swelling to be specific to the individual rabbit.

6 Scatter plot
[Figure: swelling for each rabbit; x-axis: arbitrary numbering of rabbits]

7 Naive quantification of swelling
[PROC MEANS output for swelling: N, mean, standard error, and lower and upper 95% confidence limits for the mean]
What is wrong here? Imagine that all measurements on a rabbit gave the same value. Then we would effectively have only 6 measurements, and the SEM computed from all 36 would be badly wrong. So what happens when the measurements on a rabbit are only somewhat alike?

8 Correlated observations
Observations on the same individual look alike; they are correlated. Why is this important?
- The variation between them does not reflect the population variation (the variation between individuals).
- The number of observations will seem misleadingly high.
So we have to take the correlation into account.

9 Neglecting correlation leads to errors
Typical errors:
- wrong standard errors (too small or too big)
- wrong confidence intervals (too narrow or too wide)
- wrong conclusions (type I or type II errors)
The type of error depends upon the kind of question asked (to be explained further).

10 Naive analysis of swelling
- Each rabbit has a mean level.
- There is some variation between the six injection sites on the same rabbit.
In computer language: rabbit is a factor, and the analysis is a one-way ANOVA:
    proc glm data=rabbit;
      class rabbit;
      model swelling=rabbit / solution;
    run;

11 Output from naive model
[PROC GLM output for swelling: ANOVA table (Model, Error, Corrected Total), R-square, coefficient of variation, root MSE, mean of swelling, and Type III test for rabbit]
The rabbits have different levels (P = 0.0040), but this was NOT the question.

12 Output from the naive model, II
[Parameter estimates with standard errors and t tests for the intercept (P < .0001) and for each rabbit, each marked B, i.e. not uniquely estimable]
But do we get any useful information from this? We are not interested in these particular 6 rabbits, only in rabbits in general, as a species. We assume these 6 rabbits to have been randomly selected from the species.

13 Variance component model
Instead of fixed level parameters for each rabbit, we model the differences between rabbits as an extra source of variation:
$$y_{rs} = \mu + a_r + \varepsilon_{rs},$$
where the $a_r$'s and the $\varepsilon_{rs}$'s are assumed to be independent and normally distributed, with variances $\operatorname{Var}(a_r) = \omega_B^2$ and $\operatorname{Var}(\varepsilon_{rs}) = \sigma_W^2$.
The variation between rabbits is now a random effect (random factor), $\omega_B^2$ and $\sigma_W^2$ are called variance components, and the model is also called a two-level model.

14 Formulation in terms of correlation
All swelling observations have the same mean and variance:
$$y_{rs} \sim N(\mu, \omega_B^2 + \sigma_W^2).$$
But measurements made on the same rabbit are correlated, with the intra-class correlation
$$\operatorname{Corr}(y_{r1}, y_{r2}) = \rho = \frac{\omega_B^2}{\omega_B^2 + \sigma_W^2}.$$
Measurements made on the same rabbit tend to look more alike than measurements made on different rabbits, and all measurements on the same rabbit look equally much alike. This correlation structure is called compound symmetry (CS) or exchangeability.

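15 Covariance and correlation
For the six injection sites, the covariance matrix for each rabbit is
$$\begin{pmatrix}
\omega_B^2+\sigma_W^2 & \omega_B^2 & \omega_B^2 & \omega_B^2 & \omega_B^2 & \omega_B^2 \\
\omega_B^2 & \omega_B^2+\sigma_W^2 & \omega_B^2 & \omega_B^2 & \omega_B^2 & \omega_B^2 \\
\omega_B^2 & \omega_B^2 & \omega_B^2+\sigma_W^2 & \omega_B^2 & \omega_B^2 & \omega_B^2 \\
\omega_B^2 & \omega_B^2 & \omega_B^2 & \omega_B^2+\sigma_W^2 & \omega_B^2 & \omega_B^2 \\
\omega_B^2 & \omega_B^2 & \omega_B^2 & \omega_B^2 & \omega_B^2+\sigma_W^2 & \omega_B^2 \\
\omega_B^2 & \omega_B^2 & \omega_B^2 & \omega_B^2 & \omega_B^2 & \omega_B^2+\sigma_W^2
\end{pmatrix}$$
and the corresponding compound symmetry correlation matrix is
$$\begin{pmatrix}
1 & \rho & \rho & \rho & \rho & \rho \\
\rho & 1 & \rho & \rho & \rho & \rho \\
\rho & \rho & 1 & \rho & \rho & \rho \\
\rho & \rho & \rho & 1 & \rho & \rho \\
\rho & \rho & \rho & \rho & 1 & \rho \\
\rho & \rho & \rho & \rho & \rho & 1
\end{pmatrix}$$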
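As a small numerical check of this structure, the Python sketch below (assuming NumPy is available) builds the covariance matrix for one rabbit, with $\omega_B^2$ in every cell plus $\sigma_W^2$ on the diagonal, and converts it to a correlation matrix. The function names and the variance-component values are hypothetical, chosen only for illustration.

```python
import numpy as np


def cs_covariance(omega2_b: float, sigma2_w: float, s: int) -> np.ndarray:
    """Covariance matrix of the S measurements on one rabbit under
    y_rs = mu + a_r + e_rs: omega2_b in every cell, plus sigma2_w on the diagonal."""
    return omega2_b * np.ones((s, s)) + sigma2_w * np.eye(s)


def cov_to_corr(v: np.ndarray) -> np.ndarray:
    """Divide each covariance by the product of the two standard deviations."""
    sd = np.sqrt(np.diag(v))
    return v / np.outer(sd, sd)


# Hypothetical variance components, for illustration only
omega2_b, sigma2_w = 1.0, 3.0
v = cs_covariance(omega2_b, sigma2_w, s=6)
r = cov_to_corr(v)
rho = omega2_b / (omega2_b + sigma2_w)  # intra-class correlation, here 0.25

off_diagonal = r[~np.eye(6, dtype=bool)]
print(np.allclose(off_diagonal, rho))  # True: every pair of spots has correlation rho
```

Whatever values are chosen, the off-diagonal correlations are all equal to $\rho$, which is exactly the exchangeability property.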

16 Exchangeability = Compound Symmetry
This covariance/correlation structure implies:
- All variances are equal: the variation between rabbits should be the same for all injection sites.
- All pairs of measurements are equally correlated: all injection sites should be equally related to each other.
How could these assumptions be violated? Are the injection sites really randomly selected? If not, some injection sites may be more closely related than others (e.g. due to proximity), and an unstructured covariance may be more appropriate.

17 Estimation in SAS
    proc mixed data=rabbit;
      class rabbit;
      model swelling = / ddfm=kr s cl;
      random rabbit;
    run;
[Output: covariance parameter estimates for rabbit and Residual; solution for fixed effects for the intercept (estimate, standard error, DF, t value, lower and upper 95% confidence limits)]
Comparison to p. 7 reveals that correctly taking the correlation into account yields the same estimate, but a substantially wider confidence interval. Ignoring the correlation makes the confidence interval too narrow, which leads to a risk of type I error.
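For a balanced design such as this one (every rabbit has the same number of spots), the REML estimates of the variance components coincide, when they are positive, with the classical ANOVA (method-of-moments) estimates computed from the mean squares of the naive one-way ANOVA on p. 11. The Python sketch below (assuming NumPy; the function name is hypothetical) computes these moment estimates from an R × S data array, using $E(\mathrm{MS}_{\text{between}}) = \sigma_W^2 + S\,\omega_B^2$ and $E(\mathrm{MS}_{\text{within}}) = \sigma_W^2$, and setting a negative estimate of $\omega_B^2$ to zero.

```python
import numpy as np


def anova_variance_components(y) -> tuple:
    """Moment (ANOVA) estimates (omega2_b, sigma2_w) for a balanced one-way
    random-effects layout; y has shape (R, S): R rabbits, S spots each."""
    y = np.asarray(y, dtype=float)
    r, s = y.shape
    rabbit_means = y.mean(axis=1)
    grand_mean = y.mean()
    ms_between = s * np.sum((rabbit_means - grand_mean) ** 2) / (r - 1)
    ms_within = np.sum((y - rabbit_means[:, None]) ** 2) / (r * (s - 1))
    # E(MS_between) = sigma2_w + S * omega2_b and E(MS_within) = sigma2_w
    omega2_b = max((ms_between - ms_within) / s, 0.0)  # negative estimate set to zero
    return float(omega2_b), float(ms_within)
```

This is only a moment-based check for balanced data; it does not carry over directly to unbalanced data such as the reduced data set on p. 22, where PROC MIXED is the tool to use.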

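18 Interpretation of variance components

| Variance component | Estimate | Proportion of variation |
|---|---|---|
| Between rabbits ($\omega_B^2$) | … | …% |
| Within rabbits ($\sigma_W^2$) | … | …% |
| Total ($\omega_B^2 + \sigma_W^2$) | … | 100% |

Typical differences (95% prediction intervals for the difference between two measurements):
- for spots on the same rabbit: $\pm 2\sqrt{2\sigma_W^2} = \pm 2.16$ cm²
- for spots on different rabbits: $\pm 2\sqrt{2(\omega_B^2 + \sigma_W^2)} = \pm 2.70$ cm²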
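The two prediction limits can be turned back into approximate variance components. The Python sketch below assumes that each limit was computed as $\pm 2\sqrt{2 \cdot \text{variance}}$, with variance $\sigma_W^2$ for spots on the same rabbit and $\omega_B^2 + \sigma_W^2$ for spots on different rabbits. Since the limits are rounded to two decimals, the results are approximations, not the estimates from the SAS output; the function name is hypothetical.

```python
def components_from_limits(limit_same: float, limit_different: float) -> tuple:
    """Invert limits of the form +/- 2*sqrt(2*variance):
    limit_same uses sigma2_w, limit_different uses omega2_b + sigma2_w."""
    sigma2_w = (limit_same / 2) ** 2 / 2
    total = (limit_different / 2) ** 2 / 2
    return total - sigma2_w, sigma2_w


omega2_b, sigma2_w = components_from_limits(2.16, 2.70)  # limits in cm^2 from this page
total = omega2_b + sigma2_w
print(f"omega2_B ~ {omega2_b:.3f}, sigma2_W ~ {sigma2_w:.3f}")
print(f"within-rabbit share ~ {sigma2_w / total:.0%}, rho ~ {omega2_b / total:.2f}")
```

With these inputs, the within-rabbit share of the total variance comes out at about 64%, in line with the "approximately 2/3" on p. 19, and the implied intra-class correlation is about 0.36.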

19 Interpretation of variance components, cont'd
Approximately 2/3 of the variation in the measurements comes from the variation within rabbits, i.e. between injection sites on the same rabbit. Why? Could there be a systematic difference between the injection sites?
[Output from the model with spot added as a fixed effect: covariance parameter estimates for rabbit and Residual; Type 3 test of spot]
This does not seem to be the case (P = 0.26).

20 Design considerations: precision of the overall mean
[Figures: standard error as a function of R = number of rabbits (from 3 to 20), and as a function of S = number of spots (from 1 to 10)]
The standard error is the square root of
$$\operatorname{Var}(\bar y) = \frac{\omega_B^2}{R} + \frac{\sigma_W^2}{RS}.$$
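The design formula can be tabulated directly. The Python sketch below evaluates the standard error of the overall mean for a few combinations of R and S. The variance components 0.33 and 0.58 are assumed values, roughly those implied by the prediction limits on p. 18, and the function name is hypothetical.

```python
import math


def se_overall_mean(omega2_b: float, sigma2_w: float, r: int, s: int) -> float:
    """Standard error of the overall mean with R rabbits and S spots per rabbit."""
    return math.sqrt(omega2_b / r + sigma2_w / (r * s))


# Assumed variance components (roughly those implied by p. 18)
omega2_b, sigma2_w = 0.33, 0.58
spots = (1, 2, 6, 10)
print("R \\ S", spots)
for r in (3, 6, 10, 20):
    row = [se_overall_mean(omega2_b, sigma2_w, r, s) for s in spots]
    print(r, " ".join(f"{x:.3f}" for x in row))
```

Because the term $\omega_B^2/R$ does not shrink when S grows, adding more spots per rabbit soon stops paying off, whereas adding rabbits always helps. For R = S = 6, these assumed values give a standard error of about 0.27.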

21 Effective sample size
If we had only one observation for each of k rabbits, how many rabbits would we then need to obtain the same precision?
$$k = \frac{RS}{1 + \rho(S-1)}, \qquad \rho = \frac{\omega_B^2}{\omega_B^2 + \sigma_W^2}.$$
Inserting R = 6, S = 6 and the estimated ρ yields k = 12.8. Effectively, we have only approximately two independent observations from each rabbit!
Take care: this is a pilot study, so do not rely too heavily on the results.
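The effective sample size formula and its inverse are one-liners. In the Python sketch below (function names hypothetical), the two extreme cases reproduce the reasoning on p. 7: with $\rho = 0$ all 36 observations count fully, and with $\rho = 1$ only the 6 rabbits count. Inverting the formula with k = 12.8 shows which intra-class correlation that value corresponds to. The denominator $1 + \rho(S-1)$ is often called the design effect.

```python
def effective_sample_size(r: int, s: int, rho: float) -> float:
    """k = R*S / (1 + rho*(S - 1)): number of single, independent observations
    giving the same precision as R clusters of S exchangeable observations."""
    return r * s / (1 + rho * (s - 1))


def rho_from_effective_sample_size(r: int, s: int, k: float) -> float:
    """Solve k = R*S / (1 + rho*(S - 1)) for rho."""
    return (r * s / k - 1) / (s - 1)


print(effective_sample_size(6, 6, 0.0))  # 36.0: independent observations
print(effective_sample_size(6, 6, 1.0))  # 6.0: identical observations within each rabbit
print(round(rho_from_effective_sample_size(6, 6, 12.8), 4))  # 0.3625
```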

22 Reduced data set: omit 3 observations
What kind of effects would we expect?

23 Quantification of overall swelling
The right-hand column corresponds to the reduced data set, where the 3 smallest measurements from rabbit 2 (the rabbit with the highest level) are omitted.

| Method | All 36 data: Estimate (SE) | Omitting 3 observations: Estimate (SE) |
|---|---|---|
| Simple average of all (p. 7) | … (0.155) | … (0.163) |
| Average of averages | … (0.267) | … (0.333) |
| Weighted average of averages | | … (0.265) |
| Variance component model (p. 17) | … (0.267) | … (0.298) |

24 Comments on the quantifications in the table on p. 23
Simple averages: Pool all 36 measurements, wrongly assuming independence. This results in too small standard errors. In the reduced data set, the estimate is biased downwards, since the omitted observations come from the rabbit with the highest level.
Average of averages: Start by taking the average for each rabbit, and then average these. This is fine for balanced designs, but when we omit the three lowest observations for rabbit 2, this rabbit appears to have an even higher level, which gives an upwards bias.

25 Comments, cont'd
Weighted average of averages: As above, but weighted according to the number of observations. For balanced designs, all weights are equal, but when we omit three observations, rabbit 2 gets a lower weight in the average because it has only 3 observations. This results in a downwards bias, because rabbit 2 has a high level.
Random rabbit: The variance component model yields the correct result, provided that observations are missing at random. In the reduced data set, rabbit 2 gets a lower weight in the average because its mean has a larger standard error.
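The weighting described for the random rabbit model can be sketched as follows. Assuming the variance components are known, the estimate of $\mu$ is a weighted average of the rabbit means with weights $1/(\omega_B^2 + \sigma_W^2/n_r)$, where $n_r$ is the number of observations on rabbit r; PROC MIXED plugs in estimated variance components instead. The Python function and the toy numbers below are hypothetical illustrations of that formula, not the SAS implementation and not the rabbit data.

```python
def random_effects_mean(cluster_means, cluster_sizes, omega2_b, sigma2_w):
    """Weighted average of cluster means with weights 1 / Var(cluster mean),
    where Var(cluster mean) = omega2_b + sigma2_w / n_r (variance components assumed known)."""
    weights = [1.0 / (omega2_b + sigma2_w / n) for n in cluster_sizes]
    return sum(w * m for w, m in zip(weights, cluster_means)) / sum(weights)


# Hypothetical toy example: three rabbits, the second with only 3 spots left
means = [4.0, 6.0, 5.0]
sizes = [6, 3, 6]
print(random_effects_mean(means, sizes, omega2_b=0.33, sigma2_w=0.58))
```

The two extremes connect the rows of the table on p. 23: with $\omega_B^2 = 0$ the weights are proportional to $n_r$, which reproduces the simple average of all observations (and the weighted average of averages), and when $\sigma_W^2$ is negligible compared to $\omega_B^2$ the weights are equal, which gives the average of averages.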

26 Estimation of individual rabbit means...?
Two different approaches:
- Traditional averages $\bar y_{r\cdot}$.
- BLUPs (best linear unbiased predictors), which rely on the assumption that the individuals come from the same population. They are weighted averages, shrunk towards the overall mean:
$$k\,\bar y_{r\cdot} + (1-k)\,\bar y_{\cdot\cdot}, \qquad k = \frac{\omega_B^2}{\omega_B^2 + \sigma_W^2/S}.$$
k is close to 0 when $\sigma_W^2/S$ is large compared to $\omega_B^2$, and closer to 1 otherwise. There is more shrinkage if the rabbits look alike. BLUPs are used e.g. for ranking schools.
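The shrinkage formula can be sketched in Python as below, with the number of spots $n_r$ on the rabbit in place of S, so that it also covers the reduced data set. The function names and the variance components (0.33 and 0.58, roughly those implied by p. 18) are assumptions for illustration. The sketch shrinks towards the overall average $\bar y_{\cdot\cdot}$ as written above; in unbalanced data, the model shrinks towards its estimate of $\mu$ instead.

```python
def shrinkage_factor(omega2_b: float, sigma2_w: float, n_r: int) -> float:
    """k = omega2_b / (omega2_b + sigma2_w / n_r): weight on the rabbit's own average."""
    return omega2_b / (omega2_b + sigma2_w / n_r)


def blup_level(ybar_r: float, ybar_all: float, omega2_b: float, sigma2_w: float, n_r: int) -> float:
    """Shrunken estimate k * ybar_r + (1 - k) * ybar_all."""
    k = shrinkage_factor(omega2_b, sigma2_w, n_r)
    return k * ybar_r + (1 - k) * ybar_all


# Assumed variance components, roughly those implied by p. 18
print(round(shrinkage_factor(0.33, 0.58, 6), 2))  # 0.77 with all 6 spots
print(round(shrinkage_factor(0.33, 0.58, 3), 2))  # 0.63 with only 3 spots
```

The smaller factor for 3 spots means more shrinkage, as seen for rabbit 2 in the reduced data set on p. 27.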

27 BLUPs vs. averages, shrinkage
Left panel: the full data set. Right panel: the reduced data set.
There is larger shrinkage for rabbit no. 2 in the reduced data set.

28 Hierarchical designs, cluster designs
e.g. school, school class and pupil.
Factor diagram: [I] = [S*C*P] → [S*C] → [S]

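29 Hierarchical designs, with covariates
Factor diagram: [S*C*P] → [S*C] → [S], with covariates at each level: gender (pupil level, [S*C*P]), class grade (class level, [S*C]) and school type (school level, [S]).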

30 Examples of hierarchies

| level 1 | level 2 | level 3 |
|---|---|---|
| subjects | twin pairs | countries |
| subjects | families | regions |
| students | classes | schools |
| spots | rabbits | |
| fields | sections | rats |
| visits | subjects | centres |

Measurements belonging together in the same cluster look alike (are correlated). On all levels, we may have random variation (variance components), as well as covariates.

31 Merits of cluster designs
- Certain effects may be estimated more precisely, since some sources of variation are eliminated, e.g. by making comparisons within a family or a school class. This is analogous to the paired comparison situation.
- When planning subsequent investigations, knowledge of the relative sizes of the variance components helps in deciding the number of repetitions needed at each level.

32 Drawbacks of cluster designs
Errors may result if one or more sources of variation are disregarded:
- low efficiency (type II error) for the evaluation of level 1 covariates (within-cluster effects)
- too small standard errors (type I error) for estimates of level 2 effects (between-cluster effects)
- possible bias in the mean value structure, in case of missing values
For longitudinal data, we saw that ignoring correlations implies that:
- time will appear less important
- groups will appear more different

33 Level 1 covariates (unit: single observations)
- time itself
- covariates varying with time: blood pressure, heart rate, age
- interaction between group and time
If the correlation is not taken into account, we ignore the paired situation, leading to low efficiency, i.e. too large P-values (type II error). Effects may go undetected!

34 Level 2 covariates (unit: individuals)
- treatment
- gender, age
If the correlation is ignored, we act as if we have more information than we actually have, leading to too small P-values (type I error). Noise may be taken to be real effects!

35 A school example
Models for such data include 3 sources of variation:
1. variation between schools ([S])
2. variation between classes in each school ([S*C])
3. variation between pupils in each class ([S*C*P], residual variation)
What may happen if we forget the variation between classes in the same school, [S*C]?
- Pupils in the same class will be assumed no more correlated than pupils from different classes (in the same school).
- Covariates on class level (e.g. class grade) will appear too important.
- Covariates on pupil level (e.g. gender) will appear less important.
We will return to this example when we discuss binary data.

36 Another example of a 3-level model
Research problem: In order to evaluate the effect of cytostatics on pancreatic islet β-cells, we need to quantify the number of nuclei per cell (Henrik Winther Nielsen, Inst. Med. Anat.).
How should data be collected in order to maximize precision with low expense and workload?
- How many animals (rats)?
- How many slices of the pancreas?
- How many sections of each slice should be counted?
Hierarchy: fields → sections → rats, with variance components $\sigma^2$, $\tau^2$ and $\omega^2$, respectively.
Factor diagram: [I] = [R*S*F] → [R*S] → [R] → 0

37 Pilot study
- 4 rats (R)
- 3 sections for each rat (S)
- 5 randomly chosen fields from each section (F)
[Figure: scatter plot with jitter; symbols indicate sections]

38 3-level model in SAS
    proc mixed data=nuclei;
      class rat section;
      model nuclei= / ddfm=kr s;
      random intercept section / subject=rat;
    run;
[Output: covariance parameter estimates for Intercept (subject rat), section (subject rat) and Residual; solution for fixed effects for the intercept (estimate, standard error, DF, t value, P value)]

39 Variances are positive!
...and therefore these models can only describe positive correlations. But note: correlations may in reality be negative, either
- by coincidence, or
- as a result of competition between units belonging together, e.g. when measuring yield for plants grown in the same pot.
In such a case, the corresponding variance component will be reported as zero. Here, the variation between sections is close to 0.

40 Interpretation of variance components

| Variance component | Estimate | Proportion of variation |
|---|---|---|
| Rats ($\omega^2$) | … | …% |
| Sections ($\tau^2$) | … | …% |
| Fields ($\sigma^2$) | … | …% |
| Total ($\omega^2 + \tau^2 + \sigma^2$) | … | 100% |

Almost all variation is on the lowest level:
- Rats appear quite identical: perhaps they are from the same litter?
- Sections appear extremely identical: is the pancreas homogeneous?

41 Typical differences between two measurements:
- for different fields on the same section: $\pm 2\sqrt{2\sigma^2} = \pm 1.255$
- for different sections on the same rat: $\pm 2\sqrt{2(\tau^2 + \sigma^2)} = \pm 1.264$
- for sections on different rats: $\pm 2\sqrt{2(\omega^2 + \tau^2 + \sigma^2)}$ = ± …

42 Correlations vary, depending on the level
Measurements on the same section:
$$\operatorname{Corr}(y_{rs1}, y_{rs2}) = \frac{\omega^2 + \tau^2}{\omega^2 + \tau^2 + \sigma^2}$$
Measurements on different sections of the same rat:
$$\operatorname{Corr}(y_{r11}, y_{r22}) = \frac{\omega^2}{\omega^2 + \tau^2 + \sigma^2}$$
Measurements from different rats are independent.
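The formulas on pp. 41 and 42 can be collected in one small Python function (hypothetical name). It assumes, as written on p. 41, that the typical-difference limits have the form $\pm 2\sqrt{2 \cdot \text{variance}}$. Inverting the limits ±1.255 and ±1.264 gives approximate values of $\sigma^2$ and $\tau^2$; since those limits are rounded, the results are approximations, and $\omega^2$ is left as an input.

```python
import math


def three_level_summary(omega2: float, tau2: float, sigma2: float) -> dict:
    """Correlations and typical-difference limits (+/- 2*sqrt(2*variance))
    in the rats / sections / fields model."""
    total = omega2 + tau2 + sigma2
    return {
        "corr_same_section": (omega2 + tau2) / total,
        "corr_same_rat_other_section": omega2 / total,
        "limit_same_section": 2 * math.sqrt(2 * sigma2),
        "limit_same_rat": 2 * math.sqrt(2 * (tau2 + sigma2)),
        "limit_different_rats": 2 * math.sqrt(2 * total),
    }


# Back-calculating sigma2 and tau2 from the two limits quoted on p. 41
sigma2 = (1.255 / 2) ** 2 / 2
tau2 = (1.264 / 2) ** 2 / 2 - sigma2
print(f"sigma2 ~ {sigma2:.4f}, tau2 ~ {tau2:.4f}")  # tau2 is tiny compared with sigma2
```

The tiny $\tau^2$ matches the remark on p. 39 that the variation between sections is close to 0.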

43 Previous example: Calcium supplements
A total of … girls, … years old, were randomized to receive either calcium or placebo.
Outcome: BMD = bone mineral density, in g/cm², ideally measured every 6 months (5 visits), but in reality...
Scientific question: Does calcium improve the rate of bone gain for adolescent women?

44 Previous analyses of this example
Response profiles, with unstructured or patterned covariance.
[Figure: response profiles]

45 Timing of the 5 visits
Of course, the girls were not seen at intervals of precisely 6 months, and neither were they precisely the same age at the first visit. What to do about that? What is the proper time scale: age, or time since randomization?
Assuming that the date of the first visit is also the time of randomization, we shall take this as time 0. More on baseline handling later.

46 Time since randomization
- Time points are specific to each single girl.
- Time 0 is the individual time of the first visit.
- visit1, visit2 etc. no longer have any real meaning, because they do not refer to the same time point.
- Time is now in units of years from randomization.
Note in the output on the next page: the number of measurements decreases over time, due to missing values/dropout.

47 Overview of individual time points
[PROC MEANS output by group (C, P) and visit (1 to 5): number of observations, and N, mean, minimum and maximum of bmd and time. At visit 1 there are 55 girls in group C and 57 girls in group P.]

48 Individual profiles
[Figure: spaghetti plots]

49 Plausible models for BMD data
Mean value structure: We need a model for the effect of time, since 5 separate mean values are not possible (the times are not identical). The simplest mean value structure is linearity.
Covariance structure: We cannot use type=un, but we can still use the RANDOM statement, or CS in the REPEATED statement. Many other covariance structures are still possible, e.g. the non-equidistant analogue of the autoregressive structure,
$$\operatorname{Corr}(Y_{git_1}, Y_{git_2}) = \rho^{|t_1 - t_2|},$$
which is written as TYPE=SP(POW)(ctime). A new covariance structure comes from random regression.
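To see what TYPE=SP(POW) implies, the Python sketch below (assuming NumPy) builds the correlation matrix $\rho^{|t_1 - t_2|}$ for one girl's visit times. The visit times, the value of ρ and the function name are hypothetical. For equidistant times one unit apart, the matrix reduces to the ordinary first-order autoregressive structure.

```python
import numpy as np


def sp_pow_correlation(times, rho: float) -> np.ndarray:
    """Correlation matrix with Corr(Y_t1, Y_t2) = rho ** |t1 - t2|."""
    t = np.asarray(times, dtype=float)
    return rho ** np.abs(t[:, None] - t[None, :])


# Hypothetical visit times (years since randomization) for one girl
times = [0.0, 0.45, 1.1, 1.6, 2.05]
print(np.round(sp_pow_correlation(times, rho=0.8), 3))
```

Correlations decay with the time distance between visits, so irregular timing is handled naturally.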

50 Baseline issues
If the first visit is a baseline measurement (which it is), and randomization has been performed:
- The two groups are known to be equal at baseline.
- Allowing a group effect at baseline may weaken a possible difference between the groups (type II error), and may convert a treatment effect into an interaction.
- Dissimilarities may be present in small studies. For slowly varying outcomes, even a small baseline difference may produce non-treatment-related differences, i.e. bias.

51 Hypothetical comparison of two treatment groups, A
Truth: constant difference between the treatments.
Finding: interaction between time and treatment.

52 Hypothetical comparison of two treatment groups, B
Truth: no effect of treatment.
Finding: constant difference between treatments.

53 Baseline difference I: Observational studies
Research question: Compare the outcomes for individuals from different groups (e.g. gender or illness groups).
- The groups are likely to differ in many respects, including the baseline value of the outcome.
- Differences in the outcome may be due to any of these characteristics, and the results will depend on which of these are included in the model.
- Adjust for the covariates that are sensible in the context. The scientific question answered depends upon the model.

54 Baseline difference II: Randomized studies
Research question: Compare the outcomes for individuals treated differently, but otherwise identical (with respect to all baseline characteristics, including the baseline value of the outcome).
- There ought to be no difference in either covariates or baseline outcome.
- Even so, small chance differences at baseline may create important outcome differences that may erroneously be taken to be treatment effects, if the covariate (or baseline) is highly predictive of the outcome.
- Using the baseline measurement as a covariate (ANCOVA) to adjust for chance differences is most sensible in simple before/after studies, but is not optimal with more than one follow-up measurement.

55 Approaches for handling baseline in randomized studies
- Use follow-up data only (exclude baseline from the analysis): most reasonable if the correlation between repeated measurements is very low.
- Subtract baseline from the successive measurements: most reasonable if the correlation between repeated measurements is very high.
- Use a model with equal mean values at baseline: may be used for any degree of correlation and gives the most sensible interpretation.

56 Random girl level in SAS, with linearity in time:
    proc mixed covtest data=calcium;
      class grp girl;
      model bmd=time grp*time / ddfm=kr s cl;
      random intercept / subject=girl(grp) v vcorr;
    run;
- Girls are nested in groups, specified by the notation girl(grp).
- kr could be replaced by satterth, see p. 57.
- v and vcorr are printing options.

57 The options ddfm=satterth (or ddfm=kenwardroger, abbreviated kr)
- They have no effect in balanced situations, where the distributions are exact.
- When approximations are necessary, i.e. in unbalanced situations (almost all observational designs, and designs with missing observations), these two are considered the best.
- They may give rise to fractional degrees of freedom.
- The computations may take a little more time, but in most cases this will not be noticeable.
When in doubt, use them!

58 Random girl level, output from code on p. 56
[Output: covariance parameter estimates with standard errors and Z tests: Intercept (subject girl(grp)), P < .0001; Residual, P < .0001. Type 3 tests of fixed effects: time, P < .0001; time*grp, P < .0001]
No doubt, we see a GRP*TIME interaction.

59 Random girl level, output, continued
[Output: solution for fixed effects (Intercept, time, and time*grp for C and P) with estimates, standard errors, DF, t values, P values and 95% confidence limits]
Excess slope in the C group: … g/cm² extra per year if C-treated, CI = (0.0053, …).

60 Model synonyms
- two-level model
- model with random subject levels
- model with random intercepts
- model with compound symmetry correlation structure (TYPE=CS)
- model with exchangeability correlation structure

61 Alternative specifications
Note that the specification
    random girl(grp);
can be written in two other ways:
    repeated visit / type=cs subject=girl(grp);
    random intercept / subject=girl(grp);
(CS: compound symmetry.) In the following, we shall see generalizations of the RANDOM statement.

62 Individual growth rates?
The time course is reasonably linear, but maybe the girls have different growth rates (slopes)? If we let $y_{git}$ denote BMD for the $i$th girl (in the $g$th group) at time $t$ (in years), we could look at the model
$$y_{git} = A_{gi} + B_{gi}\,t + \varepsilon_{git}, \qquad \varepsilon_{git} \sim N(0, \sigma^2),$$
i.e. with different intercepts ($A_{gi}$) and different slopes ($B_{gi}$) for each girl.

63 Fit a straight line for each girl
[Figure: scatter plot of slopes vs. levels at the first visit, as estimated by individual regressions]
Slopes in the calcium group (blue dots) seem to be bigger.

64 Results from individual regression
Estimates, with standard errors in brackets:

| Group | Level at baseline | Slope |
|---|---|---|
| P | … (0.0091) | … (0.0025) |
| C | … (0.0082) | … (0.0030) |
| Difference | … (0.0123) | … (0.0039) |
| P-value | … | … |

NOTE: No restrictions on baseline here.
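The individual regressions approach is a two-stage procedure: fit a line for each girl, then summarize the estimated intercepts and slopes by group. The Python sketch below (assuming NumPy; function names and toy data are hypothetical) shows the first stage with `np.polyfit` and a simple mean and standard error for the second stage. The exact summaries behind the table above are not stated, so the second stage here is an assumption.

```python
import numpy as np


def individual_lines(times_by_girl: dict, bmd_by_girl: dict) -> dict:
    """Stage 1: least-squares line y = A + B*t for each girl separately.
    Returns {girl: (intercept, slope)}; girls with fewer than two distinct
    time points are skipped, since no line can be fitted."""
    fits = {}
    for girl, times in times_by_girl.items():
        t = np.asarray(times, dtype=float)
        y = np.asarray(bmd_by_girl[girl], dtype=float)
        if np.unique(t).size < 2:
            continue
        slope, intercept = np.polyfit(t, y, deg=1)  # coefficients, highest power first
        fits[girl] = (intercept, slope)
    return fits


def mean_and_se(values) -> tuple:
    """Stage 2 (assumed): mean of the per-girl estimates and its standard error."""
    v = np.asarray(values, dtype=float)
    return v.mean(), v.std(ddof=1) / np.sqrt(v.size)


# Hypothetical toy data: two girls, one with only a baseline visit
times = {"g1": [0.0, 0.5, 1.0], "g2": [0.0]}
bmd = {"g1": [0.80, 0.83, 0.86], "g2": [0.85]}
print(individual_lines(times, bmd))  # g2 is skipped
```

In stage 2, every girl with at least two time points gets the same weight, however few measurements she has; this is the "suboptimal in case of unequal sample sizes" drawback listed on p. 72.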

65 Random regression: a generalization of the idea of a random level
We let each individual (girl) have
- her own level $A_{gi}$
- her own slope $B_{gi}$
but...

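66 Random regression, II
... we bind these individual parameters ($A_{gi}$ and $B_{gi}$) together by a normal distribution:
$$\begin{pmatrix} A_{gi} \\ B_{gi} \end{pmatrix} \sim N_2\left( \begin{pmatrix} \alpha \\ \beta_g \end{pmatrix}, G \right), \qquad G = \begin{pmatrix} \tau_a^2 & \omega \\ \omega & \tau_b^2 \end{pmatrix} = \begin{pmatrix} \tau_a^2 & \rho\tau_a\tau_b \\ \rho\tau_a\tau_b & \tau_b^2 \end{pmatrix}$$
G describes the population variation of the lines, i.e. the inter-individual variation (reflected by the picture on p. 63).
Note: There is no group subscript on α, because the groups are equal at baseline.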

67 Estimation in random regression
Keeping the levels at baseline equal by omitting grp in the MODEL statement:
    proc mixed covtest data=calcium;
      class grp girl;
      model bmd=time grp*time / ddfm=kr s cl;
      random intercept time / type=un subject=girl g v vcorr;
    run;
type=un in the RANDOM statement refers to the matrix G on the previous page, and its estimate is shown in the output as the Estimated G Matrix.

68 Output from random regression
[Output: estimated G matrix for girl (rows and columns: Intercept, time); covariance parameter estimates with standard errors and Z tests for UN(1,1) (P < .0001), UN(2,1), UN(2,2) (P < .0001) and Residual (P < .0001); fit statistics, including -2 Res Log Likelihood]

69 Output II: Estimated covariance and correlation for the 5 visits of one particular girl
[Output: estimated V matrix and V correlation matrix (5 × 5) for girl 101]
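The V matrix in this output is the marginal covariance implied by the random regression model: with $Z_i$ the matrix with rows $(1, t_{ij})$ for the visit times of girl i, $V_i = Z_i G Z_i^T + \sigma^2 I$. The Python sketch below (assuming NumPy; parameter values, visit times and function names are hypothetical) computes $V_i$ and the corresponding correlation matrix for irregular visit times. Unlike compound symmetry, the variance $\tau_a^2 + 2\rho\tau_a\tau_b t + \tau_b^2 t^2 + \sigma^2$ changes with time.

```python
import numpy as np


def random_regression_v(times, tau_a2, tau_b2, rho_ab, sigma2):
    """V = Z G Z^T + sigma2 * I for one subject, with Z = [1, t] and
    G the 2x2 covariance matrix of the random intercept and slope."""
    t = np.asarray(times, dtype=float)
    z = np.column_stack([np.ones_like(t), t])
    cov_ab = rho_ab * np.sqrt(tau_a2 * tau_b2)
    g = np.array([[tau_a2, cov_ab], [cov_ab, tau_b2]])
    return z @ g @ z.T + sigma2 * np.eye(t.size)


def cov_to_corr(v):
    """Convert a covariance matrix to a correlation matrix."""
    sd = np.sqrt(np.diag(v))
    return v / np.outer(sd, sd)


# Hypothetical parameter values and visit times (years since randomization)
times = [0.0, 0.5, 1.1, 1.5, 2.0]
v = random_regression_v(times, tau_a2=0.004, tau_b2=0.0001, rho_ab=0.2, sigma2=0.0002)
print(np.round(cov_to_corr(v), 3))
```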

70 Output III: Estimated mean value structure
[Output: solution for fixed effects (Intercept, time, and time*grp for C and P) with estimates, standard errors, DF, t values, P values and confidence limits; Type 3 tests of fixed effects: time, P < .0001; time*grp]
Thus, we find an extra increase in BMD of … g/cm² per year, CI = (0.0026, …), when giving calcium supplement, a little less than found on p. …

71 Note concerning MIXED notation
- It is necessary to use TYPE=UN in the RANDOM statement in order to allow intercept and slope to be arbitrarily correlated.
- The default in RANDOM is TYPE=VC, which only specifies variance components, i.e. separate variances with no correlation between intercept and slope.
- If TYPE=UN is omitted, we may experience convergence problems and sometimes totally incomprehensible results.
In this particular case, the correlation between intercept and slope is not that impressive, actually only … (the intercept is not completely out of range in this example, since it refers to baseline).

72 Individual regressions approach
Merits:
- easy to understand and interpret
Drawbacks:
- suboptimal in case of unequal sample sizes
- only simple models are feasible
- difficult/impossible to include covariates
- not possible to account for equal baseline values

73 Random regression approach
Merits:
- uses all available information
- optimal procedure if the model holds
- easy to include covariates
Drawbacks:
- biased in case of informative missing values (or informative sample sizes)

74 Random regression vs. individual regressions
Slopes from:

| Group | Individual regressions | Random regression |
|---|---|---|
| P | … (0.0025) | … (0.0022) |
| C | … (0.0030) | … (0.0022) |
| Difference | … (0.0039) | … (0.0031) |
| P-value | … | … |

- Random regression gives a steeper slope.
- The girls with flat (and low) profiles tend to have shorter series (fewer measurements).
- These slopes contribute less to the random regression slope, because they are less accurately determined.
Is this a coincidence? Otherwise, we may see an example of informative missing values (last lecture).

75 Model checks
Two types of residuals:
- Ordinary: observed minus predicted group mean (only systematic effects): $Y_{ij} - X_{ij}^T\hat\beta$
- Conditional: observed minus predicted individual mean value (systematic and random effects): $\hat\varepsilon_{ij} = Y_{ij} - (X_{ij}^T\hat\beta + Z_{ij}^T\hat b_i)$
Conditional residuals are usually much smaller than the ordinary ones, since they describe deviations from subject-specific predictions.
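The two residual types differ only in whether the predicted random effects are added back. The Python sketch below (assuming NumPy; the function name is hypothetical) computes both for one subject from given estimates $\hat\beta$ and $\hat b_i$. In PROC MIXED, the OUTPM= and OUTP= data sets requested on p. 83 hold residuals of these two kinds.

```python
import numpy as np


def ordinary_and_conditional_residuals(y, x, beta_hat, z, b_hat):
    """Residuals for one subject.
    y: (n,) outcomes; x: (n, p) fixed-effects design; beta_hat: (p,)
    z: (n, q) random-effects design; b_hat: (q,) predicted random effects (BLUPs)."""
    y = np.asarray(y, dtype=float)
    fixed_part = np.asarray(x, dtype=float) @ np.asarray(beta_hat, dtype=float)
    random_part = np.asarray(z, dtype=float) @ np.asarray(b_hat, dtype=float)
    ordinary = y - fixed_part  # Y_ij - X_ij^T beta_hat
    conditional = y - (fixed_part + random_part)  # Y_ij - (X_ij^T beta_hat + Z_ij^T b_hat_i)
    return ordinary, conditional
```

For random regression, both designs contain the columns (1, t), so $Z_{ij}$ is simply the intercept and time part of $X_{ij}$.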

76 Coding for model checks
    ods graphics on;
    proc mixed plots=all data=calcium;
      class grp girl;
      model bmd=time grp*time / ddfm=kr s cl outpm=fitpm outp=fitp;
      random intercept time / type=un subject=girl g v vcorr;
    run;
This gives us panels to check the stability of the variance and the normality of the residuals, and (through outpm= and outp=) creates two output data sets:
- fitpm: predicted mean BMD values, common to girls in the same group
- fitp: individually predicted BMD values, specific for each girl

77 Model check, ordinary residuals

78 Model check, conditional residuals

79 Additional model checks
Investigating linearity in time:
    proc sort data=fitpm; by grp time; run;
    title 'Ordinary residuals';
    proc sgplot data=fitpm;
      loess y=Resid x=time / group=grp;
    run;

    proc sort data=fitp; by grp time; run;
    title 'Conditional residuals';
    proc sgplot data=fitp;
      loess y=Resid x=time / group=grp;
    run;

80 Check of linearity, ordinary residuals (fitpm)

81 Check of linearity, conditional residuals (fitp)

82 Comments on model checks
- Ordinary residuals (p. 77): homogeneity of variance; slightly skew distribution, almost normal.
- Conditional residuals (p. 78): evidently normal.
- Linearity (p. 81): some non-systematic deviation from linearity is seen in the conditional residuals, but it is fairly consistent for the two groups.
- Linearity (p. 80): deviation from linearity cannot be seen in the ordinary residuals (it drowns in the between-subject variation).

83 Normality of random effects?
Histograms or box plots of the estimated $\hat b_i$'s from the model are not worth much:
    proc mixed covtest plots=all data=calcium;
      class grp girl;
      model bmd=time grp*time / ddfm=kr s cl outpm=fitpm outp=fitp residual influence;
      random intercept time / type=un subject=girl(grp) g v vcorr s;
      ods output solutionr=random_effects;
    run;

    proc sort data=random_effects; by Effect; run;
    proc sgplot data=random_effects;
      by Effect;  /* one plot for the random intercepts, one for the random slopes */
      vbox Estimate / category=grp;
    run;

84 Predicted values from random regression
Predicted group means, shown for two girls from different groups.

85 Predicted values from random regression, II
Individual predictions.

86 Plausible models for BMD data
- Response profiles: unstructured mean and unstructured covariance (only for balanced data)
- Compound symmetry covariance/correlation: synonym for a random effect (level) for each girl
- Autoregressive covariance/correlation, or other covariance structures
- Random regression: random effects of both intercept and slope for each girl

87 How can we choose between models?
- Think...
- Graphical assessment of fit, e.g. comparison of predicted profiles with average curves
- Inspection of residuals: automatic model checks using ODS graphics; more extensive model checks using output data sets
- Tests against more flexible alternatives: fixed effects are tested by the usual output; covariance patterns are evaluated by $\chi^2$ tests on $-2\log L$

88 The mean value structure
Look for:
- linearity in scatter plots?
- curves in residual plots?
Alternatives:
- splines
- more covariates
- non-linear models

89 The covariance/correlation structure
1. Random effects
2. Serial correlation (the pattern)
3. Error of measurement

90 Assumptions in a mixed effects model
- Linearity in the covariates $X_{ij}$ (including $Z_{ij}$)
- Normality of the residuals $\varepsilon_{ij}$
- Normality of the random effects $b_i$
- Plausibility of the covariance structure
- Independence between individuals
- Independence between $X_{ij}$ and $b_i$; e.g. does the timing and number of measurements relate to the development for the girl?

91 Importance of assumptions
Important:
- linearity
- independence between individuals (normally not an issue)
- independence between $X_{ij}$ and $b_i$
- appropriateness of the covariance structure (may be circumvented by using the empirical sandwich estimator, option empirical in proc mixed)
Less important (especially when the number of observations is large):
- normality of the residuals $\varepsilon_{ij}$
- normality of the random effects $b_i$

92 Influential observations
i.e. observations with a large influence on the estimates, either of the mean value or of the covariance parameters. These observations could have
- an unusual combination of covariates $X_i$
- large ordinary residuals (from $X_i\beta$)
- an unusual combination of covariates $Z_i$
or they could be a sign of
- a bad choice of mean value structure
- a bad choice of covariance pattern

93 Cook's distance
Tentative limit for being influential: $4/n$ = …

94 Repetition
Typical set-up for repeated measurements:
- two or more groups of subjects (typically receiving different treatments)
- randomization at baseline
- longitudinal measurements of the same quantity over time for each subject, typically as a function of time (duration of treatment), age, or cumulative dose of some drug
Level 1: single observations. Level 2: patients/subjects.

95 Repeated measurement designs: Merits
- They are much more powerful in detecting time changes (data are paired, with the subject as its own control).
- We may discover that subjects have different time courses. (In designs with only cross-sectional data, this may also be the case, but we have no way of knowing!)
- We may identify important characteristics of the time courses, specific for each subject (trend, peak etc.).

96 Repeated measurement designs: Drawbacks
- The traditional independence assumption is violated, since repeated observations on the same individual are correlated (look alike).
- Traditional ANOVA models become impossible.
- Comparisons of time averages (or other summary characteristics) cannot incorporate time-dependent covariates.


More information

COMPLETELY RANDOM DESIGN (CRD) -Design can be used when experimental units are essentially homogeneous.

COMPLETELY RANDOM DESIGN (CRD) -Design can be used when experimental units are essentially homogeneous. COMPLETELY RANDOM DESIGN (CRD) Description of the Design -Simplest design to use. -Design can be used when experimental units are essentially homogeneous. -Because of the homogeneity requirement, it may

More information

R 2 and F -Tests and ANOVA

R 2 and F -Tests and ANOVA R 2 and F -Tests and ANOVA December 6, 2018 1 Partition of Sums of Squares The distance from any point y i in a collection of data, to the mean of the data ȳ, is the deviation, written as y i ȳ. Definition.

More information

Analysis of Covariance

Analysis of Covariance Analysis of Covariance (ANCOVA) Bruce A Craig Department of Statistics Purdue University STAT 514 Topic 10 1 When to Use ANCOVA In experiment, there is a nuisance factor x that is 1 Correlated with y 2

More information

Regression: Main Ideas Setting: Quantitative outcome with a quantitative explanatory variable. Example, cont.

Regression: Main Ideas Setting: Quantitative outcome with a quantitative explanatory variable. Example, cont. TCELL 9/4/205 36-309/749 Experimental Design for Behavioral and Social Sciences Simple Regression Example Male black wheatear birds carry stones to the nest as a form of sexual display. Soler et al. wanted

More information

Covariance Structure Approach to Within-Cases

Covariance Structure Approach to Within-Cases Covariance Structure Approach to Within-Cases Remember how the data file grapefruit1.data looks: Store sales1 sales2 sales3 1 62.1 61.3 60.8 2 58.2 57.9 55.1 3 51.6 49.2 46.2 4 53.7 51.5 48.3 5 61.4 58.7

More information

Repeated Measures Modeling With PROC MIXED E. Barry Moser, Louisiana State University, Baton Rouge, LA

Repeated Measures Modeling With PROC MIXED E. Barry Moser, Louisiana State University, Baton Rouge, LA Paper 188-29 Repeated Measures Modeling With PROC MIXED E. Barry Moser, Louisiana State University, Baton Rouge, LA ABSTRACT PROC MIXED provides a very flexible environment in which to model many types

More information

Models for Clustered Data

Models for Clustered Data Models for Clustered Data Edps/Psych/Soc 589 Carolyn J Anderson Department of Educational Psychology c Board of Trustees, University of Illinois Spring 2019 Outline Notation NELS88 data Fixed Effects ANOVA

More information