Faculty of Health Sciences. Correlated data. Variance component models. Lene Theil Skovgaard & Julie Lyng Forman.


1 Faculty of Health Sciences Correlated data Variance component models Lene Theil Skovgaard & Julie Lyng Forman November 27, / 84

2 Overview
- One-way anova with random variation
- The rabbit example
- Hierarchical models with several levels
- Random regression
Home pages: RepeatedMeasures2018.html

3 Variance component models
Models involving several sources of random variation:
- geographical/environmental variation: between regions, hospitals, schools or countries
- biological variation: between individuals, families or animals
- within-individual variation: between arms, teeth, injection sites, days
- variation due to uncontrollable circumstances: time of day, temperature, observer
- measurement error
Of course, they may also include fixed effects, such as treatment, gender etc.

4 Example: Swelling due to vaccine
Research question: How much swelling can be expected in relation to a vaccination?
Experiment: 6 rabbits, each vaccinated in 6 (randomly?) selected spots on the back.
Outcome $y_{rs}$: swelling in cm$^2$, where
- $r = 1, \ldots, R = 6$ denotes the rabbit,
- $s = 1, \ldots, S = 6$ denotes the spot.
We have observed a total of 36 swelling areas, but we must expect swelling to be specific to the individual rabbit.

5 Scatter plot X-axis: Arbitrary numbering of rabbits Clearly, rabbit no. 2 has a tendency for larger swelling 5 / 84

6 Naive quantification of swelling
[Output from The MEANS Procedure, analysis variable swelling: N, Mean, Std Error, Lower and Upper 95% CL for Mean; numeric values not preserved in this transcript]
What is wrong here?
Imagine that all measurements on a rabbit resulted in the same value. Then we would actually have only 6 measurements, and the SEM would be badly wrong.
So what happens when the measurements are only somewhat alike?

7 Correlated observations Observations on the same individual look alike, they are correlated Why is this important? Variation between observations on the same rabbit (within rabbit) will not reflect the population variation (the variation between individuals) The information in data will seem misleadingly high, as if we had 36 rabbits So, we have to take the correlation into account 7 / 84

8 Neglecting the correlation will lead to errors
Typical errors:
- Wrong standard errors (too small or too big)
- Wrong confidence intervals (too narrow or too wide)
- Wrong conclusions (type I or type II errors)
The type of error depends upon the kind of question asked (to be further explained).

9 Inadequate analysis of swelling
Each rabbit has a mean level. There is some variation between the six injection sites for the same rabbit.
In computer language: The rabbit is a factor, and the analysis is a one-way ANOVA.
In SAS:
    proc glm data=rabbit;
      class rabbit;
      model swelling=rabbit / solution;
    run;
In R:
    lm(swelling ~ factor(rabbit), data=rabbit)

10 Output from inadequate model
[Output from The GLM Procedure, dependent variable swelling: ANOVA table (Model, Error, Corrected Total), R-Square, Coeff Var, Root MSE, swelling Mean, and the Type III test for rabbit; numeric values not preserved in this transcript]
The rabbits have different levels (P=0.0040), but this was NOT the question.

11 Output from inadequate model, II
[Parameter estimates: Intercept and one row per rabbit, with Estimate, Standard Error, t Value and Pr > |t|; numeric values not preserved in this transcript]
But: Do we get any useful information from this?
- We are not interested in these particular 6 rabbits, only in rabbits in general, as a species.
- We assume these 6 rabbits to have been randomly selected from the species (just as we always do).

12 Variance component model
Instead of fixed level parameters for each rabbit, we model the differences between rabbits as an extra source of variation:
$$y_{rs} = \mu + a_r + \varepsilon_{rs}$$
where the $a_r$'s and the $\varepsilon_{rs}$'s are assumed to be independent and Normally distributed, with variances
$$\mathrm{Var}(a_r) = \omega_B^2 \ \text{(the Between variance)}, \qquad \mathrm{Var}(\varepsilon_{rs}) = \sigma_W^2 \ \text{(the Within variance)}.$$
- rabbit is now a random effect, or random factor,
- $\omega_B^2$ and $\sigma_W^2$ are called variance components,
- the model is also called a two-level model.

13 Formulation in terms of correlation
All swelling observations have common mean and variance:
$$y_{rs} \sim N(\mu,\ \omega_B^2 + \sigma_W^2)$$
But: Measurements made on the same rabbit are correlated, with the intra-class correlation
$$\mathrm{Corr}(y_{r1}, y_{r2}) = \rho = \frac{\omega_B^2}{\omega_B^2 + \sigma_W^2}$$
- Measurements made on the same rabbit tend to look more alike than measurements made on different rabbits.
- All measurements on the same rabbit look equally much alike.
This correlation structure is called compound symmetry (CS) or exchangeability.
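The intra-class correlation above is a simple ratio of variance components. A minimal sketch of the calculation follows; the function name and the two variance values are hypothetical and chosen only for illustration, they are not the rabbit estimates.

```python
def intraclass_correlation(var_between: float, var_within: float) -> float:
    """Return rho = omega_B^2 / (omega_B^2 + sigma_W^2)."""
    total = var_between + var_within
    if var_between < 0 or var_within < 0 or total <= 0:
        raise ValueError("variance components must be non-negative with a positive sum")
    return var_between / total


# Hypothetical variance components (NOT estimates from the rabbit data)
rho = intraclass_correlation(var_between=1.0, var_within=3.0)
print(f"intra-class correlation: {rho:.2f}")
```

With a between variance of zero the function returns 0, i.e. measurements on the same rabbit would then be no more alike than measurements on different rabbits.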

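14 Covariance and correlation
For the six injection sites, the covariance matrix for each rabbit is:
$$\begin{pmatrix}
\omega_B^2 + \sigma_W^2 & \omega_B^2 & \omega_B^2 & \omega_B^2 & \omega_B^2 & \omega_B^2 \\
\omega_B^2 & \omega_B^2 + \sigma_W^2 & \omega_B^2 & \omega_B^2 & \omega_B^2 & \omega_B^2 \\
\omega_B^2 & \omega_B^2 & \omega_B^2 + \sigma_W^2 & \omega_B^2 & \omega_B^2 & \omega_B^2 \\
\omega_B^2 & \omega_B^2 & \omega_B^2 & \omega_B^2 + \sigma_W^2 & \omega_B^2 & \omega_B^2 \\
\omega_B^2 & \omega_B^2 & \omega_B^2 & \omega_B^2 & \omega_B^2 + \sigma_W^2 & \omega_B^2 \\
\omega_B^2 & \omega_B^2 & \omega_B^2 & \omega_B^2 & \omega_B^2 & \omega_B^2 + \sigma_W^2
\end{pmatrix}$$
and the corresponding compound symmetry correlation structure is:
$$\begin{pmatrix}
1 & \rho & \rho & \rho & \rho & \rho \\
\rho & 1 & \rho & \rho & \rho & \rho \\
\rho & \rho & 1 & \rho & \rho & \rho \\
\rho & \rho & \rho & 1 & \rho & \rho \\
\rho & \rho & \rho & \rho & 1 & \rho \\
\rho & \rho & \rho & \rho & \rho & 1
\end{pmatrix}$$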
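To see how the two matrices are related, the following sketch builds the compound symmetry covariance matrix from two variance components and converts it to a correlation matrix. The function names and the numerical values are hypothetical, used only to illustrate the structure.

```python
import math


def compound_symmetry_cov(var_between, var_within, n_sites):
    """Covariance matrix: omega_B^2 everywhere, plus sigma_W^2 on the diagonal."""
    return [
        [var_between + (var_within if i == j else 0.0) for j in range(n_sites)]
        for i in range(n_sites)
    ]


def cov_to_corr(cov):
    """Divide each covariance by the product of the two standard deviations."""
    sd = [math.sqrt(cov[i][i]) for i in range(len(cov))]
    return [
        [cov[i][j] / (sd[i] * sd[j]) for j in range(len(cov))]
        for i in range(len(cov))
    ]


# Hypothetical variance components (NOT estimates from the rabbit data)
cov = compound_symmetry_cov(var_between=1.0, var_within=3.0, n_sites=6)
corr = cov_to_corr(cov)
for row in corr:
    print(" ".join(f"{value:.2f}" for value in row))
```

Every off-diagonal entry of the printed matrix is the same number, which is exactly the exchangeability assumption.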

15 Exchangeability = Compound Symmetry
This covariance/correlation structure implies:
- All variances are equal: There should be the same variation between rabbits for all injection sites (if meaningful...)
- Every pair of measurements is equally correlated: All injection sites should be equally related to each other
How could these assumptions be violated? Are the injection sites really randomly selected?
If not, an unstructured covariance may be more appropriate: Some injection sites are more related than others (e.g. due to proximity).

16 Estimation in variance component models
In SAS:
    proc mixed data=rabbit;
      class rabbit;
      model swelling = / ddfm=kr s cl;
      random rabbit;
    run;
In R (package nlme):
    library(nlme)
    lme(swelling ~ 1, data=rabbit, random = ~1 | rabbit)
[Output: Covariance Parameter Estimates (rabbit, Residual) and Solution for Fixed Effects (Intercept with Estimate, Standard Error, DF, t Value, Lower, Upper); numeric values not preserved in this transcript]
Comparison to p. 6 reveals that correctly taking the correlation into account yields the same estimate, but a substantially wider confidence interval.
Ignoring the correlation gives a confidence interval that is too narrow, i.e. it leads to a type 1 error.

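17 Interpretation of variance components
[Table with columns Variance component, Estimate, Proportion of variation, and rows Between ($\omega_B^2$), Within ($\sigma_W^2$), Total ($\omega_B^2 + \sigma_W^2$); estimates and percentages not preserved in this transcript]
Typical differences (95% Prediction Intervals):
- for spots on the same rabbit: $\pm 2\sqrt{2\,\sigma_W^2} = \pm 2.16$ cm$^2$
- for spots on different rabbits: $\pm 2\sqrt{2\,(\omega_B^2 + \sigma_W^2)} = \pm 2.70$ cm$^2$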
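The two half-widths quoted above (2.16 and 2.70 cm$^2$) can be related to the variance components. The sketch below assumes that the half-width is 2 times the standard deviation of a difference between two measurements, i.e. $2\sqrt{2 \cdot \text{variance}}$; this multiplier of 2 (rather than 1.96) is an assumption, and the values obtained are back-calculated approximations, not the estimates printed in the original output.

```python
import math


def typical_difference(variance_sum, multiplier=2.0):
    """Half-width of the interval for a difference of two measurements
    whose non-shared variance components add up to variance_sum."""
    return multiplier * math.sqrt(2.0 * variance_sum)


def variance_from_half_width(half_width, multiplier=2.0):
    """Invert typical_difference: recover the variance sum from a half-width."""
    return (half_width / multiplier) ** 2 / 2.0


# Half-widths quoted on the slide; the multiplier of 2 is an assumption
var_within = variance_from_half_width(2.16)   # approx. sigma_W^2
var_total = variance_from_half_width(2.70)    # approx. omega_B^2 + sigma_W^2
var_between = var_total - var_within          # approx. omega_B^2
share_within = var_within / var_total

print(f"within:  {var_within:.3f}")
print(f"between: {var_between:.3f}")
print(f"share of variation within rabbits: {share_within:.2f}")
```

Under these assumptions, somewhat less than two thirds of the total variation is attributed to the within-rabbit component.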

18 Interpretation of variance components, cont'd
Approx. 2/3 of the variation in the measurements comes from the variation within rabbits, i.e. between injection sites on the same rabbit. Why?
Could there be a systematic difference between the injection sites?
[Output: Covariance Parameter Estimates (rabbit, Residual) and Type 3 Tests of Fixed Effects for spot (Num DF, Den DF, F Value, Pr > F); numeric values not preserved in this transcript]
This does not seem to be the case (P=0.26).

19 Design considerations, precision of overall mean
[Figure: standard error of the overall mean for R = no. of rabbits, varying from 3 to 20, and S = no. of spots, varying from 1 to 10]
Standard error is the square root of:
$$\mathrm{Var}(\bar{y}) = \frac{\omega_B^2}{R} + \frac{\sigma_W^2}{RS}$$
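The variance formula for the overall mean can be evaluated for any candidate design. The sketch below uses illustrative variance components (0.328 and 0.583) of roughly the size implied by the intervals quoted on p. 17; they are assumptions for the purpose of the example, not the printed estimates.

```python
import math


def se_overall_mean(var_between, var_within, n_rabbits, n_spots):
    """Square root of omega_B^2 / R + sigma_W^2 / (R * S)."""
    variance = var_between / n_rabbits + var_within / (n_rabbits * n_spots)
    return math.sqrt(variance)


# Illustrative variance components (assumed, see the lead-in)
VAR_BETWEEN = 0.328
VAR_WITHIN = 0.583

se_present = se_overall_mean(VAR_BETWEEN, VAR_WITHIN, n_rabbits=6, n_spots=6)
se_more_rabbits = se_overall_mean(VAR_BETWEEN, VAR_WITHIN, n_rabbits=12, n_spots=6)
se_more_spots = se_overall_mean(VAR_BETWEEN, VAR_WITHIN, n_rabbits=6, n_spots=12)

print(f"6 rabbits x 6 spots:  {se_present:.3f}")
print(f"12 rabbits x 6 spots: {se_more_rabbits:.3f}")
print(f"6 rabbits x 12 spots: {se_more_spots:.3f}")
```

Doubling the number of rabbits reduces the standard error far more than doubling the number of spots, because the between-rabbit term is divided by R only.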

20 Design considerations, II
The red dotted line on the picture on p. 19 shows the present precision, from 36 observations on 6 rabbits.
It also shows that to improve precision, we have to increase the number of rabbits, since increasing the number of injections on each rabbit adds very little information.
We could have the same precision as here by including approximately 13 rabbits and giving them only one injection each. Effectively, we have only approximately two independent observations from each rabbit!
Take care: This is a pilot study, do not rely too heavily on the results.
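The statement about "approximately two independent observations from each rabbit" follows from the variance formula on p. 19: S correlated measurements with intra-class correlation $\rho$ are worth $S / (1 + (S-1)\rho)$ independent ones. The sketch below assumes $\rho = 0.36$, an illustrative value in line with roughly one third of the variation being between rabbits; it is not the printed estimate.

```python
def effective_obs_per_cluster(icc, cluster_size):
    """Number of independent observations that carry the same information
    about the overall mean as cluster_size correlated ones."""
    return cluster_size / (1.0 + (cluster_size - 1) * icc)


ICC = 0.36        # assumed intra-class correlation (illustrative)
N_RABBITS = 6
N_SPOTS = 6

per_rabbit = effective_obs_per_cluster(ICC, N_SPOTS)
equivalent_single_spot_rabbits = N_RABBITS * per_rabbit

print(f"effective observations per rabbit: {per_rabbit:.2f}")
print(f"equivalent no. of rabbits with one spot each: {equivalent_single_spot_rabbits:.1f}")
```

Under this assumption the 36 measurements correspond to about 13 rabbits with a single injection each, which agrees with the figure stated on the slide.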

21 Analysis of averages
Since our design is balanced, we could have obtained our (correct) result in a simple fashion:
- Compute averages for each rabbit
- Compute confidence interval for mean value of averages
[Output from The MEANS Procedure, analysis variable mswelling: N, Mean, Std Error, Lower and Upper 95% CL for Mean; numeric values not preserved in this transcript]
but if the design is not balanced...

22 Reduced data set - omit 3 observations not randomly chosen... What kind of effects would we expect? 22 / 84

23 Quantification of overall swelling
Right column corresponds to the reduced data set, where the 3 smallest measurements from rabbit 2 (with the highest level) are omitted.
Estimate (SE); the estimates themselves are not preserved in this transcript, only the standard errors:
Method                                All 36 data    Omitting 3 observations
Simple averages of all (p. 6)         ... (0.155)    ... (0.163)
Average of averages                   ... (0.267)    ... (0.333)
Weighted average of averages                         ... (0.265)
Variance component model (p. 16)      ... (0.267)    ... (0.298)

24 Comments on the quantifications in the table on p. 23
Simple averages: Pool all 36 measurements, wrongly assuming independence.
- This will result in too small standard errors.
- In the reduced data set, the estimate is downwards biased, since we have omitted some of the largest observations.
Average of averages: Start out by taking averages for each rabbit.
- This will be OK for balanced designs, but when we omit the three lowest observations for rabbit 2, this rabbit appears to have a higher level, which gives an upwards bias.

25 Comments, cont'd
Weighted average of averages: As above, but weighted according to the number of observations.
- For balanced designs, all weights are equal, but when we omit three observations, rabbit 2 has a lower weight in the average because it has only 3 observations.
- This will result in a downwards bias, because rabbit 2 has a high level.
Random rabbit: The variance component model will yield the correct result, provided that observations are missing at random.
- In the reduced data set, rabbit 2 has a lower weight in the average due to a larger standard error.

26 Estimation of individual rabbit means...?
Two different approaches:
- Traditional averages $\bar{y}_{r\cdot}$
- BLUPs (best linear unbiased predictors), which rely on the assumption that individuals come from the same population, and become weighted averages which have been shrunk towards the overall mean:
$$k\,\bar{y}_{r\cdot} + (1-k)\,\bar{y}_{\cdot\cdot}, \qquad \text{where } k = \frac{\omega_B^2}{\omega_B^2 + \sigma_W^2 / S}$$
(k is close to 0 when $\sigma_W^2$ is large, otherwise closer to 1)
More shrinkage if rabbits look alike.
BLUPs are used for ranking e.g. schools.
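The shrinkage formula can be tried out numerically. In the sketch below all numbers (variance components, rabbit average, overall mean) are hypothetical, and the overall mean is treated as known, which is a simplification of what a mixed model program does.

```python
def shrinkage_factor(var_between, var_within, n_obs):
    """k = omega_B^2 / (omega_B^2 + sigma_W^2 / S) for a rabbit with S = n_obs spots."""
    return var_between / (var_between + var_within / n_obs)


def blup(rabbit_mean, overall_mean, k):
    """Weighted average k * rabbit mean + (1 - k) * overall mean."""
    return k * rabbit_mean + (1.0 - k) * overall_mean


# Hypothetical values, for illustration only
VAR_BETWEEN, VAR_WITHIN = 1.0, 3.0
RABBIT_MEAN, OVERALL_MEAN = 9.0, 7.0

k_six = shrinkage_factor(VAR_BETWEEN, VAR_WITHIN, n_obs=6)
k_three = shrinkage_factor(VAR_BETWEEN, VAR_WITHIN, n_obs=3)

print(f"6 spots: k = {k_six:.3f}, BLUP = {blup(RABBIT_MEAN, OVERALL_MEAN, k_six):.3f}")
print(f"3 spots: k = {k_three:.3f}, BLUP = {blup(RABBIT_MEAN, OVERALL_MEAN, k_three):.3f}")
```

A rabbit with fewer observations gets a smaller k and is therefore pulled further towards the overall mean.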

27 BLUPs vs. averages, shrinkage Left panel: The full dataset, Right panel: Reduced data set: Larger shrinkage for rabbit no. 2 in reduced dataset 27 / 84

28 Hierarchical designs, cluster designs e.g. School, School Class and Pupil [I] = [S*C*P] [S*C] S 28 / 84

29 Hierarchical designs, with covariates [S C P] [S C] [S] Gender Class grade School type 29 / 84

30 Examples of hierarchies
level 1     level 2      level 3
subjects    twin pairs   countries
subjects    families     regions
students    classes      schools
spots       rabbits
fields      sections     rats
visits      subjects     centres
Measurements belonging together in the same cluster look alike (are correlated).
On all levels, we may have random variation (variance components), as well as covariates.

31 Merits of cluster designs Certain effects may be estimated more precisely, since some sources of variation are eliminated, e.g. by making comparisons within a family or a school class This is analogous to the paired comparison situation. When planning subsequent investigations, the knowledge of the relative sizes of the variance components will be of help in deciding the number of repetitions needed at each level 31 / 84

32 Drawbacks of cluster designs
Wrong conclusions may result if one or more sources of variation are disregarded:
- low efficiency (type 2 error) for evaluation of level 1 covariates (within-cluster effects)
- too small standard errors (type 1 error) for estimates of level 2 effects (between-cluster effects)

33 The school example
Models for school data from p. 29 include 3 sources of variation:
1. Variation between schools ([S])
2. Variation between classes in each school ([S*C])
3. Variation between pupils in each class ([S*C*P], residual variation)
What may happen if we forget the variation between classes in the same school, [S*C]?

34 The school example, II
What if we forget the variation between classes in the same school, [S*C]?
- Pupils in the same class will be assumed no more correlated than pupils from different classes (in the same school)
- Covariates on class level (e.g. class grade) will appear too important, because N is taken to be too big
- Covariates on pupil level (e.g. gender) will appear less important, because we overlook pairing
We will return to this example when we discuss binary data.

35 Another example of a 3-level model
Research problem: In order to evaluate the effect of cytostatics on pancreas islet β-cells, we need to quantify the number of nuclei per cell. (Henrik Winther Nielsen, Inst. Med. Anat.)
How should data be collected in order to maximize precision with low expense and work load?
- How many animals (rats)?
- How many slices of the pancreas?
- How many sections of each slice should be counted?
Hierarchy, with variance components: fields ($\sigma^2$), sections ($\tau^2$), rats ($\omega^2$)
Factor diagram: [I] = [R*S*F] [R*S] [R] 0

36 Pilot study 4 rats (R) 3 sections for each rat (S) 5 randomly chosen fields from each section (F) Scatter plot, with jitter (symbols indicate sections) 36 / 84

37 Estimation in 3-level model
In SAS:
    proc mixed data=nuclei;
      class rat section;
      model nuclei= / ddfm=kr vciry s;
      random intercept section / subject=rat;
    run;
In R (package nlme):
    library(nlme)
    lme(nuclei ~ 1, random = ~1 | rat/section, data=hwn)
[Output: Covariance Parameter Estimates (Intercept with subject rat, section with subject rat, Residual) and Solution for Fixed Effects (Intercept with Estimate, Standard Error, DF, t Value, Pr > |t|); numeric values not preserved in this transcript]

38 Variances are positive!
and therefore these models describe all correlations as positive.
But note: It may happen that correlations are in reality negative,
- by coincidence
- as a result of competition between units belonging together, e.g. when measuring yield for plants grown in the same pot
In such a case, the corresponding variance component will be reported as zero.
Here, the variation between sections is close to 0.

39 Interpretation of variance components
[Table with columns Variance component, Estimate, Proportion of variation, and rows Rats ($\omega^2$), Sections ($\tau^2$), Fields ($\sigma^2$), Total ($\omega^2 + \tau^2 + \sigma^2$); estimates and percentages not preserved in this transcript]
Almost all variation is on the lowest level:
- Rats appear quite identical, perhaps they are from the same litter?
- Sections appear extremely identical, is the pancreas homogeneous?

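40 Typical differences between two measurements:
- for different fields on the same section: $\pm 2\sqrt{2\,\sigma^2} = \pm 1.255$
- for different sections on the same rat: $\pm 2\sqrt{2\,(\tau^2 + \sigma^2)} = \pm 1.264$
- for sections on different rats: $\pm 2\sqrt{2\,(\omega^2 + \tau^2 + \sigma^2)}$ (value not preserved in this transcript)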

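41 Correlations vary, depending on
- Measurements on the same section:
$$\mathrm{Corr}(y_{rs1}, y_{rs2}) = \frac{\omega^2 + \tau^2}{\omega^2 + \tau^2 + \sigma^2}$$
- Measurements on different sections of the same rat:
$$\mathrm{Corr}(y_{r11}, y_{r22}) = \frac{\omega^2}{\omega^2 + \tau^2 + \sigma^2}$$
- Measurements from different rats are independent
(The numerical values of the two correlations are not preserved in this transcript.)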
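The two correlation formulas of the 3-level model can be evaluated together. The variance components in the sketch below are hypothetical and chosen only to give round numbers; they are not the estimates from the nuclei data.

```python
def three_level_correlations(var_rat, var_section, var_field):
    """Return (same-section correlation, same-rat-different-section correlation)."""
    total = var_rat + var_section + var_field
    if total <= 0:
        raise ValueError("total variance must be positive")
    same_section = (var_rat + var_section) / total
    same_rat_other_section = var_rat / total
    return same_section, same_rat_other_section


# Hypothetical variance components (NOT estimates from the nuclei data)
same_section, same_rat = three_level_correlations(
    var_rat=0.5, var_section=0.5, var_field=3.0
)
print(f"same section:                 {same_section:.3f}")
print(f"same rat, different sections: {same_rat:.3f}")
```

When the section variance is zero, as in the example on the slides, the two correlations coincide.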

42 Model check Normality of scaled residuals? Mean value is only an intercept Logarithmic transform would be good 42 / 84

43 Logarithmic analysis
    proc mixed data=nuclei;
      class rat section;
      model log_nuclei= / ddfm=kr vciry s;
      random intercept section / subject=rat;
    run;
[Output: Covariance Parameter Estimates (rat, section(rat) = 0, Residual) and Solution for Fixed Effects (Intercept with Estimate, Standard Error, DF, t Value, Pr > |t|); other numeric values not preserved in this transcript]

44 Variances are positive! but here, we see that the variation between sections on the same pancreas is estimated as a zero (indicating de facto negative correlation between sections...) This is probably a coincidence, but it indicates a very homogeneous pancreas (in that direction) 44 / 84

45 Model check for logarithmic analysis Normality of scaled residuals? Mean value is only an intercept Normality seems ok 45 / 84

46 Previous example: Calcium supplements A total of year old girls were randomized to receive either calcium or placebo. Outcome: BMD=bone mineral density, in mg cm, 2 ideally measured every 6 months (5 visits), but in reality... Scientific question: Does calcium improve the rate of bone gain for adolescent women? 46 / 84

47 Multi-level structure for longitudinal data
Level 1 covariates (unit: single observations):
- Time itself
- Covariates varying with time: blood pressure, heart rate, age
- Interaction between group and time
If correlation is not taken into account, we ignore the paired situation, leading to low efficiency, i.e. too large P-values (type 2 error):
- Time will appear less important
- Effects may go undetected!

48 Multi-level structure for longitudinal data, II
Level 2 covariates (unit: individuals):
- Treatment
- Gender, age
If correlation is ignored, we act as if we have more information than we actually have, leading to too small P-values (type 1 error):
- Groups will appear more different
- Noise may be taken to be real effects!

49 Previous analyses of this example Response profiles, with unstructured or patterned covariance: 49 / 84

50 Time since randomization
Of course, the girls were not seen at intervals of precisely 6 months...
- Time points are specific to each single girl
- Time 0 is the individual time of the first visit
- visit1, visit2 etc. have no real meaning any more, because they do not refer to the same time point
- Time is in units of years from randomization and is called obstime
Note: The number of measurements decreases over time, due to missing values/dropout.

51 Individual profiles Spaghetti plots 51 / 84

52 Plausible models for BMD data
Mean value structure:
- We need a model for the effect of time, since 5 separate mean values is not possible (not identical times).
- The simplest mean value structure is linearity
Covariance structure:
- We cannot use unstructured covariance, but we can still use random girl levels, corresponding to a compound symmetry covariance.
- A lot of other covariance structures will still be possible, e.g. the non-equidistant analogue to the autoregressive structure:
$$\mathrm{Corr}(Y_{git_1}, Y_{git_2}) = \rho^{|t_1 - t_2|}$$
- A new covariance structure comes from random regression
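The non-equidistant autoregressive structure depends only on the time distance between two measurements. The sketch below builds the correlation matrix for one girl; the observation times and the value of $\rho$ are hypothetical.

```python
def continuous_ar_corr(times, rho):
    """Correlation matrix with entries rho ** |t1 - t2| for the given times."""
    if not 0.0 < rho < 1.0:
        raise ValueError("rho must lie strictly between 0 and 1")
    return [[rho ** abs(t1 - t2) for t2 in times] for t1 in times]


# Hypothetical observation times (years since randomization) for one girl
obstimes = [0.0, 0.5, 1.0, 2.0]
RHO = 0.81  # hypothetical correlation between measurements one year apart

corr = continuous_ar_corr(obstimes, RHO)
for row in corr:
    print(" ".join(f"{value:.3f}" for value in row))
```

Measurements half a year apart get correlation $\rho^{0.5}$, and measurements two years apart get $\rho^2$, so the correlation decays with the time distance, unlike compound symmetry.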

53 Random girl level for calcium example
with linearity in time and common level at baseline (obstime=0)
In SAS:
    proc mixed covtest plots=all data=calcium;
      class grp girl;
      model bmd=obstime grp*obstime / ddfm=kr vciry s cl;
      random intercept / subject=girl(grp) v vcorr;
    run;
- kr could be replaced by satterth
- vciry produces scaled residuals
- Girls are nested in groups, specified by the notation girl(grp)
- v and vcorr are printing options
In R (package nlme):
    library(nlme)
    lme(bmd ~ obstime + grp:obstime, data=calcium, random = ~1 | girl)

54 Random girl level, output from code on p. 53
[Covariance Parameter Estimates: Intercept (subject girl(grp)) and Residual, both with Pr > Z <.0001. Type 3 Tests of Fixed Effects: obstime and obstime*grp, both with Pr > F <.0001. Other numeric values not preserved in this transcript]
No doubt, we see an interaction obstime*grp, but the tests for covariance parameters are not quite trustworthy...

55 Random girl level, output, II
[Solution for Fixed Effects: Intercept, obstime, obstime*grp C, obstime*grp P, with Estimate, Standard Error, DF, t Value, Pr > |t|, Alpha, Lower, Upper; numeric values not preserved in this transcript]
Excess slope in the C-group: 9.16 mg/cm$^3$ extra per year if C-treated, CI=(5.29, 13.03)

56 Model synonyms
- Two-level model
- Model with random subject levels
- Model with random intercepts
- Model with compound symmetry correlation structure
- Model with exchangeability correlation structure
In the following, we shall see generalizations of random intercepts (including also random slopes).

57 Fit a straight line for each girl Scatterplot of slopes vs. levels at first visit, as estimated by individual regressions: Slopes in the Calcium-group (blue dots) seem to be bigger / 84

58 Results from individual regression
Estimates with standard errors in brackets (the group estimates and P-values are not preserved in this transcript):
Group         Level at baseline    Slope
P             ... (9.1)            ... (2.14)
C             ... (8.2)            ... (2.48)
Difference    11.8 (12.3)          8.44 (3.27)
P-value       ...                  ...
NOTE: No restrictions on baseline here

59 Individual growth rates?
The time course is reasonably linear, but maybe the girls have different growth rates (slopes)?
If we let $Y_{git}$ denote BMD for the i'th girl (in the g'th group) at time t (in years), we could look at the model:
$$y_{git} = a_{gi} + b_{gi}\, t + \varepsilon_{git}, \qquad \varepsilon_{git} \sim N(0, \sigma^2)$$
i.e., with different intercepts ($a_{gi}$) and different slopes ($b_{gi}$) for each girl, but...

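60 Random regression
... we bind these individual parameters ($a_{gi}$ and $b_{gi}$) together by normal distributions:
$$\begin{pmatrix} a_{gi} \\ b_{gi} \end{pmatrix} \sim N_2\left( \begin{pmatrix} \alpha \\ \beta_g \end{pmatrix},\ G \right), \qquad G = \begin{pmatrix} \tau_a^2 & \omega \\ \omega & \tau_b^2 \end{pmatrix} = \begin{pmatrix} \tau_a^2 & \rho\,\tau_a \tau_b \\ \rho\,\tau_a \tau_b & \tau_b^2 \end{pmatrix}$$
G describes the population variation of the lines, i.e. the inter-individual variation (reflected by the picture on p. 57).
Note: No subscript on $\alpha$, because the groups are equal at baseline.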

61 Estimation in random regression
keeping levels at baseline equal, by omitting grp in the model-statement:
In SAS:
    proc mixed plots=all data=calcium;
      class grp girl;
      model bmd=obstime grp*obstime / ddfm=kr vciry s cl outpm=fitm outp=fit;
      random intercept obstime / type=un subject=girl g gcorr v vcorr;
    run;
In R (package nlme):
    library(nlme)
    model.rr = lme(bmd ~ obstime + grp:obstime, data=calcium, random = ~obstime | girl)
type=un in the random-statement refers to the matrix G on the previous slide, and the estimate is seen on the next page.

62 Output from random regression
[Estimated G Matrix and Estimated G Correlation Matrix (rows Intercept and obstime), Covariance Parameter Estimates (UN(1,1), UN(2,1), UN(2,2) with subject girl, and Residual), the intercept-slope correlation, and the Fit Statistics line -2 Res Log Likelihood; numeric values not preserved in this transcript]

63 Output II: Estimated covariance and correlation for the 5 visits for one particular girl
[Estimated V Matrix for girl 101 and Estimated V Correlation Matrix for girl 101, each with 5 rows and 5 columns; numeric values not preserved in this transcript]
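The V matrix for one girl follows from the random regression model on p. 59-60: for observation times s and t, the covariance is $\tau_a^2 + (s+t)\rho\tau_a\tau_b + st\,\tau_b^2$, with $\sigma^2$ added on the diagonal. The sketch below evaluates this; the parameter values and the observation times are hypothetical, chosen to give round numbers.

```python
def random_regression_cov(times, tau_a, tau_b, rho, sigma):
    """Marginal covariance matrix of one subject's measurements under a
    random intercept and slope model with residual SD sigma."""
    cov_ab = rho * tau_a * tau_b  # off-diagonal element of G
    matrix = []
    for i, s in enumerate(times):
        row = []
        for j, t in enumerate(times):
            value = tau_a ** 2 + (s + t) * cov_ab + s * t * tau_b ** 2
            if i == j:
                value += sigma ** 2  # residual variance only on the diagonal
            row.append(value)
        matrix.append(row)
    return matrix


# Hypothetical parameters and observation times (years)
V = random_regression_cov(times=[0.0, 1.0, 2.0], tau_a=2.0, tau_b=1.0, rho=0.5, sigma=1.0)
for row in V:
    print(" ".join(f"{value:5.1f}" for value in row))
```

Unlike compound symmetry, the variances on the diagonal are not constant: they change with time whenever the slopes vary between individuals.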

64 Output III: Estimated mean value structure
[Solution for Fixed Effects: Intercept, obstime, obstime*grp C, obstime*grp P, with Estimate, Standard Error, DF, t Value, Pr > |t|, Alpha, Lower, Upper. Type 3 Tests of Fixed Effects: obstime (Pr > F <.0001) and obstime*grp. Other numeric values not preserved in this transcript]
Thus, we find an extra increase in BMD of 8.76 mg/cm$^3$ per year, CI=(2.60, 14.92), when giving calcium supplement, a little less than found on p. 55.

65 Note concerning random regression
It is necessary to allow random intercepts and random slopes to be arbitrarily correlated.
- This is default in R, but not in SAS
- If omitted, we may experience convergence problems, very erroneous results and sometimes totally incomprehensible output.
In this particular case, the correlation between intercept and slope is not that impressive (the value is not preserved in this transcript); the intercept is not completely out of range in this example, since it refers to the baseline.

66 Predicted values from random regression Predicted group means (only systematic/fixed effects): shown for two girls from different groups: 66 / 84

67 Predicted values from random regression, II Individual predictions (fixed and random effects): 67 / 84

68 Model checks
Two types of residuals:
- Ordinary: Observed minus predicted group mean (only systematic/fixed effects)
$$Y_{ij} - X_{ij}^T \hat{\beta}$$
- Conditional: Observed minus predicted individual mean value (systematic and random effects)
$$\hat{\varepsilon}_{ij} = Y_{ij} - (X_{ij}^T \hat{\beta} + Z_{ij}^T \hat{b}_i)$$
Conditional residuals are usually much smaller than the ordinary ones, since they describe deviations from subject-specific predictions, but we don't see this on scaled residuals.
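The difference between the two residual types can be illustrated for a single observation. All numbers in the sketch below (fixed effects, predicted random effects, observation) are hypothetical and do not come from the calcium data.

```python
def dot(u, v):
    """Inner product of two equally long sequences."""
    return sum(a * b for a, b in zip(u, v))


def ordinary_residual(y, x, beta):
    """Observed minus predicted population mean (fixed effects only)."""
    return y - dot(x, beta)


def conditional_residual(y, x, beta, z, b):
    """Observed minus subject-specific prediction (fixed plus random effects)."""
    return y - (dot(x, beta) + dot(z, b))


# Hypothetical values for one observation taken 2 years after baseline
beta_hat = [800.0, 40.0]   # fixed intercept and slope
b_hat = [30.0, 5.0]        # predicted random intercept and slope for this subject
x = [1.0, 2.0]             # design row for the fixed effects
z = [1.0, 2.0]             # design row for the random effects
y = 925.0                  # observed value

print("ordinary residual:   ", ordinary_residual(y, x, beta_hat))
print("conditional residual:", conditional_residual(y, x, beta_hat, z, b_hat))
```

In this constructed example most of the ordinary residual is explained by the subject's own level and slope, which is why the conditional residual is much smaller.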

69 Model check, ordinary residuals 69 / 84

70 Model check, conditional residuals 70 / 84

71 Check of linearity, ordinary residuals 71 / 84

72 Check of linearity, conditional residuals 72 / 84

73 Comments on model checks
- Ordinary residuals (p. 69): Homogeneity of variance OK, evident normality.
- Conditional residuals (p. 70): Homogeneity of variance OK, evident normality.
- Linearity (p. 71): Some deviation from linearity in the ordinary residuals.
- Linearity (p. 72): Some non-systematic deviation from linearity seen in conditional residuals, but somewhat consistently for the two groups.

74 Individual regressions approach
Merits:
- Easy to understand and interpret
Drawbacks:
- Suboptimal in case of unequal sample sizes
- Only simple models feasible
- Difficult/impossible to include covariates
- Only individuals with a sufficient number of observations will supply estimates
- Not possible to account for equal baseline values

75 Random regression approach
Merits:
- Uses all available information
- Optimal procedure if the model holds
- Easy to include covariates
Drawbacks:
- Biased in case of informative missing values (or informative sample sizes)
- Difficult to explain

76 Random regression vs. individual regressions
Slopes, with standard errors in brackets (the group estimates and P-values are not preserved in this transcript):
Group         Individual regressions    Random regression
P             ... (2.14)                ... (2.16)
C             ... (2.48)                ... (2.21)
Difference    8.44 (3.27)               8.76 (3.10)
P-value       ...                       ...
Random regression gives a somewhat steeper slope:
- The girls with flat (and low) profiles tend to have shorter profiles
- These slopes contribute less to the random regression slope because they are less accurate
Is this a coincidence?? Otherwise, we may see an example of informative missing values (last lecture).

77 Models for BMD data
Perhaps feasible:
- Response profiles: Unstructured mean and unstructured covariance (only for balanced data, or almost balanced)
- Compound symmetry covariance/correlation: Synonymous with a random effect/level for each girl
Probably better:
- Autoregressive covariance/correlation (or other patterned covariance structure)
- Random regression: Random effects of both intercept and slope for each girl

78 How can we choose between models?
- Think...
- Graphical assessment of fit, e.g. comparison of predicted profiles with average curves (beware of missing values)
- Inspection of residuals
- Comparing AICs (Akaike's information criterion)
- Tests against more flexible alternatives:
  - Fixed effects tested by the usual output, or by comparing $-2\log L$ for ML-estimation with $\chi^2$-tests.
  - Covariance patterns evaluated by $\chi^2$-tests on $-2\log L$, either ML or REML
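A comparison of two nested covariance structures by $-2\log L$ and by AIC could look like the sketch below. The two $-2\log L$ values and the parameter counts are hypothetical, and the plain $\chi^2$ reference distribution is an approximation: when a variance is tested against zero, on the boundary of the parameter space, it tends to be conservative.

```python
import math


def aic(minus_two_log_lik, n_params):
    """Akaike's information criterion: -2 log L + 2 * number of parameters."""
    return minus_two_log_lik + 2.0 * n_params


def chi2_upper_tail(statistic, df):
    """P(chi-square with df degrees of freedom > statistic), for df = 1 or 2 only."""
    if statistic < 0:
        raise ValueError("statistic must be non-negative")
    if df == 1:
        return math.erfc(math.sqrt(statistic / 2.0))
    if df == 2:
        return math.exp(-statistic / 2.0)
    raise ValueError("only df = 1 and df = 2 are handled in this sketch")


# Hypothetical -2 log L values (same mean structure, REML)
m2ll_random_intercept = 2300.0   # 2 covariance parameters
m2ll_random_regression = 2290.0  # 4 covariance parameters

lr_statistic = m2ll_random_intercept - m2ll_random_regression
p_value = chi2_upper_tail(lr_statistic, df=2)

print(f"likelihood ratio statistic: {lr_statistic:.1f}, P = {p_value:.4f}")
print(f"AIC, random intercept:  {aic(m2ll_random_intercept, 2):.1f}")
print(f"AIC, random regression: {aic(m2ll_random_regression, 4):.1f}")
```

The model with the smaller AIC is preferred; in this constructed example both criteria point towards the random regression model.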

79 The mean value structure Look for: Linearity in scatter plot? Curves in residual plots? Alternatives: Splines More covariates Non-linear models 79 / 84

80 The covariance/correlation structure 1. Random effects: 2. Serial correlation (the pattern) 3. Error of measurement 80 / 84

81 Assumptions in a mixed effects model
- Linearity in covariates, either fixed ($X_{ij}$) or random ($Z_{ij}$)
- Normality of residuals $\varepsilon_{ij}$
- Normality of random effects $b_i$
- Plausibility of covariance structure
- Independence between individuals
- Independence between covariate values $X_{ij}$ and random effects $b_i$, e.g. does the timing and number of measurements relate to the development for the girl?

82 Importance of assumptions
Important:
- Linearity
- Independence between individuals (normally not an issue)
- Independence between $X_{ij}$ and $b_i$
- Appropriateness of the covariance structure (may be circumvented by using the empirical sandwich estimator)
Less important (especially when the number of observations is large):
- Normality of residuals $\varepsilon_{ij}$
- Normality of random effects $b_i$

83 Influential observations
i.e. observations with a large influence on the estimates, either on the mean value or on the covariance parameters.
These observations could have
- an unusual combination of fixed-type covariates ($X_{ij}$)
- large ordinary residuals (from mean value)
- an unusual combination of random-type covariates ($Z_{ij}$)
or it could be a sign of
- bad choice of mean value structure
- bad choice of covariance pattern

84 Cook's distance


Multi-factor analysis of variance Faculty of Health Sciences Outline Multi-factor analysis of variance Basic statistics for experimental researchers 2015 Two-way ANOVA and interaction Mathed samples ANOVA Random vs systematic variation

More information

Answer to exercise: Blood pressure lowering drugs

Answer to exercise: Blood pressure lowering drugs Answer to exercise: Blood pressure lowering drugs The data set bloodpressure.txt contains data from a cross-over trial, involving three different formulations of a drug for lowering of blood pressure:

More information

Linear mixed models. Program. What are repeated measurements? Outline. Faculty of Health Sciences. Analysis of repeated measurements, 10th March 2015

Linear mixed models. Program. What are repeated measurements? Outline. Faculty of Health Sciences. Analysis of repeated measurements, 10th March 2015 university of copenhagen d e pa rt m e n t o f b i o s tat i s t i c s university of copenhagen d e pa rt m e n t o f b i o s tat i s t i c s Program Faculty of Health Sciences Topics: Linear mixed models

More information

Statistics for exp. medical researchers Regression and Correlation

Statistics for exp. medical researchers Regression and Correlation Faculty of Health Sciences Regression analysis Statistics for exp. medical researchers Regression and Correlation Lene Theil Skovgaard Sept. 28, 2015 Linear regression, Estimation and Testing Confidence

More information

Faculty of Health Sciences. Correlated data. More about LMMs. Lene Theil Skovgaard. December 4, / 104

Faculty of Health Sciences. Correlated data. More about LMMs. Lene Theil Skovgaard. December 4, / 104 Faculty of Health Sciences Correlated data More about LMMs Lene Theil Skovgaard December 4, 2015 1 / 104 Further topics Model check and diagnostics Cross-over studies Paired T-tests with missing values

More information

More about linear mixed models

More about linear mixed models Faculty of Health Sciences Contents More about linear mixed models Analysis of repeated measurements, NFA 2016 Julie Lyng Forman & Lene Theil Skovgaard Department of Biostatistics, University of Copenhagen

More information
