Analysis of variance and regression. December 4, 2007


Variance component models
- Variance components
- One-way ANOVA with random variation
  - estimation
  - interpretations
- Two-way ANOVA with random variation
- Crossed random effects
- Ecological analyses

Lene Theil Skovgaard, Dept. of Biostatistics, Institute of Public Health, University of Copenhagen e-mail: L.T.Skovgaard@biostat.ku.dk http://staff.pubhealth.ku.dk/~lts/regression07_2

Variance Component Models, December 2007

Terminology for correlated measurements:
- Multivariate outcome: several outcomes (responses) for each individual, e.g. a number of hormone measurements that we want to study simultaneously.
- Cluster design: the same outcome (response) measured on all individuals in a number of families, villages or school classes.
- Repeated measurements: the same outcome (response) measured in different situations (or at different spots) for the same individual.
- Longitudinal measurements: the same outcome (response) measured consecutively over time for each individual.

Variance component models
Generalisations of ANOVA-type models or regression models, involving several sources of random variation (variance components):
- environmental variation: between regions, hospitals or countries
- biological variation: between individuals, families or animals
- within-individual variation: between arms, teeth, injection sites, days
- variation due to uncontrollable circumstances: time of day, temperature, observer
- measurement error

Typical studies involve data from:
- a number of family members from a sample of households
- pupils from a sample of school classes
- measurements on several spots of each individual
Alternative name (for some of them): multilevel models
- variation on each level (variance component)
- possibly systematic effects (covariates) on each level

Examples of hierarchies:
  individual   context/cluster
  level 1      level 2       level 3
  subjects     twin pairs    countries
  subjects     families      regions
  students     classes       schools
  visits       subjects      centres

Merits
- Certain effects may be estimated more precisely, since some sources of variation are eliminated, e.g. by making comparisons within a family. This is analogous to the paired comparison situation.
- When planning subsequent investigations, knowledge of the relative sizes of the variance components helps in deciding the number of repetitions needed at each level (if possible).

Drawbacks
- When making inference (estimation and testing), it is important to take all sources of variation into account, and effects have to be evaluated against the relevant variation!
- Bias may result if one or more sources of variation are disregarded.

Measurements belonging to the same cluster look alike (are correlated).
If we fail to take this correlation into account, we will experience:
- possible bias in the mean value structure
- low efficiency (type 2 error) for the evaluation of level 1 covariates (within-cluster effects)
- too small standard errors (type 1 error) for estimates of level 2 effects (between-cluster effects)

Concepts of the day:
- advantage/necessity of random effects
- generalisations of ANOVA-type models
Examples with small data sets:
- some of them too small to allow for trustworthy interpretations
- illustrative precisely because of their limited size
Illustrated with SAS PROC MIXED.

One-way analysis of variance with random variation:
Comparison of k groups/clusters, satisfying:
- The groups are not of individual interest, and it is of no interest to test whether they have identical means.
- The groups may be thought of as representatives of a population that we want to describe.
Example: 10 consecutive measurements of blood pressure on a sample of 50 women. We know that the women differ, and we do not care! We only want to learn something about blood pressure in the female population in general.

Example of one-way ANOVA structure:
6 rabbits are vaccinated, each in 6 spots on the back.
Response $Y$: swelling in cm$^2$
Model: swelling = grand mean + rabbit deviation + variation
  $y_{rs} = \mu + \alpha_r + \varepsilon_{rs}, \quad \varepsilon_{rs} \sim N(0, \sigma^2)$,
where $r = 1, \ldots, R = 6$ denotes the rabbit and $s = 1, \ldots, S = 6$ denotes the spot.
The variation can be regarded either as within-rabbit variation or as measurement error (probably a combination of the two).

Rabbit means: $\mu_r = \mu + \alpha_r$

ANOVA table:
             SS        df              MS = SS/df   F
  Between    12.8333   R - 1 = 5       2.5667       4.39
  Within     17.5266   R(S - 1) = 30   0.5842
  Total      30.3599   RS - 1 = 35     0.8674
Test for identical rabbit means: $F = 4.39 \sim F(5, 30)$, P = 0.004.
But: we are not interested in these particular 6 rabbits, only in rabbits in general, as a species! We assume these 6 rabbits to have been randomly selected from the species.

We choose to model rabbit variation instead of rabbit levels:
swelling = grand mean + between-rabbit variation + within-rabbit variation
  $y_{rs} = \mu + a_r + \varepsilon_{rs}$,
where the $a_r$ and the $\varepsilon_{rs}$ are assumed independent and normally distributed with
  $\mathrm{Var}(a_r) = \omega_B^2, \quad \mathrm{Var}(\varepsilon_{rs}) = \sigma_W^2$.
The variation between rabbits has been made random. $\omega_B^2$ and $\sigma_W^2$ are variance components, and the model is also called a two-level model.

Fixed vs. random effects?
Fixed:
- all values of the factor are present (typically only a few, e.g. treatment)
- allows inference for these particular factor values only
- must include a reasonable number of observations for each factor value
Random:
- a representative sample of values of the factor is present
- allows inference to be extended beyond the values in the experiment, to the population of possible factor values (e.g. geographical areas, classes, rabbits)
- is necessary when we have a covariate for this level

Interpretation:
All observations have common mean and variance:
  $y_{rs} \sim N(\mu, \omega_B^2 + \sigma_W^2)$
but measurements made on the same rabbit are correlated, with the intra-class correlation
  $\mathrm{Corr}(y_{r1}, y_{r2}) = \rho = \frac{\omega_B^2}{\omega_B^2 + \sigma_W^2}$
Measurements made on the same rabbit tend to look more alike than measurements made on different rabbits. All measurements on the same rabbit look equally much alike. This correlation structure is called compound symmetry (CS) or exchangeability.
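The intra-class correlation follows directly from the random-effects model $y_{rs} = \mu + a_r + \varepsilon_{rs}$. A short derivation, using only the independence assumptions stated for $a_r$ and $\varepsilon_{rs}$, might read:

```latex
\begin{align*}
\mathrm{Var}(y_{rs})
  &= \mathrm{Var}(a_r) + \mathrm{Var}(\varepsilon_{rs})
   = \omega_B^2 + \sigma_W^2, \\
\mathrm{Cov}(y_{r1}, y_{r2})
  &= \mathrm{Cov}(a_r + \varepsilon_{r1},\; a_r + \varepsilon_{r2})
   = \mathrm{Var}(a_r)
   = \omega_B^2, \\
\rho = \mathrm{Corr}(y_{r1}, y_{r2})
  &= \frac{\mathrm{Cov}(y_{r1}, y_{r2})}
          {\sqrt{\mathrm{Var}(y_{r1})\,\mathrm{Var}(y_{r2})}}
   = \frac{\omega_B^2}{\omega_B^2 + \sigma_W^2}.
\end{align*}
```

The covariance does not depend on which two spots are compared, which is why the structure is exchangeable. For two different rabbits the covariance is zero, since no random term is shared.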

Estimation of variance components
The first step is to determine the expected values of the mean squares (in balanced situations, with S observations per rabbit):
  $E(MS_B) = S\omega_B^2 + \sigma_W^2$
  $E(MS_W) = \sigma_W^2$
From this we get the estimates
  $\hat\sigma_W^2 = MS_W, \quad \hat\omega_B^2 = \frac{MS_B - MS_W}{S}$
In the rabbit example R = S = 6, so the divisor is 6.

Note: it may happen that $\hat\omega_B^2$ becomes negative!
- by coincidence
- as a result of competition between units belonging together, e.g. when measuring yield for plants grown in the same pot
In this case, it will be reported as zero.

Reading in data in SAS:
  data rabbit_orig;
    input spot $ y1-y6;
    cards;
  a 7.9 8.7 7.4 7.4 7.1 8.2
  b 6.1 8.2 7.7 7.1 8.1 5.9
  c 7.5 8.1 6.0 6.4 6.2 7.5
  d 6.9 8.5 6.8 7.7 8.5 8.5
  e 6.7 9.9 7.3 6.4 6.4 7.3
  f 7.3 8.3 7.3 5.8 6.4 7.7
  ;
  run;

  data rabbit;
    set rabbit_orig;
    rabbit=1; swelling=y1; output;
    rabbit=2; swelling=y2; output;
    rabbit=3; swelling=y3; output;
    rabbit=4; swelling=y4; output;
    rabbit=5; swelling=y5; output;
    rabbit=6; swelling=y6; output;
  run;

In SAS, the estimation can be performed as:
  proc mixed data=rabbit;
    class rabbit;
    model swelling = / s;
    random rabbit;
  run;

Covariance Parameter Estimates
  Cov Parm    Estimate
  rabbit      0.3304
  Residual    0.5842

Solution for Fixed Effects
  Effect      Estimate   Standard Error   DF   t Value   Pr > |t|
  Intercept   7.3667     0.2670           5    27.59     <.0001
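As a cross-check on the PROC MIXED output, the moment (ANOVA) estimates can be computed by hand from the rabbit data. The following is a minimal sketch in Python (not part of the original SAS material); the function name and the dictionary keys are illustrative assumptions. In a balanced design like this one, the moment estimates should agree with the reported values whenever the between-rabbit estimate is positive.

```python
# Rabbit swelling data: one row per spot (a-f), one column per rabbit (1-6),
# copied from the SAS data step.
SWELLING = [
    [7.9, 8.7, 7.4, 7.4, 7.1, 8.2],
    [6.1, 8.2, 7.7, 7.1, 8.1, 5.9],
    [7.5, 8.1, 6.0, 6.4, 6.2, 7.5],
    [6.9, 8.5, 6.8, 7.7, 8.5, 8.5],
    [6.7, 9.9, 7.3, 6.4, 6.4, 7.3],
    [7.3, 8.3, 7.3, 5.8, 6.4, 7.7],
]


def one_way_variance_components(rows):
    """Moment estimates for a balanced one-way random-effects layout."""
    groups = [list(column) for column in zip(*rows)]      # one list per rabbit
    n_groups, n_per_group = len(groups), len(groups[0])   # R and S
    n_total = n_groups * n_per_group
    grand_mean = sum(map(sum, groups)) / n_total
    group_means = [sum(g) / n_per_group for g in groups]
    ss_between = n_per_group * sum((m - grand_mean) ** 2 for m in group_means)
    ss_within = sum((y - m) ** 2
                    for g, m in zip(groups, group_means) for y in g)
    ms_between = ss_between / (n_groups - 1)
    ms_within = ss_within / (n_groups * (n_per_group - 1))
    # A negative moment estimate is reported as zero.
    omega2_b = max((ms_between - ms_within) / n_per_group, 0.0)
    # Standard error of the overall mean under the random-effects model.
    se_mean = (omega2_b / n_groups + ms_within / n_total) ** 0.5
    return {
        "grand_mean": grand_mean,
        "ms_between": ms_between,
        "sigma2_w": ms_within,
        "omega2_b": omega2_b,
        "se_mean": se_mean,
    }


estimates = one_way_variance_components(SWELLING)
for name, value in estimates.items():
    print(f"{name:12s} {value:8.4f}")
```

The printed values can be compared line by line with the covariance parameter estimates and the intercept row above.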

Interpretation of variance components:
  Variation   Variance component          Estimate   Proportion of variation
  Between     $\omega_B^2$                0.3304     36%
  Within      $\sigma_W^2$                0.5842     64%
  Total       $\omega_B^2 + \sigma_W^2$   0.9146     100%
Typical differences (95% prediction intervals):
- for spots on the same rabbit: $\pm 2\sqrt{2 \times 0.5842} = \pm 2.16$ cm$^2$
- for spots on different rabbits: $\pm 2\sqrt{2 \times 0.9146} = \pm 2.70$ cm$^2$
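The arithmetic behind the proportions and the typical differences can be sketched as follows (Python, illustrative names only). The factor 2 under the square root reflects that a difference between two measurements has twice the variance of a single measurement.

```python
import math

OMEGA2_B = 0.3304  # between-rabbit variance, from the PROC MIXED output
SIGMA2_W = 0.5842  # within-rabbit variance, from the PROC MIXED output
TOTAL = OMEGA2_B + SIGMA2_W


def typical_difference(variance):
    """Half-width of an approximate 95% prediction interval for a difference."""
    return 2 * math.sqrt(2 * variance)


share_between = OMEGA2_B / TOTAL
share_within = SIGMA2_W / TOTAL
# Same rabbit: the rabbit effect cancels, only within-rabbit variation is left.
diff_same_rabbit = typical_difference(SIGMA2_W)
# Different rabbits: both variance components contribute.
diff_other_rabbit = typical_difference(TOTAL)

print(f"between: {share_between:.0%}, within: {share_within:.0%}")
print(f"same rabbit: +/-{diff_same_rabbit:.2f}, "
      f"different rabbits: +/-{diff_other_rabbit:.2f}")
```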

Interpretation of the size of the variance components:
Approx. 2/3 of the variation in the measurements comes from the variation within rabbits. Maybe there is a systematic difference between the injection spots?
Two-way ANOVA:
  Source   DF   Type III SS   Mean Square   F Value   Pr > F
  rabbit   5    12.833333     2.566667      4.69      0.0037
  spot     5    3.833333      0.766667      1.40      0.2584
It does not look as if there is any systematic difference between the spots (P = 0.26).

Design considerations
Imaginary experiment with measurements on R rabbits and S spots for each rabbit:
  $\mathrm{Var}(\bar{y}) = \frac{\omega_B^2}{R} + \frac{\sigma_W^2}{RS}$
[Figure: standard error of $\bar{y}$ (about 0.15 to 0.45) against the number of rabbits (5 to 20), one curve for each number of spots S = 1, ..., 10]
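The design formula is easy to tabulate. The sketch below (Python; the function name and the chosen grid of R and S are assumptions for illustration) plugs in the rabbit estimates as if they were the true variance components.

```python
import math


def se_overall_mean(n_rabbits, n_spots, omega2_b=0.3304, sigma2_w=0.5842):
    """Standard error of the overall mean with R rabbits and S spots each."""
    return math.sqrt(omega2_b / n_rabbits + sigma2_w / (n_rabbits * n_spots))


# Small table: rows are numbers of rabbits, columns are numbers of spots.
spots = (1, 2, 6, 10)
print("R \\ S " + "".join(f"{s:>8d}" for s in spots))
for rabbits in (5, 10, 15, 20):
    row = "".join(f"{se_overall_mean(rabbits, s):8.3f}" for s in spots)
    print(f"{rabbits:<6d}{row}")
```

Under these assumptions, 36 measurements spread over 12 rabbits with 3 spots each give a smaller standard error than the actual design with 6 rabbits and 6 spots each, because extra spots only reduce the within-rabbit term.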

Effective sample size
If we had only one observation for each of k rabbits, how many would we need to obtain the same precision?
  $k = \frac{RS}{1 + \rho(S - 1)}$
We have here
  $\rho = \frac{\omega_B^2}{\omega_B^2 + \sigma_W^2} = \frac{0.3304}{0.3304 + 0.5842} = 0.361$, so $k = 12.8$.
Effectively, we have only approximately two independent observations from each rabbit!
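The effective sample size calculation can be written out as a small function (a Python sketch with illustrative names), using the estimates from the rabbit example:

```python
def effective_sample_size(n_clusters, cluster_size, icc):
    """Number of independent observations giving the same precision."""
    return n_clusters * cluster_size / (1 + icc * (cluster_size - 1))


OMEGA2_B = 0.3304  # between-rabbit variance
SIGMA2_W = 0.5842  # within-rabbit variance

rho = OMEGA2_B / (OMEGA2_B + SIGMA2_W)
k = effective_sample_size(n_clusters=6, cluster_size=6, icc=rho)

print(f"rho = {rho:.3f}, k = {k:.1f}, per rabbit = {k / 6:.1f}")
```

With a correlation of zero the function returns all 36 observations, and with a correlation of one it returns the number of rabbits, which are the two natural limits.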

Quantification of overall swelling
  Method               Estimate (s.e.)
  1: forget rabbit     7.367 (0.155)
  2: fixed rabbit      7.367 (0.127)
  3: rabbit averages   7.367 (0.267)
  4: random rabbit     7.367 (0.267)
1. We pool all 36 measurements, mixing up the two variance components and assuming independence.
2. We estimate the mean swelling of exactly these 6 rabbits (using only within-rabbit variation).
3. We only look at averages for each rabbit (ecological analysis).
4. We estimate the mean swelling of rabbits as a species (the correct approach).
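In this balanced design the standard errors in the table can be traced back to the three mean squares of the one-way ANOVA table. A minimal Python sketch (variable names are illustrative):

```python
import math

N = 36                # total number of measurements (6 rabbits x 6 spots)
MS_TOTAL = 0.8674     # total mean square: ignores the rabbit structure
MS_WITHIN = 0.5842    # within-rabbit mean square
MS_BETWEEN = 2.5667   # between-rabbit mean square

# 1: forget rabbit, all 36 values treated as independent.
se_forget = math.sqrt(MS_TOTAL / N)
# 2: fixed rabbit, only within-rabbit variation is used.
se_fixed = math.sqrt(MS_WITHIN / N)
# 3 and 4: rabbit averages / random rabbit, evaluated against the
# between-rabbit mean square (these coincide because the data are balanced).
se_random = math.sqrt(MS_BETWEEN / N)

print(f"forget: {se_forget:.3f}, fixed: {se_fixed:.3f}, random: {se_random:.3f}")
```

The three methods use the same estimate and differ only in which variation the mean is evaluated against.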

Estimation of individual rabbit means:
- Simple averages $\bar{y}_{r\cdot}$ rely on the individual's own measurements only.
- BLUPs (or EBLUPs, empirical best linear unbiased predictors) rely on the assumption that individuals come from the same population, and become weighted averages
    $\frac{\omega_B^2}{\omega_B^2 + \sigma_W^2/S}\,\bar{y}_{r\cdot} + \frac{\sigma_W^2/S}{\omega_B^2 + \sigma_W^2/S}\,\bar{y}_{\cdot\cdot}$
  which have been shrunk towards the overall mean $\bar{y}_{\cdot\cdot}$.
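To see the shrinkage in numbers, the weighted-average formula can be applied to the rabbit data. This is a sketch in Python under two assumptions: the variance components are taken as the PROC MIXED estimates (0.3304 and 0.5842), and the overall mean is the plain grand mean, which is only appropriate because the data are balanced. Function and variable names are illustrative.

```python
# Measurements per rabbit (columns y1-y6 of the SAS data step).
RABBITS = {
    1: [7.9, 6.1, 7.5, 6.9, 6.7, 7.3],
    2: [8.7, 8.2, 8.1, 8.5, 9.9, 8.3],
    3: [7.4, 7.7, 6.0, 6.8, 7.3, 7.3],
    4: [7.4, 7.1, 6.4, 7.7, 6.4, 5.8],
    5: [7.1, 8.1, 6.2, 8.5, 6.4, 6.4],
    6: [8.2, 5.9, 7.5, 8.5, 7.3, 7.7],
}
OMEGA2_B = 0.3304  # between-rabbit variance (plug-in estimate)
SIGMA2_W = 0.5842  # within-rabbit variance (plug-in estimate)


def blup_means(groups, omega2_b, sigma2_w):
    """Shrink each group average towards the overall mean (balanced data)."""
    values = [y for ys in groups.values() for y in ys]
    overall = sum(values) / len(values)
    predictions = {}
    for label, ys in groups.items():
        average = sum(ys) / len(ys)
        weight = omega2_b / (omega2_b + sigma2_w / len(ys))
        predictions[label] = weight * average + (1 - weight) * overall
    return overall, predictions


overall_mean, blups = blup_means(RABBITS, OMEGA2_B, SIGMA2_W)
for label, ys in RABBITS.items():
    print(f"rabbit {label}: average {sum(ys) / len(ys):.3f}, "
          f"BLUP {blups[label]:.3f}")
```

Every prediction lies between the rabbit's own average and the overall mean; the more measurements a rabbit has, the larger the weight on its own average.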

[Figure: simple rabbit averages and the corresponding BLUPs, points labelled by rabbit number]

When the 3 smallest measurements from rabbit 2 (the rabbit with the largest level) are omitted, the results become:
  Method                             Estimate (s.e.)
  1: forget rabbit                   7.291 (0.163)
  2: fixed rabbit                    7.291 (0.136)
  3a: rabbit averages (weighted)     7.291 (0.265)
  3b: rabbit averages (unweighted)   7.436 (0.333)
  4: random rabbit                   7.390 (0.298)
  reference (full data)              7.367 (0.267)
- 1: we have omitted some of the largest observations
- 2 + 3a: rabbit 2 has a lower weight in the average, since it has only 3 observations
- 3b: the average for rabbit 2 has increased
- 4: rabbit 2 has a lower weight in the average, due to a larger standard error
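The pooled and unweighted estimates for the reduced data set can be reproduced directly from the data. The sketch below (Python, illustrative names) covers methods 1 and 3b only; methods 2, 3a and 4 need a model fit and are not attempted here.

```python
import math
import statistics

# Measurements per rabbit (columns y1-y6 of the SAS data step).
RABBITS = {
    1: [7.9, 6.1, 7.5, 6.9, 6.7, 7.3],
    2: [8.7, 8.2, 8.1, 8.5, 9.9, 8.3],
    3: [7.4, 7.7, 6.0, 6.8, 7.3, 7.3],
    4: [7.4, 7.1, 6.4, 7.7, 6.4, 5.8],
    5: [7.1, 8.1, 6.2, 8.5, 6.4, 6.4],
    6: [8.2, 5.9, 7.5, 8.5, 7.3, 7.7],
}

reduced = {label: list(ys) for label, ys in RABBITS.items()}
reduced[2] = sorted(reduced[2])[3:]  # drop the 3 smallest values of rabbit 2

all_values = [y for ys in reduced.values() for y in ys]
rabbit_means = [statistics.mean(ys) for ys in reduced.values()]

# Method 1: pool all 33 measurements and assume independence.
pooled_mean = statistics.mean(all_values)
pooled_se = statistics.stdev(all_values) / math.sqrt(len(all_values))
# Method 3b: unweighted average of the six rabbit averages.
unweighted_mean = statistics.mean(rabbit_means)
unweighted_se = statistics.stdev(rabbit_means) / math.sqrt(len(rabbit_means))

print(f"1:  {pooled_mean:.3f} ({pooled_se:.3f})")
print(f"3b: {unweighted_mean:.3f} ({unweighted_se:.3f})")
```

The random-rabbit estimate (7.390) falls between these two, as expected when rabbit 2 gets a weight between "proportional to its 3 observations" and "equal to the other rabbits".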

EBLUPs for the reduced data set:
[Figure: simple rabbit averages and the corresponding EBLUPs, points labelled by rabbit number]
Larger shrinkage than before for rabbit no. 2.

Confidence limits for the variance components:
- Intra-individual variation: $0.373 < \sigma_W^2 < 1.044$
- Inter-individual variation: $0.057 < \omega_B^2 < 2.48$
So, we should take care not to over-interpret...

We imagine that the rabbits are grouped in two (grp=1,2):
  proc mixed data=rabbit;
    class grp rabbit;
    model swelling = grp / s;
    random rabbit(grp);
  run;

  Cov Parm      Estimate
  rabbit(grp)   0.3633     <--- this changes
  Residual      0.5842     <--- this stays the same

Solution for Fixed Effects
  Effect      grp   Estimate   Standard Error   DF   t Value   Pr > |t|
  Intercept         7.1444     0.3919           4    18.23     <.0001
  grp         1     0.4444     0.5542           4    0.80      0.4675
  grp         2     0          .                .    .         .

Such a comparison cannot be performed in the usual way (ignoring the rabbits), since we would then perform the comparison/test against the wrong variation. The risk of a type 1 error is inflated!
  proc glm data=rabbit;
    class grp;
    model swelling=grp / solution;
  run;

  Parameter     Estimate        T for H0: Parameter=0   Pr > |T|   Std Error of Estimate
  INTERCEPT     7.144444444 B   33.06                   0.0001     0.21610872
  GRP       1   0.444444444 B   1.45                    0.1551     0.30562388
            2   0.000000000 B   .                       .          .

Two-level model:
  Level   Unit          Variation         Covariates
  1       rabbit*spot   within rabbit     spot
  2       rabbit        between rabbits   group, overall mean
When the random rabbit variation is ignored:
- low efficiency (type 2 error) for the evaluation of level 1 covariates (spot)
- too small standard errors (type 1 error) for estimates of level 2 effects (group, overall mean)

Factor diagrams:
In the traditional one-way ANOVA:
  [I] = [R*S] → [R] → 0
In case of grouping:
  [I] = [R*S] → [R] → G → 0
We have here used the notation
- arrows, indicating simplifications / groupings
- [ ], for the random effects, corresponding to variance components on the various levels

Example: number of nuclei per cell in the rat pancreas, used for the evaluation of cytostatics
(Henrik Winther Nielsen, Inst. Med. Anat.)
- 4 rats (R)
- 3 sections for each rat (S)
- 5 randomly chosen fields from each section (F)
3-level model:
  Level                fields       sections   rats
  Variance component   $\sigma^2$   $\tau^2$   $\omega^2$
Factor diagram:
  [I] = [R*S*F] → [R*S] → [R] → 0

[Figure: plot of the data with plotting symbols 1, 2 and 3; axes not recoverable]

Covariance Parameter Estimates
  Cov Parm       Estimate
  rat            0.01787
  section(rat)   0.002878
  Residual       0.1968

Solution for Fixed Effects
  Effect      Estimate   Standard Error   DF   t Value   Pr > |t|
  Intercept   0.7935     0.08936          3    8.88      0.0030

Estimation of variance components
  Variation   Variance component                Estimate   Proportion of variation
  Rats        $\omega^2$                        0.0179     8.2%
  Sections    $\tau^2$                          0.0029     1.3%
  Fields      $\sigma^2$                        0.1968     90.4%
  Total       $\omega^2 + \tau^2 + \sigma^2$    0.2176     100%

Typical differences:
- for sections on different rats: $\pm 2\sqrt{2 \times (0.0179 + 0.0029 + 0.1968)} = \pm 1.319$
- for different sections on the same rat: $\pm 2\sqrt{2 \times (0.0029 + 0.1968)} = \pm 1.264$
- for different fields on the same section: $\pm 2\sqrt{2 \times 0.1968} = \pm 1.255$

The correlation between two measurements on the same rat becomes:
- if they are measured on the same section:
    $\mathrm{Corr}(y_{rs1}, y_{rs2}) = \frac{\omega^2 + \tau^2}{\omega^2 + \tau^2 + \sigma^2} = 0.096$
- if they are measured on different sections:
    $\mathrm{Corr}(y_{r11}, y_{r22}) = \frac{\omega^2}{\omega^2 + \tau^2 + \sigma^2} = 0.082$
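The three-level quantities (proportions, typical differences and correlations) all follow from the same three estimates. A compact Python sketch, with illustrative names and the rounded estimates from the table of variance components:

```python
import math

OMEGA2 = 0.0179  # between rats
TAU2 = 0.0029    # between sections within a rat
SIGMA2 = 0.1968  # between fields within a section
TOTAL = OMEGA2 + TAU2 + SIGMA2


def typical_difference(variance):
    """Half-width of an approximate 95% prediction interval for a difference."""
    return 2 * math.sqrt(2 * variance)


shares = {"rats": OMEGA2 / TOTAL,
          "sections": TAU2 / TOTAL,
          "fields": SIGMA2 / TOTAL}

# Components shared by both measurements cancel in the difference.
diff_other_rat = typical_difference(OMEGA2 + TAU2 + SIGMA2)
diff_other_section = typical_difference(TAU2 + SIGMA2)
diff_other_field = typical_difference(SIGMA2)

# Components shared by both measurements drive the correlation.
corr_same_section = (OMEGA2 + TAU2) / TOTAL
corr_other_section = OMEGA2 / TOTAL

for level, share in shares.items():
    print(f"{level:9s} {share:6.1%}")
print(f"{diff_other_rat:.3f} {diff_other_section:.3f} {diff_other_field:.3f}")
print(f"{corr_same_section:.3f} {corr_other_section:.3f}")
```

Because the field-to-field variation dominates, the three typical differences are almost identical, and both correlations are small.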

Examples of hierarchies:
  individual   cluster
  level 1      level 2       level 3
  spots        rabbits
  fields       sections      rats
  subjects     twin pairs    countries
  subjects     families      regions
  students     classes       schools
  visits       subjects      centres
On all levels, we may have random variation (random effects or variance components), as well as covariates.

Average profiles
Example: 2 groups of dogs (5 and 6 dogs, respectively).
Outcome: osmolality, measured at 4 different times (with treatments along the way).

Do we have repetitions?

Residual plot (after suitable analysis):
We see a clear trumpet shape, corresponding to the fact that dogs with a high level also vary more than dogs with a low level.
Solution: make a logarithmic transformation!

Profiles on logarithmic scale, with corresponding residual plot:

Two-level model:
  Level   Unit       Variation      Covariates
  1       dog*time   within dogs    grp*time, time
  2       dog        between dogs   group, overall mean

  proc mixed data=dogs;
    class grp time dog;
    model losmol=grp time grp*time / outpm=fit1 ddfm=satterth;
    random dog(grp);
  run;

Class Level Information
  Class   Levels   Values
  grp     2        1 2
  time    4        50 100 170 290
  dog     11       1 2 3 4 5 6 7 8 9 10 11

Covariance Parameter Estimates
  Cov Parm   Estimate
  dog(grp)   0.06587
  Residual   0.03554

Type 3 Tests of Fixed Effects
  Effect     Num DF   Den DF   F Value   Pr > F
  grp        1        9        2.85      0.1257
  time       3        27       21.35     <.0001
  grp*time   3        27       2.50      0.0805
P = 0.08 for the test of interaction, i.e. no convincing indication of an interaction.

When there is no interaction, we simply omit the term from the model (but we could also just use averages, since the design is balanced):
  proc mixed covtest data=dogs;
    class grp time dog;
    model losmol=grp time / outpm=fit2 ddfm=satterth s;
    random dog(grp);
  run;

Covariance Parameter Estimates
  Cov Parm   Estimate   Standard Error   Z Value   Pr Z
  dog(grp)   0.06453    0.03534          1.83      0.0339
  Residual   0.04088    0.01056          3.87      <.0001

Solution for Fixed Effects
  Effect      grp   time   Estimate   Standard Error   DF   t Value   Pr > |t|
  Intercept                0.5422     0.1235           9    4.39      0.0017
  grp         1            0.2795     0.1656           9    1.69      0.1257
  grp         2            0          .                .    .         .
  time              50     0.1215     0.08621          30   1.41      0.1691
  time              110    -0.2173    0.08621          30   -2.52     0.0173
  time              170    -0.4608    0.08621          30   -5.35     <.0001
  time              290    0          .                .    .         .

Type 3 Tests of Fixed Effects
  Effect   Num DF   Den DF   F Value   Pr > F
  grp      1        9        2.85      0.1257
  time     3        30       17.66     <.0001

In contrast, if we forget the random dog effect and perform a traditional two-way ANOVA:
  proc glm data=dogs;
    class grp time;
    model losmol=grp time / solution;
  run;

The GLM Procedure
Dependent Variable: losmol
  Source   DF   Type III SS   Mean Square   F Value   Pr > F
  grp      1    0.85213853    0.85213853    8.48      0.0059
  time     3    2.16564038    0.72188013    7.19      0.0006

  Parameter    Estimate          Standard Error   t Value   Pr > |t|
  Intercept    0.5421708688 B    0.10504427       5.16      <.0001
  grp  1       0.2794864906 B    0.09595797       2.91      0.0059
  grp  2       0.0000000000 B    .                .         .
  time 50      0.1214934244 B    0.13514314       0.90      0.3742
  time 110     -.2172752206 B    0.13514314       -1.61     0.1160
  time 170     -.4608255057 B    0.13514314       -3.41     0.0015
  time 290     0.0000000000 B    .                .         .
- Type 2 error for the effect of time (level 1 covariate): time is evaluated in an unpaired fashion.
- Type 1 error for the effect of grp (level 2 covariate): we think we have more information than we actually have (we disregard the correlation).

Factor diagram:
[Factor diagram with nodes [I] = [Dog*Time], [Dog], Grp*Time, Grp and Time]
We note the following:
- The effect of GRP*TIME is evaluated against DOG*TIME.
- If GRP*TIME is not considered significant, we thereafter evaluate
  - TIME against DOG*TIME
  - GRP against DOG, also called DOG(GRP)

Interpretation of group effect:
The estimated group difference is 0.279 (s.e. 0.166), corresponding to an approximate 95% confidence interval (estimate ± 2 s.e.) of (-0.053, 0.611).
But this is on a logarithmic scale!
We perform a back transformation with the exponential function and may conclude that group 1 lies exp(0.279) = 1.322 times higher than group 2, i.e. 32.2% higher.
The 95% confidence interval is (exp(-0.053), exp(0.611)) = (0.948, 1.842).
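The back transformation can be checked with a few lines of Python (a sketch with illustrative names). It assumes, as the use of the exponential function implies, that the outcome was transformed with the natural logarithm, and it uses estimate ± 2 standard errors for the interval.

```python
import math

ESTIMATE = 0.279  # group difference on the log scale
STD_ERR = 0.166   # its standard error

# Approximate 95% confidence limits on the log scale.
log_lower = ESTIMATE - 2 * STD_ERR
log_upper = ESTIMATE + 2 * STD_ERR

# Back-transformed: ratio of group 1 to group 2 on the original scale.
ratio = math.exp(ESTIMATE)
ratio_lower = math.exp(log_lower)
ratio_upper = math.exp(log_upper)

print(f"log scale: {ESTIMATE:.3f} ({log_lower:.3f}, {log_upper:.3f})")
print(f"ratio:     {ratio:.3f} ({ratio_lower:.3f}, {ratio_upper:.3f})")
```

Since the interval for the ratio contains 1, it is compatible with no difference between the groups, in line with P = 0.13 for the group effect.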

Example of a non-hierarchical model:
Visual acuity: time in msec. from a stimulus (light flash) to the electrical response at the back of the cortex, measured for
- 7 individuals (patient)
- 2 eyes for each individual (eye)
- 4 lens magnifications (power) for each eye
[Figure: plot of the data, points labelled by patient number]
Crowder & Hand (1990)

Predictors of visual acuity:
Main effects:
- systematic (mean value): eye, power
- random: patient
Interactions:
- systematic (mean value): eye*power
- random: patient*eye, patient*power
Example of crossed random factors.

  proc mixed data=visual;
    class patient eye power;
    model acuity=eye power eye*power / ddfm=satterth;
    random patient patient*eye patient*power;
  run;

  Cov Parm        Estimate
  patient         20.2857
  patient*eye     11.6845
  patient*power   4.0238
  Residual        12.8333

Type 3 Tests of Fixed Effects
  Effect      Num DF   Den DF   F Value   Pr > F
  eye         1        6        0.78      0.4112
  power       3        18       2.25      0.1177
  eye*power   3        18       1.06      0.3925

Factor diagram:
[Factor diagram with nodes [I] = [Pa*Ey*Po], [Pa*Ey], [Pa*Po], [Pa], Ey*Po, Ey, Po and 0]

  Level   Unit                  Covariates
  1       single measurements   Ey*Po
  2       interactions
  2A        [Pa*Ey]             Ey
  2B        [Pa*Po]             Po
  2       individuals, [Pa]     overall level


Blood pressure and social inequity
15569 women in 17 regions of Malmö
Covariates:
- Individual (level 1):
  - low educational achievement ($x_1$) (less than 9 years of school)
  - age group ($x_2$)
- Regional (level 2):
  - rate of people with low educational achievement ($z_1$), from the Skåne Council Statistics Office

Ecological analysis
Level 2 analysis (analysis of regional averages):
- $Y$: average blood pressure in residential area
- $Z_1$: rate of people with low educational achievement
Estimate of regression coefficient: 4.655 (1.420)
It seems to be an important explanatory variable!?

[Figure: regions plotted as circles]
Size of circle indicates size of investigation.
Obvious effect of aggregated covariate.


Estimates from variance component model:
  Covariates         Individual low    Variation between   Rate of low       Variation between
  in model           education $x_1$   individuals         education $z_1$   regions
                                       $\sigma_W^2$                          $\omega_B^2$
  none               -                 96.034              -                 0.347
  age                -                 92.213              -                 0.258
  $x_1$, age         1.152 (0.170)     91.830              -                 0.143
  $z_1$, age         -                 91.484              4.058 (1.345)     0.121
  $x_1$, $z_1$, age  1.093 (0.167)     91.256              2.966 (1.250)     0.087

We note the following:
- Region as a random effect could only account for 0.4% of the variation in blood pressure!
- The ecological variable ("Rate of low-income") will have very little impact!
- The ecological analysis sums up the two effects, but is not able to distinguish between them:
  - it overestimates the level 2 effect
  - it cannot be interpreted as a level 1 effect
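The 0.4% figure is the share of the total variance that lies between regions in the model without covariates. A minimal Python sketch of that calculation, using the estimates from the table of variance component models (names are illustrative):

```python
def between_share(omega2_b, sigma2_w):
    """Proportion of the total variance attributable to the cluster level."""
    return omega2_b / (omega2_b + sigma2_w)


# Model without covariates: variation between individuals and between regions.
share_none = between_share(omega2_b=0.347, sigma2_w=96.034)
# Model with x1, z1 and age.
share_full = between_share(omega2_b=0.087, sigma2_w=91.256)

print(f"no covariates:   {share_none:.2%}")
print(f"x1, z1 and age:  {share_full:.2%}")
```

With so little variation at the regional level, a regional covariate has very little left to explain, however clear its effect looks in the analysis of regional averages.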


Covariate effects on level 1 and level 2 can be very different.
Example: reading ability, as a function of age and cohort:
[Figure: reading ability as a function of age and cohort]

  Misspecification        Result
  missing random effect   - type 2 error for x (unpaired)
                          - type 1 error for z (too many df's, wrong variation)
  missing z               - estimate of $\omega_B^2$ too big
                          - estimate of $\sigma_W^2$ perhaps too big (in unbalanced designs)
  missing x               - estimate of $\omega_B^2$ too big or too small
                          - estimate of $\sigma_W^2$ too big

Simulated data: random effect of individual
[Figure: y (0 to 14) against p (1.0 to 4.0)]

Estimates:
  Level   Variation             Standard deviation
  1       within individuals    $\hat\sigma_W = 1.59$
  2       between individuals   $\hat\omega_B = 1.23$

Individual (level 1) covariate x, e.g. time/age:
[Figure: y (0 to 14) against x (2 to 14)]

Estimates:
  Level   Variation             Standard deviation      Regression coefficient
  1       within individuals    $\hat\sigma_W = 0.41$   $\hat\beta_x = 1.028\ (0.046)$
  2       between individuals   $\hat\omega_B = 4.43$   -

Addition of a level 2 covariate z, e.g. age:
[Figure: y (0 to 14) against x (2 to 12)]

Estimates:
  Level   Variation             Standard deviation      Regression coefficient
  1       within individuals    $\hat\sigma_W = 0.41$   $\hat\beta_x = 1.033\ (0.046)$
  2       between individuals   $\hat\omega_B = 1.14$   $\hat\beta_z = 1.316\ (0.206)$

Comparison of estimates:
          within individual                      between individual
  Model   $\hat\beta_x$    sd, $\hat\sigma_W$    $\hat\beta_z$     sd, $\hat\omega_B$
  -       -                1.59                  -                 1.23
  x       1.028 (0.046)    0.41                  -                 4.43
  z       -                1.59                  -0.284 (0.201)    1.03
  x, z    1.033 (0.046)    0.41                  -1.316 (0.206)    1.14

Example: suicide and religion
Ecological analysis of regions: % suicides increases with % protestants, i.e. Protestants are more likely to commit suicide.
Or??

  Level   Unit          Variation                       Covariates
  1       individuals   within region, $\sigma_W^2$     religion, x
  2       regions       between regions, $\omega_B^2$   % protestants, z
True explanation: interaction between the individual effect (x) and the region covariate (z).
More suicides among Catholics in regions with many Protestants.