Oct Analysis of variance models. One-way anova. Three sheep breeds. Finger ridges. Random and. Fixed effects model. The random effects model


Damian Nichols


One-way anova

Consider $N = n_0 + n_1 + \dots + n_{k-1}$ observations, which form $k$ groups, of sizes $n_0, n_1, \dots, n_{k-1}$. The $r$-th group has sample mean $\bar{Y}_r$. The overall mean (for all groups combined) is $\bar{Y} = N^{-1} \sum_i Y_i$.

Observations in the $r$-th group are $N(m_r, \sigma_W^2)$, for $r = 0$ to $k-1$:

$$Y = b_0 + b_1 X_1 + \dots + b_{k-1} X_{k-1} + e$$

where $e$ is $N(0, \sigma_W^2)$, and $X_1, \dots, X_{k-1}$ are indicator variables.

Indicator variables

For an observation in group 0, all indicator variables are zero. Otherwise, for $r \ge 1$, the value of $X_r$ for an observation in the $r$-th group is 1, and all other indicators are zero.

In terms of group means, the regression parameters are $b_0 = m_0$ and $b_r = m_r - m_0$ for $r = 1, \dots, k-1$.

The null hypothesis $m_0 = m_1 = \dots = m_{k-1}$ (no difference among groups) is equivalent to the standard multiple regression hypothesis $b_1 = b_2 = \dots = b_{k-1} = 0$.
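As a rough sketch of this coding in R, the design matrix for the sheep breed factor used later in these slides can be inspected with `model.matrix`. The explicit `levels` argument is an assumption made here so that Blackface plays the role of group 0; it is not part of the original listing.

```r
# Breed labels for 16 sheep in groups of size 5, 5 and 6.
# Assumption: levels are ordered so that Blackface is group 0 (the baseline).
breed <- factor(rep(c("Blackface", "Welsh", "Cross"), c(5, 5, 6)),
                levels = c("Blackface", "Welsh", "Cross"))

# Design matrix: an intercept column plus one indicator per non-baseline group
X <- model.matrix(~ breed)
colnames(X)

# Every Blackface row has both indicators equal to zero
unique(X[breed == "Blackface", , drop = FALSE])

# Every Welsh row has X1 = 1 and X2 = 0
unique(X[breed == "Welsh", , drop = FALSE])
```

With R's default treatment contrasts, the first factor level is always the group whose indicators are all zero.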

Parameter estimates

Estimates of the parameters are

$$\hat{b}_0 = \bar{Y}_0, \qquad \hat{b}_r = \bar{Y}_r - \bar{Y}_0 \quad \text{for } r = 1, \dots, k-1.$$

The fitted value for an observation in the $r$-th group is

$$\hat{Y} = \hat{b}_0 + \hat{b}_1 X_1 + \dots + \hat{b}_{k-1} X_{k-1} = \bar{Y}_r$$

(the sample mean for the $r$-th group).
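A minimal sketch checking these estimates in R, using the sheep data from the R listing later in these slides. The `levels` ordering (Blackface as group 0) and the helper name `means` are assumptions made for this illustration.

```r
breed <- factor(rep(c("Blackface", "Welsh", "Cross"), c(5, 5, 6)),
                levels = c("Blackface", "Welsh", "Cross"))
Cu <- c(6.5, 7.9, 7.4, 6.8, 8.1, 10.4, 9.8, 11.1, 10.6, 9.2,
        6.9, 9.2, 8.4, 7.6, 9.7, 8.9)

fit <- lm(Cu ~ breed)

# Group sample means, in level order
means <- sapply(split(Cu, breed), mean)

# b0 is the baseline mean; the other coefficients are differences from it
b_hand <- c(means[1], means[2] - means[1], means[3] - means[1])
all.equal(as.numeric(coef(fit)), as.numeric(b_hand))

# Each fitted value is the sample mean of the observation's own group
all.equal(as.numeric(fitted(fit)), as.numeric(means[as.integer(breed)]))
```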

Between-groups sum of squares

The regression sum of squares is

$$S_B = \sum_{r=0}^{k-1} n_r (\bar{Y}_r - \bar{Y})^2,$$

also referred to as the between-group sum of squares. The residual sum of squares $S_W$ is the sum of the residual sums of squares for each group (the within-group sum of squares).

Source   DF     Sum Sq  Mean Sq  F
Between  k - 1  S_B     M_B      M_B / M_W
Within   N - k  S_W     M_W
Total    N - 1  S_yy

$M_W$ (also written $S^2$) estimates the residual variance $\sigma_W^2$.

Hand calculation

$T_r$ is the subtotal for the $r$-th group, and $T = T_0 + T_1 + \dots + T_{k-1}$ is the total of all observations.

$$S_{yy} = \sum_{i=1}^{N} Y_i^2 - \frac{T^2}{N}, \qquad S_B = \sum_{r=0}^{k-1} \frac{T_r^2}{n_r} - \frac{T^2}{N}$$

$S_W$ is calculated as the difference between $S_{yy}$ and $S_B$.

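Three sheep breeds

A random sample is taken from each of 3 sheep breeds, and blood Cu content is measured on the 16 sheep.

Breed      Blood Cu content
Blackface   6.5   7.9   7.4   6.8   8.1
Welsh      10.4   9.8  11.1  10.6   9.2
Cross       6.9   9.2   8.4   7.6   9.7   8.9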

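Hand calculation of sheep data

          Blackface  Welsh  Cross  Total
Subtotal  36.7       51.1   50.7   138.5
(n)       (5)        (5)    (6)    (16)

Uncorrected sum of squares for all 16 measurements is 1229.55.

Total (corrected) sum of squares = 1229.55 - 138.5^2/16 = 30.66

Between sum of squares = 36.7^2/5 + 51.1^2/5 + 50.7^2/6 - 138.5^2/16 = 21.14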

Anova table

Source   DF  Sum Sq  Mean Sq  F
Between   2   21.14   10.572  14.44
Within   13    9.52    0.732
Total    15   30.66

The F test is highly significant (P < 0.001).
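The hand calculation can be checked in R. This is a minimal sketch using the `breed` and `Cu` vectors from the R listing later in these slides; the names `T_r`, `n_r`, `total` and `F_stat` are introduced here for illustration only.

```r
breed <- factor(rep(c("Blackface", "Welsh", "Cross"), c(5, 5, 6)),
                levels = c("Blackface", "Welsh", "Cross"))
Cu <- c(6.5, 7.9, 7.4, 6.8, 8.1, 10.4, 9.8, 11.1, 10.6, 9.2,
        6.9, 9.2, 8.4, 7.6, 9.7, 8.9)

N <- length(Cu)        # number of observations
k <- nlevels(breed)    # number of groups

T_r   <- sapply(split(Cu, breed), sum)     # group subtotals
n_r   <- sapply(split(Cu, breed), length)  # group sizes
total <- sum(Cu)                           # grand total T

# Sums of squares from the hand-calculation formulas
S_yy <- sum(Cu^2) - total^2 / N
S_B  <- sum(T_r^2 / n_r) - total^2 / N
S_W  <- S_yy - S_B

# Mean squares, F statistic and its upper-tail probability
M_B <- S_B / (k - 1)
M_W <- S_W / (N - k)
F_stat  <- M_B / M_W
p_value <- pf(F_stat, k - 1, N - k, lower.tail = FALSE)

round(c(S_B = S_B, S_W = S_W, S_yy = S_yy, M_B = M_B, M_W = M_W, F = F_stat), 3)
```

The same table should be produced by `anova(lm(Cu ~ breed))`.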

Finger ridges in identical twins

[Table: finger ridge counts by family; columns Fam, Twin 1, Twin 2]

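Special case (all groups of size two)

For each family, tabulate the sum and the difference of the two twins' values (columns Fam, sum, diff). With $T$ the total of all 24 observations:

Between family sum of squares = (sum of squared family sums)/2 - T^2/24

Within family sum of squares = (sum of squared family differences)/2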
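The shortcut for groups of size two can be checked numerically. The sketch below uses small made-up values for six hypothetical families (they are not the finger ridge counts from the slides) and compares the sum/difference formulas with the sums of squares from a one-way anova.

```r
# Hypothetical data for 6 twin pairs (illustrative values only)
twin1 <- c(71, 79, 105, 115, 76, 83)
twin2 <- c(71, 82,  99, 114, 70, 82)

s <- twin1 + twin2     # family sums
d <- twin1 - twin2     # family differences
y <- c(twin1, twin2)   # all observations
total <- sum(y)

# Shortcut formulas for groups of size two
between_ss <- sum(s^2) / 2 - total^2 / length(y)
within_ss  <- sum(d^2) / 2

# One-way anova on the same data, with family as the grouping factor
fam <- factor(rep(seq_along(twin1), times = 2))
tab <- anova(lm(y ~ fam))

all.equal(c(between_ss, within_ss), tab[["Sum Sq"]])
```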

Anova table (finger ridges)

Source            DF  Sum Sq  Mean Sq  F
Between families  11
Within families   12          14.29
Total             23

The F test is very highly significant.

Fixed effects

The ANOVA calculations for these two examples are similar, but the interpretation of the mean squares is different. In the first example, the different breeds are regarded as fixed effects, and we are primarily interested in differences (contrasts) between breeds. Examples of factors which are usually treated as fixed are experimental treatments, sex, breed.

Random effects

In the second example, differences between particular families are not of interest. Instead, we regard the twelve family means as sampled from a $N(m, \sigma_B^2)$ distribution. Primary interest is in the variances $\sigma_B^2$ and $\sigma_W^2$. Differences between families are treated as random effects. Examples of random effects are families, litters, or sires, where the levels are a sample of those possible and a repeat experiment would use different levels.

Fixed-effects model

In the fixed-effects model, F tests the hypothesis $m_0 = m_1 = \dots = m_{k-1}$, or equivalently, $b_1 = b_2 = \dots = b_{k-1} = 0$. In the sheep example, F = 14.44 on 2 and 13 d.f. (very highly significant). Differences between breeds are established beyond reasonable doubt.

Comparing means

Mean values for each breed:

Blackface  Welsh  Cross
7.34       10.22  8.45

The estimated standard error of a difference between two means based on $n_1$ and $n_2$ observations is

$$\sqrt{M_W (1/n_1 + 1/n_2)}$$

where $M_W$ is the within-group mean square from the ANOVA. Standard errors of differences between breed means are 0.541 (Blackface v Welsh), and 0.518 (otherwise).

Comparing means

If $m_1 = m_2$, the difference between means $\bar{Y}_1 - \bar{Y}_2$ divided by its estimated s.e. has a t distribution with $N - k = 13$ d.f.

Comparison         Estimate  SE     t
Blackface v Welsh  -2.88     0.541  -5.32
Blackface v Cross  -1.11     0.518  -2.14
Welsh v Cross       1.77     0.518   3.42

The upper 2.5% point of t on 13 d.f. is 2.160. The difference between Blackface and Cross does not quite reach 5% significance. The Welsh breed has significantly higher measurements than the other two. Confidence intervals are calculated in the usual way.
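A sketch of these pairwise comparisons in R, using the sheep data from the R listing later in these slides. Each comparison "A v B" is computed here as mean(A) minus mean(B), which is a sign convention assumed for this illustration; the helper names `pairs_`, `est`, `se` and `t_stat` are also illustrative.

```r
breed <- factor(rep(c("Blackface", "Welsh", "Cross"), c(5, 5, 6)),
                levels = c("Blackface", "Welsh", "Cross"))
Cu <- c(6.5, 7.9, 7.4, 6.8, 8.1, 10.4, 9.8, 11.1, 10.6, 9.2,
        6.9, 9.2, 8.4, 7.6, 9.7, 8.9)

means <- sapply(split(Cu, breed), mean)    # breed means
n     <- sapply(split(Cu, breed), length)  # breed sample sizes

# Within-group mean square and its degrees of freedom from the one-way anova
tab  <- anova(lm(Cu ~ breed))
M_W  <- tab["Residuals", "Mean Sq"]
df_W <- tab["Residuals", "Df"]

# All pairs of breeds: one column per comparison
pairs_ <- combn(levels(breed), 2)
est <- means[pairs_[1, ]] - means[pairs_[2, ]]
se  <- sqrt(M_W * (1 / n[pairs_[1, ]] + 1 / n[pairs_[2, ]]))
t_stat <- est / se

data.frame(comparison = paste(pairs_[1, ], "v", pairs_[2, ]),
           estimate = as.numeric(est),
           se = as.numeric(se),
           t = as.numeric(t_stat))

# Two-sided 5% critical value, and 95% confidence intervals for the differences
t_crit <- qt(0.975, df = df_W)
cbind(lower = as.numeric(est - t_crit * se), upper = as.numeric(est + t_crit * se))
```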

Contrasts

The general form of a contrast is $a_1 m_1 + a_2 m_2 + \dots + a_k m_k$, where $a_1 + a_2 + \dots + a_k = 0$. The estimate of the contrast is $a_1 \bar{Y}_1 + a_2 \bar{Y}_2 + \dots + a_k \bar{Y}_k$, with estimated s.e.

$$\sqrt{M_W \left( a_1^2/n_1 + a_2^2/n_2 + \dots + a_k^2/n_k \right)}$$

The t statistic for testing whether the contrast is zero is the ratio of the estimated contrast to its s.e.

Another contrast

In the sheep example, a simple genetic hypothesis is $H_0: m_B + m_W - 2m_C = 0$. The estimate of the contrast, $\bar{Y}_B + \bar{Y}_W - 2\bar{Y}_C$, is normally distributed with zero mean (if $H_0$ is true) and estimated variance $M_W (1/n_B + 1/n_W + 4/n_C)$.

Here the estimate is (7.34 + 10.22 - 2 x 8.45) = 0.66, with estimated variance 0.732(1/5 + 1/5 + 4/6) = 0.781.

t = 0.66/0.884 = 0.75 with 13 d.f. This is not significant: there is no evidence against the genetic hypothesis.
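The contrast calculation can be sketched in R as follows, again using the sheep data from the R listing later in these slides. The coefficient vector `a` and the other helper names are introduced here for illustration.

```r
breed <- factor(rep(c("Blackface", "Welsh", "Cross"), c(5, 5, 6)),
                levels = c("Blackface", "Welsh", "Cross"))
Cu <- c(6.5, 7.9, 7.4, 6.8, 8.1, 10.4, 9.8, 11.1, 10.6, 9.2,
        6.9, 9.2, 8.4, 7.6, 9.7, 8.9)

means <- sapply(split(Cu, breed), mean)
n     <- sapply(split(Cu, breed), length)
tab   <- anova(lm(Cu ~ breed))
M_W   <- tab["Residuals", "Mean Sq"]
df_W  <- tab["Residuals", "Df"]

# Contrast coefficients in level order (Blackface, Welsh, Cross); they sum to zero
a <- c(1, 1, -2)
stopifnot(sum(a) == 0)

est    <- sum(a * means)               # estimated contrast
se     <- sqrt(M_W * sum(a^2 / n))     # its estimated standard error
t_stat <- est / se
p_two_sided <- 2 * pt(-abs(t_stat), df = df_W)

c(estimate = est, se = se, t = t_stat, p = p_two_sided)
```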

Random effects model: equal group sizes

To the fixed-effects model we add the assumption that $m_1, \dots, m_k$ are drawn randomly from an $N(m, \sigma_B^2)$ distribution. We now have $E(Y_i) = m$, $\mathrm{var}(Y_i) = \sigma_B^2 + \sigma_W^2$. With $n$ the common group size, the expectations of the between and within group mean squares are

                DF     Mean Sq  E(Mean Sq)
Between groups  k - 1  M_B      $\sigma_W^2 + n\sigma_B^2$
Within groups   N - k  M_W      $\sigma_W^2$

The F statistic tests $H_0: \sigma_B^2 = 0$. The overall mean $\bar{Y}$ estimates $m$, with variance $(\sigma_W^2 + n\sigma_B^2)/N$.

In the random effects model the total variance of an observation, $\sigma_B^2 + \sigma_W^2$, is partitioned into $\sigma_B^2$ and $\sigma_W^2$. Estimates are obtained by equating observed and expected mean squares:

$$\hat{\sigma}_W^2 = M_W, \qquad \hat{\sigma}_B^2 = (M_B - M_W)/n.$$

Source            DF  Sum Sq  Mean Sq  E(Mean Sq)
Between families  11                   $\sigma_W^2 + 2\sigma_B^2$
Within families   12          14.29    $\sigma_W^2$
Total             23

In the finger ridges example, $\hat{\sigma}_W^2 = 14.29$ and $\hat{\sigma}_B^2 = (M_B - 14.29)/2$.

Intraclass correlation

The intraclass correlation coefficient $\sigma_B^2/(\sigma_W^2 + \sigma_B^2)$ measures the extent to which individuals in the same group are more similar than individuals in different groups. An estimate of the intraclass correlation is

$$\frac{\hat{\sigma}_B^2}{\hat{\sigma}_B^2 + \hat{\sigma}_W^2} = \frac{M_B - M_W}{M_B + (n-1) M_W}$$

In the finger ridges example, $\hat{\sigma}_W^2 = M_W = 14.29$ and $n = 2$, so the estimated intraclass correlation is $(M_B - 14.29)/(M_B + 14.29)$.
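A minimal sketch of the variance component and intraclass correlation estimates for balanced groups. The data are made-up values for six hypothetical families of size two (not the finger ridge counts), and all variable names are illustrative.

```r
# Hypothetical balanced data: 6 families, n = 2 members each (illustrative only)
y   <- c(71, 71,  79, 82,  105, 99,  115, 114,  76, 70,  83, 82)
fam <- factor(rep(1:6, each = 2))
n   <- 2   # common group size

tab <- anova(lm(y ~ fam))
M_B <- tab["fam", "Mean Sq"]         # between-family mean square
M_W <- tab["Residuals", "Mean Sq"]   # within-family mean square

# Equate observed and expected mean squares
sigma2_W <- M_W
sigma2_B <- (M_B - M_W) / n          # can be negative if M_B < M_W

# Two equivalent forms of the estimated intraclass correlation
icc     <- sigma2_B / (sigma2_B + sigma2_W)
icc_alt <- (M_B - M_W) / (M_B + (n - 1) * M_W)

c(sigma2_W = sigma2_W, sigma2_B = sigma2_B, icc = icc)
```

If the between-group mean square is smaller than the within-group mean square, this moment estimate of $\sigma_B^2$ is negative; a common convention is to report it as zero.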

Cross-classification

In the sheep example, we may want to allow for the possibility that the measurement is affected by the sex of the animal. Suppose that the first three Blackface, first two Welsh, and first three Cross sheep are male, and the rest female. We now have a cross-classification of the data by breed and sex.

           Male            Female
Blackface  6.5, 7.9, 7.4   6.8, 8.1
Welsh      10.4, 9.8       11.1, 10.6, 9.2
Cross      6.9, 9.2, 8.4   7.6, 9.7, 8.9

Two factors, cross-classification

    # NA entries pad each breed-by-sex cell to three values
    breed <- factor(rep(c("Blackface", "Welsh", "Cross"), each = 6))
    sex <- factor(rep(1:2, times = 3, each = 3), labels = c("male", "female"))
    Cu <- c(6.5, 7.9, 7.4, 6.8, 8.1, NA,
            10.4, 9.8, NA, 11.1, 10.6, 9.2,
            6.9, 9.2, 8.4, 7.6, 9.7, 8.9)
    fit <- lm(Cu ~ breed + sex + breed:sex)
    anova(fit)

The output is an anova table with columns Df, Sum Sq, Mean Sq and F value, and rows breed, sex, breed:sex and Residuals.

Sequential sums of squares

The rows of anova(fit) (breed, sex, breed:sex, Residuals) give sequential sums of squares:

1) breed: sum of squares for breed (ignoring sex).
2) sex: extra sum of squares obtained by fitting sex after breed (an additive model: fitted values are sums of breed differences and the sex difference).
3) breed:sex: the sex difference is allowed to depend on breed. This is equivalent to a one-way anova with six levels, one for each combination of breed and sex.

Sequence of fits

Intercept $b_0$ has been omitted everywhere.

Breed only
        Blackface  Welsh              Cross
Male    0          b_1                b_2
Female  0          b_1                b_2

Additive
        Blackface  Welsh              Cross
Male    0          b_1                b_2
Female  c_1        b_1 + c_1          b_2 + c_1

Saturated
        Blackface  Welsh              Cross
Male    0          b_1                b_2
Female  c_1        b_1 + c_1 + d_1    b_2 + c_1 + d_2

$d_1$ and $d_2$ are interaction terms (differences of differences).
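The three fits can also be compared directly as nested models. This is a sketch using the cross-classified sheep data from the two-factor listing; the data frame `d` and the model names `m_breed`, `m_add` and `m_sat` are assumptions introduced for illustration.

```r
d <- data.frame(
  breed = factor(rep(c("Blackface", "Welsh", "Cross"), each = 6)),
  sex   = factor(rep(1:2, times = 3, each = 3), labels = c("male", "female")),
  Cu    = c(6.5, 7.9, 7.4, 6.8, 8.1, NA,
            10.4, 9.8, NA, 11.1, 10.6, 9.2,
            6.9, 9.2, 8.4, 7.6, 9.7, 8.9)
)

# lm() drops the two rows with missing Cu, leaving the same 16 sheep in every fit
m_breed <- lm(Cu ~ breed, data = d)               # breed only
m_add   <- lm(Cu ~ breed + sex, data = d)         # additive
m_sat   <- lm(Cu ~ breed + sex + breed:sex, data = d)  # saturated

# Each row tests the extra terms against the saturated model's residual mean square
cmp <- anova(m_breed, m_add, m_sat)
cmp

# The saturated model reproduces the six cell means
cell_means <- tapply(d$Cu, list(d$sex, d$breed), mean, na.rm = TRUE)
cell_means
```

The drops in residual sum of squares between successive rows are the sequential sums of squares for sex and for breed:sex.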

An experiment to investigate variability in skin thickness of mice used 5 litters of 3 mice. For each mouse, skin thickness was measured twice by independent determinations.

[Table: skin thickness measurements by Mouse and Litter]


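Mouse skin thickness example

Source           DF  SSQ  MSQ  E(MSQ)
Litters           4            $\sigma_W^2 + 2\sigma_M^2 + 6\sigma_L^2$
Mice in litters  10            $\sigma_W^2 + 2\sigma_M^2$
Residual         15            $\sigma_W^2$

Estimated components of variance, obtained by equating observed and expected mean squares:

Between observations, within mice  $\hat{\sigma}_W^2$ = residual MSQ = 62.3
Between mice, within litters       $\hat{\sigma}_M^2$ = (mice-in-litters MSQ - residual MSQ)/2
Between litters                    $\hat{\sigma}_L^2$ = (litters MSQ - mice-in-litters MSQ)/6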
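A sketch of the nested calculation in R. The skin thickness data are not reproduced here, so the example simulates data with the same layout (5 litters, 3 mice per litter, 2 readings per mouse); the standard deviations used in the simulation are arbitrary assumptions, and all variable names are illustrative.

```r
set.seed(1)

# Layout: 5 litters x 3 mice x 2 readings; mice are numbered 1-3 within each litter
litter <- factor(rep(1:5, each = 6))
mouse  <- factor(rep(rep(1:3, each = 2), times = 5))

# Simulated readings (arbitrary effect sizes, for illustration only)
skin <- 50 +
  rep(rnorm(5,  sd = 10), each = 6) +   # litter effects
  rep(rnorm(15, sd = 5),  each = 2) +   # mouse-within-litter effects
  rnorm(30, sd = 8)                     # measurement error

# litter/mouse expands to litter + litter:mouse (mice nested within litters)
tab <- anova(lm(skin ~ litter/mouse))
ms  <- tab[["Mean Sq"]]   # order: litters, mice in litters, residual

# Equate observed and expected mean squares
sigma2_W <- ms[3]
sigma2_M <- (ms[2] - ms[3]) / 2
sigma2_L <- (ms[1] - ms[2]) / 6

# F test for litters uses the mice-in-litters mean square as denominator,
# not the residual mean square shown in the lm anova table
F_litter <- ms[1] / ms[2]
p_litter <- pf(F_litter, tab$Df[1], tab$Df[2], lower.tail = FALSE)

c(sigma2_W = sigma2_W, sigma2_M = sigma2_M, sigma2_L = sigma2_L, F_litter = F_litter)
```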

The mixed model

There are several complications which can affect anova.

a) It may be necessary to allow for fixed effects (e.g. sex of mouse).
b) The data may be unbalanced. For example, the number of readings per mouse may vary, perhaps because of missing observations.

A solution to these problems is to use a mixed model (with both fixed and random effects), and to estimate the variance components by maximum likelihood.

Software: lme( ), lmer( ), MCMCglmm( ), asreml, ...

A mixed model

A simple example of a mixed model is obtained if in the skin thickness example we treat litters as fixed rather than random. Expected mean squares are unchanged, except that the $\sigma_L^2$ term in the expected mean square for litters is replaced by the sample variance of the litter effects. The test for litter effects is unchanged:

F = (between litters mean square) / (between mice within litters mean square)

Compare the last part of the farms example on the problem sheet.

Main assumptions underlying the anova procedure are:

1) Data are normally distributed.
2) The variance $\sigma_W^2$ is constant.

Diagnostic plots, e.g. of residuals against fitted values, are used to check these assumptions (as for regression). If any of the assumptions is false, the situation can sometimes be improved by a transformation of the data. The most common transformations are the square root and logarithmic. These are most useful in correcting a tendency for the variance of an observation to increase with its size.

    breed <- factor(rep(c("Blackface", "Welsh", "Cross"), c(5, 5, 6)))
    Cu <- c(6.5, 7.9, 7.4, 6.8, 8.1, 10.4, 9.8, 11.1, 10.6, 9.2,
            6.9, 9.2, 8.4, 7.6, 9.7, 8.9)
    lmfit <- lm(Cu ~ breed)
    anova(lmfit)      # anova table
    plot(lmfit)       # diagnostic plots
    # alternatively...
    aovfit <- aov(Cu ~ breed)
    summary(aovfit)   # anova table

Two factors, nested classification

If litter and mouse are factors with 5 and 15 levels,

    aov(skin ~ litter + Error(mouse))

produces two residual mean squares:

a) between mice within litters (10 d.f.)
b) between observations within mice (15 d.f.)

The mean square with 10 d.f. is the denominator in the F test for differences among litters.

The coding which numbers mice 1-3 within litters,

    aov(skin ~ litter + Error(litter:mouse))

works but is not recommended.
