Hierarchical Random Effects


Contents

5.1 Introduction
5.2 Main example: Lactase measurements in piglets
5.3 Constant mean value structure
5.4 The one layer model
5.5 Two layer model
5.6 Three or more layers
5.7 Zero/Negative variance parameters
5.8 Comparing different variance structures
5.9 Better confidence limits in the balanced case
5.10 Prediction of random effect coefficients
5.11 R-TUTORIAL: The lactase example
5.12 Exercises

5.1 Introduction

This enote takes a closer look at a type of model where the experimental design leads to a natural hierarchy of the random effects. Consider for instance the following

experimental design to determine the fat content in milk (from a given area at a given time):

1. Ten farms were selected randomly from all milk producing farms in the area.
2. From each selected farm six cows were randomly selected.
3. From each selected cow two cups of milk were taken randomly from the daily production to be analyzed in the lab.

Intuitively, two cups of milk from the same cow would be expected to be more similar than two cups from different cows at different farms. Notice how data collected this way violate the standard assumption of independent measurements.

5.2 Main example: Lactase measurements in piglets

As part of a larger study of the intestinal health in newborn piglets, the gut enzyme lactase was measured in 20 piglets taken from 5 different litters. For each of the 20 piglets the lactase level was measured in three different regions. At the time the measurement was taken the piglet was either unborn (status=1) or newborn (status=2). These data are kindly provided by Charlotte Reinhard Bjørnvad, Department of Animal Science and Animal Health, Division of Animal Nutrition, Royal Veterinary and Agricultural University. The 60 log transformed measurements are listed below:

[Data listing with columns Obs, litter, pig, reg, status and loglact]

Notice the hierarchical structure in which the measurements are naturally ordered: first the five litters, then the piglets, and finally the individual measurements. This structure is illustrated in Figure 5.1 for the first two litters.

Figure 5.1: The hierarchical structure of the lactase data set for the first two litters.

5.3 Constant mean value structure

The mixed model framework is very flexible, and it is possible to include the mean value structures known from fixed effects models as well as the random effects structure. In this enote the focus is on the variance structure, and for this reason the same simple mean value structure is used when the models are presented. The mean structure is simply

denoted µ without any subscripts. This is done in order to keep the focus on the variance structure. In the analyzed examples more realistic mean value structures will be used.

5.4 The one layer model

The simplest hierarchical model is the one layer model, which is typically denoted:

y_i = µ + ε_i,   ε_i i.i.d. N(0, σ²)

Here all observations are independent, so this is really not a hierarchical model; it is listed here mainly for completeness. The mean value parameter(s) in a model with this simple variance structure are estimated via the least squares method, the maximum likelihood method, or the restricted/residual maximum likelihood method. In this case these methods all give the same result, namely:

β̂ = (X'X)⁻¹ X'y

Here β is the parameter(s) of the fixed effects part of the model, y is the observations, and X is the design matrix for the fixed effects part of the model, as described in the previous enote. The single variance parameter in the one layer model is estimated via the restricted/residual maximum likelihood method, which gives an unbiased estimate:

σ̂² = 1/(N−p) · Σ_{i=1}^{N} (y_i − µ̂_i)²

Here N is the total number of observations, p is the number of mean value parameters, and µ̂_i is the Best Linear Unbiased Estimate (BLUE) of the mean value for the i'th observation, which is calculated as µ̂_i = (Xβ̂)_i. The sum in the estimate for the variance is often referred to as the Sum of Squared Deviations (SSD), or simply as the Sum of Squares (SS).

Lactase measurements in piglets analyzed as independent observations

To analyse the lactase data set with a one layer model, the information about litter and pig is ignored. Remaining is the region (1, 2 or 3), the status (1 or 2), and the interaction between region and status ((1,1), (1,2), (2,1), (2,2), (3,1), or (3,2)). The one layer model for the lactase data becomes:

y_i = µ + α(status_i) + β(reg_i) + γ(status_i, reg_i) + ε_i,   ε_i i.i.d. N(0, σ²)

where i = 1, ..., 60 is the observation number.
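These closed-form estimates can be verified by hand. The following is a minimal sketch, assuming the lactase data frame has been read and its columns converted to factors as in the R-tutorial at the end of this enote (one log-lactase value is missing, so 59 observations enter, matching the model outputs later):

ok <- !is.na(lactase$loglact)                            # drop the single missing observation
X  <- model.matrix(~ reg + status + reg:status, data = lactase[ok, ])
y  <- lactase$loglact[ok]
beta_hat   <- solve(t(X) %*% X, t(X) %*% y)              # (X'X)^{-1} X'y
mu_hat     <- X %*% beta_hat                             # fitted mean values
sigma2_hat <- sum((y - mu_hat)^2) / (nrow(X) - ncol(X))  # SSD / (N - p)
beta_hat
sigma2_hat

These values should agree with coef(model1) and sigma(model1)^2 from the lm fit below.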

Figure 5.2: Factor structure in the lactase example when analyzed as independent measurements (one layer model).

The model is illustrated in the factor structure diagram in Figure 5.2. To analyze this model in R, write:

model1 <- lm(loglact ~ reg + status + reg:status, data = lactase)
anova(model1)

Analysis of Variance Table

Response: loglact
Df Sum Sq Mean Sq F value Pr(>F)
reg **
status *
reg:status
Residuals
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

In this case the mean value of the log-lactase measurements is described by region, status and the interaction between them. Notice how the intercept term µ is not specified; R adds the intercept term automatically. If the intercept term is undesired in a given model it must be removed by adding -1 after the last model term. From this output the estimate of the variance parameter σ̂² is obtained as the residual mean square, and two times the negative restricted/residual log likelihood function, 2ℓ_re, evaluated at the optimal parameters, can be found from

(-2) * logLik(model1, REML = TRUE)

which gives 2ℓ_re = 89.5 (the df=7 reported in the output is the number of model parameters). This is an important value as it is useful for comparing different variance structures. This enote is about hierarchical variance structures, and hence little attention is paid to the mean value structure. The ANOVA table indicates a p-value of 6.6% for removing the interaction term from the model, which is above the standard 5% significance level.

Type I and Type III tests - again

Type I (in anova): Successive tests
- Each term is corrected for the terms PREVIOUSLY listed in the model
- Sums of squares sum to the TOTAL sum of squares
- Depends on the order in which the terms are specified in the model

Type III (in drop1 and anova of lmerTest): Parallel tests
- Each term is corrected for ALL the other terms in the model
- Sums of squares do NOT generally sum to the TOTAL sum of squares
- Does NOT depend on the order in which the terms are specified in the model

If the data are balanced: the SAME!

drop1 is really Type II = Type III with the condition: do NOT test main effects which are part of (fixed) interactions!
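As a small illustration (a sketch assuming the lactase data frame is loaded): because one observation is missing, the data are slightly unbalanced, so the sequential Type I tests change with the order of the terms, while the drop1 test of the interaction does not.

anova(lm(loglact ~ reg + status + reg:status, data = lactase))  # reg entered first
anova(lm(loglact ~ status + reg + reg:status, data = lactase))  # status entered first
drop1(lm(loglact ~ reg * status, data = lactase), test = "F")   # does not depend on the order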

5.5 Two layer model

The simplest real hierarchical model is the two layer model, which is denoted:

y_i = µ + a(sub_i) + ε_i

where a(sub_i) ~ N(0, σ_a²) and ε_i ~ N(0, σ²), and all (both a's and ε's) are independent. In this two layer model the observations are no longer all independent; in fact the covariance between two observations is

cov(y_{i1}, y_{i2}) =
  0,            if sub_{i1} ≠ sub_{i2}
  σ_a²,         if sub_{i1} = sub_{i2} and i1 ≠ i2
  σ_a² + σ²,    if i1 = i2

which is easily calculated from the model. This means that two observations coming from the same subject are positively correlated. This is exactly what is needed in many situations where several observations are taken from the same subject. It corresponds well with the intuition behind the experiment, and the variance parameters can be interpreted in a natural way. The subject variance parameter σ_a² is the variance between subjects, and the residual variance parameter σ² is the variance between measurements on the same subject.

The variance parameters and the fixed effect parameter(s) in the two layer model are estimated via the restricted/residual maximum likelihood (REML) method. The REML method is preferred to the maximum likelihood method, because it gives less biased estimates of the variance parameters. For the fixed effect parameter(s) the REML estimate is given by:

β̂ = (X'V⁻¹X)⁻¹ X'V⁻¹y

Here β is the parameter(s) of the fixed effect part of the model, y is the observations, and X is the design matrix for the fixed effect part of the model. The matrix V is the covariance matrix for the observations y, which is estimated from the REML estimates of the variance parameters. In this two layer model the structure of V is:

V = | σ_a²+σ²  σ_a²     σ_a²     0        0        0       |
    | σ_a²     σ_a²+σ²  σ_a²     0        0        0       |
    | σ_a²     σ_a²     σ_a²+σ²  0        0        0       |
    | 0        0        0        σ_a²+σ²  σ_a²     σ_a²    |
    | 0        0        0        σ_a²     σ_a²+σ²  σ_a²    |
    | 0        0        0        σ_a²     σ_a²     σ_a²+σ² |

Here illustrated for the situation with two subjects and three observations on each subject. The general pattern is a block diagonal covariance matrix with blocks corresponding to each subject.
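As a small numerical sketch of this structure (with purely illustrative values σ_a² = 2 and σ² = 1, not estimates from the lactase data), the covariance matrix for two subjects with three observations each can be built in R as:

sigma2_a <- 2                                   # illustrative between-subject variance
sigma2   <- 1                                   # illustrative residual variance
n_sub    <- 3                                   # observations per subject
block    <- matrix(sigma2_a, n_sub, n_sub) + diag(sigma2, n_sub)  # one subject's block
V        <- kronecker(diag(2), block)           # two independent subjects -> block diagonal
V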

Figure 5.3: Factor structure in the lactase example when analyzed with a two layer model (random pig effect).

Lactase measurements in piglets analyzed with two layer model

To analyze the lactase data set with a two layer model the information about litter is still ignored, but the information about pig is now included. This corresponds to the two bottom layers in Figure 5.1 and is illustrated in the factor structure diagram in Figure 5.3. Alternatively the pig information could have been omitted and the litter information included. The two layer model with random pig effect is:

y_i = µ + α(status_i) + β(reg_i) + γ(status_i, reg_i) + d(pig_i) + ε_i,

where d(pig_i) ~ N(0, σ_d²) and ε_i ~ N(0, σ²), and all (both d's and ε's) are independent. Here i = 1, ..., 60 is the observation number. The idea behind this model is that pigs are different, and the pigs in this investigation are random representatives from the relevant population of newborn pigs. The variance between pigs is described by the σ_d² parameter, and the remaining variance between measurements on the same pig is described by the σ² parameter.

To analyze this model in R, write:

require(lmerTest)
model2 <- lmer(loglact ~ reg * status + (1 | pig), data = lactase)
anova(model2)

Analysis of Variance Table of type III with Satterthwaite approximation for degrees of freedom
Sum Sq Mean Sq NumDF DenDF F.value Pr(>F)
reg **
status
reg:status **
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

print(summary(model2), corr = FALSE)

Linear mixed model fit by REML
t-tests use Satterthwaite approximations to degrees of freedom ['lmerMod']
Formula: loglact ~ reg * status + (1 | pig)
Data: lactase

REML criterion at convergence: 79.9

Scaled residuals:
Min 1Q Median 3Q Max

Random effects:
Groups   Name        Variance Std.Dev.
pig      (Intercept)
Residual
Number of obs: 59, groups: pig, 20

Fixed effects:
            Estimate Std. Error df t value Pr(>|t|)
(Intercept)                                e-13 ***
reg2
reg3

status2
reg2:status2
reg3:status2                               *
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

From this output it is evident that the variation between pigs is a major part of the total variation. The estimated variance between pigs, σ̂_d², and the estimated residual variance, σ̂², can be read off the random effects table above; a little less than half of the total variation in the data set is due to the variation between pigs. Twice the negative restricted/residual log likelihood is 2ℓ_re = 79.9.

The conclusion about the fixed effect parameters is different from the one layer model. In the one layer model the interaction term between status and region could be removed with a p-value of 6.6%, but now the interaction term is significant with a p-value of 0.76%. The important lesson to be learned from this example is that the variance structure matters. Even in two models with the exact same fixed effect structure, the conclusions can differ greatly if the variance structure is not correct. Later in this enote a method for comparing different variance structures will be shown.

5.6 Three or more layers

Hierarchical models with three or more levels are natural in some situations. Consider for instance the situation with randomly selected farms, randomly selected cows from each farm, and randomly selected cups of milk from the daily production of each cow. In this situation the natural model would be a three layer model. The basic three level model is denoted:

y_i = µ + a(block_i) + b(sub_i) + ε_i

where a(block_i) ~ N(0, σ_a²), b(sub_i) ~ N(0, σ_b²) and ε_i ~ N(0, σ²), all independent, and i is the observation index. The interpretation is similar to the interpretation of the two layer model. The parameter σ_a² is the variance between blocks (farms), σ_b² is the variance between subjects (cows) in the same block (farm), and σ² is the variance between observations from the same subject (cow). The parameters are estimated by the REML method, just like in the two layer model, and the structure of the covariance matrix is:

V = | σ_a²+σ_b²+σ²  σ_a²+σ_b²     σ_a²          σ_a²          0             0             0             0            |
    | σ_a²+σ_b²     σ_a²+σ_b²+σ²  σ_a²          σ_a²          0             0             0             0            |
    | σ_a²          σ_a²          σ_a²+σ_b²+σ²  σ_a²+σ_b²     0             0             0             0            |
    | σ_a²          σ_a²          σ_a²+σ_b²     σ_a²+σ_b²+σ²  0             0             0             0            |
    | 0             0             0             0             σ_a²+σ_b²+σ²  σ_a²+σ_b²     σ_a²          σ_a²         |
    | 0             0             0             0             σ_a²+σ_b²     σ_a²+σ_b²+σ²  σ_a²          σ_a²         |
    | 0             0             0             0             σ_a²          σ_a²          σ_a²+σ_b²+σ²  σ_a²+σ_b²    |
    | 0             0             0             0             σ_a²          σ_a²          σ_a²+σ_b²     σ_a²+σ_b²+σ² |

Here illustrated for the case with two blocks, two subjects from each block, and two observations from each subject. The first 4 by 4 block in V corresponds to the first block, and the next non-zero 4 by 4 block corresponds to the second block. Within each of these blocks there is a similar structure, where two observations from the same subject are more highly correlated than two observations from different subjects.

Lactase measurements in piglets analyzed with three layer model

The natural model for the lactase data set is a three layer model, with litter as the top layer, pig as the middle layer, and each single measurement as the bottom layer, as illustrated in the factor structure diagram in Figure 5.4. This hierarchy is illustrated in Figure 5.1 for the first two litters.

y_i = µ + α(status_i) + β(reg_i) + γ(status_i, reg_i) + d(litter_i) + e(pig_i) + ε_i,

where d(litter_i) ~ N(0, σ_d²), e(pig_i) ~ N(0, σ_e²), and ε_i ~ N(0, σ²), and all (both d's, e's, and ε's) are independent. i is the observation number.

Figure 5.4: Factor structure in the lactase example when analyzed with a three layer model (random pig and litter effect).

To analyze this model in R, write (here the litter information has been added to the random part of the model):

model3 <- lmer(loglact ~ reg * status + (1 | pig) + (1 | litter), data = lactase)
print(summary(model3), corr = FALSE)

Linear mixed model fit by REML
t-tests use Satterthwaite approximations to degrees of freedom ['lmerMod']
Formula: loglact ~ reg * status + (1 | pig) + (1 | litter)
Data: lactase

REML criterion at convergence: 79.9

Scaled residuals:
Min 1Q Median 3Q Max

Random effects:
Groups   Name        Variance Std.Dev.
pig      (Intercept)
litter   (Intercept)
Residual
Number of obs: 59, groups: pig, 20; litter, 5

Fixed effects:
            Estimate Std. Error df t value Pr(>|t|)
(Intercept)                                e-13 ***
reg2
reg3
status2
reg2:status2
reg3:status2                               *
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

anova(model3)

Analysis of Variance Table of type III with Satterthwaite approximation for degrees of freedom
Sum Sq Mean Sq NumDF DenDF F.value Pr(>F)
reg **
status
reg:status **
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

confint(model3, 1:3, oldNames = FALSE)

Computing profile confidence intervals ...

                       2.5 %   97.5 %
sd_(Intercept)|pig
sd_(Intercept)|litter
sigma

Compared to the output from the two layer model there is basically no difference: the likelihood is exactly the same, the variance for the litter effect is estimated at zero, and the conclusions about the fixed effect parameters are unchanged.

5.7 Zero/Negative variance parameters

In some software, e.g. SAS Proc Mixed, an option can be set to allow for negative variance components - NOT in R though. When negative variance estimates occur, it is usually as an underestimate of a small (or zero) variance. In this case a reasonable way to handle the negative estimate is simply to leave out, or fix to zero, the corresponding variance component.

In some rare cases it is possible to interpret a negative variance parameter estimate. For instance in the case of several observations on the same litter, it could be possible that some of the animals would do well and eat all the available food, and some of the animals would do poorly, and because all the food was now gone they would do even worse. In this type of competitive situation there is a negative within-group correlation, and the variation within a group could be greater than the variation between groups, which would lead to negative variance estimates. In R these are instead set to zero, which really makes better sense in most cases.

5.8 Comparing different variance structures

To test a reduction in the variance structure a restricted/residual likelihood ratio test can be used. A likelihood ratio test is used to compare two models A and B, where B is a sub-model of A. Typically the model including some variance component (model A) and the model without this variance component (model B) are to be compared. To do this the negative restricted/residual log-likelihood values ℓ_re(A) and ℓ_re(B) from both models must be computed. The test statistic can now be computed as:

G_{A→B} = 2ℓ_re(B) − 2ℓ_re(A)

The likelihood ratio test statistic follows a χ²_df distribution, where df is the difference between the number of parameters in A and the number of parameters in B. In the case of removing a single variance component df = 1. For the lactase data set three different variance structures have been used; the comparison of these variance models is seen in the following table:

Model                            2ℓ_re   G value          df   P value
(A) Pig and litter included      79.9    G_{A→B} = 0.0    1    P_{A→B} = 1
(B) Only pig included            79.9    G_{B→C} = 9.6    1    P_{B→C} ≈ 0.002/2
(C) Independent observations     89.5

It follows that the litter variance component can be left out of the model, but the pig variance component is very significant. In other words: two observations taken on the same pig are correlated, but two observations from different pigs within the same litter are not correlated (positively or negatively).

Warning: It is really more correct to use df=0.5 instead of df=1 in these tests! OR similarly (approximately) divide the p values by 2.

To calculate these p values via R, either use the anova function with refit=FALSE (to get the restricted likelihood ratio tests):

anova(model2, model3, refit = FALSE)

Or compute the negative log-likelihood directly:

(-2) * logLik(model3)
(-2) * logLik(model2)
(-2) * logLik(model1, REML = TRUE)  # Need REML=TRUE for lm-objects

To compute the p values by hand:

G_value <- -2 * c(logLik(model1, REML = TRUE) - logLik(model2))
pchisq(G_value, df = 1, lower.tail = FALSE)

[1]
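A minimal sketch of this correction, reusing G_value from the code just above: the χ²_1 p-value is simply halved, corresponding to a 50:50 mixture of χ²_0 and χ²_1 as the reference distribution when the true variance component sits on the boundary of its parameter space.

p_naive   <- pchisq(G_value, df = 1, lower.tail = FALSE)  # standard chi-square(1) p-value
p_mixture <- 0.5 * p_naive                                # boundary-corrected p-value
c(naive = p_naive, corrected = p_mixture)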

5.9 Better confidence limits in the balanced case

In R we get profile likelihood based confidence intervals from the confint function. We have also seen the alternative method of using an approximate χ²-distribution based on the Satterthwaite approach; however, this method requires the asymptotic variance-covariance matrix of the (random part) parameters, and these are not so easily extracted from lmer results. (It will be built into the lmerTest package, but is not currently available.)

In the balanced case the variance components can often be expressed simply as linear combinations of independent Mean Squares, which are all chi-square distributed, and a simple chi-square approximation can be used to construct better confidence limits. Some of the theory behind this is given here.

A factor F is said to be balanced if the same number of observations are made for each of its levels. In this section it is assumed that all factors are balanced. As seen in the enote on factor structure diagrams, it is possible in nice cases to calculate the degrees of freedom from the factor structure diagram. The formula to calculate the degrees of freedom df_F for a factor F is:

df_F = |F| − Σ_{G: G<F} df_G

Here |F| is the number of levels of the factor F, and the last sum is the sum of the degrees of freedom from all factors coarser than F. In these nice cases a similar formula exists for computing the sums of squares (SS) to be used in the analysis of variance table. The sum of squares SS_F corresponding to a factor F is:

SS_F = (1/n_F) Σ_{j∈F} S_F(j)² − Σ_{G: G<F} SS_G

This formula requires some explaining. The term n_F is the number of observations on each level of the factor (as the factor is balanced this is the same on all levels). The term S_F(j) is the sum of all observations at the j'th level of the factor F. For instance if F is the factor indicating treatment ("no" or "yes"), then S_F(yes) is the sum of all observations from treated individuals. The last sum is the sum of the SS's from all factors coarser than F.

Example 5.1 Balanced two layer model

Consider again the two layer model:

y_i = µ + a(sub_i) + ε_i

where a(sub_i) ~ N(0, σ_a²) and ε_i ~ N(0, σ²), all independent. The number of subjects is |sub|, the number of observations on each subject is n_sub, and hence the total number of observations in the balanced case is N = |sub| · n_sub. The factor structure for this model is:

[Factor structure diagram: 0 → [sub] → [I], with degrees of freedom 1, |sub| − 1 and N − |sub|]

From this diagram the degrees of freedom for each factor can be calculated (in fact they are already printed in the diagram). For the constant factor 0 the degrees of freedom is always df_0 = 1, as it is the coarsest factor and has only one level. The subject factor has |sub| levels and only 0 is coarser, so df_sub = |sub| − 1. Finally for the finest factor I: I has N levels, but sub and 0 are coarser, so df_I = N − (|sub| − 1) − 1 = N − |sub|. Similarly the SS values can be calculated as:

SS_0   = (1/N) (Σ_{i=1,...,N} y_i)²
SS_sub = (1/n_sub) Σ_{j∈sub} S_sub(j)² − SS_0
SS_I   = Σ_{i=1,...,N} y_i² − SS_sub − SS_0

These formulas may seem cumbersome, but actually they are fairly simple to use, as they only depend on the total sum, the total sum of squares, and the subject sums. With these formulas in place the analysis of variance table can be constructed as:

Source    SS       df          MS                    EMS
Subject   SS_sub   |sub| − 1   SS_sub/(|sub| − 1)    σ² + n_sub σ_a²
Residual  SS_I     N − |sub|   SS_I/(N − |sub|)      σ²

So natural estimates of the two variance parameters are:

σ̂² = MS_I   and   σ̂_a² = (MS_sub − MS_I) / n_sub

In this balanced case it is also possible to write down better confidence intervals for the variance parameters than the general confidence intervals based on the normal approximation, which will be described later. The standard 95% confidence interval for the residual variance is:

(N − |sub|) σ̂² / χ²_{0.975; N−|sub|}  <  σ²  <  (N − |sub|) σ̂² / χ²_{0.025; N−|sub|}

Here N − |sub| are the degrees of freedom corresponding to MS_I. The parameter σ_a² is estimated by a linear combination of two MS values, so the exact degrees of freedom cannot be found. Satterthwaite's approximation is used to approximate the degrees of freedom: in this case Satterthwaite determines the degrees of freedom by making the corresponding χ² distribution have the same variance as the estimate. The 95% confidence interval is:

ν σ̂_a² / χ²_{0.975; ν}  <  σ_a²  <  ν σ̂_a² / χ²_{0.025; ν}

where ν is estimated as:

ν = (σ̂_a²)² / [ (MS_sub/n_sub)²/(|sub| − 1) + (MS_I/n_sub)²/(N − |sub|) ]

Slightly more generally, if fixed effects are also present in the model, or if possibly some imbalance occurs, one may still estimate variance components by a so-called moment method. This is actually a simple alternative to REML estimation. It amounts to solving the linear system where the theoretical expected random effect mean squares are set equal to the observed mean squares. In balanced experiments these theoretical expected mean squares can be deduced from the factor structure diagram, cf. enote 2, and the results of this will equal the REML estimates! In other cases, expected mean squares are almost impossible to deduce by hand calculation. This is not so easily available in R.

In general, IF a variance component is estimated as a linear combination of mean squares:

σ̂_A² = Σ_{k=1}^{K} b_k MS_k

where the mean squares have associated degrees of freedom f_1, ..., f_K, then an approximate 95% confidence band can be obtained by:

ν σ̂_A² / χ²_{0.975; ν}  <  σ_A²  <  ν σ̂_A² / χ²_{0.025; ν}

where ν is estimated as:

ν = (σ̂_A²)² / Σ_{k=1}^{K} (b_k MS_k)² / f_k

This is the basic Satterthwaite formula. In the case above, K = 2, b_1 = 1/n_sub, b_2 = −1/n_sub, f_1 = |sub| − 1 and f_2 = N − |sub|. In general, the REML estimates cannot be written as such linear combinations, so other methods are required; a different Satterthwaite-type method for REML estimates was described in enote 4.

Before computers revolutionized the world of statistics, the balanced case was extremely important. Great care was taken to obtain balanced experiments, but a single failed measurement could ruin a balanced design. With modern computers and software the balanced case seems less important.
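To make these formulas concrete, here is a hedged sketch of the moment estimates and the Satterthwaite interval, computed on the balanced reduced lactase data used in Example 5.3 later (pigs 3-8 removed, 42 complete observations; assuming the lactase data frame from the R-tutorial is loaded), with pig as the subject factor and ignoring all fixed effects except the overall mean. It only illustrates the mechanics and is not a reproduction of the REML fits above.

lactase_red <- subset(lactase, pig %in% c(1:2, 9:20))    # balanced subset, 14 pigs x 3 obs
y     <- lactase_red$loglact
N     <- length(y)                                       # 42 observations
n_sub <- 3                                               # observations per pig
S_sub <- tapply(y, droplevels(lactase_red$pig), sum)     # subject (pig) sums
n_pig <- length(S_sub)                                   # 14 pigs
SS0   <- sum(y)^2 / N
SSsub <- sum(S_sub^2) / n_sub - SS0
SSI   <- sum(y^2) - SSsub - SS0
MSsub <- SSsub / (n_pig - 1)
MSI   <- SSI / (N - n_pig)
sigma2_hat   <- MSI                                      # residual variance estimate
sigma2_a_hat <- (MSsub - MSI) / n_sub                    # between-pig variance estimate
# Satterthwaite degrees of freedom and 95% interval for the between-pig variance:
nu <- sigma2_a_hat^2 / ((MSsub/n_sub)^2/(n_pig - 1) + (MSI/n_sub)^2/(N - n_pig))
c(lower = nu * sigma2_a_hat / qchisq(0.975, nu),
  upper = nu * sigma2_a_hat / qchisq(0.025, nu))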

5.10 Prediction of random effect coefficients

This is reproduced from enote 4. A model with random effects is typically used in situations where the subjects (or blocks) in the study are to be considered as representatives from a greater population. As such, the main interest is usually not in the actual levels of the randomly selected subjects, but in the variation among them. It is however possible to obtain predictions of the individual subject levels u. The formula is given in matrix notation as:

û = G Z' V⁻¹ (y − Xβ̂)

If G and R are known, û can be shown to be the best linear unbiased predictor (BLUP) of u. Here, "best" means minimum mean squared error. In real applications G and R are always estimated, but the predictor is still referred to as the BLUP.
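This formula can be evaluated by hand using standard lme4 extractors. The following is a minimal sketch for model2 (the two layer model above), where the estimated G, R and V are plugged in for the unknown true matrices:

library(Matrix)                                         # lme4 design matrices are sparse
X <- getME(model2, "X")                                 # fixed effects design matrix
Z <- getME(model2, "Z")                                 # random effects design matrix
y <- getME(model2, "y")                                 # response used in the fit
G <- diag(as.numeric(VarCorr(model2)$pig), ncol(Z))     # covariance of the 20 pig effects
R <- diag(sigma(model2)^2, nrow(X))                     # residual covariance
V <- Z %*% G %*% t(Z) + R                               # marginal covariance of y
u_hat <- G %*% t(Z) %*% solve(V, y - X %*% fixef(model2))
cbind(by_hand = as.numeric(u_hat), ranef = unlist(ranef(model2)$pig))

In R the same BLUPs are of course obtained much more easily with the ranef extractor: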

ranef(model2)

$pig
   (Intercept)

5.11 R-TUTORIAL: The lactase example

The R function lmer from the package lme4 cannot be used for models with no random effects. Instead lm has to be used whenever versions of our models with no random effects are needed for comparison:

lactase <- read.table("lactase.txt", sep = ",", header = TRUE, na.strings = ".")
lactase$litter <- factor(lactase$litter)
lactase$pig    <- factor(lactase$pig)
lactase$reg    <- factor(lactase$reg)
lactase$status <- factor(lactase$status)
model1 <- lm(loglact ~ reg + status + reg:status, data = lactase)
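It can be worthwhile to verify the hierarchical structure of the data before moving on; a small sketch using the data frame as read above:

xtabs(~ litter + pig, data = lactase)   # each pig should appear in exactly one litter
xtabs(~ pig + reg, data = lactase)      # one measurement per pig and region
sum(is.na(lactase$loglact))             # one log-lactase value is missing, so 59 observations are used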

require(car)
Anova(model1, type = 3, test.statistic = "F")

Anova Table (Type III tests)

Response: loglact
            Sum Sq Df F value Pr(>F)
(Intercept)                    e-16 ***
reg
status
reg:status
Residuals
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Even though Type III tests for main effects are produced by the Anova function, one must be careful about the interpretation of these, and for unbalanced cases, and in fact also for models with covariates, we should not use such Type III tests from this function (as the interpretation is generally unclear). In the present case there is only one test for which we would always be sure to have a good interpretation, namely the test for the interaction term, and this would e.g. also be returned by the drop1 function:

drop1(model1, test = "F")

Single term deletions

Model:
loglact ~ reg + status + reg:status
           Df Sum of Sq RSS AIC F value Pr(>F)
<none>
reg:status
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

In R the two-layer model is specified using the function lmer:

model2 <- lmer(loglact ~ reg * status + (1 | pig), data = lactase)
anova(model2)

Analysis of Variance Table of type III with Satterthwaite approximation for degrees of freedom
Sum Sq Mean Sq NumDF DenDF F.value Pr(>F)
reg **
status
reg:status **
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

where the (1 | pig) part specifies that there is a random effect for each level of the variable pig. The (default) output from anova, using the lmerTest package, is Type III F statistics and p values. A Type I version is also an option (use type = 1).

Remark 5.2

WARNING - a confusing detail: in general the SS and MS columns from lme4 (and hence lmerTest) will NOT coincide with what is seen from lm, as the SS and MS columns from the lme4 anova are scaled by the proper mixed model error terms, such that when they are divided by the residual error variance, the proper mixed model F-statistics appear! Confused? See the example below.

Example 5.3 Comparing anova on lm and lmer

## We illustrate this on a balanced data set, so we remove pigs 3-8:
lactase_red <- subset(lactase, pig %in% c(1:2, 9:20))
testmodel      <- lmer(loglact ~ status + (1 | pig), data = lactase_red)
testmodelfixed <- lm(loglact ~ status + pig, data = lactase_red)
anova(testmodel)

Analysis of Variance Table of type III with Satterthwaite approximation for degrees of freedom
Sum Sq Mean Sq NumDF DenDF F.value Pr(>F)
status

anova(testmodelfixed)

Analysis of Variance Table

Response: loglact
          Df Sum Sq Mean Sq F value Pr(>F)
status
pig                                        *
Residuals
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Note that the proper mixed model F-statistic for the status effect (as pigs are nested within status) is

F_Mixed = MS_status / MS_pig

where the SS_status is defined the usual ANOVA way (as there are 21 observations for each status in the reduced data set):

statusmeans <- tapply(lactase_red$loglact, lactase_red$status, mean)
overallmean <- mean(lactase_red$loglact)
21 * sum((statusmeans - overallmean)^2)

[1]

But from anova(testmodel) we instead get a different SS for status, which comes about as:

SS_status^(lmer) = SS_status^(usual) · MS_Residual / MS_Pig = SS_status^(usual) · MS_Residual / (Mixed Error)

and finally note that the F-statistic for the status effect can then be obtained by taking the (so-called) MS from anova(testmodel) and dividing by the residual error:

F_Mixed = MS_status^(lmer) / MS_Residual

And by the way, the residual error MS from the fixed model is also the estimate of the residual error variance of the mixed model:

sigma(testmodel)^2

[1]

The function lmer returns a merMod object containing all information about the fitted model; in the examples above model2 is such a merMod object. To extract information from the object various functions, so-called extractors, can be applied to it. We have already met the two extractors anova and summary earlier. A list of all such methods is found as:

methods(class = "merMod")

 [1] anova            Anova            as.function      coef
 [5] confint          deltaMethod      deviance         df.residual
 [9] drop1            extractAIC       family           fitted
[13] fixef            formula          getL             getME
[17] hatvalues        isGLMM           isLMM            isNLMM
[21] isREML           linearHypothesis logLik           matchCoefs
[25] model.frame      model.matrix     ngrps            nobs
[29] plot             predict          print            profile
[33] ranef            refit            refitML          residuals
[37] show             sigma            simulate         summary
[41] terms            update           VarCorr          vcov
[45] weights
see '?methods' for accessing help and source code
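A few of these extractors in action (a sketch using model2 from above; parm = "theta_" is the lme4 shortcut for selecting the random effect parameters):

fixef(model2)                                         # fixed effect estimates
VarCorr(model2)                                       # variance component estimates
confint(model2, parm = "theta_", oldNames = FALSE)    # profile CIs for the variance parameters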

5.12 Exercises

Exercise 1 Blueberries

In an experiment with 4 sorts of blueberries the fructification (the number of berries as a percentage of the number of flowers earlier in the season) was determined for 5 twigs on each of 6 bushes for every sort. The data are available in the file blueberry.txt; see also the hierarchical Section 13.3 in enote 13.

a) Investigate if the 4 sorts have different fructification.

b) Draw the factor structure diagram corresponding to the analyzed model.

c) Test if it is important to include bush as a random effect.


More information

Lecture 3: Inference in SLR

Lecture 3: Inference in SLR Lecture 3: Inference in SLR STAT 51 Spring 011 Background Reading KNNL:.1.6 3-1 Topic Overview This topic will cover: Review of hypothesis testing Inference about 1 Inference about 0 Confidence Intervals

More information

A Handbook of Statistical Analyses Using R 2nd Edition. Brian S. Everitt and Torsten Hothorn

A Handbook of Statistical Analyses Using R 2nd Edition. Brian S. Everitt and Torsten Hothorn A Handbook of Statistical Analyses Using R 2nd Edition Brian S. Everitt and Torsten Hothorn CHAPTER 12 Analysing Longitudinal Data I: Computerised Delivery of Cognitive Behavioural Therapy Beat the Blues

More information

Lecture 22 Mixed Effects Models III Nested designs

Lecture 22 Mixed Effects Models III Nested designs Lecture 22 Mixed Effects Models III Nested designs 94 Introduction: Crossed Designs The two-factor designs considered so far involve every level of the first factor occurring with every level of the second

More information

Case of single exogenous (iv) variable (with single or multiple mediators) iv à med à dv. = β 0. iv i. med i + α 1

Case of single exogenous (iv) variable (with single or multiple mediators) iv à med à dv. = β 0. iv i. med i + α 1 Mediation Analysis: OLS vs. SUR vs. ISUR vs. 3SLS vs. SEM Note by Hubert Gatignon July 7, 2013, updated November 15, 2013, April 11, 2014, May 21, 2016 and August 10, 2016 In Chap. 11 of Statistical Analysis

More information

Multivariate Statistics in Ecology and Quantitative Genetics Summary

Multivariate Statistics in Ecology and Quantitative Genetics Summary Multivariate Statistics in Ecology and Quantitative Genetics Summary Dirk Metzler & Martin Hutzenthaler http://evol.bio.lmu.de/_statgen 5. August 2011 Contents Linear Models Generalized Linear Models Mixed-effects

More information

One-way ANOVA (Single-Factor CRD)

One-way ANOVA (Single-Factor CRD) One-way ANOVA (Single-Factor CRD) STAT:5201 Week 3: Lecture 3 1 / 23 One-way ANOVA We have already described a completed randomized design (CRD) where treatments are randomly assigned to EUs. There is

More information

Mixed-Models. version 30 October 2011

Mixed-Models. version 30 October 2011 Mixed-Models version 30 October 2011 Mixed models Mixed models estimate a vector! of fixed effects and one (or more) vectors u of random effects Both fixed and random effects models always include a vector

More information

Fitting mixed models in R

Fitting mixed models in R Fitting mixed models in R Contents 1 Packages 2 2 Specifying the variance-covariance matrix (nlme package) 3 2.1 Illustration:.................................... 3 2.2 Technical point: why to use as.numeric(time)

More information

PAPER 206 APPLIED STATISTICS

PAPER 206 APPLIED STATISTICS MATHEMATICAL TRIPOS Part III Thursday, 1 June, 2017 9:00 am to 12:00 pm PAPER 206 APPLIED STATISTICS Attempt no more than FOUR questions. There are SIX questions in total. The questions carry equal weight.

More information

Essential of Simple regression

Essential of Simple regression Essential of Simple regression We use simple regression when we are interested in the relationship between two variables (e.g., x is class size, and y is student s GPA). For simplicity we assume the relationship

More information

Exercise 5.4 Solution

Exercise 5.4 Solution Exercise 5.4 Solution Niels Richard Hansen University of Copenhagen May 7, 2010 1 5.4(a) > leukemia

More information

36-309/749 Experimental Design for Behavioral and Social Sciences. Sep. 22, 2015 Lecture 4: Linear Regression

36-309/749 Experimental Design for Behavioral and Social Sciences. Sep. 22, 2015 Lecture 4: Linear Regression 36-309/749 Experimental Design for Behavioral and Social Sciences Sep. 22, 2015 Lecture 4: Linear Regression TCELL Simple Regression Example Male black wheatear birds carry stones to the nest as a form

More information

Lecture 15. Hypothesis testing in the linear model

Lecture 15. Hypothesis testing in the linear model 14. Lecture 15. Hypothesis testing in the linear model Lecture 15. Hypothesis testing in the linear model 1 (1 1) Preliminary lemma 15. Hypothesis testing in the linear model 15.1. Preliminary lemma Lemma

More information

Regression, Part I. - In correlation, it would be irrelevant if we changed the axes on our graph.

Regression, Part I. - In correlation, it would be irrelevant if we changed the axes on our graph. Regression, Part I I. Difference from correlation. II. Basic idea: A) Correlation describes the relationship between two variables, where neither is independent or a predictor. - In correlation, it would

More information

Stat 5102 Final Exam May 14, 2015

Stat 5102 Final Exam May 14, 2015 Stat 5102 Final Exam May 14, 2015 Name Student ID The exam is closed book and closed notes. You may use three 8 1 11 2 sheets of paper with formulas, etc. You may also use the handouts on brand name distributions

More information

Stat/F&W Ecol/Hort 572 Review Points Ané, Spring 2010

Stat/F&W Ecol/Hort 572 Review Points Ané, Spring 2010 1 Linear models Y = Xβ + ɛ with ɛ N (0, σ 2 e) or Y N (Xβ, σ 2 e) where the model matrix X contains the information on predictors and β includes all coefficients (intercept, slope(s) etc.). 1. Number of

More information