Chapter 15: ACOVA and Interactions

Analysis of covariance (ACOVA) incorporates one or more regression variables into an analysis of variance. As such, we can think of it as analogous to the two-way ANOVA of Chapter 14, except that instead of having two different factor variables as predictors, we have one factor variable and one continuous variable. The regression variables are referred to as covariates (relative to the dependent variable), hence the name analysis of covariance. Covariates are also known as supplementary or concomitant observations. Cox (1958, chapter 4) gives a particularly nice discussion of the experimental design ideas behind analysis of covariance and illustrates various useful plotting techniques; see also Figure 15.4. In 1957 and 1982, Biometrics devoted entire issues to the analysis of covariance. We begin our discussion with an example that involves a two-group one-way analysis of variance and one covariate. Section 15.3 looks at an example where the covariate can also be viewed as a factor variable, and Section 15.4 uses ACOVA for lack-of-fit testing.

15.1 One covariate example

Fisher (1947) gives data on the body weights (in kilograms) and heart weights (in grams) for domestic cats of both sexes that were given digitalis. A subset of the data is presented in Table 15.1. Our primary interest is to determine whether females' heart weights differ from males' heart weights when both have received digitalis. As a first step, we might fit a one-way ANOVA model with sex groups,

    y_ij = µ_i + ε_ij                              (15.1.1)
         = µ + α_i + ε_ij,

where the y_ij's are the heart weights, i = 1, 2, and j = 1, ..., 24. This model yields the analysis of variance given in Table 15.2.

Table 15.1: Body weights (kg) and heart weights (g) of domestic cats. [Paired Body/Heart columns for females and for males; the numerical entries were not recovered in transcription.]
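As a computational aside (not part of the original text), a minimal Python sketch of fitting the one-way ANOVA model (15.1.1) with statsmodels follows. The data frame cats is a fictitious stand-in for Fisher's cat data, whose values were not recovered here.

import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.anova import anova_lm

# Fictitious stand-in for Fisher's cat data (Table 15.1 values not recovered).
cats = pd.DataFrame({
    'sex':   ['F', 'F', 'F', 'M', 'M', 'M'],
    'body':  [2.1, 2.3, 2.4, 2.9, 3.1, 3.4],     # body weight, kg
    'heart': [8.2, 9.0, 9.1, 11.1, 11.8, 12.9],  # heart weight, g
})

# Model (15.1.1): y_ij = mu_i + eps_ij, a one-way ANOVA in sex.
m1 = smf.ols('heart ~ C(sex)', data=cats).fit()
print(anova_lm(m1))   # analogue of Table 15.2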

Table 15.2: One-way analysis of variance on heart weights: Model (15.1.1). [Rows: Sex, Error, Total; columns: Source, df, SS, MS, F, P. Numerical entries not recovered.]

Note the overwhelming effect due to sexes. We now develop a model for both sex and weight that is analogous to the additive model (14.1.3).

Additive regression effects

Fisher provided both heart weights and body weights, so we can ask a more complex question: Is there a sex difference in the heart weights over and above the fact that male cats are naturally larger than female cats? To examine this we add a regression term to model (15.1.1) and fit the traditional analysis of covariance model,

    y_ij = µ_i + γ z_ij + ε_ij                     (15.1.2)
         = µ + α_i + γ z_ij + ε_ij.

Here the z's are the body weights and γ is a slope parameter associated with body weights. For this example the mean model is

    m(sex, z) = µ_1 + γz,  if sex = female,
                µ_2 + γz,  if sex = male.

Model (15.1.2) is a special case of the general additive effects model (9.9.2). It is an extension of the simple linear regression between the y's and the z's in which we allow a different intercept µ_i for each sex but the same slope. In many ways, it is analogous to the two-way additive effects model (14.1.3). In model (15.1.2) the effect of sex on heart weight is the same for any fixed body weight, i.e.,

    (µ_1 + γz) − (µ_2 + γz) = µ_1 − µ_2.

Thus we can talk about µ_1 − µ_2 being the sex effect regardless of body weight. The means for females and males are parallel lines with common slope γ and with µ_1 − µ_2 the distance between the lines.

An analysis of variance table for model (15.1.2) is given as Table 15.3. The interpretation of this table is different from the ANOVA tables examined earlier. For example, the sums of squares for body weights, sex, and error do not add up to the sum of squares total. The sums of squares in Table 15.3 are referred to as adjusted sums of squares (Adj. SS) because the body weight sum of squares is adjusted for sexes and the sex sum of squares is adjusted for body weights.

Table 15.3: Analysis of variance for heart weights: Model (15.1.2). [Rows: Body weights, Sex, Error, Total; columns: Source, df, Adj. SS, MS, F, P. Numerical entries not recovered.]

The error line in Table 15.3 is simply the error from fitting model (15.1.2). The body weights line comes from comparing model (15.1.2) with the reduced model (15.1.1). Note that the only difference between models (15.1.1) and (15.1.2) is that (15.1.1) does not involve the regression on body weights, so by testing the two models we are testing whether there is a significant effect due to the regression on body weights.
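Continuing the illustrative Python sketch above (again, not from the original text), the ACOVA model (15.1.2) and its adjusted sums of squares can be obtained as follows; typ=2 requests sums of squares in which each term is adjusted for the other, matching the Adj. SS interpretation of Table 15.3.

# Model (15.1.2): parallel regression lines with a common body-weight slope.
m2 = smf.ols('heart ~ C(sex) + body', data=cats).fit()
print(anova_lm(m2, typ=2))   # adjusted (Type II) sums of squares, as in Table 15.3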

The standard way of comparing a full and a reduced model is by comparing their error terms. Model (15.1.2) has one more parameter, γ, than model (15.1.1), so there is one more degree of freedom for error in model (15.1.1) than in model (15.1.2), hence one degree of freedom for body weights. The adjusted sum of squares for body weights is the difference between the sum of squares error in model (15.1.1) and the sum of squares error in model (15.1.2). Given the sum of squares and the mean square, the F statistic for body weights is constructed in the usual way and is reported in Table 15.3. We see a major effect due to the regression on body weights.

The Sex line in Table 15.3 provides a test of whether there are differences in sexes after adjusting for the regression on body weights. This comes from comparing model (15.1.2) to a similar model in which sex differences have been eliminated. In model (15.1.2), the sex differences are incorporated as µ_1 and µ_2 in the first version and as α_1 and α_2 in the second version. To eliminate sex differences in model (15.1.2), we simply eliminate the distinctions between the µ's (the α's). Such a model can be written as

    y_ij = µ + γ z_ij + ε_ij.                      (15.1.3)

The analysis of covariance model without treatment effects is just a simple linear regression of heart weight on body weight. We have reduced the two sex parameters to one overall parameter, so the difference in degrees of freedom between model (15.1.3) and model (15.1.2) is 1. The difference in the sums of squares error between model (15.1.3) and model (15.1.2) is the adjusted sum of squares for sex reported in Table 15.3. We see that the evidence for a sex effect over and above the effect due to the regression on body weights is not great.

While ANOVA table Error terms are always the same for equivalent models, the table of coefficients depends on the particular parameterization of the model. I prefer the ACOVA model parameterization

    y_ij = µ_i + γ z_ij + ε_ij.

Some computer programs insist on using the equivalent model

    y_ij = µ + α_i + γ z_ij + ε_ij,                (15.1.4)

which is overparameterized. To get estimates of the parameters in model (15.1.4), one must impose side conditions on them. My choice would be to make µ = 0 and get a model equivalent to the first one. Other common choices of side conditions are: (a) α_1 = 0, (b) α_2 = 0, and (c) α_1 + α_2 = 0. Some programs are flexible enough to let you specify the side conditions yourself. Minitab, for example, uses side conditions (c) and reports a table of coefficients with rows for the Constant, Sex, and Body Wt, giving each estimate, its standard error, a t statistic, and a P value. [The numerical entries of this table were not recovered in transcription.]

Relative to model (15.1.4), the reported parameter estimates are ˆµ = 2.755 together with ˆα_1, ˆα_2 = −ˆα_1, and ˆγ (values not recovered), so the estimated regression line for females is

    E(y) = (2.755 + ˆα_1) + ˆγ z

and for males

    E(y) = (2.755 + ˆα_2) + ˆγ z;

e.g., the predicted values are ŷ_1j = (2.755 + ˆα_1) + ˆγ z_1j for females and ŷ_2j = (2.755 + ˆα_2) + ˆγ z_2j for males.
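As another illustrative sketch (not the book's computation), both adjusted tests of Table 15.3 can be reproduced as full-versus-reduced model comparisons, continuing the objects defined above.

# Reduced models: (15.1.1) drops the covariate, (15.1.3) drops the sex effects.
m3 = smf.ols('heart ~ body', data=cats).fit()   # model (15.1.3)
print(m2.compare_f_test(m1))   # adjusted test for body weights: (F, P, df)
print(m2.compare_f_test(m3))   # adjusted test for sex: (F, P, df)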

Note that the t statistic for sex is the square root of the F statistic for sex in Table 15.3, and the P values are identical. Similarly, the tests for body weights are equivalent. Again, we find clear evidence for the effect of body weights after fitting sexes.

A 95% confidence interval for γ has end points ˆγ ± 2.014(0.5759), which yields the interval (1.6, 4.0). We are 95% confident that, for data comparable to the data in this study, an increase in body weight of one kilogram corresponds to a mean increase in heart weight of between 1.6 g and 4.0 g. (An increase in body weight corresponds to an increase in heart weight; philosophically, we have no reason to believe that intervening to increase body weight by one kilogram would cause an increase in heart weight.)

In model (15.1.2), comparing treatments by comparing the treatment means ȳ_i is inappropriate because of the complicating effect of the covariate. Adjusted means are often used to compare treatments. The adjusted mean for group i is

    ȳ_i − ˆγ(z̄_i − z̄),

and these were reported alongside the raw means for body weights and heart weights in a table with rows Female, Male, and Combined and columns Sex, N, Body, Heart, and Adj. Heart. [The numerical entries were not recovered in transcription.] Note that, under side conditions (c), the difference in adjusted means is ˆα_1 − ˆα_2 = 2ˆα_1. We have seen previously that there is little evidence of a differential effect on heart weights due to sexes after adjusting for body weights. Nonetheless, what evidence exists in the adjusted means suggests that, even after adjusting for body weights, a typical heart weight for males is larger than a typical heart weight for females.

Figures 15.1 through 15.3 contain residual plots for model (15.1.2). The plot of residuals versus predicted values looks exceptionally good. The plot of residuals versus sexes shows slightly less variability for females than for males, but the difference is probably not enough to worry about. The normal plot of the residuals is fine, with W above the appropriate percentile.

[Figure 15.1: Standardized residuals versus fitted values, cat data.]
[Figure 15.2: Standardized residuals versus sex, cat data.]
[Figure 15.3: Normal plot of the standardized residuals, cat data; the W statistic value was not recovered.]

The models that we have fitted form a hierarchy similar to that discussed in Chapter 14. The ACOVA model is larger than both the one-way ANOVA and simple linear regression models, which are not comparable to one another, and both of those are larger than the intercept-only model:

                          ACOVA (15.1.2)
                         /              \
       One-Way ANOVA (15.1.1)      Simple Linear Regression (15.1.3)
                         \              /
                     Intercept Only (14.1.6)

Such a hierarchy leads to two sequential ANOVA tables, displayed in Table 15.4. All of the results in Table 15.3 appear in Table 15.4.

Table 15.4: Analyses of variance for heart weights. [Two sequential (Seq SS) tables: one with rows Body weights, Sex, Error, Total, the other with rows Sex, Body weights, Error, Total; numerical entries not recovered.]
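Continuing the illustrative Python sketch (not the book's output), the adjusted means just described can be computed directly from the fitted ACOVA model m2.

# Adjusted means: ybar_i - gamma_hat * (zbar_i - zbar), from model (15.1.2).
gamma_hat = m2.params['body']
zbar = cats['body'].mean()
means = cats.groupby('sex')[['heart', 'body']].mean()
adj_heart = means['heart'] - gamma_hat * (means['body'] - zbar)
print(adj_heart)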

Interaction models

With these data, there is little reason to assume that, when regressing heart weight on body weight, the linear relationships are the same for females and males. Model (15.1.2) allows different intercepts for these regressions but uses the same slope γ. We should test the assumption of a common slope by fitting the more general model that allows different slopes for females and males, i.e.,

    y_ij = µ_i + γ_i z_ij + ε_ij                   (15.1.5)
         = µ + α_i + γ_i z_ij + ε_ij.

In model (15.1.5) the γ's depend on i, and thus the slopes are allowed to differ between the sexes. While model (15.1.5) may look complicated, it consists of nothing more than fitting a simple linear regression to each group: one to the female data and a separate simple linear regression to the male data.

The means model is

    m(sex, z) = µ_1 + γ_1 z,  if sex = female,
                µ_2 + γ_2 z,  if sex = male.

Figure 15.4 contains some examples of how model (15.1.2) and model (15.1.5) might look when plotted. In model (15.1.2) the lines are always parallel; in model (15.1.5) they can have several appearances.

[Figure 15.4: Patterns of interaction (effect modification) between a continuous predictor x_1 and a binary predictor x_2. Four panels of group mean lines plotted against x_1: one labeled "No Interaction" (parallel lines) and three labeled "Interaction" (nonparallel lines).]

The sum of squares error for model (15.1.5) can be found directly, but it also comes from adding the error sums of squares for the separate female and male simple linear regressions. The female simple linear regression has an error sum of squares on 22 degrees of freedom, and the male regression likewise has an error sum of squares on 22 degrees of freedom; adding them gives the error sum of squares for model (15.1.5) on 22 + 22 = 44 degrees of freedom. [The individual sums of squares were not recovered in transcription.] The mean squared error for model (15.1.5) is MSE(5) = 1.638.

Using results from Table 15.3, the test of model (15.1.5) against the reduced model (15.1.2) has

    F = { [SSE(15.1.2) − SSE(15.1.5)] / (45 − 44) } / 1.638 = 0.126.

The F statistic is very small; there is no evidence that we need to fit different slopes for the two sexes. Fitting model (15.1.5) gives us no reason to question our analysis of model (15.1.2).

The interaction model is easily incorporated into our previous hierarchy of models:

                       Interaction (15.1.5)
                               |
                          ACOVA (15.1.2)
                         /              \
       One-Way ANOVA (15.1.1)      Simple Linear Regression (15.1.3)
                         \              /
                     Intercept Only (14.1.6)

The hierarchy leads to the two ANOVA tables given in Table 15.5. We could also report C_p statistics for all five models relative to the interaction model (15.1.5).
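As an illustrative sketch of this test (not the book's computation), the interaction model and the common-slope comparison look like this in the continuing Python example.

# Model (15.1.5): separate slopes; expands to C(sex) + body + C(sex):body.
m5 = smf.ols('heart ~ C(sex) * body', data=cats).fit()
print(m5.compare_f_test(m2))   # H0: common slope, i.e., model (15.1.2)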

Table 15.5: Analyses of variance for heart weights. [Two sequential (Seq SS) tables: one with rows Body weights, Sex, Sex*Body Wt, Error, Total, the other with rows Sex, Body weights, Sex*Body Wt, Error, Total; numerical entries not recovered.]

The table of coefficients depends on the particular parameterization of a model. I prefer the interaction model parameterization

    y_ij = µ_i + γ_i z_ij + ε_ij,

in which all of the parameters are uniquely defined. Some computer programs insist on using the equivalent model

    y_ij = µ + α_i + β z_ij + γ_i z_ij + ε_ij,     (15.1.6)

which is overparameterized. To get estimates of the parameters, one must impose side conditions on them. My choice would be to make µ = 0 = β and get a model equivalent to the first one. Other common choices of side conditions are: (a) α_1 = 0 = γ_1, (b) α_2 = 0 = γ_2, and (c) α_1 + α_2 = 0 = γ_1 + γ_2. Some programs are flexible enough to let you specify the model yourself. Minitab, for example, uses side conditions (c) and reports a table of coefficients with rows for the Constant, Sex, Body Wt, and Body Wt*Sex. [The numerical entries of this table were not recovered in transcription.]

Relative to model (15.1.6), the reported parameter estimates are ˆµ = 2.789, ˆα_1 = 0.142, ˆα_2 = −0.142, with ˆβ, ˆγ_1, and ˆγ_2 not recovered, so the estimated regression line for females is

    E(y) = (2.789 + 0.142) + (ˆβ + ˆγ_1)z = 2.931 + (ˆβ + ˆγ_1)z

and for males

    E(y) = (2.789 − 0.142) + (ˆβ + ˆγ_2)z = 2.647 + (ˆβ + ˆγ_2)z;

i.e., the fitted values are ŷ_1j = 2.931 + (ˆβ + ˆγ_1)z_1j for females and ŷ_2j = 2.647 + (ˆβ + ˆγ_2)z_2j for males.
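In the continuing Python sketch (illustrative only, not the book's software), the sum-to-zero side conditions of choice (c) can be requested with patsy's Sum contrast coding.

# Side conditions (c): sum-to-zero coding for the sex effects and the slopes.
m6 = smf.ols('heart ~ C(sex, Sum) * body', data=cats).fit()
print(m6.params)   # analogues of the Constant, Sex, Body Wt, Body Wt*Sex rows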

Multiple covariates

In our cat example we had one covariate, but it would be very easy to extend model (15.1.2) to include more covariates. For example, with three covariates x_1, x_2, x_3, the ACOVA model becomes

    y_ij = µ_i + γ_1 x_ij1 + γ_2 x_ij2 + γ_3 x_ij3 + ε_ij.

We could even apply this idea to the cat example by considering a polynomial model. Incorporating into model (15.1.2) a cubic polynomial for the one predictor z gives

    y_ij = µ_i + γ_1 z_ij + γ_2 z_ij² + γ_3 z_ij³ + ε_ij.

The key point is that ACOVA models are additive effects models because none of the γ parameters depend on sex (i). If we have three covariates x_1, x_2, x_3, an ACOVA model has

    y_ij = µ_i + h(x_ij1, x_ij2, x_ij3) + ε_ij

for some function h(·). In this case µ_1 − µ_2 is the differential effect for the two groups regardless of the covariate values. One possible interaction model allows completely different regression functions for each group,

    y_ij = µ_i + γ_i1 x_ij1 + γ_i2 x_ij2 + γ_i3 x_ij3 + ε_ij.

Here we allow the slope parameters to depend on i. For the cat example we might consider separate cubic polynomials for each sex, i.e.,

    y_ij = µ_i + γ_i1 z_ij + γ_i2 z_ij² + γ_i3 z_ij³ + ε_ij.

Minitab commands

The following Minitab commands were used to generate the analysis of these data. The means given by the ancova subcommand means are the adjusted treatment means.

MTB > names c1 body c2 heart c3 sex
MTB > note Fit model (15.1.1).
MTB > oneway c2 c3
MTB > note Fit model (15.1.2).
MTB > ancova c2 = c3;
SUBC> covar c1;
SUBC> resid c10;
SUBC> fits c11;
SUBC> means c3.
MTB > plot c10 c11
MTB > plot c10 c3
MTB > note Split the data into females and males and
MTB > note perform two regressions to fit model (15.1.5).
MTB > copy c1 c2 to c11 c12;
SUBC> use c3=1.
MTB > regress c12 on 1 c11
MTB > copy c1 c2 to c21 c22;
SUBC> use c3=2.
MTB > regress c22 on 1 c21
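A rough modern analogue of the last part of that session (an illustrative sketch, not from the text): fit model (15.1.5) as a separate simple linear regression within each sex, continuing the cats frame from the earlier sketches.

# Separate simple linear regressions by sex; summing the two SSEs gives
# the error sum of squares for the interaction model (15.1.5).
for sex, grp in cats.groupby('sex'):
    fit = smf.ols('heart ~ body', data=grp).fit()
    print(sex, dict(fit.params), 'SSE:', fit.ssr, 'dfE:', fit.df_resid)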

15.2 Regression modeling

Consider again the ACOVA model (15.1.2) based on the factor variable sex (i) and the measurement variable body weight (z). To make life more interesting, let's consider a third sex category, say, herm (for hermaphrodite). If we create indicator variables for each of our three categories, say, x_1, x_2, x_3, we can rewrite both the one-way ANOVA model (15.1.1) and model (15.1.2) as linear models. (The SLR model (15.1.3) is already in linear model form.) The first form for the means of model (15.1.1) becomes a no-intercept multiple regression model,

    m(x_1, x_2, x_3) = µ_1 x_1 + µ_2 x_2 + µ_3 x_3,              (15.2.1)

which takes the value µ_1 for female, µ_2 for male, and µ_3 for herm. The second form for the means is the overparameterized model

    m(x_1, x_2, x_3) = µ + α_1 x_1 + α_2 x_2 + α_3 x_3,          (15.2.2)

which takes the value (µ + α_1) for female, (µ + α_2) for male, and (µ + α_3) for herm. The first form for the means of model (15.1.2) is the parallel lines regression model

    m(x_1, x_2, x_3, z) = µ_1 x_1 + µ_2 x_2 + µ_3 x_3 + γz,      (15.2.3)

giving µ_1 + γz for female, µ_2 + γz for male, and µ_3 + γz for herm, and the second form is the overparameterized parallel lines model

    m(x_1, x_2, x_3, z) = µ + α_1 x_1 + α_2 x_2 + α_3 x_3 + γz,  (15.2.4)

giving (µ + α_1) + γz for female, (µ + α_2) + γz for male, and (µ + α_3) + γz for herm. Similarly, we could have parallel polynomials. For quadratics that would be

    m(x_1, x_2, x_3, z) = µ_1 x_1 + µ_2 x_2 + µ_3 x_3 + γ_1 z + γ_2 z²,

giving µ_i + γ_1 z + γ_2 z² for group i, wherein only the intercepts are different. The interaction model (15.1.5) gives separate lines for each group and can be written as

    m(x_1, x_2, x_3, z) = µ_1 x_1 + µ_2 x_2 + µ_3 x_3 + γ_1 z x_1 + γ_2 z x_2 + γ_3 z x_3,

giving µ_1 + γ_1 z for female, µ_2 + γ_2 z for male, and µ_3 + γ_3 z for herm, and the second form is the overparameterized model

    m(x_1, x_2, x_3, z) = µ + α_1 x_1 + α_2 x_2 + α_3 x_3 + βz + γ_1 z x_1 + γ_2 z x_2 + γ_3 z x_3,

giving (µ + α_1) + (β + γ_1)z for female, (µ + α_2) + (β + γ_2)z for male, and (µ + α_3) + (β + γ_3)z for herm. Every sex category has a completely separate line with different slopes and intercepts.
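A small illustrative sketch (not from the text) of building the indicator variables x_1, x_2, x_3 in Python; pd.get_dummies is one way to do it.

import pandas as pd

sexes = pd.Series(['female', 'male', 'herm', 'female', 'male'])
X = pd.get_dummies(sexes).astype(int)  # one 0/1 indicator column per category
X.insert(0, 'intercept', 1)            # the constant column used by (15.2.2)
print(X)
# Regressing y on the three indicators without the intercept fits (15.2.1);
# keeping the intercept and dropping one indicator makes that category the
# baseline, as in (15.2.5).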

Interaction parabolas would be completely separate parabolas for each group,

    m(x_1, x_2, x_3, z) = µ_1 x_1 + µ_2 x_2 + µ_3 x_3 + γ_11 z x_1 + γ_21 z² x_1
                          + γ_12 z x_2 + γ_22 z² x_2 + γ_13 z x_3 + γ_23 z² x_3,

giving µ_1 + γ_11 z + γ_21 z² for female, µ_2 + γ_12 z + γ_22 z² for male, and µ_3 + γ_13 z + γ_23 z² for herm.

Using overparameterized models

As discussed in Chapter 12, model (15.2.2) can be made into a regression model by dropping any one of the predictor variables, say x_1,

    m(x_1, x_2, x_3) = µ + α_2 x_2 + α_3 x_3,                    (15.2.5)

which takes the value µ for female, (µ + α_2) for male, and (µ + α_3) for herm. Using an intercept and indicators x_2 and x_3 for male and herm makes female the baseline category. Similarly, if we fit the ACOVA model (15.2.4) but drop out x_1, we get parallel lines,

    m(x_1, x_2, x_3, z) = µ + α_2 x_2 + α_3 x_3 + γz,            (15.2.6)

giving µ + γz for female, (µ + α_2) + γz for male, and (µ + α_3) + γz for herm.

If, in the one-way ANOVA, we thought that males and females had the same mean, we could drop both x_1 and x_2 from model (15.2.2) to get

    m(x_1, x_2, x_3) = µ + α_3 x_3,

which is µ for female or male and µ + α_3 for herm. If we thought that males and herms had the same mean, since neither male nor herm is the baseline, we could replace x_2 and x_3 with a new variable x = x_2 + x_3 that indicates membership in either group, to get

    m(x_1, x_2, x_3) = µ + α x,

which is µ for female and µ + α for male or herm. We could equally well fit the model

    m(x_1, x_2, x_3) = µ_1 x_1 + µ_3 x,

which is µ_1 for female and µ_3 for male or herm.

In these cases, the analysis of covariance model (15.2.4) behaves similarly. For example, without both x_1 and x_2, model (15.2.4) becomes

    m(x_1, x_2, x_3, z) = µ + α_3 x_3 + γz,                      (15.2.7)

which involves only two parallel lines: one, µ + γz, that applies to both females and males, and another, (µ + α_3) + γz, for herms.
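Continuing the indicator sketch above (illustrative only), imposing µ_male = µ_herm amounts to replacing x_2 and x_3 by their sum.

# Replace the male and herm indicators with x = x2 + x3, which indicates
# membership in either group, as in the text.
X2 = X.drop(columns=['intercept']).copy()
X2['male_or_herm'] = X2['male'] + X2['herm']
X2 = X2.drop(columns=['male', 'herm'])
print(X2)   # columns female and male_or_herm: the design for m = mu_1*x1 + mu_3*x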

Dropping both x_1 and x_2 from model (15.2.2) gives very different results than dropping the intercept and x_2 from model (15.2.2). That statement may seem obvious, but since dropping x_1 alone does not actually affect how the model fits the data, it might be tempting to think that further dropping x_2 would have the same effect after dropping x_1 as dropping x_2 has in model (15.2.1). We have already examined dropping both x_1 and x_2 from model (15.2.2); now consider dropping both the intercept and x_2 from model (15.2.2), i.e., dropping x_2 from model (15.2.1). The model becomes

    m(x) = µ_1 x_1 + µ_3 x_3,

which is µ_1 for female, 0 for male, and µ_3 for herm. This occurs because all of the predictor variables in the model take the value 0 for male. If we incorporate the covariate z into this model we get

    m(x) = µ_1 x_1 + µ_3 x_3 + γz,

which gives three parallel lines (µ_1 + γz for female, 0 + γz for male, and µ_3 + γz for herm), but with the male intercept forced to 0.

15.3 ACOVA and two-way ANOVA

The material in Section 15.1 is sufficiently complex to warrant another example. This time we use a covariate that also defines a grouping variable and explore the relationships between fitting an ACOVA and fitting a two-way ANOVA.

EXAMPLE: Hopper data.
The data in Table 15.6 were provided by Schneider and Pruett (1994). They were interested in whether the measurement system for the weight of railroad hopper cars was under control. A standard hopper car weighing about 266,000 pounds was used to obtain the first three weighings of the day on each of 20 days. The process was to move the car onto the scales, weigh the car, move the car off, move the car on, weigh the car, move it off, move it on, and weigh it a third time. The tabled values are the weight of the car minus 260,000.

Table 15.6: Multiple weighings of a hopper car. [Columns: Day, First, Second, Third, arranged in two blocks; numerical entries not recovered.]

As we did with the cat data, the first thing we might do is treat the three repeat observations as replications and do a one-way ANOVA on the days,

    y_ij = µ_i + ε_ij,    i = 1, ..., 20,  j = 1, 2, 3.

Summary statistics are given in Table 15.7 and the ANOVA table follows.

Table 15.7: Summary statistics for hopper data. [Columns: Day, N, Mean, StDev, in two blocks; numerical entries not recovered.]

Analysis of Variance [one-way ANOVA on days; rows: Day, Error, Total; columns: Source, df, SS, MS, F, P; numerical entries not recovered.]

Obviously, there are differences in days.

Additive effects

The three repeat observations on the hopper could be subject to trends, so treat the three observations as measurements at times 1, 2, 3. Time now serves as a covariate z. With three distinct covariate values, we could fit a parabola,

    y_ij = µ_i + γ_1 z_ij + γ_2 z_ij² + ε_ij,    i = 1, ..., 20,  j = 1, 2, 3.

The software I used actually fits

    y_ij = µ + α_i + γ_1 z_ij + γ_2 z_ij² + ε_ij,    i = 1, ..., 20,  j = 1, 2, 3,

with the additional constraint that α_1 + ··· + α_20 = 0, so that ˆα_20 = −(ˆα_1 + ··· + ˆα_19). The output then presents only ˆα_1, ..., ˆα_19.
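An illustrative Python sketch of this fit (the weights are fictitious, since Table 15.6 was not recovered; the structure of 20 days with three timed weighings follows the text):

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.anova import anova_lm

rng = np.random.default_rng(0)
hopper = pd.DataFrame({
    'day':  np.repeat(np.arange(1, 21), 3),   # 20 days
    'time': np.tile([1, 2, 3], 20),           # 3 weighings per day
})
hopper['weight'] = 6000 + rng.normal(0, 40, size=60)  # fictitious weights

# Day effects plus linear and quadratic time terms: the quadratic ACOVA model.
quad = smf.ols('weight ~ C(day) + time + I(time**2)', data=hopper).fit()
print(anova_lm(quad))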

Table of Coefficients [rows: Constant, z, z², and the day effects; columns: Predictor, estimate, standard error, t, P; numerical entries not recovered except as noted below.]

The table of coefficients is ugly, especially because there are so many days, but the main point is that the z² term is not significant (P = 0.145). The corresponding ANOVA table is a little strange; the only really important thing is that it gives the Error line. There is also some interest in the fact that the F statistic reported for z² is the square of the t statistic, with identical P values.

Analysis of Variance [rows: z, Day, z², Error, Total; columns: Source, df, SS, MS, F, P; numerical entries not recovered.]

Similar to Section 12.5, instead of fitting a maximal polynomial (we only have three times, so we can fit at most a quadratic in time), we could alternatively treat z as a factor variable and do a two-way ANOVA as in Chapter 14, i.e., fit

    y_ij = µ + α_i + η_j + ε_ij,    i = 1, ..., 20,  j = 1, 2, 3.

The quadratic ACOVA model is equivalent to this two-way ANOVA model, so the two-way ANOVA model should have an equivalent ANOVA table.

Analysis of Variance [rows: Day, Time, Error, Total; columns: Source, df, SS, MS, F, P; numerical entries not recovered.]
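Continuing the sketch, the equivalence is easy to verify numerically: a quadratic saturates the three time points, so the two models have identical Error lines.

# Time as a factor: the two-way ANOVA of Chapter 14.
twoway = smf.ols('weight ~ C(day) + C(time)', data=hopper).fit()
print(quad.ssr, twoway.ssr)   # identical error sums of squares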

This has the same Error line as the quadratic ACOVA model. With a nonsignificant z² term in the quadratic model, it makes sense to check whether we need the linear term in z. The model is

    y_ij = µ_i + γ_1 z_ij + ε_ij,    i = 1, ..., 20,  j = 1, 2, 3,

or

    y_ij = µ + α_i + γ_1 z_ij + ε_ij,    i = 1, ..., 20,  j = 1, 2, 3,

subject to the constraint that α_1 + ··· + α_20 = 0.

Table of Coefficients [rows: Constant, Time, and the day effects; numerical entries not recovered.]

We find no evidence that we need the linear term (P = 0.655). For completeness, an ANOVA table is

Analysis of Variance [rows: z, Day, Error, Total; columns: Source, df, SS, MS, F, P; numerical entries not recovered.]

It might be tempting to worry about interaction in this model. Resist the temptation! First, there are not enough observations for us to fit a full interaction model and still estimate σ². If we fit separate quadratics for each day, we would have 60 mean parameters and 60 observations, so zero degrees of freedom for error. Exactly the same thing would happen if we fit a standard interaction model from Chapter 14. But more importantly, it just makes sense to think of interaction as error for these data. What does it mean for there to be a time trend in these data? Surely we have no interest in time trends that go up one day and down another day without any rhyme or reason. For a time trend to be meaningful, it needs to be something that we can spot on a consistent basis. It has to be something strong enough that we can see it over and above the natural day-to-day variation of the weighing process.

Well, the natural day-to-day variation of the weighing process is precisely the Day by Time interaction, so the interaction is precisely what we want to be using as our error term. In the model

    y_ij = µ + α_i + γ_1 z_ij + γ_2 z_ij² + ε_ij,

changes that are inconsistent across days and times, that is, terms that depend on both i and j, are what we want to use as error. (An exception to this claim is if, say, we noticed that time trends go up one day, down the next, then up again, etc. That is a form of interaction that we could be interested in, but its existence requires additional structure for the Days because it involves modeling effects for alternate days.)

15.4 Near replicate lack-of-fit tests

In Section 8.5 we discussed Fisher's lack-of-fit test. Fisher's test is based on there being duplicate cases among the predictor variables. Often there are few or none of these. Near replicate lack-of-fit tests were designed to ameliorate that problem by clustering together cases that are nearly replicates of one another. With the Hooker data, Fisher's lack-of-fit test suffers from few degrees of freedom for pure error. Table 15.8 contains a list of near replicates, obtained by grouping together cases whose temperatures were within 0.5 degrees F of one another.

Table 15.8: Hooker data. [Columns: Case, Temperature, Pressure, Near Rep., in two blocks; numerical entries not recovered.]

We then construct an F test by fitting three models. First, reindex the observations y_i, i = 1, ..., 31, as y_jk with j = 1, ..., 19 identifying the near replicate groups and k = 1, ..., N_j identifying observations within the near replicate group. Thus the simple linear regression model y_i = β_0 + β_1 x_i + ε_i can be rewritten as

    y_jk = β_0 + β_1 x_jk + ε_jk.

The first of the three models in question is the simple linear regression performed on the near replicate cluster means x̄_j,

    y_jk = β_0 + β_1 x̄_j + ε_jk.                   (15.4.1)

This is sometimes called the artificial means model because it is a regression on the near replicate cluster means x̄_j, but the clusters are artificially constructed. The second model is a one-way analysis of variance model with groups defined by the near replicate clusters,

    y_jk = µ_j + ε_jk.                              (15.4.2)

As a regression model, define the predictor variables δ_hj for h = 1, ..., 19, which equal 1 if h = j and 0 otherwise. Then model (15.4.2) can be rewritten as a multiple regression model through the origin,

    y_jk = µ_1 δ_1j + µ_2 δ_2j + ··· + µ_19 δ_19,j + ε_jk.

The last model is called an analysis of covariance model because it incorporates the original predictor (covariate) x_jk into the analysis of variance model (15.4.2). The model is

    y_jk = µ_j + β_1 x_jk + ε_jk,                   (15.4.3)

which can alternatively be written as a regression,

    y_jk = µ_1 δ_1j + µ_2 δ_2j + ··· + µ_19 δ_19,j + β_1 x_jk + ε_jk.

Fitting these three models gives the following:

Analysis of variance: artificial means model (15.4.1) [rows: Regression, Error, Total; numerical entries not recovered, except that the Error line has 29 degrees of freedom.]

Analysis of variance on near replicate groups (15.4.2) [rows: Near Reps, Error, Total; the Error line has 12 degrees of freedom.]

Analysis of covariance (15.4.3) [rows: x, Near Reps, Error, Total; the Error line has 11 degrees of freedom and mean squared error 0.027.]

The lack-of-fit test uses the difference in the sums of squares error for the first two models in the numerator of the test and the mean squared error for the analysis of covariance model in the denominator of the test. The lack-of-fit test statistic is

    F = { [SSE(15.4.1) − SSE(15.4.2)] / (29 − 12) } / 0.027 = 7.4.

This can be compared to an F(17, 11) distribution, which yields a P value of 0.001. This procedure is known as Shillington's test, cf. Christensen (2011).
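An illustrative Python sketch of Shillington's test follows (the data below are fictitious stand-ins for the Hooker data, constructed so that the three error degrees of freedom, 29, 12, and 11, match the text):

import numpy as np
import pandas as pd
import scipy.stats as stats
import statsmodels.formula.api as smf

# 19 near-replicate clusters covering 31 cases: 12 pairs plus 7 singletons.
rng = np.random.default_rng(1)
sizes = [2] * 12 + [1] * 7
grp = np.repeat(np.arange(19), sizes)
centers = np.linspace(180, 212, 19)                  # cluster temperatures
x = centers[grp] + rng.uniform(-0.25, 0.25, 31)      # cases within 0.5 F
y = -64 + 0.44 * x + rng.normal(0, 0.3, 31)          # fictitious pressures
df = pd.DataFrame({'x': x, 'y': y, 'grp': grp})
df['xbar'] = df.groupby('grp')['x'].transform('mean')

m_art = smf.ols('y ~ xbar', data=df).fit()           # (15.4.1): 29 dfE
m_grp = smf.ols('y ~ C(grp) - 1', data=df).fit()     # (15.4.2): 12 dfE
m_acv = smf.ols('y ~ C(grp) + x', data=df).fit()     # (15.4.3): 11 dfE

num_df = m_art.df_resid - m_grp.df_resid             # 29 - 12 = 17
F = (m_art.ssr - m_grp.ssr) / num_df / m_acv.mse_resid
print(F, stats.f.sf(F, num_df, m_acv.df_resid))      # compare to F(17, 11)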

15.5 Exercises

EXERCISE. Table 15.9 contains data from Sulzberger (1953) and Williams (1959) on y, the maximum compressive strength parallel to the grain of wood from ten hoop pine trees. The data also include the temperature of the evaluation and a covariate z, the moisture content of the wood. Analyze the data. Examine (tabled) polynomial contrasts in the temperatures.

Table 15.9: Compressive strength of hoop pine trees (y) with moisture contents (z). [Columns: Tree, then a (z, y) pair for each of the temperatures −20°C, 0°C, 20°C, 40°C, 60°C; numerical entries not recovered.]

EXERCISE. Smith, Gnanadesikan, and Hughes (1962) gave data on urine characteristics of young men. The men were divided into four categories based on obesity. The data contain a covariate z that measures specific gravity. The dependent variable is y_1; it measures pigment creatinine. These variables are included in Table 15.10. Perform an analysis of covariance on y_1. How do the conclusions about obesity effects change between the ACOVA and the results of the ANOVA that ignores the covariate?

Table 15.10: Excretory characteristics. [Columns z, y_1, y_2 for each of Groups I through IV; numerical entries not recovered.]

EXERCISE. Smith, Gnanadesikan, and Hughes (1962) also give data on the variable y_2, which measures chloride in the urine of young men. These data are also reported in Table 15.10. As in the previous problem, the men were divided into four categories based on obesity. Perform an analysis of covariance on y_2, again using specific gravity as the covariate z.

Compare the results of the ACOVA to the results of the ANOVA that ignores the covariate.

EXERCISE. Test the need for a power transformation in each of the following problems from the previous chapter. Use all three constructed variables on each data set and compare results. Parts (a) through (g) each refer to an exercise from Chapter 14. [The specific exercise numbers were not recovered in transcription.]

EXERCISE. Consider the analysis of covariance for a completely randomized design with one covariate. Find the form for a 99% prediction interval for an observation, say, from the first treatment group with a given covariate value z.

EXERCISE. Assuming that in model (15.3.1) Cov(ȳ_i, ˆγ) = 0, show that

    Var( Σ_{i=1}^{a} λ_i (ȳ_i − z̄_i ˆγ) ) = σ² [ (Σ_{i=1}^{a} λ_i²)/b + (Σ_{i=1}^{a} λ_i z̄_i)² / SSE_zz ].


O2. The following printout concerns a best subsets regression. Questions follow. STAT-UB.0103 Exam 01.APIL.11 OVAL Version Solutions O1. Frank Tanner is the lab manager at BioVigor, a firm that runs studies for agricultural food supplements. He has been asked to design a protocol for

More information

R 2 and F -Tests and ANOVA

R 2 and F -Tests and ANOVA R 2 and F -Tests and ANOVA December 6, 2018 1 Partition of Sums of Squares The distance from any point y i in a collection of data, to the mean of the data ȳ, is the deviation, written as y i ȳ. Definition.

More information

Correlation and Regression

Correlation and Regression Correlation and Regression Dr. Bob Gee Dean Scott Bonney Professor William G. Journigan American Meridian University 1 Learning Objectives Upon successful completion of this module, the student should

More information

Chapter 1: Linear Regression with One Predictor Variable also known as: Simple Linear Regression Bivariate Linear Regression

Chapter 1: Linear Regression with One Predictor Variable also known as: Simple Linear Regression Bivariate Linear Regression BSTT523: Kutner et al., Chapter 1 1 Chapter 1: Linear Regression with One Predictor Variable also known as: Simple Linear Regression Bivariate Linear Regression Introduction: Functional relation between

More information

Biostatistics and Design of Experiments Prof. Mukesh Doble Department of Biotechnology Indian Institute of Technology, Madras

Biostatistics and Design of Experiments Prof. Mukesh Doble Department of Biotechnology Indian Institute of Technology, Madras Biostatistics and Design of Experiments Prof. Mukesh Doble Department of Biotechnology Indian Institute of Technology, Madras Lecture - 39 Regression Analysis Hello and welcome to the course on Biostatistics

More information

STA441: Spring Multiple Regression. This slide show is a free open source document. See the last slide for copyright information.

STA441: Spring Multiple Regression. This slide show is a free open source document. See the last slide for copyright information. STA441: Spring 2018 Multiple Regression This slide show is a free open source document. See the last slide for copyright information. 1 Least Squares Plane 2 Statistical MODEL There are p-1 explanatory

More information

STAT 213 Interactions in Two-Way ANOVA

STAT 213 Interactions in Two-Way ANOVA STAT 213 Interactions in Two-Way ANOVA Colin Reimer Dawson Oberlin College 14 April 2016 Outline Last Time: Two-Way ANOVA Interaction Terms Reading Quiz (Multiple Choice) If there is no interaction present,

More information

6. Multiple regression - PROC GLM

6. Multiple regression - PROC GLM Use of SAS - November 2016 6. Multiple regression - PROC GLM Karl Bang Christensen Department of Biostatistics, University of Copenhagen. http://biostat.ku.dk/~kach/sas2016/ kach@biostat.ku.dk, tel: 35327491

More information

SMA 6304 / MIT / MIT Manufacturing Systems. Lecture 10: Data and Regression Analysis. Lecturer: Prof. Duane S. Boning

SMA 6304 / MIT / MIT Manufacturing Systems. Lecture 10: Data and Regression Analysis. Lecturer: Prof. Duane S. Boning SMA 6304 / MIT 2.853 / MIT 2.854 Manufacturing Systems Lecture 10: Data and Regression Analysis Lecturer: Prof. Duane S. Boning 1 Agenda 1. Comparison of Treatments (One Variable) Analysis of Variance

More information

Contents. TAMS38 - Lecture 10 Response surface. Lecturer: Jolanta Pielaszkiewicz. Response surface 3. Response surface, cont. 4

Contents. TAMS38 - Lecture 10 Response surface. Lecturer: Jolanta Pielaszkiewicz. Response surface 3. Response surface, cont. 4 Contents TAMS38 - Lecture 10 Response surface Lecturer: Jolanta Pielaszkiewicz Matematisk statistik - Matematiska institutionen Linköpings universitet Look beneath the surface; let not the several quality

More information

Department of Mathematics & Statistics STAT 2593 Final Examination 17 April, 2000

Department of Mathematics & Statistics STAT 2593 Final Examination 17 April, 2000 Department of Mathematics & Statistics STAT 2593 Final Examination 17 April, 2000 TIME: 3 hours. Total marks: 80. (Marks are indicated in margin.) Remember that estimate means to give an interval estimate.

More information

Interpreting the coefficients

Interpreting the coefficients Lecture Week 5 Multiple Linear Regression Interpreting the coefficients Uses of Multiple Regression Predict for specified new x-vars Predict in time. Focus on one parameter Use regression to adjust variation

More information

Multiple Regression. Inference for Multiple Regression and A Case Study. IPS Chapters 11.1 and W.H. Freeman and Company

Multiple Regression. Inference for Multiple Regression and A Case Study. IPS Chapters 11.1 and W.H. Freeman and Company Multiple Regression Inference for Multiple Regression and A Case Study IPS Chapters 11.1 and 11.2 2009 W.H. Freeman and Company Objectives (IPS Chapters 11.1 and 11.2) Multiple regression Data for multiple

More information

PubH 7405: REGRESSION ANALYSIS. MLR: INFERENCES, Part I

PubH 7405: REGRESSION ANALYSIS. MLR: INFERENCES, Part I PubH 7405: REGRESSION ANALYSIS MLR: INFERENCES, Part I TESTING HYPOTHESES Once we have fitted a multiple linear regression model and obtained estimates for the various parameters of interest, we want to

More information

INFERENCE FOR REGRESSION

INFERENCE FOR REGRESSION CHAPTER 3 INFERENCE FOR REGRESSION OVERVIEW In Chapter 5 of the textbook, we first encountered regression. The assumptions that describe the regression model we use in this chapter are the following. We

More information