Part II - Oneway Anova, Simple Linear Regression and ANCOVA with R

Gilles Lamothe
February 21, 2017

Contents

1 Anova with one factor
1.1 The data
1.2 A visual description
1.3 The aov function
1.4 Pairwise comparisons with Tukey's method
1.5 Comparison to a control - Dunnett's test
1.6 Recode a vector with car
1.7 Test for trend
1.8 Fitting a polynomial response
1.9 Optimizing a function
2 Simple Linear Regression
2.1 Test for lack-of-fit
3 Applying our new skills
4 General Linear Model - ANCOVA
4.1 Regression by groups
4.2 A general linear model
4.3 Test for Lack-of-fit
4.4 Comparing the fitness status groups
4.5 Comparing the effect of age according to fitness status
5 Applying our new skills

Intro to R Workshop - Psychology Statistics Club

1 Anova with one factor

1.1 The data

Consider the data in the file Fish.txt. (Source: Design and Analysis of Experiments, by Dean and Voss).

Description: The response variable in this experiment is hemoglobin in the blood of brown trout (in grams per 100 ml). The trout were randomly separated into four troughs. The food given to the troughs contained 0, 5, 10 and 15 grams of sulfamerazine per 100 lbs of fish (coded 1, 2, 3, 4). The measurements were taken on 10 randomly chosen fish from each trough after 35 days.

We import the data and display the names of the columns.

> fish<-read.table("fish.txt",header=TRUE,sep="\t")
> names(fish)
[1] "hemoglobin" "group"
> sapply(fish,is.factor)
hemoglobin      group
     FALSE      FALSE

We will start with an ANOVA. The explanatory variable must be a factor (i.e. a categorical variable). It identifies the groups.

> fish$group<-factor(fish$group)
> levels(fish$group)
[1] "1" "2" "3" "4"

1.2 A visual description

We can use comparative boxplots to describe the amount of hemoglobin in the blood according to the amount of sulfamerazine in the food.

with(fish,boxplot(hemoglobin~group, ylab="hemoglobin",xlab="group"))
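To see why the conversion to a factor matters, here is a minimal sketch on simulated data. The data frame `sim` and its values are hypothetical stand-ins for fish.txt, not the real measurements. With a numeric `group`, R fits a single slope (1 degree of freedom); with a factor, it fits one mean per group (r - 1 = 3 degrees of freedom).

```r
# Simulated stand-in for fish.txt (hypothetical values, not the real data)
set.seed(1)
sim <- data.frame(
  hemoglobin = rnorm(40, mean = rep(c(7, 9, 9.5, 8.5), each = 10), sd = 1),
  group      = rep(1:4, each = 10)
)

# group stored as a number: a straight line in the group code, 1 df
df_numeric <- anova(lm(hemoglobin ~ group, data = sim))$Df[1]

# group converted to a factor: one mean per group, r - 1 = 3 df
sim$group <- factor(sim$group)
df_factor <- anova(lm(hemoglobin ~ group, data = sim))$Df[1]

print(c(numeric = df_numeric, factor = df_factor))
```

Checking the degrees of freedom in the ANOVA table is a quick way to confirm that R treated the explanatory variable as intended.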

Does the amount of sulfamerazine in the food appear to have an effect on the amount of hemoglobin in the blood?

1.3 The aov function

We will use the aov function to fit an ANOVA model with one factor. Then, with the summary function, we obtain a summary of the fit.

> mod<-aov(hemoglobin~group,fish)
> summary(mod)
Df Sum Sq Mean Sq F value Pr(>F)
group **
Residuals

Remarks:
- The group means are significantly different.
- An Anova model is a linear model, so we could have used the lm function to fit the model. Note that $R^2 = 32.32\%$.

> model<-lm(hemoglobin~group,fish)
> summary(model)
Call:
lm(formula = hemoglobin ~ group, data = fish)
Residuals:
Min 1Q Median 3Q Max
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) < 2e-16 ***
group2 ***
group3 **
group4 *

Residual standard error: on 36 degrees of freedom
Multiple R-squared: , Adjusted R-squared:
F-statistic: on 3 and 36 DF, p-value:

We can get the ANOVA table for the fitted linear model. This is the test for the equality of means.

> anova(model)
Analysis of Variance Table
Response: hemoglobin
Df Sum Sq Mean Sq F value Pr(>F)
group **
Residuals

1.4 Pairwise comparisons with Tukey's method

We can use an aov object in the function TukeyHSD.

> TukeyHSD(mod)
Tukey multiple comparisons of means
95% family-wise confidence level
Fit: aov(formula = hemoglobin ~ group, data = fish)
$group
diff lwr upr p adj

We can use the glht function from the multcomp package to test these multiple hypotheses.

> library(multcomp)
> summary(glht(model, linfct = mcp(group = "Tukey")))
Simultaneous Tests for General Linear Hypotheses
Multiple Comparisons of Means: Tukey Contrasts

Fit: lm(formula = hemoglobin ~ group, data = fish)
Linear Hypotheses:
Estimate Std. Error t value Pr(>|t|)
2 - 1 == 0 **
3 - 1 == 0 *
4 - 1 == 0
3 - 2 == 0
4 - 2 == 0
4 - 3 == 0
(Adjusted p values reported -- single-step method)

1.5 Comparison to a control - Dunnett's test

At times, we are not interested in all pairwise comparisons. We might only be interested in comparing the control group to each of the other groups. To do so, we use Dunnett's test.

> summary(glht(model, linfct = mcp(group = "Dunnett")))
Simultaneous Tests for General Linear Hypotheses
Multiple Comparisons of Means: Dunnett Contrasts
Fit: lm(formula = hemoglobin ~ group, data = fish)
Linear Hypotheses:
Estimate Std. Error t value Pr(>|t|)
2 - 1 == 0 **
3 - 1 == 0 **
4 - 1 == 0 *
(Adjusted p values reported -- single-step method)

Remark: R assumes that the first level of the factor corresponds to the control group. Keep this in mind when using Dunnett's test.

1.6 Recode a vector with car

When the explanatory variable x is quantitative, we might want to describe the mean response as a function of x. Let us import the data in the file fish.txt again.

> fish<-read.table("fish.txt",header=TRUE,sep="\t")
> names(fish)
[1] "hemoglobin" "group"
> unique(fish$group)
[1] 1 2 3 4

The groups 1, 2, 3, 4 are associated with the quantity x in grams of sulfamerazine per 100 lbs of fish, which is 0, 5, 10 and 15, respectively. We can recode a vector with the function recode from the car package.

# Install the car package
install.packages("car")
# Load the car package
library(car)

We recode the values and assign them to the vector sulfamerazine. Then, we add the new vector to the dataframe fish.

> sulfamerazine<-recode(fish$group,"1=0;2=5;3=10;4=15")
> fish<-data.frame(fish,sulfamerazine)
> unique(fish$sulfamerazine)
[1]  0  5 10 15

1.7 Test for trend

We can do a test for trend with orthogonal polynomials in x of degree 0, 1, 2, ..., r-1, where r is the number of levels of x. Our explanatory variable has r = 4 levels, so we will use polynomials of degree at most three.

> model<-aov(hemoglobin~poly(sulfamerazine,3),fish)
> summary.lm(model)
Call:
aov(formula = hemoglobin ~ poly(sulfamerazine, 3), data = fish)
Residuals:
Min 1Q Median 3Q Max
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) < 2e-16 ***
poly(sulfamerazine, 3)1 *
poly(sulfamerazine, 3)2 **

poly(sulfamerazine, 3)3
Residual standard error: on 36 degrees of freedom
Multiple R-squared: , Adjusted R-squared:
F-statistic: on 3 and 36 DF, p-value:

Remark: The cubic trend is not significant. However, the quadratic trend is significant. We need a polynomial of degree 2 to describe the mean hemoglobin as a function of the amount of sulfamerazine.

1.8 Fitting a polynomial response

We want to find an estimated regression model of the form $\hat{y} = b_0 + b_1 x + b_2 x^2$, where $\hat{y}$ is the estimated amount of hemoglobin and $x$ is the amount of sulfamerazine in the food. We want to add $x$ and $x^2$ to the model. We use the syntax I(x^2) to add $x^2$ to the model.

> lm(hemoglobin~sulfamerazine+I(sulfamerazine^2),fish)
Call:
lm(formula = hemoglobin ~ sulfamerazine + I(sulfamerazine^2), data = fish)
Coefficients:
(Intercept) sulfamerazine I(sulfamerazine^2)

Our estimated model is $\hat{y} = 7.\ldots + 0.4513\,x - 0.0245\,x^2$.

We will produce a scatter plot with an overlay of the estimated regression model.

### produce a scatter plot
with(fish,plot(sulfamerazine,hemoglobin,
xlab="sulfamerazine (in grams per 100 lbs of fish)",
ylab="hemoglobin in the blood",cex.lab=1.25,cex.axis=1.25))

The above command produces the following scatter plot.

To overlay the estimated model on the plot, we need to evaluate the model at many points. Since x goes from 0 to 15, we will produce a sequence of many values from 0 to 15. Then, we create a dataframe that contains all of these values and name this vector sulfamerazine. It should have the same name as the explanatory variable in the model.

x=seq(0,15,by=0.1)
newdata<-data.frame(sulfamerazine=x)

We now use the function predict to evaluate the estimated model at all these values.

# refit the quadratic model so that predict uses it (model was last assigned the cubic trend fit)
model<-lm(hemoglobin~sulfamerazine+I(sulfamerazine^2),fish)
P<-predict(model,newdata)
lines(x,P)
# b[0] stands for the intercept estimate reported above
text(10,7,expression(hat(y)==b[0]+0.4513*x-0.0245*x^2),cex=1.2)

Remarks:
- The function lines adds points to the plot that are joined by line segments.
- The function text adds text to the plot.
- The function expression allows us to display a mathematical formula.

Here is the resulting graph. Note that the optimal amount of sulfamerazine appears to be somewhere between 5 and 10.

1.9 Optimizing a function

R has a function called optimize for optimizing univariate functions. To maximize the mean amount of hemoglobin in the blood, the amount of sulfamerazine should be $x = 9.21$. This quantity corresponds to an estimated amount of hemoglobin in the blood of $\hat{y} = 9.39$.
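As a cross-check on the value $x = 9.21$ quoted above, the maximum of a quadratic can also be found by hand. The sketch below uses only the two coefficients that appear in the estimated model, $b_1 = 0.4513$ and $b_2 = -0.0245$; the intercept does not affect the location of the maximum.

```latex
\frac{d\hat{y}}{dx} = b_1 + 2 b_2 x = 0
\quad\Longrightarrow\quad
x^{*} = -\frac{b_1}{2 b_2} = \frac{0.4513}{2 \times 0.0245} \approx 9.21
```

Since $b_2 < 0$, the parabola opens downward and this stationary point is a maximum. It lies inside the studied range of 0 to 15.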

> # coefficients of the quadratic fit from Section 1.8
> b<-unname(coef(lm(hemoglobin~sulfamerazine+I(sulfamerazine^2),fish)))
> quadf<-function(x) b[1]+b[2]*x+b[3]*x^2
> optimize(quadf, lower = 0, upper = 15,maximum = TRUE)
$maximum
[1]
$objective
[1]

2 Simple Linear Regression

Let us reconsider the fish data, where we describe the hemoglobin in the blood as a function of the amount of sulfamerazine in the food. Suppose that someone decides to use a regression approach instead of an ANOVA approach. They might try to fit a simple linear regression and produce a scatter plot with an overlay of the estimated line.

mod<-lm(hemoglobin~sulfamerazine,fish)
with(fish,plot(sulfamerazine,hemoglobin))
abline(mod)

Comments: The fit does not appear bad. However, we might be concerned about the lack-of-fit of our model. It does not appear to capture the curvature in the trend for small values of x. When there is replication for at least one of the values of x, we can do a test for lack-of-fit.

2.1 Test for lack-of-fit

To test for lack-of-fit, we start by allowing the mean response to depend on the levels of x, but without any particular pattern. That is, if there are r levels, then there could be r different means. This is an ANOVA model. Our null hypothesis is that the mean of the response can be expressed as a linear function of x. This is our simple linear model with x as a predictor. The ANOVA model and the simple linear model are nested, so we can use the anova function to compare the models.

> # fit the full model
> mod<-lm(hemoglobin~factor(sulfamerazine),fish)
> # fit the reduced model
> mod0<-lm(hemoglobin~sulfamerazine,fish)
> # compare the models
> anova(mod0,mod)
Analysis of Variance Table
Model 1: hemoglobin ~ sulfamerazine
Model 2: hemoglobin ~ factor(sulfamerazine)
Res.Df RSS Df Sum of Sq F Pr(>F)
**

We have significant evidence for the lack-of-fit of the simple linear regression model. We could try to improve the fit by approximating the response function with a quadratic function in x. Let us test for the lack-of-fit of the quadratic model. The evidence for the lack-of-fit of the quadratic model is not significant.

> # fit the full model
> mod<-lm(hemoglobin~factor(sulfamerazine),fish)
> # fit the reduced model
> mod0<-lm(hemoglobin~sulfamerazine+I(sulfamerazine^2),fish)
> # compare the models
> anova(mod0,mod)
Analysis of Variance Table
Model 1: hemoglobin ~ sulfamerazine + I(sulfamerazine^2)
Model 2: hemoglobin ~ factor(sulfamerazine)
Res.Df RSS Df Sum of Sq F Pr(>F)
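The comparison made by anova(mod0,mod) can be reproduced by hand, which may help to see what the lack-of-fit F statistic measures. The following is a sketch on simulated data; the vectors `x` and `y` and the coefficients used to generate them are hypothetical, not the fish data. The statistic compares the drop in residual sum of squares between the reduced and full models with the pure-error variance of the full model.

```r
# Simulated data with 4 dose levels and 10 replicates each (hypothetical)
set.seed(2)
x <- rep(c(0, 5, 10, 15), each = 10)
y <- 7 + 0.45 * x - 0.025 * x^2 + rnorm(40, sd = 1)

reduced <- lm(y ~ x)          # straight line: 2 parameters
full    <- lm(y ~ factor(x))  # one mean per level: 4 parameters

rss0 <- sum(resid(reduced)^2); df0 <- df.residual(reduced)
rss1 <- sum(resid(full)^2);    df1 <- df.residual(full)

# F = (extra sum of squares / extra df) / (pure-error mean square)
F_manual <- ((rss0 - rss1) / (df0 - df1)) / (rss1 / df1)
p_manual <- pf(F_manual, df0 - df1, df1, lower.tail = FALSE)

tab <- anova(reduced, full)
print(c(F_manual = F_manual, F_anova = tab$F[2]))
```

The two F values should agree, since anova applied to two nested lm fits carries out this same general linear test.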

3 Applying our new skills

Exercises to try in class.

1. Consider a company that produces items made from glass. The data is in the file glassworks.txt. (Source: Applied Linear Models, by Kutner et al.). A large company is studying the effects of the length of special training for new employees on the quality of work. Employees are randomly assigned to have either 6, 8, 10, or 12 hours of training. After the special training, each employee is given the same amount of material. The response variable is the number of acceptable pieces.
(a) Are the effects of the duration of the special training on the quality of the work significant?
(b) Do a trend analysis to describe the quality of the work as a function of the time in training.

2. Consider the data in the file Productivity.txt. (Source: Applied Linear Models, by Kutner et al.). Description: An economist compiled data on productivity improvements for a sample of firms producing computing equipment. The productivity improvement is measured on a scale from 0 to 100. The firms were classified according to the level of their average expenditures for R&D in the last three years (low, medium, high).
(a) Test whether or not the mean productivity improvement differs according to the level of R&D expenditures.

4 General Linear Model - ANCOVA

Rehabilitation: Consider the data in the file Rehabilitation.txt. Describe the association between physical fitness prior to surgery of persons undergoing corrective knee surgery and the time required in physical therapy until successful rehabilitation.

The response: The number of days required for successful completion of physical therapy.

One explanatory factor: Status of fitness prior to surgery. It is categorical with 3 levels: below average (=1), average (=2), above average (=3).

Observational design: Cross sectional. A random selection of 24 male subjects ranging in age from 18 to 30 years who underwent similar corrective knee surgery in the last year. This is an ANOVA.

Remarks:
- We can extend an ANOVA by including one or more numerical explanatory variables. This is called an analysis of covariance (ANCOVA).
- The purposes of ANCOVA:
  - Reduce the error variance, so that the ANOVA becomes more powerful.
  - Control for confounding.
- An ANCOVA model is a special case of a General Linear Model (i.e. a linear model with categorical and/or numerical predictors).
- We will use the lm function to build the model.

Learning Objectives:
- Learn to create overlaid scatter plots.
- Learn to use the lm function with more than one predictor.
- Learn to test for the lack-of-fit of linearity.
- Multiple testing concerning regression coefficients (i.e. compare many slopes or compare many intercepts).

4.1 Regression by groups

We display the names of the variables and the levels of the categorical variable.

> datar<-read.table("rehabilitation.txt",header=TRUE,sep="\t")
> names(datar)
[1] "time" "status" "age"
> sapply(datar,is.factor)
  time status    age
 FALSE  FALSE  FALSE
> datar$status<-factor(datar$status)
> levels(datar$status)
[1] "1" "2" "3"

We will create subsets of our dataframe according to the levels of the categorical variable.

## We use a logical argument to determine the rows to keep.
## Nothing after the comma means we keep all columns.
data1=datar[datar$status=="1",]
data2=datar[datar$status=="2",]
data3=datar[datar$status=="3",]

For each group, we will find the least-squares line for the time for rehabilitation as a function of age. To properly overlay the three scatter plots, we will need the range of all ages and of all values of the response.

ylim=range(datar$time)
xlim=range(datar$age)

We are now ready to produce the three plots, using plot to create the first plot and points to overlay the other two.

ylab="time to rehabilitation (in days)"
xlab="age (in years)"
plot(data1$age,data1$time,ylab=ylab,xlab=xlab,ylim=ylim,xlim=xlim)
abline(lm(data1$time~data1$age),lty=1)
# points draws on the existing axes, so the axis limits are not passed again
points(data2$age,data2$time,pch=2)
abline(lm(data2$time~data2$age),lty=2)
points(data3$age,data3$time,pch=3)
abline(lm(data3$time~data3$age),lty=3)
legend("bottomright", c("Below Average","Average","Above Average"), lty=c(1,2,3), pch=c(1,2,3))
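The three per-group fits can also be obtained in one step, without creating data1, data2 and data3 by hand. This is a sketch on simulated data; the data frame `sim` and the values used to generate it are hypothetical stand-ins for the rehabilitation data. It uses split to cut the data frame by status and sapply to fit one line per piece.

```r
# Simulated stand-in for the rehabilitation data (hypothetical values)
set.seed(4)
sim <- data.frame(
  status = factor(rep(1:3, each = 8)),
  age    = runif(24, min = 18, max = 30)
)
sim$time <- 50 - 8 * as.integer(sim$status) + 1.1 * sim$age + rnorm(24, sd = 2)

# One least-squares fit per status level; columns are the groups,
# rows are the intercept and the slope for age
by_group <- sapply(split(sim, sim$status),
                   function(g) coef(lm(time ~ age, data = g)))
print(round(by_group, 3))
```

Reading across the second row of the resulting matrix gives a quick numerical comparison of the three slopes.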

Remarks: The slopes appear to be similar. Here are the coefficients for the three models.

> lm(time~age,data=data1)$coefficients
(Intercept) age
> lm(time~age,data=data2)$coefficients
(Intercept) age
> lm(time~age,data=data3)$coefficients
(Intercept) age

If we can assume that the slopes are the same, then to describe the effects of the fitness status it suffices to compare the intercepts. To formally compare the slopes, we can build a linear model that allows for different slopes according to fitness status and compare it to a reduced model where the slopes are assumed to be equal. (This is a general linear test.)

4.2 A general linear model

We fit a linear model in which age and status are the predictors. The symbol * indicates that we want a model with interactions. This means that we are allowing a different coefficient for age according to the levels of status.

> modelf=lm(time~age*status,data=datar)
> anova(modelf)
Analysis of Variance Table
Response: time
Df Sum Sq Mean Sq F value Pr(>F)
age < 2.2e-16 ***
status e-15 ***
age:status
Residuals

Remarks: There is no significant interaction between age and status. These results suggest that the slope of the regression between the response and age is similar for the three groups. We will fit the reduced model and compare it to the above full model. The symbol + indicates that we want an additive model. This means only one coefficient for age (i.e. it does not depend on the fitness status).

> modelr=lm(time~age+status,data=datar)
> anova(modelr,modelf)
Analysis of Variance Table
Model 1: time ~ age + status
Model 2: time ~ age * status
Res.Df RSS Df Sum of Sq F Pr(>F)

Remark: This is a general linear test for the equality of the slopes. The p-value is large, so the evidence against the equality of the slopes is not significant. Note that the p-value is exactly the same as in the test for interaction. In fact, the two tests are testing exactly the same thing.

4.3 Test for Lack-of-fit

Is it reasonable to assume that the associations between time for rehabilitation and age are linear? If not, then the ANCOVA will be biased. If we have repeated values for the variable age, we can test for linearity. We simply make age a factor. This means that we make no assumption about the form of the association between the response and age.

> modelfull=lm(time~factor(age)*status,data=datar)
> anova(modelf,modelfull)
Analysis of Variance Table
Model 1: time ~ age * status
Model 2: time ~ factor(age) * status
Res.Df RSS Df Sum of Sq F Pr(>F)

Remarks:

We did not get a p-value. This means that there are no repeated values for age, so we cannot estimate the error variance. If we round the ages to the nearest integer, we might get repeated values. With the rounded ages, there is no significant difference between the two models. Hence it is reasonable to assume linearity.

> modelfull=lm(time~factor(round(age,0))*status,data=datar)
> anova(modelf,modelfull)
Analysis of Variance Table
Model 1: time ~ age * status
Model 2: time ~ factor(round(age, 0)) * status
Res.Df RSS Df Sum of Sq F Pr(>F)

If rounding does not work, try to approximate deviations from linearity with a quadratic model. We will fit the larger model by using a quadratic function of age.

> modelfull=lm(time~(age+I(age^2))*status,data=datar)
> anova(modelf,modelfull)
Analysis of Variance Table
Model 1: time ~ age * status
Model 2: time ~ (age + I(age^2)) * status
Res.Df RSS Df Sum of Sq F Pr(>F)

There are no significant differences between the two nested models. So it is reasonable to use the simpler model, i.e. to assume that the associations between the response and age are linear.

4.4 Comparing the fitness status groups

We found no significant evidence against equal slopes for the three groups. This means that it suffices to compare the intercepts to compare the three fitness status groups. We will use the additive model. The p-value for the significance of status is very small. There are significant differences between the three groups.

> modelr=lm(time~age+status,data=datar)
> anova(modelr)
Analysis of Variance Table
Response: time
Df Sum Sq Mean Sq F value Pr(>F)

age < 2.2e-16 ***
status < 2.2e-16 ***
Residuals

Warning: We will change the order of the predictors. Look at the F-value for status. Do you notice something strange? It has changed a lot. The anova function uses sequential sums of squares (i.e. of type I). This means that when testing for the significance of a predictor, we are only controlling for the effects of the predictors that come before it in the formula. If we put age after status, then the test for status does not take age into account. To not worry about order, we could use type II sums of squares. We use the Anova function in the car package.

> anova(lm(time~status+age,data=datar))
Analysis of Variance Table
Response: time
Df Sum Sq Mean Sq F value Pr(>F)
status < 2.2e-16 ***
age < 2.2e-16 ***
Residuals

> library(car)
> Anova(lm(time~status+age,data=datar))
Anova Table (Type II tests)
Response: time
Sum Sq Df F value Pr(>F)
status < 2.2e-16 ***
age < 2.2e-16 ***
Residuals
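The order dependence of sequential sums of squares can be checked on a small example. The sketch below uses simulated data (the data frame `d` is hypothetical, with age deliberately correlated with status) and base R's drop1 function as a stand-in for car's Anova. For an additive model with no interaction, the F tests from drop1 should coincide with the type II tests, because each term is tested after adjusting for all the others.

```r
# Simulated data in which age is correlated with status (hypothetical values)
set.seed(3)
n <- 24
status <- factor(rep(1:3, each = 8))
age  <- 18 + 2 * as.integer(status) + runif(n, 0, 6)
time <- 40 + 1.2 * age - 6 * as.integer(status) + rnorm(n, sd = 2)
d <- data.frame(time, age, status)

# Sequential (type I) F for status depends on where status sits in the formula
F_status_last  <- anova(lm(time ~ age + status, data = d))["status", "F value"]
F_status_first <- anova(lm(time ~ status + age, data = d))["status", "F value"]

# drop1 tests each term adjusted for all others, so order does not matter
d1 <- drop1(lm(time ~ age + status, data = d), test = "F")
d2 <- drop1(lm(time ~ status + age, data = d), test = "F")

print(c(last = F_status_last, first = F_status_first,
        drop1 = d1["status", "F value"]))
```

The sequential F for status matches the order-free one only when status is entered last, which is the behaviour described in the warning above.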

We established that there are significant differences between the three fitness status groups. Where are those differences? We will need the multcomp package. Once it is installed, we load it:

> library(multcomp)

We need to interpret the regression coefficients $\beta_0, \beta_1, \beta_2, \beta_3$.

> modelr$coefficients
(Intercept) age status2 status3

- $\beta_1$ is the slope for age.
- Notice that status 1 is missing. This means that the intercept (i.e. $\beta_0$) is for group 1.
- $\beta_2$ is the effect for group 2 in comparison to group 1.
- $\beta_3$ is the effect for group 3 in comparison to group 1.

We want to test:
- $\beta_2 = 0$, i.e. groups 1 and 2 are equal
- $\beta_3 = 0$, i.e. groups 1 and 3 are equal
- $\beta_2 = \beta_3$, i.e. groups 2 and 3 are equal

We can use the glht function from the multcomp package to test these multiple hypotheses.

> summary(glht(modelr, linfct = mcp(status = "Tukey")))
Simultaneous Tests for General Linear Hypotheses
Multiple Comparisons of Means: Tukey Contrasts
Fit: lm(formula = time ~ age + status, data = datar)
Linear Hypotheses:
Estimate Std. Error t value Pr(>|t|)
2 - 1 == 0 <1e-05 ***
3 - 1 == 0 <1e-05 ***
3 - 2 == 0 <1e-05 ***
(Adjusted p values reported -- single-step method)

4.5 Comparing the effect of age according to fitness status

We concluded above that the effect of age on the time for rehabilitation was equal across the fitness groups. What would we do if the effects were not equal? We would do some pairwise comparisons of the slopes. Let us consider the full model with interactions.

> modelf=lm(time~age*status,data=datar)
> modelf$coefficients
(Intercept) age status2 status3 age:status2 age:status3

We would like to compare the coefficients for age: age, age:status2 and age:status3. We will denote them $\beta_1$, $\beta_4$, $\beta_5$.

- $\beta_1$ is the slope for age within group 1.
- $\beta_4$ is the deviation of the slope for age within group 2 compared to group 1.
- $\beta_5$ is the deviation of the slope for age within group 3 compared to group 1.

We want to test:
- $\beta_4 = 0$, i.e. the slopes for groups 1 and 2 are equal
- $\beta_5 = 0$, i.e. the slopes for groups 1 and 3 are equal
- $\beta_4 - \beta_5 = 0$, i.e. the slopes for groups 2 and 3 are equal

We can use the glht function from the multcomp package to test these multiple hypotheses.

library(multcomp)
K <- rbind(
c(0, 0, 0, 0, 1, 0),
c(0, 0, 0, 0, 0, 1),
c(0, 0, 0, 0, 1, -1))
rownames(K) <- c("2-1", "3-1","2-3")
summary(glht(modelf, linfct = K))

Remarks:
- The model has six coefficients: $\beta_0, \beta_1, \ldots, \beta_5$.
- We are building a matrix for the test: the columns are the coefficients and the rows are the comparisons.

- The row (0, 0, 0, 0, 1, -1) means $\beta_4 - \beta_5$.

We obtain the following output.

Simultaneous Tests for General Linear Hypotheses
Fit: lm(formula = time ~ age * status, data = datar)
Linear Hypotheses:
Estimate Std. Error t value Pr(>|t|)
2-1 == 0
3-1 == 0
2-3 == 0
(Adjusted p values reported -- single-step method)

We observe that there are no significant differences in the pairwise comparisons. This should not be surprising, since we had concluded that the slopes were equal.
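To make the role of the contrast matrix concrete, the estimates and standard errors of the three slope comparisons can be computed directly from the coefficient vector and its covariance matrix. This is a sketch on simulated data (the data frame `sim` is hypothetical). Unlike glht, it reports unadjusted p-values, so it illustrates the contrasts only and is not a replacement for the simultaneous tests.

```r
# Simulated stand-in for the rehabilitation data (hypothetical values)
set.seed(5)
sim <- data.frame(
  status = factor(rep(1:3, each = 8)),
  age    = runif(24, min = 18, max = 30)
)
sim$time <- 50 - 8 * as.integer(sim$status) + 1.1 * sim$age + rnorm(24, sd = 2)

fit <- lm(time ~ age * status, data = sim)

# Same contrast matrix as in the text: columns follow the order of coef(fit)
K <- rbind("2-1" = c(0, 0, 0, 0, 1, 0),
           "3-1" = c(0, 0, 0, 0, 0, 1),
           "2-3" = c(0, 0, 0, 0, 1, -1))

est  <- drop(K %*% coef(fit))                 # estimated slope differences
se   <- sqrt(diag(K %*% vcov(fit) %*% t(K)))  # their standard errors
tval <- est / se
pval <- 2 * pt(abs(tval), df.residual(fit), lower.tail = FALSE)  # unadjusted

print(round(cbind(Estimate = est, SE = se, t = tval, p = pval), 4))
```

Each row of K picks out a linear combination of the six coefficients, so the first row simply returns the age:status2 coefficient and the third returns the difference between the two interaction coefficients.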

5 Applying our new skills

Exercises to try in class.

1. Consider the data in the file Productivity.txt. (Source: Applied Linear Models, by Kutner et al.). Description: An economist compiled data on productivity improvements for a sample of firms producing computing equipment. The productivity improvement is measured on a scale from 0 to 100. The firms were classified according to the level of their average expenditures for R&D in the last three years (low, medium, high). The economist also has information on annual productivity improvement in the prior year and wishes to use this information as a covariate in the model.
(a) Test whether the three regression lines that describe the productivity improvement as a linear function of the productivity improvement in the prior year have the same slope.
(b) Conduct a formal test of whether the three regression functions are linear.
(c) Make pairwise comparisons with an ANCOVA to compare the firms according to their average expenditures for R&D in the last three years.

Biostatistics for physicists fall Correlation Linear regression Analysis of variance

Biostatistics for physicists fall Correlation Linear regression Analysis of variance Biostatistics for physicists fall 2015 Correlation Linear regression Analysis of variance Correlation Example: Antibody level on 38 newborns and their mothers There is a positive correlation in antibody

More information

Inferences on Linear Combinations of Coefficients

Inferences on Linear Combinations of Coefficients Inferences on Linear Combinations of Coefficients Note on required packages: The following code required the package multcomp to test hypotheses on linear combinations of regression coefficients. If you

More information

Regression. Bret Hanlon and Bret Larget. December 8 15, Department of Statistics University of Wisconsin Madison.

Regression. Bret Hanlon and Bret Larget. December 8 15, Department of Statistics University of Wisconsin Madison. Regression Bret Hanlon and Bret Larget Department of Statistics University of Wisconsin Madison December 8 15, 2011 Regression 1 / 55 Example Case Study The proportion of blackness in a male lion s nose

More information

Analysis of variance. Gilles Guillot. September 30, Gilles Guillot September 30, / 29

Analysis of variance. Gilles Guillot. September 30, Gilles Guillot September 30, / 29 Analysis of variance Gilles Guillot gigu@dtu.dk September 30, 2013 Gilles Guillot (gigu@dtu.dk) September 30, 2013 1 / 29 1 Introductory example 2 One-way ANOVA 3 Two-way ANOVA 4 Two-way ANOVA with interactions

More information

General Linear Statistical Models - Part III

General Linear Statistical Models - Part III General Linear Statistical Models - Part III Statistics 135 Autumn 2005 Copyright c 2005 by Mark E. Irwin Interaction Models Lets examine two models involving Weight and Domestic in the cars93 dataset.

More information

> nrow(hmwk1) # check that the number of observations is correct [1] 36 > attach(hmwk1) # I like to attach the data to avoid the '$' addressing

> nrow(hmwk1) # check that the number of observations is correct [1] 36 > attach(hmwk1) # I like to attach the data to avoid the '$' addressing Homework #1 Key Spring 2014 Psyx 501, Montana State University Prof. Colleen F Moore Preliminary comments: The design is a 4x3 factorial between-groups. Non-athletes do aerobic training for 6, 4 or 2 weeks,

More information

Tests of Linear Restrictions

Tests of Linear Restrictions Tests of Linear Restrictions 1. Linear Restricted in Regression Models In this tutorial, we consider tests on general linear restrictions on regression coefficients. In other tutorials, we examine some

More information

1 Multiple Regression

1 Multiple Regression 1 Multiple Regression In this section, we extend the linear model to the case of several quantitative explanatory variables. There are many issues involved in this problem and this section serves only

More information

Workshop 7.4a: Single factor ANOVA

Workshop 7.4a: Single factor ANOVA -1- Workshop 7.4a: Single factor ANOVA Murray Logan November 23, 2016 Table of contents 1 Revision 1 2 Anova Parameterization 2 3 Partitioning of variance (ANOVA) 10 4 Worked Examples 13 1. Revision 1.1.

More information

FACTORIAL DESIGNS and NESTED DESIGNS

FACTORIAL DESIGNS and NESTED DESIGNS Experimental Design and Statistical Methods Workshop FACTORIAL DESIGNS and NESTED DESIGNS Jesús Piedrafita Arilla jesus.piedrafita@uab.cat Departament de Ciència Animal i dels Aliments Items Factorial

More information

Multiple Predictor Variables: ANOVA

Multiple Predictor Variables: ANOVA What if you manipulate two factors? Multiple Predictor Variables: ANOVA Block 1 Block 2 Block 3 Block 4 A B C D B C D A C D A B D A B C Randomized Controlled Blocked Design: Design where each treatment

More information

Simple, Marginal, and Interaction Effects in General Linear Models

Simple, Marginal, and Interaction Effects in General Linear Models Simple, Marginal, and Interaction Effects in General Linear Models PRE 905: Multivariate Analysis Lecture 3 Today s Class Centering and Coding Predictors Interpreting Parameters in the Model for the Means

More information

STAT 572 Assignment 5 - Answers Due: March 2, 2007

STAT 572 Assignment 5 - Answers Due: March 2, 2007 1. The file glue.txt contains a data set with the results of an experiment on the dry sheer strength (in pounds per square inch) of birch plywood, bonded with 5 different resin glues A, B, C, D, and E.

More information

Multiple Predictor Variables: ANOVA

Multiple Predictor Variables: ANOVA Multiple Predictor Variables: ANOVA 1/32 Linear Models with Many Predictors Multiple regression has many predictors BUT - so did 1-way ANOVA if treatments had 2 levels What if there are multiple treatment

More information

Stats fest Analysis of variance. Single factor ANOVA. Aims. Single factor ANOVA. Data

Stats fest Analysis of variance. Single factor ANOVA. Aims. Single factor ANOVA. Data 1 Stats fest 2007 Analysis of variance murray.logan@sci.monash.edu.au Single factor ANOVA 2 Aims Description Investigate differences between population means Explanation How much of the variation in response

More information

Chapter 16: Understanding Relationships Numerical Data

Chapter 16: Understanding Relationships Numerical Data Chapter 16: Understanding Relationships Numerical Data These notes reflect material from our text, Statistics, Learning from Data, First Edition, by Roxy Peck, published by CENGAGE Learning, 2015. Linear

More information




