Linear Model Specification in R


How to deal with overparameterisation?

Paul Janssen (1), Luc Duchateau (2)
(1) Center for Statistics, Hasselt University, Belgium
(2) Faculty of Veterinary Medicine, Ghent University, Belgium

1. The data set

1.1. The data set of exam preparation

    DoseP    DoseK   Yield
    low      low     21
    low      low     23
    medium   low     19
    medium   low     24
    high     low     24
    high     low     21
    low      high    29
    low      high    31
    medium   high    35
    medium   high    36
    high     high    41
    high     high    40

1.2. The extended data set

    DosePnum   DoseP    DoseK   Yield
    1          low      low     21
    1          low      low     23
    2          medium   low     19
    2          medium   low     24
    3          high     low     24
    3          high     low     21
    1          low      high    29
    1          low      high    31
    2          medium   high    35
    2          medium   high    36
    3          high     high    41
    3          high     high    40
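The slides read the data from a text file. As a minimal sketch, the same twelve rows can be typed in directly so that the later model fits can be reproduced without the file. The object name `tomatopk` and the column names follow the slides; building the frame with `rep()` is only a convenience chosen here.

```r
# Rebuild the extended data set from the table in section 1.2.
tomatopk <- data.frame(
  DosePnum = rep(c(1, 1, 2, 2, 3, 3), times = 2),
  DoseP    = factor(rep(rep(c("low", "medium", "high"), each = 2), times = 2)),
  DoseK    = factor(rep(c("low", "high"), each = 6)),
  Yield    = c(21, 23, 19, 24, 24, 21, 29, 31, 35, 36, 41, 40)
)

str(tomatopk)

# R sorts factor levels alphabetically, which decides the reference level
# and the coefficient names in the models that follow.
levels(tomatopk$DoseP)
```

Because the levels are sorted alphabetically, "high" comes first for both factors, which is why the R output in the later sections is labelled in the order high, low, medium.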

2. The linear regression model

2.1. Model specification

The model is given by

$$Y_i = \beta_0 + \beta_1\,\mathrm{DosePnum}_i + \varepsilon_i, \qquad \varepsilon_i \overset{\text{iid}}{\sim} N(0, \sigma^2)$$

There is no overparameterisation in this model: both the intercept $\beta_0$ and the slope $\beta_1$ have a clear meaning.

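2.2. Model matrix

With the observations in the order of the data set in section 1.2:

$$
\begin{bmatrix} Y_1 \\ Y_2 \\ Y_3 \\ Y_4 \\ Y_5 \\ Y_6 \\ Y_7 \\ Y_8 \\ Y_9 \\ Y_{10} \\ Y_{11} \\ Y_{12} \end{bmatrix}
=
\begin{bmatrix} 1 & 1 \\ 1 & 1 \\ 1 & 2 \\ 1 & 2 \\ 1 & 3 \\ 1 & 3 \\ 1 & 1 \\ 1 & 1 \\ 1 & 2 \\ 1 & 2 \\ 1 & 3 \\ 1 & 3 \end{bmatrix}
\begin{bmatrix} \beta_0 \\ \beta_1 \end{bmatrix}
+
\begin{bmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \varepsilon_3 \\ \varepsilon_4 \\ \varepsilon_5 \\ \varepsilon_6 \\ \varepsilon_7 \\ \varepsilon_8 \\ \varepsilon_9 \\ \varepsilon_{10} \\ \varepsilon_{11} \\ \varepsilon_{12} \end{bmatrix}
$$

$$Y = X\beta + \varepsilon, \qquad \varepsilon \sim \mathrm{MVN}(0, \sigma^2 I_{12})$$

X is full rank.

Think of a data structure for which the linear regression model would become overparameterised.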
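The full-rank claim can be checked numerically. The sketch below builds the model matrix with `model.matrix()` and compares its rank, taken from a QR decomposition, with its number of columns. The vector `DosePnum` is typed in from the data set; the names `X` and `rank_X` are chosen here for illustration.

```r
# Dose codes in the order of the data set (1 = low, 2 = medium, 3 = high).
DosePnum <- rep(c(1, 1, 2, 2, 3, 3), times = 2)

# Model matrix of the simple linear regression: a column of ones and DosePnum.
X <- model.matrix(~ DosePnum)
dim(X)

# X is full rank when its rank equals its number of columns.
rank_X <- qr(X)$rank
rank_X == ncol(X)
```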

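2.3. Overparameterised linear regression model

Assume we have used only one dose, for instance the medium dose:

$$
\begin{bmatrix} Y_1 \\ Y_2 \\ Y_3 \\ Y_4 \\ Y_5 \\ Y_6 \\ Y_7 \\ Y_8 \\ Y_9 \\ Y_{10} \\ Y_{11} \\ Y_{12} \end{bmatrix}
=
\begin{bmatrix} 1 & 2 \\ 1 & 2 \\ 1 & 2 \\ 1 & 2 \\ 1 & 2 \\ 1 & 2 \\ 1 & 2 \\ 1 & 2 \\ 1 & 2 \\ 1 & 2 \\ 1 & 2 \\ 1 & 2 \end{bmatrix}
\begin{bmatrix} \beta_0 \\ \beta_1 \end{bmatrix}
+
\begin{bmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \varepsilon_3 \\ \varepsilon_4 \\ \varepsilon_5 \\ \varepsilon_6 \\ \varepsilon_7 \\ \varepsilon_8 \\ \varepsilon_9 \\ \varepsilon_{10} \\ \varepsilon_{11} \\ \varepsilon_{12} \end{bmatrix}
$$

X is no longer full rank: its second column is twice the first, so $\beta_0$ and $\beta_1$ cannot be estimated separately.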
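A small sketch of what R does in this situation. The yields are reused from the data set purely for illustration, and `DoseConst` is a hypothetical variable in which every plot received the medium dose. `lm()` detects the rank deficiency and reports the aliased slope as `NA`.

```r
# Yields reused from the data set, for illustration only.
Yield <- c(21, 23, 19, 24, 24, 21, 29, 31, 35, 36, 41, 40)

# Hypothetical design: every plot received the medium dose (code 2).
DoseConst <- rep(2, 12)

X <- model.matrix(~ DoseConst)
rank_X <- qr(X)$rank    # smaller than ncol(X): X is rank deficient
c(columns = ncol(X), rank = rank_X)

# lm() drops the aliased column; its coefficient is reported as NA
# and the intercept becomes the overall mean.
fit <- lm(Yield ~ DoseConst)
coef(fit)
```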

2.4. Model specification in R

    setwd("c:/users/lduchate/docs/oc/onderwijs/adekus/part3basicprinciples")
    tomatopk <- read.table("tomatopk.txt", header = TRUE)
    linres.tomatopk <- lm(Yield ~ DosePnum, data = tomatopk)
    summary(linres.tomatopk)

The summary reports the call lm(formula = Yield ~ DosePnum, data = tomatopk). The intercept is flagged ** and DosePnum carries no significance flag. The residual standard error is based on 10 degrees of freedom and the F-statistic on 1 and 10 DF.
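As a cross-check that does not need the text file, the sketch below types the data in and compares the coefficients from `lm()` with the textbook least-squares formulas (slope = Sxy / Sxx, intercept = mean of Y minus slope times mean of x). The helper names `sxy`, `sxx`, `slope_by_hand` and `intercept_by_hand` are chosen here.

```r
Yield    <- c(21, 23, 19, 24, 24, 21, 29, 31, 35, 36, 41, 40)
DosePnum <- rep(c(1, 1, 2, 2, 3, 3), times = 2)

fit <- lm(Yield ~ DosePnum)

# Least-squares estimates computed from their defining formulas.
sxy <- sum((DosePnum - mean(DosePnum)) * (Yield - mean(Yield)))
sxx <- sum((DosePnum - mean(DosePnum))^2)
slope_by_hand     <- sxy / sxx
intercept_by_hand <- mean(Yield) - slope_by_hand * mean(DosePnum)

# Both routes should agree.
rbind(lm = coef(fit), by_hand = c(intercept_by_hand, slope_by_hand))
df.residual(fit)
```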

3. One-Way Analysis of Variance

We consider 3 different models:

- The cell means model (not overparameterised)
- The factor effects model with treatment restriction
- The factor effects model with sum restriction

The last two are restricted versions of the factor effects model, which is overparameterised as long as no restriction is imposed.

3.1. The cell means model

Consider the effect of P dose as a categorical variable. The model is given by

$$Y_{ij} = \mu_i + \varepsilon_{ij}, \qquad \varepsilon_{ij} \overset{\text{iid}}{\sim} N(0, \sigma^2)$$

with $\mu_1$, $\mu_2$ and $\mu_3$ the population mean yield of the low, medium and high P doses, respectively.

There is no overparameterisation in this model: the population means have a clear meaning.

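Model matrix

$$
\begin{bmatrix} Y_{11} \\ Y_{12} \\ Y_{21} \\ Y_{22} \\ Y_{31} \\ Y_{32} \\ Y_{13} \\ Y_{14} \\ Y_{23} \\ Y_{24} \\ Y_{33} \\ Y_{34} \end{bmatrix}
=
\begin{bmatrix} 1 & 0 & 0 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 1 \\ 1 & 0 & 0 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} \mu_1 \\ \mu_2 \\ \mu_3 \end{bmatrix}
+
\begin{bmatrix} \varepsilon_{11} \\ \varepsilon_{12} \\ \varepsilon_{21} \\ \varepsilon_{22} \\ \varepsilon_{31} \\ \varepsilon_{32} \\ \varepsilon_{13} \\ \varepsilon_{14} \\ \varepsilon_{23} \\ \varepsilon_{24} \\ \varepsilon_{33} \\ \varepsilon_{34} \end{bmatrix}
$$

$$Y = X\beta + \varepsilon, \qquad \varepsilon \sim \mathrm{MVN}(0, \sigma^2 I_{12})$$

X is full rank.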
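The cell means model matrix can be generated in R by removing the intercept from the formula. This is a sketch with the factor typed in from the data set; note that R orders the columns alphabetically (high, low, medium), not in the low, medium, high order used for $\mu_1$, $\mu_2$, $\mu_3$ on the slide.

```r
DoseP <- factor(rep(rep(c("low", "medium", "high"), each = 2), times = 2))

# "- 1" removes the intercept, leaving one indicator column per level.
X <- model.matrix(~ DoseP - 1)
colnames(X)

# Each observation belongs to exactly one level, so every row sums to 1.
table(rowSums(X))

# Three columns, rank three: the matrix is full rank.
rank_X <- qr(X)$rank
rank_X == ncol(X)
```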

Model specification in R

    onewaycellm.tomatopk <- lm(Yield ~ DoseP - 1, data = tomatopk)
    summary(onewaycellm.tomatopk)

The summary reports the call lm(formula = Yield ~ DoseP - 1, data = tomatopk). All three coefficients (DosePhigh, DosePlow, DosePmedium) are flagged ***. The residual standard error is based on 9 degrees of freedom and the F-statistic on 3 and 9 DF, with p-value 6.651e-06.

The parameters just correspond to the sample means:

    tapply(tomatopk$Yield, tomatopk$DoseP, mean)

      high    low medium
      31.5   26.0   28.5

3.2. The factor effects model

Consider the effect of P dose as a categorical variable. The model is given by

$$Y_{ij} = \mu + \delta_i + \varepsilon_{ij}, \qquad \varepsilon_{ij} \overset{\text{iid}}{\sim} N(0, \sigma^2)$$

with $\mu$ a parameter common to all P dose levels, and $\delta_1$, $\delta_2$ and $\delta_3$ the effect of the low, medium and high P doses, respectively, on the population mean yield.

There is overparameterisation in this model: four parameters describe three population means, so the individual parameters do not have a clear meaning.

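Model matrix

$$
\begin{bmatrix} Y_{11} \\ Y_{12} \\ Y_{21} \\ Y_{22} \\ Y_{31} \\ Y_{32} \\ Y_{13} \\ Y_{14} \\ Y_{23} \\ Y_{24} \\ Y_{33} \\ Y_{34} \end{bmatrix}
=
\begin{bmatrix} 1 & 1 & 0 & 0 \\ 1 & 1 & 0 & 0 \\ 1 & 0 & 1 & 0 \\ 1 & 0 & 1 & 0 \\ 1 & 0 & 0 & 1 \\ 1 & 0 & 0 & 1 \\ 1 & 1 & 0 & 0 \\ 1 & 1 & 0 & 0 \\ 1 & 0 & 1 & 0 \\ 1 & 0 & 1 & 0 \\ 1 & 0 & 0 & 1 \\ 1 & 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} \mu \\ \delta_1 \\ \delta_2 \\ \delta_3 \end{bmatrix}
+
\begin{bmatrix} \varepsilon_{11} \\ \varepsilon_{12} \\ \varepsilon_{21} \\ \varepsilon_{22} \\ \varepsilon_{31} \\ \varepsilon_{32} \\ \varepsilon_{13} \\ \varepsilon_{14} \\ \varepsilon_{23} \\ \varepsilon_{24} \\ \varepsilon_{33} \\ \varepsilon_{34} \end{bmatrix}
$$

$$Y = X\beta + \varepsilon, \qquad \varepsilon \sim \mathrm{MVN}(0, \sigma^2 I_{12})$$

X is not full rank: its first column is the sum of the other three.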
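R never builds this overparameterised matrix itself, so the sketch below assembles it by hand: a column of ones for $\mu$ next to one indicator column per dose level. The names `ind` and `X_over` are chosen here. The rank comes out one less than the number of columns, which is the overparameterisation in numerical form.

```r
DoseP <- factor(rep(rep(c("low", "medium", "high"), each = 2), times = 2))

# One indicator column per level, then prepend the column of ones for mu.
ind    <- model.matrix(~ DoseP - 1)
X_over <- cbind(mu = 1, ind)

n_col  <- ncol(X_over)
rank_X <- qr(X_over)$rank
c(columns = n_col, rank = rank_X)

# The linear dependency: the mu column equals the sum of the indicators.
all(X_over[, "mu"] == rowSums(ind))
```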

3.2.2 Model specification in R: treatment restriction

    options("contrasts")

    $contrasts
    [1] "contr.treatment" "contr.treatment"

    onewaytrt.tomatopk <- lm(Yield ~ DoseP, data = tomatopk)
    summary(onewaytrt.tomatopk)

The summary reports the call lm(formula = Yield ~ DoseP, data = tomatopk). The intercept is flagged *** (p-value of order 1e-05); DosePlow and DosePmedium carry no significance flag. The residual standard error is based on 9 degrees of freedom, the multiple R-squared is 0.091 and the F-statistic is on 2 and 9 DF.

The estimated parameters of the model are given by

    (Intercept)    DosePlow    DosePmedium
           31.5        -5.5           -3.0

In this factor effects model with treatment restriction the high dose is the reference level, and we have

    µ_H = µ = intercept = 31.5
    µ_L = intercept + DosePlow = 31.5 - 5.5 = 26
    µ_M = intercept + DosePmedium = 31.5 - 3 = 28.5
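The arithmetic above can be reproduced in R. This sketch types the data in, requests treatment contrasts through the `contrasts` argument of `lm()` instead of changing the global option, and rebuilds the three population means from the coefficients. The names `fit_trt` and `mu_hat` are chosen here.

```r
tomatopk <- data.frame(
  DoseP = factor(rep(rep(c("low", "medium", "high"), each = 2), times = 2)),
  Yield = c(21, 23, 19, 24, 24, 21, 29, 31, 35, 36, 41, 40)
)

# Treatment restriction: the first level ("high") is the reference.
fit_trt <- lm(Yield ~ DoseP, data = tomatopk,
              contrasts = list(DoseP = "contr.treatment"))
b <- coef(fit_trt)

# Reference mean is the intercept; the others add their own coefficient.
mu_hat <- c(
  high   = b[["(Intercept)"]],
  low    = b[["(Intercept)"]] + b[["DosePlow"]],
  medium = b[["(Intercept)"]] + b[["DosePmedium"]]
)
mu_hat

# Same values as the sample means per dose level.
tapply(tomatopk$Yield, tomatopk$DoseP, mean)
```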

3.2.3 Model specification in R: sum restriction

    options(contrasts = c("contr.sum", "contr.sum"))
    onewaysum.tomatopk <- lm(Yield ~ DoseP, data = tomatopk)
    summary(onewaysum.tomatopk)

The summary reports the call lm(formula = Yield ~ DoseP, data = tomatopk). The intercept is flagged *** (p-value of order 1e-07); DoseP1 and DoseP2 carry no significance flag. The residual standard error is based on 9 degrees of freedom, the multiple R-squared is 0.091 and the F-statistic is on 2 and 9 DF.

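The estimated parameters of the model are given by

    (Intercept)    DoseP1    DoseP2
         28.667     2.833    -2.667

In this overparameterised model we use the Σ-restriction, i.e.

$$\delta_1 + \delta_2 + \delta_3 = 0$$

and

$$\mu_H = \mu + \delta_1, \qquad \mu_L = \mu + \delta_2, \qquad \mu_M = \mu + \delta_3$$

R orders the levels alphabetically, so in this output $\delta_1$, $\delta_2$ and $\delta_3$ refer to the high, low and medium dose. With $\mu = 28.667$, $\delta_1 = 2.833$ and $\delta_2 = -2.667$ we obtain

    δ3  = -(2.833 - 2.667) = -0.167
    µ_H = µ + δ1 = 28.667 + 2.833 = 31.5
    µ_L = µ + δ2 = 28.667 - 2.667 = 26
    µ_M = µ + δ3 = 28.667 - 0.167 = 28.5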
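A sketch of the same calculation in R. The sum restriction is requested through the `contrasts` argument of `lm()` so that the global option stays untouched; the third effect, which R does not print, is recovered from the restriction. The names `fit_sum`, `delta` and `mu_hat` are chosen here.

```r
tomatopk <- data.frame(
  DoseP = factor(rep(rep(c("low", "medium", "high"), each = 2), times = 2)),
  Yield = c(21, 23, 19, 24, 24, 21, 29, 31, 35, 36, 41, 40)
)

# Sum restriction: the effects of the three levels add up to zero.
fit_sum <- lm(Yield ~ DoseP, data = tomatopk,
              contrasts = list(DoseP = "contr.sum"))
b <- coef(fit_sum)

# DoseP1 and DoseP2 belong to the first two levels (high, low);
# the effect of the last level (medium) follows from the restriction.
delta <- c(high = b[["DoseP1"]], low = b[["DoseP2"]])
delta["medium"] <- -sum(delta)
delta

# Population means: overall mean plus the level effect.
mu_hat <- b[["(Intercept)"]] + delta
mu_hat
```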

4. Two-Way Analysis of Variance

The cell means model specification

Consider the effect of P and K dose as categorical variables. The model is given by

$$Y_{ijk} = \mu_{ij} + \varepsilon_{ijk}, \qquad \varepsilon_{ijk} \overset{\text{iid}}{\sim} N(0, \sigma^2)$$

with $\mu_{ij}$ the population mean yield of the $i$th P dose and the $j$th K dose.

There is no overparameterisation in this model: the population means have a clear meaning. We cannot, however, disentangle the effects of the K and the P dose.

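Matrix notation: cell means model

With $i$ the P dose (1 = low, 2 = medium, 3 = high), $j$ the K dose (1 = low, 2 = high) and $k$ the replicate:

$$
\begin{bmatrix} Y_{111} \\ Y_{112} \\ Y_{211} \\ Y_{212} \\ Y_{311} \\ Y_{312} \\ Y_{121} \\ Y_{122} \\ Y_{221} \\ Y_{222} \\ Y_{321} \\ Y_{322} \end{bmatrix}
=
\begin{bmatrix}
1 & 0 & 0 & 0 & 0 & 0 \\
1 & 0 & 0 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 & 0 & 0 \\
0 & 0 & 0 & 1 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & 0 \\
0 & 0 & 0 & 0 & 1 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 \\
0 & 0 & 0 & 0 & 0 & 1
\end{bmatrix}
\begin{bmatrix} \mu_{11} \\ \mu_{21} \\ \mu_{31} \\ \mu_{12} \\ \mu_{22} \\ \mu_{32} \end{bmatrix}
+
\begin{bmatrix} \varepsilon_{111} \\ \vdots \\ \varepsilon_{321} \\ \varepsilon_{322} \end{bmatrix}
$$

$$Y = X\beta + \varepsilon$$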

We first have a look at the sample means of the treatment combinations:

    yieldpk.average <- aggregate(tomatopk$Yield,
                                 list(DoseP = tomatopk$DoseP, DoseK = tomatopk$DoseK),
                                 mean)
    yieldpk.average

       DoseP DoseK    x
    1   high  high 40.5
    2    low  high 30.0
    3 medium  high 35.5
    4   high   low 22.5
    5    low   low 22.0
    6 medium   low 21.5

Next we can look at the interaction plot:

    interaction.plot(tomatopk$DoseP, tomatopk$DoseK, tomatopk$Yield,
                     trace.label = "DoseK", xlab = "DoseP", ylab = "Mean Yield")

[Figure: interaction plot of mean Yield against DoseP (high, low, medium), with one trace per DoseK level (high, low).]

The plot shows that yield is higher with the higher K dose, and that there is only a substantial effect of the P dose at the high K dose. The two factors therefore interact!

The factor effects model specification

We decompose the cell means $\mu_{ij}$, leading to

$$Y_{ijk} = \mu + \pi_i + \kappa_j + (\pi\kappa)_{ij} + \varepsilon_{ijk}$$

- $\mu$ is the overall mean
- $\pi_i = \mu_{i.} - \mu$ is the effect of P dose $i$, $i = 1, 2, 3$
- $\kappa_j = \mu_{.j} - \mu$ is the effect of K dose $j$, $j = 1, 2$
- $(\pi\kappa)_{ij} = \mu_{ij} - (\mu + \pi_i + \kappa_j)$ is the interaction effect for P dose $i$ and K dose $j$
- $\varepsilon_{ijk}$ is the random error term

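Matrix notation: factor effects model

$$
\begin{bmatrix} Y_{111} \\ Y_{112} \\ Y_{211} \\ Y_{212} \\ Y_{311} \\ Y_{312} \\ Y_{121} \\ Y_{122} \\ Y_{221} \\ Y_{222} \\ Y_{321} \\ Y_{322} \end{bmatrix}
=
\begin{bmatrix}
1 & 1 & 0 & 0 & 1 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\
1 & 1 & 0 & 0 & 1 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\
1 & 0 & 1 & 0 & 1 & 0 & 0 & 1 & 0 & 0 & 0 & 0 \\
1 & 0 & 1 & 0 & 1 & 0 & 0 & 1 & 0 & 0 & 0 & 0 \\
1 & 0 & 0 & 1 & 1 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\
1 & 0 & 0 & 1 & 1 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\
1 & 1 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 1 & 0 & 0 \\
1 & 1 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 1 & 0 & 0 \\
1 & 0 & 1 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 1 & 0 \\
1 & 0 & 1 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 1 & 0 \\
1 & 0 & 0 & 1 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 1 \\
1 & 0 & 0 & 1 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 1
\end{bmatrix}
\begin{bmatrix} \mu \\ \pi_1 \\ \pi_2 \\ \pi_3 \\ \kappa_1 \\ \kappa_2 \\ (\pi\kappa)_{11} \\ (\pi\kappa)_{21} \\ (\pi\kappa)_{31} \\ (\pi\kappa)_{12} \\ (\pi\kappa)_{22} \\ (\pi\kappa)_{32} \end{bmatrix}
+
\begin{bmatrix} \varepsilon_{111} \\ \vdots \\ \varepsilon_{321} \\ \varepsilon_{322} \end{bmatrix}
$$

$$Y = X\beta + \varepsilon$$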

We revisit the example: main effects

                               DoseP
    DoseK    Low             Medium          High
    Low      22              21.5            22.5            µ.L = 22       (κ1 = -6.667)
    High     30              35.5            40.5            µ.H = 35.333   (κ2 = 6.667)
             µL. = 26        µM. = 28.5      µH. = 31.5      µ.. = µ = 28.667
             (π1 = -2.667)   (π2 = -0.167)   (π3 = 2.833)

Interaction effects: $(\pi\kappa)_{ij} = \mu_{ij} - \mu - \pi_i - \kappa_j$

                      DoseP
    DoseK    Low       Medium    High
    Low       2.667    -0.333    -2.333
    High     -2.667     0.333     2.333

Example

    µ_LL = 22   = µ + π1 + κ1 + (πκ)11 = 28.667 - 2.667 - 6.667 + 2.667
    µ_HH = 40.5 = µ + π3 + κ2 + (πκ)32 = 28.667 + 2.833 + 6.667 + 2.333
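The decomposition into overall mean, main effects and interaction effects can be computed directly from the table of cell means. The sketch below does this with `tapply()`, `rowMeans()`, `colMeans()` and `outer()`. The names `cell`, `mu`, `pi_i`, `kappa_j` and `pk_ij` are chosen here, and rows and columns appear in R's alphabetical level order.

```r
tomatopk <- data.frame(
  DoseP = factor(rep(rep(c("low", "medium", "high"), each = 2), times = 2)),
  DoseK = factor(rep(c("low", "high"), each = 6)),
  Yield = c(21, 23, 19, 24, 24, 21, 29, 31, 35, 36, 41, 40)
)

# Cell means: rows are P dose levels, columns are K dose levels.
cell <- tapply(tomatopk$Yield, list(tomatopk$DoseP, tomatopk$DoseK), mean)

mu      <- mean(cell)             # overall mean (balanced design)
pi_i    <- rowMeans(cell) - mu    # P dose main effects
kappa_j <- colMeans(cell) - mu    # K dose main effects

# Interaction: what is left of each cell mean after the additive part.
pk_ij <- cell - mu - outer(pi_i, kappa_j, "+")

round(pi_i, 3)
round(kappa_j, 3)
round(pk_ij, 3)
```

By construction the main effects sum to zero, and so does every row and every column of the interaction table, which is the sum restriction used in the R fit that follows.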

4.2.3 Model specification in R: sum restriction

    options(contrasts = c("contr.sum", "contr.sum"))
    twowaysum.tomatopk <- lm(Yield ~ DoseP * DoseK, data = tomatopk)
    summary(twowaysum.tomatopk)

The summary reports the call lm(formula = Yield ~ DoseP * DoseK, data = tomatopk). The intercept (p-value of order 1e-09) and DoseK1 (p-value of order 1e-05) are flagged ***; DoseP1, DoseP2, DoseP1:DoseK1 and DoseP2:DoseK1 are flagged *. The residual standard error is based on 6 degrees of freedom, the multiple R-squared is 0.967 and the F-statistic is on 5 and 6 DF.

Interpretation of parameter estimates

DoseP: DoseP1 (High) and DoseP2 (Low)
DoseK: DoseK1 (High)

From the output we see DoseP1 = 2.833, DoseP2 = -2.667, DoseK1 = 6.667, DoseP1:DoseK1 = 2.333, DoseP2:DoseK1 = -2.667.

Due to the sum restriction we have DoseP3 = -0.167, DoseK2 = -6.667, DoseP3:DoseK1 = -(2.333 - 2.667) = 0.333, DoseP1:DoseK2 = -2.333, DoseP2:DoseK2 = 2.667, DoseP3:DoseK2 = -0.333.

    µ_LL = intercept + DoseP2 + DoseK2 + DoseP2:DoseK2 = 28.667 - 2.667 - 6.667 + 2.667 = 22
    µ_HH = intercept + DoseP1 + DoseK1 + DoseP1:DoseK1 = 28.667 + 2.833 + 6.667 + 2.333 = 40.5
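The bookkeeping above can be sketched in R. The coefficients that R omits are filled in from the sum restriction and two of the cell means are rebuilt from them. The contrasts are set through the `contrasts` argument of `lm()`; the names `P`, `K`, `PK_highK` and `PK_lowK` are chosen here.

```r
tomatopk <- data.frame(
  DoseP = factor(rep(rep(c("low", "medium", "high"), each = 2), times = 2)),
  DoseK = factor(rep(c("low", "high"), each = 6)),
  Yield = c(21, 23, 19, 24, 24, 21, 29, 31, 35, 36, 41, 40)
)

fit <- lm(Yield ~ DoseP * DoseK, data = tomatopk,
          contrasts = list(DoseP = "contr.sum", DoseK = "contr.sum"))
b <- coef(fit)

# Main effects: the last level follows from the sum restriction.
P <- c(high = b[["DoseP1"]], low = b[["DoseP2"]])
P["medium"] <- -sum(P)
K <- c(high = b[["DoseK1"]])
K["low"] <- -K[["high"]]

# Interactions at the high K dose; those at the low K dose are their negatives.
PK_highK <- c(high = b[["DoseP1:DoseK1"]], low = b[["DoseP2:DoseK1"]])
PK_highK["medium"] <- -sum(PK_highK)
PK_lowK <- -PK_highK

# Rebuild two cell means from the effects.
mu_LL <- b[["(Intercept)"]] + P[["low"]]  + K[["low"]]  + PK_lowK[["low"]]
mu_HH <- b[["(Intercept)"]] + P[["high"]] + K[["high"]] + PK_highK[["high"]]
c(mu_LL = mu_LL, mu_HH = mu_HH)
```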

    # Type I sums of squares
    anova(twowaysum.tomatopk)

The analysis of variance table for the response Yield has rows DoseP (flagged *), DoseK (flagged ***, p-value of order 1e-05), DoseP:DoseK (flagged *) and Residuals.

    # Type I sums of squares are not influenced by the sequence of the variables
    twowaysum.tomatopkalt <- lm(Yield ~ DoseK * DoseP, data = tomatopk)
    anova(twowaysum.tomatopkalt)

The analysis of variance table for the response Yield has rows DoseK (flagged ***, p-value of order 1e-05), DoseP (flagged *), DoseK:DoseP (flagged *) and Residuals. The flags are the same as with the other ordering: because the design is balanced (two observations in every P by K combination), the sequential sums of squares do not depend on the order in which the factors enter the model.
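A sketch of this comparison with the data typed in. It extracts the sequential sum of squares for DoseP from both orderings and then repeats the exercise after deleting one observation. The unbalanced data set `unbal` is a hypothetical variation introduced here to show that the order invariance relies on balance; in unbalanced designs the two orderings generally give different values.

```r
tomatopk <- data.frame(
  DoseP = factor(rep(rep(c("low", "medium", "high"), each = 2), times = 2)),
  DoseK = factor(rep(c("low", "high"), each = 6)),
  Yield = c(21, 23, 19, 24, 24, 21, 29, 31, 35, 36, 41, 40)
)

# Balanced design: DoseP entered first versus DoseP entered after DoseK.
a1 <- anova(lm(Yield ~ DoseP * DoseK, data = tomatopk))
a2 <- anova(lm(Yield ~ DoseK * DoseP, data = tomatopk))
ss_P_first  <- a1["DoseP", "Sum Sq"]
ss_P_second <- a2["DoseP", "Sum Sq"]
all.equal(ss_P_first, ss_P_second)

# Hypothetical unbalanced variant: drop the first observation and compare again.
unbal <- tomatopk[-1, ]
anova(lm(Yield ~ DoseP * DoseK, data = unbal))["DoseP", "Sum Sq"]
anova(lm(Yield ~ DoseK * DoseP, data = unbal))["DoseP", "Sum Sq"]
```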

Model specification in R: treatment restriction

    options(contrasts = c("contr.treatment", "contr.treatment"))
    twowaytrt.tomatopk <- lm(Yield ~ DoseP * DoseK, data = tomatopk)
    summary(twowaytrt.tomatopk)

The summary reports the call lm(formula = Yield ~ DoseP * DoseK, data = tomatopk). The intercept (p-value of order 1e-08) and DoseKlow (p-value of order 1e-05) are flagged ***, DosePlow is flagged **, DosePmedium and DosePlow:DoseKlow are flagged *, and DosePmedium:DoseKlow carries no significance flag. The residual standard error is based on 6 degrees of freedom, the multiple R-squared is 0.967 and the F-statistic is on 5 and 6 DF.
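Under the treatment restriction the coefficients are differences from the reference cell (high P dose, high K dose), so the intercept is the mean of that cell. The sketch below fits the model with the data typed in and uses `predict()` on all six factor combinations to recover the cell means. The names `fit` and `grid` are chosen here.

```r
tomatopk <- data.frame(
  DoseP = factor(rep(rep(c("low", "medium", "high"), each = 2), times = 2)),
  DoseK = factor(rep(c("low", "high"), each = 6)),
  Yield = c(21, 23, 19, 24, 24, 21, 29, 31, 35, 36, 41, 40)
)

fit <- lm(Yield ~ DoseP * DoseK, data = tomatopk,
          contrasts = list(DoseP = "contr.treatment", DoseK = "contr.treatment"))
b <- coef(fit)
b

# All six treatment combinations, with the fitted cell mean for each.
grid <- expand.grid(DoseP = levels(tomatopk$DoseP),
                    DoseK = levels(tomatopk$DoseK))
grid$mean <- predict(fit, newdata = grid)
grid

# Example: the low/low cell mean is the sum of four coefficients.
mu_LL <- b[["(Intercept)"]] + b[["DosePlow"]] + b[["DoseKlow"]] +
  b[["DosePlow:DoseKlow"]]
mu_LL
```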


More information

Variance Decomposition in Regression James M. Murray, Ph.D. University of Wisconsin - La Crosse Updated: October 04, 2017

Variance Decomposition in Regression James M. Murray, Ph.D. University of Wisconsin - La Crosse Updated: October 04, 2017 Variance Decomposition in Regression James M. Murray, Ph.D. University of Wisconsin - La Crosse Updated: October 04, 2017 PDF file location: http://www.murraylax.org/rtutorials/regression_anovatable.pdf

More information

Stat 5303 (Oehlert): Balanced Incomplete Block Designs 1

Stat 5303 (Oehlert): Balanced Incomplete Block Designs 1 Stat 5303 (Oehlert): Balanced Incomplete Block Designs 1 > library(stat5303libs);library(cfcdae);library(lme4) > weardata

More information

Multiple Predictor Variables: ANOVA

Multiple Predictor Variables: ANOVA Multiple Predictor Variables: ANOVA 1/32 Linear Models with Many Predictors Multiple regression has many predictors BUT - so did 1-way ANOVA if treatments had 2 levels What if there are multiple treatment

More information

STAT 350: Summer Semester Midterm 1: Solutions

STAT 350: Summer Semester Midterm 1: Solutions Name: Student Number: STAT 350: Summer Semester 2008 Midterm 1: Solutions 9 June 2008 Instructor: Richard Lockhart Instructions: This is an open book test. You may use notes, text, other books and a calculator.

More information

Stat 401B Exam 2 Fall 2016

Stat 401B Exam 2 Fall 2016 Stat 40B Eam Fall 06 I have neither given nor received unauthorized assistance on this eam. Name Signed Date Name Printed ATTENTION! Incorrect numerical answers unaccompanied by supporting reasoning will

More information

Matrices and vectors A matrix is a rectangular array of numbers. Here s an example: A =

Matrices and vectors A matrix is a rectangular array of numbers. Here s an example: A = Matrices and vectors A matrix is a rectangular array of numbers Here s an example: 23 14 17 A = 225 0 2 This matrix has dimensions 2 3 The number of rows is first, then the number of columns We can write

More information

Chaper 5: Matrix Approach to Simple Linear Regression. Matrix: A m by n matrix B is a grid of numbers with m rows and n columns. B = b 11 b m1 ...

Chaper 5: Matrix Approach to Simple Linear Regression. Matrix: A m by n matrix B is a grid of numbers with m rows and n columns. B = b 11 b m1 ... Chaper 5: Matrix Approach to Simple Linear Regression Matrix: A m by n matrix B is a grid of numbers with m rows and n columns B = b 11 b 1n b m1 b mn Element b ik is from the ith row and kth column A

More information

STAT 572 Assignment 5 - Answers Due: March 2, 2007

STAT 572 Assignment 5 - Answers Due: March 2, 2007 1. The file glue.txt contains a data set with the results of an experiment on the dry sheer strength (in pounds per square inch) of birch plywood, bonded with 5 different resin glues A, B, C, D, and E.

More information

FACTORIAL DESIGNS and NESTED DESIGNS

FACTORIAL DESIGNS and NESTED DESIGNS Experimental Design and Statistical Methods Workshop FACTORIAL DESIGNS and NESTED DESIGNS Jesús Piedrafita Arilla jesus.piedrafita@uab.cat Departament de Ciència Animal i dels Aliments Items Factorial

More information

Lab 3 A Quick Introduction to Multiple Linear Regression Psychology The Multiple Linear Regression Model

Lab 3 A Quick Introduction to Multiple Linear Regression Psychology The Multiple Linear Regression Model Lab 3 A Quick Introduction to Multiple Linear Regression Psychology 310 Instructions.Work through the lab, saving the output as you go. You will be submitting your assignment as an R Markdown document.

More information

22s:152 Applied Linear Regression. 1-way ANOVA visual:

22s:152 Applied Linear Regression. 1-way ANOVA visual: 22s:152 Applied Linear Regression 1-way ANOVA visual: Chapter 8: 1-Way Analysis of Variance (ANOVA) 2-Way Analysis of Variance (ANOVA) 0.00 0.05 0.10 0.15 0.20 0.25 0.30 0.35 Y We now consider an analysis

More information

6. Multiple Linear Regression

6. Multiple Linear Regression 6. Multiple Linear Regression SLR: 1 predictor X, MLR: more than 1 predictor Example data set: Y i = #points scored by UF football team in game i X i1 = #games won by opponent in their last 10 games X

More information

STAT 215 Confidence and Prediction Intervals in Regression

STAT 215 Confidence and Prediction Intervals in Regression STAT 215 Confidence and Prediction Intervals in Regression Colin Reimer Dawson Oberlin College 24 October 2016 Outline Regression Slope Inference Partitioning Variability Prediction Intervals Reminder:

More information

Introduction and Background to Multilevel Analysis

Introduction and Background to Multilevel Analysis Introduction and Background to Multilevel Analysis Dr. J. Kyle Roberts Southern Methodist University Simmons School of Education and Human Development Department of Teaching and Learning Background and

More information

Dealing with Heteroskedasticity

Dealing with Heteroskedasticity Dealing with Heteroskedasticity James H. Steiger Department of Psychology and Human Development Vanderbilt University James H. Steiger (Vanderbilt University) Dealing with Heteroskedasticity 1 / 27 Dealing

More information

Regression Review. Statistics 149. Spring Copyright c 2006 by Mark E. Irwin

Regression Review. Statistics 149. Spring Copyright c 2006 by Mark E. Irwin Regression Review Statistics 149 Spring 2006 Copyright c 2006 by Mark E. Irwin Matrix Approach to Regression Linear Model: Y i = β 0 + β 1 X i1 +... + β p X ip + ɛ i ; ɛ i iid N(0, σ 2 ), i = 1,..., n

More information

Natural language support but running in an English locale

Natural language support but running in an English locale R version 3.2.1 (2015-06-18) -- "World-Famous Astronaut" Copyright (C) 2015 The R Foundation for Statistical Computing Platform: x86_64-apple-darwin13.4.0 (64-bit) R is free software and comes with ABSOLUTELY

More information

Nature vs. nurture? Lecture 18 - Regression: Inference, Outliers, and Intervals. Regression Output. Conditions for inference.

Nature vs. nurture? Lecture 18 - Regression: Inference, Outliers, and Intervals. Regression Output. Conditions for inference. Understanding regression output from software Nature vs. nurture? Lecture 18 - Regression: Inference, Outliers, and Intervals In 1966 Cyril Burt published a paper called The genetic determination of differences

More information

Categorical Predictor Variables

Categorical Predictor Variables Categorical Predictor Variables We often wish to use categorical (or qualitative) variables as covariates in a regression model. For binary variables (taking on only 2 values, e.g. sex), it is relatively

More information

Cuckoo Birds. Analysis of Variance. Display of Cuckoo Bird Egg Lengths

Cuckoo Birds. Analysis of Variance. Display of Cuckoo Bird Egg Lengths Cuckoo Birds Analysis of Variance Bret Larget Departments of Botany and of Statistics University of Wisconsin Madison Statistics 371 29th November 2005 Cuckoo birds have a behavior in which they lay their

More information

Booklet of Code and Output for STAC32 Final Exam

Booklet of Code and Output for STAC32 Final Exam Booklet of Code and Output for STAC32 Final Exam December 12, 2015 List of Figures in this document by page: List of Figures 1 Time in days for students of different majors to find full-time employment..............................

More information

Ch 3: Multiple Linear Regression

Ch 3: Multiple Linear Regression Ch 3: Multiple Linear Regression 1. Multiple Linear Regression Model Multiple regression model has more than one regressor. For example, we have one response variable and two regressor variables: 1. delivery

More information

Coefficient of Determination

Coefficient of Determination Coefficient of Determination ST 430/514 The coefficient of determination, R 2, is defined as before: R 2 = 1 SS E (yi ŷ i ) = 1 2 SS yy (yi ȳ) 2 The interpretation of R 2 is still the fraction of variance

More information

Distribution Assumptions

Distribution Assumptions Merlise Clyde Duke University November 22, 2016 Outline Topics Normality & Transformations Box-Cox Nonlinear Regression Readings: Christensen Chapter 13 & Wakefield Chapter 6 Linear Model Linear Model

More information

Leverage. the response is in line with the other values, or the high leverage has caused the fitted model to be pulled toward the observed response.

Leverage. the response is in line with the other values, or the high leverage has caused the fitted model to be pulled toward the observed response. Leverage Some cases have high leverage, the potential to greatly affect the fit. These cases are outliers in the space of predictors. Often the residuals for these cases are not large because the response

More information

Homework 9 Sample Solution

Homework 9 Sample Solution Homework 9 Sample Solution # 1 (Ex 9.12, Ex 9.23) Ex 9.12 (a) Let p vitamin denote the probability of having cold when a person had taken vitamin C, and p placebo denote the probability of having cold

More information

lm statistics Chris Parrish

lm statistics Chris Parrish lm statistics Chris Parrish 2017-04-01 Contents s e and R 2 1 experiment1................................................. 2 experiment2................................................. 3 experiment3.................................................

More information

1.) Fit the full model, i.e., allow for separate regression lines (different slopes and intercepts) for each species

1.) Fit the full model, i.e., allow for separate regression lines (different slopes and intercepts) for each species Lecture notes 2/22/2000 Dummy variables and extra SS F-test Page 1 Crab claw size and closing force. Problem 7.25, 10.9, and 10.10 Regression for all species at once, i.e., include dummy variables for

More information

Math 2311 Written Homework 6 (Sections )

Math 2311 Written Homework 6 (Sections ) Math 2311 Written Homework 6 (Sections 5.4 5.6) Name: PeopleSoft ID: Instructions: Homework will NOT be accepted through email or in person. Homework must be submitted through CourseWare BEFORE the deadline.

More information

UNIVERSITY OF TORONTO Faculty of Arts and Science

UNIVERSITY OF TORONTO Faculty of Arts and Science UNIVERSITY OF TORONTO Faculty of Arts and Science December 2013 Final Examination STA442H1F/2101HF Methods of Applied Statistics Jerry Brunner Duration - 3 hours Aids: Calculator Model(s): Any calculator

More information

Statistics for Engineers Lecture 9 Linear Regression

Statistics for Engineers Lecture 9 Linear Regression Statistics for Engineers Lecture 9 Linear Regression Chong Ma Department of Statistics University of South Carolina chongm@email.sc.edu April 17, 2017 Chong Ma (Statistics, USC) STAT 509 Spring 2017 April

More information

Lecture 6 Multiple Linear Regression, cont.

Lecture 6 Multiple Linear Regression, cont. Lecture 6 Multiple Linear Regression, cont. BIOST 515 January 22, 2004 BIOST 515, Lecture 6 Testing general linear hypotheses Suppose we are interested in testing linear combinations of the regression

More information

L21: Chapter 12: Linear regression

L21: Chapter 12: Linear regression L21: Chapter 12: Linear regression Department of Statistics, University of South Carolina Stat 205: Elementary Statistics for the Biological and Life Sciences 1 / 37 So far... 12.1 Introduction One sample

More information

Simple Linear Regression

Simple Linear Regression Simple Linear Regression MATH 282A Introduction to Computational Statistics University of California, San Diego Instructor: Ery Arias-Castro http://math.ucsd.edu/ eariasca/math282a.html MATH 282A University

More information