Comparing Nested Models


Comparing Nested Models (ST 370)

Two regression models are called nested if one contains all the predictors of the other, plus some additional predictors. For example, the first-order model in two independent variables,

$Y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \epsilon,$

is nested within the complete second-order model

$Y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_{11} x_1^2 + \beta_{22} x_2^2 + \beta_{12} x_1 x_2 + \epsilon.$

How to choose between them?

If the models are being considered for making predictions about the mean response or about future observations, you could just compare them using PRESS or $P^2$. But you may be interested in whether the simpler model is adequate as a description of the relationship, and not necessarily in whether it gives better predictions.
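As an aside, PRESS can be computed from a fitted lm object without refitting, via the leave-one-out identity $e_{i,-i} = e_i/(1 - h_{ii})$. A minimal sketch in R; the helper name press is our own, not a base R function:

    press <- function(fit) {
      # leave-one-out residuals from ordinary residuals and leverages
      sum((residuals(fit) / (1 - hatvalues(fit)))^2)
    }
    # smaller PRESS indicates better leave-one-out prediction;
    # apply it to each candidate model fitted to the same data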

A single added predictor

If the larger model has just one more predictor than the smaller model, you could just test the significance of the one additional coefficient, using the t-statistic.

Multiple added predictors

When the models differ by $r > 1$ added predictors, you cannot compare them using t-statistics. The conventional test is based on comparing the regression sums of squares for the two models: the general regression test, or the extra sum of squares test.

Write $SS_{R,\text{reduced}}$ and $SS_{R,\text{full}}$ for the regression sums of squares of the two models, where the reduced model is nested within the full model. The extra sum of squares is

$SS_{R,\text{extra}} = SS_{R,\text{full}} - SS_{R,\text{reduced}},$

and if this is large, the $r$ additional predictors have explained a substantial additional amount of variability. We test the null hypothesis that the added predictors all have zero coefficients using the F-statistic

$F_{\text{obs}} = \frac{SS_{R,\text{extra}}/r}{MS_{E,\text{full}}}.$
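To make the formula concrete, here is a hand computation of $F_{\text{obs}}$ from two fitted lm objects; the helper name extra_ss_F and the arguments fit_red and fit_full are placeholders of our own (anova(), shown next, performs the same computation):

    extra_ss_F <- function(fit_red, fit_full) {
      # extra SS = reduction in residual SS (see the Note below)
      ss_extra <- sum(residuals(fit_red)^2) - sum(residuals(fit_full)^2)
      r        <- df.residual(fit_red) - df.residual(fit_full)  # added predictors
      mse_full <- sum(residuals(fit_full)^2) / df.residual(fit_full)
      f_obs    <- (ss_extra / r) / mse_full
      # p-value from F with (r, df_E,full) degrees of freedom
      pf(f_obs, r, df.residual(fit_full), lower.tail = FALSE)
    }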

In R

The R function anova() (not to be confused with aov()) implements the extra sum of squares test:

    wirebondlm1 <- lm(Strength ~ Length + Height, wirebond)  # first-order model
    wirebondlm2 <- lm(Strength ~ Length + I(Length^2) + Height, wirebond)
    wirebondlm3 <- lm(Strength ~ Length + I(Length^2) + Height +
                      I(Height^2) + I(Length * Height), wirebond)
    anova(wirebondlm1, wirebondlm3)

It can also compare a sequence of more than two nested models:

    anova(wirebondlm1, wirebondlm2, wirebondlm3)

Note

Because $SS_R = SS_T - SS_E$ and $SS_T$ is the same for all models, the extra sum of squares can also be written

$SS_{R,\text{extra}} = SS_{E,\text{reduced}} - SS_{E,\text{full}}.$

That is, the extra sum of squares is also the amount by which the residual sum of squares is reduced by the additional predictors.

Note

The nested model F-test can also be used when $r = 1$, and is equivalent to the t-test for the added coefficient, because $F = t^2$.
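A quick way to see the $F = t^2$ identity in R, sketched with the wirebond models fitted above (wirebondlm2 adds the single predictor I(Length^2) to wirebondlm1):

    # anova() F statistic for the one added term ...
    anova(wirebondlm1, wirebondlm2)
    # ... equals the square of that term's t statistic in the larger model:
    coef(summary(wirebondlm2))["I(Length^2)", "t value"]^2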

Indicator Variables (ST 370)

Recall that an indicator variable is a variable that takes only the values 0 and 1. A single indicator variable divides the data into two groups, and is a quantitative representation of a categorical factor with two levels. To represent a factor with $a > 2$ levels, you need $a - 1$ indicator variables.

Recall the paper strength example: the factor is Hardwood Concentration, with levels 5%, 10%, 15%, and 20%. Define indicator variables

$x_1 = \begin{cases} 1 & \text{for 5\% hardwood} \\ 0 & \text{otherwise} \end{cases}$

$x_2 = \begin{cases} 1 & \text{for 10\% hardwood} \\ 0 & \text{otherwise} \end{cases}$

$x_3 = \begin{cases} 1 & \text{for 15\% hardwood} \\ 0 & \text{otherwise} \end{cases}$

$x_4 = \begin{cases} 1 & \text{for 20\% hardwood} \\ 0 & \text{otherwise} \end{cases}$

Consider the regression model

$Y_i = \beta_0 + \beta_2 x_{i,2} + \beta_3 x_{i,3} + \beta_4 x_{i,4} + \epsilon_i.$

The interpretation of $\beta_0$ is, as always, the mean response when $x_2 = x_3 = x_4 = 0$; in this case, that is for the remaining (baseline) category, 5% hardwood. For 10% hardwood, $x_2 = 1$ and $x_3 = x_4 = 0$, so the mean response is $\beta_0 + \beta_2$; the interpretation of $\beta_2$ is the difference between the mean responses for 10% hardwood and the baseline category.

Similarly, $\beta_3$ is the difference between 15% hardwood and the baseline category, and $\beta_4$ is the difference between 20% hardwood and the baseline category. So the interpretations of $\beta_0$, $\beta_2$, $\beta_3$, and $\beta_4$ are exactly the same as the interpretations of $\mu$, $\tau_2$, $\tau_3$, and $\tau_4$ in the one-factor model

$Y_{i,j} = \mu + \tau_i + \epsilon_{i,j}.$

The factorial model may be viewed as a special form of regression model with these indicator variables as constructed predictors. Modern statistical software fits factorial models using regression with indicator variables.
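In R you can see exactly which indicator variables lm() would construct for a factor by inspecting the model matrix. A minimal sketch with illustrative data (not the actual paper strength data):

    hardwood <- factor(rep(c("5%", "10%", "15%", "20%"), each = 2),
                       levels = c("5%", "10%", "15%", "20%"))
    # one indicator column per non-baseline level; 5% is the baseline
    model.matrix(~ hardwood)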

Combining Categorical and Quantitative Predictors

Example: surface finish

The response Y is a measure of the roughness of the surface of a metal part finished on a lathe. The factors are RPM and type of cutting tool (2 types, 302 and 416).

Begin with the model

$Y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \epsilon,$

where $x_1$ is RPM and $x_2$ is the indicator for type 416:

    parts <- read.csv("data/table-12-11.csv")
    pairs(parts)
    parts$Type <- as.factor(parts$Type)
    summary(lm(Finish ~ RPM + Type, parts))

    Call:
    lm(formula = Finish ~ RPM + Type, data = parts)

    Residuals:
        Min      1Q  Median      3Q     Max
    -0.9546 -0.5039 -0.1804  0.4893  1.5188

    Coefficients:
                  Estimate Std. Error t value Pr(>|t|)
    (Intercept)  14.276196   2.091214   6.827 2.94e-06 ***
    RPM           0.141150   0.008833  15.979 1.13e-11 ***
    Type416     -13.280195   0.302879 -43.847  < 2e-16 ***
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

    Residual standard error: 0.6771 on 17 degrees of freedom
    Multiple R-squared:  0.9924,	Adjusted R-squared:  0.9915
    F-statistic:  1104 on 2 and 17 DF,  p-value: < 2.2e-16

For tool type 302, $x_2 = 0$, so the fitted equation is

$\hat{y} = 14.276 + 0.141\,\text{RPM},$

while for tool type 416, $x_2 = 1$, and the fitted equation is

$\hat{y} = 14.276 - 13.280 + 0.141\,\text{RPM} = 0.996 + 0.141\,\text{RPM}.$

We are essentially fitting parallel straight lines against RPM for the two tool types: the same slope, but different intercepts.
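A sketch of how you might visualize the two parallel fitted lines in R, assuming parts is the data frame loaded above (the intercept and slope values are the rounded estimates from the output):

    # scatter plot coded by tool type, with both fitted lines overlaid
    plot(Finish ~ RPM, data = parts, pch = as.integer(parts$Type))
    abline(14.276, 0.141)                     # tool type 302
    abline(14.276 - 13.280, 0.141, lty = 2)   # tool type 416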

We could also allow the slopes to be different:

$Y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_{1,2} x_1 x_2 + \epsilon.$

In this model, the slopes versus RPM are $\beta_1$ for type 302 and $\beta_1 + \beta_{1,2}$ for type 416.

    summary(lm(Finish ~ RPM * Type, parts))

    Call:
    lm(formula = Finish ~ RPM * Type, data = parts)

    Residuals:
         Min       1Q   Median       3Q      Max
    -0.68655 -0.44881 -0.07609  0.30171  1.76690

    Coefficients:
                 Estimate Std. Error t value Pr(>|t|)
    (Intercept) 11.50294    2.50430   4.593   0.0003 ***
    RPM          0.15293    0.01060  14.428 1.37e-10 ***
    Type416     -6.09423    4.02457  -1.514   0.1495
    RPM:Type416 -0.03057    0.01708  -1.790   0.0924 .
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

    Residual standard error: 0.6371 on 16 degrees of freedom
    Multiple R-squared:  0.9936,	Adjusted R-squared:  0.9924
    F-statistic: 832.3 on 3 and 16 DF,  p-value: < 2.2e-16

We do not reject the null hypothesis that the interaction term has a zero coefficient, so the fitted lines are not significantly different from parallel.
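This is the same conclusion the extra sum of squares test gives. As a closing sketch, compare the additive and interaction models directly with anova(); the object names parts_lm1 and parts_lm2 are our own, and since $r = 1$ the F statistic is the square of the RPM:Type416 t statistic above:

    # nested-model comparison: parallel lines vs. separate slopes
    parts_lm1 <- lm(Finish ~ RPM + Type, parts)
    parts_lm2 <- lm(Finish ~ RPM * Type, parts)
    anova(parts_lm1, parts_lm2)   # F = (-1.790)^2 ~ 3.20, p = 0.0924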