Lecture 6: Multiple Linear Regression, cont.
BIOST 515
January 22, 2004

Testing general linear hypotheses

Suppose we are interested in testing linear combinations of the regression coefficients. For example, we might want to test whether two regression coefficients are equal, H_0: β_i = β_j, or equivalently, H_0: β_i − β_j = 0. Such hypotheses can be expressed as

H_0: Tβ = 0,

where T is an m × (p + 1) matrix of constants such that only r of the m equations in Tβ = 0 are independent.

For example, consider the model

y_i = β_0 + x_i1 β_1 + x_i2 β_2 + x_i3 β_3 + ε_i

and the hypothesis H_0: β_1 − β_2 = 0. This hypothesis is equivalent to

H_0: (0 1 −1 0) β = 0.

We may also consider the hypothesis

H_0: β_1 − β_2 = 0, β_3 = 0,

which is equivalent to H_0: Tβ = 0, where

T = ( 0  1  −1  0
      0  0   0  1 ).

We can use sums of squares to test general linear hypotheses. The full model is y = Xβ + ε, with residual sum of squares

SSE(FM) = y′y − β̂′X′y   (n − p − 1 degrees of freedom).

Obtain the reduced model by solving Tβ = 0 for r of the regression coefficients in the full model in terms of the remaining p + 1 − r regression coefficients. Substituting these values into the full model yields the reduced model

y = Zγ + ε,

where Z is an n × (p + 1 − r) matrix and γ is a (p + 1 − r) × 1 vector of unknown regression coefficients. The residual sum of squares for the reduced model is

SSE(RM) = y′y − γ̂′Z′y   (n − p − 1 + r degrees of freedom).

SSE(RM) − SSE(FM) is called the sum of squares due to the hypothesis Tβ = 0. We can test this hypothesis using

F_0 = [(SSE(RM) − SSE(FM))/r] / MSE ~ F_{r, n−p−1}.
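The output tables later in this lecture are R output, and this full-versus-reduced comparison is one call to anova() there. A minimal sketch, assuming a hypothetical data frame dat with outcome y and predictors x1, x2, x3 (names are illustrative, not from the slides):

## Full model, and the reduced model under H0: beta1 = beta2,
## obtained by substituting beta2 = beta1 so x1 and x2 enter
## only through their sum.
full    <- lm(y ~ x1 + x2 + x3, data = dat)
reduced <- lm(y ~ I(x1 + x2) + x3, data = dat)

## anova() computes F0 = [(SSE(RM) - SSE(FM))/r] / MSE(FM) and its p-value.
anova(reduced, full)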

CHS smoking example

Recall the example where smoking status was recoded to

smoke_1i = 1 if never smoked, 0 otherwise,
smoke_2i = 1 if former smoker, 0 otherwise,

and we fit the model

BP_i = β_0 + β_1 smoke_1i + β_2 smoke_2i + ε_i.

             Estimate  Std. Error  t value  Pr(>|t|)
(Intercept)   69.2963      1.6176    42.84    0.0000
smoke1         2.9860      1.7629     1.69    0.0909
smoke2         2.6239      1.8162     1.44    0.1492

We may be interested in testing H_0: β_1 = β_2, which is equivalent to testing H_0: (0 1 −1) β = 0. The full model is

BP_i = β_0 + β_1 smoke_1i + β_2 smoke_2i + ε_i,

and the reduced model is

BP_i = β_0 + β_1 smoke_1i + β_1 smoke_2i + ε_i
     = β_0 + β_1 (smoke_1i + smoke_2i) + ε_i
     = γ_0 + γ_1 z_i + ε_i.

The reduced model is equivalent to the model we fit with current smokers vs. former and never smokers.

Full model:

           Df    Sum Sq   Mean Sq  F value  Pr(>F)
smoke1      1    101.65    101.65     0.79  0.3737
smoke2      1    267.61    267.61     2.09  0.1492
Residuals 495  63465.82    128.21

Reduced model:

           Df    Sum Sq   Mean Sq  F value  Pr(>F)
smoker      1    354.93    354.93     2.77  0.0965
Residuals 496  63480.15    127.98

F_0 = [(63480.15 − 63465.82)/1] / 128.21 = 0.11 < F_{1,495,.95} = 3.86.

Therefore we fail to reject the null hypothesis.

We could also test this hypothesis using the t statistic

t_0 = (β̂_1 − β̂_2) / ŝe(β̂_1 − β̂_2) = (β̂_1 − β̂_2) / √[σ̂² (C_11 + C_22 − 2 C_12)],

where C = (X′X)^{-1}:

C = (  0.0204  −0.0204  −0.0204
      −0.0204   0.0242   0.0204
      −0.0204   0.0204   0.0257 ).

Therefore

t_0 = (2.986 − 2.624) / √[128.21 × (0.0242 + 0.0257 − 2 × 0.0204)] = 0.335 < t_{n−p−1, .975},

so we fail to reject H_0.
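The same t statistic can be computed from a fitted model's estimated covariance matrix. A sketch, assuming a hypothetical data frame chs with columns bp, smoke1, smoke2:

## t test of H0: beta1 = beta2 via vcov(), which returns
## sigmahat^2 * C = sigmahat^2 * (X'X)^{-1}.
fit <- lm(bp ~ smoke1 + smoke2, data = chs)   # hypothetical names
b   <- coef(fit)
V   <- vcov(fit)

se_diff <- sqrt(V["smoke1", "smoke1"] + V["smoke2", "smoke2"] -
                2 * V["smoke1", "smoke2"])
t0 <- (b["smoke1"] - b["smoke2"]) / se_diff   # 0.335 on the lecture's data
2 * pt(-abs(t0), df = df.residual(fit))       # two-sided p-value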

Consider the model

BP_i = β_0 + β_1 smoke_1i + β_2 smoke_2i + β_3 age_i + ε_i.

           Df    Sum Sq   Mean Sq  F value  Pr(>F)
smoke1      1    101.65    101.65     0.83  0.3638
smoke2      1    267.61    267.61     2.18  0.1409
AGE         1   2687.39   2687.39    21.84  0.0000
Residuals 494  60778.42    123.03

Suppose we want to test H_0: β_1 = β_2, β_3 = 0,

which is equivalent to

H_0: ( 0  1  −1  0
       0  0   0  1 ) β = 0.

The reduced model is

BP_i = β_0 + β_1 (smoke_1i + smoke_2i) + ε_i = γ_0 + γ_1 z_i + ε_i.

F_0 = [(63480.15 − 60778.42)/2] / 123.03 = 10.98 > F_{2,494,.95} = 3.01.

We reject the null hypothesis.

Confidence intervals in multiple linear regression

- Confidence interval for a single coefficient
- Confidence interval for a fitted value
- Simultaneous confidence intervals on multiple coefficients

Confidence interval for a single coefficient

We can construct a confidence interval for β_j as follows. Given that

(β̂_j − β_j) / ŝe(β̂_j) = (β̂_j − β_j) / √(σ̂² C_jj) ~ t_{n−p−1},

we can define a 100(1 − α)% confidence interval for β_j as

β̂_j ± t_{n−p−1, α/2} √(σ̂² C_jj).
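In R these intervals come directly from confint(); a sketch that also shows the hand computation, for any fitted lm object fit:

## 95% intervals for each coefficient, by hand and via confint().
est   <- coef(fit)
se    <- sqrt(diag(vcov(fit)))              # sigmahat * sqrt(C_jj)
tcrit <- qt(0.975, df = df.residual(fit))   # t_{n-p-1, alpha/2}, alpha = 0.05
cbind(lower = est - tcrit * se, upper = est + tcrit * se)

confint(fit, level = 0.95)                  # same intervals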

Confidence interval for a fitted value

We can construct a confidence interval for the fitted response at a set of predictor values x_01, x_02, ..., x_0p. Define the vector x_0 as

x_0 = (1, x_01, x_02, ..., x_0p)′.

The fitted value at this point is

ŷ_0 = x_0′ β̂.

ŷ_0 is an unbiased estimator of E(y | x_0), and the variance of ŷ_0 is

var(ŷ_0) = σ² x_0′ (X′X)^{-1} x_0.

Therefore, the 100(1 − α)% confidence interval for the fitted response at x_01, x_02, ..., x_0p is

ŷ_0 ± t_{n−p−1, α/2} √[σ̂² x_0′ (X′X)^{-1} x_0].

Example from CHS

In the last lecture, we fit the model

BP_i = β_0 + weight_i β_1 + height_i β_2 + age_i β_3 + gender_i β_4 + ε_i.

Let's calculate the confidence interval for the fitted value for the 100th subject, who has covariate vector x_0′ = (1, 194.8, 159.2, 70.0, 0.0). The fitted value for BP is ŷ_0 = 73.67, and x_0′(X′X)^{-1}x_0 = 0.007972541, so the 95% confidence interval is

73.67 ± 1.96 × 11.11 × √0.007972541 = (71.72, 75.62).
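This interval is what predict() returns with interval = "confidence". A sketch, assuming a hypothetical fitted model fit <- lm(bp ~ weight + height + age + gender, data = chs):

## Confidence interval for the fitted mean at a new covariate vector.
x0 <- data.frame(weight = 194.8, height = 159.2, age = 70, gender = 0)
predict(fit, newdata = x0, interval = "confidence", level = 0.95)
## returns fit, lwr, upr: approximately 73.67, 71.72, 75.62 on these data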

[Figure: 95% confidence intervals for the fitted BP, plotted against age, height, and weight.]

Simultaneous confidence intervals

Sometimes we may be interested in specifying a (1 − α)100% confidence interval (or region) for the entire set, or a subset, of the coefficients. Since

(β̂ − β)′ X′X (β̂ − β) / [(p + 1) MSE] ~ F_{p+1, n−p−1},

we can define a (1 − α)100% joint confidence region for all the parameters in β as

(β̂ − β)′ X′X (β̂ − β) / [(p + 1) MSE] ≤ F_{p+1, n−p−1, α}.
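As a sketch, the region can be used directly to check whether a candidate coefficient vector beta0 is jointly plausible (the helper in_region is hypothetical, not part of the lecture):

## Does beta0 lie inside the joint (1 - alpha) confidence region?
in_region <- function(fit, beta0, alpha = 0.05) {
  X   <- model.matrix(fit)
  k   <- length(coef(fit))                      # p + 1 parameters
  d   <- coef(fit) - beta0
  mse <- sum(residuals(fit)^2) / df.residual(fit)
  q   <- drop(t(d) %*% crossprod(X) %*% d) / (k * mse)
  q <= qf(1 - alpha, df1 = k, df2 = df.residual(fit))
}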

Bonferroni intervals

Another general approach for obtaining simultaneous confidence intervals is

β̂_j ± Δ ŝe(β̂_j),  j = 0, 1, ..., p.

Using the Bonferroni method, we set Δ = t_{n−p−1, α/(2(p+1))}, leading to a Bonferroni confidence interval of

β̂_j ± t_{n−p−1, α/(2(p+1))} ŝe(β̂_j).

Bonferroni intervals: CHS example

BP_i = β_0 + weight_i β_1 + height_i β_2 + age_i β_3 + gender_i β_4 + ε_i.

The Bonferroni intervals are β̂_j ± t_{493, .005} ŝe(β̂_j):

              Lower    Upper
(Intercept)   49.25   131.64
WEIGHT         0.01     0.08
HEIGHT        −0.22     0.22
AGE           −0.57     0.09
GENDER        −3.11     4.78
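Intervals like these could be reproduced with confint() by adjusting the per-interval confidence level; a sketch for a hypothetical fitted model fit:

## Bonferroni intervals for all p + 1 coefficients at joint level
## 1 - alpha, via a per-interval level of 1 - alpha/(p + 1).
alpha <- 0.05
k     <- length(coef(fit))            # p + 1 = 5 here, so t_{493, .005}
confint(fit, level = 1 - alpha / k)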

Hidden extrapolation in multiple regression

[Figure: the joint region of the original data in the (x_1, x_2) plane, with two candidate points x_0. A point can lie within the observed range of each predictor separately and still fall outside the joint region of the data, so a prediction there is a hidden extrapolation.]
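One common way to flag hidden extrapolation (an added rule of thumb, not shown on the original slide) is to compare x_0′(X′X)^{-1}x_0 with the largest leverage among the observed rows; a sketch:

## Flag hidden extrapolation by comparing the candidate point's
## "leverage" x0'(X'X)^{-1} x0 with the maximum leverage in the data.
X    <- model.matrix(fit)                 # includes the intercept column
XtXi <- solve(crossprod(X))
x0   <- c(1, 194.8, 159.2, 70, 0)         # intercept + covariates as above
h00  <- drop(t(x0) %*% XtXi %*% x0)       # 0.00797 in the CHS example
h00 > max(hatvalues(fit))                 # TRUE would suggest extrapolation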

R² and adjusted R²

As in simple linear regression,

R² = 1 − SSE/SSTO.

In general, R² increases whenever new terms are added to the model. Therefore, for model comparison, we may prefer an R² that is adjusted for the number of predictors in the model. This is the adjusted R²:

R²_adj = 1 − MSE / [SSTO/(n − 1)].
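Both quantities are reported by summary() on a fitted lm object; a sketch:

## R^2 and adjusted R^2 from a fitted model.
s <- summary(fit)
s$r.squared       # 1 - SSE/SSTO
s$adj.r.squared   # 1 - MSE / (SSTO/(n - 1))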

Predictors                     R²      R²_adj
weight, height, age, gender   0.0464  0.0386
smoke1, smoke2                0.0058  0.0018
weight, height                0.0221  0.0181
smoke1, smoke2, age           0.048   0.042