Inference in Normal Regression Model. Dr. Frank Wood


Remember

We know that the point estimator of $\beta_1$ is
$$b_1 = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sum (X_i - \bar{X})^2}$$
Last class we derived the sampling distribution of $b_1$: it is $N(\beta_1, \sigma^2\{b_1\})$ (when $\sigma^2$ is known), with
$$\sigma^2\{b_1\} = \frac{\sigma^2}{\sum (X_i - \bar{X})^2}$$
And we suggested that an estimate of $\sigma^2\{b_1\}$ can be obtained by substituting the MSE for $\sigma^2$ when $\sigma^2$ is unknown:
$$s^2\{b_1\} = \frac{\text{MSE}}{\sum (X_i - \bar{X})^2} = \frac{SSE/(n-2)}{\sum (X_i - \bar{X})^2}$$
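
To make these quantities concrete, here is a minimal Matlab sketch using made-up toy data (not from the lecture); later snippets reuse the variables defined here:

    % Toy data, made up purely for illustration
    X = (1:10)';                              % predictor levels
    Y = 2 + 1.8*X + 0.5*randn(10,1);          % simulated responses
    n = length(X);
    Xbar = mean(X);  Ybar = mean(Y);
    Sxx = sum((X - Xbar).^2);                 % sum of squared deviations of X
    b1 = sum((X - Xbar).*(Y - Ybar)) / Sxx;   % point estimate of beta_1
    b0 = Ybar - b1*Xbar;                      % point estimate of beta_0
    SSE = sum((Y - b0 - b1*X).^2);            % error sum of squares
    MSE = SSE / (n - 2);                      % estimate of sigma^2
    s_b1 = sqrt(MSE / Sxx);                   % estimated standard error of b1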

Sampling Distribution of $(b_1 - \beta_1)/s\{b_1\}$

Since $b_1$ is normally distributed, $(b_1 - \beta_1)/\sigma\{b_1\}$ is a standard normal variable $N(0, 1)$. We don't know $\sigma^2\{b_1\}$, so it must be estimated from data; we have already denoted its estimate $s^2\{b_1\}$. Using this estimate we showed that
$$\frac{b_1 - \beta_1}{s\{b_1\}} \sim t(n-2), \qquad s\{b_1\} = \sqrt{s^2\{b_1\}}$$
It is from this fact that our confidence intervals and tests will derive.

Confidence Intervals and Hypothesis Tests

Now that we know the sampling distribution of $b_1$ ($t$ with $n-2$ degrees of freedom) we can construct confidence intervals and hypothesis tests easily.

Confidence Interval for $\beta_1$

Since the studentized statistic follows a $t$ distribution we can make the following probability statement:
$$P\left(t(\alpha/2;\, n-2) \le \frac{b_1 - \beta_1}{s\{b_1\}} \le t(1-\alpha/2;\, n-2)\right) = 1 - \alpha$$
Matlab: tpdf, tcdf, tinv

Remember

Density: $f(y) = \frac{dF(y)}{dy}$
Distribution (CDF): $F(y) = P(Y \le y) = \int_{-\infty}^{y} f(t)\,dt$
Inverse CDF: $F^{-1}(p) = y$ such that $\int_{-\infty}^{y} f(t)\,dt = p$

Book tables and Matlab commands

In Appendix B (or elsewhere in other books), a table of percentiles of the $t$ distribution is given. In this table one number appears for each combination of the degrees of freedom $\nu$ and a parameter, call it $A$. Each entry is some value of $t(A; \nu)$ where
$$P\{t(\nu) \le t(A; \nu)\} = A$$
In words, $t(A; \nu)$ is the point on the horizontal axis of the Student-$t$ distribution where a proportion $A$ of the mass under the curve is located to the left. This is precisely the quantity returned by tinv($A$, $\nu$) in Matlab. How can this be used to produce a confidence interval?
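
For example (toy values; tinv and tcdf are from Matlab's Statistics Toolbox):

    nu = 10;  A = 0.975;
    tval = tinv(A, nu);       % point with proportion A of the t(nu) mass to its left
    check = tcdf(tval, nu);   % recovers A, confirming tinv inverts tcdf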

Interval arising from picking $\alpha$

Note that by symmetry
$$t(\alpha/2;\, n-2) = -t(1-\alpha/2;\, n-2)$$
Remember
$$P\left(t(\alpha/2;\, n-2) \le \frac{b_1 - \beta_1}{s\{b_1\}} \le t(1-\alpha/2;\, n-2)\right) = 1 - \alpha$$
Rearranging terms and using this symmetry we have
$$P\left(b_1 - t(1-\alpha/2;\, n-2)\,s\{b_1\} \le \beta_1 \le b_1 + t(1-\alpha/2;\, n-2)\,s\{b_1\}\right) = 1 - \alpha$$
And now we can use a table to look up the critical value and produce confidence intervals.

Using tables for computing intervals

The tables in the book (Table B.2 in the appendix) give $t(1-\alpha/2;\, \nu)$ where
$$P\{t(\nu) \le t(1-\alpha/2;\, \nu)\} = 1 - \alpha/2$$
i.e., the inverse CDF of the $t$-distribution. This can be arrived at computationally as well. Matlab: tinv($1-\alpha/2$, $\nu$)

$1-\alpha$ confidence limits for $\beta_1$

The $1-\alpha$ confidence limits for $\beta_1$ are
$$b_1 \pm t(1-\alpha/2;\, n-2)\,s\{b_1\}$$
Note that this quantity can be used to calculate confidence intervals given $n$ and $\alpha$. Fixing $\alpha$ can guide the choice of sample size if a particular confidence interval width is desired; given a sample size, one can determine the resulting interval width, and vice versa. Also useful for hypothesis testing.
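
Continuing the toy sketch from earlier (b1, s_b1, and n assumed to be in the workspace), the limits can be computed as:

    alpha = 0.05;                               % for a 95% interval
    tcrit = tinv(1 - alpha/2, n - 2);           % t(1-alpha/2; n-2)
    CI_b1 = [b1 - tcrit*s_b1, b1 + tcrit*s_b1]  % 1-alpha confidence limits for beta_1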

Tests Concerning $\beta_1$

Example 1: Two-sided test
$$H_0: \beta_1 = 0, \qquad H_a: \beta_1 \neq 0$$
Test statistic:
$$t^* = \frac{b_1 - 0}{s\{b_1\}}$$

Tests Concerning $\beta_1$

We have an estimate of the sampling distribution of $b_1$ from the data. If the null hypothesis holds, then the $b_1$ estimate coming from the data should be within the 95% confidence interval of the sampling distribution centered at $0$ (in this case):
$$t^* = \frac{b_1 - 0}{s\{b_1\}}$$
Variability in $b_1$ is assumed to arise from sampling noise.

Decision rules

If $|t^*| \le t(1-\alpha/2;\, n-2)$, conclude $H_0$.
If $|t^*| > t(1-\alpha/2;\, n-2)$, conclude $H_a$.
The absolute values make the test two-sided.

Intuition

The $p$-value is the value of $\alpha$ that moves the green line to the blue line, i.e. the significance level at which the critical value coincides with the observed test statistic. [The slide shows a $t$ density with the critical value (green line) and the observed statistic (blue line) marked.]

Calculating the p-value

The $p$-value, or attained significance level, is the smallest level of significance $\alpha$ for which the observed data indicate that the null hypothesis should be rejected. This can be looked up using the CDF of the test statistic. In Matlab, the two-sided $p$-value is 2 * (1 - tcdf(|t*|, ν)).
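
Collecting the pieces of the test in Matlab (continuing the toy sketch; tinv and tcdf as before):

    alpha = 0.05;
    tstar = (b1 - 0) / s_b1;                   % test statistic for H0: beta_1 = 0
    tcrit = tinv(1 - alpha/2, n - 2);
    rejectH0 = abs(tstar) > tcrit;             % two-sided decision rule
    pval = 2 * (1 - tcdf(abs(tstar), n - 2));  % two-sided p-value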

Inferences Concerning $\beta_0$

Largely, inference procedures regarding $\beta_0$ can be performed in the same way as those for $\beta_1$. Remember the point estimator $b_0$ of $\beta_0$:
$$b_0 = \bar{Y} - b_1\bar{X}$$

Sampling distribution of $b_0$

The sampling distribution of $b_0$ refers to the different values of $b_0$ that would be obtained with repeated sampling when the levels of the predictor variable $X$ are held constant from sample to sample. For the normal regression model, the sampling distribution of $b_0$ is normal.

Sampling distribution of $b_0$

When the error variance is known:
$$E\{b_0\} = \beta_0, \qquad \sigma^2\{b_0\} = \sigma^2\left(\frac{1}{n} + \frac{\bar{X}^2}{\sum (X_i - \bar{X})^2}\right)$$
When the error variance is unknown:
$$s^2\{b_0\} = \text{MSE}\left(\frac{1}{n} + \frac{\bar{X}^2}{\sum (X_i - \bar{X})^2}\right)$$

Confidence interval for $\beta_0$

The $1-\alpha$ confidence limits for $\beta_0$ are obtained in the same manner as those for $\beta_1$:
$$b_0 \pm t(1-\alpha/2;\, n-2)\,s\{b_0\}$$
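
In the toy sketch these limits would be computed as (variables from the earlier snippets assumed):

    s_b0 = sqrt(MSE * (1/n + Xbar^2/Sxx));      % estimated standard error of b0
    CI_b0 = [b0 - tcrit*s_b0, b0 + tcrit*s_b0]  % 1-alpha confidence limits for beta_0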

Considerations on Inferences on $\beta_0$ and $\beta_1$

Effects of departures from normality: the estimators of $\beta_0$ and $\beta_1$ have the property of asymptotic normality; their distributions approach normality as the sample size increases (under general conditions).
Spacing of the $X$ levels: the variances of $b_0$ and $b_1$ (for a given $n$ and $\sigma^2$) depend strongly on the spacing of $X$.

Sampling distribution of the point estimator of the mean response

Let $X_h$ be the level of $X$ for which we would like an estimate of the mean response (here taken to be one of the observed $X$'s). The mean response when $X = X_h$ is denoted by $E\{Y_h\}$. The point estimator of $E\{Y_h\}$ is
$$\hat{Y}_h = b_0 + b_1 X_h$$
We are interested in the sampling distribution of this quantity.

Sampling Distribution of $\hat{Y}_h$

We have $\hat{Y}_h = b_0 + b_1 X_h$. Since this quantity is itself a linear combination of the $Y_i$'s, its sampling distribution is itself normal. The mean of the sampling distribution is
$$E\{\hat{Y}_h\} = E\{b_0\} + E\{b_1\} X_h = \beta_0 + \beta_1 X_h$$
Biased or unbiased?
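
Unbiased, by the equation above. A self-contained Monte Carlo sketch (all parameter values made up for illustration) confirms this numerically:

    % Monte Carlo check of unbiasedness of Yhat_h
    beta0 = 2; beta1 = 1.8; sigma = 0.5; Xh = 4;
    X = (1:10)';  n = length(X);
    Xbar = mean(X);  Sxx = sum((X - Xbar).^2);
    R = 10000;  Yhat_h = zeros(R,1);
    for r = 1:R
        Y  = beta0 + beta1*X + sigma*randn(n,1);    % fresh sample, X held fixed
        b1 = sum((X - Xbar).*(Y - mean(Y))) / Sxx;
        b0 = mean(Y) - b1*Xbar;
        Yhat_h(r) = b0 + b1*Xh;                     % fitted mean response at Xh
    end
    mean(Yhat_h)   % approximately beta0 + beta1*Xh = 9.2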

Sampling Distribution of $\hat{Y}_h$

To derive the variance of the sampling distribution of the mean response, we first show that $b_1$ and $(1/n)\sum Y_i$ are uncorrelated and hence, for the normal error regression model, independent. We start with the definitions
$$\bar{Y} = \frac{1}{n}\sum Y_i, \qquad b_1 = \sum k_i Y_i, \qquad k_i = \frac{X_i - \bar{X}}{\sum (X_i - \bar{X})^2}$$

Sampling Distribution of $\hat{Y}_h$

We want to show that the mean response and the estimate $b_1$ are uncorrelated:
$$\text{Cov}(\bar{Y}, b_1) = \sigma^2\{\bar{Y}, b_1\} = 0$$
To do this we need the following result (A.32):
$$\sigma^2\left\{\sum_{i=1}^n a_i Y_i,\, \sum_{i=1}^n c_i Y_i\right\} = \sum_{i=1}^n a_i c_i \sigma^2\{Y_i\}$$
when the $Y_i$ are independent.

Sampling Distribution of $\hat{Y}_h$

Using this fact we have
$$\sigma^2\left\{\sum_{i=1}^n \frac{1}{n} Y_i,\, \sum_{i=1}^n k_i Y_i\right\} = \sum_{i=1}^n \frac{1}{n} k_i \sigma^2\{Y_i\} = \frac{\sigma^2}{n}\sum_{i=1}^n k_i = 0$$
since $\sum k_i = \sum (X_i - \bar{X}) / \sum (X_i - \bar{X})^2 = 0$. So $\bar{Y}$ and $b_1$ are uncorrelated.
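
A quick numeric sanity check of $\sum k_i = 0$, continuing the toy sketch (X, Xbar, Sxx as before):

    k = (X - Xbar) / Sxx;   % the k_i coefficients
    sum(k)                  % zero up to floating-point error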

Sampling Distribution of $\hat{Y}_h$

This means that we can write down the variance
$$\sigma^2\{\hat{Y}_h\} = \sigma^2\{\bar{Y} + b_1(X_h - \bar{X})\}$$
(using an alternative and equivalent form of the regression function). But we know that the mean of $Y$ and $b_1$ are uncorrelated, so
$$\sigma^2\{\hat{Y}_h\} = \sigma^2\{\bar{Y}\} + \sigma^2\{b_1\}(X_h - \bar{X})^2$$

Sampling Distribution of $\hat{Y}_h$

We know (from last lecture)
$$\sigma^2\{b_1\} = \frac{\sigma^2}{\sum (X_i - \bar{X})^2}, \qquad s^2\{b_1\} = \frac{\text{MSE}}{\sum (X_i - \bar{X})^2}$$
And we can find
$$\sigma^2\{\bar{Y}\} = \frac{1}{n^2}\sum \sigma^2\{Y_i\} = \frac{n\sigma^2}{n^2} = \frac{\sigma^2}{n}$$

Sampling Distribution of $\hat{Y}_h$

So, plugging in, we get
$$\sigma^2\{\hat{Y}_h\} = \frac{\sigma^2}{n} + \frac{\sigma^2}{\sum (X_i - \bar{X})^2}(X_h - \bar{X})^2$$
or
$$\sigma^2\{\hat{Y}_h\} = \sigma^2\left(\frac{1}{n} + \frac{(X_h - \bar{X})^2}{\sum (X_i - \bar{X})^2}\right)$$

Sampling Distribution of $\hat{Y}_h$

Since we often won't know $\sigma^2$, we can, as usual, plug in $S^2 = SSE/(n-2)$, our estimate of it, to get our estimate of this sampling distribution variance:
$$s^2\{\hat{Y}_h\} = S^2\left(\frac{1}{n} + \frac{(X_h - \bar{X})^2}{\sum (X_i - \bar{X})^2}\right)$$

No surprise...

The studentized point estimator of the mean response follows a $t$-distribution with $n-2$ degrees of freedom:
$$\frac{\hat{Y}_h - E\{Y_h\}}{s\{\hat{Y}_h\}} \sim t(n-2)$$
This means that we can construct confidence intervals in the same manner as before.

Confidence Intervals for $E\{Y_h\}$

The $1-\alpha$ confidence limits for $E\{Y_h\}$ are
$$\hat{Y}_h \pm t(1-\alpha/2;\, n-2)\,s\{\hat{Y}_h\}$$
From this, hypothesis tests can be constructed as usual.
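
In the toy sketch, a 95% confidence interval for the mean response at a made-up input level Xh = 4 would be:

    Xh = 4;                                          % made-up input level
    Yhat_h = b0 + b1*Xh;                             % point estimate of E{Y_h}
    s_Yhat = sqrt(MSE * (1/n + (Xh - Xbar)^2/Sxx));  % s{Yhat_h}
    CI_mean = [Yhat_h - tcrit*s_Yhat, Yhat_h + tcrit*s_Yhat]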

Comments

The variance of the estimator $\hat{Y}_h$ is smallest near the mean of $X$; designing studies such that the mean of $X$ is near $X_h$ will improve inference precision. When $X_h = 0$, the variance of the estimator $\hat{Y}_h$ reduces to the variance of the estimator $b_0$ of $\beta_0$.

Prediction interval for new input $X_h$

Roughly the same idea as for $E\{Y_h\}$, where $X_h$ was a known input point included in the estimation of $b_1$, $b_0$, and $s^2$. If all regression parameters are known, then the $1-\alpha$ prediction interval for a new observation $Y_h$ is
$$E\{Y_h\} \pm z(1-\alpha/2)\,\sigma$$

Prediction interval for new input $X_h$

If the regression parameters are unknown, the $1-\alpha$ prediction interval for a new observation $Y_{h(\text{new})}$ is given by the following theorem:
$$\frac{Y_{h(\text{new})} - \hat{Y}_h}{s\{\text{pred}\}} \sim t(n-2)$$
for the normal error regression model, with $s\{\text{pred}\}$ to be defined shortly. It follows directly that the $1-\alpha$ prediction limits for $Y_{h(\text{new})}$ are
$$\hat{Y}_h \pm t(1-\alpha/2;\, n-2)\,s\{\text{pred}\}$$
This is very nearly the same as prediction for a known value of $X$, but includes a correction for the additional variability arising from the fact that the new input location was not used in the original estimates of $b_1$, $b_0$, and $s^2$.

Prediction interval for new input $X_h$

Because $Y_{h(\text{new})}$ is independent of $\hat{Y}_h$, we can directly write
$$\sigma^2\{\text{pred}\} = \sigma^2\{Y_{h(\text{new})} - \hat{Y}_h\} = \sigma^2\{Y_{h(\text{new})}\} + \sigma^2\{\hat{Y}_h\} = \sigma^2 + \sigma^2\{\hat{Y}_h\}$$
where from before we have
$$\sigma^2\{\hat{Y}_h\} = \sigma^2\left(\frac{1}{n} + \frac{(X_h - \bar{X})^2}{\sum (X_i - \bar{X})^2}\right)$$
so
$$\sigma^2\{\text{pred}\} = \sigma^2\left(1 + \frac{1}{n} + \frac{(X_h - \bar{X})^2}{\sum (X_i - \bar{X})^2}\right)$$
But, as before, we don't know $\sigma^2$, so we will replace it...

Prediction interval for new input $X_h$

The value of $s^2\{\text{pred}\}$ is given by
$$s^2\{\text{pred}\} = \text{MSE}\left(1 + \frac{1}{n} + \frac{(X_h - \bar{X})^2}{\sum (X_i - \bar{X})^2}\right)$$
Note that this quantity is larger than $s^2\{\hat{Y}_h\}$. It has two components: the variance of the distribution of $Y$ at $X = X_h$, namely $\sigma^2$ (estimated by MSE), and the variance of the sampling distribution of $\hat{Y}_h$, namely $s^2\{\hat{Y}_h\}$.
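
The corresponding prediction limits in the toy sketch (continuing from the mean-response snippet; note the extra 1 inside the bracket relative to $s^2\{\hat{Y}_h\}$):

    s_pred = sqrt(MSE * (1 + 1/n + (Xh - Xbar)^2/Sxx));   % s{pred}
    PI = [Yhat_h - tcrit*s_pred, Yhat_h + tcrit*s_pred]   % 1-alpha prediction limits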

Summary

After this lecture you should be able to confidently do estimation, prediction, and hypothesis testing about the slope, the intercept, and the predicted value at any input point, old or new, in the normal error linear regression setting.