Inference in Regression Analysis


Inference in Regression Analysis. Dr. Frank Wood, fwood@stat.columbia.edu. Linear Regression Models, Lecture 4.

Today: Normal Error Regression Model

$Y_i = \beta_0 + \beta_1 X_i + \epsilon_i$

- $Y_i$ is the value of the response variable in the $i$th trial
- $\beta_0$ and $\beta_1$ are parameters
- $X_i$ is a known constant, the value of the predictor variable in the $i$th trial
- $\epsilon_i \sim \text{iid } N(0, \sigma^2)$, $i = 1, \ldots, n$
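As a concrete illustration (not part of the original slides), the following minimal sketch simulates data from this model; the parameter values $\beta_0 = 9$, $\beta_1 = 2$, $\sigma = 2$ and the grid of predictor values are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Model parameters (illustrative values, not from the slides)
beta0, beta1, sigma = 9.0, 2.0, 2.0
n = 11

# Predictor values are treated as known constants
X = np.arange(1, n + 1, dtype=float)

# Y_i = beta0 + beta1 * X_i + eps_i, with eps_i ~ iid N(0, sigma^2)
eps = rng.normal(0.0, sigma, size=n)
Y = beta0 + beta1 * X + eps
```

The later snippets in these notes reuse these simulated arrays as a running example.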

Inferences concerning β₁

Tests concerning $\beta_1$ (the slope) are often of interest, particularly

$H_0: \beta_1 = 0$ versus $H_a: \beta_1 \neq 0$

The null-hypothesis model $Y_i = \beta_0 + (0)X_i + \epsilon_i$ implies that there is no linear relationship between $Y$ and $X$.

Review: Hypothesis Testing

Elements of a statistical test:
- Null hypothesis, $H_0$
- Alternative hypothesis, $H_a$
- Test statistic
- Rejection region

Review: Hypothesis Testing - Errors

- A type I error is made if $H_0$ is rejected when $H_0$ is true. The probability of a type I error is denoted by $\alpha$; the value of $\alpha$ is called the level of the test.
- A type II error is made if $H_0$ is accepted when $H_a$ is true. The probability of a type II error is denoted by $\beta$.

P-value

The p-value, or attained significance level, is the smallest level of significance $\alpha$ for which the observed data indicate that the null hypothesis should be rejected.

Null Hypothesis

If $\beta_1 = 0$, then with 95% confidence $b_1$ would fall in some range around zero.

[Figure: Response/Output vs. Predictor/Input. Guess: y = 0x + 21.2, MSE 37.1; True: y = 2x + 9, MSE 4.22.]

Alternative Hypothesis: Least Squares Fit

[Figure: Response/Output vs. Predictor/Input. Estimate: y = 2.09x + 8.36, MSE 4.15; True: y = 2x + 9, MSE 4.22.]

$b_1$, suitably rescaled, is the test statistic.

Testing This Hypothesis

- We only have a finite sample.
- Different finite sets of samples (from the same population / source) will almost always produce different estimates of $\beta_0$ and $\beta_1$ ($b_0$, $b_1$), even given the same estimation procedure.
- $b_0$ and $b_1$ are therefore random variables whose sampling distributions can be statistically characterized.
- Hypothesis tests can be constructed using these distributions.

Example: Sampling Distribution of b₁

The point estimator of $\beta_1$ is

$b_1 = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sum (X_i - \bar{X})^2}$

The sampling distribution of $b_1$ is the distribution of $b_1$ values that occurs when the predictor variables $X_i$ are held fixed and the observed outputs are repeatedly sampled.
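A minimal sketch of this estimator, continuing the simulated data above (the arrays X and Y are carried over from the earlier snippet):

```python
import numpy as np

def slope_estimate(X, Y):
    """Least squares slope: b1 = sum((Xi - Xbar)(Yi - Ybar)) / sum((Xi - Xbar)^2)."""
    Xc = X - X.mean()
    Yc = Y - Y.mean()
    return np.sum(Xc * Yc) / np.sum(Xc ** 2)

b1 = slope_estimate(X, Y)
b0 = Y.mean() - b1 * X.mean()   # intercept estimate, b0 = Ybar - b1 * Xbar
```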

Sampling Dist. of b₁ in the Normal Regression Model

For a normal error regression model, the sampling distribution of $b_1$ is normal, with mean and variance given by

$E(b_1) = \beta_1, \qquad V(b_1) = \frac{\sigma^2}{\sum (X_i - \bar{X})^2}$

To show this we need to go through a number of algebraic steps.
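Before the algebra, a quick Monte Carlo check of these two facts under the illustrative parameter values used above (slope_estimate, X, beta0, beta1, sigma, and n are carried over from the earlier snippets):

```python
import numpy as np

rng = np.random.default_rng(1)
reps = 20000

# Resample Y repeatedly with X held fixed, recomputing b1 each time
b1_samples = np.empty(reps)
for r in range(reps):
    Y_rep = beta0 + beta1 * X + rng.normal(0.0, sigma, size=n)
    b1_samples[r] = slope_estimate(X, Y_rep)

print(b1_samples.mean())                       # approximately beta1 = 2
print(b1_samples.var())                        # approximately sigma^2 / sum((Xi - Xbar)^2)
print(sigma**2 / np.sum((X - X.mean())**2))    # theoretical variance for comparison
```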

First step

To show $\sum (X_i - \bar{X})(Y_i - \bar{Y}) = \sum (X_i - \bar{X})Y_i$ we observe

$\sum (X_i - \bar{X})(Y_i - \bar{Y}) = \sum (X_i - \bar{X})Y_i - \sum (X_i - \bar{X})\bar{Y}$
$= \sum (X_i - \bar{X})Y_i - \bar{Y}\sum (X_i - \bar{X})$
$= \sum (X_i - \bar{X})Y_i - \bar{Y}\sum X_i + \bar{Y}\, n\, \frac{\sum X_i}{n}$
$= \sum (X_i - \bar{X})Y_i$

Slope as linear combination of outputs

$b_1$ can be expressed as a linear combination of the $Y_i$:

$b_1 = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sum (X_i - \bar{X})^2} = \frac{\sum (X_i - \bar{X})Y_i}{\sum (X_i - \bar{X})^2} = \sum k_i Y_i$

where

$k_i = \frac{X_i - \bar{X}}{\sum (X_i - \bar{X})^2}$

Properties of the kᵢ

It can be shown (possible homework) that

$\sum k_i = 0, \qquad \sum k_i X_i = 1, \qquad \sum k_i^2 = \frac{1}{\sum (X_i - \bar{X})^2}$

We will use these properties to prove various properties of the sampling distributions of $b_1$ and $b_0$. (write on board)
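A numerical sanity check of these three properties, plus the linear-combination identity from the previous slide (a sketch continuing the simulated X, Y and slope_estimate from above):

```python
import numpy as np

Xc = X - X.mean()
k = Xc / np.sum(Xc ** 2)          # k_i = (Xi - Xbar) / sum((Xj - Xbar)^2)

print(np.isclose(np.sum(k), 0.0))                        # sum k_i = 0
print(np.isclose(np.sum(k * X), 1.0))                    # sum k_i X_i = 1
print(np.isclose(np.sum(k ** 2), 1.0 / np.sum(Xc**2)))   # sum k_i^2 = 1 / sum((Xi - Xbar)^2)
print(np.isclose(np.sum(k * Y), slope_estimate(X, Y)))   # b1 = sum k_i Y_i
```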

Normality of b₁'s Sampling Distribution

Useful fact: a linear combination of independent normal random variables is normally distributed.

More formally: when $Y_1, \ldots, Y_n$ are independent normal random variables, the linear combination $a_1 Y_1 + a_2 Y_2 + \cdots + a_n Y_n$ is normally distributed, with mean $\sum a_i E(Y_i)$ and variance $\sum a_i^2 V(Y_i)$.

Normality of b₁'s Sampling Distribution

Since $b_1$ is a linear combination of the $Y_i$ and each $Y_i$ is an independent normal random variable, $b_1$ is normally distributed as well:

$b_1 = \sum k_i Y_i, \qquad k_i = \frac{X_i - \bar{X}}{\sum (X_i - \bar{X})^2}$

(write on board)

b₁ is an unbiased estimator

This can be seen using two of the $k_i$ properties:

$E(b_1) = E\left(\sum k_i Y_i\right) = \sum k_i E(Y_i) = \sum k_i (\beta_0 + \beta_1 X_i)$
$= \beta_0 \sum k_i + \beta_1 \sum k_i X_i = \beta_0 (0) + \beta_1 (1) = \beta_1$

Variance of b₁

Since the $Y_i$ are independent random variables with variance $\sigma^2$, and the $k_i$ are constants, we get

$V(b_1) = V\left(\sum k_i Y_i\right) = \sum k_i^2 V(Y_i) = \sum k_i^2 \sigma^2 = \sigma^2 \sum k_i^2 = \frac{\sigma^2}{\sum (X_i - \bar{X})^2}$

Note that this assumes that we know $\sigma^2$. Can we?

Estimated variance of b₁

If we don't know $\sigma^2$ then we can replace it with the MSE estimate. Remember

$s^2 = \text{MSE} = \frac{\text{SSE}}{n-2} = \frac{\sum (Y_i - \hat{Y}_i)^2}{n-2} = \frac{\sum e_i^2}{n-2}$

Plugging in, we go from

$V(b_1) = \frac{\sigma^2}{\sum (X_i - \bar{X})^2}$ to $\hat{V}(b_1) = \frac{s^2}{\sum (X_i - \bar{X})^2}$
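A minimal sketch of this plug-in estimate, continuing the running example (b0, b1, X, and Y are carried over from the earlier snippets):

```python
import numpy as np

Y_hat = b0 + b1 * X                # fitted values
e = Y - Y_hat                      # residuals
n = len(Y)

mse = np.sum(e ** 2) / (n - 2)                    # s^2 = SSE / (n - 2)
var_b1_hat = mse / np.sum((X - X.mean()) ** 2)    # estimated V(b1)
se_b1 = np.sqrt(var_b1_hat)                       # estimated standard error S(b1)
```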

Digression: Gauss-Markov Theorem

In a regression model where $E(\epsilon_i) = 0$, $V(\epsilon_i) = \sigma^2 < \infty$, and $\epsilon_i$ and $\epsilon_j$ are uncorrelated for all $i \neq j$, the least squares estimators $b_0$ and $b_1$ are unbiased and have minimum variance among all unbiased linear estimators.

Remember

$b_1 = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sum (X_i - \bar{X})^2}, \qquad b_0 = \bar{Y} - b_1 \bar{X}$

Proof

The theorem states that $b_1$ has minimum variance among all unbiased linear estimators of the form

$\hat{\beta}_1 = \sum c_i Y_i$

As this estimator must be unbiased we have

$E(\hat{\beta}_1) = \sum c_i E(Y_i) = \sum c_i (\beta_0 + \beta_1 X_i) = \beta_0 \sum c_i + \beta_1 \sum c_i X_i = \beta_1$

Proof cont.

Given this constraint, $\beta_0 \sum c_i + \beta_1 \sum c_i X_i = \beta_1$, which must hold for any $\beta_0$ and $\beta_1$, it clearly must be the case that $\sum c_i = 0$ and $\sum c_i X_i = 1$. (write these on board as conditions of unbiasedness)

The variance of this estimator is

$V(\hat{\beta}_1) = \sum c_i^2 V(Y_i) = \sigma^2 \sum c_i^2$

Proof cont.

Now define $c_i = k_i + d_i$, where the $k_i$ are the constants we already defined and the $d_i$ are arbitrary constants. Let's look at the variance of the estimator:

$V(\hat{\beta}_1) = \sum c_i^2 V(Y_i) = \sigma^2 \sum (k_i + d_i)^2 = \sigma^2 \left( \sum k_i^2 + \sum d_i^2 + 2\sum k_i d_i \right)$

Note we just demonstrated that $\sigma^2 \sum k_i^2 = V(b_1)$.

Proof cont.

Now by showing that $\sum k_i d_i = 0$ we're almost done:

$\sum k_i d_i = \sum k_i (c_i - k_i) = \sum k_i c_i - \sum k_i^2$
$= \sum c_i \left( \frac{X_i - \bar{X}}{\sum (X_i - \bar{X})^2} \right) - \frac{1}{\sum (X_i - \bar{X})^2}$
$= \frac{\sum c_i X_i - \bar{X} \sum c_i}{\sum (X_i - \bar{X})^2} - \frac{1}{\sum (X_i - \bar{X})^2} = 0$

from the conditions of unbiasedness.

Proof end

So we are left with

$V(\hat{\beta}_1) = \sigma^2 \left( \sum k_i^2 + \sum d_i^2 \right) = V(b_1) + \sigma^2 \sum d_i^2$

which is minimized when all the $d_i = 0$. This means that the least squares estimator $b_1$ has minimum variance among all unbiased linear estimators.
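As an illustration (not in the original slides), the sketch below builds an alternative unbiased linear estimator by perturbing the $k_i$ with a $d$ vector satisfying $\sum d_i = 0$ and $\sum d_i X_i = 0$, and checks that its variance exceeds $V(b_1)$; X, k, and sigma are carried over from the earlier snippets.

```python
import numpy as np

rng = np.random.default_rng(2)

# Build d orthogonal to both the all-ones vector and X, so that
# c = k + d still satisfies sum(c) = 0 and sum(c * X) = 1 (unbiasedness).
Z = np.column_stack([np.ones_like(X), X])
d_raw = rng.normal(size=len(X))
d = d_raw - Z @ np.linalg.lstsq(Z, d_raw, rcond=None)[0]   # residual of d_raw on span{1, X}

c = k + d
print(np.isclose(np.sum(c), 0.0), np.isclose(np.sum(c * X), 1.0))  # still unbiased

var_b1  = sigma**2 * np.sum(k ** 2)   # variance of the least squares estimator b1
var_alt = sigma**2 * np.sum(c ** 2)   # variance of the alternative linear estimator
print(var_alt >= var_b1)              # True: Gauss-Markov in action
```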

Sampling Distribution of (b₁ - β₁)/Ŝ(b₁)

$b_1$ is normally distributed, so $(b_1 - \beta_1)/\sqrt{V(b_1)}$ is a standard normal variable.

We don't know $V(b_1)$, so it must be estimated from data. We have already denoted its estimate $\hat{V}(b_1)$. Using this estimate, it can be shown that

$\frac{b_1 - \beta_1}{\hat{S}(b_1)} \sim t(n-2), \qquad \hat{S}(b_1) = \sqrt{\hat{V}(b_1)}$
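Continuing the running example, a sketch of this studentized statistic for testing $H_0: \beta_1 = 0$ (b1, se_b1, and n are carried over from the earlier snippets; scipy is used only for the t distribution):

```python
from scipy import stats

t_stat = (b1 - 0.0) / se_b1                        # studentized statistic under H0: beta1 = 0
p_value = 2 * stats.t.sf(abs(t_stat), df=n - 2)    # two-sided p-value from t(n-2)
print(t_stat, p_value)
```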

Where does this come from?

We need to rely upon the following theorem: for the normal error regression model,

$\frac{\text{SSE}}{\sigma^2} = \frac{\sum (Y_i - \hat{Y}_i)^2}{\sigma^2} \sim \chi^2(n-2)$

and is independent of $b_0$ and $b_1$.

Intuitively this follows the standard result for the sum of squared standard normal random variables; here, the two linear constraints imposed by the regression parameter estimation each reduce the number of degrees of freedom by one.
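A quick simulation check of this theorem under the illustrative parameters used earlier (slope_estimate, X, beta0, beta1, sigma, and n are carried over from the earlier snippets):

```python
import numpy as np

rng = np.random.default_rng(3)
reps = 20000

sse_over_sigma2 = np.empty(reps)
for r in range(reps):
    Y_rep = beta0 + beta1 * X + rng.normal(0.0, sigma, size=n)
    b1_r = slope_estimate(X, Y_rep)
    b0_r = Y_rep.mean() - b1_r * X.mean()
    resid = Y_rep - (b0_r + b1_r * X)
    sse_over_sigma2[r] = np.sum(resid ** 2) / sigma**2

print(sse_over_sigma2.mean())   # approximately n - 2, the mean of chi^2(n-2)
print(sse_over_sigma2.var())    # approximately 2(n - 2), the variance of chi^2(n-2)
```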

Another useful fact: t distribution

Let $z$ and $\chi^2(\nu)$ be independent random variables (standard normal and $\chi^2$, respectively). We then define a $t$ random variable as follows:

$t(\nu) = \frac{z}{\sqrt{\chi^2(\nu)/\nu}}$

This version of the $t$ distribution has one parameter, the degrees of freedom $\nu$.
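A small self-contained sketch illustrating this construction by simulation (the choice $\nu = 9$, matching $n - 2$ in the running example, is purely illustrative):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
nu, reps = 9, 20000

z = rng.standard_normal(reps)
chi2 = rng.chisquare(nu, size=reps)
t_samples = z / np.sqrt(chi2 / nu)   # t(nu) = z / sqrt(chi^2(nu) / nu)

# Compare the simulated quantiles with the t(nu) distribution
print(np.quantile(t_samples, [0.025, 0.5, 0.975]))
print(stats.t.ppf([0.025, 0.5, 0.975], df=nu))
```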

Distribution of the studentized statistic

To derive the distribution of this statistic, first we do the following rewrite:

$\frac{b_1 - \beta_1}{\hat{S}(b_1)} = \frac{(b_1 - \beta_1)/S(b_1)}{\hat{S}(b_1)/S(b_1)}$

The numerator is a standard normal variable, and

$\frac{\hat{S}(b_1)}{S(b_1)} = \sqrt{\frac{\hat{V}(b_1)}{V(b_1)}}$

Studentized statistic cont.

And note the following:

$\frac{\hat{V}(b_1)}{V(b_1)} = \frac{\text{MSE}/\sum (X_i - \bar{X})^2}{\sigma^2/\sum (X_i - \bar{X})^2} = \frac{\text{MSE}}{\sigma^2} = \frac{\text{SSE}}{\sigma^2 (n-2)}$

where we know (by the given theorem) that the distribution of the last term is $\chi^2$ and independent of $b_1$ and $b_0$:

$\frac{\text{SSE}}{\sigma^2 (n-2)} \sim \frac{\chi^2(n-2)}{n-2}$

Studentized statistic final

But by the given definition of the $t$ distribution we have our result:

$\frac{b_1 - \beta_1}{\hat{S}(b_1)} \sim t(n-2)$

because, putting everything together, we can see that

$\frac{b_1 - \beta_1}{\hat{S}(b_1)} \sim \frac{z}{\sqrt{\chi^2(n-2)/(n-2)}}$

Confidence Intervals and Hypothesis Tests

Now that we know the sampling distribution of $b_1$ ($t$ with $n-2$ degrees of freedom), we can construct confidence intervals and hypothesis tests easily.
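As a closing illustration (not from the slides), a sketch of a 95% confidence interval and the corresponding two-sided test for $\beta_1$, reusing b1, se_b1, and n from the earlier snippets; the 95% level is an arbitrary choice.

```python
from scipy import stats

alpha = 0.05
t_crit = stats.t.ppf(1 - alpha / 2, df=n - 2)

# 95% confidence interval for beta1: b1 +/- t(1 - alpha/2; n-2) * S(b1)
ci_lower = b1 - t_crit * se_b1
ci_upper = b1 + t_crit * se_b1

# Two-sided test of H0: beta1 = 0 at level alpha
t_stat = b1 / se_b1
reject_H0 = abs(t_stat) > t_crit

print((ci_lower, ci_upper), t_stat, reject_H0)
```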