STA 114: Statistics. Notes 21. Linear Regression

Introduction

One of the most widely used statistical analyses is regression, where one tries to explain a response variable $Y$ by an explanatory variable $X$ based on paired data $(X_1, Y_1), \ldots, (X_n, Y_n)$. The most common way to model the dependence of $Y$ on $X$ is to look for a linear relationship with additional noise,
\[
Y_i = \beta_0 + \beta_1 X_i + \epsilon_i,
\]
with $\epsilon_1, \ldots, \epsilon_n$ taken to be independent and identically distributed random variables with mean $0$ and variance $\sigma^2$. The unknown quantities are $(\beta_0, \beta_1, \sigma^2)$.

Many pairs of natural measurements exhibit such linear relationships. The figure below shows atmospheric pressure (in inches of mercury) against the boiling point of water (in degrees F), based on 17 pairs of observations. Although water's boiling point and atmospheric pressure should have a precise physical relationship, there will always be some deviation in actual measurements due to factors that are hard to control.

[Figure: scatter plot of Pressure (inches of mercury) against Temp (degrees F), with the fitted least squares line overlaid.]
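To make the model concrete, here is a minimal Python sketch that simulates paired data from the model above. The parameter values, the seed, and the range of $x$ are hypothetical, chosen only for illustration; they are not taken from the boiling-point data in the figure.

```python
# A minimal sketch: simulate n pairs from Y_i = beta0 + beta1 * X_i + eps_i
# with eps_i iid Normal(0, sigma^2). All numerical values are hypothetical.
import numpy as np

rng = np.random.default_rng(0)

n = 17
beta0, beta1, sigma = 10.0, 0.1, 0.25      # hypothetical "true" parameters
x = rng.uniform(195.0, 212.0, size=n)      # hypothetical explanatory values
eps = rng.normal(0.0, sigma, size=n)       # iid Normal(0, sigma^2) noise
y = beta0 + beta1 * x + eps                # simulated responses

print(np.column_stack([x, y])[:5])         # first few simulated (x, y) pairs
```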

Least squares line

The straight line you see in the above figure is the line that best fits the data. It is found as follows. For any line $y = b_0 + b_1 x$, we can compute the residuals $e_i = y_i - b_0 - b_1 x_i$ that we would incur if we tried to explain the observed values of $Y$ by those of $X$ using this line. The total deviation can be measured by the sum of squares of the residuals
\[
d(b_0, b_1) = \sum_{i=1}^n e_i^2 = \sum_{i=1}^n (y_i - b_0 - b_1 x_i)^2.
\]
We then find the $b_0$ and $b_1$ that minimize $d(b_0, b_1)$. This is easily done using calculus: we set $\frac{\partial}{\partial b_0} d(b_0, b_1) = 0$ and $\frac{\partial}{\partial b_1} d(b_0, b_1) = 0$ and solve for $b_0, b_1$. In particular,
\[
0 = \frac{\partial}{\partial b_0} d(b_0, b_1) = \sum_{i=1}^n 2(y_i - b_0 - b_1 x_i)(-1) = -2n(\bar y - b_0 - b_1 \bar x),
\]
\[
0 = \frac{\partial}{\partial b_1} d(b_0, b_1) = \sum_{i=1}^n 2(y_i - b_0 - b_1 x_i)(-x_i) = -2\Big(\sum_{i=1}^n x_i y_i - n b_0 \bar x - b_1 \sum_{i=1}^n x_i^2\Big).
\]
These are two linear equations in the two unknowns $b_0, b_1$. The solutions are
\[
\hat b_1 = \frac{\sum_i (y_i - \bar y)(x_i - \bar x)}{\sum_i (x_i - \bar x)^2} = \frac{\sum_i y_i (x_i - \bar x)}{\sum_i (x_i - \bar x)^2},
\qquad
\hat b_0 = \bar y - \hat b_1 \bar x.
\]
Let us denote $s_{xy} = \frac{1}{n-1} \sum_i (y_i - \bar y)(x_i - \bar x)$ [this is the sample covariance between $Y$ and $X$]. Then, using our old notation $s_x^2 = \frac{1}{n-1} \sum_i (x_i - \bar x)^2$, we can write the least squares solution as
\[
\hat b_0 = \bar y - s_{xy}\bar x/s_x^2, \qquad \hat b_1 = s_{xy}/s_x^2.
\]
The method of least squares was used by physicists working on astronomical measurements in the early 19th century. A statistical framework was developed much later. The main import of the statistical development, as usual, has been to incorporate a notion of uncertainty.

Statistical analysis of simple linear regression

To put the linear regression relationship $Y_i = \beta_0 + \beta_1 X_i + \epsilon_i$ into a statistical model, we need a distribution on the $\epsilon_i$'s. The most common choice is a normal distribution, $\epsilon_i \sim \text{Normal}(0, \sigma^2)$. This can be justified as follows: the additional factors that give rise to the noise term are many in number and act independently of each other, each making a little contribution. By the central limit theorem the aggregate of such numerous, independent, small contributions should behave like a normal variable. The mean is fixed at zero because any non-zero mean can be absorbed into the intercept $\beta_0$. So our statistical model is
\[
Y_i = \beta_0 + \beta_1 X_i + \epsilon_i, \qquad \epsilon_i \overset{iid}{\sim} \text{Normal}(0, \sigma^2),
\]
with model parameters $\beta_0 \in (-\infty, \infty)$, $\beta_1 \in (-\infty, \infty)$ and $\sigma^2 > 0$. We also need to assume that the error terms $\epsilon_i$ are independent of the explanatory variables $X_i$ (because the errors account for additional factors beyond the explanatory variable). Then the log-likelihood function in the model parameters is
\[
l_{x,y}(\beta_0, \beta_1, \sigma^2) = \text{const} - \frac{n}{2}\log\sigma^2 - \frac{1}{2\sigma^2}\sum_{i=1}^n (y_i - \beta_0 - \beta_1 x_i)^2.
\]
To find the MLE, first notice that for every $\sigma^2$ the log-likelihood is maximized at $(\beta_0, \beta_1)$ equal to the least squares solutions $(\bar y - s_{xy}\bar x/s_x^2,\; s_{xy}/s_x^2)$, and so we must have
\[
\hat\beta_{0,\mathrm{MLE}}(x, y) = \bar y - s_{xy}\bar x/s_x^2, \qquad \hat\beta_{1,\mathrm{MLE}}(x, y) = s_{xy}/s_x^2.
\]
To shorten notation we'll just write $\hat\beta_0$ and $\hat\beta_1$ for $\hat\beta_{0,\mathrm{MLE}}(x, y)$ and $\hat\beta_{1,\mathrm{MLE}}(x, y)$ respectively.
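As a quick check of the formulas above, here is a minimal Python sketch computing $\hat b_0$ and $\hat b_1$ (equivalently $\hat\beta_{0,\mathrm{MLE}}$ and $\hat\beta_{1,\mathrm{MLE}}$) from the sample covariance and variance; the data values are hypothetical toy numbers, not the pressure data.

```python
# Minimal sketch: least squares / ML estimates from b1_hat = s_xy / s_x^2
# and b0_hat = ybar - b1_hat * xbar. The x, y arrays are hypothetical toy data.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 2.9, 3.8, 5.2, 5.9, 7.1])
n = len(x)

xbar, ybar = x.mean(), y.mean()
s_xy = np.sum((y - ybar) * (x - xbar)) / (n - 1)   # sample covariance
s_x2 = np.sum((x - xbar) ** 2) / (n - 1)           # sample variance of x

beta1_hat = s_xy / s_x2                            # slope estimate
beta0_hat = ybar - beta1_hat * xbar                # intercept estimate
print(beta0_hat, beta1_hat)
```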

Consequently, the MLE of $\sigma^2$ is found by setting $\frac{\partial}{\partial \sigma^2} l_{x,y}(\hat\beta_0, \hat\beta_1, \sigma^2) = 0$. This gives
\[
\hat\sigma^2_{\mathrm{MLE}}(x, y) = \frac{1}{n}\sum_{i=1}^n (y_i - \hat\beta_0 - \hat\beta_1 x_i)^2.
\]
It is more common to estimate $\sigma^2$ by
\[
\hat\sigma^2 = s^2_{y|x} = \frac{1}{n-2}\sum_{i=1}^n (y_i - \hat\beta_0 - \hat\beta_1 x_i)^2,
\]
where the $n-2$ reflects that two unknown quantities ($\beta_0$ and $\beta_1$) had to be estimated to define the residuals.

Sampling theory

To construct confidence intervals and test procedures for the model parameters, we'll first need to look at their sampling theory. We'll explore this treating the explanatory variable as non-random, i.e., as if we picked and fixed the observations $x_1, \ldots, x_n$. You can interpret this as the conditional sampling theory of the estimated model parameters given the observed explanatory variables. It turns out that when $Y_i = \beta_0 + \beta_1 x_i + \epsilon_i$ with $\epsilon_i \overset{iid}{\sim} \text{Normal}(0, \sigma^2)$, for any two numbers $a$ and $b$,

1. $a\hat\beta_0 + b\hat\beta_1 \sim \text{Normal}\Big(a\beta_0 + b\beta_1,\; \sigma^2\Big\{\frac{a^2}{n} + \frac{(a\bar x - b)^2}{(n-1)s_x^2}\Big\}\Big)$,
2. $(n-2)\hat\sigma^2/\sigma^2 \sim \chi^2(n-2)$,
3. and these two random variables are mutually independent.

A proof of this is given at the end of this handout. It is only for those who are curious to know why this happens; you may ignore it if you're not interested.

Two special cases of the above result will be important to us. First, with $a = 1$ and $b = 0$ the result gives $\hat\beta_0 \sim \text{Normal}\big(\beta_0,\; \sigma^2\big\{\frac{1}{n} + \frac{\bar x^2}{(n-1)s_x^2}\big\}\big)$, independent of $\hat\sigma^2$. Second, with $a = 0$ and $b = 1$ we have $\hat\beta_1 \sim \text{Normal}\big(\beta_1,\; \frac{\sigma^2}{(n-1)s_x^2}\big)$, independent of $\hat\sigma^2$.
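A minimal sketch of these quantities in Python, using hypothetical simulated data: it computes $\hat\sigma^2$ with the $n-2$ divisor and the standard errors of $\hat\beta_0$ and $\hat\beta_1$ implied by the two special cases.

```python
# Minimal sketch (hypothetical simulated data): sigma2_hat = RSS/(n-2) and
# the standard errors of beta0_hat and beta1_hat from the sampling theory.
import numpy as np

rng = np.random.default_rng(1)
n = 30
x = rng.uniform(0.0, 10.0, size=n)
y = 2.0 + 0.7 * x + rng.normal(0.0, 1.5, size=n)    # hypothetical true values

xbar, ybar = x.mean(), y.mean()
s_x2 = np.sum((x - xbar) ** 2) / (n - 1)
s_xy = np.sum((y - ybar) * (x - xbar)) / (n - 1)
beta1_hat = s_xy / s_x2
beta0_hat = ybar - beta1_hat * xbar

resid = y - beta0_hat - beta1_hat * x
sigma2_hat = np.sum(resid ** 2) / (n - 2)           # sigma^2 estimate, RSS/(n-2)

se_beta0 = np.sqrt(sigma2_hat * (1.0 / n + xbar ** 2 / ((n - 1) * s_x2)))
se_beta1 = np.sqrt(sigma2_hat / ((n - 1) * s_x2))
print(sigma2_hat, se_beta0, se_beta1)
```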

ML confidence interval

We shall only look at confidence intervals for parameters that can be written as $\eta = a\beta_0 + b\beta_1$ for some real numbers $a$ and $b$ (again, the interesting cases are $(a=1, b=0)$, corresponding to $\beta_0$, and $(a=0, b=1)$, corresponding to $\beta_1$). From the result above it follows that $\hat\eta = a\hat\beta_0 + b\hat\beta_1 \sim \text{Normal}(\eta, \sigma_\eta^2)$, where
\[
\sigma_\eta^2 = \sigma^2\Big[\frac{a^2}{n} + \frac{(a\bar x - b)^2}{(n-1)s_x^2}\Big].
\]
Let $\hat\sigma_\eta^2 = \hat\sigma^2\big[\frac{a^2}{n} + \frac{(a\bar x - b)^2}{(n-1)s_x^2}\big]$. A bit of algebra shows that
\[
\frac{\hat\eta - \eta}{\hat\sigma_\eta} \sim t(n-2),
\]
and $\hat\eta$ is independent of $\hat\sigma^2$. Therefore a $100(1-\alpha)\%$ confidence interval for $\eta$ is given by
\[
B(x, y) = \hat\eta \pm \hat\sigma_\eta\, z_{n-2}(\alpha),
\]
where $z_{n-2}(\alpha)$ denotes the two-sided critical value of the $t(n-2)$ distribution. This confidence interval is also the ML confidence interval, i.e., it matches $\{\eta : l^*_{x,y}(\eta) \ge \max_\eta l^*_{x,y}(\eta) - c^2/2\}$ for some $c > 0$, where $l^*_{x,y}(\eta)$ is the profile log-likelihood of $\eta$, but we will not pursue a proof here.

Hypothesis testing

Again we restrict ourselves to parameters that can be written as $\eta = a\beta_0 + b\beta_1$. To test $H_0: \eta = \eta_0$ against $H_1: \eta \ne \eta_0$, the size-$\alpha$ ML test is the one that rejects $H_0$ whenever $\eta_0$ falls outside the $100(1-\alpha)\%$ ML confidence interval $\hat\eta \pm \hat\sigma_\eta\, z_{n-2}(\alpha)$. The p-value based on these tests is precisely the $\alpha$ for which $\eta_0$ is just on the boundary of this interval. Of particular interest is testing $H_0: \beta_1 = 0$ against $H_1: \beta_1 \ne 0$. The null hypothesis says that $X$ offers no explanation of $Y$ within the linear relationship framework. This can be dealt with as above, with $a = 0$ and $b = 1$, by checking whether $0$ is outside the interval $\hat\beta_1 \pm \hat\sigma\, z_{n-2}(\alpha)/\sqrt{(n-1)s_x^2}$.

Prediction

Suppose you want to predict the value $Y^*$ of $Y$ corresponding to an $X = x^*$ that is known to you. The model says $Y^* = \beta_0 + \beta_1 x^* + \epsilon^*$, where $\epsilon^* \sim \text{Normal}(0, \sigma^2)$ and is independent of $\epsilon_1, \ldots, \epsilon_n$. If we knew $\beta_0, \beta_1, \sigma^2$, we could give the interval $\beta_0 + \beta_1 x^* \pm \sigma z(\alpha)$, taking into account the variability $\sigma^2$ of $\epsilon^*$. Our best guess for $\beta_0 + \beta_1 x^*$ is the fitted value $\hat y^* = \hat\beta_0 + \hat\beta_1 x^*$ (the point on the least squares line with $x$-coordinate $x^*$), with additional associated variability $\sigma^2\big\{\frac{1}{n} + \frac{(x^* - \bar x)^2}{(n-1)s_x^2}\big\}$ [this follows from the result with $a = 1$, $b = x^*$]. Also, we have an estimate $\hat\sigma^2$ for $\sigma^2$. Putting all these together, it follows that the predictive interval
\[
\hat\beta_0 + \hat\beta_1 x^* \pm \hat\sigma\Big[1 + \frac{1}{n} + \frac{(x^* - \bar x)^2}{(n-1)s_x^2}\Big]^{1/2} z_{n-2}(\alpha)
\]
has a $100(1-\alpha)\%$ coverage.

Testing goodness of fit

Once we fit the model, we can construct the residuals $e_i = Y_i - \hat\beta_0 - \hat\beta_1 x_i$, $i = 1, \ldots, n$. The histogram of these residuals should match the $\text{Normal}(0, \sigma^2)$ pdf, with $\sigma^2$ estimated by $\hat\sigma^2$. So we could carry out a Pearson's goodness-of-fit test with these residuals in the same manner we did for iid normal data. The only difference is that the p-value range would now be given by $1 - F_{k-1}(Q(x))$ to $1 - F_{k-4}(Q(x))$, to reflect that 3 parameters are being estimated ($\beta_0$, $\beta_1$ and $\sigma$).
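The confidence interval, the test of $H_0: \beta_1 = 0$, and the prediction interval can be assembled as in the sketch below. The data and parameter values are hypothetical, and scipy's Student-$t$ distribution supplies the $t(n-2)$ critical value that plays the role of $z_{n-2}(\alpha)$ above.

```python
# Minimal sketch (hypothetical data): 95% CI for beta1, two-sided p-value for
# H0: beta1 = 0, and a 95% prediction interval at a new point x_star.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n = 30
x = rng.uniform(0.0, 10.0, size=n)
y = 2.0 + 0.7 * x + rng.normal(0.0, 1.5, size=n)    # hypothetical true values

xbar, ybar = x.mean(), y.mean()
s_x2 = np.sum((x - xbar) ** 2) / (n - 1)
beta1_hat = np.sum((y - ybar) * (x - xbar)) / ((n - 1) * s_x2)
beta0_hat = ybar - beta1_hat * xbar
sigma2_hat = np.sum((y - beta0_hat - beta1_hat * x) ** 2) / (n - 2)

alpha = 0.05
tcrit = stats.t.ppf(1 - alpha / 2, df=n - 2)        # t(n-2) critical value

# Confidence interval and test for beta1 (a = 0, b = 1)
se_beta1 = np.sqrt(sigma2_hat / ((n - 1) * s_x2))
ci_beta1 = (beta1_hat - tcrit * se_beta1, beta1_hat + tcrit * se_beta1)
tstat = beta1_hat / se_beta1
pvalue = 2 * stats.t.sf(abs(tstat), df=n - 2)       # two-sided p-value

# Prediction interval at x_star
x_star = 5.0
yhat_star = beta0_hat + beta1_hat * x_star
se_pred = np.sqrt(sigma2_hat * (1 + 1 / n + (x_star - xbar) ** 2 / ((n - 1) * s_x2)))
pred_int = (yhat_star - tcrit * se_pred, yhat_star + tcrit * se_pred)

print(ci_beta1, pvalue, pred_int)
```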

Technical details (ignore if you're not interested)

To dig into the sampling distribution of $\hat\beta_0$, $\hat\beta_1$ and $\hat\sigma^2$, we first need to recall an important result we had seen a long time back (Notes 7, to be precise). We saw that if $Z_1, \ldots, Z_n \overset{iid}{\sim} \text{Normal}(0, 1)$ and $W_1, \ldots, W_n$ relate to $Z_1, \ldots, Z_n$ through the system of equations
\[
\begin{aligned}
W_1 &= a_{11} Z_1 + \cdots + a_{1n} Z_n\\
W_2 &= a_{21} Z_1 + \cdots + a_{2n} Z_n\\
&\;\;\vdots\\
W_n &= a_{n1} Z_1 + \cdots + a_{nn} Z_n
\end{aligned}
\]
then $W_1, \ldots, W_n$ are also independent $\text{Normal}(0, 1)$ variables with $W_1^2 + \cdots + W_n^2 = Z_1^2 + \cdots + Z_n^2$, provided the coefficients on each row form a vector of unit length and these vectors are orthogonal to each other. We had used this result to show that if $X_1, \ldots, X_n \overset{iid}{\sim} \text{Normal}(0, 1)$ then $\bar X \sim \text{Normal}(0, 1/n)$ and $\sum_i (X_i - \bar X)^2 \sim \chi^2(n-1)$, and these two are independent of each other.

We will pursue a similar track here. First we note that when $Y_i = \beta_0 + \beta_1 x_i + \epsilon_i$ with $\epsilon_i \overset{iid}{\sim} \text{Normal}(0, \sigma^2)$,
\[
\bar Y = \frac{1}{n}\sum_{i=1}^n (\beta_0 + \beta_1 x_i + \epsilon_i) = \beta_0 + \beta_1 \bar x + \bar\epsilon
\]
and, with $u_i = (x_i - \bar x)/\sum_j (x_j - \bar x)^2$,
\[
\hat\beta_1 = \frac{\sum_i Y_i (x_i - \bar x)}{\sum_i (x_i - \bar x)^2} = \sum_i u_i Y_i = \sum_i u_i(\beta_0 + \beta_1 x_i + \epsilon_i) = \beta_1 + \sum_i u_i \epsilon_i,
\]
because $\sum_i u_i = 0$ and $\sum_i u_i x_i = 1$. Define $\zeta_1 = \epsilon_1/\sigma, \ldots, \zeta_n = \epsilon_n/\sigma$; then $\zeta_1, \ldots, \zeta_n \overset{iid}{\sim} \text{Normal}(0, 1)$ and, from the calculations above, $\bar Y = \beta_0 + \beta_1 \bar x + \sigma\bar\zeta$ and $\hat\beta_1 = \beta_1 + \sigma\sum_i u_i \zeta_i$. Now get $W_1, \ldots, W_n$ as
\[
\begin{aligned}
W_1 &= \tfrac{1}{\sqrt n}\,\zeta_1 + \cdots + \tfrac{1}{\sqrt n}\,\zeta_n = \sqrt n\,\bar\zeta\\
W_2 &= \tfrac{x_1 - \bar x}{\sqrt{\sum_j (x_j - \bar x)^2}}\,\zeta_1 + \cdots + \tfrac{x_n - \bar x}{\sqrt{\sum_j (x_j - \bar x)^2}}\,\zeta_n = \sqrt{\textstyle\sum_j (x_j - \bar x)^2}\;\sum_i u_i \zeta_i\\
W_3 &= a_{31}\zeta_1 + \cdots + a_{3n}\zeta_n\\
&\;\;\vdots\\
W_n &= a_{n1}\zeta_1 + \cdots + a_{nn}\zeta_n
\end{aligned}
\]
where the rows are of unit length and mutually orthogonal (the first two rows are so by design, and so can be extended to a set of $n$ such rows by what is known as Gram-Schmidt orthogonalization). So we have $W_i \overset{iid}{\sim} \text{Normal}(0, 1)$ and $W_1^2 + \cdots + W_n^2 = \zeta_1^2 + \cdots + \zeta_n^2$.

This leads to a rich collection of results. First we see that $\hat\beta_1 = \beta_1 + \sigma W_2/\sqrt{\sum_i (x_i - \bar x)^2}$ with $W_2 \sim \text{Normal}(0, 1)$. Hence we must have
\[
\hat\beta_1 \sim \text{Normal}\Big(\beta_1,\; \frac{\sigma^2}{\sum_i (x_i - \bar x)^2}\Big).
\]
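The construction above can be checked numerically. The sketch below is a rough illustration, not part of the notes: it builds a matrix whose first row is $(1/\sqrt n, \ldots, 1/\sqrt n)$ and whose second row is proportional to $(x_i - \bar x)$, completes it to an orthonormal set using numpy's QR factorization (standing in for Gram-Schmidt), and verifies that the transformation preserves the sum of squares. The $x$ values are hypothetical.

```python
# Minimal sketch: orthonormal transformation with the two prescribed rows,
# completed via QR, applied to iid Normal(0,1) draws.
import numpy as np

rng = np.random.default_rng(3)
n = 8
x = rng.uniform(0.0, 10.0, size=n)                    # hypothetical design points
xbar = x.mean()

r1 = np.full(n, 1.0 / np.sqrt(n))                     # row 1: 1/sqrt(n) everywhere
r2 = (x - xbar) / np.sqrt(np.sum((x - xbar) ** 2))    # row 2: prop. to x_i - xbar

# Complete {r1, r2} to an orthonormal basis of R^n: QR factorize a matrix whose
# first two columns are r1, r2 and whose remaining columns are random directions.
M = np.column_stack([r1, r2, rng.normal(size=(n, n - 2))])
Q, _ = np.linalg.qr(M)
A = Q.T                                               # rows form an orthonormal set

zeta = rng.normal(size=n)                             # iid Normal(0, 1) draws
W = A @ zeta

print(np.allclose(A @ A.T, np.eye(n)))                # True: rows orthonormal
print(np.allclose(np.sum(W ** 2), np.sum(zeta ** 2))) # True: sum of squares preserved
print(abs(A[0] @ r1), abs(A[1] @ r2))                 # both ~1 (rows match up to sign)
```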

Next,
\[
\hat\beta_0 = \bar Y - \hat\beta_1 \bar x = \beta_0 + \sigma\Big[\frac{W_1}{\sqrt n} - \frac{\bar x\, W_2}{\sqrt{\sum_i (x_i - \bar x)^2}}\Big]
\]
with $W_1, W_2$ independent $\text{Normal}(0, 1)$. Therefore,
\[
\hat\beta_0 \sim \text{Normal}\Big(\beta_0,\; \sigma^2\Big[\frac{1}{n} + \frac{\bar x^2}{\sum_i (x_i - \bar x)^2}\Big]\Big).
\]
Furthermore, for any two numbers $a$ and $b$,
\[
a\hat\beta_0 + b\hat\beta_1 = a\beta_0 + b\beta_1 + \sigma\Big[\frac{a W_1}{\sqrt n} - \frac{(a\bar x - b) W_2}{\sqrt{\sum_i (x_i - \bar x)^2}}\Big]
\sim \text{Normal}\Big(a\beta_0 + b\beta_1,\; \sigma^2\Big\{\frac{a^2}{n} + \frac{(a\bar x - b)^2}{(n-1)s_x^2}\Big\}\Big).
\]
Next we look at $\hat\sigma^2$. First note that
\[
Y_i - \hat\beta_0 - \hat\beta_1 x_i = \epsilon_i - (\hat\beta_0 - \beta_0) - (\hat\beta_1 - \beta_1) x_i
= \sigma\Big[\zeta_i - \frac{W_1}{\sqrt n} - \frac{(x_i - \bar x) W_2}{\sqrt{\sum_j (x_j - \bar x)^2}}\Big],
\]
which leads to the following identity (with a bit of algebra that you can ignore):
\[
\frac{\sum_i (Y_i - \hat\beta_0 - \hat\beta_1 x_i)^2}{\sigma^2} = \sum_i \zeta_i^2 - W_1^2 - W_2^2 = W_3^2 + \cdots + W_n^2,
\]
because $\sum_i \zeta_i^2 = \sum_i W_i^2$. But $W_3^2 + \cdots + W_n^2$ is the sum of squares of $n-2$ independent $\text{Normal}(0, 1)$ variables, so
\[
\frac{(n-2)\hat\sigma^2}{\sigma^2} = \frac{\sum_i (Y_i - \hat\beta_0 - \hat\beta_1 x_i)^2}{\sigma^2} \sim \chi^2(n-2),
\]
and because $W_3, \ldots, W_n$ are independent of $W_1, W_2$, we can conclude that $\hat\sigma^2$ is independent of $\hat\beta_0$ and $\hat\beta_1$.
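As a sanity check on this conclusion, here is a minimal Monte Carlo sketch (hypothetical parameter values and design, chosen only for illustration): it verifies empirically that $\sum_i (Y_i - \hat\beta_0 - \hat\beta_1 x_i)^2/\sigma^2$ has the mean and variance of a $\chi^2(n-2)$ variable and is essentially uncorrelated with $\hat\beta_1$.

```python
# Minimal Monte Carlo sketch (hypothetical values): check that the scaled RSS
# behaves like chi^2(n-2) and is roughly uncorrelated with beta1_hat.
import numpy as np

rng = np.random.default_rng(4)
n, reps = 20, 5000
beta0, beta1, sigma = 1.0, 0.5, 2.0          # hypothetical true parameters
x = rng.uniform(0.0, 10.0, size=n)           # fixed design, as in the notes
xbar = x.mean()
sxx = np.sum((x - xbar) ** 2)

beta1_hats = np.empty(reps)
scaled_rss = np.empty(reps)
for r in range(reps):
    y = beta0 + beta1 * x + rng.normal(0.0, sigma, size=n)
    b1 = np.sum(y * (x - xbar)) / sxx
    b0 = y.mean() - b1 * xbar
    rss = np.sum((y - b0 - b1 * x) ** 2)
    beta1_hats[r] = b1
    scaled_rss[r] = rss / sigma ** 2         # should behave like chi^2(n-2)

print(scaled_rss.mean(), "vs", n - 2)                 # chi^2(n-2) mean is n-2
print(scaled_rss.var(), "vs", 2 * (n - 2))            # chi^2(n-2) variance is 2(n-2)
print(np.corrcoef(beta1_hats, scaled_rss)[0, 1])      # approximately 0
print(beta1_hats.var(), "vs", sigma ** 2 / sxx)       # matches the Normal variance
```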