MA 575 Linear Models: Cedric E. Ginestet, Boston University Midterm Review Week 7


1 Random Vectors

Let $a_0$ and $y$ be $n \times 1$ vectors, and let $A$ be an $n \times n$ matrix. Here, $a_0$ and $A$ are non-random, whereas $y$ is a random vector. Then,
\[ E[a_0 + Ay] = a_0 + A\,E[y]. \]
The variance/covariance matrix of the random vector $y$ is given by an outer product,
\[ \operatorname{Var}[y] := E\big[(y - E[y])(y - E[y])^T\big]. \]
In contrast to the expectation, which is a linear operator, the variance of the transformed version of $y$ is not linear in $A$, since we have
\[ \operatorname{Var}[a_0 + Ay] = A \operatorname{Var}[y] A^T. \]

2 Simple Linear Regression

2.1 Distribution of the Estimators

Recall that $\hat\beta_0 = \bar{Y} - \hat\beta_1 \bar{x}$ and $\hat\beta_1 = SXY/SXX$. We have shown that these estimators are unbiased, in the sense that $E[\hat\beta_0 \mid X] = \beta_0$ and $E[\hat\beta_1 \mid X] = \beta_1$. Moreover, we have also computed the variance of the slope estimator,
\[ \operatorname{Var}[\hat\beta_1 \mid X] = \frac{\sigma^2}{SXX}. \]
If, in addition, we assume that the errors are iid draws from a normal distribution, $N(0, \sigma^2)$, we obtain the following distribution for this estimator,
\[ \hat\beta_1 \mid X \sim N\!\left(\beta_1, \frac{\sigma^2}{SXX}\right). \tag{1} \]
Finally, we have also considered the following sample estimator of the error variance for simple regression,
\[ \hat\sigma^2 := \frac{1}{n-2} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2, \tag{2} \]
where $\hat{y}_i := \hat\beta_0 + \hat\beta_1 x_i$ are the fitted values.
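As an illustration of these formulas (a minimal sketch, not part of the original notes, using simulated data and arbitrary variable names such as `x`, `y`, `beta1_hat`), the following Python snippet computes $\hat\beta_0$, $\hat\beta_1$, $\hat\sigma^2$, and the estimated standard error of the slope:

```python
import numpy as np

# Minimal sketch: simple linear regression estimators on simulated data.
rng = np.random.default_rng(0)
n = 50
x = rng.uniform(0, 10, size=n)
y = 1.0 + 2.0 * x + rng.normal(0, 1.5, size=n)   # true beta0 = 1, beta1 = 2

SXX = np.sum((x - x.mean()) ** 2)
SXY = np.sum((x - x.mean()) * (y - y.mean()))

beta1_hat = SXY / SXX                          # slope estimator
beta0_hat = y.mean() - beta1_hat * x.mean()    # intercept estimator

y_fit = beta0_hat + beta1_hat * x              # fitted values
rss = np.sum((y - y_fit) ** 2)
sigma2_hat = rss / (n - 2)                     # estimator of sigma^2, as in eq. (2)
se_beta1 = np.sqrt(sigma2_hat / SXX)           # estimated standard error of the slope

print(beta0_hat, beta1_hat, sigma2_hat, se_beta1)
```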

2.2 t-tests for Regression Coefficients

Using these relationships, we can construct a t-test for the following null hypothesis,
\[ H_0: \beta_1 = 0, \qquad H_1: \beta_1 \neq 0. \]
Here, we wish to test whether this particular regression coefficient is equal to a given value (here, zero). Statistical inference can then be conducted by observing that, under our distributional assumption on the error terms, we have
\[ t_1 := \frac{\hat\beta_1}{\operatorname{se}(\hat\beta_1)} \sim t(n-2). \]

2.3 The F-test

We consider the difference
\[ SSreg := RSS_1(\hat\beta_0) - RSS_2(\hat\beta_0, \hat\beta_1). \]
The F-test for regression is defined using the following formula for comparing model $M_1$ with model $M_2$,
\[ F := \frac{(RSS_1 - RSS_2)/(p_2 - p_1)}{RSS_2/(n - p_2)}, \]
where $p_1$ and $p_2$ denote the numbers of parameters in $M_1$ and $M_2$, respectively. For simple regression, this gives
\[ F := \frac{(RSS_1 - RSS_2)/1}{RSS_2/(n-2)}. \]
This formula can be re-written in this manner,
\[ F := \frac{(SYY - RSS_2)/1}{\hat\sigma^2} = \frac{SSreg}{\hat\sigma^2}. \tag{3} \]
That is, the F-statistic is simply defined as a re-scaled version of the difference $SSreg := SYY - RSS$. Therefore, we are here interested in conducting the following hypothesis test,
\[ H_0: E[Y \mid X=x] = \beta_0, \qquad H_1: E[Y \mid X=x] = \beta_0 + \beta_1 x. \]
That is, we wish to test whether $E[Y \mid X=x]$ is constant as $x$ varies. If the error terms are additionally assumed to be iid realizations from a normal distribution, then it can be shown that the F-statistic in equation (3) follows an F-distribution, denoted
\[ F \sim F(p_2 - p_1,\, n - p_2), \]
which follows from the fact that we are here considering a ratio of two independent random variables that each have a $\chi^2$-distribution, with respective degrees of freedom $p_2 - p_1$ and $n - p_2$.
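As a hedged illustration (not from the notes; simulated data, with scipy.stats assumed available for the reference distributions), the sketch below computes the t-statistic for the slope and the F-statistic of equation (3):

```python
import numpy as np
from scipy import stats

# Illustrative sketch: t-test for the slope and the F-test of eq. (3).
rng = np.random.default_rng(1)
n = 50
x = rng.uniform(0, 10, size=n)
y = 1.0 + 2.0 * x + rng.normal(0, 1.5, size=n)

SXX = np.sum((x - x.mean()) ** 2)
beta1_hat = np.sum((x - x.mean()) * (y - y.mean())) / SXX
beta0_hat = y.mean() - beta1_hat * x.mean()

rss2 = np.sum((y - beta0_hat - beta1_hat * x) ** 2)   # RSS under H1 (intercept + slope)
rss1 = np.sum((y - y.mean()) ** 2)                    # RSS under H0 (intercept only), i.e. SYY
sigma2_hat = rss2 / (n - 2)

t1 = beta1_hat / np.sqrt(sigma2_hat / SXX)
p_t = 2 * stats.t.sf(abs(t1), df=n - 2)               # two-sided p-value from t(n-2)

F = ((rss1 - rss2) / 1) / (rss2 / (n - 2))            # equals SSreg / sigma2_hat
p_F = stats.f.sf(F, dfn=1, dfd=n - 2)                 # p-value from F(1, n-2)

print(t1, p_t, F, p_F)   # in simple regression, F equals t1**2
```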

2.4 Coefficient of Determination ($R^2$)

The coefficient of determination measures the proportion of variance explained. From the definition of $SSreg$, we have
\[ SSreg = RSS(\hat\beta_0) - RSS(\hat\beta_0, \hat\beta_1). \]
Dividing both sides by $SYY$, we obtain
\[ \frac{SSreg}{SYY} = \frac{RSS(\hat\beta_0)}{SYY} - \frac{RSS(\hat\beta_0, \hat\beta_1)}{SYY}, \]
where $SYY := RSS(\hat\beta_0)$. This simplifies to give the coefficient of determination, or $R^2$,
\[ R^2 := \frac{SSreg}{SYY} = 1 - \frac{RSS(\hat\beta_0, \hat\beta_1)}{SYY}. \]
The F-statistic and $R^2$ have identical numerators, but different denominators,
\[ F = \frac{SSreg}{RSS(\hat\beta_0, \hat\beta_1)/(n-2)}, \qquad R^2 = \frac{SSreg}{RSS(\hat\beta_0)}. \]
Since we have more parameters in $RSS_2$ than in $RSS_1$, it follows that we necessarily have $RSS_1 \geq RSS_2$; and therefore $R^2$ lies between 0 and 1.

2.5 MSE Decomposition

The MSE combines the previous two criteria, the unbiasedness and the variance of $\hat\beta$, through the following decomposition:
\[
E[(\hat\beta - \beta)^2 \mid X]
= E\big[(\hat\beta - E[\hat\beta \mid X] + E[\hat\beta \mid X] - \beta)^2 \mid X\big]
= E\big[(\hat\beta - E[\hat\beta \mid X])^2 \mid X\big]
+ 2\,E\big[(\hat\beta - E[\hat\beta \mid X])(E[\hat\beta \mid X] - \beta) \mid X\big]
+ E\big[(E[\hat\beta \mid X] - \beta)^2 \mid X\big].
\]
Here, the cross-product can be seen to cancel out: since the second factor in this cross-product does not depend on $Y$, it follows that
\[
E\big[(\hat\beta - E[\hat\beta \mid X])(E[\hat\beta \mid X] - \beta) \mid X\big]
= (E[\hat\beta \mid X] - \beta)\, E\big[\hat\beta - E[\hat\beta \mid X] \mid X\big]
= (E[\hat\beta \mid X] - \beta)\,\big(E[\hat\beta \mid X] - E[\hat\beta \mid X]\big) = 0.
\]
Thus, the MSE admits the following decomposition into a variance and a squared bias term:
\[ \operatorname{MSE}(\hat\beta, \beta) = \operatorname{Var}[\hat\beta \mid X] + b^2(\hat\beta), \]
where the squared bias of $\hat\beta$ is defined as follows,
\[ b^2(\hat\beta) := \big(E[\hat\beta \mid X] - \beta\big)^2. \]
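A small Monte Carlo sketch (my addition, under assumed simulation settings) can be used to check this decomposition numerically for the slope estimator $\hat\beta_1$, whose bias is zero given $X$, so that its MSE should match its variance:

```python
import numpy as np

# Monte Carlo check of MSE = Var + bias^2 for the OLS slope (fixed design).
rng = np.random.default_rng(2)
n, reps = 30, 20000
x = np.linspace(0, 10, n)            # fixed design, i.e. conditioning on X
beta0, beta1, sigma = 1.0, 2.0, 1.5
SXX = np.sum((x - x.mean()) ** 2)

estimates = np.empty(reps)
for r in range(reps):
    y = beta0 + beta1 * x + rng.normal(0, sigma, size=n)
    estimates[r] = np.sum((x - x.mean()) * (y - y.mean())) / SXX

mse = np.mean((estimates - beta1) ** 2)
var = np.var(estimates)
bias2 = (np.mean(estimates) - beta1) ** 2

print(mse, var + bias2)              # the two agree up to floating-point error
print(var, sigma**2 / SXX)           # variance is close to sigma^2 / SXX
```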

3 Multiple Regression

3.1 The Model

Multiple linear regression (MLR) is defined in the following manner,
\[ y_i = \sum_{j=0}^{p} x_{ij}\,\beta_j + e_i, \qquad i = 1, \ldots, n, \]
which may then be reformulated, using linear algebra and letting $p' := p + 1$,
\[ y = X\beta + e, \]
where $y$ and $e$ are $(n \times 1)$ vectors, $X$ is an $(n \times p')$ matrix, and $\beta$ is a $(p' \times 1)$ vector. In addition to the standard OLS assumptions for simple linear regression, we will also assume that $X$ has full rank, $\operatorname{rank}(X) = p'$. The OLS estimator can be defined as the vector of $\beta_j$'s that minimizes the RSS,
\[ \hat\beta := \operatorname*{argmin}_{\beta \in \mathbb{R}^{p'}} RSS(\beta), \]
which takes the form
\[ \hat\beta = (X^T X)^{-1} X^T y. \]

3.2 Hat Matrix

The predicted values $\hat{y}$ can then be written as
\[ \hat{y} = X\hat\beta = X(X^T X)^{-1} X^T y =: Hy. \]
Similarly, the residuals can also be expressed as a function of $H$,
\[ \hat{e} := y - \hat{y} = y - Hy = (I - H)y, \]
with $I$ denoting the $n \times n$ identity matrix, and where again the residuals can be seen to be a linear function of the observed values, $y$. In summary, we therefore have $\hat{y} = Hy$ and $\hat{e} = (I - H)y$. Recall that $H$ is idempotent and symmetric, such that $HH = H$ and $H = H^T$, respectively.

3.3 ANOVA Table

For a model including an intercept, the total sum of squares (TSS), or $SYY$, can be expanded in the following manner, using a given vector of predicted values $\hat{y}$ for some target model:
\[ SYY := \sum_{i=1}^{n} (y_i - \bar{y})^2 = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 + 2\sum_{i=1}^{n} (y_i - \hat{y}_i)(\hat{y}_i - \bar{y}) + \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2, \]
where the cross-term can be re-written in matrix notation, such that
\[ (\hat{y} - \mathbf{1}\bar{y})^T (y - \hat{y}) = \hat{y}^T(y - \hat{y}) - (\mathbf{1}\bar{y})^T(y - \hat{y}) = \hat{y}^T \hat{e} - (\mathbf{1}\bar{y})^T \hat{e} = 0, \]
where we have used the fact that $\sum_{i=1}^{n} \hat{e}_i = 0$, which can be verified as an exercise. Therefore, we obtain the classical variance partitioning for multiple regression:
\[
\begin{aligned}
\sum_{i=1}^{n} (y_i - \bar{y})^2 &= \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2 + \sum_{i=1}^{n} (y_i - \hat{y}_i)^2, \\
SYY &= SSreg + RSS, \\
(n-1) &= (p'-1) + (n-p').
\end{aligned}
\]
This provides a particularly transparent way of allocating the different degrees of freedom to each variance component.
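To make these matrix formulas concrete, here is an illustrative sketch (my own, on made-up data) that computes $\hat\beta$ via the normal equations, forms the hat matrix, and verifies its idempotency together with the partition $SYY = SSreg + RSS$:

```python
import numpy as np

# Sketch: OLS via the normal equations, hat matrix, and variance partitioning.
rng = np.random.default_rng(3)
n, p = 60, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])   # n x p' design, p' = p + 1
beta_true = np.array([1.0, 2.0, -1.0, 0.5])
y = X @ beta_true + rng.normal(0, 1.0, size=n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)   # (X^T X)^{-1} X^T y, in a stable form
H = X @ np.linalg.inv(X.T @ X) @ X.T           # hat matrix
y_hat = H @ y                                  # fitted values
e_hat = (np.eye(n) - H) @ y                    # residuals

print(np.allclose(H @ H, H), np.allclose(H, H.T))   # idempotent and symmetric
print(np.isclose(e_hat.sum(), 0.0))                 # residuals sum to zero (intercept included)

SYY = np.sum((y - y.mean()) ** 2)
SSreg = np.sum((y_hat - y.mean()) ** 2)
RSS = np.sum(e_hat ** 2)
print(np.isclose(SYY, SSreg + RSS))                 # SYY = SSreg + RSS
```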

We can then construct an analysis of variance table for this model, as described in Table 1.

Table 1. Analysis of Variance Table.

Source        df        SS       MS (a)             F (b)                    p-value
Regression    $p'-1$    SSreg    SSreg$/(p'-1)$     MSreg$/\hat\sigma^2$     $P(F \geq \mathrm{MSreg}/\hat\sigma^2)$
Residual      $n-p'$    RSS      RSS$/(n-p')$
Total         $n-1$     SYY      SYY$/(n-1)$

(a) Here, let MSreg := SSreg$/(p'-1)$ and $\hat\sigma^2 := \mathrm{RSS}/(n-p')$, as previously.
(b) The F-statistic satisfies $F \sim F(p'-1,\, n-p')$ if, in addition, $e_i \overset{iid}{\sim} N(0, \sigma^2)$.

The F-statistic described in Table 1 can then be used to test the following null hypothesis,
\[ H_0: E[Y \mid X=x] = \beta_0, \qquad H_1: E[Y \mid X=x] = x^T\beta. \]
The fact that we obtain an F-distribution depends on (i) the normality of the error terms, and (ii) the linearity of the modeling assumptions under both $H_0$ and $H_1$. Indeed, linearity is here required in order to derive a ratio of two $\chi^2$-distributed random variables.

4 Maximum Likelihood

4.1 Probabilistic Model

For some set of independent observations $(y_i, x_i)$, with $i = 1, \ldots, n$, we assume the following probabilistic model,
\[ y_i \overset{ind}{\sim} N(x_i^T\beta, \sigma^2), \qquad i = 1, \ldots, n. \]
The likelihood function for this data set, parametrized by $(\beta, \sigma^2)$, is then defined as a product of densities,
\[ L(\beta, \sigma^2; y, X) := \prod_{i=1}^{n} p(y_i \mid x_i, \beta, \sigma^2). \]
In the case of multiple regression, the definition of the normal density gives the following product,
\[ L(\beta, \sigma^2; y, X) = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left\{ -\frac{1}{2\sigma^2}(y_i - x_i^T\beta)^2 \right\}. \]
Intuitively, the maximum likelihood estimator (MLE) is defined as the parameter value under which the observed data sample is most likely. For a linear model such as multiple regression, the parameters to be optimized are the vector of coefficients $\beta$ and the variance $\sigma^2$, so that the MLE is a vector of the form
\[ \hat\theta_{MLE} := (\hat\beta_0, \ldots, \hat\beta_p, \hat\sigma^2). \]
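For concreteness, the following sketch (my illustration, not from the notes; simulated data) evaluates this Gaussian log-likelihood and checks numerically that the OLS coefficients attain a higher value than a perturbed coefficient vector:

```python
import numpy as np

# Sketch: Gaussian log-likelihood of the linear model at candidate parameter values.
def log_likelihood(beta, sigma2, y, X):
    resid = y - X @ beta
    n = y.shape[0]
    return -0.5 * n * np.log(2 * np.pi * sigma2) - 0.5 * np.sum(resid ** 2) / sigma2

rng = np.random.default_rng(4)
n = 80
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
beta_true = np.array([1.0, -0.5, 2.0])
y = X @ beta_true + rng.normal(0, 1.0, size=n)

beta_ols = np.linalg.solve(X.T @ X, X.T @ y)
sigma2_mle = np.sum((y - X @ beta_ols) ** 2) / n     # MLE of the variance (see Section 4.2)

print(log_likelihood(beta_ols, sigma2_mle, y, X))         # maximized log-likelihood
print(log_likelihood(beta_ols + 0.3, sigma2_mle, y, X))   # strictly lower, for a perturbed beta
```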

4.2 Estimator of Variance

We have seen that we can exploit the orthogonality of $\beta$ and $\sigma^2$ in a normal model, in order to maximize the likelihood by selecting these two sets of parameters independently of each other. Thus, once we have chosen $\hat\beta_{MLE}$, it suffices to select
\[ \hat\sigma^2_{MLE} := \operatorname*{argmax}_{\sigma^2 \in \mathbb{R}_+} \log L(\hat\beta_{MLE}, \sigma^2; y, X), \]
which gives the first-order condition, obtained by differentiating the negative log-likelihood,
\[ \frac{\partial}{\partial \sigma^2} \left( \frac{n}{2}\log(2\pi) + \frac{n}{2}\log(\sigma^2) + \frac{1}{2\sigma^2}\sum_{i=1}^{n}\big(y_i - x_i^T\hat\beta_{MLE}\big)^2 \right) = 0. \]
This can be readily solved in order to obtain
\[ \hat\sigma^2_{MLE} = \frac{1}{n} RSS(\hat\beta_{MLE}), \]
which is a biased estimate of the true variance, $\sigma^2$. By contrast, the OLS estimator for this parameter is
\[ \hat\sigma^2_{OLS} := \frac{1}{n - p'} RSS(\hat\beta_{OLS}) = \frac{n}{n - p'}\,\hat\sigma^2_{MLE}, \]
which can be shown to be unbiased. In practice, we tend to favor the OLS estimator, as the MLE for $\sigma^2$ under-estimates the variance of the residuals, which can lead to spurious statistical inference on the $\beta_j$'s.
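As a final hedged illustration (simulation settings are my own), the sketch below compares $\hat\sigma^2_{MLE} = RSS/n$ with $\hat\sigma^2_{OLS} = RSS/(n-p')$ over repeated samples, showing the downward bias of the MLE:

```python
import numpy as np

# Sketch: bias of the MLE of sigma^2 versus the unbiased estimator RSS / (n - p').
rng = np.random.default_rng(5)
n, p_prime, reps = 20, 4, 20000
X = np.column_stack([np.ones(n), rng.normal(size=(n, p_prime - 1))])   # fixed design
beta = np.array([1.0, 0.5, -1.0, 2.0])
sigma2_true = 4.0

mle_vals, ols_vals = [], []
for _ in range(reps):
    y = X @ beta + rng.normal(0, np.sqrt(sigma2_true), size=n)
    beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
    rss = np.sum((y - X @ beta_hat) ** 2)
    mle_vals.append(rss / n)               # sigma^2_MLE, biased downwards
    ols_vals.append(rss / (n - p_prime))   # sigma^2_OLS, unbiased

print(np.mean(mle_vals))   # approx. sigma2_true * (n - p') / n = 3.2
print(np.mean(ols_vals))   # approx. sigma2_true = 4.0
```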