Regression #4: Properties of OLS Estimator (Part 2)


Econ 671, Purdue University. Justin L. Tobias.

Introduction In this lecture, we continue investigating properties associated with the OLS estimator. Our focus now turns to a derivation of the asymptotic normality of the estimator, as well as a proof of a well-known efficiency property: the Gauss-Markov Theorem.

Asymptotic Normality To begin, let us consider the regression model when the error terms are normally distributed: $y_i = x_i'\beta + \epsilon_i$, with $\epsilon \mid X \sim N(0, \sigma^2 I_n)$. In this case, the sampling distribution of $\hat{\beta}$ (given $X$) is immediate: $\hat{\beta} \mid X \sim N\left(\beta, \sigma^2 (X'X)^{-1}\right)$.

Asymptotic Normality Since $\hat{\beta} = \beta + (X'X)^{-1}X'\epsilon$ is linear in $\epsilon$, and $\epsilon \mid X \sim N(0, \sigma^2 I_n)$, it follows that $\hat{\beta} \mid X \sim N\left(\beta, \sigma^2 (X'X)^{-1}\right)$. Thus, the sampling distribution follows a normal distribution, with the mean and covariance matrix derived in the previous lecture.

Asymptotic Normality In many cases, however, we do not want to assume that the errors are normally distributed. If we replace the Gaussian assumption with something different, it can prove quite difficult to determine the exact (finite-sample) sampling distribution of the OLS estimator. Instead, we can look for a large-sample approximation that works for a variety of different cases. The approximation will be exact as $n \to \infty$, and we will take it as a reasonable approximation in data sets of moderate or even small size.
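To preview this idea, here is a minimal Monte Carlo sketch in Python (the design, seed, and parameter values are illustrative choices, not from the lecture): even with skewed, non-normal errors, the sampling distribution of the OLS slope is approximately normal at moderate $n$.

```python
# Illustrative sketch: OLS slope is approximately normal even though the
# errors are drawn from a centered (mean-zero) exponential distribution.
import numpy as np

rng = np.random.default_rng(0)
n, reps = 200, 5000
beta = np.array([1.0, 2.0])          # true intercept and slope (assumed values)

slopes = np.empty(reps)
for r in range(reps):
    x = rng.normal(size=n)
    X = np.column_stack([np.ones(n), x])
    eps = rng.exponential(scale=1.0, size=n) - 1.0   # mean-zero, skewed errors
    y = X @ beta + eps
    bhat = np.linalg.solve(X.T @ X, X.T @ y)         # OLS estimate
    slopes[r] = bhat[1]

print("mean of slope estimates:", slopes.mean())     # close to 2.0
print("skewness of slope estimates:",                # close to 0 despite skewed errors
      ((slopes - slopes.mean())**3).mean() / slopes.std()**3)
```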

Asymptotic Normality With a minor abuse of the theorem itself, we first introduce the Lindeberg-Levy CLT: Theorem (Lindeberg-Levy). Let $z_1, z_2, \ldots, z_n$ be iid $k \times 1$ random vectors with $E(z_i) = \mu$ and finite covariance matrix $\text{Var}(z_i) = \Sigma$. Then $\sqrt{n}\left(\bar{z}_n - \mu\right) \stackrel{d}{\to} N(0, \Sigma)$, where $\bar{z}_n = n^{-1}\sum_{i=1}^{n} z_i$.

Asymptotic Normality It remains for us to figure out how to apply this result to (approximately) characterize the sampling distribution of the OLS estimator. To this end, let us write: $\hat{\beta} = (X'X)^{-1}X'y = \beta + (X'X)^{-1}X'\epsilon$. Rearranging terms and multiplying both sides by $\sqrt{n}$, we can write $\sqrt{n}(\hat{\beta} - \beta) = \left(\frac{X'X}{n}\right)^{-1}\left(\frac{X'\epsilon}{\sqrt{n}}\right)$, and we note from our very first lecture that $X'X = \sum_{i=1}^{n} x_i x_i'$ and $X'\epsilon = \sum_{i=1}^{n} x_i \epsilon_i$.

Asymptotic Normality Thus, the term $n^{-1/2}X'\epsilon$ can be written as: $\frac{1}{\sqrt{n}}X'\epsilon = \sqrt{n}\left(\frac{1}{n}\sum_{i=1}^{n} x_i \epsilon_i\right)$. In this last form, we can see that this term is simply a sample average of the $k \times 1$ vectors $x_i \epsilon_i$, scaled by $\sqrt{n}$. Such quantities fall under the jurisdiction of the Lindeberg-Levy CLT.

Asymptotic Normality Specifically, we can apply this CLT once we characterize the mean and covariance matrix of the terms appearing within the summation. To this end, note: (1) $E(x_i \epsilon_i) = E_x\left[x_i E(\epsilon_i \mid x_i)\right] = 0$, and (2) $\text{Var}(x_i \epsilon_i) = E(\epsilon_i^2 x_i x_i') = \sigma^2 E_x(x_i x_i')$. Hence, we can apply the Lindeberg-Levy CLT to give: $\frac{1}{\sqrt{n}}X'\epsilon \stackrel{d}{\to} N\left(0, \sigma^2 E_x(x_i x_i')\right)$.
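These two moment calculations are easy to verify by simulation. A small sketch (the design and $\sigma^2$ value are illustrative assumptions):

```python
# Check that the summands x_i * eps_i have mean ~ 0 and covariance
# ~ sigma^2 * E(x_i x_i') when E(eps|x) = 0 and Var(eps|x) = sigma^2.
import numpy as np

rng = np.random.default_rng(1)
n, sigma2 = 200_000, 0.5
X = np.column_stack([np.ones(n), rng.normal(size=n)])   # k = 2 regressors
eps = rng.normal(scale=np.sqrt(sigma2), size=n)
Z = X * eps[:, None]                                    # rows are x_i * eps_i

print("sample mean of x_i*eps_i:", Z.mean(axis=0))      # ~ 0
print("sample covariance:\n", (Z.T @ Z) / n)            # ~ sigma^2 * E(x_i x_i')
print("sigma^2 * (X'X / n):\n", sigma2 * (X.T @ X) / n) # plug-in comparison
```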

Asymptotic Normality As for the other key term appearing in our expression for $\sqrt{n}(\hat{\beta} - \beta)$, we note: $\frac{X'X}{n} = \frac{1}{n}\sum_{i=1}^{n} x_i x_i' \stackrel{p}{\to} E_x(x_i x_i')$, so that $\left(\frac{X'X}{n}\right)^{-1} \stackrel{p}{\to} \left[E_x(x_i x_i')\right]^{-1}$. OK, so let's review: $\sqrt{n}(\hat{\beta} - \beta) = \left(\frac{X'X}{n}\right)^{-1}\left(\frac{1}{\sqrt{n}}X'\epsilon\right)$, where the first factor converges in probability and the second converges in distribution.

Asymptotic Normality Based on earlier derivations, the right-hand side (by Slutsky's theorem) must converge in distribution to: $\left[E_x(x_i x_i')\right]^{-1} N\left(0, \sigma^2 E_x(x_i x_i')\right)$, or $N\left(0, \sigma^2 \left[E_x(x_i x_i')\right]^{-1}\right)$. We can then write: $\sqrt{n}(\hat{\beta} - \beta) \stackrel{d}{\to} N\left(0, \sigma^2 \left[E_x(x_i x_i')\right]^{-1}\right)$.

Asymptotic Normality In practice, we replace the unknown population quantity $\left[E_x(x_i x_i')\right]^{-1}$ with a consistent estimate: $\left(\frac{1}{n}\sum_i x_i x_i'\right)^{-1} = \left(\frac{1}{n}X'X\right)^{-1}$. Thus, $\left(\frac{1}{n}X'X\right)^{-1} \stackrel{p}{\to} \left[E_x(x_i x_i')\right]^{-1}$. We can also get an asymptotic result for the quadratic form: $(\hat{\beta} - \beta)'\left(\frac{X'X}{\sigma^2}\right)(\hat{\beta} - \beta) \stackrel{d}{\to} \chi^2(k)$.
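In code, this plug-in step amounts to replacing $\sigma^2$ and $E_x(x_i x_i')$ with $s^2$ and $X'X/n$, which yields the familiar estimate $s^2 (X'X)^{-1}$ for $\text{Var}(\hat{\beta} \mid X)$. A minimal sketch on simulated data (all values illustrative):

```python
# Feasible estimate of the (approximate) sampling variance of the OLS
# estimator: s^2 * (X'X)^{-1}, obtained from the plug-in asymptotic formula.
import numpy as np

rng = np.random.default_rng(2)
n = 500
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta = np.array([1.0, 2.0])                   # assumed true values
y = X @ beta + rng.normal(size=n)

bhat = np.linalg.solve(X.T @ X, X.T @ y)
resid = y - X @ bhat
s2 = resid @ resid / (n - X.shape[1])         # unbiased estimate of sigma^2

avar_hat = s2 * np.linalg.inv(X.T @ X / n)    # estimates sigma^2 [E(x x')]^{-1}
print("estimated Var(bhat | X):\n", avar_hat / n)   # equals s2 * (X'X)^{-1}
```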

Asymptotic Normality Note that we did not assume normality to get this result; provided the assumptions of the regression model are satisfied, the sampling distribution of $\hat{\beta}$ will be approximately normal in large samples. This result will form the basis for testing hypotheses regarding $\beta$, as we will discuss in the following lectures.

Gauss-Markov Theorem We now move on to discuss an important result, related to the efficiency of the OLS estimator, known as the Gauss-Markov Theorem. This theorem states: under the assumptions of the classical linear regression model, the OLS estimator $\hat{\beta}$ is the best linear unbiased estimator (BLUE) of $\beta$; that is, among all estimators that are linear in $y$ and unbiased, $\hat{\beta}$ has the smallest variance.

Gauss-Markov Theorem We will first prove this result for any linear combination of the elements of $\beta$. That is, suppose we seek to estimate $\mu \equiv c'\beta$, where $c$ is an arbitrary $k \times 1$ selector vector. For example, $c = [1 \; 0 \; \cdots \; 0]'$ would select the intercept parameter. We seek to show that the OLS estimator of $\mu$, $\hat{\mu} = c'\hat{\beta}$, has a variance at least as small as that of any other linear, unbiased estimator of $\mu$.

Gauss-Markov Theorem To establish this result, let us first consider any other linear, unbiased estimator of $\mu$. Call this estimator $h$. Linearity implies that $h$ can be written in the form: $h = a'y$ for some $n \times 1$ vector $a$. We note that $E(h \mid X) = E(a'y \mid X) = a'X\beta$.

Gauss-Markov Theorem For unbiasedness to hold, it must be the case that $a'X\beta = c'\beta$ for all $\beta$, or (since this must apply for any $\beta$ and $c$): $X'a = c$. Now, the variance of the OLS estimator of $\mu$ is $\text{Var}(\hat{\mu} \mid X) = \text{Var}(c'\hat{\beta} \mid X) = \sigma^2 c'(X'X)^{-1}c$.

Gauss-Markov Theorem The variance of our candidate estimator is: $\text{Var}(h \mid X) = \text{Var}(a'y \mid X) = \sigma^2 a'a$. Comparing these (and substituting $c = X'a$), we obtain: $\text{Var}(h \mid X) - \text{Var}(\hat{\mu} \mid X) = \sigma^2\left[a'a - a'X(X'X)^{-1}X'a\right]$. Clearly, this is greater than or equal to zero, right?

Gauss-Markov Theorem Actually, it is. To see this, note: $a'a - a'X(X'X)^{-1}X'a = a'\left[I_n - X(X'X)^{-1}X'\right]a = a'Ma = (Ma)'(Ma) \geq 0$, where $M \equiv I_n - X(X'X)^{-1}X'$ is symmetric and idempotent. The last line follows as the product represents a sum of squares.
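This sum-of-squares argument is easy to verify numerically. A quick sketch with an arbitrary design matrix and vector $a$ (both illustrative choices):

```python
# With M = I - X(X'X)^{-1}X', the gap a'a - a'X(X'X)^{-1}X'a equals
# (Ma)'(Ma), a sum of squares, and hence is nonnegative for any a.
import numpy as np

rng = np.random.default_rng(3)
n, k = 50, 3
X = rng.normal(size=(n, k))
a = rng.normal(size=n)

P = X @ np.linalg.inv(X.T @ X) @ X.T    # projection onto the column space of X
M = np.eye(n) - P                       # symmetric, idempotent residual maker
print("a'a - a'Pa:", a @ a - a @ P @ a) # nonnegative
print("(Ma)'(Ma): ", (M @ a) @ (M @ a)) # same number
```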

Gauss-Markov Theorem Does this result hold unconditionally? Yes. Since both estimators are unbiased given any $X$, the law of total variance gives $\text{Var}(h) = E_X\left[\text{Var}(h \mid X)\right] \geq E_X\left[\text{Var}(\hat{\mu} \mid X)\right] = \text{Var}(\hat{\mu})$.

Gauss-Markov Theorem We will now prove this in a more general way, by directly comparing the covariance matrices of the two estimators. Let $\hat{\theta}_1$ and $\hat{\theta}_2$ be two unbiased estimators of $\theta$. We say that $\hat{\theta}_1$ is more efficient than $\hat{\theta}_2$ if the difference between the covariance matrices is negative semidefinite; that is, for any $k \times 1$ vector $x \neq 0$, $x'\left[\text{Var}(\hat{\theta}_1) - \text{Var}(\hat{\theta}_2)\right]x \leq 0$. This implies, element-by-element (and in terms of linear combinations), that $\hat{\theta}_1$ is preferable to $\hat{\theta}_2$.

Gauss-Markov Theorem Consider any other linear estimator of $\beta$, say $\tilde{\beta} = Ay$, where $A$ is $k \times n$ and nonstochastic, given $X$. In terms of unbiasedness, $E(\tilde{\beta} \mid X) = AX\beta$, so that unbiasedness (for all $\beta$) implies $AX = I_k$. Write $A = (X'X)^{-1}X' + D$, where $D$ is arbitrary. We then note: $AX = \left[(X'X)^{-1}X' + D\right]X = I_k + DX$.

Gauss-Markov Theorem Similarly, $AA' = \left[(X'X)^{-1}X' + D\right]\left[X(X'X)^{-1} + D'\right] = (X'X)^{-1} + (X'X)^{-1}X'D' + DX(X'X)^{-1} + DD'$. The condition that $AX = I_k$ must mean that $DX = 0$. This makes all the cross terms in the above vanish since, for example, $DX(X'X)^{-1} = 0$. Therefore, $AA' = (X'X)^{-1} + DD'$.

Gauss-Markov Theorem Let us now consider the variance of our candidate estimator: $\text{Var}(\tilde{\beta} \mid X) = \sigma^2 AA' = \sigma^2\left[(X'X)^{-1} + DD'\right]$. Taking this further, $\text{Var}(\tilde{\beta} \mid X) - \text{Var}(\hat{\beta} \mid X) = \sigma^2 DD'$. The matrix $DD'$ is positive semidefinite, since $x'DD'x = (D'x)'(D'x)$ is again a sum of squares. This difference is strictly positive unless $D = 0$, in which case $\tilde{\beta} = \hat{\beta}$. We conclude that $\hat{\beta}$ is more efficient than $\tilde{\beta}$.
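The matrix version of the argument can likewise be checked numerically: build an alternative linear unbiased estimator with $DX = 0$ and confirm that the covariance difference $\sigma^2 DD'$ is positive semidefinite. A sketch on simulated data (the particular $D$ below is an arbitrary illustrative choice):

```python
# Construct A = (X'X)^{-1}X' + D with DX = 0, so A y is linear and unbiased,
# and verify Var(Ay | X) - Var(bhat | X) = sigma^2 DD' is positive semidefinite.
import numpy as np

rng = np.random.default_rng(4)
n, k, sigma2 = 40, 2, 1.0
X = np.column_stack([np.ones(n), rng.normal(size=n)])

# Project an arbitrary k x n matrix onto the space satisfying DX = 0.
M = np.eye(n) - X @ np.linalg.inv(X.T @ X) @ X.T
D = rng.normal(size=(k, n)) @ M                  # DX = 0 (up to rounding)
A = np.linalg.inv(X.T @ X) @ X.T + D             # alternative linear estimator

var_ols = sigma2 * np.linalg.inv(X.T @ X)
var_alt = sigma2 * (A @ A.T)
print("DX close to 0:", np.allclose(D @ X, 0.0))
print("eigenvalues of variance difference:",     # all >= 0 (up to rounding)
      np.linalg.eigvalsh(var_alt - var_ols))
```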