CLRM estimation Pietro Coretto Econometrics

Slide Set 4
CLRM estimation
Pietro Coretto
pcoretto@unisa.it
Econometrics
Master in Economics and Finance (MEF)
Università degli Studi di Napoli Federico II
Version: Thursday 24th January, 2019 (h08:41)

Least Squares Method (LS)

Given an additive regression model:

    y = f(X; β) + ε

Note that ε is not observed, but it is a function of the observables and the unknown parameter:

    ε = y − f(X; β)

LS method:
  - assume the signal f(X; β) is much stronger than the error ε
  - look for a β such that the size of ε is as small as possible
  - the size of ε is measured by some norm ||ε||

Ordinary Least Squares estimator (OLS)

OLS = LS with the ||·||_2 norm. Therefore the OLS objective function is

    S(β) = ||ε||_2^2 = ε'ε = (y − f(X; β))'(y − f(X; β)),

and the OLS estimator b is defined as the optimal solution

    b = arg min_{β ∈ R^K} S(β)

For the linear model

    S(β) = ||ε||_2^2 = ε'ε = (y − Xβ)'(y − Xβ) = Σ_{i=1}^{n} ε_i^2 = Σ_{i=1}^{n} (y_i − x_i'β)^2

S(β) is nicely convex!

Proposition: OLS estimator
The unique OLS estimator is

    b = (X'X)^{-1} X'y

To see this, first we introduce two simple matrix derivative rules:
  1. Let a, b ∈ R^p; then ∂(a'b)/∂b = ∂(b'a)/∂b = a
  2. Let b ∈ R^p, and let A ∈ R^{p×p} be symmetric; then ∂(b'Ab)/∂b = 2Ab = 2b'A

Proof. Rewrite the LS objective function

    S(β) = (y − Xβ)'(y − Xβ)
         = y'y − β'X'y − y'Xβ + β'X'Xβ

Note that the transpose of a scalar is the scalar itself, so that we can write y'Xβ = (y'Xβ)' = β'X'y, and therefore

    S(β) = y'y − 2β'(X'y) + β'(X'X)β        (4.1)

Since S(·) is convex, there exists a minimum b which satisfies the first order conditions

    ∂S(β)/∂β |_{β=b} = 0

By applying the previous derivative rules (1) and (2) to the 2nd and 3rd terms of (4.1),

    ∂S(b)/∂b = −2(X'y) + 2(X'X)b = 0

which leads to the so-called normal equations

    (X'X)b = X'y

The matrix X'X is square and symmetric (see homeworks). Based on A3, with probability 1 X'X is non-singular, so (X'X)^{-1} exists, and the normal equations can be written as

    (X'X)^{-1}(X'X)b = (X'X)^{-1}X'y  =>  b = (X'X)^{-1}X'y

which proves the desired result.
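These steps translate directly into a few lines of linear algebra. The following is a minimal numerical sketch (not part of the original slides) using NumPy and simulated data; in practice one solves the normal equations rather than inverting X'X explicitly.

```python
import numpy as np

rng = np.random.default_rng(0)
n, K = 100, 3                      # sample size and number of regressors (incl. constant)
X = np.column_stack([np.ones(n), rng.normal(size=(n, K - 1))])  # first column = constant
beta = np.array([1.0, 2.0, -0.5])  # "true" coefficients (simulation only)
eps = rng.normal(size=n)
y = X @ beta + eps

# OLS estimator b = (X'X)^{-1} X'y, obtained by solving the normal equations (X'X) b = X'y
b = np.linalg.solve(X.T @ X, X.T @ y)

# Equivalent (and numerically preferable) least-squares solver
b_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)
assert np.allclose(b, b_lstsq)
print("b =", b)
```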

Formulation in terms of sample averages

It can be shown (see homeworks) that

    X'X = Σ_{i=1}^{n} x_i x_i'   and   X'y = Σ_{i=1}^{n} x_i y_i

Define

    S_xx = (1/n) X'X = (1/n) Σ_{i=1}^{n} x_i x_i'   and   s_xy = (1/n) X'y = (1/n) Σ_{i=1}^{n} x_i y_i

Therefore b = (X'X)^{-1} X'y can be written as

    b = ((1/n) X'X)^{-1} ((1/n) X'y) = ((1/n) Σ_i x_i x_i')^{-1} ((1/n) Σ_i x_i y_i) = S_xx^{-1} s_xy

Once β is estimated via b, the estimated error, also called the residual, is obtained as

    e = y − Xb

Fitted values, also called predicted values, are ŷ = Xb, so that e = y − ŷ.

Note that ŷ_i = b_1 + b_2 x_{i2} + b_3 x_{i3} + ... + b_K x_{iK} for all i = 1, 2, ..., n.

What is ŷ_i? ŷ_i is the estimated conditional expectation of Y when X_1 = 1, X_2 = x_{i2}, ..., X_K = x_{iK}.
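A quick numerical check of the sample-average formulation and of the residual/fitted-value definitions, reusing the simulated X, y and b from the previous sketch:

```python
# Sample-average formulation: b = S_xx^{-1} s_xy
n = X.shape[0]
S_xx = (X.T @ X) / n
s_xy = (X.T @ y) / n
b_avg = np.linalg.solve(S_xx, s_xy)
assert np.allclose(b_avg, b)       # the 1/n factors cancel

# Fitted values and residuals
y_hat = X @ b                      # ŷ = Xb
e = y - y_hat                      # e = y − ŷ
```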

Algebraic/geometric properties of the OLS

Proposition (orthogonality of residuals)
The column space of X is orthogonal to the residual vector.

Proof. Write the normal equations

    X'Xb − X'y = 0  =>  X'(y − Xb) = 0  =>  X'e = 0

Therefore for every column X_k (observed regressor) it holds that the inner product X_k'e = 0.

Proposition (residuals sum to zero)
If the linear model includes the constant term, then

    Σ_{i=1}^{n} e_i = Σ_{i=1}^{n} (y_i − x_i'b) = 0

Proof. By assumption we have a linear model with a constant/intercept term, that is

    y_i = β_1 + β_2 x_{i2} + β_3 x_{i3} + ... + ε_i

Therefore X_1 = 1 = (1, 1, ..., 1)'. Applying the previous property to the 1st column of X,

    X_1'e = 1'e = Σ_{i=1}^{n} e_i = 0

and this proves the property.
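Both properties are easy to verify on the simulated data above (again a sketch): X'e vanishes up to floating-point error, and since the first column of X is the constant, the residuals sum to zero.

```python
# Orthogonality of residuals: X'e = 0 (up to floating-point error)
print(X.T @ e)
assert np.allclose(X.T @ e, 0.0)

# Residuals sum to zero because the model includes a constant (first column of X is 1)
assert np.isclose(e.sum(), 0.0)
```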

Proposition (Fitted vector is a projection)
ŷ is the projection of y onto the space spanned by the columns of X (the regressors).

Proof.

    ŷ = Xb = X(X'X)^{-1}X'y = Py

It suffices to show that P = X(X'X)^{-1}X' is symmetric and idempotent.

    P' = (X(X'X)^{-1}X')' = X((X'X)^{-1})'X' = X((X'X)')^{-1}X' = X(X'X)^{-1}X' = P

Therefore P is symmetric.

    PP = (X(X'X)^{-1}X')(X(X'X)^{-1}X') = X(X'X)^{-1}(X'X)(X'X)^{-1}X' = X(X'X)^{-1}X' = P

which shows that P is also idempotent, and this completes the proof.

P is called the influence matrix, because it measures the impact of the observed y's on each predicted ŷ_i. The elements of the diagonal of P are called leverages, because they measure the influence of y_i on the corresponding ŷ_i.
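A sketch of the projection matrix and its leverages on the same simulated data (forming the full n×n matrix P explicitly is only for illustration; it is wasteful for large n):

```python
# Projection ("influence") matrix P = X (X'X)^{-1} X'
P = X @ np.linalg.solve(X.T @ X, X.T)

assert np.allclose(P, P.T)         # symmetric
assert np.allclose(P @ P, P)       # idempotent
assert np.allclose(P @ y, y_hat)   # ŷ = Py

leverages = np.diag(P)             # influence of y_i on the corresponding ŷ_i
assert np.isclose(leverages.sum(), X.shape[1])  # trace(P) = K
```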

Proposition (Orthogonal decomposition)
The OLS fitting decomposes the observed vector y into the sum of two orthogonal components:

    y = ŷ + e = Py + My

Remark: orthogonality implies that the individual contributions of each term of the decomposition of y are somewhat well identified.

Proof. First notice that

    e = y − ŷ = y − Py = (I − P)y = My

where M = (I − P). Therefore y = ŷ + e = Py + My. It remains to show that ŷ = Py and e = My are orthogonal vectors.

First note that M'P = PM = 0, in fact

    (I − P)P = P − PP = P − P = 0

Moreover

    <Py, My> = (Py)'(My) = y'P'My = y'PMy = y'0y = 0

and this completes the proof.

M = I − P is called the residual maker matrix because it maps y into e. It allows us to write e in terms of the observables y and X.

Properties:
  - M is idempotent and symmetric (show it)
  - MX = 0, in fact MX = (I − P)X = X − PX = X − X = 0

Remark: it can be shown that this decomposition is also unique (a consequence of the Hilbert projection theorem).
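The residual maker and the orthogonal decomposition, checked numerically on the same simulated data (sketch):

```python
# Residual maker M = I − P
M = np.eye(X.shape[0]) - P

assert np.allclose(M @ y, e)               # e = My
assert np.allclose(M @ X, 0.0)             # MX = 0
assert np.allclose(P @ M, 0.0)             # PM = 0
assert np.allclose(P @ y + M @ y, y)       # y = Py + My
assert np.isclose((P @ y) @ (M @ y), 0.0)  # <Py, My> = 0
```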

OLS Projection
[Figure: geometry of the OLS projection. Source: Greene, W. H. (2011), Econometric Analysis, 7th Edition]

Estimate of the variance of the error term

The minimum of the LS objective function is

    S(b) = (y − Xb)'(y − Xb) = e'e

This is called the residual sum of squares:

    RSS = Σ_{i=1}^{n} e_i^2 = e'e

Note that

    e = My = M(Xβ + ε) = Mε

and

    RSS = e'e = (Mε)'(Mε) = ε'M'Mε = ε'Mε

Unbiased estimation of the error variance

    s^2 = (1/(n − K)) Σ_{i=1}^{n} e_i^2 = e'e/(n − K) = RSS/(n − K)

    SER = standard error of the regression = s

Estimation error decomposition

The sampling estimation error is given by b − β. Now

    b − β = (X'X)^{-1}X'y − β
          = (X'X)^{-1}X'(Xβ + ε) − β
          = (X'X)^{-1}(X'X)β + (X'X)^{-1}X'ε − β
          = β + (X'X)^{-1}X'ε − β
          = (X'X)^{-1}X'ε

The bias is the expected estimation error: Bias(b) = E[b − β].
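Both the degrees-of-freedom correction and the estimation error decomposition can be illustrated with a short simulation. The Monte Carlo design below (number of replications, fixed X, normal errors) is purely illustrative and not from the slides:

```python
# Unbiased estimate of the error variance and the SER
K = X.shape[1]
s2 = (e @ e) / (n - K)             # s² = RSS / (n − K)
SER = np.sqrt(s2)                  # standard error of the regression

# Monte Carlo illustration that E[b − β] ≈ 0 under this simulation design
R = 5000
errors = np.empty((R, K))
for r in range(R):
    eps_r = rng.normal(size=n)
    y_r = X @ beta + eps_r
    b_r = np.linalg.solve(X.T @ X, X.T @ y_r)
    errors[r] = b_r - beta         # sampling estimation error (X'X)^{-1} X'ε
print("average estimation error:", errors.mean(axis=0))  # ≈ 0
```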

TSS = total sum of squares

Let ȳ be the sample average of the observed y_1, y_2, ..., y_n: ȳ = (1/n) Σ_{i=1}^{n} y_i, and let ȳ = (ȳ, ȳ, ..., ȳ)'. We can also write ȳ = ȳ·1, where 1 is the vector of n ones.

TSS = the deviance (variability) observed in the dependent variable y:

    TSS = Σ_{i=1}^{n} (y_i − ȳ)^2 = (y − ȳ)'(y − ȳ)

This is a variability measure, because it computes the squared deviations of y from its observed unconditional mean.

ESS = explained sum of squares

ESS = the overall deviance of the predicted values of y with respect to the unconditional mean of y:

    ESS = Σ_{i=1}^{n} (ŷ_i − ȳ)^2 = (ŷ − ȳ)'(ŷ − ȳ)

At first look this is not exactly a measure of variability (why?). But it turns out that another property of the OLS is that

    (1/n) Σ_{i=1}^{n} ŷ_i = (1/n) Σ_{i=1}^{n} y_i
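That last property, which is what makes ESS comparable with TSS around the common mean, can be confirmed numerically (sketch, continuing the snippets above):

```python
# With a constant in the model, the fitted values have the same mean as the observed y
y_bar = np.full(n, y.mean())             # the vector ȳ = ȳ·1
assert np.isclose(y_hat.mean(), y.mean())

TSS = (y - y_bar) @ (y - y_bar)          # total sum of squares
ESS = (y_hat - y_bar) @ (y_hat - y_bar)  # explained sum of squares
```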

TSS decomposition and goodness of fit

It can be shown (we don't do this here) that

    TSS = ESS + RSS

From the previous decomposition we get a famous (and misused) goodness of fit statistic

    R^2 = ESS/TSS = 1 − RSS/TSS

R^2 is the portion of the deviance observed in y that is explained by the linear model. It is also called the coefficient of determination.

Problems with R^2
  - It increases by adding more regressors. For this reason it is better to look at the so-called adjusted R^2 (adjusted for the degrees of freedom), which is computed as follows:

        R̄^2 = 1 − [RSS/(n − K)] / [TSS/(n − 1)]

  - R^2 ∈ [0, 1] only if the constant term is included in the model. So when you estimate without the intercept, don't be scared if you get R^2 < 0.
  - An extremely large R^2 is pathological, guess why!
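A sketch of the decomposition and of both goodness-of-fit statistics, reusing TSS, ESS and the residuals from the previous snippets:

```python
RSS = e @ e

# TSS = ESS + RSS holds here because the model includes the constant term
assert np.isclose(TSS, ESS + RSS)

R2 = 1.0 - RSS / TSS                               # coefficient of determination
R2_adj = 1.0 - (RSS / (n - K)) / (TSS / (n - 1))   # adjusted R²
print(f"R² = {R2:.3f}, adjusted R² = {R2_adj:.3f}")
```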