Regression and Statistical Inference

Regression and Statistical Inference
Walid Mnif (wmnif@uwo.ca)
Department of Applied Mathematics, The University of Western Ontario, London, Canada

Elements of Probability

Elements of Probability: CDF & PDF
Recall that for a given random variable (r.v.) X, the cumulative distribution function (CDF) is defined as
    F(x) = P{X ≤ x},  x ∈ R.   (1)
The probability density function (PDF):
    Discrete r.v.:   p(x) = P{X = x}   (2)
    Continuous r.v.: F(a) = ∫_{-∞}^{a} f(x) dx   (3)

Elements of Probability: CDF & PDF
Example: Normal distribution with mean µ and variance σ²: N(µ, σ²)
    f(x) = (1 / (σ√(2π))) exp(−(x − µ)² / (2σ²))   (4)
[Figure 1: PDF of the standard normal r.v. (µ = 0, σ = 1)]

Elements of Probability: CDF & PDF
Example 2: Chi-square distribution with k degrees of freedom, the distribution of a sum of the squares of k independent standard normal random variables.
PDF:
    f(x; k) = x^{k/2 − 1} e^{−x/2} / (2^{k/2} Γ(k/2)),  if x > 0, and 0 elsewhere,   (5)
where Γ is the Gamma function, defined as
    Γ(x) = ∫_0^∞ t^{x−1} e^{−t} dt.   (6)
Exercise: Find out the characteristics of the Gamma function.
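The connection between squared standard normals and the chi-square density in (5) is easy to check by simulation. The minimal Python sketch below is an illustration of my own (k = 5 and the sample size are arbitrary choices) comparing empirical quantiles of such sums with scipy's chi-square quantiles.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
k, n_sim = 5, 100_000          # k degrees of freedom, number of simulated sums

# Sum of squares of k independent standard normals
samples = (rng.standard_normal((n_sim, k)) ** 2).sum(axis=1)

# Compare a few empirical quantiles with chi-square(k) quantiles
probs = [0.25, 0.5, 0.75, 0.95]
print("empirical :", np.quantile(samples, probs).round(3))
print("chi2(k)   :", stats.chi2.ppf(probs, df=k).round(3))
```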

Elements of Probability: CDF & PDF
Example 3: t-distribution, defined as the distribution of
    Z / √(V/υ),   (7)
where Z ~ N(0, 1), V ~ χ²_υ, and Z and V are independent.
PDF:
    f(x) = Γ((υ+1)/2) / (√(πυ) Γ(υ/2)) · (1 + x²/υ)^{−(υ+1)/2}   (8)

Elements of Probability: Moments
k-th moment about the origin:
    m_k = E[X^k]   (9)
k-th central moment:
    µ_k = E[(X − E[X])^k] = E[(X − m_1)^k]   (10)
Example: X ~ N(µ, σ²): m_1 = µ, µ_1 = 0, µ_2 = σ².

Elements of Probability: Moments
Theorem. If E[|X|^k] < ∞ for some positive integer k, then E[|X|^j] < ∞ for j = 1, 2, ..., k − 1.
In other words: the existence of the k-th moment implies the existence of all moments of lower order.

Elements of Probability: Moment Generating Functions
Definition. Let X be a r.v. with pdf f. The moment generating function (MGF) of X, denoted by M_X(t), is
    M_X(t) = E[e^{tX}],
provided E[e^{tX}] < ∞ for all values t in an interval (−ɛ, ɛ), ɛ > 0. Explicitly,
    Continuous case: M_X(t) = ∫_{-∞}^{∞} e^{tx} f(x) dx
    Discrete case:   M_X(t) = Σ_x e^{tx} f(x)

Elements of Probability: Moment Generating Functions
Theorem. If X has MGF M_X(t), then
    E[X^n] = M_X^{(n)}(0) = (d^n/dt^n) M_X(t) |_{t=0}   (11)
Example: Suppose X ~ N(µ, σ²); then
    M_X(t) = e^{tµ + t²σ²/2}   (12)
Proof. Exercise.
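Relation (11) can also be checked symbolically. The short sympy sketch below (an illustration of mine, not part of the exercise) differentiates the normal MGF (12) and evaluates at t = 0 to recover the mean and variance of N(µ, σ²).

```python
import sympy as sp

t, mu, sigma = sp.symbols('t mu sigma', real=True, positive=True)
M = sp.exp(t * mu + t**2 * sigma**2 / 2)          # MGF of N(mu, sigma^2), eq. (12)

m1 = sp.diff(M, t, 1).subs(t, 0)                  # E[X]   = mu
m2 = sp.diff(M, t, 2).subs(t, 0)                  # E[X^2] = mu^2 + sigma^2
print(sp.simplify(m1), sp.simplify(m2 - m1**2))   # prints: mu, sigma**2
```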

Elements of Probability: Moment Generating Functions
Theorem. Let X and Y be r.v.s that have the same MGF. Then X and Y have the same distribution.
Theorem. Assume X is a r.v. with MGF M_X(t). Then the MGF of Y = aX + b is
    M_Y(t) = e^{bt} M_X(at)   (13)
for any value t such that M_X(at) exists.

Elements of Probability: Markov's Inequality
Proposition. If X is a nonnegative r.v., then for all ɛ > 0,
    P{X ≥ ɛ} ≤ E[X]/ɛ   (14)
Proof. Exercise! (Hint: Define the r.v. Y such that Y = ɛ if X ≥ ɛ and Y = 0 elsewhere.)
Corollary (Chebyshev's Inequality). If X is a r.v. with mean µ and variance σ², then for all k > 0,
    P{|X − µ| ≥ kσ} ≤ 1/k².   (15)
Proof. Exercise! (Hint: Use Markov's Inequality.)

Elements of Probability: Weak Law of Large Numbers
Theorem. Let X_1, X_2, ... be a sequence of independent and identically distributed r.v.s with E[X_i] = µ and Var(X_i) = σ² < ∞. Then, for all ɛ > 0,
    P( |(X_1 + ... + X_n)/n − µ| > ɛ ) → 0 as n → ∞   (16)
Proof. Exercise! (Hint: Use Chebyshev's Inequality.)
Remark: The Weak Law of Large Numbers also holds when Var(X_i) = ∞.

Elements of Probability: Strong Law of Large Numbers
Theorem 1. Let X_1, X_2, ... be a sequence of independent and identically distributed r.v.s with E[X_i] = µ and Var(X_i) = σ² < ∞. Then, for all ɛ > 0,
    P( lim_{n→∞} |(X_1 + ... + X_n)/n − µ| < ɛ ) = 1   (17)

Elements of Probability: Strong Law of Large Numbers
[Figure 2: Testing the Law of Large Numbers for X_i ~ N(4, 2)]
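A simulation in the spirit of Figure 2 takes only a few lines. The sketch below is my own illustration (seed and sample size arbitrary, and the second parameter of N(4, 2) is interpreted here as the variance): it tracks the running mean of i.i.d. draws and shows it settling near µ = 4.

```python
import numpy as np

rng = np.random.default_rng(1)
mu, var, n = 4.0, 2.0, 100_000            # interpreting N(4, 2) as mean 4, variance 2

x = rng.normal(mu, np.sqrt(var), size=n)  # i.i.d. draws
running_mean = np.cumsum(x) / np.arange(1, n + 1)

for m in (10, 100, 1_000, 10_000, 100_000):
    print(f"n = {m:>6d}   sample mean = {running_mean[m - 1]:.4f}")
```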

Elements of Probability: Central Limit Theorem
Theorem. Let {X_i} be i.i.d. random vectors in R^d, with E[X_i] = µ and Var[X_i] = Σ < ∞. Then
    (1/√n) Σ_{i=1}^n (X_i − µ) →_D N(0, Σ) as n → ∞   (18)
Comment: The CLT applies to any underlying probability distribution with finite variance.
Proof. See Kallenberg (1997).
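To illustrate the comment, the sketch below (my own choice of exponential draws, which are clearly non-normal) standardizes sample means and checks that their distribution is close to the N(0, 1) limit implied by (18) when the variance is 1.

```python
import numpy as np

rng = np.random.default_rng(2)
n, n_rep = 500, 20_000                      # sample size and number of replications

# Exponential(1) has mean 1 and variance 1, and is far from normal
x = rng.exponential(scale=1.0, size=(n_rep, n))
z = np.sqrt(n) * (x.mean(axis=1) - 1.0)     # sqrt(n) * (sample mean - mu)

# Under the CLT, z should be approximately N(0, 1): check mean, variance, tail prob.
print("mean ~ 0      :", z.mean().round(3))
print("variance ~ 1  :", z.var().round(3))
print("P(z > 1.645)  :", (z > 1.645).mean().round(3), " (normal tail: 0.05)")
```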

Elements of Probability: Continuous Mapping Theorem
Theorem (Continuous mapping 1). If X_n →_P a and the function g is continuous at a, then g(X_n) →_P g(a) as n → ∞.
Theorem (Continuous mapping 2). If X_n →_D X and g(·) is a continuous function, then g(X_n) →_D g(X) as n → ∞.

Elements of Probability: Slutsky's Theorem
Theorem (Slutsky's Theorem). Suppose that X_n →_D X and Y_n →_P Y, where Y is a constant. Then
    1. X_n + Y_n →_D X + Y
    2. X_n Y_n →_D XY
    3. Y_n^{−1} X_n →_D Y^{−1} X, provided Y is invertible.

Linear regression

Linear regression: Least Squares fitting
Construct a straight line with equation y = b_0 + b_1 x that "best fits" the data (x_1, y_1), ..., (x_n, y_n).
Denote the i-th fitted (or predicted) value by
    ŷ_i = b_0 + b_1 x_i,  i = 1, ..., n,
and the i-th residual by
    e_i = y_i − ŷ_i = y_i − (b_0 + b_1 x_i).

Linear regression: Least Squares fitting
Define
    SSE = Σ_{1 ≤ i ≤ n} e_i².
SSE is a measure of goodness of fit to the data. When SSE = 0, all points y_i lie on the line b_0 + b_1 x_i.
The method of least squares:
    (β̂_0, β̂_1) = arg min_{(b_0, b_1) ∈ R²} SSE   (19)

Linear regression: Least Squares fitting
Proposition. Define
    x̄ = (1/n) Σ_{i=1}^n x_i,  ȳ = (1/n) Σ_{i=1}^n y_i,
    S_xx = Σ_{i=1}^n (x_i − x̄)²,  S_yy = Σ_{i=1}^n (y_i − ȳ)²,  S_xy = Σ_{i=1}^n (x_i − x̄)(y_i − ȳ).
The line that best fits the data under the least squares criterion is y = β̂_0 + β̂_1 x, where
    β̂_1 = S_xy / S_xx,  and  β̂_0 = ȳ − β̂_1 x̄.
Proof. Exercise. Hint: Define r = S_xy / √(S_xx S_yy) as the sample correlation coefficient and write the SSE as a sum of three squares. Show that
    SSE = S_yy (1 − r²) + (b_1 √S_xx − r √S_yy)² + n(ȳ − b_0 − b_1 x̄)².
Remarks:
    Under the least squares criterion, SSE = S_yy (1 − r²).
    SSE is a measure of unexplained variability (error that could not be explained by the model).

    r² is called the coefficient of determination, and is denoted by R².
    β̂_1 = r √(S_yy / S_xx): the sample correlation and the estimated slope have the same sign.
Exercise: Show that
    Σ_{i=1}^n (y_i − ȳ)² = Σ_{i=1}^n (ŷ_i − ȳ)² + Σ_{i=1}^n (y_i − ŷ_i)²
and interpret the meaning of each term.
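The closed-form slope and intercept from the proposition above are easy to compute directly. The sketch below uses a small made-up data set (purely illustrative) and cross-checks the S-based formulas against numpy.polyfit.

```python
import numpy as np

# Small illustrative data set (made up for this example)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 2.9, 4.2, 4.8, 6.1, 6.9])

xbar, ybar = x.mean(), y.mean()
Sxx = ((x - xbar) ** 2).sum()
Sxy = ((x - xbar) * (y - ybar)).sum()

b1 = Sxy / Sxx               # slope:     beta_1 = S_xy / S_xx
b0 = ybar - b1 * xbar        # intercept: beta_0 = ybar - beta_1 * xbar
print("closed form :", b0, b1)
print("polyfit     :", np.polyfit(x, y, 1)[::-1])   # reversed to read [b0, b1]
```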

Linear regression: Least Squares fitting
Define:
    Total sum of squares SST = Σ_{i=1}^n (y_i − ȳ)²: measures the total variability of the original observations.
    Regression sum of squares SSR = Σ_{i=1}^n (ŷ_i − ȳ)²: measures the total variability explained by the regression model.
From the previous slide:
    SST = SSR + SSE,  so  1 = SSR/SST + SSE/SST.
    SSR/SST: portion of the total variability that is explained by the model.
    SSE/SST: portion of unexplained variability.

Linear regression: Least Squares fitting
Recall SSE = SST(1 − R²) and R² = SSR/SST, so:
    if R² ≈ 1, most of the variability is explained by the model;
    if R² ≈ 0, the regression model explains little of the variability (an inefficient model).
Some properties of the residuals:
    Σ_{i=1}^n e_i = 0,  Σ_{i=1}^n e_i x_i = 0,  (1/n) Σ_{i=1}^n ŷ_i = ȳ.
Exercise: Check the previous properties.
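Continuing with the same made-up data as above, the following lines verify the decomposition SST = SSR + SSE and the residual properties numerically.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 2.9, 4.2, 4.8, 6.1, 6.9])

b1 = np.cov(x, y, bias=True)[0, 1] / np.var(x)   # same ratio as S_xy / S_xx
b0 = y.mean() - b1 * x.mean()
yhat = b0 + b1 * x
e = y - yhat

SST = ((y - y.mean()) ** 2).sum()
SSR = ((yhat - y.mean()) ** 2).sum()
SSE = (e ** 2).sum()

print("SST =", SST, " SSR + SSE =", SSR + SSE)                 # decomposition holds
print("R^2 =", SSR / SST)
print("sum e_i =", e.sum(), " sum e_i*x_i =", (e * x).sum())   # both ~ 0
```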

Linear regression: Linear regression model
Linear relationship between the response and the explanatory variable, plus an error term. The model:
    y = β_0 + β_1 x + ɛ,
where ɛ, called the error term or disturbance, includes all the other factors.
Denote by y_i the response to x_i, i = 1, ..., n. Assume that {(y_i, x_i)}_{i=1}^n is an i.i.d. sample from a distribution with
    y_i = β_0 + β_1 x_i + ɛ_i,
where the ɛ_i are i.i.d. and ɛ_i | x_i ~ N(0, σ²). β_0 (resp. β_1) is called the intercept (resp. slope).

Linear regression: Linear regression model
From now on, we suppose that E(y_i²) < ∞, E(x_i²) < ∞ and E(x_i²) ≠ 0.
Theorem. Under the linear model assumptions, the coefficients are given as follows:
    β_1 = cov(x_i, y_i) / Var(x_i),
    β_0 = E(y_i) − β_1 E(x_i).
Proof. By the law of iterated expectations (LIE), E(x_i ɛ_i) = 0, i.e. E(x_i (y_i − β_0 − β_1 x_i)) = 0. On the other hand, E(y_i) = β_0 + β_1 E(x_i). Solving these two equations in the two unknowns β_0 and β_1 gives the result.

Linear regression: Linear regression model
Lemma. Assume that {(y_i, x_i)}_{i=1}^n is an i.i.d. sample from the population. Then
    β̂_1 = S_xy / S_xx,  β̂_0 = ȳ − β̂_1 x̄
are consistent estimators of β_1 and β_0, respectively. Furthermore, the conditional distribution of β̂_1 (resp. β̂_0) given x is normal with mean β_1 (resp. β_0) and variance σ²/S_xx (resp. σ²(1/n + x̄²/S_xx)).

Linear regression: Linear regression model
Proof. Since {(y_i, x_i)}_{i=1}^n is i.i.d., {(x_i y_i)}_{i=1}^n and {(x_i²)}_{i=1}^n are also i.i.d. From the WLLN, S_xy/n →_P cov(x_i, y_i), and similarly S_xx/n →_P Var(x_i). Since x ↦ 1/x is continuous on any neighborhood of a point x_0 ≠ 0, we apply the Continuous Mapping Theorem to get n/S_xx →_P 1/Var(x_i). Therefore,
    S_xy / S_xx →_P cov(x_i, y_i) / Var(x_i) = β_1.
As ȳ and x̄ are consistent for E(y_i) and E(x_i), respectively, we obtain the consistency of β̂_0 for β_0.

It can be shown that
    β̂_1 = Σ_{i=1}^n ((x_i − x̄)/S_xx) y_i,
    β̂_0 = Σ_{i=1}^n (1/n − x̄(x_i − x̄)/S_xx) y_i.
As y_i | x_i ~ N(β_0 + β_1 x_i, σ²) and a linear combination of independent normally distributed random variables is normally distributed, the rest of the lemma follows.
Remark: Since E(β̂_0) = β_0 and E(β̂_1) = β_1, the estimators β̂_0 and β̂_1 are unbiased.
One more unknown parameter remains to be estimated. What is it?

Linear regression: Linear regression model
The answer is σ². Recall that σ² measures the dispersion of the model:
    Small σ²: the points (x_i, y_i) lie close to the true regression line.
    Large σ²: the model explains the observed values (x_i, y_i) poorly.
Lemma. The statistic
    s² = Σ_{i=1}^n (Y_i − Ŷ_i)² / (n − 2)   (20)
is an unbiased estimator of σ². Furthermore,
    (n − 2) s² / σ² ~ χ²_{n−2},   (21)
where χ²_{n−2} is the chi-square distribution with n − 2 degrees of freedom. Moreover, s² is independent of both β̂_0 and β̂_1.
Proof. The proof is straightforward once we show that β̂_0 + β̂_1 x_j is normally distributed for each j:
    β̂_0 + β̂_1 x_j = Σ_{i=1}^n (1/n + (x_j − x̄)(x_i − x̄)/S_xx) y_i,
which is a linear combination of mutually independent, normally distributed random variables y_i. Given E(y_i | x_i) = β_0 + β_1 x_i and V(y_i | x_i) = σ², we get that β̂_0 + β̂_1 x_j is normally distributed with mean, variance and estimated standard error, respectively:
    E(β̂_0 + β̂_1 x_j | x) = β_0 + β_1 x_j,
    V(β̂_0 + β̂_1 x_j | x) = σ² [1/n + (x_j − x̄)²/S_xx],
    s(β̂_0 + β̂_1 x_j | x) = s √(1/n + (x_j − x̄)²/S_xx).
On the other hand, recall that the residuals satisfy
    Σ_{i=1}^n e_i = Σ_{i=1}^n (Y_i − Ŷ_i) = 0,  Σ_{i=1}^n e_i x_i = Σ_{i=1}^n (Y_i − Ŷ_i) x_i = 0.
Therefore we can eliminate two summands from the sum of squares Σ_{i=1}^n (y_i − ŷ_i)², so there are only n − 2 independent terms y_i − ŷ_i.
We need to show that s² is an unbiased estimator, which is equivalent to showing that E(Σ_{i=1}^n (y_i − ŷ_i)²) = (n − 2)σ². It is easy to show that SSE = S_yy − β̂_1² S_xx. Taking expectations of both sides gives E(SSE | x) = E(S_yy | x) − E(β̂_1² | x) S_xx, with
    E(β̂_1² | x) = σ²/S_xx + β_1²   (use V(X) = E(X²) − (E(X))²),
    E(y_i²) = σ² + (β_0 + β_1 x_i)²,  and  E(ȳ²) = σ²/n + (β_0 + β_1 x̄)².
So E(S_yy) = (n − 1)σ² + β_1² S_xx and E(SSE) = (n − 2)σ².
Conclusion: s² is an unbiased estimator of σ².
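The unbiasedness of s² = SSE/(n − 2) can also be seen in a small Monte Carlo sketch (all parameters below are arbitrary choices of mine): averaging s² over many simulated samples lands close to the true σ², while dividing by n instead is biased downward.

```python
import numpy as np

rng = np.random.default_rng(3)
n, sigma2, n_rep = 30, 4.0, 20_000
beta0, beta1 = 1.0, 2.0
x = np.linspace(0, 10, n)                 # fixed design, arbitrary choice

s2_vals = np.empty(n_rep)
for r in range(n_rep):
    y = beta0 + beta1 * x + rng.normal(0, np.sqrt(sigma2), n)
    b1 = ((x - x.mean()) * (y - y.mean())).sum() / ((x - x.mean()) ** 2).sum()
    b0 = y.mean() - b1 * x.mean()
    sse = ((y - b0 - b1 * x) ** 2).sum()
    s2_vals[r] = sse / (n - 2)            # unbiased estimator of sigma^2

print("true sigma^2   :", sigma2)
print("mean of s^2    :", s2_vals.mean().round(3))
print("mean of SSE/n  :", (s2_vals * (n - 2) / n).mean().round(3))  # biased down
```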

Linear regression: Confidence intervals for the Regression Coefficients
Definition (Two-sided confidence intervals). An interval of the form [a, b], where a ≤ b, is said to be a 100(1 − α)% confidence interval for the parameter θ if P(a ≤ θ ≤ b) ≥ 1 − α.
Exercise: Suppose that θ ~ N(µ, σ²). Find a 95% confidence interval for θ.

Linear regression: Confidence intervals for the Regression Coefficients
Definition (One-sided confidence intervals). An interval of the form [a, ∞) is a 100(1 − α)% lower one-sided confidence interval for the parameter θ if P(a ≤ θ) ≥ 1 − α. Similarly, an interval of the form (−∞, b] is a 100(1 − α)% upper one-sided confidence interval for the parameter θ if P(θ ≤ b) ≥ 1 − α.
Exercise: Suppose that θ ~ N(µ, σ²). Find 95% lower and upper one-sided confidence intervals for θ.

Linear regression: Confidence intervals for the Regression Coefficients
For β_1: Recall
    (β̂_1 − β_1) / (σ/√S_xx) ~ N(0, 1),  (n − 2)s²/σ² ~ χ²_{n−2}.
So,
    (β̂_1 − β_1) / (s/√S_xx) ~ t_{n−2},
with s(β̂_1) = s/√S_xx the estimated standard error of the estimate.
100(1 − α)% confidence interval for β_1:
    β̂_1 ± t_{n−2}(α/2) s(β̂_1)

Linear regression: Confidence intervals for the Regression Coefficients
For β_0: The estimated standard error of β̂_0 is
    s(β̂_0) = s √(1/n + x̄²/S_xx).
As for β_1, the sampling distribution of β̂_0 satisfies
    (β̂_0 − β_0) / s(β̂_0) ~ t_{n−2}.
100(1 − α)% confidence interval for β_0:
    β̂_0 ± t_{n−2}(α/2) s(β̂_0)
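Both intervals are easy to compute by hand. The sketch below reuses the small made-up data set from the earlier examples and builds 95% confidence intervals for β_0 and β_1 with scipy's t quantiles.

```python
import numpy as np
from scipy import stats

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 2.9, 4.2, 4.8, 6.1, 6.9])
n = len(x)

Sxx = ((x - x.mean()) ** 2).sum()
b1 = ((x - x.mean()) * (y - y.mean())).sum() / Sxx
b0 = y.mean() - b1 * x.mean()
s = np.sqrt(((y - b0 - b1 * x) ** 2).sum() / (n - 2))   # s^2 = SSE / (n - 2)

se_b1 = s / np.sqrt(Sxx)
se_b0 = s * np.sqrt(1 / n + x.mean() ** 2 / Sxx)
tcrit = stats.t.ppf(0.975, df=n - 2)                    # t_{n-2}(alpha/2), alpha = 0.05

print("beta1: %.3f +/- %.3f" % (b1, tcrit * se_b1))
print("beta0: %.3f +/- %.3f" % (b0, tcrit * se_b0))
```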

Linear regression: Test of Hypotheses
Suppose we want to test the hypothesis H_0: β_1 = 0 against H_1: β_1 ≠ 0. This amounts to answering the question: does a variation of x have an effect on the response variable y?
Under the hypothesis H_0: β_1 = 0,
    β̂_1 / (s/√S_xx) ~ t_{n−2}.
Then a two-sided level-α test of H_0 is: reject H_0 if
    |β̂_1 / (s/√S_xx)| ≥ t_{n−2}(α/2).
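Continuing with the same illustrative numbers, the following lines compute the t statistic for H_0: β_1 = 0 and its two-sided p-value; the data and the 5% level are arbitrary choices for the sketch.

```python
import numpy as np
from scipy import stats

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 2.9, 4.2, 4.8, 6.1, 6.9])
n = len(x)

Sxx = ((x - x.mean()) ** 2).sum()
b1 = ((x - x.mean()) * (y - y.mean())).sum() / Sxx
b0 = y.mean() - b1 * x.mean()
s = np.sqrt(((y - b0 - b1 * x) ** 2).sum() / (n - 2))

t_stat = b1 / (s / np.sqrt(Sxx))                    # test statistic under H0: beta1 = 0
p_value = 2 * stats.t.sf(abs(t_stat), df=n - 2)     # two-sided p-value
print("t =", round(t_stat, 3), " p =", round(p_value, 5))
print("reject H0 at 5%?", p_value < 0.05)
```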

Linear regression: CI for the mean response E(y_i | x_i)
To derive point and interval estimates for E(y_i | x_i), we need to study the sampling distribution of the estimator β̂_0 + β̂_1 x_i.
Theorem. The point estimator β̂_0 + β̂_1 x_i is an unbiased estimator of the mean response E(y_i | x_i). Furthermore, it is normally distributed with mean, variance, and estimated standard error given by:
    E(β̂_0 + β̂_1 x_i) = β_0 + β_1 x_i,
    V(β̂_0 + β̂_1 x_i) = σ² (1/n + (x_i − x̄)²/S_xx),
    s(β̂_0 + β̂_1 x_i) = s √(1/n + (x_i − x̄)²/S_xx).
Proof. Exercise.
Consequently,
    (β̂_0 + β̂_1 x_i − (β_0 + β_1 x_i)) / s(β̂_0 + β̂_1 x_i) ~ t_{n−2}.
100(1 − α)% CI for the mean response:
    β̂_0 + β̂_1 x_i ± t_{n−2}(α/2) s(β̂_0 + β̂_1 x_i)

Linear regression: Prediction Interval
Theorem. The estimated standard error s(y_i − ŷ_i) is given by
    s(y_i − ŷ_i) = s √(1 + 1/n + (x_i − x̄)²/S_xx),
and
    (y_i − ŷ_i) / s(y_i − ŷ_i) ~ t_{n−2}.
Proof. Use:
    y_i = β_0 + β_1 x_i + ɛ, where ɛ | x_i ~ N(0, σ²);
    ŷ_i = β̂_0 + β̂_1 x_i;
    ŷ_i is independent of ɛ, so V(y_i − ŷ_i) = V(y_i) + V(ŷ_i).
100(1 − α)% prediction interval:
    β̂_0 + β̂_1 x_i ± t_{n−2}(α/2) s(y_i − ŷ_i)
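The sketch below (same toy data, with a new x value chosen by me) computes both the confidence interval for the mean response and the wider prediction interval at that point, so the extra "1 +" term is visible in the output.

```python
import numpy as np
from scipy import stats

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 2.9, 4.2, 4.8, 6.1, 6.9])
n = len(x)
x_new = 3.5                                        # arbitrary point at which to predict

Sxx = ((x - x.mean()) ** 2).sum()
b1 = ((x - x.mean()) * (y - y.mean())).sum() / Sxx
b0 = y.mean() - b1 * x.mean()
s = np.sqrt(((y - b0 - b1 * x) ** 2).sum() / (n - 2))
tcrit = stats.t.ppf(0.975, df=n - 2)

fit = b0 + b1 * x_new
se_mean = s * np.sqrt(1 / n + (x_new - x.mean()) ** 2 / Sxx)        # mean response
se_pred = s * np.sqrt(1 + 1 / n + (x_new - x.mean()) ** 2 / Sxx)    # new observation

print("95%% CI for mean response : %.3f +/- %.3f" % (fit, tcrit * se_mean))
print("95%% prediction interval  : %.3f +/- %.3f" % (fit, tcrit * se_pred))
```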

Linear regression: Least squares linear predictor
Define y_i ∈ R: explained variable or regressand; x_i ∈ R^d: explanatory variables or regressors.
The idea is to find a "good" predictor of y_i as a linear combination of x_i, i = 1, ..., n. "Good" means a linear combination of x_i which minimizes the expected squared error. In other words,
    β = arg min_{b ∈ R^d} E[(y_i − x_i'b)²]   (22)

Linear regression: Least squares linear predictor
Theorem. Under the assumptions
    1. E[y_i²] < ∞,
    2. E[x_i x_i'] is a non-singular d × d matrix,
    3. E[x_i' x_i] < ∞,
we have
    β = (E[x_i x_i'])^{-1} E(x_i y_i).
Proof. We have, for all b,
    E[(y_i − x_i'b)²] = E(y_i²) − 2 E(y_i x_i') b + b' E(x_i x_i') b.
First order condition:
    ∂E[(y_i − x_i'b)²] / ∂b = 0.
Recall that if Σ is a (symmetric) d × d matrix and C, X ∈ R^d, then ∂(C'X)/∂X = C and ∂(X'ΣX)/∂X = 2ΣX. So
    −2 E(x_i y_i) + 2 E(x_i x_i') b = 0,
and therefore
    β = (E[x_i x_i'])^{-1} E(x_i y_i).
Note that if we choose, for example, x_{i1} = 1, we work within an affine framework. Moreover, if we choose d = 2, we obtain the same results as in the bivariate model above.

Linear regression: Least squares linear predictor
Theorem. Assume {(y_i, x_i)}_{i=1}^n is an i.i.d. sample from a given population. Under the model assumptions,
    β̂ = [Σ_{i=1}^n x_i x_i']^{-1} Σ_{i=1}^n x_i y_i   (23)
is a consistent estimator of β.
Proof. From the Cauchy-Schwarz inequality we have
    ‖E(x_i y_i)‖ ≤ E(‖x_i y_i‖) ≤ (E(x_i'x_i))^{1/2} (E(y_i²))^{1/2} < ∞.
As {(y_i, x_i)}_{i=1}^n is an i.i.d. sample, {(y_i x_i)}_{i=1}^n and {(x_i x_i')}_{i=1}^n are also i.i.d. From the WLLN,
    (1/n) Σ_{i=1}^n x_i y_i →_P E(x_i y_i),  (1/n) Σ_{i=1}^n x_i x_i' →_P E(x_i x_i').
Since g(A) = A^{-1} is continuous at any invertible A, by the continuous mapping theorem,
    [(1/n) Σ_{i=1}^n x_i x_i']^{-1} →_P [E(x_i x_i')]^{-1}.
So,
    [Σ_{i=1}^n x_i x_i']^{-1} Σ_{i=1}^n x_i y_i →_P [E(x_i x_i')]^{-1} E(x_i y_i).
One can easily show that the estimator defined above is the same as the OLS estimator. Recall that the OLS estimator is defined as arg min_{b ∈ R^d} e'e, where e = Y − Xb,
    Y = (y_1, ..., y_n)' ∈ R^{n×1},  X = (x_1, ..., x_n)' ∈ R^{n×d}.
Using the first order condition, we get
    β̂ = (X'X)^{-1} X'Y.
Now we can show that the OLS estimator is the same as (23): just use the fact that if A = (a_1, ..., a_n) ∈ R^{m×n} and B = (b_1', ..., b_n')' ∈ R^{n×p}, then AB = Σ_{i=1}^n a_i b_i'.
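The equivalence between the matrix formula and the summation form (23) is easy to verify numerically. The sketch below uses random data of my own making, with an intercept column so that x_{i1} = 1, and also compares against numpy's least-squares solver; using np.linalg.solve rather than an explicit inverse is the usual numerically safer choice.

```python
import numpy as np

rng = np.random.default_rng(4)
n, d = 200, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, d - 1))])   # x_{i1} = 1 (intercept)
beta_true = np.array([1.0, 2.0, -0.5])
y = X @ beta_true + rng.normal(scale=1.0, size=n)

# Matrix form: (X'X)^{-1} X'Y
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# Summation form (23): sum_i x_i x_i' and sum_i x_i y_i
Sxx = sum(np.outer(xi, xi) for xi in X)
Sxy = sum(xi * yi for xi, yi in zip(X, y))
beta_sum = np.linalg.solve(Sxx, Sxy)

print(beta_hat.round(4))
print(beta_sum.round(4))
print(np.linalg.lstsq(X, y, rcond=None)[0].round(4))   # same numbers, up to rounding
```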

Linear regression: OLS
Definition 2. Define a regression model by y_i = x_i'β + ɛ_i such that E(ɛ_i | x_i) = 0. If V(y_i | x_i) is constant, the model is homoskedastic. If it depends on x_i, it is called a heteroskedastic model. In fact,
    V(y_i | x_i) = V(x_i'β + ɛ_i | x_i) = V(ɛ_i | x_i) = E(ɛ_i² | x_i) − (E(ɛ_i | x_i))² = E(ɛ_i² | x_i) = σ²(x_i).

Linear regression: Finite-Sample Properties: Homoskedastic Case
The model: {(y_i, x_i)}_{i=1}^n is an i.i.d. sample from
    y_i = x_i'β + ɛ_i,  E(ɛ_i | x_i) = 0,  V(ɛ_i | x_i) = E(ɛ_i² | x_i) = σ².
Let us study the properties of the OLS estimator when applied to this model. We need to study its:
    bias
    variance
    consistency
    asymptotic distribution

Linear regression: OLS Bias
Recall β̂ = (X'X)^{-1} X'Y. So the estimation error is
    β̂ − β = (X'X)^{-1} X'(Xβ + ɛ) − β = β + (X'X)^{-1} X'ɛ − β = (X'X)^{-1} X'ɛ,
and the conditional bias is
    E(β̂ − β | X) = (X'X)^{-1} X' E(ɛ | X) = 0.
The OLS estimator is conditionally unbiased. We can deduce that E(β̂ − β) = 0; therefore, the OLS estimator is also unconditionally unbiased.

Linear regression: OLS Variance
    V(β̂ | X) = V(β̂ − β | X) = V((X'X)^{-1} X'ɛ | X) = (X'X)^{-1} X' V(ɛ | X) X (X'X)^{-1}.
We used the property V(AX) = A V(X) A'. On the other hand,
    E(ɛ_i² | X) = E(ɛ_i² | x_i) = σ²,
and for i ≠ j,
    E(ɛ_i ɛ_j | X) = E(ɛ_i E(ɛ_j | X, ɛ_i) | X) = E(ɛ_i E(ɛ_j | X) | X) = E(ɛ_i | X) E(ɛ_j | X) = 0.
So E(ɛɛ' | X) = σ² I_n, and therefore V(β̂ | X) = σ² (X'X)^{-1}.
Recall that for the affine (bivariate) model we estimate 2 coefficients, and we found that the statistic s² = e'e/(n − 2) is an unbiased and consistent estimator of σ². Using the same methodology, we can show that s² = e'e/(n − d) is also an unbiased and consistent estimator of σ² for a model of dimension d.

Linear regression: Consistency of OLS
Suppose that {(y_i, x_i)}_{i=1}^n is i.i.d. from y_i = x_i'β + ɛ_i and
    E(y_i²) < ∞, E(x_i'x_i) < ∞,
    E(x_i x_i') is non-singular,
    E(ɛ_i | x_i) = 0, E(ɛ_i² | x_i) = σ²,
    E(ɛ_i⁴) < ∞, E(x_{ij}⁴) < ∞, j = 1, ..., d.
Then
    β̂ = (X'X)^{-1} X'Y
is a consistent estimator of β.
Proof. We have
    β̂ = (X'X)^{-1} X'Y
       = [(1/n) Σ_{i=1}^n x_i x_i']^{-1} (1/n) Σ_{i=1}^n x_i y_i
       = [(1/n) Σ_{i=1}^n x_i x_i']^{-1} (1/n) Σ_{i=1}^n x_i (x_i'β + ɛ_i)
       = β + [(1/n) Σ_{i=1}^n x_i x_i']^{-1} (1/n) Σ_{i=1}^n x_i ɛ_i,
so β̂ →_P β provided [(1/n) Σ_{i=1}^n x_i x_i']^{-1} (1/n) Σ_{i=1}^n x_i ɛ_i →_P 0, which holds if (1/n) Σ_{i=1}^n x_i ɛ_i →_P 0.
We have E(x_i'x_i) < ∞, so by the CMT,
    [(1/n) Σ_{i=1}^n x_i x_i']^{-1} →_P (E(x_i x_i'))^{-1}.
And
    |E(x_{ij} ɛ_i)| ≤ E(|x_{ij} ɛ_i|) ≤ (E(x_{ij}²))^{1/2} (E(ɛ_i²))^{1/2} ≤ (E(x_i'x_i))^{1/2} σ < ∞.
So from the WLLN, (1/n) Σ_{i=1}^n x_i ɛ_i →_P E(x_i ɛ_i) = 0, because E(ɛ_i | x_i) = 0. Using the CMT, β̂ →_P β.

Linear regression: Asymptotic Properties of OLS
Under the previous assumptions, we have
    √n (β̂ − β) →_D N(0, σ² [E(x_i x_i')]^{-1}).
Proof. Since {(y_i, x_i)}_{i=1}^n is i.i.d., {x_i ɛ_i}_{i=1}^n is also i.i.d., with E(x_i ɛ_i) = 0. Furthermore,
    |E(ɛ_i² x_{ij} x_{im})| ≤ E(|ɛ_i² x_{ij} x_{im}|) ≤ (E(ɛ_i⁴))^{1/2} ((E(x_{ij}⁴))^{1/2} (E(x_{im}⁴))^{1/2})^{1/2} < ∞,
so V(x_i ɛ_i) is finite and well defined. Furthermore,
    V(x_i ɛ_i) = E(ɛ_i² x_i x_i') = σ² E(x_i x_i').
By the CLT,
    (1/√n) Σ_{i=1}^n (x_i ɛ_i − E(x_i ɛ_i)) = (1/√n) Σ_{i=1}^n x_i ɛ_i →_D N(0, σ² E(x_i x_i')).
Applying Slutsky's theorem, we obtain the result.
Lemma. Under the model assumptions, for all a ∈ R^d,
    √n a'(β̂ − β) / √(s² a'(X'X/n)^{-1} a) →_D N(0, 1).
Proof. As s² = e'e/(n − d) →_P σ², from the CMT,
    s² (X'X/n)^{-1} →_P σ² [E(x_i x_i')]^{-1}.
From the CMT and the asymptotic normality above,
    √n a'(β̂ − β) →_D N(0, σ² a'[E(x_i x_i')]^{-1} a)
and
    √(σ² a'[E(x_i x_i')]^{-1} a) / √(s² a'(X'X/n)^{-1} a) →_P 1.
The result follows from Slutsky's theorem.

Linear regression: Null Hypothesis test
Suppose H_0: a'β = c_0 is true. Then
    √n (a'β̂ − c_0) / √(s² a'(X'X/n)^{-1} a) →_D N(0, 1).
Test with significance level α: We reject H_0: a'β = c_0 and accept H_1: a'β ≠ c_0 if
    |√n (a'β̂ − c_0) / √(s² a'(X'X/n)^{-1} a)| > q,
where q is defined such that P(|Z| > q) = α, Z ~ N(0, 1).

Linear regression: Confidence Interval set
We cannot reject H_0: a'β = c_0 if
    a'β̂ − q √(s² a'(X'X)^{-1} a) ≤ c_0 ≤ a'β̂ + q √(s² a'(X'X)^{-1} a).
Exercise: Consider c_0 = 0 and a = (1, 0, ..., 0)'. Interpret the result.

Linear regression: Heteroskedastic Model
The model: {(y_i, x_i)}_{i=1}^n is an i.i.d. sample from
    y_i = x_i'β + ɛ_i,  E(ɛ_i | x_i) = 0,
    V(ɛ | X) = E(ɛɛ' | X) = diag(σ²(x_1), ..., σ²(x_i), ..., σ²(x_n)) = Ω, where σ²(x_i) = E(ɛ_i² | x_i),
    E(y_i²) < ∞, E(x_i'x_i) < ∞,
    E(x_i x_i') is non-singular,
    E(ɛ_i⁴) < ∞, E(x_{ij}⁴) < ∞, j = 1, ..., d.
One can show (Exercise) that
    V(β̂ | X) = (X'X)^{-1} X'ΩX (X'X)^{-1},
    E(β̂ | X) = E(β̂) = β,
    √n (β̂ − β) →_D N(0, [E(x_i x_i')]^{-1} E(σ²(x_i) x_i x_i') [E(x_i x_i')]^{-1}).
Exercise: Derive an estimator of [E(x_i x_i')]^{-1} E(σ²(x_i) x_i x_i') [E(x_i x_i')]^{-1} and a confidence interval for a coefficient β_j.
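For the exercise, one common (though not the only) answer is the heteroskedasticity-robust, White-type sandwich estimator, which replaces σ²(x_i) by the squared residual e_i². The sketch below is one possible implementation under that assumption, run on simulated heteroskedastic data of my own making.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
n = 500
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta_true = np.array([1.0, 2.0])
sigma_x = 0.5 + np.abs(X[:, 1])                    # error std. dev. depends on x_i
y = X @ beta_true + rng.normal(size=n) * sigma_x

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
e = y - X @ beta_hat

# Sandwich estimator: (X'X)^{-1} (sum_i e_i^2 x_i x_i') (X'X)^{-1}
XtX_inv = np.linalg.inv(X.T @ X)
meat = (X * (e ** 2)[:, None]).T @ X
V_robust = XtX_inv @ meat @ XtX_inv
se = np.sqrt(np.diag(V_robust))

z = stats.norm.ppf(0.975)
for j, (b, s_) in enumerate(zip(beta_hat, se)):
    print(f"beta_{j}: {b:.3f}  robust 95% CI: [{b - z * s_:.3f}, {b + z * s_:.3f}]")
```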

Linear regression: Generalized Least Squares estimators
The model: {(y_i, x_i)}_{i=1}^n is an i.i.d. sample from
    y_i = x_i'β + ɛ_i,  E(ɛ_i | x_i) = 0,
    V(ɛ | X) = E(ɛɛ' | X) = Ω > 0,
    E(y_i²) < ∞, E(x_i'x_i) < ∞,
    E(x_i x_i') is non-singular,
    E(ɛ_i⁴) < ∞, E(x_{ij}⁴) < ∞, j = 1, ..., d.
The GLS estimator, defined by
    β̃ = (X'Ω^{-1}X)^{-1} X'Ω^{-1}Y,
is the best linear unbiased and consistent estimator.
Proof.
    Decompose Ω^{-1} = C'C.
    Define the transformed variables Ỹ = CY, X̃ = CX, and ɛ̃ = Cɛ.
    Use the following lemma:
Lemma 3 (Gauss-Markov). The OLS estimator is the Best Linear Unbiased Estimator (BLUE) under the linear model with E(ɛ | X) = 0 and V(ɛ | X) = σ² I_n (homoskedastic, uncorrelated errors).
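The proof outline translates directly into code: factor Ω^{-1} = C'C (a Cholesky-type factor works; for a diagonal Ω the factor is explicit), transform the data, and run OLS on the transformed variables. The sketch below, on simulated data with a known diagonal Ω of my choosing, checks that the transformed-OLS route matches the direct GLS formula.

```python
import numpy as np

rng = np.random.default_rng(6)
n = 400
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta_true = np.array([1.0, -2.0])
omega_diag = rng.uniform(0.5, 3.0, size=n)         # known heteroskedastic variances
y = X @ beta_true + rng.normal(size=n) * np.sqrt(omega_diag)

Omega_inv = np.diag(1.0 / omega_diag)

# Direct GLS: (X' Omega^{-1} X)^{-1} X' Omega^{-1} Y
beta_gls = np.linalg.solve(X.T @ Omega_inv @ X, X.T @ Omega_inv @ y)

# Transformed OLS: Omega^{-1} = C'C with C = diag(1/sqrt(omega_i)); regress CY on CX
C = np.diag(1.0 / np.sqrt(omega_diag))
Xt, yt = C @ X, C @ y
beta_tilde = np.linalg.solve(Xt.T @ Xt, Xt.T @ yt)

print(beta_gls.round(4))
print(beta_tilde.round(4))                         # identical up to rounding
```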

References
[1] Kallenberg, O., "Foundations of Modern Probability", New York, Springer-Verlag, 1997.
[2] Rosenkrantz, W. A., "Probability and Statistics for Science, Engineering, and Finance", Chapman & Hall, 2009.
[3] Wooldridge, J., "Econometric Analysis of Cross Section and Panel Data", MIT Press, 2002.