Lecture Notes Autumn 2010 Dr. Getinet Haile, University of Mannheim
1. Introduction
What is econometrics? Econometrics unifies economic statistics, economic theory and mathematics¹

Important notions:
- Data are perceived as realizations of random variables
- Parameters are real numbers, not random variables
- Joint distributions of random variables depend on parameters
- A model is a set of restrictions on the joint distribution of variables

¹ according to Ragnar Frisch
Motivating Example: Mincer equation - the influence of schooling on wages

ln(WAGE_i) = β_1 + β_2 S_i + β_3 TENURE_i + β_4 EXPR_i + ε_i

Notation:
- ln(WAGE_i): logarithm of the wage rate
- S_i: years of schooling
- TENURE_i: experience in the current job
- EXPR_i: experience in the labor market

Estimation of the parameters β_k, where β_2 is the return to schooling
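A minimal simulation sketch in Python of estimating the Mincer equation by OLS; the sample size, regressor distributions and "true" coefficient values are assumptions made up for the example, not estimates from real data.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000

# Simulated regressors (all distributions are illustrative assumptions)
S = rng.integers(8, 19, n)          # years of schooling
TENURE = rng.uniform(0, 15, n)      # years in the current job
EXPR = rng.uniform(0, 30, n)        # years in the labor market
eps = rng.normal(0, 0.3, n)

# Assumed "true" parameters; beta[1] plays the role of the return to schooling
beta = np.array([1.0, 0.08, 0.02, 0.01])
ln_wage = beta[0] + beta[1] * S + beta[2] * TENURE + beta[3] * EXPR + eps

X = np.column_stack([np.ones(n), S, TENURE, EXPR])
b, *_ = np.linalg.lstsq(X, ln_wage, rcond=None)   # OLS fit
print(b)  # estimates should be close to (1.0, 0.08, 0.02, 0.01)
```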
The importance of relationships²

Relationships among variables are what empirical analysis is all about. Consider a dependent variable y_i and a vector of K explanatory variables x_i. Let r_i = (y_i, x_i')', i = 1, ..., N, be i.i.d. Our interest is in the relationship between y_i and the explanatory variables, with the ultimate agenda of:

² this section draws from Angrist, 2009
- Description: How does y_i usually vary with x_i?
- Prediction: Can we use x_i to forecast y_i?
- Causality: What is the effect of elements of x_i on y_i?

Generally we look for relationships that hold on average. On-average relationships among variables are summarised by the Conditional Expectation Function (CEF):

E[y_i | x_i] ≡ h(x_i)
y_i = h(x_i) + ε_i with E[ε_i | x_i] = 0
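The CEF decomposition can be illustrated by simulation; a minimal sketch, assuming h(x) = exp(x) and standard normal noise, where binned conditional means of y_i recover h(x):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(0, 2, 100_000)
y = np.exp(x) + rng.normal(0, 1, len(x))   # assumed CEF: h(x) = exp(x)

# Approximate E[y | x] by averaging y within narrow bins of x
bins = np.linspace(0, 2, 21)
idx = np.digitize(x, bins)
cef_hat = np.array([y[idx == j].mean() for j in range(1, len(bins))])
mid = (bins[:-1] + bins[1:]) / 2
print(np.max(np.abs(cef_hat - np.exp(mid))))  # small: binned means track h(x)
```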
General regression equations

Generalization: y_i = β_1 x_i1 + β_2 x_i2 + ... + β_K x_iK + ε_i

Index for observations i = 1, 2, ..., n and regressors k = 1, 2, ..., K

y_i = x_i'β + ε_i
(1×1) = (1×K)(K×1) + (1×1)

where β = (β_1, β_2, ..., β_K)' and x_i = (x_i1, x_i2, ..., x_iK)' are K×1 column vectors
The key problem of econometrics: We deal with non-experimental data

Unobservable variables, interdependence, endogeneity, causality

Examples:
- Ability bias in the Mincer equation
- Reverse causality problem if unemployment is regressed on a liberalization index
- The causal effect of police force on crime is hard to isolate because the size of the police force is itself an outcome, not independently assigned
- Simultaneity problem in a demand-price equation
2. The CLRM: Parameter Estimation by OLS    Hayashi p. 6/15-18
Classical linear regression model (CLRM)

y_i = β_1 x_i1 + β_2 x_i2 + ... + β_K x_iK + ε_i = x_i'β + ε_i
                                                  (1×K)(K×1)

- y_i: dependent variable, observed
- x_i = (x_i1, x_i2, ..., x_iK)': explanatory variables, observed
- β = (β_1, β_2, ..., β_K)': unknown parameters
- ε_i: disturbance component, unobserved
- b = (b_1, b_2, ..., b_K)': estimator of β
- e_i = y_i − x_i'b: estimated residual
For convenience we introduce matrix notation:

y = Xβ + ε
(n×1) = (n×K)(K×1) + (n×1)

where y = (y_1, y_2, ..., y_n)' stacks the observations on the dependent variable, X is the n×K data matrix whose i-th row is x_i' (with first column equal to 1 for the constant), β = (β_1, β_2, ..., β_K)' and ε = (ε_1, ε_2, ..., ε_n)'.
Writing it out extensively: a system of n linear equations

y_1 = β_1 + β_2 x_12 + ... + β_K x_1K + ε_1
y_2 = β_1 + β_2 x_22 + ... + β_K x_2K + ε_2
  ⋮
y_n = β_1 + β_2 x_n2 + ... + β_K x_nK + ε_n
We estimate the linear model and choose b such that the SSR is minimized

Obtain an estimator b of β by minimizing the SSR (sum of squared residuals):

argmin_{b} S(b) = argmin_{b} Σ_{i=1}^n e_i² = argmin_{b} Σ_{i=1}^n (y_i − x_i'b)²

Differentiating with respect to b_1, b_2, ..., b_K gives the first-order conditions (FOCs):

(1) ∂S(b)/∂b_1 = 0  ⇒  Σ e_i = 0
(2) ∂S(b)/∂b_2 = 0  ⇒  Σ e_i x_i2 = 0
  ⋮
(K) ∂S(b)/∂b_K = 0  ⇒  Σ e_i x_iK = 0

The FOCs can be conveniently written in matrix notation: X'e = 0
The system of K equations is solved by matrix algebra

X'e = X'(y − Xb) = X'y − X'Xb = 0

Premultiplying by (X'X)⁻¹:

(X'X)⁻¹X'y − (X'X)⁻¹X'Xb = 0
(X'X)⁻¹X'y − Ib = 0

OLS estimator: b = (X'X)⁻¹X'y

Alternatively: b = ((1/n)X'X)⁻¹ (1/n)X'y = ((1/n) Σ_{i=1}^n x_i x_i')⁻¹ (1/n) Σ_{i=1}^n x_i y_i
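A minimal numpy sketch of these formulas on simulated data (the design and coefficients are assumptions for the example); it solves the normal equations, checks the sample-moment form of b, and verifies the FOC X'e = 0:

```python
import numpy as np

rng = np.random.default_rng(2)
n, K = 200, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, K - 1))])
beta = np.array([1.0, 2.0, -0.5])          # assumed "true" parameters
y = X @ beta + rng.normal(size=n)

# Normal equations: b = (X'X)^{-1} X'y  (solve() is preferred to an explicit inverse)
b = np.linalg.solve(X.T @ X, X.T @ y)

# Equivalent sample-moment form: ((1/n) sum x_i x_i')^{-1} (1/n) sum x_i y_i
Sxx = sum(np.outer(xi, xi) for xi in X) / n
Sxy = sum(xi * yi for xi, yi in zip(X, y)) / n
print(np.allclose(b, np.linalg.solve(Sxx, Sxy)))  # True

# FOC check: X'e = 0 at the minimizer
e = y - X @ b
print(np.allclose(X.T @ e, 0))  # True (up to floating-point error)
```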
Zoom into the matrices X'X and X'y

b = ((1/n) Σ_{i=1}^n x_i x_i')⁻¹ (1/n) Σ_{i=1}^n x_i y_i

Σ_{i=1}^n x_i x_i' is the K×K matrix with typical element Σ_i x_ik x_il:

⎡ Σ x_i1²       Σ x_i1 x_i2  ...  Σ x_i1 x_iK ⎤
⎢ Σ x_i1 x_i2   Σ x_i2²      ...  Σ x_i2 x_iK ⎥
⎢      ⋮             ⋮                  ⋮     ⎥
⎣ Σ x_i1 x_iK   Σ x_i2 x_iK  ...  Σ x_iK²     ⎦

Σ_{i=1}^n x_i y_i is the K×1 vector (Σ x_i1 y_i, Σ x_i2 y_i, Σ x_i3 y_i, ..., Σ x_iK y_i)'.
3. Assumptions of the CLRM    Hayashi p. 3-13
The four core assumptions of the CLRM

1.1 Linearity: y_i = x_i'β + ε_i

1.2 Strict exogeneity: E(ε_i | X) = 0
    ⇒ E(ε_i) = 0 and Cov(ε_i, x_jk) = E(ε_i x_jk) = 0 for all i, j, k

1.3 No exact multicollinearity: P(rank(X) = K) = 1
    No linear dependencies in the data matrix

1.4 Spherical disturbances: Var(ε_i | X) = E(ε_i² | X) = σ²
    Cov(ε_i, ε_j | X) = E(ε_i ε_j | X) = 0 for i ≠ j
    ⇒ E(ε_i²) = σ² and Cov(ε_i, ε_j) = 0 by the LTE (see Hayashi p. 18)
Interpreting the parameters β of different types of linear equations

- Linear model y_i = β_1 + β_2 x_i2 + ... + β_K x_iK + ε_i:
  A one-unit increase in the independent variable x_ik increases the dependent variable by β_k units

- Semi-log form log(y_i) = β_1 + β_2 x_i2 + ... + β_K x_iK + ε_i:
  A one-unit increase in the independent variable increases the dependent variable by approximately 100·β_k percent

- Log-linear model log(y_i) = β_1 log(x_i1) + β_2 log(x_i2) + ... + β_K log(x_iK) + ε_i:
  A one-percent increase in x_ik increases the dependent variable y_i by approximately β_k percent
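A quick worked example of the semi-log case: with b_2 = 0.08 in the Mincer wage equation, one more year of schooling raises the wage by approximately 100·0.08 = 8 percent; the exact effect is 100·(e^0.08 − 1) ≈ 8.3 percent, so the approximation is good for small β_k.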
Some important laws

Law of Total Expectation (LTE): E_X[E_{Y|X}(Y | X)] = E_Y(Y)

Double Expectation Theorem (DET): E_X[E_{Y|X}(g(Y) | X)] = E_Y(g(Y))

Law of Iterated Expectations (LIE): E_{Z|X}[E_{Y|X,Z}(Y | X, Z) | X] = E_{Y|X}(Y | X)
Some important laws (continued)

Generalized DET: E_X[E_{Y|X}(g(X, Y) | X)] = E_{X,Y}(g(X, Y))

Linearity of conditional expectations: E_{Y|X}[g(X)Y | X] = g(X) E_{Y|X}[Y | X]
4. Finite sample properties of the OLS estimator    Hayashi p. 27-31
Finite sample properties of b = (X'X)⁻¹X'y

1. E(b) = β: unbiasedness of the estimator
   - Holds for any sample size
   - Holds under assumptions 1.1-1.3

2. Var(b | X) = σ²(X'X)⁻¹: conditional variance of b
   - The conditional variance depends on the data
   - Holds under assumptions 1.1-1.4

3. Var(β̂ | X) ≥ Var(b | X), where β̂ is any other linear unbiased estimator of β
   - Holds under assumptions 1.1-1.4
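A small Monte Carlo sketch, with an assumed fixed design and error variance, illustrating properties 1 and 2: across replications the mean of b is close to β and the sampling covariance of b is close to σ²(X'X)⁻¹.

```python
import numpy as np

rng = np.random.default_rng(3)
n, R = 50, 10_000
X = np.column_stack([np.ones(n), rng.normal(size=n)])  # fixed design across draws
beta = np.array([1.0, 2.0])                            # assumed "true" parameters
sigma = 1.5

bs = np.empty((R, 2))
for rep in range(R):
    y = X @ beta + rng.normal(0, sigma, n)
    bs[rep] = np.linalg.solve(X.T @ X, X.T @ y)

print(bs.mean(axis=0))                    # approx (1.0, 2.0): unbiasedness
print(np.cov(bs.T))                       # approx sigma^2 (X'X)^{-1}
print(sigma**2 * np.linalg.inv(X.T @ X))
```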
Some key results from mathematical statistics

Let z = (z_1, z_2, ..., z_n)' be an n×1 random vector and A a non-random m×n matrix. Then v = Az is a new m×1 random variable with

E(v) = (E(v_1), E(v_2), ..., E(v_m))' = A E(z)
(m×1)

Var(v) = A Var(z) A'
(m×m)
The OLS estimator's unbiasedness: E(b) = β, i.e. E(b − β) = 0

Sampling error:

b − β = (X'X)⁻¹X'y − β
      = (X'X)⁻¹X'(Xβ + ε) − β
      = (X'X)⁻¹X'Xβ + (X'X)⁻¹X'ε − β
      = β + (X'X)⁻¹X'ε − β
      = (X'X)⁻¹X'ε

E(b − β | X) = (X'X)⁻¹X' E(ε | X) = 0 under assumption 1.2

E(b) = E_X(E(b | X)) = E_X(β) = β by the LTE
We show that Var(b | X) = σ²(X'X)⁻¹

Note: β is non-random, b − β is the sampling error, A ≡ (X'X)⁻¹X', and Var(ε | X) = σ²I_n.

Var(b | X) = Var(b − β | X)
           = Var((X'X)⁻¹X'ε | X)
           = Var(Aε | X)
           = A Var(ε | X) A'
           = A σ²I_n A'
           = σ² A A'
           = σ² (X'X)⁻¹X'X(X'X)⁻¹
           = σ² (X'X)⁻¹
Sketch of the proof of the Gauss-Markov theorem: Var(β̂ | X) ≥ Var(b | X)

Let β̂ = Cy be any linear unbiased estimator, where C is a function of X. Define A ≡ (X'X)⁻¹X' and D ≡ C − A. Unbiasedness of β̂ requires DX = 0, so the cross terms below vanish:

Var(β̂ | X) = Var(β̂ − β | X)
            = Var[(D + A)ε | X]
            = (D + A) Var(ε | X) (D + A)'
            = σ²(D + A)(D + A)'
            = σ²(DD' + AD' + DA' + AA')
            = σ²[DD' + (X'X)⁻¹]
            ≥ σ²(X'X)⁻¹ = Var(b | X)

Details of the proof: Hayashi pages 29-30
The OLS estimator is BLUE

- OLS is the best estimator: Var(β̂ | X) ≥ Var(b | X) holds by the Gauss-Markov theorem
- OLS is linear: holds under assumption 1.1
- OLS is unbiased: holds under assumptions 1.1-1.3
5. Hypothesis Testing under Normality    Hayashi p. 33-45
Hypothesis testing

- Economic theory provides hypotheses about parameters
- If the theory is right, there are testable implications
- But: hypotheses cannot be tested without distributional assumptions about ε
- Distributional assumption: normality of the conditional distribution of the disturbances,
  ε | X ~ MVN(0, σ²I_n)  [Assumption 1.5]
Some facts from multivariate statistics

Vector of random variables: x = (x_1, x_2, ..., x_n)'

Expectation vector: E(x) = µ = (µ_1, µ_2, ..., µ_n)' = (E(x_1), E(x_2), ..., E(x_n))'

Variance-covariance matrix: Var(x) = Σ, the n×n matrix with Var(x_i) on the diagonal and Cov(x_i, x_j) in the off-diagonal positions

For y = c + Ax with c, A a non-random vector/matrix:
E(y) = (E(y_1), E(y_2), ..., E(y_n))' = c + Aµ and Var(y) = AΣA'

x ~ MVN(µ, Σ)  ⇒  y = c + Ax ~ MVN(c + Aµ, AΣA')
Application of the facts from multivariate statistics and assumptions 1.1-1.5

b − β = (X'X)⁻¹X'ε (sampling error)

Assuming ε | X ~ MVN(0, σ²I_n):

b − β | X ~ MVN((X'X)⁻¹X' E(ε | X), (X'X)⁻¹X' σ²I_n X(X'X)⁻¹)
b − β | X ~ MVN(0, σ²(X'X)⁻¹)

Note that Var(b | X) = σ²(X'X)⁻¹. The OLS estimator is conditionally normally distributed if ε | X is multivariate normal.
Testing hypotheses about individual parameters (t-test)

Null hypothesis H_0: β_k = β̄_k, where β̄_k is a hypothesized value, a real number
Alternative hypothesis H_A: β_k ≠ β̄_k

If H_0 is true, E(b_k) = β̄_k. Under assumption 1.5, ε | X ~ MVN(0, σ²I_n), and the test statistic (with known σ²) is

z_k = (b_k − β̄_k) / √(σ²[(X'X)⁻¹]_kk) ~ N(0, 1)

Note: [(X'X)⁻¹]_kk is the k-th row, k-th column element of (X'X)⁻¹
The nuisance parameter σ² can be estimated

σ² = E(ε_i² | X) = Var(ε_i | X) = E(ε_i²) = Var(ε_i)

We do not observe ε_i, so we use the estimated residual e_i = y_i − x_i'b:

σ̂² = (1/n) Σ_{i=1}^n (e_i − (1/n) Σ_{i=1}^n e_i)² = (1/n) Σ_{i=1}^n e_i² = (1/n) e'e

σ̂² is a biased estimator: E(σ̂² | X) = ((n − K)/n) σ²
An unbiased estimator of σ²

For s² = (1/(n − K)) Σ_{i=1}^n e_i² = (1/(n − K)) e'e we get an unbiased estimator:

E(s² | X) = (1/(n − K)) E(e'e | X) = σ² and E_X(E(s² | X)) = E(s²) = σ²

Using this provides an unbiased estimator of Var(b | X) = σ²(X'X)⁻¹:

V̂ar(b | X) = s²(X'X)⁻¹

t-statistic under H_0:

t_k = (b_k − β̄_k) / √([V̂ar(b | X)]_kk) = (b_k − β̄_k) / SE(b_k) ~ t(n − K)
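A minimal sketch computing s², standard errors, t-statistics and two-sided p-values on simulated data; the design and coefficients (one of them zero, so its t-statistic should be insignificant) are assumptions for the example.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
n = 100
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
K = X.shape[1]
y = X @ np.array([1.0, 0.5, 0.0]) + rng.normal(size=n)  # third coefficient is 0

b = np.linalg.solve(X.T @ X, X.T @ y)
e = y - X @ b
s2 = e @ e / (n - K)                       # unbiased estimate of sigma^2
var_b = s2 * np.linalg.inv(X.T @ X)        # estimated Var(b|X)
se = np.sqrt(np.diag(var_b))               # standard errors SE(b_k)

t_stats = b / se                           # H0: beta_k = 0 for each k
p_vals = 2 * stats.t.sf(np.abs(t_stats), df=n - K)
print(np.column_stack([b, se, t_stats, p_vals]))
```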
Decision rule for the t-test

1. H_0: β_k = β̄_k (often β̄_k = 0); H_A: β_k ≠ β̄_k
2. Given β̄_k, the OLS estimate b_k and s², compute t_k = (b_k − β̄_k)/SE(b_k)
3. Fix the significance level α of the two-sided test
4. Fix non-rejection and rejection regions, then decide

Remark:
√(σ²[(X'X)⁻¹]_kk): standard deviation of b_k | X
√(s²[(X'X)⁻¹]_kk): standard error of b_k | X
Testing joint hypotheses (F-test/Wald test)

Write the hypothesis as:

H_0:    R      β   =   r
     (#r×K) (K×1)   (#r×1)

R: a #r × K matrix of real numbers
r: a #r × 1 vector
#r: the number of restrictions

Replacing β = (β_1, β_2, ..., β_K)' by the estimator b = (b_1, b_2, ..., b_K)', we ask whether Rb ≈ r.
Definition of the F-test statistic

Properties of Rb under H_0:

E(Rb | X) = R E(b | X) = Rβ = r
Var(Rb | X) = R Var(b | X) R' = σ² R(X'X)⁻¹R'
Rb ~ MVN(Rβ, σ² R(X'X)⁻¹R')

Using an additional important fact from multivariate statistics:

z = (z_1, z_2, ..., z_m)' ~ MVN(µ, Ω)  ⇒  (z − µ)'Ω⁻¹(z − µ) ~ χ²(m)

Applying this result gives the Wald statistic:

(Rb − r)'[σ² R(X'X)⁻¹R']⁻¹(Rb − r) ~ χ²(#r)
Properties of the F-test statistic

Replace σ² by its unbiased estimate s² = (1/(n − K)) Σ_{i=1}^n e_i² = (1/(n − K)) e'e and divide by #r to get the F-ratio:

F = [(Rb − r)'[R(X'X)⁻¹R']⁻¹(Rb − r)/#r] / [(e'e)/(n − K)]
  = (Rb − r)'[R V̂ar(b | X) R']⁻¹(Rb − r)/#r
  ~ F(#r, n − K)

Note: the F-test is one-sided
Proof: see Hayashi p. 41
Decision rule for the F-test

1. Specify H_0 in the form Rβ = r and H_A: Rβ ≠ r.
2. Calculate the F-statistic.
3. Look up the entry in the table of the F-distribution for #r and n − K at the given significance level α.
4. The null is not rejected at significance level α if F is less than F_α(#r, n − K).
Alternative representation of the F-statistic

Minimization of the unrestricted sum of squared residuals:

min_b Σ_{i=1}^n (y_i − x_i'b)² ≡ SSR_U

Minimization of the restricted sum of squared residuals (subject to the restriction Rb̃ = r):

min_{b̃: Rb̃ = r} Σ_{i=1}^n (y_i − x_i'b̃)² ≡ SSR_R

F-ratio:

F = [(SSR_R − SSR_U)/#r] / [SSR_U/(n − K)]
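A sketch of both representations of the F-statistic on simulated data, testing the assumed restriction β_2 = β_3 (all numbers are made up for the example); the Wald form and the SSR form give the same value.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
n = 100
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
K = X.shape[1]
y = X @ np.array([1.0, 0.5, 0.5]) + rng.normal(size=n)  # H0 holds by construction

b = np.linalg.solve(X.T @ X, X.T @ y)
e = y - X @ b
s2 = e @ e / (n - K)

# Wald form of the F-ratio. H0: beta_2 = beta_3, i.e. R = [0, 1, -1], r = 0, #r = 1
R = np.array([[0.0, 1.0, -1.0]])
r = np.array([0.0])
m = R @ b - r
V = s2 * (R @ np.linalg.inv(X.T @ X) @ R.T)  # estimated Var(Rb|X)
F = (m @ np.linalg.solve(V, m)) / R.shape[0]

# SSR form: impose beta_2 = beta_3 by regressing y on (1, x2 + x3)
SSR_U = e @ e
Xr = np.column_stack([X[:, 0], X[:, 1] + X[:, 2]])
er = y - Xr @ np.linalg.solve(Xr.T @ Xr, Xr.T @ y)
SSR_R = er @ er
F_alt = ((SSR_R - SSR_U) / R.shape[0]) / (SSR_U / (n - K))

print(F, F_alt)                           # the two representations coincide
print(stats.f.sf(F, R.shape[0], n - K))   # one-sided p-value
```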
6. Confidence intervals and goodness of fit measures    Hayashi p. 38/20
Duality of t-test and confidence interval

Under H_0: β_k = β̄_k,

t_k = (b_k − β̄_k)/SE(b_k) ~ t(n − K)

Probability of non-rejection:

P(−t_{α/2}(n − K) ≤ t_k ≤ t_{α/2}(n − K)) = 1 − α

where −t_{α/2}(n − K) is the lower critical value, t_{α/2}(n − K) the upper critical value, t_k a random variable (the value of the test statistic) and 1 − α a fixed number. Rearranging:

P(b_k − SE(b_k) t_{α/2}(n − K) ≤ β_k ≤ b_k + SE(b_k) t_{α/2}(n − K)) = 1 − α
The confidence interval

Confidence interval for β_k:

P(b_k − SE(b_k) t_{α/2}(n − K) ≤ β_k ≤ b_k + SE(b_k) t_{α/2}(n − K)) = 1 − α

The confidence bounds are random variables!

b_k − SE(b_k) t_{α/2}(n − K): lower bound
b_k + SE(b_k) t_{α/2}(n − K): upper bound

Wrong interpretation: the true parameter β_k lies with probability 1 − α within the bounds of the confidence interval. Problem: the confidence bounds are not fixed; they are random!

H_0 is rejected at significance level α if the hypothesized value does not lie within the bounds of the 1 − α confidence interval.
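A minimal sketch computing the 1 − α confidence bounds on simulated data (the design and coefficient values are assumptions for the example):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(6)
n = 100
X = np.column_stack([np.ones(n), rng.normal(size=n)])
K = X.shape[1]
y = X @ np.array([1.0, 0.5]) + rng.normal(size=n)

b = np.linalg.solve(X.T @ X, X.T @ y)
e = y - X @ b
se = np.sqrt((e @ e / (n - K)) * np.diag(np.linalg.inv(X.T @ X)))

alpha = 0.05
tcrit = stats.t.ppf(1 - alpha / 2, df=n - K)          # t_{alpha/2}(n - K)
ci = np.column_stack([b - se * tcrit, b + se * tcrit])
print(ci)  # 95% interval per beta_k; H0 rejected iff the hypothesized value falls outside
```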
Coefficient of determination: uncentered R²

Measure of the variability of the dependent variable: Σ y_i² = y'y

Decomposition of y'y, where ŷ = Xb (the cross term vanishes because ŷ'e = b'X'e = 0):

y'y = (ŷ + e)'(ŷ + e) = ŷ'ŷ + 2ŷ'e + e'e = ŷ'ŷ + e'e

R²_uc ≡ 1 − e'e / y'y

A good model explains much, so the residual variation is very small compared to the explained variation.
Coefficient of determination: centered R² and R²_adj

Use the centered R² if there is a constant in the model (x_i1 = 1):

Σ_{i=1}^n (y_i − ȳ)² = Σ_{i=1}^n (ŷ_i − ȳ)² + Σ_{i=1}^n e_i²

R²_c ≡ 1 − Σ_{i=1}^n e_i² / Σ_{i=1}^n (y_i − ȳ)² = 1 − SSR/SST

Note that R²_uc and R²_c both lie in the interval [0, 1] but describe different models. They are not comparable!

R²_adj is constructed with a penalty for heavy parametrization:

R²_adj = 1 − [SSR/(n − K)] / [SST/(n − 1)] = 1 − [(n − 1)/(n − K)] · SSR/SST

The R²_adj is an accepted model selection criterion.
Alternative goodness of fit measures

Akaike criterion (AIC): log(SSR/n) + 2K/n

Schwarz criterion (SBC): log(SSR/n) + log(n)·K/n

Note: both criteria include a penalty term for heavy parametrization. Select the model with the smallest AIC/SBC.
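A sketch computing R²_c, R²_adj, AIC and SBC for one simulated model; the data-generating values are assumptions, with two of the four coefficients set to zero, so the criteria would favour a smaller candidate model.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=(n, 3))])
K = X.shape[1]
y = X @ np.array([1.0, 0.5, 0.0, 0.0]) + rng.normal(size=n)

b = np.linalg.solve(X.T @ X, X.T @ y)
e = y - X @ b
SSR = e @ e
SST = np.sum((y - y.mean()) ** 2)

R2 = 1 - SSR / SST                                # centered R^2 (constant included)
R2_adj = 1 - (SSR / (n - K)) / (SST / (n - 1))    # penalized for parametrization
AIC = np.log(SSR / n) + 2 * K / n
SBC = np.log(SSR / n) + np.log(n) * K / n
print(R2, R2_adj, AIC, SBC)  # smaller AIC/SBC preferred across candidate models
```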