Basic Econometrics - review, Jerzy Mycielski
Model
Linear equation:
y_i = x_{1i}\beta_1 + x_{2i}\beta_2 + \dots + x_{Ki}\beta_K + \varepsilon_i, for i = 1, \dots, N.
Elements:
- dependent (endogenous) variable y_i
- independent (exogenous) variables x_{1i}, \dots, x_{Ki}
- parameters \beta_1, \dots, \beta_K
- error term \varepsilon_i
Estimated equation
Similar to the model:
\hat{y}_i = x_{1i}b_1 + x_{2i}b_2 + \dots + x_{Ki}b_K.
Differences:
- fitted values \hat{y}_i instead of the dependent variable y_i
- estimates b_1, \dots, b_K instead of parameters \beta_1, \dots, \beta_K
- residuals e_i instead of error terms \varepsilon_i
Parameters are nonrandom but estimates are random.
Fitted regression line (simulated data)
Same model, different sample
Residual and the fit of the model
Definition of the residual:
e_i = y_i - x_{1i}b_1 - x_{2i}b_2 - \dots - x_{Ki}b_K = y_i - \hat{y}_i.
Therefore
y_i = \hat{y}_i + e_i = x_{1i}b_1 + x_{2i}b_2 + \dots + x_{Ki}b_K + e_i.
Fitting the regression line (figure, simulated data: y(t) = 8 + 1.5 x(t) + e(t), e(t) ~ N(0, 2); the vertical distance y_i - \hat{y}_i is the residual e_i)
Minimization problem
The fit is best if the sum of squared residuals is as small as possible:
\min_b S(b) = \min_b \sum_{i=1}^{N} (y_i - \hat{y}_i)^2 = \min_b \sum_{i=1}^{N} e_i^2.
The solution of this minimization problem gives the formula for the OLS estimator b.
This also explains why the estimator is called the Least Squares estimator.
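A minimal numerical sketch of this idea (the simulated data and the use of scipy's general-purpose minimizer are illustrative assumptions, not part of the slides): minimizing S(b) directly recovers essentially the same coefficients as the closed-form OLS solution derived below.

    import numpy as np
    from scipy.optimize import minimize

    # Hypothetical simulated data: y = 8 + 1.5*x + error, error ~ N(0, 2)
    rng = np.random.default_rng(0)
    x = rng.uniform(0.5, 7.5, size=100)
    X = np.column_stack([np.ones_like(x), x])        # regressors: constant and x
    y = X @ np.array([8.0, 1.5]) + rng.normal(0.0, 2.0, size=100)

    def S(b):
        # Sum of squared residuals S(b) = sum_i (y_i - x_i b)^2 = e'e
        e = y - X @ b
        return e @ e

    result = minimize(S, x0=np.zeros(2))             # numerical minimization of S(b)
    print(result.x)                                  # approximately (8, 1.5)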
Matrix formulation of the model
\begin{bmatrix} y_1 \\ \vdots \\ y_N \end{bmatrix} = \begin{bmatrix} x_{11} & \cdots & x_{K1} \\ \vdots & & \vdots \\ x_{1N} & \cdots & x_{KN} \end{bmatrix} \begin{bmatrix} \beta_1 \\ \vdots \\ \beta_K \end{bmatrix} + \begin{bmatrix} \varepsilon_1 \\ \vdots \\ \varepsilon_N \end{bmatrix},
with the blocks denoted y, X, \beta and \varepsilon respectively. Therefore we can write:
y = X\beta + \varepsilon.
Similarly, e = y - Xb = y - \hat{y}.
We sometimes also use the notation
y_i = \begin{bmatrix} x_{1i} & \cdots & x_{Ki} \end{bmatrix} \begin{bmatrix} \beta_1 \\ \vdots \\ \beta_K \end{bmatrix} + \varepsilon_i,
with the row vector denoted x_i, so that y_i = x_i \beta + \varepsilon_i, for i = 1, \dots, N.
Solution of the Least Squares problem
First-order derivative of S(b) w.r.t. b:
\frac{\partial S(b)}{\partial b} = -2X'y + 2X'Xb.
First-order conditions (system of normal equations):
X'Xb = X'y.
Solution (OLS estimator):
b = (X'X)^{-1} X'y.
But: the matrix X'X has to be invertible (if it is not, we have perfect collinearity).
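A minimal numpy sketch of the closed-form estimator (the simulated data, parameter values and variable names are illustrative assumptions); solving the normal equations X'Xb = X'y is numerically preferable to forming the inverse explicitly. The later sketches in this review continue from these objects.

    import numpy as np

    rng = np.random.default_rng(1)
    N, K = 200, 3
    X = np.column_stack([np.ones(N), rng.normal(size=(N, K - 1))])  # regressors incl. a constant
    beta = np.array([1.0, 2.0, -0.5])                               # "true" parameters (hypothetical)
    y = X @ beta + rng.normal(size=N)                               # simulated dependent variable

    # OLS estimator b = (X'X)^{-1} X'y, computed via the normal equations
    b = np.linalg.solve(X.T @ X, X.T @ y)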
R^2
Total Sum of Squares:
TSS = \sum_{i=1}^{N} (y_i - \bar{y})^2 = (y - \bar{y})'(y - \bar{y}).
Explained Sum of Squares:
ESS = \sum_{i=1}^{N} (\hat{y}_i - \bar{y})^2 = (\hat{y} - \bar{y})'(\hat{y} - \bar{y}).
Residual Sum of Squares:
RSS = \sum_{i=1}^{N} e_i^2 = e'e.
It can be proven that TSS = ESS + RSS.
R^2
So we can define:
R^2 = ESS / TSS = (explained variation) / (total variation), with 0 \le R^2 \le 1.
R^2 can be interpreted as the percentage of the total variation of the dependent variable that is explained by the model.
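Continuing the numpy sketch above (X, y, b as defined there), the sums of squares and R^2 can be computed as follows; note that the identity TSS = ESS + RSS relies on the regression including a constant term.

    y_hat = X @ b                          # fitted values
    e = y - y_hat                          # residuals
    TSS = np.sum((y - y.mean()) ** 2)      # total sum of squares
    ESS = np.sum((y_hat - y.mean()) ** 2)  # explained sum of squares
    RSS = np.sum(e ** 2)                   # residual sum of squares
    R2 = ESS / TSS                         # equals 1 - RSS/TSS here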
Dummy variables
A dummy variable can only take the values 0 or 1.
Define the model
y_i = \beta_1 x_{1i} + \dots + \beta_K x_{Ki} + \gamma D_i + \varepsilon_i.
For an observation i with D_i = 0:
y_i = \beta_1 x_{1i} + \dots + \beta_K x_{Ki} + \varepsilon_i.
For an observation j with D_j = 1:
y_j = \beta_1 x_{1j} + \dots + \beta_K x_{Kj} + \gamma + \varepsilon_j.
So, for identical values of the remaining regressors, the difference between the expected values of y_j and y_i is equal to
E(y_j) - E(y_i) = \gamma.
Dummy variables - general case
With S categories and dummies D_{2,i}, \dots, D_{S,i}, the model is
y_i = x_i \beta + \gamma_0 + \sum_{s=2}^{S} D_{s,i} \gamma_s + \varepsilon_i.
For an observation i in the base category (z_i = 1) and an observation j in category s (z_j = s), with identical x:
E(y_j) - E(y_i) = x\beta + \gamma_0 + \gamma_s - x\beta - \gamma_0 = \gamma_s.
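A small illustrative sketch (hypothetical data and coefficient values) of how a 0/1 dummy shifts the expected value of y by gamma while the slope coefficients are shared by both groups:

    import numpy as np

    rng = np.random.default_rng(2)
    n = 500
    x1 = rng.normal(size=n)
    D = (rng.uniform(size=n) < 0.5).astype(float)          # dummy variable: 0 or 1
    y_d = 1.0 + 0.8 * x1 + 2.0 * D + rng.normal(size=n)    # gamma = 2.0 (hypothetical)

    Z = np.column_stack([np.ones(n), x1, D])
    coef = np.linalg.solve(Z.T @ Z, Z.T @ y_d)
    # coef[2] estimates gamma: the expected difference in y between the D = 1
    # and D = 0 groups, holding x1 fixed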
Classical regression model - assumptions
1. The model is linear:
y_i = x_{1i}\beta_1 + \dots + x_{Ki}\beta_K + \varepsilon_i for i = 1, \dots, N,
or: y = X\beta + \varepsilon.
2. The explanatory variables x_{1i}, \dots, x_{Ki} are nonrandom for i = 1, \dots, N.
3. The expected value of the error term is equal to zero:
E(\varepsilon_i) = 0 for i = 1, \dots, N,
or: E(\varepsilon) = 0.
Classical regression model - assumptions
4. The covariance (correlation) between two different error terms is equal to zero:
Cov(\varepsilon_i, \varepsilon_j) = 0 for i \neq j
(absence of autocorrelation).
5. The variance is the same for all observations (homoscedasticity):
Var(\varepsilon_i) = \sigma^2 for i = 1, \dots, N.
The last two assumptions can be jointly formulated as Var(\varepsilon) = \sigma^2 I.
Properties of the OLS estimator in the CRM
The OLS estimator is unbiased:
E(b) = E\left[(X'X)^{-1}X'X\beta + (X'X)^{-1}X'\varepsilon\right] = \beta + (X'X)^{-1}X' \underbrace{E(\varepsilon)}_{0} = \beta.
The variance of the OLS estimator is equal to
Var(b) = Var\left(\beta + (X'X)^{-1}X'\varepsilon\right) = (X'X)^{-1}X' \underbrace{Var(\varepsilon)}_{\sigma^2 I} X(X'X)^{-1} = \sigma^2 (X'X)^{-1}X' I X (X'X)^{-1} = \sigma^2 (X'X)^{-1}.
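A Monte Carlo sketch of the unbiasedness result (continuing with X, beta, N, K and rng from the earlier sketch; the number of replications is an arbitrary illustrative choice): redrawing the error term while keeping X fixed and averaging b over the replications gives a value close to beta.

    B = 5000
    b_draws = np.empty((B, K))
    for r in range(B):
        eps = rng.normal(size=N)                            # errors with E(eps) = 0, Var(eps) = sigma^2 I
        y_sim = X @ beta + eps
        b_draws[r] = np.linalg.solve(X.T @ X, X.T @ y_sim)
    print(b_draws.mean(axis=0))                             # approximately equal to beta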
Properties of the OLS estimator in the CRM
It can be proven that
s^2 = \frac{e'e}{N - K} = \frac{\sum_{i=1}^{N} e_i^2}{N - K}
is an unbiased estimator of \sigma^2.
So Var(b) can be estimated with
\widehat{Var}(b) = s^2 (X'X)^{-1}.
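Continuing the numpy sketch (X, e, N, K as defined there), s^2 and the estimated covariance matrix of b can be computed as follows; the square roots of its diagonal are the standard errors used below.

    s2 = (e @ e) / (N - K)                   # unbiased estimator of sigma^2
    var_b = s2 * np.linalg.inv(X.T @ X)      # estimated Var(b) = s^2 (X'X)^{-1}
    se_b = np.sqrt(np.diag(var_b))           # standard errors of b_1, ..., b_K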
Theorem (Gauss-Markov)
Under the assumptions of the CRM, the OLS estimator is the best linear unbiased estimator (BLUE).
Hypothesis testing
An additional assumption of normality of the error term, \varepsilon \sim N(0, \sigma^2 I), is needed to derive the distributions of the test statistics.
Simple hypothesis:
H_0: \beta_k = \beta_k^*   vs   H_1: \beta_k \neq \beta_k^*.
Test statistic:
t = \frac{b_k - \beta_k^*}{\widehat{se}(b_k)} \sim t_{N-K}.
The most popular case is testing the significance of a variable:
H_0: \beta_k = 0   vs   H_1: \beta_k \neq 0.
Indeed, if \beta_k = 0 then the variable is redundant in our model:
y_i = \beta_1 x_{1i} + \dots + \underbrace{\beta_k}_{0} x_{ki} + \dots + \beta_K x_{Ki} + \varepsilon_i.
The statistic is then
t = \frac{b_k}{\widehat{se}(b_k)}.
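Continuing the sketch (b, se_b, N, K as defined above), the significance test for a single coefficient, here the hypothetical index k = 1, can be computed as:

    from scipy import stats

    k = 1                                                 # coefficient being tested (illustrative)
    t_stat = b[k] / se_b[k]                               # t = (b_k - 0) / se(b_k)
    p_value = 2 * stats.t.sf(abs(t_stat), df=N - K)       # two-sided p-value from the t(N-K) distribution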
Hypothesis testing - general case: joint hypothesis
H_0: H\beta = h.
Statistic:
F = \frac{(e_R' e_R - e'e)/g}{e'e/(N-K)} \sim F(g, N-K),
where e_R are the residuals of the restricted model (the model estimated under the assumption that H_0 is true) and g is the number of restrictions.
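Continuing the sketch, a joint test that all slope coefficients are zero (a hypothetical restriction, expressible as H\beta = h) compares the restricted and unrestricted residual sums of squares:

    X_R = X[:, :1]                                   # restricted model: constant only (illustrative)
    b_R = np.linalg.solve(X_R.T @ X_R, X_R.T @ y)
    e_R = y - X_R @ b_R

    g = K - 1                                        # number of restrictions
    F = ((e_R @ e_R - e @ e) / g) / ((e @ e) / (N - K))
    p_value_F = stats.f.sf(F, g, N - K)              # p-value from the F(g, N-K) distribution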
Hypothesis testing - confidence intervals
Pr\left( b_k - \widehat{se}(b_k)\, t_{\alpha/2} < \beta_k < b_k + \widehat{se}(b_k)\, t_{\alpha/2} \right) = 1 - \alpha,
where t_{\alpha/2} is the critical value of the t_{N-K} distribution.
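Continuing the sketch, a (1 - alpha) confidence interval for \beta_k is obtained from the critical value of the t(N-K) distribution:

    alpha = 0.05
    t_crit = stats.t.ppf(1 - alpha / 2, df=N - K)    # critical value t_{alpha/2}
    ci = (b[k] - se_b[k] * t_crit, b[k] + se_b[k] * t_crit)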