Econometrics I. Andrea Beccarini. Summer 2011
2 Outline: Very brief review of statistical basics; Simple linear regression model (specification, point estimation, interval estimation, hypothesis tests, forecasting, maximum likelihood estimation); Multiple linear regression model; Violations of (some) model assumptions
3 Review of basic statistics Random experiment (Zufallsexperiment) Sample space (Ergebnismenge) Event (Ereignis) Set operations (Verknüpfungen von Ereignissen) Partition (Partition oder vollständige Zerlegung) 10
4 Probability (Wahrscheinlichkeit) Kolmogorov s axioms (Kolmogorovs Axiome) Conditional probability (bedingte Wahrscheinlichkeit) Total probability (Satz von der totalen Wahrscheinlichkeit) Bayes theorem (Satz von Bayes) Independence (Unabhängigkeit) 11
5 Random variables (Zufallsvariable) Definition and intuition Distribution function and quantile function (Verteilungsfunktion und Quantilfunktion) Discrete and continuous random variables (diskrete und stetige Zufallsvariable) Density function (Dichtefunktion) Expectation (Erwartungswert) Variance (Varianz) 12
6 Special discrete distributions, e.g. Bernoulli, binomial, Poisson, geometric, hypergeometric,... Special continuous distributions e.g. normal, standard normal distribution, exponential, Pareto, χ 2,F,t,... There are many more special distributions Which distribution can be used when? 13
7 Simple linear regression model Econometrics: Application of statistical methods to empirical research in economics Econometric problems: Specification of an appropriate model Estimation of the model (Schätzung) Hypothesis testing Forecasting (Prognose) 14
8 Economic model SPECIFICATION functional (A-assumptions) error term (B-assumptions) variables (C-assumptions) Econometric model ESTIMATION Estimated model HYPOTHESIS TESTS FORECASTING 15
9 Data Empirical research requires (high quality) data Often, collecting data is the main problem of empirical research There is no systematic approach Kinds of data: Time series data (Zeitreihendaten), cross sectional data (Querschnittsdaten), panel data (Paneldaten)
10 Specification Numeric illustration: Data of the gratuity example (table: billing amount x_t and tip y_t, both in euro, of the 20 observed guests)
11 Functional dependence (generic) y = f (x) More specifically, the functional dependence is assumed to be y = α + βx Other functional forms are of course possible; more on that later The econometric model is specified using the A-, B- and C-assumptions 18
12 Economic model: y_t = α + βx_t for t = 1,...,20 (figure: tip y (Trinkgeld) plotted against billing amount x (Rechnungsbetrag), with intercept α and slope β)
13 Econometric model: y_t = α + βx_t + u_t for t = 1,...,20 (figure: scatter plot of y_t against x_t around the regression line)
14 The A-assumptions (functional specification): Assumption a1: No relevant exogenous variable is omitted from the econometric model, and the exogenous variable included in the model is relevant Assumption a2: The true functional dependence between x t and y t is linear Assumption a3: The parameters α and β are constant for all T observations (x t,y t ) 21
15 The B-assumptions (error term specification): Assumption b1: E(u_t) = 0 for t = 1,...,T Assumption b2: Homoskedasticity: Var(u_t) = σ² for t = 1,...,T Assumption b3: For all t ≠ s with t = 1,2,...,T and s = 1,2,...,T we have Cov(u_t, u_s) = 0 Assumption b4: The error terms u_t are normally distributed Compact notation of all B-assumptions: u_t ~ NID(0, σ²) for t = 1,...,T
16 Graphical illustration of the error term distribution 23
17 The C-assumptions (variable specification): Assumption c1 The exogenous variable x t is not stochastic, but can be controlled as in an experimental situation Assumption c2 The exogenous variable x t is not constant for all observations t Of course, many (or even all?) of the A-, B-, and C-assumptions are restrictive and unrealistic We will nevertheless suppose they are satisfied for the time being, and consider their violations later on 24
18 Point estimation The simple (two-variable) linear regression model is y_t = α + βx_t + u_t Numeric illustration: the first data of the gratuity example (table of t, x_t, y_t)
19 Estimation: Compute estimated values ˆα and ˆβ Distinguish between true and estimated values If the true econometric model is y t = α + βx t + u t then the corresponding estimated model is ŷ t =ˆα + ˆβx t 26
20 How can we estimate the coefficients? (figure: scatter plot of y_t against x_t with the residuals of a candidate line drawn in)
21 Least squares method Sum of squared residuals: S_ûû = Σ_{t=1}^T û_t², where the residuals are û_t = y_t − ŷ_t = y_t − ˆα − ˆβx_t Residual (Residuum): Difference between the observed value y_t and the estimated (predicted) value ŷ_t
22 Choose ˆα and ˆβ such that the sum of squared residuals is minimized: S_ûû = Σ_{t=1}^T û_t² = Σ_{t=1}^T (y_t − ˆα − ˆβx_t)² Derivation of the estimators (Schätzer) [1]: ˆβ = S_xy/S_xx and ˆα = ȳ − ˆβx̄, with S_xx = Σ(x_t − x̄)² = Σx_t² − Tx̄² and S_xy = Σ(x_t − x̄)(y_t − ȳ) = Σx_t y_t − Tx̄ȳ
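The estimator formulas above translate directly into code. A minimal sketch in Python (using numpy; the three data points are invented and lie exactly on a line, so the estimates are exact):

```python
import numpy as np

def ols_simple(x, y):
    """OLS estimates for the simple model y_t = alpha + beta*x_t + u_t."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    S_xx = np.sum((x - x.mean()) ** 2)               # variation of x
    S_xy = np.sum((x - x.mean()) * (y - y.mean()))   # covariation of x and y
    beta_hat = S_xy / S_xx                           # slope estimate
    alpha_hat = y.mean() - beta_hat * x.mean()       # intercept estimate
    return alpha_hat, beta_hat

# points lying exactly on y = 1 + 2x are recovered exactly
alpha_hat, beta_hat = ols_simple([1, 2, 3], [3, 5, 7])
```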
23 Numeric illustration for the three-points example (table of t, x_t, y_t) Calculate {1}: ˆα, ˆβ; ŷ_1, ŷ_2, ŷ_3; û_1, û_2, û_3; S_ûû
24 The coefficient of determination R² Variation of the endogenous variable: S_yy = Σ(y_t − ȳ)² (figure: deviations y_t − ȳ of the observations from their mean)
25 Explained variation Sŷŷ = Σ(ŷ_t − ȳ)² and sum of squared residuals S_ûû = Σû_t² (figure: decomposition of each deviation y_t − ȳ into the explained part ŷ_t − ȳ and the residual û_t)
26 Decomposition of sum of squares (Streuungszerlegungssatz) [2]: S_yy = Sŷŷ + S_ûû, or Σ(y_t − ȳ)² = Σ(ŷ_t − ȳ)² + Σû_t² Coefficient of determination (Bestimmtheitsmaß): R² = explained variation / total variation = (S_yy − S_ûû)/S_yy = Sŷŷ/S_yy Computation of R² {2}: R² = ˆβS_xy/S_yy = S_xy²/(S_xx S_yy)
27 Properties of the estimators The estimators ˆβ = S_xy/S_xx and ˆα = ȳ − ˆβx̄ are random variables Thought experiment: repeated samples Computer simulation [experiment.r]
28 Under the A-, B- and C-assumptions (without b4) [3]: E(ˆα) = α and E(ˆβ) = β, and [4]: Var(ˆα) = σ²(1/T + x̄²/S_xx), Var(ˆβ) = σ²/S_xx, Cov(ˆα, ˆβ) = −σ²x̄/S_xx BLUE property: ˆα and ˆβ are the best linear unbiased estimators [5] If, additionally, b4 is true, then ˆα and ˆβ are the best unbiased estimators
29 How are y_t, ˆα and ˆβ distributed? Because u_t ~ NID(0, σ²), y_t is normally distributed for t = 1,...,T The expectation of y_t is E(y_t) = E(α + βx_t + u_t) = E(α) + E(βx_t) + E(u_t) = α + βx_t
30 The variance of y_t is Var(y_t) = E[(y_t − E(y_t))²] = E[(y_t − α − βx_t)²] = E(u_t²) = E[(u_t − E(u_t))²] = σ² Further, for t = 1,...,T: y_t ~ NID(α + βx_t, σ²)
31 Since ˆβ = S_xy/S_xx and ˆα = ȳ − ˆβx̄, both ˆα and ˆβ are linear transformations of the y_t Linear transformations of independent normally distributed random variables are normally distributed Hence ˆα ~ N(α, σ²(1/T + x̄²/S_xx)) and ˆβ ~ N(β, σ²/S_xx)
32 Interval estimation (Intervallschätzung) We already know that ˆβ is a random variable and ˆβ ~ N(β, σ²/S_xx) Instead of a point estimator ˆβ we now want an interval estimator [ˆβ − k; ˆβ + k] satisfying P(ˆβ − k ≤ β ≤ ˆβ + k) = 1 − a The interval [ˆβ − k; ˆβ + k] is called the (1 − a)-confidence interval (Konfidenzintervall)
33 Confidence interval when σ² is known Step 1: Standardization of ˆβ: se(ˆβ) = √(σ²/S_xx) and z = (ˆβ − E(ˆβ))/se(ˆβ) = (ˆβ − β)/se(ˆβ) ~ N(0, 1) The random variable z = (ˆβ − β)/se(ˆβ) is a pivot (Pivot), i.e. its distribution does not depend on unknown parameters
34 Step 2: Find the (1 − a/2)-quantile z_{a/2}: P(−z_{a/2} ≤ z ≤ z_{a/2}) = 1 − a Step 3: Substitute z by (ˆβ − β)/se(ˆβ): P(−z_{a/2} ≤ (ˆβ − β)/se(ˆβ) ≤ z_{a/2}) = 1 − a Rewriting yields the (1 − a)-interval [6]{3}: [ˆβ − z_{a/2} se(ˆβ); ˆβ + z_{a/2} se(ˆβ)]
35 Confidence interval when σ² is unknown Step 1: Estimation of σ² and se(ˆβ): ˆσ² = (1/(T−2)) Σ_{t=1}^T û_t² is a consistent and unbiased estimator of σ², and cse(ˆβ) = √(ˆσ²/S_xx) is a consistent estimator of se(ˆβ) (we postpone the proofs) Step 2: Standardization of ˆβ: t = (ˆβ − E(ˆβ))/cse(ˆβ) = (ˆβ − β)/cse(ˆβ) ~ t(T−2)
36 The random variable t = (ˆβ − β)/cse(ˆβ) is a pivot Step 3: Find the (1 − a/2)-quantile t_{a/2}: P(−t_{a/2} ≤ t ≤ t_{a/2}) = 1 − a Step 4: Substitute and solve for β: P(ˆβ − t_{a/2} cse(ˆβ) ≤ β ≤ ˆβ + t_{a/2} cse(ˆβ)) = 1 − a The interval estimator is {4}: [ˆβ − t_{a/2} cse(ˆβ); ˆβ + t_{a/2} cse(ˆβ)]
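The steps above can be sketched in Python with numpy (the data are invented; the quantile t_{a/2} is passed in as a number taken from tables or statistical software):

```python
import numpy as np

def conf_interval_slope(x, y, t_crit):
    """(1-a)-confidence interval for beta; t_crit is the upper a/2
    quantile of the t(T-2) distribution (from tables or software)."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    T = len(x)
    S_xx = np.sum((x - x.mean()) ** 2)
    beta = np.sum((x - x.mean()) * (y - y.mean())) / S_xx
    alpha = y.mean() - beta * x.mean()
    resid = y - alpha - beta * x
    sigma2_hat = np.sum(resid ** 2) / (T - 2)    # unbiased estimator of sigma^2
    cse = np.sqrt(sigma2_hat / S_xx)             # estimated standard error
    return beta - t_crit * cse, beta + t_crit * cse

# T = 4, so 4.303 is the 97.5% quantile of t(2) for a 95% interval
lo, hi = conf_interval_slope([1, 2, 3, 4], [2.1, 3.9, 6.2, 7.8], t_crit=4.303)
```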
37 Interval estimator for the intercept α: [ˆα − t_{a/2} cse(ˆα); ˆα + t_{a/2} cse(ˆα)], where cse(ˆα) = √(ˆσ²(1/T + x̄²/S_xx)) Some terminology: The standard error (Standardfehler) is se(ˆβ); the estimated standard error is cse(ˆβ) Usually, both se(ˆβ) and cse(ˆβ) are called standard error (Standardfehler) Interpretation of interval estimators?
38 Hypothesis tests How can we test hypotheses about the regression coefficients (usually about the slope β)? Null hypothesis H_0 and alternative hypothesis H_1 (Nullhypothese und Alternativhypothese) There are one-sided and two-sided tests We already know that ˆβ ~ N(β, σ²/S_xx)
39 If the null hypothesis H_0: β = q is true, then β can be substituted by q: ˆβ ~ N(q, σ²/S_xx) Then P(ˆβ − k ≤ q ≤ ˆβ + k) = 1 − a and P(q − k ≤ ˆβ ≤ q + k) = 1 − a With high probability 1 − a, the estimator ˆβ will be inside the interval [q − k; q + k] if H_0 is true If the estimator ˆβ is outside the interval, that is evidence against the null hypothesis
40 Graphical illustration 47
41 The analytical approach is slightly different Step 1: Set up H_0 and H_1 and fix the significance level a: H_0: β = q, H_1: β ≠ q Step 2: Estimate se(ˆβ) with ˆσ² = S_ûû/(T−2): cse(ˆβ) = √(ˆσ²/S_xx)
42 Step 3: Compute the t-test statistic t = (ˆβ − q)/cse(ˆβ) If H_0: β = q is true, then t ~ t(T−2) Step 4: Find the critical value t_{a/2}: P(−t_{a/2} ≤ t ≤ t_{a/2}) = 1 − a Step 5: Compare t_{a/2} and t: if t is outside [−t_{a/2}; t_{a/2}], i.e. if |t| > t_{a/2}, then reject H_0 {5}
43 Connections between hypothesis testing and confidence intervals Under the (two-sided) null hypothesis H_0: P(q − t_{a/2} cse(ˆβ) ≤ ˆβ ≤ q + t_{a/2} cse(ˆβ)) = 1 − a The (1 − a)-confidence interval is [ˆβ − t_{a/2} cse(ˆβ); ˆβ + t_{a/2} cse(ˆβ)] Conclusion: If q is outside the confidence interval, H_0 is rejected {6}
44 One-sided hypothesis tests (einseitige Tests) Right- or left-sided tests Right-sided null hypothesis: H_0: β ≤ q vs H_1: β > q The basic idea remains the same: If ˆβ is much larger than q, reject H_0
45 Graphical illustration: 52
46 Analytical approach (right-sided null hypothesis) Step 1: State H_0 and H_1 and set the significance level a: H_0: β ≤ q, H_1: β > q Step 2: Estimate se(ˆβ) Step 3: Compute the t-statistic t = (ˆβ − q)/cse(ˆβ); under H_0 its distribution is t ~ t(T−2)
47 Step 4: Find the critical value t_a: P(t ≤ t_a) = 1 − a For left-sided null hypotheses, steps 1, 2 and 3 are the same; the critical value is t_{1−a} = −t_a with P(t < t_{1−a}) = a Step 5: Compare t_a and t; reject H_0 if t > t_a {7} For left-sided null hypotheses, H_0 is rejected if t is less than the critical value, t < t_{1−a}
48 The p-value (p-wert) The p-value is the probability that the test statistic (a random variable) is greater than the realized test statistic Traditional approach: Reject the null hypothesis if the test statistic is inside the critical region, e.g. if t>t a Alternative approach: Comparison of probabilities; reject the null hypothesis if the p-value is less than the significance level a 55
49 Graphical illustration: 56
50 The two approaches (comparison of t-statistic and critical value, or comparison of p-value and significance level) are essentially identical {8} Advantages of the p-value approach? Disadvantages? p-value formulas for right- and left-sided hypothesis tests? [7] p-value formula for two-sided hypothesis test?
51 How to choose the null and alternative hypotheses There are basically two strategies: State the opposite of the conjecture as the null hypothesis and try to reject it State the conjecture as the null hypothesis and show that it cannot be rejected There is an important asymmetry between rejection and non-rejection 58
52 Maximum likelihood estimation Main idea: Find those parameter values that maximize the probability (or likelihood) of observing the actually observed data Notation: θ: parameter vector, e.g. θ = (α, β, σ²); L(θ): likelihood (given all the data); ln L(θ): log-likelihood Maximum likelihood estimator: ˆθ = argmax ln L(θ)
53 We already know that, for t = 1,...,T, y_t ~ NID(α + βx_t, σ²); hence the density of y_t is f_{y_t}(y) = (1/√(2πσ²)) exp(−(1/2)(y − α − βx_t)²/σ²) Due to independence, the joint likelihood and log-likelihood are L(α, β, σ²) = f_{y_1,...,y_T}(y_1,...,y_T) = Π_{t=1}^T f_{y_t}(y_t) and ln L(α, β, σ²) = ln f_{y_1,...,y_T}(y_1,...,y_T) = Σ_{t=1}^T ln f_{y_t}(y_t)
54 Maximize ln L(α, β, σ²) = Σ_{t=1}^T ln[(1/√(2πσ²)) exp(−(1/2)(y_t − α − βx_t)²/σ²)] with respect to the parameters α, β, σ² [8] The ML estimators are ˆβ_ML = S_xy/S_xx, ˆα_ML = ȳ − ˆβ_ML x̄, ˆσ²_ML = (1/T) Σ_{t=1}^T û_t²
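A quick numerical check of these formulas (a Python/numpy sketch with invented data): the closed-form ML values give a log-likelihood at least as large as any perturbed parameter values, and ˆσ²_ML divides by T rather than T − 2.

```python
import numpy as np

def loglik(alpha, beta, sigma2, x, y):
    """Gaussian log-likelihood of y_t = alpha + beta*x_t + u_t."""
    T = len(y)
    u = y - alpha - beta * x
    return (-T / 2 * np.log(2 * np.pi) - T / 2 * np.log(sigma2)
            - np.sum(u ** 2) / (2 * sigma2))

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.1, 3.9, 6.2, 7.8])
S_xx = np.sum((x - x.mean()) ** 2)
b_ml = np.sum((x - x.mean()) * (y - y.mean())) / S_xx   # equals the OLS slope
a_ml = y.mean() - b_ml * x.mean()
s2_ml = np.sum((y - a_ml - b_ml * x) ** 2) / len(y)     # divisor T, not T-2
```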
55 Hypothesis tests in the maximum likelihood framework (the three classical tests: Wald, LR, LM) Null and alternative hypotheses, e.g. H_0: β = β_0 vs H_1: β ≠ β_0 Derivation of the test statistics [exercise]
56 Forecasting Conditional forecast: the value x_0 of the exogenous variable is known and non-stochastic Point forecast of the endogenous variable is {9}: ŷ_0 = ˆα + ˆβx_0 The true value of y_0 is usually not ŷ_0 but y_0 = α + βx_0 + u_0
57 The forecasting error is ŷ_0 − y_0 = ˆα + ˆβx_0 − (α + βx_0 + u_0) = (ˆα − α) + (ˆβ − β)x_0 − u_0 There are two error sources: 1. The error term u_0 will not vanish, in general. 2. The parameter estimates ˆα and ˆβ will deviate from the true values α and β.
58 Properties of the point forecast Expected forecasting error: E(ŷ_0 − y_0) = E(ˆα − α) + E(ˆβ − β)x_0 − E(u_0) = 0 Variance of the forecasting error [9]: Var(ŷ_0 − y_0) = σ²[1 + 1/T + (x_0 − x̄)²/S_xx] Estimated variance of the forecasting error {9}: dVar(ŷ_0 − y_0) = ˆσ²[1 + 1/T + (x_0 − x̄)²/S_xx]
59 Interval forecast Step 1: Estimation of se(ŷ_0 − y_0) Step 2: Standardization of (ŷ_0 − y_0): t = ((ŷ_0 − y_0) − E(ŷ_0 − y_0))/cse(ŷ_0 − y_0) = (ŷ_0 − y_0)/cse(ŷ_0 − y_0) ~ t(T−2), since E(ŷ_0 − y_0) = 0 Step 3: Find the t_{a/2}-value (from statistical tables or using statistical computer software)
60 Step 4: With large probability 1 − a, the random variable t will be inside the interval [−t_{a/2}; t_{a/2}]: P(−t_{a/2} ≤ (ŷ_0 − y_0)/cse(ŷ_0 − y_0) ≤ t_{a/2}) = 1 − a Solve for y_0: P(ŷ_0 − t_{a/2} cse(ŷ_0 − y_0) ≤ y_0 ≤ ŷ_0 + t_{a/2} cse(ŷ_0 − y_0)) = 1 − a Hence, the interval forecast is {9}: [ŷ_0 − t_{a/2} cse(ŷ_0 − y_0); ŷ_0 + t_{a/2} cse(ŷ_0 − y_0)]
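Putting the four steps together as a sketch (Python/numpy, invented data; t_crit again comes from tables or software):

```python
import numpy as np

def forecast_interval(x, y, x0, t_crit):
    """Point forecast and (1-a)-interval forecast of y_0 at x_0."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    T = len(x)
    S_xx = np.sum((x - x.mean()) ** 2)
    beta = np.sum((x - x.mean()) * (y - y.mean())) / S_xx
    alpha = y.mean() - beta * x.mean()
    y0_hat = alpha + beta * x0                       # point forecast
    sigma2_hat = np.sum((y - alpha - beta * x) ** 2) / (T - 2)
    # estimated standard error of the forecasting error
    cse = np.sqrt(sigma2_hat * (1 + 1 / T + (x0 - x.mean()) ** 2 / S_xx))
    return y0_hat, y0_hat - t_crit * cse, y0_hat + t_crit * cse

y0_hat, lo, hi = forecast_interval([1, 2, 3, 4], [2.1, 3.9, 6.2, 7.8],
                                   x0=2.5, t_crit=4.303)
```

At x_0 = x̄ the point forecast equals ȳ, and the interval is narrowest there; it widens as x_0 moves away from x̄.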
61 Width of the interval 68
62 Multiple linear regression model Until now we only considered a single exogenous variable, but in most empirical problems we face many exogenous variables Many of the results from the simple linear regression model can be transferred to the multiple case Important tool: matrix algebra (main diagonal, transpose, addition, scalar multiplication, inner product, matrix multiplication, idempotent matrices, determinant, rank, inverse, trace, definite matrices, semidefinite matrices)
63 Specification Example: Estimation of a production function for barley Conduct an experiment where the barley output (Gerste, g_t) is observed for different combinations of phosphate (p_t) and nitrogen (n_t) There are T = 30 different combinations The following table shows the data
64 (Table: the 30 observed combinations of phosphate p_t, nitrogen n_t and barley output g_t)
65 Functional specification (A-assumptions) The economic (agro-economic) model formalizes the connection between the barley output (g) and the fertilizers (p and n): g = f(p, n) A possible functional form: g = α + β_1 p + β_2 n A more realistic functional form: g = A p^{β_1} n^{β_2}, where A, β_1 and β_2 are constant parameters
66 Take logarithms of the production function g = A p^{β_1} n^{β_2}: ln g = ln A + β_1 ln p + β_2 ln n Define α = ln A, y = ln g, x_1 = ln p and x_2 = ln n; then y = α + β_1 x_1 + β_2 x_2 (Table of log-values x_1 = ln p_t, x_2 = ln n_t, y = ln g_t)
67 The econometric model is y_t = α + β_1 x_1t + β_2 x_2t + u_t for t = 1,...,T General model for K exogenous variables: y_t = α + β_1 x_1t + β_2 x_2t + ... + β_K x_Kt + u_t for t = 1,...,T, i.e. y_1 = α + β_1 x_11 + β_2 x_21 + ... + β_K x_K1 + u_1, y_2 = α + β_1 x_12 + β_2 x_22 + ... + β_K x_K2 + u_2, ..., y_T = α + β_1 x_1T + β_2 x_2T + ... + β_K x_KT + u_T
68 Matrix notation: Define y = (y_1, y_2,..., y_T)′, β = (α, β_1,..., β_K)′, u = (u_1, u_2,..., u_T)′, and let X be the T×(K+1) matrix whose t-th row is (1, x_1t,..., x_Kt) Compact notation for the multiple regression model: y = Xβ + u
69 The A-assumptions Assumption A1: No relevant exogenous variable is omitted from the econometric model, and all exogenous variables included in the model are relevant Assumption A2: The true functional dependence between X and y is linear Assumption A3: The parameters β are constant for all T observations (x t,y t ) 76
70 The B-assumptions The B-assumptions are the same as in the simple linear model, i.e. E(u_t) = 0, Var(u_t) = σ², Cov(u_t, u_s) = 0 for t ≠ s, and normality B1 to B4 in matrix notation: u ~ N(0, σ²I_T)
71 The C-assumptions Assumption C1: The exogenous variables x_1t,...,x_Kt are not stochastic, but can be controlled as in an experimental situation Assumption C2: No perfect multicollinearity: There are no parameter values γ_0, γ_1, γ_2,...,γ_K (with at least one γ_k ≠ 0) such that γ_0 + γ_1 x_1t + γ_2 x_2t + ... + γ_K x_Kt = 0 for all t = 1,...,T Assumption C2 in matrix notation: rank(X) = K + 1 (implication: T ≥ K + 1)
72 Perfect multicollinearity with two regressors If C2 is violated, there are γ_0, γ_1, γ_2 (not all 0) such that γ_0 + γ_1 x_1t + γ_2 x_2t = 0 for all t = 1,...,T, thus x_2t = −(γ_0/γ_2) − (γ_1/γ_2) x_1t = δ_0 + δ_1 x_1t with δ_0 = −(γ_0/γ_2) and δ_1 = −(γ_1/γ_2) Hence, there are not really two regressors, since y_t = α + β_1 x_1t + β_2 x_2t + u_t = (α + β_2 δ_0) + (β_1 + β_2 δ_1) x_1t + u_t = α′ + β′ x_1t + u_t
73 Point estimation The econometric model is y = Xβ + u, i.e. y_t = α + β_1 x_1t + ... + β_K x_Kt + u_t for t = 1,...,T The estimated model is ŷ = Xˆβ, i.e. ŷ_t = ˆα + ˆβ_1 x_1t + ... + ˆβ_K x_Kt for t = 1,...,T
74 Define the residuals û = y − ŷ, i.e. û_t = y_t − ŷ_t for t = 1,...,T How can we find an estimator ˆβ in the multiple regression model? The sum of squared residuals is S_ûû = û′û = Σ û_t²
75 Because û = y − Xˆβ, i.e. û_t = y_t − ˆα − ˆβ_1 x_1t − ... − ˆβ_K x_Kt, we have S_ûû = (y − Xˆβ)′(y − Xˆβ) = Σ(y_t − ˆα − ˆβ_1 x_1t − ... − ˆβ_K x_Kt)² First order conditions: ∂S_ûû/∂ˆβ = (∂S_ûû/∂ˆα, ∂S_ûû/∂ˆβ_1, ..., ∂S_ûû/∂ˆβ_K)′ = 0
76 Vector of derivatives: ∂S_ûû/∂ˆβ = ∂[(y − Xˆβ)′(y − Xˆβ)]/∂ˆβ = ∂[y′y − 2y′Xˆβ + ˆβ′X′Xˆβ]/∂ˆβ = −2X′y + 2X′Xˆβ J. R. Magnus, H. Neudecker, Matrix Differential Calculus with Applications in Statistics and Econometrics, rev. ed., John Wiley & Sons: Chichester; Phoebus J. Dhrymes, Mathematics for Econometrics, 3rd ed., Springer: New York
77 Solving the first order conditions yields the normal equations X′Xˆβ = X′y and thus ˆβ = (X′X)^(−1) X′y The terms are X′X = [T, Σx_1t, ..., Σx_Kt; Σx_1t, Σx_1t², ..., Σx_1t x_Kt; ...; Σx_Kt, Σx_Kt x_1t, ..., Σx_Kt²] and X′y = (Σy_t, Σx_1t y_t, ..., Σx_Kt y_t)′ Numeric illustration {10}
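In code the normal equations are solved directly. A minimal numpy sketch (the data are artificial, generated from y = 1 + 2x_1 + 3x_2 with no error term, so the estimator recovers the coefficients exactly):

```python
import numpy as np

x1 = np.array([0.0, 1.0, 2.0, 3.0, 1.0])
x2 = np.array([1.0, 0.0, 1.0, 2.0, 3.0])
y = 1 + 2 * x1 + 3 * x2                      # exact linear relation, u = 0

# design matrix X = [1, x1, x2]; rank(X) = K + 1 = 3 (assumption C2)
X = np.column_stack([np.ones_like(x1), x1, x2])

# solve the normal equations X'X beta = X'y (preferred over inverting X'X)
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
```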
78 Meaning of the estimators ˆα, ˆβ_1 and ˆβ_2 Formal meaning: ∂ŷ_t/∂x_1t = ˆβ_1 and ∂ŷ_t/∂x_2t = ˆβ_2 Meaning of ˆα: for x_1t = x_2t = 0, ln ĝ_t = ˆα, so ĝ_t = e^ˆα
79 Meaning of ˆβ_1 and ˆβ_2: ˆβ_1 = ∂ŷ_t/∂x_1t = ∂(ln ĝ_t)/∂(ln p_t) Because ∂(ln ĝ_t)/∂ĝ_t = 1/ĝ_t and ∂(ln p_t)/∂p_t = 1/p_t, we find ˆβ_1 = (∂ĝ_t/ĝ_t)/(∂p_t/p_t) ˆβ_1 is the estimated elasticity of the barley output with respect to the phosphate fertilizer
80 Coefficient of determination R² The total variation of y can be decomposed in the same way as in the simple linear model: S_yy (total variation) = Sŷŷ (explained variation) + S_ûû (unexplained variation) The coefficient of determination is defined as R² = explained variation / total variation = Sŷŷ/S_yy = (S_yy − S_ûû)/S_yy
81 Graphical illustration (Venn diagram: the variation S_yy of y overlaps the variations S_11 and S_22 of the two regressors in areas A, B, C; E is the part of S_yy explained by neither regressor) Here R² = (A + B + C)/(A + B + C + E)
82 Computation of R²: In the simple linear regression model R² = Sŷŷ/S_yy = ˆβS_xy/S_yy It can be shown that in the multiple linear regression model Sŷŷ = Σ_{k=1}^K ˆβ_k S_ky with the covariations S_ky = Σ_{t=1}^T (x_kt − x̄_k)(y_t − ȳ) Then {11}: R² = Σ_{k=1}^K ˆβ_k S_ky / S_yy
83 Properties of the OLS estimators The estimator ˆβ is a random vector The expectation vector is [10] E(ˆβ) = β (unbiasedness, Erwartungstreue) The covariance matrix of ˆβ is [11] V(ˆβ) = σ²(X′X)^(−1)
84 Special case: Covariance matrix in the two regressor model: Var(ˆβ_1) = σ²/(S_11(1 − R)), Var(ˆβ_2) = σ²/(S_22(1 − R)), Cov(ˆβ_1, ˆβ_2) = −σ²R/(S_12(1 − R)), Var(ˆα) = σ²/T + x̄_1² Var(ˆβ_1) + 2x̄_1x̄_2 Cov(ˆβ_1, ˆβ_2) + x̄_2² Var(ˆβ_2), where R = S_12²/(S_11 S_22)
85 Gauss-Markov theorem The estimator ˆβ = (X′X)^(−1)X′y is linear in y, since ˆβ = Dy with D = (X′X)^(−1)X′ ˆβ = (X′X)^(−1)X′y is not only unbiased but also efficient: Let ˇβ be another linear unbiased estimator of β; then V(ˇβ) − V(ˆβ) is positive semidefinite [12]
86 Distribution of the estimator The model is y = Xβ + u From u ~ N(0, σ²I_T) we conclude that y is multivariate normally distributed Expectation vector and covariance matrix of the endogenous variable: E(y) = E(Xβ + u) = Xβ and V(y) = V(Xβ + u) = V(u) = σ²I_T Thus y ~ N(Xβ, σ²I_T)
87 How is the estimator ˆβ distributed? Since ˆβ = (X′X)^(−1)X′y, the estimator ˆβ also has a multivariate normal distribution Expectation vector and covariance matrix are already known Hence ˆβ ~ N(β, σ²(X′X)^(−1)) Problem: The error term variance σ² is unknown
88 The covariance matrix V(ˆβ) cannot be computed without σ² Since usually σ² is unknown, it has to be estimated An estimator of σ² is ˆσ² = S_ûû/(T − K − 1) Its expectation is E(ˆσ²) = σ² [13]{12} The residual maker matrix: M = I_T − X(X′X)^(−1)X′
89 Interval estimation Interval estimation of a single component ˆβ_k of the vector ˆβ: P(ˆβ_k − c ≤ β_k ≤ ˆβ_k + c) = 1 − a We know that ˆβ_k ~ N(β_k, Var(ˆβ_k)), where Var(ˆβ_k) is the (k+1)-th diagonal element of σ²(X′X)^(−1) Problem: σ² and Var(ˆβ_k) are unknown
90 Step 1: Estimation of σ² by ˆσ², and of se(ˆβ_k) = √Var(ˆβ_k) by cse(ˆβ_k) = √dVar(ˆβ_k) Step 2: Standardization of ˆβ_k: t = (ˆβ_k − E(ˆβ_k))/cse(ˆβ_k) = (ˆβ_k − β_k)/cse(ˆβ_k) ~ t(T − K − 1) Step 3: Find the t_{a/2}-value Step 4: The (1 − a)-interval estimator is {13}: [ˆβ_k − t_{a/2} cse(ˆβ_k); ˆβ_k + t_{a/2} cse(ˆβ_k)]
91 Interval estimation of linear combinations of ˆβ Let r be an arbitrary (K+1)-column vector How can we find a confidence interval of r′β? Fertilizer example: r = [0, 1, 1]′, then r′β = β_1 + β_2 (economies of scale?) The point estimator of r′β is r′ˆβ The variance of r′ˆβ is r′V(ˆβ)r = σ²r′(X′X)^(−1)r
92 The confidence interval for r′β is [r′ˆβ − t_{a/2} ˆσ√(r′(X′X)^(−1)r); r′ˆβ + t_{a/2} ˆσ√(r′(X′X)^(−1)r)] Special case of a single component: β_k = r′β for r = [0,...,0, 1, 0,...,0]′ where the 1 is located at the k-th position Then Var(ˆβ_k) = r′σ²(X′X)^(−1)r
93 Hypothesis tests: t-test There are tests of a single linear combination (t-tests) and tests of multiple linear combinations (F-tests) Testing a single linear combination of parameters: t-test (two-sided) Remember: In the simple linear regression case H_0: β = q vs H_1: β ≠ q
94 In the multiple linear model the null and alternative hypotheses are H_0: r_0 α + r_1 β_1 + ... + r_K β_K = q vs H_1: r_0 α + r_1 β_1 + ... + r_K β_K ≠ q, or H_0: r′β = q vs H_1: r′β ≠ q, where r = [r_0, r_1,..., r_K]′
95 The test procedure: 1. Set up H_0 and H_1 and fix the significance level a 2. Estimate se(r′ˆβ) 3. Compute the t-statistic 4. Find the critical value t_{a/2} 5. Test decision: Compare t_{a/2} and t {14}
96 The left-sided t-test H_0: r′β ≥ q vs H_1: r′β < q and the right-sided test H_0: r′β ≤ q vs H_1: r′β > q are similar The critical values are lower quantiles of the t-distribution for the left-sided test and upper quantiles for the right-sided test {14}
97 Hypothesis tests: F-test Simultaneous test of two or more linear combinations (restrictions) Null hypothesis and alternative hypothesis: H_0: Rβ = q vs H_1: Rβ ≠ q Examples: H_0: β_1 = β_2 = ... = β_K = 0; H_0: β_1 = β_2 = ... = β_K; H_0: β_1 + ... + β_K = 1 and β_1 = 2β_2; H_0: β_1 = 5 and β_2 = ... = β_K = 0
98 Basic idea of the F-test: Compare the restricted and the unrestricted model Sum of squared residuals of the econometric model and of the model under the null hypothesis: S_ûû = û′û = Σ_{t=1}^T û_t² and S⁰_ûû = û⁰′û⁰ = Σ_{t=1}^T (û⁰_t)², where û⁰ are the residuals if the model is estimated under the restrictions of the null hypothesis
99 Example: Under the null hypothesis β_1 = ... = β_K = 0, the model is y_t = α + 0·x_1t + ... + 0·x_Kt + u_t = α + u_t Obviously, S⁰_ûû ≥ S_ûû; the null hypothesis is likely to be false if S⁰_ûû is much larger than S_ûû The test statistic is F = [(S⁰_ûû − S_ûû)/L] / [S_ûû/(T − K − 1)], where L is the number of restrictions in H_0 If the null hypothesis is true, then F ~ F(L, T − K − 1)
100 The five steps of the F-test 1. Set up H_0 and H_1 and choose the significance level a 2. Calculate S_ûû and S⁰_ûû (more on the computation of S⁰_ûû later) 3. Compute the F-test statistic 4. Find the critical value F_a, i.e. the upper a-quantile of the F(L, T − K − 1)-distribution 5. Reject H_0 if F > F_a {15}
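The five steps can be sketched in Python/numpy, here testing H_0: β_1 = β_2 = 0 on simulated data (the data-generating process is invented for illustration; the critical value F_a would come from F tables):

```python
import numpy as np

def f_statistic(ssr_restricted, ssr_unrestricted, L, T, K):
    """F = [(S0 - S)/L] / [S/(T-K-1)] for L restrictions."""
    return ((ssr_restricted - ssr_unrestricted) / L) / \
           (ssr_unrestricted / (T - K - 1))

rng = np.random.default_rng(0)
T, K = 30, 2
x1, x2 = rng.normal(size=T), rng.normal(size=T)
y = 1 + 2 * x1 + rng.normal(size=T)          # x2 is actually irrelevant

X = np.column_stack([np.ones(T), x1, x2])
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
ssr_u = np.sum((y - X @ beta_hat) ** 2)      # unrestricted model
ssr_r = np.sum((y - y.mean()) ** 2)          # restricted model: intercept only
F = f_statistic(ssr_r, ssr_u, L=2, T=T, K=K)
```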
101 Remarks: For L = 1 the F-test is identical to a two-sided t-test Careful: A combination of t-tests is not the same as a single F-test The decisions of t-tests and an F-test can be contradictory Distinction between individual t-tests and a simultaneous F-test
102 Example: H 0 : β 1 = β 2 =0.33 {16} 109
103 Computation of û⁰′û⁰ Estimate β subject to the restrictions Rβ = q given in the null hypothesis Optimization under constraints: Minimize S_û⁰û⁰ = (y − Xβ)′(y − Xβ) with respect to β subject to Rβ = q A standard Lagrange approach yields [14]: ˆβ_RLS = ˆβ − (X′X)^(−1)R′[R(X′X)^(−1)R′]^(−1)(Rˆβ − q)
104 Residuals of the restricted model: û⁰ = y − Xˆβ_RLS {17} The F-test statistic can also be written as [15]: F = (Rˆβ − q)′[R(X′X)^(−1)R′]^(−1)(Rˆβ − q)/L divided by û′û/(T − K − 1) Note the similarity to the t-test statistic t² = (r′ˆβ − q)² / [ˆσ² r′(X′X)^(−1)r] Standard statistical software includes simultaneous tests of linear combinations (F-tests)
105 Maximum likelihood estimation Repetition: If X is a K-dimensional random vector with multivariate normal distribution N(μ, Σ), then its joint density is f_X(x) = (2π)^(−K/2)(det Σ)^(−1/2) exp(−(1/2)(x − μ)′Σ^(−1)(x − μ)) Multiple linear regression model: y = Xβ + u with u ~ N(0, σ²I) Distribution of the endogenous variables: y ~ N(Xβ, σ²I)
106 Joint density of y: f_y(y) = (2π)^(−T/2)(det σ²I)^(−1/2) exp(−(1/2)(y − Xβ)′(σ²I)^(−1)(y − Xβ)) = (2π)^(−T/2)(σ^(2T))^(−1/2) exp(−(y − Xβ)′(y − Xβ)/(2σ²)) Log-likelihood function: ln L(β, σ²) = −(T/2) ln(2π) − (T/2) ln σ² − (y − Xβ)′(y − Xβ)/(2σ²)
107 First order conditions for a maximum: ∂ln L/∂β = X′(y − Xβ)/σ² = 0 and ∂ln L/∂σ² = −T/(2σ²) + (y − Xβ)′(y − Xβ)/(2σ⁴) = 0 Solution of the FOCs [16]: ˆβ_ML = (X′X)^(−1)X′y and ˆσ²_ML = û′û/T The ML estimator of β is identical to the OLS estimator; the ML estimator of σ² is different and thus biased (but asymptotically unbiased)
108 The classical tests (LR, Wald, LM) Illustration of the basic test ideas [threetests.r] Generalization to multiple restrictions: H_0: g(β) = 0 vs H_1: g(β) ≠ 0, where β is the coefficient vector of a multiple linear regression model and g is a (possibly nonlinear) vector-valued function Test of L linear restrictions: g(β) = Rβ − q
109 Wald test Idea: If g(ˆβ_ML) is significantly different from 0, reject H_0 Test statistic (for multiple restrictions): W = g(ˆβ_ML)′[dCov(g(ˆβ_ML))]^(−1) g(ˆβ_ML), asymptotically distributed as χ²_L if the null hypothesis is true Wald test statistic for L linear restrictions Rβ − q = 0 [17]
110 Likelihood ratio (LR) test Idea: If the maximal likelihood under the restrictions L(ˆβ_R, ˆσ²_R) is significantly lower than the maximal likelihood without restrictions L(ˆβ_ML, ˆσ²_ML), then reject H_0 Test statistic: LR = 2[ln L(ˆβ_ML, ˆσ²_ML) − ln L(ˆβ_R, ˆσ²_R)], asymptotically distributed as χ²_L if the null hypothesis is true LR test statistic for L linear restrictions Rβ − q = 0 [18]
111 Lagrange multiplier (LM) test Idea: If the slope ∂ln L(ˆβ_R)/∂β of the log-likelihood function is significantly different from 0, reject H_0 Test statistic: LM = (∂ln L(ˆβ_R)/∂β)′[dCov(ˆβ_R)]^(−1)(∂ln L(ˆβ_R)/∂β), asymptotically distributed as χ²_L if the null hypothesis is true LM test statistic for L linear restrictions Rβ − q = 0 [19]
112 Forecasting The approach is similar to forecasting in the simple linear regression Let x_0 = [1, x_10, x_20,..., x_K0]′ denote the vector of exogenous variables Point forecast: ŷ_0 = x_0′ˆβ Variance of the forecast error [20]: Var(ŷ_0 − y_0) = σ²(1 + x_0′(X′X)^(−1)x_0)
113 Presentation of the results In the literature, the results of regression analyses are often presented as follows ŷ = ˆα + ˆβ 1 x ˆβ K x K (cse(ˆα)) (cse(ˆβ 1 )) (cse(ˆβ K )) Sometimes you find t-values in the parentheses, i.e. the values of the test statistics for the tests H 0 : β k =0vs H 1 : β k 6=0 Often, R 2 and ˆσ and the value of the test statistic of the F test H 0 : β 1 =...= β K =0 vs H 1 : not H 0 are reported additionally 120
114 Fertilizer example: ŷ = ˆα + ˆβ_1 x_1 + ˆβ_2 x_2 with the estimated standard errors in parentheses Additional results: R², ˆσ², ˆσ Test statistics for H_0: β_1 = ..., H_0: β_2 = ..., and H_0: β_1 = β_2 = ... (the numerical values are not reproduced here)
115 Examples of computer output: Excel SPSS EViews Stata R matlab 122
116 Assumptions A1: No relevant variable is omitted, and no irrelevant variables are included A2: The true functional dependence between X and y is linear A3: The parameters β are constant for all T observations (x t,y t ) B1-B4: u N ³ 0,σ 2 I T C1: The exogenous variables are not stochastic C2: No perfect multicollinearity: rank(x) =K +1 All assumptions can be violated What happens if they are violated? 123
117 Omitted or irrelevant variables Assumption A1: No relevant exogenous variable is omitted from the econometric model, and all exogenous variables included in the model are relevant What happens if relevant variables are missing? What happens if there are irrelevant variables included in the model? Example: Wage structure in a firm with 20 employees; what are the determinants of the wage y t? 124
118 Data: Education x_1t, age x_2t, firm tenure x_3t (table of y_t, x_1t, x_2t, x_3t for the 20 employees)
119 Three potential models (M2 is the true model): (M1) y_t = α + βx_1t + u′_t (M2) y_t = α + β_1 x_1t + β_2 x_2t + u_t (M3) y_t = α + β_1 x_1t + β_2 x_2t + β_3 x_3t + u″_t (Table of estimation results for M1-M3: coefficient, bse(.), t-test and p-value for the constant, education, age and firm tenure)
120 Omitted relevant variables (Venn diagram of the variations S_yy, S_11 and S_22 with areas A, B, C, D, E, F, G)
121 The models: (M1) y_t = α + βx_1t + u′_t (M2) y_t = α + β_1 x_1t + β_2 x_2t + u_t (M3) y_t = α + β_1 x_1t + β_2 x_2t + β_3 x_3t + u″_t The error terms: u′_t = β_2 x_2t + u_t, so E(u′_t) = E(β_2 x_2t + u_t) = β_2 x_2t + E(u_t) = β_2 x_2t + 0 ≠ 0
122 If a relevant exogenous variable is omitted, assumption B1 is violated! Consequence for point estimation: ˆβ′_1 = ˆβ_1 + ˆβ_2 S_12/S_11 and E(ˆβ′_1) = E(ˆβ_1 + ˆβ_2 S_12/S_11) = β_1 + β_2 S_12/S_11 Consequence for interval estimation: [ˆβ′_1 − t_{a/2} cse(ˆβ′_1); ˆβ′_1 + t_{a/2} cse(ˆβ′_1)]
123 Further, se(ˆβ′_1) = √Var(ˆβ′_1) with Var(ˆβ′_1) = σ²/S_11 The estimator ˆσ² = S_û′û′/(T − 2) is biased; the unbiased estimator is ˆσ² = S_ûû/(T − 3)
124 Conclusion: The coverage probability of the confidence intervals is not 1 − a Hypothesis tests are also biased: The probability of an error of the first kind does not equal the significance level If a relevant exogenous variable is omitted, then the point estimators are biased and inconsistent, and the interval estimators and hypothesis tests are no longer valid {18}
125 Irrelevant variables The error term in the misspecified model M3 is u″_t = u_t − β_3 x_3t, and since β_3 = 0, u″_t = u_t Consequently, E(ˆα″) = α, E(ˆβ″_1) = β_1, E(ˆβ″_2) = β_2, E(ˆβ″_3) = β_3 = 0
126 The variances of the estimators satisfy Var(ˆβ_1) = σ²/(S_11(1 − R)) ≤ Var(ˆβ″_1); including the redundant regressor x_3 inflates the variance The estimated error term variance is ˆσ² = S_û″û″/(T − 4) Conclusion: Omitted relevant variables are a serious problem, redundant variables are not (but they inflate the standard errors)
127 Diagnosis

How can we find the correct model?

The coefficient of determination R² does not help select a model

Adjusted R²:

R̄² = 1 - [ S_ûû / (T - K - 1) ] / [ S_yy / (T - 1) ] = 1 - (1 - R²) (T - 1) / (T - K - 1) 134
128 Further model selection criteria (trade-off between bias and inefficiency)

Akaike information criterion (AIC):

AIC = ln( S_ûû / T ) + 2 (K + 1) / T

t-test for single variables; F-test for multiple variables 135
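Both criteria are one-liners to compute. A sketch with hypothetical values, showing how an extra regressor that barely improves the fit is penalized (all numbers assumed for illustration):

```python
import math

def adjusted_r2(r2, T, K):
    # R_bar^2 = 1 - (1 - R^2) * (T - 1) / (T - K - 1)
    return 1 - (1 - r2) * (T - 1) / (T - K - 1)

def aic(ssr, T, K):
    # AIC = ln(S_uu / T) + 2 * (K + 1) / T   (smaller is better)
    return math.log(ssr / T) + 2 * (K + 1) / T

# hypothetical comparison: a third regressor raises R^2 only slightly
print(adjusted_r2(0.750, T=50, K=2))  # two regressors
print(adjusted_r2(0.755, T=50, K=3))  # three: adjusted R^2 falls

# same comparison via AIC (SSR values consistent with S_yy = 480)
print(aic(120.0, T=50, K=2))
print(aic(117.6, T=50, K=3))          # AIC rises: prefer the smaller model
```

Here both criteria agree that the marginal third regressor is not worth its degree of freedom; with a larger fit improvement they can disagree, which is the bias/inefficiency trade-off mentioned above.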
129 Functional form

Assumption A2: The true functional dependence between X and y is linear

Milk example: Milk production m_t depends on the amount of concentrated feed f_t

[Table: observations t with feed f_t and milk output m_t; the numerical values did not survive extraction]
130 [Figure: scatter plot of milk output (Milchmenge, vertical axis) against concentrated feed (Kraftfutter, horizontal axis)] 137
131 A misspecified model returns useless results

Some nonlinear dependencies:

Semi-logarithmic: m_t = α + β ln f_t + u_t
Inverse:          m_t = α + β (1/f_t) + u_t
Exponential:      ln m_t = α + β f_t + u_t
Logarithmic:      ln m_t = α + β ln f_t + u_t
Quadratic:        m_t = α + β_1 f_t + β_2 f_t² + u_t 138
132 Approach I: Estimation of a nonlinear regression y_t = g(x_t) + u_t with criterion function

Σ_{t=1}^{T} ( y_t - g(x_t) )²

Optimization by numerical methods

Approach II: Linearization of the model, then linear regression y_t = α + β x_t + u_t, e.g. with

y_t = ln m_t,   x_t = ln f_t 139
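Approach II for the logarithmic form can be sketched directly: take logs of both variables and run ordinary least squares on the transformed data. Since the original milk table did not survive extraction, the feed/milk numbers below are hypothetical:

```python
import math

# Logarithmic model ln(m) = a + b ln(f): linear after transforming both sides.
f = [2.0, 3.0, 4.0, 5.0, 6.0]        # concentrated feed (hypothetical)
m = [10.1, 13.0, 15.2, 17.1, 18.6]   # milk output (hypothetical)

x = [math.log(v) for v in f]
y = [math.log(v) for v in m]

# simple OLS on the transformed data
n = len(x)
mx, my = sum(x) / n, sum(y) / n
b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) \
    / sum((xi - mx) ** 2 for xi in x)
a = my - b * mx

print(a, b)  # b is the elasticity of milk output with respect to feed
```

A useful side effect of the log-log form is that the slope b is directly interpretable as an elasticity, which Approach I's general g(x_t) does not give for free.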
133 Diagnosis: Regression Specification Error Test (RESET)

Higher order Taylor approximation:

y_t = f(x_t) ≈ α + β_1 x_t + β_2 x_t² + β_3 x_t³ + ...

Are the higher orders (jointly) significant? F-test of β_2 = β_3 = ... = 0

Problem: What happens if there are many exogenous variables? 140
134 Basic idea of the RESET: ŷ_t², ŷ_t³, ... are included as additional exogenous variables,

y_t = α + β_1 x_t + γ_2 ŷ_t² + γ_3 ŷ_t³ + u_t

If γ_2 and/or γ_3 are significant, then there are nonlinearities

F-test of γ_2 = γ_3 = 0 (maybe even higher orders)

The test is implemented in many statistical software packages 141
135 RESET in the linear model:

1. Estimate the linear model and calculate S_ûû and the fitted values ŷ_t

2. Add L powers of ŷ_t to the linear model,

y_t = α + β_1 x_t + γ_2 ŷ_t² + γ_3 ŷ_t³ + u_t

Estimate the extended model and calculate its sum of squared residuals S*_ûû

3. The null hypothesis is H_0: γ_2 = γ_3 = 0 142
136 4. Compute the F-test statistic

F(L, T - K - 1) = [ ( S_ûû - S*_ûû ) / L ] / [ S*_ûû / (T - K - 1) ]

where S_ûû and S*_ûû are the sums of squared residuals of the linear and the extended model, respectively, and K is the number of exogenous variables in the extended model

5. If F > F_a (significance level a, degrees of freedom L and T - K - 1), then H_0 is rejected and the linear model is discarded

Milk example {18} 143
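Steps 1 to 5 can be sketched in pure Python. The OLS helper solves the normal equations by Gauss-Jordan elimination; the data are hypothetical (a quadratic relationship with a small alternating disturbance), chosen so that the test has something to detect:

```python
def ols(X, y):
    """Fit OLS by solving the normal equations X'X b = X'y (Gauss-Jordan
    with partial pivoting); returns (coefficients, SSR, fitted values)."""
    k = len(X[0])
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(X, y))] for i in range(k)]
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    beta = [A[i][k] / A[i][i] for i in range(k)]
    fitted = [sum(b * xi for b, xi in zip(beta, row)) for row in X]
    ssr = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    return beta, ssr, fitted

def reset_F(x, y, L=2):
    """RESET: compare the linear model with one extended by L powers
    of the fitted values (y_hat^2, ..., y_hat^(L+1))."""
    T = len(y)
    _, ssr_lin, yhat = ols([[1.0, xi] for xi in x], y)        # step 1
    X_ext = [[1.0, xi] + [yh ** p for p in range(2, L + 2)]   # step 2
             for xi, yh in zip(x, yhat)]
    K = len(X_ext[0]) - 1      # exogenous variables in the extended model
    _, ssr_ext, _ = ols(X_ext, y)
    return ((ssr_lin - ssr_ext) / L) / (ssr_ext / (T - K - 1))  # step 4

# hypothetical data: small alternating disturbance on top of the signal
x = [float(t) for t in range(1, 11)]
e = [0.1 * (-1) ** t for t in range(1, 11)]
y_quad = [1 + xi + xi ** 2 + ei for xi, ei in zip(x, e)]  # nonlinear truth
y_lin = [2 + 3 * xi + ei for xi, ei in zip(x, e)]         # linear truth

print(reset_F(x, y_quad))  # very large F: linearity rejected
print(reset_F(x, y_lin))   # small F: no evidence of misspecification
```

Step 5 would compare each F with the critical value F_a for (L, T - K - 1) degrees of freedom; here the two cases sit far on opposite sides of any conventional critical value.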
137 Qualitative exogenous variables

Assumption A3: The parameters β are constant for all T observations (x_t, y_t)

Example: The wage y_t depends on education x_{1t} and age x_{2t},

y_t = α + β_1 x_{1t} + β_2 x_{2t} + u_t

The wage equations for males and females might be different:

y_t = α_M + β_{M1} x_{1t} + β_{M2} x_{2t} + u_t
y_t = α_F + β_{F1} x_{1t} + β_{F2} x_{2t} + u_t

What happens if the difference is neglected? [qualitative.r] 144
138 Dummy variable

D_t = 0 if male, 1 if female

Extended model:

y_t = α + γ D_t + β_1 x_{1t} + δ_1 D_t x_{1t} + β_2 x_{2t} + δ_2 D_t x_{2t} + u_t

Model for men (D_t = 0):

y_t = α + β_1 x_{1t} + β_2 x_{2t} + u_t

Model for women (D_t = 1):

y_t = (α + γ) + (β_1 + δ_1) x_{1t} + (β_2 + δ_2) x_{2t} + u_t 145
139 If the qualitative variable has more than two values, we need more than one dummy variable

Example: Religion (protestant, catholic, other)

D_{Pt} = 1 for protestant, 0 for catholic or other
D_{Ct} = 1 for catholic, 0 for protestant or other

Meaning of the coefficients; testing structural stability 146
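The coding can be sketched directly; "other" is the omitted reference category, so its effect is absorbed by the constant (the sample below is hypothetical):

```python
# Two dummies encode a three-valued qualitative variable; the omitted
# category ("other") is the reference absorbed by the intercept.
religion = ["protestant", "catholic", "other", "catholic", "protestant"]

D_P = [1 if r == "protestant" else 0 for r in religion]
D_C = [1 if r == "catholic" else 0 for r in religion]

print(D_P)  # [1, 0, 0, 0, 1]
print(D_C)  # [0, 1, 0, 1, 0]
```

Note that adding a third dummy for "other" alongside the constant would make the regressors perfectly collinear (the dummy-variable trap), which is why one category must stay out.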
140 Estimation of the model

Use the ordinary t- or F-tests to detect differences in the coefficients, e.g.

H_0: γ = δ_1 = δ_2 = 0

Very often, the model includes only a level effect, i.e.

y_t = α + γ D_t + β_1 x_{1t} + β_2 x_{2t} + u_t

Then use a t-test for γ 147
141 Estimation of the wage equation model

y_t = α + γ D_t + β_1 x_{1t} + δ_1 D_t x_{1t} + β_2 x_{2t} + δ_2 D_t x_{2t} + u_t

Compare with separate estimation of the two models [wages.r]

y_t = α_M + β_{M1} x_{1t} + β_{M2} x_{2t} + u_t   for men
y_t = α_F + β_{F1} x_{1t} + β_{F2} x_{2t} + u_t   for women

The point estimates and the sum of squared residuals are identical (why?)

The standard errors differ (why?) 148
142 For simplicity we only consider one exogenous variable,

y_t = α + γ D_t + β x_t + δ D_t x_t + u_t

Order the observations such that D_t = 0 for t = 1, ..., T_1 and D_t = 1 for t = T_1 + 1, ..., T

The joint estimation minimizes (with respect to α, β, γ, δ)

S(α, β, γ, δ) = Σ_{t=1}^{T_1} ( y_t - α - β x_t )² + Σ_{t=T_1+1}^{T} ( y_t - (α + γ) - (β + δ) x_t )² 149
143 The first order conditions for the joint estimation are (dropping the common factor -2)

∂S/∂α: Σ_{t=1}^{T_1} ( y_t - α - β x_t ) + Σ_{t=T_1+1}^{T} ( y_t - (α + γ) - (β + δ) x_t ) = 0

∂S/∂β: Σ_{t=1}^{T_1} ( y_t - α - β x_t ) x_t + Σ_{t=T_1+1}^{T} ( y_t - (α + γ) - (β + δ) x_t ) x_t = 0

∂S/∂γ: Σ_{t=T_1+1}^{T} ( y_t - (α + γ) - (β + δ) x_t ) = 0

∂S/∂δ: Σ_{t=T_1+1}^{T} ( y_t - (α + γ) - (β + δ) x_t ) x_t = 0 150
144 Hence, the point estimates in the joint estimation are identical to those of the separate estimations

If the point estimates are identical, then so are the residuals; and if the residuals are identical, then so are the sums of squared residuals

As to the standard errors: in the joint model we estimate

σ̂² = S_ûû / (T - 4)

while in the separate estimations we estimate

σ̂²_0 = S⁰_ûû / (T_1 - 2)
σ̂²_1 = S¹_ûû / ((T - T_1) - 2) 151
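The equivalence above can be checked numerically: the fully interacted joint model reproduces the two separate fits exactly, and the joint SSR equals the sum of the two separate SSRs. A sketch with hypothetical wage data (pure Python; the OLS helper solves the normal equations by Gauss-Jordan elimination):

```python
def ols(X, y):
    """OLS via the normal equations X'X b = X'y (Gauss-Jordan with
    partial pivoting); returns (coefficients, sum of squared residuals)."""
    k = len(X[0])
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(X, y))] for i in range(k)]
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    beta = [A[i][k] / A[i][i] for i in range(k)]
    ssr = sum((yi - sum(b * xi for b, xi in zip(beta, row))) ** 2
              for row, yi in zip(X, y))
    return beta, ssr

# hypothetical data: men first (D_t = 0), then women (D_t = 1)
x_m, y_m = [10.0, 12.0, 14.0, 16.0], [4.9, 5.7, 6.1, 7.0]
x_w, y_w = [10.0, 12.0, 14.0, 16.0], [4.1, 4.6, 5.3, 5.7]

# separate estimations: y = alpha + beta * x in each group
b_m, ssr_m = ols([[1.0, xi] for xi in x_m], y_m)
b_w, ssr_w = ols([[1.0, xi] for xi in x_w], y_w)

# joint estimation with dummy and interaction: y = a + g*D + b*x + d*D*x
X = [[1.0, 0.0, xi, 0.0] for xi in x_m] + [[1.0, 1.0, xi, xi] for xi in x_w]
bj, ssr_j = ols(X, y_m + y_w)
a, g, b, d = bj

print(abs(a - b_m[0]), abs(b - b_m[1]))              # matches men's fit
print(abs((a + g) - b_w[0]), abs((b + d) - b_w[1]))  # matches women's fit
print(abs(ssr_j - (ssr_m + ssr_w)))                  # SSRs add up, ~0
```

The only difference, as the slide notes, is in the standard errors: the joint model pools one error variance over T - 4 degrees of freedom, while the separate fits estimate two variances on their own subsamples.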
145 Remarks

What happens if the dummy variables are not 0/1-coded but 1/2-coded?

Consider the model

y_t = α + γ D_{1t} + δ D_{2t} + β x_t + u_t

where

D_{1t} = 0 for males, 1 for females
D_{2t} = 0 for German citizenship, 1 otherwise

Interaction terms 152