3. Linear Regression With a Single Regressor


Kellie Nicholson
5 years ago

1 Econometrics: (I)
- Application of statistical methods in empirical research
- Testing economic theory with real-world data (data analysis) 56

2 Econometrics: (II) Econometric tasks:
1. Economic model
   Specification: functional form (A assumptions), error term (B assumptions), variables (C assumptions)
2. Econometric model
   Estimation 57

3 Econometrics: (III)
3. Model estimation
   Hypothesis testing, prediction
Data:
- We need good data in empirical economics
- Collecting data often poses practical problems (historical data vs. statistical experiments)
- No general (systematic) instructions 58

4 Types of data sets:
- Cross-sectional data (observations on distinct entities at a given point of time)
- Time-series data (observations on a single entity observed at multiple points of time)
- Panel data, also called longitudinal data (observations on distinct entities observed at multiple points of time) 59

5 Meaning of the notion "regression": Specification and analysis of a functional (parametric) relationship between a dependent (endogenous) variable $y$ and a set of independent (exogenous) variables
- 1 exogenous variable ($x$) → regression with a single regressor
- $k$ exogenous variables ($x_1, \ldots, x_k$) → regression with multiple regressors (see Section 4) 60

6 Example: Invoice price $x_i$ and tip $y_i$ (in euros) sampled from 20 guests in a restaurant
[Table with columns $i$, $x_i$, $y_i$ for the 20 guests; values missing]

7 3.1 Model Specification
1. Functional form (A assumptions)
Specification in 3 steps:
Step #1: $y = f(x)$, here: $y = \alpha + \beta x$ ("true" relationship between invoice price $x$ and tip $y$)
[Figure: tip $y$ plotted against invoice price $x$] 62

8 Step #2: $y_i = \alpha + \beta x_i$ for $i = 1, \ldots, N$ (economic model)
[Figure: data in our example, $y_i$ plotted against $x_i$] 63

9 Step #3: $y_i = \alpha + \beta x_i + u_i$ for $i = 1, \ldots, N$ (econometric model)
Remarks:
- We call $\alpha$ and $\beta$ regression parameters or coefficients
- The random variable $u_i$ constitutes an error term 64

10 The A assumptions:
Assumption #A1: Our econometric model does not lack any relevant exogenous variables beyond $x_i$, and the included exogenous variable $x_i$ is not irrelevant
Assumption #A2: The true relationship between $x_i$ and $y_i$ is linear
Assumption #A3: The coefficients $\alpha$ and $\beta$ are constant for all $N$ observations $(x_i, y_i)$ 65

11 2. Specification of the error term (B assumptions)
Rationale for the error term:
- Sampling and measurement errors
- Omitted exogenous variables
- Human behavior 66

12 The B assumptions:
Assumption #B1: For $i = 1, \ldots, N$ we have $E(u_i) = 0$
Assumption #B2: For $i = 1, \ldots, N$ we have $Var(u_i) = \sigma^2$
Assumption #B3: For $i = 1, \ldots, N$ and $j = 1, \ldots, N$ ($i \neq j$) we have $Cov(u_i, u_j) = 0$
Assumption #B4: The error terms $u_i$ are normally distributed, i.e. $u_i \sim N(0, \sigma^2)$ 67

13 Illustration of the B assumptions 68

14 3. Specification of the variables (C assumptions)
The C assumptions:
Assumption #C1: The exogenous regressor $x_i$ is deterministic (not a random variable) and can be controlled in an experiment
Assumption #C2: The regressor $x_i$ is not constant for all $i = 1, \ldots, N$ (i.e. there is variation among the $x_i$) 69

15 3.2 (Point) Estimation
Until now: Specification of the econometric model $y_i = \alpha + \beta x_i + u_i$
Now: Derivation of estimators $\hat\alpha$ and $\hat\beta$ of the unknown parameters $\alpha$ and $\beta$ via the Ordinary Least Squares (OLS) method 70

16 To this end: Distinction between the true and the estimated sphere
- True econometric model: $y_i = \alpha + \beta x_i + u_i$
- Corresponding estimated model: $\hat y_i = \hat\alpha + \hat\beta x_i$ (after having derived appropriate estimators $\hat\alpha$ and $\hat\beta$)
→ notion of the residual 71

17 Definition 3.1: (Residual) We call the deviation between the true value $y_i$ and the estimate $\hat y_i$ the $i$th residual and denote it by $\hat u_i$:
$\hat u_i = y_i - \hat y_i = y_i - \hat\alpha - \hat\beta x_i$.
Remarks:
- Note the difference between $\hat u_i$ and the $i$th error term, for which we have $u_i = y_i - \alpha - \beta x_i$
- $u_i$ rests on the true parameters $\alpha$ and $\beta$, whereas $\hat u_i$ rests on the parameter estimates $\hat\alpha$ and $\hat\beta$ 72

18 Basic idea of OLS estimation: Find estimators $\hat\alpha$ and $\hat\beta$ of the unknown parameters $\alpha$ and $\beta$ from the true econometric model such that $\hat\alpha$ and $\hat\beta$ minimize the sum of squared residuals:
$\sum_{i=1}^{N} \hat u_i^2 = \sum_{i=1}^{N} (y_i - \hat\alpha - \hat\beta x_i)^2$ 73

19 Theorem 3.2: (OLS estimators) In the linear regression model with a single regressor $y_i = \alpha + \beta x_i + u_i$, $i = 1, \ldots, N$, the Ordinary Least Squares (OLS) estimators are given by
$\hat\beta = \frac{\sum_{i=1}^{N} (x_i - \bar x)(y_i - \bar y)}{\sum_{i=1}^{N} (x_i - \bar x)^2}$, $\hat\alpha = \bar y - \hat\beta \bar x$.
(Proof: Class) 74
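
The closed-form estimators of Theorem 3.2 can be sketched in a few lines of Python. This is only an illustration: the function name `ols_single_regressor` and the five data points are assumptions made here (the 20 restaurant observations are not reproduced), and only the standard library is used.

```python
def ols_single_regressor(x, y):
    """Return (alpha_hat, beta_hat) computed from the closed-form OLS formulas."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sum of squared deviations of x and sum of cross products
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    if s_xx == 0:
        raise ValueError("the regressor must vary (Assumption #C2)")
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    beta_hat = s_xy / s_xx
    alpha_hat = y_bar - beta_hat * x_bar
    return alpha_hat, beta_hat


# Hypothetical invoice prices and tips (NOT the data of the tip example)
invoice = [10.0, 20.0, 30.0, 40.0, 50.0]
tip = [2.0, 3.0, 5.0, 4.0, 6.0]
alpha_hat, beta_hat = ols_single_regressor(invoice, tip)
print(f"alpha_hat = {alpha_hat:.4f}, beta_hat = {beta_hat:.4f}")
```

For these made-up numbers, $S_{xy} = 90$ and $S_{xx} = 1000$, so the slope is 0.09 and the intercept is $4 - 0.09 \cdot 30 = 1.3$.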

20 Remarks:
- The OLS estimators are formally derived by
  1. partial differentiation of the sum of squared residuals with respect to $\hat\alpha$ and $\hat\beta$
  2. setting the partial derivatives equal to zero
  3. solving the system of equations for $\hat\alpha$ and $\hat\beta$
- Assumption #B4, postulating normally distributed error terms, is not needed in deriving the OLS estimators 75
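
The three derivation steps can be written out as follows. This is a sketch of the standard argument, not necessarily the proof given in class; the symbol $S$ for the sum of squared residuals is notation introduced here.

```latex
S(\hat\alpha, \hat\beta) = \sum_{i=1}^{N} (y_i - \hat\alpha - \hat\beta x_i)^2

% Steps 1 and 2: first-order conditions
\frac{\partial S}{\partial \hat\alpha} = -2 \sum_{i=1}^{N} (y_i - \hat\alpha - \hat\beta x_i) = 0
\frac{\partial S}{\partial \hat\beta}  = -2 \sum_{i=1}^{N} x_i (y_i - \hat\alpha - \hat\beta x_i) = 0

% Step 3: the first condition gives
\hat\alpha = \bar y - \hat\beta \bar x

% Inserting this into the second condition:
\sum_{i=1}^{N} x_i \left[ (y_i - \bar y) - \hat\beta (x_i - \bar x) \right] = 0
\;\Longrightarrow\;
\hat\beta = \frac{\sum_{i=1}^{N} x_i (y_i - \bar y)}{\sum_{i=1}^{N} x_i (x_i - \bar x)}
          = \frac{\sum_{i=1}^{N} (x_i - \bar x)(y_i - \bar y)}{\sum_{i=1}^{N} (x_i - \bar x)^2}
```

The last equality uses $\sum_{i=1}^{N} \bar x (y_i - \bar y) = 0$ and $\sum_{i=1}^{N} \bar x (x_i - \bar x) = 0$.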

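21 Example: (Tip) (I)
$\bar x = \frac{1}{20} \sum_{i=1}^{20} x_i = 31.50$, $\bar y = \frac{1}{20} \sum_{i=1}^{20} y_i =$ [value missing]
$\sum_{i=1}^{20} (x_i - \bar x)^2 = 4130$, $\sum_{i=1}^{20} (x_i - \bar x)(y_i - \bar y) = 519$
$\hat\beta = 519/4130 \approx 0.1257$
$\hat\alpha = \bar y - \hat\beta \bar x =$ [value missing]
Estimated model: $\hat y_i = \hat\alpha + \hat\beta x_i$ [numerical coefficients missing] 76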

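22 Example: (Tip) (II)
Residuals: $\hat u_i = y_i - \hat y_i = y_i - \hat\alpha - \hat\beta x_i$ [numerical coefficients missing]
Sum of squared residuals: $\sum_{i=1}^{20} \hat u_i^2 =$ [value missing]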

23 [Figure: tip $y_i$ plotted against invoice price $x_i$]

24 For the (true) econometric model we have
$E(y_i) = E(\alpha + \beta x_i + u_i) = E(\alpha) + E(\beta x_i) + E(u_i) = \alpha + \beta x_i$
and
$Var(y_i) = E\{[y_i - E(y_i)]^2\} = E\{[y_i - \alpha - \beta x_i]^2\} = E(u_i^2) = Var(u_i) = \sigma^2$ 79

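25 Theorem 3.3: (Unbiasedness of the OLS estimators) Under the A, B, C assumptions (without #B4) we have $E(\hat\alpha) = \alpha$, $E(\hat\beta) = \beta$, i.e. the OLS estimators are unbiased. Furthermore,
$Var(\hat\alpha) = \sigma^2 \left[ \frac{1}{N} + \frac{\bar x^2}{\sum_{i=1}^{N} (x_i - \bar x)^2} \right]$
$Var(\hat\beta) = \frac{\sigma^2}{\sum_{i=1}^{N} (x_i - \bar x)^2}$
$Cov(\hat\alpha, \hat\beta) = -\frac{\sigma^2 \bar x}{\sum_{i=1}^{N} (x_i - \bar x)^2}$. 80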
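
Unbiasedness and the variance formula for $\hat\beta$ can be checked by simulation. The sketch below is an illustration under assumed values ($\alpha = 1$, $\beta = 0.1$, $\sigma = 1$, five fixed regressor values); the errors are drawn from a normal distribution for convenience only, since #B4 is not required by the theorem.

```python
import random


def ols(x, y):
    """Closed-form OLS estimates (alpha_hat, beta_hat) for a single regressor."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    beta_hat = s_xy / s_xx
    return y_bar - beta_hat * x_bar, beta_hat


def simulate(alpha, beta, sigma, x, reps, seed=0):
    """Repeatedly draw y_i = alpha + beta*x_i + u_i with fixed x (Assumption #C1).

    Returns the mean of alpha_hat, the mean of beta_hat and the sample
    variance of beta_hat over all replications.
    """
    rng = random.Random(seed)
    alphas, betas = [], []
    for _ in range(reps):
        y = [alpha + beta * xi + rng.gauss(0.0, sigma) for xi in x]
        a_hat, b_hat = ols(x, y)
        alphas.append(a_hat)
        betas.append(b_hat)
    mean_a = sum(alphas) / reps
    mean_b = sum(betas) / reps
    var_b = sum((b - mean_b) ** 2 for b in betas) / (reps - 1)
    return mean_a, mean_b, var_b


x_fixed = [10.0, 20.0, 30.0, 40.0, 50.0]  # hypothetical regressor values
mean_a, mean_b, var_b = simulate(1.0, 0.1, 1.0, x_fixed, reps=2000)
s_xx = sum((xi - 30.0) ** 2 for xi in x_fixed)  # x_bar = 30, so s_xx = 1000
print(mean_a, mean_b, var_b, 1.0 / s_xx)  # last value: theoretical Var(beta_hat)
```

The averages should lie close to the true parameters, and the simulated variance of $\hat\beta$ close to $\sigma^2 / \sum (x_i - \bar x)^2 = 0.001$.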

26 Theorem 3.4: (Gauß-Markov-Theorem) (a) Under the A, B, C assumptions (without #B4) the OLS estimators ˆα and ˆβ have minimal variance among all linear and unbiased estimators of α and β. (BLUE = Best Linear Unbiased Estimators) (b) Additionally, if the normality assumption #B4 for the error terms u i holds, then the OLS estimators ˆα and ˆβ have minimal variance among all unbiased estimators of α and β. (UMVUE = Uniformly minimum-variance unbiased estimators) 81

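27 Remark: Under the normality assumption #B4 it follows that
1. the distribution of $y_i$ is given by $y_i \sim N(\alpha + \beta x_i, \sigma^2)$ for all $i = 1, \ldots, N$
2. the distributions of the OLS estimators are given by
$\hat\alpha \sim N\left( \alpha, \sigma^2 \left[ \frac{1}{N} + \frac{\bar x^2}{\sum_{i=1}^{N} (x_i - \bar x)^2} \right] \right)$
$\hat\beta \sim N\left( \beta, \frac{\sigma^2}{\sum_{i=1}^{N} (x_i - \bar x)^2} \right)$ 82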

28 Now: Maximum likelihood estimation of the linear regression model $y_i = \alpha + \beta x_i + u_i$ under all A, B, C assumptions, i.e. in the case of independent, identically distributed $u_i \sim N(0, \sigma^2)$
Assumptions #B3 and #B4 imply stochastic independence among all $u_i$ 83

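29 Derivation: (I)
$y_i$ is a linear function of $u_i$ → the $y_i$ are independent with $y_i \sim N(\alpha + \beta x_i, \sigma^2)$
The probability density function of $y_i$ is given by
$f_{y_i}(y) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left\{ -\frac{1}{2} \left[ \frac{y - \alpha - \beta x_i}{\sigma} \right]^2 \right\}$ 84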

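30 Derivation: (II)
The joint density of the endogenous $y_i$ is given by
$f_{y_1, \ldots, y_N}(y_1, \ldots, y_N) = \prod_{i=1}^{N} f_{y_i}(y_i)$
$= \prod_{i=1}^{N} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left\{ -\frac{1}{2} \left[ \frac{y_i - \alpha - \beta x_i}{\sigma} \right]^2 \right\}$
$= \frac{1}{(2\pi\sigma^2)^{N/2}} \exp\left\{ -\frac{1}{2\sigma^2} \sum_{i=1}^{N} (y_i - \alpha - \beta x_i)^2 \right\}$
The likelihood function is given by
$L(\alpha, \beta, \sigma^2) = \frac{1}{(2\pi\sigma^2)^{N/2}} \exp\left\{ -\frac{1}{2\sigma^2} \sum_{i=1}^{N} (y_i - \alpha - \beta x_i)^2 \right\}$ 85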

31 Derivation: (III)
The log-likelihood function is given by
$L^*(\alpha, \beta, \sigma^2) = \ln[L(\alpha, \beta, \sigma^2)] = -\frac{N}{2} \ln(2\pi\sigma^2) - \frac{1}{2\sigma^2} \sum_{i=1}^{N} (y_i - \alpha - \beta x_i)^2$
For given $\sigma^2$ the log-likelihood $L^*$ becomes maximal with respect to $\alpha$ and $\beta$ when $\sum_{i=1}^{N} (y_i - \alpha - \beta x_i)^2$ becomes minimal with respect to $\alpha$ and $\beta$
→ the ML estimators of $\alpha$ and $\beta$ coincide with the OLS estimators 86

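32 Derivation: (IV)
Differentiating the log-likelihood $L^* = \ln L$ with respect to $\sigma^2$ yields
$\frac{\partial L^*}{\partial \sigma^2} = -\frac{N}{2} \cdot \frac{1}{2\pi\sigma^2} \cdot 2\pi + \frac{1}{(2\sigma^2)^2} \cdot 2 \sum_{i=1}^{N} (y_i - \alpha - \beta x_i)^2$
$= -\frac{N}{2\sigma^2} + \frac{1}{2\sigma^4} \sum_{i=1}^{N} (y_i - \alpha - \beta x_i)^2$
Setting the latter term equal to zero and inserting the ML estimators of $\alpha$, $\beta$, we obtain
$-\frac{N}{2\hat\sigma^2_{ML}} + \frac{1}{2\hat\sigma^4_{ML}} \sum_{i=1}^{N} \underbrace{(y_i - \hat\alpha_{ML} - \hat\beta_{ML} x_i)^2}_{= \hat u_i^2} = 0$ 87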

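33 Derivation: (V)
Solving for $\hat\sigma^2_{ML}$, we find the ML estimator of $\sigma^2$:
$\hat\sigma^2_{ML} = \frac{1}{N} \sum_{i=1}^{N} \hat u_i^2$
Remarks:
- The ML estimator $\hat\sigma^2_{ML}$ is biased: $E(\hat\sigma^2_{ML}) = \frac{N-2}{N} \sigma^2$
- Thus, an unbiased estimator of $\sigma^2$ is given by $\hat\sigma^2 = \frac{N}{N-2} \hat\sigma^2_{ML} = \frac{1}{N-2} \sum_{i=1}^{N} \hat u_i^2$ 88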
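
The two variance estimators differ only in the divisor applied to the sum of squared residuals. A minimal sketch, assuming a hypothetical helper `sigma2_estimates` and made-up data:

```python
def sigma2_estimates(x, y):
    """Return (sigma2_ml, sigma2_unbiased) for the single-regressor model."""
    n = len(x)
    if n <= 2:
        raise ValueError("need more than 2 observations")
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    beta_hat = s_xy / s_xx
    alpha_hat = y_bar - beta_hat * x_bar
    # Sum of squared residuals u_hat_i = y_i - alpha_hat - beta_hat * x_i
    ssr = sum((yi - alpha_hat - beta_hat * xi) ** 2 for xi, yi in zip(x, y))
    return ssr / n, ssr / (n - 2)


# Hypothetical data (not the tip example)
x = [10.0, 20.0, 30.0, 40.0, 50.0]
y = [2.0, 3.0, 5.0, 4.0, 6.0]
sigma2_ml, sigma2_unbiased = sigma2_estimates(x, y)
print(sigma2_ml, sigma2_unbiased)
```

Here the sum of squared residuals is 1.9, so the ML estimate is $1.9/5 = 0.38$ and the unbiased estimate is $1.9/3 \approx 0.633$; the ratio of the two is $N/(N-2) = 5/3$.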

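34 Summary: (I)
The OLS estimators of $\alpha$ and $\beta$ are given by
$\hat\alpha = \bar y - \hat\beta \bar x$, $\hat\beta = \frac{\sum_{i=1}^{N} (x_i - \bar x)(y_i - \bar y)}{\sum_{i=1}^{N} (x_i - \bar x)^2}$
The variances of the OLS estimators are given by
$Var(\hat\alpha) = \sigma^2 \left[ \frac{1}{N} + \frac{\bar x^2}{\sum_{i=1}^{N} (x_i - \bar x)^2} \right]$, $Var(\hat\beta) = \frac{\sigma^2}{\sum_{i=1}^{N} (x_i - \bar x)^2}$ 89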

35 Summary: (II)
An unbiased estimator of $\sigma^2$ is given by
$\hat\sigma^2 = \frac{N}{N-2} \hat\sigma^2_{ML} = \frac{1}{N-2} \sum_{i=1}^{N} \hat u_i^2$
→ standard errors of the OLS estimators 90

36 Definition 3.5: (Standard errors of the OLS estimators) Replacing the unknown $\sigma^2$ in the variance formulae of the OLS estimators $\hat\alpha$ and $\hat\beta$ given in Theorem 3.3 (Slide 80) by the unbiased estimator
$\hat\sigma^2 = \frac{1}{N-2} \sum_{i=1}^{N} \hat u_i^2$
and taking the square root, we obtain the so-called standard errors of the OLS estimators:
$SE(\hat\alpha) = \sqrt{ \hat\sigma^2 \left[ \frac{1}{N} + \frac{\bar x^2}{\sum_{i=1}^{N} (x_i - \bar x)^2} \right] }$, $SE(\hat\beta) = \sqrt{ \frac{\hat\sigma^2}{\sum_{i=1}^{N} (x_i - \bar x)^2} }$. 91

37 Remark: The standard error of an estimator is an estimator of the standard deviation of the estimator and thus constitutes an important measure of the precision of the estimator 92
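
As a sketch of how the standard errors are obtained in practice, the following function plugs the unbiased variance estimate into the two variance formulas. The function name `standard_errors` and the data are assumptions for illustration only.

```python
import math


def standard_errors(x, y):
    """Return (se_alpha, se_beta), the standard errors of the OLS estimators."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    beta_hat = s_xy / s_xx
    alpha_hat = y_bar - beta_hat * x_bar
    ssr = sum((yi - alpha_hat - beta_hat * xi) ** 2 for xi, yi in zip(x, y))
    sigma2_hat = ssr / (n - 2)  # unbiased estimator of sigma^2
    se_alpha = math.sqrt(sigma2_hat * (1.0 / n + x_bar ** 2 / s_xx))
    se_beta = math.sqrt(sigma2_hat / s_xx)
    return se_alpha, se_beta


# Hypothetical data (not the tip example)
x = [10.0, 20.0, 30.0, 40.0, 50.0]
y = [2.0, 3.0, 5.0, 4.0, 6.0]
se_alpha, se_beta = standard_errors(x, y)
print(f"SE(alpha_hat) = {se_alpha:.4f}, SE(beta_hat) = {se_beta:.4f}")
```

Regression software reports these quantities in the "Std. Error" column next to each coefficient.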

38 Example: Standard errors of the OLS estimators in the tip example (EViews output; numerical values missing)
Dependent Variable: TIP
Method: Least Squares
Date: 06/25/04 Time: 21:46
Sample: 1 20
Included observations: 20
Variable              Coefficient   Std. Error   t-statistic   Prob.
C
INVOICE_PRICEICE
R-squared             Mean dependent var
Adjusted R-squared    S.D. dependent var
S.E. of regression    Akaike info criterion
Sum squared resid     Schwarz criterion
Log likelihood        F-statistic
Durbin-Watson stat    Prob(F-statistic)

39 Question: How well does our model estimated by OLS, $\hat y_i = \hat\alpha + \hat\beta x_i$, fit the data? How can we measure the fit of our regression line?
→ Coefficient of determination $R^2$ 94

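40 Derivation:
Variation of the endogenous $y_i$: $S_{yy} \equiv \sum_{i=1}^{N} (y_i - \bar y)^2$
Sum of squared residuals and explained variation: $S_{\hat u \hat u} \equiv \sum_{i=1}^{N} \hat u_i^2$ and $S_{\hat y \hat y} \equiv \sum_{i=1}^{N} (\hat y_i - \bar{\hat y})^2 = \sum_{i=1}^{N} (\hat y_i - \bar y)^2$ 95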

41 Theorem 3.6: If the true relationship between the endogenous $y_i$ and the exogenous $x_i$ is linear (i.e. if Assumption #A2 is satisfied), then after OLS estimation the following result holds:
$S_{yy} = S_{\hat y \hat y} + S_{\hat u \hat u}$.
Remark: Via OLS estimation the entire variation in the $y_i$ observations can be divided into 2 components:
1. the variation explained by the estimated model (i.e. by the estimated regression line), $S_{\hat y \hat y}$
2. the unexplained variation of the estimated model, $S_{\hat u \hat u}$ 96
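
The decomposition of Theorem 3.6 is easy to verify numerically. The sketch below uses a hypothetical function `variation_decomposition` and made-up data; it computes the three sums of squares after an OLS fit with intercept.

```python
def variation_decomposition(x, y):
    """Return (s_yy, s_fit, s_res): total, explained and residual variation."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    beta_hat = s_xy / s_xx
    alpha_hat = y_bar - beta_hat * x_bar
    y_fit = [alpha_hat + beta_hat * xi for xi in x]
    s_yy = sum((yi - y_bar) ** 2 for yi in y)            # entire variation
    s_fit = sum((fi - y_bar) ** 2 for fi in y_fit)       # explained variation
    s_res = sum((yi - fi) ** 2 for yi, fi in zip(y, y_fit))  # unexplained variation
    return s_yy, s_fit, s_res


# Hypothetical data (not the tip example)
x = [10.0, 20.0, 30.0, 40.0, 50.0]
y = [2.0, 3.0, 5.0, 4.0, 6.0]
s_yy, s_fit, s_res = variation_decomposition(x, y)
print(s_yy, s_fit + s_res)
```

For these numbers the total variation of 10 splits into an explained part of 8.1 and a residual part of 1.9, up to floating-point rounding.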

42 Definition 3.7: (Coefficient of determination) The coefficient of determination of a regression model, $R^2$, is defined as that portion of the entire variation in the $y_i$ observations that is explained by the model (i.e. by the regression line):
$R^2 = \frac{\text{explained variation}}{\text{entire variation}} = \frac{S_{\hat y \hat y}}{S_{yy}}$.
Remarks: (I) It follows from Theorem 3.6 that
$R^2 = \frac{S_{\hat y \hat y}}{S_{yy}} = \frac{S_{yy} - S_{\hat u \hat u}}{S_{yy}} = 1 - \frac{S_{\hat u \hat u}}{S_{yy}}$ 97

43 Remarks: (II)
We always have $0 \leq R^2 \leq 1$
- $R^2 = 0$: $S_{\hat y \hat y} = 0$, i.e. the entire variation of the $y_i$ observations is explained by the variation in the residuals $\hat u_i$ → the regression does not explain anything
- $R^2 = 1$: $S_{\hat y \hat y} = S_{yy}$, i.e. the entire variation of the $y_i$ observations is completely explained by the variation of the estimated observations $\hat y_i$ → the regression model explains everything 98

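44 Remarks: (III)
Direct computation of $R^2$ from the data $(y_i, x_i)$, with $S_{xy} = \sum_{i=1}^{N} (x_i - \bar x)(y_i - \bar y)$ and $S_{xx} = \sum_{i=1}^{N} (x_i - \bar x)^2$:
$R^2 = \frac{\hat\beta S_{xy}}{S_{yy}} = \frac{S_{xy}^2}{S_{xx} S_{yy}} = \frac{\frac{1}{N^2} S_{xy}^2}{\frac{1}{N} S_{xx} \cdot \frac{1}{N} S_{yy}} = \frac{\frac{1}{(N-1)^2} S_{xy}^2}{\frac{1}{N-1} S_{xx} \cdot \frac{1}{N-1} S_{yy}}$
(square of the sample correlation coefficient) 99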
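
That $R^2$ equals the squared sample correlation can be confirmed with a short computation. This is an illustrative sketch with a hypothetical function name and made-up data; it evaluates $R^2$ once from the residuals and once from the correlation coefficient.

```python
import math


def r_squared_two_ways(x, y):
    """Return (R^2 from residuals, squared sample correlation of x and y)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_yy = sum((yi - y_bar) ** 2 for yi in y)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    beta_hat = s_xy / s_xx
    alpha_hat = y_bar - beta_hat * x_bar
    s_res = sum((yi - alpha_hat - beta_hat * xi) ** 2 for xi, yi in zip(x, y))
    r2_from_residuals = 1.0 - s_res / s_yy
    corr = s_xy / math.sqrt(s_xx * s_yy)  # sample correlation coefficient
    return r2_from_residuals, corr ** 2


# Hypothetical data (not the tip example)
x = [10.0, 20.0, 30.0, 40.0, 50.0]
y = [2.0, 3.0, 5.0, 4.0, 6.0]
print(r_squared_two_ways(x, y))
```

With these numbers the correlation is 0.9, so both routes give $R^2 = 0.81$ up to rounding.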

45 3.3 Hypothesis Tests
Setting: Consider the linear regression model $y_i = \alpha + \beta x_i + u_i$ for $i = 1, \ldots, N$, with $u_i \sim N(0, \sigma^2)$ (Assumption #B4)
Now:
- Statistical hypothesis tests covering the parameter $\beta$
- One- and two-sided tests at the significance level $a$ 100

46 Two-sided testing problem ($q \in \mathbb{R}$):
$H_0: \beta = q$ versus $H_1: \beta \neq q$
Appropriate test statistic: $T = \frac{\hat\beta - q}{SE(\hat\beta)}$ 101

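47 Reasoning: (I)
- $\hat\beta - q$ is an estimator of the distance between $\beta$ and $q$
- The estimator should be related to the dispersion (standard deviation) of $\hat\beta$: $SD(\hat\beta) \equiv \sqrt{Var(\hat\beta)} = \sqrt{ \frac{\sigma^2}{\sum_{i=1}^{N} (x_i - \bar x)^2} }$
- Estimate $SD(\hat\beta)$ via the standard error $SE(\hat\beta) = \sqrt{ \frac{\hat\sigma^2}{\sum_{i=1}^{N} (x_i - \bar x)^2} }$, where $\hat\sigma^2 = \frac{1}{N-2} \sum_{i=1}^{N} \hat u_i^2$ 102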

48 Reasoning: (II)
Distribution of $T$ under validity of $H_0: \beta = q$: $T \sim t_{N-2}$ under $H_0$ ($t$ distribution with $N - 2$ degrees of freedom) → decision rule
Testing procedure:
- Find the $(1 - a/2)$ quantile of the $t_{N-2}$ distribution
- Critical region: $(-\infty, -t_{N-2;1-a/2}] \cup [t_{N-2;1-a/2}, +\infty)$, i.e. reject $H_0$ if $|T| \geq t_{N-2;1-a/2}$ 103
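
The two-sided testing procedure can be sketched as follows. The function names and data are assumptions for illustration; because the Python standard library has no $t$ quantile function, the critical value is passed in by the caller (here 3.182, the tabulated 0.975 quantile of the $t$ distribution with 3 degrees of freedom, matching $N = 5$ made-up observations).

```python
import math


def t_statistic(x, y, q):
    """Return T = (beta_hat - q) / SE(beta_hat) for the single-regressor model."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    beta_hat = s_xy / s_xx
    alpha_hat = y_bar - beta_hat * x_bar
    ssr = sum((yi - alpha_hat - beta_hat * xi) ** 2 for xi, yi in zip(x, y))
    se_beta = math.sqrt(ssr / (n - 2) / s_xx)
    return (beta_hat - q) / se_beta


def reject_two_sided(t_stat, t_crit):
    """Reject H0: beta = q if |T| lies in the critical region."""
    return abs(t_stat) >= t_crit


# Hypothetical data (not the tip example), N = 5, so N - 2 = 3 degrees of freedom
x = [10.0, 20.0, 30.0, 40.0, 50.0]
y = [2.0, 3.0, 5.0, 4.0, 6.0]
T = t_statistic(x, y, q=0.0)
print(f"T = {T:.3f}, reject H0: {reject_two_sided(T, t_crit=3.182)}")
```

With these numbers $T \approx 3.576$ exceeds 3.182, so $H_0: \beta = 0$ would be rejected at the 5% level.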

49 [Figure: pdf of the $t_{N-2}$ distribution, centered at 0 (i.e. $\beta = q$ under $H_0$), with rejection regions of probability $a/2$ in each tail beyond $\pm t_{N-2;1-a/2}$] 104

50 EViews output of the tip regression (numerical values missing)
Dependent Variable: TIP
Method: Least Squares
Date: 06/25/04 Time: 21:46
Sample: 1 20
Included observations: 20
Variable              Coefficient   Std. Error   t-statistic   Prob.
C
INVOICE_PRICEICE
R-squared             Mean dependent var
Adjusted R-squared    S.D. dependent var
S.E. of regression    Akaike info criterion
Sum squared resid     Schwarz criterion
Log likelihood        F-statistic
Durbin-Watson stat    Prob(F-statistic)

51 Illustration: (Tip example) (I)
$H_0: \beta = 0$ versus $H_1: \beta \neq 0$ at the level $a = 0.05$
(Realized) value of the test statistic: $T = \frac{\hat\beta}{SE(\hat\beta)} =$ [value missing]
$(1 - a/2)$ quantile of the $t_{N-2}$ distribution: $t_{N-2;1-a/2} = t_{18;0.975} \approx 2.101$
Decision rule: $|T| =$ [value missing] $> 2.101 = t_{18;0.975}$
→ Reject $H_0$ at the 5% level ($\beta$ is significantly different from zero) 106

52 Question: Is $\beta$ significantly different from 0.1 (at the level $a = 0.05$)?
Testing procedure: (I)
$H_0: \beta = 0.1$ versus $H_1: \beta \neq 0.1$ at the level $a = 0.05$
(Realized) value of the test statistic: $T = \frac{\hat\beta - 0.1}{SE(\hat\beta)} =$ [value missing]

53 Testing procedure: (II)
$(1 - a/2)$ quantile of the $t_{N-2}$ distribution: $t_{N-2;1-a/2} = t_{18;0.975} \approx 2.101$
Decision rule: $|T| =$ [value missing] $< 2.101 = t_{18;0.975}$
→ $H_0$ cannot be rejected at the 5% level ($\beta$ is not significantly different from 0.1)
Remark: The (standard) test $H_0: \beta = 0$ versus $H_1: \beta \neq 0$ is implemented in EViews 108

54 Now: One-sided hypothesis test at the level $a$
$H_0: \beta \leq q$ versus $H_1: \beta > q$
Conjecture behind this test: Invoice price has a positive impact on tip
Appropriate test statistic: $T = \frac{\hat\beta - q}{SE(\hat\beta)}$ 109

55 Reasoning: If $\hat\beta - q$ is strongly positive, then this is an indicator of $H_1$
Decision rule:
- Find the $(1 - a)$ quantile of the $t_{N-2}$ distribution, $t_{N-2;1-a}$
- Critical region: $[t_{N-2;1-a}, +\infty)$, i.e. reject $H_0$ if $T > t_{N-2;1-a}$ 110
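
A one-sided decision rule differs from the two-sided one only in the quantile used and in ignoring the sign-flipped tail. The following sketch takes the estimate, its standard error and the critical value as inputs; the function name and the numbers are hypothetical (2.353 is the tabulated 0.95 quantile of the $t$ distribution with 3 degrees of freedom).

```python
def reject_one_sided(beta_hat, se_beta, q, t_crit):
    """Test H0: beta <= q against H1: beta > q.

    Returns the test statistic T and True if T exceeds the (1 - a) quantile.
    """
    t_stat = (beta_hat - q) / se_beta
    return t_stat, t_stat > t_crit


# Hypothetical estimates: beta_hat = 0.09, SE = 0.0252, 3 degrees of freedom
t_stat, reject = reject_one_sided(beta_hat=0.09, se_beta=0.0252, q=0.05, t_crit=2.353)
print(f"T = {t_stat:.3f}, reject H0: {reject}")
```

In this made-up case $T \approx 1.587$ stays below 2.353, so $H_0: \beta \leq 0.05$ cannot be rejected at the 5% level. A strongly negative $T$ never leads to rejection, in contrast to the two-sided test.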

56 Illustration: (Tip example)
$H_0: \beta \leq 0.1$ versus $H_1: \beta > 0.1$ at the level $a = 0.05$
(Realized) test statistic: $T = \frac{\hat\beta - 0.1}{SE(\hat\beta)} =$ [value missing]
$(1 - a)$ quantile of the $t_{N-2}$ distribution: $t_{N-2;1-a} = t_{18;0.95} \approx 1.734$
Decision rule: $T =$ [value missing] $< 1.734 = t_{18;0.95}$
→ $H_0$ cannot be rejected at the 5% level 111

57 Now: Important concept with regard to hypothesis testing with econometric software → p-value
Classical procedure:
- Fix the significance level $a$ (very often: $a = 0.05$)
- Find the critical region (i.e. find the quantiles of the $H_0$ distribution of the test statistic) via the fixed significance level $a$
- Conduct this concrete test with a sample and decide at the level $a$ between rejection and non-rejection of $H_0$ 112

58 Alternative procedure:
- Do not fix the significance level $a$, but instead consider $a$ as variable
- Conduct the concrete test with a sample and obtain the realization $t$ of the test statistic $T$
- On the basis of the critical region (i.e. using the quantiles of the $H_0$ distribution of $T$), find the lowest significance level $a_{\min}$ for which the concrete realization $t$ just permits the rejection of $H_0$ 113

59 Definition 3.8: (p-value) The smallest significance level $a_{\min}$ for which the observed value $t$ of the test statistic $T$ just permits the rejection of the null hypothesis $H_0$ is called the p-value.
[Figure: pdf of the $t_{N-2}$ distribution showing the observed value $T = t$, the quantiles $\pm t_{N-2;1-a/2}$ with tail areas $a/2$, and the quantile $t_{N-2;1-a_{\min}/2}$] 114

60 Example: Consider the test problem $H_0: \beta = q$ versus $H_1: \beta \neq q$
Decision rule: reject $H_0$ if (cf. Slides ) $|T| \geq t_{N-2;1-a/2}$
Assume that the realization of the test statistic $T$ is $t$
Find the p-value $a_{\min}$ such that $|t| = t_{N-2;1-a_{\min}/2}$ 115

61 Obviously:
- If $H_0$ is rejected at the level $a_{\min}$, then $H_0$ is also rejected at any other significance level $a > a_{\min}$ → the lower the p-value $a_{\min}$, the higher the statistical justification for rejecting $H_0$
- If the p-value $a_{\min} < 0.05$, then $H_0$ can be rejected at the 5% level
- In our tip example the p-value of the coefficient $\beta$ is $a_{\min} =$ [value missing] (cf. Slide 105) → the null hypothesis $H_0: \beta = 0$ can be rejected at any conventional significance level 116
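
For the two-sided test, the condition $|t| = t_{N-2;1-a_{\min}/2}$ means $a_{\min} = 2\,[1 - F(|t|)]$, where $F$ is the cdf of the $t_{N-2}$ distribution. The sketch below evaluates this with the standard library only, integrating the $t$ density by Simpson's rule; the function names are hypothetical, and in practice a statistics package would supply the cdf directly.

```python
import math


def t_pdf(t, df):
    """Density of the t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + t * t / df) ** (-(df + 1) / 2)


def t_cdf(t, df, steps=2000):
    """CDF via Simpson's rule on [0, |t|], using symmetry about zero.

    `steps` must be even.
    """
    a = abs(t)
    if a == 0:
        return 0.5
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * t_pdf(k * h, df)
    area = total * h / 3
    return 0.5 + area if t > 0 else 0.5 - area


def two_sided_p_value(t, df):
    """Smallest level a_min at which |t| just permits rejection of H0: beta = q."""
    return 2 * (1 - t_cdf(abs(t), df))


# An observed statistic equal to the 0.975 quantile of t_18 gives a p-value near 0.05
print(two_sided_p_value(2.1009, 18))
```

A realization of exactly zero yields a p-value of 1, and larger values of $|t|$ yield smaller p-values.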

62 3.4 Prediction
Now: Conditional prediction of the endogenous value $y_0$ given the exogenous value $x_0$
Forecast via the OLS estimators: $\hat y_0 = \hat\alpha + \hat\beta x_0$ 117

63 Illustration: (Tip example)
Suppose that $x_0 = 20$ euros
Estimated model: $\hat y_i = \hat\alpha + \hat\beta x_i$ [numerical coefficients missing]
→ conditional forecast $\hat y_0 =$ [value missing]

64 Note: The true value $y_0$ will be $y_0 = \alpha + \beta x_0 + u_0$
→ forecast error $\hat y_0 - y_0 = (\hat\alpha - \alpha) + (\hat\beta - \beta) x_0 - u_0$
Two sources of the forecast error:
1. The error term $u_0$ may take on a value different from zero
2. The OLS estimators $\hat\alpha$ and $\hat\beta$ may differ from the true parameter values $\alpha$ and $\beta$ 119

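65 Reliability of the forecast: (I)
Expected value of the forecast error:
$E(\hat y_0 - y_0) = E(\hat\alpha - \alpha) + E(\hat\beta - \beta) x_0 - E(u_0) = 0$
Variance of the forecast error:
$Var(\hat y_0 - y_0) = \sigma^2 \left[ 1 + \frac{1}{N} + \frac{(x_0 - \bar x)^2}{\sum_{i=1}^{N} (x_i - \bar x)^2} \right]$ 120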

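66 Reliability of the forecast: (II)
Estimated variance of the forecast error:
$\widehat{Var}(\hat y_0 - y_0) = \hat\sigma^2 \left[ 1 + \frac{1}{N} + \frac{(x_0 - \bar x)^2}{\sum_{i=1}^{N} (x_i - \bar x)^2} \right]$
where (cf. Slide 91) $\hat\sigma^2 = \frac{1}{N-2} \sum_{i=1}^{N} \hat u_i^2$
Standard error of the forecast error:
$SE(\hat y_0 - y_0) = \sqrt{ \widehat{Var}(\hat y_0 - y_0) }$ 121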
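
A sketch of the forecast and its standard error, assuming a hypothetical function `forecast_with_se` and made-up data. It shows how the forecast error variance grows as $x_0$ moves away from $\bar x$.

```python
import math


def forecast_with_se(x, y, x0):
    """Return (y0_hat, SE of the forecast error) for a given x0."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    beta_hat = s_xy / s_xx
    alpha_hat = y_bar - beta_hat * x_bar
    ssr = sum((yi - alpha_hat - beta_hat * xi) ** 2 for xi, yi in zip(x, y))
    sigma2_hat = ssr / (n - 2)
    y0_hat = alpha_hat + beta_hat * x0
    # Estimated variance: sigma2_hat * [1 + 1/N + (x0 - x_bar)^2 / S_xx]
    var_forecast = sigma2_hat * (1 + 1 / n + (x0 - x_bar) ** 2 / s_xx)
    return y0_hat, math.sqrt(var_forecast)


# Hypothetical data (not the tip example); x_bar = 30
x = [10.0, 20.0, 30.0, 40.0, 50.0]
y = [2.0, 3.0, 5.0, 4.0, 6.0]
for x0 in (30.0, 60.0):
    y0_hat, se = forecast_with_se(x, y, x0)
    print(f"x0 = {x0}: forecast = {y0_hat:.2f}, SE = {se:.3f}")
```

At $x_0 = \bar x = 30$ the estimated variance is $0.6333 \cdot 1.2 = 0.76$; at $x_0 = 60$ it rises to $0.6333 \cdot 2.1 = 1.33$.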

67 Illustration: (Tip example) (I)
For $x_0 = 20$ the forecast is $\hat y_0 =$ [value missing]
The expected value of the forecast error is 0
The OLS regression yields (cf. output on Slides 105, 76): $\hat\sigma^2 = ($[value missing]$)^2$, $\bar x = 31.50$, $\sum_{i=1}^{N} (x_i - \bar x)^2 = 4130$

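68 Illustration: (Tip example) (II)
The estimated variance of the forecast error is
$\widehat{Var}(\hat y_0 - y_0) = ($[value missing]$)^2 \left[ 1 + \frac{1}{20} + \frac{(20 - 31.50)^2}{4130} \right] =$ [value missing]
For $x_0 = 100$ the forecast is $\hat y_0 =$ [value missing]
→ the estimated variance of the forecast error is $\widehat{Var}(\hat y_0 - y_0) =$ [value missing]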

69 Now: Find a prediction interval that contains the endogenous value $y_0$ with a pre-specified probability
Step #1: Consider the forecast error $\hat y_0 - y_0$ and its standard error $SE(\hat y_0 - y_0) = \sqrt{ \widehat{Var}(\hat y_0 - y_0) }$ 124

70 Step #2: Standardization of the forecast error
$T = \frac{ (\hat y_0 - y_0) - \overbrace{E(\hat y_0 - y_0)}^{=0} }{ SE(\hat y_0 - y_0) }$
It can be shown that $T \sim t_{N-2}$ ($t$ distribution with $N - 2$ degrees of freedom) 125

71 Step #3: (I) Derivation of the prediction interval
With probability $1 - a$ ($a \in [0, 1]$), $T$ lies in the interval $[-t_{N-2;1-a/2}, t_{N-2;1-a/2}]$, i.e.
$P\left( -t_{N-2;1-a/2} \leq \frac{\hat y_0 - y_0}{SE(\hat y_0 - y_0)} \leq t_{N-2;1-a/2} \right) = 1 - a$ 126

72 Step #3: (II)
Solving for $y_0$ we obtain
$P\left\{ \hat y_0 - t_{N-2;1-a/2} \cdot SE(\hat y_0 - y_0) \leq y_0 \leq \hat y_0 + t_{N-2;1-a/2} \cdot SE(\hat y_0 - y_0) \right\} = 1 - a$
Thus, the prediction interval is given by
$[\hat y_0 - t_{N-2;1-a/2} \cdot SE(\hat y_0 - y_0), \; \hat y_0 + t_{N-2;1-a/2} \cdot SE(\hat y_0 - y_0)]$ 127
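
Once the forecast, its standard error and the $t$ quantile are available, the prediction interval is a one-line computation. A minimal sketch with a hypothetical function name and made-up inputs (forecast 4.0, estimated forecast error variance 0.76, and 3.182 as the tabulated 0.975 quantile for 3 degrees of freedom):

```python
import math


def prediction_interval(y0_hat, se_forecast, t_crit):
    """Return (lower, upper) = y0_hat -/+ t_crit * SE(forecast error)."""
    if se_forecast < 0 or t_crit < 0:
        raise ValueError("standard error and quantile must be non-negative")
    half_width = t_crit * se_forecast
    return y0_hat - half_width, y0_hat + half_width


# Hypothetical inputs, not the values of the tip example
lower, upper = prediction_interval(y0_hat=4.0, se_forecast=math.sqrt(0.76), t_crit=3.182)
print(f"95% prediction interval: [{lower:.3f}, {upper:.3f}]")
```

The interval is symmetric around $\hat y_0$, and its width is proportional to $SE(\hat y_0 - y_0)$, so it widens as $x_0$ moves away from $\bar x$.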

73 Illustration: (Tip example)
For $x_0 = 20$ the forecast is $\hat y_0 =$ [value missing]
The estimated variance is $\widehat{Var}(\hat y_0 - y_0) =$ [value missing]
→ for $a = 0.05$ the prediction interval is given by [bounds missing] 128

74 [Figure: width of the prediction interval for $y_0$ given $x_0$, with the bounds $\hat y_0 \pm t_{N-2;1-a/2} \cdot SE(\hat y_0 - y_0)$ drawn around the forecast $\hat y_0$] 129
