Chapter 3 Properties and Hypothesis Testing

3.1 Types of data

The regression techniques developed in previous chapters can be applied to three different kinds of data:

1. Cross-sectional data.
2. Time series data.
3. Panel data.

The first consists of observing various economic units (e.g., firms, countries, households, individuals) at one point in time. For example, we observe the wages, experience and education of many individuals, only once and all at the same time. The second consists of observing the same economic unit at different points in time. For example, we observe daily stock prices over many years. Finally, the third combines the characteristics of the first and the second: we observe various economic units at repeated points in time. For example, we have information about the inflation, unemployment and GDP of a group of countries over many years.

3.2 Assumptions of the model

When the regressors in our econometric model are nonstochastic, we make the following six assumptions.

1. The model is linear in the parameters and it is correctly specified. Equation 3.1 is linear in $\beta$, while Equation 3.2 is not:

$$Y = \beta_1 + \beta_2 X + u \tag{3.1}$$

$$Y = \beta_1 X^{\beta_2} + u \tag{3.2}$$
2. There is some variation in the regressor in the sample. We need variation in the variable $X$ to identify the relationship. Consider the OLS estimator for $\beta_2$:

$$b_2 = \frac{\sum_{i=1}^{n}(X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{n}(X_i - \bar{X})^2} \tag{3.3}$$

If there is no variation in $X$, then the denominator is zero and we cannot obtain $b_2$.

3. The expected value of the disturbance term is zero.

$$E(u_i) = 0 \quad \text{for all } i \tag{3.4}$$

Some $u_i$ will be negative and some will be positive, but on average they will be zero. If a constant is included in the model, this condition can be regarded as satisfied automatically: any nonzero mean of the disturbance is absorbed into the constant $\beta_1$.

4. The disturbance term is homoscedastic. Homoscedasticity means that the variance of the error term $u_i$ is constant across all observations $i$. Hence, we can write:

$$\sigma^2_{u_i} = \sigma^2_u \quad \text{for all } i \tag{3.5}$$

Because the error term has zero mean (assumption 3), the population variance of $u_i$ is equal to:

$$E(u_i^2) = \sigma^2_u \quad \text{for all } i \tag{3.6}$$

$\sigma^2_u$ is a population parameter; therefore it is unknown and needs to be estimated.

5. The values of the disturbance term have independent distributions.

$$u_i \text{ is distributed independently of } u_j \text{ for all } j \neq i \tag{3.7}$$

This means that there is no autocorrelation in the error term, so the population covariance between $u_i$ and $u_j$ is zero:

$$\sigma_{u_i u_j} = 0 \quad \text{for all } j \neq i \tag{3.8}$$

With assumptions 1 through 5, we say that the OLS coefficients are BLUE: Best Linear Unbiased Estimators. They are best because they have the smallest variance among all linear unbiased estimators.

6. The disturbance term has a normal distribution.

$$u_i \sim N(0, \sigma^2_u) \quad \text{for all } i \tag{3.9}$$

The error term is normally distributed with mean zero and variance $\sigma^2_u$. This assumption becomes useful when performing t tests and F tests, and when constructing confidence intervals for $\beta_1$ and $\beta_2$ from the regression results. The justification for this assumption rests on the central limit theorem, which states that if a random variable is the composite result of the effects of a large number of
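other random variables (that are not necessarily normal), it will have an approximately normal distribution.

3.3 Unbiasedness of the coefficients

Recall that an estimator $\hat{\theta}$ is unbiased if $E(\hat{\theta}) = \theta$: the expected value of the estimator is equal to the true population parameter. For the slope coefficient in the OLS regression we have:

$$b_2 = \frac{\sum_{i=1}^{n}(X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{n}(X_i - \bar{X})^2} \tag{3.10}$$

$$b_2 = \beta_2 + \frac{\sum_{i=1}^{n}(X_i - \bar{X})u_i}{\sum_{i=1}^{n}(X_i - \bar{X})^2} = \beta_2 + \sum_{i=1}^{n} a_i u_i, \quad \text{where } a_i = \frac{X_i - \bar{X}}{\sum_{j=1}^{n}(X_j - \bar{X})^2} \tag{3.11}$$

The second line follows by substituting $Y_i - \bar{Y} = \beta_2(X_i - \bar{X}) + (u_i - \bar{u})$ from Equation 3.1 and using $\sum_{i}(X_i - \bar{X}) = 0$, so that the term in $\bar{u}$ drops out. This shows that $b_2$ is equal to its true value, $\beta_2$, plus a linear combination of the values of the error terms. Taking expectations of $b_2$ we have:

$$E(b_2) = E(\beta_2) + E\Big(\sum_{i=1}^{n} a_i u_i\Big) = \beta_2 + \sum_{i=1}^{n} E(a_i u_i) = \beta_2 + \sum_{i=1}^{n} a_i E(u_i) = \beta_2 \tag{3.12}$$

The term $a_i$ comes out of the expectation because $a_i$ is only a function of the nonstochastic $X$s. In addition, the last equality holds because $E(u_i) = 0$. Hence, $b_2$ is an unbiased estimator of $\beta_2$: $E(b_2) = \beta_2$.

3.4 Precision of the coefficients

We are also interested in how precisely $b_1$ and $b_2$ estimate the population parameters $\beta_1$ and $\beta_2$. A measure of this precision is given by their population variances:

$$\sigma^2_{b_1} = \sigma^2_u\left(\frac{1}{n} + \frac{\bar{X}^2}{\sum_{i=1}^{n}(X_i - \bar{X})^2}\right) \tag{3.13}$$

$$\sigma^2_{b_2} = \frac{\sigma^2_u}{\sum_{i=1}^{n}(X_i - \bar{X})^2} \tag{3.14}$$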
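Equations 3.12 and 3.14 can be checked by simulation. The sketch below uses only the Python standard library; the function names, the fixed regressor values $X = 1, \dots, 20$ and the true parameters $\beta_1 = 2$, $\beta_2 = 0.5$, $\sigma_u = 3$ are arbitrary assumptions chosen for illustration. It draws many samples that share the same nonstochastic $X$ values but have fresh normal errors, re-estimates $b_2$ in each, and compares the average and variance of the estimates with $\beta_2$ and with Equation 3.14.

```python
import random
import statistics


def ols_slope(x, y):
    """OLS slope b2 from Equation 3.10."""
    x_bar, y_bar = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return sxy / sxx


def simulate_b2(beta1, beta2, sigma_u, x, reps, seed=0):
    """Re-estimate b2 in `reps` samples that share the same fixed x
    but draw new N(0, sigma_u^2) disturbances each time."""
    rng = random.Random(seed)
    draws = []
    for _ in range(reps):
        y = [beta1 + beta2 * xi + rng.gauss(0.0, sigma_u) for xi in x]
        draws.append(ols_slope(x, y))
    return draws


# Illustrative design: assumed values, not taken from the text.
x = [float(v) for v in range(1, 21)]
beta1, beta2, sigma_u = 2.0, 0.5, 3.0
draws = simulate_b2(beta1, beta2, sigma_u, x, reps=20000)

x_bar = statistics.fmean(x)
var_b2_theory = sigma_u ** 2 / sum((xi - x_bar) ** 2 for xi in x)  # Equation 3.14

print("mean of b2:", statistics.fmean(draws), "true beta2:", beta2)
print("variance of b2:", statistics.pvariance(draws), "Equation 3.14:", var_b2_theory)
```

The average of the simulated $b_2$ values should sit close to $\beta_2$ (unbiasedness), and their variance close to $\sigma^2_u / \sum (X_i - \bar{X})^2$; the agreement improves as `reps` grows.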
One concern in implementing the above formulas is that $\sigma^2_u$ is an unknown population parameter and needs to be estimated. A natural estimator for this regression variance is the variance of the regression errors. Because the population regression errors $u_i$ are also unknown, we use their sample counterparts, the residuals $e_i$, and adjust for the corresponding degrees of freedom. Hence, we have:

$$S_u^2 = \frac{1}{n-2}\sum_{i=1}^{n} e_i^2 \tag{3.15}$$

This $S_u^2$ is an unbiased estimator of $\sigma^2_u$, and $n-2$ are the degrees of freedom. We subtract two from the sample size because we are estimating two parameters: the regression constant and one slope coefficient. Then, we use the following formulas to estimate the standard errors of $b_1$ and $b_2$:

$$S_{b_1} = \sqrt{S_u^2\left(\frac{1}{n} + \frac{\bar{X}^2}{\sum_{i=1}^{n}(X_i - \bar{X})^2}\right)} \tag{3.16}$$

$$S_{b_2} = \sqrt{\frac{S_u^2}{\sum_{i=1}^{n}(X_i - \bar{X})^2}} \tag{3.17}$$

3.5 The Gauss-Markov theorem

The Gauss-Markov theorem states that when assumptions 1 through 5 above are satisfied, the OLS estimators are the Best Linear Unbiased Estimators (BLUE) of the regression parameters. "Best" refers to the smallest variance within the class of linear unbiased estimators.

3.6 Hypothesis testing

Hypothesis testing is a method of making decisions using data. It starts with the formulation of the null and the alternative hypotheses and then uses a test statistic to assess whether the data are consistent with the null hypothesis.

3.6.1 Formulation of the null hypothesis

The formulation of the null hypothesis starts with a relationship in mind. For example, suppose that the percentage rate of price inflation ($p$) depends on the percentage rate of wage inflation ($w$) according to the linear equation:

$$p_i = \beta_1 + \beta_2 w_i + u_i \tag{3.18}$$
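Before testing hypotheses about a relationship such as Equation 3.18, we need the estimates and standard errors of Sections 3.3 and 3.4. A minimal pure-Python sketch follows; the function name `ols_simple` and the five wage- and price-inflation values are hypothetical, made up only to keep the arithmetic easy to verify by hand. In practice a statistics package (for example the `OLS` class in statsmodels) reports the same quantities.

```python
import math


def ols_simple(x, y):
    """Simple regression y = b1 + b2*x.
    Returns b1, b2, S_u and the standard errors S_b1, S_b2."""
    n = len(x)
    if n < 3:
        raise ValueError("need at least 3 observations (n - 2 degrees of freedom)")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("no variation in x (assumption 2 fails)")
    b2 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b1 = y_bar - b2 * x_bar
    residuals = [yi - (b1 + b2 * xi) for xi, yi in zip(x, y)]
    s_u2 = sum(e ** 2 for e in residuals) / (n - 2)        # Equation 3.15
    se_b1 = math.sqrt(s_u2 * (1 / n + x_bar ** 2 / sxx))  # Equation 3.16
    se_b2 = math.sqrt(s_u2 / sxx)                         # Equation 3.17
    return b1, b2, math.sqrt(s_u2), se_b1, se_b2


# Hypothetical wage-inflation (w) and price-inflation (p) values, for illustration only.
w = [1.0, 2.0, 3.0, 4.0, 5.0]
p = [2.0, 4.0, 5.0, 4.0, 5.0]
b1, b2, s_u, se_b1, se_b2 = ols_simple(w, p)
print(round(b1, 4), round(b2, 4), round(se_b1, 4), round(se_b2, 4))  # 2.2 0.6 0.9381 0.2828
```

Here $\sum (w_i - \bar{w})^2 = 10$ and the residual sum of squares is 2.4, so $S_u^2 = 2.4/3 = 0.8$ and $S_{b_2} = \sqrt{0.8/10}$, which the printed values reproduce.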
Suppose, then, that we want to test the hypothesis that price inflation is equal to wage inflation, i.e., that $\beta_2 = 1$. This hypothesis is denoted by $H_0$ and is known as the null hypothesis. In addition, we define an alternative hypothesis, denoted by $H_1$, which represents the conclusion of the test if the null hypothesis is rejected. For our example, the null and the alternative hypotheses are written as:

$$H_0: \beta_2 = 1 \tag{3.19}$$

$$H_1: \beta_2 \neq 1 \tag{3.20}$$

In general, the null and alternative hypotheses are:

$$H_0: \beta_2 = \beta_2^0 \tag{3.21}$$

$$H_1: \beta_2 \neq \beta_2^0 \tag{3.22}$$

3.6.2 t-tests

Recall that $\beta_2$ is unknown and that we have to use the estimate $b_2$. The decision rule to reject the null hypothesis should therefore compare the estimate $b_2$ with the hypothesized value $\beta_2^0$. Intuitively, if the values are far apart, then there is evidence against the null. This comparison should take into account the fact that $b_2$ is subject to sampling variation (it is not the actual $\beta_2$). We use the following statistic:

$$z = \frac{b_2 - \beta_2^0}{\sigma_{b_2}} \tag{3.23}$$

The numerator is the distance between the regression estimate and the hypothesized value, and the denominator is the standard deviation of $b_2$, given by the square root of the expression in Equation 3.14. Thus $z$ is the number of standard deviations between $b_2$ and $\beta_2^0$. If $\sigma_{b_2}$ were known, $z$ would follow a standard normal distribution under the null (given assumption 6). However, $\sigma_{b_2}$ is unknown and we need to use the estimate of the standard error of $b_2$, $S_{b_2}$, presented in Equation 3.17. We then use the following t-statistic:

$$t = \frac{b_2 - \beta_2^0}{S_{b_2}} \tag{3.24}$$

To know whether the deviation between $b_2$ and $\beta_2^0$ is significantly large, we compare this t-statistic with the critical values from the t distribution with $n-2$ degrees of freedom. The null hypothesis is not rejected if the following condition is met:

$$-t_{n-2,\alpha/2} \le \frac{b_2 - \beta_2^0}{S_{b_2}} \le t_{n-2,\alpha/2} \tag{3.25}$$

where $t_{n-2,\alpha/2}$ is the critical value from the t distribution with $n-2$ degrees of freedom that leaves probability $\alpha/2$ in each tail, for a two-sided test at significance level $\alpha$. The significance
level is the probability that we reject the null hypothesis when in fact it is true. The rejection regions are illustrated in Figure 3.1.

Fig. 3.1 Acceptance region for the t-test.

3.6.3 Confidence intervals

The confidence interval indicates the reliability of an estimate. The confidence interval for the population parameter $\beta_2$ can be derived from Equation 3.25, with the hypothesized value replaced by the true $\beta_2$, in the following way:

$$1-\alpha = P\left(-t_{n-2,\alpha/2} \le \frac{b_2 - \beta_2}{S_{b_2}} \le t_{n-2,\alpha/2}\right)$$

$$1-\alpha = P\left(-t_{n-2,\alpha/2}\, S_{b_2} \le b_2 - \beta_2 \le t_{n-2,\alpha/2}\, S_{b_2}\right)$$

$$1-\alpha = P\left(b_2 - t_{n-2,\alpha/2}\, S_{b_2} \le \beta_2 \le b_2 + t_{n-2,\alpha/2}\, S_{b_2}\right) \tag{3.26}$$

The meaning of the above equation is that the random interval between the lower confidence limit $b_2 - t_{n-2,\alpha/2} S_{b_2}$ and the upper confidence limit $b_2 + t_{n-2,\alpha/2} S_{b_2}$ contains the population parameter $\beta_2$ with probability $1-\alpha$, or $100(1-\alpha)\%$: in repeated samples, $100(1-\alpha)\%$ of the intervals constructed this way contain $\beta_2$.

p values provide an alternative approach to reporting the significance of regression coefficients or to carrying out more general hypothesis tests. As Equation 3.25 and Figure 3.1 show, different significance levels $\alpha$ can lead to different conclusions about whether the null hypothesis is rejected. The p value of a hypothesis test is the minimum significance level at which the null is rejected. Hence, when the p value is below the significance level $\alpha$, we reject the null.
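The t test of Equation 3.25, the confidence interval of Equation 3.26 and the p-value rule can be combined in one short function. To keep the sketch dependency-free, the two-sided t tail probability is computed from the regularized incomplete beta function through the identity $P(|T| \ge |t|) = I_{\nu/(\nu+t^2)}(\nu/2, 1/2)$, evaluated with a standard continued-fraction (modified Lentz) method; in practice one would call `scipy.stats.t.sf` and `scipy.stats.t.ppf` instead. All function names, and the example numbers ($b_2 = 0.6$, $S_{b_2} = 0.2$, $n = 12$, testing $H_0: \beta_2 = 1$ as in Equation 3.19), are hypothetical.

```python
import math


def _nz(v, tiny=1e-300):
    """Guard against division by (numerically) zero in Lentz's method."""
    return v if abs(v) > tiny else tiny


def _beta_cf(a, b, x, max_iter=300, eps=3e-14):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    c, d = 1.0, 1.0 / _nz(1.0 - (a + b) * x / (a + 1.0))
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even term of the continued fraction.
        aa = m * (b - m) * x / ((a - 1.0 + m2) * (a + m2))
        d = 1.0 / _nz(1.0 + aa * d)
        c = _nz(1.0 + aa / c)
        h *= d * c
        # Odd term of the continued fraction.
        aa = -(a + m) * (a + b + m) * x / ((a + m2) * (a + 1.0 + m2))
        d = 1.0 / _nz(1.0 + aa * d)
        c = _nz(1.0 + aa / c)
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def reg_inc_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_bt = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
              + a * math.log(x) + b * math.log(1.0 - x))
    bt = math.exp(log_bt)
    if x < (a + 1.0) / (a + b + 2.0):
        return bt * _beta_cf(a, b, x) / a
    return 1.0 - bt * _beta_cf(b, a, 1.0 - x) / b


def t_two_sided_p(t_stat, df):
    """Two-sided p value P(|T| >= |t_stat|) for Student's t with df degrees of freedom."""
    return reg_inc_beta(df / 2.0, 0.5, df / (df + t_stat ** 2))


def t_critical(alpha, df):
    """Critical value t_{df, alpha/2}, i.e. c with P(|T| >= c) = alpha, by bisection."""
    lo, hi = 0.0, 1.0
    while t_two_sided_p(hi, df) > alpha:
        hi *= 2.0
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if t_two_sided_p(mid, df) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def t_test(b, se, beta0, n, alpha=0.05):
    """Test H0: beta = beta0 in the simple regression (n - 2 df), Equations 3.24-3.26."""
    df = n - 2
    t_stat = (b - beta0) / se
    c = t_critical(alpha, df)
    ci = (b - c * se, b + c * se)
    p = t_two_sided_p(t_stat, df)
    return t_stat, p, ci, abs(t_stat) > c


t_stat, p, ci, reject = t_test(0.6, 0.2, 1.0, 12)
print(t_stat, p, ci, reject)
```

With these made-up numbers, $t = -2$ lies inside $\pm t_{10,0.025} \approx \pm 2.228$, the p value is above 0.05, and the 95% interval contains 1, so all three approaches agree that $H_0$ is not rejected.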
Fig. 3.2 Confidence interval for $\beta_2$.

3.6.4 F test

A useful tool for testing whether there is no relationship between $X$ and $Y$ is the F test. In the simple linear regression model with only one slope coefficient, the null and the alternative hypotheses in an F test are:

$$H_0: \beta_2 = 0 \tag{3.27}$$

$$H_1: \beta_2 \neq 0 \tag{3.28}$$

This test is built on the idea of testing how well the regression model explains the variation in $Y$. In Equation 2.15 we already separated the variation of $Y$ into its explained and unexplained components. These are:

$$\sum_{i=1}^{n}(Y_i - \bar{Y})^2 = \sum_{i=1}^{n}(\hat{Y}_i - \bar{Y})^2 + \sum_{i=1}^{n}(Y_i - \hat{Y}_i)^2 \tag{3.29}$$

$$TSS = ESS + RSS \tag{3.30}$$

The total sum of squares (TSS) is the sum of the explained sum of squares (ESS) and the residual sum of squares (RSS). The F statistic for the goodness of fit of a regression is then written as the explained sum of squares, per explanatory variable, divided by the residual sum of squares, per remaining degree of freedom:

$$F = \frac{ESS/(k-1)}{RSS/(n-k)} \tag{3.31}$$
where $k$ is the total number of coefficients we are estimating, hence $(k-1)$ is the number of slope coefficients: the total number of parameters we are estimating minus the constant. If we divide the numerator and the denominator by $TSS$, the F statistic can be written in terms of $R^2$ as follows:

$$F = \frac{(ESS/TSS)/(k-1)}{(RSS/TSS)/(n-k)} = \frac{R^2/(k-1)}{(1-R^2)/(n-k)} \tag{3.32}$$

If this F statistic is greater than the critical value $F_{k-1,n-k}$ from the F distribution with $(k-1)$ and $(n-k)$ degrees of freedom, we reject the null hypothesis and conclude that the regression model significantly explains the variation in $Y$. For the simple regression model with only one slope coefficient, $k = 2$, we have:

$$F = \frac{R^2}{(1-R^2)/(n-2)} \tag{3.33}$$

If $F > F_{1,n-2}$, we reject the null hypothesis presented in Equation 3.27.

Fig. 3.3 Regression output in MS Excel.
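The two forms of the F statistic (Equations 3.31 and 3.32) give the same number, and in the simple regression model ($k = 2$) the F statistic for $H_0: \beta_2 = 0$ equals the square of the t statistic of Equation 3.24 with $\beta_2^0 = 0$. A minimal pure-Python sketch, with a hypothetical function name and made-up data, computes the sums of squares of Equation 3.29 and checks both facts. The critical value $F_{1,n-2}$ would still come from a table or a library routine such as `scipy.stats.f.ppf`.

```python
import math


def f_test_simple(x, y):
    """F statistic for H0: beta2 = 0 in y = beta1 + beta2*x + u (k = 2)."""
    n, k = len(x), 2
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b2 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b1 = y_bar - b2 * x_bar
    fitted = [b1 + b2 * xi for xi in x]
    # Sums of squares of Equation 3.29.
    tss = sum((yi - y_bar) ** 2 for yi in y)
    ess = sum((fi - y_bar) ** 2 for fi in fitted)
    rss = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    r2 = ess / tss
    f_from_ss = (ess / (k - 1)) / (rss / (n - k))      # Equation 3.31
    f_from_r2 = (r2 / (k - 1)) / ((1 - r2) / (n - k))  # Equation 3.32
    # t statistic for beta2^0 = 0, using S_b2 from Equation 3.17.
    t_b2 = b2 / math.sqrt(rss / (n - k) / sxx)
    return tss, ess, rss, r2, f_from_ss, f_from_r2, t_b2


# Hypothetical data, for illustration only.
tss, ess, rss, r2, f_ss, f_r2, t_b2 = f_test_simple([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(round(f_ss, 4), round(f_r2, 4), round(t_b2 ** 2, 4))  # 4.5 4.5 4.5
```

Because the two statistics carry the same information when there is a single slope coefficient, the F test and the two-sided t test on $b_2$ always reach the same conclusion in this model.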
3.7 Computer output

The computer regression output is very similar across different statistical packages. Figure 3.3 shows the output using MS Excel for the estimation of the following simple regression model, where $Y$ is $wage$ and $X$ is $exper$:

$$wage_i = \beta_1 + \beta_2\, exper_i + u_i \tag{3.34}$$

To obtain the estimated regression coefficients we use Equations 2.4 and 2.5:

$$b_2 = \frac{\sum(X_i - \bar{X})(Y_i - \bar{Y})}{\sum(X_i - \bar{X})^2} = 0.091 \tag{3.35}$$

$$b_1 = \bar{Y} - b_2\bar{X} = 4.642 \tag{3.36}$$

The total sum of squares, explained sum of squares, and residual sum of squares are obtained using Equation 2.15 (restated here as Equation 3.29):

$$TSS = \sum(Y_i - \bar{Y})^2 = 27347.439 \tag{3.37}$$

$$ESS = \sum(\hat{Y}_i - \bar{Y})^2 = 1505.539 \tag{3.38}$$

$$RSS = \sum(Y_i - \hat{Y}_i)^2 = 25841.901 \tag{3.39}$$

The regression $R^2$ comes from Equation 2.18:

$$R^2 = 1 - \frac{\sum e_i^2}{\sum(Y_i - \bar{Y})^2} = 0.055 \tag{3.40}$$

The standard error of the regression is the square root of Equation 3.15:

$$S_u = \sqrt{\frac{1}{n-2}\sum e_i^2} = 4.532 \tag{3.41}$$

Then, the standard errors of the coefficients are computed using Equations 3.16 and 3.17:

$$S_{b_1} = \sqrt{S_u^2\left(\frac{1}{n} + \frac{\bar{X}^2}{\sum(X_i - \bar{X})^2}\right)} = 0.233 \tag{3.42}$$

$$S_{b_2} = \sqrt{\frac{S_u^2}{\sum(X_i - \bar{X})^2}} = 0.011 \tag{3.43}$$

The F statistic uses Equation 3.32:
$$F = \frac{R^2/(k-1)}{(1-R^2)/(n-k)} = 73.291 \tag{3.44}$$

The t statistics use Equation 3.24 with the hypothesized value set to zero ($\beta^0 = 0$):

$$t = \frac{b_1}{S_{b_1}} = 19.961 \tag{3.45}$$

$$t = \frac{b_2}{S_{b_2}} = 8.561 \tag{3.46}$$

Finally, for the 95% lower and upper confidence limits, we use Equation 3.26:

$$b_1 - t_{n-2,\alpha/2}\, S_{b_1} = 4.186 \tag{3.47}$$

$$b_1 + t_{n-2,\alpha/2}\, S_{b_1} = 5.099 \tag{3.48}$$

$$b_2 - t_{n-2,\alpha/2}\, S_{b_2} = 0.071 \tag{3.49}$$

$$b_2 + t_{n-2,\alpha/2}\, S_{b_2} = 0.112 \tag{3.50}$$
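The reported output is internally consistent in several ways that follow from the formulas of this chapter. The sketch below checks them using only the rounded figures printed in Equations 3.37 to 3.50. The degrees of freedom are not stated in the text; the value computed here is an inference from those rounded numbers via $F = ESS / (RSS/(n-2))$, and the variable names are hypothetical.

```python
import math

# Figures as printed in Equations 3.37-3.50 (already rounded to three decimals).
tss, ess, rss = 27347.439, 1505.539, 25841.901
b1, se_b1 = 4.642, 0.233
t_b2, f_stat = 8.561, 73.291
ci_b1 = (4.186, 5.099)

# R^2 from the sums of squares (Equation 3.40) and the decomposition TSS = ESS + RSS (3.30).
r2 = ess / tss
decomposition_gap = tss - (ess + rss)

# With k = 2, the F statistic for H0: beta2 = 0 equals the square of the t statistic of b2.
f_from_t = t_b2 ** 2

# Degrees of freedom implied by F = ESS / (RSS / (n - 2)): an inference from the
# rounded output, not a sample size stated in the text.
df_implied = round(f_stat * rss / ess)
s_u_implied = math.sqrt(rss / df_implied)  # compare with S_u in Equation 3.41

# Critical value implied by the half-width of the 95% interval for b1 (Equation 3.26).
t_crit_implied = (ci_b1[1] - ci_b1[0]) / (2 * se_b1)

print(round(r2, 3), round(decomposition_gap, 3))  # R^2 about 0.055, gap about 0
print(round(f_from_t, 2), f_stat)                 # t^2 about equal to F
print(df_implied, round(s_u_implied, 3), round(t_crit_implied, 2))
```

The implied critical value is close to 1.96, as expected for a t distribution with many degrees of freedom, and the implied $S_u$ matches Equation 3.41 to three decimals. Small discrepancies elsewhere (for example, $b_1 / S_{b_1}$ computed from the rounded inputs) come from rounding in the printed output.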