Econometrics Midterm Examination Answers
March 4, 2014

Question 1 (35 points)

Answer the following short questions.

(i) Define what an unbiased estimator is. Show that $\bar X$ is an unbiased estimator for $E(X_i) = \mu$ under the usual assumptions. (5 points)

A: $\hat\theta$ is an unbiased estimator for $\theta$ if $E(\hat\theta) = \theta$. Let $\bar X = \sum_{i=1}^n X_i / n$. Under the usual assumption of a random sample (observations are independent draws from an identical distribution), $E(\bar X) = \sum_{i=1}^n E(X_i)/n = n\mu/n = \mu$.

(ii) If $E(X_i) = \mu$ and $Var(X_i) = \sigma^2$, and observations are independent of each other, what do the Law of Large Numbers and the Central Limit Theorem state about the sample mean $\bar X = (\sum_{i=1}^n X_i)/n$? (6 points)

A: LLN: $\bar X_n$ converges in probability to $E(X_i) = \mu$, i.e. $\bar X_n \xrightarrow{p} \mu$. (An intuitive explanation is also acceptable: the distribution of $\bar X_n$ gets narrower and narrower as $n$ increases, and as $n \to \infty$, $\bar X_n$ collapses to the true value.)

CLT: $\sqrt{n}(\bar X_n - \mu) \xrightarrow{d} N(0, \sigma^2)$, or equivalently $\bar X \overset{a}{\sim} N(\mu, \sigma^2/n)$. (Intuitive explanation: the distribution of $\bar X$ gets closer and closer to a normal distribution as $n$ becomes larger and larger.)

(iii) Is the following statement true or false? Explain. "If I always provide an estimate of 37 for whatever sample I obtain, this estimator is the most efficient because its variance is zero." (5 points)

A: False. Comparing variances is only meaningful among estimators that are unbiased (or consistent). We don't know whether the true value is 37, so such an estimator is likely to be biased and inconsistent.

(iv) Are the following valid null hypotheses in statistical testing? Explain. (a) $\mu_X = 100$; (b) $\bar X = 100$. (5 points)

A: Only (a) is valid, because we test hypotheses about parameters (properties) of the population, not about a number computed from the sample.

(v) What do the no perfect multicollinearity and zero conditional mean assumptions mean in the basic assumptions of the linear regression model? What are the consequences if they are violated? (8 points)
A: Perfect multicollinearity means that there is an exact linear relationship among some of the regressors. In that case the OLS estimator is undefined (in matrix form, $X'X$ is not invertible). Zero conditional mean means $E(u_i \mid x_i) = 0$, i.e. $u_i$ is not predictable from $x_i$. (Any information related to $x_i$ is captured by the linear function.) (This also implies that the regressors and the error are uncorrelated.) When this assumption is violated, the OLS estimator becomes biased.

(vi) What are the four factors that affect the variance of the individual slope coefficient estimators under OLS for a multiple regression model? How do they affect the variance? (6 points)

A: The four factors are the variance of the error term $\sigma^2$ (positive effect), the sample size $n$ (negative), the sample variance of the regressor (negative), and $R_k^2$, the $R^2$ from regressing this regressor on the other regressors (positive). All four appear in
$$Var(\hat\beta_k) = \frac{\sigma^2}{n\, s_{x_k}^2\,(1 - R_k^2)}, \qquad s_{x_k}^2 = \frac{1}{n}\sum_{i=1}^n (x_{ik} - \bar x_k)^2.$$

Question 2 (12 points)

Let $z$ be a random column vector of size 3: $z = (z_1, z_2, z_3)'$.

(a) Write out $z'z$ and $zz'$ in terms of $z_1$, $z_2$ and $z_3$. (3 points)
(b) If $z \sim N(0, I_3)$, what is the distribution of $z'z$? (2 points)
(c) If $z \sim N(0, I_3)$, what is $E(zz')$? What is $E(z'z)$? (4 points)
(d) If $a = (1, 1, 1)'$ (a column vector of ones), what is $a'z$? Calculate $E(a'z)$ and $Var(a'z)$. (3 points)

A: (a)
$$z'z = z_1^2 + z_2^2 + z_3^2, \qquad zz' = \begin{pmatrix} z_1^2 & z_1 z_2 & z_1 z_3 \\ z_2 z_1 & z_2^2 & z_2 z_3 \\ z_3 z_1 & z_3 z_2 & z_3^2 \end{pmatrix}$$

(b) Chi-square with 3 degrees of freedom, $\chi^2_3$, since each term is the square of a standard normal variable and each term is independent of all the others.

(c)
$$E(zz') = \begin{pmatrix} E(z_1^2) & E(z_1 z_2) & E(z_1 z_3) \\ E(z_2 z_1) & E(z_2^2) & E(z_2 z_3) \\ E(z_3 z_1) & E(z_3 z_2) & E(z_3^2) \end{pmatrix} = Var(z) = I_3 = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}$$
$$E(z'z) = E(z_1^2) + E(z_2^2) + E(z_3^2) = Var(z_1) + Var(z_2) + Var(z_3) = 1 + 1 + 1 = 3.$$
(Both results use $E(z_i) = 0$.)

(d) $a'z = z_1 + z_2 + z_3$. $E(a'z) = 0 + 0 + 0 = 0$. $Var(a'z) = Var(z_1) + Var(z_2) + Var(z_3) = 3$, since the $z_i$ are independent. (In matrix terms, $Var(a'z) = a' I_3 a = a'a = 3$.)
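A quick Monte Carlo check can make the Question 2 answers concrete. The sketch below uses only the Python standard library; the function name `simulate_q2`, the number of draws and the seed are illustrative choices, not part of the exam. It draws $z \sim N(0, I_3)$ repeatedly and averages $z'z$, $zz'$ and $a'z$ with $a = (1,1,1)'$. The averages should land close to 3, $I_3$ and 0, and the sample variance of $a'z$ close to 3. It also reports the variance of $z'z$, which for a $\chi^2_3$ variable is $2 \times 3 = 6$.

```python
import random
import statistics


def simulate_q2(n_draws=100_000, seed=0):
    """Monte Carlo check of Question 2 for z ~ N(0, I_3) and a = (1, 1, 1)'.

    n_draws and seed are illustrative choices, not values from the exam.
    """
    rng = random.Random(seed)
    ztz_draws = []  # draws of z'z = z1^2 + z2^2 + z3^2
    atz_draws = []  # draws of a'z = z1 + z2 + z3
    outer_sum = [[0.0] * 3 for _ in range(3)]  # running sum of the matrix zz'

    for _ in range(n_draws):
        z = [rng.gauss(0.0, 1.0) for _ in range(3)]
        ztz_draws.append(sum(zi * zi for zi in z))
        atz_draws.append(sum(z))
        for r in range(3):
            for c in range(3):
                outer_sum[r][c] += z[r] * z[c]

    return {
        "E[z'z]": statistics.fmean(ztz_draws),        # theory: 3
        "Var[z'z]": statistics.pvariance(ztz_draws),  # theory: 6 for chi-square(3)
        "E[zz']": [[v / n_draws for v in row] for row in outer_sum],  # theory: I_3
        "E[a'z]": statistics.fmean(atz_draws),        # theory: 0
        "Var[a'z]": statistics.pvariance(atz_draws),  # theory: a'a = 3
    }
```

For example, `simulate_q2()["E[z'z]"]` should come out close to 3; increasing `n_draws` tightens all five approximations.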
Question 3 (13 points)

We want to learn about the monthly food expenditure of students at this university. In particular, we want to estimate the mean of their food expenditure. We randomly sample $n = 225$ students. The sample mean food expenditure is $\bar Y = 962.5$, with sample standard deviation $s = 88.8$. We want to test the hypothesis $H_0: \mu = 1000$ against $H_1: \mu \neq 1000$.

(a) What test statistic do we use? What is its distribution if the null is true? (3 points)
(b) Calculate the statistic and carry out the test at the 5% significance level. (The critical value is 1.96.) (5 points)
(c) Calculate the 95% two-sided confidence interval for the population mean food expenditure. (3 points)
(d) Is the 99% two-sided confidence interval longer or shorter than the 95% one? Why? (2 points)

(a) We use a t test, with t statistic $t = (\bar Y - 1000)/(s/\sqrt{n})$. If the null hypothesis is true, its distribution is $t_{224}$, which is approximately standard normal. (3 points)

(b) Here
$$t = \frac{962.5 - 1000}{88.8/\sqrt{225}} = \frac{-37.5}{5.92} = -6.3345 < -1.96.$$
Therefore we can reject the null hypothesis at the 5% significance level. (5 points)

(c) The 95% confidence interval is
$$962.5 \pm 1.96\,(88.8/\sqrt{225}) = 962.5 \pm 11.6 = (950.9,\ 974.1).$$
(3 points)

(d) The 99% confidence interval is longer (wider): given the same information, raising the probability of covering the true value requires a longer interval. (We use a larger critical value, about 2.576 instead of 1.96.) (2 points)
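The arithmetic in (b) to (d) can be reproduced with a few lines of Python. This is a minimal sketch: the helper name `t_test_and_ci` is hypothetical, and it uses normal critical values (1.96 for 95%, as given in the question, and the standard normal value 2.576 for 99%) rather than exact $t_{224}$ quantiles.

```python
import math


def t_test_and_ci(ybar, s, n, mu0, crit=1.96):
    """Return the t statistic for H0: mu = mu0 and the CI ybar +/- crit * s / sqrt(n)."""
    se = s / math.sqrt(n)  # standard error of the sample mean
    t_stat = (ybar - mu0) / se
    half_width = crit * se
    return t_stat, (ybar - half_width, ybar + half_width)


t_stat, ci95 = t_test_and_ci(962.5, 88.8, 225, 1000.0)         # 95%: crit = 1.96
_, ci99 = t_test_and_ci(962.5, 88.8, 225, 1000.0, crit=2.576)  # 99%: crit about 2.576
reject = abs(t_stat) > 1.96

print(f"t = {t_stat:.4f}, reject H0 at 5%: {reject}")
print(f"95% CI = ({ci95[0]:.1f}, {ci95[1]:.1f})")
print(f"99% CI = ({ci99[0]:.1f}, {ci99[1]:.1f})")
print(f"99% interval wider: {ci99[1] - ci99[0] > ci95[1] - ci95[0]}")
```

Running it reports $t = -6.3345$ and the 95% interval (950.9, 974.1), matching the answers above, and confirms that only the critical value changes between the 95% and 99% intervals.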
Question 4 (30 points)

Consider an example we have gone through in class. We want to see what determines students' test scores in school. The dependent variable testscr is the average test score of a school, str is the student-teacher ratio of the school (number of students per teacher), and avginc is the average income of families (in thousands of dollars) in the school district. The student-teacher ratio captures how class size can affect students' learning, while average income indirectly captures the intensity of human capital investment by the family. Here we estimate

$$testscr_i = \beta_1 + \beta_2\, str_i + \beta_3\, avginc_i + \beta_4\, avginc_i^2 + \beta_5\, avginc_i^3 + u_i,$$

where the square and cube of average income are also included. The following shows the regression output from Stata:

    . reg testscr str avginc avginc2 avginc3

          Source |       SS       df       MS              Number of obs =     420
    -------------+------------------------------           F(  4,   415) =  135.49
           Model |  86144.9747     4  21536.2437           Prob > F      =  0.0000
        Residual |  65964.6189   415  158.950889           R-squared     =  0.5663
    -------------+------------------------------           Adj R-squared =  0.5622
           Total |  152109.594   419  363.030056           Root MSE      =  12.608

    ------------------------------------------------------------------------------
         testscr |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
             str |  -.9277523   .3369433    -2.75   0.006    -1.590081   -.2654239
          avginc |   5.124736   .8536044     6.00   0.000     3.446809    6.802664
         avginc2 |  -.1011073   .0371171    -2.72   0.007    -.1740683   -.0281462
         avginc3 |   .0007293   .0004685     1.56   0.120    -.0001917    .0016503
           _cons |   617.8974   8.679455    71.19   0.000     600.8362    634.9586
    ------------------------------------------------------------------------------

(a) Interpret the coefficient on the student-teacher ratio (str). Is it statistically significant at the 5% level? (The critical value for a two-sided test under the normal distribution is 1.96.) (5 points)
(b) If instead we want to test whether the coefficient on str is -2.0, what is the test statistic, and can we reject the null at the 5% significance level? (3 points)
(c) Write down the formula of $R^2$ in terms of the various sums of squares and verify that the number shown in the right column is the same as that calculated from the sums of squares shown on the left. Do the same for $\bar R^2$ (adjusted $R^2$). (4 points)
(d) What is the F test in the right column testing? Write down the null and alternative hypotheses.
How can we calculate this statistic with the sums of squares available? Can we reject the null hypothesis at the 5% level? (6 points)
(e) Verify the confidence interval shown for the coefficient on str, using the formula introduced in class and the numbers provided in the results. (4 points)

Now we would like to test the joint significance of the coefficients on average income and its square and cube terms, that is, $H_0: \beta_3 = \beta_4 = \beta_5 = 0$, using an F test. (Note: in my notation, $\beta_1$ is the intercept term.)

(f) What is the alternative hypothesis? (2 points)

The results for the restricted regression are shown below:
    . reg testscr str

          Source |       SS       df       MS              Number of obs =     420
    -------------+------------------------------           F(  1,   418) =   22.58
           Model |  7794.11004     1  7794.11004           Prob > F      =  0.0000
        Residual |  144315.484   418  345.252353           R-squared     =  0.0512
    -------------+------------------------------           Adj R-squared =  0.0490
           Total |  152109.594   419  363.030056           Root MSE      =  18.581

    ------------------------------------------------------------------------------
         testscr |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
             str |  -2.279808   .4798256    -4.75   0.000     -3.22298   -1.336637
           _cons |    698.933    9.46749    73.82   0.000      680.323    717.5428
    ------------------------------------------------------------------------------

(g) Perform the test. What is the F statistic in this sample? What are the distribution and the degrees of freedom of the statistic under the null hypothesis? Can we reject the null hypothesis at the 5% significance level? (Possible critical values: $F_{1,\infty,0.05} = 3.84$; $F_{3,\infty,0.05} = 2.60$; $F_{5,\infty,0.05} = 2.21$, where the subscripts denote the numerator degrees of freedom, the denominator degrees of freedom and the significance level, respectively.) (6 points)

Answers:

(a) For one more student per teacher, the average test score of the school falls by about 0.93 points, holding average income constant. As the p-value (0.006) is smaller than 0.05 (or, equivalently, $|t| = 2.75 > 1.96$), the coefficient is significant. (5 points)

(b) $t = (-0.9278 + 2.0)/0.3369 = 3.1825 > 1.96$. So we can also reject the null that the coefficient is -2.0. (3 points)

(c) With SSE the explained (model) sum of squares and SSR the residual sum of squares,
$$R^2 = \frac{SSE}{SST} = 1 - \frac{SSR}{SST} = \frac{86144.9747}{152109.594} = 0.5663,$$
$$\bar R^2 = 1 - \frac{SSR/(n-K)}{SST/(n-1)} = 1 - \frac{65964.6189/415}{152109.594/419} = 0.5622.$$
(4 points)

(d) The F test reported on the right tests whether all population coefficients other than the constant (intercept) are zero: $H_0: \beta_2 = \beta_3 = \beta_4 = \beta_5 = 0$ against $H_1: \beta_2 \neq 0$ or $\beta_3 \neq 0$ or $\beta_4 \neq 0$ or $\beta_5 \neq 0$. It can be calculated as
$$F = \frac{(152109.594 - 65964.6189)/4}{65964.6189/415} = 135.49.$$
The reported p-value is smaller than 0.00005, so we can reject the null at the 5% significance level. (6 points)

(e) The 95% confidence interval is
$$-0.9278 \pm 1.96\,(0.33694) = -0.9278 \pm 0.6604 = (-0.9278 - 0.6604,\ -0.9278 + 0.6604) = (-1.5882,\ -0.2674).$$
(There is a small discrepancy because we used the normal critical value; Stata uses the more accurate critical value from $t_{415}$.)
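(4 points)

(f) $H_1: \beta_3 \neq 0$ or $\beta_4 \neq 0$ or $\beta_5 \neq 0$. (2 points)

(g)
$$F = \frac{(SSR_R - SSR_U)/3}{SSR_U/(n-K)} = \frac{(144315.484 - 65964.6189)/3}{65964.6189/415} = 164.31 > 2.60.$$
Under the null hypothesis the statistic follows an $F_{3,415}$ distribution, which is close to $F_{3,\infty}$ with 5% critical value 2.60. So we can reject the null hypothesis that all three coefficients are zero. (6 points)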
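The Question 4 calculations use only numbers printed in the two Stata tables, so they can be re-checked mechanically. The Python sketch below plugs the sums of squares and the str coefficient into the formulas used in answers (b) to (e) and (g). It uses the normal critical value 1.96, as the written answers do, and the variable names are illustrative.

```python
# Numbers copied from the two Stata tables above.
n = 420             # observations
k_u = 5             # parameters in the unrestricted model (intercept + 4 slopes)
sst = 152109.594    # total sum of squares (same in both regressions)
sse_u = 86144.9747  # explained (model) SS, unrestricted model
ssr_u = 65964.6189  # residual SS, unrestricted model
ssr_r = 144315.484  # residual SS, restricted model (str only)
b_str, se_str = -0.9277523, 0.3369433

# (c) R-squared and adjusted R-squared
r2 = sse_u / sst
adj_r2 = 1 - (ssr_u / (n - k_u)) / (sst / (n - 1))

# (d) overall F test: 4 restrictions (all slopes equal to zero)
f_overall = ((sst - ssr_u) / 4) / (ssr_u / (n - k_u))

# (b) t test of H0: beta_2 = -2.0
t_minus2 = (b_str - (-2.0)) / se_str

# (e) 95% CI using the normal critical value 1.96
ci_str = (b_str - 1.96 * se_str, b_str + 1.96 * se_str)

# (g) joint F test of beta_3 = beta_4 = beta_5 = 0 (q = 3 restrictions)
q = 3
f_joint = ((ssr_r - ssr_u) / q) / (ssr_u / (n - k_u))

print(f"R2 = {r2:.4f}, adj R2 = {adj_r2:.4f}, overall F = {f_overall:.2f}")
print(f"t for beta_2 = -2: {t_minus2:.4f}")
print(f"95% CI for str: ({ci_str[0]:.4f}, {ci_str[1]:.4f})")
print(f"joint F = {f_joint:.2f}, reject at 5% (crit 2.60): {f_joint > 2.60}")
```

Because the script keeps full precision instead of the rounded inputs used in the written answers, the t statistic and the upper confidence limit can differ from the hand calculations in the last reported digit.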
Question 5 (10 points)

Consider the case in which the regression function does not have an intercept. Suppose we know that the population regression function is $y_i = \beta x_i + u_i$, i.e. the intercept $\beta_0 = 0$.

(i) What is $E(y_i \mid x_i)$? What is $E(y_i \mid x_i = 0)$? (2 points)
(ii) Derive the OLS estimator for $\beta$. That is, minimize the sum of squared residuals:
$$\min_{\hat\beta} \sum_{i=1}^n (y_i - \hat\beta x_i)^2.$$
(5 points)
(iii) Show that the same estimator can be obtained by using the moment condition $E(u_i x_i) = 0$. (3 points)

A: (i) $E(y_i \mid x_i) = E(\beta x_i \mid x_i) + E(u_i \mid x_i) = \beta x_i$, since $E(u_i \mid x_i) = 0$ by the basic assumptions. $E(y_i \mid x_i = 0) = \beta \cdot 0 = 0$. Thus, in a regression model without an intercept term, the mean of $y$ is zero when $x = 0$. (The regression line passes through the origin.)

(ii) The first-order condition is
$$\sum_{i=1}^n 2(y_i - \hat\beta x_i)(-x_i) = 0 \;\Rightarrow\; -\sum_{i=1}^n x_i y_i + \hat\beta \sum_{i=1}^n x_i^2 = 0 \;\Rightarrow\; \hat\beta = \frac{\sum_{i=1}^n x_i y_i}{\sum_{i=1}^n x_i^2}.$$

(iii) By the moment condition, $E(u_i x_i) = E((y_i - \beta x_i)x_i) = E(y_i x_i) - \beta E(x_i^2) = 0$. Replacing the population moments with sample moments, we have
$$\frac{1}{n}\sum_{i=1}^n y_i x_i = \hat\beta\, \frac{1}{n}\sum_{i=1}^n x_i^2,$$
so $\hat\beta = \sum_{i=1}^n x_i y_i / \sum_{i=1}^n x_i^2$. Thus we obtain the same estimator.
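The closed form from (ii) and the sample moment condition from (iii) can be checked numerically. The Python sketch below uses a small made-up data set (the values of `x` and `y` are hypothetical, chosen only for illustration). It computes $\hat\beta = \sum x_i y_i / \sum x_i^2$, confirms that the residuals satisfy the sample analogue $\sum x_i \hat u_i = 0$, and confirms that nudging $\hat\beta$ in either direction raises the sum of squared residuals.

```python
def ols_through_origin(x, y):
    """OLS slope for y_i = beta * x_i + u_i with no intercept: sum(x*y) / sum(x^2)."""
    sxx = sum(xi * xi for xi in x)
    if sxx == 0:
        raise ValueError("all x_i are zero, so beta is not identified")
    return sum(xi * yi for xi, yi in zip(x, y)) / sxx


def ssr(beta, x, y):
    """Sum of squared residuals for a candidate slope beta."""
    return sum((yi - beta * xi) ** 2 for xi, yi in zip(x, y))


# Hypothetical toy data, for illustration only.
x = [1.0, 2.0, 3.0, 4.0]
y = [2.1, 3.9, 6.2, 7.8]

beta_hat = ols_through_origin(x, y)
# Sample analogue of E(u_i x_i) = 0: the residuals are orthogonal to x.
moment = sum(xi * (yi - beta_hat * xi) for xi, yi in zip(x, y))

print(beta_hat, moment)
print(ssr(beta_hat, x, y) < ssr(beta_hat + 0.01, x, y))
print(ssr(beta_hat, x, y) < ssr(beta_hat - 0.01, x, y))
```

Here $\sum x_i y_i = 59.7$ and $\sum x_i^2 = 30$, so $\hat\beta = 1.99$; the moment sum is zero up to floating-point rounding, which is exactly the first-order condition from (ii).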