
WISE MA/PhD Programs
Econometrics
Instructor: Brett Graham
Spring Semester, 2016-17 Academic Year
Exam Version: A

INSTRUCTIONS TO STUDENTS

1. The time allowed for this examination paper is 2 hours.
2. This examination paper contains 32 questions. You are REQUIRED to answer ALL questions. No marks will be deducted for wrong answers.
3. For each multiple choice question there is one and ONLY ONE suitable answer.
4. All numerical answers should be rounded to 3 decimal places. I will accept every answer to within 0.002 of the correct answer. All probabilities should be expressed in decimal form.
5. This examination paper contains 17 pages including this instruction sheet, an answer sheet for the first 30 questions, an answer sheet for question 31, an answer sheet for question 32 and a blank page at the end of the exam.
6. This is a closed-book examination. You are allowed to bring one handwritten 105mm by 75mm piece of paper to the exam. You are also allowed to use a financial calculator.
7. You are required to return all examination materials at the end of the examination.
8. Where required, please use the following critical values.

Tail-end Probabilities of the Normal Distribution

z                0.994   1.405   1.751   2.054   2.326   2.576
Pr(Z > z) (%)    16      8       4       2       1       0.5

5% Significance Level Critical Values of the $\chi^2_m$ Distribution

Degrees of Freedom (m)   1      2      3      4      5       6       7       8       9
Critical Value           3.84   5.99   7.81   9.49   11.07   12.59   14.07   15.51   16.92

Regressor                            (1)        (2)        (3)
Intercept                            15.519     3.278      2.944
                                     (0.336)    (2.455)    (2.513)
College                              5.869      5.886      5.913
                                     (0.404)    (0.398)    (0.396)
Female                               1.924      1.926      1.943
                                     (0.403)    (0.398)    (0.397)
Age                                             0.306      0.296
                                                (0.060)    (0.060)
NE                                                         0.888
                                                           (0.567)
MW                                                         1.592
                                                           (0.555)
S                                                          0.427
                                                           (0.534)
F-statistic for regional effects                           2.994
R^2                                  0.181      0.200      0.205
SER                                  6.320      6.245      6.226

Robust standard errors in parentheses. 1000 observations.

Table 1: Results of regressions of Average Hourly Earnings on selected regressors using randomly generated data.

Use the information found in Table 1 to answer the next 4 questions (1-4):

The data set used to estimate the regressions in Table 1 consists of information on 1000 full-time full-year workers. The variables are defined as follows:

AHE = average hourly earnings (in 2015 US dollars per hour),
College = 1 if college, 0 otherwise,
Female = 1 if female, 0 otherwise,
Age = age (in years),
NE = 1 if Region = Northeast, 0 otherwise,
MW = 1 if Region = Midwest, 0 otherwise,
S = 1 if Region = South, 0 otherwise.

1. Using the regression results in column (1), construct a 96% confidence interval of the difference between college graduate and high school graduate earnings.

Solution: The 96% confidence interval of the college-high school earnings difference is $5.869 \pm 2.054 \times 0.404 = [5.039, 6.699]$.
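As a quick numerical check of the solution to question 1, the interval can be reproduced in a few lines. This is only a sketch: the variable and function names are illustrative, and the critical value 2.054 is taken from the exam's normal table (2% in each tail).

```python
# Values copied from column (1) of Table 1 and from the exam's normal table.
coef_college = 5.869   # estimated coefficient on College
se_college = 0.404     # robust standard error of that coefficient
z_crit = 2.054         # Pr(Z > 2.054) = 2%, so two-sided coverage is 96%


def confidence_interval(estimate, std_error, z):
    """Return (lower, upper) for estimate +/- z * std_error."""
    half_width = z * std_error
    return estimate - half_width, estimate + half_width


lower, upper = confidence_interval(coef_college, se_college, z_crit)
print(f"[{lower:.3f}, {upper:.3f}]")  # prints [5.039, 6.699]
```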

2. Using the regression results in column (2), show the test statistic used to determine whether age is a statistically important determinant of earnings.

Solution: The t-statistic is $0.306/0.06 = 5.1$.

3. Sally is a 28-year-old female college graduate. Betsy is a 39-year-old female college graduate. Using the regression results in column (2), construct a 99% confidence interval for the expected difference between their earnings.

Solution: The 99% confidence interval for the expected difference between their earnings is $11 \times [0.306 \pm 2.576 \times 0.06] = [\$1.661, \$5.071]$.

4. Using the regression results in column (3), construct the $\chi^2$ test statistic for testing if there are important regional differences and the 5% critical value of the test.

Solution: The $\chi^2$-statistic testing if the coefficients on the three regional regressors are zero is $3 \times 2.994 = 8.982$. The critical value associated with the test (3 degrees of freedom) is 7.81.

Use the following information to answer the next 3 questions (5-7):

The random variable $X$ has a Bernoulli distribution with $\Pr(X = 1) = 0.9$. You randomly sample 3 times from this distribution and calculate the sample mean $\bar{X}$.

5. What is $\Pr(\bar{X} = 1/3)$?

Solution: $\Pr(\bar{X} = 1/3) = 0.027$.

6. What is $E(\bar{X})$?

Solution: $E(\bar{X}) = E(X) = 0.9$.

7. What is $E(\bar{X}^2)$?

Solution: $E(\bar{X}^2) = \mathrm{var}(\bar{X}) + E(\bar{X})^2 = 0.9(1 - 0.9)/3 + 0.9^2 = 0.84$.
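The answers to questions 5-7 can be verified by brute force, since three Bernoulli draws have only eight possible outcomes. The sketch below enumerates them with exact fractions; the variable names are illustrative and not part of the exam.

```python
from fractions import Fraction
from itertools import product

p = Fraction(9, 10)  # Pr(X = 1)
n = 3                # number of independent draws

prob_mean_one_third = Fraction(0)
e_mean = Fraction(0)     # accumulates E(X-bar)
e_mean_sq = Fraction(0)  # accumulates E(X-bar^2)

# Enumerate every possible sample (x1, x2, x3) with its probability.
for draws in product([0, 1], repeat=n):
    prob = Fraction(1)
    for x in draws:
        prob *= p if x == 1 else 1 - p
    mean = Fraction(sum(draws), n)
    if mean == Fraction(1, 3):
        prob_mean_one_third += prob
    e_mean += prob * mean
    e_mean_sq += prob * mean ** 2

print(float(prob_mean_one_third), float(e_mean), float(e_mean_sq))
```

Exact fractions are used so that the comparison `mean == Fraction(1, 3)` is not affected by floating-point rounding.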

8. The conditional expectation of $Y$ given $X$, $E(Y \mid X = x)$, is calculated as follows:
a. $E_X[E(Y \mid X)]$
b. $\sum_{i=1}^{k} y_i \Pr(X = x_i \mid Y = y)$
c. $\sum_{i=1}^{k} y_i \Pr(Y = y_i \mid X = x)$
d. $\sum_{i=1}^{k} y_i \Pr(Y = y_i, X = x)$
e. $\sum_{i=1}^{k} E(Y \mid X = x_i) \Pr(X = x_i)$

9. The central limit theorem
a. can only be applied if $E(u \mid X) = 0$.
b. only holds in the presence of the law of large numbers.
c. postulates that the sample mean $\bar{Y}$ is a consistent estimator of the population mean $\mu_Y$.
d. states conditions under which the distribution of a variable involving the sum of $Y_1, \ldots, Y_n$ i.i.d. variables converges to a normal distribution.
e. states that $\bar{Y} \xrightarrow{p} \mu_Y$.

10. The power of the test
a. is the probability that the test incorrectly rejects the null hypothesis when the null is true.
b. is the probability that the test correctly rejects the null hypothesis when the null is true.
c. is the probability that the test correctly rejects the null hypothesis when the alternative is true.
d. is the probability that the test incorrectly does not reject the null hypothesis when the alternative is true.
e. is one minus the size of the test.

11. If two random variables $X$ and $Y$ are uncorrelated, which of the following statements is not true?
a. $\mathrm{cov}(X, Y) = 0$
b. $\rho_{X,Y} = 0$
c. $\mathrm{var}(X + Y) = \mathrm{var}(X) + \mathrm{var}(Y)$
d. $E(X + Y) = E(X) + E(Y)$
e. $E(X \mid Y) = E(X)$

Solution: $E(X \mid Y) = E(X)$ need not hold. Independence (or mean independence) between two variables implies that they are uncorrelated, but not vice versa.

12. Two teaching assistants, Susan and David, are comparing student attendance rates for their lectures. From a sample of 420 students from Susan's class, 252 students attended lectures. From a sample of 360 students from David's class, 216 students attended lectures. No student takes both Susan's and David's class. Susan claims that she has a higher attendance rate than David. What is the relevant point estimate, test statistic and p-value for this claim?
a. 0, 0 and 0.50
b. 0, 1 and 0.16
c. 0.6, 1 and 0.16
d. 0.6, 17.05 and 0.05
e. There is insufficient information to calculate all three values.

Solution: Both classes have a proportion point estimate of 0.6, hence the point estimate of the difference is 0. Since the point estimate is zero, the test statistic is also zero. Since this is a one-sided test and the standard normal distribution is centered at 0, the p-value is 0.5.
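The reasoning in the solution to question 12 can be sketched numerically. The exam does not say which standard error to use, so the pooled standard error below is an assumption made for illustration; because the point estimate is exactly zero, the conclusion does not depend on that choice. The helper `normal_cdf` and all variable names are illustrative.

```python
import math


def normal_cdf(z):
    """Standard normal CDF computed from the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


n_susan, attended_susan = 420, 252
n_david, attended_david = 360, 216

p_susan = attended_susan / n_susan
p_david = attended_david / n_david
diff = p_susan - p_david  # point estimate of the difference in attendance rates

# Assumption: pooled standard error under H0 (equal attendance rates).
p_pool = (attended_susan + attended_david) / (n_susan + n_david)
se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_susan + 1 / n_david))

z_stat = diff / se
p_value = 1 - normal_cdf(z_stat)  # one-sided: H1 is p_susan > p_david

print(diff, z_stat, p_value)
```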

Use the following information to answer the next 6 questions (13-18):

The weekly spending habits of 500 randomly chosen males and 600 randomly chosen females are recorded. Let $\mu_m$ denote the male population average of weekly spending and $\mu_w$ denote the female population average of weekly spending. Let $\bar{X}_m$ and $\bar{X}_w$ denote their sample counterparts. Let $\sigma_m$ denote the male population standard deviation of weekly spending and $\sigma_w$ denote the female population standard deviation of weekly spending. Let $s_m$ and $s_w$ denote their sample counterparts. In the survey $\bar{X}_m = 53.6$, $\bar{X}_w = 50.1$, $s_m = 15.2$, $s_w = 14.3$.

13. You are interested in the competing hypotheses $H_0: \mu_m - \mu_w = 5$ vs. $H_1: \mu_m - \mu_w \neq 5$. Suppose that you decide to reject $H_0$ if $|\bar{X}_m - \bar{X}_w - 5| > 3$. Within what range is the size of this test if $\sigma_m = \sigma_w = 25$?
a. (0, 0.01)
b. (0.01, 0.02)
c. (0.02, 0.04)
d. (0.04, 0.08)
e. (0.08, 1)

Solution: The size of the test is the probability of rejecting the null when the null is true.

\[
\begin{aligned}
\Pr(|\bar{X}_m - \bar{X}_w - 5| > 3 \mid \mu_m - \mu_w = 5)
&= \Pr\left(|z| > \frac{3}{\sqrt{\sigma_m^2/n_m + \sigma_w^2/n_w}}\right) \\
&= \Pr(|z| > 1.9817) = 2(1 - \Phi(1.9817)).
\end{aligned}
\]

Since the critical value of 1.9817 lies between $z_{0.04} = 1.751$ and $z_{0.02} = 2.054$, the one-tail probability lies between 2% and 4%, so the significance level lies between 4% and 8% (it is approximately 4.75%).

14. Within what range is the power $1 - \beta$ of this test if $\mu_m - \mu_w = 3$ and $\sigma_m = \sigma_w = 25$?
a. (0, 0.01)
b. (0.01, 0.02)
c. (0.02, 0.04)
d. (0.04, 0.08)
e. (0.08, 1)

Solution: The power of the test is the probability of rejecting the null when the null is false.

\[
\begin{aligned}
\Pr(|\bar{X}_m - \bar{X}_w - 5| > 3 \mid \mu_m - \mu_w = 3)
&= \Pr\left(z < \frac{-1}{\sqrt{\sigma_m^2/n_m + \sigma_w^2/n_w}}\right) + \Pr\left(z > \frac{5}{\sqrt{\sigma_m^2/n_m + \sigma_w^2/n_w}}\right) \\
&= \Pr(z < -0.661) + \Pr(z > 3.303) \\
&= \Pr(z > 0.661) + \Pr(z > 3.303).
\end{aligned}
\]

The second term is approximately zero. The first term is more than 8%, since $z_{0.08} = 1.405 > 0.661$.
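The size and power calculations for questions 13 and 14 can be checked numerically. The sketch below assumes the test statistic is normally distributed (as the solutions do) and evaluates the normal CDF with `math.erf`; the helper and variable names are illustrative.

```python
import math


def normal_cdf(z):
    """Standard normal CDF computed from the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


sigma = 25.0           # common population standard deviation
n_m, n_w = 500, 600    # sample sizes for males and females
se = math.sqrt(sigma ** 2 / n_m + sigma ** 2 / n_w)

null_value = 5.0       # value of mu_m - mu_w under H0
threshold = 3.0        # reject if |X_m - X_w - 5| > 3

# Size: probability of rejecting when mu_m - mu_w = 5.
z_size = threshold / se
size = 2 * (1 - normal_cdf(z_size))

# Power: probability of rejecting when mu_m - mu_w = 3.
true_diff = 3.0
z_low = (null_value - threshold - true_diff) / se   # about -0.661
z_high = (null_value + threshold - true_diff) / se  # about 3.303
power = normal_cdf(z_low) + (1 - normal_cdf(z_high))

print(round(z_size, 4), round(size, 4), round(power, 4))
```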

15. Using the sample information, what is the test statistic associated with $H_0: \mu_m - \mu_w = 5$ vs. $H_1: \mu_m - \mu_w \neq 5$?

Solution:

\[
t = \frac{\bar{X}_m - \bar{X}_w - (\mu_m - \mu_w)}{\sqrt{s_m^2/n_m + s_w^2/n_w}}
  = \frac{53.6 - 50.1 - 5}{\sqrt{15.2^2/500 + 14.3^2/600}} = -1.674
\]

16. Calculate the lower confidence limit of a 99% confidence interval for $\mu_m - \mu_w$.

Solution: Using the critical value $z_{0.005} = 2.576$ from the table, the answer is

\[
LCL = \bar{X}_m - \bar{X}_w - z_{0.005}\sqrt{s_m^2/n_m + s_w^2/n_w} = 3.5 - 2.576 \times 0.896 = 1.192.
\]

17. Suppose that the survey is carried out 10 times, using independently selected people in each sample. For each of these 10 surveys, a 96% confidence interval for $\mu_m - \mu_w$ is constructed. What is the probability that the true value of $\mu_m - \mu_w$ is contained in all 10 of these confidence intervals?

Solution: $\Pr(\mu_m - \mu_w \in [LCL, UCL] \text{ in all 10 surveys}) = 0.96^{10} = 0.665$.

18. Suppose that the survey is carried out 10 times, using independently selected people in each sample. For each of these 10 surveys, a 92% confidence interval for $\mu_m - \mu_w$ is constructed. How many of these confidence intervals do you expect to contain the true value of $\mu_m - \mu_w$?

Solution: $10 \times 0.92 = 9.2$.
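The arithmetic behind questions 15-18 is easy to reproduce. The following sketch uses only the sample values given in the survey and the tabulated critical value 2.576; variable names are illustrative.

```python
import math

x_m, x_w = 53.6, 50.1   # sample means
s_m, s_w = 15.2, 14.3   # sample standard deviations
n_m, n_w = 500, 600     # sample sizes

# Standard error of the difference in sample means.
se = math.sqrt(s_m ** 2 / n_m + s_w ** 2 / n_w)

# Question 15: t-statistic for H0: mu_m - mu_w = 5.
t_stat = (x_m - x_w - 5.0) / se

# Question 16: lower limit of the 99% confidence interval.
lcl_99 = (x_m - x_w) - 2.576 * se

# Question 17: all 10 independent 96% intervals cover the true value.
prob_all_ten = 0.96 ** 10

# Question 18: expected number of 92% intervals (out of 10) that cover it.
expected_covering = 10 * 0.92

print(round(se, 3), round(t_stat, 3), round(lcl_99, 3),
      round(prob_all_ten, 3), round(expected_covering, 1))
```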

19. To investigate whether or not education and fertility are correlated, you collect data on population growth rates ($Y$) and education ($X$) for 86 countries. Given the sums below, compute the sample correlation:

\[
\sum_{i=1}^{n} Y_i = 1.594; \quad \sum_{i=1}^{n} X_i = 449.6; \quad \sum_{i=1}^{n} Y_i X_i = 6.4697; \quad \sum_{i=1}^{n} Y_i^2 = 0.03982; \quad \sum_{i=1}^{n} X_i^2 = 3{,}022.76
\]

Solution: $r = -0.70904$.

Use the following information to answer the next 2 questions (20-21):

A regression of average weekly earnings ($AWE$, measured in dollars) on age ($Age$, measured in years) using a random sample of college-educated full-time workers aged 25-65 yields the following:

\[
\widehat{AWE} = 345 + 12 \times Age, \quad R^2 = 0.023, \quad SER = 624.1.
\]

20. SER is
a. measured in dollars and $R^2$ is measured in years.
b. measured in years and $R^2$ is measured in dollars.
c. unit-free and $R^2$ is measured in years.
d. measured in dollars and $R^2$ is unit-free.
e. measured in years and $R^2$ is measured in years.

21. What is the regression's predicted earnings for a 45-year-old worker?

Solution: $\widehat{AWE} = 345 + 12 \times 45 = \$885$.
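For question 19, the sample correlation can be computed directly from the five sums using the identity $\sum (X_i - \bar{X})(Y_i - \bar{Y}) = \sum X_i Y_i - (\sum X_i)(\sum Y_i)/n$ and its analogues for the sums of squares. The sketch below is one way to do this; variable names are illustrative. The result is negative.

```python
import math

n = 86
sum_y, sum_x = 1.594, 449.6
sum_xy = 6.4697
sum_yy = 0.03982
sum_xx = 3022.76

# Centered cross-product and sums of squares recovered from the raw sums.
s_xy = sum_xy - sum_x * sum_y / n
s_xx = sum_xx - sum_x ** 2 / n
s_yy = sum_yy - sum_y ** 2 / n

# The (n - 1) divisors in the sample covariance and variances cancel.
r = s_xy / math.sqrt(s_xx * s_yy)
print(round(r, 5))
```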

22. If the three least squares assumptions hold, then the large sample distribution of $\hat\beta_1$ is
a. $N\left(0, \dfrac{\mathrm{var}[(X_i - \mu_X)u_i]}{[\mathrm{var}(X_i)]^2}\right)$.
b. $N\left(0, \dfrac{\mathrm{var}(u_i)^2}{n[\mathrm{var}(X_i)]^2}\right)$.
c. $N\left(\beta_1, \dfrac{\sigma_u^2}{\sum (X_i - \bar{X})^2}\right)$.
d. $N\left(\beta_1, \dfrac{\mathrm{var}(u_i)}{n[\mathrm{var}(X_i)]^2}\right)$.
e. $N\left(\beta_1, \dfrac{\mathrm{var}[(X_i - \mu_X)u_i]}{n[\mathrm{var}(X_i)]^2}\right)$.

Use the following information to answer the next 2 questions (23-24):

The dependent variable $Y$ is test score. The independent variable $X$ is class size. The sample size is $n = 1000$. Suppose you have set up the following linear model and plan to estimate the model with ordinary least squares:

\[
Y_i = \beta_0 + \beta_1 X_i + u_i, \quad i = 1, \ldots, 1000
\]

The OLS estimator of $\beta_0$ equals 0.8 with a standard error of 0.383, and the OLS estimator of $\beta_1$ equals 0.2 with a standard error of 0.215.

23. What is the 96% confidence interval of the slope coefficient?

Solution: The confidence interval is $[0.2 \pm z_{0.02} \times 0.215] = [0.2 \pm 2.054 \times 0.215] = [-0.242, 0.642]$.

24. Suppose you want to test the hypothesis that class size affects test scores. What is the value of the test statistic?

Solution:

\[
t = \frac{\hat\beta_1 - 0}{S.E.(\hat\beta_1)} = \frac{0.2}{0.215} = 0.930
\]
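The numbers in the solutions to questions 23 and 24 follow from the reported slope estimate, its standard error, and the tabulated value $z_{0.02} = 2.054$. A short check, with illustrative variable names:

```python
beta1_hat = 0.2     # OLS estimate of the slope
se_beta1 = 0.215    # its standard error
z_02 = 2.054        # Pr(Z > 2.054) = 2%, giving 96% two-sided coverage

# Question 23: 96% confidence interval for the slope.
lower = beta1_hat - z_02 * se_beta1
upper = beta1_hat + z_02 * se_beta1

# Question 24: t-statistic for H0: beta_1 = 0.
t_stat = (beta1_hat - 0.0) / se_beta1

print(f"[{lower:.3f}, {upper:.3f}]", round(t_stat, 3))
```

The interval contains zero and the t-statistic is well below 2.054 in absolute value, so the two results agree: the null of no class-size effect is not rejected at the 4% level.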

25. You regress height on weight and calculate the 95% confidence interval of the slope coefficient as [0.1, 0.9]. Which of the following statements will be true?
a. You can reject the null hypothesis that $\beta_1 = 0$ at a 10% significance level.
b. You can reject the null hypothesis that $\beta_1 = 0$ at a 1% significance level.
c. You can reject the null hypothesis that $\beta_0 = 0$ at a 10% significance level.
d. You can reject the null hypothesis that $\beta_0 = 0$ at a 1% significance level.
e. You cannot reach any of the above conclusions from a 95% confidence interval of the slope coefficient of [0.1, 0.9].

26. You have collected 500 observations of the weight of Chinese people between the ages of 20 and 22. The average weight of people in your sample is 64 kilograms; for females in your sample it is 61 kilograms and for males it is 66 kilograms. Using ordinary least squares regression, you estimate the following population regression model,

\[
Weight_i = \beta_0 + \beta_1 Female_i + u_i,
\]

where $Female_i$ is a binary variable that takes the value 1 if the subject is female and zero if the subject is male. What are your estimates of $\beta_0$ and $\beta_1$?

Solution: $\hat\beta_0 = 66$. $\hat\beta_1 = 61 - 66 = -5$.

27. In multiple linear regression which of the following is not a random variable?
a. $\hat\beta_0$
b. $\hat\beta_1$
c. $\bar{u}$
d. $\hat{Y}$
e. $\mu_X$

Solution: $\mu_X$.

28. Under the four least squares assumptions for the multiple linear regression problem, the OLS estimators for the unknown model parameters
a. are BLUE.
b. are efficient among all linear estimators of the slope and intercept.
c. are unbiased and consistent.
d. have a normal distribution in small samples as long as the errors are homoskedastic.
e. have an exact normal distribution for $n > 50$.
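To illustrate the result used in question 26, the sketch below runs OLS on a binary regressor with made-up data. The individual weights are hypothetical; they are chosen only so that the group means match the question (61 kg for 200 females, 66 kg for 300 males, 64 kg overall). The intercept equals the male mean and the slope equals the female mean minus the male mean.

```python
# Hypothetical data constructed to match the group means in question 26.
female = [1] * 200 + [0] * 300
weight = [59.0, 63.0] * 100 + [64.0, 68.0] * 150  # females first, then males

n = len(weight)
mean_x = sum(female) / n
mean_y = sum(weight) / n

# OLS formulas for a regression with a single regressor.
s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(female, weight))
s_xx = sum((x - mean_x) ** 2 for x in female)
beta1_hat = s_xy / s_xx
beta0_hat = mean_y - beta1_hat * mean_x

print(round(mean_y, 3), round(beta0_hat, 3), round(beta1_hat, 3))
```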

29. The homoskedasticity-only F-statistic is given by which of the following formulas?
a. $F = \dfrac{(R^2_{unrestricted} - R^2_{restricted})/q}{(1 - R^2_{unrestricted})/(n - k_{unrestricted} - 1)}$.
b. $F = \dfrac{(R^2_{unrestricted} - R^2_{restricted})/q}{(1 - R^2_{restricted})/(n - k_{restricted} - 1)}$.
c. $F = \dfrac{(R^2_{restricted} - R^2_{unrestricted})/q}{(1 - R^2_{unrestricted})/(n - k_{unrestricted} - 1)}$.
d. $F = \dfrac{(R^2_{restricted} - R^2_{unrestricted})/q}{(1 - R^2_{restricted})/(n - k_{restricted} - 1)}$.

30. Which of the following do $R^2$ and $\bar{R}^2$ tell you?
a. Whether an included variable is statistically significant.
b. Whether the regressors are a true cause of movements in the dependent variable.
c. Whether there is omitted variable bias.
d. Whether you have chosen the most appropriate set of regressors.
e. Whether the regressors are good at predicting the values of the dependent variable.

Long Answers

31. Suppose that $(X_i, Y_i)$ are i.i.d. with finite fourth moments, $E(X_i) = \mu_X$ and $E(Y_i) = \mu_Y$.

(a) State precisely the Law of Large Numbers.

Solution: If $Y_i$, $i = 1, \ldots, n$ are i.i.d. and $\mathrm{var}(Y_i) = \sigma_Y^2 < \infty$, then $\bar{Y} \xrightarrow{p} \mu_Y$.

(b) State precisely the Central Limit Theorem.

Solution: If $Y_i$, $i = 1, \ldots, n$ are i.i.d. and $0 < \mathrm{var}(Y_i) = \sigma_Y^2 < \infty$, then $(\bar{Y} - \mu_Y)/\sigma_{\bar{Y}} \xrightarrow{d} N(0, 1)$.

(c) (3 points) Derive $E[(\bar{X} - \mu_X)(\bar{Y} - \mu_Y)]$. (If you use the Law of Large Numbers and/or the Central Limit Theorem to answer this question, clearly state where you use the theorem and show that the assumptions of the theorem used are satisfied.)

Solution: From the definition of the sample means, we have

\[
\begin{aligned}
E[(\bar{X} - \mu_X)(\bar{Y} - \mu_Y)]
&= E\left[\left(\frac{1}{n}\sum_{i=1}^{n} (X_i - \mu_X)\right)\left(\frac{1}{n}\sum_{j=1}^{n} (Y_j - \mu_Y)\right)\right] \\
&= \frac{1}{n^2}\sum_{i=1}^{n} E[(X_i - \mu_X)(Y_i - \mu_Y)] \\
&= \frac{n\sigma_{XY}}{n^2} = \frac{\sigma_{XY}}{n}.
\end{aligned}
\]

The second equality uses the fact that $(X_i, Y_i)$ are i.i.d., thus $E[(X_i - \mu_X)(Y_j - \mu_Y)] = \sigma_{X_i Y_j} = 0$ for $i \neq j$. Note that neither the Law of Large Numbers nor the Central Limit Theorem is used in this derivation.

32. Consider the following linear regression model, $Y_i = \beta_0 + \beta_1 X_i + u_i$.

(a) (1.5 points) What are the three least squares assumptions that allow us to derive the sampling distribution of $\hat\beta_1$?

Solution:
(I) $E(u_i \mid X_i) = 0$
(II) $(X_i, Y_i)$, $i = 1, \ldots, n$ are i.i.d.
(III) $E(X_i^4) < \infty$ and $E(Y_i^4) < \infty$

(b) (1.5 points) Derive the difference between $\hat\beta_1$ and $\beta_1$ when the three least squares assumptions are satisfied. (If you use the law of large numbers or the central limit theorem to answer this question, clearly indicate where the theorem is used.)

Solution: Denote $\bar{u} = \sum_{i=1}^{n} u_i/n$. Thus, $\bar{Y} = \beta_0 + \beta_1 \bar{X} + \bar{u}$ and $Y_i - \bar{Y} = \beta_1(X_i - \bar{X}) + (u_i - \bar{u})$.

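Thus,

\[
\begin{aligned}
\hat\beta_1 - \beta_1
&= \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sum (X_i - \bar{X})^2} - \beta_1 \\
&= \frac{\sum (X_i - \bar{X})\left(\beta_1 (X_i - \bar{X}) + (u_i - \bar{u})\right)}{\sum (X_i - \bar{X})^2} - \beta_1 \\
&= \beta_1 \frac{\sum (X_i - \bar{X})^2}{\sum (X_i - \bar{X})^2} + \frac{\sum (X_i - \bar{X})(u_i - \bar{u})}{\sum (X_i - \bar{X})^2} - \beta_1 \\
&= \frac{\sum (X_i - \bar{X})(u_i - \bar{u})}{\sum (X_i - \bar{X})^2} \\
&= \frac{\sum (X_i - \bar{X})u_i}{\sum (X_i - \bar{X})^2},
\end{aligned}
\]

where the last line follows since $\sum (X_i - \bar{X})\bar{u} = 0$. Neither the Law of Large Numbers nor the Central Limit Theorem is used to derive this relationship.

(c) (2 points) Derive $E(\hat\beta_1)$ if the three least squares assumptions are satisfied. (Indicate clearly where each of the three least squares assumptions is used in your answer.)

Solution:

\[
\begin{aligned}
E[\hat\beta_1]
&= \beta_1 + E\left[\frac{\sum (X_i - \bar{X})u_i}{\sum (X_i - \bar{X})^2}\right] \\
&= \beta_1 + E\left[E\left(\frac{\sum (X_i - \bar{X})u_i}{\sum (X_i - \bar{X})^2} \,\middle|\, X_1, \ldots, X_n\right)\right] \\
&= \beta_1 + E\left[\frac{\sum (X_i - \bar{X})E(u_i \mid X_1, \ldots, X_n)}{\sum (X_i - \bar{X})^2}\right] \\
&= \beta_1 + E\left[\frac{\sum (X_i - \bar{X})E(u_i \mid X_i)}{\sum (X_i - \bar{X})^2}\right] \\
&= \beta_1,
\end{aligned}
\]

where the fourth equality arises since $(X_i, Y_i)$, $i = 1, \ldots, n$ are i.i.d. and the last equality arises since $E(u_i \mid X_i) = 0$. Note that the third least squares assumption of finite fourth moments is not required for this derivation.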
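The two results derived in question 32(b) and 32(c) can be illustrated by simulation. The sketch below is not part of the exam solution: the true parameter values, the sample size, the normal distributions for $X_i$ and $u_i$, and the number of replications are all hypothetical choices. Errors are drawn independently of the regressor, so $E(u_i \mid X_i) = 0$ holds by construction.

```python
import random


def ols_slope(x, y):
    """OLS slope estimate for a regression of y on x with an intercept."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    return s_xy / s_xx


def simulate(n, beta0, beta1):
    """Draw one i.i.d. sample with errors independent of the regressor."""
    x = [random.gauss(0.0, 1.0) for _ in range(n)]
    u = [random.gauss(0.0, 1.0) for _ in range(n)]
    y = [beta0 + beta1 * xi + ui for xi, ui in zip(x, u)]
    return x, u, y


random.seed(0)
beta0, beta1 = 1.0, 2.0  # hypothetical true parameters
n = 50

# Part (b): beta1_hat - beta1 equals sum((X_i - X_bar) u_i) / sum((X_i - X_bar)^2).
x, u, y = simulate(n, beta0, beta1)
x_bar = sum(x) / n
rhs = (sum((xi - x_bar) * ui for xi, ui in zip(x, u))
       / sum((xi - x_bar) ** 2 for xi in x))
identity_gap = (ols_slope(x, y) - beta1) - rhs

# Part (c): the average of beta1_hat over many samples is close to beta1.
reps = 2000
slopes = []
for _ in range(reps):
    xs, _, ys = simulate(n, beta0, beta1)
    slopes.append(ols_slope(xs, ys))
mc_mean = sum(slopes) / reps

print(identity_gap, mc_mean)
```

The identity in part (b) holds exactly in every sample (up to floating-point error), whereas the unbiasedness in part (c) is a statement about the average across repeated samples, which is why it is checked by Monte Carlo.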

Answer Sheet
Econometrics Mid-term Exam

Name:

Question   Points   Answer
1          1
2          1
3          1
4          1
5          1
6          1
7          1
8          1
9          1
10         1
11         1
12         1
Total:     12

Question   Points   Answer
13         1
14         1
15         1
16         1
17         1
18         1
19         1
20         1
21         1
22         1
23         1
24         1
25         1
26         1
27         1
28         1
29         1
30         1
Total:     18

Answer Sheet
Econometrics Mid-term Exam

Name:

Question 31 (5 points)

Answer Sheet
Econometrics Mid-term Exam

Name:

Question 32 (5 points)