
WISE MA/PhD Programs
Econometrics
Instructor: Brett Graham
Spring Semester, 2015-16 Academic Year
Exam Version: A

INSTRUCTIONS TO STUDENTS

1. The time allowed for this examination paper is 2 hours.
2. This examination paper contains 32 questions. You are REQUIRED to answer ALL questions. No marks will be deducted for wrong answers.
3. For each multiple choice question there is one and ONLY ONE suitable answer.
4. All numerical answers should be rounded to 3 decimal places. I will accept every answer to within 0.002 of the correct answer. All probabilities should be expressed in decimal form.
5. This examination paper contains 19 pages including this instruction sheet, an answer sheet for the first 30 questions, an answer sheet for question 31, an answer sheet for question 32 and a blank page at the end of the exam.
6. This is a closed-book examination. You are allowed to bring one handwritten 105mm by 75mm piece of paper to the exam. You are also allowed to use a financial calculator.
7. You are required to return all examination materials at the end of the examination.
8. Where required, please use the following critical values.

Tail-end Probabilities of the Normal Distribution

| $z$ | 0.994 | 1.405 | 1.751 | 2.054 | 2.326 | 2.576 |
|---|---|---|---|---|---|---|
| $\Pr(Z \geq z)$ (%) | 16 | 8 | 4 | 2 | 1 | 0.5 |

5% Significance Level Critical Values of the $\chi^2_m$ Distribution

| Degrees of Freedom ($m$) | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
|---|---|---|---|---|---|---|---|---|---|
| Critical Value | 3.84 | 5.99 | 7.81 | 9.49 | 11.07 | 12.59 | 14.07 | 15.51 | 16.92 |

| Regressor | (a) | (b) | (c) |
|---|---|---|---|
| ln(price per citation) | -0.533 (0.034) | -0.408 (0.044) | -0.961 (0.160) |
| [ln(price per citation)]$^2$ | | | 0.017 (0.025) |
| [ln(price per citation)]$^3$ | | | 0.0037 (0.0055) |
| ln(age) | | 0.424 (0.119) | 0.373 (0.118) |
| ln(age) × ln(price per citation) | | | 0.156 (0.052) |
| ln(characters ÷ 1,000,000) | | 0.206 (0.098) | 0.235 (0.098) |
| Intercept | 4.77 (0.055) | 3.21 (0.38) | 3.41 (0.38) |
| Summary Statistics | | | |
| SER | 0.750 | 0.705 | 0.691 |
| $R^2$ | 0.555 | 0.607 | 0.622 |

Table 1: Regression results for Questions 1 to 4. The dependent variable is the logarithm of subscriptions at U.S. libraries in the year 2000; 180 observations.

Use the information found in Table 1 to answer the next 4 questions (1-4): The estimated regressions in Table 1 use data for the year 2000 on 180 economics journals.

1. A journal intends to increase its price per citation from 1 to 2 and the number of characters in its journal from 500,000 to 1,000,000. A researcher uses the regression specification from Column (b) in Table 1 to estimate the standard error of the predicted change in the logarithm of journal subscriptions. She transforms the regression into the following form

$$\ln(\text{subscriptions}) = \gamma_0 + \gamma_1 \ln(\text{price per citation}) + \gamma_2 \ln(\text{age}) + \gamma_3\left[\ln(\text{characters} \div 1{,}000{,}000) + \lambda \ln(\text{price per citation})\right] + u,$$

so that the standard error can be calculated using the test statistic from the null hypothesis that $\gamma_1 = 0$ against the alternative that $\gamma_1 \neq 0$. What number does $\lambda$ equal?

Answer: $\lambda = -1$.
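The value of $\lambda$ in Question 1 can be recovered by matching coefficients. The sketch below writes the Column (b) slope coefficients as $\beta_1$ (price per citation), $\beta_2$ (age) and $\beta_3$ (characters), and abbreviates the regressors as $P$ and $C$; this notation is assumed here for illustration and does not appear in the exam.

```latex
\begin{align*}
\Delta \ln(\text{subscriptions})
  &= \beta_1(\ln 2 - \ln 1) + \beta_3(\ln 1 - \ln 0.5)
   = (\beta_1 + \beta_3)\ln 2, \\
\beta_1 \ln P + \beta_3 \ln C
  &= (\beta_1 + \beta_3)\ln P + \beta_3\,[\ln C - \ln P], \\
\text{so}\quad \gamma_1 &= \beta_1 + \beta_3, \qquad \gamma_3 = \beta_3, \qquad \lambda = -1 .
\end{align*}
```

Under this reading the predicted change equals $\gamma_1 \ln 2$, so its standard error is $\ln 2$ times the standard error of $\hat\gamma_1$, which can be backed out of the reported $t$-statistic for $\gamma_1 = 0$.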

2. Using Column (c) in Table 1, test if the effect on subscriptions from a change in the journal's price per citation depends on the age of the journal. Show the value of your test statistic and your conclusion at the 4% level (depends or does not depend).

This is a test of the significance of the coefficient on ln(age) × ln(price per citation). Hence, the test statistic is $t = 0.156/0.052 = 3$, which is greater than the critical value of 2.054. We conclude that the elasticity of journal subscriptions depends on the age of the journal.

3. Using Column (a) in Table 1, construct a 96% confidence interval for the elasticity of demand for journal subscriptions (measured in absolute terms).

UCL $= 0.533 + 2.054 \times 0.034 = 0.603$.
LCL $= 0.533 - 2.054 \times 0.034 = 0.463$.

4. The price per citation of the Journal of Econometrics was $10 in 2006. The journal was established in 1973. Using Column (c) in Table 1, predict the logarithm of subscriptions to the journal in 2006 if Characters = 2,000,000.

$$\ln(\text{subscriptions}) = 3.41 - 0.961\ln(10) + 0.017[\ln(10)]^2 + 0.0037[\ln(10)]^3 + 0.373\ln(2006 - 1973) + 0.156\ln(2006 - 1973)\ln(10) + 0.235\ln(2) = 4.05556.$$
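The arithmetic in Questions 3 and 4 can be checked with a short script. This is a sketch under one assumption: the coefficient on ln(price per citation) is taken to be negative in Column (c), which is the reading consistent with the stated answer of 4.05556. Variable names are illustrative only.

```python
import math

# Question 3: 96% confidence interval for the elasticity (absolute value).
elasticity, std_err = 0.533, 0.034   # Column (a), Table 1
z_96 = 2.054                         # critical value from the instruction sheet

lcl = elasticity - z_96 * std_err
ucl = elasticity + z_96 * std_err

# Question 4: Column (c) prediction. The negative sign on the price
# coefficient is an assumption consistent with the stated answer.
ln_price = math.log(10)
ln_age = math.log(2006 - 1973)
ln_chars = math.log(2_000_000 / 1_000_000)

ln_subs = (3.41
           - 0.961 * ln_price
           + 0.017 * ln_price ** 2
           + 0.0037 * ln_price ** 3
           + 0.373 * ln_age
           + 0.156 * ln_age * ln_price
           + 0.235 * ln_chars)

print(round(lcl, 3), round(ucl, 3), round(ln_subs, 5))
```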

Use the following information to answer the next 2 questions (5-6): Consider the following estimated regression model

$$\ln(Y_i) = 0.9 + 0.5X_i.$$

5. Which of the following is correct?
a. A one percent increase in X implies a 0.02 percent increase in Y.
b. A one percent increase in X implies a 50 percent increase in Y.
c. A one percent increase in X implies a 50 unit increase in Y.
d. A one unit increase in X implies a 50 unit increase in Y.
e. A one unit increase in X implies a 50 percent increase in Y.

Answer: A one unit increase in X implies a 50 percent increase in Y.

6. Which of the following is incorrect? The regression model is linear in
a. $\beta_0$.
b. $\beta_1$.
c. $X$.
d. $Y$.
e. $\ln(Y)$.

Answer: The regression model is not linear in $Y$.

7. In the regression model $Y_i = \beta_0 + \beta_1 X_i + \beta_2 D_i + \beta_3 (X_i \times D_i) + u_i$, where $X$ is a continuous variable and $D$ is a binary variable, $\beta_3$
a. has a standard error that is not normally distributed even in large samples since $D$ is not a normally distributed variable.
b. indicates the estimated difference in the slopes of two population regression lines.
c. has no meaning since $(X_i \times D_i) = 0$ when $D_i = 0$.
d. indicates the slope of the regression when $D = 1$.

8. Nonlinear least squares
a. gives you the same results as maximum likelihood estimation.
b. is another name for polynomial least squares.
c. should always be used when you have nonlinear equations.
d. solves the minimization of the sum of squared predictive mistakes through sophisticated mathematical routines, essentially by trial and error methods.

9. The analysis is externally valid if
a. its inferences and conclusions can be generalized from the population and setting studied to other populations and settings.
b. the error terms in the model are homoskedastic.
c. the error terms in the model are uncorrelated with the independent variables.
d. the statistical inferences about causal effects are valid for the population being studied.
e. the study uses data created by an ideal randomized controlled experiment.

10. Sample selection bias
a. is a threat to external validity.
b. is more important for nonlinear least squares estimation than for OLS.
c. is only important for finite sample results.
d. occurs when a selection process influences the availability of data and that process is related to the dependent variable.
e. results in the OLS estimator being biased, although it is still consistent.

11. In the case of a simple regression, where the independent variable is measured with i.i.d. error, $w_i$, $\hat\beta_1 \xrightarrow{p}$
a. $\dfrac{\sigma_w^2}{\sigma_X^2 + \sigma_w^2}\,\beta_1$.
b. $\dfrac{\sigma_X^2}{\sigma_X^2 + \sigma_w^2}\,\beta_1$.
c. $\beta_1 + \dfrac{\sigma_w^2}{\sigma_X^2 + \sigma_w^2}$.
d. $\beta_1 + \dfrac{\sigma_X^2}{\sigma_X^2 + \sigma_w^2}$.

Use the following information to answer the next 2 questions (12-13): The demand for a commodity is given by $Q = 100 - 2P + u$, where $Q$ denotes quantity, $P$ denotes price, and $u$ denotes factors other than price that determine demand. Supply for the commodity is given by $Q = 10 + 0.5P + v$, where $v$ denotes factors other than price that determine supply. Suppose that $u$ and $v$ both have a mean of zero, have variances of 3 and 4 respectively, and are uncorrelated.

12. What is the covariance between $Q$ and $v$?

$$\text{cov}(Q, v) = \frac{4}{1 + (0.5/2)} = 3.2.$$

13. What is the covariance between $Q$ and $P$?

$$\text{cov}(Q, P) = \frac{0.5 \times 3 - 2 \times 4}{(0.5 + 2)^2} = -1.04.$$
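The two covariances in Questions 12 and 13 follow from the reduced form, obtained by equating demand and supply and solving for $P$ and $Q$ as linear functions of $u$ and $v$. The sketch below assumes the demand slope is $-2$ (so that demand slopes downward); the variable names are illustrative and not from the exam.

```python
# Demand: Q = 100 + b_d * P + u    Supply: Q = 10 + b_s * P + v
b_d, b_s = -2.0, 0.5          # assumed slopes (demand slope taken as negative)
var_u, var_v = 3.0, 4.0       # u and v are uncorrelated with mean zero

# Equating the two equations: (b_s - b_d) * P = 90 + u - v
p_on_u = 1.0 / (b_s - b_d)    # coefficient of u in the reduced form for P
p_on_v = -1.0 / (b_s - b_d)   # coefficient of v in the reduced form for P

# Substituting P into the supply equation gives the reduced form for Q.
q_on_u = b_s * p_on_u
q_on_v = b_s * p_on_v + 1.0

# With cov(u, v) = 0, covariances are sums of products of coefficients and variances.
cov_q_v = q_on_v * var_v
cov_q_p = q_on_u * p_on_u * var_u + q_on_v * p_on_v * var_v

print(round(cov_q_v, 3), round(cov_q_p, 3))
```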

Use the following information to answer the next 2 questions (14-15): Consider the following regression model:

$$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + u_i, \quad i = 1, \ldots, n.$$

Suppose a sample of $n = 60$ households has the sample means and sample covariances below for a dependent variable, $Y$, and two regressors, $X_1$ and $X_2$:

| | Sample Means | Sample Covariances: $Y$ | $X_1$ | $X_2$ |
|---|---|---|---|---|
| $Y$ | 6 | 8 | 4 | 4 |
| $X_1$ | 8 | | 8 | -5.29 |
| $X_2$ | 8 | | | 7 |

14. What is the value of $(X^T X)_{2,2}$?

$$(X^T X)_{2,2} = \sum_{i=1}^{n} X_{1i}^2.$$

From the definition of sample variance,

$$s^2_{X_1} = \frac{1}{n-1}\sum_{i=1}^{n}(X_{1i} - \bar{X}_1)^2 = \frac{1}{n-1}\sum_{i=1}^{n} X_{1i}^2 - \frac{n}{n-1}\bar{X}_1^2,$$

which implies that

$$\sum_{i=1}^{n} X_{1i}^2 = (n-1)s^2_{X_1} + n\bar{X}_1^2 = 59(8) + 60(64) = 4312.$$
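The identity used in Question 14 recovers a raw sum of products from a sample covariance and two sample means. A minimal sketch follows; the helper name `sum_of_products` and the small check data set are hypothetical and only illustrate the identity.

```python
import math
import statistics

def sum_of_products(n, sample_cov, mean_a, mean_b):
    """Return sum_i a_i * b_i given the sample covariance of a and b and their means."""
    return (n - 1) * sample_cov + n * mean_a * mean_b

# Question 14: X1 paired with itself, so the covariance is the sample variance.
xtx_22 = sum_of_products(n=60, sample_cov=8, mean_a=8, mean_b=8)

# Sanity check of the identity on a tiny made-up data set.
data = [1.0, 2.0, 4.0]
direct = sum(x * x for x in data)
via_moments = sum_of_products(len(data), statistics.variance(data),
                              statistics.mean(data), statistics.mean(data))
identity_holds = math.isclose(direct, via_moments)

print(xtx_22, identity_holds)
```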

15. What is the value of $(X^T Y)_{2,1}$?

$$(X^T Y)_{2,1} = \sum_{i=1}^{n} X_{1i} Y_i.$$

From the definition of sample covariance,

$$s_{X_1 Y} = \frac{1}{n-1}\sum_{i=1}^{n}(X_{1i} - \bar{X}_1)(Y_i - \bar{Y}) = \frac{1}{n-1}\sum_{i=1}^{n} X_{1i} Y_i - \frac{n}{n-1}\bar{X}_1 \bar{Y},$$

which implies that

$$\sum_{i=1}^{n} X_{1i} Y_i = (n-1)s_{X_1 Y} + n\bar{X}_1 \bar{Y} = 59(4) + 60(8)(8) = 4076.$$

16. Panel data
a. always has a binary dependent variable.
b. is also called longitudinal data.
c. is the same as time series data.
d. studies a group of people at a point in time.
e. typically uses control and treatment groups.

17. Consider the entity fixed effects model with a single regressor,

$$Y_{it} = \beta_1 X_{1,it} + \alpha_i + u_{it}.$$

This model also can be written as

$$Y_{it} = \beta_0 + \beta_1 X_{1,it} + \gamma_2 D2_i + \ldots + \gamma_n Dn_i + u_{it},$$

where $D2_i = 1$ if $i = 2$ and 0 otherwise, and so forth. The coefficient $\alpha_3$ equals
a. $\beta_0 + \gamma_3$.
b. $\beta_0 + \gamma_2 + \gamma_3$.
c. $\beta_1 + \gamma_3$.
d. $\beta_1 + \gamma_2 + \gamma_3$.
e. $\gamma_3$.

Answer: $\alpha_i = \beta_0 + \gamma_2 D2_i + \ldots + \gamma_n Dn_i$, so $\alpha_3 = \beta_0 + \gamma_3$.

18. A researcher estimates the following fixed effects regression:

$$Y_{it} = \beta_0 + \beta_1 X_{it} + \beta_2 D2_i + \ldots + \beta_n Dn_i + \beta_{AS} AS_i + u_{it},$$

where $Dj_i = 1$ when $i = j$ and zero otherwise. This regression suffers from perfect multicollinearity since $AS_i =$
a. $\gamma_1 + \sum_{j=2}^{n} \gamma_j Dj_i$ where $\gamma_1 = AS_1$ and $\gamma_j = AS_j + AS_1$ for $j = 2, \ldots, n$.
b. $\gamma_1 + \sum_{j=2}^{n} \gamma_j Dj_i$ where $\gamma_1 = AS_1$ and $\gamma_j = AS_j - AS_1$ for $j = 2, \ldots, n$.
c. $\gamma_1 + \sum_{j=2}^{n} \gamma_i Dj_i$ where $\gamma_1 = AS_1$ and $\gamma_j = AS_1 - AS_j$ for $j = 2, \ldots, n$.
d. $\gamma_i + \sum_{j=2}^{n} \gamma_j$ where $\gamma_1 = AS_1$ and $\gamma_j = AS_1 - AS_j$ for $j = 2, \ldots, n$.
e. $\gamma_i + \sum_{j=2}^{n} \gamma_1 Dj_i$ where $\gamma_1 = AS_1$ and $\gamma_j = AS_j + AS_1$ for $j = 2, \ldots, n$.

19. In the panel regression analysis of beer taxes on traffic deaths, the estimation period is 1982-1988 for the 48 contiguous U.S. states. To test for the significance of time fixed effects, you should calculate the $F$-statistic and compare it to the critical value from your $F_{q,\infty}$ distribution. What is the value of $q$?

Answer: $q = 6$. The period 1982-1988 covers seven years, so six time dummies are tested once one year is omitted.

20. Which of the following statements is not a basic assumption of panel data regression?
a. For each individual, its time series observations are i.i.d. over time.
b. For each individual, its time series observations have finite fourth moments.
c. The regression model does not suffer from omitted lagged effects.
d. For each individual, the observations are constant over time.
e. For each individual, the error terms are uncorrelated over time.

Answer: For each individual, the observations are constant over time.
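The perfect multicollinearity in Question 18 arises because any time-invariant, entity-specific variable can be written exactly as a constant plus a weighted sum of the entity dummies. The sketch below illustrates this with made-up values of $AS_i$ for four entities and the weights $\gamma_1 = AS_1$, $\gamma_j = AS_j - AS_1$; the numbers are hypothetical and chosen only for illustration.

```python
# Hypothetical entity-specific values AS_1, ..., AS_4 (made up for illustration).
AS = [3.0, 5.5, 1.0, 4.0]
n = len(AS)

gamma_1 = AS[0]                                  # gamma_1 = AS_1
gammas = {j: AS[j - 1] - AS[0] for j in range(2, n + 1)}   # gamma_j = AS_j - AS_1

def dummy(j, i):
    """Entity dummy Dj_i: 1 when i == j, 0 otherwise (entities are 1-based)."""
    return 1 if i == j else 0

def rebuild(i):
    """gamma_1 + sum over j = 2..n of gamma_j * Dj_i."""
    return gamma_1 + sum(g * dummy(j, i) for j, g in gammas.items())

reconstructed = [rebuild(i) for i in range(1, n + 1)]
print(reconstructed == AS)
```

Because the reconstruction is exact, $AS_i$ adds no information beyond the intercept and the dummies, which is why the regression cannot separately identify its coefficient.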

21. If the fifth assumption in the fixed effects regression is violated, that is, if for some $t \neq s$, $\text{cov}(u_{it}, u_{is} \mid X_{it}, X_{is}) \neq 0$, then
a. the OLS estimator is biased.
b. using heteroskedastic-robust standard errors is not sufficient for correct statistical inference when using OLS.
c. you can use the simple homoskedasticity-only standard errors calculated in your regression package.
d. you cannot use fixed time effects in your estimation.
e. you cannot use fixed entity effects in your estimation.

Answer: using heteroskedastic-robust standard errors is not sufficient for correct statistical inference when using OLS.

22. Suppose $X$, which can take any real value, is the explanatory variable for the probability that $Y = 1$. Which of the following functions generates predictions of the probability that $Y = 1$ that are consistent with probability theory?
a. $\Pr(Y = 1) = \dfrac{1}{\pi}\sin(\beta_0 + \beta_1 X)$
b. $\Pr(Y = 1) = \dfrac{1}{1 + e^{-(\beta_0 + \beta_1 X)}}$
c. $\Pr(Y = 1) = \beta_0 + \beta_1 X$
d. $\Pr(Y = 1) = 1 + e^{-(\beta_0 + \beta_1 X)}$
e. $\Pr(Y = 1) = \ln(\beta_0 + \beta_1 X)$

Answer: $\Pr(Y = 1) = \dfrac{1}{1 + e^{-(\beta_0 + \beta_1 X)}}$
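The point of Question 22 is that a probability model must return values in $[0, 1]$ for every real $X$. The sketch below compares the logistic form with the linear form on a grid of $X$ values. The coefficients are hypothetical, and the logistic function is written with a negative exponent, which is an assumption about the intended formula.

```python
import math

# Hypothetical coefficients chosen only for illustration.
beta0, beta1 = 0.5, 1.0

def logistic(x):
    """Logistic form: 1 / (1 + exp(-(beta0 + beta1 * x)))."""
    return 1.0 / (1.0 + math.exp(-(beta0 + beta1 * x)))

def linear(x):
    """Linear probability form: beta0 + beta1 * x."""
    return beta0 + beta1 * x

grid = range(-30, 31)
logistic_in_unit_interval = all(0.0 <= logistic(x) <= 1.0 for x in grid)
linear_in_unit_interval = all(0.0 <= linear(x) <= 1.0 for x in grid)

print(logistic_in_unit_interval, linear_in_unit_interval)
```

The logistic values stay inside the unit interval across the grid, while the linear form leaves it as soon as $X$ moves far enough from zero.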

23. You want to estimate the probability that someone will quit smoking (Quit) given their years of smoking (Years) and gender (Male equals one if the person is male and zero if the person is female). You estimate the following probit model,

$$\Pr(\text{Quit} = 1 \mid \text{Years}, \text{Male}) = \Phi(-1.505 + 0.315\,\text{Years} - 0.65\,\text{Male}).$$

What is the estimated change in the expected probability that a female quits smoking when Years increases from 10 to 11?

$$\Pr(\text{Quit} = 1 \mid \text{Years} = 10, \text{Male} = 0) = \Phi(-1.505 + 0.315 \times 10 - 0.65 \times 0) = \Phi(1.645) = 0.95.$$

$$\Pr(\text{Quit} = 1 \mid \text{Years} = 11, \text{Male} = 0) = \Phi(-1.505 + 0.315 \times 11 - 0.65 \times 0) = \Phi(1.96) = 0.975.$$

Hence, the estimated expected change is 0.025, or 2.5 percentage points.

24. The conditions for valid instruments do not include which of the following:
a. each instrument must be uncorrelated with the error term.
b. each one of the instrumental variables must be normally distributed.
c. at least one of the instruments must enter the population regression of $X$ on the $Z$s and the $W$s.
d. perfect multicollinearity between the predicted endogenous variables and the exogenous variables must be ruled out.
e. the exogenous regressors must have finite fourth moments.

25. The $J$-statistic
a. provides you with a test of the hypothesis that the instruments are exogenous for the case of exact identification.
b. tells you if the instruments are relevant.
c. is distributed $\chi^2_{m-k}$ where $m - k$ is the degree of overidentification.
d. is distributed $\chi^2_{m-k}$ where $m - k$ is the number of instruments minus the number of regressors.
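The probit calculation in Question 23 can be reproduced with the standard normal CDF from Python's standard library. This is a sketch that assumes the intercept and the Male coefficient are negative, the reading consistent with the index values 1.645 and 1.96 in the solution; the function name is illustrative.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # mean 0, standard deviation 1

def quit_probability(years, male):
    """Predicted Pr(Quit = 1) from the probit index (signs assumed as noted above)."""
    index = -1.505 + 0.315 * years - 0.65 * male
    return STANDARD_NORMAL.cdf(index)

p_10 = quit_probability(years=10, male=0)
p_11 = quit_probability(years=11, male=0)
change = p_11 - p_10

print(round(p_10, 3), round(p_11, 3), round(change, 3))
```

The effect of one more year depends on the starting point because the probit response is nonlinear, which is why the change is computed as a difference of two predicted probabilities rather than read off the coefficient.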

26. Having more relevant instruments
a. is like having a larger sample size in that more information is available for use in the IV regressions.
b. is not as important for inference as having the same number of endogenous variables as instruments.
c. is a problem because instead of being just identified, the regression now becomes overidentified.
d. typically results in larger standard errors for the TSLS estimator.

Use the following information to answer the next 2 questions (27-28): The following table reports two-stage least squares estimates for an instrumental variables regression:

| | (a) | (b) | (c) |
|---|---|---|---|
| Instrumental variable(s) | sales tax | cigar-specific tax | both sales and cigar-specific tax |
| First-stage $F$-statistic | 12.7 | 5.6 | 14.3 |
| $J$-test (p-value) | | | 4.93 (0.026) |

27. Based on the above results we should conclude that
a. Only the sales tax is exogenous.
b. Only the cigar-specific tax is exogenous.
c. Both the sales tax and the cigar-specific tax are exogenous.
d. Neither the sales tax nor the cigar-specific tax is exogenous.
e. At least one of the instruments is not exogenous but we cannot say which one.

28. Based on the above results we should conclude that
a. Only the sales tax is relevant.
b. Only the cigar-specific tax is relevant.
c. Both the sales tax and the cigar-specific tax are relevant.
d. Neither the sales tax nor the cigar-specific tax is relevant.
e. At least one of the instruments is not relevant but we cannot say which one.

Use the following information to answer the next 2 questions (29-30): Suppose that, in a randomized controlled experiment of the effect of light on worker productivity in the clothing industry, the following results are reported:

| | Treatment | Control |
|---|---|---|
| Average shirts produced per hour ($\bar{X}$) | 25.4 | 24.7 |
| Standard deviation of shirts produced per hour ($s_X$) | 3.9 | 4 |
| Number of workers with at least 10 years experience | 37 | 30 |
| Number of workers with less than 10 years experience | 60 | 73 |

29. What is the value of the standardized test statistic associated with a test of statistical significance of the average treatment effect on worker productivity?

The test statistic is

$$t = \frac{\bar{X}_{Treatment} - \bar{X}_{Control}}{\sqrt{\dfrac{s^2_{X_{Treatment}}}{n_{Treatment}} + \dfrac{s^2_{X_{Control}}}{n_{Control}}}} = \frac{25.4 - 24.7}{\sqrt{\dfrac{3.9^2}{97} + \dfrac{4^2}{103}}} = 1.253.$$

Another acceptable test statistic is $-1.253$.

30. What is the value of the standardized test statistic associated with a test of nonrandom assignment?

A test of nonrandom assignment regresses the treatment $X$ on the observed characteristic(s) $W$ and tests if the coefficient(s) on $W$ are significant. When $W$ is binary, this is a test of whether there is a difference in the sample proportions of people receiving the treatment between the two groups. In this example, there would be nonrandom assignment if the proportion of workers who get treated is significantly different across the different levels of experience. Let $p_E$ denote the proportion of people with at least 10 years experience who received the treatment and $p_I$ denote the proportion of people with less than 10 years experience who received the treatment. Random assignment means that $p_E = p_I$. Testing this null hypothesis results in a $t$-statistic of

$$t = \frac{\hat{p}_E - \hat{p}_I}{\sqrt{\dfrac{\hat{p}_E(1 - \hat{p}_E)}{n_E} + \dfrac{\hat{p}_I(1 - \hat{p}_I)}{n_I}}} = \frac{0.552 - 0.451}{\sqrt{\dfrac{0.552(1 - 0.552)}{67} + \dfrac{0.451(1 - 0.451)}{133}}} = 1.357.$$

Another acceptable test statistic is $-1.357$.
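Both test statistics in Questions 29 and 30 are differences in means or proportions divided by an unpooled standard error. The sketch below recomputes them from the table entries; group sizes are obtained by adding the two experience rows within each column (for the treatment effect) or within each row (for the assignment test). Variable names are illustrative.

```python
import math

# Counts from the table: (treatment, control) by experience level.
exp_treat, exp_ctrl = 37, 30        # at least 10 years experience
inexp_treat, inexp_ctrl = 60, 73    # less than 10 years experience

# Question 29: difference in mean productivity between treatment and control.
n_treat = exp_treat + inexp_treat   # 97
n_ctrl = exp_ctrl + inexp_ctrl      # 103
t_effect = (25.4 - 24.7) / math.sqrt(3.9 ** 2 / n_treat + 4 ** 2 / n_ctrl)

# Question 30: difference in treatment proportions across experience levels.
n_exp = exp_treat + exp_ctrl        # 67
n_inexp = inexp_treat + inexp_ctrl  # 133
p_exp = exp_treat / n_exp
p_inexp = inexp_treat / n_inexp
t_assignment = (p_exp - p_inexp) / math.sqrt(
    p_exp * (1 - p_exp) / n_exp + p_inexp * (1 - p_inexp) / n_inexp
)

print(round(t_effect, 3), round(t_assignment, 3))
```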

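Long Answers

31. Consider the following (small) panel data set of observations for $Y_{it}$.

| $i$ | $t = 1$ | $t = 2$ | $t = 3$ |
|---|---|---|---|
| 1 | 15 | 18 | 15 |
| 2 | 11 | 17 | 11 |
| 3 | 13 | 19 | 10 |

(a) Construct entity demeaned values of $Y_{it}$. (2)

| $i$ | $Y_{i1}$ | $Y_{i2}$ | $Y_{i3}$ | $\bar{Y}_i$ | $\tilde{Y}_{i1}$ | $\tilde{Y}_{i2}$ | $\tilde{Y}_{i3}$ |
|---|---|---|---|---|---|---|---|
| 1 | 15 | 18 | 15 | 16 | -1 | 2 | -1 |
| 2 | 11 | 17 | 11 | 13 | -2 | 4 | -2 |
| 3 | 13 | 19 | 10 | 14 | -1 | 5 | -4 |

(b) Construct time demeaned values of $Y_{it}$. (2)

| | $t = 1$ | $t = 2$ | $t = 3$ |
|---|---|---|---|
| $Y_{1t}$ | 15 | 18 | 15 |
| $Y_{2t}$ | 11 | 17 | 11 |
| $Y_{3t}$ | 13 | 19 | 10 |
| $\bar{Y}_t$ | 13 | 18 | 12 |
| $\tilde{Y}_{1t}$ | 2 | 0 | 3 |
| $\tilde{Y}_{2t}$ | -2 | -1 | -1 |
| $\tilde{Y}_{3t}$ | 0 | 1 | -2 |

(c) Construct entity and time demeaned values of $Y_{it}$.

Taking the entity-demeaned data:

| | $t = 1$ | $t = 2$ | $t = 3$ |
|---|---|---|---|
| Entity-demeaned, $i = 1$ | -1 | 2 | -1 |
| Entity-demeaned, $i = 2$ | -2 | 4 | -2 |
| Entity-demeaned, $i = 3$ | -1 | 5 | -4 |
| $\bar{Y}_t$ | -4/3 | 11/3 | -7/3 |
| $\tilde{Y}_{1t}$ | 1/3 | -5/3 | 4/3 |
| $\tilde{Y}_{2t}$ | -2/3 | 1/3 | 1/3 |
| $\tilde{Y}_{3t}$ | 1/3 | 4/3 | -5/3 |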
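The three demeaning steps in Question 31 can be reproduced with exact fractions. This is a minimal sketch: the helper names are hypothetical, rows index entities and columns index time periods, and the combined transformation is done the same way as in the solution, by time-demeaning the entity-demeaned data.

```python
from fractions import Fraction

# Rows are entities i = 1, 2, 3; columns are periods t = 1, 2, 3.
Y = [[15, 18, 15],
     [11, 17, 11],
     [13, 19, 10]]

def entity_demean(rows):
    """Subtract each entity's own time average (row mean) from its observations."""
    result = []
    for row in rows:
        row_mean = Fraction(sum(row), len(row))
        result.append([value - row_mean for value in row])
    return result

def time_demean(rows):
    """Subtract each period's cross-entity average (column mean) from that column."""
    col_means = [Fraction(sum(col), len(rows)) for col in zip(*rows)]
    return [[value - mean for value, mean in zip(row, col_means)] for row in rows]

entity_demeaned = entity_demean(Y)
time_demeaned = time_demean(Y)
both_demeaned = time_demean(entity_demeaned)

for row in both_demeaned:
    print([str(value) for value in row])
```

Using `Fraction` keeps thirds exact, so the output can be compared directly with the fractions in the solution table.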

32. Consider the linear probability model $Y_i = \beta_0 + \beta_1 X_i + u_i$, where $\Pr(Y_i = 1 \mid X_i) = \beta_0 + \beta_1 X_i$ and $X$ is a binary variable. Assume $\beta_0 = 0.5$ and $\beta_1 = -0.2$.

(a) What is $\text{var}(u_i \mid X_i = 0)$?

$$\text{var}(u_i \mid X_i) = (\beta_0 + \beta_1 X_i)[1 - (\beta_0 + \beta_1 X_i)] = 0.5 \times 0.5 = 0.25.$$

(b) What is $\text{var}(u_i \mid X_i = 1)$?

$$\text{var}(u_i \mid X_i) = (\beta_0 + \beta_1 X_i)[1 - (\beta_0 + \beta_1 X_i)] = 0.3 \times 0.7 = 0.21.$$

(c) Consider the following sample of four i.i.d. observations $\{(X_i, Y_i)\}_{i=1}^{4}$: $\{(0, 0), (1, 1), (0, 0), (1, 0)\}$. What is the value of the likelihood function? (3)

$$f(\beta_0, \beta_1; Y_1, \ldots, Y_n \mid X_1, \ldots, X_n) = \prod_{i=1}^{n} (\beta_0 + \beta_1 X_i)^{Y_i}[1 - (\beta_0 + \beta_1 X_i)]^{1 - Y_i}$$

$$f(0.5, -0.2; 0, 1, 0, 0 \mid 0, 1, 0, 1) = 0.5 \times 0.3 \times 0.5 \times 0.7 = 0.0525 \approx 0.052.$$
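The conditional variances and the likelihood in Question 32 follow directly from the Bernoulli structure of $Y_i$ given $X_i$. The sketch below assumes $\beta_1 = -0.2$, the sign consistent with the product $0.3 \times 0.7$ in part (b); function names are illustrative.

```python
beta0, beta1 = 0.5, -0.2   # beta1 taken as negative (assumption noted above)

def prob_one(x):
    """Pr(Y = 1 | X = x) under the linear probability model."""
    return beta0 + beta1 * x

def conditional_variance(x):
    """var(u | X = x) = p(1 - p) for a Bernoulli outcome."""
    p = prob_one(x)
    return p * (1 - p)

sample = [(0, 0), (1, 1), (0, 0), (1, 0)]   # (X_i, Y_i) pairs from part (c)

likelihood = 1.0
for x, y in sample:
    p = prob_one(x)
    likelihood *= p ** y * (1 - p) ** (1 - y)

print(round(conditional_variance(0), 3),
      round(conditional_variance(1), 3),
      round(likelihood, 4))
```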

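Answer Sheet
Econometrics Mid-term Exam
Name:

| Question | Points | Answer | Question | Points | Answer |
|---|---|---|---|---|---|
| 1 | 1 | | 16 | 1 | |
| 2 | 1 | | 17 | 1 | |
| 3 | 1 | | 18 | 1 | |
| 4 | 1 | | 19 | 1 | |
| 5 | 1 | | 20 | 1 | |
| 6 | 1 | | 21 | 1 | |
| 7 | 1 | | 22 | 1 | |
| 8 | 1 | | 23 | 1 | |
| 9 | 1 | | 24 | 1 | |
| 10 | 1 | | 25 | 1 | |
| 11 | 1 | | 26 | 1 | |
| 12 | 1 | | 27 | 1 | |
| 13 | 1 | | 28 | 1 | |
| 14 | 1 | | 29 | 1 | |
| 15 | 1 | | 30 | 1 | |
| Total: | 15 | | Total: | 15 | |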

Answer Sheet
Econometrics Mid-term Exam
Name:
Question 31 (5 points)

Answer Sheet
Econometrics Mid-term Exam
Name:
Question 32 (5 points)