ECONOMETRICS II (ECO 2401S)
University of Toronto. Department of Economics. Winter 2018
Instructor: Victor Aguirregabiria

SOLUTION TO FINAL EXAM
Tuesday, April 10, 2018. From 9:00am-12:00pm (3 hours)

INSTRUCTIONS:
- This is a closed-book exam.
- No study aids, including calculators, are allowed.
- Please answer all the questions.

TOTAL MARKS = 100

PROBLEM 1 (50 points). Consider the random-coefficients regression model $Y_i = \alpha_i + \beta_i D_i$, where subindex $i$ represents an individual and $D_i \in \{0,1\}$ is a binary variable. We denote $Y$ as the outcome variable and $D$ as the treatment variable. The researcher observes a random sample of $\{Y, D\}$ for $N$ individuals, $\{y_i, d_i : i = 1, 2, \ldots, N\}$. The random variables $(\alpha_i, \beta_i)$ are unobserved to the researcher, and they have means $\alpha$ and $\beta$, respectively. The random variable $\beta_i$ represents the treatment effect for individual $i$, i.e., $\beta_i = E[Y_i \mid D_i = 1] - E[Y_i \mid D_i = 0]$. Therefore, $\beta$ is the Average Treatment Effect (ATE) over the population of individuals.

Question 1.1 [5 points] Given equation $Y_i = \alpha_i + \beta_i D_i$, obtain a representation of this model as a regression equation where the intercept and the slope parameters are constants, i.e., a constant-coefficients representation. Describe the error term in this regression equation.

ANSWER. By definition, $\alpha_i = \alpha + a_i$ and $\beta_i = \beta + b_i$, where $a_i$ and $b_i$ are zero-mean random variables. Plugging these expressions into equation $Y_i = \alpha_i + \beta_i D_i$, we have that $Y_i = \alpha + a_i + [\beta + b_i] D_i$, or equivalently,
$$Y_i = \alpha + \beta D_i + \varepsilon_i,$$
where the error term is $\varepsilon_i \equiv a_i + b_i D_i$.

Question 1.2 [5 points] Using the constant-coefficients representation in Question 1.1, consider the OLS estimator in this equation. Show that the OLS estimator of the slope parameter can be written as $\bar{y}_1 - \bar{y}_0$, where, for $d \in \{0,1\}$, $\bar{y}_d \equiv \sum_{i=1}^{N} y_i \, 1\{d_i = d\} \,/\, \sum_{i=1}^{N} 1\{d_i = d\}$.

ANSWER. Let $\bar{y}$ and $\bar{d}$ be the sample means of $Y$ and $D$, respectively. Note that, by definition, we have that: (i) $\frac{1}{N}\sum_{i=1}^{N} y_i d_i = \bar{y}_1 \bar{d}$; and (ii) $\frac{1}{N}\sum_{i=1}^{N} d_i d_i = \frac{1}{N}\sum_{i=1}^{N} d_i = \bar{d}$ (since $d_i^2 = d_i$). The OLS estimator of the slope parameter in the simple regression of $Y$ on $D$ is
$$\hat{\beta}_{OLS} = \frac{\sum_{i=1}^{N} (y_i - \bar{y})(d_i - \bar{d})}{\sum_{i=1}^{N} (d_i - \bar{d})(d_i - \bar{d})} = \frac{\frac{1}{N}\sum_{i=1}^{N} y_i d_i - \bar{y}\,\bar{d}}{\frac{1}{N}\sum_{i=1}^{N} d_i d_i - \bar{d}\,\bar{d}}$$
Taking into account expressions (i) and (ii), we have that:
$$\hat{\beta}_{OLS} = \frac{\bar{y}_1 \bar{d} - \bar{y}\,\bar{d}}{\bar{d} - \bar{d}\,\bar{d}} = \frac{\bar{y}_1 - \bar{y}}{1 - \bar{d}}$$
Now, we can write the sample mean $\bar{y}$ as
$$\bar{y} = \frac{\sum_{i=1}^{N} y_i}{N} = \frac{\sum_{i=1}^{N} y_i d_i + \sum_{i=1}^{N} y_i (1 - d_i)}{N} = \bar{y}_1 \bar{d} + \bar{y}_0 (1 - \bar{d})$$
Plugging this expression into the equation above for the OLS estimator, we get:
$$\hat{\beta}_{OLS} = \frac{\bar{y}_1 - \bar{y}_1 \bar{d} - \bar{y}_0 (1 - \bar{d})}{1 - \bar{d}} = \bar{y}_1 - \bar{y}_0$$
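As a quick numerical check of this algebra, the following Python sketch (not part of the original solutions; all simulation parameters are illustrative choices) verifies that the OLS slope coincides with the difference of subsample means:

```python
import numpy as np

# Illustrative simulation of the random-coefficients model Y_i = alpha_i + beta_i * D_i
rng = np.random.default_rng(0)
N = 10_000
alpha_i = 1.0 + rng.normal(0, 1, N)   # alpha_i = alpha + a_i, with alpha = 1
beta_i = 2.0 + rng.normal(0, 1, N)    # beta_i = beta + b_i, with beta = ATE = 2
d = rng.integers(0, 2, N)             # D independent of (alpha_i, beta_i)
y = alpha_i + beta_i * d

# OLS slope from the constant-coefficients regression of Y on D
beta_ols = np.cov(y, d, bias=True)[0, 1] / np.var(d)
# Difference of subsample means
diff_means = y[d == 1].mean() - y[d == 0].mean()

print(beta_ols, diff_means)   # numerically identical; both close to the ATE = 2
```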
Question 1.3 [5 points] Consider the OLS estimator in Question 1.2. Suppose that $D_i$ is independent of $(\alpha_i, \beta_i)$. Prove that this OLS estimator is a consistent estimator of $\beta_{ATE}$.

ANSWER. Given a random sample of $Y$ and $D$, the Law of Large Numbers and Slutsky's Theorem imply that, if $0 < \bar{d} < 1$:
$$\hat{\beta}_{OLS} = \bar{y}_1 - \bar{y}_0 \ \rightarrow_p \ E(Y \mid D = 1) - E(Y \mid D = 0)$$
Now, taking into account that $Y = \alpha + \beta D + \varepsilon$, with $\varepsilon \equiv a + b\,D$, and that $(a, b)$ are mean zero and independent of $D$, we have that:
$$E(Y \mid D = 1) = \alpha + \beta + E(a + b \mid D = 1) = \alpha + \beta$$
and
$$E(Y \mid D = 0) = \alpha + E(a \mid D = 0) = \alpha$$
Therefore, $\hat{\beta}_{OLS} \rightarrow_p E(Y \mid D = 1) - E(Y \mid D = 0) = \beta$.

Question 1.4 [5 points] Let $\{\hat{e}_i : i = 1, 2, \ldots, N\}$ be the residuals from the OLS regression in Question 1.2. Under the assumption of independence between $D_i$ and $(\alpha_i, \beta_i)$, explain how you can use these residuals to obtain a root-$N$ consistent nonparametric estimator of the whole Cumulative Distribution Function (CDF) of the heterogeneous treatment effects $\beta_i$. [Note for grading: The statement of this question does not make explicit the assumption of independence between $\alpha_i$ and $\alpha_i + \beta_i$ that is necessary to answer this specific question.]

ANSWER. Under the assumption of independence between $D_i$ and $(\alpha_i, \beta_i)$, we have that $\alpha_i = \{Y_i \mid D_i = 0\} \equiv Y_i^0$ and $\alpha_i + \beta_i = \{Y_i \mid D_i = 1\} \equiv Y_i^1$; that is, $Y_i^0$ and $Y_i^1$ are the potential outcomes without and with treatment. Let $F_\alpha$ and $F_{\alpha+\beta}$ be the CDFs of the random variables $Y_i^0 = \alpha_i$ and $Y_i^1 = \alpha_i + \beta_i$, respectively. Using the subsample of observations with $d_i = 0$, we can estimate consistently the CDF $F_\alpha$. Similarly, using the subsample of observations with $d_i = 1$, we can estimate consistently the CDF $F_{\alpha+\beta}$. For any constant $c$, the estimators of $F_\alpha(c)$ and $F_{\alpha+\beta}(c)$ are:
$$\hat{F}_\alpha(c) = \frac{1}{N_0} \sum_{i=1}^{N} 1\{y_i \le c \text{ and } d_i = 0\} \qquad\qquad \hat{F}_{\alpha+\beta}(c) = \frac{1}{N_1} \sum_{i=1}^{N} 1\{y_i \le c \text{ and } d_i = 1\}$$
where $N_0 \equiv \sum_{i=1}^{N} 1\{d_i = 0\}$ and $N_1 \equiv \sum_{i=1}^{N} 1\{d_i = 1\}$.

Next, we show that we can obtain the CDF $F_\beta$ in terms of the CDFs $F_\alpha$ and $F_{\alpha+\beta}$. Remember that $\beta_i = Y_i^1 - Y_i^0$. Then, for any constant $c$:
$$F_\beta(c) = \Pr\left(Y^1 - Y^0 \le c\right) = \Pr\left(Y^1 \le c + Y^0\right) = \Pr\left(\alpha + \beta \le c + \alpha\right)$$
and, under independence between $\alpha + \beta$ and $\alpha$, we have:
$$F_\beta(c) = \int F_{\alpha+\beta}(c + \alpha) \, f_\alpha(\alpha) \, d\alpha = E_\alpha\left[F_{\alpha+\beta}(c + \alpha)\right]$$
where $E_\alpha[\cdot]$ represents the expectation over the distribution of $\alpha$. Note that $E_\alpha[F_{\alpha+\beta}(c + \alpha)]$ is the same object as $E_{Y^0}[F_{Y^1}(c + Y^0)]$. Therefore, we can construct the following consistent estimator of $F_\beta(c)$:
$$\hat{F}_\beta(c) = \hat{E}_{Y^0}\left[\hat{F}_{Y^1}(c + Y^0)\right] = \frac{1}{N_0} \sum_{i=1}^{N} 1\{d_i = 0\} \, \hat{F}_{Y^1}(c + y_i) = \frac{1}{N_0} \sum_{i=1}^{N} 1\{d_i = 0\} \, \frac{1}{N_1} \sum_{j=1}^{N} 1\{y_j \le c + y_i \text{ and } d_j = 1\}$$
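A minimal sketch of this plug-in estimator (illustrative code, not from the original solutions; the simulated distributions are arbitrary and are drawn so that $Y^0$ and $Y^1$ are independent, as the grading note requires):

```python
import numpy as np

def cdf_treatment_effect(y, d, c):
    """Plug-in estimator F_beta_hat(c) = (1/N0) * sum_{i: d_i=0} F_hat_{Y1}(c + y_i),
    where F_hat_{Y1} is the empirical CDF of Y in the treated subsample."""
    y0, y1 = y[d == 0], y[d == 1]
    # For each untreated y_i, the fraction of treated y_j with y_j <= c + y_i
    return np.mean([(y1 <= c + yi).mean() for yi in y0])

rng = np.random.default_rng(1)
N = 5_000
y0_pot = 1.0 + rng.normal(0, 1, N)    # Y^0 = alpha_i
y1_pot = 3.0 + rng.normal(0, 1, N)    # Y^1 = alpha_i + beta_i, independent of Y^0
d = rng.integers(0, 2, N)             # D independent of the potential outcomes
y = np.where(d == 1, y1_pot, y0_pot)

# beta_i = Y^1 - Y^0 ~ N(2, 2), so F_beta(0) = Phi(-2/sqrt(2)) ~ 0.079
print(cdf_treatment_effect(y, d, 0.0))
```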
Question 1.5 [5 points] Now, consider that the treatment variable $D_i$ is not independent of $(\alpha_i, \beta_i)$. Show that the OLS estimator of $\beta$ is inconsistent.

ANSWER. A necessary condition for the consistency of the OLS estimator of $\beta$ is that $E(D\,\varepsilon) = 0$. But we have that:
$$E[D\,\varepsilon] = E[D\,(a + b\,D)] = \Pr(D = 1) \; E[a + b \mid D = 1]$$
Since $D$ is not independent of $(a, b)$, we have that, in general, the conditional expectation $E[a + b \mid D = 1]$ is different from the unconditional expectation $E[a + b] = 0$. Therefore, $E[D\,\varepsilon] \ne 0$ and the OLS estimator is inconsistent.

Question 1.6 [5 points] Let $Z_i$ be a random variable with support $\mathcal{Z} \subseteq \mathbb{R}$. Note that $Z_i$ can be continuous. Suppose that $Z_i$ satisfies the following conditions: (Relevance) $P_D(z) \equiv \Pr(D_i = 1 \mid Z_i = z)$ varies with $z \in \mathcal{Z}$; and (Independence) $Z_i$ is independent of $(\alpha_i, \beta_i)$. Suppose that you estimate the regression equation (in Question 1.2) by Instrumental Variables (IV) using $Z_i$ as an instrument for $D_i$. Prove that this IV estimator is inconsistent for parameter $\beta$.

ANSWER. A necessary condition for the consistency of the IV estimator of $\beta$ is that $E(Z\,\varepsilon) = 0$. Now, we have that:
$$E[Z\,\varepsilon] = E[Z\,(a + b\,D)] = E[Z\,a] + E[Z\,b\,D]$$
Since $Z$ is independent of $a$, we have that $E[Z\,a] = E[Z]\,E[a] = 0$. Then,
$$E[Z\,\varepsilon] = E_Z\left[Z \; E(b\,D \mid Z)\right] = E_Z\left[Z \; \Pr(D = 1 \mid Z) \; E(b \mid Z, D = 1)\right] = E_Z\left[Z \; P_D \; E(b \mid Z, D = 1)\right]$$
Since $D$ is NOT independent of $b$, we have that the conditional expectation $E(b \mid Z, D = 1)$ is, in general, different from $E(b \mid Z) = E(b) = 0$. Therefore, $E[Z\,\varepsilon] \ne 0$ and the IV estimator of $\beta$ is inconsistent.
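To see this inconsistency concretely, the following simulation sketch (illustrative, not part of the original solutions; all distributional choices, including the asymmetric instrument, are arbitrary) lets individuals select into treatment partly on their own gain $b_i$:

```python
import numpy as np

# Illustrative: selection on gains makes both OLS and IV inconsistent for the ATE
rng = np.random.default_rng(2)
N = 200_000
a = rng.normal(0, 1, N)                  # a_i: intercept heterogeneity
b = rng.normal(0, 1, N)                  # b_i: treatment-effect heterogeneity
z = rng.exponential(1.0, N)              # instrument, independent of (a_i, b_i)
d = (z + b + rng.normal(0, 1, N) > 0).astype(float)   # D depends on Z and on b
y = 1.0 + a + (2.0 + b) * d              # alpha = 1, beta = ATE = 2

beta_ols = y[d == 1].mean() - y[d == 0].mean()
beta_iv = np.cov(z, y, bias=True)[0, 1] / np.cov(z, d, bias=True)[0, 1]
print(beta_ols, beta_iv)                 # neither converges to the ATE = 2
```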
of Z i, and (Z i ) is some fnction of Z i. Sppose that ( i ; i ; i ) are normally distribted and independent of Z i. (a) Obtain the expression for E (Y jz,d = ) in terms of,, and inverse Mills ratio. Similarly, obtain the expression for E (Y j Z, D = 0) in terms of,, and inverse Mills ratio. ASWER. If " and are normally distribted random variables with mean zero, we have that for any constant c: c " E (" j c) = c E (" j c) = " Taking into accont this property, we have that: c c E (Y j Z, D = ) = + + E (a + b D j Z, D = ) = + + E (a + b j Z, ) = + (a+b) And, E (Y j Z, D = 0) = + E (a + b D j Z, D = 0) = + E (a j Z, > ) = + a; (b) Using the derivations in (a), propose a two-step method for consistent estimation of. ASWER. First, note that E (Y j Z) = P D E (Y j Z, D = ) + [ into accont the expressions in (a), we have: E (Y j Z) = + a; + a+b; P D ] E (Y j Z, D = 0), and taking Or, in a regression-like eqation: Y = + Z + Z + e where Z, Z, a; a+b;, and e is an error term that is mean zero and mean independent of Z, i.e., E(ejZ) = 0. 4
Question 1.8 [10 points] Consider the model in Question 1.7, but now we do not assume that $(\alpha_i, \beta_i, u_i)$ are normally distributed. We still assume that $(\alpha_i, \beta_i, u_i)$ are independent of $Z_i$, but their distribution is nonparametrically specified. Suppose that the support of $Z_i$ is the whole real line, and $P_D(z) \equiv F_u(\gamma(z))$ is strictly monotonic in $z$.

(a) Obtain the expression for $E(Y \mid Z, D = 1)$ in terms of $\alpha$, $\beta$, and a selection term. Similarly, obtain the expression for $E(Y \mid Z, D = 0)$ in terms of $\alpha$, $\beta$, and a selection term.

ANSWER. Under the assumption that $u$ is independent of $Z$, we have that $P_D = F_u(\gamma)$ and there is a one-to-one relationship between $P_D$ and $\gamma$. Furthermore, under the assumption of independence between $Z$ and $(\alpha, \beta, u)$, we have that:
$$E(Y \mid Z, D = 1) = \alpha + \beta + E(a + b \mid Z, u \le \gamma) = \alpha + \beta + S_{a+b}(\gamma)$$
And
$$E(Y \mid Z, D = 0) = \alpha + E(a \mid Z, u > \gamma) = \alpha + S_a(\gamma)$$
Finally, given the one-to-one relationship between $P_D$ and $\gamma$, we can represent the selection terms $S_{a+b}(\gamma)$ and $S_a(\gamma)$ as functions of the propensity score $P_D$. That is:
$$E(Y \mid Z, D = 1) = \alpha + \beta + s_{a+b}(P_D) \qquad \text{and} \qquad E(Y \mid Z, D = 0) = \alpha + s_a(P_D)$$

(b) Using the equations derived in (a), show that $\beta$ is identified. Propose a two-step method for the consistent estimation of $\beta$.

ANSWER. First, note that $E(Y \mid Z) = P_D \, E(Y \mid Z, D = 1) + [1 - P_D] \, E(Y \mid Z, D = 0)$, and taking into account the expressions in (a), we have:
$$E(Y \mid Z) = \alpha + \beta \, P_D + s(P_D)$$
with $s(p) \equiv p \, s_{a+b}(p) + (1 - p) \, s_a(p)$. Or, in a regression-like equation:
$$Y = \alpha + \beta \, P_D + s(P_D) + e$$
where $e$ is an error term that is mean zero and mean independent of $Z$, i.e., $E(e \mid Z) = 0$.

Given this regression representation, consider the following two-step nonparametric method. In the first step, we estimate a nonparametric regression of the dependent variable $D$ on the instrument $Z$. Given this estimated regression, we construct the estimated values $\hat{P}_i \equiv \hat{P}_D(z_i)$ for every observation $i$ in the sample. In the second step, we run a Partially Linear Model of $y_i$ on $\hat{P}_i$ and $s(\hat{P}_i)$. Unfortunately, it is clear that in this second step we cannot separately identify the parameter $\beta$ from the nonparametric selection term $s(\hat{P}_i)$, since both the linear term and the selection term are functions of $\hat{P}_i$ alone. The ATE is not identified in this model.
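For concreteness, the first-step nonparametric regression could be a Nadaraya-Watson (kernel) estimator of the propensity score; a minimal sketch (illustrative, not from the original solutions; the Gaussian kernel and the fixed bandwidth are arbitrary choices):

```python
import numpy as np

def propensity_nw(z_eval, z, d, h=0.1):
    """Nadaraya-Watson (kernel) estimate of P_D(z) = E[D | Z = z].
    Gaussian kernel; the fixed bandwidth h is illustrative only
    (in practice h should shrink with the sample size)."""
    w = np.exp(-0.5 * ((z_eval[:, None] - z[None, :]) / h) ** 2)
    return (w * d[None, :]).sum(axis=1) / w.sum(axis=1)

# Example: estimated propensity score P_hat_i at every sample point
rng = np.random.default_rng(4)
N = 2_000
z = rng.normal(0, 1, N)
d = (rng.logistic(0, 1, N) <= z).astype(float)   # a strictly monotone P_D(z)
p_hat = propensity_nw(z, z, d)
```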
PROBLEM 2 (25 points). Consider the binary choice model $Y = 1\{Z + \beta X - \varepsilon \ge 0\}$, where $\varepsilon$ is independently distributed of $(Z, X)$ with CDF $F(\cdot)$ that is continuously differentiable over the real line. The explanatory variable $Z$ has support over the whole real line, while the explanatory variable $X$ is binary with support $\{0, 1\}$.

(a) [10 points] Describe the Maximum Score Estimator (MSE) and the Smoothed Maximum Score Estimator (SMSE) of the parameter $\beta$. Explain why the SMSE deals with some of the limitations of the MSE.

ANSWER. Given that $\varepsilon$ is independent of $(Z, X)$, it is also median independent. Therefore, $\mathrm{median}(\varepsilon \mid Z, X) = \mathrm{median}(\varepsilon) = 0$. Then, we have that:
$$\mathrm{median}(Y \mid Z, X) = 1\{Z + \beta X \ge 0\}$$
Suppose that we use this conditional median, $1\{Z + \beta X \ge 0\}$, to predict the outcome $Y$. The Maximum Score Estimator (MSE) of $\beta$ is defined as the value of $\beta$ that maximizes the score function that counts the number of correct predictions when we predict $Y = 1$ iff $z_i + \beta x_i \ge 0$ and predict $Y = 0$ iff $z_i + \beta x_i < 0$. That is,
$$\hat{\beta}_{MSE} = \arg\max_\beta \; S(\beta) = \sum_{i=1}^{n} y_i \, 1\{z_i + \beta x_i \ge 0\} + (1 - y_i) \, 1\{z_i + \beta x_i < 0\}$$
This estimator is consistent but it has several limitations: (1) It is NOT root-$n$ consistent; its rate of convergence to the true $\beta$ is $n^{1/3}$. (2) It is not asymptotically normal; it has a nonstandard asymptotic distribution. (3) We cannot use standard gradient-based methods to search for the MSE. And (4) if the sample size is not large enough, there may not be a unique value of $\beta$ that maximizes $S(\beta)$; the maximizer of $S(\beta)$ can be a whole (compact) set in the space of $\beta$. All these limitations of the MSE are related to the fact that the criterion function $S(\beta)$ is a discontinuous step function.

Based on this, Horowitz proposes an estimator that is based on a "smooth" score function. First, note that the score function $S(\beta)$ can be written as follows:
$$S(\beta) = \sum_{i=1}^{n} y_i \, 1\{z_i + \beta x_i \ge 0\} + (1 - y_i)\left(1 - 1\{z_i + \beta x_i \ge 0\}\right) = \sum_{i=1}^{n} (1 - y_i) + \sum_{i=1}^{n} (2 y_i - 1) \, 1\{z_i + \beta x_i \ge 0\}$$
Therefore, maximizing $S(\beta)$ is equivalent to maximizing $\sum_{i=1}^{n} (2 y_i - 1) \, 1\{z_i + \beta x_i \ge 0\}$, and:
$$\hat{\beta}_{MSE} = \arg\max_\beta \; \sum_{i=1}^{n} (2 y_i - 1) \, 1\{z_i + \beta x_i \ge 0\}$$
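Because the score is a step function, a simple way to compute the MSE in this one-parameter problem is a grid search; a minimal illustrative sketch (not from the original solutions; the simulation design and grid are arbitrary choices):

```python
import numpy as np

def max_score(z, x, y, grid):
    """Maximum score estimator over a grid of candidate beta values
    (gradient methods are useless on the discontinuous score function)."""
    scores = [np.sum((2 * y - 1) * (z + b * x >= 0)) for b in grid]
    return grid[int(np.argmax(scores))]

rng = np.random.default_rng(5)
n = 5_000
z = rng.normal(0, 2, n)
x = rng.integers(0, 2, n)
eps = rng.standard_t(df=3, size=n)             # non-normal error with median zero
y = (z + 1.5 * x - eps >= 0).astype(float)     # true beta = 1.5
print(max_score(z, x, y, np.linspace(-3, 3, 601)))   # close to 1.5 in large samples
```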
Horowitz proposes to replace $1\{z_i + \beta x_i \ge 0\}$ by $\Phi\!\left(\dfrac{z_i + \beta x_i}{b_n}\right)$, where $\Phi(\cdot)$ is the CDF of the standard normal, and $b_n$ is a bandwidth parameter such that: (i) $b_n \to 0$ as $n \to \infty$; and (ii) $n\,b_n \to \infty$ as $n \to \infty$. That is, $b_n$ goes to zero but more slowly than $1/n$. The Smoothed MSE is defined as:
$$\hat{\beta}_{SMSE} = \arg\max_\beta \; \sum_{i=1}^{n} (2 y_i - 1) \, \Phi\!\left(\frac{z_i + \beta x_i}{b_n}\right)$$
As $n \to \infty$ and $b_n \to 0$, the function $\Phi\!\left(\frac{z_i + \beta x_i}{b_n}\right)$ converges uniformly to $1\{z_i + \beta x_i \ge 0\}$, and the criterion function converges uniformly to the score function. This implies the consistency of $\hat{\beta}_{SMSE}$. Under the additional condition that $n\,b_n \to \infty$ as $n \to \infty$, and provided the kernel function has enough smooth derivatives (e.g., the normal CDF), this estimator is $n^\kappa$ consistent and asymptotically normal, with $2/5 \le \kappa < 1/2$. It can be computed using standard gradient search methods because the criterion function is continuously differentiable.

(b) [15 points] Suppose that $\beta$ is known (or consistently estimated). Provide a constructive proof of the identification of the distribution function $F(\varepsilon_0)$ at any value $\varepsilon_0$ in the real line.

ANSWER. The CCP function $P(z, x) = \Pr(Y = 1 \mid Z = z, X = x)$ is nonparametrically identified from the data at every $(z, x)$. Suppose that $\beta$ has been identified/estimated (e.g., by the SMSE). For arbitrary values of $x \in \{0, 1\}$ and $\varepsilon \in \mathbb{R}$, say $(x_0, \varepsilon_0)$, we want to obtain $F_{\varepsilon \mid x_0}(\varepsilon_0)$. Let $z_0$ be the value $z_0 = \varepsilon_0 - \beta x_0$, and let $P(z_0, x_0)$ be the CCP evaluated at $(z_0, x_0)$. Then:
$$P(z_0, x_0) \equiv \Pr(Y = 1 \mid Z = z_0, X = x_0) = \Pr(\varepsilon \le z_0 + \beta x_0) = F_{\varepsilon \mid x_0}(z_0 + \beta x_0) = F_{\varepsilon \mid x_0}(\varepsilon_0)$$
Or, equivalently, $F_{\varepsilon \mid x_0}(\varepsilon_0) = P(z_0, x_0)$. That is, for any $(x_0, \varepsilon_0)$ we can always define a value $z_0$ such that $F_{\varepsilon \mid x_0}(\varepsilon_0)$ is equal to $P(z_0, x_0)$. (Under the maintained independence of $\varepsilon$ and $(Z, X)$, $F_{\varepsilon \mid x_0}$ is simply $F$.)
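Since the smoothed criterion is differentiable, it can be handed to a standard optimizer; a minimal sketch (illustrative, not from the original solutions; the simulation design and the bandwidth value are arbitrary choices):

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import norm

def smse(z, x, y, b_n):
    """Smoothed maximum score: the indicator is replaced by Phi((z + beta*x)/b_n),
    so the criterion is smooth and a standard optimizer applies."""
    neg_score = lambda beta: -np.sum((2 * y - 1) * norm.cdf((z + beta * x) / b_n))
    return minimize_scalar(neg_score, bounds=(-3, 3), method="bounded").x

rng = np.random.default_rng(5)
n = 5_000
z, x = rng.normal(0, 2, n), rng.integers(0, 2, n)
y = (z + 1.5 * x - rng.standard_t(df=3, size=n) >= 0).astype(float)  # true beta = 1.5
print(smse(z, x, y, b_n=0.2))   # bandwidth 0.2 is an arbitrary illustrative choice
```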
PROBLEM 3 (25 points). Consider the dynamic panel data model $y_{it} = \rho \, y_{i,t-1} + \eta_i + \varepsilon_{it}$, where $\varepsilon_{it}$ is i.i.d.

Question 3.1 [10 points] Define the Anderson-Hsiao IV estimator of $\rho$. Derive the expression of its asymptotic variance. Explain the weak instruments problem as $\rho$ approaches 1.

ANSWER. For the AR(1) panel data model without other regressors, the Anderson-Hsiao estimator is defined as:
$$\hat{\rho}_{AH} = \frac{\sum_{t=3}^{T} \sum_{i=1}^{N} y_{i,t-2} \, \Delta y_{it}}{\sum_{t=3}^{T} \sum_{i=1}^{N} y_{i,t-2} \, \Delta y_{i,t-1}}$$
Notice that we need $T \ge 3$ to implement this estimator. The Anderson-Hsiao estimator is an IV estimator of the equation in first differences (FD) where the FD of the lagged endogenous variable, $\Delta y_{i,t-1}$, is instrumented with $y_{i,t-2}$.

To derive the asymptotic variance, notice that:
$$\sqrt{N}\,(\hat{\rho}_{AH} - \rho) = \frac{\dfrac{1}{\sqrt{N}} \sum_{t=3}^{T} \sum_{i=1}^{N} y_{i,t-2} \, \Delta\varepsilon_{it}}{\dfrac{1}{N} \sum_{t=3}^{T} \sum_{i=1}^{N} y_{i,t-2} \, \Delta y_{i,t-1}}$$
As $N \to \infty$,
$$\frac{1}{N} \sum_{i=1}^{N} y_{i,t-2} \, \Delta y_{i,t-1} \ \rightarrow_p \ E\left(y_{t-2} \, \Delta y_{t-1}\right)$$
By the Mann-Wald Theorem, as $N \to \infty$,
$$\frac{1}{\sqrt{N}} \sum_{i=1}^{N} y_{i,t-2} \, \Delta\varepsilon_{it} \ \rightarrow_d \ N\!\left(0, \; Var\left(y_{t-2}\,\Delta\varepsilon_t\right)\right)$$
Therefore,
$$\sqrt{N}\,(\hat{\rho}_{AH} - \rho) \ \rightarrow_d \ N(0, V_{AH}) \qquad \text{with} \qquad V_{AH} = \frac{Var\left(y_{t-2}\,\Delta\varepsilon_t\right)}{\left[E\left(y_{t-2}\,\Delta y_{t-1}\right)\right]^2}$$
To obtain the expression of $V_{AH}$ in terms of the parameters of the model, we need to derive the expressions for $Var(y_{t-2}\,\Delta\varepsilon_t)$ and $E(y_{t-2}\,\Delta y_{t-1})$.

(A) $Var(y_{t-2}\,\Delta\varepsilon_t)$. First, note that $Var(y_{t-2}\,\Delta\varepsilon_t) = E(y_{t-2}^2\,\Delta\varepsilon_t^2)$. Note that $y_{t-2}$ depends on transitory shocks at periods $t-2$ and before. Therefore, $\Delta\varepsilon_t$ is independent of $y_{t-2}$. Applying the law of iterated expectations and the independence between $\Delta\varepsilon_t$ and $y_{t-2}$, we have that $E(y_{it-2}^2\,\Delta\varepsilon_t^2) = E(y_{t-2}^2)\,E(\Delta\varepsilon_t^2)$. Given that $E(\Delta\varepsilon_t^2) = 2\sigma_\varepsilon^2$, and $E(y_{t-2}^2) = \frac{\sigma_\eta^2}{(1-\rho)^2} + \frac{\sigma_\varepsilon^2}{1-\rho^2}$, we have that:
$$Var(y_{t-2}\,\Delta\varepsilon_t) = 2\sigma_\varepsilon^2 \left[\frac{\sigma_\eta^2}{(1-\rho)^2} + \frac{\sigma_\varepsilon^2}{1-\rho^2}\right]$$

(B) $E(y_{t-2}\,\Delta y_{t-1})$. Using the MA($\infty$) representation of the stationary process, $y_{t-2} = \frac{\eta}{1-\rho} + \varepsilon_{t-2} + \rho\,\varepsilon_{t-3} + \cdots$ and $\Delta y_{t-1} = \varepsilon_{t-1} + (\rho - 1)\,\varepsilon_{t-2} + \rho\,(\rho - 1)\,\varepsilon_{t-3} + \cdots$, so that:
$$E(y_{t-2}\,\Delta y_{t-1}) = (\rho - 1)\,\sigma_\varepsilon^2 \left[1 + \rho^2 + \rho^4 + \rho^6 + \cdots\right] = \frac{(\rho - 1)\,\sigma_\varepsilon^2}{1 - \rho^2} = -\,\frac{\sigma_\varepsilon^2}{1 + \rho}$$
Then, combining (A) and (B), we have that:
$$V_{AH} = \frac{2\sigma_\varepsilon^2 \left[\dfrac{\sigma_\eta^2}{(1-\rho)^2} + \dfrac{\sigma_\varepsilon^2}{1-\rho^2}\right]}{\dfrac{\sigma_\varepsilon^4}{(1+\rho)^2}} = \frac{2\,(1+\rho)^2}{(1-\rho)^2}\,\frac{\sigma_\eta^2}{\sigma_\varepsilon^2} + \frac{2\,(1+\rho)}{1-\rho}$$
We can see that as $\rho$ approaches 1 the asymptotic variance of the AH estimator goes to infinity. This is due to a weak instruments problem: as $\rho$ approaches 1, the proportion of the variance of $\Delta y_{i,t-1}$ explained by the instrument $y_{i,t-2}$ (i.e., the R-squared in the auxiliary regression of $\Delta y_{i,t-1}$ on $y_{i,t-2}$) goes to zero.
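A simulation sketch of the Anderson-Hsiao estimator and its weak-instrument behavior (illustrative, not from the original solutions; the sample sizes, variances, and stationary initial condition are arbitrary choices):

```python
import numpy as np

def simulate_panel(N, T, rho, sig_eta=1.0, sig_eps=1.0, seed=0):
    """Simulate y_it = rho*y_i,t-1 + eta_i + eps_it from a stationary start."""
    rng = np.random.default_rng(seed)
    eta = rng.normal(0, sig_eta, N)
    y = np.empty((N, T))
    y[:, 0] = eta / (1 - rho) + rng.normal(0, sig_eps / np.sqrt(1 - rho**2), N)
    for t in range(1, T):
        y[:, t] = rho * y[:, t - 1] + eta + rng.normal(0, sig_eps, N)
    return y

def anderson_hsiao(y):
    """AH: in the FD equation, instrument Delta y_i,t-1 with the level y_i,t-2."""
    dy = y[:, 2:] - y[:, 1:-1]        # Delta y_it,    t = 3,...,T
    dy_lag = y[:, 1:-1] - y[:, :-2]   # Delta y_i,t-1
    y_lag2 = y[:, :-2]                # y_i,t-2
    return np.sum(y_lag2 * dy) / np.sum(y_lag2 * dy_lag)

for rho in (0.5, 0.9, 0.99):
    print(rho, anderson_hsiao(simulate_panel(N=2_000, T=6, rho=rho)))
    # the estimates become increasingly erratic as rho approaches 1
```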
Question 3.2 [15 points] Suppose that $|\rho| < 1$ and for every individual $i$ there is a time period $t_i^* < \infty$ such that $y_{i,t_i^*} = \eta_i / (1 - \rho)$. Show that under these conditions we can obtain an IV estimator of $\rho$ that does not suffer from a weak instruments problem as $\rho$ approaches 1.

ANSWER. Suppose that $|\rho| < 1$, and that at some period $t^*$ in the past (which can be individual-specific), the process $\{y_{it}\}$ visited its individual-specific mean, $y_{i,t^*} = \bar{y}_i$. Stationarity implies that $\bar{y}_i = \rho\,\bar{y}_i + \eta_i$, so that
$$E(y_{it} \mid \eta_i) = \bar{y}_i = \frac{\eta_i}{1 - \rho}$$
Under this stationarity assumption, we have that, one period after $t^*$:
$$y_{i,t^*+1} = \rho\,\bar{y}_i + \eta_i + \varepsilon_{i,t^*+1} = \bar{y}_i - (1-\rho)\,\bar{y}_i + \eta_i + \varepsilon_{i,t^*+1} = \bar{y}_i + \varepsilon_{i,t^*+1}$$
since $(1-\rho)\,\bar{y}_i = \eta_i$. And, iterating, for any $t > t^*$ we have that:
$$y_{it} = \bar{y}_i + \sum_{j=0}^{t-t^*-1} \rho^j \, \varepsilon_{i,t-j}$$
Taking first differences:
$$\Delta y_{it} = \varepsilon_{it} + (\rho - 1) \sum_{j=0}^{t-t^*-2} \rho^j \, \varepsilon_{i,t-1-j}$$
And this implies that $\Delta y_{it}$ does NOT depend on $\eta_i$. This property implies that there are valid instruments for the equation in levels,
$$y_{it} = \rho \, y_{i,t-1} + \eta_i + \varepsilon_{it}$$
If $\varepsilon_{it}$ is not serially correlated, then $\Delta y_{i,t-1}$, $\Delta y_{i,t-2}$, ... are not correlated with $(\eta_i + \varepsilon_{it})$. For instance, we can estimate $\rho$ by IV using $\Delta y_{i,t-1}$ as an instrument for $y_{i,t-1}$. This IV estimator is:
$$\hat{\rho} = \frac{\sum_{i=1}^{N} \sum_{t=3}^{T} \Delta y_{i,t-1} \, y_{it}}{\sum_{i=1}^{N} \sum_{t=3}^{T} \Delta y_{i,t-1} \, y_{i,t-1}}$$
This IV estimator does not suffer from a weak instruments problem as $\rho \to 1$.
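A sketch of this levels IV estimator (illustrative, not from the original solutions; note that the stationary initial condition in the `simulate_panel` function of the previous sketch imposes exactly the mean-reversion assumption of this question):

```python
import numpy as np

def level_iv(y):
    """IV in the levels equation y_it = rho*y_i,t-1 + eta_i + eps_it,
    instrumenting y_i,t-1 with Delta y_i,t-1 (valid under mean stationarity)."""
    dy_lag = y[:, 1:-1] - y[:, :-2]    # Delta y_i,t-1 for t = 3,...,T
    return np.sum(dy_lag * y[:, 2:]) / np.sum(dy_lag * y[:, 1:-1])

# Comparison with Anderson-Hsiao as rho -> 1, reusing simulate_panel and
# anderson_hsiao from the earlier sketch:
# for rho in (0.5, 0.9, 0.99):
#     y = simulate_panel(N=2_000, T=6, rho=rho)
#     print(rho, anderson_hsiao(y), level_iv(y))   # level_iv should stay stable
```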
Notice that:
$$\sqrt{N}\,(\hat{\rho} - \rho) = \frac{N^{-1/2} \sum_{i=1}^{N} \Delta y_{i,t-1} \, (\eta_i + \varepsilon_{it})}{N^{-1} \sum_{i=1}^{N} \Delta y_{i,t-1} \, y_{i,t-1}} \ \rightarrow_d \ N(0, V_{BB})$$
where
$$V_{BB} = \frac{Var\left(\Delta y_{t-1}\,(\eta + \varepsilon_t)\right)}{\left[E\left(\Delta y_{t-1}\, y_{t-1}\right)\right]^2}$$
Given that:
$$Var\left(\Delta y_{t-1}\,(\eta_i + \varepsilon_{it})\right) = \frac{2\sigma_\varepsilon^2}{1+\rho}\left(\sigma_\eta^2 + \sigma_\varepsilon^2\right) \qquad \text{and} \qquad E\left(\Delta y_{i,t-1}\, y_{i,t-1}\right) = \frac{\sigma_\varepsilon^2}{1+\rho},$$
we have that:
$$V_{BB} = \frac{\dfrac{2\sigma_\varepsilon^2\left(\sigma_\eta^2 + \sigma_\varepsilon^2\right)}{1+\rho}}{\dfrac{\sigma_\varepsilon^4}{(1+\rho)^2}} = 2\,(1+\rho)\left[1 + \frac{\sigma_\eta^2}{\sigma_\varepsilon^2}\right]$$
In contrast with the AH estimator, the variance of this estimator does not go to infinity as $\rho$ goes to one.

END OF THE EXAM