
Instrumental Variable Estimation Based on Mean Absolute Deviation

Shinichi Sakata
University of Michigan, Department of Economics
240 Lorch Hall, 611 Tappan Street, Ann Arbor, MI, U.S.A.

February 4, 2001

We propose a general estimation principle based on the assumption that instrumental variables (IV) do not explain the error term in a structural equation. The estimators based on this principle are independent of the normalization constraint, unlike standard IV estimators such as the two-stage least squares estimator. Using the new principle, we propose the L1 IV estimator, which is an IV estimation counterpart of the least absolute deviation estimator. We investigate the asymptotic properties of this estimator, and propose a consistent estimator of its asymptotic covariance matrix and a consistent specification test based on the L1 IV estimator. We also discuss identifiability in L1 IV estimation.

Keywords: IV estimation, mean absolute deviation, least absolute deviation estimation

The author is grateful to Professor Curt T. McMullen, who kindly provided a sketch of the proof of Lemma 4.1 in personal communication.

1 Introduction

Let Y be a (1+l)×1 random vector and Z a k×1 random vector (l, k ∈ N, the set of all natural numbers). Some random variables in Y and Z may be the same random variables. Suppose that an economic theory predicts that a certain (unknown) linear combination of the random variables in Y is unrelated to Z. We naturally want to know which linear combination it is. A statistical interpretation of the theory's prediction is that, given a prediction accuracy measure, there exists a (1+l)×1 constant vector γ0 such that the best linear predictor of γ0′Y based on Z is zero. The mean square error (MSE) is often employed as the accuracy measure. In terms of the MSE, a linear predictor π′Z of γ0′Y is best if and only if it satisfies the orthogonality condition that E[(γ0′Y − π′Z)Z] = 0. By taking this orthogonality condition as the moment conditions in the generalized method-of-moments (GMM) estimation approach and imposing the normalization restriction that one of the elements of γ be one, the instrumental variable estimators, such as the two-stage least squares (2SLS) estimator, are formed. The IV estimators are widely used in econometric applications.

Nevertheless, the IV estimators have a well-known problem. An IV estimator can deliver very different estimated equations depending on the normalization constraint when k > l (i.e., when the equation is "overidentified"). It is certainly annoying to see a big difference between the equation estimated by setting the first coefficient to one and that estimated by setting the second coefficient to one.

We here propose an alternative approach, which yields estimators without this dependence on the normalization constraint. Choose a dispersion measure for univariate distributions and take the ratio

Q(γ) = inf_{π ∈ R^k} [dispersion of (γ′Y − π′Z) about the origin] / [dispersion of γ′Y about the origin],

where R^k is the k-dimensional real space. This ratio ranges between zero and one. The higher this ratio is, the less γ′Y is related to Z, because a high ratio means that the performance of the linear predictor based on Z is close to that of the predictor constantly equal to zero. If the theory's claim that γ0′Y is unrelated to Z is correct, we have that Q(γ0) = 1. The parameter value γ0 is thus a maximizer of Q. Even if the claim is not correct, the maximizer of Q can be viewed as representing the linear relationship among the variables in Y that is closest to the theory's claim. (Although the letter Y often denotes the vector containing the endogenous variables in the literature, Y may contain both endogenous and exogenous variables in this paper.)
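The normalization dependence of the conventional IV estimators described above is easy to see numerically. The following is a minimal sketch (hypothetical data-generating process and variable names; numpy assumed available) that simulates an overidentified two-variable system and compares the linear relationship implied by 2SLS under the two normalizations; the implied coefficient ratios generally differ in finite samples.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200

# k = 3 instruments for a single right-hand variable: overidentified (k > l)
Z = rng.normal(size=(n, 3))
u = rng.normal(size=n)                                  # structural error
y2 = Z @ np.array([1.0, 0.5, -0.8]) + 0.6 * u + rng.normal(size=n)
y1 = 2.0 * y2 + u                                       # gamma0 proportional to (1, -2)

def tsls_slope(y, x, Z):
    """2SLS slope coefficient of y on a single regressor x with instruments Z."""
    Pzx = Z @ np.linalg.solve(Z.T @ Z, Z.T @ x)          # first-stage fitted values
    return float(Pzx @ y / (Pzx @ x))

b12 = tsls_slope(y1, y2, Z)   # normalization: coefficient on y1 set to one
b21 = tsls_slope(y2, y1, Z)   # normalization: coefficient on y2 set to one

# Both normalizations target the same line y1 = 2*y2, so 1/b21 "should"
# equal b12; with k > l the two normalizations disagree in any finite sample.
print(b12, 1.0 / b21)
```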

An estimator based on this approach can be defined as the maximizer of the sample analogue of Q. We can form various estimators by taking various dispersion measures in Q. A leading example of a dispersion measure is the standard deviation. The function Q with the standard deviation is

Q_2(γ) ≜ inf_{π ∈ R^k} E[(γ′Y − π′Z)²]^{1/2} / E[(γ′Y)²]^{1/2}, γ ∈ R^{1+l}\{0}, (1)

provided that both Y and Z have finite second moments, where ≜ means "is defined to be". By using the standard results on the best linear predictor in terms of the mean square error (MSE), we can show that

inf_{π ∈ R^k} E[(γ′Y − π′Z)²] = γ′(M_YY − M_YZ M_ZZ^{-1} M_ZY)γ,

where M_YY ≜ E[YY′], M_YZ ≜ E[YZ′], M_ZZ ≜ E[ZZ′], and M_ZY ≜ M_YZ′. We thus have that

Q_2(γ)² = 1 − (γ′ M_YZ M_ZZ^{-1} M_ZY γ) / (γ′ M_YY γ), γ ∈ R^{1+l}\{0}.

The eigenvector corresponding to the minimum eigenvalue of M_YZ M_ZZ^{-1} M_ZY in the metric of M_YY maximizes Q_2, provided that M_YY is nonsingular. (The eigenvalues of M_YZ M_ZZ^{-1} M_ZY in the metric of M_YY are the values of λ that satisfy det{M_YZ M_ZZ^{-1} M_ZY − λ M_YY} = 0.)

The function Q_2 can be viewed as a utility function representing our preferences over the values of γ, which represent the linear relationships among the random variables in Y. Because Q_2 is homogeneous of degree zero, γ is judged to be as good as cγ for each γ ∈ R^{1+l} and each nonzero real number c. This feature of Q_2 is consistent with the view that if a structural equation is obtained by multiplying another structural equation by a scalar constant, the two equations represent the same relationship. To avoid having multiple values of γ that represent the same linear relationship, it is convenient to normalize γ. A commonly employed normalization is to set an element of γ to one.

Let (Y_1, Z_1), …, (Y_n, Z_n) be a random sample drawn from the distribution of (Y, Z). Then the sample analogue of Q_2(γ) is Q̂_2n defined by

Q̂_2n(γ)² ≜ 1 − (γ′ M̂ⁿ_YZ (M̂ⁿ_ZZ)⁺ M̂ⁿ_ZY γ) / (γ′ M̂ⁿ_YY γ),

where M̂ⁿ_YZ ≜ n^{-1} Σ_{t=1}^n Y_t Z_t′, M̂ⁿ_ZY ≜ (M̂ⁿ_YZ)′, M̂ⁿ_ZZ ≜ n^{-1} Σ_{t=1}^n Z_t Z_t′, M̂ⁿ_YY ≜ n^{-1} Σ_{t=1}^n Y_t Y_t′,

and A⁺ denotes the Moore-Penrose (MP) generalized inverse of a matrix A. The estimator based on Q_2 is defined to be the maximizer of Q̂_2n under the normalization restriction that sets one of the elements of γ to one. Interestingly, this estimator is the same as the IV estimator obtained by the canonical correlation approach in Sargan (1958). As Sargan shows, this estimator is also the limited information maximum likelihood (LIML) estimator employing the joint normal distribution for the reduced-form disturbances when all explanatory variables in the structural equation are included in the instrument vector Z. Further, it can be viewed as an estimator based on the least variance ratio principle (see Kmenta (1986, p. 690) and Schmidt (1976, pp. 170–177)), and it coincides with a special case of the GMM estimator with continuously updated weights considered in Hansen, Heaton, and Yaron (1996).

The homogeneity of Q is not limited to the implementation that uses the standard deviation as the dispersion measure. Given any dispersion measure, the dispersion of cU is equal to |c| times the dispersion of U for each random variable U and real constant c. It follows that Q is always homogeneous of degree zero. A reasonably chosen sample analogue of Q inherits this homogeneity property. The estimators generated by our approach are thus normalization-invariant in the sense that two different normalization restrictions result in the same estimated linear relationship. This makes our approach attractive in practice.

Another feature of our approach is that the estimator based on the proposed approach is meaningful even under misspecification. The maximizer γ* of Q represents the relationship among the variables in Y that is least related to Z in terms of Q. In this sense, γ* can be regarded as the most favorable parameter value for the theory's claim. It is thus sensible to estimate γ* even under misspecification.
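The maximizer of Q̂_2n described above can be computed directly as a generalized eigenvector. A minimal numerical sketch (hypothetical data-generating process; numpy and scipy assumed available), using sample moment matrices in place of population ones:

```python
import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(1)
n = 500
Z = rng.normal(size=(n, 2))
u = rng.normal(size=n)
y2 = Z @ np.array([1.0, -1.0]) + 0.5 * u + rng.normal(size=n)
y1 = 2.0 * y2 + u                       # gamma0 proportional to (1, -2)
Y = np.column_stack([y1, y2])           # (1 + l) columns, here l = 1

M_YY = Y.T @ Y / n
M_YZ = Y.T @ Z / n
M_ZZ = Z.T @ Z / n
A = M_YZ @ np.linalg.solve(M_ZZ, M_YZ.T)    # M_YZ M_ZZ^{-1} M_ZY

# Q2(gamma)^2 = 1 - (gamma' A gamma)/(gamma' M_YY gamma) is maximized by the
# eigenvector of A in the metric of M_YY with the smallest generalized eigenvalue.
eigvals, eigvecs = eigh(A, M_YY)            # eigenvalues in ascending order
gamma = eigvecs[:, 0]
gamma = gamma / gamma[0]                    # normalize the first coefficient to one
print(gamma)                                # close to (1, -2)
```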
The estimator generated by our strategy would be consistent for this γ* under mild conditions.

Given these attractive features of the proposed approach, a natural question arises: what dispersion measure should we use in Q? In choosing a dispersion measure, one may want to take into account the outlier-robustness of the resulting estimator. The sample standard deviation is known to be highly sensitive to outlying values. This means that the estimator generated by taking the standard deviation in Q can be highly sensitive to outlying observations on Y and Z. If this property could cause problems in a project, one can instead use the mean absolute deviation (MAD), which is known to be less sensitive to outlying observations than the standard deviation. We call the resulting estimator the L1 IV estimator, because the MAD about the origin of a random variable is the same as its L1 norm. In the rest of this paper, we

focus on this estimator. If one prefers more outlier-robustness in our estimation strategy, one can employ a high-breakdown-point dispersion measure, such as the S-dispersion measure of Rousseeuw and Yohai (1984) or the τ-dispersion measure of Yohai and Zamar (1988), in place of the MAD, at possibly higher computational cost.

Attempts to robustify the IV estimation method have been made in different ways in the past. Krasker and Welsch (1985) propose the use of reweighted IVs to reduce the effect of outliers in IV estimation. Krasker (1986) instead considers an estimator similar to the conventional 2SLS that uses a bounded-influence estimator in each stage instead of the OLS estimator. Also, Glahe and Hunt (1970), Amemiya (1982), and Powell (1983) investigate two-stage LAD estimators that use the LAD estimator instead of the OLS estimator in both stages or in one stage of 2SLS estimation. The estimators in these attempts inherit the property of the conventional IV estimator that the estimator is not normalization-invariant.

In the context of estimating one or more structural equations in a simultaneous equation system, one may have a reduced-form system. The structural equations can be viewed as imposing constraints on the parameters of the reduced-form system. A natural way to form a robust estimator in this context is to incorporate these constraints in a robust estimation method for the reduced-form system. Prucha and Kelejian's (1984) full information maximum likelihood (FIML) estimator employing the generalized t-distribution for the reduced-form disturbances is an example of this. Krishnakumar and Ronchetti (1997) also take a similar approach, using an estimator of the reduced-form system with a bounded influence function. Because the parameter space for the reduced-form system under the constraints given by the structural equations is invariant to normalization of the structural equations, these estimators are normalization-invariant. A possible drawback of this approach is that we have to specify the reduced-form system even if we are interested in only a single structural equation. Unlike the previously developed estimators mentioned above, the L1 IV estimator is invariant to normalization and requires no reduced-form specification.

In what follows, we analyze the properties of the L1 IV estimator and consider some related topics. After giving a formal definition of the L1 IV estimator and establishing its existence (Section 2), we investigate consistency (Section 3), identifiability (Section 4), and asymptotic normality (Section 5) in L1 IV estimation. We also propose a consistent estimator of the asymptotic covariance matrix of the L1 IV estimator (Section 6), and a consistent specification test based on the L1 IV estimator (Section 7). We conclude the paper with a summary and remarks (Section 8). All formal assumptions are collected in Appendix A, and the proofs of theorems and lemmas are

given in Appendix B.

2 Definition

We assume that the data are a realization of an i.i.d. sequence of q×1 random vectors X_t, t ∈ N, drawn from the distribution of a random vector X (Assumption A.1). X_t is partitioned as X_t ≜ (Y_t′, Z_t′)′, where Y_t is (1+l)×1 and Z_t is k×1, t = 1, 2, … (l, k ∈ N, q = 1 + l + k). X is similarly partitioned as X ≜ (Y′, Z′)′. To ensure that each linear combination of the elements of X has a finite absolute moment, we assume that each element of X has a finite absolute moment (Assumption A.2).

If γ′Y = 0 almost surely (a.s.) for some γ ∈ R^{1+l}\{0}, this value of γ would represent the linear relationship we are trying to estimate, and we could find this exact value of γ from the observed values of Y a.s. This, however, seldom occurs in practice. We assume that the random variables in Y are linearly independent in the sense that for any γ ∈ R^{1+l}\{0}, P[γ′Y = 0] < 1, and that the random variables in Z are linearly independent in the same sense (Assumption A.3).

Let ‖·‖_1 denote the L1 norm (i.e., ‖U‖_1 = E[|U|] for each random variable U). Then the function Q based on the MAD is given by

Q(γ) ≜ inf_{π ∈ R^k} ‖γ′Y − π′Z‖_1 / ‖γ′Y‖_1, γ ∈ R^{1+l}\{0}. (2)

By using basic results about normed linear spaces, we can show:

Lemma 2.1: Suppose that Assumptions A.1 and A.2 hold. Then:
(a) For each γ ∈ R^{1+l} there exists π_γ ∈ R^k such that ‖γ′Y − π_γ′Z‖_1 = inf_{π ∈ R^k} ‖γ′Y − π′Z‖_1.
(b) The mapping γ ↦ inf_{π ∈ R^k} ‖γ′Y − π′Z‖_1 from R^{1+l} to R is continuous.
(c) Q is continuous on R^{1+l}\{0}.

The function Q is homogeneous of degree zero. It follows that

sup_{γ ∈ R^{1+l}\{0}} Q(γ) = sup_{γ ∈ ∂B^{1+l}} Q(γ),

where B^{1+l} is the unit ball centered at the origin in the (1+l)-dimensional Euclidean space and ∂B^{1+l} is its boundary. Because ∂B^{1+l} is compact, it follows by the theorem of the maximum that Q attains its supremum on ∂B^{1+l}. Thus:

Lemma 2.2: Suppose that Assumptions A.1–A.3 hold. Then there exists γ* ∈ R^{1+l}\{0} such that Q(γ*) = sup_{γ ∈ R^{1+l}\{0}} Q(γ).

Our estimation problem is to reveal the γ* of Lemma 2.2. We define our estimator as the maximizer of the sample analogue of Q under a normalization restriction. With no loss of generality, we set the first element of γ to one.

Definition (L1 IV estimator): For each sample size n ∈ N define Q̂_n : (R^{1+l}\{0}) × Ω → R by

Q̂_n(γ, ω) ≜ [ inf_{π ∈ R^k} n^{-1} Σ_{t=1}^n |γ′Y_t(ω) − π′Z_t(ω)| ] / [ n^{-1} Σ_{t=1}^n |γ′Y_t(ω)| ] if (Y_1(ω), …, Y_n(ω)) is of full row rank, and Q̂_n(γ, ω) ≜ 1 otherwise,

for each (γ, ω) ∈ (R^{1+l}\{0}) × Ω, where |·| denotes the Euclidean norm. Let B be a nonempty subset of R^l. If there exists an l×1 random vector β̂_n such that

Q̂_n((1, −β̂_n′)′, ·) = sup_{β ∈ B} Q̂_n((1, −β′)′, ·) a.s.-P,

we call β̂_n the L1 IV estimator associated with parameter space B, or simply the L1 IV estimator.

The (a.s.) continuity of the objective function is crucial in ensuring the existence of this estimator. When (Y_1, …, Y_n) is not of full row rank, the objective function would be discontinuous at some points in the parameter space, due to division by zero, without the "otherwise" part of the definition of Q̂_n. The event that (Y_1, …, Y_n) is not of full row rank is usually rare if the sample size is reasonably large. The "otherwise" part thus has no impact in practice, while it makes our theoretical analysis simpler. To establish the existence of the L1 IV estimator, we use the next lemma.

Lemma 2.3:
(a) For each n ∈ N, the mapping

(γ, xⁿ) ↦ inf_{π ∈ R^k} n^{-1} Σ_{t=1}^n |γ′y_t − π′z_t|

from R^{1+l} × R^{nq} to R is continuous, where xⁿ is partitioned as xⁿ ≜ (x_1′, …, x_n′)′, x_t is further partitioned as x_t ≜ (y_t′, z_t′)′ for t ∈ N, and y_t ∈ R^{1+l} and z_t ∈ R^k for each t ∈ N.
(b) Given Assumption A.1, Q̂_n is measurable-(B^{1+l} ⊗ F)/B¹ for each n ∈ N, and Q̂_n(·, ω) is continuous on R^{1+l}\{0} for each ω ∈ Ω, where B^m denotes the Borel σ-field of R^m and ⊗ denotes the operator for the product σ-field.

It thus follows that (ω, β) ↦ Q̂_n((1, −β′)′, ω) : Ω × B → R is a measurable function such that for any ω ∈ Ω, β ↦ Q̂_n((1, −β′)′, ω) : B → R is continuous. By assuming that B is compact (Assumption A.4), we can apply a standard result on the existence of extremum estimators, such as Gallant and White (1988, Theorem 2.2), to establish the existence of the L1 IV estimator.

Theorem 2.4: Suppose that Assumptions A.1 and A.4 hold. Then there exists an L1 IV estimator β̂_n associated with parameter space B.

3 Consistency

In this section, we establish the consistency of the L1 IV estimator β̂_n in the broad sense that β̂_n tends to be close to the set of maximizers of the function β ↦ Q((1, −β′)′) : B → R when the sample size is large. The key step is the proof of the convergence of {Q̂_n((1, −β′)′, ·)}_{n∈N} to Q((1, −β′)′) uniformly in β ∈ B. We consider the behavior of the numerator and denominator of Q̂_n separately. First we establish the uniform convergence of the average of |γ′Y_t − π′Z_t|.

Lemma 3.1: Suppose that Assumptions A.1 and A.2 hold. Then:
(a) sup_{(γ,π) ∈ (R^{1+l}×R^k)\{(0,0)}} | n^{-1} Σ_{t=1}^n |γ′Y_t − π′Z_t| − ‖γ′Y − π′Z‖_1 | / (|γ| + |π|) → 0 as n → ∞ a.s.-P.
(b) For any nonempty, bounded subset Γ of R^{1+l} × R^k, sup_{(γ,π) ∈ Γ} | n^{-1} Σ_{t=1}^n |γ′Y_t − π′Z_t| − ‖γ′Y − π′Z‖_1 | → 0 as n → ∞ a.s.-P.

We can next show the uniform convergence of the numerator of Q̂_n.

Lemma 3.2: Suppose that Assumptions A.1 and A.2 hold. Then:
(a) sup_{γ ∈ R^{1+l}\{0}} | inf_{π ∈ R^k} n^{-1} Σ_{t=1}^n |γ′Y_t − π′Z_t| − inf_{π ∈ R^k} ‖γ′Y − π′Z‖_1 | / |γ| → 0 as n → ∞ a.s.-P.
(b) For any nonempty, bounded subset A of R^{1+l}\{0}, sup_{γ ∈ A} | inf_{π ∈ R^k} n^{-1} Σ_{t=1}^n |γ′Y_t − π′Z_t| − inf_{π ∈ R^k} ‖γ′Y − π′Z‖_1 | → 0 as n → ∞ a.s.-P.

We now combine the results of Lemmas 3.1(a) and 3.2(a) to obtain the uniform convergence of Q̂_n.

Lemma 3.3: Suppose that Assumptions A.1–A.3 hold. Then

sup_{γ ∈ R^{1+l}\{0}} |Q̂_n(γ, ·) − Q(γ)| → 0 as n → ∞ a.s.-P.

It immediately follows from Lemma 3.3 that

sup_{β ∈ B} |Q̂_n((1, −β′)′, ·) − Q((1, −β′)′)| → 0 as n → ∞ a.s.-P. (3)

Thus, Lemma 4.2 of Pötscher and Prucha (1991) applies to our estimation problem.

Theorem 3.4: Suppose that Assumptions A.1–A.4 hold. Then the L1 IV estimator {β̂_n}_{n∈N} is strongly consistent for B* in the sense that d_2(β̂_n, B*) → 0 as n → ∞ a.s.-P, where

B* ≜ { β* ∈ B : Q((1, −β*′)′) = sup_{β ∈ B} Q((1, −β′)′) },

and d_2 is the Euclidean metric, so that d_2(β, B*) = inf_{β* ∈ B*} |β − β*|.
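To make the definition of the L1 IV estimator and the consistency result concrete, here is a minimal computational sketch for the case l = 1 (hypothetical data-generating process and names; numpy and scipy assumed available). The inner infimum over π is a least absolute deviation fit of γ′Y on Z, solved here as a linear program; the outer maximization over β uses a crude grid search in place of a proper optimizer, which suffices for illustration:

```python
import numpy as np
from scipy.optimize import linprog

def lad_mad(y, Z):
    """inf over pi of mean |y_t - pi'Z_t|, via the standard LP:
    min 1'(u + v)/n  s.t.  Z pi + u - v = y,  u, v >= 0."""
    n, k = Z.shape
    c = np.concatenate([np.zeros(k), np.ones(2 * n) / n])
    A_eq = np.hstack([Z, np.eye(n), -np.eye(n)])
    bounds = [(None, None)] * k + [(0, None)] * (2 * n)
    res = linprog(c, A_eq=A_eq, b_eq=y, bounds=bounds, method="highs")
    return res.fun

def Q_hat(beta, Y, Z):
    """Sample analogue of Q at gamma = (1, -beta')'."""
    g = Y[:, 0] - Y[:, 1:] @ np.atleast_1d(beta)
    return lad_mad(g, Z) / np.mean(np.abs(g))

rng = np.random.default_rng(2)
n = 300
Z = rng.normal(size=(n, 2))
u = rng.standard_t(df=3, size=n)        # heavy-tailed structural error
y2 = Z @ np.array([1.0, -1.0]) + 0.5 * u + rng.normal(size=n)
y1 = 2.0 * y2 + u                       # true beta* = 2
Y = np.column_stack([y1, y2])

grid = np.linspace(0.0, 4.0, 81)
beta_hat = grid[np.argmax([Q_hat(b, Y, Z) for b in grid])]
print(beta_hat)
```

Since π = 0 is feasible in the inner problem, Q_hat is bounded by one, mirroring the population ratio.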

4 Identifiable Uniqueness

In establishing the asymptotic normality of the L1 IV estimator in the next section, we assume that B* is a singleton, i.e., that the maximizer of the function β ↦ Q((1, −β′)′) : B → R is unique. In this section, we examine when this uniqueness holds. The uniqueness of the maximizer is sometimes called identifiability in the literature. Nevertheless, the consistency of β̂_n established in Section 3 means that the maximizers of Q((1, −β′)′) on B are identifiable in a sense. So, we here use the term "identifiable uniqueness," or simply uniqueness, instead of identifiability.

In LIML and 2SLS estimation, the order condition that l ≤ k is known to be necessary for uniqueness. Is a similar result available for the L1 IV estimator? To answer this question, we rewrite Q as

Q(γ) = inf_{π ∈ R^k} ‖γ′Y − π′Z‖_1 / ‖γ′Y‖_1 = inf_{π ∈ R^k} ‖ γ′Y/‖γ′Y‖_1 − π′Z ‖_1, γ ∈ R^{1+l}\{0}.

From this expression, we see that Q(γ) is the distance between the unit-length vector γ′Y/‖γ′Y‖_1 and the space spanned by the elements of Z. A maximizer of Q on R^{1+l} thus gives the point on the unit sphere in the linear subspace spanned by Y that is furthest from the linear subspace spanned by Z. The following lemma gives a useful insight into this geometry.

Lemma 4.1: Suppose that (E, ‖·‖) is a finite-dimensional smooth normed linear space (see Beauzamy (1985, Part III, Chapter 1) for the smoothness of normed spaces) and that M and N are linear subspaces of E with dimensions m and n, respectively, such that m > n > 0. Also let S(M) be the unit sphere in M. Then there exists a point x_0 in S(M) such that inf_{ζ ∈ N} ‖x_0 − ζ‖ = ‖x_0‖, so that the distance between x_0 and N is one.

Suppose that

γ′Y − π′Z ≠ 0 a.s.-P for each (γ, π) ∈ (R^{1+l} × R^k)\{0}. (4)

Under Assumption A.2, this implies that ‖γ′Y − π′Z‖_1 is differentiable with respect to γ and π at each (γ, π) ≠ 0. Under this differentiability, the space spanned by Y and Z with norm ‖·‖_1 is a smooth normed space. An implication of Lemma 4.1 is that if (1+l) > k, there exists γ_0 ∈ R^{1+l}\{0} such that the

distance between γ_0′Y/‖γ_0′Y‖_1 and the linear subspace spanned by Z is one. That is, if l ≥ k, there exists γ_0 ∈ R^{1+l}\{0} such that Q(γ_0) = 1. Further, when l > k, Q attains one even if we drop a variable from Y, because the dimension of the linear subspace spanned by the remaining variables in Y is l, which still exceeds k. Note that which variable is dropped from Y does not matter for this result. So, there are nonzero (1+l)×1 vectors γ_0^(1), …, γ_0^(1+l) such that Q(γ_0^(j)) = 1 and the j-th element of γ_0^(j) is zero, j = 1, 2, …, 1+l. Because γ_0^(1) is nonzero, at least one of its second through last elements is nonzero. Let the i-th element of γ_0^(1) be nonzero (i > 1). Then γ_0^(1) and γ_0^(i) are linearly independent. Thus, Q attains one on R^{1+l}\{0} if l ≥ k, and there are two or more linearly independent vectors in R^{1+l} that yield one for Q if l > k.

Condition (4) is of course quite restrictive. It is violated if Y shares one or more variables with Z. It also fails if a variable contained in Y or Z has a discrete distribution. Fortunately, the same result holds without condition (4), as is shown in the next theorem.

Theorem 4.2: Suppose that Assumptions A.1–A.3 hold. Then:
(a) If l ≥ k, then there exists γ_0 in R^{1+l}\{0} such that Q(γ_0) = 1.
(b) If further l > k, then there exist two linearly independent vectors γ_0^(1) and γ_0^(2) in R^{1+l}\{0} that satisfy Q(γ_0^(1)) = Q(γ_0^(2)) = 1.

The uniqueness of the optimum parameter value thus requires that the number of instrumental variables be no less than the number of dependent variables minus one, as in LIML and 2SLS estimation. We now assume, for simplicity, that the elements of Z are linearly independent and that l ≤ k (Assumption A.5), to avoid violation of this "order condition". As in LIML and 2SLS estimation, the condition that l ≤ k does not imply that the optimum parameter value is unique up to scale.

It seems difficult to give an intuitively appealing sufficient condition for uniqueness without imposing further structure on X. The next example demonstrates a situation in which uniqueness holds.

Example 4.1: In addition to Assumptions A.1–A.3 and A.5, suppose that there exists a (1+l)×k constant matrix Π_0 that satisfies the following conditions:
(a) Letting ε ≜ Y − Π_0 Z, for each γ ∈ R^{1+l}\{0}, the conditional median of γ′ε given Z is zero, and the conditional distribution function of γ′ε is strictly increasing at the origin.

(b) The rank of Π_0 is l, so that the dimension of the left null space of Π_0 is one.

Then Q(γ) = 1 if and only if γ is a nonzero vector that belongs to the left null space of Π_0; so, the maximizer of Q on R^{1+l}\{0} is unique up to scalar multiples.

To verify this fact, let γ be an arbitrary vector in R^{1+l}. An implication of condition (a) is that the constant zero minimizes the mean absolute prediction error (MAPE) in predicting γ′ε based on Z, and any predictor different from it with positive probability yields a larger MAPE. Because

γ′Y = γ′(Π_0 Z + ε) = γ′Π_0 Z + γ′ε,

it follows that γ′Π_0 Z is the best-MAPE predictor of γ′Y, and any predictor that is different from γ′Π_0 Z with nonzero probability yields a larger MAPE. We thus have that for any γ ∈ R^{1+l}\{0},

‖γ′Y‖_1 ≥ ‖γ′Y − γ′Π_0 Z‖_1,

where the equality holds if and only if γ′Π_0 Z = 0 a.s.-P. By Assumption A.3, it also holds that γ′Π_0 Z = 0 a.s.-P if and only if γ′Π_0 = 0. The latter condition holds if and only if γ belongs to the left null space of Π_0.

5 Asymptotic Normality

We now investigate the asymptotic distribution of the L1 IV estimator, assuming that Q has a unique maximizer (i.e., B* = {β*}) interior to B (Assumption A.6(a)). By Theorem 4.2, this requires that k ≥ l (Assumption A.5). In investigating the asymptotic distribution of an extremum estimator, one typically uses the stochastic mean value theorem (Jennrich (1969), Lemma 3) to approximate the appropriately standardized estimator by a linear transformation of a random vector that is asymptotically normally distributed. Nevertheless, the absolute value function in the expression for Q̂_n((1, −β′)′, ·) creates non-differentiable points on the parameter space B for typical realizations. This feature makes it impossible to use the stochastic mean value theorem in our problem. An approach alternative to the linearization of the first-order condition is to approximate the objective function by a quadratic function of the parameters in a neighborhood of β* in an appropriate sense. Huber (1967), Pollard (1985), and Pötscher and Prucha (1997, Chapter 9) describe this approach. In the L1 IV estimation, the objective function is Q̂_n((1, −β′)′, ·), or equivalently, log Q̂_n((1, −β′)′, ·). We approximate

log Q̂_n((1, −β′)′, ·) by a quadratic function in our investigation. Note that Q̂_n is not an average of random functions, and that the numerator of Q̂_n is the infimum of an average of random functions. We need to take several steps to reach our goal because of these features of our problem.

In addition to Assumptions A.1–A.6(a), we impose a few mild conditions. For convenience, let V denote the first element of Y and W the second through last elements of Y. Also let ∇ and ∇′ denote the gradient operator and the Jacobian operator, respectively, and let ∇² ≜ ∇∇′.

First, we require that R(β*, ·) have a unique minimizer, denoted π*, where

R(β, π) ≜ ‖V − W′β − Z′π‖_1, (β, π) ∈ R^l × R^k. (5)

Because ‖·‖_1 is convex, R is a convex function on R^l × R^k; so R(β*, ·) : R^k → R is also convex. Our requirement is thus that R(β*, ·) be strictly convex at π* (Assumption A.6(b)). The next lemma gives a convenient result for investigating the behavior of Q around β*.

Lemma 5.1: Suppose that Assumptions A.1–A.5 hold. Then there exists a compact subset Γ̃ of R^k such that π* is interior to Γ̃,

inf_{π ∈ Γ̃} R(β, π) = inf_{π ∈ R^k} R(β, π), β ∈ B, (6)

and

sup_{β ∈ B} | inf_{π ∈ Γ̃} R̂_n(β, π, ·) − inf_{π ∈ R^k} R̂_n(β, π, ·) | = 0 for a.a. n ∈ N a.s.-P, (7)

where

R̂_n(β, π, ω) ≜ n^{-1} Σ_{t=1}^n |V_t(ω) − W_t(ω)′β − Z_t(ω)′π|, (β, π, ω) ∈ R^l × R^k × Ω, n ∈ N,

and a.a. stands for "almost all," so that "for a.a. n ∈ N" means "for all n ∈ N with at most finitely many exceptions."

It follows from this lemma that for each β ∈ B, there exists π_β ∈ Γ̃ such that R(β, π_β) = inf_{π ∈ R^k} R(β, π). Further, we have:

Lemma 5.2: Suppose that Assumptions A.1–A.6 hold. Let {β_j}_{j∈N} be an arbitrary sequence in B converging to β*, Γ̃ a compact subset of R^k as in Lemma 5.1, and {π_j}_{j∈N} a sequence in Γ̃ such that R(β_j, π_j) = inf_{π ∈ R^k} R(β_j, π), j ∈ N. Then {π_j} converges to π*.

This result is used below to establish the quadratic approximation to inf_{π ∈ R^k} R(β, π). Second, we require that R be twice continuously differentiable on an open neighborhood of (β*′, π*′)′ and on an open neighborhood of (β*′, 0′)′ ∈ R^l × R^k (Assumption A.7(a)). Despite the fact that R̂_n(·, ·, ω) is non-differentiable at some points for typical realizations, the differentiability requirement on R is not restrictive. As is easily verified, a sufficient condition for this requirement is that V have a continuous conditional density function f(·|W, Z) given W and Z (with respect to the Lebesgue measure), and that the random variables in W and Z have finite second moments. Under this condition, we have that for each (β′, π′)′ ∈ R^l × R^k,

∇_β R(β, π) = E[(2F(W′β + Z′π | W, Z) − 1) W],
∇_π R(β, π) = E[(2F(W′β + Z′π | W, Z) − 1) Z],
∇²_ββ R(β, π) = 2 E[f(W′β + Z′π | W, Z) W W′],
∇²_βπ R(β, π) = 2 E[f(W′β + Z′π | W, Z) W Z′], (8)

and

∇²_ππ R(β, π) = 2 E[f(W′β + Z′π | W, Z) Z Z′], (9)

where F(·|W, Z) is the conditional distribution function of V given W and Z. Given the differentiability of R, let

l_β ≜ ∇_β R(β*, π*), l_π ≜ ∇_π R(β*, π*), J_ββ ≜ ∇²_ββ R(β*, π*), J_βπ ≜ ∇²_βπ R(β*, π*), J_πβ ≜ J_βπ′, J_ππ ≜ ∇²_ππ R(β*, π*), l_0 ≜ ∇_β R(β*, 0), J_0 ≜ ∇²_ββ R(β*, 0). (10)

Because the differentiable function R(β*, ·) : R^k → R is minimized at π*, l_π is a zero vector. Using this fact with the second-order Taylor series expansion, we obtain that

R(β, π) = R(β*, π*) + l_β′(β − β*) + (1/2)(β − β*)′J_ββ(β − β*) + (β − β*)′J_βπ(π − π*) + (1/2)(π − π*)′J_ππ(π − π*) + o(|β − β*|² + |π − π*|²) as (β, π) → (β*, π*) (11)

and that

R(β, 0) = R(β*, 0) + l_0′(β − β*) + (1/2)(β − β*)′J_0(β − β*) + o(|β − β*|²) as β → β*. (12)

By imposing the condition that J_ππ is positive definite (Assumption A.7(b)), we obtain:

Lemma 5.3: Suppose that Assumptions A.1–A.6 and A.7(a)(b) hold. Then:

inf_{π ∈ R^k} R(β, π) = R(β*, π*) + l_β′(β − β*) + (1/2)(β − β*)′(J_ββ − J_βπ J_ππ^{-1} J_πβ)(β − β*) + o(|β − β*|²) as β → β*.

We now combine Lemma 5.3 with (12) to obtain the quadratic approximation to Q.

Lemma 5.4: Suppose that Assumptions A.1–A.6 and A.7(a)(b) hold. Then:
(a) l_β / R(β*, π*) − l_0 / R(β*, 0) = 0.
(b) log Q((1, −β′)′) = log [ R(β*, π*) / R(β*, 0) ] − (1/2)(β − β*)′K(β − β*) + o(|β − β*|²) as β → β*, where

K ≜ −( (J_ββ − J_βπ J_ππ^{-1} J_πβ) / R(β*, π*) − J_0 / R(β*, 0) − l_β l_β′ / R(β*, π*)² + l_0 l_0′ / R(β*, 0)² ). (13)

Because Q is uniquely maximized at β*, K is positive semi-definite. We here assume that K is positive definite (Assumption A.7(c)), so that the second-order term does not vanish in any direction.

The assumptions made so far guarantee that Q is well approximated by a quadratic function in a neighborhood of β*. We next impose mild conditions to ensure that we can capture the random error in approximating Q by Q̂_n. This requirement is important, because the random error is the source of the randomness of the estimator. Our new requirements are that each element of X have a finite second moment (Assumption A.8), that

P[V − W′β* − Z′π* = 0] = 0, (14)

and that

P[V − W′β* = 0] = 0 (15)

(Assumption A.9). In investigations of asymptotic normality, it is often assumed that the fourth moment of each random variable in X is finite. Our moment condition is substantially weaker. Also note that we do NOT require that (14) and (15) hold for arbitrary values of β and π. If V has a continuous conditional density given W and Z, (14) and (15) obviously hold. Even if V − W′β − Z′π = 0 with positive probability for some β and π, (14) and (15) may still hold.

Lemma 5.5: Suppose that Assumptions A.1–A.9 hold, and let θ* ≜ (β*′, π*′)′ and Θ ≜ B × Γ̃.

(a) Define r : R^q × Θ → R by r(x, θ) ≜ 0 if θ = θ*, and otherwise

r(x, θ) ≜ |θ − θ*|^{-1} ( |v − w′β − z′π| − |v − w′β* − z′π*| + sgn(v − w′β* − z′π*)(w′(β − β*) + z′(π − π*)) ),

where x = (v, w′, z′)′ ∈ R × R^l × R^k, θ = (β′, π′)′ ∈ R^l × R^k, and

sgn(a) = 1 if a > 0; 0 if a = 0; −1 if a < 0.

Also, for each n ∈ N define ζ_n : R^l × R^k × Ω → R by

ζ_n(β, π, ω) ≜ n^{-1} Σ_{t=1}^n { r(X_t(ω), β, π) − E[r(X, β, π)] }, (β, π, ω) ∈ R^l × R^k × Ω, n ∈ N.

Then for each sequence of Euclidean balls {B_n}_{n∈N} in Θ that shrinks down to (β*, π*),

sup_{(β,π) ∈ B_n} |n^{1/2} ζ_n(β, π, ·)| → 0 as n → ∞ prob-P. (16)

(b) Define r_0 : R^q × B → R by r_0(x, β) ≜ 0 if β = β*, and otherwise

r_0(x, β) ≜ |β − β*|^{-1} ( |v − w′β| − |v − w′β*| + sgn(v − w′β*) w′(β − β*) ),

where v and w are the components of x as in part (a). Also, for each n ∈ N define ζ⁰_n : R^l × Ω → R by

ζ⁰_n(β, ω) ≜ n^{-1} Σ_{t=1}^n { r_0(X_t(ω), β) − E[r_0(X, β)] }, (β, ω) ∈ R^l × Ω, n ∈ N.

Then for each sequence of Euclidean balls {B̃_n}_{n∈N} in B that shrinks down to β*,

sup_{β ∈ B̃_n} |n^{1/2} ζ⁰_n(β, ·)| → 0 as n → ∞ prob-P. (17)

An implication of Assumption A.9 is:

Lemma 5.6: Suppose that Assumptions A.1, A.2, and A.9 hold. Then

l_β = E[−sgn(V − W′β* − Z′π*) W], l_π = E[−sgn(V − W′β* − Z′π*) Z], and l_0 = E[−sgn(V − W′β*) W].

Define

ψ_β(x) ≜ −sgn(v − w′β* − z′π*) w − l_β, x = (v, w′, z′)′ ∈ R × R^l × R^k, (18)

ψ_π(x) ≜ −sgn(v − w′β* − z′π*) z − l_π (19)
= −sgn(v − w′β* − z′π*) z, x = (v, w′, z′)′ ∈ R × R^l × R^k, (20)

and

ψ_0(x) ≜ −sgn(v − w′β*) w − l_0, x = (v, w′, z′)′ ∈ R × R^l × R^k. (21)

Then ψ_β(X), ψ_π(X), and ψ_0(X) all have zero mean vectors. By the definition of {ζ_n}_{n∈N} in Lemma 5.5, we have that for each n ∈ N,

R̂_n(β, π, ·) − R̂_n(β*, π*, ·) = {R(β, π) − R(β*, π*)} + n^{-1} Σ_{t=1}^n ψ_β(X_t)′(β − β*) + n^{-1} Σ_{t=1}^n ψ_π(X_t)′(π − π*) + n^{-1/2} |θ − θ*| n^{1/2} ζ_n(β, π, ·). (22)
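The population scores in Lemma 5.6 have obvious sample analogues, which are the kind of building block used later for the covariance matrix estimator of Section 6. A minimal sketch (hypothetical names and data-generating process; numpy assumed), evaluating the sample scores at candidate values b and g for β* and π*; at the true values and with a conditionally symmetric error, all three sample means should be close to zero:

```python
import numpy as np

def sample_scores(V, W, Z, b, g):
    """Sample analogues of l_beta, l_pi, l_0 from Lemma 5.6, e.g.
    l_beta = E[-sgn(V - W'beta - Z'pi) W], estimated by sample means."""
    s = np.sign(V - W @ b - Z @ g)       # sgn(V - W'beta - Z'pi), elementwise
    s0 = np.sign(V - W @ b)              # sgn(V - W'beta)
    l_beta = -(s[:, None] * W).mean(axis=0)
    l_pi = -(s[:, None] * Z).mean(axis=0)
    l_0 = -(s0[:, None] * W).mean(axis=0)
    return l_beta, l_pi, l_0

rng = np.random.default_rng(3)
n, l, k = 400, 1, 2
Z = rng.normal(size=(n, k))
W = rng.normal(size=(n, l))
V = W[:, 0] + rng.normal(size=n)         # median of V - W'beta* is 0 at beta* = 1

lb, lp, l0 = sample_scores(V, W, Z, np.array([1.0]), np.zeros(k))
print(lb, lp, l0)                        # all near zero vectors
```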

We also obtain from the definition of {ζ⁰_n}_{n∈N} that for each n ∈ N,

R̂_n(β, 0, ·) − R̂_n(β*, 0, ·) = {R(β, 0) − R(β*, 0)} + n^{-1} Σ_{t=1}^n ψ_0(X_t)′(β − β*) + n^{-1/2} |β − β*| n^{1/2} ζ⁰_n(β, ·). (23)

We combine (11) with (22) and (12) with (23) to derive:

Lemma 5.7: Suppose that Assumptions A.1–A.9 hold. Then:

(a) For any sequence of random vectors {(b_n, g_n) : Ω → R^l × R^k}_{n∈N} that converges to (β*, π*) prob-P,

R̂_n(b_n, g_n, ·) − R̂_n(β*, π*, ·) = l_β′(b_n − β*) + (1/2)(b_n − β*)′J_ββ(b_n − β*) + (b_n − β*)′J_βπ(g_n − π*) + (1/2)(g_n − π*)′J_ππ(g_n − π*) + n^{-1} Σ_{t=1}^n ψ_β(X_t)′(b_n − β*) + n^{-1} Σ_{t=1}^n ψ_π(X_t)′(g_n − π*) + o_p(|b_n − β*|²) + o_p(|g_n − π*|²) + o_p(n^{-1}) as n → ∞.

(b) For any sequence of random vectors {b_n : Ω → R^l}_{n∈N} that converges to β* a.s.-P,

R̂_n(b_n, 0, ·) − R̂_n(β*, 0, ·) = l_0′(b_n − β*) + (1/2)(b_n − β*)′J_0(b_n − β*) + n^{-1} Σ_{t=1}^n ψ_0(X_t)′(b_n − β*) + o_p(|b_n − β*|²) + o_p(n^{-1}) as n → ∞.

To relate Lemma 5.7 to {inf_{π ∈ R^k} R̂_n(b_n, π, ·)}_{n∈N}, the following result is convenient.

Lemma 5.8: Suppose that Assumptions A.1–A.6 hold. Let Γ be a compact subset of R^k to which π* is interior. Also let {b_n : Ω → B}_{n∈N} be a sequence of l×1 random vectors that converges to β* a.s.-P. Then there exists a sequence of k×1 random vectors {g_n : Ω → Γ}_{n∈N} such that

R̂_n(b_n, g_n, ·) = inf_{π ∈ R^k} R̂_n(b_n, π, ·) for a.a. n ∈ N a.s.-P. (24)

Further, let

g*_n ≜ π* − J_ππ^{-1} ( J_πβ(b_n − β*) + n^{-1} Σ_{t=1}^n ψ_π(X_t) ), n ∈ N.

The sequence $\{(g_n, g_n^*)\}_{n \in \mathbf{N}}$ satisfies
\[ |g_n - g_n^*| = o_p(n^{-1/2}) + o_p(|b_n - \beta^*|) \text{ as } n \to \infty, \tag{25} \]
and
\[ \inf_{\gamma \in \mathbf{R}^k} \hat{R}_n(b_n, \gamma, \cdot) = \hat{R}_n(b_n, g_n^*, \cdot) + o_p(n^{-1}) + o_p(|b_n - \beta^*|^2) \text{ as } n \to \infty. \tag{26} \]
Lemma 5.8 suggests that we can obtain a quadratic approximation to $\inf_{\gamma \in \mathbf{R}^k} \hat{R}_n(b_n, \gamma, \cdot)$ by deriving a quadratic approximation to $\hat{R}_n(b_n, g_n^*, \cdot)$. We decompose the difference between $\hat{R}_n(b_n, g_n^*, \cdot)$ and $R(\beta^*, \gamma^*)$ as
\[ \hat{R}_n(b_n, g_n^*, \cdot) - R(\beta^*, \gamma^*) = (\hat{R}_n(b_n, g_n^*, \cdot) - \hat{R}_n(\beta^*, \gamma^*, \cdot)) + (\hat{R}_n(\beta^*, \gamma^*, \cdot) - R(\beta^*, \gamma^*)), \quad n \in \mathbf{N}. \tag{27} \]
Then we apply Lemma 5.7(a) to the first term on the right-hand side of this equality and substitute the definition of $\{g_n^*\}_{n \in \mathbf{N}}$ into the resulting expression. By combining the result with (26) of Lemma 5.8, we obtain:

Lemma 5.9: Suppose that Assumptions A.1–A.6 hold and let $\{b_n : \Omega \to B\}_{n \in \mathbf{N}}$ be a sequence of $l \times 1$ random vectors that converges to $\beta^*$ a.s.-$P$. Then
\[ \inf_{\gamma \in \mathbf{R}^k} \hat{R}_n(b_n, \gamma, \cdot) - R(\beta^*, \gamma^*) = -\tfrac{1}{2} \Big( n^{-1} \sum_{t=1}^n \psi_\gamma(X_t) \Big)' J_{\gamma\gamma}^{-1} \Big( n^{-1} \sum_{t=1}^n \psi_\gamma(X_t) \Big) + \Big( \ell_\beta^* + n^{-1} \sum_{t=1}^n \{ \psi_\beta(X_t) - J_{\beta\gamma} J_{\gamma\gamma}^{-1} \psi_\gamma(X_t) \} \Big)' (b_n - \beta^*) + \hat{R}_n(\beta^*, \gamma^*, \cdot) - R(\beta^*, \gamma^*) + \tfrac{1}{2}(b_n - \beta^*)' (J_{\beta\beta} - J_{\beta\gamma} J_{\gamma\gamma}^{-1} J_{\gamma\beta}) (b_n - \beta^*) + o_p(n^{-1}) + o_p(|b_n - \beta^*|^2) \text{ as } n \to \infty. \tag{28} \]

The objective function in the $L_1$ IV estimation can be written as
\[ \log \hat{Q}_n((1, -b_n')', \cdot) = \log Q((1, -\beta^{*\prime})') + \{ \log \hat{Q}_n((1, -b_n')', \cdot) - \log Q((1, -\beta^{*\prime})') \} = \log Q((1, -\beta^{*\prime})') + \big\{ \log \inf_{\gamma \in \mathbf{R}^k} \hat{R}_n(b_n, \gamma, \cdot) - \log R(\beta^*, \gamma^*) \big\} - \big\{ \log \hat{R}_n(b_n, 0, \cdot) - \log R(\beta^*, 0) \big\}, \quad n \in \mathbf{N}. \tag{29} \]

We now apply the second-order Taylor expansion of the logarithmic function to the second and third terms on the right-hand side of this equality and apply Lemmas 5.7 and 5.9 to obtain:

Lemma 5.10: Suppose that Assumptions A.1–A.6 hold and let $\{b_n : \Omega \to B\}_{n \in \mathbf{N}}$ be a sequence of $l \times 1$ random vectors that converges to $\beta^*$ a.s.-$P$. Then
\[ \log \hat{Q}_n((1, -b_n')', \cdot) = \alpha_n + \delta_n'(b_n - \beta^*) - \tfrac{1}{2}(b_n - \beta^*)' K (b_n - \beta^*) + o_p(n^{-1}) + o_p(|b_n - \beta^*|^2) \text{ as } n \to \infty, \tag{30} \]
where
\[ \alpha_n \equiv \log Q((1, -\beta^{*\prime})') - \frac{1}{2 R(\beta^*, \gamma^*)} \Big( n^{-1} \sum_{t=1}^n \psi_\gamma(X_t) \Big)' J_{\gamma\gamma}^{-1} \Big( n^{-1} \sum_{t=1}^n \psi_\gamma(X_t) \Big) + \frac{1}{R(\beta^*, \gamma^*)} \{ \hat{R}_n(\beta^*, \gamma^*, \cdot) - R(\beta^*, \gamma^*) \} - \frac{1}{R(\beta^*, 0)} \{ \hat{R}_n(\beta^*, 0, \cdot) - R(\beta^*, 0) \} - \frac{1}{2 R(\beta^*, \gamma^*)^2} \{ \hat{R}_n(\beta^*, \gamma^*, \cdot) - R(\beta^*, \gamma^*) \}^2 + \frac{1}{2 R(\beta^*, 0)^2} \{ \hat{R}_n(\beta^*, 0, \cdot) - R(\beta^*, 0) \}^2, \quad n \in \mathbf{N}, \]
\[ \delta_n \equiv n^{-1} \sum_{t=1}^n \Big\{ \frac{\psi_\beta(X_t) - J_{\beta\gamma} J_{\gamma\gamma}^{-1} \psi_\gamma(X_t)}{R(\beta^*, \gamma^*)} - \frac{\psi^0(X_t)}{R(\beta^*, 0)} \Big\} - \frac{\{ \hat{R}_n(\beta^*, \gamma^*, \cdot) - R(\beta^*, \gamma^*) \}\, \ell_\beta^*}{R(\beta^*, \gamma^*)^2} + \frac{\{ \hat{R}_n(\beta^*, 0, \cdot) - R(\beta^*, 0) \}\, \ell^{0*}}{R(\beta^*, 0)^2}, \quad n \in \mathbf{N}, \]
and $K$ is defined by (13).

Ignoring the $o_p(n^{-1}) + o_p(|b_n - \beta^*|^2)$ terms, the right-hand side of (30) is maximized by setting $b_n$ to $\beta^* + K^{-1} \delta_n$. This leads to the conjecture that $\beta^* + K^{-1} \delta_n$ should accurately approximate $\hat{\beta}_n$, and this conjecture is indeed correct.

Lemma 5.11: Suppose that Assumptions A.1–A.6 hold. Then
\[ \hat{\beta}_n = \beta^* + K^{-1} \delta_n + o_p(n^{-1/2}) \text{ as } n \to \infty, \]
where $K$ and $\{\delta_n\}_{n \in \mathbf{N}}$ are as in Lemma 5.10.

By the asymptotic equivalence lemma (Rao (1973), pp. 122–123), it follows from Lemma 5.11 that $\{n^{1/2}(\hat{\beta}_n - \beta^*)\}_{n \in \mathbf{N}}$ has the same asymptotic distribution as $\{n^{1/2} K^{-1} \delta_n\}_{n \in \mathbf{N}}$, provided that the latter converges in

distribution. Let
\[ \eta(x) \equiv \frac{\psi_\beta(x) - J_{\beta\gamma} J_{\gamma\gamma}^{-1} \psi_\gamma(x)}{R(\beta^*, \gamma^*)} - \frac{\psi^0(x)}{R(\beta^*, 0)} - \frac{\{ |v - w'\beta^* - z'\gamma^*| - R(\beta^*, \gamma^*) \}\, \ell_\beta^*}{R(\beta^*, \gamma^*)^2} + \frac{\{ |v - w'\beta^*| - R(\beta^*, 0) \}\, \ell^{0*}}{R(\beta^*, 0)^2}, \quad x = (v, w', z')' \in \mathbf{R} \times \mathbf{R}^l \times \mathbf{R}^k. \tag{31} \]
Then we can easily verify that $E[\eta(X)] = 0$ and $E[\eta(X)'\eta(X)] < \infty$ (see Assumption A.8). Because $\delta_n$ can be written as
\[ \delta_n = n^{-1} \sum_{t=1}^n \eta(X_t), \quad n \in \mathbf{N}, \]
$\delta_n$ is the sample average of zero-mean random vectors with finite second moments. We assume that $\Sigma^* \equiv E[\eta(X)\eta(X)']$ is nonsingular so that the central limit theorem (CLT) for i.i.d. random vectors (Rao (1973), p. 128) applies to $n^{1/2} \delta_n$. The asymptotic distribution of $K^{-1} n^{1/2} \delta_n$ is thus normal, and the next theorem follows.

Theorem 5.12: Suppose that Assumptions A.1–A.10 hold. Then
\[ D^{*-1/2} n^{1/2} (\hat{\beta}_n - \beta^*) \stackrel{A}{\sim} N(0, I_l) \text{ as } n \to \infty, \]
where $D^{*-1/2}$ is the inverse of the square-root matrix of $D^* \equiv K^{-1} \Sigma^* K^{-1}$, and $I_l$ is the $l \times l$ identity matrix.

Suppose that the specification is correct in that for some $\beta_0 \in \mathbf{R}^l$, $Q((1, -\beta_0')') = 1$, and that we take a sufficiently large compact set for $B$ so that $\beta_0 \in B$. Then we have that $(\beta^*, \gamma^*) = (\beta_0, 0)$. This fact makes the asymptotic covariance matrix simpler under correct specification.

Corollary 5.13: Suppose that Assumptions A.1–A.10 hold, and that there exists $\beta_0 \in B$ such that $Q((1, -\beta_0')') = 1$. Then it holds that $\gamma^* = 0$, and
\[ D_0^{-1/2} n^{1/2} (\hat{\beta}_n - \beta_0) \stackrel{A}{\sim} N(0, I_l) \text{ as } n \to \infty, \]
where
\[ D_0 \equiv (J_{\beta\gamma} J_{\gamma\gamma}^{-1} J_{\gamma\beta})^{-1} J_{\beta\gamma} J_{\gamma\gamma}^{-1} E[ZZ'] J_{\gamma\gamma}^{-1} J_{\gamma\beta} (J_{\beta\gamma} J_{\gamma\gamma}^{-1} J_{\gamma\beta})^{-1}. \]
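As a purely illustrative companion to these asymptotic results, the sample dispersion ratio $\hat{Q}_n$ and its inner minimization over $\gamma$ can be sketched numerically. This is a minimal sketch, not the paper's implementation: the function names `r_hat` and `q_hat`, the Nelder–Mead inner search, and the simulated design in the usage below are all assumptions made here for illustration.

```python
import numpy as np
from scipy.optimize import minimize

def r_hat(beta, gamma, V, W, Z):
    # Sample mean absolute deviation R-hat_n(beta, gamma) = n^{-1} sum |V_t - W_t'beta - Z_t'gamma|.
    return np.mean(np.abs(V - W @ beta - Z @ gamma))

def q_hat(beta, V, W, Z):
    # Q-hat_n(beta): dispersion of the part of the error the IVs cannot explain,
    # relative to the dispersion of the error itself; always in (0, 1].
    k = Z.shape[1]
    inner = minimize(lambda g: r_hat(beta, g, V, W, Z),
                     np.zeros(k), method="Nelder-Mead")
    return inner.fun / r_hat(beta, np.zeros(k), V, W, Z)
```

Maximizing `q_hat` over $\beta$ then yields the $L_1$ IV estimate; note that no normalization constraint on the remaining coefficients enters the ratio.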

The IV estimation might be motivated by a stronger notion of correctness: with the "true" parameter value $\beta_0 \in B$, $V - W'\beta_0$ and $Z$ are statistically independent. The asymptotic covariance matrix is further simplified given the strongly correct specification.

Corollary 5.14: Suppose that Assumptions A.1–A.10 hold, and that there exists $\beta_0 \in B$ such that $U \equiv V - W'\beta_0$ and $Z$ are statistically independent, and the median of $U$ is zero. Also suppose that the distribution of $U$ has a density $f_U$ (with respect to the Lebesgue measure) that is continuous and strictly positive at zero. Then
\[ n^{1/2} (\hat{\beta}_n - \beta_0) \stackrel{A}{\sim} N(0, \Delta_0) \text{ as } n \to \infty, \]
where
\[ \Delta_0 \equiv \frac{1}{4 f_U(0)^2} \big( E[WZ'] \, E[ZZ']^{-1} \, E[ZW'] \big)^{-1}. \]

When $Z = W$, we have that
\[ \hat{R}_n(\beta, \gamma, \cdot) = n^{-1} \sum_{t=1}^n |V_t - W_t'(\beta + \gamma)|. \]
For each realization, the objective function attains one if we take the LAD estimate in regressing $V_t$ on $W_t$ for $\beta$. Thus, the $L_1$ IV estimator is simply the LAD estimator in this special case, and Corollary 5.13 applies because $l = k$ implies that the given specification is correct in our sense. Because we have that $J^0 = J_{\beta\beta} = J_{\beta\gamma} = J_{\gamma\gamma}$ and that $E[WW'] = E[ZZ']$, the asymptotic covariance matrix of $\{n^{1/2}(\hat{\beta}_n - \beta^*)\}_{n \in \mathbf{N}}$ is $J^{0-1} E[WW'] J^{0-1}$, where $J^0$ is the Hessian matrix of $E[|V - W'\beta|]$ with respect to $\beta$. Corollary 5.14 also applies to the LAD estimator under the strongly correct specification. The resulting asymptotic covariance matrix of the LAD estimator (with stochastic regressors) is $(4 f_U(0)^2)^{-1} E[WW']^{-1}$, which takes essentially the same form as that provided by Bassett and Koenker (1978) and Pollard (1991).

6 Estimation of Asymptotic Covariance Matrix

In statistical inference on $\beta^*$, we need to consistently estimate the covariance matrix $D^*$ of Theorem 5.12. This section proposes a consistent estimator for $D^*$. Because $D^* = K^{-1} \Sigma^* K^{-1}$, it suffices to develop consistent

estimators for $\Sigma^*$ and $K$. To see what is involved in the estimation of $\Sigma^*$, we rewrite $\eta$ as
\[ \eta(x) = \frac{-\mathrm{sgn}(v - w'\beta^* - z'\gamma^*)\,w + J_{\beta\gamma} J_{\gamma\gamma}^{-1} \mathrm{sgn}(v - w'\beta^* - z'\gamma^*)\,z}{R(\beta^*, \gamma^*)} - \frac{-\mathrm{sgn}(v - w'\beta^*)\,w}{R(\beta^*, 0)} - \frac{\{ |v - w'\beta^* - z'\gamma^*| - R(\beta^*, \gamma^*) \}\, \ell_\beta^*}{R(\beta^*, \gamma^*)^2} + \frac{\{ |v - w'\beta^*| - R(\beta^*, 0) \}\, \ell^{0*}}{R(\beta^*, 0)^2} = -H^* e(x), \quad x = (v, w', z')' \in \mathbf{R} \times \mathbf{R}^l \times \mathbf{R}^k, \]
where the first equality holds by Lemma 5.4(a), $H^*$ is the $l \times (l + k + l + 2)$ matrix defined by
\[ H^* \equiv \Big( \frac{I_l}{R(\beta^*, \gamma^*)}, \ \frac{C^{*\prime}}{R(\beta^*, \gamma^*)}, \ -\frac{I_l}{R(\beta^*, 0)}, \ \frac{\ell_\beta^*}{R(\beta^*, \gamma^*)^2}, \ -\frac{\ell^{0*}}{R(\beta^*, 0)^2} \Big), \]
$e : \mathbf{R} \times \mathbf{R}^l \times \mathbf{R}^k \to \mathbf{R}^{l+k+l+2}$ is defined by
\[ e(x) \equiv \begin{pmatrix} \mathrm{sgn}(v - w'\beta^* - z'\gamma^*)\,w \\ \mathrm{sgn}(v - w'\beta^* - z'\gamma^*)\,z \\ \mathrm{sgn}(v - w'\beta^*)\,w \\ |v - w'\beta^* - z'\gamma^*| - R(\beta^*, \gamma^*) \\ |v - w'\beta^*| - R(\beta^*, 0) \end{pmatrix}, \quad x = (v, w', z')' \in \mathbf{R} \times \mathbf{R}^l \times \mathbf{R}^k, \]
and
\[ C^* \equiv -J_{\gamma\gamma}^{-1} J_{\gamma\beta}. \]
It follows that $\Sigma^*$ can be written as $\Sigma^* = H^* \Omega^* H^{*\prime}$, where
\[ \Omega^* \equiv E[e(X) e(X)']. \tag{32} \]
We can thus estimate $\Sigma^*$ consistently if we can consistently estimate $R(\beta^*, \gamma^*)$, $R(\beta^*, 0)$, $\ell_\beta^*$, $\ell^{0*}$, $C^*$, and $\Omega^*$. By Lemmas 3.1(b) and 3.2(b), we can easily show that
\[ \inf_{\gamma \in \mathbf{R}^k} \hat{R}_n(\hat{\beta}_n, \gamma, \cdot) \to R(\beta^*, \gamma^*) \text{ as } n \to \infty \text{ a.s.-}P \tag{33} \]
and that
\[ \hat{R}_n(\hat{\beta}_n, 0, \cdot) \to R(\beta^*, 0) \text{ as } n \to \infty \text{ a.s.-}P. \tag{34} \]
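For concreteness, assembling the plug-in counterpart $\hat{H}_n$ of $H^*$ from these ingredients is a mechanical block-matrix construction; a minimal sketch (the name `h_hat` and the argument layout are assumptions of this sketch, not notation from the paper):

```python
import numpy as np

def h_hat(R_full, R_short, C_hat, l_beta, l_zero):
    # Plug-in version of H*: an l x (l + k + l + 2) block matrix
    #   ( I_l/R_full , C_hat'/R_full , -I_l/R_short ,
    #     l_beta/R_full^2 , -l_zero/R_short^2 ),
    # where R_full estimates R(beta*, gamma*) and R_short estimates R(beta*, 0).
    l = C_hat.shape[1]
    return np.hstack([np.eye(l) / R_full,
                      C_hat.T / R_full,
                      -np.eye(l) / R_short,
                      (l_beta / R_full**2).reshape(l, 1),
                      -(l_zero / R_short**2).reshape(l, 1)])
```

The block widths $l$, $k$, $l$, $1$, $1$ match the components of $e(x)$, so `h_hat(...) @ Omega_hat @ h_hat(...).T` gives the plug-in $\hat{\Sigma}_n$.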

We consider how to estimate $\ell_\beta^*$, $\ell^{0*}$, $C^*$, $\Omega^*$, and $K$ in what follows. A natural estimator of $\ell_\beta^*$ is
\[ \hat{\ell}_{\beta,n} \equiv -n^{-1} \sum_{t=1}^n \mathrm{sgn}(V_t - W_t'\hat{\beta}_n - Z_t'\hat{\gamma}_n)\,W_t, \quad n \in \mathbf{N}, \tag{35} \]
where $\{\hat{\gamma}_n : \Omega \to \Gamma\}_{n \in \mathbf{N}}$ is a sequence of random vectors such that
\[ \hat{R}_n(\hat{\beta}_n, \hat{\gamma}_n, \cdot) = \inf_{\gamma \in \Gamma} \hat{R}_n(\hat{\beta}_n, \gamma, \cdot) \text{ a.s.-}P, \quad n \in \mathbf{N}, \]
and $\Gamma$ is a compact subset of $\mathbf{R}^k$ we pick. We assume that $\gamma^*$ is interior to $\Gamma$ (Assumption A.11). By the Kolmogorov strong law of large numbers (Rao (1973), p. 115), it follows from Assumptions A.1 and A.2 that
\[ -n^{-1} \sum_{t=1}^n \mathrm{sgn}(V_t - W_t'\beta^* - Z_t'\gamma^*)\,W_t \to \ell_\beta^* \text{ as } n \to \infty \text{ a.s.-}P. \]
We can also show that
\[ n^{-1} \sum_{t=1}^n \mathrm{sgn}(V_t - W_t'\hat{\beta}_n - Z_t'\hat{\gamma}_n)\,W_t - n^{-1} \sum_{t=1}^n \mathrm{sgn}(V_t - W_t'\beta^* - Z_t'\gamma^*)\,W_t \to 0 \text{ as } n \to \infty \text{ prob-}P \tag{36} \]
(see the proof of Lemma 6.1). The estimator $\{\hat{\ell}_{\beta,n}\}_{n \in \mathbf{N}}$ is thus consistent for $\ell_\beta^*$. Analogously, $\{\hat{\ell}^0_n\}_{n \in \mathbf{N}}$ defined by
\[ \hat{\ell}^0_n \equiv -n^{-1} \sum_{t=1}^n \mathrm{sgn}(V_t - W_t'\hat{\beta}_n)\,W_t, \quad n \in \mathbf{N}, \tag{37} \]
is consistent for $\ell^{0*}$.

Lemma 6.1: Suppose that Assumptions A.1–A.11 hold. Let $\{\hat{\ell}_{\beta,n}\}_{n \in \mathbf{N}}$ and $\{\hat{\ell}^0_n\}_{n \in \mathbf{N}}$ be defined by (35) and (37), respectively. Then $\{\hat{\ell}_{\beta,n}\}$ converges to $\ell_\beta^*$ prob-$P$, and $\{\hat{\ell}^0_n\}$ converges to $\ell^{0*}$ prob-$P$.

A natural estimator for $\Omega^*$ is its sample analogue. Define
\[ \hat{e}_{nt} \equiv \begin{pmatrix} \mathrm{sgn}(V_t - W_t'\hat{\beta}_n - Z_t'\hat{\gamma}_n)\,W_t \\ \mathrm{sgn}(V_t - W_t'\hat{\beta}_n - Z_t'\hat{\gamma}_n)\,Z_t \\ \mathrm{sgn}(V_t - W_t'\hat{\beta}_n)\,W_t \\ |V_t - W_t'\hat{\beta}_n - Z_t'\hat{\gamma}_n| - \hat{R}_n(\hat{\beta}_n, \hat{\gamma}_n) \\ |V_t - W_t'\hat{\beta}_n| - \hat{R}_n(\hat{\beta}_n, 0) \end{pmatrix}, \quad t = 1, \dots, n, \ n \in \mathbf{N}. \]
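The sign-average estimators (35) and (37) are one-liners in practice; a minimal numerical sketch (the function name `l_hats` is an assumption of this sketch):

```python
import numpy as np

def l_hats(V, W, Z, beta_hat, gamma_hat):
    # Sample-analogue estimators (35) and (37):
    #   l_beta_hat = -n^{-1} sum sgn(V_t - W_t'beta_hat - Z_t'gamma_hat) W_t
    #   l_zero_hat = -n^{-1} sum sgn(V_t - W_t'beta_hat) W_t
    s_full = np.sign(V - W @ beta_hat - Z @ gamma_hat)
    s_short = np.sign(V - W @ beta_hat)
    return (-np.mean(s_full[:, None] * W, axis=0),
            -np.mean(s_short[:, None] * W, axis=0))
```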

Then the sample analogue of $\Omega^*$ is
\[ \hat{\Omega}_n \equiv n^{-1} \sum_{t=1}^n \hat{e}_{nt} \hat{e}_{nt}', \quad n \in \mathbf{N}. \tag{38} \]

Lemma 6.2: Suppose that Assumptions A.1–A.9 hold and let $\{\hat{\Omega}_n\}_{n \in \mathbf{N}}$ and $\Omega^*$ be defined by (38) and (32), respectively. Then $\{\hat{\Omega}_n\}$ converges to $\Omega^*$ prob-$P$.

We next consider estimation of $C^* = -J_{\gamma\gamma}^{-1} J_{\gamma\beta}$. $J_{\gamma\gamma}$ and $J_{\gamma\beta}$ are parts of the Hessian matrix of the function $R$ at $(\beta^*, \gamma^*)$. A natural approach would be to use the Hessian matrix of $\hat{R}_n$ as an estimator for the Hessian of $R$, because $\hat{R}_n$ approximates $R$. Nevertheless, this approach is not suitable in our current situation, because $\hat{R}_n$ is not smooth. We thus need to take an alternative approach.

Our approach is based on Lemma 5.8. Let $\Gamma$ be a compact set to which $\gamma^*$ is interior (Assumption A.11). Let $\{b_n\}_{n \in \mathbf{N}}$ be a sequence of $l \times 1$ random vectors that converges to $\beta^*$ prob-$P$. As shown in Lemma 5.8, there exists a sequence of $k \times 1$ random vectors $\{g_n\}$ that minimizes $\hat{R}_n(b_n, \gamma, \cdot)$ with respect to $\gamma$ on $\Gamma$, and $\{g_n\}$ is related to $\{b_n\}$ through
\[ g_n = \gamma^* + C^*(b_n - \beta^*) - J_{\gamma\gamma}^{-1} n^{-1} \sum_{t=1}^n \psi_\gamma(X_t) + o_p(n^{-1/2}) + o_p(|b_n - \beta^*|) \text{ as } n \to \infty. \tag{39} \]
This suggests that a slight change in $b_n$ would yield a change in $g_n$ that is approximately equal to the linear transformation of the change in $b_n$ by $C^*$. A possible way to proceed is to use this observation with the perturbation method, which is often used in the statistical literature (see Meng and Rubin (1991) for example).

Take a sequence of random variables $\{\tau_n\}_{n \in \mathbf{N}}$ that converges to zero prob-$P$ satisfying $n^{-1/2}/\tau_n = O_p(1)$ (Assumption A.12). This $\tau_n$ represents the small change we make in $b_n$. Though $\tau_n$ does not have to be random, the randomness assumed here may be useful, allowing for data-based choices of the value of $\tau_n$. Let $e_i$ be the $l \times 1$ vector that is obtained by replacing the $i$th element of the $l \times 1$ zero vector with one. Let $\hat{\gamma}_n$ be the (a.s.) minimizer of $\hat{R}_n(\hat{\beta}_n, \gamma, \cdot)$ with respect to $\gamma$ on $\Gamma$ and $\tilde{\gamma}^{(i)}_n$ the (a.s.) minimizer of $\hat{R}_n(\hat{\beta}_n + \tau_n e_i, \gamma, \cdot)$ with respect to $\gamma$ on $\Gamma$, $i = 1, 2, \dots, l$. Then $\tau_n^{-1}(\tilde{\gamma}^{(i)}_n - \hat{\gamma}_n)$ is consistent for the $i$th column of $-J_{\gamma\gamma}^{-1} J_{\gamma\beta}$, as formally shown in the proof of Lemma 6.3. We thus define an estimator $\hat{C}_n$ for $C^*$ as
\[ \hat{C}_n \equiv \big( \tau_n^{-1}(\tilde{\gamma}^{(1)}_n - \hat{\gamma}_n), \ \tau_n^{-1}(\tilde{\gamma}^{(2)}_n - \hat{\gamma}_n), \ \dots, \ \tau_n^{-1}(\tilde{\gamma}^{(l)}_n - \hat{\gamma}_n) \big), \quad n \in \mathbf{N}. \tag{40} \]
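The perturbation recipe behind (40) re-solves the inner minimization at slightly shifted values of $\hat{\beta}_n$ and differences the resulting inner minimizers; a minimal sketch, under the assumption that a derivative-free solver stands in for the exact inner minimizer (the name `c_hat` and the solver settings are illustrative):

```python
import numpy as np
from scipy.optimize import minimize

def c_hat(V, W, Z, beta_hat, tau):
    # Perturbation estimator (40): the i-th column of C-hat_n is
    # tau^{-1} (gamma-tilde^{(i)} - gamma-hat), where gamma-tilde^{(i)}
    # re-solves the inner MAD minimization at beta_hat + tau * e_i.
    k, l = Z.shape[1], W.shape[1]

    def inner_argmin(beta):
        obj = lambda g: np.mean(np.abs(V - W @ beta - Z @ g))
        return minimize(obj, np.zeros(k), method="Nelder-Mead",
                        options={"xatol": 1e-10, "fatol": 1e-12}).x

    g_base = inner_argmin(beta_hat)
    cols = []
    for i in range(l):
        e = np.zeros(l)
        e[i] = 1.0
        cols.append((inner_argmin(beta_hat + tau * e) - g_base) / tau)
    return np.column_stack(cols)
```

As the text notes, each column may even use its own step size without affecting consistency.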

Lemma 6.3: Suppose that Assumptions A.1–A.12 hold. Then $\{\hat{C}_n\}_{n \in \mathbf{N}}$ defined by (40) converges to $C^* = -J_{\gamma\gamma}^{-1} J_{\gamma\beta}$ prob-$P$.

It is clear from its proof that the result of Lemma 6.3 is valid even if each column of $\hat{C}_n$ uses a different series for $\{\tau_n\}$.

To estimate $K$, we use the property given in Lemma 5.10. We again employ a sequence of random variables $\{\tau_n\}_{n \in \mathbf{N}}$ that converges to zero prob-$P$ satisfying $n^{-1/2}/\tau_n = O_p(1)$ (Assumption A.12). With some algebra, we can derive from Lemma 5.10 that for each $i = 1, 2, \dots, l$,
\[ -2\tau_n^{-2} \big( \log \hat{Q}_n((1, -(\hat{\beta}_n + \tau_n e_i)')', \cdot) - \log \hat{Q}_n((1, -\hat{\beta}_n')', \cdot) \big) = e_i' K e_i + o_p(1) = K_{(ii)} + o_p(1) \text{ as } n \to \infty, \tag{41} \]
where $K_{(ij)}$ denotes the $(i,j)$-element of $K$, $i, j = 1, 2, \dots, l$. We thus define
\[ \hat{K}_{n(ii)} \equiv -2\tau_n^{-2} \big( \log \hat{Q}_n((1, -(\hat{\beta}_n + \tau_n e_i)')', \cdot) - \log \hat{Q}_n((1, -\hat{\beta}_n')', \cdot) \big), \quad i = 1, 2, \dots, l, \ n \in \mathbf{N}. \tag{42} \]
We can also derive that for each pair of distinct integers $(i, j)$ between 1 and $l$,
\[ -\tau_n^{-2} \big( \log \hat{Q}_n((1, -(\hat{\beta}_n + \tau_n (e_i + e_j))')', \cdot) - \log \hat{Q}_n((1, -\hat{\beta}_n')', \cdot) \big) = \tfrac{1}{2}(e_i + e_j)' K (e_i + e_j) + o_p(1) = K_{(ij)} + \tfrac{1}{2} K_{(ii)} + \tfrac{1}{2} K_{(jj)} + o_p(1) \text{ as } n \to \infty. \tag{43} \]
We thus propose an estimator for $K_{(ij)}$:
\[ \hat{K}_{n(ij)} \equiv -\tau_n^{-2} \big( \log \hat{Q}_n((1, -(\hat{\beta}_n + \tau_n (e_i + e_j))')', \cdot) - \log \hat{Q}_n((1, -\hat{\beta}_n')', \cdot) \big) - \tfrac{1}{2} \hat{K}_{n(ii)} - \tfrac{1}{2} \hat{K}_{n(jj)}, \quad i \neq j, \ i, j = 1, 2, \dots, l, \ n \in \mathbf{N}. \tag{44} \]

Lemma 6.4: Suppose that Assumptions A.1–A.10 and A.12 hold. Define
\[ \hat{K}_n \equiv \begin{pmatrix} \hat{K}_{n(11)} & \hat{K}_{n(12)} & \cdots & \hat{K}_{n(1l)} \\ \hat{K}_{n(21)} & \hat{K}_{n(22)} & \cdots & \hat{K}_{n(2l)} \\ \vdots & \vdots & & \vdots \\ \hat{K}_{n(l1)} & \hat{K}_{n(l2)} & \cdots & \hat{K}_{n(ll)} \end{pmatrix}, \quad n \in \mathbf{N}, \]
where $\hat{K}_{n(ij)}$ are defined by (42) and (44). Then $\{\hat{K}_n\}_{n \in \mathbf{N}}$ converges to $K$ defined by (13) prob-$P$.

Combining (33), (34), and Lemmas 6.1–6.4 yields the consistent estimator for $D^*$.
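Both the second-difference recipe in (42) and (44) and the sandwich form $\hat{D}_n = \hat{K}_n^{-1} \hat{\Sigma}_n \hat{K}_n^{-1}$ can be sketched compactly. The test exercises `k_hat` on a smooth quadratic surrogate for $\log \hat{Q}_n$, for which the finite differences are exact up to rounding; the function names and the surrogate are assumptions of this sketch.

```python
import numpy as np

def k_hat(log_q, beta_hat, tau):
    # Finite-difference estimator of K from (42) and (44): diagonal entries
    # from single perturbations, off-diagonal entries from joint
    # perturbations, all through the scalar map log_q(beta).
    l = beta_hat.size
    base = log_q(beta_hat)
    E = np.eye(l)
    K = np.zeros((l, l))
    for i in range(l):
        K[i, i] = -2.0 * (log_q(beta_hat + tau * E[i]) - base) / tau**2
    for i in range(l):
        for j in range(i + 1, l):
            kij = (-(log_q(beta_hat + tau * (E[i] + E[j])) - base) / tau**2
                   - 0.5 * K[i, i] - 0.5 * K[j, j])
            K[i, j] = K[j, i] = kij
    return K

def d_hat(K_hat_mat, Sigma_hat):
    # Sandwich form D-hat_n = K-hat_n^{-1} Sigma-hat_n K-hat_n^{-1}.
    Kinv = np.linalg.inv(K_hat_mat)
    return Kinv @ Sigma_hat @ Kinv
```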

Theorem 6.5: Suppose that Assumptions A.1–A.12 hold. Define
\[ \hat{D}_n \equiv \hat{K}_n^{-1} \hat{\Sigma}_n \hat{K}_n^{-1}, \quad \hat{\Sigma}_n \equiv \hat{H}_n \hat{\Omega}_n \hat{H}_n', \quad n \in \mathbf{N}, \]
where
\[ \hat{H}_n \equiv \Big( \frac{I_l}{\hat{R}_n(\hat{\beta}_n, \hat{\gamma}_n)}, \ \frac{\hat{C}_n'}{\hat{R}_n(\hat{\beta}_n, \hat{\gamma}_n)}, \ -\frac{I_l}{\hat{R}_n(\hat{\beta}_n, 0)}, \ \frac{\hat{\ell}_{\beta,n}}{\hat{R}_n(\hat{\beta}_n, \hat{\gamma}_n)^2}, \ -\frac{\hat{\ell}^0_n}{\hat{R}_n(\hat{\beta}_n, 0)^2} \Big), \quad n \in \mathbf{N}, \]
and $\{\hat{\Omega}_n\}_{n \in \mathbf{N}}$ is defined by (38). Then $\{\hat{D}_n\}_{n \in \mathbf{N}}$ converges to $D^*$ defined in Theorem 5.12 prob-$P$.

7 Specification Test

If $Q((1, -\beta')') = 1$ for some $\beta \in B$, our model is correctly specified. In this section, we consider how to test this specification correctness. The proposed test corresponds to the over-identification test in the GMM estimation framework.

A necessary and sufficient condition for the correct specification is that $Q((1, -\beta^{*\prime})') = 1$. If the model is misspecified, we have that $Q((1, -\beta^{*\prime})') < 1$. Because we can consistently estimate $Q((1, -\beta^{*\prime})')$ by $\hat{Q}_n((1, -\hat{\beta}_n')', \cdot)$, one might think that a test based on the value of $\hat{Q}_n((1, -\hat{\beta}_n')', \cdot)$ may be appropriate. Nevertheless, the limiting distribution of $\hat{Q}_n((1, -\hat{\beta}_n')', \cdot)$ under the null, which we can derive from Lemmas 5.10 and 5.11, is that of a quadratic form of independently and normally distributed random variables with an unknown weighting matrix, as is the case for many generalized likelihood ratios. This approach is thus quite inconvenient.

Alternatively, we here employ another condition, $\gamma^* = 0$, which is necessary and sufficient for the correct specification. Let $\hat{\gamma}_n$ be the minimizer of $\hat{R}_n(\hat{\beta}_n, \gamma, \cdot)$ with respect to $\gamma$ on some compact set $\Gamma \subset \mathbf{R}^k$. Then $\hat{\gamma}_n$ consistently estimates $\gamma^*$. Below, we show that a certain quadratic form of $\hat{\gamma}_n$ is distributed with a $\chi^2$-distribution under the null and show that the same quadratic form diverges to infinity prob-$P$ under the alternative. We can thus form a consistent specification test based on this statistic.

We first derive a convenient approximation form for $\hat{\gamma}_n$ under the null. Note that
\[ K = J_{\beta\gamma} J_{\gamma\gamma}^{-1} J_{\gamma\beta} / R(\beta^*, 0), \quad \text{and} \quad \delta_n = -\frac{1}{R(\beta^*, 0)} J_{\beta\gamma} J_{\gamma\gamma}^{-1} n^{-1} \sum_{t=1}^n \psi_\gamma(X_t), \quad n \in \mathbf{N}, \]

under the null. By using Lemmas 5.8 and 5.11 with this fact, it is straightforward to derive that
\[ \hat{\gamma}_n = -J_{\gamma\gamma}^{-1/2} M^* J_{\gamma\gamma}^{-1/2} n^{-1} \sum_{t=1}^n \psi_\gamma(X_t) + o_p(n^{-1/2}) \text{ as } n \to \infty, \tag{45} \]
where
\[ M^* \equiv I_k - J_{\gamma\gamma}^{-1/2} J_{\gamma\beta} (J_{\beta\gamma} J_{\gamma\gamma}^{-1} J_{\gamma\beta})^{-1} J_{\beta\gamma} J_{\gamma\gamma}^{-1/2}. \]
Because $M^*$ is symmetric and idempotent with rank $k - l$, there exists a $k \times (k - l)$ matrix $T_1$ such that $M^* = T_1 T_1'$ and $T_1' T_1 = I_{k-l}$. Define
\[ T_2 \equiv J_{\gamma\gamma}^{-1/2} J_{\gamma\beta} (J_{\beta\gamma} J_{\gamma\gamma}^{-1} J_{\gamma\beta})^{-1/2}. \]
Then we have that $T_2' T_2 = I_l$ and that $M^* T_2 = 0$. The columns of $T_2$ are thus orthonormal vectors that span the null space of $M^*$. Letting
\[ T \equiv (T_1, T_2), \]
$T$ is an orthogonal matrix. Substituting $T_1 T_1'$ for $M^*$ in (45) and premultiplying both sides of the resulting equation by $n^{1/2} T' J_{\gamma\gamma}^{1/2}$, we obtain
\[ n^{1/2} T' J_{\gamma\gamma}^{1/2} \hat{\gamma}_n = -\begin{pmatrix} \nu_n \\ 0_l \end{pmatrix} + o_p(1) \text{ as } n \to \infty, \tag{46} \]
where $0_l$ denotes the $l \times 1$ zero vector, and
\[ \nu_n \equiv T_1' J_{\gamma\gamma}^{-1/2} n^{-1/2} \sum_{t=1}^n \psi_\gamma(X_t), \quad n \in \mathbf{N}. \]
We have that
\[ E[T_1' J_{\gamma\gamma}^{-1/2} \psi_\gamma(X)] = 0 \]
and that
\[ \mathrm{var}[T_1' J_{\gamma\gamma}^{-1/2} \psi_\gamma(X)] = T_1' J_{\gamma\gamma}^{-1/2} E[ZZ'] J_{\gamma\gamma}^{-1/2} T_1. \]
It follows by the CLT for i.i.d. random vectors (Rao (1973), p. 128) that
\[ (T_1' J_{\gamma\gamma}^{-1/2} E[ZZ'] J_{\gamma\gamma}^{-1/2} T_1)^{-1/2} \nu_n \stackrel{A}{\sim} N(0, I_{k-l}) \text{ as } n \to \infty. \]
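The projection matrix $M^*$ and its rank property are easy to verify numerically; a minimal sketch (`m_star` is a name chosen here, and the matrix square root is taken via the eigendecomposition of the symmetric positive definite $J_{\gamma\gamma}$):

```python
import numpy as np

def m_star(J_gg, J_gb):
    # M* = I_k - J_gg^{-1/2} J_gb (J_gb' J_gg^{-1} J_gb)^{-1} J_gb' J_gg^{-1/2}:
    # the orthogonal projector onto the complement of col(J_gg^{-1/2} J_gb).
    w, U = np.linalg.eigh(J_gg)                 # J_gg assumed SPD
    J_inv_half = U @ np.diag(w ** -0.5) @ U.T   # symmetric inverse square root
    B = J_inv_half @ J_gb                       # k x l
    P = B @ np.linalg.solve(B.T @ B, B.T)       # projector onto col(B)
    return np.eye(J_gg.shape[0]) - P
```

Symmetry, idempotency, and trace $k - l$ are exactly the properties that license the decomposition $M^* = T_1 T_1'$ above.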

Now let $A$ be a symmetric positive definite matrix whose upper left $(k-l) \times (k-l)$ block $A_{11}$ is equal to
\[ A_{11} \equiv (T_1' J_{\gamma\gamma}^{-1/2} E[ZZ'] J_{\gamma\gamma}^{-1/2} T_1)^{-1}. \tag{47} \]
By using (46), we have that
\[ n \hat{\gamma}_n' J_{\gamma\gamma}^{1/2} T A T' J_{\gamma\gamma}^{1/2} \hat{\gamma}_n = \nu_n' A_{11} \nu_n + o_p(1) \text{ as } n \to \infty. \]
By the asymptotic equivalence lemma (Rao (1973), pp. 122–123), it follows that
\[ n \hat{\gamma}_n' J_{\gamma\gamma}^{1/2} T A T' J_{\gamma\gamma}^{1/2} \hat{\gamma}_n \stackrel{A}{\sim} \chi^2(k - l) \text{ as } n \to \infty. \]
In addition, we have that under the alternative hypothesis that $\gamma^* \neq 0$,
\[ \hat{\gamma}_n' J_{\gamma\gamma}^{1/2} T A T' J_{\gamma\gamma}^{1/2} \hat{\gamma}_n \to \gamma^{*\prime} J_{\gamma\gamma}^{1/2} T A T' J_{\gamma\gamma}^{1/2} \gamma^* > 0 \text{ as } n \to \infty \text{ prob-}P. \]
This means that if we have a consistent estimator for $J_{\gamma\gamma}^{1/2} T A T' J_{\gamma\gamma}^{1/2}$, we can form a consistent specification test based on $n \hat{\gamma}_n' J_{\gamma\gamma}^{1/2} T A T' J_{\gamma\gamma}^{1/2} \hat{\gamma}_n$.

A convenient choice for $A$ is
\[ A \equiv \begin{pmatrix} A_{11} & 0 \\ 0 & A_{22} \end{pmatrix}, \quad \text{where} \quad A_{22} \equiv (J_{\beta\gamma} J_{\gamma\gamma}^{-1} J_{\gamma\beta})^{-1}. \]
For this particular choice, we have that
\[ J_{\gamma\gamma}^{1/2} T A T' J_{\gamma\gamma}^{1/2} = \big\{ J_{\gamma\gamma}^{-1/2} T A^{-1} T' J_{\gamma\gamma}^{-1/2} \big\}^{-1} = \Big\{ J_{\gamma\gamma}^{-1/2} T \begin{pmatrix} T_1' J_{\gamma\gamma}^{-1/2} E[ZZ'] J_{\gamma\gamma}^{-1/2} T_1 & 0 \\ 0 & A_{22}^{-1} \end{pmatrix} T' J_{\gamma\gamma}^{-1/2} \Big\}^{-1} = \big\{ J_{\gamma\gamma}^{-1/2} T_1 T_1' J_{\gamma\gamma}^{-1/2} E[ZZ'] J_{\gamma\gamma}^{-1/2} T_1 T_1' J_{\gamma\gamma}^{-1/2} + J_{\gamma\gamma}^{-1/2} T_2 A_{22}^{-1} T_2' J_{\gamma\gamma}^{-1/2} \big\}^{-1} = \big\{ J_{\gamma\gamma}^{-1/2} M^* J_{\gamma\gamma}^{-1/2} E[ZZ'] J_{\gamma\gamma}^{-1/2} M^* J_{\gamma\gamma}^{-1/2} + J_{\gamma\gamma}^{-1/2} T_2 A_{22}^{-1} T_2' J_{\gamma\gamma}^{-1/2} \big\}^{-1}. \]
Note that
\[ J_{\gamma\gamma}^{-1/2} M^* J_{\gamma\gamma}^{-1/2} = J_{\gamma\gamma}^{-1} - J_{\gamma\gamma}^{-1} J_{\gamma\beta} (J_{\beta\gamma} J_{\gamma\gamma}^{-1} J_{\gamma\beta})^{-1} J_{\beta\gamma} J_{\gamma\gamma}^{-1} = J_{\gamma\gamma}^{-1} - C^* (R(\beta^*, \gamma^*) K)^{-1} C^{*\prime}, \]

where the second equality holds under the null. Also note that
\[ J_{\gamma\gamma}^{-1/2} T_2 A_{22}^{-1} T_2' J_{\gamma\gamma}^{-1/2} = J_{\gamma\gamma}^{-1} J_{\gamma\beta} (J_{\beta\gamma} J_{\gamma\gamma}^{-1} J_{\gamma\beta})^{-1/2} (J_{\beta\gamma} J_{\gamma\gamma}^{-1} J_{\gamma\beta}) (J_{\beta\gamma} J_{\gamma\gamma}^{-1} J_{\gamma\beta})^{-1/2} J_{\beta\gamma} J_{\gamma\gamma}^{-1} = J_{\gamma\gamma}^{-1} J_{\gamma\beta} J_{\beta\gamma} J_{\gamma\gamma}^{-1} = C^* C^{*\prime}. \]
It follows that under the null,
\[ J_{\gamma\gamma}^{1/2} T A T' J_{\gamma\gamma}^{1/2} = \big\{ (J_{\gamma\gamma}^{-1} - C^* (R(\beta^*, \gamma^*) K)^{-1} C^{*\prime}) E[ZZ'] (J_{\gamma\gamma}^{-1} - C^* (R(\beta^*, \gamma^*) K)^{-1} C^{*\prime}) + C^* C^{*\prime} \big\}^{-1}. \]
We thus have that under the null,
\[ n \hat{\gamma}_n' \big\{ (J_{\gamma\gamma}^{-1} - C^* (R(\beta^*, \gamma^*) K)^{-1} C^{*\prime}) E[ZZ'] (J_{\gamma\gamma}^{-1} - C^* (R(\beta^*, \gamma^*) K)^{-1} C^{*\prime}) + C^* C^{*\prime} \big\}^{-1} \hat{\gamma}_n \stackrel{A}{\sim} \chi^2(k - l) \text{ as } n \to \infty. \tag{48} \]

To use (48) to form a test, we need to consistently estimate the unknown constants in it. Lemmas 6.3 and 6.4 provide consistent estimators for $K$ and $C^*$. The matrix $J_{\gamma\gamma}$ can be consistently estimated in an analogous manner. For each $n \in \mathbf{N}$, define
\[ \hat{J}_{\gamma\gamma,n(ii)} \equiv 2\tau_n^{-2} \big( \log \hat{R}_n(\hat{\beta}_n, \hat{\gamma}_n + \tau_n e_i, \cdot) - \log \hat{R}_n(\hat{\beta}_n, \hat{\gamma}_n, \cdot) \big), \quad i = 1, 2, \dots, k, \tag{49} \]
and
\[ \hat{J}_{\gamma\gamma,n(ij)} \equiv \tau_n^{-2} \big( \log \hat{R}_n(\hat{\beta}_n, \hat{\gamma}_n + \tau_n (e_i + e_j), \cdot) - \log \hat{R}_n(\hat{\beta}_n, \hat{\gamma}_n, \cdot) \big) \tag{50} \]
\[ \phantom{\hat{J}_{\gamma\gamma,n(ij)} \equiv} - \tfrac{1}{2} \hat{J}_{\gamma\gamma,n(ii)} - \tfrac{1}{2} \hat{J}_{\gamma\gamma,n(jj)}, \quad i \neq j, \ i, j = 1, 2, \dots, k, \tag{51} \]
where $e_i$ here denotes the $i$th column of $I_k$.

Lemma 7.1: Suppose that Assumptions A.1–A.10 and A.12 hold. Define
\[ \hat{J}_{\gamma\gamma,n} \equiv \begin{pmatrix} \hat{J}_{\gamma\gamma,n(11)} & \cdots & \hat{J}_{\gamma\gamma,n(1k)} \\ \vdots & & \vdots \\ \hat{J}_{\gamma\gamma,n(k1)} & \cdots & \hat{J}_{\gamma\gamma,n(kk)} \end{pmatrix}, \quad n \in \mathbf{N}, \]
where $\hat{J}_{\gamma\gamma,n(ij)}$ are defined by (49) and (51). Then $\{\hat{J}_{\gamma\gamma,n}\}_{n \in \mathbf{N}}$ converges to $J_{\gamma\gamma}$ defined by (10) prob-$P$.

We now define a test statistic $T_n$ by
\[ T_n \equiv n \hat{\gamma}_n' \Big\{ \big( \hat{J}_{\gamma\gamma,n}^{-1} - \hat{C}_n (\hat{R}_n(\hat{\beta}_n, \hat{\gamma}_n, \cdot) \hat{K}_n)^{-1} \hat{C}_n' \big) \Big( n^{-1} \sum_{t=1}^n Z_t Z_t' \Big) \big( \hat{J}_{\gamma\gamma,n}^{-1} - \hat{C}_n (\hat{R}_n(\hat{\beta}_n, \hat{\gamma}_n, \cdot) \hat{K}_n)^{-1} \hat{C}_n' \big) + \hat{C}_n \hat{C}_n' \Big\}^{-1} \hat{\gamma}_n, \quad n \in \mathbf{N}. \tag{52} \]
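Given the statistic, the size-$p$ decision rule of the test is a one-line comparison against a $\chi^2(k-l)$ quantile; a minimal sketch (the wrapper `specification_test` and its return layout are assumptions of this sketch):

```python
from scipy.stats import chi2

def specification_test(T_n, k, l, p=0.05):
    # Theorem 7.2 decision rule: reject correct specification at size p
    # iff T_n exceeds the upper 100p percentile of chi-square(k - l).
    df = k - l
    crit = chi2.ppf(1.0 - p, df)
    return {"stat": T_n,
            "critical_value": crit,
            "p_value": 1.0 - chi2.cdf(T_n, df),
            "reject": bool(T_n > crit)}
```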

It is straightforward to verify that under the null,
\[ T_n - n \hat{\gamma}_n' \big\{ (J_{\gamma\gamma}^{-1} - C^* (R(\beta^*, \gamma^*) K)^{-1} C^{*\prime}) E[ZZ'] (J_{\gamma\gamma}^{-1} - C^* (R(\beta^*, \gamma^*) K)^{-1} C^{*\prime}) + C^* C^{*\prime} \big\}^{-1} \hat{\gamma}_n = o_p(1). \]
By the asymptotic equivalence lemma (Rao (1973), pp. 122–123), it follows that $T_n$ is asymptotically distributed with $\chi^2(k - l)$ under the null. We can also easily verify that the asymptotic power of the test based on $T_n$ is one.

Theorem 7.2: Suppose that Assumptions A.1–A.12 hold. Also let $T_n$ be the statistic defined in (52).
(a) If $\gamma^* = 0$, then $T_n \stackrel{A}{\sim} \chi^2(k - l)$ as $n \to \infty$.
(b) If $\gamma^* \neq 0$, then for any real constant $c$, $P[T_n > c] \to 1$ as $n \to \infty$.

The specification test with size $p$ based on the statistic $T_n$ rejects the null hypothesis when the observed value of $T_n$ is greater than the upper $100p$ percentile of the $\chi^2$-distribution with $k - l$ degrees of freedom, and accepts the null hypothesis otherwise.

8 Concluding Remarks

The error term in a structural equation should not be explained by instrumental variables. If we judge whether the instrumental variables explain the error by the orthogonality condition, the IV estimators such as the two-stage least squares estimator are generated. In this paper, we propose an alternative approach. Choose a dispersion measure for univariate distributions. Under the population distribution, consider, for each parameter value, the ratio of the dispersion of the part of the error term the IVs cannot explain to the dispersion of the error term itself. This ratio ranges between zero and one. The higher this ratio is, the less the error is related to the instruments. If the model assumption is correct, the ratio attains one for some parameter value, and the maximizer gives the desired parameter value; otherwise, the maximizer gives a model "closest" to our assumption. Our estimator is defined to be the maximizer of the sample analogue of the ratio of the dispersions. The estimators derived in this manner are invariant to the normalization constraint and require no reduced-form specification.


More information

ECONOMETRICS II (ECO 2401S) University of Toronto. Department of Economics. Spring 2013 Instructor: Victor Aguirregabiria

ECONOMETRICS II (ECO 2401S) University of Toronto. Department of Economics. Spring 2013 Instructor: Victor Aguirregabiria ECONOMETRICS II (ECO 2401S) University of Toronto. Department of Economics. Spring 2013 Instructor: Victor Aguirregabiria SOLUTION TO FINAL EXAM Friday, April 12, 2013. From 9:00-12:00 (3 hours) INSTRUCTIONS:

More information

Introduction to Real Analysis Alternative Chapter 1

Introduction to Real Analysis Alternative Chapter 1 Christopher Heil Introduction to Real Analysis Alternative Chapter 1 A Primer on Norms and Banach Spaces Last Updated: March 10, 2018 c 2018 by Christopher Heil Chapter 1 A Primer on Norms and Banach Spaces

More information

368 XUMING HE AND GANG WANG of convergence for the MVE estimator is n ;1=3. We establish strong consistency and functional continuity of the MVE estim

368 XUMING HE AND GANG WANG of convergence for the MVE estimator is n ;1=3. We establish strong consistency and functional continuity of the MVE estim Statistica Sinica 6(1996), 367-374 CROSS-CHECKING USING THE MINIMUM VOLUME ELLIPSOID ESTIMATOR Xuming He and Gang Wang University of Illinois and Depaul University Abstract: We show that for a wide class

More information

Maximum Likelihood (ML) Estimation

Maximum Likelihood (ML) Estimation Econometrics 2 Fall 2004 Maximum Likelihood (ML) Estimation Heino Bohn Nielsen 1of32 Outline of the Lecture (1) Introduction. (2) ML estimation defined. (3) ExampleI:Binomialtrials. (4) Example II: Linear

More information

only nite eigenvalues. This is an extension of earlier results from [2]. Then we concentrate on the Riccati equation appearing in H 2 and linear quadr

only nite eigenvalues. This is an extension of earlier results from [2]. Then we concentrate on the Riccati equation appearing in H 2 and linear quadr The discrete algebraic Riccati equation and linear matrix inequality nton. Stoorvogel y Department of Mathematics and Computing Science Eindhoven Univ. of Technology P.O. ox 53, 56 M Eindhoven The Netherlands

More information

Notes on the matrix exponential

Notes on the matrix exponential Notes on the matrix exponential Erik Wahlén erik.wahlen@math.lu.se February 14, 212 1 Introduction The purpose of these notes is to describe how one can compute the matrix exponential e A when A is not

More information

Contents. 2.1 Vectors in R n. Linear Algebra (part 2) : Vector Spaces (by Evan Dummit, 2017, v. 2.50) 2 Vector Spaces

Contents. 2.1 Vectors in R n. Linear Algebra (part 2) : Vector Spaces (by Evan Dummit, 2017, v. 2.50) 2 Vector Spaces Linear Algebra (part 2) : Vector Spaces (by Evan Dummit, 2017, v 250) Contents 2 Vector Spaces 1 21 Vectors in R n 1 22 The Formal Denition of a Vector Space 4 23 Subspaces 6 24 Linear Combinations and

More information

Tests for Neglected Heterogeneity in Moment Condition Models

Tests for Neglected Heterogeneity in Moment Condition Models Tests for Neglected Heterogeneity in Moment Condition Models Jinyong Hahn Department of Economics U.C.L.A. Whitney K. Newey Department of Economics M.I.T. Richard J. Smith cemmap, U.C.L. and I.F.S., Faculty

More information

Missing dependent variables in panel data models

Missing dependent variables in panel data models Missing dependent variables in panel data models Jason Abrevaya Abstract This paper considers estimation of a fixed-effects model in which the dependent variable may be missing. For cross-sectional units

More information

LARGE DEVIATION PROBABILITIES FOR SUMS OF HEAVY-TAILED DEPENDENT RANDOM VECTORS*

LARGE DEVIATION PROBABILITIES FOR SUMS OF HEAVY-TAILED DEPENDENT RANDOM VECTORS* LARGE EVIATION PROBABILITIES FOR SUMS OF HEAVY-TAILE EPENENT RANOM VECTORS* Adam Jakubowski Alexander V. Nagaev Alexander Zaigraev Nicholas Copernicus University Faculty of Mathematics and Computer Science

More information

Chapter 2. Dynamic panel data models

Chapter 2. Dynamic panel data models Chapter 2. Dynamic panel data models School of Economics and Management - University of Geneva Christophe Hurlin, Université of Orléans University of Orléans April 2018 C. Hurlin (University of Orléans)

More information

Converse Lyapunov Functions for Inclusions 2 Basic denitions Given a set A, A stands for the closure of A, A stands for the interior set of A, coa sta

Converse Lyapunov Functions for Inclusions 2 Basic denitions Given a set A, A stands for the closure of A, A stands for the interior set of A, coa sta A smooth Lyapunov function from a class-kl estimate involving two positive semidenite functions Andrew R. Teel y ECE Dept. University of California Santa Barbara, CA 93106 teel@ece.ucsb.edu Laurent Praly

More information

Analogy Principle. Asymptotic Theory Part II. James J. Heckman University of Chicago. Econ 312 This draft, April 5, 2006

Analogy Principle. Asymptotic Theory Part II. James J. Heckman University of Chicago. Econ 312 This draft, April 5, 2006 Analogy Principle Asymptotic Theory Part II James J. Heckman University of Chicago Econ 312 This draft, April 5, 2006 Consider four methods: 1. Maximum Likelihood Estimation (MLE) 2. (Nonlinear) Least

More information

290 J.M. Carnicer, J.M. Pe~na basis (u 1 ; : : : ; u n ) consisting of minimally supported elements, yet also has a basis (v 1 ; : : : ; v n ) which f

290 J.M. Carnicer, J.M. Pe~na basis (u 1 ; : : : ; u n ) consisting of minimally supported elements, yet also has a basis (v 1 ; : : : ; v n ) which f Numer. Math. 67: 289{301 (1994) Numerische Mathematik c Springer-Verlag 1994 Electronic Edition Least supported bases and local linear independence J.M. Carnicer, J.M. Pe~na? Departamento de Matematica

More information

STATS 200: Introduction to Statistical Inference. Lecture 29: Course review

STATS 200: Introduction to Statistical Inference. Lecture 29: Course review STATS 200: Introduction to Statistical Inference Lecture 29: Course review Course review We started in Lecture 1 with a fundamental assumption: Data is a realization of a random process. The goal throughout

More information

Projektpartner. Sonderforschungsbereich 386, Paper 163 (1999) Online unter:

Projektpartner. Sonderforschungsbereich 386, Paper 163 (1999) Online unter: Toutenburg, Shalabh: Estimation of Regression Coefficients Subject to Exact Linear Restrictions when some Observations are Missing and Balanced Loss Function is Used Sonderforschungsbereich 386, Paper

More information

Econometrics of Panel Data

Econometrics of Panel Data Econometrics of Panel Data Jakub Mućk Meeting # 6 Jakub Mućk Econometrics of Panel Data Meeting # 6 1 / 36 Outline 1 The First-Difference (FD) estimator 2 Dynamic panel data models 3 The Anderson and Hsiao

More information

Economics 241B Estimation with Instruments

Economics 241B Estimation with Instruments Economics 241B Estimation with Instruments Measurement Error Measurement error is de ned as the error resulting from the measurement of a variable. At some level, every variable is measured with error.

More information

A Course on Advanced Econometrics

A Course on Advanced Econometrics A Course on Advanced Econometrics Yongmiao Hong The Ernest S. Liu Professor of Economics & International Studies Cornell University Course Introduction: Modern economies are full of uncertainties and risk.

More information

7. Dimension and Structure.

7. Dimension and Structure. 7. Dimension and Structure 7.1. Basis and Dimension Bases for Subspaces Example 2 The standard unit vectors e 1, e 2,, e n are linearly independent, for if we write (2) in component form, then we obtain

More information

10. Smooth Varieties. 82 Andreas Gathmann

10. Smooth Varieties. 82 Andreas Gathmann 82 Andreas Gathmann 10. Smooth Varieties Let a be a point on a variety X. In the last chapter we have introduced the tangent cone C a X as a way to study X locally around a (see Construction 9.20). It

More information

PARAMETER IDENTIFICATION IN THE FREQUENCY DOMAIN. H.T. Banks and Yun Wang. Center for Research in Scientic Computation

PARAMETER IDENTIFICATION IN THE FREQUENCY DOMAIN. H.T. Banks and Yun Wang. Center for Research in Scientic Computation PARAMETER IDENTIFICATION IN THE FREQUENCY DOMAIN H.T. Banks and Yun Wang Center for Research in Scientic Computation North Carolina State University Raleigh, NC 7695-805 Revised: March 1993 Abstract In

More information

GMM estimation of spatial panels

GMM estimation of spatial panels MRA Munich ersonal ReEc Archive GMM estimation of spatial panels Francesco Moscone and Elisa Tosetti Brunel University 7. April 009 Online at http://mpra.ub.uni-muenchen.de/637/ MRA aper No. 637, posted

More information

An Averaging GMM Estimator Robust to Misspecication

An Averaging GMM Estimator Robust to Misspecication An Averaging GMM Estimator Robust to Misspecication Xu Cheng y Zhipeng Liao z Ruoyao Shi x This Version: January, 28 Abstract This paper studies the averaging GMM estimator that combines a conservative

More information

The Equality of OLS and GLS Estimators in the

The Equality of OLS and GLS Estimators in the The Equality of OLS and GLS Estimators in the Linear Regression Model When the Disturbances are Spatially Correlated Butte Gotu 1 Department of Statistics, University of Dortmund Vogelpothsweg 87, 44221

More information

Exercise Solutions to Functional Analysis

Exercise Solutions to Functional Analysis Exercise Solutions to Functional Analysis Note: References refer to M. Schechter, Principles of Functional Analysis Exersize that. Let φ,..., φ n be an orthonormal set in a Hilbert space H. Show n f n

More information

Max. Likelihood Estimation. Outline. Econometrics II. Ricardo Mora. Notes. Notes

Max. Likelihood Estimation. Outline. Econometrics II. Ricardo Mora. Notes. Notes Maximum Likelihood Estimation Econometrics II Department of Economics Universidad Carlos III de Madrid Máster Universitario en Desarrollo y Crecimiento Económico Outline 1 3 4 General Approaches to Parameter

More information

ECONOMETRICS II (ECO 2401S) University of Toronto. Department of Economics. Winter 2014 Instructor: Victor Aguirregabiria

ECONOMETRICS II (ECO 2401S) University of Toronto. Department of Economics. Winter 2014 Instructor: Victor Aguirregabiria ECONOMETRICS II (ECO 2401S) University of Toronto. Department of Economics. Winter 2014 Instructor: Victor guirregabiria SOLUTION TO FINL EXM Monday, pril 14, 2014. From 9:00am-12:00pm (3 hours) INSTRUCTIONS:

More information

ECE521 week 3: 23/26 January 2017

ECE521 week 3: 23/26 January 2017 ECE521 week 3: 23/26 January 2017 Outline Probabilistic interpretation of linear regression - Maximum likelihood estimation (MLE) - Maximum a posteriori (MAP) estimation Bias-variance trade-off Linear

More information

QUASI-UNIFORMLY POSITIVE OPERATORS IN KREIN SPACE. Denitizable operators in Krein spaces have spectral properties similar to those

QUASI-UNIFORMLY POSITIVE OPERATORS IN KREIN SPACE. Denitizable operators in Krein spaces have spectral properties similar to those QUASI-UNIFORMLY POSITIVE OPERATORS IN KREIN SPACE BRANKO CURGUS and BRANKO NAJMAN Denitizable operators in Krein spaces have spectral properties similar to those of selfadjoint operators in Hilbert spaces.

More information

Mathematical Institute, University of Utrecht. The problem of estimating the mean of an observed Gaussian innite-dimensional vector

Mathematical Institute, University of Utrecht. The problem of estimating the mean of an observed Gaussian innite-dimensional vector On Minimax Filtering over Ellipsoids Eduard N. Belitser and Boris Y. Levit Mathematical Institute, University of Utrecht Budapestlaan 6, 3584 CD Utrecht, The Netherlands The problem of estimating the mean

More information

Supplemental Material 1 for On Optimal Inference in the Linear IV Model

Supplemental Material 1 for On Optimal Inference in the Linear IV Model Supplemental Material 1 for On Optimal Inference in the Linear IV Model Donald W. K. Andrews Cowles Foundation for Research in Economics Yale University Vadim Marmer Vancouver School of Economics University

More information

Interpreting Regression Results

Interpreting Regression Results Interpreting Regression Results Carlo Favero Favero () Interpreting Regression Results 1 / 42 Interpreting Regression Results Interpreting regression results is not a simple exercise. We propose to split

More information

Economics 620, Lecture 20: Generalized Method of Moment (GMM)

Economics 620, Lecture 20: Generalized Method of Moment (GMM) Economics 620, Lecture 20: Generalized Method of Moment (GMM) Nicholas M. Kiefer Cornell University Professor N. M. Kiefer (Cornell University) Lecture 20: GMM 1 / 16 Key: Set sample moments equal to theoretical

More information

Optimality Conditions

Optimality Conditions Chapter 2 Optimality Conditions 2.1 Global and Local Minima for Unconstrained Problems When a minimization problem does not have any constraints, the problem is to find the minimum of the objective function.

More information

University of Pavia. M Estimators. Eduardo Rossi

University of Pavia. M Estimators. Eduardo Rossi University of Pavia M Estimators Eduardo Rossi Criterion Function A basic unifying notion is that most econometric estimators are defined as the minimizers of certain functions constructed from the sample

More information

Quantum logics with given centres and variable state spaces Mirko Navara 1, Pavel Ptak 2 Abstract We ask which logics with a given centre allow for en

Quantum logics with given centres and variable state spaces Mirko Navara 1, Pavel Ptak 2 Abstract We ask which logics with a given centre allow for en Quantum logics with given centres and variable state spaces Mirko Navara 1, Pavel Ptak 2 Abstract We ask which logics with a given centre allow for enlargements with an arbitrary state space. We show in

More information

PANEL DATA RANDOM AND FIXED EFFECTS MODEL. Professor Menelaos Karanasos. December Panel Data (Institute) PANEL DATA December / 1

PANEL DATA RANDOM AND FIXED EFFECTS MODEL. Professor Menelaos Karanasos. December Panel Data (Institute) PANEL DATA December / 1 PANEL DATA RANDOM AND FIXED EFFECTS MODEL Professor Menelaos Karanasos December 2011 PANEL DATA Notation y it is the value of the dependent variable for cross-section unit i at time t where i = 1,...,

More information

2 Section 2 However, in order to apply the above idea, we will need to allow non standard intervals ('; ) in the proof. More precisely, ' and may gene

2 Section 2 However, in order to apply the above idea, we will need to allow non standard intervals ('; ) in the proof. More precisely, ' and may gene Introduction 1 A dierential intermediate value theorem by Joris van der Hoeven D pt. de Math matiques (B t. 425) Universit Paris-Sud 91405 Orsay Cedex France June 2000 Abstract Let T be the eld of grid-based

More information

LINEAR EQUATIONS WITH UNKNOWNS FROM A MULTIPLICATIVE GROUP IN A FUNCTION FIELD. To Professor Wolfgang Schmidt on his 75th birthday

LINEAR EQUATIONS WITH UNKNOWNS FROM A MULTIPLICATIVE GROUP IN A FUNCTION FIELD. To Professor Wolfgang Schmidt on his 75th birthday LINEAR EQUATIONS WITH UNKNOWNS FROM A MULTIPLICATIVE GROUP IN A FUNCTION FIELD JAN-HENDRIK EVERTSE AND UMBERTO ZANNIER To Professor Wolfgang Schmidt on his 75th birthday 1. Introduction Let K be a field

More information

Average Reward Parameters

Average Reward Parameters Simulation-Based Optimization of Markov Reward Processes: Implementation Issues Peter Marbach 2 John N. Tsitsiklis 3 Abstract We consider discrete time, nite state space Markov reward processes which depend

More information

Economics 620, Lecture 18: Nonlinear Models

Economics 620, Lecture 18: Nonlinear Models Economics 620, Lecture 18: Nonlinear Models Nicholas M. Kiefer Cornell University Professor N. M. Kiefer (Cornell University) Lecture 18: Nonlinear Models 1 / 18 The basic point is that smooth nonlinear

More information

When is it really justifiable to ignore explanatory variable endogeneity in a regression model?

When is it really justifiable to ignore explanatory variable endogeneity in a regression model? Discussion Paper: 2015/05 When is it really justifiable to ignore explanatory variable endogeneity in a regression model? Jan F. Kiviet www.ase.uva.nl/uva-econometrics Amsterdam School of Economics Roetersstraat

More information

and the polynomial-time Turing p reduction from approximate CVP to SVP given in [10], the present authors obtained a n=2-approximation algorithm that

and the polynomial-time Turing p reduction from approximate CVP to SVP given in [10], the present authors obtained a n=2-approximation algorithm that Sampling short lattice vectors and the closest lattice vector problem Miklos Ajtai Ravi Kumar D. Sivakumar IBM Almaden Research Center 650 Harry Road, San Jose, CA 95120. fajtai, ravi, sivag@almaden.ibm.com

More information

ε ε

ε ε The 8th International Conference on Computer Vision, July, Vancouver, Canada, Vol., pp. 86{9. Motion Segmentation by Subspace Separation and Model Selection Kenichi Kanatani Department of Information Technology,

More information

ECONOMETRICS. Bruce E. Hansen. c2000, 2001, 2002, 2003, University of Wisconsin

ECONOMETRICS. Bruce E. Hansen. c2000, 2001, 2002, 2003, University of Wisconsin ECONOMETRICS Bruce E. Hansen c2000, 200, 2002, 2003, 2004 University of Wisconsin www.ssc.wisc.edu/~bhansen Revised: January 2004 Comments Welcome This manuscript may be printed and reproduced for individual

More information

Quantile methods. Class Notes Manuel Arellano December 1, Let F (r) =Pr(Y r). Forτ (0, 1), theτth population quantile of Y is defined to be

Quantile methods. Class Notes Manuel Arellano December 1, Let F (r) =Pr(Y r). Forτ (0, 1), theτth population quantile of Y is defined to be Quantile methods Class Notes Manuel Arellano December 1, 2009 1 Unconditional quantiles Let F (r) =Pr(Y r). Forτ (0, 1), theτth population quantile of Y is defined to be Q τ (Y ) q τ F 1 (τ) =inf{r : F

More information

Comprehensive Definitions of Breakdown-Points for Independent and Dependent Observations

Comprehensive Definitions of Breakdown-Points for Independent and Dependent Observations TI 2000-40/2 Tinbergen Institute Discussion Paper Comprehensive Definitions of Breakdown-Points for Independent and Dependent Observations Marc G. Genton André Lucas Tinbergen Institute The Tinbergen Institute

More information

On the asymptotic distribution of the Moran I test statistic withapplications

On the asymptotic distribution of the Moran I test statistic withapplications Journal of Econometrics 104 (2001) 219 257 www.elsevier.com/locate/econbase On the asymptotic distribution of the Moran I test statistic withapplications Harry H. Kelejian, Ingmar R. Prucha Department

More information

ELEMENTARY LINEAR ALGEBRA WITH APPLICATIONS. 1. Linear Equations and Matrices

ELEMENTARY LINEAR ALGEBRA WITH APPLICATIONS. 1. Linear Equations and Matrices ELEMENTARY LINEAR ALGEBRA WITH APPLICATIONS KOLMAN & HILL NOTES BY OTTO MUTZBAUER 11 Systems of Linear Equations 1 Linear Equations and Matrices Numbers in our context are either real numbers or complex

More information

1 Lyapunov theory of stability

1 Lyapunov theory of stability M.Kawski, APM 581 Diff Equns Intro to Lyapunov theory. November 15, 29 1 1 Lyapunov theory of stability Introduction. Lyapunov s second (or direct) method provides tools for studying (asymptotic) stability

More information

Economics 241B Review of Limit Theorems for Sequences of Random Variables

Economics 241B Review of Limit Theorems for Sequences of Random Variables Economics 241B Review of Limit Theorems for Sequences of Random Variables Convergence in Distribution The previous de nitions of convergence focus on the outcome sequences of a random variable. Convergence

More information

Recent Advances in the Field of Trade Theory and Policy Analysis Using Micro-Level Data

Recent Advances in the Field of Trade Theory and Policy Analysis Using Micro-Level Data Recent Advances in the Field of Trade Theory and Policy Analysis Using Micro-Level Data July 2012 Bangkok, Thailand Cosimo Beverelli (World Trade Organization) 1 Content a) Classical regression model b)

More information

Generalized Method of Moments (GMM) Estimation

Generalized Method of Moments (GMM) Estimation Econometrics 2 Fall 2004 Generalized Method of Moments (GMM) Estimation Heino Bohn Nielsen of29 Outline of the Lecture () Introduction. (2) Moment conditions and methods of moments (MM) estimation. Ordinary

More information

Greene, Econometric Analysis (7th ed, 2012)

Greene, Econometric Analysis (7th ed, 2012) EC771: Econometrics, Spring 2012 Greene, Econometric Analysis (7th ed, 2012) Chapters 2 3: Classical Linear Regression The classical linear regression model is the single most useful tool in econometrics.

More information

Stochastic dominance with imprecise information

Stochastic dominance with imprecise information Stochastic dominance with imprecise information Ignacio Montes, Enrique Miranda, Susana Montes University of Oviedo, Dep. of Statistics and Operations Research. Abstract Stochastic dominance, which is

More information

[y i α βx i ] 2 (2) Q = i=1

[y i α βx i ] 2 (2) Q = i=1 Least squares fits This section has no probability in it. There are no random variables. We are given n points (x i, y i ) and want to find the equation of the line that best fits them. We take the equation

More information

Specification testing in panel data models estimated by fixed effects with instrumental variables

Specification testing in panel data models estimated by fixed effects with instrumental variables Specification testing in panel data models estimated by fixed effects wh instrumental variables Carrie Falls Department of Economics Michigan State Universy Abstract I show that a handful of the regressions

More information

Generalized Multivariate Rank Type Test Statistics via Spatial U-Quantiles

Generalized Multivariate Rank Type Test Statistics via Spatial U-Quantiles Generalized Multivariate Rank Type Test Statistics via Spatial U-Quantiles Weihua Zhou 1 University of North Carolina at Charlotte and Robert Serfling 2 University of Texas at Dallas Final revision for

More information

On Coarse Geometry and Coarse Embeddability

On Coarse Geometry and Coarse Embeddability On Coarse Geometry and Coarse Embeddability Ilmari Kangasniemi August 10, 2016 Master's Thesis University of Helsinki Faculty of Science Department of Mathematics and Statistics Supervised by Erik Elfving

More information

ROYAL INSTITUTE OF TECHNOLOGY KUNGL TEKNISKA HÖGSKOLAN. Department of Signals, Sensors & Systems

ROYAL INSTITUTE OF TECHNOLOGY KUNGL TEKNISKA HÖGSKOLAN. Department of Signals, Sensors & Systems The Evil of Supereciency P. Stoica B. Ottersten To appear as a Fast Communication in Signal Processing IR-S3-SB-9633 ROYAL INSTITUTE OF TECHNOLOGY Department of Signals, Sensors & Systems Signal Processing

More information