Consistent Parameter Estimation for Conditional Moment Restrictions


Shih-Hsun Hsu, Department of Economics, National Taiwan University
Chung-Ming Kuan, Institute of Economics, Academia Sinica

Preliminary; please do not cite or quote.

Author for correspondence: Chung-Ming Kuan, Institute of Economics, Academia Sinica, Taipei 115, Taiwan. The research support from the National Science Council of the Republic of China (NSC H ) is gratefully acknowledged.

Abstract

In estimating conditional moment restrictions, a well-known difficulty is that the estimator based on a set of implied unconditional moments may lose its consistency when the parameters are not globally identified. In this paper, we consider a continuum of unconditional moments that are equivalent to the postulated conditional moments and project them along the exponential Fourier series. By constructing an objective function using these Fourier coefficients, a consistent GMM estimator is readily obtained. Our estimator compares favorably with that of Domínguez and Lobato (2004, Econometrica) in that our method utilizes the full continuum of unconditional moments. We establish the asymptotic properties of the proposed estimator. It is also shown that an efficient estimator can be derived from the proposed consistent estimator via a Newton-Raphson step. Our simulations demonstrate that the proposed estimator outperforms that of Domínguez and Lobato (2004) in terms of bias, standard error, and mean squared error.

JEL classification: C12, C22

Keywords: conditional moment restrictions, Fourier coefficients, generically comprehensive function, GMM, parameter identification

1 Introduction

Many economic and econometric models can be characterized in terms of conditional moment restrictions. Estimating the parameters in such restrictions thus plays a major role in empirical studies. A typical estimation approach is to find a finite set of unconditional moment restrictions implied by the conditional moment restrictions and apply a suitable estimation method, such as the generalized method of moments (GMM) of Hansen (1982) or the empirical likelihood method of Qin and Lawless (1994). The instrumental-variable estimation method for regression models is a leading example. Yet there also exist estimation methods based directly on the conditional moments, e.g., Ai and Chen (2003) and Kitamura, Tripathi, and Ahn (2004).

A crucial assumption in the former approach is that the parameters in the conditional moment restrictions can be globally identified by the implied, unconditional restrictions. With this assumption, the consistency of an estimator can be easily established under suitable regularity conditions. Much research interest therefore focuses on estimator efficiency, e.g., Chamberlain (1987), Newey (1990, 1993), Carrasco and Florens (2000), and Donald, Imbens, and Newey (2003). Domínguez and Lobato (2004), however, pointed out that the assumption of global identifiability may fail to hold in nonlinear models. It is well known that, except in some special cases, conditional moment restrictions are equivalent to infinitely many unconditional restrictions (Bierens, 1982, 1990; Chamberlain, 1987; Stinchcombe and White, 1998). The identification problem may arise when one arbitrarily selects a finite number of unconditional moments for estimation. Domínguez and Lobato (2004) also demonstrated that this may happen even when the unconditional moments are based on the so-called optimal instruments. When the parameters of interest are not identified, the corresponding (GMM) estimator is inconsistent.
To circumvent the potential non-identification and inconsistency problems, Domínguez and Lobato (2004) proposed a different approach to estimating conditional moment restrictions. They considered a continuum of unconditional moment restrictions that are equivalent to the conditional restrictions, where the unconditional moments depend on instruments generated from the indicator function. There are some disadvantages to their approach. In view of Stinchcombe and White (1998), any generically comprehensively revealing function, such as the exponential and logistic functions, can also induce equivalent unconditional moment restrictions. Compared with these functions,

the indicator function, which takes only the values one and zero, may not represent the information in the conditioning variables well. Moreover, Domínguez and Lobato (2004) did not utilize the full continuum of unconditional restrictions but included only a finite number of them in estimation. This may result in further efficiency loss, as argued by Carrasco and Florens (2000).

In this paper, we propose a new approach for consistent estimation of conditional moment restrictions. We also consider a continuum of unconditional moments that are equivalent to the postulated conditional moments. Instead of dealing with these moments directly, we project them along the exponential Fourier series. By constructing an objective function using the resulting Fourier coefficients, a consistent GMM estimator is readily obtained. Because all unconditional moments are incorporated into each Fourier coefficient, our estimation method in effect utilizes all possible information. Moreover, the remote Fourier coefficients provide little information on the parameters of interest and may be excluded from the objective function. The objective function is hence relatively simple while still involving the full continuum of moment conditions. By choosing the exponential function to generate instruments, the resulting unconditional moments and the Fourier coefficients have analytic forms. These properties greatly facilitate estimation in practice. Compared with Carrasco and Florens (2000), our method is also simpler because we do not require preliminary estimation of the eigenvalues and eigenfunctions of a covariance operator, nor do we need to determine a regularization parameter. We show that, under quite general conditions, the proposed estimator is consistent and asymptotically normally distributed.
It is also shown that an efficient estimator is readily obtained from the proposed consistent estimator via a Newton-Raphson step, as in, e.g., Newey (1990) and Domínguez and Lobato (2004). Our simulations demonstrate that, under various settings, the proposed estimator compares favorably with that of Domínguez and Lobato (2004) in terms of bias, standard error and mean squared error.

This paper is organized as follows. The proposed approach is introduced in Section 2. The asymptotic properties of the proposed estimator are established in Section 3. We report Monte Carlo simulation results in Section 4 and discuss some extensions of the proposed method in Section 5. Section 6 concludes this paper. All proofs are deferred to the Appendix.

2 The Proposed Estimation Approach

We are primarily interested in estimating θ_o, the q × 1 vector of unknown parameters, in the following conditional moment restriction:

    IE[h(Y, θ_o) | X] = 0, w.p.1,    (1)

where h is a p × 1 vector of functions of the variable Y (r × 1) and the parameter vector θ (q × 1), and X is an m × 1 vector of conditioning variables. It is well known that (1) is equivalent to the unconditional moment restrictions:

    IE[h(Y, θ_o) f(X)] = 0,    (2)

for all measurable functions f. Here, each f(X) may be interpreted as an instrument that helps to identify θ_o. Clearly, estimation based on all measurable functions of X is practically infeasible. It is thus typical to select a finite number of unconditional moments in (2) and estimate θ_o using, say, the GMM method. However, Domínguez and Lobato (2004) showed that if these restrictions are not chosen properly, there is no guarantee that θ_o can be identified globally. This identification problem in turn renders the associated GMM estimator inconsistent.

In light of Bierens (1982, 1990) and Stinchcombe and White (1998), the parameters of interest can be identified when the unconditional moments are based on a class of instruments that can span an appropriate space of functions of X. Domínguez and Lobato (2004) chose the indicator function to generate a continuum of instruments in (2), i.e.,

    f(X) = 1(X ≤ τ) = ∏_{j=1}^{m} 1(X_j ≤ τ_j),

with 1(B) the indicator function of the event B. The indicator function is not the only choice of functions that can generate the desired instruments; any generically comprehensively revealing function as defined in Stinchcombe and White (1998) will also do. Let A denote the affine transformation such that A(X, τ) = τ_0 + ∑_{j=1}^{m} X_j τ_j. Stinchcombe and White (1998) showed that, for a real analytic function G that is not a polynomial, (1) is equivalent to

    IE[h(Y, θ_o) G(A(X, τ))] = 0, for almost all τ ∈ T ⊂ R^{m+1},    (3)

where T has a nonempty interior and could be a small set in R^{m+1}. Observe that (3) is a continuum of unconditional moment restrictions indexed by the parameter τ in T; cf. (2). In particular, the function G may be the exponential function (Bierens, 1982, 1990) or the logistic function (White, 1989).

In this paper, we propose an alternative way of constructing a consistent estimator of θ_o that improves on the approach of Domínguez and Lobato (2004). We first consider the case where X is univariate, denoted as X. Without loss of generality, we transform X to a bounded random variable using a generalized logistic mapping:

    X̃ = exp(X) / (c + exp(X)), c > 0,

so that 0 < X̃ < 1. Because this mapping is one-to-one, the conditional moment restriction (1) is equivalent to

    IE[h(Y, θ_o) | X̃] = 0, w.p.1;    (4)

see, e.g., Bierens (1994, Theorem 3.2.1). For a real function G that is analytic but not a polynomial, Stinchcombe and White (1998) showed that the unconditional restriction equivalent to (4) is

    IE[h(Y, θ_o) G(A(X̃, τ))] = 0, for almost all τ ∈ T ⊂ R².    (5)

Then, θ_o can be globally identified via an L²-norm of (5):

    θ_o = argmin_{θ∈Θ} ‖IE[h(Y, θ) G(A(X̃, ·))]‖²_{L²}
        = argmin_{θ∈Θ} ∫_T ‖IE[h(Y, θ) G(A(X̃, τ))]‖² dP(τ),    (6)

where P(τ) is a distribution function of τ on T. For the choice of G, we set G(A(X̃, τ)) = exp(A(X̃, τ)), as in Bierens (1982, 1990). This choice has some advantages over the indicator function. First, while the indicator function takes only the values one and zero, the exponential function is more flexible and hence may better represent the information in the conditioning variables. That is, the exponential function may generate better instruments for identifying θ_o. Second, the exponential function is generically comprehensively revealing, but the indicator function is not. By the genericity property, we can focus on the instruments indexed by τ in a small set T. By contrast, one must deal with the instruments generated

by the indicator function for all possible τ ∈ R. Moreover, in the class of generically comprehensively revealing functions, exp(A(X̃, τ)) and exp(τ X̃) differ only by a constant and hence play the same role in function approximation (Stinchcombe and White, 1998). Choosing the exponential function thus reduces the dimension of integration in (6) by one, i.e., T ⊂ R. As far as implementation is concerned, the exponential function results in an analytic form for the objective function, which greatly facilitates estimation in practice.

Apart from the choice of the instrument-generating function G, our approach differs from that of Domínguez and Lobato (2004) in an important respect. Instead of directly working on the unconditional moments, we project IE[h(Y, θ) exp(τ X̃)] along the exponential Fourier series:

    IE[h(Y, θ) exp(τ X̃)] = ∑_{k=−∞}^{∞} C_k(θ) exp(ikτ),

where C_k(θ) is a Fourier coefficient such that

    C_k(θ) = (1/2π) ∫_{−π}^{π} IE[h(Y, θ) exp(τ X̃)] exp(−ikτ) dτ
           = (1/2π) IE[ h(Y, θ) ∫_{−π}^{π} exp(τ X̃) exp(−ikτ) dτ ],  k = 0, ±1, ±2, ...

Note that each C_k(θ) incorporates the continuum of the original instruments exp(τ X̃) into a new instrument with the following analytic form:

    φ_k(X̃) = (1/√(2π)) ∫_{−π}^{π} exp(τ X̃) exp(−ikτ) dτ
            = (1/√(2π)) (−1)^k [exp(π X̃) − exp(−π X̃)] / (X̃ − ik)
            = (−1)^k 2 sinh(π X̃) / (√(2π) (X̃ − ik)),

where sinh(w) = (exp(w) − exp(−w))/2. Setting dP(τ) = dτ and T = [−π, π] in (6), we have from Parseval's theorem that

    ‖IE[h(Y, θ) exp(· X̃)]‖²_{L²} = 2π ∑_{k=−∞}^{∞} ‖C_k(θ)‖² = ∑_{k=−∞}^{∞} ‖IE[h(Y, θ) φ_k(X̃)]‖².

It follows from (6) that θ_o can be globally identified as

    θ_o = argmin_{θ∈Θ} ∑_{k=−∞}^{∞} ‖IE[h(Y, θ) φ_k(X̃)]‖².    (7)
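The analytic form of φ_k can be verified numerically. The sketch below is our own Python illustration, not part of the paper; in particular, the 1/√(2π) normalization is our reading of the Parseval argument above. It compares the closed form against brute-force quadrature of the defining integral:

```python
import numpy as np

def phi_k(x_tilde, k):
    # Closed form of (1/sqrt(2*pi)) * integral_{-pi}^{pi} exp(tau*x) * exp(-i*k*tau) dtau,
    # i.e. phi_k(x) = (-1)^k * 2*sinh(pi*x) / (sqrt(2*pi) * (x - i*k)).
    x = np.asarray(x_tilde, dtype=float)
    return ((-1.0) ** k) * 2.0 * np.sinh(np.pi * x) / (np.sqrt(2.0 * np.pi) * (x - 1j * k))

def phi_k_quadrature(x_tilde, k, n=20001):
    # Trapezoidal quadrature of the defining integral, for cross-checking.
    tau = np.linspace(-np.pi, np.pi, n)
    w = np.full(n, tau[1] - tau[0])
    w[0] *= 0.5
    w[-1] *= 0.5
    integrand = np.exp(tau * x_tilde - 1j * k * tau)
    return (w * integrand).sum() / np.sqrt(2.0 * np.pi)
```

For values of x̃ in (0, 1) and small |k|, the two agree to about four decimal places with this grid, which supports the closed-form expression.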

That is, θ_o is identified using a countable collection of unconditional moments. By Bessel's inequality, C_k(θ) → 0 as |k| tends to infinity, so that the moment condition IE[h(Y, θ) φ_k(X̃)] becomes less informative for identifying θ_o when |k| is large. This suggests that there is no need to include all the Fourier coefficients in estimating θ_o. As such, we may replace IE[h(Y, θ) φ_k(X̃)] with its sample counterpart and compute an estimator of θ_o as

    θ̂(K_T) = argmin_{θ∈Θ} ∑_{k=−K_T}^{K_T} ‖ (1/T) ∑_{t=1}^{T} h(y_t, θ) φ_k(x̃_t) ‖²
            = argmin_{θ∈Θ} ∑_{k=−K_T}^{K_T} ‖ (1/T) ∑_{t=1}^{T} h(y_t, θ) (−1)^k 2 sinh(π x̃_t) / (√(2π) (x̃_t − ik)) ‖²,    (8)

where K_T depends on T but is less than T, and y_t and x̃_t are the realizations of Y and X̃, respectively. This is in fact a GMM estimator based on (2K_T + 1) unconditional moments and the identity weighting matrix. The estimator θ̂(K_T) can now be computed quite straightforwardly using a numerical optimization algorithm.

By choosing the indicator function to generate the desired instruments, Domínguez and Lobato (2004) showed that θ_o can be identified as

    θ_o = argmin_{θ∈Θ} ‖IE[h(Y, θ) 1(X ≤ ·)]‖²_{L²}
        = argmin_{θ∈Θ} ∫_R ‖IE[h(Y, θ) 1(X ≤ τ)]‖² dP(τ),

where P(τ) is a distribution function of τ on R; cf. (6) and (7). Setting P(τ) as P_X(τ), the distribution function of X, it is easy to obtain the sample counterpart of the L²-norm above. The estimator proposed by Domínguez and Lobato (2004) is then

    θ̂_DL(T) = argmin_{θ∈Θ} ∑_{k=1}^{T} ( (1/T) ∑_{t=1}^{T} h(y_t, θ) 1(x_t ≤ τ_k) )²,    (9)

where τ_k = x_k, k = 1, ..., T, are just the sample observations. This is also a GMM estimator, based on T unconditional moments induced by the indicator function. Note that when P_X(τ) is used in the L²-norm, θ̂_DL(T) utilizes only a finite number of the unconditional moments. By contrast, the proposed estimator (8) depends on a few new instruments φ_k, yet each φ_k utilizes the full continuum of the original instruments required for identifying θ_o.
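To make the comparison concrete, the following sketch (our Python illustration; the function names, the scalar-residual setup, and the choice c = 1 in the logistic mapping are ours) implements the two sample objective functions: (8) with the Fourier instruments and (9) with the indicator instruments.

```python
import numpy as np

def phi_k(x_tilde, k):
    # Analytic Fourier instrument: (-1)^k * 2*sinh(pi*x) / (sqrt(2*pi)*(x - i*k)).
    return ((-1.0) ** k) * 2.0 * np.sinh(np.pi * x_tilde) / (np.sqrt(2.0 * np.pi) * (x_tilde - 1j * k))

def q_fourier(theta, y, x, h, K_T=5):
    # Objective (8): sum over k = -K_T..K_T of |T^{-1} sum_t h(y_t, theta) phi_k(x_tilde_t)|^2,
    # with x first mapped to the bounded x_tilde = exp(x)/(1 + exp(x)).
    x_tilde = np.exp(x) / (1.0 + np.exp(x))
    res = h(y, x, theta)  # scalar residual per observation
    return sum(abs(np.mean(res * phi_k(x_tilde, k))) ** 2 for k in range(-K_T, K_T + 1))

def q_indicator(theta, y, x, h):
    # Objective (9): sum over k = 1..T of (T^{-1} sum_t h(y_t, theta) 1(x_t <= x_k))^2.
    res = h(y, x, theta)
    ind = (x[None, :] <= x[:, None]).astype(float)  # ind[k, t] = 1(x_t <= x_k)
    return np.sum((ind @ res / len(y)) ** 2)
```

With, say, h(y, x, θ) = y − θ²x − θx², both objectives are near zero at θ_o and grow at distant parameter values, which is what a numerical optimizer exploits.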

Carrasco and Florens (2000) also proposed an efficient GMM estimation method based on a continuum of unconditional moments. Adapted to the current context, their approach amounts to projecting the continuum of the induced unconditional moments along the eigenfunctions of a covariance operator. This approach thus requires preliminary estimation of a covariance operator and its eigenvalues and eigenfunctions. Also, to ensure the invertibility of the estimated covariance operator, a regularization parameter must be set by researchers. As a result, their method is not easy to implement and may be arbitrary in practice.

3 Asymptotic Properties

In this section, we establish the asymptotic properties of the proposed estimator θ̂(K_T). For notational simplicity, we define

    μ_k(θ) = IE[h(Y, θ) φ_k(X̃)] and m_{T,k}(θ) = (1/T) ∑_{t=1}^{T} h(y_t, θ) φ_k(x̃_t).

When integration and differentiation can be interchanged, let

    ∇_θ μ_k(θ) = IE[∇_θ h(Y, θ) φ_k(X̃)], ∇_θ m_{T,k}(θ) = (1/T) ∑_{t=1}^{T} ∇_θ h(y_t, θ) φ_k(x̃_t).

Our maintained assumption is: θ_o is the unique solution to IE[h(Y, θ) | X̃] = 0. We impose the following conditions, which are quite standard in the GMM literature, and establish a lemma concerning the limiting behavior of the sum of ‖m_{T,k}(θ) − μ_k(θ)‖².

[A1] The observed data (y_t, x_t), t = 1, ..., T, are independent realizations of (Y, X).

[A2] Θ ⊂ R^q is compact such that for each θ ∈ Θ, h(·, θ) is measurable, and for each y ∈ R^r, h(y, ·) is continuous on Θ.

[A3] IE[sup_{θ∈Θ} ‖h(Y, θ)‖²] < ∞.

Lemma 3.1 Suppose that [A1]–[A3] hold. Then for any ε > 0,

    IP( ∑_{k=−K_T}^{K_T} ‖m_{T,k}(θ) − μ_k(θ)‖² ≥ ε ) ≤ Δ (2K_T + 1) / (T ε),

uniformly in θ ∈ Θ, where Δ is some real number.

This result shows that, as long as K_T = o(T),

    ∑_{k=−K_T}^{K_T} ‖m_{T,k}(θ) − μ_k(θ)‖² →_IP 0,

and hence K_T → ∞ implies, uniformly for all θ in Θ,

    ∑_{k=−K_T}^{K_T} ‖m_{T,k}(θ)‖² →_IP ∑_{k=−∞}^{∞} ‖μ_k(θ)‖².    (10)

Given the maintained assumption, θ_o = argmin_{θ∈Θ} ∑_{k=−∞}^{∞} ‖μ_k(θ)‖², and

    θ̂(K_T) = argmin_{θ∈Θ} ∑_{k=−K_T}^{K_T} ‖m_{T,k}(θ)‖².    (11)

The consistency result now follows from (10) and Theorem 2.1 of Newey and McFadden (1994).

Theorem 3.2 (Consistency) Given [A1]–[A3], suppose that K_T → ∞ as T → ∞ such that K_T = o(T). Then θ̂(K_T) →_IP θ_o.

In what follows, for a complex number (function) f, let f̄ denote its complex conjugate and Re(f) and Im(f) denote its real and imaginary parts, respectively. For a vector of complex functions f, its complex conjugate, real part and imaginary part are defined elementwise. For two p × 1 vectors of functions f and g in L²[−π, π], their inner product is defined as

    ⟨f, g⟩ = ∫_{−π}^{π} f(τ)' ḡ(τ) dτ = ∑_{i=1}^{p} ∫_{−π}^{π} f_i(τ) ḡ_i(τ) dτ.

Given (11), θ̂(K_T) satisfies the first-order condition:

    0 = ∑_{k=−K_T}^{K_T} [ ∇_θ m_{T,k}(θ̂(K_T))' m̄_{T,k}(θ̂(K_T)) + ∇_θ m̄_{T,k}(θ̂(K_T))' m_{T,k}(θ̂(K_T)) ]
      = ∑_{k=−K_T}^{K_T} 2 Re( ∇_θ m_{T,k}(θ̂(K_T))' m̄_{T,k}(θ̂(K_T)) ).

A mean-value expansion of m_{T,k}(θ̂(K_T)) about θ_o gives

    m_{T,k}(θ̂(K_T)) = m_{T,k}(θ_o) + ∇_θ m_{T,k}(θ†)(θ̂(K_T) − θ_o),

where θ† is on the line segment joining θ̂(K_T) and θ_o and may differ across the rows of ∇_θ m_{T,k}(θ†). It follows that

    ∑_{k=−K_T}^{K_T} 2 Re( ∇_θ m_{T,k}(θ̂(K_T))' [ m̄_{T,k}(θ_o) + ∇_θ m̄_{T,k}(θ†)(θ̂(K_T) − θ_o) ] ) = 0.    (12)

Passing to the limit, we have, by the multiplication theorem (e.g., Stuart, 1961),

    ∑_{k=−K_T}^{K_T} Re( ∇_θ m_{T,k}(θ_o)' ∇_θ m̄_{T,k}(θ_o) ) →_IP ∑_{k=−∞}^{∞} Re( ∇_θ μ_k(θ_o)' ∇_θ μ̄_k(θ_o) ),
    ∑_{k=−K_T}^{K_T} Re( ∇_θ m_{T,k}(θ_o)' √T m̄_{T,k}(θ_o) ) = ∑_{k=−∞}^{∞} Re( ∇_θ μ_k(θ_o)' √T m̄_{T,k}(θ_o) ) + o_IP(1).

The normalized estimator can then be expressed as

    √T (θ̂(K_T) − θ_o) = −[ ∑_{k=−∞}^{∞} Re( ∇_θ μ_k(θ_o)' ∇_θ μ̄_k(θ_o) ) ]^{−1} [ ∑_{k=−∞}^{∞} Re( ∇_θ μ_k(θ_o)' √T m̄_{T,k}(θ_o) ) ] + o_IP(1).

Asymptotic normality then follows if the term in the first brackets converges to a finite, positive definite matrix and the term in the second brackets obeys a central limit theorem. We impose the following conditions.

[A4] θ_o is in the interior of Θ.

[A5] For each y, h(y, ·) is continuously differentiable in a neighborhood N of θ_o such that IE[sup_{θ∈N} ‖∇_θ h(Y, θ)‖²] < ∞, where ‖·‖ is a matrix norm.

[A6] The q × q matrix M_q, with (i, j)-th element

    ⟨ IE[∇_{θ_i} h(Y, θ_o) exp(· X̃)], IE[∇_{θ_j} h(Y, θ_o) exp(· X̃)] ⟩,

is symmetric and positive definite.

[A7] As T goes to infinity,

    (1/√T) ∑_{t=1}^{T} h(y_t, θ_o) exp(· x̃_t) →_D Z,    (13)

where Z is a Gaussian random element in L²[−π, π] that has a zero mean and the covariance operator K such that

    Kf(τ) = IE[ ⟨f, h(Y, θ_o) exp(· X̃)⟩ h(Y, θ_o) exp(τ X̃) ],

for any f (p × 1) in L²[−π, π].

The covariance operator K in [A7] can be expressed as

    Kf(τ) = ( ∑_{i=1}^{p} ∫_{−π}^{π} κ_{ji}(τ, s) f_i(s) ds )_{j=1,...,p},

with the kernel

    κ_{ji}(τ, s) = IE[ h_j(Y, θ_o) exp(τ X̃) h_i(Y, θ_o) exp(s X̃) ], j, i = 1, ..., p.

It can also be seen that ⟨Z, f⟩ has a normal distribution with mean zero and variance ⟨Kf, f⟩. Note that the functional convergence result in [A7] is the same as Assumption 11 in Carrasco and Florens (2000). One may, of course, impose more primitive conditions on h and the data that ensure such a convergence result. For example, [A7] would hold when h(y_t, θ_o) exp(· x̃_t) are i.i.d. with bounded second moment. More discussions and results on functional convergence can be found in Chen and White (1998) and Carrasco and Florens (2000).

Let Ω_q denote a q × q matrix with (i, j)-th element

    ⟨ IE[∇_{θ_i} h(Y, θ_o) exp(· X̃)], K IE[∇_{θ_j} h(Y, θ_o) exp(· X̃)] ⟩.

We are now ready to present the limiting distribution of the normalized estimator.

Theorem 3.3 (Asymptotic Normality) Given [A1]–[A7], suppose that K_T → ∞ as T → ∞ such that K_T = o(T^{1/2}). Then

    √T (θ̂(K_T) − θ_o) →_D N(0, V),

where V = M_q^{−1} Ω_q M_q^{−1}.
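As a quick numerical illustration of the consistency result in Theorem 3.2, the following sketch (our Python illustration; c = 1 in the logistic mapping, and a simple grid search stands in for the numerical optimizer used in the paper) applies the estimator (8) with K_T = 5 to the design Y = θ_o²X + θ_oX² + ε of Section 4, with θ_o = 5/4. For a moderately large sample, the minimizer of the sample objective lands near θ_o.

```python
import numpy as np

def phi_k(x_tilde, k):
    # Analytic Fourier instrument: (-1)^k * 2*sinh(pi*x) / (sqrt(2*pi)*(x - i*k)).
    return ((-1.0) ** k) * 2.0 * np.sinh(np.pi * x_tilde) / (np.sqrt(2.0 * np.pi) * (x_tilde - 1j * k))

def estimate(y, x, K_T=5, grid=np.linspace(-3.0, 3.0, 1201)):
    # Grid-search version of estimator (8) for h(Y, theta) = Y - theta^2*X - theta*X^2.
    x_tilde = np.exp(x) / (1.0 + np.exp(x))
    P = np.array([phi_k(x_tilde, k) for k in range(-K_T, K_T + 1)])  # (2*K_T+1, T)
    res = y[None, :] - grid[:, None] ** 2 * x[None, :] - grid[:, None] * x[None, :] ** 2
    m = res @ P.T / len(y)  # sample moments, one row per grid point
    return grid[np.argmin(np.sum(np.abs(m) ** 2, axis=1))]

rng = np.random.default_rng(0)
theta_o, T = 1.25, 2000
x = rng.standard_normal(T)
y = theta_o ** 2 * x + theta_o * x ** 2 + rng.standard_normal(T)
theta_hat = estimate(y, x)  # close to theta_o = 1.25 at this sample size
```

The grid search is only for transparency; any standard optimizer applied to the same objective serves equally well.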

4 Simulations

In this section we examine the finite-sample performance of the proposed estimator via Monte Carlo simulations. We compare the performance of the nonlinear least squares (NLS) estimator,

    θ̂_NLS = argmin_{θ∈Θ} (1/T) ∑_{t=1}^{T} h(y_t, θ)²,

the estimator θ̂_DL of Domínguez and Lobato (2004) in (9), and the proposed estimator θ̂(K_T) in (8) with K_T = 5. We consider three performance criteria: bias, standard error (SE), and mean squared error (MSE). In all experiments, we compute the parameter estimates using the GAUSS optimization procedure OPTMUM with the BFGS algorithm; for each optimization, three initial values are randomly drawn from the standard normal distribution. The number of replications is ... It should be noted that neither θ̂_DL nor θ̂(K_T) is an efficient estimator here.

4.1 The Experiment in Domínguez and Lobato (2004)

Following Domínguez and Lobato (2004), we consider a simple nonlinear model:

    Y = θ_o² X + θ_o X² + ε, ε ∼ N(0, 1),

where θ_o = 5/4. The associated conditional moment restriction is IE[ε | X] = 0. In this case, in addition to θ̂_NLS, θ̂_DL and θ̂(K_T), we also consider the GMM estimator with the feasible optimal instrument 2θX + X²:

    θ̂_OPIV = argmin_{θ∈Θ} ( (1/T) ∑_{t=1}^{T} (y_t − θ²x_t − θx_t²)(2θx_t + x_t²) )².

We consider two cases, in which X follows the N(0, 1) and the N(1, 1) distribution, respectively. Domínguez and Lobato (2004) showed that in the latter case, θ_o cannot be identified using the feasible optimal instrument because two other real values of θ also satisfy the implied, unconditional moment restriction. When X is N(0, 1), θ_o = 5/4 is the only real solution to the unconditional moment restriction implied by the feasible optimal instrument (the other two solutions are complex). The simulation results are summarized in Table 1. It can be seen that in both cases, the NLS estimator is the best estimator in terms of all three criteria, and θ̂_OPIV has the worst performance, with severe biases and larger standard errors. As for the comparison between θ̂_DL and θ̂(K_T): when X ∼ N(1, 1), the performance of θ̂_DL is dominated by that of θ̂(K_T) under all three criteria; when X ∼ N(0, 1), θ̂(K_T) outperforms θ̂_DL under all three criteria only for the samples T = 50 and 100.

4.2 Model with an Endogenous Explanatory Variable

We extend the previous experiment to the case where the explanatory variable is endogenous. The model specification is:

    Y = θ_o² Z + θ_o Z² + ε,  Z = X + ν,  (ε, ν)' ∼ N(0, [1 ρ; ρ 1]),

where θ_o = 5/4, and X ∼ N(0, 1) is independent of ε and ν. In this specification, X can be treated as an instrumental variable for Z and satisfies the conditional moment restriction IE[ε | X] = 0. In Table 2, we summarize the results for ρ = 0.01, 0.1, 0.3, 0.5, 0.7, 0.9 and the sample sizes 50, 100 and 200. It can be seen that the NLS estimator has more severe bias (and smaller SE) when ρ becomes larger. It is also quite remarkable that the proposed estimator θ̂(K_T) outperforms θ̂_DL under all three criteria for every ρ and every sample size. The bias of θ̂(K_T) is also less than that of θ̂_NLS in almost all cases, except when ρ = 0.01 and T = 100. Moreover, the proposed estimator may outperform the NLS estimator in terms of MSE when ρ is not too small. For example, when ρ ≥ 0.3, θ̂(K_T) has a smaller MSE than the NLS estimator for T = 100 and 200.

4.3 Noisy Disturbance

In this experiment, we examine the effect of the disturbance variance on the performance of the various estimators. The model is again

    Y = θ_o² X + θ_o X² + ε,  ε ∼ N(0, σ²),

where θ_o = 5/4, X is a uniform random variable on (−1, 1), and σ² = 0.1, 1, 4, 9, and 16. The results are summarized in Table 3. We first observe that the bias, SE and MSE of these estimators all increase as σ² increases. In all cases, the proposed estimator has the smallest bias, while the estimator of Domínguez and Lobato (2004) has the largest bias. In terms of the MSE, the proposed estimator in general dominates θ̂_DL and may also outperform the NLS estimator when σ² is not too large. For example, for the sample T = 100, the proposed estimator has a smaller MSE than does the NLS estimator (θ̂_DL) when σ² ≤ 1 (σ² ≤ 9). For T = 200, the proposed estimator has a smaller MSE than does the NLS estimator when σ² ≤ 4 and dominates θ̂_DL in all cases.

4.4 The Proposed Estimator with Various K_T

In this experiment, we examine how the choice of K_T may affect the performance of θ̂(K_T). The model specification is the same as that in Section 4.2, where the explanatory variable is endogenous. We consider the cases that ρ equals 0.01, 0.1, 0.5 and 0.9, and the samples T = 50, 100, 200 and 400. We simulate θ̂(K_T) with K_T = 1, 2, ..., 10, 15, 20, 30. The results are in Tables 4-1-1 through 4-4-4. For the three performance criteria (bias, SE and MSE), we also calculate their percentage changes when K_T increases. In Table 4-1-1, for instance, when K_T increases from 1 to 2, the bias increases while the SE and MSE decrease. The performance of θ̂_NLS and θ̂_DL is also given for comparison. It is easily seen that, when K_T increases, the SE decreases in all cases, while the bias also decreases in most cases (except in Tables 4-1-1 and 4-3-2). The percentage changes are typically small. In most cases, the percentage change of the bias is less than 0.1% when K_T is greater than 5. These results suggest that, for the proposed estimator, the first few Fourier coefficients indeed contain most of the information for identifying θ_o.

5 Extensions

In this section we discuss two extensions of the previous results. We first discuss how to obtain an efficient estimator from the proposed consistent estimator via a single Newton-Raphson iterative step. We then extend the model with a univariate conditioning variable to one with multiple conditioning variables.

5.1 Efficient Estimator

We follow Newey (1990, 1993) and Domínguez and Lobato (2004) to construct a two-step efficient estimator.
Let Q_T(θ, K_T) be the objective function for the efficient GMM estimator, and let ∇_θ Q_T(θ, K_T) and ∇_θθ Q_T(θ, K_T) be its gradient vector and Hessian matrix, respectively. Since θ̂(K_T) is consistent for θ_o, it will be in a neighborhood of θ_o for sufficiently large T. Therefore, a Newton-Raphson step from θ̂(K_T) will lead to a consistent and efficient estimator:

    θ̂_e(K_T) = θ̂(K_T) − [∇_θθ Q_T(θ̂(K_T), K_T)]^{−1} ∇_θ Q_T(θ̂(K_T), K_T).

Note that this approach is valid provided that θ_o can be identified by the limit of Q_T(θ, K_T).

5.2 Multiple Conditioning Variables

Consider the conditional moment restriction IE[h(Y, θ_o) | X] = 0 with multiple conditioning variables X. This is equivalent to

    IE[h(Y, θ_o) exp(τ'X̃)] = 0, for almost all τ ∈ [−π, π]^m ⊂ R^m.

Let S := {k = [k_1 k_2 ... k_m]' ∈ R^m : k_i = 0, ±1, ±2, ...}. The Fourier representation of the unconditional moment is

    IE[h(Y, θ) exp(τ'X̃)] = ∑_{k∈S} C_k(θ) exp(i k'τ),

where, for k ∈ S,

    C_k(θ) = (1/(2π)^m) ∫_{[−π,π]^m} IE[h(Y, θ) exp(τ'X̃)] exp(−i k'τ) dτ
           = (1/(2π)^{m/2}) IE[ h(Y, θ) φ_{k_1}(X̃_1) φ_{k_2}(X̃_2) ··· φ_{k_m}(X̃_m) ],

with

    φ_{k_j}(X̃_j) = (1/√(2π)) ∫_{−π}^{π} exp(τ_j X̃_j) exp(−i k_j τ_j) dτ_j = (−1)^{k_j} 2 sinh(π X̃_j) / (√(2π) (X̃_j − i k_j)),

as in the model with a univariate conditioning variable. Let φ_k(X̃) = φ_{k_1}(X̃_1) φ_{k_2}(X̃_2) ··· φ_{k_m}(X̃_m). Again by Parseval's theorem, we have

    ‖IE[h(Y, θ) exp(·'X̃)]‖²_{L²} = (2π)^m ∑_{k∈S} ‖C_k(θ)‖² = ∑_{k∈S} ‖IE[h(Y, θ) φ_k(X̃)]‖².

It follows that θ_o can be globally identified as

    θ_o = argmin_{θ∈Θ} ∑_{k∈S} ‖IE[h(Y, θ) φ_k(X̃)]‖².    (14)

Then, a consistent estimator of θ_o can be computed as

    θ̂(K_T) = argmin_{θ∈Θ} ∑_{k∈S(K_T)} ‖ (1/T) ∑_{t=1}^{T} h(y_t, θ) φ_k(x̃_t) ‖²,    (15)

where S(K_T) denotes a finite truncation of S (e.g., |k_j| ≤ K_T for each j).
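The product form of φ_k(X̃) can again be checked numerically. The sketch below is our Python illustration for m = 2 (the normalization follows the univariate case above): it verifies that the product of univariate instruments matches brute-force quadrature of the m-dimensional Fourier integral.

```python
import numpy as np

def phi_k(x_tilde, k):
    # Univariate instrument: (-1)^k * 2*sinh(pi*x) / (sqrt(2*pi)*(x - i*k)).
    return ((-1.0) ** k) * 2.0 * np.sinh(np.pi * x_tilde) / (np.sqrt(2.0 * np.pi) * (x_tilde - 1j * k))

def phi_vec(x_tilde, k):
    # Product instrument phi_k(x~) = prod_j phi_{k_j}(x~_j), for a multi-index k.
    out = 1.0 + 0.0j
    for xj, kj in zip(x_tilde, k):
        out *= phi_k(xj, kj)
    return out

def phi_vec_quadrature(x_tilde, k, n=1201):
    # (1/(2*pi)^{m/2}) * integral over [-pi, pi]^2 of exp(tau'x~ - i k'tau) dtau, for m = 2.
    tau = np.linspace(-np.pi, np.pi, n)
    w = np.full(n, tau[1] - tau[0])
    w[0] *= 0.5
    w[-1] *= 0.5
    t1, t2 = np.meshgrid(tau, tau, indexing="ij")
    integrand = np.exp(t1 * x_tilde[0] + t2 * x_tilde[1] - 1j * (k[0] * t1 + k[1] * t2))
    return (w[:, None] * w[None, :] * integrand).sum() / (2.0 * np.pi)
```

The estimator (15) then simply uses these product instruments for the multi-indices k in the truncated index set, exactly as in the univariate case.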

Appendix

Proof of Lemma 3.1: Let Δ be a generic constant that may differ from case to case, and let

    η_{tk} = h(y_t, θ) φ_k(x̃_t) − IE[h(Y, θ) φ_k(X̃)],

for t = 1, ..., T and k = 0, ±1, ..., ±K_T. First note that as X̃ is bounded, so is sinh(π X̃). Thus φ_0(X̃) is bounded, and with [A3],

    IE[‖η_{t0}‖²] ≤ Δ IE[‖h(Y, θ)‖² |φ_0(X̃)|²] ≤ Δ.

For |k| ≥ 1, it is easy to see that |φ_k(X̃)| ≤ Δ/|k|. It follows that

    IE[‖η_{tk}‖²] ≤ Δ IE[‖h(Y, θ)‖² |φ_k(X̃)|²] ≤ Δ/k².

By [A1], the data are i.i.d., and

    IE[ ∑_{k=−K_T}^{K_T} ‖(1/T) ∑_{t=1}^{T} η_{tk}‖² ] = (1/T²) ∑_{t=1}^{T} IE[‖η_{t0}‖²] + (2/T²) ∑_{k=1}^{K_T} ∑_{t=1}^{T} IE[‖η_{tk}‖²]
        ≤ (Δ/T) (1 + 2 ∑_{k=1}^{K_T} 1/k²) ≤ Δ/T.

By the implication rule and the generalized Chebyshev inequality, we have

    IP( ∑_{k=−K_T}^{K_T} ‖(1/T) ∑_{t=1}^{T} η_{tk}‖² ≥ ε ) ≤ ∑_{k=−K_T}^{K_T} IP( ‖(1/T) ∑_{t=1}^{T} η_{tk}‖² ≥ ε/(2K_T + 1) )
        ≤ ((2K_T + 1)/ε) ∑_{k=−K_T}^{K_T} IE[ ‖(1/T) ∑_{t=1}^{T} η_{tk}‖² ] ≤ Δ (2K_T + 1) / (T ε).

This result holds uniformly in θ because the bound does not depend on θ.

Proof of Theorem 3.2: We have seen from the text that Lemma 3.1 implies

    ∑_{k=−K_T}^{K_T} ‖m_{T,k}(θ)‖² →_IP ∑_{k=−∞}^{∞} ‖μ_k(θ)‖²,

uniformly for all θ in Θ. It follows from Theorem 2.1 of Newey and McFadden (1994) that the minimizer of the left-hand side, θ̂(K_T), must converge in probability to the unique minimizer of the right-hand side, θ_o.

References

Ai, C. and X. Chen (2003). Efficient estimation of models with conditional moment restrictions containing unknown functions, Econometrica, 71, 1795-1843.

Bierens, H. J. (1982). Consistent model specification tests, Journal of Econometrics, 20, 105-134.

Bierens, H. J. (1990). A consistent conditional moment test of functional form, Econometrica, 58, 1443-1458.

Bierens, H. J. (1994). Topics in Advanced Econometrics, New York: Cambridge University Press.

Carrasco, M. and J.-P. Florens (2000). Generalization of GMM to a continuum of moment conditions, Econometric Theory, 16, 797-834.

Chamberlain, G. (1987). Asymptotic efficiency in estimation with conditional moment restrictions, Journal of Econometrics, 34, 305-334.

Chen, X. and H. White (1998). Central limit and functional central limit theorems for Hilbert-valued dependent heterogeneous arrays with applications, Econometric Theory, 14, 260-284.

Domínguez, M. A. and I. N. Lobato (2004). Consistent estimation of models defined by conditional moment restrictions, Econometrica, 72, 1601-1615.

Donald, S. G., G. W. Imbens, and W. K. Newey (2003). Empirical likelihood estimation and consistent tests with conditional moment restrictions, Journal of Econometrics, 117, 55-93.

Hansen, L. P. (1982). Large sample properties of generalized method of moments estimators, Econometrica, 50, 1029-1054.

Kitamura, Y., G. Tripathi, and H. Ahn (2004). Empirical likelihood-based inference in conditional moment restriction models, Econometrica, 72, 1667-1714.

Newey, W. K. (1990). Efficient instrumental variables estimation of nonlinear models, Econometrica, 58, 809-837.

Newey, W. K. (1993). Efficient estimation of models with conditional moment restrictions, in G. Maddala, C. Rao, and H. Vinod (eds.), Handbook of Statistics, Vol. 11, Elsevier Science.

Newey, W. K. and D. McFadden (1994). Large sample estimation and hypothesis testing, in R. Engle and D. McFadden (eds.), Handbook of Econometrics, Vol. 4, Elsevier Science.

Qin, J. and J. Lawless (1994). Empirical likelihood and generalized estimating equations, Annals of Statistics, 22, 300-325.

Stinchcombe, M. B. and H. White (1998). Consistent specification testing with nuisance parameters present only under the alternative, Econometric Theory, 14, 295-325.

Stuart, R. D. (1961). An Introduction to Fourier Analysis, New York: Halsted Press.

White, H. (1989). An additional hidden unit test for neglected nonlinearity, Proceedings of the International Joint Conference on Neural Networks, Vol. 2, New York: IEEE Press.

Table 1: Example in Domínguez and Lobato (2004).
Columns: Bias, SE, and MSE under X ∼ N(0, 1) and X ∼ N(1, 1); rows: θ̂_NLS, θ̂_DL, θ̂(K_T), and θ̂_OPIV for sample sizes T = 50, 100, 200.

Table 2: Models with an endogenous explanatory variable.
Columns: Bias, SE, and MSE for T = 50, 100, 200; rows: θ̂_NLS, θ̂_DL, and θ̂(K_T) for ρ = 0.01, 0.1, 0.3, 0.5, 0.7, 0.9.

Table 3: Models with different disturbance variances.
Columns: Bias, SE, and MSE for T = 50, 100, 200; rows: θ̂_NLS, θ̂_DL, and θ̂(K_T) for σ² = 0.1, 1, 4, 9, 16.

Table 4-1-1: The performance of θ̂(K_T) with various K_T (ρ = 0.01, T = 50).
Columns: K_T, Bias, Bias (+%), SE, SE (+%), MSE, MSE (+%); rows: K_T = 1, 2, ..., 10, 15, 20, 30, with θ̂_NLS and θ̂_DL for comparison.

Table 4-1-2: The performance of θ̂(K_T) with various K_T (ρ = 0.01, T = 100).
Columns: K_T, Bias, Bias (+%), SE, SE (+%), MSE, MSE (+%); rows: K_T = 1, 2, ..., 10, 15, 20, 30, with θ̂_NLS and θ̂_DL for comparison.

25 Table : The performance of ˆθ(K T ) with various K T (ρ = 0.01, T = 200). K T Bias Bias(+%) SE SE(+%) MSE MSE(+%) ˆθ NLSE ˆθ DL

26 Table : The performance of ˆθ(K T ) with various K T (ρ = 0.01, T = 400). K T Bias Bias(+%) SE SE(+%) MSE MSE(+%) ˆθ NLSE ˆθ DL

27 Table : The performance of ˆθ(K T ) with various K T (ρ = 0.1, T = 50). K T Bias Bias(+%) SE SE(+%) MSE MSE(+%) ˆθ NLSE ˆθ DL

28 Table : The performance of ˆθ(K T ) with various K T (ρ = 0.1, T = 100). K T Bias Bias(+%) SE SE(+%) MSE MSE(+%) ˆθ NLSE ˆθ DL

29 Table : The performance of ˆθ(K T ) with various K T (ρ = 0.1, T = 200). K T Bias Bias(+%) SE SE(+%) MSE MSE(+%) ˆθ NLSE ˆθ DL

30 Table : The performance of ˆθ(K T ) with various K T (ρ = 0.1, T = 400). K T Bias Bias(+%) SE SE(+%) MSE MSE(+%) ˆθ NLSE ˆθ DL

31 Table : The performance of ˆθ(K T ) with various K T (ρ = 0.5, T = 50). K T Bias Bias(+%) SE SE(+%) MSE MSE(+%) ˆθ NLSE ˆθ DL

32 Table : The performance of ˆθ(K T ) with various K T (ρ = 0.5, T = 100). K T Bias Bias(+%) SE SE(+%) MSE MSE(+%) ˆθ NLSE ˆθ DL

33 Table : The performance of ˆθ(K T ) with various K T (ρ = 0.5, T = 200). K T Bias Bias(+%) SE SE(+%) MSE MSE(+%) ˆθ NLSE ˆθ DL

34 Table : The performance of ˆθ(K T ) with various K T (ρ = 0.5, T = 400). K T Bias Bias(+%) SE SE(+%) MSE MSE(+%) ˆθ NLSE ˆθ DL

35 Table : The performance of ˆθ(K T ) with various K T (ρ = 0.9, T = 50). K T Bias Bias(+%) SE SE(+%) MSE MSE(+%) ˆθ NLSE ˆθ DL

36 Table : The performance of ˆθ(K T ) with various K T (ρ = 0.9, T = 100). K T Bias Bias(+%) SE SE(+%) MSE MSE(+%) ˆθ NLSE ˆθ DL

37 Table : The performance of ˆθ(K T ) with various K T (ρ = 0.9, T = 200). K T Bias Bias(+%) SE SE(+%) MSE MSE(+%) ˆθ NLSE ˆθ DL

38 Table : The performance of ˆθ(K T ) with various K T (ρ = 0.9, T = 400). K T Bias Bias(+%) SE SE(+%) MSE MSE(+%) ˆθ NLSE ˆθ DL
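The Bias, SE, and MSE columns in the tables above are the standard Monte Carlo summaries of an estimator over replications. A minimal sketch of how they are computed (in Python, using a hypothetical set of replication estimates; the specific values and the true parameter θ₀ = 2.0 are illustrative assumptions, not taken from the paper's designs) is:

```python
import random
import statistics

def mc_summary(estimates, theta0):
    """Bias, standard error, and MSE of an estimator across
    Monte Carlo replications, as reported in the tables."""
    mean = statistics.fmean(estimates)
    bias = mean - theta0                          # average estimation error
    se = statistics.stdev(estimates)              # sampling standard error
    mse = statistics.fmean((e - theta0) ** 2 for e in estimates)
    return bias, se, mse

# Hypothetical replications: estimates scattered around theta0 = 2.0
random.seed(0)
draws = [2.0 + 0.1 * random.gauss(0.0, 1.0) for _ in range(10_000)]
bias, se, mse = mc_summary(draws, 2.0)
```

With many replications, mse ≈ bias² + se², the usual bias-variance decomposition, which is why the three columns are reported together.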


More information

Least Squares Estimation of a Panel Data Model with Multifactor Error Structure and Endogenous Covariates

Least Squares Estimation of a Panel Data Model with Multifactor Error Structure and Endogenous Covariates Least Squares Estimation of a Panel Data Model with Multifactor Error Structure and Endogenous Covariates Matthew Harding and Carlos Lamarche January 12, 2011 Abstract We propose a method for estimating

More information

Econometric Analysis of Cross Section and Panel Data

Econometric Analysis of Cross Section and Panel Data Econometric Analysis of Cross Section and Panel Data Jeffrey M. Wooldridge / The MIT Press Cambridge, Massachusetts London, England Contents Preface Acknowledgments xvii xxiii I INTRODUCTION AND BACKGROUND

More information

Statistical Data Mining and Machine Learning Hilary Term 2016

Statistical Data Mining and Machine Learning Hilary Term 2016 Statistical Data Mining and Machine Learning Hilary Term 2016 Dino Sejdinovic Department of Statistics Oxford Slides and other materials available at: http://www.stats.ox.ac.uk/~sejdinov/sdmml Naïve Bayes

More information

On the econometrics of the Koyck model

On the econometrics of the Koyck model On the econometrics of the Koyck model Philip Hans Franses and Rutger van Oest Econometric Institute, Erasmus University Rotterdam P.O. Box 1738, NL-3000 DR, Rotterdam, The Netherlands Econometric Institute

More information

Lecture 1: Supervised Learning

Lecture 1: Supervised Learning Lecture 1: Supervised Learning Tuo Zhao Schools of ISYE and CSE, Georgia Tech ISYE6740/CSE6740/CS7641: Computational Data Analysis/Machine from Portland, Learning Oregon: pervised learning (Supervised)

More information

Outline Lecture 2 2(32)

Outline Lecture 2 2(32) Outline Lecture (3), Lecture Linear Regression and Classification it is our firm belief that an understanding of linear models is essential for understanding nonlinear ones Thomas Schön Division of Automatic

More information

GMM and SMM. 1. Hansen, L Large Sample Properties of Generalized Method of Moments Estimators, Econometrica, 50, p

GMM and SMM. 1. Hansen, L Large Sample Properties of Generalized Method of Moments Estimators, Econometrica, 50, p GMM and SMM Some useful references: 1. Hansen, L. 1982. Large Sample Properties of Generalized Method of Moments Estimators, Econometrica, 50, p. 1029-54. 2. Lee, B.S. and B. Ingram. 1991 Simulation estimation

More information

Location Properties of Point Estimators in Linear Instrumental Variables and Related Models

Location Properties of Point Estimators in Linear Instrumental Variables and Related Models Location Properties of Point Estimators in Linear Instrumental Variables and Related Models Keisuke Hirano Department of Economics University of Arizona hirano@u.arizona.edu Jack R. Porter Department of

More information

Semiparametric Estimation of Partially Linear Dynamic Panel Data Models with Fixed Effects

Semiparametric Estimation of Partially Linear Dynamic Panel Data Models with Fixed Effects Semiparametric Estimation of Partially Linear Dynamic Panel Data Models with Fixed Effects Liangjun Su and Yonghui Zhang School of Economics, Singapore Management University School of Economics, Renmin

More information

Generalized Method of Moments (GMM) Estimation

Generalized Method of Moments (GMM) Estimation Econometrics 2 Fall 2004 Generalized Method of Moments (GMM) Estimation Heino Bohn Nielsen of29 Outline of the Lecture () Introduction. (2) Moment conditions and methods of moments (MM) estimation. Ordinary

More information

Chapter 6: Derivative-Based. optimization 1

Chapter 6: Derivative-Based. optimization 1 Chapter 6: Derivative-Based Optimization Introduction (6. Descent Methods (6. he Method of Steepest Descent (6.3 Newton s Methods (NM (6.4 Step Size Determination (6.5 Nonlinear Least-Squares Problems

More information