MOMENT BASED INFERENCE WITH STRATIFIED DATA


GAUTAM TRIPATHI

Abstract. Many datasets used by economists and other social scientists are collected by stratified sampling. Stratification of the target population induces a probability distribution on the realized observations that differs from the underlying distribution for which inference is to be drawn. If the distinction between target and realized distributions is not taken into account, statistical inference can be severely biased. This paper shows how two well known techniques, the generalized method of moments (GMM) and empirical likelihood (EL), can be used to do efficient semiparametric inference in models with unconditional moment restrictions when the target population is stratified. We investigate and obtain results for three widely used stratified sampling schemes: variable probability (VP) sampling, multinomial (MN) sampling, and standard stratified (SS) sampling.

1. Introduction

Many datasets used by economists and other social scientists consist of stratified samples. Stratification is usually employed for administrative convenience or to increase statistical precision by oversampling rare but informative outcomes. Whatever its reason, stratifying the target population is in general not innocuous. In particular, it induces a probability distribution on the realized observations that differs from the underlying population distribution for which inference is to be made. If the distinction between target and realized distributions is not taken into account, subsequent statistical inference can be severely biased. In this paper we show how to do efficient inference in models with unconditional moment restrictions when the target population is stratified.
Let $z$ be a random vector in $\mathbb{R}^d$ and $\Theta$ a subset of $\mathbb{R}^p$ such that

$$H_0: E_{f^*}\{g(z,\theta^*)\} = 0 \ \text{ for some } \theta^* \in \Theta, \tag{1.1}$$

where $g$ is a $q \times 1$ vector of functions known up to $\theta^*$ such that $q \ge p$ (i.e., overidentification is allowed), and $f^*$ is the unknown density of $z$ with respect to a dominating measure $\mu$, which need not be the Lebesgue measure, so that $z$ can have discrete components. Note that, following usual mathematical convention, by a vector we mean a column vector. We do not make any notational distinction between a random variable and the value taken by it; the difference should be clear from the context. The symbol $E_{f^*}$ indicates that expectation is with respect to the pdf $f^*$.

Date: This version: June 08.
Key words and phrases. Empirical likelihood, GMM, Stratified sampling.
This is a shortened and revised version of a paper that was previously circulated under the title "GMM and empirical likelihood with stratified data". I thank seminar participants at several universities for helpful suggestions and conversations. Financial support for this project from the NSF via grant SES is also gratefully acknowledged.

Familiar examples of models with unconditional moment restrictions are linear and nonlinear regression models of the form $y = \psi(x,\theta^*) + \varepsilon$, where $\psi$ is known up to $\theta^*$ and the identifying assumption is that the error term is uncorrelated with the regressors; i.e., $E_{f^*}\{x\varepsilon\} = 0$. Here $g(z,\theta^*) = x\{y - \psi(x,\theta^*)\}$ and $z = (y, x)$. Multivariate extensions include linear and nonlinear simultaneous equations models where the error terms are uncorrelated with the explanatory variables.

Sometimes econometric models are motivated in terms of conditional moment restrictions. E.g., suppose that we have the conditional moment restriction $E_{y|x}\{\rho(y,x,\theta^*) \mid x\} = 0$ w.p.1, where $\rho$ is a $k \times 1$ vector of functions known up to $\theta^*$ and $y$ is a vector of endogenous variables. This paper can handle such models in an instrumental variables (IV) framework: letting $A(x)$ denote a $q \times k$ matrix of instruments, the conditional moment restriction yields unconditional moment restrictions of the form $E_{f^*}\{g(z,\theta^*)\} = 0$, where $g(z,\theta^*) = A(x)\rho(y,x,\theta^*)$. Of course, since conditional moment restrictions are stronger than unconditional ones, it is possible to improve upon IV estimators. However, this extension is beyond the scope of the current paper, and readers interested in seeing how conditional moment restrictions can be fully exploited in the presence of stratification are referred to Tripathi (2002).

In this paper we treat the target density $f^*$ as completely unknown. By contrast, most earlier works in the literature (some exceptions are described later in Section 2) make strong parametric assumptions about the conditional density of $y$ given $x$. Furthermore, they seem to concentrate solely on linear regression and least squares or nonlinear discrete response models. Unlike these papers, we examine a general class of potentially overidentified models which nest linear regression and discrete choice models as special cases.
E.g., our ability to handle IV models allows us to do semiparametric inference in Box-Cox type models using stratified samples. This is an important advantage because it is well known that least squares is not consistent for estimating such models.

Let the target population be partitioned into $L$ nonempty and disjoint strata $C_1,\ldots,C_L$. Depending upon the manner in which the observations are actually drawn from the strata, we study three general stratified sampling models. These are henceforth referred to as the variable probability biased sampling (VPBS) model, the multinomial biased sampling (MNBS) model, and the standard stratified biased sampling (SSBS) model. A good description of these stratification schemes can be found in Jewell (1985), Cosslett (1993), Imbens and Lancaster (1996), and Wooldridge (1999).

In VPBS models, an observation is first drawn from the target population. If it lies in stratum $C_l$, it is retained with known probability $P_l$; if it is discarded, all information about the observation is lost. Hence, instead of observing $z$ from the target density $f^*$, we observe $z$ from the realized $\mu$-density

$$f(\xi) = \frac{\sum_{l=1}^{L} P_l\, I\{\xi \in C_l\}\, f^*(\xi)}{\sum_{l=1}^{L} P_l Q_l^*} = \frac{b(\xi) f^*(\xi)}{b^*}, \tag{1.2}$$

where $Q_l^* = \int_{C_l} f^*(\xi)\, d\mu$, $b(\xi) = \sum_{l=1}^{L} P_l\, I\{\xi \in C_l\}$, $b^* = \sum_{l=1}^{L} P_l Q_l^*$, and $I$ is the indicator function. VPBS models arise naturally when data is collected by telephone surveys. Note that $Q_l^*$ denotes the probability that a randomly chosen observation from the target population lies in the $l$th stratum; i.e., loosely speaking, $Q_l^*$ represents the demand for the $l$th stratum. Since $\sum_{l=1}^{L} Q_l^* = 1$, the $Q_l^*$'s are popularly called aggregate shares. Throughout the paper, the aggregate shares are assumed to be unknown (footnote 1). The parameter $b^*$ also has a practical interpretation: it is the probability that an observation from the target population is ultimately retained in the sample.

In MNBS models, the researcher first selects a stratum, say $C_l$, with known probability $H_l$ so that $H_1 + \cdots + H_L = 1$. Then an observation is drawn randomly from the selected stratum. Hence, instead of observing $z$ from the target density $f^*$, we observe $z$ from the $\mu$-density

$$f(\xi) = \sum_{l=1}^{L} \frac{H_l}{Q_l^*}\, I\{\xi \in C_l\}\, f^*(\xi). \tag{1.3}$$

In SSBS models, the number of observations drawn from each stratum is fixed in advance and data is sampled randomly within each stratum. Most large datasets are a result of standard stratified sampling. Suppose that $n$ observations $z_1,\ldots,z_n$ are collected by SS sampling. Then the realized SSBS density for a single observation is given by $f_n(\xi) = \sum_{l=1}^{L} (n_l/n)\, I\{\xi \in C_l\}\, f^*(\xi)/Q_l^*$, where $n_l = \sum_{j=1}^{n} I\{z_j \in C_l\}$ denotes the number of observations lying in the $l$th stratum. The main difference between SSBS and MNBS models is that in SSBS models the $n_l$'s are treated as non-stochastic constants whereas in MNBS models they are random variables (thus, unlike MN sampling, observations collected using SS sampling are independently but not identically distributed across the strata). This means that statistical inference in SSBS models should be done conditional on the $n_l$'s. However, as discussed later in Section 4, we carry out inference in SSBS models by using an enriched sample (i.e., a stratified sample supplemented with an additional random sample) in order to overcome identification problems caused by lack of knowledge of the aggregate shares. But the presence of this supplementary random sample ensures that the number of observations lying in each stratum of the enriched sample is a random variable.
Hence, there is no need to condition on the number of observations lying in each stratum of the enriched sample when estimating SSBS models with enriched datasets. Therefore, statistical treatment for SSBS models can be simplified by assuming that $z_1,\ldots,z_n$ are iid observations from the $\mu$-density

$$f(\xi) = \sum_{l=1}^{L} \frac{K_l}{Q_l^*}\, I\{\xi \in C_l\}\, f^*(\xi) = b(\xi, Q^*) f^*(\xi), \tag{1.4}$$

where $K_l = \lim_{n\to\infty}(n_l/n)$, $b(\xi, Q^*) = \sum_{l=1}^{L} (K_l/Q_l^*)\, I\{\xi \in C_l\}$, and $Q^* = (Q_1^*,\ldots,Q_L^*)$. The $K_l$'s are regarded as known since replacing them by the corresponding $n_l/n$'s (the underlying sampling fractions for the stratified sample) does not matter asymptotically. Therefore, for the remainder of the paper we assume that the realized SSBS density is given by (1.4), where the $K_l$'s are known and sum to one (footnote 2). Since (1.3) and (1.4) make it clear that the realized densities for MN and SS sampling schemes are observationally equivalent, results obtained for SSBS models will be valid for MNBS models as well. Therefore, henceforth we only consider SSBS models without loss of generality.

Footnote 1. Since the aggregate shares are endogenously determined parameters, it is not reasonable to assume that they are known. On the other hand, the retention probabilities $P_1,\ldots,P_L$ (for VPBS models) or the sampling fractions $H_1,\ldots,H_L$ (for MNBS models) and $K_1,\ldots,K_L$ (for SSBS models), described later, are part of the sampling design that is fixed in advance by the data collection agency, and their knowledge is usually in the public domain.

Footnote 2. In fact, we can continue to use the assumption that $z_1,\ldots,z_n$ are iid draws from (1.4) in order to do efficient inference in SSBS models even when the aggregate shares are assumed known, provided we condition on the $n_l$'s. For the sake of completeness, in Appendix B we show how this can be done. Readers should note that Appendix B is the only section in the paper where the aggregate shares are assumed known.

A simple example illustrates the dangers of ignoring stratification. Suppose we want to estimate $\theta^* = E_{f^*}\{z\}$, the mean of the target population. But since the target population is stratified, we have iid observations $z_1,\ldots,z_n$ from the realized density $f$ in (1.2), (1.3), or (1.4) instead of iid observations from the target density $f^*$. Therefore, the naive estimator $\sum_{j=1}^{n} z_j/n$ will not be a consistent estimator of $\theta^*$ because $\sum_{j=1}^{n} z_j/n \xrightarrow{p} E_f\{z\}$ by the WLLN (footnote 3), but $E_f\{z\} \ne E_{f^*}\{z\}$. Intuitively, the naive approach fails because stratification of the target population ensures that data comes from the realized density $f$ even though the parameter of interest is defined in terms of the target density $f^*$. Hence, statistical inference using the realized observations is actually about $f$ and not $f^*$.

The basic problem investigated in this paper can be loosely stated as follows: how can we use iid observations $z_1,\ldots,z_n$ generated from the realized density $f$ to do inference on the target density $f^*$? More precisely, we develop a unified approach to deal with different kinds of stratified sampling schemes, and our treatment is general enough to handle stratification based only on the endogenous variable(s) (pure endogenous stratification), or on the exogenous variable(s) alone (pure exogenous stratification), or stratification that is based on some or all of the endogenous and exogenous variables together (footnote 4). Using this approach we show how to efficiently estimate $\theta^*$, the cdf of the target population $F^*$ (footnote 5), and the aggregate shares $Q_1^*,\ldots,Q_L^*$. We also efficiently estimate the cdf of the realized population $F$ (footnote 6) and $b^*$, the probability that an observation from the target population is retained in the realized sample when data is collected by VP sampling.
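The inconsistency of the naive sample mean can be checked numerically. The following is a minimal sketch, assuming a hypothetical design that does not appear in the paper: the target density is Uniform(0, 1), the strata are $[0, 0.5)$ and $[0.5, 1]$, and the retention probabilities are 0.2 and 1.0. Under (1.2) the realized mean is then $E_f\{z\} = 0.4/0.6 = 2/3$, whereas the target mean is $1/2$.

```python
import random

# Hypothetical VP design (an assumption for illustration, not from the paper):
# strata C_1 = [0, 0.5), C_2 = [0.5, 1], retention probabilities P_1, P_2.
P = (0.2, 1.0)


def stratum(z):
    """Index of the stratum containing z."""
    return 0 if z < 0.5 else 1


def vp_sample(n_target, seed=0):
    """Draw from the target density Uniform(0, 1) and keep each draw
    with the retention probability of its stratum (VP sampling)."""
    rng = random.Random(seed)
    kept = []
    for _ in range(n_target):
        z = rng.random()
        if rng.random() < P[stratum(z)]:
            kept.append(z)
    return kept


def naive_mean(sample):
    """Unweighted sample mean of the realized observations."""
    return sum(sample) / len(sample)


if __name__ == "__main__":
    realized = vp_sample(200_000)
    print("naive mean of realized sample:", naive_mean(realized))
    print("target mean: 0.5, realized-population mean: 2/3")
```

In this design the upper stratum is oversampled, so the unweighted mean settles near the realized-population mean rather than the target mean.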
Furthermore, we obtain confidence regions and tests of parametric hypotheses for $\theta^*$ and also show how to test the validity of the overidentifying restrictions in (1.1). All estimation and testing is done without making any parametric assumptions about the target or realized densities. Note that none of the results obtained in this paper can be found in the literature surveyed in Section 2.

Footnote 3. Throughout the paper, all limits are taken as the sample size $n \to \infty$.

Footnote 4. The term "exogenous" is, strictly speaking, an abuse of terminology since (1.1) does not involve any conditioning. The careful reader may want to substitute "stratification based on explanatory variables" for "exogenous stratification" whenever the latter is encountered.

Footnote 5. Efficient estimation of $F^*$ is important if one wants to bootstrap from the target population. When prior information about the target population in the form of (1.1) is available, merely using a consistent estimator of $F^*$ can lead to poor inference from the bootstrap. Hence, as pointed out by Brown and Newey (2002), in order to obtain an efficient bootstrapping procedure, resampling must be done using an efficient estimator of $F^*$; i.e., an estimator which incorporates the stochastic restrictions imposed by the model (1.1).

Footnote 6. Estimating the realized cdf is useful because comparing the estimators of $F^*$ and $F$, denoted by $\hat F^*$ and $\hat F$, respectively, can reveal the extent of bias induced by the stratification procedure. Of course, $\hat F^*$ can also be compared with the empirical cdf of the observed data. But because $\hat F$ takes the model (1.1) into account, it is more precise. Similarly, since $\hat F$ is an asymptotically efficient estimator of $F$ under $H_0$, using it to resample from $F$ makes sense.

The inferential approach developed in this paper is based on two well known techniques: the generalized method of moments (GMM) proposed by Hansen (1982), and empirical likelihood (EL) proposed by Owen (1988). We focus on GMM (which is related to the estimating functions approach studied in the statistics literature) because its unifying approach and wide applicability have made it the method of choice for estimating and testing overidentified nonlinear econometric models. Its availability in canned software packages has also added to its popularity with applied economists. An excellent exposition of the GMM methodology can be found in Newey and McFadden (1994). On the other hand, we also devote attention to EL because it has lately begun to emerge as a serious contender to GMM. See, e.g., Qin and Lawless (1994), Imbens (1997), Kitamura (1997, 2001), and other papers in the 2002 Journal of Business and Economic Statistics special issue on GMM. Owen (2001) provides a comprehensive and extremely readable account of the EL approach. Although GMM and EL based inference is asymptotically equivalent up to a first order analysis, recent research by Newey and Smith (2003) has shown that under certain regularity conditions EL has better second order properties than GMM; e.g., unlike GMM, the second order bias of EL does not depend upon the number of moment conditions. This makes EL very attractive for estimating models with large $q$ (such as panel data models with long time dimension) where GMM is known to perform poorly in small samples.

The following assumption is maintained throughout the paper. It ensures that the GMM and EL estimators described subsequently are consistent and asymptotically normal. Let $B(\theta,\delta)$ denote an open ball with center $\theta$ and radius $\delta$, $\|\cdot\|$ the usual Euclidean norm (i.e., $\|z\| = \sqrt{z'z}$), $\partial g(z,\theta)/\partial\theta$ the $q \times p$ Jacobian matrix of partial derivatives of $g(z,\theta)$ with respect to $\theta$, and $z^{(j)}$ the $j$th component of a vector $z$. The abbreviation "w.p.1" stands for "with (target) probability one".

Assumption 1.1.
(i) $\Theta$ is compact;
(ii) $\theta^* \in \mathrm{int}(\Theta)$ is the unique root of (1.1);
(iii) $g(z,\theta)$ is continuous on $\Theta$ w.p.1.
There exist $\delta > 0$ and $\eta > 0$ such that:
(iv) $E_{f^*}\{\sup_{\theta\in\Theta} \|g(z,\theta)\|^{2(1+\eta)}\} < \infty$;
(v) $g(z,\theta)$ is twice continuously differentiable on $B(\theta^*,\delta)$ w.p.1;
(vi) $E_{f^*}\{\sup_{\theta\in B(\theta^*,\delta)} \|\partial g(z,\theta)/\partial\theta\|\} < \infty$;
(vii) $E_{f^*}\{\sup_{\theta\in B(\theta^*,\delta)} |\partial^2 g^{(i)}(z,\theta)/\partial\theta^{(j)}\partial\theta^{(k)}|\} < \infty$ for $i = 1,\ldots,q$ and $j,k = 1,\ldots,p$;
(viii) $(Q_1^*,\ldots,Q_L^*) \in (\delta, 1-\delta) \times \cdots \times (\delta, 1-\delta)$.

(ii) ensures that $\theta^*$ is globally identified. (i)-(vii) are used to prove the consistency and asymptotic normality of EL estimators as in Kitamura (1997) and Qin and Lawless (1994), although GMM estimators can be shown to be consistent and asymptotically normal under slightly weaker conditions; see, e.g., Newey and McFadden (1994). (viii) is standard in the stratified sampling literature; see, e.g., Imbens (1992) and Imbens and Lancaster (1996).

2. Related papers

The VP, SS, and MN stratification schemes described earlier have been widely studied in the literature; e.g., Manski and Lerman (1977) estimate discrete choice models using MN sampling when stratification is based on a set of finite response variables (i.e., choice based sampling). Manski and McFadden (1981) allow for stratification on exogenous variables as well. Cosslett (1981b) and Imbens (1992) describe efficient estimation techniques for these models using choice based samples. Imbens and Lancaster (1996) extend Imbens (1992) to allow for SS and VP sampling, mixed response variables, and stratification on exogenous covariates. A well known application of SS and VP sampling can be found in Hausman and Wise (1981). Unlike the current paper, all these works assume that the conditional density of $y$ given $x$ is known up to a finite dimensional parameter and only the marginal pdf of the exogenous variables $x$ is left unspecified.

Regression under stratified sampling and a parametric pdf$(y \mid x)$ has also been well investigated; e.g., under SS sampling, DeMets and Halperin (1977) and Holt, Smith, and Winter (1980) estimate linear models under normality, while Scott and Wild (1986) look at logistic regression. Butler (2000) considers linear and nonlinear regression models with MN sampling weights when the error density is known. Bickel and Ritov (1991) also use MN sampling but they do not impose normality. However, they assume independence between the error term and the regressors (all of which have to be discrete). DuMouchel and Duncan (1983) look at weighted least squares estimators under SS sampling. Jewell (1985) and Quesenberry and Jewell (1986) propose iterative estimators of regression coefficients under SS and VP sampling without imposing normality or independence, but they do not provide any asymptotic theory for their estimators.

Wooldridge (1999, 2001) also leaves $f^*$ completely unspecified and provides asymptotic theory for M-estimation under VP and SS sampling. However, Wooldridge's model is defined in terms of a set of just identified unconditional moment conditions whereas we deal with possibly overidentified moment restrictions. Therefore, (1.1) nests his moment conditions as a special case. Note that since the moment conditions in Wooldridge's papers are exactly identified, their validity cannot be tested. In contrast, (1.1) is overidentified when $q > p$, and testing $H_0$ under stratification is an important issue that is examined in this paper. Furthermore, when dealing with SS sampling, Wooldridge assumes that the aggregate shares are known. In contrast, the maintained assumption in this paper is that the aggregate shares are unknown.
Butler (2000) also relaxes the parametric assumption on pdf$(y \mid x)$ when doing unconditional GMM on overidentified models with MN sampling weights and known aggregate shares. However, his focus is on comparing weighted maximum likelihood with weighted GMM, and he does not investigate fully efficient GMM or any kind of inference including specification testing of overidentified moment conditions. Neither of these papers discusses estimation of $b^*$, $Q^*$, $F^*$, or $F$.

Qin (1993) uses the EL approach to construct a nonparametric likelihood ratio confidence interval for $\theta^* = E_{f^*}\{z\}$ using two independent samples from $f^*(\xi)$ and $f(\xi) = b(\xi)f^*(\xi)/b^*$; i.e., a two sample VPBS model in our terminology (Qin just calls it a biased sampling model). El-Barmi and Rothmann (1998) generalize Qin's treatment to handle overidentified models of the form $E_{f^*}\{g(z,\theta^*)\} = 0$. They also use two independent samples from $f^*$ and $f$. In contrast, we only need a single sample from $f$ to do inference in VPBS models. Unlike us, Qin (1993) and El-Barmi and Rothmann (1998) do not investigate SSBS or MNBS models. Nor do the latter consider testing overidentifying restrictions or GMM based inference.

3. Inference in VPBS models

In this section we investigate VPBS models. We begin with an example.

Example 3.1 (VPBS linear regression). Consider the linear model $y = x'\theta^* + \varepsilon$, where $E_{f^*}\{x\varepsilon\} = 0$ and instead of observing $z = (y, x)$ from the target density $f^*$, we observe $z$ from the realized VPBS density $f$ defined in (1.2). If we ignore stratification and simply regress $y$ on $x$, then $\theta^*$ cannot be consistently estimated by the least squares estimator $\tilde\theta = (\sum_{j=1}^{n} x_j x_j')^{-1} \sum_{j=1}^{n} x_j y_j$. To see this, observe that the probability limit of $\tilde\theta$ is given by

$$(E_f xx')^{-1}(E_f xy) = \theta^* + (E_f xx')^{-1} E_{f^*}\{b(z)\,x\varepsilon\}/b^*.$$

But since $E_{f^*}\{x\varepsilon\} = 0$ does not imply that $E_f\{x\varepsilon\} = 0$, it follows that $\tilde\theta$ is not consistent for $\theta^*$. Furthermore, since the magnitude of the asymptotic bias depends upon the distribution of $(y, x)$ and the retention probabilities $P_1,\ldots,P_L$, the decision to ignore stratification can only be made on a case by case basis. For some experimental evidence on this matter see Imbens and Lancaster (1996). Notice that $\tilde\theta$ remains inconsistent even if stratification is based only upon $x$. In contrast, as pointed out by Wooldridge (1999, 2001) and Tripathi (2002), if the identifying assumption $E_{f^*}\{x\varepsilon\} = 0$ is replaced by the stronger condition $E_{y|x}\{\varepsilon \mid x\} = 0$ w.p.1, then ignoring exogenous stratification does not affect the consistency of $\tilde\theta$ although it will still affect its asymptotic variance. Therefore, whether exogenous stratification is ignorable depends upon the nature of the moment condition used to identify $\theta^*$.

3.1. Identification. Since we use $f$ to do inference on $f^*$, before proceeding any further we first have to investigate whether $f^*$ can be recovered in terms of $f$. If there is no way of going from the realized density (loosely speaking, the "reduced form") to the target density (the "structural form"), then statistical inference about $f^*$ and, hence, the target cdf $F^*$ is impossible. In other words, we first have to examine whether $f^*$ is identified (the realized density $f$ is, of course, identified by definition).
Fortunately, there are no major identification issues for VPBS models: $b^*$ is identified because $b^* = 1/E_f\{1/b(z)\}$, the aggregate shares are identified because $E_f[I\{z \in C_l\}/b(z)] = Q_l^* E_f[1/b(z)]$ for $l = 1,\ldots,L$ (hence, $Q_l^* = E_f[I\{z \in C_l\}/b(z)]/E_f[b^{-1}(z)]$ for each $l$), and identification of $f^*$ follows from the fact that $f^*(\xi) = f(\xi)/\{b(\xi) E_f[b^{-1}(z)]\}$. Therefore, $b^*$, $Q^*$, and $f^*$ can all be explicitly written in terms of the realized density $f$ without any problems. As discussed later in Section 4.1, this is in sharp contrast to SSBS models where ignorance of $Q^*$ leads to serious identification problems.

3.2. Efficient estimation. Since $E_{f^*}\{g(z,\theta^*)\} = 0$ if and only if $E_f\{g(z,\theta^*)/b(z)\} = 0$, we can efficiently estimate $\theta^*$ by doing optimal GMM on $g(z,\theta)/b(z)$ (footnote 7). In technical terms, this is a change of measure result: since the target density can be expressed in terms of the realized density by inverting the mapping in (1.2), dividing $g(z,\theta^*)$ by $b(z)$ allows (1.1) to be rewritten in terms of $f$ without loss of information. More intuitively, since $1/b(z) = \sum_{l=1}^{L} I\{z \in C_l\}/P_l$, this transformation represents an inverse probability weighting scheme in which oversampled strata are assigned smaller weights than the undersampled strata, thus correcting for the effects of stratification.

Footnote 7. The M-estimators in Wooldridge (1999, 2001) can be motivated in a similar manner. Suppose that $\theta^*$ is identified as $\theta^* = \mathrm{argmin}_{\theta\in\Theta} E_{f^*}\{\psi(z,\theta)\}$, where $\psi$ is a real-valued objective function. Since $E_{f^*}\{\psi(z,\theta)\} = b^* E_f\{\psi(z,\theta)/b(z)\}$ and $b^*$ does not depend upon $\theta$, it follows that $\theta^* = \mathrm{argmin}_{\theta\in\Theta} E_f\{\psi(z,\theta)/b(z)\}$. Hence, $\hat\theta_M$, the M-estimator of $\theta^*$, can be obtained by solving the sample analog of this problem; i.e., $\hat\theta_M = \mathrm{argmin}_{\theta\in\Theta} \sum_{j=1}^{n} \psi(z_j,\theta)/b(z_j)$. A similar argument works for SSBS models when the aggregate shares are assumed known. See Appendix B for details.

Example 3.2. Suppose we want to estimate the mean of the target population using iid observations from the realized VPBS density. Since $E_{f^*}\{z - \theta^*\} = 0$ if and only if $E_f\{(z - \theta^*)/b(z)\} = 0$, the GMM estimator of $\theta^*$ is given by

$$\hat\theta = \Big\{\sum_{j=1}^{n} b^{-1}(z_j)\Big\}^{-1} \sum_{j=1}^{n} z_j/b(z_j) = \sum_{l=1}^{L} \hat Q_l \bar z_l,$$

where $\hat Q_l = (n_l/P_l)/\sum_{l'=1}^{L} (n_{l'}/P_{l'})$ and $\bar z_l = \sum_{j=1}^{n} z_j I\{z_j \in C_l\}/n_l$ denotes the $l$th stratum sample average. As shown later in Example 3.3, the $\hat Q_l$'s estimate the aggregate shares. By the CLT, $n^{1/2}(\hat\theta - \theta^*)$ is asymptotically normal with mean zero and variance $b^{*2} E_f\{(z - \theta^*)(z - \theta^*)'/b^2(z)\}$. This of course agrees with the result given below in Theorem 3.1.

Since the aggregate shares $Q^*$ are also important parameters of interest, they can be estimated jointly with $\theta^*$ by simply adding additional moment conditions. In fact, since the $Q_l^*$'s add up to one, it suffices to estimate the $(L-1) \times 1$ vector $Q_{-L}^* = (Q_1^*,\ldots,Q_{L-1}^*)'$. So let $\beta^* = (\theta^{*\prime}, Q_{-L}^{*\prime})'$, a $(p+L-1) \times 1$ vector, and define (footnote 8) the $(q+L-1) \times 1$ vector

$$\rho(z,\beta) = \frac{1}{b(z)} \begin{bmatrix} g(z,\theta) \\ I\{z \in C_1\} - Q_1 \\ \vdots \\ I\{z \in C_{L-1}\} - Q_{L-1} \end{bmatrix} = \frac{1}{b(z)} \begin{bmatrix} g(z,\theta) \\ s(z) - Q_{-L} \end{bmatrix}, \tag{3.1}$$

where $s(z) = (I\{z \in C_1\},\ldots,I\{z \in C_{L-1}\})'$ is $(L-1) \times 1$. As the moment condition $E_f\{\rho(z,\beta^*)\} = 0$ exhausts all information about $(\theta^*, Q^*)$, an asymptotically efficient estimator of $\beta^*$ can be obtained by doing two-step optimal GMM on (3.1) as follows. First, let $\hat\rho(\beta) = \sum_{j=1}^{n} \rho(z_j,\beta)/n$ and obtain a preliminary estimator $\tilde\beta = \mathrm{argmin}_{\beta\in B}\, \hat\rho'(\beta)\hat\rho(\beta)$, where $B = \Theta \times [\delta, 1-\delta] \times \cdots \times [\delta, 1-\delta]$. Use this preliminary estimator to construct the efficient weight matrix $\tilde\Upsilon = \sum_{j=1}^{n} \rho(z_j,\tilde\beta)\rho'(z_j,\tilde\beta)/n$. Next, obtain $\hat\beta_{gmm} = \mathrm{argmin}_{\beta\in B}\, \mathrm{GMM}(\beta)$, where the objective function is $\mathrm{GMM}(\beta) = \hat\rho'(\beta)\tilde\Upsilon^{-1}\hat\rho(\beta)$.

Now define $g_b(z,\theta^*) = g(z,\theta^*)/b(z)$, $D_b = E_f\{\partial g_b(z,\theta^*)/\partial\theta'\}$, $V_b = E_f\{g_b(z,\theta^*) g_b'(z,\theta^*)\}$, $M_b = V_b^{-1} - V_b^{-1} D_b (D_b' V_b^{-1} D_b)^{-1} D_b' V_b^{-1}$, and $C_b = E_f\{g_b(z,\theta^*)[s(z) - Q_{-L}^*]'/b(z)\}$. Standard GMM theory as in Newey and McFadden (1994) reveals that $\hat\beta_{gmm}$ is consistent for $\beta^*$. Some additional matrix algebra can be used to show the following result.

Theorem 3.1. $n^{1/2}(\hat\beta_{gmm} - \beta^*) \xrightarrow{d} N(0, \Sigma)$ with $\Sigma = \begin{bmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{12}' & \Sigma_{22} \end{bmatrix}$, where $\Sigma_{11} = (D_b' V_b^{-1} D_b)^{-1}$, $\Sigma_{12} = -b^* (D_b' V_b^{-1} D_b)^{-1} D_b' V_b^{-1} C_b$, and

$$\Sigma_{22} = b^{*2}\Big( E_f\Big\{\Big[\frac{s(z) - Q_{-L}^*}{b(z)}\Big]\Big[\frac{s(z) - Q_{-L}^*}{b(z)}\Big]'\Big\} - C_b' M_b C_b \Big).$$

$\hat\beta_{gmm}$ is asymptotically efficient because it is well known that optimally weighted GMM estimators are asymptotically efficient (Chamberlain 1987) (footnote 9). Notice that if $b(z)$ is constant so that stratification disappears, then $\Sigma_{11}$ becomes the well known asymptotic variance for estimating $\theta^*$ in the absence of stratification.
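The inverse probability weighted mean of Example 3.2 and the share estimators $\hat Q_l$ are simple enough to compute directly. The sketch below assumes the same kind of hypothetical design as a textbook illustration (target density Uniform(0, 1), strata $[0, 0.5)$ and $[0.5, 1]$, retention probabilities 0.2 and 1.0); none of these choices come from the paper. It also checks numerically that the weighted mean equals the share-weighted average of the stratum means.

```python
import random

# Hypothetical VP design (assumption for illustration only).
P = (0.2, 1.0)


def stratum(z):
    """Index of the stratum containing z: C_1 = [0, 0.5), C_2 = [0.5, 1]."""
    return 0 if z < 0.5 else 1


def vp_sample(n_target, seed=0):
    """VP sampling from a Uniform(0, 1) target population."""
    rng = random.Random(seed)
    return [z for z in (rng.random() for _ in range(n_target))
            if rng.random() < P[stratum(z)]]


def ipw_estimates(sample):
    """Return (theta_hat, q_hat, stratum_means).

    theta_hat = sum_j z_j / b(z_j) divided by sum_j 1 / b(z_j);
    q_hat[l]  = (n_l / P_l) / sum_l' (n_l' / P_l').
    """
    inv_b = [1.0 / P[stratum(z)] for z in sample]
    theta_hat = sum(z * w for z, w in zip(sample, inv_b)) / sum(inv_b)

    counts = [0] * len(P)
    sums = [0.0] * len(P)
    for z in sample:
        l = stratum(z)
        counts[l] += 1
        sums[l] += z

    scaled = [counts[l] / P[l] for l in range(len(P))]
    q_hat = [s / sum(scaled) for s in scaled]
    stratum_means = [sums[l] / counts[l] for l in range(len(P))]
    return theta_hat, q_hat, stratum_means


if __name__ == "__main__":
    theta_hat, q_hat, means = ipw_estimates(vp_sample(200_000))
    print("weighted mean:", theta_hat)          # target mean is 0.5
    print("estimated shares:", q_hat)           # target shares are (0.5, 0.5)
    print("share-weighted stratum means:",
          sum(q * m for q, m in zip(q_hat, means)))
```

Only the known retention probabilities enter the weights; the aggregate shares are estimated, not assumed.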
Similarly, if there is no auxiliary information (e.g., if $g(z,\theta^*)$ is identically zero or if there are no overidentifying restrictions), then $\Sigma_{22}$ reduces to $b^{*2} E_f\{[(s(z) - Q_{-L}^*)/b(z)][(s(z) - Q_{-L}^*)/b(z)]'\}$. Therefore, imposing the overidentified model leads to an efficiency gain in estimating the aggregate shares. Theorem 3.1 also reveals that if $\theta^*$ is the only parameter of interest then it is not necessary to jointly estimate $Q^*$ in order to obtain an efficient estimator of $\theta^*$ (because, as mentioned at the beginning of Section 3.2, a GMM or EL estimator of $\theta^*$ based on the moment condition $E_f\{g(z,\theta^*)/b(z)\} = 0$ alone will be asymptotically efficient; i.e., have asymptotic variance $\Sigma_{11}$).

Footnote 8. Following the discussion in Section 3.1, $E_f[I\{z \in C_l\}/b(z)] = Q_l^* E_f[1/b(z)]$ for $l = 1,\ldots,L$. Hence, the last $L-1$ moment conditions in (3.1) identify the aggregate shares.

Footnote 9. Tripathi (2003) shows from first principles that the efficiency bound for estimating $\theta^*$ coincides with $\Sigma_{11}$. For the sake of conciseness, all efficiency bound calculations are omitted from this paper. Interested readers can find these calculations and additional details on efficiency related issues in the aforementioned paper, which can be downloaded from the author's web site.

Next, we consider estimating $F^*$ and $F$. Since $F^*$ and $F$ have the same support, fix an evaluation point $\xi \in \mathbb{R}^d$. Although $F^*(\xi) = \Pr_{f^*}\{z \le \xi\}$ and $F(\xi) = \Pr_f\{z \le \xi\}$ can be estimated by incorporating additional conditions in (3.1), the resulting estimators will not be computationally practical because they would have to be recalculated each time $\xi$ changed. A better alternative is to do EL. Of course, as mentioned earlier, EL has other properties that make it attractive for estimating $\theta^*$, $Q^*$, and $b^*$ as well. So, for a fixed $\beta$, construct the nonparametric loglikelihood for the realized sample by solving

$$\max_{p_1,\ldots,p_n} \sum_{j=1}^{n} \log p_j \quad \text{subject to} \quad p_j \ge 0, \quad \sum_{j=1}^{n} p_j = 1, \quad \sum_{j=1}^{n} \rho(z_j,\beta)\, p_j = 0.$$

The solution to this optimization problem is given by $\hat p_j(\beta) = n^{-1}\{1 + \lambda'(\beta)\rho(z_j,\beta)\}^{-1}$ for $j = 1,\ldots,n$, where $\lambda(\beta)$ satisfies

$$\sum_{j=1}^{n} \frac{\rho(z_j,\beta)}{1 + \lambda'(\beta)\rho(z_j,\beta)} = 0.$$

Now let $\mathrm{EL}(\beta) = \sum_{j=1}^{n} \log \hat p_j(\beta) = -\sum_{j=1}^{n} \log\{1 + \lambda'(\beta)\rho(z_j,\beta)\} - n\log n$ and define the empirical likelihood estimator of $\beta^*$ as $\hat\beta_{el} = \mathrm{argmax}_{\beta\in B}\, \mathrm{EL}(\beta)$. Consistency of $\hat\beta_{el}$ follows from Kitamura (1997, Theorem 1), and using Qin and Lawless (1994, Theorem 1) along with some additional matrix algebra we can show that $n^{1/2}(\hat\beta_{el} - \beta^*)$ and $n^{1/2}(\hat\beta_{gmm} - \beta^*)$ have the same asymptotic distribution (footnote 10). Therefore, the EL estimators of $\theta^*$ and $Q^*$ are also asymptotically efficient.

Henceforth, for the remainder of the paper let $\hat\theta$ and $\hat Q = (\hat Q_1,\ldots,\hat Q_{L-1}, 1 - \sum_{l=1}^{L-1}\hat Q_l)'$ denote the GMM or EL estimators of $\theta^*$ and $Q^*$, respectively. Also, define $\hat D_b = n^{-1}\sum_{j=1}^{n} \partial g_b(z_j,\hat\theta)/\partial\theta'$, $\hat V_b = n^{-1}\sum_{j=1}^{n} g_b(z_j,\hat\theta) g_b'(z_j,\hat\theta)$, and $\hat C_b = n^{-1}\sum_{j=1}^{n} g_b(z_j,\hat\theta)[s(z_j) - \hat Q_{-L}]'/b(z_j)$.
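The inner step of the EL problem, finding $\lambda(\beta)$ and the implied probabilities $\hat p_j(\beta)$ for a fixed $\beta$, can be sketched for the special case of a single (scalar) moment function. The code below is an illustration under that simplifying assumption, not the paper's implementation: the function name `el_weights` and the use of bisection are choices made here. Bisection works because $\lambda \mapsto \sum_j \rho_j/(1+\lambda\rho_j)$ is strictly decreasing on the interval where all $1 + \lambda\rho_j$ are positive.

```python
import math


def el_weights(rho, iters=200):
    """Empirical likelihood weights for scalar moment values rho_j.

    Solves sum_j rho_j / (1 + lam * rho_j) = 0 for lam by bisection and
    returns (lam, p) with p_j = 1 / (n * (1 + lam * rho_j)).
    Requires min(rho) < 0 < max(rho), otherwise no solution exists.
    """
    n = len(rho)
    r_min, r_max = min(rho), max(rho)
    if not (r_min < 0.0 < r_max):
        raise ValueError("zero must lie strictly inside the range of rho")

    # All 1 + lam * rho_j stay positive only for lam in (-1/r_max, -1/r_min).
    lo, hi = -1.0 / r_max, -1.0 / r_min

    def score(lam):
        return sum(r / (1.0 + lam * r) for r in rho)

    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if score(mid) > 0.0:   # score is decreasing, so the root lies to the right
            lo = mid
        else:
            hi = mid

    lam = 0.5 * (lo + hi)
    p = [1.0 / (n * (1.0 + lam * r)) for r in rho]
    return lam, p


def el_value(p):
    """Log empirical likelihood sum_j log p_j at the given weights."""
    return sum(math.log(pj) for pj in p)


if __name__ == "__main__":
    # Hypothetical moment values rho(z_j, beta) at some fixed beta.
    rho = [-1.0, -0.5, 0.2, 0.8, 1.5]
    lam, p = el_weights(rho)
    print("lambda:", lam)
    print("weights sum to:", sum(p))
    print("weighted moment:", sum(pj * r for pj, r in zip(p, rho)))
    print("EL value:", el_value(p))
```

With a vector-valued $\rho$ the same first-order condition has to be solved in several dimensions, typically by Newton-type iterations; that case is not covered by this sketch.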
Since $\hat D_b \xrightarrow{p} D_b$, $\hat V_b \xrightarrow{p} V_b$, $\hat C_b \xrightarrow{p} C_b$, $n/\sum_{j=1}^{n}\{1/b(z_j)\} \xrightarrow{p} b^*$, and $n^{-1}\sum_{j=1}^{n} [s(z_j) - \hat Q_{-L}][s(z_j) - \hat Q_{-L}]'/b^2(z_j) \xrightarrow{p} E_f\{[(s(z) - Q_{-L}^*)/b(z)][(s(z) - Q_{-L}^*)/b(z)]'\}$, the asymptotic variances of $\hat\theta$ and $\hat Q$ can be easily estimated by replacing $\Sigma_{11}$ and $\Sigma_{22}$ with their consistent estimators.

Footnote 10. Although GMM and EL estimators have the same asymptotic distribution, they differ from one another in finite samples. If, however, the model is just identified (i.e., if $q = p$), then the two coincide in finite samples as well because they are obtained by setting the sample analog of $E_f\{\rho(z,\beta^*)\}$ to zero; i.e., by solving $\sum_{j=1}^{n} \rho(z_j,\hat\beta) = 0$ for $\hat\beta$.

Example 3.3 (Example 3.1 contd.). Since $\beta^*$ is just identified, the GMM or EL estimators of $\theta^*$ and $Q_l^*$ are given by $\hat\theta = \{\sum_{j=1}^{n} x_j x_j'/b(z_j)\}^{-1} \sum_{j=1}^{n} x_j y_j/b(z_j)$ and $\hat Q_l = (n_l/P_l)/\sum_{l'=1}^{L} (n_{l'}/P_{l'})$, respectively. By Theorem 3.1, $n^{1/2}(\hat\theta - \theta^*)$ is asymptotically normal with mean zero and variance

$$\Big\{E_f \frac{xx'}{b(z)}\Big\}^{-1} E_f\Big\{\frac{xx'\varepsilon^2}{b^2(z)}\Big\} \Big\{E_f \frac{xx'}{b(z)}\Big\}^{-1},$$

which resembles the Eicker-White heteroscedasticity consistent asymptotic variance with a correction for stratification. Similarly, after a little simplification it follows that $n^{1/2}(\hat Q_l - Q_l^*) \xrightarrow{d} N(0, b^*\{Q_l^* - 2Q_l^{*2} + kP_l Q_l^{*2}\}/P_l)$, where $k = \sum_{l=1}^{L} (Q_l^*/P_l)$.

Before describing estimators of $F^*(\xi)$ and $F(\xi)$, let us see how $b^*$ can be efficiently estimated. Since, by (1.2), $b^* = \sum_{l=1}^{L} P_l Q_l^*$, the GMM or EL estimator (depending on whether $Q^*$ is estimated by GMM or EL) of $b^*$ is given by $\hat b = \sum_{l=1}^{L} P_l \hat Q_l$. As in Theorem 3.1, it is straightforward to show that $n^{1/2}(\hat b - b^*) = n^{-1/2}\sum_{j=1}^{n} \varphi(z_j) + o_p(1)$, where

$$\varphi(z) = b^*\Big\{\frac{b(z) - b^*}{b(z)}\Big\} + b^{*2} E_f\{g_b'(z,\theta^*)/b(z)\}\, M_b\, g_b(z,\theta^*).$$

Therefore, the next result is immediate.
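The contrast between unweighted least squares (Example 3.1) and the weighted estimator of Example 3.3 can be illustrated by simulation. The following sketch assumes a hypothetical design chosen only for illustration: $y = 1 + 2x + \varepsilon$ with $x \sim$ Uniform(0, 1) and $\varepsilon \sim N(0,1)$ independent of $x$, stratification on the outcome with strata $\{y < 2\}$ and $\{y \ge 2\}$, and retention probabilities 0.2 and 1.0. With a constant and one regressor, the weighted normal equations have the closed-form slope used in `slope`.

```python
import random

# Hypothetical design (assumption for illustration only):
# strata C_1 = {y < 2}, C_2 = {y >= 2}, retention probabilities P_1, P_2.
P = (0.2, 1.0)


def b(y):
    """Retention probability b(z) of an observation with outcome y."""
    return P[0] if y < 2.0 else P[1]


def vp_regression_sample(n_target, seed=0):
    """Draw (x, y) from the target model y = 1 + 2x + e and apply VP sampling."""
    rng = random.Random(seed)
    data = []
    for _ in range(n_target):
        x = rng.random()
        y = 1.0 + 2.0 * x + rng.gauss(0.0, 1.0)
        if rng.random() < b(y):
            data.append((x, y))
    return data


def slope(data, weights):
    """Slope of a weighted least squares fit of y on (1, x)."""
    sw = sum(weights)
    swx = sum(w * x for (x, _), w in zip(data, weights))
    swy = sum(w * y for (_, y), w in zip(data, weights))
    swxx = sum(w * x * x for (x, _), w in zip(data, weights))
    swxy = sum(w * x * y for (x, y), w in zip(data, weights))
    return (sw * swxy - swx * swy) / (sw * swxx - swx * swx)


if __name__ == "__main__":
    data = vp_regression_sample(100_000)
    ols = slope(data, [1.0] * len(data))            # ignores stratification
    wls = slope(data, [1.0 / b(y) for _, y in data])  # weights 1 / b(z_j)
    print("unweighted slope:", ols)
    print("weighted slope:  ", wls, "(target slope is 2)")
```

Because the strata depend on $y$, the unweighted slope is attenuated in this design, while weighting each observation by $1/b(z_j)$ recovers the target slope up to sampling error.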

Theorem 3.2. $n^{1/2}(\hat b - b^*) \xrightarrow{d} N\big(0,\ b^{*2} E_f\{[b(z) - b^*]/b(z)\}^2 - b^{*4} E_f\{g_b'(z,\theta^*)/b(z)\}\, M_b\, E_f\{g_b(z,\theta^*)/b(z)\}\big)$.

Since $\hat b$ is a known linear function of $\hat Q$ and the latter is asymptotically efficient, it follows that $\hat b$ is also asymptotically efficient. If there is no overidentification the asymptotic variance of $n^{1/2}(\hat b - b^*)$ becomes $b^{*2} E_f\{[b(z) - b^*]/b(z)\}^2$. This makes sense because $\hat Q_l = (n_l/P_l)/\sum_{l'=1}^{L} (n_{l'}/P_{l'})$ when $q = p$ and, hence, $\hat b = \sum_{l=1}^{L} P_l \hat Q_l = n/\sum_{j=1}^{n}\{1/b(z_j)\}$ is just the sample analog of $1/E_f\{1/b(z)\}$.

Now that we have both $\hat\beta_{el}$ and $\hat b_{el}$, $F^*(\xi)$ and $F(\xi)$ can be easily estimated by $\hat F^*(\xi) = \hat b_{el} \sum_{j=1}^{n} \hat p_j(\hat\beta_{el})\, I\{z_j \le \xi\}/b(z_j)$ and $\hat F(\xi) = \sum_{j=1}^{n} \hat p_j(\hat\beta_{el})\, I\{z_j \le \xi\}$, respectively. The next two results can be shown by using Qin and Lawless (1994, Theorem 1) and some tedious matrix algebra.

Theorem 3.3. $n^{1/2}\{\hat F^*(\xi) - F^*(\xi)\}$ is asymptotically normal with mean zero and variance

$$b^{*2} E_f\Big\{\frac{I\{z \le \xi\} - F^*(\xi)}{b(z)}\Big\}^2 - b^{*2} E_f\Big\{\frac{g_b'(z,\theta^*)[I\{z \le \xi\} - F^*(\xi)]}{b(z)}\Big\}\, M_b\, E_f\Big\{\frac{g_b(z,\theta^*)[I\{z \le \xi\} - F^*(\xi)]}{b(z)}\Big\}.$$

Theorem 3.4. $n^{1/2}\{\hat F(\xi) - F(\xi)\}$ is asymptotically normal with mean zero and variance $F(\xi)\{1 - F(\xi)\} - E_f[g_b'(z,\theta^*) I\{z \le \xi\}]\, M_b\, E_f[g_b(z,\theta^*) I\{z \le \xi\}]$.

As before, there is a gain in efficiency when estimating $F^*(\xi)$ and $F(\xi)$ if the model is overidentified. Tripathi (2003) shows that the asymptotic variances in Theorems 3.3 and 3.4 correspond to the efficiency bounds for estimating $F^*(\xi)$ and $F(\xi)$, respectively. Hence, these estimators are asymptotically efficient. Notice that if $f = f^*$ (i.e., no stratification), then the asymptotic variances in Theorems 3.3 and 3.4 reduce to

$$F^*(\xi)[1 - F^*(\xi)] - E[g'(z,\theta^*) I\{z \le \xi\}]\{V^{-1} - V^{-1} D (D' V^{-1} D)^{-1} D' V^{-1}\} E[g(z,\theta^*) I\{z \le \xi\}],$$

where $D = E\{\partial g(z,\theta^*)/\partial\theta'\}$ and $V = E\{g(z,\theta^*) g'(z,\theta^*)\}$. This corresponds to the asymptotic variance for estimating $F^*(\xi)$ under $H_0$ in the absence of stratification obtained by Brown and Newey (1998).

Example 3.4. Suppose we know a priori that $E_{f^*}\{g(z)\} = 0$, where $g$ is a $q \times 1$ vector of known functions that does not depend upon any parameter $\theta$.
Auxiliary information models of this type, which are a special case of (1.1), have been investigated by Imbens and Lancaster (1994), Hellerstein and Imbens (1999), and Nevo (2003), although these authors do not consider efficient estimation of $Q^*$, $b^*$, $F^*$, or $F$ for such models. Since there is no $\theta$ to be estimated here, the asymptotic distributions for GMM or EL estimators of $Q^*$, $b^*$, $F^*(\xi)$, and $F(\xi)$ under $E_{f^*}\{g(z)\} = 0$ follow from Theorems 3.1 to 3.4 by replacing $g_b(z, \theta^*)$ with $g_b(z)$ and setting $D_b = 0$.

3.3. Hypothesis testing. Suppose we want to test the parametric restriction $H_0 : R(\theta^*) = 0$ against the alternative $H_1 : R(\theta^*) \ne 0$, where $R$ is an $r \times 1$ vector of twice continuously differentiable functions such that $\partial R(\theta^*)/\partial\theta'$ has rank $r \le p$. Since $n^{1/2}(\hat\theta - \theta^*) \xrightarrow{d} N(0, (D_b' V_b^{-1} D_b)^{-1})$, the GMM or EL based Wald statistic satisfies

$$W = n R'(\hat\theta)\Big\{\frac{\partial R(\hat\theta)}{\partial\theta'} (\hat D_b' \hat V_b^{-1} \hat D_b)^{-1} \frac{\partial R'(\hat\theta)}{\partial\theta}\Big\}^{-1} R(\hat\theta) \xrightarrow{d} \chi^2_r.$$

Hence, a test for $H_0$ against $H_1$ can be based upon $W$. Another alternative is to base the test on the objective functions themselves. So let $\tilde\beta_{gmm} = \operatorname{argmin}_{\{\beta \in B : R(\theta) = 0\}} GMM(\beta)$ and $\tilde\beta_{el} = \operatorname{argmax}_{\{\beta \in B : R(\theta) = 0\}} EL(\beta)$ denote the GMM and EL estimators under $H_0$. Next, define $DM = n\{GMM(\tilde\beta_{gmm}) - GMM(\hat\beta_{gmm})\}$ and $LR = 2\{EL(\hat\beta_{el}) - EL(\tilde\beta_{el})\}$, where $DM$ denotes the GMM based distance metric statistic and $LR$ the EL based likelihood ratio test statistic. A test for $H_0$ against $H_1$ can be based upon $DM$ or $LR$. Critical values for these tests follow from Newey and McFadden (1994, Theorem 9.2) and Qin and Lawless (1994, Theorem 2), who show that $DM \xrightarrow{d} \chi^2_r$ and $LR \xrightarrow{d} \chi^2_r$ under $H_0$.

Since $W$, $DM$, and $LR$ are asymptotically equivalent, the decision to use a particular test statistic depends upon computational and other considerations; e.g., though all three can be inverted to obtain asymptotically valid confidence regions, the $DM$ and $LR$ based regions are invariant to the formulation of $H_0$ and automatically satisfy natural range restrictions. Furthermore, unlike $W$ and $DM$, the likelihood ratio statistic $LR$ is internally studentized; i.e., it does not require preliminary estimation of any variance terms. This guarantees that confidence regions based on $LR$ are also invariant to nonsingular transformations of the moment conditions. Internal studentization may also lead to better finite sample properties for $LR$; see, e.g., Fisher, Hall, Jing, and Wood (1996).

3.4. Specification testing. For the remainder of this section, assume that $q > p$. Since inference based on the estimated $\theta^*$ is sensible only if (1.1) is true, it is important to devise a test for $H_0$ against the alternative that it is false. In this section we describe two ways of testing $H_0$. The first one is easy: GMM theory tells us that $n\, GMM(\hat\beta_{gmm}) \xrightarrow{d} \chi^2_{q-p}$ under $H_0$. Therefore, a test for overidentifying restrictions (usually called the J-test) in (1.1) can be based on this result. An EL based specification test for $H_0$ can also be developed. Besides being internally studentized and invariant to nonsingular and algebraic transformations of the moment conditions, this test has been shown by Kitamura (2001) to be optimal in terms of a Hoeffding type large deviations criterion. So let $\hat\beta$ denote an $n^{1/2}$-consistent preliminary estimator of $\beta^*$; e.g., $\hat\beta$ can be the GMM or EL estimator defined previously.
The restricted, i.e., under $H_0$, EL can be written as $EL_r = \sum_{j=1}^n \log \hat p_j(\hat\beta)$, where $\hat p_j(\hat\beta)$ are the EL probability weights defined earlier. Next, consider the unrestricted problem where the model is not imposed. It is well known that the nonparametric maximum likelihood estimator of $f$ in the absence of any auxiliary information puts mass $1/n$ at each realized observation and is zero elsewhere. Therefore, the unrestricted nonparametric likelihood for the realized sample is given by $EL_{ur} = -n \log n$. Now let $ELR = 2(EL_{ur} - EL_r) = 2\sum_{j=1}^n \log\{1 + \lambda'(\hat\beta)\rho(z_j, \hat\beta)\}$, where $\lambda(\hat\beta)$ was defined earlier in Section 3.2. Then $ELR$ can be regarded as an analog of the usual parametric likelihood ratio test statistic; i.e., $H_0$ is rejected if $ELR$ is large enough. Critical values for $ELR$ are easily obtained because $ELR \xrightarrow{d} \chi^2_{q-p}$ under $H_0$ by Qin and Lawless (1994, Corollary 4).

4. Inference in SSBS models

We now investigate SSBS models. As shown subsequently, the major difference between VPBS and SSBS models is that unknown aggregate shares create severe identification problems for SSBS models that were not encountered with VPBS models.

4.1. Identification. Although $f^*(\xi) = f(\xi)/b(\xi, Q^*)$ by (1.4), we cannot recover $f^*$ in terms of the realized density $f$ because the aggregate shares $Q^*$ are unknown. Thus the target pdf is unidentified and cannot be estimated consistently. So the first task is to overcome this lack of identification. As noted by Cosslett (1981a), Gill, Vardi, and Wellner (1988), and Imbens and Lancaster (1996), in order to ensure identification of $f^*$ we have to randomly sample from the entire target population with positive probability. We now formalize this assumption.

Assumption 4.1. The realized SSBS density is given by $f_e(\xi) = K_0 f^*(\xi) + (1 - K_0) b(\xi, Q^*) f^*(\xi)$, where $K_0 \in (0, 1]$ is known.

Letting $c(\xi, Q^*) = K_0 + (1 - K_0) b(\xi, Q^*)$, we can conveniently write $f_e(\xi) = c(\xi, Q^*) f^*(\xi)$. Note that $K_0$ denotes the probability of randomly sampling from the whole target population. As a practical example, consider a dataset with 100 observations, 75 of which were obtained using SS sampling while the remaining 25 were obtained by random sampling. Hence, we can let $K_0 = 0.25$ and regard this enriched dataset as a random sample from $f_e$. The no stratification case corresponds to setting $K_0 = 1$. Since $Q_l^* = \{E_{f_e} I(z \in C_l) - (1 - K_0) K_l\}/K_0$ for $l = 1, \ldots, L$ under Assumption 4.1, the aggregate shares (and hence $f^*$) are identified.

The idea of using additional samples to ensure identification in biased sampling, incomplete, or measurement error models has of course been recognized earlier; see, e.g., Manski and Lerman (1977), Titterington (1983), Titterington and Mill (1983), Vardi (1985), Ridder (1992), Carroll, Ruppert, and Stefanski (1995), Wansbeek and Meijer (2000), Hirano, Imbens, Ridder, and Rubin (2001), Tripathi (2004), and the references therein. However, the use of supplementary random samples to do moment based inference in overidentified models with stratified data seems to be new to the literature and the results described in this paper cannot be found in any of the references cited here. The existence of such additional random samples should not be regarded as being an overly restrictive requirement. For instance, if the cost of collecting additional data is relatively small, Manski and Lerman (1977) suggest carrying out a small random survey to gather a supplementary sample in order to deal with the problem of unknown aggregate shares. Indeed, some important stratified U.S. datasets such as the Panel Study of Income Dynamics (PSID) and the National Longitudinal Survey (NLS) automatically provide an additional random sample that can be used for this purpose.

4.2. Efficient estimation and inference. Let $Q = (Q_1, \ldots, Q_{L-1}, 1 - \sum_{l=1}^{L-1} Q_l)$. Using some notation introduced earlier in Section 3.2, define$^{11}$

$$\rho(z, \beta) = \begin{bmatrix} g(z, \theta)/c(z, Q) \\ I\{z \in C_1\}/c(z, Q) - Q_1 \\ \vdots \\ I\{z \in C_{L-1}\}/c(z, Q) - Q_{L-1} \end{bmatrix} = \begin{bmatrix} g(z, \theta)/c(z, Q) \\ s(z)/c(z, Q) - Q_{-L} \end{bmatrix}_{(q+L-1) \times 1}. \qquad (4.1)$$

Since $E_{f_e}\{\rho(z, \beta^*)\} = 0$, the optimal GMM or EL estimators of $\theta^*$ and $Q^*$ based on (4.1) are asymptotically efficient. Tripathi (2003) proves this from first principles. As in Section 3.2, the GMM estimator is given by $\hat\beta_{gmm} = \operatorname{argmin}_{\beta \in B} GMM(\beta)$. Now let $\tilde g(z, \theta^*) = g(z, \theta^*) + K_0^{-1}(1 - K_0)\sum_{l=1}^L (K_l/Q_l^{*2}) I(z \in C_l) E_{f^*}\{g(z, \theta^*) I(z \in C_l)\}$ and define $D_c = E_{f_e}\{[\partial g(z, \theta^*)/\partial\theta']/c(z, Q^*)\}$, $\tilde g_c(z, \theta^*) = \tilde g(z, \theta^*)/c(z, Q^*)$, $C_c = \operatorname{cov}_{f_e}\{\tilde g_c(z, \theta^*), s(z)/c(z, Q^*)\}$, $\Omega_c = \operatorname{var}_{f_e}\{\tilde g_c(z, \theta^*)\}$, and $M_c = \Omega_c^{-1} - \Omega_c^{-1} D_c (D_c' \Omega_c^{-1} D_c)^{-1} D_c' \Omega_c^{-1}$. The asymptotic distribution of $\hat\beta_{gmm}$ is given below.

Footnote 11. The last $L - 1$ moment conditions in (4.1) identify the aggregate shares because, by using the definition of $c(z, Q)$, it is easily seen that $E_{f_e}[I\{z \in C_l\}/c(z, Q)] = Q_l$ if and only if $[E_{f_e} I\{z \in C_l\} - (1 - K_0) K_l]/K_0 = Q_l$.
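To make the structure of (4.1) concrete, here is a minimal sketch that stacks the moment function $\rho(z, \beta)$ for an enriched sample. It assumes strata are encoded as integer labels `0, ..., L-1`, that the sampling fractions `K` and the mixing probability `K0` are known, and that the user supplies the values $g(z_j, \theta)$ as an array. The function name `ssbs_moments` and its arguments are hypothetical and serve only as an illustration.

```python
import numpy as np

def ssbs_moments(g_vals, stratum, Q_free, K, K0):
    """Stack rho(z_j, beta) from (4.1) into an (n, q + L - 1) array.

    g_vals: (n, q) array of g(z_j, theta) at the chosen theta,
    stratum: (n,) integer labels in 0..L-1 (hypothetical encoding),
    Q_free: (L-1,) candidate shares Q_1..Q_{L-1}; Q_L is implied,
    K: (L,) known sampling fractions, K0: known mixing probability in (0, 1].
    """
    g_vals = np.asarray(g_vals, dtype=float)
    stratum = np.asarray(stratum, dtype=int)
    Q_free = np.asarray(Q_free, dtype=float)
    K = np.asarray(K, dtype=float)
    L = len(K)

    # Full share vector Q = (Q_1, ..., Q_{L-1}, 1 - sum of the others).
    Q = np.append(Q_free, 1.0 - Q_free.sum())

    # c(z_j, Q) = K0 + (1 - K0) * K_l / Q_l for z_j in stratum C_l.
    c = K0 + (1.0 - K0) * (K / Q)[stratum]

    # s(z_j): indicators of the first L - 1 strata.
    s = (stratum[:, None] == np.arange(L - 1)[None, :]).astype(float)

    top = g_vals / c[:, None]               # g(z, theta) / c(z, Q)
    bottom = s / c[:, None] - Q_free        # s(z) / c(z, Q) - Q_{-L}
    return np.hstack([top, bottom])
```

Averaging the returned rows gives the sample moments that GMM or EL would set (approximately) to zero; in the just-identified case they are exactly zero at the estimates.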

Theorem 4.1. Let Assumption 4.1 hold, $\alpha_l^* = K_0 Q_l^* + (1 - K_0) K_l$, $\alpha^* = (\alpha_1^*, \ldots, \alpha_{L-1}^*)'_{(L-1) \times 1}$, and $A^* = \operatorname{diag}(\alpha_1^*, \ldots, \alpha_{L-1}^*)_{(L-1) \times (L-1)}$. Then

$$n^{1/2}(\hat\beta_{gmm} - \beta^*) \xrightarrow{d} N\left(0, \begin{bmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{12}' & \Sigma_{22} \end{bmatrix}\right),$$

where $\Sigma_{11} = (D_c' \Omega_c^{-1} D_c)^{-1}$, $\Sigma_{12} = -(D_c' \Omega_c^{-1} D_c)^{-1} D_c' \Omega_c^{-1} C_c/K_0$, and $\Sigma_{22} = \{A^* - \alpha^*\alpha^{*\prime} - C_c' M_c C_c\}/K_0^2$.

If $L = 1$ or $K_0 = 1$ so that stratification disappears, then $\Sigma_{11}$ is just the asymptotic variance for estimating $\theta^*$ in the absence of stratification. If there are no overidentifying restrictions, then $\Sigma_{22}$ reduces to $\{A^* - \alpha^*\alpha^{*\prime}\}/K_0^2$. Hence, as expected, imposing the overidentified model leads to an efficiency gain in estimating the aggregate shares. Note that, loosely speaking, the second term in the definition of $\tilde g(z, \theta^*)$ can be motivated as bias that appears because $Q^*$ is an unknown parameter. To get a precise explanation of why the asymptotic variance of $\hat\theta_{gmm}$ in Theorem 4.1 depends upon $\tilde g(z, \theta^*)$, the reader should examine the proof of Theorem 4.4 in Tripathi (2003).

Next, as in Section 3.2, define the EL estimator $\hat\beta_{el} = \operatorname{argmax}_{\beta \in B} EL(\beta)$. Consistency of $\hat\beta_{el}$ follows from Kitamura (1997, Theorem 1). Using Qin and Lawless (1994, Theorem 1) and the matrix algebra in the proof of Theorem 4.1, we can show that the asymptotic distribution of $n^{1/2}(\hat\beta_{el} - \beta^*)$ is the same as that of $n^{1/2}(\hat\beta_{gmm} - \beta^*)$; i.e., $\hat\theta_{el}$ and $\hat Q_{el}$ are also asymptotically efficient. Asymptotic variances of $\hat Q$ and $\hat\theta$ can be estimated in the obvious manner by replacing $\Sigma_{11}$ and $\Sigma_{22}$ with their sample analogs. These can be constructed by using$^{12}$ $\hat D_c = n^{-1}\sum_{j=1}^n [\partial g(z_j, \hat\theta)/\partial\theta']/c(z_j, \hat Q)$, $\hat C_c = n^{-1}\sum_{j=1}^n \{\hat{\tilde g}_c(z_j, \hat\theta) s'(z_j)/c(z_j, \hat Q)\} - n^{-1}\sum_{j=1}^n \{\hat{\tilde g}_c(z_j, \hat\theta)\}\, n^{-1}\sum_{j=1}^n \{s'(z_j)/c(z_j, \hat Q)\}$, and $\hat\Omega_c = n^{-1}\sum_{j=1}^n \{\hat{\tilde g}_c(z_j, \hat\theta) \hat{\tilde g}_c'(z_j, \hat\theta)\} - n^{-1}\sum_{j=1}^n \{\hat{\tilde g}_c(z_j, \hat\theta)\}\, n^{-1}\sum_{j=1}^n \{\hat{\tilde g}_c'(z_j, \hat\theta)\}$, where $\hat{\tilde g}_c(z, \theta)$ denotes $\tilde g_c(z, \theta)$ evaluated at $Q^* = \hat Q$.

Example 4.1. Suppose we want to estimate $\theta^*$, the mean of the target population.
Since $\theta^*$ is just identified, it is easily seen that $\hat\theta = \sum_{l=1}^L \hat Q_l \bar z_l$, where $\bar z_l$ is the sample mean of the observations in stratum $C_l$ and $\hat Q_l = \{n_l/n - (1 - K_0) K_l\}/K_0$. The asymptotic distribution of $\hat\theta$ follows by a direct application of the CLT or by Theorem 4.1 upon noting that $D_c$ is the $p \times p$ identity matrix.

Example 4.2 (SSBS linear regression). Consider the model in Example 3.1, but now assume that $z$ is drawn from the enriched SSBS density. Then $\hat\theta = \{\sum_{j=1}^n x_j x_j'/c(z_j, \hat Q)\}^{-1} \sum_{j=1}^n x_j y_j/c(z_j, \hat Q)$, where $\hat Q$ was defined in the previous example. Hence, by Theorem 4.1, $n^{1/2}(\hat\theta - \theta^*)$ is asymptotically normal with mean zero and variance $\{E_{f_e}\, xx'/c(z, Q^*)\}^{-1} \Omega_c \{E_{f_e}\, xx'/c(z, Q^*)\}^{-1}$. By the same result, since $\Sigma_{22} = \{A^* - \alpha^*\alpha^{*\prime}\}/K_0^2$ when there are no overidentifying restrictions, $n^{1/2}(\hat Q_l - Q_l^*) \xrightarrow{d} N(0, \alpha_l^*(1 - \alpha_l^*)/K_0^2)$ for each $l$.

We can also use $\hat\beta_{el}$ to estimate $F^*(\xi)$ and $F_e(\xi)$. Let $\hat F^*(\xi) = \sum_{j=1}^n \hat p_j(\hat\beta_{el}) I\{z_j \le \xi\}/c(z_j, \hat Q_{el})$ and $\hat F_e(\xi) = \sum_{j=1}^n \hat p_j(\hat\beta_{el}) I\{z_j \le \xi\}$, where the $\hat p_j(\beta)$'s are the EL probabilities. The asymptotic distributions of $\hat F^*(\xi)$ and $\hat F_e(\xi)$ are given below.

Theorem 4.2. Let Assumption 4.1 hold. Then $n^{1/2}\{\hat F^*(\xi) - F^*(\xi)\}$ is asymptotically normal with mean zero and variance $\operatorname{var}_{f_e}\{m_c(z, \xi)\} - \operatorname{cov}_{f_e}\{\tilde g_c'(z, \theta^*), m_c(z, \xi)\}\, M_c \operatorname{cov}_{f_e}\{\tilde g_c(z, \theta^*), m_c(z, \xi)\}$, where $m(z, \xi) = I\{z \le \xi\} + K_0^{-1}(1 - K_0)\sum_{l=1}^L (K_l/Q_l^{*2}) I\{z \in C_l\} \Pr_{f^*}\{z \le \xi, z \in C_l\}$ and $m_c(z, \xi) = m(z, \xi)/c(z, Q^*)$.

Footnote 12. Since we continue to use $n$ to denote the size of the enriched dataset, when estimating asymptotic variances each $K_l$ should be replaced by the corresponding underlying sampling fraction for the stratified sample.
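The estimators in Examples 4.1 and 4.2 can be sketched as follows. This is a minimal illustration under stated assumptions: strata are integer labels `0, ..., L-1`, every stratum is non-empty, `K` and `K0` are known, and the estimated shares are strictly positive (which is not guaranteed in small samples). The helper names `ssbs_weights`, `ssbs_wls`, and `ssbs_mean` are hypothetical.

```python
import numpy as np

def ssbs_weights(stratum, K, K0):
    """Return (Q_hat, 1/c(z_j, Q_hat)) for an enriched SSBS sample."""
    stratum = np.asarray(stratum, dtype=int)
    K = np.asarray(K, dtype=float)
    # alpha_hat_l = n_l / n estimates alpha_l = K0*Q_l + (1 - K0)*K_l.
    alpha_hat = np.bincount(stratum, minlength=len(K)) / len(stratum)
    Q_hat = (alpha_hat - (1.0 - K0) * K) / K0
    c = K0 + (1.0 - K0) * (K / Q_hat)[stratum]
    return Q_hat, 1.0 / c

def ssbs_wls(x, y, stratum, K, K0):
    """Example 4.2: least squares with weights 1/c(z_j, Q_hat)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    _, w = ssbs_weights(stratum, K, K0)
    xw = x * w[:, None]
    return np.linalg.solve(xw.T @ x, xw.T @ y)

def ssbs_mean(z, stratum, K, K0):
    """Example 4.1: theta_hat = sum_l Q_hat_l * (stratum-l sample mean)."""
    z = np.asarray(z, dtype=float)
    stratum = np.asarray(stratum, dtype=int)
    Q_hat, _ = ssbs_weights(stratum, K, K0)
    counts = np.bincount(stratum, minlength=len(Q_hat))
    zbar = np.bincount(stratum, weights=z, minlength=len(Q_hat)) / counts
    return float(Q_hat @ zbar)
```

Regressing on a constant with `ssbs_wls` reproduces `ssbs_mean`, which is one way to check that the weighting by $1/c(z_j, \hat Q)$ and the stratum-mean formula agree.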

Theorem 4.3. Let Assumption 4.1 hold. Then $n^{1/2}\{\hat F_e(\xi) - F_e(\xi)\}$ is asymptotically normal with mean zero and variance $F_e(\xi)\{1 - F_e(\xi)\} - \operatorname{cov}_{f_e}\{\tilde g_c'(z, \theta^*), I(z \le \xi)\}\, M_c \operatorname{cov}_{f_e}\{\tilde g_c(z, \theta^*), I(z \le \xi)\}$.

These results again reveal that imposing the overidentified model leads to an efficiency gain in estimating $F^*$ and $F_e$. Asymptotic optimality of EL implies that $\hat F^*(\xi)$ and $\hat F_e(\xi)$ are also asymptotically efficient; alternatively, see Tripathi (2003) for a direct proof. Hypotheses of the form $R(\theta^*) = 0$ vs $R(\theta^*) \ne 0$ can be tested using the Wald, DM, or LR statistics as described in Section 3.3 by basing the test on (4.1); in each case, the test statistic is asymptotically distributed as a $\chi^2_r$ random variable. If $q > p$, then GMM or EL based specification testing of $H_0$ can also be done using (4.1), the details being analogous to those in Section 3.4; i.e., in either case, the test statistic is asymptotically $\chi^2_{q-p}$.

5. Conclusion

Using stratified data we develop GMM and EL based efficient semiparametric inference for moment restriction models that may be overidentified. Stratification of the target population induces selection in the realized sample which, if not taken into account, can lead to inconsistent estimators and biased inference. As shown in the paper, correcting for the effects of stratification is straightforward; namely, an appropriate transformation of the moment conditions ensures that all standard GMM and EL based inference goes through. No special software is required to implement the procedures developed in this paper; any computer package that can do GMM or EL will be able to do the same with stratified data.

Appendix A. Proofs

We only provide proofs for the results in Section 4 since SSBS models with unknown aggregate shares are the hardest to handle. Results for VPBS models described in Section 3 can be shown in a similar manner.

Proof of Theorem 4.1. We prove this result for the $L = 2$ case.
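The case when there are more than two strata can be handled in a similar manner although the matrix algebra becomes much more tedious. When there are only two strata, $\rho(z, \beta^*) = (g'(z, \theta^*)/c(z, Q^*),\; I\{z \in C_1\}/c(z, Q^*) - Q_1^*)'_{(q+1) \times 1}$ and $Q^* = (Q_1^*, Q_2^*)'_{2 \times 1}$. By standard GMM theory, $n^{1/2}(\hat\beta_{gmm} - \beta^*)$ is asymptotically normal with mean zero and variance $(D_{f_e}' V_{f_e}^{-1} D_{f_e})^{-1}$, where $D_{f_e} = E_{f_e}\{\partial\rho(z, \beta^*)/\partial\beta'\}$ and $V_{f_e} = E_{f_e}\{\rho(z, \beta^*)\rho'(z, \beta^*)\}$. Therefore, we show that

$$(D_{f_e}' V_{f_e}^{-1} D_{f_e})^{-1} = \begin{bmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{12}' & \Sigma_{22} \end{bmatrix}.$$

So define $G_1 = E_{f_e}\{g(z, \theta^*) I(z \in C_1)/c(z, Q^*)\}$, $V_c = E_{f_e}\{g(z, \theta^*) g'(z, \theta^*)/c^2(z, Q^*)\}$, and $d = K_1/(Q_1^*\alpha_1^*) + K_2/(Q_2^*\alpha_2^*)$. It is easily seen that

$$D_{f_e} = \begin{bmatrix} D_c & (1 - K_0)\, d\, G_1 \\ 0 & -K_0 Q_1^*/\alpha_1^* \end{bmatrix} \quad\text{and}\quad V_{f_e} = \begin{bmatrix} V_c & Q_1^* G_1/\alpha_1^* \\ Q_1^* G_1'/\alpha_1^* & Q_1^{*2}\alpha_2^*/\alpha_1^* \end{bmatrix}.$$

Next, since $\Omega_c = V_c + \gamma G_1 G_1'/K_0^2$ where $\gamma = \alpha_1^*\alpha_2^*(1 - K_0)^2 d^2 + 2K_0(1 - K_0) d$, we can also show that

$$V_{f_e}^{-1} = \begin{bmatrix} \Omega_c^{-1} + \dfrac{\Omega_c^{-1} C_c C_c' \Omega_c^{-1}}{\alpha_1^*\alpha_2^* - C_c'\Omega_c^{-1}C_c} & -\dfrac{\alpha_1^*}{Q_1^*}\, \dfrac{\Omega_c^{-1} G_1}{\alpha_1^*\alpha_2^* - C_c'\Omega_c^{-1}C_c} \\[2ex] -\dfrac{\alpha_1^*}{Q_1^*}\, \dfrac{G_1'\Omega_c^{-1}}{\alpha_1^*\alpha_2^* - C_c'\Omega_c^{-1}C_c} & \dfrac{\alpha_1^{*2}}{Q_1^{*2}}\, \dfrac{K_0^2 - \gamma G_1'\Omega_c^{-1}G_1}{K_0^2\,(\alpha_1^*\alpha_2^* - C_c'\Omega_c^{-1}C_c)} \end{bmatrix}.$$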

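Hence,

$$D_{f_e}' V_{f_e}^{-1} D_{f_e} = \begin{bmatrix} D_c'\Omega_c^{-1}D_c + \dfrac{D_c'\Omega_c^{-1}C_c C_c'\Omega_c^{-1}D_c}{\alpha_1^*\alpha_2^* - C_c'\Omega_c^{-1}C_c} & \dfrac{K_0 D_c'\Omega_c^{-1}C_c}{\alpha_1^*\alpha_2^* - C_c'\Omega_c^{-1}C_c} \\[2ex] \dfrac{K_0 C_c'\Omega_c^{-1}D_c}{\alpha_1^*\alpha_2^* - C_c'\Omega_c^{-1}C_c} & \dfrac{K_0^2}{\alpha_1^*\alpha_2^* - C_c'\Omega_c^{-1}C_c} \end{bmatrix}.$$

Finally, the partitioned inverse formula reveals that

$$(D_{f_e}' V_{f_e}^{-1} D_{f_e})^{-1} = \begin{bmatrix} (D_c'\Omega_c^{-1}D_c)^{-1} & -(D_c'\Omega_c^{-1}D_c)^{-1} D_c'\Omega_c^{-1}C_c/K_0 \\ -C_c'\Omega_c^{-1}D_c (D_c'\Omega_c^{-1}D_c)^{-1}/K_0 & \{\alpha_1^*\alpha_2^* - C_c' M_c C_c\}/K_0^2 \end{bmatrix}.$$

The desired result follows.

Proof of Theorem 4.2. Recall $\hat F^*(\xi) = \sum_{j=1}^n \hat p_j(\hat\beta_{el}) I\{z_j \le \xi\}/c(z_j, \hat Q_{el})$. By a Taylor expansion,

$$\frac{1}{c(z, \hat Q_{el})} = \frac{1}{c(z, Q^*)} + (1 - K_0)\sum_{l=1}^L \frac{K_l I(z \in C_l)}{\alpha_l^{*2}} (\hat Q_{el,l} - Q_l^*) + O(\|\hat Q_{el} - Q^*\|^2).$$

But it is easily seen that $\hat Q_{el,l} = \{\sum_{j=1}^n \hat p_j(\hat\beta_{el}) I(z_j \in C_l) - (1 - K_0) K_l\}/K_0$. Hence, since $\hat Q_{el,l} - Q_l^* = \sum_{i=1}^n \hat p_i(\hat\beta_{el})\{I(z_i \in C_l) - E_{f_e} I(z \in C_l)\}/K_0$, we can write

$$\hat F^*(\xi) = \sum_{j=1}^n \hat p_j(\hat\beta_{el}) \frac{I\{z_j \le \xi\}}{c(z_j, Q^*)} + \frac{(1 - K_0)}{K_0}\sum_{l=1}^L \frac{K_l}{\alpha_l^{*2}} \Big(\sum_{i=1}^n \hat p_i(\hat\beta_{el})\{I(z_i \in C_l) - E_{f_e} I(z \in C_l)\}\Big)\Big(\sum_{j=1}^n \hat p_j(\hat\beta_{el}) I\{z_j \le \xi, z_j \in C_l\}\Big) + O(\|\hat Q_{el} - Q^*\|^2).$$

Next, $\sum_{i=1}^n \hat p_i(\hat\beta_{el})\{I(z_i \in C_l) - E_{f_e} I(z \in C_l)\} = O_p(n^{-1/2})$ by Qin and Lawless (1994, Theorem 1) and $\sum_{j=1}^n \hat p_j(\hat\beta_{el}) I\{z_j \le \xi, z_j \in C_l\} = E_{f_e} I\{z \le \xi, z \in C_l\} + o_p(1)$ by a uniform WLLN. Hence, since $\hat Q_{el} - Q^* = O_p(n^{-1/2})$, we have

$$\hat F^*(\xi) = \sum_{j=1}^n \hat p_j(\hat\beta_{el}) \frac{I\{z_j \le \xi\}}{c(z_j, Q^*)} + \frac{(1 - K_0)}{K_0}\sum_{l=1}^L \frac{K_l}{\alpha_l^{*2}} \Big(\sum_{i=1}^n \hat p_i(\hat\beta_{el})\{I(z_i \in C_l) - E_{f_e} I(z \in C_l)\}\Big) E_{f_e} I\{z \le \xi, z \in C_l\} + o_p(n^{-1/2}).$$

Using the definition of $c(z, Q^*)$ and doing a little rearranging, we can rewrite this as

$$\hat F^*(\xi) = \sum_{j=1}^n \hat p_j(\hat\beta_{el}) \frac{m(z_j, \xi)}{c(z_j, Q^*)} + F^*(\xi) - E_{f_e}\Big\{\frac{m(z, \xi)}{c(z, Q^*)}\Big\} + o_p(n^{-1/2}).$$

Thus, as the $\hat p_j$'s sum to one,

$$n^{1/2}\{\hat F^*(\xi) - F^*(\xi)\} = n^{1/2}\sum_{j=1}^n \hat p_j(\hat\beta_{el})\Big[\frac{m(z_j, \xi)}{c(z_j, Q^*)} - E_{f_e}\Big\{\frac{m(z, \xi)}{c(z, Q^*)}\Big\}\Big] + o_p(1).$$

But recall that $n^{1/2}\{\hat F_e(\xi) - F_e(\xi)\} = n^{1/2}\sum_{j=1}^n \hat p_j(\hat\beta_{el})[I\{z_j \le \xi\} - E_{f_e} I\{z \le \xi\}]$.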
Therefore, the asymptotic distribution of $n^{1/2}\{\hat F^*(\xi) - F^*(\xi)\}$ can be obtained from that of $n^{1/2}\{\hat F_e(\xi) - F_e(\xi)\}$ upon replacing $I\{z \le \xi\}$ in the proof of Theorem 4.3 by $m(z, \xi)/c(z, Q^*)$. The desired result follows.

Moment Based Inference with Stratified Data. Gautam Tripathi, University of Connecticut. Economics Working Papers, Department of Economics, University of Connecticut (DigitalCommons@UConn), January 2007.