MOMENT BASED INFERENCE WITH STRATIFIED DATA

Size: px
Start display at page:

Download "MOMENT BASED INFERENCE WITH STRATIFIED DATA"

Transcription

1 MOMENT BASED INFERENCE WITH STRATIFIED DATA GAUTAM TRIPATHI Abstract. Many datasets used by economists and other social scientists are collected by stratified sampling. Stratification of the target population induces a probability distribution on the realized observations that differs from the underlying distribution for which inference is to be drawn. If the distinction between target and realized distributions is not taken into account, statistical inference can be severely biased. This paper shows how two well known techniques, the generalized method of moments (GMM) and empirical likelihood (EL), can be used to do efficient semiparametric inference in models with unconditional moment restrictions when the target population is stratified. We investigate and obtain results for three widely used stratified sampling schemes: variable probability (VP) sampling, multinomial (MN) sampling, and standard stratified (SS) sampling. 1. Introduction Many datasets used by economists and other social scientists consist of stratified samples. Stratification is usually employed for administrative convenience or to increase statistical precision by oversampling rare but informative outcomes. Whatever its reason, stratifying the target population is in general not innocuous. In particular, it induces a probability distribution on the realized observations that differs from the underlying population distribution for which inference is to be made. If the distinction between target and realized distributions is not taken into account, subsequent statistical inference can be severely biased. In this paper we show how to do efficient inference in models with unconditional moment restrictions when the target population is stratified. Let z be a random vector in R d and Θ a subset of R p such that H 0 : E f {g(z, θ )} = 0 for some θ Θ, (1.1) where g is a q 1 vector of functions known up to θ such that q p (i.e., overidentification is allowed), and f the unknown density of z with respect to a dominating measure µ which need not be the Lebesgue measure so that z can have discrete components. Note that, following usual mathematical convention, by a vector we mean a column vector. We do not make any notational distinction between a random variable and the value taken by it; the difference should be clear from the context. The symbol E f indicates that expectation is with respect to the pdf f. Date: This version: June 08, Key words and phrases. Empirical likelihood, GMM, Stratified sampling. This is a shortened and revised version of a paper that was previously circulated under the title GMM and empirical likelihood with stratified data. I thank seminar participants at several universities for helpful suggestions and conversations. Financial support for this project from the NSF via grant SES is also gratefully acknowledged. 1

2 2 TRIPATHI Familiar examples of models with unconditional moment restrictions are linear and nonlinear regression models of the form y = ψ(x, θ ) + ε, where ψ is known up to θ and the identifying assumption is that the error term is uncorrelated with the regressors; i.e., E f {xε} = 0. Here g(z, θ ) = x{y ψ(x, θ )} and z = (y, x). Multivariate extensions include linear and nonlinear simultaneous equations models where the error terms are uncorrelated with the explanatory variables. Sometimes econometric models are motivated in terms of conditional moment restrictions. E.g., suppose that we have the conditional moment restriction E y x {ρ(y, x, θ ) x} = 0 w.p.1, where ρ is a k 1 vector of functions known up to θ and y a vector ondogenous variables. This paper can handle such models in an instrumental variables (IV) framework: Letting A(x) denote a q k matrix of instruments, the conditional moment restriction yields unconditional moment restrictions of the form E f {g(z, θ )} = 0, where g(z, θ ) = A(x)ρ(y, x, θ ). Of course, since conditional moment restrictions are stronger than unconditional ones, it is possible to improve upon IV estimators. However, this extension is beyond the scope of the current paper and readers interested in seeing how conditional moment restrictions can be fully exploited in the presence of stratification are referred to Tripathi (2002). In this paper we treat the target density f as completely unknown. By contrast, most earlier works in the literature (some exceptions are described later in Section 2) make strong parametric assumptions about the conditional density of y given x. Furthermore, they seem to concentrate solely on linear regression and least squares or nonlinear discrete response models. Unlike these papers, we examine a general class of potentially overidentified models which nest linear regression and discrete choice models as special cases. E.g., our ability to handle IV models allows us to do semiparametric inference in Box-Cox type models using stratified samples. This is an important advantage because it is well known that least squares is not consistent for estimating such models. Let the target population be partitioned into L nonempty and disjoint strata C 1,..., C L. Depending upon the manner in which the observations are actually drawn from the strata, we study three general stratified sampling models. These are henceforth referred to as the variable probability biased sampling (VPBS) model, the multinomial biased sampling (MNBS) model, and the standard stratified biased sampling (SSBS) model. A good description of these stratification schemes can be found in Jewell (1985), Cosslett (1993), Imbens and Lancaster (1996), and Wooldridge (1999). In VPBS models, an observation is first drawn from the target population. If it lies in stratum C l, it is retained with known probability P l ; if it is discarded, all information about the observation is lost. Hence, instead of observing z from the target density f, we observe z from the realized µ-density f(ξ) = L l=1 P li{ξ C l }f (ξ) L l=1 P lq l = b(ξ)f (ξ) b, (1.2) where Q l = C l f (ξ) dµ, b(ξ) = L l=1 P li{ξ C l }, b = L l=1 P lq l, and I is the indicator function. VPBS models arise naturally when data is collected by telephone surveys. Note that Q l denotes the probability that a randomly chosen observation from the target population lies in the l th stratum; i.e., loosely speaking, Q l represents the demand for the l th stratum. Since L l=1 Q l = 1, the Q l s are popularly called aggregate shares. Throughout the paper, the aggregate shares are assumed to

3 MOMENT BASED INFERENCE WITH STRATIFIED DATA 3 be unknown 1. The parameter b also has a practical interpretation: It is the probability that an observation from the target population is ultimately retained in the sample. In MNBS models, the researcher first selects a stratum, say C l, with known probability H l so that H H L = 1. Then an observation is drawn randomly from the selected stratum. Hence, instead of observing z from the target density f, we observe z from the µ-density f(ξ) = L l=1 H l I{ξ C l }f (ξ) Q. (1.3) l In SSBS models, the number of observations drawn from each stratum is fixed in advance and data is sampled randomly within each stratum. Most large datasets are a result of standard stratified sampling. Suppose that n observations z 1,..., z n are collected by SS sampling. Then the realized SSBS density for a single observation is given by f n (ξ) = L l=1 (n l/n)i{ξ C l }f (ξ)/q l, where n l = n j=1 I{z j C l } denotes the number of observations lying in the l th stratum. The main difference between SSBS and MNBS models is that in SSBS models the n l s are treated as non-stochastic constants whereas in MNBS models they are random variables (thus, unlike MN sampling, observations collected using SS sampling are independently but not identically distributed across the strata). This means that statistical inference in SSBS models should be done conditional on the n l s. However, as discussed later in Section 4, we carry out inference in SSBS models by using an enriched sample (i.e., a stratified sample supplemented with an additional random sample) in order to overcome identification problems caused due to lack of knowledge of the aggregate shares. But the presence of this supplementary random sample ensures that the number of observations lying in each stratum of the enriched sample is a random variable. Hence, there is no need to condition on the number of observations lying in each stratum of the enriched sample when estimating SSBS models with enriched datasets. Therefore, statistical treatment for SSBS models can be simplified by simply assuming that z 1,..., z n are iid observations from the µ-density f(ξ) = L l=1 K l I{ξ C l }f (ξ) Q l = b(ξ, Q )f (ξ), (1.4) where K l = lim n (n l /n), b(ξ, Q ) = L l=1 (K l/q l )I{ξ C l}, and Q = (Q 1,..., Q L ). The K l s are regarded as known since replacing them by the corresponding n l /n s (the underlying sampling fractions for the stratified sample) does not matter asymptotically. Therefore, for the remainder of the paper we assume that the realized SSBS density is given by (1.4), where the K l s are known and sum to one 2. Since (1.3) and (1.4) make it clear that the realized densities for MN and SS sampling 1 Since the aggregate shares are endogenously determined parameters, it is not reasonable to assume that they are known. On the other hand, the retention probabilities P 1,..., P L (for VPBS models) or the sampling fractions H 1,..., H L (for MNBS models) and K 1,..., K L (for SSBS models), described later, are part of the sampling design that is fixed in advance by the data collection agency and their knowledge is usually in the public domain. 2 In fact, we can continue to use the assumption that z1,..., z n are iid draws from (1.4) in order to do efficient inference in SSBS models even when the aggregate shares are assumed known, provided we condition on the n l s. For the sake of completeness, in Appendix B we show how this can be done. Readers should note that Appendix B is the only section in the paper where the aggregate shares are assumed known.

4 4 TRIPATHI schemes are observationally equivalent, results obtained for SSBS models will be valid for MNBS models as well. Therefore, henceforth we only consider SSBS models without loss of generality. A simple example illustrates the dangers of ignoring stratification. Suppose we want to estimate θ = E f {z}, the mean of the target population. But since the target population is stratified we have iid observations z 1,..., z n from the realized density f in (1.2), (1.3), or (1.4) instead of iid observations from the target density f. Therefore, the naive estimator n j=1 z j/n will not be a consistent estimator of θ because n j=1 z j/n p E f {z} by the WLLN 3, but E f {z} E f {z}. Intuitively, the naive approach fails because stratification of the target population ensures that data comes from the realized density ven though the parameter of interest is defined in terms of target density f. Hence, statistical inference using the realized observations is actually about f and not f. The basic problem investigated in this paper can be loosely stated as follows: How can we use iid observations z 1,..., z n generated from the realized density f to do inference on the target density f? More precisely, we develop a unified approach to deal with different kinds of stratified sampling schemes and our treatment is general enough to handle stratification based only on the endogenous variable(s) (pure endogenous stratification), or on the exogenous variable(s) alone (pure exogenous stratification), or stratification that is based on some or all of the endogenous and exogenous variables together 4. Using this approach we show how to efficiently estimate θ, the cdf of the target population F 5, and the aggregate shares Q 1,..., Q L. We also efficiently estimate the cdf of the realized population F 6 and b, the probability that an observation from the target population is retained in the realized sample when data is collected by VP sampling. Furthermore, we obtain confidence regions and tests of parametric hypotheses for θ and also show how to test the validity of the overidentifying restrictions in (1.1). All estimation and testing is done without making any parametric assumptions about the target or realized densities. Note that none of the results obtained in this paper can be found in the literature surveyed in Section 2. The inferential approach developed in this paper is based on two well known techniques: the generalized method of moments (GMM) proposed by Hansen (1982), and empirical likelihood (EL) proposed by Owen (1988). We focus on GMM (which is related to the estimating functions approach studied in the statistics literature) because its unifying approach and wide applicability has made 3 Throughout the paper, all limits are taken as the sample size n. 4 The term exogenous is, strictly speaking, an abuse of terminology since (1.1) does not involve any conditioning. The careful reader may want to substitute stratification based on explanatory variables for exogenous stratification whenever the latter is encountered. 5 Efficient estimation of F is important if one wants to bootstrap from the target population. When prior information about the target population in the form of (1.1) is available, merely using a consistent estimator of F can lead to poor inference from the bootstrap. Hence, as pointed out by Brown and Newey (2002), in order to obtain an efficient bootstrapping procedure, resampling must be done using an efficient estimator of F ; i.e., an estimator which incorporates the stochastic restrictions imposed by the model (1.1). 6 Estimating the realized cdf is useful because comparing the estimators of F and F, denoted by ˆF and ˆF, respectively, can reveal the extent of bias induced by the stratification procedure. Of course, ˆF can also be compared with the empirical cdf of the observed data. But because ˆF takes the model (1.1) into account, it is more precise. Similarly, since ˆF is an asymptotically efficient estimator of F under H 0, using it to resample from F makes sense.

5 MOMENT BASED INFERENCE WITH STRATIFIED DATA 5 it the method of choice for estimating and testing overidentified nonlinear econometric models. Its availability in canned software packages has also added to its popularity with applied economists. An excellent exposition of the GMM methodology can be found in Newey and McFadden (1994). On the other hand, we also devote attention to EL because it has lately begun to emerge as a serious contender to GMM. See, e.g., Qin and Lawless (1994), Imbens (1997), Kitamura (1997, 2001), and other papers in the 2002 Journal of Business and Economics Statistics special issue on GMM. Owen (2001) provides a comprehensive and extremely readable account of the EL approach. Although GMM and EL based inference is asymptotically equivalent up to a first order analysis, recent research by Newey and Smith (2003) has shown that under certain regularity conditions EL has better second order properties than GMM; e.g., unlike GMM, the second order bias of EL does not depend upon the number of moment conditions. This makes EL very attractive for estimating models with large q (such as panel data models with long time dimension) where GMM is known to perform poorly in small samples. The following assumption is maintained throughout the paper. It ensures that the GMM and EL estimators described subsequently are consistent and asymptotically normal. Let B(θ, δ) denote an open ball with center θ and radius δ, the usual Euclidean norm (i.e., z = z z), g(z,θ) q p Jacobian matrix of partial derivatives of g(z, θ) with respect to θ, and z (j) the j th component of a vector z. The abbreviation w.p.1 stands for with (target) probability one. Assumption 1.1. (i) Θ is compact; (ii) θ int(θ) is the unique root of (1.1); (iii) g(z, θ) is continuous on Θ w.p.1; There exist δ > 0 and η > 0 such that: (iv) E f {sup θ Θ g(z, θ) 2(1+η) } < ; (v) g(z, θ) is twice continuously differentiable on B(θ, δ) w.p.1; (vi) E f {sup θ B(θ,δ) g(z,θ) } < ; (vii) E f {sup θ B(θ,δ) 2 g (i) (z,θ) } < for i = 1,..., q and j, k = 1,..., p; (viii) (Q (j) θ (k) 1,..., Q L ) (δ, 1 δ)... (δ, 1 δ). (ii) ensures that θ is globally identified. (i) (vii) are used to prove the consistency and asymptotic normality of EL estimators as in Kitamura (1997) and Qin and Lawless (1994), although GMM estimators can be shown to be consistent and asymptotically normal under slightly weaker conditions; see, e.g., Newey and McFadden (1994). (viii) is standard in the stratified sampling literature; see, e.g., Imbens (1992) and Imbens and Lancaster (1996). the 2. Related papers The VP, SS, and MN stratification schemes described earlier have been widely studied in the literature; e.g., Manski and Lerman (1977) estimate discrete choice models using MN sampling when stratification is based on a set of finite response variables (i.e., choice based sampling). Manski and McFadden (1981) allow for stratification on exogenous variables as well. Cosslett (1981b) and Imbens (1992) describe efficient estimation techniques for these models using choice based samples. Imbens and Lancaster (1996) extend Imbens (1992) to allow for SS and VP sampling, mixed response variables, and stratification on exogenous covariates. A well known application of SS and VP sampling can be found in Hausman and Wise (1981). Unlike the current paper, all these works assume that the

6 6 TRIPATHI conditional density of y given x is known up to a finite dimensional parameter and only the marginal pdf of the exogenous variables x is left unspecified. Regression under stratified sampling and a parametric pdf(y x) has also been well investigated; e.g., under SS sampling, DeMets and Halperin (1977) and Holt, Smith, and Winter (1980) estimate linear models under normality, while Scott and Wild (1986) look at logistic regression. Butler (2000) considers linear and nonlinear regression models with MN sampling weights when the error density is known. Bickel and Ritov (1991) also use MN sampling but they do not impose normality. However, they assume independence between the error term and the regressors (all of which have to be discrete). DuMouchel and Duncan (1983) look at weighted least squares estimators under SS sampling. Jewell (1985) and Quesenberry and Jewell (1986) propose iterative estimators of regression coefficients under SS and VP sampling without imposing normality or independence, but they do not provide any asymptotic theory for their estimators. Wooldridge (1999, 2001) also leaves f completely unspecified and provides asymptotic theory for M-estimation under VP and SS sampling. However, Wooldridge s model is defined in terms of a set of just identified unconditional moment conditions whereas we deal with possibly overidentified moment restrictions. Therefore, (1.1) nests his moment conditions as a special case. Note that since the moment conditions in Wooldridge s papers are exactly identified, their validity cannot be tested. In contrast, (1.1) is overidentified when q > p and testing H 0 under stratification is an important issue that is examined in this paper. Furthermore, when dealing with SS sampling, Wooldridge assumes that the aggregate shares are known. In contrast, the maintained assumption in this paper is that the aggregate shares are unknown. Butler (2000) also relaxes the parametric assumption on pdf(y x) when doing unconditional GMM on overidentified models with MN sampling weights and known aggregate shares. However, his focus is on comparing weighted maximum likelihood with weighted GMM and he does not investigate fully efficient GMM or any kind of inference including specification testing of overidentified moment conditions. Neither of these papers discuss estimation of b, Q, F, or F. Qin (1993) uses the EL approach to construct a nonparametric likelihood ratio confidence interval for θ = E f {z} using two independent samples from f (ξ) and f(ξ) = b(ξ)f (ξ)/b ; i.e., a two sample VPBS model in our terminology (Qin just calls it a biased sampling model). El- Barmi and Rothmann (1998) generalize Qin s treatment to handle overidentified models of the form E f {g(z, θ )} = 0. They also use two independent samples from f and f. In contrast, we only need a single sample from f to do inference in VPBS models. Unlike us, Qin (1993) and El-Barmi and Rothmann (1998) do not investigate SSBS or MNBS models. Nor do the latter consider testing overidentifying restrictions or GMM based inference. 3. Inference in VPBS models In this section we investigate VPBS models. We begin with an example. Example 3.1 (VPBS linear regression). Consider the linear model y = x θ + ε, where E f {xε} = 0 and instead of observing z = (y, x) from the target density f, we observe z from the realized VPBS density f defined in (1.2). If we ignore stratification and simply regress y on x, then θ cannot

7 MOMENT BASED INFERENCE WITH STRATIFIED DATA 7 be consistently estimated by the least squares estimator θ = ( n j=1 x jx j ) 1 n j=1 x jy j. To see this, observe that the probability limit of θ is given by (E f xx ) 1 (E f xy) = θ + (E f xx ) 1 E f {xε}/b. But since E f {xε} = 0 does not imply that E f {xε} = 0, it follows that θ is not consistent for θ. Furthermore, since the magnitude of the asymptotic bias depends upon the distribution of (y, x) and the retention probabilities P 1,..., P L, the decision to ignore stratification can only be made on a case by case basis. For some experimental evidence on this matter see Imbens and Lancaster (1996). Notice that θ remains inconsistent even if stratification is based only upon x. In contrast, as pointed out by Wooldridge (1999, 2001) and Tripathi (2002), if the identifying assumption E f {xε} = 0 is replaced by the stronger condition E y x {ε x} = 0 w.p.1, then ignoring exogenous stratification does not affect the consistency of θ although it will still affect its asymptotic variance. Therefore, exogenous stratification is ignorable depending upon the nature of the moment condition used to identify θ Identification. Since we use f to do inference on f, before proceeding any further we first have to investigate whether f can be recovered in terms of f. If there is no way of going from the realized density (loosely speaking, the reduced form ) to the target density (the structural form ), then statistical inference about f and, hence, the target cdf F is impossible. In other words, we first have to examine whether f is identified (the realized density f is, of course, identified by definition). Fortunately, there are no major identification issues for VPBS models: b is identified because b = 1/E f {1/}, the aggregate shares are identified because E f [I{z C l }/] = Q l E f [1/] for l = 1,..., L (hence, Q l = E f [I{z C l }/]/E f [b 1 (z)] for each l), and identification of f follows from the fact that f (ξ) = f(ξ)/{b(ξ)e f [b 1 (z)]}. Therefore, b, Q, and f can all be explicitly written in terms of the realized density f without any problems. As discussed later in Section 4.1, this is in sharp contrast to SSBS models where ignorance of Q leads to serious identification problems Efficient estimation. Since E f {g(z, θ )} = 0 if and only if E f {g(z, θ )/} = 0, we can efficiently estimate θ by doing optimal GMM on g(z, θ)/ 7. In technical terms, this is a change of measure result: since the target density can be expressed in terms of the realized density by inverting the mapping in (1.2), dividing g(z, θ ) by allows (1.1) to be rewritten in terms of f without loss of information. More intuitively, since 1/ = L l=1 I{z C l}/p l, this transformation represents an inverse probability weighting scheme in which oversampled strata are assigned smaller weights than the undersampled strata, thus correcting for the effects of stratification. Example 3.2. Suppose we want to estimate the mean of the target population using iid observations from the realized VPBS density. Since E f {z θ } = 0 if and only if E f {(z θ )/} = 0, the GMM estimator of θ is given by ˆθ = { n j=1 b 1 (z j )} 1 n j=1 z j/b(z j ) = L ˆQ l=1 l z l, where ˆQ l = (n l /P l )/ L l=1 (n l/p l ) and z l = n j=1 z ji{z j C l }/n l denotes the l th stratum sample average. As shown later in Example 3.3, the ˆQ l s estimate the aggregate shares. By the CLT, n 1/2 (ˆθ θ ) is 7 The M-estimators in Wooldridge (1999, 2001) can be motivated in a similar manner. Suppose that θ is identified as θ = argmin θ Θ E f {ψ(z, θ)}, where ψ is a real-valued objective function. Since E f {ψ(z, θ)} = b E f {ψ(z, θ)/} and b does not depend upon θ, it follows that θ = argmin θ Θ E f {ψ(z, θ)/}. Hence, ˆθ M, the M-estimator of θ, can be obtained by solving the sample analog of this problem; i.e., ˆθ M = argmin θ Θ n j=1 ψ(zj, θ)/b(zj). A similar argument works for SSBS models when the aggregate shares are assumed known. See Appendix B for details.

8 8 TRIPATHI asymptotically normal with mean zero and variance b 2 E f {(z θ )(z θ ) /b 2 (z)}. This of course agrees with the result given below in Theorem 3.1. Since the aggregate shares Q are also important parameters of interest, they can be estimated jointly with θ by simply adding additional moment conditions. In fact, since the Q l s add up to one, it suffices to estimate Q L = (Q 1,..., Q L 1 ) (L 1) 1. So let β = (θ, Q L ) (p+l 1) 1 and define 8 ρ(z, β) = g(z,θ) I{z C 1 } Q 1. I{z C L 1 } Q L 1 = [ g(z,θ) ] s(z) Q L, (3.1) (q+l 1) 1 where s(z) = (I{z C 1 },..., I{z C L 1 }) (L 1) 1. As the moment condition E f {ρ(z, β )} = 0 exhausts all information about (θ, Q ), an asymptotically efficient estimator of β can be obtained by doing two-step optimal GMM on (3.1) as follows: First, let ˆρ(β) = n j=1 ρ(z j, β)/n and obtain a preliminary estimator β = argmin β B ˆρ (β)ˆρ(β), where B = Θ [δ, 1 δ]... [δ, 1 δ]. Use this preliminary estimator to construct the efficient weight matrix Υ = n j=1 ρ(z j, β)ρ (z j, β)/n. Next, obtain ˆβ gmm = argmin β B GMM(β), where the objective function GMM(β) = ˆρ (β) Υ 1 ˆρ(β). Now define g b (z, θ ) = g(z, θ )/, D b = E f { g b(z,θ ) }, V b = E f {g b (z, θ )g b (z, θ )}, M b = b b D b (D b b D b ) 1 D b b, and C b = E f {g b (z, θ )[s(z) Q L ] /}. Standard GMM theory as in Newey and McFadden (1994) reveals that ˆβ gmm is consistent for β. Some additional matrix algebra can be used to show the following result. Theorem 3.1. n 1/2 ( ˆβ ] gmm β ) d N(0, ), where Σ 11 = (D b b D b ) 1, Σ 12 = b (D b b D b ) 1 D b b [ Σ11 Σ 12 Σ 12 Σ 22 C b, and Σ 22 = b 2 (E f {[ s(z) Q L ][ s(z) Q L ] } C b M bc b ). ˆβ gmm is asymptotically efficient because it is well known that optimally weighted GMM estimators are asymptotically efficient (Chamberlain 1987) 9. Notice that if is constant so that stratification disappears, then Σ 11 becomes the well known asymptotic variance for estimating θ in the absence of stratification. Similarly, if there is no auxiliary information (e.g., if g(z, θ ) is identically zero or if there are no overidentifying restrictions), then Σ 22 reduces to b 2 E f {[ s(z) Q L ][ s(z) Q L ] }. Therefore, imposing the overidentified model leads to an efficiency gain in estimating the aggregate shares. Theorem 3.1 also reveals that if θ is the only parameter of interest then it is not necessary to 8 Following the discussion in Section 3.1, Ef [I{z C l }/] = Q l E f [1/] for l = 1,..., L. Hence, the last L 1 moment conditions in (3.1) identify the aggregate shares. 9 Tripathi (2003) shows from first principles that the efficiency bound for estimating θ coincides with Σ 11. For the sake of conciseness, all efficiency bound calculations are omitted from this paper. Interested readers can find these calculations and additional details on efficiency related issues in the aforementioned paper which can be downloaded from the authors web site

9 MOMENT BASED INFERENCE WITH STRATIFIED DATA 9 jointly estimate Q in order to obtain an efficient estimator of θ (because, as mentioned at the beginning of Section 3.2, a GMM or EL estimator of θ based on the moment condition E f {g(z, θ )/} = 0 alone will be asymptotically efficient; i.e., have asymptotic variance Σ 11 ). Next, we consider estimating F and F. Since F and F have the same support, fix an evaluation point ξ R d. Although F (ξ) = Pr f {z ξ} and F (ξ) = Pr f {z ξ} can be estimated by incorporating additional conditions in (3.1), the resulting estimators will not be computationally practical because they would have to be recalculated each time ξ changed. A better alternative is to do EL. Of course, as mentioned earlier, EL has other properties that make it attractive for estimating θ, Q, and b as well. So, for a fixed β, construct the nonparametric loglikelihood for the realized sample by solving max n p1,...,p n j=1 log p j subject to the constraints p j 0, n j=1 p j = 1, and n j=1 ρ(z j, β)p j = 0. The solution to this optimization problem is given by ˆp j (β) = n 1 {1 + λ (β)ρ(z j, β)} 1 for j = 1,..., n, where λ(β) satisfies n ρ(z j,β) j=1 1+λ (β)ρ(z j,β) = 0. j=1 log{1 + λ (β)ρ(z j, β)} n log n and define the Now let EL(β) = n j=1 log ˆp j(β) = n empirical likelihood estimator of β as ˆβ el = argmax β B EL(β). Consistency of ˆβel follows from Kitamura (1997, Theorem 1), and using Qin and Lawless (1994, Theorem 1) along with some additional matrix algebra we can show that n 1/2 ( ˆβ el β ) and n 1/2 ( ˆβ gmm β ) have the same asymptotic distribution 10. Therefore, the EL estimators of θ and Q are also asymptotically efficient. Henceforth, for the remainder of the paper let ˆθ and ˆQ = ( ˆQ 1,..., ˆQ L 1, 1 L 1 ˆQ l=1 l ) denote the GMM or EL estimators of θ and Q, respectively. Also, define ˆD b = n 1 n j=1 g b (z j,ˆθ), ˆV b = n 1 n j=1 g b(z j, ˆθ)g b (z j, ˆθ), and Ĉb = n 1 n j=1 g b(z j, ˆθ)[s(z j ) ˆQ L ] /b(z j ). Since ˆD b p Db, ˆV b p Vb, Ĉ b p Cb, n/ n j=1 {1/b(z j)} p b, and n 1 n j=1 [s(z j) ˆQ L ][s(z j ) ˆQ L ] /b 2 (z j ) p E f {[ s(z) Q L ][ s(z) Q L ] }, the asymptotic variances of ˆθ and ˆQ can be easily estimated by replacing Σ 11 and Σ 22 with their consistent estimators. Example 3.3 (Example 3.1 contd.). Since β is just identified, the GMM or EL estimators of θ and Q l are given by ˆθ = { n j=1 x jx j /b(z j)} 1 n j=1 x jy j /b(z j ) and ˆQ l = (n l /P l )/ L l=1 (n l/p l ), respectively. By Theorem 3.1, n 1/2 (ˆθ θ ) is asymptotically normal with mean zero and variance xx {E f } 1 E f {xx ε 2 b }{E f xx } 1, which resembles the Eicker-White heteroscedasticity consistent asymptotic variance with a correction for stratification. Similarly, after a little simplification it follows that n 1/2 ( ˆQ l Q l ) d N(0, b {Q l 2Q l 2 + kp l Q l 2 }/P l ), where k = L l=1 (Q l /P l). Before describing estimators of F (ξ) and F (ξ), let us see how b can be efficiently estimated. Since, by (1.2), b = L l=1 P lq l, the GMM or EL estimator (depending on whether Q is estimated by GMM or EL) of b is given by ˆb = L l=1 P l ˆQ l. As in Theorem 3.1, it is straightforward to show that n 1/2 (ˆb b ) = n 1/2 n j=1 ϕ(z j) + o p (1), where ϕ(z) = b {[ b ]/} + b 2 E f {g b (z, θ )/}M b g b (z, θ ). Therefore, the next result is immediate. 10 Although GMM and EL estimators have the same asymptotic distribution, they differ from one another in finite samples. If, however, the model is just identified (i.e., if q = p), then the two coincide in finite samples as well because they are obtained by setting the sample analog of E f {ρ(z, β )} to zero; i.e., by solving n j=1 ρ(zj, ˆβ) = 0 for ˆβ.

10 10 TRIPATHI Theorem 3.2. n 1/2 (ˆb b ) d N(0, b 2 E f { b } 2 b 4 E f { g b (z,θ ) }M b E f { g b(z,θ ) }). Since ˆb is a known linear function of ˆQ and the latter is asymptotically efficient, it follows that ˆb is also asymptotically efficient. If there is no overidentification the asymptotic variance of n 1/2 (ˆb b ) becomes b 2 E f { b } 2. This makes sense because ˆQ l = (n l /P l )/ L l=1 (n l/p l ) when q = p and, hence, ˆb = L l=1 P l ˆQ l = n/ n j=1 {1/b(z j)} is just the sample analog of 1/E f {1/}. Now that we have both ˆβ el and ˆb el, F (ξ) and F (ξ) can be easily estimated by ˆF (ξ) = n ˆbel j=1 ˆp j( ˆβ el )I{z j ξ}/b(z j ) and ˆF (ξ) = n j=1 ˆp j( ˆβ el )I{z j ξ}, respectively. The next two results can be shown by using Qin and Lawless (1994, Theorem 1) and some tedious matrix algebra. Theorem 3.3. n 1/2 { ˆF (ξ) F (ξ)} is asymptotically normal with mean zero and variance b 2 E f { I{z ξ} F (ξ) } 2 b 2 E f { g b (z, θ )[I{z ξ} F (ξ)] }M b E f { g b(z, θ )[I{z ξ} F (ξ)] }. Theorem 3.4. n 1/2 { ˆF (ξ) F (ξ)} is asymptotically normal with mean zero and variance F (ξ){1 F (ξ)} E f [g b (z, θ )I{z ξ}]m b E f [g b (z, θ )I{z ξ}]. As before, there is a gain in efficiency when estimating F (ξ) and F (ξ) if the model is overidentified. Tripathi (2003) shows that the asymptotic variances in Theorems correspond to the efficiency bounds for estimating F (ξ) and F (ξ), respectively. Hence, these estimators are asymptotically efficient. Notice that if f = f (i.e., no stratification), then the asymptotic variances in Theorems reduce to F (ξ)[1 F (ξ)] E[g (z, θ )I{z ξ}]{ D(D D) 1 D }E[g(z, θ )I{z ξ}], where D = E{ g(z,θ ) } and V = E{g(z, θ )g (z, θ )}. This corresponds to the asymptotic variance for estimating F (ξ) under H 0 in the absence of stratification obtained by Brown and Newey (1998). Example 3.4. Suppose we know a priori that E f {g(z)} = 0, where g is a q 1 vector of known functions that does not depend upon any parameter θ. These type of auxiliary information models, which are a special case (1.1), have been investigated by Imbens and Lancaster (1994), Hellerstein and Imbens (1999), and Nevo (2003), although these authors do not consider efficient estimation of Q, b, F, or F for such models. Since there is no θ to be estimated here, the asymptotic distributions for GMM or EL estimators of Q, b, F (ξ), and F (ξ) under E f {g(z)} = 0 follow from Theorems by replacing g b (z, θ ) with g b (z) and setting D b = Hypothesis testing. Suppose we want to test the parametric restriction H 0 : R(θ ) = 0 against the alternative H 1 : R(θ ) 0, where R is a r 1 vector of twice continuously differentiable functions such that R(θ ) Wald statistic W = nr (ˆθ){ R(ˆθ) has rank r p. Since n 1/2 (ˆθ θ ) d N(0, (D b b D b ) 1 ), the GMM or EL based ( ˆD b 1 ˆV b ˆD b ) 1 R (ˆθ) } 1 R(ˆθ) d χ 2 r. Hence, a test for H 0 against H 1 can be based upon W. Another alternative is to base the test on the objective functions themselves. So let β gmm = argmin {β B:R(θ)=0} GMM(β) and β el = argmax {β B:R(θ)=0} EL(β) denote the GMM and EL estimators under H 0. Next, define DM = n{gmm( β gmm ) GMM( ˆβ gmm )} and LR = 2{EL( ˆβ el ) EL( β el )}, where DM denotes the GMM based distance metric statistic and LR the EL based likelihood ratio test statistic. A test for H 0 against H 1 can be based upon DM or EL. Critical values for these

11 MOMENT BASED INFERENCE WITH STRATIFIED DATA 11 tests follow from Newey and McFadden (1994, Theorem 9.2) and Qin and Lawless (1994, Theorem 2) who show that DM d χ 2 r and LR d χ 2 r under H 0. Since W, DM, and LR are asymptotically equivalent, the decision to use a particular test statistic depends upon computational and other considerations; e.g., though all three can be inverted to obtain asymptotically valid confidence regions, the DM and LR based regions are invariant to the formulation of H 0 and automatically satisfy natural range restrictions. Furthermore, unlike W and DM, the likelihood ratio statistic LR is internally studentized; i.e., it does not require preliminary estimation of any variance terms. This guarantees that confidence regions based on LR are also invariant to nonsingular transformations of the moment conditions. Internal studentization may also lead to better finite sample properties for LR; see, e.g., Fisher, Hall, Jing, and Wood (1996) Specification testing. For the remainder of this section, assume that q > p. Since inference based on the estimated θ is sensible only if (1.1) is true, it is important to devise a test for H 0 against the alternative that it is false. In this section we describe two ways of testing H 0. The first one is easy: GMM theory tells us that ngmm( ˆβ gmm ) d χ 2 q p under H 0. Therefore, a test for overidentifying restrictions (usually called the J-test) in (1.1) can be based on this result. An EL based specification test for H 0 can also be developed. Besides being internally studentized and invariant to nonsingular and algebraic transformations of the moment conditions, Kitamura (2001) has shown this test to be optimal in terms of a Hoeffding type large deviations criterion. So let ˆβ denote a n 1/2 -consistent preliminary estimator of β; e.g., ˆβ can be the GMM or EL estimator defined previously. The restricted, i.e., under H 0, EL can be written as EL r = n j=1 log ˆp j( ˆβ), where ˆp j ( ˆβ) are the EL probability weights defined earlier. Next, consider the unrestricted problem where the model is not imposed. It is well known that the nonparametric maximum likelihood estimator of f in the absence of any auxiliary information puts mass 1/n at each realized observation and is zero elsewhere. Therefore, the unrestricted nonparametric likelihood for the realized sample is given by EL ur = n log n. Now let ELR = 2(EL ur EL r ) = 2 n j=1 log{1 + λ ( ˆβ)ρ(z j, ˆβ)}, where λ( ˆβ) was defined earlier in Section 3.2. Then ELR can be regarded as an analog of the usual parametric likelihood ratio test statistic; i.e., H 0 is rejected if ELR is large enough. Critical values for ELR are easily obtained because ELR d χ 2 q p under H 0 by Qin and Lawless (1994, Corollary 4). 4. Inference in SSBS models We now investigate SSBS models. As shown subsequently, the major difference between VPBS and SSBS models is that unknown aggregate shares create severe identification problems for SSBS models that were not encountered with VPBS models Identification. Although f (ξ) = f(ξ)/b(ξ, Q ) by (1.4), we cannot recover f in terms of the realized density f because the aggregate shares Q are unknown. Thus the target pdf is unidentified and cannot be estimated consistently. So the first task is to overcome this lack of identification. As noted by Cosslett (1981a), Gill, Vardi, and Wellner (1988), and Imbens and Lancaster (1996), in order to ensure identification of f we have to randomly sample from the entire target population with positive probability. We now formalize this assumption.

12 12 TRIPATHI Assumption 4.1. The realized SSBS density is given by (ξ) = K 0 f (ξ) + (1 K 0 )b(ξ, Q )f (ξ), where K 0 (0, 1] is known. Letting c(ξ, Q ) = K 0 + (1 K 0 )b(ξ, Q ), we can conveniently write (ξ) = c(ξ, Q )f (ξ). Note that K 0 denotes the probability of randomly sampling from the whole target population. As a practical example, consider a dataset with 100 observations, 75 of which were obtained using SS sampling while the remaining 25 were obtained by random sampling. Hence, we can let K 0 = 0.25 and regard this enriched dataset as a random sample from. The no stratification case corresponds to setting K 0 = 1. Since Q l = {E fe I(z C l ) (1 K 0 )K l }/K 0 for l = 1,..., L under Assumption 4.1, the aggregate shares (and hence f ) are identified. The idea of using additional samples to ensure identification in biased sampling, incomplete, or measurement error models has of course been recognized earlier; see, e.g., Manski and Lerman (1977), Titterington (1983), Titterington and Mill (1983), Vardi (1985), Ridder (1992), Carroll, Ruppert, and Stefanski (1995), Wansbeek and Meijer (2000), Hirano, Imbens, Ridder, and Rubin (2001), Tripathi (2004), and the references therein. However, the use of supplementary random samples to do moment based inference in overidentified models with stratified data seems to be new to the literature and the results described in this paper cannot be found in any of the references cited here. The existence of such additional random samples should not be regarded as being an overly restrictive requirement. For instance, if the cost of collecting additional data is relatively small, Manski and Lerman (1977) suggest carrying out a small random survey to gather a supplementary sample in order to deal with the problem of unknown aggregate shares. Indeed, some important stratified U.S. datasets such as the Panel Study of Income Dynamics (PSID) and the National Longitudinal Survey (NLS) automatically provide an additional random sample that can be used for this purpose Efficient estimation and inference. Let Q = (Q 1,..., Q L 1, 1 L 1 l=1 Q l). Using some notation introduced earlier in Section 3.2, define 11 ρ(z, β) = g(z,θ) c(z,q) I{z C 1 } c(z,q) Q 1. I{z C L 1 } c(z,q) Q L 1 [ ] g(z,θ) c(z,q) = s(z) c(z,q) Q L (q+l 1) 1. (4.1) Since E fe {ρ(z, β )} = 0, the optimal GMM or EL estimators of θ and Q based on (4.1) are asymptotically efficient. Tripathi (2003) proves this from first principles. As in Section 3.2, the GMM estimator is given by ˆβ gmm = argmin β B GMM(β). Now let g(z, θ ) = g(z, θ ) + K0 1 (1 K 0) L l=1 (K l/q l 2 )I(z C l )E f {g(z, θ )I(z C l )} and define D c = E fe { g(z,θ ) /c(z, Q )}, g c (z, θ ) = g(z, θ )/c(z, Q ), C c = cov fe { g c (z, θ ), s(z)/c(z, Q )}, Ω c = var fe { g c (z, θ )}, and M c = Ω 1 c Ω 1 c D c (D cω 1 c D c ) 1 D cω 1 c. The asymptotic distribution of ˆβ gmm is given below. 11 The last L 1 moment conditions in (4.1) are identifying the aggregate shares because, by using the definition of c(z, Q ), it is easily seen that E fe [I{z C l }/c(z, Q )] = Q l if and only if [E fe I{z C l } (1 K 0)K l ]/K 0 = Q l.

13 MOMENT BASED INFERENCE WITH STRATIFIED DATA 13 Theorem 4.1. Let Assumption 4.1 hold, αl = K 0 Q l + (1 K 0 )K l, α = (α1,..., α L 1 ) (L 1) 1, and A = diag(α1,..., α L 1 ) (L 1) (L 1). Then n 1/2 ( ˆβ ] gmm β ) d N(0, ), where Σ 11 = (D cω 1 c [ Σ11 Σ 12 Σ 12 Σ 22 D c ) 1, Σ 12 = (D cω 1 c D c ) 1 D cω 1 c C c /K 0, and Σ 22 = {A α α C c M c C c }/K0 2. If L = 1 or K 0 = 1 so that stratification disappears, then Σ 11 is just the asymptotic variance for estimating θ in the absence of stratification. If there are no overidentifying restrictions, then Σ 22 reduces to {A α α }/K0 2. Hence, as expected, imposing the overidentified model leads to an efficiency gain in estimating the aggregate shares. Note that, loosely speaking, the second term in the definition of g(z, θ ) can be motivated as bias that appears because Q is an unknown parameter. To get a precise explanation of why the asymptotic variance of ˆθ gmm in Theorem 4.1 depends upon g(z, θ ), the reader should examine the proof of Theorem 4.4 in Tripathi (2003). Next, as in Section 3.2, define the EL estimator ˆβ el = argmax β B EL(β). Consistency of ˆβ el follows from Kitamura (1997, Theorem 1). Using Qin and Lawless (1994, Theorem 1) and the matrix algebra in the proof of Theorem 4.1, we can show that the asymptotic distribution of n 1/2 ( ˆβ el β ) is the same as that of n 1/2 ( ˆβ gmm β ); i.e., ˆθ el and ˆQ el are also asymptotically efficient. Asymptotic variances of ˆQ and ˆθ can be estimated in the obvious manner by replacing Σ 11 and Σ 22 with their sample analogs. These can be constructed by using 12 ˆDc = n 1 n j=1 g(z j,ˆθ) /c(z j, ˆQ), Ĉ c = n 1 n j=1 { g c (z j, ˆθ)s (z j )/c(z j, ˆQ)} n 1 n j=1 { g c (z j, ˆθ)}n 1 n j=1 {s (z j )/c(z j, ˆQ)}, and ˆΩ c = n 1 n j=1 { g c (z j, ˆθ) g c(z j, ˆθ)} n 1 n j=1 { g c (z j, ˆθ)}n 1 n j=1 { g c(z j, ˆθ)}, where g c (z, θ) = g c (z, θ) Q = ˆQ. Example 4.1. Suppose we want to estimate θ, the mean of the target population. Since θ is just identified, it is easily seen that ˆθ = L ˆQ l=1 l z l, where ˆQ l = {n l /n (1 K 0 )K l }/K 0. The asymptotic distribution of ˆθ follows by a direct application of the CLT or by Theorem 4.1 upon noting that D c is the p p identity matrix. Example 4.2 (SSBS linear regression). Consider the model in Example 3.1, but now assume that z is drawn from the enriched SSBS density. Then ˆθ = { n j=1 x jx j /c(z j, ˆQ)} 1 n j=1 x jy j /c(z j, ˆQ), where ˆQ was defined in the previous example. Hence, by Theorem 4.1, n 1/2 (ˆθ θ ) is asymptotically normal with mean zero and variance {E fe xx /c(z, Q )} 1 Ω c {E fe xx /c(z, Q )} 1. By the same result, n 1/2 ( ˆQ l Q l ) d N(0, αl (1 α l )) for each l. We can also use ˆβ el to estimate F (ξ) and F e (ξ). Let ˆF (ξ) = n j=1 ˆp j( ˆβ el )I{z j ξ}/c(z j, ˆQ el ) and ˆF (ξ) = n j=1 ˆp j( ˆβ el )I{z j ξ}, where ˆp j (β) s are the EL probabilities. The asymptotic distributions of ˆF (ξ) and ˆF (ξ) are given below. Theorem 4.2. Let Assumption 4.1 hold. Then n 1/2 { ˆF (ξ) F (ξ)} is asymptotically normal with mean zero and variance var fe {m c (z, ξ)} cov fe { g c(z, θ ), m c (z, ξ)}m c cov fe { g c (z, θ ), m c (z, ξ)}, where m(z, ξ) = I{z ξ} + K0 1 (1 K 0) L l=1 (K l/q l 2 )I{z C l } Pr f {z ξ z C l } and m c (z, ξ) = m(z, ξ)/c(z, Q ). 12 Since we continue to use n to denote the size of the enriched dataset, when estimating asymptotic variances each K l should be replaced by the corresponding underlying sampling fraction for the stratified sample.

14 14 TRIPATHI Theorem 4.3. Let Assumption 4.1 hold. Then n 1/2 { ˆF (ξ) F e (ξ)} is asymptotically normal with mean zero and variance F e (ξ){1 F e (ξ)} cov fe { g c(z, θ ), I(z ξ)}m c cov fe { g c (z, θ ), I(z ξ)}. These results again reveal that imposing the overidentified model leads to an efficiency gain in estimating F and F e. Asymptotic optimality of EL implies that ˆF (ξ) and ˆF (ξ) are also asymptotically efficient; alternatively, see Tripathi (2003) for a direct proof. Hypotheses of the form R(θ ) = 0 vs R(θ ) 0 can be tested using the Wald, DM, or LR statistics as described in Section 3.3 by basing the test on (4.1); in each case, the test statistic is asymptotically distributed as a χ 2 r random variable. If q > p, then GMM or EL based specification testing of H 0 can also be done using (4.1), the details being analogous to those in Section 3.4; i.e., in either case, the test statistic is asymptotically χ 2 q p. 5. Conclusion Using stratified data we develop GMM and EL based efficient semiparametric inference for moment restriction models that may be overidentified. Stratification of the target population induces selection in the realized sample which if not taken into account can lead to inconsistent estimators and biased inference. As shown in the paper, correcting for the effects of stratification is straightforward; namely, an appropriate transformation of the moment conditions ensures that all standard GMM and EL based inference goes through. No special software is required to implement the procedures developed in this paper; any computer package that can do GMM or EL will be able to do the same with stratified data. Appendix A. Proofs We only provide proofs for the results in Section 4 since SSBS models with unknown aggregate shares are the hardest to handle. Results for VPBS models described in Section 3 can be shown in a similar manner. Proof of Theorem 4.1. We prove this result for the L = 2 case. The case when there are more than two strata can be handled in a similar manner although the matrix algebra becomes much more tedious. When there are only two strata, ρ(z, β ) = (g(z, θ )/c(z, Q ), I{z C 1 }/c(z, Q ) Q 1 ) (q+1) 1 and Q = (Q 1, Q 2 ) 2 1. By standard GMM theory, n 1/2 ( ˆβ gmm β ) is asymptotically normal with mean zero and variance (D f e D fe ) 1, where D fe = E fe { ρ(z,β ) β } and V fe = E fe {ρ(z, β )ρ (z, β )}. [ ] Therefore, we show that (D f e D fe ) 1 Σ11 Σ = 12 Σ 12 Σ. So define G 22 1 = E fe {g(z, θ )I(z C 1 )/c(z, Q )}, V c = E fe {g(z, θ )g (z, θ )/c 2 (z, Q )}, and d = K 1 /(Q 1 α 1 ) + K 2/(Q 2 α 2 ). It is easily seen that [ Dc (1 K 0 ) dg ] 1 D fe = 0 K 0Q and V 1 fe = V c (1 K 0 ) Q 1 G 1 α 1. α 1 Next, since Ω c = V c + γg 1 G 1 /K2 0 where γ = α 1 α 2 (1 K 0) 2 d2 + 2K 0 (1 K 0 ) d, we can also show that = Ω 1 C cc c Ω 1 c α α 1 α 2 Cc Ω 1 1 c C c Q 1 G 1 Ω 1 c α 2 α 1 α 2 Cc Ω 1 1 c C c Q 1 2 K0 2 c + Ω 1 c α 1 Q 1 Q 1 G 1 α 1 Ω 1 c G 1 Q 2 1 α 2 α 1 α 1 α 2 Cc Ω 1 c C c K0 2 γg 1 Ω 1 c G 1 α 1 α 2 Cc Ω 1 c C c.

15 MOMENT BASED INFERENCE WITH STRATIFIED DATA 15 Hence, D D fe = D cω 1 c C cc c Ω 1 c D c K 0 D cω 1 α 1 α 2 Cc Ω 1 c C c c C c α 1 α 2 Cc Ω 1 c C c K 0 C c Ω 1 c D c K α 1 α 2 Cc Ω c C c α 1 α 2 Cc Ω 1 c C c D c + D cω 1 c Finally, the partitioned inverse formula reveals that [ (D fe D fe ) 1 (D = cω 1 c D c ) 1 (D cω 1 c The desired result follows. C c Ω 1 c. D c ) 1 D cω 1 c C c /K 0 D c (D cω 1 c D c ) 1 /K 0 {α 1 α 2 C c M c C c }/K 2 0 ]. Proof of Theorem 4.2. Recall ˆF (ξ) = n j=1 ˆp j( ˆβ el )I{z j ξ}/c(z j, ˆQ el ). By a Taylor expansion, 1 c(z, ˆQ el ) = 1 c(z, Q ) + (1 K 0) L l=1 K l I(z C l ) α l 2 ( ˆQ el,l Q l ) + O( ˆQ el Q 2 ). But it is easily seen that ˆQ el,l = { n j=1 ˆp j( ˆβ el )I(z j C l ) (1 K 0 )K l }/K 0. Hence, since ˆQ el,l Q l = n i=1 ˆp i( ˆβ el ){I(z i C l ) E fe I(z C l )}/K 0, we can write ˆF (ξ) = n j=1 ˆp j ( ˆβ el ) I{z j ξ} c(z j, Q ) + (1 K 0) K 0 L l=1 K l α l 2 ( ( n ˆp i ( ˆβ el ){I(z i C l ) E fe I(z C l )}) i=1 n ˆp j ( ˆβ el )I{z j ξ z j C l }) + O( ˆQ el Q 2 ). j=1 Next, n i=1 ˆp i( ˆβ el ){I(z i C l ) E fe I(z C l )} = O p (n 1/2 ) by Qin and Lawless (1994, Theorem 1) and n j=1 ˆp j( ˆβ el )I{z j ξ z j C l } = E fe I{z ξ z C l } + o p (1) by a uniform WLLN. Hence, since ˆQ el Q = O p (n 1/2 ), we have ˆF (ξ) = n j=1 ˆp j ( ˆβ el ) I{z j ξ} c(z j, Q ) + (1 K 0) K 0 L l=1 K l α l 2 ( n ˆp i ( ˆβ el ){I(z i C l ) E fe I(z C l )}) i=1 E fe I{z ξ z C l } + o p (n 1/2 ). Using the definition of c(z, Q ) and doing a little rearranging, we can rewrite this as n ˆF (ξ) = ˆp j ( ˆβ el ) m(z j, ξ) c(z j, Q ) + F m(z, ξ) (ξ) E fe { c(z, Q ) } + o p(n 1/2 ). j=1 Thus, as the ˆp j s sum to one, n 1/2 { ˆF (ξ) F (ξ)} = n 1/2 n ˆp j ( ˆβ el )[ m(z j, ξ) c(z j, Q ) E m(z, ξ) { c(z, Q ) }] + o p(1). j=1 But recall that n 1/2 { ˆF (ξ) F (ξ)} = n 1/2 n j=1 ˆp j( ˆβ el )[I{z j ξ} E fe I{z ξ}]. Therefore, the asymptotic distribution of n 1/2 { ˆF (ξ) F (ξ)} can be obtained from that of n 1/2 { ˆF (ξ) F (ξ)} upon replacing I{z ξ} in the proof of Theorem 4.3 by m(z, ξ)/c(z, Q ). The desired result follows.

Moment Based Inference with Stratified Data

Moment Based Inference with Stratified Data University of Connecticut DigitalCommons@UConn Economics Working Papers Department of Economics January 2007 Moment Based Inference with Stratified Data Gautam Tripathi University of Connecticut Follow

More information

GMM AND EMPIRICAL LIKELIHOOD WITH STRATIFIED DATA. 1. Introduction

GMM AND EMPIRICAL LIKELIHOOD WITH STRATIFIED DATA. 1. Introduction GMM AND EMPIRICAL LIKELIHOOD WITH STRATIFIED DATA GAUTAM TRIPATHI Astract. Many datasets used y economists and other social scientists are collected y stratified sampling. Stratification of the target

More information

PAUL J. DEVEREUX AND GAUTAM TRIPATHI

PAUL J. DEVEREUX AND GAUTAM TRIPATHI OPTIMALLY COMBINING CENSORED AND UNCENSORED DATASETS PAUL J. DEVEREUX AND GAUTAM TRIPATHI Abstract. Economists and other social scientists often encounter data generating mechanisms (dgm s) that produce

More information

Generalized Neyman Pearson optimality of empirical likelihood for testing parameter hypotheses

Generalized Neyman Pearson optimality of empirical likelihood for testing parameter hypotheses Ann Inst Stat Math (2009) 61:773 787 DOI 10.1007/s10463-008-0172-6 Generalized Neyman Pearson optimality of empirical likelihood for testing parameter hypotheses Taisuke Otsu Received: 1 June 2007 / Revised:

More information

Spring 2017 Econ 574 Roger Koenker. Lecture 14 GEE-GMM

Spring 2017 Econ 574 Roger Koenker. Lecture 14 GEE-GMM University of Illinois Department of Economics Spring 2017 Econ 574 Roger Koenker Lecture 14 GEE-GMM Throughout the course we have emphasized methods of estimation and inference based on the principle

More information

Econometric Analysis of Cross Section and Panel Data

Econometric Analysis of Cross Section and Panel Data Econometric Analysis of Cross Section and Panel Data Jeffrey M. Wooldridge / The MIT Press Cambridge, Massachusetts London, England Contents Preface Acknowledgments xvii xxiii I INTRODUCTION AND BACKGROUND

More information

Flexible Estimation of Treatment Effect Parameters

Flexible Estimation of Treatment Effect Parameters Flexible Estimation of Treatment Effect Parameters Thomas MaCurdy a and Xiaohong Chen b and Han Hong c Introduction Many empirical studies of program evaluations are complicated by the presence of both

More information

Non-linear panel data modeling

Non-linear panel data modeling Non-linear panel data modeling Laura Magazzini University of Verona laura.magazzini@univr.it http://dse.univr.it/magazzini May 2010 Laura Magazzini (@univr.it) Non-linear panel data modeling May 2010 1

More information

High Dimensional Empirical Likelihood for Generalized Estimating Equations with Dependent Data

High Dimensional Empirical Likelihood for Generalized Estimating Equations with Dependent Data High Dimensional Empirical Likelihood for Generalized Estimating Equations with Dependent Data Song Xi CHEN Guanghua School of Management and Center for Statistical Science, Peking University Department

More information

The properties of L p -GMM estimators

The properties of L p -GMM estimators The properties of L p -GMM estimators Robert de Jong and Chirok Han Michigan State University February 2000 Abstract This paper considers Generalized Method of Moment-type estimators for which a criterion

More information

ORTHOGONALITY TESTS IN LINEAR MODELS SEUNG CHAN AHN ARIZONA STATE UNIVERSITY ABSTRACT

ORTHOGONALITY TESTS IN LINEAR MODELS SEUNG CHAN AHN ARIZONA STATE UNIVERSITY ABSTRACT ORTHOGONALITY TESTS IN LINEAR MODELS SEUNG CHAN AHN ARIZONA STATE UNIVERSITY ABSTRACT This paper considers several tests of orthogonality conditions in linear models where stochastic errors may be heteroskedastic

More information

A Course in Applied Econometrics Lecture 18: Missing Data. Jeff Wooldridge IRP Lectures, UW Madison, August Linear model with IVs: y i x i u i,

A Course in Applied Econometrics Lecture 18: Missing Data. Jeff Wooldridge IRP Lectures, UW Madison, August Linear model with IVs: y i x i u i, A Course in Applied Econometrics Lecture 18: Missing Data Jeff Wooldridge IRP Lectures, UW Madison, August 2008 1. When Can Missing Data be Ignored? 2. Inverse Probability Weighting 3. Imputation 4. Heckman-Type

More information

1 Motivation for Instrumental Variable (IV) Regression

1 Motivation for Instrumental Variable (IV) Regression ECON 370: IV & 2SLS 1 Instrumental Variables Estimation and Two Stage Least Squares Econometric Methods, ECON 370 Let s get back to the thiking in terms of cross sectional (or pooled cross sectional) data

More information

What s New in Econometrics. Lecture 15

What s New in Econometrics. Lecture 15 What s New in Econometrics Lecture 15 Generalized Method of Moments and Empirical Likelihood Guido Imbens NBER Summer Institute, 2007 Outline 1. Introduction 2. Generalized Method of Moments Estimation

More information

Economics 583: Econometric Theory I A Primer on Asymptotics

Economics 583: Econometric Theory I A Primer on Asymptotics Economics 583: Econometric Theory I A Primer on Asymptotics Eric Zivot January 14, 2013 The two main concepts in asymptotic theory that we will use are Consistency Asymptotic Normality Intuition consistency:

More information

Asymptotics for Nonlinear GMM

Asymptotics for Nonlinear GMM Asymptotics for Nonlinear GMM Eric Zivot February 13, 2013 Asymptotic Properties of Nonlinear GMM Under standard regularity conditions (to be discussed later), it can be shown that where ˆθ(Ŵ) θ 0 ³ˆθ(Ŵ)

More information

A Course in Applied Econometrics Lecture 14: Control Functions and Related Methods. Jeff Wooldridge IRP Lectures, UW Madison, August 2008

A Course in Applied Econometrics Lecture 14: Control Functions and Related Methods. Jeff Wooldridge IRP Lectures, UW Madison, August 2008 A Course in Applied Econometrics Lecture 14: Control Functions and Related Methods Jeff Wooldridge IRP Lectures, UW Madison, August 2008 1. Linear-in-Parameters Models: IV versus Control Functions 2. Correlated

More information

Advanced Econometrics

Advanced Econometrics Advanced Econometrics Dr. Andrea Beccarini Center for Quantitative Economics Winter 2013/2014 Andrea Beccarini (CQE) Econometrics Winter 2013/2014 1 / 156 General information Aims and prerequisites Objective:

More information

Economics 536 Lecture 7. Introduction to Specification Testing in Dynamic Econometric Models

Economics 536 Lecture 7. Introduction to Specification Testing in Dynamic Econometric Models University of Illinois Fall 2016 Department of Economics Roger Koenker Economics 536 Lecture 7 Introduction to Specification Testing in Dynamic Econometric Models In this lecture I want to briefly describe

More information

Econometrics of Panel Data

Econometrics of Panel Data Econometrics of Panel Data Jakub Mućk Meeting # 6 Jakub Mućk Econometrics of Panel Data Meeting # 6 1 / 36 Outline 1 The First-Difference (FD) estimator 2 Dynamic panel data models 3 The Anderson and Hsiao

More information

Missing dependent variables in panel data models

Missing dependent variables in panel data models Missing dependent variables in panel data models Jason Abrevaya Abstract This paper considers estimation of a fixed-effects model in which the dependent variable may be missing. For cross-sectional units

More information

A Course in Applied Econometrics Lecture 4: Linear Panel Data Models, II. Jeff Wooldridge IRP Lectures, UW Madison, August 2008

A Course in Applied Econometrics Lecture 4: Linear Panel Data Models, II. Jeff Wooldridge IRP Lectures, UW Madison, August 2008 A Course in Applied Econometrics Lecture 4: Linear Panel Data Models, II Jeff Wooldridge IRP Lectures, UW Madison, August 2008 5. Estimating Production Functions Using Proxy Variables 6. Pseudo Panels

More information

Specification Test for Instrumental Variables Regression with Many Instruments

Specification Test for Instrumental Variables Regression with Many Instruments Specification Test for Instrumental Variables Regression with Many Instruments Yoonseok Lee and Ryo Okui April 009 Preliminary; comments are welcome Abstract This paper considers specification testing

More information

Economics 582 Random Effects Estimation

Economics 582 Random Effects Estimation Economics 582 Random Effects Estimation Eric Zivot May 29, 2013 Random Effects Model Hence, the model can be re-written as = x 0 β + + [x ] = 0 (no endogeneity) [ x ] = = + x 0 β + + [x ] = 0 [ x ] = 0

More information

Specification testing in panel data models estimated by fixed effects with instrumental variables

Specification testing in panel data models estimated by fixed effects with instrumental variables Specification testing in panel data models estimated by fixed effects wh instrumental variables Carrie Falls Department of Economics Michigan State Universy Abstract I show that a handful of the regressions

More information

Journal of Econometrics

Journal of Econometrics Journal of Econometrics 151 (2009) 17 32 Contents lists available at ScienceDirect Journal of Econometrics journal homepage: www.elsevier.com/locate/jeconom Optimally combining censored and uncensored

More information

Testing Overidentifying Restrictions with Many Instruments and Heteroskedasticity

Testing Overidentifying Restrictions with Many Instruments and Heteroskedasticity Testing Overidentifying Restrictions with Many Instruments and Heteroskedasticity John C. Chao, Department of Economics, University of Maryland, chao@econ.umd.edu. Jerry A. Hausman, Department of Economics,

More information

Wooldridge, Introductory Econometrics, 4th ed. Chapter 15: Instrumental variables and two stage least squares

Wooldridge, Introductory Econometrics, 4th ed. Chapter 15: Instrumental variables and two stage least squares Wooldridge, Introductory Econometrics, 4th ed. Chapter 15: Instrumental variables and two stage least squares Many economic models involve endogeneity: that is, a theoretical relationship does not fit

More information

Identification and Estimation Using Heteroscedasticity Without Instruments: The Binary Endogenous Regressor Case

Identification and Estimation Using Heteroscedasticity Without Instruments: The Binary Endogenous Regressor Case Identification and Estimation Using Heteroscedasticity Without Instruments: The Binary Endogenous Regressor Case Arthur Lewbel Boston College Original December 2016, revised July 2017 Abstract Lewbel (2012)

More information

Introductory Econometrics

Introductory Econometrics Based on the textbook by Wooldridge: : A Modern Approach Robert M. Kunst robert.kunst@univie.ac.at University of Vienna and Institute for Advanced Studies Vienna November 23, 2013 Outline Introduction

More information

Approximate Distributions of the Likelihood Ratio Statistic in a Structural Equation with Many Instruments

Approximate Distributions of the Likelihood Ratio Statistic in a Structural Equation with Many Instruments CIRJE-F-466 Approximate Distributions of the Likelihood Ratio Statistic in a Structural Equation with Many Instruments Yukitoshi Matsushita CIRJE, Faculty of Economics, University of Tokyo February 2007

More information

Calibration Estimation for Semiparametric Copula Models under Missing Data

Calibration Estimation for Semiparametric Copula Models under Missing Data Calibration Estimation for Semiparametric Copula Models under Missing Data Shigeyuki Hamori 1 Kaiji Motegi 1 Zheng Zhang 2 1 Kobe University 2 Renmin University of China Economics and Economic Growth Centre

More information

Wooldridge, Introductory Econometrics, 4th ed. Appendix C: Fundamentals of mathematical statistics

Wooldridge, Introductory Econometrics, 4th ed. Appendix C: Fundamentals of mathematical statistics Wooldridge, Introductory Econometrics, 4th ed. Appendix C: Fundamentals of mathematical statistics A short review of the principles of mathematical statistics (or, what you should have learned in EC 151).

More information

A Note on Demand Estimation with Supply Information. in Non-Linear Models

A Note on Demand Estimation with Supply Information. in Non-Linear Models A Note on Demand Estimation with Supply Information in Non-Linear Models Tongil TI Kim Emory University J. Miguel Villas-Boas University of California, Berkeley May, 2018 Keywords: demand estimation, limited

More information

Review of Classical Least Squares. James L. Powell Department of Economics University of California, Berkeley

Review of Classical Least Squares. James L. Powell Department of Economics University of California, Berkeley Review of Classical Least Squares James L. Powell Department of Economics University of California, Berkeley The Classical Linear Model The object of least squares regression methods is to model and estimate

More information

Multiple Equation GMM with Common Coefficients: Panel Data

Multiple Equation GMM with Common Coefficients: Panel Data Multiple Equation GMM with Common Coefficients: Panel Data Eric Zivot Winter 2013 Multi-equation GMM with common coefficients Example (panel wage equation) 69 = + 69 + + 69 + 1 80 = + 80 + + 80 + 2 Note:

More information

The Influence Function of Semiparametric Estimators

The Influence Function of Semiparametric Estimators The Influence Function of Semiparametric Estimators Hidehiko Ichimura University of Tokyo Whitney K. Newey MIT July 2015 Revised January 2017 Abstract There are many economic parameters that depend on

More information

Hypothesis Testing. 1 Definitions of test statistics. CB: chapter 8; section 10.3

Hypothesis Testing. 1 Definitions of test statistics. CB: chapter 8; section 10.3 Hypothesis Testing CB: chapter 8; section 0.3 Hypothesis: statement about an unknown population parameter Examples: The average age of males in Sweden is 7. (statement about population mean) The lowest

More information

Fall 2017 STAT 532 Homework Peter Hoff. 1. Let P be a probability measure on a collection of sets A.

Fall 2017 STAT 532 Homework Peter Hoff. 1. Let P be a probability measure on a collection of sets A. 1. Let P be a probability measure on a collection of sets A. (a) For each n N, let H n be a set in A such that H n H n+1. Show that P (H n ) monotonically converges to P ( k=1 H k) as n. (b) For each n

More information

Cross-fitting and fast remainder rates for semiparametric estimation

Cross-fitting and fast remainder rates for semiparametric estimation Cross-fitting and fast remainder rates for semiparametric estimation Whitney K. Newey James M. Robins The Institute for Fiscal Studies Department of Economics, UCL cemmap working paper CWP41/17 Cross-Fitting

More information

Lecture 6: Discrete Choice: Qualitative Response

Lecture 6: Discrete Choice: Qualitative Response Lecture 6: Instructor: Department of Economics Stanford University 2011 Types of Discrete Choice Models Univariate Models Binary: Linear; Probit; Logit; Arctan, etc. Multinomial: Logit; Nested Logit; GEV;

More information

Bayesian Interpretations of Heteroskedastic Consistent Covariance Estimators Using the Informed Bayesian Bootstrap

Bayesian Interpretations of Heteroskedastic Consistent Covariance Estimators Using the Informed Bayesian Bootstrap Bayesian Interpretations of Heteroskedastic Consistent Covariance Estimators Using the Informed Bayesian Bootstrap Dale J. Poirier University of California, Irvine September 1, 2008 Abstract This paper

More information

Bootstrap Testing in Econometrics

Bootstrap Testing in Econometrics Presented May 29, 1999 at the CEA Annual Meeting Bootstrap Testing in Econometrics James G MacKinnon Queen s University at Kingston Introduction: Economists routinely compute test statistics of which the

More information

Efficient Estimation of Dynamic Panel Data Models: Alternative Assumptions and Simplified Estimation

Efficient Estimation of Dynamic Panel Data Models: Alternative Assumptions and Simplified Estimation Efficient Estimation of Dynamic Panel Data Models: Alternative Assumptions and Simplified Estimation Seung C. Ahn Arizona State University, Tempe, AZ 85187, USA Peter Schmidt * Michigan State University,

More information

Lawrence D. Brown* and Daniel McCarthy*

Lawrence D. Brown* and Daniel McCarthy* Comments on the paper, An adaptive resampling test for detecting the presence of significant predictors by I. W. McKeague and M. Qian Lawrence D. Brown* and Daniel McCarthy* ABSTRACT: This commentary deals

More information

Semi and Nonparametric Models in Econometrics

Semi and Nonparametric Models in Econometrics Semi and Nonparametric Models in Econometrics Part 4: partial identification Xavier d Haultfoeuille CREST-INSEE Outline Introduction First examples: missing data Second example: incomplete models Inference

More information

Bootstrap Approach to Comparison of Alternative Methods of Parameter Estimation of a Simultaneous Equation Model

Bootstrap Approach to Comparison of Alternative Methods of Parameter Estimation of a Simultaneous Equation Model Bootstrap Approach to Comparison of Alternative Methods of Parameter Estimation of a Simultaneous Equation Model Olubusoye, O. E., J. O. Olaomi, and O. O. Odetunde Abstract A bootstrap simulation approach

More information

The outline for Unit 3

The outline for Unit 3 The outline for Unit 3 Unit 1. Introduction: The regression model. Unit 2. Estimation principles. Unit 3: Hypothesis testing principles. 3.1 Wald test. 3.2 Lagrange Multiplier. 3.3 Likelihood Ratio Test.

More information

Binary Models with Endogenous Explanatory Variables

Binary Models with Endogenous Explanatory Variables Binary Models with Endogenous Explanatory Variables Class otes Manuel Arellano ovember 7, 2007 Revised: January 21, 2008 1 Introduction In Part I we considered linear and non-linear models with additive

More information

Quantile methods. Class Notes Manuel Arellano December 1, Let F (r) =Pr(Y r). Forτ (0, 1), theτth population quantile of Y is defined to be

Quantile methods. Class Notes Manuel Arellano December 1, Let F (r) =Pr(Y r). Forτ (0, 1), theτth population quantile of Y is defined to be Quantile methods Class Notes Manuel Arellano December 1, 2009 1 Unconditional quantiles Let F (r) =Pr(Y r). Forτ (0, 1), theτth population quantile of Y is defined to be Q τ (Y ) q τ F 1 (τ) =inf{r : F

More information

Closest Moment Estimation under General Conditions

Closest Moment Estimation under General Conditions Closest Moment Estimation under General Conditions Chirok Han Victoria University of Wellington New Zealand Robert de Jong Ohio State University U.S.A October, 2003 Abstract This paper considers Closest

More information

Econometrics Summary Algebraic and Statistical Preliminaries

Econometrics Summary Algebraic and Statistical Preliminaries Econometrics Summary Algebraic and Statistical Preliminaries Elasticity: The point elasticity of Y with respect to L is given by α = ( Y/ L)/(Y/L). The arc elasticity is given by ( Y/ L)/(Y/L), when L

More information

Lasso Maximum Likelihood Estimation of Parametric Models with Singular Information Matrices

Lasso Maximum Likelihood Estimation of Parametric Models with Singular Information Matrices Article Lasso Maximum Likelihood Estimation of Parametric Models with Singular Information Matrices Fei Jin 1,2 and Lung-fei Lee 3, * 1 School of Economics, Shanghai University of Finance and Economics,

More information

Bayesian Interpretations of Heteroskedastic Consistent Covariance. Estimators Using the Informed Bayesian Bootstrap

Bayesian Interpretations of Heteroskedastic Consistent Covariance. Estimators Using the Informed Bayesian Bootstrap Bayesian Interpretations of Heteroskedastic Consistent Covariance Estimators Using the Informed Bayesian Bootstrap Dale J. Poirier University of California, Irvine May 22, 2009 Abstract This paper provides

More information

Locally Robust Semiparametric Estimation

Locally Robust Semiparametric Estimation Locally Robust Semiparametric Estimation Victor Chernozhukov Juan Carlos Escanciano Hidehiko Ichimura Whitney K. Newey The Institute for Fiscal Studies Department of Economics, UCL cemmap working paper

More information

Økonomisk Kandidateksamen 2004 (I) Econometrics 2. Rettevejledning

Økonomisk Kandidateksamen 2004 (I) Econometrics 2. Rettevejledning Økonomisk Kandidateksamen 2004 (I) Econometrics 2 Rettevejledning This is a closed-book exam (uden hjælpemidler). Answer all questions! The group of questions 1 to 4 have equal weight. Within each group,

More information

Econometrics II - EXAM Answer each question in separate sheets in three hours

Econometrics II - EXAM Answer each question in separate sheets in three hours Econometrics II - EXAM Answer each question in separate sheets in three hours. Let u and u be jointly Gaussian and independent of z in all the equations. a Investigate the identification of the following

More information

Closest Moment Estimation under General Conditions

Closest Moment Estimation under General Conditions Closest Moment Estimation under General Conditions Chirok Han and Robert de Jong January 28, 2002 Abstract This paper considers Closest Moment (CM) estimation with a general distance function, and avoids

More information

Modification and Improvement of Empirical Likelihood for Missing Response Problem

Modification and Improvement of Empirical Likelihood for Missing Response Problem UW Biostatistics Working Paper Series 12-30-2010 Modification and Improvement of Empirical Likelihood for Missing Response Problem Kwun Chuen Gary Chan University of Washington - Seattle Campus, kcgchan@u.washington.edu

More information

What s New in Econometrics. Lecture 13

What s New in Econometrics. Lecture 13 What s New in Econometrics Lecture 13 Weak Instruments and Many Instruments Guido Imbens NBER Summer Institute, 2007 Outline 1. Introduction 2. Motivation 3. Weak Instruments 4. Many Weak) Instruments

More information

Stochastic Convergence, Delta Method & Moment Estimators

Stochastic Convergence, Delta Method & Moment Estimators Stochastic Convergence, Delta Method & Moment Estimators Seminar on Asymptotic Statistics Daniel Hoffmann University of Kaiserslautern Department of Mathematics February 13, 2015 Daniel Hoffmann (TU KL)

More information

Econometrics. Week 8. Fall Institute of Economic Studies Faculty of Social Sciences Charles University in Prague

Econometrics. Week 8. Fall Institute of Economic Studies Faculty of Social Sciences Charles University in Prague Econometrics Week 8 Institute of Economic Studies Faculty of Social Sciences Charles University in Prague Fall 2012 1 / 25 Recommended Reading For the today Instrumental Variables Estimation and Two Stage

More information

Chapter 6 Stochastic Regressors

Chapter 6 Stochastic Regressors Chapter 6 Stochastic Regressors 6. Stochastic regressors in non-longitudinal settings 6.2 Stochastic regressors in longitudinal settings 6.3 Longitudinal data models with heterogeneity terms and sequentially

More information

Generated Covariates in Nonparametric Estimation: A Short Review.

Generated Covariates in Nonparametric Estimation: A Short Review. Generated Covariates in Nonparametric Estimation: A Short Review. Enno Mammen, Christoph Rothe, and Melanie Schienle Abstract In many applications, covariates are not observed but have to be estimated

More information

Testing Restrictions and Comparing Models

Testing Restrictions and Comparing Models Econ. 513, Time Series Econometrics Fall 00 Chris Sims Testing Restrictions and Comparing Models 1. THE PROBLEM We consider here the problem of comparing two parametric models for the data X, defined by

More information

AGEC 661 Note Fourteen

AGEC 661 Note Fourteen AGEC 661 Note Fourteen Ximing Wu 1 Selection bias 1.1 Heckman s two-step model Consider the model in Heckman (1979) Y i = X iβ + ε i, D i = I {Z iγ + η i > 0}. For a random sample from the population,

More information

Size and Shape of Confidence Regions from Extended Empirical Likelihood Tests

Size and Shape of Confidence Regions from Extended Empirical Likelihood Tests Biometrika (2014),,, pp. 1 13 C 2014 Biometrika Trust Printed in Great Britain Size and Shape of Confidence Regions from Extended Empirical Likelihood Tests BY M. ZHOU Department of Statistics, University

More information

Final Exam. Economics 835: Econometrics. Fall 2010

Final Exam. Economics 835: Econometrics. Fall 2010 Final Exam Economics 835: Econometrics Fall 2010 Please answer the question I ask - no more and no less - and remember that the correct answer is often short and simple. 1 Some short questions a) For each

More information

On Variance Estimation for 2SLS When Instruments Identify Different LATEs

On Variance Estimation for 2SLS When Instruments Identify Different LATEs On Variance Estimation for 2SLS When Instruments Identify Different LATEs Seojeong Lee June 30, 2014 Abstract Under treatment effect heterogeneity, an instrument identifies the instrumentspecific local

More information

Inference for Identifiable Parameters in Partially Identified Econometric Models

Inference for Identifiable Parameters in Partially Identified Econometric Models Inference for Identifiable Parameters in Partially Identified Econometric Models Joseph P. Romano Department of Statistics Stanford University romano@stat.stanford.edu Azeem M. Shaikh Department of Economics

More information

Estimation for two-phase designs: semiparametric models and Z theorems

Estimation for two-phase designs: semiparametric models and Z theorems Estimation for two-phase designs:semiparametric models and Z theorems p. 1/27 Estimation for two-phase designs: semiparametric models and Z theorems Jon A. Wellner University of Washington Estimation for

More information

Longitudinal and Panel Data: Analysis and Applications for the Social Sciences. Table of Contents

Longitudinal and Panel Data: Analysis and Applications for the Social Sciences. Table of Contents Longitudinal and Panel Data Preface / i Longitudinal and Panel Data: Analysis and Applications for the Social Sciences Table of Contents August, 2003 Table of Contents Preface i vi 1. Introduction 1.1

More information

Panel Data Models. James L. Powell Department of Economics University of California, Berkeley

Panel Data Models. James L. Powell Department of Economics University of California, Berkeley Panel Data Models James L. Powell Department of Economics University of California, Berkeley Overview Like Zellner s seemingly unrelated regression models, the dependent and explanatory variables for panel

More information

(θ θ ), θ θ = 2 L(θ ) θ θ θ θ θ (θ )= H θθ (θ ) 1 d θ (θ )

(θ θ ), θ θ = 2 L(θ ) θ θ θ θ θ (θ )= H θθ (θ ) 1 d θ (θ ) Setting RHS to be zero, 0= (θ )+ 2 L(θ ) (θ θ ), θ θ = 2 L(θ ) 1 (θ )= H θθ (θ ) 1 d θ (θ ) O =0 θ 1 θ 3 θ 2 θ Figure 1: The Newton-Raphson Algorithm where H is the Hessian matrix, d θ is the derivative

More information

Statistics 3858 : Maximum Likelihood Estimators

Statistics 3858 : Maximum Likelihood Estimators Statistics 3858 : Maximum Likelihood Estimators 1 Method of Maximum Likelihood In this method we construct the so called likelihood function, that is L(θ) = L(θ; X 1, X 2,..., X n ) = f n (X 1, X 2,...,

More information

INFERENCE APPROACHES FOR INSTRUMENTAL VARIABLE QUANTILE REGRESSION. 1. Introduction

INFERENCE APPROACHES FOR INSTRUMENTAL VARIABLE QUANTILE REGRESSION. 1. Introduction INFERENCE APPROACHES FOR INSTRUMENTAL VARIABLE QUANTILE REGRESSION VICTOR CHERNOZHUKOV CHRISTIAN HANSEN MICHAEL JANSSON Abstract. We consider asymptotic and finite-sample confidence bounds in instrumental

More information

Discrete Dependent Variable Models

Discrete Dependent Variable Models Discrete Dependent Variable Models James J. Heckman University of Chicago This draft, April 10, 2006 Here s the general approach of this lecture: Economic model Decision rule (e.g. utility maximization)

More information

Chapter 1. GMM: Basic Concepts

Chapter 1. GMM: Basic Concepts Chapter 1. GMM: Basic Concepts Contents 1 Motivating Examples 1 1.1 Instrumental variable estimator....................... 1 1.2 Estimating parameters in monetary policy rules.............. 2 1.3 Estimating

More information

Econometrics I KS. Module 2: Multivariate Linear Regression. Alexander Ahammer. This version: April 16, 2018

Econometrics I KS. Module 2: Multivariate Linear Regression. Alexander Ahammer. This version: April 16, 2018 Econometrics I KS Module 2: Multivariate Linear Regression Alexander Ahammer Department of Economics Johannes Kepler University of Linz This version: April 16, 2018 Alexander Ahammer (JKU) Module 2: Multivariate

More information

Calibration Estimation of Semiparametric Copula Models with Data Missing at Random

Calibration Estimation of Semiparametric Copula Models with Data Missing at Random Calibration Estimation of Semiparametric Copula Models with Data Missing at Random Shigeyuki Hamori 1 Kaiji Motegi 1 Zheng Zhang 2 1 Kobe University 2 Renmin University of China Econometrics Workshop UNC

More information

8. Hypothesis Testing

8. Hypothesis Testing FE661 - Statistical Methods for Financial Engineering 8. Hypothesis Testing Jitkomut Songsiri introduction Wald test likelihood-based tests significance test for linear regression 8-1 Introduction elements

More information

The regression model with one stochastic regressor (part II)

The regression model with one stochastic regressor (part II) The regression model with one stochastic regressor (part II) 3150/4150 Lecture 7 Ragnar Nymoen 6 Feb 2012 We will finish Lecture topic 4: The regression model with stochastic regressor We will first look

More information

Let us first identify some classes of hypotheses. simple versus simple. H 0 : θ = θ 0 versus H 1 : θ = θ 1. (1) one-sided

Let us first identify some classes of hypotheses. simple versus simple. H 0 : θ = θ 0 versus H 1 : θ = θ 1. (1) one-sided Let us first identify some classes of hypotheses. simple versus simple H 0 : θ = θ 0 versus H 1 : θ = θ 1. (1) one-sided H 0 : θ θ 0 versus H 1 : θ > θ 0. (2) two-sided; null on extremes H 0 : θ θ 1 or

More information

Generalized Method of Moment

Generalized Method of Moment Generalized Method of Moment CHUNG-MING KUAN Department of Finance & CRETA National Taiwan University June 16, 2010 C.-M. Kuan (Finance & CRETA, NTU Generalized Method of Moment June 16, 2010 1 / 32 Lecture

More information

New Developments in Econometrics Lecture 9: Stratified Sampling

New Developments in Econometrics Lecture 9: Stratified Sampling New Developments in Econometrics Lecture 9: Stratified Sampling Jeff Wooldridge Cemmap Lectures, UCL, June 2009 1. Overview of Stratified Sampling 2. Regression Analysis 3. Clustering and Stratification

More information

Estimation of Dynamic Regression Models

Estimation of Dynamic Regression Models University of Pavia 2007 Estimation of Dynamic Regression Models Eduardo Rossi University of Pavia Factorization of the density DGP: D t (x t χ t 1, d t ; Ψ) x t represent all the variables in the economy.

More information

DSGE Methods. Estimation of DSGE models: GMM and Indirect Inference. Willi Mutschler, M.Sc.

DSGE Methods. Estimation of DSGE models: GMM and Indirect Inference. Willi Mutschler, M.Sc. DSGE Methods Estimation of DSGE models: GMM and Indirect Inference Willi Mutschler, M.Sc. Institute of Econometrics and Economic Statistics University of Münster willi.mutschler@wiwi.uni-muenster.de Summer

More information

Inference about Clustering and Parametric. Assumptions in Covariance Matrix Estimation

Inference about Clustering and Parametric. Assumptions in Covariance Matrix Estimation Inference about Clustering and Parametric Assumptions in Covariance Matrix Estimation Mikko Packalen y Tony Wirjanto z 26 November 2010 Abstract Selecting an estimator for the variance covariance matrix

More information

Panel Threshold Regression Models with Endogenous Threshold Variables

Panel Threshold Regression Models with Endogenous Threshold Variables Panel Threshold Regression Models with Endogenous Threshold Variables Chien-Ho Wang National Taipei University Eric S. Lin National Tsing Hua University This Version: June 29, 2010 Abstract This paper

More information

AN EMPIRICAL LIKELIHOOD RATIO TEST FOR NORMALITY

AN EMPIRICAL LIKELIHOOD RATIO TEST FOR NORMALITY Econometrics Working Paper EWP0401 ISSN 1485-6441 Department of Economics AN EMPIRICAL LIKELIHOOD RATIO TEST FOR NORMALITY Lauren Bin Dong & David E. A. Giles Department of Economics, University of Victoria

More information

Introduction to Estimation Methods for Time Series models Lecture 2

Introduction to Estimation Methods for Time Series models Lecture 2 Introduction to Estimation Methods for Time Series models Lecture 2 Fulvio Corsi SNS Pisa Fulvio Corsi Introduction to Estimation () Methods for Time Series models Lecture 2 SNS Pisa 1 / 21 Estimators:

More information

The generalized method of moments

The generalized method of moments Robert M. Kunst robert.kunst@univie.ac.at University of Vienna and Institute for Advanced Studies Vienna February 2008 Based on the book Generalized Method of Moments by Alastair R. Hall (2005), Oxford

More information

Linear models. Linear models are computationally convenient and remain widely used in. applied econometric research

Linear models. Linear models are computationally convenient and remain widely used in. applied econometric research Linear models Linear models are computationally convenient and remain widely used in applied econometric research Our main focus in these lectures will be on single equation linear models of the form y

More information

Max. Likelihood Estimation. Outline. Econometrics II. Ricardo Mora. Notes. Notes

Max. Likelihood Estimation. Outline. Econometrics II. Ricardo Mora. Notes. Notes Maximum Likelihood Estimation Econometrics II Department of Economics Universidad Carlos III de Madrid Máster Universitario en Desarrollo y Crecimiento Económico Outline 1 3 4 General Approaches to Parameter

More information

ECO Class 6 Nonparametric Econometrics

ECO Class 6 Nonparametric Econometrics ECO 523 - Class 6 Nonparametric Econometrics Carolina Caetano Contents 1 Nonparametric instrumental variable regression 1 2 Nonparametric Estimation of Average Treatment Effects 3 2.1 Asymptotic results................................

More information

The BLP Method of Demand Curve Estimation in Industrial Organization

The BLP Method of Demand Curve Estimation in Industrial Organization The BLP Method of Demand Curve Estimation in Industrial Organization 9 March 2006 Eric Rasmusen 1 IDEAS USED 1. Instrumental variables. We use instruments to correct for the endogeneity of prices, the

More information

Nonparametric Identification of a Binary Random Factor in Cross Section Data - Supplemental Appendix

Nonparametric Identification of a Binary Random Factor in Cross Section Data - Supplemental Appendix Nonparametric Identification of a Binary Random Factor in Cross Section Data - Supplemental Appendix Yingying Dong and Arthur Lewbel California State University Fullerton and Boston College July 2010 Abstract

More information

Partial Identification and Confidence Intervals

Partial Identification and Confidence Intervals Partial Identification and Confidence Intervals Jinyong Hahn Department of Economics, UCLA Geert Ridder Department of Economics, USC September 17, 009 Abstract We consider statistical inference on a single

More information

A Robust Approach to Estimating Production Functions: Replication of the ACF procedure

A Robust Approach to Estimating Production Functions: Replication of the ACF procedure A Robust Approach to Estimating Production Functions: Replication of the ACF procedure Kyoo il Kim Michigan State University Yao Luo University of Toronto Yingjun Su IESR, Jinan University August 2018

More information

TECHNICAL WORKING PAPER SERIES USING WEIGHTS TO ADJUST FOR SAMPLE SELECTION WHEN AUXILIARY INFORMATION IS AVAILABLE. Aviv Nevo

TECHNICAL WORKING PAPER SERIES USING WEIGHTS TO ADJUST FOR SAMPLE SELECTION WHEN AUXILIARY INFORMATION IS AVAILABLE. Aviv Nevo TECHNICAL WORKING PAPER SERIES USING WEIGHTS TO ADJUST FOR SAMPLE SELECTION WHEN AUXILIARY INFORMATION IS AVAILABLE Aviv Nevo Technical Working Paper 275 http://www.nber.org/papers/t0275 NATIONAL BUREAU

More information