SEMIPARAMETRIC EFFICIENCY IN GMM MODELS WITH AUXILIARY DATA. By Xiaohong Chen, Han Hong and Alessandro Tarozzi New York University and Duke University

Size: px
Start display at page:

Download "SEMIPARAMETRIC EFFICIENCY IN GMM MODELS WITH AUXILIARY DATA. By Xiaohong Chen, Han Hong and Alessandro Tarozzi New York University and Duke University"

Transcription

1 Submitted to the Annals of Statistics SEMIPARAMETRIC EFFICIENCY IN GMM MODELS WITH AUXILIARY DATA By Xiaohong Chen, Han Hong and Alessandro Tarozzi New York University and Duke University We study semiarametric efficiency bounds and efficient estimation of arameters defined through general moment restrictions with missing data. Identification relies ouxiliary data containing information about the distribution of the missing variables conditional on roxy variables that are observed in both the rimary and the auxiliary database, when such distribution is common to the two data sets. The auxiliary samle can be indeendent of the rimary samle, or can be a subset of it. For both cases, we derive bounds when the robability of missing data given the roxy variables is unknown, or known, or belongs to a correctly secified arametric family. We find that the conditional robability is not ancillary when the two samles are indeendent. For all cases, we discuss efficient semiarametric estimators. An estimator based o conditional exectation rojection is shown to require milder regularity conditions than one based on inverse robability weighting.. Introduction. Many emirical studies are comlicated by the resence of missing data. In such circumstances, identifying assumtions become necessary to overcome the lack of identification that results from the missing information. One solution to this identification roblem is based on the assumtion that information on the true value of the variables in the data set of interest (the rimary data set) can be recovered using auxiliary data sources under a conditional indeendence assumtion. The key element of this identification strategy is that the distribution of the variables of interest is assumed to be indeendent of whether they belong to the rimary or the auxiliary samle, conditional o set of roxy variables, which are observed in both samles. The first goal of this aer is to study semiarametric efficiency bounds of arameters defined through general nonlinear and over-identified moment conditions for missing data models under a conditional indeendence assumtion. We rovide semiarametric efficiency bounds for the cases when the roensity score is unknown, or is known, or belongs to a correctly secified arametric family. In our context, the roensity score is defined as the robability that one observation belongs to the subsamle where only the roxy variables are observed. The auxiliary samle can be either a subset of the rimary samle ( verify-in-samle case) or indeendent of the rimary samle ( verify-out-of-samle ). The former case is a secial case of the MAR or CAR missing AMS 2000 subject classifications: Primary 62H2, 62D05; secondary 62F2, 62G20. Keywords and hrases: Semiarametric Efficiency Bounds, GMM, Measurement Error, Missing Data, Auxiliary data, Sieve Estimation

2 2 X. CHEN, H. HONG AND A. TAROZZI data structure where the missing variables are common to all subjects. Semiarametric efficiency bounds for this case are closely related to the results in 26 when there is a single hierarchy in the case of monotone missing data atterns for a fixed set of instrument functions. (See also 25 and 7.) We rovide new results on semiarametric efficiency bounds for the verify-out-of-samle case. We find that while more information on the roensity score will not affect the asymtotic efficiency bounds for arameters defined in the verify-in-samle case (as shown in, e.g. 26, 7 and 6), it will imrove the asymtotic efficiency for arameters defined in the verify-out-of-samle case. Our new efficiency bound results for the case when the arametric roensity is correctly secified should be useful ilied work because such assumtion is frequently adoted by emirical researchers. The second goal of aer is to develo two classes of sieve-based, Generalized Method of Moments (GMM) estimators that achieve the efficiency bounds for arameters defined under either the verify-out-of-samle or the verify-in-samle framework. Each estimator relies only on one nonarametric estimate; a conditional exectation rojection based GMM (hereafter CEP-GMM) estimator only requires the nonarametric estimation of a conditional exectation, while an inverse robability weighting based GMM (hereafter IPW-GMM) estimator only needs a nonarametric estimate of the roensity score. We establish asymtotic normality and efficiency roerties of both estimators under weaker regularity conditions than the existing ones in the literature. Irticular, we allow for nonlinear and non-smooth moment restrictions and for unbounded suort of conditioning (or roxy) variables. The CEP-GMM estimator resents some advantages over the IPW-GMM estimator. First, its root-symtotic normality and efficiency can be derived without the strong assumtion that the unknown roensity score is uniformly bounded away from zero and one. Second, the CEP-GMM estimator is characterized by a simle common format that achieves the relevant efficiency bound for all the cases we consider, regardless of whether the roensity score is unknown, or known or arametrically secified. Instead, the IPW-GMM estimator will be generally inefficient when the roensity score is known, or is arametrically estimated using a correctly secified arametric model; in such instances, different combinations of nonarametric and arametric estimates of the roensity score have to be secifically derived to achieve the efficiency bounds. Our results calso be alied to the estimation of arametric nonlinear models with nonclassical measurement errors with validation data, a toic that has been studied in 6, 29, 5, 20, 8 among others. Section 2 describes the model and resents the semiarametric efficiency bounds. Semiarametrically efficient CEP-GMM and IPW-GMM estimators are develoed in sections 3 and 4 resectively. In Section 5 we illustrate emirically the erformance of the different estimators in the estimation of the distribution of rivate consumtion in rural India in the resence of missing data. Section 6 concludes. All roofs are given in the aendices.

3 EFFICIENT GMM WITH AUXILIARY DATA 3 2. Semiarametric Efficiency Bounds. Let (X i, Y i, D i ) n be an i.i.d. samle from (X, Y, D), and denote Z i = (Y i, X i ) where Y i is only observed when D i = 0. We are interested in the estimation of arameters β R d β defined imlicitly in terms of general nonlinear moment conditions. In the first (verify-out-of-samle) case such conditions are described by () E m(z; β) D = = 0 if and only if β = β 0, while in the second (verify-in-samle) case the condition is (2) E m(z; β) = 0 if and only if β = β 0, where Z = (Y, X) and m( ; β) is a set of functions with dimension d m d β. In other words, under case () Y is always missing in the rimary data set (D = ), which is a random samle from the oulation of interest, while an indeendent auxiliary samle (where D = 0) will serve the urose of ensuring the identification of arameters that would not be identified by the rimary data set alone. Under case (2), the auxiliary samle is instead a subset of the entire rimary samle. In this section we resent the semiarametric efficiency bound for the estimation of β imlicitly defined by either moment conditions () or (2). To state the efficiency bounds we introduce some notations. Let = P r(d = ) and (X) = P r(d = X). In this aer β is tyically used to denote arbitrary value in the arameter sace, but to save notation in this section β is also used as the true arameter value β 0. Define E(X; β) = E m(z; β) X to be the conditional exectation of the moment conditions given X, and define V (m(z; β) X) = E m(z; β)m(z; β) X E(X; β)e(x; β) to be the conditional variance of the moment conditions given X. Iddition, define Jβ = β E m(z; β) D = and J β 2 = E m(z; β). β Assumtion. (i) Both Jβ and J β 2 have full column rank equal to d β; (ii) The data (X i, Y i, D i ) n is an i.i.d. samle from (X, Y, D); (iii) = P r(d = ) (0, ). Notice that in both case () and (2) the moment conditions are assumed to hold in the rimary samle in which some information is missing. Identification is ossible because of the access to auxiliary data set (D = 0) which contains both Y and a set of roxy variables X that are also otentially of interest, if the following fundamental conditional indeendence assumtion holds: Assumtion 2. Y D X.

4 4 X. CHEN, H. HONG AND A. TAROZZI Conditional indeendence assumtions have been used extensively in econometrics and statistics to achieve identification with missing data. Examles include inference in models with attrition or nonresonse (e.g. 2, 25, 27, 34, 35), the estimation of treatment effects (see e.g. the references surveyed in 7), the recovery of comarability over time of statistics calculated using data collected with different methodology (e.g. Clogg et al. (99), 28, 32). Under case (2), Assumtion would be satisfied if, for instance, the robability of validating a given observation only deends on X. In case (), Assumtion 2 requires that the samling scheme used to create the auxiliary samle deends only on X. If a simle random subset of the rimary data is validated, (X) is a constant and the auxiliary data set is characterized by the same distribution of (Y, X) as the rimary data set, and Assumtion 2 is easily seen satisfied. In this case, which is common in the statistics literature, the auxiliary data set is usually called a validation data set. A stratified samle satisfying Assumtion 2 in model (2) calso be roduced through a two-stage samling design using a finite number of strata (see e.g. 4 and 3), in which case the only variable X that is observed for all samled observations is a discrete stratum indicator. Theorem. Under Assumtions and 2, the asymtotic variance lower bound for n( ˆβ β) for any regular estimator ˆβ is given by (J βω β J β). When the moment condition case () holds, J β J β and Ω β = Ω β where Ω (X) 2 (X) β = E 2 V (m(z; β) X) + ( (X)) 2 E (X; β) E (X; β). When the moment condition case (2) holds, J β J 2 β and Ω β = Ω 2 β where Ω 2 β = E V m (Z; β) X + E (X; β) E(X; β). (X) 2.. Information content of the roensity score. It is interesting to analyze whether the knowledge of the roensity score (X) decreases the semiarametric efficiency bounds for the arameters β. Theorem 2. Under Assumtions and 2, if (X) is known, then the semiarametric efficiency variance bound for estimating β is (J β Ω β J β). When the moment condition case () holds, J β Jβ and Ω β = Ω β where Ω (X) 2 (X)2 β = E 2 V (m(z; β) X) + ( (X)) 2 E(X; β)e (X; β). When the moment condition (2) holds, J β Jβ 2 and Ω β = Ω 2 β given in Theorem.

5 EFFICIENT GMM WITH AUXILIARY DATA 5 In other words, knowledge of (X) reduces the semiarametric efficient variance bound for β under the verify-out-of-samle case, but it does not under the verifyin-samle case. The following argument rovides an intuition for this result. When (2) holds, β is defined through the relation m(y, x; β)f(y x)dyf(x)dx = 0. The roensity score (X) does not enter the definition of β, therefore its knowledge should not affect the variance bound for β. However, the relation that identifies β when () holds clearly deends on (X): m(y, x; β)(x)f(y x)dyf(x)dx = 0. Remark : A secial case of Theorem 2 is when (X) is a constant. In this case, the auxiliary samle is also called a validation samle and is drawn randomly from the same oulatios the rimary samle, e.g. Y, X D (6, 29, 20). In such case it is then easy to see that the two efficiency bounds given in Theorem 2 become identical. Another interesting question is what is the efficiency bound for the estimation of β defined by moment condition () if the roensity score is unknown but is assumed to belong to a correctly secified arametric family, so that (X) = (X; γ). Let γ (X) = (X; γ)/ γ, and define the score function for γ as S γ = S γ (D, X) = D (X;γ) (X;γ)( (X;γ)) γ(x). Theorem 3. Under Assumtions and 2, if (X) = (X; γ) and ES γ (D, X)S γ (D, X) is ositive definite, then the efficiency variance bound for estimating β defined by moment condition () is given by (J β J β) where J β = Jβ and Ω β Ω β = Ω β + (E E(X; β) γ(x) ) ES γ S γ γ (X)E(X; β) (E ). This variance bound is clearly larger than Ω β stated in Theorem 2, but it is smaller than the bound in Theorem. This latter result can be verified noting first that the bound in Theorem 3 corresonds to the variance of the following influence function: ( D)(X) β) (m(z; β) E(X; β)) + Proj(E(X; (D (X)) ( (X)) S (X)E(X; β) γ(d, X)) +, where we use Proj(Z Z 2 ) to denote the oulation least squares rojection of a random variable Z onto the linear sace sanned by Z 2. The conclusion follows noting that the variance bound stated in Theorem for moment condition () is instead the variance of the following influence function ( D)(X) DE(X; β) + m(z; β) E(X; β), ( (X))

6 6 X. CHEN, H. HONG AND A. TAROZZI whose corresonding variance is larger. Our results for GMM models comlement and extend the finding in the rogram evaluation literature that knowing the roensity score decreases the efficient variance bound for the estimation of the average effect of treatment on the treated, while the roensity score is ancillary for the average treatment effect arameter (6). 3. CEP-GMM Estimation. In this section, we consider a first class of semiarametrically efficient estimators based o conditional exectation rojection (CEP) method. If Assumtion 2 holds, identification follows by noting that, under case () E m(z; β) D = = E m (Z; β) x, D = 0 f (x D = ) dx, while under case (2), E m (Z; β) = E m (Z; β) x, D = 0 f (x) dx. Therefore, E m(z; β) x, D = 0 can be recovered using observations where D = 0, and it can then be integrated against either f(x D = ) or f(x) to recover the arameters of interest. In other words, the rojection method first estimates E(X; β) E m(z; β) X = E m(z; β) X, D = 0 nonarametrically from the auxiliary samle, and theverages this nonarametric estimator over the rimary samle. 3.. Efficient estimation with unknown roensity score. In the following, we use subscrits and a to refer to observations belonging to the rimary samle and to the auxiliary samle resectively. Let n be the size of the rimary samle and be the size of the auxiliary samle. Observations in the rimary samle are indexed by i =,..., n. Observations in the auxiliary samle are indexed by j =,...,. Under moment condition () (verify-out-of-samle case), n = n +. Under moment condition (2) (verify-in-samle case), n = n. Let Ê(X; β) denote a nonarametric estimate of E(X; β) using the auxiliary samle. 8 (hereafter CHT) used a sieve based method for this nonarametric estimation. Let q l (X), l =, 2,...} denote a sequence of known basis functions that caroximate any square-measurable function of X arbitrarily well. Also let q k(na) (X) = ( ( ) q (X),..., q k(na) (X)) and Q a = q k(na) (X a ),..., q k(na) (X ana ) for some integer k( ), with k( ) and k( )/n 0 when n. Then for each given β, the first ste nonarametric estimation can be defined as, Ê (X; β) = m (Z aj ; β) q k(na) (X aj ) ( Q ) aq a q k() (X). j=

7 EFFICIENT GMM WITH AUXILIARY DATA 7 A generalized method of moment estimator for β 0 can then be defined as (3) ˆβ = arg min β B ( n ( n ) Ê (X i ; β)) Ŵ Ê (X i ; β). n n The n-consistency and asymtotic normality of this CEP-GMM estimator have been established in CHT. Following the roof of their claim (A.2), we have the following asymtotic reresentation: n n Ê (X i ; β 0 ) = n + n n E(X i ; β 0 ) n n j= f X (X aj ) f(x aj D = 0) m(z aj; β 0 ) E(X aj ; β 0 ) + o (), where we use f X (X) to denote the density of X in the rimary data set, and o () reresents a term that converges to 0 in robability. When moment condition () holds, n = n +, f X (X) = f(x D = ) and f X (X) ( )(X) = f(x D = 0) ( (X)). In this case we calso write the influence function for n (4) n D ie(x i ; β 0 ) + ( D i ) (X i ) ( (X i )) n n n Ê (X i; β 0 ) as } m(z i ; β 0 ) E(X i ; β 0 ) + o (). The roof of Theorem shows that the two terms in the influence function corresond to the two comonents of the efficient influence function that contain informatiobout f(x D = ) and f(y X) resectively. These two terms are orthogonal to each other, so that n n Avar( Ê (X i ; β 0 )) = Ω n β, where Ω β is given in Theorem. When moment condition (2) holds, f X (X) = f(x), n = nd The influence function for (5) f X (X) f(x D = 0) = n n (X). n Ê (X i; β 0 ) can then be writtes n } E(X i ; β 0 ) + ( D i ) m(z i ; β 0 ) E(X i ; β 0 ) + o (). n (X i )

8 8 X. CHEN, H. HONG AND A. TAROZZI The two terms in the influence function corresond to the two comonents of the rojected efficiency influence function that contain informatiobout f(x) and f(y X) in the roof of Theorem. The orthogonality between these two terms imlies that n n Avar( Ê (X i ; β 0 )) = Ω 2 β, n where Ω 2 β is given in Theorem. The semiarametric efficiency bounds given in Theorem are thechieved by an otimally weighted GMM estimator ˆβ for β 0 that uses a weighting matrix Ŵ = Ω β + o (). Before we formally resent the semiarametric efficiency roerty of the CEP-GMM estimator, we need to introduce some notations and assumtions. Let the suort of X be X = R dx. We could use more comlicated notations and let X = X c X dc, with X c being the suort of the continuous variables and X dc the suort of the finitely many discrete variables. Further we could decomose X c = X c X c2 with X c = R d x, and X c2 being a comact and connected subset of R d x,2. Then, under simle and usual modifications of the assumtions, the large samle results stated below would remain valid. To avoid tedious notation yet to allow for some unbounded suort elements of X, we assume X = X c = R dx. For any d x vector a = (a,..., a dx ) of non-negative integers, we write a = d x k= a k, and for any x = (x,..., x dx ) X, we denote the a -th derivative of a function h : X R as: a h(x) = a x a... xa dx d x h(x). For some γ > 0, let γ be the largest integer smaller than γ, and let Λ γ (X ) denote a Hölder sace with smoothness γ, i.e., a sace of functions h : X R which have u to γ continuous derivatives, and the highest (γ-th) derivatives are Hölder continuous with the Hölder exonent γ γ (0,. The Hölder sace becomes a Banach sace when endowed with the Hölder norm: h Λ γ = su x h(x) + max a =γ su x x a h(x) a h(x) (x x) γ γ <. (x x) Let Λ γ (X, ω ) denote a weighted Hölder sace of functions h : X R such that h( ) + 2 ω/2 is in Λ γ (X ). We call Λ γ c (X, ω ) h Λ γ (X, ω ) : h( ) + 2 ω/2 Λ γ c < } a weighted Hölder ball (with radius c). The sieve estimator Ê(X; β) needs to converge to E(X; β) in some metric. We allow suorts of the roxy variables to be unbounded, and use a weighted su-norm metric defined as g,ω su g(x, β) + x 2 ω/2 x X,β B for some ω > 0. Also we let Π n g denote the rojection of g onto the closed linear san of q k(na) (x) = (q (x),..., q k(na)(x)) under the norm,ω. Let f Xa (x) = f X D=0 (x) and f X (x) = f X D= (x).

9 EFFICIENT GMM WITH AUXILIARY DATA 9 The following assumtion is sufficient to ensure that Ê( ; β) converges to E( ; β) under the suremum norm,ω. Assumtion 3. Let Ŵ W = o () for a ositive semidefinite matrix W, and the following hold: () for all β B, E( ; β) belongs to a weighted Hölder ball Λ γ c (X, ω ) for some γ > 0 and ω 0; (2) ( + x 2 ) ω f X (x)dx <, ( + x 2 ) ω f Xa (x)dx < for some ω > ω 0; (3) For each fixed x, E(x; β) is continuous at β for all β B; (4) V arm(z i ; β) X i = x, D i = 0 is bounded uniformly over x and β; (5) For any E( ; β) Λ γ c (X, ω ), there is a sequence Π n E in the sieve sace G n = g( ; β) Λ γ c (X, ω ) : g(x; β) = q k(na) (x) π(β)} such that E( ; β) Π n E( ; β),ω = o(). Also E a q k(na) (X)q k(na) (X) is non-singular. Theorem 4. Let β be the CEP-GMM estimator given in (3). Under Assumtions, 2 and 3, if k( ), k(na) 0, then ˆβ β 0 = o (). Additional regularity conditions are required for stating the asymtotic normality results. Let E ( ) = E( D = ) and E a ( ) = E( D = 0). Denote h 2 2,a = h(x) 2 f Xa (x)dx = E a h(x) 2 } and Π 2n h be the rojection of h onto the closed linear san of q k(na) (x) = (q (x),..., q k(na)(x)) under the norm 2,a. Assumtion 4. Let β 0 int(b), E E(X; β 0 )E(X; β 0 ) be ositive definite, and the following hold: () Assumtion 3. is satisfied with γ > d x /2 and Assumtion 3.2 is satisfied with ω > ω +γ; (2) For each fixed x, and for some δ > 0 E(x;β) β is continuous in β B with β β 0 δ, E su β: β β0 δ E(X;β) β < ; (3) There exist a constant ɛ (0,, a δ > 0 and a measurable function b( ) with E b(x ) < such that Ẽ(x;β) β E(x;β) β b(x) Ẽ E,ω ɛ for all β B with β β 0 δ and all Ẽ ( ) 2 ( ) Λ γ dx c (X, ω ) with Ẽ E fx (X) 2γ+dx,ω δ. (4) E a f Xa (X) < ; (5) k( ) = O na, n γ 2γ+dx f a X ( ) f Xa ( ) Π f X ( ) 2n f Xa ( ) = o(n /2 ). 2,a Theorem 5. Let β be the CEP-GMM estimator given in (3). Under Assumtions, 2, 3 and 4, we have n( β β 0 ) N (0, V ), with V = (J β W J β) J β W Ω βw J β (J β W J β), where Ω β is given in Theorem. Furthermore, if W = Ω β, then n( β β 0 ) N (0, V 0 ), with V 0 = (J β Ω β J β), where J β = Jβ and Ω β = Ω β under moment condition (), and J β = Jβ 2 and Ω β = Ω 2 β under moment condition (2). Remark 2: (i) Assumtions 3 and 4 allow for m(z; β) to be non-smooth such as in quantile based moment functions. (ii) The weights ω and ω are needed since the suort of the conditioning variable X is allowed to include the entire Euclidean sace. When X has bounded suort and f X is bounded above and below over its suort, we

10 0 X. CHEN, H. HONG AND A. TAROZZI can simly set ω = 0 = ω in Assumtions 3 and 4, and relace Assumtion 4. with the assumtion that 3. holds with γ > d x /2. (iii) Since f X (X) f Xa (X) = (X)( ) ( (X)), Assumtion 5.4 is automatically satisfied under the condition 0 < (x) <, which is a condition tyically imosed in the rogram evaluation literature such as in 8. Assumtion 4.5 will be satisfied under mild smoothness conditions imosed on (X) (X) ( ). dx 2γ+dx Irticular, if we let k( ) = O na, the growth order which leads to the otimal ( ) convergence rate of Ê( ; β 0) E( ; β 0 ) 2,a = O n γ 2γ+dx a, then Assumtion 4.5 is sat- ( ) f isfied with X ( ) f Xa ( ) Π f X ( ) 2n f Xa ( ) = o n dx ( ) 2(2γ+dx) a = o k( ) 2. For examle, both 2,a ( ) Assumtions 4.4 and 4.5 will be satisfied as long as ( ) Λγ (X, ω ) with γ > d x /2. The roofs of Theorems 4 and 5 follow directly from those in CHT, who also rovide simle consistent estimators of V and V 0 : V = ( J W J ) J W ΩW J ( J W J ) and V 0 = ( J Ω J ), where for both moment conditions () and (2), J = n n Ê(X i; β) β, Ω = j= ) ( υ ajûaj ( υ ajûaj ) + n 2 n (Ê(Xi ; β)ê(x i; β) ), Û aj = m(y aj, X aj ; β) Ê(X aj; β), υ n (Q aj = q k(na) ) (X i ) a Q a q k(na) (X aj ). n 3.2. CEP estimation with arametric or known roensity score. Suose now that the roensity score (X) is correctly arameterized as (X; γ) u to a finite-dimensional unknowrameter γ. Theorems 2 and 5 show that the otimally weighted CEP-GMM estimator defined in (3) still achieves the semiarametric efficiency bound for β defined by moment condition (2). However, according to Theorems 3 and 5, such an estimator is no longer efficient for β defined through moment condition (). Rewriting moment condition () as E E(X i ; β 0 ) (X i) = 0, we cagain construct an efficient estimator for β 0 based on the sieve estimate Ê(X; β) and the correctly secified arametric form (X; γ). Irticular, the otimally weighted GMM estimator using the following samle moment condition will achieve the efficiency bound in Theorem 3 for β defined through (): (6) n n Ê(X i ; β) (X i; ˆγ), ˆ

11 where ˆ = n n γ: n EFFICIENT GMM WITH AUXILIARY DATA and ˆγ is the arametric MLE estimator that solves the score equation for n Sˆγ (D i, X i ) = n n D i (X i ; ˆγ) (X i ; ˆγ)( (X i ; ˆγ)) ˆγ(X i ) = 0. Theorem 6. Let (X; γ) be the arametric roensity score function known u to the arameters γ and let ES γ0 (D, X)S γ0 (D, X) be ositive definite. Let β 0 satisfy the moment condition () and β be its CEP-GMM estimator using the samle moment (6). Under Assumtions, 2, 3 and 4, we have n( β β 0 ) N (0, V ), with V = (Jβ W J β ) Jβ W Ωβ W Jβ (Jβ W J β ), where Ω β is given in Theorem 3. Further, if W = Ω β, then n( β β 0 ) N (0, V 0 ), where V 0 = (Jβ Ω β J β ) is the efficiency variance bound given in Theorem 3. The roof of this theorem is very similar to the revious ones and hence is omitted. It suffices to oint out that n n Ê(X i ; β 0 ) (X i;ˆγ) ˆ can be shown to be asymtotically equivalent to where n n (X i ) E(X i ; β 0 ) + D } i (X i ) m(z i; β 0 ) E(X i ; β 0 ) +E E(X i ; β 0 ) γ(x i ) n(ˆγ γ0 ), n(ˆγ γ0 ) = ES γ0 (D, X)S γ0 (D, X) n n S γ0 (D i, X i ) + o (). We remark that even whe arametric assumtion is being made about the roensity score (X; γ) (in fact even if iddition f(y ) is assumed to be a arametric likelihood), the inference about β is still semiarametric. This is because the marginal density f(x) is still nonarametric and contains semiarametric informatiobout β. This exlains why nonarametric estimation is still needed to achieve the efficiency bound for β. The case where the roensity is fully known can be considered a secial case of arametric roensity score where the arameters are known. In this case, the efficient moment condition is as in (6) after relacing (X j ; ˆγ) with the known (X j ). Remark: When the auxiliary data set is a validation data set, e.g. (X) =, the arameters β defined by both moment conditions () and (2) coincide. Therefore the CEP-GMM estimator defined in (3) when we take n = nd the summation to be over the all observations will achieve semiarametric efficiency.

12 2 X. CHEN, H. HONG AND A. TAROZZI 4. IPW-GMM Estimation. Alternative estimation method for β is the inverse robability weighting based GMM (IPW-GMM). Several authors have considered inverse robability weighting aired with a conditional indeendence assumtion for estimation in resence of missing information. Recent examles include arametric IPW as in 24, 34, 35 and 32, for missing data models, and nonarametric inverse robability weighting as in 8 for the case of mean treatment effect analysis. In this section, we extend existing results and first show that the otimally weighted IPW-GMM estimator of β is semiarametrically efficient when the roensity score is unknown. The same estimator, however, will be generally inefficient when the roensity score is known or belongs to a correctly secified arametric family; combinations of nonarametric and known or arametric estimated roensity scores are needed to achieve the semiarametric efficiency bounds for these cases. 4.. Efficient estimation with unknown roensity score. The IPW-GMM method uses the fact that under Assumtion 2, moment condition () can be rewrittes: (7) (X)( ) E m(z; β) D = = E m(z; β) ( (X)) D = 0 ; while moment condition (2) is equivalent to: (8) E m(z; β) = E m(z; β) (X) D = 0. Let ˆ(X) be a consistent estimate of the true roensity score. Then we can estimate β 0 defined by case () using GMM with the following samle moment: (9) n ˆ(X j ) m(z j ; β) ˆ(X j= j ) ˆ ˆ, and estimate β 0 defined by case (2) using GMM with the following samle moment: (0) n ˆ m(z j ; β) ˆ(X j= j ). The inverse robability weighting aroach is considered semiarametric when ˆ(X) is estimated nonarametrically. In this case, it can be shown that the samle moment (9) evaluated at β 0 is asymtotically equivalent to n (X i ) ( D i )m(z i ; β 0 ) n (X i ) + E(X i; β 0 ) D i (X i ) + o (). (X i ) The two comonents of this influence functiore negatively correlated. Because of this, the asymtotic variance might be smaller than that of the estimator of β 0 based

13 EFFICIENT GMM WITH AUXILIARY DATA 3 on moment condition (9) with the known (X). The influence function can be rewritten in terms of two orthogonal terms as n n ( D i ) m(z i ; β 0 ) E(X i ; β 0 ) which is identical to the influence function in (4). Therefore, Avar( n ˆ(X j ) m(z j ; β 0 ) ˆ(X j= j ) } (X i ) (X i ) + D ie(x i ; β 0 ) + o (), ˆ ˆ ) = Ω β, where Ω β is given in Theorem. An otimally weighted GMM estimator for β 0 defined by case () using samle moment (9) should thechieve the semiarametric efficiency bound stated in Theorem. The influence function reresentation for samle moment (0) can be calculated as n ( D i )m(z i ; β 0 ) n (X i ) + E(X i; β 0 ) D i (X i ) + o (), (X i ) whose two comonents are again negatively correlated. As in the revious case, the influence function can be written in terms of two orthogonal comonents as n n ( D i ) m(z i ; β 0 ) E(X i ; β 0 ) } (X i ) + E(X i; β 0 ) + o (), which is identical to the influence function in (5). Hence, an otimally weighted GMM estimator for β 0 defined by case (2) using samle moment (0) achieves the semiarametric efficiency bound for case () stated in Theorem. In this subsection, to emhasize that the true roensity score function is unknown and has to be estimated nonarametrically, we use o (x) E(D X = x) to indicate the true roensity score and (x) to denote any candidate function. (Note that to save notations in the rest of the main text (x) denotes the true roensity score.) Let ( ) be a sieve estimator of o (x) that uses the combined samle (D i, X i ) : i =,..., n}. Let Z ai = (Y ai, X ai ) : i =,..., } be the auxiliary (i.e. D = 0) data set. We define the IPW-GMM estimator β for moment condition () as ( ) ( ) (X ai ) β = arg min m(z ai ; β) β B (X ai ) Ŵ (X ai ) () m(z ai ; β) (X ai ) and the IPW-GMM estimator β for moment condition (2) as ( ) ( ) β = arg min m(z ai ; β) β B (X ai ) Ŵ (2) m(z ai ; β). (X ai ) There are two oular sieve nonarametric estimators of o ():

14 4 X. CHEN, H. HONG AND A. TAROZZI (i) a sieve Least Squares (LS) estimator ls (x) as in 6: ls = arg min () H n n n (D i (X i )) 2 /2. In the aendix we establish the consistency and convergence rate of ls (x) under the assumtion that the variables in X have unbounded suort. (ii) a sieve Maximum Likelihood (ML) estimator mle (x) as in 8: n mle = arg max D i log(x i ) + ( D i ) log (X i )}, H n = () H n n h Λ γ c (X ) : h(x) = A kn (x) π 2} or } h Λ γ c (X ) : h(x) = ex(a kn (x) π). Recall E a ( ) = ( )f X D=0 (x)dx. Define a weighted su-norm as, for some ω > 0, h,ω su x X h(x) + x 2 ω/2. Assumtion 5. Let Ŵ W = o () for a ositive semidefinite matrix W, and the following hold: () o () belongs to a Hölder ball H = () Λ γ c (X ) : 0 < (x) < } for some γ > 0; (2) ( + x 2 ) ω f X (x)dx < for some ω > 0; (3) there is a non-increasing function b( ) such that b(δ) 0 as δ 0 and E a su m(z i ; β) m(z i, β) 2 β β <δ b(δ) for all small ositive value δ; (4) E a su β B m(z i ; β) 2 < ; (5) for any h H, there is a sequence Π n h H n such that h Π n h,ω = o(). Theorem 7. Let β be the IPW-GMM estimator given in () or (2). Under Assumtions 2, and 5, if kn n 0, k n, then: β β 0 = o (). Let E( ) = ( )f X (x)dx, h 2 = h(x) 2 f X (x)dx, and Π 2n h be the rojection of h onto the closed linear san of q kn (x) = (q (x),..., q kn (x)) under the norm 2. We need the following additional assumtions to obtaisymtotic normality. o(x) Assumtion 6. : Let β 0 int(b), E E(X; β o(x) 0)E(X; β 0 ) be ositive definite, and the following hold: () Assumtions 5. and 5.2 are satisfied with γ > d x /2 and ω > γ; (2) There exist a constant ɛ (0, and a small δ 0 > 0 such that E a su m(z i ; β) m(z i, β) 2 β β <δ const.δ ɛ

15 EFFICIENT GMM WITH AUXILIARY DATA 5 for any small ositive value δ δ 0 ; (3) E a su β B: β β0 δ 0 m(z i ; β) 2 ( + X i 2 ) ω < E(X;β for some small δ 0 > 0; (4) E 0 ) β ( + X 2 ) ω 2 <, and for all x X, E(x;β) ( ) β is continuous around β 0 ; (5) k n = O n dx 2γ+dx, n γ 2γ+dx E( ;βo) Π o( ) 2n E( ;βo) o( ) = 2 o(n /2 ). (6) either one of the following is satisfied: (6a) su β B: β β0 δ 0 su x X E(x, β) const. < for some small δ 0 > 0; (6b) E a su β B: β β0 δ 0 E(X, β) 4 const. < for some small δ 0 > 0, and f X D=0 ( ) Λ γ c (X ) with γ > 3d x /4; (6c) E a su β B: β β0 δ 0 E(X, β) 2 const. < for some small δ 0 > 0, and f X D=0 ( ) Λ γ c (X ) with γ > d x. Theorem 8. Let β be the IPW-GMM estimator given in () or (2). Under Assumtions, 2, 5 and 6, we have n( β β 0 ) N (0, V ), with V the same as in Theorem 5. Remark 3: (i) The weighting ω is needed since the suort of the conditioning variable X is assumed to be the entire Euclidean sace. When X has bounded suort and f X D=0 is bounded above and below over its suort, we can simly set ω = 0 in Assumtions 5 and 6 and relace 6. with the assumtion that 5. holds with γ > d x /2. Note that Assumtion 6.6a is easily satisfied when X has comact suort. When X = R dx, Assumtion 6.6a rules out E(x, β) being linear in x; Assumtions 6.6b or 6.6c allow for linear E(x, β) but need smoother roensity score (x) and density f X D=0. (ii) Assumtions 5 and 6 agaillow for non-smooth moment conditions. (iii) Since f X D=0(X) f X (X) that f X D=0(X) f X (X) = o(x), the assumtion 0 < o (x) < imlies, hence E() and E a() in Assumtions 5 and 6 are effectively equivalent. (iv) Although Assumtion 5. imoses the same strong condition 0 < o (x) < as that tyically assumed in the rogram evaluation literature, unlike most existing aers on estimation of average treatment effects, our aer allows for unbounded suort of X and assumes weaker smoothness on o (x) and E( ; β o ). In ( ) articular, if we let k n = O n dx 2γ+dx, the growth order which leads to the otimal ( convergence rate of () o () 2 = O n ) γ 2γ+dx, then Assumtion 6.5 is satisfied with ( ) ( ) E( ;βo) Π o( ) 2n E( ;βo) o( ) = o n dx 2(2γ+dx) = o k 2 n IPW Estimation with arametric or known roensity score. The case of moment condition (2) is simler and therefore we briefly discuss it first. Theorems and 2 have shown that knowledge about the roensity score does not change the semiarametric efficiency bound. Furthermore, theorems 5 and 8 show that both a nonarametric CEP-GMM estimator and a nonarametric IPW-GMM estimator for β achieve this semiarametric efficiency bound regardless of whether the roensity score is unknown, known or arametrically secified. The following theorem also states, without roof, the interesting result that the arametric IPW estimator using (X; ˆγ) is in fact less

16 6 X. CHEN, H. HONG AND A. TAROZZI efficient than the one using a nonarametric estimate ˆ(X) in (0), but is more efficient than the one using the known (X). Theorem 9. Suose the arametric model (X i ; γ) is correctly secified. Suose also that ES γ (D, X)S γ (D, X) is ositive definite. Under moment condition (2) and using the otimally weighted samle moment condition (0), an IPW-GMM estimator for β using a arametric estimate of (X i ; ˆγ) in lace of ˆ(X i ) in (0) is more efficient than the one using the known (X i ), but is less efficient than the one using a nonarametric estimate ˆ(X i ) of the roensity score. and This result is based on the following relations, which hold asymtotically n Avar( n Avar( ˆ n m(z j ; β 0 ) (X j= i ; ˆγ) ) Avar( ˆ m(z j ; β 0 ) (X j= i ) ) ˆ n m(z j ; β 0 ) (X j= i ; ˆγ) ) Avar( ˆ m(z j ; β 0 ) ˆ(X j= i ) ). Now consider the more interesting case where moment condition () holds and samle moment condition (9) is used. Consider the case when the arametric roensity score is correctly secified. First, it is clear that the otimally weighted IPW-GMM estimator of β based on (9) that uses a nonarametric estimate of ˆ(X) does not achieve the efficiency bound in Theorem 3, because we see from Theorem 8 that this estimator achieves instead the variance bound in Theorem, which is larger than the variance bound in Theorem 3. However, the arametric two ste IPW estimator that uses a arametric first ste for (X; γ) does not achieve the efficiency bound in Theorem 3 either. To see this, note that the arametric two ste IPW estimator is based on the moment condition n (X j ; ˆγ) m(z j ; β) (X j= j ; ˆγ) which has a linear influence function reresentation of where m(z i ; β 0 ) ( D i)(x i ) (X i ) Proj(E(X i ; β 0 ) D i (X i ) (X i ) ˆ ˆ, + Proj(E(X i ; β 0 ) D i (X i ) (X i ) S γ(d i, X i )) = E E(X; β 0 ) γ(x) (X) S γ(d i, X i )), E S γ (D i, X i )S γ (D i, X i ) Sγ (D i, X i )

17 EFFICIENT GMM WITH AUXILIARY DATA 7 is the influence function from the first ste estimation of γ. The difference between this influence functiond the influence function for Theorem 3 can be verified to be equal to (X) Res((D (X)) (X) E(X; β 0) S γ(d i, X i )), which is obviously orthogonal to the influence function of Theorem 3. Therefore, the two ste arametric IPW estimator has a variance larger than the efficiency bound under the assumtion of correct secification of the arametric model for (X; γ). An IPW tye estimator that achieves the efficiency bound under correct secification can be obtained by combining both nonarametric and arametric estimates of the roensity score. Such an efficient moment condition is given by (3) n j= m(z j ; β) (X j; ˆγ) ˆ ˆ(X j ) ˆ, where ˆγ is the maximum likelihood estimator for γ 0 and ˆ(X) is the sieve estimate of the roensity score. This moment condition has the following asymtotic linear reresentation: n n ( D i )(m(z i ; β 0 ) E(X i ; β 0 )) (X i) (X i ) + (X i)e(x i ; β 0 ) +E E(X;β0 ) ) γ (X i n(ˆγ γ), which is identical to the influence function under correct arametric secification of (X; γ) leading to the semiarametric efficiency bound in Theorem 3. The case where the roensity score is fully known can be considered a secial case of arametric roensity score where the arameters are known. In this case, the efficient moment condition is as in (3) after relacing (X j ; ˆγ) with the known (X j ). It is finally worth noting that Assumtion 2 is an identificatiossumtion that is not testable. Therefore both the CEP-GMM estimator and the IPW-GMM estimator will converge to the same oulation limit regardless of whether Assumtion 2 holds, as long as the same weighting matrix is being used. The oulation difference between CEP and IPW can only arise from the arametric mis-secification of the aroximating models for E(X; β) and (X). 5. Emirical Illustration. We illustrate our method emirically using data from the Indian National Samle Survey (NSS), a data set routinely used by the Indian Government to monitor changes in the distribution of rivate consumtion. Several researchers have argued that due to changes in the exenditure questionnaire adoted for data collection in the round of the NSS, overty estimates from this round are likely to be non-comarable with those from revious years. A change in the questionnaire section where food exenditures are recorded likely led to the overestimation of food consumtion, and hence to the underestimation of overty (4, 32). In other

18 8 X. CHEN, H. HONG AND A. TAROZZI words, a missing data roblem arises because the variable of interest (total exenditure as it would have been recorded using a standard questionnaire) is not observed. 3, 2 and 32 argue that total exenditure i set of miscellaneous items for which the questionnaire was not modified ( comarable items hereafter) can be used as a roxy variable to roduce an estimate of overty for that is comarable with revious years. In this section, we assume that the researcher is interested in estimating the cumulative distribution function (cdf) for rural India in of a measure of total monthly exenditure that is comarable with revious NSS rounds. In the terminology used in this article, this situation corresonds to a verify-out-of-samle case (), where the arameter of interest β is identified in terms of a variable Y that is not observed in the rimary samle (the survey). The moment function takes the form m(z; β 0 ) = (Y y) β 0, where y is a given threshold. We use the revious round of the NSS (993-94) as auxiliary survey, and exenditure in comarable items as roxy variable X. The two rounds are indeendent cross-sections from the Indian oulation. The crucial identifying assumtion is that the distribution of Y conditional on X remained stable between and (for a discussiond indirect evidence is suort of this assumtion see 32). Table reorts oint estimates and standard errors for the cdf of total (log) household exenditure at selected thresholds, using different estimators. The first column reorts the cdf estimated using the noncomarable data from the rimary samle. Column 2 reorts CEP-GMM estimates, calculated using 3 rd order olynomial slines in exenditure in comarable items as sieve basis, with 0 knots at the equal range quantiles of the emirical distribution of the roxy variables. Column 3 reorts estimates obtained using moment condition (7), but with a nonarametric first ste where we estimate P (X) using sieve-logit, including the basis functions we used for CEP-GMM as regressors. In Column 4 we imose a arametric model, and we estimate the roensity score using logit, with X entered linearly in the single index. Column 5 reorts the results for the estimator described in Section 3.2, which is efficient whe arametric model is correctly secified for P (X). For values of Y below 7 the adjusted estimates of the cdf in columns 2 to 4 are larger than the unadjusted figures resented in column, even if only by one-two ercentage oints. This is consistent with what would be exected based on the hyothesis that the change in the questionnaire introduced in led to an overstatement of total reorted exenditure with resect to revious NSS rounds. As exected, CEP and IPW non-arametric estimators roduce virtually identical results. The estimates in columns 4 and 5 imose a simle logit for the roensity score, but they are still very similar. In the verify-out-of-samle case, knowledge of a arametric form for P (X) lowers the semiarametric efficiency bound, and this may exlain why in some cases the standard errors in column 4 are lower than for the estimators in columns 2 and 3, which are only efficient when P (X) is unknown. Notice also that when the arametric assumtion is correct the efficient estimator is the one in column 5. Indeed the standard errors for this estimator are always lower or virtually identical to those in column 4 every time this

19 EFFICIENT GMM WITH AUXILIARY DATA 9 latter estimator is more recise than the nonarametric estimators in columns 2 and Conclusions. We derive semiarametric efficiency bounds for the estimation of arameters defined through general nonlinear, ossibly non-smooth and over-identified moment conditions, when variables in the rimary samle of interest are missing. For identification we rely on the validity of a conditional indeendence assumtiond on the availability of auxiliary samle that contains information on the relation between missing variables and other roxy variables that are also observed in the rimary samle. We study two alternative frameworks. In the first case ( verify-out-of-samle ) validation is done with auxiliary data set which is indeendent from the rimary data set of interest. In the second case ( verify-in-samle ) a subset of the observations in the rimary samle is validated. We show that the otimally weighted CEP-GMM estimators achieve the semiarametric efficiency bounds when the roensity score is unknown, or is known or belongs to a correctly secified arametric family. These estimators only use a nonarametric estimate of the conditional exectation of the moment functions, and their asymtotic efficiency is obtained under regularity conditions weaker than the existing ones in the literature. Irticular, these CEP-GMM estimators still achieve efficiency bounds when roxy (conditioning) variables have unbounded suorts and moment conditions are not smooth. We also rove that an otimally weighted IPW-GMM estimator is semiarametrically efficient with fully unknown roensity score. However, this estimator is not efficient when the roensity score is either known, or is arametrically estimated using a correctly secified arametric model; in such instances, aroriate combinations of nonarametric and arametric estimates of the roensity score are needed to achieve the efficiency bounds. We have also demonstrated that, from the theoretical oint of view, the CEP-GMM estimators are more attractive than the IPW-GMM estimators. Recently and indeendently 9 advocated a similar sieve conditional exectation rojection based estimator for the average treatment effect arameter in rogram evaluatiolications. Also, for the estimation of the average treatment effects in missing data models, 33 suggested that a semiarametrically secified roensity score, such as a single index or a artially linear form, can be used to reduce the curse of dimensionality in the nonarametric estimation of the roensity score. An interesting toic for future research is to study the efficiency imlications of these semiarametric restrictions on the roensity score. REFERENCES Ai, C. and Chen, X. (2003). Efficient estimation of models with conditional moment restrictions containing unknown functions. Econometrica, Bickel, P. J., Klaassen, C. A., Ritov, Y. and Wellner, J. A. (993). Efficient and Adative Estimation for Semiarametric Models. Sringer. 3 Breslow, N. E., McNeney, B. and Wellner, J. A. (2003). Large samle theory for semiarametric regression models with two-hase, outcome deendent samling. Annals of Statistics,

20 20 X. CHEN, H. HONG AND A. TAROZZI 4 Breslow, N. E., Robins, J. M. and Wellner, J. A. (2000). On the semiarametric efficiency of logistic regression under case-control samling. Bernoulli, Carroll, R., Ruert, D. and Stefanski, L. (995). Measurement error in nonlinear models. No. 63 in Monograhs on Statistics and Alied Probability, Chamand Hall, New York. 6 Carroll, R. and Wand, M. (99). Semiarametric estimation in logistic measurement error models. Journal of the Royal Statistical Society, Chen, J. B. and Breslow, N. E. (2004). Semiarametric efficient estimation for the auxiliary outcome roblem with the conditional mean model. Canadian Journal of Statistics, Chen, X., Hong, H. and Tamer, E. (2005). Measurement Error Models with Auxiliary Data. Review of Economic Studies, Chen, X., Linton, O. and van Keilegom, I. (2003). Estimation of semiarametric models when the criterion function is not smooth. Econometrica, Chen, X. and Shen, X. (998). Sieve extremum estimates for weakly deendent data. Econometrica, Clogg, C., Rubin, D., Schenker, N., Schultz, B. and Weidman, L. (99). Multile imutation of industry and occuation codes in census ublic-use samles using bayesian logistic regression. Journal of The American Statistical Association, Deaton, A. (2003). Adjusted Indian Poverty Estimates for Economic and Political Weekly Deaton, A. and Drèze, J. (2002). Poverty and Inequality in india, a Re-Examination. Economic and Political Weekly Setember 7. 4 Deaton, A. and Kozel, V. (2005). Data and Dogma: The Great indian Poverty Debate. New Delhi, India, MacMillian. 5 Gallant, A. R. and Nychka, D. W. (987). Semi-nonarametric maximum likelihood estimation. Econometrica, Hahn, J. (998). On the role of roensity score in efficient semiarametric estimation of average treatment effects. Econometrica, Heckman, J., LaLonde, R. and Smith, J. (999). The economics and econometrics of active labor market rograms. In Handbook of Labor Economics, Vol. 3A (O. Ashenfelter and D. Card, eds.). Elsevier Science. 8 Hirano, K., Imbens, G. and Ridder, G. (2003). Efficient estimation of average treatment effects using the estimated roensity score. Econometrica, Imbens, G., Newey, W. and Ridder, G. (2005). Mean-squared-error calculations for average treatment effects. Working Paer. 20 Lee, L. and Seanski, J. (995). Estimation of linear and nonlinear errors-invariables models using validation data. Journal of the American Statistical Association, Little, R. and Rubin, D. (2002). Statistical Analysis with Missing Data. 2nd ed. John Wiley & Sons, New York. 22 Newey, W. (990). Semiarametric efficiency bounds. Journal of Alied Econometrics, Newey, W. (994). The asymtotic variance of semiarametric estimators. Econometrica, Robins, J., Mark, S. and Newey, W. (992). Estimating exosure effects by modelling the exectation of exosure conditional on confounders. Biometrics, Robins, J. M. and Rotnitzky, A. (995). Semiarametric efficiency in multivariate regression models with missing data. Journal of the American Statistical Association, Robins, J. M., Rotnitzky, A. and Zhao, L. P. (994). Estimation of regression coefficients when some regressors are not always observed. Journal of the American Statistical Association, Rotnitzky, A. and Robins, J. (995). Semiarametric regression estimation in the resence of deendent censoring. Biometrika, Schenker, N. (2003). Assessing variability due to race bridging: alication to census counts and vital rates for the year Journal of The American Statistical Association, Seanski, J. and Carroll, R. (993). Semiarametric quasi-likelihood and variance estimation

21 EFFICIENT GMM WITH AUXILIARY DATA 2 in measurement error models. Journal of Econometrics, Shen, X. (997). On Methods of Sieves and Penalization. Annals of Statistics, Shen, X. and Wong, W. (994). Convergence Rates of Sieve Estimates. Annals of Statistics, Tarozzi, A. (2006). Calculating comarable statistics from incomarable surveys, with alication to overty in india. Journal of Business and Economic Statistics. Forthcoming. 33 Wang, Q., Linton, O. and Hardle, W. (2004). Semiarametric regressionalysis for missing resonse data. Journal of the American Statistical Association, Wooldridge, J. (2002). Inverse robability weighted m-estimators for samle selection, attrition and stratification. Portuguese Economic Journal, Wooldridge, J. (2003). Inverse robability weighted estimation for general missing data roblems. Manuscrit, Michigan State University. Aendix A - Calculation of Efficiency Bounds. Proof. Theorem We follow closely the structure of semiarametric efficiency bound derivation of 22 and 2. Case (). Consider a arametric ath θ for the joint distribution of Y, D and X. Define θ = P θ (D = ). The joint density function for Y, D and X is given by (4) f θ (y, x, d) = d θ( θ ) d f θ (x D = ) d f θ (x D = 0) d f(y x) d. The resulting score function is given by S θ (d, y, x) = d θ θ ( θ )ṗθ + ( d)s θ (x D = 0) + ds θ (x D = ) + ( d)s θ (y x), wheres θ (y x) = θ log f θ (y x), ṗ θ = θ θ, s θ (x d) = θ log f θ (x d). The tangent sace of this model is therefore given by: (5) T = a (d θ ) + ( d)s θ (x D = 0) + ( d)s θ (y x) + ds θ (x D = ), where s θ (y x) f θ (y x) dy = 0, s θ (x d) f θ (x d) dx = 0, and a is a finite constant. Consider first the case when the model is exactly identified. In this case β is uniquely identified by condition (). Differentiating under the integral gives (6) β (θ) θ ( ) = Jβ E m(z; β) log f θ (Y, X D = ) θ D =. The second comonent of the right hand side of this exression can be calculated as (7) E m(z; β)s θ (Y X) D = + E m(z; β)s θ (X D = ) D = Pathwise differentiability follows if we can find Ψ (Y, X, D) T such that (8) β (θ) / θ = E Ψ (Y, X, D) S θ (Y, X, D)

Semiparametric Efficiency in GMM Models with Nonclassical Measurement Error

Semiparametric Efficiency in GMM Models with Nonclassical Measurement Error Semiarametric Efficiency in GMM Models with Nonclassical Measurement Error Xiaohong Chen New York University Han Hong Duke University Alessandro Tarozzi Duke University August 2005 Abstract We study semiarametric

More information

arxiv: v2 [math.st] 4 Apr 2008

arxiv: v2 [math.st] 4 Apr 2008 The Annals of Statistics 2008, Vol. 36, No. 2, 808 843 DOI: 0.24/009053607000000947 c Institute of Mathematical Statistics, 2008 arxiv:0705.0069v2 math.st] 4 Apr 2008 SEMIPARAMETRIC EFFICIENCY IN GMM MODELS

More information

4. Score normalization technical details We now discuss the technical details of the score normalization method.

4. Score normalization technical details We now discuss the technical details of the score normalization method. SMT SCORING SYSTEM This document describes the scoring system for the Stanford Math Tournament We begin by giving an overview of the changes to scoring and a non-technical descrition of the scoring rules

More information

Estimation of the large covariance matrix with two-step monotone missing data

Estimation of the large covariance matrix with two-step monotone missing data Estimation of the large covariance matrix with two-ste monotone missing data Masashi Hyodo, Nobumichi Shutoh 2, Takashi Seo, and Tatjana Pavlenko 3 Deartment of Mathematical Information Science, Tokyo

More information

Partial Identification in Triangular Systems of Equations with Binary Dependent Variables

Partial Identification in Triangular Systems of Equations with Binary Dependent Variables Partial Identification in Triangular Systems of Equations with Binary Deendent Variables Azeem M. Shaikh Deartment of Economics University of Chicago amshaikh@uchicago.edu Edward J. Vytlacil Deartment

More information

The following document is intended for online publication only (authors webpage).

The following document is intended for online publication only (authors webpage). The following document is intended for online ublication only (authors webage). Sulement to Identi cation and stimation of Distributional Imacts of Interventions Using Changes in Inequality Measures, Part

More information

Elements of Asymptotic Theory. James L. Powell Department of Economics University of California, Berkeley

Elements of Asymptotic Theory. James L. Powell Department of Economics University of California, Berkeley Elements of Asymtotic Theory James L. Powell Deartment of Economics University of California, Berkeley Objectives of Asymtotic Theory While exact results are available for, say, the distribution of the

More information

Lecture 3 Consistency of Extremum Estimators 1

Lecture 3 Consistency of Extremum Estimators 1 Lecture 3 Consistency of Extremum Estimators 1 This lecture shows how one can obtain consistency of extremum estimators. It also shows how one can find the robability limit of extremum estimators in cases

More information

1 Extremum Estimators

1 Extremum Estimators FINC 9311-21 Financial Econometrics Handout Jialin Yu 1 Extremum Estimators Let θ 0 be a vector of k 1 unknown arameters. Extremum estimators: estimators obtained by maximizing or minimizing some objective

More information

General Linear Model Introduction, Classes of Linear models and Estimation

General Linear Model Introduction, Classes of Linear models and Estimation Stat 740 General Linear Model Introduction, Classes of Linear models and Estimation An aim of scientific enquiry: To describe or to discover relationshis among events (variables) in the controlled (laboratory)

More information

Uniform Law on the Unit Sphere of a Banach Space

Uniform Law on the Unit Sphere of a Banach Space Uniform Law on the Unit Shere of a Banach Sace by Bernard Beauzamy Société de Calcul Mathématique SA Faubourg Saint Honoré 75008 Paris France Setember 008 Abstract We investigate the construction of a

More information

Elements of Asymptotic Theory. James L. Powell Department of Economics University of California, Berkeley

Elements of Asymptotic Theory. James L. Powell Department of Economics University of California, Berkeley Elements of Asymtotic Theory James L. Powell Deartment of Economics University of California, Berkeley Objectives of Asymtotic Theory While exact results are available for, say, the distribution of the

More information

Approximating min-max k-clustering

Approximating min-max k-clustering Aroximating min-max k-clustering Asaf Levin July 24, 2007 Abstract We consider the roblems of set artitioning into k clusters with minimum total cost and minimum of the maximum cost of a cluster. The cost

More information

LECTURE 7 NOTES. x n. d x if. E [g(x n )] E [g(x)]

LECTURE 7 NOTES. x n. d x if. E [g(x n )] E [g(x)] LECTURE 7 NOTES 1. Convergence of random variables. Before delving into the large samle roerties of the MLE, we review some concets from large samle theory. 1. Convergence in robability: x n x if, for

More information

Notes on Instrumental Variables Methods

Notes on Instrumental Variables Methods Notes on Instrumental Variables Methods Michele Pellizzari IGIER-Bocconi, IZA and frdb 1 The Instrumental Variable Estimator Instrumental variable estimation is the classical solution to the roblem of

More information

Combining Logistic Regression with Kriging for Mapping the Risk of Occurrence of Unexploded Ordnance (UXO)

Combining Logistic Regression with Kriging for Mapping the Risk of Occurrence of Unexploded Ordnance (UXO) Combining Logistic Regression with Kriging for Maing the Risk of Occurrence of Unexloded Ordnance (UXO) H. Saito (), P. Goovaerts (), S. A. McKenna (2) Environmental and Water Resources Engineering, Deartment

More information

University of Michigan School of Public Health

University of Michigan School of Public Health University of Michigan School of Public Health The University of Michigan Deartment of Biostatistics Working Paer Series ear 003 Paer 5 Robust Likelihood-based Analysis of Multivariate Data with Missing

More information

Morten Frydenberg Section for Biostatistics Version :Friday, 05 September 2014

Morten Frydenberg Section for Biostatistics Version :Friday, 05 September 2014 Morten Frydenberg Section for Biostatistics Version :Friday, 05 Setember 204 All models are aroximations! The best model does not exist! Comlicated models needs a lot of data. lower your ambitions or get

More information

CONVOLVED SUBSAMPLING ESTIMATION WITH APPLICATIONS TO BLOCK BOOTSTRAP

CONVOLVED SUBSAMPLING ESTIMATION WITH APPLICATIONS TO BLOCK BOOTSTRAP Submitted to the Annals of Statistics arxiv: arxiv:1706.07237 CONVOLVED SUBSAMPLING ESTIMATION WITH APPLICATIONS TO BLOCK BOOTSTRAP By Johannes Tewes, Dimitris N. Politis and Daniel J. Nordman Ruhr-Universität

More information

Flexible Estimation of Treatment Effect Parameters

Flexible Estimation of Treatment Effect Parameters Flexible Estimation of Treatment Effect Parameters Thomas MaCurdy a and Xiaohong Chen b and Han Hong c Introduction Many empirical studies of program evaluations are complicated by the presence of both

More information

Towards understanding the Lorenz curve using the Uniform distribution. Chris J. Stephens. Newcastle City Council, Newcastle upon Tyne, UK

Towards understanding the Lorenz curve using the Uniform distribution. Chris J. Stephens. Newcastle City Council, Newcastle upon Tyne, UK Towards understanding the Lorenz curve using the Uniform distribution Chris J. Stehens Newcastle City Council, Newcastle uon Tyne, UK (For the Gini-Lorenz Conference, University of Siena, Italy, May 2005)

More information

Nonparametric estimation of Exact consumer surplus with endogeneity in price

Nonparametric estimation of Exact consumer surplus with endogeneity in price Nonarametric estimation of Exact consumer surlus with endogeneity in rice Anne Vanhems February 7, 2009 Abstract This aer deals with nonarametric estimation of variation of exact consumer surlus with endogenous

More information

Estimating Time-Series Models

Estimating Time-Series Models Estimating ime-series Models he Box-Jenkins methodology for tting a model to a scalar time series fx t g consists of ve stes:. Decide on the order of di erencing d that is needed to roduce a stationary

More information

Location of solutions for quasi-linear elliptic equations with general gradient dependence

Location of solutions for quasi-linear elliptic equations with general gradient dependence Electronic Journal of Qualitative Theory of Differential Equations 217, No. 87, 1 1; htts://doi.org/1.14232/ejqtde.217.1.87 www.math.u-szeged.hu/ejqtde/ Location of solutions for quasi-linear ellitic equations

More information

Estimation of Separable Representations in Psychophysical Experiments

Estimation of Separable Representations in Psychophysical Experiments Estimation of Searable Reresentations in Psychohysical Exeriments Michele Bernasconi (mbernasconi@eco.uninsubria.it) Christine Choirat (cchoirat@eco.uninsubria.it) Raffaello Seri (rseri@eco.uninsubria.it)

More information

Elementary Analysis in Q p

Elementary Analysis in Q p Elementary Analysis in Q Hannah Hutter, May Szedlák, Phili Wirth November 17, 2011 This reort follows very closely the book of Svetlana Katok 1. 1 Sequences and Series In this section we will see some

More information

Use of Transformations and the Repeated Statement in PROC GLM in SAS Ed Stanek

Use of Transformations and the Repeated Statement in PROC GLM in SAS Ed Stanek Use of Transformations and the Reeated Statement in PROC GLM in SAS Ed Stanek Introduction We describe how the Reeated Statement in PROC GLM in SAS transforms the data to rovide tests of hyotheses of interest.

More information

E cient Semiparametric Estimation of Dose-Response Functions

E cient Semiparametric Estimation of Dose-Response Functions E cient Semiarametric Estimation of Dose-Resonse Functions Matias D. Cattaneo y UC-Berkeley PRELIMIARY AD ICOMPLETE DRAFT COMMETS WELCOME Aril 7 Abstract. A large fraction of te literature on rogram evaluation

More information

Lecture 2: Consistency of M-estimators

Lecture 2: Consistency of M-estimators Lecture 2: Instructor: Deartment of Economics Stanford University Preared by Wenbo Zhou, Renmin University References Takeshi Amemiya, 1985, Advanced Econometrics, Harvard University Press Newey and McFadden,

More information

On-Line Appendix. Matching on the Estimated Propensity Score (Abadie and Imbens, 2015)

On-Line Appendix. Matching on the Estimated Propensity Score (Abadie and Imbens, 2015) On-Line Aendix Matching on the Estimated Proensity Score Abadie and Imbens, 205 Alberto Abadie and Guido W. Imbens Current version: August 0, 205 The first art of this aendix contains additional roofs.

More information

On split sample and randomized confidence intervals for binomial proportions

On split sample and randomized confidence intervals for binomial proportions On slit samle and randomized confidence intervals for binomial roortions Måns Thulin Deartment of Mathematics, Usala University arxiv:1402.6536v1 [stat.me] 26 Feb 2014 Abstract Slit samle methods have

More information

Asymptotically Optimal Simulation Allocation under Dependent Sampling

Asymptotically Optimal Simulation Allocation under Dependent Sampling Asymtotically Otimal Simulation Allocation under Deendent Samling Xiaoing Xiong The Robert H. Smith School of Business, University of Maryland, College Park, MD 20742-1815, USA, xiaoingx@yahoo.com Sandee

More information

Chapter 3. GMM: Selected Topics

Chapter 3. GMM: Selected Topics Chater 3. GMM: Selected oics Contents Otimal Instruments. he issue of interest..............................2 Otimal Instruments under the i:i:d: assumtion..............2. he basic result............................2.2

More information

System Reliability Estimation and Confidence Regions from Subsystem and Full System Tests

System Reliability Estimation and Confidence Regions from Subsystem and Full System Tests 009 American Control Conference Hyatt Regency Riverfront, St. Louis, MO, USA June 0-, 009 FrB4. System Reliability Estimation and Confidence Regions from Subsystem and Full System Tests James C. Sall Abstract

More information

Michael Lechner. Swiss Institute for Empirical Economic Research (SEW), University of St. Gallen

Michael Lechner. Swiss Institute for Empirical Economic Research (SEW), University of St. Gallen PARTIAL IDENTIFICATION OF WAGE EFFECT OF TRAINING PROGRAM Michael Lechner wiss Institute for Emirical Economic Research (EW), University of t. Gallen Blaise Melly Deartment of Economics, Brown University

More information

Robustness of classifiers to uniform l p and Gaussian noise Supplementary material

Robustness of classifiers to uniform l p and Gaussian noise Supplementary material Robustness of classifiers to uniform l and Gaussian noise Sulementary material Jean-Yves Franceschi Ecole Normale Suérieure de Lyon LIP UMR 5668 Omar Fawzi Ecole Normale Suérieure de Lyon LIP UMR 5668

More information

Statics and dynamics: some elementary concepts

Statics and dynamics: some elementary concepts 1 Statics and dynamics: some elementary concets Dynamics is the study of the movement through time of variables such as heartbeat, temerature, secies oulation, voltage, roduction, emloyment, rices and

More information

Elementary theory of L p spaces

Elementary theory of L p spaces CHAPTER 3 Elementary theory of L saces 3.1 Convexity. Jensen, Hölder, Minkowski inequality. We begin with two definitions. A set A R d is said to be convex if, for any x 0, x 1 2 A x = x 0 + (x 1 x 0 )

More information

Extensions of the Penalized Spline Propensity Prediction Method of Imputation

Extensions of the Penalized Spline Propensity Prediction Method of Imputation Extensions of the Penalized Sline Proensity Prediction Method of Imutation by Guangyu Zhang A dissertation submitted in artial fulfillment of the requirements for the degree of Doctor of Philosohy (Biostatistics)

More information

E cient Semiparametric Estimation of Quantile Treatment E ects

E cient Semiparametric Estimation of Quantile Treatment E ects E cient Semiarametric Estimation of Quantile Treatment E ects Sergio Firo y First Draft: ovember 2002 This Draft: January 2004 Abstract This aer resents calculations of semiarametric e ciency bounds for

More information

Lecture 6. 2 Recurrence/transience, harmonic functions and martingales

Lecture 6. 2 Recurrence/transience, harmonic functions and martingales Lecture 6 Classification of states We have shown that all states of an irreducible countable state Markov chain must of the same tye. This gives rise to the following classification. Definition. [Classification

More information

Finite Mixture EFA in Mplus

Finite Mixture EFA in Mplus Finite Mixture EFA in Mlus November 16, 2007 In this document we describe the Mixture EFA model estimated in Mlus. Four tyes of deendent variables are ossible in this model: normally distributed, ordered

More information

CHAPTER-II Control Charts for Fraction Nonconforming using m-of-m Runs Rules

CHAPTER-II Control Charts for Fraction Nonconforming using m-of-m Runs Rules CHAPTER-II Control Charts for Fraction Nonconforming using m-of-m Runs Rules. Introduction: The is widely used in industry to monitor the number of fraction nonconforming units. A nonconforming unit is

More information

E-companion to A risk- and ambiguity-averse extension of the max-min newsvendor order formula

E-companion to A risk- and ambiguity-averse extension of the max-min newsvendor order formula e-comanion to Han Du and Zuluaga: Etension of Scarf s ma-min order formula ec E-comanion to A risk- and ambiguity-averse etension of the ma-min newsvendor order formula Qiaoming Han School of Mathematics

More information

Estimating function analysis for a class of Tweedie regression models

Estimating function analysis for a class of Tweedie regression models Title Estimating function analysis for a class of Tweedie regression models Author Wagner Hugo Bonat Deartamento de Estatística - DEST, Laboratório de Estatística e Geoinformação - LEG, Universidade Federal

More information

ON UNIFORM BOUNDEDNESS OF DYADIC AVERAGING OPERATORS IN SPACES OF HARDY-SOBOLEV TYPE. 1. Introduction

ON UNIFORM BOUNDEDNESS OF DYADIC AVERAGING OPERATORS IN SPACES OF HARDY-SOBOLEV TYPE. 1. Introduction ON UNIFORM BOUNDEDNESS OF DYADIC AVERAGING OPERATORS IN SPACES OF HARDY-SOBOLEV TYPE GUSTAVO GARRIGÓS ANDREAS SEEGER TINO ULLRICH Abstract We give an alternative roof and a wavelet analog of recent results

More information

IMPROVED BOUNDS IN THE SCALED ENFLO TYPE INEQUALITY FOR BANACH SPACES

IMPROVED BOUNDS IN THE SCALED ENFLO TYPE INEQUALITY FOR BANACH SPACES IMPROVED BOUNDS IN THE SCALED ENFLO TYPE INEQUALITY FOR BANACH SPACES OHAD GILADI AND ASSAF NAOR Abstract. It is shown that if (, ) is a Banach sace with Rademacher tye 1 then for every n N there exists

More information

A Comparison between Biased and Unbiased Estimators in Ordinary Least Squares Regression

A Comparison between Biased and Unbiased Estimators in Ordinary Least Squares Regression Journal of Modern Alied Statistical Methods Volume Issue Article 7 --03 A Comarison between Biased and Unbiased Estimators in Ordinary Least Squares Regression Ghadban Khalaf King Khalid University, Saudi

More information

On the asymptotic sizes of subset Anderson-Rubin and Lagrange multiplier tests in linear instrumental variables regression

On the asymptotic sizes of subset Anderson-Rubin and Lagrange multiplier tests in linear instrumental variables regression On the asymtotic sizes of subset Anderson-Rubin and Lagrange multilier tests in linear instrumental variables regression Patrik Guggenberger Frank Kleibergeny Sohocles Mavroeidisz Linchun Chen\ June 22

More information

MATH 829: Introduction to Data Mining and Analysis Consistency of Linear Regression

MATH 829: Introduction to Data Mining and Analysis Consistency of Linear Regression 1/9 MATH 829: Introduction to Data Mining and Analysis Consistency of Linear Regression Dominique Guillot Deartments of Mathematical Sciences University of Delaware February 15, 2016 Distribution of regression

More information

p-adic Measures and Bernoulli Numbers

p-adic Measures and Bernoulli Numbers -Adic Measures and Bernoulli Numbers Adam Bowers Introduction The constants B k in the Taylor series exansion t e t = t k B k k! k=0 are known as the Bernoulli numbers. The first few are,, 6, 0, 30, 0,

More information

Inference for Empirical Wasserstein Distances on Finite Spaces: Supplementary Material

Inference for Empirical Wasserstein Distances on Finite Spaces: Supplementary Material Inference for Emirical Wasserstein Distances on Finite Saces: Sulementary Material Max Sommerfeld Axel Munk Keywords: otimal transort, Wasserstein distance, central limit theorem, directional Hadamard

More information

Recovering preferences in the household production framework: The case of averting behavior

Recovering preferences in the household production framework: The case of averting behavior Udo Ebert Recovering references in the household roduction framework: The case of averting behavior February 2002 * Address: Deartment of Economics, University of Oldenburg, D-26 Oldenburg, ermany Tel.:

More information

Voting with Behavioral Heterogeneity

Voting with Behavioral Heterogeneity Voting with Behavioral Heterogeneity Youzong Xu Setember 22, 2016 Abstract This aer studies collective decisions made by behaviorally heterogeneous voters with asymmetric information. Here behavioral heterogeneity

More information

#A64 INTEGERS 18 (2018) APPLYING MODULAR ARITHMETIC TO DIOPHANTINE EQUATIONS

#A64 INTEGERS 18 (2018) APPLYING MODULAR ARITHMETIC TO DIOPHANTINE EQUATIONS #A64 INTEGERS 18 (2018) APPLYING MODULAR ARITHMETIC TO DIOPHANTINE EQUATIONS Ramy F. Taki ElDin Physics and Engineering Mathematics Deartment, Faculty of Engineering, Ain Shams University, Cairo, Egyt

More information

On Isoperimetric Functions of Probability Measures Having Log-Concave Densities with Respect to the Standard Normal Law

On Isoperimetric Functions of Probability Measures Having Log-Concave Densities with Respect to the Standard Normal Law On Isoerimetric Functions of Probability Measures Having Log-Concave Densities with Resect to the Standard Normal Law Sergey G. Bobkov Abstract Isoerimetric inequalities are discussed for one-dimensional

More information

Quantitative estimates of propagation of chaos for stochastic systems with W 1, kernels

Quantitative estimates of propagation of chaos for stochastic systems with W 1, kernels oname manuscrit o. will be inserted by the editor) Quantitative estimates of roagation of chaos for stochastic systems with W, kernels Pierre-Emmanuel Jabin Zhenfu Wang Received: date / Acceted: date Abstract

More information

substantial literature on emirical likelihood indicating that it is widely viewed as a desirable and natural aroach to statistical inference in a vari

substantial literature on emirical likelihood indicating that it is widely viewed as a desirable and natural aroach to statistical inference in a vari Condence tubes for multile quantile lots via emirical likelihood John H.J. Einmahl Eindhoven University of Technology Ian W. McKeague Florida State University May 7, 998 Abstract The nonarametric emirical

More information

Unobservable Selection and Coefficient Stability: Theory and Evidence

Unobservable Selection and Coefficient Stability: Theory and Evidence Unobservable Selection and Coefficient Stability: Theory and Evidence Emily Oster Brown University and NBER August 9, 016 Abstract A common aroach to evaluating robustness to omitted variable bias is to

More information

Positivity, local smoothing and Harnack inequalities for very fast diffusion equations

Positivity, local smoothing and Harnack inequalities for very fast diffusion equations Positivity, local smoothing and Harnack inequalities for very fast diffusion equations Dedicated to Luis Caffarelli for his ucoming 60 th birthday Matteo Bonforte a, b and Juan Luis Vázquez a, c Abstract

More information

Paper C Exact Volume Balance Versus Exact Mass Balance in Compositional Reservoir Simulation

Paper C Exact Volume Balance Versus Exact Mass Balance in Compositional Reservoir Simulation Paer C Exact Volume Balance Versus Exact Mass Balance in Comositional Reservoir Simulation Submitted to Comutational Geosciences, December 2005. Exact Volume Balance Versus Exact Mass Balance in Comositional

More information

ON POLYNOMIAL SELECTION FOR THE GENERAL NUMBER FIELD SIEVE

ON POLYNOMIAL SELECTION FOR THE GENERAL NUMBER FIELD SIEVE MATHEMATICS OF COMPUTATIO Volume 75, umber 256, October 26, Pages 237 247 S 25-5718(6)187-9 Article electronically ublished on June 28, 26 O POLYOMIAL SELECTIO FOR THE GEERAL UMBER FIELD SIEVE THORSTE

More information

SAS for Bayesian Mediation Analysis

SAS for Bayesian Mediation Analysis Paer 1569-2014 SAS for Bayesian Mediation Analysis Miočević Milica, Arizona State University; David P. MacKinnon, Arizona State University ABSTRACT Recent statistical mediation analysis research focuses

More information

arxiv: v1 [physics.data-an] 26 Oct 2012

arxiv: v1 [physics.data-an] 26 Oct 2012 Constraints on Yield Parameters in Extended Maximum Likelihood Fits Till Moritz Karbach a, Maximilian Schlu b a TU Dortmund, Germany, moritz.karbach@cern.ch b TU Dortmund, Germany, maximilian.schlu@cern.ch

More information

LORENZO BRANDOLESE AND MARIA E. SCHONBEK

LORENZO BRANDOLESE AND MARIA E. SCHONBEK LARGE TIME DECAY AND GROWTH FOR SOLUTIONS OF A VISCOUS BOUSSINESQ SYSTEM LORENZO BRANDOLESE AND MARIA E. SCHONBEK Abstract. In this aer we analyze the decay and the growth for large time of weak and strong

More information

Hotelling s Two- Sample T 2

Hotelling s Two- Sample T 2 Chater 600 Hotelling s Two- Samle T Introduction This module calculates ower for the Hotelling s two-grou, T-squared (T) test statistic. Hotelling s T is an extension of the univariate two-samle t-test

More information

Using the Divergence Information Criterion for the Determination of the Order of an Autoregressive Process

Using the Divergence Information Criterion for the Determination of the Order of an Autoregressive Process Using the Divergence Information Criterion for the Determination of the Order of an Autoregressive Process P. Mantalos a1, K. Mattheou b, A. Karagrigoriou b a.deartment of Statistics University of Lund

More information

Real Analysis 1 Fall Homework 3. a n.

Real Analysis 1 Fall Homework 3. a n. eal Analysis Fall 06 Homework 3. Let and consider the measure sace N, P, µ, where µ is counting measure. That is, if N, then µ equals the number of elements in if is finite; µ = otherwise. One usually

More information

Research Note REGRESSION ANALYSIS IN MARKOV CHAIN * A. Y. ALAMUTI AND M. R. MESHKANI **

Research Note REGRESSION ANALYSIS IN MARKOV CHAIN * A. Y. ALAMUTI AND M. R. MESHKANI ** Iranian Journal of Science & Technology, Transaction A, Vol 3, No A3 Printed in The Islamic Reublic of Iran, 26 Shiraz University Research Note REGRESSION ANALYSIS IN MARKOV HAIN * A Y ALAMUTI AND M R

More information

Proof: We follow thearoach develoed in [4]. We adot a useful but non-intuitive notion of time; a bin with z balls at time t receives its next ball at

Proof: We follow thearoach develoed in [4]. We adot a useful but non-intuitive notion of time; a bin with z balls at time t receives its next ball at A Scaling Result for Exlosive Processes M. Mitzenmacher Λ J. Sencer We consider the following balls and bins model, as described in [, 4]. Balls are sequentially thrown into bins so that the robability

More information

MATHEMATICAL MODELLING OF THE WIRELESS COMMUNICATION NETWORK

MATHEMATICAL MODELLING OF THE WIRELESS COMMUNICATION NETWORK Comuter Modelling and ew Technologies, 5, Vol.9, o., 3-39 Transort and Telecommunication Institute, Lomonosov, LV-9, Riga, Latvia MATHEMATICAL MODELLIG OF THE WIRELESS COMMUICATIO ETWORK M. KOPEETSK Deartment

More information

Topic 7: Using identity types

Topic 7: Using identity types Toic 7: Using identity tyes June 10, 2014 Now we would like to learn how to use identity tyes and how to do some actual mathematics with them. By now we have essentially introduced all inference rules

More information

Bayesian Spatially Varying Coefficient Models in the Presence of Collinearity

Bayesian Spatially Varying Coefficient Models in the Presence of Collinearity Bayesian Satially Varying Coefficient Models in the Presence of Collinearity David C. Wheeler 1, Catherine A. Calder 1 he Ohio State University 1 Abstract he belief that relationshis between exlanatory

More information

Econometrica Supplementary Material

Econometrica Supplementary Material Econometrica Sulementary Material SUPPLEMENT TO WEAKLY BELIEF-FREE EQUILIBRIA IN REPEATED GAMES WITH PRIVATE MONITORING (Econometrica, Vol. 79, No. 3, May 2011, 877 892) BY KANDORI,MICHIHIRO IN THIS SUPPLEMENT,

More information

MODELING THE RELIABILITY OF C4ISR SYSTEMS HARDWARE/SOFTWARE COMPONENTS USING AN IMPROVED MARKOV MODEL

MODELING THE RELIABILITY OF C4ISR SYSTEMS HARDWARE/SOFTWARE COMPONENTS USING AN IMPROVED MARKOV MODEL Technical Sciences and Alied Mathematics MODELING THE RELIABILITY OF CISR SYSTEMS HARDWARE/SOFTWARE COMPONENTS USING AN IMPROVED MARKOV MODEL Cezar VASILESCU Regional Deartment of Defense Resources Management

More information

ON THE ADVERSARIAL ROBUSTNESS OF ROBUST ESTIMATORS

ON THE ADVERSARIAL ROBUSTNESS OF ROBUST ESTIMATORS O THE ADVERSARIAL ROBUSTESS OF ROBUST ESTIMATORS BY ERHA BAYAKTAR,, LIFEG LAI, University of Michigan and University of California, Davis Motivated by recent data analytics alications, we study the adversarial

More information

Maximum Likelihood Asymptotic Theory. Eduardo Rossi University of Pavia

Maximum Likelihood Asymptotic Theory. Eduardo Rossi University of Pavia Maximum Likelihood Asymtotic Theory Eduardo Rossi University of Pavia Slutsky s Theorem, Cramer s Theorem Slutsky s Theorem Let {X N } be a random sequence converging in robability to a constant a, and

More information

On the Interplay of Regularity and Decay in Case of Radial Functions I. Inhomogeneous spaces

On the Interplay of Regularity and Decay in Case of Radial Functions I. Inhomogeneous spaces On the Interlay of Regularity and Decay in Case of Radial Functions I. Inhomogeneous saces Winfried Sickel, Leszek Skrzyczak and Jan Vybiral July 29, 2010 Abstract We deal with decay and boundedness roerties

More information

AI*IA 2003 Fusion of Multiple Pattern Classifiers PART III

AI*IA 2003 Fusion of Multiple Pattern Classifiers PART III AI*IA 23 Fusion of Multile Pattern Classifiers PART III AI*IA 23 Tutorial on Fusion of Multile Pattern Classifiers by F. Roli 49 Methods for fusing multile classifiers Methods for fusing multile classifiers

More information

SECTION 5: FIBRATIONS AND HOMOTOPY FIBERS

SECTION 5: FIBRATIONS AND HOMOTOPY FIBERS SECTION 5: FIBRATIONS AND HOMOTOPY FIBERS In this section we will introduce two imortant classes of mas of saces, namely the Hurewicz fibrations and the more general Serre fibrations, which are both obtained

More information

Chapter 7 Sampling and Sampling Distributions. Introduction. Selecting a Sample. Introduction. Sampling from a Finite Population

Chapter 7 Sampling and Sampling Distributions. Introduction. Selecting a Sample. Introduction. Sampling from a Finite Population Chater 7 and s Selecting a Samle Point Estimation Introduction to s of Proerties of Point Estimators Other Methods Introduction An element is the entity on which data are collected. A oulation is a collection

More information

ECON 4130 Supplementary Exercises 1-4

ECON 4130 Supplementary Exercises 1-4 HG Set. 0 ECON 430 Sulementary Exercises - 4 Exercise Quantiles (ercentiles). Let X be a continuous random variable (rv.) with df f( x ) and cdf F( x ). For 0< < we define -th quantile (or 00-th ercentile),

More information

HENSEL S LEMMA KEITH CONRAD

HENSEL S LEMMA KEITH CONRAD HENSEL S LEMMA KEITH CONRAD 1. Introduction In the -adic integers, congruences are aroximations: for a and b in Z, a b mod n is the same as a b 1/ n. Turning information modulo one ower of into similar

More information

1 Riesz Potential and Enbeddings Theorems

1 Riesz Potential and Enbeddings Theorems Riesz Potential and Enbeddings Theorems Given 0 < < and a function u L loc R, the Riesz otential of u is defined by u y I u x := R x y dy, x R We begin by finding an exonent such that I u L R c u L R for

More information

A New Asymmetric Interaction Ridge (AIR) Regression Method

A New Asymmetric Interaction Ridge (AIR) Regression Method A New Asymmetric Interaction Ridge (AIR) Regression Method by Kristofer Månsson, Ghazi Shukur, and Pär Sölander The Swedish Retail Institute, HUI Research, Stockholm, Sweden. Deartment of Economics and

More information

The inverse Goldbach problem

The inverse Goldbach problem 1 The inverse Goldbach roblem by Christian Elsholtz Submission Setember 7, 2000 (this version includes galley corrections). Aeared in Mathematika 2001. Abstract We imrove the uer and lower bounds of the

More information

Additive results for the generalized Drazin inverse in a Banach algebra

Additive results for the generalized Drazin inverse in a Banach algebra Additive results for the generalized Drazin inverse in a Banach algebra Dragana S. Cvetković-Ilić Dragan S. Djordjević and Yimin Wei* Abstract In this aer we investigate additive roerties of the generalized

More information

Analysis of some entrance probabilities for killed birth-death processes

Analysis of some entrance probabilities for killed birth-death processes Analysis of some entrance robabilities for killed birth-death rocesses Master s Thesis O.J.G. van der Velde Suervisor: Dr. F.M. Sieksma July 5, 207 Mathematical Institute, Leiden University Contents Introduction

More information

Distributed Rule-Based Inference in the Presence of Redundant Information

Distributed Rule-Based Inference in the Presence of Redundant Information istribution Statement : roved for ublic release; distribution is unlimited. istributed Rule-ased Inference in the Presence of Redundant Information June 8, 004 William J. Farrell III Lockheed Martin dvanced

More information

Semiparametric Regression for Clustered Data Using Generalized Estimating Equations

Semiparametric Regression for Clustered Data Using Generalized Estimating Equations Semiarametric Regression for Clustered Data Using Generalized Estimating Equations Xihong Lin Raymond J. Carroll We consider estimation in a semiarametric generalized linear model for clustered data using

More information

arxiv: v4 [math.st] 3 Jun 2016

arxiv: v4 [math.st] 3 Jun 2016 Electronic Journal of Statistics ISSN: 1935-7524 arxiv: math.pr/0000000 Bayesian Estimation Under Informative Samling Terrance D. Savitsky and Daniell Toth 2 Massachusetts Ave. N.E, Washington, D.C. 20212

More information

WALD S EQUATION AND ASYMPTOTIC BIAS OF RANDOMLY STOPPED U-STATISTICS

WALD S EQUATION AND ASYMPTOTIC BIAS OF RANDOMLY STOPPED U-STATISTICS PROCEEDINGS OF HE AMERICAN MAHEMAICAL SOCIEY Volume 125, Number 3, March 1997, Pages 917 925 S 0002-9939(97)03574-0 WALD S EQUAION AND ASYMPOIC BIAS OF RANDOMLY SOPPED U-SAISICS VICOR H. DE LA PEÑA AND

More information

Conversions among Several Classes of Predicate Encryption and Applications to ABE with Various Compactness Tradeoffs

Conversions among Several Classes of Predicate Encryption and Applications to ABE with Various Compactness Tradeoffs Conversions among Several Classes of Predicate Encrytion and Alications to ABE with Various Comactness Tradeoffs Nuttaong Attraadung, Goichiro Hanaoka, and Shota Yamada National Institute of Advanced Industrial

More information

arxiv: v2 [stat.me] 3 Nov 2014

arxiv: v2 [stat.me] 3 Nov 2014 onarametric Stein-tye Shrinkage Covariance Matrix Estimators in High-Dimensional Settings Anestis Touloumis Cancer Research UK Cambridge Institute University of Cambridge Cambridge CB2 0RE, U.K. Anestis.Touloumis@cruk.cam.ac.uk

More information

Mollifiers and its applications in L p (Ω) space

Mollifiers and its applications in L p (Ω) space Mollifiers and its alications in L () sace MA Shiqi Deartment of Mathematics, Hong Kong Batist University November 19, 2016 Abstract This note gives definition of mollifier and mollification. We illustrate

More information

Slash Distributions and Applications

Slash Distributions and Applications CHAPTER 2 Slash Distributions and Alications 2.1 Introduction The concet of slash distributions was introduced by Kafadar (1988) as a heavy tailed alternative to the normal distribution. Further literature

More information

ON THE LEAST SIGNIFICANT p ADIC DIGITS OF CERTAIN LUCAS NUMBERS

ON THE LEAST SIGNIFICANT p ADIC DIGITS OF CERTAIN LUCAS NUMBERS #A13 INTEGERS 14 (014) ON THE LEAST SIGNIFICANT ADIC DIGITS OF CERTAIN LUCAS NUMBERS Tamás Lengyel Deartment of Mathematics, Occidental College, Los Angeles, California lengyel@oxy.edu Received: 6/13/13,

More information

DEPARTMENT OF ECONOMICS ISSN DISCUSSION PAPER 20/07 TWO NEW EXPONENTIAL FAMILIES OF LORENZ CURVES

DEPARTMENT OF ECONOMICS ISSN DISCUSSION PAPER 20/07 TWO NEW EXPONENTIAL FAMILIES OF LORENZ CURVES DEPARTMENT OF ECONOMICS ISSN 1441-549 DISCUSSION PAPER /7 TWO NEW EXPONENTIAL FAMILIES OF LORENZ CURVES ZuXiang Wang * & Russell Smyth ABSTRACT We resent two new Lorenz curve families by using the basic

More information

COMMUNICATION BETWEEN SHAREHOLDERS 1

COMMUNICATION BETWEEN SHAREHOLDERS 1 COMMUNICATION BTWN SHARHOLDRS 1 A B. O A : A D Lemma B.1. U to µ Z r 2 σ2 Z + σ2 X 2r ω 2 an additive constant that does not deend on a or θ, the agents ayoffs can be written as: 2r rθa ω2 + θ µ Y rcov

More information

Adaptive Estimation of the Regression Discontinuity Model

Adaptive Estimation of the Regression Discontinuity Model Adative Estimation of the Regression Discontinuity Model Yixiao Sun Deartment of Economics Univeristy of California, San Diego La Jolla, CA 9293-58 Feburary 25 Email: yisun@ucsd.edu; Tel: 858-534-4692

More information