Bounded, Efficient, and Doubly Robust Estimation with Inverse Weighting

Biometrika (2008), 94, 2, pp. © 2008 Biometrika Trust. Printed in Great Britain. Advance Access publication on 31 July 2008.

Bounded, Efficient, and Doubly Robust Estimation with Inverse Weighting

BY Z. TAN
Department of Statistics, Rutgers University, Piscataway, New Jersey 08854, U.S.A. ztan@stat.rutgers.edu

SUMMARY

Consider the problem of estimating the mean of an outcome in the presence of missing data, or estimating population average treatment effects in causal inference. A doubly robust estimator remains consistent if an outcome regression model or a propensity score model is correctly specified. We build on the nonparametric likelihood approach of Tan and propose new doubly robust estimators. These estimators have desirable properties in efficiency if the propensity score model is correctly specified, and in boundedness even if the inverse probability weights are highly variable. We compare new and existing estimators in a simulation study and find that the robustified likelihood estimators yield overall the smallest mean squared errors.

Some key words: Causal inference; Double robustness; Inverse weighting; Missing data; Nonparametric likelihood; Propensity score.

1. INTRODUCTION

Consider the problem of estimating the mean of an outcome in the presence of missing data under ignorability (Rubin, 1976). A related problem is to estimate population average treatment effects under no unmeasured confounding in causal inference (Neyman, 1923; Rubin, 1974). Such problems can be handled in two different ways. One approach is to model the mean of the outcome given covariates, called the outcome regression function, and derive an estimator based on the fitted values for observed and missing outcomes. The other approach is to model the probability of non-missingness given the covariates, called the propensity score (Rosenbaum & Rubin, 1983), and derive an estimator through inverse probability weighting of observed outcomes.
Inverse-probability-weighted estimators are central to the semiparametric theory of estimation with missing data (e.g., Tsiatis, 2006; van der Laan & Robins, 2003). The two approaches rely on different modelling assumptions and one does not necessarily dominate the other (Tan, 2007). A doubly robust approach makes use of both the outcome regression model and the propensity score model and derives an estimator that remains consistent if either of the two models is correctly specified. A prototypical doubly robust estimator is the augmented inverse-probability-weighted estimator of Robins et al. (1994). Recently, a number of alternative doubly robust estimators have been proposed. See Kang & Schafer (2007) and the related discussions. All existing doubly robust estimators are locally efficient: they attain the semiparametric variance bound, and hence are asymptotically equivalent to each other, if both the propensity score model and the outcome regression model are correctly specified. Therefore, it is important to compare doubly robust estimators in their statistical properties if only one of the models is correctly specified or if both models are misspecified.

We review various doubly robust estimators and highlight statistical criteria underlying their construction. Some estimators are intrinsically efficient: if the propensity score model is correctly specified, then each of them is asymptotically efficient among a class of augmented inverse-probability-weighted estimators that use the same fitted outcome regression function (Tan, 2006, 2007). Some estimators are improved-locally efficient: if the propensity score model is correctly specified, then they are asymptotically at least as efficient as the augmented inverse-probability-weighted estimator that uses the true propensity score and an optimally fitted outcome regression function (Rubin & van der Laan, 2008; Tan, 2008). Some estimators are population-bounded or sample-bounded: they lie within the range of all possible values or that of observed values of the outcome (Robins et al., 2007). The properties of boundedness rule out estimates outside the population or sample range even when the inverse probability weights are highly variable.

We propose a robustification of the likelihood estimator of Tan (2006), named the calibrated likelihood estimator, by calibrating the coefficients in a linear, extended propensity score model. The estimator is computationally convenient, involving two steps of maximizing concave functions. Moreover, the estimator is locally and intrinsically efficient and sample-bounded, and is further improved-locally efficient if the outcome regression function is suitably estimated. No existing doubly robust estimators achieve these four properties simultaneously. We further derive a robustification of the likelihood estimator of Tan (2006), named the augmented likelihood estimator, by incorporating an augmentation term. This estimator satisfies only a weaker form of boundedness than population and sample boundedness.
We compare new and existing estimators in a simulation study and find that the calibrated and augmented likelihood estimators yield overall the smallest mean squared errors.

2. MISSING DATA PROBLEMS

2·1. Setup

Let X be a vector of covariates and Y be an outcome. The variables X are always observed, but Y may be missing. Let R be the non-missing indicator such that R = 1 or 0 if Y is observed or missing, respectively. Throughout, assume that the missing-data mechanism is ignorable, that is, R and Y are conditionally independent given X (Rubin, 1976). Suppose that an independent and identically distributed sample of n units is available. The observed data consist of (X_i, R_i, R_i Y_i), i = 1, …, n. Our objective is to estimate the population mean µ = E(Y). Although this problem is simple to describe, it provides a basic setting for us to investigate methods for handling missing data.

2·2. Models

There are two different ways of postulating dimension-reduction assumptions to obtain consistent and asymptotically normal estimators of µ. One approach is to specify a parametric model for the outcome regression function m(X) = E(Y | X) in the form

E(Y | X) = m(X; α) = Ψ{α^T g(X)},   (1)

where Ψ is an inverse link function, g(X) is a vector of known functions including the constant 1, and α is a vector of unknown parameters. Let α̂_OLS be the maximum quasi-likelihood estimator of α or its variant. For concreteness, fix α̂_OLS as the estimator that solves the equation 0 = Ẽ[R{Y − m(X; α)}g(X)], where Ẽ denotes sample average. Let µ̂_OLS = Ẽ{m̂_OLS(X)}, where m̂_OLS(X) = m(X; α̂_OLS). Under regularity conditions, if model (1) is correctly specified, then µ̂_OLS is consistent and asymptotically normal, with asymptotic variance no greater than the semiparametric variance bound, provided that E(Y²) < ∞.
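To make the outcome-regression route concrete, the following minimal sketch fits model (1) on the complete cases and averages the fitted values over the full sample. It is not from the paper: the data-generating model, coefficients, and sample size are all hypothetical, with identity link Ψ.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000
# Hypothetical data: true m(x) = 1 + 2x, so mu = E(Y) = 1.
X = rng.normal(size=n)
Y = 1.0 + 2.0 * X + rng.normal(size=n)
# Non-missing indicator R, with P(R = 1 | X) depending on X only (ignorability).
R = rng.binomial(1, 1.0 / (1.0 + np.exp(-(0.5 + X))))

# Fit E(Y | X) = alpha' g(X) with g(x) = (1, x)' on complete cases (R = 1).
G = np.column_stack([np.ones(n), X])
alpha_ols = np.linalg.lstsq(G[R == 1], Y[R == 1], rcond=None)[0]

# mu_hat_OLS: average the fitted values over ALL units, observed or not.
mu_ols = (G @ alpha_ols).mean()
```

Under ignorability the complete-case fit of α is consistent, so mu_ols estimates µ even though Y is missing for some units.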

The other approach is to specify a parametric model for the propensity score π(X) = P(R = 1 | X) in the form

P(R = 1 | X) = π(X; γ) = Π{γ^T f(X)},   (2)

where Π is an inverse link function, f(X) is a vector of known functions, and γ is a vector of unknown parameters. Let γ̂_ML be the maximum likelihood estimator of γ and hence a solution to the equation 0 = Ẽ[{R − π(X; γ)}ϱ(X; γ)f(X)], where ϱ(X; γ) = Π′{γ^T f(X)}/[π(X; γ){1 − π(X; γ)}] and Π′ is the derivative of Π. Two non-augmented inverse-probability-weighted estimators are

µ̂_IPW = Ẽ{RY/π̂_ML(X)},   µ̂_IPW,ratio = Ẽ{RY/π̂_ML(X)} / Ẽ{R/π̂_ML(X)},

where π̂_ML(X) = π(X; γ̂_ML). Under regularity conditions, if model (2) is correctly specified, then µ̂_IPW and µ̂_IPW,ratio are consistent and asymptotically normal, with asymptotic variances no smaller than the semiparametric variance bound, provided that E{π^{−1}(X)} < ∞ and E{Y²π^{−1}(X)} < ∞. See Tan (2007) for a comparison between the two approaches.

2·3. Existing estimators

The estimator µ̂_OLS is based on model (1) only, and µ̂_IPW and µ̂_IPW,ratio are based on model (2) only. Alternatively, a range of estimators have been proposed by using both model (1) and model (2) to gain efficiency and robustness. Many such estimators can be cast in the form

µ̂(π̂, m̂) = Ẽ[RY/π̂(X) − {R/π̂(X) − 1}m̂(X)] = Ẽ[m̂(X) + R{Y − m̂(X)}/π̂(X)],

where π̂(X) and m̂(X) are fitted values of π(X) and m(X) respectively. See Kang & Schafer (2007), Robins et al. (2007), and Tan (2006, 2007, 2008) for related discussions. Consider the following estimators of µ, with the same choice π̂_ML(X) for π̂(X) but different choices for m̂(X). Robins et al. (1994) proposed the estimator µ̂_AIPW = µ̂(π̂_ML, m̂_OLS). Scharfstein et al.
(1999) suggested the estimator µ̂_OLS,ext = µ̂{π̂_ML, m̂_ext(π̂_ML)} = Ẽ{m̂_ext(X; π̂_ML)}, where m̂_ext(X; π̂) = m_ext{X; κ̂(π̂)} and κ̂(π̂) is a solution to 0 = Ẽ[R{Y − m_ext(X; κ)}{π̂^{−1}(X), g^T(X)}^T] for the extended outcome regression model E(Y | X) = m_ext(X; κ) = Ψ{κ_1 π̂^{−1}(X) + κ_2^T g(X)} with κ = (κ_1, κ_2^T)^T. Kang & Schafer (2007) considered the estimator µ̂_WLS = µ̂{π̂_ML, m̂_WLS(π̂_ML)} = Ẽ{m̂_WLS(X; π̂_ML)}, where m̂_WLS(X; π̂) = m{X; α̂_WLS(π̂)} and α̂_WLS(π̂) is a solution to 0 = Ẽ[Rπ̂^{−1}(X){Y − m(X; α)}g(X)] and hence differs from α̂_OLS in using weight π̂^{−1}(X). Rubin & van der Laan (2008) proposed two related estimators

µ̂_V = µ̂{π̂_ML, m̂_V(π̂_ML)},   µ̃_V = µ̂{π̂_ML, m̃_V(π̂_ML)},

where m̂_V(X; π̂) = m{X; α̂_V(π̂)} with α̂_V(π̂) = argmin_α Ẽ([RY/π̂(X) − {R/π̂(X) − 1}m(X; α)]²) for the first estimator, and m̃_V(X; π̂) = m{X; α̃_V(π̂)} with α̃_V(π̂) = argmin_α Ẽ[{R/π̂(X)}{R/π̂(X) − 1}{Y − m(X; α)}²] for the second estimator. The estimator α̃_V(π̂) is a weighted least-squares estimator using weight π̂^{−1}(X){π̂^{−1}(X) − 1}. Our notation makes explicit the dependency of m̂_ext(π̂), m̂_WLS(π̂), m̂_V(π̂), and m̃_V(π̂) on π̂.

The choice π̂_ML(X) for π̂(X) is derived under model (2), independently of model (1). A more elaborate choice can be derived under an extended propensity score model with extra linear

predictors depending on m̂(X). Consider the model

P(R = 1 | X) = π_ext(X; ν) = Π[ν_1^T υ̂(X)/{ϱ̂_ML(X)π̂_ML(X)} + ν_2^T f(X)],   (3)

where ν = (ν_1^T, ν_2^T)^T, υ̂(X) = {1, m̂(X)}^T, and ϱ̂_ML(X) = ϱ(X; γ̂_ML). Let ν̂(m̂) be the maximum likelihood estimator of ν and write π̂_ext(X; m̂) = π_ext{X; ν̂(m̂)}. Substitution of π̂_ext(m̂_OLS) for π̂_ML in µ̂_IPW yields the estimator of Rotnitzky & Robins (1995), µ̂_IPW,ext = µ̂{π̂_ext(m̂_OLS), 0}. For m̂ = m̂_OLS or m̂_WLS(π̂_ML), substitution of π̂_ext(m̂) for π̂_ML in µ̂(π̂_ML, m̂), but not for that within m̂, yields the estimators

µ̂_AIPW,ext = µ̂{π̂_ext(m̂_OLS), m̂_OLS},   µ̂_WLS,ext = µ̂[π̂_ext{m̂_WLS(π̂_ML)}, m̂_WLS(π̂_ML)],

proposed by Robins et al. in a 2008 technical report at Harvard University. In addition, they proposed µ̂_WLS,ext2 = µ̂(π̂_ext{m̂_WLS(π̂_ML)}, m̂_WLS[π̂_ext{m̂_WLS(π̂_ML)}]) through a further iteration from µ̂_WLS,ext.

The targeted maximum likelihood approach of van der Laan & Rubin (2006, Sections ) is closely related to the estimators µ̂_OLS,ext and µ̂_IPW,ext. With m̂_OLS and π̂_ML as initial fitted values, this approach leads to the estimators

µ̂_TML = µ̂{π̂_ML, m̂_TML(π̂_ML)} = Ẽ{m̂_TML(X; π̂_ML)},   µ̂_TIPW = µ̂{π̂_TML(m̂_OLS), 0},   µ̂_TAIPW = µ̂{π̂_TML(m̂_OLS), m̂_TML(π̂_ML)},

where m̂_TML(X; π̂) is obtained by fitting E(Y | X) = m_ext(X; κ) with κ_2 fixed at α̂_OLS, and π̂_TML(X; m̂) is obtained by fitting P(R = 1 | X) = π_ext(X; ν) with ν_2 fixed at γ̂_ML. The estimators µ̂_IPW,ext and µ̂_TIPW are similar to the two likelihood estimators of Tan (2006). The first estimator accommodates the variation of γ̂_ML whereas the second ignores that variation.

2·4. Comparison

Consider the following criteria for evaluating estimators of µ. Note that improved local efficiency implies local efficiency, and sample boundedness implies population boundedness.

(a) Double robustness: µ̂ remains consistent if either model (1) or model (2) is correctly specified.
(b) Local efficiency: µ̂ attains the semiparametric variance bound, i.e., it is asymptotically equivalent to the first order to Ẽ[RY/π(X) − {R/π(X) − 1}m(X)], if both model (1) and model (2) are correctly specified.

(c) Improved local efficiency: µ̂ is asymptotically at least as efficient as Ẽ[RY/π(X) − {R/π(X) − 1}m(X; α)] for arbitrary α if model (2) is correctly specified.

(d) Intrinsic efficiency: µ̂ attains the minimum asymptotic variance among the class of estimators Ẽ[RY/π̂_ML(X) − b_1^T{R/π̂_ML(X) − 1}υ̂(X)] for arbitrary b_1 if model (2) is correctly specified, where υ̂(X) = {1, m̂(X)}^T and m̂(X) is the fitted value of m(X) used in µ̂. Therefore, µ̂ is asymptotically at least as efficient as µ̂_IPW, µ̂_IPW,ratio, and µ̂(π̂_ML, m̂).

(e) Population boundedness: µ̂ lies within the range of all possible values of Y, even if model (1) or model (2) or both are misspecified.

(f) Sample boundedness: µ̂ lies within the range of {Y_i : R_i = 1, i = 1, …, n}, even if model (1) or model (2) or both are misspecified.

The upper half of Table 1 presents a comparison of various estimators in Section 2·3 in terms of the foregoing criteria. See Sections 3–4 for a discussion of the likelihood and regression estimators in the lower half of Table 1.
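The basic estimators reviewed in Sections 2·2–2·3 can be illustrated side by side in a small simulation. Everything below is hypothetical (models, coefficients, sample size), and the logistic propensity fit uses a hand-rolled Newton iteration rather than standard software:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5000
X = rng.normal(size=n)
Y = 1.0 + 2.0 * X + rng.normal(size=n)               # true mu = E(Y) = 1
R = rng.binomial(1, 1.0 / (1.0 + np.exp(-(0.5 + X))))

# Maximum-likelihood fit of the logistic propensity model (2), f(x) = (1, x)'.
F = np.column_stack([np.ones(n), X])
gamma = np.zeros(2)
for _ in range(25):                                  # Newton-Raphson on the score
    p = 1.0 / (1.0 + np.exp(-F @ gamma))
    gamma += np.linalg.solve(F.T @ ((p * (1 - p))[:, None] * F), F.T @ (R - p))
pi_hat = 1.0 / (1.0 + np.exp(-F @ gamma))

# Outcome regression (1) fitted on complete cases, identity link.
m_hat = F @ np.linalg.lstsq(F[R == 1], Y[R == 1], rcond=None)[0]

mu_ipw = np.mean(R * Y / pi_hat)                            # mu_hat_IPW
mu_ipw_ratio = np.sum(R * Y / pi_hat) / np.sum(R / pi_hat)  # mu_hat_IPW,ratio
mu_aipw = np.mean(m_hat + R * (Y - m_hat) / pi_hat)         # mu_hat_AIPW
```

Here both working models are correctly specified, so all three estimators are consistent; mu_aipw, the augmented estimator, would remain consistent if either (but not both) of the two fits were replaced by a misspecified one.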

Table 1. Theoretical comparison of estimators

Upper half: µ̂_TAIPW, µ̂_TML, µ̂_AIPW,ext, µ̂_AIPW, µ̂_OLS,ext, µ̂_WLS, µ̂_V, µ̃_V, µ̂_IPW,ext, µ̂_WLS,ext, µ̂_WLS,ext2. Lower half: µ̂_LIK,OLS, µ̂_REG,OLS, µ̃_REG,OLS, µ̃_LIK2,OLS, µ̃_LIK2,WLS, µ̃_LIK2,V. The columns DR, LE, IE, ILE, PB, and SB correspond to criteria (a)–(f); the individual table entries are not recoverable in this transcription.

3. PROPOSED APPROACH

3·1. Summary

We extend the nonparametric likelihood approach of Tan (2006). The main contribution is to obtain an estimator of µ that is doubly robust, locally and intrinsically efficient, and sample-bounded simultaneously. Moreover, our approach is flexible enough to allow different choices, such as m̂_OLS, m̂_WLS(π̂_ML), and m̃_V(π̂_ML), for the fitted value m̂. If m̂ = m̃_V(π̂_ML), then the resulting estimator is further improved-locally efficient.

3·2. Non-doubly-robust likelihood estimator

We describe the likelihood estimator of Tan (2006) in the current setup of missing data. The nonparametric likelihood of (X_i, R_i, R_i Y_i), i = 1, …, n, is

L_1 · L_2 = [∏_{i=1}^n π(X_i; γ)^{R_i} {1 − π(X_i; γ)}^{1−R_i}] · ∏_{i: R_i=1} G_1({X_i, Y_i}) · ∏_{i: R_i=0} G_0({X_i}),

where G_1 is the joint distribution of (X, Y) and G_0 is the marginal distribution of X. Maximizing L_1 leads to the maximum likelihood estimator γ̂_ML. Recall that m̂(x) is a fitted value of m(x) based on model (1) and υ̂(x) = {1, m̂(x)}^T. Let ĥ = (ĥ_1^T, ĥ_2^T)^T, where

ĥ_1(x) = {1 − π̂_ML(x)}υ̂(x),   ĥ_2(x) = (∂π/∂γ)(x; γ̂_ML).

We choose to ignore the fact that G_0 and the marginal distribution of X under G_1 are identical, and retain only the constraints ∫ĥ(x) dG_1 = ∫ĥ(x) dG_0, i.e.,

∫{1 − π̂_ML(x)} dG_1 = ∫{1 − π̂_ML(x)} dG_0,
∫{1 − π̂_ML(x)}m̂(x) dG_1 = ∫{1 − π̂_ML(x)}m̂(x) dG_0,
∫(∂π/∂γ)(x; γ̂_ML) dG_1 = ∫(∂π/∂γ)(x; γ̂_ML) dG_0.

See Kong et al. (2003) for a related formulation. The first two constraints respectively ensure that the resulting estimator of µ is consistent under correctly specified model (2) and locally efficient, whereas the third constraint accounts for the variation of γ̂_ML such that the resulting estimator is intrinsically efficient. Furthermore, we require that G_1 be a probability measure supported on {(X_i, Y_i) : R_i = 1, i = 1, …, n}, and hence ∫dG_1 = 1, and that G_0 be a nonnegative measure (not necessarily a probability) supported on {X_i : R_i = 0, i = 1, …, n}. Maximizing L_2 subject to these constraints leads to the estimators

Ĝ_1({X_i, Y_i}) = n^{−1}ω^{−1}(X_i; λ̂) if R_i = 1,   Ĝ_0({X_i}) = n^{−1}{1 − ω(X_i; λ̂)}^{−1} if R_i = 0,

where ω(X; λ) = π̂_ML(X) + λ^T ĥ(X), λ̂ = argmax_λ ℓ(λ), and

ℓ(λ) = Ẽ[R log{ω(X; λ)} + (1 − R) log{1 − ω(X; λ)}].

The function ℓ(λ) is finite and concave on the set {λ : ω(X_i; λ) > 0 if R_i = 1 and ω(X_i; λ) < 1 if R_i = 0, i = 1, …, n}. Moreover, ℓ(λ) is strictly concave and bounded from above, and hence has a unique maximum, if and only if the set

{λ ≠ 0 : λ^T ĥ(X_i) ≥ 0 if R_i = 1 and λ^T ĥ(X_i) ≤ 0 if R_i = 0, i = 1, …, n} is empty.   (4)

See the Appendix for a proof. From our experience, λ̂ can be computed effectively by using a globally convergent optimization algorithm such as the package trust. Setting the gradient of ℓ(λ) to 0 shows that λ̂ is a solution to

0 = Ẽ( {R − ω(X; λ)}ĥ(X) / [ω(X; λ){1 − ω(X; λ)}] ).   (5)

By construction, λ̂ also satisfies

1 = ∫dĜ_1 = Ẽ{R/ω(X; λ̂)}.   (6)

The resulting estimator of µ is

µ̂_LIK = ∫y dĜ_1 = Ẽ{RY/ω(X; λ̂)}.

The estimator µ̂_LIK is structurally similar to µ̂_IPW,ext based on the extended model (3). The value λ̂ can be interpreted as the maximum likelihood estimator of λ under the linear, extended propensity score model P(R = 1 | X) = ω(X; λ). However, there are important differences between µ̂_LIK and µ̂_IPW,ext. First, ω(X_i; λ̂) may not lie between 0 and 1 for all i = 1, …, n.
It is only required that ω(X_i; λ̂) > 0 if R_i = 1 and ω(X_i; λ̂) < 1 if R_i = 0. Moreover, equation (6) automatically holds, whereas Ẽ{R/π̂_ext(X)} = 1 does not. By (6), the values ω(X_i; λ̂) with R_i = 1 are bounded from below by n^{−1}, and µ̂_LIK is sample-bounded. In contrast, π̂_ext(X_i) with R_i = 1 may be arbitrarily close to 0, and µ̂_IPW,ext is not sample-bounded.

Tan (2006, Theorem 4) obtained an asymptotic expansion of µ̂_LIK, assuming that model (2) is correctly specified. Here, we provide a general asymptotic expansion of µ̂_LIK, allowing for misspecification of model (1) and model (2). See Manski (1988) for related asymptotic theory in misspecified models. Under regularity conditions, λ̂ converges to a constant λ* in probability

with the expansion

λ̂ − λ* = B̂^{−1} Ẽ( {R − ω(X; λ*)}ĥ(X) / [ω(X; λ*){1 − ω(X; λ*)}] ) + o_p(n^{−1/2}),

where

B̂ = Ẽ( {R − ω(X; λ*)}² ĥ(X)ĥ^T(X) / [ω²(X; λ*){1 − ω(X; λ*)}²] ).

Moreover, a Taylor expansion of µ̂_LIK about λ* yields

µ̂_LIK = Ẽ{RY/ω(X; λ*)} − Ĉ^T B̂^{−1} Ẽ( {R − ω(X; λ*)}ĥ(X) / [ω(X; λ*){1 − ω(X; λ*)}] ) + o_p(n^{−1/2}),   (7)

where Ĉ = Ẽ[{RY/ω²(X; λ*)}ĥ(X)]. If model (2) is correctly specified, then λ* = 0 and hence the expansion reduces to µ̂_LIK = µ̂_REG + o_p(n^{−1/2}) with µ̂_REG = Ẽ(η̂) − β̂^T Ẽ(ξ̂), where η̂ = RY/π̂_ML(X),

ξ̂ = [{R/π̂_ML(X) − 1}υ̂^T(X), {R − π̂_ML(X)}ϱ̂_ML(X)f^T(X)]^T,

B̂ = Ẽ(ξ̂ξ̂^T), Ĉ = Ẽ(ξ̂η̂), and β̂ = B̂^{−1}Ĉ is the least-squares estimator in the linear regression of η̂ on ξ̂. The estimator µ̂_REG is locally and intrinsically efficient (Robins et al., 1995), but not doubly robust. See Section 4·5 for a further discussion.

3·3. Doubly robust likelihood estimator

The estimator µ̂_LIK is sample-bounded and locally and intrinsically efficient. If m̂ = m̂_V(π̂_ML) or m̃_V(π̂_ML), then µ̂_LIK is further improved-locally efficient because it is asymptotically at least as efficient as µ̂_V or µ̃_V, which is improved-locally efficient. However, µ̂_LIK is not doubly robust. It may be inconsistent if model (1) is correctly specified but model (2) is misspecified.

We propose a robustification of µ̂_LIK such that it satisfies double robustness in addition to sample boundedness and local and intrinsic efficiency. We first discuss a simple version of our proposal. Consider the system of estimating equations

0 = Ẽ[{R/ω(X; λ) − 1}υ̂(X)],   (8)
0 = Ẽ( {R − ω(X; λ)}ĥ_2(X) / [ω(X; λ){1 − ω(X; λ)}] ),   (9)

which are equivalent to (5) except that (R − ω)/{ω(1 − ω)} is replaced by (R/ω − 1)/(1 − π̂_ML) in the equations associated with ĥ_1 = (1 − π̂_ML)υ̂. Let λ̃ be a solution to (8)–(9) subject to the constraint that ω(X_i; λ) > 0 if R_i = 1 (i = 1, …, n) and let

µ̃_LIK = Ẽ{RY/ω(X; λ̃)}.

Note that υ̂(X) includes the constant 1 and hence Ẽ{R/ω(X; λ̃)} = 1 by (8).
Therefore, µ̃_LIK is sample-bounded in a similar manner as µ̂_LIK is. We derive asymptotic expansions for λ̃ and µ̃_LIK, allowing for misspecification of model (1) and model (2), in parallel to those for λ̂ and µ̂_LIK. Under regularity conditions, λ̃ converges to a constant λ̃* in probability with the expansion

λ̃ − λ̃* = B̃^{−1} Ẽ( {R/ω(X; λ̃*) − 1}υ̂(X) ; {R − ω(X; λ̃*)}ĥ_2(X) / [ω(X; λ̃*){1 − ω(X; λ̃*)}] ) + o_p(n^{−1/2}),

where the two blocks are stacked into one vector,

and B̃ is the 2 × 2 block matrix

B̃ = Ẽ( Rĥ_1(X)υ̂^T(X)/ω²(X; λ̃*),  {R − ω(X; λ̃*)}²ĥ_1(X)ĥ_2^T(X)/[ω²(X; λ̃*){1 − ω(X; λ̃*)}²] ;
       Rĥ_2(X)υ̂^T(X)/ω²(X; λ̃*),  {R − ω(X; λ̃*)}²ĥ_2(X)ĥ_2^T(X)/[ω²(X; λ̃*){1 − ω(X; λ̃*)}²] ).

Moreover, a Taylor expansion of µ̃_LIK about λ̃* yields

µ̃_LIK = Ẽ{RY/ω(X; λ̃*)} − Ĉ^T B̃^{−1} Ẽ( {R/ω(X; λ̃*) − 1}υ̂(X) ; {R − ω(X; λ̃*)}ĥ_2(X) / [ω(X; λ̃*){1 − ω(X; λ̃*)}] ) + o_p(n^{−1/2}).   (10)

If model (2) is correctly specified, then λ̃* = 0 and hence the expansion reduces to µ̃_LIK = µ̃_REG + o_p(n^{−1/2}) with µ̃_REG = Ẽ(η̂) − β̃^T Ẽ(ξ̂), where

ζ̂ = [Rυ̂^T(X)/π̂_ML(X), {R − π̂_ML(X)}ϱ̂_ML(X)f^T(X)]^T,

B̃ = Ẽ(ξ̂ζ̂^T), and β̃ = B̃^{−1}Ĉ. In this case, µ̂_REG and µ̃_REG are asymptotically equivalent to the first order and hence so are µ̂_LIK and µ̃_LIK. However, µ̃_REG is akin to the doubly robust regression estimator of Tan (2006). These regression estimators, unlike µ̂_REG, satisfy double robustness in addition to local and intrinsic efficiency.

The estimators µ̂_LIK and µ̃_LIK are sample-bounded and locally and intrinsically efficient. However, µ̃_LIK, unlike µ̂_LIK, is further doubly robust. This difference follows from the general asymptotic expansions (7) for µ̂_LIK and (10) for µ̃_LIK: the leading terms are structurally similar to, respectively, µ̂_REG, which is not doubly robust, and µ̃_REG, which is doubly robust. Alternatively, µ̃_LIK is doubly robust because

Ẽ{Rm̂(X)/ω(X; λ̃)} = Ẽ{m̂(X)}   (11)

by (8), and hence µ̃_LIK is identical to µ̂{ω(·; λ̃), m̂} in the typical form of doubly robust estimators. In contrast, Ẽ{Rm̂(X)/ω(X; λ̂)} = Ẽ{m̂(X)} does not necessarily hold for µ̂_LIK. We regard λ̃ as a calibration of the maximum likelihood estimator λ̂ in the linear, extended propensity score model P(R = 1 | X) = ω(X; λ) such that equation (11) holds.

So far, we seem to fulfil the objective of deriving an estimator that is doubly robust, locally and intrinsically efficient, and sample-bounded. However, there remain subtle issues about the existence and computation of λ̃.
First, it is difficult to characterize conditions under which there exists a solution to (8)–(9) subject to the constraint that ω(X_i; λ) > 0 if R_i = 1 (i = 1, …, n). Moreover, algorithms for solving nonlinear equations such as (8)–(9) may fail to locate a solution, much less all possible solutions, if any exist. It presents a further challenge to accommodate the constraint on the domain of λ. Finally, if there exists no solution or there exist multiple solutions, it remains difficult to redefine λ̃ or to select λ̃ among the multiple solutions. These difficulties are applicable not only to (8)–(9), but to nonlinear estimating equations in general. See Small et al. (2000) for a survey that mainly deals with multiple solutions.

We now discuss a more effective version of our proposal to address the foregoing issues. Recall that λ̂ is defined as a maximizer of ℓ(λ). Under condition (4), ℓ(λ) is strictly concave and bounded from above and hence λ̂ exists and is unique. Consider the following two-step estimator.

(a) Compute λ̂ = (λ̂_1^T, λ̂_2^T)^T, partitioned according to ĥ = (ĥ_1^T, ĥ_2^T)^T.

(b) Compute λ̃_step2 = (λ̃_1,step2^T, λ̂_2^T)^T, where λ̃_1,step2 = argmax_{λ_1} κ_1(λ_1) and

κ_1(λ_1) = Ẽ( R[log{ω(X; λ_1, λ̂_2)} − log{ω(X; λ̂)}]/{1 − π̂_ML(X)} − λ_1^T υ̂(X) ).

The function κ_1(λ_1) is finite and concave on the set {λ_1 : ω(X_i; λ_1, λ̂_2) > 0 if R_i = 1, i = 1, …, n}. Moreover, as shown in the Appendix, κ_1(λ_1) is strictly concave and bounded from above, and hence has a unique maximum, if and only if the set

{λ_1 ≠ 0 : λ_1^T υ̂(X_i) ≥ 0 if R_i = 1, i = 1, …, n, and Ẽ{λ_1^T υ̂(X)} ≤ 0} is empty.   (12)

Like λ̂ in step (a), λ̃_1,step2 in step (b) can be computed effectively by using a globally convergent optimization algorithm such as the package trust. Setting the gradient of κ_1(λ_1) to 0 shows that λ̃_1,step2 is a solution to

0 = Ẽ[{R/ω(X; λ_1, λ̂_2) − 1}υ̂(X)],   (13)

which is equivalent to (8) with λ_2 evaluated at λ̂_2. In fact, we consider (13) as estimating equations and obtain κ_1(λ_1) as an objective function by integrating the right side of (13). This construction is feasible because the matrix of the partial derivatives of the right side of (13) is symmetric and negative-semidefinite. In the degenerate case where ĥ_2(X) is removed from ĥ(X), λ consists of λ_1 only and hence λ̃ and λ̃_step2 are identical. The resulting estimator of µ is

µ̃_LIK2 = Ẽ{RY/ω(X; λ̃_step2)}.

The estimator µ̃_LIK2, like µ̃_LIK, is sample-bounded and doubly robust due to, respectively, Ẽ{R/ω(X; λ̃_step2)} = 1 and Ẽ{Rm̂(X)/ω(X; λ̃_step2)} = Ẽ{m̂(X)} by (13). Furthermore, µ̃_LIK2 is asymptotically equivalent to the first order to µ̂_LIK and µ̃_LIK if model (2) is correctly specified, and hence is locally and intrinsically efficient. See the Appendix for an asymptotic expansion of µ̃_LIK2, allowing for misspecification of model (1) and model (2).

The foregoing development allows a general choice of the fitted value m̂(X). The estimator µ̃_LIK2 is doubly robust, locally and intrinsically efficient, and sample-bounded. Nevertheless, different choices of m̂(X) lead to specific versions of µ̃_LIK2 that differ beyond the four properties.
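In the degenerate case where ĥ_2 is dropped (so λ = λ_1), the second step amounts to a concave maximization whose stationarity condition is the calibration equation (13). The sketch below is a hypothetical illustration, not the paper's implementation: the propensity fit is deliberately misspecified as a constant while the outcome model is correct, so consistency of the result reflects double robustness, and a damped Newton ascent stands in for the trust-region software mentioned above.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 5000
X = rng.normal(size=n)
Y = 1.0 + 2.0 * X + rng.normal(size=n)                # true mu = E(Y) = 1
R = rng.binomial(1, 1.0 / (1.0 + np.exp(-(0.5 + X))))

pi_hat = np.full(n, R.mean())                         # misspecified propensity fit
F = np.column_stack([np.ones(n), X])
m_hat = F @ np.linalg.lstsq(F[R == 1], Y[R == 1], rcond=None)[0]  # correct model (1)
ups = np.column_stack([np.ones(n), m_hat])            # upsilon-hat(x) = (1, m_hat(x))'
obs = R == 1

def omega(lam):
    # h = h1 = (1 - pi_hat)*ups, so w = pi_hat + (1 - pi_hat) * lam'ups.
    return pi_hat + (1.0 - pi_hat) * (ups @ lam)

def kappa1(lam):
    # Concave objective whose gradient is the calibration equation (13).
    w = omega(lam)
    if np.any(w[obs] <= 0):
        return -np.inf                                # outside the admissible set
    return np.sum(np.log(w[obs]) / (1.0 - pi_hat[obs])) / n - lam @ ups.mean(axis=0)

lam = np.zeros(2)
for _ in range(50):                                   # damped Newton ascent of kappa_1
    w_obs = omega(lam)[obs]
    grad = (ups[obs] / w_obs[:, None]).sum(axis=0) / n - ups.mean(axis=0)
    hess = -(ups[obs].T @ (((1.0 - pi_hat[obs]) / w_obs**2)[:, None] * ups[obs])) / n
    step = np.linalg.solve(hess, -grad)
    t = 1.0
    while kappa1(lam + t * step) < kappa1(lam):       # halve until admissible ascent
        t /= 2.0
    lam = lam + t * step

w = omega(lam)
mu_lik2 = np.mean(R * Y / w)                          # calibrated likelihood estimator
```

At the solution, Ẽ(R/ω) = 1 and the ω-weighted average of m̂ over observed units matches its full-sample average, which is what delivers sample boundedness and double robustness.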
Denote by µ̃_LIK2,OLS, µ̃_LIK2,WLS, and µ̃_LIK2,V the versions of µ̃_LIK2 corresponding to m̂ = m̂_OLS, m̂_WLS(π̂_ML), and m̃_V(π̂_ML), and similarly denote those of µ̂_LIK, µ̂_REG, and µ̃_REG. The estimator µ̃_LIK2,V, unlike µ̃_LIK2,OLS and µ̃_LIK2,WLS, is further improved-locally efficient. See Table 1 for a comparison of these estimators among other estimators.

4. EXTENSIONS AND COMPARISONS

4·1. Specification of υ̂(X)

The vector υ̂(X) is so far fixed as {1, m̂(X)}^T. However, it can be replaced throughout by a general vector of known functions of X including the constant 1, as in Tan (2006). With this extension, µ̂_LIK and µ̃_LIK2 still have asymptotic expansions in the current forms. The two estimators are sample-bounded and intrinsically efficient. Furthermore, if

m̂(X) = b_1^T υ̂(X) for some vector b_1,   (14)

then µ̂_LIK is locally efficient, and µ̃_LIK2 is doubly robust and locally efficient. Condition (14) automatically holds for υ̂(X) = {1, m̂(X)}^T with b_1 = (0, 1)^T.

Consider the case where model (1) is linear with identity link Ψ. Then g(X) is an alternative choice of υ̂(X) satisfying (14). For this choice, intrinsic efficiency implies improved local efficiency and hence µ̂_LIK and µ̃_LIK2 are improved-locally efficient. This result can also be seen from the following relationship. Suppose that ĥ_2(X) is removed from ĥ(X) throughout. Then µ̂_REG and µ̃_REG are identical to µ̂_V and µ̃_V respectively, which are improved-locally efficient (Tan, 2008). The estimators µ̂_LIK and µ̃_LIK2 have increased asymptotic variances, but are still asymptotically equivalent to the first order to µ̂_REG and µ̃_REG if model (2) is correctly specified. Therefore, the original estimators µ̂_LIK and µ̃_LIK2 are improved-locally efficient.

4·2. Estimation of E(X) and G_1

The estimators µ̂_LIK and µ̃_LIK2 for µ = E(Y) can be used for estimating E(X) with Y replaced by X, and similarly for estimating the expectations of functions of X. The resulting estimators have similar properties to those of µ̂_LIK and µ̃_LIK2. Suppose that X is contained in υ̂(X) by specification. If model (2) is correctly specified, then Ẽ{RX/ω(X; λ̂)} is asymptotically at least as efficient as Ẽ[RX/π̂_ML(X) − {R/π̂_ML(X) − 1}X] = Ẽ(X) by intrinsic efficiency, and hence is asymptotically equivalent to the first order to Ẽ(X). The estimator Ẽ{RX/ω(X; λ̃_step2)}, in contrast with Ẽ{RX/ω(X; λ̂)}, is identical to Ẽ(X) by (13), whether or not model (2) is correctly specified.

Estimation of E(Y), E(X), and the expectations of functions of (X, Y) is unified in estimation of G_1 from the distributional perspective of Tan (2006). Let G̃_1,step2 be the probability measure supported on {(X_i, Y_i) : R_i = 1, i = 1, …, n} such that if R_i = 1 then

G̃_1,step2({X_i, Y_i}) = n^{−1}ω^{−1}(X_i; λ̃_step2).

Then Ĝ_1 and G̃_1,step2 are both estimators of G_1, supported on the completely observed data. However, G̃_1,step2 satisfies ∫υ̂(x) dG̃_1,step2 = Ẽ{υ̂(X)}, i.e., the weighted average of υ̂(X) under G̃_1,step2 is exactly matched to the overall sample average of υ̂(X).

We compare our approach with the empirical likelihood approach of Qin & Zhang (2003).
Their approach is to maximize ∏_{i: R_i=1} G_1({X_i, Y_i}) subject to the constraints that G_1 is a probability measure supported on {(X_i, Y_i) : R_i = 1, i = 1, …, n} and ∫â(x) dG_1 = Ẽ{â(X)}, where â(x) = {π̂_ML(x), m̂(x)}^T. The maximization leads to the estimator such that if R_i = 1 then

Ĝ_QZ({X_i, Y_i}) = ( n_1 [1 + λ̂_QZ^T [â(X_i) − Ẽ{â(X)}]] )^{−1},

where n_1 = Σ_{i=1}^n R_i, λ̂_QZ = argmax_{λ_1} ℓ_QZ(λ_1), and ℓ_QZ(λ_1) = Ẽ{R log(1 + λ_1^T [â(X) − Ẽ{â(X)}])}. The estimator µ̂_QZ = ∫y dĜ_QZ is sample-bounded due to ∫dĜ_QZ = 1, and doubly robust and locally efficient due to ∫m̂(x) dĜ_QZ = Ẽ{m̂(X)}. However, µ̂_QZ is not intrinsically or improved-locally efficient, even in the special case where π(X) is known and substituted for π̂_ML(X) and m̂_V(π̂_ML) or m̃_V(π̂_ML) is used for m̂.

4·3. Augmentation of µ̂_LIK

The estimator µ̃_LIK2 is derived as a robustification of µ̂_LIK to realize double robustness and retain sample boundedness and local and intrinsic efficiency. Our method is to calibrate the estimation of λ. An alternative method for robustification is to augment µ̂_LIK with the additional term −Ẽ[{R/ω(X; λ̂) − 1}m̂(X)], in a similar manner to augmenting µ̂_IPW,ext to µ̂_AIPW,ext by Robins et al. in their 2008 technical report. The resulting estimator is doubly robust and locally and intrinsically efficient, but not sample-bounded.
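The fitting of λ̂ by maximizing ℓ(λ) and the augmentation just described can be sketched as follows. This is a hypothetical illustration: for simplicity ĥ is taken as ĥ_1 = (1 − π̂)υ̂ only (the degenerate case in which ĥ_2 is dropped), a damped Newton ascent stands in for the trust-region software of Section 3·2, and the propensity fit is deliberately misspecified as a constant while the outcome model is correct, so that only the augmented version is expected to be consistent.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5000
X = rng.normal(size=n)
Y = 1.0 + 2.0 * X + rng.normal(size=n)                # true mu = E(Y) = 1
R = rng.binomial(1, 1.0 / (1.0 + np.exp(-(0.5 + X))))

pi_hat = np.full(n, R.mean())                         # misspecified propensity fit
F = np.column_stack([np.ones(n), X])
m_hat = F @ np.linalg.lstsq(F[R == 1], Y[R == 1], rcond=None)[0]  # correct model (1)
ups = np.column_stack([np.ones(n), m_hat])            # upsilon-hat = (1, m_hat)'
h = (1.0 - pi_hat)[:, None] * ups                     # h = h1 only
obs = R == 1

def ell(lam):
    # l(lambda) = E~[R log w + (1 - R) log(1 - w)], w = pi_hat + lambda'h.
    w = pi_hat + h @ lam
    if np.any(w[obs] <= 0) or np.any(w[~obs] >= 1):
        return -np.inf                                # outside the admissible set
    return (np.sum(np.log(w[obs])) + np.sum(np.log(1.0 - w[~obs]))) / n

lam = np.zeros(2)
for _ in range(50):                                   # damped Newton ascent of l
    w = pi_hat + h @ lam
    grad = ((h[obs] / w[obs, None]).sum(axis=0)
            - (h[~obs] / (1.0 - w[~obs])[:, None]).sum(axis=0)) / n
    hess = -(h[obs].T @ (h[obs] / (w[obs]**2)[:, None])
             + h[~obs].T @ (h[~obs] / ((1.0 - w[~obs])**2)[:, None])) / n
    step = np.linalg.solve(hess, -grad)
    t = 1.0
    while ell(lam + t * step) < ell(lam):             # halve until admissible ascent
        t /= 2.0
    lam = lam + t * step

w = pi_hat + h @ lam
mu_lik = np.mean(R * Y / w)                           # not doubly robust
mu_lik_aug = mu_lik - np.mean((R / w - 1.0) * m_hat)  # augmented: doubly robust
```

Because the outcome model is correct here, mu_lik_aug stays near the truth despite the bad propensity fit, whereas mu_lik need not.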

Recall that λ̂ = λ̂(m̂) depends on m̂ and write ω̂(X; m̂) = ω{X; λ̂(m̂)}. Substitution of ω̂(m̂) for π̂_ext(m̂) in various estimators in Section 2·3 leads to

µ̂_AIPW,lik = µ̂{ω̂(m̂_OLS), m̂_OLS},   µ̂_WLS,lik = µ̂[ω̂{m̂_WLS(π̂_ML)}, m̂_WLS(π̂_ML)],
µ̂_WLS,lik2 = µ̂(ω̂{m̂_WLS(π̂_ML)}, m̂_WLS[ω̂{m̂_WLS(π̂_ML)}]),   µ̃_V,lik = µ̂[ω̂{m̃_V(π̂_ML)}, m̃_V(π̂_ML)].

These estimators are similar to their counterparts in Section 2·3 in terms of the six properties in Table 1. The estimator µ̂{ω̂(m̂), m̂} is not population-bounded or sample-bounded, whereas µ̂_WLS,lik2 is population-bounded. Nevertheless, µ̂{ω̂(m̂), m̂} is bounded in absolute value by

∆ = max{|m̂(X_i)| : i = 1, …, n} + max{|Y_i − m̂(X_i)| : R_i = 1, i = 1, …, n},

due to the normalization (6). In contrast, µ̂{π̂_ext(m̂), m̂} may lie outside this range, because such a normalization does not hold for π̂_ext(X), as discussed in Section 3·2. Kang & Schafer (2007) and Robins et al. (2007) considered a modification of µ̂(π̂, m̂) by deliberately normalizing the weights, that is,

µ̂_ratio(π̂, m̂) = Ẽ^{−1}{R/π̂(X)} Ẽ( RY/π̂(X) − {R/π̂(X) − 1}[m̂(X) − Ẽ{m̂(X)}] )
            = Ẽ{m̂(X)} + Ẽ^{−1}{R/π̂(X)} Ẽ[ R{Y − m̂(X)}/π̂(X) ].

The estimator µ̂_ratio{π̂_ext(m̂), m̂} is bounded in absolute value by ∆. Moreover, it is similar to µ̂{π̂_ext(m̂), m̂} and µ̂{ω̂(m̂), m̂} in terms of the six properties in Table 1. These estimators, two based on π̂_ext and one based on ω̂, are asymptotically equivalent to each other if model (2) is correctly specified, but may differ in various ways otherwise.

4·4. Bounded robustification of µ̂_IPW,ext

The estimator µ̂_AIPW,ext is doubly robust but not sample-bounded. An alternative robustification of µ̂_IPW,ext can be derived such that it is doubly robust and sample-bounded, in a similar manner as µ̃_LIK2 is derived from µ̂_LIK. Our method is to calibrate estimation of ν in the extended model (3). For simplicity, fix Π(z) = expit(z), i.e., {1 + exp(−z)}^{−1}.
Then ϱ(x; γ) 1 free of γ, and π ext (X; ν) reduces to Π{ν T 1 ˆυ(X)/ˆπ ML(X) + ν T 2 f(x)}. ecall that ˆν = (ˆν T 1, ˆνT 2 )T is the maximum likelihood estimator of ν and hence a solution to 0 = Ẽ[ { π ext (X; ν)} f(x) ], [ 0 = Ẽ { π ext (X; ν)} ˆυ(X) ]. (15) ˆπ ML (X) Let ν step2 = ( ν 1,step2 T, ˆνT 2 )T, ν 1,step2 = argmax ν1 J 1(ν 1 ), and [ { J 1(ν 1 ) = Ẽ ˆυ(X) ˆπ ML (X) exp ν1 T ˆπ ML (X) ˆνT 2 f(x) } ] (1 )ν1 T ˆυ(X) by integrating the right side of (17) below. The function J 1(ν 1 ), unlike l(λ) and κ 1 (λ 1 ), is finite and concave everywhere. Moreover, J 1(ν 1 ) is strictly concave and bounded from above, and hence has a unique maximum, if and only if the set {ν 1 : ν1 Tˆυ(X i) 0 if i = 1, i = 1,..., n, and Ẽ{(1 )νt 1 ˆυ(X)} 0} is empty. (16) See the Appendix for a proof. The existence condition (16) for ν 1,step2 is more demanding than (12) for λ 1,step2 in that (16) implies (12), but not necessarily vice versa. Setting the gradient of.

J₁(ν₁) to 0 shows that ν̃₁,step2 is a solution to

0 = Ẽ[ {R/π_ext(X; ν₁, ν̂₂) − 1} υ̂(X) ],   (17)

which is equivalent to (15) with (R − π_ext) replaced by (R/π_ext − 1)π̂_ML and ν₂ evaluated at ν̂₂. The resulting estimator of µ is μ̃_IPW,ext2 = Ẽ{RY/π_ext(X; ν̃_step2)}. This estimator, like μ̃_LIK2, is doubly robust, locally and intrinsically efficient, and sample-bounded.

We compare μ̃_IPW,ext2 with the bounded, doubly robust estimator of Robins et al. (2007, Section 4·1·2). Consider the extended propensity score model

π_ext,sl(X; χ, γ) = Π( χ[m̂(X) − Ẽ{m̂(X)}] + γᵀ f(X) ).

Let χ̂ = χ̂(m̂) be a solution to

0 = Ẽ( [m̂(X) − Ẽ{m̂(X)}] {R/π_ext,sl(X; χ, γ̂_ML) − 1} ),

and write π̂_ext,sl(X; m̂) = π_ext,sl{X; χ̂(m̂), γ̂_ML}. The estimator μ̂_IPW,ext,sl = μ̂_ratio{π̂_ext,sl(m̂), 0} is sample-bounded. Moreover, it is identical to μ̂_ratio{π̂_ext,sl(m̂), m̂} by the construction of χ̂ and hence is doubly robust and locally efficient. However, it is not intrinsically or improved-locally efficient, even in the case where γ̂_ML is replaced by the true value and m̂(X) − Ẽ{m̂(X)} in π_ext,sl(X; χ, γ) is replaced by [m̂(X) − Ẽ{m̂(X)}]/π(X).

Regression estimators

The estimators μ̂_REG and μ̃_REG are called regression estimators (Tan, 2006, 2007), with connection to survey sampling (e.g., Cochran, 1977) and Monte Carlo integration (e.g., Hammersley & Handscomb, 1964). The idea is to exploit the fact that if model (2) is correctly specified, then η̂ has mean µ and ξ̂ has mean 0 asymptotically. The estimator μ̂_REG attains the minimum asymptotic variance among the class of estimators Ẽ(η̂) − bᵀ Ẽ(ξ̂) for arbitrary b. Moreover, μ̃_REG is asymptotically equivalent to the first order to μ̂_REG because both β̃ and β̂ converge to β* = E^{-1}(ξξᵀ)E(ξη) in probability. Note that Ẽ(ξ̂₂) = 0 and hence Ẽ(η̂) − bᵀ Ẽ(ξ̂) reduces to Ẽ(η̂) − b₁ᵀ Ẽ(ξ̂₁), where b = (b₁ᵀ, b₂ᵀ)ᵀ and ξ̂ = (ξ̂₁ᵀ, ξ̂₂ᵀ)ᵀ according to ĥ = (ĥ₁ᵀ, ĥ₂ᵀ)ᵀ.
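The regression-estimator construction Ẽ(η̂) − bᵀ Ẽ(ξ̂) with the variance-minimizing coefficient can be sketched in the scalar case. Everything below is a hypothetical simulation in which the propensity score model is correct, so η̂ has mean µ and ξ̂ has mean 0:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 2000
X = rng.normal(size=n)
pi = 1.0 / (1.0 + np.exp(-X))        # true propensity score, taken as the fitted one
R = rng.binomial(1, pi)
Y = 2.0 + X + rng.normal(size=n)     # true mean is mu = 2
m_hat = 2.0 + X                      # fitted outcome regression values (hypothetical)

eta = R * Y / pi                     # IPW term: mean mu asymptotically
xi = (R / pi - 1.0) * m_hat          # control variate: mean 0 asymptotically

# Scalar version of b = E^{-1}(xi xi^T) E(xi eta), estimated from the sample
b_hat = np.mean(xi * eta) / np.mean(xi * xi)
mu_reg = np.mean(eta) - b_hat * np.mean(xi)   # regression estimator
```

With both moments estimated from the same sample, mu_reg is first-order equivalent to the optimal member of the class, as stated above.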
The estimators μ̂_REG and μ̃_REG are no longer asymptotically equivalent if model (2) is misspecified. In fact, μ̃_REG is doubly robust whereas μ̂_REG is not. The estimator μ̃_REG is akin to the doubly robust regression estimator of Tan (2006), in which η̂ is defined as {υ̂ᵀ(X)/π̂_ML(X), ϱ̂_ML(X)fᵀ(X)}ᵀ. A benefit of using this version of η̂ is that the resulting matrix B̃ is symmetric and negative semidefinite. Moreover, if {λ ≠ 0 : λᵀ ĥ(X_i) = 0 if R_i = 1 (i = 1, ..., n)} is empty, then B̃ is negative definite. This symmetrization tends to stabilize the inversion of B̃ in β̃ = B̃^{-1}Ĉ and hence improve the finite-sample behavior of μ̃_REG.

A similar symmetrization can be applied to estimating equations (8)–(9). Consider the following estimating equations in place of (9):

0 = Ẽ[ {R/ω(X; λ) − 1} ĥ₂(X)/{1 − π̂_ML(X)} ].   (18)

The matrix of the partial derivatives of the right-hand sides of (8) and (18) is symmetric and negative semidefinite. If {λ ≠ 0 : λᵀ ĥ(X_i) = 0 if R_i = 1 (i = 1, ..., n)} is empty, then the matrix is negative definite. In fact, (8) and (18) are jointly equivalent to setting to 0 the gradient of

κ(λ) = Ẽ( [R log{ω(X; λ)} − λᵀ ĥ(X)]/{1 − π̂_ML(X)} ),

similarly as (13) is obtained from κ₁(λ₁). The function κ(λ) has similar properties of concavity and boundedness to those of κ₁(λ₁). Therefore, it is numerically convenient to redefine λ̃ as a maximizer of κ(λ), or equivalently a solution to

(8) and (18), subject to the constraint that ω(X_i; λ) > 0 if R_i = 1 (i = 1, ..., n). The resulting estimator μ̃_LIK is comparable to μ̃_LIK2 in terms of the six properties in Table 1. A limitation of the modified estimator μ̃_LIK, as compared with μ̃_LIK2, is that it is difficult to generalize μ̃_LIK, while retaining the structure of λ̃, to the setup of causal inference with non-binary, discrete treatments. See Section 5·4 for a further discussion.

5. CAUSAL INFERENCE

5·1. Setup

We now turn to causal inference in the framework of potential outcomes (Neyman, 1923; Rubin, 1974). Let X be a vector of covariates and Y be an outcome as before. Let T be a treatment variable taking values in T = {0, 1, ..., J − 1} with J ≥ 2, where 0 denotes the null treatment or placebo. For each t ∈ T, let Y_t be the potential outcome that would be observed under treatment t. We make the consistency assumption that Y = Y_t if T = t, and the no-confounding assumption that, for each t ∈ T, R_t and Y_t are conditionally independent given X, where R_t = 1{T = t}. Throughout, 1{·} denotes the indicator function. The observed data consist of independent and identically distributed (X_i, T_i, Y_i), i = 1, ..., n. Our objective is to estimate the population mean µ_t = E(Y_t) for t ∈ T. The difference µ_t − µ_0 is called the average causal effect of treatment t.

To a certain extent, this problem can be handled as J separate problems of estimating µ_t from the data (X_i, R_{t,i}, R_{t,i} Y_{t,i}), i = 1, ..., n, as in Sections 2–4. However, the estimators of µ_t obtained in this way are not jointly intrinsically efficient and hence those of µ_t − µ_0 may be inefficient even marginally.

5·2. Models and existing estimators

Consider a parametric model for m(t, X) = E(Y | T = t, X) in the form

E(Y | T = t, X) = m(t, X; α)   (t ∈ T),   (19)

where m(t, x; α) is a known function and α is a vector of unknown parameters.
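The potential-outcomes setup of Section 5·1 can be simulated minimally, illustrating the consistency assumption Y = Y_t when T = t and the indicators R_t = 1{T = t}. The generative model below (J = 3) is hypothetical:

```python
import numpy as np

rng = np.random.default_rng(2)
n, J = 1000, 3
X = rng.normal(size=n)

# Potential outcomes Y_t for t = 0, 1, 2 (hypothetical generative model)
Y_pot = np.stack([t * 1.0 + X + rng.normal(size=n) for t in range(J)], axis=1)

# Treatment assignment depends on X only, so no unmeasured confounding holds
logits = np.stack([np.zeros(n), 0.5 * X, -0.5 * X], axis=1)
probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
T = np.array([rng.choice(J, p=p) for p in probs])

Y = Y_pot[np.arange(n), T]                       # consistency: Y = Y_t when T = t
R = (T[:, None] == np.arange(J)).astype(float)   # column t holds R_t = 1{T = t}

assert np.all(R.sum(axis=1) == 1.0)              # each unit receives exactly one treatment
```

Only (X_i, T_i, Y_i) would be observed in practice; the full matrix Y_pot exists only inside the simulation.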
To focus on main ideas, assume that m(t, X; α) = Ψ{α_tᵀ g(X)}, where α_t is a vector of unknown parameters and α = (α₀ᵀ, ..., α_{J−1}ᵀ)ᵀ. This specification of (19) is separable in the sense that m(t, X; α) depends on α only through α_t. By abuse of notation, treat m(t, X; α) as m(t, X; α_t). Let α̂_t,OLS be a solution to 0 = Ẽ[R_t{Y − m(t, X; α_t)} g(X)] and write m̂_OLS(t, X) = m(t, X; α̂_t,OLS).

Consider a parametric model for π(t, X) = P(T = t | X) in the form

P(T = t | X) = π(t, X; γ)   (t ∈ T),   (20)

where π(t, x; γ) is a known function and γ is a vector of unknown parameters. Let γ̂_ML be the maximum likelihood estimator of γ and write π̂_ML(t, X) = π(t, X; γ̂_ML). A convenient specification of (20) is the multinomial logit model

π(t, X; γ) = exp{γ_tᵀ f(X)} / Σ_{j∈T} exp{γ_jᵀ f(X)},   (21)

where γ = (γ₀ᵀ, γ₁ᵀ, ..., γ_{J−1}ᵀ)ᵀ with γ₀ = 0. In this case, the score equations for γ̂_ML are 0 = Ẽ[{R_t − π(t, X; γ)} f(X)] for t = 1, ..., J − 1.

To estimate µ_t, the estimators in Section 2·3 can be adopted. Replace μ̂(π̂, m̂) by

μ̂_t(π̂, m̂) = Ẽ[ R_t Y/π̂(t, X) − {R_t/π̂(t, X) − 1} m̂(t, X) ],

where π̂(t, X) and m̂(t, X) are estimators of π(t, X) and m(t, X) respectively. Various choices of the two estimators are available. The estimator m̂_OLS(t, X) is a simple choice of m̂(t, X), and π̂_ML(t, X) is a simple choice of π̂(t, X). Moreover, there are iterative choices of m̂(t, X) and π̂(t, X). Let m̂_ext(t, X; π̂) = m_ext{t, X; κ̂_t(π̂)}, m̂_WLS(t, X; π̂) = m{t, X; α̂_t,WLS(π̂)}, and m̃_V(t, X; π̂) = m{t, X; α̃_t,V(π̂)}, where κ̂_t(π̂), α̂_t,WLS(π̂), and α̃_t,V(π̂) are obtained by substituting R_t, π̂(t, X), and m(t, X; α_t) for R, π̂(X), and m(X; α) throughout in κ̂(π̂), α̂_WLS(π̂), and α̃_V(π̂).

Construction of an extension to π̂_ext(m̂) seems difficult for a general specification of model (20) with J > 2. Nevertheless, the task is straightforward if the multinomial logit specification (21) is used. Consider the model

P(T = t | X) = π_ext(t, X; ν) = C^{-1}(X; ν) exp{ Σ_{j∈T} ν₁t,jᵀ υ̂(j, X)/π̂_ML(j, X) + ν₂tᵀ f(X) },   (22)

where ν = (ν₁ᵀ, ν₂ᵀ)ᵀ, ν₁ is the vector of ν₁t,j for t, j ∈ T with ν₁0,j = 0 for j ∈ T and ν₁t,0 = ν₁1,0 for t ≠ 0, ν₂ is the vector of ν₂t for t ∈ T with ν₂0 = 0, υ̂(j, X) = {1, m̂(j, X)}ᵀ, and C(X; ν) is determined by Σ_{t∈T} π_ext(t, X; ν) ≡ 1. Let ν̂(m̂) be the maximum likelihood estimator of ν and write π̂_ext(t, X; m̂) = π_ext{t, X; ν̂(m̂)}.

The foregoing choices of m̂(t, X) and π̂(t, X) can be employed in similar combinations to those of m̂(X) and π̂(X) in Section 2·3. Label the resulting estimators of µ_t accordingly. For each t ∈ T, the marginal behavior of μ̂_t can be evaluated by the criteria in Section 2·4. However, consider the following criteria for the joint behavior of (μ̂₀, μ̂₁, ..., μ̂_{J−1}). We say that a vector-valued estimator θ̂₁ is more efficient than θ̂₂ if the asymptotic variance matrix of θ̂₁ is smaller than that of θ̂₂ in the order on positive-definite matrices.

(a) Joint double robustness: (μ̂₀, μ̂₁, ..., μ̂_{J−1}) remains consistent if either model (19) or model (20) is correctly specified.
(b) Joint local efficiency: (μ̂₀, μ̂₁, ..., μ̂_{J−1}) attains the semiparametric variance bound if both model (19) and model (20) are correctly specified.

(c) Joint improved local efficiency: (μ̂₀, μ̂₁, ..., μ̂_{J−1}) is at least as efficient as {μ̂₀(α₀), μ̂₁(α₁), ..., μ̂_{J−1}(α_{J−1})} if model (20) is correctly specified, where μ̂_t(α_t) = Ẽ[ R_t Y/π(t, X) − {R_t/π(t, X) − 1} m(t, X; α_t) ] for α_t a vector of arbitrary constants (t ∈ T).

(d) Joint intrinsic efficiency: (μ̂₀, μ̂₁, ..., μ̂_{J−1}) is at least as efficient as {μ̂₀(b₀), μ̂₁(b₁), ..., μ̂_{J−1}(b_{J−1})} if model (20) is correctly specified, where μ̂_t(b_t) = Ẽ[ R_t Y/π̂_ML(t, X) − b_tᵀ {R_t/π̂_ML(t, X) − 1} υ̂(t, X) ] for b_t a vector of arbitrary constants (t ∈ T).

(e) Joint population boundedness: μ̂_t is population-bounded for each t ∈ T.

(f) Joint sample boundedness: μ̂_t is sample-bounded for each t ∈ T.

Joint double robustness, local efficiency, or population or sample boundedness is equivalent to the fact that μ̂_t satisfies the corresponding property for each t ∈ T. However, joint intrinsic or improved local efficiency is respectively more stringent than the fact that, for each t ∈ T, μ̂_t satisfies intrinsic or improved local efficiency. The comparison in Table 1 remains applicable, except for one correction, if the estimators are replaced by the joint estimators of (µ₀, µ₁, ..., µ_{J−1}) and the properties are replaced by those on the joint behavior. See Sections 5·3–5·4 for a description of the likelihood and regression estimators. The correction is that none of the joint estimators satisfies joint improved local efficiency, although Table 1 is still valid regarding whether or not the estimators of µ_t satisfy improved local efficiency marginally. See Tan (2008, Section 3) for a further discussion.
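As a concrete sketch of the per-treatment estimator μ̂_t(π̂, m̂) under the multinomial logit specification (21): the simulation below is hypothetical, with both the propensity and outcome models taken as correctly specified, so the estimates should land near the true means (1, 1.5, 2):

```python
import numpy as np

rng = np.random.default_rng(3)
n, J = 4000, 3
X = rng.normal(size=n)

# Multinomial logit propensities (21) with gamma_0 = 0; here the true model plays pi-hat
logits = np.stack([np.zeros(n), 0.4 * X, -0.4 * X], axis=1)
pi_hat = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
assert np.allclose(pi_hat.sum(axis=1), 1.0)      # probabilities sum to one over t

T = np.array([rng.choice(J, p=p) for p in pi_hat])
Y = 1.0 + 0.5 * T + X + rng.normal(size=n)       # hypothetical outcome model
m_hat = np.stack([1.0 + 0.5 * t + X for t in range(J)], axis=1)  # fitted m(t, X)

mu_hat = np.empty(J)
for t in range(J):
    w = (T == t).astype(float) / pi_hat[:, t]    # R_t / pi-hat(t, X)
    # mu-hat_t(pi-hat, m-hat) = E-tilde[ R_t Y / pi(t,X) - {R_t/pi(t,X) - 1} m(t,X) ]
    mu_hat[t] = np.mean(w * Y - (w - 1.0) * m_hat[:, t])

assert np.allclose(mu_hat, [1.0, 1.5, 2.0], atol=0.2)
```

The joint criteria (a)–(f) concern how the vector (μ̂₀, μ̂₁, μ̂₂) behaves together; the loop above only illustrates the marginal construction for each t.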

Note that (μ̂_t,IPW,ext)_{t∈T} satisfies joint intrinsic efficiency because υ̂(j, X)/π̂_ML(j, X), j ∈ T, are simultaneously included as extra linear predictors for log{π(t, X)/π(0, X)} for each t ≠ 0 in model (22). For fixed j ≠ 0, if model (22) were specified such that log{π(t, X)/π(0, X)} = ν₂tᵀ f(X) if t ≠ 0, j, or ν₁j,jᵀ υ̂(j, X)/π̂_ML(j, X) + ν₂jᵀ f(X) if t = j, then μ̂_j,IPW,ext would satisfy intrinsic efficiency marginally, but (μ̂_t,IPW,ext)_{t∈T} would not satisfy joint intrinsic efficiency. See Tan (2007, Section 3) for a related discussion.

5·3. Non-doubly-robust likelihood estimator

We present the likelihood estimator of Tan (2006) in the setup of causal inference, with the extension to accommodate discrete, binary or non-binary, treatments. See a 2007 Rutgers University technical report by Tan for a further extension to deal with marginal and nested structural models.

The nonparametric likelihood of (X_i, T_i, Y_i), i = 1, ..., n, is

L₁ × L₂ = ∏_{i=1}^n π(T_i, X_i; γ) × ∏_{i=1}^n G_{T_i}({X_i, Y_i}),

where G_t is the joint distribution of (X, Y_t), t ∈ T. Maximizing L₁ leads to the maximum likelihood estimator γ̂_ML. Recall that m̂(t, x) is an estimator of m(t, x) based on model (19) and υ̂(t, x) = {1, m̂(t, x)}ᵀ. Let ĥ = (ĥ₁ᵀ, ĥ₂ᵀ)ᵀ and ĥ₁ = (ĥ₁₀ᵀ, ĥ₁₁ᵀ, ..., ĥ₁,J−1ᵀ)ᵀ, where

ĥ₁j(t, x) = [1{t = j} − π̂_ML(t, x)] υ̂(j, x)   (j ∈ T),
ĥ₂(t, x) = (∂π/∂γ)(t, x; γ̂_ML).

By construction, Σ_{t∈T} ĥ(t, x) ≡ 0 because Σ_{t∈T} π̂_ML(t, x) ≡ 1. We choose to ignore the fact that G_t, t ∈ T, induce the same marginal distribution of X, and retain only the constraints Σ_{t∈T} ∫ ĥ(t, x) dG_t = 0, i.e.,

0 = Σ_{t∈T} ∫ [1{t = j} − π̂_ML(t, x)] dG_t   (j ∈ T),
0 = Σ_{t∈T} ∫ [1{t = j} − π̂_ML(t, x)] m̂(j, x) dG_t   (j ∈ T),
0 = Σ_{t∈T} ∫ (∂π/∂γ)(t, x; γ̂_ML) dG_t.

Furthermore, we require that G_t be a probability measure supported on {(X_i, Y_i) : T_i = t, i = 1, ..., n}, and hence ∫ dG_t = 1, t ∈ T.
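The identity Σ_{t∈T} ĥ(t, x) ≡ 0 can be verified numerically for the multinomial logit model, whose score components are ∂π(t, x; γ)/∂γ_k = π(t, x){1{t = k} − π(k, x)} f(x) for k = 1, ..., J − 1. The coefficients and fitted values below are arbitrary stand-ins:

```python
import numpy as np

rng = np.random.default_rng(4)
J = 3
x = rng.normal(size=2)
f = np.append(1.0, x)                            # f(x) = (1, x^T)^T

# Multinomial logit pi-ML(t, x) with hypothetical coefficients, gamma_0 = 0
gamma = np.vstack([np.zeros(3), rng.normal(size=(J - 1, 3))])
logits = gamma @ f
pi_ml = np.exp(logits) / np.exp(logits).sum()

m_hat = 0.5 + x.sum() + 0.3 * np.arange(J)       # hypothetical fitted m(t, x)
ups = np.stack([np.ones(J), m_hat])              # upsilon(j, x) = {1, m(j, x)}^T in column j

# h_1j(t, x) = [1{t = j} - pi_ML(t, x)] upsilon(j, x), stacked over j
h1 = np.stack([((np.arange(J) == j).astype(float) - pi_ml)[:, None] * ups[:, j]
               for j in range(J)], axis=1)       # shape (t, j, 2)

# h_2(t, x): multinomial logit score, pi_t (1{t = k} - pi_k) f(x) for k = 1, ..., J-1
h2 = np.stack([np.outer(pi_ml[t] * ((np.arange(1, J) == t) - pi_ml[1:]), f)
               for t in range(J)])               # shape (t, J-1, len(f))

assert np.allclose(h1.sum(axis=0), 0.0)          # sum over t of h_1 vanishes identically
assert np.allclose(h2.sum(axis=0), 0.0)          # sum over t of h_2 vanishes identically
```

Both sums vanish for any x and any coefficients, which is the point of the "by construction" remark above.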
Maximizing L₂ subject to these constraints leads to the estimators such that, if T_i = t, then

Ĝ_t({X_i, Y_i}) = n^{-1} ω^{-1}(t, X_i; λ̂),

where ω(t, X; λ) = π̂_ML(t, X) + λᵀ ĥ(t, X), λ̂ = argmax_λ ℓ(λ), and ℓ(λ) = Ẽ[log{ω(T, X; λ)}]. The function ℓ(λ) is finite and concave on the set {λ : ω(T_i, X_i; λ) > 0, i = 1, ..., n}. Moreover, ℓ(λ) is strictly concave and bounded from above, and hence has a unique maximum, if and only if {λ ≠ 0 : λᵀ ĥ(T_i, X_i) ≥ 0, i = 1, ..., n} is empty. This proposition follows in a similar manner as that concerning ℓ(λ) and condition (4) in Section 3·2.

The estimators Ĝ_t, t ∈ T, are similar to Ĝ₁ in Section 3·2. If J = 2, π̂_ML(1, X) is identified as π̂_ML(X), ĥ₁₀ is removed from ĥ, and the constraint ∫ dG₀ = 1 is cancelled, then Ĝ₁ reduces to exactly Ĝ₁ in Section 3·2. For causal inference, Ĝ_t, t ∈ T, are equally of interest and constrained

as probability measures. In contrast, only Ĝ₁, but not Ĝ₀, is of interest and constrained as a probability measure in the missing-data setup.

Setting the gradient of ℓ(λ) to 0 shows that λ̂ is a solution to

0 = Ẽ{ ĥ(T, X)/ω(T, X; λ) },   (23)

or equivalently 0 = Σ_{t∈T} ∫ ĥ(t, x) dĜ_t. The resulting estimator of µ_t is

μ̂_t,LIK = ∫ y_t dĜ_t = Ẽ{ R_t Y/ω(t, X; λ̂) }.

We derive the following asymptotic expansions for λ̂ and μ̂_t,LIK, allowing for misspecification of model (19) and model (20), similarly as in Section 3·2. Under regularity conditions, λ̂ converges to a constant λ* with the expansion λ̂ − λ* = B̂^{-1} Ẽ{ĥ(T, X)/ω(T, X; λ*)} + o_p(n^{-1/2}). Moreover, μ̂_t,LIK has the expansion

μ̂_t,LIK = Ẽ{ R_t Y/ω(t, X; λ*) } − Ĉ_tᵀ B̂^{-1} Ẽ{ ĥ(T, X)/ω(T, X; λ*) } + o_p(n^{-1/2}),

where B̂ = Ẽ{ĥ(T, X) ĥᵀ(T, X)/ω²(T, X; λ*)} and Ĉ_t = Ẽ{ĥ(T, X) R_t Y/ω²(T, X; λ*)}. If model (20) is correctly specified, then λ* = 0 and hence μ̂_t,LIK is asymptotically equivalent to the first order to μ̂_t,REG = Ẽ(η̂_t) − Ĉ_tᵀ B̂^{-1} Ẽ(ξ̂), where η̂_t = R_t Y/π̂_ML(T, X), ξ̂ = ĥ(T, X)/π̂_ML(T, X), B̂ = Ẽ(ξ̂ ξ̂ᵀ), and Ĉ_t = Ẽ(ξ̂ η̂_t).

5·4. Doubly robust likelihood estimator

The estimator μ̂_t,LIK is sample-bounded and locally and intrinsically efficient marginally. Moreover, (μ̂_{0,LIK}, μ̂_{1,LIK}, ..., μ̂_{J−1,LIK}) satisfies joint intrinsic efficiency. However, μ̂_t,LIK is not doubly robust. We propose a robustification of μ̂_t,LIK such that the resulting estimator of µ_t satisfies double robustness in addition to sample boundedness and local and intrinsic efficiency, and the joint estimator satisfies joint intrinsic efficiency. For our derivation, rewrite ĥ(t, x) as

ĥ(t, x) = ĝ(t, x) − π̂_ML(t, x) Σ_{j∈T} ĝ(j, x),   (24)

where ĝ = (ĝ₁ᵀ, ĝ₂ᵀ)ᵀ, ĝ₂ is defined the same as ĥ₂, but ĝ₁ is defined as ĥ₁ with ĥ₁j(t, x) replaced by ĝ₁j(t, x) = 1{t = j} υ̂(j, x), j ∈ T.
Instead of (23), consider the system of estimating equations

0 = Ẽ{ ĝ(T, X)/ω(T, X; λ) − Σ_{t∈T} ĝ(t, X) },   (25)

i.e., 0 = Ẽ[{R_t/ω(t, X; λ) − 1} υ̂(t, X)], t ∈ T, and 0 = Ẽ{ĝ₂(T, X)/ω(T, X; λ)}. In retrospect, the vector of estimating functions ĝ(T, X)/ω(T, X; λ) − Σ_{t∈T} ĝ(t, X) in (25) equals ĥ(T, X)/ω(T, X; λ) in (23) left-multiplied by the matrix I − {Σ_{t∈T} ĝ(t, X)} λᵀ, where I is the appropriate identity matrix. Let λ̃ be a solution to (25) subject to the constraint that ω(T_i, X_i; λ) > 0 (i = 1, ..., n) and let μ̃_t,LIK = Ẽ{R_t Y/ω(t, X; λ̃)}.
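The left-multiplication identity relating (25) to (23) is purely algebraic: since λᵀĥ(T, X) = ω(T, X; λ) − π̂_ML(T, X), the cross term collapses and only ĝ(T, X)/ω − Σ_t ĝ(t, X) remains. It can be checked on a random instance, with arbitrary vectors standing in for the components of ĝ and π̂_ML:

```python
import numpy as np

rng = np.random.default_rng(5)
J, p = 3, 4                                      # number of treatments, dimension of g
g = rng.normal(size=(J, p))                      # stand-ins for g-hat(t, x), one row per t
pi_ml = rng.dirichlet(np.ones(J))                # pi-ML(t, x), summing to one over t
S = g.sum(axis=0)                                # S = sum over t of g-hat(t, x)

h = g - np.outer(pi_ml, S)                       # decomposition (24): h = g - pi_ML * S
assert np.allclose(h.sum(axis=0), 0.0)           # h still sums to zero over t

lam = 0.05 * rng.normal(size=p)                  # a small lambda keeping omega positive
Tt = 1                                           # the observed treatment value
omega = pi_ml[Tt] + lam @ h[Tt]                  # omega(T, X; lambda)

lhs = g[Tt] / omega - S                          # estimating function appearing in (25)
rhs = (np.eye(p) - np.outer(S, lam)) @ (h[Tt] / omega)   # (I - S lambda^T) applied to (23)
assert np.allclose(lhs, rhs)                     # the two estimating functions coincide
```

The identity holds for any λ with ω ≠ 0, which is why solutions of (25) inherit the consistency argument sketched for (23) while gaining double robustness.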


More information

Empirical likelihood methods in missing response problems and causal interference

Empirical likelihood methods in missing response problems and causal interference The University of Toledo The University of Toledo Digital Repository Theses and Dissertations 2017 Empirical likelihood methods in missing response problems and causal interference Kaili Ren University

More information

Causal Inference Basics

Causal Inference Basics Causal Inference Basics Sam Lendle October 09, 2013 Observed data, question, counterfactuals Observed data: n i.i.d copies of baseline covariates W, treatment A {0, 1}, and outcome Y. O i = (W i, A i,

More information

University of California, Berkeley

University of California, Berkeley University of California, Berkeley U.C. Berkeley Division of Biostatistics Working Paper Series Year 2010 Paper 260 Collaborative Targeted Maximum Likelihood For Time To Event Data Ori M. Stitelman Mark

More information

Notes on the Multivariate Normal and Related Topics

Notes on the Multivariate Normal and Related Topics Version: July 10, 2013 Notes on the Multivariate Normal and Related Topics Let me refresh your memory about the distinctions between population and sample; parameters and statistics; population distributions

More information

Statistical Methods for Handling Incomplete Data Chapter 2: Likelihood-based approach

Statistical Methods for Handling Incomplete Data Chapter 2: Likelihood-based approach Statistical Methods for Handling Incomplete Data Chapter 2: Likelihood-based approach Jae-Kwang Kim Department of Statistics, Iowa State University Outline 1 Introduction 2 Observed likelihood 3 Mean Score

More information

A note on L convergence of Neumann series approximation in missing data problems

A note on L convergence of Neumann series approximation in missing data problems A note on L convergence of Neumann series approximation in missing data problems Hua Yun Chen Division of Epidemiology & Biostatistics School of Public Health University of Illinois at Chicago 1603 West

More information

6 Pattern Mixture Models

6 Pattern Mixture Models 6 Pattern Mixture Models A common theme underlying the methods we have discussed so far is that interest focuses on making inference on parameters in a parametric or semiparametric model for the full data

More information

Fall 2017 STAT 532 Homework Peter Hoff. 1. Let P be a probability measure on a collection of sets A.

Fall 2017 STAT 532 Homework Peter Hoff. 1. Let P be a probability measure on a collection of sets A. 1. Let P be a probability measure on a collection of sets A. (a) For each n N, let H n be a set in A such that H n H n+1. Show that P (H n ) monotonically converges to P ( k=1 H k) as n. (b) For each n

More information

Shu Yang and Jae Kwang Kim. Harvard University and Iowa State University

Shu Yang and Jae Kwang Kim. Harvard University and Iowa State University Statistica Sinica 27 (2017), 000-000 doi:https://doi.org/10.5705/ss.202016.0155 DISCUSSION: DISSECTING MULTIPLE IMPUTATION FROM A MULTI-PHASE INFERENCE PERSPECTIVE: WHAT HAPPENS WHEN GOD S, IMPUTER S AND

More information

A Course in Applied Econometrics Lecture 18: Missing Data. Jeff Wooldridge IRP Lectures, UW Madison, August Linear model with IVs: y i x i u i,

A Course in Applied Econometrics Lecture 18: Missing Data. Jeff Wooldridge IRP Lectures, UW Madison, August Linear model with IVs: y i x i u i, A Course in Applied Econometrics Lecture 18: Missing Data Jeff Wooldridge IRP Lectures, UW Madison, August 2008 1. When Can Missing Data be Ignored? 2. Inverse Probability Weighting 3. Imputation 4. Heckman-Type

More information

Power and Sample Size Calculations with the Additive Hazards Model

Power and Sample Size Calculations with the Additive Hazards Model Journal of Data Science 10(2012), 143-155 Power and Sample Size Calculations with the Additive Hazards Model Ling Chen, Chengjie Xiong, J. Philip Miller and Feng Gao Washington University School of Medicine

More information

Recent Advances in the analysis of missing data with non-ignorable missingness

Recent Advances in the analysis of missing data with non-ignorable missingness Recent Advances in the analysis of missing data with non-ignorable missingness Jae-Kwang Kim Department of Statistics, Iowa State University July 4th, 2014 1 Introduction 2 Full likelihood-based ML estimation

More information

Spring 2017 Econ 574 Roger Koenker. Lecture 14 GEE-GMM

Spring 2017 Econ 574 Roger Koenker. Lecture 14 GEE-GMM University of Illinois Department of Economics Spring 2017 Econ 574 Roger Koenker Lecture 14 GEE-GMM Throughout the course we have emphasized methods of estimation and inference based on the principle

More information

A Course in Applied Econometrics Lecture 14: Control Functions and Related Methods. Jeff Wooldridge IRP Lectures, UW Madison, August 2008

A Course in Applied Econometrics Lecture 14: Control Functions and Related Methods. Jeff Wooldridge IRP Lectures, UW Madison, August 2008 A Course in Applied Econometrics Lecture 14: Control Functions and Related Methods Jeff Wooldridge IRP Lectures, UW Madison, August 2008 1. Linear-in-Parameters Models: IV versus Control Functions 2. Correlated

More information

Nuisance parameter elimination for proportional likelihood ratio models with nonignorable missingness and random truncation

Nuisance parameter elimination for proportional likelihood ratio models with nonignorable missingness and random truncation Biometrika Advance Access published October 24, 202 Biometrika (202), pp. 8 C 202 Biometrika rust Printed in Great Britain doi: 0.093/biomet/ass056 Nuisance parameter elimination for proportional likelihood

More information

Robustness of a semiparametric estimator of a copula

Robustness of a semiparametric estimator of a copula Robustness of a semiparametric estimator of a copula Gunky Kim a, Mervyn J. Silvapulle b and Paramsothy Silvapulle c a Department of Econometrics and Business Statistics, Monash University, c Caulfield

More information

STAT331. Cox s Proportional Hazards Model

STAT331. Cox s Proportional Hazards Model STAT331 Cox s Proportional Hazards Model In this unit we introduce Cox s proportional hazards (Cox s PH) model, give a heuristic development of the partial likelihood function, and discuss adaptations

More information

DISCUSSION PAPER. The Bias from Misspecification of Control Variables as Linear. L e o n a r d G o f f. November 2014 RFF DP 14-41

DISCUSSION PAPER. The Bias from Misspecification of Control Variables as Linear. L e o n a r d G o f f. November 2014 RFF DP 14-41 DISCUSSION PAPER November 014 RFF DP 14-41 The Bias from Misspecification of Control Variables as Linear L e o n a r d G o f f 1616 P St. NW Washington, DC 0036 0-38-5000 www.rff.org The Bias from Misspecification

More information

A Measure of Robustness to Misspecification

A Measure of Robustness to Misspecification A Measure of Robustness to Misspecification Susan Athey Guido W. Imbens December 2014 Graduate School of Business, Stanford University, and NBER. Electronic correspondence: athey@stanford.edu. Graduate

More information

Optimal Treatment Regimes for Survival Endpoints from a Classification Perspective. Anastasios (Butch) Tsiatis and Xiaofei Bai

Optimal Treatment Regimes for Survival Endpoints from a Classification Perspective. Anastasios (Butch) Tsiatis and Xiaofei Bai Optimal Treatment Regimes for Survival Endpoints from a Classification Perspective Anastasios (Butch) Tsiatis and Xiaofei Bai Department of Statistics North Carolina State University 1/35 Optimal Treatment

More information

(θ θ ), θ θ = 2 L(θ ) θ θ θ θ θ (θ )= H θθ (θ ) 1 d θ (θ )

(θ θ ), θ θ = 2 L(θ ) θ θ θ θ θ (θ )= H θθ (θ ) 1 d θ (θ ) Setting RHS to be zero, 0= (θ )+ 2 L(θ ) (θ θ ), θ θ = 2 L(θ ) 1 (θ )= H θθ (θ ) 1 d θ (θ ) O =0 θ 1 θ 3 θ 2 θ Figure 1: The Newton-Raphson Algorithm where H is the Hessian matrix, d θ is the derivative

More information

On the Power of Tests for Regime Switching

On the Power of Tests for Regime Switching On the Power of Tests for Regime Switching joint work with Drew Carter and Ben Hansen Douglas G. Steigerwald UC Santa Barbara May 2015 D. Steigerwald (UCSB) Regime Switching May 2015 1 / 42 Motivating

More information

Introduction to Empirical Processes and Semiparametric Inference Lecture 25: Semiparametric Models

Introduction to Empirical Processes and Semiparametric Inference Lecture 25: Semiparametric Models Introduction to Empirical Processes and Semiparametric Inference Lecture 25: Semiparametric Models Michael R. Kosorok, Ph.D. Professor and Chair of Biostatistics Professor of Statistics and Operations

More information

Accounting for Population Uncertainty in Covariance Structure Analysis

Accounting for Population Uncertainty in Covariance Structure Analysis Accounting for Population Uncertainty in Structure Analysis Boston College May 21, 2013 Joint work with: Michael W. Browne The Ohio State University matrix among observed variables are usually implied

More information

Deductive Derivation and Computerization of Semiparametric Efficient Estimation

Deductive Derivation and Computerization of Semiparametric Efficient Estimation Deductive Derivation and Computerization of Semiparametric Efficient Estimation Constantine Frangakis, Tianchen Qian, Zhenke Wu, and Ivan Diaz Department of Biostatistics Johns Hopkins Bloomberg School

More information

Generalized Linear Models

Generalized Linear Models Generalized Linear Models Advanced Methods for Data Analysis (36-402/36-608 Spring 2014 1 Generalized linear models 1.1 Introduction: two regressions So far we ve seen two canonical settings for regression.

More information

CCP Estimation. Robert A. Miller. March Dynamic Discrete Choice. Miller (Dynamic Discrete Choice) cemmap 6 March / 27

CCP Estimation. Robert A. Miller. March Dynamic Discrete Choice. Miller (Dynamic Discrete Choice) cemmap 6 March / 27 CCP Estimation Robert A. Miller Dynamic Discrete Choice March 2018 Miller Dynamic Discrete Choice) cemmap 6 March 2018 1 / 27 Criteria for Evaluating Estimators General principles to apply when assessing

More information

Propensity-Score Based Methods for Causal Inference in Observational Studies with Fixed Non-Binary Treatments

Propensity-Score Based Methods for Causal Inference in Observational Studies with Fixed Non-Binary Treatments Propensity-Score Based Methods for Causal Inference in Observational Studies with Fixed Non-Binary reatments Shandong Zhao Department of Statistics, University of California, Irvine, CA 92697 shandonm@uci.edu

More information

Bootstrapping Sensitivity Analysis

Bootstrapping Sensitivity Analysis Bootstrapping Sensitivity Analysis Qingyuan Zhao Department of Statistics, The Wharton School University of Pennsylvania May 23, 2018 @ ACIC Based on: Qingyuan Zhao, Dylan S. Small, and Bhaswar B. Bhattacharya.

More information

Conditional Empirical Likelihood Approach to Statistical Analysis with Missing Data

Conditional Empirical Likelihood Approach to Statistical Analysis with Missing Data Conditional Empirical Likelihood Approach to Statistical Analysis with Missing Data by Peisong Han A dissertation submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy

More information

arxiv: v1 [stat.me] 5 Apr 2017

arxiv: v1 [stat.me] 5 Apr 2017 Doubly Robust Inference for Targeted Minimum Loss Based Estimation in Randomized Trials with Missing Outcome Data arxiv:1704.01538v1 [stat.me] 5 Apr 2017 Iván Díaz 1 and Mark J. van der Laan 2 1 Division

More information

A Bayesian Nonparametric Approach to Monotone Missing Data in Longitudinal Studies with Informative Missingness

A Bayesian Nonparametric Approach to Monotone Missing Data in Longitudinal Studies with Informative Missingness A Bayesian Nonparametric Approach to Monotone Missing Data in Longitudinal Studies with Informative Missingness A. Linero and M. Daniels UF, UT-Austin SRC 2014, Galveston, TX 1 Background 2 Working model

More information

Using Estimating Equations for Spatially Correlated A

Using Estimating Equations for Spatially Correlated A Using Estimating Equations for Spatially Correlated Areal Data December 8, 2009 Introduction GEEs Spatial Estimating Equations Implementation Simulation Conclusion Typical Problem Assess the relationship

More information

Linear & nonlinear classifiers

Linear & nonlinear classifiers Linear & nonlinear classifiers Machine Learning Hamid Beigy Sharif University of Technology Fall 1396 Hamid Beigy (Sharif University of Technology) Linear & nonlinear classifiers Fall 1396 1 / 44 Table

More information

The BLP Method of Demand Curve Estimation in Industrial Organization

The BLP Method of Demand Curve Estimation in Industrial Organization The BLP Method of Demand Curve Estimation in Industrial Organization 9 March 2006 Eric Rasmusen 1 IDEAS USED 1. Instrumental variables. We use instruments to correct for the endogeneity of prices, the

More information

SIMPLE EXAMPLES OF ESTIMATING CAUSAL EFFECTS USING TARGETED MAXIMUM LIKELIHOOD ESTIMATION

SIMPLE EXAMPLES OF ESTIMATING CAUSAL EFFECTS USING TARGETED MAXIMUM LIKELIHOOD ESTIMATION Johns Hopkins University, Dept. of Biostatistics Working Papers 3-3-2011 SIMPLE EXAMPLES OF ESTIMATING CAUSAL EFFECTS USING TARGETED MAXIMUM LIKELIHOOD ESTIMATION Michael Rosenblum Johns Hopkins Bloomberg

More information

Nonlinear and/or Non-normal Filtering. Jesús Fernández-Villaverde University of Pennsylvania

Nonlinear and/or Non-normal Filtering. Jesús Fernández-Villaverde University of Pennsylvania Nonlinear and/or Non-normal Filtering Jesús Fernández-Villaverde University of Pennsylvania 1 Motivation Nonlinear and/or non-gaussian filtering, smoothing, and forecasting (NLGF) problems are pervasive

More information

Propensity score adjusted method for missing data

Propensity score adjusted method for missing data Graduate Theses and Dissertations Graduate College 2013 Propensity score adjusted method for missing data Minsun Kim Riddles Iowa State University Follow this and additional works at: http://lib.dr.iastate.edu/etd

More information

arxiv: v2 [stat.me] 8 Oct 2018

arxiv: v2 [stat.me] 8 Oct 2018 SENSITIVITY ANALYSIS FOR INVERSE PROBABILITY WEIGHTING ESTIMATORS VIA THE PERCENTILE BOOTSTRAP QINGYUAN ZHAO, DYLAN S. SMALL AND BHASWAR B. BHATTACHARYA arxiv:1711.1186v [stat.me] 8 Oct 018 Department

More information

Improving Efficiency of Inferences in Randomized Clinical Trials Using Auxiliary Covariates

Improving Efficiency of Inferences in Randomized Clinical Trials Using Auxiliary Covariates Improving Efficiency of Inferences in Randomized Clinical Trials Using Auxiliary Covariates Anastasios (Butch) Tsiatis Department of Statistics North Carolina State University http://www.stat.ncsu.edu/

More information

University of California, Berkeley

University of California, Berkeley University of California, Berkeley U.C. Berkeley Division of Biostatistics Working Paper Series Year 24 Paper 153 A Note on Empirical Likelihood Inference of Residual Life Regression Ying Qing Chen Yichuan

More information

Introduction to Econometrics

Introduction to Econometrics Introduction to Econometrics T H I R D E D I T I O N Global Edition James H. Stock Harvard University Mark W. Watson Princeton University Boston Columbus Indianapolis New York San Francisco Upper Saddle

More information

Economic modelling and forecasting

Economic modelling and forecasting Economic modelling and forecasting 2-6 February 2015 Bank of England he generalised method of moments Ole Rummel Adviser, CCBS at the Bank of England ole.rummel@bankofengland.co.uk Outline Classical estimation

More information

Fractional Hot Deck Imputation for Robust Inference Under Item Nonresponse in Survey Sampling

Fractional Hot Deck Imputation for Robust Inference Under Item Nonresponse in Survey Sampling Fractional Hot Deck Imputation for Robust Inference Under Item Nonresponse in Survey Sampling Jae-Kwang Kim 1 Iowa State University June 26, 2013 1 Joint work with Shu Yang Introduction 1 Introduction

More information

Notes, March 4, 2013, R. Dudley Maximum likelihood estimation: actual or supposed

Notes, March 4, 2013, R. Dudley Maximum likelihood estimation: actual or supposed 18.466 Notes, March 4, 2013, R. Dudley Maximum likelihood estimation: actual or supposed 1. MLEs in exponential families Let f(x,θ) for x X and θ Θ be a likelihood function, that is, for present purposes,

More information

Hypothesis Testing Based on the Maximum of Two Statistics from Weighted and Unweighted Estimating Equations

Hypothesis Testing Based on the Maximum of Two Statistics from Weighted and Unweighted Estimating Equations Hypothesis Testing Based on the Maximum of Two Statistics from Weighted and Unweighted Estimating Equations Takeshi Emura and Hisayuki Tsukuma Abstract For testing the regression parameter in multivariate

More information

CS 195-5: Machine Learning Problem Set 1

CS 195-5: Machine Learning Problem Set 1 CS 95-5: Machine Learning Problem Set Douglas Lanman dlanman@brown.edu 7 September Regression Problem Show that the prediction errors y f(x; ŵ) are necessarily uncorrelated with any linear function of

More information