Bounded, Efficient, and Doubly Robust Estimation with Inverse Weighting
Biometrika (2008), 94, 2. © 2008 Biometrika Trust. Printed in Great Britain. Advance Access publication on 31 July 2008.

BY Z. TAN
Department of Statistics, Rutgers University, Piscataway, New Jersey 08854, U.S.A. ztan@stat.rutgers.edu

SUMMARY

Consider the problem of estimating the mean of an outcome in the presence of missing data or estimating population average treatment effects in causal inference. A doubly robust estimator remains consistent if an outcome regression model or a propensity score model is correctly specified. We build on the nonparametric likelihood approach of Tan and propose new doubly robust estimators. These estimators have desirable properties in efficiency if the propensity score model is correctly specified, and in boundedness even if the inverse probability weights are highly variable. We compare new and existing estimators in a simulation study and find that the robustified likelihood estimators yield overall the smallest mean squared errors.

Some key words: Causal inference; Double robustness; Inverse weighting; Missing data; Nonparametric likelihood; Propensity score.

1. INTRODUCTION

Consider the problem of estimating the mean of an outcome in the presence of missing data under ignorability (Rubin, 1976). A related problem is to estimate population average treatment effects under no unmeasured confounding in causal inference (Neyman, 1923; Rubin, 1974). Such problems can be handled in two different ways. One approach is to model the mean of the outcome given covariates, called the outcome regression function, and derive an estimator based on the fitted values for observed and missing outcomes. The other approach is to model the probability of non-missingness given the covariates, called the propensity score (Rosenbaum & Rubin, 1983), and derive an estimator through inverse probability weighting of observed outcomes.
Inverse-probability-weighted estimators are central to the semiparametric theory of estimation with missing data (e.g., Tsiatis, 2006; van der Laan & Robins, 2003). The two approaches rely on different modelling assumptions and one does not necessarily dominate the other (Tan, 2007). A doubly robust approach makes use of both the outcome regression model and the propensity score model and derives an estimator that remains consistent if either of the two models is correctly specified. A prototypical doubly robust estimator is the augmented inverse-probability-weighted estimator of Robins et al. (1994). Recently, a number of alternative doubly robust estimators have been proposed. See Kang & Schafer (2007) and the related discussions. All existing doubly robust estimators are locally efficient: they attain the semiparametric variance bound, and hence are asymptotically equivalent to each other, if both the propensity score model and the outcome regression model are correctly specified. Therefore, it is important to compare doubly robust estimators in their statistical properties if only one of the models is correctly specified or if both models are misspecified.
We review various doubly robust estimators and highlight statistical criteria underlying their construction. Some estimators are intrinsically efficient: if the propensity score model is correctly specified, then each of them is asymptotically efficient among a class of augmented inverse-probability-weighted estimators that use the same fitted outcome regression function (Tan, 2006, 2007). Some estimators are improved-locally efficient: if the propensity score model is correctly specified, then they are asymptotically at least as efficient as the augmented inverse-probability-weighted estimator that uses the true propensity score and an optimally fitted outcome regression function (Rubin & van der Laan, 2008; Tan, 2008). Some estimators are population-bounded or sample-bounded: they lie within the range of all possible values or that of observed values of the outcome (Robins et al., 2007). The properties of boundedness rule out estimates outside the population or sample range even when the inverse probability weights are highly variable. We propose a robustification of the likelihood estimator of Tan (2006), named the calibrated likelihood estimator, by calibrating the coefficients in a linear, extended propensity score model. The estimator is computationally convenient, involving two steps of maximizing concave functions. Moreover, the estimator is locally and intrinsically efficient and sample-bounded, and is further improved-locally efficient if the outcome regression function is suitably estimated. No existing doubly robust estimator achieves these four properties simultaneously. We further derive a robustification of the likelihood estimator of Tan (2006), named the augmented likelihood estimator, by incorporating an augmentation term. This estimator satisfies only a weaker form of boundedness than population and sample boundedness.
We compare new and existing estimators in a simulation study and find that the calibrated and augmented likelihood estimators yield overall the smallest mean squared errors.

2. MISSING DATA PROBLEMS

2·1. Setup

Let X be a vector of covariates and Y be an outcome. The variables X are always observed, but Y may be missing. Let R be the non-missing indicator such that R = 1 or 0 if Y is observed or missing respectively. Throughout, assume that the missing data mechanism is ignorable, that is, R and Y are conditionally independent given X (Rubin, 1976). Suppose that an independent and identically distributed sample of n units is available. The observed data consist of (X_i, R_i, R_i Y_i), i = 1, ..., n. Our objective is to estimate the population mean µ = E(Y). Although this problem is simple to describe, it provides a basic setting for us to investigate methods for handling missing data.

2·2. Models

There are two different ways of postulating dimension-reduction assumptions to obtain consistent and asymptotically normal estimators of µ. One approach is to specify a parametric model for the outcome regression function m(x) = E(Y | X = x) in the form

E(Y | X) = m(X; α) = Ψ{α^T g(X)},  (1)

where Ψ is an inverse link function, g(x) is a vector of known functions including the constant 1, and α is a vector of unknown parameters. Let ˆα_OLS be the maximum quasi-likelihood estimator of α or its variant. For concreteness, fix ˆα_OLS as the estimator that solves the equation 0 = Ẽ[R{Y − m(X; α)}g(X)], where Ẽ denotes sample average. Let ˆµ_OLS = Ẽ{ ˆm_OLS(X)}, where ˆm_OLS(X) = m(X; ˆα_OLS). Under regularity conditions, if model (1) is correctly specified, then ˆµ_OLS is consistent and asymptotically normal, with asymptotic variance no greater than the semiparametric variance bound, provided that E(Y^2) < ∞.
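As a concrete numerical sketch of the outcome-regression approach (illustrative only, not the paper's code): take identity link Ψ and g(x) = (1, x)^T, fit ˆα_OLS on the complete cases, and average the fitted values over all n units. The arrays X, Y, R below are hypothetical data, with R_i = 0 marking a missing outcome.

```python
import numpy as np

def mu_ols(X, Y, R):
    # Fit m(x; alpha) = alpha^T g(x) with g(x) = (1, x)^T by least squares
    # on the complete cases, i.e. solve 0 = E_n[R{Y - m(X; alpha)}g(X)].
    G = np.column_stack([np.ones(len(X)), X])
    obs = R == 1
    alpha, *_ = np.linalg.lstsq(G[obs], Y[obs], rcond=None)
    # Average the fitted values m(X_i; alpha_hat) over ALL n units,
    # observed and missing alike.
    return float(np.mean(G @ alpha))
```

When the observed outcomes fall exactly on a line in X, the estimator recovers the average of that regression line over the full sample, regardless of the (unused) values recorded for missing outcomes.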
The other approach is to specify a parametric model for the propensity score π(x) = P(R = 1 | X = x) in the form

P(R = 1 | X) = π(X; γ) = Π{γ^T f(X)},  (2)

where Π is an inverse link function, f(x) is a vector of known functions, and γ is a vector of unknown parameters. Let ˆγ_ML be the maximum likelihood estimator of γ and hence a solution to the equation 0 = Ẽ[{R − π(X; γ)}ϱ(X; γ)f(X)], where ϱ(x; γ) = Π′{γ^T f(x)}/[π(x; γ){1 − π(x; γ)}] and Π′ is the derivative of Π. Two non-augmented inverse-probability-weighted estimators are

ˆµ_IPW = Ẽ{RY/ˆπ_ML(X)},  ˆµ_IPW,ratio = Ẽ{RY/ˆπ_ML(X)} / Ẽ{R/ˆπ_ML(X)},

where ˆπ_ML(X) = π(X; ˆγ_ML). Under regularity conditions, if model (2) is correctly specified, then ˆµ_IPW and ˆµ_IPW,ratio are consistent and asymptotically normal, with asymptotic variances no smaller than the semiparametric variance bound, provided that E{π^{-1}(X)} < ∞ and E{Y^2 π^{-1}(X)} < ∞. See Tan (2007) for a comparison between the two approaches.

2·3. Existing estimators

The estimator ˆµ_OLS is based on model (1) only, and ˆµ_IPW and ˆµ_IPW,ratio are based on model (2) only. Alternatively, a range of estimators have been proposed by using both model (1) and model (2) to gain efficiency and robustness. Many such estimators can be cast in the form

ˆµ(ˆπ, ˆm) = Ẽ[RY/ˆπ(X) − {R/ˆπ(X) − 1} ˆm(X)] = Ẽ[ ˆm(X) + R{Y − ˆm(X)}/ˆπ(X)],

where ˆπ(X) and ˆm(X) are fitted values of π(X) and m(X) respectively. See Kang & Schafer (2007), Robins et al. (2007), and Tan (2006, 2007, 2008) for related discussions. Consider the following estimators of µ, with the same choice ˆπ_ML(X) for ˆπ(X) but different choices for ˆm(X). Robins et al. (1994) proposed the estimator ˆµ_AIPW = ˆµ(ˆπ_ML, ˆm_OLS). Scharfstein et al.
(1999) suggested the estimator ˆµ_OLS,ext = ˆµ{ˆπ_ML, ˆm_ext(ˆπ_ML)} = Ẽ{ ˆm_ext(X; ˆπ_ML)}, where ˆm_ext(X; ˆπ) = m_ext{X; ˆκ(ˆπ)} and ˆκ(ˆπ) is a solution to 0 = Ẽ[R{Y − m_ext(X; κ)}{ˆπ^{-1}(X), g^T(X)}^T] for the extended outcome regression model E(Y | X) = m_ext(X; κ) = Ψ{κ_1 ˆπ^{-1}(X) + κ_2^T g(X)} with κ = (κ_1, κ_2^T)^T. Kang & Schafer (2007) considered the estimator ˆµ_WLS = ˆµ{ˆπ_ML, ˆm_WLS(ˆπ_ML)} = Ẽ{ ˆm_WLS(X; ˆπ_ML)}, where ˆm_WLS(X; ˆπ) = m{X; ˆα_WLS(ˆπ)} and ˆα_WLS(ˆπ) is a solution to 0 = Ẽ[Rˆπ^{-1}(X){Y − m(X; α)}g(X)] and hence differs from ˆα_OLS in using weight ˆπ^{-1}(X). Rubin & van der Laan (2008) proposed two related estimators

ˆµ_V = ˆµ{ˆπ_ML, ˆm_V(ˆπ_ML)},  µ̃_V = ˆµ{ˆπ_ML, m̃_V(ˆπ_ML)},

where ˆm_V(X; ˆπ) = m{X; ˆα_V(ˆπ)} and ˆα_V(ˆπ) = argmin_α Ẽ([RY/ˆπ(X) − {R/ˆπ(X) − 1}m(X; α)]^2) for the first estimator, and m̃_V(X; ˆπ) = m{X; α̃_V(ˆπ)} and α̃_V(ˆπ) = argmin_α Ẽ[{R/ˆπ(X)}{R/ˆπ(X) − 1}{Y − m(X; α)}^2] for the second estimator. The estimator α̃_V(ˆπ) is a weighted least-squares estimator using weight ˆπ^{-1}(X){ˆπ^{-1}(X) − 1}. Our notation makes explicit the dependency of ˆm_ext(ˆπ), ˆm_WLS(ˆπ), ˆm_V(ˆπ), and m̃_V(ˆπ) on ˆπ. The choice ˆπ_ML(X) for ˆπ(X) is derived under model (2), independently of model (1). A more elaborate choice can be derived under an extended propensity score model with extra linear
predictors depending on ˆm(X). Consider the model

P(R = 1 | X) = π_ext(X; ν) = Π[ν_1^T υ̂(X)/{ˆϱ_ML(X)ˆπ_ML(X)} + ν_2^T f(X)],  (3)

where ν = (ν_1^T, ν_2^T)^T, υ̂(X) = {1, ˆm(X)}^T, and ˆϱ_ML(X) = ϱ(X; ˆγ_ML). Let ˆν( ˆm) be the maximum likelihood estimator of ν and write ˆπ_ext(X; ˆm) = π_ext{X; ˆν( ˆm)}. Substitution of ˆπ_ext( ˆm_OLS) for ˆπ_ML in ˆµ_IPW yields the estimator of Rotnitzky & Robins (1995), ˆµ_IPW,ext = ˆµ{ˆπ_ext( ˆm_OLS), 0}. For ˆm = ˆm_OLS or ˆm_WLS(ˆπ_ML), substitution of ˆπ_ext( ˆm) for ˆπ_ML in ˆµ(ˆπ_ML, ˆm), but not for that within ˆm, yields the estimators

ˆµ_AIPW,ext = ˆµ{ˆπ_ext( ˆm_OLS), ˆm_OLS},  ˆµ_WLS,ext = ˆµ[ˆπ_ext{ ˆm_WLS(ˆπ_ML)}, ˆm_WLS(ˆπ_ML)],

proposed by Robins et al. in a 2008 technical report at Harvard University. In addition, they proposed ˆµ_WLS,ext2 = ˆµ(ˆπ_ext{ ˆm_WLS(ˆπ_ML)}, ˆm_WLS[ˆπ_ext{ ˆm_WLS(ˆπ_ML)}]) through a further iteration from ˆµ_WLS,ext. The targeted maximum likelihood approach of van der Laan & Rubin (2006, Sections ) is closely related to the estimators ˆµ_OLS,ext and ˆµ_IPW,ext. With ˆm_OLS and ˆπ_ML as initial fitted values, this approach leads to the estimators

ˆµ_TML = ˆµ{ˆπ_ML, ˆm_TML(ˆπ_ML)} = Ẽ{ ˆm_TML(X; ˆπ_ML)},  ˆµ_TIPW = ˆµ{ˆπ_TML( ˆm_OLS), 0},  ˆµ_TAIPW = ˆµ{ˆπ_TML( ˆm_OLS), ˆm_TML(ˆπ_ML)},

where ˆm_TML(X; ˆπ) is obtained by fitting E(Y | X) = m_ext(X; κ) with κ_2 fixed at ˆα_OLS, and ˆπ_TML(X; ˆm) is obtained by fitting P(R = 1 | X) = π_ext(X; ν) with ν_2 fixed at ˆγ_ML. The estimators ˆµ_IPW,ext and ˆµ_TIPW are similar to the two likelihood estimators of Tan (2006). The first estimator accommodates the variation of ˆγ_ML whereas the second ignores that variation.

2·4. Comparison

Consider the following criteria for evaluating estimators of µ. Note that improved local efficiency implies local efficiency, and sample boundedness implies population boundedness.

(a) Double robustness: ˆµ remains consistent if either model (1) or model (2) is correctly specified.
(b) Local efficiency: ˆµ attains the semiparametric variance bound, i.e., it is asymptotically equivalent to the first order to Ẽ[RY/π(X) − {R/π(X) − 1}m(X)], if both model (1) and model (2) are correctly specified.

(c) Improved local efficiency: ˆµ is asymptotically at least as efficient as Ẽ[RY/π(X) − {R/π(X) − 1}m(X; α)] for arbitrary α if model (2) is correctly specified.

(d) Intrinsic efficiency: ˆµ attains the minimum asymptotic variance among the class of estimators Ẽ[RY/ˆπ_ML(X) − b_1^T{R/ˆπ_ML(X) − 1}υ̂(X)] for arbitrary b_1 if model (2) is correctly specified, where υ̂(X) = {1, ˆm(X)}^T and ˆm(X) is the fitted value of m(X) used in ˆµ. Therefore, ˆµ is asymptotically at least as efficient as ˆµ_IPW, ˆµ_IPW,ratio, and ˆµ(ˆπ_ML, ˆm).

(e) Population boundedness: ˆµ lies within the range of all possible values of Y, if model (1) or model (2) or both are misspecified.

(f) Sample boundedness: ˆµ lies within the range of {Y_i : R_i = 1, i = 1, ..., n}, if model (1) or model (2) or both are misspecified.

The upper half of Table 1 presents a comparison of various estimators in Section 2·3 in terms of the foregoing criteria. See Sections 3–4 for a discussion of the likelihood and regression estimators in the lower half of Table 1.
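To make the boundedness criteria concrete, here is a minimal numerical sketch (not from the paper) of ˆµ_IPW, ˆµ_IPW,ratio, and the generic form ˆµ(ˆπ, ˆm), assuming the fitted values ˆπ(X_i) and ˆm(X_i) are supplied as arrays. The ratio version is a convex combination of the observed outcomes and hence sample-bounded, whereas the unnormalized version is not.

```python
import numpy as np

def mu_ipw(Y, R, pi_hat):
    # Unnormalized inverse-probability-weighted estimator.
    return float(np.mean(R * Y / pi_hat))

def mu_ipw_ratio(Y, R, pi_hat):
    # Normalized (ratio) version: the weights sum to 1, so the estimate
    # lies within the range of the observed Y_i (sample boundedness).
    w = R / pi_hat
    return float(np.sum(w * Y) / np.sum(w))

def mu_aipw(Y, R, pi_hat, m_hat):
    # Augmented IPW form mu_hat(pi_hat, m_hat) = E_n[m + R(Y - m)/pi].
    return float(np.mean(m_hat + R * (Y - m_hat) / pi_hat))
```

With a single small fitted propensity for an observed unit, mu_ipw can fall outside the range of the observed outcomes while mu_ipw_ratio cannot; with m_hat identically zero, mu_aipw coincides with mu_ipw.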
Table 1. Theoretical comparison of estimators. Upper half: ˆµ_TAIPW, ˆµ_TML, ˆµ_AIPW,ext, ˆµ_AIPW, ˆµ_OLS,ext, ˆµ_WLS, ˆµ_V, µ̃_V, ˆµ_IPW,ext, ˆµ_WLS,ext, ˆµ_WLS,ext2. Lower half: ˆµ_LIK,OLS, ˆµ_REG,OLS, µ̃_REG,OLS, µ̃_LIK2,OLS, µ̃_LIK2,WLS, µ̃_LIK2,V. Rows: DR, LE, IE, ILE, PB, SB, corresponding to criteria (a)–(f). (Table entries not shown.)

3. PROPOSED APPROACH

3·1. Summary

We extend the nonparametric likelihood approach of Tan (2006). The main contribution is to obtain an estimator of µ that is doubly robust, locally and intrinsically efficient, and sample-bounded simultaneously. Moreover, our approach is flexible enough to allow different choices, such as ˆm_OLS, ˆm_WLS(ˆπ_ML), and m̃_V(ˆπ_ML), for the fitted value ˆm. If ˆm = m̃_V(ˆπ_ML), then the resulting estimator is further improved-locally efficient.

3·2. Non-doubly-robust likelihood estimator

We describe the likelihood estimator of Tan (2006) in the current setup of missing data. The nonparametric likelihood of (X_i, R_i, R_i Y_i), i = 1, ..., n, is

L_1 L_2 = [∏_{i=1}^n π(X_i; γ)^{R_i} {1 − π(X_i; γ)}^{1−R_i}] [∏_{i: R_i=1} G_1({X_i, Y_i})] [∏_{i: R_i=0} G_0({X_i})],

where G_1 is the joint distribution of (X, Y) and G_0 is the marginal distribution of X. Maximizing L_1 leads to the maximum likelihood estimator ˆγ_ML. Recall that ˆm(x) is a fitted value of m(x) based on model (1) and υ̂(x) = {1, ˆm(x)}^T. Let ĥ = (ĥ_1^T, ĥ_2^T)^T, where

ĥ_1(x) = {1 − ˆπ_ML(x)} υ̂(x),  ĥ_2(x) = (∂π/∂γ)(x; ˆγ_ML).

We choose to ignore the fact that G_0 and the marginal distribution of X under G_1 are identical, and retain only the constraints ∫ ĥ(x) dG_1 = ∫ ĥ(x) dG_0, i.e.,

∫ {1 − ˆπ_ML(x)} dG_1 = ∫ {1 − ˆπ_ML(x)} dG_0,
∫ {1 − ˆπ_ML(x)} ˆm(x) dG_1 = ∫ {1 − ˆπ_ML(x)} ˆm(x) dG_0,
∫ (∂π/∂γ)(x; ˆγ_ML) dG_1 = ∫ (∂π/∂γ)(x; ˆγ_ML) dG_0.
See Kong et al. (2003) for a related formulation. The first two constraints respectively ensure that the resulting estimator of µ is consistent under correctly specified model (2) and locally efficient, whereas the third constraint accounts for the variation of ˆγ_ML such that the resulting estimator is intrinsically efficient. Furthermore, we require that G_1 be a probability measure supported on {(X_i, Y_i) : R_i = 1, i = 1, ..., n}, and hence ∫ dG_1 = 1, and G_0 be a nonnegative measure (not necessarily a probability) supported on {X_i : R_i = 0, i = 1, ..., n}. Maximizing L_2 subject to these constraints leads to the estimators

Ĝ_1({X_i, Y_i}) = n^{-1}/ω(X_i; ˆλ) if R_i = 1,  Ĝ_0({X_i}) = n^{-1}/{1 − ω(X_i; ˆλ)} if R_i = 0,

where ω(X; λ) = ˆπ_ML(X) + λ^T ĥ(X), ˆλ = argmax_λ l(λ), and

l(λ) = Ẽ[R log{ω(X; λ)} + (1 − R) log{1 − ω(X; λ)}].

The function l(λ) is finite and concave on the set {λ : ω(X_i; λ) > 0 if R_i = 1 and ω(X_i; λ) < 1 if R_i = 0, i = 1, ..., n}. Moreover, l(λ) is strictly concave and bounded from above, and hence has a unique maximum, if and only if the set

{λ ≠ 0 : λ^T ĥ(X_i) ≥ 0 if R_i = 1 and λ^T ĥ(X_i) ≤ 0 if R_i = 0, i = 1, ..., n} is empty.  (4)

See the Appendix for a proof. From our experience, ˆλ can be computed effectively by using a globally convergent optimization algorithm such as the package trust. Setting the gradient of l(λ) to 0 shows that ˆλ is a solution to

0 = Ẽ[{R − ω(X; λ)} ĥ(X) / (ω(X; λ){1 − ω(X; λ)})].  (5)

By construction, ˆλ also satisfies

1 = ∫ dĜ_1 = Ẽ{R/ω(X; ˆλ)}.  (6)

The resulting estimator of µ is

ˆµ_LIK = ∫ y dĜ_1 = Ẽ{RY/ω(X; ˆλ)}.

The estimator ˆµ_LIK is structurally similar to ˆµ_IPW,ext based on the extended model (3). The value ˆλ can be interpreted as the maximum likelihood estimator of λ under the linear, extended propensity score model P(R = 1 | X) = ω(X; λ). However, there are important differences between ˆµ_LIK and ˆµ_IPW,ext. First, ω(X_i; ˆλ) may not lie between 0 and 1 for all i = 1, ..., n.
It is only required that ω(X_i; ˆλ) > 0 if R_i = 1 and ω(X_i; ˆλ) < 1 if R_i = 0. Moreover, equation (6) automatically holds, whereas Ẽ{R/ˆπ_ext(X)} = 1 does not. By (6), the values ω(X_i; ˆλ) with R_i = 1 are bounded from below by n^{-1}, and ˆµ_LIK is sample-bounded. In contrast, ˆπ_ext(X_i) with R_i = 1 may be arbitrarily close to 0, and ˆµ_IPW,ext is not sample-bounded. Tan (2006, Theorem 4) obtained an asymptotic expansion of ˆµ_LIK, assuming that model (2) is correctly specified. Here, we provide a general asymptotic expansion of ˆµ_LIK, allowing for misspecification of model (1) and model (2). See Manski (1988) for related asymptotic theory in misspecified models. Under regularity conditions, ˆλ converges to a constant λ* in probability
with the expansion

ˆλ − λ* = ˆB^{-1} Ẽ[{R − ω(X; λ*)} ĥ(X) / (ω(X; λ*){1 − ω(X; λ*)})] + o_p(n^{-1/2}),

where

ˆB = Ẽ[{R − ω(X; λ*)}^2 ĥ(X)ĥ^T(X) / (ω^2(X; λ*){1 − ω(X; λ*)}^2)].

Moreover, a Taylor expansion of ˆµ_LIK about λ* yields

ˆµ_LIK = Ẽ{RY/ω(X; λ*)} − Ĉ^T ˆB^{-1} Ẽ[{R − ω(X; λ*)} ĥ(X) / (ω(X; λ*){1 − ω(X; λ*)})] + o_p(n^{-1/2}),  (7)

where Ĉ = Ẽ[{RY/ω^2(X; λ*)} ĥ(X)]. If model (2) is correctly specified, then λ* = 0 and hence the expansion reduces to ˆµ_LIK = ˆµ_REG + o_p(n^{-1/2}) with ˆµ_REG = Ẽ(η̂) − ˆβ^T Ẽ(ξ̂), where η̂ = RY/ˆπ_ML(X),

ξ̂ = [{R/ˆπ_ML(X) − 1} υ̂^T(X), {R − ˆπ_ML(X)} ˆϱ_ML(X) f^T(X)]^T,

ˆB = Ẽ(ξ̂ ξ̂^T), Ĉ = Ẽ(ξ̂ η̂), and ˆβ = ˆB^{-1} Ĉ is the least-squares estimator in the linear regression of η̂ on ξ̂. The estimator ˆµ_REG is locally and intrinsically efficient (Robins et al., 1995), but not doubly robust. See Section 4·5 for a further discussion.

3·3. Doubly robust likelihood estimator

The estimator ˆµ_LIK is sample-bounded and locally and intrinsically efficient. If ˆm = ˆm_V(ˆπ_ML) or m̃_V(ˆπ_ML), then ˆµ_LIK is further improved-locally efficient because it is asymptotically at least as efficient as ˆµ_V or µ̃_V, which is improved-locally efficient. However, ˆµ_LIK is not doubly robust. It may be inconsistent if model (1) is correctly specified but model (2) is misspecified. We propose a robustification of ˆµ_LIK such that it satisfies double robustness in addition to sample boundedness and local and intrinsic efficiency. We first discuss a simple version of our proposal. Consider the system of estimating equations

0 = Ẽ[{R/ω(X; λ) − 1} υ̂(X)],  (8)
0 = Ẽ[{R − ω(X; λ)} ĥ_2(X) / (ω(X; λ){1 − ω(X; λ)})],  (9)

which are equivalent to (5) except that (R − ω)/{ω(1 − ω)} is replaced by (R/ω − 1)/(1 − ˆπ_ML) in the equations associated with ĥ_1 = (1 − ˆπ_ML) υ̂. Let λ̃ be a solution to (8)–(9) subject to the constraint that ω(X_i; λ) > 0 if R_i = 1 (i = 1, ..., n) and let

µ̃_LIK = Ẽ{RY/ω(X; λ̃)}.

Note that υ̂(X) includes the constant 1 and hence Ẽ{R/ω(X; λ̃)} = 1 by (8).
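As a numerical sketch of the construction of ˆµ_LIK in Section 3·2 (illustrative only; ˆπ_ML(X_i) and the matrix of calibration functions ĥ(X_i) are assumed precomputed, and scipy's derivative-free Nelder–Mead method stands in for the trust-region algorithm mentioned in the text):

```python
import numpy as np
from scipy.optimize import minimize

def mu_lik(Y, R, pi_ml, h):
    # h: n x p matrix with rows h_hat(X_i).
    # Maximize l(lambda) = E_n[R log w + (1 - R) log(1 - w)],
    # with w_i = pi_ml_i + lambda^T h_i, over its concave domain.
    n, p = h.shape
    obs = R == 1
    def negl(lam):
        w = pi_ml + h @ lam
        if np.any(w[obs] <= 0) or np.any(w[~obs] >= 1):
            return np.inf  # outside the domain of l
        return -(np.sum(np.log(w[obs])) + np.sum(np.log(1 - w[~obs]))) / n
    lam = minimize(negl, np.zeros(p), method="Nelder-Mead").x
    w = pi_ml + h @ lam
    # mu_LIK = E_n{R Y / omega(X; lambda_hat)}, sample-bounded by (6).
    return float(np.mean(R * Y / w))
```

Starting the optimizer at λ = 0 (where ω = ˆπ_ML) guarantees a feasible starting point; the infinite penalty outside the domain keeps the search within the set on which l is finite and concave.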
Therefore, µ̃_LIK is sample-bounded in a similar manner as ˆµ_LIK is. We derive asymptotic expansions for λ̃ and µ̃_LIK, allowing for misspecification of model (1) and model (2), in parallel to those for ˆλ and ˆµ_LIK. Under regularity conditions, λ̃ converges to a constant λ̃* in probability with the expansion

λ̃ − λ̃* = B̃^{-T} Ẽ( {R/ω(X; λ̃*) − 1} υ̂(X) ; {R − ω(X; λ̃*)} ĥ_2(X) / [ω(X; λ̃*){1 − ω(X; λ̃*)}] ) + o_p(n^{-1/2}),

in which (a; b) denotes the stacked vector (a^T, b^T)^T,
and where B̃ is the 2 × 2 block matrix

B̃ = Ẽ( R ĥ_1(X) υ̂^T(X)/ω^2(X; λ̃*),  {R − ω(X; λ̃*)}^2 ĥ_1(X) ĥ_2^T(X) / [ω^2(X; λ̃*){1 − ω(X; λ̃*)}^2] ;
     R ĥ_2(X) υ̂^T(X)/ω^2(X; λ̃*),  {R − ω(X; λ̃*)}^2 ĥ_2(X) ĥ_2^T(X) / [ω^2(X; λ̃*){1 − ω(X; λ̃*)}^2] ).

Moreover, a Taylor expansion of µ̃_LIK about λ̃* yields

µ̃_LIK = Ẽ{RY/ω(X; λ̃*)} − Ĉ^T B̃^{-T} Ẽ( {R/ω(X; λ̃*) − 1} υ̂(X) ; {R − ω(X; λ̃*)} ĥ_2(X) / [ω(X; λ̃*){1 − ω(X; λ̃*)}] ) + o_p(n^{-1/2}).  (10)

If model (2) is correctly specified, then λ̃* = 0 and hence the expansion reduces to µ̃_LIK = µ̃_REG + o_p(n^{-1/2}) with µ̃_REG = Ẽ(η̂) − β̃^T Ẽ(ξ̂), where

ζ̂ = [R υ̂^T(X)/ˆπ_ML(X), {R − ˆπ_ML(X)} ˆϱ_ML(X) f^T(X)]^T,

B̃ = Ẽ(ξ̂ ζ̂^T), and β̃ = B̃^{-1} Ĉ. In this case, ˆµ_REG and µ̃_REG are asymptotically equivalent to the first order and hence so are ˆµ_LIK and µ̃_LIK. However, µ̃_REG is akin to the doubly robust regression estimator of Tan (2006). These regression estimators, unlike ˆµ_REG, satisfy double robustness in addition to local and intrinsic efficiency. The estimators ˆµ_LIK and µ̃_LIK are sample-bounded and locally and intrinsically efficient. However, µ̃_LIK, unlike ˆµ_LIK, is further doubly robust. This difference follows from the general asymptotic expansions (7) for ˆµ_LIK and (10) for µ̃_LIK. The leading terms are structurally similar to, respectively, ˆµ_REG, which is not doubly robust, and µ̃_REG, which is doubly robust. Alternatively, µ̃_LIK is doubly robust because

Ẽ{R ˆm(X)/ω(X; λ̃)} = Ẽ{ ˆm(X)}  (11)

by (8) and hence µ̃_LIK is identical to ˆµ{ω(·; λ̃), ˆm} in the typical form of doubly robust estimators. In contrast, Ẽ{R ˆm(X)/ω(X; ˆλ)} = Ẽ{ ˆm(X)} does not necessarily hold for ˆµ_LIK. We regard λ̃ as a calibration of the maximum likelihood estimator ˆλ in the linear, extended propensity score model P(R = 1 | X) = ω(X; λ) such that equation (11) holds. So far, we seem to fulfil the objective of deriving an estimator that is doubly robust, locally and intrinsically efficient, and sample-bounded. However, there remain subtle issues about the existence and computation of λ̃.
First, it is difficult to characterize conditions under which there exists a solution to (8)–(9) subject to the constraint that ω(X_i; λ) > 0 if R_i = 1 (i = 1, ..., n). Moreover, algorithms for solving nonlinear equations such as (8)–(9) may fail to locate a solution, much less all possible solutions, if any exists. It presents a further challenge to accommodate the constraint on the domain of λ. Finally, if indeed there exists no solution or multiple solutions, it remains difficult to redefine λ̃ or select λ̃ among multiple solutions. These difficulties are applicable not only to (8)–(9), but to nonlinear estimating equations in general. See Small et al. (2000) for a survey that mainly deals with multiple solutions. We now discuss a more effective version of our proposal to address the foregoing issues. Recall that ˆλ is defined as a maximizer of l(λ). Under condition (4), l(λ) is strictly concave and bounded from above and hence ˆλ exists and is unique. Consider the following two-step estimator.

(a) Compute ˆλ = (ˆλ_1^T, ˆλ_2^T)^T, partitioned according to ĥ = (ĥ_1^T, ĥ_2^T)^T.

(b) Compute λ̃_step2 = (λ̃_{1,step2}^T, ˆλ_2^T)^T, where λ̃_{1,step2} = argmax_{λ_1} κ_1(λ_1) and

κ_1(λ_1) = Ẽ[ R{log ω(X; λ_1, ˆλ_2) − log ω(X; ˆλ)}/{1 − ˆπ_ML(X)} − λ_1^T υ̂(X) ].
The function κ_1(λ_1) is finite and concave on the set {λ_1 : ω(X_i; λ_1, ˆλ_2) > 0 if R_i = 1, i = 1, ..., n}. Moreover, as shown in the Appendix, κ_1(λ_1) is strictly concave and bounded from above, and hence has a unique maximum, if and only if the set

{λ_1 ≠ 0 : λ_1^T υ̂(X_i) ≥ 0 if R_i = 1, i = 1, ..., n, and Ẽ{λ_1^T υ̂(X)} ≤ 0} is empty.  (12)

Like ˆλ in step (a), λ̃_{1,step2} in step (b) can be computed effectively by using a globally convergent optimization algorithm such as the package trust. Setting the gradient of κ_1(λ_1) to 0 shows that λ̃_{1,step2} is a solution to

0 = Ẽ[{R/ω(X; λ_1, ˆλ_2) − 1} υ̂(X)],  (13)

which is equivalent to (8) with λ_2 evaluated at ˆλ_2. In fact, we consider (13) as estimating equations and obtain κ_1(λ_1) as an objective function by integrating the right side of (13). This construction is feasible because the matrix of the partial derivatives of the right side of (13) is symmetric and negative-semidefinite. In the degenerate case where ĥ_2(X) is removed from ĥ(X), λ consists of λ_1 only and hence λ̃ and λ̃_step2 are identical. The resulting estimator of µ is

µ̃_LIK2 = Ẽ{RY/ω(X; λ̃_step2)}.

The estimator µ̃_LIK2, like µ̃_LIK, is sample-bounded and doubly robust due to, respectively, Ẽ{R/ω(X; λ̃_step2)} = 1 and Ẽ{R ˆm(X)/ω(X; λ̃_step2)} = Ẽ{ ˆm(X)} by (13). Furthermore, µ̃_LIK2 is asymptotically equivalent to the first order to ˆµ_LIK and µ̃_LIK if model (2) is correctly specified, and hence is locally and intrinsically efficient. See the Appendix for an asymptotic expansion of µ̃_LIK2, allowing for misspecification of model (1) and model (2). The foregoing development allows a general choice of the fitted value ˆm(X). The estimator µ̃_LIK2 is doubly robust, locally and intrinsically efficient, and sample-bounded. Nevertheless, different choices of ˆm(X) lead to specific versions of µ̃_LIK2 that differ beyond the four properties.
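Step (b) can be sketched numerically in the degenerate case where ĥ_2(X) is removed, so that λ consists of λ_1 only and λ̃_step2 = λ̃. Below, the matrix v holds the rows υ̂(X_i) = {1, ˆm(X_i)}, κ_1 is maximized up to an additive constant, and scipy's Nelder–Mead method stands in for the trust-region algorithm; this is an illustration under those assumptions, not the paper's implementation.

```python
import numpy as np
from scipy.optimize import minimize

def mu_lik2(Y, R, pi_ml, v):
    # v: n x p matrix with rows v_hat(X_i) = (1, m_hat(X_i)).
    # Up to an additive constant, kappa_1(lam) = E_n[R log(w)/(1 - pi_ml)
    # - lam^T v], with w = pi_ml + lam^T h1 and h1 = (1 - pi_ml) v.
    # Its gradient is the calibration equation (13): 0 = E_n[(R/w - 1) v].
    n, p = v.shape
    obs = R == 1
    h1 = (1 - pi_ml)[:, None] * v
    def negk(lam):
        w = pi_ml + h1 @ lam
        if np.any(w[obs] <= 0):
            return np.inf  # outside the domain of kappa_1
        return -(np.sum(np.log(w[obs]) / (1 - pi_ml[obs])) / n
                 - np.mean(v @ lam))
    lam = minimize(negk, np.zeros(p), method="Nelder-Mead",
                   options={"xatol": 1e-10, "fatol": 1e-12}).x
    w = pi_ml + h1 @ lam
    return float(np.mean(R * Y / w))
```

At the maximizer, (13) forces the weighted averages of 1 and ˆm(X) over the observed units to match their overall sample averages, which is exactly the calibration property (11).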
Denote by µ̃_LIK2,OLS, µ̃_LIK2,WLS, and µ̃_LIK2,V the versions of µ̃_LIK2 corresponding to ˆm = ˆm_OLS, ˆm_WLS(ˆπ_ML), and m̃_V(ˆπ_ML), and similarly denote those of ˆµ_LIK, ˆµ_REG, and µ̃_REG. The estimator µ̃_LIK2,V, unlike µ̃_LIK2,OLS and µ̃_LIK2,WLS, is further improved-locally efficient. See Table 1 for a comparison of these estimators among other estimators.

4. EXTENSIONS AND COMPARISONS

4·1. Specification of υ̂(X)

The vector υ̂(X) is so far fixed as {1, ˆm(X)}^T. However, it can be replaced throughout by a general vector of known functions of X including the constant 1, as in Tan (2006). With this extension, ˆµ_LIK and µ̃_LIK2 still have asymptotic expansions in the current forms. The two estimators are sample-bounded and intrinsically efficient. Furthermore, if

ˆm(X) = b_1^T υ̂(X) for some vector b_1,  (14)

then ˆµ_LIK is locally efficient, and µ̃_LIK2 is doubly robust and locally efficient. Condition (14) automatically holds for υ̂(X) = {1, ˆm(X)}^T with b_1 = (0, 1)^T. Consider the case where model (1) is linear with identity link Ψ. Then g(X) is an alternative choice of υ̂(X) satisfying (14). For this choice, intrinsic efficiency implies improved local efficiency and hence ˆµ_LIK and µ̃_LIK2 are improved-locally efficient. This result can also be seen from the following relationship. Suppose that ĥ_2(X) is removed from ĥ(X) throughout. Then ˆµ_REG and µ̃_REG are identical to ˆµ_V and µ̃_V respectively, which are improved-locally efficient (Tan, 2008). The estimators ˆµ_LIK and µ̃_LIK2 have increased asymptotic variances, but are still asymptotically equivalent to the first order to ˆµ_REG and µ̃_REG if model (2) is correctly specified. Therefore, the original estimators ˆµ_LIK and µ̃_LIK2 are improved-locally efficient.

4·2. Estimation of E(X) and G_1

The estimators ˆµ_LIK and µ̃_LIK2 for µ = E(Y) can be used for estimating E(X) with Y replaced by X, and similarly for estimating the expectations of functions of X. The resulting estimators have similar properties to those of ˆµ_LIK and µ̃_LIK2. Suppose that X is contained in υ̂(X) by specification. If model (2) is correctly specified, then Ẽ{RX/ω(X; ˆλ)} is asymptotically at least as efficient as Ẽ[RX/ˆπ_ML(X) − {R/ˆπ_ML(X) − 1}X] = Ẽ(X) by intrinsic efficiency, and hence asymptotically equivalent to the first order to Ẽ(X). The estimator Ẽ{RX/ω(X; λ̃_step2)}, in contrast with Ẽ{RX/ω(X; ˆλ)}, is identical to Ẽ(X) by (13), whether or not model (2) is correctly specified. Estimation of E(Y), E(X), and the expectations of functions of (X, Y) is unified in estimation of G_1 from the distributional perspective of Tan (2006). Let G̃_1,step2 be the probability measure supported on {(X_i, Y_i) : R_i = 1, i = 1, ..., n} such that, if R_i = 1, then

G̃_1,step2({X_i, Y_i}) = n^{-1}/ω(X_i; λ̃_step2).

Then Ĝ_1 and G̃_1,step2 are both estimators of G_1, supported on the completely observed data. However, G̃_1,step2 satisfies ∫ υ̂(x) dG̃_1,step2 = Ẽ{υ̂(X)}, i.e., the weighted average of υ̂(X) under G̃_1,step2 is exactly matched to the overall sample average of υ̂(X). We compare our approach with the empirical likelihood approach of Qin & Zhang (2003).
Their approach is to maximize ∏_{i: R_i=1} G_1({X_i, Y_i}) subject to the constraints that G_1 is a probability measure supported on {(X_i, Y_i) : R_i = 1, i = 1, ..., n} and ∫ â(x) dG_1 = Ẽ{â(X)}, where â(x) = {ˆπ_ML(x), ˆm(x)}^T. The maximization leads to the estimator such that, if R_i = 1, then

Ĝ_QZ({X_i, Y_i}) = n_1^{-1} / (1 + ˆλ_QZ^T [â(X_i) − Ẽ{â(X)}]),

where n_1 = Σ_{i=1}^n R_i, ˆλ_QZ = argmax_{λ_1} l_QZ(λ_1), and l_QZ(λ_1) = Ẽ{R log(1 + λ_1^T [â(X) − Ẽ{â(X)}])}. The estimator ˆµ_QZ = ∫ y dĜ_QZ is sample-bounded due to ∫ dĜ_QZ = 1, and doubly robust and locally efficient due to ∫ ˆm(x) dĜ_QZ = Ẽ{ ˆm(X)}. However, ˆµ_QZ is not intrinsically or improved-locally efficient, even in the special case where π(X) is known and substituted for ˆπ_ML(X) and ˆm_V(ˆπ_ML) or m̃_V(ˆπ_ML) is used for ˆm.

4·3. Augmentation of ˆµ_LIK

The estimator µ̃_LIK2 is derived as a robustification of ˆµ_LIK to realize double robustness and retain sample boundedness and local and intrinsic efficiency. Our method is to calibrate the estimation of λ. An alternative method for robustification is to augment ˆµ_LIK with the additional term Ẽ[{R/ω(X; ˆλ) − 1} ˆm(X)], in a similar manner to augmenting ˆµ_IPW,ext to ˆµ_AIPW,ext by Robins et al. in their 2008 technical report. The resulting estimator is doubly robust and locally and intrinsically efficient, but not sample-bounded.
Recall that ˆλ = ˆλ( ˆm) depends on ˆm and write ω̂(X; ˆm) = ω{X; ˆλ( ˆm)}. Substitution of ω̂( ˆm) for ˆπ_ext( ˆm) in various estimators in Section 2·3 leads to

ˆµ_AIPW,lik = ˆµ{ω̂( ˆm_OLS), ˆm_OLS},  ˆµ_WLS,lik = ˆµ[ω̂{ ˆm_WLS(ˆπ_ML)}, ˆm_WLS(ˆπ_ML)],
ˆµ_WLS,lik2 = ˆµ(ω̂{ ˆm_WLS(ˆπ_ML)}, ˆm_WLS[ω̂{ ˆm_WLS(ˆπ_ML)}]),  µ̃_V,lik = ˆµ[ω̂{ m̃_V(ˆπ_ML)}, m̃_V(ˆπ_ML)].

These estimators are similar to their counterparts in Section 2·3 in terms of the six properties in Table 1. The estimator ˆµ{ω̂( ˆm), ˆm} is not population-bounded or sample-bounded, whereas ˆµ_WLS,lik2 is population-bounded. Nevertheless, ˆµ{ω̂( ˆm), ˆm} is bounded in absolute value by the constant

max{|ˆm(X_i)| : i = 1, ..., n} + max{|Y_i − ˆm(X_i)| : R_i = 1, i = 1, ..., n},

due to normalization (6). In contrast, ˆµ{ˆπ_ext( ˆm), ˆm} may lie outside this range, because such a normalization does not hold for ˆπ_ext(X), as discussed in Section 3·2. Kang & Schafer (2007) and Robins et al. (2007) considered a modification of ˆµ(ˆπ, ˆm) by deliberately normalizing the weights, that is,

ˆµ_ratio(ˆπ, ˆm) = Ẽ^{-1}{R/ˆπ(X)} Ẽ( RY/ˆπ(X) − {R/ˆπ(X)}[ ˆm(X) − Ẽ{ ˆm(X)}] ) = Ẽ{ ˆm(X)} + Ẽ^{-1}{R/ˆπ(X)} Ẽ[R{Y − ˆm(X)}/ˆπ(X)].

The estimator ˆµ_ratio{ˆπ_ext( ˆm), ˆm} is bounded in absolute value by the same constant. Moreover, it is similar to ˆµ{ˆπ_ext( ˆm), ˆm} and ˆµ{ω̂( ˆm), ˆm} in terms of the six properties in Table 1. These estimators, two based on ˆπ_ext and one based on ω̂, are asymptotically equivalent to each other if model (2) is correctly specified, but may differ in various ways otherwise.

4·4. Bounded robustification of ˆµ_IPW,ext

The estimator ˆµ_AIPW,ext is doubly robust but not sample-bounded. An alternative robustification of ˆµ_IPW,ext can be derived such that it is doubly robust and sample-bounded, in a similar manner as µ̃_LIK2 is derived from ˆµ_LIK. Our method is to calibrate estimation of ν in the extended model (3). For simplicity, fix Π(z) = expit(z), i.e., {1 + exp(−z)}^{-1}.
Then ϱ(x; γ) 1 free of γ, and π ext (X; ν) reduces to Π{ν T 1 ˆυ(X)/ˆπ ML(X) + ν T 2 f(x)}. ecall that ˆν = (ˆν T 1, ˆνT 2 )T is the maximum likelihood estimator of ν and hence a solution to 0 = Ẽ[ { π ext (X; ν)} f(x) ], [ 0 = Ẽ { π ext (X; ν)} ˆυ(X) ]. (15) ˆπ ML (X) Let ν step2 = ( ν 1,step2 T, ˆνT 2 )T, ν 1,step2 = argmax ν1 J 1(ν 1 ), and [ { J 1(ν 1 ) = Ẽ ˆυ(X) ˆπ ML (X) exp ν1 T ˆπ ML (X) ˆνT 2 f(x) } ] (1 )ν1 T ˆυ(X) by integrating the right side of (17) below. The function J 1(ν 1 ), unlike l(λ) and κ 1 (λ 1 ), is finite and concave everywhere. Moreover, J 1(ν 1 ) is strictly concave and bounded from above, and hence has a unique maximum, if and only if the set {ν 1 : ν1 Tˆυ(X i) 0 if i = 1, i = 1,..., n, and Ẽ{(1 )νt 1 ˆυ(X)} 0} is empty. (16) See the Appendix for a proof. The existence condition (16) for ν 1,step2 is more demanding than (12) for λ 1,step2 in that (16) implies (12), but not necessarily vice versa. Setting the gradient of.
J₁(ν₁) to 0 shows that ν̃_1,step2 is a solution to

0 = Ẽ[{R/π_ext(X; ν₁, ν̂₂) − 1} υ̂(X)], (17)

which is equivalent to (15) with (R − π_ext) replaced by (R/π_ext − 1)π̂_ML and ν₂ evaluated at ν̂₂. The resulting estimator of µ is µ̃_IPW,ext2 = Ẽ{RY/π_ext(X; ν̃_step2)}. This estimator, like µ̃_LIK2, is doubly robust, locally and intrinsically efficient, and sample-bounded.

We compare µ̃_IPW,ext2 with the bounded, doubly robust estimator of Robins et al. (2007, Section 4·1·2). Consider the extended propensity score model

π_ext,sl(X; χ, γ) = Π(χ[m̂(X) − Ẽ{m̂(X)}] + γᵀf(X)).

Let χ̂ = χ̂(m̂) be a solution to

0 = Ẽ({R − π_ext,sl(X; χ, γ̂_ML)}[m̂(X) − Ẽ{m̂(X)}]),

and write π̂_ext,sl(X; m̂) = π_ext,sl{X; χ̂(m̂), γ̂_ML}. The estimator µ̂_IPW,ext,LS = µ̂_ratio{π̂_ext,sl(m̂), 0} is sample-bounded. Moreover, it is identical to µ̂_ratio{π̂_ext,sl(m̂), m̂} by the construction of χ̂ and hence is doubly robust and locally efficient. However, it is not intrinsically or improved-locally efficient, even in the case where γ̂_ML is replaced by the true value and m̂(X) − Ẽ{m̂(X)} in π_ext,sl(X; χ, γ) is replaced by [m̂(X) − Ẽ{m̂(X)}]/π(X).

Regression estimators
The estimators µ̂_REG and µ̃_REG are called regression estimators (Tan, 2006, 2007), with connections to survey sampling (e.g., Cochran, 1977) and Monte Carlo integration (e.g., Hammersley & Handscomb, 1964). The idea is to exploit the fact that if model (2) is correctly specified, then η̂ has mean µ and ξ̂ has mean 0 asymptotically. The estimator µ̂_REG attains the minimum asymptotic variance among the class of estimators Ẽ(η̂) − bᵀẼ(ξ̂) for arbitrary b. Moreover, µ̃_REG is asymptotically equivalent, to the first order, to µ̂_REG because both β̃ and β̂ converge to β* = E^{−1}(ξξᵀ)E(ξη) in probability. Note that Ẽ(ξ̂₂) = 0 and hence Ẽ(η̂) − bᵀẼ(ξ̂) reduces to Ẽ(η̂) − b₁ᵀẼ(ξ̂₁), where b = (b₁ᵀ, b₂ᵀ)ᵀ and ξ̂ = (ξ̂₁ᵀ, ξ̂₂ᵀ)ᵀ according to ĥ = (ĥ₁ᵀ, ĥ₂ᵀ)ᵀ.

The estimators µ̂_REG and µ̃_REG are no longer asymptotically equivalent if model (2) is misspecified. In fact, µ̃_REG is doubly robust whereas µ̂_REG is not. The estimator µ̃_REG is akin to the doubly robust regression estimator of Tan (2006), in which η̂ is defined as {υ̂ᵀ(X)/π̂_ML(X), ϱ̂_ML(X)fᵀ(X)}ᵀ. A benefit of using this version of η̂ is that the resulting matrix B̃ is symmetric and negative-semidefinite. Moreover, if {λ ≠ 0 : λᵀĥ(X_i) = 0 if R_i = 1 (i = 1, ..., n)} is empty, then B̃ is negative-definite. This symmetrization tends to stabilize the inversion of B̃ in β̃ = B̃^{−1}Ĉ and hence improve the finite-sample behavior of µ̃_REG.

A similar symmetrization can be applied to estimating equations (8)–(9). Consider the following estimating equation in place of (9):

0 = Ẽ[ {R/ω(X; λ) − 1} ĥ₂(X)/{1 − π̂_ML(X)} ]. (18)

The matrix of partial derivatives of the right sides of (8) and (18) is symmetric and negative-semidefinite. If {λ ≠ 0 : λᵀĥ(X_i) = 0 if R_i = 1 (i = 1, ..., n)} is empty, then the matrix is negative-definite. In fact, (8) and (18) are jointly equivalent to setting to 0 the gradient of

κ(λ) = Ẽ([R log{ω(X; λ)} − λᵀĥ(X)]/{1 − π̂_ML(X)}),

similarly as (13) is obtained from κ₁(λ₁). The function κ(λ) has similar properties of concavity and boundedness to those of κ₁(λ₁). Therefore, it is numerically convenient to redefine λ̃ as a maximizer of κ(λ) or, equivalently, a solution to
(8) and (18) subject to the constraint that ω(X_i; λ) > 0 if R_i = 1 (i = 1, ..., n). The resulting estimator µ̃_LIK is comparable to µ̃_LIK2 in terms of the six properties in Table 1. A limitation of the modified estimator µ̃_LIK, as compared with µ̃_LIK2, is that it is difficult to generalize µ̃_LIK, while retaining the structure of λ̃, to the setup of causal inference with non-binary, discrete treatments. See Section 5·4 for a further discussion.

5. CAUSAL INFERENCE

5·1. Setup

We now turn to causal inference in the framework of potential outcomes (Neyman, 1923; Rubin, 1974). Let X be a vector of covariates and Y be an outcome as before. Let T be a treatment variable taking values in T = {0, 1, ..., J − 1} with J ≥ 2, where 0 denotes the null treatment or placebo. For each t ∈ T, let Y_t be the potential outcome that would be observed under treatment t. We make the consistency assumption that Y = Y_t if T = t, and the no-confounding assumption that, for each t ∈ T, R_t and Y_t are conditionally independent given X, where R_t = 1{T = t}. Throughout, 1{·} denotes the indicator function. The observed data consist of independent and identically distributed (X_i, T_i, Y_i), i = 1, ..., n. Our objective is to estimate the population mean µ_t = E(Y_t) for t ∈ T. The difference µ_t − µ_0 is called the average causal effect of treatment t.

To a certain extent, this problem can be handled as J separate problems of estimating µ_t from the data (X_i, R_{t,i}, R_{t,i}Y_{t,i}), i = 1, ..., n, as in Sections 2–4. However, the estimators of µ_t obtained in this way are not jointly intrinsically efficient, and hence those of µ_t − µ_0 may be inefficient even marginally.

5·2. Models and existing estimators

Consider a parametric model for m(t, X) = E(Y | T = t, X) in the form

E(Y | T = t, X) = m(t, X; α) (t ∈ T), (19)

where m(t, x; α) is a known function and α is a vector of unknown parameters.
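As a concrete illustration (our sketch, not code from the paper): taking Ψ to be the identity in a separable specification m(t, X; α_t) = α_tᵀg(X), model (19) can be fitted arm by arm by solving 0 = Ẽ[R_t{Y − α_tᵀg(X)}g(X)], which is least squares on the units with T = t. The function and argument names below are hypothetical.

```python
import numpy as np

def fit_outcome_regressions(T, y, G, J):
    """Fit the outcome-regression model arm by arm, with
    m(t, X; alpha_t) = alpha_t' g(X): least squares on units with T = t.
    T: length-n treatment labels in {0, ..., J-1}; y: outcomes;
    G: n x p design matrix of g(X_i).  Returns n x J fitted values."""
    m_hat = np.empty((len(y), J))
    for t in range(J):
        obs = (T == t)
        alpha_t, *_ = np.linalg.lstsq(G[obs], y[obs], rcond=None)
        m_hat[:, t] = G @ alpha_t       # fitted m_hat(t, X_i) for every unit i
    return m_hat
```

Fitted values are produced for all n units under every treatment arm, which is what the augmented estimators below require for the unobserved potential outcomes.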
To focus on main ideas, assume that m(t, X; α) = Ψ{α_tᵀg(X)}, where α_t is a vector of unknown parameters and α = (α₀ᵀ, ..., α_{J−1}ᵀ)ᵀ. This specification of (19) is separable in the sense that m(t, X; α) depends on α only through α_t. By abuse of notation, treat m(t, X; α) as m(t, X; α_t). Let α̂_t,OLS be a solution to 0 = Ẽ[R_t{Y − m(t, X; α_t)}g(X)] and write m̂_OLS(t, X) = m(t, X; α̂_t,OLS).

Consider a parametric model for π(t, X) = P(T = t | X) in the form

P(T = t | X) = π(t, X; γ) (t ∈ T), (20)

where π(t, x; γ) is a known function and γ is a vector of unknown parameters. Let γ̂_ML be the maximum likelihood estimator of γ and write π̂_ML(t, X) = π(t, X; γ̂_ML). A convenient specification of (20) is the multinomial logit model

π(t, X; γ) = exp{γ_tᵀf(X)} / Σ_{j∈T} exp{γ_jᵀf(X)}, (21)

where γ = (γ₀ᵀ, γ₁ᵀ, ..., γ_{J−1}ᵀ)ᵀ with γ₀ = 0. In this case, the score equations for γ̂_ML are 0 = Ẽ[{R_t − π(t, X; γ)}f(X)] for t = 1, ..., J − 1.

To estimate µ_t, the estimators in Section 2·3 can be adopted. Replace µ̂(π̂, m̂) by

µ̂_t(π̂, m̂) = Ẽ[ R_tY/π̂(t, X) − {R_t/π̂(t, X) − 1} m̂(t, X) ],
where π̂(t, X) and m̂(t, X) are estimators of π(t, X) and m(t, X) respectively. Various choices of the two estimators are available. The estimator m̂_OLS(t, X) is a simple choice of m̂(t, X), and π̂_ML(t, X) is a simple choice of π̂(t, X). Moreover, there are iterative choices of m̂(t, X) and π̂(t, X). Let m̂_ext(t, X; π̂) = m_ext{t, X; κ̂_t(π̂)}, m̂_WLS(t, X; π̂) = m{t, X; α̂_t,WLS(π̂)}, and m̃_V(t, X; π̂) = m{t, X; α̃_t,V(π̂)}, where κ̂_t(π̂), α̂_t,WLS(π̂), and α̃_t,V(π̂) are obtained by substituting R_t, π̂(t, X), and m(t, X; α_t) for R, π̂(X), and m(X; α) throughout in κ̂(π̂), α̂_WLS(π̂), and α̃_V(π̂).

Construction of an extension to π̂_ext(m̂) seems difficult for a general specification of model (20) with J > 2. Nevertheless, the task is straightforward if the multinomial logit specification (21) is used. Consider the model

P(T = t | X) = π_ext(t, X; ν) = C^{−1}(X; ν) exp[ Σ_{j∈T} ν_{1t,j}ᵀ υ̂(j, X)/π̂_ML(j, X) + ν_{2t}ᵀ f(X) ], (22)

where ν = (ν₁ᵀ, ν₂ᵀ)ᵀ, ν₁ is the vector of ν_{1t,j} for t, j ∈ T with ν_{10,j} = 0 for j ∈ T and ν_{1t,0} = ν_{11,0} for t ≠ 0, ν₂ is the vector of ν_{2t} for t ∈ T with ν_{20} = 0, υ̂(j, X) = {1, m̂(j, X)}ᵀ, and C(X; ν) is determined by Σ_{t∈T} π_ext(t, X; ν) ≡ 1. Let ν̂(m̂) be the maximum likelihood estimator of ν and write π̂_ext(t, X; m̂) = π_ext{t, X; ν̂(m̂)}.

The foregoing choices of m̂(t, X) and π̂(t, X) can be employed in similar combinations to those of m̂(X) and π̂(X) in Section 2·3. Label the resulting estimators of µ_t accordingly. For each t ∈ T, the marginal behavior of µ̂_t can be evaluated by the criteria in Section 2·4. However, consider the following criteria for the joint behavior of (µ̂_0, µ̂_1, ..., µ̂_{J−1}). We say that a vector-valued estimator θ̂₁ is more efficient than θ̂₂ if the asymptotic variance matrix of θ̂₁ is smaller than that of θ̂₂ in the order on positive-definite matrices.

(a) Joint double robustness: (µ̂_0, µ̂_1, ..., µ̂_{J−1}) remains consistent if either model (19) or model (20) is correctly specified.
(b) Joint local efficiency: (µ̂_0, µ̂_1, ..., µ̂_{J−1}) attains the semiparametric variance bound if both model (19) and model (20) are correctly specified.

(c) Joint improved local efficiency: (µ̂_0, µ̂_1, ..., µ̂_{J−1}) is at least as efficient as {µ̂_0(α_0), µ̂_1(α_1), ..., µ̂_{J−1}(α_{J−1})} if model (20) is correctly specified, where µ̂_t(α_t) = Ẽ[R_tY/π(t, X) − {R_t/π(t, X) − 1}m(t, X; α_t)] for α_t a vector of arbitrary constants (t ∈ T).

(d) Joint intrinsic efficiency: (µ̂_0, µ̂_1, ..., µ̂_{J−1}) is at least as efficient as {µ̂_0(b_0), µ̂_1(b_1), ..., µ̂_{J−1}(b_{J−1})} if model (20) is correctly specified, where µ̂_t(b_t) = Ẽ[R_tY/π̂_ML(t, X) − b_tᵀ{R_t/π̂_ML(t, X) − 1}υ̂(t, X)] for b_t a vector of arbitrary constants (t ∈ T).

(e) Joint population boundedness: µ̂_t is population-bounded for each t ∈ T.

(f) Joint sample boundedness: µ̂_t is sample-bounded for each t ∈ T.

Joint double robustness, local efficiency, or population or sample boundedness is equivalent to the fact that µ̂_t satisfies the corresponding property for each t ∈ T. However, joint intrinsic or improved local efficiency is respectively more stringent than the fact that, for each t ∈ T, µ̂_t satisfies intrinsic or improved local efficiency.

The comparison in Table 1 remains applicable, except for one correction, if the estimators are replaced by the joint estimators of (µ_0, µ_1, ..., µ_{J−1}) and the properties are replaced by those on the joint behavior. See the sections below for a description of the likelihood and regression estimators. The correction is that none of the joint estimators satisfies joint improved local efficiency, although Table 1 is still valid regarding whether or not the estimators of µ_t satisfy improved local efficiency marginally. See Tan (2008, Section 3) for a further discussion.
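The building blocks above, the multinomial logit model (21) and the per-treatment estimator µ̂_t(π̂, m̂), can be sketched as follows. This is our illustration, not the paper's code; the names are hypothetical, and the propensity parameters are taken as given rather than fitted by maximum likelihood.

```python
import numpy as np

def multinomial_logit(gamma, F):
    """pi(t, X; gamma) under the multinomial logit model: the softmax of
    gamma_t' f(X) over t = 0, ..., J-1 with gamma_0 = 0 fixed.
    gamma: (J-1) x p coefficients; F: n x p matrix of f(X_i).
    Returns an n x J matrix of probabilities, rows summing to 1."""
    scores = np.column_stack([np.zeros(len(F)), F @ gamma.T])
    scores -= scores.max(axis=1, keepdims=True)     # numerical stabilization
    e = np.exp(scores)
    return e / e.sum(axis=1, keepdims=True)

def aipw_t(t, T, y, pi, m):
    """mu_t(pi, m) = mean[ R_t Y / pi(t,X) - {R_t/pi(t,X) - 1} m(t,X) ].
    pi and m are n x J arrays of fitted propensities and regression values."""
    rt = (T == t).astype(float)
    return np.mean(rt * y / pi[:, t] - (rt / pi[:, t] - 1.0) * m[:, t])
```

Estimating each µ_t this way treats the J arms as separate missing-data problems; the joint criteria (a)–(f) above are exactly about what is lost or preserved by doing so.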
Note that (µ̂_t,IPW,ext)_{t∈T} satisfies joint intrinsic efficiency because υ̂(j, X)/π̂_ML(j, X), j ∈ T, are simultaneously included as extra linear predictors for log{π(t, X)/π(0, X)} for each t ≠ 0 in model (22). For fixed j ≠ 0, if model (22) were specified such that log{π(t, X)/π(0, X)} = ν_{2t}ᵀf(X) if t ≠ 0, j, or ν_{1j,j}ᵀυ̂(j, X)/π̂_ML(j, X) + ν_{2j}ᵀf(X) if t = j, then µ̂_j,IPW,ext would satisfy intrinsic efficiency marginally, but (µ̂_t,IPW,ext)_{t∈T} would not satisfy joint intrinsic efficiency. See Tan (2007, Section 3) for a related discussion.

5·3. Non-doubly-robust likelihood estimator

We present the likelihood estimator of Tan (2006) in the setup of causal inference, with the extension to accommodate discrete, binary or non-binary, treatments. See a 2007 Rutgers University technical report by Tan for a further extension to deal with marginal and nested structural models.

The nonparametric likelihood of (X_i, T_i, Y_i), i = 1, ..., n, is

L₁L₂ = ∏_{i=1}^n π(T_i, X_i; γ) × ∏_{i=1}^n G_{T_i}({X_i, Y_i}),

where G_t is the joint distribution of (X, Y_t), t ∈ T. Maximizing L₁ leads to the maximum likelihood estimator γ̂_ML. Recall that m̂(t, x) is an estimator of m(t, x) based on model (19) and υ̂(t, x) = {1, m̂(t, x)}ᵀ. Let ĥ = (ĥ₁ᵀ, ĥ₂ᵀ)ᵀ and ĥ₁ = (ĥ₁₀ᵀ, ĥ₁₁ᵀ, ..., ĥ_{1,J−1}ᵀ)ᵀ, where

ĥ_{1j}(t, x) = [1{t = j} − π̂_ML(t, x)]υ̂(j, x) (j ∈ T), ĥ₂(t, x) = π_γ(t, x; γ̂_ML).

By construction, Σ_{t∈T} ĥ(t, x) ≡ 0 because Σ_{t∈T} π̂_ML(t, x) ≡ 1. We choose to ignore the fact that G_t, t ∈ T, induce the same marginal distribution of X, and retain only the constraints Σ_{t∈T} ∫ ĥ(t, x) dG_t = 0, i.e.,

0 = Σ_{t∈T} ∫ [1{t = j} − π̂_ML(t, x)] dG_t (j ∈ T),
0 = Σ_{t∈T} ∫ [1{t = j} − π̂_ML(t, x)] m̂(j, x) dG_t (j ∈ T),
0 = Σ_{t∈T} ∫ π_γ(t, x; γ̂_ML) dG_t.

Furthermore, we require that G_t be a probability measure supported on {(X_i, Y_i) : T_i = t, i = 1, ..., n} and hence ∫ dG_t = 1, t ∈ T.
Maximizing L₂ subject to these constraints leads to the estimators Ĝ_t such that if T_i = t then

Ĝ_t({X_i, Y_i}) = n^{−1}/ω(t, X_i; λ̂),

where ω(t, X; λ) = π̂_ML(t, X) + λᵀĥ(t, X), λ̂ = argmax_λ l(λ), and l(λ) = Ẽ[log{ω(T, X; λ)}]. The function l(λ) is finite and concave on the set {λ : ω(T_i, X_i; λ) > 0, i = 1, ..., n}. Moreover, l(λ) is strictly concave and bounded from above, and hence has a unique maximum, if and only if {λ ≠ 0 : λᵀĥ(T_i, X_i) = 0, i = 1, ..., n} is empty. This proposition follows in a similar manner as that concerning l(λ) and condition (4) in Section 3·2.

The estimators Ĝ_t, t ∈ T, are similar to Ĝ₁ in Section 3·2. If J = 2, π̂_ML(1, X) is identified as π̂_ML(X), ĥ₁₀ is removed from ĥ, and the constraint ∫ dG_0 = 1 is cancelled, then Ĝ₁ reduces to exactly Ĝ₁ in Section 3·2. For causal inference, Ĝ_t, t ∈ T, are equally of interest and constrained
as probability measures. In contrast, only Ĝ₁, but not Ĝ₀, is of interest and constrained as a probability measure in the missing-data setup.

Setting the gradient of l(λ) to 0 shows that λ̂ is a solution to

0 = Ẽ{ ĥ(T, X)/ω(T, X; λ) }, (23)

or equivalently 0 = Σ_{t∈T} ∫ ĥ(t, x) dĜ_t. The resulting estimator of µ_t is

µ̂_t,LIK = ∫ y dĜ_t = Ẽ{ R_tY/ω(t, X; λ̂) }.

We derive the following asymptotic expansions for λ̂ and µ̂_t,LIK, allowing for misspecification of model (19) and model (20), similarly as in Section 3·2. Under regularity conditions, λ̂ converges to a constant λ* with the expansion λ̂ − λ* = B̂^{−1}Ẽ{ĥ(T, X)/ω(T, X; λ*)} + o_p(n^{−1/2}). Moreover, µ̂_t,LIK has the expansion

µ̂_t,LIK = Ẽ{ R_tY/ω(T, X; λ*) } − Ĉ_tᵀ B̂^{−1} Ẽ{ ĥ(T, X)/ω(T, X; λ*) } + o_p(n^{−1/2}),

where B̂ = Ẽ{ĥ(T, X)ĥᵀ(T, X)/ω²(T, X; λ*)} and Ĉ_t = Ẽ{R_tY ĥ(T, X)/ω²(T, X; λ*)}. If model (20) is correctly specified, then λ* = 0 and hence µ̂_t,LIK is asymptotically equivalent, to the first order, to µ̂_t,REG = Ẽ(η̂_t) − Ĉ_tᵀ B̂^{−1} Ẽ(ξ̂), where η̂_t = R_tY/π̂_ML(T, X), ξ̂ = ĥ(T, X)/π̂_ML(T, X), B̂ = Ẽ(ξ̂ξ̂ᵀ), and Ĉ_t = Ẽ(ξ̂η̂_t).

5·4. Doubly robust likelihood estimator

The estimator µ̂_t,LIK is sample-bounded and locally and intrinsically efficient marginally. Moreover, (µ̂_{0,LIK}, µ̂_{1,LIK}, ..., µ̂_{J−1,LIK}) satisfies joint intrinsic efficiency. However, µ̂_t,LIK is not doubly robust. We propose a robustification of µ̂_t,LIK such that the resulting estimator of µ_t satisfies double robustness in addition to sample boundedness and local and intrinsic efficiency, and the joint estimator satisfies joint intrinsic efficiency. For our derivation, rewrite ĥ(t, x) as

ĥ(t, x) = δ̂(t, x) − π̂_ML(t, x) Σ_{j∈T} δ̂(j, x), (24)

where δ̂ = (δ̂₁ᵀ, δ̂₂ᵀ)ᵀ, δ̂₂ is defined the same as ĥ₂, but δ̂₁ is defined as ĥ₁ with ĥ_{1j}(t, x) replaced by δ̂_{1j}(t, x) = 1{t = j}υ̂(j, x), j ∈ T.
Instead of (23), consider the system of estimating equations

0 = Ẽ{ δ̂(T, X)/ω(T, X; λ) − Σ_{t∈T} δ̂(t, X) }, (25)

i.e., 0 = Ẽ[{R_t/ω(t, X; λ) − 1}υ̂(t, X)], t ∈ T, and 0 = Ẽ{δ̂₂(T, X)/ω(T, X; λ)}. In retrospect, the vector of estimating functions δ̂(T, X)/ω(T, X; λ) − Σ_{t∈T} δ̂(t, X) in (25) equals ĥ(T, X)/ω(T, X; λ) in (23) left-multiplied by the matrix I − Σ_{t∈T} δ̂(t, X)λᵀ, where I is the identity matrix of appropriate dimension. Let λ̃ be a solution to (25) subject to the constraint that ω(T_i, X_i; λ) > 0 (i = 1, ..., n), and let µ̃_t,LIK = Ẽ{R_tY/ω(t, X; λ̃)}.
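To make the mechanics concrete, here is a minimal numerical sketch (ours, not the paper's code) of the non-doubly-robust likelihood estimator: λ̂ maximizes l(λ) = Ẽ[log ω(T, X; λ)], whose log barrier automatically keeps ω(T_i, X_i; λ) > 0, and µ̂_t,LIK is then an inverse-ω weighted average of the observed outcomes. The doubly robust version replaces (23) by the calibrated system (25) but reuses the same ω structure. All names below are hypothetical.

```python
import numpy as np
from scipy.optimize import minimize

def lik_lambda(pi_ml_obs, h_obs):
    """Maximize l(lambda) = mean[log omega(T_i, X_i; lambda)] with
    omega = pi_ML(T,X) + lambda' h(T,X).  pi_ml_obs: length-n values
    pi_ML(T_i, X_i); h_obs: n x k matrix of h(T_i, X_i)."""
    def neg_l(lam):
        omega = pi_ml_obs + h_obs @ lam
        if np.any(omega <= 0):
            return np.inf               # outside the domain of l(lambda)
        return -np.mean(np.log(omega))
    # l is concave on its domain; a derivative-free search suffices here
    return minimize(neg_l, np.zeros(h_obs.shape[1]), method="Nelder-Mead").x

def mu_t_lik(t, T, y, omega_obs):
    """mu_t,LIK = mean{ R_t Y / omega(T_i, X_i; lambda_hat) }."""
    rt = (T == t).astype(float)
    return np.mean(rt * y / omega_obs)
```

Because each unit with T_i = t receives weight n^{−1}/ω(t, X_i; λ̂) and the weights are positive, the resulting estimate is a weighted average of observed outcomes, which is the sample-boundedness property emphasized throughout.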
More informationA Course in Applied Econometrics Lecture 14: Control Functions and Related Methods. Jeff Wooldridge IRP Lectures, UW Madison, August 2008
A Course in Applied Econometrics Lecture 14: Control Functions and Related Methods Jeff Wooldridge IRP Lectures, UW Madison, August 2008 1. Linear-in-Parameters Models: IV versus Control Functions 2. Correlated
More informationNuisance parameter elimination for proportional likelihood ratio models with nonignorable missingness and random truncation
Biometrika Advance Access published October 24, 202 Biometrika (202), pp. 8 C 202 Biometrika rust Printed in Great Britain doi: 0.093/biomet/ass056 Nuisance parameter elimination for proportional likelihood
More informationRobustness of a semiparametric estimator of a copula
Robustness of a semiparametric estimator of a copula Gunky Kim a, Mervyn J. Silvapulle b and Paramsothy Silvapulle c a Department of Econometrics and Business Statistics, Monash University, c Caulfield
More informationSTAT331. Cox s Proportional Hazards Model
STAT331 Cox s Proportional Hazards Model In this unit we introduce Cox s proportional hazards (Cox s PH) model, give a heuristic development of the partial likelihood function, and discuss adaptations
More informationDISCUSSION PAPER. The Bias from Misspecification of Control Variables as Linear. L e o n a r d G o f f. November 2014 RFF DP 14-41
DISCUSSION PAPER November 014 RFF DP 14-41 The Bias from Misspecification of Control Variables as Linear L e o n a r d G o f f 1616 P St. NW Washington, DC 0036 0-38-5000 www.rff.org The Bias from Misspecification
More informationA Measure of Robustness to Misspecification
A Measure of Robustness to Misspecification Susan Athey Guido W. Imbens December 2014 Graduate School of Business, Stanford University, and NBER. Electronic correspondence: athey@stanford.edu. Graduate
More informationOptimal Treatment Regimes for Survival Endpoints from a Classification Perspective. Anastasios (Butch) Tsiatis and Xiaofei Bai
Optimal Treatment Regimes for Survival Endpoints from a Classification Perspective Anastasios (Butch) Tsiatis and Xiaofei Bai Department of Statistics North Carolina State University 1/35 Optimal Treatment
More information(θ θ ), θ θ = 2 L(θ ) θ θ θ θ θ (θ )= H θθ (θ ) 1 d θ (θ )
Setting RHS to be zero, 0= (θ )+ 2 L(θ ) (θ θ ), θ θ = 2 L(θ ) 1 (θ )= H θθ (θ ) 1 d θ (θ ) O =0 θ 1 θ 3 θ 2 θ Figure 1: The Newton-Raphson Algorithm where H is the Hessian matrix, d θ is the derivative
More informationOn the Power of Tests for Regime Switching
On the Power of Tests for Regime Switching joint work with Drew Carter and Ben Hansen Douglas G. Steigerwald UC Santa Barbara May 2015 D. Steigerwald (UCSB) Regime Switching May 2015 1 / 42 Motivating
More informationIntroduction to Empirical Processes and Semiparametric Inference Lecture 25: Semiparametric Models
Introduction to Empirical Processes and Semiparametric Inference Lecture 25: Semiparametric Models Michael R. Kosorok, Ph.D. Professor and Chair of Biostatistics Professor of Statistics and Operations
More informationAccounting for Population Uncertainty in Covariance Structure Analysis
Accounting for Population Uncertainty in Structure Analysis Boston College May 21, 2013 Joint work with: Michael W. Browne The Ohio State University matrix among observed variables are usually implied
More informationDeductive Derivation and Computerization of Semiparametric Efficient Estimation
Deductive Derivation and Computerization of Semiparametric Efficient Estimation Constantine Frangakis, Tianchen Qian, Zhenke Wu, and Ivan Diaz Department of Biostatistics Johns Hopkins Bloomberg School
More informationGeneralized Linear Models
Generalized Linear Models Advanced Methods for Data Analysis (36-402/36-608 Spring 2014 1 Generalized linear models 1.1 Introduction: two regressions So far we ve seen two canonical settings for regression.
More informationCCP Estimation. Robert A. Miller. March Dynamic Discrete Choice. Miller (Dynamic Discrete Choice) cemmap 6 March / 27
CCP Estimation Robert A. Miller Dynamic Discrete Choice March 2018 Miller Dynamic Discrete Choice) cemmap 6 March 2018 1 / 27 Criteria for Evaluating Estimators General principles to apply when assessing
More informationPropensity-Score Based Methods for Causal Inference in Observational Studies with Fixed Non-Binary Treatments
Propensity-Score Based Methods for Causal Inference in Observational Studies with Fixed Non-Binary reatments Shandong Zhao Department of Statistics, University of California, Irvine, CA 92697 shandonm@uci.edu
More informationBootstrapping Sensitivity Analysis
Bootstrapping Sensitivity Analysis Qingyuan Zhao Department of Statistics, The Wharton School University of Pennsylvania May 23, 2018 @ ACIC Based on: Qingyuan Zhao, Dylan S. Small, and Bhaswar B. Bhattacharya.
More informationConditional Empirical Likelihood Approach to Statistical Analysis with Missing Data
Conditional Empirical Likelihood Approach to Statistical Analysis with Missing Data by Peisong Han A dissertation submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy
More informationarxiv: v1 [stat.me] 5 Apr 2017
Doubly Robust Inference for Targeted Minimum Loss Based Estimation in Randomized Trials with Missing Outcome Data arxiv:1704.01538v1 [stat.me] 5 Apr 2017 Iván Díaz 1 and Mark J. van der Laan 2 1 Division
More informationA Bayesian Nonparametric Approach to Monotone Missing Data in Longitudinal Studies with Informative Missingness
A Bayesian Nonparametric Approach to Monotone Missing Data in Longitudinal Studies with Informative Missingness A. Linero and M. Daniels UF, UT-Austin SRC 2014, Galveston, TX 1 Background 2 Working model
More informationUsing Estimating Equations for Spatially Correlated A
Using Estimating Equations for Spatially Correlated Areal Data December 8, 2009 Introduction GEEs Spatial Estimating Equations Implementation Simulation Conclusion Typical Problem Assess the relationship
More informationLinear & nonlinear classifiers
Linear & nonlinear classifiers Machine Learning Hamid Beigy Sharif University of Technology Fall 1396 Hamid Beigy (Sharif University of Technology) Linear & nonlinear classifiers Fall 1396 1 / 44 Table
More informationThe BLP Method of Demand Curve Estimation in Industrial Organization
The BLP Method of Demand Curve Estimation in Industrial Organization 9 March 2006 Eric Rasmusen 1 IDEAS USED 1. Instrumental variables. We use instruments to correct for the endogeneity of prices, the
More informationSIMPLE EXAMPLES OF ESTIMATING CAUSAL EFFECTS USING TARGETED MAXIMUM LIKELIHOOD ESTIMATION
Johns Hopkins University, Dept. of Biostatistics Working Papers 3-3-2011 SIMPLE EXAMPLES OF ESTIMATING CAUSAL EFFECTS USING TARGETED MAXIMUM LIKELIHOOD ESTIMATION Michael Rosenblum Johns Hopkins Bloomberg
More informationNonlinear and/or Non-normal Filtering. Jesús Fernández-Villaverde University of Pennsylvania
Nonlinear and/or Non-normal Filtering Jesús Fernández-Villaverde University of Pennsylvania 1 Motivation Nonlinear and/or non-gaussian filtering, smoothing, and forecasting (NLGF) problems are pervasive
More informationPropensity score adjusted method for missing data
Graduate Theses and Dissertations Graduate College 2013 Propensity score adjusted method for missing data Minsun Kim Riddles Iowa State University Follow this and additional works at: http://lib.dr.iastate.edu/etd
More informationarxiv: v2 [stat.me] 8 Oct 2018
SENSITIVITY ANALYSIS FOR INVERSE PROBABILITY WEIGHTING ESTIMATORS VIA THE PERCENTILE BOOTSTRAP QINGYUAN ZHAO, DYLAN S. SMALL AND BHASWAR B. BHATTACHARYA arxiv:1711.1186v [stat.me] 8 Oct 018 Department
More informationImproving Efficiency of Inferences in Randomized Clinical Trials Using Auxiliary Covariates
Improving Efficiency of Inferences in Randomized Clinical Trials Using Auxiliary Covariates Anastasios (Butch) Tsiatis Department of Statistics North Carolina State University http://www.stat.ncsu.edu/
More informationUniversity of California, Berkeley
University of California, Berkeley U.C. Berkeley Division of Biostatistics Working Paper Series Year 24 Paper 153 A Note on Empirical Likelihood Inference of Residual Life Regression Ying Qing Chen Yichuan
More informationIntroduction to Econometrics
Introduction to Econometrics T H I R D E D I T I O N Global Edition James H. Stock Harvard University Mark W. Watson Princeton University Boston Columbus Indianapolis New York San Francisco Upper Saddle
More informationEconomic modelling and forecasting
Economic modelling and forecasting 2-6 February 2015 Bank of England he generalised method of moments Ole Rummel Adviser, CCBS at the Bank of England ole.rummel@bankofengland.co.uk Outline Classical estimation
More informationFractional Hot Deck Imputation for Robust Inference Under Item Nonresponse in Survey Sampling
Fractional Hot Deck Imputation for Robust Inference Under Item Nonresponse in Survey Sampling Jae-Kwang Kim 1 Iowa State University June 26, 2013 1 Joint work with Shu Yang Introduction 1 Introduction
More informationNotes, March 4, 2013, R. Dudley Maximum likelihood estimation: actual or supposed
18.466 Notes, March 4, 2013, R. Dudley Maximum likelihood estimation: actual or supposed 1. MLEs in exponential families Let f(x,θ) for x X and θ Θ be a likelihood function, that is, for present purposes,
More informationHypothesis Testing Based on the Maximum of Two Statistics from Weighted and Unweighted Estimating Equations
Hypothesis Testing Based on the Maximum of Two Statistics from Weighted and Unweighted Estimating Equations Takeshi Emura and Hisayuki Tsukuma Abstract For testing the regression parameter in multivariate
More informationCS 195-5: Machine Learning Problem Set 1
CS 95-5: Machine Learning Problem Set Douglas Lanman dlanman@brown.edu 7 September Regression Problem Show that the prediction errors y f(x; ŵ) are necessarily uncorrelated with any linear function of
More information