High-Dimensional M-Estimation with Missing Outcomes: A Semi-Parametric Framework

High-Dimensional M-Estimation with Missing Outcomes: A Semi-Parametric Framework (An Overview of the Methods and the Main Results)
Abhishek Chakrabortty, University of Pennsylvania
Harvard Visit, August 20-23, 2018.

The Basic Framework and Set-Up

Variables of interest: outcome $Y \in \mathcal{Y} \subseteq \mathbb{R}$ and covariates $X \in \mathcal{X} \subseteq \mathbb{R}^p$ (possibly high dimensional, compared to the sample size $n$). The supports $\mathcal{Y}$ and $\mathcal{X}$ of $Y$ and $X$ need not be continuous.

Main issue: the outcome $Y$ may not always be observed. Let $T \in \{0, 1\}$ denote the indicator of the true $Y$ being observed. The (partly) unobserved random vector $(T, Y, X)$ is assumed to be jointly defined on a common probability space with measure $\mathbb{P}(\cdot)$.

Observables: $Z := (T, TY, X)$. Data: $\mathcal{D}_n := \{Z_i \equiv (T_i, T_i Y_i, X_i)\}_{i=1}^n$, consisting of $n$ i.i.d. realizations of $Z$ (whose distribution is defined via $\mathbb{P}(\cdot)$).

Note that the setting is explicitly allowed to be high dimensional, wherein $p$ can diverge with $n$, including regimes where $p$ is smaller than, comparable to, or much larger than $n$.

Abhishek Chakrabortty, High-D M-Estimation with Missing Outcomes: A Semi-Parametric Framework, 2/35

General Applicability of the Framework Itself

The framework is generally applicable to any missing data setting with missing outcomes $Y$ and (possibly) high dimensional covariates $X$.

Causal inference problems (via the potential outcomes framework). Here, $X$ is often called confounders (for observational studies) or adjustment variables/features (for randomized trials). Usual set-up: binary treatment (a.k.a. exposure/intervention) assignment $T \in \{0, 1\}$ and potential outcomes $\{Y^{(0)}, Y^{(1)}\}$. More generally, multi-category treatments: $T \in \{0, 1, \ldots, k\}$ (for a fixed $k \ge 1$) and corresponding potential outcomes $\{Y^{(j)}\}_{j=0}^k$. Observed outcome: $Y := \sum_{j=0}^k Y^{(j)} 1(T = j)$, or equivalently $(Y \mid T = j) \equiv Y^{(j)}$ for all $j$ [i.e. depending on $T$, we observe only one of $\{Y^{(j)}\}_{j=0}^k$].

Applicability: for each $j \in \{0, \ldots, k\}$, the set-up above is included in our framework via the representative map $(T, Y, X) \mapsto (T_j, Y^{(j)}, X)$, with $T_j := 1(T = j)$ for $0 \le j \le k$.
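
A small sketch of the representative map, assuming NumPy and simulated potential outcomes (all function and variable names here are hypothetical, chosen for illustration). For a chosen arm $j$, the multi-category problem becomes a missing-outcome problem in which $Y^{(j)}$ is observed exactly when $T_j = 1$; unobserved outcomes are stored as $T_j Y = 0$, mirroring the observable $Z = (T, TY, X)$.

```python
import numpy as np


def to_missing_data_triple(T, Y_obs, X, j):
    """Map (T, Y, X) to (T_j, T_j * Y^(j), X), where T_j = 1(T = j).

    T     : integer treatment labels in {0, ..., k}
    Y_obs : observed outcomes Y = sum_j Y^(j) 1(T = j)
    On units with T_j = 1 the observed Y equals Y^(j); elsewhere Y^(j) is
    missing and T_j * Y is stored as 0, as in the observable Z = (T, TY, X).
    """
    T_j = (T == j).astype(int)
    return T_j, T_j * Y_obs, X


# Toy example with k = 2 (three arms); column j of Y_pot holds Y^(j).
rng = np.random.default_rng(0)
n, k = 6, 2
X = rng.standard_normal((n, 3))
Y_pot = rng.standard_normal((n, k + 1))
T = np.array([0, 1, 2, 1, 0, 2])
Y_obs = Y_pot[np.arange(n), T]          # each unit reveals one potential outcome

T_1, TY_1, _ = to_missing_data_triple(T, Y_obs, X, j=1)
print(T_1)                              # indicator of receiving arm 1
```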

The Standard Fundamental Assumptions

Ignorability assumption: $T \perp\!\!\!\perp Y \mid X$. A.k.a. missing at random (MAR) in the missing data literature, and a.k.a. no unmeasured confounding (NUC) in causal inference.

Special case: $T \perp\!\!\!\perp (Y, X)$. A.k.a. missing completely at random (MCAR) in the missing data literature, and complete randomization (e.g. randomized trials) in the causal inference (CI) literature.

Positivity assumption (a.k.a. sufficient overlap in the CI literature): let $\pi(X) := \mathbb{P}(T = 1 \mid X)$ be the propensity score (PS), and let $\pi_0 := \mathbb{P}(T = 1)$. Then $\pi(\cdot)$ is uniformly bounded away from 0: $1 \ge \pi(x) \ge \delta_\pi > 0$ for all $x \in \mathcal{X}$, for some constant $\delta_\pi > 0$.

The Parameter(s) of Interest: (High Dim.) M-Estimation Parameters

Given this setting, we aim to estimate (based on the data $\mathcal{D}_n$) the following parameter $\theta_0 \in \mathbb{R}^d$ (possibly high dimensional):
$$\theta_0 := \arg\min_{\theta \in \mathbb{R}^d} \mathcal{R}(\theta), \quad \text{where } \mathcal{R}(\theta) := \mathbb{E}\{L(Y, X, \theta)\},$$
and $L(Y, X, \theta): \mathbb{R} \times \mathbb{R}^p \times \mathbb{R}^d \to \mathbb{R}$ is any loss function that is convex and differentiable in $\theta \in \mathbb{R}^d$. (The existence and uniqueness of $\theta_0$ are implicitly assumed. They are guaranteed whenever $\mathcal{R}(\theta)$ is strongly convex and coercive, which is true for most standard examples.)

Generally, this corresponds to M-estimation problems (which have a vast classical literature). We provide some useful examples later.

The key challenges are the missingness via $T$ (if not accounted for, the estimator will be inconsistent!) and the high dimensionality (of $X$ and $\theta_0$). These issues make the methods and the analyses quite tricky! We need to devise suitable methods, which involve estimation of nuisance functions (leading to error terms involving non-i.i.d. summands with complex dependencies) and careful non-asymptotic analyses.

The Parameter(s) of Interest and Problem Formulation Contd.

The case of low-d parameters is also considered, e.g. with $d = 1$ and $L(Y, X, \theta) := (Y - \theta)^2$, we have mean estimation: $\theta_0 := \mathbb{E}(Y)$. This also relates to average treatment effect (ATE) estimation in CI (and, in the process, to average conditional treatment effect estimation, which is of interest in personalized medicine).

Note: the same methodology also addresses (coordinate-wise) estimation of high-d means, e.g. when $\theta_0 := \mathbb{E}(XY)$.

Today, we mainly focus on the more challenging high dimensional M-estimation problem introduced earlier. The basic underlying principle is almost the same for the low-d mean estimation problem as well. We first provide a few (classes of) applications for this problem.
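
A one-line check of the mean example, assuming $\mathbb{E}(Y^2) < \infty$ so that the risk is finite: expanding the square and setting the derivative to zero recovers the mean, and the positive second derivative shows the minimizer is unique.

```latex
\mathbb{E}\{(Y-\theta)^2\} = \mathbb{E}(Y^2) - 2\theta\,\mathbb{E}(Y) + \theta^2,
\qquad
\frac{d}{d\theta}\,\mathbb{E}\{(Y-\theta)^2\} = -2\,\mathbb{E}(Y) + 2\theta = 0
\;\Longleftrightarrow\; \theta = \mathbb{E}(Y) =: \theta_0,
\qquad
\frac{d^2}{d\theta^2}\,\mathbb{E}\{(Y-\theta)^2\} = 2 > 0 .
```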

High Dimensional M-Estimation: A Few (Classes of) Applications

1. Standard high dimensional regression problems with (1) missing outcomes and (2) potentially misspecified (working) models. Set $d = p + 1$ and $L(Y, X, \theta) := l(Y, \vec{X}^\top\theta)$, where $\vec{X} := (1, X^\top)^\top$ and $l(\cdot, \cdot): \mathbb{R} \times \mathbb{R} \to \mathbb{R}$. Some choices of $l(\cdot, \cdot)$: (a) squared loss, $l_{sq}(u, v) := (u - v)^2$; (b) logistic loss, $l_{log}(u, v) := \log(1 + e^v) - uv$; (c) exponential loss, $l_{exp}(u, v) := e^v - uv$; etc., among many others. Note that throughout, regardless of whether any motivating working model is true or not, the definition of $\theta_0$ is completely model free.

2. Any series estimation or non-linear regression problem based on finite (but high dimensional) bases (no model) and missing $Y$. Let $\Psi(X) := \{\psi_j(X)\}_{j=1}^d$ be any set of $d$ basis functions, with $d$ possibly high dimensional. Set $L(Y, X, \theta) := l\{Y, \Psi(X)^\top\theta\}$; the choices of $l(\cdot, \cdot)$ can be kept exactly the same as above. E.g. polynomial bases: $\Psi(X) := \{1, x_j^k : 1 \le j \le p, 1 \le k \le d_0\}$ ($d_0 = 1$ corresponds to linear bases, as in the previous example).
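
The losses and the polynomial basis above can be written down directly; a minimal NumPy sketch follows (function names are hypothetical). The helper `empirical_risk` is the plain sample analogue of $\mathcal{R}(\theta)$, so it is only computable when every $Y_i$ is observed; with missing outcomes it cannot be evaluated as is.

```python
import numpy as np


def l_sq(u, v):
    """Squared loss: (u - v)^2."""
    return (u - v) ** 2


def l_log(u, v):
    """Logistic loss: log(1 + exp(v)) - u*v; logaddexp avoids overflow for large v."""
    return np.logaddexp(0.0, v) - u * v


def l_exp(u, v):
    """The slide's 'exponential' loss: exp(v) - u*v."""
    return np.exp(v) - u * v


def poly_basis(X, d0):
    """Psi(X) = {1, x_j^k : 1 <= j <= p, 1 <= k <= d0}, as an n x (1 + p*d0) array."""
    n = X.shape[0]
    powers = [X ** k for k in range(1, d0 + 1)]   # block k holds x_j^k for all j
    return np.hstack([np.ones((n, 1))] + powers)


def empirical_risk(loss, Y, Psi, theta):
    """(1/n) sum_i l{Y_i, Psi(X_i)' theta}: sample analogue of R(theta), full data only."""
    return np.mean(loss(Y, Psi @ theta))


X = np.arange(6.0).reshape(3, 2)
print(poly_basis(X, 2))
```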

Examples and Applications Contd.

A. Signal recovery in high dimensional single index models (SIMs) with elliptically symmetric design distribution (e.g. $X$ is Gaussian). Let $Y = f(\beta_0^\top X, \epsilon)$ with $f: \mathbb{R}^2 \to \mathcal{Y}$ unknown (i.e. $\beta_0$ is identifiable only up to scalar multiples) and $\epsilon \perp\!\!\!\perp X$ (i.e. $Y \perp\!\!\!\perp X \mid \beta_0^\top X$). Consider any of the regression problems introduced earlier in Example 1. Then the $\theta_0$ defined therein satisfies $(\theta_0)_{[-1]} \propto \beta_0$! This is a classic example of a misspecified parametric model defining $\theta_0$, yet $\theta_0$ directly relates to an actual semi-parametric model!

B. Applications of all these problems in causal inference: (i) linear heterogeneous treatment effects estimation, an application of the linear regression example (applied twice); (ii) average conditional treatment effects (ACTE) estimation via series estimators, an application of the series estimation example; (iii) causal inference via SIMs (signal recovery, ACTE estimation and ATE estimation), an application of the SIM example above.
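
A quick simulation of Example A, under assumptions chosen purely for illustration: fully observed $Y$, a Gaussian design, and the link $f(u, \epsilon) = \tanh(u) + 0.1\epsilon$. Least squares on $(1, X)$ is a misspecified working model, yet its slope should line up with $\beta_0$ up to scale; the sketch measures this with the cosine similarity between the two vectors.

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 20_000, 5
beta0 = np.array([1.0, 2.0, 0.0, 0.0, -1.0])
X = rng.standard_normal((n, p))           # Gaussian, hence elliptically symmetric, design
eps = rng.standard_normal(n)
Y = np.tanh(X @ beta0) + 0.1 * eps        # Y = f(beta0'X, eps), with f unknown to the fit

# Misspecified working model: least squares of Y on (1, X).
design = np.column_stack([np.ones(n), X])
theta_hat = np.linalg.lstsq(design, Y, rcond=None)[0]
slope = theta_hat[1:]                     # drop the intercept: estimate of (theta_0)_[-1]

cosine = slope @ beta0 / (np.linalg.norm(slope) * np.linalg.norm(beta0))
print(np.round(slope / slope[0], 2), round(float(cosine), 4))
```

Normalizing by the first coordinate makes the "up to scalar multiples" statement visible: the printed ratios should sit close to those of $\beta_0$.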

Some Important Facts to be Considered

It is generally necessary to account for the missingness in $Y$. The complete case estimator of $\theta_0$ will in general be inconsistent! That estimator may be consistent only if: (1) $\nabla\phi(X, \theta_0) = 0$ almost surely (for regression problems, this indicates the correct model case), and/or (2) $T \perp\!\!\!\perp (Y, X)$ (i.e. the MCAR case).

With $\theta_0$ (and $X$) being high dimensional (compared to $n$), we need some further structural constraints on $\theta_0$ to estimate it using $\mathcal{D}_n$. We assume that $\theta_0$ is $s$-sparse: $\|\theta_0\|_0 := s$ and $s \ll \min(n, d)$. Note: the sparsity requirement has an attractive (and fairly intuitive) geometric justification for all the examples we have given here.

Some notations: for all $\theta \in \mathbb{R}^d$, $\phi(X, \theta) := \mathbb{E}\{L(Y, X, \theta) \mid X\}$ for all $X$, and for any function $f(Z, \theta)$, $\nabla f(Z, \theta) := \partial f(Z, \theta)/\partial\theta \in \mathbb{R}^d$ for all $Z$.
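
A simulation sketch of the first fact, assuming a one-dimensional Gaussian covariate, linear least squares as the working model (Example 1 with the squared loss), and a MAR propensity score $\pi(x) = 0.2 + 0.6\,\mathrm{expit}(2x)$, all chosen for illustration. When the linear model is correct, the complete-case fit targets $\theta_0$; when the true regression has an extra $X^2$ term, the complete cases over-represent large $X$ and the complete-case slope drifts away from the full-data target $\theta_0 = (2, 2)$.

```python
import numpy as np


def expit(u):
    return 1.0 / (1.0 + np.exp(-u))


def ols_fit(x, y):
    """Least squares of y on (1, x); returns (intercept, slope)."""
    design = np.column_stack([np.ones_like(x), x])
    return np.linalg.lstsq(design, y, rcond=None)[0]


rng = np.random.default_rng(3)
n = 200_000
X = rng.standard_normal(n)
eps = rng.standard_normal(n)
pi = 0.2 + 0.6 * expit(2.0 * X)           # MAR: T depends on X only
T = rng.random(n) < pi                    # boolean mask of complete cases

Y_lin = 1.0 + 2.0 * X + eps               # linear model correct: theta_0 = (1, 2)
Y_quad = 1.0 + 2.0 * X + X**2 + eps       # linear model misspecified: theta_0 = (2, 2)

cc_lin = ols_fit(X[T], Y_lin[T])          # complete-case estimates
cc_quad = ols_fit(X[T], Y_quad[T])
full_quad = ols_fit(X, Y_quad)            # infeasible full-data fit, for reference
print(cc_lin, cc_quad, full_quad)
```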

Estimation of $\theta_0$: The Debiased and Doubly Robust (DDR) Approach

We next note that $\mathcal{R}(\theta) := \mathbb{E}\{L(Y, X, \theta)\} \equiv \mathbb{E}_X\{\phi(X, \theta)\}$ admits the following debiased and doubly robust (DDR) representation: for all $\theta$,
$$\mathcal{R}(\theta) \equiv \mathbb{E}_X\{\phi(X, \theta)\} + \mathbb{E}\left[\frac{T}{\pi(X)}\{L(Y, X, \theta) - \phi(X, \theta)\}\right]. \tag{1}$$

The 2nd term in (1) is simply 0, since under MAR, $\mathbb{E}[T\{L(Y, X, \theta) - \phi(X, \theta)\} \mid X] = \pi(X)\,\mathbb{E}\{L(Y, X, \theta) - \phi(X, \theta) \mid X\} = 0$; it is often called the augmented IPW (AIPW) term. It can be seen as a debiasing term (of sorts). For estimation via an empirical (and estimated) version of (1), the debiasing term plays a crucial role in facilitating the analyses and determining the properties (convergence rates) of the estimator.

The doubly robust (DR) aspect: replace $\{\phi(X, \theta), \pi(X)\}$ by any $\{\phi^*(X, \theta), \pi^*(X)\}$, and (1) continues to hold as long as one, but not necessarily both, of $\phi^*(\cdot, \cdot) = \phi(\cdot, \cdot)$ or $\pi^*(\cdot) = \pi(\cdot)$ holds.

Note that eq. (1) is a purely non-parametric identification of $\mathcal{R}(\theta)$ based on the observable $Z$ and two nuisance functions, $\pi(X)$ and $\phi(X, \theta)$, which may both be unknown but are estimable from $\mathcal{D}_n$.
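
The double robustness of (1) can be checked numerically in the mean case ($d = 1$, squared loss), where minimizing the empirical version of (1) gives the familiar AIPW average. The sketch below assumes NumPy, a simulated design with $m(x) = x + x^2$ (so $\theta_0 = \mathbb{E}(Y) = 1$), and plugs in known population nuisances rather than estimated ones: it pairs a correct $\pi$ with a wrong $m^* \equiv 0$, a correct $m$ with a wrong $\pi^* \equiv 0.5$, and finally both wrong.

```python
import numpy as np


def expit(u):
    return 1.0 / (1.0 + np.exp(-u))


def aipw_mean(T, TY, m_star, pi_star):
    """Minimiser of the empirical version of (1) for L = (Y - theta)^2:
    (1/n) sum_i [m*(X_i) + T_i / pi*(X_i) * (Y_i - m*(X_i))], using only T_i Y_i."""
    return np.mean(m_star + T * (TY - m_star) / pi_star)


rng = np.random.default_rng(4)
n = 200_000
X = rng.standard_normal(n)
Y = X + X**2 + rng.standard_normal(n)     # m(x) = x + x^2, so E(Y) = 1
pi = 0.2 + 0.6 * expit(2.0 * X)           # true propensity score
T = (rng.random(n) < pi).astype(float)
TY = T * Y                                # what is actually observed

m_true, m_wrong = X + X**2, np.zeros(n)
pi_wrong = np.full(n, 0.5)

est_pi_ok = aipw_mean(T, TY, m_wrong, pi)          # pi correct, m wrong
est_m_ok = aipw_mean(T, TY, m_true, pi_wrong)      # m correct, pi wrong
est_both_wrong = aipw_mean(T, TY, m_wrong, pi_wrong)
print(est_pi_ok, est_m_ok, est_both_wrong)
```

Only the last estimate should be visibly off target, which is the "one but not necessarily both" statement in action.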

The DDR Estimator of $\theta_0$

Given the DDR representation (1) of $\mathcal{R}(\theta)$, let $\{\hat\pi(\cdot), \hat\phi(\cdot, \cdot)\}$ be any estimators of the nuisance components $\{\pi(\cdot), \phi(\cdot, \cdot)\}$ based on $\mathcal{D}_n$. Then, we define our $L_1$-penalized DDR estimator $\hat\theta_{DDR}$ of $\theta_0$ as
$$\hat\theta_{DDR} \equiv \hat\theta_{DDR}(\lambda_n) := \arg\min_{\theta \in \mathbb{R}^d} \left\{\mathcal{L}_n^{DDR}(\theta) + \lambda_n \|\theta\|_1\right\}, \quad \text{where}$$
$$\mathcal{L}_n^{DDR}(\theta) := \frac{1}{n}\sum_{i=1}^n \left[\hat\phi(X_i, \theta) + \frac{T_i}{\hat\pi(X_i)}\left\{L(Y_i, X_i, \theta) - \hat\phi(X_i, \theta)\right\}\right],$$
and $\lambda_n \ge 0$ is the tuning parameter.

We shall assume the following basic conditions regarding the construction of $\hat\pi(\cdot)$ and $\hat\phi(\cdot, \cdot)$: $\hat\pi(\cdot)$ is obtained from the data subset $\mathcal{T}_n := \{T_i, X_i\}_{i=1}^n \subseteq \mathcal{D}_n$ only, while the other nuisance function's estimates $\{\hat\phi(X_i, \theta)\}_{i=1}^n$ are obtained in a cross-fitted manner (via sample splitting).

We assume (temporarily) that both $\hat\pi(\cdot)$ and $\hat\phi(\cdot, \cdot)$ are correctly specified. Under misspecification, the DR properties of $\hat\theta_{DDR}$ (in terms of consistency and non-sharp rates) will be discussed later.

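Simplifying Assumptions and Easy Implementation Algorithm

For simplicity of the theoretical analyses, we assume that $\phi(X, \theta)$ is differentiable in $\theta$ a.s., and that $\nabla L(Y, X, \theta)$ and $\nabla\phi(X, \theta)$ satisfy the following separable forms: for some $h(X) \in \mathbb{R}^d$ and $g(X, \theta) \in \mathbb{R}$,
$$\nabla L(Y, X, \theta) = h(X)\{Y - g(X, \theta)\} \quad \text{and} \quad \nabla\phi(X, \theta) = h(X)\{m(X) - g(X, \theta)\},$$
where $m(X) := \mathbb{E}(Y \mid X)$, and $\hat m(X)$ denotes the corresponding estimator of $m(X)$ needed to construct $\nabla\hat\phi(X, \theta)$. To obtain $\{\nabla\hat\phi(X_i, \theta)\}_{i=1}^n$ under the assumed form, one only needs the (cross-fitted) estimates $\{\hat m(X_i)\}_{i=1}^n$ of $m(\cdot)$. These assumptions hold for all the examples given before.

Implementation algorithm. $\hat\theta_{DDR}$ can be obtained simply as
$$\hat\theta_{DDR} \equiv \hat\theta_{DDR}(\lambda_n) := \arg\min_{\theta \in \mathbb{R}^d} \left\{\frac{1}{n}\sum_{i=1}^n L(\tilde Y_i, X_i, \theta) + \lambda_n \|\theta\|_1\right\},$$
where $\tilde Y_i := \hat m(X_i) + \frac{T_i}{\hat\pi(X_i)}\{Y_i - \hat m(X_i)\}$, for each $i$, is a pseudo outcome. (Under the separable form, the gradient of the unpenalized part of this objective coincides with $\nabla\mathcal{L}_n^{DDR}(\theta)$, which is why the two definitions agree.)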
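
A minimal end-to-end sketch of the implementation algorithm for the squared loss, assuming NumPy and the following illustrative choices that are not prescribed by the slides: a plain proximal-gradient (ISTA) Lasso solver, a linear Lasso fitted on complete cases as $\hat m(\cdot)$ (justified under MAR since $m(X) = \mathbb{E}(Y \mid X, T = 1)$), 2-fold cross-fitting, a known propensity score so that $\hat\pi(\cdot) := \pi(\cdot)$, and hand-picked penalty levels. Function names such as `ddr_lasso` are hypothetical. The simulated outcome contains a quadratic term, so the linear $\hat m$ is misspecified, while the correct $\hat\pi$ keeps the target $\theta_0 = (1, -1, 0.5, 0, \ldots, 0)$ recoverable.

```python
import numpy as np


def soft_threshold(z, t):
    """Elementwise soft-thresholding, the proximal map of t * ||.||_1."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)


def lasso_sq(Psi, y, lam, n_iter=1000):
    """ISTA for (1/n) * sum_i (y_i - Psi_i' theta)^2 + lam * ||theta||_1."""
    n, d = Psi.shape
    # Step size 1/L, with L = 2 * largest eigenvalue of Psi'Psi / n.
    step = 1.0 / (2.0 * np.linalg.eigvalsh(Psi.T @ Psi / n).max())
    theta = np.zeros(d)
    for _ in range(n_iter):
        grad = -2.0 / n * Psi.T @ (y - Psi @ theta)
        theta = soft_threshold(theta - step * grad, step * lam)
    return theta


def cross_fit_m(X, T, TY, lam_m, n_folds=2, seed=0):
    """Cross-fitted m_hat(X_i): for each fold, fit a linear Lasso of Y on X using
    only complete cases (T = 1) outside the fold, then predict inside the fold."""
    n = X.shape[0]
    folds = np.random.default_rng(seed).permutation(n) % n_folds
    m_hat = np.empty(n)
    for k in range(n_folds):
        train = (folds != k) & (T == 1)
        beta = lasso_sq(X[train], TY[train], lam_m)
        m_hat[folds == k] = X[folds == k] @ beta
    return m_hat


def ddr_lasso(X, T, TY, pi_hat, lam, lam_m):
    """Hypothetical helper: L1-penalised DDR fit for the squared loss via pseudo outcomes."""
    m_hat = cross_fit_m(X, T, TY, lam_m)
    # Y~_i = m_hat(X_i) + T_i / pi_hat(X_i) * (Y_i - m_hat(X_i)); only observed Y_i enter.
    y_tilde = m_hat + T * (TY - m_hat) / pi_hat
    return lasso_sq(X, y_tilde, lam), y_tilde


# Simulated MAR data. E(Y) = 0, so no intercept is used; for a Gaussian design the
# quadratic term is uncorrelated with X, so the least-squares target is theta0.
rng = np.random.default_rng(5)
n, p = 2000, 10
X = rng.standard_normal((n, p))
theta0 = np.zeros(p)
theta0[:3] = [1.0, -1.0, 0.5]
Y = X @ theta0 + 0.5 * (X[:, 0] ** 2 - 1.0) + rng.standard_normal(n)
pi = 0.3 + 0.6 / (1.0 + np.exp(-X[:, 0]))     # known PS, bounded in (0.3, 0.9)
T = (rng.random(n) < pi).astype(float)
theta_ddr, y_tilde = ddr_lasso(X, T, T * Y, pi_hat=pi, lam=0.02, lam_m=0.02)
print(np.round(theta_ddr, 2))
```

Any other correct $\hat\pi$ (for instance one fitted on $\mathcal{T}_n$ alone) could be passed as `pi_hat`; the pseudo-outcome step itself does not change.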

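Properties of $\hat\theta_{DDR}$: General Convergence Rates

For any choices of $\hat\pi(\cdot)$ and $\hat m(\cdot)$ and any realization of $\mathcal{D}_n$, choose any $\lambda_n \ge 2\|\nabla\mathcal{L}_n^{DDR}(\theta_0)\|_\infty$. Then, for any such choice, and under some basic (standard) assumptions, the DDR estimator $\hat\theta_{DDR}(\lambda_n)$ satisfies a deterministic deviation bound: with $s := \|\theta_0\|_0$,
$$\|\hat\theta_{DDR}(\lambda_n) - \theta_0\|_2 \lesssim \lambda_n\sqrt{s} \quad \text{and} \quad \|\hat\theta_{DDR}(\lambda_n) - \theta_0\|_1 \lesssim \lambda_n s.$$

Probabilistic bounds for $\|\nabla\mathcal{L}_n^{DDR}(\theta_0)\|_\infty$ (the lower bound of $\lambda_n$):
$$\|\nabla\mathcal{L}_n^{DDR}(\theta_0)\|_\infty \lesssim \|T_{0,n}\|_\infty + \|T_{\pi,n}\|_\infty + \|T_{m,n}\|_\infty + \|R_{\pi,m,n}\|_\infty,$$
where $T_{0,n}$ is the main term (a centered i.i.d. average), $T_{\pi,n}$ is the $\pi$-error term involving $\hat\pi(\cdot) - \pi(\cdot)$, $T_{m,n}$ is the $m$-error term involving $\hat m(\cdot) - m(\cdot)$, and $R_{\pi,m,n}$ is the $(\pi, m)$-error term (usually lower order) involving the product of $\hat\pi(\cdot) - \pi(\cdot)$ and $\hat m(\cdot) - m(\cdot)$. For all the terms, the analyses are fully non-asymptotic and quite nuanced, especially in order to get sharp rates for $T_{\pi,n}$ and $T_{m,n}$.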
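
For the squared loss, $\nabla\mathcal{L}_n^{DDR}(\theta) = -\frac{2}{n}\sum_i \Psi(X_i)\{\tilde Y_i - \Psi(X_i)^\top\theta\}$ in terms of the pseudo outcomes, so the admissible range $\lambda_n \ge 2\|\nabla\mathcal{L}_n^{DDR}(\theta_0)\|_\infty$ can be evaluated in a simulation where $\theta_0$ is known. A small NumPy sketch follows (hypothetical helper names; the "pseudo outcomes" in the usage part are simply simulated with i.i.d. noise). It prints the threshold next to $\sqrt{(\log d)/n}$, the order suggested by the bound on $T_{0,n}$.

```python
import numpy as np


def ddr_grad_sq(Psi, y_tilde, theta):
    """Gradient of L_n^DDR at theta for the squared loss, via pseudo outcomes:
    -(2/n) * sum_i Psi(X_i) * (Y~_i - Psi(X_i)' theta)."""
    n = Psi.shape[0]
    return -2.0 / n * Psi.T @ (y_tilde - Psi @ theta)


def lambda_lower_bound(Psi, y_tilde, theta0):
    """2 * ||grad L_n^DDR(theta0)||_inf; only computable when theta0 is known."""
    return 2.0 * np.max(np.abs(ddr_grad_sq(Psi, y_tilde, theta0)))


# Simulated stand-in: pseudo outcomes generated as Psi theta0 + i.i.d. noise.
rng = np.random.default_rng(6)
n, d = 1000, 50
Psi = rng.standard_normal((n, d))
theta0 = np.zeros(d)
theta0[0] = 1.0
y_tilde = Psi @ theta0 + rng.standard_normal(n)
lam_min = lambda_lower_bound(Psi, y_tilde, theta0)
print(lam_min, np.sqrt(np.log(d) / n))
```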

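General Convergence Rates of $\hat\theta_{DDR}$ Contd.

Basic (high level) consistency conditions on $\{\hat\pi(\cdot), \hat m(\cdot)\}$. Let $\{\hat\pi(\cdot), \hat m(\cdot)\}$ be any general and correct estimators of $\{\pi(\cdot), m(\cdot)\}$, and assume they satisfy the following pointwise convergence rates:
$$|\hat\pi(x) - \pi(x)| \lesssim_{\mathbb{P}} \delta_{n,\pi} \quad \text{and} \quad |\hat m(x) - m(x)| \lesssim_{\mathbb{P}} \xi_{n,m} \quad \forall\, x \in \mathcal{X}, \tag{2}$$
for some sequences $\delta_{n,\pi}, \xi_{n,m} \to 0$ such that $(\delta_{n,\pi} + \xi_{n,m})\log(nd) = o(1)$ and the product $\delta_{n,\pi}\,\xi_{n,m}(\log n) = o\big(\sqrt{(\log d)/n}\big)$.

Under the above set-up and the condition in (2), along with some more suitable assumptions, we then have, with high probability,
$$\|T_{0,n}\|_\infty \lesssim \sqrt{\frac{\log d}{n}}, \qquad \|T_{\pi,n}\|_\infty \lesssim \sqrt{\frac{\log d}{n}}\,\{\delta_{n,\pi}\log(nd)\}, \quad \text{and}$$
$$\|T_{m,n}\|_\infty \lesssim \sqrt{\frac{\log d}{n}}\,\{\xi_{n/2,m}\log(nd)\}, \qquad \|R_{\pi,m,n}\|_\infty \lesssim \delta_{n,\pi}\,\xi_{n/2,m}(\log n).$$
Hence, $\|\nabla\mathcal{L}_n^{DDR}(\theta_0)\|_\infty \lesssim \sqrt{(\log d)/n} + o\big(\sqrt{(\log d)/n}\big)$ with high probability.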

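Desparsification of $\hat\theta_{DDR}$ and Asymptotic Linear Expansion (ALE)

Consider $\hat\theta_{DDR}$ for the squared loss $L(Y, X, \theta) := \{Y - \Psi(X)^\top\theta\}^2$, where $\Psi(X) \in \mathbb{R}^d$ denotes any (high dimensional) vector of basis functions of $X$. Define $\Omega := \Sigma^{-1}$, where $\Sigma := \mathbb{E}\{\Psi(X)\Psi(X)^\top\}$, and let $\hat\Sigma := n^{-1}\sum_{i=1}^n \Psi(X_i)\Psi(X_i)^\top$. Let $\hat\Omega$ be any reasonable estimator of $\Omega$ (assume $\Omega$ is sparse if required). We then define the desparsified DDR estimator $\tilde\theta_{DDR}$ as
$$\tilde\theta_{DDR} := \hat\theta_{DDR} + \hat\Omega\,\frac{1}{n}\sum_{i=1}^n \{\tilde Y_i - \Psi(X_i)^\top\hat\theta_{DDR}\}\Psi(X_i),$$
where $\tilde Y_i := \hat m(X_i) + \frac{T_i}{\hat\pi(X_i)}\{Y_i - \hat m(X_i)\}$ are pseudo outcomes as before.

Under suitable assumptions, including all previous conditions and conditions on $\hat\Omega - \Omega$ and $\|I - \hat\Omega\hat\Sigma\|_{\max}$, $\tilde\theta_{DDR}$ satisfies the ALE
$$(\tilde\theta_{DDR} - \theta_0) = \frac{1}{n}\sum_{i=1}^n \Omega\,\psi_0(Z_i) + \Delta_n, \quad \text{where } \Delta_n = o_{\mathbb{P}}(n^{-1/2}),$$
and $\psi_0(Z) := [m(X) - \Psi(X)^\top\theta_0 + \{T/\pi(X)\}\{Y - m(X)\}]\Psi(X)$ with $\mathbb{E}\{\psi_0(Z)\} = 0$. This ALE is optimal and also facilitates inference.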
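
The one-step correction is a single matrix-vector update. The NumPy sketch below (hypothetical names; stand-in pseudo outcomes and a deliberately shrunken initial estimate) uses, purely as a sanity check, the low-dimensional case $d < n$ with $\hat\Omega := \hat\Sigma^{-1}$. In that case the update collapses algebraically to the least squares fit of $\tilde Y$ on $\Psi(X)$, whatever the initial estimate: $\hat\theta + \hat\Sigma^{-1}\{n^{-1}\Psi^\top\tilde Y - \hat\Sigma\hat\theta\} = \hat\Sigma^{-1}n^{-1}\Psi^\top\tilde Y$.

```python
import numpy as np


def desparsify(theta_hat, Psi, y_tilde, Omega_hat):
    """One-step desparsified estimate:
    theta_hat + Omega_hat (1/n) sum_i {Y~_i - Psi(X_i)' theta_hat} Psi(X_i)."""
    n = Psi.shape[0]
    score = Psi.T @ (y_tilde - Psi @ theta_hat) / n
    return theta_hat + Omega_hat @ score


rng = np.random.default_rng(7)
n, d = 500, 5
Psi = rng.standard_normal((n, d))
# Stand-in pseudo outcomes (in practice: m_hat + T / pi_hat * (Y - m_hat)).
y_tilde = Psi @ np.array([1.0, 0.0, 0.0, -0.5, 0.0]) + rng.standard_normal(n)
theta_hat = np.array([0.8, 0.0, 0.0, -0.3, 0.0])   # a shrunken, Lasso-like initial estimate

# Illustrative special case d < n: Omega_hat is the inverse sample Gram matrix.
Omega_hat = np.linalg.inv(Psi.T @ Psi / n)
theta_tilde = desparsify(theta_hat, Psi, y_tilde, Omega_hat)
theta_ols = np.linalg.lstsq(Psi, y_tilde, rcond=None)[0]
print(np.round(theta_tilde, 3), np.round(theta_ols, 3))
```

In the genuinely high dimensional case, $\hat\Sigma$ is singular and a sparse precision estimator must take the place of the inverse.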

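The Desparsified DDR Estimator: A Sketch of the Analyses and Rates

The error $\Delta_n$ can be decomposed as $\Delta_n = \Delta_{n,1} + \Delta_{n,2} + \Delta_{n,3}$, where
$$\Delta_{n,1} := (\hat\Omega - \Omega)\frac{1}{n}\sum_{i=1}^n \psi_0(Z_i), \quad \Delta_{n,2} := (I_d - \hat\Omega\hat\Sigma)(\hat\theta_{DDR} - \theta_0), \quad \Delta_{n,3} := \hat\Omega(T_{\pi,n} + T_{m,n} + R_{\pi,m,n}).$$

Assume the basic convergence conditions (2) for $\{\hat\pi(\cdot), \hat m(\cdot)\}$, and that $\|\hat\Omega - \Omega\| = O_{\mathbb{P}}(a_n)$ and $\|I - \hat\Omega\hat\Sigma\|_{\max} = O_{\mathbb{P}}(b_n)$ for some $a_n, b_n = o(1)$, with $\|\Omega\| = O(1)$. Then, with high probability, we have:
$$\|\Delta_{n,1}\|_\infty \lesssim a_n\sqrt{\frac{\log d}{n}}, \qquad \|\Delta_{n,2}\|_\infty \lesssim b_n\, s\sqrt{\frac{\log d}{n}}, \quad \text{and}$$
$$\|\Delta_{n,3}\|_\infty \lesssim \sqrt{\frac{\log d}{n}}\,\log(nd)\{\delta_{n,\pi} + \xi_{n/2,m}\} + \delta_{n,\pi}\,\xi_{n/2,m}\log(nd).$$
Thus, under suitable assumptions on the rates, $\|\Delta_n\|_\infty = o_{\mathbb{P}}(n^{-1/2})$.

Choose $\hat\Omega$ to be any standard (sparse) precision matrix estimator, e.g. the node-wise Lasso estimator. Here, $a_n = s_\Omega\sqrt{(\log d)/n}$ and $b_n = \sqrt{(\log d)/n}$ under suitable conditions, with $s_\Omega := \max_{1 \le j \le d}\|\Omega_j\|_0$.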
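
A compact sketch of the node-wise Lasso in the form commonly used for desparsification: regress each basis column on the others, and form row $j$ of $\hat\Omega$ as $(e_j - \hat\gamma_j)^\top/\hat\tau_j^2$, with $\hat\tau_j^2 := n^{-1}\|\Psi_j - \Psi_{-j}\hat\gamma_j\|_2^2 + \lambda\|\hat\gamma_j\|_1$. It assumes NumPy, a simple ISTA solver, and one common penalty $\lambda$ of order $\sqrt{(\log d)/n}$ for every node; these are illustrative choices, not necessarily those behind the stated rates. With $\lambda = 0$ and $d < n$, the construction reproduces $\hat\Sigma^{-1}$ exactly (by the partitioned-inverse formula), which gives a convenient correctness check.

```python
import numpy as np


def lasso_ls(A, b, lam, n_iter=3000):
    """ISTA for (1/(2n)) ||b - A g||^2 + lam * ||g||_1 (note the 1/2 scaling)."""
    n = A.shape[0]
    step = 1.0 / np.linalg.eigvalsh(A.T @ A / n).max()
    g = np.zeros(A.shape[1])
    for _ in range(n_iter):
        z = g - step * (A.T @ (A @ g - b) / n)
        g = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)
    return g


def nodewise_lasso(Psi, lam):
    """Node-wise Lasso estimate of Omega = Sigma^{-1}, row by row."""
    n, d = Psi.shape
    Omega_hat = np.zeros((d, d))
    for j in range(d):
        others = np.delete(np.arange(d), j)
        gamma = lasso_ls(Psi[:, others], Psi[:, j], lam)      # regress column j on the rest
        resid = Psi[:, j] - Psi[:, others] @ gamma
        tau2 = resid @ resid / n + lam * np.abs(gamma).sum()
        row = np.zeros(d)
        row[j] = 1.0
        row[others] = -gamma
        Omega_hat[j] = row / tau2
    return Omega_hat


rng = np.random.default_rng(8)
n, d = 500, 5
Psi = rng.standard_normal((n, d))
Omega_sparse = nodewise_lasso(Psi, lam=np.sqrt(np.log(d) / n))
print(np.round(Omega_sparse, 2))
```

The resulting $\hat\Omega$ is generally not symmetric; that is expected for this construction and does not affect its use in the one-step correction.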

The DR Aspect: General Convergence Rates (under Misspecification)

Finally, let $\{\hat\pi(\cdot), \hat m(\cdot)\}$ be any general but potentially misspecified estimators with targets $\{\pi^*(\cdot), m^*(\cdot)\}$, so that either $\pi^*(\cdot) = \pi(\cdot)$ or $m^*(\cdot) = m(\cdot)$, but not necessarily both. Assume the same pointwise convergence conditions and rates $(\delta_{n,\pi}, \xi_{n,m})$ for $\{\hat\pi(\cdot), \hat m(\cdot)\}$ as in (2), but now with $\{\pi(\cdot), m(\cdot)\}$ therein replaced by $\{\pi^*(\cdot), m^*(\cdot)\}$. Under some suitable assumptions, we have, with high probability,
$$\|T_{0,n}\|_\infty + \|T_{\pi,n}\|_\infty + \|T_{m,n}\|_\infty \lesssim \sqrt{\frac{\log d}{n}}\left\{1 + 1_{(\pi^*, m^*) \ne (\pi, m)} + o(1)\right\} \quad \text{and}$$
$$\|R_{\pi,m,n}\|_\infty \lesssim \left\{\delta_{n,\pi}\,1_{(m^* \ne m)} + \xi_{n/2,m}\,1_{(\pi^* \ne \pi)} + \delta_{n,\pi}\,\xi_{n/2,m}\right\}(\log n).$$

Note that the 2nd and/or 3rd terms (depending on which estimator is misspecified) will now also contribute to the rate $\sqrt{(\log d)/n}$. The 4th term is $o(1)$ but no longer ignorable (and may be slower). Regardless, this establishes general convergence rates and the DR property of $\hat\theta_{DDR}$ under possible misspecification of $\{\hat\pi(\cdot), \hat m(\cdot)\}$. For the 4th term, sharper rates need a case-by-case analysis.

Choices of the Nuisance Component Estimators $\hat\pi(\cdot)$ and $\hat m(\cdot)$

Note: our theory holds generally for any choices of $\hat\pi(\cdot)$ and $\hat m(\cdot)$ under mild conditions (provided they are both correct estimators). Under misspecification, consistency and general non-sharp rates are also established. Sharp rates need case-by-case analyses.

Below, we provide only some choices of $\hat\pi(\cdot)$ and $\hat m(\cdot)$ that may be used to implement our general theory and methods for $\hat\theta_{DDR}$.

Choices of $\hat\pi(\cdot)$: we consider 2 (classes of) choices (these choices may also be used to implement the naive IPW type estimator).

Choices of $\hat m(\cdot)$: first note that $m(X) := \mathbb{E}(Y \mid X) \equiv \mathbb{E}(Y \mid X, T = 1)$ (under MAR). We consider 2 (classes of) choices of $\hat m(\cdot)$ as well.

For both $\hat\pi(\cdot)$ and $\hat m(\cdot)$, we consider estimators from two families: (extended) parametric families, and semi-parametric single index families.

Choices of $\widehat{\pi}(\cdot)$: (Extended) Parametric Families

If $\pi(\cdot)$ is known, we set $\widehat{\pi}(\cdot) := \pi(\cdot)$. Otherwise, we estimate $\pi(\cdot)$ via two (classes of) choices of $\widehat{\pi}(\cdot)$ (each assumed to be "correct").

(Extended) parametric family: $\pi(x) = g\{\alpha'\Psi(x)\}$, where $g(\cdot) \in [0, 1]$ is a known function [e.g. $g_{\mathrm{expit}}(u) := \exp(u)/\{1 + \exp(u)\}$], $\Psi(X) := \{\psi_k(X)\}_{k=1}^K$ is any set of $K$ basis functions (with $K$ possibly $\gg n$), and $\alpha \in \mathbb{R}^K$ is an unknown (sparse) parameter vector. Example: $\Psi(X)$ may correspond to the polynomial bases of $X$ up to any fixed degree $k$. Note: the special case of linear bases ($k = 1$) includes all standard parametric regression models. Further, the case of $\pi(\cdot)$ = constant (but unknown), i.e. MCAR, is also included.

Estimator: we set $\widehat{\pi}(x) = g\{\widehat{\alpha}'\Psi(x)\}$, where $\widehat{\alpha}$ denotes any suitable estimator (possibly penalized) of $\alpha$ based on $\mathcal{T}_n := \{T_i, X_i\}_{i=1}^n$. Example of $\widehat{\alpha}$: when $g(\cdot) = g_{\mathrm{expit}}(\cdot)$, $\widehat{\alpha}$ may be obtained from a standard $L_1$-penalized logistic regression of $\{T_i \text{ vs. } \Psi(X_i)\}_{i=1}^n$.
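A minimal sketch of this parametric choice of $\widehat{\pi}(\cdot)$, assuming $g = g_{\mathrm{expit}}$ and NumPy. The helper `poly_basis` is hypothetical: it uses per-coordinate powers up to a fixed degree without cross terms, a simplified instance of $\Psi(X)$. The $L_1$-penalized logistic fit is solved here by proximal gradient descent (ISTA), one standard solver among several, with the intercept left unpenalized; the penalty level `lam` is an illustrative assumption, not a tuned choice.

```python
import numpy as np


def poly_basis(X, degree=2):
    """Hypothetical Psi(X): intercept plus powers X_j^k, k = 1..degree (no cross terms)."""
    cols = [np.ones(X.shape[0])]
    for k in range(1, degree + 1):
        cols.extend((X ** k).T)  # each row of (X**k).T is one column X_j^k
    return np.column_stack(cols)


def expit(u):
    return 1.0 / (1.0 + np.exp(-u))


def l1_logistic(Psi, T, lam, n_iter=3000):
    """ISTA for (1/n) sum_i [log(1 + e^{u_i}) - T_i u_i] + lam * ||alpha_{-0}||_1, u = Psi @ alpha."""
    n, K = Psi.shape
    # step = 1/L, where L = ||Psi||_2^2 / (4n) bounds the Lipschitz constant of the gradient
    step = 4.0 * n / np.linalg.norm(Psi, 2) ** 2
    alpha = np.zeros(K)
    for _ in range(n_iter):
        grad = Psi.T @ (expit(Psi @ alpha) - T) / n
        z = alpha - step * grad
        alpha = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)  # soft-thresholding
        alpha[0] = z[0]  # the intercept is not penalized
    return alpha


def fit_pi_parametric(X, T, lam=0.01, degree=2):
    """Return a callable x -> pi_hat(x) = expit{alpha_hat' Psi(x)}."""
    alpha_hat = l1_logistic(poly_basis(X, degree), T.astype(float), lam)
    return lambda X_new: expit(poly_basis(X_new, degree) @ alpha_hat)
```

Setting `degree=1` recovers the standard linear-logistic working model; larger degrees enlarge $K$, which is where the $L_1$ penalty matters.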

Choices of $\widehat{\pi}(\cdot)$: Semi-Parametric Single Index Families

Semi-parametric single index family: $\pi(x) = g(\alpha'x)$, where $g(\cdot) \in (0, 1)$ is unknown and $\alpha \in \mathbb{R}^p$ is a (sparse) unknown parameter (identifiable only up to scalar multiples, hence set $\|\alpha\|_2 = 1$ wlog). Given an estimator $\widehat{\alpha}$ of $\alpha$, we estimate $\pi(x) \equiv E(T \mid \alpha'X = \alpha'x)$ as:

$$\widehat{\pi}(x) \equiv \widehat{\pi}(\widehat{\alpha}, x) := \frac{\frac{1}{nh}\sum_{i=1}^n T_i\, K\{\widehat{\alpha}'(X_i - x)/h\}}{\frac{1}{nh}\sum_{i=1}^n K\{\widehat{\alpha}'(X_i - x)/h\}} := \frac{\widehat{l}_\pi(\widehat{\alpha}, x)}{\widehat{f}_\pi(\widehat{\alpha}, x)},$$

where $K(\cdot)$ denotes any standard (2nd order) kernel function, $h = h_n > 0$ denotes the bandwidth sequence, and $l_\pi(\alpha, x) := l_\alpha(\alpha'x) := g(\alpha'x) f_\alpha(\alpha'x)$ and $f_\pi(\alpha, x) := f_\alpha(\alpha'x)$, with $g(\alpha'x) \equiv E(T \mid \alpha'X = \alpha'x)$ and $f_\alpha(\alpha'x)$ being the density of $\alpha'X$ at $\alpha'x$.

Obtaining $\widehat{\alpha}$: in general, any approach (if available) from the (high dimensional) single index model literature can be used. But if $X$ is elliptically symmetric, then $\widehat{\alpha}$ may be obtained as simply as a standard $L_1$-penalized logistic regression of $\{T_i \text{ vs. } X_i\}_{i=1}^n$, which targets $\alpha$ up to a scalar multiple.
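The kernel estimator above can be sketched as follows, assuming NumPy, a Gaussian kernel (one standard 2nd-order choice) and a given direction estimate `alpha_hat`; the function name and bandwidth are illustrative. The direction is rescaled to unit norm, reflecting that $\alpha$ is identified only up to scale, and the $1/(nh)$ factors are kept so that the numerator and denominator match $\widehat{l}_\pi$ and $\widehat{f}_\pi$, even though they cancel in the ratio.

```python
import numpy as np


def gaussian_kernel(u):
    """Standard normal density, a 2nd-order kernel."""
    return np.exp(-0.5 * u ** 2) / np.sqrt(2.0 * np.pi)


def pi_single_index(alpha_hat, X, T, X_new, h):
    """pi_hat(alpha_hat, x) = l_hat / f_hat, smoothing T along the index alpha_hat'x."""
    a = alpha_hat / np.linalg.norm(alpha_hat)       # fix the scale: ||alpha||_2 = 1
    n = X.shape[0]
    idx = X @ a                                     # alpha_hat' X_i, shape (n,)
    idx_new = np.atleast_2d(X_new) @ a              # alpha_hat' x, shape (m,)
    U = (idx[None, :] - idx_new[:, None]) / h       # alpha_hat'(X_i - x) / h, shape (m, n)
    W = gaussian_kernel(U)
    l_hat = W @ T / (n * h)                         # estimates g(alpha'x) f_alpha(alpha'x)
    f_hat = W.sum(axis=1) / (n * h)                 # estimates f_alpha(alpha'x)
    return l_hat / f_hat
```

Because the output is a weighted average of the $T_i \in \{0, 1\}$ with positive weights, it always lies in $[0, 1]$, consistent with $g(\cdot) \in (0, 1)$.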

Choices of $\widehat{m}(\cdot)$: (Extended) Parametric Families

(Extended) parametric family: $m(x) = g\{\gamma'\Psi(x)\}$, where $g(\cdot)$ is a known (inverse) link function [e.g. canonical links: identity, expit or exp], $\Psi(X) := \{\psi_k(X)\}_{k=1}^K$ is any set of $K$ basis functions (with $K$ possibly $\gg n$), and $\gamma \in \mathbb{R}^K$ is an unknown (sparse) parameter vector. Example: $\Psi(X)$ may correspond to the polynomial bases of $X$ up to any fixed degree $k$. Note: the special case of linear bases ($k = 1$) includes all standard parametric regression models.

Estimator: we set $\widehat{m}(x) = g\{\widehat{\gamma}'\Psi(x)\}$, where $\widehat{\gamma}$ denotes any suitable estimator (possibly penalized) of $\gamma$ based on the data subset of "complete cases": $\mathcal{D}_n^{(c)} := \{(Y_i, X_i) \mid T_i = 1\}_{i=1}^n$. Example of $\widehat{\gamma}$: when $g(\cdot)$ is any canonical link function, $\widehat{\gamma}$ may be simply obtained from the respective usual $L_1$-penalized canonical link based regression (e.g. linear, logistic or Poisson) of $\{(Y_i \text{ vs. } \Psi(X_i)) \mid T_i = 1\}_{i=1}^n$ from the complete case data $\mathcal{D}_n^{(c)}$.
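A minimal sketch of the complete-case fit for the identity link, assuming NumPy, a design matrix `Psi` whose first column is an intercept, and missing outcomes stored as `NaN`. The $L_1$-penalized least squares problem is solved by ISTA (an illustrative solver choice); only rows with $T_i = 1$ enter the fit, while $\widehat{m}(x) = \Psi(x)'\widehat{\gamma}$ can then be evaluated for every $i$, including those with $T_i = 0$.

```python
import numpy as np


def soft_threshold(z, t):
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)


def fit_m_parametric(Psi, Y, T, lam, n_iter=3000):
    """L1-penalized least squares on the complete cases {i : T_i = 1}.

    Minimizes (1 / 2 n_c) ||Y_c - Psi_c gamma||_2^2 + lam * ||gamma_{-0}||_1 by ISTA;
    column 0 of Psi is assumed to be an intercept and is left unpenalized.
    """
    cc = T == 1
    P, y = Psi[cc], Y[cc]          # rows with T == 0 (Y possibly NaN) are never used
    n_c = P.shape[0]
    step = n_c / np.linalg.norm(P, 2) ** 2  # 1 / Lipschitz constant of the gradient
    gamma = np.zeros(P.shape[1])
    for _ in range(n_iter):
        z = gamma - step * (P.T @ (P @ gamma - y)) / n_c
        gamma = soft_threshold(z, step * lam)
        gamma[0] = z[0]            # the intercept is not penalized
    return gamma
```

Fitting on complete cases is valid here precisely because $m(x) \equiv E(Y \mid X = x, T = 1)$ under MAR.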

Choices of $\widehat{m}(\cdot)$: Semi-Parametric Single Index Families

Semi-parametric single index family: $m(x) = g(\gamma'x)$, where $g(\cdot)$ is an unknown link and $\gamma \in \mathbb{R}^p$ is a (sparse) unknown parameter (identifiable only up to scalar multiples, hence set $\|\gamma\|_2 = 1$ wlog). Given an estimator $\widehat{\gamma}$ of $\gamma$, we estimate $m(x) \equiv E(Y \mid \gamma'X = \gamma'x, T = 1)$ as:

$$\widehat{m}(x) \equiv \widehat{m}(\widehat{\gamma}, x) := \frac{\frac{1}{nh}\sum_{i=1}^n T_i Y_i\, K\{\widehat{\gamma}'(X_i - x)/h\}}{\frac{1}{nh}\sum_{i=1}^n T_i\, K\{\widehat{\gamma}'(X_i - x)/h\}},$$

where $K(\cdot)$ denotes any standard (2nd order) kernel function, and $h = h_n > 0$ denotes the bandwidth sequence.

Obtaining $\widehat{\gamma}$: in general, any approach (if available) from the HD SIM literature can be used on the complete case data subset $\mathcal{D}_n^{(c)}$. If $X$ is elliptically symmetric and $Y = f(\gamma'X; \epsilon)$ with $f$ unknown and $\epsilon \perp\!\!\!\perp (T, X)$, then $\widehat{\gamma}$ may be obtained as the $L_1$-penalized IPW estimator $\widehat{\theta}_{\mathrm{IPW}}$ (discussed later) for any canonical link based regression problem (recall the illustrative example regarding SIMs). To implement $\widehat{\theta}_{\mathrm{IPW}}$, one can use either of the 2 earlier choices of $\widehat{\pi}(\cdot)$.
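A sketch of $\widehat{m}(\widehat{\gamma}, x)$, assuming NumPy, a Gaussian kernel and a given direction `gamma_hat` (names illustrative). The one practical difference from $\widehat{\pi}$ is that only the observable products $T_i Y_i$ are used, so missing outcomes (stored as `NaN`) are replaced by 0 before smoothing; the $T_i$ in the denominator restrict the average to complete cases.

```python
import numpy as np


def m_single_index(gamma_hat, X, T, Y, X_new, h):
    """m_hat(gamma_hat, x) = sum_i T_i Y_i K_i(x) / sum_i T_i K_i(x), Gaussian kernel K."""
    g = gamma_hat / np.linalg.norm(gamma_hat)   # scale fixed by ||gamma||_2 = 1
    TY = np.where(T == 1, Y, 0.0)               # T_i Y_i is observable even if Y_i is missing
    # U[j, i] = gamma_hat'(X_i - x_j) / h
    U = ((X @ g)[None, :] - (np.atleast_2d(X_new) @ g)[:, None]) / h
    W = np.exp(-0.5 * U ** 2) / np.sqrt(2.0 * np.pi)
    n = X.shape[0]
    num = W @ TY / (n * h)
    den = W @ T / (n * h)
    return num / den
```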

Convergence Rates Regarding the Choices of $\widehat{\pi}(\cdot)$

For either choice of $\widehat{\pi}(\cdot)$, assume that the ingredient estimator $\widehat{\alpha}$ satisfies $\|\widehat{\alpha} - \alpha\|_1 \lesssim_P a_n$ for some $a_n = o(1)$. Then, under some suitable assumptions, we have, with high probability (w.h.p.), $|\widehat{\pi}(x) - \pi(x)| \lesssim a_n$ for any fixed $x \in \mathcal{X}$ (for method 1). For method 2, we have, w.h.p., for any fixed $x \in \mathcal{X}$,

$$|\widehat{\pi}(x) - \pi(x)| \lesssim \Bigl(\frac{1}{\sqrt{nh}} + h^2\Bigr) + \Bigl(a_n + a_n\sqrt{\frac{\log p}{nh^3}} + \frac{a_n^2}{h^2}\Bigr).$$

Usually, we expect the $L_1$ error rate of $\widehat{\alpha}$ to be $a_n = s_\alpha\sqrt{(\log d)/n}$, where $s_\alpha := \|\alpha\|_0$ and $d = K$ or $p$ (depending on the method).

Convergence Rates Regarding the Choices of $\widehat{m}(\cdot)$

For either of the two choices of $\widehat{m}(\cdot)$, assume that the ingredient estimator $\widehat{\gamma}$ satisfies $\|\widehat{\gamma} - \gamma\|_1 \lesssim_P b_n$ for some $b_n = o(1)$. Then, under some suitable assumptions, we have, with high probability, $|\widehat{m}(x) - m(x)| \lesssim b_n$ for any fixed $x \in \mathcal{X}$ (for method 1). For method 2, we have, with high probability, for any fixed $x \in \mathcal{X}$,

$$|\widehat{m}(x) - m(x)| \lesssim \Bigl(\frac{1}{\sqrt{nh}} + h^2\Bigr) + \Bigl(b_n + b_n\sqrt{\frac{\log p}{nh^3}} + \frac{b_n^2}{h^2}\Bigr).$$

We typically expect the $L_1$ error rate of $\widehat{\gamma}$ to be $b_n = s_\gamma\sqrt{(\log d)/n}$, where $s_\gamma := \|\gamma\|_0$ and $d = K$ or $p$ (depending on the method).
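A small arithmetic sketch connecting these nominal rates to the requirement, stated later for the remainder term, that $\sqrt{n}\,\|\widehat{\pi} - \pi^*\|\,\|\widehat{m} - m^*\| = o_P(1)$. As a rough heuristic only, it plugs the method-1 pointwise rates $a_n$ and $b_n$ in place of those norms (taking the same $d$ for both), giving $\sqrt{n}\,a_n b_n = s_\alpha s_\gamma (\log d)/\sqrt{n}$. The function names are hypothetical.

```python
import math


def nominal_l1_rate(s, d, n):
    """Typical L1 error rate s * sqrt(log(d) / n) of a lasso-type ingredient estimator."""
    return s * math.sqrt(math.log(d) / n)


def scaled_product_rate(s_alpha, s_gamma, d, n):
    """Heuristic sqrt(n) * a_n * b_n = s_alpha * s_gamma * log(d) / sqrt(n) (method 1)."""
    return math.sqrt(n) * nominal_l1_rate(s_alpha, d, n) * nominal_l1_rate(s_gamma, d, n)


for n in (10_000, 100_000, 1_000_000):
    print(n, round(scaled_product_rate(5, 5, 500, n), 3))
```

For instance, with $s_\alpha = s_\gamma = 5$ and $d = 500$ the heuristic product is about 1.55 at $n = 10^4$ and about 0.155 at $n = 10^6$, illustrating why sparsity and sample size jointly govern when the remainder becomes negligible.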

The Mean Estimation Problem

Mean Estimation: The Debiased and Doubly Robust (DDR) Estimator

Let us next focus on the (low-d) mean estimation problem, where $\theta_0 := E(Y)$. We first note the DDR representation of $\theta_0$:

$$\theta_0 \equiv E(Y) = E_X\{m(X)\} + E\Bigl[\frac{T}{\pi(X)}\{Y - m(X)\}\Bigr],$$

where $m(X) := E(Y \mid X)$. The second term above is actually 0 and is often called the augmented IPW (AIPW) term. Again, this is a purely non-parametric identification of $\theta_0$ based on the observable $Z$ and two nuisance functions, $\pi(X)$ and $m(X)$, both of which may be unknown but are still estimable from $\mathcal{D}_n$.

Let $\widehat{m}(\cdot)$ and $\widehat{\pi}(\cdot)$ be any estimators of $m(\cdot)$ and $\pi(\cdot)$, and further assume that $\{\widehat{m}(X_i)\}_{i=1}^n$ are obtained in a cross-fitted manner (via sample splitting). Then define the DDR estimator $\widehat{\theta}_{\mathrm{DDR}}$ of $\theta_0$:

$$\widehat{\theta}_{\mathrm{DDR}} = \frac{1}{n}\sum_{i=1}^n \widehat{m}(X_i) + \frac{1}{n}\sum_{i=1}^n \frac{T_i}{\widehat{\pi}(X_i)}\{Y_i - \widehat{m}(X_i)\}.$$
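A minimal sketch of $\widehat{\theta}_{\mathrm{DDR}}$ with K-fold cross-fitting, assuming NumPy and, purely as placeholders, low-dimensional unpenalized working models: a logistic regression (fitted by Newton-Raphson) for $\widehat{\pi}(\cdot)$ and a complete-case least squares fit for $\widehat{m}(\cdot)$. Any of the $L_1$-penalized or single index choices discussed earlier could be swapped in. Both nuisances are cross-fitted here, although the slide only requires it for $\widehat{m}$; the fold scheme, the clipping of $\widehat{\pi}$ at 0.01 and all names are illustrative assumptions.

```python
import numpy as np


def expit(u):
    return 1.0 / (1.0 + np.exp(-u))


def fit_logistic(Z, T, n_iter=25):
    """Unpenalized logistic regression by Newton-Raphson (placeholder for pi_hat)."""
    beta = np.zeros(Z.shape[1])
    for _ in range(n_iter):
        p = expit(Z @ beta)
        H = Z.T @ (Z * (p * (1 - p))[:, None])  # observed information matrix
        beta = beta + np.linalg.solve(H, Z.T @ (T - p))
    return beta


def ddr_mean(X, T, Y, n_folds=2, clip=0.01, seed=0):
    """Cross-fitted DDR (AIPW) estimate of theta_0 = E(Y); Y may be NaN where T == 0."""
    n = X.shape[0]
    Z = np.column_stack([np.ones(n), X])
    folds = np.random.default_rng(seed).permutation(n) % n_folds
    m_hat, pi_hat = np.empty(n), np.empty(n)
    for k in range(n_folds):
        train, test = folds != k, folds == k
        beta = fit_logistic(Z[train], T[train])
        pi_hat[test] = np.clip(expit(Z[test] @ beta), clip, 1 - clip)
        cc = train & (T == 1)                   # complete cases in the training folds
        gamma = np.linalg.lstsq(Z[cc], Y[cc], rcond=None)[0]
        m_hat[test] = Z[test] @ gamma           # out-of-fold predictions only
    TY = np.where(T == 1, Y, 0.0)               # observable T_i * Y_i
    terms = m_hat + T / pi_hat * (TY - m_hat)
    return terms.mean(), m_hat, pi_hat
```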

Properties of the DDR Mean Estimator

$\widehat{\theta}_{\mathrm{DDR}}$ is consistent for $\theta_0$ as soon as one, but not necessarily both, of $\widehat{\pi}(\cdot)$ or $\widehat{m}(\cdot)$ is a "correct" estimator (consistent, with some rate conditions) of the true $\pi(\cdot)$ or $m(\cdot)$! Hence, the double robustness.

Further, when both $\{\widehat{m}(\cdot), \widehat{\pi}(\cdot)\}$ are correct, $\widehat{\theta}_{\mathrm{DDR}}$ is a $\sqrt{n}$-consistent and RAL (regular and asymptotically linear) estimator of $\theta_0$ with IF (influence function), which is also the optimal/efficient IF, given by

$$\psi_{\mathrm{eff}}(Z) = \{m(X) - \theta_0\} + \frac{T}{\pi(X)}\{Y - m(X)\}, \quad \text{with } E\{\psi_{\mathrm{eff}}(Z)\} = 0.$$

So $\widehat{\theta}_{\mathrm{DDR}}$ then satisfies $\sqrt{n}(\widehat{\theta}_{\mathrm{DDR}} - \theta_0) = n^{-1/2}\sum_{i=1}^n \psi_{\mathrm{eff}}(Z_i) + o_P(1)$. This holds regardless of how $\{\widehat{\pi}(\cdot), \widehat{m}(\cdot)\}$ are obtained, as long as they satisfy some (basic) consistency and rate requirements. $\psi_{\mathrm{eff}}(Z)$ is the IF with the smallest variance achievable by any RAL estimator of $\theta_0$; this variance is the semi-parametric efficiency bound.
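Since the IF is explicit, a plug-in standard error follows directly. The sketch below, assuming NumPy and already computed (ideally cross-fitted) values $\widehat{m}(X_i)$ and $\widehat{\pi}(X_i)$, estimates $\psi_{\mathrm{eff}}(Z_i)$ by replacing $m$, $\pi$ and $\theta_0$ with their estimates and forms a Wald interval. It is only justified in the regime described above, where both nuisances are correct; the function name and the use of `ddof=1` are illustrative choices.

```python
import numpy as np


def ddr_inference(T, Y, m_hat, pi_hat, z=1.96):
    """theta_DDR, plug-in SE from psi_eff, and a Wald CI (both nuisances assumed correct)."""
    T = np.asarray(T, dtype=float)
    TY = np.where(T == 1, Y, 0.0)               # observable T_i * Y_i
    terms = m_hat + T / pi_hat * (TY - m_hat)
    theta = terms.mean()
    # psi_hat_i = {m_hat(X_i) - theta} + T_i / pi_hat(X_i) * {Y_i - m_hat(X_i)}
    psi = terms - theta
    se = psi.std(ddof=1) / np.sqrt(len(psi))
    return theta, se, (theta - z * se, theta + z * se)
```

For example, with `T = [1, 1, 0, 0]`, `Y = [2, 4, nan, nan]`, `m_hat = [1, 3, 2, 2]` and `pi_hat = 0.5` throughout, the per-observation terms are 3, 5, 2, 2, so the estimate is 3 and the standard error is $\sqrt{2}/2$.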

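The DDR Mean Estimator: $\sqrt{n}$-Consistency and RAL Properties

Recall that, using the expansion we already obtained, we have

$$\sqrt{n}(\widehat{\theta}_{\mathrm{DDR}} - \theta_0) = \frac{1}{\sqrt{n}}\sum_{i=1}^n \psi(Z_i) + T_n^{(m)} + T_n^{(\pi)} - R_n.$$

First term: not a problem; it is a $\sqrt{n}$-scaled i.i.d. average of $\psi(Z)$ with $E\{\psi(Z)\} = 0$ and $\mathrm{Var}\{\psi(Z)\} < \infty$.

Last term: we essentially need it to go to zero. This holds whenever $\sqrt{n}\,\|\widehat{\pi}(\cdot) - \pi^*(\cdot)\|\,\|\widehat{m}(\cdot) - m^*(\cdot)\| = o_P(1)$. A typical sufficient (but not necessary) condition: each of $\|\widehat{\pi}(\cdot) - \pi^*(\cdot)\|$ and $\|\widehat{m}(\cdot) - m^*(\cdot)\|$ is $o_P(n^{-1/4})$.

Now for

$$T_n^{(m)} \equiv \frac{1}{\sqrt{n}}\sum_{i=1}^n \{\widehat{m}(X_i) - m^*(X_i)\}\Bigl\{1 - \frac{T_i}{\pi^*(X_i)}\Bigr\}:$$

if $\pi^*(\cdot) = \pi(\cdot)$, then $E\{T_n^{(m)}\} = 0$ and $\mathrm{Var}\{T_n^{(m)}\} = O[E_X\{\widehat{m}(X) - m^*(X)\}^2] = o(1)$. Hence, $T_n^{(m)} = o_P(1)$. But if $\pi^*(\cdot) \neq \pi(\cdot)$, then $\{\widehat{m}(\cdot) - m^*(\cdot)\}$ will contribute and the rate is unclear! The contribution $T_n^{(m)}/\sqrt{n}$ to $\widehat{\theta}_{\mathrm{DDR}} - \theta_0$ may be slower than $n^{-1/2}$ in high-d cases.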
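The expansion above is an exact algebraic identity, which can be checked numerically. The sketch assumes NumPy and takes $\psi(Z) := \{m^*(X) - \theta_0\} + T\{Y - m^*(X)\}/\pi^*(X)$ (our reading of the $\psi$ recalled above) and the remainder $R_n := n^{-1/2}\sum_{i=1}^n T_i\{\widehat{m}(X_i) - m^*(X_i)\}\{1/\widehat{\pi}(X_i) - 1/\pi^*(X_i)\}$, which is the term that makes the identity exact; the product structure of $R_n$ is why a product-rate condition suffices for it. All inputs are arbitrary arrays, so no estimation is involved.

```python
import numpy as np


def ddr_decomposition(T, Y, m_hat, pi_hat, m_star, pi_star, theta0):
    """Return sqrt(n)(theta_DDR - theta0) and its four pieces: mean term, T_m, T_pi, R."""
    T = np.asarray(T, dtype=float)
    TY = np.where(T == 1, Y, 0.0)               # Y only ever enters multiplied by T
    rn = np.sqrt(len(T))
    theta_ddr = np.mean(m_hat + T / pi_hat * (TY - m_hat))
    lhs = rn * (theta_ddr - theta0)
    psi = m_star - theta0 + T / pi_star * (TY - m_star)
    first = psi.sum() / rn
    T_m = np.sum((m_hat - m_star) * (1 - T / pi_star)) / rn
    T_pi = np.sum((TY - m_star) * (T / pi_hat - T / pi_star)) / rn
    R = np.sum(T * (m_hat - m_star) * (1 / pi_hat - 1 / pi_star)) / rn
    return lhs, first, T_m, T_pi, R
```

For any inputs, `lhs` equals `first + T_m + T_pi - R` up to floating-point error; the statistical content of the slide lies in which of these pieces vanish under which specification.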

Analysis of the DDR Estimator: RAL Properties Contd.

Next, for

$$T_n^{(\pi)} \equiv \frac{1}{\sqrt{n}}\sum_{i=1}^n \{Y_i - m^*(X_i)\}\Bigl\{\frac{T_i}{\widehat{\pi}(X_i)} - \frac{T_i}{\pi^*(X_i)}\Bigr\}:$$

if $m^*(\cdot) = m(\cdot)$, then $E\{T_n^{(\pi)}\} = 0$ and $\mathrm{Var}\{T_n^{(\pi)}\} = O[E_X\{\widehat{\pi}(X) - \pi^*(X)\}^2] = o(1)$. Hence, $T_n^{(\pi)} = o_P(1)$. But if $m^*(\cdot) \neq m(\cdot)$, then $\{\widehat{\pi}(\cdot) - \pi^*(\cdot)\}$ will contribute and the rate is unclear! The contribution $T_n^{(\pi)}/\sqrt{n}$ may be slower than $n^{-1/2}$ in high-d cases.

Overall, as long as one of $\widehat{m}(\cdot)$ or $\widehat{\pi}(\cdot)$ is correct, $\widehat{\theta}_{\mathrm{DDR}}$ is indeed a consistent (and hence DR) estimator of $\theta_0$. But its RAL properties and achievability of the $n^{-1/2}$ rate will generally require more conditions and a case-by-case analysis, and the exact IF (if it exists) depends on which of the two is correct (and how we obtain them). If $\widehat{m}(\cdot)$ and $\widehat{\pi}(\cdot)$ are both correct, then, regardless of how they are obtained, $\widehat{\theta}_{\mathrm{DDR}}$ is always an optimal RAL estimator of $\theta_0$.

Choices of $\{\widehat{\pi}(\cdot), \widehat{m}(\cdot)\}$? We first obtain a general theory and then prove the rates for a class of choices (possibly misspecified).
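The double robustness can be illustrated with a small simulation. This sketch, assuming NumPy, uses oracle or deliberately wrong nuisance values rather than fitted ones (so it says nothing about rates): $\pi(x) = \mathrm{expit}(x)$, $m(x) = 1 + 2x$ and hence $\theta_0 = 1$, with $\widehat{m} \equiv 0$ and $\widehat{\pi} \equiv 0.5$ as the misspecified choices. The data-generating design is a hypothetical example.

```python
import numpy as np


def ddr_with(m_vals, pi_vals, T, Y):
    """theta_DDR for given nuisance values at the X_i (no fitting involved)."""
    TY = np.where(T == 1, Y, 0.0)
    return np.mean(m_vals + T / pi_vals * (TY - m_vals))


rng = np.random.default_rng(7)
n = 20_000
X = rng.standard_normal(n)
pi_true = 1.0 / (1.0 + np.exp(-X))              # MAR selection depending on X
m_true = 1.0 + 2.0 * X                          # so theta_0 = E(Y) = 1
T = rng.binomial(1, pi_true).astype(float)
Y = np.where(T == 1, m_true + rng.standard_normal(n), np.nan)

m_wrong = np.zeros(n)                           # misspecified outcome model
pi_wrong = np.full(n, 0.5)                      # misspecified (MCAR-type) propensity

est = {
    "both correct": ddr_with(m_true, pi_true, T, Y),
    "pi correct, m wrong": ddr_with(m_wrong, pi_true, T, Y),
    "m correct, pi wrong": ddr_with(m_true, pi_wrong, T, Y),
    "both wrong": ddr_with(m_wrong, pi_wrong, T, Y),
}
for name, value in est.items():
    print(f"{name:>20s}: {value:.3f}")
```

With $\widehat{m} \equiv 0$ the estimator reduces to plain IPW, which is unbiased when $\widehat{\pi} = \pi$; with $\widehat{\pi} \equiv 0.5$ the correction term still has mean zero because $E\{Y - m(X) \mid X, T = 1\} = 0$ under MAR. When both are wrong, neither mechanism applies and the estimate is visibly biased away from 1.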

HIGH DIMENSIONAL M-ESTIMATION WITH MISSING OUTCOMES: A SEMI-PARAMETRIC FRAMEWORK

By Abhishek Chakrabortty, T. Tony Cai and Hongzhe Li
University of Pennsylvania
October 0, 018
In this paper, we consider high dimensional