High-Dimensional M-Estimation with Missing Outcomes: A Semi-Parametric Framework

1 High-Dimensional M-Estimation with Missing Outcomes: A Semi-Parametric Framework (An Overview of the Methods and the Main Results). Abhishek Chakrabortty, University of Pennsylvania. Harvard Visit, August 20-23, 2018.

2 The Basic Framework and Set-Up. Variables of interest: outcome Y ∈ 𝒴 ⊆ R and covariates X ∈ 𝒳 ⊆ R^p (possibly high dimensional, compared to the sample size n). The supports 𝒴 and 𝒳 of Y and X need not be continuous. Main issue: the outcome Y may not always be observed. Let T ∈ {0, 1} denote the indicator of the true Y being observed. The (partly) unobserved random vector (T, Y, X) is assumed to be jointly defined on a common probability space with measure P(·). Observables: Z := (T, TY, X); Data: D_n := {Z_i ≡ (T_i, T_i Y_i, X_i)}_{i=1}^n, i.i.d. realizations of Z (whose distribution is defined via P(·)). Note the setting is particularly allowed to be high dimensional, wherein p can diverge with n, including p ≪ n, p ≍ n or p ≫ n.
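
As a concrete illustration of this data structure, here is a minimal Python sketch (the data-generating model, dimensions and all variable names are my own hypothetical choices, not from the talk) that simulates D_n = {(T_i, T_i Y_i, X_i)} with a high dimensional X and outcomes missing at random:

    import numpy as np

    rng = np.random.default_rng(0)
    n, p = 500, 1000                       # sample size and (high) covariate dimension

    X = rng.normal(size=(n, p))            # covariates X_i in R^p
    beta = np.zeros(p); beta[:5] = 1.0     # sparse signal (hypothetical outcome model)
    Y = X @ beta + rng.normal(size=n)      # true outcomes (never all observed)

    alpha = np.zeros(p); alpha[:3] = 0.5   # sparse propensity model (hypothetical)
    pi_X = 1.0 / (1.0 + np.exp(-(0.5 + X @ alpha)))  # pi(X) = P(T = 1 | X): MAR, depends on X only
    T = rng.binomial(1, pi_X)              # observation indicators T_i

    TY = T * Y                             # observables Z_i = (T_i, T_i * Y_i, X_i)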

4 General Applicability of the Framework Itself. Generally applicable to any missing data setting with missing outcomes Y and (possibly) high dimensional covariates X. Causal inference problems (via the potential outcomes framework): here, X is often called confounders (for observational studies) or adjustment variables/features (for randomized trials). Usual set-up: binary treatment (a.k.a. exposure/intervention) assignment T ∈ {0, 1}, and potential outcomes {Y^(0), Y^(1)}. More generally, multi-category treatments: T ∈ {0, 1, ..., k} (for a fixed k ≥ 1) and corresponding potential outcomes {Y^(j)}_{j=0}^k. Observed outcome: Y := Σ_{j=0}^k Y^(j) 1(T = j), or equivalently (Y | T = j) ≡ Y^(j) for each j [i.e. depending on T, we observe only one of {Y^(j)}_{j=0}^k]. Applicability: for each j ∈ {0, ..., k}, the set-up above is included in our framework via the representative map (T, Y, X) → (T_j, Y^(j), X) with T_j := 1(T = j), 0 ≤ j ≤ k.

6 The Standard Fundamental Assumptions. Ignorability assumption: T ⊥⊥ Y | X. A.k.a. missing at random (MAR) in the missing data literature; a.k.a. no unmeasured confounding (NUC) in causal inference. Special case: T ⊥⊥ (Y, X). A.k.a. missing completely at random (MCAR) in the missing data literature, and complete randomization (e.g. randomized trials) in the causal inference (CI) literature. Positivity assumption (a.k.a. sufficient overlap in the CI literature): let π(X) := P(T = 1 | X) be the propensity score (PS), and let π_0 := P(T = 1). Then π(·) is uniformly bounded away from 0: 1 ≥ π(x) ≥ δ_π > 0 for all x ∈ 𝒳, for some constant δ_π > 0.

8 The Parameter(s) of Interest: (High Dim.) M-Estimation Parameters. Given this setting, we aim to estimate (based on the data D_n) the following parameter θ_0 ∈ R^d (possibly high dimensional): θ_0 := arg min_{θ ∈ R^d} R(θ), where R(θ) := E{L(Y, X, θ)} and L(Y, X, θ) : R × R^p × R^d → R is any loss function that is convex and differentiable in θ ∈ R^d. (The existence and uniqueness of θ_0 is implicitly assumed; it is guaranteed whenever R(θ) is strongly convex and coercive, which is true for most standard examples.) Generally, this corresponds to M-estimation problems (which have a vast classical literature). We provide some useful examples later. The key challenges: the missingness via T (if not accounted for, the estimator will be inconsistent!) and the high dimensionality (of X and θ_0). These issues make the methods and the analyses quite tricky! We need to devise suitable methods, involving estimation of nuisance functions (leading to error terms involving non-i.i.d. summands with complex dependencies) and careful non-asymptotic analyses.

10 The Parameter(s) of Interest and Problem Formulation Contd. The case of low-dimensional parameters is also considered, e.g. with d = 1 and L(Y, X, θ) := (Y − θ)^2, we have mean estimation: θ_0 := E(Y). This also relates to average treatment effect (ATE) estimation in CI (and, in the process, average conditional treatment effect estimation, which is of interest in personalized medicine). Note: the same methodology also addresses (coordinate-wise) estimation of high-dimensional means, e.g. when θ_0 := E(XY). Today, we mainly focus on the more challenging high dimensional M-estimation problem introduced earlier. The basic underlying principle is almost the same for the low-dimensional mean estimation problem as well. We first provide a few (classes of) applications for this problem.

11 High Dimensional M-Estimation: A Few (Classes of) Applications. 1. Standard high dimensional regression problems with (1) missing outcomes and (2) potentially misspecified (working) models. Set d = p + 1, L(Y, X, θ) := l(Y, X⊤θ) with l(·, ·) : R × R → R. Some choices of l(·, ·): (a) squared loss l_sq(u, v) := (u − v)^2, (b) logistic loss l_log(u, v) := log(1 + exp(v)) − uv, (c) exponential loss l_exp(u, v) := exp(v) − uv, etc., among many others (a short code sketch of these follows this slide). Note that throughout, regardless of any motivating working model being true or not, the definition of θ_0 is completely model free. 2. Any series estimation or non-linear regression problem based on finite (but high dimensional) bases (no model) and missing Y. Let Ψ(X) := {ψ_j(X)}_{j=1}^d be any set of d basis functions with d possibly high dimensional. Set L(Y, X, θ) := l{Y, Ψ(X)⊤θ} and the choices of l(·, ·) can be kept exactly the same as above. E.g. polynomial bases: Ψ(X) := {1, x_j^k : 1 ≤ j ≤ p, 1 ≤ k ≤ d_0} (d_0 = 1 corresponds to linear bases, as in the previous example).
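
For concreteness, a minimal Python sketch of these three loss choices (illustrative code of my own, not from the talk); each is convex in the linear predictor v:

    import numpy as np

    def l_sq(u, v):     # squared loss: (u - v)^2
        return (u - v) ** 2

    def l_log(u, v):    # logistic loss: log(1 + exp(v)) - u*v, for u in {0, 1}
        return np.logaddexp(0.0, v) - u * v

    def l_exp(u, v):    # exponential (Poisson-type) loss: exp(v) - u*v
        return np.exp(v) - u * v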

13 Examples and Applications Contd. A. Signal recovery in high dimensional single index models (SIMs) with elliptically symmetric design distribution (e.g. X is Gaussian). Let Y = f(β_0⊤X, ɛ) with f : R^2 → 𝒴 unknown (i.e. β_0 identifiable only up to scalar multiples) and ɛ ⊥⊥ X (i.e., Y ⊥⊥ X | β_0⊤X). Consider any of the regression problems introduced earlier in Example 1. Then θ_0 defined therein satisfies: (θ_0)_[−1] ∝ β_0! A classic example of a misspecified parametric model defining θ_0, yet θ_0 directly relates to an actual semi-parametric model! B. Applications of all these problems in causal inference: linear heterogeneous treatment effects estimation: an application of the linear regression example (applied twice). Average conditional treatment effects (ACTE) estimation via series estimators: an application of the series estimation example. Causal inference via SIMs (signal recovery, ACTE estimation and ATE estimation): an application of the SIM example above.

15 Some Important Facts to be Considered. It is generally necessary to account for the missingness in Y. The complete case estimator of θ_0 in general will be inconsistent! That estimator may be consistent only if: (1) ∇φ(X, θ_0) = 0 a.s. for every X (for regression problems, this indicates the correct model case), and/or (2) T ⊥⊥ (Y, X) (i.e. the MCAR case). With θ_0 (and X) being high dimensional (compared to n), we need some further structural constraints on θ_0 to estimate it using D_n. We assume that θ_0 is s-sparse: ‖θ_0‖_0 := s and s ≪ min(n, d). Note: the sparsity requirement has an attractive (and fairly intuitive) geometric justification for all the examples we have given here. Some notations: for all θ ∈ R^d, φ(X, θ) := E{L(Y, X, θ) | X} for all X, and for any function f(Z, θ), ∇f(Z, θ) := ∂f(Z, θ)/∂θ ∈ R^d for all Z.

17 Estimation of θ_0: The Debiased and Double Robust (DDR) Approach. We next note that R(θ) := E{L(Y, X, θ)} ≡ E_X{φ(X, θ)} admits the following debiased and doubly robust (DDR) representation: for all θ, R(θ) ≡ E_X{φ(X, θ)} + E[ {T/π(X)} {L(Y, X, θ) − φ(X, θ)} ]. (1) The 2nd term in (1) is simply 0 (indeed, under MAR, E[{T/π(X)}{L(Y, X, θ) − φ(X, θ)} | X] = {E(T | X)/π(X)} E{L(Y, X, θ) − φ(X, θ) | X} = 1 · 0 = 0), and is often called the augmented IPW (AIPW) term. It can be seen as a debiasing term (of sorts). For estimation via an empirical (and estimated) version of (1), the debiasing term plays a crucial role in facilitating the analyses and determining the properties (convergence rates) of the estimator. The double robust (DR) aspect: replace {φ(X, θ), π(X)} by any {φ*(X, θ), π*(X)} and (1) continues to hold as long as one, but not necessarily both, of φ*(·, ·) = φ(·, ·) or π*(·) = π(·) holds. Note that eq. (1) is a purely non-parametric identification of R(θ) based on the observable Z and two nuisance functions: π(X) and φ(X, θ), which may both be unknown but are estimable from D_n.

20 The DDR Estimator of θ_0. Given the DDR representation (1) of R(θ), let {π̂(·), φ̂(·, ·)} be any estimators of the nuisance components {π(·), φ(·, ·)} based on D_n. Then we define our L_1-penalized DDR estimator θ̂_DDR of θ_0 as: θ̂_DDR ≡ θ̂_DDR(λ_n) := arg min_{θ ∈ R^d} { L̂_DDR(θ) + λ_n ‖θ‖_1 }, where L̂_DDR(θ) := (1/n) Σ_{i=1}^n [ φ̂(X_i, θ) + {T_i/π̂(X_i)} {L(Y_i, X_i, θ) − φ̂(X_i, θ)} ], and λ_n ≥ 0 is the tuning parameter. We shall assume the following basic conditions regarding the construction of π̂(·) and φ̂(·, ·): π̂(·) is obtained from the data subset T_n := {T_i, X_i}_{i=1}^n ⊆ D_n only, while the other nuisance function's estimates {φ̂(X_i, θ)}_{i=1}^n are obtained in a cross-fitted manner (via sample splitting). We assume (temporarily) that both π̂(·) and φ̂(·, ·) are correctly specified. Under misspecifications, the DR properties of θ̂_DDR (in terms of consistency and non-sharp rates) will be discussed later.

22 Simplifying Assumptions and Easy Implementation Algorithm. For simplicity of the theoretical analyses, we assume that φ̂(X, θ) is differentiable in θ a.s., and ∇L(Y, X, θ) and ∇φ̂(X, θ) satisfy the following 'separable forms': for some h(X) ∈ R^d and g(X, θ) ∈ R, ∇L(Y, X, θ) = h(X){Y − g(X, θ)}, and ∇φ̂(X, θ) = h(X){m̂(X) − g(X, θ)}, where m(X) := E(Y | X) and m̂(X) denotes the corresponding estimator of m(X) needed to construct φ̂(X, θ). To obtain {φ̂(X_i, θ)}_{i=1}^n under the assumed form, one only needs the (cross-fitted) estimates {m̂(X_i)}_{i=1}^n of m(·). These assumptions hold for all examples given before. Implementation algorithm. θ̂_DDR can be obtained simply as: θ̂_DDR ≡ θ̂_DDR(λ_n) := arg min_{θ ∈ R^d} { (1/n) Σ_{i=1}^n L(Ỹ_i, X_i, θ) + λ_n ‖θ‖_1 }, where Ỹ_i := m̂(X_i) + {T_i/π̂(X_i)} {Y_i − m̂(X_i)}, for each i, is a 'pseudo outcome'.
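
Taking the squared loss for concreteness, here is a minimal Python sketch of this pseudo-outcome algorithm (my own illustrative code); pi_hat and m_hat are placeholder arrays of (cross-fitted) nuisance estimates evaluated at the X_i (how to obtain them is discussed later), and the use of scikit-learn's Lasso for the L_1-penalized step is my choice, not prescribed by the talk:

    import numpy as np
    from sklearn.linear_model import Lasso

    def ddr_estimator(X, Y, T, pi_hat, m_hat, lam):
        """L1-penalized DDR estimator for the squared loss, via pseudo outcomes.

        X: (n, p) covariates; Y: (n,) outcomes (arbitrary where T == 0);
        T: (n,) observation indicators; pi_hat, m_hat: (n,) cross-fitted
        estimates of pi(X_i) and m(X_i); lam: tuning parameter.
        """
        # Pseudo outcomes: Y_tilde_i = m_hat(X_i) + T_i / pi_hat(X_i) * {Y_i - m_hat(X_i)}
        Y_tilde = m_hat + T * (Y - m_hat) / pi_hat
        # Ordinary Lasso of Y_tilde on X (the intercept plays the role of the first
        # coordinate of theta; note that sklearn's alpha scaling differs from
        # lambda_n by a constant factor).
        fit = Lasso(alpha=lam, fit_intercept=True).fit(X, Y_tilde)
        return np.concatenate(([fit.intercept_], fit.coef_))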

24 Properties of θ̂_DDR: General Convergence Rates. For any choices of π̂(·) and m̂(·) and any realization of D_n, choose any λ_n ≥ 2‖∇L̂_DDR(θ_0)‖_∞. Then for any such choice, and under some basic (standard) assumptions, the DDR estimator θ̂_DDR(λ_n) satisfies a deterministic deviation bound: with s := ‖θ_0‖_0, ‖θ̂_DDR(λ_n) − θ_0‖_2 ≲ λ_n √s, and ‖θ̂_DDR(λ_n) − θ_0‖_1 ≲ λ_n s. Probabilistic bounds for ‖∇L̂_DDR(θ_0)‖_∞ (the lower bound of λ_n): ‖∇L̂_DDR(θ_0)‖_∞ ≲ T_{0,n} + T_{π,n} + T_{m,n} + R_{π,m,n}, where T_{0,n} is the main term (a centered i.i.d. average), T_{π,n} is the π-error term involving π̂(·) − π(·) and T_{m,n} is the m-error term involving m̂(·) − m(·), while R_{π,m,n} is the (π, m)-error term (usually lower order) involving the product of π̂(·) − π(·) and m̂(·) − m(·). For all the terms, the analyses are fully non-asymptotic and quite nuanced, especially in order to get sharp rates for T_{π,n} and T_{m,n}.

26 General Convergence Rates of θ̂_DDR Contd. Basic (high level) consistency conditions on {π̂(·), m̂(·)}. Let {π̂(·), m̂(·)} be any general and correct estimators of {π(·), m(·)}, and assume they satisfy the following pointwise convergence rates: |π̂(x) − π(x)| ≲_P δ_{n,π} and |m̂(x) − m(x)| ≲_P ξ_{n,m} for all x ∈ 𝒳, (2) for some sequences δ_{n,π}, ξ_{n,m} → 0 such that (δ_{n,π} + ξ_{n,m}) √(log(nd)) = o(1) and the product δ_{n,π} ξ_{n,m} (log n) = o(√((log d)/n)). Under the above set-up and the condition in (2), along with some more suitable assumptions, we then have, with high probability: T_{0,n} ≲ √((log d)/n), while T_{π,n} and T_{m,n} are bounded by √((log(nd))/n) scaled by factors involving δ_{n,π} and ξ_{n,m} respectively, and R_{π,m,n} is bounded by the product of δ_{n,π} and ξ_{n,m} (up to a log n factor), so that all three error terms are o(√((log d)/n)) under (2). Hence, ‖∇L̂_DDR(θ_0)‖_∞ ≲ √((log d)/n) + o(√((log d)/n)) with high probability.

28 Desparsification of θ̂_DDR and Asymptotic Linear Expansion (ALE). Consider θ̂_DDR for the squared loss: L(Y, X, θ) := {Y − Ψ(X)⊤θ}^2, where Ψ(X) ∈ R^d denotes any (high dimensional) vector of basis functions of X. Define Ω := Σ^{−1}, where Σ := E{Ψ(X)Ψ(X)⊤}. Let Ω̂ be any reasonable estimator of Ω (assume Ω is sparse if reqd.). We then define the desparsified DDR estimator θ̌_DDR as follows: θ̌_DDR := θ̂_DDR + Ω̂ (1/n) Σ_{i=1}^n {Ỹ_i − Ψ(X_i)⊤θ̂_DDR} Ψ(X_i), where Ỹ_i := m̂(X_i) + {T_i/π̂(X_i)}{Y_i − m̂(X_i)} are pseudo outcomes as before. Under suitable assumptions, including all previous conditions and conditions on Ω̂ − Ω and on ‖I − Ω̂Σ̂‖_max, θ̌_DDR satisfies the ALE: (θ̌_DDR − θ_0) = (1/n) Σ_{i=1}^n Ω ψ_0(Z_i) + Δ_n, where ‖Δ_n‖_∞ = o_P(n^{−1/2}), and ψ_0(Z) := [m(X) − Ψ(X)⊤θ_0 + {T/π(X)}{Y − m(X)}] Ψ(X) with E{ψ_0(Z)} = 0. This ALE is optimal and also facilitates inference.
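
A minimal Python sketch of this desparsification (one-step correction) step for the squared loss, taking Psi(X) = X for simplicity (function names are mine); Omega_hat is any estimate of the precision matrix Sigma^{-1}, e.g. from the node-wise Lasso sketched after the next slide:

    import numpy as np

    def desparsified_ddr(X, Y_tilde, theta_ddr, Omega_hat):
        """One-step (desparsified) correction of the L1-penalized DDR estimator.

        X: (n, d) design Psi(X_i); Y_tilde: (n,) pseudo outcomes;
        theta_ddr: (d,) initial penalized DDR estimate; Omega_hat: (d, d)
        estimate of Omega = Sigma^{-1} with Sigma = E{Psi(X) Psi(X)'}.
        """
        n = X.shape[0]
        residual = Y_tilde - X @ theta_ddr       # Y_tilde_i - Psi(X_i)' theta_ddr
        score = X.T @ residual / n               # (1/n) sum_i residual_i * Psi(X_i)
        return theta_ddr + Omega_hat @ score     # the desparsified estimator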

30 The Desparsified DDR Estimator: A Sketch of the Analyses and Rates. The error can be decomposed as: Δ_n = Δ_{n,1} + Δ_{n,2} + Δ_{n,3}, where Δ_{n,1} := (1/n)(Ω̂ − Ω) Σ_{i=1}^n ψ_0(Z_i), Δ_{n,2} := (I_d − Ω̂Σ̂)(θ̂_DDR − θ_0) and Δ_{n,3} := Ω̂(T_{π,n} + T_{m,n} + R_{π,m,n}). Assume the basic convergence conditions (2) for {π̂(·), m̂(·)}, and that ‖Ω̂ − Ω‖ = O_P(a_n) and ‖I − Ω̂Σ̂‖_max = O_P(b_n) (in appropriate matrix norms) for some a_n, b_n = o(1), with ‖Ω‖ = O(1). Then, with high probability, we have: ‖Δ_{n,1}‖_∞ ≲ a_n √((log d)/n), ‖Δ_{n,2}‖_∞ ≲ b_n s √((log d)/n), and ‖Δ_{n,3}‖_∞ is bounded by √((log(nd))/n) times factors involving δ_{n,π}, ξ_{n,m} and their product. Thus, under suitable assumptions on the rates, ‖Δ_n‖_∞ = o_P(n^{−1/2}). Choose Ω̂ to be any standard (sparse) precision matrix estimator, e.g. the node-wise Lasso estimator. Here, a_n = s_Ω √((log d)/n) and b_n = √((log d)/n) under suitable conditions, with s_Ω := max_{1 ≤ j ≤ d} ‖Ω_j‖_0.
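
A minimal Python sketch of the node-wise Lasso precision matrix estimator mentioned above (the usual van-de-Geer-type construction; the scikit-learn usage and the common penalty level for all nodes are my own simplifications):

    import numpy as np
    from sklearn.linear_model import Lasso

    def nodewise_lasso_precision(X, lam):
        """Node-wise Lasso estimate of Omega = Sigma^{-1} from an (n, d) design X."""
        n, d = X.shape
        C = np.eye(d)                  # row j: 1 on the diagonal, -gamma_j elsewhere
        tau2 = np.zeros(d)
        for j in range(d):
            X_j = X[:, j]
            X_rest = np.delete(X, j, axis=1)
            gamma_j = Lasso(alpha=lam, fit_intercept=False).fit(X_rest, X_j).coef_
            resid = X_j - X_rest @ gamma_j
            tau2[j] = X_j @ resid / n  # tau_j^2 (equals ||resid||^2/n + lam*||gamma_j||_1 by the KKT conditions)
            C[j, np.arange(d) != j] = -gamma_j
        return C / tau2[:, None]       # Omega_hat: row j of C scaled by 1 / tau_j^2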

33 The DR Aspect: General Convergence Rates (under Misspecification). Finally, let {π̂(·), m̂(·)} be any general but potentially misspecified estimators with targets {π*(·), m*(·)}, so that either π*(·) = π(·) or m*(·) = m(·) but not necessarily both. Assume the same pointwise convergence conditions and rates (δ_{n,π}, ξ_{n,m}) for {π̂(·), m̂(·)} as in (2), but now with {π(·), m(·)} therein replaced by {π*(·), m*(·)}. Under some suitable assumptions, we have, with high probability: T_{0,n} + T_{π,n} + T_{m,n} ≲ √((log d)/n) {1 + 1((π*, m*) ≠ (π, m)) + o(1)}, and R_{π,m,n} ≲ {δ_{n,π} 1(m* ≠ m) + ξ^{1/2}_{n,m} 1(π* ≠ π) + δ_{n,π} ξ^{1/2}_{n,m}} (log n). Note that the 2nd and/or 3rd terms (depending on which estimator is misspecified) will now also contribute to the rate √((log d)/n). The 4th term is o(1) but no longer ignorable (and may be slower). Regardless, this establishes general convergence rates and the DR property of θ̂_DDR under possible misspecification of {π̂(·), m̂(·)}. For the 4th term, sharper rates need a case-by-case analysis.

35 Choices of the Nuisance Component Estimators π̂(·) and m̂(·). Note: our theory holds generally for any choices of π̂(·) and m̂(·) under mild conditions (provided they are both correct estimators). Under misspecifications, consistency and general non-sharp rates are also established. Sharp rates need case-by-case analyses. Below we provide only some choices of π̂(·) and m̂(·) that may be used to implement our general theory and methods for θ̂_DDR. Choices of π̂(·): we consider 2 (classes of) choices (these choices may also be used to implement the naive IPW type estimator). Choices of m̂(·): first note that m(X) := E(Y | X) ≡ E(Y | X, T = 1) (under MAR). We consider 2 (classes of) choices of m̂(·) as well. For both π̂(·) and m̂(·), we consider estimators from two families: (extended) parametric families, and semi-parametric single index families.

37 Choices of π̂(·): (Extended) Parametric Families. If π(·) is known, we set π̂(·) := π(·). Otherwise, we estimate π(·) via two (classes of) choices of π̂(·) (each assumed to be 'correct'). (Extended) parametric family: π(x) = g{α⊤Ψ(x)}, where g(·) ∈ [0, 1] is a known function [e.g. g_expit(u) := exp(u)/{1 + exp(u)}], Ψ(X) := {ψ_k(X)}_{k=1}^K is any set of K basis functions (with K possibly ≫ n), and α ∈ R^K is an unknown (sparse) parameter vector. Example: Ψ(X) may correspond to the polynomial bases of X up to any fixed degree k. Note: the special case of linear bases (k = 1) includes all standard parametric regression models. Further, the case of π(·) = constant (but unknown), i.e. MCAR, is also included. Estimator: we set π̂(x) = g{α̂⊤Ψ(x)}, where α̂ denotes any suitable estimator (possibly penalized) of α based on T_n := {T_i, X_i}_{i=1}^n. Example of α̂: when g(·) = g_expit(·), α̂ may be obtained based on a standard L_1-penalized logistic regression of {T_i vs. Ψ(X_i)}_{i=1}^n.
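
A minimal Python sketch of this choice with g = expit and linear bases Psi(X) = (1, X), using scikit-learn's L1-penalized logistic regression (the solver and the penalty level C are my own illustrative choices):

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    def fit_pi_parametric(X, T, C=1.0):
        """pi_hat(x) = expit{alpha_hat' (1, x)}, via L1-penalized logistic regression of T on X."""
        clf = LogisticRegression(penalty="l1", solver="liblinear", C=C).fit(X, T)
        return lambda x: clf.predict_proba(np.atleast_2d(x))[:, 1]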

39 Choices of π̂(·): Semi-Parametric Single Index Families. Semi-parametric single index family: π(x) = g(α⊤x), where g(·) ∈ (0, 1) is unknown and α ∈ R^p is a (sparse) unknown parameter (identifiable only up to scalar multiples, hence set ‖α‖_2 = 1 wlog). Given an estimator α̂ of α, we estimate π(x) ≡ E(T | α⊤X) as: π̂(x) ≡ π̂(α̂, x) := [ (nh)^{−1} Σ_{i=1}^n T_i K{α̂⊤(X_i − x)/h} ] / [ (nh)^{−1} Σ_{i=1}^n K{α̂⊤(X_i − x)/h} ] := l̂_π(α̂, x) / f̂_π(α̂, x), where K(·) denotes any standard (2nd order) kernel function, h ≡ h_n > 0 denotes the bandwidth sequence, and l_π(α, x) := l_α(α⊤x) := g(α⊤x) f_α(α⊤x) and f_π(α, x) := f_α(α⊤x), with g(α⊤x) ≡ E(T | α⊤X = α⊤x) and f_α(α⊤x) being the density of α⊤X at α⊤x. Obtaining α̂: in general, any approach (if available) from the (high dimensional) single index model literature can be used. But if X is elliptically symmetric, then α̂ may be obtained as simply as a standard L_1-penalized logistic regression of {T_i vs. X_i}_{i=1}^n.
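
A minimal Python sketch of this single-index kernel estimator of pi(x), given an index estimate alpha_hat (e.g. from the penalized logistic regression above); the Gaussian kernel is my own illustrative choice of K(·), and the bandwidth h is left to the user:

    import numpy as np

    def fit_pi_single_index(X, T, alpha_hat, h):
        """pi_hat(x): Nadaraya-Watson estimate of E(T | alpha_hat'X = alpha_hat'x)."""
        index = X @ alpha_hat                    # alpha_hat' X_i for the training sample
        def pi_hat(x):                           # x: a single covariate vector in R^p
            K = np.exp(-0.5 * ((index - alpha_hat @ x) / h) ** 2)   # Gaussian kernel weights
            return float(K @ T / K.sum())        # the ratio l_hat / f_hat
        return pi_hat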

42 Choices of m̂(·): (Extended) Parametric Families. (Extended) parametric family: m(x) = g{γ⊤Ψ(x)}, where g(·) is a known link function [e.g. canonical links: identity, expit or exp], Ψ(X) := {ψ_k(X)}_{k=1}^K is any set of K basis functions (with K possibly ≫ n), and γ ∈ R^K is an unknown (sparse) parameter vector. Example: Ψ(X) may correspond to the polynomial bases of X up to any fixed degree k. Note: the special case of linear bases (k = 1) includes all standard parametric regression models. Estimator: we set m̂(x) = g{γ̂⊤Ψ(x)}, where γ̂ denotes any suitable estimator (possibly penalized) of γ based on the data subset of 'complete cases': D_n^(c) := {(Y_i, X_i) : T_i = 1}_{i=1}^n. Example of γ̂: when g(·) := any canonical link function, γ̂ may be simply obtained based on the respective usual L_1-penalized canonical link based regression (e.g. linear, logistic or Poisson) of {(Y_i vs. X_i) : T_i = 1}_{i=1}^n from the complete case data D_n^(c).
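
A minimal Python sketch of this choice with the identity link and linear bases, fit by an L_1-penalized (Lasso) regression on the complete cases only (library choice and the default penalty level are mine):

    import numpy as np
    from sklearn.linear_model import Lasso

    def fit_m_parametric(X, Y, T, lam=0.1):
        """m_hat(x) = gamma_hat' (1, x), fit on the complete cases {i : T_i = 1}."""
        cc = (T == 1)                                      # complete-case subset D_n^(c)
        fit = Lasso(alpha=lam, fit_intercept=True).fit(X[cc], Y[cc])
        return lambda x: fit.predict(np.atleast_2d(x))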

44 Choices of m̂(·): Semi-Parametric Single Index Families. Semi-parametric single index family: m(x) = g(γ⊤x), where g(·) is an unknown link and γ ∈ R^p is a (sparse) unknown parameter (identifiable only up to scalar multiples, hence set ‖γ‖_2 = 1 wlog). Given an estimator γ̂ of γ, we estimate m(x) ≡ E(Y | γ⊤X, T = 1) as: m̂(x) ≡ m̂(γ̂, x) := [ (nh)^{−1} Σ_{i=1}^n T_i Y_i K{γ̂⊤(X_i − x)/h} ] / [ (nh)^{−1} Σ_{i=1}^n T_i K{γ̂⊤(X_i − x)/h} ], where K(·) denotes any standard (2nd order) kernel function, and h ≡ h_n > 0 denotes the bandwidth sequence. Obtaining γ̂: in general, any approach (if available) from the HD SIM literature can be used on the complete case data subset D_n^(c). If X is elliptically symmetric and Y = f(γ⊤X; ɛ) with f unknown and ɛ ⊥⊥ (T, X), then γ̂ may be obtained as the L_1-penalized IPW estimator θ̂_IPW (discussed later) for any canonical link based regression problem (recall the illustrative example regarding SIMs). To implement θ̂_IPW, one can use any of the 2 earlier choices of π̂(·).
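
Analogously, a minimal Python sketch of this single-index kernel estimator of m(x), given an index estimate gamma_hat; only the complete cases (T_i = 1) enter the kernel averages, and the Gaussian kernel is again my own illustrative choice:

    import numpy as np

    def fit_m_single_index(X, Y, T, gamma_hat, h):
        """m_hat(x): kernel estimate of E(Y | gamma_hat'X = gamma_hat'x, T = 1)."""
        index = X @ gamma_hat
        def m_hat(x):                            # x: a single covariate vector in R^p
            K = np.exp(-0.5 * ((index - gamma_hat @ x) / h) ** 2)
            return float((K * T) @ Y / (K * T).sum())   # T_i-weighted Nadaraya-Watson ratio
        return m_hat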

47 Convergence Rates Regarding the Choices of π̂(·). For either choice of π̂(·), assume that the ingredient estimator α̂ satisfies: ‖α̂ − α‖_1 ≲_P a_n for some a_n = o(1). Then, under some suitable assumptions, we have, with high probability (w.h.p.): |π̂(x) − π(x)| ≲ a_n for any fixed x ∈ 𝒳 (for method 1). For method 2, we have, with high probability, for any fixed x ∈ 𝒳: |π̂(x) − π(x)| is bounded by a smoothing bias term of order h^2, a variance term of order √(log(np)/(nh)), plus index-estimation terms involving a_n and inverse powers of the bandwidth h (e.g. a_n/h and a_n^2/h^2). Usually, we expect the L_1 error rate of α̂ to be a_n = s_α √((log d*)/n), where s_α := ‖α‖_0 and d* = K or p (depending on the method).

48 Convergence Rates Regarding the Choices of m̂(·). For either of the two choices of m̂(·), assume that the ingredient estimator γ̂ satisfies: ‖γ̂ − γ‖_1 ≲_P b_n for some b_n = o(1). Then, under some suitable assumptions, we have, with high probability: |m̂(x) − m(x)| ≲ b_n for any fixed x ∈ 𝒳 (for method 1). For method 2, we have, with high probability, for any fixed x ∈ 𝒳: |m̂(x) − m(x)| is bounded by a smoothing bias term of order h^2, a variance term of order √(log(np)/(nh)), plus index-estimation terms involving b_n and inverse powers of the bandwidth h (e.g. b_n/h and b_n^2/h^2). We typically expect the L_1 error rate of γ̂ to be b_n = s_γ √((log d*)/n), where s_γ := ‖γ‖_0 and d* = K or p (depending on the method).

49 The Mean Estimation Problem.

50 Mean Estimation: The Debiased and Doubly Robust (DDR) Estimator. Let us next focus on the (low-dimensional) mean estimation problem, where θ_0 := E(Y). Then we first note the DDR representation of θ_0: θ_0 ≡ E(Y) = E_X{m(X)} + E[ {T/π(X)} {Y − m(X)} ], where m(X) := E(Y | X). The second term above is actually 0 and is often called the augmented IPW (AIPW) term. Again, this is a purely non-parametric identification of θ_0 based on the observable Z and two nuisance functions: π(X) and m(X), both of which may be unknown but are still estimable from D_n. Let m̂(·) and π̂(·) be any estimators of m(·) and π(·), and further assume that {m̂(X_i)}_{i=1}^n are obtained in a cross-fitted manner (via sample splitting). Then, define the DDR estimator θ̂_DDR of θ_0: θ̂_DDR = (1/n) Σ_{i=1}^n m̂(X_i) + (1/n) Σ_{i=1}^n {T_i/π̂(X_i)} {Y_i − m̂(X_i)}.
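
A minimal cross-fitted implementation sketch of this DDR (AIPW) mean estimator in Python (my own code); fit_pi and fit_m stand for any user-supplied routines that fit the nuisance functions on a training fold and return fitted functions x -> pi_hat(x) and x -> m_hat(x) (e.g. the choices sketched earlier), and the simple fold-splitting scheme is my own simplification:

    import numpy as np

    def ddr_mean(X, Y, T, fit_pi, fit_m, n_splits=2, seed=0):
        """Cross-fitted DDR/AIPW estimator of theta_0 = E(Y) with outcomes missing at random.

        fit_pi(X, T) and fit_m(X, Y, T) must each return a function of a single
        covariate vector x.
        """
        n = X.shape[0]
        folds = np.random.default_rng(seed).integers(0, n_splits, size=n)
        psi = np.zeros(n)
        for k in range(n_splits):
            train, test = (folds != k), (folds == k)
            pi_hat = fit_pi(X[train], T[train])              # nuisance fits on the training fold
            m_hat = fit_m(X[train], Y[train], T[train])
            pi_te = np.ravel([pi_hat(x) for x in X[test]])   # evaluated on the held-out fold
            m_te = np.ravel([m_hat(x) for x in X[test]])
            # summands: m_hat(X_i) + T_i / pi_hat(X_i) * {Y_i - m_hat(X_i)}
            psi[test] = m_te + T[test] * (Y[test] - m_te) / pi_te
        return psi.mean()

For instance, with the earlier sketches one could call ddr_mean(X, Y, T, fit_pi_parametric, fit_m_parametric).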

52 Properties of the DDR Mean Estimator. θ̂_DDR is consistent for θ_0 as soon as one, but not necessarily both, of π̂(·) or m̂(·) is a correct estimator (consistent, with some rate conditions) of the true π(·) or m(·)! Hence, the double robustness. Further, when both {m̂(·), π̂(·)} are correct, θ̂_DDR is a √n-consistent and RAL (regular and asymptotically linear) estimator of θ_0 with IF (influence function), which is also the optimal/efficient IF, given by ψ_eff(Z) = {m(X) − θ_0} + {T/π(X)}{Y − m(X)}, with E{ψ_eff(Z)} = 0. So θ̂_DDR then satisfies: √n(θ̂_DDR − θ_0) = (1/√n) Σ_{i=1}^n ψ_eff(Z_i) + o_P(1). This holds regardless of how {π̂(·), m̂(·)} are obtained, as long as they satisfy some (basic) consistency and rate requirements. ψ_eff(Z) is the IF with the smallest variance (the semi-parametric efficiency bound) achievable by any RAL estimator of θ_0.

55 The DDR Mean Estimator: √n-Consistency and RAL Properties. Recall that, using the expansion we already got, we have: √n(θ̂_DDR − θ_0) = (1/√n) Σ_{i=1}^n ψ(Z_i) + T_n(m̂) + T_n(π̂) − R_n. First term: not a problem; a √n-scaled i.i.d. average of ψ(Z) with E{ψ(Z)} = 0 and Var{ψ(Z)} < ∞. Last term: essentially need to make it go to zero. Will hold whenever √n ‖π̂(·) − π*(·)‖ ‖m̂(·) − m*(·)‖ = o_P(1). A typical sufficient (but not necessary) condition: each of ‖π̂(·) − π*(·)‖ and ‖m̂(·) − m*(·)‖ is o_P(n^{−0.25}). Now for T_n(m̂) ≡ (1/√n) Σ_{i=1}^n {m̂(X_i) − m*(X_i)} {1 − T_i/π*(X_i)}: if π*(·) = π(·), then E{T_n(m̂)} = 0 and Var{T_n(m̂)} = O[E_X{m̂(X) − m*(X)}^2] = o(1). Hence, T_n(m̂) = o_P(1). But if π*(·) ≠ π(·), then {m̂(·) − m*(·)} will contribute and the rate is unclear! It may converge slower than n^{−1/2} in high-dimensional cases, so T_n(m̂) need not be o_P(1).

58 Analysis of the DDR Estimator: RAL Properties Contd. Next, for T_n(π̂) = (1/√n) Σ_{i=1}^n {Y_i − m*(X_i)} {T_i/π̂(X_i) − T_i/π*(X_i)}: if m*(·) = m(·), then E{T_n(π̂)} = 0 and Var{T_n(π̂)} = O[E_X{π̂(X) − π*(X)}^2] = o(1). Hence, T_n(π̂) = o_P(1). But if m*(·) ≠ m(·), then {π̂(·) − π*(·)} will contribute and the rate is unclear! It may converge slower than n^{−1/2} in high-dimensional cases, so T_n(π̂) need not be o_P(1). Overall, as long as one of m̂(·) or π̂(·) is correct, θ̂_DDR is indeed a consistent (and hence, DR) estimator of θ_0. But its RAL properties and achievability of the √n-rate will generally require more conditions and a case by case analysis, and the exact IF (if existent) depends on which of the two is correct (and how we obtain them). If m̂(·) and π̂(·) are both correct, then regardless of how they are obtained, θ̂_DDR is always an optimal RAL estimator of θ_0. Choices of {π̂(·), m̂(·)}? We first obtain a general theory and then prove the rates for a class of choices (possibly misspecified).


More information

5 : Exponential Family and Generalized Linear Models

5 : Exponential Family and Generalized Linear Models 0-708: Probabilistic Graphical Models 0-708, Sprig 206 5 : Expoetial Family ad Geeralized Liear Models Lecturer: Matthew Gormley Scribes: Yua Li, Yichog Xu, Silu Wag Expoetial Family Probability desity

More information

6 Sample Size Calculations

6 Sample Size Calculations 6 Sample Size Calculatios Oe of the major resposibilities of a cliical trial statisticia is to aid the ivestigators i determiig the sample size required to coduct a study The most commo procedure for determiig

More information

Stochastic Simulation

Stochastic Simulation Stochastic Simulatio 1 Itroductio Readig Assigmet: Read Chapter 1 of text. We shall itroduce may of the key issues to be discussed i this course via a couple of model problems. Model Problem 1 (Jackso

More information

Resampling Methods. X (1/2), i.e., Pr (X i m) = 1/2. We order the data: X (1) X (2) X (n). Define the sample median: ( n.

Resampling Methods. X (1/2), i.e., Pr (X i m) = 1/2. We order the data: X (1) X (2) X (n). Define the sample median: ( n. Jauary 1, 2019 Resamplig Methods Motivatio We have so may estimators with the property θ θ d N 0, σ 2 We ca also write θ a N θ, σ 2 /, where a meas approximately distributed as Oce we have a cosistet estimator

More information

A Distributional Approach Using Propensity Scores

A Distributional Approach Using Propensity Scores A Distributioal Approach Usig Propesity Scores Zhiqiag Ta Departmet of Biostatistics Johs Hopkis School of Public Health http://www.biostat.jhsph.edu/ zta Jue 20, 2005 Outlie Itroductio Couterfactual framework

More information

Empirical Processes: Glivenko Cantelli Theorems

Empirical Processes: Glivenko Cantelli Theorems Empirical Processes: Gliveko Catelli Theorems Mouliath Baerjee Jue 6, 200 Gliveko Catelli classes of fuctios The reader is referred to Chapter.6 of Weller s Torgo otes, Chapter??? of VDVW ad Chapter 8.3

More information

7.1 Convergence of sequences of random variables

7.1 Convergence of sequences of random variables Chapter 7 Limit Theorems Throughout this sectio we will assume a probability space (, F, P), i which is defied a ifiite sequece of radom variables (X ) ad a radom variable X. The fact that for every ifiite

More information

Optimally Sparse SVMs

Optimally Sparse SVMs A. Proof of Lemma 3. We here prove a lower boud o the umber of support vectors to achieve geeralizatio bouds of the form which we cosider. Importatly, this result holds ot oly for liear classifiers, but

More information

LECTURE 14 NOTES. A sequence of α-level tests {ϕ n (x)} is consistent if

LECTURE 14 NOTES. A sequence of α-level tests {ϕ n (x)} is consistent if LECTURE 14 NOTES 1. Asymptotic power of tests. Defiitio 1.1. A sequece of -level tests {ϕ x)} is cosistet if β θ) := E θ [ ϕ x) ] 1 as, for ay θ Θ 1. Just like cosistecy of a sequece of estimators, Defiitio

More information

MATH 320: Probability and Statistics 9. Estimation and Testing of Parameters. Readings: Pruim, Chapter 4

MATH 320: Probability and Statistics 9. Estimation and Testing of Parameters. Readings: Pruim, Chapter 4 MATH 30: Probability ad Statistics 9. Estimatio ad Testig of Parameters Estimatio ad Testig of Parameters We have bee dealig situatios i which we have full kowledge of the distributio of a radom variable.

More information

Supplemental Material: Proofs

Supplemental Material: Proofs Proof to Theorem Supplemetal Material: Proofs Proof. Let be the miimal umber of traiig items to esure a uique solutio θ. First cosider the case. It happes if ad oly if θ ad Rak(A) d, which is a special

More information

Asymptotic Results for the Linear Regression Model

Asymptotic Results for the Linear Regression Model Asymptotic Results for the Liear Regressio Model C. Fli November 29, 2000 1. Asymptotic Results uder Classical Assumptios The followig results apply to the liear regressio model y = Xβ + ε, where X is

More information

Random Variables, Sampling and Estimation

Random Variables, Sampling and Estimation Chapter 1 Radom Variables, Samplig ad Estimatio 1.1 Itroductio This chapter will cover the most importat basic statistical theory you eed i order to uderstad the ecoometric material that will be comig

More information

Lecture 3 : Random variables and their distributions

Lecture 3 : Random variables and their distributions Lecture 3 : Radom variables ad their distributios 3.1 Radom variables Let (Ω, F) ad (S, S) be two measurable spaces. A map X : Ω S is measurable or a radom variable (deoted r.v.) if X 1 (A) {ω : X(ω) A}

More information

SEMIPARAMETRIC SINGLE-INDEX MODELS. Joel L. Horowitz Department of Economics Northwestern University

SEMIPARAMETRIC SINGLE-INDEX MODELS. Joel L. Horowitz Department of Economics Northwestern University SEMIPARAMETRIC SINGLE-INDEX MODELS by Joel L. Horowitz Departmet of Ecoomics Northwester Uiversity INTRODUCTION Much of applied ecoometrics ad statistics ivolves estimatig a coditioal mea fuctio: E ( Y

More information

January 25, 2017 INTRODUCTION TO MATHEMATICAL STATISTICS

January 25, 2017 INTRODUCTION TO MATHEMATICAL STATISTICS Jauary 25, 207 INTRODUCTION TO MATHEMATICAL STATISTICS Abstract. A basic itroductio to statistics assumig kowledge of probability theory.. Probability I a typical udergraduate problem i probability, we

More information

Statisticians use the word population to refer the total number of (potential) observations under consideration

Statisticians use the word population to refer the total number of (potential) observations under consideration 6 Samplig Distributios Statisticias use the word populatio to refer the total umber of (potetial) observatios uder cosideratio The populatio is just the set of all possible outcomes i our sample space

More information

Fall 2013 MTH431/531 Real analysis Section Notes

Fall 2013 MTH431/531 Real analysis Section Notes Fall 013 MTH431/531 Real aalysis Sectio 8.1-8. Notes Yi Su 013.11.1 1. Defiitio of uiform covergece. We look at a sequece of fuctios f (x) ad study the coverget property. Notice we have two parameters

More information

Ada Boost, Risk Bounds, Concentration Inequalities. 1 AdaBoost and Estimates of Conditional Probabilities

Ada Boost, Risk Bounds, Concentration Inequalities. 1 AdaBoost and Estimates of Conditional Probabilities CS8B/Stat4B Sprig 008) Statistical Learig Theory Lecture: Ada Boost, Risk Bouds, Cocetratio Iequalities Lecturer: Peter Bartlett Scribe: Subhrasu Maji AdaBoost ad Estimates of Coditioal Probabilities We

More information

lim za n n = z lim a n n.

lim za n n = z lim a n n. Lecture 6 Sequeces ad Series Defiitio 1 By a sequece i a set A, we mea a mappig f : N A. It is customary to deote a sequece f by {s } where, s := f(). A sequece {z } of (complex) umbers is said to be coverget

More information

A RANK STATISTIC FOR NON-PARAMETRIC K-SAMPLE AND CHANGE POINT PROBLEMS

A RANK STATISTIC FOR NON-PARAMETRIC K-SAMPLE AND CHANGE POINT PROBLEMS J. Japa Statist. Soc. Vol. 41 No. 1 2011 67 73 A RANK STATISTIC FOR NON-PARAMETRIC K-SAMPLE AND CHANGE POINT PROBLEMS Yoichi Nishiyama* We cosider k-sample ad chage poit problems for idepedet data i a

More information

Lecture 12: September 27

Lecture 12: September 27 36-705: Itermediate Statistics Fall 207 Lecturer: Siva Balakrisha Lecture 2: September 27 Today we will discuss sufficiecy i more detail ad the begi to discuss some geeral strategies for costructig estimators.

More information

6.867 Machine learning, lecture 7 (Jaakkola) 1

6.867 Machine learning, lecture 7 (Jaakkola) 1 6.867 Machie learig, lecture 7 (Jaakkola) 1 Lecture topics: Kerel form of liear regressio Kerels, examples, costructio, properties Liear regressio ad kerels Cosider a slightly simpler model where we omit

More information

Summary. Recap ... Last Lecture. Summary. Theorem

Summary. Recap ... Last Lecture. Summary. Theorem Last Lecture Biostatistics 602 - Statistical Iferece Lecture 23 Hyu Mi Kag April 11th, 2013 What is p-value? What is the advatage of p-value compared to hypothesis testig procedure with size α? How ca

More information

Discrete Mathematics for CS Spring 2008 David Wagner Note 22

Discrete Mathematics for CS Spring 2008 David Wagner Note 22 CS 70 Discrete Mathematics for CS Sprig 2008 David Wager Note 22 I.I.D. Radom Variables Estimatig the bias of a coi Questio: We wat to estimate the proportio p of Democrats i the US populatio, by takig

More information

Introduction to Extreme Value Theory Laurens de Haan, ISM Japan, Erasmus University Rotterdam, NL University of Lisbon, PT

Introduction to Extreme Value Theory Laurens de Haan, ISM Japan, Erasmus University Rotterdam, NL University of Lisbon, PT Itroductio to Extreme Value Theory Laures de Haa, ISM Japa, 202 Itroductio to Extreme Value Theory Laures de Haa Erasmus Uiversity Rotterdam, NL Uiversity of Lisbo, PT Itroductio to Extreme Value Theory

More information

Exponential Families and Bayesian Inference

Exponential Families and Bayesian Inference Computer Visio Expoetial Families ad Bayesia Iferece Lecture Expoetial Families A expoetial family of distributios is a d-parameter family f(x; havig the followig form: f(x; = h(xe g(t T (x B(, (. where

More information

Chapter 6 Principles of Data Reduction

Chapter 6 Principles of Data Reduction Chapter 6 for BST 695: Special Topics i Statistical Theory. Kui Zhag, 0 Chapter 6 Priciples of Data Reductio Sectio 6. Itroductio Goal: To summarize or reduce the data X, X,, X to get iformatio about a

More information

ANALYSIS OF EXPERIMENTAL ERRORS

ANALYSIS OF EXPERIMENTAL ERRORS ANALYSIS OF EXPERIMENTAL ERRORS All physical measuremets ecoutered i the verificatio of physics theories ad cocepts are subject to ucertaities that deped o the measurig istrumets used ad the coditios uder

More information

b i u x i U a i j u x i u x j

b i u x i U a i j u x i u x j M ath 5 2 7 Fall 2 0 0 9 L ecture 1 9 N ov. 1 6, 2 0 0 9 ) S ecod- Order Elliptic Equatios: Weak S olutios 1. Defiitios. I this ad the followig two lectures we will study the boudary value problem Here

More information

4. Partial Sums and the Central Limit Theorem

4. Partial Sums and the Central Limit Theorem 1 of 10 7/16/2009 6:05 AM Virtual Laboratories > 6. Radom Samples > 1 2 3 4 5 6 7 4. Partial Sums ad the Cetral Limit Theorem The cetral limit theorem ad the law of large umbers are the two fudametal theorems

More information

Berry-Esseen bounds for self-normalized martingales

Berry-Esseen bounds for self-normalized martingales Berry-Essee bouds for self-ormalized martigales Xiequa Fa a, Qi-Ma Shao b a Ceter for Applied Mathematics, Tiaji Uiversity, Tiaji 30007, Chia b Departmet of Statistics, The Chiese Uiversity of Hog Kog,

More information

Kernel density estimator

Kernel density estimator Jauary, 07 NONPARAMETRIC ERNEL DENSITY ESTIMATION I this lecture, we discuss kerel estimatio of probability desity fuctios PDF Noparametric desity estimatio is oe of the cetral problems i statistics I

More information

The standard deviation of the mean

The standard deviation of the mean Physics 6C Fall 20 The stadard deviatio of the mea These otes provide some clarificatio o the distictio betwee the stadard deviatio ad the stadard deviatio of the mea.. The sample mea ad variace Cosider

More information

Lecture 19. sup y 1,..., yn B d n

Lecture 19. sup y 1,..., yn B d n STAT 06A: Polyomials of adom Variables Lecture date: Nov Lecture 19 Grothedieck s Iequality Scribe: Be Hough The scribes are based o a guest lecture by ya O Doell. I this lecture we prove Grothedieck s

More information

Information Theory and Statistics Lecture 4: Lempel-Ziv code

Information Theory and Statistics Lecture 4: Lempel-Ziv code Iformatio Theory ad Statistics Lecture 4: Lempel-Ziv code Łukasz Dębowski ldebowsk@ipipa.waw.pl Ph. D. Programme 203/204 Etropy rate is the limitig compressio rate Theorem For a statioary process (X i)

More information

1.010 Uncertainty in Engineering Fall 2008

1.010 Uncertainty in Engineering Fall 2008 MIT OpeCourseWare http://ocw.mit.edu.00 Ucertaity i Egieerig Fall 2008 For iformatio about citig these materials or our Terms of Use, visit: http://ocw.mit.edu.terms. .00 - Brief Notes # 9 Poit ad Iterval

More information

Estimation for Complete Data

Estimation for Complete Data Estimatio for Complete Data complete data: there is o loss of iformatio durig study. complete idividual complete data= grouped data A complete idividual data is the oe i which the complete iformatio of

More information

Statistical Inference (Chapter 10) Statistical inference = learn about a population based on the information provided by a sample.

Statistical Inference (Chapter 10) Statistical inference = learn about a population based on the information provided by a sample. Statistical Iferece (Chapter 10) Statistical iferece = lear about a populatio based o the iformatio provided by a sample. Populatio: The set of all values of a radom variable X of iterest. Characterized

More information

Machine Learning Theory Tübingen University, WS 2016/2017 Lecture 11

Machine Learning Theory Tübingen University, WS 2016/2017 Lecture 11 Machie Learig Theory Tübige Uiversity, WS 06/07 Lecture Tolstikhi Ilya Abstract We will itroduce the otio of reproducig kerels ad associated Reproducig Kerel Hilbert Spaces (RKHS). We will cosider couple

More information

5. Likelihood Ratio Tests

5. Likelihood Ratio Tests 1 of 5 7/29/2009 3:16 PM Virtual Laboratories > 9. Hy pothesis Testig > 1 2 3 4 5 6 7 5. Likelihood Ratio Tests Prelimiaries As usual, our startig poit is a radom experimet with a uderlyig sample space,

More information

Lecture 12: November 13, 2018

Lecture 12: November 13, 2018 Mathematical Toolkit Autum 2018 Lecturer: Madhur Tulsiai Lecture 12: November 13, 2018 1 Radomized polyomial idetity testig We will use our kowledge of coditioal probability to prove the followig lemma,

More information

Properties and Hypothesis Testing

Properties and Hypothesis Testing Chapter 3 Properties ad Hypothesis Testig 3.1 Types of data The regressio techiques developed i previous chapters ca be applied to three differet kids of data. 1. Cross-sectioal data. 2. Time series data.

More information

LECTURE 8: ASYMPTOTICS I

LECTURE 8: ASYMPTOTICS I LECTURE 8: ASYMPTOTICS I We are iterested i the properties of estimators as. Cosider a sequece of radom variables {, X 1}. N. M. Kiefer, Corell Uiversity, Ecoomics 60 1 Defiitio: (Weak covergece) A sequece

More information

Algorithms for Clustering

Algorithms for Clustering CR2: Statistical Learig & Applicatios Algorithms for Clusterig Lecturer: J. Salmo Scribe: A. Alcolei Settig: give a data set X R p where is the umber of observatio ad p is the umber of features, we wat

More information

Bayesian Methods: Introduction to Multi-parameter Models

Bayesian Methods: Introduction to Multi-parameter Models Bayesia Methods: Itroductio to Multi-parameter Models Parameter: θ = ( θ, θ) Give Likelihood p(y θ) ad prior p(θ ), the posterior p proportioal to p(y θ) x p(θ ) Margial posterior ( θ, θ y) is Iterested

More information

An Introduction to Randomized Algorithms

An Introduction to Randomized Algorithms A Itroductio to Radomized Algorithms The focus of this lecture is to study a radomized algorithm for quick sort, aalyze it usig probabilistic recurrece relatios, ad also provide more geeral tools for aalysis

More information

Statistical Inference Based on Extremum Estimators

Statistical Inference Based on Extremum Estimators T. Rotheberg Fall, 2007 Statistical Iferece Based o Extremum Estimators Itroductio Suppose 0, the true value of a p-dimesioal parameter, is kow to lie i some subset S R p : Ofte we choose to estimate 0

More information

Lecture 01: the Central Limit Theorem. 1 Central Limit Theorem for i.i.d. random variables

Lecture 01: the Central Limit Theorem. 1 Central Limit Theorem for i.i.d. random variables CSCI-B609: A Theorist s Toolkit, Fall 06 Aug 3 Lecture 0: the Cetral Limit Theorem Lecturer: Yua Zhou Scribe: Yua Xie & Yua Zhou Cetral Limit Theorem for iid radom variables Let us say that we wat to aalyze

More information

Random Walks on Discrete and Continuous Circles. by Jeffrey S. Rosenthal School of Mathematics, University of Minnesota, Minneapolis, MN, U.S.A.

Random Walks on Discrete and Continuous Circles. by Jeffrey S. Rosenthal School of Mathematics, University of Minnesota, Minneapolis, MN, U.S.A. Radom Walks o Discrete ad Cotiuous Circles by Jeffrey S. Rosethal School of Mathematics, Uiversity of Miesota, Mieapolis, MN, U.S.A. 55455 (Appeared i Joural of Applied Probability 30 (1993), 780 789.)

More information

7.1 Convergence of sequences of random variables

7.1 Convergence of sequences of random variables Chapter 7 Limit theorems Throughout this sectio we will assume a probability space (Ω, F, P), i which is defied a ifiite sequece of radom variables (X ) ad a radom variable X. The fact that for every ifiite

More information

Lecture 11 and 12: Basic estimation theory

Lecture 11 and 12: Basic estimation theory Lecture ad 2: Basic estimatio theory Sprig 202 - EE 94 Networked estimatio ad cotrol Prof. Kha March 2 202 I. MAXIMUM-LIKELIHOOD ESTIMATORS The maximum likelihood priciple is deceptively simple. Louis

More information

On Classification Based on Totally Bounded Classes of Functions when There are Incomplete Covariates

On Classification Based on Totally Bounded Classes of Functions when There are Incomplete Covariates Joural of Statistical Theory ad Applicatios Volume, Number 4, 0, pp. 353-369 ISSN 538-7887 O Classificatio Based o Totally Bouded Classes of Fuctios whe There are Icomplete Covariates Majid Mojirsheibai

More information

6. Sufficient, Complete, and Ancillary Statistics

6. Sufficient, Complete, and Ancillary Statistics Sufficiet, Complete ad Acillary Statistics http://www.math.uah.edu/stat/poit/sufficiet.xhtml 1 of 7 7/16/2009 6:13 AM Virtual Laboratories > 7. Poit Estimatio > 1 2 3 4 5 6 6. Sufficiet, Complete, ad Acillary

More information

Support vector machine revisited

Support vector machine revisited 6.867 Machie learig, lecture 8 (Jaakkola) 1 Lecture topics: Support vector machie ad kerels Kerel optimizatio, selectio Support vector machie revisited Our task here is to first tur the support vector

More information

Rates of Convergence by Moduli of Continuity

Rates of Convergence by Moduli of Continuity Rates of Covergece by Moduli of Cotiuity Joh Duchi: Notes for Statistics 300b March, 017 1 Itroductio I this ote, we give a presetatio showig the importace, ad relatioship betwee, the modulis of cotiuity

More information

CS284A: Representations and Algorithms in Molecular Biology

CS284A: Representations and Algorithms in Molecular Biology CS284A: Represetatios ad Algorithms i Molecular Biology Scribe Notes o Lectures 3 & 4: Motif Discovery via Eumeratio & Motif Represetatio Usig Positio Weight Matrix Joshua Gervi Based o presetatios by

More information

MASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.265/15.070J Fall 2013 Lecture 3 9/11/2013. Large deviations Theory. Cramér s Theorem

MASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.265/15.070J Fall 2013 Lecture 3 9/11/2013. Large deviations Theory. Cramér s Theorem MASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.265/5.070J Fall 203 Lecture 3 9//203 Large deviatios Theory. Cramér s Theorem Cotet.. Cramér s Theorem. 2. Rate fuctio ad properties. 3. Chage of measure techique.

More information

Econ 325 Notes on Point Estimator and Confidence Interval 1 By Hiro Kasahara

Econ 325 Notes on Point Estimator and Confidence Interval 1 By Hiro Kasahara Poit Estimator Eco 325 Notes o Poit Estimator ad Cofidece Iterval 1 By Hiro Kasahara Parameter, Estimator, ad Estimate The ormal probability desity fuctio is fully characterized by two costats: populatio

More information

Lecture Stat Maximum Likelihood Estimation

Lecture Stat Maximum Likelihood Estimation Lecture Stat 461-561 Maximum Likelihood Estimatio A.D. Jauary 2008 A.D. () Jauary 2008 1 / 63 Maximum Likelihood Estimatio Ivariace Cosistecy E ciecy Nuisace Parameters A.D. () Jauary 2008 2 / 63 Parametric

More information

Notes On Median and Quantile Regression. James L. Powell Department of Economics University of California, Berkeley

Notes On Median and Quantile Regression. James L. Powell Department of Economics University of California, Berkeley Notes O Media ad Quatile Regressio James L. Powell Departmet of Ecoomics Uiversity of Califoria, Berkeley Coditioal Media Restrictios ad Least Absolute Deviatios It is well-kow that the expected value

More information

1 Covariance Estimation

1 Covariance Estimation Eco 75 Lecture 5 Covariace Estimatio ad Optimal Weightig Matrices I this lecture, we cosider estimatio of the asymptotic covariace matrix B B of the extremum estimator b : Covariace Estimatio Lemma 4.

More information

REGRESSION WITH QUADRATIC LOSS

REGRESSION WITH QUADRATIC LOSS REGRESSION WITH QUADRATIC LOSS MAXIM RAGINSKY Regressio with quadratic loss is aother basic problem studied i statistical learig theory. We have a radom couple Z = X, Y ), where, as before, X is a R d

More information

Linear regression. Daniel Hsu (COMS 4771) (y i x T i β)2 2πσ. 2 2σ 2. 1 n. (x T i β y i ) 2. 1 ˆβ arg min. β R n d

Linear regression. Daniel Hsu (COMS 4771) (y i x T i β)2 2πσ. 2 2σ 2. 1 n. (x T i β y i ) 2. 1 ˆβ arg min. β R n d Liear regressio Daiel Hsu (COMS 477) Maximum likelihood estimatio Oe of the simplest liear regressio models is the followig: (X, Y ),..., (X, Y ), (X, Y ) are iid radom pairs takig values i R d R, ad Y

More information

Measure and Measurable Functions

Measure and Measurable Functions 3 Measure ad Measurable Fuctios 3.1 Measure o a Arbitrary σ-algebra Recall from Chapter 2 that the set M of all Lebesgue measurable sets has the followig properties: R M, E M implies E c M, E M for N implies

More information

MATHEMATICAL SCIENCES PAPER-II

MATHEMATICAL SCIENCES PAPER-II MATHEMATICAL SCIENCES PAPER-II. Let {x } ad {y } be two sequeces of real umbers. Prove or disprove each of the statemets :. If {x y } coverges, ad if {y } is coverget, the {x } is coverget.. {x + y } coverges

More information

Output Analysis and Run-Length Control

Output Analysis and Run-Length Control IEOR E4703: Mote Carlo Simulatio Columbia Uiversity c 2017 by Marti Haugh Output Aalysis ad Ru-Legth Cotrol I these otes we describe how the Cetral Limit Theorem ca be used to costruct approximate (1 α%

More information

Lecture 7: Properties of Random Samples

Lecture 7: Properties of Random Samples Lecture 7: Properties of Radom Samples 1 Cotiued From Last Class Theorem 1.1. Let X 1, X,...X be a radom sample from a populatio with mea µ ad variace σ

More information

Dirichlet s Theorem on Arithmetic Progressions

Dirichlet s Theorem on Arithmetic Progressions Dirichlet s Theorem o Arithmetic Progressios Athoy Várilly Harvard Uiversity, Cambridge, MA 0238 Itroductio Dirichlet s theorem o arithmetic progressios is a gem of umber theory. A great part of its beauty

More information

Binomial Distribution

Binomial Distribution 0.0 0.5 1.0 1.5 2.0 2.5 3.0 0 1 2 3 4 5 6 7 0.0 0.5 1.0 1.5 2.0 2.5 3.0 Overview Example: coi tossed three times Defiitio Formula Recall that a r.v. is discrete if there are either a fiite umber of possible

More information

Large Sample Theory. Convergence. Central Limit Theorems Asymptotic Distribution Delta Method. Convergence in Probability Convergence in Distribution

Large Sample Theory. Convergence. Central Limit Theorems Asymptotic Distribution Delta Method. Convergence in Probability Convergence in Distribution Large Sample Theory Covergece Covergece i Probability Covergece i Distributio Cetral Limit Theorems Asymptotic Distributio Delta Method Covergece i Probability A sequece of radom scalars {z } = (z 1,z,

More information

Integrable Functions. { f n } is called a determining sequence for f. If f is integrable with respect to, then f d does exist as a finite real number

Integrable Functions. { f n } is called a determining sequence for f. If f is integrable with respect to, then f d does exist as a finite real number MATH 532 Itegrable Fuctios Dr. Neal, WKU We ow shall defie what it meas for a measurable fuctio to be itegrable, show that all itegral properties of simple fuctios still hold, ad the give some coditios

More information

Lecture 11 October 27

Lecture 11 October 27 STATS 300A: Theory of Statistics Fall 205 Lecture October 27 Lecturer: Lester Mackey Scribe: Viswajith Veugopal, Vivek Bagaria, Steve Yadlowsky Warig: These otes may cotai factual ad/or typographic errors..

More information

Accuracy Assessment for High-Dimensional Linear Regression

Accuracy Assessment for High-Dimensional Linear Regression Uiversity of Pesylvaia ScholarlyCommos Statistics Papers Wharto Faculty Research -016 Accuracy Assessmet for High-Dimesioal Liear Regressio Toy Cai Uiversity of Pesylvaia Zijia Guo Uiversity of Pesylvaia

More information