High-Dimensional M-Estimation with Missing Outcomes: A Semi-Parametric Framework
1 High-Dimensional M-Estimation with Missing Outcomes: A Semi-Parametric Framework (An Overview of the Methods and the Main Results). Abhishek Chakrabortty, University of Pennsylvania. Harvard Visit, August 20-23, 2018.
2 The Basic Framework and Set-Up
Variables of interest: outcome $Y \in \mathcal{Y} \subseteq \mathbb{R}$ and covariates $X \in \mathcal{X} \subseteq \mathbb{R}^p$ (possibly high dimensional, compared to the sample size $n$). The supports $\mathcal{Y}$ and $\mathcal{X}$ of $Y$ and $X$ need not be continuous.
Main issue: the outcome $Y$ may not always be observed. Let $T \in \{0, 1\}$ denote the indicator of the true $Y$ being observed.
The (partly) unobserved random vector $(T, Y, X)$ is assumed to be jointly defined on a common probability space with measure $\mathbb{P}(\cdot)$.
Observables: $Z := (T, TY, X)$. Data: $\mathcal{D}_n := \{Z_i \equiv (T_i, T_i Y_i, X_i)\}_{i=1}^n$, consisting of $n$ i.i.d. realizations of $Z$ (whose distribution is defined via $\mathbb{P}(\cdot)$).
Note: the setting is allowed to be high dimensional, wherein $p$ can diverge with $n$, including regimes where $p$ is smaller than, comparable to, or larger than $n$.
Abhishek Chakrabortty, High-D M-Estimation with Missing Outcomes: A Semi-Parametric Framework, 2/35
4 General Applicability of the Framework Itself
Generally applicable to any missing data setting with missing outcomes $Y$ and (possibly) high dimensional covariates $X$.
Causal inference problems (via the potential outcomes framework). Here, $X$ is often called confounders (for observational studies) or adjustment variables/features (for randomized trials).
Usual set-up: binary treatment (a.k.a. exposure/intervention) assignment $T \in \{0, 1\}$, and potential outcomes $\{Y_{(0)}, Y_{(1)}\}$.
More generally, multi-category treatments: $T \in \{0, 1, \ldots, k\}$ (for a fixed $k \geq 1$) and corresponding potential outcomes $\{Y_{(j)}\}_{j=0}^k$.
Observed outcome: $Y := \sum_{j=0}^k Y_{(j)} 1(T = j)$ or, equivalently, $(Y \mid T = j) \equiv Y_{(j)}$ for all $j$ [i.e. depending on $T$, we observe only one of $\{Y_{(j)}\}_{j=0}^k$].
Applicability: for each $j \in \{0, \ldots, k\}$, the set-up above is included in our framework via the representative map in which $(T_j, Y_{(j)}, X)$ plays the role of $(T, Y, X)$, with $T_j := 1(T = j)$ for all $0 \leq j \leq k$.
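The reduction from a multi-category treatment to the missing-outcome form can be sketched in a few lines. This is only an illustration: the function name `to_missing_outcome_form` and the toy arrays are hypothetical and not part of the slides.

```python
import numpy as np

def to_missing_outcome_form(T, Y, j):
    """Map (T, Y) with a multi-category treatment T to the pair (T_j, T_j * Y).

    T_j = 1(T = j) marks the units whose observed outcome equals the
    potential outcome Y_(j); for every other unit Y_(j) is treated as missing.
    """
    T = np.asarray(T)
    Y = np.asarray(Y, dtype=float)
    T_j = (T == j).astype(int)
    return T_j, T_j * Y  # the observable (T_j, T_j Y_(j)); X is left unchanged

# Toy data (hypothetical): three treatment arms 0, 1, 2.
T = np.array([0, 2, 1, 2, 0])
Y = np.array([1.5, 0.3, 2.0, 4.1, 0.7])
T2, TY2 = to_missing_outcome_form(T, Y, j=2)
```

Running the same function once per arm $j$ gives $k + 1$ missing-outcome problems, each of which fits the framework of the basic set-up.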
6 The Standard Fundamental Assumptions
Ignorability assumption: $T \perp Y \mid X$.
A.k.a. missing at random (MAR) in the missing data literature.
A.k.a. no unmeasured confounding (NUC) in causal inference.
Special case: $T \perp (Y, X)$. A.k.a. missing completely at random (MCAR) in the missing data literature, and complete randomization (e.g. randomized trials) in the causal inference (CI) literature.
Positivity assumption (a.k.a. sufficient overlap in the CI literature): let $\pi(X) := \mathbb{P}(T = 1 \mid X)$ be the propensity score (PS), and let $\pi_0 := \mathbb{P}(T = 1)$. Then, $\pi(\cdot)$ is uniformly bounded away from 0: $1 \geq \pi(x) \geq \delta_\pi > 0$ for all $x \in \mathcal{X}$, for some constant $\delta_\pi > 0$.
8 The Parameter(s) of Interest: (High Dim.) M-Estimation Parameters
Given this setting, we aim to estimate (based on the data $\mathcal{D}_n$) the following parameter $\theta_0 \in \mathbb{R}^d$ (possibly high dimensional):
$$\theta_0 := \arg\min_{\theta \in \mathbb{R}^d} R(\theta), \quad \text{where } R(\theta) := E\{L(Y, X, \theta)\}$$
and $L(Y, X, \theta): \mathbb{R} \times \mathbb{R}^p \times \mathbb{R}^d \to \mathbb{R}$ is any loss function that is convex and differentiable in $\theta \in \mathbb{R}^d$.
(The existence and uniqueness of $\theta_0$ is implicitly assumed. It is guaranteed whenever $R(\theta)$ is strongly convex and coercive. This is true for most standard examples.)
Generally, this corresponds to M-estimation problems (which have a vast classical literature). We provide some useful examples later.
The key challenges: the missingness via $T$ (if not accounted for, the estimator will be inconsistent!) and the high dimensionality (of $X$ and $\theta_0$). These issues make the methods and the analyses quite tricky.
Need to devise suitable methods: this involves estimation of nuisance functions (leading to error terms involving non-iid summands with complex dependencies) and careful non-asymptotic analyses.
10 The Parameter(s) of Interest and Problem Formulation Contd.
The case of low-d parameters is also considered, e.g. with $d = 1$ and $L(Y, X, \theta) := (Y - \theta)^2$, we have mean estimation: $\theta_0 := E(Y)$.
This also relates to average treatment effect (ATE) estimation in CI (and also, in the process, to average conditional treatment effect estimation, which is of interest in personalized medicine).
Note: the same methodology also addresses (coordinate-wise) estimation of high-d means, e.g. when $\theta_0 := E(XY)$.
Today, we mainly focus on the more challenging high dimensional M-estimation problem introduced earlier. The basic underlying principle is almost the same for the low-d mean estimation problem as well.
We first provide a few (classes of) applications for this problem.
11 High Dimensional M-Estimation: A Few (Classes of) Applications
1. Standard high dimensional regression problems with (1) missing outcomes and (2) potentially misspecified (working) models.
Set $d = p + 1$ and $L(Y, X, \theta) := l(Y, \overrightarrow{X}^\top \theta)$ with $l(\cdot, \cdot): \mathbb{R} \times \mathbb{R} \to \mathbb{R}$, where $\overrightarrow{X} := (1, X^\top)^\top$ includes an intercept (consistent with $d = p + 1$).
Some choices of $l(\cdot, \cdot)$: (a) squared loss: $l_{sq}(u, v) := (u - v)^2$, (b) logistic loss: $l_{log}(u, v) := \log\{1 + \exp(v)\} - uv$, (c) exponential loss: $l_{exp}(u, v) := \exp(v) - uv$, among many others.
Note: throughout, regardless of any motivating working model being true or not, the definition of $\theta_0$ is completely model free.
2. Any series estimation or non-linear regression problem based on finite (but high dimensional) bases (no model) and missing $Y$.
Let $\Psi(X) := \{\psi_j(X)\}_{j=1}^d$ be any set of $d$ basis functions, with $d$ possibly high dimensional. Set $L(Y, X, \theta) := l\{Y, \Psi(X)^\top \theta\}$; the choices of $l(\cdot, \cdot)$ can be kept exactly the same as above.
E.g. polynomial bases: $\Psi(X) := \{1, x_j^k : 1 \leq j \leq p, 1 \leq k \leq d_0\}$. ($d_0 = 1$ corresponds to linear bases, as in the previous example.)
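As a concrete reading of the three losses and the polynomial basis, here is a minimal NumPy sketch. The function names (`l_sq`, `l_log`, `l_exp`, `polynomial_basis`, `loss`) are chosen here for illustration, and the column ordering of the basis is an arbitrary assumption.

```python
import numpy as np

def l_sq(u, v):
    return (u - v) ** 2                      # squared loss

def l_log(u, v):
    return np.logaddexp(0.0, v) - u * v      # log(1 + exp(v)) - u v, computed stably

def l_exp(u, v):
    return np.exp(v) - u * v                 # exponential loss

def polynomial_basis(X, d0):
    """Psi(X) = {1, x_j^k : 1 <= j <= p, 1 <= k <= d0}, one row per observation."""
    X = np.asarray(X, dtype=float)
    cols = [np.ones(X.shape[0])]
    for k in range(1, d0 + 1):               # powers 1, ..., d0
        for j in range(X.shape[1]):          # coordinates 1, ..., p
            cols.append(X[:, j] ** k)
    return np.column_stack(cols)

def loss(l, Y, Psi, theta):
    """L(Y, X, theta) = l{Y, Psi(X)' theta}, evaluated row by row."""
    return l(Y, Psi @ theta)
```

With `d0 = 1` the basis reduces to the intercept plus the raw covariates, which is the linear regression case of Example 1.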
13 Examples and Applications Contd.
A. Signal recovery in high dimensional single index models (SIMs) with elliptically symmetric design distribution (e.g. $X$ is Gaussian).
Let $Y = f(\beta_0^\top X, \epsilon)$ with $f: \mathbb{R}^2 \to \mathcal{Y}$ unknown (i.e. $\beta_0$ is identifiable only up to scalar multiples) and $\epsilon \perp X$ (i.e., $Y \perp X \mid \beta_0^\top X$).
Consider any of the regression problems introduced earlier in Example 1. Then, $\theta_0$ defined therein satisfies: $(\theta_0)_{[-1]} \propto \beta_0$!
Classic example of a misspecified parametric model defining $\theta_0$, yet $\theta_0$ directly relates to an actual semi-parametric model!
B. Applications of all these problems in causal inference:
- Linear heterogeneous treatment effects estimation: application of the linear regression example (applied twice).
- Average conditional treatment effects (ACTE) estimation via series estimators: application of the series estimation example.
- Causal inference via SIMs (signal recovery, ACTE estimation and ATE estimation): application of the SIM example above.
15 Some Important Facts to be Considered
It is generally necessary to account for the missingness in $Y$. The complete case estimator of $\theta_0$ in general will be inconsistent!
That estimator may be consistent only if: (1) $\nabla\phi(X, \theta_0) = 0$ a.s. for every $X$ (for regression problems, this indicates the correct model case), and/or (2) $T \perp (Y, X)$ (i.e. the MCAR case).
With $\theta_0$ (and $X$) being high dimensional (compared to $n$), we need some further structural constraints on $\theta_0$ to estimate it using $\mathcal{D}_n$.
We assume that $\theta_0$ is $s$-sparse: $\|\theta_0\|_0 := s$, with $s$ small relative to $\min(n, d)$. Note: the sparsity requirement has an attractive (and fairly intuitive) geometric justification for all the examples we have given here.
Some notation: for all $\theta \in \mathbb{R}^d$, $\phi(X, \theta) := E\{L(Y, X, \theta) \mid X\}$ for all $X$, and for any function $f(Z, \theta)$, $\nabla f(Z, \theta) := \frac{\partial}{\partial\theta} f(Z, \theta) \in \mathbb{R}^d$ for all $Z$.
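The inconsistency of the complete case estimator is easy to see in a small simulation for the mean, $\theta_0 = E(Y)$. The sketch below uses a hypothetical data-generating process (not from the slides) with $E(Y) = 0$ and a known MAR propensity that depends on $X$; the inverse-probability-weighted average is shown only as a simple contrast.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20000
X = rng.normal(size=n)
Y = X + rng.normal(size=n)              # true mean E(Y) = 0
pi = np.where(X > 0, 0.8, 0.2)          # MAR propensity, bounded away from 0
T = rng.binomial(1, pi)                 # T = 1 means Y is observed

theta_cc = Y[T == 1].mean()             # complete case estimator of E(Y)
theta_ipw = np.mean(T * Y / pi)         # weights each observed Y by 1 / pi(X)

print(f"complete case: {theta_cc:.3f}, IPW: {theta_ipw:.3f}")
```

In this design units with larger $X$ (and hence larger $Y$) are observed more often, so the complete case average is pulled upward, while the weighted average stays close to 0.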
17 Estimation of $\theta_0$: The Debiased and Doubly Robust (DDR) Approach
We next note that $R(\theta) := E\{L(Y, X, \theta)\} \equiv E_X\{\phi(X, \theta)\}$ admits the following debiased and doubly robust (DDR) representation: for all $\theta$,
$$R(\theta) \equiv E_X\{\phi(X, \theta)\} + E\left[\frac{T}{\pi(X)}\{L(Y, X, \theta) - \phi(X, \theta)\}\right]. \quad (1)$$
The 2nd term in (1) is simply 0, and is often called the augmented IPW (AIPW) term. It can be seen as a debiasing term (of sorts).
For estimation via the empirical (and estimated) version of (1), the debiasing term plays a crucial role in facilitating the analyses and determining the properties (convergence rates) of the estimator.
The doubly robust (DR) aspect: replace $\{\phi(X, \theta), \pi(X)\}$ by any $\{\phi^*(X, \theta), \pi^*(X)\}$ and (1) continues to hold as long as one, but not necessarily both, of $\phi^*(\cdot, \cdot) = \phi(\cdot, \cdot)$ or $\pi^*(\cdot) = \pi(\cdot)$ holds.
Note that eq. (1) is a purely non-parametric identification of $R(\theta)$ based on the observable $Z$ and two nuisance functions, $\pi(X)$ and $\phi(X, \theta)$, which may both be unknown but are estimable from $\mathcal{D}_n$.
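Both claims about representation (1) follow from iterated expectations. The derivation below is a sketch filled in here, not taken from the slides; it uses only ignorability, the definitions of $\pi$ and $\phi$, and the `amsmath` `align*` environment.

```latex
% Why the AIPW term in (1) is zero: condition on X and use T \perp Y \mid X.
\begin{align*}
E\left[\frac{T}{\pi(X)}\{L(Y,X,\theta)-\phi(X,\theta)\}\,\Big|\,X\right]
  &= \frac{E(T\mid X)}{\pi(X)}\,\big[E\{L(Y,X,\theta)\mid X\}-\phi(X,\theta)\big] \\
  &= 1\cdot 0 \;=\; 0 .
\end{align*}

% Double robustness: replace (\phi,\pi) by arbitrary (\phi^*,\pi^*).
\begin{align*}
E_X\{\phi^*(X,\theta)\}
  + E\left[\frac{T}{\pi^*(X)}\{L(Y,X,\theta)-\phi^*(X,\theta)\}\right]
  &= E_X\left[\phi^*(X,\theta)
     + \frac{\pi(X)}{\pi^*(X)}\{\phi(X,\theta)-\phi^*(X,\theta)\}\right] \\
  &= R(\theta)
     + E_X\left[\left\{\frac{\pi(X)}{\pi^*(X)}-1\right\}
       \{\phi(X,\theta)-\phi^*(X,\theta)\}\right].
\end{align*}
```

The remainder in the last line is a product of the two nuisance errors, so it vanishes as soon as either $\pi^* = \pi$ or $\phi^* = \phi$.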
20 The DDR Estimator of $\theta_0$
Given the DDR representation (1) of $R(\theta)$, let $\{\hat\pi(\cdot), \hat\phi(\cdot, \cdot)\}$ be any estimators of the nuisance components $\{\pi(\cdot), \phi(\cdot, \cdot)\}$ based on $\mathcal{D}_n$. Then, we define our $L_1$-penalized DDR estimator $\hat\theta_{DDR}$ of $\theta_0$ as:
$$\hat\theta_{DDR} \equiv \hat\theta_{DDR}(\lambda_n) := \arg\min_{\theta \in \mathbb{R}^d} \left\{\mathcal{L}_n^{DDR}(\theta) + \lambda_n \|\theta\|_1\right\}, \quad \text{where}$$
$$\mathcal{L}_n^{DDR}(\theta) := \frac{1}{n}\sum_{i=1}^n \hat\phi(X_i, \theta) + \frac{1}{n}\sum_{i=1}^n \frac{T_i}{\hat\pi(X_i)}\left\{L(Y_i, X_i, \theta) - \hat\phi(X_i, \theta)\right\},$$
and $\lambda_n \geq 0$ is the tuning parameter.
We shall assume the following basic conditions regarding the construction of $\hat\pi(\cdot)$ and $\hat\phi(\cdot, \cdot)$:
$\hat\pi(\cdot)$ is obtained from the data subset $\mathcal{T}_n := \{T_i, X_i\}_{i=1}^n \subseteq \mathcal{D}_n$ only, while the other nuisance function's estimates $\{\hat\phi(X_i, \theta)\}_{i=1}^n$ are obtained in a cross-fitted manner (via sample splitting).
We assume (temporarily) that both $\hat\pi(\cdot)$ and $\hat\phi(\cdot, \cdot)$ are correctly specified. Under misspecifications, the DR properties of $\hat\theta_{DDR}$ (in terms of consistency and non-sharp rates) will be discussed later.
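Cross-fitting can be sketched as follows for the conditional mean $m(X) = E(Y \mid X)$, which is the nuisance function needed in the examples here. This is an illustration under assumptions made here: the helper name `cross_fit_m` is hypothetical, and ordinary least squares on the complete cases stands in for whatever (possibly high dimensional) regression method is actually used.

```python
import numpy as np

def cross_fit_m(X, Y, T, n_folds=2, seed=0):
    """Cross-fitted estimates m_hat(X_i) of m(X) = E(Y | X).

    Each m_hat(X_i) comes from a least-squares fit on the complete cases
    (T = 1) of the folds that do NOT contain observation i.
    """
    n = len(Y)
    folds = np.random.default_rng(seed).permutation(n) % n_folds  # random balanced folds
    A = np.column_stack([np.ones(n), X])           # design matrix with intercept
    m_hat = np.empty(n)
    for k in range(n_folds):
        train = (folds != k) & (T == 1)            # complete cases outside fold k
        coef, *_ = np.linalg.lstsq(A[train], Y[train], rcond=None)
        m_hat[folds == k] = A[folds == k] @ coef   # predict on fold k only
    return m_hat
```

Only outcomes with $T_i = 1$ enter the fits, so the entries of `Y` for unobserved units can hold any placeholder (for example `np.nan`).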
22 Simplifying Assumptions and Easy Implementation Algorithm
For simplicity of the theoretical analyses, we assume that $\phi(X, \theta)$ is differentiable in $\theta$ a.s., and that $\nabla L(Y, X, \theta)$ and $\nabla\phi(X, \theta)$ satisfy the following separable forms: for some $h(X) \in \mathbb{R}^d$ and $g(X, \theta) \in \mathbb{R}$,
$$\nabla L(Y, X, \theta) = h(X)\{Y - g(X, \theta)\}, \quad \text{and} \quad \nabla\phi(X, \theta) = h(X)\{m(X) - g(X, \theta)\},$$
where $m(X) := E(Y \mid X)$, and $\hat m(X)$ denotes the corresponding estimator of $m(X)$ needed to construct $\hat\phi(X, \theta)$.
To obtain $\{\hat\phi(X_i, \theta)\}_{i=1}^n$ under the assumed form, one only needs the (cross-fitted) estimates $\{\hat m(X_i)\}_{i=1}^n$ of $m(\cdot)$. These assumptions hold for all examples given before.
Implementation algorithm. $\hat\theta_{DDR}$ can be obtained simply as:
$$\hat\theta_{DDR} \equiv \hat\theta_{DDR}(\lambda_n) := \arg\min_{\theta \in \mathbb{R}^d} \left\{\frac{1}{n}\sum_{i=1}^n L(\tilde Y_i, X_i, \theta) + \lambda_n \|\theta\|_1\right\},$$
where $\tilde Y_i := \hat m(X_i) + \frac{T_i}{\hat\pi(X_i)}\{Y_i - \hat m(X_i)\}$, for all $i$, is a pseudo outcome.
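For the squared loss, the implementation algorithm amounts to an $L_1$-penalized least-squares fit of the pseudo outcomes on the covariates. The sketch below assumes this squared-loss case; the names `pseudo_outcomes`, `soft_threshold` and `lasso_ista` are hypothetical, and plain proximal gradient descent (ISTA) is used only to keep the example self-contained, not because the slides prescribe a solver.

```python
import numpy as np

def pseudo_outcomes(Y, T, pi_hat, m_hat):
    """Y_tilde_i = m_hat(X_i) + T_i / pi_hat(X_i) * {Y_i - m_hat(X_i)}."""
    resid = np.where(T == 1, Y - m_hat, 0.0)      # an unobserved Y_i never enters
    return m_hat + resid / pi_hat

def soft_threshold(z, t):
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_ista(A, y, lam, n_iter=500):
    """Minimise (1/n) * ||y - A theta||_2^2 + lam * ||theta||_1 by proximal gradient."""
    n, d = A.shape
    step = n / (2.0 * np.linalg.norm(A, 2) ** 2)  # 1 / Lipschitz constant of the gradient
    theta = np.zeros(d)
    for _ in range(n_iter):
        grad = -2.0 / n * A.T @ (y - A @ theta)   # gradient of the smooth part
        theta = soft_threshold(theta - step * grad, step * lam)
    return theta
```

A typical call would be `lasso_ista(A, pseudo_outcomes(Y, T, pi_hat, m_hat), lam)`, where `A` is the design matrix (with an intercept column if desired) and `pi_hat`, `m_hat` are the nuisance estimates evaluated at each $X_i$.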
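24 Properties of $\hat\theta_{DDR}$: General Convergence Rates
For any choices of $\hat\pi(\cdot)$ and $\hat m(\cdot)$ and any realization of $\mathcal{D}_n$, choose any $\lambda_n \geq 2\|\nabla\mathcal{L}_n^{DDR}(\theta_0)\|_\infty$. Then for any such choice, and under some basic (standard) assumptions, the DDR estimator $\hat\theta_{DDR}(\lambda_n)$ satisfies a deterministic deviation bound: with $s := \|\theta_0\|_0$,
$$\|\hat\theta_{DDR}(\lambda_n) - \theta_0\|_2 \lesssim \lambda_n \sqrt{s}, \quad \text{and} \quad \|\hat\theta_{DDR}(\lambda_n) - \theta_0\|_1 \lesssim \lambda_n s.$$
Probabilistic bounds for $\|\nabla\mathcal{L}_n^{DDR}(\theta_0)\|_\infty$ (the lower bound of $\lambda_n$):
$$\|\nabla\mathcal{L}_n^{DDR}(\theta_0)\|_\infty \leq \|T_{0,n}\|_\infty + \|T_{\pi,n}\|_\infty + \|T_{m,n}\|_\infty + \|R_{\pi,m,n}\|_\infty,$$
where $T_{0,n}$ is the main term (a centered iid average), $T_{\pi,n}$ is the $\pi$-error term involving $\hat\pi(\cdot) - \pi(\cdot)$, and $T_{m,n}$ is the $m$-error term involving $\hat m(\cdot) - m(\cdot)$, while $R_{\pi,m,n}$ is the $(\pi, m)$-error term (usually lower order) involving the product of $\hat\pi(\cdot) - \pi(\cdot)$ and $\hat m(\cdot) - m(\cdot)$.
For all the terms, the analyses are fully non-asymptotic and quite nuanced, especially in order to get sharp rates for $T_{\pi,n}$ and $T_{m,n}$.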
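26 General Convergence Rates of $\hat\theta_{DDR}$ Contd.
Basic (high level) consistency conditions on $\{\hat\pi(\cdot), \hat m(\cdot)\}$. Let $\{\hat\pi(\cdot), \hat m(\cdot)\}$ be any general and correct estimators of $\{\pi(\cdot), m(\cdot)\}$, and assume they satisfy the following pointwise convergence rates:
$$|\hat\pi(x) - \pi(x)| \lesssim_P \delta_{n,\pi} \quad \text{and} \quad |\hat m(x) - m(x)| \lesssim_P \xi_{n,m} \quad \text{for all } x \in \mathcal{X}, \quad (2)$$
for some sequences $\delta_{n,\pi}, \xi_{n,m} \geq 0$ such that $(\delta_{n,\pi} + \xi_{n,m})\sqrt{\log(nd)} = o(1)$ and the product $\delta_{n,\pi}\,\xi_{n,m}\,(\log n) = o(\sqrt{(\log d)/n})$.
Under the above set-up and the condition in (2), along with some more suitable assumptions, we then have: with high probability,
$$\|T_{0,n}\|_\infty \lesssim \sqrt{\frac{\log d}{n}}, \quad \|T_{\pi,n}\|_\infty \lesssim \sqrt{\frac{\log d}{n}}\left\{\delta_{n,\pi}\sqrt{\log(nd)}\right\},$$
$$\|T_{m,n}\|_\infty \lesssim \sqrt{\frac{\log d}{n}}\left\{\xi_{n,m}\sqrt{\log(nd)}\right\}, \quad \text{and} \quad \|R_{\pi,m,n}\|_\infty \lesssim \delta_{n,\pi}\,\xi_{n,m}\,(\log n).$$
Hence, $\|\nabla\mathcal{L}_n^{DDR}(\theta_0)\|_\infty \lesssim \sqrt{\frac{\log d}{n}} + o\left(\sqrt{\frac{\log d}{n}}\right)$ with high probability.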
28 Desparsification of $\hat\theta_{DDR}$ and Asymptotic Linear Expansion (ALE)
Consider $\hat\theta_{DDR}$ for the squared loss: $L(Y, X, \theta) := \{Y - \Psi(X)^\top\theta\}^2$, where $\Psi(X) \in \mathbb{R}^d$ denotes any (high dimensional) vector of basis functions of $X$. Define $\Omega := \Sigma^{-1}$, where $\Sigma := E\{\Psi(X)\Psi(X)^\top\}$.
Let $\hat\Omega$ be any reasonable estimator of $\Omega$ (assume $\Omega$ is sparse if required). We then define the desparsified DDR estimator $\tilde\theta_{DDR}$ as follows:
$$\tilde\theta_{DDR} := \hat\theta_{DDR} + \hat\Omega\,\frac{1}{n}\sum_{i=1}^n \{\tilde Y_i - \Psi(X_i)^\top\hat\theta_{DDR}\}\Psi(X_i),$$
where $\tilde Y_i := \hat m(X_i) + \frac{T_i}{\hat\pi(X_i)}\{Y_i - \hat m(X_i)\}$ are the pseudo outcomes as before.
Under suitable assumptions, including all previous conditions and conditions on $\|\hat\Omega - \Omega\|$ and $\|I - \hat\Omega\hat\Sigma\|_{\max}$, $\tilde\theta_{DDR}$ satisfies the ALE:
$$(\tilde\theta_{DDR} - \theta_0) = \frac{1}{n}\sum_{i=1}^n \Omega\{\psi_0(Z_i)\} + \Delta_n, \quad \text{where } \|\Delta_n\|_\infty = o_P(n^{-1/2}),$$
and $\psi_0(Z) := [m(X) - \Psi(X)^\top\theta_0 + \{T/\pi(X)\}\{Y - m(X)\}]\Psi(X)$ with $E\{\psi_0(Z)\} = 0$. This ALE is optimal and also facilitates inference.
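The desparsification step itself is a one-line correction once $\hat\Omega$ is available. The sketch below is an illustration with a hypothetical function name `desparsify`; how $\hat\Omega$ is obtained is left to the caller.

```python
import numpy as np

def desparsify(theta_hat, Psi, Y_tilde, Omega_hat):
    """theta_tilde = theta_hat + Omega_hat (1/n) sum_i {Y_tilde_i - Psi_i' theta_hat} Psi_i.

    Psi is the n x d matrix whose i-th row is Psi(X_i)'; Y_tilde holds the pseudo outcomes.
    """
    n = Psi.shape[0]
    score = Psi.T @ (Y_tilde - Psi @ theta_hat) / n   # averaged residual-weighted basis
    return theta_hat + Omega_hat @ score
```

As a sanity check in a low dimensional setting ($d < n$), taking `Omega_hat` to be the inverse of the sample Gram matrix `Psi.T @ Psi / n` makes the corrected estimator equal to the ordinary least-squares fit of the pseudo outcomes, whatever `theta_hat` was. That inverse does not exist when $d > n$, which is why a sparse precision matrix estimator is needed there.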
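30 The Desparsified DDR Estimator: A Sketch of the Analyses and Rates
The error $\Delta_n$ can be decomposed as: $\Delta_n = \Delta_{n,1} + \Delta_{n,2} + \Delta_{n,3}$, where
$$\Delta_{n,1} := (\hat\Omega - \Omega)\,\frac{1}{n}\sum_{i=1}^n \psi_0(Z_i), \quad \Delta_{n,2} := (I_d - \hat\Omega\hat\Sigma)(\hat\theta_{DDR} - \theta_0) \quad \text{and} \quad \Delta_{n,3} := \hat\Omega\,(T_{\pi,n} + T_{m,n} + R_{\pi,m,n}).$$
Assume the basic convergence conditions (2) for $\{\hat\pi(\cdot), \hat m(\cdot)\}$, and that $\|\hat\Omega - \Omega\| = O_P(a_n)$ and $\|I - \hat\Omega\hat\Sigma\|_{\max} = O_P(b_n)$ for some $a_n, b_n = o(1)$, and $\|\Omega\| = O(1)$. Then, with high probability, we have:
$$\|\Delta_{n,1}\|_\infty \lesssim a_n\sqrt{\frac{\log d}{n}}, \quad \|\Delta_{n,2}\|_\infty \lesssim b_n\, s\,\sqrt{\frac{\log d}{n}} \quad \text{and}$$
$$\|\Delta_{n,3}\|_\infty \lesssim \sqrt{\frac{\log d}{n}}\left\{\sqrt{\log(nd)}\right\}(\delta_{n,\pi} + \xi_{n,m}) + \delta_{n,\pi}\,\xi_{n,m}\log(nd).$$
Thus, under suitable assumptions on the rates, $\|\Delta_n\|_\infty = o_P(n^{-1/2})$.
Choose $\hat\Omega$ to be any standard (sparse) precision matrix estimator, e.g. the node-wise Lasso estimator. Here, $a_n = s_\Omega\sqrt{(\log d)/n}$ and $b_n = \sqrt{(\log d)/n}$ under suitable conditions, with $s_\Omega := \max_{1 \leq j \leq d}\|\Omega_j\|_0$.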
33 The DR Aspect: General Convergence Rates (under Misspecification)
Finally, let $\{\hat\pi(\cdot), \hat m(\cdot)\}$ be any general but potentially misspecified estimators with targets $\{\pi^*(\cdot), m^*(\cdot)\}$, so that either $\pi^*(\cdot) = \pi(\cdot)$ or $m^*(\cdot) = m(\cdot)$, but not necessarily both.
Assume the same pointwise convergence conditions and rates $(\delta_{n,\pi}, \xi_{n,m})$ for $\{\hat\pi(\cdot), \hat m(\cdot)\}$ as in (2), but now with $\{\pi(\cdot), m(\cdot)\}$ therein replaced by $\{\pi^*(\cdot), m^*(\cdot)\}$.
Under some suitable assumptions, we have: with high probability,
$$\|T_{0,n}\|_\infty + \|T_{\pi,n}\|_\infty + \|T_{m,n}\|_\infty \lesssim \sqrt{\frac{\log d}{n}}\left\{1 + 1_{(\pi^*, m^*) \neq (\pi, m)} + o(1)\right\} \quad \text{and}$$
$$\|R_{\pi,m,n}\|_\infty \lesssim \left\{\delta_{n,\pi}\,1_{(m^* \neq m)} + \xi_{n,m}\,1_{(\pi^* \neq \pi)} + \delta_{n,\pi}\,\xi_{n,m}\right\}(\log n).$$
Note that the 2nd and/or 3rd terms (depending on which estimator is misspecified) will now also contribute to the rate $\sqrt{(\log d)/n}$. The 4th term is $o(1)$ but no longer ignorable (and may be slower).
Regardless, this establishes general convergence rates and the DR property of $\hat\theta_{DDR}$ under possible misspecification of $\{\hat\pi(\cdot), \hat m(\cdot)\}$. For the 4th term, sharper rates need a case-by-case analysis.
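The double robustness can be illustrated numerically in the simplest case, mean estimation with $\theta_0 = E(Y)$, where the DDR estimate is just the average of the pseudo outcomes. The data-generating process and the deliberately wrong nuisance values below are hypothetical choices made here, with the nuisance functions plugged in as known rather than estimated.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20000
X = rng.normal(size=n)
Y = X + rng.normal(size=n)                  # true mean E(Y) = 0, m(X) = X
pi_true = np.where(X > 0, 0.8, 0.2)         # true propensity score
T = rng.binomial(1, pi_true)

def ddr_mean(pi_star, m_star):
    """Average of pseudo outcomes m*(X) + T / pi*(X) * {Y - m*(X)}."""
    return np.mean(m_star + T / pi_star * (Y - m_star))

pi_wrong = np.full(n, 0.5)                  # misspecified propensity
m_wrong = np.zeros(n)                       # misspecified outcome regression

est_pi_ok = ddr_mean(pi_true, m_wrong)      # only pi correct
est_m_ok = ddr_mean(pi_wrong, X)            # only m correct
est_none = ddr_mean(pi_wrong, m_wrong)      # both wrong
```

The first two estimates stay close to the true value 0, while the third is visibly biased, matching the statement that one correct nuisance component suffices for consistency.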
35 Choices of the Nuisance Component Estimators $\hat\pi(\cdot)$ and $\hat m(\cdot)$
Note: our theory holds generally for any choices of $\hat\pi(\cdot)$ and $\hat m(\cdot)$ under mild conditions (provided they are both correct estimators). Under misspecifications, consistency and general non-sharp rates are also established. Sharp rates need case-by-case analyses.
Below we provide only some choices of $\hat\pi(\cdot)$ and $\hat m(\cdot)$ that may be used to implement our general theory and methods for $\hat\theta_{DDR}$.
Choices of $\hat\pi(\cdot)$: we consider 2 (classes of) choices (these choices may also be used to implement the naive IPW type estimator).
Choices of $\hat m(\cdot)$: first note that $m(X) := E(Y \mid X) \equiv E(Y \mid X, T = 1)$ (under MAR). We consider 2 (classes of) choices of $\hat m(\cdot)$ as well.
For both $\hat\pi(\cdot)$ and $\hat m(\cdot)$, we consider estimators from two families:
- (Extended) parametric families.
- Semi-parametric single index families.
37 Choices of $\hat\pi(\cdot)$: (Extended) Parametric Families
If $\pi(\cdot)$ is known, we set $\hat\pi(\cdot) := \pi(\cdot)$. Otherwise, we estimate $\pi(\cdot)$ via two (classes of) choices of $\hat\pi(\cdot)$ (each assumed to be correct).
(Extended) parametric family: $\pi(X) = g\{\alpha^\top\Psi(X)\}$, where $g(\cdot) \in [0, 1]$ is a known function [e.g. $g_{expit}(u) := \exp(u)/\{1 + \exp(u)\}$], $\Psi(X) := \{\psi_k(X)\}_{k=1}^K$ is any set of $K$ basis functions (with $K$ possibly larger than $n$), and $\alpha \in \mathbb{R}^K$ is an unknown (sparse) parameter vector.
Example: $\Psi(X)$ may correspond to the polynomial bases of $X$ up to any fixed degree $k$. Note: the special case of linear bases ($k = 1$) includes all standard parametric regression models. Further, the case of $\pi(\cdot) =$ constant (but unknown), i.e. MCAR, is also included.
Estimator: we set $\hat\pi(X) = g\{\hat\alpha^\top\Psi(X)\}$, where $\hat\alpha$ denotes any suitable estimator (possibly penalized) of $\alpha$ based on $\mathcal{T}_n := \{T_i, X_i\}_{i=1}^n$.
Example of $\hat\alpha$: when $g(\cdot) = g_{expit}(\cdot)$, $\hat\alpha$ may be obtained from a standard $L_1$-penalized logistic regression of $\{T_i$ vs. $\Psi(X_i)\}_{i=1}^n$.
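39 Choices of $\hat\pi(\cdot)$: Semi-Parametric Single Index Families

Semi-parametric single index family: $\pi(\mathbf{X}) = g(\alpha^{\top}\mathbf{X})$, where $g(\cdot) \in (0, 1)$ is unknown and $\alpha \in \mathbb{R}^p$ is a (sparse) unknown parameter (identifiable only up to scalar multiples, hence set $\|\alpha\|_2 = 1$ wlog).

Given an estimator $\hat\alpha$ of $\alpha$, we estimate $\pi(\mathbf{X}) \equiv E(T \mid \alpha^{\top}\mathbf{X})$ as:

$$
\hat\pi(\mathbf{x}) \equiv \hat\pi(\hat\alpha, \mathbf{x}) := \frac{\frac{1}{nh}\sum_{i=1}^{n} T_i K\{\hat\alpha^{\top}(\mathbf{X}_i - \mathbf{x})/h\}}{\frac{1}{nh}\sum_{i=1}^{n} K\{\hat\alpha^{\top}(\mathbf{X}_i - \mathbf{x})/h\}} := \frac{\hat l_\pi(\hat\alpha, \mathbf{x})}{\hat f_\pi(\hat\alpha, \mathbf{x})},
$$

where $K(\cdot)$ denotes any standard (2nd order) kernel function, $h = h_n > 0$ denotes the bandwidth sequence, and $l_\pi(\alpha, \mathbf{x}) := l_\alpha(\alpha^{\top}\mathbf{x}) := g(\alpha^{\top}\mathbf{x}) f_\alpha(\alpha^{\top}\mathbf{x})$ and $f_\pi(\alpha, \mathbf{x}) := f_\alpha(\alpha^{\top}\mathbf{x})$, with $g(\alpha^{\top}\mathbf{x}) \equiv E(T \mid \alpha^{\top}\mathbf{X} = \alpha^{\top}\mathbf{x})$ and $f_\alpha(\alpha^{\top}\mathbf{x})$ being the density of $\alpha^{\top}\mathbf{X}$ at $\alpha^{\top}\mathbf{x}$.

Obtaining $\hat\alpha$: In general, any approach (if available) from the (high dimensional) single index model literature can be used. But if $\mathbf{X}$ is elliptically symmetric, then $\hat\alpha$ may be obtained as simply as a standard $L_1$-penalized logistic regression of $\{T_i \text{ vs. } \mathbf{X}_i\}_{i=1}^{n}$.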
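The kernel ratio above is a one-dimensional smoother along the estimated index. A minimal sketch follows, assuming a Gaussian kernel, a fixed bandwidth, a hypothetical unit-norm direction `alpha_hat` and simulated data; none of these specific choices come from the slides.

```python
import math
import random

def dot(a, b):
    return sum(u * v for u, v in zip(a, b))

def gaussian_kernel(u):
    """One standard second-order kernel (an assumed choice)."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def pi_hat_single_index(alpha_hat, x, X, T, h):
    """Kernel-weighted average of T_i along the index alpha_hat' X_i."""
    index_x = dot(alpha_hat, x)
    num = den = 0.0
    for x_i, t_i in zip(X, T):
        w = gaussian_kernel((dot(alpha_hat, x_i) - index_x) / h)
        num += t_i * w
        den += w
    # the common 1/(nh) factors cancel in the ratio
    return num / den if den > 0.0 else float("nan")

# Toy data (assumed): pi(x) = expit(2 * alpha' x) with alpha = (1, 0, 0).
random.seed(2)
n, p, h = 500, 3, 0.3
alpha_hat = [1.0, 0.0, 0.0]  # hypothetical direction estimate, unit norm
X = [[random.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
T = [1 if random.random() < 1.0 / (1.0 + math.exp(-2.0 * dot(alpha_hat, x))) else 0
     for x in X]

pi_low = pi_hat_single_index(alpha_hat, [-1.0, 0.0, 0.0], X, T, h)
pi_high = pi_hat_single_index(alpha_hat, [1.0, 0.0, 0.0], X, T, h)
print(round(pi_low, 3), round(pi_high, 3))
```

Because the smoothing is over the scalar $\hat\alpha^{\top}\mathbf{X}_i$ rather than over $\mathbf{X}_i$ itself, the kernel step stays one-dimensional however large $p$ is.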
42 Choices of $\hat m(\cdot)$: (Extended) Parametric Families

(Extended) parametric family: $m(\mathbf{X}) = g\{\gamma^{\top}\Psi(\mathbf{X})\}$, where $g(\cdot)$ is a known "link" function [e.g. "canonical" links: identity, expit or exp], $\Psi(\mathbf{X}) := \{\psi_k(\mathbf{X})\}_{k=1}^{K}$ is any set of $K$ basis functions (with $K \gg n$ possibly), and $\gamma \in \mathbb{R}^K$ is an unknown (sparse) parameter vector.

Example: $\Psi(\mathbf{X})$ may correspond to the polynomial bases of $\mathbf{X}$ up to any fixed degree $k$. Note: the special case of linear bases ($k = 1$) includes all standard parametric regression models.

Estimator: we set $\hat m(\mathbf{X}) = g\{\hat\gamma^{\top}\Psi(\mathbf{X})\}$, where $\hat\gamma$ denotes any suitable estimator (possibly penalized) of $\gamma$ based on the data subset of "complete cases": $\mathcal{D}^{(c)} := \{(Y_i, \mathbf{X}_i) \mid T_i = 1\}_{i=1}^{n}$.

Example of $\hat\gamma$: when $g(\cdot)$ is any "canonical" link function, $\hat\gamma$ may be simply obtained based on the respective usual $L_1$-penalized canonical link based regression (e.g. linear, logistic or Poisson) of $\{(Y_i \text{ vs. } \mathbf{X}_i) \mid T_i = 1\}_{i=1}^{n}$ from the complete case data $\mathcal{D}^{(c)}$.
44 Choices of $\hat m(\cdot)$: Semi-Parametric Single Index Families

Semi-parametric single index family: $m(\mathbf{X}) = g(\gamma^{\top}\mathbf{X})$, where $g(\cdot)$ is an unknown "link" and $\gamma \in \mathbb{R}^p$ is a (sparse) unknown parameter (identifiable only up to scalar multiples, hence set $\|\gamma\|_2 = 1$ wlog).

Given an estimator $\hat\gamma$ of $\gamma$, we estimate $m(\mathbf{X}) \equiv E(Y \mid \gamma^{\top}\mathbf{X}, T)$ as:

$$
\hat m(\mathbf{x}) \equiv \hat m(\hat\gamma, \mathbf{x}) := \frac{\frac{1}{nh}\sum_{i=1}^{n} T_i Y_i K\{\hat\gamma^{\top}(\mathbf{X}_i - \mathbf{x})/h\}}{\frac{1}{nh}\sum_{i=1}^{n} T_i K\{\hat\gamma^{\top}(\mathbf{X}_i - \mathbf{x})/h\}},
$$

where $K(\cdot)$ denotes any standard (2nd order) kernel function, and $h = h_n > 0$ denotes the bandwidth sequence.

Obtaining $\hat\gamma$: In general, any approach (if available) from the HD SIM literature can be used on the complete case data subset $\mathcal{D}^{(c)}$.

If $\mathbf{X}$ is elliptically symmetric and $Y = f(\gamma^{\top}\mathbf{X}; \epsilon)$ with $f$ unknown and $\epsilon \perp\!\!\!\perp (T, \mathbf{X})$, then $\hat\gamma$ may be obtained as the $L_1$-penalized IPW estimator $\hat\theta_{IPW}$ (discussed later) for any canonical link based regression problem (recall the illustrative example regarding SIMs). To implement $\hat\theta_{IPW}$, one can use any of the 2 earlier choices of $\hat\pi(\cdot)$.
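The point of the $T_i$ weights in the ratio above is that an unobserved $Y_i$ is never read. The sketch below makes this explicit by storing missing outcomes as `None`; the Gaussian kernel, the bandwidth, the direction `gamma_hat` and the simulated model are assumptions made for illustration only.

```python
import math
import random

def dot(a, b):
    return sum(u * v for u, v in zip(a, b))

def gaussian_kernel(u):
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def m_hat_single_index(gamma_hat, x, X, T, Y, h):
    """Complete-case kernel smoother of Y_i along the index gamma_hat' X_i."""
    index_x = dot(gamma_hat, x)
    num = den = 0.0
    for x_i, t_i, y_i in zip(X, T, Y):
        if t_i == 0:
            continue  # T_i = 0 gives zero weight, so the missing Y_i is never used
        w = gaussian_kernel((dot(gamma_hat, x_i) - index_x) / h)
        num += y_i * w
        den += w
    return num / den if den > 0.0 else float("nan")

# Toy data (assumed): Y = sin(gamma' X) + noise, observed with probability
# expit(gamma' X), so missingness depends on X only.
random.seed(3)
n, p, h = 800, 3, 0.3
gamma_hat = [1.0, 0.0, 0.0]  # hypothetical direction estimate, unit norm
X = [[random.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
T = [1 if random.random() < 1.0 / (1.0 + math.exp(-dot(gamma_hat, x))) else 0
     for x in X]
Y = [math.sin(dot(gamma_hat, x)) + 0.1 * random.gauss(0.0, 1.0) if t == 1 else None
     for x, t in zip(X, T)]

m_at_zero = m_hat_single_index(gamma_hat, [0.0, 0.0, 0.0], X, T, Y, h)
m_at_one = m_hat_single_index(gamma_hat, [1.0, 0.0, 0.0], X, T, Y, h)
print(round(m_at_zero, 3), round(m_at_one, 3))
```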
47 Convergence Rates Regarding the Choices of $\hat\pi(\cdot)$

For either choice of $\hat\pi(\cdot)$, assume that the ingredient estimator $\hat\alpha$ satisfies: $\|\hat\alpha - \alpha\|_1 \lesssim_P a_n$ for some $a_n = o(1)$.

Then, under some suitable assumptions, we have: with high probability (w.h.p.), $|\hat\pi(\mathbf{x}) - \pi(\mathbf{x})| \lesssim a_n$ for any fixed $\mathbf{x} \in \mathcal{X}$ (for method 1).

For method 2, we have: with high probability, for any fixed $\mathbf{x} \in \mathcal{X}$, ( π(x) π(x) h ) ( ) log p + a + a h h 3 + a2 h 2.

Usually, we expect the $L_1$ error rate of $\hat\alpha$ to be $a_n = s_\alpha \sqrt{(\log d)/n}$, where $s_\alpha := \|\alpha\|_0$ and $d = K$ or $p$ (depending on the method).
48 Convergence Rates Regarding the Choices of $\hat m(\cdot)$

For either of the two choices of $\hat m(\cdot)$, assume that the ingredient estimator $\hat\gamma$ satisfies: $\|\hat\gamma - \gamma\|_1 \lesssim_P b_n$ for some $b_n = o(1)$.

Then, under some suitable assumptions, we have: with high probability, $|\hat m(\mathbf{x}) - m(\mathbf{x})| \lesssim b_n$ for any fixed $\mathbf{x} \in \mathcal{X}$ (for method 1).

For method 2, we have: with high probability, for any fixed $\mathbf{x} \in \mathcal{X}$, m(x) m(x) ( h ) ( ) log p + b + b h h 3 + b2 h 2.

We typically expect the $L_1$ error rate of $\hat\gamma$ to be $b_n = s_\gamma \sqrt{(\log d)/n}$, where $s_\gamma := \|\gamma\|_0$ and $d = K$ or $p$ (depending on the method).
49 The Mean Estimation Problem
50 Mean Estimation: The Debiased and Doubly Robust (DDR) Estimator

Let us next focus on the (low-d) mean estimation problem, where $\theta_0 := E(Y)$. Then, we first note the DDR representation of $\theta_0$:

$$
\theta_0 \equiv E(Y) = E_{\mathbf{X}}\{m(\mathbf{X})\} + E\left[\frac{T}{\pi(\mathbf{X})}\{Y - m(\mathbf{X})\}\right],
$$

where $m(\mathbf{X}) := E(Y \mid \mathbf{X})$. The second term above is actually 0 and is often called the augmented IPW (AIPW) term.

Again, this is a purely non-parametric identification of $\theta_0$ based on the observable $\mathbf{Z}$ and two nuisance functions, $\pi(\mathbf{X})$ and $m(\mathbf{X})$, both of which may be unknown but are still estimable from $\mathcal{D}$.

Let $\hat m(\cdot)$ and $\hat\pi(\cdot)$ be any estimators of $m(\cdot)$ and $\pi(\cdot)$, and further assume that $\{\hat m(\mathbf{X}_i)\}_{i=1}^{n}$ are obtained in a cross-fitted manner (via sample splitting). Then, define the DDR estimator $\hat\theta_{DDR}$ of $\theta_0$:

$$
\hat\theta_{DDR} = \frac{1}{n}\sum_{i=1}^{n} \hat m(\mathbf{X}_i) + \frac{1}{n}\sum_{i=1}^{n} \frac{T_i}{\hat\pi(\mathbf{X}_i)}\{Y_i - \hat m(\mathbf{X}_i)\}.
$$
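The estimator above can be sketched end to end on simulated data. Everything specific in the block is an assumption made for illustration: a single covariate, a linear outcome model fitted by least squares on complete cases, two interleaved folds for the cross-fitting, and a known propensity passed in as `pi_hat` (the case $\hat\pi(\cdot) := \pi(\cdot)$).

```python
import math
import random

def fit_ols_1d(xs, ys):
    """Least-squares line through (xs, ys); returns the fitted function."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    slope = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / sxx
    intercept = ybar - slope * xbar
    return lambda x: intercept + slope * x

def ddr_mean(X, T, Y, pi_hat, n_folds=2):
    """Cross-fitted DDR mean: average of m_hat(X_i) + T_i {Y_i - m_hat(X_i)} / pi_hat(X_i)."""
    n = len(X)
    folds = [list(range(k, n, n_folds)) for k in range(n_folds)]
    total = 0.0
    for fold in folds:
        held_out = set(fold)
        # m_hat is fitted on complete cases OUTSIDE the fold it is evaluated on
        train = [i for i in range(n) if i not in held_out and T[i] == 1]
        m_hat = fit_ols_1d([X[i] for i in train], [Y[i] for i in train])
        for i in fold:
            total += m_hat(X[i])
            if T[i] == 1:  # Y_i is only read when it is observed
                total += (Y[i] - m_hat(X[i])) / pi_hat(X[i])
    return total / n

# Toy data (assumed): Y = 1 + 2X + noise, so theta_0 = E(Y) = 1.
random.seed(0)
n = 2000
true_pi = lambda x: 1.0 / (1.0 + math.exp(-(0.5 + x)))
X = [random.gauss(0.0, 1.0) for _ in range(n)]
T = [1 if random.random() < true_pi(x) else 0 for x in X]
Y = [1.0 + 2.0 * x + random.gauss(0.0, 1.0) if t == 1 else None
     for x, t in zip(X, T)]

theta_ddr = ddr_mean(X, T, Y, true_pi)
observed = [y for y in Y if y is not None]
cc_mean = sum(observed) / len(observed)  # naive complete-case mean, for contrast
print(round(theta_ddr, 3), round(cc_mean, 3))
```

In this simulated design, larger values of $X$ are more likely to be observed, so the complete-case mean is pulled above $\theta_0$, while the DDR average is not.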
52 Properties of the DDR Mean Estimator

$\hat\theta_{DDR}$ is consistent for $\theta_0$ as soon as one, but not necessarily both, of $\hat\pi(\cdot)$ or $\hat m(\cdot)$ are "correct" estimators (consistent, with some rate conditions) of the true $\pi(\cdot)$ or $m(\cdot)$! Hence, the double robustness.

Further, when both $\{\hat m(\cdot), \hat\pi(\cdot)\}$ are correct, $\hat\theta_{DDR}$ is a $\sqrt{n}$-consistent and RAL (regular and asymptotically linear) estimator of $\theta_0$ with IF (influence function), which is also the optimal/"efficient" IF, given by

$$
\psi_{eff}(\mathbf{Z}) = \{m(\mathbf{X}) - \theta_0\} + \frac{T}{\pi(\mathbf{X})}\{Y - m(\mathbf{X})\}, \quad \text{with } E\{\psi_{eff}(\mathbf{Z})\} = 0.
$$

So, $\hat\theta_{DDR}$ then satisfies:

$$
\sqrt{n}\,(\hat\theta_{DDR} - \theta_0) = \frac{1}{\sqrt{n}}\sum_{i=1}^{n} \psi_{eff}(\mathbf{Z}_i) + o_P(1).
$$

This holds regardless of how $\{\hat\pi(\cdot), \hat m(\cdot)\}$ are obtained, as long as they satisfy some (basic) consistency and rate requirements.

$\psi_{eff}(\mathbf{Z})$ is the IF with the smallest variance (the semi-parametric efficiency bound) achievable by any RAL estimator of $\theta_0$.
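One practical consequence of the asymptotic linearity above, which the slides do not spell out, is a plug-in standard error: by the central limit theorem the limiting variance is $Var\{\psi_{eff}(\mathbf{Z})\}$, which can be estimated by the empirical variance of the fitted influence-function values. The sketch below is an illustration under that assumption; the function name `ddr_mean_with_ci`, the Wald interval, and the use of the true nuisance functions on simulated data are all hypothetical choices.

```python
import math
import random
from statistics import NormalDist

def ddr_mean_with_ci(m_vals, pi_vals, T, Y, level=0.95):
    """DDR mean, plug-in standard error and Wald interval from fitted nuisances."""
    n = len(T)
    # summand of the DDR estimator; Y_i is only read when T_i = 1
    phi = [m + ((y - m) / pi if t == 1 else 0.0)
           for m, pi, t, y in zip(m_vals, pi_vals, T, Y)]
    theta_hat = sum(phi) / n
    psi_hat = [v - theta_hat for v in phi]  # estimated influence-function values
    se = math.sqrt(sum(v * v for v in psi_hat) / n / n)
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    return theta_hat, se, (theta_hat - z * se, theta_hat + z * se), psi_hat

# Toy data (assumed): Y = 1 + 2X + noise, theta_0 = 1, nuisances taken as known.
random.seed(4)
n = 2000
X = [random.gauss(0.0, 1.0) for _ in range(n)]
pi_vals = [1.0 / (1.0 + math.exp(-(0.5 + x))) for x in X]
m_vals = [1.0 + 2.0 * x for x in X]
T = [1 if random.random() < p else 0 for p in pi_vals]
Y = [m + random.gauss(0.0, 1.0) if t == 1 else None for m, t in zip(m_vals, T)]

theta_hat, se, ci, psi_hat = ddr_mean_with_ci(m_vals, pi_vals, T, Y)
print(round(theta_hat, 3), round(se, 3), tuple(round(c, 3) for c in ci))
```

With estimated nuisances, the same formula would be applied to cross-fitted values $\hat m(\mathbf{X}_i)$ and $\hat\pi(\mathbf{X}_i)$; its validity then rests on both estimators being correct, as stated above.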
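55 The DDR Mean Estimator: $\sqrt{n}$-Consistency and RAL Properties

Recall that, using the expansion we already got, we have:

$$
\sqrt{n}\,(\hat\theta_{DDR} - \theta_0) = \frac{1}{\sqrt{n}}\sum_{i=1}^{n} \psi(\mathbf{Z}_i) + T^{(m)} + T^{(\pi)} - R.
$$

First term: not a problem. It is a $\sqrt{n}$-scaled i.i.d. average of $\psi(\mathbf{Z})$ with $E\{\psi(\mathbf{Z})\} = 0$ and $Var\{\psi(\mathbf{Z})\} < \infty$.

Last term: essentially make it go to zero. This will hold whenever $\sqrt{n}\,\|\hat\pi(\cdot) - \pi^*(\cdot)\|\,\|\hat m(\cdot) - m^*(\cdot)\| = o_P(1)$. A typical sufficient (but not necessary) condition: each of $\|\hat\pi(\cdot) - \pi^*(\cdot)\|$ and $\|\hat m(\cdot) - m^*(\cdot)\|$ is $o_P(n^{-0.25})$.

Now for

$$
T^{(m)} \equiv \frac{1}{\sqrt{n}}\sum_{i=1}^{n} \{\hat m(\mathbf{X}_i) - m^*(\mathbf{X}_i)\}\left\{1 - \frac{T_i}{\pi^*(\mathbf{X}_i)}\right\}:
$$

If $\pi^*(\cdot) = \pi(\cdot)$, then $E\{T^{(m)}\} = 0$ and $Var\{T^{(m)}\} = O[E_{\mathbf{X}}\{\hat m(\mathbf{X}) - m^*(\mathbf{X})\}^2] = o(1)$. Hence, $T^{(m)} = o_P(1)$.

But if $\pi^*(\cdot) \neq \pi(\cdot)$, then $\sqrt{n}\,\{\hat m(\cdot) - m^*(\cdot)\}$ will contribute and the rate is unclear! $T^{(m)}$ may be slower than $n^{-1/2}$ in high-d cases.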
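The claim $E\{T^{(m)}\} = 0$ when $\pi^*(\cdot) = \pi(\cdot)$ follows from a one-line conditioning argument. The step below is a sketch that assumes $\hat m(\cdot)$ is independent of the observation it is evaluated on (as cross-fitting is meant to ensure) and uses $E(T \mid \mathbf{X}) = \pi(\mathbf{X})$:

```latex
\begin{aligned}
E\left[\{\hat m(\mathbf{X}_i) - m^*(\mathbf{X}_i)\}\left\{1 - \frac{T_i}{\pi(\mathbf{X}_i)}\right\} \,\Big|\, \hat m, \mathbf{X}_i\right]
&= \{\hat m(\mathbf{X}_i) - m^*(\mathbf{X}_i)\}\left\{1 - \frac{E(T_i \mid \mathbf{X}_i)}{\pi(\mathbf{X}_i)}\right\} \\
&= \{\hat m(\mathbf{X}_i) - m^*(\mathbf{X}_i)\}\,\{1 - 1\} = 0 .
\end{aligned}
```

Each summand of $T^{(m)}$ is therefore conditionally mean zero, whatever the limit $m^*(\cdot)$ is. When $\pi^*(\cdot) \neq \pi(\cdot)$ the factor $1 - \pi(\mathbf{X}_i)/\pi^*(\mathbf{X}_i)$ no longer vanishes, which is why the estimation error of $\hat m(\cdot)$ then contributes at first order.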
58 Analysis of the DDR Estimator: RAL Properties Contd.

Next, for

$$
T^{(\pi)} = \frac{1}{\sqrt{n}}\sum_{i=1}^{n} \{Y_i - m^*(\mathbf{X}_i)\}\left\{\frac{T_i}{\hat\pi(\mathbf{X}_i)} - \frac{T_i}{\pi^*(\mathbf{X}_i)}\right\}:
$$

If $m^*(\cdot) = m(\cdot)$, then $E\{T^{(\pi)}\} = 0$ and $Var\{T^{(\pi)}\} = O[E_{\mathbf{X}}\{\hat\pi(\mathbf{X}) - \pi^*(\mathbf{X})\}^2] = o(1)$. Hence, $T^{(\pi)} = o_P(1)$.

But if $m^*(\cdot) \neq m(\cdot)$, then $\sqrt{n}\,\{\hat\pi(\cdot) - \pi^*(\cdot)\}$ will contribute and the rate is unclear! $T^{(\pi)}$ may be slower than $n^{-1/2}$ in high-d cases.

Overall, as long as one of $\hat m(\cdot)$ or $\hat\pi(\cdot)$ is correct, $\hat\theta_{DDR}$ is indeed a consistent (and hence, DR) estimator of $\theta_0$. But its RAL properties and the achievability of the $n^{-1/2}$ rate will generally require more conditions and a case-by-case analysis, and the exact IF (if existent) depends on which of the two are correct (and how we obtain them).

If $\hat m(\cdot)$ and $\hat\pi(\cdot)$ are both correct, then regardless of how they are obtained, $\hat\theta_{DDR}$ is always an optimal RAL estimator of $\theta_0$.

Choices of $\{\hat\pi(\cdot), \hat m(\cdot)\}$? We first obtain a general theory and then prove the rates for a class of choices (possibly misspecified).
More informationBerry-Esseen bounds for self-normalized martingales
Berry-Essee bouds for self-ormalized martigales Xiequa Fa a, Qi-Ma Shao b a Ceter for Applied Mathematics, Tiaji Uiversity, Tiaji 30007, Chia b Departmet of Statistics, The Chiese Uiversity of Hog Kog,
More informationKernel density estimator
Jauary, 07 NONPARAMETRIC ERNEL DENSITY ESTIMATION I this lecture, we discuss kerel estimatio of probability desity fuctios PDF Noparametric desity estimatio is oe of the cetral problems i statistics I
More informationThe standard deviation of the mean
Physics 6C Fall 20 The stadard deviatio of the mea These otes provide some clarificatio o the distictio betwee the stadard deviatio ad the stadard deviatio of the mea.. The sample mea ad variace Cosider
More informationLecture 19. sup y 1,..., yn B d n
STAT 06A: Polyomials of adom Variables Lecture date: Nov Lecture 19 Grothedieck s Iequality Scribe: Be Hough The scribes are based o a guest lecture by ya O Doell. I this lecture we prove Grothedieck s
More informationInformation Theory and Statistics Lecture 4: Lempel-Ziv code
Iformatio Theory ad Statistics Lecture 4: Lempel-Ziv code Łukasz Dębowski ldebowsk@ipipa.waw.pl Ph. D. Programme 203/204 Etropy rate is the limitig compressio rate Theorem For a statioary process (X i)
More information1.010 Uncertainty in Engineering Fall 2008
MIT OpeCourseWare http://ocw.mit.edu.00 Ucertaity i Egieerig Fall 2008 For iformatio about citig these materials or our Terms of Use, visit: http://ocw.mit.edu.terms. .00 - Brief Notes # 9 Poit ad Iterval
More informationEstimation for Complete Data
Estimatio for Complete Data complete data: there is o loss of iformatio durig study. complete idividual complete data= grouped data A complete idividual data is the oe i which the complete iformatio of
More informationStatistical Inference (Chapter 10) Statistical inference = learn about a population based on the information provided by a sample.
Statistical Iferece (Chapter 10) Statistical iferece = lear about a populatio based o the iformatio provided by a sample. Populatio: The set of all values of a radom variable X of iterest. Characterized
More informationMachine Learning Theory Tübingen University, WS 2016/2017 Lecture 11
Machie Learig Theory Tübige Uiversity, WS 06/07 Lecture Tolstikhi Ilya Abstract We will itroduce the otio of reproducig kerels ad associated Reproducig Kerel Hilbert Spaces (RKHS). We will cosider couple
More information5. Likelihood Ratio Tests
1 of 5 7/29/2009 3:16 PM Virtual Laboratories > 9. Hy pothesis Testig > 1 2 3 4 5 6 7 5. Likelihood Ratio Tests Prelimiaries As usual, our startig poit is a radom experimet with a uderlyig sample space,
More informationLecture 12: November 13, 2018
Mathematical Toolkit Autum 2018 Lecturer: Madhur Tulsiai Lecture 12: November 13, 2018 1 Radomized polyomial idetity testig We will use our kowledge of coditioal probability to prove the followig lemma,
More informationProperties and Hypothesis Testing
Chapter 3 Properties ad Hypothesis Testig 3.1 Types of data The regressio techiques developed i previous chapters ca be applied to three differet kids of data. 1. Cross-sectioal data. 2. Time series data.
More informationLECTURE 8: ASYMPTOTICS I
LECTURE 8: ASYMPTOTICS I We are iterested i the properties of estimators as. Cosider a sequece of radom variables {, X 1}. N. M. Kiefer, Corell Uiversity, Ecoomics 60 1 Defiitio: (Weak covergece) A sequece
More informationAlgorithms for Clustering
CR2: Statistical Learig & Applicatios Algorithms for Clusterig Lecturer: J. Salmo Scribe: A. Alcolei Settig: give a data set X R p where is the umber of observatio ad p is the umber of features, we wat
More informationBayesian Methods: Introduction to Multi-parameter Models
Bayesia Methods: Itroductio to Multi-parameter Models Parameter: θ = ( θ, θ) Give Likelihood p(y θ) ad prior p(θ ), the posterior p proportioal to p(y θ) x p(θ ) Margial posterior ( θ, θ y) is Iterested
More informationAn Introduction to Randomized Algorithms
A Itroductio to Radomized Algorithms The focus of this lecture is to study a radomized algorithm for quick sort, aalyze it usig probabilistic recurrece relatios, ad also provide more geeral tools for aalysis
More informationStatistical Inference Based on Extremum Estimators
T. Rotheberg Fall, 2007 Statistical Iferece Based o Extremum Estimators Itroductio Suppose 0, the true value of a p-dimesioal parameter, is kow to lie i some subset S R p : Ofte we choose to estimate 0
More informationLecture 01: the Central Limit Theorem. 1 Central Limit Theorem for i.i.d. random variables
CSCI-B609: A Theorist s Toolkit, Fall 06 Aug 3 Lecture 0: the Cetral Limit Theorem Lecturer: Yua Zhou Scribe: Yua Xie & Yua Zhou Cetral Limit Theorem for iid radom variables Let us say that we wat to aalyze
More informationRandom Walks on Discrete and Continuous Circles. by Jeffrey S. Rosenthal School of Mathematics, University of Minnesota, Minneapolis, MN, U.S.A.
Radom Walks o Discrete ad Cotiuous Circles by Jeffrey S. Rosethal School of Mathematics, Uiversity of Miesota, Mieapolis, MN, U.S.A. 55455 (Appeared i Joural of Applied Probability 30 (1993), 780 789.)
More information7.1 Convergence of sequences of random variables
Chapter 7 Limit theorems Throughout this sectio we will assume a probability space (Ω, F, P), i which is defied a ifiite sequece of radom variables (X ) ad a radom variable X. The fact that for every ifiite
More informationLecture 11 and 12: Basic estimation theory
Lecture ad 2: Basic estimatio theory Sprig 202 - EE 94 Networked estimatio ad cotrol Prof. Kha March 2 202 I. MAXIMUM-LIKELIHOOD ESTIMATORS The maximum likelihood priciple is deceptively simple. Louis
More informationOn Classification Based on Totally Bounded Classes of Functions when There are Incomplete Covariates
Joural of Statistical Theory ad Applicatios Volume, Number 4, 0, pp. 353-369 ISSN 538-7887 O Classificatio Based o Totally Bouded Classes of Fuctios whe There are Icomplete Covariates Majid Mojirsheibai
More information6. Sufficient, Complete, and Ancillary Statistics
Sufficiet, Complete ad Acillary Statistics http://www.math.uah.edu/stat/poit/sufficiet.xhtml 1 of 7 7/16/2009 6:13 AM Virtual Laboratories > 7. Poit Estimatio > 1 2 3 4 5 6 6. Sufficiet, Complete, ad Acillary
More informationSupport vector machine revisited
6.867 Machie learig, lecture 8 (Jaakkola) 1 Lecture topics: Support vector machie ad kerels Kerel optimizatio, selectio Support vector machie revisited Our task here is to first tur the support vector
More informationRates of Convergence by Moduli of Continuity
Rates of Covergece by Moduli of Cotiuity Joh Duchi: Notes for Statistics 300b March, 017 1 Itroductio I this ote, we give a presetatio showig the importace, ad relatioship betwee, the modulis of cotiuity
More informationCS284A: Representations and Algorithms in Molecular Biology
CS284A: Represetatios ad Algorithms i Molecular Biology Scribe Notes o Lectures 3 & 4: Motif Discovery via Eumeratio & Motif Represetatio Usig Positio Weight Matrix Joshua Gervi Based o presetatios by
More informationMASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.265/15.070J Fall 2013 Lecture 3 9/11/2013. Large deviations Theory. Cramér s Theorem
MASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.265/5.070J Fall 203 Lecture 3 9//203 Large deviatios Theory. Cramér s Theorem Cotet.. Cramér s Theorem. 2. Rate fuctio ad properties. 3. Chage of measure techique.
More informationEcon 325 Notes on Point Estimator and Confidence Interval 1 By Hiro Kasahara
Poit Estimator Eco 325 Notes o Poit Estimator ad Cofidece Iterval 1 By Hiro Kasahara Parameter, Estimator, ad Estimate The ormal probability desity fuctio is fully characterized by two costats: populatio
More informationLecture Stat Maximum Likelihood Estimation
Lecture Stat 461-561 Maximum Likelihood Estimatio A.D. Jauary 2008 A.D. () Jauary 2008 1 / 63 Maximum Likelihood Estimatio Ivariace Cosistecy E ciecy Nuisace Parameters A.D. () Jauary 2008 2 / 63 Parametric
More informationNotes On Median and Quantile Regression. James L. Powell Department of Economics University of California, Berkeley
Notes O Media ad Quatile Regressio James L. Powell Departmet of Ecoomics Uiversity of Califoria, Berkeley Coditioal Media Restrictios ad Least Absolute Deviatios It is well-kow that the expected value
More information1 Covariance Estimation
Eco 75 Lecture 5 Covariace Estimatio ad Optimal Weightig Matrices I this lecture, we cosider estimatio of the asymptotic covariace matrix B B of the extremum estimator b : Covariace Estimatio Lemma 4.
More informationREGRESSION WITH QUADRATIC LOSS
REGRESSION WITH QUADRATIC LOSS MAXIM RAGINSKY Regressio with quadratic loss is aother basic problem studied i statistical learig theory. We have a radom couple Z = X, Y ), where, as before, X is a R d
More informationLinear regression. Daniel Hsu (COMS 4771) (y i x T i β)2 2πσ. 2 2σ 2. 1 n. (x T i β y i ) 2. 1 ˆβ arg min. β R n d
Liear regressio Daiel Hsu (COMS 477) Maximum likelihood estimatio Oe of the simplest liear regressio models is the followig: (X, Y ),..., (X, Y ), (X, Y ) are iid radom pairs takig values i R d R, ad Y
More informationMeasure and Measurable Functions
3 Measure ad Measurable Fuctios 3.1 Measure o a Arbitrary σ-algebra Recall from Chapter 2 that the set M of all Lebesgue measurable sets has the followig properties: R M, E M implies E c M, E M for N implies
More informationMATHEMATICAL SCIENCES PAPER-II
MATHEMATICAL SCIENCES PAPER-II. Let {x } ad {y } be two sequeces of real umbers. Prove or disprove each of the statemets :. If {x y } coverges, ad if {y } is coverget, the {x } is coverget.. {x + y } coverges
More informationOutput Analysis and Run-Length Control
IEOR E4703: Mote Carlo Simulatio Columbia Uiversity c 2017 by Marti Haugh Output Aalysis ad Ru-Legth Cotrol I these otes we describe how the Cetral Limit Theorem ca be used to costruct approximate (1 α%
More informationLecture 7: Properties of Random Samples
Lecture 7: Properties of Radom Samples 1 Cotiued From Last Class Theorem 1.1. Let X 1, X,...X be a radom sample from a populatio with mea µ ad variace σ
More informationDirichlet s Theorem on Arithmetic Progressions
Dirichlet s Theorem o Arithmetic Progressios Athoy Várilly Harvard Uiversity, Cambridge, MA 0238 Itroductio Dirichlet s theorem o arithmetic progressios is a gem of umber theory. A great part of its beauty
More informationBinomial Distribution
0.0 0.5 1.0 1.5 2.0 2.5 3.0 0 1 2 3 4 5 6 7 0.0 0.5 1.0 1.5 2.0 2.5 3.0 Overview Example: coi tossed three times Defiitio Formula Recall that a r.v. is discrete if there are either a fiite umber of possible
More informationLarge Sample Theory. Convergence. Central Limit Theorems Asymptotic Distribution Delta Method. Convergence in Probability Convergence in Distribution
Large Sample Theory Covergece Covergece i Probability Covergece i Distributio Cetral Limit Theorems Asymptotic Distributio Delta Method Covergece i Probability A sequece of radom scalars {z } = (z 1,z,
More informationIntegrable Functions. { f n } is called a determining sequence for f. If f is integrable with respect to, then f d does exist as a finite real number
MATH 532 Itegrable Fuctios Dr. Neal, WKU We ow shall defie what it meas for a measurable fuctio to be itegrable, show that all itegral properties of simple fuctios still hold, ad the give some coditios
More informationLecture 11 October 27
STATS 300A: Theory of Statistics Fall 205 Lecture October 27 Lecturer: Lester Mackey Scribe: Viswajith Veugopal, Vivek Bagaria, Steve Yadlowsky Warig: These otes may cotai factual ad/or typographic errors..
More informationAccuracy Assessment for High-Dimensional Linear Regression
Uiversity of Pesylvaia ScholarlyCommos Statistics Papers Wharto Faculty Research -016 Accuracy Assessmet for High-Dimesioal Liear Regressio Toy Cai Uiversity of Pesylvaia Zijia Guo Uiversity of Pesylvaia
More information