Three Approaches towards Optimal Property Estimation and Testing
1 Three Approaches towards Optimal Property Estimation and Testing

Jiantao Jiao (Stanford EE)
Joint work with: Yanjun Han, Dmitri Pavlichin, Kartik Venkat, Tsachy Weissman
Frontiers in Distribution Testing Workshop, FOCS 2017, Oct. 14th. 1 / 23
2 Statistical properties

Disclaimer: Throughout this talk, n refers to the number of samples, and S refers to the alphabet size of a distribution.
1. Shannon entropy: H(P) = -\sum_{i=1}^S p_i \ln p_i.
2. F_\alpha(P): F_\alpha(P) = \sum_{i=1}^S p_i^\alpha, \alpha > 0.
3. KL divergence, \chi^2 divergence, L_1 distance, Hellinger distance: F(P, Q) = \sum_{i=1}^S f(p_i, q_i) for f(x, y) = x \ln(x/y), (x - y)^2/x, |x - y|, (\sqrt{x} - \sqrt{y})^2.
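The properties above are all sums of per-symbol terms, which the following minimal Python sketch (function names are mine, not from the talk) makes concrete for a known distribution P:

```python
# Sketch (not from the slides): the statistical properties defined above,
# for distributions given as lists of probabilities summing to one.
import math

def shannon_entropy(p):
    """H(P) = -sum_i p_i ln p_i, with the convention 0 ln 0 = 0."""
    return -sum(x * math.log(x) for x in p if x > 0)

def power_sum(p, alpha):
    """F_alpha(P) = sum_i p_i^alpha, alpha > 0."""
    return sum(x ** alpha for x in p if x > 0)

def f_sum(p, q, f):
    """F(P, Q) = sum_i f(p_i, q_i) for a per-symbol function f."""
    return sum(f(x, y) for x, y in zip(p, q))

# Per-symbol functions from the slide (assuming all q_i > 0 where needed).
kl = lambda x, y: x * math.log(x / y) if x > 0 else 0.0
l1 = lambda x, y: abs(x - y)
hellinger_sq = lambda x, y: (math.sqrt(x) - math.sqrt(y)) ** 2
```

The estimation problem discussed in the rest of the talk is exactly that P (and hence these sums) is unknown and must be estimated from samples.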
4 Tolerant testing/learning/estimation

We focus on the question: how many samples are needed to achieve accuracy \epsilon for estimating these properties from empirical data?
Example: L_1(P, U), U = (1/S, 1/S, ..., 1/S); observe n i.i.d. samples from P.
(VV 11): there exists an approach whose error is \sqrt{S/(n \ln n)}; no consistent estimator exists when n \lesssim S/\ln S.
The MLE plug-in L_1(\hat{P}_n, U) achieves error \sqrt{S/n} when n \gtrsim S; it is inconsistent when n \lesssim S.

Effective sample size enlargement: minimax rate-optimal with n samples \approx MLE with n \ln n samples.
Similar results also hold for Shannon entropy (VV 11, VV 13, WY 16, JVHW 15), power sum functionals (JVHW 15), Rényi entropy estimation (AOST 14), \chi^2, Hellinger, and KL-divergence estimation (HJW 16, BZLV 16), L_r norm estimation under the Gaussian white noise model (HJMW 17), L_1 distance estimation (JHW 16), etc., except for support size (WY 16).
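As a sanity check on the plug-in side of this comparison, the following sketch (my own, not from the talk) draws n samples from the uniform P = U over S symbols. Here the true L_1(P, U) is zero, so everything the MLE reports is error, and it concentrates at the \sqrt{S/n} scale:

```python
import math
import random

def mle_l1_to_uniform(n, S, rng):
    """L_1(P_hat_n, U), where P_hat_n is the empirical distribution of
    n i.i.d. samples drawn from the uniform distribution over S symbols."""
    counts = [0] * S
    for _ in range(n):
        counts[rng.randrange(S)] += 1
    return sum(abs(c / n - 1 / S) for c in counts)

rng = random.Random(0)
S, n, trials = 200, 200, 50
avg_err = sum(mle_l1_to_uniform(n, S, rng) for _ in range(trials)) / trials
# E|p_hat_i - 1/S| is on the order of sqrt(p_i / n), so the total
# plug-in error is on the order of sqrt(S / n) even though L_1(P, U) = 0.
ratio = avg_err / math.sqrt(S / n)
```

With n comparable to S the plug-in estimate is badly biased away from the truth; the rate-optimal estimators in this talk remove most of that bias with only n \asymp S/\ln S samples.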
5 Effective sample size enlargement

R_minimax(F, P, n) = inf_{\hat{F}(X_1, ..., X_n)} sup_{P \in M_S} E|\hat{F} - F(P)|,
R_plug-in(F, P, n) = sup_{P \in M_S} E|F(\hat{P}_n) - F(P)|.

F(P) = \sum_{i=1}^S p_i \log(1/p_i): R_minimax \asymp S/(n \log n) + \log S/\sqrt{n}; R_plug-in \asymp S/n + \log S/\sqrt{n}.
F_\alpha(P) = \sum_{i=1}^S p_i^\alpha, 0 < \alpha \le 1/2: R_minimax \asymp S/(n \log n)^\alpha; R_plug-in \asymp S/n^\alpha.
F_\alpha(P), 1/2 < \alpha < 1: R_minimax \asymp S/(n \log n)^\alpha + S^{1-\alpha}/\sqrt{n}; R_plug-in \asymp S/n^\alpha + S^{1-\alpha}/\sqrt{n}.
F_\alpha(P), 1 < \alpha < 3/2: R_minimax \asymp (n \log n)^{-(\alpha - 1)}; R_plug-in \asymp n^{-(\alpha - 1)}.
F_\alpha(P), \alpha \ge 3/2: R_minimax \asymp R_plug-in \asymp n^{-1/2} (the MLE is already rate-optimal).
Support size \sum_{i=1}^S 1(p_i > 0) over \{P : \min_i p_i \ge 1/S\}: R_minimax \asymp S e^{-\Theta(\max\{\sqrt{n \log n / S}, \, n/S\})}; R_plug-in \asymp S e^{-\Theta(n/S)}.
L_1(P, Q) for fixed Q: R_minimax \asymp \sum_{i=1}^S \sqrt{q_i/(n \log n)}; R_plug-in \asymp \sum_{i=1}^S \sqrt{q_i/n}.
6 Effective sample size enlargement

Divergence functionals: here P, Q \in M_S, and we have m samples from P and n samples from Q. For the Kullback-Leibler and \chi^2 divergence estimators we only consider (P, Q) \in \{(P, Q) : P, Q \in M_S, p_i/q_i \le u(S)\}, where u(S) is some function of S.

L_1(P, Q) = \sum_{i=1}^S |p_i - q_i|: R_minimax \asymp \sqrt{S/(\min\{m, n\} \log \min\{m, n\})}; R_plug-in \asymp \sqrt{S/\min\{m, n\}}.
H^2(P, Q) = (1/2) \sum_{i=1}^S (\sqrt{p_i} - \sqrt{q_i})^2: R_minimax \asymp S/(\min\{m, n\} \log \min\{m, n\}); R_plug-in \asymp S/\min\{m, n\}.
D(P||Q) = \sum_{i=1}^S p_i \log(p_i/q_i): R_minimax \asymp S/(m \log m) + S u(S)/(n \log n) + \log u(S)/\sqrt{m} + \sqrt{u(S)/n} \log u(S); the plug-in rate replaces m \log m by m and n \log n by n, with the same \sqrt{m}, \sqrt{n} fluctuation terms.
\chi^2(P||Q) = \sum_{i=1}^S p_i^2/q_i - 1: the same n \to n \log n enlargement in the leading S u(S)^2 term, plus fluctuation terms of order u(S)/\sqrt{m} and u(S)^{3/2}/\sqrt{n} in both rates.
7 Goal of this talk

Understand the mechanism behind the logarithmic sample size enlargement.
1. For what functionals do we have this phenomenon?
2. What concrete algorithms achieve this phenomenon?
3. If there exist multiple approaches, what are their relative advantages and disadvantages?
9 First approach: Approximation methodology

Question: Is the enlargement phenomenon caused by the fact that the functionals are permutation invariant (symmetric)?
Answer: Nope. :)
Literature on the approximation methodology: VV 11 (linear estimators), WY 16, JVHW 15, AOST 14, HJW 16, BZLV 16, HJMW 17, JHW 16.
10 Example: L_1 distance estimation

Given Q = (q_1, q_2, ..., q_S), we estimate L_1(P, Q) given n i.i.d. samples from P.

Theorem (J., Han, Weissman 16): Under mild conditions on n, S, and Q, there exists an estimator \hat{L} such that
sup_{P \in M_S} E_P |\hat{L} - L_1(P, Q)| \lesssim \sum_{i=1}^S \min\{\sqrt{q_i/(n \ln n)}, q_i\}.   (1)
For the MLE, we have
sup_{P \in M_S} E_P |L_1(\hat{P}_n, Q) - L_1(P, Q)| \asymp \sum_{i=1}^S \min\{\sqrt{q_i/n}, q_i\}.   (2)
17 Confidence sets in binomial model: coverage probability

(Figure: \Theta = [0, 1], \hat{p} \sim B(n, p)/n. For \hat{p} < \ln n/n the confidence set U(\hat{p}) has length \asymp \ln n/n; for \hat{p} > \ln n/n it has length \asymp \sqrt{\hat{p} \ln n/n}; each covers p with probability at least 1 - n^{-A}.)

Theorem (J., Han, Weissman 16): Partition [0, 1] into finitely many intervals I_i = [x_i, x_{i+1}], with x_0 = 0, x_1 \asymp \ln n/n, and x_{i+1} - x_i \asymp \sqrt{x_i \ln n/n}. Then:
1. if p \in I_i, then \hat{p} \in 2I_i with probability at least 1 - n^{-A};
2. if \hat{p} \in I_i, then p \in 2I_i with probability at least 1 - n^{-A};
3. these intervals are of the shortest possible length (up to constants).
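The partition in the theorem can be built greedily; the sketch below (my own constants and names, not from the talk) shows that on the order of \sqrt{n/\ln n} intervals suffice to cover [0, 1]:

```python
import math

def build_partition(n, c=1.0):
    """Grid points x_0 = 0, x_1 = c ln(n)/n, and then
    x_{i+1} = x_i + sqrt(x_i * c * ln(n)/n), capped at 1.
    The constant c is an illustrative choice."""
    delta = c * math.log(n) / n
    pts = [0.0, delta]
    while pts[-1] < 1.0:
        x = pts[-1]
        pts.append(min(1.0, x + math.sqrt(x * delta)))
    return pts

pts = build_partition(10 ** 4)
num_intervals = len(pts) - 1
# Heuristically, sum of interval widths ~ integral of sqrt(x ln(n)/n),
# giving about 2 * sqrt(n / ln(n)) intervals in total.
```

Each interval is short enough that a degree-\asymp \ln n polynomial can approximate the target functional on it, which is where the \ln n enlargement comes from.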
18 Algorithmic description of the approximation methodology

First conduct sample splitting to get \hat{p}_i, \hat{p}'_i i.i.d. with distribution (2/n) B(n/2, p_i). Suppose q_i \in I_j. For each i do the following:
1. if \hat{p}'_i \in I_j, compute the best polynomial approximation on 2I_j:
   P_K(x; q_i) = argmin_{P \in Poly_K} max_{z \in 2I_j} | |z - q_i| - P(z) |,   (3)
   and then estimate |p_i - q_i| by the unbiased estimator of P_K(p_i; q_i) using \hat{p}_i;
2. if \hat{p}'_i \notin I_j, estimate |p_i - q_i| by |\hat{p}_i - q_i|;
3. sum everything up.
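The "unbiased estimator of P_K" step relies on the fact that falling factorials of a binomial count are unbiased for powers of p. A sketch of that ingredient (my own, assuming X \sim B(n, p)), checked exactly against the binomial pmf:

```python
import math

def falling(a, k):
    """a (a-1) ... (a-k+1)."""
    out = 1.0
    for j in range(k):
        out *= a - j
    return out

def unbiased_power(x, n, k):
    """Given X ~ Binomial(n, p), E[falling(X, k)] = falling(n, k) * p^k,
    so falling(X, k) / falling(n, k) is an unbiased estimate of p^k.
    Combining these with the coefficients of P_K yields an unbiased
    estimate of P_K(p)."""
    return falling(x, k) / falling(n, k)

def exact_expectation(n, p, k):
    """E[unbiased_power(X, n, k)] computed by summing over the pmf."""
    return sum(
        math.comb(n, x) * p ** x * (1 - p) ** (n - x) * unbiased_power(x, n, k)
        for x in range(n + 1)
    )
```

Because the estimator is exactly unbiased for each monomial, the only remaining bias is the polynomial approximation error on 2I_j.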
19 Why does it work?

1. Suppose \hat{p}'_i \in I_j. No matter what estimator we use, one can always assume that p_i \in 2I_j.
2. The bias of the MLE is approximately (Strukov and Timan 77)
   sup_{p_i \in 2I_j} | |p_i - q_i| - E|\hat{p}_i - q_i| | \asymp \sqrt{q_i/n};   (4)
3. The bias of the approximation methodology is approximately (Ditzian and Totik 87)
   sup_{p_i \in 2I_j} | |p_i - q_i| - P_K(p_i; q_i) | \asymp \sqrt{q_i/(n \ln n)};   (5)
4. Permutation invariance does not play a role, since we are doing symbol-by-symbol bias correction.
5. The bias dominates in high dimensions (measure concentration phenomenon).
20 Properties of the approximation methodology

1. Applies to essentially any functional.
2. Applies to a wide range of statistical models (binomial, Poisson, Gaussian, etc.).
3. Near-linear complexity.
4. Requires an explicit polynomial approximation for each different functional.
5. Needs parameter tuning in practice.
23 Second approach: local moment matching methodology

Motivation: Does there exist a single plug-in estimator that can replace the approximation methodology?
Answer: No. For any plug-in rule \hat{P}, there exists a fixed Q such that L_1(\hat{P}, Q) requires n \gtrsim S samples to consistently estimate L_1(P, Q), while the optimal method requires at most n \asymp S/\ln S samples.
Weakened goal: What if we only consider permutation invariant functionals?
Literature on the local moment matching methodology: VV 11 (linear programming), HJW 17.
24 Local moment matching methodology

Theorem (Han, J., Weissman 17): There exists a single estimator \hat{P}, efficiently computable, that achieves the optimal phase transitions for ALL the permutation invariant functionals mentioned above. In particular, it solves the minimax problem
inf_{\hat{P}} sup_{P \in M_S} E ||\hat{P}^< - P^<||_1 \asymp \sqrt{S/(n \ln n)} (1 + \tilde{O}(n^{-1/3})),   (6)
where P^< = (p_{(1)}, p_{(2)}, ..., p_{(S)}), p_{(i)} \le p_{(i+1)}, denotes the sorted distribution.
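The norm in (6) compares sorted probability vectors; a minimal sketch of this "sorted L_1" distance (the function name is mine):

```python
def sorted_l1(p, q):
    """||P^< - Q^<||_1: the L_1 distance after sorting both vectors,
    i.e. a distance between the multisets of probabilities."""
    return sum(abs(a - b) for a, b in zip(sorted(p), sorted(q)))
```

Any permutation invariant functional of P is a function of P^< alone, which is why estimating P^< well under this norm suffices for all of them at once.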
26 A simple example

Assume for all i that p_i \lesssim \ln n/n and \hat{p}_i \lesssim \ln n/n. Consider the Shannon entropy functional H(P) = \sum_{i=1}^S f(p_i), f(x) = x \ln(1/x).

Theorem (VV 11, Wu and Yang 16, J. et al. 15): The optimal error in estimating H is \asymp S/(n \ln n), while the MLE error is \asymp S/n.

Suppose we use the plug-in rule \sum_{i=1}^S f(q_i) to estimate H(P), where q_i \lesssim \ln n/n. Then, for any P_K(x) \in Poly_K with K \asymp \ln n,
|H - \sum_i f(q_i)| = |\sum_i (f(p_i) - P_K(p_i)) + \sum_i (P_K(p_i) - P_K(q_i)) + \sum_i (P_K(q_i) - f(q_i))|
\le 2S inf_{P_K} max_{x \in [0, \ln n/n]} |f(x) - P_K(x)| + |\sum_i (P_K(p_i) - P_K(q_i))|
\lesssim S/(n \ln n) + |\sum_i (P_K(p_i) - P_K(q_i))|.
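The MLE's error here is driven by bias, and that bias can be computed exactly in the smallest case. The sketch below (my own; note it uses a fixed p bounded away from 0, where the bias is the classical -(S-1)/(2n), rather than the large-alphabet regime p_i \lesssim \ln n/n of the theorem above):

```python
import math

def entropy(p):
    """H(P) = -sum_i p_i ln p_i, with 0 ln 0 = 0."""
    return -sum(x * math.log(x) for x in p if x > 0)

def mle_entropy_bias(p, n):
    """Exact E[H(P_hat_n)] - H(P) for a Bernoulli(p) source (S = 2),
    summing over the Binomial(n, p) distribution of the count."""
    expected = sum(
        math.comb(n, x) * p ** x * (1 - p) ** (n - x) * entropy([x / n, 1 - x / n])
        for x in range(n + 1)
    )
    return expected - entropy([p, 1 - p])

bias = mle_entropy_bias(0.3, 100)
# Classical expansion: bias = -(S - 1)/(2n) + O(1/n^2) = -1/200 here.
```

By concavity of entropy the plug-in is always biased downward, and no amount of averaging removes this; the approximation methodology attacks exactly this bias term.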
27 Local moment matching

We showed that for any plug-in rule Q,
|H - \sum_i f(q_i)| \lesssim S/(n \ln n) + |\sum_i (P_K(p_i) - P_K(q_i))|.   (7)
Why is the MLE bad? The MLE is bad because
|E[\sum_i (P_K(p_i) - P_K(\hat{p}_i))]| \asymp S/n.   (8)
Solution: It suffices to reduce the bias of P_K(q_i) in estimating P_K(p_i).
28 Local moment matching

Ideal situation: Suppose that for each 0 \le k \lesssim \ln n,
\sum_j p_j^k = \sum_j q_j^k;   (9)
then we immediately have
E[\sum_i (P_K(p_i) - P_K(q_i))] = 0.   (10)
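The mechanism in (9)-(10) is just linearity: if the first K power sums of (p_i) and (q_i) agree, then the sums of any polynomial of degree at most K agree. A small numeric sketch (my own numbers, not from the talk) builds a q that is not a rearrangement of p yet matches its first two power sums:

```python
import math

p = [0.1, 0.2, 0.3]
s1 = sum(p)
s2 = sum(x * x for x in p)

# Choose q_1 freely, then solve for q_2 + q_3 and q_2 * q_3 so that
# q matches s1 and s2; q_2, q_3 are the roots of a quadratic.
q1 = 0.12
b = s1 - q1                          # q_2 + q_3
c = (b * b - (s2 - q1 * q1)) / 2     # q_2 * q_3
disc = math.sqrt(b * b - 4 * c)
q = [q1, (b + disc) / 2, (b - disc) / 2]

def poly_sum(xs, coeffs):
    """sum_i P(x_i) for P(x) = sum_k coeffs[k] x^k."""
    return sum(sum(a * x ** k for k, a in enumerate(coeffs)) for x in xs)

quad = [0.7, -1.2, 2.0]        # an arbitrary polynomial of degree <= 2
cubic = [0.0, 0.0, 0.0, 1.0]   # x^3: degree 3, no longer matched
```

Matching degree-2 power sums forces poly_sum(p, quad) = poly_sum(q, quad) for every quadratic, while degree-3 sums can still differ; the algorithm exploits this with K \asymp \ln n moments per interval.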
29 Algorithmic description of local moment matching

For each interval I_j, collect A = {i : \hat{p}_i \in I_j}. Then, for each 0 \le k \lesssim \ln n, we solve for Q such that
|\sum_{i \in A} q_i^k - (unbiased estimate of \sum_{i \in A} p_i^k)| \le \epsilon \sigma_{k,A},   (11)
where
\sigma_{k,A} = standard deviation of the unbiased estimate of \sum_{i \in A} p_i^k.   (12)
Existence of a solution: A solution exists with overwhelming probability, since the true distribution P satisfies these inequalities with overwhelming probability.
30 Properties of the local moment matching methodology

1. Applies only to permutation invariant functionals.
2. Applies to a wide range of statistical models (binomial, Poisson, Gaussian, etc.).
3. Polynomial complexity.
4. Implicit polynomial approximation; just need to compute once for all functionals.
5. Needs parameter tuning in practice.
31 Third approach: the profile maximum likelihood (PML) methodology

Properties             | Approximation | Local MM   | PML
Permutation invariant  | No            | Yes        | Yes
Statistical model      | Broad         | Broad      | (Conjectured) Broad
Complexity             | Near-linear   | Polynomial | Unclear
Functional dependent   | Yes           | No         | No
Parameter tuning       | Yes           | Yes        | No

Thank you!
32 Literature

Jayadev Acharya, Hirakendu Das, Alon Orlitsky, and Ananda Theertha Suresh. A unified maximum likelihood approach for optimal distribution property estimation. Proceedings of ICML, 2017.
Jiantao Jiao, Yanjun Han, and Tsachy Weissman. Minimax estimation of the L_1 distance. arXiv e-prints, May 2017.
Gregory Valiant and Paul Valiant. A CLT and tight lower bounds for estimating entropy. Electronic Colloquium on Computational Complexity (ECCC), 2010.
Gregory Valiant and Paul Valiant. Estimating the unseen: a sublinear-sample canonical estimator of distributions. Electronic Colloquium on Computational Complexity, 2010.
Gregory Valiant and Paul Valiant. Estimating the unseen: an n/log(n)-sample estimator for entropy and support size, shown optimal via new CLTs. Proceedings of STOC, 2011.
Gregory Valiant and Paul Valiant. The power of linear estimators. Proceedings of FOCS, 2011.
33 Literature

Yihong Wu and Pengkun Yang. Minimax rates of entropy estimation on large alphabets via best polynomial approximation. IEEE Transactions on Information Theory 62.6 (2016).
Jiantao Jiao, Kartik Venkat, Yanjun Han, and Tsachy Weissman. Minimax estimation of functionals of discrete distributions. IEEE Transactions on Information Theory 61.5 (2015).
Jayadev Acharya, Alon Orlitsky, Ananda Theertha Suresh, and Himanshu Tyagi. The complexity of estimating Rényi entropy. Proceedings of the Twenty-Sixth Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), Society for Industrial and Applied Mathematics, 2015.
Yanjun Han, Jiantao Jiao, and Tsachy Weissman. Minimax rate-optimal estimation of divergences between discrete distributions. arXiv preprint, 2016.
Yuheng Bu, Shaofeng Zou, Yingbin Liang, and Venugopal V. Veeravalli. Estimation of KL divergence: optimal minimax rate. arXiv preprint, 2016.
34 Literature

Yanjun Han, Jiantao Jiao, Rajarshi Mukherjee, and Tsachy Weissman. On estimation of L_r-norms in Gaussian white noise models. arXiv preprint, 2017.
Yihong Wu and Pengkun Yang. Chebyshev polynomials, moment matching, and optimal estimation of the unseen. arXiv preprint, 2015.
Yanjun Han, Jiantao Jiao, and Tsachy Weissman. Local moment matching: a unified methodology for symmetric functional estimation and distribution estimation under Wasserstein distance. In preparation.
DRAFT a fial versio will be posted shortly CS229T/STATS231: Statistical Learig Theory Lecturer: Tegyu Ma Lecture #3 Scribe: Migda Qiao October 1, 2013 1 Review ad Overview I the first half of this course,
More informationDimension-free PAC-Bayesian bounds for the estimation of the mean of a random vector
Dimesio-free PAC-Bayesia bouds for the estimatio of the mea of a radom vector Olivier Catoi CREST CNRS UMR 9194 Uiversité Paris Saclay olivier.catoi@esae.fr Ilaria Giulii Laboratoire de Probabilités et
More information(all terms are scalars).the minimization is clearer in sum notation:
7 Multiple liear regressio: with predictors) Depedet data set: y i i = 1, oe predictad, predictors x i,k i = 1,, k = 1, ' The forecast equatio is ŷ i = b + Use matrix otatio: k =1 b k x ik Y = y 1 y 1
More informationInformation Measure Estimation and Applications: Boosting the Effective Sample Size from n to n ln n
Information Measure Estimation and Applications: Boosting the Effective Sample Size from n to n ln n Jiantao Jiao (Stanford EE) Joint work with: Kartik Venkat Yanjun Han Tsachy Weissman Stanford EE Tsinghua
More informationLecture 11: Channel Coding Theorem: Converse Part
EE376A/STATS376A Iformatio Theory Lecture - 02/3/208 Lecture : Chael Codig Theorem: Coverse Part Lecturer: Tsachy Weissma Scribe: Erdem Bıyık I this lecture, we will cotiue our discussio o chael codig
More informationL = n i, i=1. dp p n 1
Exchageable sequeces ad probabilities for probabilities 1996; modified 98 5 21 to add material o mutual iformatio; modified 98 7 21 to add Heath-Sudderth proof of de Fietti represetatio; modified 99 11
More informationNotes for Lecture 11
U.C. Berkeley CS78: Computatioal Complexity Hadout N Professor Luca Trevisa 3/4/008 Notes for Lecture Eigevalues, Expasio, ad Radom Walks As usual by ow, let G = (V, E) be a udirected d-regular graph with
More informationLecture 3. Properties of Summary Statistics: Sampling Distribution
Lecture 3 Properties of Summary Statistics: Samplig Distributio Mai Theme How ca we use math to justify that our umerical summaries from the sample are good summaries of the populatio? Lecture Summary
More informationSTATISTICAL INFERENCE
STATISTICAL INFERENCE POPULATION AND SAMPLE Populatio = all elemets of iterest Characterized by a distributio F with some parameter θ Sample = the data X 1,..., X, selected subset of the populatio = sample
More informationLecture 4: April 10, 2013
TTIC/CMSC 1150 Mathematical Toolkit Sprig 01 Madhur Tulsiai Lecture 4: April 10, 01 Scribe: Haris Agelidakis 1 Chebyshev s Iequality recap I the previous lecture, we used Chebyshev s iequality to get a
More informationLECTURE 14 NOTES. A sequence of α-level tests {ϕ n (x)} is consistent if
LECTURE 14 NOTES 1. Asymptotic power of tests. Defiitio 1.1. A sequece of -level tests {ϕ x)} is cosistet if β θ) := E θ [ ϕ x) ] 1 as, for ay θ Θ 1. Just like cosistecy of a sequece of estimators, Defiitio
More informationInterval Estimation (Confidence Interval = C.I.): An interval estimate of some population parameter is an interval of the form (, ),
Cofidece Iterval Estimatio Problems Suppose we have a populatio with some ukow parameter(s). Example: Normal(,) ad are parameters. We eed to draw coclusios (make ifereces) about the ukow parameters. We
More informationMASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.265/15.070J Fall 2013 Lecture 2 9/9/2013. Large Deviations for i.i.d. Random Variables
MASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.265/15.070J Fall 2013 Lecture 2 9/9/2013 Large Deviatios for i.i.d. Radom Variables Cotet. Cheroff boud usig expoetial momet geeratig fuctios. Properties of a momet
More informationMODEL CHANGE DETECTION WITH APPLICATION TO MACHINE LEARNING. University of Illinois at Urbana-Champaign
MODEL CHANGE DETECTION WITH APPLICATION TO MACHINE LEARNING Yuheg Bu Jiaxu Lu Veugopal V. Veeravalli Uiversity of Illiois at Urbaa-Champaig Tsighua Uiversity Email: bu3@illiois.edu, lujx4@mails.tsighua.edu.c,
More informationSupplementary Materials for Statistical-Computational Phase Transitions in Planted Models: The High-Dimensional Setting
Supplemetary Materials for Statistical-Computatioal Phase Trasitios i Plated Models: The High-Dimesioal Settig Yudog Che The Uiversity of Califoria, Berkeley yudog.che@eecs.berkeley.edu Jiamig Xu Uiversity
More informationInformation Theory Tutorial Communication over Channels with memory. Chi Zhang Department of Electrical Engineering University of Notre Dame
Iformatio Theory Tutorial Commuicatio over Chaels with memory Chi Zhag Departmet of Electrical Egieerig Uiversity of Notre Dame Abstract A geeral capacity formula C = sup I(; Y ), which is correct for
More informationEfficient GMM LECTURE 12 GMM II
DECEMBER 1 010 LECTURE 1 II Efficiet The estimator depeds o the choice of the weight matrix A. The efficiet estimator is the oe that has the smallest asymptotic variace amog all estimators defied by differet
More informationChi-Squared Tests Math 6070, Spring 2006
Chi-Squared Tests Math 6070, Sprig 2006 Davar Khoshevisa Uiversity of Utah February XXX, 2006 Cotets MLE for Goodess-of Fit 2 2 The Multiomial Distributio 3 3 Applicatio to Goodess-of-Fit 6 3 Testig for
More informationKurskod: TAMS11 Provkod: TENB 21 March 2015, 14:00-18:00. English Version (no Swedish Version)
Kurskod: TAMS Provkod: TENB 2 March 205, 4:00-8:00 Examier: Xiagfeg Yag (Tel: 070 2234765). Please aswer i ENGLISH if you ca. a. You are allowed to use: a calculator; formel -och tabellsamlig i matematisk
More informationEstimation of a population proportion March 23,
1 Social Studies 201 Notes for March 23, 2005 Estimatio of a populatio proportio Sectio 8.5, p. 521. For the most part, we have dealt with meas ad stadard deviatios this semester. This sectio of the otes
More informationLecture 3: August 31
36-705: Itermediate Statistics Fall 018 Lecturer: Siva Balakrisha Lecture 3: August 31 This lecture will be mostly a summary of other useful expoetial tail bouds We will ot prove ay of these i lecture,
More information