Lecture 8: Minimax Lower Bounds: LeCam, Fano, and Assouad
40.850: Mathematical Foundation of Big Data Analysis        Spring 2016

Lecture 8: Minimax Lower Bounds: LeCam, Fano, and Assouad

Lecturer: Fang Han        March 07

Disclaimer: These notes have not been subjected to the usual scrutiny reserved for formal publications. They may be distributed outside this class only with the permission of the Lecturer.

8.1 Overview of minimax lower bounds

This lecture note introduces the general framework for constructing lower bounds for any given statistical problem. Such a framework, formulated by Lucien LeCam and many others in the 1970s-80s, aims to understand, in a mathematically rigorous way, how challenging a statistical problem is via a worst-case analysis.

We define a statistical problem via three components: (1) a parameter space $\Theta$; (2) a class of probability measures $\{P_\theta,\ \theta \in \Theta\}$; (3) a semi-distance $d(\cdot,\cdot): \Theta \times \Theta \to \mathbb{R}$ on $\Theta$. The statistical problem is to recover $\theta$, with accuracy measured by $d(\cdot,\cdot)$, given data from $P_\theta$.

Example 8.1.1. Throughout this note, we will repeatedly visit the sparse normal mean estimation problem: estimate $\theta \in \Theta_s := \{v \in \mathbb{R}^p : \|v\|_0 \le s\}$ based on $P_\theta := N_p(\theta, \sigma^2 I_p)^{\otimes n}$. Here $d(\cdot,\cdot)$ is commonly taken to be the Euclidean distance on $\mathbb{R}^p$.

For a general statistical problem, we aim to know how hard, at worst, it is to recover any given $\theta \in \Theta$. This is formulated by the maximum risk of an estimator $\hat\theta_n$ over $\Theta$:
$$ r(\hat\theta_n) := \sup_{\theta \in \Theta} E_\theta\, d^2(\hat\theta_n, \theta). $$

8.1.1 Upper bounding the maximum risk

By the techniques we have learnt so far, it is usually not too hard to upper bound $r(\hat\theta_n)$. For example, in Example 8.1.1, let's consider the MLE (also, here, the least squares estimator):
$$ \hat\theta_n := \mathop{\mathrm{argmin}}_{v \in \Theta_s} \sum_{i=1}^n \|X_i - v\|_2^2. $$
We then have the following theorem.

Theorem 8.1.2 (Sparse normal mean estimation, upper bound). For Example 8.1.1, we have
$$ \sup_{\|\theta\|_0 \le s} E_\theta \|\hat\theta_n - \theta\|_2^2 \le C\, \frac{\sigma^2 s \log(ep/s)}{n}, $$
where $C$ is an absolute constant.

Remark 8.1.3. Note that the upper bound does not depend on the scale of $\theta$, which is a little surprising.
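It is worth noting that the constrained least squares estimator above has a closed form: since $\sum_i \|X_i - v\|_2^2 = \sum_i \|X_i - \bar X\|_2^2 + n\|\bar X - v\|_2^2$, minimizing over $s$-sparse $v$ amounts to keeping the $s$ largest coordinates of $\bar X$ in absolute value (hard thresholding). Below is a small Monte Carlo sketch of Theorem 8.1.2; the sample size, dimensions, and signal strength are illustrative choices, not from the notes.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, s, sigma = 100, 50, 3, 1.0

# A fixed s-sparse mean vector.
theta = np.zeros(p)
theta[:s] = 2.0

def sparse_lse(X, s):
    # Since sum_i ||X_i - v||^2 = const + n * ||Xbar - v||^2, the least squares
    # estimator over {||v||_0 <= s} keeps the s largest coordinates of Xbar.
    xbar = X.mean(axis=0)
    keep = np.argsort(np.abs(xbar))[-s:]
    out = np.zeros_like(xbar)
    out[keep] = xbar[keep]
    return out

risks = []
for _ in range(200):
    X = theta + sigma * rng.standard_normal((n, p))
    risks.append(np.sum((sparse_lse(X, s) - theta) ** 2))

emp_risk = np.mean(risks)
bound = sigma**2 * s * np.log(np.e * p / s) / n  # Theorem 8.1.2 rate with C = 1
print(emp_risk, bound)
```

With a signal well above the noise level the empirical risk sits far below the bound; the theorem is about worst-case behavior over all $s$-sparse $\theta$, so no single $\theta$ is expected to saturate it.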
Proof. For any $\theta \in \Theta_s$, by the definition of $\hat\theta_n$, we have
$$ \sum_{i=1}^n \|X_i - \hat\theta_n\|_2^2 \le \sum_{i=1}^n \|X_i - \theta\|_2^2, $$
which implies
$$ \|\hat\theta_n - \theta\|_2^2 \le 2\langle \bar X - \theta,\ \hat\theta_n - \theta\rangle, $$
where $\bar X := n^{-1}\sum_{i=1}^n X_i$. Using the fact that $\|\hat\theta_n - \theta\|_0 \le 2s$, by the standard uniform consistency argument from empirical process theory we can continue writing
$$ \|\hat\theta_n - \theta\|_2 \le 2 \sup_{v \in B(2s)} v^\top(\bar X - \theta), \qquad \text{where } B(s) := \{v \in \mathbb{R}^p : \|v\|_0 \le s,\ \|v\|_2 = 1\}. $$
We then employ a covering net argument as in Lecture note #4. In particular, for any $a \in \mathbb{R}^q$ and any $v_1, v_2$ with $\|v_1 - v_2\|_2 \le \epsilon$ and $\|v_1\|_2 = \|v_2\|_2 = 1$, we have
$$ (v_1^\top a)^2 - (v_2^\top a)^2 \le 2\epsilon \sup_{\|v\|_2 = 1} (v^\top a)^2. $$
This implies
$$ \sup_{\|v\|_2 = 1} (v^\top a)^2 \le (1 - 2\epsilon)^{-1} \max_{v \in \mathcal N_\epsilon} (v^\top a)^2. $$
Here $\mathcal N_\epsilon$ is an $\epsilon$-net of the unit sphere in $\mathbb{R}^q$, with cardinality at most $(1 + 2/\epsilon)^q$; take $\epsilon = 1/4$, so that $(1-2\epsilon)^{-1} = 2$ and the net over each size-$2s$ support has cardinality at most $9^{2s}$. This further implies, by a union bound over supports and net points,
$$ P_\theta\big(\|\hat\theta_n - \theta\|_2 \ge 4t\big) \le P_\theta\Big( \sup_{v \in B(2s)} v^\top(\bar X - \theta) \ge 2t \Big) \le \binom{p}{2s} 9^{2s} \max_{v \in \mathcal N} P_\theta\big( v^\top(\bar X - \theta) \ge t \big) \le \binom{p}{2s} 9^{2s} \cdot 2\exp\big(-nt^2/(2\sigma^2)\big), $$
where the last inequality is due to the fact that $\bar X - \theta \sim N_p(0, \sigma^2 I_p / n)$ under $P_\theta$, so that $v^\top(\bar X - \theta) \sim N(0, \sigma^2/n)$ for any unit vector $v$. Since $\log\big[\binom{p}{2s} 9^{2s}\big] \lesssim s \log(ep/s)$, this implies $\|\hat\theta_n - \theta\|_2 = O_P\big(\sqrt{s \log(ep/s)/n}\big)$. The expectation version follows from $E X = \int_0^\infty P(X \ge t)\, dt$ for any nonnegative random variable $X$. □

8.1.2 A general reduction scheme

In this section, we introduce a general framework for deriving lower bounds for any given statistical problem. This section largely follows Tsybakov's wonderful book Introduction to Nonparametric Estimation. Our aim is to lower bound the minimax risk
$$ \inf_{\hat\theta} \sup_{\theta \in \Theta} E_\theta\, d^2(\hat\theta, \theta), $$
where the infimum is over all measurable statistics $\hat\theta$ of the data from $P_\theta$.
(1) The first step is to reduce the expectation bound to a probability bound, via Markov's inequality:
$$ E_\theta\, d(\hat\theta, \theta) \ge A\, P_\theta\big(d(\hat\theta, \theta) \ge A\big). $$
Therefore, as long as we can find an $A$ such that
$$ \inf_{\hat\theta} \sup_{\theta \in \Theta} P_\theta\big(d(\hat\theta, \theta) \ge A\big) \ge C > 0 $$
for some absolute constant $C$, we have (using Jensen's inequality) $\inf_{\hat\theta} \sup_\theta E_\theta\, d^2(\hat\theta, \theta) \ge C^2 A^2$.

(2) The second step is to find a worst-case parameter set $\{\theta_0, \theta_1, \ldots, \theta_M\} \subset \Theta$ of finitely many elements ($\theta_0$ is made to be the reference value). This is usually the key step. By the property of the supremum, we must have
$$ \sup_{\theta \in \Theta} P_\theta\big(d(\hat\theta, \theta) \ge A\big) \ge \max_{\theta \in \{\theta_0, \ldots, \theta_M\}} P_\theta\big(d(\hat\theta, \theta) \ge A\big). $$

(3) The third step is to make $\theta_0, \ldots, \theta_M$ uniformly distinguishable, so that we can transfer the estimation problem to a testing problem. In detail, we construct $\theta_0, \ldots, \theta_M$ such that
$$ d(\theta_j, \theta_k) \ge 2A \quad \text{for any } k \ne j. $$
Accordingly, for any estimator $\hat\theta$, if there exists $\theta_k$ with $k \ne j$ such that $d(\hat\theta, \theta_k) \le d(\hat\theta, \theta_j)$, then we must have $d(\hat\theta, \theta_j) \ge A$ (otherwise $d(\theta_j, \theta_k) \le d(\hat\theta, \theta_j) + d(\hat\theta, \theta_k) < 2A$). In other words, we must have
$$ P_{\theta_j}\big(d(\hat\theta, \theta_j) \ge A\big) \ge P_{\theta_j}(\Psi \ne j), \quad j = 0, \ldots, M, $$
where $\Psi$ is the minimum distance test:
$$ \Psi := \mathop{\mathrm{argmin}}_{0 \le j \le M} d(\hat\theta, \theta_j). $$
Therefore, we have successfully transferred the estimation problem to a testing problem:
$$ \inf_{\hat\theta} \sup_{\theta \in \Theta} P_\theta\big(d(\hat\theta, \theta) \ge A\big) \ge \inf_\Psi \max_{0 \le j \le M} P_{\theta_j}(\Psi \ne j). $$
The testing problem on the right-hand side is very nice to deal with. In particular, when $M = 1$, we recover the two-class hypothesis testing problem, for which intuition suggests the Neyman-Pearson lemma can be brought in to give a sharp lower bound (actually, this is one of the major motivations for bringing information theory into statistics: the MLE is exactly the projection estimator with respect to the K-L divergence!). The lower bound below for the multiple (but finite) hypothesis testing problem is due to LeCam. As you can imagine, it is still based on likelihood ratios.

8.2 LeCam's approach

In the following, we set aside the absolute continuity issues. Please check Tsybakov's book for the complete version.

Theorem 8.2.1. Let $P_0, P_1, \ldots, P_M$ be probability measures on $(\mathcal X, \mathcal A)$.
We then have
$$ \inf_\Psi \max_{0 \le j \le M} P_j(\Psi \ne j) \;\ge\; \sup_{\tau > 0} \frac{\tau M}{1 + \tau M} \cdot \frac{1}{M} \sum_{j=1}^M P_j\Big( \frac{dP_0}{dP_j} \ge \tau \Big). $$
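Before turning to the proof, the bound can be sanity-checked numerically. In the simplest case $M = 1$ (two hypotheses) it reads $\sup_{\tau>0} \frac{\tau}{1+\tau} P_1(dP_0/dP_1 \ge \tau)$. The sketch below (the two distributions are hypothetical illustrative choices) compares it against the exact minimax testing error, obtained by brute force over all deterministic tests on a four-point space:

```python
import numpy as np
from itertools import product

# Two distributions on a 4-point space (hypothetical example).
P0 = np.array([0.4, 0.3, 0.2, 0.1])
P1 = np.array([0.1, 0.2, 0.3, 0.4])

# Exhaustive search over all deterministic tests Psi: points -> {0, 1}.
# The error of Psi is max(P0(Psi != 0), P1(Psi != 1)).
minimax_err = min(
    max(P0[np.array(psi) == 1].sum(), P1[np.array(psi) == 0].sum())
    for psi in product([0, 1], repeat=4)
)

# LeCam-type lower bound with M = 1: sup_tau tau/(1+tau) * P1(dP0/dP1 >= tau),
# approximated over a grid of tau values.
ratio = P0 / P1
lower = max(
    (tau / (1 + tau)) * P1[ratio >= tau].sum()
    for tau in np.linspace(0.01, 4.0, 400)
)
print(lower, minimax_err)
```

For this pair of distributions the lower bound evaluates to roughly 0.24 while the exact minimax error is 0.3, so the bound holds and is not far from sharp here.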
Proof. Let $\Psi$ be any test taking values in $\{0, 1, \ldots, M\}$. Let
$$ T_j := \Big\{ \frac{dP_0}{dP_j} \ge \tau \Big\}. $$
We then have
$$ P_0(\Psi \ne 0) = \sum_{j=1}^M P_0(\Psi = j) = \sum_{j=1}^M \int I(\Psi = j)\, dP_0 \ge \tau \sum_{j=1}^M P_j\big(\{\Psi = j\} \cap T_j\big) \ge \tau \sum_{j=1}^M \big[ P_j(\Psi = j) - P_j(T_j^c) \big] \ge \tau M (p_0 - \alpha), $$
where
$$ p_0 := \frac{1}{M} \sum_{j=1}^M P_j(\Psi = j) \quad \text{and} \quad \alpha := \frac{1}{M} \sum_{j=1}^M P_j\Big( \frac{dP_0}{dP_j} < \tau \Big). $$
Accordingly, noting that $\max_{1 \le j \le M} P_j(\Psi \ne j) \ge \frac{1}{M}\sum_{j=1}^M P_j(\Psi \ne j) = 1 - p_0$, we have
$$ \max_{0 \le j \le M} P_j(\Psi \ne j) = \max\Big\{ P_0(\Psi \ne 0),\ \max_{1 \le j \le M} P_j(\Psi \ne j) \Big\} \ge \max\big\{ \tau M (p_0 - \alpha),\ 1 - p_0 \big\} \ge \min_{0 \le p \le 1} \max\big\{ \tau M (p - \alpha),\ 1 - p \big\} = \frac{\tau M (1 - \alpha)}{1 + \tau M}. $$
This completes the proof. □

This then gives us the desired result.

Theorem 8.2.2 (LeCam). Assume $\{\theta_0, \theta_1, \ldots, \theta_M\} \subset \Theta$ is constructed such that (1) $d(\theta_j, \theta_k) \ge 2A > 0$ for any $0 \le j < k \le M$; (2) there exist $\tau > 0$ and $0 < \alpha < 1$ such that
$$ \frac{1}{M} \sum_{j=1}^M P_j\Big( \frac{dP_0}{dP_j} \ge \tau \Big) \ge 1 - \alpha. $$
We then have
$$ \inf_{\hat\theta} \sup_{\theta \in \Theta} P_\theta\big( d(\hat\theta, \theta) \ge A \big) \ge \frac{\tau M (1 - \alpha)}{1 + \tau M}. $$

8.2.1 Elementary information theory

Given Theorem 8.2.2, what is left is to lower bound $P_j(dP_0/dP_j \ge \tau)$. This is, naturally, the problem of quantifying the distance between the two probability measures $P_0$ and $P_j$, which plays a key role in information theory. Of note, there is a line of work in probability that rethinks much of statistics from the perspective of information theory. One motivating example is Stein's method, which is a flexible way of determining
the rate of convergence in the weak topology. Interested readers should consult Ramon van Handel's notes as well as John Duchi's (class/stats3/lectures/full_notes.pdf).

Nevertheless, let's restrict our interest to studying $P_j(dP_0/dP_j \ge \tau)$. To this end, let's introduce the following concepts.

Definition 8.2.3. For any two probability measures $P$ and $Q$ on $(\mathcal X, \mathcal A)$, suppose $\nu$ is a $\sigma$-finite measure on $(\mathcal X, \mathcal A)$ satisfying $P \ll \nu$ and $Q \ll \nu$. We define $p = dP/d\nu$, $q = dQ/d\nu$, and:

f-divergence:
$$ D_f(P, Q) := \int f\Big( \frac{dP}{dQ} \Big)\, dQ. $$

Hellinger distance ($H^2$ is the f-divergence obtained by taking $f(x) = (\sqrt{x} - 1)^2$):
$$ H(P, Q) := \Big( \int (\sqrt{p} - \sqrt{q})^2\, d\nu \Big)^{1/2}. $$

Total variation distance (taking $f(x) = |x - 1|/2$):
$$ V(P, Q) := \sup_{T \in \mathcal A} |P(T) - Q(T)| = \sup_{T \in \mathcal A} \Big| \int_T (p - q)\, d\nu \Big| = \frac{1}{2} \int |p - q|\, d\nu. $$
The last equality is the famous Scheffé theorem.

Kullback-Leibler (K-L) divergence (taking $f(x) = x \log x$):
$$ K(P, Q) := \int \log\frac{dP}{dQ}\, dP \quad \text{for } P \ll Q. $$

$\chi^2$ divergence (taking $f(x) = (x - 1)^2$):
$$ \chi^2(P, Q) := \int \Big( \frac{dP}{dQ} - 1 \Big)^2\, dQ. $$

There are a bunch of useful inequalities within the f-divergence family. However, due to the scope limit, let's focus on some of the most useful ones, Pinsker's inequalities.

Lemma 8.2.4 (Pinsker's inequalities). (1) $V(P, Q) \le \sqrt{K(P, Q)/2}$. (2) If $P \ll Q$, then
$$ \int \Big| \log\frac{dP}{dQ} \Big|\, dP := \int_{pq > 0} \Big| \log\frac{p}{q} \Big|\, p\, d\nu \le K(P, Q) + \sqrt{2 K(P, Q)}, $$
and
$$ \int \Big( \log\frac{dP}{dQ} \Big)_+\, dP \le K(P, Q) + \sqrt{K(P, Q)/2}, $$
where $a_+ := \max(a, 0)$.

Proof. Left as a good exercise. □
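For distributions on a finite space these divergences are one-liners, and the first Pinsker inequality can be checked directly. A minimal sketch (the two distributions are arbitrary illustrative choices):

```python
import numpy as np

def tv(p, q):
    # Total variation: V(P,Q) = (1/2) * sum |p - q|  (Scheffe's identity).
    return 0.5 * np.abs(p - q).sum()

def hellinger_sq(p, q):
    # Squared Hellinger distance: H^2(P,Q) = sum (sqrt(p) - sqrt(q))^2.
    return ((np.sqrt(p) - np.sqrt(q)) ** 2).sum()

def kl(p, q):
    # Kullback-Leibler divergence, assuming P << Q on this finite space.
    mask = p > 0
    return (p[mask] * np.log(p[mask] / q[mask])).sum()

def chi_sq(p, q):
    # chi-square divergence: sum (p - q)^2 / q.
    return ((p - q) ** 2 / q).sum()

p = np.array([0.5, 0.3, 0.2])
q = np.array([0.2, 0.3, 0.5])
# First Pinsker inequality: V(P,Q) <= sqrt(K(P,Q)/2).
print(tv(p, q), np.sqrt(kl(p, q) / 2))
```

Here $V(P,Q) = 0.3$ while $\sqrt{K(P,Q)/2} \approx 0.37$, consistent with the inequality; one can also check the standard ordering $K(P,Q) \le \chi^2(P,Q)$ on the same pair.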
Back to Theorem 8.2.2. Using Pinsker's inequalities, we are now well equipped to provide a more user-friendly version of LeCam's theorem.

Theorem 8.2.5. Assume that $M \ge 2$ and $\{\theta_0, \ldots, \theta_M\} \subset \Theta$ satisfies (1) $d(\theta_j, \theta_k) \ge 2A > 0$ for any $0 \le j < k \le M$; (2) $P_j \ll P_0$ and
$$ \frac{1}{M} \sum_{j=1}^M K(P_j, P_0) \le \alpha \log M, \quad \text{with } 0 < \alpha < 1/8 $$
(the upper bound $1/8$ is to make sure the final lower bound is larger than 0). We then have
$$ \inf_{\hat\theta} \sup_{\theta \in \Theta} P_\theta\big( d(\hat\theta, \theta) \ge A \big) \ge \frac{\sqrt{M}}{1 + \sqrt{M}} \Big( 1 - 2\alpha - \sqrt{\frac{2\alpha}{\log M}} \Big). $$

Proof. For any $0 < \tau < 1$, we have
$$ P_j\Big( \frac{dP_0}{dP_j} \ge \tau \Big) = P_j\Big( \frac{dP_j}{dP_0} \le \frac{1}{\tau} \Big) = P_j\Big( \log\frac{dP_j}{dP_0} \le \log\frac{1}{\tau} \Big) \ge 1 - \frac{1}{\log(1/\tau)} \int \Big( \log\frac{dP_j}{dP_0} \Big)_+\, dP_j, $$
where the last inequality is due to Markov's inequality. We can then employ the second Pinsker inequality to derive
$$ P_j\Big( \frac{dP_0}{dP_j} \ge \tau \Big) \ge 1 - \frac{1}{\log(1/\tau)} \Big[ K(P_j, P_0) + \sqrt{K(P_j, P_0)/2} \Big]. $$
Averaging over $j$ and applying Jensen's inequality (concavity of the square root) together with Condition (2),
$$ \frac{1}{M} \sum_{j=1}^M \Big[ K(P_j, P_0) + \sqrt{K(P_j, P_0)/2} \Big] \le \alpha \log M + \sqrt{\alpha \log M / 2}. $$
Accordingly, we have
$$ \frac{1}{M} \sum_{j=1}^M P_j\Big( \frac{dP_0}{dP_j} \ge \tau \Big) \ge 1 - \frac{1}{\log(1/\tau)} \Big( \alpha \log M + \sqrt{\alpha \log M / 2} \Big), $$
which, combined with Theorem 8.2.2, yields
$$ \inf_{\hat\theta} \sup_{\theta \in \Theta} P_\theta\big( d(\hat\theta, \theta) \ge A \big) \ge \frac{\tau M}{1 + \tau M} \Big( 1 - \frac{1}{\log(1/\tau)} \big( \alpha \log M + \sqrt{\alpha \log M / 2} \big) \Big). $$
Picking $\tau = 1/\sqrt{M}$, so that $\log(1/\tau) = \frac{1}{2}\log M$, finalizes the proof. □

8.2.2 Application to Example 8.1.1

Let's then show that the upper bound in Theorem 8.1.2 is rate-optimal in the minimax sense using Theorem 8.2.5.
Theorem 8.2.6. For the statistical problem in Example 8.1.1, we have
$$ \inf_{\hat\theta} \sup_{\|\theta\|_0 \le s} E_\theta \|\hat\theta - \theta\|_2^2 \gtrsim \frac{\sigma^2 s \log(ep/s)}{n}. $$

To prove the result, we need a strong result in combinatorics. In detail, let
$$ \Omega := \{\omega = (\omega_1, \ldots, \omega_m),\ \omega_i \in \{0, 1\}\} = \{0, 1\}^m. $$
We then have the Varshamov-Gilbert bound.

Lemma 8.2.7 (Varshamov-Gilbert bound). Let $m \ge 8$. Then there exists a subset $\{\omega^{(0)}, \ldots, \omega^{(M)}\}$ of $\Omega$ such that $\omega^{(0)} = (0, \ldots, 0)$,
$$ \rho(\omega^{(j)}, \omega^{(k)}) \ge \frac{m}{8} \quad \text{for } 0 \le j < k \le M $$
(here $\rho$ denotes the Hamming distance), and $M \ge 2^{m/8}$.

Proof of Theorem 8.2.6. To prove this theorem, we use a variation of the Varshamov-Gilbert bound. We construct the parameter set $\Theta_0$ as follows:
$$ \theta_0 = 0, \qquad [\theta_{T_i}]_j = a\, I(j \in T_i), $$
where $T_i \in \mathcal T$, a collection of size-$s$ subsets of $\{1, \ldots, p\}$ that pairwise overlap in at most $s/8$ positions, and $a > 0$ is a value to be determined later. Using a variation of the Varshamov-Gilbert bound (with a proof via the probabilistic method, like the proof of random projection; maybe I will cover it in class), we can prove
$$ \log M := \log |\Theta_0| \gtrsim s \log(p/s). $$
Furthermore, for the Gaussian distribution, we know
$$ K(P_j, P_0) = \frac{n}{2\sigma^2} \|\theta_j\|_2^2 = \frac{n s a^2}{2\sigma^2}, $$
and
$$ \|\theta_j - \theta_k\|_2^2 \ge s a^2. $$
Accordingly, picking $a^2 \asymp \sigma^2 \log(p/s)/n$ completes the proof. □

8.3 Fano and Assouad

I do not believe I will have time to introduce them, but I can promise you they are very useful. They were developed to handle the cases that LeCam's method cannot handle, or handles only with great difficulty. Interested readers should consult Duchi's notes.
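Returning to the Varshamov-Gilbert bound used above: its probabilistic-method flavor can be illustrated numerically. Drawing $2^{m/8}$ uniformly random binary strings, together with the zero string, yields a family that is pairwise at Hamming distance at least $m/8$ with overwhelming probability once $m$ is moderately large. A minimal sketch (the choice $m = 64$ and the seed are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
m = 64
M = 2 ** (m // 8)  # the Varshamov-Gilbert lemma guarantees a code of this size

# omega^(0) = 0 plus M uniformly random binary strings of length m.
code = np.vstack([np.zeros(m, dtype=int),
                  rng.integers(0, 2, size=(M, m))])

# All pairwise Hamming distances via broadcasting.
dists = (code[:, None, :] != code[None, :, :]).sum(axis=-1)
off_diag = dists[~np.eye(len(code), dtype=bool)]
# Distances between random pairs concentrate around m/2 = 32, far above m/8 = 8,
# so the minimum is >= m/8 except with tiny probability.
print(off_diag.min())
```

This does not replace the lemma's proof (random strings only satisfy the separation with high probability, not deterministically), but it shows why a random construction of size $2^{m/8}$ is plausible: typical pairwise distances sit near $m/2$, far above the required $m/8$.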
Lecture 1: Introduction and Review We begin with a short introduction to the course, and logistics. We then survey some basics about approximation algorithms and probability. We also introduce some of
More informationLecture 2 One too many inequalities
University of Illinois Department of Economics Spring 2017 Econ 574 Roger Koenker Lecture 2 One too many inequalities In lecture 1 we introduced some of the basic conceptual building materials of the course.
More informationPROBABILITY: LIMIT THEOREMS II, SPRING HOMEWORK PROBLEMS
PROBABILITY: LIMIT THEOREMS II, SPRING 218. HOMEWORK PROBLEMS PROF. YURI BAKHTIN Instructions. You are allowed to work on solutions in groups, but you are required to write up solutions on your own. Please
More informationLecture 21: Quantum communication complexity
CPSC 519/619: Quantum Computation John Watrous, University of Calgary Lecture 21: Quantum communication complexity April 6, 2006 In this lecture we will discuss how quantum information can allow for a
More informationOn Consistent Hypotheses Testing
Mikhail Ermakov St.Petersburg E-mail: erm2512@gmail.com History is rather distinctive. Stein (1954); Hodges (1954); Hoefding and Wolfowitz (1956), Le Cam and Schwartz (1960). special original setups: Shepp,
More informationInformation Theory and Hypothesis Testing
Summer School on Game Theory and Telecommunications Campione, 7-12 September, 2014 Information Theory and Hypothesis Testing Mauro Barni University of Siena September 8 Review of some basic results linking
More informationLecture 12: Introduction to Spectral Graph Theory, Cheeger s inequality
CSE 521: Design and Analysis of Algorithms I Spring 2016 Lecture 12: Introduction to Spectral Graph Theory, Cheeger s inequality Lecturer: Shayan Oveis Gharan May 4th Scribe: Gabriel Cadamuro Disclaimer:
More informationGeneralization bounds
Advanced Course in Machine Learning pring 200 Generalization bounds Handouts are jointly prepared by hie Mannor and hai halev-hwartz he problem of characterizing learnability is the most basic question
More informationLecture 6 Basic Probability
Lecture 6: Basic Probability 1 of 17 Course: Theory of Probability I Term: Fall 2013 Instructor: Gordan Zitkovic Lecture 6 Basic Probability Probability spaces A mathematical setup behind a probabilistic
More informationDivergence Based priors for the problem of hypothesis testing
Divergence Based priors for the problem of hypothesis testing gonzalo garcía-donato and susie Bayarri May 22, 2009 gonzalo garcía-donato and susie Bayarri () DB priors May 22, 2009 1 / 46 Jeffreys and
More informationMore Empirical Process Theory
More Empirical Process heory 4.384 ime Series Analysis, Fall 2008 Recitation by Paul Schrimpf Supplementary to lectures given by Anna Mikusheva October 24, 2008 Recitation 8 More Empirical Process heory
More informationThe information complexity of sequential resource allocation
The information complexity of sequential resource allocation Emilie Kaufmann, joint work with Olivier Cappé, Aurélien Garivier and Shivaram Kalyanakrishan SMILE Seminar, ENS, June 8th, 205 Sequential allocation
More information2. The Concept of Convergence: Ultrafilters and Nets
2. The Concept of Convergence: Ultrafilters and Nets NOTE: AS OF 2008, SOME OF THIS STUFF IS A BIT OUT- DATED AND HAS A FEW TYPOS. I WILL REVISE THIS MATE- RIAL SOMETIME. In this lecture we discuss two
More informationCS 591, Lecture 2 Data Analytics: Theory and Applications Boston University
CS 591, Lecture 2 Data Analytics: Theory and Applications Boston University Charalampos E. Tsourakakis January 25rd, 2017 Probability Theory The theory of probability is a system for making better guesses.
More informationLecture 1: Entropy, convexity, and matrix scaling CSE 599S: Entropy optimality, Winter 2016 Instructor: James R. Lee Last updated: January 24, 2016
Lecture 1: Entropy, convexity, and matrix scaling CSE 599S: Entropy optimality, Winter 2016 Instructor: James R. Lee Last updated: January 24, 2016 1 Entropy Since this course is about entropy maximization,
More informationLecture 5: February 16, 2012
COMS 6253: Advanced Computational Learning Theory Lecturer: Rocco Servedio Lecture 5: February 16, 2012 Spring 2012 Scribe: Igor Carboni Oliveira 1 Last time and today Previously: Finished first unit on
More informationLecture 6: September 22
CS294 Markov Chain Monte Carlo: Foundations & Applications Fall 2009 Lecture 6: September 22 Lecturer: Prof. Alistair Sinclair Scribes: Alistair Sinclair Disclaimer: These notes have not been subjected
More information