Lecture 8: Minimax Lower Bounds: LeCam, Fano, and Assouad


140.850: Mathematical Foundation of Big Data Analysis, Spring 2016

Lecture 8: Minimax Lower Bounds: LeCam, Fano, and Assouad

Lecturer: Fang Han, March 07

Disclaimer: These notes have not been subjected to the usual scrutiny reserved for formal publications. They may be distributed outside this class only with the permission of the Lecturer.

8.1 Overview on minimax lower bounds

This lecture note is focused on introducing the general framework for constructing lower bounds for any given statistical problem. Such a framework, formulated by Lucien LeCam and many others in the 1970s-80s, aims at a mathematically rigorous understanding of how challenging a statistical problem is via a worst-case analysis.

We define a statistical problem as follows. It contains three components: (1) a parameter space $\Theta$; (2) a class of probability measures $\{P_\theta, \theta \in \Theta\}$; (3) a semi-distance $d(\cdot, \cdot): \Theta \times \Theta \to \mathbb{R}$ on $\Theta$. A statistical problem aims to recover $\theta$, with accuracy measured by $d(\cdot, \cdot)$, given data from $P_\theta$.

Example 8.1.1. Throughout this note, we will repeatedly visit the sparse normal mean estimation problem: estimate $\theta \in \Theta_s := \{v \in \mathbb{R}^p : \|v\|_0 \le s\}$ based on $P_\theta := N_p(\theta, \sigma^2 I_p)^{\otimes n}$. Here $d(\cdot, \cdot)$ is commonly adopted to be the Euclidean distance on $\mathbb{R}^p$.

Regarding a general statistical problem, we aim to know how hard, at worst, it is to recover any given $\theta \in \Theta$. This is formulated by the maximum risk of an estimator $\hat\theta_n$ over $\Theta$:
$$r(\hat\theta_n) := \sup_{\theta \in \Theta} E_\theta\, d^2(\hat\theta_n, \theta).$$

8.1.1 Upper bounding the maximum risk

By the techniques we have learnt so far, it is usually not too hard to upper bound $r(\hat\theta_n)$. For example, in Example 8.1.1, let's consider the MLE (also called the least squares estimator):
$$\hat\theta_n := \mathop{\mathrm{argmin}}_{v \in \Theta_s} \sum_{i=1}^n \|X_i - v\|_2^2.$$
We then have the following theorem.

Theorem 8.1.2 (Sparse normal mean estimation - upper bound). For Example 8.1.1, we have
$$\sup_{\theta \in \Theta_s} E_\theta \|\hat\theta_n - \theta\|_2^2 \le \frac{C \sigma^2 s \log(ep/s)}{n},$$
where $C$ is an absolute constant.

Remark 8.1.3. Note that the upper bound does not depend on the scale of $\theta$, which is a little bit surprising.
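Before turning to the proof, here is a quick numerical illustration of Theorem 8.1.2. It is a minimal sketch (the truth $\theta$, the dimensions, and the seed are illustrative choices, not from the notes), using the fact that the least squares estimator over $\Theta_s$ has a closed form here: since $\sum_i \|X_i - v\|_2^2 = n\|\bar X - v\|_2^2 + \text{const}$, the minimizer keeps the $s$ largest coordinates of $\bar X$ in absolute value.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, s, sigma = 50, 500, 5, 1.0

# Hypothetical s-sparse truth (an illustrative choice).
theta = np.zeros(p)
theta[:s] = 1.0

X = theta + sigma * rng.standard_normal((n, p))
xbar = X.mean(axis=0)

# Least squares over Theta_s: keep the s largest coordinates of xbar.
theta_hat = np.zeros(p)
top = np.argsort(np.abs(xbar))[-s:]
theta_hat[top] = xbar[top]

err = np.sum((theta_hat - theta) ** 2)
rate = sigma**2 * s * np.log(np.e * p / s) / n
print(f"squared error = {err:.4f}; rate sigma^2 s log(ep/s)/n = {rate:.4f}")
```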

Proof. For any $\theta \in \Theta_s$, by the definition of $\hat\theta_n$, we have
$$\sum_{i=1}^n \|X_i - \hat\theta_n\|_2^2 \le \sum_{i=1}^n \|X_i - \theta\|_2^2,$$
which implies
$$\|\hat\theta_n - \theta\|_2^2 \le 2\langle \bar X - \theta,\ \hat\theta_n - \theta\rangle, \quad\text{i.e.,}\quad \|\hat\theta_n - \theta\|_2 \le 2\Big\langle \bar X - \theta,\ \frac{\hat\theta_n - \theta}{\|\hat\theta_n - \theta\|_2}\Big\rangle,$$
where $\bar X := n^{-1}\sum_{i=1}^n X_i$. By the standard uniform consistency argument in empirical process theory, using the fact that $\|\hat\theta_n - \theta\|_0 \le 2s$, we can continue writing
$$\|\hat\theta_n - \theta\|_2 \le 2 \sup_{v \in B(2s)} v^T(\bar X - \theta), \quad\text{where } B(s) := \{v \in \mathbb{R}^p : \|v\|_0 \le s,\ \|v\|_2 = 1\}.$$
We then employ a covering net argument as in Lecture note #4. In particular, for any $a \in \mathbb{R}^q$, $\|v_1 - v_2\|_2 \le \epsilon$, and $\|v_1\|_2 = \|v_2\|_2 = 1$, we have
$$(v_1^T a)^2 - (v_2^T a)^2 \le 2\epsilon \sup_{\|v\|_2 = 1} (v^T a)^2.$$
This implies
$$\sup_{\|v\|_2 = 1} (v^T a)^2 \le (1 - 2\epsilon)^{-1} \max_{v \in N_\epsilon} (v^T a)^2.$$
Here $N_\epsilon$ is an $\epsilon$-net of $S^{q-1}$, with cardinality smaller than $(1 + 2/\epsilon)^q$. Taking $\epsilon = 1/4$, this further implies
$$P_\theta\big(\|\hat\theta_n - \theta\|_2 \ge 2t\big) \le P_\theta\Big(\sup_{v \in B(2s)} v^T(\bar X - \theta) \ge t\Big) \le \binom{p}{2s}\, 9^{2s} \sup_{v \in B(2s)} P_\theta\big(v^T(\bar X - \theta) \ge t\big) \le \binom{9p}{2s} \cdot 2\exp\big(-nt^2/(2\sigma^2)\big),$$
where the last inequality is due to the fact that $\bar X - \theta \sim N_p(0, \sigma^2 I_p/n)$ under $P_\theta$, implying $v^T(\bar X - \theta) \sim N(0, \sigma^2/n)$ for any unit vector $v$. This implies $\|\hat\theta_n - \theta\|_2 = O_P\big(\sqrt{s \log(ep/s)/n}\big)$. The expectation version follows from $EX = \int_{t > 0} P(X \ge t)\, dt$ for any positive random variable $X$. $\square$
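The key quantity above, $\sup_{v \in B(2s)} v^T(\bar X - \theta)$, equals the $\ell_2$ norm of the $2s$ largest coordinates (in absolute value) of $\bar X - \theta$, since the optimal unit vector puts all of its mass on those coordinates. The following minimal Monte Carlo sketch (dimensions and seed are illustrative) checks that this supremum is indeed of the order $\sigma\sqrt{s \log(ep/s)/n}$ suggested by the tail bound:

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, s, sigma, reps = 100, 1000, 5, 1.0, 200

sups = []
for _ in range(reps):
    g = sigma / np.sqrt(n) * rng.standard_normal(p)  # law of xbar - theta
    top = np.sort(np.abs(g))[-2 * s:]                # 2s largest |g_i|
    sups.append(np.sqrt(np.sum(top**2)))             # = sup over B(2s) of v^T g

print(f"Monte Carlo mean of the sup:      {np.mean(sups):.4f}")
print(f"sigma * sqrt(2s log(ep/(2s))/n):  {sigma * np.sqrt(2*s*np.log(np.e*p/(2*s))/n):.4f}")
```

The two numbers agree up to a modest constant, as the union bound predicts.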

8.1.2 A general reduction scheme

In this section, we introduce a general framework to calculate lower bounds for any given statistical problem. This section largely follows Tsybakov's wonderful book "Introduction to Nonparametric Estimation". Our aim is to lower bound the minimax error
$$\inf_{\hat\theta} \sup_{\theta \in \Theta} E_\theta\, d^2(\hat\theta, \theta),$$
where $\hat\theta$ ranges over all measurable statistics of the data drawn from $P_\theta$, for any given $\theta$.

(1) The first step is to reduce the expectation bound to a probability bound. This is via Markov's inequality:
$$E_\theta\, d(\hat\theta, \theta) \ge A\, P_\theta\big(d(\hat\theta, \theta) \ge A\big).$$
Therefore, since $E_\theta\, d^2(\hat\theta, \theta) \ge \{E_\theta\, d(\hat\theta, \theta)\}^2$, as long as we can find an $A$ such that
$$\inf_{\hat\theta_n} \sup_{\theta \in \Theta} P_\theta\big(d(\hat\theta_n, \theta) \ge A\big) \ge C > 0$$
for some absolute constant $C$, we have $\inf_{\hat\theta_n} \sup_{\theta \in \Theta} E_\theta\, d^2(\hat\theta_n, \theta) \ge C^2 A^2$.

(2) The second step is to find a worst-case parameter set $\{\theta_0, \theta_1, \dots, \theta_M\} \subset \Theta$ with a finite number of elements ($\theta_0$ is made to be the reference value). This is usually the key step. By the property of the supremum, we must have
$$\sup_{\theta \in \Theta} P_\theta\big(d(\hat\theta, \theta) \ge A\big) \ge \max_{\theta \in \{\theta_0, \dots, \theta_M\}} P_\theta\big(d(\hat\theta, \theta) \ge A\big).$$

(3) The third step is to make $\theta_0, \dots, \theta_M$ uniformly distinguishable, so that we can transfer the estimation problem to a testing problem (see the simulation sketch after this list). In detail, we need to construct $\theta_0, \dots, \theta_M$ such that
$$d(\theta_j, \theta_k) \ge 2A \quad\text{for any } k \ne j.$$
Accordingly, for any estimator $\hat\theta$, if there exists $\theta_k$ with $k \ne j$ such that $d(\hat\theta, \theta_k) \le d(\hat\theta, \theta_j)$, then we must have $d(\hat\theta, \theta_j) \ge A$ (otherwise, by the triangle inequality, $d(\theta_j, \theta_k) \le d(\hat\theta, \theta_j) + d(\hat\theta, \theta_k) < 2A$, a contradiction). In other words, we must have
$$P_{\theta_j}\big(d(\hat\theta, \theta_j) \ge A\big) \ge P_{\theta_j}(\Psi \ne j) \quad\text{for } j = 0, \dots, M,$$
where $\Psi$ is the minimum distance test:
$$\Psi := \mathop{\mathrm{argmin}}_{0 \le j \le M} d(\hat\theta, \theta_j).$$
Therefore, we have successfully transferred the estimation problem to a testing problem:
$$\inf_{\hat\theta} \sup_{\theta \in \Theta} P_\theta\big(d(\hat\theta, \theta) \ge A\big) \ge \inf_\Psi \max_{0 \le j \le M} P_{\theta_j}(\Psi \ne j).$$

The testing problem on the righthand side is much nicer to deal with. In particular, when $M = 1$, we recover the two-class hypothesis testing problem, where by the Neyman-Pearson lemma the likelihood ratio gives a sharp lower bound (actually, this is one of the major motivations to bring information theory into statistics: the MLE is exactly the projection estimator with respect to the K-L divergence!). The lower bound for the multiple (but finite) hypothesis testing problem was constructed by LeCam. As you can imagine, it is still based on likelihood ratios.
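To make step (3) concrete, here is a minimal two-hypothesis simulation ($M = 1$, univariate Gaussian data; all numbers are illustrative choices, not from the notes). Per realization, $\{\Psi \ne j\}$ implies $\{d(\hat\theta, \theta_j) \ge A\}$, so the estimation error probability dominates the testing error, exactly as the reduction promises.

```python
import numpy as np

rng = np.random.default_rng(2)
n, sigma, reps = 20, 1.0, 2000
theta0, theta1 = 0.0, 1.0          # two parameters at distance 2A
A = abs(theta1 - theta0) / 2

err_est = err_test = 0
for _ in range(reps):
    j = rng.integers(2)            # draw the true hypothesis at random
    theta = (theta0, theta1)[j]
    theta_hat = theta + sigma / np.sqrt(n) * rng.standard_normal()  # sample mean
    psi = 0 if abs(theta_hat - theta0) <= abs(theta_hat - theta1) else 1
    err_test += (psi != j)
    err_est += (abs(theta_hat - theta) >= A)

print(f"P(d(theta_hat, theta_j) >= A) ~ {err_est/reps:.3f}"
      f"  >=  P(Psi != j) ~ {err_test/reps:.3f}")
```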

8.2 LeCam's approach

In the following, we ignore issues of absolute continuity. Please check Tsybakov's book for the complete version.

Theorem 8.2.1. Let $P_0, P_1, \dots, P_M$ be probability measures on $(\mathcal{X}, \mathcal{A})$. We then have
$$\inf_\Psi \max_{0 \le j \le M} P_j(\Psi \ne j) \ge \sup_{\tau > 0} \frac{\tau M}{1 + \tau M} \min_{1 \le j \le M} P_j\Big(\frac{dP_0}{dP_j} \ge \tau\Big).$$

Proof. Let $\Psi$ be any test taking values in $\{0, 1, \dots, M\}$, and fix $\tau > 0$. Let
$$T_j := \Big\{\frac{dP_0}{dP_j} \ge \tau\Big\}.$$
We then have
$$P_0(\Psi \ne 0) = \sum_{j=1}^M P_0(\Psi = j) = \sum_{j=1}^M \int I(\Psi = j)\, dP_0 \ge \tau \sum_{j=1}^M P_j\big(\{\Psi = j\} \cap T_j\big) \ge \tau \sum_{j=1}^M \big(P_j(\Psi = j) - P_j(T_j^C)\big) \ge \tau M (\bar p_0 - \alpha),$$
where
$$\bar p_0 := \frac{1}{M} \sum_{j=1}^M P_j(\Psi = j) \quad\text{and}\quad \alpha := \max_{1 \le j \le M} P_j(T_j^C) = \max_{1 \le j \le M} P_j\Big(\frac{dP_0}{dP_j} < \tau\Big).$$
Since the maximum dominates the average, $\max_{1 \le j \le M} P_j(\Psi \ne j) \ge 1 - \bar p_0$. Accordingly, we have
$$\max_{0 \le j \le M} P_j(\Psi \ne j) = \max\Big\{P_0(\Psi \ne 0),\ \max_{1 \le j \le M} P_j(\Psi \ne j)\Big\} \ge \max\big\{\tau M(\bar p_0 - \alpha),\ 1 - \bar p_0\big\} \ge \min_{0 \le p \le 1} \max\big\{\tau M(p - \alpha),\ 1 - p\big\} = \frac{\tau M (1 - \alpha)}{1 + \tau M},$$
where the last equality follows by balancing the two terms: $\tau M(p - \alpha) = 1 - p$ at $p^* = (1 + \tau M \alpha)/(1 + \tau M)$, with common value $\tau M(1 - \alpha)/(1 + \tau M)$. This completes the proof. $\square$

This then gives us the desired result.

Theorem 8.2.2 (LeCam). Assume $\{\theta_0, \theta_1, \dots, \theta_M\} \subset \Theta$ is constructed such that (1) $d(\theta_j, \theta_k) \ge 2A > 0$ for any $0 \le j < k \le M$; (2) there exist $\tau > 0$ and $0 < \alpha < 1$ such that
$$\min_{1 \le j \le M} P_j\Big(\frac{dP_0}{dP_j} \ge \tau\Big) \ge 1 - \alpha.$$
We then have
$$\inf_{\hat\theta} \sup_{\theta \in \Theta} P_\theta\big(d(\hat\theta, \theta) \ge A\big) \ge \frac{\tau M (1 - \alpha)}{1 + \tau M}.$$
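As a sanity check on Theorem 8.2.1 in the simplest case $M = 1$, take $P_0 = N(0, 1)$ and $P_1 = N(\mu, 1)$, where the likelihood ratio is explicit: $dP_0/dP_1(x) = \exp(\mu^2/2 - \mu x)$, so that $P_1(dP_0/dP_1 \ge \tau) = \Phi(-\mu/2 - \log(\tau)/\mu)$ for $\mu > 0$. The minimal sketch below ($\mu$ and the $\tau$-grid are illustrative; scipy is assumed available) compares the resulting bound with the exact minimax testing error $\Phi(-\mu/2)$, attained by the likelihood ratio test.

```python
import numpy as np
from scipy.stats import norm

mu = 1.0
taus = np.linspace(0.05, 3.0, 500)
# P_1(dP_0/dP_1 >= tau) = Phi(-mu/2 - log(tau)/mu) under P_1 = N(mu, 1).
probs = norm.cdf(-mu / 2 - np.log(taus) / mu)
bound = np.max(taus / (1 + taus) * probs)   # Theorem 8.2.1 with M = 1

print(f"LeCam bound: {bound:.4f}  <=  optimal testing error: {norm.cdf(-mu / 2):.4f}")
```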

8.2.1 Elementary information theory

Given Theorem 8.2.2, what is left is to determine the value of $P_j(dP_0/dP_j \ge \tau)$. This is, naturally, the problem of determining the distance between the two probability measures $P_0$ and $P_j$, which plays a key role in information theory. Of note, there is a track in probability that rethinks everything in statistics from the perspective of information theory. One motivating example is Stein's method, which is a flexible way of determining the rates of weak convergence in weak topologies. Interested readers should read Ramon van Handel's notes as well as John Duchi's ( class/stats3/lectures/full_notes.pdf).

Nevertheless, let's restrict our interest to studying $P_j(dP_0/dP_j \ge \tau)$. To this end, let's introduce the following concepts.

Definition 8.2.3. For any two probability measures $P$ and $Q$ on $(\mathcal{X}, \mathcal{A})$, suppose $\nu$ is a $\sigma$-finite measure on $(\mathcal{X}, \mathcal{A})$ satisfying $P \ll \nu$ and $Q \ll \nu$. We define $p = dP/d\nu$ and $q = dQ/d\nu$, and

f-divergence:
$$D_f(P, Q) := \int f\Big(\frac{dP}{dQ}\Big)\, dQ.$$

Hellinger distance ($H^2$ is a special case of f-divergence, taking $f(x) = (\sqrt{x} - 1)^2$):
$$H(P, Q) := \Big(\int (\sqrt{p} - \sqrt{q})^2\, d\nu\Big)^{1/2} = \Big[\int \big(\sqrt{dP} - \sqrt{dQ}\big)^2\Big]^{1/2}.$$

Total variation distance (taking $f(x) = |x - 1|/2$):
$$V(P, Q) := \sup_{T \in \mathcal{A}} |P(T) - Q(T)| = \sup_{T \in \mathcal{A}} \int_T (p - q)\, d\nu = \frac{1}{2} \int |p - q|\, d\nu.$$
The last equality is the famous Scheffé theorem.

Kullback-Leibler (K-L) divergence (taking $f(x) = x \log x$):
$$K(P, Q) := \int \log\frac{dP}{dQ}\, dP \quad\text{for } P \ll Q.$$

$\chi^2$ divergence (taking $f(x) = (x - 1)^2$):
$$\chi^2(P, Q) := \int \Big(\frac{dP}{dQ} - 1\Big)^2\, dQ.$$

There are a bunch of useful inequalities within the f-divergence family. However, due to the scope limit, let's focus on one of the most useful sets, Pinsker's inequalities.

Lemma 8.2.4 (Pinsker's inequalities). (1) $V(P, Q) \le \sqrt{K(P, Q)/2}$. (2) If $P \ll Q$, then
$$\int \Big|\log\frac{dP}{dQ}\Big|\, dP := \int_{pq > 0} p\, \Big|\log\frac{p}{q}\Big|\, d\nu \le K(P, Q) + \sqrt{2 K(P, Q)},$$
and
$$\int \Big(\log\frac{dP}{dQ}\Big)_+\, dP \le K(P, Q) + \sqrt{K(P, Q)/2},$$
where $a_+ := \max(a, 0)$.

Proof. Left as a good exercise. $\square$
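These quantities are easy to evaluate for discrete distributions, where $\nu$ can be taken to be the counting measure. A minimal sketch (the two distributions are illustrative choices) that computes each divergence above and verifies the first Pinsker inequality:

```python
import numpy as np

# Two hypothetical distributions on {0, ..., 4} with full support (so P << Q).
p = np.array([0.4, 0.3, 0.2, 0.05, 0.05])
q = np.array([0.2, 0.2, 0.2, 0.2, 0.2])

tv = 0.5 * np.abs(p - q).sum()                        # V(P, Q)
kl = np.sum(p * np.log(p / q))                        # K(P, Q)
hell = np.sqrt(np.sum((np.sqrt(p) - np.sqrt(q))**2))  # H(P, Q)
chi2 = np.sum((p / q - 1)**2 * q)                     # chi^2(P, Q)

print(f"V = {tv:.4f}, K = {kl:.4f}, H = {hell:.4f}, chi2 = {chi2:.4f}")
assert tv <= np.sqrt(kl / 2)                          # Pinsker: V <= sqrt(K/2)
print(f"Pinsker check: {tv:.4f} <= sqrt(K/2) = {np.sqrt(kl / 2):.4f}")
```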

8.2.2 Back to Theorem 8.2.2

Using Pinsker's inequalities, we are now well equipped to provide a more user-friendly version of LeCam's theorem.

Theorem 8.2.5. Assume that $M \ge 2$ and $\{\theta_0, \dots, \theta_M\} \subset \Theta$ satisfies (1) $d(\theta_j, \theta_k) \ge 2A > 0$ for any $0 \le j < k \le M$; (2) $P_j \ll P_0$ and $K(P_j, P_0) \le \alpha \log M$, with $0 < \alpha < 1/8$ (the upper bound $1/8$ is to make sure the final lower bound is larger than 0). We then have
$$\inf_{\hat\theta} \sup_{\theta \in \Theta} P_\theta\big(d(\hat\theta, \theta) \ge A\big) \ge \frac{\sqrt{M}}{1 + \sqrt{M}} \Big(1 - 2\alpha - \sqrt{\frac{2\alpha}{\log M}}\Big).$$

Proof. In Theorem 8.2.2, for any $0 < \tau < 1$, we have
$$P_j\Big(\frac{dP_0}{dP_j} < \tau\Big) = P_j\Big(\frac{dP_j}{dP_0} > \frac{1}{\tau}\Big) = P_j\Big(\log\frac{dP_j}{dP_0} > \log\frac{1}{\tau}\Big) \le \frac{1}{\log(1/\tau)} \int \Big(\log\frac{dP_j}{dP_0}\Big)_+\, dP_j,$$
where the last inequality is due to Markov's inequality. We can then employ the second Pinsker inequality to derive
$$P_j\Big(\frac{dP_0}{dP_j} < \tau\Big) \le \frac{1}{\log(1/\tau)} \Big[K(P_j, P_0) + \sqrt{K(P_j, P_0)/2}\Big].$$
By Condition (2), $K(P_j, P_0) + \sqrt{K(P_j, P_0)/2} \le \alpha \log M + \sqrt{(\alpha \log M)/2}$. Accordingly, we have
$$P_j\Big(\frac{dP_0}{dP_j} < \tau\Big) \le \frac{1}{\log(1/\tau)} \Big(\alpha \log M + \sqrt{(\alpha \log M)/2}\Big),$$
which, combined with Theorem 8.2.2, yields
$$\inf_{\hat\theta} \sup_{\theta \in \Theta} P_\theta\big(d(\hat\theta, \theta) \ge A\big) \ge \frac{\tau M}{1 + \tau M} \Big(1 - \frac{1}{\log(1/\tau)}\big(\alpha \log M + \sqrt{(\alpha \log M)/2}\big)\Big).$$
Picking $\tau = 1/\sqrt{M}$ finalizes the proof: then $\log(1/\tau) = (\log M)/2$, so the bracket equals $1 - 2\alpha - \sqrt{2\alpha/\log M}$, while $\tau M/(1 + \tau M) = \sqrt{M}/(1 + \sqrt{M})$. $\square$

8.2.3 Application to Example 8.1.1

Let's now show that the upper bound in Theorem 8.1.2 is rate-optimal in the minimax sense, using Theorem 8.2.5.
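As preparation, note that the K-L divergence between the product Gaussian measures of Example 8.1.1 has a closed form; the following short derivation (a standard computation, spelled out here for completeness) is exactly what the proof below relies on. For one observation in one coordinate,
$$K\big(N(\mu_1, \sigma^2),\ N(\mu_0, \sigma^2)\big) = E_{X \sim N(\mu_1, \sigma^2)}\Big[\frac{(X - \mu_0)^2 - (X - \mu_1)^2}{2\sigma^2}\Big] = \frac{(\mu_1 - \mu_0)^2}{2\sigma^2},$$
and since the K-L divergence adds up over independent coordinates and i.i.d. observations,
$$K\big(N_p(\theta_j, \sigma^2 I_p)^{\otimes n},\ N_p(\theta_0, \sigma^2 I_p)^{\otimes n}\big) = \frac{n}{2\sigma^2}\, \|\theta_j - \theta_0\|_2^2.$$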

Theorem 8.2.6. Regarding the statistical problem in Example 8.1.1, we have
$$\inf_{\hat\theta} \sup_{\theta \in \Theta_s} E_\theta \|\hat\theta - \theta\|_2^2 \gtrsim \sigma^2 s \log(ep/s)/n.$$

To prove the result, we need a strong result in combinatorics. In detail, let
$$\Omega := \{\omega = (\omega_1, \dots, \omega_m),\ \omega_i \in \{0, 1\}\} = \{0, 1\}^m.$$
We then have the Varshamov-Gilbert bound.

Lemma 8.2.7 (Varshamov-Gilbert bound). Let $m \ge 8$. Then there exists a subset $\{\omega^{(0)}, \dots, \omega^{(M)}\}$ of $\Omega$ such that $\omega^{(0)} = (0, \dots, 0)$,
$$\|\omega^{(j)} - \omega^{(k)}\|_0 \ge \frac{m}{8} \quad\text{for } 0 \le j < k \le M$$
(here $\|\cdot\|_0$ represents the Hamming distance), and $M \ge 2^{m/8}$.

Proof of Theorem 8.2.6. To prove this theorem, we use a variation of the Varshamov-Gilbert bound. We construct the parameter space $\Theta_0$ as follows: $\theta_0 = 0$ and
$$[\theta_{T_i}]_j = a\, I(j \in T_i) \quad\text{for } T_i \in \mathcal{T},$$
where $\mathcal{T}$ is a collection of subsets of $\{1, \dots, p\}$, each containing $s$ distinct elements, that pairwise overlap in at most $s/8$ positions, and $a > 0$ is a value to be determined later. Using a variation of the Varshamov-Gilbert bound (whose proof uses the probabilistic method, like the proof for random projections; maybe I will cover it in class), we can prove
$$\log M := \log |\Theta_0| \gtrsim s \log(p/s).$$
Furthermore, for the Gaussian distribution, we know
$$K(P_j, P_0) = \frac{n}{2\sigma^2} \|\theta_j\|_2^2 = \frac{n s a^2}{2\sigma^2} \quad\text{and}\quad \|\theta_j - \theta_k\|_2^2 \ge s a^2.$$
Accordingly, picking $a^2 \asymp \sigma^2 \log(p/s)/n$ completes the proof. $\square$

8.3 Fano and Assouad

I do not believe I will have time to introduce them, but I can promise you they are very useful. Actually, they were developed to handle the cases that LeCam's method cannot handle, or handles only with great difficulty. Interested readers should read Duchi's notes.
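The Varshamov-Gilbert bound is also easy to explore numerically. Below is a minimal greedy sketch (brute force over a small $m$; an illustrative packing construction, whereas the lemma's actual proof is probabilistic): keep every binary string whose Hamming distance to all previously kept strings is at least $m/8$.

```python
import numpy as np
from itertools import product

m = 10
kept = []
for w in product([0, 1], repeat=m):
    w = np.array(w)
    # Accept w if it is m/8-separated from everything kept so far.
    if not kept or (np.abs(np.array(kept) - w).sum(axis=1) >= m / 8).all():
        kept.append(w)

M = len(kept) - 1  # kept[0] is (0, ..., 0), matching omega^(0)
print(f"m = {m}: greedy packing gives M = {M}; "
      f"the lemma guarantees M >= 2^(m/8) = {2 ** (m / 8):.2f}")
```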
