Hoeffding, Azuma, McDiarmid


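Karl Stratos

1 Hoeffding (sum of independent RVs)

Hoeffding's lemma. If $X \in [a, b]$ and $E[X] = 0$, then for all $t > 0$:

$$E[e^{tX}] \le e^{t^2 (b-a)^2 / 8}$$

Proof. Since $e^{tx}$ is convex in $x$, for all $x \in [a, b]$:

$$e^{tx} \le \frac{b - x}{b - a} e^{ta} + \frac{x - a}{b - a} e^{tb}$$

Taking expectations and using $E[X] = 0$, this means:

$$E[e^{tX}] \le \frac{b}{b-a} e^{ta} - \frac{a}{b-a} e^{tb} = e^{ta} \left( \frac{b}{b-a} - \frac{a}{b-a} e^{t(b-a)} \right) = e^{\phi(t)}$$

where

$$\phi(t) := ta + \ln\left( \frac{b}{b-a} - \frac{a}{b-a} e^{t(b-a)} \right)$$

We did the second step because we want the form $e^{\phi(t)}$. Look at the derivatives of $\phi$. Writing $\alpha := -a/(b-a)$, so that $1 - \alpha = b/(b-a)$:

$$\phi'(t) = a + \frac{\alpha (b-a)}{(1-\alpha) e^{-t(b-a)} + \alpha}$$

$$\phi''(t) = \frac{\alpha (1-\alpha) e^{-t(b-a)} (b-a)^2}{\left( (1-\alpha) e^{-t(b-a)} + \alpha \right)^2} = \underbrace{\frac{\alpha}{(1-\alpha) e^{-t(b-a)} + \alpha}}_{u} \; \underbrace{\frac{(1-\alpha) e^{-t(b-a)}}{(1-\alpha) e^{-t(b-a)} + \alpha}}_{1-u} \; (b-a)^2 \le \frac{(b-a)^2}{4}$$

We used the fact that the concave function $u(1-u) = u - u^2$ achieves its maximum of $1/4$ at $u = 1/2$. Now we approximate $\phi(t)$ at $t = 0$ with the first-degree Taylor polynomial. Note that $\phi(0) = 0$ and $\phi'(0) = a + \alpha(b-a) = 0$. The remainder theorem gives us that, for some $\theta \in [0, t]$,

$$\phi(t) = \phi(0) + t \phi'(0) + R_1(\theta) = \frac{t^2}{2} \phi''(\theta) \le \frac{t^2 (b-a)^2}{8}$$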
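As a sanity check on Hoeffding's lemma, the sketch below evaluates the moment generating function exactly for a zero-mean two-point distribution on $\{a, b\}$ and compares it with $e^{t^2(b-a)^2/8}$. The two-point distribution is chosen here because it makes the convexity step in the proof an equality; the function names and the test values of $a$, $b$, $t$ are illustrative assumptions, not part of the notes.

```python
import math

def mgf_two_point(a: float, b: float, t: float) -> float:
    """Exact E[exp(tX)] for X in {a, b} with E[X] = 0 (requires a < 0 < b)."""
    p_b = -a / (b - a)  # P(X = b), chosen so that E[X] = 0
    return (1 - p_b) * math.exp(t * a) + p_b * math.exp(t * b)

def hoeffding_lemma_bound(a: float, b: float, t: float) -> float:
    """Right-hand side of Hoeffding's lemma: exp(t^2 (b - a)^2 / 8)."""
    return math.exp(t * t * (b - a) ** 2 / 8)

# Illustrative intervals and values of t (assumed, not from the notes).
for a, b in [(-1.0, 1.0), (-0.2, 3.0), (-5.0, 0.5)]:
    for t in [0.1, 0.5, 1.0, 2.0]:
        assert mgf_two_point(a, b, t) <= hoeffding_lemma_bound(a, b, t)
```

For the symmetric case $a = -1$, $b = 1$ the check reduces to the familiar inequality $\cosh(t) \le e^{t^2/2}$.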

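Hoeffding's inequality. Given independent random variables $X_1 \ldots X_m$ where $X_i \in [a_i, b_i]$, let $S_m := \sum_{i=1}^m X_i$. Then for any $\epsilon > 0$:

$$P(S_m - E[S_m] \ge \epsilon) \le e^{-2\epsilon^2 / \sum_{i=1}^m (b_i - a_i)^2}$$

Proof. Using the Chernoff bounding technique, we write for all $t > 0$:

$$\begin{aligned}
P(S_m - E[S_m] \ge \epsilon) &= P\left(e^{t(S_m - E[S_m])} \ge e^{t\epsilon}\right) \\
&\le E\left[e^{t(S_m - E[S_m])}\right] e^{-t\epsilon} && \text{by Markov} \\
&= E\left[\prod_{i=1}^m e^{t(X_i - E[X_i])}\right] e^{-t\epsilon} \\
&= \prod_{i=1}^m E\left[e^{t(X_i - E[X_i])}\right] e^{-t\epsilon} && \text{by independence} \\
&\le \prod_{i=1}^m e^{t^2 (b_i - a_i)^2 / 8} \, e^{-t\epsilon} && \text{by Hoeffding's lemma} \\
&= e^{\frac{t^2}{8} \sum_{i=1}^m (b_i - a_i)^2 - t\epsilon}
\end{aligned}$$

Since $\frac{t^2}{8} \sum_{i=1}^m (b_i - a_i)^2 - t\epsilon$ is convex in $t$, we minimize it with $t = 4\epsilon / \sum_{i=1}^m (b_i - a_i)^2$, yielding the bound

$$e^{\frac{2\epsilon^2}{\sum_{i=1}^m (b_i - a_i)^2} - \frac{4\epsilon^2}{\sum_{i=1}^m (b_i - a_i)^2}} = e^{-2\epsilon^2 / \sum_{i=1}^m (b_i - a_i)^2}$$

(see footnote 1). The proof suggests that the result can be generalized to variables that are not necessarily independent, since we just need the expectation to break over the product.

2 Azuma (sum of martingale differences)

Conditional Hoeffding's lemma. If $V \in [f(Z), f(Z) + c]$ and $E[V \mid Z] = 0$, then for all $t > 0$:

$$E[e^{tV} \mid Z] \le e^{t^2 c^2 / 8}$$

Note that $E[e^{tV} \mid Z]$ is a random variable in $Z$.

Proof. Similar to the proof of Hoeffding's lemma. Use $a = f(Z)$, $b = f(Z) + c$ and use $E[\,\cdot \mid Z]$ instead of $E[\,\cdot\,]$.

$V_1, V_2, \ldots$ is called a martingale difference sequence wrt. $X_1, X_2, \ldots$ if

- $V_i$ is a function of $X_1 \ldots X_i$.
- $E[|V_i|] < \infty$
- $E[V_{i+1} \mid X_1 \ldots X_i] = 0$

Footnote 1. Without Hoeffding's lemma, we could handle the case $X_i \in \{0, 1\}$ by explicitly bounding the non-centered quantity $E[e^{tX_i}] = p_i e^t + (1 - p_i) = 1 + p_i(e^t - 1) \le \exp(p_i(e^t - 1))$ (here $p_i := E[X_i]$) and observing $\prod_{i=1}^m E[e^{tX_i}] \le \exp(E[S_m](e^t - 1))$.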
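Returning to Hoeffding's inequality from Section 1, the following sketch compares the bound with an exactly computed tail for a sum of independent Bernoulli variables, where $a_i = 0$ and $b_i = 1$. The choice of a binomial example, the parameters `m = 100` and `p = 0.25`, and the helper names are assumptions made for illustration.

```python
import math

def binomial_upper_tail(m: int, p: float, eps: float) -> float:
    """Exact P(S_m - E[S_m] >= eps) for S_m ~ Binomial(m, p)."""
    threshold = m * p + eps
    return sum(
        math.comb(m, k) * p**k * (1 - p) ** (m - k)
        for k in range(m + 1)
        if k >= threshold
    )

def hoeffding_bound(ranges, eps: float) -> float:
    """exp(-2 eps^2 / sum_i (b_i - a_i)^2) for ranges = [(a_i, b_i), ...]."""
    return math.exp(-2 * eps**2 / sum((b - a) ** 2 for a, b in ranges))

m, p = 100, 0.25  # illustrative parameters
for eps in [5, 10, 20]:
    exact = binomial_upper_tail(m, p, eps)
    bound = hoeffding_bound([(0, 1)] * m, eps)
    print(f"eps={eps}: exact tail={exact:.3e}, Hoeffding bound={bound:.3e}")
    assert exact <= bound
```

The bound depends only on the ranges $[a_i, b_i]$ and not on $p$, which is why it can be loose when the variance is small.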

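Azuma's inequality. Given a martingale difference sequence $V_1, V_2, \ldots$ wrt. $X_1, X_2, \ldots$ where $V_i \in [f_i(X_1 \ldots X_{i-1}), f_i(X_1 \ldots X_{i-1}) + c_i]$ for some $f_i$ and $c_i \ge 0$, for all $\epsilon > 0$:

$$P\left( \sum_{i=1}^m V_i \ge \epsilon \right) \le e^{-2\epsilon^2 / \sum_{i=1}^m c_i^2}$$

Proof. For each $k \in [m]$, define $S_k := \sum_{i=1}^k V_i$. By the law of iterated expectations (LIE) $E_X[X] = E_Z[E_{X|Z}[X \mid Z]]$ (see the appendix):

$$E[e^{tS_k}] = E\left[ E[e^{tS_k} \mid X_1 \ldots X_{k-1}] \right]$$

where

$$\begin{aligned}
E[e^{tS_k} \mid X_1 \ldots X_{k-1}] &= E[e^{tS_{k-1}} e^{tV_k} \mid X_1 \ldots X_{k-1}] \\
&= E[e^{tS_{k-1}} \mid X_1 \ldots X_{k-1}] \, E[e^{tV_k} \mid X_1 \ldots X_{k-1}] \\
&\le E[e^{tS_{k-1}} \mid X_1 \ldots X_{k-1}] \, e^{t^2 c_k^2 / 8}
\end{aligned}$$

The second step holds because $S_{k-1}$ only depends on $X_1 \ldots X_{k-1}$. The third step holds by the conditional Hoeffding's lemma. Taking the outer expectation and applying the argument repeatedly, we get

$$E[e^{tS_m}] \le e^{t^2 c_m^2 / 8} \, E[e^{tS_{m-1}}] \le \cdots \le e^{\frac{t^2}{8} \sum_{i=1}^m c_i^2}$$

Use the Chernoff bounding technique on $S_m$:

$$\begin{aligned}
P(S_m \ge \epsilon) &= P(e^{tS_m} \ge e^{t\epsilon}) \\
&\le E\left[e^{tS_m}\right] e^{-t\epsilon} && \text{by Markov} \\
&\le e^{\frac{t^2}{8} \sum_{i=1}^m c_i^2 - t\epsilon} && \text{by the above argument}
\end{aligned}$$

By minimizing the convex function $\frac{t^2}{8} \sum_{i=1}^m c_i^2 - t\epsilon$ with $t = 4\epsilon / \sum_{i=1}^m c_i^2$, we get the bound $e^{-2\epsilon^2 / \sum_{i=1}^m c_i^2}$.

3 McDiarmid (Lipschitz function of independent RVs)

McDiarmid's inequality. Given independent random variables $X_1 \ldots X_m \in \mathcal{X}$, let $f : \mathcal{X}^m \to \mathbb{R}$ be a function bounded in a Lipschitz-like manner as follows: for each $i \in [m]$ there is some $c_i \ge 0$ such that for all $x_1 \ldots x_m, x_i' \in \mathcal{X}$,

$$|f(x_1 \ldots x_i \ldots x_m) - f(x_1 \ldots x_i' \ldots x_m)| \le c_i$$

Let $f(S) := f(X_1 \ldots X_m)$. Then

$$P(f(S) - E[f(S)] \ge \epsilon) \le e^{-2\epsilon^2 / \sum_{i=1}^m c_i^2}$$

Proof. Define $V := f(S) - E[f(S)]$. We will show that $V = \sum_{i=1}^m V_i$ is a sum of bounded martingale differences $V_i \in [f_i(X_1 \ldots X_{i-1}), f_i(X_1 \ldots X_{i-1}) + c_i]$. Then Azuma's inequality gives the desired result. Define

$$V_i := E[V \mid X_1 \ldots X_i] - E[V \mid X_1 \ldots X_{i-1}]$$

Note that each $V_i$ is a function of $X_1 \ldots X_i$, and since $E[V] = 0$ the telescoping sum gives

$$\sum_{i=1}^m V_i = E[V \mid X_1 \ldots X_m] = V$$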

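In addition, $E[E[V \mid X_1 \ldots X_i] \mid X_1 \ldots X_{i-1}] = E[V \mid X_1 \ldots X_{i-1}]$ (by LIE), so we have

$$E[V_i \mid X_1 \ldots X_{i-1}] = E\left[ E[V \mid X_1 \ldots X_i] - E[V \mid X_1 \ldots X_{i-1}] \,\middle|\, X_1 \ldots X_{i-1} \right] = 0$$

Thus $V_1 \ldots V_m$ is a martingale difference sequence wrt. $X_1 \ldots X_m$ (see footnote 2). Now bound $V_i$ in terms of $X_1 \ldots X_{i-1}$:

$$V_i \le \sup_{x \in \mathcal{X}} E[V \mid X_1 \ldots X_{i-1}, X_i = x] - E[V \mid X_1 \ldots X_{i-1}] =: W_i$$

$$V_i \ge \inf_{x \in \mathcal{X}} E[V \mid X_1 \ldots X_{i-1}, X_i = x] - E[V \mid X_1 \ldots X_{i-1}] =: U_i$$

Using the Lipschitz condition on $f$, together with the independence of $X_{i+1} \ldots X_m$ from $X_i$ (so that both conditional expectations average over the same distribution of the remaining variables):

$$\begin{aligned}
W_i - U_i &= \sup_{x, x' \in \mathcal{X}} E[V \mid X_1 \ldots X_{i-1}, X_i = x] - E[V \mid X_1 \ldots X_{i-1}, X_i = x'] \\
&= \sup_{x, x' \in \mathcal{X}} E[f(S) \mid X_1 \ldots X_{i-1}, X_i = x] - E[f(S) \mid X_1 \ldots X_{i-1}, X_i = x'] \\
&\le c_i
\end{aligned}$$

Thus $W_i \le U_i + c_i$ and it follows that $V_i \in [U_i, U_i + c_i]$ where $U_i$ is a function of $X_1 \ldots X_{i-1}$.

References. Appendix D of Foundations of Machine Learning (MRT), Chapter 12 of Probability and Computing (MU).

Footnote 2. We've constructed a Doob martingale $Z_0, Z_1, \ldots, Z_m$ wrt. $X_0 = \text{constant}, X_1, \ldots, X_m$ for the target quantity $V$. That is, $Z_i := E[V \mid X_0 \ldots X_i]$, which gives $V_i = Z_i - Z_{i-1}$.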
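To make McDiarmid's inequality concrete, here is a small brute-force sketch. It takes $f$ to be the number of distinct values among $m$ draws, finds the constants $c_i$ by enumeration, and compares the exact upper tail with $e^{-2\epsilon^2 / \sum_i c_i^2}$. The choice of $f$, the uniform distribution on a four-letter alphabet, and all function names are assumptions for illustration only.

```python
import itertools
import math

def num_distinct(xs) -> int:
    """f(x_1..x_m): number of distinct values among the arguments."""
    return len(set(xs))

def max_coordinate_change(f, alphabet, m: int, i: int) -> int:
    """Smallest valid c_i: max |f(x) - f(x')| over x, x' differing only in slot i."""
    worst = 0
    for xs in itertools.product(alphabet, repeat=m):
        for v in alphabet:
            ys = xs[:i] + (v,) + xs[i + 1:]
            worst = max(worst, abs(f(xs) - f(ys)))
    return worst

def exact_upper_tail(f, alphabet, m: int, eps: float) -> float:
    """Exact P(f(S) - E[f(S)] >= eps) when the X_i are i.i.d. uniform on `alphabet`."""
    values = [f(xs) for xs in itertools.product(alphabet, repeat=m)]
    mean = sum(values) / len(values)
    return sum(1 for v in values if v - mean >= eps) / len(values)

alphabet, m = range(4), 5  # illustrative sizes, small enough to enumerate
c = [max_coordinate_change(num_distinct, alphabet, m, i) for i in range(m)]
for eps in [0.5, 1.0, 1.5]:
    bound = math.exp(-2 * eps**2 / sum(ci**2 for ci in c))
    assert exact_upper_tail(num_distinct, alphabet, m, eps) <= bound
```

Enumeration is only feasible for tiny alphabets and small $m$; the point of the inequality is that the same bound holds without computing the distribution of $f(S)$ at all.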

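4 Appendices

4.1 Crash Course on Conditional RVs

The proofs of Azuma's and McDiarmid's inequalities make heavy use of conditional expectations. Let's say $X$ is a random variable. Then $E_X[X]$ is a constant. However, $E_{X|Y}[X \mid Y]$ is a random variable (random over $Y$)! We can only compute a value for a specific $y \in Y$:

$$E_{X|Y}[X \mid Y = y] = \int_x x \, P_{X|Y}(X = x \mid Y = y) \, dx$$

is a constant. The law of iterated expectations (LIE; see footnote 3) states that

$$\underbrace{E_Y\big[ E_{X|Y}[X \mid Y] \big]}_{\text{expectation of a function of } Y} = \underbrace{E_X[X]}_{\text{constant}}$$

Now that we know the definition, it's pretty easy to show:

$$\begin{aligned}
E_Y[E_{X|Y}[X \mid Y]] &= \int_y P_Y(Y = y) \, E_{X|Y}[X \mid Y = y] \, dy \\
&= \int_y P_Y(Y = y) \left( \int_x x \, P_{X|Y}(X = x \mid Y = y) \, dx \right) dy \\
&= \int_x x \left( \int_y P_Y(Y = y) \, P_{X|Y}(X = x \mid Y = y) \, dy \right) dx \\
&= \int_x x \, P_X(X = x) \, dx \\
&= E_X[X]
\end{aligned}$$

The same principle holds when we work with more than two variables:

$$\underbrace{E_{Y|Z}\big[ E_{X|Y,Z}[X \mid Y, Z] \,\big|\, Z \big]}_{\text{inner expectation is a function of } Y, Z} = \underbrace{E_{X|Z}[X \mid Z]}_{\text{function of } Z}$$

It basically says we are free to condition on anything as long as we eventually take the expectation over it.

4.2 Martingales

A sequence $Z_0, Z_1, \ldots$ is a martingale wrt. $X_0, X_1, \ldots$ if

- $Z_i$ is a function of $X_0 \ldots X_i$.
- $E[|Z_i|] < \infty$
- $E[Z_{i+1} \mid X_0 \ldots X_i] = Z_i$

Footnote 3. Also called the law of total expectation, the tower rule, the smoothing theorem, Adam's Law.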
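The law of iterated expectations from Section 4.1 can be checked by hand on a small discrete example, with sums in place of the integrals. The sketch below uses exact rational arithmetic; the joint distribution is a made-up table chosen only for illustration, and the helper names are hypothetical.

```python
from fractions import Fraction

# Hypothetical joint pmf P(X = x, Y = y), keyed by (x, y); values chosen for illustration.
joint = {
    (0, 0): Fraction(1, 8), (1, 0): Fraction(1, 8), (2, 0): Fraction(1, 4),
    (0, 1): Fraction(1, 4), (1, 1): Fraction(1, 8), (2, 1): Fraction(1, 8),
}

def marginal_y(joint):
    """P(Y = y), obtained by summing the joint pmf over x."""
    p_y = {}
    for (x, y), p in joint.items():
        p_y[y] = p_y.get(y, Fraction(0)) + p
    return p_y

def cond_expectation_x_given_y(joint, y):
    """E[X | Y = y] = sum_x x P(X = x | Y = y): a constant for each fixed y."""
    p_y = marginal_y(joint)[y]
    return sum(x * p / p_y for (x, yy), p in joint.items() if yy == y)

def expectation_x(joint):
    """E[X] computed directly from the joint pmf."""
    return sum(x * p for (x, _), p in joint.items())

p_y = marginal_y(joint)
inner = {y: cond_expectation_x_given_y(joint, y) for y in p_y}  # function of y
outer = sum(p_y[y] * inner[y] for y in p_y)                     # E_Y[E[X | Y]]
assert outer == expectation_x(joint)
```

Here `inner` plays the role of the random variable $E_{X|Y}[X \mid Y]$: it takes a different constant value for each $y$, and averaging those values under $P_Y$ recovers $E_X[X]$.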

A Doob martingale is a martingale constructed as follows. Let $X_0 \ldots X_n$ be any sequence. We are interested in $Y$ that depends on all of $X_0 \ldots X_n$; we assume $E[|Y|] < \infty$. We define $Z_i$ to be the expectation of $Y$ given $X_0 \ldots X_i$:

$$Z_i := E[Y \mid X_0 \ldots X_i]$$

To verify that $Z_0 \ldots Z_n$ is a martingale, we need to check the third condition:

$$\begin{aligned}
E[Z_{i+1} \mid X_0 \ldots X_i] &= E\big[ E[Y \mid X_0 \ldots X_{i+1}] \,\big|\, X_0 \ldots X_i \big] && \text{by def} \\
&= E[Y \mid X_0 \ldots X_i] && \text{by LIE} \\
&= Z_i
\end{aligned}$$

For instance, consider a sequence of rewards in $n$ independent fair gambles: $X_1 \ldots X_n$ where $E[X_i] = 0$. We are interested in the total reward $Y = \sum_{i=1}^n X_i$. Then our Doob martingale is given by

$$Z_i = \sum_{j=1}^n E[X_j \mid X_1 \ldots X_i] = \sum_{j=1}^i X_j$$

since $E[X_j \mid X_1 \ldots X_i] = E[X_j] = 0$ for $j > i$. I.e., the refined estimate of the total reward at time $i$ is simply the sum up to that time.

By construction, if $Z_0, Z_1, \ldots$ is a martingale wrt. $X_0, X_1, \ldots$, then $V_1, V_2, \ldots$ defined by $V_i := Z_i - Z_{i-1}$ is a martingale difference sequence as defined before, since

- $V_i = Z_i - Z_{i-1}$ is a function of $X_1 \ldots X_i$.
- $E[|V_i|] = E[|Z_i - Z_{i-1}|] < \infty$
- $E[V_{i+1} \mid X_1 \ldots X_i] = E[Z_{i+1} \mid X_1 \ldots X_i] - Z_i = 0$
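The fair-gamble example can be verified by enumeration. The sketch below assumes each $X_i$ is an independent fair $\pm 1$ step (one specific instance of a fair gamble, chosen here for illustration), computes $Z_i = E[Y \mid X_1 \ldots X_i]$ by averaging over all completions of a prefix, and checks that it equals the partial sum. The function name `doob_z` and the value `n = 4` are hypothetical.

```python
import itertools
from fractions import Fraction

def doob_z(prefix, n: int) -> Fraction:
    """Z_i = E[Y | X_1..X_i = prefix] for Y = X_1 + ... + X_n with independent fair +-1 steps."""
    remaining = n - len(prefix)
    completions = list(itertools.product((-1, 1), repeat=remaining))
    total = sum(sum(prefix) + sum(tail) for tail in completions)
    return Fraction(total, len(completions))

n = 4
for xs in itertools.product((-1, 1), repeat=n):
    z = [doob_z(xs[:i], n) for i in range(n + 1)]  # Z_0, ..., Z_n
    # The Doob martingale collapses to the partial sums.
    assert z == [sum(xs[:i]) for i in range(n + 1)]
    # Each difference V_i = Z_i - Z_{i-1} equals X_i, so it lies in an interval of width 2.
    assert [z[i] - z[i - 1] for i in range(1, n + 1)] == list(xs)

# Martingale property: averaging Z_{i+1} over the next fair step returns Z_i.
prefix = (1, -1)
assert (doob_z(prefix + (-1,), n) + doob_z(prefix + (1,), n)) / 2 == doob_z(prefix, n)
```

Since each $V_i$ here takes values in $[-1, 1]$, Azuma's inequality applies with $c_i = 2$ and recovers the Hoeffding bound for this sum.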