Algorithms Non-Lecture E: Tail Inequalities

    If you hold a cat by the tail you learn things you cannot learn any other way.
        (Mark Twain)

E  Tail Inequalities

The simple recursive structure of skip lists made it relatively easy to derive an upper bound on the expected worst-case search time, by way of a stronger high-probability upper bound on the worst-case search time. We can prove similar results for treaps, but because of the more complex recursive structure, we need slightly more sophisticated probabilistic tools. These tools are usually called tail inequalities; intuitively, they bound the probability that a random variable with a bell-shaped distribution takes a value in the tails of the distribution, far away from the mean.

E.1  Markov's Inequality

Perhaps the simplest tail inequality was named after the Russian mathematician Andrey Markov; however, in strict accordance with Stigler's Law of Eponymy, it first appeared in the works of Markov's probability teacher, Pafnuty Chebyshev.¹

    ¹The closely related tail bound traditionally called Chebyshev's inequality was actually discovered by the French statistician Irénée-Jules Bienaymé, a friend and colleague of Chebyshev's.

Markov's Inequality. Let X be a non-negative integer random variable. For any t > 0, we have Pr[X ≥ t] ≤ E[X]/t.

Proof: The inequality follows from the definition of expectation by simple algebraic manipulation.

    E[X] = ∑_{k=0}^{∞} k · Pr[X = k]        [definition of E[X]]
         = ∑_{k=0}^{∞} Pr[X > k]            [algebra]
         ≥ ∑_{k=0}^{t-1} Pr[X > k]          [since t < ∞]
         ≥ ∑_{k=0}^{t-1} Pr[X ≥ t]          [since k < t]
         = t · Pr[X ≥ t]                    [algebra]

Unfortunately, the bounds that Markov's inequality implies (at least directly) are often very weak, even useless. (For example, Markov's inequality implies that with high probability, every node in an n-node treap has depth O(n^2 log n). Well, duh!) To get stronger bounds, we need to exploit some additional structure in our random variables.
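To make that weakness concrete, here is a minimal numerical sanity check. It is not part of the original notes; it assumes only Python's standard library, and the names markov_check, n, t, and trials are mine. It estimates Pr[X ≥ t] by simulation for a Binomial(n, 1/2) variable X and compares the estimate with the Markov bound E[X]/t.

    import random

    def markov_check(n=100, t=75, trials=50_000):
        """Empirically compare Pr[X >= t] with the Markov bound E[X]/t,
        where X ~ Binomial(n, 1/2) is a non-negative integer random variable."""
        mean = n / 2                          # E[X] for n fair coin flips
        hits = 0
        for _ in range(trials):
            x = sum(random.randint(0, 1) for _ in range(n))   # one sample of X
            if x >= t:
                hits += 1
        print(f"empirical Pr[X >= {t}] : {hits / trials:.6f}")
        print(f"Markov bound E[X]/t    : {mean / t:.6f}")

    if __name__ == "__main__":
        markov_check()

For these parameters the true tail probability is astronomically small (the threshold sits about five standard deviations above the mean), while the Markov bound is only 2/3; this is exactly the kind of slack the remark above is complaining about.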

E.2  Sums of Indicator Variables

A set of random variables X_1, X_2, ..., X_n are said to be mutually independent if and only if

    Pr[⋀_{i=1}^{n} (X_i = x_i)] = ∏_{i=1}^{n} Pr[X_i = x_i]

for all possible values x_1, x_2, ..., x_n. For example, different flips of the same fair coin are mutually independent, but the number of heads and the number of tails in a sequence of n coin flips are not independent (since they must add to n). Mutual independence of the X_i's implies that the expectation of the product of the X_i's is equal to the product of the expectations:

    E[∏_{i=1}^{n} X_i] = ∏_{i=1}^{n} E[X_i].

Moreover, if X_1, X_2, ..., X_n are independent, then for any function f, the random variables f(X_1), f(X_2), ..., f(X_n) are also mutually independent.

Suppose X = ∑_{i=1}^{n} X_i is the sum of n mutually independent random indicator variables X_i. For each i, let p_i = Pr[X_i = 1], and let µ = E[X] = ∑_i E[X_i] = ∑_i p_i.

Chernoff Bound (Upper Tail). Pr[X > (1 + δ)µ] < (e^δ / (1 + δ)^{1+δ})^µ for any δ > 0.

Proof: The proof is fairly long, but it relies on just a few basic components: a clever substitution, Markov's inequality, the independence of the X_i's, The World's Most Useful Inequality e^x > 1 + x, a tiny bit of calculus, and lots of high-school algebra. We start by introducing a variable t > 0, whose role will become clear shortly.

    Pr[X > (1 + δ)µ] = Pr[e^{tX} > e^{t(1+δ)µ}]

To cut down on the superscripts, I'll usually write exp(x) instead of e^x in the rest of the proof. Now apply Markov's inequality to the right side of this equation:

    Pr[X > (1 + δ)µ] < E[exp(tX)] / exp(t(1 + δ)µ).

We can simplify the expectation on the right using the fact that the terms X_i are independent:

    E[exp(tX)] = E[exp(t ∑_i X_i)] = E[∏_i exp(tX_i)] = ∏_i E[exp(tX_i)].

We can bound the individual expectations E[exp(tX_i)] using The World's Most Useful Inequality:

    E[exp(tX_i)] = p_i e^t + (1 - p_i) = 1 + (e^t - 1)p_i < exp((e^t - 1)p_i).

This inequality gives us a simple upper bound for E[exp(tX)]:

    E[exp(tX)] < ∏_i exp((e^t - 1)p_i) = exp(∑_i (e^t - 1)p_i) = exp((e^t - 1)µ).

Substituting this back into our original fraction from Markov's inequality, we obtain

    Pr[X > (1 + δ)µ] < E[exp(tX)] / exp(t(1 + δ)µ) < exp((e^t - 1)µ) / exp(t(1 + δ)µ) = (exp(e^t - 1 - t(1 + δ)))^µ.

Notice that this last inequality holds for every value of t > 0. To obtain the final tail bound, we will choose t to make this bound as tight as possible. To minimize e^t - 1 - t - tδ, we take its derivative with respect to t and set it to zero:

    d/dt (e^t - 1 - t(1 + δ)) = e^t - 1 - δ = 0.

(And you thought calculus would never be useful!) This equation has just one solution, t = ln(1 + δ). Plugging this back into our bound gives us

    Pr[X > (1 + δ)µ] < (exp(δ - (1 + δ) ln(1 + δ)))^µ = (e^δ / (1 + δ)^{1+δ})^µ.

And we're done!

This form of the Chernoff bound can be a bit clumsy to use. A more complicated argument gives us the bound

    Pr[X > (1 + δ)µ] < e^{-µδ^2/3}    for any 0 < δ < 1.

A similar argument gives us an inequality bounding the probability that X is significantly smaller than its expected value:

Chernoff Bound (Lower Tail). Pr[X < (1 - δ)µ] < (e^{-δ} / (1 - δ)^{1-δ})^µ < e^{-µδ^2/2} for any 0 < δ < 1.
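The following sketch is likewise not from the original notes; it assumes only Python's standard library, and the names chernoff_upper_demo, p, n, delta, and trials are mine. It compares the empirical upper tail of a sum of independent indicators against the bound (e^δ / (1 + δ)^{1+δ})^µ.

    import math
    import random

    def chernoff_upper_demo(p=0.1, n=200, delta=0.5, trials=50_000):
        """Compare Pr[X > (1+delta)*mu] with the Chernoff upper-tail bound,
        where X is the sum of n independent indicators, each 1 with probability p."""
        mu = n * p
        threshold = (1 + delta) * mu
        hits = 0
        for _ in range(trials):
            x = sum(1 for _ in range(n) if random.random() < p)   # one sample of X
            if x > threshold:
                hits += 1
        bound = (math.exp(delta) / (1 + delta) ** (1 + delta)) ** mu
        print(f"empirical Pr[X > (1 + delta) mu] : {hits / trials:.6f}")
        print(f"Chernoff upper-tail bound        : {bound:.6f}")

    if __name__ == "__main__":
        chernoff_upper_demo()

The bound is not tight, but unlike the Markov bound it falls off exponentially in µ: doubling n (and hence µ) roughly squares it.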

E.3  Back to Treaps

In our analysis of randomized treaps, we defined the indicator variable A_{i↑k} to have the value 1 if and only if the node with the ith smallest key ("node i") was a proper ancestor of the node with the kth smallest key ("node k"). We argued that

    Pr[A_{i↑k} = 1] = 1 / (|i - k| + 1),

and from this we concluded that the expected depth of node k is

    E[depth(k)] = ∑_{i=1}^{n} Pr[A_{i↑k} = 1] = H_k + H_{n-k+1} - 2 < 2 ln n.

To prove a worst-case expected bound on the depth of the tree, we need to argue that the maximum depth of any node is small. Chernoff bounds make this argument easy, once we establish that the relevant indicator variables are mutually independent.
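As a sanity check of the formula Pr[A_{i↑k} = 1] = 1/(|i - k| + 1) and the resulting expected depth, here is a short simulation sketch. It is not part of the original notes, it assumes only Python's standard library, and the function and parameter names are mine. It uses the standard characterization behind that formula: node i is a proper ancestor of node k exactly when node i has the smallest priority among all nodes whose keys lie between the ith and kth smallest.

    import math
    import random

    def treap_depth(priorities, k):
        """Depth of the node with the (k+1)-th smallest key (0-indexed k) in the
        treap determined by `priorities`: count the indices i != k whose priority
        is the minimum over the key range between i and k (the indicator A_{i↑k})."""
        depth = 0
        for i in range(len(priorities)):
            if i == k:
                continue
            lo, hi = min(i, k), max(i, k)
            if priorities[i] == min(priorities[lo:hi + 1]):
                depth += 1
        return depth

    def expected_depth(n, k):
        """E[depth(k)] = H_k + H_{n-k+1} - 2, with k 1-indexed as in the notes."""
        harmonic = lambda m: sum(1.0 / j for j in range(1, m + 1))
        return harmonic(k) + harmonic(n - k + 1) - 2

    def demo(n=200, k=50, trials=2_000):
        avg = sum(treap_depth([random.random() for _ in range(n)], k - 1)
                  for _ in range(trials)) / trials
        print(f"simulated average depth of node {k} : {avg:.3f}")
        print(f"H_k + H_(n-k+1) - 2                 : {expected_depth(n, k):.3f}")
        print(f"2 ln n                              : {2 * math.log(n):.3f}")

    if __name__ == "__main__":
        demo()

The simulated average and the harmonic-number formula should agree closely, and both sit below 2 ln n.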

Lemma 1. For any index k, the k - 1 random variables A_{i↑k} with i < k are mutually independent. Similarly, for any index k, the n - k random variables A_{i↑k} with i > k are mutually independent.

Proof: To simplify the notation, we explicitly consider only the case k = 1, although the argument generalizes easily to other values of k. Fix n - 1 arbitrary indicator values x_2, x_3, ..., x_n. We prove the lemma by induction on n, with the vacuous base case n = 1. The definition of conditional probability gives us

    Pr[⋀_{i=2}^{n} (A_{i↑1} = x_i)] = Pr[⋀_{i=2}^{n-1} (A_{i↑1} = x_i) ∧ (A_{n↑1} = x_n)]
                                    = Pr[⋀_{i=2}^{n-1} (A_{i↑1} = x_i) | A_{n↑1} = x_n] · Pr[A_{n↑1} = x_n].

Now recall that A_{n↑1} = 1 if and only if node n has the smallest priority, and the other n - 2 indicator variables A_{i↑1} depend only on the order of the priorities of nodes 1 through n - 1. There are exactly (n - 1)! permutations of the n priorities in which the nth priority is smallest, and each of these permutations is equally likely. Thus,

    Pr[⋀_{i=2}^{n-1} (A_{i↑1} = x_i) | A_{n↑1} = x_n] = Pr[⋀_{i=2}^{n-1} (A_{i↑1} = x_i)].

The inductive hypothesis implies that the variables A_{2↑1}, ..., A_{(n-1)↑1} are mutually independent, so

    Pr[⋀_{i=2}^{n-1} (A_{i↑1} = x_i)] = ∏_{i=2}^{n-1} Pr[A_{i↑1} = x_i].

We conclude that

    Pr[⋀_{i=2}^{n} (A_{i↑1} = x_i)] = Pr[A_{n↑1} = x_n] · ∏_{i=2}^{n-1} Pr[A_{i↑1} = x_i] = ∏_{i=2}^{n} Pr[A_{i↑1} = x_i],

or in other words, that the indicator variables are mutually independent.

Theorem 2. The depth of a randomized treap with n nodes is O(log n) with high probability.

Proof: First let's bound the probability that the depth of node k exceeds 8 ln n. There is nothing special about the constant 8 here; I'm being generous to make the analysis easier. The depth of node k is a sum of n indicator variables A_{i↑k}, as i ranges from 1 to n. Lemma 1 allows us to partition these variables into two mutually independent subsets. Let d_<(k) = ∑_{i<k} A_{i↑k} and d_>(k) = ∑_{i>k} A_{i↑k}, so that depth(k) = d_<(k) + d_>(k). If depth(k) > 8 ln n, then either d_<(k) > 4 ln n or d_>(k) > 4 ln n.

Recall that the proof of the Chernoff bound actually shows that Pr[X > x] < exp((e^t - 1)µ) / exp(tx) for every t > 0 and every threshold x. Setting t = ln 4 and x = 4 ln n, and using the fact that µ = E[d_<(k)] = H_k - 1 < ln n, we can bound the probability that d_<(k) > 4 ln n as follows:

    Pr[d_<(k) > 4 ln n] < exp(3µ) / exp(4 ln 4 · ln n) ≤ e^{3 ln n} / 4^{4 ln n} = (e^3 / 4^4)^{ln n} = n^{3 - 4 ln 4} < 1/n^2.

(The last step uses the fact that 4 ln 4 ≈ 5.54518 > 5.) The same analysis implies that Pr[d_>(k) > 4 ln n] < 1/n^2. These inequalities imply the crude bound Pr[depth(k) > 8 ln n] < 2/n^2.

Now consider the probability that the treap has depth greater than 8 ln n. Even though the distributions of different nodes' depths are not independent, we can conservatively bound the probability of failure as follows:

    Pr[max_k depth(k) > 8 ln n] = Pr[⋁_{k=1}^{n} (depth(k) > 8 ln n)] ≤ ∑_{k=1}^{n} Pr[depth(k) > 8 ln n] < 2/n.

This argument implies more generally that for any constant c, the depth of the treap is greater than 2c ln n with probability at most 2/n^{c ln c - c}. We can make the failure probability an arbitrarily small polynomial by choosing c appropriately.
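To see the theorem numerically, here is a rough simulation sketch; it is again not part of the original notes, it assumes only Python's standard library, and the function names are mine. It builds each treap implicitly, using the fact that the root of any contiguous range of keys is the key with the smallest priority in that range, and counts how often the maximum depth exceeds 8 ln n.

    import math
    import random

    def max_treap_depth(priorities):
        """Maximum node depth (root = depth 0) of the treap on keys 0..n-1 with
        the given priorities.  The root of any key range [lo, hi] is the key with
        the smallest priority in that range; recurse on the two sides."""
        def deepest(lo, hi, depth):
            if lo > hi:
                return -1                      # empty range contributes nothing
            root = min(range(lo, hi + 1), key=priorities.__getitem__)
            return max(depth,
                       deepest(lo, root - 1, depth + 1),
                       deepest(root + 1, hi, depth + 1))
        return deepest(0, len(priorities) - 1, 0)

    def demo(n=1000, trials=300):
        threshold = 8 * math.log(n)
        exceed = sum(
            max_treap_depth([random.random() for _ in range(n)]) > threshold
            for _ in range(trials))
        print(f"8 ln n                                  : {threshold:.1f}")
        print(f"fraction of treaps deeper than 8 ln n   : {exceed / trials:.4f}")
        print(f"bound from the proof of Theorem 2 (2/n) : {2 / n:.4f}")

    if __name__ == "__main__":
        demo()

In practice the observed fraction is essentially always 0, comfortably below the 2/n bound, which is itself a generous overestimate.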

This theorem implies that any search, insertion, deletion, or merge operation on an n-node treap requires O(log n) time with high probability. In particular, the expected worst-case time for each of these operations is O(log n).

Exercises

1. Prove that for any integer k such that 1 < k < n, the n - 1 indicator variables A_{i↑k} with i ≠ k are not mutually independent. [Hint: Consider the case n = 3.]

2. Recall from Exercise 1 in the previous note that the expected number of descendants of any node in a treap is O(log n). Why doesn't the Chernoff-bound argument for depth imply that, with high probability, every node in a treap has O(log n) descendants? The conclusion is clearly bogus (every treap has a node with n descendants!), but what is the hole in the argument?

3. A heater is a sort of dual treap, in which the priorities of the nodes are given, but their search keys are generated independently and uniformly from the unit interval [0, 1]. You can assume all priorities and keys are distinct.

   (a) Prove that for any r, the node with the rth smallest priority has expected depth O(log r).

   (b) Prove that an n-node heater has depth O(log n) with high probability.

   (c) Describe algorithms to perform the operations INSERT and DELETEMIN in a heater. What are the expected worst-case running times of your algorithms? In particular, can you express the expected running time of INSERT in terms of the priority rank of the newly inserted item?

© Copyright 2009 Jeff Erickson. Released under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 License (http://creativecommons.org/licenses/by-nc-sa/3.0/). Free distribution is strongly encouraged; commercial distribution is expressly forbidden. See http://www.cs.uiuc.edu/~jeffe/teaching/algorithms/ for the most recent revision.