An Introduction to Randomized Algorithms

The focus of this lecture is to study a randomized algorithm for quick sort, analyze it using probabilistic recurrence relations, and also provide more general tools for the analysis of randomized algorithms. For a quick overview of probability and the terms associated with it, the reader is advised to see the Appendix.

1 Randomized QuickSort

In this section, we present the classic quick sort algorithm and compute the expected running time of the algorithm. We assume that the elements of the set are all distinct. Below is the randomized quick sort algorithm.

Algorithm RandQuickSort(S)
    Choose a pivot element $x_i$ u.a.r. from $S = \{x_1, x_2, \ldots, x_n\}$
    Split the set $S$ into two subsets $S_1 = \{x_j \mid x_j < x_i\}$ and $S_2 = \{x_j \mid x_j > x_i\}$ by comparing each $x_j$ with the chosen $x_i$
    Recurse on sets $S_1$ and $S_2$
    Output the sorted set $S_1$, then $x_i$, and then the sorted set $S_2$.
end Algorithm

Analysis

Let $T(n)$ be the number of steps taken by the RandQuickSort algorithm on a set of size $n$. Note that the maximum value of $T(n)$ occurs when the pivot element $x_i$ is the largest/smallest element of the remaining set during each recursive call of the algorithm. In this case, $T(n) = n + (n-1) + \cdots + 1 = O(n^2)$. This value of $T(n)$ is reached with a very low probability of $\frac{1}{n} \cdot \frac{1}{n-1} \cdots \frac{1}{2} = \frac{1}{n!}$. Also, the best case occurs when the pivot element splits the set $S$ into two equal sized subsets, and then $T(n) = O(n \ln n)$. This implies that $T(n)$ has a distribution between $O(n \ln n)$ and $O(n^2)$.

Now we derive the expected value of $T(n)$. Note that if the $i$-th smallest element is chosen as the pivot element, then $S_1$ and $S_2$ will be of sizes $i-1$ and $n-i$ respectively, and this choice has a probability of $\frac{1}{n}$. The recurrence relation for $T(n)$ is:

$$T(n) = n + T(X) + T(n-1-X) \qquad (1)$$

where $\Pr[X = i] = \frac{1}{n}$ for $i \in \{0, 1, \ldots, n-1\}$. Hence, $\Pr[n-1-X = i] = \Pr[X = n-1-i] = \frac{1}{n}$. However, it is incorrect to write $T(n-1-X) = T(X)$. Taking expectations on both sides of (1),

$$E[T(n)] = n + \frac{1}{n} \sum_{i=0}^{n-1} E[T(i)] + \frac{1}{n} \sum_{j=0}^{n-1} E[T(j)] = n + \frac{2}{n} \sum_{i=0}^{n-1} E[T(i)].$$

Let $f(i) = E[T(i)]$. Then, $f(n) = n + \frac{2}{n} \sum_{i=1}^{n-1} f(i)$. Simplifying,

$$n f(n) = n^2 + 2(f(1) + f(2) + \cdots + f(n-1)) \qquad (2)$$

Substituting $n-1$ for $n$ in (2),

$$(n-1) f(n-1) = (n-1)^2 + 2(f(1) + f(2) + \cdots + f(n-2)) \qquad (3)$$

Subtracting (3) from (2), we get $n f(n) - (n-1) f(n-1) = (2n-1) + 2 f(n-1)$, or $f(n) = \frac{n+1}{n} f(n-1) + \frac{2n-1}{n}$. We prove by induction that $f(n) \le 2n \ln n$.
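Before turning to the proof, it may help to run the algorithm and watch its comparison count. The following Python sketch is our own illustration (the function and variable names are ours, not from the notes): it implements RandQuickSort, counts comparisons, and checks the average against the $2n \ln n$ bound proved below.

    import math
    import random

    def rand_quicksort(s, counter):
        """Sort list s; counter[0] accumulates the number of comparisons made."""
        if len(s) <= 1:
            return list(s)
        pivot = random.choice(s)          # pivot chosen u.a.r. from S
        counter[0] += len(s) - 1          # each remaining element is compared with the pivot
        s1 = [x for x in s if x < pivot]  # S1 = {x_j : x_j < pivot}
        s2 = [x for x in s if x > pivot]  # S2 = {x_j : x_j > pivot} (elements are distinct)
        return rand_quicksort(s1, counter) + [pivot] + rand_quicksort(s2, counter)

    n, trials, total = 1000, 200, 0
    for _ in range(trials):
        counter = [0]
        rand_quicksort(random.sample(range(10 * n), n), counter)
        total += counter[0]
    print("average comparisons:", total / trials)      # typically around 11000 for n = 1000
    print("2 n ln n bound     :", 2 * n * math.log(n)) # about 13816

On typical runs the average comparison count stays below the $2n \ln n$ bound, consistent with the claim we now prove.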

1.0.1 Claim: $f(n) \le 2n \ln n$

Proof: Induction basis: for $n = 1$, the claim holds. Let the claim hold for all values up to $n-1$. Then,

$$f(n) = \frac{n+1}{n} f(n-1) + \frac{2n-1}{n} \le \frac{n+1}{n} \cdot 2(n-1)\ln(n-1) + \frac{2n-1}{n} \quad \text{(by induction hypothesis)}$$

$$= \frac{2(n^2-1)}{n} \ln(n-1) + \frac{2n-1}{n} = \frac{2(n^2-1)}{n} \left( \ln n + \ln\left(1 - \frac{1}{n}\right) \right) + \frac{2n-1}{n}$$

We make use of the standard inequality stated below.

Fact 1.1 $1 + x \le e^x$ for $x \in \mathbb{R}$. For example, $1 + 0.2 \le e^{0.2}$ and $1 - 0.2 \le e^{-0.2}$.

By Fact 1.1, $\ln(1 - \frac{1}{n}) \le -\frac{1}{n}$, so

$$f(n) \le \frac{2(n^2-1)}{n} \left( \ln n - \frac{1}{n} \right) + \frac{2n-1}{n} = 2n \ln n - \frac{2\ln n}{n} - 2 + \frac{2}{n^2} + 2 - \frac{1}{n} \le 2n \ln n,$$

where the last inequality holds since $\frac{2}{n^2} \le \frac{1}{n}$ for $n \ge 2$. This establishes the inductive step. Hence, the expected running time of the randomized quick sort algorithm is $O(n \ln n)$.

But one of the limitations of the recurrence relation approach is that we do not know how the running time of the algorithm is spread around its expected value. Can this analysis be extended to answer questions such as: with what probability does the algorithm RandQuickSort need more than $12 n \ln n$ time steps? Later on, we apply a different technique and establish that this probability is very small. Similarly, for the case of the Maximum algorithm, by solving the recurrence relation for the expected running time we will not be able to answer questions of the form: what is the probability that the run time is greater than $3n$? Answers to such questions tell us how closely the random variable is spread around its expected value, i.e. how varied the distribution is. To be able to answer such queries, we study tail inequalities in the next section.

2 Tail Inequalities

In this section, we study three ways to estimate the tail probabilities of random variables. It will be noted that the more information we know about the random variable, the better the estimate we can derive for a given tail probability.

2.1 Markov Inequality

Theorem 2.1 (Markov Inequality) If $X$ is a non-negative valued random variable with an expectation of $\mu$, then $\Pr[X \ge c\mu] \le \frac{1}{c}$.

Proof. By definition,

$$\mu = \sum_a a \Pr[X = a] = \sum_{a < c\mu} a \Pr[X = a] + \sum_{a \ge c\mu} a \Pr[X = a] \ge 0 + \sum_{a \ge c\mu} c\mu \Pr[X = a] = c\mu \sum_{a \ge c\mu} \Pr[X = a] = c\mu \Pr[X \ge c\mu],$$

where the inequality uses the fact that $X$ is non-negative valued. Hence, $\Pr[X \ge c\mu] \le \frac{\mu}{c\mu} = \frac{1}{c}$.

Knowledge of the standard deviation of the random variable $X$ would most often give a better bound.

2.2 Chebychev Inequality

We first define the terms standard deviation and variance of a random variable $X$.

Definition 2.2 Let $X$ be a random variable with an expectation of $\mu$. The variance of $X$, denoted by $\mathrm{var}(X)$, is defined as $\mathrm{var}(X) = E[(X - \mu)^2]$. The standard deviation of $X$, denoted by $\sigma_X$, is defined as $\sigma_X = \sqrt{\mathrm{var}(X)}$.

Note that by definition, $\mathrm{var}(X) = E[(X - \mu)^2] = E[X^2 - 2X\mu + \mu^2] = E[X^2] - \mu^2$. The second equality follows from the linearity of expectations.
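Before stating the Chebychev inequality itself, a quick empirical sanity check may be useful. The Python sketch below is our own illustration (not part of the notes): it estimates the tail of a non-negative random variable, the sum of two fair dice, against Markov's $1/c$ bound, and verifies the identity $\mathrm{var}(X) = E[X^2] - \mu^2$.

    import random

    # Sample the sum of two fair dice, a non-negative random variable with mu = 7.
    samples = [random.randint(1, 6) + random.randint(1, 6) for _ in range(100000)]
    mu = sum(samples) / len(samples)
    ex2 = sum(x * x for x in samples) / len(samples)
    var = ex2 - mu * mu                      # var(X) = E[X^2] - mu^2; exact value is 35/6
    for c in (1.5, 2.0):
        tail = sum(1 for x in samples if x >= c * mu) / len(samples)
        print(f"Pr[X >= {c} mu] ~ {tail:.3f}  <=  1/c = {1 / c:.3f}")  # Markov holds
    print(f"empirical var(X) = {var:.3f}, exact 35/6 = {35 / 6:.3f}")

As the output shows, Markov's bound holds but is loose (for $c = 1.5$ the true tail is about $0.08$ against the bound $0.67$); this is exactly the slack that knowing $\sigma_X$ lets us remove.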

Theorem 2.3 (Chebychev Inequality) Let $X$ be a random variable with expectation $\mu_X$ and standard deviation $\sigma_X$. Then, $\Pr[|X - \mu_X| \ge c\sigma_X] \le \frac{1}{c^2}$.

Proof. Let the random variable $Y = (X - \mu_X)^2$. Then, $E[Y] = E[(X - \mu_X)^2] = \sigma_X^2$ by definition, and $Y$ is also a non-negative valued random variable. Now, $\Pr[|X - \mu_X| \ge c\sigma_X] = \Pr[(X - \mu_X)^2 \ge c^2\sigma_X^2] = \Pr[Y \ge c^2\sigma_X^2]$. Applying the Markov inequality to the random variable $Y$, $\Pr[Y \ge c^2\sigma_X^2] = \Pr[Y \ge c^2\mu_Y] \le \frac{1}{c^2}$. Hence the theorem.

2.3 Chernoff Bounds

The tail estimates given by Theorem 2.1 and Theorem 2.3 work for random variables in general. But if the random variable $X$ can be expressed as a sum of independent random variables, each of which is 0,1-valued, then we can obtain very tight bounds on the tail estimates. This is expressed in the following theorem, and the bounds are commonly called Chernoff bounds.

Theorem 2.4 (Chernoff Bounds) Let $X$ be a random variable defined as $X = X_1 + X_2 + \cdots + X_n$, where each $X_i$, $1 \le i \le n$, is a 0,1-valued random variable and all the $X_i$'s are independent. Also, let $E[X] = \mu$ and $\Pr[X_i = 1] = p_i$, $1 \le i \le n$. Then for any $\delta > 0$,

$$\Pr[X \ge \mu(1+\delta)] \le \left( \frac{e^{\delta}}{(1+\delta)^{1+\delta}} \right)^{\mu}.$$

Proof. The normal strategy employed to prove tail estimates for sums of independent random variables is to make use of exponential moments. While proving the Chebychev inequality (Theorem 2.3), we made use of the second-order moment. It can be observed that using higher order moments would generally improve the bound on the tail inequality; but using exponential moments results in a vast improvement in the bound, as we shall see later in some examples. Observe that,

$$\mu = \sum_{i=1}^{n} p_i \qquad (4)$$

by the definition of $X$ and using linearity of expectations. Let $Y_i = e^{tX_i}$ for some parameter $t$ to be chosen later. Note that the $Y_i$'s are also independent, as the $X_i$'s are. Define the random variable $Y = Y_1 Y_2 \cdots Y_n$. Then,

$$E[Y_i] = E[e^{tX_i}] = p_i e^t + (1 - p_i)e^0 = 1 - p_i + p_i e^t \qquad (5)$$

$$E[Y] = E[Y_1 Y_2 \cdots Y_n] = \prod_{i=1}^{n} E[Y_i] = \prod_{i=1}^{n} (1 - p_i + p_i e^t) \qquad (6)$$

where the second equality follows from the independence of the $Y_i$'s and the last equality follows from (5). Now,

$$\Pr[X \ge \mu(1+\delta)] = \Pr[Y \ge e^{t\mu(1+\delta)}] \le \frac{E[Y]}{e^{t\mu(1+\delta)}} = \frac{\prod_{i=1}^{n} (1 - p_i + p_i e^t)}{e^{t\mu(1+\delta)}} \qquad (7)$$

by using the Markov inequality from Theorem 2.1 and equation (6). We now make use of the inequality from Fact 1.1 in equation (7), which then reduces to,

$$\Pr[X \ge \mu(1+\delta)] \le \frac{\prod_{i=1}^{n} e^{p_i(e^t - 1)}}{e^{t\mu(1+\delta)}} = \frac{e^{\mu(e^t - 1)}}{e^{t\mu(1+\delta)}} = e^{\mu[(e^t - 1) - t(1+\delta)]} \qquad (8)$$

where the second equality follows from equation (4). In (8), we can choose a value of $t$ that minimizes the probability estimate. To find the minimum, let $f(t) = \ln e^{\mu[(e^t - 1) - t(1+\delta)]} = \mu(e^t - 1) - t\mu(1+\delta)$. Differentiating $f(t)$ with respect to $t$ and equating it to zero gives us $\mu e^t - \mu(1+\delta) = 0$, or $t = \ln(1+\delta)$. Using this value of $t$ in (8),

$$\Pr[X \ge \mu(1+\delta)] \le e^{\mu[\delta - (1+\delta)\ln(1+\delta)]} = \left( \frac{e^{\delta}}{(1+\delta)^{1+\delta}} \right)^{\mu}. \qquad (9)$$

That completes the proof.

But the form of the inequality in equation (9) is not very convenient to handle. In addition, this form is hard to invert, i.e. given the probability bound, choose an appropriate $\delta$.
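As a numerical aside (our own, not from the notes), one can verify that $t = \ln(1+\delta)$ minimizes the exponent $g(t) = (e^t - 1) - t(1+\delta)$ appearing in (8), and that plugging it in recovers the bound in (9):

    import math

    delta = 3.0
    t_star = math.log(1 + delta)                       # the minimizer found above
    g = lambda t: (math.exp(t) - 1) - t * (1 + delta)  # exponent per unit of mu, from (8)
    for t in (0.5 * t_star, t_star, 1.5 * t_star):
        print(f"t = {t:.3f}   g(t) = {g(t):.4f}")      # smallest value is at t = t_star
    # e^{g(t_star)} equals e^delta / (1+delta)^(1+delta), the base of the bound in (9)
    print(math.exp(g(t_star)), math.e ** delta / (1 + delta) ** (1 + delta))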

Instead, we use the following form most often.

Theorem 2.5 Let $X$ be defined as in Theorem 2.4. Then,

$$\Pr[X \ge (1+\delta)\mu] \le \begin{cases} e^{-\mu\delta^2/4} & \text{if } \delta \le 1 \\ e^{-\mu\delta\ln\delta} & \text{if } \delta > 1 \end{cases}$$

2.4 Application of Tail Inequalities

We consider the use of tail inequalities on two problems.

2.4.1 Balls and Bins

Consider throwing $n$ balls independently and uniformly at random into $n$ bins. We are interested in the probability that bin 1 contains more than 4 balls. Define 0,1-valued random variables $X_i$, $1 \le i \le n$, as $X_i = 1$ if ball $i$ falls into bin 1 and 0 otherwise. By uniformity, $\Pr[X_i = 1] = \frac{1}{n}$. Define the random variable $X = X_1 + X_2 + \cdots + X_n$. Thus $X$ denotes the number of balls that fall into bin 1. By linearity of expectations, $E[X] = \sum_{i=1}^{n} E[X_i] = n \cdot \frac{1}{n} = 1$. Using the Markov inequality from Theorem 2.1, we get

$$\Pr[X \ge 4] \le \frac{1}{4} \qquad (10)$$

Before using the Chebychev inequality (Theorem 2.3), we first compute the standard deviation of $X$ as follows.

$$\mathrm{var}(X_i) = E[X_i^2] - E[X_i]^2 = \frac{1}{n} - \frac{1}{n^2} \qquad (11)$$

$$\mathrm{var}(X) = n \cdot \mathrm{var}(X_1) = n \left( \frac{1}{n} - \frac{1}{n^2} \right) = 1 - \frac{1}{n} \qquad (12)$$

where the first equality in (12) follows from the independence of the $X_i$'s. Hence,

$$\sigma_X = \sqrt{1 - \frac{1}{n}} \qquad (13)$$

Applying the Chebychev inequality to the random variable $X$,

$$\Pr[X \ge 4] = \Pr[X - 1 \ge 3] \le \Pr[|X - 1| \ge 3] \le \frac{1 - \frac{1}{n}}{9} \le \frac{1}{9} \qquad (14)$$

Using Chernoff bounds from Theorem 2.4,

$$\Pr[X \ge 4] = \Pr[X \ge (1+3) \cdot 1] \le e^{-1 \cdot 3 \ln 3} = \frac{1}{27} \qquad (15)$$

where we used the simpler form of the Chernoff bounds as stated in Theorem 2.5. Now, comparing equations (10), (14), and (15), we can see that using the Chebychev inequality gives a better bound than that of the Markov inequality, and a much better bound is obtained using Chernoff bounds.

As another example, let us consider the probability that bin 1 has more than $1 + 10\ln n$ balls. Using the Markov inequality, we get $\Pr[X \ge 1 + 10\ln n] \le \frac{1}{1 + 10\ln n}$. Using the Chebychev inequality, we get that $\Pr[X \ge 1 + 10\ln n] \le \Pr[|X - 1| \ge 10\ln n] \le \frac{1 - \frac{1}{n}}{100\ln^2 n} \le \frac{1}{100\ln^2 n}$. Whereas, using Chernoff bounds, we can get $\Pr[X \ge 1 + 10\ln n] \le e^{-10\ln n \cdot \ln(10\ln n)} = (10\ln n)^{-10\ln n} \le \frac{1}{n^{10}}$. With a tighter analysis using Chernoff bounds, it can be shown that, for the case of $n$ balls and $n$ bins, the probability that bin 1 has more than $c\frac{\ln n}{\ln\ln n}$ balls for some constant $c$ is very small, say $\frac{1}{n^2}$.
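The three bounds above can also be compared against a direct simulation. The Python sketch below is our own (the constants match the example in the text): it throws $n$ balls into $n$ bins and estimates $\Pr[X \ge 4]$ for bin 1.

    import random

    n, trials, hits = 100, 100000, 0
    for _ in range(trials):
        # count how many of the n balls land in bin 1; each lands there w.p. 1/n
        balls_in_bin1 = sum(1 for _ in range(n) if random.randrange(n) == 0)
        if balls_in_bin1 >= 4:
            hits += 1
    print("empirical Pr[X >= 4]:", hits / trials)    # about 0.02
    print("Markov 1/4 = 0.25, Chebychev 1/9 = 0.111, Chernoff 1/27 =", round(1 / 27, 4))

The empirical tail (roughly $0.02$) is below all three bounds, and the Chernoff bound is by far the closest, matching the comparison of (10), (14), and (15).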

2.4.2 $n \ln n$ balls and $n$ bins

In this scenario, we have $n \ln n$ balls and $n$ bins, where balls are thrown into the bins independently and uniformly at random. Define the random variables $X_i$, $1 \le i \le n\ln n$, and $X$ as above. We have $\Pr[X_i = 1] = 1/n$ and $E[X] = \ln n$. Let us estimate the probability that bin 1 has more than $10\ln n$ balls. Using the Markov inequality, we get that $\Pr[X \ge 10\ln n] \le 1/10$. Instead, using Chernoff bounds, we get $\Pr[X \ge 10\ln n] = \Pr[X \ge (1+9)\ln n] \le e^{-9\ln n \cdot \ln 9} \approx 1/n^{20}$, which is exponentially small. In general, it can be observed that if the expectation of the random variable is small, then we pay a higher penalty to derive a w.h.p. bound.

A Probability Theory

We start by defining probability and then introduce some well-known inequalities that we often use. Let $\Omega$ be an arbitrary set, called the sample space. We start by defining a $\sigma$-field, also sometimes called a $\sigma$-algebra.

Definition A.1 ($\sigma$-field) A collection $\mathcal{F}$ of subsets of $\Omega$ is called a $\sigma$-field if it satisfies:
1. $\Omega \in \mathcal{F}$,
2. $A \in \mathcal{F}$ implies $A^c \in \mathcal{F}$, and
3. For any countable sequence $A_1, A_2, \ldots$, if $A_1, A_2, \ldots \in \mathcal{F}$ then $A_1 \cup A_2 \cup \cdots \in \mathcal{F}$.

Definition A.2 A set function $\Pr$ on a $\sigma$-field $\mathcal{F}$ of subsets of $\Omega$ such that $\Pr : \mathcal{F} \to [0,1]$ is called a probability measure if it satisfies:
1. $0 \le \Pr(A) \le 1$ for all $A \in \mathcal{F}$,
2. $\Pr(\Omega) = 1$, and
3. If $A_1, A_2, \ldots$ is a disjoint sequence of sets in $\mathcal{F}$ then

$$\Pr\left( \bigcup_i A_i \right) = \sum_i \Pr(A_i)$$

The triad $(\Omega, \mathcal{F}, \Pr)$ is often called a probability space. For equivalent and alternate definitions, examples, and a more complete introduction, we refer the reader to standard textbooks on probability [?]. In the following, if no probability space is mentioned, then any space $(\Omega, \mathcal{F}, \Pr)$ can be taken.

We often use the following inequality, called Boole's inequality, which is part of the general Boole-Bonferroni inequalities [?]; it is also sometimes referred to as the union bound, as it provides a bound on the probability of a union of events. This inequality is also referred to as the (finite) sub-additivity property of the probability measure.

Proposition A.3 (Boole's inequality) For any arbitrary events $A_1, A_2, \ldots, A_n$,

$$\Pr\left( \bigcup_{i=1}^{n} A_i \right) \le \sum_{i=1}^{n} \Pr(A_i)$$

The notion of independence is an important concept in the study of probability.

Definition A.4 (Independence) A collection of events $\{A_i : i \in I\}$ is said to be independent if for all $S \subseteq I$, $\Pr\left( \bigcap_{i \in S} A_i \right) = \prod_{i \in S} \Pr(A_i)$.
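As a concrete illustration of Proposition A.3 and Definition A.4 (a sketch of our own; the event encoding is an assumption, not from the notes), consider two fair coin flips, with $A$ = "first flip is heads" and $B$ = "second flip is heads":

    import itertools

    omega = list(itertools.product("HT", repeat=2))  # sample space of 4 equally likely outcomes
    pr = lambda event: sum(1 for w in omega if event(w)) / len(omega)
    A = lambda w: w[0] == "H"
    B = lambda w: w[1] == "H"
    # Boole's inequality: Pr(A u B) = 0.75 <= Pr(A) + Pr(B) = 1.0
    print(pr(lambda w: A(w) or B(w)), "<=", pr(A) + pr(B))
    # Independence: Pr(A n B) = 0.25 = Pr(A) * Pr(B)
    print(pr(lambda w: A(w) and B(w)), "==", pr(A) * pr(B))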

We now define a random variable, which is any measurable function from $\Omega$ to $\mathbb{R}$. Let $\mathcal{R}$ denote the standard Borel $\sigma$-field associated with $\mathbb{R}$, which is the $\sigma$-field generated by left-open intervals of $\mathbb{R}$ [?].

Definition A.5 (Random Variable) Given a probability space $(\Omega, \mathcal{F}, \Pr)$, a mapping $X : \Omega \to \mathbb{R}$ is called a random variable if it satisfies the condition that $X^{-1}(R) \in \mathcal{F}$ for every $R \in \mathcal{R}$.

We represent by $\{X \le x\}$ the set $\{\omega \in \Omega \mid X(\omega) \le x\}$ for $x \in \mathbb{R}$, and also write $\Pr(X \le x)$ for the probability of the above event. A similar definition can be made for representing the set $\{\omega \in \Omega \mid X(\omega) = x\}$ as $\{X = x\}$. The notion of independence also extends to random variables. Two random variables $X$ and $Y$ are said to be independent if the events $\{X \le x\}$ and $\{Y \le y\}$ are independent for all $x, y \in \mathbb{R}$. The definition extends to multiple random variables just as in Definition A.4. Associated with any random variable is a distribution function, defined as follows.

Definition A.6 (Distribution function) The distribution function $F_X : \mathbb{R} \to [0,1]$ for a random variable $X$ is defined as $F_X(x) = \Pr(X \le x)$.

A random variable $X$ is said to be a discrete random variable if the range of $X$ is a finite or countably infinite subset of $\mathbb{R}$. For discrete random variables, the following definition can be provided for the density of a random variable.

Definition A.7 (Density) Given a random variable $X$, the density function $f_X : \mathbb{R} \to [0,1]$ of $X$ is defined as $f_X(x) = \Pr(X = x)$.

The above definition can also be extended to all types of random variables with proper care. In the rest of this section, we focus on discrete random variables only, and hence the definitions are made for the case of discrete random variables. With proper care, however, the definitions can be extended [?]. An important quantity of interest of a random variable is its expectation.

Definition A.8 (Expectation) Given a probability space $(\Omega, \mathcal{F}, \Pr)$ and a random variable $X$, the expectation of $X$, denoted $E[X]$, is defined as $E[X] = \sum_{x \in \mathbb{R}} x \Pr[X = x]$, with the convention that $0 \cdot \infty = \infty \cdot 0 = 0$.
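As a worked instance of Definition A.8 (our own example, not part of the notes), let $X$ be the outcome of one roll of a fair six-sided die, so $\Pr[X = x] = \frac{1}{6}$ for $x \in \{1, \ldots, 6\}$. Then

$$E[X] = \sum_{x=1}^{6} x \cdot \frac{1}{6} = \frac{21}{6} = 3.5,$$

and, using the identity from Section 2.2, $\mathrm{var}(X) = E[X^2] - E[X]^2 = \frac{91}{6} - (3.5)^2 = \frac{35}{12}$.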