Essentials on the Analysis of Randomized Algorithms

Dimitris Diochnos
February 2009

Abstract

These notes were written with Monte Carlo algorithms primarily in mind. Topics covered are basic (discrete) random variables, techniques for approximating and bounding combinations and probabilities (with emphasis on the Hoeffding bound), the central limit theorem, the weak and strong laws of large numbers, and fundamental problems that show these techniques in action. Basic definitions on Markov chains are also presented. In the appendix we find the basic randomized algorithmic schemes, as well as an overview of the complexity classes where these algorithms fall. Most definitions and results are drawn from [BT02].

1 Basics

Definition 1.1 (Probability Mass Function (PMF)). The PMF $p_X$ of a discrete random variable $X$ is a function that describes the probability mass of each (discrete) value $x$ that $X$ can take; i.e. $p_X(x) = \Pr[X = x]$.

1.1 (Discrete) Random Variables

Definition 1.2 (Bernoulli Random Variable). $X$ is a Bernoulli random variable if it takes the two values 0 and 1 depending on the outcome of a random process (e.g. tossing a coin once). Its PMF is:
$$p_X(x) = \begin{cases} p, & \text{if } x = 1, \\ 1 - p, & \text{if } x = 0. \end{cases}$$
The expected value of $X$ is $E[X] = p$, while the variance is $Var[X] = p(1-p)$.

Definition 1.3 (Binomial Random Variable). $Y$ is a Binomial random variable with parameters $N$ and $p$ if it is constructed from independent Bernoulli random variables $X_1, \ldots, X_N$, each of which is 1 with probability $p$. It is defined as the sum $Y = \sum_{i=1}^{N} X_i$. Its PMF is:
$$p_Y(k) = \Pr[Y = k] = \binom{N}{k} p^k (1-p)^{N-k}, \qquad k = 0, 1, \ldots, N.$$
The expected value of $Y$ is $E[Y] = Np$, while the variance is $Var[Y] = Np(1-p)$.

Definition 1.4 (Geometric Random Variable). Given a sequence of Bernoulli random variables $X_1, X_2, \ldots$, each of which is 1 with probability $p$, $Z$ is a Geometric random variable expressing the minimum $i$ such that $X_i = 1$. Its PMF is:
$$p_Z(k) = (1-p)^{k-1} p, \qquad k = 1, 2, \ldots$$
The expected value of $Z$ is $E[Z] = 1/p$, while the variance is $Var[Z] = (1-p)/p^2$.

Definition 1.5 (Poisson Random Variable). $S$ is a Poisson random variable with parameter $\lambda$ and PMF given by:
$$p_S(k) = e^{-\lambda} \frac{\lambda^k}{k!}, \qquad k = 0, 1, \ldots$$
The expected value of $S$ is $E[S] = \lambda$, and the variance is also $Var[S] = \lambda$.

Note that each PMF sums to 1: $\sum_{k=0}^{N} p_Y(k) = \sum_{k=1}^{\infty} p_Z(k) = \sum_{k=0}^{\infty} p_S(k) = 1$.
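To make the definitions above concrete, here is a minimal Python sketch (an added example, not part of the original notes) that samples a Geometric random variable by repeated Bernoulli trials and checks $E[Z] = 1/p$ and $Var[Z] = (1-p)/p^2$ empirically. The parameter $p = 0.3$ and the trial count are arbitrary illustrative choices.

```python
import random

def bernoulli(p):
    """One Bernoulli trial (Definition 1.2): 1 with probability p, else 0."""
    return 1 if random.random() < p else 0

def geometric(p):
    """Index of the first success in a Bernoulli sequence (Definition 1.4)."""
    k = 1
    while bernoulli(p) == 0:
        k += 1
    return k

trials, p = 100_000, 0.3  # illustrative choices
samples = [geometric(p) for _ in range(trials)]
mean = sum(samples) / trials
var = sum((s - mean) ** 2 for s in samples) / trials
print(f"E[Z]:   empirical {mean:.3f}   theory {1 / p:.3f}")          # ~3.333
print(f"Var[Z]: empirical {var:.3f}   theory {(1 - p) / p**2:.3f}")  # ~7.778
```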

1.2 Bernoulli process

Informally, a Bernoulli process is a sequence of independent coin tosses.

Definition 1.6 (Bernoulli process). A Bernoulli process is a sequence $X_1, X_2, \ldots$ of independent Bernoulli random variables $X_i$ such that for every $i$ it holds:
$$\Pr[X_i = 1] = \Pr[\text{success at the } i\text{th trial}] = p, \qquad \Pr[X_i = 0] = \Pr[\text{failure at the } i\text{th trial}] = 1 - p.$$

2 Approximating and Bounding

In this section we explore important tools for approximating and bounding probabilities.

2.1 The Cauchy-Schwarz Inequality

$$\left( \sum_{i=1}^{n} x_i y_i \right)^2 \le \left( \sum_{i=1}^{n} x_i^2 \right) \left( \sum_{i=1}^{n} y_i^2 \right) \qquad (1)$$

2.2 Bounding Combinations

Let $1 < k < n$, with $k, n \in \mathbb{N}$. Then:
$$\left( \frac{n}{k} \right)^k < \binom{n}{k} < \left( \frac{ne}{k} \right)^k \qquad \text{and} \qquad \left( \frac{n}{e} \right)^n < n! \qquad (2)$$

2.3 Common Approximations

Exponential: $(1 - x) \le e^{-x}$.

Poisson: The Poisson PMF with parameter $\lambda$ is a good approximation for a binomial PMF with parameters $N$ and $p$, provided that $\lambda = Np$, $N$ is very large, and $p$ is very small.

2.4 Bounding Probabilities

Union Bound: Let $A_1, A_2, \ldots, A_N$ be events in a probability space. Then
$$\Pr\left[ \bigcup_{i=1}^{N} A_i \right] \le \sum_{i=1}^{N} \Pr[A_i] \le N \max_i \{\Pr[A_i]\} \qquad (3)$$
The first inequality holds with equality for disjoint events $A_i$.

Markov's Inequality: Any non-negative random variable $X$ satisfies:
$$\Pr[X \ge \alpha] \le \frac{E[X]}{\alpha}, \qquad \forall \alpha > 0. \qquad (4)$$

Chebyshev's Inequality: Let $X$ be a random variable with expected value $\mu$ and variance $\sigma^2$. Then:
$$\Pr[|X - \mu| \ge \alpha] \le \frac{\sigma^2}{\alpha^2}, \qquad \forall \alpha > 0. \qquad (5)$$

Remark 2.1 (Chebyshev vs. Markov). The Chebyshev inequality tends to give better bounds than the Markov inequality, because it also uses information on the variance of $X$.
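As a quick illustration of Remark 2.1, the following Python snippet (an added example, not from the notes) evaluates both bounds on the tail $\Pr[X \ge 75]$ for $X \sim \text{Binomial}(100, 0.5)$; the choices 100, 0.5 and 75 are arbitrary.

```python
# Compare Markov and Chebyshev bounds on Pr[X >= 75] for X ~ Binomial(100, 0.5).
n, p, a = 100, 0.5, 75
mu = n * p             # E[X] = 50
var = n * p * (1 - p)  # Var[X] = 25
markov = mu / a                  # Pr[X >= a] <= E[X]/a                        (eq. (4))
chebyshev = var / (a - mu) ** 2  # Pr[|X - mu| >= a - mu] <= sigma^2/(a-mu)^2  (eq. (5))
print(f"Markov:    {markov:.4f}")     # 0.6667
print(f"Chebyshev: {chebyshev:.4f}")  # 0.0400 -- far tighter here
```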

Theorem 2.2 (Weak Law of Large Numbers). Let $X_1, \ldots, X_N$ be a sequence of independent identically distributed random variables with expected value $\mu$. For every $\epsilon > 0$:
$$\Pr\left[ \left| \frac{1}{N} \sum_{i=1}^{N} X_i - \mu \right| \ge \epsilon \right] \to 0, \qquad \text{as } N \to \infty \qquad (6)$$

Proof. Let $X_1, \ldots, X_N$ be a sequence of independent identically distributed random variables with expected value $\mu$ and variance $\sigma^2$. Define the random variable $Y = \frac{1}{N} \sum_{i=1}^{N} X_i$. By linearity of expectation we get $E[Y] = \frac{1}{N} \sum_{i=1}^{N} E[X_i] = \mu$. Since all the $X_i$ are independent, the variance is $Var[Y] = \frac{1}{N^2} \sum_{i=1}^{N} Var[X_i] = \frac{\sigma^2}{N}$. We now apply Chebyshev's inequality and obtain $\Pr[|Y - \mu| \ge \epsilon] \le \frac{\sigma^2}{N\epsilon^2}$, for any $\epsilon > 0$; the bound vanishes as $N \to \infty$.

2.4.1 Concentration and Tail Inequalities

Proposition 2.3 (Chernoff - Hoeffding Bound [Hoe63]). Let $X_1, \ldots, X_N$ be independent random variables, each taking values in the range $I = [\alpha, \beta]$, and let $\mu$ denote the mean of their expectations. Then:
$$\Pr\left[ \left| \frac{1}{N} \sum_{i=1}^{N} X_i - \mu \right| \ge \epsilon \right] \le e^{-2N\epsilon^2 / (\beta - \alpha)^2}. \qquad (7)$$

Assuming we want to bound the quantity above by $\delta$, it is enough that $N \ge \frac{(\beta - \alpha)^2}{2\epsilon^2} \ln \frac{1}{\delta}$. (Equivalently, for fixed confidence $\delta$ the achievable accuracy is $\epsilon = \Omega(1/\sqrt{N})$.) Some typical bounds obtained by the inequality are shown below.

Figure 1: Typical lower bounds on $N$ when $|I| = \beta - \alpha = 1$.
(a) $\epsilon = \frac{1}{2} \cdot 10^{-2} = 0.005$:
δ | 0.1    | 0.01   | 0.001
N | 46,052 | 92,104 | 138,156
(b) $\epsilon = \frac{1}{2} \cdot 10^{-3} = 0.0005$:
δ | 0.1       | 0.01      | 0.001
N | 4,605,171 | 9,210,341 | 13,815,511

Figure 2: Typical lower bounds on $N$ when $|I| = \beta - \alpha = 2$.
(a) $\epsilon = \frac{1}{2} \cdot 10^{-2} = 0.005$:
δ | 0.1     | 0.01    | 0.001
N | 184,207 | 368,414 | 552,621
(b) $\epsilon = \frac{1}{2} \cdot 10^{-3} = 0.0005$:
δ | 0.1        | 0.01       | 0.001
N | 18,420,681 | 36,841,362 | 55,262,043

Definition 2.4 (Martingale [AS08]). A martingale is a sequence $X_0, \ldots, X_N$ of random variables so that for $0 \le i < N$ it holds:
$$E[X_{i+1} \mid X_i, X_{i-1}, \ldots, X_0] = X_i.$$
(Think of fair gambling: the expected fortune after the next bet equals the current fortune.)

Proposition 2.5 (Azuma's Inequality [Mau79, AS08]). Let $X_0 = c, X_1, \ldots, X_N$ be a martingale with $|X_{i+1} - X_i| \le 1$ for all $0 \le i < N$. (Azuma's inequality is usually stated with $c = 0$.) Then:
$$\Pr[|X_N - c| > \lambda \sqrt{N}] < 2 e^{-\lambda^2/2} \qquad (8)$$
For $\lambda = \sqrt{2 \ln 2} \approx 1.1774$ the bound becomes trivial, as $2 e^{-\lambda^2/2} = 1$. Check Table 1 for some typical approximate values.

Table 1: Typical bounds obtained for some $\lambda$ by Azuma's inequality.
λ                   | 2       | 2.5     | 3
$2e^{-\lambda^2/2}$ | 0.27067 | 0.08787 | 0.02222
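The table entries above can be reproduced from the sample-size condition $N \ge \frac{(\beta - \alpha)^2}{2\epsilon^2} \ln \frac{1}{\delta}$. Here is a small Python sketch that does so (added for illustration; the helper name hoeffding_N is ours, not from the notes).

```python
import math

def hoeffding_N(eps, delta, width=1.0):
    """Smallest N with exp(-2 * N * eps**2 / width**2) <= delta  (eq. (7))."""
    return math.ceil(width ** 2 / (2 * eps ** 2) * math.log(1 / delta))

for width in (1.0, 2.0):         # Figures 1 and 2
    for eps in (0.005, 0.0005):  # sub-tables (a) and (b)
        row = [hoeffding_N(eps, d, width) for d in (0.1, 0.01, 0.001)]
        print(width, eps, row)
# width=1.0, eps=0.005 prints [46052, 92104, 138156], matching Figure 1(a).
```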

2.4.2 Lower bounds on Tails [AS08, appendix]

In section 2.4.1 we obtained upper bounds on $\Pr[X > \alpha]$ which were of the form $e^{-c\epsilon^2}$. We can also obtain lower bounds on $\Pr[X > \alpha]$; typically we get $\Pr[X > \alpha] = \Omega(e^{-c\epsilon^2 - d\epsilon})$.

3 Fundamental Problems

3.1 Coins

A coin has two sides, H and T. Set $\Pr[H] = p$ and $\Pr[T] = 1 - p$, where $p$ is a fixed number.

3.1.1 Games with coins

How many heads: Tossing a coin $N$ times and recording the number of times H appeared gives a Binomial random variable.

First H: Tossing a coin until H comes up gives a Geometric random variable.

Both H and T: The weighted sum $p(1 + 1/(1-p)) + (1-p)(1 + 1/p) = 1/p + p/(1-p)$ expresses the expected number of coin tosses needed to observe both H and T.

Fair coin: After $N$ tosses of a fair coin, we observe $\#H - \#T \ge \lambda\sqrt{N}$ or $\#H - \#T \le -\lambda\sqrt{N}$ with probability bounded by Azuma's inequality; eq. (8).

3.2 The Coupon Collector's Problem

Given $N$ coupons, what is the expected number of trials needed to observe all of them, when we draw coupons with replacement? (This problem is also known as the Balls in Bins problem.) Let $T$ be the total time to observe all coupons, and let $t_i$ denote the time needed to collect the $i$th new coupon after $i - 1$ distinct coupons have been collected; i.e. $T = \sum_{i=1}^{N} t_i$. Note that each $t_i$ is a geometric random variable with success probability $p_i = (N - (i-1))/N$. By linearity of expectation we get:
$$E[T] = \sum_{i=1}^{N} E[t_i] = \sum_{i=1}^{N} \frac{1}{p_i} = \sum_{i=1}^{N} \frac{N}{N - i + 1} = N \sum_{i=1}^{N} \frac{1}{i} = N H_N, \qquad (9)$$
where $H_N$ is the $N$th harmonic number. For large $N$ we get:
$$E[T] = N H_N = N \left( \ln N + \gamma + \frac{1}{2N} - \frac{1}{12N^2} + \frac{1}{120N^4} + o(1/N^4) \right), \qquad (10)$$
where $\gamma = \lim_{N \to \infty} (H_N - \ln N) = 0.5772156649\ldots$ is the Euler-Mascheroni constant [Wei]. Hence, $E[T] = O(N \ln N)$. All the $t_i$ are independent, so the variance is:
$$Var[T] = \sum_{i=1}^{N} Var[t_i] = \sum_{i=1}^{N} \frac{1 - p_i}{p_i^2} < \sum_{i=1}^{N} \frac{1}{p_i^2} = \sum_{i=1}^{N} \frac{N^2}{(N - i + 1)^2} = N^2 \sum_{i=1}^{N} \frac{1}{i^2} < N^2 \frac{\pi^2}{6} < 2N^2. \qquad (11)$$

Applying Chebyshev's inequality (5) to equations (9) and (11) we get
$$\Pr[|T - N H_N| \ge \lambda N] < \frac{2}{\lambda^2}. \qquad (12)$$
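A short simulation (an illustrative addition, assuming nothing beyond the definitions above) estimates $E[T]$ empirically and compares it against $N H_N$ from eq. (9); $N = 50$ and the number of runs are arbitrary choices.

```python
import random

def coupon_collector(n):
    """Draw coupons uniformly with replacement until all n have been seen."""
    seen, draws = set(), 0
    while len(seen) < n:
        seen.add(random.randrange(n))
        draws += 1
    return draws

n, runs = 50, 2000
avg = sum(coupon_collector(n) for _ in range(runs)) / runs
H_n = sum(1 / i for i in range(1, n + 1))
print(f"empirical E[T] = {avg:.1f},  N * H_N = {n * H_n:.1f}")  # both ~225
```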

3.2.1 Generalized Coupon Collector's Problem

Again we have $N$ coupons, but this time we are interested in a subset $K$ of them ($|K| = k < N$). What we want is the minimum number of trials so that the probability that some member of $K$ is missed is less than some predefined value $\eta$. Let $T$ be the number of trials. A given member of $K$ is missed in all $T$ trials with probability $(1 - 1/N)^T$. By the Union Bound (eq. (3)) the probability that some member of $K$ is missed is at most $k(1 - 1/N)^T$. By the exponential approximation this is at most $k e^{-T/N}$. Requiring this probability to be less than $\eta$ we get:
$$T > N \ln(k/\eta). \qquad (13)$$

4 Central Limit Theorem

Theorem 4.1 (Central Limit Theorem). Let $X_1, X_2, \ldots, X_N$ be a sequence of independent identically distributed random variables with common mean $\mu$ and variance $\sigma^2$, and define
$$Z_N = \frac{\sum_{i=1}^{N} X_i - N\mu}{\sigma \sqrt{N}}. \qquad (14)$$
Note that $E[Z_N] = 0$ and $Var[Z_N] = 1$. Then, the CDF of $Z_N$ converges to the standard normal CDF
$$\Phi(z) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{z} e^{-x^2/2} \, dx, \qquad (15)$$
in the sense that
$$\lim_{N \to \infty} \Pr[Z_N \le z] = \Phi(z), \qquad \text{for all } z. \qquad (16)$$

Proposition 4.2 (De Moivre - Laplace Approximation to the Binomial). If $S_N$ is a binomial random variable with parameters $p$ and $N$, $N$ is large, and $\kappa, \lambda \in \{0, 1, \ldots, N\}$, then
$$\Pr[\kappa \le S_N \le \lambda] \approx \Phi\left( \frac{\lambda + 1/2 - Np}{\sqrt{Np(1-p)}} \right) - \Phi\left( \frac{\kappa - 1/2 - Np}{\sqrt{Np(1-p)}} \right) \qquad (17)$$

Remark 4.3 (Quality). The closer $p$ is to 0 or 1, the larger the $N$ required for the approximation to be good. When $p \approx 0.5$, an $N$ around 40 to 50 already gives very good results.

Theorem 4.4 (Strong Law of Large Numbers). Let $X_1, X_2, \ldots, X_N$ be a sequence of independent identically distributed random variables with mean $\mu$. Then, the sequence of sample means $M_N = \frac{1}{N} \sum_{i=1}^{N} X_i$ converges to $\mu$ with probability 1, in the sense that
$$\Pr\left[ \lim_{N \to \infty} \frac{1}{N} \sum_{i=1}^{N} X_i = \mu \right] = 1. \qquad (18)$$

4.1 Applications

Example 1 (Coins Revisited). We toss a coin 25 times and 20 times we observe H. What is the probability of this event, given that the coin is fair? A direct computation yields $\binom{25}{20} 2^{-25} \approx 0.0015834$. Chebyshev yields $\Pr[|X - 12.5| \ge 7.5] \le 6.25/7.5^2 = 1/9 \approx 0.111$. The Hoeffding bound gives $\Pr[|M_{25} - 0.5| \ge 0.3] \le e^{-2 \cdot 25 \cdot 0.09} = e^{-4.5} \approx 0.0111$. Azuma's inequality gives double of what the Hoeffding bound gives. The Central Limit Theorem gives $\Pr[S_N \le c] \approx \Phi\left( \frac{c - Np}{\sigma} \right)$, so $\Pr[S_{25} \le 19] \approx \Phi\left( \frac{19 - 12.5}{\sqrt{6.25}} \right) = \Phi(6.5/2.5) = \Phi(2.6) = 0.9953$. So, the requested probability is less than $1 - 0.9953 = 0.0047$. Using the De Moivre - Laplace approximation we can compute directly $\Pr[S_{25} = 20] \approx \Phi\left( \frac{20.5 - 12.5}{2.5} \right) - \Phi\left( \frac{19.5 - 12.5}{2.5} \right) = \Phi(3.2) - \Phi(2.8) = 0.9993 - 0.9974 = 0.0019$.

Example 2 (Lower Bound on Iterations). Assume we have a biased coin which gives rise to H with probability $p$, and we want to estimate this value within 0.01 with probability at least 0.9. The Chebyshev inequality gives $\Pr[|M_N - p| \ge 0.01] \le \frac{p(1-p)}{N(0.01)^2} \le \frac{10^4}{4N}$, and we want that bounded by $1 - 0.9 = 0.1$. This gives $N = 25{,}000$. The Hoeffding bound gives $N = 11{,}513$. In the case of the Central Limit Theorem we observe that the variance of $M_N - p$ is $\sigma^2 = p(1-p)/N \le 1/(4N)$. Hence, $z = \epsilon/(1/(2\sqrt{N})) = 2\epsilon\sqrt{N}$. So we get: $\Pr[|M_N - p| \ge 0.01] \approx 2\Pr[M_N - p \ge 0.01] \approx 2(1 - \Phi(2 \cdot 0.01 \cdot \sqrt{N})) \le 0.1$. This implies $\Phi(0.02\sqrt{N}) \ge 0.95 = \Phi(1.645) \Rightarrow 0.02\sqrt{N} \ge 1.645 \Rightarrow N \ge (82.25)^2 \Rightarrow N = 6{,}766$.
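The figures in Example 1 can be checked mechanically. The following Python sketch (an added example, not from the notes) computes the exact binomial probabilities alongside the Chebyshev and Hoeffding bounds from eqs. (5) and (7).

```python
import math

# Exact probabilities for 20 heads in 25 fair tosses, vs the bounds in Example 1.
n, k = 25, 20
point = math.comb(n, k) / 2 ** n                               # Pr[S = 20]
tail = sum(math.comb(n, j) for j in range(k, n + 1)) / 2 ** n  # Pr[S >= 20]
chebyshev = 6.25 / 7.5 ** 2               # eq. (5): sigma^2 = 6.25, alpha = 7.5
hoeffding = math.exp(-2 * n * 0.3 ** 2)   # eq. (7): eps = 0.3, width 1
print(f"Pr[S = 20] exact:  {point:.6f}")      # ~0.001583
print(f"Pr[S >= 20] exact: {tail:.6f}")       # ~0.002039
print(f"Chebyshev bound:   {chebyshev:.4f}")  # 0.1111
print(f"Hoeffding bound:   {hoeffding:.4f}")  # 0.0111
```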

5 Markov Chains [MR95]

Definition 5.1 (Markov Chain). A Markov Chain M is a discrete-time stochastic process defined over a set of states $S$, in terms of a $|S| \times |S|$ matrix $P$ of transition probabilities. The set $S$ is either finite or countably infinite. M is in only one state at a time. State transitions occur at time-steps $t = 1, 2, \ldots$. The entry $P_{ij}$ denotes the probability that the next state will be $j$, given that M is currently at state $i$. Note that $P_{ij} \in [0,1]$ for all $i, j \in S$, and $\sum_j P_{ij} = 1$.

Remark 5.2 (Memorylessness Property). The next state of M depends only on its current state.

5.1 Notation and Conventions

Definition 5.3 (t-step transition probability). $P^{(t)}_{ij} = \Pr[X_t = j \mid X_0 = i]$.

Definition 5.4 (First transition into state j at time t). Denoted $r^{(t)}_{ij}$ and given by $r^{(t)}_{ij} = \Pr[X_t = j \text{ and } X_s \ne j \text{ for } 1 \le s < t \mid X_0 = i]$.

Definition 5.5 (Transition into state j at some time t > 0). Denoted $f_{ij}$ and given by $f_{ij} = \sum_{t > 0} r^{(t)}_{ij}$.

Definition 5.6 (Expected number of steps to reach j starting from i). Denoted $h_{ij}$ and given by
$$h_{ij} = \begin{cases} \sum_{t > 0} t \, r^{(t)}_{ij}, & \text{if } f_{ij} = 1, \\ \infty, & \text{otherwise.} \end{cases}$$

5.2 Definitions and a theorem

The states of M can be classified as:

Transient: $f_{ii} < 1$ (which implies $h_{ii} = \infty$).

Persistent: $f_{ii} = 1$. These can be further classified as:
- Null persistent: $h_{ii} = \infty$.
- Non-null persistent: $h_{ii} < \infty$.
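One way to see the stationary behavior described below (Theorem 5.14) in action is to iterate $q^{(t+1)} = q^{(t)} P$ and watch $q^{(t)}$ converge to $\pi$. The sketch here (added for illustration) does this for a two-state ergodic chain; the matrix P is a made-up example, not one from the notes.

```python
# Power iteration q^(t+1) = q^(t) P for a small ergodic chain; q^(t) converges
# to the stationary distribution pi = pi P.  The 2-state transition matrix
# below is an arbitrary illustrative example.
P = [[0.9, 0.1],
     [0.4, 0.6]]

q = [1.0, 0.0]  # start deterministically in state 1
for _ in range(100):
    q = [sum(q[i] * P[i][j] for i in range(len(P))) for j in range(len(P))]

print(q)  # ~[0.8, 0.2]; check: pi = (0.8, 0.2) satisfies pi = pi P
```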

Definition 5.7 (Strong component C). A strong component of a directed graph $G$ is a maximal subgraph of $G$ such that there exists a path from $i$ to $j$ and back to $i$ for every pair of vertices $i, j \in C$. A strong component is final if there is no edge by which we can leave it.

Definition 5.8 (Irreducible Markov chain). One whose underlying graph $G$ consists of a single strong component.

Definition 5.9 (State probability vector). The row vector $q^{(t)} = (q^{(t)}_1, q^{(t)}_2, \ldots, q^{(t)}_n)$, whose $i$th entry is the probability that M is in state $i$ at time $t$.

Definition 5.10 (Stationary distribution). A stationary distribution for M with transition matrix $P$ is a probability distribution $\pi$ such that $\pi = \pi P$.

Definition 5.11 (Periodicity T of a state i). The periodicity of a state $i$ is the maximum $T$ for which there exists an integer $a$ such that any visit to state $i$ occurs at a time of the form $a + Tk$ for some $k \ge 0$. A state is periodic if $T > 1$, and aperiodic otherwise. A Markov Chain M is aperiodic if every state is aperiodic.

Definition 5.12 (Ergodic state). One that is aperiodic and non-null persistent.

Definition 5.13 (Ergodic Markov Chain). One in which all states are ergodic.

Theorem 5.14 (Fundamental Theorem of Markov Chains). Any irreducible, finite, and aperiodic Markov Chain M has the following properties:
1. All states are ergodic.
2. There is a unique stationary distribution $\pi$, such that $\pi_i > 0$ for all $i \in \{1, \ldots, n\}$.
3. For all $i \in \{1, \ldots, n\}$: $f_{ii} = 1$ and $h_{ii} = 1/\pi_i$.
4. Let $N(i, t)$ be the number of times M visits state $i$ in $t$ steps. Then
$$\lim_{t \to \infty} \frac{N(i, t)}{t} = \pi_i.$$

6 A glimpse beyond

Random walks on graphs and expanders, machine learning, random number generators, parallel computation, the probabilistic method.

References

[AS08] Noga Alon and Joel Spencer. The Probabilistic Method. John Wiley, 2008.

[BT02] Dimitri P. Bertsekas and John N. Tsitsiklis. Introduction to Probability. Athena Scientific, 2002.

[Hoe63] Wassily Hoeffding. Probability inequalities for sums of bounded random variables. Journal of the American Statistical Association, 58(301):13-30, March 1963.

[Mau79] B. Maurey. Construction de suites symétriques. C. R. Acad. Sci. Paris, 288:679-681, 1979.

[MR95] Rajeev Motwani and Prabhakar Raghavan. Randomized Algorithms. Cambridge University Press, 1995. 8th printing, 2006.

[Wei] Eric W. Weisstein. Euler-Mascheroni constant. Wolfram MathWorld. http://mathworld.wolfram.com/euler-mascheroniconstant.html.

A Basic Randomized Algorithmic Schemes

Definition A.1 (Monte Carlo). Monte Carlo (MC) algorithms exploit randomness in order to solve problems. The idea is that successive iterations of the core loop of the algorithm give results which are independent of the previous runs. They can be classified as having one-sided or two-sided error. For example, assume you have an algorithm A that decides whether $x$ belongs in a language $L$, so that the answer we get is:
$$x \in L \Rightarrow \Pr[A(x) \text{ accepts}] \ge p, \qquad x \notin L \Rightarrow \Pr[A(x) \text{ accepts}] = 0.$$
This is an example of a one-sided error algorithm. A two-sided error algorithm arises if the probability of accepting an input $x$, when in fact $x \notin L$, is non-zero.

Definition A.2 (Las Vegas). Las Vegas algorithms are Monte Carlo algorithms which never make a mistake on the result. An example of such an algorithm is randomized quicksort (RandQS). Note that the running time of a Las Vegas algorithm is a random variable.

B Complexity Classes on Randomized Algorithms

Definition B.1 (Class RP). RP (Randomized Polynomial time) algorithms are one-sided error Monte Carlo algorithms that can err only when $x \in L$. Usually $p = 1/2$, but the choice is arbitrary.

Definition B.2 (Class ZPP). ZPP (Zero-error Probabilistic Polynomial time) algorithms are algorithms that belong in RP ∩ co-RP. Note that Las Vegas algorithms (e.g. RandQS) belong in this class.

Definition B.3 (Class BPP). BPP (Bounded-error Probabilistic Polynomial time) algorithms are two-sided error Monte Carlo algorithms of the following form:
$$x \in L \Rightarrow \Pr[A(x) \text{ accepts}] \ge \frac{3}{4}, \qquad x \notin L \Rightarrow \Pr[A(x) \text{ accepts}] \le \frac{1}{4}.$$

Definition B.4 (Class RLP). Class RLP is the subclass of RP in which the algorithms use $O(\lg n)$ workspace in the worst case.

B.1 Categorizing Randomized Complexity Classes

1. P ⊆ RP ⊆ NP.
2. RP ⊆ BPP ⊆ PP.
3. NP ⊆ PP ⊆ PSPACE.
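As an illustration of Definition A.2, here is a minimal Python version of randomized quicksort (RandQS); this is a sketch for intuition under the definitions above, not the textbook presentation of [MR95].

```python
import random

def rand_qs(xs):
    """Randomized quicksort: a Las Vegas algorithm (Definition A.2).
    The output is always correctly sorted; only the running time is random."""
    if len(xs) <= 1:
        return xs
    pivot = random.choice(xs)  # the only random choice
    less = [x for x in xs if x < pivot]
    equal = [x for x in xs if x == pivot]
    greater = [x for x in xs if x > pivot]
    return rand_qs(less) + equal + rand_qs(greater)

data = [5, 3, 8, 1, 9, 2]
assert rand_qs(data) == sorted(data)  # correct on every run, regardless of coin flips
```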