Lecture 5: Two-point Sampling


Randomized Algorithms, Lecture 5: Two-point Sampling
Sotiris Nikoletseas, Professor
CEID - ETY Course, 2017-2018

Overview
A. Pairwise independence of random variables
B. The pairwise independent sampling theorem
C. Probability amplification via reduced randomness

A. On the Additivity of Variance
- In general, the variance of a sum of random variables is not equal to the sum of their variances.
- However, variances do add for independent (i.e. mutually independent) variables.
- In fact, mutual independence is not necessary: pairwise independence suffices.
- This is very useful, since in many situations the random variables involved are pairwise independent but not mutually independent.

Conditional distributions
Let X, Y be discrete random variables. Their joint probability density function is
f(x, y) = Pr{(X = x) ∧ (Y = y)}
Clearly f_1(x) = Pr{X = x} = Σ_y f(x, y) and f_2(y) = Pr{Y = y} = Σ_x f(x, y).
Also, the conditional probability density function is:
f(x | y) = Pr{X = x | Y = y} = Pr{(X = x) ∧ (Y = y)} / Pr{Y = y} = f(x, y) / f_2(y) = f(x, y) / Σ_x f(x, y)
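
As a small illustration (not part of the original slides), the following Python sketch computes the marginals f_1, f_2 and the conditional f(x | y) from a joint pmf; the particular joint distribution used here is an arbitrary example.

```python
# Minimal sketch (hypothetical joint pmf chosen for illustration):
# compute marginals f1, f2 and the conditional f(x | y) from f(x, y).
from collections import defaultdict

f = {(0, 0): 0.1, (0, 1): 0.2, (1, 0): 0.3, (1, 1): 0.4}  # joint pmf f(x, y)

f1 = defaultdict(float)  # f1(x) = sum_y f(x, y)
f2 = defaultdict(float)  # f2(y) = sum_x f(x, y)
for (x, y), p in f.items():
    f1[x] += p
    f2[y] += p

def conditional(x, y):
    """f(x | y) = f(x, y) / f2(y)."""
    return f[(x, y)] / f2[y]

print(dict(f1), dict(f2))
print(conditional(0, 1))  # Pr{X = 0 | Y = 1} = 0.2 / 0.6 = 1/3
```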

Pairwise independence
Let X_1, X_2, ..., X_n be random variables. They are called pairwise independent iff for all i ≠ j:
Pr{X_i = x | X_j = y} = Pr{X_i = x}, for all x, y
Equivalently,
Pr{(X_i = x) ∧ (X_j = y)} = Pr{X_i = x} · Pr{X_j = y}, for all x, y
Generalizing, the collection is k-wise independent iff, for every subset I ⊆ {1, 2, ..., n} with |I| < k, for every set of values {a_i}, every value b and every j ∉ I, it is
Pr{X_j = b | ∧_{i∈I} (X_i = a_i)} = Pr{X_j = b}

Mutual (or "full") independence
The random variables X_1, X_2, ..., X_n are mutually independent iff for any subset X_{i_1}, X_{i_2}, ..., X_{i_k} (2 ≤ k ≤ n) of them, it is
Pr{(X_{i_1} = x_1) ∧ (X_{i_2} = x_2) ∧ ... ∧ (X_{i_k} = x_k)} = Pr{X_{i_1} = x_1} · Pr{X_{i_2} = x_2} · ... · Pr{X_{i_k} = x_k}
Example (for n = 3). Let A_1, A_2, A_3 be three events. They are mutually independent iff all four equalities hold:
Pr{A_1 ∧ A_2} = Pr{A_1} · Pr{A_2}   (1)
Pr{A_2 ∧ A_3} = Pr{A_2} · Pr{A_3}   (2)
Pr{A_1 ∧ A_3} = Pr{A_1} · Pr{A_3}   (3)
Pr{A_1 ∧ A_2 ∧ A_3} = Pr{A_1} · Pr{A_2} · Pr{A_3}   (4)
They are called pairwise independent if (1), (2), (3) hold.

Mutual vs pairwise independence
Important notice: pairwise independence does not imply mutual independence in general.
Example. Consider the probability space consisting of all permutations of a, b, c together with aaa, bbb, ccc (all 9 points equiprobable). Let A_k = "at place k there is an a" (for k = 1, 2, 3). It is
Pr{A_1} = Pr{A_2} = Pr{A_3} = (2 + 1)/9 = 1/3
Also Pr{A_1 ∧ A_2} = Pr{A_2 ∧ A_3} = Pr{A_1 ∧ A_3} = 1/9 = (1/3) · (1/3),
thus A_1, A_2, A_3 are pairwise independent.
But Pr{A_1 ∧ A_2 ∧ A_3} = 1/9 ≠ Pr{A_1} · Pr{A_2} · Pr{A_3} = 1/27,
thus the events are not mutually independent.
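
The 9-point example above is small enough to check exhaustively; the sketch below (Python, written for this transcription) enumerates the sample space and confirms that A_1, A_2, A_3 are pairwise but not mutually independent.

```python
from itertools import permutations

# Sample space: the 6 permutations of "abc" plus "aaa", "bbb", "ccc" (9 equiprobable points).
space = ["".join(p) for p in permutations("abc")] + ["aaa", "bbb", "ccc"]

def pr(event):
    """Probability of an event (a predicate on sample points) under the uniform measure."""
    return sum(1 for w in space if event(w)) / len(space)

A = [lambda w, k=k: w[k] == "a" for k in range(3)]  # A_k: an 'a' at place k

print([pr(Ak) for Ak in A])                                # each 1/3
print(pr(lambda w: A[0](w) and A[1](w)), 1 / 9)            # 1/9 = (1/3)(1/3): pairwise independent
print(pr(lambda w: all(Ak(w) for Ak in A)), (1 / 3) ** 3)  # 1/9 != 1/27: not mutually independent
```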

Variance: key features
Definition: Var(X) = E[(X − µ)²] = Σ_x (x − µ)² Pr{X = x}, where µ = E[X] = Σ_x x Pr{X = x}
We call standard deviation of X the quantity σ = √Var(X).
Basic properties:
(i) Var(X) = E[X²] − E²[X]
(ii) Var(cX) = c² Var(X), where c is a constant.
(iii) Var(X + c) = Var(X), where c is a constant.
Proof of (i): Var(X) = E[(X − µ)²] = E[X² − 2µX + µ²] = E[X²] − 2µ E[X] + µ² = E[X²] − µ²

The additivity of variance
Theorem: if X_1, X_2, ..., X_n are pairwise independent random variables, then
Var(Σ_{i=1}^n X_i) = Σ_{i=1}^n Var(X_i)
Proof:
Var(X_1 + ... + X_n) = E[(X_1 + ... + X_n)²] − E²[X_1 + ... + X_n]
= E[Σ_i X_i² + Σ_{i≠j} X_i X_j] − (Σ_i µ_i² + Σ_{i≠j} µ_i µ_j)
= Σ_i (E[X_i²] − µ_i²) + Σ_{i≠j} (E[X_i X_j] − µ_i µ_j)
= Σ_{i=1}^n Var(X_i)
since, by pairwise independence, E[X_i X_j] = E[X_i] E[X_j] = µ_i µ_j for all i ≠ j.
Note: as we see in the proof, pairwise independence suffices; mutual (full) independence is not needed.
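
A quick exhaustive check (Python sketch, not from the slides): take X_1, X_2 uniform 0/1 bits and X_3 = X_1 XOR X_2. The three variables are pairwise independent but not mutually independent, yet the variance of their sum still equals the sum of their variances.

```python
from itertools import product

# All four equally likely outcomes of two fair bits; X3 = X1 XOR X2.
outcomes = [(x1, x2, x1 ^ x2) for x1, x2 in product((0, 1), repeat=2)]

def E(g):
    """Expectation of g over the uniform distribution on the four outcomes."""
    return sum(g(o) for o in outcomes) / len(outcomes)

def var(g):
    return E(lambda o: g(o) ** 2) - E(g) ** 2

sum_of_vars = sum(var(lambda o, i=i: o[i]) for i in range(3))  # 3 * 1/4
var_of_sum = var(lambda o: sum(o))                             # also 3/4
print(sum_of_vars, var_of_sum)
```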

B. The pairwise independent sampling theorem
Another example, birthday matching: let us estimate the number of pairs of people in a room having their birthday on the same day.
Note 1: Matching birthdays for different pairs of students are pairwise independent, since knowing that (George, Takis) have a match tells us nothing about whether (George, Petros) match.
Note 2: However, the events are not mutually independent. Indeed, they are not even 3-wise independent, since if (George, Takis) match and (Takis, Petros) match, then (George, Petros) match!

Birthday matching
Let us calculate the probability of having a certain number of birthday matches. Let B_1, B_2, ..., B_n be the birthdays of n independently chosen people and let E_{i,j} be the indicator variable of the event of an (i, j) match (i.e. B_i = B_j). As said, the events E_{i,j} are pairwise independent but not mutually independent.
Clearly, Pr{E_{i,j}} = Pr{B_i = B_j} = 1/365 (for i ≠ j).
Let D be the number of matching pairs. Then D = Σ_{1≤i<j≤n} E_{i,j}.
By linearity of expectation we have
E[D] = E[Σ_{1≤i<j≤n} E_{i,j}] = Σ_{1≤i<j≤n} E[E_{i,j}] = C(n, 2) · (1/365)

Birthday matching
Since the variances of the pairwise independent variables E_{i,j} add up, it is:
Var[D] = Var[Σ_{1≤i<j≤n} E_{i,j}] = Σ_{1≤i<j≤n} Var[E_{i,j}] = C(n, 2) · (1/365) · (1 − 1/365)
As an example, for a class of n = 100 students, it is E[D] ≈ 14 and Var[D] < 14 · (1 − 1/365) < 14. So by Chebyshev's inequality we have
Pr{|D − 14| ≥ x} ≤ 14/x²
Letting x = 6, we conclude that with more than 50% chance the number of matching birthdays will be between 8 and 20.
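
The numbers quoted for n = 100 can be reproduced directly (Python sketch):

```python
from math import comb

n, days = 100, 365
pairs = comb(n, 2)                           # number of (i, j) pairs, i < j: 4950
ED = pairs / days                            # E[D] = C(n, 2)/365 ~ 13.56 ~ 14
VarD = pairs * (1 / days) * (1 - 1 / days)   # Var[D] < 14, by pairwise independence

x = 6
chebyshev = VarD / x**2                      # Pr{|D - E[D]| >= 6} <= Var[D]/36 ~ 0.38
print(ED, VarD, chebyshev)                   # matches the slide's "more than 50%" claim
```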

The Pairwise Independent Sampling Theorem (I)
We can actually generalize: we need not restrict ourselves to sums of zero-one (indicator) variables, nor to variables with the same distribution. We state the theorem below for possibly different distributions with the same mean and variance (this is only for simplicity; the result also holds for distributions with different means and/or variances).
Theorem. Let X_1, ..., X_n be pairwise independent variables with the same mean µ and variance σ². Let S_n = Σ_{i=1}^n X_i. Then
Pr{|S_n/n − µ| ≥ x} ≤ (1/n) · (σ/x)²
Proof. Note that E[S_n/n] = nµ/n = µ and Var[S_n/n] = (1/n)² · nσ² = σ²/n, and apply Chebyshev's inequality.
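
A quick numerical illustration (Python sketch; the die example, sample size and threshold are choices made here, and i.i.d. rolls are used, which are in particular pairwise independent): compare the empirical deviation probability of the sample mean with the theorem's bound (1/n)(σ/x)².

```python
import random

# X_i = a fair die roll: mu = 3.5, sigma^2 = 35/12.
mu, sigma2 = 3.5, 35 / 12
n, x, trials = 1000, 0.2, 2000

bound = sigma2 / (n * x**2)   # theorem: Pr{|S_n/n - mu| >= x} <= (1/n)(sigma/x)^2 ~ 0.073

bad = 0
for _ in range(trials):
    mean = sum(random.randint(1, 6) for _ in range(n)) / n
    if abs(mean - mu) >= x:
        bad += 1

print(bound, bad / trials)    # the empirical frequency stays below the bound
```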

The Pairwise Independent Sampling Theorem (II)
Note: this theorem provides a precise, general evaluation of how the average of pairwise independent random samples approaches their mean. If the number n of samples becomes large enough, we can approach the mean arbitrarily closely with confidence arbitrarily close to 100% (by the theorem, taking n > (1/δ) · (σ/x)² samples makes the deviation probability at most δ); i.e. a large number of samples is needed for distributions of large variance and when we want to assure high concentration around the mean.

C. Reduced randomness in probability amplification
Motivation:
- Randomized algorithms, for a given input x, actually choose n random numbers ("witnesses") and run a deterministic algorithm on the input, once for each of these random numbers.
- Intuitively, if the deterministic algorithm has error probability ε (e.g. 1/2), then t independent runs reduce the error probability to ε^t (e.g. 1/2^t) and amplify the correctness probability from 1/2 to 1 − 1/2^t.
- However, true randomness is quite expensive! What happens if we are constrained to use no more than a constant number c of random numbers? The simplest case is c = 2, i.e. we choose just 2 random numbers (thus the name two-point sampling).

C. Reduced randomness in probability amplification
Problem definition: if our randomized algorithm reduces the error probability to ε^t with t random numbers, what can we expect about the error probability with only 2 random numbers?
- An obvious bound is ε² (e.g. when ε = 1/2, reducing the error probability from 1/2 to 1/4).
- Can we do any better?
- It turns out that we can indeed do much better, and reduce the error probability to 1/t (which is much smaller than the constant ε²).

C. Reduced randomness in probability amplification
High level idea:
- Generate t ("pseudo-random") numbers based on the 2 chosen truly random numbers and use them in lieu of t truly independent random numbers.
- These generated numbers depend on the 2 chosen numbers. Hence they are not mutually independent, but they are pairwise independent.
- This loss of full independence reduces the accuracy of the algorithmic process, but it is still quite usable in reducing the error probability.

C. Reduced randomness in probability amplification
The process (high level description):
1. Choose a large prime number p (e.g. the Mersenne prime 2^31 − 1).
2. Define Z_p as the ring of integers modulo p (i.e. {0, 1, 2, ..., 2^31 − 2}).
3. Choose 2 truly random numbers a and b from Z_p (e.g. a = 2^20 + 781 and b = 2^27 − 44).
4. Generate t pseudo-random numbers y_i = (a·i + b) mod p, e.g.
   y_0 = 2^27 − 44
   y_1 = (2^20 + 781) · 1 + 2^27 − 44 (mod p)
   y_2 = (2^20 + 781) · 2 + 2^27 − 44 (mod p)
   and so on.
5. Use each of the y_i as one of t witnesses (in lieu of t fully independent random witnesses).
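
A minimal sketch of steps 1-5 in Python, assuming the witnesses are simply the numbers y_i = (a·i + b) mod p; the prime follows the example on the slide and the function name two_point_witnesses is chosen here for illustration.

```python
import random

p = 2**31 - 1                      # a Mersenne prime, as in the slide

def two_point_witnesses(t, rng=random):
    """Generate t pairwise independent witnesses from only 2 truly random numbers a, b in Z_p."""
    a = rng.randrange(p)           # first truly random number
    b = rng.randrange(p)           # second truly random number
    return [(a * i + b) % p for i in range(t)]   # y_i = (a*i + b) mod p

print(two_point_witnesses(10))
```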

C. Reduced randomness in probability amplification
Performance (high level discussion). As we will see:
- For any given error bound ε, the error probability is reduced from ε² to 1/t.
- However, instead of requiring O(log(1/ε)) runs of the deterministic algorithm (as in the case of t independent random witnesses), we will require O(1/ε) runs in the case of 2 truly random witnesses.
- Thus, we gain in probability amplification but lose some efficiency.
- We need significantly less true randomness.

The class RP (I)
Definition. The class RP (Random Polynomial time) consists of all languages L admitting a randomized algorithm A running in worst-case polynomial time such that for any input x:
x ∈ L ⇒ Pr{A(x) accepts} ≥ 1/2
x ∉ L ⇒ Pr{A(x) accepts} = 0
Notes:
- Language recognition ≡ computational decision problems.
- The value 1/2 is arbitrary. The success probability need only be lower-bounded by an inverse polynomial function of the input size (a polynomial number of algorithm repetitions would boost it to a constant, in polynomial time).
- RP actually includes Monte Carlo algorithms that can err only when x ∈ L (one-sided error).

The RP algorithm
An RP algorithm, for given input x, actually picks a random number r from Z_p and computes A(x, r) with the following properties:
- x ∈ L ⇒ A(x, r) = 1, for (at least) half of the possible values of r
- x ∉ L ⇒ A(x, r) = 0, for all possible choices of r
As said:
- If we run algorithm A t times on the same input x, the error probability is 1/2^t, but this requires t truly random numbers.
- If we restrict ourselves to t = 2 truly random numbers a, b from Z_p and run A(x, a) and A(x, b), the error probability can be as high as 1/4.
- But we can do better with t pseudo-random numbers: let r_i = a·i + b (mod p), where a, b are truly random as above, for i = 1, 2, ..., t.

Modulo rings pairwise independence (I)
Let p be a prime number and Z_p = {0, 1, 2, ..., p − 1} denote the ring of the integers modulo p.
Lemma 1. Given y, i ∈ Z_p and choosing a, b uniformly at random from Z_p, the probability that y ≡ a·i + b (mod p) is 1/p.
Proof. Imagine that we first choose a. Then, when choosing b, it must be b ≡ y − a·i (mod p). Since we choose b uniformly among the p possible values modulo p, the probability is indeed 1/p.

Modulo rings pairwise independence (II)
Lemma 2. Given y, z, x, w ∈ Z_p such that x ≠ w, and choosing a, b uniformly at random from Z_p, the probability that y ≡ a·x + b (mod p) and z ≡ a·w + b (mod p) is 1/p².
Proof. Subtracting, y − z ≡ a·(x − w) (mod p). Since x − w ≠ 0 and p is prime, this equation holds for a unique value of a. This in turn implies a specific value for b. The probability that a and b get those two specific values is clearly (1/p) · (1/p) = 1/p².

Modulo rings pairwise independence (III)
Lemma 3. Let i, j be two distinct elements of Z_p, and choose a, b uniformly at random from Z_p. Then the two variables Y_i = a·i + b (mod p) and Y_j = a·j + b (mod p) are uniformly distributed on Z_p and are pairwise independent.
Proof. From Lemma 1, it is clearly Pr{Y_i = α} = 1/p for any α ∈ Z_p, so Y_i, Y_j indeed have uniform distribution. As for pairwise independence, note that (using Lemma 2):
Pr{Y_i = α | Y_j = β} = Pr{(Y_i = α) ∧ (Y_j = β)} / Pr{Y_j = β} = (1/p²) / (1/p) = 1/p = Pr{Y_i = α}
Thus, Y_i, Y_j are pairwise independent.
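
Lemma 3 can be checked exhaustively for a small prime (Python sketch; p = 7 and the indices i, j are chosen arbitrarily here): over all p² choices of (a, b), each pair (Y_i, Y_j) with i ≠ j takes every value in Z_p × Z_p exactly once, which gives both uniformity and pairwise independence.

```python
from collections import Counter
from itertools import product

p = 7
i, j = 2, 5   # any two distinct elements of Z_p

counts = Counter()
for a, b in product(range(p), repeat=2):     # all p^2 equally likely seeds
    Yi = (a * i + b) % p
    Yj = (a * j + b) % p
    counts[(Yi, Yj)] += 1

# Every pair (alpha, beta) occurs exactly once out of p^2 seeds,
# so Pr{Y_i = alpha, Y_j = beta} = 1/p^2 = Pr{Y_i = alpha} * Pr{Y_j = beta}.
print(len(counts) == p * p and set(counts.values()) == {1})   # True
```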

Modulo rings pairwise independence (IV)
Note: this is only pairwise independence. Indeed, consider the variables Y_1, Y_2, Y_3, Y_4 as defined above. Every pair of them is independent. But if you are given the values of Y_1 and Y_2, then you know the values of Y_3 and Y_4 immediately, since the values of Y_1 and Y_2 uniquely determine a and b, and thus we can compute all the Y_i variables.

Modulo rings pairwise independence (V)
Thus Y = Σ_{i=1}^t A(x, r_i) is a sum of random variables which are pairwise independent, since the r_i values are pairwise independent.
Assume that x ∈ L. Then E[Y] = t/2 and Var[Y] = Σ_{i=1}^t Var[A(x, r_i)] = t · (1/2) · (1/2) = t/4.
The probability that all those t executions fail corresponds to the event Y = 0, and
Pr{Y = 0} ≤ Pr{|Y − E[Y]| ≥ E[Y]} = Pr{|Y − t/2| ≥ t/2} ≤ Var[Y] / (t/2)² = (t/4) / (t/2)² = 1/t
by Chebyshev's inequality. Thus the use of pseudo-randomness (t pseudo-random numbers) allows us to exploit our 2 truly random numbers to reduce the error probability from 1/4 to 1/t.
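
To see the 1/t bound in action, here is a small Monte Carlo sketch in Python; the "algorithm" A is a hypothetical stand-in that accepts roughly half of the witnesses in Z_p for a yes-instance, as in the RP definition, and the values of p, t and the trial count are chosen only for illustration.

```python
import random

p = 101          # small prime so the experiment is cheap
t = 20           # number of pseudo-random witnesses
trials = 100_000

def A(r):
    """Hypothetical one-sided test for a yes-instance: accepts about half of the r in Z_p."""
    return 1 if r % 2 == 0 else 0

errors = 0
for _ in range(trials):
    a, b = random.randrange(p), random.randrange(p)        # only 2 truly random numbers
    Y = sum(A((a * i + b) % p) for i in range(1, t + 1))   # r_i = a*i + b mod p
    if Y == 0:                                             # all t runs rejected: error on a yes-instance
        errors += 1

print(errors / trials, "<=", 1 / t)   # empirical error probability vs. the Chebyshev bound 1/t
```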