Randomized Computation
Slides based on S. Arora, B. Barak, Computational Complexity: A Modern Approach.
Ahto Buldas, Ahto.Buldas@ut.ee
We do not assume anything about the distribution of the instances of the problem to be solved. Instead we incorporate randomization into the algorithm itself... It may seem at first surprising that employing randomization leads to efficient algorithms. This claim is substantiated by two examples. The first has to do with finding the nearest pair in a set of n points in R^k. The second example is an extremely efficient algorithm for determining whether a number is prime. (Michael Rabin, 1976)
We study probabilistic computation and the complexity classes associated with it. NB! It is an open question whether or not the universe has any randomness. Still, we assume that true random number generators exist. Then, arguably, a realistic model for a real-life computer is a Turing machine with a random number generator, which we call a Probabilistic Turing Machine (PTM). It is natural to wonder whether difficult problems like 3SAT are efficiently solvable using a PTM.
We will formally define the class BPP of languages decidable by polynomial-time PTMs and discuss its relation to previously studied classes such as P/poly and PH. One consequence is that if PH does not collapse, then 3SAT does not have efficient probabilistic algorithms. We also show that probabilistic algorithms can be very practical by presenting ways to greatly reduce their error to absolutely minuscule quantities. Thus the class BPP (and its sister classes RP, coRP and ZPP) are arguably as important as P in capturing efficient computation.
Probabilistic Turing Machines
Syntactically, a PTM is no different from a non-deterministic TM: it is a TM with two transition functions δ_0, δ_1. The difference lies in how we interpret the graph of all possible computations: instead of asking whether there exists a sequence of choices that makes the TM accept, we ask how large is the fraction of choices for which this happens. More precisely, if M is a PTM, then we envision that in every step of the computation, M chooses randomly which one of its transition functions to apply (applying δ_0 with probability 1/2 and δ_1 with probability 1/2). We say that M decides a language if it outputs the right answer with probability at least 3/4.
Notice that the ability to pick (with equal probability) one of δ_0, δ_1 to apply at each step is equivalent to the machine having a fair coin which, each time it is tossed, comes up Heads or Tails with equal probability regardless of the past history of Heads/Tails.
BPTIME and BPP
Def.: For T: ℕ → ℕ and L ⊆ {0,1}^*, we say that a PTM M decides L in time T(n) if, for every x ∈ {0,1}^*, M halts in T(|x|) steps regardless of its random choices, and Pr[M(x) = L(x)] ≥ 3/4, where we denote L(x) = 1 if x ∈ L and L(x) = 0 if x ∉ L. We let BPTIME(T(n)) denote the class of languages decided by PTMs in O(T(n)) time, and let BPP = ⋃_{c>0} BPTIME(n^c); i.e., L ∈ BPP if there is a poly-time PTM M such that:
x ∈ L ⟹ Pr[1 ← M(x)] ≥ 3/4
x ∉ L ⟹ Pr[1 ← M(x)] ≤ 1/4.
Relations with other classes
Since a deterministic TM is a special case of a PTM (one where both transition functions are equal), we have P ⊆ BPP. Under plausible complexity assumptions it holds that BPP = P. Nonetheless, as far as we know it may even be that BPP = EXP. Note that BPP ⊆ EXP, since given a polynomial-time PTM M and an input x ∈ {0,1}^n, in time 2^{poly(n)} it is possible to enumerate all possible random choices and compute precisely the probability that M(x) = 1. We will show that BPP ⊆ P/poly and BPP ⊆ Σ^p_2 ∩ Π^p_2.
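The enumeration argument behind BPP ⊆ EXP can be sketched in a few lines of Python. This is only a toy illustration, not a real PTM: `M` is a hypothetical randomized predicate whose coin tosses are made explicit as the tuple `r`, and we recover its exact acceptance probability by brute-force enumeration of all coin sequences.

```python
import itertools

# Hypothetical stand-in for a poly-time PTM: M(x, r) is the deterministic
# result of running the machine on input x with coin-toss sequence r.
def M(x: str, r: tuple) -> bool:
    target = sum(map(int, x)) % 2                  # some property of x
    votes = sum(1 for bit in r if bit == target)   # "noisy" votes for it
    return votes > len(r) // 2

def exact_acceptance_probability(x: str, num_coins: int) -> float:
    """Enumerate all 2^num_coins coin sequences, as in the BPP within EXP
    argument, and compute Pr[M(x) = 1] exactly."""
    runs = itertools.product((0, 1), repeat=num_coins)
    accepting = sum(M(x, r) for r in runs)
    return accepting / 2 ** num_coins

print(exact_acceptance_probability("101", 5))   # → 0.5 for this toy M
```

The loop takes time 2^{num_coins}, which for a poly-time PTM is 2^{poly(n)}: exponential, but enough to place BPP inside EXP.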
One-sided error: the class RP
The class BPP captures what we call probabilistic algorithms with two-sided error. That is, it allows the machine M to output (with some small probability) both 0 when x ∈ L and 1 when x ∉ L. However, many probabilistic algorithms have the property of one-sided error: for example, if x ∉ L they will never output 1, although they may output 0 when x ∈ L. This is captured by the definition of RP.
Def.: L ∈ RP if there is a probabilistic poly-time M such that for every x ∈ {0,1}^*:
x ∈ L ⟹ Pr[1 ← M(x)] ≥ 3/4
x ∉ L ⟹ Pr[1 ← M(x)] = 0.
Note that RP ⊆ NP, since every accepting branch is a certificate that the input is in the language. In contrast, we do not know whether BPP ⊆ NP.
coRP
The class coRP = {L : L̄ ∈ RP} captures one-sided error algorithms with the error in the other direction (i.e., they may output 1 when x ∉ L, but will never output 0 if x ∈ L).
Def.: L ∈ coRP if there is a probabilistic poly-time M such that for every x ∈ {0,1}^*:
x ∈ L ⟹ Pr[1 ← M(x)] = 1
x ∉ L ⟹ Pr[1 ← M(x)] ≤ 1/4.
Note also that RP ⊆ BPP and coRP ⊆ BPP.
The class ZPP
Def.: A probabilistic Turing machine M is said to be expected poly-time if there is c > 0 such that for every x the average running time of M(x) is at most |x|^c. Note that the event that M(x) never stops (i.e., the running time is infinite) is possible, but must have zero probability. The average running time is
E[T(M, x)] = p_1·1 + p_2·2 + ⋯ + p_t·t + ⋯,
where p_t is the probability that M(x) halts in exactly t steps.
Def.: L ∈ ZPP if there is an expected poly-time M that never errs, i.e.:
x ∈ L ⟹ Pr[1 ← M(x)] = 1
x ∉ L ⟹ Pr[0 ← M(x)] = 1.
ZPP = RP ∩ coRP
This ought to be slightly surprising, since the corresponding statement for nondeterminism is open, i.e. whether or not P = NP ∩ coNP.
Proof: The inclusions ZPP ⊆ RP and ZPP ⊆ coRP are straightforward. So it remains to show that RP ∩ coRP ⊆ ZPP. Let L ∈ RP ∩ coRP, let M be the RP-machine for L, and let M′ be the coRP-machine for L. The ZPP algorithm for L can be constructed as follows:
(1) If 1 ← M(x) then output 1 and halt.
(2) If 0 ← M′(x) then output 0 and halt.
(3) Go back to step (1).
If x ∈ L then with probability ≥ 3/4 the algorithm outputs 1 at step (1). If x ∉ L then with probability ≥ 3/4 the algorithm outputs 0 at step (2). So the probability that x is not yet correctly decided after m rounds is ≤ 4^{-m}, and hence the probability that x is eventually decided is lim_{m→∞} (1 - 4^{-m}) = 1. If t and t′ are the running times of M and M′, the average running time is at most
t(|x|) + t′(|x|) + 2(t(|x|) + t′(|x|))·4^{-1} + 3(t(|x|) + t′(|x|))·4^{-2} + ⋯
= [t(|x|) + t′(|x|)]·(1 + 2·4^{-1} + 3·4^{-2} + 4·4^{-3} + ⋯)
≤ [t(|x|) + t′(|x|)]·(1 + 2^{-1} + 2^{-2} + 2^{-3} + ⋯)
= 2[t(|x|) + t′(|x|)],
because (m+1)·4^{-m} ≤ 2^{-m}, which directly follows from m + 1 ≤ 2^m.
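The repeat-until-decided loop from the proof is easy to simulate. A minimal sketch, assuming a toy language (even-parity bit strings) and hypothetical one-sided machines `M_rp` and `M_corp` with exactly the error bounds from the definitions:

```python
import random

def in_L(x: str) -> bool:
    """Toy language: bit strings with an even number of 1s
    (a stand-in for any L in RP ∩ coRP)."""
    return x.count("1") % 2 == 0

def M_rp(x: str) -> int:
    """One-sided RP machine: never accepts x outside L;
    accepts x in L with probability 3/4."""
    if in_L(x):
        return 1 if random.random() < 0.75 else 0
    return 0

def M_corp(x: str) -> int:
    """One-sided coRP machine: always accepts x in L;
    accepts x outside L with probability 1/4."""
    if in_L(x):
        return 1
    return 1 if random.random() < 0.25 else 0

def zpp_decide(x: str) -> int:
    """Zero-error loop: each round halts with probability >= 3/4,
    and any answer it gives is guaranteed correct."""
    while True:
        if M_rp(x) == 1:
            return 1   # M_rp never accepts outside L, so this is correct
        if M_corp(x) == 0:
            return 0   # M_corp never rejects inside L, so this is correct

print(zpp_decide("1100"), zpp_decide("1000"))   # → 1 0, with zero error
```

The running time is random (a geometric number of rounds), but the answer never is: that is exactly the ZPP trade-off.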
Improving Randomized Algorithms by Iteration
If L ∈ RP and M is the corresponding poly-time PTM, then by using the following iterated algorithm M_m:
(1) If 1 ← M(x) then output 1 and halt.
(2) m := m - 1.
(3) If m > 0 then go to (1), otherwise output 0 and halt.
we get that for any x ∈ {0,1}^*:
x ∈ L ⟹ Pr[1 ← M_m(x)] ≥ 1 - 4^{-m}
x ∉ L ⟹ Pr[1 ← M_m(x)] = 0.
The same technique can be used for improving coRP algorithms.
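A quick empirical check of this amplification bound, using a hypothetical RP machine that accepts yes-instances with probability exactly 3/4 (the boolean parameter `x_in_L` stands in for the unknown fact x ∈ L):

```python
import random

def M(x_in_L: bool) -> int:
    """Toy one-sided machine: accepts a yes-instance with probability 3/4,
    never accepts a no-instance."""
    return 1 if (x_in_L and random.random() < 0.75) else 0

def M_iterated(x_in_L: bool, m: int) -> int:
    """Accept as soon as one of m independent runs accepts."""
    for _ in range(m):
        if M(x_in_L) == 1:
            return 1
    return 0

# One-sided error is preserved: no-instances are never accepted,
# and a yes-instance is missed with probability only (1/4)^m.
trials = 10_000
misses = sum(1 - M_iterated(True, 10) for _ in range(trials))
print(misses)   # expected about trials * 4**-10, i.e. almost always 0
```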
Improving BPP Algorithms by Voting
Any BPP algorithm M can be improved by using the following voting algorithm M_m (where m > 0 is an odd number):
Generate m samples: b_i ← M(x).
If Σ_i b_i > m/2 then output 1, otherwise output 0.
For every i = 1 … m we define a new random variable x_i which is 1 if and only if b_i erroneously indicates whether x ∈ L, i.e. either x ∈ L and b_i = 0, or x ∉ L and b_i = 1. By the definition of BPP, we know that Pr[x_i = 1] ≤ 1/4. The voting algorithm is correct about x iff Σ_{i=1}^m x_i < m/2. To analyze the quality of the voting algorithm, we need Chernoff bounds from probability theory.
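The voting amplifier can be sketched directly, again with a hypothetical two-sided machine that answers correctly with probability exactly 3/4:

```python
import random

def M(x_in_L: bool) -> int:
    """Toy two-sided BPP machine: answers correctly with probability 3/4."""
    correct = random.random() < 0.75
    return int(x_in_L) if correct else 1 - int(x_in_L)

def M_vote(x_in_L: bool, m: int) -> int:
    """Take m independent samples (m odd) and output the majority answer."""
    ones = sum(M(x_in_L) for _ in range(m))
    return 1 if ones > m // 2 else 0

# Empirical error rate of the m = 41 voter on yes-instances; the Chernoff
# analysis bounds it by e^{-41/12} (about 0.033), and in practice it is
# far smaller.
trials = 10_000
errors = sum(M_vote(True, 41) != 1 for _ in range(trials))
print(errors / trials)
```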
Chernoff Bounds
Theorem (Chernoff bound): Let x_1, …, x_m be independent 0/1 random variables with the same probability distribution. Let p = Pr[x_i = 1] and X = Σ_{i=1}^m x_i. Then for any 0 ≤ Θ ≤ 1:
Pr[X ≥ (1+Θ)pm] ≤ e^{-(Θ^2/3)pm}   (1)
Pr[X ≤ (1-Θ)pm] ≤ e^{-(Θ^2/2)pm}.   (2)
The proof is based on two lemmas:
Lemma 1: If 0 ≤ Θ ≤ 1 then -Θ^2/2 ≤ Θ - (1+Θ)·ln(1+Θ) ≤ -Θ^2/3.
Lemma 2: If 0 ≤ Θ ≤ 1 then -Θ - (1-Θ)·ln(1-Θ) ≤ -Θ^2/2.
Proof (1): Pr[X ≥ (1+Θ)pm] = Pr[e^{tX} ≥ e^{t(1+Θ)pm}] for any t > 0. Markov's inequality implies Pr[e^{tX} ≥ k·E[e^{tX}]] ≤ 1/k for any k > 0. We take k = e^{t(1+Θ)pm}·(E[e^{tX}])^{-1}. Then
Pr[X ≥ (1+Θ)pm] ≤ e^{-t(1+Θ)pm}·E[e^{tX}].
As E[e^{tX}] = (E[e^{t·x_1}])^m = (1 + p(e^t - 1))^m, after substitution we obtain:
Pr[X ≥ (1+Θ)pm] ≤ e^{-t(1+Θ)pm}·(1 + p(e^t - 1))^m ≤ e^{-t(1+Θ)pm}·e^{pm(e^t - 1)}.
Here we used the fact that (1+a)^m ≤ e^{am} for every a > 0. Finally, by taking t = ln(1+Θ), we obtain from Lemma 1 that
Pr[X ≥ (1+Θ)pm] ≤ e^{pm[Θ - (1+Θ)ln(1+Θ)]} ≤ e^{-(Θ^2/3)pm}.
Proof (2): Pr[X ≤ (1-Θ)pm] = Pr[pm - X ≥ Θpm] = Pr[e^{t(pm-X)} ≥ e^{tΘpm}] for any t > 0. Markov's inequality implies Pr[e^{t(pm-X)} ≥ k·E[e^{t(pm-X)}]] ≤ 1/k for any k > 0. We take k = e^{tΘpm}·(E[e^{t(pm-X)}])^{-1}. Then
Pr[X ≤ (1-Θ)pm] ≤ e^{-tΘpm}·E[e^{t(pm-X)}] = e^{t(1-Θ)pm}·E[e^{-tX}].
As E[e^{-tX}] = (E[e^{-t·x_1}])^m = (1 - p(1 - e^{-t}))^m, we obtain
Pr[X ≤ (1-Θ)pm] ≤ e^{t(1-Θ)pm}·(1 - p(1 - e^{-t}))^m ≤ e^{t(1-Θ)pm}·e^{-pm(1-e^{-t})} = e^{pm[t(1-Θ) - 1 + e^{-t}]}.
Finally, taking t = -ln(1-Θ), we obtain from Lemma 2 that
Pr[X ≤ (1-Θ)pm] ≤ e^{pm[-Θ - (1-Θ)ln(1-Θ)]} ≤ e^{-(Θ^2/2)pm}.
Proof of Lemma 1
Lemma 1: If 0 ≤ Θ ≤ 1 then -Θ^2/2 ≤ Θ - (1+Θ)·ln(1+Θ) ≤ -Θ^2/3.
Proof: First, note that
Θ - (1+Θ)·ln(1+Θ) = Θ - (1+Θ)·(Θ - Θ^2/2 + Θ^3/3 - Θ^4/4 + ⋯)
= -Θ^2/(1·2) + Θ^3/(2·3) - Θ^4/(3·4) + Θ^5/(4·5) - ⋯
= Σ_{n=2}^∞ (-1)^{n-1}·Θ^n/(n(n-1)).
The series r = Θ^3/(2·3) - Θ^4/(3·4) + Θ^5/(4·5) - ⋯ has alternating signs with strictly decreasing absolute values (the inequality Θ^n/((n-1)n) ≥ Θ^{n+1}/(n(n+1)) is a direct consequence of Θ ≤ 1 and (n-1)n ≤ n(n+1)), so its sum is positive, because the first term is positive. Consequently:
Θ - (1+Θ)·ln(1+Θ) = -Θ^2/2 + r ≥ -Θ^2/2.
Analogously, the series s = Θ^4/(3·4) - Θ^5/(4·5) + Θ^6/(5·6) - ⋯ has a positive sum, and hence:
Θ - (1+Θ)·ln(1+Θ) = -Θ^2/2 + Θ^3/(2·3) - s ≤ -Θ^2/2 + Θ^3/6 ≤ -Θ^2/2 + Θ^2/6 = -Θ^2/3.
Proof of Lemma 2
Lemma 2: If 0 ≤ Θ ≤ 1 then -Θ - (1-Θ)·ln(1-Θ) ≤ -Θ^2/2.
Proof: It is easy to see that
-Θ - (1-Θ)·ln(1-Θ) = -Θ^2/(2·1) - Θ^3/(3·2) - Θ^4/(4·3) - ⋯ = -Σ_{n=2}^∞ Θ^n/(n(n-1)),
from which the inequality directly follows, since all terms of the series are negative and the first one is -Θ^2/2.
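Both lemmas are easy to sanity-check numerically; a small script that evaluates the two quantities on a grid of Θ values (assuming nothing beyond the lemma statements themselves):

```python
from math import log

# Spot-check Lemma 1 and Lemma 2 on a grid of Θ values in (0, 1).
for i in range(1, 100):
    theta = i / 100
    f = theta - (1 + theta) * log(1 + theta)   # quantity in Lemma 1
    g = -theta - (1 - theta) * log(1 - theta)  # quantity in Lemma 2
    assert -theta**2 / 2 <= f <= -theta**2 / 3
    assert g <= -theta**2 / 2
print("Lemmas 1 and 2 hold at every grid point")
```

This is of course no proof, but it catches sign errors in the bounds instantly.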
Analysis of the Voting Algorithm
For i = 1 … m let x_i ∈ {0,1} be the error variables, i.e. x_i = 1 iff the i-th sample b_i of M(x) wrongly reflects the truth value of x ∈ L. By the definition of BPP, we have p = Pr[x_i = 1] ≤ 1/4. By taking Θ = 1 in the first Chernoff bound, we obtain
Pr[Σ_{i=1}^m x_i ≥ m/2] ≤ e^{-m/12}.
Hence, the voting algorithm has error < e^{-m/12}.
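The Chernoff estimate can be compared with the exact binomial error of the voter. A short sketch, assuming the worst case p = 1/4:

```python
from math import comb, exp

def majority_error_exact(m: int, p: float = 0.25) -> float:
    """Exact Pr[X >= m/2] for X ~ Binomial(m, p): the voter errs iff
    at least half of its m samples are wrong."""
    threshold = (m + 1) // 2   # smallest integer >= m/2 for odd m
    return sum(comb(m, j) * p**j * (1 - p)**(m - j)
               for j in range(threshold, m + 1))

# Exact error vs. the Chernoff estimate e^{-m/12}.
for m in (11, 51, 101):
    print(m, majority_error_exact(m), exp(-m / 12))
```

The exact error sits well below the bound; the point of the Chernoff inequality is its simple closed form, not its tightness.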
BPP_ε
Let ε: ℕ → [0, 1] be a function.
Def.: A language L ⊆ {0,1}^* belongs to the class BPP_ε if there is a poly-time probabilistic Turing machine N such that for every x ∈ {0,1}^n:
x ∈ L ⟹ Pr[N(x) = 1] > 1 - ε(|x|)
x ∉ L ⟹ Pr[N(x) = 1] < ε(|x|).
Exercise: By using Chernoff bounds, prove the following:
If ε(n) = 2^{-n^{O(1)}}, then BPP_ε = BPP.
If ε(n) = n^{-O(1)}, then BPP_{1/2-ε} = BPP.
BPP ⊆ P/poly
Theorem: BPP ⊆ P/poly.
Proof: Let L ∈ BPP. As BPP = BPP_{2^{-n}}, there is a probabilistic Turing machine M with running time t(n) = n^{O(1)} such that for every x ∈ {0,1}^n the probability of error is
Pr_a[M(x, a) ≠ [x ∈ L]] < 2^{-n}.
Hence,
Pr_a[∃x ∈ {0,1}^n : M(x, a) ≠ [x ∈ L]] < Σ_{x ∈ {0,1}^n} 2^{-n} = 1,
and there exists a_n ∈ {0,1}^{t(n)} such that for every x ∈ {0,1}^n: M(x, a_n) = [x ∈ L]. Therefore, the machine M with advice sequence a = (a_0, a_1, …) decides L, and hence L ∈ P/poly.
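The heart of this proof is a counting argument: each input x rules out fewer than 2^{t(n)}/2^n coin strings, so the 2^n inputs together cannot rule out all 2^{t(n)} of them. A toy numeric sketch of just that count (the "bad sets" are chosen at random and merely respect the size bound; no real machine is involved):

```python
import itertools
import random

n, t = 3, 10   # toy input length n and coin-string length t
inputs = list(itertools.product((0, 1), repeat=n))
num_coins = 2 ** t

# For each input x, pick fewer than 2^t / 2^n coin strings on which the
# hypothetical PTM errs on x (the BPP error guarantee with error < 2^-n).
random.seed(1)
bad = {x: set(random.sample(range(num_coins), num_coins // 2 ** n - 1))
       for x in inputs}

# Union bound: at most 2^n * (2^t / 2^n - 1) < 2^t coin strings are bad
# for some input, so some coin string a_n is good for every input at once.
good = [a for a in range(num_coins)
        if all(a not in bad[x] for x in inputs)]
print(len(good) > 0)   # → True: a universal advice string exists
```

However the bad sets are chosen, the count guarantees at least one surviving coin string, which is exactly the advice a_n hard-wired into the P/poly circuit.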
BPP is in PH
Theorem (Sipser-Gács): BPP ⊆ Σ^p_2 ∩ Π^p_2.
Proof: Suppose L ∈ BPP = BPP_{2^{-n}}, i.e. there is a poly-time M that uses m = poly(n) random bits such that for any n and x ∈ {0,1}^n:
x ∈ L ⟹ Pr_r[1 ← M(x, r)] ≥ 1 - 2^{-n}
x ∉ L ⟹ Pr_r[1 ← M(x, r)] ≤ 2^{-n}.
Let S_x denote the set of r's for which M accepts the input pair (x, r). Then either |S_x| ≥ (1 - 2^{-n})·2^m or |S_x| ≤ 2^{-n}·2^m, depending on whether or not x ∈ L. We show how to check, with two alternations, which case is true.
For k = ⌈(m+1)/n⌉, let U = {u_1, …, u_k} be a set of k strings in {0,1}^m. We define G_U to be the graph with vertex set {0,1}^m and edges (r, s) for every r, s such that r = s ⊕ u_i for some i ∈ {1, …, k} (where ⊕ denotes vector addition modulo 2, or equivalently, bitwise XOR). Note that the degree of G_U is k. For a set S ⊆ {0,1}^m, define Γ_U(S) to be the set of all neighbors of S in the graph G_U. That is, r ∈ Γ_U(S) if there are s ∈ S and i ∈ {1, …, k} such that r = s ⊕ u_i.
Claim 1: For every set S ⊆ {0,1}^m with |S| ≤ 2^{m-n} and every set U of size k, it holds that Γ_U(S) ≠ {0,1}^m. Indeed, since G_U has degree k, it holds that
|Γ_U(S)| ≤ k·|S| ≤ ⌈(m+1)/n⌉·2^{m-n} < 2^m.
The last inequality holds because m = poly(n), so there is n_0 such that for every n > n_0 we have m + 1 < n·2^n.
Claim 2: For every set S ⊆ {0,1}^m with |S| ≥ (1 - 2^{-n})·2^m there exists a set U of size k such that Γ_U(S) = {0,1}^m. We show this by the probabilistic method, by proving that for every such S, if we choose U at random by taking k random strings u_1, …, u_k, then Pr[Γ_U(S) = {0,1}^m] > 0. Indeed, for r ∈ {0,1}^m, let B_r denote the bad event that r is not in Γ_U(S). Then B_r = ∩_{i∈[k]} B^i_r, where B^i_r is the event that r ∉ S ⊕ u_i, or equivalently, that r ⊕ u_i ∉ S (using the fact that modulo 2, a ⊕ b = c ⟺ a = c ⊕ b). Yet r ⊕ u_i is a uniform element of {0,1}^m, and so it is in S with probability at least 1 - 2^{-n}. Since B^1_r, …, B^k_r are independent and each happens with probability at most 2^{-n}, the probability that B_r happens is at most (2^{-n})^k < 2^{-m}. By the union bound, the probability that Γ_U(S) ≠ {0,1}^m is bounded by Σ_r Pr[B_r] < 1.
Together, Claims 1 and 2 show that x ∈ L if and only if the following statement is true (with just a finite number of possible exceptions, namely inputs with |x| ≤ n_0):
∃u_1, …, u_k ∈ {0,1}^m ∀r ∈ {0,1}^m : ⋁_{i=1}^k M(x, r ⊕ u_i) = 1,
thus showing L ∈ Σ^p_2. As L ∈ BPP implies L̄ ∈ BPP, we also have L̄ ∈ Σ^p_2, which is equivalent to L ∈ Π^p_2.
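The two claims can be watched in action for toy parameters. This sketch assumes small m and n, represents strings in {0,1}^m as integers, and uses Python's `^` as the XOR shift ⊕:

```python
import random

def covers(S: set, U: list, m: int) -> bool:
    """Does the union of the XOR-shifted sets {s ^ u : s in S}, over u in U,
    equal all of {0,1}^m?"""
    shifted = {s ^ u for s in S for u in U}
    return len(shifted) == 2 ** m

m, n = 8, 2                            # toy sizes
k = -(-(m + 1) // n)                   # k = ceil((m+1)/n) = 5 here
S_big = set(range(2 ** m)) - {3, 77}   # "large" S: all of {0,1}^m minus 2 points

# Claim 2 in action: a random U almost always covers when S is large.
hits = sum(covers(S_big, [random.randrange(2 ** m) for _ in range(k)], m)
           for _ in range(100))
print(hits)   # almost always 100

# Claim 1 in action: a small S can never be covered, since k*|S| < 2^m.
S_small = {0, 1, 2}
print(covers(S_small, [random.randrange(2 ** m) for _ in range(k)], m))  # → False
```

(The parameters here are too small for the exact inequality of Claim 1, which needs n > n_0, but the k·|S| counting that drives it is visible directly.)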