Approximate Verification


Approximate Verification

Michel de Rougemont
Université de Paris-II

Ekaterinenburg, April 2014


Contents

1 Probabilistic Classes  3
  1.1 Probabilistic decision classes  3
    1.1.1 Error amplification  6
    1.1.2 Comparing PP and #P  7
    1.1.3 Other probabilistic classes  8
  1.2 Examples of probabilistic algorithms  8
    1.2.1 Random walk in an undirected graph  8
    1.2.2 Perfect matching in a bipartite graph  11
    1.2.3 Solovay-Strassen primality test  13
  1.3 Exercises  16

2 Probabilistic verification  17
  2.1 Interactive proofs  18
  2.2 Examples of interactive protocols  19
    2.2.1 The problem of non-isomorphism of graphs  20
    2.2.2 A protocol for the permanent  21
    2.2.3 A protocol for QBF  24
  2.3 Multiprover protocols  28
  2.4 Probabilistically checkable proofs: PCP  29
    2.4.1 A PCP(n^3, 1) for 3SAT  31
  2.5 Exercises  37

3 Approximation  39
  3.1 Optimization problems  39
    3.1.1 Examples of approximation algorithms  41
    3.1.2 Approximation of optimization problems  44
    3.1.3 Reductions and completeness  50
    3.1.4 Non-approximability  52
  3.2 Counting problems  54
    3.2.1 Approximation of counting problems  54
    3.2.2 Approximation of #DNF  56
  3.3 Approximate verification  58
  3.4 Exercises  61

4 The PCP theorem  65
  4.1 Various PCP forms  65
  4.2 The main theorem  67
    4.2.1 Structure of the proof  67
    4.2.2 Expanders  68
  4.3 Preprocessing  69
  4.4 Dinur's amplification  70
  4.5 Composition for the alphabet reduction  74
  4.6 Proof of the main theorem  75

5 Property Testing  77
  5.1 Preliminaries  78
    5.1.1 Property Testing  78
    5.1.2 Words  78
    5.1.3 Ranked Trees  79
    5.1.4 Unranked Trees  79
  5.2 Testing regular languages  80
    5.2.1 Basic definitions  80
    5.2.2 The tester  81
    5.2.3 Robustness of the tester  83
  5.3 Testing regular ranked tree languages  84
    5.3.1 Basic definitions  84
    5.3.2 The tester  86
    5.3.3 Robustness of the tester  86
  5.4 Extension to unranked trees  89

6 Homework 1  91

Introduction

These chapters are the class notes for the course Approximate Verification, taught in April 2014. Some of the chapters are taken from the book Logic and Complexity, by R. Lassaigne and M. de Rougemont. The chapters on the PCP theorem and on Property Testing are new.

Given a large finite structure U, we study how difficult it is to verify that U satisfies a property. For NP properties, there are small specific witnesses such that, given U and the witness, we can quickly verify the property. For coNP or PSPACE properties, small witnesses are not known. We relax the verifier to be a probabilistic algorithm, and hence only convince ourselves that the property is true with high probability. We consider three models which realize this goal: Interactive Proofs (IP), Probabilistically Checkable Proofs (PCP) and Property Testing. In the last model, we want to approximately verify a property of U in time independent of the size of U, i.e. without reading the entire data, by just observing a few samples. It is related to the area of sublinear algorithms and adapted to the area of "Big data".

We will cover the following subjects:

1. Probabilistic verification: interactive proofs, the PCP model, property testing. Graph non-isomorphism, the protocols for the Permanent and QBF.

2. The easy PCP model: NP = PCP(n^3, 1), using Walsh-Hadamard codes. We will only mention the hard PCP theorem, NP = PCP(log n, 1), proved via Dinur's gap amplification. Applications are lower bounds for optimization problems: for example, one can show that Max-Clique is hard to approximate, assuming the (hard) PCP theorem.

3. Property testing: a tester to decide if a long word w belongs to a regular expression (based on a paper with E. Fischer and F. Magniez), and recent work motivated by applications to "Big data".

We wish to thank Miklós Sántha for sharing his PCP notes with us.


Chapter 1

Probabilistic Classes

In the previous chapters, the complexity classes concerned problems which could be solved by procedures without error. We now consider probabilistic procedures such that, if x ∈ L, the probability to accept is close to 1 and, if x ∉ L, the probability to reject is close to 1.

A procedure without error is implemented on a sequential or parallel machine model, made of hardware components which are not completely reliable. The software environment (Unix, for example) is also error prone, as every computer scientist will attest. Let us suppose a rate of error of 10^{-8} for a given machine. An imperfect procedure with a rate of error of 10^{-10} will not be distinguishable from a perfect procedure, because the error probability due to the algorithm is negligible in comparison with the error probability due to the machine. In practice, a probabilistic algorithm can therefore be extremely useful. The probabilistic complexity classes are also fundamental from the theoretical point of view, and the interaction between logic and randomness is a very rich subject.

In the previous chapters on deterministic complexity classes, we showed how certain problems are intrinsically difficult. We now ask whether these problems can be solved by probabilistic algorithms with a gain in time or space. In the first section, we define the main probabilistic classes PP, RP and BPP, and in Section 1.2 we give some classical examples of probabilistic algorithms.

1.1 Probabilistic decision classes

A probabilistic algorithm is a constructive procedure which uses a new instruction: the random choice. One can flip a coin, or randomly choose a value among k equiprobable distinct values. In this chapter, a random choice is the selection of the value 0 or 1, each with probability 1/2. We can associate a non-deterministic machine with a probabilistic algorithm: just consider each random choice as a non-deterministic instruction.
We define the notion of probabilistic acceptance of a language through the computation tree of specific non-deterministic Turing machines.

Definition 1.1. A probabilistic Turing machine is a non-deterministic Turing machine whose computation tree is a complete binary tree, i.e. all the branches have the same length.

A probabilistic execution starts with the initial description (q_0, x) and follows at each step a possible transition of the machine. At each non-deterministic node, the machine makes a random choice between two possible transitions, with uniform distribution. The computation tree is a complete binary tree of depth t where all the paths have equal probability, and the probabilistic space is

Ω_t = {(ρ, 1/2^t) : ρ ∈ {0, 1}^t}

The notion of acceptance is defined by comparing the number of accepting paths acc_M(x) and rejecting paths rej_M(x). Equivalently, the acceptance (rejection) probability is the quotient of the number of accepting (rejecting) paths by the total number of paths:

Pr_Ω[M accepts x] = acc_M(x)/2^t,    Pr_Ω[M rejects x] = rej_M(x)/2^t

A run, or experiment, is a path in the computation tree leading to an accepting or rejecting state. There are three main probabilistic classes associated with the class P, which differ in their probabilistic acceptance conditions:

1. PP (Probabilistic P)
2. RP (Random P)
3. BPP (Bounded-error Probabilistic P)

The important class for practical applications is BPP, and RP is one of its subclasses. The class PP is important for theoretical purposes.

Definition 1.2. PP is the class of languages L for which there is a probabilistic polynomial-time bounded Turing machine M such that:

if x ∈ L, then Pr_Ω[M accepts x] > 1/2,
if x ∉ L, then Pr_Ω[M accepts x] ≤ 1/2.

An important property of the class PP is:

Proposition 1.1. NP ⊆ PP.

Proof: Let L ∈ NP be a language defined by a machine M. Let us transform M into M' such that the computation tree is complete and such that M' accepts x iff M accepts x. Let us define a new machine M'' whose depth is the depth of M' plus one: on the first left branch, we duplicate the computation tree of M', and on the first right branch, we construct a complete binary tree of accepting states only, as in the figure below. Then Pr_Ω[M'' accepts x] = 1/2 + Pr_Ω[M' accepts x]/2, so M accepts x iff M' has an accepting path iff Pr_Ω[M'' accepts x] > 1/2. One concludes that L ∈ PP.

Figure 1.1: The computation tree of M'', which accepts with probability > 1/2 iff x ∈ L.

The classes RP and BPP are more interesting from a practical and algorithmic point of view.

Definition 1.3. RP is the class of languages L for which there is a probabilistic polynomial-time bounded Turing machine M such that:

if x ∈ L, then Pr_Ω[M accepts x] > 1/2,
if x ∉ L, then Pr_Ω[M accepts x] = 0.

Notice the asymmetry of this definition with regard to x ∈ L and x ∉ L. A language is in the class coRP if its complement is in the class RP. Another important class is ZPP = RP ∩ coRP. If a problem is in ZPP, one can run on an input x an RP algorithm and, independently, a coRP algorithm: with probability greater than 1/2, the two algorithms give coherent answers. If we repeat this computation k times, the probability of never obtaining coherent answers is smaller than 1/2^k. When one repeats both algorithms until coherent answers are received, one obtains a probabilistic algorithm without error whose expected time complexity is polynomial. Such algorithms are also called Las Vegas algorithms.

Definition 1.4. BPP is the class of languages L for which there is a probabilistic Turing machine M working in polynomial time and a constant ε > 0 such that:

if x ∈ L, then Pr_Ω[M accepts x] ≥ 1/2 + ε,

if x ∉ L, then Pr_Ω[M accepts x] ≤ 1/2 − ε.

Notice that ε must be a constant, independent of the input x. The class BPP is also known as the class of problems which can be solved by Monte Carlo algorithms.

A fundamental difference exists between the classes PP and BPP. Let x be an input of size n and m = 2^n the number of leaves of the computation tree. If the number of accepting leaves is m/2 + 1, then x is accepted in the PP sense. In order to be accepted in the BPP sense, there must be an ε, independent of x, such that the number of accepting leaves is greater than m·(1/2 + ε).

Example. Consider the computation tree below for m = 16 and ε = 0.09. The number of accepting leaves (9) is greater than 16/2 = 8 but not greater than 16/2 + 16·0.09 = 9.44, so the input is accepted in the PP sense but not in the BPP sense.

Figure 1.2: A probabilistic computation tree with 16 final states, 9 accepting and 7 rejecting.

A fundamental property of the class BPP is that the error probability 1/2 − ε can be made arbitrarily small, in particular very close to 0. This property is called error amplification and is used repeatedly in all probabilistic situations.

1.1.1 Error amplification

In the case of an RP algorithm, the probability of error is the probability of not accepting when x ∈ L, and this value is at most some p < 1/2 by the definition of RP. Let us repeat k independent experiments and accept x if at least one of the experiments accepts. In this case, the error probability is the probability of obtaining k rejecting answers, i.e. at most p^k < (1/2)^k.

Let us show that if a problem is in the class BPP, one can also reduce the error probability to (1/2)^k, i.e. to an exponentially small amount, after O(k) experiments. Let us iterate the BPP algorithm 2m + 1 times and accept according to a majority test, i.e. if the number of accepting experiments is greater than m. Let p = 1/2 − ε be the error probability of one experiment and q = 1 − p.
The error probability of this new procedure is the probability of not accepting when we should have, i.e.

the probability µ of obtaining no more than m accepting answers. There are C(2m+1, i) ways of obtaining i accepting answers among the 2m + 1 experiments, so µ is the sum over i = 0, ..., m of the probability of obtaining i accepting answers, hence 2m + 1 − i rejecting answers:

µ = Σ_{i=0}^{m} C(2m+1, i) p^{2m+1−i} q^i = p·p^m·q^m·Σ_{i=0}^{m} C(2m+1, i)·(p/q)^{m−i} ≤ p·p^m·q^m·Σ_{i=0}^{m} C(2m+1, i) = p·p^m·q^m·2^{2m}

since p/q < 1. By bounding p by 1, one obtains

µ ≤ p^m·q^m·2^{2m} = (4pq)^m

If m = c·k, then µ ≤ [(4pq)^c]^k. But 4pq = 4·(1/2 − ε)·(1/2 + ε) = 1 − 4ε^2 < 1; there exists a constant c such that (4pq)^c < 1/2, and then µ ≤ (1/2)^k.

The key of the argument rests on the facts that µ ≤ (4pq)^m and that 4pq = 1 − 4ε^2 < 1 − δ for some constant δ > 0. In the case of a PP algorithm, one cannot conclude that 4pq < 1 − δ: the probability of error can be p = 1/2 − (1/2)^n for a computation tree of depth n. One obtains then

4pq = 4·(1/2 − (1/2)^n)·(1/2 + (1/2)^n) = 1 − (1/2)^{2n−2}

In order to bound the error probability by (1/2)^k, we need to find c such that (1 − (1/2)^{2n−2})^c < 1/2, and obtain c > 2^{2n−3}. We would need to repeat 2m + 1 = 2ck + 1 > 2k·2^{2n−3} + 1 times, i.e. an exponential number of experiments.

1.1.2 Comparing PP and #P

Recall from Chapter ?? the definition of the counting class #P.

Lemma 1.1. The classes P^PP and P^#P coincide.

Proof: Let us first show that P^PP ⊆ P^#P. If L ∈ P^PP, there is a polynomial-time Turing machine which accepts L with calls to an oracle of the class PP. Each call to an oracle A takes an input y and answers 1 if more than half of the leaves of the computation tree accept, and 0 otherwise. Suppose, without loss of generality, that the computation tree of A does not have all its leaves accepting. If m is the depth of the computation tree, a #P oracle computes the exact number of accepting leaves in binary, with at most m + 1 bits: this number, different from 2^m, is greater than 2^{m−1} iff its mth bit is 1.
An oracle of the class #P, which computes the exact value, can thus also output the mth bit. One can replace the oracle of the class PP with an oracle of the class #P, and L ∈ P^#P.

Let us now show that P^#P ⊆ P^PP. If L ∈ P^#P, we can replace the oracle A of the class #P by several calls to an oracle of the class PP. Let z = z_m z_{m−1} ... z_1 be the answer of the #P oracle, where z_i is the ith bit of the binary representation of z. A first call to the oracle A in the PP sense gives us the mth bit. Let A_1 be the oracle whose computation tree is one step deeper than the computation tree of A: on the left branch we duplicate the computation tree of A, and on the right branch we generate a tree of

the same depth as the computation tree of A, with 2^m − 3·2^{m−2} accepting leaves (hence 3·2^{m−2} rejecting leaves) if the mth bit is 1, and with 3·2^{m−2} accepting leaves (hence 2^m − 3·2^{m−2} rejecting leaves) if the mth bit is 0. The PP answer on A_1 then gives us the (m−1)th bit. If we iterate this construction m times, we obtain all the bits of the binary representation of z. A call to an oracle of the class #P can be replaced by m calls to an oracle of the class PP, and L ∈ P^PP.

1.1.3 Other probabilistic classes

The three classes PP, BPP and RP generalize to other complexity classes: just replace P by the corresponding deterministic class, for example L or NC. We shall consider RL, the analogue of RP for logarithmic space, and RNC, the analogue of RP for parallel time in the sense of NC.

1.2 Examples of probabilistic algorithms

We give three simple examples of probabilistic algorithms. In each case, the probabilistic algorithm has an advantage over the known deterministic algorithms, in either time or space:

a random walk for the problem UGAP, an RL algorithm;
a randomized algorithm which decides the existence of a perfect matching, an RNC algorithm;
a primality test, a coRP algorithm.

The primality problem was the standard BPP problem, not known to be in P until 2002, when a P algorithm appeared [AKS02]. Similarly, an L algorithm for UGAP appeared in 2005 (Reingold), and an NC algorithm for perfect matchings may yet appear. Nevertheless, these probabilistic algorithms remain extremely simple, elegant and interesting, and the randomized primality tests are the algorithms used in practice.

1.2.1 Random walk in an undirected graph

Let G = (V, E) be an undirected graph with n nodes and e edges. UGAP, the Undirected Graph Accessibility Problem, is the decision problem defined as:

Input: an undirected graph G and two nodes s and t.
Output: 1 if there is a path between s and t, 0 otherwise.
UGAP is the restriction of GAP, the Graph Accessibility Problem, to the case where the edge relation is symmetric. It is computable in polynomial time, because there are P algorithms to compute a path between two nodes s and t, and it is in the class NL, since GAP is NL-complete. Is this problem in the class L? A positive answer was given by Reingold in 2005, by a sophisticated deterministic construction. Let us show here the much simpler result that there exists a probabilistic algorithm in space

O(log n), i.e. that UGAP is in the class RL. We analyze a random walk from s which, on a positive instance, will eventually reach t.

Probabilistic algorithm for UGAP. Iterate k times the following procedure:

UGAP := 0; u := s; i := 1.
While i < 2n^3:
  consider the edges whose origin is u and select a random edge (u, u') with uniform distribution¹;
  u := u'; i := i + 1;
  if u = t, then UGAP := 1.

The space requirement of this algorithm consists of two registers: one to code u, an arbitrary node in a graph with n nodes, and one for the integer i, whose value is less than 2n^3. Both use O(log n) bits.

Let us show that a random walk of length O(n^3) has probability greater than 1/2 of finding t if there exists a path between s and t. The proof uses classical results on Markov chains [MR95] to estimate the average time T(i) necessary to visit all the nodes from a given node i. This average time will be bounded by O(n^3), and we will conclude that a random walk of length greater than twice this bound reaches t with probability greater than 1/2 whenever a path from s to t exists.

The previous algorithm defines a Markov process with a transition matrix A, where a_{i,j} is the probability of reaching node i from node j. We can compute A, A^2, ..., A^k; the matrix A^k gives the probability of reaching a node i from a node j after k steps. The stationary probabilities, if they exist, are the limit probabilities on the nodes, represented by the vector π such that A·π = π and Σ_{i=1}^{n} π(i) = 1. If d(i) is the degree of node i, then π(i) = d(i)/2e, where e is the number of edges (i, j) ∈ E: this expression is a solution of the previous equation. A classical result on Markov chains states that if the chain is irreducible (the graph G is connected), finite and aperiodic, then π is unique.

Definition 1.5.
Let t(i, j) be the expected number of transitions to reach j from i, and let T(i) be the expected number of transitions necessary for a random path to visit all the nodes from i.

¹ If there are m edges with origin u, each one is selected with probability 1/m.

The expected return time to a node i is the inverse of the stationary probability π(i), i.e. equal to 2e/d(i). For an edge a, let f(a) be the number of times a random walk crosses the edge a, and IE(f(a)) the expectation of this random variable, i.e. the average number of crossings of the edge a. We use the fact that, in the stationary regime and per step,

IE(f(a)) = 1/2e

This expectation is also called the stationary frequency of an edge, and it is independent of the edge considered.

Lemma 1.2. If i and j are two adjacent nodes of G, then t(i, j) + t(j, i) ≤ 2e.

Proof: If i and j are adjacent, then t(i, j) + t(j, i), the average number of transitions for a random path going from i to j and back, is 2e times E_a, the expected number of occurrences of the edge a = (i, j) on such a round trip: by the previous remark this expectation is the same for every edge, so t(i, j) + t(j, i) = 2e·E_a. The expectation E_a is at most 1, because numerous paths from i to j and back do not follow this edge². Therefore t(i, j) + t(j, i) ≤ 2e.

From this lemma we deduce that, if d(i, j) is the distance between i and j, i.e. the number of edges of a shortest path between i and j, then t(i, j) + t(j, i) ≤ 2e·d(i, j).

Lemma 1.3. T(i) ≤ 2e·(n − 1).

Proof: Let H be a spanning tree of the graph G. To explore all the nodes of G from a node i, it suffices to cross every edge of H in both directions. Therefore

T(i) ≤ Σ_{(j,j')∈H} (t(j, j') + t(j', j))

and since, by the previous lemma, each of the n − 1 terms is at most 2e, we conclude that T(i) ≤ 2e·(n − 1).

Notice that e ≤ n(n − 1)/2, so T(i) < n^3.

Theorem 1.1. The problem UGAP is in the class RL.

Proof: Generate a random path from s of length 2n^3. If there exists a path between s and t, then the expected time to reach t is at most T(s) < n^3, so by Markov's inequality the probability of finding t is greater than 1/2. If there is no path between s and t, this procedure does not make any error. We repeat this procedure k times: the rate of error is less than 1/2^k, and we obtain an RL algorithm.
² If every path from i to j must cross the edge a, the edge is called an isthmus (a bridge), and E_a = 1.
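The random walk above can be sketched in a few lines. This is a minimal illustration only (the function name ugap_random_walk and the adjacency-list representation adj are ours, not the notes'), not the formal logarithmic-space machine:

```python
import random

def ugap_random_walk(adj, s, t, k=20):
    """Randomized test for undirected s-t connectivity (the RL
    algorithm of Section 1.2.1).

    adj maps each node to the list of its neighbours (the graph is
    undirected, so every edge appears in both lists).  Runs k
    independent random walks of length 2*n**3 from s and answers True
    as soon as one walk reaches t.  One-sided error: True is always
    correct; False is wrong with probability at most (1/2)**k when s
    and t are in fact connected.
    """
    n = len(adj)
    if s == t:
        return True
    for _ in range(k):
        u = s
        for _ in range(2 * n**3):
            u = random.choice(adj[u])  # uniform choice among incident edges
            if u == t:
                return True
    return False
```

Only the current node and a step counter need to be stored, which is the O(log n)-space argument of the proof.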

1.2.2 Perfect matching in a bipartite graph

The perfect matching problem takes a bipartite graph G with 2n nodes and decides if there exists a set of n edges which defines a matching, i.e. such that every node is covered by exactly one edge. This problem is computable in polynomial time, i.e. it is in the class P, but no NC algorithm is known for it. We are going to show that the perfect matching problem is in the class RNC.

Definition 1.6. A matching M is a subset of pairwise non-adjacent edges. If |V| is even, a matching is perfect if M contains |V|/2 edges.

Let G be a graph and A its adjacency matrix. The Tutte matrix associated with the graph G is the matrix B of indeterminates defined by:

b_{i,j} = x_{i,j} if i < j and a_{i,j} = 1,
b_{i,j} = −x_{j,i} if i > j and a_{i,j} = 1,
b_{i,j} = 0 otherwise.

One associates with the Tutte matrix the polynomial defined as the determinant of the matrix, det(B), a polynomial in the variables x_{i,j}. The value of a permutation σ of {1, 2, ..., n} is

value(σ) = Π_{i=1}^{n} b_{i,σ(i)}

If cycles(σ) is the number of cycles of σ and sign(σ) = (−1)^{n+cycles(σ)}, then

det(B) = Σ_σ sign(σ)·value(σ)

A perfect matching corresponds to a particular permutation σ, a product of 2-cycles pairing the matched nodes, such that value(σ) ≠ 0.

Theorem 1.2. det(B) = 0 iff the graph G has no perfect matching.

Proof: If G has a perfect matching, the associated permutation has a non-zero monomial as value; if we give non-zero values to the x_{i,j} of this monomial and zero values to all the others, the determinant is not null, so det(B) is not the null polynomial. Conversely, if the determinant is not null, we can construct a matching from the non-null monomials, as follows. Let Σ_1 be the set of permutations which contain at least one cycle of odd length and Σ_2 the set of the other permutations:

det(B) = Σ_{σ∈Σ_1} sign(σ)·value(σ) + Σ_{σ∈Σ_2} sign(σ)·value(σ)

If the determinant is not null, there exists a permutation in Σ_2 with non-zero value. Indeed, consider a permutation σ ∈ Σ_1 and let C be its odd cycle of maximum length (a fixed point, i.e. a cycle of length 1, gives value(σ) = 0 since b_{i,i} = 0). Let σ' be the permutation σ where the cycle C is taken in the opposite direction. Since B is skew-symmetric and C has an odd number of edges, value(σ') = −value(σ), while sign(σ') = sign(σ), so the contributions of σ and σ' cancel: sign(σ)·value(σ) + sign(σ')·value(σ') = 0. Pairing in this way all the permutations of Σ_1,

Σ_{σ∈Σ_1} sign(σ)·value(σ) = 0

and consequently Σ_{σ∈Σ_2} sign(σ)·value(σ) ≠ 0: there is a permutation with even cycles only and non-zero value. Such a permutation defines a perfect matching by taking every other edge along its even cycles.

Lemma 1.4 (Schwartz). Let P be a non-zero polynomial with n variables of total degree d. On a domain of cardinality m, it has at most d·m^{n−1} roots.

Proof: We prove this result by induction on n. If n = 1, the polynomial has at most d roots. Assume the property true for n − 1 and let us show that it is true for n. Write P by isolating the variable x_n:

P(x_1, ..., x_n) = g_0(x_1, ..., x_{n−1}) + g_1(x_1, ..., x_{n−1})·x_n + ... + g_k(x_1, ..., x_{n−1})·x_n^k

Each polynomial g_i has total degree at most d − i; in particular, g_k has degree at most d − k in the variables x_1, ..., x_{n−1}. Let us count the tuples a = (a_1, ..., a_n) such that P(a) = 0 by distinguishing two cases: either g_k(a_1, ..., a_{n−1}) ≠ 0 and a_n is a root of the univariate polynomial P(a_1, ..., a_{n−1}, x_n), or (a_1, ..., a_{n−1}) is a root of g_k. In the first case, the number of tuples is at most k·m^{n−1}, because the degree of P in x_n is k. In the second case, the number of roots of g_k (which is of smaller degree) is bounded by (d − k)·m^{n−2} by the induction hypothesis, and a_n is arbitrary, so the number of tuples is at most (d − k)·m^{n−2}·m. One concludes that the number of roots is bounded by
k·m^{n−1} + (d − k)·m^{n−1} = d·m^{n−1}

as claimed. □

One can also formulate the previous lemma by saying that, for a uniformly random point of the domain,

Pr[P(x_1, ..., x_n) = 0] ≤ d/m

There are at most d·m^{n−1} roots in a space of size m^n, hence the probability that a random choice a is a root is bounded by d·m^{n−1}/m^n = d/m. Denote the set {1, 2, ..., N} by [1...N].

Corollary 1.1. Let P ≠ 0 be a polynomial with n variables with integer coefficients. If N ≥ c·degree(P), then there are at most N^n/c roots in the space [1...N]^n.
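The corollary yields a generic randomized identity test, of which the matching algorithm below is an instance. Here is a minimal Python sketch (the function `probably_zero` and its parameters are our own, chosen for illustration): the polynomial is given as a black box and evaluated at random points of [1...N]^n with N = c·degree, so a non-zero polynomial survives a trial with probability at most 1/c.

```python
import random

def probably_zero(poly, n_vars, degree, trials=20, c=100):
    """Schwartz-Zippel test: a non-zero polynomial of total degree d has
    at most d*m^(n-1) roots on a domain of size m, so a random point of
    [1..N]^n with N = c*degree is a root with probability <= 1/c."""
    N = c * max(degree, 1)
    for _ in range(trials):
        point = [random.randint(1, N) for _ in range(n_vars)]
        if poly(*point) != 0:
            return False   # witness found: definitely not identically zero
    return True            # identically zero, up to error (1/c)**trials

# (x+y)^2 - (x^2 + 2xy + y^2) is identically zero:
assert probably_zero(lambda x, y: (x + y)**2 - (x*x + 2*x*y + y*y), 2, 2)
# (x+y)^2 - (x^2 + y^2) = 2xy never vanishes on positive points:
assert not probably_zero(lambda x, y: (x + y)**2 - (x*x + y*y), 2, 2)
```

The test is one-sided, exactly as in the class RP: a non-zero answer is a certificate, while a zero answer is only probably correct.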

One can then imagine a probabilistic algorithm which selects random values c_{i,j} for the variables x_{i,j} in the interval [1...N] and evaluates det B_c = det(B[c_{i,j}/x_{i,j}]), i.e. the Tutte determinant where the values c_{i,j} replace the variables x_{i,j}, using for example Csanky's algorithm, an NC² procedure.

Algorithm for perfect matching: Iterate k times the procedure. Generate random values c_{i,j} ≤ N. If det B_c ≠ 0, then the graph has a perfect matching. Otherwise the graph has probably no perfect matching.

We can conclude with the analysis of this algorithm:

Theorem 1.3. The perfect matching problem is in the class RNC.

Proof: If det B_c ≠ 0, then the determinant det B is not identically null and the algorithm gives the correct answer: there exists a perfect matching. If det B_c = 0, then either det B = 0, or det B ≠ 0 but the random values form a root of the polynomial and the algorithm gives an incorrect answer. If N > 2n, where n is the total degree of the polynomial, the probability of selecting a root of the polynomial (the probability of error) is less than 1/2 by the previous remarks. If the procedure makes k independent trials, the probability of error μ is the probability of selecting a root of the polynomial (det B_c = 0) every time, and μ < 1/2^k. The computation of the determinant det B_c is realized by Csanky's algorithm in parallel time O((log n)²). The algorithm therefore shows that the perfect matching problem is in the class RNC. □

1.2.3 Solovay-Strassen primality test

Fermat's theorem states that if p is prime, then for any a with 1 ≤ a ≤ p − 1, the relation a^{p−1} ≡ 1 (mod p) holds, i.e. there exists an integer k such that a^{p−1} = k·p + 1. The converse is however false: there are composite numbers, called the Carmichael numbers, which satisfy the same property for every a coprime to them. To go further, we need basic results on quadratic residues. Consider the equation x² ≡ a (mod p), i.e. x² is congruent to a modulo p.
If there is a solution to the equation x² ≡ a (mod p), one says that a is a quadratic residue modulo p; otherwise a is a quadratic non-residue. There is a simple algorithm (in polynomial time) to decide if a is a quadratic residue. If p is prime, the Jacobi symbol of a and p is:

(a/p) = 0 if a ≡ 0 (mod p),
(a/p) = 1 if a is a quadratic residue,
(a/p) = −1 otherwise.

If p is not prime, let the decomposition of p into prime numbers be p = p_1^{a_1} ... p_k^{a_k}. In this case, the Jacobi symbol is:

(a/p) = Π_{i=1}^k (a/p_i)^{a_i}

The Jacobi symbol is the product of the symbols (a/p_1), (a/p_2), ..., (a/p_k) with their orders of multiplicity a_1, ..., a_k. The Jacobi symbol is an interesting function because it can be computed efficiently without knowing the prime decomposition of a and p. The remarkable result of number theory which we use is:

Theorem 1.4. If p is prime, then for all a such that 1 ≤ a ≤ p − 1,

(a/p) ≡ a^{(p−1)/2} (mod p)

If p is composite, then for more than half of the a,

(a/p) ≢ a^{(p−1)/2} (mod p)

In this last case, one says that a is a witness of the fact that p is composite, and there are many witnesses. Notice that the two conditions imply that p is prime iff for all a such that 1 ≤ a ≤ p − 1, (a/p) ≡ a^{(p−1)/2} (mod p).

We can then imagine the following probabilistic algorithm [SS77] to decide if a given number n is prime.

Solovay-Strassen primality test: Iterate k times the procedure. Generate a random a between 2 and n − 1 and compute the greatest common divisor (a, n). If (a, n) > 1, then n is composite. Otherwise, compute X = (a/n) mod n and Y = a^{(n−1)/2} mod n. If X ≠ Y, then n is composite. Otherwise n is probably prime.

Notice that this procedure relies on the fast computation of the greatest common divisor (with Euclid's algorithm for example) and of the Jacobi symbol. If n is prime, this algorithm always gives a correct answer. If n is composite, the probability of error is at most 1/2 at the first iteration, by the second condition of the previous theorem. The error probability is at most 1/4 after the second iteration, and at most 1/2^k after the k-th iteration. We can then conclude:

Theorem 1.5. The primality problem is in the class coRP.
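The test fits in a few lines of Python. The sketch below is illustrative (function names are our own); the Jacobi symbol is computed with the standard quadratic-reciprocity recurrence, which indeed needs no factorization of n.

```python
import math, random

def jacobi(a, n):
    """Jacobi symbol (a/n) for odd n > 0, by quadratic reciprocity."""
    a %= n
    result = 1
    while a:
        while a % 2 == 0:            # extract factors of 2 from a
            a //= 2
            if n % 8 in (3, 5):      # (2/n) = -1 when n = 3, 5 (mod 8)
                result = -result
        a, n = n, a                  # reciprocity: both arguments now odd
        if a % 4 == 3 and n % 4 == 3:
            result = -result
        a %= n
    return result if n == 1 else 0

def solovay_strassen(n, k=20):
    """k iterations of the test: True means 'probably prime' (error
    probability < 2**-k), False means 'certainly composite'."""
    if n < 2 or n % 2 == 0:
        return n == 2
    for _ in range(k):
        a = random.randint(2, n - 1)
        if math.gcd(a, n) > 1:
            return False
        x = jacobi(a, n) % n         # represent -1 as n - 1
        if x != pow(a, (n - 1) // 2, n):
            return False             # a is a witness: n is composite
    return True

assert solovay_strassen(97)             # 97 is prime
assert not solovay_strassen(91, k=64)   # 91 = 7 * 13
```

The three-argument `pow` performs modular exponentiation in polynomial time, so each iteration costs O((log n)³) bit operations with schoolbook arithmetic.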

Another, more complex, algorithm shows that the problem is also in the class RP, and we conclude that the primality problem is in the class ZPP.

Bibliographical notes. The probabilistic Turing machines were introduced by J. Gill [Gil77], as well as the classes RP, BPP and PP. The importance of probabilistic machines was shown by the primality tests of [Rab80] and [SS77]. The use of the random walk for the problem UGAP is detailed in [AKR+79], and the probabilistic algorithm for perfect matching is due to [MVV87]. Probabilistic algorithms are studied in detail in the book of R. Motwani and P. Raghavan [MR95].

1.3 Exercises

1. Elementary properties of the classes RP, BPP and PP:
(a) show that the classes BPP and PP are closed under complement;
(b) show that the classes BPP and RP are closed under union, intersection and Cartesian product;
(c) show that the class PP is closed under intersection and Cartesian product [BRS91].

2. Search for a perfect matching [MVV87]. Generalize the probabilistic algorithm which decides the existence of a perfect matching, and show that one can also explicitly find a perfect matching when it exists, using an RNC algorithm.

3. Miller-Rabin primality test [Rab80]. Let n be an odd integer and b an integer between 1 and n. Let m, t be integers such that n − 1 = 2^t·m, where m is odd. Let W_n ⊆ {0, 1, ..., n−1} be a unary relation defined by: b ∈ W_n if
(a) b^{n−1} ≢ 1 (mod n), or
(b) ∃i ∈ {0, 1, ..., t−1} such that b^{2^i·m} ≢ 1 (mod n), b^{2^i·m} ≢ −1 (mod n) and b^{2^{i+1}·m} ≡ 1 (mod n).
The ultimate goal of the exercise is to show that if n is composite, then |W_n| ≥ (n−1)/2.
(a) Show that if n is prime, then W_n is the empty set.
(b) Suppose there exists a ∈ {1, ..., n−1} such that a^{n−1} ≢ 1 (mod n). Show that B = {b : b^{n−1} ≡ 1 (mod n)} is a subgroup of Z*_n = {b ∈ {0, 1, ..., n−1} : (b, n) = 1}. Show that W̄_n, the complement of W_n, is a subset of B and that |B| ≤ (n−1)/2.
(c) Suppose that for all a ∈ {1, ..., n−1}, a^{n−1} ≡ 1 (mod n).
i. For i = 1, ..., t, show that B_i = {b : b^{2^i·m} ≡ 1 or b^{2^i·m} ≡ −1 (mod n)} is a subgroup of Z*_n.
ii. Let f be the function such that f(x) = 0 if x^m ≡ 1 and f(x) = i if x^{2^i·m} ≡ −1. Show that f is well defined and that for any x ∉ W_n, f(x) ≤ t − 1.
iii. Let B = B_j where j = Max{f(x) : x ∉ W_n}. Show that W̄_n ⊆ B and that |B| ≤ (n−1)/2.
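The witness set of Exercise 3 is the basis of the Miller-Rabin test. A Python sketch (names are our own) using the standard equivalent characterization of the non-witnesses, namely b^m ≡ 1 or b^{2^i·m} ≡ −1 (mod n) for some i < t:

```python
import random

def is_witness(b, n):
    """True iff b witnesses that the odd number n is composite."""
    m, t = n - 1, 0
    while m % 2 == 0:                # decompose n - 1 = 2^t * m, m odd
        m //= 2
        t += 1
    x = pow(b, m, n)
    if x in (1, n - 1):
        return False                 # b is not a witness
    for _ in range(t - 1):           # square up to b^(2^(t-1) * m)
        x = x * x % n
        if x == n - 1:
            return False
    return True                      # b^(n-1) != 1, or a bad square root of 1

def miller_rabin(n, k=40):
    """'Probably prime' for odd n > 3, with error probability < 4**-k,
    since at least half of the candidates are witnesses."""
    return not any(is_witness(random.randrange(2, n - 1), n)
                   for _ in range(k))
```

Unlike the Fermat test, this test has many witnesses even for Carmichael numbers such as 561, so no composite number slips through with noticeable probability.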

Chapter 2

Probabilistic verification

The class NP is often called the class of problems verifiable in polynomial time. Indeed, a language L is in the class NP iff there exists a binary relation R which satisfies:
R is decidable in polynomial time;
there exists k such that if R(x, y), then |y| = O(|x|^k);
L = {x : there exists y such that R(x, y)}.
We say that y is a witness, or a proof, of the fact that x ∈ L. This situation can be represented as follows: a prover P transmits a proof y to a verifier V to convince him that x ∈ L. The verifier accepts x iff R(x, y) is satisfied, and this can be computed in polynomial time.

Figure 2.1: The classical schema between a prover P and a verifier V.

The classes introduced in the previous chapter are associated with decision problems and generalize the classic complexity classes. We wish to generalize NP as we generalized P to BPP. Probabilistic verification is the problem of verifying, with a probabilistic algorithm, whether two values a, b are such that f(a) = b for a given function f. The verifier will follow a procedure which uses both randomness and interaction with a prover, to test whether the prover knows that f(a) = b. Probabilistic verification generalizes the previous schema in two specific ways: the verifier uses randomness, and interaction, i.e. he determines new questions to the prover as the exchange proceeds. The objective is to show that some problems which have long proofs (of exponential length) can be verified in polynomial time with high probability. This new notion modifies the classical notion of a proof in logic.

In the first section, we study the class IP of Interactive Proofs. In the second section we give classical examples. In the third section we study MIP, the extension of IP to multiple provers, and in the fourth section the class PCP of Probabilistically Checkable Proofs.

2.1 Interactive proofs

The previous schema is generalized: the verifier interacts with the prover and follows a BPP algorithm. We consider a model where the random bits are secret, i.e. not known to the prover.

Figure 2.2: Interaction between a prover P and a verifier V in the IP model.

Let P and V be two Turing machines which communicate through a common tape where they exchange messages. The machine V uses a secret¹ random coin and follows a BPP algorithm, whereas the prover P has no constraints². The input x is known to both P and V, and the two machines exchange messages on the common tape. The exchanges are described by a sequence of functions V_ρ^1, ..., V_ρ^k which define the messages generated by V (also called the questions to the prover) and a sequence of functions P^1, ..., P^{k−1} which define the messages generated by P (also called the answers of the prover). The verifier accepts or rejects in polynomial time: one writes P.V(x) = 1 if V accepts and P.V(x) = 0 if V rejects.

Let x be an input of length n and ρ ∈ {0,1}^m a random word representing the random bits generated by V. A protocol is a sequence of functions V_ρ^1, ..., V_ρ^k and of functions P^1, ..., P^{k−1}, computable in polynomial time. Each V_ρ^j is a function from ({0,1}*)^j into {0,1}* for 0 < j < k, where k ≤ n^p. Each P^j is a function from ({0,1}*)^{j+1} into {0,1}*. The last function V_ρ^k maps ({0,1}*)^k into {0,1}. We write:

V computes u_1 = V_ρ^1(x)
P computes v_1 = P^1(x, u_1)
V computes u_2 = V_ρ^2(x, v_1)
P computes v_2 = P^2(x, u_1, u_2)

¹The prover ignores the result of the coin flipping. In another model, the AM (Arthur-Merlin) games, the coin flipping is public, i.e. known to the prover.
The two models are equivalent. ²The prover can in principle use non-recursive oracles. A prover limited to the class PSPACE would, however, yield an equivalent model.

...
P computes v_{k−1} = P^{k−1}(x, u_1, u_2, ..., u_{k−1})
V computes u_k = V_ρ^k(x, v_1, ..., v_{k−1}) ∈ {0, 1}

The functions V_ρ^i depend on the random vector ρ, and the last value u_k is the decision of the verifier V. The result u_k = 1 (resp. u_k = 0) is also written P.V(x) = 1 (resp. P.V(x) = 0). We consider the following probability:

Pr_ρ[P.V(x) = 1]

where the probabilistic space is the set of random boolean sequences of length m, where m is O(n^r), all equiprobable.

Definition 2.1. A language L admits an interactive proof if there is a protocol such that for all x:
if x ∈ L, there exists a prover P such that Pr[P.V(x) = 1] = 1;
if x ∉ L, then for any prover P, Pr[P.V(x) = 1] ≤ 1/2.

Notice that the first condition assumes no error. In a more general definition, we could allow an error in both conditions and require Pr[P.V(x) = 1] ≥ 1 − ε. The first condition is sometimes called the completeness and the second condition the soundness. An honest prover always satisfies the first condition.

Definition 2.2. The class IP is the class of languages L for which there is an interactive proof.

Notice that NP ⊆ IP, because the NP protocol reduces to a single interaction. In the case of SAT, the verifier asks the prover for a valuation which satisfies all the clauses and then verifies it in polynomial time. In the same way, coRP ⊆ IP. The first indication that IP is a class much larger than NP is the existence of protocols for problems in coNP. The most classical example is the protocol for the non-isomorphism of graphs, described below.

2.2 Examples of interactive protocols

We describe three protocols for the following problems:
the problem of the non-isomorphism of graphs, a problem in the class coNP;
the verification of the permanent, a problem of the class #P, the counting class introduced in chapter ??;
the QBF problem, a PSPACE-complete problem.
The first protocol is very simple and explains the interest of such a model.
The protocol for the verification of the permanent introduces fundamental algebraic techniques and also uses Schwartz s lemma. The protocol for QBF is a generalization of the protocol for the permanent.

2.2.1 The problem of non-isomorphism of graphs

The problem of the non-isomorphism of graphs is defined as follows:

GRAPH.NON.ISO:
Input: two graphs (G_1, G_2) with n nodes.
Output: 1 if G_1 ≇ G_2, 0 otherwise.

If we interchange the output values, i.e. if the output is 0 if G_1 ≇ G_2 and 1 otherwise, we define the graph isomorphism problem, which is in the class NP. In this case, the prover transmits a permutation π which he claims satisfies G_2 = π(G_1). The verifier checks that for all nodes a, b, we have E_1(a, b) iff E_2(π(a), π(b)), which can be done in time O(n²). The graph non-isomorphism problem is therefore in the class coNP, and it seems that any prover may have to consider all the possible permutations π and make sure that none is an isomorphism: this constitutes a proof of exponential length. We can however design an interactive protocol where the prover and the verifier exchange O(n²) bits, with a probabilistic verifier bounded in time O(n²).

Suppose we select, with classical probabilistic techniques [MR95], a random permutation π on the finite domain D_n (the set of nodes) of a graph G = (D_n, E), with the uniform distribution. One can generate a graph G' = (D_n, E') isomorphic to the given graph G by defining E'(i, j) iff E(π(i), π(j)). This construction is used by the following interactive protocol.

Protocol for GRAPH.NON.ISO: x = (G_1, G_2).
1. V picks a random α ∈_r {1, 2} and constructs a new graph G isomorphic to G_α by choosing a random permutation π such that G = π(G_α). The verifier V transmits this graph G to P.
2. P sends β ∈ {1, 2} to V.
3. If β ≠ α, then P.V(x) = 0, otherwise P.V(x) = 1.

The verifier constructs an isomorphic copy G of G_1 or of G_2 and asks the prover to decide if the copy G comes from G_1 or from G_2. Equivalently, he asks the prover to recover the value of the random bit α, which is secret.
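The protocol is easy to simulate. In the Python sketch below (all names are our own), a brute-force isomorphism test plays the role of the all-powerful prover; the point of the protocol is that the verifier's side only needs a random permutation.

```python
import itertools, random

def permute(graph, perm):
    """Isomorphic copy of an adjacency matrix: E'(i,j) iff E(perm[i], perm[j])."""
    n = len(graph)
    return [[graph[perm[i]][perm[j]] for j in range(n)] for i in range(n)]

def isomorphic(g1, g2):
    """Brute-force isomorphism test -- exponential, but only the
    all-powerful prover ever runs it, never the verifier."""
    return any(permute(g1, list(p)) == g2
               for p in itertools.permutations(range(len(g1))))

def gni_round(g1, g2, prover):
    """One round: V draws a secret alpha, sends a random isomorphic copy
    of G_alpha, and accepts iff the prover's answer beta equals alpha."""
    alpha = random.choice([1, 2])
    perm = list(range(len(g1)))
    random.shuffle(perm)
    g = permute(g1 if alpha == 1 else g2, perm)
    return prover(g1, g2, g) == alpha

def honest_prover(g1, g2, g):
    # Answers the origin of g; correct whenever g1 and g2 are non-isomorphic.
    return 1 if isomorphic(g, g1) else 2

# Triangle vs. path on 3 nodes: non-isomorphic, so the prover always wins.
tri  = [[0,1,1],[1,0,1],[1,1,0]]
path = [[0,1,0],[1,0,1],[0,1,0]]
assert all(gni_round(tri, path, honest_prover) for _ in range(20))
```

When g1 and g2 are isomorphic, the copy g carries no information about alpha, and any prover, honest or not, is accepted in a round with probability exactly 1/2.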
The correct answer of the prover is β = 1 if G ≅ G_1 and β = 2 if G ≅ G_2, which can be computed if the two graphs G_1 and G_2 are non-isomorphic. The verifier V interprets a correct answer of P as evidence that the graphs are not isomorphic, and an incorrect answer as evidence that the graphs are isomorphic. Let us show that this procedure is an interactive proof.

Theorem 2.1. The problem GRAPH.NON.ISO is in the class IP.

Proof: If x = (G_1, G_2) ∈ GRAPH.NON.ISO, there is a prover which never makes a mistake and always computes β = α. For every random choice,

Pr[P.V(x) = 1] = 1

If x = (G_1, G_2) ∉ GRAPH.NON.ISO, the graphs G_1 and G_2 are isomorphic. The new graph G satisfies G ≅ G_1 ≅ G_2, and the prover cannot distinguish the origin of G. The probability that the prover selects β ≠ α is 1/2, in which case P.V(x) = 0 and there is no error. The probability that the prover selects β = α is also 1/2, in which case P.V(x) = 1 and there is an error. The second condition of an IP protocol is therefore verified. □

It is clear that by repeating the test k times, the probability of error becomes 1/2^k.

2.2.2 A protocol for the permanent

The permanent of a matrix A of size (n, n) is defined as:

Perm(A) = Σ_{π∈S_n} Π_{i=1}^n a_{i,π(i)}

where S_n is the set of permutations of {1, 2, ..., n} and π is a permutation. The definition of the permanent is very close to that of the determinant, as we only add the monomials, without signs. It is however much more difficult to compute the permanent than the determinant. A precise formalization of this difficulty was given by Valiant [Val79], who showed that the computation of the permanent is #P-complete (see Chapter ??), #P being a class of functions such that the polynomial hierarchy PH is contained in the class P^{#P}.

The protocol LFKN introduced in [LFKN92] for the permanent generalizes a protocol introduced by Nisan to verify the number of models satisfying a propositional formula in CNF or DNF. The verification of the permanent is the decision problem associated with the language

PERM = {(A, a) : Perm(A) = a}

The verifier V is going to maintain a list of couples (A_i, a_i) such that if all the couples (A_i, a_i) satisfy Perm(A_i) = a_i mod p, for a prime number p chosen in advance, then V accepts. We say that such a statement (A_i, a_i) is true. All the numerical values are taken modulo the prime number p, chosen in advance by the verifier to guarantee that the lengths of the messages are of polynomial size. There are two stages: an extension stage and a reduction stage.
Extension stage: the list ((A, a)) (reduced to a single couple of dimension k) is replaced by a list ((A_1, a_1), (A_2, a_2), ..., (A_k, a_k)) where every A_i is of dimension k − 1, according to the co-factors equation

Perm(A) = Σ_{i=1}^k a_{1,i} · Perm(A_i)

where A_i is the i-th co-factor, of dimension k − 1, and a_i = Perm(A_i).

Reduction stage: the list ((A_1, a_1), (A_2, a_2), ..., (A_k, a_k)) with k elements of dimension j is reduced to a single element of dimension j: ((B, b)). One combines A_1, A_2, ..., A_k into a single matrix

B = δ_1(λ)·A_1 + δ_2(λ)·A_2 + ... + δ_k(λ)·A_k

obtained with the interpolation polynomials³ δ_i(λ), where for 0 < i ≤ k:

δ_i(λ) = Π_{0<j≤k, j≠i} (λ − j)/(i − j)

The expression δ_i(λ)·A_i is the matrix A_i where all the coefficients are multiplied by δ_i(λ). This procedure is called linearization and preserves the dimension j. The permanent of B is a polynomial Q in λ of degree less than k²:

Perm(B) = Q[λ] = Σ_{i≤k²} p_i·λ^i

The verifier V asks the prover P to transmit the coefficients p_i of this polynomial. In an equivalent way, the verifier could ask the prover for the permanent of k² specific matrices B[λ] and obtain the coefficients of the polynomial by interpolation. The verifier V verifies that:

if λ = 1, then Q[1] = Perm(A_1) = a_1,
...
if λ = k, then Q[k] = Perm(A_k) = a_k.

These tests are called the coherence tests: if one of the tests fails, the verifier rejects. The verifier V then chooses a random integer value λ_0 in the interval [1...p], where p is the prime number chosen in advance such that 2^n ≤ p ≤ n!·2^n (because Perm(A) ≤ n!·2^n). The verifier V then defines ((B, b)) with

B = δ_1(λ_0)·A_1 + δ_2(λ_0)·A_2 + ... + δ_k(λ_0)·A_k

and b = Q[λ_0]. The protocol LFKN can be described in the following way:

Protocol LFKN for the verification of the permanent:
Input: L = ((A_n, a)), where A_n = A is of dimension n. Choose a prime number p such that 2^n ≤ p ≤ n!·2^n. All the numerical values are considered modulo p. For i = n, ..., 1, iterate:
1. Let L = ((A_i, a)). For i = 1: if Perm(A_1) = a then P.V(x) = 1, otherwise P.V(x) = 0.
2. Extension step: V asks P for the values a_j of the permanents of the co-factor matrices A_{i−1}^j along the first line of A_i, for j = 1, ..., i. If

a = Σ_{j=1}^i a_{1,j} · a_j

³The interpolation polynomials satisfy the fundamental property: δ_i(λ) = 0 for λ = 1, ..., i−1, i+1, ..., k and δ_i(λ) = 1 for λ = i.

then the verifier V replaces ((A_i, a)) by ((A_{i−1}^1, a_1), (A_{i−1}^2, a_2), ..., (A_{i−1}^i, a_i)) following the development by co-factors; otherwise P.V(x) = 0.
3. Reduction step: V asks for the coefficients of the polynomial Q(λ) associated with the list ((A_{i−1}^1, a_1), (A_{i−1}^2, a_2), ..., (A_{i−1}^i, a_i)). If a coherence test is false, then P.V(x) = 0; otherwise V defines (B_{i−1}, b) by choosing a random λ_0 in the interval [1...p], with

B_{i−1} = δ_1(λ_0)·A_{i−1}^1 + δ_2(λ_0)·A_{i−1}^2 + ... + δ_i(λ_0)·A_{i−1}^i

and b = Q[λ_0].

Two lemmas show that the LFKN protocol is an IP protocol for the language PERM.

Lemma 2.1. If ((A, a)) is false, i.e. if Perm(A) = a is false, at least one statement in the list ((A_1, a_1), (A_2, a_2), ..., (A_k, a_k)) is false.

Proof: Consider the equation linking the permanent of A to the permanents of the co-factors, i.e.

Perm(A) = Σ_{i=1}^k a_{1,i} · Perm(A_i)

If the statement ((A, a)) is false while the equation a = Σ_{i=1}^k a_{1,i}·a_i checked in the extension step is satisfied, at least one of the statements (A_i, a_i) must be false. □

Lemma 2.2. In a reduction step, if each statement of ((A_1, a_1), (A_2, a_2), ..., (A_k, a_k)) is true and the prover is honest, then ((B, b)) is true. For any prover, if there is a false statement in ((A_1, a_1), (A_2, a_2), ..., (A_k, a_k)), then ((B, b)) is false with probability greater than 1 − n²/p.

Proof: An honest prover follows the interpolation method: if he starts with true statements, he obtains the true statement ((B, b)). If one of the statements is false, P has to send a polynomial Q̃ which passes all the coherence tests but differs from the true polynomial Q. For a random choice of λ_0, the probability that these two polynomials give the same value is at most d/p by Schwartz's lemma, where d is the degree of the polynomials. As d ≤ n², the probability that ((B, b)) is a false statement is greater than 1 − n²/p. □

Theorem 2.2. The language PERM is in the class IP.
Proof: If x ∈ L, there is an honest prover P which passes all the coherence tests, and at the last step Perm(A_1) = a, so the verifier accepts. If x ∉ L, then for any prover, according to Lemma 2.1, one of the statements (A_{i−1}^j, a_j) produced by an extension step is false. According to Lemma 2.2, (B_{i−1}, b) is then false with high probability, and for i = 1 the verifier rejects with high probability. □
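The two identities the verifier relies on, the co-factor expansion used in the extension step and the interpolation property δ_i(j) = 1 iff i = j used in the coherence tests, can be checked numerically. A Python sketch (brute-force permanent, all names our own; only the all-powerful prover would ever compute a permanent):

```python
import itertools, random

def perm_mod(A, p):
    """Permanent of A modulo p, by brute force over all permutations."""
    n = len(A)
    total = 0
    for pi in itertools.permutations(range(n)):
        prod = 1
        for i in range(n):
            prod = prod * A[i][pi[i]] % p
        total = (total + prod) % p
    return total

def cofactor(A, j):
    """Co-factor matrix A_j: delete the first row and column j."""
    return [row[:j] + row[j + 1:] for row in A[1:]]

def delta(i, k, lam, p):
    """Interpolation polynomial delta_i(lam) mod p: 1 at lam = i,
    0 at the other points j in 1..k."""
    num = den = 1
    for j in range(1, k + 1):
        if j != i:
            num = num * (lam - j) % p
            den = den * (i - j) % p
    return num * pow(den, p - 2, p) % p     # modular inverse, p prime

def combine(cofs, lam, p):
    """Linearization B(lam) = sum_i delta_i(lam) . A_i, entrywise mod p."""
    k, n = len(cofs), len(cofs[0])
    return [[sum(delta(i + 1, k, lam, p) * cofs[i][r][c] for i in range(k)) % p
             for c in range(n)] for r in range(n)]

p = 10**9 + 7
A = [[random.randrange(p) for _ in range(4)] for _ in range(4)]
cofs = [cofactor(A, j) for j in range(4)]

# Extension step: Perm(A) = sum_j a_{1,j} Perm(A_j)  (mod p)
assert perm_mod(A, p) == sum(A[0][j] * perm_mod(cofs[j], p) for j in range(4)) % p
# Coherence tests: B(j) coincides with A_j, so Perm(B(j)) = Perm(A_j)
for lam in range(1, 5):
    assert perm_mod(combine(cofs, lam, p), p) == perm_mod(cofs[lam - 1], p)
```

The verifier's work per round, in contrast, is only the evaluation of Q at the points 1, ..., k and at λ_0, plus the construction of B(λ_0).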

2.2.3 A protocol for QBF

The previous protocol can now be generalized to obtain a protocol for the problem QBF. This problem is PSPACE-complete, so we conclude that PSPACE ⊆ IP. Notice that it is easy to check that IP ⊆ PSPACE, as we can enumerate all possible random choices and for each one simulate the protocol in polynomial space.

The protocol introduces an important technique (already used for the problems #CNF and #DNF): the arithmetization of formulas, which associates with a formula ψ(x) a polynomial in x. The method consists in verifying a sequence of polynomials obtained by elimination of the variables, and is similar to the LFKN protocol for the permanent. Two problems then arise: the integers exchanged between the prover and the verifier can be too large, of the order of 2^{2^n}, and the degree of the polynomials can be too high, of the order of 2^n. The first problem is solved by taking all the integers modulo a prime number p fixed in advance. The second problem is solved by considering certain QBF formulas such that the degree of the associated polynomials remains polynomial. Every QBF formula is equivalent to a formula of this form.

The arithmetization of formulas

Suppose that the boolean formulas only use negations on boolean variables. If a boolean formula does not satisfy this condition, one can find an equivalent one which satisfies it by applying the rules of classical logic.

Example. The formula

∃x_1 ¬(x_1 ∧ x_2 ∧ x_3 ∧ ¬(x_1 ∧ x_2))

is equivalent to the formula

∃x_1 (¬x_1 ∨ ¬x_2 ∨ ¬x_3 ∨ (x_1 ∧ x_2))

where the negation is only applied to the boolean variables.

The arithmetization maps a formula ψ into an algebraic expression α(ψ) as follows:

α(x_i) = x_i
α(¬x_i) = 1 − x_i
α(ψ_1 ∧ ψ_2) = α(ψ_1) · α(ψ_2)
α(ψ_1 ∨ ψ_2) = α(ψ_1) + α(ψ_2)
α(∀x_i ψ(x_i)) = Π_{x_i=0,1} α(ψ(x_i))
α(∃x_i ψ(x_i)) = Σ_{x_i=0,1} α(ψ(x_i))

Example. If ψ is the closed formula:

∀x_1 (¬x_1 ∨ ∃x_2 ∀x_3 ((x_1 ∧ x_2) ∨ x_3))

then α(ψ) is the expression

Π_{x_1=0,1} [ (1 − x_1) + Σ_{x_2=0,1} Π_{x_3=0,1} (x_1·x_2 + x_3) ]

which is the integer value 2. The arithmetization of a formula with free variables yields a polynomial, or functional form.

Definition 2.3. The functional form with respect to x_i of a formula B is the polynomial in x_i obtained by the arithmetization of the formula where we suppress the leftmost quantifier, on x_i, in B. We write this polynomial B_i(x_i).

Example. If B is the formula

∀x_1 (¬x_1 ∨ ∃x_2 ∀x_3 ((x_1 ∧ x_2) ∨ x_3))

the functional form of B with respect to x_1 is:

B_1(x_1) = (1 − x_1) + Σ_{x_2=0,1} Π_{x_3=0,1} (x_1·x_2 + x_3)
B_1(x_1) = x_1² + 1

The relation between the truth of a formula B and the numerical value of its arithmetization is given by the following lemma.

Lemma 2.3. A closed formula B is true iff α(B) ≠ 0.

Proof: Let us show this simple result by induction on the structure of the formula B, noticing that α only takes non-negative integer values on closed formulas. If B is atomic, ¬B_1, B_1 ∧ B_2 or B_1 ∨ B_2, the result is immediate: a product is non-zero iff both factors are non-zero, and a sum of non-negative values is non-zero iff one of the terms is. If B is ∃x_1 B_1(x_1), then B is true iff B_1(0) or B_1(1) is true, i.e., by induction, iff one of the two non-negative values α(B_1(0)), α(B_1(1)) is not zero, i.e. iff their sum is not zero. The case B = ∀x_1 B_1(x_1) is similar, with the product in place of the sum. □

Handling large values and large degrees

The arithmetization of a formula may yield numerical values of the order of 2^{2^n}, which require an exponential length. If B is the formula

∀x_1 ... ∀x_n ∃x_{n+1} ∃x_{n+2} (x_{n+1} ∨ x_{n+2})

the numerical value obtained is 4^{2^n}. On the other hand, the degree of the polynomial can be large. For the formula

∀x_2 ... ∀x_n (x_1 ∨ x_2)

the arithmetization is x_1^{2^{n−2}} · (x_1 + 1)^{2^{n−2}}
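The arithmetization rules can be evaluated recursively. Below is a small Python sketch with a hypothetical tuple encoding of formulas (the encoding and all names are our own, for illustration), applied to the formula ∀x_1 (¬x_1 ∨ ∃x_2 ∀x_3 ((x_1 ∧ x_2) ∨ x_3)):

```python
# Mini-AST: ('var', i), ('not', ('var', i)) for negated variables only,
# ('and', f, g), ('or', f, g), ('A', i, f) for "forall x_i f",
# ('E', i, f) for "exists x_i f".
def alpha(f, env):
    """Evaluate the arithmetization alpha(f) at the point env (a dict
    mapping variable indices to values of the free variables)."""
    op = f[0]
    if op == 'var':
        return env[f[1]]
    if op == 'not':
        return 1 - env[f[1][1]]        # 1 - x_i: negation on variables only
    if op == 'and':
        return alpha(f[1], env) * alpha(f[2], env)
    if op == 'or':
        return alpha(f[1], env) + alpha(f[2], env)
    # quantifiers: product over {0,1} for 'A', sum over {0,1} for 'E'
    vals = [alpha(f[2], {**env, f[1]: b}) for b in (0, 1)]
    return vals[0] * vals[1] if op == 'A' else vals[0] + vals[1]

# forall x1 (not x1  or  exists x2 forall x3 ((x1 and x2) or x3))
psi = ('A', 1, ('or', ('not', ('var', 1)),
                ('E', 2, ('A', 3, ('or', ('and', ('var', 1), ('var', 2)),
                                          ('var', 3))))))
print(alpha(psi, {}))   # 2: non-zero, so the formula is true (Lemma 2.3)
```

The recursion makes the blow-up visible: each quantifier doubles the work and can double both the magnitude of the values and the degree in the free variables, which is exactly why the protocol reduces modulo p and restricts the shape of the formulas.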