
CMPSCI611: Verifying Polynomial Identities Lecture 13

Here is a problem that has a polynomial-time randomized solution, but so far no poly-time deterministic solution. Let F be any field and let Q(x_1, ..., x_n) be a polynomial of degree d in n different variables. We are not necessarily given Q in its standard form as a sum of monomials! It may be, for example, an arithmetic product of polynomials which, when multiplied out, would be too big for us to represent. All we are guaranteed is that we can evaluate the polynomial given specific values for each of the variables. Our problem is to decide whether Q is identically zero, meaning that it is equal to the zero polynomial as a polynomial (we write this Q ≡ 0). The problem would be trivial if Q were in standard form, but consider the polynomial

(x_1 + x_2)(x_1 - x_2) + (x_2 + x_3)(x_2 - x_3) + (x_3 + x_1)(x_3 - x_1).

When we multiply this out, everything cancels and we get zero. But in general, computing the standard form of an expression of length n is not a polynomial-time operation, simply because the output might take too long to write down.
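
To see the cancellation concretely, here is a minimal Python check (the helper name q is ours, not part of the notes) that evaluates the example polynomial at a few random points:

```python
import random

def q(x1, x2, x3):
    """The example polynomial; its expansion
    x1^2 - x2^2 + x2^2 - x3^2 + x3^2 - x1^2 cancels completely."""
    return (x1 + x2) * (x1 - x2) + (x2 + x3) * (x2 - x3) + (x3 + x1) * (x3 - x1)

# Every evaluation gives 0, even though the factored form hides it.
for _ in range(5):
    z = [random.randint(-100, 100) for _ in range(3)]
    print(q(*z))  # 0 every time
```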

Testing for zero is really the same problem as taking two polynomials in this form and testing whether they are equal, since Q ≡ Q' iff Q - Q' ≡ 0. (If we can evaluate both Q(z_1, ..., z_n) and Q'(z_1, ..., z_n), we can certainly evaluate their difference.) We'll also see an application of this problem to graph theory in a bit. The main idea of Schwartz's algorithm (used, for example, by the Maple symbolic algebra system) is a simple one. We choose a value z_i for each variable x_i at random, and evaluate Q(z_1, ..., z_n). If Q ≡ 0, we will always get an answer of 0. If Q ≢ 0, however, it seems reasonable that we should get a fairly random answer, and thus we are likely to get a nonzero result, which will prove that Q ≢ 0. It is possible for a function to be zero most of the time without being identically zero, of course, but we will show that a polynomial cannot do so, at least over an infinite field or a field larger than the degree of the polynomial.

One round of Schwartz's algorithm proceeds as follows:

- Choose a set S of m distinct field values.
- Choose z_1, ..., z_n each uniformly from S, independently.
- Evaluate Q(z_1, ..., z_n).
- If you get 0, return true, else return false.

As we said, if Q ≡ 0 we will return true with probability 1. We need to analyze the probability of returning true if Q is a nonzero polynomial of degree d; in fact we will show that this probability is at most d/m. Since m is under our control (as long as the field has enough elements), we can use repeated trials of the basic algorithm to reduce our error probability. If we ever get a result of false we know it is correct, and if we get true many times in a row we know that either Q ≡ 0 or we have been very unlucky in our random choices; we'll examine the question of how unlucky soon.
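
The round above can be sketched directly in Python, treating Q as a black-box evaluator; the function name schwartz_round is ours:

```python
import random

def schwartz_round(Q, n, S):
    """One round of the algorithm: sample each z_i uniformly and
    independently from S and test whether Q evaluates to zero.

    Q is any black-box evaluator taking n field values; S is a
    sequence of m distinct field elements."""
    z = [random.choice(S) for _ in range(n)]
    return Q(*z) == 0   # True: consistent with Q ≡ 0; False: proof that Q ≢ 0

# Example with the cancelling polynomial from the previous page:
Q = lambda x1, x2, x3: ((x1 + x2) * (x1 - x2) + (x2 + x3) * (x2 - x3)
                        + (x3 + x1) * (x3 - x1))
print(schwartz_round(Q, 3, list(range(100))))  # always True, since Q ≡ 0
```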

Let's first consider the case where n = 1, that is, Q is a polynomial of degree d in a single variable. We know something about such polynomials over any field: they can have at most d different roots. (Why? For each root r, the linear polynomial x - r must divide Q. Over a field, this means that the product of the linear polynomials for the distinct roots must also divide Q, and the degree of this product is exactly the number of roots.) It was pointed out in class that "field" is not exactly the algebraic property we need, though it is sufficient. The polynomials over a field are not themselves a field, for example, because you cannot always divide, yet the root bound still applies to them. For any Q(x) of degree d and any S of size m, the greatest chance of getting Q(z_1) = 0 is if Q has d roots and all of them happen to be in S. In this case the probability is d/m, and in any other case it is less. So we have established the bound we want in the special case n = 1. We will prove the d/m bound for n variables by induction on n. So we assume as inductive hypothesis that for any d and any polynomial R of degree d on n - 1 variables, the probability that R(z_1, ..., z_{n-1}) = 0 for random z_i's is at most d/m.
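
A quick sanity check of the worst case for n = 1, with hypothetical values d = 3 and m = 10 (the names are ours):

```python
# Worst case for n = 1: a degree-d polynomial whose d roots all lie in S.
d, m = 3, 10
S = list(range(m))
Q = lambda x: (x - 0) * (x - 1) * (x - 2)   # roots 0, 1, 2 are all in S

# Exactly d of the m choices of z make Q(z) = 0, so the probability is d/m.
hits = sum(1 for z in S if Q(z) == 0)
print(hits, "of", m, "choices give zero")    # 3 of 10 choices give zero
```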

Here is a bad argument that gets at the central idea of the correct argument. Consider any sequence of values z_2, ..., z_n, each chosen from S. If we substitute these values into Q, we get a polynomial Q(x_1, z_2, ..., z_n) in the single variable x_1, and this polynomial has degree at most d. By our argument for the one-variable case, our probability of getting 0 is at most d/m. The probability that Schwartz's algorithm fails is the average, over all possible sequences (z_2, ..., z_n), of the probability of getting 0 from that sequence. The average of these m^{n-1} numbers, each at most d/m, must itself be at most d/m. Do you see the flaw in the argument? The probability of a one-variable polynomial being 0 is bounded by d/m only if that polynomial is not itself the zero polynomial! There is no reason that a particular sequence of z values might not cause every coefficient of Q to become zero, and hence cause all values of z_1 to lead to Q(z_1, ..., z_n) being zero.

So our m^{n-1} individual probabilities are not all bounded by d/m, as some of them may be 1. But our correct argument will put a bound on the number that are 1, in order to show that the average of all the numbers is really at most d/m. Let's rewrite Q as a polynomial in x_1, whose coefficients are polynomials in the other n - 1 variables:

Q = Q_k x_1^k + Q_{k-1} x_1^{k-1} + ... + Q_1 x_1 + Q_0

Here k is the maximum degree of x_1 in any term of Q. Note that the degree of the polynomial Q_k(z_2, ..., z_n) must be at most d - k, since there are no terms in Q of degree greater than d. How could a setting of (z_2, ..., z_n) cause Q to become the zero polynomial? A necessary, but not sufficient, condition is that Q_k(z_2, ..., z_n) be zero. But by our inductive hypothesis, we know that the probability of this event is bounded above by (d - k)/m, because Q_k has degree at most d - k and Q_k has only n - 1 variables.

Let's look again at our m^{n-1} individual probabilities. At most a (d - k)/m fraction of them might be 1. The others are each at most d/m, as we said, but we can do better. Since the degree of Q in the variable x_1 is exactly k whenever Q_k is nonzero, if the values of the z_i's cause Q to become a nonzero polynomial in x_1, they cause it to have degree at most k. Thus the probability that the choice of z_1 causes Q(z_1, ..., z_n) to be 0 is at most k/m. We have the average of m^{n-1} probabilities, which is the sum of those probabilities divided by m^{n-1}. The sum of the ones, divided by m^{n-1}, is at most (d - k)/m. The sum of the others, divided by m^{n-1}, is at most k/m, because each of them is at most k/m. Our total probability is thus at most d/m, completing our inductive step and thus our proof.
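
In symbols, the inductive step combines the two cases just described:

```latex
\Pr[Q(z_1,\dots,z_n)=0]
  \;\le\; \underbrace{\Pr[Q_k(z_2,\dots,z_n)=0]}_{\le\,(d-k)/m \text{ (induction)}}
  \;+\; \underbrace{\Pr[Q(z_1,\dots,z_n)=0 \mid Q_k(z_2,\dots,z_n)\ne 0]}_{\le\,k/m \text{ (one variable)}}
  \;\le\; \frac{d-k}{m}+\frac{k}{m} \;=\; \frac{d}{m}.
```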

If we want to ensure that the probability of a wrong answer is at most ε, then we can set m to be 2d (assuming that there are enough elements in F) and run the basic Schwartz algorithm t = log_2(1/ε) times. The only way we can get a wrong answer is if Q ≢ 0 and each of the trials accidentally gave us a value of zero. Since each trial fails with probability at most d/m = 1/2, the probability of an overall failure is at most 2^{-t} = ε, as desired. Remember that setting t to 1000, or even 100, drives the error probability low enough that it becomes insignificant next to other possible bad events, like a large meteorite landing on the computer during the calculation. Remember also that this amplification of probabilities only works when the probability of success that we start with is nontrivial, in particular at least an inverse polynomial 1/p(n). If the probability of failure is 1 - f(n), then the probability of t consecutive failures is (1 - f(n))^t, or approximately e^{-tf(n)}. For this to be a small probability, tf(n) must be at least 1, which is to say that t must be at least 1/f(n).
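
A sketch of the amplified test under the assumptions above (m = 2d, t = ceil(log_2(1/ε))); the names schwartz_test and field_elems are ours:

```python
import math
import random

def schwartz_test(Q, n, d, field_elems, eps):
    """Repeat the basic round t = ceil(log2(1/eps)) times with m = 2d.

    Returns True ("probably Q ≡ 0", wrong with probability at most eps)
    or False ("certainly Q ≢ 0")."""
    m = 2 * d
    S = field_elems[:m]                  # any m distinct field values
    t = math.ceil(math.log2(1 / eps))
    for _ in range(t):
        z = [random.choice(S) for _ in range(n)]
        if Q(*z) != 0:
            return False                 # one-sided: a nonzero value is a proof
    return True                          # survived t trials of failure prob <= 1/2

# An identically zero polynomial of degree 2, written so it doesn't look zero:
Q = lambda x, y: (x + y)**2 - x*x - 2*x*y - y*y
print(schwartz_test(Q, 2, 2, list(range(1000)), 1e-9))  # True
```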

When we study NP-complete problems, we will see problems that have randomized algorithms with a very small probability of success, such as 2^{-n}. For example, in the SATISFIABILITY problem our input is a boolean formula in n variables and we ask whether there is any setting of the n variables that makes the formula true. It is certainly possible that only one of the 2^n settings makes the formula true, so that choosing a random setting has only a 2^{-n} probability of success. In this case t trials would have at most a t·2^{-n} probability of success, which is vanishingly small for any reasonable t.

This analysis depended on the fact that the basic Schwartz algorithm has only one-sided error: it can return true when the right answer is false, but not vice versa. Thus repeated trials could fail only if each individual trial fails, and we can multiply the individual failure probabilities to get the overall failure probability. If we have a randomized Monte Carlo algorithm with two-sided error, meaning that no individual answer can be trusted, the analysis of repeated trials is different. If the answer is a bit and our failure probability is significantly less than 1/2, at most 1/2 - 1/p(n) for some polynomial p, it turns out that taking the majority result of a sufficiently large polynomial number of repeated trials is overwhelmingly likely to be correct. If the basic failure probability is something like 1/2 - 2^{-n}, however, a polynomial number of repeated trials doesn't help much.
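
A sketch of majority-vote amplification for two-sided error; the noisy "trial" here is a hypothetical stand-in for any Monte Carlo algorithm that answers correctly with probability 0.6, comfortably above 1/2:

```python
import random
from collections import Counter

def amplify_majority(trial, t):
    """Majority vote over t independent runs of a two-sided-error algorithm.

    If each run is correct with probability 1/2 + 1/p(n), a polynomial
    number of runs makes the majority wrong only with exponentially small
    probability (a Chernoff-bound argument, not reproduced here)."""
    votes = Counter(trial() for _ in range(t))
    return votes.most_common(1)[0][0]

# Hypothetical noisy oracle: returns True (the correct answer) with prob 0.6.
noisy = lambda: random.random() < 0.6
print(amplify_majority(noisy, 1001))  # True with overwhelming probability
```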

CMPSCI611: Perfect Matchings Again Lecture 13

We can apply the Schwartz algorithm to verify polynomial identities in a familiar setting: the problem of determining whether a perfect matching exists in a bipartite graph. Given G = (U, V, E), we can construct a polynomial that is identically zero iff G does not have a perfect matching. Thus the Schwartz algorithm can be used to either:

- Prove that the graph has a perfect matching, or
- Give us arbitrarily high confidence that it does not, given enough independent trials of the basic algorithm.

In the first case, the Schwartz algorithm will not give us a perfect matching in the graph, only assurance that it exists.

The polynomial we must test is the determinant of a particular matrix. Given any n by n matrix A, the determinant is defined to be a particular sum of products of matrix entries. For every bijection σ from the set {1, ..., n} to itself (every permutation), we form the product of the entries A_{i,σ(i)} for each i. Then we add these products together, with coefficient +1 if σ is an even permutation and -1 if it is an odd permutation. (What do these last two terms mean? Any permutation can be written as a product (composition) of transpositions, permutations that swap two elements and keep the others fixed. The various products resulting in a given permutation are either all odd or all even in length, and we call the permutation odd or even accordingly. A k-cycle, which is a permutation that takes some a_1 to a_2, a_2 to a_3, etc., until it takes a_k to a_1, is odd if k is even and even if k is odd.)

So the determinant of a 2 by 2 matrix A is a_{11}a_{22} - a_{12}a_{21}, and the determinant of a 3 by 3 A is a_{11}a_{22}a_{33} - a_{11}a_{32}a_{23} + a_{12}a_{23}a_{31} - a_{12}a_{21}a_{33} + a_{13}a_{32}a_{21} - a_{13}a_{31}a_{22}. For a general n by n matrix this definition does not give a good way to compute the determinant, because there are n! different permutations to consider. But there are two good ways to compute it more quickly:

- The row operations of Gaussian elimination do not change the determinant, except for swapping rows, which multiplies it by -1. So we can transform the original matrix to an upper triangular matrix by a polynomial number of these operations, counting the number of times we swap rows. The determinant of an upper triangular matrix is just the product of the entries on its main diagonal.
- There is a general method, which we won't cover here, to express the determinant of A as one entry of the product of some matrices derived from A. This means that we can in principle find the determinant in subcubic time, or find it quickly in parallel.
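
The first method can be sketched as follows, using exact rational arithmetic to stand in for field operations; the code is our illustration, not part of the notes:

```python
from fractions import Fraction

def det(A):
    """Determinant by Gaussian elimination: adding a multiple of one row
    to another leaves the determinant unchanged; each row swap flips its
    sign. O(n^3) field operations, versus n! terms in the definition."""
    n = len(A)
    M = [[Fraction(x) for x in row] for row in A]   # exact arithmetic
    sign = 1
    for col in range(n):
        pivot = next((r for r in range(col, n) if M[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)            # no pivot: determinant is 0
        if pivot != col:
            M[col], M[pivot] = M[pivot], M[col]
            sign = -sign                  # a swap multiplies det by -1
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n):
                M[r][c] -= f * M[col][c]
    result = Fraction(sign)
    for i in range(n):                    # upper triangular: diagonal product
        result *= M[i][i]
    return result

print(det([[1, 2], [3, 4]]))   # -2
```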

Both of these methods assume that we are carrying out the matrix operations over a field in which entries can be stored, added, and multiplied quickly. The rationals, the integers modulo a prime p, the real numbers, and the complex numbers are such fields, and polynomials in one variable (even in O(1) variables) still allow these operations even though they do not form a field. But as we have seen, simple operations on polynomials over many variables may produce answers that we can no longer store or manipulate easily. Let's return to the perfect matching problem on a bipartite graph G = (U, V, E), where |U| = |V| = n. Define the following n by n matrix A: the (i, j) entry is the variable x_{ij} if there is an edge from vertex i of U to vertex j of V, and 0 otherwise. Look at the determinant of this matrix. Any permutation σ represents a bijection from U to V. If this bijection corresponds to a matching in E, then every entry a_{i,σ(i)} is nonzero and the product of these entries becomes a term in the determinant of A, with either a plus or minus sign.

Furthermore, if this term for σ appears in the determinant it cannot be canceled by any other term. This is because each term contains n variables, and no two terms contain exactly the same variables. (If σ ≠ τ, there must be some variable x_{i,σ(i)} that is not x_{i,τ(i)}.) So given G, we test whether G has a perfect matching by repeatedly choosing random values from some set S for each x_{ij}, getting a matrix of field elements, and evaluating the determinant of that matrix. If we ever get a nonzero value, a perfect matching exists, and if we keep getting zero values we build our confidence that no perfect matching exists.
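
Putting the pieces together, here is a small sketch of the whole matching test. It computes the determinant straight from the permutation-sum definition (fine for tiny n; the Gaussian elimination method of the previous pages would be used in practice), and all the names are ours:

```python
import random
from itertools import permutations
from math import prod

def perm_sign(p):
    """+1 for even permutations, -1 for odd (parity counted by inversions)."""
    s = 1
    for i in range(len(p)):
        for j in range(i + 1, len(p)):
            if p[i] > p[j]:
                s = -s
    return s

def det(A):
    """Determinant straight from the permutation-sum definition."""
    n = len(A)
    return sum(perm_sign(p) * prod(A[i][p[i]] for i in range(n))
               for p in permutations(range(n)))

def has_perfect_matching(n, edges, trials=20, m=1000):
    """Randomized test: substitute random values from {1, ..., m} for the
    variables x_ij of the matrix described above and evaluate the
    determinant. One nonzero result proves a perfect matching exists;
    all-zero results mean none exists, with error probability at most
    (n/m)^trials, since the determinant has degree n."""
    for _ in range(trials):
        A = [[random.randint(1, m) if (i, j) in edges else 0
              for j in range(n)] for i in range(n)]
        if det(A) != 0:
            return True
    return False

# A 3 by 3 bipartite graph with the perfect matching {(0,0), (1,1), (2,2)}:
print(has_perfect_matching(3, {(0, 0), (0, 1), (1, 1), (2, 2)}))  # True
# Both U-vertices only reach V-vertex 0, so no perfect matching:
print(has_perfect_matching(2, {(0, 0), (1, 0)}))  # False
```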

How does the time of this algorithm compare with our other methods to test for perfect matchings? The network flow method takes O(e^2 n) = O(n^5) time as we presented it, or O(n^3) by a more complicated algorithm that we did not present. Finding the determinant (as we didn't prove) takes O(n^3) by the standard Gaussian elimination method, or o(n^3) by faster methods based on fast matrix multiplication. So the new method is not clearly better for sequential machines, but it does allow for a fast parallel algorithm, where network flow, as far as we know, doesn't. (Note finally that the Adler notes quote running time in terms of n as the whole input size, not the number of vertices. So they refer to the times of my O(n^3) algorithms as O(n^{3/2}).)