Homework 4
By: John Steinberger

Problem 1. Recall that a real n × n matrix A is positive semidefinite if A is symmetric and x^T A x ≥ 0 for all x ∈ R^n. Assume A is a real n × n matrix. Show TFAE[1]:

(a) A is positive semidefinite

(b) A is symmetric and the eigenvalues of A are all nonnegative (≥ 0)

(c) A can be written as B^T B for some n × n upper triangular matrix B (you can use last homework's result; vocabulary: B^T B is called the Cholesky decomposition of A)

(d) there exist vectors x_1, ..., x_n ∈ R^n such that A = Σ_{i=1}^n x_i x_i^T (note: in fact if k = rank(A), there exist k vectors x_1, ..., x_k ∈ R^n such that A = Σ_{i=1}^k x_i x_i^T; see Part C of Problem 1 of the last homework)

(e) there exists a symmetric matrix X such that X^2 = A (therefore A has a "square root"); moreover, X is positive semidefinite

(f) A is symmetric and every symmetric minor of A has determinant ≥ 0 (a symmetric minor of A is the restriction of A to rows and columns belonging to some set J ⊆ [n]; for example if J = {1, 2} then the symmetric minor is the upper left-hand corner 2 × 2 submatrix of A)

(g) A can be written as P^{-1} D P where P is an orthogonal matrix[2] and D is a diagonal matrix with nonnegative entries on its diagonal

(h) A is symmetric and ⟨A, X⟩ ≥ 0 for all n × n positive semidefinite matrices X, where ⟨A, X⟩ is the dot product of A and X considered as vectors: ⟨A, X⟩ = Σ_{i,j} A_{ij} X_{ij}

Problem 2. Let X and Y be random variables taking values in some set S. We say X̃ is equidistributed to X if Pr[X = s] = Pr[X̃ = s] for all s ∈ S. The notation X̃ ∼ X means that X and X̃ are equidistributed. We say X and Y are independent if Pr[X = s ∧ Y = t] = Pr[X = s] Pr[Y = t] for all s, t ∈ S. For this problem, X and Y may or may not be independent; it doesn't matter.

Part A. Prove there exists a pair of random variables (X̃, Ỹ) such that X̃ ∼ X, Ỹ ∼ Y, and such that Pr[X̃ ≠ Ỹ] = Δ(X, Y), where Δ(X, Y) is the statistical distance between X and Y. Note: For the expression Pr[X̃ = Ỹ] to make sense, X̃ and Ỹ need to be defined over the same probability space. You can construct this probability space however you want. All we want is that X̃ ∼ X, Ỹ ∼ Y, and that Pr[X̃ ≠ Ỹ] = Δ(X, Y). Other note: the pair (X̃, Ỹ) is called a coupling of X and Y.

Part B. Prove that Pr[X̃ ≠ Ỹ] ≥ Δ(X, Y) for any pair of random variables (X̃, Ỹ) (defined over the same probability space) such that X̃ ∼ X, Ỹ ∼ Y.

Part C. Conclude by Parts A and B that

Δ(X, Y) = min_{X̃ ∼ X, Ỹ ∼ Y} Pr[X̃ ≠ Ỹ]

where the notation means that the min is taken over all pairs of random variables (X̃, Ỹ) defined over a common probability space and such that X̃ ∼ X, Ỹ ∼ Y.

Part D. Use Part C to give a different proof that

Δ_D(X, Y) := |Pr[D(X) = 1] − Pr[D(Y) = 1]| ≤ Δ(X, Y)

for any distinguisher D : S → {0, 1}.

Problem 3. If X is a random variable, we write X^r for a random variable composed of r independent copies of X. That is, X^r = (X_1, ..., X_r) where X_i ∼ X for each i and the X_i's are independent. (Note: A random variable of the form Z = (Z_1, ..., Z_r) whose coordinates are independent, i.e. where the Z_i's are independent, is sometimes called a product distribution.)

[1] TFAE: The Following Are Equivalent.
[2] See the last homework for the definition of orthogonal matrix.
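
(Aside, not part of the problem statements.) If you want to experiment with Problem 3 on small examples, the following minimal Python sketch computes the statistical distance Δ(X, Y) = (1/2) Σ_s |Pr[X = s] − Pr[Y = s]| and the distance between r-fold products. The dictionary representation and the biased-coin pair below are assumptions made only for this illustration; the pair is one example at distance ε, not claimed to be extremal, and only small r are feasible since the product support grows exponentially.

    from itertools import product
    import math

    def stat_dist(p, q):
        # Delta(X, Y) = (1/2) * sum over s of |Pr[X = s] - Pr[Y = s]|
        support = set(p) | set(q)
        return 0.5 * sum(abs(p.get(s, 0.0) - q.get(s, 0.0)) for s in support)

    def power(p, r):
        # r-fold product distribution p^r: Pr[(s_1, ..., s_r)] = p(s_1) * ... * p(s_r)
        return {tup: math.prod(p[s] for s in tup) for tup in product(p, repeat=r)}

    if __name__ == "__main__":
        eps = 0.1
        X = {0: 0.5 + eps, 1: 0.5 - eps}   # one distribution at distance eps from uniform
        Y = {0: 0.5, 1: 0.5}               # (just an example, not claimed to be extremal)
        print(stat_dist(X, Y))             # 0.1
        for r in (1, 2, 5, 10, 16):
            print(r, stat_dist(power(X, r), power(Y, r)))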

Say Δ(X, Y) = ε. How large could Δ(X^r, Y^r) be? How small? Find examples maximizing and minimizing Δ(X^r, Y^r), where the only constraint is that Δ(X, Y) = ε. You do not need to prove your examples are best possible. Also, you may find it hard to exactly compute Δ(X^r, Y^r). In this case, for your examples, simply mention how large r must be, as a function of ε, such that Δ(X^r, Y^r) = Ω(1).

Problem 4. For this problem, all the random variables we consider have range [N] = {1, ..., N}. A random variable X is a convex combination of random variables Y_1, ..., Y_t if there exist numbers w_1, ..., w_t, with 0 ≤ w_i and Σ_{i=1}^t w_i = 1, such that

Pr[X = i] = Σ_{j=1}^t w_j Pr[Y_j = i] for all i ∈ [N].

Let v = (v_1, ..., v_N) ∈ R^N. We define

v(X) = Σ_{i=1}^N v_i Pr[X = i].

(Say that v_i is the value of i ∈ [N]; then v(X) is the expected value obtained by selecting i according to X. The vector v is also called a payoff vector or utility vector.)

Part A. Show that X is a convex combination of Y_1, ..., Y_k if and only if v(X) ≥ min_j v(Y_j) for all v ∈ R^N. (One could also say: if and only if v(X) ≤ max_j v(Y_j) for all v ∈ R^N.)

Part B. Let K ≤ N. Say that X is K-flat if for every i ∈ [N], either Pr[X = i] = 0 or Pr[X = i] = 1/K. (This implies X is uniformly distributed over a subset of [N] of size K.) The min-entropy of X, written H_∞(X), is defined as

H_∞(X) = min_{x ∈ [N]} log_2 ( 1 / Pr[X = x] ).

(So if X is K-flat, then H_∞(X) = log_2(K).) Show that H_∞(X) ≥ log_2(K) if and only if X is a convex combination of K-flat sources.[3]

Problem 5. (Note: If you want to start thinking right away about the problem, you can jump forward to where it says "Toy version" below.)

The complexity class BPP is called a two-sided error complexity class because we are allowed to make mistakes for both yes and no answers. The one-sided error analogues of BPP are two complexity classes called RP and coRP. A language L is in RP if and only if there exists a probabilistic polynomial-time Turing machine M such that

x ∉ L ⇒ Pr[M(x) = 0] = 1
x ∈ L ⇒ Pr[M(x) = 1] ≥ 0.5.

This is the short way of saying the definition. More formally, since M is a probabilistic Turing machine, it takes two inputs: the normal input x and the random tape y. Then M is probabilistic polynomial-time if there exists a polynomial p(n) such that M(x, y) halts in at most p(|x|) steps for all x ∈ {0, 1}*, for any y ∈ {0, 1}^{p(|x|)}. Note that we can assume the random tape has length p(|x|) since anyway M cannot read more than p(n) bits of the random tape in p(n) steps. (Indeed, I did something a bit stupid when I defined BPP in HW 1, because I allowed the random tape to have a different length than the running time of the machine; normally, one simply considers the random tape to have the same length as the running time.)

[3] "Source" is a synonym of "random variable".
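
(Aside, for experimenting with the definitions in Problem 4, Part B. The dictionary representation and the example distributions are assumptions of this sketch; the decomposition into K-flat sources itself is left to the problem.)

    import math

    def min_entropy(p):
        # H_inf(X) = min over the support of log2(1 / Pr[X = x])
        return min(math.log2(1.0 / pr) for pr in p.values() if pr > 0)

    def is_k_flat(p, K, tol=1e-12):
        # K-flat: every probability is either 0 or exactly 1/K
        return all(pr <= tol or abs(pr - 1.0 / K) <= tol for pr in p.values())

    if __name__ == "__main__":
        N, K = 8, 4
        X = {i: (0.25 if i <= 4 else 0.0) for i in range(1, N + 1)}  # uniform on {1, 2, 3, 4}
        print(is_k_flat(X, K), min_entropy(X))       # True 2.0  (= log2(K))
        Y = {1: 0.25, 2: 0.25, 3: 0.2, 4: 0.2, 5: 0.1}
        print(min_entropy(Y) >= math.log2(K))        # True; by Problem 4B, Y should then be
                                                     # a convex combination of 4-flat sources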

In any case, the more formal way of writing things is to say that L ∈ RP if and only if there exists a PPTM (Probabilistic Polynomial-time Turing Machine) M of running time p(n) such that

x ∉ L ⇒ Pr_{y ∈ {0,1}^{p(|x|)}}[M(x, y) = 0] = 1
x ∈ L ⇒ Pr_{y ∈ {0,1}^{p(|x|)}}[M(x, y) = 1] ≥ 0.5

for every x ∈ {0, 1}*. For coRP, it's the opposite: a language L is in coRP if and only if there exists a PPTM M of running time p(n) such that

x ∉ L ⇒ Pr_{y ∈ {0,1}^{p(|x|)}}[M(x, y) = 0] ≥ 0.5
x ∈ L ⇒ Pr_{y ∈ {0,1}^{p(|x|)}}[M(x, y) = 1] = 1

for every x ∈ {0, 1}*. Note that for RP, if M outputs 0 then maybe it has made a mistake (because it is allowed to output 0 even if the correct answer is 1). For coRP, if M outputs 1 then maybe it has made a mistake. Put otherwise:

for RP, if M outputs 1 then it is sure of its answer;
for coRP, if M outputs 0 then it is sure of its answer.

This is how I remember which is RP and which is coRP: RP is the one where you need to be sure when you say yes (when you say yes it means you've actually understood something about your input). And coRP is the one where you need to be sure when you say no (if you say no it means you've actually understood something about your input). Another way to remember is that the definitions have been made so that RP ⊆ NP and coRP ⊆ coNP. (Check those inclusions for yourself.)

The constant 0.5 in the definition could be changed to 2/3 or 9/10 or 1/10 without affecting which languages are in RP and coRP, because we can apply error reduction to navigate between these constants. For example, if a PPTM M is such that

x ∉ L ⇒ Pr[M(x) = 0] = 1
x ∈ L ⇒ Pr[M(x) = 1] ≥ 0.01

then applying error reduction (more precisely, repeating M 100 times on each input, and outputting 1 if and only if M ever outputs 1) one finds a PPTM M′ such that

x ∉ L ⇒ Pr[M′(x) = 0] = 1
x ∈ L ⇒ Pr[M′(x) = 1] ≥ 1 − (1 − 0.01)^100 ≈ 1 − 1/e.

Repeating M 200 times we would get Pr[M′(x) = 1] ≥ 1 − 1/e², and so forth. In fact, we can even apply error reduction starting from a probability of correctness that is only inverse-polynomially bounded away from 0; more precisely, say M is a PPTM and q(n) > 0 is a polynomial such that

x ∉ L ⇒ Pr[M(x) = 0] = 1
x ∈ L ⇒ Pr[M(x) = 1] ≥ 1/q(n)

for all x ∈ {0, 1}*. Then if we build a machine M′ by running M q(n)² times (and answering yes if M answers yes a single time), then M′ is also a PPTM and we have

x ∉ L ⇒ Pr[M′(x) = 0] = 1,
x ∈ L ⇒ Pr[M′(x) = 1] ≥ 1 − e^{−q(n)}.
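
(Aside.) To make the one-sided error reduction above concrete, here is a minimal Python sketch. The machine M is modelled as a function of the input x and a random tape r returning 0 or 1; the names amplify_rp and toy_M and the specific parameters are illustrative assumptions, not a prescribed interface.

    import secrets

    def amplify_rp(M, x, coin_len, t):
        # Run a one-sided-error (RP-style) machine t times on fresh random tapes.
        # If x is not in L, every run outputs 0, so we output 0.
        # If x is in L and each run outputs 1 with probability at least 1/q, the
        # probability that all t runs output 0 is at most (1 - 1/q)**t <= exp(-t/q).
        for _ in range(t):
            r = secrets.token_bytes(coin_len)   # a fresh, independent random tape
            if M(x, r) == 1:
                return 1                        # for RP, a 1-answer is always correct
        return 0

    if __name__ == "__main__":
        # toy "machine": on a yes-instance it answers 1 with probability 3/256 ~ 0.0117
        def toy_M(x, r):
            return 1 if (x == "yes-instance" and r[0] < 3) else 0

        runs = 1000
        hits = sum(amplify_rp(toy_M, "yes-instance", 16, 100) for _ in range(runs))
        print(hits / runs)   # roughly 1 - (1 - 3/256)**100 ~ 0.69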

Since the constant 0.5 can be replaced by 2/3, we have RP ⊆ BPP and coRP ⊆ BPP. This is normal: if you're allowed two-sided error then it's easier than if you're only allowed one-sided error. We also have P ⊆ RP, P ⊆ coRP. So we can write the pretty chain:

P ⊆ RP ∩ coRP ⊆ RP ∪ coRP ⊆ BPP.

(Pretty, but completely useless. In fact, people believe that P = BPP, even if they are not able to prove it. Thus people really believe that P = RP = coRP = BPP.)

To illustrate coRP, consider the language PRIMES, which is the set of binary strings whose value, as an integer, is a prime number. Actually, people now know that PRIMES ∈ P, but let's ignore that right now. As first-year students you learned the Miller-Rabin primality test which, if it answers no, has detected that its input is composite (non-prime). Thus, this test shows PRIMES ∈ coRP.

As another example, consider the language ZEROPOLY consisting of strings that encode an algebraic circuit (that's like a boolean circuit, but now the gates are × and + instead of ∧ and ∨) over the field F_2 = {0, 1} that computes a symbolic polynomial that is the 0 polynomial. (A symbolic polynomial: over the field F_2, the polynomial x² + x is always equal to 0, so as a function it is the 0 function; but as a symbolic polynomial, i.e. as a sequence of coefficients, this polynomial is not the 0 polynomial.) For example, consider these two algebraic circuits:

[Figure: two small algebraic circuits over F_2 on inputs x_1, x_2; circuit (a) on the left, circuit (b) on the right.]

The circuit on the left computes the polynomial x_1 x_2 + x_1 x_2 = 0, which is (symbolically) the 0 polynomial. The circuit on the right computes the polynomial (x_1 + x_1 x_2) x_2 = x_1 x_2 + x_1 x_2². Even though this second polynomial is always equal to 0 over F_2, it is not, symbolically, the 0 polynomial. So the circuit on the left is in ZEROPOLY but the circuit on the right is not in ZEROPOLY.

Note that a circuit which takes n bits to describe will have depth at most n and so will give a polynomial of degree d ≤ 2^n (assuming at most two input wires per gate). To test if a circuit is symbolically 0, we take each input gate[4] x_i at random from the finite field F_{2^{2n}} of size 2^{2n}, and apply the circuit to these inputs, working over F_{2^{2n}}. If the circuit is symbolically 0, we will obtain 0; otherwise, since the polynomial computed has degree at most 2^n, the Schwartz-Zippel lemma implies that there is chance at most 2^n / 2^{2n} = 1/2^n of obtaining 0. Note that it takes 2n = O(n) bits to represent an element of F_{2^{2n}}, so that the value of the circuit on inputs in F_{2^{2n}} can be evaluated[5] in poly(n) time (given that the circuit itself has O(n) gates). Thus this gives us a one-sided error algorithm, and ZEROPOLY ∈ coRP. (It is when we say no that we are sure of our answer, with this algorithm.) Unlike PRIMES, which is known to be in P, ZEROPOLY is not known to be in P. (Actually, the problem of placing ZEROPOLY in P is a big research topic in TCS, known as polynomial identity testing.)

So far, we have only given examples of languages in coRP. But languages in RP are easy to construct from languages in coRP. Indeed, one has:

L ∈ RP if and only if the complement of L is in coRP.

[4] Here x_i does not refer to the i-th bit of the input string! The input string describes an algebraic circuit, and has length n. The circuit which is described very likely has fewer than n input gates. We are using x_i as the name of the value given to the i-th input gate.
[5] A more subtle issue is whether we can find a polynomial of degree 2n over F_2 that is irreducible over F_2 (we need this polynomial to construct the field F_{2^{2n}} and do computations over it!). Thankfully, such an irreducible polynomial can be found in poly(n) time.
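
(Aside.) Here is a small Python sketch of the randomized test just described. For simplicity it works over the fixed field F_{2^8}, represented with the AES modulus x^8 + x^4 + x^3 + x + 1, rather than over F_{2^{2n}}; this choice, and hard-coding the two example circuits (a) and (b) as functions, are assumptions of the sketch. By Schwartz-Zippel, a circuit that is not symbolically 0 and has degree d evaluates to 0 at a random point with probability at most d/256 per trial, so a few repetitions give a one-sided error test.

    import secrets

    MOD = 0x11B  # x^8 + x^4 + x^3 + x + 1, irreducible over F_2 (the AES modulus)

    def gf_mul(a, b):
        # multiply two elements of F_{2^8} = F_2[x]/(MOD); addition in this field is XOR
        res = 0
        while b:
            if b & 1:
                res ^= a
            a <<= 1
            if a & 0x100:
                a ^= MOD
            b >>= 1
        return res

    def circuit_a(x1, x2):
        # circuit (a): x1*x2 + x1*x2, symbolically the zero polynomial
        return gf_mul(x1, x2) ^ gf_mul(x1, x2)

    def circuit_b(x1, x2):
        # circuit (b): (x1 + x1*x2)*x2 = x1*x2 + x1*x2^2, zero as a function on F_2
        # but not the zero polynomial
        return gf_mul(x1 ^ gf_mul(x1, x2), x2)

    def looks_symbolically_zero(circuit, trials=20):
        # one-sided test: a False answer ("not the zero polynomial") is always correct
        for _ in range(trials):
            x1, x2 = secrets.randbelow(256), secrets.randbelow(256)
            if circuit(x1, x2) != 0:
                return False
        # wrong only with probability at most (d/256)**trials, where d is the
        # degree of the circuit's polynomial (tiny for the examples here)
        return True

    if __name__ == "__main__":
        print(looks_symbolically_zero(circuit_a))   # True
        print(looks_symbolically_zero(circuit_b))   # False (with overwhelming probability)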

You can check this equivalence for yourself. So the language NONZEROPOLY (with the obvious definition) is in RP. But ZEROPOLY is not known to be in RP, and (symmetrically) NONZEROPOLY is not known to be in coRP.

There is one more natural complexity class along the same lines, known as ZPP. This is the class of languages for which there is a PPTM that never makes an error, but that is sometimes undecided. Formally, a language L is in ZPP if and only if there exists a PPTM M such that

x ∉ L ⇒ Pr[M(x) = 0] ≥ 1/3, Pr[M(x) = 1] = 0,
x ∈ L ⇒ Pr[M(x) = 1] ≥ 1/3, Pr[M(x) = 0] = 0.

Here M is allowed to answer a special symbol which means "I don't know". Thus with probability ≥ 1/3, M answers correctly, and M never outputs the wrong answer, but M is also allowed to output no answer at all. By error reduction, the fraction 1/3 could be replaced with 0.99. For fun, why don't you decide whether:

ZPP ⊆ BPP or BPP ⊆ ZPP.

Also decide which of these is true:

RP ∩ coRP ⊆ ZPP,    RP ∩ coRP = ZPP,    ZPP ⊆ RP ∩ coRP.

(If the middle one is true then of course all three are true.) (These questions are not part of Problem 5.)

OK... In the first homework, in Problem 5, we considered error reduction for BPP. In the case of RP and coRP, doing error reduction is much more trivial. (For BPP you need to use a Chernoff bound to compute by how much error is reduced when you repeat the algorithm a large number of times; for RP and coRP, you just need to know the formula (1 − a)^b ≤ e^{−ab}.) In this problem we will not consider by how much error can be reduced using repetition for RP and coRP (which is kind of stupid), but how much randomness is really needed to reduce the error by a certain amount. Basically, we are going to consider the question of randomness-efficient error reduction (and in particular, whether such randomness-efficient error reduction is possible at all).

Say that we have a language L in RP and a PPTM M of running time p(n) such that

x ∉ L ⇒ Pr_{r ∈ {0,1}^{p(|x|)}}[M(x, r) = 0] = 1
x ∈ L ⇒ Pr_{r ∈ {0,1}^{p(|x|)}}[M(x, r) = 1] ≥ 0.5

for every x ∈ {0, 1}*. Fix some x ∈ {0, 1}* and let n = |x|. If x ∉ L then there is no problem, M will always be right, so assume that x ∈ L. Then there is a set B ⊆ {0, 1}^{p(n)} of bad random strings such that M(x, r) = 0 if r ∈ B, and we know |B| ≤ 2^{p(n)}/2. If we repeat M t times on t entirely independent random strings r_1, ..., r_t, the probability of being unlucky all t times (i.e., having r_1, ..., r_t ∈ B, so that we still answer 0 in the end) is at most 0.5^t. This is our probability of error after t (independent) repetitions. Note that sampling r_1, ..., r_t costs us tp(n) random coins. That is:

For t independent repetitions of M, we achieve error at most 0.5^t, using tp(n) random coins.

Still repeating M t times, but on dependent strings r_1, ..., r_t (dependent strings might require fewer coins to sample!), we would like to know if we can achieve error something like 0.5^t (maybe c^t for some other constant c < 1, like c = 2/3), while using substantially fewer than tp(n) coins to sample the strings r_1, ..., r_t. Basically, this question:

For t non-independent repetitions of M, can we achieve (say) error (2/3)^t, with far fewer than tp(n) random coins?
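
(Aside, purely to put numbers on the naive baseline just described; the parameter value p = 1000 is an arbitrary illustration.)

    import math

    def naive_repetitions(p_coins, target_error):
        # independent repetitions: error at most 0.5**t, at a cost of t * p_coins coins
        t = math.ceil(math.log2(1.0 / target_error))
        return t, t * p_coins

    if __name__ == "__main__":
        p = 1000   # p(n), the number of coins one run of M uses (illustrative value)
        for err in (2 ** -10, 2 ** -50, 2 ** -100):
            t, coins = naive_repetitions(p, err)
            print(f"error <= {err:.1e}: t = {t} repetitions, {coins} coins")
        # Problem 5 asks whether error (2/3)**t is achievable with far fewer than
        # t * p coins once the strings r_1, ..., r_t are allowed to be dependent.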

Note that the set B of bad random strings could change for each x, so we basically have no knowledge of what B is. Our method should work for any set B of bad strings such that |B| ≤ 2^{p(n)}/2.

Solving this problem actually requires tools you haven't learned. All we are going to do now is to look at a toy version of the problem, just to see if the problem could even possibly have a good solution. The toy version is the following:

Toy version. Let R = {0, 1}^p be the set of random strings.[6] Let m be a number of random coins. Say a function f : {0, 1}^m → R^t is a good sampling method if

Pr_{z ∈ {0,1}^m}[f(z) ∈ B^t] ≤ (2/3)^t

for any set B ⊆ R such that |B| / |R| ≤ 0.5. Find the smallest m that you can such that a good sampling method exists. (End Toy Version.)

Well, that's it! That's Problem 5. Note that R^t means the set {(r_1, ..., r_t) : r_i ∈ R, 1 ≤ i ≤ t} and that the event f(z) ∈ B^t is exactly the bad event that each random string r_i in f(z) =: (r_1, ..., r_t) is in B. We called the above a toy version of the problem because it only considers whether f exists, and not whether f is actually easy to compute. To be useful in the real world, f should be a poly-time function, but for the toy version we don't worry about that. (Actually, this last comment should give you a big hint as to how to construct f.)

Bonus Part: Replace 2/3 by (1 + c) · 0.5, for fixed c > 0, and see how m depends on c. (This means that (2/3)^t is replaced by [(1 + c)/2]^t.) You should consider that c > 0 is fixed, and that t → ∞.

Note: I am not really sure, but maybe for this problem you will get better bounds by using the following "strong Chernoff bounds": If X_1, ..., X_n ∈ [0, 1] are independent random variables and if X = Σ_{i=1}^n X_i, then we have:

(a) Pr[X ≥ (1 + δ)E[X]] ≤ ( e^δ / (1 + δ)^{1+δ} )^{E[X]}

(b) Pr[X < (1 − δ)E[X]] ≤ ( e^{−δ} / (1 − δ)^{1−δ} )^{E[X]} ≤ e^{−δ² E[X] / 2}.

These bounds are much uglier than the usual Chernoff bound, and so less practical (usually). But maybe here they will give a better m. I'm not sure, you can try.

Problem 6. We have now defined a number of complexity classes that refer to probabilistic computations: BPP, RP, coRP, ZPP. But the famous complexity class NP does not refer to probabilistic computation. Try to mix the definition of NP with the definition of BPP (or with one of the other randomized computation classes) to create your own definition of a new complexity class that has a bit of both. There is no right answer; just try to make your complexity class a cool one.

[6] We have consciously replaced p(n) by p (this is not a typo). In the toy version we fix the input to some length, and we don't need to mention dependence on n anymore.
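
(Aside.) A quick numerical sanity check of the strong Chernoff bounds (a) and (b) quoted above, compared against the exact binomial tails they bound; the Bernoulli example and the parameter choices are assumptions made only for this illustration.

    import math

    def chernoff_upper(mu, delta):
        # bound (a): Pr[X >= (1 + delta) E[X]] <= (e^delta / (1 + delta)^(1 + delta))^E[X]
        return (math.exp(delta) / (1 + delta) ** (1 + delta)) ** mu

    def chernoff_lower(mu, delta):
        # bound (b): Pr[X < (1 - delta) E[X]] <= (e^-delta / (1 - delta)^(1 - delta))^E[X]
        return (math.exp(-delta) / (1 - delta) ** (1 - delta)) ** mu

    def binom_tail_ge(n, p, k):
        # exact Pr[Bin(n, p) >= k]
        return sum(math.comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k, n + 1))

    if __name__ == "__main__":
        n, p, delta = 200, 0.5, 0.3
        mu = n * p                 # E[X] = 100
        k_hi, k_lo = 130, 70       # (1 + delta) * mu and (1 - delta) * mu
        # upper tail versus bound (a)
        print(binom_tail_ge(n, p, k_hi), "<=", chernoff_upper(mu, delta))
        # lower tail versus bound (b) and its weaker closed form exp(-delta^2 E[X] / 2)
        lower_exact = 1.0 - binom_tail_ge(n, p, k_lo)   # Pr[X < 70] = Pr[X <= 69]
        print(lower_exact, "<=", chernoff_lower(mu, delta), "<=", math.exp(-delta ** 2 * mu / 2))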