Grothendieck's Inequality


Leqi Zhu

1 Introduction

Let A = (A_ij) ∈ R^{m×n} be an m × n matrix. Then A defines a linear operator between the normed spaces (R^n, ‖·‖_p) and (R^m, ‖·‖_q), for 1 ≤ p, q ≤ ∞. The (p → q)-norm of A is the quantity

    ‖A‖_{p→q} = max_{x ∈ R^n : ‖x‖_p = 1} ‖Ax‖_q.

(Recall that, for a vector x = (x_i) ∈ R^d, the p-norm of x is ‖x‖_p = (∑_i |x_i|^p)^{1/p}; the ∞-norm of x is ‖x‖_∞ = max_i |x_i|.) If p = q, then we denote the norm by ‖A‖_p.

For what values of p and q is ‖A‖_{p→q} maximized? Since A is linear, it suffices to consider p such that {x ∈ R^n : ‖x‖_p ≤ 1} contains as many points as possible. We also want ‖Ax‖_q as large as possible. Figure 1 gives an illustration of {x ∈ R^2 : ‖x‖_p = 1} for p ∈ {1, 2, ∞}. Going by the figure, ‖A‖_{∞→1} ≥ ‖A‖_{p→q}.

Figure 1: Depiction of {x ∈ R^2 : ‖x‖_p = 1} for p ∈ {1, 2, ∞}.

Besides providing an upper bound on any (p → q)-norm, the (∞ → 1)-norm is known to provide a constant approximation to the cut norm of a matrix,

    ‖A‖_C = max_{S ⊆ [m], T ⊆ [n]} | ∑_{i ∈ S, j ∈ T} A_ij |,

which is closely related to the MAX-CUT problem on a graph. One way to compute ‖A‖_{∞→1} is by solving a quadratic integer program:

    max ∑_{i,j} A_ij x_i y_j   s.t.   (x, y) ∈ {−1, 1}^{m+n}
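For intuition, the quadratic integer program can be evaluated directly by enumerating all sign vectors; the value it returns is exactly ‖A‖_{∞→1}, as explained next. The sketch below is my own illustration (not part of the original notes), using numpy and brute force, so it is only practical for very small m and n; the function name is hypothetical.

    import itertools
    import numpy as np

    def quadratic_integer_program(A):
        """Brute-force max of sum_ij A_ij x_i y_j over (x, y) in {-1, 1}^(m+n)."""
        m, n = A.shape
        best = -np.inf
        for x in itertools.product([-1, 1], repeat=m):
            for y in itertools.product([-1, 1], repeat=n):
                best = max(best, float(np.array(x) @ A @ np.array(y)))
        return best

    A = np.array([[1.0, -2.0], [3.0, 0.5]])
    print(quadratic_integer_program(A))  # = ||A||_{infty -> 1}, as explained below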

To see that this quadratic integer program computes ‖A‖_{∞→1}, note that ∑_{i,j} A_ij x_i y_j = ∑_i (Ay)_i x_i. Taking the maximum over x ∈ {−1, 1}^m gives us ‖Ay‖_1. Then taking the maximum over y ∈ {−1, 1}^n gives us ‖A‖_{∞→1} (this requires an argument, but it follows from the convexity of y ↦ ‖Ay‖_1 over the cube {y ∈ R^n : ‖y‖_∞ ≤ 1} and the triangle inequality). This quadratic integer program may be relaxed to the following semidefinite program:

    max ∑_{i,j} A_ij ⟨x^(i), y^(j)⟩   s.t.   x^(1), ..., x^(m), y^(1), ..., y^(n) are unit vectors in (R^d, ‖·‖_2)

Notice that, if d = 1, then we have exactly the same optimization problem. It is known that exactly computing ‖A‖_{p→q}, for 1 ≤ q < p ≤ ∞, is NP-hard, while exactly computing ‖A‖_p is NP-hard for p ∉ {1, 2, ∞}. (As far as I can tell, it is an open question whether or not ‖A‖_{p→q} is computable in polynomial time when 1 ≤ p < q ≤ ∞.) Hence, if d = 1, we cannot hope for the ellipsoid method to converge quickly on all instances of A. However, there are no hardness results for d > 1. Thus, in principle, the ellipsoid method could converge quickly.

A natural question is: how well does an optimal solution to the semidefinite program approximate ‖A‖_{∞→1}? Grothendieck's Inequality provides an answer to this question:

Theorem (Grothendieck's Inequality). There exists a fixed constant C > 0 such that, for all m, n ≥ 1, A ∈ R^{m×n}, and any Hilbert space H (a vector space over R with an inner product),

    max_{unit vectors x^(i), y^(j) ∈ H} ∑_{i,j} A_ij ⟨x^(i), y^(j)⟩_H ≤ C ‖A‖_{∞→1}.

Grothendieck's constant is the smallest C such that the above inequality holds. As far as I can tell, determining the exact value of Grothendieck's constant is an open problem. However, it is known that it lies between 1.57 and K = π/(2 ln(1 + √2)) ≈ 1.78. Hence, the value of an optimal solution to the semidefinite program provides a constant approximation of ‖A‖_{∞→1}. However, this is a bit unsatisfying because, given an optimal solution to the semidefinite program, we do not know how to round the solution to obtain an integer solution (x, y) ∈ {−1, 1}^{m+n} with a good approximation ratio. Alon and Naor resolved this problem rather nicely by adapting Krivine's proof of Grothendieck's Inequality, which obtains the upper bound K on Grothendieck's constant, to obtain a randomized rounding method:

Theorem (Alon and Naor). For d = m + n, given an optimal solution x^(i), y^(j) ∈ R^{m+n} to the semidefinite program, it is possible to obtain (x, y) ∈ {−1, 1}^{m+n} (using randomized rounding) such that

    E[ ∑_{i,j} A_ij x_i y_j ] = (1/K) ∑_{i,j} A_ij ⟨x^(i), y^(j)⟩ ≥ (1/K) ‖A‖_{∞→1} ≥ 0.56 ‖A‖_{∞→1}.
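As an aside, the semidefinite relaxation can be written over the Gram matrix of the vectors x^(1), ..., x^(m), y^(1), ..., y^(n) and handed to an off-the-shelf solver. The sketch below is my own illustration and assumes the cvxpy modeling package (the notes do not prescribe any implementation); the formulation itself is the standard Gram-matrix form of the relaxation above.

    import cvxpy as cp
    import numpy as np

    def grothendieck_sdp(A):
        """SDP relaxation: max sum_ij A_ij <x^(i), y^(j)> over unit vectors.
        G is the (m+n) x (m+n) Gram matrix of (x^(1)..x^(m), y^(1)..y^(n))."""
        m, n = A.shape
        G = cp.Variable((m + n, m + n), symmetric=True)
        constraints = [G >> 0, cp.diag(G) == 1]       # PSD, unit-norm vectors
        objective = cp.Maximize(cp.sum(cp.multiply(A, G[:m, m:])))
        problem = cp.Problem(objective, constraints)
        problem.solve()
        return problem.value, G.value

    A = np.array([[1.0, -2.0], [3.0, 0.5]])
    sdp_value, G = grothendieck_sdp(A)
    # Unit vectors x^(i), y^(j) can be read off a factorization G = V V^T,
    # e.g. via an eigendecomposition of G.
    print(sdp_value)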

It turns out that Krivine's proof can also be adapted to prove a theorem about degree-2 pseudo-distributions µ : {−1, 1}^{m+n} → R. Recall that µ has to satisfy two properties for the pseudo-expectation Ẽ_µ that arises from µ: Ẽ_µ[1] = ∑_{x ∈ {−1,1}^{m+n}} µ(x) = 1 and Ẽ_µ[f²] = ∑_{x ∈ {−1,1}^{m+n}} µ(x)(f(x))² ≥ 0 for all degree-1 polynomials f : {−1, 1}^{m+n} → R.

Theorem (SOS). For any degree-2 pseudo-distribution µ : {−1, 1}^{m+n} → R,

    Ẽ_{µ(x,y)}[ ∑_{i,j} A_ij x_i y_j ] ≤ K ‖A‖_{∞→1}.

In this note, we will prove Grothendieck's Inequality when H = R^{m+n}. The proof is mainly due to Krivine. However, we use a nice simplification of a key lemma in Krivine's proof (which holds for general H), due to Alon and Naor. This will provide us with the tools to prove Alon and Naor's theorem. Finally, we will discuss the connection to SOS and how to prove the SOS theorem.

2 Grothendieck's Inequality

Krivine's proof of Grothendieck's Inequality relies on Grothendieck's Identity, which, as the name suggests, was first proved by Grothendieck:

Lemma (Grothendieck's Identity). Let x and y be unit vectors in (R^d, ‖·‖_2), where d ≥ 2. If z is a unit vector picked uniformly at random from (R^d, ‖·‖_2), then

    E[sign(⟨x, z⟩) sign(⟨y, z⟩)] = (2/π) arcsin(⟨x, y⟩).

Here, sign(a) ∈ {−1, 1} is 1 if and only if a ≥ 0.

Proof. Consider sign(⟨x, z⟩) sign(⟨y, z⟩). This has a nice geometric interpretation. First, we orient the sphere {x ∈ R^d : ‖x‖_2 = 1} so that z is at the top. It can be verified that sign(⟨x, z⟩) sign(⟨y, z⟩) is 1 if and only if both x and y lie in the same (upper or lower) half of the sphere when it is oriented this way. Equivalently, {x ∈ R^d : ⟨z, x⟩ = 0} is a hyperplane passing through the origin (with normal z). A vector x ∈ R^d satisfies ⟨x, z⟩ > 0 if and only if it lies above the hyperplane. Figure 2 contains a depiction of this.

Figure 2: Geometric interpretation of sign(⟨x, z⟩) sign(⟨y, z⟩); one panel shows the case where the product is 1, the other where it is −1.

Now, consider the expectation. Given the geometric interpretation, the expectation is Pr[x, y lie in same half] − Pr[x, y lie in different halves] = 1 − 2 Pr[x, y lie in different halves], when a random hyperplane passing through the origin is selected (with normal z). Then we note that the probability that x and y lie in different halves of the circle is θ/π, where θ is the angle between x and y (the factor of 2 comes from z and −z defining the same hyperplane). Hence, the expectation is 1 − 2θ/π. On the other hand,

    (2/π) arcsin(⟨x, y⟩) = (2/π) arcsin(cos θ) = (2/π) arcsin(sin(π/2 − θ)) = (2/π)(π/2 − θ) = 1 − 2θ/π.

This doesn't appear to help much, as we do not know what arcsin(⟨x, y⟩) is in terms of ⟨x, y⟩. The next lemma addresses this problem.
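Before moving on, here is a quick Monte Carlo check of Grothendieck's Identity (my own illustration, not from the notes): sample random unit vectors z and compare the empirical average of sign(⟨x, z⟩) sign(⟨y, z⟩) with (2/π) arcsin(⟨x, y⟩).

    import numpy as np

    rng = np.random.default_rng(0)
    d = 3
    x = np.array([1.0, 0.0, 0.0])                  # unit vector
    y = np.array([0.6, 0.8, 0.0])                  # unit vector

    sign = lambda a: np.where(a >= 0, 1.0, -1.0)   # sign(a) = 1 iff a >= 0

    # Uniform random unit vectors z: normalize standard Gaussians.
    z = rng.standard_normal((200000, d))
    z /= np.linalg.norm(z, axis=1, keepdims=True)

    lhs = np.mean(sign(z @ x) * sign(z @ y))       # E[sign(<x,z>) sign(<y,z>)]
    rhs = (2 / np.pi) * np.arcsin(np.dot(x, y))    # Grothendieck's Identity
    print(lhs, rhs)                                # agree up to sampling error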

Lemma (Krivine/Alon and Naor). Suppose that x^(i), y^(j) are unit vectors in R^{m+n}, for i ∈ [m], j ∈ [n]. Then there are unit vectors x̂^(i), ŷ^(j) in R^{m+n}, for i ∈ [m], j ∈ [n], such that

    arcsin(⟨x̂^(i), ŷ^(j)⟩) = ln(1 + √2) ⟨x^(i), y^(j)⟩.

Proof. Let c = ln(1 + √2) and d = m + n. By Taylor expansion,

    sin(c ⟨x^(i), y^(j)⟩) = ∑_{k=0}^∞ (−1)^k (c^{2k+1} / (2k+1)!) (⟨x^(i), y^(j)⟩)^{2k+1}.

Our goal is to write the above as the inner product of two vectors in some vector space. This suggests that we need an infinite-dimensional vector space. Towards this end, consider the infinite-dimensional vector space H obtained by taking the direct sum of the (2k+1)-fold tensor powers of R^d, i.e. H = ⊕_{k=0}^∞ (R^d)^{⊗(2k+1)}. (As a bit of an aside, the direct sum of two vector spaces A and B of dimension α and β, respectively, is a vector space A ⊕ B of dimension α + β; given vectors a ∈ A, b ∈ B, we get the vector a ⊕ b = (a_1, ..., a_α, b_1, ..., b_β). Similarly, the tensor product of A and B, A ⊗ B, gives a vector space of dimension αβ; given vectors a ∈ A and b ∈ B, we get the vector a ⊗ b = (a_i b_j).)

Let X^(i) and Y^(j) be vectors in H with the following coordinates for the k-th part in the direct sum:

    X^(i)_k = (−1)^k √(c^{2k+1} / (2k+1)!) (x^(i))^{⊗(2k+1)}
    Y^(j)_k = √(c^{2k+1} / (2k+1)!) (y^(j))^{⊗(2k+1)}

It is a fact that ⟨a^{⊗(2k+1)}, b^{⊗(2k+1)}⟩ = (⟨a, b⟩)^{2k+1}. Hence, ⟨X^(i), Y^(j)⟩ = sin(c ⟨x^(i), y^(j)⟩), as required. Moreover, it can be verified that ⟨X^(i), X^(i)⟩ = sinh(c ⟨x^(i), x^(i)⟩) = sinh(c) = 1, by appealing to the Taylor expansion of sinh(x) = (e^x − e^{−x})/2 and using the preceding fact. Similarly, ⟨Y^(j), Y^(j)⟩ = 1. It follows that X^(i) and Y^(j) are unit vectors in H.

Consider the span, S, of {X^(i), Y^(j)}. As there are only d = m + n vectors, S is isomorphic to a subspace of R^d. By finding an orthonormal basis for S (for example, using Gram-Schmidt) and mapping the basis to the standard basis of R^{m+n}, we can preserve inner products. Thus, X^(i), Y^(j) correspond to unit vectors x̂^(i), ŷ^(j) in R^d with the same inner products (in H and R^d, respectively). It follows that

    arcsin(⟨x̂^(i), ŷ^(j)⟩) = arcsin(⟨X^(i), Y^(j)⟩) = arcsin(sin(c ⟨x^(i), y^(j)⟩)) = c ⟨x^(i), y^(j)⟩ = ln(1 + √2) ⟨x^(i), y^(j)⟩,

as required.

Proof of Grothendieck's Inequality. We may now prove Grothendieck's Inequality when H = R^{m+n}. For x^(i), y^(j) that maximize ∑_{i,j} A_ij ⟨x^(i), y^(j)⟩, we first apply the Krivine/Alon and Naor lemma to obtain x̂^(i), ŷ^(j). Define random variables x̂_i = sign(⟨x̂^(i), z⟩) and ŷ_j = sign(⟨ŷ^(j), z⟩), where z is a unit vector in R^{m+n} chosen uniformly at random. We may then compute:

    E[ ∑_{i,j} A_ij x̂_i ŷ_j ] = ∑_{i,j} A_ij E[ sign(⟨x̂^(i), z⟩) sign(⟨ŷ^(j), z⟩) ]
                              = (2/π) ∑_{i,j} A_ij arcsin(⟨x̂^(i), ŷ^(j)⟩)
                              = (2/π) ln(1 + √2) ∑_{i,j} A_ij ⟨x^(i), y^(j)⟩
                              = (1/K) max_{unit x^(i), y^(j) ∈ R^{m+n}} ∑_{i,j} A_ij ⟨x^(i), y^(j)⟩.

As (x̂, ŷ) ∈ {−1, 1}^{m+n}, this is at most max_{x_i, y_j ∈ {−1,1}} ∑_{i,j} A_ij x_i y_j = ‖A‖_{∞→1}. Grothendieck's Inequality immediately follows.

3 Alon and Naor's Theorem

Alon and Naor's rounding algorithm is as follows:

1. Compute the optimal solution x^(i), y^(j) of the semidefinite program.
2. Find x̂^(i), ŷ^(j) such that arcsin(⟨x̂^(i), ŷ^(j)⟩) = ln(1 + √2) ⟨x^(i), y^(j)⟩.
3. Pick a unit vector z uniformly at random from R^{m+n}.
4. Set x̂_i = sign(⟨x̂^(i), z⟩) and ŷ_j = sign(⟨ŷ^(j), z⟩).

Then the same calculation as in the preceding section gives us that:

    E[ ∑_{i,j} A_ij x̂_i ŷ_j ] = (1/K) max_{unit x^(i), y^(j) ∈ R^{m+n}} ∑_{i,j} A_ij ⟨x^(i), y^(j)⟩.

Alon and Naor's theorem follows.
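Steps 3 and 4 are easy to implement once the vectors from step 2 are in hand. Below is a minimal numpy sketch (my own illustration, not from the notes) of the random-hyperplane rounding; X_hat and Y_hat are hypothetical matrices whose rows are the unit vectors x̂^(i) and ŷ^(j), and how to compute them is addressed next.

    import numpy as np

    def round_to_signs(X_hat, Y_hat, rng=np.random.default_rng()):
        """Random-hyperplane rounding (steps 3-4): X_hat is m x d, Y_hat is n x d,
        with unit-norm rows. Returns sign vectors in {-1, 1}^m and {-1, 1}^n."""
        d = X_hat.shape[1]
        z = rng.standard_normal(d)
        z /= np.linalg.norm(z)                   # uniform random unit vector
        x = np.where(X_hat @ z >= 0, 1, -1)      # x_i = sign(<x_hat^(i), z>)
        y = np.where(Y_hat @ z >= 0, 1, -1)      # y_j = sign(<y_hat^(j), z>)
        return x, y

Averaging ∑_{i,j} A_ij x_i y_j over many independent draws of z estimates the expectation in the guarantee above.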

Perhaps the final detail to address is how to find x̂^(i), ŷ^(j). We may do so with the following semidefinite program:

    min  | ∑_{i,j} A_ij ⟨x̂^(i), ŷ^(j)⟩ − ln(1 + √2) ∑_{i,j} A_ij ⟨x^(i), y^(j)⟩ |
    s.t. x̂^(1), ..., x̂^(m), ŷ^(1), ..., ŷ^(n) are unit vectors in (R^d, ‖·‖_2)

By Krivine's lemma, the optimal value is 0.

4 Connection to SOS

Let µ : {−1, 1}^{m+n} → R be an arbitrary degree-2 pseudo-distribution. Recall that we aim to prove that Ẽ_{µ(x,y)}[ ∑_{i,j} A_ij x_i y_j ] ≤ K ‖A‖_{∞→1}, for all such µ. To see the similarity between this statement and Grothendieck's Inequality, we appeal to the fact that, by considering µ′(w) = (µ(w) + µ(−w))/2, which has the same pseudo-expectation as µ on ∑_{i,j} A_ij x_i y_j, we may assume that Ẽ_{µ(x,y)}[x_i] = Ẽ_{µ(x,y)}[y_j] = 0. Moreover, as (x, y) ∈ {−1, 1}^{m+n}, Ẽ_{µ(x,y)}[x_i²] = Ẽ_{µ(x,y)}[y_j²] = Ẽ_{µ(x,y)}[1] = 1.

By the Quadratic Sampling Lemma (see the lecture notes or Boaz's notes), there exists a joint normal probability distribution ρ : R^{m+n} → R that has the same first two moments as µ, i.e. the same mean (entry-wise) and covariance matrix (here we consider the formal covariance matrix under the pseudo-expectation of µ). Geometrically, sampling a point in R^{m+n} according to ρ is essentially sampling a point from {x ∈ R^{m+n} : ‖x‖_2 ≤ 1}, the unit ball in R^{m+n}. Hence, roughly speaking, we have:

    E_{(u,v)∼ρ}[ ∑_{i,j} A_ij u_i v_j ]
      ≈ E[ ∑_{i,j} A_ij ⟨u^(i), v^(j)⟩ ]    (u^(i), v^(j) random vectors in the unit ball of R^{m+n})
      ≤ max_{x^(i), y^(j) ∈ R^{m+n} : ‖x^(i)‖_2, ‖y^(j)‖_2 ≤ 1} ∑_{i,j} A_ij ⟨x^(i), y^(j)⟩
      = max_{x^(i), y^(j) ∈ R^{m+n} : ‖x^(i)‖_2 = ‖y^(j)‖_2 = 1} ∑_{i,j} A_ij ⟨x^(i), y^(j)⟩
      ≤ K ‖A‖_{∞→1}

The fourth line follows since the maximum value of the semidefinite program is achieved on the boundary (i.e. we could have relaxed the constraint on x^(i), y^(j) to ‖x^(i)‖_2, ‖y^(j)‖_2 ≤ 1). As µ and ρ have the same first two moments, the pseudo-expectation of ∑_{i,j} A_ij x_i y_j under µ is the same as the expectation under ρ, so we have the desired bound.

Of course, this is rather imprecise, as sampling from ρ does not exactly correspond to sampling from {x ∈ R^{m+n} : ‖x‖_2 ≤ 1} (although it is true with high probability), hence the second line is only approximate. But hopefully this lends some intuition on why it could be true. In the remainder of this section, we formally prove this. We first prove a modified version of Grothendieck's Identity.

Lemma. Let (u, v) ∈ R² be a joint normal distribution such that

    (u, v) ∼ N( (0, 0), [[1, ρ], [ρ, 1]] ).

The latter matrix is the covariance matrix. Notice that, by Cauchy-Schwarz, −1 ≤ ρ ≤ 1. Then

    E[sign(u) sign(v)] = (2/π) arcsin(E[uv]) = (2/π) arcsin(ρ).

Proof. We may assume ρ ≥ 0 (otherwise, apply the result to (u, −v)). We may write v = ρu + √(1 − ρ²) w, where w ∼ N(0, 1) is independent of u. (This is because E[u(ρu + √(1 − ρ²) w)] = E[ρu² + √(1 − ρ²) uw] = ρ, as E[u²] = 1 and u and w are independent, hence E[uw] = E[u]E[w] = 0.) It follows that

    E[sign(u) sign(v)] = 1 − 2 Pr_{u,w}[sign(u) ≠ sign(ρu + √(1 − ρ²) w)].

We observe that sign(u) ≠ sign(v) if and only if sign(u) ≠ sign(w) and ρ|u| < √(1 − ρ²)|w| (equivalently, ρ² < w²/(u² + w²), except when u = w = 0).

Figure 3: Geometric depiction of U, U + W, and θ; the relevant arc is where θ < arcsin(√(1 − ρ²)).

To compute the probability of this, we interpret it geometrically. First, as u and w are independent, u and w may be viewed as vectors U = (u, 0), W = (0, w) in R², respectively. In this view, w²/(u² + w²) = (cos θ)², where θ is the angle between W and U + W. As u, w ∼ N(0, 1) are independent, for any r > 0, if we do rejection sampling for (u, w) such that u² + w² = r², then this is the same as sampling a point uniformly at random from the circle of radius r in R². Hence,

    Pr[sign(u) ≠ sign(v) | u² + w² = r²] = Pr[(cos θ)² > ρ² | u² + w² = r²] = Pr[(sin θ)² < 1 − ρ² | u² + w² = r²]

is the proportion of the circle u² + w² = r² such that (sin θ)² < 1 − ρ², where θ is the angle between W and U + W. This is summarized in Figure 3. We may compute this to be

    2 arcsin(√(1 − ρ²)) r / (2πr) = (1/π) arcsin(√(1 − ρ²)).

Since this holds for every r > 0 and the events {u² + w² = r²} account for all of the probability, this gives us that

    Pr[sign(u) ≠ sign(v)] = (1/π) arcsin(√(1 − ρ²)).

It follows that

    E[sign(u) sign(v)] = 1 − (2/π) arcsin(√(1 − ρ²)) = 1 − (2/π) arccos(ρ) = 1 − (2/π)(π/2 − arcsin(ρ)) = (2/π) arcsin(ρ).

The preceding lemma implies that if (u_1, ..., u_m, v_1, ..., v_n) ∈ R^{m+n} is a joint normal distribution that satisfies E[u_i] = E[v_j] = 0 and E[u_i²] = E[v_j²] = 1, then, for all i, j,

    E[sign(u_i) sign(v_j)] = (2/π) arcsin(E[u_i v_j]).

To carry out the same calculation we did in the preceding section to prove Grothendieck's Inequality, we need arcsin(E[u_i v_j]) = ln(1 + √2) Ẽ_{µ(x,y)}[x_i y_j], which should look familiar from Krivine's lemma in the previous section. Put another way, we need to pick the covariances of the u_i, v_j according to Ẽ_{µ(x,y)}[x_i y_j]. The next lemma proves exactly this:

Lemma. There exists a joint normal distribution (u_1, ..., u_m, v_1, ..., v_n) ∈ R^{m+n} such that, for all i ∈ [m], j ∈ [n], E[u_i] = E[v_j] = 0, E[u_i²] = E[v_j²] = 1, and

    arcsin(E[u_i v_j]) = ln(1 + √2) Ẽ_{µ(x,y)}[x_i y_j].

Proof. Let Σ ∈ R^{(m+n)×(m+n)} be the matrix defined by Σ_ij = Ẽ_{µ(w)}[w_i w_j], where w = (x, y). In other words, Σ is the formal covariance matrix of (x, y) (here is where we use the assumption that Ẽ_{µ(x,y)}[x_i] = Ẽ_{µ(x,y)}[y_j] = 0). By the Quadratic Sampling Lemma, Σ is positive semidefinite and symmetric, i.e. it actually defines a covariance matrix. Moreover, since Ẽ_{µ(x,y)}[1] = 1, Ẽ_{µ(x,y)}[x_i²] = Ẽ_{µ(x,y)}[1] = 1 and, similarly, Ẽ_{µ(x,y)}[y_j²] = 1. If we extend the pseudo-expectation function so that it applies to matrices (in terms of x, y) entry-wise, i.e., (Ẽ_{µ(x,y)}[B])_ij = Ẽ_{µ(x,y)}[B_ij], then we may view Σ more compactly as:

    Σ = [ Ẽ_{µ(x,y)}[xxᵗ]   Ẽ_{µ(x,y)}[xyᵗ] ]
        [ Ẽ_{µ(x,y)}[yxᵗ]   Ẽ_{µ(x,y)}[yyᵗ] ]

Consider the matrix Σ′ ∈ R^{(m+n)×(m+n)} defined as follows:

    Σ′ = [ sinh(ln(1 + √2) Ẽ_{µ(x,y)}[xxᵗ])   sin(ln(1 + √2) Ẽ_{µ(x,y)}[xyᵗ]) ]
         [ sin(ln(1 + √2) Ẽ_{µ(x,y)}[yxᵗ])    sinh(ln(1 + √2) Ẽ_{µ(x,y)}[yyᵗ]) ]

where sinh and sin are applied entry-wise to each submatrix. Since Σ is positive semidefinite and symmetric, it can be shown that the same holds for Σ′, i.e. Σ′ defines a covariance matrix.

Moreover, since sinh(ln(1 + √2)) = 1 and Ẽ_{µ(x,y)}[x_i²] = Ẽ_{µ(x,y)}[y_j²] = 1, we have that the diagonal entries of Σ′ are all 1's. It follows that we can pick (u, v) to be a joint normal distribution with E[u_i] = E[v_j] = 0 and covariance matrix Σ′.

Putting this all together, we have that:

    ‖A‖_{∞→1} ≥ E_{(u,v)}[ ∑_{i,j} A_ij sign(u_i) sign(v_j) ]
              = ∑_{i,j} A_ij E_{(u,v)}[sign(u_i) sign(v_j)]
              = (2/π) ∑_{i,j} A_ij arcsin(E_{(u,v)}[u_i v_j])
              = (1/K) ∑_{i,j} A_ij Ẽ_{µ(x,y)}[x_i y_j].

The SOS theorem follows.
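To close, here is a small numerical illustration of the construction in the last lemma (my own sketch, not from the notes): starting from a toy second-moment matrix Σ, it forms Σ′ entry-wise via sinh and sin, samples a joint Gaussian with covariance Σ′, and checks that E[sign(u_i) sign(v_j)] ≈ (2/π) ln(1 + √2) Σ_ij, as the proof requires. The matrix Σ here is a stand-in (any positive semidefinite matrix with unit diagonal), not one derived from an actual pseudo-distribution.

    import numpy as np

    rng = np.random.default_rng(1)
    m, n = 2, 2
    c = np.log(1 + np.sqrt(2))

    # Toy second-moment matrix Sigma: Gram matrix of random unit vectors
    # (any PSD matrix with unit diagonal would do for this illustration).
    V = rng.standard_normal((m + n, m + n))
    V /= np.linalg.norm(V, axis=1, keepdims=True)
    Sigma = V @ V.T

    # Krivine-style transform Sigma': sinh on the diagonal blocks,
    # sin on the off-diagonal blocks, applied entry-wise.
    Sigma_p = np.empty_like(Sigma)
    Sigma_p[:m, :m] = np.sinh(c * Sigma[:m, :m])
    Sigma_p[m:, m:] = np.sinh(c * Sigma[m:, m:])
    Sigma_p[:m, m:] = np.sin(c * Sigma[:m, m:])
    Sigma_p[m:, :m] = np.sin(c * Sigma[m:, :m])

    # Sample (u, v) ~ N(0, Sigma') and check the sign identity.
    samples = rng.multivariate_normal(np.zeros(m + n), Sigma_p, size=200000)
    signs = np.where(samples >= 0, 1.0, -1.0)
    emp = signs.T @ signs / len(signs)                 # empirical E[sign * sign]
    print(emp[0, m], (2 / np.pi) * c * Sigma[0, m])    # agree up to sampling error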