Lecture 4: Polynomial Optimization

CS369H: Hierarchies of Integer Programming Relaxations, Spring 2016-2017
Professor Moses Charikar
Scribe: Mona Azadkia

1 Introduction

Non-negativity over the hypercube. Given a low-degree polynomial $f : \{0,1\}^n \to \mathbb{R}$, we want to decide whether $f \ge 0$ over the hypercube or whether there exists a point $x \in \{0,1\}^n$ such that $f(x) < 0$.

Let us see how to formulate the Max-Cut problem in this language. Recall that we are given a graph $G = (V, E)$ and we look for a bipartition of the vertex set $V$ that maximizes the number of edges between the two parts (the size of the cut). For $|V| = n$ we encode a bipartition of the vertex set of $G$ by a vector $x \in \{0,1\}^n$ and let $f_G(x)$ be the number of edges cut by the bipartition $x$. This function is a degree-2 polynomial,

$$ f_G(x) = \sum_{\{i,j\} \in E(G)} (x_i - x_j)^2 . $$

Therefore, deciding whether the polynomial $c - f_G$ takes a negative value over the hypercube is equivalent to deciding whether the maximum cut in $G$ is larger than $c$.

2 Sum-of-Squares Algorithm

The sum-of-squares algorithm takes a polynomial $f : \{0,1\}^n \to \mathbb{R}$ as input and outputs

1. either a proof that $f(x) \ge 0$ for all $x \in \{0,1\}^n$,
2. or an object that "pretends" to be a point $x \in \{0,1\}^n$ with $f(x) < 0$.

2.1 Sum-of-Squares Certificate

Definition 4.1 (SOS certificate). A degree-$d$ SOS certificate for a function $f : \{0,1\}^n \to \mathbb{R}$ consists of polynomials $g_1, \dots, g_r : \{0,1\}^n \to \mathbb{R}$ of degree at most $d/2$, for some $r \in \mathbb{N}$, such that

$$ f(x) = \sum_{i=1}^{r} g_i^2(x) \quad \text{for every } x \in \{0,1\}^n . $$

We will also refer to a degree-$d$ SOS certificate for $f$ as a degree-$d$ SOS proof of the inequality $f \ge 0$.

A natural question is how large $r$ has to be; we will answer this question later.
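
As a concrete illustration of the decision problem from the introduction (brute force, not the sum-of-squares algorithm itself), the following sketch builds the max-cut polynomial $f_G$ for a small example graph and checks by enumeration whether $c - f_G$ takes a negative value over the hypercube. The graph and the threshold $c$ are arbitrary choices made for this example.

```python
# Brute-force sketch: decide whether c - f_G goes negative over {0,1}^n,
# i.e. whether the maximum cut exceeds c.  Exponential in n, for illustration only.
from itertools import product

edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]  # example: a 4-cycle plus a chord
n = 4

def f_G(x):
    # number of edges cut by the bipartition encoded by x in {0,1}^n
    return sum((x[i] - x[j]) ** 2 for i, j in edges)

c = 3
violations = [x for x in product((0, 1), repeat=n) if c - f_G(x) < 0]
print("max cut exceeds", c, ":", bool(violations))
print("maximum cut value:", max(f_G(x) for x in product((0, 1), repeat=n)))
```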

2.2 Verifying Certificates

How do we check that $f - \sum_{i=1}^{r} g_i^2$ vanishes at every $x \in \{0,1\}^n$? Since $g_1, \dots, g_r$ have degree at most $d/2$, we can represent each polynomial $g_i$ by $n^{O(d)}$ coefficients (say in the monomial basis). Thus, in time $n^{O(d)} \cdot r$ we can reduce $f - \sum_i g_i^2$ to a multilinear polynomial by repeatedly applying the identity $x_i^2 = x_i$ (which holds for $x_i \in \{0,1\}$) and then verify whether all of its coefficients are zero.

Now consider a function $f : \{0,1\}^n \to \mathbb{R}$ with $f(x) \ge 0$ for every $x \in \{0,1\}^n$. Is there always a certificate for $f$?

Proposition 4.1. Every non-negative $f : \{0,1\}^n \to \mathbb{R}$ has a degree-$2n$ SOS certificate.

Proof. For $x, y \in \{0,1\}^n$ we have the identity

$$ \mathbb{I}\{x = y\} = \prod_{i \in \mathrm{Ones}(y)} x_i^2 \prod_{i \in \mathrm{Zeros}(y)} (1 - x_i)^2 , $$

so we may write

$$ f(x) = \sum_{y \in \{0,1\}^n} f(y)\, \mathbb{I}\{x = y\} = \sum_{y \in \{0,1\}^n} f(y) \prod_{i \in \mathrm{Ones}(y)} x_i^2 \prod_{i \in \mathrm{Zeros}(y)} (1 - x_i)^2 . $$

Since $f(y) \ge 0$ for every $y$, this is by construction a certificate for $f$ with

$$ g_y(x) = \sqrt{f(y)} \prod_{i \in \mathrm{Ones}(y)} x_i \prod_{i \in \mathrm{Zeros}(y)} (1 - x_i) . $$

Each $g_y$ has degree at most $n$, so this is a degree-$2n$ certificate.
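
The verification step of Section 2.2 is easy to sketch in code. Below is a minimal sympy-based check (the helper names and the toy example are mine, not from the lecture): it reduces $f - \sum_i g_i^2$ to a multilinear polynomial by capping every exponent at 1, which amounts to repeatedly applying $x_i^2 = x_i$, and then tests whether all coefficients vanish.

```python
import sympy as sp

def multilinearize(expr, xs):
    # reduce over {0,1}^n using x_i**2 = x_i: cap every exponent at 1
    poly = sp.Poly(sp.expand(expr), *xs)
    out = 0
    for monom, coeff in poly.terms():
        term = coeff
        for var, exp in zip(xs, monom):
            if exp:
                term *= var
        out += term
    return sp.expand(out)

def is_sos_certificate(f, gs, xs):
    # checks that f - sum_i g_i^2 vanishes identically over the hypercube
    diff = f - sum(g ** 2 for g in gs)
    return multilinearize(diff, xs) == 0

# toy example: f = x1 + x2 - 2*x1*x2 counts whether x1 != x2, and g = x1 - x2
# certifies f >= 0 since (x1 - x2)^2 = x1 - 2*x1*x2 + x2 over {0,1}^2
x1, x2 = sp.symbols('x1 x2')
print(is_sos_certificate(x1 + x2 - 2*x1*x2, [x1 - x2], [x1, x2]))  # True
```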

2.3 Finding Certificates

We saw that SOS certificates can be checked efficiently. The following theorem shows that they can also be found efficiently. The resulting sum-of-squares algorithm is based on semidefinite programming; it was first proposed by Naum Shor in 1987 and later refined by Pablo Parrilo in 2000 and Jean Lasserre in 2001.

Theorem 4.1 (sum-of-squares algorithm, certificate version). There exists an algorithm that, given a function $f : \{0,1\}^n \to \mathbb{R}$ and $k \in \mathbb{N}$, outputs a degree-$k$ SOS certificate for $f + 2^{-n}$ in time $n^{O(k)}$, provided $f$ has a degree-$k$ SOS certificate.

Theorem 4.2. $f$ has a degree-$d$ SOS certificate if and only if there exists a p.s.d. matrix $A$ such that for all $x \in \{0,1\}^n$,

$$ f(x) = \big\langle (1,x)^{\otimes d/2},\, A\, (1,x)^{\otimes d/2} \big\rangle . $$

Proof. First we prove ($\Leftarrow$). If $A \succeq 0$, then we can write $A = \sum_i v_i v_i^{T}$. Recall that $(1,x)^{\otimes d/2}$ denotes the vector whose entries are all monomials of degree at most $d/2$, i.e. $(1, x_1, \dots, x_i x_j, \dots)$. Therefore

$$ \big\langle (1,x)^{\otimes d/2},\, A\, (1,x)^{\otimes d/2} \big\rangle = \sum_i \big\langle (1,x)^{\otimes d/2},\, v_i v_i^{T} (1,x)^{\otimes d/2} \big\rangle = \sum_i \big\langle v_i,\, (1,x)^{\otimes d/2} \big\rangle^2 . $$

So defining $g_i(x) = \langle v_i, (1,x)^{\otimes d/2} \rangle$, which has degree at most $d/2$, gives $f(x) = \sum_i g_i^2(x)$. To prove ($\Rightarrow$), run the same argument backwards: write each $g_i$ of a certificate as $g_i(x) = \langle v_i, (1,x)^{\otimes d/2} \rangle$ for some coefficient vector $v_i$ and set $A = \sum_i v_i v_i^{T} \succeq 0$.

Note that this proof also shows that if $f$ has a degree-$d$ SOS certificate, then it has one with at most $r \le (n+1)^{d/2}$ squares, the dimension of the vector $(1,x)^{\otimes d/2}$; this answers the earlier question about the size of $r$.

Exercise 4.1. For a graph $G = (V, E)$ consider its Laplacian $L_G = \sum_{(i,j) \in E} (e_i - e_j)(e_i - e_j)^{T}$. Show that $\frac{n}{2}\,\lambda_{\max}(L_G) - f_G$ has a degree-2 SOS certificate, where $f_G$ is the max-cut polynomial.

Exercise 4.2. Show that for every $f : \{0,1\}^n \to \mathbb{R}$ of degree at most $d$, for even $d \in \mathbb{N}$, there exists $M \in \mathbb{R}_{\ge 0}$ such that $M + f$ has a degree-$d$ SOS certificate. Moreover, $M$ can be chosen to be at most $n^{O(d)}$ times the largest coefficient of $f$ in the monomial basis.
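
Theorem 4.2 turns the search for a certificate into a semidefinite feasibility problem. The following is a minimal sketch of that search, assuming cvxpy with an SDP-capable solver is installed; for a tiny $n$ it simply imposes the identity at every hypercube point, so it illustrates the formulation rather than the $n^{O(k)}$ algorithm of Theorem 4.1. The triangle graph and the constant $9/4$ are illustrative choices (a degree-2 certificate exists here, since $9/4 - f_{K_3} = (x_1 + x_2 + x_3 - 3/2)^2$ over $\{0,1\}^3$).

```python
import numpy as np
import cvxpy as cp
from itertools import combinations, product

n, d = 3, 2
edges = [(0, 1), (1, 2), (0, 2)]  # triangle K3, illustrative
def target(x):
    # 9/4 - f_K3(x); equals (x1 + x2 + x3 - 3/2)^2 on the hypercube
    return 2.25 - sum((x[i] - x[j]) ** 2 for i, j in edges)

# monomials of degree <= d/2 index the entries of the vector (1, x)^{d/2}
monos = [S for k in range(d // 2 + 1) for S in combinations(range(n), k)]
def vec(x):
    return np.array([np.prod([x[i] for i in S]) if S else 1.0 for S in monos])

A = cp.Variable((len(monos), len(monos)), PSD=True)
constraints = [vec(x) @ A @ vec(x) == target(x) for x in product((0, 1), repeat=n)]
prob = cp.Problem(cp.Minimize(0), constraints)
prob.solve()
print("feasibility status:", prob.status)  # 'optimal' means a certificate was found

# A = sum_i v_i v_i^T recovers the squares g_i(x) = <v_i, (1,x)^{d/2}>
eigvals, eigvecs = np.linalg.eigh(A.value)
print("number of (numerically) nonzero squares r:", int((eigvals > 1e-6).sum()))
```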

2.4 Degree-k SOS Certificates and Pseudo-Distributions

First, note that the functions with a degree-$k$ SOS certificate form a closed convex cone. Hence, by the hyperplane separation theorem for convex cones, for every function $f : \{0,1\}^n \to \mathbb{R}$ without a degree-$k$ SOS certificate there exists a hyperplane through the origin that separates $f$ from the cone of functions with a degree-$k$ SOS certificate, in the sense that the halfspace $H$ above that hyperplane contains the degree-$k$ SOS cone but not $f$.

What does such a halfspace look like? We can represent it by its normal function $\mu : \{0,1\}^n \to \mathbb{R}$, so that

$$ H = \Big\{ g : \{0,1\}^n \to \mathbb{R} \;\Big|\; \sum_{x \in \{0,1\}^n} \mu(x) g(x) \ge 0 \Big\} . $$

By scaling we may assume without loss of generality that $\sum_{x \in \{0,1\}^n} \mu(x) = 1$. If we had $\mu(x) \ge 0$ for all $x \in \{0,1\}^n$, then $\mu$ would be a probability distribution on the hypercube. In that case, the halfspace $H$ would simply be the set of all functions with non-negative expectation with respect to $\mu$; since $f \notin H$, the expectation of $f$ with respect to $\mu$ would be negative, and hence there would exist at least one point $x$ on the hypercube with $f(x) < 0$.

The catch is that $\mu$ is not necessarily non-negative: there is no guarantee that $\mu(x) \ge 0$ for all $x \in \{0,1\}^n$. Still, $\mu$ behaves like a probability distribution in many ways. Let us formalize this idea by defining pseudo-distributions and pseudo-expectations.

Definition 4.2 (degree-$d$ pseudo-distribution). A degree-$d$ pseudo-distribution over the hypercube is a function $\mu : \{0,1\}^n \to \mathbb{R}$ such that $\tilde{\mathbb{E}}_\mu 1 = 1$ and $\tilde{\mathbb{E}}_\mu f^2 \ge 0$ for every polynomial $f$ of degree at most $d/2$, where

$$ \tilde{\mathbb{E}}_\mu f = \sum_{x \in \{0,1\}^n} \mu(x) f(x) $$

is called the pseudo-expectation of $f$.

Lemma 4.1. Suppose $\mu$ is a degree-$\ell$ pseudo-distribution. Then there exists a multilinear polynomial $\mu'$ of degree at most $\ell$ such that $\tilde{\mathbb{E}}_{\mu'} p = \tilde{\mathbb{E}}_\mu p$ for every polynomial $p$ of degree at most $\ell$.

Proof. Let $U_\ell \subseteq \mathbb{R}^{\{0,1\}^n}$ be the linear subspace of multilinear polynomials of degree at most $\ell$. As functions on the hypercube, all polynomials of degree at most $\ell$ lie in this space. Decompose $\mu = \mu' + \mu''$ with $\mu' \in U_\ell$ and $\mu'' \perp U_\ell$. Then for every $p \in U_\ell$,

$$ \tilde{\mathbb{E}}_\mu p = \langle \mu' + \mu'', p \rangle = \langle \mu', p \rangle = \tilde{\mathbb{E}}_{\mu'} p . $$

The notion of pseudo-expectation extends naturally to vector-valued functions, in which case it denotes the vector obtained by taking the pseudo-expectation of every coordinate. With this notation, the conclusion of the lemma can be written more succinctly as $\tilde{\mathbb{E}}_{\mu'} (1,x)^{\otimes \ell} = \tilde{\mathbb{E}}_{\mu} (1,x)^{\otimes \ell}$.

Exercise 4.3. If $\mu$ has degree larger than $\ell$, what is the projection of $\mu$ onto $U_\ell$?

Exercise 4.4. Show that if $\mu$ is a degree-$2n$ pseudo-distribution, then $\mu(x) \ge 0$ for all $x$.

Exercise 4.5. Show that $\mu : \{0,1\}^n \to \mathbb{R}$ is a degree-$d$ pseudo-distribution if and only if

$$ \tilde{\mathbb{E}}_\mu 1 = 1 \quad \text{and} \quad \tilde{\mathbb{E}}_\mu \Big[ (1,x)^{\otimes d/2} \big( (1,x)^{\otimes d/2} \big)^{T} \Big] \succeq 0 . $$

Exercise 4.6. Show that for every $d$ and every pseudo-distribution $\mu$ of degree $d$, there exists a degree-$d$ pseudo-distribution $\mu'$ with the same pseudo-moments up to degree $d$ as $\mu$ such that

$$ |\mu'(x)| \le 2^{-n} \sum_{d'=0}^{d} \binom{n}{d'} \quad \text{for all } x . $$

Hint: Fourier analysis.

Exercise 4.7. Show that the set of degree-$d$ pseudo-distributions over $\{0,1\}^n$ admits a separation algorithm with running time $n^{O(d)}$. Concretely, show that there exists an $n^{O(d)}$-time algorithm that, given a vector $N \in (\mathbb{R}^{n+1})^{\otimes d}$ outside of the set $\chi_d$ below, outputs a halfspace that separates $N$ from $\chi_d$. Here $\chi_d$ is the set of all coefficient vectors $M \in (\mathbb{R}^{n+1})^{\otimes d}$ such that the function $\mu : \{0,1\}^n \to \mathbb{R}$ with $\mu(x) = \langle M, (1,x)^{\otimes d} \rangle$ is a degree-$d$ pseudo-distribution over $\{0,1\}^n$.

Exercise 4.8. Show that for every $d \in \mathbb{N}$, the following set of pseudo-moments admits a separation algorithm with running time $n^{O(d)}$:

$$ \mathcal{M}_d = \big\{ \tilde{\mathbb{E}}_\mu (1,x)^{\otimes d} \;\big|\; \mu \text{ is a degree-}d \text{ pseudo-distribution over } \{0,1\}^n \big\} . $$
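
The criterion in Exercise 4.5 gives a direct way to test whether a table of values $\mu$ is a degree-$d$ pseudo-distribution. Here is a minimal numpy sketch of that test (the function names are mine): it builds the moment matrix $\tilde{\mathbb{E}}_\mu[(1,x)^{\otimes d/2}((1,x)^{\otimes d/2})^T]$ from the table and checks normalization and positive semidefiniteness, illustrated on the uniform distribution over $\{0,1\}^2$.

```python
import numpy as np
from itertools import combinations, product

def moment_matrix(mu, n, half_deg):
    # rows/columns indexed by multilinear monomials of degree <= half_deg;
    # entry (S, T) is E~_mu[x^S * x^T], computed from the table mu
    monos = [S for k in range(half_deg + 1) for S in combinations(range(n), k)]
    points = list(product((0, 1), repeat=n))
    def mono(x, S):
        return np.prod([x[i] for i in S]) if S else 1
    M = np.zeros((len(monos), len(monos)))
    for a, S in enumerate(monos):
        for b, T in enumerate(monos):
            M[a, b] = sum(mu[x] * mono(x, S) * mono(x, T) for x in points)
    return M

def is_pseudo_distribution(mu, n, d):
    # Definition 4.2 via the criterion of Exercise 4.5: E~_mu 1 = 1 and the
    # degree-d/2 moment matrix is positive semidefinite
    total = sum(mu.values())
    M = moment_matrix(mu, n, d // 2)
    return abs(total - 1) < 1e-9 and np.linalg.eigvalsh(M).min() >= -1e-9

# sanity check on an actual distribution: uniform over {0,1}^2
uniform = {x: 0.25 for x in product((0, 1), repeat=2)}
print(is_pseudo_distribution(uniform, n=2, d=2))  # True
```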

3 Duality

We now show that there is a dual relationship between SOS certificates and pseudo-distributions.

Theorem 4.3. For every function $f : \{0,1\}^n \to \mathbb{R}$ and every $d \in \mathbb{N}$, there exists a degree-$d$ SOS certificate for the non-negativity of $f$ if and only if every degree-$d$ pseudo-distribution $\mu$ over $\{0,1\}^n$ satisfies $\tilde{\mathbb{E}}_\mu f \ge 0$.

Proof. ($\Rightarrow$) If $f$ has a degree-$d$ SOS certificate, then $f(x) = \sum_i g_i^2(x)$ where the $g_i$ have degree at most $d/2$. For every degree-$d$ pseudo-distribution $\mu$ we have $\tilde{\mathbb{E}}_\mu g_i^2 \ge 0$, and therefore, by linearity,

$$ \tilde{\mathbb{E}}_\mu f = \sum_i \tilde{\mathbb{E}}_\mu g_i^2 \ge 0 . $$

($\Leftarrow$) Suppose $f$ has no degree-$d$ SOS certificate; we show that there exists a degree-$d$ pseudo-distribution $\mu$ with $\tilde{\mathbb{E}}_\mu f < 0$. By the hyperplane separation theorem there exists a halfspace $H$ through the origin that contains the degree-$d$ SOS cone but not $f$. Let $\mu : \{0,1\}^n \to \mathbb{R}$ be the normal of $H$, so that

$$ H = \big\{ g : \{0,1\}^n \to \mathbb{R} \;\big|\; \tilde{\mathbb{E}}_\mu g \ge 0 \big\} . $$

Since $f \notin H$ we have $\tilde{\mathbb{E}}_\mu f < 0$, and since $H$ contains the degree-$d$ SOS cone, every polynomial $g$ of degree at most $d/2$ satisfies $\tilde{\mathbb{E}}_\mu g^2 \ge 0$. It remains to argue that $\tilde{\mathbb{E}}_\mu 1 > 0$, so that we can rescale $\mu$ by a positive factor to ensure $\tilde{\mathbb{E}}_\mu 1 = 1$. Indeed, by Exercise 4.2 there exists $M \in \mathbb{R}_{\ge 0}$ such that $M + f$ has a degree-$d$ SOS certificate (note that $M > 0$, since $M = 0$ would mean $f$ itself has a certificate), hence $\tilde{\mathbb{E}}_\mu (M + f) \ge 0$ and

$$ \tilde{\mathbb{E}}_\mu 1 = \frac{1}{M} \Big( \tilde{\mathbb{E}}_\mu (M + f) - \tilde{\mathbb{E}}_\mu f \Big) > 0 , $$

as desired.
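
To make the duality concrete, here is a small numerical example (a folklore example, not from the lecture): the triangle $K_3$ has maximum cut 2, so $2 - f_{K_3}$ is non-negative over $\{0,1\}^3$, yet the signed function $\mu$ below satisfies the degree-2 pseudo-distribution conditions and gives $\tilde{\mathbb{E}}_\mu (2 - f_{K_3}) = -1/4 < 0$. By the easy direction of Theorem 4.3, $2 - f_{K_3}$ therefore has no degree-2 SOS certificate.

```python
import numpy as np
from itertools import product

edges = [(0, 1), (1, 2), (0, 2)]  # triangle K3
def f(x):
    # 2 - f_K3(x), non-negative on {0,1}^3 since the max cut of K3 is 2
    return 2 - sum((x[i] - x[j]) ** 2 for i, j in edges)

# signed "distribution": the weight depends only on the Hamming weight of x
weights = {0: -1/8, 1: 1/4, 2: 1/8, 3: 0}
mu = {x: weights[sum(x)] for x in product((0, 1), repeat=3)}

# degree-2 pseudo-distribution conditions (cf. Exercise 4.5):
# normalization and positive semidefiniteness of the moment matrix of (1, x)
def v(x):
    return np.array([1, *x], dtype=float)
M = sum(mu[x] * np.outer(v(x), v(x)) for x in mu)
print("normalized:", abs(sum(mu.values()) - 1) < 1e-12)
print("moment matrix PSD:", np.linalg.eigvalsh(M).min() >= -1e-9)
print("mu is negative somewhere:", min(mu.values()) < 0)

# negative pseudo-expectation => no degree-2 sos certificate for 2 - f_K3
print("E~_mu[2 - f_K3] =", sum(mu[x] * f(x) for x in mu))  # -0.25
```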