Lecture 17: Solving LPs/SDPs using Multiplicative Weights

Lecturer: Anupam Gupta. Scribe: Tim Wilson.

In the last lecture we saw the Multiplicative Weights (MW) algorithm and how it could be used to effectively solve the experts problem, in which we have many experts and wish to make predictions that are approximately as good as the predictions made by the best expert. In this lecture we will see how to apply the MW algorithm to efficiently approximate the optimal solution to LPs and SDPs.

17.1 Multiplicative Weights

Recall the following result from Lecture 16 about the Hedge algorithm:

Theorem 17.1. Suppose the cost vectors are m^{(t)} ∈ [−1, 1]^N. Then for any ε ≤ 1, and for any T, the Hedge algorithm guarantees that for all i ∈ [N],

    Σ_{t=1}^T p^{(t)} · m^{(t)} ≤ Σ_{t=1}^T m_i^{(t)} + εT + (ln N)/ε.

So the total cost paid by the algorithm is no more than an additive factor of εT + (ln N)/ε worse than the cost incurred by any individual component of the cost vector. Theorem 17.1 implies a similar result for the average cost incurred per round. (One can get a similar result for the MW algorithm, where instead of the update rule w_i^{(t+1)} ← w_i^{(t)} exp(−ε m_i^{(t)}), we use the rule w_i^{(t+1)} ← w_i^{(t)} (1 − ε m_i^{(t)}).)

Corollary 17.2. Suppose the cost vectors are m^{(t)} ∈ [−ρ, ρ]^N. Then for any ε ≤ 1/2, and for any T ≥ 4ρ² ln N / ε², the Hedge algorithm guarantees that for all i ∈ [N],

    (1/T) Σ_{t=1}^T p^{(t)} · m^{(t)} ≤ (1/T) Σ_{t=1}^T m_i^{(t)} + ε.
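Since everything below builds on the Hedge update, a minimal sketch may help. This is my own illustrative Python (names and structure are not from the notes): it plays the normalized weight vector p^{(t)} each round and applies the update w_i ← w_i · exp(−ε m_i^{(t)}).

```python
import math

def hedge(cost_vectors, eps):
    """Run Hedge on a sequence of cost vectors m^(t) in [-1,1]^N.

    Returns the total cost paid by the algorithm, sum_t p^(t) . m^(t).
    """
    N = len(cost_vectors[0])
    w = [1.0] * N                      # uniform initial weights
    total_cost = 0.0
    for m in cost_vectors:
        W = sum(w)
        p = [wi / W for wi in w]       # p^(t): normalized weights
        total_cost += sum(pi * mi for pi, mi in zip(p, m))
        # multiplicative update: w_i <- w_i * exp(-eps * m_i)
        w = [wi * math.exp(-eps * mi) for wi, mi in zip(w, m)]
    return total_cost
```

On any input, the returned cost should respect the guarantee of the theorem above: at most min_i Σ_t m_i^{(t)} + εT + (ln N)/ε.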

Note: We did not cover this in lecture, but one can show that if the cost vectors are in [0, ρ]^N, then using the MW algorithm, the setting T ≥ 4ρ ln N / ε² suffices to get the same guarantee:

Lemma 17.3. Suppose the cost vectors are m^{(t)} ∈ [0, ρ]^N. Then for any ε ≤ 1/2, and for any T ≥ 4ρ ln N / ε², the MW algorithm guarantees that for all i ∈ [N],

    (1/T) Σ_{t=1}^T p^{(t)} · m^{(t)} ≤ (1/T) Σ_{t=1}^T m_i^{(t)} + ε.

A proof of this can be found in the Arora, Hazan, and Kale survey [AHK05].

17.2 Solving LPs with Multiplicative Weights

We will use the MW algorithm to help solve LPs with m constraints, of the form

    min c · x
    s.t. Ax ≥ b
         x ≥ 0.

Supposing that we know the optimal value c · x = OPT (by binary search), we will aim to find an ε-approximate solution x̄ such that

    c · x̄ = OPT
    A x̄ ≥ b − ε·1
    x̄ ≥ 0,

or output "infeasible" if no solution exists. The runtime for this will be O(ρ² log m / ε²) oracle calls, where ρ is the width of the LP, which will be defined shortly.

17.2.1 Simplifying the Constraints

Instead of searching for solutions x ∈ R^n, we will package together the easy constraints into the simple convex region

    K = {x ∈ R^n | x ≥ 0, c · x = OPT}.

Now we wish to solve Ax ≥ b such that x ∈ K. Note that this is particularly easy to solve if Ax ≥ b is only one constraint, i.e., we are trying to determine whether there exists x ∈ K such that α · x ≥ β for some α ∈ R^n, β ∈ R. For example, if c > 0 and OPT · max_i (α_i / c_i) ≥ β, we can set x = (OPT / c_i) e_i for the maximizing coordinate i, which will satisfy our constraints; else we could output "infeasible". For general c we are essentially reduced to solving an LP over two constraints, which, while not as trivial as this, is still simple. We will henceforth assume we have an oracle that, given α ∈ R^n, β ∈ R, and K ⊆ R^n, either returns x ∈ K such that α · x ≥ β, or correctly asserts that there is no such x.
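As a sanity check of the single-constraint case, here is a sketch of that oracle under the assumption c > 0 (my own illustrative code; the general two-constraint reduction is not handled): α · x is maximized over K by putting all of the budget on the coordinate with the best ratio α_i / c_i.

```python
def single_constraint_oracle(alpha, beta, c, opt):
    """Given K = {x >= 0, c.x = opt} with c > 0, either return x in K
    with alpha.x >= beta, or return None if no such x exists."""
    n = len(c)
    # alpha.x over K is maximized at x = (opt / c_i) e_i for the best ratio
    i = max(range(n), key=lambda j: alpha[j] / c[j])
    if opt * alpha[i] / c[i] < beta:
        return None                    # even the maximizer fails: infeasible
    x = [0.0] * n
    x[i] = opt / c[i]
    return x
```

If even the maximizing point of α · x over K falls below β, the oracle correctly reports that no point of K satisfies the constraint.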

17.2.2 Using Multiplicative Weights

We will use this oracle, which allows us to satisfy one constraint (α · x ≥ β) over x ∈ K, along with the MW algorithm, to get an algorithm satisfying all of the constraints Ax ≥ b for x ∈ K. Each of the constraints a_i^T x ≥ b_i will be viewed as an expert, for a total of m experts. Each round we will produce a vector p^{(t)} that gives us a convex combination of the constraints, as follows:

    (Σ_i p_i^{(t)} a_i) · x ≥ Σ_i p_i^{(t)} b_i,

i.e., α^{(t)} · x ≥ β^{(t)}, where α^{(t)} := Σ_i p_i^{(t)} a_i and β^{(t)} := Σ_i p_i^{(t)} b_i. Using our oracle, we can determine whether α^{(t)} · x ≥ β^{(t)} has some solution x^{(t)} ∈ K, or no such solution exists. Clearly if no solution exists, then Ax ≥ b is infeasible over K, so our LP is infeasible. (It is easy to see the contrapositive: if there were a solution x to Ax ≥ b, x ∈ K, then this vector x would also satisfy α^{(t)} · x ≥ β^{(t)}; here we use the fact that p^{(t)} ≥ 0.) Moreover, the vector p^{(t)} serves as proof of this infeasibility.

Otherwise, we will set our cost vector to m_i^{(t)} = a_i · x^{(t)} − b_i, update our weights, and proceed with the next round. If we have not determined the LP to be infeasible after T rounds, we will terminate and return the solution

    x̄ = (1/T) Σ_{t=1}^T x^{(t)}.

Why do we set our cost vectors this way? It almost seems like we should incur no cost when a_i · x^{(t)} − b_i ≥ 0 (i.e., when we satisfy this constraint), whereas here we incur a higher cost the more we satisfy it. Well, the idea is that whenever a_i · x^{(t)} − b_i is positive, we have oversatisfied the constraint. Giving a positive cost to this constraint causes us to reduce its weight in the next round. This works analogously to the experts problem, where an expert who is wrong (has high cost) is given less credence (less weight) in future rounds. Similarly, for any constraint for which a_i · x^{(t)} − b_i is negative, we have failed the constraint; giving a negative cost to this constraint causes us to increase its weight in the next round. Initially we set all of our weights equal, to express our ignorance: all constraints are equally hard.

Whenever we update our weights, we reduce the weights of constraints we oversatisfied, so we'll cover them less in future rounds; we increase the weights of constraints we didn't satisfy, so we'll cover them more in future rounds. Our hope is that over time this will converge to a solution where we satisfy all constraints to a roughly equal extent.

17.2.3 Analyzing Multiplicative Weights

Supposing that we do not discover our LP is infeasible, how many rounds should we run, and how good will our solution be? If we define

    ρ = max{1, max_{i, x ∈ K} |a_i · x − b_i|}

to be the maximum magnitude of any cost assigned to a constraint, then we may immediately apply Corollary 17.2 to find that after T ≥ 4ρ² ln m / ε² rounds,

    (1/T) Σ_{t=1}^T p^{(t)} · m^{(t)} ≤ (1/T) Σ_{t=1}^T m_i^{(t)} + ε

where ε ≤ 1/2, m_i^{(t)} = a_i · x^{(t)} − b_i ∈ [−ρ, ρ] for all i ∈ [m], and each x^{(t)} ∈ K. Note that we do not actually need to find ρ in advance; it suffices to keep track of ρ_t = max{1, max_{i, t' ≤ t} |a_i · x^{(t')} − b_i|}, the maximum cost seen so far, and run until T ≥ 4ρ_T² ln m / ε².

What guarantee do we get? On the left hand side of this inequality we have

    (1/T) Σ_t p^{(t)} · m^{(t)} = (1/T) Σ_t p^{(t)} · (A x^{(t)} − b) = (1/T) Σ_t (α^{(t)} · x^{(t)} − β^{(t)}) ≥ 0,

where the final inequality holds due to our oracle's properties. Therefore the left hand side is at least 0. And on the right hand side, for each i we have

    (1/T) Σ_t m_i^{(t)} = (1/T) Σ_t (a_i · x^{(t)} − b_i) = a_i · ((1/T) Σ_t x^{(t)}) − b_i = a_i · x̄ − b_i.

Combining this with our inequality for the left hand side we get:

    0 ≤ a_i · x̄ − b_i + ε, i.e., a_i · x̄ ≥ b_i − ε.

Therefore we can obtain an ε-feasible solution to Ax ≥ b, x ∈ K, using O(ρ² log m / ε²) oracle calls, where ρ = max{1, max_{i, x ∈ K} |a_i · x − b_i|} is the width of the LP.

17.2.4 Example: Minimum Set Cover

Recall the minimum fractional set cover problem, with m sets F = {S_1, S_2, ..., S_m} and n elements U. The goal is to pick fractions of sets in order to cover each element to an extent of 1, i.e., to solve the following LP:

    min Σ_S x_S
    s.t. Σ_{S ∋ e} x_S ≥ 1 for all elements e
         x_S ≥ 0 for all S ∈ F.

Suppose we know OPT = L ∈ [1, m], so K = {x | Σ_S x_S = L, x ≥ 0}. We want to find x ∈ K such that Σ_{S ∋ e} x_S ≥ 1 for all elements e. Our oracle, given some p, must try to find x ∈ K such that

    Σ_e p_e Σ_{S ∋ e} x_S ≥ Σ_e p_e = 1.

Rewriting the left hand side,

    Σ_e p_e Σ_{S ∋ e} x_S = Σ_S x_S Σ_{e ∈ S} p_e = Σ_S x_S p(S),

where p(S) is the total weight of elements in S. This quantity is clearly maximized over K by concentrating on a set with the maximum weight, setting

    x_S = L for some S ∈ F maximizing p(S), and x_S = 0 for all other S.

Note that the width of this LP is at most

    max_{x ∈ K, e} Σ_{S ∋ e} x_S ≤ L ≤ m.

How does the weight update step work? Initially we set w_e^{(1)} = 1 for all constraints. Whenever an element is overcovered, we reduce the weight of that element, so we don't try as hard to cover it in the next step. Whenever an element is undercovered, we increase its weight, so we try harder to cover it in the next step. Now, after T = 4L² ln n / ε² steps, we will obtain an ε-approximate solution x̄ such that

    Σ_S x̄_S = L
    Σ_{S ∋ e} x̄_S ≥ 1 − ε for all elements e
    x̄ ≥ 0.

Note that, in this case, the constraint matrix is completely nonnegative, and we can scale up our solution to get a feasible solution x̂ = x̄/(1 − ε), so that

    Σ_S x̂_S = L/(1 − ε) ≤ (1 + 2ε) L
    Σ_{S ∋ e} x̂_S ≥ 1 for all elements e
    x̂ ≥ 0.
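The set-cover instantiation is simple enough to run end to end. The following is an illustrative sketch (my own code, not from the notes): one weight per element, the max-weight-set oracle just described, per-element cost equal to coverage minus 1, and the averaged iterate returned at the end.

```python
import math

def mw_fractional_set_cover(sets, n, L, eps, T):
    """MW for the fractional set cover feasibility problem:
    find x in K = {sum_S x_S = L, x >= 0} with sum_{S containing e} x_S >= 1.

    sets: list of sets of elements (elements are 0..n-1).
    Returns the averaged solution x_bar (one value per set)."""
    m = len(sets)
    w = [1.0] * n                       # one weight per element (constraint)
    x_bar = [0.0] * m
    for _ in range(T):
        W = sum(w)
        p = [wi / W for wi in w]
        # oracle: put all L units of mass on a set maximizing p(S)
        best = max(range(m), key=lambda j: sum(p[e] for e in sets[j]))
        x = [0.0] * m
        x[best] = L
        x_bar = [xb + xi / T for xb, xi in zip(x_bar, x)]
        # cost of element e: (coverage of e) - 1; oversatisfied => weight down
        cost = [(L if e in sets[best] else 0.0) - 1.0 for e in range(n)]
        w = [wi * math.exp(-eps * ci) for wi, ci in zip(w, cost)]
    return x_bar
```

With T on the order of 4L² ln n / ε², the averaged solution should cover every element to an extent of at least 1 − ε while keeping the total mass equal to L.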

17.2.5 Comments

1. The scaling we used for minimum set cover, to obtain a non-optimal, feasible solution, can be applied to any LP where b_i > ε for all i: indeed, we could just multiply all the x values by max_i b_i/(b_i − ε). This is often useful, particularly when we're going to round this LP solution and incur further losses, and hence losing this factor may be insignificant.

2. If the constraint matrix A is all positive, the problem is said to be a covering problem (we are just interested in putting enough weight on x to cover every constraint). If the constraint matrix is all negative, or equivalently, if we have Ax ≤ b with an all-positive matrix A, the problem is said to be a packing problem (we are packing as much weight into x as possible without violating any constraint). In either case, we can use a similar scaling trick to get a non-optimal, feasible solution. In this case we can reduce the run-time further. Assume we have a covering problem: min{c · x | Ax ≥ b, x ≥ 0}. By scaling, we can transform this into a problem of the form

    min{c · x | Ax ≥ 1, x ≥ 0}.

The uniform values b_i = 1 allow us to set the cost vectors m_i^{(t)} = a_i · x^{(t)} instead of m_i^{(t)} = a_i · x^{(t)} − 1; this translation does not change the algorithm. But the nonnegative cost vectors allow us to use Lemma 17.3 to reduce the runtime from O(ρ² log m / ε²) to O(ρ log m / ε²).

3. In general, the width of our LPs may not turn out to be as nice. For example, in the weighted minimum set cover problem

    min Σ_S c_S x_S
    s.t. Σ_{S ∋ e} x_S ≥ 1 for all elements e
         x_S ≥ 0 for all S ∈ F,

our optimum, and hence the width, can increase to as much as m · (max_S c_S / min_S c_S). An approach developed by Garg and Könemann [GK07] can be useful to solve these problems without the width penalty.

4. The MW algorithm does not need a perfect oracle. Being able to determine, given α ∈ R^n and β ∈ R, that there is no x ∈ K with α^T x ≥ β, or else to return an x ∈ K such that α · x ≥ β − ε′, is sufficient for our purposes. This gives us solutions x̄ ∈ K such that A x̄ ≥ b − (ε + ε′)·1.

5. There was exactly one point where we used the fact that our constraints were linear. That was in concluding that

    (1/T) Σ_t (a_i · x^{(t)} − b_i) = a_i · x̄ − b_i.

However, we can make a similar claim for any set of convex constraints as well. Suppose we wanted to find x ∈ K such that f_i(x) ≤ 0 for all i ∈ [m], with the f_i's convex. Then, as long as we could solve the oracle and efficiently find x ∈ K with Σ_i p_i^{(t)} f_i(x) ≤ 0, the rest of the argument would go through. In particular, in the step where we used linearity, we could instead use convexity:

    (1/T) Σ_t f_i(x^{(t)}) ≥ f_i( (1/T) Σ_t x^{(t)} ) = f_i(x̄).

17.3 Solving SDPs with Multiplicative Weights

Suppose we now move to solving SDPs of the form

    min C • X
    s.t. A_i • X ≥ b_i for all i ∈ [m]
         X ⪰ 0.

Note that the first few constraints are linear constraints; it is only the psd-ness constraint that is non-linear, so we only need to modify our MW algorithm by absorbing the X ⪰ 0 constraint into the oracle. It will also be convenient to require the constraint tr(X) = 1: usually we can guess the trace of the solution X. (If the trace of the solution we seek is not 1 but R, we can scale the problem by R to get unit trace.) Then the oracle we must implement is this: let K := {X | X ⪰ 0, tr(X) = 1}. Given a symmetric matrix A ∈ R^{n×n} and β ∈ R, does there exist X ∈ K such that A • X ≥ β? (Again, A and β will be obtained in the algorithm by setting A^{(t)} := Σ_i p_i^{(t)} A_i and β^{(t)} := Σ_i p_i^{(t)} b_i.)

But we know from Lecture 12 that this is equivalent to asking whether the maximum eigenvalue of the symmetric matrix A is at least β. Indeed, if this is so, and if λ_max is the maximum eigenvalue of A with unit eigenvector x, then

    A • (xx^T) = tr(A^T xx^T) = tr(A xx^T) = tr(λ_max xx^T) = λ_max,

so our oracle should return X = xx^T; else it should return "infeasible". Moreover, using Comment 4 of Section 17.2.5, it suffices to return x such that x^T A x ≥ λ_max − ε. How fast this can be done depends on the particular structure of the matrix A; in the next section we see that for the max-cut problem, the matrix A is itself psd, and hence we can find such an x relatively quickly.
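The eigenvalue oracle reduces to estimating λ_max of a psd matrix, which can be done with the power method. A generic sketch of that method (my own simplified version; the homework's exact variant and its failure-probability analysis are not reproduced here) looks like this:

```python
import random

def power_method(D, iters=200, seed=0):
    """Approximate the top eigenvector of a symmetric psd matrix D
    (given as a list of rows). Returns (x, x^T D x)."""
    rng = random.Random(seed)
    n = len(D)
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]   # random starting vector
    for _ in range(iters):
        # multiply by D and renormalize to a unit vector
        y = [sum(D[i][j] * x[j] for j in range(n)) for i in range(n)]
        norm = sum(v * v for v in y) ** 0.5
        x = [v / norm for v in y]
    # Rayleigh quotient x^T D x = D . (x x^T)
    rayleigh = sum(x[i] * D[i][j] * x[j] for i in range(n) for j in range(n))
    return x, rayleigh
```

In the oracle, one would compare the returned value D^{(t)} • (xx^T) = x^T D^{(t)} x against the threshold 1/(1+ε) as described below.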

17.3.1 Example: Max Cut

This part is loosely based on the paper of Klein and Lu [KL96]. Recall the Max Cut SDP we derived in Lecture 12:

    max (1/4) L • X
    s.t. (e_i e_i^T) • X = 1 for all i
         X ⪰ 0.

As usual, we will think of the edge weights as summing to 1: this means that tr(L) = Σ_i L_ii = 1. If we let b = OPT and scale X by 1/n, we are looking for feasibility of the constraints:

    (n/(4b)) L • X ≥ 1
    n (e_i e_i^T) • X = 1 for all i
    X ⪰ 0.

Finally, if we take K = {X | X ⪰ 0, tr(X) = 1}, the above SDP is equivalent to finding X ∈ K such that

    (n/(4b)) L • X ≥ 1 and n (e_i e_i^T) • X ≥ 1 for all i.

(This is because tr(X) = 1 means Σ_i X_ii = 1. Since we have the constraints n (e_i e_i^T) • X = n X_ii ≥ 1, this means X_ii = 1/n for all i.)

By the discussion of the previous section, our oracle will need to check whether there exists X ∈ K such that D^{(t)} • X ≥ 1, where

    D^{(t)} = p_0^{(t)} (n/(4b)) L + Σ_{i=1}^n p_i^{(t)} n (e_i e_i^T).

And again, this is equivalent to checking whether λ_max(D^{(t)}) ≥ 1.

Implementing the oracle. It is useful to note that D^{(t)} is positive semidefinite: indeed, it is the sum of the Laplacian (which is psd) and a bunch of matrices e_i e_i^T (which are psd), with nonnegative coefficients.

Note: In Homework #6, you will show that for any psd matrix D, the power method starting with a random unit vector can find a unit vector x such that D • (xx^T) ∈ [λ_max(D)/(1+ε), λ_max(D)]. The algorithm succeeds with high probability, and runs in time O(ε^{−1} m log n), where m is the number of edges in G (and hence the number of non-zeroes in L).

So we can run this algorithm: if it answers with an x such that D^{(t)} • (xx^T) is smaller than 1/(1+ε), we answer saying λ_max(D^{(t)}) < 1. Else we return the vector x: this has the property that D^{(t)} • (xx^T) ≥ 1/(1+ε) ≥ 1 − ε. Now, using Comment 4 of Section 17.2.5, we know this will suffice to get a solution that has an O(ε) infeasibility.

Bounding the width. The width of our algorithm is the maximum possible magnitude of D^{(t)} • X for X ∈ K, i.e., the maximum possible eigenvalue of D^{(t)}. Since D^{(t)} is positive

semidefinite, all of its eigenvalues are non-negative. Moreover, tr(L) = 1, and also tr(e_i e_i^T) = 1. So

    λ_max(D^{(t)}) ≤ Σ_i λ_i(D^{(t)}) = tr(D^{(t)})
                  = tr( p_0^{(t)} (n/(4b)) L + Σ_{i=1}^n p_i^{(t)} n (e_i e_i^T) )
                  = p_0^{(t)} (n/(4b)) tr(L) + Σ_{i=1}^n p_i^{(t)} n tr(e_i e_i^T)
                  ≤ n (1 + 1/(4b)).

Finally, the max-cut values we are interested in lie between 1/2 (since the max-cut is at least half the edge weight) and 1. So b ∈ [1/2, 1], and the width is O(n).

Running Time. Setting the width ρ = O(n) gives us a runtime of

    O(n² log n / ε²) oracle calls,

which we can reduce to

    O(n log n / ε²) oracle calls

using Lemma 17.3, since our cost vectors can be made all nonnegative. Finally, plugging in our oracle gives a final runtime of

    O(m n log² n / ε³),

where m is the number of edges in our graph.

Note: We can now scale the average matrix X̄ by n to get a matrix X̂ = n X̄ satisfying:

    (1/4) L • X̂ ≥ b (1 − ε)
    X̂_ii ≥ 1 − ε for all i
    tr(X̂) = n
    X̂ ⪰ 0.

The attentive reader will observe that this is not as nice as we'd like. We'd really want each X̂_ii ∈ [1 − ε, 1 + ε]; then we could transform this solution into one where X̂_ii = 1 and (1/4) L • X̂ ≥ b (1 − O(ε)). What we have only guarantees that X̂_ii ∈ [1 − ε, 1 + nε], and so we'd need to set ε ≈ 1/n for any non-trivial guarantees. This would still give us a run-time of O(ε^{−3} m n polylog n) = O(m n⁴ polylog n), still polynomial (and useful to exemplify the technique), but it could be better. One can avoid this loss by defining K differently, in fact in a way that is similar to Section 17.2.1; the details can be found in [KL96]. One can do even better using the matrix multiplicative weights algorithms: see, e.g., [AK07, Ste10].
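As a concrete sanity check of the width bound, here is a toy computation (my own illustrative choices: the 4-cycle with uniform edge weights scaled so that tr(L) = 1, a uniform p over the n + 1 experts, and the guess b = 1/2). It builds D^{(t)} and confirms tr(D^{(t)}) ≤ n (1 + 1/(4b)).

```python
n = 4
# 4-cycle; weight 1/8 per directed edge, so the Laplacian has tr(L) = 1
w = 0.125
L = [[0.0] * n for _ in range(n)]
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
for (u, v) in edges:
    L[u][u] += w; L[v][v] += w
    L[u][v] -= w; L[v][u] -= w

b = 0.5                                  # guess for the max-cut value
p = [1.0 / (n + 1)] * (n + 1)            # uniform p over the n+1 experts
# D = p_0 * (n/(4b)) * L + sum_i p_i * n * e_i e_i^T
D = [[p[0] * (n / (4 * b)) * L[i][j] for j in range(n)] for i in range(n)]
for i in range(n):
    D[i][i] += p[i + 1] * n
trace = sum(D[i][i] for i in range(n))   # upper-bounds lambda_max(D)
```

Since D is psd, λ_max(D) ≤ tr(D), and here tr(D) = 0.4 + 3.2 = 3.6, comfortably below n (1 + 1/(4b)) = 6.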

Bibliography

[AHK05] Sanjeev Arora, Elad Hazan, and Satyen Kale. The multiplicative weights update method: a meta algorithm and applications. Technical report, Princeton University, 2005.

[AK07] Sanjeev Arora and Satyen Kale. A combinatorial, primal-dual approach to semidefinite programs. In STOC, pages 227-236, 2007.

[GK07] Naveen Garg and Jochen Könemann. Faster and simpler algorithms for multicommodity flow and other fractional packing problems. SIAM J. Comput., 37(2):630-652 (electronic), 2007.

[KL96] Philip Klein and Hsueh-I Lu. Efficient approximation algorithms for semidefinite programs arising from MAX CUT and COLORING. In Proceedings of the Twenty-eighth Annual ACM Symposium on the Theory of Computing (Philadelphia, PA, 1996), pages 338-347, New York, 1996. ACM.

[Ste10] David Steurer. Fast SDP algorithms for constraint satisfaction problems. In SODA, pages 684-697, 2010.