Lecture 14 Ellipsoid method

S. Boyd, EE364, Lecture 14: Ellipsoid method

- idea of localization methods
- bisection on R
- center of gravity algorithm
- ellipsoid method

Localization

$f : \mathbf{R}^n \to \mathbf{R}$ convex (and, for now, differentiable)

problem: minimize $f$

oracle model: for any $x$ we can evaluate $f(x)$ and $\nabla f(x)$ (at some cost)

recall: $f(x) \geq f(x_0) + \nabla f(x_0)^T (x - x_0)$, hence
$\nabla f(x_0)^T (x - x_0) \geq 0 \implies f(x) \geq f(x_0)$

[figure: level curves of $f$, the point $x_0$, the gradient $\nabla f(x_0)$, and the halfspace $\nabla f(x_0)^T (x - x_0) \leq 0$]

by evaluating $\nabla f$ we rule out a halfspace in our search for $x^\star$:
$x^\star \in \{x \mid \nabla f(x_0)^T (x - x_0) \leq 0\}$

idea: get one bit of information about $x^\star$ by evaluating $\nabla f$

suppose we have evaluated $\nabla f(x^{(1)}), \ldots, \nabla f(x^{(k)})$; then we know
$x^\star \in \bigcap_{i=1}^{k} \{x \mid \nabla f(x^{(i)})^T (x - x^{(i)}) \leq 0\}$

[figure: points $x^{(1)}, \ldots, x^{(k)}$ with gradients $\nabla f(x^{(1)}), \ldots, \nabla f(x^{(k)})$ and the resulting polyhedron]

on the basis of $\nabla f(x^{(1)}), \ldots, \nabla f(x^{(k)})$, we have localized $x^\star$ to a polyhedron

question: what is a good point $x^{(k+1)}$ at which to evaluate $\nabla f$?

Localization algorithm (idea)

1. after iteration $k-1$ we know $x^\star \in C^{(k-1)}$:
   $C^{(k-1)} = \{x \mid \nabla f(x^{(i)})^T (x - x^{(i)}) \leq 0,\ i = 1, \ldots, k-1\}$
2. evaluate $\nabla f(x^{(k)})$ for some $x^{(k)} \in C^{(k-1)}$
3. $C^{(k)} := C^{(k-1)} \cap \{x \mid \nabla f(x^{(k)})^T (x - x^{(k)}) \leq 0\}$

[figure: $C^{(k-1)}$, the point $x^{(k)}$, the gradient $\nabla f(x^{(k)})$, and the smaller set $C^{(k)}$]

- $C^{(k)}$ gives our uncertainty about $x^\star$ at iteration $k$
- pick $x^{(k)}$ so that $C^{(k)}$ is as small as possible
- clearly, we want $x^{(k)}$ near the center of $C^{(k-1)}$

Example: bisection on R

$f : \mathbf{R} \to \mathbf{R}$; $C^{(k)}$ is an interval

obvious choice: $x^{(k+1)} := \mathrm{midpoint}(C^{(k)})$

[figure: interval $C^{(k)}$, its midpoint $x^{(k+1)}$, and the resulting interval $C^{(k+1)}$]

bisection algorithm

given interval $C = [l, u]$ containing $x^\star$
repeat
1. $x := (l + u)/2$
2. evaluate $f'(x)$
3. if $f'(x) < 0$, $l := x$; else $u := x$
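A minimal Python sketch of this bisection loop, assuming we are given the derivative as a callable (the names fprime, l, u, tol are illustrative, not from the slides):

```python
def bisection(fprime, l, u, tol=1e-8):
    """Minimize a convex f on [l, u] using only derivative signs.

    fprime: callable returning f'(x); the minimizer x* is assumed to lie in [l, u].
    Each iteration halves the interval of uncertainty.
    """
    while u - l > tol:
        x = 0.5 * (l + u)          # midpoint of current interval
        if fprime(x) < 0:          # minimizer lies to the right of x
            l = x
        else:                      # minimizer lies to the left of (or at) x
            u = x
    return 0.5 * (l + u)

# usage: minimize f(x) = (x - 2)^2 on [0, 10]
x_star = bisection(lambda x: 2.0 * (x - 2.0), 0.0, 10.0)
print(x_star)   # approximately 2.0
```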

we have $\mathrm{length}(C^{(k+1)}) = \mathrm{length}(C^{(k)})/2$, so $\mathrm{length}(C^{(k)}) = 2^{-k}\, \mathrm{length}(C^{(0)})$

interpretation:
- $\mathrm{length}(C^{(k)})$ measures our uncertainty in $x^\star$
- the uncertainty is halved at each iteration (we get exactly one bit of information about $x^\star$ per iteration)

number of steps required for uncertainty $\epsilon$:
$\log_2 \frac{\mathrm{length}(C^{(0)})}{\epsilon} = \log_2 \frac{\text{initial uncertainty}}{\text{final uncertainty}}$

question: can bisection be extended to $\mathbf{R}^n$? or is it special, since $\mathbf{R}$ is linearly ordered?

Center of gravity algorithm

take $x^{(k+1)} = \mathrm{CG}(C^{(k)})$ (center of gravity),
$\mathrm{CG}(C^{(k)}) = \int_{C^{(k)}} x\, dx \Big/ \int_{C^{(k)}} dx$

theorem. if $C \subseteq \mathbf{R}^n$ is convex, $x_{\mathrm{cg}} = \mathrm{CG}(C)$, and $g \neq 0$, then
$\mathrm{vol}\!\left( C \cap \{x \mid g^T (x - x_{\mathrm{cg}}) \leq 0\} \right) \leq (1 - 1/e)\, \mathrm{vol}(C) \approx 0.63\, \mathrm{vol}(C)$
(independent of $n$)

hence $\mathrm{vol}(C^{(k)}) \leq 0.63^k\, \mathrm{vol}(C^{(0)})$

- $\mathrm{vol}(C^{(k)})$ measures uncertainty at iteration $k$
- uncertainty is reduced by a factor of 0.63 or better per iteration

maximum number of steps required for uncertainty $\epsilon$:
$1.85 \log \frac{\mathrm{vol}(C^{(0)})}{\epsilon} = 1.85 \log \frac{\text{initial uncertainty}}{\text{final uncertainty}}$

from this we can prove $f(x^{(k)}) \to f(x^\star)$ (later)
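To make the CG step concrete, here is a naive Monte Carlo sketch (my own illustration, not from the slides) that estimates CG(C) for a polyhedron C = {x | Ax <= b} by rejection sampling in a bounding box; the names A, b, lo, hi are hypothetical. Its cost grows rapidly with dimension, which already hints at the practical difficulty discussed next.

```python
import numpy as np

def estimate_cg(A, b, lo, hi, n_samples=100000, rng=None):
    """Estimate the center of gravity of C = {x | A x <= b} by rejection sampling.

    lo, hi: coordinate bounds of a box known to contain C.
    Illustration only: the acceptance rate collapses in high dimension,
    one reason the exact CG method is impractical.
    """
    rng = np.random.default_rng() if rng is None else rng
    n = A.shape[1]
    pts = rng.uniform(lo, hi, size=(n_samples, n))   # sample the bounding box
    inside = np.all(pts @ A.T <= b, axis=1)          # keep points satisfying A x <= b
    if not inside.any():
        raise ValueError("no samples landed in C; enlarge n_samples or shrink the box")
    return pts[inside].mean(axis=0)                  # sample mean approximates the CG

# usage: CG of the triangle {x >= 0, y >= 0, x + y <= 1} is (1/3, 1/3)
A = np.array([[-1.0, 0.0], [0.0, -1.0], [1.0, 1.0]])
b = np.array([0.0, 0.0, 1.0])
print(estimate_cg(A, b, lo=0.0, hi=1.0))
```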

advantages of the CG method:
- guaranteed convergence
- number of steps independent of the dimension $n$

disadvantages:
- finding $x^{(k+1)} = \mathrm{CG}(C^{(k)})$ is harder than the original problem
- $C^{(k)}$ becomes more complex as $k$ increases (removing redundant constraints is harder than solving the original problem)

(but the CG method can be modified to work)

Ellipsoid algorithm

idea: localize $x^\star$ in an ellipsoid instead of a polyhedron

1. at iteration $k$ we know $x^\star \in E^{(k)}$
2. set $x^{(k+1)} := \mathrm{center}(E^{(k)})$; evaluate $\nabla f(x^{(k+1)})$
3. hence we know $x^\star \in E^{(k)} \cap \{z \mid \nabla f(x^{(k+1)})^T (z - x^{(k+1)}) \leq 0\}$ (a half-ellipsoid)
4. set $E^{(k+1)} :=$ minimum volume ellipsoid covering $E^{(k)} \cap \{z \mid \nabla f(x^{(k+1)})^T (z - x^{(k+1)}) \leq 0\}$

[figure: $E^{(k)}$, its center $x^{(k+1)}$, the gradient $\nabla f(x^{(k+1)})$, and the new ellipsoid $E^{(k+1)}$]

compared to the CG method: the localization set doesn't grow more complicated, but we add unnecessary points in step 4

properties of the ellipsoid method:
- reduces to bisection for $n = 1$
- simple formula for $E^{(k+1)}$ given $E^{(k)}$ and $\nabla f(x^{(k+1)})$
- $E^{(k+1)}$ can be larger than $E^{(k)}$ in diameter (maximum semi-axis length), but is always smaller in volume:
  $\mathrm{vol}(E^{(k+1)}) < e^{-\frac{1}{2n}}\, \mathrm{vol}(E^{(k)})$
  (note that the volume reduction factor depends on $n$)
- extends to nondifferentiable, constrained, and quasiconvex problems (more later)

Example

[figure: ellipsoid method iterates $x^{(0)}, x^{(1)}, \ldots, x^{(5)}$ with the shrinking ellipsoids]

Updating the ellipsoid

$E(x, A) = \{ z \mid (z - x)^T A^{-1} (z - x) \leq 1 \}$

[figure: ellipsoid $E$ with center $x$, cut direction $g$, and the new ellipsoid $E^+$ with center $x^+$]

(for $n > 1$) the minimum volume ellipsoid containing $E \cap \{ z \mid g^T (z - x) \leq 0 \}$ is given by

$x^+ = x - \frac{1}{n+1} A \tilde{g}, \qquad
A^+ = \frac{n^2}{n^2 - 1} \left( A - \frac{2}{n+1} A \tilde{g} \tilde{g}^T A \right)$

where $\tilde{g} = g \big/ \sqrt{g^T A g}$
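A small NumPy sketch of this update (my own illustration of the formulas above; the function and argument names are not from the slides), followed by a numerical check of the volume reduction factor from the previous slide:

```python
import numpy as np

def ellipsoid_update(x, A, g):
    """One cut of the ellipsoid method, for n > 1.

    E(x, A) = {z | (z - x)^T A^{-1} (z - x) <= 1}; returns (x_plus, A_plus)
    describing the minimum volume ellipsoid containing the half-ellipsoid
    E(x, A) intersected with {z | g^T (z - x) <= 0}.
    """
    n = x.size
    g_tilde = g / np.sqrt(g @ A @ g)     # normalized cut direction
    Ag = A @ g_tilde
    x_plus = x - Ag / (n + 1)            # new center
    A_plus = (n**2 / (n**2 - 1.0)) * (A - (2.0 / (n + 1)) * np.outer(Ag, Ag))
    return x_plus, A_plus

# check the volume reduction: vol(E+)/vol(E) = sqrt(det A+ / det A) < exp(-1/(2n))
x, A, g = np.zeros(3), np.eye(3), np.array([1.0, 2.0, -0.5])
x_p, A_p = ellipsoid_update(x, A, g)
ratio = np.sqrt(np.linalg.det(A_p) / np.linalg.det(A))
print(ratio, np.exp(-1.0 / (2 * 3)))     # ratio is below exp(-1/6)
```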

Proof of convergence

assumptions:
- $f$ is Lipschitz: $|f(y) - f(x)| \leq G \|y - x\|$
- $E^{(0)}$ is a ball with radius $R$

suppose $f(x^{(i)}) > f^\star + \epsilon$ for $i = 0, \ldots, k$

then $f(x) \leq f^\star + \epsilon \implies x \in E^{(k)}$, since at iteration $i$ we only discard points with $f \geq f(x^{(i)})$

from the Lipschitz condition, $\|x - x^\star\| \leq \epsilon/G \implies f(x) \leq f^\star + \epsilon \implies x \in E^{(k)}$, i.e.,
$B = \{x \mid \|x - x^\star\| \leq \epsilon/G\} \subseteq E^{(k)}$

hence $\mathrm{vol}(B) \leq \mathrm{vol}(E^{(k)})$, so
$\beta_n (\epsilon/G)^n \leq e^{-\frac{k}{2n}}\, \mathrm{vol}(E^{(0)}) = e^{-\frac{k}{2n}} \beta_n R^n$
($\beta_n$ is the volume of the unit ball in $\mathbf{R}^n$)

therefore $k \leq 2n^2 \log(RG/\epsilon)$
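Spelling out the last step (taking logarithms of the volume inequality):

$\beta_n (\epsilon/G)^n \leq e^{-\frac{k}{2n}} \beta_n R^n
\;\iff\; e^{\frac{k}{2n}} \leq (RG/\epsilon)^n
\;\iff\; \frac{k}{2n} \leq n \log(RG/\epsilon)
\;\iff\; k \leq 2n^2 \log(RG/\epsilon)$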

[figure: $E^{(0)}$, $E^{(k)}$, the point $x^\star$, the iterate $x^{(k)}$, the ball $B = \{x \mid \|x - x^\star\| \leq \epsilon/G\}$, and the set $\{x \mid f(x) \leq f^\star + \epsilon\}$]

conclusion: for $k > 2n^2 \log(RG/\epsilon)$, $\min_{i=0,\ldots,k} f(x^{(i)}) \leq f^\star + \epsilon$

interpretation of complexity:

since $x^\star \in E^{(0)} = \{x \mid \|x - x^{(0)}\| \leq R\}$, our prior knowledge of $f^\star$ is
$f^\star \in [f(x^{(0)}) - GR,\ f(x^{(0)})]$

our prior uncertainty in $f^\star$ is $GR$

after $k$ iterations, our knowledge of $f^\star$ is
$f^\star \in \left[ \min_{i=0,\ldots,k} f(x^{(i)}) - \epsilon,\ \min_{i=0,\ldots,k} f(x^{(i)}) \right]$

posterior uncertainty in $f^\star$ is $\epsilon$

iterations required: $2n^2 \log \frac{RG}{\epsilon} = 2n^2 \log \frac{\text{prior uncertainty}}{\text{posterior uncertainty}}$

efficiency: $0.72/n^2$ bits per gradient evaluation (note: degrades with $n$)
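Where the $0.72/n^2$ comes from: information is measured in bits (base-2 logarithm) while the iteration count uses the natural logarithm, so

$\text{bits per evaluation} = \frac{\log_2(\text{prior}/\text{posterior})}{2n^2 \ln(\text{prior}/\text{posterior})} = \frac{1}{2n^2 \ln 2} \approx \frac{0.72}{n^2}$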

Stopping criterion

$f(x^\star) \geq f(x^{(k)}) + \nabla f(x^{(k)})^T (x^\star - x^{(k)})
\geq f(x^{(k)}) + \inf_{x \in E^{(k)}} \nabla f(x^{(k)})^T (x - x^{(k)})
= f(x^{(k)}) - \sqrt{\nabla f(x^{(k)})^T A^{(k)} \nabla f(x^{(k)})}$

simple stopping criterion: $\sqrt{\nabla f(x^{(k)})^T A^{(k)} \nabla f(x^{(k)})} \leq \epsilon$

[figure: $f^\star$, $f(x^{(k)})$, and the lower bound $f(x^{(k)}) - \sqrt{\nabla f(x^{(k)})^T A^{(k)} \nabla f(x^{(k)})}$ versus iteration $k$, $0 \leq k \leq 30$]

more sophisticated criterion: $U_k - L_k \leq \epsilon$, where

$U_k = \min_{i \leq k} f(x^{(i)}), \qquad
L_k = \max_{i \leq k} \left( f(x^{(i)}) - \sqrt{\nabla f(x^{(i)})^T A^{(i)} \nabla f(x^{(i)})} \right)$

[figure: $f^\star$, $U_k$, and $L_k$ versus iteration $k$, $0 \leq k \leq 30$]

Basic ellipsoid algorithm

given ellipsoid $E(x, A)$ containing $x^\star$
repeat
1. evaluate $\nabla f(x)$
2. if $\sqrt{\nabla f(x)^T A \nabla f(x)} \leq \epsilon$, return($x$)
3. update the ellipsoid:
   3a. $\tilde{g} := \nabla f(x) \big/ \sqrt{\nabla f(x)^T A \nabla f(x)}$
   3b. $x := x - \frac{1}{n+1} A \tilde{g}$
   3c. $A := \frac{n^2}{n^2 - 1} \left( A - \frac{2}{n+1} A \tilde{g} \tilde{g}^T A \right)$

interpretation:
- change coordinates so that the uncertainty set $E$ is the unit ball
- take a gradient step with length $1/(n+1)$

properties:
- not a descent method
- like a quasi-Newton method with fixed step length
- much slower convergence than BFGS, etc.
- but extends to nondifferentiable $f$
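A self-contained Python sketch of this loop, inlining the update formula from above and also tracking the best upper and lower bounds $U_k$, $L_k$ from the previous slide (function and variable names are my own, not from the slides):

```python
import numpy as np

def ellipsoid_method(f, grad_f, x0, R, eps=1e-5, max_iter=1000):
    """Basic (unconstrained) ellipsoid method sketch, for n > 1.

    Starts from the ball E(x0, R^2 I) assumed to contain a minimizer and stops
    when the certified gap sqrt(g^T A g) drops below eps. Returns the best
    point found and an interval [L, U] bracketing f*.
    """
    x, A = x0.astype(float).copy(), (R**2) * np.eye(x0.size)
    n = x0.size
    best_x, U, L = x.copy(), np.inf, -np.inf
    for _ in range(max_iter):
        g = grad_f(x)
        lam = g @ A @ g
        gap = np.sqrt(lam) if lam > 0 else 0.0   # sqrt(g^T A g): certified gap
        fx = f(x)
        if fx < U:                               # best upper bound U_k
            U, best_x = fx, x.copy()
        L = max(L, fx - gap)                     # best lower bound L_k
        if gap <= eps:                           # simple stopping criterion
            break
        g_tilde = g / gap                        # ellipsoid update (previous slides)
        Ag = A @ g_tilde
        x = x - Ag / (n + 1)
        A = (n**2 / (n**2 - 1.0)) * (A - (2.0 / (n + 1)) * np.outer(Ag, Ag))
    return best_x, (L, U)

# usage: minimize f(x) = ||x - (1, 2)||^2, starting from the ball of radius 10 around 0
f = lambda x: float(np.sum((x - np.array([1.0, 2.0]))**2))
grad_f = lambda x: 2.0 * (x - np.array([1.0, 2.0]))
x_best, (L, U) = ellipsoid_method(f, grad_f, np.zeros(2), R=10.0)
print(x_best, L, U)
```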

Ellipsoid method for the standard problem

minimize $f_0(x)$
subject to $f_i(x) \leq 0, \quad i = 1, \ldots, m$

same idea: maintain ellipsoids $E^{(k)}$ that contain $x^\star$ and decrease in volume to zero

case 1: $x^{(k)}$ feasible, i.e., $f_i(x^{(k)}) \leq 0$, $i = 1, \ldots, m$
- then do the usual update of $E^{(k)}$ based on $\nabla f_0(x^{(k)})$: this rules out the halfspace of points with larger objective value than the current point

case 2: $x^{(k)}$ infeasible, say $f_j(x^{(k)}) > 0$; then
$\nabla f_j(x^{(k)})^T (x - x^{(k)}) \geq 0 \implies f_j(x) > 0 \implies x$ infeasible
- so update $E^{(k)}$ based on $\nabla f_j(x^{(k)})$: this rules out a halfspace of infeasible points
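A sketch of how the cut selection changes in the constrained case (an illustration with assumed names, not the slides' code): the only difference from the basic loop is which gradient drives the update, and that only feasible iterates should update the best point.

```python
import numpy as np

def constrained_ellipsoid_step(x, A, grad_f0, fis, grad_fis):
    """One iteration of the ellipsoid method for
    minimize f0(x) s.t. fi(x) <= 0, i = 1, ..., m  (n > 1).

    fis / grad_fis: lists of constraint functions and their gradients.
    Returns the updated (x, A) and whether the current x was feasible.
    """
    n = x.size
    # choose the cutting gradient: objective cut if feasible, feasibility cut otherwise
    g, feasible = grad_f0(x), True
    for fi, grad_fi in zip(fis, grad_fis):
        if fi(x) > 0:
            g, feasible = grad_fi(x), False   # use the violated constraint's gradient
            break
    g_tilde = g / np.sqrt(g @ A @ g)
    Ag = A @ g_tilde
    x_new = x - Ag / (n + 1)
    A_new = (n**2 / (n**2 - 1.0)) * (A - (2.0 / (n + 1)) * np.outer(Ag, Ag))
    return x_new, A_new, feasible
```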

Example

[figure: constraint boundary $f_1(x) = 0$ and iterates $x^{(0)}, \ldots, x^{(5)}$; infeasible iterates use feasibility cuts from $\nabla f_1$, feasible iterates use objective cuts from $\nabla f_0$]

Stopping criterion

if $x^{(k)}$ is feasible, we have a lower bound on $f^\star$ as before:
$f^\star \geq f_0(x^{(k)}) - \sqrt{\nabla f_0(x^{(k)})^T A^{(k)} \nabla f_0(x^{(k)})}$

if $x^{(k)}$ is infeasible, we have, for all $x \in E^{(k)}$,
$f_j(x) \geq f_j(x^{(k)}) + \nabla f_j(x^{(k)})^T (x - x^{(k)})
\geq f_j(x^{(k)}) + \inf_{x \in E^{(k)}} \nabla f_j(x^{(k)})^T (x - x^{(k)})
= f_j(x^{(k)}) - \sqrt{\nabla f_j(x^{(k)})^T A^{(k)} \nabla f_j(x^{(k)})}$

hence, the problem is infeasible if, for some $j$,
$f_j(x^{(k)}) - \sqrt{\nabla f_j(x^{(k)})^T A^{(k)} \nabla f_j(x^{(k)})} > 0$
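A small sketch of this infeasibility certificate (illustrative names, building on the step function above):

```python
import numpy as np

def certify_infeasible(x, A, fis, grad_fis):
    """Return True if the current ellipsoid E(x, A) proves the problem infeasible.

    For each constraint f_j, the minimum of its linearization over E(x, A) is
    f_j(x) - sqrt(g_j^T A g_j); if this is positive, no point of E(x, A) can
    satisfy f_j(x) <= 0, so the problem (whose solution lies in E) is infeasible.
    """
    for fi, grad_fi in zip(fis, grad_fis):
        g = grad_fi(x)
        if fi(x) - np.sqrt(g @ A @ g) > 0:
            return True
    return False
```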