Convex Optimization. Ofer Meshi. Lecture 6: Lower Bounds; Constrained Optimization

Lower Bounds

Some upper bounds (number of iterations to reach accuracy ε; here κ = M/μ and R = ‖x⁽¹⁾ − x*‖):

       | M-smooth, μ-strongly convex | M-smooth  | L-Lipschitz, μ-strongly convex | L-Lipschitz | Oracle/ops per iteration
GD     | κ log(1/ε)                  | MR²/ε     | L²/(με)                        | L²R²/ε²     | ∇f + O(n)
A-GD   | √κ log(1/ε)                 | √(MR²/ε)  |                                |             | ∇f + O(n)

Is this the best we can do? What if we allow O(n²) ops per iteration? YES, this is the best possible when using only a first-order oracle (gradients). History: Nemirovski & Yudin (1983); Nesterov (2004).

Lower Bounds

Black-box procedure: we ignore runtime and count only the number of oracle accesses. The method is a mapping from the past information $x^{(1)}, \nabla f(x^{(1)}), \ldots, x^{(s)}, \nabla f(x^{(s)})$ (with $\nabla f(x^{(s)}) \in \partial f(x^{(s)})$) to the next point $x^{(s+1)}$. This model is not satisfied by, e.g., Newton's method (which uses $\nabla^2 f$).

Lower Bounds

Theorem: For $t \le n$ there exists a $\mu$-strongly convex and $L$-Lipschitz function $f$ such that for any algorithm with access only to a first-order oracle,
$$\min_{1 \le s \le t} f(x^{(s)}) - \min_{x:\,\|x\| \le \frac{L}{2\mu}} f(x) \;\ge\; \frac{L^2}{8\mu t}.$$
This implies $\Omega(1/\varepsilon)$ oracle calls.

Lower Bounds

Proof intuition: Play a game — you (the method) pick a point $x^{(s)}$ to query the oracle, and I (the adversary) provide the answer. I can make sure that there always exists an $f$ consistent with all previous answers (gradients), but whose $\varepsilon$-optimal solutions lie in a region you have not queried yet. Analogous examples: the "20 questions"-style number-guessing game, and the $\Omega(n \log n)$ lower bound for sorting — choose the answers on the fly, but consistently.

Lower Bounds

Proof: Consider the $\mu$-strongly convex function
$$f(x) = \gamma \max_{1 \le i \le t} x_i + \frac{\mu}{2}\|x\|^2.$$
The subdifferential is then
$$\partial f(x) = \mu x + \gamma\,\mathrm{conv}\{e_i :\, x_i = \max_{1 \le j \le t} x_j\}.$$
The first-order oracle returns $\mu x + \gamma e_i$, where $i$ is the first coordinate for which $x_i = \max_{1 \le j \le t} x_j$.
Assume w.l.o.g. that $x^{(1)} = 0$ and that for all $s \ge 0$ we have $x^{(s+1)} \in \mathrm{Span}\{\nabla f(x^{(1)}), \ldots, \nabla f(x^{(s)})\}$.

Lower Bounds

Proof (continued): At $x^{(1)} = 0$ the oracle returns $\gamma e_1$, so $x^{(2)}$ must lie on the line spanned by $e_1$. By induction, $x^{(s)}$ lies in the linear span of $e_1, \ldots, e_{s-1}$ (the oracle's tie-breaking keeps its answers consistent with all previous choices). Therefore $x^{(s)}_i = 0$ for all $i \ge s$, and it follows that $f(x^{(s)}) \ge 0$.

We next construct a vector $y$ with $f(y) = -\frac{\gamma^2}{2\mu t}$. Consider
$$y = \Big(\underbrace{-\tfrac{\gamma}{\mu t}, \ldots, -\tfrac{\gamma}{\mu t}}_{\text{coordinates } 1,\ldots,t},\; \underbrace{0, \ldots, 0}_{\text{coordinates } t+1,\ldots,n}\Big).$$
Notice that $0 \in \partial f(y)$ (take the uniform average of $e_1, \ldots, e_t$ in the subdifferential). Thus $y$ is a (global) minimizer, with value
$$f(y) = \gamma\Big(-\frac{\gamma}{\mu t}\Big) + \frac{\mu}{2}\, t \Big(\frac{\gamma}{\mu t}\Big)^2 = -\frac{\gamma^2}{\mu t} + \frac{\gamma^2}{2\mu t} = -\frac{\gamma^2}{2\mu t}.$$
Thus $f(x^{(s)}) - f(y) \ge \frac{\gamma^2}{2\mu t}$, and taking $\gamma = L/2$ completes the proof.
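To make the construction concrete, here is a minimal Python sketch (not from the lecture) that runs subgradient descent against this hard instance with the first-maximizer oracle; the dimension, constants, and step size are made-up choices for illustration:

```python
import numpy as np

n, mu, L = 200, 1.0, 1.0
gamma = L / 2.0          # the choice of gamma from the proof
t = 100                  # number of oracle calls; requires t <= n

def f(x):
    return gamma * np.max(x[:t]) + 0.5 * mu * np.dot(x, x)

def oracle(x):
    """Subgradient mu*x + gamma*e_i, i = first maximizing coordinate."""
    i = int(np.argmax(x[:t]))        # np.argmax returns the *first* argmax
    g = mu * x.copy()
    g[i] += gamma
    return g

x = np.zeros(n)                      # x^(1) = 0
best = np.inf
for s in range(1, t + 1):
    best = min(best, f(x))           # track the min over queried points
    g = oracle(x)
    x = x - (2.0 / (mu * (s + 1))) * g   # standard strongly convex step size

f_star = -gamma**2 / (2 * mu * t)    # value of the minimizer y from the proof
print(best - f_star, L**2 / (8 * mu * t))   # observed gap vs. the lower bound
```

Since subgradient descent satisfies the span assumption, the printed gap stays above $L^2/(8\mu t)$, exactly as the theorem guarantees.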

Lower Bounds

Proof without the assumptions (on the span condition and on $x^{(1)} = 0$): Consider the $\mu$-strongly convex function
$$f(x) = \gamma \max_i \langle x, v_i \rangle + \frac{\mu}{2}\|x - x_0\|^2.$$
The adversary decides $x_0$ and the $v_i$ on the fly, but in time to answer each oracle query: $x_0$ is set to $x^{(1)}$ (the first point queried), and $v_s$ is set to any vector orthogonal to $x^{(1)}, \ldots, x^{(s-1)}$. If the space is large ($n \gg t$), such a vector always exists. The minimizer $y$ in this case is $y = -\frac{\gamma}{\mu t} \sum_i v_i$.

Lower Bounds

We showed bounds for non-smooth $\mu$-strongly convex functions; similar bounds exist for the smooth and/or merely convex cases. Summary of lower bounds with a first-order oracle in high dimension (κ = M/μ, R = ‖x⁽¹⁾ − x*‖):

            | M-smooth, μ-strongly convex | M-smooth    | L-Lipschitz, μ-strongly convex | L-Lipschitz
GD          | κ log(1/ε)                  | MR²/ε       | L²/(με)                        | L²R²/ε²
A-GD        | √κ log(1/ε)                 | √(MR²/ε)    |                                |
Lower bound | Ω(√κ log(1/ε))              | Ω(√(MR²/ε)) | Ω(L²/(με))                     | Ω(L²R²/ε²)

Matching the upper bounds (up to constants)! Subgradient descent is optimal for the non-smooth cases, and accelerated GD for the smooth cases (optimality is not attained by plain GD).

Lower Bounds

What about low dimension — do the bounds still hold? NO! For example, the center-of-mass method needs only $O(n \log \frac{1}{\varepsilon})$ oracle calls, and it too uses just gradients; a similar construction shows a matching lower bound of $\Omega(n \log \frac{1}{\varepsilon})$. BUT its cost per iteration is much higher: oracle complexity ≠ runtime complexity, and understanding runtime is much harder (as in general complexity theory).

Why lower bounds are important:
- They tell us what we shouldn't try.
- We know when we are doing the best possible.
- They tell us what we can hope for (the motivation for A-GD).
- They improve our understanding of the problem.

Convex Optimization. Ofer Meshi. Lecture 6: Lower Bounds; Constrained Optimization

Constrained Optimization Problems

$$\text{(P)} \quad \min_{x \in \mathbb{R}^n} f_0(x) \quad \text{s.t.} \quad f_i(x) \le 0, \;\; i = 1, \ldots, m$$
where $f_0, f_1, \ldots, f_m : \mathbb{R}^n \to \mathbb{R} \cup \{\infty\}$ are convex.
We can easily introduce equality constraints: $h_j(x) = 0$, $j = 1, \ldots, p$ is equivalent to the pair of inequalities $h_j(x) \le 0$ and $-h_j(x) \le 0$. Since we require convexity of both, $h_j$ must be affine, so the equality constraints take the form $Ax = b$ with $A \in \mathbb{R}^{p \times n}$, $b \in \mathbb{R}^p$.

Example: Linear Programming

$$\min_x \; c^\top x \quad \text{s.t.} \quad Gx \le h, \;\; Ax = b$$
The feasible set is a polyhedron. The problem can be unbounded only if the feasible set is unbounded (but an unbounded feasible set does not always make the problem unbounded).

Example: Linear Programming — Max-flow

Vertices $1, \ldots, n$; capacity $C_{ij}$ on each edge $i \to j$; source node $1$, sink node $n$.
$$\max \sum_i X_{1i} \quad \text{s.t.} \quad 0 \le X_{ij} \le C_{ij} \;\; \forall\, ij, \qquad \sum_j X_{ji} = \sum_j X_{ij}, \;\; i = 2, \ldots, n-1$$
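As a sanity check (not part of the slides), this LP can be handed directly to an off-the-shelf solver; here is a sketch using scipy.optimize.linprog on a small made-up 4-node graph:

```python
import numpy as np
from scipy.optimize import linprog

n = 4
# capacities C[i][j] on edges i -> j (0-indexed: source = 0, sink = n-1)
edges = {(0, 1): 3.0, (0, 2): 2.0, (1, 2): 1.0, (1, 3): 2.0, (2, 3): 3.0}
idx = {e: k for k, e in enumerate(edges)}        # edge -> variable index

# objective: maximize flow out of the source = minimize its negation
c = np.zeros(len(edges))
for (i, j), k in idx.items():
    if i == 0:
        c[k] = -1.0

# flow conservation at internal nodes i = 1, ..., n-2: inflow - outflow = 0
A_eq = np.zeros((n - 2, len(edges)))
for (i, j), k in idx.items():
    if 1 <= j <= n - 2:
        A_eq[j - 1, k] += 1.0                    # inflow to node j
    if 1 <= i <= n - 2:
        A_eq[i - 1, k] -= 1.0                    # outflow from node i
b_eq = np.zeros(n - 2)

# capacity constraints as variable bounds: 0 <= X_ij <= C_ij
bounds = [(0.0, edges[e]) for e in idx]

res = linprog(c, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
print("max flow:", -res.fun)                     # 5.0 for this graph
```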

Example: Linear Programming

Piecewise-linear minimization:
$$\min_x \; \max_{1 \le i \le m} \big(a_i^\top x + b_i\big)$$
Equivalent LP:
$$\min_{x,\,t} \; t \quad \text{s.t.} \quad a_i^\top x + b_i \le t \;\; \forall i$$
Non-smooth unconstrained => smooth constrained.
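A minimal sketch of this epigraph reformulation (not from the slides), with made-up data a, b and scipy.optimize.linprog as the solver:

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
m, n = 5, 2
a, b = rng.normal(size=(m, n)), rng.normal(size=m)

# variables z = (x, t); minimize t
c = np.zeros(n + 1)
c[-1] = 1.0

# a_i^T x + b_i <= t   <=>   a_i^T x - t <= -b_i
A_ub = np.hstack([a, -np.ones((m, 1))])
res = linprog(c, A_ub=A_ub, b_ub=-b, bounds=[(None, None)] * (n + 1))
x_opt, t_opt = res.x[:n], res.x[-1]
print(t_opt, np.max(a @ x_opt + b))   # the two values agree at the optimum
```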

Example: Linear Programming

$\ell_1$-norm minimization:
$$\min_x \; \sum_{i=1}^m \big|a_i^\top x - b_i\big|$$
Equivalent LP ($2m$ constraints):
$$\min_{x,\,t} \; \sum_{i=1}^m t_i \quad \text{s.t.} \quad -t_i \le a_i^\top x - b_i \le t_i, \;\; i = 1, \ldots, m$$
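The same pattern for the $\ell_1$ problem — a sketch with made-up data (not from the slides); note the $2m$ rows of A_ub matching the $2m$ constraints:

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
m, n = 8, 3
A, b = rng.normal(size=(m, n)), rng.normal(size=m)

c = np.hstack([np.zeros(n), np.ones(m)])          # minimize sum_i t_i
I = np.eye(m)
A_ub = np.vstack([np.hstack([A, -I]),             #  a_i^T x - b_i <= t_i
                  np.hstack([-A, -I])])           # -(a_i^T x - b_i) <= t_i
b_ub = np.hstack([b, -b])
res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(None, None)] * (n + m))
x_opt = res.x[:n]
print(res.fun, np.abs(A @ x_opt - b).sum())       # optimal l1 residual, twice
```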

Example: Linear Programming

Feasibility problem: find $x$ s.t. $f_i(x) \le 0$, $i = 1, \ldots, m$ and $h_j(x) = 0$, $j = 1, \ldots, p$. Equivalent program with zero objective:
$$\min_x \; 0 \quad \text{s.t.} \quad f_i(x) \le 0, \; i = 1, \ldots, m, \qquad h_j(x) = 0, \; j = 1, \ldots, p$$
Example — linear separation: find $w$ s.t. $w^\top x_i \ge 1$ for $x_i \in \chi_+$ and $w^\top x_i \le -1$ for $x_i \in \chi_-$.

Example: Quadratic Programming

Quadratic program:
$$\min_x \; \tfrac{1}{2} x^\top P x + q^\top x \quad \text{s.t.} \quad Gx \le h, \;\; Ax = b$$
$P$ is PSD => the problem is convex.
Example — least squares with linear constraints:
$$\min_x \; \|Ax - b\|_2^2 \quad \text{s.t.} \quad l \le x \le u$$
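For the box-constrained least-squares instance, SciPy ships a dedicated solver; a minimal sketch with made-up A, b, l, u (scipy.optimize.lsq_linear reports cost = ½‖Ax − b‖₂²):

```python
import numpy as np
from scipy.optimize import lsq_linear

rng = np.random.default_rng(0)
A, b = rng.normal(size=(10, 4)), rng.normal(size=10)
l, u = -0.5 * np.ones(4), 0.5 * np.ones(4)

res = lsq_linear(A, b, bounds=(l, u))   # solves min ||Ax-b||^2 s.t. l <= x <= u
print(res.x, res.cost)                  # res.cost = (1/2) ||A res.x - b||^2
```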

Optimality Condition

$x$ is optimal for (P) iff it is feasible and
$$\nabla f_0(x)^\top (y - x) \ge 0 \quad \text{for all feasible } y.$$
Two options: either $x$ is in the interior of the feasible set, so $\nabla f_0(x) = 0$; or $x$ is on the boundary, in which case $\nabla f_0(x) \ne 0$ and $\{y : \nabla f_0(x)^\top (y - x) = 0\}$ is a supporting hyperplane of the feasible set. This generalizes the unconstrained optimality condition $\nabla f_0(x) = 0$. BUT: we want a more local optimality condition, one that does not require checking all feasible $y$.
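A quick sanity check of the boundary case (not on the slides): take $f_0(x) = x^2$ with feasible set $[1, 2]$.

```latex
\nabla f_0(1) = 2 \neq 0,
\qquad
\nabla f_0(1)\,(y - 1) = 2(y - 1) \ge 0 \quad \text{for all feasible } y \in [1, 2],
```

so $x = 1$ is optimal even though $\nabla f_0(x) = 0$ fails; the set $\{y : 2(y-1) = 0\}$ is the supporting hyperplane at $x = 1$.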

Lagrange Multipliers

$$\text{(P)} \quad \min_{x \in \mathbb{R}^n} f_0(x) \quad \text{s.t.} \quad f_i(x) \le 0, \; i = 1, \ldots, m; \qquad h_j(x) = 0, \; j = 1, \ldots, p$$
Claim:
$$p^\star = \inf_{x \in \mathbb{R}^n} \; \sup_{\lambda \ge 0,\, \nu} \; \Big[ f_0(x) + \sum_{i=1}^m \lambda_i f_i(x) + \sum_{j=1}^p \nu_j h_j(x) \Big]$$
WHY?

Lagrange Multipliers

$$p^\star = \inf_{x \in \mathbb{R}^n} \; \sup_{\lambda \ge 0,\, \nu} \; \underbrace{\Big[ f_0(x) + \sum_{i=1}^m \lambda_i f_i(x) + \sum_{j=1}^p \nu_j h_j(x) \Big]}_{L(x, \lambda, \nu) \text{ — the Lagrangian}}$$
If $x$ is feasible, the sup is attained at $\lambda = 0$ (and any $\nu$, since all $h_j(x) = 0$), and we get $f_0(x)$. If $x$ is infeasible, then: if $f_i(x) > 0$, take $\lambda_i \to \infty$; if $h_j(x) > 0$, take $\nu_j \to \infty$; if $h_j(x) < 0$, take $\nu_j \to -\infty$. Therefore
$$\sup_{\lambda \ge 0,\, \nu} L(x, \lambda, \nu) = \begin{cases} f_0(x) & x \text{ is feasible} \\ \infty & \text{otherwise.} \end{cases}$$
$\lambda, \nu$ are called Lagrange multipliers.

Lagrange Duality

Claim (without any assumptions):
$$\sup_{\lambda \ge 0,\, \nu} \; \inf_x \; L(x, \lambda, \nu) \;\le\; \inf_x \; \sup_{\lambda \ge 0,\, \nu} \; L(x, \lambda, \nu) = p^\star$$
Proof: for all $\lambda \ge 0$ and $\nu$ we have
$$\inf_x L(x, \lambda, \nu) \;\le\; \inf_{x \text{ feasible}} \Big[ f_0(x) + \underbrace{\sum_i \lambda_i f_i(x)}_{\le 0} + \underbrace{\sum_j \nu_j h_j(x)}_{= 0} \Big] \;\le\; \inf_{x \text{ feasible}} f_0(x) = p^\star.$$
Since this holds for all $\lambda \ge 0, \nu$, we get $\sup_{\lambda \ge 0,\, \nu} \inf_x L(x, \lambda, \nu) \le p^\star$.

The Dual

Define the dual objective function:
$$g(\lambda, \nu) = \inf_x L(x, \lambda, \nu)$$
The (Lagrange) dual problem is
$$\text{(D)} \quad \max_{\lambda,\, \nu} \; g(\lambda, \nu) \quad \text{s.t.} \quad \lambda_i \ge 0, \; i = 1, \ldots, m$$
We denote by $d^\star$ its optimal value, and by $(\lambda^\star, \nu^\star)$ a dual optimum (if it exists).
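As a worked example (not on the slides), plugging the earlier LP $\min_x c^\top x$ s.t. $Gx \le h$, $Ax = b$ into this definition gives a closed-form dual objective:

```latex
L(x, \lambda, \nu)
  = c^\top x + \lambda^\top (Gx - h) + \nu^\top (Ax - b)
  = \big(c + G^\top \lambda + A^\top \nu\big)^\top x - \lambda^\top h - \nu^\top b
\quad\Longrightarrow\quad
g(\lambda, \nu) =
\begin{cases}
  -\lambda^\top h - \nu^\top b, & c + G^\top \lambda + A^\top \nu = 0, \\
  -\infty, & \text{otherwise,}
\end{cases}
```

since a nonzero linear function of $x$ has infimum $-\infty$; the dual (D) is then itself an LP.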

Weak Duality

Theorem (weak duality): $d^\star \le p^\star$. This holds even if $f_0, f_i, h_j$ are not convex!
Strong duality: $d^\star = p^\star$, that is,
$$d^\star = \sup_{\lambda \ge 0,\, \nu} \; \inf_x \; L(x, \lambda, \nu) \;=\; \inf_x \; \sup_{\lambda \ge 0,\, \nu} \; L(x, \lambda, \nu) = p^\star,$$
which means we can swap the sup and the inf. WHEN does this hold?