Math 273a: Optimization Lagrange Duality

Math 273a: Optimization, Lagrange Duality. Instructor: Wotao Yin, Department of Mathematics, UCLA, Winter 2015. Online discussions on piazza.com.

Gradient descent / forward Euler

Assume the function f is proper, closed, convex, and differentiable, and consider

  minimize_x f(x).

Gradient descent iteration (with step size c):

  x^{k+1} = x^k - c ∇f(x^k).

x^{k+1} minimizes the following local quadratic approximation of f:

  f(x^k) + <∇f(x^k), x - x^k> + (1/2c) ||x - x^k||_2^2.

Compare with the forward Euler iteration, a.k.a. the explicit update:

  x(t+1) = x(t) - Δt ∇f(x(t)).
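As a quick illustration, the gradient-descent/forward-Euler iteration can be run on a toy quadratic; the matrix Q and vector p below are made-up choices for this sketch, not from the lecture.

```python
import numpy as np

# Toy objective f(x) = 0.5 x'Qx - p'x, with gradient ∇f(x) = Qx - p
# (Q and p are illustrative; Q is symmetric positive definite).
Q = np.array([[3.0, 1.0],
              [1.0, 2.0]])
p = np.array([1.0, 1.0])

def grad_f(x):
    return Q @ x - p

c = 0.1                       # step size c (plays the role of Δt in forward Euler)
x = np.zeros(2)
for _ in range(500):
    x = x - c * grad_f(x)     # x^{k+1} = x^k - c ∇f(x^k)

x_star = np.linalg.solve(Q, p)   # exact minimizer Q^{-1} p, for comparison
```

With c small enough (here c = 0.1 < 2/λ_max(Q)), the iterates converge to the minimizer Q⁻¹p.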

Backward Euler / implicit gradient descent

The backward Euler iteration, also known as the implicit update:

  x(t+1) solves x(t+1) = x(t) - Δt ∇f(x(t+1)),

which is equivalent to:
1. u(t+1) solves u = ∇f(x(t) - Δt u);
2. x(t+1) = x(t) - Δt u(t+1).

We can view it as the implicit gradient descent:

  (x^{k+1}, u^{k+1}) solve x = x^k - c u, u = ∇f(x).

Here c is the step size, though it plays a very different role from a standard step size. The explicit (implicit) update uses the gradient at the start (end) point.

Implicit gradient step = proximal operation

Proximal update:

  prox_{cf}(z) := arg min_x f(x) + (1/2c) ||x - z||^2.

Optimality condition:

  0 = c ∇f(x*) + (x* - z).

Given input z,
- prox_{cf}(z) returns the solution x*;
- ∇f(prox_{cf}(z)) returns u*,

where (x*, u*) solve x = z - c u, u = ∇f(x).

Proposition. The proximal operator is equivalent to an implicit gradient (or backward Euler) step.
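For a concrete check of the Proposition, take the scalar function f(x) = (a/2)x², whose prox has the closed form z/(1 + ca); the numbers below are arbitrary illustrative choices.

```python
# For f(x) = (a/2) x^2, prox_{cf}(z) = z / (1 + c*a) in closed form.
# Verify it satisfies the backward Euler relation x* = z - c f'(x*).
a, c, z = 2.0, 0.5, 3.0

x_star = z / (1 + c * a)     # prox_{cf}(z)
u_star = a * x_star          # u* = f'(x*): the gradient at the END point

# (x*, u*) solve x = z - c u, u = f'(x)
assert abs(x_star - (z - c * u_star)) < 1e-12
```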

Proximal operator handles sub-differentiable f

Assume that f is a closed, proper, sub-differentiable convex function. ∂f(x) denotes the subdifferential of f at x. Recall u ∈ ∂f(x) if

  f(x') ≥ f(x) + <u, x' - x> for all x' ∈ R^n.

∂f is point-to-set, and in neither direction is the image unique. In contrast, prox_{cf} is well-defined for sub-differentiable f; it is point-to-point, mapping any input to a unique point.

Proximal operator

  prox_{cf}(z) := arg min_x f(x) + (1/2c) ||x - z||^2.

Since the objective is strongly convex, the solution prox_{cf}(z) is unique. Since f is proper, dom prox_{cf} = R^n. The following are equivalent:
- x* = prox_{cf}(z) = arg min_x f(x) + (1/2c) ||x - z||^2;
- x* solves 0 ∈ c ∂f(x) + (x - z);
- (x*, u*) solve x = z - c u, u ∈ ∂f(x).

A point x* minimizes f if and only if x* = prox_f(x*).
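A standard nonsmooth example is f(x) = |x|: its subdifferential at 0 is the whole interval [-1, 1], yet its prox is single-valued, the well-known soft-thresholding operator.

```python
import numpy as np

# prox of f(x) = |x| is soft-thresholding: sign(z) * max(|z| - c, 0).
def prox_abs(z, c):
    return np.sign(z) * np.maximum(np.abs(z) - c, 0.0)

# point-to-point: every input maps to a unique output, even across the kink at 0
assert prox_abs(2.0, 0.5) == 1.5
assert prox_abs(-0.3, 0.5) == 0.0
# x* minimizes f iff x* = prox_f(x*); here the minimizer is x* = 0
assert prox_abs(0.0, 1.0) == 0.0
```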

Lagrange duality

Convex problem:

  minimize f(x) subject to Ax = b.

Relax the constraints and price their violation (pay a price if violated one way; get paid if violated the other way; the payment is linear in the violation):

  L(x; y) := f(x) + y^T (Ax - b).

For later use, define the augmented Lagrangian

  L_A(x; y, c) := f(x) + y^T (Ax - b) + (c/2) ||Ax - b||_2^2.

Minimize L for a fixed price y:

  d(y) := - min_x L(x; y).

d(y) is always convex. The Lagrange dual problem:

  minimize_y d(y).

Given a dual solution y*, recover x* = arg min_x L(x; y*) (under which conditions?). Question: how to compute the explicit/implicit gradients of d(y)?

Dual explicit gradient (ascent) algorithm

Assume d(y) is differentiable (true if f(x) is strictly convex). Gradient descent iteration (if the maximizing form of the dual is used instead, it is called gradient ascent):

  y^{k+1} = y^k - c ∇d(y^k).

It turns out to be relatively easy to compute ∇d, via an unconstrained subproblem:

  ∇d(y) = b - Ax̄, where x̄ = arg min_x L(x; y).

Dual gradient iteration:
1. x^k solves min_x L(x; y^k);
2. y^{k+1} = y^k - c (b - A x^k).

Sub-gradient of d(y)

Assume d(y) is sub-differentiable (which condition on the primal can guarantee this?).

Lemma. Given a dual point y and x̄ = arg min_x L(x; y), we have b - Ax̄ ∈ ∂d(y).

Proof. Recall that (i) u ∈ ∂d(y) if d(y') ≥ d(y) + <u, y' - y> for all y', and (ii) d(y) := - min_x L(x; y). From (ii) and the definition of x̄,

  d(y) + <b - Ax̄, y' - y> = -L(x̄; y) + (b - Ax̄)^T (y' - y)
                          = -[f(x̄) + y^T (Ax̄ - b)] + (b - Ax̄)^T (y' - y)
                          = -[f(x̄) + (y')^T (Ax̄ - b)]
                          = -L(x̄; y')
                          ≤ d(y').

From (i), b - Ax̄ ∈ ∂d(y).

Dual explicit (sub)gradient iteration

The iteration:
1. x^k solves min_x L(x; y^k);
2. y^{k+1} = y^k - c_k (b - A x^k).

Notes:
- (b - A x^k) ∈ ∂d(y^k), as shown on the last slide;
- it does not require d(y) to be differentiable;
- convergence might require a careful choice of c_k (e.g., a diminishing sequence) if d(y) is only sub-differentiable (or lacks a Lipschitz continuous gradient).
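A minimal sketch of this iteration on a toy instance, minimize ½‖x‖² subject to Ax = b, where step 1 has the closed form x̄ = -Aᵀy; the A, b, and c below are illustrative choices, not from the lecture.

```python
import numpy as np

# minimize 0.5*||x||^2  subject to  Ax = b
# L(x; y) = 0.5*||x||^2 + y'(Ax - b), so step 1 gives x_bar = -A'y in closed form.
A = np.array([[1.0, 2.0]])
b = np.array([3.0])

c = 0.1                              # fixed step size suffices here (see note below)
y = np.zeros(1)
for _ in range(200):
    x_bar = -A.T @ y                 # step 1: x^k solves min_x L(x; y^k)
    y = y - c * (b - A @ x_bar)      # step 2: y^{k+1} = y^k - c (b - A x^k)

x = -A.T @ y
x_star = A.T @ np.linalg.solve(A @ A.T, b)   # projection of 0 onto {x : Ax = b}
```

Because f is strongly convex here, d is differentiable with a Lipschitz continuous gradient, so a fixed c works; for merely sub-differentiable d a diminishing c_k may be needed, as noted above.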

Dual implicit gradient

Goal: descend using the (sub)gradient of d at the next point y^{k+1}. Following from the Lemma, we have b - A x^{k+1} ∈ ∂d(y^{k+1}), where x^{k+1} = arg min_x L(x; y^{k+1}). Since the implicit step is y^{k+1} = y^k - c (b - A x^{k+1}), we can derive

  x^{k+1} = arg min_x L(x; y^{k+1})
  ⟺ 0 ∈ ∂_x L(x^{k+1}; y^{k+1}) = ∂f(x^{k+1}) + A^T y^{k+1}
                                 = ∂f(x^{k+1}) + A^T (y^k - c (b - A x^{k+1})).

Therefore, while x^{k+1} is a solution to min_x L(x; y^{k+1}), it is also a solution to

  min_x L_A(x; y^k, c) = f(x) + (y^k)^T (Ax - b) + (c/2) ||Ax - b||^2,

which is independent of y^{k+1}.

Dual implicit gradient

Proposition. Assuming y* = y - c (b - A x*), the following are equivalent:
1. x* solves min_x L(x; y*);
2. x* solves min_x L_A(x; y, c).

Dual implicit gradient iteration

The iteration y^{k+1} = prox_{cd}(y^k) is commonly known as the augmented Lagrangian method or the method of multipliers. Implementation:
1. x^{k+1} solves min_x L_A(x; y^k, c);
2. y^{k+1} = y^k - c (b - A x^{k+1}).

Proposition. The following are equivalent:
1. the augmented Lagrangian iteration;
2. the implicit gradient iteration on d(y);
3. the proximal iteration y^{k+1} = prox_{cd}(y^k).
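The two-step implementation can be sketched on the toy problem minimize ½‖x‖² subject to Ax = b, where the L_A-subproblem reduces to a linear system; the A, b, and c below are illustrative choices, not from the lecture.

```python
import numpy as np

# Augmented Lagrangian method for: minimize 0.5*||x||^2 subject to Ax = b.
# L_A(x; y, c) = 0.5*||x||^2 + y'(Ax - b) + (c/2)*||Ax - b||^2, so step 1
# amounts to solving the linear system (I + c A'A) x = A'(c b - y).
A = np.array([[1.0, 2.0]])
b = np.array([3.0])
n = A.shape[1]

c = 1.0                                   # fixed step size; no diminishing needed
y = np.zeros(1)
for _ in range(50):
    # step 1: x^{k+1} solves min_x L_A(x; y^k, c)
    x = np.linalg.solve(np.eye(n) + c * (A.T @ A), A.T @ (c * b - y))
    # step 2: y^{k+1} = y^k - c (b - A x^{k+1})
    y = y - c * (b - A @ x)

x_star = A.T @ np.linalg.solve(A @ A.T, b)   # known solution, for comparison
```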

Dual explicit/implicit (sub)gradient computation

Definitions:

  L(x; y) = f(x) + y^T (Ax - b),
  L_A(x; y, c) = L(x; y) + (c/2) ||Ax - b||_2^2.

Objective: d(y) = - min_x L(x; y).

Explicit (sub)gradient iteration, y^{k+1} = y^k - c ∇d(y^k) (or use a subgradient from ∂d(y^k)):
1. x^{k+1} = arg min_x L(x; y^k);
2. y^{k+1} = y^k - c (b - A x^{k+1}).

Implicit (sub)gradient step, y^{k+1} = prox_{cd}(y^k):
1. x^{k+1} = arg min_x L_A(x; y^k, c);
2. y^{k+1} = y^k - c (b - A x^{k+1}).

The implicit iteration is more stable; the step size c does not need to diminish.