Math 273a: Optimization Lagrange Duality



1 Math 273a: Optimization
Lagrange Duality
Instructor: Wotao Yin, Department of Mathematics, UCLA, Winter 2015
Online discussions on piazza.com

2 Gradient descent / forward Euler
Assume the function $f$ is proper, closed, convex, and differentiable, and consider $\min_x f(x)$.
Gradient descent iteration (with step size $c$): $x^{k+1} = x^k - c \nabla f(x^k)$.
$x^{k+1}$ minimizes the following local quadratic approximation of $f$: $f(x^k) + \langle \nabla f(x^k), x - x^k \rangle + \frac{1}{2c} \|x - x^k\|_2^2$.
Compare with the forward Euler iteration, a.k.a. the explicit update: $x(t+1) = x(t) - \Delta t \, \nabla f(x(t))$.
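As a minimal sketch of the iteration above, the following runs gradient descent on an assumed test function $f(x) = (x - 3)^2$ (chosen for illustration, not taken from the slides) and checks at every step that the new iterate is no worse on the quadratic model than nearby points. The function names are hypothetical.

```python
def f(x):
    # Assumed test function for illustration: f(x) = (x - 3)^2.
    return (x - 3.0) ** 2

def grad_f(x):
    return 2.0 * (x - 3.0)

def quadratic_model(x, xk, c):
    """Local model f(xk) + <grad f(xk), x - xk> + (1/(2c)) * (x - xk)^2."""
    return f(xk) + grad_f(xk) * (x - xk) + (x - xk) ** 2 / (2.0 * c)

def gradient_step(xk, c):
    return xk - c * grad_f(xk)

c = 0.25   # step size
x = 0.0    # starting point
for _ in range(50):
    x_next = gradient_step(x, c)
    # The gradient step should minimize the model built at x.
    assert quadratic_model(x_next, x, c) <= quadratic_model(x_next + 1e-3, x, c)
    assert quadratic_model(x_next, x, c) <= quadratic_model(x_next - 1e-3, x, c)
    x = x_next
```

For this particular $f$ and $c$, each step halves the distance to the minimizer $x = 3$.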

3 Backward Euler / implicit gradient descent
Backward Euler iteration, also known as the implicit update: $x(t+1) \leftarrow$ solve $x(t+1) = x(t) - \Delta t \, \nabla f(x(t+1))$.
Equivalent to:
1. $u(t+1) \leftarrow$ solve $u = \nabla f(x(t) - \Delta t \, u)$,
2. $x(t+1) = x(t) - \Delta t \, u(t+1)$.
We can view it as the implicit gradient descent: $(x^{k+1}, u^{k+1}) \leftarrow$ solve $x = x^k - c u$, $u = \nabla f(x)$.
$c$ is the step size, but it behaves very differently from a standard step size.
The explicit (implicit) update uses the gradient at the start (end) point.
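To illustrate how differently the step size behaves in the two updates, here is a small sketch on an assumed quadratic $f(x) = \frac{a}{2} x^2$ with $a = 10$ (an illustrative choice, not from the slides). For this $f$ the implicit equation $x^{k+1} = x^k - c\,a\,x^{k+1}$ can be solved in closed form.

```python
a = 10.0   # curvature of the assumed test function f(x) = (a/2) x^2
c = 0.5    # step size; the explicit update needs c < 2/a = 0.2 to converge here

def explicit_step(x):
    # Gradient evaluated at the start point x.
    return x - c * a * x

def implicit_step(x):
    # Solve x_new = x - c * a * x_new for x_new (gradient at the end point).
    return x / (1.0 + c * a)

x_exp = x_imp = 1.0
for _ in range(10):
    x_exp = explicit_step(x_exp)
    x_imp = implicit_step(x_imp)
```

With this step size the explicit iterates grow in magnitude while the implicit iterates shrink toward the minimizer at 0.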

4 Implicit gradient step = proximal operation
Proximal update: $\mathrm{prox}_{cf}(z) := \arg\min_x f(x) + \frac{1}{2c} \|x - z\|^2$.
Optimality condition: $0 = c \nabla f(x^*) + (x^* - z)$.
Given input $z$,
- $\mathrm{prox}_{cf}(z)$ returns the solution $x^*$,
- $\nabla f(\mathrm{prox}_{cf}(z))$ returns $u^*$,
where $(x^*, u^*) \leftarrow$ solve $x = z - c u$, $u = \nabla f(x)$.
Proposition: The proximal operator is equivalent to an implicit gradient (or backward Euler) step.
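The proposition can be checked numerically on a function without a closed-form prox. The sketch below assumes $f(x) = \log(1 + e^x)$ (softplus, chosen only for illustration), whose gradient is the sigmoid, and solves the implicit equation $x = z - c \nabla f(x)$ by bisection. The helper names are hypothetical.

```python
import math

def sigmoid(x):
    # Gradient of the assumed test function softplus(x) = log(1 + exp(x)).
    return 1.0 / (1.0 + math.exp(-x))

def softplus(x):
    return math.log1p(math.exp(x))

def prox_softplus(z, c, iters=100):
    """Solve the implicit gradient equation x = z - c * sigmoid(x) by bisection."""
    lo, hi = z - c, z   # sigmoid takes values in (0, 1), so the root lies in [z - c, z]
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if mid - z + c * sigmoid(mid) > 0.0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

def prox_objective(x, z, c):
    return softplus(x) + (x - z) ** 2 / (2.0 * c)

z, c = 1.0, 2.0
x_star = prox_softplus(z, c)   # the prox output
u_star = sigmoid(x_star)       # the gradient at the end point
```

The pair `(x_star, u_star)` satisfies $x = z - cu$, $u = \nabla f(x)$ up to floating-point error, and `x_star` should also minimize `prox_objective`.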

5 Proximal operator handles sub-differentiable f
Assume that $f$ is a closed, proper, sub-differentiable convex function.
$\partial f(x)$ denotes the subdifferential of $f$ at $x$. Recall $u \in \partial f(x)$ if $f(x') \ge f(x) + \langle u, x' - x \rangle$, $\forall x' \in \mathbb{R}^n$.
$\partial f(x)$ is point-to-set; neither direction is unique.
$\mathrm{prox}$ is well-defined for sub-differentiable $f$; it is point-to-point: $\mathrm{prox}$ maps any input to a unique point.

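6 Proximal operator
$\mathrm{prox}_{cf}(z) := \arg\min_x f(x) + \frac{1}{2c} \|x - z\|^2$.
Since the objective is strongly convex, the solution $\mathrm{prox}_{cf}(z)$ is unique.
Since $f$ is proper, $\mathrm{dom}\, \mathrm{prox}_{cf} = \mathbb{R}^n$.
The following are equivalent:
$\mathrm{prox}_{cf}\, z = x^* = \arg\min_x f(x) + \frac{1}{2c} \|x - z\|^2$,
$x^* \leftarrow$ solve $0 \in c\, \partial f(x) + (x - z)$,
$(x^*, u^*) \leftarrow$ solve $x = z - c u$, $u \in \partial f(x)$.
A point $x^*$ minimizes $f$ if and only if $x^* = \mathrm{prox}_f(x^*)$.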
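A standard non-differentiable instance is $f(x) = |x|$, whose prox is soft-thresholding. The sketch below (function names are hypothetical) recovers $u = (z - x)/c$ from the prox output and checks that it lies in $\partial f(x)$, which is the third equivalent characterization above.

```python
def prox_abs(z, c):
    """prox_{c|.|}(z): soft-thresholding with threshold c."""
    if z > c:
        return z - c
    if z < -c:
        return z + c
    return 0.0

def in_subdifferential_abs(u, x, tol=1e-12):
    """Check whether u belongs to the subdifferential of |.| at x."""
    if x > 0:
        return abs(u - 1.0) <= tol
    if x < 0:
        return abs(u + 1.0) <= tol
    return -1.0 - tol <= u <= 1.0 + tol   # at x = 0 the subdifferential is [-1, 1]

c = 0.5
for z in (-2.0, -0.3, 0.0, 0.2, 1.7):
    x = prox_abs(z, c)
    u = (z - x) / c   # from x = z - c u
    assert in_subdifferential_abs(u, x)
```

The fixed-point statement can be seen here as well: `prox_abs(0.0, 1.0)` returns `0.0`, the minimizer of $|x|$, while a non-minimizer such as `1.0` is not a fixed point.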

7 Lagrange duality
Convex problem: minimize$_x$ $f(x)$ subject to $Ax = b$.
Relax the constraints and price their violation (pay a price if violated one way; get paid if violated the other way; the payment is linear in the violation):
$L(x; y) := f(x) + y^T (Ax - b)$.
For later use, define the augmented Lagrangian
$L_A(x; y, c) := f(x) + y^T (Ax - b) + \frac{c}{2} \|Ax - b\|_2^2$.
Minimize $L$ for a fixed price $y$: $d(y) := -\min_x L(x; y)$.
Always, $d(y)$ is convex.
The Lagrange dual problem: minimize$_y$ $d(y)$.
Given a dual solution $y^*$, recover $x^* = \arg\min_x L(x; y^*)$ (under which conditions?).
Question: how to compute the explicit/implicit gradients of $d(y)$?
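As a worked instance of these definitions, take the assumed objective $f(x) = \frac{1}{2}\|x\|_2^2$ (an illustrative choice, not from the slides). Everything can then be written in closed form, using the sign convention $d(y) = -\min_x L(x; y)$:

```latex
\begin{align*}
L(x; y) &= \tfrac{1}{2}\|x\|_2^2 + y^T (Ax - b), \\
\bar{x}(y) &= \arg\min_x L(x; y) = -A^T y
  \quad \text{(from } 0 = \bar{x} + A^T y \text{)}, \\
d(y) &= -L(\bar{x}(y); y)
  = -\tfrac{1}{2}\|A^T y\|_2^2 + y^T A A^T y + b^T y
  = \tfrac{1}{2}\|A^T y\|_2^2 + b^T y, \\
\nabla d(y) &= A A^T y + b = b - A \bar{x}(y).
\end{align*}
```

Here $d$ is a convex quadratic in $y$, consistent with the claim that $d(y)$ is always convex.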

8 Dual explicit gradient (ascent) algorithm
Assume $d(y)$ is differentiable (true if $f(x)$ is strictly convex).
Gradient descent iteration (if the dual is posed as a maximization problem, it is called gradient ascent): $y^{k+1} = y^k - c \nabla d(y^k)$.
It turns out to be relatively easy to compute $\nabla d$ via an unconstrained subproblem: $\nabla d(y) = b - A\bar{x}$, where $\bar{x} = \arg\min_x L(x; y)$.
Dual gradient iteration:
1. $x^k \leftarrow$ solve $\min_x L(x; y^k)$;
2. $y^{k+1} = y^k - c (b - A x^k)$.
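The two-step iteration can be sketched with NumPy on a small assumed instance: $f(x) = \frac{1}{2}\|x - p\|_2^2$ with a hand-picked $A$, $b$, and $p$ (all illustrative, not from the slides). For this strictly convex $f$ the subproblem has the closed form $\bar{x} = p - A^T y$, and the limit can be compared against the projection of $p$ onto $\{x : Ax = b\}$.

```python
import numpy as np

# Assumed problem data for illustration.
A = np.array([[1.0, 1.0, 0.0],
              [0.0, 1.0, 1.0]])
b = np.array([1.0, 2.0])
p = np.zeros(3)  # f(x) = 0.5 * ||x - p||^2

def argmin_lagrangian(y):
    # Stationarity of L(x; y) in x: (x - p) + A^T y = 0.
    return p - A.T @ y

c = 0.5  # below 2 / lambda_max(A A^T) = 2/3 for this A
y = np.zeros(2)
for _ in range(100):
    x = argmin_lagrangian(y)      # step 1: minimize L(x; y^k)
    y = y - c * (b - A @ x)       # step 2: gradient step on d

x = argmin_lagrangian(y)
# Reference solution: projection of p onto {x : A x = b}.
x_ref = p - A.T @ np.linalg.solve(A @ A.T, A @ p - b)
```

The step size bound used in the comment is specific to this quadratic example; larger constant steps can make the explicit iteration diverge.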

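9 Sub-gradient of d(y)
Assume $d(y)$ is sub-differentiable (which condition on the primal can guarantee this?).
Lemma: Given a dual point $y$ and $\bar{x} = \arg\min_x L(x; y)$, we have $b - A\bar{x} \in \partial d(y)$.
Proof. Recall
(i) $u \in \partial d(y)$ if $d(y') \ge d(y) + \langle u, y' - y \rangle$ for all $y'$;
(ii) $d(y) := -\min_x L(x; y)$.
From (ii) and the definition of $\bar{x}$,
$d(y) + \langle b - A\bar{x}, y' - y \rangle = -L(\bar{x}; y) + (b - A\bar{x})^T (y' - y)$
$= -[f(\bar{x}) + y^T (A\bar{x} - b)] + (b - A\bar{x})^T (y' - y)$
$= -[f(\bar{x}) + (y')^T (A\bar{x} - b)]$
$= -L(\bar{x}; y') \le d(y')$.
From (i), $b - A\bar{x} \in \partial d(y)$.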
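The subgradient inequality in the lemma can be spot-checked numerically. This sketch assumes $f(x) = \frac{1}{2}\|x - p\|_2^2$ with randomly generated $A$, $b$, $p$ (seeded, purely illustrative) and evaluates the gap $d(y') - d(y) - \langle b - A\bar{x}, y' - y \rangle$ at random pairs of dual points.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((2, 4))   # assumed random problem data
b = rng.standard_normal(2)
p = rng.standard_normal(4)

def lagrangian(x, y):
    return 0.5 * np.sum((x - p) ** 2) + y @ (A @ x - b)

def x_bar(y):
    # Minimizer of L(.; y) for the assumed f(x) = 0.5 * ||x - p||^2.
    return p - A.T @ y

def d(y):
    # Sign convention of the slides: d(y) = -min_x L(x; y).
    return -lagrangian(x_bar(y), y)

worst_gap = np.inf
for _ in range(1000):
    y, y_prime = rng.standard_normal(2), rng.standard_normal(2)
    u = b - A @ x_bar(y)                       # candidate subgradient at y
    gap = d(y_prime) - (d(y) + u @ (y_prime - y))
    worst_gap = min(worst_gap, gap)
```

If the lemma holds, `worst_gap` should be nonnegative up to rounding error.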

10 Dual explicit (sub)gradient iteration
The iteration:
1. $x^k \leftarrow$ solve $\min_x L(x; y^k)$;
2. $y^{k+1} = y^k - c^k (b - A x^k)$.
Notes:
$(b - A x^k) \in \partial d(y^k)$, as shown on the previous slide.
It does not require $d(y)$ to be differentiable.
Convergence might require a careful choice of $c^k$ (e.g., a diminishing sequence) if $d(y)$ is only sub-differentiable (or lacks a Lipschitz continuous gradient).
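To see why the step size choice matters when $d$ is only sub-differentiable, consider a tiny assumed example (not from the slides): minimize $x_1 + 2x_2$ over the box $[0,1]^2$ subject to $x_1 + x_2 = 1$, with the box constraint folded into $f$ as an indicator function. Here $d(y)$ is piecewise linear with minimum value $-1$. The sketch compares a constant step with a diminishing one; both schedules are illustrative choices.

```python
def x_bar(y):
    # Minimize (1 + y) * x1 + (2 + y) * x2 over the box [0, 1]^2, coordinate-wise.
    return (1.0 if 1.0 + y < 0 else 0.0, 1.0 if 2.0 + y < 0 else 0.0)

def d(y):
    # d(y) = -min_x L(x; y) with L(x; y) = x1 + 2*x2 + y * (x1 + x2 - 1).
    x1, x2 = x_bar(y)
    return -(x1 + 2.0 * x2 + y * (x1 + x2 - 1.0))

def dual_subgradient(step, iters=20):
    y = 0.0
    for k in range(iters):
        x1, x2 = x_bar(y)                       # step 1: minimize L(x; y^k)
        y = y - step(k) * (1.0 - (x1 + x2))     # step 2: subgradient b - A x^k
    return y

y_const = dual_subgradient(lambda k: 3.0)            # constant step size
y_dimin = dual_subgradient(lambda k: 3.0 / (k + 1))  # diminishing step size
```

In this instance the constant step bounces between two non-optimal dual points, while the diminishing sequence settles on a minimizer of $d$. Smaller constant steps can also work here; the example only shows that a fixed step is not safe in general.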

11 Dual implicit gradient
Goal: to descend using the (sub)gradient of $d$ at the next point $y^{k+1}$.
Following from the Lemma, we have $b - A x^{k+1} \in \partial d(y^{k+1})$, where $x^{k+1} = \arg\min_x L(x; y^{k+1})$.
Since the implicit step is $y^{k+1} = y^k - c (b - A x^{k+1})$, we can derive
$x^{k+1} = \arg\min_x L(x; y^{k+1})$
$\iff 0 \in \partial_x L(x^{k+1}; y^{k+1}) = \partial f(x^{k+1}) + A^T y^{k+1} = \partial f(x^{k+1}) + A^T (y^k - c (b - A x^{k+1}))$.
Therefore, while $x^{k+1}$ is a solution to $\min_x L(x; y^{k+1})$, it is also a solution to
$\min_x L_A(x; y^k, c) = f(x) + (y^k)^T (Ax - b) + \frac{c}{2} \|Ax - b\|^2$,
which is independent of $y^{k+1}$.

12 Dual implicit gradient
Proposition: Assuming $y' = y - c (b - A x')$, the following are equivalent:
1. $x' \leftarrow$ solve $\min_x L(x; y')$,
2. $x' \leftarrow$ solve $\min_x L_A(x; y, c)$.
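A quick numerical check of the proposition, assuming $f(x) = \frac{1}{2}\|x - p\|_2^2$ and hand-picked data (all illustrative): minimize the augmented Lagrangian at $y$, form $y'$ from the implicit step, and confirm that the same point minimizes the plain Lagrangian at $y'$.

```python
import numpy as np

# Assumed problem data for illustration.
A = np.array([[1.0, 1.0, 0.0],
              [0.0, 1.0, 1.0]])
b = np.array([1.0, 2.0])
p = np.array([1.0, -1.0, 0.5])   # f(x) = 0.5 * ||x - p||^2
y = np.array([0.3, -0.7])
c = 2.0

# Minimize L_A(x; y, c): stationarity is (x - p) + A^T y + c A^T (A x - b) = 0.
x_prime = np.linalg.solve(np.eye(3) + c * A.T @ A, p - A.T @ y + c * A.T @ b)

# Implicit dual step using the augmented-Lagrangian minimizer.
y_prime = y - c * (b - A @ x_prime)

# Minimizer of the plain Lagrangian L(x; y_prime): (x - p) + A^T y_prime = 0.
x_check = p - A.T @ y_prime
```

The two minimizers `x_prime` and `x_check` should agree up to floating-point error.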

13 Dual implicit gradient iteration
The iteration $y^{k+1} = \mathrm{prox}_{cd}(y^k)$ is commonly known as the augmented Lagrangian method or the method of multipliers.
Implementation:
1. $x^{k+1} \leftarrow$ solve $\min_x L_A(x; y^k, c)$;
2. $y^{k+1} = y^k - c (b - A x^{k+1})$.
Proposition: The following are equivalent:
1. the augmented Lagrangian iteration;
2. the implicit gradient iteration of $d(y)$;
3. the proximal iteration $y^{k+1} = \mathrm{prox}_{cd}(y^k)$.
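The equivalence between the augmented Lagrangian iteration and the proximal iteration on $d$ can be sketched on the assumed objective $f(x) = \frac{1}{2}\|x - p\|_2^2$, for which $d(y) = \frac{1}{2} y^T A A^T y + (b - Ap)^T y$ and $\mathrm{prox}_{cd}$ has a closed form. The data and helper names are illustrative.

```python
import numpy as np

# Assumed problem data for illustration.
A = np.array([[1.0, 1.0, 0.0],
              [0.0, 1.0, 1.0]])
b = np.array([1.0, 2.0])
p = np.array([1.0, -1.0, 0.5])   # f(x) = 0.5 * ||x - p||^2
c = 10.0
M, q = A @ A.T, b - A @ p        # d(y) = 0.5 y^T M y + q^T y for this f

def alm_step(y):
    # Step 1: minimize L_A(x; y, c); step 2: multiplier update.
    x = np.linalg.solve(np.eye(3) + c * A.T @ A, p - A.T @ y + c * A.T @ b)
    return x, y - c * (b - A @ x)

def prox_cd(y):
    # argmin_v d(v) + ||v - y||^2 / (2c)  <=>  (I + c M) v = y - c q.
    return np.linalg.solve(np.eye(2) + c * M, y - c * q)

y = np.zeros(2)
max_diff = 0.0
for _ in range(30):
    y_prox = prox_cd(y)
    x, y = alm_step(y)
    max_diff = max(max_diff, float(np.max(np.abs(y - y_prox))))
```

Across the run, `max_diff` tracks the largest discrepancy between the two updates, and the final `x` should be nearly feasible.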

14 Dual explicit/implicit (sub)gradient computation
Definitions:
$L(x; y) = f(x) + y^T (Ax - b)$,
$L_A(x; y, c) = L(x; y) + \frac{c}{2} \|Ax - b\|_2^2$.
Objective: $d(y) = -\min_x L(x; y)$.
Explicit (sub)gradient iteration: $y^{k+1} = y^k - c \nabla d(y^k)$, or use a subgradient in $\partial d(y^k)$:
1. $x^{k+1} = \arg\min_x L(x; y^k)$;
2. $y^{k+1} = y^k - c (b - A x^{k+1})$.
Implicit (sub)gradient step: $y^{k+1} = \mathrm{prox}_{cd}\, y^k$:
1. $x^{k+1} = \arg\min_x L_A(x; y^k, c)$;
2. $y^{k+1} = y^k - c (b - A x^{k+1})$.
The implicit iteration is more stable; the step size $c$ does not need to diminish.
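The stability remark can be illustrated by running both iterations with the same step sizes on an assumed instance, $f(x) = \frac{1}{2}\|x - p\|_2^2$ with hand-picked $A$, $b$, $p$ (illustrative, not from the slides). Each function returns the final constraint residual $\|Ax - b\|_2$.

```python
import numpy as np

# Assumed problem data for illustration.
A = np.array([[1.0, 1.0, 0.0],
              [0.0, 1.0, 1.0]])
b = np.array([1.0, 2.0])
p = np.zeros(3)   # f(x) = 0.5 * ||x - p||^2

def explicit_iteration(c, iters):
    y = np.zeros(2)
    for _ in range(iters):
        x = p - A.T @ y                 # argmin_x L(x; y)
        y = y - c * (b - A @ x)
    return np.linalg.norm(A @ (p - A.T @ y) - b)

def implicit_iteration(c, iters):
    y = np.zeros(2)
    for _ in range(iters):
        # argmin_x L_A(x; y, c)
        x = np.linalg.solve(np.eye(3) + c * A.T @ A, p - A.T @ y + c * A.T @ b)
        y = y - c * (b - A @ x)
    return np.linalg.norm(A @ x - b)

res_explicit = explicit_iteration(1.0, 50)          # c above 2/3: unstable here
res_implicit = implicit_iteration(1.0, 50)          # same c, implicit update
res_implicit_large = implicit_iteration(100.0, 50)  # much larger c
```

For this example the explicit residual blows up at $c = 1$, while the implicit residual goes to zero for both $c = 1$ and $c = 100$. The price is that each implicit step solves a linear system rather than applying a closed-form update.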
