Dual Proximal Gradient Method

http://bicmr.pku.edu.cn/~wenzw/opt-2016-fall.html
Acknowledgement: these slides are based on Prof. Lieven Vandenberghe's lecture notes.

Outline

1. Proximal gradient method applied to the dual
2. Examples
3. Alternating minimization method

Dual methods

- subgradient method: slow, and step size selection is difficult
- gradient method: requires a differentiable dual cost function; often the dual cost is not differentiable, or has a nontrivial domain; the dual can be smoothed by adding a small strongly convex term to the primal
- augmented Lagrangian method: equivalent to gradient ascent on a smoothed dual problem; however, smoothing destroys separable structure
- proximal gradient method (this lecture): the dual cost is split in two terms; one term is differentiable with a Lipschitz continuous gradient, and the other term has an inexpensive prox-operator

Composite structure in the dual

primal and dual pair:
$$\min_x\; f(x) + g(Ax) \qquad\Longleftrightarrow\qquad \max_z\; -f^*(-A^T z) - g^*(z)$$

the dual has the right structure for the proximal gradient method if
- the prox-operator of $g^*$ (or of $g$) is cheap (closed form or a simple algorithm)
- $f$ is strongly convex, i.e., $f(x) - (\mu/2)\,x^T x$ is convex

strong convexity of $f$ implies that $f^*(-A^T z)$ has a Lipschitz continuous gradient with constant $L = \|A\|_2^2/\mu$:
$$\big\|A \nabla f^*(-A^T u) - A \nabla f^*(-A^T v)\big\|_2 \le \frac{\|A\|_2^2}{\mu}\,\|u - v\|_2,$$
because $\nabla f^*$ is Lipschitz continuous with constant $1/\mu$.

Dual proximal gradient update

$$z^+ = \mathrm{prox}_{tg^*}\!\big(z + tA\,\nabla f^*(-A^T z)\big)$$

equivalent expression in terms of $f$:
$$z^+ = \mathrm{prox}_{tg^*}(z + tA\hat{x}), \qquad \hat{x} = \operatorname*{argmin}_x\,\big(f(x) + z^T A x\big)$$

- if $f$ is separable, the calculation of $\hat{x}$ decomposes into independent problems
- the step size $t$ is constant or chosen by a backtracking line search
- accelerated proximal gradient methods can also be applied
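
As a concrete illustration, here is a minimal numpy sketch of this update on an assumed instance (not from the slides): $f(x) = \frac{1}{2}\|x - a\|_2^2$ (strongly convex with $\mu = 1$, so $\hat{x}$ has a closed form) and $g = \|\cdot\|_1$, whose conjugate is the indicator of the $\ell_\infty$ unit ball, making $\mathrm{prox}_{tg^*}$ a box projection.

```python
# Dual proximal gradient sketch for  minimize 0.5*||x - a||^2 + ||A x||_1.
# All data and sizes are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
m, n = 30, 50
A = rng.standard_normal((m, n))
a = rng.standard_normal(n)

t = 1.0 / np.linalg.norm(A, 2) ** 2      # t <= mu / ||A||_2^2 with mu = 1
z = np.zeros(m)
for k in range(500):
    x_hat = a - A.T @ z                  # argmin_x f(x) + z^T A x, closed form here
    z = np.clip(z + t * (A @ x_hat), -1.0, 1.0)   # prox_{t g*} = box projection

x_hat = a - A.T @ z                      # primal solution recovered from the dual iterate
```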

Alternating minimization interpretation

the Moreau decomposition gives an alternate expression for the $z$-update:
$$z^+ = z + t(A\hat{x} - \hat{y})$$
where
$$\hat{x} = \operatorname*{argmin}_x\,\big(f(x) + z^T A x\big)$$
$$\hat{y} = \mathrm{prox}_{t^{-1}g}(z/t + A\hat{x}) = \operatorname*{argmin}_y\Big(g(y) + z^T(A\hat{x} - y) + \frac{t}{2}\|A\hat{x} - y\|_2^2\Big)$$

each iteration is thus an alternating minimization of
- the Lagrangian $f(x) + g(y) + z^T(Ax - y)$ over $x$
- the augmented Lagrangian $f(x) + g(y) + z^T(Ax - y) + \frac{t}{2}\|Ax - y\|_2^2$ over $y$
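
The equivalence of the two $z$-updates rests on the Moreau decomposition $v = \mathrm{prox}_{tg^*}(v) + t\,\mathrm{prox}_{t^{-1}g}(v/t)$. A quick numerical check with the assumed choice $g = \|\cdot\|_1$ (box projection for $g^*$, soft-thresholding for $g$):

```python
# Verify v = prox_{t g*}(v) + t * prox_{g/t}(v/t) pointwise for g = ||.||_1.
import numpy as np

t = 0.7
v = np.linspace(-3.0, 3.0, 101)
soft = lambda u, thr: np.sign(u) * np.maximum(np.abs(u) - thr, 0.0)
prox_conj = np.clip(v, -1.0, 1.0)     # prox_{t g*}: g* is the indicator of [-1, 1]
prox_g = soft(v / t, 1.0 / t)         # prox_{g/t} evaluated at v/t
assert np.allclose(prox_conj + t * prox_g, v)
```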


Regularized norm approximation

$$\min\; f(x) + \|Ax - b\|$$
(with $f$ strongly convex)

a special case of the composite structure above, with $g(y) = \|y - b\|$:
$$g^*(z) = \begin{cases} b^T z & \|z\|_* \le 1 \\ +\infty & \text{otherwise} \end{cases}
\qquad\qquad
\mathrm{prox}_{tg^*}(z) = P_C(z - tb)$$
where $C$ is the unit ball of the dual norm $\|\cdot\|_*$

dual gradient projection update:
$$\hat{x} = \operatorname*{argmin}_x\,\big(f(x) + z^T A x\big), \qquad z^+ = P_C\big(z + t(A\hat{x} - b)\big)$$
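
A minimal sketch for an assumed instance with the Euclidean norm and $f(x) = \frac{1}{2}\|x - a\|_2^2$: the dual norm of $\ell_2$ is $\ell_2$, so $C$ is the unit $\ell_2$ ball and $P_C$ is a simple rescaling.

```python
# Dual gradient projection sketch for  minimize 0.5*||x - a||^2 + ||A x - b||_2.
# Problem data are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(1)
m, n = 40, 60
A = rng.standard_normal((m, n))
a, b = rng.standard_normal(n), rng.standard_normal(m)

proj_ball = lambda z: z / max(1.0, np.linalg.norm(z))   # projection onto unit l2 ball

t = 1.0 / np.linalg.norm(A, 2) ** 2    # mu = 1 for f(x) = 0.5*||x - a||^2
z = np.zeros(m)
for k in range(500):
    x_hat = a - A.T @ z                # argmin_x f(x) + z^T A x
    z = proj_ball(z + t * (A @ x_hat - b))
```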

Example

$$\min\; f(x) + \sum_{i=1}^p \|B_i x\|_2$$
(with $f$ strongly convex)

a special case of the composite structure above, with
$$g(y_1, \dots, y_p) = \sum_{i=1}^p \|y_i\|_2, \qquad A = \begin{bmatrix} B_1^T & B_2^T & \cdots & B_p^T \end{bmatrix}^T$$

dual gradient projection update:
$$\hat{x} = \operatorname*{argmin}_x\Big(f(x) + \Big(\sum_{i=1}^p B_i^T z_i\Big)^{\!T} x\Big), \qquad z_i^+ = P_{C_i}(z_i + tB_i\hat{x}), \quad i = 1, \dots, p$$
where $C_i$ is the unit Euclidean norm ball in $\mathbf{R}^{m_i}$, if $B_i \in \mathbf{R}^{m_i \times n}$
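
The point of the block structure is that the $z$-update splits into $p$ independent ball projections. A sketch under the assumption $f(x) = \frac{1}{2}\|x - a\|_2^2$ (all data illustrative):

```python
# Block dual projection sketch for  minimize 0.5*||x - a||^2 + sum_i ||B_i x||_2.
import numpy as np

rng = np.random.default_rng(2)
p, mi, n = 5, 4, 30
Bs = [rng.standard_normal((mi, n)) for _ in range(p)]
a = rng.standard_normal(n)

t = 1.0 / np.linalg.norm(np.vstack(Bs), 2) ** 2   # ||A||_2 of the stacked blocks
Z = [np.zeros(mi) for _ in range(p)]
for k in range(500):
    x_hat = a - sum(B.T @ z for B, z in zip(Bs, Z))      # separable x-step
    Z = [z + t * (B @ x_hat) for B, z in zip(Bs, Z)]     # gradient step per block
    Z = [z / max(1.0, np.linalg.norm(z)) for z in Z]     # project onto unit balls
```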

Numerical example

$f(x) = \frac{1}{2}\|Cx - d\|_2^2$ with randomly generated $C \in \mathbf{R}^{2000 \times 1000}$, $B_i \in \mathbf{R}^{10 \times 1000}$, $p = 500$

[Figure: dual objective error versus iteration $k$ (log scale, $10^0$ down to $10^{-6}$, $k$ up to 900), comparing the basic gradient method with FISTA.]

Minimization over intersection of convex sets

$$\min\; f(x) \qquad \text{s.t.}\quad x \in C_1 \cap \cdots \cap C_m$$

- $f$ strongly convex; e.g., $f(x) = \|x - a\|_2^2$ for projecting $a$ onto the intersection
- the sets $C_i$ are closed, convex, and easy to project onto

this is a special case of the composite structure above with $g$ a sum of indicators:
$$g(y_1, \dots, y_m) = I_{C_1}(y_1) + \cdots + I_{C_m}(y_m), \qquad A = \begin{bmatrix} I & \cdots & I \end{bmatrix}^T$$

dual proximal gradient update:
$$\hat{x} = \operatorname*{argmin}_x\,\big(f(x) + (z_1 + \cdots + z_m)^T x\big), \qquad z_i^+ = z_i + t\hat{x} - tP_{C_i}(z_i/t + \hat{x}), \quad i = 1, \dots, m$$
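
A sketch with two assumed sets, a box and a halfspace, and $f(x) = \frac{1}{2}\|x - a\|_2^2$ (so $\mu = 1$; for $A = [I\;\cdots\;I]^T$ one has $\|A\|_2^2 = m$, which gives the step size below). All sets and data are illustrative.

```python
# Dual proximal gradient sketch for projecting a point onto the
# intersection of C1 and C2.
import numpy as np

rng = np.random.default_rng(3)
n = 20
a = rng.standard_normal(n)
c, d = rng.standard_normal(n), 0.5

proj_box = lambda x: np.clip(x, 0.0, 1.0)                    # C1 = [0, 1]^n
proj_half = lambda x: x - max(0.0, c @ x - d) / (c @ c) * c  # C2 = {x : c^T x <= d}

projections = [proj_box, proj_half]
m = len(projections)
t = 1.0 / m                            # t <= mu / ||A||_2^2 = 1/m
Z = np.zeros((m, n))
for k in range(1000):
    x_hat = a - Z.sum(axis=0)          # argmin_x 0.5*||x - a||^2 + (sum_i z_i)^T x
    for i, P in enumerate(projections):
        Z[i] = Z[i] + t * x_hat - t * P(Z[i] / t + x_hat)
```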

Decomposition of separable problems

$$\min\; \sum_{j=1}^n f_j(x_j) + \sum_{i=1}^m g_i(A_{i1}x_1 + \cdots + A_{in}x_n)$$

each $f_j$ is strongly convex; each $g_i$ has an inexpensive prox-operator

dual proximal gradient update:
$$\hat{x}_j = \operatorname*{argmin}_{x_j}\Big(f_j(x_j) + \sum_{i=1}^m z_i^T A_{ij} x_j\Big), \quad j = 1, \dots, n$$
$$z_i^+ = \mathrm{prox}_{tg_i^*}\Big(z_i + t\sum_{j=1}^n A_{ij}\hat{x}_j\Big), \quad i = 1, \dots, m$$


Primal problem with separable structure

composite problem with separable $f$:
$$\min\; f_1(x_1) + f_2(x_2) + g(A_1 x_1 + A_2 x_2)$$
we assume $f_1$ is strongly convex, but not necessarily $f_2$

dual problem:
$$\max\; -f_1^*(-A_1^T z) - f_2^*(-A_2^T z) - g^*(z)$$

- the first term is differentiable with a Lipschitz continuous gradient
- the prox-operator of $h(z) = f_2^*(-A_2^T z) + g^*(z)$ is derived on the following slides

Dual proximal gradient method

$$z^+ = \mathrm{prox}_{th}\!\big(z + tA_1\,\nabla f_1^*(-A_1^T z)\big)$$

equivalent form using $f_1$:
$$z^+ = \mathrm{prox}_{th}(z + tA_1\hat{x}_1), \qquad \hat{x}_1 = \operatorname*{argmin}_{x_1}\big(f_1(x_1) + z^T A_1 x_1\big)$$

the prox-operator of $h(z) = f_2^*(-A_2^T z) + g^*(z)$ is given by
$$\mathrm{prox}_{th}(w) = w + t(A_2\hat{x}_2 - \hat{y})$$
where $\hat{x}_2, \hat{y}$ minimize an augmented Lagrangian:
$$(\hat{x}_2, \hat{y}) = \operatorname*{argmin}_{x_2,\,y}\Big(f_2(x_2) + g(y) + \frac{t}{2}\|A_2 x_2 - y + w/t\|_2^2\Big)$$

Proof of $\mathrm{prox}_{th}(w) = w + t(A_2\hat{x}_2 - \hat{y})$

with $h(z) = f_2^*(-A_2^T z) + g^*(z)$, the conjugate of $h$ is
$$\begin{aligned}
h^*(y) &= \sup_z\;\big(y^T z - f_2^*(-A_2^T z) - g^*(z)\big) \\
&= \sup_{z,\,w}\;\big\{\, y^T z - f_2^*(w) - g^*(z) \;:\; w = -A_2^T z \,\big\} \\
&= \inf_v\,\sup_{z,\,w}\;\big(y^T z - f_2^*(w) - g^*(z) + v^T(w + A_2^T z)\big) \\
&= \inf_v\;\big(f_2(v) + g(A_2 v + y)\big),
\end{aligned}$$
where the last step uses $\sup_w\,(v^T w - f_2^*(w)) = f_2(v)$ and $\sup_z\,((y + A_2 v)^T z - g^*(z)) = g(A_2 v + y)$.

the Moreau decomposition $w = \mathrm{prox}_{th}(w) + t\,\mathrm{prox}_{t^{-1}h^*}(w/t)$ then gives $\mathrm{prox}_{th}(w) = w - t\hat{y}$, where $\hat{y}$ solves
$$\min_y\; t^{-1}h^*(y) + \frac{1}{2}\|y - w/t\|_2^2,$$
which (after scaling the objective by $t$, leaving the minimizer unchanged) is equivalent to
$$\min_{y,\,v}\; f_2(v) + g(A_2 v + y) + \frac{t}{2}\|y - w/t\|_2^2 \;\equiv\; \min_{u,\,v}\; f_2(v) + g(u) + \frac{t}{2}\|u - A_2 v - w/t\|_2^2$$
using the substitution $u = A_2 v + y$. hence
$$\mathrm{prox}_{th}(w) = w - t\hat{y} = w - t(\hat{u} - A_2\hat{v}),$$
which, after renaming $(\hat{v}, \hat{u})$ as $(\hat{x}_2, \hat{y})$, is the expression on the previous slide.

Alternating minimization method

starting at some initial $z$, repeat the following iteration:

1. minimize the Lagrangian over $x_1$:
$$\hat{x}_1 = \operatorname*{argmin}_{x_1}\big(f_1(x_1) + z^T A_1 x_1\big)$$
2. minimize the augmented Lagrangian over $x_2$ and $y$:
$$(\hat{x}_2, \hat{y}) = \operatorname*{argmin}_{x_2,\,y}\Big(f_2(x_2) + g(y) + \frac{t}{2}\|A_1\hat{x}_1 + A_2 x_2 - y + z/t\|_2^2\Big)$$
3. update the dual variable:
$$z^+ = z + t(A_1\hat{x}_1 + A_2\hat{x}_2 - \hat{y})$$
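
Here is a minimal sketch of the three steps on an assumed instance where every step has a closed form: $\min\; \frac{1}{2}\|x_1 - a\|_2^2 + \lambda\|x_2\|_1$ subject to $A_1 x_1 + x_2 = b$, i.e., $f_1(x_1) = \frac{1}{2}\|x_1 - a\|_2^2$, $f_2 = \lambda\|\cdot\|_1$, $A_2 = I$, and $g$ the indicator of $\{b\}$, so step 2 forces $\hat{y} = b$ and reduces to soft-thresholding.

```python
# Alternating minimization sketch; all data are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(4)
m, n = 30, 50
A1 = rng.standard_normal((m, n))
a, b = rng.standard_normal(n), rng.standard_normal(m)
lam = 0.1

soft = lambda u, thr: np.sign(u) * np.maximum(np.abs(u) - thr, 0.0)

t = 1.0 / np.linalg.norm(A1, 2) ** 2     # mu = 1 for f1
z = np.zeros(m)
for k in range(500):
    x1 = a - A1.T @ z                         # step 1: Lagrangian minimized over x1
    x2 = soft(b - A1 @ x1 - z / t, lam / t)   # step 2: augmented Lagrangian over x2 (y = b)
    z = z + t * (A1 @ x1 + x2 - b)            # step 3: dual update
```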

Comparison with augmented Lagrangian method

augmented Lagrangian method (for the separable problem above):

1. compute the minimizer $(\hat{x}_1, \hat{x}_2, \hat{y})$ of the augmented Lagrangian
$$f_1(x_1) + f_2(x_2) + g(y) + \frac{t}{2}\|A_1 x_1 + A_2 x_2 - y + z/t\|_2^2$$
2. update the dual variable: $z^+ = z + t(A_1\hat{x}_1 + A_2\hat{x}_2 - \hat{y})$

differences with the alternating minimization method:
- more general: the AL method does not require strong convexity of $f_1$
- the quadratic penalty in step 1 destroys separability, since $x_1$ and $x_2$ must be minimized jointly

References

on the alternating minimization method:
- P. Tseng, "Applications of a splitting algorithm to decomposition in convex programming and variational inequalities," SIAM Journal on Control and Optimization, 1991.
- P. Tseng, "Further applications of a splitting algorithm to decomposition in variational inequalities and convex programming," Mathematical Programming, 1990.