Dual and primal-dual methods
1 ELE 538B: Large-Scale Optimization for Data Science Dual and primal-dual methods Yuxin Chen Princeton University, Spring 2018
2 Outline
- Dual proximal gradient method
- Primal-dual proximal gradient method
3 Dual proximal gradient method
4 Constrained convex optimization

    minimize_x  f(x)    subject to  Ax + b ∈ C

where f is convex and C is a convex set
- projection onto the feasible set could be highly nontrivial (even when projection onto C is easy)

Dual and primal-dual method 9-4
5 Constrained convex optimization

More generally,

    minimize_x  f(x) + h(Ax)

where f and h are convex
- the proximal operator w.r.t. h̃(x) := h(Ax) could be difficult (even when prox_h is inexpensive)
6 A possible route: dual formulation

    minimize_x  f(x) + h(Ax)

add auxiliary variable z:

    minimize_{x,z}  f(x) + h(z)    subject to  Ax = z

dual formulation:

    maximize_λ  min_{x,z} L(x, z, λ),    where L(x, z, λ) := f(x) + h(z) + ⟨λ, Ax − z⟩ is the Lagrangian
7 A possible route: dual formulation

    maximize_λ  min_{x,z} { f(x) + h(z) + ⟨λ, Ax − z⟩ }

⟺  maximize_λ  min_x { ⟨Aᵀλ, x⟩ + f(x) } + min_z { h(z) − ⟨λ, z⟩ }    (decouple x and z)

⟺  maximize_λ  −f*(−Aᵀλ) − h*(λ)

where f* (resp. h*) is the Fenchel conjugate of f (resp. h)
8 Primal vs. dual problems

    (primal)  minimize_x  f(x) + h(Ax)
    (dual)    minimize_λ  f*(−Aᵀλ) + h*(λ)

The dual formulation is useful if
- the proximal operator w.r.t. h* is cheap (recall the Moreau decomposition prox_h(x) + prox_{h*}(x) = x)
- f* is smooth (true whenever f is strongly convex)
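The Moreau decomposition above can be sanity-checked numerically. A minimal sketch (the choice h = ‖·‖₁ is mine, not from the slides): the prox of the ℓ₁ norm is soft-thresholding, while its conjugate h* is the indicator of the unit ℓ∞ ball, whose prox is a projection (clipping):

```python
import numpy as np

def prox_l1(x, t):
    # prox of t*||.||_1: soft-thresholding
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def prox_linf_ball(x, t):
    # h = ||.||_1 has conjugate h* = indicator of {||y||_inf <= 1};
    # the prox of an indicator is projection (independent of t)
    return np.clip(x, -1.0, 1.0)

x = np.array([2.5, -0.7, 0.3, -3.0])
# Moreau decomposition with unit step: prox_h(x) + prox_{h*}(x) = x
print(np.allclose(prox_l1(x, 1.0) + prox_linf_ball(x, 1.0), x))  # True
```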
9 Dual proximal gradient methods

Apply the proximal gradient method to the dual problem:

Algorithm 9.1 Dual proximal gradient algorithm
1: for t = 0, 1, ... do
2:     λ^{t+1} = prox_{η_t h*}( λ^t + η_t A ∇f*(−Aᵀλ^t) )

Let Q(λ) := −f*(−Aᵀλ) − h*(λ) and Q_opt := max_λ Q(λ); then Q_opt − Q(λ^t) ≲ 1/t
10 Primal representation of dual proximal gradient methods

Algorithm 9.1 admits a more explicit primal representation:

Algorithm 9.2 Dual proximal gradient algorithm (primal representation)
1: for t = 0, 1, ... do
2:     x^t = arg min_x { f(x) + ⟨Aᵀλ^t, x⟩ }
3:     λ^{t+1} = λ^t + η_t A x^t − η_t prox_{η_t⁻¹ h}( η_t⁻¹ λ^t + A x^t )

{x^t} is a primal sequence, which is nevertheless not always feasible
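A minimal sketch of Algorithm 9.2 on a toy instance of my own choosing: minimize_x 0.5‖x − c‖² + γ‖Ax‖₁, i.e. f(x) = 0.5‖x − c‖² (1-strongly convex) and h = γ‖·‖₁. Step 2 then has the closed form x^t = c − Aᵀλ^t, and prox_{h/η} is soft-thresholding with threshold γ/η:

```python
import numpy as np

def soft(u, t):
    # soft-thresholding = prox of t*||.||_1
    return np.sign(u) * np.maximum(np.abs(u) - t, 0.0)

A = np.array([[1.0, 0.0, 2.0],
              [0.0, 1.0, -1.0]])
c = np.array([1.0, -2.0, 0.5])
gamma = 0.5
eta = 1.0 / np.linalg.norm(A, 2) ** 2   # step size 1/L, with L = ||A||^2 / mu and mu = 1

lam = np.zeros(2)
for _ in range(1000):
    x = c - A.T @ lam                                                       # step 2
    lam = lam + eta * (A @ x) - eta * soft(lam / eta + A @ x, gamma / eta)  # step 3

# optimality check: lam is a fixed point of the dual update; by Moreau,
# prox_{eta h*} is projection onto {||.||_inf <= gamma}
x = c - A.T @ lam
print(np.allclose(lam, np.clip(lam + eta * (A @ x), -gamma, gamma), atol=1e-8))  # True
```

Since f is 1-strongly convex, the dual smooth part has Hessian AAᵀ, and with step size 1/‖A‖² the dual iterates contract linearly, so the fixed-point residual is numerically zero well before 1000 iterations.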
11 Justification of primal representation

By definition of x^t,  −Aᵀλ^t ∈ ∂f(x^t). This together with the conjugate subgradient theorem and the smoothness of f* yields

    x^t = ∇f*(−Aᵀλ^t)

Therefore, the dual proximal gradient update rule can be rewritten as

    λ^{t+1} = prox_{η_t h*}( λ^t + η_t A x^t )
12 Justification of primal representation (cont.)

Moreover, from the extended Moreau decomposition, we know

    prox_{η_t h*}( λ^t + η_t A x^t ) = λ^t + η_t A x^t − η_t prox_{η_t⁻¹ h}( η_t⁻¹ λ^t + A x^t )

⟹  λ^{t+1} = λ^t + η_t A x^t − η_t prox_{η_t⁻¹ h}( η_t⁻¹ λ^t + A x^t )
13 Accuracy of primal sequence

One can control primal accuracy via dual accuracy:

Lemma 9.1  Let x_λ := arg min_x { f(x) + ⟨Aᵀλ, x⟩ }. Suppose f is μ-strongly convex. Then

    ‖x* − x_λ‖₂² ≤ 2( Q_opt − Q(λ) ) / μ

consequence:  ‖x* − x^t‖₂² ≲ 1/t
14 Proof of Lemma 9.1

Recall that the Lagrangian is given by

    L(x, z, λ) = f̃(x, λ) + h̃(z, λ),    with f̃(x, λ) := f(x) + ⟨Aᵀλ, x⟩ and h̃(z, λ) := h(z) − ⟨λ, z⟩

For any λ, define x_λ := arg min_x f̃(x, λ) and z_λ := arg min_z h̃(z, λ) (non-rigorous). Then by strong convexity,

    L(x*, z*, λ) − L(x_λ, z_λ, λ) ≥ f̃(x*, λ) − f̃(x_λ, λ) ≥ ½ μ ‖x* − x_λ‖₂²

In addition, since Ax* = z*, one has

    L(x*, z*, λ) = f(x*) + h(z*) + ⟨λ, Ax* − z*⟩ = f(x*) + h(Ax*) = F_opt = Q_opt    (strong duality)

This combined with L(x_λ, z_λ, λ) = Q(λ) gives

    Q_opt − Q(λ) ≥ ½ μ ‖x* − x_λ‖₂²

as claimed
15 Accelerated dual proximal gradient methods

One can apply FISTA to the dual problem to improve convergence:

Algorithm 9.3 Accelerated dual proximal gradient algorithm
1: for t = 0, 1, ... do
2:     λ^{t+1} = prox_{η_t h*}( w^t + η_t A ∇f*(−Aᵀw^t) )
3:     θ_{t+1} = ( 1 + √(1 + 4θ_t²) ) / 2
4:     w^{t+1} = λ^{t+1} + ((θ_t − 1)/θ_{t+1}) ( λ^{t+1} − λ^t )

Apply the FISTA theory and Lemma 9.1 to get Q_opt − Q(λ^t) ≲ 1/t² and ‖x* − x^t‖₂² ≲ 1/t²
16 Primal representation of accelerated dual proximal gradient methods

Algorithm 9.3 admits a more explicit primal representation:

Algorithm 9.4 Accelerated dual proximal gradient algorithm (primal representation)
1: for t = 0, 1, ... do
2:     x^t = arg min_x { f(x) + ⟨Aᵀw^t, x⟩ }
3:     λ^{t+1} = w^t + η_t A x^t − η_t prox_{η_t⁻¹ h}( η_t⁻¹ w^t + A x^t )
4:     θ_{t+1} = ( 1 + √(1 + 4θ_t²) ) / 2
5:     w^{t+1} = λ^{t+1} + ((θ_t − 1)/θ_{t+1}) ( λ^{t+1} − λ^t )
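A minimal sketch of Algorithm 9.4 on the same kind of toy instance (my choice): minimize_x 0.5‖x − c‖² + γ‖Ax‖₁, so step 2 has the closed form x^t = c − Aᵀw^t and, via Moreau, step 3 reduces to projecting w^t + ηAx^t onto {‖·‖_∞ ≤ γ}; steps 4-5 add the FISTA momentum:

```python
import numpy as np

A = np.array([[1.0, 0.0, 2.0],
              [0.0, 1.0, -1.0]])
c = np.array([1.0, -2.0, 0.5])
gamma = 0.5
eta = 1.0 / np.linalg.norm(A, 2) ** 2

lam = np.zeros(2); w = lam.copy(); theta = 1.0
for _ in range(2000):
    x = c - A.T @ w                                             # step 2
    lam_next = np.clip(w + eta * (A @ x), -gamma, gamma)        # step 3 (Moreau form)
    theta_next = (1 + np.sqrt(1 + 4 * theta ** 2)) / 2          # step 4
    w = lam_next + (theta - 1) / theta_next * (lam_next - lam)  # step 5
    lam, theta = lam_next, theta_next

# at convergence, lam is a fixed point of the unaccelerated dual update
x = c - A.T @ lam
print(np.allclose(lam, np.clip(lam + eta * (A @ x), -gamma, gamma), atol=1e-3))  # True
```

The loose tolerance is what the worst-case O(1/t²) FISTA bound guarantees at this iteration count; in practice the residual is far smaller.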
17 Primal-dual proximal gradient method
18 Nonsmooth optimization

    minimize_x  f(x) + h(Ax)

where f and h are closed and convex
- both f and h might be non-smooth
- both f and h admit inexpensive proximal operators
19 Primal-dual approaches?

    minimize_x  f(x) + h(Ax)

So far we have discussed the proximal method (resp. dual proximal method), which essentially updates only primal (resp. dual) variables.

Question: can we update both primal and dual variables simultaneously and take advantage of both prox_f and prox_h?
20 A saddle-point formulation

To this end, we first derive a saddle-point formulation that includes both primal and dual variables:

    minimize_x  f(x) + h(Ax)

add auxiliary variable z:

    minimize_{x,z}  f(x) + h(z)    subject to  Ax = z

⟺  maximize_λ  min_{x,z} { f(x) + h(z) + ⟨λ, Ax − z⟩ }
⟺  maximize_λ  min_x { f(x) + ⟨λ, Ax⟩ − h*(λ) }
⟺  minimize_x  max_λ { f(x) + ⟨λ, Ax⟩ − h*(λ) }    (saddle-point problem)
21 A saddle-point formulation

    minimize_x  max_λ  f(x) + ⟨λ, Ax⟩ − h*(λ)    (9.1)

- one can then consider updating the primal variable x and the dual variable λ simultaneously
- we'll first examine the optimality condition for (9.1), which in turn gives ideas about how to jointly update primal and dual variables
22 Optimality condition

    minimize_x  max_λ  f(x) + ⟨λ, Ax⟩ − h*(λ)

optimality condition:

    0 ∈ ∂f(x) + Aᵀλ
    0 ∈ −Ax + ∂h*(λ)

⟺  0 ∈ [  0   Aᵀ ] [ x ]   [ ∂f(x)  ]
       [ −A   0  ] [ λ ] + [ ∂h*(λ) ]  =: F(x, λ)    (9.2)

key idea: iteratively update (x, λ) to reach a point obeying 0 ∈ F(x, λ)
23 How to solve 0 ∈ F(x) in general?

In general, finding a solution to 0 ∈ F(x) (called a monotone inclusion problem if F is maximal monotone) is equivalent to finding a fixed point of the resolvent (I + ηF)⁻¹ of F, i.e. a solution to

    x = (I + ηF)⁻¹(x)

This suggests a natural fixed-point iteration / resolvent iteration:

    x^{t+1} = (I + ηF)⁻¹(x^t),    t = 0, 1, ...
24 Aside: monotone operators (Ryu, Boyd '16)

- a relation F is called monotone if ⟨u − v, x − y⟩ ≥ 0 for all (x, u), (y, v) ∈ F
- a relation F is called maximal monotone if there is no monotone operator that contains it
25 Proximal point method

    x^{t+1} = (I + η_t F)⁻¹(x^t),    t = 0, 1, ...

If F = ∂f for some convex function f, then this proximal point method becomes

    x^{t+1} = prox_{η_t f}(x^t),    t = 0, 1, ...

useful when prox_{η_t f} is cheap
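A minimal sketch of the proximal point method for F = ∂f with f(x) = |x| (a toy choice of mine): the prox of η|·| is scalar soft-thresholding, and the iterates reach the minimizer x* = 0 in finitely many steps.

```python
import numpy as np

def prox_abs(x, eta):
    # prox of eta*|.|: scalar soft-thresholding
    return np.sign(x) * max(abs(x) - eta, 0.0)

x = 5.0
for _ in range(100):
    x = prox_abs(x, 0.1)   # each step shrinks |x| by eta, then stays at 0

print(x == 0.0)  # True
```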
26 Back to primal-dual approaches

Recall that we want to solve

    0 ∈ [  0   Aᵀ ] [ x ]   [ ∂f(x)  ]
        [ −A   0  ] [ λ ] + [ ∂h*(λ) ]  =: F(x, λ)

issue of the proximal point method: computing (I + ηF)⁻¹ is in general difficult
27 Back to primal-dual approaches

observation: in practice we may often consider splitting F into two operators, 0 ∈ A(x, λ) + B(x, λ), with

    A(x, λ) = [  0   Aᵀ ] [ x ],    B(x, λ) = [ ∂f(x)  ]
              [ −A   0  ] [ λ ]                [ ∂h*(λ) ]    (9.3)

- (I + ηA)⁻¹ can be computed by solving linear systems
- (I + ηB)⁻¹ is easy if prox_f and prox_h are both inexpensive

solution: design update rules based on (I + ηA)⁻¹ and (I + ηB)⁻¹ instead of (I + ηF)⁻¹
28 Operator splitting via Cayley operators

We now introduce one principled approach based on operator splitting:

    find x  s.t.  0 ∈ A(x) + B(x)    (operator splitting)

Let R_A := (I + ηA)⁻¹ and R_B := (I + ηB)⁻¹ be resolvents, and C_A := 2R_A − I and C_B := 2R_B − I be Cayley operators.

Lemma 9.2

    0 ∈ A(x) + B(x)  ⟺  x = R_{A+B}(x)  ⟺  C_A C_B(z) = z with x = R_B(z)    (9.4)

it comes down to finding a fixed point of C_A C_B
29 Operator splitting via Cayley operators

    0 ∈ A(x) + B(x)  ⟺  x = R_{A+B}(x)  ⟺  C_A C_B(z) = z

advantage: this allows one to apply C_A (resp. R_A) and C_B (resp. R_B) sequentially (instead of computing R_{A+B})
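A quick numerical check (my own construction, not from the slides) that a Cayley operator is nonexpansive, which is what makes these sequential schemes well-behaved: for B = ∂g with g = ‖·‖₁ and η = 1, R_B is soft-thresholding and C_B = 2R_B − I satisfies ‖C_B(u) − C_B(v)‖ ≤ ‖u − v‖.

```python
import numpy as np

def soft(z, t):
    # soft-thresholding = resolvent of the subdifferential of t*||.||_1
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

C_B = lambda z: 2 * soft(z, 1.0) - z   # Cayley operator of B = subdiff of ||.||_1

u = np.array([2.0, -0.5, 1.2])
v = np.array([-1.0, 0.3, 3.0])
print(np.linalg.norm(C_B(u) - C_B(v)) <= np.linalg.norm(u - v))  # True
```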
30 Proof of Lemma 9.2

C_A C_B(z) = z is equivalent to

    x = R_B(z)      (9.5a)
    z̃ = 2x − z      (9.5b)
    x̃ = R_A(z̃)      (9.5c)
    z = 2x̃ − z̃      (9.5d)

From (9.5b) and (9.5d), we see that x = x̃, which together with (9.5d) gives

    2x = z + z̃    (9.6)
31 Proof of Lemma 9.2 (cont.)

Recall that

    z ∈ x + ηB(x)    and    z̃ ∈ x + ηA(x)

Adding these two facts and using (9.6), we get

    2x = z + z̃ ∈ 2x + ηB(x) + ηA(x)    ⟹    0 ∈ A(x) + B(x)
32 Douglas-Rachford splitting

How to find points obeying z = C_A C_B(z)?

First attempt: fixed-point iteration z^{t+1} = C_A C_B(z^t)
- unfortunately, it may not converge in general

Douglas-Rachford splitting: damped fixed-point iteration

    z^{t+1} = ½ ( I + C_A C_B )(z^t)

- converges whenever a solution to 0 ∈ A(x) + B(x) exists!
33 More explicit expression for D-R splitting

The Douglas-Rachford splitting update rule z^{t+1} = ½(I + C_A C_B)(z^t) is essentially:

    x^{t+1/2} = R_B(z^t)
    z^{t+1/2} = 2x^{t+1/2} − z^t
    x^{t+1}   = R_A(z^{t+1/2})
    z^{t+1}   = ½( z^t + 2x^{t+1} − z^{t+1/2} ) = z^t + x^{t+1} − x^{t+1/2}

where x^{t+1/2} and z^{t+1/2} are auxiliary variables
34 More explicit expression for D-R splitting

or equivalently,

    x^{t+1/2} = R_B(z^t)
    x^{t+1}   = R_A( 2x^{t+1/2} − z^t )
    z^{t+1}   = z^t + x^{t+1} − x^{t+1/2}
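A minimal sketch of this three-line Douglas-Rachford iteration for 0 ∈ A(x) + B(x) with A = ∂f, B = ∂g, where f(x) = 0.5‖x − b‖² and g = γ‖·‖₁ (a toy choice of mine): both resolvents are proximal operators, and the minimizer is known in closed form, x* = soft-threshold(b, γ).

```python
import numpy as np

def soft(v, t):
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

b = np.array([3.0, -0.2, 1.5, -4.0])
gamma, eta = 1.0, 1.0
R_B = lambda z: soft(z, eta * gamma)         # R_B = prox_{eta*g}
R_A = lambda z: (z + eta * b) / (1 + eta)    # R_A = prox_{eta*f}

z = np.zeros_like(b)
for _ in range(200):
    x_half = R_B(z)              # x^{t+1/2} = R_B(z^t)
    x = R_A(2 * x_half - z)      # x^{t+1}   = R_A(2 x^{t+1/2} - z^t)
    z = z + x - x_half           # z^{t+1}   = z^t + x^{t+1} - x^{t+1/2}

print(np.allclose(R_B(z), soft(b, gamma)))  # True: recovers the minimizer
```

For this particular f, the z-iterates satisfy z^{t+1} = (z^t + b)/2, so they converge linearly at rate ½ to the fixed point z* = b, and R_B(z*) is the minimizer.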
35 Douglas-Rachford primal-dual splitting

    minimize_x  max_λ  f(x) + ⟨λ, Ax⟩ − h*(λ)

Applying Douglas-Rachford splitting to (9.3) yields

    x^{t+1/2} = prox_{ηf}(p^t)
    λ^{t+1/2} = prox_{ηh*}(q^t)

    [ x^{t+1} ]   [  I    ηAᵀ ]⁻¹ [ 2x^{t+1/2} − p^t ]
    [ λ^{t+1} ] = [ −ηA   I   ]   [ 2λ^{t+1/2} − q^t ]

    p^{t+1} = p^t + x^{t+1} − x^{t+1/2}
    q^{t+1} = q^t + λ^{t+1} − λ^{t+1/2}
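A minimal sketch of this primal-dual Douglas-Rachford iteration on a toy instance of my own choosing: f(x) = 0.5‖x − c‖² and h = γ‖·‖₁, so prox_{ηh*} is projection onto {‖λ‖_∞ ≤ γ}. Taking A = I makes the minimizer of f(x) + h(Ax) available in closed form, x* = soft-threshold(c, γ), which lets us check the answer.

```python
import numpy as np

def soft(v, t):
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

n = 4
A = np.eye(n)
c = np.array([2.0, -0.3, 1.1, -5.0])
gamma, eta = 1.0, 1.0

prox_f = lambda p: (p + eta * c) / (1 + eta)    # prox of eta*f
prox_hc = lambda q: np.clip(q, -gamma, gamma)   # prox of eta*h*
# resolvent of the skew part: solve (I + eta*[[0, A^T], [-A, 0]]) [x; lam] = rhs
M_inv = np.linalg.inv(np.block([[np.eye(n), eta * A.T],
                                [-eta * A, np.eye(n)]]))

p, q = np.zeros(n), np.zeros(n)
for _ in range(300):
    x_half, lam_half = prox_f(p), prox_hc(q)
    x_lam = M_inv @ np.concatenate([2 * x_half - p, 2 * lam_half - q])
    x, lam = x_lam[:n], x_lam[n:]
    p, q = p + x - x_half, q + lam - lam_half

print(np.allclose(x_half, soft(c, gamma), atol=1e-8))  # True
```

For a large, structured A one would replace the explicit inverse with a factorization or an iterative linear solver; the small dense inverse here is only for illustration.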
36 Example

    minimize_x  ‖x‖₂ + γ‖Ax − b‖₁

⟺  minimize_x  f(x) + g(Ax)    with f(x) := ‖x‖₂ and g(y) := γ‖y − b‖₁

[figure: relative error ‖x^k − x*‖/‖x*‖ vs. iteration number k, for ADMM, primal DR, and primal-dual DR]

O'Connor, Vandenberghe '14
37 Example

    minimize_x  ‖Kx − b‖₁ + γ‖Dx‖_iso    subject to  0 ≤ x ≤ 1

where ‖·‖_iso is a certain ℓ₂-ℓ₁ norm

⟺  minimize_x  f(x) + g(Ax)    with f(x) := 1_{{0 ≤ x ≤ 1}}(x) and g(y₁, y₂) := ‖y₁ − b‖₁ + γ‖y₂‖_iso

[figure: relative objective error (f(x^k) − f*)/f* vs. iteration number k, for CP, ADMM, primal DR, and primal-dual DR]

O'Connor, Vandenberghe '14
38 Reference

[1] "Optimization methods for large-scale systems," EE236C lecture notes, L. Vandenberghe, UCLA.
[2] "Convex optimization," EE364B lecture notes, S. Boyd, Stanford.
[3] "Mathematical optimization," MATH301 lecture notes, E. Candes, Stanford.
[4] "First-order methods in optimization," A. Beck, Vol. 25, SIAM, 2017.
[5] "Primal-dual decomposition by operator splitting and applications to image deblurring," D. O'Connor, L. Vandenberghe, SIAM Journal on Imaging Sciences, 2014.
[6] "On the numerical solution of heat conduction problems in two and three space variables," J. Douglas, H. Rachford, Transactions of the American Mathematical Society, 1956.
39 Reference

[7] "A primer on monotone operator methods," E. Ryu, S. Boyd, Appl. Comput. Math., 2016.
More informationConvex Optimization. Dani Yogatama. School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA. February 12, 2014
Convex Optimization Dani Yogatama School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA February 12, 2014 Dani Yogatama (Carnegie Mellon University) Convex Optimization February 12,
More informationSubgradient Method. Ryan Tibshirani Convex Optimization
Subgradient Method Ryan Tibshirani Convex Optimization 10-725 Consider the problem Last last time: gradient descent min x f(x) for f convex and differentiable, dom(f) = R n. Gradient descent: choose initial
More informationDual Ascent. Ryan Tibshirani Convex Optimization
Dual Ascent Ryan Tibshirani Conve Optimization 10-725 Last time: coordinate descent Consider the problem min f() where f() = g() + n i=1 h i( i ), with g conve and differentiable and each h i conve. Coordinate
More informationSparse Covariance Selection using Semidefinite Programming
Sparse Covariance Selection using Semidefinite Programming A. d Aspremont ORFE, Princeton University Joint work with O. Banerjee, L. El Ghaoui & G. Natsoulis, U.C. Berkeley & Iconix Pharmaceuticals Support
More informationA Multilevel Proximal Algorithm for Large Scale Composite Convex Optimization
A Multilevel Proximal Algorithm for Large Scale Composite Convex Optimization Panos Parpas Department of Computing Imperial College London www.doc.ic.ac.uk/ pp500 p.parpas@imperial.ac.uk jointly with D.V.
More informationA Primal-dual Three-operator Splitting Scheme
Noname manuscript No. (will be inserted by the editor) A Primal-dual Three-operator Splitting Scheme Ming Yan Received: date / Accepted: date Abstract In this paper, we propose a new primal-dual algorithm
More informationCourse Notes for EE227C (Spring 2018): Convex Optimization and Approximation
Course Notes for EE227C (Spring 2018): Convex Optimization and Approximation Instructor: Moritz Hardt Email: hardt+ee227c@berkeley.edu Graduate Instructor: Max Simchowitz Email: msimchow+ee227c@berkeley.edu
More informationNesterov s Acceleration
Nesterov s Acceleration Nesterov Accelerated Gradient min X f(x)+ (X) f -smooth. Set s 1 = 1 and = 1. Set y 0. Iterate by increasing t: g t 2 @f(y t ) s t+1 = 1+p 1+4s 2 t 2 y t = x t + s t 1 s t+1 (x
More informationminimize x subject to (x 2)(x 4) u,
Math 6366/6367: Optimization and Variational Methods Sample Preliminary Exam Questions 1. Suppose that f : [, L] R is a C 2 -function with f () on (, L) and that you have explicit formulae for
More informationSum-Power Iterative Watefilling Algorithm
Sum-Power Iterative Watefilling Algorithm Daniel P. Palomar Hong Kong University of Science and Technolgy (HKUST) ELEC547 - Convex Optimization Fall 2009-10, HKUST, Hong Kong November 11, 2009 Outline
More informationProximal methods. S. Villa. October 7, 2014
Proximal methods S. Villa October 7, 2014 1 Review of the basics Often machine learning problems require the solution of minimization problems. For instance, the ERM algorithm requires to solve a problem
More informationLecture 3: Lagrangian duality and algorithms for the Lagrangian dual problem
Lecture 3: Lagrangian duality and algorithms for the Lagrangian dual problem Michael Patriksson 0-0 The Relaxation Theorem 1 Problem: find f := infimum f(x), x subject to x S, (1a) (1b) where f : R n R
More informationarxiv: v4 [math.oc] 29 Jan 2018
Noname manuscript No. (will be inserted by the editor A new primal-dual algorithm for minimizing the sum of three functions with a linear operator Ming Yan arxiv:1611.09805v4 [math.oc] 29 Jan 2018 Received:
More informationUVA CS 4501: Machine Learning
UVA CS 4501: Machine Learning Lecture 16 Extra: Support Vector Machine Optimization with Dual Dr. Yanjun Qi University of Virginia Department of Computer Science Today Extra q Optimization of SVM ü SVM
More informationOn convergence rate of the Douglas-Rachford operator splitting method
On convergence rate of the Douglas-Rachford operator splitting method Bingsheng He and Xiaoming Yuan 2 Abstract. This note provides a simple proof on a O(/k) convergence rate for the Douglas- Rachford
More informationSmoothing Proximal Gradient Method. General Structured Sparse Regression
for General Structured Sparse Regression Xi Chen, Qihang Lin, Seyoung Kim, Jaime G. Carbonell, Eric P. Xing (Annals of Applied Statistics, 2012) Gatsby Unit, Tea Talk October 25, 2013 Outline Motivation:
More informationE5295/5B5749 Convex optimization with engineering applications. Lecture 5. Convex programming and semidefinite programming
E5295/5B5749 Convex optimization with engineering applications Lecture 5 Convex programming and semidefinite programming A. Forsgren, KTH 1 Lecture 5 Convex optimization 2006/2007 Convex quadratic program
More informationPrimal-dual algorithms for the sum of two and three functions 1
Primal-dual algorithms for the sum of two and three functions 1 Ming Yan Michigan State University, CMSE/Mathematics 1 This works is partially supported by NSF. optimization problems for primal-dual algorithms
More informationPrimal-dual coordinate descent A Coordinate Descent Primal-Dual Algorithm with Large Step Size and Possibly Non-Separable Functions
Primal-dual coordinate descent A Coordinate Descent Primal-Dual Algorithm with Large Step Size and Possibly Non-Separable Functions Olivier Fercoq and Pascal Bianchi Problem Minimize the convex function
More information