Coordinate Update Algorithm Short Course Proximal Operators and Algorithms


1 Coordinate Update Algorithm Short Course: Proximal Operators and Algorithms
Instructor: Wotao Yin (UCLA Math), Summer 2016

2 Why proximal?
Newton's method: for C²-smooth, unconstrained problems; suited to problems of modest size.
Gradient method: for C¹-smooth, unconstrained problems; scales to large problems and parallel implementations.
Proximal methods: for smooth and nonsmooth, constrained and unconstrained problems; exploit problem structure; scale to large problems and parallel implementations.

3 Newton's algorithm uses a low-level (explicit) operation:
    x^{k+1} ← x^k − λ H⁻¹(x^k) ∇f(x^k)
The gradient algorithm uses a low-level (explicit) operation:
    x^{k+1} ← x^k − λ ∇f(x^k)
The proximal-point algorithm uses a high-level (implicit) operation:
    x^{k+1} ← prox_{λf}(x^k)
Many well-known algorithms are special cases. The proximal operator prox_{λf} is itself an optimization problem, used either standalone or as a subproblem. It is simple to evaluate for structured f, and there are many such functions.

4 Notation and Assumptions
f : R^n → R ∪ {∞} is a closed, proper, convex function (why? to ensure prox_{λf} is well defined and unique)
    proper means dom f ≠ ∅ (f is not identically ∞)
    closed means epi f is a closed set
R^n can be extended to a general Hilbert space H.
f can take the value ∞; this saves writing "x ∈ dom f" in many settings.
An operator maps R^n to R^n; it is also called a mapping or a map.
Indicator function of a set C: ι_C(x) := 0 if x ∈ C, and ∞ otherwise.
argmin_x f(x) is the set of minimizers of f; if the minimizer is unique, it denotes the minimizer.

5 Definition
Definition. The proximal operator prox_f : R^n → R^n of a function f is defined by
    prox_f(v) = argmin_{x ∈ R^n} ( f(x) + (1/2) ‖x − v‖² ).
The (scaled) proximal operator prox_{λf} : R^n → R^n is defined by
    prox_{λf}(v) = argmin_{x ∈ R^n} ( f(x) + (1/(2λ)) ‖x − v‖² ).
These are strongly convex problems with unique minimizers, so "=" (rather than "∈") is used.
The Moreau envelope
    f_λ(v) := min_{x ∈ R^n} ( f(x) + (1/(2λ)) ‖x − v‖² )
is a differentiable function, with ∇f_λ(v) = (1/λ)(v − x*), where x* = prox_{λf}(v).

6 Proximal of indicator function is projection
Consider a nonempty closed convex set C:
    prox_{ι_C}(x) = argmin_y ( ι_C(y) + (1/2) ‖y − x‖² )
                  = argmin_{y ∈ C} (1/2) ‖y − x‖²
                  =: proj_C(x)
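To make the identity concrete, here is a minimal MATLAB sketch for the box C = {x : l ≤ x ≤ u}, whose projection is a componentwise clamp; the bounds and test point below are made up for illustration.

    % prox of the indicator of C = {x : l <= x <= u} is the projection onto C
    l = -1; u = 2;                 % box bounds (illustrative values)
    v = [-3; 0.5; 5];
    projC = min(max(v, l), u);     % proj_C(v) = prox_{iota_C}(v), componentwise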

7 1D illustration
f(x) + (1/2)‖x − v‖² lies above f(x); unless v ∈ argmin f, prox_{λf}(v) moves away from v, and f(prox_{λf}(v)) < f(v).

8 Proximal parameter
Tuning λ in prox_{λf}(v) = argmin_{x ∈ R^n} ( f(x) + (1/(2λ)) ‖x − v‖² ):
    as λ → ∞: prox_{λf}(v) → proj_{argmin f}(v)
    as λ → 0: prox_{λf}(v) → proj_{dom f}(v), where
        proj_{dom f}(v) = argmin_{x ∈ R^n} { (1/2) ‖v − x‖² : f(x) is finite }
(prox_{λf}(v) − v) is generally nonlinear in λ, so λ acts as a nonlinear step size.
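As a quick numerical illustration of the two limits, here is a MATLAB sketch using the soft-thresholding formula for the prox of λ|·| on R (derived on a later slide):

    shrink = @(v, t) max(abs(v) - t, 0).*sign(v);   % prox of t*|.| on R
    v = 3;
    shrink(v, 1e-6)   % ~ 3: as lambda -> 0, prox tends to proj onto dom f = R, i.e. v itself
    shrink(v, 1e+6)   % = 0: as lambda -> infinity, prox tends to proj onto argmin |.| = {0}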

9 prox_{λf} is a soft projection
The path {prox_{λf}(v) : λ > 0} lies in dom f.
prox_{λf}(v) lies between proj_{dom f}(v) and proj_{argmin f}(v).
The paths generated by different v may overlap or join.
If v ∈ argmin f, then prox_{λf}(v) = v is a fixed point.

10 Examples

11 Examples
Linear function: let a ∈ R^n, b ∈ R, and f(x) := aᵀx + b.
Proximal of the linear function:
    prox_{λf}(v) := argmin_{x ∈ R^n} ( aᵀx + b + (1/(2λ)) ‖x − v‖² )
has the first-order optimality condition
    a + (1/λ)(prox_{λf}(v) − v) = 0   ⟹   prox_{λf}(v) = v − λa.
Application: proximal of the linear approximation of f. Let
    f⁽¹⁾(x) = f(x⁰) + ⟨∇f(x⁰), x − x⁰⟩.
Then prox_{λf⁽¹⁾}(x⁰) = x⁰ − λ∇f(x⁰), a gradient step with step size λ.

12 Examples
Quadratic function: let A be a symmetric positive semidefinite matrix, b ∈ R^n, and f(x) := (1/2)xᵀAx − bᵀx + c.
Proximal of the quadratic function:
    prox_{λf}(v) := argmin_{x ∈ R^n} ( f(x) + (1/(2λ)) ‖x − v‖² )
has the first-order optimality condition (writing v* := prox_{λf}(v))
    (Av* − b) + (1/λ)(v* − v) = 0
    ⟺ v* = (λA + I)⁻¹(λb + v)
    ⟺ v* = (λA + I)⁻¹(λb + λAv + v − λAv)
    ⟺ v* = v + (A + (1/λ)I)⁻¹(b − Av).
This recovers an iterative refinement method for Ax = b.
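A minimal MATLAB sketch of both expressions above on random data, checking that they agree (the problem size and parameter are arbitrary):

    n = 5; lambda = 0.7;
    R = randn(n); A = R'*R;                       % symmetric positive semidefinite
    b = randn(n,1); v = randn(n,1);
    x1 = (lambda*A + eye(n)) \ (lambda*b + v);    % prox_{lambda f}(v), direct form
    x2 = v + (A + eye(n)/lambda) \ (b - A*v);     % refinement form; norm(x1 - x2) ~ 0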

13 Application: proximal of the quadratic approximation of f. Let
    f⁽²⁾(x) = f(x⁰) + ⟨∇f(x⁰), x − x⁰⟩ + (1/2)(x − x⁰)ᵀ ∇²f(x⁰) (x − x⁰) =: (1/2)xᵀAx − bᵀx + c,
where
    A = ∇²f(x⁰),   b = (∇²f(x⁰))ᵀ x⁰ − ∇f(x⁰) = ∇²f(x⁰) x⁰ − ∇f(x⁰).
By letting v = x⁰, we get
    prox_{λf⁽²⁾}(x⁰) = x⁰ − (∇²f(x⁰) + (1/λ)I)⁻¹ ∇f(x⁰),
which recovers the modified-Hessian Newton or Levenberg–Marquardt method.

14 Examples
ℓ₁-norm: let f(x) = ‖x‖₁. Proximal of the ℓ₁-norm:
    prox_{λf}(v) := argmin_{x ∈ R^n} ( ‖x‖₁ + (1/(2λ)) ‖x − v‖² ).
Writing v* := prox_{λf}(v), the subgradient optimality condition is
    0 ∈ ∂‖v*‖₁ + (1/λ)(v* − v)   ⟺   v − v* ∈ λ ∂‖v*‖₁.
Recall ‖x‖₁ = |x₁| + ... + |xₙ|. Hence, the problem reduces to component-wise subproblems: vᵢ − v*ᵢ ∈ λ ∂|v*ᵢ| for each i.

15 Proximal of ℓ₁-norm (cont.). Three cases:
    v*ᵢ > 0: then vᵢ − v*ᵢ = λ, so v*ᵢ = vᵢ − λ
    v*ᵢ < 0: then vᵢ − v*ᵢ = −λ, so v*ᵢ = vᵢ + λ
    v*ᵢ = 0: then vᵢ = vᵢ − v*ᵢ ∈ [−λ, λ].
Rewriting these conditions in terms of v:
    if vᵢ > λ, then v*ᵢ = vᵢ − λ
    if vᵢ < −λ, then v*ᵢ = vᵢ + λ
    if vᵢ ∈ [−λ, λ], then v*ᵢ = 0.
So prox_{λf} is the shrinkage (element-wise soft-thresholding) operator
    shrink(v, λ)ᵢ = max(|vᵢ| − λ, 0) · vᵢ/|vᵢ|
with the convention 0/0 = 0. In Matlab: max(abs(v)-lambda,0).*sign(v)
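A small MATLAB sketch that applies the shrinkage operator and verifies the component-wise optimality condition v − v* ∈ λ∂‖v*‖₁ (the test vector is chosen arbitrarily):

    shrink = @(v, t) max(abs(v) - t, 0).*sign(v);
    lambda = 1; v = [-2; -0.3; 0; 0.9; 4];
    x = shrink(v, lambda);
    % nonzero components: v_i - x_i must equal lambda*sign(x_i);
    % zero components:    |v_i| must be at most lambda
    ok = all( (x ~= 0 & abs(v - x - lambda*sign(x)) < 1e-12) | (x == 0 & abs(v) <= lambda) )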

16 Examples
Proximal of the ℓ₂-norm: let f(x) = ‖x‖₂. Then
    prox_{λf}(x) = max(‖x‖₂ − λ, 0) · x/‖x‖₂ = x − proj_{B₂(0,λ)}(x),
with the special convention 0/0 = 0 if x = 0.
General pattern — proximal of the ℓ_p-norm: suppose p⁻¹ + q⁻¹ = 1 with p, q ∈ [1, ∞]. Then
    prox_{λ‖·‖_p}(x) = x − proj_{B_q(0,λ)}(x).
Useful for getting the proximals of the ℓ_∞-norm and the ℓ_{2,1}-norm.
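A minimal MATLAB sketch of the ℓ₂-norm proximal, written both as magnitude shrinkage and as the residual of the projection onto the ball B₂(0, λ):

    v = randn(4,1); lambda = 0.5;
    nv = norm(v);
    x  = max(nv - lambda, 0) / max(nv, eps) * v;   % shrink the magnitude, keep the direction
    x2 = v - min(1, lambda/max(nv, eps)) * v;      % v - proj_{B2(0,lambda)}(v); norm(x - x2) ~ 0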

17 Examples
Unitary-invariant matrix norms are vector norms applied to the singular values:
    Frobenius norm: ℓ₂ of the singular values
    nuclear norm: ℓ₁ of the singular values
    ℓ₂-operator (spectral) norm: ℓ_∞ of the singular values
Note: the spectral norm is the square root of the largest eigenvalue of AᵀA, i.e. the largest singular value; for non-symmetric matrices this may differ from the spectral radius (the largest |eigenvalue| of A).
Notation: let ‖·‖ be a unitary-invariant matrix norm and let ‖·‖ᵥ be the corresponding vector norm on the singular values (so ‖A‖ = ‖σ(A)‖ᵥ).

18 Proximals of unitary-invariant matrix norms
Computation steps for X* = prox_{λ‖·‖}(A) := argmin_X ( ‖X‖ + (1/(2λ)) ‖X − A‖²_F ):
    1. SVD: A = U diag(σ) Vᵀ
    2. proximal on the singular values: σ* ← argmin_s ( ‖s‖ᵥ + (1/(2λ)) ‖s − σ‖² )
    3. return X* ← U diag(σ*) Vᵀ
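For example, with the nuclear norm (ℓ₁ of the singular values) step 2 is soft-thresholding; a minimal MATLAB sketch on a random matrix:

    A = randn(6,4); lambda = 0.8;
    [U, S, V] = svd(A, 'econ');            % step 1: SVD
    s = max(diag(S) - lambda, 0);          % step 2: prox of lambda*||.||_1 on the singular values
    X = U*diag(s)*V';                      % step 3: X = prox of lambda*||.||_* at A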

19 Proximable functions
Definition: a function f : R^n → R is proximable if prox_{γf} can be computed in O(n) or O(n polylog(n)) time.
Examples:
    norms: ℓ₁, ℓ₂, ℓ_{2,1}, ℓ_∞, ...
    separable functions/constraints: ‖x‖₀, l ≤ x ≤ u
    standard simplex: {x ∈ R^n : 1ᵀx = 1, x ≥ 0}
    ...
In general, f and g both proximable does not imply that f + g is proximable, but there are exceptions. If f + g is proximable, we can simplify operator splitting.
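For the standard simplex listed above, the projection (the prox of its indicator) can be computed in O(n log n) by the usual sort-and-threshold scheme. The MATLAB function below (to be saved as proj_simplex.m) is a sketch of that standard scheme, not taken from the slides:

    function x = proj_simplex(v)
    % Projection onto {x : x >= 0, sum(x) = 1} via sorting and thresholding.
    n   = numel(v);
    u   = sort(v(:), 'descend');
    css = cumsum(u);
    rho = find(u - (css - 1)./(1:n)' > 0, 1, 'last');   % last index with positive gap
    tau = (css(rho) - 1)/rho;                           % threshold
    x   = max(v - tau, 0);
    end

For instance, proj_simplex([0.2; 1.5; -0.4]) returns [0; 1; 0], a nonnegative vector summing to 1.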

20 f + g proximable functions
Let ∘ denote operator composition; for example, (prox_f ∘ prox_g)(x) := prox_f(prox_g(x)).
Rule 1: if f : R → R is convex and f′(0) = 0, then the scalar function f + |·| is proximable:
    prox_{f+|·|} = prox_f ∘ prox_{|·|}.
Application: the elastic net regularizer (1/2)‖x‖² + α‖x‖₁.
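A minimal MATLAB sketch of rule 1 on the scalar elastic-net term g(x) = (1/2)x² + α|x|, comparing the composed prox with the closed form obtained directly from the optimality condition (the numbers are arbitrary):

    alpha = 0.4; lambda = 2; v = -1.7;
    shrink = @(w, t) max(abs(w) - t, 0).*sign(w);
    x_comp   = shrink(v, lambda*alpha) / (1 + lambda);               % prox of (lambda/2)(.)^2 applied after shrink
    x_direct = sign(v)*max(abs(v) - lambda*alpha, 0) / (1 + lambda); % solves lambda*alpha*sign(x) + lambda*x + (x - v) = 0
    % x_comp and x_direct agree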

21 Rule 2: g is a 1-homogeneous function if g(αx) = αg(x) for all α ≥ 0. Examples: ℓ₁, ℓ_∞, ι_{x ≥ 0}, ι_{x ≤ 0}.
If g is a 1-homogeneous function, then (1/2)‖·‖² + g is proximable:
    prox_{(1/2)‖·‖² + g} = prox_{(1/2)‖·‖²} ∘ prox_g.

22 Rule 3: the 1D discrete total variation is
    TV(x) := Σ_{i=1}^{n−1} |x_{i+1} − x_i|.
f is component prox-monotonic if, for all x ∈ R^n and i, j ∈ {1, ..., n},
    x_i < x_j ⟹ (prox_f(x))_i ≤ (prox_f(x))_j,   and   x_i = x_j ⟹ (prox_f(x))_i = (prox_f(x))_j.
Examples: ℓ₁, ℓ₂, ℓ_∞, ι_{x ≥ l}, ι_{x ≤ u}, ι_{l ≤ x ≤ u}.
If f is component prox-monotonic, then f + TV is proximable:
    prox_{f+TV} = prox_f ∘ prox_TV.
Application: the fused LASSO regularizer α‖x‖₁ + TV(x).

23 Properties

24 Separable Sum Proximal
Proposition. For a separable function f(x, y) = φ(x) + ψ(y),
    prox_{λf}(v, w) = (prox_{λφ}(v), prox_{λψ}(w)).
We have already observed this with the proximal of ‖x‖₁ := Σ_{i=1}^n |x_i|.
It can be used to derive the proximal of ‖x‖_{2,1} := Σ_{i=1}^p ‖x_{(i)}‖₂ (as in non-overlapping group LASSO).
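A minimal MATLAB sketch of the separable-sum property for ‖X‖_{2,1}, with the non-overlapping groups stored as the columns of a matrix X (this group layout is an assumption made for illustration):

    proxl2 = @(v, t) max(norm(v) - t, 0)/max(norm(v), eps) * v;   % prox of t*||.||_2 (slide 16)
    X = randn(5, 4); lambda = 0.3;                                % 4 groups, each of size 5
    Y = zeros(size(X));
    for g = 1:size(X, 2)
        Y(:, g) = proxl2(X(:, g), lambda);   % separable sum: apply the group prox group-by-group
    end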

25 Proximal fixed point
Theorem (fixed point = minimizer). Let λ > 0. A point x* ∈ R^n is a minimizer of f if, and only if, prox_{λf}(x*) = x*.
Proof.
(⟹) Let x* ∈ argmin f(x). Then for any x ∈ R^n,
    f(x) + (1/(2λ)) ‖x − x*‖² ≥ f(x*) + (1/(2λ)) ‖x* − x*‖².
Thus x* = argmin_x { f(x) + (1/(2λ)) ‖x − x*‖² }, so x* = prox_{λf}(x*).
(⟸) Let x* = prox_{λf}(x*). Then by the subgradient optimality condition,
    0 ∈ ∂f(x*) + (1/λ)(x* − x*) = ∂f(x*).
Thus 0 ∈ ∂f(x*), and x* ∈ argmin f(x).

26 Proximal operator and resolvent
Definition. For a monotone operator T, (I + λT)⁻¹ is the (well-defined) resolvent of T.
Proposition. prox_{λf} = (I + λ∂f)⁻¹.
Informal proof.
    x ∈ (I + λ∂f)⁻¹(v)
    ⟺ v ∈ (I + λ∂f)(x)
    ⟺ v ∈ x + λ∂f(x)
    ⟺ 0 ∈ x − v + λ∂f(x)
    ⟺ 0 ∈ (1/λ)(x − v) + ∂f(x)
    ⟺ x = argmin_{x ∈ R^d} ( f(x) + (1/(2λ)) ‖x − v‖² ).

27 Proximal-Point Algorithm

28 Proximal-point algorithm (PPA)
Iteration: x^{k+1} ← prox_{λf}(x^k)
    seldom used directly to minimize f, because evaluating prox_{λf} can be as difficult as the original problem
    recovers the method of multipliers / augmented Lagrangian method (later lecture)
    has iterate convergence properties (next slides)
    λ can be relaxed to take values λ_k in an interval of (0, ∞)
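As a toy illustration (not how PPA is used in practice, for the reason noted above), here is a MATLAB sketch of PPA on the quadratic f(x) = (1/2)xᵀAx − bᵀx, whose prox is the linear solve from slide 12; the iterates approach x* = A\b:

    n = 20; lambda = 1;
    R = randn(n); A = R'*R + 0.1*eye(n);   % strongly convex quadratic
    b = randn(n,1);
    x = zeros(n,1);
    for k = 1:200
        x = (lambda*A + eye(n)) \ (lambda*b + x);   % x <- prox_{lambda f}(x)
    end
    % norm(x - A\b) is now tiny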

29 Proximal is firmly nonexpansive
Definition: a map T is nonexpansive if ‖T(x) − T(y)‖ ≤ ‖x − y‖ for all x, y.
Definition: a map T is firmly nonexpansive if
    ‖T(x) − T(y)‖² ≤ ‖x − y‖² − ‖(x − T(x)) − (y − T(y))‖²   for all x, y.
A key property in the development and analysis of first-order algorithms!

30 Proposition. For proper closed convex f and λ > 0, prox_{λf} is firmly nonexpansive.
Proof. Take arbitrary x, y, and let x* := prox_{λf}(x) and y* := prox_{λf}(y). By the subgradient optimality conditions,
    (x − x*) ∈ λ∂f(x*),   (y − y*) ∈ λ∂f(y*).
Since ∂f is monotone, i.e., ⟨p − q, x* − y*⟩ ≥ 0 for any p ∈ ∂f(x*) and q ∈ ∂f(y*), we have
    ⟨(x − x*) − (y − y*), x* − y*⟩ ≥ 0,
which is equivalent to
    ‖x* − y*‖² ≤ ‖x − y‖² − ‖(x − x*) − (y − y*)‖².
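A quick MATLAB sketch that checks the firm-nonexpansiveness inequality numerically for the prox of λ‖·‖₁ (soft-thresholding) at random points:

    shrink = @(v, t) max(abs(v) - t, 0).*sign(v);
    lambda = 0.7; x = randn(6,1); y = randn(6,1);
    Tx = shrink(x, lambda); Ty = shrink(y, lambda);
    lhs = norm(Tx - Ty)^2;
    rhs = norm(x - y)^2 - norm((x - Tx) - (y - Ty))^2;
    % lhs <= rhs holds (up to rounding), as the proposition asserts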

31 PPA convergence properties (without proofs)
Assume f is convex and a minimizer x* exists. Since prox_{λf} is firmly nonexpansive:
    x^k → x* (weak convergence in infinite-dimensional H)
    the above remains true subject to summable errors in computing prox_{λf}(x^k)
    fixed-point residual rate: ‖prox_{λf}(x^k) − x^k‖² = o(1/k²)
    objective rate: f(x^k) − f(x*) = o(1/k)

32 Assume f is strongly convex: there exists C > 0 such that
    ⟨p − q, x − y⟩ ≥ C ‖x − y‖²   for all x, y and all p ∈ ∂f(x), q ∈ ∂f(y).
Then prox_{λf} is a contraction:
    ‖prox_{λf}(x) − prox_{λf}(y)‖² ≤ (1/(1 + 2λC)) ‖x − y‖²,
and (assuming x* exists) thus
    ‖x^{k+1} − x*‖² ≤ (1/(1 + 2λC)) ‖x^k − x*‖²,
so ‖x^k − x*‖² ≤ (1/(1 + 2λC))^k ‖x⁰ − x*‖². Therefore x^k → x* linearly.

33 PPA Interpretations
(Destination) subgradient-descent interpretation:
    x^{k+1} = prox_{λf}(x^k)
    ⟺ x^{k+1} = (I + λ∂f)⁻¹(x^k)
    ⟺ x^k ∈ (I + λ∂f)(x^{k+1})
    ⟺ x^k ∈ x^{k+1} + λ∂f(x^{k+1})
    ⟺ x^{k+1} = x^k − λ y^{k+1},   where y^{k+1} ∈ ∂f(x^{k+1}).
Interpretation: descent along the negative subgradient at the destination x^{k+1}.
Compare: a subgradient at the origin x^k is not necessarily a descent direction.

34 Dual interpretation
Let y^{k+1} ∈ ∂f(x^{k+1}) be the destination subgradient from the previous slide. Substituting the formula for x^{k+1}, we get
    y^{k+1} ∈ ∂f(x^k − λ y^{k+1}).
Computing prox_{λf}(x^k) is equivalent to solving for a subgradient at the descent destination. Related to the Moreau decomposition (in a later lecture).

35 Approximate-gradient interpretation
Assume that f is twice differentiable. Then, as λ → 0,
    prox_{λf}(x) = (I + λ∇f)⁻¹(x) = x − λ∇f(x) + o(λ).

36 Disappearing-Tikhonov-regularization interpretation:
    x^{k+1} ← prox_{λf}(x^k) = argmin_{x ∈ R^n} ( f(x) + (1/(2λ)) ‖x − x^k‖²₂ ).
The second term is a regularization: x^{k+1} should stay close to x^k. The regularization effect goes away as x^k converges.

37 Bregman iterative regularization:
    x^{k+1} ← argmin_x ( (1/λ) D_r^p(x; x^k) + f(x) ),
where, given a proper closed convex function r and a subgradient p(x^k) ∈ ∂r(x^k),
    D_r^p(x; x^k) := r(x) − r(x^k) − ⟨p, x − x^k⟩.
PPA is the special case corresponding to setting r = (1/2)‖·‖²₂.

38 Summary
The proximal operator is easy to understand.
It is a standard tool for nonsmooth/constrained optimization.
It gives a fixed-point optimality condition.
PPA is more stable than gradient descent.
It sits at a high level of abstraction.
It is in closed form for many functions.

39 Not covered
    usage in operator splitting
    proximals of dual functions, computed by minimizing an augmented Lagrangian
    proximals of nonconvex functions
