Coordinate Update Algorithm Short Course: Proximal Operators and Algorithms
1 Coordinate Update Algorithm Short Course: Proximal Operators and Algorithms
Instructor: Wotao Yin (UCLA Math), Summer 2016
2 Why proximal?
- Newton's method: for C²-smooth, unconstrained problems; handles only modest problem sizes
- Gradient method: for C¹-smooth, unconstrained problems; allows large problem sizes and parallel implementations
- Proximal method: for smooth and nonsmooth, constrained and unconstrained problems; exploits problem structure; allows large problem sizes and parallel implementations
3 Newton's algorithm uses a low-level (explicit) operation:
  x^{k+1} ← x^k − λ H(x^k)^{-1} ∇f(x^k), where H(x^k) = ∇²f(x^k)
The gradient algorithm uses a low-level (explicit) operation:
  x^{k+1} ← x^k − λ ∇f(x^k)
The proximal-point algorithm uses a high-level (implicit) operation:
  x^{k+1} ← prox_{λf}(x^k)
and well-known algorithms are its special cases.
The proximal operator prox_{λf} is itself an optimization problem, either standalone or used as a subproblem; it is simple for structured f, of which there are many.
4 Notation and Assumptions
- f : R^n → R ∪ {∞} is a closed, proper, convex function (why? to ensure prox_{λf} is well-defined and unique)
  - proper refers to dom f ≠ ∅
  - closed refers to epi f being a closed set
- R^n can be extended to a general Hilbert space H
- f can take the value ∞; this saves writing "x ∈ dom f" in many settings
- an operator maps R^n to R^n; it is also called a mapping or a map
- indicator function of a set C: ι_C(x) := 0 if x ∈ C, and ∞ otherwise
- argmin_x f(x) is the set of minimizers of f; if unique, it is the minimizer
5 Definition
The proximal operator prox_f : R^n → R^n of a function f is defined by
  prox_f(v) = argmin_{x ∈ R^n} { f(x) + (1/2) ‖x − v‖² }.
The (scaled) proximal operator prox_{λf} : R^n → R^n is defined by
  prox_{λf}(v) = argmin_{x ∈ R^n} { f(x) + (1/(2λ)) ‖x − v‖² }.
These are strongly convex problems with unique minimizers, so "=" is used.
Moreau envelope:
  f_λ(v) := min_{x ∈ R^n} { f(x) + (1/(2λ)) ‖x − v‖² }
is a differentiable function, with ∇f_λ(v) = (1/λ)(v − x*), where x* = prox_{λf}(v).
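The following is a minimal numerical sketch of the definition above, not part of the slides: it evaluates prox_{λf}(v) by directly minimizing f(x) + (1/(2λ))‖x − v‖² with SciPy. The helper name prox_numeric and the test function are illustrative.

import numpy as np
from scipy.optimize import minimize

def prox_numeric(f, v, lam=1.0):
    # Approximate prox_{lam*f}(v) for a smooth convex f by generic minimization.
    obj = lambda x: f(x) + np.sum((x - v) ** 2) / (2.0 * lam)
    return minimize(obj, x0=np.asarray(v, dtype=float)).x

# Example: f(x) = 0.5*||x||^2 has the closed form prox_{lam*f}(v) = v / (1 + lam).
v = np.array([3.0, -1.0])
print(prox_numeric(lambda x: 0.5 * np.sum(x ** 2), v, lam=2.0))  # approx [1.0, -0.333]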
6 Proximal of an indicator function is a projection
Consider a nonempty closed convex set C:
  prox_{ι_C}(x) = argmin_y { ι_C(y) + (1/2) ‖y − x‖² }
               = argmin_{y ∈ C} (1/2) ‖y − x‖²
               =: proj_C(x)
7 1D illustration (figure): f(x) + (1/(2λ)) (x − v)² lies above f(x); unless v ∈ argmin f, prox_{λf}(v) moves away from v and f(prox_{λf}(v)) < f(v).
8 Proximal parameter
Tuning λ in prox_{λf}(v) = argmin_{x ∈ R^n} { f(x) + (1/(2λ)) ‖x − v‖² }:
- as λ → ∞: prox_{λf}(v) → proj_{argmin f}(v)
- as λ → 0: prox_{λf}(v) → proj_{dom f}(v), where proj_{dom f}(v) = argmin_{x ∈ R^n} { (1/2) ‖v − x‖² : f(x) is finite }
- (prox_{λf}(v) − v) is generally nonlinear in λ, so λ is a nonlinear step size
9 prox_{λf} is a soft projection
- The path {prox_{λf}(v) : λ > 0} lies in dom f
- prox_{λf}(v) is between proj_{dom f}(v) and proj_{argmin f}(v)
- The paths generated by different v may overlap or join
- If v ∈ argmin f, then prox_{λf}(v) = v, i.e., v is a fixed point
10 Examples
11 Examples: linear function
Let a ∈ R^n, b ∈ R, and f(x) := aᵀx + b. The proximal of the linear function,
  prox_{λf}(v) = argmin_{x ∈ R^n} { aᵀx + b + (1/(2λ)) ‖x − v‖² },
has the first-order optimality condition
  a + (1/λ)(prox_{λf}(v) − v) = 0  ⟹  prox_{λf}(v) = v − λa.
Application: proximal of the linear approximation of f. Let f⁽¹⁾(x) = f(x₀) + ⟨∇f(x₀), x − x₀⟩. Then
  prox_{λf⁽¹⁾}(x₀) = x₀ − λ∇f(x₀),
a gradient step with step size λ.
12 Examples: quadratic function
Let A be a symmetric positive semi-definite matrix, b ∈ R^n, and f(x) := (1/2)xᵀAx − bᵀx + c. The proximal of the quadratic function,
  prox_{λf}(v) = argmin_{x ∈ R^n} { f(x) + (1/(2λ)) ‖x − v‖² },
has the first-order optimality condition (with x* := prox_{λf}(v)):
  (Ax* − b) + (1/λ)(x* − v) = 0
  ⟹ x* = (λA + I)^{-1}(λb + v)
       = (λA + I)^{-1}(λb + λAv + v − λAv)
       = v + (A + (1/λ)I)^{-1}(b − Av).
It recovers an iterative refinement method for Ax = b.
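A small sketch, not from the slides (NumPy assumed, names illustrative), of the closed form above: the prox is one linear solve, and the two equivalent expressions can be checked against each other.

import numpy as np

def prox_quadratic(A, b, v, lam):
    # prox_{lam*f}(v) for f(x) = 0.5*x^T A x - b^T x + c: one linear solve.
    n = A.shape[0]
    return np.linalg.solve(lam * A + np.eye(n), lam * b + v)

# Check the two equivalent expressions against each other on random data.
rng = np.random.default_rng(0)
M = rng.standard_normal((4, 4))
A = M @ M.T                      # symmetric positive semi-definite
b, v, lam = rng.standard_normal(4), rng.standard_normal(4), 0.5
x1 = prox_quadratic(A, b, v, lam)
x2 = v + np.linalg.solve(A + np.eye(4) / lam, b - A @ v)
assert np.allclose(x1, x2)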
13 Application: proximal of the quadratic approximation of f
Let
  f⁽²⁾(x) = f(x₀) + ⟨∇f(x₀), x − x₀⟩ + (1/2)(x − x₀)ᵀ ∇²f(x₀) (x − x₀) =: (1/2)xᵀAx − bᵀx + c,
where A = ∇²f(x₀) and b = (∇²f(x₀))ᵀ x₀ − ∇f(x₀). By letting v = x₀, we get
  prox_{λf⁽²⁾}(x₀) = x₀ − (∇²f(x₀) + (1/λ)I)^{-1} ∇f(x₀),
which recovers the modified-Hessian Newton or Levenberg–Marquardt method.
14 Examples: ℓ₁-norm
Let f(x) = ‖x‖₁. The proximal of the ℓ₁-norm is
  prox_{λf}(v) = argmin_{x ∈ R^n} { ‖x‖₁ + (1/(2λ)) ‖x − v‖² }.
The subgradient optimality condition (with v* := prox_{λf}(v)):
  0 ∈ ∂f(v*) + (1/λ)(v* − v)  ⟺  v − v* ∈ λ ∂f(v*).
Recall that ∂f(x) = ∂|x₁| × ⋯ × ∂|x_n| is separable across components. Hence the condition reduces to the component-wise subproblems
  v_i − v*_i ∈ λ ∂|v*_i|.
15 Proximal of ℓ₁-norm (cont.)
Three cases:
- v*_i > 0: then v_i − v*_i = λ, so v*_i = v_i − λ
- v*_i < 0: then v_i − v*_i = −λ, so v*_i = v_i + λ
- v*_i = 0: then v_i − v*_i = v_i ∈ [−λ, λ]
Rewriting these conditions in terms of v:
- if v_i > λ, then v*_i = v_i − λ
- if v_i < −λ, then v*_i = v_i + λ
- if v_i ∈ [−λ, λ], then v*_i = 0
So prox_{λf} is the shrinkage (element-wise soft-thresholding) operator
  shrink(v, λ)_i = max(|v_i| − λ, 0) · v_i / |v_i|.
In Matlab: max(abs(v)-lambda,0).*sign(v)
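A NumPy equivalent of the Matlab one-liner above, as an illustrative sketch (the function name shrink follows the slide's notation):

import numpy as np

def shrink(v, lam):
    # Element-wise soft-thresholding: prox of lam*||.||_1.
    return np.maximum(np.abs(v) - lam, 0.0) * np.sign(v)

print(shrink(np.array([3.0, -0.2, 1.0, -4.0]), 1.0))  # -> [2., -0., 0., -3.]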
16 Examples: proximal of the ℓ₂-norm
Let f(x) = ‖x‖₂. Then
  prox_{λf}(x) = max(‖x‖₂ − λ, 0) · x/‖x‖₂ = x − proj_{B₂(0,λ)}(x),
with the special convention 0/0 = 0 if x = 0.
General pattern: proximal of the ℓ_p-norm. Suppose p⁻¹ + q⁻¹ = 1 with p, q ∈ [1, ∞]. Then, for f = ‖·‖_p,
  prox_{λf}(x) = x − proj_{B_q(0,λ)}(x).
Useful for getting the proximals of the ℓ_∞-norm and the ℓ_{2,1}-norm.
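Below is a small illustrative sketch, not from the slides (NumPy assumed; the helper name prox_l2 is mine), of the ℓ₂-norm prox, also known as block soft-thresholding:

import numpy as np

def prox_l2(x, lam):
    # prox of lam*||.||_2: shrink the norm, keep the direction (0/0 = 0 convention).
    nrm = np.linalg.norm(x)
    if nrm == 0.0:
        return np.zeros_like(x)
    return max(nrm - lam, 0.0) * x / nrm

print(prox_l2(np.array([3.0, 4.0]), 2.0))  # norm 5 shrinks to 3: [1.8, 2.4]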
17 Examples: unitary-invariant matrix norms
Unitary-invariant matrix norms are vector norms applied to the singular values:
- Frobenius norm: ℓ₂ of the singular values
- nuclear norm: ℓ₁ of the singular values
- ℓ₂-operator (spectral) norm: ℓ_∞ of the singular values
Note: the spectral norm equals the square root of the largest eigenvalue of AᵀA; for asymmetric matrices this may not equal the largest absolute eigenvalue of A.
Notation: let ‖·‖ denote a unitary-invariant matrix norm and ‖·‖_sv the corresponding vector norm (applied to the singular values).
18 Proximals of unitary-invariant matrix norms
Computation steps for X* = prox_{λ‖·‖}(A) := argmin_X { ‖X‖ + (1/(2λ)) ‖X − A‖²_F }:
1. SVD: A = U diag(σ) Vᵀ
2. vector proximal: σ* ← argmin_s { ‖s‖_sv + (1/(2λ)) ‖s − σ‖² }
3. return: X* ← U diag(σ*) Vᵀ
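For the nuclear norm, step 2 is the ℓ₁ shrink of the singular values (singular-value thresholding). The following sketch, not from the slides (NumPy assumed, names illustrative), spells out the three steps:

import numpy as np

def prox_nuclear(A, lam):
    U, s, Vt = np.linalg.svd(A, full_matrices=False)  # step 1: SVD
    s_thr = np.maximum(s - lam, 0.0)                  # step 2: l1 prox on the singular values
    return U @ np.diag(s_thr) @ Vt                    # step 3: rebuild

A = np.arange(12, dtype=float).reshape(3, 4)
X = prox_nuclear(A, 2.0)
print(np.linalg.svd(X, compute_uv=False))  # singular values of A reduced by 2, floored at 0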
19 Proximable functions
Definition: a function f : R^n → R is proximable if prox_{γf} can be computed in O(n) or O(n polylog(n)) time.
Examples:
- norms: ℓ₁, ℓ₂, ℓ_{2,1}, ℓ_∞, ...
- separable functions/constraints: ‖x‖₀, l ≤ x ≤ u
- standard simplex: {x ∈ R^n : 1ᵀx = 1, x ≥ 0}
- ...
In general, f and g both proximable does not imply that f + g is proximable, but there are exceptions. If f + g is proximable, we can simplify operator splitting.
20 f + g proximable functions
Let ∘ denote operator composition: for example, (prox_f ∘ prox_g)(x) := prox_f(prox_g(x)).
Rule 1: if f : R → R is convex and f′(0) = 0, then the scalar function f + |·| is proximable:
  prox_{f+|·|} = prox_f ∘ prox_{|·|}.
Application: the elastic net regularizer (1/2)‖x‖² + α‖x‖₁.
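A small sketch, not from the slides, illustrating the composition on the elastic net applied component-wise: soft-threshold first, then apply the prox of the quadratic term. The helper names are mine, and the closed form is also checked by brute force on a 1D grid.

import numpy as np

def shrink(v, t):
    return np.maximum(np.abs(v) - t, 0.0) * np.sign(v)

def prox_elastic_net(v, lam, alpha):
    # prox of lam*(0.5*x^2 + alpha*|x|): soft-threshold, then scale by 1/(1+lam).
    return shrink(v, lam * alpha) / (1.0 + lam)

# Brute-force check on a 1D grid.
v, lam, alpha = 1.7, 0.5, 1.0
grid = np.linspace(-5.0, 5.0, 200001)
obj = 0.5 * grid**2 + alpha * np.abs(grid) + (grid - v) ** 2 / (2.0 * lam)
assert abs(grid[np.argmin(obj)] - prox_elastic_net(v, lam, alpha)) < 1e-3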
21 Rule 2
g is a 1-homogeneous function if g(αx) = αg(x) for all α ≥ 0. Examples: ℓ₁, ℓ_∞, ι_{x ≥ 0}, ι_{x ≤ 0}.
If g is a 1-homogeneous function, then ‖·‖₂ + g is proximable:
  prox_{‖·‖₂ + g} = prox_{‖·‖₂} ∘ prox_g.
22 Rule 3
The 1D discrete total variation is TV(x) := Σ_{i=1}^{n−1} |x_{i+1} − x_i|.
f is component prox-monotonic if, for all x ∈ R^n and i, j ∈ {1,...,n},
  x_i < x_j ⟹ (prox_f(x))_i ≤ (prox_f(x))_j,  and  x_i = x_j ⟹ (prox_f(x))_i = (prox_f(x))_j.
Examples: ℓ₁, ℓ₂, ℓ_∞, ι_{x ≥ l}, ι_{x ≤ u}, ι_{[l,u]}.
If f is component prox-monotonic, then f + TV is proximable:
  prox_{f+TV} = prox_f ∘ prox_{TV}.
Application: the fused LASSO regularizer α‖x‖₁ + TV(x).
23 Properties
24 Separable sum
Proposition: for a separable function f(x, y) = φ(x) + ψ(y),
  prox_{λf}(v, w) = (prox_{λφ}(v), prox_{λψ}(w)).
We have observed this with the proximal of ‖x‖₁ := Σ_{i=1}^n |x_i|. It can be used to derive the proximal of ‖x‖_{2,1} := Σ_{i=1}^p ‖x_{(i)}‖₂ (as in non-overlapping group LASSO).
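A sketch, not from the slides (NumPy assumed; the group encoding is illustrative), of the ℓ_{2,1} prox obtained this way by applying the ℓ₂-norm prox group by group:

import numpy as np

def prox_group_l21(x, groups, lam):
    # prox of lam*||x||_{2,1}: apply the l2-norm prox to each (non-overlapping) group.
    out = np.array(x, dtype=float)
    for idx in groups:
        nrm = np.linalg.norm(out[idx])
        scale = 0.0 if nrm == 0.0 else max(nrm - lam, 0.0) / nrm
        out[idx] = scale * out[idx]
    return out

x = np.array([3.0, 4.0, 0.1, -0.1])
print(prox_group_l21(x, groups=[[0, 1], [2, 3]], lam=1.0))
# the first group shrinks toward 0; the second (small-norm) group is set exactly to 0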
25 Proximal fixed point
Theorem (fixed point = minimizer). Let λ > 0. A point x* ∈ R^n is a minimizer of f if, and only if, prox_{λf}(x*) = x*.
Proof.
(⟹) Let x* ∈ argmin f(x). Then for any x ∈ R^n,
  f(x) + (1/(2λ)) ‖x − x*‖² ≥ f(x*) + (1/(2λ)) ‖x* − x*‖².
Thus x* = argmin_x { f(x) + (1/(2λ)) ‖x − x*‖² }, so x* = prox_{λf}(x*).
(⟸) Let x* = prox_{λf}(x*). Then, by the subgradient optimality condition,
  0 ∈ ∂f(x*) + (1/λ)(x* − x*) = ∂f(x*).
Thus 0 ∈ ∂f(x*), and x* ∈ argmin f(x).
26 Proximal operator and resolvent
Definition: for a monotone operator T, (I + λT)^{-1} is the (well-defined) resolvent of T.
Proposition: prox_{λf} = (I + λ∂f)^{-1}.
Informal proof.
  x ∈ (I + λ∂f)^{-1}(v)
  ⟺ v ∈ (I + λ∂f)(x)
  ⟺ v ∈ x + λ∂f(x)
  ⟺ 0 ∈ x − v + λ∂f(x)
  ⟺ 0 ∈ (1/λ)(x − v) + ∂f(x)
  ⟺ x = argmin_{x ∈ R^n} { f(x) + (1/(2λ)) ‖x − v‖² }
27 Proximal-Point Algorithm
28 Proximal-point algorithm (PPA)
Iteration: x^{k+1} ← prox_{λf}(x^k)
- seldom used directly to minimize f, because evaluating prox_{λf} can be as difficult as the original problem
- recovers the method of multipliers, i.e., the augmented Lagrangian method (later lecture)
- has iterate convergence properties (next slides)
- λ can be relaxed to take values in an interval
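A sketch, not from the slides, of the PPA iteration on a small quadratic, reusing the closed-form quadratic prox from slide 12; the data and names are illustrative.

import numpy as np

def ppa_quadratic(A, b, x0, lam=1.0, iters=50):
    # x^{k+1} = prox_{lam*f}(x^k) for f(x) = 0.5*x^T A x - b^T x.
    x = np.array(x0, dtype=float)
    n = A.shape[0]
    for _ in range(iters):
        x = np.linalg.solve(lam * A + np.eye(n), lam * b + x)
    return x

A = np.array([[2.0, 0.0], [0.0, 10.0]])
b = np.array([2.0, 10.0])
print(ppa_quadratic(A, b, x0=np.zeros(2)))  # approaches the minimizer A^{-1} b = [1, 1]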
29 Proximal is firmly nonexpansive
Definition: a map T is nonexpansive if ‖T(x) − T(y)‖ ≤ ‖x − y‖ for all x, y.
Definition: a map T is firmly nonexpansive if
  ‖T(x) − T(y)‖² ≤ ‖x − y‖² − ‖(x − T(x)) − (y − T(y))‖²  for all x, y.
A key property in the development and analysis of first-order algorithms!
30 Proposition: for proper closed convex f and λ > 0, prox_{λf} is firmly nonexpansive.
Proof. Take arbitrary x, y, and let x* := prox_{λf}(x) and y* := prox_{λf}(y). By the subgradient optimality conditions,
  (x − x*) ∈ λ∂f(x*)  and  (y − y*) ∈ λ∂f(y*).
Since ∂f is monotone, i.e., ⟨p − q, x* − y*⟩ ≥ 0 for any p ∈ ∂f(x*) and q ∈ ∂f(y*), we have
  ⟨(x − x*) − (y − y*), x* − y*⟩ ≥ 0,
which is equivalent to
  ‖x* − y*‖² ≤ ‖x − y‖² − ‖(x − x*) − (y − y*)‖².
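As a quick sanity check, a sketch not from the slides: the firm-nonexpansiveness inequality can be verified numerically for the ℓ₁ prox (soft-thresholding) on random pairs of points.

import numpy as np

def shrink(v, lam):
    return np.maximum(np.abs(v) - lam, 0.0) * np.sign(v)

rng = np.random.default_rng(1)
lam = 0.7
for _ in range(1000):
    x, y = rng.standard_normal(5), rng.standard_normal(5)
    Tx, Ty = shrink(x, lam), shrink(y, lam)
    lhs = np.sum((Tx - Ty) ** 2)
    rhs = np.sum((x - y) ** 2) - np.sum(((x - Tx) - (y - Ty)) ** 2)
    assert lhs <= rhs + 1e-12  # firm nonexpansiveness holds for every pair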
31 PPA convergence properties (without proofs)
Assume convex f and that a minimizer x* exists. Since prox_{λf} is firmly nonexpansive:
- x^k → x* (weakly, in infinite-dimensional H)
- the above still holds subject to summable errors in computing prox_{λf}(x^k)
- fixed-point residual rate: ‖prox_{λf}(x^k) − x^k‖² = o(1/k²)
- objective rate: f(x^k) − f(x*) = o(1/k)
32 Assume f is strongly convex: there exists C > 0 such that
  ⟨p − q, x − y⟩ ≥ C ‖x − y‖²  for all x, y and p ∈ ∂f(x), q ∈ ∂f(y).
Then prox_{λf} is a contraction:
  ‖prox_{λf}(x) − prox_{λf}(y)‖² ≤ (1/(1 + 2λC)) ‖x − y‖²,
and thus (assuming x* exists)
  ‖x^{k+1} − x*‖² ≤ (1/(1 + 2λC)) ‖x^k − x*‖² ≤ (1/(1 + 2λC))^{k+1} ‖x^0 − x*‖².
Therefore, x^k → x* linearly.
33 PPA interpretations
(Destination) subgradient-descent interpretation:
  x^{k+1} = prox_{λf}(x^k)
  ⟺ x^{k+1} = (I + λ∂f)^{-1}(x^k)
  ⟺ x^k ∈ (I + λ∂f)(x^{k+1})
  ⟺ x^k ∈ x^{k+1} + λ∂f(x^{k+1})
  ⟺ x^{k+1} = x^k − λ g^{k+1}, where g^{k+1} ∈ ∂f(x^{k+1}).
Interpretation: descent using the negative destination subgradient. Compare: a subgradient at the origin point is not necessarily a descent direction.
34 Dual interpretation
Let y^{k+1} = g^{k+1} ∈ ∂f(x^{k+1}). Substituting the formula for x^{k+1}, we get
  y^{k+1} ∈ ∂f(x^k − λ y^{k+1}).
Computing prox_{λf}(x^k) is equivalent to solving for a subgradient at the descent destination. Related to the Moreau decomposition (in a later lecture).
35 Approximate-gradient interpretation
Assume that f is twice differentiable. Then, as λ → 0,
  prox_{λf}(x) = (I + λ∇f)^{-1}(x) = x − λ∇f(x) + o(λ).
36 Disappearing Tikhonov-regularization interpretation
  x^{k+1} ← prox_{λf}(x^k) = argmin_{x ∈ R^n} { f(x) + (1/(2λ)) ‖x − x^k‖²₂ }.
The second term is a regularization: x^{k+1} should be close to x^k. The regularization goes away as x^k converges.
37 Bregman iterative regularization
  x^{k+1} ← argmin_x { (1/λ) D_r^p(x; x^k) + f(x) },
given a proper closed convex function r and a subgradient p(x^k) ∈ ∂r(x^k), where
  D_r^p(x; x^k) := r(x) − r(x^k) − ⟨p, x − x^k⟩.
PPA is the special case corresponding to setting r = (1/2)‖·‖²₂.
38 Summary
- The proximal operator is easy to understand
- It is a standard tool for nonsmooth/constrained optimization
- It gives a fixed-point optimality condition
- PPA is more stable than gradient descent
- It sits at a high level of abstraction
- It is in closed form for many functions
39 Not covered
- Usage in operator splitting
- Proximals of dual functions, computed by minimizing the augmented Lagrangian
- Proximals of nonconvex functions