Math 273a: Optimization Overview of First-Order Optimization Algorithms


1 Math 273a: Optimization
Overview of First-Order Optimization Algorithms
Wotao Yin, Department of Mathematics, UCLA
online discussions on piazza.com

2 Typical flow of numerical optimization
Optimization Problem → Standard forms (LS, LP, SOCP, ...) → Algorithm → Problem-specific improvements → Solution

3 Linear programming (LP)
General form:
    minimize_x   c^T x + c_0
    subject to   a_i^T x ≤ b_i,  i = 1, ..., m
                 ā_j^T x = b̄_j,  j = 1, ..., m̄
Every LP can be turned into the standard form
    minimize_x   c^T x
    subject to   Ax = b,  x ≥ 0.
no analytic formula for solutions
reliable algorithms and software packages (e.g., CPLEX, Gurobi)
computation time roughly proportional to n^2 m if m ≥ n; less with structured data
a mature technology (unless A is huge)

4 Example: basis pursuit
Goal: recover a sparse solution in [l, u] to Ax = b
Model: ℓ_1-minimization problem
    minimize_x   ‖x‖_1
    subject to   Ax = b,  l ≤ x ≤ u

5 [Figure: original signal vs. recovered signal]

6 Original model:
    minimize_x   ‖x‖_1
    subject to   Ax = b,  l ≤ x ≤ u
LP formulation: write x = x^+ - x^- with x^+, x^- ≥ 0:
    minimize_{x^+, x^-}   1^T x^+ + 1^T x^-
    subject to   [A  -A] (x^+; x^-) = b
                 l ≤ [I  -I] (x^+; x^-) ≤ u
                 (x^+; x^-) ≥ 0
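A minimal sketch of this reformulation on a small random instance, using SciPy's linprog as the LP solver; the data and variable names below are illustrative, not from the slides:

    import numpy as np
    from scipy.optimize import linprog

    rng = np.random.default_rng(0)
    m, n = 20, 60
    A = rng.standard_normal((m, n))
    x_true = np.zeros(n)
    x_true[rng.choice(n, 5, replace=False)] = rng.standard_normal(5)
    b = A @ x_true
    l, u = -10.0, 10.0                         # elementwise bounds l <= x <= u

    # Stack the variables as [x_plus; x_minus], both nonnegative.
    c = np.ones(2 * n)                         # objective 1^T x_plus + 1^T x_minus
    A_eq = np.hstack([A, -A])                  # A (x_plus - x_minus) = b
    A_ub = np.vstack([np.hstack([np.eye(n), -np.eye(n)]),    #  (x_plus - x_minus) <= u
                      np.hstack([-np.eye(n), np.eye(n)])])   # -(x_plus - x_minus) <= -l
    b_ub = np.concatenate([u * np.ones(n), -l * np.ones(n)])

    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b, bounds=(0, None))
    x_rec = res.x[:n] - res.x[n:]
    print("recovery error:", np.linalg.norm(x_rec - x_true))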

7 Another optimization flow
Optimization Problem → simple parts (coordinates / operators) → coordinate / splitting algorithm → Solution

8 Coordinate descent and operator splitting algorithms provide a simple yet powerful approach to deriving algorithms that are:
applicable to a large class of problems (because those problems have friendly structures)
easy to implement (often just several lines of code)
(nearly) state-of-the-art on medium/large-scale problems¹
highly scalable (can go stochastic, parallel, distributed, asynchronous)
backed by convergence and complexity guarantees
¹ They may not be the best choice for problems of small/medium scale.

9 Monotone operator splitting pipeline
1. recognize the simple parts in your problem
2. reformulate as a monotone inclusion: e.g., 0 ∈ (A_1 + A_2)x  (each A_i contains simple parts of your problem)
3. apply an operator-splitting scheme: e.g., 0 ∈ (A_1 + A_2)x  ⇔  x = T_{A_1} T_{A_2}(x) =: T(x)
4. run the iteration z^{k+1} = z^k + λ(Tz^k - z^k),  λ ∈ (0, 1]
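A minimal generic driver for step 4, assuming the operator T is supplied as a Python callable; the toy operator at the end is purely illustrative:

    import numpy as np

    def krasnoselskii_mann(T, z0, lam=0.5, max_iter=500, tol=1e-8):
        """Run z^{k+1} = z^k + lam*(T z^k - z^k) until the update is small."""
        z = np.asarray(z0, dtype=float)
        for _ in range(max_iter):
            Tz = T(z)
            if np.linalg.norm(Tz - z) <= tol:
                break
            z = z + lam * (Tz - z)
        return z

    # toy usage: an averaged map whose fixed point solves x = 0.5*x + 1
    z_star = krasnoselskii_mann(lambda z: 0.5 * z + 1.0, np.zeros(3))
    print(z_star)   # approximately [2, 2, 2]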

10 Operator splitting: The big three

11 The big three two-operator splitting schemes for 0 ∈ Ax + Bx:
forward-backward (Mercier '79): (maximally monotone) + (cocoercive)
Douglas-Rachford (Lions-Mercier '79): (maximally monotone) + (maximally monotone)
forward-backward-forward (Tseng '00): (maximally monotone) + (Lipschitz & monotone)
all three schemes are built from forward operators and backward operators

12 Forward-backward splitting
require: A monotone, B single-valued monotone
forward-backward splitting (FBS) operator (Lions-Mercier '79):
    T_FBS := (I + γA)^{-1} (I - γB),   where γ > 0
reduces to the forward operator if A = 0, and to the backward operator if B = 0
equivalent conditions:
    0 ∈ Ax + Bx
    ⇔ x - γBx ∈ x + γAx
    ⇔ (I - γB)x ∈ (I + γA)x
    ⇔ (I + γA)^{-1}(I - γB)x = x
    ⇔ x = T_FBS(x)

13 FBS iteration: x^{k+1} = T_FBS(x^k)
converges if B is β-cocoercive and γ < 2β (more on this later)
convergence rates are known
typical usage:
    minimize smooth + proximable
    minimize smooth + proximable∘linear + proximable/constraints
    decentralized consensus minimization

14 Douglas-Rachford splitting
require: A, B both monotone, possibly set-valued
define the reflected resolvent R_{γT} := 2(I + γT)^{-1} - I
Douglas-Rachford splitting (DRS) operator (Lions-Mercier '79):
    T_DRS := (1/2) I + (1/2) R_{γA} R_{γB}
note: switching A and B gives a different DRS operator
introduce z ∈ (I + γB)x, so x = J_{γB}(z); then we have the equivalent conditions
    0 ∈ Ax + Bx  ⇔  z = T_DRS(z),  x = J_{γB}(z)
applications: alternating projection, ADMM, distributed optimization

15 Peaceman-Rachford splitting (PRS) operator:
    T_PRS := R_{γA} R_{γB}
(relaxed) Peaceman-Rachford splitting operator, λ ∈ (0, 1]:
    T_PRS^λ := (1 - λ) I + λ R_{γA} R_{γB}
T_PRS^{1/2} recovers T_DRS
T_PRS^λ with λ ∈ (0, 1) requires a weaker condition to converge than T_PRS, but the latter tends to be faster when it converges

16 Operator splitting: Direct applications

17 Constrained minimization
C is a convex set. f is a proper closed convex function.
    minimize_x   f(x)
    subject to   x ∈ C
equivalent condition: 0 ∈ N_C(x) + ∂f(x)
(N_C(x) = ∂ι_C(x), the subdifferential of C's indicator function)

18 If f is Lipschitz differentiable, apply forward-backward splitting:
    x^{k+1} = proj_C (I - γ∇f) x^k
which recovers the projected gradient method. [Figure omitted]
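A minimal sketch of the projected-gradient special case, for a least-squares objective over a box; the problem data below are randomly generated and purely illustrative:

    import numpy as np

    rng = np.random.default_rng(1)
    A = rng.standard_normal((30, 10))
    b = rng.standard_normal(30)
    lo, hi = 0.0, 1.0                          # C = [0, 1]^10

    grad = lambda x: A.T @ (A @ x - b)         # f(x) = 0.5*||Ax - b||^2
    proj_C = lambda x: np.clip(x, lo, hi)      # projection onto the box C

    gamma = 1.0 / np.linalg.norm(A, 2) ** 2    # step size < 2/L, L = ||A||_2^2
    x = np.zeros(10)
    for _ in range(300):
        x = proj_C(x - gamma * grad(x))        # x^{k+1} = proj_C (I - gamma*grad f) x^k
    print(x)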

19 Constrained minimization, cont.
If f is non-differentiable, apply Douglas-Rachford splitting (DRS):
    z^{k+1} = ( (1/2) I + (1/2)(2prox_{γf} - I)(2proj_C - I) ) z^k    (recover x^k = proj_C z^k)
dual DRS approach: introduce x - y = 0 and apply ADMM to
    minimize_{x,y}   f(x) + ι_C(y)
    subject to   x - y = 0
(indicator function: ι_C(y) = 0 if y ∈ C, and ∞ otherwise)
equivalence: the ADMM iteration = the DRS iteration

20 Regularized least squares
    minimize_x   r(x) + (1/2)‖Kx - b‖^2,   with f(x) := (1/2)‖Kx - b‖^2
K: linear operator; b: input data (observation)
r: enforces a structure on x; examples: ℓ_2^2, ℓ_1, sorted ℓ_1, ℓ_2, TV, nuclear norm, ...
equivalent condition:
    0 ∈ ∂r(x) + ∇f(x)  ⇔  x = (I + γ∂r)^{-1} (I - γ∇f)x = prox_{γr} (I - γ∇f)x
forward-backward splitting iteration: x^{k+1} = prox_{γr} (I - γ∇f)x^k

21 Example: LASSO (basis pursuit denoising)
Tibshirani '96: find a sparse vector x such that Lx ≈ b by solving
    minimize_x   λ‖x‖_1 + (1/2)‖Lx - b‖^2,   with r(x) := λ‖x‖_1 and f(x) := (1/2)‖Lx - b‖^2
simple parts:
proximable function r(x) = λ‖x‖_1:
    argmin_x { r(x) + (1/(2γ))‖x - y‖^2 } = sign(y) ⊙ max{0, |y| - λγ}
smooth function f(x) = (1/2)‖Lx - b‖^2, with ∇f(x) = L^T(Lx - b)
apply forward-backward splitting to 0 ∈ Ax + Bx (A = ∂r, B = ∇f):
    x^{k+1} = (I + γA)^{-1} (I - γB) x^k = T x^k

22 realization:
    (I - γB)x^k = x^k - γ∇f(x^k)
    (I + γA)^{-1}(y) = sign(y) ⊙ max{0, |y| - λγ}
therefore, x^{k+1} = (I + γA)^{-1}(I - γB)x^k is realized as
    y^k = x^k - γ L^T(Lx^k - b)
    x^{k+1} = sign(y^k) ⊙ max{0, |y^k| - λγ}
which recovers the Iterative Soft-Thresholding Algorithm (ISTA)
if the matrix L is huge, we can randomly sample rows, columns, or blocks of L at each iteration, or distribute the storage of L and run (asynchronous) parallel algorithms
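A minimal ISTA sketch for the LASSO model above, with random data and illustrative parameter choices:

    import numpy as np

    rng = np.random.default_rng(2)
    L = rng.standard_normal((40, 100))
    x_true = np.zeros(100)
    x_true[:5] = 3.0
    b = L @ x_true + 0.01 * rng.standard_normal(40)
    lam = 0.5

    gamma = 1.0 / np.linalg.norm(L, 2) ** 2          # step size 1/||L||_2^2 (grad f is ||L||_2^2-Lipschitz)
    soft = lambda y, t: np.sign(y) * np.maximum(np.abs(y) - t, 0.0)   # prox of t*||.||_1

    x = np.zeros(100)
    for _ in range(2000):
        y = x - gamma * L.T @ (L @ x - b)            # forward (gradient) step
        x = soft(y, lam * gamma)                     # backward (prox / soft-thresholding) step
    print("nonzeros:", np.sum(np.abs(x) > 1e-6))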

23 Multi-function minimization
f_1, ..., f_N : H → (-∞, ∞] are proper closed convex functions.
product-space trick for
    minimize_x   f_1(x) + ... + f_N(x)
introduce copies x^(i) ∈ H of x; let x = (x^(1), ..., x^(N)) ∈ H^N
define the set C = {x : x^(1) = ... = x^(N)}
equivalent problem in H^N:
    minimize_x   ι_C(x) + Σ_{i=1}^N f_i(x^(i))
then apply a two-operator splitting scheme

24 if all f_i are smooth, apply forward-backward splitting and evaluate the gradients ∇f_i
if all f_i are proximable, apply Douglas-Rachford splitting and evaluate prox_{f_i} in parallel (alternatively, one can apply Eckstein-Svaiter projective splitting)
if some f_i are smooth and the others are proximable, the two schemes above cannot mix them; we need a three-operator splitting scheme (e.g., one that includes FBS and DRS as special cases)

25-26 Alternating projection: project, project
C_1 and C_2 are closed convex sets. Let dist(x, C) = min{‖x - y‖ : y ∈ C}.
equivalent problem:
    minimize_x   (1/2) dist^2(x, C_1) + (1/2) dist^2(x, C_2)
apply Peaceman-Rachford splitting:
    z^{k+1} = proj_{C_1} proj_{C_2} (z^k)
[Figure: iterates z^0, z^1, z^2, z^3 projected alternately onto C_2 and C_1]
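A minimal sketch of this projection iteration for two concrete sets, a box C_1 and a hyperplane C_2, both chosen here purely for illustration:

    import numpy as np

    a = np.array([1.0, 2.0, -1.0]); c = 1.0           # C_2 = {x : a^T x = c}
    proj_C1 = lambda x: np.clip(x, -1.0, 1.0)         # C_1 = [-1, 1]^3
    proj_C2 = lambda x: x - (a @ x - c) / (a @ a) * a # projection onto the hyperplane

    z = np.array([5.0, -3.0, 2.0])
    for _ in range(100):
        z = proj_C1(proj_C2(z))                       # z^{k+1} = proj_C1 proj_C2 (z^k)
    print(z, "hyperplane residual:", a @ z - c)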

27 Operator splitting: Dual applications

28 Lagrange duality
original problem:
    minimize_x   f(x)
    subject to   Ax = b
relaxations:
    Lagrangian:  L(x; w) := f(x) + w^T(Ax - b)
    augmented Lagrangian:  L(x; w, γ) := f(x) + w^T(Ax - b) + (γ/2)‖Ax - b‖_2^2
(negative) dual function, which is convex (even if f is not):
    d(w) := - min_x L(x; w)
(negative) dual problem:
    minimize_w   d(w)

29 Duality: why?
convex (and also some nonconvex) problems have two perspectives: the primal problem and the dual problem
duality brings us:
    an alternative or relaxed problem that provides lower bounds
    certificates of optimality or infeasibility
    economic interpretations
duality + operator splitting:
    decouples the linear constraints that couple the variables
    gives rise to parallel and distributed algorithms

30 Forward operator on the dual function
suppose that d(w) is differentiable (equivalently, f(x) is strictly convex)
dual forward operator based on d:
    w^+ = (I - γ∇d) w
the operator can be evaluated through minimizing the Lagrangian:
    x^+ = argmin_{x'} L(x'; w)
    w^+ = w - γ(b - Ax^+)
the dual forward operator can be evaluated without involving d at all
property: b - Ax^+ = ∇d(w)

31 Dual backward operator
dual backward operator based on d:
    w^+ = (I + γ∂d)^{-1} w = argmin_{w'} { d(w') + (1/(2γ))‖w' - w‖^2 }
w^+ can be computed through minimizing the augmented Lagrangian:
    x^+ ∈ argmin_{x'} L(x'; w, γ)
    w^+ = w - γ(b - Ax^+)
the dual backward operator can also be evaluated without involving d at all
property: b - Ax^+ ∈ ∂d(w^+)

32 Dual algorithms (no splitting yet)
original problem: minimize_x f(x) subject to Ax = b
if f is strongly convex (thus d is Lipschitz differentiable), apply the dual gradient iteration:
    x^{k+1} = argmin_{x'} L(x'; w^k)
    w^{k+1} = w^k - γ(b - Ax^{k+1})
if f is merely convex (d may not be differentiable), apply the dual proximal-point algorithm (dual PPA):
    x^{k+1} ∈ argmin_{x'} L(x'; w^k, γ)
    w^{k+1} = w^k - γ(b - Ax^{k+1})
this iteration has a more complicated subproblem but is also more stable
neither algorithm explicitly involves the dual function d
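A minimal dual-gradient sketch for the strongly convex choice f(x) = (1/2)‖x‖^2, for which the x-subproblem has the closed form x = -A^T w; the instance is illustrative, not from the slides:

    import numpy as np

    rng = np.random.default_rng(3)
    A = rng.standard_normal((5, 20))
    b = rng.standard_normal(5)

    # f(x) = 0.5*||x||^2, so argmin_x L(x; w) = -A^T w in closed form.
    gamma = 0.5 / np.linalg.norm(A, 2) ** 2
    w = np.zeros(5)
    for _ in range(2000):
        x = -A.T @ w                      # x^{k+1} = argmin_x L(x; w^k)
        w = w - gamma * (b - A @ x)       # w^{k+1} = w^k - gamma*(b - A x^{k+1})
    print("feasibility:", np.linalg.norm(A @ x - b))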

33 Monotropic program
definition:
    minimize_{x_1,...,x_m}   f_1(x_1) + ... + f_m(x_m)
    subject to   A_1 x_1 + ... + A_m x_m = b
x_1, ..., x_m are separable in the objective but coupled in the constraints
the dual problem has the form
    minimize_w   d_1(w) + ... + d_m(w)
where
    d_i(w) := - min_{x_i} { f_i(x_i) + w^T ( A_i x_i - (1/m) b ) }
each d_i only involves x_i, f_i, and A_i; they are connected by the dual variable w

34 Examples of monotropic programs
linear programs
min{ f(x) : Ax ∈ C }  ⇔  min_{x,y} { f(x) + ι_C(y) : Ax - y = 0 }
consensus problem: min{ f_1(x_1) + ... + f_n(x_n) : Ax = 0 }, where Ax = 0 ⇔ x_1 = ... = x_n
the structure of A enables distributed computing
exchange problems

35 Dual (Lagrangian) decomposition
    minimize_{x_1,...,x_m}   f_1(x_1) + ... + f_m(x_m)
    subject to   A_1 x_1 + ... + A_m x_m = b
the variables x_1, ..., x_m are decoupled in the Lagrangian
    L(x_1, ..., x_m; w) = Σ_{i=1}^m { f_i(x_i) + w^T ( A_i x_i - (1/m) b ) }
(but not so in the augmented Lagrangian, since it includes the term (γ/2)‖A_1 x_1 + ... + A_m x_m - b‖^2)

36 let A = [A_1 ... A_m] and x = [x_1; ...; x_m]
the dual gradient iteration:
    x^{k+1} = argmin_{x'} L(x'; w^k)
    w^{k+1} = w^k - γ(b - Ax^{k+1})
the first step reduces to m independent subproblems
    x_i^{k+1} = argmin_{x_i'} f_i(x_i') + w^{kT} ( A_i x_i' - (1/m) b ),   i = 1, ..., m
which can be solved in parallel
this decomposition requires strongly convex f_1, ..., f_m (equivalently, Lipschitz differentiable d_1, ..., d_m)
(dual PPA does not have this requirement, but its first step does not decouple either)
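A minimal dual-decomposition sketch with quadratic blocks f_i(x_i) = (1/2)‖x_i - c_i‖^2, whose subproblems have closed forms; the "parallel" solves are written as a list comprehension for brevity, and all data are illustrative:

    import numpy as np

    rng = np.random.default_rng(4)
    m_blocks, p, n_i = 3, 4, 6
    A = [rng.standard_normal((p, n_i)) for _ in range(m_blocks)]
    c = [rng.standard_normal(n_i) for _ in range(m_blocks)]
    b = rng.standard_normal(p)

    # f_i(x_i) = 0.5*||x_i - c_i||^2, so each subproblem has the closed form x_i = c_i - A_i^T w.
    Lip = np.linalg.norm(np.hstack(A), 2) ** 2      # Lipschitz constant of the dual gradient
    gamma = 1.0 / Lip
    w = np.zeros(p)
    for _ in range(3000):
        x = [c[i] - A[i].T @ w for i in range(m_blocks)]    # m independent subproblems (parallelizable)
        Ax = sum(A[i] @ x[i] for i in range(m_blocks))
        w = w - gamma * (b - Ax)                            # multiplier update
    print("constraint residual:", np.linalg.norm(Ax - b))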

37 Dual forward-backward splitting
original problem:
    minimize_{x_1,x_2}   f_1(x_1) + f_2(x_2)
    subject to   A_1 x_1 + A_2 x_2 = b
require: strongly convex f_1 (thus Lipschitz differentiable d_1), convex f_2
FBS iteration: w^{k+1} = prox_{γ d_2} (I - γ∇d_1) w^k
FBS expressed in terms of the original problem's components:
    x_1^{k+1} = argmin_{x_1'} f_1(x_1') + w^{kT} A_1 x_1'
    x_2^{k+1} ∈ argmin_{x_2'} f_2(x_2') + w^{kT} A_2 x_2' + (γ/2)‖A_1 x_1^{k+1} + A_2 x_2' - b‖^2
    w^{k+1} = w^k - γ(b - A_1 x_1^{k+1} - A_2 x_2^{k+1})
we have recovered Tseng's Alternating Minimization Algorithm (AMA)

38 Dual Douglas-Rachford splitting
original problem:
    minimize_{x_1,x_2}   f_1(x_1) + f_2(x_2)
    subject to   A_1 x_1 + A_2 x_2 = b
f_1, f_2 are convex; no strong-convexity requirement
DRS iteration: z^{k+1} = ( (1/2) I + (1/2)(2prox_{γd_2} - I)(2prox_{γd_1} - I) ) z^k
DRS expressed in terms of the original problem's components:
    x_1^{k+1} ∈ argmin_{x_1'} f_1(x_1') + w^{kT} A_1 x_1' + (γ/2)‖A_1 x_1' + A_2 x_2^k - b‖^2
    x_2^{k+1} ∈ argmin_{x_2'} f_2(x_2') + w^{kT} A_2 x_2' + (γ/2)‖A_1 x_1^{k+1} + A_2 x_2' - b‖^2
    w^{k+1} = w^k - γ(b - A_1 x_1^{k+1} - A_2 x_2^{k+1})
this recovers the Alternating Direction Method of Multipliers (ADMM)
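A minimal ADMM sketch for the two-block instance f_1(x_1) = λ‖x_1‖_1, f_2(x_2) = (1/2)‖x_2 - c‖^2 with the constraint x_1 - x_2 = 0 (i.e., A_1 = I, A_2 = -I, b = 0), so both subproblems are in closed form; the data are illustrative:

    import numpy as np

    c = np.array([3.0, -0.2, 0.05, -4.0])
    lam, gamma = 1.0, 1.0
    soft = lambda y, t: np.sign(y) * np.maximum(np.abs(y) - t, 0.0)

    # f1(x1) = lam*||x1||_1, f2(x2) = 0.5*||x2 - c||^2, constraint x1 - x2 = 0.
    x2 = np.zeros_like(c); w = np.zeros_like(c)
    for _ in range(200):
        x1 = soft(x2 - w / gamma, lam / gamma)            # x1-subproblem (prox of f1)
        x2 = (c + w + gamma * x1) / (1.0 + gamma)         # x2-subproblem (quadratic, closed form)
        w = w + gamma * (x1 - x2)                         # multiplier update
    print(x1)     # should approach soft(c, lam) = [2, 0, 0, -3]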

39 Dual operator splitting for monotropic programming
the problem has a separable objective and coupling linear constraints
each iteration: separate f_i subproblems + a multiplier update
Lagrangian x_i-subproblems require strongly convex f_i and can be solved in parallel
augmented-Lagrangian x_i-subproblems do not require strongly convex f_i but are solved in sequence (they can also be solved in parallel with more advanced techniques)

40 Operator splitting: Primal-dual applications

41 Proximable ∘ linear composition
problem: minimize (proximable∘linear) + proximable + smooth:
    minimize_x   r_1(Lx) + r_2(x) + f(x)
equivalent condition:
    0 ∈ ( L^T ∂r_1 L + ∂r_2 + ∇f ) x
decouple r_1 from L: introduce a dual variable y ∈ ∂r_1(Lx)  ⇔  Lx ∈ ∂r_1*(y)
equivalent condition:
    0 ∈ [0  L^T; -L  0] (x; y) + (∂r_2(x); ∂r_1*(y)) + (∇f(x); 0)

42 equivalent condition (copied from the last slide):
    0 ∈ [0  L^T; -L  0] (x; y) + (∂r_2(x); ∂r_1*(y)) + (∇f(x); 0) = Az + Bz
with primal-dual variable z = (x; y)
apply forward-backward splitting to 0 ∈ Az + Bz:
    z^{k+1} = J_{γA} (I - γB) z^k,   i.e.,   z^{k+1} = (I + γA)^{-1} (I - γB) z^k
solve
    x^{k+1} + γL^T y^{k+1} + γ∂r_2(x^{k+1}) ∋ x^k - γ∇f(x^k)
    y^{k+1} - γL x^{k+1} + γ∂r_1*(y^{k+1}) ∋ y^k
issue: both x^{k+1} and y^{k+1} appear in both equations!

43 solution: introduce the metric
    U = [I  -γL^T; -γL  I]
apply forward-backward splitting to 0 ∈ U^{-1}Az + U^{-1}Bz:
    z^{k+1} = J_{γU^{-1}A} (I - γU^{-1}B) z^k,   i.e.,   z^{k+1} = (I + γU^{-1}A)^{-1} (I - γU^{-1}B) z^k
solve U z^{k+1} + γA z^{k+1} ∋ U z^k - γB z^k:
    x^{k+1} - γL^T y^{k+1} + γL^T y^{k+1} + γ∂r_2(x^{k+1}) ∋ x^k - γL^T y^k - γ∇f(x^k)
    y^{k+1} - γL x^{k+1} - γL x^{k+1} + γ∂r_1*(y^{k+1}) ∋ y^k - γL x^k
(like Gaussian elimination, y^{k+1} is cancelled from the first equation)

44 strategy: obtain x^{k+1} from the first equation; plug x^{k+1} in as a constant in the second equation and then obtain y^{k+1}
final iteration:
    x^{k+1} = prox_{γ r_2} ( x^k - γL^T y^k - γ∇f(x^k) )
    y^{k+1} = prox_{γ r_1*} ( y^k + γL(2x^{k+1} - x^k) )
nice properties:
    L and ∇f are applied explicitly
    only proximal subproblems of r_1* and r_2 are solved
    convergence follows from standard forward-backward splitting
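A minimal sketch of this primal-dual iteration for an illustrative instance with r_1 = ‖·‖_1, r_2 = the indicator of {x ≥ 0}, and f = (1/2)‖x - c‖^2; the data and the conservative step size are assumptions for the demo, not from the slides:

    import numpy as np

    rng = np.random.default_rng(5)
    n = 8
    L = rng.standard_normal((5, n))
    c = rng.standard_normal(n)

    grad_f = lambda x: x - c                          # f(x) = 0.5*||x - c||^2
    prox_r2 = lambda x: np.maximum(x, 0.0)            # prox of the indicator of {x >= 0} (gamma-independent)
    prox_r1_conj = lambda y: np.clip(y, -1.0, 1.0)    # r1 = ||.||_1, so prox of gamma*r1* = projection onto the l-inf ball

    gamma = 0.9 / (1.0 + np.linalg.norm(L, 2))        # conservative, illustrative step size
    x = np.zeros(n); y = np.zeros(5)
    for _ in range(2000):
        x_new = prox_r2(x - gamma * L.T @ y - gamma * grad_f(x))
        y = prox_r1_conj(y + gamma * L @ (2 * x_new - x))
        x = x_new
    print("objective:", np.abs(L @ x).sum() + 0.5 * np.sum((x - c) ** 2))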

45 Example: total variation (TV) deblurring
Rudin-Osher-Fatemi '92:
    minimize_u   (1/2)‖Ku - b‖^2 + λ‖Du‖_1
    subject to   0 ≤ u ≤ 255
simple parts: smooth + proximable∘linear + proximable
    smooth function: f(u) = (1/2)‖Ku - b‖^2
    linear operator: D
    proximable function: r_1 = λ‖·‖_1
    proximable function: r_2 = ι_{[0,255]} (equivalent to the constraints 0 ≤ u ≤ 255)

46 (The steps below are sophisticated but routine. We skip the details.)
equivalent condition:
    0 ∈ (∂r_2(u); ∂r_1*(w)) + [0  D^T; -D  0] (u; w) + (∇f(u); 0)
(where w is the auxiliary (dual) variable)
forward-backward splitting algorithm under a special metric:
    u^{k+1} = proj_{[0,255]^n} ( u^k - γD^T w^k - γ∇f(u^k) )
    w^{k+1} = prox_{γ r_1*} ( w^k + γD(2u^{k+1} - u^k) )
every step is simple to implement
parallel, distributed, and stochastic algorithms are also available

47 Three-operator splitting

48 A three-operator splitting scheme
motivation: all existing monotone splitting schemes reduce to one of the big three:
    forward-backward (1970s)
    Douglas-Rachford (1970s)
    forward-backward-forward (2000)
benefits of a multi-operator splitting scheme:
    saves extra variables, with potential savings in memory and CPU time
    fewer tricks, increased flexibility
    improves the theoretical understanding of operator splitting
challenge: the fixed points of the operator T must encode a solution, and the iteration z^{k+1} = z^k + λ(Tz^k - z^k) must converge

49 A three-operator splitting scheme
require: A, B maximally monotone, C cocoercive
Davis and Yin '15:
    T_DY := I - J_{γB} + J_{γA} (2J_{γB} - I - γC J_{γB})
(evaluating T_DY z evaluates J_{γA}, J_{γB}, and C only once each)
reduces to BFS if A = 0, to FBS if B = 0, and to DRS if C = 0
equivalent conditions:
    0 ∈ Ax + Bx + Cx  ⇔  z = T_DY(z),  x = J_{γB} z

50 let C be β-cocoercive and choose γ ∈ (0, 2β)
z^{k+1} = T_DY z^k can be implemented as:
    1. x_B^k = J_{γB}(z^k)
    2. x_A^k = J_{γA}(2x_B^k - z^k - γC x_B^k)
    3. z^{k+1} = z^k + (x_A^k - x_B^k)
J_{γA}, J_{γB}, and γC are each evaluated once per iteration
the intermediate variables converge: x_A^k → x* and x_B^k → x*
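A minimal Davis-Yin sketch for minimizing (1/2)‖x - c‖^2 + λ‖x‖_1 + ι_{x ≥ 0}(x), taking C as the gradient of the quadratic term, A = ∂(λ‖·‖_1), and B as the subdifferential of the nonnegativity indicator; the data are illustrative:

    import numpy as np

    c = np.array([2.0, 0.3, -1.0, 5.0])
    lam, beta = 0.5, 1.0                   # grad of 0.5*||x - c||^2 is 1-cocoercive (beta = 1)
    gamma = 1.0                            # any gamma in (0, 2*beta)

    soft = lambda y, t: np.sign(y) * np.maximum(np.abs(y) - t, 0.0)
    J_gamma_B = lambda z: np.maximum(z, 0.0)        # resolvent of B (projection onto {x >= 0})
    J_gamma_A = lambda z: soft(z, gamma * lam)      # resolvent of A (prox of lam*||.||_1)
    C_op = lambda x: x - c                          # C = grad of 0.5*||x - c||^2

    z = np.zeros_like(c)
    for _ in range(500):
        xB = J_gamma_B(z)
        xA = J_gamma_A(2 * xB - z - gamma * C_op(xB))
        z = z + (xA - xB)
    print(xB)      # should approach np.maximum(c - lam, 0) = [1.5, 0, 0, 4.5]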

51 Three-operator natural applications
Nonnegative matrix completion: y = A(X) + w,  w ~ N(0, σ)
    minimize_{X ∈ R^{d×m}}   ‖y - A(X)‖_F^2 + µ‖X‖_* + ι_+(X)
Smooth optimization with linear and box constraints:
    minimize_{x ∈ H}   f(x)
    subject to   Lx = b,  0 ≤ x ≤ b
covers kernelized SVMs and quadratic programs

52 Three-set split feasibility problem
find x ∈ H such that x ∈ C_1 ∩ C_2 and Lx ∈ C_3
applications: nonnegative semidefinite programs and conic programs through the homogeneous self-dual embedding
reformulation:
    minimize_x   ι_{C_1}(x) + ι_{C_2}(x) + (1/2) dist^2(Lx, C_3)

53 Dual Davis-Yin splitting
original problem, m ≥ 3:
    minimize_{x_1,...,x_m}   f_1(x_1) + ... + f_m(x_m)
    subject to   A_1 x_1 + ... + A_m x_m = b
require: strongly convex f_1, ..., f_{m-2}; convex f_{m-1} and f_m
Davis-Yin '15 iteration: z^{k+1} = T_DY z^k for (d_1 + ... + d_{m-2}) + d_{m-1} + d_m
expressed in terms of the (augmented) Lagrangian:
    x_i^{k+1} = argmin_{x_i'} f_i(x_i') + w^{kT} A_i x_i',   i = 1, ..., m-2, in parallel
    x_{m-1}^{k+1} ∈ argmin_{x'} f_{m-1}(x') + w^{kT} A_{m-1} x' + (γ/2)‖A_{m-1} x' + Σ_{i≤m-2} A_i x_i^{k+1} + A_m x_m^k - b‖^2
    x_m^{k+1} ∈ argmin_{x'} f_m(x') + w^{kT} A_m x' + (γ/2)‖A_m x' + Σ_{i=1}^{m-1} A_i x_i^{k+1} - b‖^2
    w^{k+1} = w^k - γ( b - Σ_{i=1}^m A_i x_i^{k+1} )
m = 2 recovers ADMM; m = 3 gives the simplest 3-block ADMM to date

54 Steps to build an operator-splitting algorithm
Problem → Simple Parts → Standard Operators → Algorithm

55 Summary
monotone operator splitting is a powerful computational tool
the first three-operator splitting scheme was introduced
not covered:
    additive / block-separable structures, which enable parallel, distributed, and stochastic algorithms
    convergence analysis: objective error f(x^k) - f*, point error ‖z^k - z*‖^2, accelerated rates by averaging or extrapolation
    coordinate descent algorithms
