Operator Splitting for Parallel and Distributed Optimization


1 Operator Splitting for Parallel and Distributed Optimization Wotao Yin (UCLA Math) ShanghaiTech, SSDS '15, June 23, 2015 URL: alturl.com/2z7tv 1 / 60

2 What is splitting? Sun Tzu (400 BC); Caesar: divide-and-conquer. splitting in computing: break a problem into separate parts, solve the separate parts, and combine the sub-solutions in a controlled fashion 2 / 60

3 Monotone operator-splitting pipeline 1. recognize the simple parts in your problem 2. build an equivalent monotone-inclusion problem: 0 ∈ Ax (the simple parts are separately placed in A) 3. apply an operator-splitting scheme: 0 ∈ Ax ⟺ z = Tz (the simple parts become sub-operators of T) 4. run the iteration z^{k+1} = z^k + λ(Tz^k − z^k), λ ∈ (0, 1] (known as the Krasnosel'skiĭ–Mann (KM) iteration) 3 / 60

4 Example: LASSO (basis pursuit denoising) Tibshirani '96: two simple parts: smooth + simple. minimize (1/2)‖Ax − b‖² + λ‖x‖₁; smooth function: f(x) = (1/2)‖Ax − b‖²; simple nonsmooth function: r(x) = λ‖x‖₁. equivalent condition: 0 ∈ ∇f(x) + ∂r(x). forward-backward splitting algorithm: x^{k+1} = prox_{γr}(x^k − γ∇f(x^k)) =: Tx^k, also known as the Iterative Soft-Thresholding Algorithm (ISTA) 4 / 60
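A minimal numpy sketch of the ISTA iteration above; the data sizes, the regularization weight, and the step-size rule γ = 1/L are illustrative choices, not taken from the slides.

import numpy as np

def soft_threshold(x, t):
    # proximal map of t*||.||_1 (componentwise soft-thresholding)
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def ista(A, b, lam, num_iters=500):
    # x^{k+1} = prox_{gamma*lam*||.||_1}( x^k - gamma * A^T (A x^k - b) )
    L = np.linalg.norm(A, 2) ** 2          # Lipschitz constant of grad f
    gamma = 1.0 / L
    x = np.zeros(A.shape[1])
    for _ in range(num_iters):
        grad = A.T @ (A @ x - b)                            # forward (gradient) step
        x = soft_threshold(x - gamma * grad, gamma * lam)   # backward (prox) step
    return x

rng = np.random.default_rng(0)
A = rng.standard_normal((40, 100))
x_true = np.zeros(100); x_true[:5] = 1.0
b = A @ x_true + 0.01 * rng.standard_normal(40)
print(ista(A, b, lam=0.1)[:8])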

5 Example: total variation (TV) deblurring Rudin-Osher-Fatemi '92: minimize_u (1/2)‖Ku − b‖² + λ‖Du‖₁ subject to 0 ≤ u ≤ 255. simple parts: smooth + simple linear + simple; smooth function: f(u) = (1/2)‖Ku − b‖²; linear operator: D; simple nonsmooth function: r₁ = λ‖·‖₁; simple indicator function: r₂ = ι_{[0,255]} 5 / 60

6 (We only show the results, skipping the details) equivalent condition: 0 ∈ [∂r₂, Dᵀ; −D, ∂r₁*][u; w] + [∇f(u); 0] (where w is the auxiliary (dual) variable). forward-backward splitting algorithm under a special metric: u^{k+1} = proj_{[0,255]^n}(u^k − γDᵀw^k − γ∇f(u^k)); w^{k+1} = prox_{γr₁*}(w^k + γD(2u^{k+1} − u^k)). every step is simple to implement 6 / 60

7 Simple parts 7 / 60

8 Simple parts linear maps (e.g., matrices, finite differences, orthogonal transforms) differentiable functions (non-differentiable) functions that have simple proximal maps sets that are easy to project to Abstraction: we look for monotone maps which have certain simple operators 8 / 60

9 Monotone map A : H → H is monotone if ⟨Ax − Ay, x − y⟩ ≥ 0, ∀x, y ∈ H; extend to set-valued A : H → 2^H: ⟨p − q, x − y⟩ ≥ 0, ∀x, y ∈ H, p ∈ Ax, q ∈ Ay. examples: a positive semi-definite linear map; a skew-symmetric linear map: ⟨Ax, x⟩ = 0, ∀x ∈ H; ∂f: the subdifferential of a proper closed convex function f 9 / 60

10 Steps to build an operator-splitting algorithm Problem → Monotone Maps → Standard Operators → Algorithm 10 / 60

11 Forward operator require: A is monotone and either Lipschitz or cocoercive¹. definition: fwd_{γA} := (I − γA). examples: forward Euler for ẏ + g(t, y) = 0: y^{t+1} = y^t − h·g(t, y^t) = (I − h·g(t, ·))y^t; gradient descent for min f(x): x^{k+1} = x^k − γ∇f(x^k) = (I − γ∇f)x^k. equivalent conditions: 0 = Ax ⟺ x = (I − γA)x. ¹A is β-cocoercive if ⟨Ax − Ay, x − y⟩ ≥ β‖Ax − Ay‖², ∀x, y ∈ H. If A is β-cocoercive, then A is (1/β)-Lipschitz; the reverse is generally untrue. 11 / 60

12 Backward operator require: A is monotone. definition: J_{γA} := (I + γA)⁻¹. equivalent conditions: 0 ∈ Ax ⟺ x ∈ x + γAx ⟺ x = J_{γA}x. even if A is set-valued, J_{γA} is single-valued. examples: regularized matrix inversion; backward Euler; the proximal map, including projection as a special case 12 / 60
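A small illustration, not from the slides, of the backward operator J_{γA} for a positive semi-definite (hence monotone) linear map A; the matrix size and γ are arbitrary choices. Iterating the resolvent is the proximal point algorithm and drives the iterate toward a zero of A.

import numpy as np

rng = np.random.default_rng(1)
M = rng.standard_normal((5, 5))
A = M.T @ M                      # positive semi-definite, hence monotone, linear map
gamma = 0.7

def backward(x):
    # resolvent J_{gamma A} x = (I + gamma A)^{-1} x  (regularized matrix inversion)
    return np.linalg.solve(np.eye(5) + gamma * A, x)

x = rng.standard_normal(5)
for _ in range(200):             # proximal point iteration x <- J_{gamma A} x
    x = backward(x)
print("||Ax|| after 200 backward steps:", np.linalg.norm(A @ x))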

13 Proximal map require: a proper closed convex function f. definition: prox_{γf}(y) = argmin_x f(x) + (1/(2γ))‖x − y‖². the minimizer x* must satisfy: 0 ∈ γ∂f(x*) + (x* − y) ⟺ x* = (I + γ∂f)⁻¹(y) = J_{γ∂f}(y). therefore, prox_{γf} ≡ J_{γ∂f} ≡ (I + γ∂f)⁻¹ 13 / 60
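A minimal sketch, not from the slides, of two proximal maps used throughout the deck: soft-thresholding as the prox of γ‖·‖₁, and box projection as the prox of the indicator ι_{[0,255]}; the test vector is arbitrary.

import numpy as np

def prox_l1(y, gamma):
    # prox_{gamma*||.||_1}(y): componentwise soft-thresholding
    return np.sign(y) * np.maximum(np.abs(y) - gamma, 0.0)

def prox_box(y, lo=0.0, hi=255.0):
    # prox of the indicator of [lo, hi]^n, i.e., projection onto the box
    return np.clip(y, lo, hi)

y = np.array([-3.0, -0.2, 0.5, 2.0, 300.0])
print(prox_l1(y, gamma=1.0))   # [-2.  -0.   0.   1.  299.]
print(prox_box(y))             # [  0.    0.    0.5   2.  255. ]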

14 Reflective backward operator require: A is monotone. definition: R_{γA} := 2J_{γA} − I; it reflects x through J_{γA}x by adding J_{γA}x − x. examples: mirror or reflective projection: refl_C = 2proj_C − I; reflective proximal map: for a closed proper convex function f, R_{γ∂f} = 2J_{γ∂f} − I = 2prox_{γf} − I 14 / 60

15 Operator splitting 15 / 60

16 Steps to build an operator-splitting algorithm Problem → Monotone Maps → Standard Operators → Algorithm 16 / 60

17 The big three two-operator splitting schemes for 0 ∈ Ax + Bx: forward-backward (Mercier '79) for (maximally monotone) + (cocoercive); Douglas-Rachford (Lions-Mercier '79) for (maximally monotone) + (maximally monotone); forward-backward-forward (Tseng '00) for (maximally monotone) + (Lipschitz & monotone). all the schemes are built from forward operators and backward operators 17 / 60

18 Forward-backward splitting require: A maximally monotone, B cocoercive (thus single-valued). forward-backward splitting (FBS) operator (Lions-Mercier '79): T_FBS := J_{γA} ∘ (I − γB); reduces to the forward operator if A = 0 and the backward operator if B = 0. equivalent conditions: 0 ∈ Ax + Bx ⟺ x = T_FBS(x). backward-forward splitting (BFS) operator: T_BFS := (I − γB) ∘ J_{γA}; then 0 ∈ Ax + Bx ⟺ z = T_BFS(z), x = J_{γA}z 18 / 60

19 Regularized least squares minimize_x r(x) + (1/2)‖Kx − b‖², with f(x) := (1/2)‖Kx − b‖². K: linear operator; b: input data (observation); r: enforces a structure on x, e.g., ℓ₂², ℓ₁, sorted ℓ₁, ℓ₂, TV, nuclear norm, ... equivalent condition: 0 ∈ ∂r(x) + ∇f(x). forward-backward splitting iteration: x^{k+1} = prox_{γr}(I − γ∇f)x^k 19 / 60

20 Constrained minimization C is a convex set; f is a proper closed convex function. minimize_x f(x) subject to x ∈ C. equivalent condition: 0 ∈ N_C(x) + ∂f(x). if f is Lipschitz differentiable, then apply forward-backward splitting: x^{k+1} = proj_C(I − γ∇f)x^k, which recovers the projected gradient method 20 / 60
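A projected-gradient sketch, not from the slides, for minimize (1/2)‖Kx − b‖² subject to x ∈ C = [0, 1]^n; the data and box are illustrative.

import numpy as np

rng = np.random.default_rng(2)
K = rng.standard_normal((30, 10))
b = rng.standard_normal(30)
gamma = 1.0 / np.linalg.norm(K, 2) ** 2      # step size <= 1/L

x = np.zeros(10)
for _ in range(300):
    grad = K.T @ (K @ x - b)                 # forward step on f(x) = (1/2)||Kx - b||^2
    x = np.clip(x - gamma * grad, 0.0, 1.0)  # backward step = projection onto C
print(x)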

21 Douglas-Rachford splitting require: A, B both monotone. Douglas-Rachford splitting (DRS) operator (Lions-Mercier '79): T_DRS := (1/2)I + (1/2)R_{γA} ∘ R_{γB}; note: switching A and B gives a different DRS operator. (relaxed) Peaceman-Rachford splitting (PRS) operator, λ ∈ (0, 1]: T^λ_PRS := (1 − λ)I + λR_{γA} ∘ R_{γB}; also, T_PRS = T^1_PRS. equivalent conditions: 0 ∈ Ax + Bx ⟺ z = T^λ_PRS(z), x = J_{γB}z 21 / 60
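A generic Douglas-Rachford sketch, not from the slides: z^{k+1} = z^k + J_{γA}(2J_{γB}z^k − z^k) − J_{γB}z^k, which is T_DRS = (1/2)I + (1/2)R_{γA} ∘ R_{γB} applied to z^k. The two proximal maps and the test problem below (minimize ‖x‖₁ + (1/2)‖x − c‖²) are illustrative choices.

import numpy as np

def drs(prox_f, prox_g, z0, gamma=1.0, num_iters=300):
    z = z0.copy()
    for _ in range(num_iters):
        x = prox_g(z, gamma)                 # x^k = J_{gamma B} z^k
        y = prox_f(2.0 * x - z, gamma)       # J_{gamma A} applied to the reflection
        z = z + y - x                        # same as (1/2) z + (1/2) R_A R_B z
    return prox_g(z, gamma)                  # x = J_{gamma B} z encodes the solution

c = np.array([3.0, -0.4, 0.2, -5.0])
prox_l1 = lambda v, g: np.sign(v) * np.maximum(np.abs(v) - g, 0.0)
prox_ls = lambda v, g: (v + g * c) / (1.0 + g)     # prox of (1/2)||x - c||^2
print(drs(prox_l1, prox_ls, np.zeros(4)))          # soft-threshold of c by 1: [2, 0, 0, -4]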

22 Constrained minimization, cont. if f is non-differentiable, then apply Douglas-Rachford splitting (DRS) (where x^k = proj_C z^k): z^{k+1} = ((1/2)I + (1/2)(2prox_{γf} − I) ∘ (2proj_C − I))z^k. dual approach: introduce x − y = 0 and apply ADMM to minimize_{x,y} f(x) + ι_C(y) subject to x − y = 0 (indicator function: ι_C(y) = 0 if y ∈ C, and +∞ otherwise). equivalence: the ADMM iteration = the DRS iteration 22 / 60

23 Multi-function minimization f₁, ..., f_N : H → (−∞, ∞] are proper closed convex functions. product-space trick: minimize_x f₁(x) + ··· + f_N(x); introduce copies x^{(i)} ∈ H of x; let x = (x^{(1)}, ..., x^{(N)}) ∈ H^N; let C = {x : x^{(1)} = ··· = x^{(N)}}. equivalent problem in H^N: minimize_x ι_C(x) + Σ_{i=1}^N f_i(x^{(i)}); then apply a two-operator splitting scheme 23 / 60
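A sketch, not from the slides, of the product-space trick with DRS: the projection onto the consensus set C averages the copies, and the prox of the separable sum acts copy by copy. The three quadratics f_i(x) = (1/2)‖x − c_i‖² are illustrative; the minimizer of their sum is the mean of the c_i.

import numpy as np

def consensus_drs(proxes, dim, gamma=1.0, num_iters=200):
    N = len(proxes)
    Z = np.zeros((N, dim))                          # one copy z^{(i)} per function
    for _ in range(num_iters):
        X = np.tile(Z.mean(axis=0), (N, 1))         # proj_C: average the copies
        Y = np.stack([proxes[i](2 * X[i] - Z[i], gamma) for i in range(N)])
        Z = Z + Y - X                               # DRS update, copy by copy
    return Z.mean(axis=0)

cs = [np.array([1.0, 0.0]), np.array([0.0, 3.0]), np.array([2.0, 2.0])]
proxes = [lambda v, g, c=c: (v + g * c) / (1 + g) for c in cs]
print(consensus_drs(proxes, dim=2))                 # approx [1.0, 1.667]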

24 Operator splitting: Dual application 24 / 60

25 Duality convex (and some nonconvex) optimization problems have two perspectives: the primal problem and the dual problem. duality brings us: alternative or relaxed problems, lower bounds; certificates for optimality or infeasibility; economic interpretations. duality + operator splitting: decouples the objective-function and constraint components; gives rise to parallel and distributed algorithms 25 / 60

26 Lagrange duality original problem: minimize_x f(x) subject to Ax = b. relaxations: Lagrangian: L(x; w) := f(x) + wᵀ(Ax − b); augmented Lagrangian: L(x; w, γ) := f(x) + wᵀ(Ax − b) + (γ/2)‖Ax − b‖². dual function, which is convex (even if f is not): d(w) = −min_x L(x; w). dual problem: minimize_w d(w) 26 / 60

27 Monotropic program definition: minimize_{x₁,...,x_m} f₁(x₁) + ··· + f_m(x_m) subject to A₁x₁ + ··· + A_mx_m = b, where f_i(x_i) may include ι_{C_i}(x_i) for a constraint x_i ∈ C_i. x₁, ..., x_m are separable in the objective and coupled in the constraints. the dual problem has the form minimize_w d₁(w) + ··· + d_m(w), where d_i(w) := −min_{x_i} [f_i(x_i) + wᵀ(A_ix_i − (1/m)b)] 27 / 60

28 Examples of monotropic programs linear programs; min{f(x) : Ax ∈ C} ⟺ min_{x,y}{f(x) + ι_C(y) : Ax − y = 0}; consensus problem min{f₁(x₁) + ··· + f_n(x_n) : Ax = 0}, where Ax = 0 ⟺ x₁ = ··· = x_n and the structure of A enables distributed computing; exchange problem 28 / 60

29 Dual forward-backward splitting original problem: minimize_{x₁,x₂} f₁(x₁) + f₂(x₂) subject to A₁x₁ + A₂x₂ = b. require: strongly convex f₁ (thus d₁ has a Lipschitz gradient). FBS iteration: z^{k+1} = prox_{γd₂}(I − γ∇d₁)z^k. expressed in terms of the (augmented) Lagrangian: x₁^{k+1} = argmin_{x₁} f₁(x₁) + w^{kᵀ}A₁x₁; x₂^{k+1} ∈ argmin_{x₂} f₂(x₂) + w^{kᵀ}A₂x₂ + (γ/2)‖A₁x₁^{k+1} + A₂x₂ − b‖²; w^{k+1} = w^k − γ(b − A₁x₁^{k+1} − A₂x₂^{k+1}). we have recovered Tseng's Alternating Minimization Algorithm 29 / 60

30 Dual Douglas-Rachford splitting original problem (no strong-convexity requirement): minimize_{x₁,x₂} f₁(x₁) + f₂(x₂) subject to A₁x₁ + A₂x₂ = b. DRS iteration: z^{k+1} = ((1/2)I + (1/2)(2prox_{γd₂} − I)(2prox_{γd₁} − I))z^k. expressed in terms of the augmented Lagrangian: x₁^{k+1} ∈ argmin_{x₁} f₁(x₁) + w^{kᵀ}A₁x₁ + (γ/2)‖A₁x₁ + A₂x₂^k − b‖²; x₂^{k+1} ∈ argmin_{x₂} f₂(x₂) + w^{kᵀ}A₂x₂ + (γ/2)‖A₁x₁^{k+1} + A₂x₂ − b‖²; w^{k+1} = w^k − γ(b − A₁x₁^{k+1} − A₂x₂^{k+1}). we recover the Alternating Direction Method of Multipliers (ADMM) 30 / 60
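An ADMM sketch, not from the slides, for the splitting minimize λ‖x₁‖₁ + (1/2)‖Kx₂ − d‖² subject to x₁ − x₂ = 0 (so A₁ = I, A₂ = −I, b = 0), matching the x₁/x₂/w updates above; the data, λ, and γ are illustrative.

import numpy as np

rng = np.random.default_rng(3)
K = rng.standard_normal((40, 60))
d = rng.standard_normal(40)
lam, gamma = 0.1, 1.0

x1 = np.zeros(60); x2 = np.zeros(60); w = np.zeros(60)
KtK_reg = K.T @ K + gamma * np.eye(60)              # formed once, reused every iteration
for _ in range(300):
    v = x2 - w / gamma
    x1 = np.sign(v) * np.maximum(np.abs(v) - lam / gamma, 0.0)   # x1-subproblem (prox)
    x2 = np.linalg.solve(KtK_reg, K.T @ d + w + gamma * x1)      # x2-subproblem (least squares)
    w = w + gamma * (x1 - x2)                                     # multiplier update
print("constraint violation:", np.linalg.norm(x1 - x2))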

31 Operator splitting: Primal-dual application 31 / 60

32 Nonsmooth linear composition problem: minimize (nonsmooth ∘ linear) + nonsmooth + smooth: minimize_x r₁(Lx) + r₂(x) + f(x). equivalent condition: 0 ∈ (Lᵀ∂r₁L + ∂r₂ + ∇f)x. decouple r₁ from L: introduce a dual variable y ∈ ∂r₁(Lx) ⟺ Lx ∈ ∂r₁*(y). equivalent condition: 0 ∈ [0, Lᵀ; −L, 0][x; y] + [∂r₂(x); ∂r₁*(y)] + [∇f(x); 0] 32 / 60

33 equivalent condition (copied from the last slide): 0 ∈ ([0, Lᵀ; −L, 0][x; y] + [∂r₂(x); ∂r₁*(y)]) + [∇f(x); 0] =: Az + Bz, with primal-dual variable z = [x; y]. apply forward-backward splitting to 0 ∈ Az + Bz: z^{k+1} = J_{γA} ∘ F_{γB} z^k, i.e., z^{k+1} = (I + γA)⁻¹(I − γB)z^k; solve: x^{k+1} + γLᵀy^{k+1} + γ∂r₂(x^{k+1}) ∋ x^k − γ∇f(x^k) and y^{k+1} − γLx^{k+1} + γ∂r₁*(y^{k+1}) ∋ y^k. issue: both x^{k+1} and y^{k+1} appear in both equations! 33 / 60

34 solution: introduce the metric U = [I, −γLᵀ; −γL, I]. apply forward-backward splitting to 0 ∈ U⁻¹Az + U⁻¹Bz: z^{k+1} = J_{γU⁻¹A}(I − γU⁻¹B)z^k, i.e., z^{k+1} = (I + γU⁻¹A)⁻¹(I − γU⁻¹B)z^k; solve Uz^{k+1} + γAz^{k+1} ∋ Uz^k − γBz^k: x^{k+1} − γLᵀy^{k+1} + γLᵀy^{k+1} + γ∂r₂(x^{k+1}) ∋ x^k − γLᵀy^k − γ∇f(x^k) and y^{k+1} − γLx^{k+1} − γLx^{k+1} + γ∂r₁*(y^{k+1}) ∋ y^k − γLx^k (like Gaussian elimination, y^{k+1} is cancelled from the first equation) 34 / 60

35 strategy: obtain x^{k+1} from the first equation; plug x^{k+1} in as a constant into the second equation and then obtain y^{k+1}. final iteration: x^{k+1} = prox_{γr₂}(x^k − γLᵀy^k − γ∇f(x^k)); y^{k+1} = prox_{γr₁*}(y^k + γL(2x^{k+1} − x^k)). nice properties: apply L and ∇f explicitly; solve proximal-point subproblems of r₁ and r₂; convergence follows from standard forward-backward splitting 35 / 60
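A small instance, not from the slides, of the final primal-dual iteration above: minimize (1/2)‖x − d‖² + λ‖Dx‖₁ subject to x ≥ 0, with L = D the 1-D difference operator, r₂ = ι_{x≥0} (its prox is the projection), r₁ = λ‖·‖₁ (so prox_{γr₁*} is a clip to [−λ, λ]), and f(x) = (1/2)‖x − d‖². The signal, λ, and the step size γ = 0.1 (a safe choice for this instance) are illustrative.

import numpy as np

n = 50
rng = np.random.default_rng(4)
d = np.concatenate([np.zeros(20), np.ones(20), np.zeros(10)]) + 0.05 * rng.standard_normal(n)
D = np.eye(n - 1, n, 1) - np.eye(n - 1, n)       # first-difference operator L
lam, gamma = 0.5, 0.1

x, y = np.zeros(n), np.zeros(n - 1)
for _ in range(1000):
    x_new = np.maximum(x - gamma * D.T @ y - gamma * (x - d), 0.0)   # prox of iota_{x>=0}
    y = np.clip(y + gamma * D @ (2 * x_new - x), -lam, lam)          # prox of gamma*r1^*
    x = x_new
print(np.round(x[15:25], 2))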

36 Example: total variation (TV) deblurring Rudin-Osher-Fatemi '92: minimize_u (1/2)‖Ku − b‖² + λ‖Du‖₁ subject to 0 ≤ u ≤ 255. simple parts: smooth + simple linear + simple; smooth function: f(u) = (1/2)‖Ku − b‖²; linear operator: D; simple nonsmooth function: r₁ = λ‖·‖₁; simple indicator function: r₂ = ι_{[0,255]} 36 / 60

37 (We only show the results, skipping the details) equivalent condition: 0 ∈ [∂r₂, Dᵀ; −D, ∂r₁*][u; w] + [∇f(u); 0] (where w is the auxiliary (dual) variable). forward-backward splitting algorithm under a special metric: u^{k+1} = proj_{[0,255]^n}(u^k − γDᵀw^k − γ∇f(u^k)); w^{k+1} = prox_{γr₁*}(w^k + γD(2u^{k+1} − u^k)). every step is simple to implement 37 / 60

38 Operator splitting: Parallel and distributed applications 38 / 60

39 Huge matrix A background: you wish to distribute a huge matrix A in your problem. three schemes to distribute A: scheme 1: row blocks, A = [A_{(1)}; ...; A_{(M)}]; scheme 2: column blocks, A = [A₁, ..., A_N]; scheme 3: both, blocks A_{i,j} for i = 1, ..., M and j = 1, ..., N 39 / 60

40 choose a scheme based on the structures of the functions/operators. example: convex and smooth f; convex r (possibly nonsmooth): minimize_x r(x) + f(Ax). choose scheme 1 for separable f, that is, f(y) = Σ_{i=1}^M f_i(y_i). apply forward-backward splitting: x^{k+1} = prox_{γr}(x^k − γAᵀ∇f(Ax^k)) = prox_{γr}(x^k − γ Σ_{i=1}^M A_{(i)}ᵀ∇f_i(A_{(i)}x^k)); we can distribute/parallelize the terms A_{(i)}ᵀ∇f_i(A_{(i)}x^k) 40 / 60
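A sketch, not from the slides, of scheme 1: the row blocks A_{(i)} are handled separately and the partial gradients A_{(i)}ᵀ∇f_i(A_{(i)}x) are summed, which is the map-reduce step that can be distributed. Here the f_i are least-squares blocks and r = λ‖·‖₁; the block count, data, and step size are illustrative.

import numpy as np

rng = np.random.default_rng(5)
M_blocks, rows_per_block, n = 4, 25, 60
blocks = [rng.standard_normal((rows_per_block, n)) for _ in range(M_blocks)]   # A_(1..M)
bs = [rng.standard_normal(rows_per_block) for _ in range(M_blocks)]
lam = 0.1
gamma = 1.0 / np.linalg.norm(np.vstack(blocks), 2) ** 2     # 1/L for the full A

def partial_grad(i, x):
    # A_(i)^T grad f_i(A_(i) x) with f_i(y) = (1/2)||y - b_i||^2; would run on worker i
    return blocks[i].T @ (blocks[i] @ x - bs[i])

x = np.zeros(n)
for _ in range(500):
    g = sum(partial_grad(i, x) for i in range(M_blocks))    # reduce over the M blocks
    v = x - gamma * g
    x = np.sign(v) * np.maximum(np.abs(v) - gamma * lam, 0.0)   # prox_{gamma r}
print("nonzeros:", int(np.count_nonzero(np.abs(x) > 1e-6)))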

41 choose scheme 2 for separable r, that is, prox_{γr}(y) = (prox_{γr₁}(y₁), ..., prox_{γr_N}(y_N)). apply forward-backward splitting x^{k+1} = prox_{γr}(x^k − γAᵀ∇f(Ax^k)): cache g = ∇f(Ax^k); update x_j^{k+1} = prox_{γr_j}(x_j^k − γA_jᵀg) for j = 1, ..., N; broadcast g = ∇f(Σ_{j=1}^N A_jx_j^{k+1}). we distribute the computing of A_jx_j^{k+1} = A_j prox_{γr_j}(x_j^k − γA_jᵀg); this is also known as parallel coordinate descent 41 / 60

42 choose scheme 3 for separable f and r (we skip the details). Remarks: parallel computing is enabled by the separability structure; try not to introduce extra variables, for the sake of convergence speed; parallel algorithms can be further accelerated by asynchronous parallelism 42 / 60

43 Duality and structure trade assume scheme 1: A = [A_{(1)}; ...; A_{(M)}]. primal problem: convex nonsmooth r₁, ..., r_M; strongly convex f: minimize_w Σ_{i=1}^M r_i(A_{(i)}w) + f(w); structure: nonsmooth ∘ linear + strongly convex. let f*(y) = sup_x ⟨y, x⟩ − f(x) denote the convex conjugate of f. dual problem: minimize_y f*(−Σ_{i=1}^M A_{(i)}ᵀy_i) + Σ_{i=1}^M r_i*(y_i); dual structure: nonsmooth + smooth ∘ linear 43 / 60

44 Example: support vector machine (SVM) given N sample-label pairs (x_i, y_i) where x_i ∈ R^d and y_i ∈ {−1, 1}. primal problem: minimize_{w,b} Σ_{i=1}^N ι_{R₊}(y_i(x_{(i)}ᵀw − b) − 1) + ‖w‖². dual problem: maximize_{α ∈ R^N} quadratic(α; X) + nonsmooth(α). apply scheme 1 44 / 60

45 Operator splitting: Decentralized applications 45 / 60

46 Decentralized computing n agents in a connected network G = (V, E) with bi-directional links E; each agent i has a private function f_i. problem: find a consensus solution x: minimize_{x ∈ R^p} f(x) := Σ_{i=1}^n f_i(x_i) subject to x_i = x_j, ∀i, j. challenges: no center, only between-neighbor communication. benefits: fault tolerance, no long-distance communication, privacy 46 / 60

47 Decentralized ADMM decentralized consensus optimization problem: minimize_{x_i, i∈V} Σ_{i∈V} f_i(x_i) subject to x_i = x_j, ∀(i, j) ∈ E. ADMM reformulation: minimize_{x_i, i∈V, y_{ij}, (i,j)∈E} Σ_{i∈V} f_i(x_i) subject to x_i = y_{ij}, x_j = y_{ij}, ∀(i, j) ∈ E. ADMM (Douglas-Rachford splitting) alternates between two steps: each agent i updates x_i while the related y_{ij} are fixed; each pair of agents (i, j) updates y_{ij} and the dual variables while x_i, x_j are fixed 47 / 60

48 Primal-dual splitting problem: minimize_{x_i, i∈V} Σ_{i∈V}(r_i(x_i) + f_i(x_i)) subject to Wx = x, where the r_i are convex and the f_i are convex and smooth. the mixing matrix W ∈ R^{n×n}: w_{ij} ≠ 0 only if i = j or agents i, j are neighbors; symmetric, W = Wᵀ; doubly stochastic, W1 = 1; thus I − W ⪰ 0. consensus: x_i = x_j, ∀(i, j) ∈ E ⟺ Wx = x, where x stacks all x_iᵀ. equivalent problem: minimize_x r(x) + f(x) = Σ_{i∈V}(r_i(x_i) + f_i(x_i)) subject to Wx = x 48 / 60

49 let VᵀV = (1/2)(I − W). equivalent problem (KKT conditions): 0 ∈ [∂r, Vᵀ; −V, 0][x; q] + [∇f(x); 0]. applying forward-backward splitting with a special metric, skipping the details and eliminating q, we obtain x^{k+3/2} = Wx^{k+1} + x^{k+1/2} − (1/2)(W + I)x^k − α[∇f(x^{k+1}) − ∇f(x^k)]; x^{k+2} = argmin_x r(x) + (1/(2α))‖x − x^{k+3/2}‖²_F. this recovers the PG-EXTRA decentralized algorithm 49 / 60

50 Operator-splitting analysis 50 / 60

51 How to analyze splitting algorithms? problem: find x such that 0 ∈ (A + B)x (or 0 ∈ (A + B + C)x). iteration: z^{k+1} = T(z^k). require: a fixed point z* of T encodes a solution x*, and ‖z^{k+1} − z*‖² ≤ ‖z^k − z*‖² − C‖z^{k+1} − z^k‖². sufficiency: T is α-averaged, α ∈ (0, 1) 51 / 60

52 Averaged operator weaker than contractive operators; stronger than nonexpansive operators. T is α-averaged, α ∈ (0, 1), if for any z, z̄ ∈ H: ‖Tz − Tz̄‖² ≤ ‖z − z̄‖² − ((1−α)/α)‖(I − T)z − (I − T)z̄‖². assume z^{k+1} = Tz^k and z̄ = Tz̄; then ‖z^{k+1} − z̄‖² ≤ ‖z^k − z̄‖² − ((1−α)/α)‖z^{k+1} − z^k‖². consequences: z^{k+1} − z^k → 0; boundedness of {z^k}; a subsequence z^{k_j} ⇀ z weakly (by demiclosedness and monotonicity); z^k ⇀ z weakly and z = Tz; ‖z^{k+1} − z^k‖² = o(1/k) (Davis-Y. '15) 52 / 60

53 Examples A is monotone ⟹ J_{γA} := (I + γA)⁻¹ is (1/2)-averaged² and R_{γA} = 2J_{γA} − I is nonexpansive. T is nonexpansive ⟹ T₁ = (1 − α)I + αT is α-averaged, α ∈ (0, 1). A is β-cocoercive ⟹ F_{γA} := I − γA is (γ/(2β))-averaged. Baillon-Haddad: if f is convex and ∇f is (1/β)-Lipschitz, then I − γ∇f is (γ/(2β))-averaged. ²also known as firmly nonexpansive 53 / 60

54 Composition properties T₁, T₂ are nonexpansive ⟹ T₁ ∘ T₂ is nonexpansive; T₁, T₂ are averaged ⟹ T₁ ∘ T₂ is averaged 54 / 60

55 Consequences assume A, B are monotone and B is β-cocoercive, γ ∈ (0, 2β). FBS: J_{γA} ∘ F_{γB} is averaged. PRS: (2J_{γA} − I) ∘ (2J_{γB} − I) is nonexpansive. DRS: (1/2)I + (1/2)(2J_{γA} − I) ∘ (2J_{γB} − I) is (1/2)-averaged 55 / 60

56 A three-operator splitting scheme require: A, B maximally monotone, C cocoercive. Davis and Yin '15: T_DYS := I − J_{γB} + J_{γA} ∘ (2J_{γB} − I − γC ∘ J_{γB}) (evaluating T_DYS z evaluates J_{γA}, J_{γB}, and C only once each); reduces to BFS if A = 0, to FBS if B = 0, and to DRS if C = 0. equivalent conditions: 0 ∈ Ax + Bx + Cx ⟺ z = T_DYS(z), x = J_{γB}z 56 / 60
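A Davis-Yin sketch, not from the slides, for minimize ι_{x≥0}(x) + λ‖x‖₁ + (1/2)‖Kx − d‖², taking B as the normal cone of {x ≥ 0} (resolvent = projection), A = ∂(λ‖·‖₁) (resolvent = soft-thresholding), and C = the gradient of the quadratic (cocoercive); the data, λ, and γ ∈ (0, 2β) are illustrative.

import numpy as np

rng = np.random.default_rng(6)
K = rng.standard_normal((30, 50))
d = rng.standard_normal(30)
lam = 0.1
gamma = 1.0 / np.linalg.norm(K, 2) ** 2     # gamma in (0, 2*beta) with beta = 1/||K||^2

z = np.zeros(50)
for _ in range(1000):
    xB = np.maximum(z, 0.0)                                      # J_{gamma B} z
    u = 2 * xB - z - gamma * K.T @ (K @ xB - d)                  # 2 J_B z - z - gamma C(J_B z)
    xA = np.sign(u) * np.maximum(np.abs(u) - gamma * lam, 0.0)   # J_{gamma A} u
    z = z - xB + xA                                              # z <- T_DYS z
x = np.maximum(z, 0.0)
print("min entry:", x.min())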

57 Abstraction: KM (Krasnosel'skiĭ–Mann) iteration require: T is nonexpansive (1-Lipschitz). choose λ > 0 so that T_λ = (1 − λ)I + λT is an averaged operator. KM iteration: z^{k+1} = T_λ z^k. special cases: J_{γA}, (I − γA), T_FBS, T_BFS, T_DRS, T^λ_PRS, and T_DYS (the step-size of any cocoercive map therein must be bounded). convergence: if FixT ≠ ∅, then z^k ⇀ z* ∈ FixT. divergence: if FixT = ∅, then (z^k)_{k≥0} goes unbounded 57 / 60
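A generic KM sketch, not from the slides: z^{k+1} = (1 − λ)z^k + λT(z^k) for a nonexpansive T. The example T below, a projection composed with a gradient step (so an averaged map), is an illustrative stand-in for any of the splitting operators listed above.

import numpy as np

def km(T, z0, lam=0.5, num_iters=500):
    z = z0.copy()
    for _ in range(num_iters):
        z = (1.0 - lam) * z + lam * T(z)     # KM iteration with relaxation lam
    return z

c = np.array([3.0, 4.0])
def T(z, gamma=0.5):
    # FBS operator: gradient step on f(z) = (1/2)||z - c||^2, then project onto the unit ball
    w = z - gamma * (z - c)
    return w / max(1.0, np.linalg.norm(w))

print(km(T, np.zeros(2)))                    # approx c/||c|| = [0.6, 0.8]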

58 Open question Find an operator-splitting scheme for 0 ∈ (T₁ + ··· + T_m)x, m ≥ 4. require: no use of auxiliary variables; convergence is guaranteed under monotone T_i's 58 / 60

59 Summary monotone operator splitting is a set of powerful and elegant tools for many problems in signal processing, machine learning, computer vision, etc.; they give rise to parallel, distributed, and decentralized algorithms; under the hood: fixed-point and nonexpansive-operator theory. not covered: convergence rates of the objective error f^k − f* and the point error ‖z^k − z*‖²; accelerated rates by averaging and extrapolation 59 / 60

60 Thank you! References: H. Bauschke and P. Combettes. Convex Analysis and Monotone Operator Theory in Hilbert Spaces. Springer. N. Komodakis and J.-C. Pesquet. Playing with duality: an overview of recent primal-dual approaches for solving large-scale optimization problems. IEEE Signal Processing Magazine. D. Davis and W. Yin. Convergence rate analysis of several splitting schemes. UCLA CAM 14-51. D. Davis and W. Yin. A three-operator splitting method and its acceleration. UCLA CAM 15-13. 60 / 60
