Tight Rates and Equivalence Results of Operator Splitting Schemes


1 Tight Rates and Equivalence Results of Operator Splitting Schemes. Wotao Yin (UCLA Math). Workshop on Optimization for Modern Computing. Joint work with Damek Davis and Ming Yan. UCLA CAM reports 14-51, 14-58, and ... 1 / 45

2 Operator splitting methods

They are methods for solving problems such as
  minimize_x f(x) + g(x),
  minimize_{x,y} f(x) + g(y) subject to Ax + By = b,
  find x ∈ C_1 ∩ C_2,
by iteratively performing simple operations.

Algorithms: alternating projection, forward-backward splitting (FBS), Douglas-Rachford splitting (DRS), Peaceman-Rachford splitting (PRS), ADMM, etc.

Most of them can be written as x^{k+1} = T(x^k), where
  x* = T(x*) iff x* is a solution;
  T is nonexpansive; in particular, ‖T(x^k) − x*‖² ≤ ‖x^k − x*‖²;
  T is composed of I − γ∇h, prox_{γh}, and refl_{γh}.

2 / 45

3 This talk

Reviews some examples of prox operators and splitting algorithms.
Establishes new convergence results, many of which are tight.
Argues that the convergence of DRS, PRS, and ADMM automatically improves under better regularity properties.
Shows that DRS, PRS, and ADMM are self-dual primal-dual algorithms.

3 / 45

4 Proximal operator

Unlike operators given by explicit formulas, evaluating the prox operator requires solving an optimization problem:
  prox_{λf}(v) := argmin_x f(x) + (1/(2λ)) ‖x − v‖².

Examples:
  f = ι_C: Euclidean projection, prox_f(v) = Proj_C(v);
  closed-form formulas exist for norms and many separable functions.

Relation to the resolvent: prox_{λf} = (I + λ∂f)^{-1}, where f is proper, closed, and convex.
S maximally monotone ⇒ (I + λS)^{-1} is a point-to-point (single-valued) mapping.
Proximal point algorithm (PPA): x^{k+1} = (I + λS)^{-1}(x^k).

4 / 45
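
For concreteness, here is a minimal NumPy sketch (my own illustration, not from the slides) of two prox operators with well-known closed forms: soft-thresholding for λ‖·‖₁ and clipping for the indicator of a box.

```python
import numpy as np

def prox_l1(v, lam):
    """prox of lam*||.||_1: soft-thresholding, a standard closed form."""
    return np.sign(v) * np.maximum(np.abs(v) - lam, 0.0)

def proj_box(v, lo, hi):
    """Euclidean projection onto the box [lo, hi]^n, i.e., prox of its indicator."""
    return np.clip(v, lo, hi)

v = np.array([3.0, -0.2, 0.5])
print(prox_l1(v, 1.0))        # [2. 0. 0.]
print(proj_box(v, 0.0, 1.0))  # [1.  0.  0.5]
```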

5 Properties of prox_{λf}

Fixed points are optimal: f(x*) = min_x f(x) ⇔ x* = prox_{γf}(x*).

T = prox_{λf} is firmly nonexpansive, i.e.,
  ‖T(x) − T(y)‖² ≤ ‖x − y‖² − ‖(x − T(x)) − (y − T(y))‖²,
which gives weak convergence in Hilbert space and a rate for the fixed-point residual.

Interpretation: backward Euler / implicit (sub)gradient step:
  x^{k+1} = prox_{λf}(x^k) ⇔ x^{k+1} = (I + λ∂f)^{-1}(x^k) ⇔ x^k ∈ x^{k+1} + λ∂f(x^{k+1}) ⇔ x^{k+1} = x^k − λ∇̃f(x^{k+1}).
(We write ∇̃f for the particular subgradient of f uniquely determined by prox_{λf}.)

Moreau decomposition: x = prox_f(x) + prox_{f*}(x).
For a linear subspace S and f = ι_S, this reduces to x = Proj_S(x) + Proj_{S⊥}(x).

5 / 45

6 Forward-backward splitting (FBS)

  minimize_x r(x) + f(x)

Suppose A = ∂r and B = ∇f (f is differentiable). The optimality condition has the operator form
  0 ∈ (∂r + ∇f)x ⇔ 0 ∈ (A + B)x ⇔ (I − γB)x ∈ (I + γA)x ⇔ x = (I + γA)^{-1}(I − γB)x
(a backward step applied after a forward step).

Prox-gradient (prox-linear) iteration: x^{k+1} = prox_{γr}(x^k − γ∇f(x^k)).
(Sub)gradient form: x^{k+1} = x^k − γ∇̃r(x^{k+1}) − γ∇f(x^k).

6 / 45
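
A minimal sketch of the prox-gradient iteration above, applied to a toy LASSO instance of my own choosing; the problem data and the step-size rule are assumptions for illustration, not from the talk.

```python
import numpy as np

def fbs(grad_f, prox_r, x0, gamma, iters=500):
    """Forward-backward splitting: x <- prox_{gamma r}(x - gamma * grad f(x))."""
    x = x0.copy()
    for _ in range(iters):
        x = prox_r(x - gamma * grad_f(x), gamma)
    return x

# Toy LASSO instance (illustrative data): minimize 0.5*||Ax - b||^2 + mu*||x||_1
rng = np.random.default_rng(0)
A = rng.standard_normal((20, 50))
b = rng.standard_normal(20)
mu = 0.1
grad_f = lambda x: A.T @ (A @ x - b)
prox_r = lambda v, g: np.sign(v) * np.maximum(np.abs(v) - g * mu, 0.0)
gamma = 1.0 / np.linalg.norm(A, 2) ** 2   # step size 1/L with L = ||A||_2^2
x_hat = fbs(grad_f, prox_r, np.zeros(50), gamma)
```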

7 Reflection operator and averaged operator

Definition: refl_f := 2 prox_f − I.

Subgradient form:
  x_f^k = prox_f(z^k) = z^k − ∇̃f(x_f^k),
  z^{k+1} = refl_f(z^k) = z^k − 2∇̃f(x_f^k).

refl_f is nonexpansive, but not firmly nonexpansive.

Averaged operator: a weighted average of I and a nonexpansive T,
  T_λ := (1 − λ)I + λT, λ ∈ (0, 1].
So prox_f = (refl_f)_{1/2}.

Property: for λ ∈ (0, 1] and all x, y,
  ‖T_λ(x) − T_λ(y)‖² ≤ ‖x − y‖² − ((1 − λ)/λ) ‖(x − T_λ(x)) − (y − T_λ(y))‖².

7 / 45

8 Peaceman-Rachford splitting (PRS)

  minimize_z f(z) + g(z)

Iteration: z^{k+1} = T_PRS(z^k) := refl_{γf} ∘ refl_{γg}(z^k).
Subgradient form: z^{k+1} = z^k − 2γ∇̃f(x_f^k) − 2γ∇̃g(x_g^k).

Diagram (figure): z^k → x_g^k = prox_{γg}(z^k) → refl_{γg}(z^k) → x_f^k = prox_{γf}(refl_{γg}(z^k)) → T_PRS(z^k).

8 / 45

9 Peaceman-Rachford splitting (PRS)

Same diagram as the previous slide, with each arrow labeled by its (sub)gradient step:
  z^k → x_g^k = prox_{γg}(z^k)              (step −γ∇̃g(x_g^k))
      → refl_{γg}(z^k)                       (another −γ∇̃g(x_g^k))
      → x_f^k = prox_{γf}(refl_{γg}(z^k))    (step −γ∇̃f(x_f^k))
      → T_PRS(z^k)                           (another −γ∇̃f(x_f^k)).

9 / 45
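
One PRS step written directly as "reflect, then reflect", following the diagram above; a minimal sketch with generic prox callables (the interface and names are mine).

```python
def prs_step(z, prox_f, prox_g, gamma):
    """One Peaceman-Rachford step: z+ = refl_{gamma f}(refl_{gamma g}(z))."""
    xg = prox_g(z, gamma)         # x_g = prox_{gamma g}(z)
    rg = 2 * xg - z               # refl_{gamma g}(z)
    xf = prox_f(rg, gamma)        # x_f = prox_{gamma f}(refl_{gamma g}(z))
    return 2 * xf - rg, xf, xg    # T_PRS(z), plus the two prox points
```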

10 PRS iteration may not converge

Example: let C_1 be the x_1-axis and C_2 be the x_2-axis, and
  minimize ι_{C_1}(x) + ι_{C_2}(x).
(Figure: the iterates z^even and z^odd alternate between two points around the solution x*.)

PRS converges if one of the two functions is strongly convex.
The most well-known special case of PRS: the method of alternating projection.

10 / 45

11 Douglas-Rachford splitting (DRS) and relaxed PRS

Relaxed PRS: fix z^0, γ > 0, and relaxation parameters (λ_j)_{j≥0} ⊂ (0, 1];
  z^{k+1} = (T_PRS)_{λ_k}(z^k).

DRS corresponds to λ_k ≡ 1/2; it always converges weakly whenever a solution exists.¹
(T_PRS)_{λ_k}: reflect, reflect, λ_k-average.
Fixed points of T_PRS correspond to minimizers of f + g.
prox_{γg}(z^k) converges to a minimizer (proved in 2011 in Banach space).²

¹ Eckstein and Bertsekas, On the Douglas-Rachford Splitting Method and the Proximal Point Algorithm for Maximal Monotone Operators.
² Svaiter, On weak convergence of the Douglas-Rachford method.

11 / 45
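
The relaxed iteration is just a λ_k-average of the PRS step with the current point; a minimal sketch with generic prox callables (names and interface are mine), where λ = 1/2 gives DRS.

```python
def relaxed_prs_step(z, prox_f, prox_g, gamma, lam):
    """z+ = (1 - lam)*z + lam*T_PRS(z); lam = 1/2 is Douglas-Rachford (DRS)."""
    xg = prox_g(z, gamma)             # prox_{gamma g}(z)
    rg = 2 * xg - z                   # refl_{gamma g}(z)
    xf = prox_f(rg, gamma)            # prox_{gamma f}(refl_{gamma g}(z))
    t = 2 * xf - rg                   # T_PRS(z)
    return (1 - lam) * z + lam * t
```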

12 First-order algorithms: subgradient forms

  minimize_x f(x) + g(x)

(Sub)gradient descent:                     z^{k+1} = z^k − γ∇̃f(z^k) − γ∇̃g(z^k).
Proximal point algorithm (PPA):            z^{k+1} = z^k − γ∇̃f(z^{k+1}) − γ∇̃g(z^{k+1}).
Forward-backward splitting (FBS):          z^{k+1} = z^k − γ∇̃f(z^{k+1}) − γ∇g(z^k).
Relaxed Peaceman-Rachford splitting (PRS): z^{k+1} = z^k − 2λ_k (γ∇̃f(x_f^k) + γ∇̃g(x_g^k)).

12 / 45

13 ADMM

  minimize_{x,y} f(x) + g(y)  subject to  Ax + By = b

ADMM iteration:
  1. x^{k+1} = argmin_x f(x) + (w^k)^T Ax + (γ/2)‖Ax + By^k − b‖²;
  2. y^{k+1} = argmin_y g(y) + (w^k)^T By + (γ/2)‖Ax^{k+1} + By − b‖²;
  3. w^{k+1} = w^k + γ(Ax^{k+1} + By^{k+1} − b).

Equivalent to DRS applied to the dual problem.³
Lagrangian: L(x, y; w) = [f(x) + w^T Ax] + [g(y) + w^T By − w^T b] =: L_1(x; w) + L_2(y; w).
Define d_1(w) := −min_x L_1(x; w) and d_2(w) := −min_y L_2(y; w).
Dual problem: minimize_w d_1(w) + d_2(w).

³ Gabay 83.

13 / 45
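
A generic sketch of the three ADMM steps above; the two argmin subroutines are assumed to be supplied by the caller, since their form depends on f, g, A, and B (all names and the interface are mine).

```python
import numpy as np

def admm(argmin_x, argmin_y, A, B, b, gamma, y0, w0, iters=200):
    """ADMM sketch for: minimize f(x) + g(y) subject to Ax + By = b.
    argmin_x(w, y) and argmin_y(w, x) are assumed to solve steps 1 and 2."""
    y, w = y0.copy(), w0.copy()
    for _ in range(iters):
        x = argmin_x(w, y)                       # step 1: x-update
        y = argmin_y(w, x)                       # step 2: y-update
        w = w + gamma * (A @ x + B @ y - b)      # step 3: dual (multiplier) update
    return x, y, w
```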

14 Diagram of ADMM

(Figure: the dual-DRS picture of one ADMM step, showing z^k, w^k = prox_{γd_1}(z^k), refl_{γd_1}(z^k), and z^{k+1} on the segment toward T_PRS(z^k).)

14 / 45

15 Diagram of ADMM

(Same figure, with the arrows annotated by the dual (sub)gradient steps γAx^k and γ(By^{k+1} − b).)

15 / 45

16 Diagram of ADMM

(Same figure, further annotated with w^{k+1} = prox_{γd_1}(z^{k+1}) and the step γAx^{k+1}.)

16 / 45

17 Diagram of ADMM

(Same figure, completed: the step from w^k to w^{k+1} = prox_{γd_1}(z^{k+1}) is γ(Ax^{k+1} + By^{k+1} − b), matching the ADMM multiplier update.)

17 / 45

18 Generally: the Krasnosel'skiĭ-Mann (KM) iteration⁴ ⁵

Definitions: H a Hilbert space; T : H → H nonexpansive; fixed points: z ∈ H such that Tz = z.

Averaged iteration of T (aka KM iteration):
  z^{k+1} = T_{λ_k}(z^k) := (1 − λ_k)z^k + λ_k Tz^k.

Convergence:
  Converges weakly to a fixed point if (λ_k) is bounded away from 0 and 1.
  If T has no fixed point and (λ_k) is bounded away from 0, the sequence (z^j)_{j≥0} is unbounded (Browder-Göhde-Kirk fixed-point theorem).

Special cases: DRS, PRS, ADMM, FBS, PPA, ...

⁴ Krasnosel'skiĭ, Two remarks on the method of successive approximations (1955).
⁵ Mann, Mean value methods in iteration (1953).

18 / 45
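
A direct transcription of the averaged iteration into code (my own helper names); the operator T and the relaxation sequence are supplied by the caller, and the usage example is illustrative.

```python
import numpy as np

def km(T, z0, lambdas, tol=1e-12):
    """Krasnosel'skii-Mann iteration: z^{k+1} = (1 - lam_k) z^k + lam_k T(z^k)."""
    z = np.asarray(z0, dtype=float)
    for lam in lambdas:
        Tz = T(z)
        if np.linalg.norm(Tz - z) ** 2 < tol:   # fixed-point residual ||Tz - z||^2
            break
        z = (1 - lam) * z + lam * Tz
    return z

# Tiny usage example: T is the projection onto the set {z : z <= 1} (nonexpansive).
T = lambda z: np.minimum(z, 1.0)
z_star = km(T, np.array([5.0, -3.0]), [0.5] * 100)   # converges to [1, -3]
```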

19 Part 2: Convergence rates. Fixed-point residual

The fixed-point residual (FPR) of the KM iteration:
  ‖Tz^k − z^k‖² = (1/λ_k²) ‖z^{k+1} − z^k‖².

Tz − z = 0 often means z is optimal; a small FPR implies Tz^k ≈ z^k.
The property Tz^k − z^k → 0 is called asymptotic regularity.
In general, convergence of z^k → z* can be arbitrarily slow.
In optimization, Tz^k − z^k is usually some sort of gradient or subgradient, so it is a dual measure of optimality.
The rate of ‖Tz^k − z^k‖² controls the progress of convergence.
In ADMM: Tz^k − z^k = 2γ(Ax^k + By^k − b).

19 / 45

20 History of FPR

1978 (λ = 1/2): Brézis and Lions⁶ show the FPR satisfies
  ‖Tz^k − z^k‖² = O(1/(k + 1)).
If T = prox_{γf}, then
  ‖Tz^k − z^k‖² = O(1/(k + 1)²).

(General λ): Baillon and Bruck⁷ conjecture O(1/(k + 1)) for nonexpansive maps on Banach spaces.
(General λ): Cominetti, Soto, and Vaisman⁸ prove the conjecture of Baillon and Bruck.

⁶ Produits infinis de résolvantes.
⁷ The rate of asymptotic regularity is O(1/√k).
⁸ On the rate of convergence of Krasnosel'skiĭ-Mann iterations and their connection with sums of Bernoullis.

20 / 45

21 Convergence rates: objective error

(Non-ergodic) error: for minimizing h(x) with minimizer x*, measure
  h(x^k) − h(x*).
Its convergence to zero does not imply strong convergence.
Useful as a filter through which we view the distance to the solution.

Ergodic error: define the ergodic iterates
  x̄^k = (1/Λ_k) Σ_{i=0}^k λ_i x_g^i,  where Λ_k = Σ_{i=0}^k λ_i,
and measure the quantity
  h(x̄^k) − h(x*).

21 / 45

22 History of objective error

1967: Polyak proved the subgradient method achieves O(1/√(k + 1)).
1980s: Nemirovsky and Yudin show a lower complexity bound of Ω(1/√(k + 1)) for a general class of subgradient methods.
1980s(?): gradient descent shown to achieve O(1/(k + 1)).
Nesterov proposed accelerated gradient descent achieving O(1/(k + 1)²).
Güler proved O(1/(k + 1)) convergence for PPA.
Beck and Teboulle proved O(1/(k + 1)) for FBS and proposed an accelerated variant achieving O(1/(k + 1)²).
Goldstein, O'Donoghue, and Setzer proved O(1/(k + 1)) for ADMM when both primal objectives are strongly convex.
Wei and Ozdaglar showed O(1/(k + 1)) ergodic convergence of ADMM with specific binary matrices A and B.
He and Yuan showed O(1/(k + 1)) for a VI-based measure of optimality violation.
Recently: Boţ, Chambolle, Deng, Fadili, Lai, Ma, Monteiro, Peyré, Pock, Svaiter, Zhang, ... on violation of VI and Lagrangian optimality, and the duality gap.

22 / 45

23 Contributions on rates (with Damek Davis)

KM iteration: FPR o(1/k), tight; improved from O to o.
PPA based on prox_f: FPR o(1/k²), tight (by an example in Brezis-Lions '78), improved; objective o(1/k), tight (by an infinite-dimensional example).
FBS based on I − γ∇g and prox_{γf}: same rates as PPA, tight.

23 / 45

24 Relaxed PRS (including DRS and, for some results, also PRS): all results are new

FPR: o(1/k), tight (by an infinite-dimensional example).
Ergodic squared feasibility: O(1/k²), tight (by a 2D example).

Lipschitz f or g:
  ergodic objective: o(1/k), tight (by a 1D example);
  objective: o(1/√k), tight (by an infinite-dimensional example).

Strongly convex f or g: strong sequence convergence; best sequence error o(1/k); ergodic error O(1/k).

Gradient-Lipschitz f or g: best objective o(1/k); with γ chosen properly: objective o(1/k) and FPR o(1/k²).

Strongly convex + gradient Lipschitz (applied to either the same or different functions): all rates (FPR, objective, sequence) are linear.

24 / 45

25 ADMM (as dual DRS)

Dual subdifferentials: ∂d_f = −A ∘ (∂f)^{-1} ∘ (−A^T) and ∂d_g = −B ∘ (∂g)^{-1} ∘ (−B^T).

f strongly convex ⇒ d_f is differentiable with Lipschitz gradient (same for g).
f differentiable and AA^T full-rank ⇒ d_f is strongly convex (same for g).

Translate the results from relaxed PRS to ADMM:
  general case:
    ergodic squared constraint feasibility: O(1/k²);
    squared constraint feasibility: o(1/k);
    ergodic objective: O(1/k);
    objective: o(1/√k);
  strongly convex f or g: squared feasibility o(1/k²), objective o(1/k);
  strongly convex function + gradient-Lipschitz function + full-rank matrix: linear convergence of everything.

Note: results in Deng-Yin 2012 cover more cases.

25 / 45

26 Method of alternating projection for finding x ∈ C_1 ∩ C_2:
  linear regularity: a special case of PRS, all rates linear;
  in general: same rates as relaxed PRS with gradient-Lipschitz objectives;
  results extend to x ∈ C_1 ∩ ... ∩ C_n;
  when C_1 ∩ C_2 = ∅: convergence to the shortest line segment between the two sets.

DRS ("reflect, reflect, average") for finding x ∈ C_1 ∩ C_2:
  in general: produces a sequence of points in each set;
  distance to the other set: general o(1/k), ergodic O(1/k²).

26 / 45
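
A small sketch of the method of alternating projections for two sets; the two sets below (a coordinate hyperplane and an affine constraint) are my own illustrative choices, not from the slides.

```python
import numpy as np

def alternating_projection(proj1, proj2, x0, iters=100):
    """Method of alternating projections for finding a point in C1 ∩ C2."""
    x = np.asarray(x0, dtype=float)
    for _ in range(iters):
        x = proj1(proj2(x))
    return x

# Illustrative sets: C1 = {x : x[0] = 0}, C2 = {x : sum(x) = 1}.
proj1 = lambda x: np.concatenate(([0.0], x[1:]))      # zero out the first coordinate
proj2 = lambda x: x + (1.0 - x.sum()) / x.size        # shift onto the hyperplane 1^T x = 1
x = alternating_projection(proj1, proj2, np.array([2.0, -1.0, 3.0]))
```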

27 Results in a nutshell

Essentially tight upper and lower bounds on the fixed-point residual (FPR) for KM iterations.
The relaxed PRS point sequence can converge strongly yet arbitrarily slowly.
Objective convergence:
  On average, relaxed PRS performs as well as PPA.
  In the worst case, relaxed PRS performs nearly as slowly as the subgradient method.
  When g is Lipschitz, DRS performs as well as FBS, yet no knowledge of the Lipschitz constant is needed.

27 / 45

28 Results in a nutshell (continued)

The relaxed PRS algorithm converges linearly whenever one of the objectives is strongly convex and one has a Lipschitz derivative; these can be the same or different functions.
For feasibility problems, relaxed PRS converges linearly under regularity assumptions on the intersection.
For feasibility problems with no regularity, we can generate a point in each set and bound their distance to each other.
ADMM produces similar rates for the objective and the feasibility separately.

28 / 45

29 Part 3: Basic lemma for summable and monotonic sequences

Lemma. Suppose the nonnegative scalar sequences (λ_j)_{j≥0} and (a_j)_{j≥0} satisfy Σ_{i=0}^∞ λ_i a_i < ∞. Let Λ_k := Σ_{i=0}^k λ_i for k ≥ 0.

1. If (a_j)_{j≥0} is monotonically nonincreasing, then
     a_k ≤ (1/Λ_k) Σ_{i=0}^k λ_i a_i   and   a_k = o(1/(Λ_k − Λ_{⌈k/2⌉})).     (1)
   1.1 If (λ_j)_{j≥0} is bounded away from 0 and ∞, then a_k = o(1/(k + 1)).
   1.2 If λ_k = (k + 1)^p for p ≥ 0 and all k ≥ 1, then a_k = o(1/(k + 1)^{p+1}).

2. Suppose that the nonnegative scalar sequence (b_j)_{j≥0} is monotonically nonincreasing and satisfies b_k ≤ λ_k a_k − λ_{k+1} a_{k+1}. Then for all k ≥ 0,
     b_k ≤ (2/((k + 1)(k + 2))) Σ_{i=0}^k λ_i a_i   and   b_k = o(1/(k + 1)²).     (2)

29 / 45
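
A quick numerical sanity check of Part 1.1 with a toy sequence of my own choosing: λ_k ≡ 1 is bounded away from 0 and ∞, a_k = 1/(k+1)² is nonincreasing and summable, and (k+1)·a_k indeed tends to 0, consistent with a_k = o(1/(k+1)).

```python
import numpy as np

k = np.arange(0, 100000)
lam = np.ones_like(k, dtype=float)        # lambda_k = 1
a = 1.0 / (k + 1.0) ** 2                  # nonincreasing, sum(lam * a) < infinity
print(np.sum(lam * a))                    # ~ pi^2 / 6, finite
print(((k + 1) * a)[[10, 1000, 99999]])   # (k+1)*a_k -> 0, i.e., a_k = o(1/(k+1))
```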

30 Intuitions

Every convergence rate follows from this lemma.
The sequence (1/(j + 1))_{j≥0} is not summable, so a_k must decrease faster than 1/(k + 1).
The little-o rate follows because the tail sums Σ_{i=⌈k/2⌉}^{k} λ_i a_i → 0.

Extensions:
  Same assumptions except quasi-monotonicity, a_{k+1} ≤ a_k + e_k: then e_k has to converge one order faster than a_k to preserve the rate.
  Same assumptions but no monotonicity: let k_best := argmin_i {a_i : i = 0, ..., k}; then all the rates hold for a_{k_best} instead of a_k.

30 / 45

31 The idea of the FPR convergence rate

In general:
  The term ‖Tz^k − z^k‖² = (1/λ_k²) ‖z^k − z^{k+1}‖² is monotonic (nonincreasing).
  Furthermore, Σ_{k=0}^∞ λ_k(1 − λ_k) ‖Tz^k − z^k‖² < ∞.
  Thus, the rate is controlled by Σ_{i=0}^k λ_i(1 − λ_i).

When g is Lipschitz (PPA, FBS, DRS):
  We still have monotonicity, but now Σ_{i=0}^∞ (i + 1) ‖Tz^i − z^i‖² < ∞.
  This requires information about the objective functions.

31 / 45

32 Example: PPA convergence rate

  minimize_x h(x)

PPA iteration: z^{k+1} = z^k − γ∇̃h(z^{k+1}).
Minimizer: z*.
Objective error sequence: a_k = h(z^{k+1}) − h(z*).
FPR sequence: b_k = (1/γ) ‖z^{k+2} − z^{k+1}‖².

For any z,
  h(z^{k+1}) − h(z) ≤ ⟨z^{k+1} − z, ∇̃h(z^{k+1})⟩                  ((sub)gradient inequality)
                    = (1/γ) ⟨z^{k+1} − z, z^k − z^{k+1}⟩
                    = (1/(2γ)) (‖z^k − z‖² − ‖z^{k+1} − z‖² − ‖z^{k+1} − z^k‖²).

32 / 45

33

  h(z^{k+1}) − h(z) ≤ (1/(2γ)) (‖z^k − z‖² − ‖z^{k+1} − z‖² − ‖z^{k+1} − z^k‖²).

Nonnegativity: obvious.
Summability: at z = z*,
  a_k ≤ (1/(2γ)) (‖z^k − z*‖² − ‖z^{k+1} − z*‖² − ‖z^{k+1} − z^k‖²)  ⇒  Σ_{k=0}^∞ a_k ≤ (1/(2γ)) ‖z^0 − z*‖².
Monotonicity: at z = z^{k+1} (with the inequality applied at iteration k + 1),
  0 ≤ b_k = (1/γ) ‖z^{k+2} − z^{k+1}‖² ≤ h(z^{k+1}) − h(z^{k+2}) = a_k − a_{k+1}.

By the lemma: a_k = o(1/(k + 1)).
Also, b_k (= FPR) is monotonic ⇒ b_k = o(1/(k + 1)²).

33 / 45

34 A fundamental inequality

Proposition. If z⁺ = (T_PRS)_λ(z), then for all x ∈ dom(f) ∩ dom(g),
  4γλ (f(x_f) + g(x_g) − f(x) − g(x)) ≤ ‖z − x‖² − ‖z⁺ − x‖² + (1 − 1/λ)‖z⁺ − z‖²
                                      = 2⟨z⁺ − x, z − z⁺⟩ + (2 − 1/λ)‖z⁺ − z‖².

Non-ergodic rate: use Cauchy-Schwarz on the inner product.
  The objective error involves both x_f and x_g; it can be negative.
  The inequality also has the other side, i.e., a lower bound.
  Additional regularity properties enable a same-point objective error.
Ergodic rate: sum both sides, divide by Λ_j, and use Jensen's inequality.

34 / 45

35 The other cases

If f or g is Lipschitz, then
  Σ_{k=0}^∞ λ_k (f(x^k) + g(x^k) − f(x*) − g(x*)) < ∞  ⇒  best-point convergence rates.

When λ_k ≡ 1/2 and g is Lipschitz, we construct an auxiliary monotonic sequence that dominates the objective.

Under strong convexity,
  Σ_{k=0}^∞ λ_k ‖x^k − x*‖² < ∞  ⇒  running-best convergence rates.

The feasibility problem and the linear convergence result use the same fundamental inequality.

35 / 45

36 Other applications

More applications in the paper: feasibility; parallelized model fitting; linear programming (linear convergence); semidefinite programming.

36 / 45

37 Part 4: Primal-dual equivalence (with Ming Yan)

Definition: applying the same algorithm to both the primal and the dual problems, with proper initialization and parameters, the iterates of one can be explicitly reconstructed from those of the other.

Eckstein⁹ shows DRS is equivalent to DRS on the dual, for a special case.
Eckstein and Fukushima¹⁰ show ADMM is equivalent to ADMM on the dual, for the special case AA^T = I.
This equivalence is rarely mentioned in the literature.

We extend the result to ADMM and relaxed PRS (including DRS and PRS) for general cases, assuming only convexity and the existence of primal-dual solutions.
We introduce an equivalent primal-dual algorithm for the saddle-point problem.
We establish conditions for the equivalence between ADMMs with swapped orders of subproblems.

⁹ Eckstein, Splitting methods for monotone operators with applications to parallel optimization, PhD thesis.
¹⁰ Eckstein and Fukushima, Some reformulations and applications of the alternating direction, 1994.

37 / 45

38 Remarks

Different splittings lead to different ADMM iterates. Specifically, we consider
  minimize_{x,y} f(x) + g(y)  subject to  Ax + By = b,     (P1)
and its dual
  minimize_v f*(−A^T v) + g*(−B^T v) + ⟨v, b⟩.

ADMM can be applied to (P1) or to the reformulated dual problem
  minimize_{u,v} f*(−A^T u) + (g*(−B^T v) + ⟨v, b⟩)  subject to  u − v = 0.     (D1)

Examples: the YALL1 package¹¹, the l1-l1 model¹², traffic equilibrium¹³, dual alternating projection.

¹¹ J. Yang and Y. Zhang, Alternating direction algorithms for l1-problems in compressive sensing.
¹² Y. Xiao, H. Zhu, S.-Y. Wu, Primal and dual alternating direction algorithms for l1-l1-norm minimization problems in compressive sensing.
¹³ Primal: Fukushima 96; dual: Gabay 83.

38 / 45

39 Remarks

The penalty parameter λ in the primal ADMM becomes λ^{-1} in the dual ADMM; it balances primal and dual progress.
The perfect symmetry between primal and dual ADMMs suggests that ADMM is a primal-dual algorithm applied to a saddle-point formulation.

39 / 45

40 Saddle-point formulation and its algorithm

The original problem (P1) is equivalent to
  min_y max_u g(y) + ⟨u, By − b⟩ − f*(−A^T u).

Primal-Dual Algorithm: initialize u^0, u^{-1}, y^0, λ > 0; for k = 0, 1, ..., do:
  ū^k = 2u^k − u^{k−1};
  y^{k+1} = argmin_y g(y) + (2λ)^{-1} ‖By − By^k + λū^k‖²_2;
  u^{k+1} = argmin_u f*(−A^T u) + (λ/2) ‖u − u^k − λ^{-1}(By^{k+1} − b)‖²_2.

Remarks:
  If B = I, it is equivalent to Chambolle-Pock, whose paper also noted the equivalence between it and ADMM.
  ADMM and PD take the same number of iterations but different flops per iteration.

40 / 45
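
A structural sketch of this primal-dual loop; the two argmin subroutines (whose exact form depends on g, f*, A, and B) are assumed to be supplied by the caller, and all names and the interface are mine.

```python
import numpy as np

def primal_dual(argmin_y, argmin_u, B, b, lam, y0, u0, iters=200):
    """Sketch of the slide's primal-dual iteration.
    argmin_y(v): assumed to return argmin_y g(y) + (2*lam)^{-1} ||B y - v||^2.
    argmin_u(v): assumed to return argmin_u f*(-A^T u) + (lam/2) ||u - v||^2."""
    y, u, u_prev = y0.copy(), u0.copy(), u0.copy()
    for _ in range(iters):
        u_bar = 2 * u - u_prev                    # extrapolated dual point
        y = argmin_y(B @ y - lam * u_bar)         # y-update
        u_prev = u
        u = argmin_u(u + (B @ y - b) / lam)       # u-update (uses the new y)
    return y, u
```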

41 Application: extended monotropic programming

  minimize_{x_1, ..., x_N} Σ_{i=1}^N f_i(x_i)  subject to  Σ_{i=1}^N A_i x_i = b.

Convert the problem into the following ADMM-ready formulation:
  minimize_{{x_i},{y_i}} Σ_{i=1}^N f_i(x_i) + ι_{{y: Σ_{i=1}^N y_i = b}}(y)  subject to  A_i x_i − y_i = 0, i = 1, ..., N.

ADMM: iteratively update {x_i}, {y_i}, {u_i}.
Primal-Dual: iteratively update {y_i}, {u_i}, and at the end recover {x_i}.

41 / 45

42 Assumption: f*(−A^T u) has an easy form, for example when f_i(·) = (1/2)‖·‖², A_i ∈ R^{m×n_i}, and A_i A_i^T = I.

For each iteration k and block i:
  ADMM: 10m + 2m n_i flops;
  PD: 10m flops, due to the hiding of x_i.

Pre/post-processing:
  ADMM has a pre-step of m n_i flops for each i;
  PD has a post-step of m n_i flops for each i.

Distributed computing:
  Same communication for ADMM and PD;
  PD has better load balance since its per-iteration flop count is independent of n_i.

42 / 45

43 Swap the x/y-update order

Two similar ADMMs on the same problem:
  ADMM 1 updates y, then x, then the dual variable z;
  ADMM 2 updates x, then y, then the dual variable z.
In general they produce different iterates, but there are exceptions. Define
  F(s) := min_x f(x) + ι_{{x: Ax = s}}(x),          (3a)
  G(t) := min_y g(y) + ι_{{y: By = b − t}}(y).       (3b)

Theorem.
  1. Assume prox_G is affine. Given the iterates of ADMM 2, if z_2^0 ∈ ∂G(b − By_2^0), then the iterates of ADMM 1 can be recovered as
       x_1^k = x_2^{k+1},  z_1^k = z_2^k + λ^{-1}(Ax_2^k + By_2^k − b).
  2. Assume prox_F is affine. Given the iterates of ADMM 1, if z_1^0 ∈ ∂G(Ax_1^0), then the iterates of ADMM 2 can be recovered as
       y_2^k = y_1^{k+1},  z_2^k = z_1^k + λ^{-1}(Ax_1^k + By_1^{k+1} − b).

43 / 45

44 Affine proximal mapping

Definition. A mapping T is affine if, for any r_1 and r_2,
  T((1/2) r_1 + (1/2) r_2) = (1/2) T r_1 + (1/2) T r_2.

Proposition. Let G be a proper, closed, convex function. The following statements are equivalent:
  1. prox_G(·) is affine;
  2. prox_{λG}(·) is affine for λ > 0;
  3. a·prox_G(·) + b·I + c is affine for any scalars a, b, and c;
  4. prox_{G*}(·) is affine;
  5. G is convex quadratic (or affine, or constant) and has an affine domain (either the whole space or an intersection of hyperplanes).

If the function g obeys Part 5, then G defined in (3b) satisfies Part 5, too.

44 / 45
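
A quick numerical check (my own example) of the direction "convex quadratic implies affine prox": for G(x) = (1/2) x^T Q x, prox_G(v) = (I + Q)^{-1} v, which is linear and hence affine.

```python
import numpy as np

Q = np.array([[2.0, 0.5],
              [0.5, 1.0]])                              # convex quadratic G(x) = 0.5 x^T Q x
prox_G = lambda v: np.linalg.solve(np.eye(2) + Q, v)    # prox_G(v) = (I + Q)^{-1} v

r1, r2 = np.array([1.0, -2.0]), np.array([0.5, 3.0])
# Affine (here, linear) mappings preserve midpoints:
assert np.allclose(prox_G(0.5 * r1 + 0.5 * r2),
                   0.5 * prox_G(r1) + 0.5 * prox_G(r2))
```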

45 Conclusion

Our work:
  Analyzed relaxed PRS, ADMM, and KM iterations.
  Provided worst-case non-asymptotic convergence analysis.
  Provided lower complexity bounds for the basic rates.
  Showed the limitations of the methods.
  Established primal-dual equivalence and conditions for order-swapping equivalence.

Reflections:
  The methods are essentially nonexpansive operator splitting iterations applied to the optimality conditions of the original problem.
  When a splitting method uses points other than z^k, it lacks an objective function to decrease monotonically or to exploit for acceleration.
  Splitting methods based on implicit steps automatically adjust to the regularity properties present. (That is why they are fast in practice.)

45 / 45
