arxiv: v1 [math.oc] 23 May 2017


A DERANDOMIZED ALGORITHM FOR RP-ADMM WITH SYMMETRIC GAUSS-SEIDEL METHOD

JINCHAO XU, KAILAI XU, AND YINYU YE

arXiv:1705.08389v1 [math.OC] 23 May 2017

Abstract. For the multi-block alternating direction method of multipliers (ADMM), where the objective function can be decomposed into multiple block components, we show that with a block symmetric Gauss-Seidel iteration the algorithm converges quickly. The method applies a block symmetric Gauss-Seidel iteration in the primal update and a linear correction that can be derived in view of Richardson iteration. We also establish the linear convergence rate for linear systems.

Date: May 4, 2018.
Key words and phrases. Alternating direction method of multipliers (ADMM), block symmetric Gauss-Seidel.

1. Introduction. The alternating direction method of multipliers (ADMM) has recently become very popular because of its applications to large-scale problems such as big data and machine learning [1]. Consider the following optimization problem:

$$\min\ \theta_1(x_1) + \theta_2(x_2) \quad \text{s.t.} \quad A_1 x_1 + A_2 x_2 = b, \quad x_i \in \chi_i,\ i = 1, 2. \tag{1.1}$$

Here $\theta_i : \mathbb{R}^{n_i} \to \mathbb{R}$ are closed proper convex functions; $\chi_i \subseteq \mathbb{R}^{n_i}$ are closed convex sets; $A_i^T A_i$ are nonsingular; the feasible set is nonempty. We first construct the augmented Lagrangian

$$L(x_1, x_2, \lambda) = \theta_1(x_1) + \theta_2(x_2) + \lambda^T (A_1 x_1 + A_2 x_2 - b) + \frac{\beta}{2} \| A_1 x_1 + A_2 x_2 - b \|^2. \tag{1.2}$$

Here $\beta$ is a positive constant. Assume we already have $x^k = (x_1^k, x_2^k)$ and $\lambda^k$; we generate $x^{k+1}$ and $\lambda^{k+1}$ as follows.

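$$\begin{aligned}
x_1^{k+1} &= \arg\min_{x_1} \Big\{ \theta_1(x_1) + \frac{\beta}{2} \| A_1 x_1 + A_2 x_2^k - b \|^2 + (\lambda^k)^T (A_1 x_1 + A_2 x_2^k - b) \Big\}, \\
x_2^{k+1} &= \arg\min_{x_2} \Big\{ \theta_2(x_2) + \frac{\beta}{2} \| A_1 x_1^{k+1} + A_2 x_2 - b \|^2 + (\lambda^k)^T (A_1 x_1^{k+1} + A_2 x_2 - b) \Big\}, \\
\lambda^{k+1} &= \lambda^k + \beta (A_1 x_1^{k+1} + A_2 x_2^{k+1} - b).
\end{aligned}$$

It is well known that ADMM with two blocks converges [4][5][6]. A natural idea is to extend ADMM to more than two blocks, i.e., to consider the following optimization problem:

$$\min\ \sum_{i=1}^m \theta_i(x_i) \quad \text{s.t.} \quad \sum_{i=1}^m A_i x_i = b, \quad x_i \in \chi_i,\ i = 1, 2, \ldots, m. \tag{1.3}$$

Here $\theta_i : \mathbb{R}^{n_i} \to \mathbb{R}$ are closed proper convex functions; $\chi_i \subseteq \mathbb{R}^{n_i}$ are closed convex sets; $A_i^T A_i$ are nonsingular; the feasible set is nonempty. The corresponding augmented Lagrangian function reads

$$L^{(1)}(x_1, x_2, \ldots, x_m, \lambda) = \sum_{i=1}^m \theta_i(x_i) + \lambda^T \Big( \sum_{i=1}^m A_i x_i - b \Big) + \frac{\beta}{2} \Big\| \sum_{i=1}^m A_i x_i - b \Big\|_2^2. \tag{1.4}$$

Note that (1.4) can also be written in the form

$$L^{(2)}(x_1, x_2, \ldots, x_m, \lambda) = \sum_{i=1}^m \theta_i(x_i) + \frac{\beta}{2} \Big\| \sum_{i=1}^m A_i x_i - b + \frac{1}{\beta} \lambda \Big\|_2^2 - \frac{1}{2\beta} \| \lambda \|_2^2. \tag{1.5}$$

The direct extension of ADMM is shown in Algorithm 1.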
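As a brief check added for the reader, the equivalence of (1.4) and (1.5) follows from completing the square. The shorthand $r = \sum_{i=1}^m A_i x_i - b$ is introduced here only for this derivation and is not notation from the paper.

```latex
% Shorthand (an assumption of this note): r = \sum_{i=1}^m A_i x_i - b.
\begin{aligned}
\frac{\beta}{2}\Big\| r + \frac{1}{\beta}\lambda \Big\|_2^2 - \frac{1}{2\beta}\|\lambda\|_2^2
&= \frac{\beta}{2}\|r\|_2^2 + \lambda^T r + \frac{1}{2\beta}\|\lambda\|_2^2 - \frac{1}{2\beta}\|\lambda\|_2^2 \\
&= \lambda^T r + \frac{\beta}{2}\|r\|_2^2 .
\end{aligned}
```

Adding $\sum_i \theta_i(x_i)$ to both sides recovers (1.4) from (1.5); the term $-\frac{1}{2\beta}\|\lambda\|_2^2$ does not depend on $x$, which is why it can be dropped inside the block minimizations.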

Algorithm 1 Direct Extension of ADMM
Assume we already have $x^k$, $\lambda^k$.

$$x_i^{k+1} = \arg\min_{x_i} \Big\{ \theta_i(x_i) + \frac{\beta}{2} \Big\| \sum_{j=1}^{i-1} A_j x_j^{k+1} + A_i x_i + \sum_{j=i+1}^{m} A_j x_j^k - b + \frac{1}{\beta} \lambda^k \Big\|^2 \Big\}, \quad i = 1, 2, \ldots, m \tag{1.6}$$

$$\lambda^{k+1} = \lambda^k + \beta \Big( \sum_{j=1}^m A_j x_j^{k+1} - b \Big) \tag{1.7}$$

Unfortunately, such a scheme is not guaranteed to converge [2]. For example, consider applying ADMM with three blocks to the following problem:

$$\min\ 0 \quad \text{s.t.} \quad Ax = 0, \tag{1.8}$$

where

$$A = (A_1, A_2, A_3) = \begin{pmatrix} 1 & 1 & 1 \\ 1 & 1 & 2 \\ 1 & 2 & 2 \end{pmatrix}. \tag{1.9}$$

Then Algorithm 1 reduces to a linear iteration whose iteration matrix can be shown to have spectral radius greater than 1.

As a remedy for the divergence of multi-block ADMM, [7] proposes a randomized algorithm (RP-ADMM) and proves its convergence for a linear objective function by an expectation argument. The algorithm for computing $x^{k+1}, \lambda^{k+1}$ from $x^k, \lambda^k$ reads as Algorithm 2.

There are other efforts to improve the convergence of multi-block ADMM. For example, in [3] the authors suggest a correction step after $x^{k+1}, \lambda^{k+1}$ are obtained from Algorithm 1. To compare the different algorithms, we adapt the algorithm ADM-G in [3] to the form in Algorithm 3.
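The divergence claim for (1.8)-(1.9) can be checked numerically. The sketch below is not code from the paper. It assumes scalar blocks, zero objective and $b = 0$, so that one sweep of Algorithm 1 becomes the Gauss-Seidel step $(D + L)x^{k+1} = -L^T x^k - A^T \lambda^k / \beta$ on $A^T A = D + L + L^T$, and it tracks the scaled multiplier $\lambda / \beta$ so that $\beta$ cancels. The function name `direct_admm_iteration_matrix` is hypothetical.

```python
import numpy as np

def direct_admm_iteration_matrix(A):
    """Iteration matrix of Algorithm 1 for min 0 s.t. Ax = 0 (scalar blocks).

    The state is (x, lam / beta); with this scaling beta drops out.
    """
    p, n = A.shape
    S = A.T @ A
    D = np.diag(np.diag(S))          # diagonal part of A^T A
    L = np.tril(S, k=-1)             # strictly lower triangular part
    DL_inv = np.linalg.inv(D + L)
    # x^{k+1} = -(D + L)^{-1} (L^T x^k + A^T lam^k / beta)
    T_x = np.hstack([-DL_inv @ L.T, -DL_inv @ A.T])
    # lam^{k+1} / beta = lam^k / beta + A x^{k+1}
    T_lam = A @ T_x + np.hstack([np.zeros((p, n)), np.eye(p)])
    return np.vstack([T_x, T_lam])

# Matrix (1.9) from the counterexample.
A = np.array([[1.0, 1.0, 1.0],
              [1.0, 1.0, 2.0],
              [1.0, 2.0, 2.0]])
T = direct_admm_iteration_matrix(A)
rho = np.max(np.abs(np.linalg.eigvals(T)))
print("spectral radius of the direct 3-block ADMM iteration:", rho)
```

A spectral radius above 1 means that some initial points produce iterates that grow without bound, which is the behavior the counterexample is designed to expose.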

Algorithm 2 RP-ADMM
(1) Pick a permutation $\sigma$ of $\{1, 2, \ldots, m\}$ uniformly at random.
(2) For $i = 1, 2, \ldots, m$, compute $x_{\sigma(i)}^{k+1}$ by

$$x_{\sigma(i)}^{k+1} = \arg\min_{x_{\sigma(i)} \in \chi_{\sigma(i)}} L^{(2)}\big( x_{\sigma(1)}^{k+1}, \ldots, x_{\sigma(i-1)}^{k+1}, x_{\sigma(i)}, x_{\sigma(i+1)}^{k}, \ldots, x_{\sigma(m)}^{k}; \lambda^k \big) \tag{1.10}$$

(3) Update the multiplier:

$$\lambda^{k+1} = \lambda^k + \beta \Big( \sum_{j=1}^m A_j x_j^{k+1} - b \Big) \tag{1.11}$$

Algorithm 3 The ADMM with Gaussian back substitution (G-ADMM)
Let $\alpha \in (0, 1)$, $\beta > 0$.
(1) Prediction step.

$$\tilde{x}_i^k = \arg\min_{x_i} \Big\{ \theta_i(x_i) + \frac{\beta}{2} \Big\| \sum_{j=1}^{i-1} A_j \tilde{x}_j^k + A_i x_i + \sum_{j=i+1}^{m} A_j x_j^k - b + \frac{1}{\beta} \lambda^k \Big\|^2 \Big\}, \quad i = 1, 2, \ldots, m \tag{1.12}$$

(2) Correction step. Assume $A^T A = D + L + L^T$, where $L$ is a lower triangular matrix and $D$ is a diagonal matrix.

$$\begin{aligned}
x^{k+1} &= x^k + \alpha (D + L)^{-1} D (\tilde{x}^k - x^k), \\
\lambda^{k+1} &= \lambda^k + \alpha \beta \Big( \sum_{j=1}^m A_j \tilde{x}_j^k - b \Big).
\end{aligned} \tag{1.13}$$

In this paper, we propose an approach based on insights into the two algorithms described above (RP-ADMM and G-ADMM). We attribute the convergence of RP-ADMM to the symmetrization of the optimization step. Indeed, randomly permuting the order of the optimization variables symmetrizes the update procedure for $x^k$. In Algorithm 1, the convergence behavior may change completely if we exchange the order of the variables $x_1, x_2, \ldots, x_m$: for example, we may get a convergent scheme with the update order $x_1 \to x_2 \to \cdots \to x_m$ and a divergent scheme with the order $x_m \to x_{m-1} \to \cdots \to x_1$. This is undesirable, and Algorithm 2 overcomes it by random shuffling. Based on this observation, we propose a symmetric Gauss-Seidel ADMM (S-ADMM) scheme for the multi-block problem and prove its convergence in the worst case, i.e., when the objective function is 0 (not strongly convex).

Meanwhile, in G-ADMM the authors use a correction step after each loop. In our algorithm, we use Richardson iteration [9] to correct the dual variables in the Schur decomposition, which turns out to be gradient descent for the dual variables in the augmented Lagrangian method. We remark that, although Richardson iteration is not the best iterative method for solving $Ax = b$, it still outperforms Algorithm 3 in many of the numerical examples we ran.

The rest of the paper is organized as follows. In Section 2, we describe the S-ADMM algorithm and prove its convergence in the worst case. In Section 3, we apply the algorithm to some concrete examples and compare it numerically with the existing methods.

2. S-ADMM. We propose the following algorithm to solve the problem.

Algorithm 4 S-ADMM
Let $\beta > 0$, $\omega \in (0, 2\beta)$. Assume we already have $x^k$, $\lambda^k$.
(1) Forward optimization.

$$\tilde{x}_i^k = \arg\min_{x_i} \Big\{ \theta_i(x_i) + \frac{\beta}{2} \Big\| \sum_{j=1}^{i-1} A_j \tilde{x}_j^k + A_i x_i + \sum_{j=i+1}^{m} A_j x_j^k - b + \frac{1}{\beta} \lambda^k \Big\|^2 \Big\}, \quad i = 1, 2, \ldots, m \tag{2.1}$$

(2) Backward optimization (with $x_m^{k+1} = \tilde{x}_m^k$).

$$x_i^{k+1} = \arg\min_{x_i} \Big\{ \theta_i(x_i) + \frac{\beta}{2} \Big\| \sum_{j=1}^{i-1} A_j \tilde{x}_j^k + A_i x_i + \sum_{j=i+1}^{m} A_j x_j^{k+1} - b + \frac{1}{\beta} \lambda^k \Big\|^2 \Big\}, \quad i = m-1, m-2, \ldots, 1 \tag{2.2}$$

(3) Dual update.

$$\lambda^{k+1} = \lambda^k + \omega \Big( \sum_{j=1}^m A_j x_j^{k+1} - b \Big) \tag{2.3}$$

Consider the following optimization problem:

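$$\min_{x_i,\ i = 1, 2, \ldots, m}\ \sum_{i=1}^m \frac{1}{2} \theta_i x_i^2 \tag{2.4}$$

$$\text{s.t.} \quad \sum_{i=1}^m A_i x_i = b. \tag{2.5}$$

Here $\theta_i \geq 0$. The augmented Lagrangian is

$$L(x_1, x_2, \ldots, x_m; \mu) = \frac{1}{2} \sum_{i=1}^m \theta_i x_i^2 - \mu^T \Big( \sum_{i=1}^m A_i x_i - b \Big) + \frac{\beta}{2} \Big\| \sum_{i=1}^m A_i x_i - b \Big\|^2. \tag{2.6}$$

The optimization problem is equivalent to solving

$$\frac{\partial L}{\partial x_i} = 0,\ i = 1, 2, \ldots, m, \qquad \frac{\partial L}{\partial \mu} = 0, \tag{2.7}$$

i.e.

$$\begin{pmatrix}
\theta_1 + \beta A_1^T A_1 & \beta A_1^T A_2 & \cdots & \beta A_1^T A_m & -A_1^T \\
\beta A_2^T A_1 & \theta_2 + \beta A_2^T A_2 & \cdots & \beta A_2^T A_m & -A_2^T \\
\vdots & \vdots & \ddots & \vdots & \vdots \\
\beta A_m^T A_1 & \beta A_m^T A_2 & \cdots & \theta_m + \beta A_m^T A_m & -A_m^T \\
A_1 & A_2 & \cdots & A_m & 0
\end{pmatrix}
\begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_m \\ \mu \end{pmatrix}
=
\begin{pmatrix} \beta A_1^T b \\ \beta A_2^T b \\ \vdots \\ \beta A_m^T b \\ b \end{pmatrix}. \tag{2.8}$$

Let

$$\theta = \begin{pmatrix} \theta_1 & & & \\ & \theta_2 & & \\ & & \ddots & \\ & & & \theta_m \end{pmatrix} \tag{2.9}$$

and $A = (A_1\ A_2\ \cdots\ A_m)$, $G = \theta + \beta A^T A$. Then (2.8) is equivalent to

$$\begin{pmatrix} G & -A^T \\ A & 0 \end{pmatrix} \begin{pmatrix} x \\ \mu \end{pmatrix} = \begin{pmatrix} \beta A^T b \\ b \end{pmatrix}. \tag{2.10}$$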
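To make the saddle-point system (2.10) concrete, the following sketch assembles and solves it directly with NumPy for a small instance. The data are illustrative assumptions rather than values from the paper: the columns of the matrix (1.9) play the role of $A_1, A_2, A_3$, and `theta`, `b` and `beta` are chosen arbitrarily.

```python
import numpy as np

A = np.array([[1.0, 1.0, 1.0],
              [1.0, 1.0, 2.0],
              [1.0, 2.0, 2.0]])      # columns stand for A_1, A_2, A_3
theta = np.diag([1.0, 2.0, 3.0])     # illustrative theta_i > 0 (assumption)
b = np.array([1.0, 2.0, 3.0])        # illustrative right-hand side (assumption)
beta = 4.0

p, n = A.shape
G = theta + beta * (A.T @ A)                       # G = theta + beta A^T A
K = np.block([[G, -A.T], [A, np.zeros((p, p))]])   # matrix of (2.10)
rhs = np.concatenate([beta * (A.T @ b), b])        # right-hand side of (2.10)

sol = np.linalg.solve(K, rhs)
x, mu = sol[:n], sol[n:]

# At the solution the constraint holds, so the first block row of (2.10)
# reduces to theta x = A^T mu.
print("constraint residual:  ", np.linalg.norm(A @ x - b))
print("stationarity residual:", np.linalg.norm(theta @ x - A.T @ mu))
```

A direct solve like this is only practical for small problems; the iterative schemes discussed in the text avoid forming and factorizing the full saddle-point matrix.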

Now we assume $G$ is invertible. Note this is true if $\theta_i > 0$ for all $i$, or if $A^T A$ is nonsingular. More generally, let $A^T A = U^T \Lambda U$, where $U$ is an orthogonal matrix; then $G = U^T (\theta + \beta \Lambda) U$. We assume $\theta_i > 0$ or $\Lambda_{ii} > 0$ for all $i = 1, 2, \ldots, m$. If we do Schur decomposition, we obtain

$$\begin{pmatrix} G & -A^T \\ 0 & A G^{-1} A^T \end{pmatrix} \begin{pmatrix} x \\ \mu \end{pmatrix} = \begin{pmatrix} \beta A^T b \\ b - \beta A G^{-1} A^T b \end{pmatrix}. \tag{2.11}$$

The well-known augmented Lagrangian method solves (2.11) by Gauss-Seidel iteration [9][8], i.e.

$$G x^{n+1} - A^T \mu^n = \beta A^T b \tag{2.12}$$

$$\mu^{n+1} = \mu^n + \omega \big( b - \beta A G^{-1} A^T b - A G^{-1} A^T \mu^n \big) \tag{2.13}$$

Note that (2.13) is exactly Richardson iteration for $A G^{-1} A^T \mu = b - \beta A G^{-1} A^T b$. Due to (2.12), we have

$$x^{n+1} = G^{-1} A^T \mu^n + \beta G^{-1} A^T b. \tag{2.14}$$

Then

$$A x^{n+1} = A G^{-1} A^T \mu^n + \beta A G^{-1} A^T b, \tag{2.15}$$

so (2.13) is exactly

$$\mu^{n+1} = \mu^n + \omega (b - A x^{n+1}), \tag{2.16}$$

which is consistent with the augmented Lagrangian method.

We think the key here is to symmetrize the iteration process for (2.12). Therefore, we propose the symmetric Gauss-Seidel method for (2.12). Let $G = L + L^T + D$, where $L$ is a lower triangular matrix and $D$ is a diagonal matrix. We have

$$\begin{aligned}
(L + D) x^{n+\frac{1}{2}} &= -L^T x^n + A^T \mu^n + \beta A^T b, \\
(L^T + D) x^{n+1} &= -L x^{n+\frac{1}{2}} + A^T \mu^n + \beta A^T b, \\
x^{n+1} &= x^n + \bar{G}^{-1} \big( A^T \mu^n + \beta A^T b - G x^n \big),
\end{aligned} \tag{2.17}$$

where $\bar{G} = (L + D) D^{-1} (L^T + D)$. Instead of solving (2.13) directly, we substitute $G$ by $\bar{G}$:
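The third line of (2.17) says that a forward sweep followed by a backward sweep equals a single preconditioned step with $\bar{G} = (L + D) D^{-1} (L^T + D)$. The sketch below checks this numerically on a small example. It is an illustration only: it assumes $\theta = 0$ and scalar blocks, uses the matrix (1.9), and the vector `f` is a random stand-in for $A^T \mu^n + \beta A^T b$.

```python
import numpy as np

rng = np.random.default_rng(0)
A = np.array([[1.0, 1.0, 1.0],
              [1.0, 1.0, 2.0],
              [1.0, 2.0, 2.0]])
beta = 4.0
G = beta * (A.T @ A)          # theta = 0 (assumption of this sketch)
D = np.diag(np.diag(G))
L = np.tril(G, k=-1)          # strictly lower triangular part, G = L + L^T + D

x = rng.standard_normal(3)    # current iterate x^n
f = rng.standard_normal(3)    # stands for A^T mu^n + beta A^T b

# Two triangular sweeps (first two lines of (2.17)).
x_half = np.linalg.solve(L + D, f - L.T @ x)
x_sweeps = np.linalg.solve(L.T + D, f - L @ x_half)

# One preconditioned step (third line of (2.17)).
G_bar = (L + D) @ np.linalg.inv(D) @ (L.T + D)
x_formula = x + np.linalg.solve(G_bar, f - G @ x)

print("sweeps agree with G_bar formula:", np.allclose(x_sweeps, x_formula))
```

In practice only the two triangular solves are performed; $\bar{G}$ is never formed explicitly and appears here only to verify the identity.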

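$$\mu^{n+1} = \mu^n + \omega (b - A x^{n+1}). \tag{2.18}$$

We will later prove that the scheme converges. In sum, the scheme is

$$(L + D) x^{n+\frac{1}{2}} = -L^T x^n + A^T \mu^n + \beta A^T b \tag{2.19}$$

$$(L^T + D) x^{n+1} = -L x^{n+\frac{1}{2}} + A^T \mu^n + \beta A^T b \tag{2.20}$$

$$\mu^{n+1} = \mu^n + \omega (b - A x^{n+1}) \tag{2.21}$$

or, in compact form,

$$\begin{pmatrix} x^{n+1} \\ \mu^{n+1} \end{pmatrix} = \begin{pmatrix} x^n \\ \mu^n \end{pmatrix} + \begin{pmatrix} \bar{G} & 0 \\ \omega A & I \end{pmatrix}^{-1} \left( \begin{pmatrix} \beta A^T b \\ \omega b \end{pmatrix} - \begin{pmatrix} G & -A^T \\ \omega A & 0 \end{pmatrix} \begin{pmatrix} x^n \\ \mu^n \end{pmatrix} \right). \tag{2.22}$$

We now select an appropriate $\omega$ such that

$$\rho \left( I - \begin{pmatrix} \bar{G} & 0 \\ \omega A & I \end{pmatrix}^{-1} \begin{pmatrix} G & -A^T \\ \omega A & 0 \end{pmatrix} \right) < 1. \tag{2.23}$$

Theorem 2.1. Assume $H = \bar{G}^{-1} G$ satisfies $\lambda(H) \subset (0, 1)$ and $G = \beta A^T A$ is invertible. If $0 < \omega < 2\beta$, then (2.23) holds.

Proof. We have

$$\lambda \left( \begin{pmatrix} \bar{G} & 0 \\ \omega A & I \end{pmatrix}^{-1} \begin{pmatrix} G & -A^T \\ \omega A & 0 \end{pmatrix} \right) = \lambda \begin{pmatrix} \bar{G}^{-1} G & -\bar{G}^{-1} A^T \\ -\omega A \bar{G}^{-1} G + \omega A & \omega A \bar{G}^{-1} A^T \end{pmatrix}. \tag{2.24}$$

Assume $\lambda$ is an eigenvalue of the above matrix and $\begin{pmatrix} x \\ y \end{pmatrix}$ is the corresponding eigenvector. We then have

$$\begin{pmatrix} \bar{G}^{-1} G & -\bar{G}^{-1} A^T \\ -\omega A \bar{G}^{-1} G + \omega A & \omega A \bar{G}^{-1} A^T \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} = \lambda \begin{pmatrix} x \\ y \end{pmatrix}, \tag{2.25}$$

which is equivalent to

$$H x - H z = \lambda x \tag{2.26}$$

$$-\omega K H x + \omega K x + \omega K H z = \lambda z \tag{2.27}$$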
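The scheme (2.19)-(2.21) and the spectral radius in (2.23) can be sketched in a few lines of NumPy. This is an illustration under stated assumptions, not the authors' implementation: it takes $\theta = 0$ and scalar blocks, uses the matrix (1.9) with $b = 0$ and $x^0 = (1, 1, 1)$, and picks $\omega = \beta = 4$ as one admissible choice in $(0, 2\beta)$. The function names are hypothetical.

```python
import numpy as np

def sadmm_linear(A, b, beta, omega, x0, mu0, iters):
    """Run (2.19)-(2.21) with theta = 0, i.e. G = beta * A^T A."""
    G = beta * (A.T @ A)
    D = np.diag(np.diag(G))
    L = np.tril(G, k=-1)
    x, mu = x0.astype(float), mu0.astype(float)
    for _ in range(iters):
        f = A.T @ mu + beta * (A.T @ b)
        x_half = np.linalg.solve(L + D, f - L.T @ x)   # forward sweep (2.19)
        x = np.linalg.solve(L.T + D, f - L @ x_half)   # backward sweep (2.20)
        mu = mu + omega * (b - A @ x)                  # dual update (2.21)
    return x, mu

def sadmm_iteration_matrix(A, beta, omega):
    """Iteration matrix of the compact form (2.22), as it appears in (2.23)."""
    p, n = A.shape
    G = beta * (A.T @ A)
    D = np.diag(np.diag(G))
    L = np.tril(G, k=-1)
    G_bar = (L + D) @ np.linalg.inv(D) @ (L.T + D)
    B = np.block([[G_bar, np.zeros((n, p))], [omega * A, np.eye(p)]])
    M = np.block([[G, -A.T], [omega * A, np.zeros((p, p))]])
    return np.eye(n + p) - np.linalg.solve(B, M)

A = np.array([[1.0, 1.0, 1.0],
              [1.0, 1.0, 2.0],
              [1.0, 2.0, 2.0]])
b = np.zeros(3)
beta, omega = 4.0, 4.0   # omega = beta is an assumption of this sketch

rho = np.max(np.abs(np.linalg.eigvals(sadmm_iteration_matrix(A, beta, omega))))
x, mu = sadmm_linear(A, b, beta, omega, np.ones(3), np.zeros(3), iters=20000)
residual = np.linalg.norm(A @ x - b)
print("spectral radius:", rho)
print("final residual ||Ax - b||:", residual)
```

The iteration count is deliberately generous, since the matrix (1.9) is ill-conditioned and the contraction factor may be close to 1.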

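where $z = G^{-1} A^T y$, $K = G^{-1} A^T A = \frac{1}{\beta} I$, and $H = \bar{G}^{-1} G$. Let $c = \frac{\omega}{\beta}$; it is easy to obtain

$$c (1 - \lambda) x = \lambda z. \tag{2.28}$$

From this we can see that if $\lambda = 0$, then $x = 0$, and from (2.26) we have $Hz = 0$; as $H$ is nonsingular, we have $z = 0$, a contradiction. Thus $\lambda \neq 0$ and

$$z = \frac{c (1 - \lambda)}{\lambda} x. \tag{2.29}$$

Plugging it into (2.26), we have

$$H x = \frac{\lambda}{1 - \frac{c(1 - \lambda)}{\lambda}} x. \tag{2.30}$$

This indicates

$$0 < \frac{\lambda}{1 - \frac{c(1 - \lambda)}{\lambda}} < 1. \tag{2.31}$$

Now we estimate $\lambda$. Let

$$\frac{\lambda}{1 - \frac{c(1 - \lambda)}{\lambda}} = \xi \in (0, 1). \tag{2.32}$$

We have

$$\lambda^2 - (\xi + c \xi) \lambda + c \xi = 0, \tag{2.33}$$

$$\lambda = \frac{\xi (1 + c) \pm \sqrt{\xi^2 (1 + c)^2 - 4 \xi c}}{2}. \tag{2.34}$$

(1) If $\xi^2 (1 + c)^2 - 4 \xi c \geq 0$, i.e. $\xi \geq \frac{4c}{(1 + c)^2}$, $\lambda$ will be real. Note that in this case $c \neq 1$. From the first inequality in (2.31) we have

$$\lambda > \frac{c}{c + 1}, \tag{2.35}$$

and from the second

$$(\lambda - c)(\lambda - 1) < 0. \tag{2.36}$$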

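On condition that $c < 2$, i.e. $\omega < 2\beta$, we have $0 < \lambda < 2$, and thus the statement holds.

(2) If $\xi^2 (1 + c)^2 - 4 \xi c < 0$, i.e. $\xi < \frac{4c}{(1 + c)^2}$, $\lambda$ will be a complex number, and we have

$$|\lambda - 1| = \frac{1}{2} \sqrt{(\xi (1 + c) - 2)^2 + 4 \xi c - \xi^2 (1 + c)^2} = \sqrt{1 - \xi} < 1. \tag{2.37}$$

Thus, (2.23) holds.

Remark 2.2. We would like to point out the relationship between the iteration matrix

$$\begin{pmatrix} I - \bar{G}^{-1} A^T A & \bar{G}^{-1} A^T \\ -A + A \bar{G}^{-1} A^T A & I - A \bar{G}^{-1} A^T \end{pmatrix} \tag{2.38}$$

when $\omega = 1$ and the matrix analyzed in [7]. In [7], the authors discuss the eigenvalues of

$$M = \begin{pmatrix} I - Q A^T A & Q A^T \\ -A + A Q A^T A & I - A Q A^T \end{pmatrix} \tag{2.39}$$

and prove that

$$\frac{(1 - \lambda)^2}{1 - 2\lambda} \in \mathrm{eig}(Q A^T A) \iff \lambda \in \mathrm{eig}(M). \tag{2.40}$$

Based on this observation, it is proved that if $\mathrm{eig}(Q A^T A) \subset (0, \frac{4}{3})$, then $\mathrm{eig}(M) \subset (-1, 1)$. In our example, $Q = \bar{G}^{-1}$ when $\omega = 1$. Note

$$\mathrm{eig}(Q A^T A) = \mathrm{eig}(\bar{G}^{-1} G) \subset (0, 1). \tag{2.41}$$

However, this does not mean that the convergence behavior is better for S-ADMM, as we can only obtain $\mathrm{eig}(M) \subset (-1, 1)$ from (2.41).

We can now actually prove that the scheme (2.19)-(2.21) converges.

Lemma 2.3. The eigenvalues of $\bar{G}^{-1} G$ are distributed in $(0, 1)$.

Proof. Since $\bar{G} = (L + D) D^{-1} (L^T + D) = G + L D^{-1} L^T$, we have $G^{-1} \bar{G} = I + G^{-1} L D^{-1} L^T$. Note

$$\mathrm{eig}(G^{-1} L D^{-1} L^T) = \mathrm{eig}(G^{-\frac{1}{2}} L D^{-1} L^T G^{-\frac{1}{2}}), \tag{2.42}$$

and therefore the eigenvalues of $G^{-1} \bar{G}$ are all greater than 1. Thus the eigenvalues of $\bar{G}^{-1} G$ are distributed in $(0, 1)$.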
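The spectrum of $\bar{G}^{-1} G$ can be inspected numerically on the matrix (1.9). The sketch below is illustrative only and assumes $\theta = 0$, scalar blocks and $\beta = 4$. It first confirms the expansion $\bar{G} = G + L D^{-1} L^T$ and then computes the eigenvalues through a Cholesky factor of $\bar{G}$, so that a symmetric eigenvalue routine can be used. On this example the computed eigenvalues are positive and do not exceed 1 up to rounding; the value 1 itself shows up because a strictly lower triangular $L$ makes $L D^{-1} L^T$ singular, so the upper endpoint of the interval may be attained in such cases.

```python
import numpy as np

A = np.array([[1.0, 1.0, 1.0],
              [1.0, 1.0, 2.0],
              [1.0, 2.0, 2.0]])
beta = 4.0
G = beta * (A.T @ A)          # theta = 0 (assumption of this sketch)
D = np.diag(np.diag(G))
L = np.tril(G, k=-1)          # strictly lower triangular part of G
G_bar = (L + D) @ np.linalg.inv(D) @ (L.T + D)

# Expansion of the product: G_bar = G + L D^{-1} L^T.
expansion_ok = np.allclose(G_bar, G + L @ np.linalg.inv(D) @ L.T)

# Eigenvalues of G_bar^{-1} G equal those of the symmetric matrix
# C^{-1} G C^{-T}, where G_bar = C C^T is a Cholesky factorization.
C = np.linalg.cholesky(G_bar)
C_inv = np.linalg.inv(C)
xi = np.linalg.eigvalsh(C_inv @ G @ C_inv.T)

print("expansion holds:", expansion_ok)
print("eigenvalues of G_bar^{-1} G:", xi)
```

Because scaling $G$ by $\beta$ scales $\bar{G}$ by the same factor, these eigenvalues do not depend on the choice of $\beta$.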

Corollary 2.4. If $0 < \omega < 2\beta$, then the scheme (2.19)-(2.21) converges for $\theta = 0$ and full column rank $A$.

3. Numerical Examples. In this section, we apply S-ADMM to the counterexample proposed in [2] and to a quadratic objective function with a 1-norm penalty and linear constraints. We compare the method with Algorithm 2 and Algorithm 3. The code is written in Python and run on an x86_64 Linux machine.

3.1. Counterexample in [2]. The problem is presented in (1.8). It is shown in [7] that cyclic ADMM diverges on this problem, and in the same paper the authors prove that Algorithm 2 converges with high probability. To make the results comparable, we use the same data $b = (0, 0, 0)$ and the same initial point $x^0 = (1, 1, 1)$ for all methods, with $\beta = 4.0$ and $\alpha = 0.2$. The result is presented in Figure 1. We see that S-ADMM performs as well as RP-ADMM, but it oscillates more than G-ADMM.

Figure 1. Counterexample: the curves show $\|Ax^k - b\|$ over the iterations.

3.2. Quadratic Objective Function with 1-norm Penalty. In this example, we solve the following manufactured optimization problem.

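$$\min\ \sum_{i=1}^{10} x_i^T \Theta_i x_i + \| x \|_1 \quad \text{s.t.} \quad \sum_{i=1}^{10} A_i x_i = b. \tag{3.1}$$

Here

$$\Theta_i = \begin{pmatrix} 5 + i & 1 \\ 1 & 5 + i \end{pmatrix}, \quad b = \begin{pmatrix} 1 \\ 1 \\ \vdots \\ 1 \end{pmatrix}, \quad A_i = \begin{pmatrix} 1 + i & 2 + i \\ 3 + i & 4 + i \\ \vdots & \vdots \\ 19 + i & 20 + i \end{pmatrix}. \tag{3.2}$$

We run the algorithms for 500 iterations and obtain the results shown in Figure 2 and Figure 3.

Figure 2. $\|Ax^k - b\|$ in each iteration.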
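For readers who want to reproduce the setup, the data of (3.1)-(3.2) can be assembled as follows. This is a sketch under assumptions, not the authors' code: it reads each $A_i$ as the $10 \times 2$ matrix with rows $(1+i, 2+i), (3+i, 4+i), \ldots, (19+i, 20+i)$, takes each $x_i \in \mathbb{R}^2$, and assumes $b$ is the all-ones vector. The helper names `theta_block`, `a_block`, `objective` and `residual` are hypothetical.

```python
import numpy as np

m = 10  # number of blocks in (3.1)

def theta_block(i):
    """Theta_i from (3.2)."""
    return np.array([[5.0 + i, 1.0],
                     [1.0, 5.0 + i]])

def a_block(i):
    """A_i from (3.2): rows (1+i, 2+i), (3+i, 4+i), ..., (19+i, 20+i)."""
    return np.arange(1, 21, dtype=float).reshape(10, 2) + i

Thetas = [theta_block(i) for i in range(1, m + 1)]
A = np.hstack([a_block(i) for i in range(1, m + 1)])   # shape (10, 20)
b = np.ones(A.shape[0])   # ASSUMPTION: b is the all-ones vector

def objective(x):
    """sum_i x_i^T Theta_i x_i + ||x||_1 for x stacked as (x_1, ..., x_m)."""
    blocks = x.reshape(m, 2)
    quad = sum(xi @ Th @ xi for xi, Th in zip(blocks, Thetas))
    return quad + np.abs(x).sum()

def residual(x):
    """Primal feasibility ||Ax - b||, the quantity plotted in Figure 2."""
    return np.linalg.norm(A @ x - b)

x = np.zeros(2 * m)
print("objective at 0:", objective(x), " residual at 0:", residual(x))
```

With these two helpers, any of the algorithms in the text can be monitored by evaluating `objective` and `residual` after each outer iteration.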

Figure 3. Function value in each iteration.

We see in Figure 2 that the primal feasibility residual $\|Ax^k - b\|$ of S-ADMM decreases steadily and linearly, while for G-ADMM the residual tends to stop decreasing after 300 iterations. For RP-ADMM, the residual oscillates and decreases slowly. We remark that during the numerical experiments we observed that RP-ADMM does not converge in every run; the program may halt due to floating-point overflow. This is easy to understand, because the authors of [7] prove convergence of the algorithm using an expectation argument, which only guarantees that RP-ADMM converges with high probability. However, in terms of the decrease in function value, S-ADMM is less attractive: it is the slowest among all the methods.

4. Conclusions. We propose a new algorithm that uses a block symmetric Gauss-Seidel iteration for the primal update in ADMM. The algorithm will converge if the objective is strongly convex, and when the objective function is 0 the rate can actually be found with linear algebra. This algorithm can be viewed as a derandomized version of RP-ADMM, and the dual update can be viewed as a Richardson iteration correction in the Schur decomposition. Moreover, we interpret the algorithm as adding a regularization or proximal term to the original problem and then solving the problem analytically. This helps us see how the algorithm accelerates the convergence of ADMM. We believe that a more sophisticated correction step could be found to accelerate the algorithm further.

References

[1] Stephen Boyd, Neal Parikh, Eric Chu, Borja Peleato, and Jonathan Eckstein. Distributed optimization and statistical learning via the alternating direction method of multipliers. Foundations and Trends in Machine Learning, 3(1):1-122, 2011.
[2] Caihua Chen, Bingsheng He, Yinyu Ye, and Xiaoming Yuan. The direct extension of ADMM for multi-block convex minimization problems is not necessarily convergent. Mathematical Programming, 155(1-2):57-79, 2016.
[3] Bingsheng He, Min Tao, and Xiaoming Yuan. Alternating direction method with Gaussian back substitution for separable convex programming. SIAM Journal on Optimization, 22(2):313-340, 2012.
[4] Bingsheng He and Xiaoming Yuan. On the O(1/n) convergence rate of the Douglas-Rachford alternating direction method. SIAM Journal on Numerical Analysis, 50(2):700-709, 2012.
[5] Renato D. C. Monteiro and Benar F. Svaiter. Iteration-complexity of block-decomposition algorithms and the alternating direction method of multipliers. SIAM Journal on Optimization, 23(1):475-507, 2013.
[6] Robert Nishihara, Laurent Lessard, Benjamin Recht, Andrew Packard, and Michael I. Jordan. A general analysis of the convergence of ADMM. In ICML, pages 343-352, 2015.
[7] Ruoyu Sun, Zhi-Quan Luo, and Yinyu Ye. On the expected convergence of randomly permuted ADMM. arXiv preprint arXiv:1503.06387, 2015.
[8] Jinchao Xu. Multilevel iterative methods for discretized PDEs, lecture notes, April 2017.
[9] Jinchao Xu. Optimal iterative methods for linear and nonlinear problems, lecture notes, April 2017.

Department of Mathematics, Pennsylvania State University, University Park, PA 16802, USA
E-mail address: xu@math.psu.edu

Institute for Computational and Mathematical Engineering, Stanford University, CA 94305-4042
E-mail address: kailaix@stanford.edu

Department of Management Science and Engineering, Stanford University, CA 94305-4121
E-mail address: yyye@stanford.edu