arxiv: v1 [math.oc] 4 Mar 2019

Size: px

Start display at page:

Download "arxiv: v1 [math.oc] 4 Mar 2019"

Erick Fisher
5 years ago
Views:

1 arxiv: v1 [math.oc] 4 Mar 2019 The Application of Multi-block ADMM on Isotonic Regression Problems Junxiang Wang and Liang Zhao George Mason University Abstract The multi-block ADMM has received much attention from optimization researchers due to its good scalability. In this paper, the multi-block ADMM is applied to solve two large-scale problems related to isotonic regression. Numerical experiments show that the multi-block ADMM is convergent when the chosen parameter is small enough and the multi-block ADMM scales well compared with baselines. 1 Introduction In recent years, the Alternating Direction Method of Multipliers (ADMM) has received considerable attention from the community of machine learning researchers. This is because it is a natural fit for wide large-scale data applications, including asset allocation [37], phase retrieval [38], event forecasting [40, 41], vaccine adverse event detection [35] and compressive sensing [6]. The main advantage of the ADMM is that it splits separable objective functions into two subproblems, each of which is easy to solve. Therefore, the ADMM can be implemented in a distributive and parallel manner. One part of the ADMM community investigates the convergence property of the multi-block ADMM, the extension of the classic ADMM with no less than three variables. The multi-block ADMM is written mathematically as follows: min x1,,x n n i=1 f i(x i ), s.t. n i=1 A ix i = 0 where f i : R mi R(i = 1,, n) are convex functions, x i R mi (i = 1,, n) are vectors of length m i. A i R p mi (i = 1,, n) are matrices. The augmented Lagrangian function of the multi-block ADMM is L ρ (x 1,, x n, y) = n i=1 f i(x i ) + (ρ/2) n i=1 A ix i + y/ρ 2 2 where y R p is a dual variable and ρ > 0 is a penalty parameter. The multi-block ADMM is solved by the following steps: x k+1 i arg min xi L ρ (, x k+1 i 1, x i, x k i+1, )(i = 1,, n) y k+1 y k + ρ n A ix k+1 i=1 i 1

2 The primal residual r and dual residuals s i (i = 1,, n 1) can monitor the convergence process of the multi-block ADMM, which are defined as r k = n i=1 A ix k i, sk i = ρ n j=i+1 A j(x k+1 j x k j ), respectively. There are a variety of related works on discussing the convergence of the multi-block ADMM, which are detailed in Section C in the supplementary materials. Because the multi-block ADMM is a computational framework to solve problems with linear equality constraints, it cannot directly be applied to the problems with multiple inequality constraints such as the well-known isotonic regression problems [16]. In this paper, we propose new strategies based on the multi-block ADMM to address existing computational challenges in the isotonic regression problems. The isotonic regression, which was firstly introduced by 1950s, aims to return a sequence of responses given a predictor and pre-defined order constraints, which are represented by a Directed Acyclic Graph (DAG) [1]. The isotonic regression has a board range of applications such as learning in the past ten years [14, 15, 22, 24, 39]. There are a variety of works to solve isotonic regression problems. See [2, 3, 15, 16, 22, 29] for more information. Most of them solve the isotonic regression problem step-wise because one objective variable may be subject to another. However, The main drawback is that their computational cost is very expensive when solving large-scale problems. To deal with the challenge of scalability, we leverge the advantage of parallel computing of the multi-block ADMM by integrating objective variables into several vectors. The following questions are addressed for the proposed multiblock ADMM frameworks on two isotonic regression problems: 1. Does the multi-block ADMM converge whatever ρ is chosen? 2. Is the multi-block ADMM scalable to two problems compared with other baselines? The rest of the paper is organized as follows. In Section 2, we give formulations of two problems related to isotonic regression and present two multi-block ADMM algorithms to solve them, respectively. In Section 3, experiments on simulated datasets are conducted to validate the convergence properties and scalability of the multi-block ADMM. Section 4 concludes by summarizing the whole paper. 2 Isotonic Regression Problems In this section, we introduce two isotonic regression problems in detail. Specifically, Section 2.1 focuses on how the multi-block ADMM solves the smoothed isotonic regression problem with linear order constraints; a multidimensional ordering problem is solved by the multi-block ADMM in Section The Multi-block ADMM for Smoothed Isotonic Regression The classic isotonic regression is a problem to return a non-decreasing response given a predictor. However, the fitted response resembles a step function while a smooth and continuous response is expected. To achieve this, Sysoev and 2

3 Burdakov proposed a smoothed isotonic regression problem given a predictor x i (i = 1,, n) [31]: Problem 1 (Smoothed Isotonic Regression). min β1,,β n n i=1 w i(x i β i ) 2 + λ n 1 s.t. β 1 β 2 β n i=1 (β i β i+1 ) 2 where w i > 0(i = 1,, n) are assigned weights and λ 0 is a penalty parameter. When λ = 0, the problem is reduced to the classic isotonic regression problem. The smoothed isotonic regression problem can be solved by the Smoothed Pool-Adjacent-Violators (SPAV) algorithm [31]. However, this algorithm meets with difficulty when solving large-scale datasets. This is because the time complexity of the SPAV algorithm is O(n 2 ). To deal with the scalability issue, the multi-block ADMM is applied to realize parallel computing: we introduce two vectors p and q of length n 1 such that p i = β i (i = 1,, n 1) and q i = β i+1 (i = 1,, n 1), respectively. The problem can be reformulated as min p,q,u w 1 (x 1 p 1 ) 2 + n 1 i=2 ((w i/2)(x i p i ) 2 +(w i /2)(x i q i 1 ) 2 )+w n (x n q n 1 ) 2 +λ u 2 2 s.t. p q + u = 0, u 0, p i+1 = q i, i = 1, 2,, n 2 where p = [p 1,, p n 1 ] and q = [q 1,, q n 1 ]. The augmented Lagrangian is L ρ (u, p, q, y 1, y 2 ) = w 1 (x 1 p 1 ) 2 + n 1 i=2 ((w i/2)(x i p i ) 2 +(w i /2)(x i q i 1 ) 2 )+ w n (x n q n 1 ) 2 +λ u 2 2+(ρ/2) p q+u+y 1 /ρ 2 2+(ρ/2) n 2 i=1 (p i+1 q i +y 2,i /ρ) 2 where y 1 = [y 1,1,, y 1,n 1 ], y 2 = [y 2,1,, y 2,n 2 ] and ρ > 0. Due to space limit, the multi-block ADMM to solve Problem 1 is detailed in Section A in the supplementary materials. 2.2 The Multi-block ADMM for Multi-dimensional Ordering The previous smoothed isotonic regression only considers linear orders, while multi-dimensional orders are more common in isotonic regression applications [30]. A multi-dimensional order is defined in a m dimensional space Z i = (z i,1,, z i,m ). Z i Z j if and only if z i,1 z j,1,, z i,m z j,m. It can be represented equivalently as Directed Acyclic Graph (DAG) G = (V, E) where (Z i, Z j ) E if Z i Z j [30]. Obviously, many pairs (Z i, Z j ) are incomparable. Formally, the multi-dimensional ordering problem is formulated as follows: Problem 2 (Multi-dimensional Ordering). min α1,,α n n i=1 w i(y i α i ) 2 s.t.α i α j iff Z i Z j (1 i, j n). 3

4 where w i > 0 are weights. The multi-dimensional ordering problem has been investigated since 1970s [12], but its computational difficulty has been mentioned in multiple papers [5, 9 11, 20, 23, 25, 27 30, 32]. Therefore, researchers develop approximate methods and restrict them in the small datasets [30]. To handle large-scale multi-dimensional ordering problems, we leverage the advantage of parallel computing of the multi-block ADMM to solve it in an exact form. By introducing two vectors g and h, this problem is equivalent of min g,h W T ((Y g) (Y g))/2 + W T ((Y h) (Y h))/2 s.t. E 1 g E 2 h + v = 0, v 0, g = h. where W = (w 1,, w n ), Y = (Y 1,, Y n ) and is the Hadamard product. E 1 R E n and E 2 R E n are representations of the edge set E: the k th edge (i, j) E means that E 1,k,i = 1 and E 2,k,j = 1 while E 1,k,p = 0(1 p n, p i) and E 2,k,q = 0(1 q n, q j). The augmented Lagrangian is L ρ (v, g, h, y 1, y 2 ) = W T ((Y g) (Y g))/2 + W T ((Y h) (Y h))/2 + (ρ/2) E 1 g E 2 h + v + y 1 /ρ (ρ/2) g h + y 2 /ρ 2 2. where ρ > 0. Due to space limit, the multi-block ADMM to solve Problem 2 is detailed in Section B in the supplementary materials. 3 Experiment In this section, we validate the multi-block ADMM using simulated datasets and compare it with existing state-of-the-art methods. All experiments were conducted on a 64-bit machine with Intel(R) core(tm)processor (i7-6820hq CPU@ 2.70GHZ) and 16.0GB memory. 3.1 Data Generation and Parameter Settings The experimental data and parameters for two applications are explained in this section. Commonly, the maximal number of iteration was set to 10, 000. We set the tolerance ε = 0.01 n according to the recommendation by Boyd et al. [4]. For the smoothed isotonic regression problem, we generated simulated observations x i (i = 1,, n) from a uniform distribution in (0, 1000); we set w i = 1(i = 1,, n) and λ = 1. For the multi-dimensional ordering problem, we generated simulated observations Y i (i = 1,, n) from a uniform distribution in (0, 1000). w i (i = 1,, n) were set to 1. E 1 and E 2 were generated from a 2-dimensional random grid graph to simulate the ordering of the 2-dimensional space. 3.2 Baselines Two methods Smoothed Pool-Adjacent-Violators(SPAV) [31] and Interior Point Method(IPM) [16] are used for comparison. The details can be found in Section D in the supplementary materials. 4

5 3.3 Experimental Results In this section, the experimental results on two problems are explained in detail. 1. Does the multi-block ADMM converge whatever ρ is chosen? Figure 1 illustrates the convergence properties of the multi-block ADMM when n = 1000 on the smoothed isotonic regression problem and the multi-dimensional ordering problem. Two choices of ρ = 0.1 and ρ = 10 are shown on Figure 1. The primal residual r and the dual residual s reflect the convergence trend of the multi-block ADMM. Overall, Figure 1 (a)-(c) shows that the multi-block ADMM converges while Figure 1(d) shows the divergence: r and s drop drastically at the beginning and then decrease smoothly through the end in the Figures 1 (a)-(c); however, Figure 1(d) displays a surge of r. Moreover, when ρ = 0.1, r is located above s while when ρ = 10 the situation is the opposite. Obviously, the multi-block ADMM can obtain a reasonable solution within tens of iterations as long as ρ is small and hence it is suitable for large-scale optimization. (a).ρ = 0.1 on the smoothed isotonic regression problem. (b).ρ = 10 on the smoothed isotonic regression problem. (c).ρ = 0.1 on the multi-dimensional ordering problem. (d).ρ = 10 on the multi-dimensional ordering problem. Figure 1: Convergence of the multi-block ADMM when n = 1000 on two problems. 2. Is the multi-block ADMM scalable to two problems compared 5

6 with other baselines? Figure 2 shows the scalability of the multi-block ADMM compared with two baselines on two problems. The running time of the multiblock ADMM increases linearly with the number of observations on two problems. In Figure 2(a), the SPAV is more efficient than the multi-block ADMM when the number of observations n is less than 20,000 but needs more time since then; as for Figure 2(b), the multi-block ADMM is more efficient than the IPM no matter how many observations there are. (a).scalability on the smoothed isotonic regression problem. (b).scalability on the multi-dimensional ordering problem. Figure 2: Scalability of the multi-block ADMM on two problems. 4 Conclusion The multi-block ADMM is an interesting topic in the optimization community in recent years. In this paper, we apply the multi-block ADMM to two problems related to isotonic regression: smoothed isotonic regression and multi-dimensional ordering. Most existing methods are not efficient enough to run on large-scale datasets. However, the main advantage of the multi-block ADMM is parallel computing and hence it is scalable to large datasets. We find that the multi-block ADMM converges when ρ is small and its running time increases linearly with the scalability of observations. References [1] Miriam Ayer, H Daniel Brunk, George M Ewing, William T Reid, and Edward Silverman. An empirical distribution function for sampling with incomplete information. The annals of mathematical statistics, pages , [2] RE Barlow and HD Brunk. The isotonic regression problem and its dual. Journal of the American Statistical Association, 67(337): ,

7 [3] Michael J Best and Nilotpal Chakravarti. Active set algorithms for isotonic regression; a unifying framework. Mathematical Programming, 47(1-3): , [4] Stephen Boyd, Neal Parikh, Eric Chu, Borja Peleato, and Jonathan Eckstein. Distributed optimization and statistical learning via the alternating direction method of multipliers. Foundations and Trends R in Machine Learning, 3(1):1 122, [5] Gordon Bril, Richard Dykstra, Carolyn Pillers, and Tim Robertson. Algorithm as 206: isotonic regression in two independent variables. Journal of the Royal Statistical Society. Series C (Applied Statistics), 33(3): , [6] Rick Chartrand and Brendt Wohlberg. A nonconvex admm algorithm for group sparsity with sparse groups. In Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on, pages IEEE, [7] Caihua Chen, Bingsheng He, Yinyu Ye, and Xiaoming Yuan. The direct extension of admm for multi-block convex minimization problems is not necessarily convergent. Mathematical Programming, 155(1-2):57 79, [8] Wei Deng, Ming-Jun Lai, Zhimin Peng, and Wotao Yin. Parallel multiblock admm with o (1/k) convergence. Journal of Scientific Computing, 71(2): , [9] Holger Dette and Regine Scheder. Strictly monotone and smooth nonparametric regression for two or more variables. Canadian Journal of Statistics, 34(4): , [10] Richard L Dykstra and Tim Robertson. An algorithm for isotonic regression for two or more independent variables. The Annals of Statistics, pages , [11] David Gamarnik. Efficient learning of monotone concepts via quadratic optimization. In Proceedings of the eleventh annual conference on computational learning theory, pages ACM, [12] David Lee Hanson, Gordon Pledger, FT Wright, et al. On consistency in monotonic regression. The Annals of Statistics, 1(3): , [13] Bingsheng He and Xiaoming Yuan. On the direct extension of admm for multi-block separable convex programming and beyond: from variational inequality perspective. Manuscript, optimizationonline. org/db_html/2014/03/4293. html, [14] Sham M Kakade, Varun Kanade, Ohad Shamir, and Adam Kalai. Efficient learning of generalized linear and single index models with isotonic regression. In Advances in Neural Information Processing Systems, pages ,

8 [15] Adam Tauman Kalai and Ravi Sastry. The isotron algorithm: Highdimensional isotonic regression. In COLT. Citeseer, [16] Rasmus Kyng, Anup Rao, and Sushant Sachdeva. Fast, provable algorithms for isotonic regression in all l_p-norms. In Advances in Neural Information Processing Systems, pages , [17] Tianyi Lin, Shiqian Ma, and Shuzhong Zhang. On the convergence rate of multi-block admm. arxiv preprint arxiv: , 229, [18] Tianyi Lin, Shiqian Ma, and Shuzhong Zhang. Global convergence of unmodified 3-block admm for a class of convex minimization problems. arxiv preprint arxiv: , [19] Tianyi Lin, Shiqian Ma, and Shuzhong Zhang. Iteration complexity analysis of multi-block admm for a family of convex minimization without strong convexity. Journal of Scientific Computing, 69(1):52 81, [20] Ronny Luss, Saharon Rosset, and Moni Shahar. Decomposing isotonic regression for efficiently solving large problems. In Advances in Neural Information Processing Systems, pages , [21] Sindri Magnússon, Pradeep Chathuranga Weeraddana, Michael G Rabbat, and Carlo Fischione. On the convergence of alternating direction lagrangian methods for nonconvex structured optimization problems. IEEE Transactions on Control of Network Systems, 3(3): , [22] Taesup Moon, Alex Smola, Yi Chang, and Zhaohui Zheng. Intervalrank: isotonic regression with listwise and pairwise constraints. In Proceedings of the third ACM international conference on Web search and data mining, pages ACM, [23] Hari Mukarjee and Steven Stern. Feasible nonparametric estimation of multiargument monotone functions. Journal of the American Statistical Association, 89(425):77 80, [24] Harikrishna Narasimhan and Shivani Agarwal. On the relationship between binary classification, bipartite ranking, and binary class probability estimation. In Advances in Neural Information Processing Systems, pages , [25] Shixian Qian and William F Eddy. An algorithm for isotonic regression on ordered rectangular grids. Journal of Computational and Graphical Statistics, 5(3): , [26] Daniel P Robinson and Rachael Tappenden. A flexible admm algorithm for big data applications. Journal of Scientific Computing, 71(1): ,

9 [27] Olli Saarela and Elja Arjas. A method for bayesian monotonic multiple regression. Scandinavian Journal of Statistics, 38(3): , [28] Georgia Salanti and Kurt Ulm. Multidimensional isotonic regression and estimation of the threshold value [29] Quentin F Stout. Isotonic regression via partitioning. Algorithmica, 66(1):93 112, [30] Quentin F Stout. Isotonic regression for multiple independent variables. Algorithmica, 71(2): , [31] Oleg Sysoev and Oleg Burdakov. A smoothed monotonic regression via l2 regularization. Knowledge and Information Systems, pages 1 22, [32] Oleg Sysoev, Oleg Burdakov, and Anders Grimvall. A segmentation-based algorithm for large-scale partially ordered monotonic regression. Computational Statistics & Data Analysis, 55(8): , [33] Min Tao and Xiaoming Yuan. Convergence analysis of the direct extension of admm for multiple-block separable convex minimization. Advances in Computational Mathematics, pages 1 41, [34] Fenghui Wang, Zongben Xu, and Hong-Kun Xu. Convergence of bregman alternating direction method with multipliers for nonconvex composite problems. arxiv preprint arxiv: , [35] Junxiang Wang and Liang Zhao. Multi-instance domain adaptation for vaccine adverse event detection. In Proceedings of the 2018 World Wide Web Conference on World Wide Web, pages International World Wide Web Conferences Steering Committee, [36] Yu Wang, Wotao Yin, and Jinshan Zeng. Global convergence of admm in nonconvex nonsmooth optimization. arxiv preprint arxiv: , [37] Zaiwen Wen, Xianhua Peng, Xin Liu, Xiaoling Sun, and Xiaodi Bai. Asset allocation under the basel accord risk measures. arxiv preprint arxiv: , [38] Zaiwen Wen, Chao Yang, Xin Liu, and Stefano Marchesini. Alternating direction methods for classical and ptychographic phase retrieval. Inverse Problems, 28(11):115010, [39] Bianca Zadrozny and Charles Elkan. Transforming classifier scores into accurate multiclass probability estimates. In Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining, pages ACM,

10 [40] Liang Zhao, Junxiang Wang, Feng Chen, Chang-Tien Lu, and Naren Ramakrishnan. Spatial event forecasting in social media with geographically hierarchical regularization. Proceedings of the IEEE, 105(10): , [41] Liang Zhao, Junxiang Wang, and Xiaojie Guo. Distant-supervision of heterogeneous multitask learning for social event forecasting with multilingual indicators

11 Appendix A The Multi-block ADMM Algorithm to Solve Problem 1 Algorithm 1 The Multi-block ADMM Algorithm to Solve Problem 1 1: Initialize p, q, u, y 1, y 2, ρ > 0, k = 0. 2: repeat 3: Update u k+1 in Equation (1). 4: Update p k+1 in Equation (2). 5: Update q k+1 in Equation (3). 6: Update r1 k+1 p k+1 q k+1 + u k+1. 7: Update r k+1 2,i p k+1 i+1 qk+1 i (i = 1,, n 2), r2 k+1 [r2,1 k+1 8: Update s k+1 1 ρ(p k+1 q k+1 p k + q k ). 9: Update s k+1 2 ρ(q k q k+1 ). 10: Update r k+1 r1 k rk # Calculate the primal residual. 11: Update s k+1 s k sk # Calculate the dual residual. 12: Update y1 k+1 y1 k + ρr1 k+1. 13: Update y2 k+1 y2 k + ρr2 k+1. 14: k k : until convergence. 16: Output p, q and u. 2,n 2 ]. The multi-block ADMM to solve Problem 1 is shown in Algorithm 1. Specifically, Line 3-5 update three primal variables u, p and q alternately; Line 6-11 compute primal and dual residuals; Line update two dual variables y 1 and y 2. Since the objective function and constraints are both separable, each subproblem has a closed-form solution and can be implemented in parallel. The detailed solutions to the subproblems are as follows. 1. Update u. The variable u is updated as follows: u k+1 arg min u (ρ/2) p k q k + u + y k 1 /ρ λ u 2 2, s.t. u 0. (1) This subproblem has a closed-from solution as follows: u k+1 max((ρ(q k p k ) y k 1 )/(ρ + 2λ), 0). 2. Update p. The variable p is updated as follows: p k+1 1 arg min p1 w 1 (x 1 p 1 ) 2 + (ρ/2) p 1 q1 k + u k y1 k /ρ 2 2 p k+1 i arg min pi (w i /2)(x i p i ) 2 + (ρ/2) p i qi k + u k+1 i + yi k /ρ (ρ/2) p i qi 1 k + y2,i 1/ρ k 2 2(i = 2,, n 1). (2) 11

12 This subproblem has a closed-from solution as follows: p k+1 1 (2w 1 x 1 + ρq k 1 ρu k+1 1 y k 1,1)/(2w 1 + ρ). p k+1 i (w i x i + ρq k i ρu k+1 i y k 1,i + ρq k i 1 y k 2,i 1)/(w i + 2ρ)(i = 2,, n 1). 3. Update q. The variable q is updated as follows: q k+1 i arg min qi (w i+1 /2)(x i+1 q i ) 2 + (ρ/2) p k+1 i q i + u k+1 i + y k 1,i/ρ (ρ/2)(p k+1 i+1 q i + y k 2,i/ρ)(i = 1,, n 2). q k+1 n 1 arg min q n 1 w n (x n q n 1 ) 2 + (ρ/2) p k+1 n 1 q n 1 + u k+1 n 1 + yk 1,n 1/ρ 2 2. (3) This subproblem has a closed-form solution as follows: q k+1 i (w i+1 x i+1 + ρp k+1 i + ρu k+1 i + y k 1,i + ρp k+1 i+1 + yk 2,i)/(w i+1 + 2ρ)(i = 1,, n 2). q k+1 n 1 (2w nx n + ρp k+1 n 1 + ρuk+1 n 1 + yk 1,n 1)/(2w n + ρ). In addition to the advantage of parallel computing, the convergence of the Algorithm 1 can be guaranteed by the following theorem: Theorem 1. There exists d > 0 such that if ρ < d, the Algorithm 1 converges globally to the optimal point with sublinear convergence rate. Proof. Because the objective function is strongly convex in p and q, all assumptions in [17] are satisfied and hence the Theorem 3.2 and Theorem 4.2 in [17] ensure the global sublinear convergence if ρ is smaller than a threshold d. B The Multi-block ADMM Algorithm to Solve Problem 2 The multi-block ADMM to solve Problem 2 is shown in Algorithm 2. Specifically, Line 3-5 update three primal variables v, g and h alternately; Line 6-12 compute primal and dual residuals; Line update two dual variables y 1 and y 2. Since the objective function and constraints are both separable, each subproblem has a closed-form solution and can be implemented in parallel. The detailed solutions to the subproblems are as follows: 1. Update v. The variable v is updated as follows: v k+1 arg min v (ρ/2) E 1 g k E 2 h k + v + y k 1 /ρ 2 2, s.t. v 0. (4) This subproblem has a closed-from solution as follows: v k+1 max( E 1 g k + E 2 h k y k 1 /ρ, 0). 12

13 Algorithm 2 The Multi-block ADMM Algorithm to Solve Problem 2 1: Initialize g, h, v, y 1, y 2, ρ > 0, k = 0. 2: repeat 3: Update v k+1 in Equation (4). 4: Update g k+1 in Equation (5). 5: Update h k+1 in Equation (6). 6: Update r k+1 1 E 1 g k+1 E 2 h k+1 + v k+1. 7: Update r k+1 2 g k+1 h k+1. 8: Update s k+1 1 ρ(e 1 g k+1 E 2 h k+1 E 1 g k + E 2 h k ). 9: Update s k+1 2 ρe 2 (h k h k+1 ). 10: Update s k+1 3 ρ(h k h k+1 ). 11: Update r k rk # Calculate the primal residual. s k sk sk # Calculate the dual r k+1 12: Update s k+1 residual. 13: Update y1 k+1 y1 k + ρr k+1 14: Update y2 k+1 y2 k + ρr k+1 15: k k : until convergence. 17: Output g, h and v Update g. The variable g is updated as follows: g k+1 arg min g W T ((Y g) (Y g))/2 + (ρ/2) E 1 g E 2 h k + v k+1 + y k 1 /ρ (ρ/2) g h k + y k 2 /ρ 2 2. (5) This subproblem has a closed-from solution as follows: g k+1 (diag(w ) + ρe T 1 E 1 + ρi) 1 (diag(w )Y + ρe T 1 E 2 h k ρe T 1 v k+1 E T 1 y k 1 + ρh k y k 2 ) where I is an identity matrix and diag(w ) is a diagonal matrix where the main diagonal is W. 3. Update h. The variable h is updated as follows: h k+1 arg min h W T ((Y h) (Y h))/2 + (ρ/2) E 1 g k+1 E 2 h + v k+1 + y k 1 /ρ (ρ/2) g k+1 h + y k 2 /ρ 2 2. (6) This subproblem has a closed-form solution as follows: h k+1 (diag(w ) + ρe T 2 E 2 + ρi) 1 (diagy + ρe T 2 E 1 g k+1 + ρe T 2 v k+1 + E T 2 y k 1 + ρg k+1 + y k 2 ). As for the convergence properties of Algorithm 2, the below theorem guarantees the convergence if ρ is small. Theorem 2. There exists d > 0 such that if ρ < d, then the Algorithm 2 converges globally to the optimal point with sublinear convergence rate. Proof. The proof is the same as Theorem 1. 13

14 C Related Work on the Multi-block ADMM The multi-block ADMM was firstly studied by Chen et al. [7] when they concluded that the multi-block ADMM does not necessarily converge by giving a counterexample. He and Yuan explained why the multi-block ADMM may diverge from the perspective of variational inequality framework [13]. Since then, many researchers studied sufficient conditions to ensure the global convergence of the multi-block ADMM. For example, Robinson and Tappenden and Lin et al. imposed strong convexity assumption on the objective function f i (x i )(i = 1,, n) [17,26]. Tao and Yuan did not require all objective functions f i (x i )(i = 1,, n) to be strongly convex, but imposed full-rank assumption on A i (i = 1,, n) [33]. Lin et al. weakened the strong convexity assumption on the objective function f i (x i )(i = 1,, n) by imposing the additional cocercity assumption on f i (x i )(i = 1,, n) [19]. Deng et al. proved that the multi-block ADMM preserved convergence if multiple variables x i (i = 1,, n) were updated in the Jacobian fashion rather than Gauss-Seidel fashion [8]. Lin et al. proved that the multi-block ADMM converged for any ρ > 0 in the regularized least squares decomposition problem [18]. Moreover, some work extended the multiblock ADMM into the nonconvex settings. See [21, 34, 36] for more information. D Baselines In order to test the scalability of the multi-block ADMM on two applications, two baselines were utilized for comparison: 1. Smoothed Pool-Adjacent-Violators(SPAV) [31]. The SPAV algorithm is a extension of the PAV algorithm designed for the smoothed isotonic regression problem. It partitions all β i (i = 1,, n) into many blocks according to the feasibility of inequality constraints. The iteration ends when all constraints are feasible. 2. Interior Point Method(IPM) [16]. The IPM method is an efficient algorithm to solve multi-dimensional ordering with convergence guarantee. Its time complexity is O( E 1.5 log 2 V log 2 ( V /δ)) where δ is a tolerance. 14

arxiv: v1 [math.oc] 23 May 2017

arxiv: v1 [math.oc] 23 May 2017 A DERANDOMIZED ALGORITHM FOR RP-ADMM WITH SYMMETRIC GAUSS-SEIDEL METHOD JINCHAO XU, KAILAI XU, AND YINYU YE arxiv:1705.08389v1 [math.oc] 23 May 2017 Abstract. For multi-block alternating direction method