arxiv: v1 [math.oc] 4 Mar 2019

Size: px
Start display at page:

Download "arxiv: v1 [math.oc] 4 Mar 2019"

Transcription

1 arxiv: v1 [math.oc] 4 Mar 2019 The Application of Multi-block ADMM on Isotonic Regression Problems Junxiang Wang and Liang Zhao George Mason University Abstract The multi-block ADMM has received much attention from optimization researchers due to its good scalability. In this paper, the multi-block ADMM is applied to solve two large-scale problems related to isotonic regression. Numerical experiments show that the multi-block ADMM is convergent when the chosen parameter is small enough and the multi-block ADMM scales well compared with baselines. 1 Introduction In recent years, the Alternating Direction Method of Multipliers (ADMM) has received considerable attention from the community of machine learning researchers. This is because it is a natural fit for wide large-scale data applications, including asset allocation [37], phase retrieval [38], event forecasting [40, 41], vaccine adverse event detection [35] and compressive sensing [6]. The main advantage of the ADMM is that it splits separable objective functions into two subproblems, each of which is easy to solve. Therefore, the ADMM can be implemented in a distributive and parallel manner. One part of the ADMM community investigates the convergence property of the multi-block ADMM, the extension of the classic ADMM with no less than three variables. The multi-block ADMM is written mathematically as follows: min x1,,x n n i=1 f i(x i ), s.t. n i=1 A ix i = 0 where f i : R mi R(i = 1,, n) are convex functions, x i R mi (i = 1,, n) are vectors of length m i. A i R p mi (i = 1,, n) are matrices. The augmented Lagrangian function of the multi-block ADMM is L ρ (x 1,, x n, y) = n i=1 f i(x i ) + (ρ/2) n i=1 A ix i + y/ρ 2 2 where y R p is a dual variable and ρ > 0 is a penalty parameter. The multi-block ADMM is solved by the following steps: x k+1 i arg min xi L ρ (, x k+1 i 1, x i, x k i+1, )(i = 1,, n) y k+1 y k + ρ n A ix k+1 i=1 i 1

2 The primal residual r and dual residuals s i (i = 1,, n 1) can monitor the convergence process of the multi-block ADMM, which are defined as r k = n i=1 A ix k i, sk i = ρ n j=i+1 A j(x k+1 j x k j ), respectively. There are a variety of related works on discussing the convergence of the multi-block ADMM, which are detailed in Section C in the supplementary materials. Because the multi-block ADMM is a computational framework to solve problems with linear equality constraints, it cannot directly be applied to the problems with multiple inequality constraints such as the well-known isotonic regression problems [16]. In this paper, we propose new strategies based on the multi-block ADMM to address existing computational challenges in the isotonic regression problems. The isotonic regression, which was firstly introduced by 1950s, aims to return a sequence of responses given a predictor and pre-defined order constraints, which are represented by a Directed Acyclic Graph (DAG) [1]. The isotonic regression has a board range of applications such as learning in the past ten years [14, 15, 22, 24, 39]. There are a variety of works to solve isotonic regression problems. See [2, 3, 15, 16, 22, 29] for more information. Most of them solve the isotonic regression problem step-wise because one objective variable may be subject to another. However, The main drawback is that their computational cost is very expensive when solving large-scale problems. To deal with the challenge of scalability, we leverge the advantage of parallel computing of the multi-block ADMM by integrating objective variables into several vectors. The following questions are addressed for the proposed multiblock ADMM frameworks on two isotonic regression problems: 1. Does the multi-block ADMM converge whatever ρ is chosen? 2. Is the multi-block ADMM scalable to two problems compared with other baselines? The rest of the paper is organized as follows. In Section 2, we give formulations of two problems related to isotonic regression and present two multi-block ADMM algorithms to solve them, respectively. In Section 3, experiments on simulated datasets are conducted to validate the convergence properties and scalability of the multi-block ADMM. Section 4 concludes by summarizing the whole paper. 2 Isotonic Regression Problems In this section, we introduce two isotonic regression problems in detail. Specifically, Section 2.1 focuses on how the multi-block ADMM solves the smoothed isotonic regression problem with linear order constraints; a multidimensional ordering problem is solved by the multi-block ADMM in Section The Multi-block ADMM for Smoothed Isotonic Regression The classic isotonic regression is a problem to return a non-decreasing response given a predictor. However, the fitted response resembles a step function while a smooth and continuous response is expected. To achieve this, Sysoev and 2

3 Burdakov proposed a smoothed isotonic regression problem given a predictor x i (i = 1,, n) [31]: Problem 1 (Smoothed Isotonic Regression). min β1,,β n n i=1 w i(x i β i ) 2 + λ n 1 s.t. β 1 β 2 β n i=1 (β i β i+1 ) 2 where w i > 0(i = 1,, n) are assigned weights and λ 0 is a penalty parameter. When λ = 0, the problem is reduced to the classic isotonic regression problem. The smoothed isotonic regression problem can be solved by the Smoothed Pool-Adjacent-Violators (SPAV) algorithm [31]. However, this algorithm meets with difficulty when solving large-scale datasets. This is because the time complexity of the SPAV algorithm is O(n 2 ). To deal with the scalability issue, the multi-block ADMM is applied to realize parallel computing: we introduce two vectors p and q of length n 1 such that p i = β i (i = 1,, n 1) and q i = β i+1 (i = 1,, n 1), respectively. The problem can be reformulated as min p,q,u w 1 (x 1 p 1 ) 2 + n 1 i=2 ((w i/2)(x i p i ) 2 +(w i /2)(x i q i 1 ) 2 )+w n (x n q n 1 ) 2 +λ u 2 2 s.t. p q + u = 0, u 0, p i+1 = q i, i = 1, 2,, n 2 where p = [p 1,, p n 1 ] and q = [q 1,, q n 1 ]. The augmented Lagrangian is L ρ (u, p, q, y 1, y 2 ) = w 1 (x 1 p 1 ) 2 + n 1 i=2 ((w i/2)(x i p i ) 2 +(w i /2)(x i q i 1 ) 2 )+ w n (x n q n 1 ) 2 +λ u 2 2+(ρ/2) p q+u+y 1 /ρ 2 2+(ρ/2) n 2 i=1 (p i+1 q i +y 2,i /ρ) 2 where y 1 = [y 1,1,, y 1,n 1 ], y 2 = [y 2,1,, y 2,n 2 ] and ρ > 0. Due to space limit, the multi-block ADMM to solve Problem 1 is detailed in Section A in the supplementary materials. 2.2 The Multi-block ADMM for Multi-dimensional Ordering The previous smoothed isotonic regression only considers linear orders, while multi-dimensional orders are more common in isotonic regression applications [30]. A multi-dimensional order is defined in a m dimensional space Z i = (z i,1,, z i,m ). Z i Z j if and only if z i,1 z j,1,, z i,m z j,m. It can be represented equivalently as Directed Acyclic Graph (DAG) G = (V, E) where (Z i, Z j ) E if Z i Z j [30]. Obviously, many pairs (Z i, Z j ) are incomparable. Formally, the multi-dimensional ordering problem is formulated as follows: Problem 2 (Multi-dimensional Ordering). min α1,,α n n i=1 w i(y i α i ) 2 s.t.α i α j iff Z i Z j (1 i, j n). 3

4 where w i > 0 are weights. The multi-dimensional ordering problem has been investigated since 1970s [12], but its computational difficulty has been mentioned in multiple papers [5, 9 11, 20, 23, 25, 27 30, 32]. Therefore, researchers develop approximate methods and restrict them in the small datasets [30]. To handle large-scale multi-dimensional ordering problems, we leverage the advantage of parallel computing of the multi-block ADMM to solve it in an exact form. By introducing two vectors g and h, this problem is equivalent of min g,h W T ((Y g) (Y g))/2 + W T ((Y h) (Y h))/2 s.t. E 1 g E 2 h + v = 0, v 0, g = h. where W = (w 1,, w n ), Y = (Y 1,, Y n ) and is the Hadamard product. E 1 R E n and E 2 R E n are representations of the edge set E: the k th edge (i, j) E means that E 1,k,i = 1 and E 2,k,j = 1 while E 1,k,p = 0(1 p n, p i) and E 2,k,q = 0(1 q n, q j). The augmented Lagrangian is L ρ (v, g, h, y 1, y 2 ) = W T ((Y g) (Y g))/2 + W T ((Y h) (Y h))/2 + (ρ/2) E 1 g E 2 h + v + y 1 /ρ (ρ/2) g h + y 2 /ρ 2 2. where ρ > 0. Due to space limit, the multi-block ADMM to solve Problem 2 is detailed in Section B in the supplementary materials. 3 Experiment In this section, we validate the multi-block ADMM using simulated datasets and compare it with existing state-of-the-art methods. All experiments were conducted on a 64-bit machine with Intel(R) core(tm)processor (i7-6820hq CPU@ 2.70GHZ) and 16.0GB memory. 3.1 Data Generation and Parameter Settings The experimental data and parameters for two applications are explained in this section. Commonly, the maximal number of iteration was set to 10, 000. We set the tolerance ε = 0.01 n according to the recommendation by Boyd et al. [4]. For the smoothed isotonic regression problem, we generated simulated observations x i (i = 1,, n) from a uniform distribution in (0, 1000); we set w i = 1(i = 1,, n) and λ = 1. For the multi-dimensional ordering problem, we generated simulated observations Y i (i = 1,, n) from a uniform distribution in (0, 1000). w i (i = 1,, n) were set to 1. E 1 and E 2 were generated from a 2-dimensional random grid graph to simulate the ordering of the 2-dimensional space. 3.2 Baselines Two methods Smoothed Pool-Adjacent-Violators(SPAV) [31] and Interior Point Method(IPM) [16] are used for comparison. The details can be found in Section D in the supplementary materials. 4

5 3.3 Experimental Results In this section, the experimental results on two problems are explained in detail. 1. Does the multi-block ADMM converge whatever ρ is chosen? Figure 1 illustrates the convergence properties of the multi-block ADMM when n = 1000 on the smoothed isotonic regression problem and the multi-dimensional ordering problem. Two choices of ρ = 0.1 and ρ = 10 are shown on Figure 1. The primal residual r and the dual residual s reflect the convergence trend of the multi-block ADMM. Overall, Figure 1 (a)-(c) shows that the multi-block ADMM converges while Figure 1(d) shows the divergence: r and s drop drastically at the beginning and then decrease smoothly through the end in the Figures 1 (a)-(c); however, Figure 1(d) displays a surge of r. Moreover, when ρ = 0.1, r is located above s while when ρ = 10 the situation is the opposite. Obviously, the multi-block ADMM can obtain a reasonable solution within tens of iterations as long as ρ is small and hence it is suitable for large-scale optimization. (a).ρ = 0.1 on the smoothed isotonic regression problem. (b).ρ = 10 on the smoothed isotonic regression problem. (c).ρ = 0.1 on the multi-dimensional ordering problem. (d).ρ = 10 on the multi-dimensional ordering problem. Figure 1: Convergence of the multi-block ADMM when n = 1000 on two problems. 2. Is the multi-block ADMM scalable to two problems compared 5

6 with other baselines? Figure 2 shows the scalability of the multi-block ADMM compared with two baselines on two problems. The running time of the multiblock ADMM increases linearly with the number of observations on two problems. In Figure 2(a), the SPAV is more efficient than the multi-block ADMM when the number of observations n is less than 20,000 but needs more time since then; as for Figure 2(b), the multi-block ADMM is more efficient than the IPM no matter how many observations there are. (a).scalability on the smoothed isotonic regression problem. (b).scalability on the multi-dimensional ordering problem. Figure 2: Scalability of the multi-block ADMM on two problems. 4 Conclusion The multi-block ADMM is an interesting topic in the optimization community in recent years. In this paper, we apply the multi-block ADMM to two problems related to isotonic regression: smoothed isotonic regression and multi-dimensional ordering. Most existing methods are not efficient enough to run on large-scale datasets. However, the main advantage of the multi-block ADMM is parallel computing and hence it is scalable to large datasets. We find that the multi-block ADMM converges when ρ is small and its running time increases linearly with the scalability of observations. References [1] Miriam Ayer, H Daniel Brunk, George M Ewing, William T Reid, and Edward Silverman. An empirical distribution function for sampling with incomplete information. The annals of mathematical statistics, pages , [2] RE Barlow and HD Brunk. The isotonic regression problem and its dual. Journal of the American Statistical Association, 67(337): ,

7 [3] Michael J Best and Nilotpal Chakravarti. Active set algorithms for isotonic regression; a unifying framework. Mathematical Programming, 47(1-3): , [4] Stephen Boyd, Neal Parikh, Eric Chu, Borja Peleato, and Jonathan Eckstein. Distributed optimization and statistical learning via the alternating direction method of multipliers. Foundations and Trends R in Machine Learning, 3(1):1 122, [5] Gordon Bril, Richard Dykstra, Carolyn Pillers, and Tim Robertson. Algorithm as 206: isotonic regression in two independent variables. Journal of the Royal Statistical Society. Series C (Applied Statistics), 33(3): , [6] Rick Chartrand and Brendt Wohlberg. A nonconvex admm algorithm for group sparsity with sparse groups. In Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on, pages IEEE, [7] Caihua Chen, Bingsheng He, Yinyu Ye, and Xiaoming Yuan. The direct extension of admm for multi-block convex minimization problems is not necessarily convergent. Mathematical Programming, 155(1-2):57 79, [8] Wei Deng, Ming-Jun Lai, Zhimin Peng, and Wotao Yin. Parallel multiblock admm with o (1/k) convergence. Journal of Scientific Computing, 71(2): , [9] Holger Dette and Regine Scheder. Strictly monotone and smooth nonparametric regression for two or more variables. Canadian Journal of Statistics, 34(4): , [10] Richard L Dykstra and Tim Robertson. An algorithm for isotonic regression for two or more independent variables. The Annals of Statistics, pages , [11] David Gamarnik. Efficient learning of monotone concepts via quadratic optimization. In Proceedings of the eleventh annual conference on computational learning theory, pages ACM, [12] David Lee Hanson, Gordon Pledger, FT Wright, et al. On consistency in monotonic regression. The Annals of Statistics, 1(3): , [13] Bingsheng He and Xiaoming Yuan. On the direct extension of admm for multi-block separable convex programming and beyond: from variational inequality perspective. Manuscript, optimizationonline. org/db_html/2014/03/4293. html, [14] Sham M Kakade, Varun Kanade, Ohad Shamir, and Adam Kalai. Efficient learning of generalized linear and single index models with isotonic regression. In Advances in Neural Information Processing Systems, pages ,

8 [15] Adam Tauman Kalai and Ravi Sastry. The isotron algorithm: Highdimensional isotonic regression. In COLT. Citeseer, [16] Rasmus Kyng, Anup Rao, and Sushant Sachdeva. Fast, provable algorithms for isotonic regression in all l_p-norms. In Advances in Neural Information Processing Systems, pages , [17] Tianyi Lin, Shiqian Ma, and Shuzhong Zhang. On the convergence rate of multi-block admm. arxiv preprint arxiv: , 229, [18] Tianyi Lin, Shiqian Ma, and Shuzhong Zhang. Global convergence of unmodified 3-block admm for a class of convex minimization problems. arxiv preprint arxiv: , [19] Tianyi Lin, Shiqian Ma, and Shuzhong Zhang. Iteration complexity analysis of multi-block admm for a family of convex minimization without strong convexity. Journal of Scientific Computing, 69(1):52 81, [20] Ronny Luss, Saharon Rosset, and Moni Shahar. Decomposing isotonic regression for efficiently solving large problems. In Advances in Neural Information Processing Systems, pages , [21] Sindri Magnússon, Pradeep Chathuranga Weeraddana, Michael G Rabbat, and Carlo Fischione. On the convergence of alternating direction lagrangian methods for nonconvex structured optimization problems. IEEE Transactions on Control of Network Systems, 3(3): , [22] Taesup Moon, Alex Smola, Yi Chang, and Zhaohui Zheng. Intervalrank: isotonic regression with listwise and pairwise constraints. In Proceedings of the third ACM international conference on Web search and data mining, pages ACM, [23] Hari Mukarjee and Steven Stern. Feasible nonparametric estimation of multiargument monotone functions. Journal of the American Statistical Association, 89(425):77 80, [24] Harikrishna Narasimhan and Shivani Agarwal. On the relationship between binary classification, bipartite ranking, and binary class probability estimation. In Advances in Neural Information Processing Systems, pages , [25] Shixian Qian and William F Eddy. An algorithm for isotonic regression on ordered rectangular grids. Journal of Computational and Graphical Statistics, 5(3): , [26] Daniel P Robinson and Rachael Tappenden. A flexible admm algorithm for big data applications. Journal of Scientific Computing, 71(1): ,

9 [27] Olli Saarela and Elja Arjas. A method for bayesian monotonic multiple regression. Scandinavian Journal of Statistics, 38(3): , [28] Georgia Salanti and Kurt Ulm. Multidimensional isotonic regression and estimation of the threshold value [29] Quentin F Stout. Isotonic regression via partitioning. Algorithmica, 66(1):93 112, [30] Quentin F Stout. Isotonic regression for multiple independent variables. Algorithmica, 71(2): , [31] Oleg Sysoev and Oleg Burdakov. A smoothed monotonic regression via l2 regularization. Knowledge and Information Systems, pages 1 22, [32] Oleg Sysoev, Oleg Burdakov, and Anders Grimvall. A segmentation-based algorithm for large-scale partially ordered monotonic regression. Computational Statistics & Data Analysis, 55(8): , [33] Min Tao and Xiaoming Yuan. Convergence analysis of the direct extension of admm for multiple-block separable convex minimization. Advances in Computational Mathematics, pages 1 41, [34] Fenghui Wang, Zongben Xu, and Hong-Kun Xu. Convergence of bregman alternating direction method with multipliers for nonconvex composite problems. arxiv preprint arxiv: , [35] Junxiang Wang and Liang Zhao. Multi-instance domain adaptation for vaccine adverse event detection. In Proceedings of the 2018 World Wide Web Conference on World Wide Web, pages International World Wide Web Conferences Steering Committee, [36] Yu Wang, Wotao Yin, and Jinshan Zeng. Global convergence of admm in nonconvex nonsmooth optimization. arxiv preprint arxiv: , [37] Zaiwen Wen, Xianhua Peng, Xin Liu, Xiaoling Sun, and Xiaodi Bai. Asset allocation under the basel accord risk measures. arxiv preprint arxiv: , [38] Zaiwen Wen, Chao Yang, Xin Liu, and Stefano Marchesini. Alternating direction methods for classical and ptychographic phase retrieval. Inverse Problems, 28(11):115010, [39] Bianca Zadrozny and Charles Elkan. Transforming classifier scores into accurate multiclass probability estimates. In Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining, pages ACM,

10 [40] Liang Zhao, Junxiang Wang, Feng Chen, Chang-Tien Lu, and Naren Ramakrishnan. Spatial event forecasting in social media with geographically hierarchical regularization. Proceedings of the IEEE, 105(10): , [41] Liang Zhao, Junxiang Wang, and Xiaojie Guo. Distant-supervision of heterogeneous multitask learning for social event forecasting with multilingual indicators

11 Appendix A The Multi-block ADMM Algorithm to Solve Problem 1 Algorithm 1 The Multi-block ADMM Algorithm to Solve Problem 1 1: Initialize p, q, u, y 1, y 2, ρ > 0, k = 0. 2: repeat 3: Update u k+1 in Equation (1). 4: Update p k+1 in Equation (2). 5: Update q k+1 in Equation (3). 6: Update r1 k+1 p k+1 q k+1 + u k+1. 7: Update r k+1 2,i p k+1 i+1 qk+1 i (i = 1,, n 2), r2 k+1 [r2,1 k+1 8: Update s k+1 1 ρ(p k+1 q k+1 p k + q k ). 9: Update s k+1 2 ρ(q k q k+1 ). 10: Update r k+1 r1 k rk # Calculate the primal residual. 11: Update s k+1 s k sk # Calculate the dual residual. 12: Update y1 k+1 y1 k + ρr1 k+1. 13: Update y2 k+1 y2 k + ρr2 k+1. 14: k k : until convergence. 16: Output p, q and u. 2,n 2 ]. The multi-block ADMM to solve Problem 1 is shown in Algorithm 1. Specifically, Line 3-5 update three primal variables u, p and q alternately; Line 6-11 compute primal and dual residuals; Line update two dual variables y 1 and y 2. Since the objective function and constraints are both separable, each subproblem has a closed-form solution and can be implemented in parallel. The detailed solutions to the subproblems are as follows. 1. Update u. The variable u is updated as follows: u k+1 arg min u (ρ/2) p k q k + u + y k 1 /ρ λ u 2 2, s.t. u 0. (1) This subproblem has a closed-from solution as follows: u k+1 max((ρ(q k p k ) y k 1 )/(ρ + 2λ), 0). 2. Update p. The variable p is updated as follows: p k+1 1 arg min p1 w 1 (x 1 p 1 ) 2 + (ρ/2) p 1 q1 k + u k y1 k /ρ 2 2 p k+1 i arg min pi (w i /2)(x i p i ) 2 + (ρ/2) p i qi k + u k+1 i + yi k /ρ (ρ/2) p i qi 1 k + y2,i 1/ρ k 2 2(i = 2,, n 1). (2) 11

12 This subproblem has a closed-from solution as follows: p k+1 1 (2w 1 x 1 + ρq k 1 ρu k+1 1 y k 1,1)/(2w 1 + ρ). p k+1 i (w i x i + ρq k i ρu k+1 i y k 1,i + ρq k i 1 y k 2,i 1)/(w i + 2ρ)(i = 2,, n 1). 3. Update q. The variable q is updated as follows: q k+1 i arg min qi (w i+1 /2)(x i+1 q i ) 2 + (ρ/2) p k+1 i q i + u k+1 i + y k 1,i/ρ (ρ/2)(p k+1 i+1 q i + y k 2,i/ρ)(i = 1,, n 2). q k+1 n 1 arg min q n 1 w n (x n q n 1 ) 2 + (ρ/2) p k+1 n 1 q n 1 + u k+1 n 1 + yk 1,n 1/ρ 2 2. (3) This subproblem has a closed-form solution as follows: q k+1 i (w i+1 x i+1 + ρp k+1 i + ρu k+1 i + y k 1,i + ρp k+1 i+1 + yk 2,i)/(w i+1 + 2ρ)(i = 1,, n 2). q k+1 n 1 (2w nx n + ρp k+1 n 1 + ρuk+1 n 1 + yk 1,n 1)/(2w n + ρ). In addition to the advantage of parallel computing, the convergence of the Algorithm 1 can be guaranteed by the following theorem: Theorem 1. There exists d > 0 such that if ρ < d, the Algorithm 1 converges globally to the optimal point with sublinear convergence rate. Proof. Because the objective function is strongly convex in p and q, all assumptions in [17] are satisfied and hence the Theorem 3.2 and Theorem 4.2 in [17] ensure the global sublinear convergence if ρ is smaller than a threshold d. B The Multi-block ADMM Algorithm to Solve Problem 2 The multi-block ADMM to solve Problem 2 is shown in Algorithm 2. Specifically, Line 3-5 update three primal variables v, g and h alternately; Line 6-12 compute primal and dual residuals; Line update two dual variables y 1 and y 2. Since the objective function and constraints are both separable, each subproblem has a closed-form solution and can be implemented in parallel. The detailed solutions to the subproblems are as follows: 1. Update v. The variable v is updated as follows: v k+1 arg min v (ρ/2) E 1 g k E 2 h k + v + y k 1 /ρ 2 2, s.t. v 0. (4) This subproblem has a closed-from solution as follows: v k+1 max( E 1 g k + E 2 h k y k 1 /ρ, 0). 12

13 Algorithm 2 The Multi-block ADMM Algorithm to Solve Problem 2 1: Initialize g, h, v, y 1, y 2, ρ > 0, k = 0. 2: repeat 3: Update v k+1 in Equation (4). 4: Update g k+1 in Equation (5). 5: Update h k+1 in Equation (6). 6: Update r k+1 1 E 1 g k+1 E 2 h k+1 + v k+1. 7: Update r k+1 2 g k+1 h k+1. 8: Update s k+1 1 ρ(e 1 g k+1 E 2 h k+1 E 1 g k + E 2 h k ). 9: Update s k+1 2 ρe 2 (h k h k+1 ). 10: Update s k+1 3 ρ(h k h k+1 ). 11: Update r k rk # Calculate the primal residual. s k sk sk # Calculate the dual r k+1 12: Update s k+1 residual. 13: Update y1 k+1 y1 k + ρr k+1 14: Update y2 k+1 y2 k + ρr k+1 15: k k : until convergence. 17: Output g, h and v Update g. The variable g is updated as follows: g k+1 arg min g W T ((Y g) (Y g))/2 + (ρ/2) E 1 g E 2 h k + v k+1 + y k 1 /ρ (ρ/2) g h k + y k 2 /ρ 2 2. (5) This subproblem has a closed-from solution as follows: g k+1 (diag(w ) + ρe T 1 E 1 + ρi) 1 (diag(w )Y + ρe T 1 E 2 h k ρe T 1 v k+1 E T 1 y k 1 + ρh k y k 2 ) where I is an identity matrix and diag(w ) is a diagonal matrix where the main diagonal is W. 3. Update h. The variable h is updated as follows: h k+1 arg min h W T ((Y h) (Y h))/2 + (ρ/2) E 1 g k+1 E 2 h + v k+1 + y k 1 /ρ (ρ/2) g k+1 h + y k 2 /ρ 2 2. (6) This subproblem has a closed-form solution as follows: h k+1 (diag(w ) + ρe T 2 E 2 + ρi) 1 (diagy + ρe T 2 E 1 g k+1 + ρe T 2 v k+1 + E T 2 y k 1 + ρg k+1 + y k 2 ). As for the convergence properties of Algorithm 2, the below theorem guarantees the convergence if ρ is small. Theorem 2. There exists d > 0 such that if ρ < d, then the Algorithm 2 converges globally to the optimal point with sublinear convergence rate. Proof. The proof is the same as Theorem 1. 13

14 C Related Work on the Multi-block ADMM The multi-block ADMM was firstly studied by Chen et al. [7] when they concluded that the multi-block ADMM does not necessarily converge by giving a counterexample. He and Yuan explained why the multi-block ADMM may diverge from the perspective of variational inequality framework [13]. Since then, many researchers studied sufficient conditions to ensure the global convergence of the multi-block ADMM. For example, Robinson and Tappenden and Lin et al. imposed strong convexity assumption on the objective function f i (x i )(i = 1,, n) [17,26]. Tao and Yuan did not require all objective functions f i (x i )(i = 1,, n) to be strongly convex, but imposed full-rank assumption on A i (i = 1,, n) [33]. Lin et al. weakened the strong convexity assumption on the objective function f i (x i )(i = 1,, n) by imposing the additional cocercity assumption on f i (x i )(i = 1,, n) [19]. Deng et al. proved that the multi-block ADMM preserved convergence if multiple variables x i (i = 1,, n) were updated in the Jacobian fashion rather than Gauss-Seidel fashion [8]. Lin et al. proved that the multi-block ADMM converged for any ρ > 0 in the regularized least squares decomposition problem [18]. Moreover, some work extended the multiblock ADMM into the nonconvex settings. See [21, 34, 36] for more information. D Baselines In order to test the scalability of the multi-block ADMM on two applications, two baselines were utilized for comparison: 1. Smoothed Pool-Adjacent-Violators(SPAV) [31]. The SPAV algorithm is a extension of the PAV algorithm designed for the smoothed isotonic regression problem. It partitions all β i (i = 1,, n) into many blocks according to the feasibility of inequality constraints. The iteration ends when all constraints are feasible. 2. Interior Point Method(IPM) [16]. The IPM method is an efficient algorithm to solve multi-dimensional ordering with convergence guarantee. Its time complexity is O( E 1.5 log 2 V log 2 ( V /δ)) where δ is a tolerance. 14

arxiv: v1 [math.oc] 23 May 2017

arxiv: v1 [math.oc] 23 May 2017 A DERANDOMIZED ALGORITHM FOR RP-ADMM WITH SYMMETRIC GAUSS-SEIDEL METHOD JINCHAO XU, KAILAI XU, AND YINYU YE arxiv:1705.08389v1 [math.oc] 23 May 2017 Abstract. For multi-block alternating direction method

More information

Fast Nonnegative Matrix Factorization with Rank-one ADMM

Fast Nonnegative Matrix Factorization with Rank-one ADMM Fast Nonnegative Matrix Factorization with Rank-one Dongjin Song, David A. Meyer, Martin Renqiang Min, Department of ECE, UCSD, La Jolla, CA, 9093-0409 dosong@ucsd.edu Department of Mathematics, UCSD,

More information

Online isotonic regression

Online isotonic regression Online isotonic regression Wojciech Kot lowski Joint work with: Wouter Koolen (CWI, Amsterdam) Alan Malek (MIT) Poznań University of Technology 06.06.2017 Outline 1 Motivation 2 Isotonic regression 3 Online

More information

Recent Developments of Alternating Direction Method of Multipliers with Multi-Block Variables

Recent Developments of Alternating Direction Method of Multipliers with Multi-Block Variables Recent Developments of Alternating Direction Method of Multipliers with Multi-Block Variables Department of Systems Engineering and Engineering Management The Chinese University of Hong Kong 2014 Workshop

More information

Distributed Optimization via Alternating Direction Method of Multipliers

Distributed Optimization via Alternating Direction Method of Multipliers Distributed Optimization via Alternating Direction Method of Multipliers Stephen Boyd, Neal Parikh, Eric Chu, Borja Peleato Stanford University ITMANET, Stanford, January 2011 Outline precursors dual decomposition

More information

A Bregman alternating direction method of multipliers for sparse probabilistic Boolean network problem

A Bregman alternating direction method of multipliers for sparse probabilistic Boolean network problem A Bregman alternating direction method of multipliers for sparse probabilistic Boolean network problem Kangkang Deng, Zheng Peng Abstract: The main task of genetic regulatory networks is to construct a

More information

An efficient ADMM algorithm for high dimensional precision matrix estimation via penalized quadratic loss

An efficient ADMM algorithm for high dimensional precision matrix estimation via penalized quadratic loss An efficient ADMM algorithm for high dimensional precision matrix estimation via penalized quadratic loss arxiv:1811.04545v1 [stat.co] 12 Nov 2018 Cheng Wang School of Mathematical Sciences, Shanghai Jiao

More information

A Customized ADMM for Rank-Constrained Optimization Problems with Approximate Formulations

A Customized ADMM for Rank-Constrained Optimization Problems with Approximate Formulations A Customized ADMM for Rank-Constrained Optimization Problems with Approximate Formulations Chuangchuang Sun and Ran Dai Abstract This paper proposes a customized Alternating Direction Method of Multipliers

More information

Shiqian Ma, MAT-258A: Numerical Optimization 1. Chapter 9. Alternating Direction Method of Multipliers

Shiqian Ma, MAT-258A: Numerical Optimization 1. Chapter 9. Alternating Direction Method of Multipliers Shiqian Ma, MAT-258A: Numerical Optimization 1 Chapter 9 Alternating Direction Method of Multipliers Shiqian Ma, MAT-258A: Numerical Optimization 2 Separable convex optimization a special case is min f(x)

More information

Decomposing Isotonic Regression for Efficiently Solving Large Problems

Decomposing Isotonic Regression for Efficiently Solving Large Problems Decomposing Isotonic Regression for Efficiently Solving Large Problems Ronny Luss Dept. of Statistics and OR Tel Aviv University ronnyluss@gmail.com Saharon Rosset Dept. of Statistics and OR Tel Aviv University

More information

ARock: an algorithmic framework for asynchronous parallel coordinate updates

ARock: an algorithmic framework for asynchronous parallel coordinate updates ARock: an algorithmic framework for asynchronous parallel coordinate updates Zhimin Peng, Yangyang Xu, Ming Yan, Wotao Yin ( UCLA Math, U.Waterloo DCO) UCLA CAM Report 15-37 ShanghaiTech SSDS 15 June 25,

More information

Does Alternating Direction Method of Multipliers Converge for Nonconvex Problems?

Does Alternating Direction Method of Multipliers Converge for Nonconvex Problems? Does Alternating Direction Method of Multipliers Converge for Nonconvex Problems? Mingyi Hong IMSE and ECpE Department Iowa State University ICCOPT, Tokyo, August 2016 Mingyi Hong (Iowa State University)

More information

Dual Ascent. Ryan Tibshirani Convex Optimization

Dual Ascent. Ryan Tibshirani Convex Optimization Dual Ascent Ryan Tibshirani Conve Optimization 10-725 Last time: coordinate descent Consider the problem min f() where f() = g() + n i=1 h i( i ), with g conve and differentiable and each h i conve. Coordinate

More information

Divide-and-combine Strategies in Statistical Modeling for Massive Data

Divide-and-combine Strategies in Statistical Modeling for Massive Data Divide-and-combine Strategies in Statistical Modeling for Massive Data Liqun Yu Washington University in St. Louis March 30, 2017 Liqun Yu (WUSTL) D&C Statistical Modeling for Massive Data March 30, 2017

More information

Uses of duality. Geoff Gordon & Ryan Tibshirani Optimization /

Uses of duality. Geoff Gordon & Ryan Tibshirani Optimization / Uses of duality Geoff Gordon & Ryan Tibshirani Optimization 10-725 / 36-725 1 Remember conjugate functions Given f : R n R, the function is called its conjugate f (y) = max x R n yt x f(x) Conjugates appear

More information

Nonconvex ADMM: Convergence and Applications

Nonconvex ADMM: Convergence and Applications Nonconvex ADMM: Convergence and Applications Instructor: Wotao Yin (UCLA Math) Based on CAM 15-62 with Yu Wang and Jinshan Zeng Summer 2016 1 / 54 1. Alternating Direction Method of Multipliers (ADMM):

More information

Beyond Heuristics: Applying Alternating Direction Method of Multipliers in Nonconvex Territory

Beyond Heuristics: Applying Alternating Direction Method of Multipliers in Nonconvex Territory Beyond Heuristics: Applying Alternating Direction Method of Multipliers in Nonconvex Territory Xin Liu(4Ð) State Key Laboratory of Scientific and Engineering Computing Institute of Computational Mathematics

More information

Alternating Direction Method of Multipliers. Ryan Tibshirani Convex Optimization

Alternating Direction Method of Multipliers. Ryan Tibshirani Convex Optimization Alternating Direction Method of Multipliers Ryan Tibshirani Convex Optimization 10-725 Consider the problem Last time: dual ascent min x f(x) subject to Ax = b where f is strictly convex and closed. Denote

More information

The Alternating Direction Method of Multipliers

The Alternating Direction Method of Multipliers The Alternating Direction Method of Multipliers With Adaptive Step Size Selection Peter Sutor, Jr. Project Advisor: Professor Tom Goldstein December 2, 2015 1 / 25 Background The Dual Problem Consider

More information

The Direct Extension of ADMM for Multi-block Convex Minimization Problems is Not Necessarily Convergent

The Direct Extension of ADMM for Multi-block Convex Minimization Problems is Not Necessarily Convergent The Direct Extension of ADMM for Multi-block Convex Minimization Problems is Not Necessarily Convergent Yinyu Ye K. T. Li Professor of Engineering Department of Management Science and Engineering Stanford

More information

Contraction Methods for Convex Optimization and Monotone Variational Inequalities No.18

Contraction Methods for Convex Optimization and Monotone Variational Inequalities No.18 XVIII - 1 Contraction Methods for Convex Optimization and Monotone Variational Inequalities No18 Linearized alternating direction method with Gaussian back substitution for separable convex optimization

More information

Decentralized Quadratically Approximated Alternating Direction Method of Multipliers

Decentralized Quadratically Approximated Alternating Direction Method of Multipliers Decentralized Quadratically Approimated Alternating Direction Method of Multipliers Aryan Mokhtari Wei Shi Qing Ling Alejandro Ribeiro Department of Electrical and Systems Engineering, University of Pennsylvania

More information

Distributed Optimization and Statistics via Alternating Direction Method of Multipliers

Distributed Optimization and Statistics via Alternating Direction Method of Multipliers Distributed Optimization and Statistics via Alternating Direction Method of Multipliers Stephen Boyd, Neal Parikh, Eric Chu, Borja Peleato Stanford University Stanford Statistics Seminar, September 2010

More information

Contraction Methods for Convex Optimization and monotone variational inequalities No.12

Contraction Methods for Convex Optimization and monotone variational inequalities No.12 XII - 1 Contraction Methods for Convex Optimization and monotone variational inequalities No.12 Linearized alternating direction methods of multipliers for separable convex programming Bingsheng He Department

More information

Contraction Methods for Convex Optimization and Monotone Variational Inequalities No.11

Contraction Methods for Convex Optimization and Monotone Variational Inequalities No.11 XI - 1 Contraction Methods for Convex Optimization and Monotone Variational Inequalities No.11 Alternating direction methods of multipliers for separable convex programming Bingsheng He Department of Mathematics

More information

Optimizing Nonconvex Finite Sums by a Proximal Primal-Dual Method

Optimizing Nonconvex Finite Sums by a Proximal Primal-Dual Method Optimizing Nonconvex Finite Sums by a Proximal Primal-Dual Method Davood Hajinezhad Iowa State University Davood Hajinezhad Optimizing Nonconvex Finite Sums by a Proximal Primal-Dual Method 1 / 35 Co-Authors

More information

Dual methods and ADMM. Barnabas Poczos & Ryan Tibshirani Convex Optimization /36-725

Dual methods and ADMM. Barnabas Poczos & Ryan Tibshirani Convex Optimization /36-725 Dual methods and ADMM Barnabas Poczos & Ryan Tibshirani Convex Optimization 10-725/36-725 1 Given f : R n R, the function is called its conjugate Recall conjugate functions f (y) = max x R n yt x f(x)

More information

ECS289: Scalable Machine Learning

ECS289: Scalable Machine Learning ECS289: Scalable Machine Learning Cho-Jui Hsieh UC Davis Sept 29, 2016 Outline Convex vs Nonconvex Functions Coordinate Descent Gradient Descent Newton s method Stochastic Gradient Descent Numerical Optimization

More information

Multi-Block ADMM and its Convergence

Multi-Block ADMM and its Convergence Multi-Block ADMM and its Convergence Yinyu Ye Department of Management Science and Engineering Stanford University Email: yyye@stanford.edu We show that the direct extension of alternating direction method

More information

Exact Hybrid Covariance Thresholding for Joint Graphical Lasso

Exact Hybrid Covariance Thresholding for Joint Graphical Lasso Exact Hybrid Covariance Thresholding for Joint Graphical Lasso Qingming Tang Chao Yang Jian Peng Jinbo Xu Toyota Technological Institute at Chicago Massachusetts Institute of Technology Abstract. This

More information

Introduction to Alternating Direction Method of Multipliers

Introduction to Alternating Direction Method of Multipliers Introduction to Alternating Direction Method of Multipliers Yale Chang Machine Learning Group Meeting September 29, 2016 Yale Chang (Machine Learning Group Meeting) Introduction to Alternating Direction

More information

STA141C: Big Data & High Performance Statistical Computing

STA141C: Big Data & High Performance Statistical Computing STA141C: Big Data & High Performance Statistical Computing Lecture 8: Optimization Cho-Jui Hsieh UC Davis May 9, 2017 Optimization Numerical Optimization Numerical Optimization: min X f (X ) Can be applied

More information

A Randomized Approach for Crowdsourcing in the Presence of Multiple Views

A Randomized Approach for Crowdsourcing in the Presence of Multiple Views A Randomized Approach for Crowdsourcing in the Presence of Multiple Views Presenter: Yao Zhou joint work with: Jingrui He - 1 - Roadmap Motivation Proposed framework: M2VW Experimental results Conclusion

More information

Dealing with Constraints via Random Permutation

Dealing with Constraints via Random Permutation Dealing with Constraints via Random Permutation Ruoyu Sun UIUC Joint work with Zhi-Quan Luo (U of Minnesota and CUHK (SZ)) and Yinyu Ye (Stanford) Simons Institute Workshop on Fast Iterative Methods in

More information

Dual Methods. Lecturer: Ryan Tibshirani Convex Optimization /36-725

Dual Methods. Lecturer: Ryan Tibshirani Convex Optimization /36-725 Dual Methods Lecturer: Ryan Tibshirani Conve Optimization 10-725/36-725 1 Last time: proimal Newton method Consider the problem min g() + h() where g, h are conve, g is twice differentiable, and h is simple.

More information

Linearized Alternating Direction Method with Adaptive Penalty for Low-Rank Representation

Linearized Alternating Direction Method with Adaptive Penalty for Low-Rank Representation Linearized Alternating Direction Method with Adaptive Penalty for Low-Rank Representation Zhouchen Lin Visual Computing Group Microsoft Research Asia Risheng Liu Zhixun Su School of Mathematical Sciences

More information

Properties of optimizations used in penalized Gaussian likelihood inverse covariance matrix estimation

Properties of optimizations used in penalized Gaussian likelihood inverse covariance matrix estimation Properties of optimizations used in penalized Gaussian likelihood inverse covariance matrix estimation Adam J. Rothman School of Statistics University of Minnesota October 8, 2014, joint work with Liliana

More information

BAGUS: Bayesian Regularization for Graphical Models with Unequal Shrinkage

BAGUS: Bayesian Regularization for Graphical Models with Unequal Shrinkage BAGUS: Bayesian Regularization for Graphical Models with Unequal Shrinkage Lingrui Gan, Naveen N. Narisetty, Feng Liang Department of Statistics University of Illinois at Urbana-Champaign Problem Statement

More information

Distributed Inexact Newton-type Pursuit for Non-convex Sparse Learning

Distributed Inexact Newton-type Pursuit for Non-convex Sparse Learning Distributed Inexact Newton-type Pursuit for Non-convex Sparse Learning Bo Liu Department of Computer Science, Rutgers Univeristy Xiao-Tong Yuan BDAT Lab, Nanjing University of Information Science and Technology

More information

Forward Selection and Estimation in High Dimensional Single Index Models

Forward Selection and Estimation in High Dimensional Single Index Models Forward Selection and Estimation in High Dimensional Single Index Models Shikai Luo and Subhashis Ghosal North Carolina State University August 29, 2016 Abstract We propose a new variable selection and

More information

LARGE SCALE 2D SPECTRAL COMPRESSED SENSING IN CONTINUOUS DOMAIN

LARGE SCALE 2D SPECTRAL COMPRESSED SENSING IN CONTINUOUS DOMAIN LARGE SCALE 2D SPECTRAL COMPRESSED SENSING IN CONTINUOUS DOMAIN Jian-Feng Cai, Weiyu Xu 2, and Yang Yang 3 Department of Mathematics, Hong Kong University of Science and Technology, Clear Water Bay, Kowloon,

More information

Simplicial Nonnegative Matrix Tri-Factorization: Fast Guaranteed Parallel Algorithm

Simplicial Nonnegative Matrix Tri-Factorization: Fast Guaranteed Parallel Algorithm Simplicial Nonnegative Matrix Tri-Factorization: Fast Guaranteed Parallel Algorithm Duy-Khuong Nguyen 13, Quoc Tran-Dinh 2, and Tu-Bao Ho 14 1 Japan Advanced Institute of Science and Technology, Japan

More information

A GENERAL FORMULATION FOR SUPPORT VECTOR MACHINES. Wei Chu, S. Sathiya Keerthi, Chong Jin Ong

A GENERAL FORMULATION FOR SUPPORT VECTOR MACHINES. Wei Chu, S. Sathiya Keerthi, Chong Jin Ong A GENERAL FORMULATION FOR SUPPORT VECTOR MACHINES Wei Chu, S. Sathiya Keerthi, Chong Jin Ong Control Division, Department of Mechanical Engineering, National University of Singapore 0 Kent Ridge Crescent,

More information

Coordinate Update Algorithm Short Course Operator Splitting

Coordinate Update Algorithm Short Course Operator Splitting Coordinate Update Algorithm Short Course Operator Splitting Instructor: Wotao Yin (UCLA Math) Summer 2016 1 / 25 Operator splitting pipeline 1. Formulate a problem as 0 A(x) + B(x) with monotone operators

More information

Math 273a: Optimization Overview of First-Order Optimization Algorithms

Math 273a: Optimization Overview of First-Order Optimization Algorithms Math 273a: Optimization Overview of First-Order Optimization Algorithms Wotao Yin Department of Mathematics, UCLA online discussions on piazza.com 1 / 9 Typical flow of numerical optimization Optimization

More information

Differential network analysis from cross-platform gene expression data: Supplementary Information

Differential network analysis from cross-platform gene expression data: Supplementary Information Differential network analysis from cross-platform gene expression data: Supplementary Information Xiao-Fei Zhang, Le Ou-Yang, Xing-Ming Zhao, and Hong Yan Contents 1 Supplementary Figures Supplementary

More information

Linearized Alternating Direction Method of Multipliers for Constrained Nonconvex Regularized Optimization

Linearized Alternating Direction Method of Multipliers for Constrained Nonconvex Regularized Optimization JMLR: Workshop and Conference Proceedings 63:97 19, 216 ACML 216 Linearized Alternating Direction Method of Multipliers for Constrained Nonconvex Regularized Optimization Linbo Qiao qiao.linbo@nudt.edu.cn

More information

Distributed Convex Optimization

Distributed Convex Optimization Master Program 2013-2015 Electrical Engineering Distributed Convex Optimization A Study on the Primal-Dual Method of Multipliers Delft University of Technology He Ming Zhang, Guoqiang Zhang, Richard Heusdens

More information

Supplementary Material of A Novel Sparsity Measure for Tensor Recovery

Supplementary Material of A Novel Sparsity Measure for Tensor Recovery Supplementary Material of A Novel Sparsity Measure for Tensor Recovery Qian Zhao 1,2 Deyu Meng 1,2 Xu Kong 3 Qi Xie 1,2 Wenfei Cao 1,2 Yao Wang 1,2 Zongben Xu 1,2 1 School of Mathematics and Statistics,

More information

2 Regularized Image Reconstruction for Compressive Imaging and Beyond

2 Regularized Image Reconstruction for Compressive Imaging and Beyond EE 367 / CS 448I Computational Imaging and Display Notes: Compressive Imaging and Regularized Image Reconstruction (lecture ) Gordon Wetzstein gordon.wetzstein@stanford.edu This document serves as a supplement

More information

Recent Advances in Structured Sparse Models

Recent Advances in Structured Sparse Models Recent Advances in Structured Sparse Models Julien Mairal Willow group - INRIA - ENS - Paris 21 September 2010 LEAR seminar At Grenoble, September 21 st, 2010 Julien Mairal Recent Advances in Structured

More information

Learning Bound for Parameter Transfer Learning

Learning Bound for Parameter Transfer Learning Learning Bound for Parameter Transfer Learning Wataru Kumagai Faculty of Engineering Kanagawa University kumagai@kanagawa-u.ac.jp Abstract We consider a transfer-learning problem by using the parameter

More information

ECS289: Scalable Machine Learning

ECS289: Scalable Machine Learning ECS289: Scalable Machine Learning Cho-Jui Hsieh UC Davis Oct 18, 2016 Outline One versus all/one versus one Ranking loss for multiclass/multilabel classification Scaling to millions of labels Multiclass

More information

Sparse Gaussian conditional random fields

Sparse Gaussian conditional random fields Sparse Gaussian conditional random fields Matt Wytock, J. ico Kolter School of Computer Science Carnegie Mellon University Pittsburgh, PA 53 {mwytock, zkolter}@cs.cmu.edu Abstract We propose sparse Gaussian

More information

HYBRID JACOBIAN AND GAUSS SEIDEL PROXIMAL BLOCK COORDINATE UPDATE METHODS FOR LINEARLY CONSTRAINED CONVEX PROGRAMMING

HYBRID JACOBIAN AND GAUSS SEIDEL PROXIMAL BLOCK COORDINATE UPDATE METHODS FOR LINEARLY CONSTRAINED CONVEX PROGRAMMING SIAM J. OPTIM. Vol. 8, No. 1, pp. 646 670 c 018 Society for Industrial and Applied Mathematics HYBRID JACOBIAN AND GAUSS SEIDEL PROXIMAL BLOCK COORDINATE UPDATE METHODS FOR LINEARLY CONSTRAINED CONVEX

More information

A Brief Overview of Practical Optimization Algorithms in the Context of Relaxation

A Brief Overview of Practical Optimization Algorithms in the Context of Relaxation A Brief Overview of Practical Optimization Algorithms in the Context of Relaxation Zhouchen Lin Peking University April 22, 2018 Too Many Opt. Problems! Too Many Opt. Algorithms! Zero-th order algorithms:

More information

10-725/36-725: Convex Optimization Spring Lecture 21: April 6

10-725/36-725: Convex Optimization Spring Lecture 21: April 6 10-725/36-725: Conve Optimization Spring 2015 Lecturer: Ryan Tibshirani Lecture 21: April 6 Scribes: Chiqun Zhang, Hanqi Cheng, Waleed Ammar Note: LaTeX template courtesy of UC Berkeley EECS dept. Disclaimer:

More information

Preconditioning via Diagonal Scaling

Preconditioning via Diagonal Scaling Preconditioning via Diagonal Scaling Reza Takapoui Hamid Javadi June 4, 2014 1 Introduction Interior point methods solve small to medium sized problems to high accuracy in a reasonable amount of time.

More information

Asynchronous Parallel Computing in Signal Processing and Machine Learning

Asynchronous Parallel Computing in Signal Processing and Machine Learning Asynchronous Parallel Computing in Signal Processing and Machine Learning Wotao Yin (UCLA Math) joint with Zhimin Peng (UCLA), Yangyang Xu (IMA), Ming Yan (MSU) Optimization and Parsimonious Modeling IMA,

More information

The Direct Extension of ADMM for Multi-block Convex Minimization Problems is Not Necessarily Convergent

The Direct Extension of ADMM for Multi-block Convex Minimization Problems is Not Necessarily Convergent The Direct Extension of ADMM for Multi-block Convex Minimization Problems is Not Necessarily Convergent Caihua Chen Bingsheng He Yinyu Ye Xiaoming Yuan 4 December, Abstract. The alternating direction method

More information

A New Combined Approach for Inference in High-Dimensional Regression Models with Correlated Variables

A New Combined Approach for Inference in High-Dimensional Regression Models with Correlated Variables A New Combined Approach for Inference in High-Dimensional Regression Models with Correlated Variables Niharika Gauraha and Swapan Parui Indian Statistical Institute Abstract. We consider the problem of

More information

Fast ADMM for Sum of Squares Programs Using Partial Orthogonality

Fast ADMM for Sum of Squares Programs Using Partial Orthogonality Fast ADMM for Sum of Squares Programs Using Partial Orthogonality Antonis Papachristodoulou Department of Engineering Science University of Oxford www.eng.ox.ac.uk/control/sysos antonis@eng.ox.ac.uk with

More information

Solving DC Programs that Promote Group 1-Sparsity

Solving DC Programs that Promote Group 1-Sparsity Solving DC Programs that Promote Group 1-Sparsity Ernie Esser Contains joint work with Xiaoqun Zhang, Yifei Lou and Jack Xin SIAM Conference on Imaging Science Hong Kong Baptist University May 14 2014

More information

Support Vector Machines and Kernel Methods

Support Vector Machines and Kernel Methods 2018 CS420 Machine Learning, Lecture 3 Hangout from Prof. Andrew Ng. http://cs229.stanford.edu/notes/cs229-notes3.pdf Support Vector Machines and Kernel Methods Weinan Zhang Shanghai Jiao Tong University

More information

L 2,1 Norm and its Applications

L 2,1 Norm and its Applications L 2, Norm and its Applications Yale Chang Introduction According to the structure of the constraints, the sparsity can be obtained from three types of regularizers for different purposes.. Flat Sparsity.

More information

Fast Asynchronous Parallel Stochastic Gradient Descent: A Lock-Free Approach with Convergence Guarantee

Fast Asynchronous Parallel Stochastic Gradient Descent: A Lock-Free Approach with Convergence Guarantee Fast Asynchronous Parallel Stochastic Gradient Descent: A Lock-Free Approach with Convergence Guarantee Shen-Yi Zhao and Wu-Jun Li National Key Laboratory for Novel Software Technology Department of Computer

More information

Scalable Subspace Clustering

Scalable Subspace Clustering Scalable Subspace Clustering René Vidal Center for Imaging Science, Laboratory for Computational Sensing and Robotics, Institute for Computational Medicine, Department of Biomedical Engineering, Johns

More information

arxiv: v1 [cs.cv] 1 Jun 2014

arxiv: v1 [cs.cv] 1 Jun 2014 l 1 -regularized Outlier Isolation and Regression arxiv:1406.0156v1 [cs.cv] 1 Jun 2014 Han Sheng Department of Electrical and Electronic Engineering, The University of Hong Kong, HKU Hong Kong, China sheng4151@gmail.com

More information

Accelerated primal-dual methods for linearly constrained convex problems

Accelerated primal-dual methods for linearly constrained convex problems Accelerated primal-dual methods for linearly constrained convex problems Yangyang Xu SIAM Conference on Optimization May 24, 2017 1 / 23 Accelerated proximal gradient For convex composite problem: minimize

More information

Jeff Howbert Introduction to Machine Learning Winter

Jeff Howbert Introduction to Machine Learning Winter Classification / Regression Support Vector Machines Jeff Howbert Introduction to Machine Learning Winter 2012 1 Topics SVM classifiers for linearly separable classes SVM classifiers for non-linearly separable

More information

A Tutorial on Support Vector Machine

A Tutorial on Support Vector Machine A Tutorial on School of Computing National University of Singapore Contents Theory on Using with Other s Contents Transforming Theory on Using with Other s What is a classifier? A function that maps instances

More information

SMO vs PDCO for SVM: Sequential Minimal Optimization vs Primal-Dual interior method for Convex Objectives for Support Vector Machines

SMO vs PDCO for SVM: Sequential Minimal Optimization vs Primal-Dual interior method for Convex Objectives for Support Vector Machines vs for SVM: Sequential Minimal Optimization vs Primal-Dual interior method for Convex Objectives for Support Vector Machines Ding Ma Michael Saunders Working paper, January 5 Introduction In machine learning,

More information

GROUP-SPARSE SUBSPACE CLUSTERING WITH MISSING DATA

GROUP-SPARSE SUBSPACE CLUSTERING WITH MISSING DATA GROUP-SPARSE SUBSPACE CLUSTERING WITH MISSING DATA D Pimentel-Alarcón 1, L Balzano 2, R Marcia 3, R Nowak 1, R Willett 1 1 University of Wisconsin - Madison, 2 University of Michigan - Ann Arbor, 3 University

More information

Nonlinear Support Vector Machines through Iterative Majorization and I-Splines

Nonlinear Support Vector Machines through Iterative Majorization and I-Splines Nonlinear Support Vector Machines through Iterative Majorization and I-Splines P.J.F. Groenen G. Nalbantov J.C. Bioch July 9, 26 Econometric Institute Report EI 26-25 Abstract To minimize the primal support

More information

Large-scale Stochastic Optimization

Large-scale Stochastic Optimization Large-scale Stochastic Optimization 11-741/641/441 (Spring 2016) Hanxiao Liu hanxiaol@cs.cmu.edu March 24, 2016 1 / 22 Outline 1. Gradient Descent (GD) 2. Stochastic Gradient Descent (SGD) Formulation

More information

Support Vector Machines: Maximum Margin Classifiers

Support Vector Machines: Maximum Margin Classifiers Support Vector Machines: Maximum Margin Classifiers Machine Learning and Pattern Recognition: September 16, 2008 Piotr Mirowski Based on slides by Sumit Chopra and Fu-Jie Huang 1 Outline What is behind

More information

Algorithm-Hardware Co-Optimization of Memristor-Based Framework for Solving SOCP and Homogeneous QCQP Problems

Algorithm-Hardware Co-Optimization of Memristor-Based Framework for Solving SOCP and Homogeneous QCQP Problems L.C.Smith College of Engineering and Computer Science Algorithm-Hardware Co-Optimization of Memristor-Based Framework for Solving SOCP and Homogeneous QCQP Problems Ao Ren Sijia Liu Ruizhe Cai Wujie Wen

More information

ADMM and Fast Gradient Methods for Distributed Optimization

ADMM and Fast Gradient Methods for Distributed Optimization ADMM and Fast Gradient Methods for Distributed Optimization João Xavier Instituto Sistemas e Robótica (ISR), Instituto Superior Técnico (IST) European Control Conference, ECC 13 July 16, 013 Joint work

More information

ECS289: Scalable Machine Learning

ECS289: Scalable Machine Learning ECS289: Scalable Machine Learning Cho-Jui Hsieh UC Davis Oct 27, 2015 Outline One versus all/one versus one Ranking loss for multiclass/multilabel classification Scaling to millions of labels Multiclass

More information

A Fast Augmented Lagrangian Algorithm for Learning Low-Rank Matrices

A Fast Augmented Lagrangian Algorithm for Learning Low-Rank Matrices A Fast Augmented Lagrangian Algorithm for Learning Low-Rank Matrices Ryota Tomioka 1, Taiji Suzuki 1, Masashi Sugiyama 2, Hisashi Kashima 1 1 The University of Tokyo 2 Tokyo Institute of Technology 2010-06-22

More information

ON THE GLOBAL AND LINEAR CONVERGENCE OF THE GENERALIZED ALTERNATING DIRECTION METHOD OF MULTIPLIERS

ON THE GLOBAL AND LINEAR CONVERGENCE OF THE GENERALIZED ALTERNATING DIRECTION METHOD OF MULTIPLIERS ON THE GLOBAL AND LINEAR CONVERGENCE OF THE GENERALIZED ALTERNATING DIRECTION METHOD OF MULTIPLIERS WEI DENG AND WOTAO YIN Abstract. The formulation min x,y f(x) + g(y) subject to Ax + By = b arises in

More information

Coordinate descent methods

Coordinate descent methods Coordinate descent methods Master Mathematics for data science and big data Olivier Fercoq November 3, 05 Contents Exact coordinate descent Coordinate gradient descent 3 3 Proximal coordinate descent 5

More information

Discriminative Feature Grouping

Discriminative Feature Grouping Discriative Feature Grouping Lei Han and Yu Zhang, Department of Computer Science, Hong Kong Baptist University, Hong Kong The Institute of Research and Continuing Education, Hong Kong Baptist University

More information

Differentiable Fine-grained Quantization for Deep Neural Network Compression

Differentiable Fine-grained Quantization for Deep Neural Network Compression Differentiable Fine-grained Quantization for Deep Neural Network Compression Hsin-Pai Cheng hc218@duke.edu Yuanjun Huang University of Science and Technology of China Anhui, China yjhuang@mail.ustc.edu.cn

More information

Inexact Alternating Direction Method of Multipliers for Separable Convex Optimization

Inexact Alternating Direction Method of Multipliers for Separable Convex Optimization Inexact Alternating Direction Method of Multipliers for Separable Convex Optimization Hongchao Zhang hozhang@math.lsu.edu Department of Mathematics Center for Computation and Technology Louisiana State

More information

MULTIPLEKERNELLEARNING CSE902

MULTIPLEKERNELLEARNING CSE902 MULTIPLEKERNELLEARNING CSE902 Multiple Kernel Learning -keywords Heterogeneous information fusion Feature selection Max-margin classification Multiple kernel learning MKL Convex optimization Kernel classification

More information

Matrix Factorization with Applications to Clustering Problems: Formulation, Algorithms and Performance

Matrix Factorization with Applications to Clustering Problems: Formulation, Algorithms and Performance Date: Mar. 3rd, 2017 Matrix Factorization with Applications to Clustering Problems: Formulation, Algorithms and Performance Presenter: Songtao Lu Department of Electrical and Computer Engineering Iowa

More information

On the Convergence of Multi-Block Alternating Direction Method of Multipliers and Block Coordinate Descent Method

On the Convergence of Multi-Block Alternating Direction Method of Multipliers and Block Coordinate Descent Method On the Convergence of Multi-Block Alternating Direction Method of Multipliers and Block Coordinate Descent Method Caihua Chen Min Li Xin Liu Yinyu Ye This version: September 9, 2015 Abstract The paper

More information

Sparse inverse covariance estimation with the lasso

Sparse inverse covariance estimation with the lasso Sparse inverse covariance estimation with the lasso Jerome Friedman Trevor Hastie and Robert Tibshirani November 8, 2007 Abstract We consider the problem of estimating sparse graphs by a lasso penalty

More information

An Optimization-based Approach to Decentralized Assignability

An Optimization-based Approach to Decentralized Assignability 2016 American Control Conference (ACC) Boston Marriott Copley Place July 6-8, 2016 Boston, MA, USA An Optimization-based Approach to Decentralized Assignability Alborz Alavian and Michael Rotkowitz Abstract

More information

A Unified Approach to Proximal Algorithms using Bregman Distance

A Unified Approach to Proximal Algorithms using Bregman Distance A Unified Approach to Proximal Algorithms using Bregman Distance Yi Zhou a,, Yingbin Liang a, Lixin Shen b a Department of Electrical Engineering and Computer Science, Syracuse University b Department

More information

An Efficient Algorithm For Weak Hierarchical Lasso. Yashu Liu, Jie Wang, Jieping Ye Arizona State University

An Efficient Algorithm For Weak Hierarchical Lasso. Yashu Liu, Jie Wang, Jieping Ye Arizona State University An Efficient Algorithm For Weak Hierarchical Lasso Yashu Liu, Jie Wang, Jieping Ye Arizona State University Outline Regression with Interactions Problems and Challenges Weak Hierarchical Lasso The Proposed

More information

Big Data Analytics: Optimization and Randomization

Big Data Analytics: Optimization and Randomization Big Data Analytics: Optimization and Randomization Tianbao Yang Tutorial@ACML 2015 Hong Kong Department of Computer Science, The University of Iowa, IA, USA Nov. 20, 2015 Yang Tutorial for ACML 15 Nov.

More information

Analysis of Robust PCA via Local Incoherence

Analysis of Robust PCA via Local Incoherence Analysis of Robust PCA via Local Incoherence Huishuai Zhang Department of EECS Syracuse University Syracuse, NY 3244 hzhan23@syr.edu Yi Zhou Department of EECS Syracuse University Syracuse, NY 3244 yzhou35@syr.edu

More information

4y Springer NONLINEAR INTEGER PROGRAMMING

4y Springer NONLINEAR INTEGER PROGRAMMING NONLINEAR INTEGER PROGRAMMING DUAN LI Department of Systems Engineering and Engineering Management The Chinese University of Hong Kong Shatin, N. T. Hong Kong XIAOLING SUN Department of Mathematics Shanghai

More information

Change point method: an exact line search method for SVMs

Change point method: an exact line search method for SVMs Erasmus University Rotterdam Bachelor Thesis Econometrics & Operations Research Change point method: an exact line search method for SVMs Author: Yegor Troyan Student number: 386332 Supervisor: Dr. P.J.F.

More information

LOCAL LINEAR CONVERGENCE OF ADMM Daniel Boley

LOCAL LINEAR CONVERGENCE OF ADMM Daniel Boley LOCAL LINEAR CONVERGENCE OF ADMM Daniel Boley Model QP/LP: min 1 / 2 x T Qx+c T x s.t. Ax = b, x 0, (1) Lagrangian: L(x,y) = 1 / 2 x T Qx+c T x y T x s.t. Ax = b, (2) where y 0 is the vector of Lagrange

More information

Convergence of Bregman Alternating Direction Method with Multipliers for Nonconvex Composite Problems

Convergence of Bregman Alternating Direction Method with Multipliers for Nonconvex Composite Problems Convergence of Bregman Alternating Direction Method with Multipliers for Nonconvex Composite Problems Fenghui Wang, Zongben Xu, and Hong-Kun Xu arxiv:40.8625v3 [math.oc] 5 Dec 204 Abstract The alternating

More information

Distributed Consensus Optimization

Distributed Consensus Optimization Distributed Consensus Optimization Ming Yan Michigan State University, CMSE/Mathematics September 14, 2018 Decentralized-1 Backgroundwhy andwe motivation need decentralized optimization? I Decentralized

More information

Sparse and Regularized Optimization

Sparse and Regularized Optimization Sparse and Regularized Optimization In many applications, we seek not an exact minimizer of the underlying objective, but rather an approximate minimizer that satisfies certain desirable properties: sparsity

More information