arxiv: v1 [math.oc] 4 Mar 2019
The Application of Multi-block ADMM on Isotonic Regression Problems

Junxiang Wang and Liang Zhao
George Mason University

Abstract

The multi-block ADMM has received much attention from optimization researchers due to its good scalability. In this paper, the multi-block ADMM is applied to solve two large-scale problems related to isotonic regression. Numerical experiments show that the multi-block ADMM converges when the chosen penalty parameter is small enough and that it scales well compared with baselines.

1 Introduction

In recent years, the Alternating Direction Method of Multipliers (ADMM) has received considerable attention from the machine learning community. This is because it is a natural fit for a wide range of large-scale data applications, including asset allocation [37], phase retrieval [38], event forecasting [40, 41], vaccine adverse event detection [35] and compressive sensing [6]. The main advantage of the ADMM is that it splits a separable objective function into two subproblems, each of which is easy to solve. Therefore, the ADMM can be implemented in a distributed and parallel manner. One part of the ADMM community investigates the convergence properties of the multi-block ADMM, the extension of the classic ADMM to no fewer than three variables. The multi-block ADMM problem is written mathematically as follows:

\[ \min_{x_1,\dots,x_n} \sum_{i=1}^{n} f_i(x_i), \quad \text{s.t.} \quad \sum_{i=1}^{n} A_i x_i = 0, \]

where $f_i : \mathbb{R}^{m_i} \to \mathbb{R}$ ($i = 1,\dots,n$) are convex functions, $x_i \in \mathbb{R}^{m_i}$ ($i = 1,\dots,n$) are vectors of length $m_i$, and $A_i \in \mathbb{R}^{p \times m_i}$ ($i = 1,\dots,n$) are matrices. The augmented Lagrangian function of the multi-block ADMM is

\[ L_\rho(x_1,\dots,x_n,y) = \sum_{i=1}^{n} f_i(x_i) + (\rho/2) \Big\| \sum_{i=1}^{n} A_i x_i + y/\rho \Big\|_2^2, \]

where $y \in \mathbb{R}^p$ is a dual variable and $\rho > 0$ is a penalty parameter. The multi-block ADMM proceeds by the following steps:

\[ x_i^{k+1} \leftarrow \arg\min_{x_i} L_\rho(\dots, x_{i-1}^{k+1}, x_i, x_{i+1}^{k}, \dots) \quad (i = 1,\dots,n), \]
\[ y^{k+1} \leftarrow y^k + \rho \sum_{i=1}^{n} A_i x_i^{k+1}. \]
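To make the update order concrete, the following is a minimal sketch of the Gauss-Seidel sweep above. It assumes, purely for illustration, quadratic blocks $f_i(x_i) = \frac{1}{2}\|x_i - c_i\|_2^2$, for which each $x_i$-subproblem reduces to a linear system; the function name, the choice of $f_i$ and the initialization at $c_i$ are assumptions of this sketch and are not taken from the paper.

```python
import numpy as np

def multiblock_admm_quadratic(A, c, rho=0.1, iters=100):
    """Gauss-Seidel multi-block ADMM for the illustrative choice
    f_i(x_i) = 0.5 * ||x_i - c_i||^2 subject to sum_i A[i] @ x_i = 0."""
    n = len(A)
    x = [ci.astype(float).copy() for ci in c]   # assumption: start at c_i
    y = np.zeros(A[0].shape[0])                 # dual variable
    r = sum(A[i] @ x[i] for i in range(n))      # constraint violation
    for _ in range(iters):
        for i in range(n):
            # Contribution of the other blocks, using the newest iterates
            # (blocks j < i are already updated, blocks j > i are not).
            others = sum(A[j] @ x[j] for j in range(n) if j != i)
            # Stationarity of L_rho in x_i:
            # (I + rho A_i^T A_i) x_i = c_i - A_i^T (rho * others + y)
            lhs = np.eye(A[i].shape[1]) + rho * A[i].T @ A[i]
            rhs = c[i] - A[i].T @ (rho * others + y)
            x[i] = np.linalg.solve(lhs, rhs)
        r = sum(A[i] @ x[i] for i in range(n))
        y = y + rho * r                         # dual ascent step
    return x, y, float(np.linalg.norm(r))
```

The inner loop is sequential by construction, because block $i$ uses the already updated blocks $1,\dots,i-1$; any parallelism comes from the separable structure inside each block.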
The primal residual $r$ and the dual residuals $s_i$ ($i = 1,\dots,n-1$) monitor the convergence of the multi-block ADMM. They are defined as

\[ r^k = \sum_{i=1}^{n} A_i x_i^k, \qquad s_i^k = \rho \sum_{j=i+1}^{n} A_j (x_j^{k+1} - x_j^k), \]

respectively. A variety of related works discuss the convergence of the multi-block ADMM; they are detailed in Section C in the supplementary materials.

Because the multi-block ADMM is a computational framework for problems with linear equality constraints, it cannot be applied directly to problems with multiple inequality constraints, such as the well-known isotonic regression problems [16]. In this paper, we propose new strategies based on the multi-block ADMM to address existing computational challenges in isotonic regression problems. Isotonic regression, which was first introduced in the 1950s, aims to return a sequence of responses given a predictor and pre-defined order constraints, which are represented by a Directed Acyclic Graph (DAG) [1]. It has found a broad range of applications, such as learning, in the past ten years [14, 15, 22, 24, 39]. There are a variety of works on solving isotonic regression problems; see [2, 3, 15, 16, 22, 29] for more information. Most of them solve the isotonic regression problem step-wise because one objective variable may be subject to another. However, their main drawback is that the computational cost is very high on large-scale problems. To deal with the challenge of scalability, we leverage the parallel computing advantage of the multi-block ADMM by integrating objective variables into several vectors. The following questions are addressed for the proposed multi-block ADMM frameworks on two isotonic regression problems:

1. Does the multi-block ADMM converge whatever $\rho$ is chosen?
2. Is the multi-block ADMM scalable to two problems compared with other baselines?

The rest of the paper is organized as follows.
In Section 2, we give formulations of two problems related to isotonic regression and present two multi-block ADMM algorithms to solve them. In Section 3, experiments on simulated datasets are conducted to validate the convergence properties and scalability of the multi-block ADMM. Section 4 concludes by summarizing the whole paper.

2 Isotonic Regression Problems

In this section, we introduce two isotonic regression problems in detail. Specifically, Section 2.1 focuses on how the multi-block ADMM solves the smoothed isotonic regression problem with linear order constraints; a multi-dimensional ordering problem is solved by the multi-block ADMM in Section 2.2.

2.1 The Multi-block ADMM for Smoothed Isotonic Regression

The classic isotonic regression returns a non-decreasing response given a predictor. However, the fitted response resembles a step function, whereas a smooth and continuous response is expected. To achieve this, Sysoev and Burdakov proposed a smoothed isotonic regression problem given a predictor $x_i$ ($i = 1,\dots,n$) [31]:

Problem 1 (Smoothed Isotonic Regression).

\[ \min_{\beta_1,\dots,\beta_n} \sum_{i=1}^{n} w_i (x_i - \beta_i)^2 + \lambda \sum_{i=1}^{n-1} (\beta_i - \beta_{i+1})^2 \quad \text{s.t.} \quad \beta_1 \le \beta_2 \le \dots \le \beta_n, \]

where $w_i > 0$ ($i = 1,\dots,n$) are assigned weights and $\lambda \ge 0$ is a penalty parameter. When $\lambda = 0$, the problem reduces to the classic isotonic regression problem. The smoothed isotonic regression problem can be solved by the Smoothed Pool-Adjacent-Violators (SPAV) algorithm [31]. However, this algorithm struggles on large-scale datasets because its time complexity is $O(n^2)$. To deal with the scalability issue, the multi-block ADMM is applied to realize parallel computing: we introduce two vectors $p$ and $q$ of length $n-1$ such that $p_i = \beta_i$ ($i = 1,\dots,n-1$) and $q_i = \beta_{i+1}$ ($i = 1,\dots,n-1$), respectively. The problem can be reformulated as

\[ \min_{p,q,u} \; w_1 (x_1 - p_1)^2 + \sum_{i=2}^{n-1} \big( (w_i/2)(x_i - p_i)^2 + (w_i/2)(x_i - q_{i-1})^2 \big) + w_n (x_n - q_{n-1})^2 + \lambda \|u\|_2^2 \]
\[ \text{s.t.} \quad p - q + u = 0, \quad u \ge 0, \quad p_{i+1} = q_i \; (i = 1, 2, \dots, n-2), \]

where $p = [p_1,\dots,p_{n-1}]$ and $q = [q_1,\dots,q_{n-1}]$. The augmented Lagrangian is

\[ L_\rho(u, p, q, y_1, y_2) = w_1 (x_1 - p_1)^2 + \sum_{i=2}^{n-1} \big( (w_i/2)(x_i - p_i)^2 + (w_i/2)(x_i - q_{i-1})^2 \big) + w_n (x_n - q_{n-1})^2 + \lambda \|u\|_2^2 \]
\[ \qquad + (\rho/2) \| p - q + u + y_1/\rho \|_2^2 + (\rho/2) \sum_{i=1}^{n-2} (p_{i+1} - q_i + y_{2,i}/\rho)^2, \]

where $y_1 = [y_{1,1},\dots,y_{1,n-1}]$, $y_2 = [y_{2,1},\dots,y_{2,n-2}]$ and $\rho > 0$. Due to the space limit, the multi-block ADMM to solve Problem 1 is detailed in Section A in the supplementary materials.

2.2 The Multi-block ADMM for Multi-dimensional Ordering

The previous smoothed isotonic regression only considers linear orders, while multi-dimensional orders are more common in isotonic regression applications [30]. A multi-dimensional order is defined in an $m$-dimensional space $Z_i = (z_{i,1},\dots,z_{i,m})$: $Z_i \preceq Z_j$ if and only if $z_{i,1} \le z_{j,1}, \dots, z_{i,m} \le z_{j,m}$. It can be represented equivalently as a Directed Acyclic Graph (DAG) $G = (V, E)$ where $(Z_i, Z_j) \in E$ if $Z_i \preceq Z_j$ [30].
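As a small illustration of this definition, the sketch below lists every ordered pair $(i, j)$ with $Z_i \preceq Z_j$. The function name is hypothetical, and the sketch assumes that the points are pairwise distinct (two identical points would dominate each other and create a cycle).

```python
import numpy as np

def dominance_edges(Z):
    """Return all pairs (i, j), i != j, such that Z[i] <= Z[j] in every
    coordinate. Assumes the rows of Z are pairwise distinct."""
    Z = np.asarray(Z)
    # leq[i, j] is True when Z[i] is componentwise <= Z[j].
    leq = np.all(Z[:, None, :] <= Z[None, :, :], axis=2)
    np.fill_diagonal(leq, False)  # drop the trivial pairs (i, i)
    return [(int(i), int(j)) for i, j in zip(*np.nonzero(leq))]

# Four points in the plane; points 1 and 2 are not comparable.
edges = dominance_edges([[0, 0], [1, 2], [2, 1], [3, 3]])
```

The pairwise comparison costs $O(n^2 m)$ time and memory, so it is only meant for small examples.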
Obviously, many pairs $(Z_i, Z_j)$ are incomparable. Formally, the multi-dimensional ordering problem is formulated as follows:

Problem 2 (Multi-dimensional Ordering).

\[ \min_{\alpha_1,\dots,\alpha_n} \sum_{i=1}^{n} w_i (Y_i - \alpha_i)^2 \quad \text{s.t.} \quad \alpha_i \le \alpha_j \text{ if } Z_i \preceq Z_j \; (1 \le i, j \le n), \]

where $w_i > 0$ are weights. The multi-dimensional ordering problem has been investigated since the 1970s [12], but its computational difficulty has been mentioned in multiple papers [5, 9-11, 20, 23, 25, 27-30, 32]. Therefore, researchers have developed approximate methods and restricted them to small datasets [30]. To handle large-scale multi-dimensional ordering problems, we leverage the parallel computing advantage of the multi-block ADMM to solve the problem in an exact form. By introducing two vectors $g$ and $h$, this problem is equivalent to

\[ \min_{g,h} \; W^T ((Y - g) \circ (Y - g))/2 + W^T ((Y - h) \circ (Y - h))/2 \]
\[ \text{s.t.} \quad E_1 g - E_2 h + v = 0, \quad v \ge 0, \quad g = h, \]

where $W = (w_1,\dots,w_n)$, $Y = (Y_1,\dots,Y_n)$ and $\circ$ is the Hadamard product. $E_1 \in \mathbb{R}^{|E| \times n}$ and $E_2 \in \mathbb{R}^{|E| \times n}$ are representations of the edge set $E$: the $k$-th edge $(i, j) \in E$ means that $E_{1,k,i} = 1$ and $E_{2,k,j} = 1$, while $E_{1,k,p} = 0$ ($1 \le p \le n$, $p \ne i$) and $E_{2,k,q} = 0$ ($1 \le q \le n$, $q \ne j$). The augmented Lagrangian is

\[ L_\rho(v, g, h, y_1, y_2) = W^T ((Y - g) \circ (Y - g))/2 + W^T ((Y - h) \circ (Y - h))/2 + (\rho/2) \| E_1 g - E_2 h + v + y_1/\rho \|_2^2 + (\rho/2) \| g - h + y_2/\rho \|_2^2, \]

where $\rho > 0$. Due to the space limit, the multi-block ADMM to solve Problem 2 is detailed in Section B in the supplementary materials.

3 Experiment

In this section, we validate the multi-block ADMM using simulated datasets and compare it with existing state-of-the-art methods. All experiments were conducted on a 64-bit machine with an Intel(R) Core(TM) processor (i7-6820HQ CPU @ 2.70 GHz) and 16.0 GB memory.

3.1 Data Generation and Parameter Settings

The experimental data and parameters for the two applications are explained in this section. For both problems, the maximal number of iterations was set to 10,000. We set the tolerance $\varepsilon = 0.01\,n$ according to the recommendation by Boyd et al. [4].

For the smoothed isotonic regression problem, we generated simulated observations $x_i$ ($i = 1,\dots,n$) from a uniform distribution in $(0, 1000)$; we set $w_i = 1$ ($i = 1,\dots,n$) and $\lambda = 1$.
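The smoothed isotonic regression setup just described can be sketched as follows, using the closed-form updates listed in Section A of the supplementary materials. This is a minimal NumPy sketch, not the authors' implementation: the function name, the warm start from the data, the way $p$ and $q$ are averaged into a single fit, and the combination of the residual blocks into one Euclidean norm are assumptions of this sketch.

```python
import numpy as np

def smoothed_isotonic_admm(x, w, lam, rho=0.1, max_iter=10_000, tol=1e-2):
    """Multi-block ADMM sketch for Problem 1 (update order u, p, q)."""
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float)
    n = x.size
    xp, xq = x[:-1], x[1:]               # data matched to p_i and q_i
    ap = w[:-1].copy(); ap[1:] *= 0.5    # w_1, w_2/2, ..., w_{n-1}/2
    aq = w[1:].copy();  aq[:-1] *= 0.5   # w_2/2, ..., w_{n-1}/2, w_n
    p, q = xp.copy(), xq.copy()          # assumption: warm start from x
    u = np.zeros(n - 1)
    y1, y2 = np.zeros(n - 1), np.zeros(n - 2)
    n_iter = 0
    for k in range(max_iter):
        n_iter = k + 1
        # u-update: unconstrained minimiser projected onto u >= 0.
        u = np.maximum((rho * (q - p) - y1) / (rho + 2.0 * lam), 0.0)
        # p-update: entries 2..n-1 also carry the coupling p_i = q_{i-1}.
        num = 2.0 * ap * xp + rho * q - rho * u - y1
        den = 2.0 * ap + rho
        num[1:] += rho * q[:-1] - y2
        den[1:] += rho
        p_new = num / den
        # q-update: entries 1..n-2 also carry the coupling p_{i+1} = q_i.
        num = 2.0 * aq * xq + rho * p_new + rho * u + y1
        den = 2.0 * aq + rho
        num[:-1] += rho * p_new[1:] + y2
        den[:-1] += rho
        q_new = num / den
        # Primal and dual residuals.
        r1 = p_new - q_new + u
        r2 = p_new[1:] - q_new[:-1]
        s1 = rho * ((p_new - q_new) - (p - q))
        s2 = rho * (q - q_new)
        p, q = p_new, q_new
        r = np.sqrt(r1 @ r1 + r2 @ r2)
        s = np.sqrt(s1 @ s1 + s2 @ s2)
        # Dual updates.
        y1 = y1 + rho * r1
        y2 = y2 + rho * r2
        if r <= tol and s <= tol:
            break
    # Assumption: average the two copies of each interior beta_i.
    beta = np.empty(n)
    beta[0], beta[-1] = p[0], q[-1]
    beta[1:-1] = 0.5 * (p[1:] + q[:-1])
    return beta, u, n_iter

if __name__ == "__main__":
    n = 1000
    rng = np.random.default_rng(0)       # the seed is an arbitrary choice
    x = rng.uniform(0.0, 1000.0, size=n)
    beta, u, n_iter = smoothed_isotonic_admm(
        x, np.ones(n), lam=1.0, rho=0.1, tol=0.01 * n)
    print(n_iter, float(np.max(beta[:-1] - beta[1:])))
```

Every update is an elementwise vector operation, which is what makes the scheme easy to parallelize; the printed second value is the largest remaining violation of the order constraints.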
For the multi-dimensional ordering problem, we generated simulated observations $Y_i$ ($i = 1,\dots,n$) from a uniform distribution in $(0, 1000)$, and $w_i$ ($i = 1,\dots,n$) were set to 1. $E_1$ and $E_2$ were generated from a 2-dimensional random grid graph to simulate the ordering of the 2-dimensional space.

3.2 Baselines

Two methods, Smoothed Pool-Adjacent-Violators (SPAV) [31] and the Interior Point Method (IPM) [16], are used for comparison. The details can be found in Section D in the supplementary materials.
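For the multi-dimensional ordering data of Section 3.1, the edge matrices $E_1$ and $E_2$ can be assembled from an edge list as sketched below. The paper does not specify how its random grid graph was drawn, so this sketch assumes a regular `rows x cols` grid in which every node points to its right and upper neighbours; both helper names are hypothetical.

```python
import numpy as np

def grid_edges(rows, cols):
    """Edges (i, j) of a regular grid: node (r, c) points to (r, c + 1)
    and (r + 1, c). Nodes are numbered row by row."""
    edges = []
    for r in range(rows):
        for c in range(cols):
            i = r * cols + c
            if c + 1 < cols:
                edges.append((i, i + 1))
            if r + 1 < rows:
                edges.append((i, i + cols))
    return edges

def incidence_matrices(edges, n):
    """E1[k, i] = 1 and E2[k, j] = 1 for the k-th edge (i, j), else 0."""
    E1 = np.zeros((len(edges), n))
    E2 = np.zeros((len(edges), n))
    for k, (i, j) in enumerate(edges):
        E1[k, i] = 1.0
        E2[k, j] = 1.0
    return E1, E2

rows, cols = 4, 5
n = rows * cols
E1, E2 = incidence_matrices(grid_edges(rows, cols), n)
rng = np.random.default_rng(0)            # the seed is an arbitrary choice
Y = rng.uniform(0.0, 1000.0, size=n)      # simulated observations
w = np.ones(n)                            # unit weights
```

With this convention, a vector $\alpha$ satisfies the order constraints exactly when every entry of `E1 @ alpha - E2 @ alpha` is non-positive. Dense arrays are used for brevity; for large $n$ a sparse format would be the natural choice.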
3.3 Experimental Results

In this section, the experimental results on the two problems are explained in detail.

1. Does the multi-block ADMM converge whatever $\rho$ is chosen? Figure 1 illustrates the convergence properties of the multi-block ADMM when $n = 1000$ on the smoothed isotonic regression problem and the multi-dimensional ordering problem. Two choices, $\rho = 0.1$ and $\rho = 10$, are shown in Figure 1. The primal residual $r$ and the dual residual $s$ reflect the convergence trend of the multi-block ADMM. Overall, Figures 1(a)-(c) show that the multi-block ADMM converges, while Figure 1(d) shows divergence: $r$ and $s$ drop drastically at the beginning and then decrease smoothly through the end in Figures 1(a)-(c), whereas Figure 1(d) displays a surge of $r$. Moreover, when $\rho = 0.1$, $r$ lies above $s$, while when $\rho = 10$ the situation is the opposite. In these experiments, the multi-block ADMM obtains a reasonable solution within tens of iterations as long as $\rho$ is small, and hence it is suitable for large-scale optimization.

Figure 1: Convergence of the multi-block ADMM when $n = 1000$ on two problems. Panels: (a) $\rho = 0.1$ on the smoothed isotonic regression problem; (b) $\rho = 10$ on the smoothed isotonic regression problem; (c) $\rho = 0.1$ on the multi-dimensional ordering problem; (d) $\rho = 10$ on the multi-dimensional ordering problem.

2. Is the multi-block ADMM scalable to two problems compared with other baselines? Figure 2 shows the scalability of the multi-block ADMM compared with two baselines on the two problems. The running time of the multi-block ADMM increases linearly with the number of observations on both problems. In Figure 2(a), the SPAV is more efficient than the multi-block ADMM when the number of observations $n$ is less than 20,000 but needs more time beyond that point; in Figure 2(b), the multi-block ADMM is more efficient than the IPM no matter how many observations there are.

Figure 2: Scalability of the multi-block ADMM on two problems. Panels: (a) scalability on the smoothed isotonic regression problem; (b) scalability on the multi-dimensional ordering problem.

4 Conclusion

The multi-block ADMM has been an interesting topic in the optimization community in recent years. In this paper, we apply the multi-block ADMM to two problems related to isotonic regression: smoothed isotonic regression and multi-dimensional ordering. Most existing methods are not efficient enough to run on large-scale datasets. The main advantage of the multi-block ADMM, however, is parallel computing, and hence it is scalable to large datasets. We find that the multi-block ADMM converges when $\rho$ is small and that its running time increases linearly with the number of observations.

References

[1] Miriam Ayer, H Daniel Brunk, George M Ewing, William T Reid, and Edward Silverman. An empirical distribution function for sampling with incomplete information. The Annals of Mathematical Statistics.
[2] RE Barlow and HD Brunk. The isotonic regression problem and its dual. Journal of the American Statistical Association, 67(337).
[3] Michael J Best and Nilotpal Chakravarti. Active set algorithms for isotonic regression; a unifying framework. Mathematical Programming, 47(1-3).
[4] Stephen Boyd, Neal Parikh, Eric Chu, Borja Peleato, and Jonathan Eckstein. Distributed optimization and statistical learning via the alternating direction method of multipliers. Foundations and Trends in Machine Learning, 3(1):1-122.
[5] Gordon Bril, Richard Dykstra, Carolyn Pillers, and Tim Robertson. Algorithm AS 206: isotonic regression in two independent variables. Journal of the Royal Statistical Society. Series C (Applied Statistics), 33(3).
[6] Rick Chartrand and Brendt Wohlberg. A nonconvex ADMM algorithm for group sparsity with sparse groups. In Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on. IEEE.
[7] Caihua Chen, Bingsheng He, Yinyu Ye, and Xiaoming Yuan. The direct extension of ADMM for multi-block convex minimization problems is not necessarily convergent. Mathematical Programming, 155(1-2):57-79.
[8] Wei Deng, Ming-Jun Lai, Zhimin Peng, and Wotao Yin. Parallel multi-block ADMM with o(1/k) convergence. Journal of Scientific Computing, 71(2).
[9] Holger Dette and Regine Scheder. Strictly monotone and smooth nonparametric regression for two or more variables. Canadian Journal of Statistics, 34(4).
[10] Richard L Dykstra and Tim Robertson. An algorithm for isotonic regression for two or more independent variables. The Annals of Statistics.
[11] David Gamarnik. Efficient learning of monotone concepts via quadratic optimization. In Proceedings of the eleventh annual conference on computational learning theory. ACM.
[12] David Lee Hanson, Gordon Pledger, FT Wright, et al. On consistency in monotonic regression. The Annals of Statistics, 1(3).
[13] Bingsheng He and Xiaoming Yuan. On the direct extension of ADMM for multi-block separable convex programming and beyond: from variational inequality perspective. Manuscript, optimization-online.org/db_html/2014/03/4293.html.
[14] Sham M Kakade, Varun Kanade, Ohad Shamir, and Adam Kalai. Efficient learning of generalized linear and single index models with isotonic regression. In Advances in Neural Information Processing Systems.
[15] Adam Tauman Kalai and Ravi Sastry. The isotron algorithm: High-dimensional isotonic regression. In COLT. Citeseer.
[16] Rasmus Kyng, Anup Rao, and Sushant Sachdeva. Fast, provable algorithms for isotonic regression in all l_p-norms. In Advances in Neural Information Processing Systems.
[17] Tianyi Lin, Shiqian Ma, and Shuzhong Zhang. On the convergence rate of multi-block ADMM. arXiv preprint.
[18] Tianyi Lin, Shiqian Ma, and Shuzhong Zhang. Global convergence of unmodified 3-block ADMM for a class of convex minimization problems. arXiv preprint.
[19] Tianyi Lin, Shiqian Ma, and Shuzhong Zhang. Iteration complexity analysis of multi-block ADMM for a family of convex minimization without strong convexity. Journal of Scientific Computing, 69(1):52-81.
[20] Ronny Luss, Saharon Rosset, and Moni Shahar. Decomposing isotonic regression for efficiently solving large problems. In Advances in Neural Information Processing Systems.
[21] Sindri Magnússon, Pradeep Chathuranga Weeraddana, Michael G Rabbat, and Carlo Fischione. On the convergence of alternating direction Lagrangian methods for nonconvex structured optimization problems. IEEE Transactions on Control of Network Systems, 3(3).
[22] Taesup Moon, Alex Smola, Yi Chang, and Zhaohui Zheng. IntervalRank: isotonic regression with listwise and pairwise constraints. In Proceedings of the third ACM international conference on Web search and data mining. ACM.
[23] Hari Mukarjee and Steven Stern. Feasible nonparametric estimation of multiargument monotone functions. Journal of the American Statistical Association, 89(425):77-80.
[24] Harikrishna Narasimhan and Shivani Agarwal. On the relationship between binary classification, bipartite ranking, and binary class probability estimation. In Advances in Neural Information Processing Systems.
[25] Shixian Qian and William F Eddy. An algorithm for isotonic regression on ordered rectangular grids. Journal of Computational and Graphical Statistics, 5(3).
[26] Daniel P Robinson and Rachael Tappenden. A flexible ADMM algorithm for big data applications. Journal of Scientific Computing, 71(1).
[27] Olli Saarela and Elja Arjas. A method for Bayesian monotonic multiple regression. Scandinavian Journal of Statistics, 38(3).
[28] Georgia Salanti and Kurt Ulm. Multidimensional isotonic regression and estimation of the threshold value.
[29] Quentin F Stout. Isotonic regression via partitioning. Algorithmica, 66(1):93-112.
[30] Quentin F Stout. Isotonic regression for multiple independent variables. Algorithmica, 71(2).
[31] Oleg Sysoev and Oleg Burdakov. A smoothed monotonic regression via L2 regularization. Knowledge and Information Systems, pages 1-22.
[32] Oleg Sysoev, Oleg Burdakov, and Anders Grimvall. A segmentation-based algorithm for large-scale partially ordered monotonic regression. Computational Statistics & Data Analysis, 55(8).
[33] Min Tao and Xiaoming Yuan. Convergence analysis of the direct extension of ADMM for multiple-block separable convex minimization. Advances in Computational Mathematics, pages 1-41.
[34] Fenghui Wang, Zongben Xu, and Hong-Kun Xu. Convergence of Bregman alternating direction method with multipliers for nonconvex composite problems. arXiv preprint.
[35] Junxiang Wang and Liang Zhao. Multi-instance domain adaptation for vaccine adverse event detection. In Proceedings of the 2018 World Wide Web Conference on World Wide Web. International World Wide Web Conferences Steering Committee.
[36] Yu Wang, Wotao Yin, and Jinshan Zeng. Global convergence of ADMM in nonconvex nonsmooth optimization. arXiv preprint.
[37] Zaiwen Wen, Xianhua Peng, Xin Liu, Xiaoling Sun, and Xiaodi Bai. Asset allocation under the Basel Accord risk measures. arXiv preprint.
[38] Zaiwen Wen, Chao Yang, Xin Liu, and Stefano Marchesini. Alternating direction methods for classical and ptychographic phase retrieval. Inverse Problems, 28(11):115010.
[39] Bianca Zadrozny and Charles Elkan. Transforming classifier scores into accurate multiclass probability estimates. In Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining. ACM.
[40] Liang Zhao, Junxiang Wang, Feng Chen, Chang-Tien Lu, and Naren Ramakrishnan. Spatial event forecasting in social media with geographically hierarchical regularization. Proceedings of the IEEE, 105(10).
[41] Liang Zhao, Junxiang Wang, and Xiaojie Guo. Distant-supervision of heterogeneous multitask learning for social event forecasting with multilingual indicators.
Appendix

A The Multi-block ADMM Algorithm to Solve Problem 1

Algorithm 1 The Multi-block ADMM Algorithm to Solve Problem 1
1: Initialize $p$, $q$, $u$, $y_1$, $y_2$, $\rho > 0$, $k = 0$.
2: repeat
3: Update $u^{k+1}$ in Equation (1).
4: Update $p^{k+1}$ in Equation (2).
5: Update $q^{k+1}$ in Equation (3).
6: Update $r_1^{k+1} \leftarrow p^{k+1} - q^{k+1} + u^{k+1}$.
7: Update $r_{2,i}^{k+1} \leftarrow p_{i+1}^{k+1} - q_i^{k+1}$ ($i = 1,\dots,n-2$), $r_2^{k+1} \leftarrow [r_{2,1}^{k+1},\dots,r_{2,n-2}^{k+1}]$.
8: Update $s_1^{k+1} \leftarrow \rho (p^{k+1} - q^{k+1} - p^k + q^k)$.
9: Update $s_2^{k+1} \leftarrow \rho (q^k - q^{k+1})$.
10: Update $r^{k+1} \leftarrow \sqrt{\|r_1^{k+1}\|_2^2 + \|r_2^{k+1}\|_2^2}$. # Calculate the primal residual.
11: Update $s^{k+1} \leftarrow \sqrt{\|s_1^{k+1}\|_2^2 + \|s_2^{k+1}\|_2^2}$. # Calculate the dual residual.
12: Update $y_1^{k+1} \leftarrow y_1^k + \rho r_1^{k+1}$.
13: Update $y_2^{k+1} \leftarrow y_2^k + \rho r_2^{k+1}$.
14: $k \leftarrow k + 1$.
15: until convergence.
16: Output $p$, $q$ and $u$.

The multi-block ADMM to solve Problem 1 is shown in Algorithm 1. Specifically, Lines 3-5 update the three primal variables $u$, $p$ and $q$ alternately; Lines 6-11 compute the primal and dual residuals; Lines 12-13 update the two dual variables $y_1$ and $y_2$. Since the objective function and constraints are both separable, each subproblem has a closed-form solution and can be implemented in parallel. The detailed solutions to the subproblems are as follows.

1. Update $u$. The variable $u$ is updated as follows:

\[ u^{k+1} \leftarrow \arg\min_u \; (\rho/2) \| p^k - q^k + u + y_1^k/\rho \|_2^2 + \lambda \|u\|_2^2, \quad \text{s.t.} \; u \ge 0. \quad (1) \]

This subproblem has a closed-form solution as follows:

\[ u^{k+1} \leftarrow \max\big( (\rho (q^k - p^k) - y_1^k)/(\rho + 2\lambda), \, 0 \big). \]

2. Update $p$. The variable $p$ is updated as follows:

\[ p_1^{k+1} \leftarrow \arg\min_{p_1} \; w_1 (x_1 - p_1)^2 + (\rho/2) (p_1 - q_1^k + u_1^{k+1} + y_{1,1}^k/\rho)^2, \]
\[ p_i^{k+1} \leftarrow \arg\min_{p_i} \; (w_i/2)(x_i - p_i)^2 + (\rho/2) (p_i - q_i^k + u_i^{k+1} + y_{1,i}^k/\rho)^2 + (\rho/2) (p_i - q_{i-1}^k + y_{2,i-1}^k/\rho)^2 \quad (i = 2,\dots,n-1). \quad (2) \]
This subproblem has a closed-form solution as follows:

\[ p_1^{k+1} \leftarrow (2 w_1 x_1 + \rho q_1^k - \rho u_1^{k+1} - y_{1,1}^k)/(2 w_1 + \rho), \]
\[ p_i^{k+1} \leftarrow (w_i x_i + \rho q_i^k - \rho u_i^{k+1} - y_{1,i}^k + \rho q_{i-1}^k - y_{2,i-1}^k)/(w_i + 2\rho) \quad (i = 2,\dots,n-1). \]

3. Update $q$. The variable $q$ is updated as follows:

\[ q_i^{k+1} \leftarrow \arg\min_{q_i} \; (w_{i+1}/2)(x_{i+1} - q_i)^2 + (\rho/2) (p_i^{k+1} - q_i + u_i^{k+1} + y_{1,i}^k/\rho)^2 + (\rho/2) (p_{i+1}^{k+1} - q_i + y_{2,i}^k/\rho)^2 \quad (i = 1,\dots,n-2), \]
\[ q_{n-1}^{k+1} \leftarrow \arg\min_{q_{n-1}} \; w_n (x_n - q_{n-1})^2 + (\rho/2) (p_{n-1}^{k+1} - q_{n-1} + u_{n-1}^{k+1} + y_{1,n-1}^k/\rho)^2. \quad (3) \]

This subproblem has a closed-form solution as follows:

\[ q_i^{k+1} \leftarrow (w_{i+1} x_{i+1} + \rho p_i^{k+1} + \rho u_i^{k+1} + y_{1,i}^k + \rho p_{i+1}^{k+1} + y_{2,i}^k)/(w_{i+1} + 2\rho) \quad (i = 1,\dots,n-2), \]
\[ q_{n-1}^{k+1} \leftarrow (2 w_n x_n + \rho p_{n-1}^{k+1} + \rho u_{n-1}^{k+1} + y_{1,n-1}^k)/(2 w_n + \rho). \]

In addition to the advantage of parallel computing, the convergence of Algorithm 1 is guaranteed by the following theorem:

Theorem 1. There exists $d > 0$ such that if $\rho < d$, Algorithm 1 converges globally to the optimal point with a sublinear convergence rate.

Proof. Because the objective function is strongly convex in $p$ and $q$, all assumptions in [17] are satisfied, and hence Theorem 3.2 and Theorem 4.2 in [17] ensure global sublinear convergence if $\rho$ is smaller than a threshold $d$.

B The Multi-block ADMM Algorithm to Solve Problem 2

The multi-block ADMM to solve Problem 2 is shown in Algorithm 2. Specifically, Lines 3-5 update the three primal variables $v$, $g$ and $h$ alternately; Lines 6-12 compute the primal and dual residuals; Lines 13-14 update the two dual variables $y_1$ and $y_2$. Since the objective function and constraints are both separable, each subproblem has a closed-form solution and can be implemented in parallel. The detailed solutions to the subproblems are as follows:

1. Update $v$. The variable $v$ is updated as follows:

\[ v^{k+1} \leftarrow \arg\min_v \; (\rho/2) \| E_1 g^k - E_2 h^k + v + y_1^k/\rho \|_2^2, \quad \text{s.t.} \; v \ge 0. \quad (4) \]

This subproblem has a closed-form solution as follows:

\[ v^{k+1} \leftarrow \max( -E_1 g^k + E_2 h^k - y_1^k/\rho, \, 0 ). \]
Algorithm 2 The Multi-block ADMM Algorithm to Solve Problem 2
1: Initialize $g$, $h$, $v$, $y_1$, $y_2$, $\rho > 0$, $k = 0$.
2: repeat
3: Update $v^{k+1}$ in Equation (4).
4: Update $g^{k+1}$ in Equation (5).
5: Update $h^{k+1}$ in Equation (6).
6: Update $r_1^{k+1} \leftarrow E_1 g^{k+1} - E_2 h^{k+1} + v^{k+1}$.
7: Update $r_2^{k+1} \leftarrow g^{k+1} - h^{k+1}$.
8: Update $s_1^{k+1} \leftarrow \rho (E_1 g^{k+1} - E_2 h^{k+1} - E_1 g^k + E_2 h^k)$.
9: Update $s_2^{k+1} \leftarrow \rho E_2 (h^k - h^{k+1})$.
10: Update $s_3^{k+1} \leftarrow \rho (h^k - h^{k+1})$.
11: Update $r^{k+1} \leftarrow \sqrt{\|r_1^{k+1}\|_2^2 + \|r_2^{k+1}\|_2^2}$. # Calculate the primal residual.
12: Update $s^{k+1} \leftarrow \sqrt{\|s_1^{k+1}\|_2^2 + \|s_2^{k+1}\|_2^2 + \|s_3^{k+1}\|_2^2}$. # Calculate the dual residual.
13: Update $y_1^{k+1} \leftarrow y_1^k + \rho r_1^{k+1}$.
14: Update $y_2^{k+1} \leftarrow y_2^k + \rho r_2^{k+1}$.
15: $k \leftarrow k + 1$.
16: until convergence.
17: Output $g$, $h$ and $v$.

2. Update $g$. The variable $g$ is updated as follows:

\[ g^{k+1} \leftarrow \arg\min_g \; W^T ((Y - g) \circ (Y - g))/2 + (\rho/2) \| E_1 g - E_2 h^k + v^{k+1} + y_1^k/\rho \|_2^2 + (\rho/2) \| g - h^k + y_2^k/\rho \|_2^2. \quad (5) \]

This subproblem has a closed-form solution as follows:

\[ g^{k+1} \leftarrow (\mathrm{diag}(W) + \rho E_1^T E_1 + \rho I)^{-1} (\mathrm{diag}(W) Y + \rho E_1^T E_2 h^k - \rho E_1^T v^{k+1} - E_1^T y_1^k + \rho h^k - y_2^k), \]

where $I$ is an identity matrix and $\mathrm{diag}(W)$ is a diagonal matrix whose main diagonal is $W$.

3. Update $h$. The variable $h$ is updated as follows:

\[ h^{k+1} \leftarrow \arg\min_h \; W^T ((Y - h) \circ (Y - h))/2 + (\rho/2) \| E_1 g^{k+1} - E_2 h + v^{k+1} + y_1^k/\rho \|_2^2 + (\rho/2) \| g^{k+1} - h + y_2^k/\rho \|_2^2. \quad (6) \]

This subproblem has a closed-form solution as follows:

\[ h^{k+1} \leftarrow (\mathrm{diag}(W) + \rho E_2^T E_2 + \rho I)^{-1} (\mathrm{diag}(W) Y + \rho E_2^T E_1 g^{k+1} + \rho E_2^T v^{k+1} + E_2^T y_1^k + \rho g^{k+1} + y_2^k). \]

As for the convergence properties of Algorithm 2, the following theorem guarantees convergence if $\rho$ is small.

Theorem 2. There exists $d > 0$ such that if $\rho < d$, then Algorithm 2 converges globally to the optimal point with a sublinear convergence rate.

Proof. The proof is the same as that of Theorem 1.
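The updates of Algorithm 2 can be sketched in NumPy as follows. This is an illustrative sketch rather than the authors' code: the function name, the warm start $g = h = Y$ and the default tolerance are assumptions. The sketch also uses the fact that every row of $E_1$ and $E_2$ contains a single 1, so $E_1^T E_1$ and $E_2^T E_2$ are diagonal and the two matrix inverses reduce to elementwise divisions.

```python
import numpy as np

def multidim_ordering_admm(Y, w, E1, E2, rho=0.1, max_iter=10_000, tol=1e-2):
    """Multi-block ADMM sketch for Problem 2 (update order v, g, h)."""
    Y = np.asarray(Y, dtype=float)
    w = np.asarray(w, dtype=float)
    g, h = Y.copy(), Y.copy()            # assumption: warm start from Y
    v = np.zeros(E1.shape[0])
    y1, y2 = np.zeros(E1.shape[0]), np.zeros(Y.size)
    # Diagonals of diag(W) + rho * E^T E + rho * I; the column sums of a
    # 0/1 matrix with one 1 per row give the diagonal of E^T E.
    dg = w + rho * E1.sum(axis=0) + rho
    dh = w + rho * E2.sum(axis=0) + rho
    n_iter = 0
    for k in range(max_iter):
        n_iter = k + 1
        # v-update, Equation (4): projection onto v >= 0.
        v = np.maximum(-E1 @ g + E2 @ h - y1 / rho, 0.0)
        # g-update, Equation (5).
        g_new = (w * Y + rho * E1.T @ (E2 @ h) - rho * E1.T @ v
                 - E1.T @ y1 + rho * h - y2) / dg
        # h-update, Equation (6).
        h_new = (w * Y + rho * E2.T @ (E1 @ g_new) + rho * E2.T @ v
                 + E2.T @ y1 + rho * g_new + y2) / dh
        # Primal and dual residuals (Lines 6-12 of Algorithm 2).
        r1 = E1 @ g_new - E2 @ h_new + v
        r2 = g_new - h_new
        s1 = rho * (E1 @ g_new - E2 @ h_new - E1 @ g + E2 @ h)
        s2 = rho * (E2 @ (h - h_new))
        s3 = rho * (h - h_new)
        g, h = g_new, h_new
        r = np.sqrt(r1 @ r1 + r2 @ r2)
        s = np.sqrt(s1 @ s1 + s2 @ s2 + s3 @ s3)
        # Dual updates (Lines 13-14).
        y1 = y1 + rho * r1
        y2 = y2 + rho * r2
        if r <= tol and s <= tol:
            break
    return g, h, v, n_iter

# Tiny chain 0 -> 1 -> 2 whose data violate the order at the last node.
E1_demo = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
E2_demo = np.array([[0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])
g_demo, h_demo, v_demo, n_it = multidim_ordering_admm(
    np.array([1.0, 3.0, 2.0]), np.ones(3), E1_demo, E2_demo)
```

Because of the diagonal structure, each of the three primal updates is elementwise apart from the products with $E_1$ and $E_2$, which are sparse in practice.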
C Related Work on the Multi-block ADMM

The multi-block ADMM was first studied by Chen et al. [7], who showed by a counterexample that the multi-block ADMM does not necessarily converge. He and Yuan explained why the multi-block ADMM may diverge from the perspective of the variational inequality framework [13]. Since then, many researchers have studied sufficient conditions that ensure the global convergence of the multi-block ADMM. For example, Robinson and Tappenden and Lin et al. imposed a strong convexity assumption on the objective functions $f_i(x_i)$ ($i = 1,\dots,n$) [17, 26]. Tao and Yuan did not require all objective functions $f_i(x_i)$ ($i = 1,\dots,n$) to be strongly convex, but imposed a full-rank assumption on $A_i$ ($i = 1,\dots,n$) [33]. Lin et al. weakened the strong convexity assumption on the objective functions $f_i(x_i)$ ($i = 1,\dots,n$) by imposing an additional coercivity assumption on $f_i(x_i)$ ($i = 1,\dots,n$) [19]. Deng et al. proved that the multi-block ADMM preserves convergence if the variables $x_i$ ($i = 1,\dots,n$) are updated in the Jacobian fashion rather than the Gauss-Seidel fashion [8]. Lin et al. proved that the multi-block ADMM converges for any $\rho > 0$ in the regularized least squares decomposition problem [18]. Moreover, some work has extended the multi-block ADMM to nonconvex settings; see [21, 34, 36] for more information.

D Baselines

In order to test the scalability of the multi-block ADMM on the two applications, two baselines were utilized for comparison:

1. Smoothed Pool-Adjacent-Violators (SPAV) [31]. The SPAV algorithm is an extension of the PAV algorithm designed for the smoothed isotonic regression problem. It partitions all $\beta_i$ ($i = 1,\dots,n$) into blocks according to the feasibility of the inequality constraints. The iteration ends when all constraints are satisfied.

2. Interior Point Method (IPM) [16]. The IPM is an efficient algorithm for solving multi-dimensional ordering with a convergence guarantee. Its time complexity is $O(|E|^{1.5} \log^2 |V| \log^2 (|V|/\delta))$, where $\delta$ is a tolerance.
More informationOptimizing Nonconvex Finite Sums by a Proximal Primal-Dual Method
Optimizing Nonconvex Finite Sums by a Proximal Primal-Dual Method Davood Hajinezhad Iowa State University Davood Hajinezhad Optimizing Nonconvex Finite Sums by a Proximal Primal-Dual Method 1 / 35 Co-Authors
More informationDual methods and ADMM. Barnabas Poczos & Ryan Tibshirani Convex Optimization /36-725
Dual methods and ADMM Barnabas Poczos & Ryan Tibshirani Convex Optimization 10-725/36-725 1 Given f : R n R, the function is called its conjugate Recall conjugate functions f (y) = max x R n yt x f(x)
More informationECS289: Scalable Machine Learning
ECS289: Scalable Machine Learning Cho-Jui Hsieh UC Davis Sept 29, 2016 Outline Convex vs Nonconvex Functions Coordinate Descent Gradient Descent Newton s method Stochastic Gradient Descent Numerical Optimization
More informationMulti-Block ADMM and its Convergence
Multi-Block ADMM and its Convergence Yinyu Ye Department of Management Science and Engineering Stanford University Email: yyye@stanford.edu We show that the direct extension of alternating direction method
More informationExact Hybrid Covariance Thresholding for Joint Graphical Lasso
Exact Hybrid Covariance Thresholding for Joint Graphical Lasso Qingming Tang Chao Yang Jian Peng Jinbo Xu Toyota Technological Institute at Chicago Massachusetts Institute of Technology Abstract. This
More informationIntroduction to Alternating Direction Method of Multipliers
Introduction to Alternating Direction Method of Multipliers Yale Chang Machine Learning Group Meeting September 29, 2016 Yale Chang (Machine Learning Group Meeting) Introduction to Alternating Direction
More informationSTA141C: Big Data & High Performance Statistical Computing
STA141C: Big Data & High Performance Statistical Computing Lecture 8: Optimization Cho-Jui Hsieh UC Davis May 9, 2017 Optimization Numerical Optimization Numerical Optimization: min X f (X ) Can be applied
More informationA Randomized Approach for Crowdsourcing in the Presence of Multiple Views
A Randomized Approach for Crowdsourcing in the Presence of Multiple Views Presenter: Yao Zhou joint work with: Jingrui He - 1 - Roadmap Motivation Proposed framework: M2VW Experimental results Conclusion
More informationDealing with Constraints via Random Permutation
Dealing with Constraints via Random Permutation Ruoyu Sun UIUC Joint work with Zhi-Quan Luo (U of Minnesota and CUHK (SZ)) and Yinyu Ye (Stanford) Simons Institute Workshop on Fast Iterative Methods in
More informationDual Methods. Lecturer: Ryan Tibshirani Convex Optimization /36-725
Dual Methods Lecturer: Ryan Tibshirani Conve Optimization 10-725/36-725 1 Last time: proimal Newton method Consider the problem min g() + h() where g, h are conve, g is twice differentiable, and h is simple.
More informationLinearized Alternating Direction Method with Adaptive Penalty for Low-Rank Representation
Linearized Alternating Direction Method with Adaptive Penalty for Low-Rank Representation Zhouchen Lin Visual Computing Group Microsoft Research Asia Risheng Liu Zhixun Su School of Mathematical Sciences
More informationProperties of optimizations used in penalized Gaussian likelihood inverse covariance matrix estimation
Properties of optimizations used in penalized Gaussian likelihood inverse covariance matrix estimation Adam J. Rothman School of Statistics University of Minnesota October 8, 2014, joint work with Liliana
More informationBAGUS: Bayesian Regularization for Graphical Models with Unequal Shrinkage
BAGUS: Bayesian Regularization for Graphical Models with Unequal Shrinkage Lingrui Gan, Naveen N. Narisetty, Feng Liang Department of Statistics University of Illinois at Urbana-Champaign Problem Statement
More informationDistributed Inexact Newton-type Pursuit for Non-convex Sparse Learning
Distributed Inexact Newton-type Pursuit for Non-convex Sparse Learning Bo Liu Department of Computer Science, Rutgers Univeristy Xiao-Tong Yuan BDAT Lab, Nanjing University of Information Science and Technology
More informationForward Selection and Estimation in High Dimensional Single Index Models
Forward Selection and Estimation in High Dimensional Single Index Models Shikai Luo and Subhashis Ghosal North Carolina State University August 29, 2016 Abstract We propose a new variable selection and
More informationLARGE SCALE 2D SPECTRAL COMPRESSED SENSING IN CONTINUOUS DOMAIN
LARGE SCALE 2D SPECTRAL COMPRESSED SENSING IN CONTINUOUS DOMAIN Jian-Feng Cai, Weiyu Xu 2, and Yang Yang 3 Department of Mathematics, Hong Kong University of Science and Technology, Clear Water Bay, Kowloon,
More informationSimplicial Nonnegative Matrix Tri-Factorization: Fast Guaranteed Parallel Algorithm
Simplicial Nonnegative Matrix Tri-Factorization: Fast Guaranteed Parallel Algorithm Duy-Khuong Nguyen 13, Quoc Tran-Dinh 2, and Tu-Bao Ho 14 1 Japan Advanced Institute of Science and Technology, Japan
More informationA GENERAL FORMULATION FOR SUPPORT VECTOR MACHINES. Wei Chu, S. Sathiya Keerthi, Chong Jin Ong
A GENERAL FORMULATION FOR SUPPORT VECTOR MACHINES Wei Chu, S. Sathiya Keerthi, Chong Jin Ong Control Division, Department of Mechanical Engineering, National University of Singapore 0 Kent Ridge Crescent,
More informationCoordinate Update Algorithm Short Course Operator Splitting
Coordinate Update Algorithm Short Course Operator Splitting Instructor: Wotao Yin (UCLA Math) Summer 2016 1 / 25 Operator splitting pipeline 1. Formulate a problem as 0 A(x) + B(x) with monotone operators
More informationMath 273a: Optimization Overview of First-Order Optimization Algorithms
Math 273a: Optimization Overview of First-Order Optimization Algorithms Wotao Yin Department of Mathematics, UCLA online discussions on piazza.com 1 / 9 Typical flow of numerical optimization Optimization
More informationDifferential network analysis from cross-platform gene expression data: Supplementary Information
Differential network analysis from cross-platform gene expression data: Supplementary Information Xiao-Fei Zhang, Le Ou-Yang, Xing-Ming Zhao, and Hong Yan Contents 1 Supplementary Figures Supplementary
More informationLinearized Alternating Direction Method of Multipliers for Constrained Nonconvex Regularized Optimization
JMLR: Workshop and Conference Proceedings 63:97 19, 216 ACML 216 Linearized Alternating Direction Method of Multipliers for Constrained Nonconvex Regularized Optimization Linbo Qiao qiao.linbo@nudt.edu.cn
More informationDistributed Convex Optimization
Master Program 2013-2015 Electrical Engineering Distributed Convex Optimization A Study on the Primal-Dual Method of Multipliers Delft University of Technology He Ming Zhang, Guoqiang Zhang, Richard Heusdens
More informationSupplementary Material of A Novel Sparsity Measure for Tensor Recovery
Supplementary Material of A Novel Sparsity Measure for Tensor Recovery Qian Zhao 1,2 Deyu Meng 1,2 Xu Kong 3 Qi Xie 1,2 Wenfei Cao 1,2 Yao Wang 1,2 Zongben Xu 1,2 1 School of Mathematics and Statistics,
More information2 Regularized Image Reconstruction for Compressive Imaging and Beyond
EE 367 / CS 448I Computational Imaging and Display Notes: Compressive Imaging and Regularized Image Reconstruction (lecture ) Gordon Wetzstein gordon.wetzstein@stanford.edu This document serves as a supplement
More informationRecent Advances in Structured Sparse Models
Recent Advances in Structured Sparse Models Julien Mairal Willow group - INRIA - ENS - Paris 21 September 2010 LEAR seminar At Grenoble, September 21 st, 2010 Julien Mairal Recent Advances in Structured
More informationLearning Bound for Parameter Transfer Learning
Learning Bound for Parameter Transfer Learning Wataru Kumagai Faculty of Engineering Kanagawa University kumagai@kanagawa-u.ac.jp Abstract We consider a transfer-learning problem by using the parameter
More informationECS289: Scalable Machine Learning
ECS289: Scalable Machine Learning Cho-Jui Hsieh UC Davis Oct 18, 2016 Outline One versus all/one versus one Ranking loss for multiclass/multilabel classification Scaling to millions of labels Multiclass
More informationSparse Gaussian conditional random fields
Sparse Gaussian conditional random fields Matt Wytock, J. ico Kolter School of Computer Science Carnegie Mellon University Pittsburgh, PA 53 {mwytock, zkolter}@cs.cmu.edu Abstract We propose sparse Gaussian
More informationHYBRID JACOBIAN AND GAUSS SEIDEL PROXIMAL BLOCK COORDINATE UPDATE METHODS FOR LINEARLY CONSTRAINED CONVEX PROGRAMMING
SIAM J. OPTIM. Vol. 8, No. 1, pp. 646 670 c 018 Society for Industrial and Applied Mathematics HYBRID JACOBIAN AND GAUSS SEIDEL PROXIMAL BLOCK COORDINATE UPDATE METHODS FOR LINEARLY CONSTRAINED CONVEX
More informationA Brief Overview of Practical Optimization Algorithms in the Context of Relaxation
A Brief Overview of Practical Optimization Algorithms in the Context of Relaxation Zhouchen Lin Peking University April 22, 2018 Too Many Opt. Problems! Too Many Opt. Algorithms! Zero-th order algorithms:
More information10-725/36-725: Convex Optimization Spring Lecture 21: April 6
10-725/36-725: Conve Optimization Spring 2015 Lecturer: Ryan Tibshirani Lecture 21: April 6 Scribes: Chiqun Zhang, Hanqi Cheng, Waleed Ammar Note: LaTeX template courtesy of UC Berkeley EECS dept. Disclaimer:
More informationPreconditioning via Diagonal Scaling
Preconditioning via Diagonal Scaling Reza Takapoui Hamid Javadi June 4, 2014 1 Introduction Interior point methods solve small to medium sized problems to high accuracy in a reasonable amount of time.
More informationAsynchronous Parallel Computing in Signal Processing and Machine Learning
Asynchronous Parallel Computing in Signal Processing and Machine Learning Wotao Yin (UCLA Math) joint with Zhimin Peng (UCLA), Yangyang Xu (IMA), Ming Yan (MSU) Optimization and Parsimonious Modeling IMA,
More informationThe Direct Extension of ADMM for Multi-block Convex Minimization Problems is Not Necessarily Convergent
The Direct Extension of ADMM for Multi-block Convex Minimization Problems is Not Necessarily Convergent Caihua Chen Bingsheng He Yinyu Ye Xiaoming Yuan 4 December, Abstract. The alternating direction method
More informationA New Combined Approach for Inference in High-Dimensional Regression Models with Correlated Variables
A New Combined Approach for Inference in High-Dimensional Regression Models with Correlated Variables Niharika Gauraha and Swapan Parui Indian Statistical Institute Abstract. We consider the problem of
More informationFast ADMM for Sum of Squares Programs Using Partial Orthogonality
Fast ADMM for Sum of Squares Programs Using Partial Orthogonality Antonis Papachristodoulou Department of Engineering Science University of Oxford www.eng.ox.ac.uk/control/sysos antonis@eng.ox.ac.uk with
More informationSolving DC Programs that Promote Group 1-Sparsity
Solving DC Programs that Promote Group 1-Sparsity Ernie Esser Contains joint work with Xiaoqun Zhang, Yifei Lou and Jack Xin SIAM Conference on Imaging Science Hong Kong Baptist University May 14 2014
More informationSupport Vector Machines and Kernel Methods
2018 CS420 Machine Learning, Lecture 3 Hangout from Prof. Andrew Ng. http://cs229.stanford.edu/notes/cs229-notes3.pdf Support Vector Machines and Kernel Methods Weinan Zhang Shanghai Jiao Tong University
More informationL 2,1 Norm and its Applications
L 2, Norm and its Applications Yale Chang Introduction According to the structure of the constraints, the sparsity can be obtained from three types of regularizers for different purposes.. Flat Sparsity.
More informationFast Asynchronous Parallel Stochastic Gradient Descent: A Lock-Free Approach with Convergence Guarantee
Fast Asynchronous Parallel Stochastic Gradient Descent: A Lock-Free Approach with Convergence Guarantee Shen-Yi Zhao and Wu-Jun Li National Key Laboratory for Novel Software Technology Department of Computer
More informationScalable Subspace Clustering
Scalable Subspace Clustering René Vidal Center for Imaging Science, Laboratory for Computational Sensing and Robotics, Institute for Computational Medicine, Department of Biomedical Engineering, Johns
More informationarxiv: v1 [cs.cv] 1 Jun 2014
l 1 -regularized Outlier Isolation and Regression arxiv:1406.0156v1 [cs.cv] 1 Jun 2014 Han Sheng Department of Electrical and Electronic Engineering, The University of Hong Kong, HKU Hong Kong, China sheng4151@gmail.com
More informationAccelerated primal-dual methods for linearly constrained convex problems
Accelerated primal-dual methods for linearly constrained convex problems Yangyang Xu SIAM Conference on Optimization May 24, 2017 1 / 23 Accelerated proximal gradient For convex composite problem: minimize
More informationJeff Howbert Introduction to Machine Learning Winter
Classification / Regression Support Vector Machines Jeff Howbert Introduction to Machine Learning Winter 2012 1 Topics SVM classifiers for linearly separable classes SVM classifiers for non-linearly separable
More informationA Tutorial on Support Vector Machine
A Tutorial on School of Computing National University of Singapore Contents Theory on Using with Other s Contents Transforming Theory on Using with Other s What is a classifier? A function that maps instances
More informationSMO vs PDCO for SVM: Sequential Minimal Optimization vs Primal-Dual interior method for Convex Objectives for Support Vector Machines
vs for SVM: Sequential Minimal Optimization vs Primal-Dual interior method for Convex Objectives for Support Vector Machines Ding Ma Michael Saunders Working paper, January 5 Introduction In machine learning,
More informationGROUP-SPARSE SUBSPACE CLUSTERING WITH MISSING DATA
GROUP-SPARSE SUBSPACE CLUSTERING WITH MISSING DATA D Pimentel-Alarcón 1, L Balzano 2, R Marcia 3, R Nowak 1, R Willett 1 1 University of Wisconsin - Madison, 2 University of Michigan - Ann Arbor, 3 University
More informationNonlinear Support Vector Machines through Iterative Majorization and I-Splines
Nonlinear Support Vector Machines through Iterative Majorization and I-Splines P.J.F. Groenen G. Nalbantov J.C. Bioch July 9, 26 Econometric Institute Report EI 26-25 Abstract To minimize the primal support
More informationLarge-scale Stochastic Optimization
Large-scale Stochastic Optimization 11-741/641/441 (Spring 2016) Hanxiao Liu hanxiaol@cs.cmu.edu March 24, 2016 1 / 22 Outline 1. Gradient Descent (GD) 2. Stochastic Gradient Descent (SGD) Formulation
More informationSupport Vector Machines: Maximum Margin Classifiers
Support Vector Machines: Maximum Margin Classifiers Machine Learning and Pattern Recognition: September 16, 2008 Piotr Mirowski Based on slides by Sumit Chopra and Fu-Jie Huang 1 Outline What is behind
More informationAlgorithm-Hardware Co-Optimization of Memristor-Based Framework for Solving SOCP and Homogeneous QCQP Problems
L.C.Smith College of Engineering and Computer Science Algorithm-Hardware Co-Optimization of Memristor-Based Framework for Solving SOCP and Homogeneous QCQP Problems Ao Ren Sijia Liu Ruizhe Cai Wujie Wen
More informationADMM and Fast Gradient Methods for Distributed Optimization
ADMM and Fast Gradient Methods for Distributed Optimization João Xavier Instituto Sistemas e Robótica (ISR), Instituto Superior Técnico (IST) European Control Conference, ECC 13 July 16, 013 Joint work
More informationECS289: Scalable Machine Learning
ECS289: Scalable Machine Learning Cho-Jui Hsieh UC Davis Oct 27, 2015 Outline One versus all/one versus one Ranking loss for multiclass/multilabel classification Scaling to millions of labels Multiclass
More informationA Fast Augmented Lagrangian Algorithm for Learning Low-Rank Matrices
A Fast Augmented Lagrangian Algorithm for Learning Low-Rank Matrices Ryota Tomioka 1, Taiji Suzuki 1, Masashi Sugiyama 2, Hisashi Kashima 1 1 The University of Tokyo 2 Tokyo Institute of Technology 2010-06-22
More informationON THE GLOBAL AND LINEAR CONVERGENCE OF THE GENERALIZED ALTERNATING DIRECTION METHOD OF MULTIPLIERS
ON THE GLOBAL AND LINEAR CONVERGENCE OF THE GENERALIZED ALTERNATING DIRECTION METHOD OF MULTIPLIERS WEI DENG AND WOTAO YIN Abstract. The formulation min x,y f(x) + g(y) subject to Ax + By = b arises in
More informationCoordinate descent methods
Coordinate descent methods Master Mathematics for data science and big data Olivier Fercoq November 3, 05 Contents Exact coordinate descent Coordinate gradient descent 3 3 Proximal coordinate descent 5
More informationDiscriminative Feature Grouping
Discriative Feature Grouping Lei Han and Yu Zhang, Department of Computer Science, Hong Kong Baptist University, Hong Kong The Institute of Research and Continuing Education, Hong Kong Baptist University
More informationDifferentiable Fine-grained Quantization for Deep Neural Network Compression
Differentiable Fine-grained Quantization for Deep Neural Network Compression Hsin-Pai Cheng hc218@duke.edu Yuanjun Huang University of Science and Technology of China Anhui, China yjhuang@mail.ustc.edu.cn
More informationInexact Alternating Direction Method of Multipliers for Separable Convex Optimization
Inexact Alternating Direction Method of Multipliers for Separable Convex Optimization Hongchao Zhang hozhang@math.lsu.edu Department of Mathematics Center for Computation and Technology Louisiana State
More informationMULTIPLEKERNELLEARNING CSE902
MULTIPLEKERNELLEARNING CSE902 Multiple Kernel Learning -keywords Heterogeneous information fusion Feature selection Max-margin classification Multiple kernel learning MKL Convex optimization Kernel classification
More informationMatrix Factorization with Applications to Clustering Problems: Formulation, Algorithms and Performance
Date: Mar. 3rd, 2017 Matrix Factorization with Applications to Clustering Problems: Formulation, Algorithms and Performance Presenter: Songtao Lu Department of Electrical and Computer Engineering Iowa
More informationOn the Convergence of Multi-Block Alternating Direction Method of Multipliers and Block Coordinate Descent Method
On the Convergence of Multi-Block Alternating Direction Method of Multipliers and Block Coordinate Descent Method Caihua Chen Min Li Xin Liu Yinyu Ye This version: September 9, 2015 Abstract The paper
More informationSparse inverse covariance estimation with the lasso
Sparse inverse covariance estimation with the lasso Jerome Friedman Trevor Hastie and Robert Tibshirani November 8, 2007 Abstract We consider the problem of estimating sparse graphs by a lasso penalty
More informationAn Optimization-based Approach to Decentralized Assignability
2016 American Control Conference (ACC) Boston Marriott Copley Place July 6-8, 2016 Boston, MA, USA An Optimization-based Approach to Decentralized Assignability Alborz Alavian and Michael Rotkowitz Abstract
More informationA Unified Approach to Proximal Algorithms using Bregman Distance
A Unified Approach to Proximal Algorithms using Bregman Distance Yi Zhou a,, Yingbin Liang a, Lixin Shen b a Department of Electrical Engineering and Computer Science, Syracuse University b Department
More informationAn Efficient Algorithm For Weak Hierarchical Lasso. Yashu Liu, Jie Wang, Jieping Ye Arizona State University
An Efficient Algorithm For Weak Hierarchical Lasso Yashu Liu, Jie Wang, Jieping Ye Arizona State University Outline Regression with Interactions Problems and Challenges Weak Hierarchical Lasso The Proposed
More informationBig Data Analytics: Optimization and Randomization
Big Data Analytics: Optimization and Randomization Tianbao Yang Tutorial@ACML 2015 Hong Kong Department of Computer Science, The University of Iowa, IA, USA Nov. 20, 2015 Yang Tutorial for ACML 15 Nov.
More informationAnalysis of Robust PCA via Local Incoherence
Analysis of Robust PCA via Local Incoherence Huishuai Zhang Department of EECS Syracuse University Syracuse, NY 3244 hzhan23@syr.edu Yi Zhou Department of EECS Syracuse University Syracuse, NY 3244 yzhou35@syr.edu
More information4y Springer NONLINEAR INTEGER PROGRAMMING
NONLINEAR INTEGER PROGRAMMING DUAN LI Department of Systems Engineering and Engineering Management The Chinese University of Hong Kong Shatin, N. T. Hong Kong XIAOLING SUN Department of Mathematics Shanghai
More informationChange point method: an exact line search method for SVMs
Erasmus University Rotterdam Bachelor Thesis Econometrics & Operations Research Change point method: an exact line search method for SVMs Author: Yegor Troyan Student number: 386332 Supervisor: Dr. P.J.F.
More informationLOCAL LINEAR CONVERGENCE OF ADMM Daniel Boley
LOCAL LINEAR CONVERGENCE OF ADMM Daniel Boley Model QP/LP: min 1 / 2 x T Qx+c T x s.t. Ax = b, x 0, (1) Lagrangian: L(x,y) = 1 / 2 x T Qx+c T x y T x s.t. Ax = b, (2) where y 0 is the vector of Lagrange
More informationConvergence of Bregman Alternating Direction Method with Multipliers for Nonconvex Composite Problems
Convergence of Bregman Alternating Direction Method with Multipliers for Nonconvex Composite Problems Fenghui Wang, Zongben Xu, and Hong-Kun Xu arxiv:40.8625v3 [math.oc] 5 Dec 204 Abstract The alternating
More informationDistributed Consensus Optimization
Distributed Consensus Optimization Ming Yan Michigan State University, CMSE/Mathematics September 14, 2018 Decentralized-1 Backgroundwhy andwe motivation need decentralized optimization? I Decentralized
More informationSparse and Regularized Optimization
Sparse and Regularized Optimization In many applications, we seek not an exact minimizer of the underlying objective, but rather an approximate minimizer that satisfies certain desirable properties: sparsity
More information