Multi-Block ADMM and its Convergence

Multi-Block ADMM and its Convergence
Yinyu Ye
Department of Management Science and Engineering, Stanford University
Email: yyye@stanford.edu

We show that the direct extension of the alternating direction method of multipliers (ADMM) to three blocks is not necessarily convergent, even when solving a square system of linear equations, although its convergence was established some 40 years ago for the one- and two-block cases. However, we prove that if in each iteration one randomly and independently permutes the updating order of the variable blocks, followed by the regular multiplier update, then ADMM converges in expectation when solving any square system of linear equations with any number of blocks. We also discuss the extension to general convex optimization problems, in particular linear and quadratic programs. Joint work with Ruoyu Sun and Tom Luo.
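
To make the randomized scheme concrete, here is a minimal Python sketch of ADMM with a randomly permuted block order applied to a small square linear system (minimizing 0 subject to Ax = b with three scalar blocks). The matrix, penalty parameter and iteration count are hypothetical illustrations, not the counterexample or the exact setting from the talk.

    # Sketch: ADMM with randomly permuted block updates for a 3-block
    # square linear system A x = b (i.e. min 0 s.t. A1 x1 + A2 x2 + A3 x3 = b).
    import numpy as np

    rng = np.random.default_rng(0)
    A = rng.standard_normal((3, 3))            # three scalar blocks = columns of A
    b = rng.standard_normal(3)
    beta = 1.0                                 # penalty parameter
    x = np.zeros(3)
    lam = np.zeros(3)                          # multiplier

    for k in range(500):
        order = rng.permutation(3)             # random, independent permutation
        for i in order:
            rest = A @ x - A[:, i] * x[i]      # contribution of the other blocks
            # exact minimization of the augmented Lagrangian over block i
            x[i] = A[:, i] @ (b - rest - lam / beta) / (A[:, i] @ A[:, i])
        lam = lam + beta * (A @ x - b)         # regular multiplier update

    print("residual ||Ax - b|| =", np.linalg.norm(A @ x - b))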

Optimization for High Dimensional Data
Tao Yao
Pennsylvania State University
Email: tyy1@engr.psu.edu

We consider folded concave penalized sparse linear regression (FCSLR), an important statistical model for high-dimensional data analysis. Due to the presence of a folded concave penalty, the problem formulation involves both nonsmoothness and nonconvexity. We show that local solutions satisfying a second-order necessary condition (SONC) entail good estimation accuracy with a probability guarantee. We discuss (1) a novel potential reduction (PR) method and (2) a new mixed integer linear programming-based global optimization (MIPGO) scheme. This talk is based on joint work with Hongcheng Liu, Runze Li and Yinyu Ye.
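
For readers unfamiliar with folded concave penalties, the SCAD penalty is one standard instance. The sketch below evaluates a SCAD-penalized least-squares objective on hypothetical data; the specific penalty, parameter values and data are illustrative and not taken from the talk.

    # Sketch of a folded-concave (SCAD) penalized least-squares objective.
    import numpy as np

    def scad_penalty(t, lam=1.0, a=3.7):
        # SCAD penalty value for |t| (Fan and Li's three-piece definition)
        t = np.abs(t)
        quad = (2 * a * lam * t - t**2 - lam**2) / (2 * (a - 1))
        return np.where(t <= lam, lam * t,
               np.where(t <= a * lam, quad, (a + 1) * lam**2 / 2))

    def fcslr_objective(beta, X, y, lam=1.0):
        # least-squares loss plus a folded concave penalty, as in FCSLR
        return 0.5 * np.sum((y - X @ beta)**2) + np.sum(scad_penalty(beta, lam))

    rng = np.random.default_rng(1)
    X = rng.standard_normal((20, 5))
    beta_true = np.array([2.0, 0.0, 0.0, -1.5, 0.0])
    y = X @ beta_true + 0.1 * rng.standard_normal(20)
    print(fcslr_objective(beta_true, X, y))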

Efficient Approaches for Two Big Data Matrix Optimization Problems
Zhaosong Lu
Department of Mathematics, Simon Fraser University, Canada
Email: zhaosong@sfu.ca

In the first part of this talk, we consider the low-rank matrix completion problem, which has wide applications such as collaborative filtering, image inpainting and microarray data imputation. In particular, we present an efficient and scalable algorithm for matrix completion. In each iteration, we pursue a rank-one matrix basis generated by the top singular vector pair of the current approximation residual and update the weights for all rank-one matrices obtained up to the current iteration. We further propose a novel weight updating rule to reduce the time and storage complexity, making the proposed algorithm scalable to large matrices. We establish a linear rate of convergence for the algorithm. Numerical experiments on many real-world large-scale datasets demonstrate that our algorithm is much more efficient than state-of-the-art algorithms while achieving similar or better prediction performance.

In the second part we consider the problem of estimating multiple graphical models simultaneously using the fused lasso penalty, which encourages adjacent graphs to share similar structures. A motivating example is the analysis of brain networks of Alzheimer's disease, which involves estimating a brain network for the normal controls (NC), a brain network for the patients with mild cognitive impairment (MCI), and a brain network for Alzheimer's patients (AD). We expect the brain networks for NC and MCI, and for MCI and AD, to share common structures but not to be identical to each other. We establish a necessary and sufficient condition for the graphs to be decomposable. As a consequence, a simple but effective screening rule is proposed, which decomposes large graphs into small subgraphs and dramatically reduces the overall computational cost. Numerical experiments on both synthetic and real data demonstrate the effectiveness and efficiency of the proposed approach.
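
A minimal sketch of the rank-one pursuit idea in the first part: take the top singular vector pair of the residual on the observed entries as a new rank-one basis, then refit the weights. The simple full-SVD and least-squares refit below are illustrative only and ignore the economical weight-updating rule and sparse-observation structure discussed in the talk; the data are synthetic placeholders.

    # Sketch of rank-one matrix pursuit for matrix completion.
    import numpy as np

    rng = np.random.default_rng(2)
    m, n, r = 30, 20, 3
    M = rng.standard_normal((m, r)) @ rng.standard_normal((r, n))  # low-rank ground truth
    mask = rng.random((m, n)) < 0.5                                # observed entries

    X = np.zeros((m, n))          # current approximation
    bases = []                    # rank-one basis matrices u v^T

    for k in range(10):
        R = np.where(mask, M - X, 0.0)              # residual on observed entries
        U, s, Vt = np.linalg.svd(R)                 # top singular vector pair of residual
        bases.append(np.outer(U[:, 0], Vt[0, :]))
        # refit the weights of all rank-one bases on the observed entries
        B = np.stack([b[mask] for b in bases], axis=1)
        w, *_ = np.linalg.lstsq(B, M[mask], rcond=None)
        X = sum(wi * bi for wi, bi in zip(w, bases))
        print(k, np.linalg.norm(np.where(mask, M - X, 0.0)))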

Optimal Joint Provision of Backhaul and Radio Access Networks
Zhi-quan Luo
University of Minnesota / The Chinese University of Hong Kong, Shenzhen
Email: luoxx032@umn.edu

We consider a cloud-based heterogeneous network of base stations (BSs) connected via a backhaul network of routers and wired/wireless links with limited capacity. The optimal provision of such networks requires proper resource allocation across the radio access links in conjunction with appropriate traffic engineering within the backhaul network. In this work we propose an efficient algorithm for joint resource allocation across the wireless links and flow control over the entire network. The proposed algorithm, which maximizes the minimum rate among all transmitted commodities, is based on a decomposition approach that leverages both the alternating direction method of multipliers (ADMM) and the weighted MMSE (WMMSE) algorithm. We show that this algorithm is easily parallelizable and converges globally to a stationary solution of the joint optimization problem. The proposed algorithm can also be extended to networks with multi-antenna nodes and to other utility functions.

A Wavelet Frame Based Model for Smooth Functions and Beyond
Jianfeng Cai
Department of Mathematics, University of Iowa
Email: jianfeng-cai@uiowa.edu

In this talk, we present a new wavelet frame based image restoration model that explicitly treats images as piecewise smooth functions. It estimates both the image to be restored and its singularity set. It can well protect singularities, which are important image features, while providing enough regularization in smooth regions. The model penalizes the l2-norm of the wavelet frame coefficients away from the singularity set, and the l1-norm of the coefficients on the singularity set.
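
A minimal sketch of the mixed penalty structure, with hypothetical frame coefficients and a hypothetical singularity indicator; the actual wavelet frame transform and the joint estimation of the image and its singularity set are beyond this snippet.

    # Sketch of the mixed regularizer: l1-norm of frame coefficients on the
    # (estimated) singularity set, l2-type penalty away from it (squared
    # here for simplicity; the talk's exact form may differ).
    import numpy as np

    def mixed_frame_penalty(coeffs, singular_mask, mu=1.0, nu=1.0):
        on = coeffs[singular_mask]      # coefficients on the singularity set
        off = coeffs[~singular_mask]    # coefficients in smooth regions
        return mu * np.sum(np.abs(on)) + nu * np.sum(off**2)

    coeffs = np.array([0.9, -0.1, 0.05, 1.2, -0.02])
    mask = np.array([True, False, False, True, False])
    print(mixed_frame_penalty(coeffs, mask))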

Some New Results on the Convergence of Multi-Block ADMM
Shiqian Ma
Department of Systems Engineering and Engineering Management, The Chinese University of Hong Kong
Email: sqma@se.cuhk.edu.hk

The alternating direction method of multipliers (ADMM) is widely used in solving structured convex optimization problems due to its superior practical performance. On the theoretical side, however, a counterexample given by Chen et al. shows that the multi-block ADMM for minimizing the sum of N (N >= 3) convex functions with N block variables linked by linear constraints may diverge. It is therefore of great interest to investigate further sufficient conditions that guarantee convergence of the multi-block ADMM. In this talk, we present the following two new results on the convergence of multi-block ADMM. (1) Existing sufficient conditions for convergence of the multi-block ADMM typically require strong convexity of parts of the objective. We show convergence and convergence-rate results for the multi-block ADMM applied to certain N-block (N >= 3) convex minimization problems without requiring strong convexity. Specifically, we prove that the multi-block ADMM returns an epsilon-optimal solution within O(1/epsilon^2) iterations by solving an associated perturbation of the original problem, and that it returns an epsilon-optimal solution within O(1/epsilon) iterations when applied to a certain sharing problem, under the condition that the augmented Lagrangian function satisfies the Kurdyka-Lojasiewicz property, which covers most convex optimization models except for some pathological cases. (2) It is known that the 2-block ADMM converges globally for any penalty parameter, i.e., it is a parameter-free algorithm. We show that the unmodified 3-block ADMM is also parameter-free when applied to a certain sharing problem, which covers many interesting applications.
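
For intuition, here is a minimal sketch of the unmodified 3-block ADMM on a tiny sharing-type problem, min f1(x1) + f2(x2) + g(z) subject to x1 + x2 - z = 0, with quadratic f1, f2, g so that every block update has a closed form. The particular functions, data and penalty parameter are hypothetical and are not the sharing problem analyzed in the talk.

    # Sketch: unmodified 3-block ADMM (Gauss-Seidel sweep over x1, x2, z,
    # then a multiplier update) on a tiny quadratic sharing-type problem.
    import numpy as np

    a1, a2 = np.array([1.0, -2.0]), np.array([0.5, 3.0])   # f_i(x) = 0.5||x - a_i||^2
    gamma, beta = 2.0, 1.0                                  # g(z) = (gamma/2)||z||^2, penalty beta
    x1 = np.zeros(2); x2 = np.zeros(2); z = np.zeros(2); lam = np.zeros(2)

    for k in range(200):
        x1 = (a1 - lam - beta * (x2 - z)) / (1 + beta)
        x2 = (a2 - lam - beta * (x1 - z)) / (1 + beta)
        z = (lam + beta * (x1 + x2)) / (gamma + beta)
        lam = lam + beta * (x1 + x2 - z)                    # multiplier update

    print("constraint residual:", np.linalg.norm(x1 + x2 - z))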

The Linearized Proximal Algorithm for Convex Composite Optimization with Applications
Chong Li
Department of Mathematics, Zhejiang University, China
Email: cli@zju.edu.cn

We propose a linearized proximal algorithm (LPA) to solve convex composite optimization problems. Each iteration of the LPA is a proximal minimization of the composition of the outer function with the linearization of the inner function at the current iterate. The LPA has the attractive computational advantage that the solution of each subproblem is a singleton, which avoids the difficulty of finding the whole solution set of the subproblem, as in the Gauss-Newton method (GNM), while it still maintains the same local convergence rate as the GNM. Under the assumptions of local weak sharp minima of order p (p in [1, 2]) and a quasi-regularity condition, we establish a local superlinear convergence rate for the LPA. We also propose a globalization strategy for the LPA based on a backtracking line search, as well as an inexact LPA, together with their superlinear convergence results. We further apply the LPA to solve a feasibility problem as well as a sensor network localization problem. Our numerical results illustrate that the LPA meets the demand for a robust and efficient algorithm for the sensor network localization problem.
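
A minimal one-dimensional sketch of a linearized proximal step for the composite problem min_x h(F(x)) with h = |.| and F(x) = x^2 - 2 (so the minimizers are +/- sqrt(2)). Each step solves min_d |F(x) + F'(x) d| + (1/(2t)) d^2, which has a closed form via soft-thresholding. The example problem and step size are hypothetical; the talk's LPA treats general convex composite problems and includes a line search.

    # Sketch of one linearized proximal (LPA-style) step in 1-D.
    import math

    def soft(v, tau):
        # soft-thresholding, the proximal map of the absolute value
        return math.copysign(max(abs(v) - tau, 0.0), v)

    def lpa_step(x, t=0.5):
        c, J = x * x - 2.0, 2.0 * x      # F(x) and F'(x)
        u = soft(c, t * J * J)           # optimal value of the linearized residual c + J d
        return x + (u - c) / J

    x = 3.0
    for k in range(10):
        x = lpa_step(x)
        print(k, x, abs(x * x - 2.0))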

Douglas-Rachford Splitting for Nonconvex Feasibility Problems
Ting Kei Pong
Department of Applied Mathematics, The Hong Kong Polytechnic University
Email: tk.pong@polyu.edu.hk

The Douglas-Rachford (DR) splitting method is a popular approach for finding a point in the intersection of two closed convex sets, and it has also been applied very successfully to various nonconvex instances, even though the theoretical justification in the latter setting is far from complete. In this talk, we aim at understanding the behavior of DR splitting for finding an intersection point of a closed convex set C and a possibly nonconvex closed set D. In particular, we adapt the method to minimize the squared distance to C subject to membership in D. When the stepsize is small and either C or D is compact, we show that the generated sequence is bounded and that any cluster point is a stationary point of the minimization problem. Moreover, if C and D are in addition semi-algebraic, then the whole sequence converges. We also discuss a generalization of the method to minimizing the sum of a Lipschitz differentiable function and a proper closed function, both possibly nonconvex. We present preliminary numerical results comparing DR splitting against the alternating projection method. This is joint work with Guoyin Li.
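
A minimal sketch of the classical DR splitting iteration on a toy feasibility problem with a convex hyperplane C and the nonconvex unit sphere D. This is the plain iteration z <- z + P_D(2 P_C(z) - z) - P_C(z); the talk studies a closely related adapted formulation (minimizing the squared distance to C over D), so treat this as an illustrative variant with hypothetical data.

    # Sketch: Douglas-Rachford splitting for a convex/nonconvex feasibility toy.
    import numpy as np

    a, b = np.array([1.0, 2.0, 2.0]), 2.0           # C = {x : a.x = b}, a hyperplane

    def proj_C(x):                                   # projection onto the hyperplane
        return x - (a @ x - b) / (a @ a) * a

    def proj_D(x):                                   # projection onto the unit sphere (nonconvex)
        return x / np.linalg.norm(x)

    z = np.array([1.0, 1.0, -1.0])
    for k in range(100):
        xc = proj_C(z)
        xd = proj_D(2 * xc - z)
        z = z + xd - xc

    print("candidate point:", xd,
          " ||xd|| =", np.linalg.norm(xd),
          " hyperplane residual =", a @ xd - b)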

Newton-Krylov Methods for Problems with Embedded Monte Carlo Simulations
Tim Kelley
Department of Mathematics, North Carolina State University, USA
Email: tim kelley@ncsu.edu

We analyze the behavior of inexact Newton methods for problems where the nonlinear residual, Jacobian, and Jacobian-vector products are the outputs of Monte Carlo simulations. We propose algorithms which account for the randomness in the iteration, develop theory for the behavior of these algorithms, and illustrate the results with an example from neutronics.

On Game Problems with Vector-Valued Payoffs
Guangya Chen
Chinese Academy of Sciences
Email: chengy@amss.ac.cn

In this talk we introduce Shapley's model of game problems with vector-valued payoffs and Aumann's extension. We also introduce Aumann's work on utility functions and its applications to the study of this class of game problems.

On Power Penalty Methods for Linear Complementarity Problems Arising from American Option Pricing
Xiaoqi Yang
Department of Applied Mathematics, The Hong Kong Polytechnic University
Email: xiao.qi.yang@polyu.edu.hk

Power penalty methods for solving a linear parabolic complementarity problem arising from American option pricing have attracted much attention. These methods require solving a series of systems of nonlinear equations (called penalized equations). In this paper, we first study the relationships among the solutions of the penalized equations under appropriate conditions. Additionally, since these penalized equations are neither smooth nor convex, some existing algorithms, such as Newton's method, cannot be applied directly to solve them. We apply the nonlinear Jacobian method to solve the penalized equations and verify that the iteration sequence generated by the method converges monotonically to the solution of the penalized equation. Numerical results confirm the theoretical findings and the efficiency of the proposed algorithm.

On the Linear Convergence Rate of a Generalized Proximal Point Algorithm
Xiaoming Yuan
Department of Mathematics, Hong Kong Baptist University
Email: xmyuan@hkbu.edu.hk

The proximal point algorithm (PPA) has been well studied in the literature. In particular, its linear convergence rate was established by Rockafellar in 1976 under a certain condition. We consider a generalized PPA in the generic setting of finding a zero of a maximal monotone operator, and show that the condition proposed by Rockafellar also suffices to ensure the linear convergence rate of this generalized PPA. Both exact and inexact versions of the generalized PPA are discussed. The motivation for considering this generalized PPA is that it includes as special cases the relaxed versions of some splitting methods that originate from the PPA; linear convergence results for the generalized PPA can therefore be used to better understand the convergence of some widely used algorithms in the literature. We focus on the particular convex minimization context and specialize Rockafellar's condition to see how to ensure the linear convergence rate for some efficient numerical schemes, including the classical augmented Lagrangian method proposed by Hestenes and Powell in 1969 and its relaxed version, and the original alternating direction method of multipliers (ADMM) of Glowinski and Marrocco in 1975 and its relaxed version (i.e., the generalized ADMM of Eckstein and Bertsekas in 1992). Some refined conditions weaker than existing ones are proposed in these particular contexts.
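
A minimal sketch of a relaxed (generalized) proximal point iteration for finding a zero of the affine maximal monotone operator T(x) = Qx - q with Q positive semidefinite, whose resolvent has the closed form (I + cQ)^{-1}(x + cq). The operator, proximal parameter c and relaxation factor rho below are illustrative; rho = 1 recovers the classical PPA studied by Rockafellar.

    # Sketch: relaxed proximal point algorithm for T(x) = Qx - q.
    import numpy as np

    rng = np.random.default_rng(3)
    B = rng.standard_normal((4, 4))
    Q = B.T @ B                      # positive semidefinite, so T is maximal monotone
    q = rng.standard_normal(4)
    c, rho = 1.0, 1.5                # proximal parameter and relaxation factor in (0, 2)

    x = np.zeros(4)
    for k in range(200):
        resolvent = np.linalg.solve(np.eye(4) + c * Q, x + c * q)   # J_{cT}(x)
        x = (1 - rho) * x + rho * resolvent                         # relaxed PPA step

    print("||T(x)|| =", np.linalg.norm(Q @ x - q))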

Sparse Solutions of Linear Complementarity Problems
Shuhuang Xiang
School of Mathematics and Statistics, Central South University, China
Email: xiangsh@csu.edu.cn

We consider the characterization and computation of sparse solutions and least-p-norm (0 < p < 1) solutions of the linear complementarity problem LCP(q, M). We show that the number of nonzero entries of any least-p-norm solution of LCP(q, M) is less than or equal to the rank of M, for an arbitrary matrix M and any p in (0, 1), and that there is a threshold value in (0, 1) such that every least-p-norm solution with p below this threshold is a sparse solution. Moreover, we provide conditions on M under which a sparse solution can be found by solving a convex minimization problem.
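
To fix notation, LCP(q, M) asks for x >= 0 with w = Mx + q >= 0 and x.w = 0; among its solutions one may seek the sparsest one or a least-p-norm (0 < p < 1) one. The sketch below enumerates candidate supports of x on a tiny hypothetical instance and reports sparsity and p-norm values; this brute-force enumeration is purely illustrative, only practical for tiny problems, and may miss degenerate solutions.

    # Sketch: enumerate LCP(q, M) solutions on a tiny instance and report
    # their sparsity and p-th power of the p-norm.
    import itertools
    import numpy as np

    M = np.array([[2.0, 0.0], [0.0, 0.0]])
    q = np.array([-2.0, 0.0])

    def lcp_solutions(M, q, tol=1e-9):
        n = len(q)
        sols = []
        for active in itertools.product([0, 1], repeat=n):    # guess the support of x
            idx = [i for i in range(n) if active[i]]
            x = np.zeros(n)
            if idx:
                try:
                    x[idx] = np.linalg.solve(M[np.ix_(idx, idx)], -q[idx])
                except np.linalg.LinAlgError:
                    continue
            w = M @ x + q
            if (x >= -tol).all() and (w >= -tol).all() and abs(x @ w) <= tol:
                sols.append(x)
        return sols

    p = 0.5
    for x in lcp_solutions(M, q):
        print(x, "nonzeros:", int(np.count_nonzero(x)),
              "sum |x_i|^p:", np.sum(np.abs(x) ** p))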

Applications of Sparse Solutions of Linear Complementarity Problems
Xiaojun Chen
Department of Applied Mathematics, The Hong Kong Polytechnic University
Email: xiaojun.chen@polyu.edu.hk

We consider applications of sparse solutions of the LCP to the problem of portfolio selection within the Markowitz mean-variance framework. We test this methodology on data sets constructed by Fama and French and on the newly launched Shanghai-Hong Kong Stock Connect scheme. We compare with the 1/N strategy and L1-regularized portfolios in terms of the Sharpe ratio, VaR and CVaR.