Sparsity Matters. Robert J. Vanderbei. 2017 September 20. http://www.princeton.edu/~rvdb. IDA: Center for Communications Research, Princeton NJ.


"The simplex method is 200 times faster... than the simplex method." (John Forrest)


More Generally... Any optimization algorithm is 200 times faster... than that same algorithm.

What Can Go Wrong? Sometimes the straightforward/default numerical linear algebra isn't the best way to solve a particular type of problem. Sometimes the obvious, natural formulation of an optimization problem isn't very easy to solve, but there is a mathematically equivalent variant that one can solve quickly. Sometimes the real-world problem to be solved isn't precisely defined, and an alternate formulation might be vastly easier to solve. I will discuss some examples of each of these scenarios.

Numerical Linear Algebra


Linear Programming (LP)

Primal:  maximize c^T x  subject to A x = b,  x >= 0.
Dual:    minimize b^T y  subject to A^T y - z = c,  z >= 0.

Interior-Point Algorithm

initialize x, y, z such that x and z are positive
while (not optimal) {
    ρ = b - A x                                   (primal infeasibility)
    σ = c - A^T y + z                             (dual infeasibility)
    µ = δ z^T x / n
    solve:  A Δx = ρ
            A^T Δy + X^{-1} Z Δx = σ + µ X^{-1} e - z
    Δz = -X^{-1} Z Δx + µ X^{-1} e - z
    θ = r ( max_j { -Δx_j / x_j, -Δz_j / z_j } )^{-1} ∧ 1          (step length)
    x <- x + θ Δx,   y <- y + θ Δy,   z <- z + θ Δz
}

Note: X = Diag(x) and Z = Diag(z).

Reduced KKT System

    A Δx = ρ
    A^T Δy + X^{-1} Z Δx = σ + µ X^{-1} e - z

Normal Equations

Use the second set of equations above to solve for Δx,

    Δx = -D A^T Δy + D σ + µ Z^{-1} e - x,        where D = X Z^{-1},

and eliminate Δx from the first set of equations to arrive at the normal equations:

    A D A^T Δy = -ρ + A D σ + µ A Z^{-1} e - A x.
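
To make the algebra above concrete, here is a minimal NumPy sketch of the algorithm box and its normal-equations step. It is an illustration only: the parameter choices (δ = 0.1, r = 0.9), the stopping test, and the toy LP at the end are assumptions of mine, not something taken from the slides.

import numpy as np

def ipm_lp(A, b, c, delta=0.1, r=0.9, tol=1e-8, max_iter=100):
    """Primal-dual interior-point sketch for max c'x s.t. Ax = b, x >= 0,
    following the algorithm box above and solving the normal equations."""
    m, n = A.shape
    x, z, y = np.ones(n), np.ones(n), np.zeros(m)   # strictly positive start
    for _ in range(max_iter):
        rho = b - A @ x                             # primal infeasibility
        sigma = c - A.T @ y + z                     # dual infeasibility
        gap = z @ x
        if max(np.linalg.norm(rho), np.linalg.norm(sigma), gap) < tol:
            break
        mu = delta * gap / n
        d = x / z                                   # diagonal of D = X Z^{-1}
        M = (A * d) @ A.T                           # A D A^T
        dy = np.linalg.solve(M, A @ (d * sigma + mu / z - x) - rho)
        dx = d * (sigma - A.T @ dy) + mu / z - x
        dz = A.T @ dy - sigma
        t = np.concatenate([-dx / x, -dz / z]).max()
        theta = 1.0 if t <= 0 else min(1.0, r / t)  # step length
        x, y, z = x + theta * dx, y + theta * dy, z + theta * dz
    return x, y, z

# toy problem:  max x1 + 2*x2  s.t.  x1 + x2 + s = 4,  all variables >= 0
A = np.array([[1.0, 1.0, 1.0]])
b = np.array([4.0])
c = np.array([1.0, 2.0, 0.0])
x, y, z = ipm_lp(A, b, c)
print(np.round(x, 4))    # expect approximately [0, 4, 0]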

Problems with Dense Columns

Consider this LP:

    maximize    c^T x + c_0 x_0
    subject to  A x + a x_0 = b
                x, x_0 >= 0

There are practical real-world problems with this structure. Also, the Big-M method for algorithms that require a feasible initial starting point involves such a structure. The computationally intensive part of the original approach to finding the step directions in interior-point methods was to solve the so-called normal equations for the dual step direction Δy:

    [ A  a ] [ D^2  0   ] [ A^T ]
             [ 0    d^2 ] [ a^T ]  Δy = ...

where D is a diagonal matrix and d is a scalar. Multiplying out, we get:

    (A D^2 A^T + d^2 a a^T) Δy = ...

The matrix A D^2 A^T is sparse, but a a^T is dense. Hence, the sum is dense and solving the system of equations as formulated is highly inefficient. There are three ways to address this problem:

1. Use the Sherman-Morrison-Woodbury-AddYourNameHere formula to express solutions to the dense system in terms of solutions to the system without the dense (rank-one) term.
2. Solve the reduced KKT system instead of the normal equations. (Symmetric Indefinite Systems for Interior Point Methods, Math. Prog., 58:1-32, 1993)
3. Re-express the problem using several variables in place of the single variable x_0. (Splitting Dense Columns in Sparse Linear Systems, Linear Algebra and its Applications, 152:107-117, 1991)
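
As a small illustration of the first option, here is a hedged NumPy sketch of the Sherman-Morrison update: it solves (M + d^2 a a^T) Δy = rhs using only solves against the sparse part M = A D^2 A^T. The dense test data and the helper name solve_with_dense_rank1 are mine, for illustration only.

import numpy as np

def solve_with_dense_rank1(M_solve, a, d2, rhs):
    """Solve (M + d2 * a a^T) dy = rhs via Sherman-Morrison, given a routine
    M_solve(v) that solves M w = v against the sparse part M = A D^2 A^T."""
    w = M_solve(rhs)                                 # M^{-1} rhs
    u = M_solve(a)                                   # M^{-1} a
    return w - u * (d2 * (a @ w)) / (1.0 + d2 * (a @ u))

# check against a direct dense solve on random data
rng = np.random.default_rng(0)
m, n = 50, 200
A = rng.standard_normal((m, n))
D2 = np.diag(rng.uniform(0.5, 2.0, n))
M = A @ D2 @ A.T
a = rng.standard_normal(m)
d2, rhs = 3.0, rng.standard_normal(m)
dy = solve_with_dense_rank1(lambda v: np.linalg.solve(M, v), a, d2, rhs)
print(np.allclose((M + d2 * np.outer(a, a)) @ dy, rhs))    # True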

Splitting Dense Columns

A problem with these constraints...

    [ A  a ] [ x   ]
             [ x_0 ]  =  b

...can be reformulated in this equivalent but sparser fashion: give each row its own copy of x_0 and link the copies together,

    [ A   Diag(a_1, ..., a_m) ] [ x   ]     [ b ]
    [ 0          L            ] [ x_0 ]  =  [ 0 ],

where x_0 = (x_{0,1}, ..., x_{0,m})^T and L is the (m-1) x m linking matrix

        [ 1  -1              ]
    L = [     1  -1          ]
        [         ...  ...   ]
        [              1  -1 ]

which forces x_{0,1} = x_{0,2} = ... = x_{0,m}. For an optimizer based on solving the normal equations, this second form of the problem will solve much faster than the original form. Of course, most/all modern interior-point method solvers solve the reduced KKT system to avoid this issue.
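
The reformulated constraint matrix is easy to assemble with sparse tools. Below is a minimal scipy.sparse sketch (the function name split_dense_column and the toy data are illustrative assumptions, not code from the talk); the right-hand side becomes (b, 0, ..., 0) as above.

import numpy as np
import scipy.sparse as sp

def split_dense_column(A, a):
    """Replace the dense column a by Diag(a) plus linking rows
    x_{0,i} - x_{0,i+1} = 0, as in the slide above."""
    m, n = A.shape
    top = sp.hstack([A, sp.diags(a)])                          # [ A  Diag(a) ]
    ones = np.ones(m - 1)
    L = sp.diags([ones, -ones], [0, 1], shape=(m - 1, m))      # linking rows
    bottom = sp.hstack([sp.csr_matrix((m - 1, n)), L])         # [ 0  L ]
    return sp.vstack([top, bottom]).tocsr()

m, n = 5, 8
A = sp.random(m, n, density=0.3, random_state=0)
a = np.arange(1.0, m + 1)                                      # the dense column
A_split = split_dense_column(A, a)
print(A_split.shape, A_split.nnz)                              # (2m-1) x (n+m), still sparse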


The dense columns example is a bit dated but it illustrates an important point: The number of constraints, m, and the number of variables, n, often have little to do with how long it takes to solve a problem. The sparsity of the constraint matrix plays a huge role. Example: I recently solved a generalized network flow problem using ampl/loqo. It has 195,503 variables and 123,571 constraints. The problem solves in 3.9 seconds on my laptop computer. For comparison, I solved a dense problem with 1/100-th as many variables and 1/100-th as many constraints. It took 59 seconds to solve. SPARSITY MATTERS

Mathematically Equivalent Problems

Markowitz Model for Portfolio Selection

    minimize    µ x^T Σ x - r^T x
    subject to  e^T x = 1
                x >= 0

Here, r denotes the vector of expected returns, Σ denotes the matrix of covariances of the returns, e is the vector of all ones, and µ is the risk-aversion parameter.

Equivalent Reformulation  The matrix Σ is positive semidefinite and therefore can be expressed (and probably was defined) as a product Σ = U^T U:

    minimize    µ y^T y - r^T x
    subject to  y - U x = 0
                e^T x = 1
                x >= 0

Example. For a problem with 10,000 market assets and a covariance matrix based on data from 100 time periods in the past, the first formulation has 10,000 variables and just one constraint. It solves in 1480 seconds. The second formulation has 10,100 variables and 101 constraints. It solves in 10.3 seconds, a speedup by a factor of 144.
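
To illustrate the reformulation (not the actual ampl/loqo runs from the talk), here is a small cvxpy sketch with made-up data and much smaller dimensions than the 10,000-asset example; both problems return the same optimal value, and the second form is the one that scales.

import numpy as np
import cvxpy as cp

n_assets, n_periods, mu = 300, 50, 1.0            # illustrative sizes only
rng = np.random.default_rng(1)
U = rng.standard_normal((n_periods, n_assets)) / np.sqrt(n_periods)
r = 0.01 * rng.standard_normal(n_assets)
Sigma = U.T @ U                                   # covariance, defined as U^T U

x = cp.Variable(n_assets, nonneg=True)

# Formulation 1: dense quadratic form in Sigma
# (add a tiny ridge, e.g. 1e-10 * I, if the PSD check objects to the
#  numerically rank-deficient Sigma)
p1 = cp.Problem(cp.Minimize(mu * cp.quad_form(x, Sigma) - r @ x),
                [cp.sum(x) == 1])

# Formulation 2: extra variables y = U x, so the objective is mu * ||y||^2
y = cp.Variable(n_periods)
p2 = cp.Problem(cp.Minimize(mu * cp.sum_squares(y) - r @ x),
                [y == U @ x, cp.sum(x) == 1])

p1.solve()
p2.solve()
print(p1.value, p2.value)                         # same optimum, two formulations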

Alternate Formulations

Compressed Sensing  Aims to recover a sparse signal from a small number of measurements. Let x^0 = (x^0_1, ..., x^0_n)^T ∈ R^n denote a signal to be recovered. We assume n is large and that x^0 is sparse. Let A be a given (or chosen) m × n matrix with m << n. The compressed sensing problem is to recover x^0 assuming only that we know y = Ax^0 and that x^0 is sparse. Since x^0 is a sparse vector, one can hope that it is the sparsest solution to the underdetermined linear system and therefore can be recovered from y by solving

    (P_0)    min_x ||x||_0    subject to Ax = y,

where ||x||_0 = #{i : x_i ≠ 0}. This problem is NP-hard due to the nonconvexity of the 0-pseudo-norm.

Basis Pursuit  Basis pursuit is one way to circumvent NP-hardness. Replace ||x||_0 with ||x||_1 = Σ_j |x_j|:

    (P_1)    min_x ||x||_1    subject to Ax = y.

Two New Ideas  The first idea is motivated by the fact that the desired solution is sparse and therefore should require only a relatively small number of simplex pivots to find, starting from an appropriately chosen starting point: the zero vector. Using the parametric simplex method, it is easy to take the zero vector as the starting basic solution. The second idea requires a modified sensing scheme: reshape the signal vector x into a matrix X and then multiply the matrix signal on both the left and the right sides to get a compressed matrix signal. This idea allows one to formulate the linear programming problem in such a way that the constraint matrix is very sparse and therefore the problem can be solved very efficiently. Details to follow...

An Equivalent Parametric Formulation  Consider the following parametric perturbation to (P_1):

    (x̂, ε̂) = argmin_{x,ε}  µ ||x||_1 + ||ε||_1                    (1)
              subject to   Ax + ε = y.

For large values of µ, the optimal solution has x = 0 and ε = y. For values of µ close to zero, the situation reverses: ε = 0. We refer to this problem as the vector compressed sensing problem.

Dealing with Absolute Values  One way to reformulate the optimization problem (1) as a linear programming problem is to split each variable into a difference between two nonnegative variables,

    x = x^+ - x^-    and    ε = ε^+ - ε^-,

where the entries of x^+, x^-, ε^+, ε^- are all nonnegative. The next step is to replace ||x||_1 with 1^T(x^+ + x^-) and to make a similar substitution for ||ε||_1. In general, the sum does not equal the absolute value, but it is easy to see that it does at optimality.

Linear Programming Formulation  Here is the reformulated linear programming problem:

    min_{x^+, x^-, ε^+, ε^-}   µ 1^T (x^+ + x^-) + 1^T (ε^+ + ε^-)
    subject to                 A(x^+ - x^-) + (ε^+ - ε^-) = y
                               x^+, x^-, ε^+, ε^- >= 0.

For µ large, the optimal solution has x^+ = x^- = 0 and ε^+ - ε^- = y. And, given that these variables are required to be nonnegative, it follows that

    y_i > 0  ==>  ε^+_i > 0 and ε^-_i = 0
    y_i < 0  ==>  ε^+_i = 0 and ε^-_i > 0
    y_i = 0  ==>  ε^+_i = 0 and ε^-_i = 0

With these choices, the solution is feasible for all µ and is optimal for large µ. Furthermore, declaring the nonzero variables to be basic and the zero variables to be nonbasic, we see that this optimal solution is also a basic solution and can therefore serve as a starting point for the parametric simplex method.
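
For a fixed value of µ, the split-variable LP above can be handed to any LP solver. Here is a minimal scipy sketch (the function name l1_fit, the toy data, and the single-µ call are my illustrative assumptions; the talk instead sweeps µ with the parametric simplex method).

import numpy as np
from scipy.optimize import linprog

def l1_fit(A, y, mu):
    """Solve  min mu*||x||_1 + ||eps||_1  s.t.  A x + eps = y
    via the split-variable LP, with variables [x+, x-, eps+, eps-]."""
    m, n = A.shape
    c = np.concatenate([mu * np.ones(2 * n), np.ones(2 * m)])
    A_eq = np.hstack([A, -A, np.eye(m), -np.eye(m)])
    res = linprog(c, A_eq=A_eq, b_eq=y, bounds=(0, None), method="highs")
    v = res.x
    return v[:n] - v[n:2 * n], v[2 * n:2 * n + m] - v[2 * n + m:]

rng = np.random.default_rng(2)
m, n, k = 30, 80, 4
A = rng.standard_normal((m, n))
x0 = np.zeros(n)
x0[rng.choice(n, k, replace=False)] = rng.standard_normal(k)
y = A @ x0
x_hat, eps_hat = l1_fit(A, y, mu=1e-3)
print(np.allclose(x_hat, x0, atol=1e-5))          # small mu: sparse signal recovered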

Kronecker Compressed Sensing  Unlike the vector compressed sensing problem, Kronecker compressed sensing is used for sensing multidimensional signals (e.g., matrices or tensors). For example, given a sparse matrix signal X^0 ∈ R^{n_1 × n_2}, we can use two sensing matrices A ∈ R^{m_1 × n_1} and B ∈ R^{m_2 × n_2} and try to recover X^0 from knowledge of Y = A X^0 B^T by solving the Kronecker compressed sensing problem:

    (P_2)    X̂ = argmin_X ||X||_1    subject to A X B^T = Y.

When the signal is multidimensional, Kronecker compressed sensing is more natural than classical vector compressed sensing.

Kroneckerizing Vector Problems  Sometimes, even when facing vector signals, it is beneficial to use Kronecker compressed sensing due to its added computational efficiency. More specifically, even though the target signal is a vector x^0 ∈ R^n, if we assume that n can be factored as n = n_1 n_2, then we can first reshape x^0 into a matrix X^0 ∈ R^{n_1 × n_2} by putting successive length-n_1 sub-vectors of x^0 into the columns of X^0. We then multiply the matrix signal X^0 on both the left and the right by sensing matrices A and B to get a compressed matrix signal Y^0. We will show that we are able to solve this Kronecker compressed sensing problem much more efficiently than the corresponding vector compressed sensing problem.

Vectorizing the Kroneckerization  Standard LP solvers expect to see a vector of variables, not a matrix. The naive vectorization goes like this... Let x = vec(X) and y = vec(Y), where, as usual, the vec(·) operator takes a matrix and concatenates its elements column-by-column to build one large column vector containing all the elements of the matrix. In terms of x and y, problem (P_2) can be rewritten as an equivalent vector compressed sensing problem:

    vec(X̂) = argmin_x ||x||_1    subject to Ux = y,

where U is given by the (m_1 m_2) × (n_1 n_2) Kronecker product of A and B:

    U = B ⊗ A = [ A b_11        ...   A b_{1 n_2}   ]
                [   ...         ...       ...       ]
                [ A b_{m_2 1}   ...   A b_{m_2 n_2} ]

The matrix U is fully dense. This is bad.
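
A quick NumPy check of the identity behind this vectorization, vec(A X B^T) = (B ⊗ A) vec(X) with column-major vec; the dimensions below are small and arbitrary, chosen only for illustration.

import numpy as np

rng = np.random.default_rng(3)
m1, n1, m2, n2 = 3, 5, 2, 4
A = rng.standard_normal((m1, n1))
B = rng.standard_normal((m2, n2))
X = rng.standard_normal((n1, n2))

vec = lambda M: M.flatten(order="F")              # stack columns
U = np.kron(B, A)                                 # (m1*m2) x (n1*n2), fully dense
print(np.allclose(U @ vec(X), vec(A @ X @ B.T)))  # True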

Sparsifying the Constraint Matrix  The key to an efficient algorithm for solving the linear programming problem associated with the Kronecker sensing problem lies in noting that the dense matrix U can be factored into a product of two very sparse matrices:

    U = [ A b_11        ...   A b_{1 n_2}   ]
        [   ...         ...       ...       ]
        [ A b_{m_2 1}   ...   A b_{m_2 n_2} ]

      = [ A  0  ...  0 ]   [ b_11 I        b_12 I        ...   b_{1 n_2} I   ]
        [ 0  A  ...  0 ] × [ b_21 I        b_22 I        ...   b_{2 n_2} I   ]
        [   ...        ]   [   ...                                ...        ]
        [ 0  0  ...  A ]   [ b_{m_2 1} I   b_{m_2 2} I   ...   b_{m_2 n_2} I ]

      =: V W,

where I denotes an n_1 × n_1 identity matrix and 0 denotes an m_1 × n_1 zero matrix. The constraints on the problem are Ux = y.

Exploiting the Sparsification  The matrix U is usually completely dense. But it is a product of two very sparse matrices: V and W. Hence, introducing some new variables, call them z, we can rewrite the constraints like this:

    z - W x = 0
    V z = y.

And, as before, we can split x into the difference between its positive and negative parts to convert the problem to a linear program:

    min_{x^+, x^-}   µ 1^T (x^+ + x^-)
    subject to       z - W (x^+ - x^-) = 0
                     V z = y
                     x^+, x^- >= 0.

This formulation has more variables and more constraints. But the constraint matrix is very sparse. For linear programming, sparsity of the constraint matrix is the key to algorithm efficiency.
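
A small scipy.sparse sketch of the factorization and of the resulting constraint block; the toy dimensions and the variable ordering [x+, x-, z] are my own illustrative choices.

import numpy as np
import scipy.sparse as sp

rng = np.random.default_rng(4)
m1, n1, m2, n2 = 3, 5, 2, 4
A = rng.standard_normal((m1, n1))
B = rng.standard_normal((m2, n2))

V = sp.kron(sp.eye(m2), sp.csr_matrix(A))         # block-diagonal copies of A
W = sp.kron(sp.csr_matrix(B), sp.eye(n1))         # blocks b_ij * I
U = np.kron(B, A)                                 # the dense matrix being factored

print(np.allclose((V @ W).toarray(), U))          # True: U = V W
print(U.size, V.nnz + W.nnz)                      # dense entries vs. sparse nonzeros

# constraint matrix for  z - W(x+ - x-) = 0  and  V z = y
K = sp.bmat([[-W, W, sp.eye(m2 * n1)],
             [None, None, V]]).tocsr()
print(K.shape, K.nnz)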

Numerical Results  For the vector sensor, we generated random problems using m = 1,122 = 33 × 34 and n = 20,022 = 141 × 142. We varied the number of nonzeros k in the signal x^0 from 2 to 150. We solved the straightforward linear programming formulations of these instances using an interior-point solver called loqo. We also solved a large number of instances of the parametrically formulated problem using the parametric simplex method. We followed a similar plan for the Kronecker sensor. For these problems, we used m_1 = 33, m_2 = 34, n_1 = 141, n_2 = 142, and various values of k. Again, the straightforward linear programming problems were solved by loqo and the parametrically formulated versions were solved by a custom-developed parametric simplex method. For the Kronecker sensing problems, the matrices A and B were generated so that their elements are independent standard Gaussian random variables. For the vector sensing problems, the corresponding matrix U was used. We also ran some publicly available, state-of-the-art codes: l1_ls, FHTP, and Mirror Prox.

[Figure: m = 1,122, n = 20,022. Error bars represent one standard deviation.]

[Figure: m = 1,122, k = 100, n = 141 × n_2.]

Conclusions  The interior-point solver (loqo) applied to the Kronecker sensing problem is uniformly faster than both l1_ls and the interior-point solver applied to the vector problem (the three horizontal lines in the plot). For very sparse problems, the parametric simplex method is best. In particular, for k ≤ 70, the parametric simplex method applied to the Kronecker sensing problem is the fastest method. It can be two or three orders of magnitude faster than l1_ls. But, as explained earlier, the Kronecker sensing problem involves changing the underlying problem being solved. If one is required to stick with the vector problem, then the parametric simplex method is still the best method for k ≤ 80, after which the l1_ls method wins. Instructions for downloading and running the various codes/algorithms described herein can be found at: http://www.orfe.princeton.edu/~rvdb/tex/cts/kronecker_sim.html

Thank You!

And Let The Fun Begin...