FINDING PARALLELISM IN GENERAL-PURPOSE LINEAR PROGRAMMING


FINDING PARALLELISM IN GENERAL-PURPOSE LINEAR PROGRAMMING
Daniel Thuerck 1,2 (advisors: Michael Goesele 1,2 and Marc Pfetsch 1), Maxim Naumov 3
1 Graduate School of Computational Engineering, TU Darmstadt
2 Graphics, Capture and Massively Parallel Computing, TU Darmstadt
3 NVIDIA Research
10.05.2017

INTRODUCTION TO LINEAR PROGRAMMING

Linear Programs
min cᵀx  s.t.  Ax ≤ b,  x ≥ 0
Linear objective function cᵀx, linear constraints, where A = [a_1ᵀ; …; a_mᵀ] stacks the constraint rows and b = (b_1, …, b_m)ᵀ.
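
As a concrete illustration (not part of the slides), the following sketch solves a tiny LP of exactly this form with SciPy's linprog; all data is made up.

# Minimal sketch: solve  min c^T x  s.t.  A x <= b, x >= 0  for a tiny,
# made-up instance (assumes SciPy with the "highs" backend is available).
import numpy as np
from scipy.optimize import linprog

c = np.array([-1.0, -2.0])                 # objective coefficients
A = np.array([[1.0, 1.0],
              [2.0, 0.5]])                 # constraint rows a_i^T
b = np.array([4.0, 3.0])                   # right-hand side

res = linprog(c, A_ub=A, b_ub=b, bounds=[(0, None), (0, None)], method="highs")
print("x* =", res.x, "objective =", res.fun)   # expected: x* = [0, 4], objective = -8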

Linear Programs: Applications [3P Logistics]

Lower-Level Parallelism in LP
INTERNALS OF AN LP SOLVER

Solving LPs
min cᵀx  s.t.  Ax = b,  x ≥ 0   (standard LP format)
A is an m × n matrix with m ≤ n; A is sparse and has full row rank. Variables may be bounded: l ≤ x ≤ u.

Solving LPs: Simplex vs. Interior Point (figure: feasible polytope with objective direction c for each method)

Solving LPs
Simplex: basis (active set), basis system A_B x_B = b.
Interior Point (IPM): augmented (Newton) system [ D  Aᵀ ; A  0 ], or normal equations A D⁻¹ Aᵀ.

Solving LPs
IPM / Augmented system [ D  Aᵀ ; A  0 ]: (m + n) × (m + n), sparse, symmetric, indefinite. Solution: indefinite LDLᵀ factorization or the MINRES method.
IPM / Normal equations A D⁻¹ Aᵀ: m × m, SPD, might be dense, squared condition number. Solution: Cholesky factorization or the CG method.
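
To make the trade-off concrete, here is a small sketch (not culip-lp code) that assembles both systems for a random sparse A and a positive diagonal D at some IPM iterate; the normal-equations matrix is smaller but typically much denser.

# Sketch: build the IPM augmented system and the normal equations for a
# random iterate, assuming D = diag(s)/diag(x) as reconstructed above.
import numpy as np
import scipy.sparse as sp

m, n = 5, 8
rng = np.random.default_rng(0)
A = sp.random(m, n, density=0.4, random_state=0, format="csr")
x = rng.uniform(0.1, 1.0, n)     # primal iterate (strictly positive)
s = rng.uniform(0.1, 1.0, n)     # dual slacks (strictly positive)
D = sp.diags(s / x)              # diagonal scaling at this iterate

# Augmented (Newton) system: (m+n) x (m+n), symmetric, indefinite
K = sp.bmat([[D, A.T], [A, None]], format="csr")

# Normal equations A D^{-1} A^T: m x m, SPD, possibly much denser
N = (A @ sp.diags(x / s) @ A.T).tocsr()

print("augmented system:", K.shape, "nnz =", K.nnz)
print("normal equations:", N.shape, "nnz =", N.nnz)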

Introducing culip-lp
An ongoing implementation of Mehrotra's primal-dual interior point algorithm [1], featuring:
- (iterative) linear algebra based on the augmented-matrix approach,
- full-rank guarantees,
- comprehensive preprocessing & prescaling.
Towards solving large-scale LPs on the GPU, as open source for everybody.

Progress report
IMPLEMENTING CULIP-LP

Solver architecture: Preprocess → Scale → Standardize → IPM loop

Solver architecture: Preprocess → Scale → Standardize → IPM loop
Input data:
- equality constraints A_eq x = b_eq
- inequality constraints A_le x ≤ b_le
- objective vector c
- bounds (on some variables) l, u

Solver architecture: Preprocess → Scale → Standardize → IPM loop
Storage format: CSR (compressed sparse row). Provides efficient row-based access via 3 arrays. Example matrix:
[ a b 0 ]
[ 0 c 0 ]
[ 0 0 d ]
[ e 0 0 ]
row_ptr = [0 2 3 4 5], col_ind = [0 1 1 2 0], val = [a b c d e]
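
A small sketch of how those three arrays are used in practice for a row-based sparse matrix-vector product; the numeric values standing in for a-e are arbitrary.

# Sketch: CSR storage of the 4x3 example matrix above and a simple SpMV.
import numpy as np

row_ptr = np.array([0, 2, 3, 4, 5])
col_ind = np.array([0, 1, 1, 2, 0])
val     = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # stands in for a, b, c, d, e

def csr_spmv(row_ptr, col_ind, val, x):
    y = np.zeros(len(row_ptr) - 1)
    for i in range(len(y)):                      # each row touches only its own slice
        lo, hi = row_ptr[i], row_ptr[i + 1]
        y[i] = np.dot(val[lo:hi], x[col_ind[lo:hi]])
    return y

print(csr_spmv(row_ptr, col_ind, val, np.array([1.0, 1.0, 1.0])))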

Solver architecture: Preprocess → Scale → Standardize → IPM loop
Example: LP pb-simp-nonunif (see [2]). Input matrix: 1.4 million × 23 k with 4.36 million nonzeros.
- Removed 1 singleton inequality
- Removed 3,629 low-forcing constraints
- Removed 1 fixed variable
- Removed 1.1 million (!) singleton inequalities
Result: approx. 3.6 million nonzeros removed. Presolve executes in rounds.

Solver architecture: Preprocess → Scale → Standardize → IPM loop
Goal: reduce element variance in the matrices. Scaling [3] makes a difference.
1. Geometric scaling (applied 1x-4x):  A_i,· ← A_i,· / sqrt( max_j |A_i,j| · min_j |A_i,j| )
2. Equilibration (applied 1x):  A_i,· ← A_i,· / ||A_i,·||_2
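
A sketch of the two row scalings above on a small dense toy matrix; restricting the minimum in the geometric scaling to nonzero entries is an assumption on my part.

# Sketch: geometric-mean row scaling and 2-norm row equilibration (toy data).
import numpy as np

def geometric_scale_rows(A):
    absA = np.abs(A)
    mx = absA.max(axis=1)
    mn = np.where(absA > 0, absA, np.inf).min(axis=1)   # smallest nonzero per row
    return A / np.sqrt(mx * mn)[:, None]

def equilibrate_rows(A):
    return A / np.linalg.norm(A, axis=1)[:, None]        # divide each row by its 2-norm

A = np.array([[1e-3, 5.0, 0.0],
              [10.0, 0.0, 2e4]])
print(geometric_scale_rows(A))
print(equilibrate_rows(A))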

Solver architecture: Preprocess → Scale → Standardize → IPM loop
Goal: format the LP in standard form.
- Shift variables: l ≤ x ≤ u  →  0 ≤ x ≤ u − l
- Split (free) variables: x = x⁺ − x⁻ with x⁺, x⁻ ≥ 0
- Build the standard matrix with slacks s:  [ A_le  I ; A_eq  0 ] [x; s] = [ b_le ; b_eq ]
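
A shape-only sketch of that standardized matrix assembly (random data, illustrative dimensions only):

# Sketch: assemble [A_le I; A_eq 0] with slack variables for the inequality rows.
import scipy.sparse as sp

m_le, m_eq, n = 3, 2, 4
A_le = sp.random(m_le, n, density=0.5, random_state=1, format="csr")
A_eq = sp.random(m_eq, n, density=0.5, random_state=2, format="csr")

A_std = sp.bmat([[A_le, sp.identity(m_le)],
                 [A_eq, None]], format="csr")
print(A_std.shape)   # (m_le + m_eq) x (n + m_le)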

Solver architecture: Preprocess → Scale → Standardize → IPM loop
Ensure A has full rank (symbolically): compute permutations P, Q so that PAQ exposes structurally dependent rows of Ax = b (figure: block structure of PAQ).

Solver architecture: Preprocess → Scale → Standardize → IPM loop
Goal: solve the KKT conditions by Newton steps. Steps per IPM iteration:
- assemble the augmented matrix
- solve the (indefinite) augmented system, twice: predictor and corrector
- choose a stepsize along Δv = Δv_p + Δv_c

Solving the augmented system [ D  Aᵀ ; A  0 ]
Iterative strategy (symmetric, indefinite): use MINRES [4]; equilibrate the system implicitly (in parts); preconditioner: experiments ongoing.
Direct strategy (symmetric, indefinite): use SPRAL SSIDS [5]; reordering by METIS [6]; scaling for large pivots.
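
The sketch below solves a small augmented system with SciPy's MINRES as a stand-in for the iterative strategy; the diagonal (Jacobi-style) preconditioner is only a placeholder and is not the preconditioner under investigation in culip-lp.

# Sketch: MINRES on a toy augmented system with a placeholder diagonal preconditioner.
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import minres, LinearOperator

m, n = 5, 8
rng = np.random.default_rng(0)
A = sp.random(m, n, density=0.4, random_state=0, format="csr")
D = sp.diags(rng.uniform(0.5, 2.0, n))
K = sp.bmat([[D, A.T], [A, None]], format="csr")     # symmetric, indefinite
rhs = rng.standard_normal(m + n)

diag = np.abs(K.diagonal())
diag[diag == 0] = 1.0                                 # guard the zero (2,2) block
M = LinearOperator(K.shape, matvec=lambda v: v / diag)

sol, info = minres(K, rhs, M=M)
print("exit flag:", info, "residual:", np.linalg.norm(K @ sol - rhs))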

Intermediate findings
PERFORMANCE EVALUATION

Benchmark problems [7]

Problem name  | M         | N         | NNZ
ex9           | 40,962    | 10,404    | 517,112
ex10          | 696,608   | 17,680    | 1,162,000
neos-631710   | 169,576   | 167,056   | 834,166
bley_xl1      | 175,620   | 5,831     | 869,391
map06         | 328,818   | 164,547   | 549,920
map10         | 328,818   | 164,547   | 549,920
nb10tb        | 150,495   | 73,340    | 1,172,289
neos-142912   | 58,726    | 416,040   | 1,855,220
in            | 1,526,202 | 1,449,074 | 6,811,639

Performance

Problem name [7] | NNZ       | CLP barr [sec] | culip-lp [sec]
ex9              | 517,112   | X (NC)         | 81
ex10             | 1,162,000 | X (NS)         | 141
neos-631710      | 834,166   | 172            | 478
bley_xl1         | 869,391   | X (NS)         | 1,492
map06            | 549,920   | X (NC)         | 466
map10            | 549,920   | X (NC)         | 615
nb10tb           | 1,172,289 | X (NC)         | 2,461
neos-142912      | 1,855,220 | 356            | 447
in               | 6,811,639 | X (NS)         | NC

X = failed, NS = did not start the 1st iteration, NC = did not converge within 1 hour.

Runtime breakdown (figure): time [sec] per IPM step for the MINRES predictor and corrector solves and for SPRAL, over roughly 100 IPM steps. Problem: map10 [7].

Iterative vs. direct methods (figures): MINRES iteration counts per IPM step for the predictor and corrector solves (up to roughly 6,000 iterations), and the MINRES relative residual per IPM step (from about 1e0 down to 1e-7), over IPM steps 1-13. Problem: map10 [7].

Numerical difficulty
The condition of the matrix [ D  Aᵀ ; A  0 ] depends mainly on D = diag(x)⁻¹ diag(s). With strong duality towards the end of the iteration, max(x_i s_i) / min(x_i s_i) ≥ 10^10, where xᵀ = [x_1, …, x_n] is the solution and sᵀ = [s_1, …, s_n] are the slack variables.
Remedies: 2x2 pivoting in factorizations (e.g. LDLᵀ in SPRAL); preconditioning for MINRES or GMRES.
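
A toy illustration (my own numbers, not from the talk) of why this matters: as complementary pairs (x_i, s_i) converge at different rates, both the spread of the products x_i s_i and the condition number of D blow up.

# Sketch: spread of x_i*s_i and conditioning of D = diag(x)^{-1} diag(s)
# near optimality; values chosen so the ratio reaches ~1e10 as on the slide.
import numpy as np

x = np.array([1.0, 1e-12, 0.5, 1e-9])    # some primal entries approach zero
s = np.array([1e-2, 1.0, 1e-10, 2.0])    # their complementary duals stay O(1)
d = s / x
print("max(x*s)/min(x*s) =", (x * s).max() / (x * s).min())
print("cond(D) = max(d)/min(d) =", d.max() / d.min())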

Findings on the solver's performance
- We know that individual components attain speedups on the GPU: linear algebra components such as preconditioned MINRES, GMRES, etc., as well as graph reorderings and matchings.
- LP problems: on medium-sized problems we are faster than the open-source IPM code CLP [8], and we can solve many large problems where CLP fails.
- However, more time is needed to integrate all components together on the GPU.

What's keeping you from optimizing your runtime?
The LP solver (a.k.a. "the black box") = rank estimation + preconditioner + MPS I/O + preprocessing + graph partitioning + bipartite matching + Krylov solver + component BFS + matrix factorization + max-flow + SpMV + equilibration + matrix reordering + LP scaling.

Future Work
Numerics:
- preconditioning techniques
- investigate the influence of 2x2 pivots in the LDLᵀ factorization
Tuning:
- benchmark & optimize warp-based kernels in preprocessing
- never explicitly form the standardized matrix

Higher-Level Parallelism in LP
FEASIBILITY STUDY: LP DECOMPOSITIONS

Solving an LP: the usual setup
Large LP → LP solver (a.k.a. "the black box": preprocess, standardize, solve) → Solution.

Higher-level parallelism by LP decomposition
Large LP → apply LP decomposition → master LP and slave LPs 1 … k → assemble master/slave solutions → Solution.

LP decompositions: feasibility
Decomposition works on the structure of the constraint matrix A: Benders [9], Dantzig-Wolfe [10].

LP decompositions: prototype
Implemented a Benders decomposition using hypergraph partitioning (figure).

Acknowledgements
The work of Daniel Thuerck is supported by the 'Excellence Initiative' of the German Federal and State Governments and the Graduate School of Computational Engineering at Technische Universität Darmstadt.

References
[1] Mehrotra, Sanjay. "On the implementation of a primal-dual interior point method." SIAM Journal on Optimization 2.4 (1992): 575-601.
[2] Gondzio, Jacek. "Presolve analysis of linear programs prior to applying an interior point method." INFORMS Journal on Computing 9.1 (1997): 73-91.
[3] Gondzio, Jacek. "Presolve analysis of linear programs prior to applying an interior point method." INFORMS Journal on Computing 9.1 (1997): 73-91.
[4] Paige, Christopher C., and Michael A. Saunders. "Solution of sparse indefinite systems of linear equations." SIAM Journal on Numerical Analysis 12.4 (1975): 617-629.
[5] Paige, Christopher C., and Michael A. Saunders. "Solution of sparse indefinite systems of linear equations." SIAM Journal on Numerical Analysis 12.4 (1975): 617-629.

References
[6] Karypis, George, and Vipin Kumar. "A fast and high quality multilevel scheme for partitioning irregular graphs." SIAM Journal on Scientific Computing 20.1 (1998): 359-392.
[7] Koch, Thorsten, et al. "MIPLIB 2010." Mathematical Programming Computation 3.2 (2011): 103-163.
[8] Forrest, John, David de la Nuez, and Robin Lougee-Heimer. "CLP user guide." IBM Research (2004).
[9] Benders, Jacques F. "Partitioning procedures for solving mixed-variables programming problems." Numerische Mathematik 4.1 (1962): 238-252.
[10] Dantzig, George B., and Philip Wolfe. "Decomposition principle for linear programs." Operations Research 8.1 (1960): 101-111.