Linear Algebra in IPMs

Size: px

Start display at page:

Download "Linear Algebra in IPMs"

Hugo Jenkins
6 years ago
Views:

1 School of Mathematics H E U N I V E R S I Y O H F E D I N B U R G Linear Algebra in IPMs Jacek Gondzio J.Gondzio@ed.ac.uk URL: Paris, January

2 Outline Linear Algebra in IPMs for LP, QP and NLP Definite, Indefinite and Quasidefinite Systems Cholesky factorization Exploiting Sparsity in Gaussian Elimination Minimum degree ordering Nested dissection Very Large Scale Optimization implicit inverse representation from sparsity to block-sparsity structured optimization problems OOPS: Object-Oriented Parallel Solver Applications financial planning problems (nonlinear risk measures) data mining (nonlinear kernels in SVMs) Paris, January

3 Summary: From LP to QP Newton direction where A 0 0 Q A I S 0 X Augmented system [ Q Θ 1 A A 0 [ x y s ξ p = b Ax, ] = [ ξp ξ d ξ µ ξ d = c A y s+qx, ξ µ = µe XSe. ][ x y Conclusion: QP is a natural extension of LP. ] = ] [ ξd X 1 ] ξ µ. ξ p Paris, January ,

4 IPMs: LP vs QP Augmented system in LP [ Θ 1 A ][ x A 0 y ] = [ ξd X 1 ] ξ µ. ξ p Eliminate x from the first equation and get normal equations (AΘA ) y = g. Paris, January

5 IPMs: LP vs QP Augmented system in QP [ Q Θ 1 A A 0 ][ x y ] = [ ξd X 1 ] ξ µ. ξ p Eliminate x from the first equation and get normal equations (A(Q+Θ 1 ) 1 A ) y = g. One can use normal equations in LP, but not in QP. Normal equations in QP may become almost completely dense even for sparse matrices A and Q. hus, in QP, usually the indefinite augmented system form is used. Paris, January

6 KK systems in IPMs for LP, QP and NLP LP [ Θ 1 A A 0 ][ x y ] = [ f d] QP [ Q+Θ 1 A A 0 ][ x y ] = [ f d] NLP [ Q(x,y)+Θ 1 P A(x) A(x) Θ D ][ x y ] = [ f d] Paris, January

7 Cholesky factorization Compute a decomposition LDL = AΘA. where: L is a lower triangular matrix; and D is a diagonal matrix. Cholesky factorization is simply the Gaussian Elimination process that exploits two properties of the matrix: symmetry; positive definiteness. Paris, January

8 Use of Cholesky factorization Replace the difficult equation (AΘA ) y = g, with a sequence of easy equations: L u = g, D v = u, L y = v. Note that g = Lu = L(Dv) = LD(L y) = (LDL ) y = (AΘA ) y. Paris, January

9 Symmetric Gaussian Elimination Let H R m m be a symmetric positive definite matrix h 11 h 12 h 1m h H = 21 h 22 h 2m h m1 h m2 h mm By applying Gaussian Elimination to it, we can represent it in the following form: l l m1 l m2 1 d d d mm 1 l 21 l m1 0 1 l m Paris, January

10 Symmetric GE: Examples Example 1: [ ] = [ ][ ][ ]. Example 2: [ ] = [ ][ ][ ]. Paris, January

11 Existence of LDL factorization Lemma: he decomposition H = LDL with d ii > 0, i exists iff H is positive definite (PD). Proof: Part 1 ( ) Let H = LDL with d ii > 0. ake any x 0 and let u = L x. Since L is a unit lower triangular matrix it is nonsingular so u 0 and m x Hx = x LDL x = u Du = d ii u 2 i > 0. i=1 Paris, January

12 Proof (cont d): Part 2 ( ) Proof by induction on dimension of H. For m = 1. H = h 11 = d 11 > 0 iff H is PD. Assume the [ result ] is true for m = k 1 1. W a Let H = a R q k k be given k k positive definite matrix with W R (k 1) (k 1), a R k 1 and q R. Note first that since H is PD, W is also PD. Indeed for any (x,0) R k we have [ ][ ] [x W a x,0] a q 0 =x Wx>0 x R k 1,x 0. From inductive hypothesis we know that W =LDL with d ii >0. Let [ ] [ ][ ][ W a L 0 D 0 a = L ] l q l 1 0 d, 0 1 Paris, January

13 where l is the solution of equation (LD)l = a (it is well defined since L and D are nonsingular) [ ] and d is given by d = q l Dl. W a Hence matrix H = a has an q L D L decomposition. It remains to prove that d > 0. Consider the vector [ x = L ] l. 1 Since H is positive definite, we get 0 < x Hx [ ][ = [ l L 1 L 0 D 0,1] l 1 0 d [ ] D 0 0 = [0,1] 0 d][ 1 = d, which completes the proof. ][ L l 0 1 ][ L l 1 Paris, January ]

14 Definite & Indefinite Systems Cholesky factorization fails for indefinite matrix. Example 1: Negative pivot d 22 < 0. [ ] [ ][ = 2/ /3 ][ 1 2/3 0 1 ]. Example 2: d 11 = 0. Can t even start the elimination. [ ] =??? Paris, January

15 Definite & Indefinite Systems (cont d) IPMs: For indefinite augmented system [ Θ 1 A ][ ] x A 0 y one needs to use some special tricks. For positive definite normal equations = [ r h]. (AΘA ) y = g. one can compute the Cholesky factorization. Paris, January

16 Major Cholesky Andre-Louis Cholesky ( ) Major of French Army, descendant from the Cholewski family of Polish imigrants. Read: M. A. Saunders, Major Cholesky would feel proud, ORSA Journal on Computing, vol 6 (1994) No 1, pp Paris, January

17 Symmetric Factorization wo step solution method: factorization to LDL form, backsolve to compute direction y. A symmetric nonsingular matrix H is factorizable if there exists a diagonal matrix D and unit lower triangular matrix L such that H = LDL. A symmetric matrix H is strongly factorizable if for any permutation matrix P a factorization PHP = LDL exists. he general symmetric indefinite matrix is not factorizable. Paris, January

18 Factoring Indefinite Matrix wo options are possible: 1. Replace diagonal matrix D with a block-diagonal one and allow 2 2 (indefinite) pivots [ 0 a a 0 ] and [ 0 a a d Hence obtain a decomp. H = LDL with block-diagonal D. 2. Regularize indefinite matrix to produce a quasidefinite matrix [ ] K = E A, A F where E R n n is positive definite, F R m m is positive definite, and A R m n has full row rank. Paris, January ].

19 Quasidefinite (QDF) Matrices Symmetric matrix is called quasidefinite if [ ] K = E A, A F where E R n n and F R m m are positive definite, and A R m n has full row rank. QDF matrices are strongly factorizable. For any quasidefinite matrix there exists a Cholesky-like factorization K = LDL, where D is diagonal but not positive definite: n negative pivots; and m positive pivots. Paris, January

20 From Indefinite to Quasidefinite Indefinite matrix H = [ Q Θ 1 A A 0 in IPMs can be converted to a quasidefinite one. Regularize indefinite matrix to produce a quasi-definite matrix. Use dynamic regularization [ H = Q Θ 1 A ] [ ] Rp 0 + A 0 0 R, d wherer p R n n andr d R m m aretheprimalanddual regularizations. For any quasidefinite matrix there exists a Cholesky-like factorization H = LDL, ]. where D is diagonal but not positive definite: n negative pivots and m positive pivots. Paris, January

21 Large Problems are Sparse Suppose a large LP is solved: m,n Can all variables be linked at the same time? No, usually only a subset of them is linked. here are usually only several nonzeros per row in an LP. Large problems are always sparse. Exploiting sparsity in computations leads to huge savings. Exploiting sparsity means mainly avoiding doing useless computations: the computations for which the result is known, as for example multiplications with zero. Paris, January

22 Exploiting sparsity: Example Ax = It requires computing [ ] 2 A.1 +5 A.3 2 A and involves only five multiplications and five additions. We say that this matrix-vector multiplication needs 5 flops. A flop is a floating point operation: x := x+a b. Paris, January

23 General Sparse Systems Single step in Gaussian Elimination [ ] p v A = u A 1 produces the following Schur complement A 1 p 1 uv. Markowitz Pivot Choice Let r i and c i,i = 1,2,...,n be numbers of nonzero entries in row and column i, respectively. he elimination of the pivot a ij needs f ij = (r i 1)(c j 1) flops to be made. his step creates at most f ij new nonzero entries in the Schur complement. Paris, January

24 General Sparse Systems he effect of pivot elimination on the sparsity pattern p x x x x 2 x x x x 3 x x x x 4 x x x 5 x x x 6 x x 7 x x x 8 x x x x pivot : p nonzero : x p x x x x 2 x x x x 3 x x x f f f f 4 x x x 5 x f f x f f 6 x x 7 x x x 8 x f f f f pivot : p nonzero : x fill in : f Paris, January

25 Markowitz Pivot Choice: Example Markowitz: Choose the pivot with min i,j f ij x x x x x 2 x x x x 3 x x x x 4 x x x 5 x x x 6 p x 7 x x x 8 x x x x x x x f x 2 x x x x 3 x x x x 4 x x x f 5 x x x 6 p x 7 x x x 8 x x x x Paris, January

26 Exploiting Sparsity in Cholesky Factorization Matrix H and its Cholesky Factor p x x x x x H = x x x x L = x x x x x x x x x x Reordered Matrix H and its Cholesky Factor x x x PHP x x x = x x L = x x x x x x x x x Paris, January

27 From Sparsity to Block-Sparsity: Sparse Matrix p x x x x x x x x H= x x L= x x x x x PHP x x = x x L= x x x x x x x x x x x x x x x x x P Block-Sparse Matrix L= L= Paris, January

28 Minimum Degree Ordering (MDO) In symmetric positive definite case: pivots are chosen from the diagonal and r i = c i hence choose the pivot with min i r i Minimum degree ordering: choose an element with the minimum number of nonzeros in a row, that is, choose a node with the minimum number of neighbours (a node with the minimum degree) in a graph related to sparsity pattern of the matrix. Paris, January

29 Minimum Degree Ordering (MDO) Sparse Matrix Pivot h 11 Pivot h 22 x x x x p x x x x x x x x x x x p x x x x x x f f x x x x H = x x x x f x f x x x x x x x x x f f x x x x x x x x x x x x x Minimum degree ordering: choose a diagonal element corresponding to a row with the minimum number of nonzeros. Permute rows and columns of H accordingly. MDO is simply the symmetric version of Markowitz pivot rule. Paris, January

30 From Sparsity to Block-Sparsity: Apply minimum degree ordering to (sparse) blocks: Block-Sparse Matrix Pivot Block H 11 Pivot Block H 22 H= P P Choose a diagonal block-pivot corresponding to a block-row with the minimum number of blocks. Permute block-rows and block-columns of H accordingly. Paris, January

31 Nested Dissection: Original Matrix x x x x 2 x x x x x 3 x x x x 4 x x x x x x 5 x x x x x 6 x x x x x 7 x x x x 8 x x x x x 9 x x x x 10 x x x x x x 11 x x x x x Reordered Matrix x x x x 2 x x x x x 3 x x x x 5 x x x x x 6 x x x x x 8 x x x x x 9 x x x x 10 x x x x x x 11 x x x x x 4 x x x x x x 7 x x x x Paris, January

32 Structured Problems Observation: ruly large scale problems are not only sparse... such problems are structured Structure is displayed in: Jacobian matrix A Hessian matrix Q Structure can be exploited in: IPM Algorithm Linear Algebra of IPM (focus of this lecture) Paris, January

33 Primal Block-Angular Structure: Q = [ ], A = [ ] and A = [ ] Reorder blocks: {1,3; 2,4; 5}. H =, PHP = Paris, January

34 Dual Block-Angular Structure: Q =, A = [ ] and A = Reorder blocks: {1,4; 2,5; 3}. H =, PHP = Paris, January

35 Row & Col Bordered Block-Diag Structure: Q =, A = [ ] and A = Reorder blocks: {1,4; 2,5; 3,6}. H =, PHP = Paris, January

36 Example: Bordered Block-Diagonal Structure Φ 1 B Φ n Bn = B 1...B n Φ }{{ 0 } Φ L 1 D = L n L 1,0...L n,0 L }{{ 0 } L D n D0 }{{} D he blocks Φ i, i = 0,1,...,n are KK systems. L 1 L 1,0.... L n L n,0 L 0 } {{ } L Paris, January

37 Example: Bordered Block-Diagonal Structure Cholesky-like factors obtained by Schur-complement: Φ i = L i D i L i L i,0 = B i L i Di 1, i = 1..n C = Φ 0 n i=1 L i,0 D i L i,0 = L 0D 0 L 0 And the system Φx = b is solved by z i = L 1 i b i z 0 = L 1 0 (b 0 L i,0 z i ) y i = D 1 i z i x 0 = L 0 y 0 x i = L i (y i L i,0 x 0) Operations(Cholesky, Solve, Product) performed on sub-blocks Paris, January

38 Abstract Linear Algebra for IPMs Execute the operation solve (reduced) KK system in IPMs for LP, QP and NLP. It works like the backslash operator in MALAB. Assumptions: Q and A are block-structured Paris, January

39 Linear Algebra of IPMs [ Q Θ 1 P A Θ D ] } A {{ } Φ(NLP) [ x y ree representation of matrix A: D 11 B 11 D 1 D 12 B 12 D 2 D 10 D 21 D 22 B B B D D 20 C 31 C 32 D 30 Primal Block Angular Structure Dual Block Angular Structure ] = Primal Block Angular Structure [ f d] D 1 D 2 D 11 D 12 D 10 B11 B12 D D D D B B B A C 31 C 32 D 30 Paris, January

40 Structures of A and Q imply structure of Φ: Q 11 D 11 B 11 Q 12 Q D12 B Q10 Q 21 Q22 Q D 10 D 21 D22 B B B Q 23 D Q20 D 20 Q Q Q 30 C 31 C 32 D 30 D 11 D 12 C 31 ( Q A A 0 B 11 B 12 D 10 ) D 21 D 22 D 23 C 32 B B B 23 D 20 D 30 Q 11 D 11 D 11 B 11 ~ Q 31 ~ C 31 Q 12 D 12 D 12 B 12 D 10 B 11 B12 D B 21 D 21 ~ ~ C ~ ~ 31 C ~ 31 C ~ 32 ~ C ~ ~ Q 32 C ~ ~ 31 Q31 Q32 Q32 Q32 32 Q32 P Q 10 D 10 Q 21 Q D B D 22 Q D B 23 ( Q A A 0 ) D 23 Q D 20 P 1 21 B 22 B B D20 C ~ 32 ~ Q ~ 31 C 31 ~ Q ~ 31 C 31 ~ Q ~ 31 C 31 ~ Q 32 C ~ 32 ~ Q 32 C ~ 32 ~ Q 32 C ~ 32 ~ Q 32 C ~ 32 Q 30 D 30 D 30 Paris, January

41 OOPS: Object-oriented linear algebra for IPM Every node in the block elimination tree has its own linear algebra implementation (depending on its type) Each implementation is a realisation of an abstract linear algebra interface. Different implementations are available for different structures Matrix Interface Matrix Factorize SolveL SolveLt y=mx y=mtx PrimalBlockAng Implicit factorization A i B i BorderedBlockDiag Implicit factorization A i B i C i SparseMatrix General sparse linear algebra DualBlockAng Implicit factorization A i C i RankCorrector Rank corrector implementation D R DenseMatrix General dense linear algebra Rebuild block elimination tree with matrix interface structures Paris, January

42 OOPS: Matrix Revolutions, Matrix Reloaded Paris, January

43 Structured Problems... are present everywhere. Paris, January

44 Sources of Structure Dynamics Staircase structure A A B I A A B I A B I A B I A A B I A A B I A B I A A B I x t+1 = A t x t +B t u t x t+1 =A t+1 t x t +...+A t+1 t p x t p+b t u t Paris, January

45 Sources of Structure Uncertainty Block-angular structure W W W W W W W W W W W i x 1 +W i y i = b i lt x a(lt ) +W l t x lt = b lt Paris, January

46 Sources of Structure Common resource constraint k B i x i = b Dantzig-Wolfe structure i=1 A A A B B B Paris, January

47 Sources of Structure Other types of near-separability Row and column bordered block-diagonal structure A C A C A B B B C D Paris, January

48 Sources of Structure (low) rank-corrector A+VV = C A + V V = C and networks, ODE- or PDE-discretizations, etc. Paris, January

49 Example Applications: financial planning problems (nonlinear risk measures) machine learning (nonlinear kernels in SVMs) Paris, January

50 Financial Planning Problems (ALM) A set of assets J ={1..J} given (bonds, stock, real estate) At every stage t = we can buy or sell different assets he return of asset j at stage t is uncertain Investment decisions: what to buy or sell, at which time stage Objectives: maximize the final wealth minimize the associated risk Mean Variance formulation: max IE(X) ρvar(x) Stochastic Program: formulate deterministic equivalent standard QP, but huge extentions: nonlinear risk measures (log utility, skewness) Paris, January

51 ALM: Largest Problem Attempted Optimization of 21 assets(stock market indices) 7 time stages. Using multistage stochastic programming Scenario tree geometry: M scenarios second level nodes with variables each. Scenario ree generated using geometric Brownian motion billion variables, 353 million constraints 128 nodes 30 nodes 30 nodes Paris, January

52 Sparsity of Linear Algebra = 8127 columns for Schur-complement Prohibitively expensive Need facility to exploit nested structure Need to be careful that Schurcomplement calculations stay sparse on second level Paris, January

53 Results (ALM: Mean-Variance QP formulation): Prob Stgs Asts Scen Rows Cols iter time procs machine ALM M 64M 154M BlueGene ALM M 96M 269M BlueGene ALM M 180M 500M BlueGene ALM M 353M 1.011M HPCx he problem with 353 million of constraints 1 billion of variables was solved in 50 minutes using 1280 procs. Equation systems of dimension billion were solved with the direct (implicit) factorization. One IPM iteration takes less than a minute. Paris, January

54 Support Vector Machines: Formulated as the (dual) quadratic program: min e y y Ky, s.t. d y = 0, 0 y λe. Ferris & Munson, SIOP 13 (2003) Kernel function K(x, z) = φ(x), φ(z), where φ is a (nonlinear) mapping from X to feature space F Matrix K: K ij = K(x i,x j ) Linear Kernel K(x,z) = x z. Polynomial Kernel K(x,z) = (x z +1) d. Gaussian Kernel K(x,z) = e γ x z 2. Paris, January

55 SVMs with Nonlinear Kernels: K is very large and dense! Approximate: K LL or K LL +D Introduce v = L y and get a separable QP: min e y v v y Dy, s.t. d y = 0, v L y = 0, 0 y λe. H = L L Structure can be exploited in: Linear Algebra of IPM Kristian Woodsend: PhD hesis, Edinburgh Paris, January

56 References Gondzio and Sarkissian, Parallel interior point solver for structured linear programs, Math Prog 96 (2003) Gondzio and Grothey, Reoptimization with the primaldual IPM, SIAM J. on Optimization 13 (2003) Gondzio and Grothey, Parallel IPM solver for structured QPs: application to financial planning problems, Annals of Oper Res 152 (2007) Woodsend and Gondzio, Hybrid MPI/OpenMP parallel linear support vector machine training, J. of Machine Learning Research 20 (2009) Papers available: OOPS: Object-Oriented Parallel Solver Paris, January

57 Conclusions: Interior Point Methods are well-suited to Large Scale Optimization Direct Methods are well-suited to structure exploitation Use IPMs in your research! Paris, January

58 Implementation of IPMs Andersen, Gondzio, Mészáros and Xu Implementation of IPMs for large scale LP, in: Interior Point Methods in Mathematical Programming,. erlaky (ed.), Kluwer Academic Publishers, 1996, pp Altman and Gondzio Regularized symmetric indefinite systems in interior point methods for linear and quadratic optimization, Optimization Methods and Software, (1999), pp Recent Survey on IPMs (easy reading) Gondzio Interior point methods 25 years later, European J. of Operational Research 218 (2012) Paris, January

Using interior point methods for optimization in training very large scale Support Vector Machines. Interior Point Methods for QP. Elements of the IPM

School of Mathematics T H E U N I V E R S I T Y O H Using interior point methods for optimization in training very large scale Support Vector Machines F E D I N U R Part : Interior Point Methods for QP