Constraint Reduction for Linear Programs with Many Constraints

André L. Tits, Institute for Systems Research and Department of Electrical and Computer Engineering, University of Maryland, College Park
Pierre-Antoine Absil, School of Computer Science and Information Technology, Florida State University, Tallahassee
William Woessner, Department of Computer Science, University of Maryland, College Park

UMBC, 11 March
Consider the following linear program in dual standard form:

max b^T y subject to A^T y ≤ c,   (1)

where A is m × n. Suppose n ≫ m.

Observation: Typically, only a small subset of the constraints (no more than m under nondegeneracy assumptions) are active at the solution. The others are redundant.

Objective: Compute the search direction from a reduced Newton-KKT system, by adaptively selecting a small subset of critical columns of A.

Hope:
- Significantly reduced cost per iteration.
- No drastic increase in the number of iterations.
- Preserved theoretical convergence properties.
Outline
1. Background
   - Some related work
   - Notation
   - Primal-dual framework
   - Operation count; reduced Newton-KKT system
2. Reduced, Dual-Feasible PD Affine Scaling: µ = 0
   - Algorithm statement
   - Observation
   - Numerical experiments
   - Convergence properties
3. Reduced Mehrotra Predictor-Corrector
   - Algorithm statement
   - Numerical experiments
4. Concluding Remarks
Background: Some related work

- Indicators (to identify early the zero components of x*): El-Bakry et al. [1994], Facchinei et al. [2].
- Column generation, build-up, build-down: Ye [1992], den Hertog et al. [1992, 1994, 199], Goffin et al. [1994], Ye [1997], Luo et al. [1999]. Focus is on complexity analysis; good numerical results on discretized semi-infinite programming problems; but typically many more than m columns of A are retained.

Notation

n := {1, ..., n},  A = [a_1, ..., a_n],  e = [1, ..., 1]^T.
Given Q ⊆ n:
A_Q := col[a_i : i ∈ Q],  X_Q := diag(x_i : i ∈ Q),  S_Q := diag(s_i : i ∈ Q),
x_Q := [x_i : i ∈ Q]^T,  s_Q := [s_i : i ∈ Q]^T.
Background (cont'd): Primal-dual framework

Primal-dual LP pair in standard form:

min c^T x subject to Ax = b, x ≥ 0
max b^T y subject to A^T y + s = c, s ≥ 0.   (2)

Perturbed (µ > 0) KKT conditions of optimality:

A^T y + s − c = 0   (3)
Ax − b = 0   (4)
XSe = µe   (5)
x ≥ 0, s ≥ 0.   (6)

Given µ, the µ-perturbed Newton-KKT system is

[ 0  A^T  I ] [Δx]   [ −r_c     ]
[ A   0   0 ] [Δy] = [ −r_b     ]
[ S   0   X ] [Δs]   [ −Xs + µe ]

with r_b := Ax − b (primal residue) and r_c := A^T y + s − c (dual residue).
Background (cont'd): Primal-dual framework (cont'd)

Equivalently, (Δx, Δy, Δs) satisfy the normal equations:

A S^{-1} X A^T Δy = −r_b + A(−S^{-1} X r_c + x − µ S^{-1} e)
Δs = −A^T Δy − r_c
Δx = −x + µ S^{-1} e − S^{-1} X Δs.

Simple interior-point iteration: given x > 0, s > 0, y,
- Select a value for µ (≥ 0);
- Solve the Newton-KKT system for Δx, Δy, Δs;
- Set x+ := x + α_P Δx > 0, s+ := s + α_D Δs > 0, y+ := y + α_D Δy, with appropriate α_P, α_D (possibly forced to be equal).

Note: If (y, s) is dual feasible, then (y+, s+) also is.
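As a sketch, the normal-equation computation above can be written in NumPy. This is an illustrative helper, not the authors' code; the function name is mine, and the variables follow the slide's notation.

```python
import numpy as np

def newton_kkt_step(A, b, c, x, y, s, mu):
    """One mu-perturbed Newton-KKT step via the normal equations.

    Illustrative sketch following the slide's formulas; assumes
    x > 0 and s > 0 componentwise."""
    rb = A @ x - b                       # primal residue r_b
    rc = A.T @ y + s - c                 # dual residue r_c
    d = x / s                            # diagonal of S^{-1} X
    G = (A * d) @ A.T                    # A S^{-1} X A^T
    v = -rb + A @ (x - d * rc - mu / s)  # -r_b + A(-S^{-1}X r_c + x - mu S^{-1} e)
    dy = np.linalg.solve(G, v)
    ds = -A.T @ dy - rc
    dx = -x + mu / s - d * ds
    return dx, dy, ds
```

The returned triple satisfies the three blocks of the Newton-KKT system (first block exactly by construction, the other two by the elimination that produced the normal equations).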
Background (cont'd): Operation count; reduced Newton-KKT system

Operation count (for a dense problem):
1. Forming G := A S^{-1} X A^T: m²n;
2. Forming v := −r_b + A(x − S^{-1}(X r_c + µe)): 2mn;
3. Solving G Δy = v: m³/3 (Cholesky);
4. Computing Δs = −A^T Δy − r_c: 2mn;
5. Computing Δx = −x + S^{-1}(−X Δs + µe): 2n.

Benefit of replacing A with A_Q: n is replaced with |Q|. Assume n ≫ m and m ≫ 1. Then the main gain can be achieved in line 1, i.e., by merely redefining G := A_Q S_Q^{-1} X_Q A_Q^T and leaving the rest unchanged. This is done in the sequel.

Key question: how to select Q so as to
- significantly reduce the work per iteration (|Q| small);
- avoid a dramatic increase in the number of iterations;
- preserve theoretical convergence properties.
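The dominant-cost step, forming G from a subset Q of columns, can be sketched as follows (an illustrative helper of my own, using the "M smallest slacks" heuristic discussed later in the talk): the m²n cost of line 1 drops to m²|Q|, while G remains m × m.

```python
import numpy as np

def reduced_normal_matrix(A, x, s, M):
    """Form G_Q = A_Q S_Q^{-1} X_Q A_Q^T from the M columns of A
    with the smallest slacks s_i.  Illustrative sketch: cost is
    O(m^2 |Q|) instead of O(m^2 n) for the full product."""
    Q = np.argsort(s)[:M]       # indices of the M smallest entries of s
    AQ = A[:, Q]
    dQ = x[Q] / s[Q]            # diagonal of S_Q^{-1} X_Q
    return (AQ * dQ) @ AQ.T, Q
```

With M = n this reproduces the full matrix G, so the rest of the iteration can be left unchanged, as the slide notes.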
Reduced, Dual-Feasible PD Affine Scaling: µ = 0 — Algorithm statement

Iteration rPDAS.
Parameters: β ∈ (0, 1), x_max > x_min > 0, M ≥ m.
Data: y, with A^T y < c; s := c − A^T y (> 0) (i.e., r_c = 0); x > 0; Q ⊆ n, including the indices of the M smallest entries of s.

Step 1. Compute search direction. Solve

A_Q S_Q^{-1} X_Q A_Q^T Δy = b

and compute

Δs = −A^T Δy
Δx = −x − S^{-1} X Δs.

Step 2. Updates.
(i) Primal update. Set

x_i^+ := min{ max{ min{‖Δy‖² + ‖Δx⁻‖², x_min}, x_i + Δx_i }, x_max },  i ∈ n,   (8)

where (Δx⁻)_i := min{Δx_i, 0}.

(ii) Dual update. Set

t_D := ∞ if Δs_i ≥ 0 for all i ∈ n,
t_D := min{ −s_i/Δs_i : Δs_i < 0, i ∈ n } otherwise.   (9)

Set t̂_D := min{ max{β t_D, t_D − ‖Δy‖}, 1 }. Set y+ := y + t̂_D Δy, s+ := s + t̂_D Δs. (So r_c remains at 0.)
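The iteration above can be sketched in NumPy. This is my illustrative reading of the slide, not the authors' implementation; the function name and default parameter values (other than β, stated later as 0.99) are assumptions.

```python
import numpy as np

def rpdas_iteration(A, b, x, y, s, M, beta=0.99, x_min=1e-6, x_max=1e6):
    """One iteration of the reduced dual-feasible PD affine-scaling
    scheme (rPDAS) as read from the slide; an illustrative sketch.
    Assumes s = c - A^T y > 0 and x > 0.  x_min, x_max defaults are
    my own assumptions."""
    Q = np.argsort(s)[:M]            # indices of the M smallest slacks
    AQ = A[:, Q]
    G = (AQ * (x[Q] / s[Q])) @ AQ.T  # A_Q S_Q^{-1} X_Q A_Q^T
    dy = np.linalg.solve(G, b)       # search direction (r_c = 0, mu = 0)
    ds = -A.T @ dy
    dx = -x - (x / s) * ds
    # (i) primal update (8): clip x + dx into [lb, x_max], with
    # lb = min{||dy||^2 + ||dx^-||^2, x_min} keeping iterates positive
    dxm = np.minimum(dx, 0.0)
    lb = min(dy @ dy + dxm @ dxm, x_min)
    x_new = np.minimum(np.maximum(lb, x + dx), x_max)
    # (ii) dual update (9): damped step toward the boundary
    neg = ds < 0
    tD = np.min(-s[neg] / ds[neg]) if neg.any() else np.inf
    tD_hat = min(max(beta * tD, tD - np.linalg.norm(dy)), 1.0)
    return x_new, y + tD_hat * dy, s + tD_hat * ds
```

Since Δs = −A^T Δy and s = c − A^T y, the new pair (y+, s+) remains exactly dual feasible, as the slide notes.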
Reduced, Dual-Feasible PD Affine Scaling: µ = 0 — Observation

(Δx_Q, Δy, Δs_Q) constructed by iteration rPDAS also satisfy

Δs_Q = −A_Q^T Δy   (10a)
Δx_Q = −x_Q − S_Q^{-1} X_Q Δs_Q,   (10b)

i.e., they satisfy the full set of normal equations associated with the constraint-reduced system. Equivalently, they satisfy the Newton system (with µ = 0 and r_c = 0)

[ 0    A_Q^T  I   ] [Δx_Q]   [ 0           ]
[ A_Q  0      0   ] [Δy  ] = [ b − A_Q x_Q ]
[ S_Q  0      X_Q ] [Δs_Q]   [ −X_Q s_Q    ]

(This is a key ingredient of the local convergence analysis.)
Reduced, Dual-Feasible PD Affine Scaling: µ = 0 — Numerical experiments

Heuristic used for Q: for given M ≥ m, Q := indices of the M smallest components of s.
Parameter value: β = 0.99.
Selection of x0: based on Mehrotra's [SIOPT, 1992] scheme.

Test problems (with dual-feasible initial point):
- Polytopic approximation of unit sphere: entries of b ~ N(0, 1); columns of A uniformly distributed on the unit sphere; components of y0 and s0 uniformly distributed on (0, 1); c := A^T y0 + s0 to ensure dual feasibility.
- Fully random problem: entries of A and b ~ N(0, 1); y0, s0, and c generated as above.
- SCSD1, SCSD6, SHIP4L, and WOODW from Netlib.
- SIPOW1, SIPOW2 (semi-infinite) from CUTE.
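The "polytopic approximation of the unit sphere" generator described above can be sketched as follows. The helper name and the RNG seeding are my assumptions; the distributions are those stated on the slide.

```python
import numpy as np

def make_dual_feasible_problem(m, n, seed=0):
    """Random dual-feasible test problem as described on the slide:
    columns of A uniform on the unit sphere, b ~ N(0,1), y0 and s0
    uniform on (0,1), and c := A^T y0 + s0 so (y0, s0) is strictly
    dual feasible.  Illustrative sketch."""
    rng = np.random.default_rng(seed)
    A = rng.standard_normal((m, n))
    A /= np.linalg.norm(A, axis=0)   # normalized Gaussians: uniform on sphere
    b = rng.standard_normal(m)
    y0 = rng.uniform(0, 1, m)
    s0 = rng.uniform(0, 1, n)
    c = A.T @ y0 + s0                # ensures A^T y0 < c componentwise
    return A, b, c, y0, s0
```

The "fully random problem" variant simply skips the column normalization.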
Reduced, Dual-Feasible PD Affine Scaling: µ = 0 — Numerical experiments (cont'd)

The points on the plots correspond to different runs of Algorithm rPDAS on the same problem. The runs differ only in the number M of constraints retained in Q; this is indicated on the horizontal axis in relative value (|Q|/n). The rightmost point thus corresponds to the experiment without constraint reduction, while the points on the extreme left correspond to the most drastic constraint reduction.

Observations:
- In most cases, surprisingly, the number of iterations does NOT increase as M is reduced. Thus any relative gain in work per iteration directly translates into the same relative gain in overall work.
- Displayed values are purely indicative. They depend strongly on the implementation (in particular, on how the product A_Q S_Q^{-1} X_Q A_Q^T is computed) and on the possible sparsity of the data.
- The algorithm sometimes fails for small |Q|. This is due to A_Q losing rank, and accordingly A_Q S_Q^{-1} X_Q A_Q^T becoming singular. (Note that this will almost surely not happen when A is generated randomly.) Schemes to bypass this difficulty are being investigated.
[Plot] rPDAS on polytopic approximation of unit sphere; m = 32, n = 8192 (smallest |Q| displayed: 64).
[Plot] rPDAS on fully random problem; m = 32, n = 8192 (smallest |Q| displayed: 64).
[Plot] rPDAS on SCSD1: m = 77, n = 760 (smallest |Q| displayed: 14).
[Plot] rPDAS on SCSD6: m = 147, n = 1350 (smallest |Q| displayed: 294).
[Plot] rPDAS on SHIP4L: m = 36, n = 2162 (smallest |Q| displayed: 1679).
[Plot] rPDAS on WOODW: m = 1098, n = 8418 (smallest |Q| displayed: 8418).
[Plot] rPDAS on SIPOW1: m = 2, n = 10000 (smallest |Q| displayed: 12).
[Plot] rPDAS on SIPOW2: m = 2, n = 10000.
Reduced, Dual-Feasible PD Affine Scaling: µ = 0 — Convergence properties

Let F := {y : A^T y ≤ c}. For y ∈ F, let I(y) := {i : a_i^T y = c_i}.

Assumption 1. All m × M submatrices of A have full (row) rank.
Assumption 2. The dual (y) solution set is nonempty and bounded.
Assumption 3. For all y ∈ F, the set {a_i : i ∈ I(y)} is linearly independent.

Theorem. {y_k} converges to the dual solution set.

Assumption 4. The dual solution set is a singleton, say {y*}, and the associated KKT multiplier x* satisfies x_i* < x_max for all i.

Theorem. {(x_k, y_k)} converges to (x*, y*) Q-quadratically.

The global convergence analysis focuses on the monotone increase of the dual objective function b^T y. The lower bound ‖Δy‖² + ‖Δx⁻‖² in the primal update formula (8) is essential, as it keeps the Newton-KKT matrix away from singularity as long as KKT points are not approached. (A step along the primal direction Δx would not allow for this.)
Reduced Mehrotra Predictor-Corrector — Algorithm statement

Iteration rMPC.
Parameters: β ∈ (0, 1), M ≥ m.
Data: y, with A^T y < c; s := c − A^T y; x > 0; µ := x^T s / n; Q ⊆ n, including the indices of the M smallest components of s.

Step 1. Compute affine scaling step. Solve

A_Q S_Q^{-1} X_Q A_Q^T Δy = −r_b + A(−S^{-1} X r_c + x)

and compute

Δs = −A^T Δy − r_c
Δx = −x − S^{-1} X Δs
t_P^aff := arg max{ t ∈ [0, 1] : x + t Δx ≥ 0 }
t_D^aff := arg max{ t ∈ [0, 1] : s + t Δs ≥ 0 }.

Step 2. Compute centering parameter.

µ_aff := (x + t_P^aff Δx)^T (s + t_D^aff Δs) / n
σ := (µ_aff / µ)³.

Step 3. Compute centering/corrector direction. Solve

A_Q S_Q^{-1} X_Q A_Q^T Δy^cc = −A S^{-1}(σµe − ΔX Δs)

and compute

Δs^cc = −A^T Δy^cc
Δx^cc = S^{-1}(σµe − ΔX Δs) − S^{-1} X Δs^cc.
Step 4. Compute MPC step.

Δx^mpc := Δx + Δx^cc
Δy^mpc := Δy + Δy^cc
Δs^mpc := Δs + Δs^cc
t_P^max := arg max{ t ∈ [0, 1] : x + t Δx^mpc ≥ 0 }
t_D^max := arg max{ t ∈ [0, 1] : s + t Δs^mpc ≥ 0 }
t_P := min{β t_P^max, 1}
t_D := min{β t_D^max, 1}.

Step 5. Updates.

x+ := x + t_P Δx^mpc
y+ := y + t_D Δy^mpc
s+ := s + t_D Δs^mpc.

Numerical experiments with dual-feasible initial point: Algorithm rMPC was run on the same problems as rPDAS, with the same (dual-feasible) initial points. The results are reported in the next few slides.
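Steps 1 through 5 can be sketched in NumPy as follows. This is my illustrative reading of the slides, with hypothetical helper names; it is not the authors' implementation.

```python
import numpy as np

def step_to_boundary(z, dz):
    """Largest t in [0, 1] with z + t*dz >= 0, assuming z > 0."""
    neg = dz < 0
    return min(1.0, np.min(-z[neg] / dz[neg])) if neg.any() else 1.0

def rmpc_iteration(A, b, c, x, y, s, M, beta=0.99):
    """One iteration of the reduced Mehrotra predictor-corrector
    (rMPC) as sketched on the slides; an illustrative reading.
    Assumes x > 0 and s > 0."""
    n = A.shape[1]
    mu = x @ s / n
    rb = A @ x - b
    rc = A.T @ y + s - c
    d = x / s
    Q = np.argsort(s)[:M]                 # M smallest slacks
    GQ = (A[:, Q] * d[Q]) @ A[:, Q].T     # A_Q S_Q^{-1} X_Q A_Q^T
    # Step 1: affine-scaling (predictor) direction
    dy = np.linalg.solve(GQ, -rb + A @ (-d * rc + x))
    ds = -A.T @ dy - rc
    dx = -x - d * ds
    t_aff_P = step_to_boundary(x, dx)
    t_aff_D = step_to_boundary(s, ds)
    # Step 2: centering parameter
    mu_aff = (x + t_aff_P * dx) @ (s + t_aff_D * ds) / n
    sigma = (mu_aff / mu) ** 3
    # Step 3: centering/corrector direction
    rhs = (sigma * mu - dx * ds) / s      # S^{-1}(sigma mu e - dX dS e)
    dy_cc = np.linalg.solve(GQ, -A @ rhs)
    ds_cc = -A.T @ dy_cc
    dx_cc = rhs - d * ds_cc
    # Steps 4-5: combine, damp with beta, update
    dx_m, dy_m, ds_m = dx + dx_cc, dy + dy_cc, ds + ds_cc
    tP = min(beta * step_to_boundary(x, dx_m), 1.0)
    tD = min(beta * step_to_boundary(s, ds_m), 1.0)
    return x + tP * dx_m, y + tD * dy_m, s + tD * ds_m
```

Note that only the two solves use the reduced matrix G_Q; as in rPDAS, the rest of the iteration is unchanged from standard MPC.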
[Plot] Dual-feasible rMPC on polytopic approximation of unit sphere; m = 32, n = 8192 (smallest |Q| displayed: 968).
[Plot] Dual-feasible rMPC on fully random problem; m = 32, n = 8192 (smallest |Q| displayed: 64).
[Plot] Dual-feasible rMPC on SCSD1: m = 77, n = 760 (smallest |Q| displayed: 14).
[Plot] Dual-feasible rMPC on SCSD6: m = 147, n = 1350 (smallest |Q| displayed: 294).
[Plot] Dual-feasible rMPC on SHIP4L: m = 36, n = 2162 (smallest |Q| displayed: 2162).
[Plot] Dual-feasible rMPC on WOODW: m = 1098, n = 8418 (smallest |Q| displayed: 8418).
[Plot] Dual-feasible rMPC on SIPOW1: m = 2, n = 10000 (smallest |Q| displayed: 12).
[Plot] Dual-feasible rMPC on SIPOW2: m = 2, n = 10000 (smallest |Q| displayed: 12).
Reduced Mehrotra Predictor-Corrector (cont'd): Numerical experiments with infeasible initial point

The next few slides report results obtained on the same problems, but with the (usually infeasible) initial point recommended by Mehrotra [SIOPT, 1992].
[Plot] (Infeasible) rMPC on polytopic approximation of unit sphere; m = 32, n = 8192 (smallest |Q| displayed: 968).
[Plot] (Infeasible) rMPC on fully random problem; m = 32, n = 8192 (smallest |Q| displayed: 968).
[Plot] (Infeasible) rMPC on SCSD1: m = 77, n = 760 (smallest |Q| displayed: 14).
[Plot] (Infeasible) rMPC on SCSD6: m = 147, n = 1350 (smallest |Q| displayed: 294).
[Plot] (Infeasible) rMPC on SHIP4L: m = 36, n = 2162 (smallest |Q| displayed: 2162).
[Plot] (Infeasible) rMPC on WOODW: m = 1098, n = 8418 (smallest |Q| displayed: 736).
[Plot] (Infeasible) rMPC on SIPOW1: m = 2, n = 10000 (smallest |Q| displayed: 2223).
[Plot] (Infeasible) rMPC on SIPOW2: m = 2, n = 10000 (smallest |Q| displayed: 12).
Concluding Remarks

- Reduced versions of a primal-dual affine scaling algorithm (rPDAS) and of Mehrotra's predictor-corrector algorithm (rMPC) were proposed.
- When n ≫ m and m ≫ 1, for both rPDAS and rMPC, a major reduction in work per iteration can be achieved.
- Under nondegeneracy assumptions, rPDAS is proved to converge quadratically in the primal-dual space; a convergence proof for rMPC is lacking at this time.
- Numerical experiments show that the number of iterations to convergence remains essentially constant as |Q| decreases, down to a small multiple of m.
- On some problems (e.g., SCSD6), when |Q| is reduced below a certain value, the algorithm fails due to A_Q losing rank. Schemes to bypass this difficulty are being investigated.

This presentation can be downloaded from http://www.isr.umd.edu/~andre/umbc.pdf
The full paper should be completed by April 2.