A Constraint-Reduced MPC Algorithm for Convex Quadratic Programming, with a Modified Active-Set Identification Scheme

M. Paul Laiu¹ and (presenter) André L. Tits²
¹ Oak Ridge National Laboratory, laiump@ornl.gov
² Department of ECE and ISR, University of Maryland, College Park, andre@umd.edu

ISMP, Bordeaux, July 16, 2018
Outline

- Mehrotra's Predictor/Corrector (MPC) for CQP
- Constraint-Reduced MPC for CQP
- Constraint Selection
- Convergence Theorem
- Numerical Results
- Conclusions
Convex Quadratic Program (CQP)

(P)  minimize_{x ∈ Rⁿ}  f(x) := (1/2) xᵀHx + cᵀx   subject to  Ax ≥ b

(D)  maximize_{x ∈ Rⁿ, λ ∈ Rᵐ}  −(1/2) xᵀHx + bᵀλ   subject to  Hx + c − Aᵀλ = 0,  λ ≥ 0

Here x ∈ Rⁿ, c ∈ Rⁿ, H = Hᵀ ⪰ 0, λ ∈ Rᵐ, A ∈ Rᵐˣⁿ, b ∈ Rᵐ.

We are mostly interested in the case when most of the primal inequality constraints are inactive at the solution (e.g., m ≫ n).

(x, λ) solves (P)-(D) iff it satisfies the KKT system
  Hx − Aᵀλ + c = 0
  Ax − b − s = 0
  Sλ = 0
  s, λ ≥ 0,
where S := diag(s).
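The KKT conditions on this slide are easy to check numerically. A minimal sketch in NumPy, on a hypothetical toy instance (the bound constraints x ≥ 0 and the point (x, λ) below are illustrative, not from the talk):

```python
import numpy as np

# Hypothetical toy CQP: f(x) = x1^2 + x2^2 - 2*x1 - 2*x2, subject to x >= 0.
H = np.array([[2.0, 0.0], [0.0, 2.0]])   # H = H^T, positive definite
c = np.array([-2.0, -2.0])
A = np.eye(2)                            # Ax >= b encodes x >= 0
b = np.zeros(2)

def kkt_residuals(x, lam):
    """Residuals of the KKT system: stationarity and complementarity."""
    s = A @ x - b                          # slack s = Ax - b
    r_stat = H @ x - A.T @ lam + c         # Hx - A^T lambda + c
    r_comp = s * lam                       # S lambda (componentwise)
    return r_stat, r_comp, s

# The unconstrained minimizer x* = (1, 1) is interior, so lambda* = 0 satisfies KKT.
r_stat, r_comp, s = kkt_residuals(np.array([1.0, 1.0]), np.zeros(2))
```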
MPC for CQP

Given (x, λ), with s := Ax − b > 0 and λ > 0:

Compute the Newton (affine-scaling) search direction by solving

  [ H  −Aᵀ   0 ] [Δxᵃ]      [ Hx − Aᵀλ + c ]
  [ A   0   −I ] [Δλᵃ] = −  [ 0 ]
  [ 0   S    Λ ] [Δsᵃ]      [ Sλ ]

where Λ := diag(λ).

Set μ := sᵀλ/m and σ := (1 − αᵃ)³, with
  αᵃ := argmax{ α ∈ [0,1] : s + αΔsᵃ ≥ 0, λ + αΔλᵃ ≥ 0 }.

Compute a centering/corrector direction by solving

  [ H  −Aᵀ   0 ] [Δxᶜ]   [ 0 ]
  [ A   0   −I ] [Δλᶜ] = [ 0 ]
  [ 0   S    Λ ] [Δsᶜ]   [ σμ1 − ΔSᵃΔλᵃ ]

Update
  (x⁺, λ⁺) = (x, λ) + (α_p (Δxᵃ + Δxᶜ), α_d (Δλᵃ + Δλᶜ)),
with α_p, α_d ∈ (0, 1] such that s⁺ := Ax⁺ − b > 0 and λ⁺ > 0.
Toward Constraint Reduction

By block Gaussian elimination, the Newton system becomes
  M Δxᵃ = −(Hx + c),   Δsᵃ = A Δxᵃ,   Δλᵃ = −λ − S⁻¹Λ Δsᵃ,
where
  M := H + AᵀS⁻¹ΛA = H + Σ_{i=1}^{m} (λᵢ/sᵢ) aᵢaᵢᵀ,
and similarly (same M) for the computation of the centering/corrector direction.

When m > n, the main cost in computing the search direction is in forming M: approximately mn²/2 multiplications per iteration.

If m ≫ n and we limit the sum to q appropriately selected terms, the cost per iteration is reduced by a factor of m/q (!)
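The block elimination on this slide can be verified numerically: form M, solve for Δxᵃ, recover Δsᵃ and Δλᵃ, and check that the full Newton system holds. A sketch with random data (the sizes, seed, and strictly feasible starting point are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 5, 200                           # m >> n: many inequality constraints
H = np.eye(n)                           # illustrative H, positive definite
A = rng.standard_normal((m, n))
c = rng.standard_normal(n)
x = rng.standard_normal(n)
lam = rng.uniform(1.0, 2.0, m)
b = A @ x - rng.uniform(1.0, 2.0, m)    # ensures s = Ax - b > 0
s = A @ x - b

# Normal matrix M = H + sum_i (lam_i/s_i) a_i a_i^T; forming it costs ~ m n^2 / 2.
M = H + (A.T * (lam / s)) @ A

dx_a = np.linalg.solve(M, -(H @ x + c))   # M dx_a = -(Hx + c)
ds_a = A @ dx_a                           # ds_a = A dx_a
dlam_a = -lam - (lam / s) * ds_a          # dlam_a = -lam - S^{-1} Lambda ds_a
```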
Constraint Reduction (CR) for CQP: Basic Ideas

Observation: For CQPs with m ≫ n, it is typical that most of the constraints are irrelevant or redundant.

Idea: At each iteration, try to guess a small subset Q of constraints with which to compute the search direction.

[Figure: a two-dimensional example (n = 2, m = 13) illustrating irrelevant, redundant, and active constraints at the solution x*; many constraints are ignored, |Q| = 6.]
CR for CQP: Reduced Normal Matrix

At each iteration, select a working set Q of constraints and obtain an MPC search direction, from the current (xᵏ, λᵏ), for the problem
  minimize_{x ∈ Rⁿ} (1/2) xᵀHx + cᵀx   subject to  A_Q x ≥ b_Q,
with A_Q a submatrix of A and b_Q a subvector of b.

Normal matrix for the reduced problem:
  M⁽Q⁾ := H + A_Qᵀ S_Q⁻¹ Λ_Q A_Q = H + Σ_{i ∈ Q} (λᵢ/sᵢ) aᵢaᵢᵀ.

Cost of forming M⁽Q⁾ drops from mn²/2 to |Q|n²/2 multiplications.

Step sizes are still computed using all m constraints. (But the CPU cost is small compared to that of computing the search direction.)
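The reduced normal matrix amounts to keeping only the rank-one terms indexed by Q. A sketch (sizes and the smallest-slack selection shown here are illustrative assumptions; it is only one possible way to pick Q):

```python
import numpy as np

rng = np.random.default_rng(1)
n, m, q = 4, 1000, 12
H = np.eye(n)                             # illustrative positive-definite H
A = rng.standard_normal((m, n))
lam = rng.uniform(0.5, 1.5, m)
s = rng.uniform(0.5, 1.5, m)

# Full normal matrix: sum over all m rank-one terms (~ m n^2 / 2 work).
M_full = H + (A.T * (lam / s)) @ A

# Reduced normal matrix: keep only the q terms indexed by the working set Q
# (~ q n^2 / 2 work). Here Q is picked as the q smallest slacks, for illustration.
Q = np.argsort(s)[:q]
M_Q = H + (A[Q].T * (lam[Q] / s[Q])) @ A[Q]
```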
CR-MPC for CQP

At each iteration: Given (x, λ), with s := Ax − b > 0 and λ > 0, select working set Q.

Compute the Newton (affine-scaling) search direction by solving
  M⁽Q⁾ Δxᵃ = −(Hx + c),
and set Δsᵃ = A Δxᵃ, Δλᵃ_Q = −λ_Q − S_Q⁻¹Λ_Q Δsᵃ_Q.

Set μ⁽Q⁾ := s_Qᵀλ_Q/|Q| and σ := (1 − αᵃ)³, with
  αᵃ := argmax{ α ∈ [0,1] : s + αΔsᵃ ≥ 0, λ + αΔλᵃ ≥ 0 }.

Compute the centering/corrector direction by solving
  M⁽Q⁾ Δxᶜ = A_Qᵀ S_Q⁻¹ (σμ⁽Q⁾1 − ΔSᵃ_Q Δλᵃ_Q),
and set Δsᶜ = A Δxᶜ, Δλᶜ_Q = S_Q⁻¹ (−Λ_Q Δsᶜ_Q + σμ⁽Q⁾1 − ΔSᵃ_Q Δλᵃ_Q).

Set the mixing parameter γ ∈ (0, 1] (see next slide) and
  (Δx, Δλ_Q) = (Δxᵃ, Δλᵃ_Q) + γ (Δxᶜ, Δλᶜ_Q).

Update. With α_p, α_d ∈ (0, 1] such that s⁺ > 0 and λ⁺_Q > 0:
  (x⁺, λ⁺_Q) = (x, λ_Q) + (α_p Δx, α_d Δλ_Q),
  s⁺ = Ax⁺ − b,
  λ⁺ᵢ = ((s⁺_Q)ᵀλ⁺_Q/|Q|)/s⁺ᵢ,  i ∉ Q.
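One full CR-MPC iteration can be sketched as follows. This is a simplified illustration, not the paper's algorithm: the mixing parameter γ is fixed at 1, the convergence safeguards and regularization are omitted, and the 0.99 step-size damping is an assumption.

```python
import numpy as np

def cr_mpc_step(H, c, A, b, x, lam, Q):
    """One CR-MPC iteration (sketch: gamma = 1, no safeguards/regularization)."""
    m = A.shape[0]
    in_Q = np.zeros(m, dtype=bool); in_Q[Q] = True
    s = A @ x - b
    AQ, sQ, lQ = A[Q], s[Q], lam[Q]
    MQ = H + (AQ.T * (lQ / sQ)) @ AQ          # reduced normal matrix M^(Q)

    def max_step(v, dv):                      # largest alpha in [0,1] with v + alpha*dv >= 0
        neg = dv < 0
        return min(1.0, np.min(-v[neg] / dv[neg])) if neg.any() else 1.0

    # Affine-scaling direction
    dx_a = np.linalg.solve(MQ, -(H @ x + c))
    ds_a = A @ dx_a
    dl_a = np.zeros(m)
    dl_a[Q] = -lQ - (lQ / sQ) * ds_a[Q]

    mu_Q = sQ @ lQ / len(Q)
    alpha_a = min(max_step(s, ds_a), max_step(lam, dl_a))
    sigma = (1.0 - alpha_a) ** 3

    # Centering/corrector direction
    dx_c = np.linalg.solve(MQ, AQ.T @ ((sigma * mu_Q - ds_a[Q] * dl_a[Q]) / sQ))
    ds_c = A @ dx_c
    dl_c = np.zeros(m)
    dl_c[Q] = (-lQ * ds_c[Q] + sigma * mu_Q - ds_a[Q] * dl_a[Q]) / sQ

    dx, dl = dx_a + dx_c, dl_a + dl_c
    alpha_p = 0.99 * max_step(s, A @ dx)      # keep s+ > 0
    alpha_d = 0.99 * max_step(lam, dl)        # keep lambda+ > 0

    x_new = x + alpha_p * dx
    s_new = A @ x_new - b
    lam_new = lam + alpha_d * dl
    # For i not in Q, reset lambda_i from the new duality measure over Q
    mu_new = s_new[Q] @ lam_new[Q] / len(Q)
    lam_new[~in_Q] = mu_new / s_new[~in_Q]
    return x_new, lam_new

# Hypothetical toy problem: H = I, c = (-2,-2), constraints x >= 0, all in Q.
H = np.eye(2); c = np.array([-2.0, -2.0]); A = np.eye(2); b = np.zeros(2)
x1, lam1 = cr_mpc_step(H, c, A, b, np.array([0.5, 0.5]), np.ones(2), np.array([0, 1]))
```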
CR-MPC for CQP with Convergence Safeguards

The CR-MPC search direction is given by
  (Δx, Δλ) = (Δxᵃ, Δλᵃ) + γ (Δxᶜ, Δλᶜ),
where the mixing parameter γ
- guarantees that Δx is a descent direction for the primal objective f (indeed, in the CR context, this is critical to convergence);
- limits the effect of a too-large Δxᶜ:
  γ := min{ γ₁, τ‖Δxᵃ‖/‖Δxᶜ‖, τ‖Δxᵃ‖/(σμ) },
with
  γ₁ := argmax_{γ ∈ [0,1]} { f(x) − f(x + Δxᵃ + γΔxᶜ) ≥ ζ (f(x) − f(x + Δxᵃ)) },
where f is the primal objective function and τ, ζ ∈ (0, 1).
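A rough sketch of how such a mixing parameter might be computed. The backtracking loop only approximates the argmax defining γ₁, and the parameter values and the exact form of the capping terms are assumptions, not the paper's implementation:

```python
import numpy as np

def mixing_gamma(f, x, dx_a, dx_c, sigma_mu, tau=0.5, zeta=0.5):
    """Sketch of the mixing parameter gamma (gamma1 found by backtracking)."""
    # Descent condition: f(x) - f(x + dx_a + g*dx_c) >= zeta * (f(x) - f(x + dx_a))
    margin = lambda g: f(x) - f(x + dx_a + g * dx_c) - zeta * (f(x) - f(x + dx_a))
    gamma1 = 1.0
    while gamma1 > 1e-12 and margin(gamma1) < 0:
        gamma1 *= 0.5                      # backtrack until the condition holds
    caps = [gamma1]
    nx_a, nx_c = np.linalg.norm(dx_a), np.linalg.norm(dx_c)
    if nx_c > 0:
        caps.append(tau * nx_a / nx_c)     # limit a too-large corrector
    if sigma_mu > 0:
        caps.append(tau * nx_a / sigma_mu)
    return min(1.0, min(caps))

# Toy quadratic: the affine step is a strong descent step, the corrector is small.
f = lambda z: 0.5 * z @ z
g = mixing_gamma(f, np.array([2.0, 0.0]), np.array([-1.0, 0.0]),
                 np.array([0.1, 0.0]), sigma_mu=0.01)
```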
Regularized Normal Matrix

M is invertible iff [Hᵀ Aᵀ]ᵀ has full (column) rank, which can be guaranteed by pre-processing. HOWEVER, nonsingularity of M⁽Q⁾ (and indeed unique solvability of the reduced linear systems) requires that the reduced matrix [Hᵀ A_Qᵀ]ᵀ have full (numerical) rank, which is far from guaranteed.

Regularization: replace M⁽Q⁾ with
  M̃⁽Q⁾ := W + A_Qᵀ S_Q⁻¹ Λ_Q A_Q,  where W := H + ϱR, R ≻ 0,
and let ϱ > 0 go to zero as a solution of the optimization problem is approached. The choice ϱ := min{1, E(x, λ)/Ē} and R = I turns out to be adequate. Here E(x, λ) is a measure of the distance to optimality:
  E(x, λ) := ‖(v(x, λ), w(x, λ))‖,
where
  v(x, λ) := Hx + c − Aᵀλ,  wᵢ(x, λ) := min{sᵢ, λᵢ},  i = 1, …, m.
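The error measure and the regularized reduced normal matrix can be sketched directly from these definitions (the toy data and the choice Ē = 1 are illustrative assumptions):

```python
import numpy as np

def error_measure(H, c, A, b, x, lam):
    """E(x, lam) = ||(v, w)|| with v = Hx + c - A^T lam, w_i = min(s_i, lam_i)."""
    s = A @ x - b
    v = H @ x + c - A.T @ lam
    w = np.minimum(s, lam)
    return np.linalg.norm(np.concatenate([v, w]))

def regularized_normal_matrix(H, A, s, lam, Q, E, E_bar=1.0):
    """M~(Q) = W + A_Q^T S_Q^{-1} Lambda_Q A_Q, with W = H + rho*I, rho = min(1, E/E_bar)."""
    rho = min(1.0, E / E_bar)
    W = H + rho * np.eye(H.shape[0])
    AQ = A[Q]
    return W + (AQ.T * (lam[Q] / s[Q])) @ AQ

# Toy instance where (x, lam) = ((1,1), 0) is optimal, so E = 0 and rho = 0.
H = np.eye(2); c = np.array([-1.0, -1.0]); A = np.eye(2); b = np.zeros(2)
x = np.array([1.0, 1.0]); lam = np.zeros(2)
E0 = error_measure(H, c, A, b, x, lam)
M_reg = regularized_normal_matrix(H, A, A @ x - b, lam, np.array([0]), E0)
E1 = error_measure(H, c, A, b, np.zeros(2), lam)   # non-optimal point: E > 0
```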
How to Select Q?

Various constraint-selection rules have been used in the past. We propose a CONDITION on the selection rule that guarantees convergence of the regularized CR-MPC algorithm. This condition is met by all existing selection rules we are aware of.

Condition (CSR). The constraint-selection rule should be such that:
1. when {(xᵏ, λᵏ)} is bounded away from optimality, Qᵏ eventually includes every active (primal) constraint at each limit point x̄ of {xᵏ} as that limit point is approached;
2. when {xᵏ} converges to a primal solution point x*, Qᵏ eventually includes every constraint active at x*.
Some Previously Used Constraint-Selection Rules

Rule JOT [Jung, O'Leary, ALT: Adaptive constraint reduction for training support vector machines, Electronic Transactions on Numerical Analysis, Vol. 31, 156-177 (2008)]:
  Q := {i : sᵢ ≤ η},
where η is the q-th smallest slack sᵢ, q being a certain decreasing function of the duality measure μ that saturates at q = n.

Rule FFK-CWH (for general NLP), proposed in [Chen, Wang, He: A feasible active set QP-free method for nonlinear programming, SIAM J. Optimization, 17(2), 401-429 (2006)]:
  Q := {i : sᵢ ≤ E(x, λ)},
based on a result in [Facchinei, Fischer, Kanzow: On the accurate identification of active constraints, SIAM J. Optimization, 9(1), 14-32 (1998)]:
  ‖(x − x*, λ − λ*)‖/E(x, λ) is bounded in a neighborhood of (x*, λ*).
Proposed Constraint-Selection Rule: Rule R

Parameters: δ̄ > 0, 0 < β < θ < 1.
Input: iteration k; slack variable sᵏ; error Eᵏ := E(xᵏ, λᵏ); E_min (value of the error Eᵏ when δᵏ was last reduced); threshold δᵏ⁻¹.
Output: working set Qᵏ; threshold δᵏ; E_min.

  if k = 0:  δ⁰ := δ̄,  E_min := E⁰
  else if Eᵏ ≤ βE_min:  δᵏ := θδᵏ⁻¹,  E_min := Eᵏ
  else:  δᵏ := δᵏ⁻¹

Select Qᵏ := {i ≤ m : sᵏᵢ ≤ δᵏ}.

Theorem: Rule R satisfies Condition CSR.
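Rule R is simple to state in code. A sketch, with the carried-over threshold and E_min kept in a `state` dictionary (this bookkeeping device and the parameter values are assumptions of the sketch):

```python
import numpy as np

def rule_R_select(k, s_k, E_k, state, delta_bar=1.0, beta=0.3, theta=0.5):
    """Working-set selection by Rule R; `state` carries (delta, E_min) across iterations."""
    if k == 0:
        state["delta"], state["E_min"] = delta_bar, E_k
    elif E_k <= beta * state["E_min"]:
        state["delta"] *= theta            # error dropped enough: shrink the threshold
        state["E_min"] = E_k
    # else: keep the previous threshold
    Q = np.flatnonzero(s_k <= state["delta"])
    return Q, state

# Two iterations on fixed slacks: the second call reduces delta since E drops.
state = {}
s = np.array([0.1, 2.0, 0.5, 5.0])
Q0, state = rule_R_select(0, s, 1.0, state)    # delta = 1.0
Q1, state = rule_R_select(1, s, 0.2, state)    # 0.2 <= 0.3*1.0, so delta = 0.5
```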
Convergence Theorem

Assumptions:
1. The primal strictly feasible set is non-empty; the primal solution set F_P is non-empty and bounded. At every feasible point x, A_{A(x)} has full row rank. (A(x) denotes the active set at x.)
2. There exists a (unique) x* at which SOSC with strict complementarity holds, with (unique) multiplier λ*.

Theorem. Suppose that Condition CSR and Assumption 1 hold. Then {xᵏ} → F_P. Suppose that, in addition, Assumption 2 holds. Then Qᵏ contains A(x*) for k large enough. [With Rule R or Rule FFK-CWH, Qᵏ = A(x*) for k large enough.] Further, convergence is q-quadratic. Specifically, there exists C > 0 such that, given any initial point (x⁰, λ⁰), there exists k̄ such that, for all k > k̄,
  ‖(xᵏ⁺¹ − x*, λᵏ⁺¹ − λ*)‖ ≤ C ‖(xᵏ − x*, λᵏ − λ*)‖².
Randomly Generated Problems

Problem setting:
  minimize_{x ∈ Rⁿ} (1/2) xᵀHx + cᵀx   subject to  Ax ≥ b.

- Entries of A and c drawn from N(0, 1); x⁰ ~ U(0, 1) and s⁰ ~ U(1, 2) componentwise; b := Ax⁰ − s⁰.
- m := 10,000 and n between 10 and 500.

We consider the following two classes of Hessian matrices:
1. Strongly convex quadratic program: diagonal H, with diag(H) ~ U(0, 1).
2. Linear program: H = 0.

We solved 50 randomly generated problems for each class of H and for each problem size, and report the results averaged over the 50 problems.
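A generator in the style described on this slide might look as follows (a sketch; the function name, seed handling, and the small sizes used in the example are assumptions; the paper uses m = 10,000):

```python
import numpy as np

def random_cqp(m, n, seed=0, lp=False):
    """Random instance in the slide's style; a strictly feasible x0 is built in."""
    rng = np.random.default_rng(seed)
    A = rng.standard_normal((m, n))          # entries ~ N(0, 1)
    c = rng.standard_normal(n)
    x0 = rng.uniform(0.0, 1.0, n)            # x0 ~ U(0, 1)
    s0 = rng.uniform(1.0, 2.0, m)            # s0 ~ U(1, 2)
    b = A @ x0 - s0                          # so that A x0 - b = s0 > 0
    H = np.zeros((n, n)) if lp else np.diag(rng.uniform(0.0, 1.0, n))
    return H, c, A, b, x0

H, c, A, b, x0 = random_cqp(m=200, n=10)
H_lp = random_cqp(m=50, n=5, lp=True)[0]
```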
Randomly Generated Problems 20 18 16 14 12 10 8 10 20 50 100 200 500 Iteration count 10 4 10 3 10 2 10 1 10 20 50 100 200 500 Size of working set (a) Strongly convex QP 10 2 10 1 10 0 10-1 10-2 10 20 50 100 200 500 Computation time 40 35 30 25 20 15 10 10 20 50 100 200 500 Iteration count 10 4 10 3 10 2 10 1 10 20 50 100 200 500 Size of working set (b) LP 10 2 10 1 10 0 10-1 10-2 10 20 50 100 200 500 Computation time
Data Fitting Problems

Regularized minimax data fitting problem:
  minimize_{x ∈ Rⁿ} ‖Āx − b̄‖_∞ + (1/(2ᾱ)) xᵀH̄x
⇔
  minimize_{x ∈ Rⁿ, u ∈ R} u + (1/(2ᾱ)) xᵀH̄x   subject to  Āx − b̄ ≤ u1,  −(Āx − b̄) ≤ u1.

- b̄: noisy data measurements from a target function g.
- Ā: trigonometric basis; x: expansion coefficients.
- H̄: regularization matrix; ᾱ: penalty parameter.
- m = 10,000, n from 10 to 500.

For each choice of g and for each problem size, we solved the problem 50 times and report the results averaged over the 50 problems.
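The epigraph reformulation on this slide turns the minimax fit into a CQP in the variables z = (x, u). A sketch of that construction (the function name and the tiny example data are assumptions):

```python
import numpy as np

def minimax_to_cqp(Abar, bbar, Hbar, alpha):
    """Epigraph reformulation: z = (x, u), objective (1/2) z^T H z + c^T z, A z >= b."""
    mbar, n = Abar.shape
    H = np.zeros((n + 1, n + 1))
    H[:n, :n] = Hbar / alpha               # (1/(2*alpha)) x^T Hbar x term
    c = np.zeros(n + 1); c[n] = 1.0        # the "+u" term
    # Abar x - bbar <= u*1  and  -(Abar x - bbar) <= u*1, stacked as A z >= b:
    ones = np.ones((mbar, 1))
    A = np.vstack([np.hstack([-Abar, ones]), np.hstack([Abar, ones])])
    b = np.concatenate([-bbar, bbar])
    return H, c, A, b

# Tiny example: feasibility with u = ||Abar x - bbar||_inf, active at the max residual.
Abar = np.array([[1.0], [2.0]]); bbar = np.array([0.5, -1.0])
H, c, A, b = minimax_to_cqp(Abar, bbar, np.eye(1), alpha=2.0)
x = np.array([0.3]); u = np.max(np.abs(Abar @ x - bbar))
z = np.concatenate([x, [u]])
vals = A @ z - b
```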
Data Fitting Problems 250 10 4 10 3 200 10 2 150 100 10 3 10 0 50 0 10 20 50 100 200 500 Iteration count 10 2 10 20 50 100 200 500 Size of working set (a) g(t) = sin(10t) cos(25t 2 ) 10-2 10 20 50 100 200 500 Computation time 250 10 4 10 3 200 10 2 150 100 10 3 10 0 50 0 10 2 10-2 10 20 50 100 200 500 10 20 50 100 200 500 10 20 50 100 200 500 Iteration count Size of working set Computation time (b) g(t) = sin(5t 3 ) cos 2 (10t)
Conclusions

- A convergent, constraint-reduced (CR) variant of Mehrotra's Predictor/Corrector for convex quadratic programming was stated and analyzed.
- A regularization scheme was used to account for CR-triggered rank deficiency away from solutions.
- A class of constraint-selection rules was defined by means of a sufficient condition (Condition CSR) that guarantees strong convergence properties for the resulting algorithm.
- A new selection rule was proposed, based on a modified version of an active-constraint identification function due to Facchinei et al.
- Numerical results were reported that show the benefit of CR on problems with many inequality constraints, and the power of the newly proposed selection rule.

The slides are available from http://www.ece.umd.edu/~andre