A Constraint-Reduced MPC Algorithm for Convex Quadratic Programming, with a Modified Active-Set Identification Scheme

M. Paul Laiu¹ and (presenter) André L. Tits²
¹ Oak Ridge National Laboratory, laiump@ornl.gov
² Department of ECE and ISR, University of Maryland, College Park, andre@umd.edu

ISMP, Bordeaux, July 16, 2018
Outline

- Mehrotra's Predictor/Corrector (MPC) for CQP
- Constraint-Reduced MPC for CQP
- Constraint Selection
- Convergence Theorem
- Numerical Results
- Conclusions
Convex Quadratic Program (CQP)

(P)  minimize_{x ∈ Rⁿ}  f(x) := (1/2) xᵀHx + cᵀx   subject to  Ax ≥ b

(D)  maximize_{x ∈ Rⁿ, λ ∈ Rᵐ}  −(1/2) xᵀHx + bᵀλ   subject to  Hx + c − Aᵀλ = 0,  λ ≥ 0

Here x ∈ Rⁿ, c ∈ Rⁿ, H = Hᵀ ⪰ 0, λ ∈ Rᵐ, A ∈ Rᵐˣⁿ, b ∈ Rᵐ.

We are mostly interested in the case when most of the primal inequality constraints are inactive at the solution (e.g., m ≫ n).

(x, λ) solves (P)-(D) iff it satisfies the KKT system
  Hx − Aᵀλ + c = 0
  Ax − b − s = 0
  Sλ = 0
  s, λ ≥ 0,
where S := diag(s).
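The KKT conditions on this slide are easy to check numerically. A minimal sketch in NumPy, on a hypothetical toy instance (the bound constraints x ≥ 0 and the point (x, λ) below are illustrative, not from the talk):

```python
import numpy as np

# Hypothetical toy CQP: f(x) = x1^2 + x2^2 - 2*x1 - 2*x2, subject to x >= 0.
H = np.array([[2.0, 0.0], [0.0, 2.0]])   # H = H^T, positive definite
c = np.array([-2.0, -2.0])
A = np.eye(2)                            # Ax >= b encodes x >= 0
b = np.zeros(2)

def kkt_residuals(x, lam):
    """Residuals of the KKT system: stationarity and complementarity."""
    s = A @ x - b                          # slack s = Ax - b
    r_stat = H @ x - A.T @ lam + c         # Hx - A^T lambda + c
    r_comp = s * lam                       # S lambda (componentwise)
    return r_stat, r_comp, s

# The unconstrained minimizer x* = (1, 1) is interior, so lambda* = 0 satisfies KKT.
r_stat, r_comp, s = kkt_residuals(np.array([1.0, 1.0]), np.zeros(2))
```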
MPC for CQP

Given (x, λ), with s := Ax − b > 0 and λ > 0:

Compute the Newton (affine-scaling) search direction by solving

  [ H  −Aᵀ   0 ] [Δxᵃ]      [ Hx − Aᵀλ + c ]
  [ A   0   −I ] [Δλᵃ] = −  [ 0 ]
  [ 0   S    Λ ] [Δsᵃ]      [ Sλ ]

where Λ := diag(λ).

Set μ := sᵀλ/m and σ := (1 − αᵃ)³, with
  αᵃ := argmax{ α ∈ [0,1] : s + αΔsᵃ ≥ 0, λ + αΔλᵃ ≥ 0 }.

Compute a centering/corrector direction by solving

  [ H  −Aᵀ   0 ] [Δxᶜ]   [ 0 ]
  [ A   0   −I ] [Δλᶜ] = [ 0 ]
  [ 0   S    Λ ] [Δsᶜ]   [ σμ1 − ΔSᵃΔλᵃ ]

Update
  (x⁺, λ⁺) = (x, λ) + (α_p (Δxᵃ + Δxᶜ), α_d (Δλᵃ + Δλᶜ)),
with α_p, α_d ∈ (0, 1] such that s⁺ := Ax⁺ − b > 0 and λ⁺ > 0.
Toward Constraint Reduction

By block Gaussian elimination, the Newton system becomes
  M Δxᵃ = −(Hx + c),   Δsᵃ = A Δxᵃ,   Δλᵃ = −λ − S⁻¹Λ Δsᵃ,
where
  M := H + AᵀS⁻¹ΛA = H + Σ_{i=1}^{m} (λᵢ/sᵢ) aᵢaᵢᵀ,
and similarly (same M) for the computation of the centering/corrector direction.

When m > n, the main cost in computing the search direction is in forming M: approximately mn²/2 multiplications per iteration.

If m ≫ n and we limit the sum to q appropriately selected terms, the cost per iteration is reduced by a factor of m/q (!)
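The block elimination on this slide can be verified numerically: form M, solve for Δxᵃ, recover Δsᵃ and Δλᵃ, and check that the full Newton system holds. A sketch with random data (the sizes, seed, and strictly feasible starting point are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 5, 200                           # m >> n: many inequality constraints
H = np.eye(n)                           # illustrative H, positive definite
A = rng.standard_normal((m, n))
c = rng.standard_normal(n)
x = rng.standard_normal(n)
lam = rng.uniform(1.0, 2.0, m)
b = A @ x - rng.uniform(1.0, 2.0, m)    # ensures s = Ax - b > 0
s = A @ x - b

# Normal matrix M = H + sum_i (lam_i/s_i) a_i a_i^T; forming it costs ~ m n^2 / 2.
M = H + (A.T * (lam / s)) @ A

dx_a = np.linalg.solve(M, -(H @ x + c))   # M dx_a = -(Hx + c)
ds_a = A @ dx_a                           # ds_a = A dx_a
dlam_a = -lam - (lam / s) * ds_a          # dlam_a = -lam - S^{-1} Lambda ds_a
```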
Constraint Reduction (CR) for CQP: Basic Ideas

Observation: For CQPs with m ≫ n, it is typical that most of the constraints are irrelevant or redundant.

Idea: At each iteration, try to guess a small subset Q of constraints with which to compute the search direction.

[Figure: a two-dimensional example (n = 2, m = 13) illustrating irrelevant, redundant, and active constraints at the solution x*; many constraints are ignored, |Q| = 6.]
CR for CQP: Reduced Normal Matrix

At each iteration, select a working set Q of constraints and obtain an MPC search direction, from the current (xᵏ, λᵏ), for the problem
  minimize_{x ∈ Rⁿ} (1/2) xᵀHx + cᵀx   subject to  A_Q x ≥ b_Q,
with A_Q a submatrix of A and b_Q a subvector of b.

Normal matrix for the reduced problem:
  M⁽Q⁾ := H + A_Qᵀ S_Q⁻¹ Λ_Q A_Q = H + Σ_{i ∈ Q} (λᵢ/sᵢ) aᵢaᵢᵀ.

Cost of forming M⁽Q⁾ drops from mn²/2 to |Q|n²/2 multiplications.

Step sizes are still computed using all m constraints. (But the CPU cost is small compared to that of computing the search direction.)
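The reduced normal matrix amounts to keeping only the rank-one terms indexed by Q. A sketch (sizes and the smallest-slack selection shown here are illustrative assumptions; it is only one possible way to pick Q):

```python
import numpy as np

rng = np.random.default_rng(1)
n, m, q = 4, 1000, 12
H = np.eye(n)                             # illustrative positive-definite H
A = rng.standard_normal((m, n))
lam = rng.uniform(0.5, 1.5, m)
s = rng.uniform(0.5, 1.5, m)

# Full normal matrix: sum over all m rank-one terms (~ m n^2 / 2 work).
M_full = H + (A.T * (lam / s)) @ A

# Reduced normal matrix: keep only the q terms indexed by the working set Q
# (~ q n^2 / 2 work). Here Q is picked as the q smallest slacks, for illustration.
Q = np.argsort(s)[:q]
M_Q = H + (A[Q].T * (lam[Q] / s[Q])) @ A[Q]
```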
CR-MPC for CQP

At each iteration: Given (x, λ), with s := Ax − b > 0 and λ > 0, select working set Q.

Compute the Newton (affine-scaling) search direction by solving
  M⁽Q⁾ Δxᵃ = −(Hx + c),
and set Δsᵃ = A Δxᵃ, Δλᵃ_Q = −λ_Q − S_Q⁻¹Λ_Q Δsᵃ_Q.

Set μ⁽Q⁾ := s_Qᵀλ_Q/|Q| and σ := (1 − αᵃ)³, with
  αᵃ := argmax{ α ∈ [0,1] : s + αΔsᵃ ≥ 0, λ + αΔλᵃ ≥ 0 }.

Compute the centering/corrector direction by solving
  M⁽Q⁾ Δxᶜ = A_Qᵀ S_Q⁻¹ (σμ⁽Q⁾1 − ΔSᵃ_Q Δλᵃ_Q),
and set Δsᶜ = A Δxᶜ, Δλᶜ_Q = S_Q⁻¹ (−Λ_Q Δsᶜ_Q + σμ⁽Q⁾1 − ΔSᵃ_Q Δλᵃ_Q).

Set the mixing parameter γ ∈ (0, 1] (see next slide) and
  (Δx, Δλ_Q) = (Δxᵃ, Δλᵃ_Q) + γ (Δxᶜ, Δλᶜ_Q).

Update. With α_p, α_d ∈ (0, 1] such that s⁺ > 0 and λ⁺_Q > 0:
  (x⁺, λ⁺_Q) = (x, λ_Q) + (α_p Δx, α_d Δλ_Q),
  s⁺ = Ax⁺ − b,
  λ⁺ᵢ = ((s⁺_Q)ᵀλ⁺_Q/|Q|)/s⁺ᵢ,  i ∉ Q.
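One full CR-MPC iteration can be sketched as follows. This is a simplified illustration, not the paper's algorithm: the mixing parameter γ is fixed at 1, the convergence safeguards and regularization are omitted, and the 0.99 step-size damping is an assumption.

```python
import numpy as np

def cr_mpc_step(H, c, A, b, x, lam, Q):
    """One CR-MPC iteration (sketch: gamma = 1, no safeguards/regularization)."""
    m = A.shape[0]
    in_Q = np.zeros(m, dtype=bool); in_Q[Q] = True
    s = A @ x - b
    AQ, sQ, lQ = A[Q], s[Q], lam[Q]
    MQ = H + (AQ.T * (lQ / sQ)) @ AQ          # reduced normal matrix M^(Q)

    def max_step(v, dv):                      # largest alpha in [0,1] with v + alpha*dv >= 0
        neg = dv < 0
        return min(1.0, np.min(-v[neg] / dv[neg])) if neg.any() else 1.0

    # Affine-scaling direction
    dx_a = np.linalg.solve(MQ, -(H @ x + c))
    ds_a = A @ dx_a
    dl_a = np.zeros(m)
    dl_a[Q] = -lQ - (lQ / sQ) * ds_a[Q]

    mu_Q = sQ @ lQ / len(Q)
    alpha_a = min(max_step(s, ds_a), max_step(lam, dl_a))
    sigma = (1.0 - alpha_a) ** 3

    # Centering/corrector direction
    dx_c = np.linalg.solve(MQ, AQ.T @ ((sigma * mu_Q - ds_a[Q] * dl_a[Q]) / sQ))
    ds_c = A @ dx_c
    dl_c = np.zeros(m)
    dl_c[Q] = (-lQ * ds_c[Q] + sigma * mu_Q - ds_a[Q] * dl_a[Q]) / sQ

    dx, dl = dx_a + dx_c, dl_a + dl_c
    alpha_p = 0.99 * max_step(s, A @ dx)      # keep s+ > 0
    alpha_d = 0.99 * max_step(lam, dl)        # keep lambda+ > 0

    x_new = x + alpha_p * dx
    s_new = A @ x_new - b
    lam_new = lam + alpha_d * dl
    # For i not in Q, reset lambda_i from the new duality measure over Q
    mu_new = s_new[Q] @ lam_new[Q] / len(Q)
    lam_new[~in_Q] = mu_new / s_new[~in_Q]
    return x_new, lam_new

# Hypothetical toy problem: H = I, c = (-2,-2), constraints x >= 0, all in Q.
H = np.eye(2); c = np.array([-2.0, -2.0]); A = np.eye(2); b = np.zeros(2)
x1, lam1 = cr_mpc_step(H, c, A, b, np.array([0.5, 0.5]), np.ones(2), np.array([0, 1]))
```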
CR-MPC for CQP with Convergence Safeguards

The CR-MPC search direction is given by
  (Δx, Δλ) = (Δxᵃ, Δλᵃ) + γ (Δxᶜ, Δλᶜ),
where the mixing parameter γ
- guarantees that Δx is a descent direction for the primal objective f (indeed, in the CR context, this is critical to convergence);
- limits the effect of a too-large Δxᶜ:
  γ := min{ γ₁, τ‖Δxᵃ‖/‖Δxᶜ‖, τ‖Δxᵃ‖/(σμ) },
with
  γ₁ := argmax_{γ ∈ [0,1]} { f(x) − f(x + Δxᵃ + γΔxᶜ) ≥ ζ (f(x) − f(x + Δxᵃ)) },
where f is the primal objective function and τ, ζ ∈ (0, 1).
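A rough sketch of how such a mixing parameter might be computed. The backtracking loop only approximates the argmax defining γ₁, and the parameter values and the exact form of the capping terms are assumptions, not the paper's implementation:

```python
import numpy as np

def mixing_gamma(f, x, dx_a, dx_c, sigma_mu, tau=0.5, zeta=0.5):
    """Sketch of the mixing parameter gamma (gamma1 found by backtracking)."""
    # Descent condition: f(x) - f(x + dx_a + g*dx_c) >= zeta * (f(x) - f(x + dx_a))
    margin = lambda g: f(x) - f(x + dx_a + g * dx_c) - zeta * (f(x) - f(x + dx_a))
    gamma1 = 1.0
    while gamma1 > 1e-12 and margin(gamma1) < 0:
        gamma1 *= 0.5                      # backtrack until the condition holds
    caps = [gamma1]
    nx_a, nx_c = np.linalg.norm(dx_a), np.linalg.norm(dx_c)
    if nx_c > 0:
        caps.append(tau * nx_a / nx_c)     # limit a too-large corrector
    if sigma_mu > 0:
        caps.append(tau * nx_a / sigma_mu)
    return min(1.0, min(caps))

# Toy quadratic: the affine step is a strong descent step, the corrector is small.
f = lambda z: 0.5 * z @ z
g = mixing_gamma(f, np.array([2.0, 0.0]), np.array([-1.0, 0.0]),
                 np.array([0.1, 0.0]), sigma_mu=0.01)
```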
Regularized Normal Matrix

M is invertible iff [Hᵀ Aᵀ]ᵀ has full (column) rank, which can be guaranteed by pre-processing. HOWEVER, nonsingularity of M⁽Q⁾ (and indeed unique solvability of the reduced linear systems) requires that the reduced matrix [Hᵀ A_Qᵀ]ᵀ have full (numerical) rank, which is far from guaranteed.

Regularization: replace M⁽Q⁾ with
  M̃⁽Q⁾ := W + A_Qᵀ S_Q⁻¹ Λ_Q A_Q,  where W := H + ϱR, R ≻ 0,
and let ϱ > 0 go to zero as a solution of the optimization problem is approached. The choice ϱ := min{1, E(x, λ)/Ē} and R = I turns out to be adequate. Here E(x, λ) is a measure of the distance to optimality:
  E(x, λ) := ‖(v(x, λ), w(x, λ))‖,
where
  v(x, λ) := Hx + c − Aᵀλ,  wᵢ(x, λ) := min{sᵢ, λᵢ},  i = 1, …, m.
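The error measure and the regularized reduced normal matrix can be sketched directly from these definitions (the toy data and the choice Ē = 1 are illustrative assumptions):

```python
import numpy as np

def error_measure(H, c, A, b, x, lam):
    """E(x, lam) = ||(v, w)|| with v = Hx + c - A^T lam, w_i = min(s_i, lam_i)."""
    s = A @ x - b
    v = H @ x + c - A.T @ lam
    w = np.minimum(s, lam)
    return np.linalg.norm(np.concatenate([v, w]))

def regularized_normal_matrix(H, A, s, lam, Q, E, E_bar=1.0):
    """M~(Q) = W + A_Q^T S_Q^{-1} Lambda_Q A_Q, with W = H + rho*I, rho = min(1, E/E_bar)."""
    rho = min(1.0, E / E_bar)
    W = H + rho * np.eye(H.shape[0])
    AQ = A[Q]
    return W + (AQ.T * (lam[Q] / s[Q])) @ AQ

# Toy instance where (x, lam) = ((1,1), 0) is optimal, so E = 0 and rho = 0.
H = np.eye(2); c = np.array([-1.0, -1.0]); A = np.eye(2); b = np.zeros(2)
x = np.array([1.0, 1.0]); lam = np.zeros(2)
E0 = error_measure(H, c, A, b, x, lam)
M_reg = regularized_normal_matrix(H, A, A @ x - b, lam, np.array([0]), E0)
E1 = error_measure(H, c, A, b, np.zeros(2), lam)   # non-optimal point: E > 0
```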
How to Select Q?

Various constraint-selection rules have been used in the past. We propose a CONDITION on the selection rule that guarantees convergence of the regularized CR-MPC algorithm. This condition is met by all existing selection rules we are aware of.

Condition (CSR). The constraint-selection rule should be such that:
1. when {(xᵏ, λᵏ)} is bounded away from optimality, Qᵏ eventually includes every active (primal) constraint at each limit point x̄ of {xᵏ} as that limit point is approached;
2. when {xᵏ} converges to a primal solution point x*, Qᵏ eventually includes every constraint active at x*.
Some Previously Used Constraint-Selection Rules

Rule JOT [Jung, O'Leary, ALT: Adaptive constraint reduction for training support vector machines, Electronic Transactions on Numerical Analysis, Vol. 31, 156-177 (2008)]:
  Q := {i : sᵢ ≤ η},
where η is the q-th smallest slack sᵢ, q being a certain decreasing function of the duality measure μ that saturates at q = n.

Rule FFK-CWH (for general NLP), proposed in [Chen, Wang, He: A feasible active set QP-free method for nonlinear programming, SIAM J. Optimization, 17(2), 401-429 (2006)]:
  Q := {i : sᵢ ≤ E(x, λ)},
based on a result in [Facchinei, Fischer, Kanzow: On the accurate identification of active constraints, SIAM J. Optimization, 9(1), 14-32 (1998)]:
  ‖(x − x*, λ − λ*)‖/E(x, λ) is bounded in a neighborhood of (x*, λ*).
Proposed Constraint-Selection Rule: Rule R

Parameters: δ̄ > 0, 0 < β < θ < 1.
Input: iteration k; slack variable sᵏ; error Eᵏ := E(xᵏ, λᵏ); E_min (value of the error Eᵏ when δᵏ was last reduced); threshold δᵏ⁻¹.
Output: working set Qᵏ; threshold δᵏ; E_min.

  if k = 0:  δ⁰ := δ̄,  E_min := E⁰
  else if Eᵏ ≤ βE_min:  δᵏ := θδᵏ⁻¹,  E_min := Eᵏ
  else:  δᵏ := δᵏ⁻¹

Select Qᵏ := {i ≤ m : sᵏᵢ ≤ δᵏ}.

Theorem: Rule R satisfies Condition CSR.
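Rule R is simple to state in code. A sketch, with the carried-over threshold and E_min kept in a `state` dictionary (this bookkeeping device and the parameter values are assumptions of the sketch):

```python
import numpy as np

def rule_R_select(k, s_k, E_k, state, delta_bar=1.0, beta=0.3, theta=0.5):
    """Working-set selection by Rule R; `state` carries (delta, E_min) across iterations."""
    if k == 0:
        state["delta"], state["E_min"] = delta_bar, E_k
    elif E_k <= beta * state["E_min"]:
        state["delta"] *= theta            # error dropped enough: shrink the threshold
        state["E_min"] = E_k
    # else: keep the previous threshold
    Q = np.flatnonzero(s_k <= state["delta"])
    return Q, state

# Two iterations on fixed slacks: the second call reduces delta since E drops.
state = {}
s = np.array([0.1, 2.0, 0.5, 5.0])
Q0, state = rule_R_select(0, s, 1.0, state)    # delta = 1.0
Q1, state = rule_R_select(1, s, 0.2, state)    # 0.2 <= 0.3*1.0, so delta = 0.5
```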
Convergence Theorem

Assumptions:
1. The primal strictly feasible set is non-empty; the primal solution set F_P is non-empty and bounded. At every feasible point x, A_{A(x)} has full row rank. (A(x) denotes the active set at x.)
2. There exists a (unique) x* at which SOSC with strict complementarity holds, with (unique) multiplier λ*.

Theorem. Suppose that Condition CSR and Assumption 1 hold. Then {xᵏ} → F_P. Suppose that, in addition, Assumption 2 holds. Then Qᵏ contains A(x*) for k large enough. [With Rule R or Rule FFK-CWH, Qᵏ = A(x*) for k large enough.] Further, convergence is q-quadratic. Specifically, there exists C > 0 such that, given any initial point (x⁰, λ⁰), there exists k̄ such that, for all k > k̄,
  ‖(xᵏ⁺¹ − x*, λᵏ⁺¹ − λ*)‖ ≤ C ‖(xᵏ − x*, λᵏ − λ*)‖².
Randomly Generated Problems

Problem setting:
  minimize_{x ∈ Rⁿ} (1/2) xᵀHx + cᵀx   subject to  Ax ≥ b.

- Entries of A and c drawn from N(0, 1); x⁰ ~ U(0, 1) and s⁰ ~ U(1, 2) componentwise; b := Ax⁰ − s⁰.
- m := 10,000 and n between 10 and 500.

We consider the following two classes of Hessian matrices:
1. Strongly convex quadratic program: diagonal H, with diag(H) ~ U(0, 1).
2. Linear program: H = 0.

We solved 50 randomly generated problems for each class of H and for each problem size, and report the results averaged over the 50 problems.
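A generator in the style described on this slide might look as follows (a sketch; the function name, seed handling, and the small sizes used in the example are assumptions; the paper uses m = 10,000):

```python
import numpy as np

def random_cqp(m, n, seed=0, lp=False):
    """Random instance in the slide's style; a strictly feasible x0 is built in."""
    rng = np.random.default_rng(seed)
    A = rng.standard_normal((m, n))          # entries ~ N(0, 1)
    c = rng.standard_normal(n)
    x0 = rng.uniform(0.0, 1.0, n)            # x0 ~ U(0, 1)
    s0 = rng.uniform(1.0, 2.0, m)            # s0 ~ U(1, 2)
    b = A @ x0 - s0                          # so that A x0 - b = s0 > 0
    H = np.zeros((n, n)) if lp else np.diag(rng.uniform(0.0, 1.0, n))
    return H, c, A, b, x0

H, c, A, b, x0 = random_cqp(m=200, n=10)
H_lp = random_cqp(m=50, n=5, lp=True)[0]
```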
Randomly Generated Problems 20 18 16 14 12 10 8 10 20 50 100 200 500 Iteration count 10 4 10 3 10 2 10 1 10 20 50 100 200 500 Size of working set (a) Strongly convex QP 10 2 10 1 10 0 10-1 10-2 10 20 50 100 200 500 Computation time 40 35 30 25 20 15 10 10 20 50 100 200 500 Iteration count 10 4 10 3 10 2 10 1 10 20 50 100 200 500 Size of working set (b) LP 10 2 10 1 10 0 10-1 10-2 10 20 50 100 200 500 Computation time
Data Fitting Problems

Regularized minimax data fitting problem:
  minimize_{x ∈ Rⁿ} ‖Āx − b̄‖_∞ + (1/(2ᾱ)) xᵀH̄x
⇔
  minimize_{x ∈ Rⁿ, u ∈ R} u + (1/(2ᾱ)) xᵀH̄x   subject to  Āx − b̄ ≤ u1,  −(Āx − b̄) ≤ u1.

- b̄: noisy data measurements from a target function g.
- Ā: trigonometric basis; x: expansion coefficients.
- H̄: regularization matrix; ᾱ: penalty parameter.
- m = 10,000, n from 10 to 500.

For each choice of g and for each problem size, we solved the problem 50 times and report the results averaged over the 50 problems.
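The epigraph reformulation on this slide turns the minimax fit into a CQP in the variables z = (x, u). A sketch of that construction (the function name and the tiny example data are assumptions):

```python
import numpy as np

def minimax_to_cqp(Abar, bbar, Hbar, alpha):
    """Epigraph reformulation: z = (x, u), objective (1/2) z^T H z + c^T z, A z >= b."""
    mbar, n = Abar.shape
    H = np.zeros((n + 1, n + 1))
    H[:n, :n] = Hbar / alpha               # (1/(2*alpha)) x^T Hbar x term
    c = np.zeros(n + 1); c[n] = 1.0        # the "+u" term
    # Abar x - bbar <= u*1  and  -(Abar x - bbar) <= u*1, stacked as A z >= b:
    ones = np.ones((mbar, 1))
    A = np.vstack([np.hstack([-Abar, ones]), np.hstack([Abar, ones])])
    b = np.concatenate([-bbar, bbar])
    return H, c, A, b

# Tiny example: feasibility with u = ||Abar x - bbar||_inf, active at the max residual.
Abar = np.array([[1.0], [2.0]]); bbar = np.array([0.5, -1.0])
H, c, A, b = minimax_to_cqp(Abar, bbar, np.eye(1), alpha=2.0)
x = np.array([0.3]); u = np.max(np.abs(Abar @ x - bbar))
z = np.concatenate([x, [u]])
vals = A @ z - b
```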
Data Fitting Problems 250 10 4 10 3 200 10 2 150 100 10 3 10 0 50 0 10 20 50 100 200 500 Iteration count 10 2 10 20 50 100 200 500 Size of working set (a) g(t) = sin(10t) cos(25t 2 ) 10-2 10 20 50 100 200 500 Computation time 250 10 4 10 3 200 10 2 150 100 10 3 10 0 50 0 10 2 10-2 10 20 50 100 200 500 10 20 50 100 200 500 10 20 50 100 200 500 Iteration count Size of working set Computation time (b) g(t) = sin(5t 3 ) cos 2 (10t)
Conclusions

- A convergent, constraint-reduced (CR) variant of Mehrotra's Predictor/Corrector for convex quadratic programming was stated and analyzed.
- A regularization scheme was used to account for CR-triggered rank deficiency away from solutions.
- A class of constraint-selection rules was defined by means of a sufficient condition (Condition CSR) that guarantees strong convergence properties for the resulting algorithm.
- A new selection rule was proposed, based on a modified version of an active-constraint identification function due to Facchinei et al.
- Numerical results were reported that show the benefit of CR on problems with many inequality constraints, and the power of the newly proposed selection rule.

The slides are available from http://www.ece.umd.edu/~andre