A Constraint-Reduced MPC Algorithm for Convex Quadratic Programming, with a Modified Active-Set Identification Scheme

M. Paul Laiu¹ and (presenter) André L. Tits²
¹ Oak Ridge National Laboratory, laiump@ornl.gov
² Department of ECE and ISR, University of Maryland, College Park, andre@umd.edu

ISMP, Bordeaux, July 1-6, 2018

Outline
- Mehrotra's Predictor/Corrector (MPC) for CQP
- Constraint-Reduced MPC for CQP
- Constraint Selection
- Convergence Theorem
- Numerical Results
- Conclusions

Convex Quadratic Program (CQP)

(P) $\min_{x\in\mathbb{R}^n}\ f(x) := \tfrac{1}{2}x^T H x + c^T x$ subject to $Ax \ge b$,

(D) $\max_{x\in\mathbb{R}^n,\,\lambda\in\mathbb{R}^m}\ -\tfrac{1}{2}x^T H x + b^T\lambda$ subject to $Hx + c - A^T\lambda = 0$, $\lambda \ge 0$.

Here $x\in\mathbb{R}^n$, $c\in\mathbb{R}^n$, $H = H^T \succeq 0$, $\lambda\in\mathbb{R}^m$, $A\in\mathbb{R}^{m\times n}$, $b\in\mathbb{R}^m$.

We are mostly interested in the case when most of the primal inequality constraints are inactive at the solution (e.g., $m \gg n$).

$(x,\lambda)$ solves (P)-(D) iff it satisfies the KKT system
$$Hx - A^T\lambda + c = 0,\qquad Ax - b - s = 0,\qquad S\lambda = 0,\qquad s,\lambda \ge 0,$$
where $S := \mathrm{diag}(s) \succeq 0$.

MPC for CQP

Given $(x,\lambda)$, with $s := Ax - b > 0$ and $\lambda > 0$:

Compute the Newton (affine-scaling) search direction by solving
$$\begin{bmatrix} H & -A^T & 0 \\ A & 0 & -I \\ 0 & S & \Lambda \end{bmatrix}
\begin{bmatrix} \Delta x^{\rm a} \\ \Delta\lambda^{\rm a} \\ \Delta s^{\rm a} \end{bmatrix}
= \begin{bmatrix} -(Hx - A^T\lambda + c) \\ 0 \\ -S\lambda \end{bmatrix},$$
where $\Lambda := \mathrm{diag}(\lambda)$.

Set $\mu := s^T\lambda/m$ and $\sigma := (1-\alpha^{\rm a})^3$, with
$$\alpha^{\rm a} := \arg\max\{\alpha\in[0,1] : s + \alpha\,\Delta s^{\rm a} \ge 0,\ \lambda + \alpha\,\Delta\lambda^{\rm a} \ge 0\}.$$

Compute a centering/corrector direction by solving
$$\begin{bmatrix} H & -A^T & 0 \\ A & 0 & -I \\ 0 & S & \Lambda \end{bmatrix}
\begin{bmatrix} \Delta x^{\rm c} \\ \Delta\lambda^{\rm c} \\ \Delta s^{\rm c} \end{bmatrix}
= \begin{bmatrix} 0 \\ 0 \\ \sigma\mu\mathbf{1} - \Delta S^{\rm a}\Delta\lambda^{\rm a} \end{bmatrix},$$
where $\Delta S^{\rm a} := \mathrm{diag}(\Delta s^{\rm a})$.

Update
$$(x^+,\lambda^+) = (x,\lambda) + \big(\alpha_p(\Delta x^{\rm a} + \Delta x^{\rm c}),\ \alpha_d(\Delta\lambda^{\rm a} + \Delta\lambda^{\rm c})\big),$$
with $\alpha_p,\alpha_d\in(0,1]$ such that $s^+ := Ax^+ - b > 0$ and $\lambda^+ > 0$.

Toward Constraint Reduction

By block Gaussian elimination, the Newton system becomes
$$M\,\Delta x^{\rm a} = -(Hx + c),\qquad \Delta s^{\rm a} = A\,\Delta x^{\rm a},\qquad \Delta\lambda^{\rm a} = -\lambda - S^{-1}\Lambda\,\Delta s^{\rm a},$$
where
$$M := H + A^T S^{-1}\Lambda A = H + \sum_{i=1}^{m} \frac{\lambda_i}{s_i}\, a_i a_i^T,$$
and similarly (same $M$) for the computation of the centering/corrector direction.

When $m > n$, the main cost in computing the search direction is in forming $M$: approximately $mn^2/2$ multiplications per iteration.

If $m \gg n$ and we limit the sum to $q$ appropriately selected terms, the cost per iteration will be reduced by a factor of $m/q$ (!)
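To make the normal-equation route concrete, here is a minimal numpy sketch (illustrative only, not the authors' implementation; function and variable names are assumptions):

```python
import numpy as np

def affine_direction(H, A, b, c, x, lam):
    """Affine-scaling direction via the normal equations M dx = -(Hx + c).

    Follows the block-elimination formulas above with dense numpy linear
    algebra; no factorization reuse, sparsity, or safeguards.
    """
    s = A @ x - b                    # slacks, assumed strictly positive
    d = lam / s                      # weights lambda_i / s_i
    M = H + A.T @ (d[:, None] * A)   # M = H + sum_i (lambda_i/s_i) a_i a_i^T (the ~m*n^2/2 step)
    dx = np.linalg.solve(M, -(H @ x + c))
    ds = A @ dx
    dlam = -lam - d * ds             # dlam = -lambda - S^{-1} Lambda ds
    return dx, ds, dlam
```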

Constraint Reduction (CR) for CQP: Basic Ideas

Observation: For CQPs with $m \gg n$, it is typical that most of the constraints are irrelevant or redundant.

Idea: At each iteration, try to guess a small subset $Q$ of constraints to compute the search direction.

[Figure: a two-dimensional example ($n = 2$, $m = 13$) showing irrelevant, redundant, and active constraints at the iterate $x$; ignoring many constraints leaves a working set of size $|Q| = 6$.]

CR for CQP: Reduced Normal Matrix

At each iteration, select a working set $Q$ of constraints and obtain an MPC search direction, from the current $(x^k,\lambda^k)$, for the problem
$$\min_{x\in\mathbb{R}^n}\ \tfrac{1}{2}x^T H x + c^T x \quad\text{subject to}\quad A_Q x \ge b_Q,$$
with $A_Q$ a submatrix of $A$ and $b_Q$ a subvector of $b$.

Normal matrix for the reduced problem:
$$M^{(Q)} := H + A_Q^T S_Q^{-1}\Lambda_Q A_Q = H + \sum_{i\in Q} \frac{\lambda_i}{s_i}\, a_i a_i^T.$$

Cost of forming $M^{(Q)}$ reduces from $mn^2/2$ to $|Q|\,n^2/2$ multiplications.

Step sizes are still computed using all $m$ constraints. (But the CPU cost is small compared to that of computing the search direction.)
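A one-function sketch of the reduced normal matrix, assuming the same dense numpy setting as above:

```python
import numpy as np

def reduced_normal_matrix(H, A, s, lam, Q):
    """Reduced normal matrix M^(Q) = H + A_Q^T S_Q^{-1} Lambda_Q A_Q.

    Q is an integer index array selecting the working set; only |Q| rank-one
    terms are accumulated, so the cost drops from ~m*n^2/2 to ~|Q|*n^2/2.
    """
    A_Q = A[Q, :]
    d_Q = lam[Q] / s[Q]
    return H + A_Q.T @ (d_Q[:, None] * A_Q)
```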

CR-MPC for CQP

At each iteration: Given $(x,\lambda)$, with $s := Ax - b > 0$ and $\lambda > 0$, select working set $Q$.

Compute the Newton (affine-scaling) search direction by solving $M^{(Q)}\Delta x^{\rm a} = -(Hx + c)$, and set
$$\Delta s^{\rm a} = A\,\Delta x^{\rm a},\qquad \Delta\lambda^{\rm a}_Q = -\lambda_Q - S_Q^{-1}\Lambda_Q\,\Delta s^{\rm a}_Q.$$

Set $\mu^{(Q)} := s_Q^T\lambda_Q/q$ (with $q := |Q|$) and $\sigma := (1-\alpha^{\rm a})^3$, with
$$\alpha^{\rm a} := \arg\max\{\alpha\in[0,1] : s + \alpha\,\Delta s^{\rm a} \ge 0,\ \lambda_Q + \alpha\,\Delta\lambda^{\rm a}_Q \ge 0\}.$$

Compute the centering/corrector direction by solving $M^{(Q)}\Delta x^{\rm c} = A_Q^T S_Q^{-1}\big(\sigma\mu^{(Q)}\mathbf{1} - \Delta S^{\rm a}_Q\Delta\lambda^{\rm a}_Q\big)$, and set
$$\Delta s^{\rm c} = A\,\Delta x^{\rm c},\qquad \Delta\lambda^{\rm c}_Q = S_Q^{-1}\big(-\Lambda_Q\,\Delta s^{\rm c}_Q + \sigma\mu^{(Q)}\mathbf{1} - \Delta S^{\rm a}_Q\Delta\lambda^{\rm a}_Q\big).$$

Set the mixing parameter $\gamma\in(0,1]$ (see next slide). Set
$$(\Delta x, \Delta\lambda_Q) = (\Delta x^{\rm a}, \Delta\lambda^{\rm a}_Q) + \gamma\,(\Delta x^{\rm c}, \Delta\lambda^{\rm c}_Q).$$

Update. With $\alpha_p,\alpha_d\in(0,1]$ such that $s^+ > 0$ and $\lambda^+_Q > 0$:
$$(x^+,\lambda^+_Q) = (x,\lambda_Q) + (\alpha_p\,\Delta x,\ \alpha_d\,\Delta\lambda_Q),\qquad s^+ = Ax^+ - b,\qquad \lambda^+_i = \frac{(s^+_Q)^T\lambda^+_Q/|Q|}{s^+_i},\quad i\notin Q.$$
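A sketch of the final update step; the slide only requires $\alpha_p,\alpha_d\in(0,1]$ keeping $s^+$ and $\lambda^+_Q$ positive, so the fraction-to-the-boundary factor used here is an assumption:

```python
import numpy as np

def cr_mpc_update(A, b, x, lam, Q, dx, dlam_Q, tau_frac=0.995):
    """Update step of the CR-MPC iteration sketched above.

    dx and dlam_Q are the mixed search directions; multipliers outside Q
    are reset from the reduced duality measure, as on the slide.
    """
    s = A @ x - b
    ds = A @ dx
    neg = ds < 0           # largest primal step keeping all m slacks positive
    alpha_p = min(1.0, tau_frac * np.min(-s[neg] / ds[neg])) if neg.any() else 1.0
    neg = dlam_Q < 0       # largest dual step keeping working-set multipliers positive
    alpha_d = min(1.0, tau_frac * np.min(-lam[Q][neg] / dlam_Q[neg])) if neg.any() else 1.0

    x_new = x + alpha_p * dx
    lam_new = lam.copy()
    lam_new[Q] = lam[Q] + alpha_d * dlam_Q
    s_new = A @ x_new - b
    mu_Q = s_new[Q] @ lam_new[Q] / len(Q)
    mask = np.ones(len(lam), dtype=bool)
    mask[Q] = False
    lam_new[mask] = mu_Q / s_new[mask]   # lambda_i^+ = mu^(Q,+) / s_i^+ for i not in Q
    return x_new, lam_new
```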

CR-MPC for CQP with Convergence Safeguards

The CR-MPC search direction is given by
$$(\Delta x, \Delta\lambda) = (\Delta x^{\rm a}, \Delta\lambda^{\rm a}) + \gamma\,(\Delta x^{\rm c}, \Delta\lambda^{\rm c}),$$
where the mixing parameter $\gamma$
- guarantees that $\Delta x$ is a descent direction for the primal objective function $f$ (indeed, in the CR context, this is critical to convergence);
- limits the effect of a too-large $\Delta x^{\rm c}$:
$$\gamma := \min\left\{\gamma_1,\ \frac{\tau\,\|\Delta x^{\rm a}\|}{\|\Delta x^{\rm c}\|},\ \frac{\tau\,\|\Delta x^{\rm a}\|}{\sigma\mu}\right\},$$
with
$$\gamma_1 := \arg\max_{\gamma\in[0,1]}\big\{\, f(x) - f(x + \Delta x^{\rm a} + \gamma\,\Delta x^{\rm c}) \ \ge\ \zeta\,\big(f(x) - f(x + \Delta x^{\rm a})\big)\,\big\},$$
where $f$ is the primal objective function and $\tau,\zeta\in(0,1)$.
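A possible implementation sketch of the safeguard; the exact rule for finding $\gamma_1$ (here simple backtracking rather than the exact maximizer) and the parameter values are assumptions:

```python
import numpy as np

def mixing_parameter(f, x, dx_a, dx_c, sigma_mu, tau=0.5, zeta=0.1, shrink=0.5):
    """Mixing parameter gamma for the CR-MPC direction (illustrative sketch).

    gamma_1 is approximated by backtracking from gamma = 1 until the descent
    condition f(x) - f(x + dx_a + gamma*dx_c) >= zeta*(f(x) - f(x + dx_a)) holds.
    """
    target = zeta * (f(x) - f(x + dx_a))
    gamma1 = 1.0
    while gamma1 > 1e-12 and f(x) - f(x + dx_a + gamma1 * dx_c) < target:
        gamma1 *= shrink
    cap = tau * np.linalg.norm(dx_a)
    return min(gamma1,
               cap / max(np.linalg.norm(dx_c), 1e-16),
               cap / max(sigma_mu, 1e-16))
```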

Regularized Normal Matrix

$M$ is invertible iff $[H\ \ A^T]$ has full rank, which can be guaranteed by pre-processing. HOWEVER, nonsingularity of $M^{(Q)}$ (and indeed unique solvability of the reduced linear systems) requires that the reduced matrix $[H\ \ A_Q^T]$ have full (numerical) rank, which is far from guaranteed.

Regularization: replace $M^{(Q)}$ with
$$M^{(Q)} := W + A_Q^T S_Q^{-1}\Lambda_Q A_Q,\qquad W := H + \varrho R,\quad R \succ 0,$$
and let $\varrho > 0$ go to zero as a solution of the optimization problem is approached. The choice $\varrho := \min\{1,\ E(x,\lambda)/\bar{E}\}$ and $R = I$ turns out to be adequate.

Here $E(x,\lambda)$ is a measure of the distance to optimality:
$$E(x,\lambda) := \|(v(x,\lambda),\, w(x,\lambda))\|,\qquad v(x,\lambda) := Hx + c - A^T\lambda,\qquad w_i(x,\lambda) := \min\{s_i, \lambda_i\},\quad i = 1,\dots,m.$$
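The regularization is easy to state in code; a sketch in the same numpy setting (the value of $\bar{E}$ is an assumed placeholder):

```python
import numpy as np

def error_measure(H, A, b, c, x, lam):
    """E(x, lambda) = ||(v, w)|| with v = Hx + c - A^T lam, w_i = min(s_i, lam_i)."""
    s = A @ x - b
    v = H @ x + c - A.T @ lam
    w = np.minimum(s, lam)
    return np.linalg.norm(np.concatenate((v, w)))

def regularized_reduced_normal_matrix(H, A, s, lam, Q, E, E_bar=1.0):
    """W + A_Q^T S_Q^{-1} Lambda_Q A_Q with W = H + rho*I, rho = min(1, E/E_bar).

    R = I as on the slide; E_bar = 1.0 is an assumed value.
    """
    rho = min(1.0, E / E_bar)
    A_Q, d_Q = A[Q, :], lam[Q] / s[Q]
    return H + rho * np.eye(H.shape[0]) + A_Q.T @ (d_Q[:, None] * A_Q)
```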

How to select Q?

Various constraint-selection rules have been used in the past. We propose a CONDITION on the selection rule that guarantees convergence of the regularized CR-MPC algorithm. This condition is met by all existing selection rules we are aware of.

Condition (CSR). The constraint-selection rule should be such that:
1. when $\{(x^k,\lambda^k)\}$ is bounded away from optimality, $Q_k$ eventually includes every active (primal) constraint at limit points $\bar{x}$ of $\{x^k\}$ when such limit points are approached;
2. when $\{x^k\}$ converges to a primal solution point $x^*$, $Q_k$ eventually includes every active constraint at $x^*$.

Some Previously Used Constraint-Selection Rules

Rule JOT [Jung, O'Leary, ALT: Adaptive constraint reduction for training support vector machines, Electronic Transactions on Numerical Analysis, Vol. 31, 156-177 (2008)]:
$$Q := \{i : s_i \le \eta\},$$
where $\eta$ is the $q$-th smallest slack $s_i$, $q$ being a certain decreasing function of the duality measure $\mu$ that saturates at $q = n$.

Rule FFK-CWH (for general NLP), proposed in [Chen, Wang, He: A feasible active set QP-free method for nonlinear programming, SIAM J. Optimization, 17(2), 401-429 (2006)]:
$$Q := \{i : s_i \le E(x,\lambda)\},$$
based on a result in [Facchinei, Fischer, Kanzow: On the accurate identification of active constraints, SIAM J. Optimization, 9(1), 14-32 (1998)]: $\|(x - x^*,\ \lambda - \lambda^*)\| / E(x,\lambda)$ is bounded in a neighborhood of $(x^*,\lambda^*)$.
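For reference, both rules are short in the dense numpy setting used earlier (illustrative sketches; the choice of $q$ for Rule JOT is left to the caller):

```python
import numpy as np

def rule_jot(s, q):
    """Rule JOT: Q = {i : s_i <= eta}, eta = q-th smallest slack.

    Choosing q as a decreasing function of the duality measure mu
    (saturating at q = n) is not shown here.
    """
    eta = np.partition(s, q - 1)[q - 1]
    return np.flatnonzero(s <= eta)

def rule_ffk_cwh(s, E):
    """Rule FFK-CWH: Q = {i : s_i <= E(x, lambda)}."""
    return np.flatnonzero(s <= E)
```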

Proposed Constraint-Selection Rule: Rule R

Parameters: $\delta > 0$, $0 < \beta < \theta < 1$.
Input: iteration $k$; slack variable $s^k$; error $E_{\min}$ (value of the error $E_k$ when $\delta_k$ was last reduced), $E_k := E(x^k,\lambda^k)$; threshold $\delta_{k-1}$.
Output: working set $Q_k$; threshold $\delta_k$; error $E_{\min}$.

if $k = 0$: $\delta_0 := \delta$, $E_{\min} := E_0$
else if $E_k \le \beta E_{\min}$: $\delta_k := \theta\,\delta_{k-1}$, $E_{\min} := E_k$
else: $\delta_k := \delta_{k-1}$

Select $Q_k := \{i \le m : s^k_i \le \delta_k\}$.

Theorem: Rule R satisfies Condition CSR.
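A direct transcription of Rule R into the same sketch setting (the parameter values are assumed examples satisfying $\delta > 0$, $0 < \beta < \theta < 1$):

```python
import numpy as np

def rule_r(k, s_k, E_k, state, delta=1.0, beta=0.5, theta=0.8):
    """Rule R (sketch).  `state` carries (delta_{k-1}, E_min) between calls."""
    if k == 0:
        delta_k, E_min = delta, E_k
    else:
        delta_prev, E_min = state
        if E_k <= beta * E_min:
            delta_k = theta * delta_prev
            E_min = E_k
        else:
            delta_k = delta_prev
    Q_k = np.flatnonzero(s_k <= delta_k)
    return Q_k, (delta_k, E_min)
```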

Convergence Theorem

Assumptions:
1. The primal strictly feasible set is non-empty; the primal solution set $F_P$ is non-empty and bounded. At every feasible point $x$, $A_{\mathcal{A}(x)}$ has full row rank. ($\mathcal{A}(x)$ denotes the active set at $x$.)
2. There exists a (unique) $x^*$ where SOSC with strict complementarity holds, with (unique) $\lambda^*$.

Theorem. Suppose that Condition CSR and Assumption 1 hold. Then $\{x^k\} \to F_P$. Suppose that, in addition, Assumption 2 holds. Then $Q_k$ contains $\mathcal{A}(x^*)$ for $k$ large enough. [With Rule R or Rule FFK-CWH, $Q_k = \mathcal{A}(x^*)$ for $k$ large enough.] Q-quadratic convergence: specifically, there exists $C > 0$ such that, given any initial point $(x^0,\lambda^0)$, there exists $\bar{k}$ such that, for all $k > \bar{k}$,
$$\|(x^{k+1} - x^*,\ \lambda^{k+1} - \lambda^*)\| \le C\,\|(x^k - x^*,\ \lambda^k - \lambda^*)\|^2.$$

Randomly Generated Problems

Problem setting:
$$\min_{x\in\mathbb{R}^n}\ \tfrac{1}{2}x^T H x + c^T x \quad\text{subject to}\quad Ax \ge b.$$

Entries of $A \sim N(0,1)$, $c \sim N(0,1)$, $x^0 \sim U(0,1)$, and $s^0 \sim U(1,2)$; $b := Ax^0 - s^0$.

$m := 10\,000$ and $n$ between 10 and 500. We consider the following two classes of Hessian matrices:
1. Strongly convex quadratic program: diagonal $H$, $\mathrm{diag}(H) \sim U(0,1)$.
2. Linear program: $H = 0$.

We solved 50 randomly generated problems for each class of $H$ and for each problem size, and report the results averaged over the 50 problems.
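The instance generator is straightforward to reproduce; a sketch (seed handling and function name are assumptions):

```python
import numpy as np

def random_instance(m=10_000, n=100, lp=False, seed=0):
    """Generate one random test instance as described above (sketch)."""
    rng = np.random.default_rng(seed)
    A = rng.standard_normal((m, n))      # A ~ N(0,1)
    c = rng.standard_normal(n)           # c ~ N(0,1)
    x0 = rng.uniform(0.0, 1.0, n)        # x^0 ~ U(0,1)
    s0 = rng.uniform(1.0, 2.0, m)        # s^0 ~ U(1,2)
    b = A @ x0 - s0                      # so that x^0 is strictly feasible
    H = np.zeros((n, n)) if lp else np.diag(rng.uniform(0.0, 1.0, n))
    return H, c, A, b, x0, s0
```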

Randomly Generated Problems 20 18 16 14 12 10 8 10 20 50 100 200 500 Iteration count 10 4 10 3 10 2 10 1 10 20 50 100 200 500 Size of working set (a) Strongly convex QP 10 2 10 1 10 0 10-1 10-2 10 20 50 100 200 500 Computation time 40 35 30 25 20 15 10 10 20 50 100 200 500 Iteration count 10 4 10 3 10 2 10 1 10 20 50 100 200 500 Size of working set (b) LP 10 2 10 1 10 0 10-1 10-2 10 20 50 100 200 500 Computation time

Data Fitting Problems

Regularized minimax data fitting problem:
$$\min_{\bar{x}\in\mathbb{R}^{\bar{n}}}\ \|\bar{A}\bar{x} - \bar{b}\|_\infty + \frac{1}{2\bar{\alpha}}\,\bar{x}^T\bar{H}\bar{x}
\quad\Longleftrightarrow\quad
\min_{\bar{x}\in\mathbb{R}^{\bar{n}},\,u\in\mathbb{R}}\ u + \frac{1}{2\bar{\alpha}}\,\bar{x}^T\bar{H}\bar{x}
\quad\text{subject to}\quad \bar{A}\bar{x} - \bar{b} \le u\mathbf{1},\ \ -\bar{A}\bar{x} + \bar{b} \le u\mathbf{1}.$$

$\bar{b}$: noisy data measurement from a target function $g$. $\bar{A}$: trigonometric basis; $\bar{x}$: expansion coefficients. $\bar{H}$: regularization matrix; $\bar{\alpha}$: penalty parameter.

$m = 10\,000$, $n$ from 10 to 500. For each choice of $g$ and for each problem size, we solved the problem 50 times and report the results averaged over the 50 problems.
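Recasting the fitting problem in the $Ax \ge b$ form used by the CR-MPC solver is a small bookkeeping step; a sketch (the naming and the $1/(2\bar{\alpha})$ scaling of the regularization term are assumptions):

```python
import numpy as np

def minimax_fit_qp(A_bar, b_bar, H_bar, alpha_bar):
    """Assemble the epigraph QP for the regularized minimax fit (sketch).

    Variables are (x_bar, u); the two constraint blocks above are rewritten
    in the Ax >= b form used throughout the talk.
    """
    m_bar, n_bar = A_bar.shape
    H = np.zeros((n_bar + 1, n_bar + 1))
    H[:n_bar, :n_bar] = H_bar / alpha_bar   # (1/2) x^T H x reproduces the 1/(2*alpha_bar) factor
    c = np.zeros(n_bar + 1)
    c[-1] = 1.0                              # objective term u
    ones = np.ones((m_bar, 1))
    #  A_bar x - b_bar <= u 1   <=>  [-A_bar  1] (x, u) >= -b_bar
    # -A_bar x + b_bar <= u 1   <=>  [ A_bar  1] (x, u) >=  b_bar
    A = np.vstack((np.hstack((-A_bar, ones)), np.hstack((A_bar, ones))))
    b = np.concatenate((-b_bar, b_bar))
    return H, c, A, b
```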

Data Fitting Problems 250 10 4 10 3 200 10 2 150 100 10 3 10 0 50 0 10 20 50 100 200 500 Iteration count 10 2 10 20 50 100 200 500 Size of working set (a) g(t) = sin(10t) cos(25t 2 ) 10-2 10 20 50 100 200 500 Computation time 250 10 4 10 3 200 10 2 150 100 10 3 10 0 50 0 10 2 10-2 10 20 50 100 200 500 10 20 50 100 200 500 10 20 50 100 200 500 Iteration count Size of working set Computation time (b) g(t) = sin(5t 3 ) cos 2 (10t)

Conclusions

- A convergent, constraint-reduced (CR) variant of Mehrotra's Predictor/Corrector for convex quadratic programming was stated and analyzed.
- A regularization scheme was used to account for CR-triggered rank deficiency away from solutions.
- A class of constraint-selection rules was defined by means of a sufficient condition (Condition CSR) that guarantees strong convergence properties for the resulting algorithm.
- A new selection rule was proposed, based on a modified version of an active-constraint identification function due to Facchinei et al.
- Numerical results were reported that show the benefit of CR on problems with many inequality constraints, and the power of the newly proposed selection rule.

The slides are available from http://www.ece.umd.edu/~andre