HESSIAN MATRICES OF PENALTY FUNCTIONS FOR SOLVING CONSTRAINED-OPTIMIZATION PROBLEMS


R 702 Philips Res. Repts 24, 322-330, 1969

by F. A. LOOTSMA

Abstract

This paper deals with the Hessian matrices of penalty functions, evaluated at their minimizing point. It is concerned with the condition number of these matrices for small values of a controlling parameter. At the end of the paper a comparison is made between different types of penalty functions on the grounds of the results obtained.

1. Classification of penalty-function techniques

Throughout this paper we shall be concerned with the problem

    minimize f(x) subject to g_i(x) ≥ 0; i = 1, ..., m,    (1.1)

where x denotes an element of the n-dimensional vector space E_n. We shall be assuming that the functions f, -g_1, ..., -g_m are convex and twice differentiable with continuous second-order partial derivatives on an open, convex subset V of E_n. The constraint set

    R = { x | g_i(x) ≥ 0; i = 1, ..., m }    (1.2)

is a bounded subset of V. The interior R_0 of R is non-empty.

We consider a number of penalty-function techniques for solving problem (1.1). One can distinguish two classes, both of which have been referred to by expressive names. The interior-point methods operate in the interior R_0 of R. The penalty function is given by

    B_r(x) = f(x) - r Σ_{i=1}^{m} φ[g_i(x)],    (1.3)

where φ is a concave function of one variable, say y. Its derivative φ' reads

    φ'(y) = y^{-ν},    (1.4)

with a positive integer ν. A point x(r) minimizing (1.3) over R_0 then exists for any r > 0. Any convergent sequence {x(r_k)}, where {r_k} is a monotonic, decreasing null sequence as k → ∞, converges to a solution of (1.1).

The exterior-point methods or outside-in methods present an approach to a minimum solution from outside the constraint set. The general form of the penalty function is given by

    L_r(x) = f(x) - r^{-1} Σ_{i=1}^{m} ψ[g_i(x)],    (1.5)

where ψ is a concave function of one variable y such that ψ(y) = 0 for y ≥ 0 and ψ(y) = ω(y) for y ≤ 0. The derivative ω' of ω is given by

    ω'(y) = (-y)^ν.    (1.6)

Let z(r) denote a point minimizing (1.5) over E_n. Any convergent sequence {z(r_k)}, where {r_k} denotes again a monotonic, decreasing null sequence, converges to a solution of (1.1).

It will be convenient to extend the terminology that we have been using in previous papers. Following Murray 5) we shall refer to interior-point penalty functions of the type (1.3) as barrier functions. The exterior-point penalty functions (1.5) will briefly be indicated as loss functions, a name which has also been used by Fiacco and McCormick 1). Furthermore, we introduce a classification based on the behaviour of the functions φ' and ω' in a neighbourhood of y = 0. A barrier function is said to be of order ν if the function φ' has a pole of order ν at y = 0. Similarly, a loss function is of order ν if the function ω' has a zero of order ν at y = 0.

2. Conditioning

An intriguing point is the choice of a penalty function for numerical purposes. We shall not repeat here all the arguments supporting the choice of the first-order penalty functions for computational purposes. Our concern is an argument which has been introduced only by Murray 5), namely the question of "conditioning". This is a qualification referring to the Hessian matrix of a penalty function. The motivation for such a study is the idea that failures of (second-order) unconstrained-minimization techniques may be due to ill-conditioning of the Hessian matrix at some iteration points.

Throughout this paper it is tacitly assumed that penalty functions are strictly convex, so that they have a unique minimizing point in their definition area. We shall primarily be concerned with the Hessian matrix of penalty functions at the minimizing point.
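As a minimal numerical illustration of the interior-point scheme of sec. 1 (a hypothetical sketch: the one-dimensional problem, its solution x̄ = 1 and multiplier ū = 2 are made up for this purpose, and ν = 1 gives the logarithmic barrier), one can minimize B_r for a decreasing null sequence {r_k} and watch x(r_k) approach the constrained minimum:

```python
# Hypothetical one-dimensional instance of problem (1.1):
#   minimize f(x) = (x - 2)^2   subject to   g(x) = 1 - x >= 0,
# with solution x̄ = 1. The logarithmic barrier (ν = 1) is
#   B_r(x) = (x - 2)^2 - r ln(1 - x).

def minimize_barrier(r, x=0.0):
    """Minimize B_r over the interior g(x) > 0 by damped Newton steps."""
    for _ in range(100):
        g = 1.0 - x
        d1 = 2.0 * (x - 2.0) + r / g        # B_r'(x)
        d2 = 2.0 + r / g**2                 # B_r''(x) > 0: B_r strictly convex
        step = d1 / d2
        while x - step >= 1.0:              # damp the step to stay interior
            step *= 0.5
        x -= step
        if abs(d1) < 1e-12:
            break
    return x

# A monotonic, decreasing null sequence {r_k}: x(r_k) tends to x̄ = 1,
# and the quotient r / g(x(r)) tends to the multiplier ū = 2.
for r in [1.0, 1e-2, 1e-4, 1e-6]:
    x_r = minimize_barrier(r)
    print(f"r = {r:7.0e}   x(r) = {x_r:.7f}   r/g = {r / (1.0 - x_r):.7f}")
```

The printed quotient r/g(x(r)) anticipates the multiplier estimates u_i(r) introduced in sec. 3.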
In what follows we shall refer to the Hessian matrix at the minimizing point as the principal Hessian matrix. The reason will be clear: in a neighbourhood of the minimizing point a useful approximation of a penalty function is given by a quadratic function, with the principal Hessian matrix as the coefficient matrix of the quadratic term. It is therefore reasonable to assume that unconstrained minimization may be obstructed by ill-conditioning of the principal Hessian matrix.

Conditioning of a matrix is measured by the condition number, which for symmetric, positive definite matrices is defined as the ratio of the greatest to the smallest eigenvalue. We are particularly interested in variations of the condition number of the principal Hessian matrix as a function of r in the case where r decreases to 0. The condition number is also affected by the order of the penalty function. These are, roughly, the points to be discussed in the present paper.

3. The principal Hessian matrix of penalty functions

The following analysis will be carried out under the uniqueness conditions formulated in a previous paper 4). They are sufficient for problem (1.1) to have a unique minimum solution x̄ with unique Lagrangian multipliers ū_1, ..., ū_m. We take ∇f and ∇²f to represent the gradient and the Hessian matrix of f, etc. It will be convenient to define

    D(x, u) = ∇²f(x) - Σ_{i=1}^{m} u_i ∇²g_i(x).

We think of the constraints as arranged in such a way that

    g_i(x̄) = 0; i = 1, ..., α,
    g_i(x̄) > 0; i = α+1, ..., m.

Hence, α stands for the number of active constraints at the minimum solution x̄.

Let us first consider the barrier functions of the type (1.3). The gradient of (1.3) vanishes at its minimizing point x(r), whence

    ∇f[x(r)] - r Σ_{i=1}^{m} g_i[x(r)]^{-ν} ∇g_i[x(r)] = 0.    (3.1)

Let u(r) denote the vector with components u_i(r), i = 1, ..., m, defined by

    u_i(r) = r g_i[x(r)]^{-ν}; i = 1, ..., m.    (3.2)

We have shown 4) that

    lim_{r↓0} u_i(r) = ū_i; i = 1, ..., m.    (3.3)

It is readily verified that the Hessian matrix of (1.3) evaluated at x(r) can be written as

    D[x(r), u(r)] + r^{-1/ν} G_1[x(r), u(r)],    (3.4)

where

    G_1(x, u) = ν Σ_{i=1}^{m} u_i^{1+1/ν} ∇g_i(x) ∇g_i(x)^T.

The gradients in the above expression are thought of as column vectors, whereas the index symbol T is used for transposition. The second term in (3.4) represents a sum of matrices of rank 1. A convenient approximation of (3.4) for small, positive values of r is given by

    H_1 = D(x̄, ū) + F_1(x̄, ū) + r^{-1/ν} G_1(x̄, ū),    (3.5)

where

    F_1(x̄, ū) = ( d G_1[x(r), u(r)] / d r^{1/ν} )_{r=0}

and

    G_1(x̄, ū) = ν Σ_{i=1}^{α} (ū_i)^{1+1/ν} ∇g_i(x̄) ∇g_i(x̄)^T.    (3.6)

The matrices D(x̄, ū) and G_1(x̄, ū) are symmetric and positive semi-definite. The uniqueness conditions 4) imposed on problem (1.1) imply that either D(x̄, ū) or G_1(x̄, ū) has rank n. Finally, it can be shown that F_1(x̄, ū) is symmetric and that y^T F_1(x̄, ū) y = 0 for any y such that G_1(x̄, ū) y = 0. From these arguments we obtain that the approximation (3.5) of the principal Hessian matrix (3.4) is positive definite and symmetric for sufficiently small, positive values of r.

The loss function (1.5) can be treated in a similar way. The gradient of (1.5) vanishes at the minimizing point z(r). For small values of r this leads to 4)

    ∇f[z(r)] - r^{-1} Σ_{i=1}^{α} {-g_i[z(r)]}^ν ∇g_i[z(r)] = 0.    (3.7)

Taking v(r) to denote the vector with components

    v_i(r) = r^{-1} {-g_i[z(r)]}^ν; i = 1, ..., α,

we have

    lim_{r↓0} v_i(r) = ū_i; i = 1, ..., α.

A useful approximation of the principal Hessian matrix is then given by

    H_2 = D(x̄, ū) + F_2(x̄, ū) + r^{-1/ν} G_2(x̄, ū),    (3.8)

where the matrix F_2(x̄, ū) is defined in a similar way as F_1(x̄, ū), and

    G_2(x̄, ū) = ν Σ_{i=1}^{α} (ū_i)^{1-1/ν} ∇g_i(x̄) ∇g_i(x̄)^T.    (3.9)

There is clearly a striking similarity in the form of H_1 defined by (3.5) and H_2 of (3.8).

4. Eigenvalues and condition number of the principal Hessian matrix

It will be convenient to confine our attention in this section to a matrix of the form

    D + F + ρ^{-1} G,    (4.1)

where D is a positive definite matrix of rank n, G a positive semi-definite matrix of rank α < n, and F such that y^T F y = 0 for all y satisfying G y = 0. The parameter ρ is a positive controlling parameter.

The eigenvalues of (4.1) are not affected by a coordinate transformation. Let us therefore transform D, F and G into matrices D', F' and G', respectively, in such a way that G' is a diagonal matrix. Then (4.1) reduces to

    D' + F' + ρ^{-1} [ G_11'  0 ]
                     [   0    0 ].    (4.2)

Here G_11' is a diagonal matrix with α rows and α columns, and positive diagonal elements γ_11', ..., γ_αα'. The partitioning of D' and F' is similar to that of G', so that D_11' has α rows and α columns, etc. For small values of ρ the eigenvalues of (4.2), and consequently the eigenvalues of (4.1), are given by

    λ_i ≈ ρ^{-1} γ_ii'; i = 1, ..., α,
    λ_i ≈ μ_i'; i = α+1, ..., n,

where μ_{α+1}', ..., μ_n' denote the eigenvalues of D_22'. The proof rests on the theorems of Gerschgorin (see Wilkinson 7), p. 71 ff.). Suppose that

    γ_11' ≥ γ_22' ≥ ... ≥ γ_αα'

and

    μ_{α+1}' ≥ μ_{α+2}' ≥ ... ≥ μ_n'.

For small values of ρ the condition number of (4.1), here the ratio of the greatest to the smallest eigenvalue, is given by

    ρ^{-1} γ_11' / μ_n',

so that the condition number is proportional to ρ^{-1}. Finally, the determinant of (4.1) varies with ρ^{-α}. Substituting ρ = r^{1/ν} one obtains the results of the next section. We may note that the condition number converges to the finite value γ_11'/γ_nn' (as ρ ↓ 0) in the particular case that α = n (if, for example, the problem is one of linear programming).

5. Comparison of first-order and higher-order penalty functions

The penalty functions of order ν have a principal Hessian matrix with a condition number proportional to r^{-1/ν} for small values of r. This implies that the first-order penalty functions (the logarithmic barrier function and the quadratic loss function) tend to become ill-conditioned more rapidly than any of the higher-order penalty functions. The inverse of the principal Hessian matrix, which plays a dominant part in the successful algorithm of Davidon, Fletcher and Powell 3) as well as in the variable-metric or quasi-Newton methods 6) derived from it, tends to become singular as r ↓ 0. For small values of r its determinant is proportional to r^{α/ν}. Here again, the first-order penalty functions have a disadvantage with respect to the higher-order penalty functions.

On the other hand, first-order penalty-function techniques converge faster than the higher-order ones. Generally, we have by the rule of De l'Hôpital that

    lim_{r↓0} ( f[x(r)] - f(x̄) ) / ( Σ_{i=1}^{α} ū_i g_i[x(r)] ) = 1.    (5.1)

Using (3.2) and (3.3) for the barrier functions we are led to

    f[x(r)] - f(x̄) ≈ r^{1/ν} Σ_{i=1}^{α} (ū_i)^{1-1/ν}.

Working with a loss function we obtain in a similar way that

    f[z(r)] - f(x̄) ≈ -r^{1/ν} Σ_{i=1}^{α} (ū_i)^{1+1/ν}.

Hence, a variation of the condition number proportional to r^{-1/ν} is compensated by a rate of convergence varying with r^{1/ν}. In other words, it is doubtful whether a higher-order penalty-function technique will give a substantial improvement in comparison with a first-order technique.
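The eigenvalue analysis of sec. 4, which underlies this comparison, is easy to check numerically. The following sketch uses made-up matrices (D a random positive definite matrix, G diagonal of rank α = 2, and F = 0 so that y^T F y = 0 holds trivially):

```python
# Synthetic check of sec. 4 (all matrices made up): for D + F + G/rho with
# rank-α G, the α largest eigenvalues behave like γ_ii'/rho, the remaining
# n - α approach the eigenvalues of D_22', so the condition number is
# proportional to 1/rho.

import numpy as np

n, alpha = 4, 2
rng = np.random.default_rng(0)
A = rng.standard_normal((n, n))
D = A @ A.T + n * np.eye(n)            # positive definite of rank n
G = np.diag([3.0, 2.0, 0.0, 0.0])      # positive semi-definite of rank α = 2
F = np.zeros((n, n))                   # y'Fy = 0 for all y, trivially

mu = np.linalg.eigvalsh(D[alpha:, alpha:])   # eigenvalues of D_22'

for rho in [1e-1, 1e-3, 1e-6]:
    lam = np.linalg.eigvalsh(D + F + G / rho)      # ascending order
    print(f"rho = {rho:.0e}   rho*lam_max = {rho * lam[-1]:.4f}   "
          f"lam_min = {lam[0]:.4f}   rho*cond = {rho * lam[-1] / lam[0]:.4f}")
```

For decreasing rho, rho times the greatest eigenvalue approaches the greatest diagonal element of G (here 3.0), the smallest eigenvalue approaches the smallest eigenvalue of D_22', and rho times the condition number settles at a constant, in line with the proportionality to ρ^{-1}.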

6. Balancing the constraints

The condition number of a principal Hessian matrix depends almost entirely on the greatest eigenvalue of (3.6), if a barrier function is employed, or of (3.9), if we are dealing with a loss function. It would apparently be a nuisance if this greatest eigenvalue differs considerably from the remaining eigenvalues of these matrices. The question arises whether we can balance the (active) constraints in such a way that the positive eigenvalues of (3.6) or (3.9) are of the same order of magnitude.

We confine ourselves to first-order penalty functions. The matrices (3.6) and (3.9) are then reduced to

    Σ_{i=1}^{α} (ū_i)² ∇g_i(x̄) ∇g_i(x̄)^T    (6.1)

and

    Σ_{i=1}^{α} ∇g_i(x̄) ∇g_i(x̄)^T,    (6.2)

respectively. Let us first consider the case that the gradients ∇g_i(x̄), i = 1, ..., α, are orthogonal. This is not the general case, but the orthogonality hypothesis leads to some understanding of the causes of imbalance. It follows easily that (6.1) has α positive eigenvalues given by

    ( ū_i ||∇g_i(x̄)|| )²; i = 1, ..., α.    (6.3)

Using the Kuhn-Tucker relations and the orthogonality relations one can readily show that

    ū_i = β_i ||∇f(x̄)|| / ||∇g_i(x̄)||; i = 1, ..., α,

where β_i is the cosine of the angle between ∇f(x̄) and ∇g_i(x̄). The positive eigenvalues of (6.1) are accordingly given by

    ( β_i ||∇f(x̄)|| )²; i = 1, ..., α,    (6.4)

which demonstrates that imbalance is only due to the angles between the gradients of the objective function and the active constraints. In a similar way we obtain that the positive eigenvalues of (6.2) can be written as

    ||∇g_i(x̄)||²; i = 1, ..., α.    (6.5)

Imbalance of the active constraints is here only produced by differences in the lengths of the gradients. Hence, a problem may be unbalanced with respect to the logarithmic barrier function, but balanced with respect to the quadratic loss function, and vice versa.
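Formulas (6.3)-(6.5) can be verified directly on made-up data: two orthogonal active-constraint gradients of different lengths, multipliers ū_i chosen freely, and ∇f(x̄) constructed from the Kuhn-Tucker relation ∇f(x̄) = Σ ū_i ∇g_i(x̄):

```python
# Made-up illustrative data: because ∇f is built from the Kuhn-Tucker
# relation and the ∇g_i are orthogonal, the eigenvalue formulas (6.4)
# and (6.5) hold exactly.

import numpy as np

grad_g = [np.array([2.0, 0.0, 0.0]),     # ∇g_1(x̄)
          np.array([0.0, 0.5, 0.0])]     # ∇g_2(x̄), orthogonal to ∇g_1
u_bar = [1.5, 0.8]                        # multipliers ū_i
grad_f = sum(u * g for u, g in zip(u_bar, grad_g))    # Kuhn-Tucker relation

M1 = sum(u**2 * np.outer(g, g) for u, g in zip(u_bar, grad_g))   # matrix (6.1)
M2 = sum(np.outer(g, g) for g in grad_g)                          # matrix (6.2)

beta = [grad_f @ g / (np.linalg.norm(grad_f) * np.linalg.norm(g))
        for g in grad_g]                  # cosines of the angles with ∇f

eig1 = np.linalg.eigvalsh(M1)[-2:]        # positive eigenvalues of (6.1)
eig2 = np.linalg.eigvalsh(M2)[-2:]        # positive eigenvalues of (6.2)
pred1 = sorted((b * np.linalg.norm(grad_f))**2 for b in beta)     # (6.4)
pred2 = sorted(np.linalg.norm(g)**2 for g in grad_g)              # (6.5)
print(eig1, pred1)    # imbalance driven by the angles only
print(eig2, pred2)    # imbalance driven by the gradient lengths only
```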

The balance of the active constraints is restored if the logarithmic barrier function is taken to be

    f(x) - r Σ_{i=1}^{m} p_i ln g_i(x),    (6.6)

and if the weight factors p_i are chosen as

    p_i = ( ū_i ||∇g_i(x̄)|| )^{-2}; i = 1, ..., α,
    p_i = 0 or 1; i = α+1, ..., m.    (6.7)

Then (6.1) is reduced to a matrix of the form

    Σ_{i=1}^{α} a_i a_i^T,    (6.8)

where the vectors a_1, ..., a_α are orthogonal and normalized to length 1.

The quantities appearing in the right-hand side of (6.7) are not available at the beginning of the computational process for solving (1.1). Convenient approximations, however, are gradually obtained at successive r-minima. One could start with the barrier function (6.6) with p_i = 1 for all i = 1, ..., m. Imbalance of the constraints is then indicated by differences in the quantities

    ( u_i(r) ||∇g_i[x(r)]|| )²; i = 1, ..., α,    (6.9)

for small values of r. Using (6.9) to modify the weight factors one can restart the computations with an improved equilibrium. A similar device can of course be employed in the case that the quadratic loss function is used.

Let us finally depart from the orthogonality hypothesis, and consider what happens if the weight factors (6.7) are employed in the case that the gradients ∇g_i(x̄), i = 1, ..., α, are independent but not necessarily orthogonal. Imbalance is still governed by a matrix of the form (6.8), with vectors a_1, ..., a_α normalized to length 1. This matrix can also be written as A A^T, where A is the matrix with columns a_1, ..., a_α. The positive eigenvalues of A A^T are given by the eigenvalues of the matrix A^T A, which has, as a consequence of the normalization, diagonal elements equal to 1. Hence, the positive eigenvalues of A A^T sum up to α. Orthogonality of the vectors a_1, ..., a_α implies that the eigenvalues are all equal to 1. The continuity of the eigenvalues ensures that small perturbations of the orthogonality will only lead to small deviations of the eigenvalues from unity. The above device of choosing weight factors can therefore frequently be used in order to balance the constraints, even in the case that the gradients of the active constraints are not orthogonal.
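A small numerical sketch of this weighting device (the gradients and multipliers below are made up, with the gradients nearly but not exactly orthogonal): with the weight factors (6.7), the badly spread eigenvalues of (6.1) collapse to the eigenvalues of A A^T, which remain close to 1:

```python
# Made-up example: three nearly orthogonal active-constraint gradients
# (columns of grad_g) with badly spread multipliers. The weights
# p_i = (ū_i ||∇g_i||)^{-2} of (6.7) reduce matrix (6.1) to A A^T with
# normalized columns a_i, as in (6.8).

import numpy as np

grad_g = np.array([[2.0, 0.0, 0.1],
                   [0.2, 5.0, 0.0],
                   [0.0, 0.5, 0.0],
                   [0.0, 0.0, 3.0]])      # column i holds ∇g_i(x̄)
u_bar = np.array([0.5, 4.0, 20.0])        # multipliers ū_i
alpha = u_bar.size

C = (grad_g * u_bar**2) @ grad_g.T        # unbalanced matrix (6.1)
p = (u_bar * np.linalg.norm(grad_g, axis=0)) ** -2.0    # weights (6.7)
B = (grad_g * (p * u_bar**2)) @ grad_g.T                # balanced matrix (6.8)
A = grad_g / np.linalg.norm(grad_g, axis=0)             # unit columns a_i

lam_C = np.linalg.eigvalsh(C)[-alpha:]    # spread over orders of magnitude
lam_B = np.linalg.eigvalsh(B)[-alpha:]    # all close to 1, summing to α
print("unbalanced:", lam_C)
print("balanced:  ", lam_B)
```

As the paper argues via A^T A, the balanced eigenvalues sum to α and stay near unity as long as the a_i are only mildly non-orthogonal.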

Acknowledgement

The author wishes to thank Prof. Dr J. F. Benders (Technological University, Eindhoven) and Dr J. D. Pearson (Information Systems and Automation) for the inspiring discussions and criticisms.

Eindhoven, May 1969

REFERENCES

1) A. V. Fiacco and G. P. McCormick, Nonlinear programming: sequential unconstrained minimization techniques, Wiley, New York, 1968.
2) R. Fletcher and A. P. McCann, Acceleration techniques for non-linear programming, Paper presented at the conference on optimization, Keele, 1968.
3) R. Fletcher and M. J. D. Powell, The Computer Journal 6, 163-168, 1963.
4) F. A. Lootsma, Philips Res. Repts 23, 408-423, 1968.
5) W. Murray, Ill-conditioning in barrier and penalty functions arising in constrained nonlinear programming, Paper presented at the sixth int. symp. on math. programming, Princeton, N.J., 1967.
6) J. D. Pearson, The Computer Journal 12, 171-178, 1969.
7) J. H. Wilkinson, The algebraic eigenvalue problem, Clarendon Press, Oxford, 1965.
8) W. I. Zangwill, Management Science 13, 344-358, 1967.