R 702 Philips Res. Repts 24, 322-330, 1969

HESSIAN MATRICES OF PENALTY FUNCTIONS FOR SOLVING CONSTRAINED-OPTIMIZATION PROBLEMS

by F. A. LOOTSMA

Abstract

This paper deals with the Hessian matrices of penalty functions, evaluated at their minimizing point. It is concerned with the condition number of these matrices for small values of a controlling parameter. At the end of the paper a comparison is made between different types of penalty functions on the grounds of the results obtained.

1. Classification of penalty-function techniques

Throughout this paper we shall be concerned with the problem

    minimize f(x) subject to g_i(x) ≥ 0;  i = 1, ..., m,        (1.1)

where x denotes an element of the n-dimensional vector space E_n. We shall be assuming that the functions f, −g_1, ..., −g_m are convex and twice differentiable with continuous second-order partial derivatives on an open, convex subset V of E_n. The constraint set

    R = { x | g_i(x) ≥ 0;  i = 1, ..., m }        (1.2)

is a bounded subset of V. The interior R_0 of R is non-empty.

We consider a number of penalty-function techniques for solving problem (1.1). One can distinguish two classes, both of which have been referred to by expressive names. The interior-point methods operate in the interior R_0 of R. The penalty function is given by

    B_r(x) = f(x) − r Σ_{i=1}^m φ[g_i(x)],        (1.3)

where φ is a concave function of one variable, say y. Its derivative φ′ reads

    φ′(y) = y^{−ν}        (1.4)

with a positive integer ν. A point x(r) minimizing (1.3) over R_0 exists then for any r > 0. Any convergent sequence {x(r_k)}, where {r_k} is a monotonic, decreasing null sequence as k → ∞, converges to a solution of (1.1).

The exterior-point methods or outside-in methods present an approach to a minimum solution from outside the constraint set. The general form of the
penalty function is given by

    L_r(x) = f(x) − r^{−1} Σ_{i=1}^m ψ[g_i(x)],        (1.5)

where ψ is a concave function of one variable y, such that ψ(y) = 0 for y ≥ 0 and ψ(y) = ω(y) for y ≤ 0. The derivative ω′ of ω is given by

    ω′(y) = (−y)^ν.        (1.6)

Let z(r) denote a point minimizing (1.5) over E_n. Any convergent sequence {z(r_k)}, where {r_k} denotes again a monotonic, decreasing null sequence, converges to a solution of (1.1).

It will be convenient to extend the terminology that we have been using in previous papers. Following Murray 5) we shall refer to interior-point penalty functions of the type (1.3) as barrier functions. The exterior-point penalty functions (1.5) will briefly be indicated as loss functions, a name which has also been used by Fiacco and McCormick 1). Furthermore, we introduce a classification based on the behaviour of the functions φ′ and ω′ in a neighbourhood of y = 0. A barrier function is said to be of order ν if the function φ′ has a pole of order ν at y = 0. Similarly, a loss function is of order ν if the function ω′ has a zero of order ν at y = 0.

2. Conditioning

An intriguing point is the choice of a penalty function for numerical purposes. We shall not repeat here all the arguments supporting the choice of the first-order penalty functions for computational purposes. Our concern is an argument which has been introduced only by Murray 5), namely the question of "conditioning". This is a qualification referring to the Hessian matrix of a penalty function. The motivation for such a study is the idea that failures of (second-order) unconstrained-minimization techniques may be due to ill-conditioning of the Hessian matrix at some iteration points.

Throughout this paper it is tacitly assumed that penalty functions are strictly convex, so that they have a unique minimizing point in their definition area. We shall primarily be concerned with the Hessian matrix of penalty functions at the minimizing point.
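The concern can be made concrete by a small numerical sketch (a hypothetical illustration, not from the paper): for the convex problem minimize (x_1 − 2)² + (x_2 − 2)² subject to g(x) = 1 − x_1 − x_2 ≥ 0, the logarithmic barrier function f(x) − r ln g(x) is minimized by a damped Newton iteration, and the condition number of its Hessian at x(r) is printed as r decreases.

```python
import numpy as np

# Hypothetical illustration (not from the paper): minimize
#   f(x) = (x1 - 2)^2 + (x2 - 2)^2   subject to   g(x) = 1 - x1 - x2 >= 0,
# with the first-order (logarithmic) barrier B_r(x) = f(x) - r*ln g(x).

def B(x, r):
    """Barrier-function value; defined for interior points only."""
    return ((x - 2.0) ** 2).sum() - r * np.log(1.0 - x.sum())

def grad_hess(x, r):
    """Gradient and Hessian of B_r (g is linear, so the f-part is 2*I)."""
    g = 1.0 - x.sum()
    dg = np.array([-1.0, -1.0])                  # gradient of g
    grad = 2.0 * (x - 2.0) - (r / g) * dg
    hess = 2.0 * np.eye(2) + (r / g ** 2) * np.outer(dg, dg)
    return grad, hess

def minimize_barrier(r, x0):
    """Damped Newton iteration, keeping the iterates interior."""
    x = x0.copy()
    for _ in range(100):
        grad, hess = grad_hess(x, r)
        if np.linalg.norm(grad) < 1e-10:
            break
        step = np.linalg.solve(hess, grad)
        t = 1.0
        for _ in range(60):                      # backtracking line search
            xn = x - t * step
            if 1.0 - xn.sum() > 0.0 and B(xn, r) < B(x, r):
                break
            t *= 0.5
        x = x - t * step
    return x

conds = []
x = np.array([0.0, 0.0])                         # interior starting point
for r in (1e-1, 1e-2, 1e-3, 1e-4):
    x = minimize_barrier(r, x)                   # warm start at previous x(r)
    _, hess = grad_hess(x, r)
    conds.append(np.linalg.cond(hess))
    print(f"r = {r:.0e}   cond = {conds[-1]:.2e}")
```

With multiplier ū = 3 the two eigenvalues behave like 2 and 2 + 2ū²/r, so the printed condition numbers grow by roughly a factor 10 per step: the ill-conditioning for decreasing r that the analysis below quantifies.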
In what follows we shall refer to this matrix as the principal Hessian matrix. The reason will be clear. In a neighbourhood of the minimizing point a useful approximation of a penalty function is given by a quadratic function, with the principal Hessian matrix as the coefficient matrix of the quadratic term. It is therefore reasonable to assume that unconstrained minimization may be obstructed by ill-conditioning of the principal Hessian matrix.

Conditioning of a matrix is measured by the condition number, which for symmetric, positive definite matrices is defined as the ratio of the greatest to the smallest eigenvalue. We are particularly interested in variations of the condition number of the principal Hessian matrix as a function of r in the case where r decreases to 0. The condition number is also affected by the order of the penalty function. These are, roughly, the points to be discussed in the present paper.

3. The principal Hessian matrix of penalty functions

The following analysis will be carried out under the uniqueness conditions formulated in a previous paper 4). They are sufficient for problem (1.1) to have a unique minimum solution x̄ with unique Lagrangian multipliers ū_1, ..., ū_m. We take ∇f and ∇²f to represent the gradient and the Hessian matrix of f, etc. It will be convenient to define

    D(x, u) = ∇²f(x) − Σ_{i=1}^m u_i ∇²g_i(x).

We think of the constraints as arranged in such a way that

    g_i(x̄) = 0;  i = 1, ..., α,
    g_i(x̄) > 0;  i = α + 1, ..., m.

Hence, α stands for the number of active constraints at the minimum solution x̄.

Let us, first, consider the barrier functions of the type (1.3). The gradient of (1.3) vanishes at its minimizing point x(r), whence

    ∇f[x(r)] − r Σ_{i=1}^m g_i[x(r)]^{−ν} ∇g_i[x(r)] = 0.        (3.1)

Let u(r) denote the vector with components u_i(r), i = 1, ..., m, defined by

    u_i(r) = r g_i[x(r)]^{−ν};  i = 1, ..., m.        (3.2)

We have shown 4) that

    lim_{r↓0} u_i(r) = ū_i;  i = 1, ..., m.        (3.3)

It is readily verified that the Hessian matrix of (1.3) evaluated at x(r) can be written as

    D[x(r), u(r)] + r^{−1/ν} G_1[x(r), u(r)],        (3.4)
where

    G_1(x, u) = ν Σ_{i=1}^m (u_i)^{1+1/ν} ∇g_i(x) ∇g_i(x)^T.

The gradients in the above expression are thought of as column vectors, whereas the index symbol T is used for transposition. The second term in (3.4) represents a sum of matrices of rank 1. A convenient approximation of (3.4) for small, positive values of r is given by

    H_1 = D(x̄, ū) + F_1(x̄, ū) + r^{−1/ν} G_1(x̄, ū),        (3.5)

where

    G_1(x̄, ū) = ν Σ_{i=1}^α (ū_i)^{1+1/ν} ∇g_i(x̄) ∇g_i(x̄)^T        (3.6)

(the terms with i > α vanish since ū_i = 0 for the inactive constraints) and

    F_1(x̄, ū) = ( d G_1[x(r), u(r)] / d r^{1/ν} )_{r=0}.

The matrices D(x̄, ū) and G_1(x̄, ū) are symmetric and positive semi-definite. The uniqueness conditions 4) imposed on problem (1.1) imply that either D(x̄, ū) or G_1(x̄, ū) has rank n. Finally, it can be shown that F_1(x̄, ū) is symmetric and that

    y^T F_1(x̄, ū) y = 0

for any y such that G_1(x̄, ū) y = 0. From these arguments we obtain that the approximation (3.5) of the principal Hessian matrix (3.4) is positive definite and symmetric for sufficiently small, positive values of r.

The loss function (1.5) can be treated in a similar way. The gradient of (1.5) vanishes at the minimizing point z(r). For small values of r this leads to 4)

    ∇f[z(r)] − r^{−1} Σ_{i=1}^α {−g_i[z(r)]}^ν ∇g_i[z(r)] = 0.        (3.7)

Taking v(r) to denote the vector with components

    v_i(r) = r^{−1} {−g_i[z(r)]}^ν;  i = 1, ..., α,

we have

    lim_{r↓0} v_i(r) = ū_i;  i = 1, ..., α.

A useful approximation of the principal Hessian matrix is then given by

    H_2 = D(x̄, ū) + F_2(x̄, ū) + r^{−1/ν} G_2(x̄, ū),        (3.8)
where the matrix F_2(x̄, ū) is defined in a similar way as F_1(x̄, ū), and

    G_2(x̄, ū) = ν Σ_{i=1}^α (ū_i)^{1−1/ν} ∇g_i(x̄) ∇g_i(x̄)^T.        (3.9)

There is clearly a striking similarity in the form of H_1 defined by (3.5) and H_2 of (3.8).

4. Eigenvalues and condition number of the principal Hessian matrix

It will be convenient to confine our attention in this section to a matrix of the form

    D + F + p^{−1} G,        (4.1)

where D is a positive definite matrix of rank n, G a positive semi-definite matrix of rank α < n, and F such that y^T F y = 0 for all y satisfying G y = 0. The parameter p is a positive controlling parameter.

The eigenvalues of (4.1) are not affected by a coordinate transformation. Let us therefore transform D, F and G into matrices D′, F′ and G′, respectively, in such a way that G′ is a diagonal matrix. Then (4.1) reduces to

    ( D_11′ + F_11′ + p^{−1} G_11′    D_12′ + F_12′ )
    ( D_21′ + F_21′                   D_22′        ).        (4.2)

Here G_11′ is a diagonal matrix with α rows and α columns, and positive diagonal elements γ_11′, ..., γ_αα′. The partitioning of D′ and F′ is similar to that of G′, so that D_11′ has α rows and α columns, etc.; the block F_22′ vanishes, since F (like F_1 and F_2) is symmetric and y^T F y = 0 whenever G y = 0. For small values of p the eigenvalues of (4.2), and consequently the eigenvalues of (4.1), are given by

    λ_i ≈ p^{−1} γ_ii′;  i = 1, ..., α,
    λ_i ≈ μ_i′;  i = α + 1, ..., n,

where μ_{α+1}′, ..., μ_n′ denote the eigenvalues of D_22′. The proof rests on the theorems of Gerschgorin (see Wilkinson 7), p. 71 ff.). Suppose that

    γ_11′ ≥ γ_22′ ≥ ... ≥ γ_αα′   and   μ_{α+1}′ ≥ ... ≥ μ_n′.

For small values of p the condition number of (4.1), here the ratio of the greatest to the smallest eigenvalue, is given by

    p^{−1} γ_11′ / μ_n′,
so that the condition number is proportional to p^{−1}. Finally, the determinant of (4.1) varies with p^{−α}. Substituting p = r^{1/ν} one obtains the results of the next section. We may note that the condition number converges to the finite value γ_11′/γ_nn′ (as p ↓ 0) in the particular case that α = n (if, for example, the problem is one of linear programming).

5. Comparison of first-order and higher-order penalty functions

The penalty functions of order ν have a principal Hessian matrix with a condition number proportional to r^{−1/ν} for small values of r. This implies that the first-order penalty functions (the logarithmic barrier function and the quadratic loss function) tend to become ill-conditioned more rapidly than any of the higher-order penalty functions. The inverse of the principal Hessian matrix, which plays a dominant part in the successful algorithm of Davidon, Fletcher and Powell 3) as well as in the variable-metric or quasi-Newton methods 6) derived from it, tends to become singular as r ↓ 0. For small values of r its determinant is proportional to r^{α/ν}. Here again, the first-order penalty functions have a disadvantage with respect to higher-order penalty functions.

On the other hand, first-order penalty-function techniques converge faster than the higher-order ones. Generally, we have by the rule of De l'Hôpital that

    lim_{r↓0}  ( f[x(r)] − f(x̄) ) / ( Σ_{i=1}^α ū_i g_i[x(r)] ) = 1.        (5.1)

Using (3.2) and (3.3) for the barrier functions we are led to

    f[x(r)] − f(x̄) ≈ r^{1/ν} Σ_{i=1}^α (ū_i)^{1−1/ν}.

Working with a loss function we obtain in a similar way that

    f[z(r)] − f(x̄) ≈ −r^{1/ν} Σ_{i=1}^α (ū_i)^{1+1/ν}.

Hence, a variation of the condition number proportional to r^{−1/ν} is compensated by a rate of convergence varying with r^{1/ν}. In other words, it is doubtful whether a higher-order penalty-function technique will give a substantial improvement in comparison with a first-order technique.
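The r^{1/ν} convergence rate can be checked numerically. The sketch below (a hypothetical illustration, not from the paper) treats the one-variable problem minimize x subject to x ≥ 0 and 1 − x ≥ 0, with solution x̄ = 0 and multiplier ū_1 = 1, so that the predicted error is r^{1/ν} ū_1^{1−1/ν} = r^{1/ν}.

```python
import numpy as np

# Hypothetical illustration (not from the paper): the one-variable problem
#   minimize x   subject to   x >= 0  and  1 - x >= 0,
# with solution xbar = 0 and multiplier ubar_1 = 1. For a barrier function
# of order nu the minimizing point x(r) satisfies
#   1 - r*x**(-nu) + r*(1 - x)**(-nu) = 0.

def barrier_minimizer(r, nu):
    """Find x(r) by bisection; the left-hand side above is strictly
    increasing on (0, 1/2], negative near 0 and equal to 1 at x = 1/2."""
    h = lambda x: 1.0 - r * x ** (-nu) + r * (1.0 - x) ** (-nu)
    lo, hi = 1e-15, 0.5
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if h(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

for nu in (1, 2):
    for r in (1e-2, 1e-4, 1e-6):
        x = barrier_minimizer(r, nu)
        # error f[x(r)] - f(xbar) = x(r); predicted rate r**(1/nu)
        print(f"nu = {nu}   r = {r:.0e}   "
              f"error = {x:.3e}   r^(1/nu) = {r ** (1.0 / nu):.0e}")
```

For ν = 1 the error falls off like r itself, for ν = 2 only like √r: the better-conditioned second-order barrier pays with a slower approach to f(x̄), which is precisely the compensation described above.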
6. Balancing the constraints

The condition number of a principal Hessian matrix depends almost entirely on the greatest eigenvalue of (3.6), if a barrier function is employed, or (3.9), if we are dealing with a loss function. It would apparently be a nuisance if this greatest eigenvalue differs considerably from the remaining eigenvalues of these matrices. The question arises whether we can balance the (active) constraints in such a way that the positive eigenvalues of (3.6) or (3.9) are of the same order of magnitude.

We confine ourselves to first-order penalty functions. The matrices (3.6) and (3.9) are then reduced to

    Σ_{i=1}^α (ū_i)² ∇g_i(x̄) ∇g_i(x̄)^T        (6.1)

and

    Σ_{i=1}^α ∇g_i(x̄) ∇g_i(x̄)^T,        (6.2)

respectively.

Let us first consider the case that the gradients ∇g_i(x̄), i = 1, ..., α, are orthogonal. This is not the general case, but the orthogonality hypothesis leads to some understanding of the causes of imbalance. It follows easily that (6.1) has α positive eigenvalues given by

    (ū_i ‖∇g_i(x̄)‖)²;  i = 1, ..., α.        (6.3)

Using the Kuhn-Tucker relations and the orthogonality relations one can readily show that

    ū_i = β_i ‖∇f(x̄)‖ / ‖∇g_i(x̄)‖;  i = 1, ..., α,

where β_i is the cosine of the angle between ∇f(x̄) and ∇g_i(x̄). The positive eigenvalues of (6.1) are accordingly given by

    (β_i ‖∇f(x̄)‖)²;  i = 1, ..., α,        (6.4)

which demonstrates that imbalance is only due to the angles between the gradients of the objective function and the active constraints. In a similar way we obtain that the positive eigenvalues of (6.2) can be written as

    ‖∇g_i(x̄)‖²;  i = 1, ..., α.        (6.5)

Imbalance of the active constraints is here only produced by differences in the lengths of the gradients. Hence, a problem may be unbalanced with respect to the logarithmic barrier function, but balanced with respect to the quadratic loss function, and vice versa.
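The two sources of imbalance can be illustrated numerically (a hypothetical sketch with invented data, not from the paper): two orthogonal constraint gradients of equal length, with multipliers fixed through the Kuhn-Tucker relation, give a problem that is badly unbalanced for the logarithmic barrier function but perfectly balanced for the quadratic loss function.

```python
import numpy as np

# Hypothetical data (invented for illustration, not from the paper): two
# active constraints with orthogonal gradients of equal length at xbar,
# and multipliers ubar fixed through the Kuhn-Tucker relation
#   grad f(xbar) = sum_i ubar_i * grad g_i(xbar).
dg = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
u = np.array([10.0, 0.1])
df = sum(ui * gi for ui, gi in zip(u, dg))

# Matrix (6.1), governing the logarithmic barrier function:
M1 = sum(ui ** 2 * np.outer(gi, gi) for ui, gi in zip(u, dg))
# Matrix (6.2), governing the quadratic loss function:
M2 = sum(np.outer(gi, gi) for gi in dg)

print(np.linalg.eigvalsh(M1))   # 0.01 and 100: badly unbalanced
print(np.linalg.eigvalsh(M2))   # 1 and 1: perfectly balanced

# Check (6.4): the eigenvalues of (6.1) equal (beta_i * ||grad f||)^2,
# where beta_i is the cosine of the angle between grad f and grad g_i.
beta = np.array([gi @ df / (np.linalg.norm(gi) * np.linalg.norm(df))
                 for gi in dg])
print(np.sort((beta * np.linalg.norm(df)) ** 2))   # again 0.01 and 100
```

The equal gradient lengths make (6.2) perfectly balanced, while the disparate multipliers (equivalently, the angles β_i) spread the eigenvalues of (6.1) over four orders of magnitude; interchanging the roles of lengths and multipliers produces the opposite situation.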
The balance of the active constraints is restored if the logarithmic barrier function is taken to be

    f(x) − r Σ_{i=1}^m ρ_i ln g_i(x),        (6.6)

and if the weight factors ρ_i are chosen by

    ρ_i = (ū_i ‖∇g_i(x̄)‖)^{−2};  i = 1, ..., α,
    ρ_i = 0 or 1;  i = α + 1, ..., m.        (6.7)

Then (6.1) is reduced to a matrix of the form

    Σ_{i=1}^α a_i a_i^T,        (6.8)

where the vectors a_1, ..., a_α are orthogonal and normalized to length 1.

The quantities appearing in the right-hand side of (6.7) are not available at the beginning of the computational process for solving (1.1). Convenient approximations, however, are gradually obtained at successive r-minima. One could start with the barrier function (6.6) and with ρ_i = 1 for all i = 1, ..., m. Imbalance of the constraints is then indicated by differences in the quantities

    (u_i(r) ‖∇g_i[x(r)]‖)²;  i = 1, ..., α,        (6.9)

for small values of r. Using (6.9) to modify the weight factors one can restart the computations with an improved equilibrium. A similar device can of course be employed in the case that the quadratic loss function is used.

Let us finally depart from the orthogonality hypothesis, and consider what happens if the weight factors (6.7) are employed in the case that the gradients ∇g_i(x̄), i = 1, ..., α, are independent but not necessarily orthogonal. Imbalance is still governed by a matrix of the form (6.8) with vectors a_1, ..., a_α normalized to length 1. This matrix can also be written as A A^T, where A is the matrix with columns a_1, ..., a_α. The positive eigenvalues of A A^T are given by the eigenvalues of the matrix A^T A, which has, as a consequence of the normalization, diagonal elements equal to 1. Hence, the positive eigenvalues of A A^T sum up to α. Orthogonality of the vectors a_1, ..., a_α implies that the eigenvalues are all equal to 1. The continuity of the eigenvalues ensures that small perturbations of the orthogonality will only lead to small deviations of the eigenvalues from unity.
The above device of choosing weight factors can therefore frequently be used in order to balance the constraints, even in the case that the gradients of the active constraints are not orthogonal.
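The weight-factor device can be sketched numerically (hypothetical data, not from the paper): scaling each rank-one term of (6.1) by ρ_i = (ū_i ‖∇g_i(x̄)‖)^{−2} as in (6.7) reduces it to Σ a_i a_i^T with unit vectors a_i = ∇g_i(x̄)/‖∇g_i(x̄)‖, whose positive eigenvalues sum to α and stay near 1 when the gradients are nearly orthogonal.

```python
import numpy as np

# Hypothetical data (invented for illustration, not from the paper):
# alpha = 2 active constraints whose gradients are independent and nearly,
# but not exactly, orthogonal, with assumed multipliers ubar_i.
dg = [np.array([10.0, 0.0, 0.0]), np.array([0.3, 3.0, 0.0])]
u = np.array([0.2, 7.0])

# Unweighted matrix (6.1) for the logarithmic barrier function:
M = sum(ui ** 2 * np.outer(gi, gi) for ui, gi in zip(u, dg))
e_unbal = np.linalg.eigvalsh(M)[1:]          # the two positive eigenvalues
print(e_unbal)                               # spread over two orders of magnitude

# Weight factors (6.7): rho_i = (ubar_i * ||grad g_i||)^(-2). Each term of
# (6.1) becomes a_i a_i^T with the unit vector a_i = grad g_i / ||grad g_i||.
rho = [(ui * np.linalg.norm(gi)) ** -2 for ui, gi in zip(u, dg)]
Mb = sum(ri * ui ** 2 * np.outer(gi, gi)
         for ri, ui, gi in zip(rho, u, dg))
ev = np.linalg.eigvalsh(Mb)[1:]              # drop the zero eigenvalue
print(ev)                                    # both close to 1
print(ev.sum())                              # positive eigenvalues sum to alpha = 2
```

The cosine between the two normalized gradients is here about 0.1, so the positive eigenvalues of Σ a_i a_i^T are 1 ± 0.1: a small departure from orthogonality only perturbs the balanced spectrum slightly, in line with the continuity argument above.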
Acknowledgement

The author wishes to thank Prof. Dr J. F. Benders (Technological University, Eindhoven) and Dr J. D. Pearson (Information Systems and Automation) for the inspiring discussions and criticisms.

Eindhoven, May 1969

REFERENCES

1) A. V. Fiacco and G. P. McCormick, Nonlinear programming: Sequential unconstrained minimization techniques, Wiley, New York, 1968.
2) R. Fletcher and A. P. McCann, Acceleration techniques for non-linear programming, paper presented at the conference on optimization, Keele, 1968.
3) R. Fletcher and M. J. D. Powell, The Computer Journal 6, 163-168, 1963.
4) F. A. Lootsma, Philips Res. Repts 23, 408-423, 1968.
5) W. Murray, Ill-conditioning in barrier and penalty functions arising in constrained nonlinear programming, paper presented at the sixth int. symp. on math. programming, Princeton, N.J., 1967.
6) J. D. Pearson, The Computer Journal 12, 171-178, 1969.
7) J. H. Wilkinson, The algebraic eigenvalue problem, Clarendon Press, Oxford, 1965.
8) W. I. Zangwill, Management Science 13, 344-358, 1967.