A DIMENSION REDUCING CONIC METHOD FOR UNCONSTRAINED OPTIMIZATION


G. E. MANOUSSAKIS, T. N. GRAPSA and C. A. BOTSARIS
Department of Mathematics, University of Patras, GR-26110 Patras, Greece
e-mail: gemini@math.upatras.gr, grapsa@math.upatras.gr, botsaris@otenet.gr

Abstract. In this paper we present a new algorithm for finding the unconstrained minimum of a twice continuously differentiable function f(x) in n variables. The algorithm is based on a conic model function which does not involve the conjugacy matrix or the Hessian of the model function. The basic idea is to accelerate the convergence of the conic method by choosing more appropriate points x_1, x_2, ..., x_{n+1} at which to apply the conic model. To do this, we apply to the gradient of f a dimension-reducing method (DR), which uses reduction to proper, simpler one-dimensional nonlinear equations, converges quadratically, and incorporates the advantages of Newton and nonlinear SOR algorithms. The new method has been implemented and tested on well-known test functions. It converges in n + 1 iterations on conic functions and, as the numerical results indicate, rapidly minimizes general functions.

Keywords and phrases: Unconstrained Optimization, Conic, Dimension Reducing Method.

I. INTRODUCTION

The general unconstrained minimization problem is

    min f(x),  x in R^n,

where f: R^n -> R is a twice continuously differentiable function of the n variables x = (x_1, x_2, ..., x_n). Although most methods for unconstrained minimization are based on a quadratic model, various authors have introduced non-quadratic algorithms. Jacobson and Oksman [15] presented a homogeneous model. Davidon [4] introduced conic models for such problems. Botsaris and Bacopoulos [2] presented a conic algorithm which does not involve the conjugacy matrix or the Hessian of the model function. Recently several authors have presented algorithms based on conic model functions. In [19] we improved the convergence of the conic method presented in [2] by applying a non-monotone line search using the Barzilai and Borwein step [3], [14]. The choice of step length is related to the eigenvalues of the Hessian at the minimizer and not to the function value. Although that method does not guarantee descent in the objective function at each iteration, it guarantees global convergence.

Grapsa and Vrahatis proposed a dimension-reducing method for unconstrained optimization, named DROPT [13]. This method is based on the methods studied in [9], [10], [11], [12] and incorporates the advantages of Newton and SOR algorithms. Although the procedure uses reduction to simpler one-dimensional nonlinear equations, it generates a sequence in R^{n-1} which converges quadratically to n - 1 components of the optimum, while the remaining component is evaluated separately using the final approximations of the others. For this component an initial guess is not necessary, and it is at the user's disposal to choose which component remains, according to the problem. Also, this method does not directly need any gradient evaluation, and it compares favorably with quadratically convergent optimization methods.

In this paper we use the dimension-reducing method to obtain more appropriate points x_1, x_2, ..., x_{n+1} at which to apply the conic model. In this way we further accelerate the convergence of the conic method. Next we give the main notions of the conic and the dimension-reducing methods, present the new method, study its convergence, state the corresponding algorithm and give numerical applications.

II. THE CONIC METHOD
A conic model function has the form

    c(x) = (1 + p^T x) / (2 (1 + p^T beta)) * grad c(x)^T (x - beta) + c(beta),        (1)

where beta is the location of the minimum and p in R^n is the gauge vector defining the horizon of the conic function, i.e. the hyperplane on which c(x) takes an infinite value, defined by the equation 1 + p^T x = 0. As can be seen, the conic model in this form does not involve, and therefore does not require, an estimate of the conjugacy matrix Q or of the Hessian matrix of the objective function. The function c(x) has a unique minimum whenever the conjugacy matrix Q of the conic function is positive definite, and then the location beta of the minimizer is determined by

    beta = Q^{-1} beta / (1 + p^T Q^{-1} beta),   provided that 1 + p^T Q^{-1} beta != 0.

Assuming that the objective function is conic, we obtain

    [2 f(x) p + (1 + p^T x) g(x)]^T beta - 2 (1 + p^T beta) f(beta) = (1 + p^T x) g(x)^T x - 2 f(x).        (2)
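As a quick numerical illustration (added here; it is not part of the original development), the following Python sketch builds a small synthetic conic function with a known gauge vector p, a positive definite conjugacy matrix Q and a known minimizer beta, and checks relation (1), and hence (2), at a few random points. The concrete parametrization of the conic function through the collinear scaling w(x) = x / (1 + p^T x) is an assumption of the sketch, not taken from the paper.

import numpy as np

rng = np.random.default_rng(0)
n = 4

# Synthetic conic function with known gauge vector p, conjugacy matrix Q and minimizer beta
# (parametrized via the collinear scaling w(x) = x / (1 + p^T x); an assumption of this sketch).
p = 0.1 * rng.standard_normal(n)
beta = rng.standard_normal(n)
M = rng.standard_normal((n, n))
Q = M @ M.T + n * np.eye(n)           # positive definite conjugacy matrix
c_beta = 2.0                          # value of the minimum

w = lambda x: x / (1.0 + p @ x)
w_beta = w(beta)

def c(x):
    d = w(x) - w_beta
    return c_beta + 0.5 * d @ Q @ d

def grad_c(x):
    t = 1.0 + p @ x
    J = (t * np.eye(n) - np.outer(x, p)) / t**2   # Jacobian of w(x)
    return J.T @ Q @ (w(x) - w_beta)

# Check relation (1): c(x) - c(beta) = (1 + p^T x) / (2 (1 + p^T beta)) * grad_c(x)^T (x - beta)
for _ in range(5):
    x = beta + 0.3 * rng.standard_normal(n)       # stay away from the horizon 1 + p^T x = 0
    lhs = c(x) - c_beta
    rhs = (1.0 + p @ x) / (2.0 * (1.0 + p @ beta)) * (grad_c(x) @ (x - beta))
    print(f"lhs = {lhs:.12f}   rhs = {rhs:.12f}")

The two printed columns agree to rounding error, which is exactly the property the linear system below exploits.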

If the horizon p is known, then by evaluating equation (2) at n + 1 distinct points x_1, x_2, ..., x_{n+1} we obtain the (n + 1) x (n + 1) linear system

    A alpha = s - phi,   alpha = (beta; omega),        (3)

where

    A = 2 (f_1, f_2, ..., f_{n+1})^T (p^T  0) + G,

    G = [ (1 + p^T x_1) g_1^T        -1
          (1 + p^T x_2) g_2^T        -1
          ...
          (1 + p^T x_{n+1}) g_{n+1}^T  -1 ],

    s = ( (1 + p^T x_1) g_1^T x_1, (1 + p^T x_2) g_2^T x_2, ..., (1 + p^T x_{n+1}) g_{n+1}^T x_{n+1} )^T,

    phi = 2 (f_1, f_2, ..., f_{n+1})^T,   omega = 2 (1 + p^T beta) f(beta),

with f_i = f(x_i) and g_i = g(x_i). This system is linear in the (n + 1)-dimensional unknown vector alpha, consisting of the location of the minimum beta and the scaled value of the minimum omega, assuming that the gauge vector p is available. A recursive method for solving the system above is presented in [2].

III. THE DIMENSION REDUCING OPTIMIZATION METHOD (DROPT METHOD)

Notation. Throughout this paper R^n is the n-dimensional real space of column vectors x with components x_1, x_2, ..., x_n; (y; z) represents the column vector with components y_1, y_2, ..., y_m, z_1, z_2, ..., z_k; d_i f(x) denotes the partial derivative of f(x) with respect to the i-th variable x_i; cl(A) denotes the closure of the set A; f(x_1, ..., x_{i-1}, . , x_{i+1}, ..., x_n) defines the mapping obtained by holding x_1, ..., x_{i-1}, x_{i+1}, ..., x_n fixed; g(x) = (g_1(x), ..., g_n(x)) denotes the gradient of the objective function f at x, while H = [H_ij] denotes the Hessian of f at x.

To obtain a sequence {x^p}, p = 0, 1, ..., of points in R^n which converges to a local optimum (critical) point x* = (x*_1, ..., x*_n) in D of the function f, we consider the sets B_i, i = 1, ..., n, to be those connected components of g_i^{-1}(0) containing x* on which d_n g_i != 0, for i = 1, ..., n, respectively. Next, applying the Implicit Function Theorem [6], [21], for each one of the components g_i, i = 1, ..., n, we can find open neighborhoods A_1 of R^{n-1} and A_{2,i} of R, i = 1, ..., n, of the points y* = (x*_1, ..., x*_{n-1}) and x*_n respectively, such that for any y = (x_1, ..., x_{n-1}) in cl(A_1) there exist unique mappings phi_i, defined and continuous in A_1, such that

    x_n = phi_i(y) in cl(A_{2,i})   and   g_i(y; phi_i(y)) = 0,   i = 1, ..., n.

Moreover, the partial derivatives d_j phi_i, j = 1, ..., n - 1, exist in A_1 for each phi_i, i = 1, ..., n; they are continuous in cl(A_1) and are given by

    d_j phi_i(y) = - d_j g_i(y; phi_i(y)) / d_n g_i(y; phi_i(y)),   i = 1, ..., n,   j = 1, ..., n - 1.

Next, working exactly as in [10], we utilize Taylor's formula to expand the phi_i(y), i = 1, ..., n, about y^p. By straightforward calculations we obtain the following iterative scheme for the computation of the n - 1 components of x*:

    y^{p+1} = y^p + A_p^{-1} V_p,   p = 0, 1, ...,        (4)

where

    y^p = [x^p_i],   i = 1, ..., n - 1,        (5)

    A_p = [ d_j g_i(y^p; x^{p,i}_n) / d_n g_i(y^p; x^{p,i}_n) - d_j g_n(y^p; x^{p,n}_n) / d_n g_n(y^p; x^{p,n}_n) ],   i, j = 1, ..., n - 1,        (6)

    V_p = [v_i] = [x^{p,i}_n - x^{p,n}_n],   i = 1, ..., n - 1,        (7)

with x^{p,i}_n = phi_i(y^p). After a desired number of iterations of (4), say p = m, the n-th component of x* can be approximated by means of the relation

    x^{m+1}_n = x^{m,n}_n - sum_{j=1}^{n-1} (x^{m+1}_j - x^m_j) d_j g_n(y^m; x^{m,n}_n) / d_n g_n(y^m; x^{m,n}_n).        (8)

Remark 1. Relative procedures for obtaining x* can be constructed by replacing x_n with any one of the components x_1, ..., x_{n-1}, say x_int, and taking y = (x_1, ..., x_{int-1}, x_{int+1}, ..., x_n).

Remark 2. The above described method does not require the expressions phi_i but only the values x^{p,i}_n, which are given by the solution of the one-dimensional equations g_i(x^p_1, ..., x^p_{n-1}, . ) = 0. So, by holding y^p = (x^p_1, ..., x^p_{n-1}) fixed, we solve the equations g_i(y^p; r^p_i) = 0, i = 1, ..., n, for r^p_i in the interval (a, b) with an accuracy delta. Of course, any one of the well-known one-dimensional methods [21], [22], [25], [26] can be used to solve these equations.
Here we employ a modified bisection method described in [13], [27], [29]. The only computable information required by this bisection method is the algebraic signs of the function values. Moreover, it is the only method that can be applied to problems with imprecise function values.
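To make the dimension-reducing iteration (4)-(8) concrete, here is a minimal Python sketch for n = 2. The objective function, the plain bisection used for the one-dimensional solves, and the fixed bracket are illustrative assumptions of the sketch; the modified bisection of [27], [29] and the Armijo safeguard used in the full algorithm below are omitted.

import numpy as np

# Gradient components of the illustrative objective (an assumption of this sketch):
# f(x1, x2) = (x1-1)^4 + (x1-1)^2 + (x2-2)^2 + (x1-1)(x2-2), with minimizer (1, 2).
g1   = lambda x1, x2: 4*(x1 - 1)**3 + 2*(x1 - 1) + (x2 - 2)
g2   = lambda x1, x2: 2*(x2 - 2) + (x1 - 1)
d1g1 = lambda x1, x2: 12*(x1 - 1)**2 + 2      # partial derivatives used in (6) and (8)
d2g1 = lambda x1, x2: 1.0
d1g2 = lambda x1, x2: 1.0
d2g2 = lambda x1, x2: 2.0

def bisect(h, a, b, tol=1e-12):
    # Plain bisection for h(r) = 0 on [a, b]; stands in for the modified bisection of [27], [29].
    fa = h(a)
    for _ in range(200):
        m = 0.5 * (a + b)
        if abs(b - a) < tol:
            break
        if fa * h(m) <= 0.0:
            b = m
        else:
            a, fa = m, h(m)
    return 0.5 * (a + b)

# DR iteration (4)-(7) for n = 2: y is the single free component x1,
# while x2 is the component evaluated separately through relation (8).
y = 0.0                      # initial guess for x1 only; no initial guess is needed for x2
a, b = -20.0, 20.0           # endpoints for the one-dimensional solves
for it in range(20):
    r1 = bisect(lambda r: g1(y, r), a, b)     # x2^{p,1}: root of g1(y, .) = 0
    r2 = bisect(lambda r: g2(y, r), a, b)     # x2^{p,2}: root of g2(y, .) = 0
    A = d1g1(y, r1)/d2g1(y, r1) - d1g2(y, r2)/d2g2(y, r2)   # 1x1 matrix A_p of (6)
    V = r1 - r2                                             # vector V_p of (7)
    step = V / A
    x2 = r2 - step * d1g2(y, r2) / d2g2(y, r2)              # relation (8)
    y = y + step                                            # scheme (4)
    if abs(step) < 1e-10:
        break

print(f"iterations = {it + 1},  x ~ ({y:.8f}, {x2:.8f})")   # converges to (1, 2)

Only the x1 component is iterated; x2 is recovered at the end from (8), which is the dimension-reducing feature exploited in the new method.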

IV. THE NEW METHOD

In order to find the point beta which minimizes a given function f: D in R^n -> R in a specific domain D, we try to obtain a sequence of points x_k, k = 0, 1, ..., which converges to beta. If we assume that f is a conic function, then we have to solve the system defined by (3). As stated in Section II, this system is produced by evaluating (2) at n + 1 points x_1, x_2, ..., x_{n+1}. In [2] these points were produced by applying Armijo's method. In this paper we propose the DR method to obtain these points. The DR method results in a better set of points, so the convergence of the conic algorithm is accelerated.

Let

    rho_i = (1 + p^T x_{i+1}) / (1 + p^T x_i) = (Df + k_i) / (g_{i+1}^T Dx),        (9)

where Df = f_{i+1} - f_i, Dx = x_{i+1} - x_i, and k_i = [Df^2 - (g_{i+1}^T Dx)(g_i^T Dx)]^{1/2}, provided that the quantity under the square root is non-negative. If this quantity is negative, the conic method cannot proceed; in this case, using the DR method, we get a new point x, evaluate a new equation (2) and restart the conic procedure to solve the modified system (3). The gauge vector p can be determined by solving the linear system

    Z p = r,        (10)

where

    Z = [ (x_2 - rho_1 x_1)^T
          (x_3 - rho_2 x_2)^T
          ...
          (x_{n+1} - rho_n x_n)^T ],     r = (rho_1 - 1, rho_2 - 1, ..., rho_n - 1)^T.

From (3) and (10) the location of the minimum beta can be determined through the equations

    p = Z^{-1} r,   alpha = A^{-1} (s - phi).

We carry out the necessary inversions recursively as new points are constructed by the algorithm. Using Householder's formula for matrix inversion it can be verified that

    Z_i^{-1} = Z_{i-1}^{-1} - Z_{i-1}^{-1} e_l (z_i^T Z_{i-1}^{-1} - e_l^T) / (z_i^T Z_{i-1}^{-1} e_l),        (11)

where z_i = x_{i+1} - rho_i x_i, and

    p_i = p_{i-1} + Z_{i-1}^{-1} e_l (eta_i - z_i^T p_{i-1}) / (z_i^T Z_{i-1}^{-1} e_l),        (12)

where eta_i = rho_i - 1, provided that z_i^T Z_{i-1}^{-1} e_l is bounded away from zero. We note that e_l is the vector with zero elements except at the position l = i, where it has unity. The solution of the linear system is proved to be

    alpha = u - (1 + q^T u) / (1 + q^T v) * v,        (13)

where

    q = (p; 0),        (14)
    u = G^{-1} s,        (15)
    v = G^{-1} phi.        (16)

Let us further define

    Lambda_i = diag(1, 1, ..., 1, lambda_i),        (17)

    lambda_i = (1 + p_i^T x_i) / (1 + p_{i-1}^T x_i),        (18)

and

    y_{i+1} = ( (1 + p_i^T x_{i+1}) / lambda_i * g_{i+1} ;  -1 ).        (19)

Then G_{i+1}^{-1} can be computed according to the recursive formula

    G_{i+1}^{-1} = (Lambda_i / lambda_i) [ G_i^{-1} - G_i^{-1} e_c (y_{i+1}^T G_i^{-1} - e_c^T) / (y_{i+1}^T G_i^{-1} e_c) ],        (20)

provided that y_{i+1}^T G_i^{-1} e_c is bounded away from zero. The recursive equations for the vectors u and v, required to compute alpha_{i+1} from (13), are found to be

    u_{i+1} = (Lambda_i / lambda_i) [ u_i - G_i^{-1} e_c (y_{i+1}^T u_i - theta_{i+1}) / (y_{i+1}^T G_i^{-1} e_c) ]        (21)

and

    v_{i+1} = (Lambda_i / lambda_i) [ v_i - G_i^{-1} e_c (y_{i+1}^T v_i - xi_{i+1}) / (y_{i+1}^T G_i^{-1} e_c) ],        (22)

where

    theta_{i+1} = (1 + p_i^T x_{i+1}) / lambda_i * g_{i+1}^T x_{i+1},   xi_{i+1} = 2 f_{i+1}.        (23)

The proposed method is illustrated in the following algorithms in pseudocode, where g = (g_1, g_2, ..., g_n) indicates the gradient of the objective function, x_0 the starting point, a = (a_1, a_2, ..., a_n) and b = (b_1, b_2, ..., b_n) the endpoints in each coordinate direction used for the above mentioned one-dimensional bisection method with predetermined accuracy delta, MIT the maximum number of iterations, and eps_1, eps_2, eps_3, eps_4 the predetermined desired accuracies.

Algorithm 1 : Dimension Reducing Optimization (DROPT)

Step 1.  Input {x_0; a; b; delta; MIT; eps_1; eps_2}.
Step 2.  Set p = 1.
Step 3.  If p < MIT, replace p by p + 1 and go to the next step; otherwise, go to Step 14.
Step 4.  If ||g(x^p)|| <= eps_1, go to Step 14.
Step 5.  Find a coordinate int such that
             sgn g_i(x^p_1, ..., x^p_{int-1}, a_int, x^p_{int+1}, ..., x^p_n) * sgn g_i(x^p_1, ..., x^p_{int-1}, b_int, x^p_{int+1}, ..., x^p_n) = -1
         for all i = 1, 2, ..., n. If this is impossible, apply Armijo's method and go to Step 4.
Step 6.  Compute the approximate solutions r_i, for all i = 1, 2, ..., n, of the equation g_i(x^p_1, ..., x^p_{int-1}, r_i, x^p_{int+1}, ..., x^p_n) = 0 by applying the modified bisection in (a_int, b_int) within accuracy delta. Set x^{p,i}_int = r_i.
Step 7.  Set y^p = (x^p_1, ..., x^p_{int-1}, x^p_{int+1}, ..., x^p_n).
Step 8.  Set the elements of the matrix A_p of Relation (6), using x_int instead of x_n.

Step 9.  Set the elements of the vector V_p of Relation (7), using x_int instead of x_n.
Step 10. Solve the (n - 1) x (n - 1) linear system A_p s^p = V_p for s^p.
Step 11. Set y^{p+1} = y^p + s^p.
Step 12. Compute x_int by virtue of Relation (8) and set x^p = (y^p; x_int).
Step 13. If ||s^p|| <= eps_2, go to Step 14; otherwise return to Step 3.
Step 14. Output {x^p}.

The criterion in Step 5 ensures the existence of the solutions r_i which are computed at Step 6. If this criterion is not satisfied, we apply Armijo's method [1], [13], [28] for a few steps and then try again with the DR method. Our experience is that in many examples studied in various dimensions, as well as for all the problems studied in this paper (see the Numerical Applications below), such a subprocedure is not necessary. We have merged it into our algorithm for completeness.

Algorithm 2 : DR Conic Method

Step 1.  Assume x_0 is given. Set i = 0.
Step 2.  Set d_0 = -g_0.
Step 3.  Use the DR method to get a point x_1.
Step 4.  Set alpha_0 = (x_1; 0), G_0 = I, Z_0 = I, u_0 = alpha_0, v_0 = 0, p_0 = 0, lambda_0 = 1, jc = 1, lc = 1.
Step 5.  If ||g_0|| <= eps_1 (1 + |f_0|), then stop; else go to Step 6.
Step 6.  Use (17), (18), (19) to calculate Lambda_i, lambda_i and y_{i+1}.
Step 7.  If |y_{i+1}^T G_i^{-1} e_c| < eps_3, then set x_0 = x_{i+1} and go to Step 3; else go to Step 8.
Step 8.  Use (12), (13), (14), (15), (16), (20), (21), (22) to calculate alpha_{i+1}.
Step 9.  If |(x_i - beta_i)^T g_i| < eps_4, then set x_0 = x_{i+1} and go to Step 3; else go to Step 10.
Step 10. If f(beta_{i+1}) <= f(x_i), then set x_{i+1} = beta_{i+1} and go to Step 11; else go to Step 3.
Step 11. Set i = i + 1. If jc = n + 1, then reset jc = 1; else set jc = jc + 1.
Step 12. Set d_i = mu_i (x_i - beta_i), where mu_i = -sgn[g_i^T (x_i - beta_i)].
Step 13. If ||d_i|| + gamma_i <= M, then go to Step 14; else set x_0 = x_i and go to Step 3.
Step 14. If delta_i = (f_{i+1} - f_i)^2 - g_{i+1}^T(x_{i+1} - x_i) * g_i^T(x_{i+1} - x_i) < 0, then, using the DR method, find a new x_{i+1} and repeat this procedure until the new x_{i+1} so obtained satisfies delta_i > 0; go to Step 15.
Step 15. If |z_i^T Z_{i-1}^{-1} e_l| < eps_3, then set x_0 = x_{i+1} and go to Step 3; else go to Step 16.
Step 16. Use (11), (12) to calculate Z_{i+1}^{-1}, p_{i+1}.
Step 17. If lc = n, then reset lc = 1; else set lc = lc + 1.
Step 18. Go to Step 5.

V. THE CONVERGENCE OF THE NEW METHOD

The convergence of the new algorithm can be established using the convergence theorems of Polak [23], Grapsa and Vrahatis [13] and Bacopoulos and Botsaris [2]. Consider the sequence {x_i} generated by our algorithm. We make the following assumptions:

(a) x_i is desirable iff g(x_i) = 0.
(b) f(x) is a twice continuously differentiable function in R^n whose Hessian is non-singular at any desirable point.
(c) There exists an x_0 in R^n such that the level set L_0 = {x : f(x) <= f(x_0)} is compact. The compactness assumption, together with the fact that f(x_{i+1}) <= f(x_i), guaranteed by the use of the DR method, implies that the sequence {x_i} generated by our algorithm is bounded and therefore has accumulation points.
(d) Let M > 0 be such that M >= 2 sup { ||g(x)|| : x in L_0 }.
(e) The algorithm terminates if g(x_i) = 0.

Then the sequence generated by the new algorithm satisfies the assumptions of the above mentioned three convergence theorems, and consequently either the sequence is finite and terminates at a desirable point, or else it is infinite and every accumulation point x* of {x_i} is desirable.
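As an added numerical check (using the same synthetic conic construction as the sketch in Section II, which is an assumption and not part of the paper), the following Python fragment evaluates a conic function at n + 1 points, computes rho_i from (9), recovers the gauge vector p from Z p = r as in (10), and then recovers beta and omega directly from the linear system (3). The recursions (11)-(22) are an efficient incremental way of doing the same computation.

import numpy as np

rng = np.random.default_rng(1)
n = 4

# Synthetic conic function with known gauge vector p_true, conjugacy matrix Q and minimizer beta
# (parametrized through the collinear scaling w(x) = x / (1 + p^T x); an assumption of this sketch).
p_true = 0.1 * rng.standard_normal(n)
beta   = rng.standard_normal(n)
M = rng.standard_normal((n, n))
Q = M @ M.T + n * np.eye(n)
c_beta = 2.0
w  = lambda x: x / (1.0 + p_true @ x)
wb = w(beta)

def f(x):
    d = w(x) - wb
    return c_beta + 0.5 * d @ Q @ d

def g(x):
    t = 1.0 + p_true @ x
    J = (t * np.eye(n) - np.outer(x, p_true)) / t**2     # Jacobian of w(x)
    return J.T @ Q @ (w(x) - wb)

# n + 1 points on the same side of the horizon as beta
X  = [beta + 0.3 * rng.standard_normal(n) for _ in range(n + 1)]
F  = [f(x) for x in X]
Gr = [g(x) for x in X]

# (9): rho_i from function and gradient values only, then (10): Z p = r
rho, Z, r = [], [], []
for i in range(n):
    dx, df = X[i + 1] - X[i], F[i + 1] - F[i]
    k = np.sqrt(df**2 - (Gr[i + 1] @ dx) * (Gr[i] @ dx))
    rho_i = (df + k) / (Gr[i + 1] @ dx)
    rho.append(rho_i)
    Z.append(X[i + 1] - rho_i * X[i])
    r.append(rho_i - 1.0)
p = np.linalg.solve(np.array(Z), np.array(r))
print("gauge vector error:", np.max(np.abs(p - p_true)))

# (3): assemble A, s, phi and recover alpha = (beta; omega) in one shot
fvec = np.array(F)
Gmat = np.column_stack([np.array([(1 + p @ x) * gi for x, gi in zip(X, Gr)]),
                        -np.ones(n + 1)])
A   = 2.0 * np.outer(fvec, np.append(p, 0.0)) + Gmat
s   = np.array([(1 + p @ x) * (gi @ x) for x, gi in zip(X, Gr)])
phi = 2.0 * fvec
alpha = np.linalg.solve(A, s - phi)
print("beta error :", np.max(np.abs(alpha[:n] - beta)))
print("omega      :", alpha[n], " expected:", 2.0 * (1.0 + p_true @ beta) * c_beta)

Both errors come out at the level of rounding error, which reflects the statement that the method terminates in n + 1 iterations on conic functions.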
VI. NUMERICAL APPLICATIONS

The procedures described in this section have been implemented in a new FORTRAN program named DRCON (Dimension Reducing Conic). DRCON has been tested on a Pentium IV PC compatible with random problems of various dimensions. Our experience is that the algorithm behaves predictably and reliably, and the results have been quite satisfactory. Some typical computational results are given below.

For the following problems, the reported parameters indicate: n the dimension, x_0 = (x_1, x_2, ..., x_n) the starting point, x* = (x*_1, x*_2, ..., x*_n) the approximate local optimum computed within an accuracy of eps = 10^{-15}, IT the total number of iterations required to obtain x*, FE the total number of function evaluations (including derivatives), and ASG the total number of algebraic signs of the components of the gradient required for applying the modified bisection [27], [29].

In Tables 1-3 we compare the numerical results obtained, for various starting points, by applying Armijo's steepest descent method [1] as well as conjugate gradient methods and variable metric methods, with the corresponding numerical results of the method presented in this paper. The index alpha indicates the classical starting point. Furthermore, D indicates divergence or non-convergence, while FR, PR and BFGS indicate the corresponding results obtained by the Fletcher-Reeves [24], Polak-Ribiere [24] and Broyden-Fletcher-Goldfarb-Shanno (BFGS) [5] algorithms, respectively.

In the following examples we minimize the objective function f given by f(x) = sum_{i=1}^{m} f_i^2(x), where the f_i are given below for each example.

Example 1. Rosenbrock function, m = n = 2 [20]:
    f_1(x) = 10 (x_2 - x_1^2),   f_2(x) = 1 - x_1,
with f(x*) = 0 at x* = (1, 1).

Example 2. Freudenstein and Roth function, m = n = 2 [20]:
    f_1(x) = -13 + x_1 + ((5 - x_2) x_2 - 2) x_2,
    f_2(x) = -29 + x_1 + ((x_2 + 1) x_2 - 14) x_2,
with f(x*) = 0 at x* = (5, 4) and f(x*) = 48.9842... at x* = (11.41..., -0.8968...).

Example 3. Brown almost linear function, m = n = 3 [20]:
    f_i(x) = x_i + sum_{j=1}^{n} x_j - (n + 1),   1 <= i < n,
    f_n(x) = ( prod_{j=1}^{n} x_j ) - 1,
with f(x*) = 0 at x* = (a, ..., a, a^{1-n}), where a satisfies the equation n a^n - (n + 1) a^{n-1} + 1 = 0, and f(x*) = 1 at x* = (0, ..., 0, n + 1).

Table 1. Rosenbrock function

                     Armijo           FR             PR            BFGS           DROPT
x_0                IT     FE       IT    FE      IT    FE      IT    FE       IT   FE   ASG
(-1.2, 1)^a      1881  21396      142  2545      19   364      22   343        7   30   100
(-3, 6)          5960  74560      194  4462      23   455      28   436        7   29   100
(-2, 2)          1828  20852       29   480      15   290      20   305        6   27   100
(3, 3)           5993  74364      130  2939      26   509      25   384       13   48   140
(1, 20)             D      D      259  5732      32   689      32   689        1    6    20
(10, 10)        18416 251611      310  7469      26   526      32   505        4   21    80
(100, 100)          D      D        D     D      33   746      54   822        3   16    60
(-2000, 2000)    2542  35743        D     D      93  2466     173  2667        5   25   100

Table 2. Freudenstein and Roth function

                     Armijo           FR             PR            BFGS           DROPT
x_0                IT     FE       IT    FE      IT    FE      IT    FE       IT   FE   ASG
(0.5, -2)^a      1827  24155       18   356       8   187       7   138       23  111   460
(0.5, 1000)      1380  18770        D     D       D     D       D     D       22  110   440
(-2, 2)          1119  14625       19   336       8   180       7   121       23  110   460
(-20, 20)        1851  24986       24   451      10   211       9   149       11   55   220
(45, 45)         1239  16289       18   342       9   196       8   129        5   25   100
(10, 100)        1845  24664       10   200       9   194       9   170       16   80   320
(12, 2)          2027  26886       70  1145       8   130       7   103       24  115   480
(4, 1000)        1886  25597        D     D       D     D       D     D       37  181   740

Table 3. Brown almost linear function

                      Armijo          FR             PR            BFGS           DROPT
x_0                 IT    FE       IT    FE      IT    FE      IT    FE       IT   FE   ASG
(0.5, 0.5, 0.5)^a  177  1330       12   177       6    93       6    91        7   36   150
(0, 0, 3)          221  1612       27   389       9   143       9   131        1    7    30
(-1, 0, 3)         233  1691       34   491      42   612      12   180        4   26   120
(0.1, 0.1, 2)      181  1349       52   742       9   149       9   145       16   76   330
(12, 12, 0)        183  1368       23   334       9   138       8   121        7   43   210
(0.8, 0.7, 2)      193  1446       70   996      23   336      14   216        7   31   120
(-0.1, 0.1, 0.1)   173  1295       20   290       7   109       6    92        6   25    90
(12, 12, 10)        64   563       47   711      10   165       9   139        6   29   120
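For completeness, a minimal Python sketch of the three sum-of-squares test objectives used above, with residuals in the standard More-Garbow-Hillstrom form [20]; the function names are illustrative and not part of the DRCON code.

import numpy as np

def rosenbrock(x):                       # Example 1, m = n = 2
    f1 = 10.0 * (x[1] - x[0]**2)
    f2 = 1.0 - x[0]
    return f1**2 + f2**2

def freudenstein_roth(x):                # Example 2, m = n = 2
    f1 = -13.0 + x[0] + ((5.0 - x[1]) * x[1] - 2.0) * x[1]
    f2 = -29.0 + x[0] + ((x[1] + 1.0) * x[1] - 14.0) * x[1]
    return f1**2 + f2**2

def brown_almost_linear(x):              # Example 3, m = n = len(x)
    n = len(x)
    f = [x[i] + np.sum(x) - (n + 1) for i in range(n - 1)]
    f.append(np.prod(x) - 1.0)
    return float(np.sum(np.square(f)))

print(rosenbrock(np.array([1.0, 1.0])))                  # 0 at the minimizer (1, 1)
print(freudenstein_roth(np.array([5.0, 4.0])))           # 0 at the minimizer (5, 4)
print(brown_almost_linear(np.array([0.5, 0.5, 0.5])))    # value at the classical starting point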

References

[1] Armijo, L. (1966), Minimization of functions having Lipschitz continuous first partial derivatives, Pacific J. Math., 16, 1-3.
[2] Bacopoulos, A. and Botsaris, C.A. (1992), A new conic method for unconstrained minimization, J. Math. Anal. Appl., 167, 12-31.
[3] Barzilai, J. and Borwein, J.M. (1988), Two-point step size gradient methods, IMA J. Numer. Anal., 8, 141-148.
[4] Davidon, W.C. (1959), Variable metric methods for minimization, A.E.C. Research and Development Report No. ANL-5990, Argonne National Laboratory, Argonne, Illinois.
[5] Dennis, J.E., Jr. and Schnabel, R.B. (1983), Numerical Methods for Unconstrained Optimization and Nonlinear Equations, Prentice-Hall, Englewood Cliffs, New Jersey.
[6] Dieudonne, J. (1969), Foundations of Modern Analysis, Academic Press, New York.
[7] Fletcher, R. and Powell, M. (1963), A rapidly convergent descent method for minimization, Comput. J., 6, 163-168.
[8] Fletcher, R. and Reeves, C. (1964), Function minimization by conjugate gradients, Comput. J., 7, 149-154.
[9] Grapsa, T.N. and Vrahatis, M.N. (1989), The implicit function theorem for solving systems of nonlinear equations in R^2, Inter. J. Computer Math., 28, 171-181.
[10] Grapsa, T.N. and Vrahatis, M.N. (1990), A dimension-reducing method for solving systems of nonlinear equations in R^n, Inter. J. Computer Math., 32, 205-216.
[11] Grapsa, T.N., Vrahatis, M.N. and Bountis, T.C. (1990), Solving systems of nonlinear equations in R^n using a rotating hyperplane in R^{n+1}, Inter. J. Computer Math., 35, 133-151.
[12] Grapsa, T.N. and Vrahatis, M.N. (1995), A new dimension-reducing method for solving systems of nonlinear equations, Inter. J. Computer Math., 55, 235-244.
[13] Grapsa, T.N. and Vrahatis, M.N. (1996), A dimension-reducing method for unconstrained optimization, Journal of Computational and Applied Mathematics, 66, 239-253.
[14] Grippo, L., Lampariello, F. and Lucidi, S. (1986), A nonmonotone line search technique for Newton's method, SIAM J. Numer. Anal., 23, 707-716.
[15] Jacobson, D.H. and Oksman, W. (1972), An algorithm that minimizes homogeneous functions of n variables in n + 2 iterations and rapidly minimizes general functions, J. Math. Anal. Appl., 38, 535-552.
[16] Kearfott, B. (1979), An efficient degree computation method for a generalized method of bisection, Numer. Math., 32, 109-127.
[17] Kearfott, R.B. (1987), Some tests of generalized bisection, ACM Trans. Math. Software, 13, 197-220.
[18] Kupferschmid, M. and Ecker, J.G. (1987), A note on solution of nonlinear programming problems with imprecise function and gradient values, Math. Program. Study, 31, 129-138.
[19] Manoussakis, G.E., Sotiropoulos, D.G., Botsaris, C.A. and Grapsa, T.N. (2002), A non-monotone conic method for unconstrained optimization, in: Proceedings of the 4th GRACM Congress on Computational Mechanics, 27-29 June, University of Patras, Greece.
[20] More, J.J., Garbow, B.S. and Hillstrom, K.E. (1981), Testing unconstrained optimization software, ACM Trans. Math. Software, 7, 17-41.
[21] Ortega, J.M. and Rheinboldt, W.C. (1970), Iterative Solution of Nonlinear Equations in Several Variables, Academic Press, New York.
[22] Ostrowski, A. (1973), Solution of Equations in Euclidean and Banach Spaces, Third Edition, Academic Press, London.
[23] Polak, E. (1971), Computational Methods in Optimization: A Unified Approach, Academic Press, New York.
[24] Press, W.H., Teukolsky, S.A., Vetterling, W.T. and Flannery, B.P. (1992), Numerical Recipes: The Art of Scientific Computing, 2nd ed., Cambridge University Press, New York.
[25] Rheinboldt, W.C. (1974), Methods for Solving Systems of Equations, SIAM, Philadelphia.
[26] Traub, J.F. (1964), Iterative Methods for the Solution of Equations, Prentice-Hall, Englewood Cliffs, NJ.
[27] Vrahatis, M.N. (1988), CHABIS: A mathematical software package for locating and evaluating roots of systems of nonlinear equations, ACM Trans. Math. Software, 14, 330-336.
[28] Vrahatis, M.N., Androulakis, G.S. and Manoussakis, G.E., A new unconstrained optimization method for imprecise function and gradient values, J. Math. Anal. Appl., accepted for publication.
[29] Vrahatis, M.N. and Iordanidis, K.I. (1986), A rapid generalized method of bisection for solving systems of nonlinear equations, Numer. Math., 49, 123-138.