REPORTS IN INFORMATICS
ISSN 0333-3590
Sparsity in Higher Order Methods in Optimization
Geir Gundersen and Trond Steihaug
REPORT NO 327
June 2006
Department of Informatics
UNIVERSITY OF BERGEN
Bergen, Norway

This report has URL http://www.ii.uib.no/publikasjoner/texrap/ps/2006-327.ps. Reports in Informatics from the Department of Informatics, University of Bergen, Norway, are available at http://www.ii.uib.no/publikasjoner/texrap/. Requests for paper copies of this report can be sent to: Department of Informatics, University of Bergen, Høyteknologisenteret, P.O. Box 7800, N-5020 Bergen, Norway.

Sparsity in Higher Order Methods in Optimization
Geir Gundersen and Trond Steihaug
Department of Informatics, University of Bergen, Norway
{geirg,trond}@ii.uib.no
29th June 2006

Abstract

In this paper it is shown that when the sparsity structure of the problem is utilized, higher order methods are competitive with second order methods (Newton) for solving unconstrained optimization problems when the objective function is three times continuously differentiable. It is also shown how to arrange the computations of the higher derivatives.

1 Introduction

The use of higher order methods with exact derivatives in unconstrained optimization has not been considered practical from a computational point of view. We show that when the sparsity structure of the problem is utilized, higher order methods are competitive with second order methods (Newton). The sparsity structure of the tensor is induced by the sparsity structure of the Hessian matrix. This is exploited to build efficient algorithms for Hessian matrices with a skyline structure. We show that the classical third order methods, i.e. Halley's, Chebyshev's and Super Halley's method, can all be regarded as two-step Newton-like methods. We present numerical results on third order local methods that utilize the sparsity and super-symmetry, where the computational cost is so low that they are competitive with Newton's method.

Third order methods will in general use fewer iterations than a second order method to reach the same accuracy. However, the number of arithmetic operations per iteration is higher for third order methods than for second order methods, and there is an increased memory requirement. For sparse systems we show that both of these increases are very modest. If we compare the number of arithmetic operations needed to compute the gradient of the third order Taylor approximation with that of the second order approximation (as for Newton), we find that this ratio can be expressed as the ratio of the memory requirements. In the case of banded Hessian matrices with (half) bandwidth β, this ratio is bounded by (β+2)/2. We focus on objective functions where the Hessian matrix has a skyline (or envelope) structure, and we briefly discuss general sparse Hessian matrices and tensors.

In Section 2 we discuss methods that have a cubic convergence rate. We show that these methods may be regarded as two steps of Newton's method. In Section 3 we briefly introduce a global method based on trust regions. We show that the number of (outer) iterations using a cubic model is lower than using a quadratic model. In Section 4 we summarize the computations using the higher order (third) derivative and show how to utilize the super-symmetry. In Section 5 we introduce the concept of induced sparsity of the third derivative. In Section 6 we briefly discuss data structures for induced sparse tensors. Finally, in Section 7 we give some numerical results on the cost of going from quadratic to cubic operations, and introduce an efficiency ratio indicator.

2 Methods for Solving Nonlinear Equations

One of the central problems of scientific computation is the efficient numerical solution of the system of n equations in n unknowns

    F(x) = 0    (1)

where F: R^n -> R^n is sufficiently smooth and the Jacobian F'(x*) is nonsingular. Consider the Halley class of iterations [13] for solving (1),

    x_{k+1} = x_k - { I + (1/2) L(x_k) [I - α L(x_k)]^{-1} } (F'(x_k))^{-1} F(x_k),  k = 0, 1, ...,    (2)

where

    L(x) = (F'(x))^{-1} F''(x) (F'(x))^{-1} F(x).

This class contains the classical Chebyshev's method (α = 0), Halley's method (α = 1/2), and Super Halley's method (α = 1). More on Chebyshev's method can be found in [1, 6, 14, 21, 22, 27, 28, 29], on Halley's method in [1, 4, 5, 6, 7, 8, 14, 15, 20, 21, 22, 24, 27, 28, 29, 30], and on Super Halley's method in [7, 13, 14]. All members of the Halley class are cubically convergent.

The formulation (2) is not suitable for implementation. By rewriting the equation we get the following iterative method for k = 0, 1, ...

    Solve for s_k^{(1)}:  F'(x_k) s_k^{(1)} = -F(x_k)    (3)
    Solve for s_k^{(2)}:  (F'(x_k) + α F''(x_k) s_k^{(1)}) s_k^{(2)} = -(1/2) F''(x_k) s_k^{(1)} s_k^{(1)}    (4)

The new step is

    x_{k+1} = x_k + s_k^{(1)} + s_k^{(2)}.

The first equation (3) is the Newton equation. The second equation (4) is an approximation to the Newton equation F'(x_k + s_k^{(1)}) s_k^{(2)} = -F(x_k + s_k^{(1)}), since

    F(x_k + s_k^{(1)}) ≈ F(x_k) + F'(x_k) s_k^{(1)} + (1/2) F''(x_k) s_k^{(1)} s_k^{(1)} = (1/2) F''(x_k) s_k^{(1)} s_k^{(1)}

and

    F'(x_k + s_k^{(1)}) ≈ F'(x_k) + α F''(x_k) s_k^{(1)}.

With starting points x_0 close to the solution x* these methods have a superior convergence compared to Newton's method.
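The two-step formulation (3)-(4) is easy to try out in the scalar case. The following is a minimal C sketch (not from the report; the model problem F(x) = x^2 - 2 and the function names are illustrative assumptions) that performs Halley-class steps by solving the two scalar equations; α = 0, 1/2 and 1 give Chebyshev's, Halley's and Super Halley's method.

    #include <stdio.h>

    /* Model problem (illustrative): F(x) = x^2 - 2, with root sqrt(2). */
    static double F(double x)   { return x * x - 2.0; }
    static double dF(double x)  { return 2.0 * x; }      /* F'(x)  */
    static double d2F(double x) { (void)x; return 2.0; }  /* F''(x) */

    /* One Halley-class step in one dimension:
       solve F'(x) s1 = -F(x), then (F'(x) + alpha F''(x) s1) s2 = -(1/2) F''(x) s1^2. */
    static double halley_class_step(double x, double alpha) {
        double s1 = -F(x) / dF(x);                      /* Newton step, eq. (3) */
        double s2 = -0.5 * d2F(x) * s1 * s1
                    / (dF(x) + alpha * d2F(x) * s1);    /* correction, eq. (4)  */
        return x + s1 + s2;
    }

    int main(void) {
        double x = 2.0;                       /* starting point */
        for (int k = 0; k < 5; ++k) {
            x = halley_class_step(x, 0.5);    /* Halley's method (alpha = 1/2) */
            printf("k = %d  x = %.15f\n", k + 1, x);
        }
        return 0;
    }

Each step costs one extra linear solve with the same kind of coefficient matrix, which is the pattern the banded and skyline cost analysis in Section 7 builds on.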

2.1 Unconstrained Optimization

The methods in the previous section also apply to algorithms for the unconstrained optimization problem

    min_{x ∈ R^n} f(x)    (5)

where f: R^n -> R is sufficiently smooth and the Hessian ∇²f(x*) is symmetric positive definite. The necessary condition at a solution of (5) is that the gradient is zero,

    ∇f(x) = 0.

Thus in (1) we have that F(x) = ∇f(x), or F_i = ∂f/∂x_i (x), i = 1, ..., n. Then, for unconstrained optimization, F' = ∇²f(x) is the Hessian matrix and F'' = ∇³f(x) is the third derivative of f at x.

2.2 Motivation

The following statements show that higher order methods have not been considered practical.

(Ortega and Rheinboldt 1970) [20]: Methods which require second and higher order derivatives are rather cumbersome from a computational viewpoint. Note that, while computation of F' involves only n² partial derivatives, computation of F'' requires n³ second partial derivatives, in general an exorbitant amount of work indeed.

(Rheinboldt 1974) [21]: Clearly, comparisons of this type turn out to be even worse for methods with derivatives of order larger than two. Except in the case n = 1, where all derivatives require only one function evaluation, the practical value of methods involving more than the first derivative of F is therefore very questionable.

(Rheinboldt 1998) [22]: Clearly, for increasing dimension n the required computational work soon outweighs the advantage of the higher-order convergence. From this viewpoint there is hardly any justification to favor the Chebyshev method for large n.

A certain modification can be seen, and recent numerical experiments that utilize sparsity contradict these statements. Consider the following test functions: Chained Rosenbrock [26] and Generalized Rosenbrock [23]. Generalized Rosenbrock has a Hessian matrix with an arrowhead structure, while for Chained Rosenbrock the Hessian matrix structure is banded. We compare Newton's method, Chebyshev's method, Super Halley's method and Halley's method. The test cases show that the third order methods are competitive with Newton's method, also for increasing n. The termination criterion for all methods is ||∇f(x_k)|| ≤ 10^{-8} ||∇f(x_0)||.

[Figure 1: Chained Rosenbrock. CPU timings (ms) against n for x_0 = (1.7, ..., 1.7) and x* = (1.0, ..., 1.0); Newton 9 iterations, Chebyshev 6, Halley 6, Super Halley 4.]

[Figure 2: Generalized Rosenbrock. CPU timings (ms) against n for x_0 = (1.08, 0.99, ..., 1.08, 0.99) and x* = (1.0, ..., 1.0); Newton 5 iterations, Chebyshev 3, Halley 3, Super Halley 3.]

[Figure 3: Generalized Rosenbrock. CPU timings (ms) against n for x_0 = (1.3003, 122, ..., 1.3003, 122) and x* = (1.0, ..., 1.0); Newton 25 iterations, Chebyshev 12, Halley 12, Super Halley 10.]

Figures 1, 2 and 3 show that the total CPU time to solve the unconstrained optimization problem (5) increases linearly in the number of unknowns, and that the CPU times of the methods are almost identical. If there is to be any gain in using third order methods they must use fewer iterations than the second order methods, since the computational cost per iteration is higher for the third order methods. How many fewer iterations the third order methods must use is very problem dependent, since the CPU time is also strongly dominated by the cost of the function, gradient, Hessian and tensor evaluations.

3 Higher Order Global Methods

A third order Taylor approximation of f(x + p) evaluated at x is

    m(p) = f + g^T p + (1/2) p^T H p + (1/6) p^T (pT) p    (6)

where f = f(x), and we can expect the model m(p) to be a good approximation for small ||p||. Algorithms for the unconstrained optimization problem

    min_{x ∈ R^n} f(x)    (7)

generate a sequence of iterates x_k, where at every iterate x_k we build a model function m_k(p) that approximates the function. The new iterate is x_{k+1} = x_k + p, provided that we have a sufficiently good model. To solve the unconstrained optimization problem we have implemented a trust-region method [19], both with a cubic and a quadratic model. The trust-region method is a global method; thus it guarantees that it finds an x* where ∇f(x*) = 0, for any starting point. The trust-region method will require the value, the gradient and the Hessian matrix of the cubic model (6),

    ∇m(p) = g + Hp + (1/2)(pT)p,    ∇²m(p) = H + (pT).    (8)

The trust-region subproblem (TRS) (9) must be solved at each iteration of the trust-region method:

    min_{p ∈ R^n} m(p) = g^T p + (1/2) p^T H p + (1/6) p^T (pT) p   s.t.  ||p|| ≤ Δ.    (9)

Algorithm 1 Trust-Region Method
Let 0 < γ_2 ≤ γ_3 < 1 ≤ γ_1, 0 < β_0 and 0 < β_1 < 1.
Given Δ̂ > 0, Δ_0 ∈ (0, Δ̂), and η ∈ [0, β_0)
for k = 0, 1, 2, ... do
  Compute ∇f(x_k), ∇²f(x_k) and ∇³f(x_k).
  Determine an (approximate) solution p_k to the TRS (9).
  Compute ρ_k = (f(x_k) - f(x_k + p_k)) / (m_k(0) - m_k(p_k)).
  if ρ_k < β_0 then
    Δ_{k+1} ∈ [γ_2 ||p_k||, γ_3 ||p_k||]
  else
    if ρ_k > β_1 and ||p_k|| = Δ_k then
      Δ_{k+1} = min{γ_1 Δ_k, Δ̂}
    end if
  end if
  if ρ_k > η then x_{k+1} = x_k + p_k else x_{k+1} = x_k end if
end for

Preferable values for the constants in Algorithm 1 are β_0 = 1/4, β_1 = 3/4, γ_1 = 2, and γ_2 = γ_3 = 1/2.

3.1 Trust-Region Iterations for Quadratic and Cubic Model

In this section we compare the trust-region method with a cubic model to the trust-region method with a quadratic model on the number of iterations from a starting point x_0 to a solution x*, where ∇f(x*) = 0. We have taken several test functions and tested them for different n and starting points: Chained Rosenbrock with x_0 ∈ {(-1.2, 1.0, ..., -1.2, 1.0), (-1.0, 1.0, ..., -1.0, 1.0)} and x* = (1.0, ..., 1.0); Generalized Rosenbrock with x_0 = (-1.2, 1.0, ..., -1.2, 1.0) and x* = (1.0, ..., 1.0); Broyden Tridiagonal with x_0 = (-1.0, -1.0, ..., -1.0, -1.0) and x* = (-0.57, ..., -0.42); and Beale [3] with x_0 ∈ {(4, 4), (3, 3), (2, 2)} and x* = (3.5, 0.5). Table 1 shows that the trust-region method with a cubic model (C) uses fewer iterations than with a quadratic model (Q) to get to a solution. The termination criterion for both methods is ||∇f(x_k)|| ≤ 10^{-8} ||∇f(x_0)||.

Table 1: Trust-Region Method. Trust-region iterations for the quadratic (Q) and cubic (C) model.

    Chained Rosenbrock | Chained Rosenbrock | Generalized Rosenbrock | Broyden Tridiagonal |   Beale
     n    Q    C       |  n    Q    C       |  n    Q    C           |  n    Q    C        |  n    Q    C
     2   22   15       |  4   14   12       |  4   12    6           |  4    6    4        |  2   18   12
     6   29   16       |  8   14   10       |  8   10    6           | 20    6    4        |  2   16    9
    18   27   19       | 10   15   12       | 10   10    6           | 30    6    4        |  2   14   11

4 Computations of the Cubic Model

Let f be a three times continuously differentiable function f: R^n -> R. For a given x ∈ R^n let

    g_i = ∂f(x)/∂x_i,  H_{ij} = ∂²f(x)/∂x_i ∂x_j,  T_{ijk} = ∂³f(x)/∂x_i ∂x_j ∂x_k,  1 ≤ i, j, k ≤ n.    (10)

Then H is a symmetric matrix and T a super-symmetric tensor, with g ∈ R^n, H ∈ R^{n×n}, and T ∈ R^{n×n×n}, using the notation in [2]. We say that an n×n×n tensor is super-symmetric when

    T_{ijk} = T_{ikj} = T_{jik} = T_{jki} = T_{kij} = T_{kji},  i ≠ j, j ≠ k, i ≠ k,
    T_{iik} = T_{iki} = T_{kii},  i ≠ k.

Since the entries of a super-symmetric tensor are invariant under any permutation of the indices, we only need to store the (1/6)(n+2)(n+1)n elements T_{ijk} with 1 ≤ k ≤ j ≤ i ≤ n, as illustrated in Figure 4 for n = 9.

[Figure 4: The stored elements of a dense super-symmetric tensor where n = 9.]

We will present the following tensor computations of the cubic model (6):

    p^T(pT)p,  (pT)p  and  (pT)

where p^T(pT)p ∈ R is the cubic value term, (pT)p ∈ R^n is the gradient term and (pT) ∈ R^{n×n} is the Hessian term. The cubic value term is a scalar, the cubic gradient term is a vector, and the cubic Hessian term is a matrix. These operations are needed for the local methods (the Halley class) and the global method (trust-region).
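To make the packed storage and the three terms concrete, here is a small self-contained C sketch (not from the report; the index formula and helper names are assumptions for illustration). It packs the elements T_{ijk} with i ≥ j ≥ k into a linear array of length n(n+1)(n+2)/6 and evaluates the value and gradient terms by brute force over all n³ index triples, which is a convenient cross-check for the optimized Algorithms 2 and 3 below.

    #include <stddef.h>

    /* Packed position of T[i][j][k] for i >= j >= k (0-based indices):
       elements ordered by i, then j, then k, giving n(n+1)(n+2)/6 entries in total. */
    static size_t tidx(size_t i, size_t j, size_t k) {
        return i*(i+1)*(i+2)/6 + j*(j+1)/2 + k;
    }

    /* Look up T_{abc} for an arbitrary index order by sorting (a,b,c) descending,
       which is valid because T is super-symmetric. */
    static double tget(const double *T, size_t a, size_t b, size_t c) {
        size_t t;
        if (a < b) { t = a; a = b; b = t; }
        if (b < c) { t = b; b = c; c = t; }
        if (a < b) { t = a; a = b; b = t; }
        return T[tidx(a, b, c)];
    }

    /* Brute-force cubic value term p^T (pT) p = sum_{a,b,c} p_a p_b p_c T_{abc}. */
    static double value_term(const double *T, const double *p, size_t n) {
        double c = 0.0;
        for (size_t a = 0; a < n; ++a)
            for (size_t b = 0; b < n; ++b)
                for (size_t d = 0; d < n; ++d)
                    c += p[a] * p[b] * p[d] * tget(T, a, b, d);
        return c;
    }

    /* Brute-force cubic gradient term ((pT)p)_a = sum_{b,c} p_b p_c T_{abc}. */
    static void gradient_term(const double *T, const double *p, double *g, size_t n) {
        for (size_t a = 0; a < n; ++a) {
            g[a] = 0.0;
            for (size_t b = 0; b < n; ++b)
                for (size_t d = 0; d < n; ++d)
                    g[a] += p[b] * p[d] * tget(T, a, b, d);
        }
    }

The optimized algorithms below touch each stored element only once and fold in the multiplicities 6, 3 and 1 explicitly, instead of sorting indices on every access.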

The cubic value term is defined as

    p^T(pT)p = Σ_{i=1}^{n} p_i Σ_{j=1}^{n} p_j Σ_{k=1}^{n} p_k T_{ijk}.    (11)

By taking into account super-symmetry, 1 ≤ k ≤ j ≤ i ≤ n, we have the following:

    p^T(pT)p = Σ_{i=1}^{n} p_i [ Σ_{j=1}^{i-1} p_j ( 6 Σ_{k=1}^{j-1} p_k T_{ijk} + 3 p_j T_{ijj} ) + 3 p_i Σ_{k=1}^{i-1} p_k T_{iik} + p_i² T_{iii} ].    (12)

The computation (12) leads to the following algorithm.

Algorithm 2 Computing c <- c + p^T(pT)p.
Let T ∈ R^{n×n×n} be a super-symmetric tensor. Let p ∈ R^n. Let c, s, t ∈ R.
for i = 1 to n do
  t = 0
  for j = 1 to i-1 do
    s = 0
    for k = 1 to j-1 do
      s += p_k T_{ijk}
    t += p_j (6s + 3 p_j T_{ijj})
  s = 0
  for k = 1 to i-1 do
    s += p_k T_{iik}
  c += p_i (t + p_i (3s + p_i T_{iii}))

Algorithm 2 requires (1/3)n³ + 3n² + (11/3)n arithmetic operations.

The cubic gradient term is defined as

    g_i = Σ_{j=1}^{n} Σ_{k=1}^{n} p_j p_k T_{ijk},  1 ≤ i ≤ n.    (13)

By taking into account super-symmetry we have the following:

    g_i = g_i^{(1)} + g_i^{(2)} + g_i^{(3)} + g_i^{(4)} + g_i^{(5)} + g_i^{(6)},  1 ≤ i ≤ n,    (14)

where

    g_i^{(1)} = Σ_{j=1}^{i} Σ_{k=1}^{j} p_j p_k T_{ijk},  1 ≤ i ≤ n,    (15)

    g_i^{(2)} = Σ_{j=1}^{i} Σ_{k=j+1}^{i} p_j p_k T_{ijk} = Σ_{j=1}^{i} Σ_{k=j+1}^{i} p_j p_k T_{ikj},  1 ≤ i ≤ n,    (16)

    g_i^{(3)} = Σ_{j=1}^{i} Σ_{k=i+1}^{n} p_j p_k T_{ijk} = Σ_{j=1}^{i} Σ_{k=i+1}^{n} p_j p_k T_{kij},  1 ≤ i ≤ n,    (17)

    g_i^{(4)} = Σ_{j=i+1}^{n} Σ_{k=1}^{i} p_j p_k T_{ijk} = Σ_{j=i+1}^{n} Σ_{k=1}^{i} p_j p_k T_{jik},  1 ≤ i ≤ n,    (18)

    g_i^{(5)} = Σ_{j=i+1}^{n} Σ_{k=i+1}^{j} p_j p_k T_{ijk} = Σ_{j=i+1}^{n} Σ_{k=i+1}^{j} p_j p_k T_{jki},  1 ≤ i ≤ n,    (19)

    g_i^{(6)} = Σ_{j=i+1}^{n} Σ_{k=j+1}^{n} p_j p_k T_{ijk} = Σ_{j=i+1}^{n} Σ_{k=j+1}^{n} p_j p_k T_{kji},  1 ≤ i ≤ n.    (20)

If we combine the computations of (15) to (20) we get the following algorithm.

Algorithm 3 Computing g <- g + (pT)p.
Let T ∈ R^{n×n×n} be a super-symmetric tensor. Let p, g ∈ R^n. Let s ∈ R.
for i = 1 to n do
  s = 0
  for j = 1 to i-1 do
    for k = 1 to j-1 do
      s += p_k T_{ijk}
      g_k += 2 p_i p_j T_{ijk}
    g_i += p_j (2s + p_j T_{ijj})
    g_j += 2 p_i (s + p_j T_{ijj})
    s = 0
  for k = 1 to i-1 do
    s += p_k T_{iik}
    g_k += p_i² T_{iik}
  g_i += (2 p_i s + p_i² T_{iii})

Algorithm 3 requires (2/3)n³ + 6n² - (2/3)n arithmetic operations.

The cubic Hessian term is defined as

    H_{ij} = Σ_{k=1}^{n} p_k T_{ijk},  1 ≤ i, j ≤ n.    (21)

By taking into account the symmetries we have the following:

    H_{ij} = Σ_{k=1}^{j} p_k T_{ijk} + Σ_{k=j+1}^{i} p_k T_{ikj} + Σ_{k=i+1}^{n} p_k T_{kij},    (22)

Since a super-symmetric tensor is invariant under any permutation of the indices we have that

    H_{ij} = H_{ij}^{(1)} + H_{ij}^{(2)} + H_{ij}^{(3)},  1 ≤ j ≤ i ≤ n,    (23)

where

    H_{ij}^{(1)} = Σ_{k=1}^{j} p_k T_{ijk},  1 ≤ j ≤ i ≤ n,    (24)

    H_{ij}^{(2)} = Σ_{k=j+1}^{i} p_k T_{ikj},  1 ≤ j ≤ i ≤ n,    (25)

    H_{ij}^{(3)} = Σ_{k=i+1}^{n} p_k T_{kij},  1 ≤ j ≤ i ≤ n.    (26)

If we combine (24), (25) and (26) we get the following algorithm that computes the lower part of H.

Algorithm 4 Computing H <- H + (pT).
Let T ∈ R^{n×n×n} be a super-symmetric tensor. Let H ∈ R^{n×n} be a symmetric matrix. Let p ∈ R^n.
for i = 1 to n do
  for j = 1 to i-1 do
    for k = 1 to j-1 do
      H_{ij} += p_k T_{ijk}
      H_{ik} += p_j T_{ijk}
      H_{jk} += p_i T_{ijk}
    H_{ij} += p_j T_{ijj}
    H_{jj} += p_i T_{ijj}
  for k = 1 to i-1 do
    H_{ii} += p_k T_{iik}
    H_{ik} += p_i T_{iik}
  H_{ii} += p_i T_{iii}

Algorithm 4 requires n³ + n² arithmetic operations.

5 The Sparsity Structure of the Tensor

In this section we show how we can utilize the sparsity structure of the Hessian matrix to induce the sparsity of the third derivative (tensor); that is, we introduce the concept of induced sparsity. Finally, we focus on objective functions where the Hessian matrix has a skyline (or envelope) structure. We use the structure of partially separable functions to derive the structure of the Hessian matrices and thus of the tensor.

5.1 Induced Sparsity

The sparsity of the Hessian matrix is defined by [10]: element (i, j) is structurally zero when

    ∂²f(x)/∂x_i ∂x_j = 0  for all x ∈ R^n,  1 ≤ i, j ≤ n.    (27)

Define

    C_i = { j : there exists x ∈ R^n so that ∂²f(x)/∂x_i ∂x_j ≠ 0 }.    (28)

C_i is the nonzero index structure of row i of the Hessian matrix. We assume that the diagonal element is included in the index structure of the Hessian matrix, i.e. i ∈ C_i.

Theorem 2. The sparsity structure of the tensor T is determined by the sparsity structure (27) of the Hessian matrix.

Proof. Assume that j ∉ C_i, i ≠ j. It follows that T_{ijk} = 0, 1 ≤ k ≤ n.

We say that the sparsity structure of the tensor is induced by the sparsity structure of the Hessian matrix. This is further illustrated in Figures 5, 6, 7 and 8.

[Figure 5: Stored elements of a symmetric arrowhead matrix.]

[Figure 6: Stored elements of a symmetric tridiagonal matrix.]
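As a small illustration of Theorem 2 (not from the report; the dense 0/1 representation and the function name are assumptions), the following C sketch counts the structurally nonzero stored elements of the induced tensor: a stored element T_{ijk} with i ≥ j ≥ k can be nonzero only when j ∈ C_i and k ∈ C_i ∩ C_j, i.e. when (i,j), (i,k) and (j,k) are all nonzero positions of the Hessian.

    #include <stdio.h>

    #define N 5

    /* Count the structurally nonzero stored elements T_{ijk}, i >= j >= k, of the
       tensor induced by a Hessian with 0/1 structure S (lower triangle is used). */
    static int induced_tensor_nnz(const int S[N][N]) {
        int count = 0;
        for (int i = 0; i < N; ++i)
            for (int j = 0; j <= i; ++j)
                for (int k = 0; k <= j; ++k)
                    if (S[i][j] && S[i][k] && S[j][k])
                        ++count;
        return count;
    }

    int main(void) {
        /* Arrowhead structure: dense first column plus the diagonal (cf. Figure 5). */
        int arrowhead[N][N] = {{0}};
        for (int i = 0; i < N; ++i) { arrowhead[i][0] = 1; arrowhead[i][i] = 1; }
        printf("induced nnz(T) for arrowhead, n = %d: %d\n", N, induced_tensor_nnz(arrowhead));
        return 0;
    }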

[Figure 7: Stored elements of the tensor induced by an arrowhead symmetric matrix where n = 9.]

[Figure 8: Stored elements of the tensor induced by a tridiagonal symmetric matrix where n = 9.]

5.2 A Skyline Matrix

In symmetric skyline storage mode, all matrix elements from the first nonzero in each row to the diagonal in the row are explicitly stored. We define β_i to be the (lower) bandwidth of row i,

    β_i = max { i - j : H_{ij} nonzero, j ≤ i }.

Further, define f_i to be the start index of row i in the Hessian matrix,

    f_i = i - β_i.    (29)

The storage requirement for a symmetric skyline storage (only the lower triangle needs to be stored) is Σ_i β_i + n.

5.3 Partially Separable Functions

A function of the form f = Σ_i φ_i is partially separable when each element function φ_i depends on only a few components of x [10]. It follows that the gradient ∇φ_i and Hessian ∇²φ_i of each element function contain just a few nonzeros.

5.3.1 A Skyline Structure

Theorem 1. If the function f = Σ φ_i is partially separable of the form

    f(x) = Σ_{i=1}^{m} φ_i(x_{i-β_i^{(l)}}, x_{i-β_i^{(l)}+1}, ..., x_{i+β_i^{(u)}-1}, x_{i+β_i^{(u)}})

and β_i = β_i^{(l)} + β_i^{(u)} ≥ 0, i = 1, 2, ..., m, then the Hessian matrix has a skyline structure, and f_i in (29) is monotonically increasing.

Proof. The gradient of f is

    ∂f/∂x_j (x) = Σ_{i=1}^{m} ∂/∂x_j φ_i(x_{i-β_i^{(l)}}, ..., x_{i+β_i^{(u)}}) = Σ_{i ∈ J_j} ∂/∂x_j φ_i(x_{i-β_i^{(l)}}, ..., x_{i+β_i^{(u)}}),  j = 1, 2, ..., n,

where J_j = { i : i - β_i^{(l)} ≤ j ≤ i + β_i^{(u)} }, j = 1, 2, ..., n. J_j is the set of all element functions containing x_j. Then element j of the gradient depends on

    x_{l_j}, x_{l_j+1}, ..., x_j, x_{j+1}, ..., x_{u_j-1}, x_{u_j}

where

    l_j = min_{i ∈ J_j} { i - β_i^{(l)} }  and  u_j = max_{i ∈ J_j} { i + β_i^{(u)} }.

We have that f_j = l_j, and we further show that f_j is monotonically increasing. Let i be the element function in J_{j+1} with f_{j+1} = l_{j+1} = i - β_i^{(l)}. Suppose that f_{j+1} < f_j; we show that this is not possible. Since f_{j+1} < f_j ≤ j we then have i ∈ J_j, but then β_i^{(l)} gives i - β_i^{(l)} ≥ f_j = min_{i' ∈ J_j} { i' - β_{i'}^{(l)} }, i.e. f_{j+1} ≥ f_j. This violates the assumption that f_{j+1} < f_j; hence f_{j+1} ≥ f_j, and thus f is monotonically increasing. Since the union of closed intervals with a common element is also a closed interval, [l_j, u_j] is a closed interval. Thus we have a skyline matrix where each row j of H contains the elements of the union of all φ_i that contain x_j, i.e. C_j = {f_j, ..., j}.

5.3.2 A Banded Matrix

In a band matrix β_i is constant, which implies that f_i is strictly monotonically increasing, that is f_i < f_{i+1}, i = β, ..., n - β. The exception is in the first rows: when i = 1, 2, ..., β, then f_i = f_{i+1}.

Corollary 1. If the function f = Σ φ_i is partially separable of the form

    f(x) = Σ_{i=1+β^{(l)}}^{n-β^{(u)}} φ_i(x_{i-β^{(l)}}, x_{i-β^{(l)}+1}, ..., x_{i+β^{(u)}-1}, x_{i+β^{(u)}})

where β = β^{(l)} + β^{(u)} is constant, then the Hessian is a band matrix with (half) bandwidth β.

Proof. Then

    J_j = { i : j - β^{(u)} ≤ i ≤ j + β^{(l)} }   if j = 1 + β^{(l)}, ..., n - β^{(u)},
    J_j = { i : 1 + β^{(l)} ≤ i ≤ j + β^{(l)} }   if j = 1, 2, ..., β^{(l)},
    J_j = { i : j - β^{(u)} ≤ i ≤ n - β^{(u)} }   if j = n, n-1, ..., n - β^{(u)} + 1,

such that element j of the gradient depends on

    x_{l_j}, x_{l_j+1}, ..., x_j, x_{j+1}, ..., x_{u_j-1}, x_{u_j}

where

    l_j = min_{i ∈ J_j} { i - β^{(l)} } = j - β^{(l)} - β^{(u)}  and  u_j = max_{i ∈ J_j} { i + β^{(u)} } = j + β^{(l)} + β^{(u)}.

Thus f_j = l_j and f_j < f_{j+1}, except for j = 1, 2, ..., β, where l_j = 1 and f_j = f_{j+1}.

6 Data Structures for Sparse Tensors

In this section we briefly discuss data structures for induced sparse super-symmetric tensors. The sparsity structure of the tensor is decided by the sparsity structure of row i and row j of the Hessian matrix; this follows from (27). Thus we have the intersection of two index sets. Further, we define the tube (i, j) to be

    T_{ijk},  k = 1, 2, ..., j,  k ∈ C_i ∩ C_j.

C_i ∩ C_j is the nonzero index structure of tube (i, j) of the tensor, where C_i and C_j are defined as in (28). The number of stored nonzeros in the tensor is thus

    nnz(T) = Σ_{i=1}^{n} Σ_{j ∈ C_i, j ≤ i} | C_i ∩ C_j ∩ {1, ..., j} |.

6.1 A General Sparse Tensor

To create a data structure for a general sparse super-symmetric tensor we must compute and store the whole index structure a priori, before any computation with the tensor. Then we have a memory requirement of at least 2 nnz(T). Since the structure of the tensor is known from the Hessian matrix, this should be exploited to save memory. This can be done by performing the computation C_i ∩ C_j before each numerical computation with tube (i, j). Then we only need to store the size of C_i ∩ C_j. This requires less memory, but the computational cost will increase. This is illustrated in Algorithm 5, computing g <- g + (pT)p.

6.1.1 An ANSI C Implementation with A Priori Storage

In this section we present an ANSI C implementation of the operation g <- g + (pT)p with a priori storage of the tensor induced by a general sparse Hessian matrix. Thus the tensor is stored as a general sparse tensor. Consider the following matrix H (note that the numerical values are randomly generated for the matrix H, its induced tensor and the vector p),

    H = [ 1  1  0  0  0  1
          1  2  0  1  0  0
          0  0  3  0  2  1
          0  1  0  4  0  0
          0  0  2  0  5  1
          1  0  1  0  1  6 ]

[Figure 9: Stored elements of a symmetric general matrix.]

Then we have the following symmetric Compressed Row Storage (CRS) in ANSI C of the matrix H:

    double valueH[] = {1,1,2,3,1,4,2,5,1,1,1,6};
    int indexH[] = {0,0,1,2,1,3,2,4,0,2,4,5};
    int pointerH[] = {0,1,3,4,6,8,12};

Then we have the following Super-Symmetric Compressed Tube Storage (CTS) in ANSI C of the induced tensor of the matrix H:

    double valueT[] = {1,2,2,2,3,4,4,4,5,5,5,6,6,6,6,6,6,6,6};
    int indexT[] = {0,0,0,1,2,1,1,3,2,2,4,0,2,2,4,0,2,4,5};
    int pointerT[] = {0,1,2,4,5,6,8,9,11,12,13,15,19};

Finally, we have the implementation of the operation g <- g + (pT)p using the above data structures:

    int N = 6;
    double p[] = {1,1,1,1,1,1};
    double g[] = {0,0,0,0,0,0};
    int start = 0, stop = 0, i = 0, j = 0, k = 0;
    int ind = 0, tp = 0;
    double pi = 0, pj = 0, pk = 0, ppi = 0, pipj = 0, pipj2 = 0, pi2 = 0, pj2 = 0, pkT = 0;
    int starttube = 0, stoptube = 0;
    double Tijk = 0, Tijj = 0, Tiik = 0;
    double ga = 0;
    for(i = 0; i < N; i++, ind++, tp++){
        start = pointerH[i];
        stop = pointerH[i+1]-1;
        j = indexH[start];
        pi = p[i];
        pi2 = 2*pi;
        ppi = pi*pi;
        for(; j < i; start++, ind++, tp++){
            j = indexH[start];
            pj = p[j];
            pipj = pi*pj;
            pipj2 = 2*pipj;
            pj2 = 2*pj;
            starttube = pointerT[tp];
            stoptube = pointerT[tp+1]-1;
            for(; starttube < stoptube; starttube++, ind++){
                //Handle the case when no indices are equal: i != j != k
                k = indexT[ind];
                Tijk = valueT[ind];
                pk = p[k];
                pkT = pk*Tijk;
                ga += pkT;
                g[k] += pipj2*Tijk;
            }
            //Handle the case when two indices are equal: j = k
            Tijj = valueT[ind];
            g[i] += (pj2*ga + pj*pj*Tijj);
            g[j] += (pi2*ga + pipj2*Tijj);
            ga = 0.0;
            j = indexH[start+1];
        }
        starttube = pointerT[tp];
        stoptube = pointerT[tp+1]-1;
        for(; starttube < stoptube; starttube++, ind++){
            //Handle the case when two indices are equal: i = j
            k = indexT[ind];
            Tiik = valueT[ind];
            pk = p[k];
            ga += pk*Tiik;
            g[k] += ppi*Tiik;
        }
        //Handle the case when all three indices are equal
        g[i] += (pi2*ga + ppi*valueT[ind]);
        ga = 0.0;
    }

We have used linear arrays to store both the Hessian matrix and its induced tensor. If flexibility is an issue, jagged arrays can perform just as well as linear arrays with respect to efficiency [11, 17]. This applies to the languages C, C# and Java [12].

6.2 Banded and Skyline Tensors

It is more straightforward to create data structures for banded or skyline tensors, since we only need the index structure of the Hessian matrix, without any loss in performance. For a skyline matrix we have that

C_i = {f_i, ..., i} and C_i ∩ C_j = {max{f_i, f_j}, ..., j}, j ≤ i. A band tensor has start index k = f_i for tube (i, j); all elements from the start index to j are nonzero elements of a banded tensor. Further, a skyline tensor has start index k = max{f_i, f_j} for tube (i, j); all elements from the start index to j are nonzero elements of a skyline tensor. This is illustrated in Algorithm 6, computing H <- H + (pT) for a skyline tensor.

Algorithm 5 Computing g <- g + (pT)p.
Let T ∈ R^{n×n×n} be a super-symmetric tensor. Let p, g ∈ R^n. Let s ∈ R. C_i is the nonzero index structure of row i of the Hessian matrix.
for i = 1 to n do
  s = 0
  for j ∈ C_i, j < i do
    for k ∈ C_i ∩ C_j, k < j do
      s += p_k T_{ijk}
      g_k += 2 p_i p_j T_{ijk}
    g_i += p_j (2s + p_j T_{ijj})
    g_j += 2 p_i (s + p_j T_{ijj})
    s = 0
  for k ∈ C_i, k < i do
    s += p_k T_{iik}
    g_k += p_i² T_{iik}
  g_i += (2 p_i s + p_i² T_{iii})

Algorithm 6 Computing H <- H + (pT).
Let T ∈ R^{n×n×n} be a super-symmetric tensor. Let H ∈ R^{n×n} be a symmetric matrix. Let p ∈ R^n.
for i = 1 to n do
  for j = f_i to i-1 do
    for k = max{f_i, f_j} to j-1 do
      H_{ij} += p_k T_{ijk}
      H_{ik} += p_j T_{ijk}
      H_{jk} += p_i T_{ijk}
    H_{ij} += p_j T_{ijj}
    H_{jj} += p_i T_{ijj}
  for k = f_i to i-1 do
    H_{ii} += p_k T_{iik}
    H_{ik} += p_i T_{iik}
  H_{ii} += p_i T_{iii}

Figure 10: Algorithm 5 requires 4 nnz(T) + 8 nnz(H) - 6n arithmetic operations.

Figure 11: Algorithm 6 requires 6 nnz(T) - 4 nnz(H) arithmetic operations.
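To make the tube structure concrete, here is a small C sketch (not from the report; names are illustrative) that computes nnz(T) for an induced tensor directly from the symmetric CRS structure of the Hessian lower triangle: for every stored entry (i, j) it counts |C_i ∩ C_j ∩ {0, ..., j}| by merging the two sorted row index sets. With the indexH/pointerH arrays of the 6×6 example above and n = 6, it returns 19, the length of valueT.

    /* Count stored elements of the induced tensor from the lower-triangular CRS
       structure of H: for each stored (i, j), count k <= j with k in C_i and C_j.
       Using only the lower triangle is enough because k <= j <= i. */
    static int induced_nnz(const int *indexH, const int *pointerH, int n) {
        int nnzT = 0;
        for (int i = 0; i < n; ++i) {
            for (int a = pointerH[i]; a < pointerH[i + 1]; ++a) {
                int j = indexH[a];
                int bi = pointerH[i], bj = pointerH[j];
                while (bi < pointerH[i + 1] && bj < pointerH[j + 1]) {
                    int ki = indexH[bi], kj = indexH[bj];
                    if (ki > j || kj > j) break;           /* only k <= j belongs to the tube */
                    if (ki == kj)      { ++nnzT; ++bi; ++bj; }
                    else if (ki < kj)  { ++bi; }
                    else               { ++bj; }
                }
            }
        }
        return nnzT;
    }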

7 The Efficiency Ratio Indicator

As a measure of the complexity of working with the third derivative (tensor) compared to the second derivative (matrix), we use the ratio of the number of nonzero elements in the tensor to the number of nonzero elements in the Hessian matrix. The cubic to quadratic value term ratio is

    flops(p^T(pT)p) / flops(p^T Hp) = (2 nnz(T) + 5 nnz(H) - n) / (2 nnz(H) + 3n) = nnz(T)/nnz(H) + 2.5 + O(1/n),

and the cubic to quadratic gradient term ratio is

    flops((pT)p) / flops(Hp) = (4 nnz(T) + 8 nnz(H) - 6n) / (4 nnz(H) - n) = nnz(T)/nnz(H) + 2 + O(1/n),

where lim_{n -> ∞} O(1/n) = 0.

The number of nonzero elements that we store of the banded Hessian matrix is

    nnz(H) = Σ_{i=0}^{β} (n - i) = (β + 1)(n - β/2).

The number of nonzero elements in the tensor induced by a uniform banded Hessian matrix with bandwidth β, where n is the dimension of the Hessian matrix, is

    nnz(T) = ( Σ_{k=0}^{β} (β + 1 - k) ) (n - β) + Σ_{i=0}^{β} (1/2) i (i + 1) = (1/2)(β + 1)(β + 2)(n - (2/3)β).

For a uniform band matrix the ratio of the number of nonzero elements of the tensor to the Hessian matrix is

    nnz(T)/nnz(H) = (1/2)(β + 1)(β + 2)(n - (2/3)β) / ((β + 1)(n - β/2)) = (1/2)(β + 2)(n - (2/3)β)/(n - β/2) ≤ (β + 2)/2.

Thus the ratio of the number of nonzero elements of the tensor to those of the band Hessian matrix is (β + 2)/2 for β ≪ n, and the number of nonzero elements of the tensor is of the same order as for the band Hessian matrix. Taking into account symmetry, this ratio is (n + 2)/3 for a dense matrix and tensor.

7.1 The Cost of the Halley Class versus the Newton Cost

For each iteration of Newton's method and each iteration of the Halley class there is a difference in cost. For each of these methods we outline the computational cost and compare them to each other. The computational cost is for a banded Hessian matrix, thus the bandwidth β is fixed.
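Before turning to the per-iteration costs below, here is a small C sketch (illustrative, not from the report) of the banded counts just derived: it evaluates nnz(H) = (β+1)(n - β/2) and nnz(T) = (1/2)(β+1)(β+2)(n - 2β/3) and their ratio, which stays below the bound (β+2)/2.

    #include <stdio.h>

    /* Nonzeros of the lower triangle of a banded Hessian with half bandwidth b. */
    static double nnz_banded_hessian(double n, double b) {
        return (b + 1.0) * (n - b / 2.0);
    }

    /* Nonzeros of the induced banded tensor (stored elements i >= j >= k). */
    static double nnz_banded_tensor(double n, double b) {
        return 0.5 * (b + 1.0) * (b + 2.0) * (n - 2.0 * b / 3.0);
    }

    int main(void) {
        double n = 1000.0;
        for (double b = 1.0; b <= 16.0; b *= 2.0) {
            double ratio = nnz_banded_tensor(n, b) / nnz_banded_hessian(n, b);
            printf("beta = %4.0f  nnz(T)/nnz(H) = %7.3f  bound (beta+2)/2 = %5.1f\n",
                   b, ratio, (b + 2.0) / 2.0);
        }
        return 0;
    }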

Since for small β the square root is a significant part of the computation, we use an LDL^T factorization. Further, we solve the system LDL^T x = b via Ly = b, Dz = y, and L^T x = z [9].

The computational effort of one step of Newton's method is
1. evaluation of f(x_k), ∇f(x_k), ∇²f(x_k)
2. factorization LDL^T = ∇²f(x_k): requires nβ² + 4nβ flops
3. solution of LDL^T s^{(1)} = -∇f(x_k): requires 4nβ + n flops
4. updating the solution x_{k+1} = x_k + s^{(1)}: requires n flops

The total computational effort for one Newton step, without the cost of function, gradient and Hessian evaluations, is n(β² + 8β + 2) flops.

The computational effort of one step of Chebyshev's method is
1. evaluation of f(x_k), ∇f(x_k), ∇²f(x_k), and ∇³f(x_k)
2. factorization LDL^T = ∇²f(x_k): requires nβ² + 4nβ flops
3. solution of LDL^T s^{(1)} = -∇f(x_k): requires 4nβ + n flops
4. evaluation of (pT)p: requires 2(β+1)(β+2)(n - (2/3)β) + 8(β+1)(n - β/2) - 6n flops
5. solution of LDL^T s^{(2)} = -(1/2)(pT)p: requires 4nβ + n flops
6. updating the solution x_{k+1} = x_k + s^{(1)} + s^{(2)}: requires 2n flops

The total computational effort for one Chebyshev step, without the cost of function, gradient, Hessian and tensor evaluations, is 3nβ² - (4/3)β³ + 26nβ - 8β² - (20/3)β + 10n flops.

The computational effort of one step of Halley's method is
1. evaluation of f(x_k), ∇f(x_k), ∇²f(x_k), and ∇³f(x_k)
2. factorization LDL^T = ∇²f(x_k): requires nβ² + 4nβ flops
3. solution of LDL^T s^{(1)} = -∇f(x_k): requires 4nβ + n flops
4. evaluation of (pT): requires 3(β+1)(β+2)(n - (2/3)β) - 4(β+1)(n - β/2) flops
5. evaluation of (pT)p (matrix-vector product): requires 4nβ + 2n flops
6. evaluation of ∇²f(x_k) + (1/2)(pT): requires 2nβ + 2n flops
7. factorization LDL^T = ∇²f(x_k) + (1/2)(pT): requires nβ² + 4nβ flops
8. solution of LDL^T s^{(2)} = -(1/2)(pT)p: requires 4nβ + n flops
9. updating the solution x_{k+1} = x_k + s^{(1)} + s^{(2)}: requires 2n flops

The total computational effort for one Halley step, without the cost of function, gradient, Hessian and tensor evaluations, is 5nβ² - 2β³ + 27nβ - 4β² - 2β + 10n flops.

The computational effort of one step of Super Halley's method is
1. evaluation of f(x_k), ∇f(x_k), ∇²f(x_k), and ∇³f(x_k)
2. factorization LDL^T = ∇²f(x_k): requires nβ² + 4nβ flops

3. solution of LDL^T s^{(1)} = -∇f(x_k): requires 4nβ + n flops
4. evaluation of (pT): requires 3(β+1)(β+2)(n - (2/3)β) - 4(β+1)(n - β/2) flops
5. evaluation of (pT)p (matrix-vector product): requires 4nβ + 2n flops
6. evaluation of ∇²f(x_k) + (pT): requires nβ + n flops
7. factorization LDL^T = ∇²f(x_k) + (pT): requires nβ² + 4nβ flops
8. solution of LDL^T s^{(2)} = -(1/2)(pT)p: requires 4nβ + n flops
9. updating the solution x_{k+1} = x_k + s^{(1)} + s^{(2)}: requires 2n flops

The total computational effort for one Super Halley step, without the cost of function, gradient, Hessian and tensor evaluations, is 5nβ² - 2β³ + 26nβ - 4β² - 2β + 9n flops.

It is important to notice that we have not taken into account the evaluation of the test function, i.e. the function value, gradient, Hessian and/or tensor, since they are problem dependent. This unknown factor might have a crucial impact on the total running time of the methods.

7.1.1 The Efficiency Ratio Indicator for the Halley Class

The ratio indicators below show how much more one iteration of a third order method costs compared to one iteration of Newton's method. The Chebyshev to Newton cost is

    flops(Chebyshev)/flops(Newton) = (3nβ² - (4/3)β³ + 26nβ - 8β² - (20/3)β + 10n) / (n(β² + 8β + 2)) = (3β² + 26β + 10)/(β² + 8β + 2) + O(β/n),

the Halley to Newton cost is

    flops(Halley)/flops(Newton) = (5nβ² - 2β³ + 27nβ - 4β² - 2β + 10n) / (n(β² + 8β + 2)) = (5β² + 27β + 10)/(β² + 8β + 2) + O(β/n),

and the Super Halley to Newton cost is

    flops(Super Halley)/flops(Newton) = (5nβ² - 2β³ + 26nβ - 4β² - 2β + 9n) / (n(β² + 8β + 2)) = (5β² + 26β + 9)/(β² + 8β + 2) + O(β/n),

where lim_{n -> ∞} O(β/n) = 0.

7.2 Numerical Results

In this section we present numerical results where the theoretical flops count for the operations is compared to the ratio nnz(T)/nnz(H), and where the measured CPU times for the operations are compared to the ratio nnz(T)/nnz(H). Comparing the ratio nnz(T)/nnz(H) for all of the test cases shows that the ratio indicator has a linear behaviour for increasing n; thus it indicates the cost of going from a quadratic to a cubic operation.
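For reference, a small C sketch (not from the report; function names are illustrative) that evaluates the per-iteration flop totals from Section 7.1 and the resulting ratio indicators from Section 7.1.1. For large n the ratios approach (3β²+26β+10)/(β²+8β+2), (5β²+27β+10)/(β²+8β+2) and (5β²+26β+9)/(β²+8β+2).

    #include <stdio.h>

    /* Per-iteration flop totals for a banded Hessian with half bandwidth b
       (function and derivative evaluations excluded), as derived in Section 7.1. */
    static double newton_flops(double n, double b)    { return n*(b*b + 8.0*b + 2.0); }
    static double chebyshev_flops(double n, double b) {
        return 3.0*n*b*b - 4.0/3.0*b*b*b + 26.0*n*b - 8.0*b*b - 20.0/3.0*b + 10.0*n;
    }
    static double halley_flops(double n, double b) {
        return 5.0*n*b*b - 2.0*b*b*b + 27.0*n*b - 4.0*b*b - 2.0*b + 10.0*n;
    }
    static double super_halley_flops(double n, double b) {
        return 5.0*n*b*b - 2.0*b*b*b + 26.0*n*b - 4.0*b*b - 2.0*b + 9.0*n;
    }

    int main(void) {
        double n = 10000.0;
        for (double b = 1.0; b <= 32.0; b *= 2.0) {
            double nw = newton_flops(n, b);
            printf("beta = %4.0f  Chebyshev/Newton = %.2f  Halley/Newton = %.2f  Super Halley/Newton = %.2f\n",
                   b, chebyshev_flops(n, b)/nw, halley_flops(n, b)/nw, super_halley_flops(n, b)/nw);
        }
        return 0;
    }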

[Figure 12: Dense Hessian matrices, cubic term: the ratio nnz(T)/nnz(H) against the flops ratio, for n = 10, 30, 50, 80, 100.]

[Figure 13: Dense Hessian matrices, cubic term: the ratio nnz(T)/nnz(H) against the CPU ratio, for n = 10, 30, 50, 80, 100.]

[Figure 14: Dense Hessian matrices, gradient term: the ratio nnz(T)/nnz(H) against the flops ratio, for n = 10, 30, 50, 80, 100.]

[Figure 15: Dense Hessian matrices, gradient term: the ratio nnz(T)/nnz(H) against the CPU ratio, for n = 10, 30, 50, 80, 100.]

[Figure 16: Banded Hessian matrices, cubic term: the ratio nnz(T)/nnz(H) against the flops ratio, for n = 100, 300, 500, 800, 1000.]

[Figure 17: Banded Hessian matrices, cubic term: the ratio nnz(T)/nnz(H) against the CPU ratio, for n = 100, 300, 500, 800, 1000.]

[Figure 18: Banded Hessian matrices, gradient term: the ratio nnz(T)/nnz(H) against the flops ratio, for n = 100, 300, 500, 800, 1000.]

[Figure 19: Banded Hessian matrices, gradient term: the ratio nnz(T)/nnz(H) against the CPU ratio, for n = 100, 300, 500, 800, 1000.]

[Figure 20: Sparse Hessian matrices, cubic term: the ratio nnz(T)/nnz(H) against the flops ratio, for n = 100, 237, 468, 675, 729, 957, 960.]

[Figure 21: Sparse Hessian matrices, cubic term: the ratio nnz(T)/nnz(H) against the CPU ratio, for n = 100, 237, 468, 675, 729, 957, 960.]

[Figure 22: Sparse Hessian matrices, gradient term: the ratio nnz(T)/nnz(H) against the flops ratio, for n = 100, 237, 468, 675, 729, 957, 960.]

[Figure 23: Sparse Hessian matrices, gradient term: the ratio nnz(T)/nnz(H) against the CPU ratio, for n = 100, 237, 468, 675, 729, 957, 960.]

Conclusions and Future Work

In this paper we have seen examples of third order local methods that are competitive with second order methods (Newton). Thus the use of exact third order derivatives is practical from a computational point of view. The trust-region algorithm with a cubic model uses fewer iterations than when implemented with a quadratic model, for our test cases, indicating that exact third derivatives are also useful for global methods, which is promising for further work in this area. The concept of induced sparsity makes it possible to create data structures and algorithms for tensor operations in a straightforward manner. For banded and skyline type Hessian matrices we do not need to store the index structure of the tensor, just its values. For general sparse Hessian matrices we have presented some examples of data structures for the tensor, none of which we consider satisfactory when it comes to efficiency and memory requirements.

We have also seen that partially separable functions reveal the structure of the Hessian matrix, and since the tensor (third derivatives) is induced by the Hessian matrix, they also reveal its structure. Numerical results show that the ratio nnz(T)/nnz(H) is an indicator for all types of Hessian structure. The growth of the ratio nnz(T)/nnz(H) for dense Hessian structures is large for increasing n. The ratio nnz(T)/nnz(H) for sparse Matrix Market matrices is usually small. The efficiency ratio indicator for the Halley class for banded Hessian matrices is independent of n.

A Computing the Cubic Model

The cubic gradient term: (pT)p

Algorithm 7 Computing g^{(1)}.
Let T ∈ R^{n×n×n} be super-symmetric. Let g^{(1)} ∈ R^n be a vector.
for j = 1 to i do
  for k = 1 to j do
    g_i^{(1)} += p_j p_k T_{ijk}

[Figure 24: An implementation of (15).]

Algorithm 8 Computing g^{(2)}.
Let T ∈ R^{n×n×n} be super-symmetric. Let g^{(2)} ∈ R^n be a vector.
for j = 1 to i do
  for k = j+1 to i do
    g_i^{(2)} += p_j p_k T_{ikj}

[Figure 25: An implementation of (16) for T_{ikj}.]

Algorithm 9 Computing g^{(2)}.
Let T ∈ R^{n×n×n} be super-symmetric. Let g^{(2)} ∈ R^n be a vector.
for j = 1 to i do
  for k = 1 to j-1 do
    g_i^{(2)} += p_j p_k T_{ijk}

[Figure 26: A 1 ≤ i ≤ n approach of (16) for T_{ijk}.]

Algorithm 10 Computing g^{(3)}.
Let T ∈ R^{n×n×n} be super-symmetric. Let g^{(3)} ∈ R^n be a vector.
for j = 1 to i do
  for k = i+1 to n do
    g_i^{(3)} += p_j p_k T_{kij}

[Figure 27: An implementation of (17) for T_{kij}.]

Algorithm 11 Computing g^{(3)}.
Let T ∈ R^{n×n×n} be super-symmetric. Let g^{(3)} ∈ R^n be a vector.
for j = 1 to i-1 do
  for k = 1 to j do
    g_j^{(3)} += p_i p_k T_{ijk}

[Figure 28: A 1 ≤ i ≤ n approach of (17) for T_{ijk}.]

Algorithm 12 Computing g^{(4)}.
Let T ∈ R^{n×n×n} be super-symmetric. Let g^{(4)} ∈ R^n be a vector.
for j = i+1 to n do
  for k = 1 to i do
    g_i^{(4)} += p_j p_k T_{jik}

[Figure 29: An implementation of (18) for T_{jik}.]

Algorithm 13 Computing g^{(4)}.
Let T ∈ R^{n×n×n} be super-symmetric. Let g^{(4)} ∈ R^n be a vector.
for j = 1 to i-1 do
  for k = 1 to j do
    g_j^{(4)} += p_i p_k T_{ijk}

[Figure 30: A 1 ≤ i ≤ n approach of (18) for T_{ijk}.]

Algorithm 14 Computing g^{(5)}.
Let T ∈ R^{n×n×n} be super-symmetric. Let g^{(5)} ∈ R^n be a vector.
for j = i+1 to n do
  for k = i+1 to j do
    g_i^{(5)} += p_j p_k T_{jki}

[Figure 31: An implementation of (19) for T_{jki}.]

Algorithm 15 Computing g^{(5)}.
Let T ∈ R^{n×n×n} be super-symmetric. Let g^{(5)} ∈ R^n be a vector.
for j = 1 to i do
  for k = 1 to j-1 do
    g_k^{(5)} += p_i p_j T_{ijk}

[Figure 32: A 1 ≤ i ≤ n approach of (19) for T_{ijk}.]

Algorithm 16 Computing g^{(6)}.
Let T ∈ R^{n×n×n} be super-symmetric. Let g^{(6)} ∈ R^n be a vector.
for j = i+1 to n do
  for k = j+1 to n do
    g_i^{(6)} += p_j p_k T_{kji}

[Figure 33: An implementation of (20) for T_{kji}.]

Algorithm 17 Computing g^{(6)}.
Let T ∈ R^{n×n×n} be super-symmetric. Let g^{(6)} ∈ R^n be a vector.
for j = 1 to i-1 do
  for k = 1 to j-1 do
    g_k^{(6)} += p_i p_j T_{ijk}

[Figure 34: A 1 ≤ i ≤ n approach of (20) for T_{ijk}.]

Note that Algorithms 8 and 9 are equivalent, Algorithms 10 and 11 are equivalent, Algorithms 12 and 13 are equivalent, Algorithms 14 and 15 are equivalent, and Algorithms 16 and 17 are equivalent, in the respect that they produce the same final gradient, but the intermediate entries in the gradient are not the same.

The Cubc Hessan term: (pt ) Algorthm 18 Computng H (1). Let T R n n n be super-symmetrc. Let H (1) R n n be symmetrc. for = 1 to do for = 1 to do H (1) + = p T Algorthm 19 Computng H (1). Let T R n n n be super-symmetrc. Let H (1) R n n be a symmetrc. for = 1 to do for = 1 to do H (1) + = p T Fgure 35: An mplementaton of (24). Fgure 36: An 1 n approach of (24). Algorthm 20 Computng H (2). Let T R n n n be super-symmetrc. Let H (2) R n n be a symmetrc. for = 1 to do for = + 1 to do H (2) + = p T Algorthm 21 Computng H (2). Let T R n n n be super-symmetrc. Let H (2) R n n be a symmetrc. for = 1 to do for = 1 to 1 do H (2) + = p T Fgure 37: An mplementaton of (25). Fgure 38: An 1 n approach of (25). 26

Algorithm 22 Computing H^{(3)}.
Let T ∈ R^{n×n×n} be super-symmetric. Let H^{(3)} ∈ R^{n×n} be symmetric.
for j = 1 to i do
  for k = i+1 to n do
    H_{ij}^{(3)} += p_k T_{kij}

[Figure 39: An implementation of (26).]

Algorithm 23 Computing H^{(3)}.
Let T ∈ R^{n×n×n} be super-symmetric. Let H^{(3)} ∈ R^{n×n} be symmetric.
for j = 1 to i-1 do
  for k = 1 to j do
    H_{jk}^{(3)} += p_i T_{ijk}

[Figure 40: A 1 ≤ i ≤ n approach of (26).]

Note that Algorithms 18 and 19 are equivalent, Algorithms 20 and 21 are equivalent, and Algorithms 22 and 23 are equivalent, in the respect that they produce the same final matrix, but the intermediate entries in the matrices are not the same.

References

[1] M. Altman. Iterative Methods of Higher Order. Bulletin de l'Académie Polonaise des Sciences, Série des sciences math., astr. et phys., Vol. I, No. 2, 1961.

[2] B. W. Bader and T. G. Kolda. MATLAB Tensor Classes for Fast Algorithm Prototyping. Technical Report SAND2004-5187, October 2004.

[3] E. M. L. Beale. On an iterative method of finding a local minimum of a function of more than one variable. Tech. Rep. No. 25, Statistical Techniques Research Group, Princeton Univ., Princeton, N.J., 1958.

[4] A. M. Cuyt. Numerical Stability of the Halley-Iteration for the Solution of a System of Nonlinear Equations. Mathematics of Computation, Volume 38, Number 157, January 1982.

[5] A. M. Cuyt. Computational Implementation of the Multivariate Halley Method for Solving Nonlinear Systems of Equations. ACM Transactions on Mathematical Software, Vol. 11, No. 1, March 1985, pages 20-36.

[6] N. Deng and H. Zhang. Theoretical Efficiency of a New Inexact Method of Tangent Hyperbolas. Optimization Methods and Software, Vol. 19, Nos. 3-4, June-August 2004, pp. 247-265.

[7] J. A. Ezquerro and M. A. Hernandez. A New Class of Third-Order Methods in Banach Spaces. Bulletin of the Institute of Mathematics, Academia Sinica, Volume 31, Number 1, March 2003.

[8] W. Gander. On Halley's Iteration Method. American Mathematical Monthly, Vol. 92, No. 2 (Feb. 1985), pp. 131-134.

[9] G. H. Golub and C. F. Van Loan. Matrix Computations. Johns Hopkins University Press, 3rd edition, 1996.

[10] A. Griewank and Ph. L. Toint. On the unconstrained optimization of partially separable functions. In Michael J. D. Powell, editor, Nonlinear Optimization 1981, pages 301-312. Academic Press, New York, NY, 1982.

[11] G. Gundersen and T. Steihaug. Data Structures in Java for Matrix Computations. Concurrency and Computation: Practice and Experience, 16(8):735-815, July 2004.

[12] G. Gundersen and T. Steihaug. On the Efficiency of Arrays in C, C# and Java. In C. Rong, editor, Proceedings of the Norwegian Informatics Conference (NIK 2004), pages 19-30. Tapir Academic Publisher, Nov. 2004.

[13] J. M. Gutierrez and M. A. Hernandez. An acceleration of Newton's method: Super-Halley method. Applied Mathematics and Computation, 25 January 2001, vol. 117, no. 2, pp. 223-239.

[14] D. Han. The Convergence on a Family of Iterations with Cubic Order. Journal of Computational Mathematics, Vol. 19, No. 5, 2001, 467-474.

[15] R. H. F. Jackson and G. P. McCormick. The Polyadic Structure of Factorable Function Tensors with Applications to Higher-Order Minimization Techniques. Journal of Optimization Theory and Applications, Vol. 51, No. 1, October 1986.

[16] R. Kalaba and A. Tishler. A Generalized Newton Algorithm Using Higher-Order Derivatives. Journal of Optimization Theory and Applications, Vol. 39, No. 1, January 1983.

[17] M. Luján, A. Usman, P. Hardie, T. L. Freeman and J. R. Gurd. Storage formats for sparse matrices in Java. In Proceedings of the 5th International Conference on Computational Science, ICCS 2005, Part I, volume 3514 of Lecture Notes in Computer Science, pages 364-371. Springer-Verlag, 2005.

[18] J. J. Moré, B. S. Garbow and K. E. Hillstrom. Testing unconstrained optimization software. ACM Transactions on Mathematical Software, 7, 17-41, 1981.

[19] J. Nocedal and S. J. Wright. Numerical Optimization. Springer Series in Operations Research. Springer-Verlag, 1999.

[20] J. M. Ortega and W. C. Rheinboldt. Iterative solution of nonlinear equations in several variables. Academic Press, New York, 1970.

[21] W. C. Rheinboldt. Methods for Solving Systems of Nonlinear Equations. Regional Conference Series in Applied Mathematics, Vol. 14. SIAM Publications, Philadelphia, PA, 1974.

[22] W. C. Rheinboldt. Methods for Solving Systems of Nonlinear Equations. Second edition. Regional Conference Series in Applied Mathematics, Vol. 70. SIAM Publications, Philadelphia, PA, 1998.

[23] H. P. Schwefel. Numerical Optimization of Computer Models. John Wiley and Sons, Chichester, 1981.

[24] P. Sebah and X. Gourdon. Newton's method and higher order iterations. numbers.computation.free.fr/constants/constant.html

[25] J. F. Traub. Iterative Methods for the Solution of Equations. Prentice-Hall, Englewood Cliffs, NJ, 1964.

[26] Ph. L. Toint. Some numerical results using a sparse matrix updating formula in unconstrained optimization. Mathematics of Computation, Volume 32, Number 143, July 1978, pages 839-851.

[27] W. Werner. Some improvements of classical iterative methods for the solution of nonlinear equations. In: E. L. Allgower, K. Glashoff and H.-O. Peitgen, Eds., Proc. Numerical Solution of Nonlinear Equations (Springer, Bremen, 1980), 426-440.

[28] W. Werner. Iterative Solution of Systems of Nonlinear Equations based upon Quadratic Approximations. Computers and Mathematics with Applications, Vol. 12A, No. 3, pp. 331-343, 1986.

[29] T. Yamamoto. Historical developments in convergence analysis for Newton's and Newton-like methods. Journal of Computational and Applied Mathematics 124 (2000), 1-23.

[30] S. Zheng and D. Robbie. A Note on the Convergence of Halley's Method for Solving Operator Equations. J. Austral. Math. Soc. Ser. B 37 (1995), 16-25.