Modified Cholesky algorithms: a catalog with new approaches


Math. Program., Ser. A (2008) 115

FULL LENGTH PAPER

Modified Cholesky algorithms: a catalog with new approaches

Haw-ren Fang · Dianne P. O'Leary

Received: 8 August 2006 / Accepted: 2 June 2007 / Published online: 24 July 2007
© Springer-Verlag 2007

Abstract  Given an $n \times n$ symmetric, possibly indefinite, matrix $A$, a modified Cholesky algorithm computes a factorization of the positive definite matrix $A + E$, where $E$ is a correction matrix. Since the factorization is often used to compute a Newton-like downhill search direction for an optimization problem, the goals are to compute the modification without much additional cost and to keep $A + E$ well-conditioned and close to $A$. Gill, Murray and Wright introduced a stable algorithm, with a bound of $\|E\|_2 = O(n^2)$. An algorithm of Schnabel and Eskow further guarantees $\|E\|_2 = O(n)$. We present variants that also ensure $\|E\|_2 = O(n)$. Moré and Sorensen and Cheng and Higham used the block $LBL^T$ factorization with blocks of order 1 or 2. Algorithms in this class have a worst-case cost $O(n^3)$ higher than the standard Cholesky factorization. We present a new approach using a sandwiched $LTL^T$-$LBL^T$ factorization, with $T$ tridiagonal, that guarantees a modification cost of at most $O(n^2)$.

H.-r. Fang's work was supported by National Science Foundation Grant CCF. D. P. O'Leary's work was supported by National Science Foundation Grant CCF and Department of Energy Grant DEFG0204ER.

H.-r. Fang, Department of Computer Science and Engineering, University of Minnesota, 200 Union Street, Minneapolis, MN 55455, USA

D. P. O'Leary, Department of Computer Science and Institute for Advanced Computer Studies, University of Maryland, A.V. Williams Building, College Park, MD 20742, USA. e-mail: oleary@cs.umd.edu

1 Introduction

Modified Cholesky algorithms are widely used in nonlinear optimization to compute Newton-like directions. Given a symmetric, possibly indefinite, $n \times n$ matrix $A$ approximating the Hessian of a function to be minimized, the goal is to find a positive definite matrix $\hat A = A + E$, where $E$ is small. The search direction $\Delta x$ is then computed by solving the linear system

    $(A + E)\,\Delta x = -g(x),$

where $g(x)$ is the gradient of the function to be minimized. This direction is Newton-like and guaranteed to be downhill. Four objectives to be achieved when computing $E$ are listed below [6,18,19].

Objective 1. If $A$ is sufficiently positive definite, $E = 0$.
Objective 2. If $A$ is not positive definite, $\|E\|$ is not much larger than $\inf\{\|\Delta A\| : A + \Delta A \text{ is positive definite}\}$ for some reasonable norm.
Objective 3. The matrix $A + E$ is reasonably well-conditioned.
Objective 4. The cost of the algorithm is only a small multiple of $n^2$ higher than that of the standard Cholesky factorization, which takes $\frac13 n^3 + O(n^2)$ flops ($\frac16 n^3 + O(n^2)$ multiplications and $\frac16 n^3 + O(n^2)$ additions).

Objective 1 ensures that the fast convergence of Newton-like methods on convex programming problems is retained by the modified Cholesky algorithms. Objective 2 keeps the search direction close to Newton's direction, while Objective 3 implies numerical stability when computing the search direction. Objective 4 makes the work in computing the modification small relative to the work in factoring a dense matrix.

There are two classes of algorithms, motivated by the simple case when $A$ is diagonal: $A = \mathrm{diag}(d_1, d_2, \dots, d_n)$. In this case we can make $A + E = \mathrm{diag}(\hat d_1, \hat d_2, \dots, \hat d_n)$ positive definite by choosing $\hat d_k := \max\{|d_k|, \delta\}$ for $k = 1, \dots, n$, where $\delta > 0$ is a preset small tolerance. We call such a modification algorithm a Type-I algorithm. Alternatively, we can choose $\hat d_k := \max\{d_k, \delta\}$. We call modified Cholesky algorithms of this kind Type-II algorithms. In both types of algorithms, $\delta$ must be kept small to satisfy Objective 2, but large enough to satisfy Objective 3. Early approaches were of Type I [14, Chap. 4], [16], whereas more recently Type-II algorithms have prevailed [6,18,19].

There are three useful factorizations of a symmetric matrix $A$ as $PAP^T = LXL^T$, where $P$ is a permutation matrix for pivoting, and $L$ is unit lower triangular:

1. the $LDL^T$ factorization, where $X$ is diagonal;¹
2. the $LBL^T$ factorization, where $X$ is block diagonal with blocks of order 1 or 2 [2–4];
3. the $LTL^T$ factorization, where $X$ is a tridiagonal matrix, and the off-diagonal elements in the first column are all zero [1,17].

Existing modified Cholesky algorithms use either the $LDL^T$ factorization [14, Chap. 4], [18,19], the $LBL^T$ factorization [6,16], or the $LTL^T$ factorization [5, Sect. 4.5]. We present new modified $LDL^T$ factorizations and an approach via the sandwiched $LTL^T$-$LBL^T$ factorization. All modified $LDL^T$ algorithms presented in this article modify the matrix as the decomposition proceeds. On the other hand, all modified $LBL^T$ and $LTL^T$ algorithms modify a computed factorization; thus one can use an existing implementation and can adjust the tolerance without refactoring.

¹ If $D$ is nonnegative, it is the Cholesky factorization in the $LDL^T$ form.
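Returning to the two algorithm types defined above, the distinction is easy to state in code. The following Python sketch is ours, not the paper's; the function and variable names are illustrative.

    import numpy as np

    def modify_diagonal(d, delta, type_I=True):
        """Modified diagonal for A = diag(d_1, ..., d_n):
        Type-I:  d_hat_k = max(|d_k|, delta)  (negative entries are reflected);
        Type-II: d_hat_k = max( d_k , delta)  (negative entries are raised to delta)."""
        d = np.asarray(d, dtype=float)
        return np.maximum(np.abs(d), delta) if type_I else np.maximum(d, delta)

    d = [4.0, -3.0, 1e-20]
    print(modify_diagonal(d, delta=1e-8, type_I=True))   # [4.e+00 3.e+00 1.e-08]
    print(modify_diagonal(d, delta=1e-8, type_I=False))  # [4.e+00 1.e-08 1.e-08]

Both choices satisfy Objective 1 for a sufficiently positive definite diagonal matrix; they differ in how far a negative entry is moved, which is the source of the Type-I/Type-II trade-offs studied below.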

Table 1  Properties of modified Cholesky algorithms

  Algorithm       Type   $\delta$           Obj. 1   Obj. 2       Obj. 3   Obj. 4
  LDL^T
    GMW81         I      $\epsilon_M$       (4)      (3)          (25)     $O(n^2)$
    SE90          II     $\tau\eta$         (32)     (13)(14)     (33)     $O(n^2)$
    SE99          II     $\bar\tau\eta$     (32)     (13)(18)     (33)     $O(n^2)$
    GMW-I         I      $\epsilon_M$       (32)     (21)         (25)     $O(n^2)$
    GMW-II        II     $\bar\tau\eta$     (32)     (24)         (25)     $O(n^2)$
    SE-I          I      $\bar\tau\eta$     (32)     (29)(18)     (33)     $O(n^2)$
  LBL^T
    MS79          I      $\epsilon_M$       (34)     (35)         (39)     $O(n^3)$
    CH98          II     $u\|A\|_\infty$    (34)     (36)         (40)     $O(n^3)$
  LTL^T
    LTL^T-MS79    I      $\epsilon_M$       (43)     (44)         (46)     $O(n^2)$
    LTL^T-CH98    II     $\bar\tau\eta$     (43)     (45)         (47)     $O(n^2)$

Table entries in columns 4–6 give the number of the equation related to the corresponding objective.

Table 2  Notation

  Symbol                 Description
  $\epsilon_M$           Machine epsilon
  $u$                    Unit roundoff, $\epsilon_M/2$
  $n$                    Dimension of the problem
  $A$                    An $n \times n$ symmetric matrix
  $\xi$                  Maximum magnitude of the off-diagonal elements of $A$
  $\eta$                 Maximum magnitude of the diagonal elements of $A$
  $\lambda_i(A)$         $i$th smallest eigenvalue of $A$
  $\lambda_{\min}(A)$    Smallest eigenvalue of $A$
  $\lambda_{\max}(A)$    Largest eigenvalue of $A$
  $\tau$                 $\epsilon_M^{1/3}$
  $\bar\tau$             $\epsilon_M^{2/3}$

In all we review five modified Cholesky algorithms in the literature and give five new ones, each of which depends on a modification tolerance parameter $\delta > 0$. Satisfaction of Objectives 1–3 is measured by bounds, discussed in detail as the algorithms are introduced; equation numbers of the bounds are referenced in Table 1, where the new algorithms are GMW-I, GMW-II, SE-I, LTL^T-MS79 and LTL^T-CH98. Table 2 lists some notation used in this paper. We use $\mathrm{diag}(a_1, \dots, a_n)$ to denote the diagonal matrix formed by $a_1, \dots, a_n$, and $\mathrm{Diag}(A)$ to denote the diagonal matrix formed by the diagonal of $A$.

The organization of this paper is as follows. Section 2 presents the modified $LDL^T$ factorizations in the literature and Sect. 3 presents our new variants. Section 4 describes modified $LBL^T$ factorizations in the literature. Section 5 gives new algorithms using a sandwiched $LTL^T$-$LBL^T$ factorization. Section 6 summarizes the results of our computational tests. Concluding remarks are given in Sect. 7.

2 Modified LDL^T Algorithms

Given an $LDL^T$ factorization of a symmetric matrix $A$, a naïve way to modify $A$ to be positive definite is by making nonpositive elements in the diagonal matrix $D$ positive. However, the $LDL^T$ factorization of $A$ may fail to exist (e.g., for $A = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}$), and even if it does, this method fails to meet Objective 2. For example, in the $LDL^T$ factorization

    $A = \begin{bmatrix} \epsilon & 1 \\ 1 & 0 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 1/\epsilon & 1 \end{bmatrix} \begin{bmatrix} \epsilon & 0 \\ 0 & -1/\epsilon \end{bmatrix} \begin{bmatrix} 1 & 1/\epsilon \\ 0 & 1 \end{bmatrix} = LDL^T,$

the modification is unbounded when $\epsilon \to 0^+$. A $3 \times 3$ example is given in [14, Chap. 4].

In a modified $LDL^T$ algorithm for a positive definite $\hat A = A + E$, with $E = \mathrm{diag}(\delta_1, \delta_2, \dots, \delta_n)$, we compute $\delta_k \ge 0$ at the $k$th step of the factorization, for $k = 1, \dots, n$. Denote the Schur complement at the $k$th step by $A_k = \begin{bmatrix} a_k & c_k^T \\ c_k & \bar A_k \end{bmatrix}$ for $k = 1, \dots, n$, where $a_k \in \mathbb{R}$ and $c_k$ is a column vector of $n - k$ elements. Initially $A_1 := A$. The factorization can be computed by setting

    $L(k{+}1{:}n, k) := \frac{c_k}{a_k + \delta_k}, \quad D(k, k) := a_k + \delta_k, \quad A_{k+1} := \bar A_k - \frac{c_k c_k^T}{a_k + \delta_k}$   (1)

for $k = 1, \dots, n$. The challenge is to determine $\delta_k$ to satisfy the four objectives. All the algorithms in Sects. 2 and 3 follow this model, although Schnabel and Eskow [18,19] originally formulated their algorithm in $LL^T$ form.

One may incorporate a diagonal pivoting strategy in the algorithms, symmetrically interchanging rows and columns at the $k$th step to ensure that $|a_k| \ge |A_k(j, j)|$ (pivoting on the diagonal element of maximum magnitude) or $a_k \ge A_k(j, j)$ (pivoting on the element of maximum value) for $j = 1, \dots, n - k$. The resulting modified $LDL^T$ factorization is in the form

    $P(A + E)P^T = LDL^T = \tilde L \tilde L^T,$   (2)

where $P$ is a permutation matrix and $\tilde L = L D^{1/2}$.

Gill and Murray introduced a stable algorithm in 1974 [13]. It was subsequently refined by Gill et al. in 1981 [14, Chap. 4]; we call it GMW81 hereafter. Schnabel and Eskow introduced another modified $LDL^T$ algorithm in 1990 [18]. It was subsequently revised in 1999 [19]. We call these algorithms SE90 and SE99, respectively.
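As a concrete rendering of the model (1), the following Python sketch (ours; pivoting is omitted, and the rule for choosing $\delta_k$ is left as a caller-supplied function) computes a modified $LDL^T$ factorization.

    import numpy as np

    def modified_ldlt(A, choose_delta):
        """Modified LDL^T per (1); choose_delta(a_k, c_k) must return delta_k >= 0."""
        A = np.array(A, dtype=float)
        n = A.shape[0]
        L, D, delta = np.eye(n), np.zeros(n), np.zeros(n)
        S = A.copy()                    # current Schur complement A_k (trailing block)
        for k in range(n):
            a, c = S[0, 0], S[1:, 0]
            delta[k] = choose_delta(a, c)
            D[k] = a + delta[k]                       # D(k,k) := a_k + delta_k
            L[k+1:, k] = c / D[k]                     # L(k+1:n,k) := c_k/(a_k+delta_k)
            S = S[1:, 1:] - np.outer(c, c) / D[k]     # A_{k+1} per (1)
        return L, D, delta

    # Example: a positive definite matrix with delta_k = 0 recovers plain LDL^T.
    A = np.array([[4., 2.], [2., 3.]])
    L, D, _ = modified_ldlt(A, lambda a, c: 0.0)
    assert np.allclose(L @ np.diag(D) @ L.T, A)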

2.1 The GMW81 Algorithm

GMW81 determines $\delta_k$ in (1) by setting

    $a_k + \delta_k = \max\Bigl\{\delta,\ |a_k|,\ \frac{\|c_k\|_\infty^2}{\beta^2}\Bigr\}$

for $k = 1, \dots, n$, where $\beta > 0$ and the small tolerance $\delta > 0$ are preset. The default value of $\delta$ in the experiments in [16,19] is $\delta := \epsilon_M$ (machine epsilon). The rationale behind GMW81 is that $\beta$ becomes a bound on the magnitude of the off-diagonal elements in the lower triangular matrix $\tilde L$ of the Cholesky factorization in (2). The challenge is to choose $\beta$ such that $\|E\|_2$ is well-controlled and Objective 1 is satisfied. The correction matrix $E$ is bounded by

    $\|E\|_2 \le \Bigl(\frac{\xi}{\beta} + (n-1)\beta\Bigr)^2 + 2\bigl(\eta + (n-1)\beta^2\bigr) + \delta =: f(\beta),$   (3)

where $\eta$ and $\xi$ are the maximum magnitudes of the diagonal and off-diagonal elements of $A$, respectively. Note that since $E$ is diagonal, its 1-norm, 2-norm and $\infty$-norm are the same. The overall extra cost of GMW81 relative to the standard Cholesky factorization is $O(n^2)$, so Objective 4 is satisfied.

Now consider Objective 2. The minimum of (3) is

    $\min_\beta f(\beta) = 2\xi\bigl(\sqrt{n^2 - 1} + n - 1\bigr) + 2\eta + \delta \le 4n\xi + 2\eta + \delta,$

which is attained with $\beta^2 = \xi/\sqrt{n^2 - 1}$ for $n > 1$.

A diagonal pivoting strategy is used in GMW81. The pivot is chosen as the maximum magnitude diagonal element.² To satisfy Objective 1, GMW81 sets $\beta^2 \ge \eta$, so that $E = 0$ if $A$ is sufficiently positive definite [13]. More precisely, $E = 0$ if $\beta^2 \ge \eta$ and

    $\lambda_{\min}(A) \ge \delta.$   (4)

Therefore, $\beta$ is chosen by

    $\beta^2 := \max\Bigl\{\eta,\ \frac{\xi}{\sqrt{n^2 - 1}},\ \epsilon_M\Bigr\}$   (5)

for $n > 1$. The bound $\|E\|_2 = O(n^2)$ is obtained by substituting (5) into (3).

² Alternatively, one could pivot on the maximum diagonal element, but pivoting on the maximum magnitude usually gives a smaller $\|E\|_2$ in our experiments.
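For illustration, GMW81's rule can be plugged into the driver sketched after (1). This is our hedged rendering, with $\beta$ computed as in (5) (so it requires $n > 1$) and $\delta$ defaulting to machine epsilon; the real algorithm additionally pivots on the diagonal.

    import numpy as np

    def gmw81_beta(A):
        """beta^2 := max{eta, xi/sqrt(n^2 - 1), eps_M} as in (5), for n > 1."""
        n = A.shape[0]
        eta = np.max(np.abs(np.diag(A)))
        xi = np.max(np.abs(A - np.diag(np.diag(A))))
        return np.sqrt(max(eta, xi / np.sqrt(n**2 - 1.0), np.finfo(float).eps))

    def gmw81_rule(beta, delta=np.finfo(float).eps):
        def choose_delta(a, c):
            # a_k + delta_k = max{delta, |a_k|, ||c_k||_inf^2 / beta^2}
            grow = np.max(np.abs(c))**2 / beta**2 if c.size else 0.0
            return max(delta, abs(a), grow) - a
        return choose_delta

For a matrix A, calling modified_ldlt(A, gmw81_rule(gmw81_beta(A))) then yields a GMW81-style factorization (without the diagonal pivoting of the full algorithm).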

2.2 The SE90 and SE99 Algorithms

SE90 and the later revised algorithm SE99 were inspired by a lemma related to the Gerschgorin circle theorem [15, p. 344]. The $i$th Gerschgorin radius and circle are defined by

    $R_i(A) := \sum_{j=1,\, j \ne i}^{n} |a_{ij}|$ and $C_i(A) := \{z : |z - a_{ii}| \le R_i(A)\}$

for $1 \le i \le n$. Recall that Gerschgorin showed that the eigenvalues of $A$ are contained in the union of the circles $C_i(A)$. Therefore, one could perturb $A$ to be positive semidefinite by setting $\delta_k := \max\{0, -a_{kk} + R_k(A)\}$ for $k = 1, \dots, n$ in (1). The modification $\delta_k$ can be reduced by the following lemma.

Lemma 1  Given a symmetric matrix $A = \begin{bmatrix} a & c^T \\ c & \bar A \end{bmatrix} \in \mathbb{R}^{n \times n}$, suppose we add a perturbation $\delta \ge \max\{0, -a + \|c\|_1\}$ to $a$, so that $a + \delta \ge \|c\|_1$. The resulting Schur complement³ is $\hat A := \bar A - \frac{cc^T}{a+\delta}$. Then $C_i(\hat A) \subseteq C_{i+1}(A)$ for $i = 1, \dots, n-1$.

Proof  This proof is a condensed version of that in [18]. Let $\bar a_{ij}$ and $\hat a_{ij}$ denote the $(i, j)$ entries of $\bar A$ and $\hat A$ respectively for $1 \le i, j \le n-1$. Also denote $c = [(c)_1, (c)_2, \dots, (c)_{n-1}]^T$. For $1 \le i < n$,

    $R_{i+1}(A) - R_i(\hat A) = \bigl(R_{i+1}(A) - R_i(\bar A)\bigr) + \bigl(R_i(\bar A) - R_i(\hat A)\bigr).$

The difference between $R_{i+1}(A)$ and $R_i(\bar A)$ is $|(c)_i|$. In addition, the $i$th column of $\bar A - \hat A$ is $\frac{(c)_i\, c}{a+\delta}$, whose 1-norm minus $\frac{(c)_i^2}{a+\delta}$ is an upper bound for $-(R_i(\bar A) - R_i(\hat A))$. Therefore,

    $R_{i+1}(A) - R_i(\hat A) \ge |(c)_i| - \frac{|(c)_i|\bigl(\|c\|_1 - |(c)_i|\bigr)}{a+\delta} = |(c)_i|\Bigl(1 - \frac{\|c\|_1}{a+\delta}\Bigr) + \frac{(c)_i^2}{a+\delta} \ge \frac{(c)_i^2}{a+\delta} = \bar a_{ii} - \hat a_{ii} \ge 0.$

This means that the Gerschgorin circles contract, and the contraction of each circle is no less than the perturbation of the circle center. Therefore, $C_i(\hat A) \subseteq C_{i+1}(A)$ for $i = 1, \dots, n-1$. ∎

Following this result, $A + E$ can be made positive semidefinite by setting $\delta_k := \max\{0, -a_k + \|c_k\|_1\}$ in (1) for $k = 1, \dots, n$. Note that $a_k - \|c_k\|_1$ is the lower endpoint of the Gerschgorin circle $C_1(A_k)$. Repeatedly applying Lemma 1 gives the bound $\delta_k \le \max\{0, -a_{kk} + R_k(A)\}$ for $k = 1, \dots, n$.

³ Note that the $i$th row/column of $\bar A$ corresponds to the $(i+1)$st row/column of $A$.
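The quantities in Lemma 1 are cheap to compute directly. A small Python sketch (ours) of the lower Gerschgorin endpoints $G_i(A) = a_{ii} - R_i(A)$ that SE90 and SE99 use to bound the modification:

    import numpy as np

    def lower_gerschgorin_endpoints(A):
        """G_i(A) = a_ii - R_i(A), the lower endpoints of the Gerschgorin circles."""
        R = np.sum(np.abs(A), axis=1) - np.abs(np.diag(A))   # radii R_i(A)
        return np.diag(A) - R

    A = np.array([[2., -1., 0.], [-1., 1., 1.], [0., 1., -3.]])
    print(lower_gerschgorin_endpoints(A))   # [ 1. -1. -4.]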

However, this naïve method may fail to satisfy Objective 1. To satisfy Objective 1, SE90 consists of two phases. The 2-phase strategy was also presented in [12]. Phase 1 performs steps of the standard Cholesky factorization (i.e., without perturbation, $\delta_k := 0$), as long as all diagonal elements of the next Schur complement are sufficiently positive. The pseudo-code is given in Algorithm 1.

Algorithm 1  Phase 1 of a 2-phase strategy.
  {Given a symmetric matrix $A \in \mathbb{R}^{n \times n}$ and a small tolerance $\delta > 0$.}
  $A_1 := A$, $k := 1$
  Pivot on the maximum diagonal element of $A_1$.
  {Denote $A_k = \begin{bmatrix} a_k & c_k^T \\ c_k & \bar A_k \end{bmatrix}$; then $\mathrm{Diag}(\bar A_k) \le a_k I_{n-k}$ after pivoting.}
  if $a_1 \ge \delta$ then
    while $\mathrm{Diag}(\bar A_k - c_k c_k^T / a_k) \ge \delta I_{n-k}$ and $k < n$ do
      $A_{k+1} := \bar A_k - c_k c_k^T / a_k$
      $k := k + 1$
      Pivot on the maximum diagonal element of $A_k$.
    end while
  end if

In practice, there are matrices for which Phase 1 of SE90 gives an inordinately large $\|E\|_2$. These matrices are generally close to being positive definite. To remedy this, the revised algorithm SE99 uses the relaxed 2-phase strategy that allows modest negative numbers on the diagonal in Phase 1. The pseudo-code is given in Algorithm 2.

Algorithm 2  Relaxed Phase 1 of a 2-phase strategy.
  {Given a symmetric matrix $A \in \mathbb{R}^{n \times n}$, $\delta > 0$ and $0 < \mu \le 1$.}
  $\eta := \max_{1 \le i \le n} |A_{ii}|$
  if $\mathrm{Diag}(A) \ge -\mu\eta I_n$ then
    $A_1 := A$, $k := 1$
    Pivot on the maximum diagonal element of $A_1$.
    {Denote $A_k = \begin{bmatrix} a_k & c_k^T \\ c_k & \bar A_k \end{bmatrix}$; then $\mathrm{Diag}(\bar A_k) \le a_k I_{n-k}$ after pivoting.}
    while $a_k \ge \delta$ and $\mathrm{Diag}(A_k) \ge -\mu a_k I_{n-k+1}$ and $\mathrm{Diag}(\bar A_k - c_k c_k^T / a_k) \ge -\mu\eta I_{n-k}$ and $k < n$ do
      $A_{k+1} := \bar A_k - c_k c_k^T / a_k$
      $k := k + 1$
      Pivot on the maximum diagonal element of $A_k$.
    end while
  end if

SE90 uses the tolerance $\delta := \tau\eta$, where $\eta$ is the maximum magnitude of the diagonal elements of $A$, and $\tau = \epsilon_M^{1/3}$. Therefore, in Phase 1,

    $\mathrm{Diag}(A_k) \ge \tau\eta I_{n-k+1}$   (6)

for $k = 1, \dots, \min\{n, K+1\}$, where $K$ is the number of steps in Phase 1.
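A Python sketch (ours) of the loop condition of Algorithm 2; the driver, pivoting and bookkeeping are omitted, and S is assumed to be the current Schur complement with its largest diagonal entry already pivoted to position (0, 0):

    import numpy as np

    def relaxed_phase1_step_ok(S, k, n, delta, mu, eta):
        """True if step k may remain in (relaxed) Phase 1, per Algorithm 2."""
        a, c = S[0, 0], S[1:, 0]
        if a < delta or k >= n:                      # require a_k >= delta and k < n
            return False
        if np.min(np.diag(S)) < -mu * a:             # Diag(A_k) >= -mu a_k I
            return False
        next_diag = np.diag(S[1:, 1:]) - c**2 / a    # Diag(Abar_k - c_k c_k^T / a_k)
        return bool(np.all(next_diag >= -mu * eta))  # >= -mu eta I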

If $A$ is sufficiently positive definite, then $K = n$ and the factorization completes without using Phase 2. Otherwise, Phase 1 ends when setting $\delta_{K+1} := 0$ would result in $A_{K+2}$ having a diagonal element less than $\delta$. It is not hard to see that

    $\hat\eta \le \eta$ and $\hat\xi \le \xi + \eta,$   (7)

where $\hat\eta$ and $\hat\xi$ (and $\eta$ and $\xi$) are the maximum magnitudes of the diagonal and off-diagonal elements of $A_{K+1}$ (and $A$), respectively [18].

Phase 2 proceeds as in Phase 1, except that $\delta_k$ is determined by

    $\delta_k := \max\{\delta_{k-1},\ -a_k + \max\{\|c_k\|_1,\ \tau\eta\}\} \le G + \tau\eta$   (8)

for $k = K+1, \dots, n-2$, where $G$ is the maximum of zero and the negative of the lowest Gerschgorin endpoint of $A_{K+1}$, and $\delta_0 := 0$ for the case $K = 0$. The rationale for $\delta_k \ge \delta_{k-1}$ is that increasing $\delta_k$ up to $\delta_{k-1}$ does not increase $\|E\|_2$ at this point and may possibly reduce the subsequent $\delta_i$ for $k < i \le n$. This nondecreasing strategy can be applied to virtually all modified Cholesky algorithms with modifications confined to the diagonal.

In SE99, condition (6) is relaxed into the two conditions in the while loop in Algorithm 2 for some $0 < \mu \le 1$. This possibly increases the number of Phase 1 pivots. Schnabel and Eskow suggested $\mu = 0.1$ for SE99 [19]. In SE99, $\delta := \bar\tau\eta$, where $\bar\tau = \epsilon_M^{2/3}$ is smaller than $\tau = \epsilon_M^{1/3}$ in SE90, potentially keeping $\|E\|$ smaller. The modification in Phase 2 of SE99 turns out to be

    $\delta_k := \max\{\delta_{k-1},\ -a_k + \max\{\|c_k\|_1,\ \bar\tau\eta\}\} \le G + \bar\tau\eta,$   (9)

where $G$ is the negative of the lowest Gerschgorin endpoint of $A_{K+1}$. In experiments, Schnabel and Eskow [18] obtained a smaller value of $\|E\|_2$ when using special treatment for the final $2 \times 2$ Schur complement $A_{n-1}$, setting

    $\delta_n = \delta_{n-1} := \max\Bigl\{\delta_{n-2},\ -\lambda_1(A_{n-1}) + \max\Bigl\{\frac{\tau\,(\lambda_2(A_{n-1}) - \lambda_1(A_{n-1}))}{1 - \tau},\ \tau\eta\Bigr\}\Bigr\}$   (10)
    $\le G + \frac{2\tau}{1-\tau}(G + \eta),$   (11)

where $\lambda_1(A_{n-1})$ and $\lambda_2(A_{n-1})$ are the smaller and larger eigenvalues of $A_{n-1}$, respectively. The last inequality holds because $-\lambda_1(A_{n-1}) \le G$ and $\lambda_2(A_{n-1}) - \lambda_1(A_{n-1}) \le 2(G + \eta)$. In (10), $\delta_{n-1}$ and $\delta_n$ are chosen to obtain the bound

    $\kappa_2(A_{n-1} + \delta_{n-1} I_2) \le \frac{1 + \tau/(1-\tau)}{\tau/(1-\tau)} = \frac{1}{\tau},$   (12)

where $I_2$ is the $2 \times 2$ identity matrix.
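In code, the Phase 2 update (9) combined with the nondecreasing strategy is a one-liner; the sketch below (ours) returns $\delta_k$ from the current pivot $a_k$, column $c_k$, and the previous $\delta_{k-1}$:

    import numpy as np

    def se99_phase2_delta(a, c, delta_prev, taubar, eta):
        # delta_k := max{delta_{k-1}, -a_k + max(||c_k||_1, taubar * eta)}  per (9)
        return max(delta_prev, -a + max(np.sum(np.abs(c)), taubar * eta))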

Finally, by (8) and (11),

    $\|E\|_2 \le G + \frac{2\tau}{1-\tau}(G + \eta).$   (13)

If $K = 0$, then $G \le \eta + (n-1)\xi$. By (7), if $K > 0$, then

    $G \le (n - K)(\xi + \eta).$   (14)

In either case, $\|E\|_2 = O(n)$. Recall that with GMW81, $\|E\|_2 = O(n^2)$.

Since small negative numbers are allowed on the diagonal in Phase 1, two changes have to be made. First, it is required to check whether $a_k \ge \delta$ at each step, as shown in Algorithm 2, whereas it is not required in Algorithm 1. Second, it is possible that SE99 moves into Phase 2 at the last step (i.e., the number of steps in Phase 1 is $K = n-1$). In such a case,

    $\delta_n := \max\Bigl\{0,\ -a_n + \max\Bigl\{\frac{\bar\tau}{1-\bar\tau}|a_n|,\ \bar\tau\eta\Bigr\}\Bigr\} \le G + \frac{\bar\tau}{1-\bar\tau}G + \bar\tau\eta.$   (15)

The special treatment for the final $2 \times 2$ Schur complement in Phase 2 in SE90 (with $\delta := \tau\eta$) is also used in SE99 (with $\delta := \bar\tau\eta$). So the bound (11) still holds. By (9), (11) and (15),

    $\|E\|_2 \le G + \frac{2\bar\tau}{1-\bar\tau}(G + \eta).$   (16)

Although (16) for SE99 looks the same as (13) for SE90 and both guarantee $\|E\|_2 = O(n)$, the bound on $G$ in (16) is different for $0 < K < n$. Due to relaxing, the bounds (7) on $\hat\eta$ and $\hat\xi$ are replaced by

    $\hat\eta \le \eta$ and $\hat\xi \le \xi + (1 + \mu)\eta,$   (17)

where $\hat\eta$ and $\hat\xi$ (and $\eta$ and $\xi$) are the maximum magnitudes of the diagonal and off-diagonal elements of $A_{K+1}$ (and $A$), respectively. Therefore, if $0 < K < n$,

    $G \le (n - K)(\xi + (1 + \mu)\eta) + \mu\eta.$   (18)

Recall that $K$ is the number of steps in Phase 1, and SE99 potentially has more steps staying in Phase 1 than SE90.

Diagonal pivoting is also used in both the SE90 and SE99 algorithms. The analysis above does not rely on the pivoting, but pivoting reduces $\|E\|_2$ empirically. In Phase 1, the pivot is chosen as the largest diagonal entry, as shown in Algorithms 1 and 2. In Phase 2, one may choose the pivot with the largest lower endpoint of the Gerschgorin circle in the current Schur complement. This provides the least modification at the current step. In other words, after diagonally interchanging rows and columns, $G_1(A_k) \ge G_i(A_k)$ for $k = K+1, \dots, n-2$ and $i = 1, \dots, n-k+1$, where $G_i(A_k) = a_{ii} - R_i(A_k)$ is the lower endpoint of the $i$th Gerschgorin circle $C_i(A_k)$.

However, computing all $G_i(A_k)$ in Phase 2 takes $\frac{(n-K)^3}{3}$ additions and fails to satisfy Objective 4. The proof of Lemma 1 shows

    $\hat a_{ii} - R_i(\hat A) \ge \bar a_{ii} - R_{i+1}(A) + |(c)_i|\Bigl(1 - \frac{\|c\|_1}{a + \delta}\Bigr)$

for $i = 1, \dots, n-1$. Therefore,

    $G_i(A_{k+1}) \ge G_{i+1}(A_k) + |(c_k)_i|\Bigl(1 - \frac{\|c_k\|_1}{a_k + \delta_k}\Bigr)$

for $k = 1, \dots, n-1$ and $i = 1, \dots, n-k$. Using this fact, SE90 and SE99 recursively compute lower bounds for these Gerschgorin intervals by

    $\hat G_i(A_{k+1}) := \hat G_{i+1}(A_k) + |(c_k)_i|\Bigl(1 - \frac{\|c_k\|_1}{a_k + \delta_k}\Bigr)$

for $k = 1, \dots, n-1$ and $i = 1, \dots, n-k$. The base cases are $\hat G_i(A_1) := G_i(A)$ for $i = 1, \dots, n$. Computing these estimated lower endpoints $\hat G_i(A_{k+1})$ for pivoting takes $(n-K)^2$ additions and $\frac{(n-K)^2}{2}$ multiplications. Hence Objective 4 is satisfied. Note that the bounds on $\|E\|_2$ in (13) for SE90 and in (16) for SE99 are independent of the pivoting strategy applied.

3 New modified LDL^T Algorithms

This section presents three variants of the $LDL^T$ algorithms, GMW-I, GMW-II and SE-I, and illustrates their performance in terms of how small $\|E\|$ and $\kappa(A + E)$ are. Experiments were run on a laptop with an Intel Celeron 2.8 GHz CPU using IEEE standard arithmetic with machine epsilon $\epsilon_M = 2^{-52} \approx 2.2 \times 10^{-16}$. We measure the size of $E$ by the ratios

    $r_2 = \frac{\|E\|_2}{|\lambda_{\min}(A)|}$ and $r_F = \frac{\|E\|_F}{\bigl(\sum_{\lambda_i(A) < 0} \lambda_i(A)^2\bigr)^{1/2}}.$   (19)

Recall that the nondecreasing strategy for modified $LDL^T$ factorizations does not alter the bound on $\|E\|_2$ but may increase $\|E\|_F$. To show its impact on $\|E\|$, Figs. 1, 2 and 3 display the distributions of $r_F$, and the other plots use $r_2$. Note that, assuming $\lambda_{\min}(A) < 0$, the denominators are the norms of the least modification that makes the matrix positive semidefinite.

The random matrices in our experiments are of the form $Q\Lambda Q^T$, where $Q \in \mathbb{R}^{n \times n}$ is a random orthogonal matrix computed by the method of G. W. Stewart [20], and $\Lambda \in \mathbb{R}^{n \times n}$ is diagonal with uniformly distributed random eigenvalues in $[-1, 10000]$ or $[-1, 1]$. For the matrices with eigenvalues in $[-1, 10000]$, we impose the condition that there is at least one negative eigenvalue.⁴

⁴ Results of all ten algorithms on random matrices with eigenvalues in $[-10000, -1]$, omitted in this article for lack of space, can be found in [10].
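This test setup is easy to reproduce approximately. In the sketch below (ours), the QR factorization of a Gaussian matrix stands in for Stewart's method [20] of generating random orthogonal matrices, and the measures are those of (19); the seed is arbitrary.

    import numpy as np

    def random_symmetric(n, lo, hi, rng):
        Q, _ = np.linalg.qr(rng.standard_normal((n, n)))   # random orthogonal Q
        lam = rng.uniform(lo, hi, size=n)                  # eigenvalues in [lo, hi]
        return Q @ np.diag(lam) @ Q.T, lam

    def measures(E, lam):
        """r_2 and r_F of (19); assumes at least one negative eigenvalue."""
        r2 = np.linalg.norm(E, 2) / abs(lam.min())
        rF = np.linalg.norm(E, 'fro') / np.sqrt(np.sum(lam[lam < 0] ** 2))
        return r2, rF

    rng = np.random.default_rng(0)
    A, lam = random_symmetric(100, -1.0, 10000.0, rng)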

3.1 The GMW-I Algorithm

The GMW81 algorithm, a Type-I algorithm, satisfies $\|E\|_2 = O(n^2)$, whereas SE90 and SE99 further guarantee $\|E\|_2 = O(n)$, as shown in (13) and (16), respectively. Schnabel and Eskow [18] pointed out that the 2-phase strategy can drop the bound on $\|E\|_2$ of GMW81 to $O(n)$. In our experiments, we note that incorporating the 2-phase strategy into GMW81 introduces difficulties similar to those for SE90, and again relaxing provides the rescue. We denote by GMW-I the algorithm that uses the relaxed Phase 1 of SE99 with Phase 2 defined by GMW81.

Denote the number of steps in Phase 1 by $K$. Then $\delta_k = 0$ for $k = 1, 2, \dots, K$. Instead of (3), the bound on $\|E\|_2$ is

    $\|E\|_2 \le \Bigl(\frac{\hat\xi}{\beta} + (n-K-1)\beta\Bigr)^2 + 2\bigl(\hat\eta + (n-K-1)\beta^2\bigr) + \delta,$   (20)

where $\hat\eta$ and $\hat\xi$ are the maximum magnitudes of the diagonal and off-diagonal elements of $A_{K+1}$, respectively. Now we do not need $\beta^2 \ge \hat\eta$ to satisfy Objective 1, so $\beta$ is chosen as the minimizer of (20),

    $\beta^2 = \max\Bigl\{\frac{\hat\xi}{\sqrt{(n-K)^2 - 1}},\ \epsilon_M\Bigr\}$

for $n - K > 1$. Substituting this into (20) and invoking (17), we obtain

    $\|E\|_2 \le 4(n-K)\hat\xi + 2\hat\eta + \delta \le 4(n-K)(\xi + (1+\mu)\eta) + 2\eta + \delta = O(n),$   (21)

where we ignore the extreme case $\beta^2 = \epsilon_M$. We still use $\delta := \epsilon_M$ as in GMW81 and set $\mu = 0.75$ in the relaxed 2-phase strategy, since it is an empirically good value for the GMW algorithms. (Recall that $\mu = 0.1$ for SE99.) Pivoting reduces $\|E\|_2$ in the original GMW81 algorithm; we pivot on the maximum element instead of the maximum magnitude element in Phase 2, because on average the resulting $\kappa_2(A + E)$ is smaller in our experiments. We call our variant GMW-I.

Figure 1 shows our experimental results. The GMW-I algorithm performed well. The nondecreasing strategy (described in Sect. 2.2) substantially reduced $\kappa_2(A + E)$ but roughly doubled $\|E\|_F$ (though with $\|E\|_2$ comparable). Note that the bound on $\|E\|_2$ in (21) is preserved with the nondecreasing strategy.

Fig. 1  Measures of $r_F$ and $\kappa_2(A+E)$ for the Type-I GMW algorithms for 30 random matrices with $n = 100$; panels (a), (c) show $r_F$ and panels (b), (d) show $\kappa_2(A+E)$, for eigenvalue ranges $[-1, 10000]$ and $[-1, 1]$. Key: original GMW81 (solid line), with 2-phase strategy (plus), with relaxed 2-phase strategy (GMW-I) (cross), with relaxed 2-phase and nondecreasing strategy (open square).

3.2 The GMW-II Algorithm

In this subsection we introduce our GMW-II algorithm, a Type-II variant of the (Type-I) GMW81 algorithm. We apply the nondecreasing strategy and choose $\delta_k$ in (1) to be

    $a_k + \delta_k := \max\Bigl\{\delta,\ a_k + \delta_{k-1},\ \frac{\|c_k\|_\infty^2}{\beta^2}\Bigr\}$

for $k = 1, \dots, n$, where $\beta > 0$ and the small tolerance $\delta > 0$ are preset, and $\delta_0 := 0$. The magnitude of the off-diagonal elements in $\tilde L$ is still bounded by $\beta$, where $LDL^T = \tilde L\tilde L^T$. The bound on $\|E\|_2$ for GMW81 is given in (3). For the Type-II GMW algorithm, it is

    $\|E\|_2 \le \Bigl(\frac{\xi}{\beta} + (n-1)\beta\Bigr)^2 + \bigl(\eta + (n-1)\beta^2\bigr) + \delta =: f(\beta).$   (22)

Equality is attained with $\beta^2 = \xi/\sqrt{n^2 - n}$ for $n > 1$. Recall that $\eta$ and $\xi$ are the maximum magnitudes of the diagonal and off-diagonal elements of $A$, respectively. The minimum of (22) is

    $\min_\beta f(\beta) = 2\xi\bigl(\sqrt{n^2 - n} + n - 1\bigr) + \eta + \delta \le 4n\xi + \eta + \delta.$

The minimum is attained with $\beta^2 = \xi/\sqrt{n^2 - n}$ for $n > 1$. Therefore, $\beta$ is chosen by

    $\beta^2 := \max\Bigl\{\eta,\ \frac{\xi}{\sqrt{n^2 - n}},\ \epsilon_M\Bigr\}$

for $n > 1$, where $\beta^2 \ge \eta$ is for satisfying Objective 1 with pivoting. Substituting this into (22), we obtain $\|E\|_2 = O(n^2)$.

The relaxed 2-phase strategy in Algorithm 2 is also incorporated into our GMW-II algorithm. Therefore, the bound on $\|E\|_2$ is

    $\|E\|_2 \le \Bigl(\frac{\hat\xi}{\beta} + (n-K-1)\beta\Bigr)^2 + \bigl(\hat\eta + (n-K-1)\beta^2\bigr) + \delta,$   (23)

where $K$ is the number of steps in Phase 1, and $\hat\eta$ and $\hat\xi$ are the maximum magnitudes of the diagonal and off-diagonal elements of $A_{K+1}$, respectively. Since $\beta^2 \ge \hat\eta$ is not required for satisfying Objective 1, $\beta$ is determined by

    $\beta^2 := \max\Bigl\{\frac{\hat\xi}{\sqrt{(n-K)^2 - (n-K)}},\ \epsilon_M\Bigr\}$

for $n - K > 1$. Substituting this into (23), we obtain

    $\|E\|_2 \le 4(n-K)\hat\xi + \hat\eta + \delta \le 4(n-K)(\xi + (1+\mu)\eta) + \eta + \delta = O(n),$   (24)

where we ignore the extreme case $\beta^2 = \epsilon_M$. The last inequality is derived using (17).

The diagonal pivoting strategy can be incorporated into the Type-II GMW algorithms. We pivot on the maximum element for our GMW-II algorithm, as in the GMW-I algorithm. Note that all the a priori bounds on $\|E\|_2$ given above for all algorithms in the GMW class are independent of the pivoting strategy applied, if any. Recall that GMW81 and GMW-I use $\delta := \epsilon_M$. For the Type-II GMW algorithms, we use $\delta := \epsilon_M^{2/3}\eta$ as in SE99. Our experimental results are shown in Fig. 2. Similar to SE90 and the Type-I GMW algorithms, incorporating the 2-phase strategy results in difficulties for the matrices with eigenvalues in $[-1, 10000]$, and relaxing is the cure.

For all algorithms in the GMW class, the worst-case condition number is

    $\kappa_2(A + E) = O\Bigl(n^3\Bigl(1 + \frac{\xi + \eta}{\delta}\Bigr)^{n-1}\Bigr).$   (25)

The proof uses Theorem 1, Lemma 3, the bounds $[\delta,\ \eta + (n-1)\beta^2]$ for the diagonal elements in $D$, and the bound $\beta$ for the magnitude of the off-diagonal elements in $\tilde L$, where $P(A+E)P^T = LDL^T = \tilde L\tilde L^T$ as denoted in (2). Whether the 2-phase strategy or the relaxed 2-phase strategy is applied, the bound on $\kappa_2(A+E)$ remains exponential, using (7) and (17), respectively. The bounds are not changed when the nondecreasing strategy is applied. All the modified Cholesky algorithms in Sects. 2 and 3 are numerically stable, since they can be regarded as Cholesky factorizations of the symmetric positive definite matrix $A + E$ [6].

Fig. 2  Measures of $r_F$ and $\kappa_2(A+E)$ for the Type-II GMW algorithms for 30 random matrices with $n = 100$, nondecreasing strategy invoked; panels (a), (c) show $r_F$ and panels (b), (d) show $\kappa_2(A+E)$, for eigenvalue ranges $[-1, 10000]$ and $[-1, 1]$. Key: original Type-II GMW (solid line), with 2-phase strategy (open square), with relaxed 2-phase strategy (GMW-II) (cross).

3.3 The SE-I Algorithm

Both SE90 and SE99 are Type-II algorithms. In this section we present the Type-I variant corresponding to SE99, denoted by SE-I, by making three changes. First, instead of (9), we determine $\delta_k$ by

    $\delta_k := \max\{0,\ -2a_k,\ -a_k + \max\{\|c_k\|_1,\ \bar\tau\eta\}\} \le \max\{2G,\ G + \bar\tau\eta\}$   (26)

for $k = K+1, \dots, n-2$. Second, instead of (10), the special treatment of the last $2 \times 2$ Schur complement in Phase 2 to keep $\|E\|_2$ small is

    $\delta_n = \delta_{n-1} := \max\Bigl\{0,\ -2\lambda_1(A_{n-1}),\ -\lambda_1(A_{n-1}) + \max\Bigl\{\frac{\bar\tau\,(\lambda_2(A_{n-1}) - \lambda_1(A_{n-1}))}{1-\bar\tau},\ \bar\tau\eta\Bigr\}\Bigr\}$
    $\le \max\Bigl\{2G,\ G + \frac{2\bar\tau}{1-\bar\tau}(G + \eta)\Bigr\}.$   (27)

Note that $\kappa_2(A_{n-1} + \delta_{n-1} I_2) \le \min\{\kappa_2(A_{n-1}),\ 1/\bar\tau\}$. The derivation is similar to that of (12). Third, if the algorithm switches to Phase 2 at the last step, then $\delta_n$ is determined by

    $\delta_n := \max\Bigl\{0,\ -2a_n,\ -a_n + \max\Bigl\{\frac{\bar\tau}{1-\bar\tau}|a_n|,\ \bar\tau\eta\Bigr\}\Bigr\} \le \max\Bigl\{2G,\ G + \frac{\bar\tau}{1-\bar\tau}(G + \eta)\Bigr\}$   (28)

instead of (15). By (26), (27) and (28), we obtain

    $\|E\|_2 \le \max\Bigl\{2G,\ G + \frac{2\bar\tau}{1-\bar\tau}(G + \eta)\Bigr\}.$   (29)

Comparing (29) with (16), SE-I approximately doubles SE99's bound on $\|E\|$.

Now we investigate the satisfaction of Objective 1 for the GMW and SE algorithms. We begin with a theorem of Ostrowski [15, p. 224].

Theorem 1 (Ostrowski)  Suppose we are given a symmetric $M \in \mathbb{C}^{n \times n}$ and a nonsingular $S \in \mathbb{C}^{n \times n}$. There exists $\theta_k > 0$ such that $\lambda_k(SMS^*) = \theta_k\,\lambda_k(M)$, where $\lambda_1(SS^*) \le \theta_k \le \lambda_n(SS^*)$.

Consider the 2-phase strategy presented in Algorithm 1 and the relaxed 2-phase strategy presented in Algorithm 2 with pivoting on the maximum diagonal element. Clearly $E = 0$ if the factorization is done in Phase 1. The derivation of the condition under which the algorithm runs to completion without switching to Phase 2 is by finite induction. We denote the incomplete $LDL^T$ factorization of a symmetric $A \in \mathbb{R}^{n \times n}$ after step $k$ by $L_k D_k L_k^T$, where

    $D_k = \begin{bmatrix} \tilde D_k & 0 \\ 0 & S_k \end{bmatrix}$

with diagonal blocks of order $k$ and $n-k$, $\tilde D_k$ diagonal, and $S_k$ the Schur complement. We claim that the following condition guarantees $E = 0$:

    $\lambda_{\min}(A) \ge \delta\|L_k L_k^T\|_2$   (30)

for $k = 1, \dots, n$. At the beginning of step $k$, we assume that the diagonal elements of the Schur complement are all larger than or equal to $\delta$, and investigate whether this condition holds in the next Schur complement.⁵ By Theorem 1 and (30),

    $\lambda_{\min}(D_k)\,\lambda_{\max}(L_k L_k^T) \ge \lambda_{\min}(A) \ge \delta\|L_k L_k^T\|_2 = \delta\,\lambda_{\max}(L_k L_k^T),$

and therefore $\lambda_{\min}(D_k) \ge \delta$, so

    $\lambda_{\min}(S_k) \ge \lambda_{\min}(D_k) \ge \delta,$

⁵ For the base case, we have $\lambda_{\min}(A) \ge \delta$ from (30), so $A - \delta I$ is positive semidefinite and therefore $\mathrm{diag}(A) \ge \delta I$.

which implies $\mathrm{Diag}(S_k) \ge \delta I_{n-k}$. By induction, we stay in Phase 1 during the whole factorization. We conclude that if (30) holds, then $E = 0$.

In the next two lemmas we develop bounds on $\|LL^T\|_2$ and on $\lambda_{\min}(LL^T)$, in order to bound the condition number of $A + E$ for the algorithms in Sects. 4 and 5.

Lemma 2  If the positive semidefinite Hermitian matrix $M \in \mathbb{C}^{n \times n}$ has a diagonal element equal to 1 (i.e., $m_{kk} = 1$ for some $1 \le k \le n$), then $\lambda_{\min}(M) \le 1 \le \lambda_{\max}(M)$.

Proof  Let $M = U\Lambda U^*$ denote the spectral decomposition of $M$, and $a := U^* e_k$. Since $m_{kk} = 1$, $1 = e_k^T M e_k = a^*\Lambda a$. Note that $a^* a = 1$. We conclude that the weighted average of the eigenvalues of $M$ is 1. Therefore, $\lambda_{\min}(M) \le 1 \le \lambda_{\max}(M)$. ∎

Lemma 3⁶  For any unit lower triangular $L \in \mathbb{R}^{n \times n}$ with $|(L)_{ij}| \le \gamma$ for $1 \le j < i \le n$,

    $\lambda_{\max}(LL^T) \le n + \tfrac12 n(n-1)\gamma^2$ and $(1+\gamma)^{2-2n} \le \lambda_{\min}(LL^T) \le 1$.

Proof  By Lemma 2, $\lambda_{\min}(LL^T) \le 1 \le \lambda_{\max}(LL^T)$. Next, an upper bound on $\lambda_{\max}(LL^T)$ is

    $\lambda_{\max}(LL^T) \le \mathrm{trace}(LL^T) \le n + \tfrac12 n(n-1)\gamma^2.$

Computing the inverse of a lower triangular matrix, we obtain $(L^{-1})_{ii} = 1$ for $i = 1, \dots, n$ and the bounds

    $|(L^{-1})_{ij}| \le \gamma \sum_{k=j+1}^{i} |(L^{-1})_{ik}|$

for $1 \le j < i \le n$. The solution to this recursion is

    $|(L^{-1})_{ij}| \le \gamma(1+\gamma)^{i-j-1}$

for $1 \le j < i \le n$. Therefore,

    $\lambda_{\min}(LL^T) = \frac{1}{\|(LL^T)^{-1}\|_2} \ge \frac{1}{\|L^{-1}\|_2^2} \ge \frac{1}{\|L^{-1}\|_1\|L^{-1}\|_\infty} \ge (1+\gamma)^{2-2n}.$ ∎

Now we can bound $\|L_k L_k^T\|_2$ in (30). Pivoting on the maximum diagonal element of each Schur complement, the magnitude of the elements in $L_k$ is bounded by 1 for all $k$. By Lemma 3,

    $\|L_k L_k^T\|_2 \le \frac{n(n+1)}{2}.$   (31)

⁶ This lemma was also presented in [5, Chap. 4] and [6].
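The bounds of Lemma 3 can be spot-checked numerically; a small sketch (ours, with an arbitrary seed and size):

    import numpy as np

    rng = np.random.default_rng(1)
    n, gamma = 8, 1.0
    # Unit lower triangular L with off-diagonal entries bounded by gamma.
    L = np.eye(n) + np.tril(rng.uniform(-gamma, gamma, (n, n)), k=-1)
    w = np.linalg.eigvalsh(L @ L.T)                      # ascending eigenvalues
    assert w[-1] <= n + n * (n - 1) * gamma**2 / 2       # lambda_max bound
    assert (1 + gamma) ** (2 - 2 * n) <= w[0] <= 1.0     # lambda_min bounds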

Fig. 3  Measures of $r_F$ and $\kappa_2(A+E)$ for the SE algorithms for 30 random matrices with $n = 100$; panels (a), (c) show $r_F$ and panels (b), (d) show $\kappa_2(A+E)$, for eigenvalue ranges $[-1, 10000]$ and $[-1, 1]$. Key: original SE99 (solid line), Type-I SE99 (SE-I) (open square), Type-I SE99 with nondecreasing strategy (cross).

Substituting this into (30), we obtain the following result. For the algorithms GMW-I, GMW-II, SE90, SE99, and SE-I that use the 2-phase strategy or the relaxed 2-phase strategy, if

    $\lambda_{\min}(A) \ge \frac{n(n+1)}{2}\,\delta,$   (32)

then by (30) and (31) we conclude that $E = 0$.

Our experimental results are shown in Fig. 3. For the random matrices with eigenvalues in $[-1, 10000]$, SE-I resulted in larger $\|E\|_2$ and $\|E\|_F$ but substantially smaller $\kappa_2(A+E)$ than those of SE99. For the random matrices with eigenvalues in $[-1, 1]$, SE-I had comparable $\|E\|_2$, smaller $\|E\|_F$, but larger $\kappa_2(A+E)$ than SE99. The nondecreasing strategy can be incorporated into the Type-I SE algorithm. The resulting $\|E\|_2$, $\|E\|_F$ and $\kappa_2(A+E)$ were comparable to those of SE-I for the random matrices with eigenvalues in $[-1, 10000]$, and comparable to those of SE99 for the random matrices with eigenvalues in $[-1, 1]$. Incorporating the non-relaxed 2-phase strategy into the Type-I SE algorithms is possible, but it would result in difficulties similar to those of SE90.

For all the algorithms in the SE class, the worst-case condition number is

    $\kappa_2(A + E) = O\Bigl(\frac{(\xi + \eta)\, n^3\, 4^n}{\delta}\Bigr).$   (33)

The sketch of the proof is similar to that for the GMW algorithms. In practice, the condition number is bounded by about $1/\tau$ and $1/\bar\tau$ respectively for SE90 and SE99 [18], and is comparable to $\kappa_2(A)$ for SE-I.

4 Modified LBL^T Algorithms

Any symmetric matrix $A \in \mathbb{R}^{n \times n}$ has an $LBL^T$ factorization, where $B$ is block diagonal with blocks of order 1 or 2 [2–4]. A modified $LBL^T$ algorithm first computes the $LBL^T$ factorization, and then perturbs $\hat B = B + \Delta B$ to be positive definite, so that $P(A + E)P^T = L\hat B L^T$ is positive definite as well, where $P$ is the permutation matrix for pivoting.

Moré and Sorensen suggested a modified $LBL^T$ algorithm [16], which we call MS79. Each $1 \times 1$ block in $B$, denoted by $d$, is modified to be $\hat d := \max\{\delta, |d|\}$, with $\delta > 0$ the preset small tolerance. For each $2 \times 2$ block $D$, its spectral decomposition $D = U \,\mathrm{diag}(\lambda_1, \lambda_2)\, U^T$ is modified to be $\hat D := U \,\mathrm{diag}(\hat\lambda_1, \hat\lambda_2)\, U^T$, where $\hat\lambda_i := \max\{\delta, |\lambda_i|\}$ for $i = 1, 2$. Cheng and Higham proposed another modified $LBL^T$ algorithm [6], which we call CH98. Each $1 \times 1$ block $d$ is modified to be $\hat d := \max\{\delta, d\}$, with $\delta > 0$ the preset small tolerance. Each $2 \times 2$ block $D$, with its spectral decomposition denoted by $D = U \,\mathrm{diag}(\lambda_1, \lambda_2)\, U^T$, is modified to be $\hat D := U \,\mathrm{diag}(\hat\lambda_1, \hat\lambda_2)\, U^T$, where $\hat\lambda_i = \max\{\delta, \lambda_i\}$ for $i = 1, 2$. The key distinction is that MS79 is a Type-I algorithm, whereas CH98 is of Type II.

The MS79 algorithm was developed before the fast Bunch-Parlett and bounded Bunch-Kaufman pivoting strategies (rook pivoting) for the $LBL^T$ factorization [2], but rook pivoting is also applicable to MS79. For MS79, we set $\delta := \epsilon_M$ as used in GMW81. Cheng and Higham [6] suggested $\delta := u\|A\|_\infty$ for CH98, where $u = \epsilon_M/2$ is the unit roundoff.

MS79 predated the four objectives. Cheng and Higham investigated the objectives for CH98 [6], and our analysis for MS79 is similar. For both MS79 and CH98, if $\lambda_{\min}(B) \ge \delta$, then $E = 0$. By Theorem 1, if $A$ is positive definite, $\lambda_{\min}(B) \ge \lambda_{\min}(A)/\lambda_{\max}(LL^T)$. Therefore, $E = 0$ is guaranteed when

    $\lambda_{\min}(A) \ge \delta\|LL^T\|_2.$   (34)

Consider $\|E\|_2$ for MS79. By Theorem 1,

    $\|E\|_2 = \lambda_{\max}(E) = \lambda_{\max}(L\,\Delta B\,L^T) \le \lambda_{\max}(LL^T)\,\lambda_{\max}(\Delta B) = \lambda_{\max}(LL^T)\max\{\delta - \lambda_{\min}(B),\ -2\lambda_{\min}(B),\ 0\}.$

By Theorem 1 again, $\lambda_{\min}(B) \ge \lambda_{\min}(A)/\lambda_{\min}(LL^T)$ and $\lambda_{\min}(B) \le \lambda_{\min}(A)/\lambda_{\max}(LL^T)$ for $\lambda_{\min}(A) < 0$. Therefore,

    $\|E\|_2 \le -2\lambda_{\min}(A)\,\kappa_2(LL^T)$ for $\lambda_{\min}(A) \le -\delta\|LL^T\|_2$.   (35)

Similarly, the bound on $\|E\|_2$ for CH98 is

    $\|E\|_2 \le \delta\|LL^T\|_2 - \lambda_{\min}(A)\,\kappa_2(LL^T)$ for $\lambda_{\min}(A) \le 0$.   (36)
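The block modifications of MS79 and CH98 are straightforward to express once an $LBL^T$ factorization is available. The sketch below (ours) uses SciPy's scipy.linalg.ldl, which implements Bunch-Kaufman pivoting — discouraged above because $L$ is then unbounded, but convenient for illustration since SciPy offers no rook-pivoted routine.

    import numpy as np
    from scipy.linalg import ldl

    def modify_blocks(B, delta, type_I=True):
        """Perturb the block diagonal factor B (1x1 and 2x2 blocks) eigenvalue-wise:
        MS79 (Type-I): lam_hat = max(delta, |lam|); CH98 (Type-II): max(delta, lam)."""
        Bhat = np.array(B, dtype=float)
        k, n = 0, B.shape[0]
        while k < n:
            j = k + 2 if (k + 1 < n and B[k + 1, k] != 0.0) else k + 1  # block size
            lam, U = np.linalg.eigh(Bhat[k:j, k:j])
            lam = np.maximum(delta, np.abs(lam) if type_I else lam)
            Bhat[k:j, k:j] = U @ np.diag(lam) @ U.T
            k = j
        return Bhat

    A = np.array([[1., 2., 0.], [2., 1., 3.], [0., 3., 1.]])   # indefinite
    L, B, _ = ldl(A)                  # A = L B L^T (L absorbs the permutation)
    A_plus_E = L @ modify_blocks(B, delta=1e-8, type_I=True) @ L.T
    assert np.all(np.linalg.eigvalsh(A_plus_E) > 0)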

Now we assess how well Objective 3 is satisfied for MS79. By Theorem 1,

    $\lambda_{\min}(A + E) \ge \lambda_{\min}(LL^T)\,\lambda_{\min}(\hat B) = \lambda_{\min}(LL^T)\max\Bigl\{\delta,\ \min_{1 \le i \le n}|\lambda_i(B)|\Bigr\} \ge \lambda_{\min}(LL^T)\max\Bigl\{\delta,\ \frac{\min_{1 \le i \le n}|\lambda_i(A)|}{\lambda_{\max}(LL^T)}\Bigr\}$   (37)

and

    $\lambda_{\max}(A + E) \le \lambda_{\max}(LL^T)\,\lambda_{\max}(\hat B) = \lambda_{\max}(LL^T)\max\{\delta,\ -\lambda_{\min}(B),\ \lambda_{\max}(B)\} \le \lambda_{\max}(LL^T)\max\Bigl\{\delta,\ \frac{-\lambda_{\min}(A)}{\lambda_{\min}(LL^T)},\ \frac{\lambda_{\max}(A)}{\lambda_{\min}(LL^T)}\Bigr\}.$   (38)

By (37) and (38),

    $\kappa_2(A + E) \le \kappa_2(LL^T)^2\,\kappa_2(A).$   (39)

The bound on $\kappa_2(A + E)$ for CH98 [6] is

    $\kappa_2(A + E) \le \kappa_2(LL^T)\max\Bigl\{1,\ \frac{\lambda_{\max}(A)}{\lambda_{\min}(LL^T)\,\delta}\Bigr\}.$   (40)

There are four pivoting algorithms for the $LBL^T$ factorization: Bunch-Parlett (complete pivoting) [4], Bunch-Kaufman (partial pivoting) [3], and fast Bunch-Parlett and bounded Bunch-Kaufman (rook pivoting) [2], denoted by BP, BK, FBP and BBK, respectively. All these algorithms have a preset argument $0 < \alpha < 1$. The BK algorithm takes $O(n^2)$ time for pivoting, but the elements in $L$ are unbounded. It is discouraged for the modified $LBL^T$ algorithms because Objectives 1–3 may not be satisfied.

It is clear from (34)–(40) that $\lambda_{\min}(LL^T)$, $\lambda_{\max}(LL^T)$ and $\kappa_2(LL^T)$ play an important role in the satisfaction of Objectives 1–3 for both MS79 and CH98. The BP, BBK and FBP algorithms all produce a bound on the elements in $L$ in terms of the pivoting argument $\alpha$, suggested to be $\alpha = (1 + \sqrt{17})/8$ to minimize the bound on the element growth of the Schur complements [2–4]. The corresponding element bound of the unit lower triangular $L$ is $\gamma \approx 2.78$. Alternatively, we could choose $\alpha = 0.5$ to minimize the element bound of $L$ [11, Table 3.1], which is $\gamma = 2$, leading to sharper bounds on $\lambda_{\min}(LL^T)$, $\lambda_{\max}(LL^T)$ and $\kappa_2(LL^T)$. The bounds in Table 3 are obtained using Lemma 3. Although $\alpha = 0.5$ results in smaller bounds, $\alpha = (1 + \sqrt{17})/8$ is a better choice in practice, as shown in Fig. 4.

The BP pivoting strategy takes $\frac16 n^3 + O(n^2)$ comparisons and does not meet Objective 4. The number of comparisons for the BBK and FBP pivoting strategies is between those of the BK and BP algorithms (i.e., between $O(n^2)$ and $O(n^3)$). There are matrices that require traversing the whole of each Schur complement with either the BBK or the FBP pivoting strategy [2].

Table 3  Bounds for the $LBL^T$ factorization with the BP, BBK or FBP pivoting algorithm

  $\alpha$             $\gamma$   $\lambda_{\min}(LL^T)$   $\lambda_{\max}(LL^T)$   $\kappa_2(LL^T)$
  $(1+\sqrt{17})/8$    2.78       $3.78^{2-2n}$            $4n^2 - 3n$              $(4n^2 - 3n)\,3.78^{2n-2}$
  0.5                  2          $3^{2-2n}$               $2n^2 - n$               $(2n^2 - n)\,3^{2n-2}$

Fig. 4  Measures of $r_2$ and $\kappa_2(A+E)$ for MS79 and CH98 for 30 random matrices with $n = 100$; panels (a), (c) show $r_2$ and panels (b), (d) show $\kappa_2(A+E)$, for eigenvalue ranges $[-1, 10000]$ and $[-1, 1]$. Key: MS79 $\alpha = (1+\sqrt{17})/8$ (solid line), MS79 $\alpha = 0.5$ (plus), CH98 $\alpha = (1+\sqrt{17})/8$ (cross), CH98 $\alpha = 0.5$ (open square).

Hence they take $\Theta(n^3)$ comparisons for pivoting in the worst case and fail to meet Objective 4. On the other hand, these artificial matrices are not derived naturally from optimization problems. In practice we do not notice a slowdown in computing a search direction by the MS79 or CH98 algorithm. Here and throughout the remainder of this paper, we assume the pivoting strategy applied to MS79 and CH98 is BBK, unless otherwise noted.

Three remarks are in order. First, both MS79 and CH98 satisfy Objectives 1–3. Second, the bound on $\|E\|_2$ for MS79 is about twice that for CH98, whereas $A + E$ is generally better conditioned for MS79 than for CH98. Third, both algorithms fail to satisfy Objective 4 in the worst case, although in practice it is generally not a problem.

5 A new approach via sandwiched LTL^T-LBL^T factorization

Aasen [1] and Parlett and Reid [17] introduced the $LTL^T$ factorization and its application to solving symmetric linear systems. We denote the factorization by $PAP^T = LTL^T$, where $T$ is symmetric tridiagonal, and $L$ is unit lower triangular with the magnitude of its elements bounded by 1 and the off-diagonal elements in the first column all zero. The work in computing Aasen's $LTL^T$ factorization is about the same as that of the Cholesky factorization, whereas the formulation by Parlett and Reid doubles the cost.

Table 4  Comparison costs of various pivoting strategies for the $LBL^T$ factorization

                Symmetric general          Symmetric tridiagonal
  Case          Worst        Best          Worst        Best
  BP            $O(n^3)$     $O(n^3)$      $O(n^2)$     $O(n^2)$
  FBP           $O(n^3)$     $O(n^2)$      $O(n^2)$     $O(n)$
  BBK           $O(n^3)$     $O(n^2)$      $O(n^2)$     $O(n)$

In both formulations the storage is the same as that required by the Cholesky factorization, and the numerical stability of its use in solving linear systems is empirically comparable to that of the $LBL^T$ factorization [2]. Our new algorithms arise from the fact that $A$ is positive definite if and only if $T$ is positive definite. A modification algorithm makes $\hat T = T + \Delta T$ symmetric positive definite, and the resulting factorization $\hat A = P(A + E)P^T = L\hat T L^T$ is also symmetric positive definite.

It is not a new idea to use the $LTL^T$ factorization for computing a Newton-like search direction. Cheng [5, Sect. 4.5] computes $\Delta T$ to minimize $\|\Delta T\|_F$ subject to $\lambda_{\min}(T + \Delta T) \ge \delta$. However, his method requires solving the eigenproblem of a symmetric tridiagonal matrix. Empirically, the most efficient method is the divide-and-conquer algorithm originated by Cuppen [7]. It requires $O(n^3)$ flops in the worst case [8, p. 23]. Therefore, Objective 4 is not satisfied.

Our method, satisfying all four objectives, was inspired by the merits of triadic structure (no more than two off-diagonal elements in every column of a matrix) discussed in [11]. We can apply any of the modified $LDL^T$ algorithms in Sects. 2 and 3 and the modified $LBL^T$ algorithms in Sect. 4 to the tridiagonal matrix $T$. The resulting modified factorization satisfies Objective 1, since these previously discussed algorithms satisfy Objective 1. The triadic structure of a symmetric matrix is preserved in the $LDL^T$ or $LBL^T$ factorizations [11, Theorem 2.5]. This implies that the modified $LDL^T$ or $LBL^T$ algorithms in Sects. 2, 3 and 4 applied to a symmetric triadic matrix are very efficient. Recall that both MS79 and CH98 have difficulties in satisfying Objective 4. The potential excessive cost can be reduced to $O(n^2)$ by instead applying MS79 or CH98 to the symmetric tridiagonal matrix $T$ of the $LTL^T$ factorization. We call the resulting algorithms LTL^T-MS79 and LTL^T-CH98, respectively. For LTL^T-MS79, we use $\delta := \epsilon_M$ as used in GMW81. For LTL^T-CH98, we use $\delta := \epsilon_M^{2/3}\eta$ as used in SE99.

Table 4 compares the costs of these $LBL^T$ pivoting strategies for symmetric and symmetric tridiagonal matrices. We use the BBK pivoting strategy for both MS79 and CH98, because it is the cheapest pivoting strategy that guarantees a bounded $L$. Even so, Objective 4 is not satisfied in the worst case. We use the BP pivoting strategy for both LTL^T-CH98 and LTL^T-MS79. Then Objective 4 is satisfied [11], even though BP is the most expensive pivoting strategy.

Given a symmetric matrix $A \in \mathbb{R}^{n \times n}$, we denote its $LTL^T$ factorization by $PAP^T = LTL^T$, and the $LBL^T$ factorization of $T$ by $\tilde P T \tilde P^T = \tilde L B \tilde L^T$.

The resulting sandwiched factorization is

    $PAP^T = L\,\tilde P^T \tilde L B \tilde L^T \tilde P\,L^T.$

Adding a perturbation $\Delta B$ to $B$ to make it positive definite, the modified factorization of $T$ is $\tilde P(T + \Delta T)\tilde P^T = \tilde L(B + \Delta B)\tilde L^T$. The modified $LTL^T$ factorization is

    $P(A + E)P^T = L\,\tilde P^T \tilde L(B + \Delta B)\tilde L^T \tilde P\,L^T.$   (41)

The matrix $L$ is unit lower triangular with the magnitude of all elements bounded by 1 and all the off-diagonal elements in the first column zero. By Lemma 3, the $LTL^T$ factorization satisfies

    $\lambda_{\max}(LL^T) \le \tfrac12 n(n-1)$ and $\lambda_{\min}(LL^T) \ge 2^{4-2n}$   (42)

for $n > 1$. Lemma 4 gives the bounds on $\lambda_{\max}(\tilde L\tilde L^T)$ and $\lambda_{\min}(\tilde L\tilde L^T)$, where $\tilde L$ is triadic and unit lower triangular.

Lemma 4  Let $F_\gamma(k) = \sum_{i=1}^{\lfloor k/2 \rfloor} \binom{k-i}{i}\gamma^{k-i}$ and $\Phi_\gamma = \frac{1 + \sqrt{1 + 4/\gamma}}{2}\,\gamma$ for $k \in \mathbb{N}$ and $\gamma > 0$. For any triadic and unit lower triangular $\tilde L \in \mathbb{R}^{n \times n}$ with the magnitude of the off-diagonal elements bounded by $\gamma$:

1. $\frac{\Phi_\gamma^k}{1 + 1/\gamma} \le F_\gamma(k) \le \Phi_\gamma^k$ for $k \in \mathbb{N}$.
2. $|(\tilde L^{-1})_{ij}| \le F_\gamma(i - j + 1)$ for $1 \le j < i \le n$.
3. $\lambda_{\max}(\tilde L\tilde L^T) \le n + (2n - 3)\gamma^2$ for $n > 1$.
4. $\lambda_{\min}(\tilde L\tilde L^T) \ge \Bigl(\frac{\Phi_\gamma - 1}{\Phi_\gamma^n - 1}\Bigr)^2$.

Proof  Part one of this lemma is [11, Lemma 5.4]. Part two is from the proof of [11, Lemma 5.6]. For part three,

    $\lambda_{\max}(\tilde L\tilde L^T) \le \mathrm{trace}(\tilde L\tilde L^T) = \|\tilde L\|_F^2 \le n + (2n - 3)\gamma^2$

for $n > 1$. Finally, by parts one and two,

    $\lambda_{\min}(\tilde L\tilde L^T) = \frac{1}{\|(\tilde L\tilde L^T)^{-1}\|_2} \ge \frac{1}{\|\tilde L^{-1}\|_2^2} \ge \frac{1}{\|\tilde L^{-1}\|_1\|\tilde L^{-1}\|_\infty} \ge \Bigl(\sum_{k=0}^{n-1}\Phi_\gamma^k\Bigr)^{-2} = \Bigl(\frac{\Phi_\gamma - 1}{\Phi_\gamma^n - 1}\Bigr)^2.$ ∎

Now we can assess the satisfaction of Objectives 1–3 for our LTL^T-MS79 and LTL^T-CH98 algorithms. To ensure a bounded $\tilde L$ in the $LBL^T$ factorization, we can use BP, FBP or BBK, but not BK. By (34), $\lambda_{\min}(T) \ge \delta\|\tilde L\tilde L^T\|_2$ implies $E = 0$. By Theorem 1, if $A$ is positive definite, $\lambda_{\min}(T) \ge \lambda_{\min}(A)/\lambda_{\max}(LL^T)$. We conclude that $E = 0$ if

    $\lambda_{\min}(A) \ge \delta\|\tilde L\tilde L^T\|_2\,\lambda_{\max}(LL^T).$   (43)
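A sketch (ours) of how the pieces of (41) fit together: aasen_ltlt is a hypothetical helper returning $P$, $L$, $T$ with $PAP^T = LTL^T$ (SciPy has no LTL^T routine), and modify_blocks is the MS79/CH98 rule sketched in Sect. 4, here applied to the tridiagonal $T$ via scipy.linalg.ldl in place of a BP-pivoted $LBL^T$.

    import numpy as np
    from scipy.linalg import ldl

    def sandwiched_modify(A, aasen_ltlt, delta, type_I=True):
        P, L, T = aasen_ltlt(A)                 # hypothetical: P A P^T = L T L^T
        Lt, B, _ = ldl(T)                       # LBL^T of the tridiagonal T
        Bhat = modify_blocks(B, delta, type_I)  # MS79/CH98 rule on the small blocks
        That = Lt @ Bhat @ Lt.T                 # T + Delta T, positive definite
        return P.T @ (L @ That @ L.T) @ P       # A + E, per (41)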

Fig. 5  Measures of $r_2$ and $\kappa_2(A+E)$ for LTL^T-MS79 and LTL^T-CH98 for 30 random matrices with $n = 100$; panels (a), (c) show $r_2$ and panels (b), (d) show $\kappa_2(A+E)$, for eigenvalue ranges $[-1, 10000]$ and $[-1, 1]$. Key: LTL^T-MS79 $\alpha = 0.618$ (solid line), LTL^T-MS79 $\alpha = 0.5$ (plus), LTL^T-CH98 $\alpha = 0.618$ (cross), LTL^T-CH98 $\alpha = 0.5$ (open square).

For LTL^T-MS79, by Theorem 1 and (35),

    $\|E\|_2 = \lambda_{\max}(E) = \lambda_{\max}(L\,\Delta T\,L^T) \le \lambda_{\max}(LL^T)\,\lambda_{\max}(\Delta T) = \|LL^T\|_2\|\Delta T\|_2 \le -2\lambda_{\min}(A)\,\kappa_2(LL^T)\,\kappa_2(\tilde L\tilde L^T)$   (44)

for $\lambda_{\min}(A) \le -\delta\|LL^T\|_2\|\tilde L\tilde L^T\|_2$. For LTL^T-CH98, by Theorem 1 and (36),

    $\|E\|_2 \le \delta\|LL^T\|_2\|\tilde L\tilde L^T\|_2 - \lambda_{\min}(A)\,\kappa_2(LL^T)\,\kappa_2(\tilde L\tilde L^T)$   (45)

for $\lambda_{\min}(A) \le 0$. For LTL^T-MS79, by Theorem 1 and (39),

    $\kappa_2(A + E) \le \kappa_2(LL^T)\,\kappa_2(T + \Delta T) \le \kappa_2(LL^T)\,\kappa_2(\tilde L\tilde L^T)^2\,\kappa_2(T) \le \kappa_2(LL^T)^2\,\kappa_2(\tilde L\tilde L^T)^2\,\kappa_2(A).$   (46)

For LTL^T-CH98, by Theorem 1 and (40),

    $\kappa_2(A + E) \le \kappa_2(LL^T)\,\kappa_2(\tilde L\tilde L^T)\max\Bigl\{1,\ \frac{\lambda_{\max}(T)}{\lambda_{\min}(\tilde L\tilde L^T)\,\delta}\Bigr\} \le \kappa_2(LL^T)\,\kappa_2(\tilde L\tilde L^T)\max\Bigl\{1,\ \frac{\lambda_{\max}(A)}{\lambda_{\min}(LL^T)\,\lambda_{\min}(\tilde L\tilde L^T)\,\delta}\Bigr\}.$   (47)

Note that the pivoting argument used in $\tilde P T \tilde P^T = \tilde L B \tilde L^T$ is $\alpha = (\sqrt{5} - 1)/2 \approx 0.618$ for symmetric triadic matrices [11, Theorem 4.1]. The corresponding element bound of $\tilde L$ is $\gamma \approx 2.618$. One may choose $\alpha = 0.5$ to obtain the minimum element bound of $\tilde L$ [11, Table 3.1], which is $\gamma = 2$, but it could result in an excessive $\|E\|_2$ for random matrices with eigenvalues in $[-1, 10000]$, as shown in Fig. 5. The bounds on $\|LL^T\|_2$ and $\lambda_{\min}(LL^T)$ are given in (42). The bounds on $\|\tilde L\tilde L^T\|_2$ and $\lambda_{\min}(\tilde L\tilde L^T)$ are in Lemma 4 with $\gamma \approx 2.618$. We conclude that

    $\|LL^T\|_2\|\tilde L\tilde L^T\|_2 \le 7.5n^3$ and $\lambda_{\min}(LL^T)\,\lambda_{\min}(\tilde L\tilde L^T) \ge \frac{91}{4^n (3.4^n - 1)^2}$   (48)

for $n > 1$. Comparing (48) with Table 3 with $\alpha = (1 + \sqrt{17})/8$, the bounds on $\|LL^T\|_2$ and $\lambda_{\min}(LL^T)$ for MS79 and CH98 are sharper than the bounds on $\|LL^T\|_2\|\tilde L\tilde L^T\|_2$ and $\lambda_{\min}(LL^T)\,\lambda_{\min}(\tilde L\tilde L^T)$ for LTL^T-MS79 and LTL^T-CH98, respectively. Comparing (35) and (36) with (44) and (45), MS79 and CH98 have sharper bounds on $\|E\|_2$ than LTL^T-MS79 and LTL^T-CH98, respectively. Comparing (39) and (40) with (46) and (47), MS79 and CH98 have sharper bounds on $\kappa_2(A + E)$ than LTL^T-MS79 and LTL^T-CH98, respectively.

In our experiments, however, our LTL^T-MS79 and LTL^T-CH98 algorithms usually performed as well as (and sometimes better than) MS79 and CH98, respectively. Compare Fig. 5 (with $\alpha = 0.618$) with Fig. 4 (with $\alpha \approx 0.6404$). In our experiments on the random matrices with eigenvalues in $[-1, 1]$, the values of $\|E\|_2$ produced by LTL^T-MS79 and LTL^T-CH98 were comparable to those from MS79 and CH98, respectively. For the random matrices with eigenvalues in $[-1, 10000]$, our LTL^T-MS79 and LTL^T-CH98 algorithms slightly outperformed MS79 and CH98 by keeping $\|E\|_2$ smaller on average.

6 Additional numerical experiments

Our previous experiments provided good values for the parameters in our methods. Now we present more extensive comparisons among the methods. We ran three tests in our experiments. The first test concerns random matrices similar to those in [16,18,19]. The second test was on the first matrix in [18] for which SE90 had difficulties. The third test was on the 33 matrices used in [19]. Our experiments were on a laptop with an Intel Celeron 2.8 GHz CPU using IEEE standard arithmetic with machine epsilon $\epsilon_M = 2^{-52} \approx 2.2 \times 10^{-16}$.

6.1 Random matrices

To investigate the behaviors of the factorization algorithms, we experimented on the random matrices with eigenvalues in $[-1, 10000]$ and $[-1, 1]$ with $n = 100$. The random matrices were generated as described in Sect. 3. We compare the performance of the four Type-I algorithms, GMW-I, SE-I, MS79 and LTL^T-MS79, and the four Type-II algorithms, GMW-II, SE99, CH98 and LTL^T-CH98. Figure 6 shows the results of the Type-I algorithms, whereas Fig. 7 shows the results of the Type-II algorithms.

Fig. 6  Measures of $r_2$ and $\kappa_2(A+E)$ for GMW-I, SE-I, MS79, and LTL^T-MS79 for 30 random matrices with $n = 100$; panels (a), (c) show $r_2$ and panels (b), (d) show $\kappa_2(A+E)$, for eigenvalue ranges $[-1, 10000]$ and $[-1, 1]$. Key: GMW-I (solid line), SE-I (plus), MS79 (cross), LTL^T-MS79 (open square).

Fig. 7  Measures of $r_2$ and $\kappa_2(A+E)$ for GMW-II, SE99, CH98, and LTL^T-CH98 for 30 random matrices with $n = 100$; panels (a), (c) show $r_2$ and panels (b), (d) show $\kappa_2(A+E)$, for eigenvalue ranges $[-1, 10000]$ and $[-1, 1]$. Key: GMW-II (solid line), SE99 (plus), CH98 (cross), LTL^T-CH98 (open square).

We measure the size of $E$ by $r_2 = \|E\|_2 / |\lambda_{\min}(A)|$ as defined in (19).

Consider the Type-I algorithms. MS79 and LTL^T-MS79 generally produced comparable $\|E\|_2$ and condition numbers, but for matrices with eigenvalues in $[-1, 10000]$, LTL^T-MS79 achieved a smaller $\|E\|_2$ than MS79 in several cases. For matrices with eigenvalues in $[-1, 1]$, SE-I outperformed the other Type-I algorithms by not only producing smaller $\|E\|_2$ but also smaller $\kappa_2(A+E)$.

Now compare the Type-II algorithms. In experiments on matrices with eigenvalues in $[-1, 10000]$, GMW-II and SE99 produced $\|E\|_2$ smaller than the others on average. The LTL^T-CH98 algorithm outperformed CH98 by usually achieving a smaller $\|E\|_2$. For the random matrices with eigenvalues in $[-1, 1]$, SE99 remains the best.

6.2 The Benchmark matrix

Schnabel and Eskow [18] identified a matrix that gives SE90 difficulties:

    $A = \begin{bmatrix} 1890.3 & -1705.6 & -315.8 & 3000.3 \\ -1705.6 & 1538.3 & 284.9 & -2706.6 \\ -315.8 & 284.9 & 52.5 & -501.2 \\ 3000.3 & -2706.6 & -501.2 & 4760.8 \end{bmatrix}.$   (49)

It became one of the benchmark matrices for the modified Cholesky algorithms [6,19]. This matrix has eigenvalues $\{-0.378, -0.343, -0.248, 8242.869\}$. The measures of $\|E\|_2$ and $\|E\|_F$ in terms of $r_2$ and $r_F$, and the condition numbers $\kappa_2(A+E)$, are listed in Table 5 for various modified Cholesky algorithms, where the new methods are GMW-I, GMW-II, SE-I, LTL^T-MS79 and LTL^T-CH98. It shows that SE90 is the only algorithm that encountered difficulty when applied to this matrix.

Table 5  Measures of $\|E\|$ and $\kappa_2(A+E)$ for the benchmark matrix (49)

  Algorithm       $r_2$    $r_F$    $\kappa_2(A+E)$
  GMW81
  GMW-I
  GMW-II
  SE90
  SE99
  SE-I
  MS79
  CH98
  LTL^T-MS79
  LTL^T-CH98

6.3 The 33 matrices

The 33 matrices, generated by Gay, Overton, and Wright from optimization problems where GMW81 outperformed SE90, were used by Schnabel and Eskow [19] to evaluate modified Cholesky algorithms. Table 6 summarizes $r_2 = \|E\|_2 / |\lambda_{\min}(A)|$ and $\zeta = \log_{10}(\kappa_2(A+E))$ for the existing algorithms in the literature, whereas Table 7 gives the results of the new algorithms. Matrix B3_ is positive definite but extremely ill-conditioned, so we measure the size of $E$ by $\|E\|_2$ instead of $r_2$. We see that SE90 did not perform well on several


More information

On the Perturbation of the Q-factor of the QR Factorization

On the Perturbation of the Q-factor of the QR Factorization NUMERICAL LINEAR ALGEBRA WITH APPLICATIONS Numer. Linear Algebra Appl. ; :1 6 [Version: /9/18 v1.] On the Perturbation of the Q-factor of the QR Factorization X.-W. Chang McGill University, School of Comptuer

More information

Algorithms and Perturbation Theory for Matrix Eigenvalue Problems and the SVD

Algorithms and Perturbation Theory for Matrix Eigenvalue Problems and the SVD Algorithms and Perturbation Theory for Matrix Eigenvalue Problems and the SVD Yuji Nakatsukasa PhD dissertation University of California, Davis Supervisor: Roland Freund Householder 2014 2/28 Acknowledgment

More information

Numerical Methods I Non-Square and Sparse Linear Systems

Numerical Methods I Non-Square and Sparse Linear Systems Numerical Methods I Non-Square and Sparse Linear Systems Aleksandar Donev Courant Institute, NYU 1 donev@courant.nyu.edu 1 MATH-GA 2011.003 / CSCI-GA 2945.003, Fall 2014 September 25th, 2014 A. Donev (Courant

More information

Computing Eigenvalues and/or Eigenvectors;Part 1, Generalities and symmetric matrices

Computing Eigenvalues and/or Eigenvectors;Part 1, Generalities and symmetric matrices Computing Eigenvalues and/or Eigenvectors;Part 1, Generalities and symmetric matrices Tom Lyche Centre of Mathematics for Applications, Department of Informatics, University of Oslo November 9, 2008 Today

More information

Chapter 12 Block LU Factorization

Chapter 12 Block LU Factorization Chapter 12 Block LU Factorization Block algorithms are advantageous for at least two important reasons. First, they work with blocks of data having b 2 elements, performing O(b 3 ) operations. The O(b)

More information

CS227-Scientific Computing. Lecture 4: A Crash Course in Linear Algebra

CS227-Scientific Computing. Lecture 4: A Crash Course in Linear Algebra CS227-Scientific Computing Lecture 4: A Crash Course in Linear Algebra Linear Transformation of Variables A common phenomenon: Two sets of quantities linearly related: y = 3x + x 2 4x 3 y 2 = 2.7x 2 x

More information

Lecture 12 (Tue, Mar 5) Gaussian elimination and LU factorization (II)

Lecture 12 (Tue, Mar 5) Gaussian elimination and LU factorization (II) Math 59 Lecture 2 (Tue Mar 5) Gaussian elimination and LU factorization (II) 2 Gaussian elimination - LU factorization For a general n n matrix A the Gaussian elimination produces an LU factorization if

More information

This can be accomplished by left matrix multiplication as follows: I

This can be accomplished by left matrix multiplication as follows: I 1 Numerical Linear Algebra 11 The LU Factorization Recall from linear algebra that Gaussian elimination is a method for solving linear systems of the form Ax = b, where A R m n and bran(a) In this method

More information

EIGENVALUE PROBLEMS. EIGENVALUE PROBLEMS p. 1/4

EIGENVALUE PROBLEMS. EIGENVALUE PROBLEMS p. 1/4 EIGENVALUE PROBLEMS EIGENVALUE PROBLEMS p. 1/4 EIGENVALUE PROBLEMS p. 2/4 Eigenvalues and eigenvectors Let A C n n. Suppose Ax = λx, x 0, then x is a (right) eigenvector of A, corresponding to the eigenvalue

More information

7. LU factorization. factor-solve method. LU factorization. solving Ax = b with A nonsingular. the inverse of a nonsingular matrix

7. LU factorization. factor-solve method. LU factorization. solving Ax = b with A nonsingular. the inverse of a nonsingular matrix EE507 - Computational Techniques for EE 7. LU factorization Jitkomut Songsiri factor-solve method LU factorization solving Ax = b with A nonsingular the inverse of a nonsingular matrix LU factorization

More information

Homework 2 Foundations of Computational Math 2 Spring 2019

Homework 2 Foundations of Computational Math 2 Spring 2019 Homework 2 Foundations of Computational Math 2 Spring 2019 Problem 2.1 (2.1.a) Suppose (v 1,λ 1 )and(v 2,λ 2 ) are eigenpairs for a matrix A C n n. Show that if λ 1 λ 2 then v 1 and v 2 are linearly independent.

More information

14.2 QR Factorization with Column Pivoting

14.2 QR Factorization with Column Pivoting page 531 Chapter 14 Special Topics Background Material Needed Vector and Matrix Norms (Section 25) Rounding Errors in Basic Floating Point Operations (Section 33 37) Forward Elimination and Back Substitution

More information

Numerical Analysis Lecture Notes

Numerical Analysis Lecture Notes Numerical Analysis Lecture Notes Peter J Olver 8 Numerical Computation of Eigenvalues In this part, we discuss some practical methods for computing eigenvalues and eigenvectors of matrices Needless to

More information

Intel Math Kernel Library (Intel MKL) LAPACK

Intel Math Kernel Library (Intel MKL) LAPACK Intel Math Kernel Library (Intel MKL) LAPACK Linear equations Victor Kostin Intel MKL Dense Solvers team manager LAPACK http://www.netlib.org/lapack Systems of Linear Equations Linear Least Squares Eigenvalue

More information

Orthogonalization and least squares methods

Orthogonalization and least squares methods Chapter 3 Orthogonalization and least squares methods 31 QR-factorization (QR-decomposition) 311 Householder transformation Definition 311 A complex m n-matrix R = [r ij is called an upper (lower) triangular

More information

On the Skeel condition number, growth factor and pivoting strategies for Gaussian elimination

On the Skeel condition number, growth factor and pivoting strategies for Gaussian elimination On the Skeel condition number, growth factor and pivoting strategies for Gaussian elimination J.M. Peña 1 Introduction Gaussian elimination (GE) with a given pivoting strategy, for nonsingular matrices

More information

ETNA Kent State University

ETNA Kent State University C 8 Electronic Transactions on Numerical Analysis. Volume 17, pp. 76-2, 2004. Copyright 2004,. ISSN 1068-613. etnamcs.kent.edu STRONG RANK REVEALING CHOLESKY FACTORIZATION M. GU AND L. MIRANIAN Abstract.

More information

Eigenvalue and Eigenvector Problems

Eigenvalue and Eigenvector Problems Eigenvalue and Eigenvector Problems An attempt to introduce eigenproblems Radu Trîmbiţaş Babeş-Bolyai University April 8, 2009 Radu Trîmbiţaş ( Babeş-Bolyai University) Eigenvalue and Eigenvector Problems

More information

Roundoff Analysis of Gaussian Elimination

Roundoff Analysis of Gaussian Elimination Jim Lambers MAT 60 Summer Session 2009-0 Lecture 5 Notes These notes correspond to Sections 33 and 34 in the text Roundoff Analysis of Gaussian Elimination In this section, we will perform a detailed error

More information

LAPACK-Style Codes for Pivoted Cholesky and QR Updating. Hammarling, Sven and Higham, Nicholas J. and Lucas, Craig. MIMS EPrint: 2006.

LAPACK-Style Codes for Pivoted Cholesky and QR Updating. Hammarling, Sven and Higham, Nicholas J. and Lucas, Craig. MIMS EPrint: 2006. LAPACK-Style Codes for Pivoted Cholesky and QR Updating Hammarling, Sven and Higham, Nicholas J. and Lucas, Craig 2007 MIMS EPrint: 2006.385 Manchester Institute for Mathematical Sciences School of Mathematics

More information

Numerical Linear Algebra And Its Applications

Numerical Linear Algebra And Its Applications Numerical Linear Algebra And Its Applications Xiao-Qing JIN 1 Yi-Min WEI 2 August 29, 2008 1 Department of Mathematics, University of Macau, Macau, P. R. China. 2 Department of Mathematics, Fudan University,

More information

For δa E, this motivates the definition of the Bauer-Skeel condition number ([2], [3], [14], [15])

For δa E, this motivates the definition of the Bauer-Skeel condition number ([2], [3], [14], [15]) LAA 278, pp.2-32, 998 STRUCTURED PERTURBATIONS AND SYMMETRIC MATRICES SIEGFRIED M. RUMP Abstract. For a given n by n matrix the ratio between the componentwise distance to the nearest singular matrix and

More information

LAPACK-Style Codes for Pivoted Cholesky and QR Updating

LAPACK-Style Codes for Pivoted Cholesky and QR Updating LAPACK-Style Codes for Pivoted Cholesky and QR Updating Sven Hammarling 1, Nicholas J. Higham 2, and Craig Lucas 3 1 NAG Ltd.,Wilkinson House, Jordan Hill Road, Oxford, OX2 8DR, England, sven@nag.co.uk,

More information

A fast randomized algorithm for overdetermined linear least-squares regression

A fast randomized algorithm for overdetermined linear least-squares regression A fast randomized algorithm for overdetermined linear least-squares regression Vladimir Rokhlin and Mark Tygert Technical Report YALEU/DCS/TR-1403 April 28, 2008 Abstract We introduce a randomized algorithm

More information

Lecture # 20 The Preconditioned Conjugate Gradient Method

Lecture # 20 The Preconditioned Conjugate Gradient Method Lecture # 20 The Preconditioned Conjugate Gradient Method We wish to solve Ax = b (1) A R n n is symmetric and positive definite (SPD). We then of n are being VERY LARGE, say, n = 10 6 or n = 10 7. Usually,

More information

arxiv: v3 [math.ra] 10 Jun 2016

arxiv: v3 [math.ra] 10 Jun 2016 To appear in Linear and Multilinear Algebra Vol. 00, No. 00, Month 0XX, 1 10 The critical exponent for generalized doubly nonnegative matrices arxiv:1407.7059v3 [math.ra] 10 Jun 016 Xuchen Han a, Charles

More information

CME 302: NUMERICAL LINEAR ALGEBRA FALL 2005/06 LECTURE 9

CME 302: NUMERICAL LINEAR ALGEBRA FALL 2005/06 LECTURE 9 CME 302: NUMERICAL LINEAR ALGEBRA FALL 2005/06 LECTURE 9 GENE H GOLUB 1 Error Analysis of Gaussian Elimination In this section, we will consider the case of Gaussian elimination and perform a detailed

More information

Mathematical Optimisation, Chpt 2: Linear Equations and inequalities

Mathematical Optimisation, Chpt 2: Linear Equations and inequalities Mathematical Optimisation, Chpt 2: Linear Equations and inequalities Peter J.C. Dickinson p.j.c.dickinson@utwente.nl http://dickinson.website version: 12/02/18 Monday 5th February 2018 Peter J.C. Dickinson

More information

Chapter 7 Iterative Techniques in Matrix Algebra

Chapter 7 Iterative Techniques in Matrix Algebra Chapter 7 Iterative Techniques in Matrix Algebra Per-Olof Persson persson@berkeley.edu Department of Mathematics University of California, Berkeley Math 128B Numerical Analysis Vector Norms Definition

More information

Numerical Methods I Eigenvalue Problems

Numerical Methods I Eigenvalue Problems Numerical Methods I Eigenvalue Problems Aleksandar Donev Courant Institute, NYU 1 donev@courant.nyu.edu 1 MATH-GA 2011.003 / CSCI-GA 2945.003, Fall 2014 October 2nd, 2014 A. Donev (Courant Institute) Lecture

More information

LECTURE NOTES ELEMENTARY NUMERICAL METHODS. Eusebius Doedel

LECTURE NOTES ELEMENTARY NUMERICAL METHODS. Eusebius Doedel LECTURE NOTES on ELEMENTARY NUMERICAL METHODS Eusebius Doedel TABLE OF CONTENTS Vector and Matrix Norms 1 Banach Lemma 20 The Numerical Solution of Linear Systems 25 Gauss Elimination 25 Operation Count

More information

A PRIMAL-DUAL TRUST REGION ALGORITHM FOR NONLINEAR OPTIMIZATION

A PRIMAL-DUAL TRUST REGION ALGORITHM FOR NONLINEAR OPTIMIZATION Optimization Technical Report 02-09, October 2002, UW-Madison Computer Sciences Department. E. Michael Gertz 1 Philip E. Gill 2 A PRIMAL-DUAL TRUST REGION ALGORITHM FOR NONLINEAR OPTIMIZATION 7 October

More information

STAT 309: MATHEMATICAL COMPUTATIONS I FALL 2018 LECTURE 9

STAT 309: MATHEMATICAL COMPUTATIONS I FALL 2018 LECTURE 9 STAT 309: MATHEMATICAL COMPUTATIONS I FALL 2018 LECTURE 9 1. qr and complete orthogonal factorization poor man s svd can solve many problems on the svd list using either of these factorizations but they

More information

13-2 Text: 28-30; AB: 1.3.3, 3.2.3, 3.4.2, 3.5, 3.6.2; GvL Eigen2

13-2 Text: 28-30; AB: 1.3.3, 3.2.3, 3.4.2, 3.5, 3.6.2; GvL Eigen2 The QR algorithm The most common method for solving small (dense) eigenvalue problems. The basic algorithm: QR without shifts 1. Until Convergence Do: 2. Compute the QR factorization A = QR 3. Set A :=

More information

FACTORIZING COMPLEX SYMMETRIC MATRICES WITH POSITIVE DEFINITE REAL AND IMAGINARY PARTS

FACTORIZING COMPLEX SYMMETRIC MATRICES WITH POSITIVE DEFINITE REAL AND IMAGINARY PARTS MATHEMATICS OF COMPUTATION Volume 67, Number 4, October 1998, Pages 1591 1599 S 005-5718(98)00978-8 FACTORIZING COMPLEX SYMMETRIC MATRICES WITH POSITIVE DEFINITE REAL AND IMAGINARY PARTS NICHOLAS J. HIGHAM

More information

arxiv: v1 [math.na] 1 Sep 2018

arxiv: v1 [math.na] 1 Sep 2018 On the perturbation of an L -orthogonal projection Xuefeng Xu arxiv:18090000v1 [mathna] 1 Sep 018 September 5 018 Abstract The L -orthogonal projection is an important mathematical tool in scientific computing

More information

OPTIMAL SCALING FOR P -NORMS AND COMPONENTWISE DISTANCE TO SINGULARITY

OPTIMAL SCALING FOR P -NORMS AND COMPONENTWISE DISTANCE TO SINGULARITY published in IMA Journal of Numerical Analysis (IMAJNA), Vol. 23, 1-9, 23. OPTIMAL SCALING FOR P -NORMS AND COMPONENTWISE DISTANCE TO SINGULARITY SIEGFRIED M. RUMP Abstract. In this note we give lower

More information

A Divide-and-Conquer Algorithm for Functions of Triangular Matrices

A Divide-and-Conquer Algorithm for Functions of Triangular Matrices A Divide-and-Conquer Algorithm for Functions of Triangular Matrices Ç. K. Koç Electrical & Computer Engineering Oregon State University Corvallis, Oregon 97331 Technical Report, June 1996 Abstract We propose

More information

Math 273a: Optimization Netwon s methods

Math 273a: Optimization Netwon s methods Math 273a: Optimization Netwon s methods Instructor: Wotao Yin Department of Mathematics, UCLA Fall 2015 some material taken from Chong-Zak, 4th Ed. Main features of Newton s method Uses both first derivatives

More information

Lecture 13 Stability of LU Factorization; Cholesky Factorization. Songting Luo. Department of Mathematics Iowa State University

Lecture 13 Stability of LU Factorization; Cholesky Factorization. Songting Luo. Department of Mathematics Iowa State University Lecture 13 Stability of LU Factorization; Cholesky Factorization Songting Luo Department of Mathematics Iowa State University MATH 562 Numerical Analysis II ongting Luo ( Department of Mathematics Iowa

More information

Linear algebra for computational statistics

Linear algebra for computational statistics University of Seoul May 3, 2018 Vector and Matrix Notation Denote 2-dimensional data array (n p matrix) by X. Denote the element in the ith row and the jth column of X by x ij or (X) ij. Denote by X j

More information

Direct Methods for Solving Linear Systems. Matrix Factorization

Direct Methods for Solving Linear Systems. Matrix Factorization Direct Methods for Solving Linear Systems Matrix Factorization Numerical Analysis (9th Edition) R L Burden & J D Faires Beamer Presentation Slides prepared by John Carroll Dublin City University c 2011

More information

Linear Programming: Simplex

Linear Programming: Simplex Linear Programming: Simplex Stephen J. Wright 1 2 Computer Sciences Department, University of Wisconsin-Madison. IMA, August 2016 Stephen Wright (UW-Madison) Linear Programming: Simplex IMA, August 2016

More information

Lecture notes: Applied linear algebra Part 1. Version 2

Lecture notes: Applied linear algebra Part 1. Version 2 Lecture notes: Applied linear algebra Part 1. Version 2 Michael Karow Berlin University of Technology karow@math.tu-berlin.de October 2, 2008 1 Notation, basic notions and facts 1.1 Subspaces, range and

More information

4.6 Iterative Solvers for Linear Systems

4.6 Iterative Solvers for Linear Systems 4.6 Iterative Solvers for Linear Systems Why use iterative methods? Virtually all direct methods for solving Ax = b require O(n 3 ) floating point operations. In practical applications the matrix A often

More information

5 Handling Constraints

5 Handling Constraints 5 Handling Constraints Engineering design optimization problems are very rarely unconstrained. Moreover, the constraints that appear in these problems are typically nonlinear. This motivates our interest

More information

arxiv: v2 [math.na] 27 Dec 2016

arxiv: v2 [math.na] 27 Dec 2016 An algorithm for constructing Equiangular vectors Azim rivaz a,, Danial Sadeghi a a Department of Mathematics, Shahid Bahonar University of Kerman, Kerman 76169-14111, IRAN arxiv:1412.7552v2 [math.na]

More information

Scientific Computing: Dense Linear Systems

Scientific Computing: Dense Linear Systems Scientific Computing: Dense Linear Systems Aleksandar Donev Courant Institute, NYU 1 donev@courant.nyu.edu 1 Course MATH-GA.2043 or CSCI-GA.2112, Spring 2012 February 9th, 2012 A. Donev (Courant Institute)

More information

MULTIPLICATIVE PERTURBATION ANALYSIS FOR QR FACTORIZATIONS. Xiao-Wen Chang. Ren-Cang Li. (Communicated by Wenyu Sun)

MULTIPLICATIVE PERTURBATION ANALYSIS FOR QR FACTORIZATIONS. Xiao-Wen Chang. Ren-Cang Li. (Communicated by Wenyu Sun) NUMERICAL ALGEBRA, doi:10.3934/naco.011.1.301 CONTROL AND OPTIMIZATION Volume 1, Number, June 011 pp. 301 316 MULTIPLICATIVE PERTURBATION ANALYSIS FOR QR FACTORIZATIONS Xiao-Wen Chang School of Computer

More information

A Backward Stable Hyperbolic QR Factorization Method for Solving Indefinite Least Squares Problem

A Backward Stable Hyperbolic QR Factorization Method for Solving Indefinite Least Squares Problem A Backward Stable Hyperbolic QR Factorization Method for Solving Indefinite Least Suares Problem Hongguo Xu Dedicated to Professor Erxiong Jiang on the occasion of his 7th birthday. Abstract We present

More information

Review of Basic Concepts in Linear Algebra

Review of Basic Concepts in Linear Algebra Review of Basic Concepts in Linear Algebra Grady B Wright Department of Mathematics Boise State University September 7, 2017 Math 565 Linear Algebra Review September 7, 2017 1 / 40 Numerical Linear Algebra

More information

Lecture 10 - Eigenvalues problem

Lecture 10 - Eigenvalues problem Lecture 10 - Eigenvalues problem Department of Computer Science University of Houston February 28, 2008 1 Lecture 10 - Eigenvalues problem Introduction Eigenvalue problems form an important class of problems

More information

This ensures that we walk downhill. For fixed λ not even this may be the case.

This ensures that we walk downhill. For fixed λ not even this may be the case. Gradient Descent Objective Function Some differentiable function f : R n R. Gradient Descent Start with some x 0, i = 0 and learning rate λ repeat x i+1 = x i λ f(x i ) until f(x i+1 ) ɛ Line Search Variant

More information

c 2006 Society for Industrial and Applied Mathematics

c 2006 Society for Industrial and Applied Mathematics SIAM J. MATRIX ANAL. APPL. Vol. 28, No. 4, pp. 1126 1156 c 2006 Society for Industrial and Applied Mathematics ACCURATE SYMMETRIC RANK REVEALING AND EIGENDECOMPOSITIONS OF SYMMETRIC STRUCTURED MATRICES

More information

Computational Linear Algebra

Computational Linear Algebra Computational Linear Algebra PD Dr. rer. nat. habil. Ralf Peter Mundani Computation in Engineering / BGU Scientific Computing in Computer Science / INF Winter Term 2017/18 Part 2: Direct Methods PD Dr.

More information

Math 405: Numerical Methods for Differential Equations 2016 W1 Topics 10: Matrix Eigenvalues and the Symmetric QR Algorithm

Math 405: Numerical Methods for Differential Equations 2016 W1 Topics 10: Matrix Eigenvalues and the Symmetric QR Algorithm Math 405: Numerical Methods for Differential Equations 2016 W1 Topics 10: Matrix Eigenvalues and the Symmetric QR Algorithm References: Trefethen & Bau textbook Eigenvalue problem: given a matrix A, find

More information

Lecture 2 INF-MAT : , LU, symmetric LU, Positve (semi)definite, Cholesky, Semi-Cholesky

Lecture 2 INF-MAT : , LU, symmetric LU, Positve (semi)definite, Cholesky, Semi-Cholesky Lecture 2 INF-MAT 4350 2009: 7.1-7.6, LU, symmetric LU, Positve (semi)definite, Cholesky, Semi-Cholesky Tom Lyche and Michael Floater Centre of Mathematics for Applications, Department of Informatics,

More information

c 2013 Society for Industrial and Applied Mathematics

c 2013 Society for Industrial and Applied Mathematics SIAM J. MATRIX ANAL. APPL. Vol. 34, No. 3, pp. 1401 1429 c 2013 Society for Industrial and Applied Mathematics LU FACTORIZATION WITH PANEL RANK REVEALING PIVOTING AND ITS COMMUNICATION AVOIDING VERSION

More information

Lecture 10: Finite Differences for ODEs & Nonlinear Equations

Lecture 10: Finite Differences for ODEs & Nonlinear Equations Lecture 10: Finite Differences for ODEs & Nonlinear Equations J.K. Ryan@tudelft.nl WI3097TU Delft Institute of Applied Mathematics Delft University of Technology 21 November 2012 () Finite Differences

More information

Convex Quadratic Approximation

Convex Quadratic Approximation Convex Quadratic Approximation J. Ben Rosen 1 and Roummel F. Marcia 2 Abstract. For some applications it is desired to approximate a set of m data points in IR n with a convex quadratic function. Furthermore,

More information

Preliminary/Qualifying Exam in Numerical Analysis (Math 502a) Spring 2012

Preliminary/Qualifying Exam in Numerical Analysis (Math 502a) Spring 2012 Instructions Preliminary/Qualifying Exam in Numerical Analysis (Math 502a) Spring 2012 The exam consists of four problems, each having multiple parts. You should attempt to solve all four problems. 1.

More information

Eigenvalue Problems. Eigenvalue problems occur in many areas of science and engineering, such as structural analysis

Eigenvalue Problems. Eigenvalue problems occur in many areas of science and engineering, such as structural analysis Eigenvalue Problems Eigenvalue problems occur in many areas of science and engineering, such as structural analysis Eigenvalues also important in analyzing numerical methods Theory and algorithms apply

More information

Chapter 2. Solving Systems of Equations. 2.1 Gaussian elimination

Chapter 2. Solving Systems of Equations. 2.1 Gaussian elimination Chapter 2 Solving Systems of Equations A large number of real life applications which are resolved through mathematical modeling will end up taking the form of the following very simple looking matrix

More information

Math 471 (Numerical methods) Chapter 3 (second half). System of equations

Math 471 (Numerical methods) Chapter 3 (second half). System of equations Math 47 (Numerical methods) Chapter 3 (second half). System of equations Overlap 3.5 3.8 of Bradie 3.5 LU factorization w/o pivoting. Motivation: ( ) A I Gaussian Elimination (U L ) where U is upper triangular

More information

MS&E 318 (CME 338) Large-Scale Numerical Optimization

MS&E 318 (CME 338) Large-Scale Numerical Optimization Stanford University, Management Science & Engineering (and ICME MS&E 38 (CME 338 Large-Scale Numerical Optimization Course description Instructor: Michael Saunders Spring 28 Notes : Review The course teaches

More information

A Residual Inverse Power Method

A Residual Inverse Power Method University of Maryland Institute for Advanced Computer Studies Department of Computer Science College Park TR 2007 09 TR 4854 A Residual Inverse Power Method G. W. Stewart February 2007 ABSTRACT The inverse

More information

Scientific Computing WS 2018/2019. Lecture 9. Jürgen Fuhrmann Lecture 9 Slide 1

Scientific Computing WS 2018/2019. Lecture 9. Jürgen Fuhrmann Lecture 9 Slide 1 Scientific Computing WS 2018/2019 Lecture 9 Jürgen Fuhrmann juergen.fuhrmann@wias-berlin.de Lecture 9 Slide 1 Lecture 9 Slide 2 Simple iteration with preconditioning Idea: Aû = b iterative scheme û = û

More information