FAST AND STABLE ALGORITHMS FOR BANDED PLUS SEMISEPARABLE SYSTEMS OF LINEAR EQUATIONS

S. CHANDRASEKARAN* AND M. GU†

Abstract. We present fast and numerically stable algorithms for the solution of linear systems of equations where the coefficient matrix can be written in the form of a banded plus semiseparable matrix. Such matrices include banded matrices, semiseparable matrices, and block-diagonal plus semiseparable matrices as special cases. Our algorithms are based on novel matrix factorizations developed specifically for matrices with such structures. We also present interesting numerical results with these algorithms.

Key words. banded matrix, semiseparable matrix, fast algorithms, stable algorithms.

AMS subject classifications. 15A09, 15A23, 65F05, 65L10, 65R20.

1. Introduction. In this paper we consider fast and numerically stable solutions of the n-by-n linear system of equations

(1.1)    A x = b,

where A is the sum of a banded matrix and a semiseparable matrix. This class of matrices also includes block-diagonal + semiseparable matrices. Linear systems of such forms appear in the numerical solution of boundary-value problems for ordinary differential equations and certain integral equations (see Greengard and Rokhlin [6], Starr [9], and Lee and Greengard [7]). Some related work on (1.1) can also be found in Eidelman and Gohberg [3, 4].

1.1. Contributions. The most important feature of problem (1.1) is that A is a dense but highly structured matrix. Although direct methods have been developed for efficient and numerically stable LU and QR factorizations of banded matrices (see Demmel [2]), such methods do not currently exist for semiseparable matrices, let alone banded+semiseparable matrices. The main difficulty is that LU and QR factorizations have tremendous difficulties in maintaining the banded+semiseparable structure, and consequently require O(n^3) flops‡ to compute such factorizations of A in (1.1).
Although iterative methods, such as those based on the Lanczos or Arnoldi procedures, can take advantage of the structure in A and can be used to solve (1.1), their convergence rate is highly problem-dependent and can be very slow without effective preconditioning (see Saad [8]). In this paper, we present a number of fast and numerically stable direct methods for solving (1.1). Our methods are based on some new matrix factorizations we developed specifically for banded+semiseparable matrices. We also present interesting results from our numerical experiments with these methods in MATLAB.

1.2. Notation. To describe the problem precisely, we first introduce some notation. Reminiscent of MATLAB notation, we use triu(A, k) to denote the matrix which is identical to the matrix A on and above the k-th diagonal, and is zero elsewhere. Here k = 0 is the main diagonal, k > 0 is above the main diagonal, and k < 0 is below the main diagonal. Similarly,

* Department of Electrical and Computer Engineering, University of California, Santa Barbara (shiv@ece.ucsb.edu).
† Department of Mathematics, University of California, Los Angeles (mgu@math.ucla.edu).
‡ A flop is a floating point operation such as +, -, *, or /.

tril(A, k) denotes the matrix which is identical to the matrix A on and below the k-th diagonal, and is zero elsewhere. For example,

    triu( [ 1  2 ], 1 ) = [ 0  2 ]    and    tril( [ 1  2 ], -1 ) = [ 0  0 ].
          [ 3  4 ]        [ 0  0 ]                 [ 3  4 ]         [ 3  0 ]

As a banded+semiseparable matrix, the matrix A in (1.1) can be written as

(1.2)    A = D + triu( u v^T, b_u + 1 ) + tril( p q^T, -b_l - 1 ),

where D is an n-by-n banded matrix, with b_u non-zero diagonals strictly above the main diagonal and b_l non-zero diagonals strictly below the main diagonal; u and v are n-by-r_u matrices; and p and q are n-by-r_l matrices. When b_u = b_l = 0, D is a diagonal matrix, and A is a diagonal+semiseparable matrix. When r_u = r_l = 0, A = D is a banded matrix.

We are interested in the numerical solution of the linear system (1.1). The rest of this paper provides a set of numerically backward stable algorithms that take approximately O( n (b_u + b_l + r_u + r_l)^2 ) flops to solve (1.1), as opposed to the O(n^3) flops required by traditional methods involving LU and QR factorizations. The exact constant hidden in the O(·) notation varies among our algorithms. Throughout this paper, we will take the liberty of using I to denote an identity matrix of any dimension.

The rest of this paper is organized as follows. In §2 we illustrate the basic ideas behind our algorithms through a simple example. In §3 we describe the algorithms in some detail. In §4 we present our numerical results with these algorithms.

2. Basic Idea. In this section we give a description of the basic idea in the simple case when D is a diagonal matrix, and u, v, p, and q have only one column. The idea is to compute a two-sided decomposition of the form

(2.1)    A = W L H,

where W and H can be written as products of n elementary matrices, and L is a lower triangular matrix. The three matrices W, L, and H themselves are never explicitly formed, but are inverted efficiently on-line as the algorithm proceeds. In this section, we will choose the matrices W and H to be products of elementary Givens rotations.
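For concreteness, the structure (1.2) can be assembled densely in a few lines; the NumPy sketch below (our own test harness, not code from the paper) is for experimentation only, since the whole point of the algorithms here is never to form A explicitly.

```python
import numpy as np

def banded_plus_semiseparable(D, u, v, p, q, b_u, b_l):
    """Assemble A = D + triu(u v^T, b_u + 1) + tril(p q^T, -b_l - 1) densely,
    following (1.2).  O(n^2) storage: for testing only."""
    return D + np.triu(u @ v.T, b_u + 1) + np.tril(p @ q.T, -b_l - 1)

rng = np.random.default_rng(0)
n, r_u, r_l, b_u, b_l = 8, 2, 3, 1, 2

# D is banded: b_u superdiagonals and b_l subdiagonals, zero outside the band.
D = np.triu(np.tril(rng.standard_normal((n, n)), b_u), -b_l)
u = rng.standard_normal((n, r_u)); v = rng.standard_normal((n, r_u))
p = rng.standard_normal((n, r_l)); q = rng.standard_normal((n, r_l))

A = banded_plus_semiseparable(D, u, v, p, q, b_u, b_l)
```

Outside the band, every entry of A above the diagonal is a rank-r_u product u_i^T v_j, and every entry below is a rank-r_l product p_i^T q_j, which is exactly the structure the factorizations below exploit.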
When we discuss our algorithms in full detail in §3, we will allow ourselves the additional freedom of choosing W and H to be products of elementary Householder reflections or Gaussian elimination matrices with column and/or row permutations. More specifically, consider the 5-by-5 case:

    A = [ D_0      u_0 v_1  u_0 v_2  u_0 v_3  u_0 v_4 ]
        [ p_1 q_0  D_1      u_1 v_2  u_1 v_3  u_1 v_4 ]
        [ p_2 q_0  p_2 q_1  D_2      u_2 v_3  u_2 v_4 ]
        [ p_3 q_0  p_3 q_1  p_3 q_2  D_3      u_3 v_4 ]
        [ p_4 q_0  p_4 q_1  p_4 q_2  p_4 q_3  D_4     ].

For future convenience we also assume that the right-hand side is of the form

    b = [ b_0 ]   [ 0   ]
        [ b_1 ]   [ p_1 ]
        [ b_2 ] - [ p_2 ] γ,
        [ b_3 ]   [ p_3 ]
        [ b_4 ]   [ p_4 ]

where γ = 0. (Of course, the second term on the right-hand side has no effect at this stage, but it will capture the general form of the recursion as we proceed.) Now suppose that W_0 is a Givens rotation such that

(2.2)    W_0 [ u_0 ] = [ 0 ],    û = sqrt( u_0^2 + u_1^2 ).
             [ u_1 ]   [ û ]

Since W_0 maps every multiple of (u_0, u_1)^T in the same way, it annihilates the entire strip u_0 v_j (j >= 2) of the first row at once. Then if we apply W_0 from the left to A (it acts only on rows 0 and 1) we obtain

    Â = W_0 A = [ Â_00     Â_01     0        0        0       ]
                [ Â_10     Â_11     û v_2    û v_3    û v_4   ]
                [ p_2 q_0  p_2 q_1  D_2      u_2 v_3  u_2 v_4 ]
                [ p_3 q_0  p_3 q_1  p_3 q_2  D_3      u_3 v_4 ]
                [ p_4 q_0  p_4 q_1  p_4 q_2  p_4 q_3  D_4     ].

We also apply W_0 to b to obtain

    b̂ = W_0 b = W_0 ( [b_0, b_1, b_2, b_3, b_4]^T - [0, p_1, p_2, p_3, p_4]^T γ )
              = [b̂_0, b̂_1, b_2, b_3, b_4]^T - [0, 0, p_2, p_3, p_4]^T γ,

where we have deliberately written the formula in such a way that it would be correct even if γ had not been zero. We next choose a Givens rotation H_0 (acting only on columns 0 and 1) such that

(2.3)    H_0^T [ Â_00 ] = [ sqrt( Â_00^2 + Â_01^2 ) ] ≡ [ Ã_00 ].
               [ Â_01 ]   [ 0                       ]   [ 0    ]

Further let

    H_0^T [ q_0 ] = [ q̃_0 ]    and    H_0^T [ Â_10 ] = [ Ã_10 ].
          [ q_1 ]   [ q̃_1 ]                [ Â_11 ]   [ Ã_11 ]

Then

    Ã = W_0 A H_0 = [ Ã_00      0         0        0        0       ]
                    [ Ã_10      Ã_11      û v_2    û v_3    û v_4   ]
                    [ p_2 q̃_0   p_2 q̃_1   D_2      u_2 v_3  u_2 v_4 ]
                    [ p_3 q̃_0   p_3 q̃_1   p_3 q_2  D_3      u_3 v_4 ]
                    [ p_4 q̃_0   p_4 q̃_1   p_4 q_2  p_4 q_3  D_4     ].

Now let

(2.4)    H_0^{-1} x = H_0^{-1} [x_0, x_1, x_2, x_3, x_4]^T ≡ [x̃_0, x̃_1, x_2, x_3, x_4]^T ≡ x̃.

Then it follows from W_0 A H_0 · H_0^{-1} x = Ã x̃ = W_0 b = b̂ that Ã x̃ = b̂. Also let

    x̃_0 = b̂_0 / Ã_00,    γ̃ = γ + q̃_0 x̃_0,    and    b̃_1 = b̂_1 - Ã_10 x̃_0.

To reach this stage we needed to compute all the "tilde" and "hat" quantities except x̃. They can be computed in constant time, independent of the size of the matrix A. Now we can proceed to solve the smaller 4-by-4 system of equations

    [ Ã_11     û v_2    û v_3    û v_4   ] [ x̃_1 ]   [ b̃_1 ]   [ 0   ]
    [ p_2 q̃_1  D_2      u_2 v_3  u_2 v_4 ] [ x_2 ] = [ b_2 ] - [ p_2 ] γ̃,
    [ p_3 q̃_1  p_3 q_2  D_3      u_3 v_4 ] [ x_3 ]   [ b_3 ]   [ p_3 ]
    [ p_4 q̃_1  p_4 q_2  p_4 q_3  D_4     ] [ x_4 ]   [ b_4 ]   [ p_4 ]

which is exactly like the original 5-by-5 system of equations in form. That is, the coefficient matrix is a diagonal matrix plus a semiseparable matrix, and the right-hand side is also of the requisite form. Hence we can apply this recursion 3 times until the problem size becomes 2, at which point we solve the system directly.

Let the 5 numbers obtained by this recursion, x̃_0, x̃_1, x̃_2, x̃_3, and x̃_4, be the components of the five-dimensional vector ω. Then it follows from equation (2.4) that the actual solution x to the original 5-by-5 system of equations is given by

(2.5)    x = H_0 H_1 H_2 ω,

where the H_i's are the successive Givens transforms computed from the recursion (2.3), but set up in such a way that H_i affects only rows i and i+1. Since there are only 3 of these transforms, we retain the linear time complexity of the algorithm. The backward stability of the algorithm follows from the fact that we only use orthogonal transforms and a single forward-substitution.

Our factorization is similar in form to the ULV factorization proposed by Stewart [10]. However, the ULV factorization of Stewart is developed primarily to reveal potential numerical rank-deficiency in a general matrix and can take O(n^3) flops to compute, whereas our factorization is designed primarily to take advantage of the banded+semiseparable structure for large savings in computational cost.

There are two places in the recursion where elimination is necessary. In equation (2.2) we chose W_0 to be a Givens rotation to eliminate u_0, and in equation (2.3) we chose H_0 to be another Givens rotation to eliminate Â_01. These transformations can be replaced by Householder transformations or Gaussian elimination matrices with row or column pivoting. This results in several algorithms with different efficiency and numerical stability properties. In the next section, we describe a general procedure for solving (1.1) via the computation of the factorization (2.1). We also discuss efficiency and numerical stability issues for different choices of W and H in (2.1).

3. The Algorithms. We now describe fast algorithms for solving (1.1), where A is a general banded+semiseparable matrix of the form (1.2).
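The elementary operation used throughout the recursion of §2, a 2-by-2 rotation chosen so that one entry of a pair vanishes, as in (2.2) and (2.3), can be sketched as follows (a minimal illustration; the helper `givens` is our own, not code from the paper, and the paper's W_0 places the zero in the first component, which is the same rotation with the roles of the two entries swapped):

```python
import numpy as np

def givens(a, b):
    """Return (c, s) with c^2 + s^2 = 1 such that
    [[c, s], [-s, c]] @ [a, b]^T = [r, 0]^T, where r = hypot(a, b)."""
    r = np.hypot(a, b)
    if r == 0.0:
        return 1.0, 0.0
    return a / r, b / r

# One elimination step in the spirit of Section 2:
c, s = givens(3.0, 4.0)
G = np.array([[c, s], [-s, c]])
rotated = G @ np.array([3.0, 4.0])  # second component eliminated
```

Because G is orthogonal, applying such rotations (and accumulating their product as W or H) cannot amplify rounding errors, which is the source of the backward stability claimed above.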

3.1. Preprocessing and Basic Linear Algebra Procedures. Some preprocessing is needed before the algorithms formally start. We assume that u is lower triangular. This can be achieved by computing a QR factorization u^T = Q R and setting

(3.1)    u := R^T    and    v := v Q.

This operation takes roughly 6 n r_u^2 flops, using the fact that Q is computed in factored form [5, Ch. 5].

We also review a few well-known basic linear algebra routines needed in our algorithms. Let L be an m-by-s lower triangular matrix with m > s:

    L = [ l_11                  ]
        [ l_21  l_22            ]
        [  :     :    ·.        ]
        [ l_s1  l_s2  ...  l_ss ]
        [  :     :          :   ]
        [ l_m1  l_m2  ...  l_ms ].

Algorithm 3.1 below is a standard procedure for efficiently zeroing out the entries l_11, l_22, ..., l_ss on the main diagonal of L by using s Givens rotations (see [5]).

Algorithm 3.1. Elimination with Givens Rotations.

    for i := s downto 1 do
        Choose c_i and s_i with c_i^2 + s_i^2 = 1 such that
            [  c_i  s_i ] [ l_{i,i}   ]   [ 0   ]
            [ -s_i  c_i ] [ l_{i+1,i} ] = [ ω_i ],    where ω_i = sqrt( l_{i,i}^2 + l_{i+1,i}^2 ).
        Set l_{i,i} := 0, l_{i+1,i} := ω_i, and compute
            [ l_{i,1}    ...  l_{i,i-1}   ]      [  c_i  s_i ] [ l_{i,1}    ...  l_{i,i-1}   ]
            [ l_{i+1,1}  ...  l_{i+1,i-1} ]  :=  [ -s_i  c_i ] [ l_{i+1,1}  ...  l_{i+1,i-1} ].
    endfor

Let W be the product of all the Givens rotations used in the above algorithm. Then its output can be written via a matrix-matrix product as L := W L. Similarly, we can zero out the main diagonal of L by using a banded Gaussian elimination procedure with row pivoting. See Golub and Van Loan [5, Ch. 4] for details.

Let G ∈ R^{m×s} be a general dense matrix. Then we can choose a Householder transformation H = I - 2 u u^T, with ||u||_2 = 1, to zero out all the entries in the first row of G except the (1,1) entry:

(3.2)    G H = [ ĝ_11  0 ]
               [ ĝ     Ĝ ].

The cost for computing u is O(s), and the cost for computing G H is about 4 m s flops (see [5, Ch. 5]).
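Algorithm 3.1 can be sketched in a few lines of NumPy (an assumed dense implementation, for clarity only; each rotation touches just rows i and i+1, and since at step i both rows are already zero to the right of column i, it suffices to update columns 1 through i):

```python
import numpy as np

def zero_diagonal_with_givens(L):
    """Algorithm 3.1 sketch: zero out l_11, ..., l_ss on the main diagonal of a
    tall lower triangular m-by-s matrix L (m > s) with s Givens rotations,
    working from the bottom column up.  Returns a modified copy; the product W
    of the rotations satisfies L_out = W @ L_in."""
    L = np.array(L, dtype=float, copy=True)
    m, s = L.shape
    assert m > s
    for i in range(s - 1, -1, -1):
        a, b = L[i, i], L[i + 1, i]
        r = np.hypot(a, b)
        if r == 0.0:
            continue
        # choose the rotation so that the diagonal entry maps to zero
        c, sn = b / r, -a / r
        G = np.array([[c, sn], [-sn, c]])
        L[i:i + 2, :i + 1] = G @ L[i:i + 2, :i + 1]
        L[i, i] = 0.0  # clean up rounding in the entry eliminated exactly
    return L

L0 = np.tril(np.arange(1.0, 19.0).reshape(6, 3))
L1 = zero_diagonal_with_givens(L0)
```

Since the rotations are orthogonal, the column norms of L are preserved, which is a quick sanity check on any implementation.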

Alternatively, we can choose H in (3.2) to be a Gaussian elimination matrix of the form

    H = [ 1  -h^T ]
        [ 0   I   ].

Column pivoting can be used to enhance numerical stability. The cost for computing G H is about 2 m s flops (see Golub and Van Loan [5, Ch. 3]).

3.2. New Algorithms. Let ℓ = b_u + r_u + 1 and m = ℓ + b_l. We begin by writing A in the following form:

(3.3)    A = [ G  E ]
             [ C  F ],

where G ∈ R^{m×ℓ} is a dense matrix (its banded+semiseparable structure will be ignored); F ∈ R^{(n-r_u-1)×(n-ℓ)} is a banded+semiseparable rectangular matrix; and both C ∈ R^{(n-m)×ℓ} and E ∈ R^{(r_u+1)×(n-ℓ)} are low-rank matrices. We caution that, strictly speaking, equation (3.3) is not a block partitioning of A, since the row dimension of G is larger than that of E in general. In further detail, we write C = p_2 q_1^T, where p_2 ∈ R^{(n-m)×r_l} and q_1 ∈ R^{ℓ×r_l} contain the last n-m rows of p and the first ℓ rows of q, respectively. Similarly, E = u_1 v_2^T, where u_1 ∈ R^{(r_u+1)×r_u} and v_2 ∈ R^{(n-ℓ)×r_u} contain the first r_u+1 rows of u and the last n-ℓ rows of v, respectively. As suggested in §3.1, we assume that u is a lower triangular matrix.

As in §2, we will solve (1.1) by recursively solving the linear system

(3.4)    A x = b - [ 0   ] γ,
                   [ p_2 ]

where γ ∈ R^{r_l} is an auxiliary vector that will play the role of the scalar γ in the example of §2. As before, we will compute a two-sided decomposition (2.1) of A and invert the matrices W, L, and H on-line.

To start the recursion, we choose a matrix W so that W u_1 is a lower triangular matrix with zeros on its main diagonal. Compute

(3.5)    û_1 := W u_1,    Ĝ := [ W     ] G,    and    b̂ := [ W     ] ( b - [ 0   ] γ ).
                               [     I ]                   [     I ]       [ p_2 ]

Linear system (3.4) now becomes

    [ Ĝ          û_1 v_2^T ] x = b̂.
    [ p_2 q_1^T  F         ]

We further choose H to zero out the first row of Ĝ except the (1,1) entry. Compute

(3.6)    Ĝ H =: [ g̃_11  0 ]    and    H^T q_1 =: [ q̃_1^T ],
                [ g̃_1   G̃ ]                      [ q̃     ]

where g̃_11 is a scalar, g̃_1 is a column vector, q̃_1 ∈ R^{r_l} is the transformed first row of q_1, and G̃ and q̃ contain the remaining rows.

Linear system (3.4) now has the following form:

(3.7)    [ g̃_11      0        0         ]
         [ g̃_1       G̃        û_1 v_2^T ] x̃ = b̂,
         [ p_2 q̃_1   p_2 q̃^T  F         ]

where

    x̃ = [ H^{-1}     ] x
        [          I ]

and b̂ is as in (3.5). Now we can perform one step of forward substitution in (3.7) to obtain the first solution component ω_0 = b̂_1 / g̃_11, after which the remaining unknowns satisfy

(3.8)    [ G̃        û_1 v_2^T ] x̃' = b̃ - [ 0   ] ( γ + q̃_1 ω_0 ),
         [ p_2 q̃^T  F         ]           [ p_2 ]

where x̃' contains the last n-1 components of x̃, and b̃ is obtained from the last n-1 components of b̂ by subtracting ω_0 g̃_1 from the block corresponding to G̃. This is a system smaller in dimension than (3.4). To complete the recursion, in the following we rewrite it in the form of (3.4). Rewrite

    v_2 = [ ẽ_v^T ],    p_2 = [ ẽ_p^T ],    and    F = [ f̃_1  f̃_2 ],
          [ ṽ     ]           [ p̃     ]                 [ f̃_3  F̃   ]

where ẽ_v^T and ẽ_p^T are the first rows of v_2 and p_2, respectively; f̃_1 ∈ R^{m-r_u}; and f̃_2 and f̃_3 are blocks of appropriate dimensions. Similar to equation (3.3), the block form of F above is, strictly speaking, not a block partitioning of F, since the length of f̃_1 is larger than 1 in general. F̃ is itself a banded+semiseparable rectangular matrix. With this notation, we can now rewrite (3.8) in the form of (3.4) as

(3.9)    [ Ġ  Ė ] x̃' = ḃ - [ 0 ] γ̃,
         [ Ċ  F̃ ]           [ p̃ ]

where γ̃ = γ + q̃_1 ω_0 and

    Ġ = [ G̃          û_1 ẽ_v ],    Ċ = p̃ [ q̃^T  q̄ ],    Ė = [ û_1' ] ṽ^T,
        [ ẽ_p^T q̃^T  f̃_1    ]                               [ ū^T  ]

with û_1' denoting û_1 with its first (zero) row deleted, with q̄^T and ū^T being the (ℓ+1)-th and (r_u+2)-th rows of q and u, respectively, and with ḃ obtained from b̃ accordingly. Once again the block form of Ġ is not a block partitioning.

As in §2, we can perform the elimination and forward substitution steps of formulas (3.4) through (3.9) recursively, for some k steps, to obtain the solution components ω_0, ω_1, ..., ω_{k-1}. We stop the recursion when the problem size n-k in (3.9) becomes so small that n-k <= m, at which point we solve the system directly to get a solution vector ω̃.

To recover the solution to our original problem (1.1), let H_0, H_1, ..., H_{k-1} be the elimination matrices used at the second elimination step, defined by equations (3.6) and (3.7). We compute the solution to (1.1) as

(3.10)    x = diag(H_0, I) · diag(I, H_1, I) ··· diag(I, H_{k-1}, I) · [ ω_0, ..., ω_{k-1}, ω̃^T ]^T,

where the various identity matrices I are in general of different dimensions.

3.3. Efficiency and Numerical Stability Considerations. In this section we consider special choices of the matrices W and H in the recursion and how they affect the efficiency and numerical stability of the procedure. To make flop counting simpler, in this section we assume that r_l, r_u, b_l, b_u << n, even though our algorithms work for general banded+semiseparable matrices.

For complete backward stability, we can choose the W matrices in (3.5) to be the product of r_u Givens rotations, as suggested by Algorithm 3.1. The costs for computing û_1, Ĝ, and b̂ are about 3 r_u^2 flops, 6 r_u ℓ flops, and 6 r_u flops, respectively. Hence the total cost for one step of (3.5) is about 3 r_u (r_u + 2ℓ) flops. We then choose H in (3.6) as a Householder transformation. The costs for computing Ĝ H and H^T q_1 are about 4 m ℓ flops and 4 r_l ℓ flops, respectively. Hence the total cost for one step of (3.6) is about 4 (m + r_l) ℓ flops. In equation (3.8), the costs for computing b̃ and γ̃ are about 2m flops and 2 r_l flops, respectively, leading to a total of 2 (m + r_l) flops. In equation (3.9), the main cost is to explicitly form the last row and column of Ġ. The costs for computing û_1 ẽ_v and ẽ_p^T q̃^T are about 2 r_u^2 and 2 r_l ℓ flops, respectively. There is essentially no cost for f̃_1, which consists of the non-zero components of a column in the banded matrix D. Hence the total cost in (3.9) is about 2 (r_u^2 + r_l ℓ) flops. Since there are k ≈ n steps of recursion, the total cost for the procedure is about

(3.11)    ( 3 r_u (r_u + 2ℓ) + 4 (m + r_l) ℓ + 2 (r_u^2 + r_l ℓ) ) n ≈ ( 5 r_u^2 + 2 (2 b_u + 2 b_l + 3 r_l + 5 r_u)(b_u + r_u) ) n flops.

Additionally, there is a cost of about 6 n r_u^2 flops for the preprocessing step (3.1).
With such choices of W and H, we obtain a factorization (2.1) with orthogonal matrices W and H. Since only orthogonal transformations and one forward substitution are used for the solution of (1.1), this algorithm is backward stable.

To reduce the computational cost, we can also choose W via the banded Gaussian elimination procedure with row pivoting in Golub and Van Loan [5, Ch. 4], and we can choose H as a Gaussian elimination matrix, with column pivoting if necessary. This choice of W and H leads to a factorization (2.1) with upper triangular matrices W and H. It is quite interesting to note that factorizations of this form do not seem to have been discussed before in the literature. With this choice of W and H, the cost for one step of (3.5) is about r_u (r_u + 2ℓ) flops; the cost for one step of (3.6) is about 2 (m + r_l) ℓ flops; and the total cost in (3.9) is about 2 (r_u^2 + r_l ℓ) flops. With k ≈ n steps of recursion, the total cost for the procedure is about

(3.12)    ( r_u (r_u + 2ℓ) + 2 (m + r_l) ℓ + 2 (r_u^2 + r_l ℓ) ) n ≈ ( 3 r_u^2 + 2 (b_u + b_l + 2 r_l + 2 r_u)(b_u + r_u) ) n flops.

Additionally, there is a cost of about 6 n r_u^2 flops for the preprocessing step (3.1).

It is well known that Gaussian elimination with partial pivoting can occasionally become numerically unstable if the element growth is too large (see Golub and Van Loan [5, Ch. 3]). It is likely that, by using Gaussian elimination procedures in (3.5) and (3.6), the resulting factorization could become numerically unstable in pathological cases for large values of r_u and b_u.

Alternatively, we can choose only one of W and H to be orthogonal, leading to a factorization (2.1) with one of W and H orthogonal and the other upper triangular. Furthermore, the choices of W and H can change from one recursion step to another, leading to a factorization (2.1) with no obvious structures in W and H.

While our algorithms were presented in such a way that only one variable in (3.4) is eliminated in forward substitution at every recursion step, it is straightforward to reorganize the computation to develop a block version where a number of variables are eliminated together. Given the success of the linear algebra package LAPACK [1] in using block algorithms to speed up numerical computation, it seems clear that when the dimension becomes very large, the problem (1.1) can be solved more efficiently by block versions of our algorithms.

Finally, we note that the problem (1.1) can be rewritten in the following form:

(3.13)    B y = S b,    B = S A S,    and    x = S y,

where S is the matrix with 1's on the main anti-diagonal and zeros elsewhere. It is easy to verify that

    B = (S D S) + tril( (S u)(S v)^T, -b_u - 1 ) + triu( (S p)(S q)^T, b_l + 1 ).

It can also be verified that S D S is a banded matrix with b_u non-zero diagonals strictly below the main diagonal and b_l non-zero diagonals strictly above the main diagonal. Hence B is itself a banded+semiseparable matrix, with the roles of the upper and lower semiseparable parts of A interchanged.
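The reversal identity (3.13) is easy to check numerically; in the sketch below (our own example, not code from the paper), S @ A @ S simply reverses both the row and the column order of A, and solving the reversed system recovers the original solution after one more reversal:

```python
import numpy as np

n = 6
S = np.fliplr(np.eye(n))  # 1's on the main anti-diagonal, zeros elsewhere

rng = np.random.default_rng(7)
A = rng.standard_normal((n, n)) + n * np.eye(n)  # comfortably nonsingular
b = rng.standard_normal(n)

x_direct = np.linalg.solve(A, b)          # solve A x = b
y = np.linalg.solve(S @ A @ S, S @ b)     # solve B y = S b with B = S A S
x_reversed = S @ y                        # x = S y, per (3.13)
```

Since S S = I, the identity B (S x) = S A S S x = S A x = S b holds exactly; numerically the two solves agree to rounding error, so one is free to pick whichever of (1.1) and (3.13) has the smaller cost estimate.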
(For example, when n = 2, we have S = [ 0 1 ; 1 0 ].) Applying the two algorithms we just discussed to solve (3.13), we see that the total costs are

(3.14)    ( 5 r_l^2 + 2 (2 b_l + 2 b_u + 3 r_u + 5 r_l)(b_l + r_l) ) n flops

and

(3.15)    ( 3 r_l^2 + 2 (b_l + b_u + 2 r_u + 2 r_l)(b_l + r_l) ) n flops,

respectively. This suggests that one should choose among the two forms (1.1) and (3.13), according to formulas (3.11) through (3.15), to reduce the computational cost.

4. Numerical Experiments. In this section, we summarize the results from our numerical experiments with the algorithms presented in §3. These experiments were performed on an UltraSparc workstation in MATLAB with double precision (machine epsilon ≈ 10^{-16}). We tested two algorithms:

Algorithm I: only Gaussian elimination steps with partial pivoting are used in computing (2.1).

Algorithm II: only Givens rotations and Householder reflections are used in computing (2.1).

In all of the test matrices, we chose r_l and r_u proportional to n (with r_u = n/5), together with small fixed bandwidths b_u and b_l. The matrix entries were generated randomly.

In Table 3.1, we compare Algorithms I and II in terms of the number of flops required to solve (1.1). The column marked GEPP is the number of flops required for Gaussian elimination with partial pivoting to solve (1.1) by treating A as a dense matrix. We see that Algorithm I requires fewer flops than Algorithm II, and that both Algorithms I and II require significantly fewer flops to solve (1.1) than GEPP.

In Table 3.2, we compare Algorithms I and II in terms of execution times and backward errors. The execution times are in seconds, and the backward error is defined as

    || A x̂ - b ||_2 / ( ||A||_2 ||x̂||_2 ),

where x̂ is the computed solution to (1.1). This backward error is the smallest relative backward error in the 2-norm (see [5, Ch. 3]). Clearly, Algorithm I is faster than Algorithm II, as expected. Both are comparable in terms of backward errors. However, as we mentioned in §3, Algorithm I could be numerically unstable in pathological cases if either b_u or r_u is very large.

REFERENCES

[1] E. Anderson, Z. Bai, C. Bischof, J. Demmel, J. Dongarra, J. Du Croz, A. Greenbaum, S. Hammarling, A. McKenney, S. Ostrouchov, and D. Sorensen, LAPACK Users' Guide, SIAM, Philadelphia, PA, second ed., 1994.
[2] J. W. Demmel, Applied Numerical Linear Algebra, SIAM, Philadelphia, 1997.
[3] Y. Eidelman and I. Gohberg, Inversion formulas and linear complexity algorithm for diagonal plus semiseparable matrices, Computers and Mathematics with Applications, 33 (1997), no. 4, pp. 69-79.
[4] Y. Eidelman and I. Gohberg, A look-ahead block Schur algorithm for diagonal plus semiseparable matrices, Computers and Mathematics with Applications, 35 (1998), pp. 25-34.
[5] G. Golub and C. Van Loan, Matrix Computations, Johns Hopkins University Press, Baltimore, MD, 3rd ed., 1996.
[6] L. Greengard and V. Rokhlin, On the numerical solution of two-point boundary value problems, Comm. Pure Appl. Math., XLIV (1991), pp. 419-452.
[7] J.-Y. Lee and L. Greengard, A fast adaptive numerical method for stiff two-point boundary value problems, SIAM J. Sci. Comput., 18 (1997), pp. 403-429.
[8] Y. Saad, Iterative Methods for Sparse Linear Systems, PWS Publishing Company, 1996.
[9] H. P. Starr, Jr., On the Numerical Solution of One-Dimensional Integral and Differential Equations, PhD thesis, Department of Computer Science, Yale University, 1992.
[10] G. W. Stewart, Updating a rank-revealing ULV decomposition, SIAM J. Matrix Anal. Appl., 14 (1993), pp. 494-499.

Table 3.1. Numbers of Flops.
(Columns: n; Algorithm I; Algorithm II; GEPP.)

Table 3.2. Execution Times and Backward Errors.
(Columns: n; execution time in seconds for Algorithms I and II; backward error for Algorithms I and II.)
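The backward error reported in Table 3.2 can be computed directly; the following sketch (our own, with a dense `numpy.linalg.solve` standing in for the structured algorithms) evaluates it for a random system:

```python
import numpy as np

def backward_error(A, x_hat, b):
    """Smallest normwise relative backward error in the 2-norm,
    ||A x_hat - b||_2 / (||A||_2 ||x_hat||_2), as used in Table 3.2."""
    residual = np.linalg.norm(A @ x_hat - b)
    return residual / (np.linalg.norm(A, 2) * np.linalg.norm(x_hat))

rng = np.random.default_rng(11)
A = rng.standard_normal((40, 40))
b = rng.standard_normal(40)
x_hat = np.linalg.solve(A, b)   # dense stand-in for Algorithm I or II
err = backward_error(A, x_hat, b)
```

For a backward stable solver this quantity stays near machine precision regardless of the conditioning of A, which is why the paper reports it rather than the forward error.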


More information

ANONSINGULAR tridiagonal linear system of the form

ANONSINGULAR tridiagonal linear system of the form Generalized Diagonal Pivoting Methods for Tridiagonal Systems without Interchanges Jennifer B. Erway, Roummel F. Marcia, and Joseph A. Tyson Abstract It has been shown that a nonsingular symmetric tridiagonal

More information

A New Block Algorithm for Full-Rank Solution of the Sylvester-observer Equation.

A New Block Algorithm for Full-Rank Solution of the Sylvester-observer Equation. 1 A New Block Algorithm for Full-Rank Solution of the Sylvester-observer Equation João Carvalho, DMPA, Universidade Federal do RS, Brasil Karabi Datta, Dep MSc, Northern Illinois University, DeKalb, IL

More information

The restarted QR-algorithm for eigenvalue computation of structured matrices

The restarted QR-algorithm for eigenvalue computation of structured matrices Journal of Computational and Applied Mathematics 149 (2002) 415 422 www.elsevier.com/locate/cam The restarted QR-algorithm for eigenvalue computation of structured matrices Daniela Calvetti a; 1, Sun-Mi

More information

Linear Algebra Linear Algebra : Matrix decompositions Monday, February 11th Math 365 Week #4

Linear Algebra Linear Algebra : Matrix decompositions Monday, February 11th Math 365 Week #4 Linear Algebra Linear Algebra : Matrix decompositions Monday, February 11th Math Week # 1 Saturday, February 1, 1 Linear algebra Typical linear system of equations : x 1 x +x = x 1 +x +9x = 0 x 1 +x x

More information

WEIGHTED LEAST-SQUARES STATE ESTIMATION PART II: SIMULATION. with the normal equation method. problems. Algorithm 1 uses quasi-newton's method

WEIGHTED LEAST-SQUARES STATE ESTIMATION PART II: SIMULATION. with the normal equation method. problems. Algorithm 1 uses quasi-newton's method APPLICAION OF OPIMAL MULIPLIER MEHOD IN WEIGHED LEAS-SQUARES SAE ESIMAION PAR II: SIMULAION Jianping Meng Christopher L. DeMarco Department of Electrical and Computer Engineering University of Wisconsin-Madison

More information

Using Godunov s Two-Sided Sturm Sequences to Accurately Compute Singular Vectors of Bidiagonal Matrices.

Using Godunov s Two-Sided Sturm Sequences to Accurately Compute Singular Vectors of Bidiagonal Matrices. Using Godunov s Two-Sided Sturm Sequences to Accurately Compute Singular Vectors of Bidiagonal Matrices. A.M. Matsekh E.P. Shurina 1 Introduction We present a hybrid scheme for computing singular vectors

More information

Arnoldi Methods in SLEPc

Arnoldi Methods in SLEPc Scalable Library for Eigenvalue Problem Computations SLEPc Technical Report STR-4 Available at http://slepc.upv.es Arnoldi Methods in SLEPc V. Hernández J. E. Román A. Tomás V. Vidal Last update: October,

More information

Sparse BLAS-3 Reduction

Sparse BLAS-3 Reduction Sparse BLAS-3 Reduction to Banded Upper Triangular (Spar3Bnd) Gary Howell, HPC/OIT NC State University gary howell@ncsu.edu Sparse BLAS-3 Reduction p.1/27 Acknowledgements James Demmel, Gene Golub, Franc

More information

Review of matrices. Let m, n IN. A rectangle of numbers written like A =

Review of matrices. Let m, n IN. A rectangle of numbers written like A = Review of matrices Let m, n IN. A rectangle of numbers written like a 11 a 12... a 1n a 21 a 22... a 2n A =...... a m1 a m2... a mn where each a ij IR is called a matrix with m rows and n columns or an

More information

COMPUTING THE CONDITION NUMBER OF TRIDIAGONAL AND DIAGONAL-PLUS-SEMISEPARABLE MATRICES IN LINEAR TIME

COMPUTING THE CONDITION NUMBER OF TRIDIAGONAL AND DIAGONAL-PLUS-SEMISEPARABLE MATRICES IN LINEAR TIME COMPUTING THE CONDITION NUMBER OF TRIDIAGONAL AND DIAGONAL-PLUS-SEMISEPARABLE MATRICES IN LINEAR TIME GARETH I HARGREAVES Abstract For an n n tridiagonal matrix we exploit the structure of its QR factorization

More information

Scientific Computing

Scientific Computing Scientific Computing Direct solution methods Martin van Gijzen Delft University of Technology October 3, 2018 1 Program October 3 Matrix norms LU decomposition Basic algorithm Cost Stability Pivoting Pivoting

More information

Linear Equations and Matrix

Linear Equations and Matrix 1/60 Chia-Ping Chen Professor Department of Computer Science and Engineering National Sun Yat-sen University Linear Algebra Gaussian Elimination 2/60 Alpha Go Linear algebra begins with a system of linear

More information

Numerical Methods in Matrix Computations

Numerical Methods in Matrix Computations Ake Bjorck Numerical Methods in Matrix Computations Springer Contents 1 Direct Methods for Linear Systems 1 1.1 Elements of Matrix Theory 1 1.1.1 Matrix Algebra 2 1.1.2 Vector Spaces 6 1.1.3 Submatrices

More information

Module 6.6: nag nsym gen eig Nonsymmetric Generalized Eigenvalue Problems. Contents

Module 6.6: nag nsym gen eig Nonsymmetric Generalized Eigenvalue Problems. Contents Eigenvalue and Least-squares Problems Module Contents Module 6.6: nag nsym gen eig Nonsymmetric Generalized Eigenvalue Problems nag nsym gen eig provides procedures for solving nonsymmetric generalized

More information

7. LU factorization. factor-solve method. LU factorization. solving Ax = b with A nonsingular. the inverse of a nonsingular matrix

7. LU factorization. factor-solve method. LU factorization. solving Ax = b with A nonsingular. the inverse of a nonsingular matrix EE507 - Computational Techniques for EE 7. LU factorization Jitkomut Songsiri factor-solve method LU factorization solving Ax = b with A nonsingular the inverse of a nonsingular matrix LU factorization

More information

1 :: Mathematical notation

1 :: Mathematical notation 1 :: Mathematical notation x A means x is a member of the set A. A B means the set A is contained in the set B. {a 1,..., a n } means the set hose elements are a 1,..., a n. {x A : P } means the set of

More information

LU Factorization. Marco Chiarandini. DM559 Linear and Integer Programming. Department of Mathematics & Computer Science University of Southern Denmark

LU Factorization. Marco Chiarandini. DM559 Linear and Integer Programming. Department of Mathematics & Computer Science University of Southern Denmark DM559 Linear and Integer Programming LU Factorization Marco Chiarandini Department of Mathematics & Computer Science University of Southern Denmark [Based on slides by Lieven Vandenberghe, UCLA] Outline

More information

D. Gimenez, M. T. Camara, P. Montilla. Aptdo Murcia. Spain. ABSTRACT

D. Gimenez, M. T. Camara, P. Montilla. Aptdo Murcia. Spain.   ABSTRACT Accelerating the Convergence of Blocked Jacobi Methods 1 D. Gimenez, M. T. Camara, P. Montilla Departamento de Informatica y Sistemas. Univ de Murcia. Aptdo 401. 0001 Murcia. Spain. e-mail: fdomingo,cpmcm,cppmmg@dif.um.es

More information

UMIACS-TR July CS-TR 2494 Revised January An Updating Algorithm for. Subspace Tracking. G. W. Stewart. abstract

UMIACS-TR July CS-TR 2494 Revised January An Updating Algorithm for. Subspace Tracking. G. W. Stewart. abstract UMIACS-TR-9-86 July 199 CS-TR 2494 Revised January 1991 An Updating Algorithm for Subspace Tracking G. W. Stewart abstract In certain signal processing applications it is required to compute the null space

More information

Preconditioned Parallel Block Jacobi SVD Algorithm

Preconditioned Parallel Block Jacobi SVD Algorithm Parallel Numerics 5, 15-24 M. Vajteršic, R. Trobec, P. Zinterhof, A. Uhl (Eds.) Chapter 2: Matrix Algebra ISBN 961-633-67-8 Preconditioned Parallel Block Jacobi SVD Algorithm Gabriel Okša 1, Marián Vajteršic

More information

A Backward Stable Hyperbolic QR Factorization Method for Solving Indefinite Least Squares Problem

A Backward Stable Hyperbolic QR Factorization Method for Solving Indefinite Least Squares Problem A Backward Stable Hyperbolic QR Factorization Method for Solving Indefinite Least Suares Problem Hongguo Xu Dedicated to Professor Erxiong Jiang on the occasion of his 7th birthday. Abstract We present

More information

Notes on LU Factorization

Notes on LU Factorization Notes on LU Factorization Robert A van de Geijn Department of Computer Science The University of Texas Austin, TX 78712 rvdg@csutexasedu October 11, 2014 The LU factorization is also known as the LU decomposition

More information

Solving linear equations with Gaussian Elimination (I)

Solving linear equations with Gaussian Elimination (I) Term Projects Solving linear equations with Gaussian Elimination The QR Algorithm for Symmetric Eigenvalue Problem The QR Algorithm for The SVD Quasi-Newton Methods Solving linear equations with Gaussian

More information

A note on eigenvalue computation for a tridiagonal matrix with real eigenvalues Akiko Fukuda

A note on eigenvalue computation for a tridiagonal matrix with real eigenvalues Akiko Fukuda Journal of Math-for-Industry Vol 3 (20A-4) pp 47 52 A note on eigenvalue computation for a tridiagonal matrix with real eigenvalues Aio Fuuda Received on October 6 200 / Revised on February 7 20 Abstract

More information

A turbulence closure based on the maximum entropy method

A turbulence closure based on the maximum entropy method Advances in Fluid Mechanics IX 547 A turbulence closure based on the maximum entropy method R. W. Derksen Department of Mechanical and Manufacturing Engineering University of Manitoba Winnipeg Canada Abstract

More information

Bindel, Fall 2016 Matrix Computations (CS 6210) Notes for

Bindel, Fall 2016 Matrix Computations (CS 6210) Notes for 1 Logistics Notes for 2016-09-14 1. There was a goof in HW 2, problem 1 (now fixed) please re-download if you have already started looking at it. 2. CS colloquium (4:15 in Gates G01) this Thurs is Margaret

More information

Main matrix factorizations

Main matrix factorizations Main matrix factorizations A P L U P permutation matrix, L lower triangular, U upper triangular Key use: Solve square linear system Ax b. A Q R Q unitary, R upper triangular Key use: Solve square or overdetrmined

More information

Next topics: Solving systems of linear equations

Next topics: Solving systems of linear equations Next topics: Solving systems of linear equations 1 Gaussian elimination (today) 2 Gaussian elimination with partial pivoting (Week 9) 3 The method of LU-decomposition (Week 10) 4 Iterative techniques:

More information

Numerical linear algebra

Numerical linear algebra Numerical linear algebra Purdue University CS 51500 Fall 2017 David Gleich David F. Gleich Call me Prof Gleich Dr. Gleich Please not Hey matrix guy! Huda Nassar Call me Huda Ms. Huda Please not Matrix

More information

A fast randomized algorithm for orthogonal projection

A fast randomized algorithm for orthogonal projection A fast randomized algorithm for orthogonal projection Vladimir Rokhlin and Mark Tygert arxiv:0912.1135v2 [cs.na] 10 Dec 2009 December 10, 2009 Abstract We describe an algorithm that, given any full-rank

More information

Introduction to Mathematical Programming

Introduction to Mathematical Programming Introduction to Mathematical Programming Ming Zhong Lecture 6 September 12, 2018 Ming Zhong (JHU) AMS Fall 2018 1 / 20 Table of Contents 1 Ming Zhong (JHU) AMS Fall 2018 2 / 20 Solving Linear Systems A

More information

6 Linear Systems of Equations

6 Linear Systems of Equations 6 Linear Systems of Equations Read sections 2.1 2.3, 2.4.1 2.4.5, 2.4.7, 2.7 Review questions 2.1 2.37, 2.43 2.67 6.1 Introduction When numerically solving two-point boundary value problems, the differential

More information

Matrix Computations and Semiseparable Matrices

Matrix Computations and Semiseparable Matrices Matrix Computations and Semiseparable Matrices Volume I: Linear Systems Raf Vandebril Department of Computer Science Catholic University of Louvain Marc Van Barel Department of Computer Science Catholic

More information

MS&E 318 (CME 338) Large-Scale Numerical Optimization

MS&E 318 (CME 338) Large-Scale Numerical Optimization Stanford University, Management Science & Engineering (and ICME MS&E 38 (CME 338 Large-Scale Numerical Optimization Course description Instructor: Michael Saunders Spring 28 Notes : Review The course teaches

More information

ETNA Kent State University

ETNA Kent State University Electronic Transactions on Numerical Analysis. Volume 17, pp. 76-92, 2004. Copyright 2004,. ISSN 1068-9613. ETNA STRONG RANK REVEALING CHOLESKY FACTORIZATION M. GU AND L. MIRANIAN Ý Abstract. For any symmetric

More information

c 2007 Society for Industrial and Applied Mathematics

c 2007 Society for Industrial and Applied Mathematics SIAM J MATRIX ANAL APPL Vol 29, No, pp 27 266 c 2007 Society for Industrial and Applied Mathematics A SUPERFAST ALGORITM FOR TOEPLITZ SYSTEMS OF LINEAR EQUATIONS S CANDRASEKARAN, M GU, X SUN, J XIA, AND

More information

A Method for Constructing Diagonally Dominant Preconditioners based on Jacobi Rotations

A Method for Constructing Diagonally Dominant Preconditioners based on Jacobi Rotations A Method for Constructing Diagonally Dominant Preconditioners based on Jacobi Rotations Jin Yun Yuan Plamen Y. Yalamov Abstract A method is presented to make a given matrix strictly diagonally dominant

More information

GAUSSIAN ELIMINATION AND LU DECOMPOSITION (SUPPLEMENT FOR MA511)

GAUSSIAN ELIMINATION AND LU DECOMPOSITION (SUPPLEMENT FOR MA511) GAUSSIAN ELIMINATION AND LU DECOMPOSITION (SUPPLEMENT FOR MA511) D. ARAPURA Gaussian elimination is the go to method for all basic linear classes including this one. We go summarize the main ideas. 1.

More information

On aggressive early deflation in parallel variants of the QR algorithm

On aggressive early deflation in parallel variants of the QR algorithm On aggressive early deflation in parallel variants of the QR algorithm Bo Kågström 1, Daniel Kressner 2, and Meiyue Shao 1 1 Department of Computing Science and HPC2N Umeå University, S-901 87 Umeå, Sweden

More information

The antitriangular factorisation of saddle point matrices

The antitriangular factorisation of saddle point matrices The antitriangular factorisation of saddle point matrices J. Pestana and A. J. Wathen August 29, 2013 Abstract Mastronardi and Van Dooren [this journal, 34 (2013) pp. 173 196] recently introduced the block

More information

Eigenvalue Problems and Singular Value Decomposition

Eigenvalue Problems and Singular Value Decomposition Eigenvalue Problems and Singular Value Decomposition Sanzheng Qiao Department of Computing and Software McMaster University August, 2012 Outline 1 Eigenvalue Problems 2 Singular Value Decomposition 3 Software

More information

9. Numerical linear algebra background

9. Numerical linear algebra background Convex Optimization Boyd & Vandenberghe 9. Numerical linear algebra background matrix structure and algorithm complexity solving linear equations with factored matrices LU, Cholesky, LDL T factorization

More information

NAG Library Routine Document F08FAF (DSYEV)

NAG Library Routine Document F08FAF (DSYEV) NAG Library Routine Document (DSYEV) Note: before using this routine, please read the Users Note for your implementation to check the interpretation of bold italicised terms and other implementation-dependent

More information

Algorithms to solve block Toeplitz systems and. least-squares problems by transforming to Cauchy-like. matrices

Algorithms to solve block Toeplitz systems and. least-squares problems by transforming to Cauchy-like. matrices Algorithms to solve block Toeplitz systems and least-squares problems by transforming to Cauchy-like matrices K. Gallivan S. Thirumalai P. Van Dooren 1 Introduction Fast algorithms to factor Toeplitz matrices

More information

NAG Library Routine Document F07HAF (DPBSV)

NAG Library Routine Document F07HAF (DPBSV) NAG Library Routine Document (DPBSV) Note: before using this routine, please read the Users Note for your implementation to check the interpretation of bold italicised terms and other implementation-dependent

More information

Centro de Processamento de Dados, Universidade Federal do Rio Grande do Sul,

Centro de Processamento de Dados, Universidade Federal do Rio Grande do Sul, A COMPARISON OF ACCELERATION TECHNIQUES APPLIED TO THE METHOD RUDNEI DIAS DA CUNHA Computing Laboratory, University of Kent at Canterbury, U.K. Centro de Processamento de Dados, Universidade Federal do

More information

(17) (18)

(17) (18) Module 4 : Solving Linear Algebraic Equations Section 3 : Direct Solution Techniques 3 Direct Solution Techniques Methods for solving linear algebraic equations can be categorized as direct and iterative

More information

Henk van der Vorst. Abstract. We discuss a novel approach for the computation of a number of eigenvalues and eigenvectors

Henk van der Vorst. Abstract. We discuss a novel approach for the computation of a number of eigenvalues and eigenvectors Subspace Iteration for Eigenproblems Henk van der Vorst Abstract We discuss a novel approach for the computation of a number of eigenvalues and eigenvectors of the standard eigenproblem Ax = x. Our method

More information

I-v k e k. (I-e k h kt ) = Stability of Gauss-Huard Elimination for Solving Linear Systems. 1 x 1 x x x x

I-v k e k. (I-e k h kt ) = Stability of Gauss-Huard Elimination for Solving Linear Systems. 1 x 1 x x x x Technical Report CS-93-08 Department of Computer Systems Faculty of Mathematics and Computer Science University of Amsterdam Stability of Gauss-Huard Elimination for Solving Linear Systems T. J. Dekker

More information

Numerical Linear Algebra

Numerical Linear Algebra Numerical Linear Algebra Decompositions, numerical aspects Gerard Sleijpen and Martin van Gijzen September 27, 2017 1 Delft University of Technology Program Lecture 2 LU-decomposition Basic algorithm Cost

More information

Program Lecture 2. Numerical Linear Algebra. Gaussian elimination (2) Gaussian elimination. Decompositions, numerical aspects

Program Lecture 2. Numerical Linear Algebra. Gaussian elimination (2) Gaussian elimination. Decompositions, numerical aspects Numerical Linear Algebra Decompositions, numerical aspects Program Lecture 2 LU-decomposition Basic algorithm Cost Stability Pivoting Cholesky decomposition Sparse matrices and reorderings Gerard Sleijpen

More information

An exploration of matrix equilibration

An exploration of matrix equilibration An exploration of matrix equilibration Paul Liu Abstract We review three algorithms that scale the innity-norm of each row and column in a matrix to. The rst algorithm applies to unsymmetric matrices,

More information

Eigen-solving via reduction to DPR1 matrices

Eigen-solving via reduction to DPR1 matrices Computers and Mathematics with Applications 56 (2008) 166 171 www.elsevier.com/locate/camwa Eigen-solving via reduction to DPR1 matrices V.Y. Pan a,, B. Murphy a, R.E. Rosholt a, Y. Tang b, X. Wang c,

More information

Preliminary/Qualifying Exam in Numerical Analysis (Math 502a) Spring 2012

Preliminary/Qualifying Exam in Numerical Analysis (Math 502a) Spring 2012 Instructions Preliminary/Qualifying Exam in Numerical Analysis (Math 502a) Spring 2012 The exam consists of four problems, each having multiple parts. You should attempt to solve all four problems. 1.

More information

NAG Library Routine Document F08JDF (DSTEVR)

NAG Library Routine Document F08JDF (DSTEVR) F08 Least-squares and Eigenvalue Problems (LAPACK) NAG Library Routine Document (DSTEVR) Note: before using this routine, please read the Users Note for your implementation to check the interpretation

More information

Reduced Synchronization Overhead on. December 3, Abstract. The standard formulation of the conjugate gradient algorithm involves

Reduced Synchronization Overhead on. December 3, Abstract. The standard formulation of the conjugate gradient algorithm involves Lapack Working Note 56 Conjugate Gradient Algorithms with Reduced Synchronization Overhead on Distributed Memory Multiprocessors E. F. D'Azevedo y, V.L. Eijkhout z, C. H. Romine y December 3, 1999 Abstract

More information

RANA03-02 January Jacobi-Davidson methods and preconditioning with applications in pole-zero analysis

RANA03-02 January Jacobi-Davidson methods and preconditioning with applications in pole-zero analysis RANA03-02 January 2003 Jacobi-Davidson methods and preconditioning with applications in pole-zero analysis by J.Rommes, H.A. van der Vorst, EJ.W. ter Maten Reports on Applied and Numerical Analysis Department

More information

Iterative Methods. Splitting Methods

Iterative Methods. Splitting Methods Iterative Methods Splitting Methods 1 Direct Methods Solving Ax = b using direct methods. Gaussian elimination (using LU decomposition) Variants of LU, including Crout and Doolittle Other decomposition

More information

Enhancing Scalability of Sparse Direct Methods

Enhancing Scalability of Sparse Direct Methods Journal of Physics: Conference Series 78 (007) 0 doi:0.088/7-6596/78//0 Enhancing Scalability of Sparse Direct Methods X.S. Li, J. Demmel, L. Grigori, M. Gu, J. Xia 5, S. Jardin 6, C. Sovinec 7, L.-Q.

More information

LU Factorization with Panel Rank Revealing Pivoting and Its Communication Avoiding Version

LU Factorization with Panel Rank Revealing Pivoting and Its Communication Avoiding Version LU Factorization with Panel Rank Revealing Pivoting and Its Communication Avoiding Version Amal Khabou James Demmel Laura Grigori Ming Gu Electrical Engineering and Computer Sciences University of California

More information

On the approximation of real powers of sparse, infinite, bounded and Hermitian matrices

On the approximation of real powers of sparse, infinite, bounded and Hermitian matrices On the approximation of real poers of sparse, infinite, bounded and Hermitian matrices Roman Werpachoski Center for Theoretical Physics, Al. Lotnikó 32/46 02-668 Warszaa, Poland Abstract We describe a

More information

5.6. PSEUDOINVERSES 101. A H w.

5.6. PSEUDOINVERSES 101. A H w. 5.6. PSEUDOINVERSES 0 Corollary 5.6.4. If A is a matrix such that A H A is invertible, then the least-squares solution to Av = w is v = A H A ) A H w. The matrix A H A ) A H is the left inverse of A and

More information

NAG Toolbox for Matlab nag_lapack_dggev (f08wa)

NAG Toolbox for Matlab nag_lapack_dggev (f08wa) NAG Toolbox for Matlab nag_lapack_dggev () 1 Purpose nag_lapack_dggev () computes for a pair of n by n real nonsymmetric matrices ða; BÞ the generalized eigenvalues and, optionally, the left and/or right

More information

Normalized power iterations for the computation of SVD

Normalized power iterations for the computation of SVD Normalized power iterations for the computation of SVD Per-Gunnar Martinsson Department of Applied Mathematics University of Colorado Boulder, Co. Per-gunnar.Martinsson@Colorado.edu Arthur Szlam Courant

More information

Lecture # 11 The Power Method for Eigenvalues Part II. The power method find the largest (in magnitude) eigenvalue of. A R n n.

Lecture # 11 The Power Method for Eigenvalues Part II. The power method find the largest (in magnitude) eigenvalue of. A R n n. Lecture # 11 The Power Method for Eigenvalues Part II The power method find the largest (in magnitude) eigenvalue of It makes two assumptions. 1. A is diagonalizable. That is, A R n n. A = XΛX 1 for some

More information

Review Questions REVIEW QUESTIONS 71

Review Questions REVIEW QUESTIONS 71 REVIEW QUESTIONS 71 MATLAB, is [42]. For a comprehensive treatment of error analysis and perturbation theory for linear systems and many other problems in linear algebra, see [126, 241]. An overview of

More information

LU Factorization. LU factorization is the most common way of solving linear systems! Ax = b LUx = b

LU Factorization. LU factorization is the most common way of solving linear systems! Ax = b LUx = b AM 205: lecture 7 Last time: LU factorization Today s lecture: Cholesky factorization, timing, QR factorization Reminder: assignment 1 due at 5 PM on Friday September 22 LU Factorization LU factorization

More information

c 2013 Society for Industrial and Applied Mathematics

c 2013 Society for Industrial and Applied Mathematics SIAM J. MATRIX ANAL. APPL. Vol. 34, No. 3, pp. 1401 1429 c 2013 Society for Industrial and Applied Mathematics LU FACTORIZATION WITH PANEL RANK REVEALING PIVOTING AND ITS COMMUNICATION AVOIDING VERSION

More information

Computational Methods. Eigenvalues and Singular Values

Computational Methods. Eigenvalues and Singular Values Computational Methods Eigenvalues and Singular Values Manfred Huber 2010 1 Eigenvalues and Singular Values Eigenvalues and singular values describe important aspects of transformations and of data relations

More information

A fast randomized algorithm for overdetermined linear least-squares regression

A fast randomized algorithm for overdetermined linear least-squares regression A fast randomized algorithm for overdetermined linear least-squares regression Vladimir Rokhlin and Mark Tygert Technical Report YALEU/DCS/TR-1403 April 28, 2008 Abstract We introduce a randomized algorithm

More information

Some Classes of Invertible Matrices in GF(2)

Some Classes of Invertible Matrices in GF(2) Some Classes of Invertible Matrices in GF() James S. Plank Adam L. Buchsbaum Technical Report UT-CS-07-599 Department of Electrical Engineering and Computer Science University of Tennessee August 16, 007

More information

BLAS: Basic Linear Algebra Subroutines Analysis of the Matrix-Vector-Product Analysis of Matrix-Matrix Product

BLAS: Basic Linear Algebra Subroutines Analysis of the Matrix-Vector-Product Analysis of Matrix-Matrix Product Level-1 BLAS: SAXPY BLAS-Notation: S single precision (D for double, C for complex) A α scalar X vector P plus operation Y vector SAXPY: y = αx + y Vectorization of SAXPY (αx + y) by pipelining: page 8

More information