problem Au = u by constructing an orthonormal basis V k = [v 1 ; : : : ; v k ], at each k th iteration step, and then nding an approximation for the e

Similar documents
Arnoldi Methods in SLEPc

Exponentials of Symmetric Matrices through Tridiagonal Reductions

The Godunov Inverse Iteration: A Fast and Accurate Solution to the Symmetric Tridiagonal Eigenvalue Problem

Universiteit-Utrecht. Department. of Mathematics. Jacobi-Davidson algorithms for various. eigenproblems. - A working document -

D. Gimenez, M. T. Camara, P. Montilla. Aptdo Murcia. Spain. ABSTRACT

Iterative methods for symmetric eigenvalue problems

Reduced Synchronization Overhead on. December 3, Abstract. The standard formulation of the conjugate gradient algorithm involves

A JACOBI-DAVIDSON ITERATION METHOD FOR LINEAR EIGENVALUE PROBLEMS. GERARD L.G. SLEIJPEN y AND HENK A. VAN DER VORST y

Computing least squares condition numbers on hybrid multicore/gpu systems

ARPACK. Dick Kachuma & Alex Prideaux. November 3, Oxford University Computing Laboratory

EIGIFP: A MATLAB Program for Solving Large Symmetric Generalized Eigenvalue Problems

Density-Matrix-Based Algorithms for Solving Eingenvalue Problems

OUTLINE 1. Introduction 1.1 Notation 1.2 Special matrices 2. Gaussian Elimination 2.1 Vector and matrix norms 2.2 Finite precision arithmetic 2.3 Fact

Using Godunov s Two-Sided Sturm Sequences to Accurately Compute Singular Vectors of Bidiagonal Matrices.

LAPACK-Style Codes for Pivoted Cholesky and QR Updating

Lecture 8: Fast Linear Solvers (Part 7)

Domain decomposition on different levels of the Jacobi-Davidson method

Contents. Preface... xi. Introduction...

LAPACK-Style Codes for Pivoted Cholesky and QR Updating. Hammarling, Sven and Higham, Nicholas J. and Lucas, Craig. MIMS EPrint: 2006.

A communication-avoiding thick-restart Lanczos method on a distributed-memory system

Eigenvalue Problems CHAPTER 1 : PRELIMINARIES

A HARMONIC RESTARTED ARNOLDI ALGORITHM FOR CALCULATING EIGENVALUES AND DETERMINING MULTIPLICITY

eciency. In many instances, the cost of computing a few matrix-vector products

Deflation for inversion with multiple right-hand sides in QCD

The parallel computation of the smallest eigenpair of an. acoustic problem with damping. Martin B. van Gijzen and Femke A. Raeven.

Solving Ax = b, an overview. Program

Preconditioned inverse iteration and shift-invert Arnoldi method

A Parallel Implementation of the Trace Minimization Eigensolver

A Parallel Implementation of the Davidson Method for Generalized Eigenproblems

A Domain Decomposition Based Jacobi-Davidson Algorithm for Quantum Dot Simulation

Scientific Computing with Case Studies SIAM Press, Lecture Notes for Unit VII Sparse Matrix

Applied Mathematics 205. Unit V: Eigenvalue Problems. Lecturer: Dr. David Knezevic

An Asynchronous Algorithm on NetSolve Global Computing System

Department of. Computer Science. Functional Implementations of. Eigensolver. December 15, Colorado State University

The Conjugate Gradient Method

4.8 Arnoldi Iteration, Krylov Subspaces and GMRES

SOLVING SPARSE LINEAR SYSTEMS OF EQUATIONS. Chao Yang Computational Research Division Lawrence Berkeley National Laboratory Berkeley, CA, USA

Index. for generalized eigenvalue problem, butterfly form, 211

FEM and sparse linear system solving

APPLIED NUMERICAL LINEAR ALGEBRA

Preconditioned Eigenvalue Solvers for electronic structure calculations. Andrew V. Knyazev. Householder Symposium XVI May 26, 2005

ITERATIVE METHODS BASED ON KRYLOV SUBSPACES

Block Krylov-Schur Method for Large Symmetric Eigenvalue Problems

Krylov subspace projection methods

The Algebraic Multigrid Projection for Eigenvalue Problems; Backrotations and Multigrid Fixed Points. Sorin Costiner and Shlomo Ta'asan

LINEAR ALGEBRA 1, 2012-I PARTIAL EXAM 3 SOLUTIONS TO PRACTICE PROBLEMS

The Algorithm of Multiple Relatively Robust Representations for Multi-Core Processors

Sparse BLAS-3 Reduction

Alternative correction equations in the Jacobi-Davidson method

DELFT UNIVERSITY OF TECHNOLOGY

BLZPACK: Description and User's Guide. Osni A. Marques. CERFACS Report TR/PA/95/30

Performance Evaluation of Some Inverse Iteration Algorithms on PowerXCell T M 8i Processor

Preface to the Second Edition. Preface to the First Edition

Domain Decomposition-based contour integration eigenvalue solvers

Henk van der Vorst. Abstract. We discuss a novel approach for the computation of a number of eigenvalues and eigenvectors

Solving Large Nonlinear Sparse Systems

Last Time. Social Network Graphs Betweenness. Graph Laplacian. Girvan-Newman Algorithm. Spectral Bisection

Implementation of a preconditioned eigensolver using Hypre

Algorithm for Sparse Approximate Inverse Preconditioners in the Conjugate Gradient Method

Largest Bratu solution, lambda=4

The quadratic eigenvalue problem (QEP) is to find scalars λ and nonzero vectors u satisfying

Alternative correction equations in the Jacobi-Davidson method. Mathematical Institute. Menno Genseberger and Gerard L. G.

Krylov Subspaces. Lab 1. The Arnoldi Iteration

A Refined Jacobi-Davidson Method and Its Correction Equation

Positive Denite Matrix. Ya Yan Lu 1. Department of Mathematics. City University of Hong Kong. Kowloon, Hong Kong. Abstract

Preconditioned Conjugate Gradient-Like Methods for. Nonsymmetric Linear Systems 1. Ulrike Meier Yang 2. July 19, 1994

Computation of Smallest Eigenvalues using Spectral Schur Complements

On Orthogonal Block Elimination. Christian Bischof and Xiaobai Sun. Argonne, IL Argonne Preprint MCS-P

ECS231 Handout Subspace projection methods for Solving Large-Scale Eigenvalue Problems. Part I: Review of basic theory of eigenvalue problems

A CHEBYSHEV-DAVIDSON ALGORITHM FOR LARGE SYMMETRIC EIGENPROBLEMS

MULTIGRID ARNOLDI FOR EIGENVALUES

Krylov Space Methods. Nonstationary sounds good. Radu Trîmbiţaş ( Babeş-Bolyai University) Krylov Space Methods 1 / 17

Porting a Sphere Optimization Program from lapack to scalapack

IN THE international academic circles MATLAB is accepted

Institute for Advanced Computer Studies. Department of Computer Science. Iterative methods for solving Ax = b. GMRES/FOM versus QMR/BiCG

ETNA Kent State University

Solving the Inverse Toeplitz Eigenproblem Using ScaLAPACK and MPI *

A hybrid Hermitian general eigenvalue solver

ADDITIVE SCHWARZ FOR SCHUR COMPLEMENT 305 the parallel implementation of both preconditioners on distributed memory platforms, and compare their perfo

Numerical Methods in Matrix Computations

Algorithm 853: an Efficient Algorithm for Solving Rank-Deficient Least Squares Problems

Institute for Advanced Computer Studies. Department of Computer Science. Two Algorithms for the The Ecient Computation of

M.A. Botchev. September 5, 2014

Key words. conjugate gradients, normwise backward error, incremental norm estimation.

Preconditioned Eigensolver LOBPCG in hypre and PETSc

Application of Lanczos and Schur vectors in structural dynamics

On the Modification of an Eigenvalue Problem that Preserves an Eigenspace

Presentation of XLIFE++

Matrix Eigensystem Tutorial For Parallel Computation

Centro de Processamento de Dados, Universidade Federal do Rio Grande do Sul,

A Parallel Implementation of the. Yuan-Jye Jason Wu y. September 2, Abstract. The GTH algorithm is a very accurate direct method for nding

Computation of eigenvalues and singular values Recall that your solutions to these questions will not be collected or evaluated.

MATH 304 Linear Algebra Lecture 20: The Gram-Schmidt process (continued). Eigenvalues and eigenvectors.

An Arnoldi Method for Nonlinear Symmetric Eigenvalue Problems

Computation of the singular subspace associated with the smallest singular values of large matrices

Laboratoire d'informatique Fondamentale de Lille

The rate of convergence of the GMRES method

Matrix Algorithms. Volume II: Eigensystems. G. W. Stewart H1HJ1L. University of Maryland College Park, Maryland

Finite-choice algorithm optimization in Conjugate Gradients

Deflation Strategies to Improve the Convergence of Communication-Avoiding GMRES

Transcription:

A Parallel Solver for Extreme Eigenpairs 1 Leonardo Borges and Suely Oliveira 2 Computer Science Department, Texas A&M University, College Station, TX 77843-3112, USA. Abstract. In this paper a parallel algorithm for nding a group of extreme eigenvalues is presented. The algorithm is based on the well known Davidson method for nding one eigenvalue of a matrix. Here we incorporate knowledge about the structure of the subspace through the use of an arrowhead solver which allows more parallelization in both the original Davidson and our new version. In our numerical results various preconditioners (diagonal, multigrid and ADI) are compared. The performance results presented are for the Paragon but our implementation is portable to machines which provide MPI and BLAS. 1 Introduction A large number of scientic applications rely on the computation of a few eigenvalues for a given matrix A. Typically they require the lowest or highest eigenvalues. Our algorithm (DSE) is based on the Davidson algorithm, but calculates various eigenvalues through implicit shifting. DSE was rst presented in [10] under the name RDME to express its ability to identify eigenvalues with multiplicity bigger than one. The choice of preconditioner is an important issue in eliminating convergence to the wrong eigenvalue [14] In the next section, we describe the Davidson algorithm and our version for computing several eigenvalues. In [9] Oliveira presented convergence rates for Davidson type algorithm dependent on the type of preconditioner. These results are summarized here in Section 3. Section 4 addresses parallelization strategies discussing the data distribution in a MIMD architecture, and a fast solver for the projected subspace eigenproblem. In Section 5 we present numerical and performance results for the parallel implementation on the Paragon. Further results about the parallel algorithm and other numerical results are presented in [2]. 2 The Davidson Algorithm Two of the most popular iterative methods for large symmetric eigenvalue problems are Lanczos and Davidson algorithms. Both methods solve the eigenvalue 1 This research is supported by NSF grant ASC 9528912 and a Texas A&M University Interdisciplinary Research Initiative Award. 2 Department of Computer Science, Texas A&M University, College Station, TX 77843. email: suely@cs.tamu.edu.

problem Au = u by constructing an orthonormal basis V k = [v 1 ; : : : ; v k ], at each k th iteration step, and then nding an approximation for the eigenvector u of A by using a vector u k from the subspace spanned by V k. Specically, the original problem is projected onto the subspace which reduces the problem to a smaller eigenproblem S k y = y ~ k, where S k = Vk T AV k. Then the eigenpair ( ~ k ; y k ) can be obtained by applying a ecient procedure for small matrices. To complete the iteration, the eigenvector y k is mapped back as u k = V k y k, which is an approximation to the eigenvector u of the original problem. The dierence between the two algorithms consists on the way that basis V k is built. The attractiveness of the Lanczos algorithm results from the fact that each projected matrix S k is tridiagonal. Unfortunately, sometimes this method may require a large number of iterations. The Davidson algorithm denes a dense matrix S k on the subspace, but since we can incorporate a preconditioner in this algorithm the number of iterations can be much lower than for Lanczos. In Davidson type algorithms, a preconditioner M k is applied to the current residual, r k = Au k? ~ k u k, and the preconditioned residual t k = M k r k is orthonormalized against the previous columns of V k = [v 1 ; v 2 ; : : : ; v k ]. Although in the original formulation M is the diagonal matrix (diag(a)? I)?1 [6], the Generalized Davidson (GD) algorithm allows the incorporation of dierent operators for M. The DSE algorithm can be summarized as follows. Algorithm 1 { Restarted Davidson for Several Eigenvalues Given a matrix A, a normalized vector v 1, number of eigenpairs p, restart index q, and the minimal dimension m for the projected matrix S (m > p), compute approximations and u for the p smallest eigenpairs of A. 1. Set V 1 [v 1 ]. (initial guess) 2. For j = 1; ::: ; p (approximation for j-th eigenpair) While k = 1 or kr k?1 k < do (a) Project S k = Vk T k. (b) If (m + q) dim S (restart S k ) Reduce S k ( k ) (mm) to its m smaller eigenvectors, and update V k for the new basis. (c) Compute the j th smallest eigenpair k, y k of S k. (d) Compute the Ritz vector u k V k y k. (e) Check convergence for r k Au k? k u k. (f) Apply preconditioner t k Mr k. (g) Expand basis V k+1 [V k ; t k ] using modied Gram Schmidt (MGS). End while 3. End For. The core ideas of DSE (Algorithm 1) are based on the projection of A into the subspace spanned by the columns of V k. The interation number k is not necessarily equal to dim S k, since we have incorporated implicit restarts. The matrix S k is obtained by adding one more column and row Vk T Av k to matrix S k?1 (step 2.a). Other important aspects of the DSE algorithm are:

(1) the eigenvalue solver for the subspace matrix S k (step 2.c); (2) the use of an auxiliary matrix W k = [w 1 ; : : : ; w k ] to provide a residual calculation r k = Au k? k u k = w k y k? k u k with less computational work (step 2.e); the choice of a preconditioner M (step 2.f); and the use of modied Gram-Schmidt orthonormalization (step 2.g) which preserves numerical stability when updating the orthonormal basis V k+1. At each iteration, the algorithm expands the matrix S either until all the rst p eigenvalues have been converged, or S reaches a maximum dimension m + q; In the latter case, restarting is applied by using the orthonormal decomposition S k = Yk T ky k of S. It corresponds to step 2.b in the algorithm. Because of our choice for m, note that in step 2.c dim S will be always bigger or equal to j. 3 Convergence Rate A proof of convergence (but without a rate estimate) for the Davidson algorithm is given in Crouzeix, Philippe and Sadkane [5]. A bound on the convergence rate was rst presented in [10]. The complete proof is shown in Oliveira [9]. Let A be the given matrix whose eigenvalues and eigenvectors are wanted. The preconditioner M is given for one step, and Davidson's algorithm is used with u k being the current computed approximate eigenvector. The current eigenvalue estimate is the Rayleigh quotient b k = A (u k ) = (u T k Au k)=(u T k u k). Let the exact eigenvector with the smallest eigenvalue of A be u, and Au = u: (If is a repeated eigenvalue of A, then we can let u be the normalized projection of u k onto this eigenspace.) Theorem 1. Let P be the orthogonal projection onto ker(a?i)?. Suppose that A and M are symmetric positive denite. If kp? P MP (A? I)k 2 < 1; then for almost any starting value x 1, the convergence of the eigenvalue estimates b k converge to ultimately geometrically with convergence factor bounded by 2, and the angle between the computed eigenvector and the exact eigenspace goes to zero ultimately geometrically with convergence factor bounded by. A geometric convergence rate can be found for DSE (which obtains eigenvalues beyond the smallest (or largest) eigenvalue) by modifying Theorem 1. In the following theorem assume that 0 = kp 0? P 0 MP 0 P 0 (A? p I)P 0 k 2 where P 0 is the orthogonal projection onto the orthogonal complement of the span of the rst p? 1 eigenvectors. Then we can shown, in a similar way to Theorem 1 that the convergence factor for the new algorithm is bounded by ( 0 ) 2 To prove Theorem 2 we use the fact that P 0 s k = s k, as s k is orthogonal to the bottom p eigenvectors. and that although (A? p I) is no longer positive semi-denite, P 0 (A? p I)P 0 is.

Theorem 2. Suppose that A and M are symmetric positive denite and that the rst p? 1 eigenvectors have been found exactly. Let P 0 be the orthogonal projection onto the orthogonal complement of the span of the rst p? 1 eigenvectors of A. If kp 0? P 0 MP 0 (A? p I)P 0 k 2 0 < 1; then for almost any starting value x 1, the eigenvalue estimates b k obtained by our modied Davidson algorithm for several eigenvalues converges to p ultimately geometrically with convergence factor is bounded by ( 0 ) 2, and the angle between the exact and computed eigenvector goes to zero ultimately geometrically with convergence factor bounded by 0. 4 Parallel Implementation Previous implementations for the Davidson algorithm solve the eigenvalue problem in subspace S by using algorithms for dense matrices: early works [3, 4, 17] adopt EISPACK [12] routines, and later implementations [13, 15] use LAPACK [1] or reductions to tridiagonal form. Partial parallelization is obtained through the matrix-vector operations and sparse format storage for matrix A [13, 15]. Here we explore the relationship between two successive matrices S k which allows us to represent S k through an arrowhead matrix. The arrowhead structure is extremely sparse and the associated eigenvalue problem can be solved by a highly parallelizable method. 4.1 Data Distribution Data partitioning signicantly aects the performance of a parallel system by determining the actual degree of concurrency of the processors. Matrices are partitioned along distinct processors so that the program exploits all the best possible data parallelism: The nal distribution is well balanced, and most of the computational work can be performed without communication. These two conditions make the parallel program very suited for distributed memory architectures. Both computational workload and storage requirements are the same for all processors. Communication overhead is kept as low as possible. Matrix A is split into row blocks A i, i = 1; : : : ; N, each one containing dn=ne rows of A. Thus processor i, i = 1; : : : ; N stores A i, the i th row block of A. Matrices V k and W k are stored in the same fashion. This data distribution allow us to perform many of the matrix-vector computations in place. The orthonormalization strategy is also an important aspect in parallel environments. Recall that the modied Gram Schmidt (MGS) algorithm will be applied to the extended matrix [V k ; t k ] where the current basis V k has been previously orthonormalized. This observation reduces the computational work by eliminating the outer loop from the two nested loops in the full MGS algorithm.

4.2 The Arrowhead Relationship Between Matrices S k As pointed in [2, 10], the relationship between S k and S k?1 can be used to show that S k is explicitly similar to an arrowhead matrix Sk ~ of the form k?1 ~s ~S k = k ~s T ; (1) k s kk where ~s k = Y T V T w k?1 k?1 k, s kk = vk T w k, and the diagonal matrix k?1 corresponds to the orthonormal decomposition S k?1 = Y k?1 k?1 Y T k?1. In practice, the matrix S k does not need to be stored: only a vector for k and a matrix for Y k are required from one iteration to the next. Thus, given the eigenvalues k?1 and eigenvectors Y k?1 of S k?1, matrix Sk ~ can be used to nd the eigenvalues k of S k. Arrowhead eigensolvers [8, 11] are highly parallelizable and typically perform O(k 2 ) operations, instead of the usual O(k 3 ) eort of algorithms for dense matrices S. 5 Numerical Results In our numerical results we employ three kind of preconditiners: diagonal preconditioner (as in the original Davidson), multigrid and ADI. A preconditioner can be expressed as the matrix which solves Ax = b by applying an iterative method to MAx = Mb instead. In the case of a Diagonal preconditioner this would correspond to scaling the system and then solving. Multigrid and ADI preconditioners are more complex and for that we refer the reader to [16, 18, 19]. In our implementation level 1, 2 and 3 BLAS and the Message Passing Interface (MPI) library were used for easy portability. The computational results in this section were obtained with a nite dierence approximation for?u + gu = f (2) on a unit square domain. Here g is null inside a 0:2 0:2 square on the center of the rectangle and g = 100 for the remaining domain. To compare the performance delivered by distinct preconditioners we observe the total timing and number of iterations required for the sequential DSE for nding the ten smallest eigenpairs (p = 10) assuming convergence for residual norms less or equal to 10?7. The restart indexes were q = 10 and m = 15. This corresponds to apply restarting every time that the projected matrix S k achieves order 25, reducing its order to 15. Table 1 presents the actual running timings In a single processor of the Intel Paragon, running three grid sizes: 3131, 6363, and 127127 (matrices of orders 961, 3969 and 16129, respectively.). It reects the tradeo between preconditioning strategies: although the diagonal preconditioner (DIAG) is the easiest and fastest to compute, it requires an increasing number of iterations for larger matrices. Multigrid preconditioners (MG) are more expensive than DIAG, but they turn to be more eective for larger matrices. Finally, the ADI method aggregate the advantages of the previous preconditioners in the sense that it is more eective and less expensive than

MG. More details about the preconditioners used here can be found in [2] and its references. Table 1. Sequential times and number of iterations for three preconditioners. matrix order 961 matrix order 3969 matrix order 16129 iterations time (sec) iterations time (sec) iterations time (sec) ADI 29 4.7 26 16.2 27 80.7 MG 34 8.8 40 43.0 40 254.1 DIAG 174 8.6 319 43.2 700 386.0 The overall behavior of the DSE algorithm (with a multigrid preconditioner) is shown in Figure 1 for matrices sizes 3969 and 16129, as a function of the number of processors. Note that the estimated optimal number of processors is not far from the actual optimal. The model for our estimates is presented in [2]. 22 Actual times compared with theoretical estimates 20 18 order 16129 order 3969 estimated optimal 16 time (sec) 14 12 10 8 6 0 5 10 15 20 25 number of processors Fig. 1. Actual and estimated times for equation (2). Results for two dierent matrix sizes performance on the Paragon are shown. To conclude, we compare the performance of the parallel DSE with PAR- PACK [7], a parallel implementation of ARPACK 1. Figure 2 presents the total running times for both algorithms for the problem described above. For these runs, DSE used our parallel implementation of the ADI as its preconditioner. 1 ARPACK implements an Implicitly Restarted Arnoldi Method (IRAM) which in the symmetric case corresponds to the Implicitly Restarted Lanczos algorithm. We used the regular mode when running PARPACK.

The problem was solved by using 4, 8, and 16 processors to obtain relative residuals kau?uk=kuk of order less than 10?5. We show our theoretical analysis for the parallel algorithm in [2]. Other numerical results for the sequential DSE algorithm, including examples showing the behavior of the algorithm for eigenvalues with multiplicity greater than one, were presented in [10]. DSE (ADI) and PARPACK for three distinct problem sizes 60 50 DSE (ADI) PARPACK 4 processors 8 processors 16 processors time (sec) 40 30 20 10 10 3 10 4 log: problem size Fig. 2. Running times for DSE and PARPACK using 4, 8, and 16 processors on the Paragon. References 1. E. Anderson, Z. Bai, C. Bischof, J. Demmel, J. Dongarra, J. Croz, A. Greenbaum, S. Hammarling, A. McKenney, S. Ostrouchov, and D. Sorensen. LAPACK User's Guide. SIAM, Philadelphia, 1992. 2. L. Borges and S. Oliveira. A parallel Davidson-type algorithm for several eigenvalues. Journal of Computational Physics. To appear. 3. G. Cisneros, M. Berrondo, and C. F. Brunge. DVDSON: A subroutine to evaluate selected sets of eigenvalues and eigenvectors of large symmetric matrices. Compu. Chem., 10:281{291, 1986. 4. G. Cisneros and C. F. Brunge. An improved computer program for eigenvector and eigenvalues of large conguration iteraction matrices using the algorithm of Davidson. Compu. Chem., 8:157{160, 1984. 5. M. Crouzeix, B. Philippe, and M. Sadkane. The Davidson method. SIAM J. Sci. Comput., 15(1):62{76, 1994. 6. E. R. Davidson. The iterative calculation of a few of the lowest eigenvalues and corresponding eigenvectors of large real-symmetric matrices. J. Comp. Phys., 17:87{ 94, 1975.

7. K. Maschho and D. Sorensen. A portable implementation of ARPACK for distributed memory parallel architectures. In Proceedings of Copper Mountain Conference on Iterative Methods, April 9{13 1996. 8. D. P. O'Leary and G. W. Stewart. Computing the eigenvalues and eigenvectors of symmetric arrowhead matrices. J. Comp. Phys., 90:497{505, 1990. 9. S. Oliveira. On the convergence rate of a preconditioned algorithm for eigenvalue problems. Submitted. 10. S. Oliveira. A convergence proof of an iterative subspace method for eigenvalues problem. In F. Cucker and M. Shub, editors, Foundations of Computational Mathematics Selected Papers, pages 316{325. Springer, January 1997. (selected). 11. S. Oliveira. A new parallel chasing algorithm for transforming arrowhead matrices to tridiagonal form. Mathematics of Computation, 67(221):221{235, January 1998. 12. B. T. Smith, J. M. Boyle, J. J. Dongarra, B. S. Garbow, Y. Ikebe, V. C. Klema, and C. B. Moler. Matrix eigensystem routines: EISPACK guide. Number 6 in Lecture Notes Comput. Sci. Springer{Verlag, Berlin, Heidelberg, New York, second edition, 1976. 13. A. Stathopoulos and C. F. Fischer. A Davidson program for nding a few selected extreme eigeinpairs of a large, sparse, real, symmetric matrix. Comp. Phys. Comm., 79:268{290, 1994. 14. A. Stathopoulos, Y. Saad, and C. F. Fisher. Robust preconditioning of large, sparse, symmetric eigenvalue problems. J. Comp. and Appl. Mathematics, 64:197{ 215, 1995. 15. V. M. Umar and C. F. Fischer. Multitasking the Davidson algorithm for the large, sparse eigenvalue problem. Int. J. Supercomput. Appl., 3:28{53, 1989. 16. R. S. Varga. Matrix Iterative Analysis. Prentice-Hall, 1963. 17. J. Weber, R. Lacroix, and G. Wanner. The eigenvalue problem in conguration iteration calculations: A computer program based on a new derivation of the algorithm of Davidson. Compu. Chem., 4:55{60, 1980. 18. J. R. Westlake. A handbook of numerical matrix inversion and solution of linear equations. Wiley, 1968. 19. D. M. Young. Iterative Solution of Large Linear Systems. Academic Press, 1971. This article was processed using the LATEX macro package with LLNCS style