Parallel Solution of Sparse Linear Systems


Murat Manguoglu

Abstract Many simulations in science and engineering give rise to sparse linear systems of equations. It is a well known fact that the cost of the simulation process is almost always governed by the solution of the linear systems, especially for large-scale problems. The emergence of extreme-scale parallel platforms, along with the increasing number of processing cores available on a single chip, poses significant challenges for algorithm development. Machines with tens of thousands of multicore processors place tremendous constraints on the communication as well as memory access requirements of algorithms. The increase in the number of cores in a processing unit without an increase in memory bandwidth aggravates an already significant memory bottleneck. Sparse linear algebra kernels are well known for their poor processor utilization. This is a result of limited memory reuse, which renders data caching less effective. In view of emerging hardware trends, it is necessary to develop algorithms that strike a more meaningful balance between memory accesses, communication, and computation. Specifically, an algorithm that performs more floating point operations at the expense of reduced memory accesses and communication is likely to yield better performance. We present two alternative variations of DS factorization based methods for the solution of sparse linear systems on parallel computing platforms. Performance comparisons to traditional LU factorization based parallel solvers are also discussed. We show that by combining iterative methods with direct solvers and using the DS factorization, one can achieve better scalability and shorter time to solution.

Murat Manguoglu
Department of Computer Engineering, Middle East Technical University, Ankara, Turkey
e-mail: manguoglu@ceng.metu.edu.tr

1 Introduction

Many simulations in science and engineering give rise to sparse linear systems of equations. It is a well known fact that the cost of the solution process is almost always governed by the solution of the linear systems, especially for large-scale problems. The emergence of extreme-scale parallel platforms, along with the increasing number of processing cores available on a single chip, poses significant challenges for algorithm development. Machines with tens of thousands of multicore processors place tremendous constraints on the communication as well as memory access requirements of algorithms. The increase in the number of cores in a processing unit without an increase in memory bandwidth aggravates an already significant memory bottleneck. Sparse linear algebra kernels are well known for their poor processor utilization. This is a result of limited memory reuse, which renders data caching less effective. In view of emerging hardware trends, it is necessary to develop algorithms that strike a more meaningful balance between memory accesses, communication, and computation. Specifically, an algorithm that performs more floating point operations at the expense of reduced memory accesses and communication is likely to yield better performance.

A significant amount of effort has been devoted to the design and implementation of parallel sparse linear system solvers. Existing parallel sparse direct solvers, such as MUMPS [3, 2, 1], Pardiso [39, 40], SuperLU [22], and WSMP [14, 13], are based on LU factorization. Therefore, the speed improvements realized by such solvers are often limited due to the inherent limitations of sparse LU factorizations and sparse triangular forward-backward sweeps. Iterative solvers, such as preconditioned Krylov subspace methods with sparse approximate inverse or incomplete LU factorization based preconditioners, are, on the other hand, often more scalable but not as robust as direct solvers. We present two robust hybrid algorithms based on the DS factorization for the parallel solution of general sparse linear systems. At the cost of increased computation, using the DS factorization to solve the system allows us to minimize interprocess communication and, hence, enhances concurrency.

The remainder of this chapter is organized as follows. In Section 2, we present banded and sparse variations of the DS factorization. In Section 3, we develop two hybrid general sparse linear system solvers that use the DS factorization to solve the preconditioned system.

2 Banded and sparse parallel DS factorizations

A number of banded solvers have been proposed and implemented in software packages such as LAPACK [4] for uniprocessors, and ScaLAPACK [7] and Spike [36, 34, 21, 6, 9, 30, 31, 37] for parallel architectures. The central idea of Spike is to partition the matrix so that each process (or processing element) can work on its own part of the matrix, with the processes communicating only during the

solution of the common reduced system. The size of the reduced system is determined by the bandwidth of the matrix and the number of partitions. Unlike classical sequential LU factorization of the coefficient matrix A for solving a banded linear system Ax = f, the Spike scheme employs the factorization

A = DS, (1)

where D is the block diagonal of A for a given number of partitions. The factor S, given by D^{-1}A (assuming D is nonsingular) and called the spike matrix, consists of the block diagonal identity matrix modified by spikes to the right and left of each partition. The process of solving Ax = f then reduces to the following sequence of steps:

g ← D^{-1} f (modification of the right-hand side)
S ← D^{-1} A (forming the spike system coefficient matrix)
x̂ ← Ŝ^{-1} ĝ (solving a smaller independent reduced system)
x ← S^{-1} g (retrieving the full solution)

All the steps of the solution process can be executed in perfect parallelism with the exception of the solution of the small reduced system. The size of the reduced system increases as we increase the number of partitions (or processors). Furthermore, each step can be accomplished using one of several available methods, depending on the specific parallel architecture and the linear system at hand. This gives rise to a family of optimized variants of the basic Spike algorithm. For a small banded system we illustrate the partitioning of the system among three processors in Figure 1. Figure 2 depicts the spike system Sx = g. Finally, Figure 3 shows the smaller reduced system. Dashed boxes in the figures show those elements which could be near machine precision if A is a diagonally dominant system. Ignoring those small elements gives rise to the truncated variant of the algorithm, allowing more parallelism in the solution of the reduced system.

For the solution of general sparse linear systems (where A is sparse) a new algorithm has been proposed in [24, 28]. Inspired by the banded Spike solver, this algorithm also uses the DS factorization and partitioning of the system (see Figures 4 and 5). The resulting S matrix, however, is not banded and consists of sparse spikes. Nevertheless, a smaller reduced system can still be obtained (see Figure 6). The solution stages follow the banded case exactly. Both banded and sparse DS factorization based algorithms can be adapted for the efficient and scalable solution of general sparse linear systems, as will be discussed in Section 3.
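The DS solution process can be made concrete with a small example. Below is a minimal dense NumPy sketch of the four steps for a tridiagonal system split into three partitions; the function name ds_solve, the equal partition sizes, and the dense representation are illustrative assumptions, not the optimized parallel Spike implementation.

```python
# Minimal dense sketch of a DS-factorization solve; illustrative only,
# not the optimized parallel Spike solver. Equal partition sizes and a
# nonsingular block diagonal D are assumed.
import numpy as np

def ds_solve(A, f, p):
    n = A.shape[0]
    m = n // p                              # equal partitions assumed
    D = np.zeros_like(A)
    for i in range(p):                      # D = block diagonal of A
        s = slice(i * m, (i + 1) * m)
        D[s, s] = A[s, s]
    T = np.linalg.solve(D, A - D)           # S = I + T, with T = D^{-1}R
    g = np.linalg.solve(D, f)               # g <- D^{-1} f
    c = np.flatnonzero(np.abs(T).sum(axis=0))      # coupling columns of T
    S_hat = np.eye(len(c)) + T[np.ix_(c, c)]       # reduced system matrix
    x_hat = np.linalg.solve(S_hat, g[c])           # x_hat <- S_hat^{-1} g_hat
    return g - T[:, c] @ x_hat              # retrieve the full solution

rng = np.random.default_rng(0)
n, p = 12, 3
A = np.triu(np.tril(rng.standard_normal((n, n)), 1), -1) + 5.0 * np.eye(n)
f = rng.standard_normal(n)
x = ds_solve(A, f, p)
assert np.allclose(A @ x, f)
```

For a banded matrix, c collects only the few columns adjacent to each partition interface, so the reduced system stays small (compare Figures 3 and 6); the truncated variant additionally drops the near-zero spike tips.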

Fig. 1 Partitioning of the system (Ax = f) into three parts; boxes represent nonzeros

Fig. 2 Spike system Sx = g, where g = D^{-1} f and S = D^{-1} A, for the system in Figure 1

3 Solution of general sparse linear systems

In this section, we present two hybrid methods for the solution of general sparse linear systems which use an iterative method as the outer layer with preconditioning. In both methods, the solution of the preconditioned linear systems is handled by one of the variations of the DS factorization based banded or sparse solvers described in Section 2. The first method relies on reordering the system so that the large entries in the coefficient matrix are moved closer to the main diagonal. After reordering, an effective banded preconditioner, M, can be extracted and used for solving the system with an outer iterative layer. M can be treated as dense or sparse within the band, and hence the diagonal blocks can be handled by a variety of algorithms. The second method, on the other hand, eliminates the need for the reordering with weights by dropping small elements when forming the preconditioner. An outer

Fig. 3 Reduced system Ŝx̂ = ĝ obtained from the spike system in Figure 2

Fig. 4 Partitioning of the system (Ax = f) into three parts; boxes represent nonzeros

iterative method is also used for the second method, and the preconditioned systems are handled by the sparse variation of the DS factorization.

3.1 Weighted reordering and banded preconditioning

Given a linear system of equations, Ax = f, we first apply a nonsymmetric row permutation as follows:

QAx = Q f. (2)

Fig. 5 Spike system Sx = g, where g = D^{-1} f and S = D^{-1} A, for the system in Figure 4

Fig. 6 Reduced system Ŝx̂ = ĝ obtained from the spike system in Figure 5

Here, Q is the row permutation matrix that either maximizes the number of nonzeros on the diagonal of A, or maximizes the product of the absolute values of the diagonal entries [11]. The first algorithm is known as maximum transversal search, while the second provides scaling factors so that the absolute values of the diagonal entries are equal to one and all other elements are less than or equal to one. This scaling can be applied as follows:

(Q D_2 A D_1)(D_1^{-1} x) = (Q D_2 f). (3)

Both algorithms are implemented in subroutine MC64 [10] of the HSL [17] library.
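As a quick illustration of the first option, the sketch below builds a row permutation that places nonzeros on the diagonal, using SciPy's bipartite matching as a stand-in for MC64 (which has no standard Python interface); the small test matrix and the assumption of structural nonsingularity are for the example only.

```python
# Hedged sketch: a row permutation Q for Eq. (2) that puts nonzeros on
# the diagonal, via SciPy's bipartite matching in place of HSL's MC64.
# A is assumed structurally nonsingular.
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import maximum_bipartite_matching

A = csr_matrix(np.array([[0., 2., 0.],
                         [3., 0., 1.],
                         [0., 4., 5.]]))
perm = maximum_bipartite_matching(A, perm_type='row')  # row matched to column j
QA = A[perm, :]                        # QA = Q A
assert np.all(QA.diagonal() != 0)      # the diagonal is now zero-free
```

MC64 additionally returns the scaling factors D_1 and D_2 of Eq. (3) when the product of diagonal magnitudes is maximized; the matching alone only guarantees a zero-free diagonal.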

Following the above nonsymmetric reordering and optional scaling, we apply the symmetric permutation P as follows:

(P Q D_2 A D_1 P^T)(P D_1^{-1} x) = (P Q D_2 f). (4)

The permutation P can be chosen such that the magnitudes of the nonzeros close to the main diagonal are larger than those of the nonzeros far away. Such a reordering can be obtained by solving an eigenvalue problem, as follows. The second smallest eigenvalue and the corresponding eigenvector of the Laplacian of a graph have been used in a number of application areas, including matrix reordering [26, 23, 5], graph partitioning [32, 33], machine learning [29], protein analysis and data mining [16, 42, 20], and web search [15]. The second smallest eigenvalue of the Laplacian of a graph is sometimes called the algebraic connectivity of the graph, and the corresponding eigenvector is known as the Fiedler vector, due to the work of Fiedler [12]. For a given n × n sparse symmetric matrix A, or an undirected weighted graph with positive weights, one can form the weighted Laplacian matrix L_w as follows:

L_w(i, j) = Σ_{ĵ≠i} A(i, ĵ) if i = j, and L_w(i, j) = −A(i, j) if i ≠ j. (5)

Since the Fiedler vector can be computed independently for disconnected graphs, we assume that the graph is connected. The eigenvalues of L_w are then all greater than zero except λ_1 = 0. The eigenvector x_2 corresponding to the smallest nontrivial eigenvalue λ_2 is called the Fiedler vector. Since we assume a connected graph, the trivial eigenvector x_1 is a vector of all ones. In case the matrix A is nonsymmetric, one can use (|A| + |A|^T)/2 instead.

A state-of-the-art multilevel solver for computing the Fiedler vector, called MC73_Fiedler [18], is implemented in the Harwell Subroutine Library (HSL) [17]. It uses a series of levels of coarser graphs, where the eigenvalue problem corresponding to the coarsest level is solved via the Lanczos method to estimate the Fiedler vector. The results are then prolongated to the finer graphs, and Rayleigh quotient iteration (RQI) with shift and invert is used for refining the eigenvector. Linear systems encountered in RQI are solved via the SYMMLQ algorithm. We consider MC73_Fiedler to be one of the best uniprocessor implementations for determining the Fiedler vector. A new parallel algorithm, TraceMin-Fiedler, has been developed based on the trace minimization algorithm (TraceMin) [38, 35]; parallel results comparing it to MC73_Fiedler are presented in [25].

We consider solving the standard symmetric eigenvalue problem

Lx = λx, (6)

where L denotes the weighted Laplacian, using the TraceMin scheme for obtaining the Fiedler vector. The basic TraceMin algorithm can be summarized as follows. Let X_k be an approximation of the eigenvectors corresponding to the p smallest eigenvalues such that X_k^T L X_k = Σ_k and X_k^T X_k = I, where Σ_k = diag(ρ_1^(k), ρ_2^(k), ..., ρ_p^(k)). The updated approximation is obtained by solving the minimization problem

min tr((X_k − Δ_k)^T L (X_k − Δ_k)), subject to Δ_k^T X_k = 0. (7)

This in turn leads to the need for solving, in each iteration of the TraceMin algorithm, a saddle point problem of the form

[ L      X_k ] [ Δ_k ]   [ L X_k ]
[ X_k^T  0   ] [ N_k ] = [ 0     ].  (8)

We first solve the Schur complement system (X_k^T L^{-1} X_k) N_k = X_k^T X_k to obtain N_k. After Δ_k is retrieved, (X_k − Δ_k) is used to obtain X_{k+1}, which forms the section

X_{k+1}^T L X_{k+1} = Σ_{k+1},  X_{k+1}^T X_{k+1} = I. (9)

The TraceMin-Fiedler algorithm, which is based on the basic TraceMin algorithm, is given in Figure 7.

Algorithm 1:
Data: L is the n × n Laplacian matrix defined in Eq. (5), ε_out is the stopping criterion for the relative residual of the eigenvalue problem, p is the number of eigenpairs to be computed, and q is the dimension of the search space.
Result: x_2, the eigenvector corresponding to the second smallest eigenvalue of L.
p ← 2; q ← p + 2; n_conv ← 0; X_conv ← [ ];
L̂ ← L + ‖L‖ I; D ← the diagonal of L; D̂ ← the diagonal of L̂;
X_1 ← rand(n, q);
for k = 1, 2, ..., max_it do
1. Orthonormalize X_k into V_k;
2. Compute the interaction matrix H_k ← V_k^T L V_k;
3. Compute the eigendecomposition H_k Y_k = Y_k Σ_k of H_k; the eigenvalues Σ_k are arranged in ascending order and the eigenvectors are chosen to be orthogonal;
4. Compute the corresponding Ritz vectors X_k ← V_k Y_k; note that X_k is a section, i.e. X_k^T L X_k = Σ_k and X_k^T X_k = I;
5. Compute the relative residual ‖L X_k − X_k Σ_k‖ / ‖L‖;
6. Test for convergence: if the relative residual of an approximate eigenvector is less than ε_out, move that vector from X_k to X_conv and increment n_conv by one; if n_conv ≥ p, stop;
7. Deflate: if n_conv > 0, X_k ← X_k − X_conv (X_conv^T X_k);
8. If n_conv = 0, solve the linear system L̂ W_k = X_k approximately with relative residual ε_in via the PCG scheme using the diagonal preconditioner D̂; otherwise, solve the linear system L W_k = X_k approximately with relative residual ε_in via the PCG scheme using the diagonal preconditioner D;
9. Form the Schur complement S_k ← X_k^T W_k;
10. Solve the linear system S_k N_k = X_k^T X_k for N_k;
11. Update X_{k+1} ← X_k − Δ_k = W_k N_k;
end
Fig. 7 TraceMin-Fiedler algorithm.
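The algebra of steps 8-11 can be sketched compactly. In the sketch below, dense solves stand in for Algorithm 1's PCG inner solves, a QR factorization plus a small eigendecomposition provides the section of steps 1-4, and a simple diagonal shift makes the toy Laplacian nonsingular in place of the deflation logic; the path-graph Laplacian, sizes, and iteration count are illustrative assumptions.

```python
# Dense sketch of the TraceMin update X_{k+1} = W_k N_k (Eqs. (7)-(9)).
# Dense solves replace the PCG inner solves, and a shifted Lhat replaces
# the deflation logic of Algorithm 1. Toy sizes are assumptions.
import numpy as np

def tracemin_step(L, X):
    W = np.linalg.solve(L, X)           # inner solve: W_k <- L^{-1} X_k
    S = X.T @ W                         # Schur complement S_k = X_k^T L^{-1} X_k
    N = np.linalg.solve(S, X.T @ X)     # solve S_k N_k = X_k^T X_k
    return W @ N                        # X_{k+1} = X_k - Delta_k = W_k N_k

def section(L, X):
    """Steps 1-4: return Ritz vectors V with V^T L V diagonal, V^T V = I."""
    Q, _ = np.linalg.qr(X)
    _, Y = np.linalg.eigh(Q.T @ L @ Q)  # eigenvalues in ascending order
    return Q @ Y

n = 12                                  # path graph: a small connected example
G = np.diag(np.full(n - 1, -1.0), 1); G = G + G.T
Lap = np.diag(-G.sum(axis=1)) + G       # weighted Laplacian of Eq. (5)
Lhat = Lap + np.linalg.norm(Lap, 1) * np.eye(n)   # nonsingular shift

X = np.random.default_rng(1).standard_normal((n, 2))
for _ in range(500):
    X = section(Lhat, tracemin_step(Lhat, X))
fiedler = X[:, 1]                       # eigenvector of the 2nd smallest eigenvalue

w, V = np.linalg.eigh(Lhat)             # check against a dense eigensolver
assert abs(abs(fiedler @ V[:, 1]) - 1.0) < 1e-3
P = np.argsort(fiedler)                 # symmetric permutation used in Eq. (4)
```

Sorting the entries of the Fiedler vector yields the symmetric permutation P of Eq. (4); in practice the deflated, PCG-based Algorithm 1 is what makes the computation scale, and this sketch only mirrors its algebra.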

Using the above algorithm, speed improvements of TraceMin-Fiedler on 1, 8, 16, 32, and 64 cores over the uniprocessor MC73_Fiedler are shown in Figure 8 for matrices obtained from the University of Florida Sparse Matrix Collection [8]. The platform we use is a cluster with InfiniBand interconnect where each node consists of two six-core Intel Xeon CPUs (Westmere X5670) running at 2.93 GHz (12 cores per node).

Fig. 8 Speed improvement, Time(MC73_Fiedler) / Time(TraceMin-Fiedler), for 9 test problems: nlpkkt160, circuit5M, cage15, rajat31, circuit5M_dc, nlpkkt120, Freescale1, memchip, and kkt_power

Reordering using the Fiedler vector produces matrices in which the large elements are clustered around the main diagonal, as shown in Figure 9 for a matrix obtained from the University of Florida Sparse Matrix Collection. Once the reordered system is obtained, one can extract a banded preconditioner and solve the general system using a preconditioned iterative method. Systems involving the preconditioner are solved at each iteration using the DS factorization, in which the systems involving the diagonal blocks of D can be solved using (i) dense sequential or multithreaded banded solvers [26], or (ii) a sparse sequential or multithreaded direct solver. The second variation has been implemented in [41, 27, 23] and is called PSPIKE.
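As a small illustration of this first hybrid method, the sketch below extracts a banded preconditioner M from an (already reordered) matrix and uses it inside an outer BiCGStab; the bandwidth k, the test matrix, and the sequential sparse factorization standing in for the parallel DS solve are all illustrative assumptions.

```python
# Hedged sketch: banded preconditioner within an outer BiCGStab.
# A sequential sparse LU stands in for the parallel DS-based solve of
# the banded preconditioner M; matrix and bandwidth are assumptions.
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import LinearOperator, bicgstab, splu

def banded_preconditioner(A, k):
    """Keep only entries within half-bandwidth k of the main diagonal."""
    B = A.tocoo()
    keep = np.abs(B.row - B.col) <= k
    M = sp.csc_matrix((B.data[keep], (B.row[keep], B.col[keep])),
                      shape=A.shape)
    return splu(M)                      # stand-in for the parallel DS solve

rng = np.random.default_rng(0)
n, k = 200, 2
A = (sp.random(n, n, density=0.01, random_state=rng, format='csr')
     + sp.diags(np.full(n, 4.0))        # dominant diagonal
     + sp.diags(np.full(n - 1, -1.0), 1))
f = rng.standard_normal(n)

Mlu = banded_preconditioner(A, k)
M = LinearOperator((n, n), matvec=Mlu.solve)
x, info = bicgstab(A, f, M=M)
assert info == 0                        # outer iteration converged
```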

We obtained the g3_circuit matrix (1,585,478 unknowns and 7,660,826 nonzeros) from the University of Florida Sparse Matrix Collection. In Figure 10, the speed improvements of PSPIKE, MUMPS, and Pardiso are presented for the g3_circuit system. The platform we use is an Intel Xeon (X5560 @ 2.8 GHz) cluster with InfiniBand interconnect and 16 GB memory per node. In PSPIKE, BiCGStab [43] is used as the outer iterative solver, and the iterations are stopped when ‖f − Ax‖ / ‖f‖ falls below the stopping tolerance. The reduced system is truncated to enhance parallelism and solved directly.

Fig. 9 Sparsity plots of eurqsa: (a) original matrix, (b) after TraceMin-Fiedler reordering; red and blue indicate the largest and the smallest elements, respectively

Fig. 10 The speed improvement for g3_circuit compared to Pardiso using one core (73.5 seconds)

3.2 Domain Decomposing Parallel Sparse Solver

Given a general sparse linear system Ax = f, we partition A ∈ R^{n×n} into p block rows A = [A_1, A_2, ..., A_p]^T. Let

A = D + R, (10)

where D consists of the p diagonal blocks of A,

D = diag(A_11, A_22, ..., A_pp), (11)

and R consists of the remaining elements (i.e. R = A − D). We note that this process can be viewed as an algebraic domain decomposition. Therefore, we call the method described in this section the Domain Decomposing Parallel Sparse Solver (DDPS). The DS factorization in DDPS is used for solving the systems involving the preconditioner. Let L̃_i and Ũ_i be the incomplete or complete LU factors of A_ii, for i = 1, 2, ..., p. We define

D̃ = diag(Ã_11, Ã_22, ..., Ã_pp), (12)

in which Ã_ii = L̃_i Ũ_i. The DDPS algorithm is shown in Figure 11.

Data: Ax = f and p
Result: x
1. D + R ← A for a given p;
2. L̃_i Ũ_i ← A_ii (approximate or exact) for i = 1, 2, ..., p;
3. R̃ ← R (by dropping some elements);
4. S̃ ← D̃^{-1} R̃;
5. Identify the nonzero columns of S̃ and store their indices in the array c;
6. Solve Ax = f via a Krylov subspace method with the preconditioner P = D̃ + R̃ and stopping tolerance ε_out; each preconditioning operation solves Pz = y (D̃^{-1} P z = D̃^{-1} y, i.e. (I + S̃) z = g):
6.1. g ← D̃^{-1} y;
6.2. Ŝ ← I(c, c) + S̃(c, c); ẑ ← z(c); ĝ ← g(c);
6.3. Solve the smaller independent system Ŝ ẑ = ĝ (directly, or iteratively with stopping tolerance ε_in);
6.4. z(c) ← ẑ;
6.5. z ← g − S̃ z;
Fig. 11 DDPS algorithm.
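The stages of Figure 11 map onto a few lines of linear algebra. Below is a minimal dense sketch of the setup (Stages 1-5) and one preconditioner application Pz = y (Stage 6), with exact block LU, a priori dropping with threshold δ, and a direct reduced-system solve; the function names, equal block sizes, and dense representation are illustrative assumptions rather than the DDPS implementation.

```python
# Dense sketch of DDPS preprocessing (Stages 1-5) and one application
# of the preconditioner Pz = y (Stage 6) from Fig. 11. Assumptions:
# equal blocks, exact block LU, direct reduced-system solve.
import numpy as np
from scipy.linalg import lu_factor, lu_solve

def ddps_setup(A, p, delta):
    n = A.shape[0]; m = n // p
    blocks = [slice(i * m, (i + 1) * m) for i in range(p)]
    R = A.copy()
    for s in blocks:
        R[s, s] = 0.0                                   # Stage 1: R = A - D
    lu = [lu_factor(A[s, s]) for s in blocks]           # Stage 2: block LU
    for s in blocks:                                    # Stage 3: a priori drop
        cn = np.abs(R[s]).max(axis=0)                   # column norms of R_i
        R[s][:, cn < delta * cn.max()] = 0.0
    S = np.vstack([lu_solve(lu[i], R[s])                # Stage 4: S = D^{-1} R
                   for i, s in enumerate(blocks)])
    c = np.flatnonzero(np.abs(S).sum(axis=0))           # Stage 5: nonzero cols
    S_hat = np.eye(len(c)) + S[np.ix_(c, c)]            # reduced system matrix
    return blocks, lu, S, c, S_hat

def ddps_apply(y, blocks, lu, S, c, S_hat):
    g = np.concatenate([lu_solve(lu[i], y[s])           # 6.1: g = D^{-1} y
                        for i, s in enumerate(blocks)])
    z_c = np.linalg.solve(S_hat, g[c])                  # 6.3 (direct here)
    return g - S[:, c] @ z_c                            # 6.4-6.5: z = g - S z

rng = np.random.default_rng(0)
A = rng.standard_normal((8, 8)) * 0.1 + 4.0 * np.eye(8)
setup = ddps_setup(A, p=2, delta=0.0)
y = rng.standard_normal(8)
z = ddps_apply(y, *setup)
assert np.allclose(A @ z, y)            # with delta = 0 and exact LU, P = A
```

Wrapped in a scipy.sparse.linalg.LinearOperator, ddps_apply can serve as the preconditioner inside an outer BiCGStab, which is how P = D̃ + R̃ is used in the discussion that follows.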

Stages 1-5 are preprocessing stages in which the right-hand side is not required. After preprocessing, we solve the system via a Krylov subspace method with a preconditioner. The major operations in a Krylov subspace method are (i) matrix-vector multiplications, (ii) inner products, and (iii) preconditioning operations of the form Pz = y. Details of the preconditioning operations for DDPS are given in Figure 11. Each stage, with the exception of Stage 6.3, can be executed with perfect parallelism, requiring no interprocessor communication. In Stage 6.3, the solution of the smaller system Ŝẑ = ĝ is the only part of the algorithm that requires communication. We solve this smaller reduced system iteratively via BiCGStab without preconditioning.

The size of Ŝ, which is determined by the number of nonzero columns of S̃, is problem dependent and is expected to influence the overall scalability of the algorithm. We employ several techniques to reduce the dimension of Ŝ. First, we use METIS [19] reordering to minimize the total communication volume, hence reducing the size of Ŝ. We also use the following dropping strategy: if, for any column j in R_i,

‖R_i(:, j)‖ ≤ δ max_l ‖R_i(:, l)‖ (i = 1, 2, ..., p),

we do not consider that column when forming Ŝ. Here R_i is the block row partition of R (i.e. R = [R_1, R_2, ..., R_p]^T). We call this dropping strategy a priori dropping and use it for obtaining the results in this section. Another possibility is to drop elements after computing S̃ (a posteriori dropping). We note that dropping elements from R in Stage 3 to reduce the size of Ŝ results in an approximation of the solution. Furthermore, we can use approximate LU factorizations of the diagonal blocks in Stage 2 and solve Ŝẑ = ĝ iteratively in Stage 6.3. Therefore, we place an outer iterative layer (e.g. BiCGStab) in which we use the above algorithm as a solver for systems involving the preconditioner P = D̃ + R̃, where R̃ consists only of the columns that are not dropped. We stop the outer iterations when the relative residual at the k-th iteration satisfies ‖r_k‖ / ‖r_0‖ ≤ ε_out.

DDPS is a direct solver if (i) nothing is dropped from R, (ii) the exact LU factorization of each A_ii is computed, and (iii) Ŝẑ = ĝ is solved exactly. When DDPS is used as a direct solver, an outer iterative scheme is not required. The choices made in Stages 2, 3, and 6.3 result in a solver that can be as robust as a direct solver, as scalable as an iterative solver, or anything in between. We note that the outer iterative layer also benefits from our partitioning strategy, as METIS minimizes the total communication volume in parallel sparse matrix-vector multiplications. We further note that Ŝ consists of dense columns, which we store as a two-dimensional array in memory; as a result, matrix-vector multiplications can be done via BLAS2 (or BLAS3 in the case of multiple right-hand sides).

We obtained the torso3 matrix (259,156 unknowns and 4,429,042 nonzeros) from the University of Florida Sparse Matrix Collection. In Figure 12 we present for this matrix the speed improvements of the DDPS and MUMPS solvers compared to Pardiso on a single core of an Intel Xeon (X5560 @ 2.8 GHz) cluster with InfiniBand interconnect and 16 GB memory per node, using a fixed outer stopping tolerance ε_out. In this example, DDPS uses Pardiso for solving systems involving the diagonal blocks of D.

Fig. 12 The speed improvement for torso3 compared to Pardiso using one core (49.4 seconds)

4 Conclusions

We have presented two alternative formulations of DS factorization based methods for the solution of sparse linear systems on parallel computing platforms. Performance comparisons to traditional LU factorization based parallel solvers show that by combining iterative methods with direct solvers and using the DS factorization, one can achieve better scalability and shorter time to solution.

Acknowledgements I would like to thank Ahmed Sameh, Ananth Grama, Faisal Saied, Eric Cox, Kenji Takizawa, Madan Sathe, Mehmet Koyuturk, Olaf Schenk, and Tayfun Tezduyar for their help and many useful discussions.

References

1. P. R. Amestoy and I. S. Duff, Multifrontal parallel distributed symmetric and unsymmetric solvers, Comput. Methods Appl. Mech. Eng., 184 (2000).
2. P. R. Amestoy, I. S. Duff, J.-Y. L'Excellent, and J. Koster, A fully asynchronous multifrontal solver using distributed dynamic scheduling, SIAM J. Matrix Anal. Appl., 23 (2001).
3. P. R. Amestoy, A. Guermouche, J.-Y. L'Excellent, and S. Pralet, Hybrid scheduling for the parallel solution of linear systems, Parallel Comput., 32 (2006).
4. E. Anderson, Z. Bai, C. Bischof, J. Demmel, J. Dongarra, J. Du Croz, A. Greenbaum, S. Hammarling, A. McKenney, and D. Sorensen, LAPACK: A portable linear algebra library for high-performance computers, tech. report, Knoxville.
5. S. T. Barnard, A. Pothen, and H. Simon, A spectral algorithm for envelope reduction of sparse matrices, Numerical Linear Algebra with Applications, 2 (1995).

6. M. W. Berry and A. Sameh, Multiprocessor schemes for solving block tridiagonal linear systems, The International Journal of Supercomputer Applications, 1 (1988).
7. L. S. Blackford, J. Choi, A. Cleary, E. D'Azevedo, J. Demmel, I. Dhillon, J. Dongarra, S. Hammarling, G. Henry, A. Petitet, K. Stanley, D. Walker, and R. C. Whaley, ScaLAPACK: a linear algebra library for message-passing computers, in Proceedings of the Eighth SIAM Conference on Parallel Processing for Scientific Computing (Minneapolis, MN, 1997), Society for Industrial and Applied Mathematics, Philadelphia, PA, USA, 1997, p. 15 (electronic).
8. T. A. Davis, University of Florida sparse matrix collection, NA Digest.
9. J. J. Dongarra and A. H. Sameh, On some parallel banded system solvers, Parallel Computing, 1 (1984).
10. I. S. Duff and J. Koster, On algorithms for permuting large entries to the diagonal of a sparse matrix.
11. I. S. Duff and J. Koster, The design and use of algorithms for permuting large entries to the diagonal of sparse matrices, SIAM Journal on Matrix Analysis and Applications, 20 (1999).
12. M. Fiedler, Algebraic connectivity of graphs, Czechoslovak Mathematical Journal, 23 (1973).
13. A. Gupta, Recent advances in direct methods for solving unsymmetric sparse systems of linear equations, ACM Transactions on Mathematical Software, 28 (2002).
14. A. Gupta, S. Koric, and T. George, Sparse matrix factorization on massively parallel computers, in SC '09, ACM/IEEE, Portland, OR, USA, Nov. 2009.
15. X. He, H. Zha, C. H. Ding, and H. D. Simon, Web document clustering using hyperlink structures, Computational Statistics & Data Analysis, 41 (2002).
16. D. J. Higham, G. Kalna, and M. Kibble, Spectral clustering and its use in bioinformatics, Journal of Computational and Applied Mathematics, 204 (2007). Special issue dedicated to Professor Shinnosuke Oharu on the occasion of his 65th birthday.
17. HSL, A collection of Fortran codes for large-scale scientific computation.
18. Y. Hu and J. Scott, HSL MC73: a fast multilevel Fiedler and profile reduction code, Technical Report RAL-TR, Rutherford Appleton Laboratory.
19. G. Karypis and V. Kumar, METIS: Unstructured graph partitioning and sparse matrix ordering system, The University of Minnesota, 2 (1995).
20. S. Kundu, D. Sorensen, and G. N. Phillips Jr., Automatic domain decomposition of proteins by a Gaussian network model, Proteins: Structure, Function, and Bioinformatics, 57 (2004).
21. D. H. Lawrie and A. H. Sameh, The computation and communication complexity of a parallel banded system solver, ACM Trans. Math. Softw., 10 (1984).
22. X. Li and J. W. Demmel, SuperLU_DIST: A scalable distributed-memory sparse direct solver for unsymmetric linear systems, ACM Trans. Mathematical Software, 29 (2003).
23. M. Manguoglu, A parallel hybrid sparse linear system solver, in Computational Electromagnetics International Workshop, CEM 2009, July 2009.
24. M. Manguoglu, A domain decomposing parallel sparse linear system solver, Journal of Computational and Applied Mathematics (in review).
25. M. Manguoglu, E. Cox, F. Saied, and A. Sameh, TRACEMIN-Fiedler: A parallel algorithm for computing the Fiedler vector, High Performance Computing for Computational Science, VECPAR 2010 (2011).
26. M. Manguoglu, M. Koyuturk, A. Sameh, and A. Grama, Weighted matrix ordering and parallel banded preconditioners for iterative linear system solvers, SIAM Journal on Scientific Computing, 32 (2010).
27. M. Manguoglu, A. Sameh, and O. Schenk, PSPIKE: A parallel hybrid sparse linear system solver, Euro-Par 2009 Parallel Processing (2009).
28. M. Manguoglu, K. Takizawa, A. Sameh, and T. Tezduyar, Nested and parallel sparse algorithms for arterial fluid mechanics computations with boundary layer mesh refinement, International Journal for Numerical Methods in Fluids, 65 (2011).

29. A. Y. Ng, M. I. Jordan, and Y. Weiss, On spectral clustering: Analysis and an algorithm, in Advances in Neural Information Processing Systems 14, MIT Press, 2001.
30. E. Polizzi and A. H. Sameh, A parallel hybrid banded system solver: the Spike algorithm, Parallel Comput., 32 (2006).
31. E. Polizzi and A. H. Sameh, SPIKE: A parallel environment for solving banded linear systems, Computers & Fluids, 36 (2007).
32. A. Pothen, H. D. Simon, and K.-P. Liou, Partitioning sparse matrices with eigenvectors of graphs, SIAM J. Matrix Anal. Appl., 11 (1990).
33. H. Qiu and E. R. Hancock, Graph matching and clustering using spectral partitions, Pattern Recognition, 39 (2006).
34. S. C. Chen, D. J. Kuck, and A. H. Sameh, Practical parallel band triangular system solvers, ACM Transactions on Mathematical Software, 4 (1978).
35. A. Sameh and Z. Tong, The trace minimization method for the symmetric generalized eigenvalue problem, J. Comput. Appl. Math., 123 (2000).
36. A. H. Sameh and D. J. Kuck, On stable parallel linear system solvers, J. ACM, 25 (1978).
37. A. H. Sameh and V. Sarin, Hybrid parallel linear system solvers, International Journal of Computational Fluid Dynamics, 12 (1999).
38. A. H. Sameh and J. A. Wisniewski, A trace minimization algorithm for the generalized eigenvalue problem, SIAM Journal on Numerical Analysis, 19 (1982).
39. O. Schenk and K. Gärtner, Solving unsymmetric sparse systems of linear equations with PARDISO, Future Generation Computer Systems, 20 (2004).
40. O. Schenk and K. Gärtner, On fast factorization pivoting methods for sparse symmetric indefinite systems, Electronic Transactions on Numerical Analysis, 23 (2006).
41. O. Schenk, M. Manguoglu, A. Sameh, M. Christen, and M. Sathe, Parallel scalable PDE-constrained optimization: antenna identification in hyperthermia cancer treatment planning, Computer Science Research and Development, 23 (2009).
42. S. J. Shepherd, C. B. Beggs, and S. Jones, Amino acid partitioning using a Fiedler vector model, European Biophysics Journal, 37 (2007).
43. H. A. van der Vorst, Bi-CGSTAB: a fast and smoothly converging variant of Bi-CG for the solution of nonsymmetric linear systems, SIAM J. Sci. Stat. Comput., 13 (1992).


More information

A dissection solver with kernel detection for symmetric finite element matrices on shared memory computers

A dissection solver with kernel detection for symmetric finite element matrices on shared memory computers A dissection solver with kernel detection for symmetric finite element matrices on shared memory computers Atsushi Suzuki, François-Xavier Roux To cite this version: Atsushi Suzuki, François-Xavier Roux.

More information

AMS526: Numerical Analysis I (Numerical Linear Algebra) Lecture 23: GMRES and Other Krylov Subspace Methods; Preconditioning

AMS526: Numerical Analysis I (Numerical Linear Algebra) Lecture 23: GMRES and Other Krylov Subspace Methods; Preconditioning AMS526: Numerical Analysis I (Numerical Linear Algebra) Lecture 23: GMRES and Other Krylov Subspace Methods; Preconditioning Xiangmin Jiao SUNY Stony Brook Xiangmin Jiao Numerical Analysis I 1 / 18 Outline

More information

FINE-GRAINED PARALLEL INCOMPLETE LU FACTORIZATION

FINE-GRAINED PARALLEL INCOMPLETE LU FACTORIZATION FINE-GRAINED PARALLEL INCOMPLETE LU FACTORIZATION EDMOND CHOW AND AFTAB PATEL Abstract. This paper presents a new fine-grained parallel algorithm for computing an incomplete LU factorization. All nonzeros

More information

Domain Decomposition-based contour integration eigenvalue solvers

Domain Decomposition-based contour integration eigenvalue solvers Domain Decomposition-based contour integration eigenvalue solvers Vassilis Kalantzis joint work with Yousef Saad Computer Science and Engineering Department University of Minnesota - Twin Cities, USA SIAM

More information

Lecture Notes to Accompany. Scientific Computing An Introductory Survey. by Michael T. Heath. Chapter 2. Systems of Linear Equations

Lecture Notes to Accompany. Scientific Computing An Introductory Survey. by Michael T. Heath. Chapter 2. Systems of Linear Equations Lecture Notes to Accompany Scientific Computing An Introductory Survey Second Edition by Michael T. Heath Chapter 2 Systems of Linear Equations Copyright c 2001. Reproduction permitted only for noncommercial,

More information

6.4 Krylov Subspaces and Conjugate Gradients

6.4 Krylov Subspaces and Conjugate Gradients 6.4 Krylov Subspaces and Conjugate Gradients Our original equation is Ax = b. The preconditioned equation is P Ax = P b. When we write P, we never intend that an inverse will be explicitly computed. P

More information

Alternative correction equations in the Jacobi-Davidson method

Alternative correction equations in the Jacobi-Davidson method Chapter 2 Alternative correction equations in the Jacobi-Davidson method Menno Genseberger and Gerard Sleijpen Abstract The correction equation in the Jacobi-Davidson method is effective in a subspace

More information

Some Geometric and Algebraic Aspects of Domain Decomposition Methods

Some Geometric and Algebraic Aspects of Domain Decomposition Methods Some Geometric and Algebraic Aspects of Domain Decomposition Methods D.S.Butyugin 1, Y.L.Gurieva 1, V.P.Ilin 1,2, and D.V.Perevozkin 1 Abstract Some geometric and algebraic aspects of various domain decomposition

More information

1. Introduction. We consider the problem of solving the linear system. Ax = b, (1.1)

1. Introduction. We consider the problem of solving the linear system. Ax = b, (1.1) SCHUR COMPLEMENT BASED DOMAIN DECOMPOSITION PRECONDITIONERS WITH LOW-RANK CORRECTIONS RUIPENG LI, YUANZHE XI, AND YOUSEF SAAD Abstract. This paper introduces a robust preconditioner for general sparse

More information

A SPARSE APPROXIMATE INVERSE PRECONDITIONER FOR NONSYMMETRIC LINEAR SYSTEMS

A SPARSE APPROXIMATE INVERSE PRECONDITIONER FOR NONSYMMETRIC LINEAR SYSTEMS INTERNATIONAL JOURNAL OF NUMERICAL ANALYSIS AND MODELING, SERIES B Volume 5, Number 1-2, Pages 21 30 c 2014 Institute for Scientific Computing and Information A SPARSE APPROXIMATE INVERSE PRECONDITIONER

More information

A METHOD FOR PROFILING THE DISTRIBUTION OF EIGENVALUES USING THE AS METHOD. Kenta Senzaki, Hiroto Tadano, Tetsuya Sakurai and Zhaojun Bai

A METHOD FOR PROFILING THE DISTRIBUTION OF EIGENVALUES USING THE AS METHOD. Kenta Senzaki, Hiroto Tadano, Tetsuya Sakurai and Zhaojun Bai TAIWANESE JOURNAL OF MATHEMATICS Vol. 14, No. 3A, pp. 839-853, June 2010 This paper is available online at http://www.tjm.nsysu.edu.tw/ A METHOD FOR PROFILING THE DISTRIBUTION OF EIGENVALUES USING THE

More information

MARCH 24-27, 2014 SAN JOSE, CA

MARCH 24-27, 2014 SAN JOSE, CA MARCH 24-27, 2014 SAN JOSE, CA Sparse HPC on modern architectures Important scientific applications rely on sparse linear algebra HPCG a new benchmark proposal to complement Top500 (HPL) To solve A x =

More information

Variants of BiCGSafe method using shadow three-term recurrence

Variants of BiCGSafe method using shadow three-term recurrence Variants of BiCGSafe method using shadow three-term recurrence Seiji Fujino, Takashi Sekimoto Research Institute for Information Technology, Kyushu University Ryoyu Systems Co. Ltd. (Kyushu University)

More information

An approximate minimum degree algorithm for matrices with dense rows

An approximate minimum degree algorithm for matrices with dense rows RT/APO/08/02 An approximate minimum degree algorithm for matrices with dense rows P. R. Amestoy 1, H. S. Dollar 2, J. K. Reid 2, J. A. Scott 2 ABSTRACT We present a modified version of the approximate

More information

From Direct to Iterative Substructuring: some Parallel Experiences in 2 and 3D

From Direct to Iterative Substructuring: some Parallel Experiences in 2 and 3D From Direct to Iterative Substructuring: some Parallel Experiences in 2 and 3D Luc Giraud N7-IRIT, Toulouse MUMPS Day October 24, 2006, ENS-INRIA, Lyon, France Outline 1 General Framework 2 The direct

More information