Parallel Solution of Sparse Linear Systems
Murat Manguoglu

Abstract Many simulations in science and engineering give rise to sparse linear systems of equations. It is a well known fact that the cost of the simulation process is almost always governed by the solution of the linear systems, especially for large-scale problems. The emergence of extreme-scale parallel platforms, along with the increasing number of processing cores available on a single chip, poses significant challenges for algorithm development. Machines with tens of thousands of multicore processors place tremendous constraints on the communication as well as memory access requirements of algorithms. The increase in the number of cores in a processing unit without an increase in memory bandwidth aggravates an already significant memory bottleneck. Sparse linear algebra kernels are well known for their poor processor utilization. This is a result of limited memory reuse, which renders data caching less effective. In view of emerging hardware trends, it is necessary to develop algorithms that strike a more meaningful balance between memory accesses, communication, and computation. Specifically, an algorithm that performs more floating point operations at the expense of reduced memory accesses and communication is likely to yield better performance. We present two alternative variations of DS factorization based methods for the solution of sparse linear systems on parallel computing platforms. Performance comparisons to traditional LU factorization based parallel solvers are also discussed. We show that by combining iterative methods with direct solvers and using the DS factorization, one can achieve better scalability and shorter time to solution.

Murat Manguoglu, Department of Computer Engineering, Middle East Technical University, Ankara, Turkey, manguoglu@ceng.metu.edu.tr
1 Introduction

Many simulations in science and engineering give rise to sparse linear systems of equations. It is a well known fact that the cost of the solution process is almost always governed by the solution of the linear systems, especially for large-scale problems. The emergence of extreme-scale parallel platforms, along with the increasing number of processing cores available on a single chip, poses significant challenges for algorithm development. Machines with tens of thousands of multicore processors place tremendous constraints on the communication as well as memory access requirements of algorithms. The increase in the number of cores in a processing unit without an increase in memory bandwidth aggravates an already significant memory bottleneck. Sparse linear algebra kernels are well known for their poor processor utilization. This is a result of limited memory reuse, which renders data caching less effective. In view of emerging hardware trends, it is necessary to develop algorithms that strike a more meaningful balance between memory accesses, communication, and computation. Specifically, an algorithm that performs more floating point operations at the expense of reduced memory accesses and communication is likely to yield better performance.

A significant amount of effort has been devoted to the design and implementation of parallel sparse linear system solvers. Existing parallel sparse direct solvers, such as MUMPS [3, 2, 1], Pardiso [39, 40], SuperLU [22], and WSMP [14, 13], are based on LU factorization. Therefore, the speed improvements realized by such solvers are often limited due to the inherent limitations of sparse LU factorizations and sparse triangular forward-backward sweeps. Iterative solvers, such as preconditioned Krylov subspace methods with sparse approximate inverse or incomplete LU factorization based preconditioners, on the other hand, are often more scalable but not as robust as direct solvers. We present two robust hybrid algorithms based on the DS factorization for the parallel solution of general sparse linear systems. At the cost of increased computation, using the DS factorization to solve the system allows us to minimize interprocess communication and, hence, enhances concurrency. The remainder of this chapter is organized as follows. In Section 2, we present banded and sparse variations of the DS factorization. In Section 3, we develop two hybrid general sparse linear system solvers that use the DS factorization to solve the preconditioned system.

2 Banded and sparse parallel DS factorizations

A number of banded solvers have been proposed and implemented in software packages such as LAPACK [4] for uniprocessors, and ScaLAPACK [7] and Spike [36, 34, 21, 6, 9, 30, 31, 37] for parallel architectures. The central idea of Spike is to partition the matrix so that each process (or processing element) can work on its own part of the matrix, with the processes communicating only during the solution of the common reduced system.
The size of the reduced system is determined by the bandwidth of the matrix and the number of partitions. Unlike the classical sequential LU factorization of the coefficient matrix A, for solving a banded linear system Ax = f the Spike scheme employs the factorization

A = DS, (1)

where D is the block diagonal of A for a given number of partitions. The factor S, given by D^{-1}A (assuming D is nonsingular) and called the spike matrix, consists of the block diagonal identity matrix modified by spikes to the right and left of each partition. The process of solving Ax = f then reduces to the following sequence of steps:

g ← D^{-1} f (modification of the right-hand side),
S ← D^{-1} A (forming the spike system coefficient matrix),
x̂ ← Ŝ^{-1} ĝ (solving a smaller independent reduced system),
x ← S^{-1} g (retrieving the full solution).

All the steps of the solution process can be executed in perfect parallelism with the exception of the solution of the small reduced system. The size of the reduced system will increase as we increase the number of partitions (or processors). Furthermore, each step can be accomplished using one of several available methods, depending on the specific parallel architecture and the linear system at hand. This gives rise to a family of optimized variants of the basic Spike algorithm. For a small banded system we illustrate the partitioning of the system among three processors in Figure 1. Figure 2 depicts the spike system Sx = g. Finally, Figure 3 shows the smaller reduced system. Dashed boxes in the figures show those elements which could be near machine precision if A is diagonally dominant. Ignoring those small elements gives rise to a truncated variant of the algorithm, allowing more parallelism in the solution of the reduced system.

Fig. 1 Partitioning of the banded system (Ax = f) into three parts (A_11, A_22, A_33); boxes represent nonzeros

Fig. 2 Spike system Sx = g, where g = D^{-1} f and S = D^{-1} A, for the system in Figure 1

Fig. 3 Reduced system Ŝx̂ = ĝ obtained from the spike system in Figure 2

For the solution of general sparse linear systems (where A is sparse) a new algorithm has been proposed in [24, 28]. Inspired by the banded Spike solver, this algorithm also uses the DS factorization and a partitioning of the system (see Figures 4 and 5). The resulting S matrix, however, is not banded and consists of sparse spikes. Nevertheless, a smaller reduced system can still be obtained (see Figure 6). The solution stages follow the banded case exactly.

Fig. 4 Partitioning of the sparse system (Ax = f) into three parts (A_11, A_22, A_33); boxes represent nonzeros

Fig. 5 Spike system Sx = g, where g = D^{-1} f and S = D^{-1} A, for the system in Figure 4

Fig. 6 Reduced system Ŝx̂ = ĝ obtained from the spike system in Figure 5

Both banded and sparse DS factorization based algorithms can be adapted for the efficient and scalable solution of general sparse linear systems, as will be discussed in Section 3. A small sketch of the DS solution steps is given below.
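To make the four steps concrete, the following is a minimal dense NumPy sketch of the DS factorization solve, assuming a nonsingular block diagonal D. The function and variable names are illustrative, and the reduced-system stage is collapsed into a single dense solve with S rather than the truncated parallel reduced-system treatment described above.

```python
import numpy as np

def spike_solve(A, f, p):
    """Solve Ax = f via A = DS, where D is the block diagonal of A (p blocks)."""
    n = A.shape[0]
    m = n // p
    D = np.zeros_like(A)
    for i in range(p):
        s = slice(i * m, (i + 1) * m if i < p - 1 else n)
        D[s, s] = A[s, s]                 # diagonal blocks: D_i = A_ii
    g = np.linalg.solve(D, f)             # g <- D^{-1} f
    S = np.linalg.solve(D, A)             # S <- D^{-1} A (the spike matrix)
    return np.linalg.solve(S, g)          # reduced-system solve + retrieval,
                                          # collapsed into one dense solve here

# Small banded test: the DS solve agrees with a direct solve of Ax = f.
rng = np.random.default_rng(0)
n = 12
A = 4.0 * np.eye(n) + np.diag(rng.standard_normal(n - 1), 1) \
                    + np.diag(rng.standard_normal(n - 1), -1)
f = rng.standard_normal(n)
print(np.allclose(spike_solve(A, f, p=3), np.linalg.solve(A, f)))  # True
```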
3 Solution of general sparse linear systems

In this section, we present two hybrid methods for the solution of general sparse linear systems which use an iterative method as the outer layer with preconditioning. In both methods, the solution of the preconditioned linear systems is handled by one of the variations of the DS factorization based banded or sparse solvers described in Section 2. The first method relies on reordering the system so that the large entries in the coefficient matrix are moved closer to the main diagonal. After reordering, an effective banded preconditioner, M, can be extracted and used for solving the system with an outer iterative layer. M can be treated as dense or sparse within the band, and hence the diagonal blocks can be handled by a variety of algorithms. The second method, on the other hand, eliminates the need for the reordering with weights by dropping small elements when forming the preconditioner. An outer iterative method is also used for the second method, and the preconditioned systems are handled by the sparse variation of the DS factorization.
3.1 Weighted reordering and banded preconditioning

Given a linear system of equations, Ax = f, we first apply a nonsymmetric row permutation as follows:

QAx = Q f. (2)
Here, Q is the row permutation matrix that either maximizes the number of nonzeros on the diagonal of A, or maximizes the product of the absolute values of the diagonal entries [11]. The first algorithm is known as maximum transversal search, while the second algorithm also provides scaling factors such that the absolute values of the diagonal entries are equal to one and all other elements are less than or equal to one. This scaling can be applied as follows:

(Q D_2 A D_1)(D_1^{-1} x) = (Q D_2 f). (3)

Both algorithms are implemented in subroutine MC64 [10] of the HSL [17] library.
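As an illustration of the first option, the sketch below uses SciPy's maximum_bipartite_matching as a stand-in for MC64's maximum transversal search; the weighted, product-maximizing variant with scaling factors is not shown.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.csgraph import maximum_bipartite_matching

A = sp.csr_matrix(np.array([[0., 2., 0.],
                            [3., 0., 1.],
                            [0., 4., 5.]]))
# perm[i] = j means row j of A moves to row i, so QA = A[perm] has a
# zero-free diagonal whenever A is structurally nonsingular.
perm = maximum_bipartite_matching(A, perm_type='row')
QA = A[perm]
print(np.all(QA.diagonal() != 0))  # True
```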
Following the above nonsymmetric reordering and optional scaling, we apply a symmetric permutation P as follows:

(P Q D_2 A D_1 P^T)(P D_1^{-1} x) = (P Q D_2 f). (4)

The permutation P can be chosen such that the magnitudes of the nonzeros closer to the main diagonal are larger than those of the nonzeros far away. Such a reordering can be obtained by solving the following eigenvalue problem. The second smallest eigenvalue and the corresponding eigenvector of the Laplacian of a graph have been used in a number of application areas including matrix reordering [26, 23, 5], graph partitioning [32, 33], machine learning [29], protein analysis and data mining [16, 42, 20], and web search [15]. The second smallest eigenvalue of the Laplacian of a graph is sometimes called the algebraic connectivity of the graph, and the corresponding eigenvector is known as the Fiedler vector, due to the work of Fiedler [12]. For a given n × n sparse symmetric matrix A, or an undirected weighted graph with positive weights, one can form the weighted Laplacian matrix, L_w, as follows:

L_w(i, j) = { Σ_{ĵ≠i} |A(i, ĵ)|,  if i = j,
            { −|A(i, j)|,         if i ≠ j.   (5)

Since the Fiedler vector can be computed independently for disconnected graphs, we assume that the graph is connected. All eigenvalues of L_w are then nonzero except λ_1. The eigenvector x_2 corresponding to the smallest nontrivial eigenvalue λ_2 is called the Fiedler vector. Since we assume a connected graph, the trivial eigenvector, x_1, is a vector of all ones. In case the matrix A is nonsymmetric, one can use (|A| + |A^T|)/2 instead. A state of the art multilevel solver [18], called MC73_Fiedler, for computing the Fiedler vector is implemented in the Harwell Subroutine Library (HSL) [17]. It uses a series of levels of coarser graphs, where the eigenvalue problem corresponding to the coarsest level is solved via the Lanczos method to estimate the Fiedler vector. The results are then prolongated to the finer graphs, and Rayleigh Quotient Iterations (RQI) with shift and invert are used to refine the eigenvector. The linear systems encountered in RQI are solved via the SYMMLQ algorithm. We consider MC73_Fiedler to be one of the best uniprocessor implementations for determining the Fiedler vector. A new parallel algorithm, TraceMin-Fiedler, has been developed based on the Trace Minimization algorithm (TraceMin) [38, 35], and parallel results comparing it to MC73_Fiedler are presented in [25]. A small sketch of forming L_w from Eq. (5) is given below.
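Here is a small SciPy sketch of forming L_w according to Eq. (5) for a symmetric sparse matrix; the absolute values follow the positive-weights reading of the definition as reconstructed above, and the function name is illustrative.

```python
import numpy as np
import scipy.sparse as sp

def weighted_laplacian(A):
    """Form L_w of Eq. (5): row sums of |A| on the diagonal, -|A| off it."""
    W = abs(A) - sp.diags(abs(A).diagonal())          # off-diagonal weights only
    return sp.diags(np.asarray(W.sum(axis=1)).ravel()) - W

A = sp.csr_matrix(np.array([[2., 1., 0.],
                            [1., 0., 3.],
                            [0., 3., 1.]]))
Lw = weighted_laplacian(A)
print(np.allclose(Lw @ np.ones(3), 0.0))  # True: x_1 = e is the trivial eigenvector
```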
We consider solving the standard symmetric eigenvalue problem

Lx = λx, (6)

where L denotes the weighted Laplacian, using the TraceMin scheme for obtaining the Fiedler vector. The basic TraceMin algorithm can be summarized as follows. Let X_k be an approximation of the eigenvectors corresponding to the p smallest eigenvalues such that X_k^T L X_k = Σ_k and X_k^T X_k = I, where Σ_k = diag(ρ_1^{(k)}, ρ_2^{(k)}, ..., ρ_p^{(k)}). The updated approximation is obtained by solving the minimization problem

min tr((X_k − Δ_k)^T L (X_k − Δ_k)), subject to Δ_k^T X_k = 0. (7)

This in turn leads to the need for solving, in each iteration of the TraceMin algorithm, a saddle point problem of the form

[ L      X_k ] [ Δ_k ]   [ L X_k ]
[ X_k^T   0  ] [ N_k ] = [   0   ].   (8)

We first solve the Schur complement system (X_k^T L^{-1} X_k) N_k = X_k^T X_k to obtain N_k. After Δ_k is retrieved, (X_k − Δ_k) is then used to obtain X_{k+1}, which forms the section

X_{k+1}^T L X_{k+1} = Σ_{k+1},  X_{k+1}^T X_{k+1} = I. (9)

The TraceMin-Fiedler algorithm, which is based on the basic TraceMin algorithm, is given in Figure 7.

Algorithm 1:
Data: L is the n × n Laplacian matrix defined in Eq. (5), ε_out is the stopping criterion for the norm of the eigenvalue problem residual, p is the number of eigenpairs to be computed, and q is the dimension of the search space
Result: x_2, the eigenvector corresponding to the second smallest eigenvalue of L
p ← 2; q ← p + 2; n_conv ← 0; X_conv ← [ ];
L̂ ← L + ∥L∥ I; D ← the diagonal of L; D̂ ← the diagonal of L̂;
X_1 ← rand(n, q);
for k = 1, 2, ..., max_it do
  1. Orthonormalize X_k into V_k;
  2. Compute the interaction matrix H_k ← V_k^T L V_k;
  3. Compute the eigendecomposition H_k Y_k = Y_k Σ_k of H_k. The eigenvalues Σ_k are arranged in ascending order and the eigenvectors are chosen to be orthogonal;
  4. Compute the corresponding Ritz vectors X_k ← V_k Y_k; note that X_k is a section, i.e. X_k^T L X_k = Σ_k, X_k^T X_k = I;
  5. Compute the relative residual ∥L X_k − X_k Σ_k∥ / ∥L∥;
  6. Test for convergence: if the relative residual of an approximate eigenvector is less than ε_out, move that vector from X_k to X_conv and replace n_conv by n_conv + 1. If n_conv ≥ p, stop;
  7. Deflate: if n_conv > 0, X_k ← X_k − X_conv (X_conv^T X_k);
  8. If n_conv = 0, solve the linear system L̂ W_k = X_k approximately with relative residual ε_in via the PCG scheme using the diagonal preconditioner D̂; otherwise, solve the linear system L W_k = X_k approximately with relative residual ε_in via the PCG scheme using the diagonal preconditioner D;
  9. Form the Schur complement S_k ← X_k^T W_k;
  10. Solve the linear system S_k N_k = X_k^T X_k for N_k;
  11. Update X_{k+1} ← X_k − Δ_k = W_k N_k;
end

Fig. 7 TraceMin-Fiedler algorithm.
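The following compact dense NumPy sketch implements the basic TraceMin iteration (steps 1-4 and 8-11 of Algorithm 1), with exact solves in place of PCG and without the shift and deflation logic needed for a singular Laplacian; it assumes a symmetric positive definite input and is an illustration, not the parallel implementation.

```python
import numpy as np

def tracemin(L, p=2, q=4, iters=100):
    """Basic TraceMin for the p smallest eigenpairs of an SPD matrix L."""
    n = L.shape[0]
    X = np.random.default_rng(1).standard_normal((n, q))
    for _ in range(iters):
        V = np.linalg.qr(X)[0]              # step 1: orthonormalize
        H = V.T @ L @ V                     # step 2: interaction matrix
        sig, Y = np.linalg.eigh(H)          # step 3: ascending eigenpairs of H
        X = V @ Y                           # step 4: Ritz vectors (a section)
        W = np.linalg.solve(L, X)           # step 8: W = L^{-1} X (exact here)
        S = X.T @ W                         # step 9: Schur complement
        N = np.linalg.solve(S, X.T @ X)     # step 10: S N = X^T X
        X = W @ N                           # step 11: X_{k+1} = W N
    return sig[:p], X[:, :p]
```

To approximate the Fiedler vector one would apply this to the shifted matrix L̂ of Algorithm 1, whose eigenvectors coincide with those of L, and take the second column of the result.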
Using the above algorithm, the speed improvements of TraceMin-Fiedler on 1, 8, 16, 32, and 64 cores over the uniprocessor MC73_Fiedler are shown in Figure 8 for matrices obtained from the University of Florida Sparse Matrix Collection [8]. The platform we use is a cluster with InfiniBand interconnect, where each node consists of two six-core Intel Xeon CPUs (Westmere X5670) running at 2.93 GHz (12 cores per node).

Fig. 8 Speed improvement, Time(MC73_Fiedler) / Time(TraceMin-Fiedler), for 9 test problems (nlpkkt160, circuit5M, cage15, rajat31, circuit5M_dc, nlpkkt120, Freescale1, memchip, kktpower)

Reordering using the Fiedler vector produces matrices in which the large elements are clustered around the main diagonal, as shown in Figure 9 for a matrix obtained from the University of Florida Sparse Matrix Collection.

Fig. 9 Sparsity plots of eurqsa: (a) original matrix, (b) after TraceMin-Fiedler reordering; red and blue indicate the largest and the smallest elements, respectively

Once the reordered system is obtained, one can extract a banded preconditioner (a small sketch of this extraction is given at the end of this subsection) and solve the general system using a preconditioned iterative method. Systems involving the preconditioner are solved at each iteration using the DS factorization, in which systems involving the diagonal blocks of D can be solved using (i) dense sequential or multithreaded banded solvers [26], or (ii) a sparse sequential or multithreaded direct solver. The second variation has been implemented in [41, 27, 23] and is called PSPIKE.

We obtained the G3_circuit matrix (1,585,478 unknowns and 7,660,826 nonzeros) from the University of Florida Sparse Matrix Collection. In Figure 10, the speed improvements of PSPIKE, MUMPS, and Pardiso are presented for the G3_circuit system. The platform we use is an Intel Xeon (X5560, 2.8 GHz) cluster with InfiniBand interconnect and 16 GB memory per node. In PSPIKE, BiCGStab [43] is used as the outer iterative solver, and the iterations are stopped when ∥f − Ax∥/∥f∥ drops below the stopping tolerance. The reduced system is truncated to enhance parallelism and solved directly.

Fig. 10 The speed improvement for G3_circuit compared to Pardiso using one core (73.5 seconds)
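Returning to the band-extraction step mentioned above, a minimal SciPy sketch might look as follows; the half-bandwidth k would be chosen from the reordered sparsity pattern, and the function name is illustrative.

```python
import numpy as np
import scipy.sparse as sp

def banded_preconditioner(A, k):
    """Keep only the entries of the (reordered) matrix A with |i - j| <= k."""
    coo = A.tocoo()
    keep = np.abs(coo.row - coo.col) <= k
    M = sp.csr_matrix((coo.data[keep], (coo.row[keep], coo.col[keep])),
                      shape=A.shape)
    return M  # factor M once, then reuse it inside the outer Krylov iteration
```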
3.2 Domain Decomposing Parallel Sparse Solver

Given a general sparse linear system Ax = f, we partition A ∈ R^{n×n} into p block rows, A = [A_1, A_2, ..., A_p]^T. Let

A = D + R, (10)
where D consists of the p diagonal blocks of A,

D = diag(A_11, A_22, ..., A_pp), (11)

and R consists of the remaining elements (i.e., R = A − D). We note that this process can be viewed as an algebraic domain decomposition; therefore, we call the method described in this section the Domain Decomposing Parallel Sparse Solver (DDPS). The DS factorization in DDPS is used for solving the systems involving the preconditioner. Let L̃_i and Ũ_i be the incomplete or complete LU factors of A_ii, for i = 1, 2, ..., p. We define

D̃ = diag(Ã_11, Ã_22, ..., Ã_pp), (12)

in which Ã_ii = L̃_i Ũ_i. The DDPS algorithm is shown in Figure 11.

Data: Ax = f and p
Result: x
1. D + R ← A for a given p;
2. L̃_i Ũ_i ← A_ii (approximate or exact) for i = 1, 2, ..., p;
3. R̃ ← R (by dropping some elements);
4. S ← D̃^{-1} R̃;
5. identify the nonzero columns of S and store their indices in the array c;
6. solve Ax = f via a Krylov subspace method with the preconditioner P = D̃ + R̃ and stopping tolerance ε_out; each preconditioning step solves Pz = y (D̃^{-1} P z = D̃^{-1} y, i.e., (I + S)z = g):
   6.1. g ← D̃^{-1} y;
   6.2. Ŝ ← (I(c, c) + S(c, c)); ẑ ← z(c); ĝ ← g(c);
   6.3. solve the smaller independent system Ŝ ẑ = ĝ (directly, or iteratively with stopping tolerance ε_in);
   6.4. z(c) ← ẑ;
   6.5. z ← g − S z;

Fig. 11 DDPS algorithm.
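A dense NumPy sketch of the preconditioning step Pz = y (stages 6.1-6.5) is given below, under the simplifying assumptions of exact block factorizations and no dropping, so that P = A and the result can be verified directly; all names are illustrative.

```python
import numpy as np

def ddps_apply(Dt, S, c, y):
    """Solve (I + S) z = g, g = Dt^{-1} y, via the reduced system on columns c."""
    g = np.linalg.solve(Dt, y)                  # stage 6.1
    Shat = np.eye(len(c)) + S[np.ix_(c, c)]     # stage 6.2
    zhat = np.linalg.solve(Shat, g[c])          # stage 6.3 (direct solve here)
    z = g - S[:, c] @ zhat                      # stage 6.5 (S has nonzeros
    z[c] = zhat                                 # stage 6.4  only in columns c)
    return z

# Verify on a small system: with exact factorizations and no dropping, P = A.
rng = np.random.default_rng(2)
n, p, m = 9, 3, 3
A = np.diag(rng.uniform(4.0, 5.0, n))
A[0, 5], A[7, 2] = 1.0, -1.0                    # couplings outside the blocks
Dt = np.zeros_like(A)
for i in range(p):
    s = slice(i * m, (i + 1) * m)
    Dt[s, s] = A[s, s]                          # exact diagonal blocks
R = A - Dt
S = np.linalg.solve(Dt, R)
c = np.flatnonzero(np.abs(S).sum(axis=0))       # stage 5: nonzero columns of S
y = rng.standard_normal(n)
print(np.allclose(A @ ddps_apply(Dt, S, c, y), y))  # True
```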
Stages 1-5 are preprocessing stages in which the right-hand side is not required. After preprocessing, we solve the system via a preconditioned Krylov subspace method. The major operations in a Krylov subspace method are (i) matrix-vector multiplications, (ii) inner products, and (iii) preconditioning operations of the form Pz = y. Details of the preconditioning operations for DDPS are given in Figure 11. Each stage, with the exception of Stage 6.3, can be executed with perfect parallelism, requiring no interprocessor communication. In Stage 6.3, the solution of the smaller system Ŝ ẑ = ĝ is the only part of the algorithm that requires communication. We solve this smaller reduced system iteratively via BiCGStab without preconditioning. The size of Ŝ, which is determined by the number of nonzero columns of S, is problem dependent and is expected to influence the overall scalability of the algorithm.

We employ several techniques to reduce the dimension of Ŝ. First, we use METIS [19] reordering to minimize the total communication volume, hence reducing the size of Ŝ. We also use the following dropping strategy: if, for any column j in R_i, ∥R_i(:, j)∥ ≤ δ max_l ∥R_i(:, l)∥ (i = 1, 2, ..., p), we do not consider that column when forming Ŝ. Here R_i is the block row partition of R (i.e., R = [R_1, R_2, ..., R_p]^T). We call this dropping strategy a priori dropping and use it for obtaining the results in this section; a small sketch of this test is given at the end of this section. Another possibility is to drop elements after computing S (a posteriori dropping). We note that dropping elements from R in Stage 3 to reduce the size of Ŝ results in an approximation of the solution. Furthermore, we can use approximate LU factorizations of the diagonal blocks in Stage 2 and solve Ŝ ẑ = ĝ iteratively in Stage 6.3. Therefore, we place an outer iterative layer (e.g., BiCGStab) in which we use the above algorithm as a solver for systems involving the preconditioner P = D̃ + R̃, where R̃ consists only of the columns that are not dropped. We stop the outer iterations when the relative residual at the kth iteration satisfies ∥r_k∥/∥r_0∥ ≤ ε_out.

DDPS is a direct solver if (i) nothing is dropped from R, (ii) the exact LU factorization of each A_ii is computed, and (iii) Ŝ ẑ = ĝ is solved exactly. When DDPS is used as a direct solver, the outer iterative scheme is not required. The choices we make in Stages 2, 3, and 6.3 result in a solver that can be as robust as a direct solver, as scalable as an iterative solver, or anything in between. We note that the outer iterative layer also benefits from our partitioning strategy, as METIS minimizes the total communication volume in parallel sparse matrix-vector multiplications. We further note that Ŝ consists of dense columns, which we store as a two-dimensional array in memory; as a result, matrix-vector multiplications can be done via BLAS2 (or BLAS3 in the case of multiple right-hand sides).

We obtained the torso3 matrix (259,156 unknowns and 4,429,042 nonzeros) from the University of Florida Sparse Matrix Collection. In Figure 12 we present, for this matrix, the speed improvements of the DDPS and MUMPS solvers compared to Pardiso on a single core of an Intel Xeon (X5560, 2.8 GHz) cluster with InfiniBand interconnect and 16 GB memory per node, with a fixed outer stopping tolerance ε_out. In this example, DDPS uses Pardiso for solving the systems involving the diagonal blocks of D̃.
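The a priori dropping test referenced above admits a short sketch; the choice of norm is an assumption, as the text leaves it unspecified.

```python
import numpy as np

def kept_columns(Ri, delta):
    """Indices of columns of the block row R_i that survive a priori dropping."""
    norms = np.linalg.norm(Ri, ord=np.inf, axis=0)   # per-column max magnitude
    return np.flatnonzero(norms > delta * norms.max())
```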
Fig. 12 The speed improvement for torso3 compared to Pardiso using one core (49.4 seconds)

4 Conclusions

We have presented two alternative formulations of DS factorization based methods for the solution of sparse linear systems on parallel computing platforms. Performance comparisons to traditional LU factorization based parallel solvers show that, by combining iterative methods with direct solvers and using the DS factorization, one can achieve better scalability and shorter time to solution.

Acknowledgements I would like to thank Ahmed Sameh, Ananth Grama, Faisal Saied, Eric Cox, Kenji Takizawa, Madan Sathe, Mehmet Koyuturk, Olaf Schenk, and Tayfun Tezduyar for their help and many useful discussions.

References

1. P. R. AMESTOY AND I. S. DUFF, Multifrontal parallel distributed symmetric and unsymmetric solvers, Comput. Methods Appl. Mech. Eng., 184 (2000).
2. P. R. AMESTOY, I. S. DUFF, J.-Y. L'EXCELLENT, AND J. KOSTER, A fully asynchronous multifrontal solver using distributed dynamic scheduling, SIAM J. Matrix Anal. Appl., 23 (2001).
3. P. R. AMESTOY, A. GUERMOUCHE, J.-Y. L'EXCELLENT, AND S. PRALET, Hybrid scheduling for the parallel solution of linear systems, Parallel Comput., 32 (2006).
4. E. ANDERSON, Z. BAI, C. BISCHOF, J. DEMMEL, J. DONGARRA, J. DU CROZ, A. GREENBAUM, S. HAMMARLING, A. MCKENNEY, AND D. SORENSEN, LAPACK: A portable linear algebra library for high-performance computers, tech. report, Knoxville.
5. S. T. BARNARD, A. POTHEN, AND H. SIMON, A spectral algorithm for envelope reduction of sparse matrices, Numerical Linear Algebra with Applications, 2 (1995).
6. M. W. BERRY AND A. SAMEH, Multiprocessor schemes for solving block tridiagonal linear systems, The International Journal of Supercomputer Applications, 1 (1988).
7. L. S. BLACKFORD, J. CHOI, A. CLEARY, E. D'AZEVEDO, J. DEMMEL, I. DHILLON, J. DONGARRA, S. HAMMARLING, G. HENRY, A. PETITET, K. STANLEY, D. WALKER, AND R. C. WHALEY, ScaLAPACK: a linear algebra library for message-passing computers, in Proceedings of the Eighth SIAM Conference on Parallel Processing for Scientific Computing (Minneapolis, MN, 1997), Philadelphia, PA, USA, 1997, Society for Industrial and Applied Mathematics, p. 15 (electronic).
8. T. A. DAVIS, University of Florida sparse matrix collection, NA Digest.
9. J. J. DONGARRA AND A. H. SAMEH, On some parallel banded system solvers, Parallel Computing, 1 (1984).
10. I. S. DUFF AND J. KOSTER, On algorithms for permuting large entries to the diagonal of a sparse matrix.
11. I. S. DUFF AND J. KOSTER, The design and use of algorithms for permuting large entries to the diagonal of sparse matrices, SIAM Journal on Matrix Analysis and Applications, 20 (1999).
12. M. FIEDLER, Algebraic connectivity of graphs, Czechoslovak Mathematical Journal, 23 (1973).
13. A. GUPTA, Recent advances in direct methods for solving unsymmetric sparse systems of linear equations, ACM Transactions on Mathematical Software, 28 (2002).
14. A. GUPTA, S. KORIC, AND T. GEORGE, Sparse matrix factorization on massively parallel computers, in SC '09 USB Key, ACM/IEEE, Portland, OR, USA, Nov. 2009.
15. X. HE, H. ZHA, C. H. DING, AND H. D. SIMON, Web document clustering using hyperlink structures, Computational Statistics & Data Analysis, 41 (2002).
16. D. J. HIGHAM, G. KALNA, AND M. KIBBLE, Spectral clustering and its use in bioinformatics, Journal of Computational and Applied Mathematics, 204 (2007). Special issue dedicated to Professor Shinnosuke Oharu on the occasion of his 65th birthday.
17. HSL, A collection of Fortran codes for large-scale scientific computation.
18. Y. HU AND J. SCOTT, HSL MC73: a fast multilevel Fiedler and profile reduction code, Technical Report RAL-TR, Rutherford Appleton Laboratory.
19. G. KARYPIS AND V. KUMAR, METIS: Unstructured graph partitioning and sparse matrix ordering system, The University of Minnesota, 2 (1995).
20. S. KUNDU, D. SORENSEN, AND G. N. PHILLIPS, Automatic domain decomposition of proteins by a Gaussian network model, Proteins: Structure, Function, and Bioinformatics, 57 (2004).
21. D. H. LAWRIE AND A. H. SAMEH, The computation and communication complexity of a parallel banded system solver, ACM Trans. Math. Softw., 10 (1984).
22. X. LI AND J. W. DEMMEL, SuperLU_DIST: A scalable distributed-memory sparse direct solver for unsymmetric linear systems, ACM Trans. Mathematical Software, 29 (2003).
23. M. MANGUOGLU, A parallel hybrid sparse linear system solver, in Computational Electromagnetics International Workshop, CEM 2009, July 2009.
24. M. MANGUOGLU, A domain decomposing parallel sparse linear system solver, Journal of Computational and Applied Mathematics (in review).
25. M. MANGUOGLU, E. COX, F. SAIED, AND A. SAMEH, TraceMin-Fiedler: A parallel algorithm for computing the Fiedler vector, High Performance Computing for Computational Science, VECPAR 2010, (2011).
26. M. MANGUOGLU, M. KOYUTURK, A. SAMEH, AND A. GRAMA, Weighted matrix ordering and parallel banded preconditioners for iterative linear system solvers, SIAM Journal on Scientific Computing, 32 (2010).
27. M. MANGUOGLU, A. SAMEH, AND O. SCHENK, PSPIKE: A parallel hybrid sparse linear system solver, Euro-Par 2009 Parallel Processing, (2009).
28. M. MANGUOGLU, K. TAKIZAWA, A. SAMEH, AND T. TEZDUYAR, Nested and parallel sparse algorithms for arterial fluid mechanics computations with boundary layer mesh refinement, International Journal for Numerical Methods in Fluids, 65 (2011).
29. A. Y. NG, M. I. JORDAN, AND Y. WEISS, On spectral clustering: Analysis and an algorithm, in Advances in Neural Information Processing Systems 14, MIT Press, 2001.
30. E. POLIZZI AND A. H. SAMEH, A parallel hybrid banded system solver: the Spike algorithm, Parallel Comput., 32 (2006).
31. ———, SPIKE: A parallel environment for solving banded linear systems, Computers & Fluids, 36 (2007).
32. A. POTHEN, H. D. SIMON, AND K.-P. LIOU, Partitioning sparse matrices with eigenvectors of graphs, SIAM J. Matrix Anal. Appl., 11 (1990).
33. H. QIU AND E. R. HANCOCK, Graph matching and clustering using spectral partitions, Pattern Recognition, 39 (2006).
34. S. C. CHEN, D. J. KUCK, AND A. H. SAMEH, Practical parallel band triangular system solvers, ACM Transactions on Mathematical Software, 4 (1978).
35. A. SAMEH AND Z. TONG, The trace minimization method for the symmetric generalized eigenvalue problem, J. Comput. Appl. Math., 123 (2000).
36. A. H. SAMEH AND D. J. KUCK, On stable parallel linear system solvers, J. ACM, 25 (1978).
37. A. H. SAMEH AND V. SARIN, Hybrid parallel linear system solvers, International Journal of Computational Fluid Dynamics, 12 (1999).
38. A. H. SAMEH AND J. A. WISNIEWSKI, A trace minimization algorithm for the generalized eigenvalue problem, SIAM Journal on Numerical Analysis, 19 (1982).
39. O. SCHENK AND K. GÄRTNER, Solving unsymmetric sparse systems of linear equations with PARDISO, Future Generation Computer Systems, 20 (2004).
40. ———, On fast factorization pivoting methods for sparse symmetric indefinite systems, Electronic Transactions on Numerical Analysis, 23 (2006).
41. O. SCHENK, M. MANGUOGLU, A. SAMEH, M. CHRISTEN, AND M. SATHE, Parallel scalable PDE-constrained optimization: antenna identification in hyperthermia cancer treatment planning, Computer Science Research and Development, 23 (2009).
42. S. J. SHEPHERD, C. B. BEGGS, AND S. JONES, Amino acid partitioning using a Fiedler vector model, European Biophysics Journal, 37 (2007).
43. H. A. VAN DER VORST, Bi-CGSTAB: a fast and smoothly converging variant of Bi-CG for the solution of nonsymmetric linear systems, SIAM J. Sci. Stat. Comput., 13 (1992).