A dissection solver with kernel detection for unsymmetric matrices in FreeFem++
|
|
- Cecilia Stanley
- 5 years ago
- Views:
Transcription
1 . p.1/21 11 Dec. 2014, LJLL, Paris FreeFem++ workshop A dissection solver with kernel detection for unsymmetric matrices in FreeFem++ Atsushi Suzuki Atsushi.Suzuki@ann.jussieu.fr Joint work with François-Xavier Roux, LJLL-UPMC, ONERA,
2 . p.2/21 Sparse direct solvers Software parallel elimination data management kernel environment strategy and pivoting detection SuperLU_MT shared super-nodal dynamic data + pivoting no Pardiso shared super-nodal dynamic data + ε-perturbation no + pivoting SuperLU_DIST distributed super-nodal static data + ε-perturbation none MUMPS distributed multi-frontal dynamic data + pivoting yes J. W. Demmel, S. C. Eisenstat, J. R. Gilbert, X. S. Li, J. W. H. Liu. A supernodal approach to sparse partial pivoting, SIAM Journal on Matrix Analysis and Applications, 20 (1999), O. Schenk, K. Gärtner. Solving unsymmetric sparse systems of liner equations with PARDISO, Future Generation of Computer Systems, 20 (2004), X. S. Li, J. W. Demmel. SuperLU_DIST : A scalable distributed-memory sparse direct solver for unsymmetric linear systems, ACM Transactions on Mathematical Software, 29 (200), P. R. Amestoy, I. S. Duff, J.-Y. L Execellent. Mutlifrontal parallel distributed symmetric and unsymmetric solvers, Computer Methods in Applied Mechanics and Engineering, 184 (2000)
3 Sparse direct solvers Software parallel elimination data management kernel environment strategy and pivoting detection SuperLU_MT shared super-nodal dynamic data + pivoting no Pardiso shared super-nodal dynamic data + ε-perturbation no + pivoting SuperLU_DIST distributed super-nodal static data + ε-perturbation none MUMPS distributed multi-frontal dynamic data + pivoting yes Dissection shared multi-frontal static data + pivoting yes J. W. Demmel, S. C. Eisenstat, J. R. Gilbert, X. S. Li, J. W. H. Liu. A supernodal approach to sparse partial pivoting, SIAM Journal on Matrix Analysis and Applications, 20 (1999), O. Schenk, K. Gärtner. Solving unsymmetric sparse systems of liner equations with PARDISO, Future Generation of Computer Systems, 20 (2004), X. S. Li, J. W. Demmel. SuperLU_DIST : A scalable distributed-memory sparse direct solver for unsymmetric linear systems, ACM Transactions on Mathematical Software, 29 (200), P. R. Amestoy, I. S. Duff, J.-Y. L Execellent. Mutlifrontal parallel distributed symmetric and unsymmetric solvers, Computer Methods in Applied Mechanics and Engineering, 184 (2000) A. Suzuki, F.-X. Roux, A dissection solver with kernel detection for symmetric finite element matrices on shared memory computers, International Journal for Numerical Methods in Engineering, 100 (2014) p./21
4 . p.4/21 Overview of Dissection solver Dissection based on graph decomposition of sparse matrix 1 dense solver a 5 b e 7 f 2 dense solver c a 5 b c 6 d e 7 f dense solver sparse solver d each leaf can be computed in parallel sparse solver (tridiag) is written with Fortran90 by F.-X. Roux? load unbalance? different sizes of subdomains? parallel computation of higher levels : smaller subdomains with many cores/processors?
5 . p.5/21 Overview : recursive generation of Schur complement 8 9 a b c d e f aa a5 a bb b5 b2 b1 Schur complement cc dd c6 d6 d c1 d1 by sparse solver ee e7 e e a 5b ff f f Schur complement by dense solver 6c 6d Schur complement a 2b 7e 7f by dense solver d e f b 1c 1d 1e dense factorization there is no 6 block in sparse matrix. 6 in Schur complement has values as result of fill-in. This is analyzed in symbolic phase.
6 . p.6/21 Overview : graph decomposition by METIS/SCOTCH matrix size = 206,76, bisection level = 8, sub-blocks = 255 METIS SCOTCH number of nonzeros in dense blocks METIS SCOTCH diag 82,659,78 66,956,787 off-diag 18,097, ,858,81 total 265,79,487 22,815,168
7 . p.7/21 Block factorization and block pivot strategy block pivot strategy is employed for parallel efficiency Π T 1 Π T L L 21 L D D U 1 1 U 1 2 U U 2 2 U Π Π Π T L 1 L 2 L D U Π k k LDU LDU LDU 2 LDU 2 LDU LDU When null-pivots appear in a block, they are sent to the last, and corresponding rows and columns are nullified for rank (n k) update. 0
8 . p.8/21 Dissection solver : parallelization of dense blocks T LDL factorization A 11 A 12 A 1 A 14 dtrsm + dcaling A 22 A 2 A 24 A A 4 A 22 A 2 A 24 A A 4 -= A 21 A 1 A 11-1 A 12 A 1 A 14 A 44 A 44 A 41 dgemm α (1) : factorization L 11 D 11 L T 11 = A(1) 11 β (1) k : DTRSM + scaling D 1 11 (L 1 11 A(1) 1 k ) γ (1) k, l : DGEMM A k, l ---= (L 1 11 A 1 k) T D 1 11 (L 1 11 A(1) 1 l ) α (2) : factorization L 22 D 22 L T 22 = A(2) 22 α (1) {β (1) 2, β(1), β(1) 4 } {γ(1) 2,2, γ(1) 2,, γ(1),,,γ(1), β(2) 4 } {γ(2),, γ(2) - updating of Schur complement depends on factorization sequential among levels 4,4 } α(2) {β (2) α (1) {β (1) 2 -γ(1) 2,2 -α(2), β (1), β(1) 4 } {γ(1) 2,, γ(1),,,γ(1) 4,4 } {β(2) -γ(2), -α(), β (2) 4 } {γ(2) + factorization of diagonal block // updating of rest of Schur complement,4, γ(2) 4,4 },4, γ(2) 4,4 }
9 . p.9/21 Task scheduling in Dissection solver n = 206, 76 4?,5?,6?,7? 2?,?,1? DTRSM DGEMM SUBTR α β γ Westmere Xeon 5680@.GHz, 6core x2, POSIX threads 2?,?,1? DTRSM α 21,1 DGEMM SUBTR β γ factorization 22, factorization 11 αβγ αβγ computation of Schur complement in level, factorization in level 2 : scheduled together. tasks on critical path : scheduled statically other tasks : executed in greedy way with large group using predicted complexity + fine greedy
10 . p.10/21 Performance comparison N =1,004,784 symmetric real matrix : 2.0GHz Dissection Intel Pardiso MUMPS + parablas core CPU elapsed / CPU elapsed / CPU elapsed / 1 5, , ,40.6 5,41.1 5, , ,64.7 2, , , ,547.5, , , ,40.9 1, , , , , , , , , , , , , ,88.7 1, automatic parallelization of BLAS by OpenMP is not efficient + Dissection uses coarse grain parallelization by Pthreads # core GFlop/s time for parallel tasks time for the numerical factorization elapsed time idle time of cores elapsed time CPU time GFlops /159.84GFlops = 60% : running on NUMA is another challenge
11 . p.11/21 Task scheduling and performance of Dissection solver n = 206, 76 Westmere Xeon 5680@.GHz, 6core x2, POSIX threads LDLt LinvU fdhlfs DTRSM fdgemm sdgemm DSUB2d DSUB2o DSUBmd DSUBmo Dfilld Dfillo SpLDLt SpSchr dealloc GFlop/s
12 . p.12/21 Detection of the kernel of matrix in Dissection solver kernel of stiffness matrix coarse space for domain decomposition theoretically the last block is singular for semi-positive definite matrices numerically all blocks may be nearly singular ε > 0 : given threshold for null pivot a i i / a i 1 i 1 < ε a i i is null pivot regular entry candidates of null pivot 11 Schur complement matrix from suspicious null pivots and additional nodes = the algorithm to compute dimension of the kernel
13 How to determine dimension of the kernel? 2 4Ã11 à 12 à 21 à 22 5 = 2 4Ã11 0 à 21 S I 1 à 1 0 I 2 11 Ã12 5 S22 = 0 Kerà = 2 4à 1 11 Ã12 I 2 5 numerical example: symmetric semi-positive definite, m + k = = 10 by Householder-QR factorization: Matrix Computation; Golub, Van Loan, e e e e e-02-9.e e e e e e e e e e e e e e e e e e e e e e e e e e e e e e e-1-8.0e e e e e e-1-5.e e e e e e e e e e e e e-14 how to set threshold to distinguish non-zero (1.28e-02) and zero (1.2e-11) values? = an algorithm by measuring dimension of a projection onto the image space and detecting existence of inverse of the matrix using higher precision arithmetic. p.1/21
14 Detection of the kernel of matrix (1/4) A R N N, dimkera = k 1, dimima m. two parameters: l, n, which define factorization, A 11 A 12 5 = 4 A I 1 A 1 11 A 12 5 A 11 R (N l) (N l) or R (N n) (N n). A 21 A 22 A 21 S 22 0 I 2 projection : P n : RN span 2 solution in subspace, Ā N l b = Ā Ā 1 11 A 12 I Ā 1 11 b 1 0 5, A 11 R (N n) (N n) 5, A 11 R (N l) (N l), b = 2 4 b 1 5 = b b N l b l : in quadruple-precision with perturbation to simulate double-precision round-off error. n : candidate of dimension of the kernel compute for l = n 1, n, n + 1 err (n) l := max max x=[0 x l ] 0 P n (Ā N l A x x) x, max x=[x N l 0] 0 5 Ā N l A x x x A 1 N k+1 n = k + 1 err n 1 0 err n 0 err n+1 1 n = k err n 1 0 err n 0 err n+1 1 n = k 1 err n 1 0 err n 0 err n+1 1 does not exist. err(k) k 1 is computable and is similar order of Ā 1 N A N I N.. p.14/21
15 S 22 0 Ā 1 A I p.15/21 Detection of the kernel of matrix (2/4) Ā 1 : in quadruple-precision with perturbation to simulate double-precision round-off error with decomposition of A into regular part A 1 1 R m m and others A 11 A I A I 11 A 1 11 A 125 A 21 A 22 A 21 Â 1 11 I 22 0 S 22 0 I 22 A 1 11 = U 1 11 D 1 11 L artificial perturbation with ε A A I 11 Â 1 11 A 12 4Ǎ I A 21 A 22 0 I 22 0 Š 1 22 A 21 Â 1 11 I 22 Ǎ 1 11 = U 1 11 D 1 11 L 1 11 within quadruple-precision A A A 11 A I A 21 A 22 A 21 A 22 0 I 2 2 = 4 (Ǎ 1 11 A 11 I 1 ) Â 1 11 A12Š 1 22 A 21(I 1 Â 1 11 A 11) Ǎ 1 11 A 12 Â 1 11 A12Š 1 Š 1 22 A 21(I 1 Â 1 11 A 11) Š 1 22 Ŝ22 I 2 Ŝ 22 := A 22 A 21 Â 1 11 A 12 Â 1 11 A 11 I , Ǎ 1 11 A 11 I 1 10, Š 1 22 S 22 I S 22 = 0 Ŝ Ā 1 A I 0 22 Ŝ22 5
16 Detection of the kernel of matrix (/4) Stokes equations with full-neumann bc., N = 199, 808, m = 4, τ = 10 2, eigenvalues by Householder-QR [D] i : diagonal entry [D] 1 i k err (k) k 1 err (k) k err (k) k dim. of kernel = 5 dim. of kernel = 6 dim. of kernel = 7. p.16/
17 . p.17/21 Detection of the kernel of matrix (4/4) stationary Navier-Stokes equations, Re = 12,800, N = 47,886, m = 4, τ = 10 2, matrix by Householder QR factorization e e e e e e e e e e e e e e e e e e e e e-16 k err (k) k 1 err (k) k err (k) k dim. of kernel = 1 dim. of kernel =
18 . p.18/21 dependency of kernel detection on a parameter in MUMPS Threshold for null pivots by CNTL() automatic elstct e-14 kernel error 4.817e e e+0 residual e e e-15 stokes e-21 kernel NA ( ) error e e e e+1 residual e e e e-12 elstct e-16 kernel error e e+0.184e e+0 residual 1.827e e e e-15 Koutsovasilis/F e-14 kernel error 1.809e e e e+0 residual 1.557e e e e-16 ( ) stokes1 with CNTL()= 10 exceeds the memory limitation.
19 . p.19/21 memory consumption: Dissection and Intel Pardiso real 206,76 sym Dissection Intel Pardiso min max avg min max avg 1 thread fact (+2%).977 (+25%) fw/bw (+25%) threads fact (+2%) (+17%) fw/bw (+16%) threads fact (+12%) (+7%) fw/bw (+6%) threads fact (+17%) (+5%) fw/bw (+0%) real 206,76 unsym Dissection Intel Pardiso min max avg min max avg 1 thread fact (+17%) (+10%) fw/bw (+9%) threads fact (+1%) (+8%) fw/bw (+7%) threads fact (+11%) (+4%) fw/bw (+%) threads fact (+9%) (+1%) fw/bw (-%)
20 . p.20/21 FreeFem++ example stationary Navier-Stokes equations by Newton-Raphson iteration : cavitynewtow.edp find (u, p) V g L 2 0 (Ω) a(u, v) + a 1 (u, u, v) + b(v, p) = 0 b(u, q) ε(p, q) = 0 ε = load "Dissection" defaulttodissection(); XXMh [up1,up2,pp]; varf vdns ([u1,u2,p],[v1,v2,q])= int2d(th)(nu*(dx(u1)*dx(v1)+dy(u1)*dy(v1) + dx(u2)*dx(v2)+dy(u2)*dy(v2)) // + p*q*1.e-6 + p*dx(v1)+p*dy(v2) - dx(u1)*q-dy(u2)*q + Ugrad(u1,u2,up1,up2) *[v1,v2] + Ugrad(up1,up2,u1,u2) *[v1,v2]) + on(1,2,,4,u1=0,u2=0); //... up1[]=u1[]; matrix Ans=vDNS(XXMh,XXMh,tgv=-1); set(ans,solver=sparsesolver,tgv=-1, numthreads=4,tolpivot=1.0e-2); real[int] b=vns(0,xxmh,tgv=-1); real[int] w=ansˆ-1*b; u1[]-=w;
21 . p.21/21 summary diss_int(pointer dslv, real_or_complex, numthreads) diss_s_fact(pointer dslv, dim, CSR_symboic_data, sym, decomposer, level) diss_n_fact(pointer dslv, matrix_coef, scaling, tolpivot) diss_get_kern_dim(pointer dslv, kern_dim) diss_get_kern_vecs(pointer dslv, vectors) diss_fee(pointer dslv) written in Dissection.cpp for FreeFem++ load module. sparse matrix is stored in CSR format with structurally symmetric pattern. symmetric matrix is stored in either upper or lower. real symmetric/unsymmetric with kernel detection. complex general matrix without kernel detection. symbolic factorization contains scheduling of parallel tasks in addition to ordering by SCOTCH/METIS and fill-in analysis. SCOTCH/METIS sometimes does not produce aligned bisection tree. symbolic factorization phase is slower than one of UMFPACK double-double quadruple precision arithmetic library will replace Fortran90 REAL(16) usage.
A dissection solver with kernel detection for symmetric finite element matrices on shared memory computers
A dissection solver with kernel detection for symmetric finite element matrices on shared memory computers Atsushi Suzuki, François-Xavier Roux To cite this version: Atsushi Suzuki, François-Xavier Roux.
More informationDissection sparse direct solver for indefinite finite element matrices and application to a semi-conductor problem
10th FreeFem++ days 2018, 13 Dec. Dissection sparse direct solver for indefinite finite element matrices and application to a semi-conductor problem Atsushi Suzuki 1 1 Cybermedia Center, Osaka University
More informationUsing Random Butterfly Transformations to Avoid Pivoting in Sparse Direct Methods
Using Random Butterfly Transformations to Avoid Pivoting in Sparse Direct Methods Marc Baboulin 1, Xiaoye S. Li 2 and François-Henry Rouet 2 1 University of Paris-Sud, Inria Saclay, France 2 Lawrence Berkeley
More informationSparse linear solvers
Sparse linear solvers Laura Grigori ALPINES INRIA and LJLL, UPMC On sabbatical at UC Berkeley March 2015 Plan Sparse linear solvers Sparse matrices and graphs Classes of linear solvers Sparse Cholesky
More informationIncomplete Cholesky preconditioners that exploit the low-rank property
anapov@ulb.ac.be ; http://homepages.ulb.ac.be/ anapov/ 1 / 35 Incomplete Cholesky preconditioners that exploit the low-rank property (theory and practice) Artem Napov Service de Métrologie Nucléaire, Université
More informationNumerical simulation of the Gross-Pitaevskii equation by pseudo-spectral and finite element methods comparison of GPS code and FreeFem++
. p.1/13 10 ec. 2014, LJLL, Paris FreeFem++ workshop : BECASIM session Numerical simulation of the Gross-Pitaevskii equation by pseudo-spectral and finite element methods comparison of GPS code and FreeFem++
More informationOn the design of parallel linear solvers for large scale problems
On the design of parallel linear solvers for large scale problems Journée problème de Poisson, IHP, Paris M. Faverge, P. Ramet M. Faverge Assistant Professor Bordeaux INP LaBRI Inria Bordeaux - Sud-Ouest
More informationMaximum-weighted matching strategies and the application to symmetric indefinite systems
Maximum-weighted matching strategies and the application to symmetric indefinite systems by Stefan Röllin, and Olaf Schenk 2 Technical Report CS-24-7 Department of Computer Science, University of Basel
More informationIMPROVING THE PERFORMANCE OF SPARSE LU MATRIX FACTORIZATION USING A SUPERNODAL ALGORITHM
IMPROVING THE PERFORMANCE OF SPARSE LU MATRIX FACTORIZATION USING A SUPERNODAL ALGORITHM Bogdan OANCEA PhD, Associate Professor, Artife University, Bucharest, Romania E-mail: oanceab@ie.ase.ro Abstract:
More informationParallelization of Multilevel Preconditioners Constructed from Inverse-Based ILUs on Shared-Memory Multiprocessors
Parallelization of Multilevel Preconditioners Constructed from Inverse-Based ILUs on Shared-Memory Multiprocessors J.I. Aliaga 1 M. Bollhöfer 2 A.F. Martín 1 E.S. Quintana-Ortí 1 1 Deparment of Computer
More informationSPARSE SOLVERS POISSON EQUATION. Margreet Nool. November 9, 2015 FOR THE. CWI, Multiscale Dynamics
SPARSE SOLVERS FOR THE POISSON EQUATION Margreet Nool CWI, Multiscale Dynamics November 9, 2015 OUTLINE OF THIS TALK 1 FISHPACK, LAPACK, PARDISO 2 SYSTEM OVERVIEW OF CARTESIUS 3 POISSON EQUATION 4 SOLVERS
More informationScientific Computing
Scientific Computing Direct solution methods Martin van Gijzen Delft University of Technology October 3, 2018 1 Program October 3 Matrix norms LU decomposition Basic algorithm Cost Stability Pivoting Pivoting
More informationA Sparse QS-Decomposition for Large Sparse Linear System of Equations
A Sparse QS-Decomposition for Large Sparse Linear System of Equations Wujian Peng 1 and Biswa N. Datta 2 1 Department of Math, Zhaoqing University, Zhaoqing, China, douglas peng@yahoo.com 2 Department
More informationEnhancing Scalability of Sparse Direct Methods
Journal of Physics: Conference Series 78 (007) 0 doi:0.088/7-6596/78//0 Enhancing Scalability of Sparse Direct Methods X.S. Li, J. Demmel, L. Grigori, M. Gu, J. Xia 5, S. Jardin 6, C. Sovinec 7, L.-Q.
More informationA Column Pre-ordering Strategy for the Unsymmetric-Pattern Multifrontal Method
A Column Pre-ordering Strategy for the Unsymmetric-Pattern Multifrontal Method TIMOTHY A. DAVIS University of Florida A new method for sparse LU factorization is presented that combines a column pre-ordering
More informationV C V L T I 0 C V B 1 V T 0 I. l nk
Multifrontal Method Kailai Xu September 16, 2017 Main observation. Consider the LDL T decomposition of a SPD matrix [ ] [ ] [ ] [ ] B V T L 0 I 0 L T L A = = 1 V T V C V L T I 0 C V B 1 V T, 0 I where
More informationJ.I. Aliaga 1 M. Bollhöfer 2 A.F. Martín 1 E.S. Quintana-Ortí 1. March, 2009
Parallel Preconditioning of Linear Systems based on ILUPACK for Multithreaded Architectures J.I. Aliaga M. Bollhöfer 2 A.F. Martín E.S. Quintana-Ortí Deparment of Computer Science and Engineering, Univ.
More informationInformation Sciences Institute 22 June 2012 Bob Lucas, Gene Wagenbreth, Dan Davis, Roger Grimes and
Accelerating the Multifrontal Method Information Sciences Institute 22 June 2012 Bob Lucas, Gene Wagenbreth, Dan Davis, Roger Grimes {rflucas,genew,ddavis}@isi.edu and grimes@lstc.com 3D Finite Element
More informationAn evaluation of sparse direct symmetric solvers: an introduction and preliminary finding
Numerical Analysis Group Internal Report 24- An evaluation of sparse direct symmetric solvers: an introduction and preliminary finding J A Scott Y Hu N I M Gould November 8, 24 c Council for the Central
More informationSparse LU Factorization on GPUs for Accelerating SPICE Simulation
Nano-scale Integrated Circuit and System (NICS) Laboratory Sparse LU Factorization on GPUs for Accelerating SPICE Simulation Xiaoming Chen PhD Candidate Department of Electronic Engineering Tsinghua University,
More informationImprovements for Implicit Linear Equation Solvers
Improvements for Implicit Linear Equation Solvers Roger Grimes, Bob Lucas, Clement Weisbecker Livermore Software Technology Corporation Abstract Solving large sparse linear systems of equations is often
More informationDense Arithmetic over Finite Fields with CUMODP
Dense Arithmetic over Finite Fields with CUMODP Sardar Anisul Haque 1 Xin Li 2 Farnam Mansouri 1 Marc Moreno Maza 1 Wei Pan 3 Ning Xie 1 1 University of Western Ontario, Canada 2 Universidad Carlos III,
More informationRecent advances in sparse linear solver technology for semiconductor device simulation matrices
Recent advances in sparse linear solver technology for semiconductor device simulation matrices (Invited Paper) Olaf Schenk and Michael Hagemann Department of Computer Science University of Basel Basel,
More informationOn the design of parallel linear solvers for large scale problems
On the design of parallel linear solvers for large scale problems ICIAM - August 2015 - Mini-Symposium on Recent advances in matrix computations for extreme-scale computers M. Faverge, X. Lacoste, G. Pichon,
More informationA convex QP solver based on block-lu updates
Block-LU updates p. 1/24 A convex QP solver based on block-lu updates PP06 SIAM Conference on Parallel Processing for Scientific Computing San Francisco, CA, Feb 22 24, 2006 Hanh Huynh and Michael Saunders
More informationSOLVING SPARSE LINEAR SYSTEMS OF EQUATIONS. Chao Yang Computational Research Division Lawrence Berkeley National Laboratory Berkeley, CA, USA
1 SOLVING SPARSE LINEAR SYSTEMS OF EQUATIONS Chao Yang Computational Research Division Lawrence Berkeley National Laboratory Berkeley, CA, USA 2 OUTLINE Sparse matrix storage format Basic factorization
More informationMUMPS. The MUMPS library: work done during the SOLSTICE project. MUMPS team, Lyon-Grenoble, Toulouse, Bordeaux
The MUMPS library: work done during the SOLSTICE project MUMPS team, Lyon-Grenoble, Toulouse, Bordeaux Sparse Days and ANR SOLSTICE Final Workshop June MUMPS MUMPS Team since beg. of SOLSTICE (2007) Permanent
More informationA sparse multifrontal solver using hierarchically semi-separable frontal matrices
A sparse multifrontal solver using hierarchically semi-separable frontal matrices Pieter Ghysels Lawrence Berkeley National Laboratory Joint work with: Xiaoye S. Li (LBNL), Artem Napov (ULB), François-Henry
More informationContents. Preface... xi. Introduction...
Contents Preface... xi Introduction... xv Chapter 1. Computer Architectures... 1 1.1. Different types of parallelism... 1 1.1.1. Overlap, concurrency and parallelism... 1 1.1.2. Temporal and spatial parallelism
More information1 Overview. 2 Adapting to computing system evolution. 11 th European LS-DYNA Conference 2017, Salzburg, Austria
1 Overview Improving LSTC s Multifrontal Linear Solver Roger Grimes 3, Robert Lucas 3, Nick Meng 2, Francois-Henry Rouet 3, Clement Weisbecker 3, and Ting-Ting Zhu 1 1 Cray Incorporated 2 Intel Corporation
More informationComputing least squares condition numbers on hybrid multicore/gpu systems
Computing least squares condition numbers on hybrid multicore/gpu systems M. Baboulin and J. Dongarra and R. Lacroix Abstract This paper presents an efficient computation for least squares conditioning
More informationUtilisation de la compression low-rank pour réduire la complexité du solveur PaStiX
Utilisation de la compression low-rank pour réduire la complexité du solveur PaStiX 26 Septembre 2018 - JCAD 2018 - Lyon Grégoire Pichon, Mathieu Faverge, Pierre Ramet, Jean Roman Outline 1. Context 2.
More informationAMS526: Numerical Analysis I (Numerical Linear Algebra for Computational and Data Sciences)
AMS526: Numerical Analysis I (Numerical Linear Algebra for Computational and Data Sciences) Lecture 19: Computing the SVD; Sparse Linear Systems Xiangmin Jiao Stony Brook University Xiangmin Jiao Numerical
More informationMath 102, Winter Final Exam Review. Chapter 1. Matrices and Gaussian Elimination
Math 0, Winter 07 Final Exam Review Chapter. Matrices and Gaussian Elimination { x + x =,. Different forms of a system of linear equations. Example: The x + 4x = 4. [ ] [ ] [ ] vector form (or the column
More informationMultilevel low-rank approximation preconditioners Yousef Saad Department of Computer Science and Engineering University of Minnesota
Multilevel low-rank approximation preconditioners Yousef Saad Department of Computer Science and Engineering University of Minnesota SIAM CSE Boston - March 1, 2013 First: Joint work with Ruipeng Li Work
More informationToward black-box adaptive domain decomposition methods
Toward black-box adaptive domain decomposition methods Frédéric Nataf Laboratory J.L. Lions (LJLL), CNRS, Alpines Inria and Univ. Paris VI joint work with Victorita Dolean (Univ. Nice Sophia-Antipolis)
More informationStatic-scheduling and hybrid-programming in SuperLU DIST on multicore cluster systems
Static-scheduling and hybrid-programming in SuperLU DIST on multicore cluster systems Ichitaro Yamazaki University of Tennessee, Knoxville Xiaoye Sherry Li Lawrence Berkeley National Laboratory MS49: Sparse
More informationImplementation of a KKT-based active-set QP solver
Block-LU updates p. 1/27 Implementation of a KKT-based active-set QP solver ISMP 2006 19th International Symposium on Mathematical Programming Rio de Janeiro, Brazil, July 30 August 4, 2006 Hanh Huynh
More informationDELFT UNIVERSITY OF TECHNOLOGY
DELFT UNIVERSITY OF TECHNOLOGY REPORT 09-08 Preconditioned conjugate gradient method enhanced by deflation of rigid body modes applied to composite materials. T.B. Jönsthövel ISSN 1389-6520 Reports of
More informationSparse factorization using low rank submatrices. Cleve Ashcraft LSTC 2010 MUMPS User Group Meeting April 15-16, 2010 Toulouse, FRANCE
Sparse factorization using low rank submatrices Cleve Ashcraft LSTC cleve@lstc.com 21 MUMPS User Group Meeting April 15-16, 21 Toulouse, FRANCE ftp.lstc.com:outgoing/cleve/mumps1 Ashcraft.pdf 1 LSTC Livermore
More informationNumerical Methods in Matrix Computations
Ake Bjorck Numerical Methods in Matrix Computations Springer Contents 1 Direct Methods for Linear Systems 1 1.1 Elements of Matrix Theory 1 1.1.1 Matrix Algebra 2 1.1.2 Vector Spaces 6 1.1.3 Submatrices
More informationSolving Large Nonlinear Sparse Systems
Solving Large Nonlinear Sparse Systems Fred W. Wubs and Jonas Thies Computational Mechanics & Numerical Mathematics University of Groningen, the Netherlands f.w.wubs@rug.nl Centre for Interdisciplinary
More informationLarge Scale Sparse Linear Algebra
Large Scale Sparse Linear Algebra P. Amestoy (INP-N7, IRIT) A. Buttari (CNRS, IRIT) T. Mary (University of Toulouse, IRIT) A. Guermouche (Univ. Bordeaux, LaBRI), J.-Y. L Excellent (INRIA, LIP, ENS-Lyon)
More informationAccelerating linear algebra computations with hybrid GPU-multicore systems.
Accelerating linear algebra computations with hybrid GPU-multicore systems. Marc Baboulin INRIA/Université Paris-Sud joint work with Jack Dongarra (University of Tennessee and Oak Ridge National Laboratory)
More informationTowards parallel bipartite matching algorithms
Outline Towards parallel bipartite matching algorithms Bora Uçar CNRS and GRAAL, ENS Lyon, France Scheduling for large-scale systems, 13 15 May 2009, Knoxville Joint work with Patrick R. Amestoy (ENSEEIHT-IRIT,
More informationOpportunities for ELPA to Accelerate the Solution of the Bethe-Salpeter Eigenvalue Problem
Opportunities for ELPA to Accelerate the Solution of the Bethe-Salpeter Eigenvalue Problem Peter Benner, Andreas Marek, Carolin Penke August 16, 2018 ELSI Workshop 2018 Partners: The Problem The Bethe-Salpeter
More informationJacobi-Davidson Eigensolver in Cusolver Library. Lung-Sheng Chien, NVIDIA
Jacobi-Davidson Eigensolver in Cusolver Library Lung-Sheng Chien, NVIDIA lchien@nvidia.com Outline CuSolver library - cusolverdn: dense LAPACK - cusolversp: sparse LAPACK - cusolverrf: refactorization
More informationScalable Hybrid Programming and Performance for SuperLU Sparse Direct Solver
Scalable Hybrid Programming and Performance for SuperLU Sparse Direct Solver Sherry Li Lawrence Berkeley National Laboratory Piyush Sao Rich Vuduc Georgia Institute of Technology CUG 14, May 4-8, 14, Lugano,
More informationDirect and Incomplete Cholesky Factorizations with Static Supernodes
Direct and Incomplete Cholesky Factorizations with Static Supernodes AMSC 661 Term Project Report Yuancheng Luo 2010-05-14 Introduction Incomplete factorizations of sparse symmetric positive definite (SSPD)
More informationSparse BLAS-3 Reduction
Sparse BLAS-3 Reduction to Banded Upper Triangular (Spar3Bnd) Gary Howell, HPC/OIT NC State University gary howell@ncsu.edu Sparse BLAS-3 Reduction p.1/27 Acknowledgements James Demmel, Gene Golub, Franc
More informationSome Geometric and Algebraic Aspects of Domain Decomposition Methods
Some Geometric and Algebraic Aspects of Domain Decomposition Methods D.S.Butyugin 1, Y.L.Gurieva 1, V.P.Ilin 1,2, and D.V.Perevozkin 1 Abstract Some geometric and algebraic aspects of various domain decomposition
More informationRecent advances in HPC with FreeFem++
Recent advances in HPC with FreeFem++ Pierre Jolivet Laboratoire Jacques-Louis Lions Laboratoire Jean Kuntzmann Fourth workshop on FreeFem++ December 6, 2012 With F. Hecht, F. Nataf, C. Prud homme. Outline
More informationAccelerating computation of eigenvectors in the nonsymmetric eigenvalue problem
Accelerating computation of eigenvectors in the nonsymmetric eigenvalue problem Mark Gates 1, Azzam Haidar 1, and Jack Dongarra 1,2,3 1 University of Tennessee, Knoxville, TN, USA 2 Oak Ridge National
More informationChallenges for Matrix Preconditioning Methods
Challenges for Matrix Preconditioning Methods Matthias Bollhoefer 1 1 Dept. of Mathematics TU Berlin Preconditioning 2005, Atlanta, May 19, 2005 supported by the DFG research center MATHEON in Berlin Outline
More informationarxiv: v1 [cs.na] 20 Jul 2015
AN EFFICIENT SOLVER FOR SPARSE LINEAR SYSTEMS BASED ON RANK-STRUCTURED CHOLESKY FACTORIZATION JEFFREY N. CHADWICK AND DAVID S. BINDEL arxiv:1507.05593v1 [cs.na] 20 Jul 2015 Abstract. Direct factorization
More informationFAST STRUCTURED EIGENSOLVER FOR DISCRETIZED PARTIAL DIFFERENTIAL OPERATORS ON GENERAL MESHES
Proceedings of the Project Review, Geo-Mathematical Imaging Group Purdue University, West Lafayette IN, Vol. 1 2012 pp. 123-132. FAST STRUCTURED EIGENSOLVER FOR DISCRETIZED PARTIAL DIFFERENTIAL OPERATORS
More informationUsing Postordering and Static Symbolic Factorization for Parallel Sparse LU
Using Postordering and Static Symbolic Factorization for Parallel Sparse LU Michel Cosnard LORIA - INRIA Lorraine Nancy, France Michel.Cosnard@loria.fr Laura Grigori LORIA - Univ. Henri Poincaré Nancy,
More informationA DISTRIBUTED-MEMORY RANDOMIZED STRUCTURED MULTIFRONTAL METHOD FOR SPARSE DIRECT SOLUTIONS
A DISTRIBUTED-MEMORY RANDOMIZED STRUCTURED MULTIFRONTAL METHOD FOR SPARSE DIRECT SOLUTIONS ZIXING XIN, JIANLIN XIA, MAARTEN V. DE HOOP, STEPHEN CAULEY, AND VENKATARAMANAN BALAKRISHNAN Abstract. We design
More informationBlock Bidiagonal Decomposition and Least Squares Problems
Block Bidiagonal Decomposition and Least Squares Problems Åke Björck Department of Mathematics Linköping University Perspectives in Numerical Analysis, Helsinki, May 27 29, 2008 Outline Bidiagonal Decomposition
More informationAccelerating computation of eigenvectors in the dense nonsymmetric eigenvalue problem
Accelerating computation of eigenvectors in the dense nonsymmetric eigenvalue problem Mark Gates 1, Azzam Haidar 1, and Jack Dongarra 1,2,3 1 University of Tennessee, Knoxville, TN, USA 2 Oak Ridge National
More informationTowards a stable static pivoting strategy for the sequential and parallel solution of sparse symmetric indefinite systems
Technical Report RAL-TR-2005-007 Towards a stable static pivoting strategy for the sequential and parallel solution of sparse symmetric indefinite systems Iain S. Duff and Stéphane Pralet April 22, 2005
More informationNumerical Linear Algebra
Numerical Linear Algebra Decompositions, numerical aspects Gerard Sleijpen and Martin van Gijzen September 27, 2017 1 Delft University of Technology Program Lecture 2 LU-decomposition Basic algorithm Cost
More informationProgram Lecture 2. Numerical Linear Algebra. Gaussian elimination (2) Gaussian elimination. Decompositions, numerical aspects
Numerical Linear Algebra Decompositions, numerical aspects Program Lecture 2 LU-decomposition Basic algorithm Cost Stability Pivoting Cholesky decomposition Sparse matrices and reorderings Gerard Sleijpen
More informationParallel Polynomial Evaluation
Parallel Polynomial Evaluation Jan Verschelde joint work with Genady Yoffe University of Illinois at Chicago Department of Mathematics, Statistics, and Computer Science http://www.math.uic.edu/ jan jan@math.uic.edu
More informationJacobi-Based Eigenvalue Solver on GPU. Lung-Sheng Chien, NVIDIA
Jacobi-Based Eigenvalue Solver on GPU Lung-Sheng Chien, NVIDIA lchien@nvidia.com Outline Symmetric eigenvalue solver Experiment Applications Conclusions Symmetric eigenvalue solver The standard form is
More informationAPPLIED NUMERICAL LINEAR ALGEBRA
APPLIED NUMERICAL LINEAR ALGEBRA James W. Demmel University of California Berkeley, California Society for Industrial and Applied Mathematics Philadelphia Contents Preface 1 Introduction 1 1.1 Basic Notation
More informationThe design and use of a sparse direct solver for skew symmetric matrices
The design and use of a sparse direct solver for skew symmetric matrices Iain S. Duff May 6, 2008 RAL-TR-2006-027 c Science and Technology Facilities Council Enquires about copyright, reproduction and
More informationParallel Scalability of a FETI DP Mortar Method for Problems with Discontinuous Coefficients
Parallel Scalability of a FETI DP Mortar Method for Problems with Discontinuous Coefficients Nina Dokeva and Wlodek Proskurowski Department of Mathematics, University of Southern California, Los Angeles,
More informationParallel Algorithms for Solution of Large Sparse Linear Systems with Applications
Parallel Algorithms for Solution of Large Sparse Linear Systems with Applications Murat Manguoğlu Department of Computer Engineering Middle East Technical University, Ankara, Turkey Prace workshop: HPC
More informationParallel sparse linear solvers and applications in CFD
Parallel sparse linear solvers and applications in CFD Jocelyne Erhel Joint work with Désiré Nuentsa Wakam () and Baptiste Poirriez () SAGE team, Inria Rennes, France journée Calcul Intensif Distribué
More informationAn exact reanalysis algorithm using incremental Cholesky factorization and its application to crack growth modeling
INTERNATIONAL JOURNAL FOR NUMERICAL METHODS IN ENGINEERING Int. J. Numer. Meth. Engng 01; 91:158 14 Published online 5 June 01 in Wiley Online Library (wileyonlinelibrary.com). DOI: 10.100/nme.4 SHORT
More informationParallel Numerical Algorithms
Parallel Numerical Algorithms Chapter 6 Matrix Models Section 6.2 Low Rank Approximation Edgar Solomonik Department of Computer Science University of Illinois at Urbana-Champaign CS 554 / CSE 512 Edgar
More informationParallel scalability of a FETI DP mortar method for problems with discontinuous coefficients
Parallel scalability of a FETI DP mortar method for problems with discontinuous coefficients Nina Dokeva and Wlodek Proskurowski University of Southern California, Department of Mathematics Los Angeles,
More informationIndefinite and physics-based preconditioning
Indefinite and physics-based preconditioning Jed Brown VAW, ETH Zürich 2009-01-29 Newton iteration Standard form of a nonlinear system F (u) 0 Iteration Solve: Update: J(ũ)u F (ũ) ũ + ũ + u Example (p-bratu)
More informationAn Efficient FETI Implementation on Distributed Shared Memory Machines with Independent Numbers of Subdomains and Processors
Contemporary Mathematics Volume 218, 1998 B 0-8218-0988-1-03024-7 An Efficient FETI Implementation on Distributed Shared Memory Machines with Independent Numbers of Subdomains and Processors Michel Lesoinne
More informationQR Factorization of Tall and Skinny Matrices in a Grid Computing Environment
QR Factorization of Tall and Skinny Matrices in a Grid Computing Environment Emmanuel AGULLO (INRIA / LaBRI) Camille COTI (Iowa State University) Jack DONGARRA (University of Tennessee) Thomas HÉRAULT
More informationScaling and Pivoting in an Out-of-Core Sparse Direct Solver
Scaling and Pivoting in an Out-of-Core Sparse Direct Solver JENNIFER A. SCOTT Rutherford Appleton Laboratory Out-of-core sparse direct solvers reduce the amount of main memory needed to factorize and solve
More informationMS&E 318 (CME 338) Large-Scale Numerical Optimization
Stanford University, Management Science & Engineering (and ICME MS&E 38 (CME 338 Large-Scale Numerical Optimization Course description Instructor: Michael Saunders Spring 28 Notes : Review The course teaches
More informationPerformance and Scalability of the Block Low-Rank Multifrontal Factorization on Multicore Architectures
Performance and Scalability of the Block Low-Rank Multifrontal Factorization on Multicore Architectures Patrick R. Amestoy, Alfredo Buttari, Jean-Yves L Excellent, Théo Mary To cite this version: Patrick
More informationScientific Computing with Case Studies SIAM Press, Lecture Notes for Unit VII Sparse Matrix
Scientific Computing with Case Studies SIAM Press, 2009 http://www.cs.umd.edu/users/oleary/sccswebpage Lecture Notes for Unit VII Sparse Matrix Computations Part 1: Direct Methods Dianne P. O Leary c 2008
More informationA DISTRIBUTED-MEMORY RANDOMIZED STRUCTURED MULTIFRONTAL METHOD FOR SPARSE DIRECT SOLUTIONS
SIAM J. SCI. COMPUT. Vol. 39, No. 4, pp. C292 C318 c 2017 Society for Industrial and Applied Mathematics A DISTRIBUTED-MEMORY RANDOMIZED STRUCTURED MULTIFRONTAL METHOD FOR SPARSE DIRECT SOLUTIONS ZIXING
More informationSolution of Linear Equations
Solution of Linear Equations (Com S 477/577 Notes) Yan-Bin Jia Sep 7, 07 We have discussed general methods for solving arbitrary equations, and looked at the special class of polynomial equations A subclass
More informationSparse Linear Algebra: Direct Methods, advanced features
Sparse Linear Algebra: Direct Methods, advanced features P. Amestoy and A. Buttari (INPT(ENSEEIHT)-IRIT) A. Guermouche (Univ. Bordeaux-LaBRI), J.-Y. L Excellent and B. Uçar (INRIA-CNRS/LIP-ENS Lyon) F.-H.
More informationAdaptive Coarse Space Selection in BDDC and FETI-DP Iterative Substructuring Methods: Towards Fast and Robust Solvers
Adaptive Coarse Space Selection in BDDC and FETI-DP Iterative Substructuring Methods: Towards Fast and Robust Solvers Jan Mandel University of Colorado at Denver Bedřich Sousedík Czech Technical University
More informationBlock Low-Rank (BLR) approximations to improve multifrontal sparse solvers
Block Low-Rank (BLR) approximations to improve multifrontal sparse solvers Joint work with Patrick Amestoy, Cleve Ashcraft, Olivier Boiteau, Alfredo Buttari and Jean-Yves L Excellent, PhD started on October
More informationHigh Performance Parallel Tucker Decomposition of Sparse Tensors
High Performance Parallel Tucker Decomposition of Sparse Tensors Oguz Kaya INRIA and LIP, ENS Lyon, France SIAM PP 16, April 14, 2016, Paris, France Joint work with: Bora Uçar, CNRS and LIP, ENS Lyon,
More informationParallelism in FreeFem++.
Parallelism in FreeFem++. Guy Atenekeng 1 Frederic Hecht 2 Laura Grigori 1 Jacques Morice 2 Frederic Nataf 2 1 INRIA, Saclay 2 University of Paris 6 Workshop on FreeFem++, 2009 Outline 1 Introduction Motivation
More informationPreliminary/Qualifying Exam in Numerical Analysis (Math 502a) Spring 2012
Instructions Preliminary/Qualifying Exam in Numerical Analysis (Math 502a) Spring 2012 The exam consists of four problems, each having multiple parts. You should attempt to solve all four problems. 1.
More informationAn Accelerated Block-Parallel Newton Method via Overlapped Partitioning
An Accelerated Block-Parallel Newton Method via Overlapped Partitioning Yurong Chen Lab. of Parallel Computing, Institute of Software, CAS (http://www.rdcps.ac.cn/~ychen/english.htm) Summary. This paper
More informationThe Algorithm of Multiple Relatively Robust Representations for Multi-Core Processors
Aachen Institute for Advanced Study in Computational Engineering Science Preprint: AICES-2010/09-4 23/September/2010 The Algorithm of Multiple Relatively Robust Representations for Multi-Core Processors
More informationMultilevel spectral coarse space methods in FreeFem++ on parallel architectures
Multilevel spectral coarse space methods in FreeFem++ on parallel architectures Pierre Jolivet Laboratoire Jacques-Louis Lions Laboratoire Jean Kuntzmann DD 21, Rennes. June 29, 2012 In collaboration with
More informationHigh Performance Computing Using Out-of- Core Sparse Direct Solvers
High Performance Computing Using Out-of- Core Sparse Direct Solvers andhapati P. Raju and Siddhartha Khaitan Abstract In-core memory requirement is a bottleneck in solving large three dimensional Navier-Stokes
More informationJuly 18, Abstract. capture the unsymmetric structure of the matrices. We introduce a new algorithm which
An unsymmetrized multifrontal LU factorization Λ Patrick R. Amestoy y and Chiara Puglisi z July 8, 000 Abstract A well-known approach to compute the LU factorization of a general unsymmetric matrix A is
More informationParallel sparse direct solvers for Poisson s equation in streamer discharges
Parallel sparse direct solvers for Poisson s equation in streamer discharges Margreet Nool, Menno Genseberger 2 and Ute Ebert,3 Centrum Wiskunde & Informatica (CWI), P.O.Box 9479, 9 GB Amsterdam, The Netherlands
More informationMulticore Parallelization of Determinant Quantum Monte Carlo Simulations
Multicore Parallelization of Determinant Quantum Monte Carlo Simulations Andrés Tomás, Che-Rung Lee, Zhaojun Bai, Richard Scalettar UC Davis SIAM Conference on Computation Science & Engineering Reno, March
More informationApplications of Randomized Methods for Decomposing and Simulating from Large Covariance Matrices
Applications of Randomized Methods for Decomposing and Simulating from Large Covariance Matrices Vahid Dehdari and Clayton V. Deutsch Geostatistical modeling involves many variables and many locations.
More informationMatrix Assembly in FEA
Matrix Assembly in FEA 1 In Chapter 2, we spoke about how the global matrix equations are assembled in the finite element method. We now want to revisit that discussion and add some details. For example,
More informationAMS526: Numerical Analysis I (Numerical Linear Algebra)
AMS526: Numerical Analysis I (Numerical Linear Algebra) Lecture 1: Course Overview & Matrix-Vector Multiplication Xiangmin Jiao SUNY Stony Brook Xiangmin Jiao Numerical Analysis I 1 / 20 Outline 1 Course
More informationFinite element programming by FreeFem++ advanced course
1 / 37 Finite element programming by FreeFem++ advanced course Atsushi Suzuki 1 1 Cybermedia Center, Osaka University atsushi.suzuki@cas.cmc.osaka-u.ac.jp Japan SIAM tutorial 4-5, June 2016 2 / 37 Outline
More informationSparse Matrix Computations in Arterial Fluid Mechanics
Sparse Matrix Computations in Arterial Fluid Mechanics Murat Manguoğlu Middle East Technical University, Turkey Kenji Takizawa Ahmed Sameh Tayfun Tezduyar Waseda University, Japan Purdue University, USA
More information