A dissection solver with kernel detection for unsymmetric matrices in FreeFem++

Size: px
Start display at page:

Download "A dissection solver with kernel detection for unsymmetric matrices in FreeFem++"

Transcription

1 . p.1/21 11 Dec. 2014, LJLL, Paris FreeFem++ workshop A dissection solver with kernel detection for unsymmetric matrices in FreeFem++ Atsushi Suzuki Atsushi.Suzuki@ann.jussieu.fr Joint work with François-Xavier Roux, LJLL-UPMC, ONERA,

2 . p.2/21 Sparse direct solvers Software parallel elimination data management kernel environment strategy and pivoting detection SuperLU_MT shared super-nodal dynamic data + pivoting no Pardiso shared super-nodal dynamic data + ε-perturbation no + pivoting SuperLU_DIST distributed super-nodal static data + ε-perturbation none MUMPS distributed multi-frontal dynamic data + pivoting yes J. W. Demmel, S. C. Eisenstat, J. R. Gilbert, X. S. Li, J. W. H. Liu. A supernodal approach to sparse partial pivoting, SIAM Journal on Matrix Analysis and Applications, 20 (1999), O. Schenk, K. Gärtner. Solving unsymmetric sparse systems of liner equations with PARDISO, Future Generation of Computer Systems, 20 (2004), X. S. Li, J. W. Demmel. SuperLU_DIST : A scalable distributed-memory sparse direct solver for unsymmetric linear systems, ACM Transactions on Mathematical Software, 29 (200), P. R. Amestoy, I. S. Duff, J.-Y. L Execellent. Mutlifrontal parallel distributed symmetric and unsymmetric solvers, Computer Methods in Applied Mechanics and Engineering, 184 (2000)

3 Sparse direct solvers Software parallel elimination data management kernel environment strategy and pivoting detection SuperLU_MT shared super-nodal dynamic data + pivoting no Pardiso shared super-nodal dynamic data + ε-perturbation no + pivoting SuperLU_DIST distributed super-nodal static data + ε-perturbation none MUMPS distributed multi-frontal dynamic data + pivoting yes Dissection shared multi-frontal static data + pivoting yes J. W. Demmel, S. C. Eisenstat, J. R. Gilbert, X. S. Li, J. W. H. Liu. A supernodal approach to sparse partial pivoting, SIAM Journal on Matrix Analysis and Applications, 20 (1999), O. Schenk, K. Gärtner. Solving unsymmetric sparse systems of liner equations with PARDISO, Future Generation of Computer Systems, 20 (2004), X. S. Li, J. W. Demmel. SuperLU_DIST : A scalable distributed-memory sparse direct solver for unsymmetric linear systems, ACM Transactions on Mathematical Software, 29 (200), P. R. Amestoy, I. S. Duff, J.-Y. L Execellent. Mutlifrontal parallel distributed symmetric and unsymmetric solvers, Computer Methods in Applied Mechanics and Engineering, 184 (2000) A. Suzuki, F.-X. Roux, A dissection solver with kernel detection for symmetric finite element matrices on shared memory computers, International Journal for Numerical Methods in Engineering, 100 (2014) p./21

4 . p.4/21 Overview of Dissection solver Dissection based on graph decomposition of sparse matrix 1 dense solver a 5 b e 7 f 2 dense solver c a 5 b c 6 d e 7 f dense solver sparse solver d each leaf can be computed in parallel sparse solver (tridiag) is written with Fortran90 by F.-X. Roux? load unbalance? different sizes of subdomains? parallel computation of higher levels : smaller subdomains with many cores/processors?

5 . p.5/21 Overview : recursive generation of Schur complement 8 9 a b c d e f aa a5 a bb b5 b2 b1 Schur complement cc dd c6 d6 d c1 d1 by sparse solver ee e7 e e a 5b ff f f Schur complement by dense solver 6c 6d Schur complement a 2b 7e 7f by dense solver d e f b 1c 1d 1e dense factorization there is no 6 block in sparse matrix. 6 in Schur complement has values as result of fill-in. This is analyzed in symbolic phase.

6 . p.6/21 Overview : graph decomposition by METIS/SCOTCH matrix size = 206,76, bisection level = 8, sub-blocks = 255 METIS SCOTCH number of nonzeros in dense blocks METIS SCOTCH diag 82,659,78 66,956,787 off-diag 18,097, ,858,81 total 265,79,487 22,815,168

7 . p.7/21 Block factorization and block pivot strategy block pivot strategy is employed for parallel efficiency Π T 1 Π T L L 21 L D D U 1 1 U 1 2 U U 2 2 U Π Π Π T L 1 L 2 L D U Π k k LDU LDU LDU 2 LDU 2 LDU LDU When null-pivots appear in a block, they are sent to the last, and corresponding rows and columns are nullified for rank (n k) update. 0

8 . p.8/21 Dissection solver : parallelization of dense blocks T LDL factorization A 11 A 12 A 1 A 14 dtrsm + dcaling A 22 A 2 A 24 A A 4 A 22 A 2 A 24 A A 4 -= A 21 A 1 A 11-1 A 12 A 1 A 14 A 44 A 44 A 41 dgemm α (1) : factorization L 11 D 11 L T 11 = A(1) 11 β (1) k : DTRSM + scaling D 1 11 (L 1 11 A(1) 1 k ) γ (1) k, l : DGEMM A k, l ---= (L 1 11 A 1 k) T D 1 11 (L 1 11 A(1) 1 l ) α (2) : factorization L 22 D 22 L T 22 = A(2) 22 α (1) {β (1) 2, β(1), β(1) 4 } {γ(1) 2,2, γ(1) 2,, γ(1),,,γ(1), β(2) 4 } {γ(2),, γ(2) - updating of Schur complement depends on factorization sequential among levels 4,4 } α(2) {β (2) α (1) {β (1) 2 -γ(1) 2,2 -α(2), β (1), β(1) 4 } {γ(1) 2,, γ(1),,,γ(1) 4,4 } {β(2) -γ(2), -α(), β (2) 4 } {γ(2) + factorization of diagonal block // updating of rest of Schur complement,4, γ(2) 4,4 },4, γ(2) 4,4 }

9 . p.9/21 Task scheduling in Dissection solver n = 206, 76 4?,5?,6?,7? 2?,?,1? DTRSM DGEMM SUBTR α β γ Westmere Xeon 5680@.GHz, 6core x2, POSIX threads 2?,?,1? DTRSM α 21,1 DGEMM SUBTR β γ factorization 22, factorization 11 αβγ αβγ computation of Schur complement in level, factorization in level 2 : scheduled together. tasks on critical path : scheduled statically other tasks : executed in greedy way with large group using predicted complexity + fine greedy

10 . p.10/21 Performance comparison N =1,004,784 symmetric real matrix : 2.0GHz Dissection Intel Pardiso MUMPS + parablas core CPU elapsed / CPU elapsed / CPU elapsed / 1 5, , ,40.6 5,41.1 5, , ,64.7 2, , , ,547.5, , , ,40.9 1, , , , , , , , , , , , , ,88.7 1, automatic parallelization of BLAS by OpenMP is not efficient + Dissection uses coarse grain parallelization by Pthreads # core GFlop/s time for parallel tasks time for the numerical factorization elapsed time idle time of cores elapsed time CPU time GFlops /159.84GFlops = 60% : running on NUMA is another challenge

11 . p.11/21 Task scheduling and performance of Dissection solver n = 206, 76 Westmere Xeon 5680@.GHz, 6core x2, POSIX threads LDLt LinvU fdhlfs DTRSM fdgemm sdgemm DSUB2d DSUB2o DSUBmd DSUBmo Dfilld Dfillo SpLDLt SpSchr dealloc GFlop/s

12 . p.12/21 Detection of the kernel of matrix in Dissection solver kernel of stiffness matrix coarse space for domain decomposition theoretically the last block is singular for semi-positive definite matrices numerically all blocks may be nearly singular ε > 0 : given threshold for null pivot a i i / a i 1 i 1 < ε a i i is null pivot regular entry candidates of null pivot 11 Schur complement matrix from suspicious null pivots and additional nodes = the algorithm to compute dimension of the kernel

13 How to determine dimension of the kernel? 2 4Ã11 à 12 à 21 à 22 5 = 2 4Ã11 0 à 21 S I 1 à 1 0 I 2 11 Ã12 5 S22 = 0 Kerà = 2 4à 1 11 Ã12 I 2 5 numerical example: symmetric semi-positive definite, m + k = = 10 by Householder-QR factorization: Matrix Computation; Golub, Van Loan, e e e e e-02-9.e e e e e e e e e e e e e e e e e e e e e e e e e e e e e e e-1-8.0e e e e e e-1-5.e e e e e e e e e e e e e-14 how to set threshold to distinguish non-zero (1.28e-02) and zero (1.2e-11) values? = an algorithm by measuring dimension of a projection onto the image space and detecting existence of inverse of the matrix using higher precision arithmetic. p.1/21

14 Detection of the kernel of matrix (1/4) A R N N, dimkera = k 1, dimima m. two parameters: l, n, which define factorization, A 11 A 12 5 = 4 A I 1 A 1 11 A 12 5 A 11 R (N l) (N l) or R (N n) (N n). A 21 A 22 A 21 S 22 0 I 2 projection : P n : RN span 2 solution in subspace, Ā N l b = Ā Ā 1 11 A 12 I Ā 1 11 b 1 0 5, A 11 R (N n) (N n) 5, A 11 R (N l) (N l), b = 2 4 b 1 5 = b b N l b l : in quadruple-precision with perturbation to simulate double-precision round-off error. n : candidate of dimension of the kernel compute for l = n 1, n, n + 1 err (n) l := max max x=[0 x l ] 0 P n (Ā N l A x x) x, max x=[x N l 0] 0 5 Ā N l A x x x A 1 N k+1 n = k + 1 err n 1 0 err n 0 err n+1 1 n = k err n 1 0 err n 0 err n+1 1 n = k 1 err n 1 0 err n 0 err n+1 1 does not exist. err(k) k 1 is computable and is similar order of Ā 1 N A N I N.. p.14/21

15 S 22 0 Ā 1 A I p.15/21 Detection of the kernel of matrix (2/4) Ā 1 : in quadruple-precision with perturbation to simulate double-precision round-off error with decomposition of A into regular part A 1 1 R m m and others A 11 A I A I 11 A 1 11 A 125 A 21 A 22 A 21 Â 1 11 I 22 0 S 22 0 I 22 A 1 11 = U 1 11 D 1 11 L artificial perturbation with ε A A I 11 Â 1 11 A 12 4Ǎ I A 21 A 22 0 I 22 0 Š 1 22 A 21 Â 1 11 I 22 Ǎ 1 11 = U 1 11 D 1 11 L 1 11 within quadruple-precision A A A 11 A I A 21 A 22 A 21 A 22 0 I 2 2 = 4 (Ǎ 1 11 A 11 I 1 ) Â 1 11 A12Š 1 22 A 21(I 1 Â 1 11 A 11) Ǎ 1 11 A 12 Â 1 11 A12Š 1 Š 1 22 A 21(I 1 Â 1 11 A 11) Š 1 22 Ŝ22 I 2 Ŝ 22 := A 22 A 21 Â 1 11 A 12 Â 1 11 A 11 I , Ǎ 1 11 A 11 I 1 10, Š 1 22 S 22 I S 22 = 0 Ŝ Ā 1 A I 0 22 Ŝ22 5

16 Detection of the kernel of matrix (/4) Stokes equations with full-neumann bc., N = 199, 808, m = 4, τ = 10 2, eigenvalues by Householder-QR [D] i : diagonal entry [D] 1 i k err (k) k 1 err (k) k err (k) k dim. of kernel = 5 dim. of kernel = 6 dim. of kernel = 7. p.16/

17 . p.17/21 Detection of the kernel of matrix (4/4) stationary Navier-Stokes equations, Re = 12,800, N = 47,886, m = 4, τ = 10 2, matrix by Householder QR factorization e e e e e e e e e e e e e e e e e e e e e-16 k err (k) k 1 err (k) k err (k) k dim. of kernel = 1 dim. of kernel =

18 . p.18/21 dependency of kernel detection on a parameter in MUMPS Threshold for null pivots by CNTL() automatic elstct e-14 kernel error 4.817e e e+0 residual e e e-15 stokes e-21 kernel NA ( ) error e e e e+1 residual e e e e-12 elstct e-16 kernel error e e+0.184e e+0 residual 1.827e e e e-15 Koutsovasilis/F e-14 kernel error 1.809e e e e+0 residual 1.557e e e e-16 ( ) stokes1 with CNTL()= 10 exceeds the memory limitation.

19 . p.19/21 memory consumption: Dissection and Intel Pardiso real 206,76 sym Dissection Intel Pardiso min max avg min max avg 1 thread fact (+2%).977 (+25%) fw/bw (+25%) threads fact (+2%) (+17%) fw/bw (+16%) threads fact (+12%) (+7%) fw/bw (+6%) threads fact (+17%) (+5%) fw/bw (+0%) real 206,76 unsym Dissection Intel Pardiso min max avg min max avg 1 thread fact (+17%) (+10%) fw/bw (+9%) threads fact (+1%) (+8%) fw/bw (+7%) threads fact (+11%) (+4%) fw/bw (+%) threads fact (+9%) (+1%) fw/bw (-%)

20 . p.20/21 FreeFem++ example stationary Navier-Stokes equations by Newton-Raphson iteration : cavitynewtow.edp find (u, p) V g L 2 0 (Ω) a(u, v) + a 1 (u, u, v) + b(v, p) = 0 b(u, q) ε(p, q) = 0 ε = load "Dissection" defaulttodissection(); XXMh [up1,up2,pp]; varf vdns ([u1,u2,p],[v1,v2,q])= int2d(th)(nu*(dx(u1)*dx(v1)+dy(u1)*dy(v1) + dx(u2)*dx(v2)+dy(u2)*dy(v2)) // + p*q*1.e-6 + p*dx(v1)+p*dy(v2) - dx(u1)*q-dy(u2)*q + Ugrad(u1,u2,up1,up2) *[v1,v2] + Ugrad(up1,up2,u1,u2) *[v1,v2]) + on(1,2,,4,u1=0,u2=0); //... up1[]=u1[]; matrix Ans=vDNS(XXMh,XXMh,tgv=-1); set(ans,solver=sparsesolver,tgv=-1, numthreads=4,tolpivot=1.0e-2); real[int] b=vns(0,xxmh,tgv=-1); real[int] w=ansˆ-1*b; u1[]-=w;

21 . p.21/21 summary diss_int(pointer dslv, real_or_complex, numthreads) diss_s_fact(pointer dslv, dim, CSR_symboic_data, sym, decomposer, level) diss_n_fact(pointer dslv, matrix_coef, scaling, tolpivot) diss_get_kern_dim(pointer dslv, kern_dim) diss_get_kern_vecs(pointer dslv, vectors) diss_fee(pointer dslv) written in Dissection.cpp for FreeFem++ load module. sparse matrix is stored in CSR format with structurally symmetric pattern. symmetric matrix is stored in either upper or lower. real symmetric/unsymmetric with kernel detection. complex general matrix without kernel detection. symbolic factorization contains scheduling of parallel tasks in addition to ordering by SCOTCH/METIS and fill-in analysis. SCOTCH/METIS sometimes does not produce aligned bisection tree. symbolic factorization phase is slower than one of UMFPACK double-double quadruple precision arithmetic library will replace Fortran90 REAL(16) usage.

A dissection solver with kernel detection for symmetric finite element matrices on shared memory computers

A dissection solver with kernel detection for symmetric finite element matrices on shared memory computers A dissection solver with kernel detection for symmetric finite element matrices on shared memory computers Atsushi Suzuki, François-Xavier Roux To cite this version: Atsushi Suzuki, François-Xavier Roux.

More information

Dissection sparse direct solver for indefinite finite element matrices and application to a semi-conductor problem

Dissection sparse direct solver for indefinite finite element matrices and application to a semi-conductor problem 10th FreeFem++ days 2018, 13 Dec. Dissection sparse direct solver for indefinite finite element matrices and application to a semi-conductor problem Atsushi Suzuki 1 1 Cybermedia Center, Osaka University

More information

Using Random Butterfly Transformations to Avoid Pivoting in Sparse Direct Methods

Using Random Butterfly Transformations to Avoid Pivoting in Sparse Direct Methods Using Random Butterfly Transformations to Avoid Pivoting in Sparse Direct Methods Marc Baboulin 1, Xiaoye S. Li 2 and François-Henry Rouet 2 1 University of Paris-Sud, Inria Saclay, France 2 Lawrence Berkeley

More information

Sparse linear solvers

Sparse linear solvers Sparse linear solvers Laura Grigori ALPINES INRIA and LJLL, UPMC On sabbatical at UC Berkeley March 2015 Plan Sparse linear solvers Sparse matrices and graphs Classes of linear solvers Sparse Cholesky

More information

Incomplete Cholesky preconditioners that exploit the low-rank property

Incomplete Cholesky preconditioners that exploit the low-rank property anapov@ulb.ac.be ; http://homepages.ulb.ac.be/ anapov/ 1 / 35 Incomplete Cholesky preconditioners that exploit the low-rank property (theory and practice) Artem Napov Service de Métrologie Nucléaire, Université

More information

Numerical simulation of the Gross-Pitaevskii equation by pseudo-spectral and finite element methods comparison of GPS code and FreeFem++

Numerical simulation of the Gross-Pitaevskii equation by pseudo-spectral and finite element methods comparison of GPS code and FreeFem++ . p.1/13 10 ec. 2014, LJLL, Paris FreeFem++ workshop : BECASIM session Numerical simulation of the Gross-Pitaevskii equation by pseudo-spectral and finite element methods comparison of GPS code and FreeFem++

More information

On the design of parallel linear solvers for large scale problems

On the design of parallel linear solvers for large scale problems On the design of parallel linear solvers for large scale problems Journée problème de Poisson, IHP, Paris M. Faverge, P. Ramet M. Faverge Assistant Professor Bordeaux INP LaBRI Inria Bordeaux - Sud-Ouest

More information

Maximum-weighted matching strategies and the application to symmetric indefinite systems

Maximum-weighted matching strategies and the application to symmetric indefinite systems Maximum-weighted matching strategies and the application to symmetric indefinite systems by Stefan Röllin, and Olaf Schenk 2 Technical Report CS-24-7 Department of Computer Science, University of Basel

More information

IMPROVING THE PERFORMANCE OF SPARSE LU MATRIX FACTORIZATION USING A SUPERNODAL ALGORITHM

IMPROVING THE PERFORMANCE OF SPARSE LU MATRIX FACTORIZATION USING A SUPERNODAL ALGORITHM IMPROVING THE PERFORMANCE OF SPARSE LU MATRIX FACTORIZATION USING A SUPERNODAL ALGORITHM Bogdan OANCEA PhD, Associate Professor, Artife University, Bucharest, Romania E-mail: oanceab@ie.ase.ro Abstract:

More information

Parallelization of Multilevel Preconditioners Constructed from Inverse-Based ILUs on Shared-Memory Multiprocessors

Parallelization of Multilevel Preconditioners Constructed from Inverse-Based ILUs on Shared-Memory Multiprocessors Parallelization of Multilevel Preconditioners Constructed from Inverse-Based ILUs on Shared-Memory Multiprocessors J.I. Aliaga 1 M. Bollhöfer 2 A.F. Martín 1 E.S. Quintana-Ortí 1 1 Deparment of Computer

More information

SPARSE SOLVERS POISSON EQUATION. Margreet Nool. November 9, 2015 FOR THE. CWI, Multiscale Dynamics

SPARSE SOLVERS POISSON EQUATION. Margreet Nool. November 9, 2015 FOR THE. CWI, Multiscale Dynamics SPARSE SOLVERS FOR THE POISSON EQUATION Margreet Nool CWI, Multiscale Dynamics November 9, 2015 OUTLINE OF THIS TALK 1 FISHPACK, LAPACK, PARDISO 2 SYSTEM OVERVIEW OF CARTESIUS 3 POISSON EQUATION 4 SOLVERS

More information

Scientific Computing

Scientific Computing Scientific Computing Direct solution methods Martin van Gijzen Delft University of Technology October 3, 2018 1 Program October 3 Matrix norms LU decomposition Basic algorithm Cost Stability Pivoting Pivoting

More information

A Sparse QS-Decomposition for Large Sparse Linear System of Equations

A Sparse QS-Decomposition for Large Sparse Linear System of Equations A Sparse QS-Decomposition for Large Sparse Linear System of Equations Wujian Peng 1 and Biswa N. Datta 2 1 Department of Math, Zhaoqing University, Zhaoqing, China, douglas peng@yahoo.com 2 Department

More information

Enhancing Scalability of Sparse Direct Methods

Enhancing Scalability of Sparse Direct Methods Journal of Physics: Conference Series 78 (007) 0 doi:0.088/7-6596/78//0 Enhancing Scalability of Sparse Direct Methods X.S. Li, J. Demmel, L. Grigori, M. Gu, J. Xia 5, S. Jardin 6, C. Sovinec 7, L.-Q.

More information

A Column Pre-ordering Strategy for the Unsymmetric-Pattern Multifrontal Method

A Column Pre-ordering Strategy for the Unsymmetric-Pattern Multifrontal Method A Column Pre-ordering Strategy for the Unsymmetric-Pattern Multifrontal Method TIMOTHY A. DAVIS University of Florida A new method for sparse LU factorization is presented that combines a column pre-ordering

More information

V C V L T I 0 C V B 1 V T 0 I. l nk

V C V L T I 0 C V B 1 V T 0 I. l nk Multifrontal Method Kailai Xu September 16, 2017 Main observation. Consider the LDL T decomposition of a SPD matrix [ ] [ ] [ ] [ ] B V T L 0 I 0 L T L A = = 1 V T V C V L T I 0 C V B 1 V T, 0 I where

More information

J.I. Aliaga 1 M. Bollhöfer 2 A.F. Martín 1 E.S. Quintana-Ortí 1. March, 2009

J.I. Aliaga 1 M. Bollhöfer 2 A.F. Martín 1 E.S. Quintana-Ortí 1. March, 2009 Parallel Preconditioning of Linear Systems based on ILUPACK for Multithreaded Architectures J.I. Aliaga M. Bollhöfer 2 A.F. Martín E.S. Quintana-Ortí Deparment of Computer Science and Engineering, Univ.

More information

Information Sciences Institute 22 June 2012 Bob Lucas, Gene Wagenbreth, Dan Davis, Roger Grimes and

Information Sciences Institute 22 June 2012 Bob Lucas, Gene Wagenbreth, Dan Davis, Roger Grimes and Accelerating the Multifrontal Method Information Sciences Institute 22 June 2012 Bob Lucas, Gene Wagenbreth, Dan Davis, Roger Grimes {rflucas,genew,ddavis}@isi.edu and grimes@lstc.com 3D Finite Element

More information

An evaluation of sparse direct symmetric solvers: an introduction and preliminary finding

An evaluation of sparse direct symmetric solvers: an introduction and preliminary finding Numerical Analysis Group Internal Report 24- An evaluation of sparse direct symmetric solvers: an introduction and preliminary finding J A Scott Y Hu N I M Gould November 8, 24 c Council for the Central

More information

Sparse LU Factorization on GPUs for Accelerating SPICE Simulation

Sparse LU Factorization on GPUs for Accelerating SPICE Simulation Nano-scale Integrated Circuit and System (NICS) Laboratory Sparse LU Factorization on GPUs for Accelerating SPICE Simulation Xiaoming Chen PhD Candidate Department of Electronic Engineering Tsinghua University,

More information

Improvements for Implicit Linear Equation Solvers

Improvements for Implicit Linear Equation Solvers Improvements for Implicit Linear Equation Solvers Roger Grimes, Bob Lucas, Clement Weisbecker Livermore Software Technology Corporation Abstract Solving large sparse linear systems of equations is often

More information

Dense Arithmetic over Finite Fields with CUMODP

Dense Arithmetic over Finite Fields with CUMODP Dense Arithmetic over Finite Fields with CUMODP Sardar Anisul Haque 1 Xin Li 2 Farnam Mansouri 1 Marc Moreno Maza 1 Wei Pan 3 Ning Xie 1 1 University of Western Ontario, Canada 2 Universidad Carlos III,

More information

Recent advances in sparse linear solver technology for semiconductor device simulation matrices

Recent advances in sparse linear solver technology for semiconductor device simulation matrices Recent advances in sparse linear solver technology for semiconductor device simulation matrices (Invited Paper) Olaf Schenk and Michael Hagemann Department of Computer Science University of Basel Basel,

More information

On the design of parallel linear solvers for large scale problems

On the design of parallel linear solvers for large scale problems On the design of parallel linear solvers for large scale problems ICIAM - August 2015 - Mini-Symposium on Recent advances in matrix computations for extreme-scale computers M. Faverge, X. Lacoste, G. Pichon,

More information

A convex QP solver based on block-lu updates

A convex QP solver based on block-lu updates Block-LU updates p. 1/24 A convex QP solver based on block-lu updates PP06 SIAM Conference on Parallel Processing for Scientific Computing San Francisco, CA, Feb 22 24, 2006 Hanh Huynh and Michael Saunders

More information

SOLVING SPARSE LINEAR SYSTEMS OF EQUATIONS. Chao Yang Computational Research Division Lawrence Berkeley National Laboratory Berkeley, CA, USA

SOLVING SPARSE LINEAR SYSTEMS OF EQUATIONS. Chao Yang Computational Research Division Lawrence Berkeley National Laboratory Berkeley, CA, USA 1 SOLVING SPARSE LINEAR SYSTEMS OF EQUATIONS Chao Yang Computational Research Division Lawrence Berkeley National Laboratory Berkeley, CA, USA 2 OUTLINE Sparse matrix storage format Basic factorization

More information

MUMPS. The MUMPS library: work done during the SOLSTICE project. MUMPS team, Lyon-Grenoble, Toulouse, Bordeaux

MUMPS. The MUMPS library: work done during the SOLSTICE project. MUMPS team, Lyon-Grenoble, Toulouse, Bordeaux The MUMPS library: work done during the SOLSTICE project MUMPS team, Lyon-Grenoble, Toulouse, Bordeaux Sparse Days and ANR SOLSTICE Final Workshop June MUMPS MUMPS Team since beg. of SOLSTICE (2007) Permanent

More information

A sparse multifrontal solver using hierarchically semi-separable frontal matrices

A sparse multifrontal solver using hierarchically semi-separable frontal matrices A sparse multifrontal solver using hierarchically semi-separable frontal matrices Pieter Ghysels Lawrence Berkeley National Laboratory Joint work with: Xiaoye S. Li (LBNL), Artem Napov (ULB), François-Henry

More information

Contents. Preface... xi. Introduction...

Contents. Preface... xi. Introduction... Contents Preface... xi Introduction... xv Chapter 1. Computer Architectures... 1 1.1. Different types of parallelism... 1 1.1.1. Overlap, concurrency and parallelism... 1 1.1.2. Temporal and spatial parallelism

More information

1 Overview. 2 Adapting to computing system evolution. 11 th European LS-DYNA Conference 2017, Salzburg, Austria

1 Overview. 2 Adapting to computing system evolution. 11 th European LS-DYNA Conference 2017, Salzburg, Austria 1 Overview Improving LSTC s Multifrontal Linear Solver Roger Grimes 3, Robert Lucas 3, Nick Meng 2, Francois-Henry Rouet 3, Clement Weisbecker 3, and Ting-Ting Zhu 1 1 Cray Incorporated 2 Intel Corporation

More information

Computing least squares condition numbers on hybrid multicore/gpu systems

Computing least squares condition numbers on hybrid multicore/gpu systems Computing least squares condition numbers on hybrid multicore/gpu systems M. Baboulin and J. Dongarra and R. Lacroix Abstract This paper presents an efficient computation for least squares conditioning

More information

Utilisation de la compression low-rank pour réduire la complexité du solveur PaStiX

Utilisation de la compression low-rank pour réduire la complexité du solveur PaStiX Utilisation de la compression low-rank pour réduire la complexité du solveur PaStiX 26 Septembre 2018 - JCAD 2018 - Lyon Grégoire Pichon, Mathieu Faverge, Pierre Ramet, Jean Roman Outline 1. Context 2.

More information

AMS526: Numerical Analysis I (Numerical Linear Algebra for Computational and Data Sciences)

AMS526: Numerical Analysis I (Numerical Linear Algebra for Computational and Data Sciences) AMS526: Numerical Analysis I (Numerical Linear Algebra for Computational and Data Sciences) Lecture 19: Computing the SVD; Sparse Linear Systems Xiangmin Jiao Stony Brook University Xiangmin Jiao Numerical

More information

Math 102, Winter Final Exam Review. Chapter 1. Matrices and Gaussian Elimination

Math 102, Winter Final Exam Review. Chapter 1. Matrices and Gaussian Elimination Math 0, Winter 07 Final Exam Review Chapter. Matrices and Gaussian Elimination { x + x =,. Different forms of a system of linear equations. Example: The x + 4x = 4. [ ] [ ] [ ] vector form (or the column

More information

Multilevel low-rank approximation preconditioners Yousef Saad Department of Computer Science and Engineering University of Minnesota

Multilevel low-rank approximation preconditioners Yousef Saad Department of Computer Science and Engineering University of Minnesota Multilevel low-rank approximation preconditioners Yousef Saad Department of Computer Science and Engineering University of Minnesota SIAM CSE Boston - March 1, 2013 First: Joint work with Ruipeng Li Work

More information

Toward black-box adaptive domain decomposition methods

Toward black-box adaptive domain decomposition methods Toward black-box adaptive domain decomposition methods Frédéric Nataf Laboratory J.L. Lions (LJLL), CNRS, Alpines Inria and Univ. Paris VI joint work with Victorita Dolean (Univ. Nice Sophia-Antipolis)

More information

Static-scheduling and hybrid-programming in SuperLU DIST on multicore cluster systems

Static-scheduling and hybrid-programming in SuperLU DIST on multicore cluster systems Static-scheduling and hybrid-programming in SuperLU DIST on multicore cluster systems Ichitaro Yamazaki University of Tennessee, Knoxville Xiaoye Sherry Li Lawrence Berkeley National Laboratory MS49: Sparse

More information

Implementation of a KKT-based active-set QP solver

Implementation of a KKT-based active-set QP solver Block-LU updates p. 1/27 Implementation of a KKT-based active-set QP solver ISMP 2006 19th International Symposium on Mathematical Programming Rio de Janeiro, Brazil, July 30 August 4, 2006 Hanh Huynh

More information

DELFT UNIVERSITY OF TECHNOLOGY

DELFT UNIVERSITY OF TECHNOLOGY DELFT UNIVERSITY OF TECHNOLOGY REPORT 09-08 Preconditioned conjugate gradient method enhanced by deflation of rigid body modes applied to composite materials. T.B. Jönsthövel ISSN 1389-6520 Reports of

More information

Sparse factorization using low rank submatrices. Cleve Ashcraft LSTC 2010 MUMPS User Group Meeting April 15-16, 2010 Toulouse, FRANCE

Sparse factorization using low rank submatrices. Cleve Ashcraft LSTC 2010 MUMPS User Group Meeting April 15-16, 2010 Toulouse, FRANCE Sparse factorization using low rank submatrices Cleve Ashcraft LSTC cleve@lstc.com 21 MUMPS User Group Meeting April 15-16, 21 Toulouse, FRANCE ftp.lstc.com:outgoing/cleve/mumps1 Ashcraft.pdf 1 LSTC Livermore

More information

Numerical Methods in Matrix Computations

Numerical Methods in Matrix Computations Ake Bjorck Numerical Methods in Matrix Computations Springer Contents 1 Direct Methods for Linear Systems 1 1.1 Elements of Matrix Theory 1 1.1.1 Matrix Algebra 2 1.1.2 Vector Spaces 6 1.1.3 Submatrices

More information

Solving Large Nonlinear Sparse Systems

Solving Large Nonlinear Sparse Systems Solving Large Nonlinear Sparse Systems Fred W. Wubs and Jonas Thies Computational Mechanics & Numerical Mathematics University of Groningen, the Netherlands f.w.wubs@rug.nl Centre for Interdisciplinary

More information

Large Scale Sparse Linear Algebra

Large Scale Sparse Linear Algebra Large Scale Sparse Linear Algebra P. Amestoy (INP-N7, IRIT) A. Buttari (CNRS, IRIT) T. Mary (University of Toulouse, IRIT) A. Guermouche (Univ. Bordeaux, LaBRI), J.-Y. L Excellent (INRIA, LIP, ENS-Lyon)

More information

Accelerating linear algebra computations with hybrid GPU-multicore systems.

Accelerating linear algebra computations with hybrid GPU-multicore systems. Accelerating linear algebra computations with hybrid GPU-multicore systems. Marc Baboulin INRIA/Université Paris-Sud joint work with Jack Dongarra (University of Tennessee and Oak Ridge National Laboratory)

More information

Towards parallel bipartite matching algorithms

Towards parallel bipartite matching algorithms Outline Towards parallel bipartite matching algorithms Bora Uçar CNRS and GRAAL, ENS Lyon, France Scheduling for large-scale systems, 13 15 May 2009, Knoxville Joint work with Patrick R. Amestoy (ENSEEIHT-IRIT,

More information

Opportunities for ELPA to Accelerate the Solution of the Bethe-Salpeter Eigenvalue Problem

Opportunities for ELPA to Accelerate the Solution of the Bethe-Salpeter Eigenvalue Problem Opportunities for ELPA to Accelerate the Solution of the Bethe-Salpeter Eigenvalue Problem Peter Benner, Andreas Marek, Carolin Penke August 16, 2018 ELSI Workshop 2018 Partners: The Problem The Bethe-Salpeter

More information

Jacobi-Davidson Eigensolver in Cusolver Library. Lung-Sheng Chien, NVIDIA

Jacobi-Davidson Eigensolver in Cusolver Library. Lung-Sheng Chien, NVIDIA Jacobi-Davidson Eigensolver in Cusolver Library Lung-Sheng Chien, NVIDIA lchien@nvidia.com Outline CuSolver library - cusolverdn: dense LAPACK - cusolversp: sparse LAPACK - cusolverrf: refactorization

More information

Scalable Hybrid Programming and Performance for SuperLU Sparse Direct Solver

Scalable Hybrid Programming and Performance for SuperLU Sparse Direct Solver Scalable Hybrid Programming and Performance for SuperLU Sparse Direct Solver Sherry Li Lawrence Berkeley National Laboratory Piyush Sao Rich Vuduc Georgia Institute of Technology CUG 14, May 4-8, 14, Lugano,

More information

Direct and Incomplete Cholesky Factorizations with Static Supernodes

Direct and Incomplete Cholesky Factorizations with Static Supernodes Direct and Incomplete Cholesky Factorizations with Static Supernodes AMSC 661 Term Project Report Yuancheng Luo 2010-05-14 Introduction Incomplete factorizations of sparse symmetric positive definite (SSPD)

More information

Sparse BLAS-3 Reduction

Sparse BLAS-3 Reduction Sparse BLAS-3 Reduction to Banded Upper Triangular (Spar3Bnd) Gary Howell, HPC/OIT NC State University gary howell@ncsu.edu Sparse BLAS-3 Reduction p.1/27 Acknowledgements James Demmel, Gene Golub, Franc

More information

Some Geometric and Algebraic Aspects of Domain Decomposition Methods

Some Geometric and Algebraic Aspects of Domain Decomposition Methods Some Geometric and Algebraic Aspects of Domain Decomposition Methods D.S.Butyugin 1, Y.L.Gurieva 1, V.P.Ilin 1,2, and D.V.Perevozkin 1 Abstract Some geometric and algebraic aspects of various domain decomposition

More information

Recent advances in HPC with FreeFem++

Recent advances in HPC with FreeFem++ Recent advances in HPC with FreeFem++ Pierre Jolivet Laboratoire Jacques-Louis Lions Laboratoire Jean Kuntzmann Fourth workshop on FreeFem++ December 6, 2012 With F. Hecht, F. Nataf, C. Prud homme. Outline

More information

Accelerating computation of eigenvectors in the nonsymmetric eigenvalue problem

Accelerating computation of eigenvectors in the nonsymmetric eigenvalue problem Accelerating computation of eigenvectors in the nonsymmetric eigenvalue problem Mark Gates 1, Azzam Haidar 1, and Jack Dongarra 1,2,3 1 University of Tennessee, Knoxville, TN, USA 2 Oak Ridge National

More information

Challenges for Matrix Preconditioning Methods

Challenges for Matrix Preconditioning Methods Challenges for Matrix Preconditioning Methods Matthias Bollhoefer 1 1 Dept. of Mathematics TU Berlin Preconditioning 2005, Atlanta, May 19, 2005 supported by the DFG research center MATHEON in Berlin Outline

More information

arxiv: v1 [cs.na] 20 Jul 2015

arxiv: v1 [cs.na] 20 Jul 2015 AN EFFICIENT SOLVER FOR SPARSE LINEAR SYSTEMS BASED ON RANK-STRUCTURED CHOLESKY FACTORIZATION JEFFREY N. CHADWICK AND DAVID S. BINDEL arxiv:1507.05593v1 [cs.na] 20 Jul 2015 Abstract. Direct factorization

More information

FAST STRUCTURED EIGENSOLVER FOR DISCRETIZED PARTIAL DIFFERENTIAL OPERATORS ON GENERAL MESHES

FAST STRUCTURED EIGENSOLVER FOR DISCRETIZED PARTIAL DIFFERENTIAL OPERATORS ON GENERAL MESHES Proceedings of the Project Review, Geo-Mathematical Imaging Group Purdue University, West Lafayette IN, Vol. 1 2012 pp. 123-132. FAST STRUCTURED EIGENSOLVER FOR DISCRETIZED PARTIAL DIFFERENTIAL OPERATORS

More information

Using Postordering and Static Symbolic Factorization for Parallel Sparse LU

Using Postordering and Static Symbolic Factorization for Parallel Sparse LU Using Postordering and Static Symbolic Factorization for Parallel Sparse LU Michel Cosnard LORIA - INRIA Lorraine Nancy, France Michel.Cosnard@loria.fr Laura Grigori LORIA - Univ. Henri Poincaré Nancy,

More information

A DISTRIBUTED-MEMORY RANDOMIZED STRUCTURED MULTIFRONTAL METHOD FOR SPARSE DIRECT SOLUTIONS

A DISTRIBUTED-MEMORY RANDOMIZED STRUCTURED MULTIFRONTAL METHOD FOR SPARSE DIRECT SOLUTIONS A DISTRIBUTED-MEMORY RANDOMIZED STRUCTURED MULTIFRONTAL METHOD FOR SPARSE DIRECT SOLUTIONS ZIXING XIN, JIANLIN XIA, MAARTEN V. DE HOOP, STEPHEN CAULEY, AND VENKATARAMANAN BALAKRISHNAN Abstract. We design

More information

Block Bidiagonal Decomposition and Least Squares Problems

Block Bidiagonal Decomposition and Least Squares Problems Block Bidiagonal Decomposition and Least Squares Problems Åke Björck Department of Mathematics Linköping University Perspectives in Numerical Analysis, Helsinki, May 27 29, 2008 Outline Bidiagonal Decomposition

More information

Accelerating computation of eigenvectors in the dense nonsymmetric eigenvalue problem

Accelerating computation of eigenvectors in the dense nonsymmetric eigenvalue problem Accelerating computation of eigenvectors in the dense nonsymmetric eigenvalue problem Mark Gates 1, Azzam Haidar 1, and Jack Dongarra 1,2,3 1 University of Tennessee, Knoxville, TN, USA 2 Oak Ridge National

More information

Towards a stable static pivoting strategy for the sequential and parallel solution of sparse symmetric indefinite systems

Towards a stable static pivoting strategy for the sequential and parallel solution of sparse symmetric indefinite systems Technical Report RAL-TR-2005-007 Towards a stable static pivoting strategy for the sequential and parallel solution of sparse symmetric indefinite systems Iain S. Duff and Stéphane Pralet April 22, 2005

More information

Numerical Linear Algebra

Numerical Linear Algebra Numerical Linear Algebra Decompositions, numerical aspects Gerard Sleijpen and Martin van Gijzen September 27, 2017 1 Delft University of Technology Program Lecture 2 LU-decomposition Basic algorithm Cost

More information

Program Lecture 2. Numerical Linear Algebra. Gaussian elimination (2) Gaussian elimination. Decompositions, numerical aspects

Program Lecture 2. Numerical Linear Algebra. Gaussian elimination (2) Gaussian elimination. Decompositions, numerical aspects Numerical Linear Algebra Decompositions, numerical aspects Program Lecture 2 LU-decomposition Basic algorithm Cost Stability Pivoting Cholesky decomposition Sparse matrices and reorderings Gerard Sleijpen

More information

Parallel Polynomial Evaluation

Parallel Polynomial Evaluation Parallel Polynomial Evaluation Jan Verschelde joint work with Genady Yoffe University of Illinois at Chicago Department of Mathematics, Statistics, and Computer Science http://www.math.uic.edu/ jan jan@math.uic.edu

More information

Jacobi-Based Eigenvalue Solver on GPU. Lung-Sheng Chien, NVIDIA

Jacobi-Based Eigenvalue Solver on GPU. Lung-Sheng Chien, NVIDIA Jacobi-Based Eigenvalue Solver on GPU Lung-Sheng Chien, NVIDIA lchien@nvidia.com Outline Symmetric eigenvalue solver Experiment Applications Conclusions Symmetric eigenvalue solver The standard form is

More information

APPLIED NUMERICAL LINEAR ALGEBRA

APPLIED NUMERICAL LINEAR ALGEBRA APPLIED NUMERICAL LINEAR ALGEBRA James W. Demmel University of California Berkeley, California Society for Industrial and Applied Mathematics Philadelphia Contents Preface 1 Introduction 1 1.1 Basic Notation

More information

The design and use of a sparse direct solver for skew symmetric matrices

The design and use of a sparse direct solver for skew symmetric matrices The design and use of a sparse direct solver for skew symmetric matrices Iain S. Duff May 6, 2008 RAL-TR-2006-027 c Science and Technology Facilities Council Enquires about copyright, reproduction and

More information

Parallel Scalability of a FETI DP Mortar Method for Problems with Discontinuous Coefficients

Parallel Scalability of a FETI DP Mortar Method for Problems with Discontinuous Coefficients Parallel Scalability of a FETI DP Mortar Method for Problems with Discontinuous Coefficients Nina Dokeva and Wlodek Proskurowski Department of Mathematics, University of Southern California, Los Angeles,

More information

Parallel Algorithms for Solution of Large Sparse Linear Systems with Applications

Parallel Algorithms for Solution of Large Sparse Linear Systems with Applications Parallel Algorithms for Solution of Large Sparse Linear Systems with Applications Murat Manguoğlu Department of Computer Engineering Middle East Technical University, Ankara, Turkey Prace workshop: HPC

More information

Parallel sparse linear solvers and applications in CFD

Parallel sparse linear solvers and applications in CFD Parallel sparse linear solvers and applications in CFD Jocelyne Erhel Joint work with Désiré Nuentsa Wakam () and Baptiste Poirriez () SAGE team, Inria Rennes, France journée Calcul Intensif Distribué

More information

An exact reanalysis algorithm using incremental Cholesky factorization and its application to crack growth modeling

An exact reanalysis algorithm using incremental Cholesky factorization and its application to crack growth modeling INTERNATIONAL JOURNAL FOR NUMERICAL METHODS IN ENGINEERING Int. J. Numer. Meth. Engng 01; 91:158 14 Published online 5 June 01 in Wiley Online Library (wileyonlinelibrary.com). DOI: 10.100/nme.4 SHORT

More information

Parallel Numerical Algorithms

Parallel Numerical Algorithms Parallel Numerical Algorithms Chapter 6 Matrix Models Section 6.2 Low Rank Approximation Edgar Solomonik Department of Computer Science University of Illinois at Urbana-Champaign CS 554 / CSE 512 Edgar

More information

Parallel scalability of a FETI DP mortar method for problems with discontinuous coefficients

Parallel scalability of a FETI DP mortar method for problems with discontinuous coefficients Parallel scalability of a FETI DP mortar method for problems with discontinuous coefficients Nina Dokeva and Wlodek Proskurowski University of Southern California, Department of Mathematics Los Angeles,

More information

Indefinite and physics-based preconditioning

Indefinite and physics-based preconditioning Indefinite and physics-based preconditioning Jed Brown VAW, ETH Zürich 2009-01-29 Newton iteration Standard form of a nonlinear system F (u) 0 Iteration Solve: Update: J(ũ)u F (ũ) ũ + ũ + u Example (p-bratu)

More information

An Efficient FETI Implementation on Distributed Shared Memory Machines with Independent Numbers of Subdomains and Processors

An Efficient FETI Implementation on Distributed Shared Memory Machines with Independent Numbers of Subdomains and Processors Contemporary Mathematics Volume 218, 1998 B 0-8218-0988-1-03024-7 An Efficient FETI Implementation on Distributed Shared Memory Machines with Independent Numbers of Subdomains and Processors Michel Lesoinne

More information

QR Factorization of Tall and Skinny Matrices in a Grid Computing Environment

QR Factorization of Tall and Skinny Matrices in a Grid Computing Environment QR Factorization of Tall and Skinny Matrices in a Grid Computing Environment Emmanuel AGULLO (INRIA / LaBRI) Camille COTI (Iowa State University) Jack DONGARRA (University of Tennessee) Thomas HÉRAULT

More information

Scaling and Pivoting in an Out-of-Core Sparse Direct Solver

Scaling and Pivoting in an Out-of-Core Sparse Direct Solver Scaling and Pivoting in an Out-of-Core Sparse Direct Solver JENNIFER A. SCOTT Rutherford Appleton Laboratory Out-of-core sparse direct solvers reduce the amount of main memory needed to factorize and solve

More information

MS&E 318 (CME 338) Large-Scale Numerical Optimization

MS&E 318 (CME 338) Large-Scale Numerical Optimization Stanford University, Management Science & Engineering (and ICME MS&E 38 (CME 338 Large-Scale Numerical Optimization Course description Instructor: Michael Saunders Spring 28 Notes : Review The course teaches

More information

Performance and Scalability of the Block Low-Rank Multifrontal Factorization on Multicore Architectures

Performance and Scalability of the Block Low-Rank Multifrontal Factorization on Multicore Architectures Performance and Scalability of the Block Low-Rank Multifrontal Factorization on Multicore Architectures Patrick R. Amestoy, Alfredo Buttari, Jean-Yves L Excellent, Théo Mary To cite this version: Patrick

More information

Scientific Computing with Case Studies SIAM Press, Lecture Notes for Unit VII Sparse Matrix

Scientific Computing with Case Studies SIAM Press, Lecture Notes for Unit VII Sparse Matrix Scientific Computing with Case Studies SIAM Press, 2009 http://www.cs.umd.edu/users/oleary/sccswebpage Lecture Notes for Unit VII Sparse Matrix Computations Part 1: Direct Methods Dianne P. O Leary c 2008

More information

A DISTRIBUTED-MEMORY RANDOMIZED STRUCTURED MULTIFRONTAL METHOD FOR SPARSE DIRECT SOLUTIONS

A DISTRIBUTED-MEMORY RANDOMIZED STRUCTURED MULTIFRONTAL METHOD FOR SPARSE DIRECT SOLUTIONS SIAM J. SCI. COMPUT. Vol. 39, No. 4, pp. C292 C318 c 2017 Society for Industrial and Applied Mathematics A DISTRIBUTED-MEMORY RANDOMIZED STRUCTURED MULTIFRONTAL METHOD FOR SPARSE DIRECT SOLUTIONS ZIXING

More information

Solution of Linear Equations

Solution of Linear Equations Solution of Linear Equations (Com S 477/577 Notes) Yan-Bin Jia Sep 7, 07 We have discussed general methods for solving arbitrary equations, and looked at the special class of polynomial equations A subclass

More information

Sparse Linear Algebra: Direct Methods, advanced features

Sparse Linear Algebra: Direct Methods, advanced features Sparse Linear Algebra: Direct Methods, advanced features P. Amestoy and A. Buttari (INPT(ENSEEIHT)-IRIT) A. Guermouche (Univ. Bordeaux-LaBRI), J.-Y. L Excellent and B. Uçar (INRIA-CNRS/LIP-ENS Lyon) F.-H.

More information

Adaptive Coarse Space Selection in BDDC and FETI-DP Iterative Substructuring Methods: Towards Fast and Robust Solvers

Adaptive Coarse Space Selection in BDDC and FETI-DP Iterative Substructuring Methods: Towards Fast and Robust Solvers Adaptive Coarse Space Selection in BDDC and FETI-DP Iterative Substructuring Methods: Towards Fast and Robust Solvers Jan Mandel University of Colorado at Denver Bedřich Sousedík Czech Technical University

More information

Block Low-Rank (BLR) approximations to improve multifrontal sparse solvers

Block Low-Rank (BLR) approximations to improve multifrontal sparse solvers Block Low-Rank (BLR) approximations to improve multifrontal sparse solvers Joint work with Patrick Amestoy, Cleve Ashcraft, Olivier Boiteau, Alfredo Buttari and Jean-Yves L Excellent, PhD started on October

More information

High Performance Parallel Tucker Decomposition of Sparse Tensors

High Performance Parallel Tucker Decomposition of Sparse Tensors High Performance Parallel Tucker Decomposition of Sparse Tensors Oguz Kaya INRIA and LIP, ENS Lyon, France SIAM PP 16, April 14, 2016, Paris, France Joint work with: Bora Uçar, CNRS and LIP, ENS Lyon,

More information

Parallelism in FreeFem++.

Parallelism in FreeFem++. Parallelism in FreeFem++. Guy Atenekeng 1 Frederic Hecht 2 Laura Grigori 1 Jacques Morice 2 Frederic Nataf 2 1 INRIA, Saclay 2 University of Paris 6 Workshop on FreeFem++, 2009 Outline 1 Introduction Motivation

More information

Preliminary/Qualifying Exam in Numerical Analysis (Math 502a) Spring 2012

Preliminary/Qualifying Exam in Numerical Analysis (Math 502a) Spring 2012 Instructions Preliminary/Qualifying Exam in Numerical Analysis (Math 502a) Spring 2012 The exam consists of four problems, each having multiple parts. You should attempt to solve all four problems. 1.

More information

An Accelerated Block-Parallel Newton Method via Overlapped Partitioning

An Accelerated Block-Parallel Newton Method via Overlapped Partitioning An Accelerated Block-Parallel Newton Method via Overlapped Partitioning Yurong Chen Lab. of Parallel Computing, Institute of Software, CAS (http://www.rdcps.ac.cn/~ychen/english.htm) Summary. This paper

More information

The Algorithm of Multiple Relatively Robust Representations for Multi-Core Processors

The Algorithm of Multiple Relatively Robust Representations for Multi-Core Processors Aachen Institute for Advanced Study in Computational Engineering Science Preprint: AICES-2010/09-4 23/September/2010 The Algorithm of Multiple Relatively Robust Representations for Multi-Core Processors

More information

Multilevel spectral coarse space methods in FreeFem++ on parallel architectures

Multilevel spectral coarse space methods in FreeFem++ on parallel architectures Multilevel spectral coarse space methods in FreeFem++ on parallel architectures Pierre Jolivet Laboratoire Jacques-Louis Lions Laboratoire Jean Kuntzmann DD 21, Rennes. June 29, 2012 In collaboration with

More information

High Performance Computing Using Out-of- Core Sparse Direct Solvers

High Performance Computing Using Out-of- Core Sparse Direct Solvers High Performance Computing Using Out-of- Core Sparse Direct Solvers andhapati P. Raju and Siddhartha Khaitan Abstract In-core memory requirement is a bottleneck in solving large three dimensional Navier-Stokes

More information

July 18, Abstract. capture the unsymmetric structure of the matrices. We introduce a new algorithm which

July 18, Abstract. capture the unsymmetric structure of the matrices. We introduce a new algorithm which An unsymmetrized multifrontal LU factorization Λ Patrick R. Amestoy y and Chiara Puglisi z July 8, 000 Abstract A well-known approach to compute the LU factorization of a general unsymmetric matrix A is

More information

Parallel sparse direct solvers for Poisson s equation in streamer discharges

Parallel sparse direct solvers for Poisson s equation in streamer discharges Parallel sparse direct solvers for Poisson s equation in streamer discharges Margreet Nool, Menno Genseberger 2 and Ute Ebert,3 Centrum Wiskunde & Informatica (CWI), P.O.Box 9479, 9 GB Amsterdam, The Netherlands

More information

Multicore Parallelization of Determinant Quantum Monte Carlo Simulations

Multicore Parallelization of Determinant Quantum Monte Carlo Simulations Multicore Parallelization of Determinant Quantum Monte Carlo Simulations Andrés Tomás, Che-Rung Lee, Zhaojun Bai, Richard Scalettar UC Davis SIAM Conference on Computation Science & Engineering Reno, March

More information

Applications of Randomized Methods for Decomposing and Simulating from Large Covariance Matrices

Applications of Randomized Methods for Decomposing and Simulating from Large Covariance Matrices Applications of Randomized Methods for Decomposing and Simulating from Large Covariance Matrices Vahid Dehdari and Clayton V. Deutsch Geostatistical modeling involves many variables and many locations.

More information

Matrix Assembly in FEA

Matrix Assembly in FEA Matrix Assembly in FEA 1 In Chapter 2, we spoke about how the global matrix equations are assembled in the finite element method. We now want to revisit that discussion and add some details. For example,

More information

AMS526: Numerical Analysis I (Numerical Linear Algebra)

AMS526: Numerical Analysis I (Numerical Linear Algebra) AMS526: Numerical Analysis I (Numerical Linear Algebra) Lecture 1: Course Overview & Matrix-Vector Multiplication Xiangmin Jiao SUNY Stony Brook Xiangmin Jiao Numerical Analysis I 1 / 20 Outline 1 Course

More information

Finite element programming by FreeFem++ advanced course

Finite element programming by FreeFem++ advanced course 1 / 37 Finite element programming by FreeFem++ advanced course Atsushi Suzuki 1 1 Cybermedia Center, Osaka University atsushi.suzuki@cas.cmc.osaka-u.ac.jp Japan SIAM tutorial 4-5, June 2016 2 / 37 Outline

More information

Sparse Matrix Computations in Arterial Fluid Mechanics

Sparse Matrix Computations in Arterial Fluid Mechanics Sparse Matrix Computations in Arterial Fluid Mechanics Murat Manguoğlu Middle East Technical University, Turkey Kenji Takizawa Ahmed Sameh Tayfun Tezduyar Waseda University, Japan Purdue University, USA

More information