Large Scale Sparse Linear Algebra
1 Large Scale Sparse Linear Algebra

P. Amestoy (INP-N7, IRIT), A. Buttari (CNRS, IRIT), T. Mary (University of Toulouse, IRIT), A. Guermouche (Univ. Bordeaux, LaBRI), J.-Y. L'Excellent (INRIA, LIP, ENS-Lyon), B. Uçar (CNRS, LIP, ENS-Lyon), F.-H. Rouet (LSTC, Livermore, USA), C. Weisbecker (LSTC, Livermore, USA)

Main principle
Build an approximate factorization A_ε = L_ε U_ε at a given accuracy ε.

Part I: asymptotic complexity reduction
Theoretical proof and experimental validation that, in the 3D case:
- Operations: O(N^6) → O(N^5) → O(N^4)
- Memory: O(N^4) → O(N^3 log N)

Part II: efficient and scalable algorithms
How to design algorithms that efficiently translate the theoretical complexity reduction into actual performance and memory gains for large-scale systems and applications?
2 Impact on industrial applications

[Figure: seismic velocity model; axes dip (km), cross (km), depth (km); velocity in m/s.]

- Structural mechanics: matrix of order 8M, required accuracy 10^-9
- Seismic imaging: matrix of order 17M, required accuracy 10^-3
- Electromagnetism: matrix of order 30M, required accuracy 10^-7

Results on 900 cores:

application | factorization time (s)    | memory/proc (GB)
            | MUMPS    BLR    ratio     | MUMPS    BLR    gain
structural  |   …       …       …%      |   …       …       …
seismic     |   …       …       …%      |   …       …       …
electromag  |   …       …       …%      |   …       …       …

Introduction
3 Rank and rank-k approximation

In the following, B is a dense matrix of size m × n.

Definition 1 (Rank). The rank k of B is defined as the smallest integer such that there exist matrices X and Y, of size m × k and n × k, with B = XY^T.

Definition 2. We call a rank-k approximation of B at accuracy ε any matrix B̃ of rank k such that ‖B − B̃‖ ≤ ε.

Optimal rank-k approximation

Theorem 3 (Eckart-Young). Let UΣV^T be the SVD of B and let σ_i = Σ_{i,i} denote its singular values. Then B̃ = U_{1:m,1:k} Σ_{1:k,1:k} V_{1:n,1:k}^T is the optimal rank-k approximation of B, and ‖B − B̃‖_2 = σ_{k+1}.
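The Eckart-Young theorem can be checked numerically. A minimal NumPy sketch (not from the slides; the test matrix and its singular values are arbitrary choices): build B with known singular values, truncate its SVD after k terms, and verify that the 2-norm error is exactly σ_{k+1}.

```python
import numpy as np

rng = np.random.default_rng(0)

# Build a 60x40 test matrix with rapidly decaying singular values.
U, _ = np.linalg.qr(rng.standard_normal((60, 40)))
V, _ = np.linalg.qr(rng.standard_normal((40, 40)))
sigma = 2.0 ** -np.arange(40)            # decreasing: 1, 1/2, 1/4, ...
B = U @ np.diag(sigma) @ V.T

# Optimal rank-k approximation: truncate the SVD after k terms.
k = 5
Uk, sk, Vkt = np.linalg.svd(B, full_matrices=False)
Bk = Uk[:, :k] @ np.diag(sk[:k]) @ Vkt[:k, :]

# Eckart-Young: the 2-norm error equals the (k+1)-th singular value
# (sk[k] in 0-based indexing).
err = np.linalg.norm(B - Bk, 2)
print(np.isclose(err, sk[k]))
```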
4 Numerical rank

Definition 4 (Numerical rank). The numerical rank Rk_ε(B) of B at accuracy ε is defined as the smallest integer k_ε such that there exists a matrix B̃ of rank k_ε with ‖B − B̃‖ ≤ ε.

Theorem 5. Let UΣV^T be the SVD of B and let σ_i = Σ_{i,i} denote its singular values. Then the numerical rank of B at accuracy ε is given by

    k_ε = min { k, 1 ≤ k ≤ min(m,n) : σ_{k+1} ≤ ε }.

Proof: in exercise.

Low-rank matrices

If the numerical rank of B is equal to min(m,n), then B is said to be full-rank. Conversely, if Rk_ε(B) < min(m,n), then B is said to be rank-deficient. A class of rank-deficient matrices of particular interest are low-rank matrices, defined as follows.

Definition 6 (Low-rank matrix). B is said to be low-rank (for a given accuracy ε) if its numerical rank k_ε is small enough that its rank-k_ε approximation B̃ = XY^T requires less storage than the full-rank matrix B, i.e., if k_ε (m + n) ≤ mn. In that case, B̃ is said to be a low-rank approximation of B, and ε is called the low-rank threshold.

In the following, for the sake of simplicity, we refer to the numerical rank of a matrix at accuracy ε simply as its rank.
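By Theorem 5, the numerical rank is simply the number of singular values larger than ε. A small NumPy sketch (an illustration, not the solvers' actual kernel; the helper `numerical_rank` and the test matrix are ours):

```python
import numpy as np

def numerical_rank(B, eps):
    """Theorem 5: smallest k such that sigma_{k+1} <= eps, i.e. the
    number of singular values strictly larger than eps."""
    s = np.linalg.svd(B, compute_uv=False)
    return int(np.sum(s > eps))

rng = np.random.default_rng(1)
U, _ = np.linalg.qr(rng.standard_normal((50, 30)))
V, _ = np.linalg.qr(rng.standard_normal((30, 30)))
s = 10.0 ** -np.arange(30)               # sigma_i = 10^-(i-1)
B = U @ np.diag(s) @ V.T

# Singular values above 3e-8 are 1, 1e-1, ..., 1e-7: eight of them.
print(numerical_rank(B, 3e-8))
```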
5 Compression kernels

The act of computing B̃ from B is called the compression of B. What are the different methods to compress B?
- SVD: optimal but expensive: O(mn min(m,n)) operations.
- Truncated QR factorization: slightly less accurate but much cheaper: O(mn k_ε) operations → widely used.
- Multiple other methods: randomized algorithms, adaptive cross-approximation, interpolative decomposition, CUR, etc.

In the following, we assume truncated QR is used as the compression kernel.

Low-rank subblocks

Frontal matrices are not low-rank, but in some applications they exhibit low-rank blocks. A block B represents the interaction between two subdomains σ and τ. If they have small diameters and are far away from each other, their interaction is weak and the rank of B is low. The block-admissibility condition formalizes this intuition:

    σ × τ is admissible ⇔ max(diam(σ), diam(τ)) ≤ η dist(σ, τ)
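To make the truncated QR kernel concrete, here is a minimal sketch of a column-pivoted, truncated QR via modified Gram-Schmidt (our own illustrative implementation, not the one used in the solver; the stopping rule on column norms only gives an error of order ε, not a strict bound):

```python
import numpy as np

def truncated_qr(B, eps):
    """Truncated column-pivoted QR: B ~ X @ Y.T, stopping once every
    remaining column has norm <= eps. Cost is O(m n k) instead of the
    SVD's O(m n min(m, n))."""
    m, n = B.shape
    R = B.astype(float).copy()
    xs, ys = [], []
    for _ in range(min(m, n)):
        norms = np.linalg.norm(R, axis=0)
        j = int(np.argmax(norms))          # pivot: largest remaining column
        if norms[j] <= eps:
            break
        q = R[:, j] / norms[j]             # next orthonormal basis vector
        c = q @ R                          # projections of R onto q (row of Y^T)
        xs.append(q)
        ys.append(c)
        R = R - np.outer(q, c)             # deflate the captured component
    X = np.column_stack(xs) if xs else np.zeros((m, 0))
    Y = np.column_stack(ys) if ys else np.zeros((n, 0))
    return X, Y

rng = np.random.default_rng(2)
B = rng.standard_normal((40, 5)) @ rng.standard_normal((5, 30))  # exact rank 5
X, Y = truncated_qr(B, 1e-10)
print(X.shape[1], np.allclose(B, X @ Y.T))
```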
6 Block Low-Rank matrices

A BLR matrix is defined by a partition P = S × S, with S = {σ_1, ..., σ_p}.

[Figure: 6 × 6 block matrix over σ_1, ..., σ_6 with gray and white blocks.]
- Gray blocks are non-admissible and therefore kept full-rank (FR).
- White blocks are admissible and therefore compressed to low-rank (LR).

Standard BLR factorization: FSCU (Factor, Solve, Compress, Update).

To evaluate the complexity of the BLR factorization, we must compute the cost of these four main steps.
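To illustrate the storage gain of Definition 6 at the block level, a small NumPy sketch (not from the slides; the smooth kernel matrix, block size, and threshold are arbitrary choices): partition a matrix into b × b blocks and store each block as XY^T only when that is cheaper than full-rank storage, which happens for well-separated (admissible) blocks.

```python
import numpy as np

# Smooth kernel: the interaction 1/(1 + |i - j|) decays with distance,
# so blocks far from the diagonal have low numerical rank.
n, b, eps = 256, 32, 1e-8
i = np.arange(n)
A = 1.0 / (1.0 + np.abs(i[:, None] - i[None, :]))

full_entries, blr_entries = 0, 0
for ib in range(0, n, b):
    for jb in range(0, n, b):
        blk = A[ib:ib + b, jb:jb + b]
        s = np.linalg.svd(blk, compute_uv=False)
        k = int(np.sum(s > eps))          # numerical rank at accuracy eps
        full_entries += b * b
        # Keep the block FR when low-rank storage k*(b+b) would not pay off.
        blr_entries += min(b * b, k * 2 * b)

print(blr_entries < full_entries)         # BLR storage wins on this matrix
```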
7 Part I: complexity of the BLR factorization

Cost analysis of the involved steps. Let us consider two blocks A and B of size b × b and of rank bounded by r.

step     | type  | operation     | cost
Factor   | FR    | A = LU        | O(b^3)
Solve    | FR-FR | B ← B U^{-1}  | O(b^3)
Compress | LR    | A → Ã         | O(b^2 r)
Update   | FR-FR | C ← C − AB    | O(b^3)
Update   | LR-FR | C ← C − ÃB    | O(b^2 r)
Update   | FR-LR | C ← C − AB̃   | O(b^2 r)
Update   | LR-LR | C ← C − ÃB̃   | O(b^2 r)

This is not enough to compute the complexity: we also need to bound the number of FR blocks!
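The LR-LR Update is cheap because the product of two compressed blocks never forms a full b × b intermediate: associating Ã B̃ = X_A ((Y_A^T X_B) Y_B^T) keeps all intermediate matrices of width r. A NumPy sketch (our illustration; sizes are arbitrary) verifying that both orderings give the same update:

```python
import numpy as np

rng = np.random.default_rng(3)
b, r = 200, 8
XA, YA = rng.standard_normal((b, r)), rng.standard_normal((b, r))
XB, YB = rng.standard_normal((b, r)), rng.standard_normal((b, r))
C = rng.standard_normal((b, b))

# Naive route: decompress both blocks first -- an O(b^3) product.
C_full = C - (XA @ YA.T) @ (XB @ YB.T)

# LR-LR route: only small matrices ever meet.
M = YA.T @ XB                  # r x r        -- O(b r^2) flops
C_lr = C - XA @ (M @ YB.T)     # b x r, b x b -- O(b^2 r) flops

print(np.allclose(C_full, C_lr))
```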
8 Bounding the number of FR blocks

BLR-admissibility condition of a partition P:

    P is admissible ⇔ for all σ, τ ∈ S:  #{τ : σ × τ ∈ P is not admissible} ≤ q  and  #{σ : σ × τ ∈ P is not admissible} ≤ q

Main result: for any matrix, we can build an admissible P for q = O(1), such that the maximal rank of the admissible blocks of A is r.

Amestoy, Buttari, L'Excellent, and Mary. On the Complexity of the Block Low-Rank Multifrontal Factorization, SIAM J. Sci. Comput., 2017.

Memory complexity of the dense BLR factorization

Let us consider a dense (frontal) matrix of order m divided into p × p blocks of order b, with p = m/b. The memory complexity to store the matrix can be computed as

    M_total(b, p, r) = M_FR(b, p) + M_LR(b, p, r)

Ex. 1: compute M_FR(b, p) = ..., M_LR(b, p, r) = ...
Ex. 2: assuming b = O(m^x) and r = O(m^α), compute M_total(m, x, α) = ...
Ex. 3: compute the optimal block size b = O(m^x*) and the resulting optimal complexity: x* = ..., b = ..., and M_opt(m, r) = M_total(m, x*, α) = ...
9 Flop complexity of the dense BLR factorization

Let us consider a dense (frontal) matrix of order m divided into p × p blocks of order b, with p = m/b.

step     | type  | cost     | number | C_step(b, p, r) | C_step(m, x, α)
Factor   | FR    | O(b^3)   |  ...   |       ...       |       ...
Solve    | FR-FR | O(b^3)   |  ...   |       ...       |       ...
Compress | LR    | O(b^2 r) |  ...   |       ...       |       ...
Update   | FR-FR | O(b^3)   |  ...   |       ...       |       ...
Update   | LR-FR | O(b^2 r) |  ...   |       ...       |       ...
Update   | LR-LR | O(b^2 r) |  ...   |       ...       |       ...

Ex. 1: compute C_step(b, p, r) = cost × number.
Ex. 2: compute C_step(m, x, α) with b = O(m^x) and r = O(m^α).
Ex. 3: compute the total complexity (sum of all steps) C_total(m, x, α) = ...
Ex. 4: compute the optimal block size b = O(m^x*) and the resulting optimal complexity: x* = ..., b = ..., and C_opt(m, r) = C_total(m, x*, α) = ...

Complexity of the sparse multifrontal BLR factorization

Sparse multifrontal complexity with nested dissection (ND): for a dense complexity C_opt(m, r), the sparse complexity is computed as

    C_mf = O( Σ_{l=0}^{log₂ N} 2^{dl} C_opt((N/2^l)^{d−1}) ),

where d is the dimension (2 or 3).

grid      | variant | operations (OPC)   | factor size (NNZ)
N × N     | FR      | O(N^3)             | O(N^2 log N)
N × N     | BLR     | O(N^{5/2} r^{1/2}) | O(N^2)
N × N × N | FR      | O(N^6)             | O(N^4)
N × N × N | BLR     | ...                | ...
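The nested-dissection sum above can be sanity-checked numerically. A sketch (our own check, not from the slides): with the FR dense cost C(m) = m^3 and d = 3, the sum is dominated by the root front, so doubling N should multiply the cost by about 2^6 = 64.

```python
import math

def mf_cost(N, d, dense_cost):
    """Evaluate sum_{l=0}^{log2 N} 2^(d l) * dense_cost((N / 2^l)^(d-1))."""
    total = 0.0
    for l in range(int(math.log2(N)) + 1):
        m = (N / 2 ** l) ** (d - 1)      # separator (front) size at level l
        total += 2 ** (d * l) * dense_cost(m)
    return total

# FR dense factorization costs O(m^3); in 3D the sum behaves like O(N^6).
c1 = mf_cost(128, 3, lambda m: m ** 3)
c2 = mf_cost(256, 3, lambda m: m ** 3)
print(round(c2 / c1))                    # ~2^6 = 64
```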
10 Experimental setting: matrices

1. Poisson: N^3 grid with a 7-point stencil, with u = 1 on the boundary ∂Ω:

    Δu = f

The rank bound is theoretically proven to be r = O(1).

2. Helmholtz: N^3 grid with a 27-point stencil, where ω is the angular frequency, v(x) is the seismic velocity field, and u(x, ω) is the time-harmonic wavefield solution to the forcing term s(x, ω):

    (−Δ − ω²/v(x)²) u(x, ω) = s(x, ω)

ω is fixed and equal to 4 Hz. Heuristically, the rank bound can be expected to behave as r = O(N).

Experimental MF flop complexity: Poisson (ε = 10^-10)

[Plots: flop count vs. mesh size N.]
- Nested Dissection ordering (geometric): FR fit 5n^2.02; BLR (FSCU) fit 2105n^1.…
- METIS ordering (purely algebraic): FR fit 3n^2.05; BLR (FSCU) fit 1068n^1.50

Good agreement with theoretical complexity; remains close to the ND complexity with the METIS ordering.
11 Experimental MF flop complexity: Helmholtz (ε = 10^-4)

[Plots: flop count vs. mesh size N.]
- Nested Dissection ordering (geometric): FR fit 12n^2.…; BLR (FSCU) fit 31n^1.…
- METIS ordering (purely algebraic): FR fit 8n^2.…; BLR (FSCU) fit 22n^1.87

Good agreement with theoretical complexity; remains close to the ND complexity with the METIS ordering.

Experimental MF complexity: factor size

[Plots: factor size (NNZ) vs. mesh size N.]
- NNZ (Poisson): FR fit 3n^1.40; BLR fit 12n^1.05 log n
- NNZ (Helmholtz): FR fit 15n^1.…; BLR fit 32n^1.…

Good agreement with theoretical complexity; remains close to the ND complexity with the METIS ordering (not shown).
12 Experimental MF complexity: influence of the low-rank threshold ε

[Plots: operation count (OPC) vs. mesh size N.]
- OPC (Poisson): ε = 10^-14: fit 905n^1.…; ε = 10^-10: fit 1068n^1.50; ε = 10^-6: fit 1045n^1.43; ε = 10^-2: fit 851n^1.…
- OPC (Helmholtz): ε = 10^-5: fit 23n^1.89; ε = 10^-4: fit 22n^1.87; ε = 10^-3: fit 14n^1.…

Theory states that ε should only play a role in the constant factor. True for Helmholtz, but not for Poisson: why?

Influence of zero-rank blocks on the complexity

[Table: number of full-rank (N_FR), low-rank (N_LR), and zero-rank (N_ZR) blocks, as a percentage of the total number of blocks, vs. N, for the Poisson problem at ε = 10^-14, 10^-10, 10^-6, 10^-2.]

- N_FR decreases with N: asymptotically negligible.
- N_ZR increases with ε (as one would expect), but also with N: asymptotically dominant.
13 Influence of the block size b on the complexity

Analysis on the root node (of size m = N^2).

[Plot: normalized flops vs. block size b, for three root sizes m.]

- There is a large range of acceptable block sizes around the optimal b* → flexibility to tune the block size for performance.
- That range increases with the size of the matrix → necessity to have variable block sizes.

Part II: performance of the BLR factorization
14 Sequential result (matrix S3)

[Bar charts: normalized flops and normalized time, FR vs. BLR, broken down into LAI (low arithmetic intensity) parts, Factor+Solve, Update, and Compress.]

The 7.7× gain in flops only translates into a 3.3× gain in time: why?
- Lower granularity of the Update.
- Higher relative weight of the FR parts.
- Inefficient Compress.

Multithreaded result on 24 threads

[Bar charts: normalized sequential and multithreaded time, FR vs. BLR, same breakdown.]

The 3.3× gain in sequential becomes 1.7× in multithreaded: why?
- The LAI parts have become critical.
- Update and Compress are memory-bound.
15 Exploiting tree-based multithreading in MF solvers

[Diagram: node parallelism (threads 0-3 within each front) above the L0 layer vs. tree parallelism below it.]

L'Excellent and Sid-Lakhdar. A study of shared-memory parallelism in a multifrontal solver, Parallel Computing.

How big an impact can tree-based multithreading make?

Impact of tree-based multithreading on BLR (24 threads)

[Table: time and share of low-arithmetic-intensity (LAI) work, node-only vs. node+tree multithreading, FR vs. BLR.]

- In FR, the top of the tree is dominant → tree multithreading brings little gain.
- In BLR, the bottom of the tree compresses less and becomes important → the 1.7× gain becomes 1.9× thanks to tree-based multithreading.

Theoretical speedup

grid      | variant | tree only | node only | node + tree
N × N     | FR      | O(1)      | O(N)      | O(N^2)
N × N     | BLR     | O(log N)  | O(log N)  | O(N log N)
N × N × N | FR      | O(1)      | O(N^3)    | O(N^4)
N × N × N | BLR     | ...       | ...       | ...
16 Right-looking vs. left-looking analysis (24 threads)

[Table: FR and BLR factorization time, right-looking (RL) vs. left-looking (LL), for the Update and in total.]

- RL factorization: the trailing blocks are read once but written at each step.
- LL factorization: the trailing blocks are read at each step but written once.
- Lower volume of memory transfers in LL (more critical in multithreaded runs).
- The Update is now less memory-bound: the 1.9× gain becomes 2.4× in LL.

LUAR variant: accumulation and recompression

FSCU (Factor, Solve, Compress, Update) → FSCU+LUAR:
- Better granularity in the Update operations.
- Potential recompression → asymptotic complexity reduction?
- Designed and compared several recompression strategies.
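The LUA(R) mechanism can be sketched in a few lines of NumPy (our illustration; a production kernel would recompress with QR on both factors, and here the random updates do not actually lose rank, so only the mechanism is shown): instead of applying each low-rank update X_i Y_i^T separately, concatenate the factors into one wider product, then optionally recompress the accumulator.

```python
import numpy as np

rng = np.random.default_rng(4)
b, r, nupd = 128, 6, 5

# Several low-rank contributions X_i Y_i^T destined for the same block.
updates = [(rng.standard_normal((b, r)), rng.standard_normal((b, r)))
           for _ in range(nupd)]

# LUA: accumulate. Concatenating the factors turns nupd rank-r outer
# products into a single rank-(nupd*r) product -- one large GEMM.
Xacc = np.hstack([X for X, _ in updates])
Yacc = np.hstack([Y for _, Y in updates])

# LUAR: recompress the accumulator via an SVD of Xacc, since the combined
# numerical rank is often smaller than nupd * r.
U, s, Vt = np.linalg.svd(Xacc, full_matrices=False)
k = int(np.sum(s > 1e-12))
Xr = U[:, :k] * s[:k]
Yr = Yacc @ Vt[:k].T

applied_one_by_one = sum(X @ Y.T for X, Y in updates)
print(np.allclose(Xr @ Yr.T, applied_one_by_one))
```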
17 Performance of the Outer Product with LUA(R) (24 threads)

[Plot: GF/s of the Outer Product benchmark vs. size, for b = 256 and b = 512. Table: average size of the Outer Product, flops, and time (s) of the Outer Product and in total, for LL, LUA, and LUAR.]

- All metrics include the recompression overhead.
- Higher granularity and lower flops in the Update: the 2.4× gain becomes 2.6×.

Impact of machine properties on BLR: roofline model

[Table: peak GF/s and bandwidth (GB/s) of the grunch (28 threads) and brunch (24 threads) machines, and BLR factorization time (s) of the S3 matrix with the RL, LL, and LUA variants. Roofline plot: arithmetic intensity of the Outer Product on brunch and grunch for RL, LL, LUA.]

Arithmetic intensity in BLR:
- LL > RL (lower volume of memory transfers).
- LUA > LL (higher granularities → more efficient cache use).
18 FCSU variant: compress before solve

FSCU (Factor, Solve, Compress, Update) → FSCU+LUAR:
- Better granularity in the Update operations.
- Potential recompression → asymptotic complexity reduction?
- Designed and compared several recompression strategies.

FCSU(+LUAR):
- Restricted pivoting, e.g. to diagonal blocks.
- Low-rank Solve → asymptotic complexity reduction?

Performance and accuracy of FCSU vs. FSCU

[Table: flops, time (s), and residual for FR, FSCU+LUAR, and FCSU+LUAR, with standard and restricted pivoting.]

- On this problem, restricted pivoting is enough to ensure stability.
- Better BLAS-3/BLAS-2 ratio.
- Compressing before the Solve has little impact on the residual.
- Flop reduction: the 2.6× gain becomes 3.7×.
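The low-rank Solve enabled by FCSU can be sketched as follows (our NumPy illustration; sizes and the well-conditioned triangular factor are arbitrary choices): once the block is compressed to B̃ = XY^T, the triangular solve B ← B U^{-1} only needs to touch the small factor Y, reducing the step from O(b^3) to O(b^2 r).

```python
import numpy as np

rng = np.random.default_rng(5)
b, r = 150, 7
# A well-conditioned upper-triangular factor U.
U = np.triu(rng.standard_normal((b, b))) + b * np.eye(b)
X, Y = rng.standard_normal((b, r)), rng.standard_normal((b, r))
B = X @ Y.T                          # block already compressed (FCSU order)

# FR solve: B <- B U^{-1}, an O(b^3) triangular solve on the full block.
B_fr = np.linalg.solve(U.T, B.T).T

# LR solve: apply U^{-1} to the narrow factor only -- O(b^2 r):
#   B U^{-1} = X (Y^T U^{-1}) = X (U^{-T} Y)^T
Y_new = np.linalg.solve(U.T, Y)
print(np.allclose(B_fr, X @ Y_new.T))
```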
19 Variants improve the asymptotic complexity

We have theoretically proven that:

            | FSCU               | FSCU+LUAR           | FCSU+LUAR
dense       | O(m^{5/2} r^{1/2}) | O(m^{7/3} r^{2/3})  | O(m^2 r)
sparse (3D) | O(N^5 r^{1/2})     | O(N^{14/3} r^{2/3}) | O(N^4 r)

Amestoy, Buttari, L'Excellent, and Mary. On the Complexity of the Block Low-Rank Multifrontal Factorization, SIAM J. Sci. Comput., 2017.

[Plots: flop count vs. mesh size N.]
- Poisson (ε = 10^-10): FSCU fit 1068n^1.50; FSCU+LUAR fit 2235n^1.42; FCSU+LUAR fit 6175n^1.…
- Helmholtz (ε = 10^-4): FSCU fit 22n^1.87; FSCU+LUAR fit 34n^1.82; FCSU+LUAR fit 60n^1.…

Multicore performance results (24 threads)

[Bar chart: normalized time (FR = 1) of BLR and BLR+ on matrices …Hz, 7Hz, 10Hz, E3, E4, S3, S4, p8d, p8ar, p8cr.]
- BLR: FSCU, right-looking, node-only multithreading.
- BLR+: FCSU+LUAR, left-looking, node+tree multithreading.

Amestoy, Buttari, L'Excellent, and Mary. Performance and Scalability of the Block Low-Rank Multifrontal Factorization on Multicore Architectures, submitted to ACM Trans. Math. Soft., 2017.
20 The problem with FCSU

FSCU (Factor, Solve, Compress, Update) → FSCU+LUAR:
- Better granularity in the Update operations.
- Potential recompression → asymptotic complexity reduction?
- Designed and compared several recompression strategies.

FCSU(+LUAR):
- Restricted pivoting, e.g. to diagonal blocks → not acceptable in many applications.
- Low-rank Solve → asymptotic complexity reduction?

Compress before Solve + pivoting: the CFSU variant

What's straightforward:
- Column swaps on B̃ = XY^T can be performed as row swaps of Y.
- The triangular solve and update can also be performed on Y.

What's less straightforward:
- How to assess the quality of pivot k? We need to estimate ‖B_{:,k}‖_max from the compressed form, assuming X is orthonormal (e.g. obtained by RRQR or SVD).
- How to deal with postponed/delayed pivots? Several strategies to merge them with the next panel.
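Why the orthonormality of X helps: column k of B̃ = XY^T is X @ Y[k, :], so if X has orthonormal columns then ‖B̃_{:,k}‖_2 = ‖Y_{k,:}‖_2, which bounds the pivot magnitude ‖B̃_{:,k}‖_max without decompressing the block. A NumPy sketch of this estimate (our illustration of the idea, not the solver's actual pivot test):

```python
import numpy as np

rng = np.random.default_rng(6)
m, n, r = 120, 90, 6

# Compressed block B = X Y^T with orthonormal X (as produced by RRQR/SVD).
X, _ = np.linalg.qr(rng.standard_normal((m, r)))
Y = rng.standard_normal((n, r))
B = X @ Y.T

# Column k of B is X @ Y[k, :]; with orthonormal X:
#   ||B[:, k]||_2 = ||Y[k, :]||_2   and   ||B[:, k]||_max <= ||Y[k, :]||_2,
# so pivot magnitudes can be assessed from the r-vector Y[k, :] alone.
k = 17
col_norm = np.linalg.norm(B[:, k])
est = np.linalg.norm(Y[k, :])
print(np.isclose(col_norm, est), float(np.max(np.abs(B[:, k]))) <= est + 1e-12)
```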
21 FSCU vs. FCSU vs. CFSU

- FSCU: standard pivoting, compress after the Solve.
- FCSU: restricted pivoting, compress before the Solve.
- CFSU: standard pivoting, compress before the Solve.

[Bar charts: normalized flops and residual of FR, FSCU, FCSU, and CFSU on matrices barrier2-10, Lin, para-10, kkt_power, perf009d, perf009ar.]

- When FCSU is enough (left), CFSU does not degrade the compression.
- When FCSU fails (right), CFSU achieves both a good residual and a good compression.

Distributed-memory parallelism

[Diagram: fronts of the assembly tree mapped on processes P0-P5, with LU messages and CB (contribution block) messages.]

- The volume of LU messages is reduced in BLR (compressed factors).
- The volume of CB messages can be reduced by compressing the CB, but this is an overhead cost.
22 Strong scalability analysis

[Plot: FR and BLR factorization time (s) vs. number of MPI processes × number of cores (…×10, 45×10, 60×10, 75×10, 90×10).]

- The compression rate is not significantly impacted by the number of processes.
- Flops are reduced by 12.8× but the volume of communications only by 2.2× → higher relative weight of communications.
- Load unbalance (ratio between the most and least loaded processes) increases from 1.28 to 2.57.

Communication analysis

[Plot: total bytes sent in LU and CB messages vs. front size.]

- FR case: LU messages dominate.
- BLR case: CB messages dominate → underwhelming reduction of the communications.
- CB compression allows for truly reducing the communications; it is an overhead cost but may lead to speedups depending on the network speed w.r.t. the processor speed.

Theoretical communication bounds

            | W_LU         | W_CB       | W_tot
FR          | O(n^{4/3} p) | O(n^{4/3}) | O(n^{4/3} p)
BLR (CB FR) | ...          | ...        | ...
BLR (CB LR) | ...          | ...        | ...
23 Result on a very large problem

Result on matrix 15Hz (order …, nnz …) on 900 cores:

[Table: flops (PF), factor size (TB), average and maximum memory per process (GB), and elapsed time (s) for the analysis, factorization, and solve (per RHS), for full-rank MUMPS and BLR, with the BLR/MUMPS ratio. The full-rank MUMPS factorization ran out of memory (OOM).]

References

- Amestoy, Buttari, L'Excellent, and Mary. On the Complexity of the Block Low-Rank Multifrontal Factorization, SIAM J. Sci. Comput., 2017.
- Amestoy, Buttari, L'Excellent, and Mary. Performance and Scalability of the Block Low-Rank Multifrontal Factorization on Multicore Architectures, submitted to ACM Trans. Math. Soft., 2017.
- Amestoy, Brossier, Buttari, L'Excellent, Mary, Métivier, Miniussi, and Operto. Fast 3D frequency-domain full waveform inversion with a parallel Block Low-Rank multifrontal direct solver: application to OBC data from the North Sea, Geophysics.
- Shantsev, Jaysaval, de la Kethulle de Ryhove, Amestoy, Buttari, L'Excellent, and Mary. Large-scale 3D EM modeling with a Block Low-Rank multifrontal direct solver, Geophysical Journal International, 2017.
More informationMatrix decompositions
Matrix decompositions How can we solve Ax = b? 1 Linear algebra Typical linear system of equations : x 1 x +x = x 1 +x +9x = 0 x 1 +x x = The variables x 1, x, and x only appear as linear terms (no powers
More informationCommunication-avoiding parallel and sequential QR factorizations
Communication-avoiding parallel and sequential QR factorizations James Demmel, Laura Grigori, Mark Hoemmen, and Julien Langou May 30, 2008 Abstract We present parallel and sequential dense QR factorization
More informationA Parallel Geometric Multifrontal Solver Using Hierarchically Semiseparable Structure
Page 26 of 46 A Parallel Geometric Multifrontal Solver Using Hierarchically Semiseparable Structure SHEN WANG, Department of Mathematics, Purdue University XIAOYE S. LI, Lawrence Berkeley National Laboratory
More informationNumerical Methods I Solving Square Linear Systems: GEM and LU factorization
Numerical Methods I Solving Square Linear Systems: GEM and LU factorization Aleksandar Donev Courant Institute, NYU 1 donev@courant.nyu.edu 1 MATH-GA 2011.003 / CSCI-GA 2945.003, Fall 2014 September 18th,
More informationSolution to Laplace Equation using Preconditioned Conjugate Gradient Method with Compressed Row Storage using MPI
Solution to Laplace Equation using Preconditioned Conjugate Gradient Method with Compressed Row Storage using MPI Sagar Bhatt Person Number: 50170651 Department of Mechanical and Aerospace Engineering,
More informationApplications of Randomized Methods for Decomposing and Simulating from Large Covariance Matrices
Applications of Randomized Methods for Decomposing and Simulating from Large Covariance Matrices Vahid Dehdari and Clayton V. Deutsch Geostatistical modeling involves many variables and many locations.
More informationBlock Bidiagonal Decomposition and Least Squares Problems
Block Bidiagonal Decomposition and Least Squares Problems Åke Björck Department of Mathematics Linköping University Perspectives in Numerical Analysis, Helsinki, May 27 29, 2008 Outline Bidiagonal Decomposition
More informationA fast randomized algorithm for computing a Hierarchically Semi-Separable representation of a matrix
A fast randomized algorithm for computing a Hierarchically Semi-Separable representation of a matrix P.G. Martinsson, Department of Applied Mathematics, University of Colorado at Boulder Abstract: Randomized
More informationarxiv: v1 [cs.na] 20 Jul 2015
AN EFFICIENT SOLVER FOR SPARSE LINEAR SYSTEMS BASED ON RANK-STRUCTURED CHOLESKY FACTORIZATION JEFFREY N. CHADWICK AND DAVID S. BINDEL arxiv:1507.05593v1 [cs.na] 20 Jul 2015 Abstract. Direct factorization
More informationExploiting off-diagonal rank structures in the solution of linear matrix equations
Stefano Massei Exploiting off-diagonal rank structures in the solution of linear matrix equations Based on joint works with D. Kressner (EPFL), M. Mazza (IPP of Munich), D. Palitta (IDCTS of Magdeburg)
More informationA Novel Aggregation Method based on Graph Matching for Algebraic MultiGrid Preconditioning of Sparse Linear Systems
A Novel Aggregation Method based on Graph Matching for Algebraic MultiGrid Preconditioning of Sparse Linear Systems Pasqua D Ambra, Alfredo Buttari, Daniela Di Serafino, Salvatore Filippone, Simone Gentile,
More informationH 2 -matrices with adaptive bases
1 H 2 -matrices with adaptive bases Steffen Börm MPI für Mathematik in den Naturwissenschaften Inselstraße 22 26, 04103 Leipzig http://www.mis.mpg.de/ Problem 2 Goal: Treat certain large dense matrices
More informationI-v k e k. (I-e k h kt ) = Stability of Gauss-Huard Elimination for Solving Linear Systems. 1 x 1 x x x x
Technical Report CS-93-08 Department of Computer Systems Faculty of Mathematics and Computer Science University of Amsterdam Stability of Gauss-Huard Elimination for Solving Linear Systems T. J. Dekker
More informationAccelerating linear algebra computations with hybrid GPU-multicore systems.
Accelerating linear algebra computations with hybrid GPU-multicore systems. Marc Baboulin INRIA/Université Paris-Sud joint work with Jack Dongarra (University of Tennessee and Oak Ridge National Laboratory)
More informationMath 671: Tensor Train decomposition methods
Math 671: Eduardo Corona 1 1 University of Michigan at Ann Arbor December 8, 2016 Table of Contents 1 Preliminaries and goal 2 Unfolding matrices for tensorized arrays The Tensor Train decomposition 3
More informationA DISTRIBUTED-MEMORY RANDOMIZED STRUCTURED MULTIFRONTAL METHOD FOR SPARSE DIRECT SOLUTIONS
SIAM J. SCI. COMPUT. Vol. 39, No. 4, pp. C292 C318 c 2017 Society for Industrial and Applied Mathematics A DISTRIBUTED-MEMORY RANDOMIZED STRUCTURED MULTIFRONTAL METHOD FOR SPARSE DIRECT SOLUTIONS ZIXING
More informationParallel sparse direct solvers for Poisson s equation in streamer discharges
Parallel sparse direct solvers for Poisson s equation in streamer discharges Margreet Nool, Menno Genseberger 2 and Ute Ebert,3 Centrum Wiskunde & Informatica (CWI), P.O.Box 9479, 9 GB Amsterdam, The Netherlands
More informationContents. Preface... xi. Introduction...
Contents Preface... xi Introduction... xv Chapter 1. Computer Architectures... 1 1.1. Different types of parallelism... 1 1.1.1. Overlap, concurrency and parallelism... 1 1.1.2. Temporal and spatial parallelism
More informationRank Revealing QR factorization. F. Guyomarc h, D. Mezher and B. Philippe
Rank Revealing QR factorization F. Guyomarc h, D. Mezher and B. Philippe 1 Outline Introduction Classical Algorithms Full matrices Sparse matrices Rank-Revealing QR Conclusion CSDA 2005, Cyprus 2 Situation
More informationMultilevel low-rank approximation preconditioners Yousef Saad Department of Computer Science and Engineering University of Minnesota
Multilevel low-rank approximation preconditioners Yousef Saad Department of Computer Science and Engineering University of Minnesota SIAM CSE Boston - March 1, 2013 First: Joint work with Ruipeng Li Work
More informationDense LU factorization and its error analysis
Dense LU factorization and its error analysis Laura Grigori INRIA and LJLL, UPMC February 2016 Plan Basis of floating point arithmetic and stability analysis Notation, results, proofs taken from [N.J.Higham,
More informationLinear Algebra Linear Algebra : Matrix decompositions Monday, February 11th Math 365 Week #4
Linear Algebra Linear Algebra : Matrix decompositions Monday, February 11th Math Week # 1 Saturday, February 1, 1 Linear algebra Typical linear system of equations : x 1 x +x = x 1 +x +9x = 0 x 1 +x x
More informationParallel Singular Value Decomposition. Jiaxing Tan
Parallel Singular Value Decomposition Jiaxing Tan Outline What is SVD? How to calculate SVD? How to parallelize SVD? Future Work What is SVD? Matrix Decomposition Eigen Decomposition A (non-zero) vector
More informationSolving PDEs with CUDA Jonathan Cohen
Solving PDEs with CUDA Jonathan Cohen jocohen@nvidia.com NVIDIA Research PDEs (Partial Differential Equations) Big topic Some common strategies Focus on one type of PDE in this talk Poisson Equation Linear
More information14.2 QR Factorization with Column Pivoting
page 531 Chapter 14 Special Topics Background Material Needed Vector and Matrix Norms (Section 25) Rounding Errors in Basic Floating Point Operations (Section 33 37) Forward Elimination and Back Substitution
More informationParallel programming using MPI. Analysis and optimization. Bhupender Thakur, Jim Lupo, Le Yan, Alex Pacheco
Parallel programming using MPI Analysis and optimization Bhupender Thakur, Jim Lupo, Le Yan, Alex Pacheco Outline l Parallel programming: Basic definitions l Choosing right algorithms: Optimal serial and
More informationAn H-LU Based Direct Finite Element Solver Accelerated by Nested Dissection for Large-scale Modeling of ICs and Packages
PIERS ONLINE, VOL. 6, NO. 7, 2010 679 An H-LU Based Direct Finite Element Solver Accelerated by Nested Dissection for Large-scale Modeling of ICs and Packages Haixin Liu and Dan Jiao School of Electrical
More informationFAST STRUCTURED EIGENSOLVER FOR DISCRETIZED PARTIAL DIFFERENTIAL OPERATORS ON GENERAL MESHES
Proceedings of the Project Review, Geo-Mathematical Imaging Group Purdue University, West Lafayette IN, Vol. 1 2012 pp. 123-132. FAST STRUCTURED EIGENSOLVER FOR DISCRETIZED PARTIAL DIFFERENTIAL OPERATORS
More informationRank revealing factorizations, and low rank approximations
Rank revealing factorizations, and low rank approximations L. Grigori Inria Paris, UPMC January 2018 Plan Low rank matrix approximation Rank revealing QR factorization LU CRTP: Truncated LU factorization
More informationA Sparse QS-Decomposition for Large Sparse Linear System of Equations
A Sparse QS-Decomposition for Large Sparse Linear System of Equations Wujian Peng 1 and Biswa N. Datta 2 1 Department of Math, Zhaoqing University, Zhaoqing, China, douglas peng@yahoo.com 2 Department
More informationSome Geometric and Algebraic Aspects of Domain Decomposition Methods
Some Geometric and Algebraic Aspects of Domain Decomposition Methods D.S.Butyugin 1, Y.L.Gurieva 1, V.P.Ilin 1,2, and D.V.Perevozkin 1 Abstract Some geometric and algebraic aspects of various domain decomposition
More informationSymmetric Pivoting in ScaLAPACK Craig Lucas University of Manchester Cray User Group 8 May 2006, Lugano
Symmetric Pivoting in ScaLAPACK Craig Lucas University of Manchester Cray User Group 8 May 2006, Lugano Introduction Introduction We wanted to parallelize a serial algorithm for the pivoted Cholesky factorization
More informationA simple FEM solver and its data parallelism
A simple FEM solver and its data parallelism Gundolf Haase Institute for Mathematics and Scientific Computing University of Graz, Austria Chile, Jan. 2015 Partial differential equation Considered Problem
More informationPartial Left-Looking Structured Multifrontal Factorization & Algorithms for Compressed Sensing. Cinna Julie Wu
Partial Left-Looking Structured Multifrontal Factorization & Algorithms for Compressed Sensing by Cinna Julie Wu A dissertation submitted in partial satisfaction of the requirements for the degree of Doctor
More informationc 2013 Society for Industrial and Applied Mathematics
SIAM J. MATRIX ANAL. APPL. Vol. 34, No. 3, pp. 1401 1429 c 2013 Society for Industrial and Applied Mathematics LU FACTORIZATION WITH PANEL RANK REVEALING PIVOTING AND ITS COMMUNICATION AVOIDING VERSION
More informationMinisymposia 9 and 34: Avoiding Communication in Linear Algebra. Jim Demmel UC Berkeley bebop.cs.berkeley.edu
Minisymposia 9 and 34: Avoiding Communication in Linear Algebra Jim Demmel UC Berkeley bebop.cs.berkeley.edu Motivation (1) Increasing parallelism to exploit From Top500 to multicores in your laptop Exponentially
More informationLecture 5: Randomized methods for low-rank approximation
CBMS Conference on Fast Direct Solvers Dartmouth College June 23 June 27, 2014 Lecture 5: Randomized methods for low-rank approximation Gunnar Martinsson The University of Colorado at Boulder Research
More informationParallel PIPS-SBB Multi-level parallelism for 2-stage SMIPS. Lluís-Miquel Munguia, Geoffrey M. Oxberry, Deepak Rajan, Yuji Shinano
Parallel PIPS-SBB Multi-level parallelism for 2-stage SMIPS Lluís-Miquel Munguia, Geoffrey M. Oxberry, Deepak Rajan, Yuji Shinano ... Our contribution PIPS-PSBB*: Multi-level parallelism for Stochastic
More informationFast algorithms for hierarchically semiseparable matrices
NUMERICAL LINEAR ALGEBRA WITH APPLICATIONS Numer. Linear Algebra Appl. 2010; 17:953 976 Published online 22 December 2009 in Wiley Online Library (wileyonlinelibrary.com)..691 Fast algorithms for hierarchically
More informationA direct solver for elliptic PDEs in three dimensions based on hierarchical merging of Poincaré-Steklov operators
(1) A direct solver for elliptic PDEs in three dimensions based on hierarchical merging of Poincaré-Steklov operators S. Hao 1, P.G. Martinsson 2 Abstract: A numerical method for variable coefficient elliptic
More informationMatrix decompositions
Matrix decompositions How can we solve Ax = b? 1 Linear algebra Typical linear system of equations : x 1 x +x = x 1 +x +9x = 0 x 1 +x x = The variables x 1, x, and x only appear as linear terms (no powers
More information