Block Low-Rank (BLR) approximations to improve multifrontal sparse solvers
1 Block Low-Rank (BLR) approximations to improve multifrontal sparse solvers. Clément Weisbecker, ENSEEIHT-IRIT, University of Toulouse, France. Joint work with Patrick Amestoy, Cleve Ashcraft, Olivier Boiteau, Alfredo Buttari and Jean-Yves L'Excellent. PhD started on October 1st, 2010 and financed by EDF. MUMPS Users Group Meeting, Clamart, France, May 29-30, 2013.
2 Introduction. Multifrontal solver: direct solver for large linear systems; objective: A = LU. Low-rank approximations: compression and flop reduction; accuracy controlled by a numerical parameter. Goal: combine these two notions to improve multifrontal solvers (in the context of MUMPS).
3 The multifrontal method
4 Separator tree (Schreiber 1982): dependencies between variables; topological order.
5 Frontal matrices. At each node, a partial factorization of the frontal matrix is performed: the contribution blocks of the children (CB 1, CB 2) are assembled into the front, the fully summed (FS) variables are eliminated, and a new contribution block (CB) is produced. FS = separator variables; NFS (non fully summed) = border variables. CBs are stacked waiting for assembly (peak of CB stack).
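The partial factorization of a front can be illustrated with a small dense sketch. It assumes the FS variables are ordered first in the front and that no pivoting is needed; the function names `partial_factorization` and `assemble` are illustrative and are not MUMPS routines.

```python
import numpy as np

def partial_factorization(front, n_fs):
    """Eliminate the first n_fs (fully summed) variables of a dense front.

    Returns the contribution block, i.e. the Schur complement
    CB = F22 - F21 * F11^{-1} * F12 on the NFS variables.
    """
    f11 = front[:n_fs, :n_fs]
    f12 = front[:n_fs, n_fs:]
    f21 = front[n_fs:, :n_fs]
    f22 = front[n_fs:, n_fs:]
    return f22 - f21 @ np.linalg.solve(f11, f12)

def assemble(parent_front, cb, rows):
    """Add a child's CB into the parent front at the given parent indices."""
    parent_front[np.ix_(rows, rows)] += cb
    return parent_front
```

The CB is what a child passes to its parent; `assemble` adds it into the rows and columns of the parent front that carry the same variables.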
6 Low-rank approximations
7 Low-rank matrices. Low-rank matrix (Bebendorf): let $A$ be an $m \times n$ matrix and let $k_\varepsilon$ be the approximated numerical rank of $A$ at accuracy $\varepsilon$. $A$ is said to be a low-rank matrix if there exist matrices $X$ of size $m \times k_\varepsilon$, $Y$ of size $n \times k_\varepsilon$ and $E$ of size $m \times n$ such that $A = X Y^T + E$, where $\|E\|_2 \le \varepsilon$ and $k_\varepsilon (m + n) < mn$. Low-rank product: $X_1 (Y_1^T X_2) Y_2^T$, where the inner matrix $C = Y_1^T X_2$ is only $k_1 \times k_2$.
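The definition above can be tried out with a truncated SVD, which is one possible compression kernel (the slides refer to a QR dropping parameter later on, so the SVD here is a substitute chosen for brevity). The names `compress`, `is_low_rank` and `lowrank_product` are illustrative.

```python
import numpy as np

def compress(a, eps):
    """Return X (m x k) and Y (n x k) with ||A - X Y^T||_2 <= eps."""
    u, s, vt = np.linalg.svd(a, full_matrices=False)
    k = int(np.sum(s > eps))      # singular values <= eps are dropped
    x = u[:, :k] * s[:k]          # scale the left vectors by the singular values
    y = vt[:k, :].T
    return x, y

def is_low_rank(x, y):
    """Storage criterion k (m + n) < m n."""
    m, k = x.shape
    n = y.shape[0]
    return k * (m + n) < m * n

def lowrank_product(x1, y1, x2, y2):
    """Low-rank form of (X1 Y1^T)(X2 Y2^T) = X1 (Y1^T X2) Y2^T."""
    c = y1.T @ x2                 # small k1 x k2 matrix
    return x1 @ c, y2
```

The product never forms an $m \times n$ dense intermediate: only the small matrix $C$ and a thin $m \times k_2$ factor are computed.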
8 Where can I find low-rank matrices? Problem: matrices are usually not low-rank! Solution (Börm, Bebendorf): well-defined subblocks $A_b = A_{\sigma\tau}$ can be low-rank ($b = \sigma \times \tau \subset I \times I$). Define a clustering $C$ of the index set $I$ (for instance $I_1$, $I_2$, $I_3$) to obtain low-rank blocks.
9 Admissibility condition. How to find a good clustering? Admissibility condition for elliptic PDEs (Börm, Grasedyck, Hackbusch): let $\sigma \subset I$ and $\tau \subset I$ be clusters of indices. The block $(\sigma, \tau)$ is said to be admissible if $\mathrm{diam}(\sigma) + \mathrm{diam}(\tau) \le 2\eta\, \mathrm{dist}(\sigma, \tau)$, where $\eta$ is a problem-dependent fixed parameter (usually 1 or 0.5).
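As a minimal sketch, the admissibility test can be evaluated directly when each cluster comes with point coordinates. That is an assumption made for illustration: the clustering presented later in the talk works on the graph rather than on the mesh. The helper names are hypothetical.

```python
import numpy as np

def pairwise_distances(p, q):
    """Euclidean distances between the rows of p (a x d) and q (b x d)."""
    return np.linalg.norm(p[:, None, :] - q[None, :, :], axis=-1)

def diam(p):
    """Diameter of a cluster: largest distance between two of its points."""
    return pairwise_distances(p, p).max()

def dist(p, q):
    """Distance between two clusters: smallest point-to-point distance."""
    return pairwise_distances(p, q).min()

def is_admissible(sigma, tau, eta=1.0):
    """diam(sigma) + diam(tau) <= 2 * eta * dist(sigma, tau)."""
    return diam(sigma) + diam(tau) <= 2.0 * eta * dist(sigma, tau)
```

Two clusters of diameter 1 separated by a gap of 2 pass the test with $\eta = 1$, while the same clusters separated by a gap of 0.1 do not.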
10 Illustration (1) [Figure: two domains $D_1$ and $D_2$.]
16 Illustration (2) [Figure: rank of the interaction between $D_1$ and $D_2$ versus the distance between them; room for compression.]
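17 Exploiting low-rank subblocks: H-matrices (Börm, Bebendorf, Hackbusch, Grasedyck). [Figure: hierarchical partition of $\tilde{A}$ with dense diagonal blocks $D_{11}$, $D_{22}$, $D_{44}$, $D_{55}$ and low-rank off-diagonal blocks $X_{12} Y_{12}^T$, $X_{21} Y_{21}^T$, $X_{36} Y_{36}^T$, $X_{63} Y_{63}^T$, $X_{45} Y_{45}^T$, $X_{54} Y_{54}^T$, together with the block cluster tree: $I_7 \times I_7$ splits into $I_3 \times I_3$, $I_3 \times I_6$, $I_6 \times I_3$, $I_6 \times I_6$; $I_3 \times I_3$ splits into the blocks over $I_1$, $I_2$; $I_6 \times I_6$ splits into the blocks over $I_4$, $I_5$.]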
18 Exploiting low-rank subblocks: HSS matrices (Li, Napov, Xia, Gu, Wang, Rouet, Gillman, Martinsson).
$$\tilde{A} = \begin{pmatrix} D_1 & X_1 B_1 Y_2^T & X_1 R_1 B_3 W_4^T Y_4^T & X_1 R_1 B_3 W_5^T Y_5^T \\ X_2 B_2 Y_1^T & D_2 & X_2 R_2 B_3 W_4^T Y_4^T & X_2 R_2 B_3 W_5^T Y_5^T \\ X_4 R_4 B_6 W_1^T Y_1^T & X_4 R_4 B_6 W_2^T Y_2^T & D_4 & X_4 B_4 Y_5^T \\ X_5 R_5 B_6 W_1^T Y_1^T & X_5 R_5 B_6 W_2^T Y_2^T & X_5 B_5 Y_4^T & D_5 \end{pmatrix}$$
[Figure: HSS tree with $B_3$, $B_6$ at the top level, $R_i, W_i$ and $B_1$, $B_2$, $B_4$, $B_5$ at the intermediate level, and $X_i, Y_i$, $D_i$ at the leaves, for $i = 1, 2, 4, 5$.]
19 Exploiting low-rank subblocks: BLR matrices.
$$\tilde{A} = \begin{pmatrix} D_1 & X_{12} Y_{12}^T & X_{13} Y_{13}^T & X_{14} Y_{14}^T \\ X_{21} Y_{21}^T & D_2 & X_{23} Y_{23}^T & X_{24} Y_{24}^T \\ X_{31} Y_{31}^T & X_{32} Y_{32}^T & D_3 & X_{34} Y_{34}^T \\ X_{41} Y_{41}^T & X_{42} Y_{42}^T & X_{43} Y_{43}^T & D_4 \end{pmatrix}$$
Particular case of H-matrices; no tree; natural matrix structure. Idea: representing fronts with BLR matrices.
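A BLR representation of a dense matrix can be sketched as follows, assuming a uniform block size and a truncated SVD as the compression kernel (both are simplifications for illustration, and all function names are hypothetical). Diagonal blocks stay dense; every off-diagonal block is stored as a pair $(X, Y)$.

```python
import numpy as np

def compress(block, eps):
    """Truncated SVD: block ~ X @ Y.T with 2-norm error at most eps."""
    u, s, vt = np.linalg.svd(block, full_matrices=False)
    k = int(np.sum(s > eps))
    return u[:, :k] * s[:k], vt[:k, :].T

def blr_compress(a, block_size, eps):
    """Map (row start, col start) -> dense diagonal block or (X, Y) pair."""
    n = a.shape[0]
    blocks = {}
    for i in range(0, n, block_size):
        for j in range(0, n, block_size):
            sub = a[i:i + block_size, j:j + block_size]
            blocks[i, j] = sub.copy() if i == j else compress(sub, eps)
    return blocks

def blr_to_dense(blocks, n):
    """Rebuild the (approximate) dense matrix from its BLR blocks."""
    out = np.zeros((n, n))
    for (i, j), blk in blocks.items():
        dense = blk[0] @ blk[1].T if isinstance(blk, tuple) else blk
        out[i:i + dense.shape[0], j:j + dense.shape[1]] = dense
    return out

def blr_storage(blocks):
    """Number of stored entries in the BLR representation."""
    total = 0
    for blk in blocks.values():
        total += blk[0].size + blk[1].size if isinstance(blk, tuple) else blk.size
    return total
```

Because there is no tree, each block is compressed independently of the others, which is the flat structure the slide contrasts with H and HSS.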
20 Comparative study: compression rates. Compression rates of the frontal matrix at the root of a multifrontal tree, on two 3D stencils (discretization of a 128 x 128 x 128 cube). [Figures: fraction of dense storage versus low-rank threshold for H, HSS and BLR; one plot for the Laplacian, one for the 27-pts Geoazur stencil.]
21 Comparative study: compression cost. Compression cost of the frontal matrix at the root of a multifrontal tree, on two 3D stencils (discretization of a 128 x 128 x 128 cube). [Figures: log10(number of operations for compression) versus low-rank threshold for H, HSS and BLR; one plot for the Laplacian, one for the 27-pts Geoazur stencil.]
22 Adaptation to a multifrontal solver. BLR properties and what they help with: no relative order (efficient clustering); efficient compression and decompression (distribution, assembly); flat, non-hierarchical structure (pivoting).
26 Clustering variables
27 Admissible clustering. Constraint: the admissibility condition should be satisfied. [Figure: two clusterings compared.] Large diameters: fraction of memory used 83%. Small diameters: fraction of memory used 57%.
28 Halo algorithm for the clustering of a separator. Designed to catch the geometry of the problem; computed with the graph instead of the mesh; coupled with a third-party partitioning tool.
1. The separator
2. The halo
3. Extraction of the halo
4. Partition of the halo (parts P 1, P 2, P 3)
5. Partition of the separator (block size is fixed)
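The five steps can be sketched on an adjacency-list graph. This is an illustration under stated assumptions: the halo is taken as all vertices within `depth` edges of the separator, and `partition` is a caller-supplied stand-in for the third-party partitioning tool (it maps each vertex of the halo subgraph to a part id). None of these names come from MUMPS.

```python
from collections import deque

def halo(adjacency, separator, depth):
    """Steps 1-2: vertices within `depth` edges of the separator (included)."""
    level = {v: 0 for v in separator}
    queue = deque(separator)
    while queue:
        v = queue.popleft()
        if level[v] == depth:
            continue                      # do not grow past the halo depth
        for w in adjacency[v]:
            if w not in level:
                level[w] = level[v] + 1
                queue.append(w)
    return set(level)

def induced_subgraph(adjacency, vertices):
    """Step 3: extract the halo as a standalone graph."""
    return {v: [w for w in adjacency[v] if w in vertices] for v in vertices}

def cluster_separator(adjacency, separator, depth, partition):
    """Steps 4-5: partition the halo, then let the separator inherit the parts."""
    sub = induced_subgraph(adjacency, halo(adjacency, separator, depth))
    part_of = partition(sub)              # hypothetical partitioner: vertex -> part id
    clusters = {}
    for v in separator:
        clusters.setdefault(part_of[v], []).append(v)
    return list(clusters.values())
```

Partitioning the halo rather than the separator alone gives the partitioner a connected neighbourhood to work with, since separator variables are often not directly connected to each other in the graph.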
34 Clustering of the variables of a front. front = separator + border (S denotes the separator). 1- separator: halo. 2- border? 2 choices: EXPLICIT or INHERITED (top down).
43 Structural comparison. optimal x optimal = optimal block; small x optimal = large enough block; small x small = too small block, hence reclustering strategies.
47 Illustration with a top-level separator from SCOTCH. [Figures: side view, face view, perspective view.]
48 Block Low-Rank multifrontal method
49 General idea: fronts approximated with low-rank matrices. [Figure: a front whose fully summed part has diagonal blocks $L^i_{1,1} U^i_{1,1}$, $L^i_{2,2} U^i_{2,2}$, ..., $L^i_{d_i,d_i} U^i_{d_i,d_i}$, an off-diagonal block indexed by clusters $\sigma$ and $\tau$, and the CB.]
50 Switch level (1). [Figure: percent of the entries of the factor computed in fronts larger than N, versus mesh size, for N = 200, 400, 600, 800, 1000.] A large ratio of the memory consumption is targeted.
51 Switch level (1). [Figure: percent of operations performed in fronts larger than N, versus mesh size, for N = 200, 400, 600, 800, ...] A large ratio of the flop count is targeted.
52 Switch level (2). Low-rank techniques are only used on large fronts. [Table: proportion of fronts larger than N with respect to the total number of fronts in the multifrontal tree, for different mesh sizes.] Only a few fronts are actually compressed!
53 Factorization algorithms. Operation counts by block type:

task             operation                               dense       low-rank
Factor (F)       $B = L U^T$                             $(2/3)n^3$  $(2/3)n^3$
Compress (C)     $C = X Y^T$                                         $k n^2$
Solve (S)        $D = X (Y^T L^{-1})$                    $n^3$       $k n^2$
CB update (U)    $D = D - X_1 (Y_1^T X_2) Y_2^T$         $2n^3$      $2k n^2$

Standard FR algorithm: Factor (F), Solve (S), Update (U).
BLR FSCU algorithm: Factor (F), Solve (S), Compress (C), Update (U).
FCSU version: more efficient, less stability.
FSUC version: no flop reduction, more stability.
High algorithmic flexibility.
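The FSCU ordering can be sketched on a dense symmetric positive definite matrix. Several simplifications are assumed here and are not taken from the slides: a Cholesky ($L L^T$) factorization replaces the LU one so that no pivoting is needed, the compression kernel is a truncated SVD, and the matrix size is a multiple of the block size. The function names are illustrative.

```python
import numpy as np

def compress(block, eps):
    """Truncated SVD: block ~ X @ Y.T with 2-norm error at most eps."""
    u, s, vt = np.linalg.svd(block, full_matrices=False)
    k = int(np.sum(s > eps))
    return u[:, :k] * s[:k], vt[:k, :].T

def blr_cholesky_fscu(a, b, eps):
    """Block Cholesky with Factor, Solve, Compress, Update per block column.

    Returns the (approximate) lower factor as a dense array and the list of
    ranks of the compressed off-diagonal blocks.
    """
    a = a.copy()
    p = a.shape[0] // b                       # assumes a.shape[0] == p * b
    blk = lambda i: slice(i * b, (i + 1) * b)
    low = np.zeros_like(a)
    ranks = []
    for k in range(p):
        lkk = np.linalg.cholesky(a[blk(k), blk(k)])              # Factor
        low[blk(k), blk(k)] = lkk
        panel = {}
        for i in range(k + 1, p):
            s = np.linalg.solve(lkk, a[blk(i), blk(k)].T).T      # Solve: A_ik L_kk^{-T}
            panel[i] = compress(s, eps)                          # Compress
            ranks.append(panel[i][0].shape[1])
            low[blk(i), blk(k)] = panel[i][0] @ panel[i][1].T
        for i in range(k + 1, p):                                # Update (lower part)
            xi, yi = panel[i]
            for j in range(k + 1, i + 1):
                xj, yj = panel[j]
                a[blk(i), blk(j)] -= xi @ ((yi.T @ yj) @ xj.T)
    return low, ranks
```

Moving the `compress` call before the solve would give the FCSU ordering, and moving it after the update loop would give FSUC; in the latter case the update works on dense blocks, which is consistent with the "no flop reduction" remark on the slide.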
66 Experiments
67 Set of problems. Columns: Name, Prop, Arith, N (10^6), NZ (10^9), mem LU (GB), flops LU, CSR, appli.
Curl: D/sym, D, CSR E-15.
Geoazur: D/unsym, Z, CSR E-4, wave prop.
EDF_A_MECA_R12: 2D/sym, D, CSR E-15, mechanics.
EDF_D_THER_R7: 3D/sym, D, CSR E-15, thermics.
CSR = Componentwise Scaled Residual. Code_Aster tpll01{a,d} test cases (refined). Large matrices, applicative problems.
68 Metrics. Memory: L = fraction of FR factors storage obtained with BLR (%); CB = fraction of FR maximum size of CB stack obtained with BLR (%). Flops: fraction of FR operations needed for the BLR factorization (in percent or absolute data, including the compression cost).
69 Clustering strategy (inh = inherited, exp = explicit).

             memory L       memory CB      flops facto    time clustering   full rank analysis
             inh    exp     inh    exp     inh    exp     inh     exp
Curl         %      624%    70%    55%     109%   111%    13 s    27 s      897 s
Geoazur      %      770%    470%   450%    608%   591%    5 s     42 s      62 s
TH_RAFF7     341%   307%    175%   162%    72%    66%     34 s    206 s     387 s
ME_RAFF12    529%   511%    48%    41%     61%    61%     121 s   239 s     1971 s

Inherited version is more than 2 times faster; same results on $L_{11}$; comparable results on $L_{21}$; a little less good on CBs. Inherited clustering is used for all the experiments.
70 Global ordering of the matrix: general results. Results with different orderings on the Geoazur128 problem.

             FR                          LR
ordering     mry      flops    peak      L      CB     flops
AMD          109GB    39E      GB        735%   400%   594%
AMF          72GB     18E      GB        699%   535%   483%
PORD         55GB     10E      GB        704%   490%   489%
METIS        46GB     62E      GB        787%   464%   626%
SCOTCH       49GB     66E      GB        794%   481%   634%

Algebraic approach.
72 Global ordering of the matrix: AMF tree [visualization tool developed by M. Bremond, talk tomorrow 11:30].
73 Global ordering of the matrix: SCOTCH tree [visualization tool developed by M. Bremond, talk tomorrow 11:30].
74 Influence of the size of the problem. Different refinements of problem EDF_A_MECA done with Homard.

Refinement   N            memory L   memory CB   flops facto
R8           2, 101,      %          43%         37%
R11          33, 570,     %          29%         12%
R12          134, 250,    %          24%         7%

The larger the problem, the more efficient the method. Target = challenging problems.
75 Scalability with respect to the size of the problem. Laplacian problem. [Figure: factorization flops versus mesh size; Full Rank $O(N^2)$, BLR $O(N^{4/3})$.] $O(N^{4/3})$ complexity.
76 Influence of the size of the clusters.

block size   memory L   memory CB   flops facto   flops compr   flops other
             %          55%         75%           98E+11        46E
             %          50%         68%           15E+12        41E
             %          46%         63%           19E+12        37E
             %          45%         60%           24E+12        34E
             %          44%         58%           27E+12        33E
             %          44%         56%           31E+12        32E
             %          45%         56%           34E+12        31E
             %          44%         56%           37E+12        31E
             %          44%         57%           44E+12        31E+13

Flexibility (no need for a fine tuning); can be dynamically adapted for pivoting, parallelism, ...
77 Global results: 2D problems. [Figures: L, CB, flops facto (%) and CSR versus QR dropping parameter, for EDF_A_MECA_R12 and CURL-CURL 2D 5000x.]

Name             N        mem LU   flops LU   CSR
EDF_A_MECA_R12   134E     GB       2E+14      4E-15
Curl-Curl        E+6      29 GB    5E+12      2E-15
78 Global results: 3D problems. [Figures: L, CB, flops facto (%) and CSR versus QR dropping parameter, for EDF_D_THER_R and GEOAZUR 3D 128x128x.]

Name             N        mem LU   flops LU   CSR
EDF_D_THER_R7    8E       GB       1E+14      8E-15
Geoazur          E+6      54 GB    6E+13      3E-4
79 Application to geophysics (1). Helmholtz equation for seismic modeling (SEISCOPE project). [Figure: EAGE overthrust ground model; axes Dip (km), Cross (km), Depth (km); colour scale in m/s.]

fqcy    Flops LU    Mem LU   Peak memory
2 Hz    8957E+11    3 GB     4 GB
4 Hz    1639E       GB       25 GB
8 Hz    5769E       GB       283 GB
80 Application to geophysics (2). [Figure: panels a) to l), 3D views with axes Dip (km), Cross (km) and Depth (km).]
81 Application to geophysics (3). [Figure: panels a) to d), 3D views with axes Dip (km), Cross (km) and Depth (km).]

ε            fqcy    flops facto   memory F   memory CB
$10^{-5}$    2 Hz    418 %         618 %      323%
$10^{-5}$    4 Hz    274 %         500 %      244%
$10^{-5}$    8 Hz    218 %         416 %      239%
$10^{-4}$    2 Hz    329 %         534 %      239%
$10^{-4}$    4 Hz    200 %         422 %      217%
$10^{-4}$    8 Hz    152 %         289 %      194%
$10^{-3}$    2 Hz    246 %         447 %      168%
$10^{-3}$    4 Hz    138 %         345 %      190%
$10^{-3}$    8 Hz    98 %          213 %      159%
82 Preconditioning with BLR: set of problems.

Name       N       NZ       Cond     application
Piston     13E+6   547E+6   51E+5    external pressure force on the top
perf001d   20E+6   758E+6   15E+11   cavity hook subjected to internal pressure force (challenging for EDF)
83 Preconditioning with BLR: number of iterations. Number of iterations needed for the convergence. FR is MUMPS 4.10 in single precision. A dedicated mark in the table means no convergence in 1,000 seconds. [Table: iterations for Piston and perf001d, FR versus BLR.] perf001d: only 2 iterations to convergence with a BLR double precision preconditioner at $\varepsilon = 10^{-8}$: good solution versus coarse sword of usual single precision!
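How an approximate factorization acts as a preconditioner can be sketched with a preconditioned conjugate gradient. The assumptions are explicit: the matrix is symmetric positive definite, and the low-accuracy factorization is imitated by a single-precision Cholesky factor (the analogue of the FR single precision reference, not a BLR factorization). The function `pcg` is a textbook implementation, not the GCPC solver used in the experiments.

```python
import numpy as np

def pcg(a, b, apply_prec, tol=1e-10, maxiter=500):
    """Preconditioned conjugate gradient; returns (x, number of iterations)."""
    x = np.zeros_like(b)
    r = b - a @ x
    z = apply_prec(r)
    p = z.copy()
    rz = r @ z
    for it in range(1, maxiter + 1):
        ap = a @ p
        alpha = rz / (p @ ap)
        x += alpha * p
        r -= alpha * ap
        if np.linalg.norm(r) <= tol * np.linalg.norm(b):
            return x, it
        z = apply_prec(r)
        rz_new = r @ z
        p = z + (rz_new / rz) * p
        rz = rz_new
    return x, maxiter

def approximate_cholesky_prec(a):
    """Preconditioner built from a single-precision Cholesky factor of a."""
    low = np.linalg.cholesky(a.astype(np.float32)).astype(np.float64)
    # Apply (L L^T)^{-1} through two solves with the stored factor.
    return lambda r: np.linalg.solve(low.T, np.linalg.solve(low, r))
```

The more accurate the approximate factorization, the closer the preconditioned matrix is to the identity and the fewer iterations are needed, which is the trade-off explored by varying the dropping parameter.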
84 Preconditioning with BLR: Piston (compr and time). [Figure: L, CB and flops facto (%) versus QR dropping parameter, PISTON.]
85 Preconditioning with BLR: Piston (compr and time). [Figure: total execution time, factorization time and solves time (GCPC), in seconds, versus QR dropping parameter, PISTON.] FR reference: total 313 s, factorization 308 s, solves 5 s.
86 Preconditioning with BLR: perf001d (compr and time). [Figure: L, CB and flops facto (%) versus QR dropping parameter, PERF.]
87 Preconditioning with BLR: perf001d (compr and time). [Figure: total execution time, solves time (GCPC) and factorization time, in seconds, versus QR dropping parameter, PERF.] FR reference: total 569 s, solves 117 s, factorization 452 s.
88 Software work. New FR factorization with two levels of blocking: internal panels (efficiency of BLAS 3 operations) and external panels (correspond to low-rank blocks). Done in the factorization algorithm: BLR seq unsymm with all MUMPS features and double panel; BLR seq symm prototype (single panel, no pivoting); actual flop compression is achieved in both cases; MPI symm and unsymm: panel facto rewritten for BLR. Very short term: BLR seq symm with all MUMPS features and double panel. Short term: BLR MPI symm and unsymm. Long term: BLR solution phase; actual memory gains.
89 Conclusion & perspectives. Conclusion: efficient method on various applicative problems; considerable memory reduction & substantial decrease in computations; $O(N^{4/3})$ complexity, comparable to HSS; can be used as a preconditioner or as a direct solver; good potential for parallelism; code is stable on tested problems. Perspectives: MPI; study on larger and more difficult problems; error propagation study; absolute or relative dropping parameter? relative to what? (work with S. Gratton, M. Ngom and D. Titley-Peloquin started).
90 Acknowledgements. O. Boiteau, B. Quinnez and N. Tardieu (EDF R&D); S. Operto, R. Brossier and J. Virieux (SEISCOPE Project) for their contribution to the geophysics study; the Toulouse Computing Center (CICT) and N. Renon; S. Li, A. Napov and F.-H. Rouet (LBNL Berkeley). Details on this work can be found in: P. Amestoy, C. Ashcraft, O. Boiteau, A. Buttari, J.-Y. L'Excellent and C. Weisbecker, Improving multifrontal methods by means of block low-rank representations, IRIT technical report RT/APO/12/6, INRIA technical report INRIA/RR-8199, submitted to SIAM SISC, enseeihtfr/documents/rt_apo_12_6_blrpdf
91 Thank you! Any questions?
More informationMultilevel low-rank approximation preconditioners Yousef Saad Department of Computer Science and Engineering University of Minnesota
Multilevel low-rank approximation preconditioners Yousef Saad Department of Computer Science and Engineering University of Minnesota SIAM CSE Boston - March 1, 2013 First: Joint work with Ruipeng Li Work
More informationJ.I. Aliaga 1 M. Bollhöfer 2 A.F. Martín 1 E.S. Quintana-Ortí 1. March, 2009
Parallel Preconditioning of Linear Systems based on ILUPACK for Multithreaded Architectures J.I. Aliaga M. Bollhöfer 2 A.F. Martín E.S. Quintana-Ortí Deparment of Computer Science and Engineering, Univ.
More informationA Block Compression Algorithm for Computing Preconditioners
A Block Compression Algorithm for Computing Preconditioners J Cerdán, J Marín, and J Mas Abstract To implement efficiently algorithms for the solution of large systems of linear equations in modern computer
More informationMARCH 24-27, 2014 SAN JOSE, CA
MARCH 24-27, 2014 SAN JOSE, CA Sparse HPC on modern architectures Important scientific applications rely on sparse linear algebra HPCG a new benchmark proposal to complement Top500 (HPL) To solve A x =
More informationTowards parallel bipartite matching algorithms
Outline Towards parallel bipartite matching algorithms Bora Uçar CNRS and GRAAL, ENS Lyon, France Scheduling for large-scale systems, 13 15 May 2009, Knoxville Joint work with Patrick R. Amestoy (ENSEEIHT-IRIT,
More informationA dissection solver with kernel detection for unsymmetric matrices in FreeFem++
. p.1/21 11 Dec. 2014, LJLL, Paris FreeFem++ workshop A dissection solver with kernel detection for unsymmetric matrices in FreeFem++ Atsushi Suzuki Atsushi.Suzuki@ann.jussieu.fr Joint work with François-Xavier
More informationLU factorization with Panel Rank Revealing Pivoting and its Communication Avoiding version
1 LU factorization with Panel Rank Revealing Pivoting and its Communication Avoiding version Amal Khabou Advisor: Laura Grigori Université Paris Sud 11, INRIA Saclay France SIAMPP12 February 17, 2012 2
More informationUsing R for Iterative and Incremental Processing
Using R for Iterative and Incremental Processing Shivaram Venkataraman, Indrajit Roy, Alvin AuYoung, Robert Schreiber UC Berkeley and HP Labs UC BERKELEY Big Data, Complex Algorithms PageRank (Dominant
More informationA DISTRIBUTED-MEMORY RANDOMIZED STRUCTURED MULTIFRONTAL METHOD FOR SPARSE DIRECT SOLUTIONS
A DISTRIBUTED-MEMORY RANDOMIZED STRUCTURED MULTIFRONTAL METHOD FOR SPARSE DIRECT SOLUTIONS ZIXING XIN, JIANLIN XIA, MAARTEN V. DE HOOP, STEPHEN CAULEY, AND VENKATARAMANAN BALAKRISHNAN Abstract. We design
More informationOn the design of parallel linear solvers for large scale problems
On the design of parallel linear solvers for large scale problems Journée problème de Poisson, IHP, Paris M. Faverge, P. Ramet M. Faverge Assistant Professor Bordeaux INP LaBRI Inria Bordeaux - Sud-Ouest
More informationTHÈSE. En vue de l obtention du DOCTORAT DE L UNIVERSITÉ DE TOULOUSE. Théo Mary
THÈSE En vue de l obtention du DOCTORAT DE L UNIVERSITÉ DE TOULOUSE Délivré par : l Université Toulouse 3 Paul Sabatier (UT3 Paul Sabatier) Présentée et soutenue le 24 Novembre 2017 par : Théo Mary Block
More informationMultipole-Based Preconditioners for Sparse Linear Systems.
Multipole-Based Preconditioners for Sparse Linear Systems. Ananth Grama Purdue University. Supported by the National Science Foundation. Overview Summary of Contributions Generalized Stokes Problem Solenoidal
More informationAccelerating linear algebra computations with hybrid GPU-multicore systems.
Accelerating linear algebra computations with hybrid GPU-multicore systems. Marc Baboulin INRIA/Université Paris-Sud joint work with Jack Dongarra (University of Tennessee and Oak Ridge National Laboratory)
More informationA DISTRIBUTED-MEMORY RANDOMIZED STRUCTURED MULTIFRONTAL METHOD FOR SPARSE DIRECT SOLUTIONS
SIAM J. SCI. COMPUT. Vol. 39, No. 4, pp. C292 C318 c 2017 Society for Industrial and Applied Mathematics A DISTRIBUTED-MEMORY RANDOMIZED STRUCTURED MULTIFRONTAL METHOD FOR SPARSE DIRECT SOLUTIONS ZIXING
More informationV C V L T I 0 C V B 1 V T 0 I. l nk
Multifrontal Method Kailai Xu September 16, 2017 Main observation. Consider the LDL T decomposition of a SPD matrix [ ] [ ] [ ] [ ] B V T L 0 I 0 L T L A = = 1 V T V C V L T I 0 C V B 1 V T, 0 I where
More informationLecture 8: Fast Linear Solvers (Part 7)
Lecture 8: Fast Linear Solvers (Part 7) 1 Modified Gram-Schmidt Process with Reorthogonalization Test Reorthogonalization If Av k 2 + δ v k+1 2 = Av k 2 to working precision. δ = 10 3 2 Householder Arnoldi
More informationHierarchical Matrices. Jon Cockayne April 18, 2017
Hierarchical Matrices Jon Cockayne April 18, 2017 1 Sources Introduction to Hierarchical Matrices with Applications [Börm et al., 2003] 2 Sources Introduction to Hierarchical Matrices with Applications
More informationBALANCING-RELATED MODEL REDUCTION FOR DATA-SPARSE SYSTEMS
BALANCING-RELATED Peter Benner Professur Mathematik in Industrie und Technik Fakultät für Mathematik Technische Universität Chemnitz Computational Methods with Applications Harrachov, 19 25 August 2007
More informationSPARSE SOLVERS POISSON EQUATION. Margreet Nool. November 9, 2015 FOR THE. CWI, Multiscale Dynamics
SPARSE SOLVERS FOR THE POISSON EQUATION Margreet Nool CWI, Multiscale Dynamics November 9, 2015 OUTLINE OF THIS TALK 1 FISHPACK, LAPACK, PARDISO 2 SYSTEM OVERVIEW OF CARTESIUS 3 POISSON EQUATION 4 SOLVERS
More informationFast direct solvers for elliptic PDEs
Fast direct solvers for elliptic PDEs Gunnar Martinsson The University of Colorado at Boulder Students: Adrianna Gillman (now at Dartmouth) Nathan Halko Sijia Hao Patrick Young (now at GeoEye Inc.) Collaborators:
More informationSome Geometric and Algebraic Aspects of Domain Decomposition Methods
Some Geometric and Algebraic Aspects of Domain Decomposition Methods D.S.Butyugin 1, Y.L.Gurieva 1, V.P.Ilin 1,2, and D.V.Perevozkin 1 Abstract Some geometric and algebraic aspects of various domain decomposition
More informationDomain Decomposition-based contour integration eigenvalue solvers
Domain Decomposition-based contour integration eigenvalue solvers Vassilis Kalantzis joint work with Yousef Saad Computer Science and Engineering Department University of Minnesota - Twin Cities, USA SIAM
More informationJuly 18, Abstract. capture the unsymmetric structure of the matrices. We introduce a new algorithm which
An unsymmetrized multifrontal LU factorization Λ Patrick R. Amestoy y and Chiara Puglisi z July 8, 000 Abstract A well-known approach to compute the LU factorization of a general unsymmetric matrix A is
More informationLinear Solvers. Andrew Hazel
Linear Solvers Andrew Hazel Introduction Thus far we have talked about the formulation and discretisation of physical problems...... and stopped when we got to a discrete linear system of equations. Introduction
More informationEfficient domain decomposition methods for the time-harmonic Maxwell equations
Efficient domain decomposition methods for the time-harmonic Maxwell equations Marcella Bonazzoli 1, Victorita Dolean 2, Ivan G. Graham 3, Euan A. Spence 3, Pierre-Henri Tournier 4 1 Inria Saclay (Defi
More informationSparse linear solvers
Sparse linear solvers Laura Grigori ALPINES INRIA and LJLL, UPMC On sabbatical at UC Berkeley March 2015 Plan Sparse linear solvers Sparse matrices and graphs Classes of linear solvers Sparse Cholesky
More informationA robust inner-outer HSS preconditioner
NUMERICAL LINEAR ALGEBRA WIH APPLICAIONS Numer. Linear Algebra Appl. 2011; 00:1 0 [Version: 2002/09/18 v1.02] A robust inner-outer HSS preconditioner Jianlin Xia 1 1 Department of Mathematics, Purdue University,
More informationMULTI-LAYER HIERARCHICAL STRUCTURES AND FACTORIZATIONS
MULTI-LAYER HIERARCHICAL STRUCTURES AND FACTORIZATIONS JIANLIN XIA Abstract. We propose multi-layer hierarchically semiseparable MHS structures for the fast factorizations of dense matrices arising from
More informationMultigrid based preconditioners for the numerical solution of two-dimensional heterogeneous problems in geophysics
Technical Report RAL-TR-2007-002 Multigrid based preconditioners for the numerical solution of two-dimensional heterogeneous problems in geophysics I. S. Duff, S. Gratton, X. Pinel, and X. Vasseur January
More informationAvoiding Communication in Distributed-Memory Tridiagonalization
Avoiding Communication in Distributed-Memory Tridiagonalization SIAM CSE 15 Nicholas Knight University of California, Berkeley March 14, 2015 Joint work with: Grey Ballard (SNL) James Demmel (UCB) Laura
More informationMinisymposia 9 and 34: Avoiding Communication in Linear Algebra. Jim Demmel UC Berkeley bebop.cs.berkeley.edu
Minisymposia 9 and 34: Avoiding Communication in Linear Algebra Jim Demmel UC Berkeley bebop.cs.berkeley.edu Motivation (1) Increasing parallelism to exploit From Top500 to multicores in your laptop Exponentially
More informationOptimal Interface Conditions for an Arbitrary Decomposition into Subdomains
Optimal Interface Conditions for an Arbitrary Decomposition into Subdomains Martin J. Gander and Felix Kwok Section de mathématiques, Université de Genève, Geneva CH-1211, Switzerland, Martin.Gander@unige.ch;
More informationTwo-level Domain decomposition preconditioning for the high-frequency time-harmonic Maxwell equations
Two-level Domain decomposition preconditioning for the high-frequency time-harmonic Maxwell equations Marcella Bonazzoli 2, Victorita Dolean 1,4, Ivan G. Graham 3, Euan A. Spence 3, Pierre-Henri Tournier
More informationHybrid static/dynamic scheduling for already optimized dense matrix factorization. Joint Laboratory for Petascale Computing, INRIA-UIUC
Hybrid static/dynamic scheduling for already optimized dense matrix factorization Simplice Donfack, Laura Grigori, INRIA, France Bill Gropp, Vivek Kale UIUC, USA Joint Laboratory for Petascale Computing,
More informationInexactness and flexibility in linear Krylov solvers
Inexactness and flexibility in linear Krylov solvers Luc Giraud ENSEEIHT (N7) - IRIT, Toulouse Matrix Analysis and Applications CIRM Luminy - October 15-19, 2007 in honor of Gérard Meurant for his 60 th
More informationSparse LU Factorization on GPUs for Accelerating SPICE Simulation
Nano-scale Integrated Circuit and System (NICS) Laboratory Sparse LU Factorization on GPUs for Accelerating SPICE Simulation Xiaoming Chen PhD Candidate Department of Electronic Engineering Tsinghua University,
More informationA Fast Direct Solver for a Class of Elliptic Partial Differential Equations
J Sci Comput (2009) 38: 316 330 DOI 101007/s10915-008-9240-6 A Fast Direct Solver for a Class of Elliptic Partial Differential Equations Per-Gunnar Martinsson Received: 20 September 2007 / Revised: 30
More informationFast direct solvers for elliptic PDEs
Fast direct solvers for elliptic PDEs Gunnar Martinsson The University of Colorado at Boulder PhD Students: Tracy Babb Adrianna Gillman (now at Dartmouth) Nathan Halko (now at Spot Influence, LLC) Sijia
More informationSparse factorizations: Towards optimal complexity and resilience at exascale
Sparse factorizations: Towards optimal complexity and resilience at exascale Xiaoye Sherry Li Lawrence Berkeley National Laboratory Challenges in 21st Century Experimental Mathematical Computation Workshop,
More informationHigh Performance Parallel Tucker Decomposition of Sparse Tensors
High Performance Parallel Tucker Decomposition of Sparse Tensors Oguz Kaya INRIA and LIP, ENS Lyon, France SIAM PP 16, April 14, 2016, Paris, France Joint work with: Bora Uçar, CNRS and LIP, ENS Lyon,
More informationParallel Preconditioning Methods for Ill-conditioned Problems
Parallel Preconditioning Methods for Ill-conditioned Problems Kengo Nakajima Information Technology Center, The University of Tokyo 2014 Conference on Advanced Topics and Auto Tuning in High Performance
More informationMigration with Implicit Solvers for the Time-harmonic Helmholtz
Migration with Implicit Solvers for the Time-harmonic Helmholtz Yogi A. Erlangga, Felix J. Herrmann Seismic Laboratory for Imaging and Modeling, The University of British Columbia {yerlangga,fherrmann}@eos.ubc.ca
More informationBalanced Truncation Model Reduction of Large and Sparse Generalized Linear Systems
Balanced Truncation Model Reduction of Large and Sparse Generalized Linear Systems Jos M. Badía 1, Peter Benner 2, Rafael Mayo 1, Enrique S. Quintana-Ortí 1, Gregorio Quintana-Ortí 1, A. Remón 1 1 Depto.
More informationEffective matrix-free preconditioning for the augmented immersed interface method
Effective matrix-free preconditioning for the augmented immersed interface method Jianlin Xia a, Zhilin Li b, Xin Ye a a Department of Mathematics, Purdue University, West Lafayette, IN 47907, USA. E-mail:
More informationLeveraging Task-Parallelism in Energy-Efficient ILU Preconditioners
Leveraging Task-Parallelism in Energy-Efficient ILU Preconditioners José I. Aliaga Leveraging task-parallelism in energy-efficient ILU preconditioners Universidad Jaime I (Castellón, Spain) José I. Aliaga
More informationConvergence Behavior of a Two-Level Optimized Schwarz Preconditioner
Convergence Behavior of a Two-Level Optimized Schwarz Preconditioner Olivier Dubois 1 and Martin J. Gander 2 1 IMA, University of Minnesota, 207 Church St. SE, Minneapolis, MN 55455 dubois@ima.umn.edu
More informationA direct solver for elliptic PDEs in three dimensions based on hierarchical merging of Poincaré-Steklov operators
(1) A direct solver for elliptic PDEs in three dimensions based on hierarchical merging of Poincaré-Steklov operators S. Hao 1, P.G. Martinsson 2 Abstract: A numerical method for variable coefficient elliptic
More informationAMS526: Numerical Analysis I (Numerical Linear Algebra for Computational and Data Sciences)
AMS526: Numerical Analysis I (Numerical Linear Algebra for Computational and Data Sciences) Lecture 19: Computing the SVD; Sparse Linear Systems Xiangmin Jiao Stony Brook University Xiangmin Jiao Numerical
More informationLU Factorization with Panel Rank Revealing Pivoting and Its Communication Avoiding Version
LU Factorization with Panel Rank Revealing Pivoting and Its Communication Avoiding Version Amal Khabou James Demmel Laura Grigori Ming Gu Electrical Engineering and Computer Sciences University of California
More informationQR Factorization of Tall and Skinny Matrices in a Grid Computing Environment
QR Factorization of Tall and Skinny Matrices in a Grid Computing Environment Emmanuel AGULLO (INRIA / LaBRI) Camille COTI (Iowa State University) Jack DONGARRA (University of Tennessee) Thomas HÉRAULT
More informationJacobi-Davidson Eigensolver in Cusolver Library. Lung-Sheng Chien, NVIDIA
Jacobi-Davidson Eigensolver in Cusolver Library Lung-Sheng Chien, NVIDIA lchien@nvidia.com Outline CuSolver library - cusolverdn: dense LAPACK - cusolversp: sparse LAPACK - cusolverrf: refactorization
More informationScalable Hybrid Programming and Performance for SuperLU Sparse Direct Solver
Scalable Hybrid Programming and Performance for SuperLU Sparse Direct Solver Sherry Li Lawrence Berkeley National Laboratory Piyush Sao Rich Vuduc Georgia Institute of Technology CUG 14, May 4-8, 14, Lugano,
More informationRobust solution of Poisson-like problems with aggregation-based AMG
Robust solution of Poisson-like problems with aggregation-based AMG Yvan Notay Université Libre de Bruxelles Service de Métrologie Nucléaire Paris, January 26, 215 Supported by the Belgian FNRS http://homepages.ulb.ac.be/
More informationPreface to the Second Edition. Preface to the First Edition
n page v Preface to the Second Edition Preface to the First Edition xiii xvii 1 Background in Linear Algebra 1 1.1 Matrices................................. 1 1.2 Square Matrices and Eigenvalues....................
More informationFAST SPARSE SELECTED INVERSION
SIAM J. MARIX ANAL. APPL. Vol. 36, No. 3, pp. 83 34 c 05 Society for Industrial and Applied Mathematics FAS SPARSE SELECED INVERSION JIANLIN XIA, YUANZHE XI, SEPHEN CAULEY, AND VENKAARAMANAN BALAKRISHNAN
More informationExploiting Low-Rank Structure in Computing Matrix Powers with Applications to Preconditioning
Exploiting Low-Rank Structure in Computing Matrix Powers with Applications to Preconditioning Erin C. Carson, Nicholas Knight, James Demmel, Ming Gu U.C. Berkeley SIAM PP 12, Savannah, Georgia, USA, February
More informationMASSIVELY PARALLEL STRUCTURED MULTIFRONTAL SOLVER FOR TIME-HARMONIC ELASTIC WAVES IN 3D ANISOTROPIC MEDIA
Proceedings of the Project Review, Geo-Mathematical Imaging Group Purdue University, West Lafayette IN, Vol. pp. 97-. MASSIVELY PARALLEL STRUCTURED MULTIFRONTAL SOLVER FOR TIME-HARMONIC ELASTIC WAVES IN
More informationAn Efficient Low Memory Implicit DG Algorithm for Time Dependent Problems
An Efficient Low Memory Implicit DG Algorithm for Time Dependent Problems P.-O. Persson and J. Peraire Massachusetts Institute of Technology 2006 AIAA Aerospace Sciences Meeting, Reno, Nevada January 9,
More informationarxiv: v1 [cs.na] 20 Jul 2015
AN EFFICIENT SOLVER FOR SPARSE LINEAR SYSTEMS BASED ON RANK-STRUCTURED CHOLESKY FACTORIZATION JEFFREY N. CHADWICK AND DAVID S. BINDEL arxiv:1507.05593v1 [cs.na] 20 Jul 2015 Abstract. Direct factorization
More informationAccelerating Linear Algebra on Heterogeneous Architectures of Multicore and GPUs using MAGMA and DPLASMA and StarPU Schedulers
UT College of Engineering Tutorial Accelerating Linear Algebra on Heterogeneous Architectures of Multicore and GPUs using MAGMA and DPLASMA and StarPU Schedulers Stan Tomov 1, George Bosilca 1, and Cédric
More informationFINDING PARALLELISM IN GENERAL-PURPOSE LINEAR PROGRAMMING
FINDING PARALLELISM IN GENERAL-PURPOSE LINEAR PROGRAMMING Daniel Thuerck 1,2 (advisors Michael Goesele 1,2 and Marc Pfetsch 1 ) Maxim Naumov 3 1 Graduate School of Computational Engineering, TU Darmstadt
More informationOpen-source finite element solver for domain decomposition problems
1/29 Open-source finite element solver for domain decomposition problems C. Geuzaine 1, X. Antoine 2,3, D. Colignon 1, M. El Bouajaji 3,2 and B. Thierry 4 1 - University of Liège, Belgium 2 - University
More informationA dissection solver with kernel detection for symmetric finite element matrices on shared memory computers
A dissection solver with kernel detection for symmetric finite element matrices on shared memory computers Atsushi Suzuki, François-Xavier Roux To cite this version: Atsushi Suzuki, François-Xavier Roux.
More informationContents. Preface... xi. Introduction...
Contents Preface... xi Introduction... xv Chapter 1. Computer Architectures... 1 1.1. Different types of parallelism... 1 1.1.1. Overlap, concurrency and parallelism... 1 1.1.2. Temporal and spatial parallelism
More informationCOMPARED to other computational electromagnetic
IEEE TRANSACTIONS ON MICROWAVE THEORY AND TECHNIQUES, VOL. 58, NO. 12, DECEMBER 2010 3697 Existence of -Matrix Representations of the Inverse Finite-Element Matrix of Electrodynamic Problems and -Based
More informationDirect solution methods for sparse matrices. p. 1/49
Direct solution methods for sparse matrices p. 1/49 p. 2/49 Direct solution methods for sparse matrices Solve Ax = b, where A(n n). (1) Factorize A = LU, L lower-triangular, U upper-triangular. (2) Solve
More informationExploiting off-diagonal rank structures in the solution of linear matrix equations
Stefano Massei Exploiting off-diagonal rank structures in the solution of linear matrix equations Based on joint works with D. Kressner (EPFL), M. Mazza (IPP of Munich), D. Palitta (IDCTS of Magdeburg)
More informationEFFECTIVE AND ROBUST PRECONDITIONING OF GENERAL SPD MATRICES VIA STRUCTURED INCOMPLETE FACTORIZATION
EFFECVE AND ROBUS PRECONDONNG OF GENERAL SPD MARCES VA SRUCURED NCOMPLEE FACORZAON JANLN XA AND ZXNG XN Abstract For general symmetric positive definite SPD matrices, we present a framework for designing
More informationSEISMIC INVERSE SCATTERING VIA DISCRETE HELMHOLTZ OPERATOR FACTORIZATION AND OPTIMIZATION
Proceedings of the Project Review, Geo-Mathematical Imaging Group (Purdue University, West Lafayette IN), Vol. (2009) pp. 09-32. SEISMIC INVERSE SCATTERING VIA DISCRETE HELMHOLTZ OPERATOR FACTORIZATION
More information