Block Low-Rank (BLR) approximations to improve multifrontal sparse solvers


1 Block Low-Rank (BLR) approximations to improve multifrontal sparse solvers. Joint work with Patrick Amestoy, Cleve Ashcraft, Olivier Boiteau, Alfredo Buttari and Jean-Yves L'Excellent. PhD started on October 1st, 2010, financed by EDF. Clément Weisbecker, ENSEEIHT-IRIT, University of Toulouse, France. MUMPS Users Group Meeting, Clamart, France, May 29-30, 2013

2 Introduction. Multifrontal solver: a direct solver for large sparse linear systems (objective: A = LU). Low-rank approximations: compression and flop reduction, with the accuracy controlled by a numerical parameter. Goal: combine these two notions to improve multifrontal solvers, in the context of MUMPS.

3 The multifrontal method

4 Separator tree (Schreiber 1982): it encodes the dependencies between variables and is processed in topological order.

5 Frontal matrices. At each node, a partial factorization of the frontal matrix is performed: the contribution blocks of the children (CB 1, CB 2) are assembled, the fully-summed (FS) variables are eliminated, and the resulting CB is stacked while waiting for its own assembly in the parent (hence a peak of the CB stack). The FS variables are the separator variables; the NFS (non fully-summed) variables are the border variables.
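
To make the partial factorization concrete, here is a minimal dense NumPy/SciPy sketch (not MUMPS code, which works in place, block-wise and with threshold pivoting); the function name and the FS-first ordering of the front are assumptions made for the example.

```python
import scipy.linalg as sla

def partial_factorize(front, nfs):
    """Eliminate the nfs fully-summed (FS) variables of an assembled front.

    Returns the LU factors of the FS block and the contribution block (CB),
    i.e. the Schur complement on the non-fully-summed (border) variables.
    """
    F11 = front[:nfs, :nfs]   # FS x FS block (separator variables)
    F12 = front[:nfs, nfs:]
    F21 = front[nfs:, :nfs]
    F22 = front[nfs:, nfs:]   # NFS x NFS block (border variables)
    lu, piv = sla.lu_factor(F11)                      # partial factorization
    cb = F22 - F21 @ sla.lu_solve((lu, piv), F12)     # CB = F22 - F21 F11^{-1} F12
    return (lu, piv), cb      # the CB is stacked until it is assembled into the parent
```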

6 Low-rank approximations

7 Low-rank matrices. Low-rank matrix (Bebendorf): let A be an m x n matrix and let k_ε be the approximated numerical rank of A at accuracy ε. A is said to be a low-rank matrix if there exist matrices X of size m x k_ε, Y of size n x k_ε and E of size m x n such that A = X Y^T + E, where ||E||_2 <= ε and k_ε (m + n) < mn. Low-rank product: the product of two low-rank matrices X_1 Y_1^T (rank k_1) and X_2 Y_2^T (rank k_2) is computed as X_1 (Y_1^T X_2) Y_2^T, going through the small k_1 x k_2 matrix C = Y_1^T X_2.
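
A small NumPy sketch of these two operations, with a truncated SVD playing the role of the compression kernel (the helper names are illustrative, not taken from the solver).

```python
import numpy as np

def compress(A, eps):
    """Return X (m x k_eps), Y (n x k_eps) with ||A - X Y^T||_2 <= eps."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    k = int(np.sum(s > eps))            # singular values below eps are dropped
    return U[:, :k] * s[:k], Vt[:k, :].T

def lowrank_product(X1, Y1, X2, Y2):
    """(X1 Y1^T)(X2 Y2^T) kept in low-rank form as (X1 C) Y2^T with C = Y1^T X2."""
    C = Y1.T @ X2                       # small k1 x k2 matrix
    return X1 @ C, Y2

# Usage: the interaction block between two well-separated clusters of points
# is numerically low-rank.
x = np.linspace(0.0, 1.0, 200)
y = np.linspace(3.0, 4.0, 200)
A = 1.0 / np.abs(x[:, None] - y[None, :])
X, Y = compress(A, 1e-10)
print(X.shape[1], np.linalg.norm(A - X @ Y.T, 2))   # small rank; error about the first discarded singular value
```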

8 Where can I find low-rank matrices? Problem: matrices are usually not low-rank! Solution (Börm, Bebendorf): well-defined subblocks A_b = A_{στ} can be low-rank (b = σ × τ ⊆ I × I). A clustering of the unknowns (e.g. I = I_1 ∪ I_2 ∪ I_3) is defined to obtain low-rank blocks.

9 Admissibility condition. How do we find a good clustering? Admissibility condition for elliptic PDEs (Börm, Grasedyck, Hackbusch): let σ ⊆ I and τ ⊆ I be clusters of indices; the block (σ, τ) is said to be admissible if σ and τ satisfy diam(σ) + diam(τ) <= 2η dist(σ, τ), where η is a problem-dependent fixed parameter (usually 1 or 0.5).
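
A small NumPy sketch of this test, assuming the clusters are given as arrays of point coordinates (the helper names are illustrative).

```python
import numpy as np

def diam(pts):
    """Diameter of a cluster: largest pairwise distance."""
    return np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1).max()

def dist(a, b):
    """Distance between two clusters: smallest pairwise distance."""
    return np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1).min()

def admissible(sigma, tau, eta=1.0):
    """diam(sigma) + diam(tau) <= 2 * eta * dist(sigma, tau)."""
    return diam(sigma) + diam(tau) <= 2.0 * eta * dist(sigma, tau)

# Usage: two unit-size clusters become admissible once they are far enough apart.
rng = np.random.default_rng(0)
sigma = rng.random((50, 3))                            # points in the unit cube
tau = rng.random((50, 3)) + np.array([4.0, 0.0, 0.0])  # same cube, shifted away
print(admissible(sigma, tau, eta=1.0))                 # True: diameters ~1.7, distance ~3
```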

10 Illustration (1): interaction between two subdomains D_1 and D_2 (figure built up over several animation steps).

16 Illustration (2): the rank of the interaction block decreases as the distance between the two subdomains increases, which leaves room for compression.

17 Exploiting low-rank subblocks: H-matrices (Börm, Bebendorf, Hackbusch, Grasedyck). Ã is partitioned according to a hierarchy of clusters (I_7 split into I_3 and I_6, themselves split into I_1, I_2 and I_4, I_5): the diagonal blocks D_11, D_22, D_44, D_55 are kept dense, while the admissible off-diagonal blocks are stored in low-rank form X_ij Y_ij^T (e.g. X_12 Y_12^T, X_21 Y_21^T, X_36 Y_36^T, X_63 Y_63^T, X_45 Y_45^T, X_54 Y_54^T).

18 Exploiting low-rank subblocks: HSS matrices (Li, Napov, Xia, Gu, Wang, Rouet, Gillman, Martinsson). Ã has dense diagonal blocks D_1, D_2, D_4, D_5 and hierarchically nested low-rank off-diagonal blocks: sibling interactions take the form X_i B_i Y_j^T, and coarser-level interactions reuse the same generators through transfer matrices, e.g. X_1 R_1 B_3 W_4^T Y_4^T, so that the bases (X_i, Y_i) are shared across the levels of the cluster tree.

19 Exploiting low-rank subblocks: BLR matrices. Ã is cut into a flat grid of blocks: dense diagonal blocks D_1, ..., D_4 and low-rank off-diagonal blocks X_ij Y_ij^T. BLR is a particular case of H-matrices: no tree, just the natural matrix structure. Idea: represent the fronts with BLR matrices.
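
A minimal sketch of what storing a front as a BLR matrix can look like, assuming a fixed block size and a truncated SVD as compression kernel (an illustrative data structure, not the solver's).

```python
import numpy as np

def to_blr(F, block, eps):
    """Flat BLR storage of a dense front F: dense diagonal blocks, and
    off-diagonal blocks kept as a pair (X, Y) when the low-rank form is smaller."""
    n = F.shape[0]
    starts = list(range(0, n, block))
    blr = {}
    for i, ri in enumerate(starts):
        for j, rj in enumerate(starts):
            B = F[ri:ri + block, rj:rj + block]
            if i == j:
                blr[i, j] = ("dense", B.copy())     # diagonal blocks stay full-rank
                continue
            U, s, Vt = np.linalg.svd(B, full_matrices=False)
            k = int(np.sum(s > eps))                # numerical rank at accuracy eps
            X, Y = U[:, :k] * s[:k], Vt[:k, :].T
            if X.size + Y.size < B.size:            # keep the low-rank form only if it pays off
                blr[i, j] = ("lowrank", (X, Y))
            else:
                blr[i, j] = ("dense", B.copy())
    return blr
```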

20 Comparative study: compression rates. Compression rates of the frontal matrix at the root of a multifrontal tree, on two 3D stencils (discretization of a 128 x 128 x 128 cube): fraction of the dense storage as a function of the low-rank threshold, for the H, HSS and BLR formats, on a Laplacian and on a 27-point Geoazur stencil.

21 Comparative study: compression cost. Compression cost of the frontal matrix at the root of a multifrontal tree, on the same two 3D stencils: number of operations for the compression (log10 scale) as a function of the low-rank threshold, for the H, HSS and BLR formats, on the Laplacian and on the 27-point Geoazur stencil.

22 Adaptation to a multifrontal solver. Issues raised by using low-rank representations inside a multifrontal solver (figure built up over several slides): no relative order between variables, efficient clustering, efficient compression and decompression, distribution, assembly, pivoting, and a flat non-hierarchical structure.

26 Clustering variables

27 Admissible clustering. Constraint: the admissibility condition should be satisfied. With large cluster diameters the fraction of memory used is 83%; with small diameters it drops to 57%.

28 Halo algorithm for the clustering of a separator. Designed to capture the geometry of the problem, computed on the graph instead of the mesh, and coupled with a third-party partitioning tool. Steps (built up over several slides): 1. the separator; 2. the halo; 3. extraction of the halo; 4. partition of the halo (into parts P_1, P_2, P_3); 5. partition of the separator (the block size is fixed). A sketch of these steps is given below.
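
A sketch of these five steps, under two assumptions: the graph is given as a dict mapping each node to its set of neighbours, and kway_partition is a placeholder for the third-party partitioning tool (e.g. METIS or SCOTCH) returning a part id for each node.

```python
def halo_clustering(separator, adj, k, kway_partition, layers=1):
    """Cluster the separator variables by partitioning a halo around them."""
    separator = set(separator)
    halo, frontier = set(separator), set(separator)
    for _ in range(layers):                        # steps 1-2: grow the halo around the separator
        frontier = {w for v in frontier for w in adj[v]} - halo
        halo |= frontier
    sub_adj = {v: adj[v] & halo for v in halo}     # step 3: extract the halo subgraph
    parts = kway_partition(sub_adj, k)             # step 4: partition the halo (third-party tool)
    clusters = [[] for _ in range(k)]
    for v in separator:                            # step 5: keep only the separator part
        clusters[parts[v]].append(v)
    return clusters
```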

34 Clustering of the variables of a front. A front = separator + border. 1. Separator variables: clustered with the halo algorithm. 2. Border variables: two choices, EXPLICIT clustering or INHERITED clustering (top down, from the separators higher in the tree).

43 Structural comparison. Crossing an optimal cluster with an optimal cluster gives an optimal block; a small cluster with an optimal one still gives a large enough block; two small clusters give a too small block, which motivates reclustering strategies.

47 Illustration with a top-level separator from SCOTCH (side, face and perspective views).

48 Block Low-Rank multifrontal method

49 General idea: the fronts are approximated with low-rank (BLR) matrices. In front i, the diagonal blocks L^i_{1,1} U^i_{1,1}, ..., L^i_{d_i,d_i} U^i_{d_i,d_i} stay dense, while the off-diagonal blocks of the factors and the contribution block (CB) are compressed.

50 Switch level (1): percentage of the entries of the factors computed in fronts larger than N, as a function of the mesh size, for N = 200, 400, 600, 800 and 1000. A large share of the memory consumption is targeted.

51 Switch level (1): percentage of the operations performed in fronts larger than N, as a function of the mesh size, for N = 200, 400, 600, 800 and 1000. A large share of the flop count is targeted.

52 Switch level (2): low-rank techniques are only used on large fronts (table: proportion of fronts larger than N with respect to the total number of fronts in the multifrontal tree, for different mesh sizes). Only a few fronts are actually compressed!

53 Factorization algorithms. Cost of each task, for blocks of order n and rank k:
    task            operation                         dense        low-rank
    Factor (F)      B = LU                            (2/3)n^3     (2/3)n^3
    Compress (C)    C = XY^T                          -            kn^2
    Solve (S)       D = X(Y^T L^{-1})                 n^3          kn^2
    CB update (U)   D = D - X_1(Y_1^T X_2)Y_2^T       2n^3         2kn^2

54 Factorization algorithms, standard FR (full-rank) variant: Factor (F), Solve (S), Update (U).

61 Factorization algorithms, BLR FSCU variant: Factor (F), Solve (S), Compress (C), Update (U); the panel blocks are compressed right after the Solve, so the CB update is performed with low-rank operands.

63 Factorization algorithms, FCSU variant (Compress before Solve): more efficient, but less stable.

64 Factorization algorithms, FSUC variant (Compress only after the Update): no flop reduction, but more stable.

65 Factorization algorithms: the Factor, Solve, Compress and Update tasks can be ordered in different ways, giving high algorithmic flexibility; a sketch of the FSCU variant follows.
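
A toy NumPy/SciPy sketch of the FSCU variant on a dense matrix (no pivoting strategy across panels, and the solved panel blocks are kept dense for clarity, whereas a real BLR solver stores them compressed).

```python
import numpy as np
import scipy.linalg as sla

def compress(B, eps):
    U, s, Vt = np.linalg.svd(B, full_matrices=False)
    k = int(np.sum(s > eps))
    return U[:, :k] * s[:k], Vt[:k, :].T

def blr_fscu(A, bs, eps):
    """Block LU of A with block size bs; each panel is Factored, Solved,
    Compressed, then used to Update the trailing blocks in low-rank form."""
    n = A.shape[0]
    starts = list(range(0, n, bs))
    nb = len(starts)
    blocks = [[A[ri:ri + bs, rj:rj + bs].copy() for rj in starts] for ri in starts]
    for p in range(nb):
        P, L, U = sla.lu(blocks[p][p])                           # Factor (F)
        blocks[p][p] = (P, L, U)
        lows = {}
        for i in range(p + 1, nb):
            Bip = sla.solve_triangular(U, blocks[i][p].T,        # Solve (S): A_ip <- A_ip U^{-1}
                                       trans='T', lower=False).T
            Bpi = sla.solve_triangular(L, P.T @ blocks[p][i],    # Solve (S): A_pi <- L^{-1} P^T A_pi
                                       lower=True, unit_diagonal=True)
            blocks[i][p], blocks[p][i] = Bip, Bpi
            lows[i] = (compress(Bip, eps), compress(Bpi, eps))   # Compress (C)
        for i in range(p + 1, nb):
            (Xi, Yi), _ = lows[i]
            for j in range(p + 1, nb):
                _, (Xj, Yj) = lows[j]
                blocks[i][j] -= Xi @ ((Yi.T @ Xj) @ Yj.T)        # Update (U), low-rank operands
    return blocks
```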

66 Experiments

67 Set of problems (CSR = Componentwise Scaled Residual; Code_Aster tpll01{a,d} test cases, refined; large matrices from applicative problems):
    Name              Prop         Arith   CSR      application
    Curl-Curl         2D, sym      D       2E-15    -
    Geoazur           3D, unsym    Z       3E-4     wave propagation
    EDF_A_MECA_R12    2D, sym      D       4E-15    mechanics
    EDF_D_THER_R7     3D, sym      D       8E-15    thermics

68 Metrics. Memory: L = fraction of the FR (full-rank) factor storage obtained with BLR (%); CB = fraction of the FR maximum CB stack size obtained with BLR (%). Flops: fraction of the FR operation count needed for the BLR factorization (in percent or in absolute values, including the compression cost).

69 Clustering strategy: inherited (inh) versus explicit (exp) clustering (table: memory fractions L and CB, factorization flop fraction, clustering time and full-rank analysis time, for the Curl, Geoazur, TH_RAFF7 and ME_RAFF12 problems). The inherited version is more than 2 times faster, gives the same results on L_11 and comparable results on L_21, and is a little less good on the CBs; the inherited clustering is used for all the experiments.

70 Global ordering of the matrix: general results with different orderings on the Geoazur 128 problem (algebraic approach: the method applies whatever the ordering).
    ordering   FR factor memory   BLR L     BLR CB    BLR flops
    AMD        109 GB             73.5%     40.0%     59.4%
    AMF         72 GB             69.9%     53.5%     48.3%
    PORD        55 GB             70.4%     49.0%     48.9%
    METIS       46 GB             78.7%     46.4%     62.6%
    SCOTCH     49 GB             79.4%     48.1%     63.4%

72 Global ordering of the matrix: AMF tree [visualization tool developed by M. Bremond, talk tomorrow at 11:30].

73 Global ordering of the matrix: SCOTCH tree [same visualization tool].

74 Influence of the size of the problem: different refinements of problem EDF_A_MECA obtained with Homard, R8 (N ≈ 2.1 million), R11 (N ≈ 33.6 million) and R12 (N ≈ 134 million). The memory fractions (L, CB) and the flop fraction all decrease as the problem grows: the larger the problem, the more efficient the method. Target = challenging problems.

75 Scalability with respect to the size of the problem (Laplacian problem): the factorization flop count grows as O(N^2) in full rank and as O(N^4/3) with BLR, i.e. an O(N^4/3) complexity.
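
One way to read such a plot is to fit the slope of the measured flop counts in log-log scale; the sketch below does this on hypothetical data placed exactly on an O(N^4/3) curve (sizes and constant are made up for illustration).

```python
import numpy as np

def fitted_exponent(sizes, flops):
    """Least-squares slope of log(flops) versus log(N): flops ~ C * N^slope."""
    slope, _ = np.polyfit(np.log(sizes), np.log(flops), 1)
    return slope

N = np.array([64, 96, 128, 160, 192], dtype=float) ** 3   # hypothetical problem sizes
flops = 3.0e2 * N ** (4.0 / 3.0)                          # data lying on an O(N^{4/3}) curve
print(round(fitted_exponent(N, flops), 2))                # prints 1.33
```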

76 Influence of the size of the clusters: the memory fractions (L, CB) and the flop counts (compression and other operations) vary only mildly over a wide range of block sizes. This gives flexibility (no need for a fine tuning), and the block size can be dynamically adapted for pivoting, parallelism, etc.

77 Global results: 2D problems, EDF_A_MECA_R12 (N ≈ 134 million, flops_LU ≈ 2E+14, CSR ≈ 4E-15) and Curl-Curl 2D 5000x5000 (mem_LU = 29 GB, flops_LU ≈ 5E+12, CSR ≈ 2E-15): L, CB, flops, factorization time (%) and CSR as functions of the QR dropping parameter.

78 Global results: 3D problems, EDF_D_THER_R7 (flops_LU ≈ 1E+14, CSR ≈ 8E-15) and Geoazur 3D 128x128x128 (mem_LU = 54 GB, flops_LU ≈ 6E+13, CSR ≈ 3E-4): L, CB, flops, factorization time (%) and CSR as functions of the QR dropping parameter.

79 Application to geophysics (1): Helmholtz equation for seismic modeling (SEISCOPE project), on the EAGE overthrust ground model (dip/cross/depth in km, velocities in m/s). Full-rank cost per frequency: at 2 Hz, 8.957E+11 flops for the LU factorization, 3 GB of LU factors and a 4 GB memory peak; the peak grows to 25 GB at 4 Hz and 283 GB at 8 Hz.

80 Application to geophysics (2): multi-panel figure (panels a to l) of dip/cross/depth slices (km) of the computed results.

81 Application to geophysics (3): dip/cross/depth slices (panels a to d) and BLR-to-full-rank ratios of the factorization flops and of the factor (F) and CB memory, for three dropping parameters ε and three frequencies:
    ε       fqcy    flops facto   memory F   memory CB
    1E-5    2 Hz    41.8%         61.8%      32.3%
            4 Hz    27.4%         50.0%      24.4%
            8 Hz    21.8%         41.6%      23.9%
    1E-4    2 Hz    32.9%         53.4%      23.9%
            4 Hz    20.0%         42.2%      21.7%
            8 Hz    15.2%         28.9%      19.4%
    1E-3    2 Hz    24.6%         44.7%      16.8%
            4 Hz    13.8%         34.5%      19.0%
            8 Hz    9.8%          21.3%      15.9%

82 Preconditioning with BLR: set of problems. Piston (condition number ≈ 5.1E+5): external pressure force applied on the top. perf001d (condition number ≈ 1.5E+11, challenging for EDF): cavity hook subjected to an internal pressure force.

83 Preconditioning with BLR: number of iterations needed for convergence on Piston and perf001d (FR is MUMPS 4.10 in single precision; some runs do not converge within 1,000 seconds). perf001d: only 2 iterations to convergence with a BLR double-precision preconditioner at ε = 10^-8, i.e. a good solution versus the coarse one obtained with the usual single precision.
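
The preconditioned solve can be sketched with SciPy; here the exact factorization of a slightly shifted matrix stands in for the approximate BLR factorization, and SciPy's conjugate gradient stands in for Code_Aster's GCPC (matrix, shift and sizes below are illustrative).

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

def poisson2d(m):
    """Small SPD model problem (2D Poisson on an m x m grid)."""
    T = sp.diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(m, m))
    return (sp.kron(sp.eye(m), T) + sp.kron(T, sp.eye(m))).tocsr()

A = poisson2d(60)
b = np.ones(A.shape[0])

# Approximate factorization: factor a shifted copy of A (stand-in for BLR dropping).
lu = spla.splu((A + 1e-2 * sp.eye(A.shape[0])).tocsc())
M = spla.LinearOperator(A.shape, matvec=lu.solve)   # preconditioner = approximate solve

iters = []
x, info = spla.cg(A, b, M=M, callback=lambda xk: iters.append(1))
print(info, len(iters), np.linalg.norm(A @ x - b) / np.linalg.norm(b))
```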

84 Preconditioning with BLR: Piston, compression and time. L, CB, flops and factorization time (%) as functions of the QR dropping parameter.

85 Preconditioning with BLR: Piston, compression and time. Total execution time, factorization time and solve time (GCPC), in seconds, as functions of the QR dropping parameter; full-rank references: total 313 s, factorization 308 s, solves 5 s.

86 Preconditioning with BLR: perf001d, compression and time. L, CB, flops and factorization time (%) as functions of the QR dropping parameter.

87 Preconditioning with BLR: perf001d, compression and time. Total execution time, factorization time and solve time (GCPC), in seconds, as functions of the QR dropping parameter; full-rank references: total 569 s, factorization 452 s, solves 117 s.

88 Software work. New FR factorization with two levels of blocking: internal panels (efficiency of BLAS 3 operations) and external panels (corresponding to the low-rank blocks). Done in the factorization algorithm: BLR sequential unsymmetric with all MUMPS features and double panel; BLR sequential symmetric prototype (single panel, no pivoting); actual flop compression is achieved in both cases; for MPI symmetric and unsymmetric, the panel factorization has been rewritten for BLR. Very short term: BLR sequential symmetric with all MUMPS features and double panel. Short to long term: BLR MPI symmetric and unsymmetric, BLR solution phase, actual memory gains.

89 Conclusion & perspectives. Conclusions: efficient method on various applicative problems; considerable memory reduction and substantial decrease in computations; O(N^4/3) complexity, comparable to HSS; can be used as a preconditioner or as a direct solver; good potential for parallelism; the code is stable on the tested problems. Perspectives: MPI; study on larger and more difficult problems; error propagation study; absolute or relative dropping parameter, and relative to what? (work with S. Gratton, M. Ngom and D. Titley-Peloquin has started).

90 Acknowledgements. O. Boiteau, B. Quinnez and N. Tardieu (EDF R&D); S. Operto, R. Brossier and J. Virieux (SEISCOPE project) for their contribution to the geophysics study; the Toulouse Computing Center (CICT) and N. Renon; S. Li, A. Napov and F.-H. Rouet (LBNL, Berkeley). Details on this work can be found in: P. Amestoy, C. Ashcraft, O. Boiteau, A. Buttari, J.-Y. L'Excellent and C. Weisbecker, Improving multifrontal methods by means of block low-rank representations, IRIT technical report RT/APO/12/6, INRIA technical report RR-8199, submitted to SIAM SISC, enseeiht.fr/documents/rt_apo_12_6_blr.pdf

91 Thank you! Any questions?
