Block Low-Rank (BLR) approximations to improve multifrontal sparse solvers

Size: px

Start display at page:

Download "Block Low-Rank (BLR) approximations to improve multifrontal sparse solvers"

Sherilyn Oliver
5 years ago
Views:

1 Block Low-Rank (BLR) approximations to improve multifrontal sparse solvers Joint work with Patrick Amestoy, Cleve Ashcraft, Olivier Boiteau, Alfredo Buttari and Jean-Yves L Excellent, PhD started on October 1st, 2010 and financed by EDF Clément Weisbecker, ENSEEIHT-IRIT, University of Toulouse, France MUMPS Users Group Meeting Clamart, France, May 29-30, 2013

2 Introduction Multifrontal solver Block Low-Rank approximations to improve multifrontal sparse solvers direct solver for large linear systems objective: A = LU Low-rank approximations compression and flop reduction accuracy controlled by a numerical parameter Combine these two notions to improve multifrontal solvers (in the context of MUMPS) 2/51 MUMPS Users Group Meeting Clamart, France, May 29-30, 2013

3 The multifrontal method

4 Separator tree (Schreiber 1982) dependencies between variables topological order 4/51 MUMPS Users Group Meeting Clamart, France, May 29-30, 2013

5 Frontal matrices At each node, a partial factorization of the frontal matrix is performed : CB 1 CB 2 FS assembly elim CB CBs are stacked waiting for assembly peak of CB stack NFS FS separator variables NFS border variables FS 5/51 MUMPS Users Group Meeting Clamart, France, May 29-30, 2013

6 Low-rank approximations

7 Low-rank matrices Low-rank matrix (Bebendorf) Let A be a m n matrix Let k ε be the approximated numerical rank of A at accuracy ε A is said to be a low-rank matrix if there exist matrices X of size m k ε, Y of size n k ε, E of size m n such that : A = X Y T + E, where E 2 ε and k ε (m + n) < mn n k 2 k 2 Y T 2 k 1 k 1 C = Y T 1 X 2 Low-rank product: X 1 (Y T 1 X 2)Y T 2 m X1 7/51 MUMPS Users Group Meeting Clamart, France, May 29-30, 2013

8 Where can I find low-rank matrices? Problem: Matrices are usually not low-rank! Solution (Börm, Bebendorf): Well defined subblocks A b = A στ can be low-rank! (b = σ τ I I) Define a clustering C to obtain low-rank blocks I 1 I 2 I 3 I 1 I 2 I 3 8/51 MUMPS Users Group Meeting Clamart, France, May 29-30, 2013

9 Admissibility condition How to find a good clustering? Admissibility condition for elliptic PDEs (Börm, Grasedyck, Hackbusch) Let σ I and τ I be clusters of indices The block (σ, τ) is said admissible if σ and τ satisfy diam(σ) + diam(τ) 2ηdist(σ, τ) where η is a problem dependant fixed parameter (usually 1 or 05) diam(σ) dist(σ,τ) diam(τ) σ 9/51 MUMPS Users Group Meeting Clamart, France, May 29-30, 2013 τ

10 Illustration (1) D 1 D 2 10/51 MUMPS Users Group Meeting Clamart, France, May 29-30, 2013

11 Illustration (1) D 1 D 2 10/51 MUMPS Users Group Meeting Clamart, France, May 29-30, 2013

12 Illustration (1) D 1 D 2 10/51 MUMPS Users Group Meeting Clamart, France, May 29-30, 2013

13 Illustration (1) D 1 D 2 10/51 MUMPS Users Group Meeting Clamart, France, May 29-30, 2013

14 Illustration (1) D 1 D 2 10/51 MUMPS Users Group Meeting Clamart, France, May 29-30, 2013

15 Illustration (1) D 1 D 2 10/51 MUMPS Users Group Meeting Clamart, France, May 29-30, 2013

16 Illustration (2) rank of distance between and room for compression 11/51 MUMPS Users Group Meeting Clamart, France, May 29-30, 2013

17 Exploiting low-rank subblocks: H matrices Ã = D 11 X 12 Y T 12 X 36 Y T 36 X 21 Y T 21 D 22 X 63 Y T 63 D 44 X 45 Y T 45 X 54 Y T 54 D 55 I 7 I 7 I 3 I 3 I 3 I 6 I 6 I 3 I 6 I 6 I 1 I 1 I 1 I 2 I 2 I 1 I 2 I 2 I 4 I 4 I 4 I 5 I 5 I 4 I 5 I 5 (Börm, Bebendorf, Hackbush, Grasedyck) 12/51 MUMPS Users Group Meeting Clamart, France, May 29-30, 2013

18 Exploiting low-rank subblocks: HSS matrices D 1 X 1 B 1 Y T 2 X 1 R 1 B 3 W T 4 YT 4 X 1 R 1 B 3 W T 5 YT 5 Ã = X 2 B 2 Y1 T D 2 X 2 R 2 B 3 W4 T YT 4 X 2 R 2 B 3 W5 T YT 5 X 4 R 4 B 6 W1 T YT 1 X 4 R 4 B 6 W2 T YT 2 D 4 X 4 B 4 Y5 T X 5 R 5 B 6 W1 T YT 1 X 5 R 5 B 6 W2 T YT 2 X 5 B 5 Y4 T D B 3 B 6 6 R 1,W 1 B 1 R 2,W 2 R 4,W 4 B 4 R 5,W 5 1 B B 5 5 X 1,Y 1 X 2,Y 2 X 4,Y 4 X 5,Y 5 D 1 D 2 D 4 D 5 (Li, Napov, Xia, Gu, Wang, Rouet Gillman, Martinsson) 13/51 MUMPS Users Group Meeting Clamart, France, May 29-30, 2013

19 Exploiting low-rank subblocks: BLR matrices D 1 X 12 Y T 12 X 13 Y T 13 X 14 Y T 14 Ã = X 21 Y T 21 D 2 X 23 Y T 23 X 24 Y T 24 X 31 Y T 31 X 32 Y T 32 D 3 X 34 Y T 34 X 41 Y T 41 X 42 Y T 42 X 43 Y T 43 D 4 particular case of H-matrices no tree natural matrix structure Idea: representing fronts with BLR matrices 14/51 MUMPS Users Group Meeting Clamart, France, May 29-30, 2013

20 Comparative study: compression rates Compression rates of the frontal matrix at the root of a multifrontal tree, on two 3D stencils (discretization of a 128 x 128 x 128 cube) fraction of dense storage Laplacian low-rank threshold H HSS BLR fraction of dense storage pts Geoazur stencil low-rank threshold H HSS BLR 15/51 MUMPS Users Group Meeting Clamart, France, May 29-30, 2013

21 Comparative study: compression cost Compression cost of the frontal matrix at the root of a multifrontal tree, on two 3D stencils (discretization of a 128 x 128 x 128 cube) log 10 (number of operations for compression) Laplacian H HSS BLR log 10 (number of operations for compression) 27-pts Geoazur stencil low-rank threshold low-rank threshold H HSS BLR 16/51 MUMPS Users Group Meeting Clamart, France, May 29-30, 2013

22 Adaptation to a multifrontal solver 17/51 MUMPS Users Group Meeting Clamart, France, May 29-30, 2013

23 Adaptation to a multifrontal solver no relative order efficient clustering 17/51 MUMPS Users Group Meeting Clamart, France, May 29-30, 2013

24 Adaptation to a multifrontal solver no relative order efficient compr and decompr 200 efficient clustering distribution assembly 17/51 MUMPS Users Group Meeting Clamart, France, May 29-30, 2013

25 Adaptation to a multifrontal solver no relative order efficient compr and decompr flat non hierarchical structure efficient clustering distribution assembly pivoting 17/51 MUMPS Users Group Meeting Clamart, France, May 29-30, 2013

26 Clustering variables

27 Admissible clustering Constraint : the admissibility condition should be satisfied large diameters fraction of memory used 83% small diameters fraction of memory used 57% 19/51 MUMPS Users Group Meeting Clamart, France, May 29-30, 2013

28 Halo algorithm for the clustering of a separator Designed to catch the geometry of the problem Computed with the graph instead of the mesh Coupled with a third party partitioning tool 20/51 MUMPS Users Group Meeting Clamart, France, May 29-30, 2013

29 Halo algorithm for the clustering of a separator Designed to catch the geometry of the problem Computed with the graph instead of the mesh Coupled with a third party partitioning tool 1 The separator 20/51 MUMPS Users Group Meeting Clamart, France, May 29-30, 2013

30 Halo algorithm for the clustering of a separator Designed to catch the geometry of the problem Computed with the graph instead of the mesh Coupled with a third party partitioning tool 1 The separator 2 The halo 20/51 MUMPS Users Group Meeting Clamart, France, May 29-30, 2013

31 Halo algorithm for the clustering of a separator Designed to catch the geometry of the problem Computed with the graph instead of the mesh Coupled with a third party partitioning tool 1 The separator 2 The halo 3 Extraction of the halo 20/51 MUMPS Users Group Meeting Clamart, France, May 29-30, 2013

32 Halo algorithm for the clustering of a separator Designed to catch the geometry of the problem Computed with the graph instead of the mesh Coupled with a third party partitioning tool 1 The separator 2 The halo 3 Extraction of the halo 4 Partition of the halo 20/51 MUMPS Users Group Meeting Clamart, France, May 29-30, 2013

33 Halo algorithm for the clustering of a separator Designed to catch the geometry of the problem Computed with the graph instead of the mesh Coupled with a third party partitioning tool P 1 P 2 P 3 1 The separator 2 The halo 3 Extraction of the halo 4 Partition of the halo 5 Partition of the separator (block size is fixed) 20/51 MUMPS Users Group Meeting Clamart, France, May 29-30, 2013

34 Clustering of the variables of a front front = separator + border S 21/51 MUMPS Users Group Meeting Clamart, France, May 29-30, 2013

35 Clustering of the variables of a front front = separator + border 1- separator : halo S 21/51 MUMPS Users Group Meeting Clamart, France, May 29-30, 2013

36 Clustering of the variables of a front front = separator + border 1- separator : halo 2- border? 2 choices : S 21/51 MUMPS Users Group Meeting Clamart, France, May 29-30, 2013

37 Clustering of the variables of a front front = separator + border 1- separator : halo 2- border? 2 choices : S EXPLICIT S 21/51 MUMPS Users Group Meeting Clamart, France, May 29-30, 2013

38 Clustering of the variables of a front front = separator + border 1- separator : halo 2- border? 2 choices : S EXPLICIT S 21/51 MUMPS Users Group Meeting Clamart, France, May 29-30, 2013

39 Clustering of the variables of a front front = separator + border 1- separator : halo 2- border? 2 choices : S EXPLICIT INHERITED (top down) S S 21/51 MUMPS Users Group Meeting Clamart, France, May 29-30, 2013

40 Clustering of the variables of a front front = separator + border 1- separator : halo 2- border? 2 choices : S EXPLICIT INHERITED (top down) S S 21/51 MUMPS Users Group Meeting Clamart, France, May 29-30, 2013

41 Clustering of the variables of a front front = separator + border 1- separator : halo 2- border? 2 choices : S EXPLICIT INHERITED (top down) S S 21/51 MUMPS Users Group Meeting Clamart, France, May 29-30, 2013

42 Clustering of the variables of a front front = separator + border 1- separator : halo 2- border? 2 choices : S EXPLICIT INHERITED (top down) S S 21/51 MUMPS Users Group Meeting Clamart, France, May 29-30, 2013

43 Structural comparison optimal optimal = optimal block 22/51 MUMPS Users Group Meeting Clamart, France, May 29-30, 2013

44 Structural comparison optimal optimal = optimal block small optimal = large enough block 22/51 MUMPS Users Group Meeting Clamart, France, May 29-30, 2013

45 Structural comparison optimal optimal = optimal block small optimal = large enough block small small = too small block 22/51 MUMPS Users Group Meeting Clamart, France, May 29-30, 2013

46 Structural comparison optimal optimal = optimal block small optimal = large enough block small small = too small block reclustering strategies S 4 22/51 MUMPS Users Group Meeting Clamart, France, May 29-30, 2013

47 Illustration with a top-level separator from SCOTCH side face view perspective view 23/51 MUMPS Users Group Meeting Clamart, France, May 29-30, 2013

48 Block Low-Rank multifrontal method

49 General idea fronts approximated with low-rank matrices σ L i 1,1 U i 1,1 L i 2,2 U i 2,2 τ U i di,di L i di,di CB 25/51 MUMPS Users Group Meeting Clamart, France, May 29-30, 2013

50 Switch level (1) percentof the entries of the factor computed in fronts larger than N mesh size N =200 N =400 N =600 N =800 N =1000 a large ratio of the memory consumption is targeted 26/51 MUMPS Users Group Meeting Clamart, France, May 29-30, 2013

51 Switch level (1) 100 percentof operations performed in fronts larger than N N =200 N =400 N =600 N =800 N = mesh size a large ratio of the flop count is targeted 26/51 MUMPS Users Group Meeting Clamart, France, May 29-30, 2013

52 Switch level (2) low-rank techniques only used on large fronts mesh size N = N = N = N = N = (Proportion of fronts larger than N with respect to the total number of fronts in the multifrontal tree, for different mesh sizes) Only a few fronts are actually compressed! 27/51 MUMPS Users Group Meeting Clamart, France, May 29-30, 2013

53 Factorization algorithms task operation type dense low-rank Factor (F) B = LU T (2/3)n 3 (2/3)n 3 Compress (C) C = XY T kn 2 Solve (S) D = X(Y T L 1 ) n 3 kn 2 CB update (U) D = D X 1 (Y1 T X 2)Y2 T 2n 3 2kn 2 28/51 MUMPS Users Group Meeting Clamart, France, May 29-30, 2013

54 Factorization algorithms task operation type dense low-rank Factor (F) B = LU T (2/3)n 3 (2/3)n 3 Compress (C) C = XY T kn 2 Solve (S) D = X(Y T L 1 ) n 3 kn 2 CB update (U) D = D X 1 (Y1 T X 2)Y2 T 2n 3 2kn 2 Standard FR algorithm: F Factor S Solve U Update 28/51 MUMPS Users Group Meeting Clamart, France, May 29-30, 2013

55 Factorization algorithms task operation type dense low-rank Factor (F) B = LU T (2/3)n 3 (2/3)n 3 Compress (C) C = XY T kn 2 Solve (S) D = X(Y T L 1 ) n 3 kn 2 CB update (U) D = D X 1 (Y1 T X 2)Y2 T 2n 3 2kn 2 Standard FR algorithm: F Factor S Solve U Update 28/51 MUMPS Users Group Meeting Clamart, France, May 29-30, 2013

56 Factorization algorithms task operation type dense low-rank Factor (F) B = LU T (2/3)n 3 (2/3)n 3 Compress (C) C = XY T kn 2 Solve (S) D = X(Y T L 1 ) n 3 kn 2 CB update (U) D = D X 1 (Y1 T X 2)Y2 T 2n 3 2kn 2 Standard FR algorithm: F Factor S Solve U Update 28/51 MUMPS Users Group Meeting Clamart, France, May 29-30, 2013

57 Factorization algorithms task operation type dense low-rank Factor (F) B = LU T (2/3)n 3 (2/3)n 3 Compress (C) C = XY T kn 2 Solve (S) D = X(Y T L 1 ) n 3 kn 2 CB update (U) D = D X 1 (Y1 T X 2)Y2 T 2n 3 2kn 2 Standard FR algorithm: F Factor S Solve U Update 28/51 MUMPS Users Group Meeting Clamart, France, May 29-30, 2013

58 Factorization algorithms task operation type dense low-rank Factor (F) B = LU T (2/3)n 3 (2/3)n 3 Compress (C) C = XY T kn 2 Solve (S) D = X(Y T L 1 ) n 3 kn 2 CB update (U) D = D X 1 (Y1 T X 2)Y2 T 2n 3 2kn 2 BLR FSCU algorithm: F Factor S Solve U Update 28/51 MUMPS Users Group Meeting Clamart, France, May 29-30, 2013

59 Factorization algorithms task operation type dense low-rank Factor (F) B = LU T (2/3)n 3 (2/3)n 3 Compress (C) C = XY T kn 2 Solve (S) D = X(Y T L 1 ) n 3 kn 2 CB update (U) D = D X 1 (Y1 T X 2)Y2 T 2n 3 2kn 2 BLR FSCU algorithm: F Factor S Solve U Update 28/51 MUMPS Users Group Meeting Clamart, France, May 29-30, 2013

60 Factorization algorithms task operation type dense low-rank Factor (F) B = LU T (2/3)n 3 (2/3)n 3 Compress (C) C = XY T kn 2 Solve (S) D = X(Y T L 1 ) n 3 kn 2 CB update (U) D = D X 1 (Y1 T X 2)Y2 T 2n 3 2kn 2 BLR FSCU algorithm: F Factor S Solve U Update 28/51 MUMPS Users Group Meeting Clamart, France, May 29-30, 2013

61 Factorization algorithms task operation type dense low-rank Factor (F) B = LU T (2/3)n 3 (2/3)n 3 Compress (C) C = XY T kn 2 Solve (S) D = X(Y T L 1 ) n 3 kn 2 CB update (U) D = D X 1 (Y1 T X 2)Y2 T 2n 3 2kn 2 BLR FSCU algorithm: F Factor S Solve C Compress U Update 28/51 MUMPS Users Group Meeting Clamart, France, May 29-30, 2013

62 Factorization algorithms task operation type dense low-rank Factor (F) B = LU T (2/3)n 3 (2/3)n 3 Compress (C) C = XY T kn 2 Solve (S) D = X(Y T L 1 ) n 3 kn 2 CB update (U) D = D X 1 (Y1 T X 2)Y2 T 2n 3 2kn 2 BLR FSCU algorithm: F Factor S Solve C Compress U Update 28/51 MUMPS Users Group Meeting Clamart, France, May 29-30, 2013

63 Factorization algorithms task operation type dense low-rank Factor (F) B = LU T (2/3)n 3 (2/3)n 3 Compress (C) C = XY T kn 2 Solve (S) D = X(Y T L 1 ) n 3 kn 2 CB update (U) D = D X 1 (Y1 T X 2)Y2 T 2n 3 2kn 2 F Factor S Solve C Compress U Update FCSU version more efficient less stability 28/51 MUMPS Users Group Meeting Clamart, France, May 29-30, 2013

64 Factorization algorithms task operation type dense low-rank Factor (F) B = LU T (2/3)n 3 (2/3)n 3 Compress (C) C = XY T kn 2 Solve (S) D = X(Y T L 1 ) n 3 kn 2 CB update (U) D = D X 1 (Y1 T X 2)Y2 T 2n 3 2kn 2 F Factor S Solve C Compress U Update FSUC version no flop reduction more stability 28/51 MUMPS Users Group Meeting Clamart, France, May 29-30, 2013

65 Factorization algorithms task operation type dense low-rank Factor (F) B = LU T (2/3)n 3 (2/3)n 3 Compress (C) C = XY T kn 2 Solve (S) D = X(Y T L 1 ) n 3 kn 2 CB update (U) D = D X 1 (Y1 T X 2)Y2 T 2n 3 2kn 2 F Factor S Solve C Compress U Update High algorithmic flexibility 28/51 MUMPS Users Group Meeting Clamart, France, May 29-30, 2013

66 Experiments

67 Set of problems Name Prop Arith N NZ mem LU flops LU CSR appli ( 10 6 ) ( 10 9 ) (GB) ( ) Curl D/sym D E-15 Geoazur D/unsym Z E-4 wave prop EDF_A_MECA_R12 2D/sym D E-15 mechanics EDF_D_THER_R7 3D/sym D E-15 thermics CSR = Componentwise Scaled Residual Code_Aster tpll01{a,d} test cases (refined) large matrices applicative problems 30/51 MUMPS Users Group Meeting Clamart, France, May 29-30, 2013

68 Metrics memory: L = fraction of FR factors storage obtained with BLR (%) CB = fraction of FR maximum size of CB stack obtained with BLR (%) flops: fraction of FR operations needed for the BLR factorization (in percent or absolute data, including the compression cost) 31/51 MUMPS Users Group Meeting Clamart, France, May 29-30, 2013

69 Clustering strategy memory flops time L CB facto clustering full rank analysis clustering inh exp inh exp inh exp inh exp Curl % 624% 70% 55% 109% 111% 13 s 27 s 897 s Geoazur % 770% 470% 450% 608% 591% 5 s 42 s 62 s TH_RAFF7 341% 307% 175% 162% 72% 66% 34 s 206 s 387 s ME_RAFF12 529% 511% 48% 41% 61% 61% 121 s 239 s 1971 s inherited version is more than 2 times faster same results on L 11 comparable results on L 21 a little less good on CBs inherited clustering used for all the experiments 32/51 MUMPS Users Group Meeting Clamart, France, May 29-30, 2013

70 Global ordering of the matrix: general results Results with different orderings on Geoazur128 problem FR ordering mry flops peak L CB flops AMD 109GB 39E GB 735% 400% 594% AMF 72GB 18E GB 699% 535% 483% PORD 55GB 10E GB 704% 490% 489% METIS 46GB 62E GB 787% 464% 626% SCOTCH 49GB 66E GB 794% 481% 634% LR 33/51 MUMPS Users Group Meeting Clamart, France, May 29-30, 2013

71 Global ordering of the matrix: general results Results with different orderings on Geoazur128 problem FR ordering mry flops peak L CB flops AMD 109GB 39E GB 735% 400% 594% AMF 72GB 18E GB 699% 535% 483% PORD 55GB 10E GB 704% 490% 489% METIS 46GB 62E GB 787% 464% 626% SCOTCH 49GB 66E GB 794% 481% 634% LR Algebraic approach 33/51 MUMPS Users Group Meeting Clamart, France, May 29-30, 2013

72 Global ordering of the matrix: AMF tree [visualization tool developed by M Bremond talk tomorrow 11:30] 34/51 MUMPS Users Group Meeting Clamart, France, May 29-30, 2013

73 Global ordering of the matrix: SCOTCH tree [visualization tool developed by M Bremond talk tomorrow 11:30] 35/51 MUMPS Users Group Meeting Clamart, France, May 29-30, 2013

74 Influence of the size of the problem different refinements of problem EDF_A_MECA done with Homard memory flops Refinement N L CB facto R8 2, 101, % 43% 37% R11 33, 570, % 29% 12% R12 134, 250, % 24% 7% the larger the problem, the more efficient the method target = challenging problems 36/51 MUMPS Users Group Meeting Clamart, France, May 29-30, 2013

75 Scalability with respect to the size of the problem Laplacian problem, ε = Full Rank O(N 2 ) BLR O(N 4/3 ) 14 flops facto Mesh size M O(N 4/3 ) complexity 37/51 MUMPS Users Group Meeting Clamart, France, May 29-30, 2013

76 Influence of the size of the clusters memory flops block size L CB facto compr other % 55% 75% 98E+11 46E % 50% 68% 15E+12 41E % 46% 63% 19E+12 37E % 45% 60% 24E+12 34E % 44% 58% 27E+12 33E % 44% 56% 31E+12 32E % 45% 56% 34E+12 31E % 44% 56% 37E+12 31E % 44% 57% 44E+12 31E+13 flexibility (no need for a fine tuning) can be dynamically adapted for pivoting, parallelism, 38/51 MUMPS Users Group Meeting Clamart, France, May 29-30, 2013

77 Global results: 2D problems EDF_A_MECA_R12 QR DROPPING PARAMETER L CB flops facto (%) CSR CURL-CURL 2D 5000x QR DROPPING PARAMETER Name N mem LU flops LU CSR EDF_A_MECA_R12 134E GB 2E+14 4E-15 Curl-Curl E+6 29 GB 5E+12 2E-15 39/51 MUMPS Users Group Meeting Clamart, France, May 29-30, 2013

78 Global results: 3D problems L CB flops facto (%) CSR EDF_D_THER_R GEOAZUR 3D 128x128x QR DROPPING PARAMETER QR DROPPING PARAMETER Name N mem LU flops LU CSR EDF_D_THER_R7 8E GB 1E+14 8E-15 Geoazur E+6 54 GB 6E+13 3E-4 40/51 MUMPS Users Group Meeting Clamart, France, May 29-30, 2013

79 Application to geophysics (1) Helmholtz equation for seismic modeling (SEISCOPE project) EAGE overthrust ground model 0 20 Dip (km) Cross (km) 15 5 Depth (km) m/s fqcy Flops LU Mem LU Peak memory 2 Hz 8957E+11 3 GB 4 GB 4 Hz 1639E GB 25 GB 8 Hz 5769E GB 283 GB 41/51 MUMPS Users Group Meeting Clamart, France, May 29-30, 2013

80 Application to geophysics (2) 5 Dip (km) b) 20 ) Dip (km) e) m ) Dip (km) Dip (km) Depth (km) h) Dip (km) i) k) Dip (km) C 10 Depth (km) Dip (km) (k ro ss s ro s l) (k m ) 15 5 Dip (km) m ) 15 5 s Dip (km) 10 0 ro s C Depth (km) Depth (km) Depth (km) j) 15 C ro ss C ro ss C ro ss Dip (km) 10 (k (k m ) m ) 5 0 C ro ss (k Depth (km) (k m ) f) C ro ss C ro ss (k m ) g) Dip (km) 10 ro ss C Depth (km) m ) 5 0 (k 0 15 (k m ) C c) 20 (k m Depth (km) Depth (km) Depth (km) Depth (km) Dip (km) Depth (km) 5 5 d) 42/51 0 C ro ss C ro ss ) 0 (k m (k m ) a) a) MUMPS Users Group Meeting Clamart, France, May 29-30, 2013

81 Application to geophysics (3) Dip (km) Dip (km) d) Dip (km) (k C ro ss ss C ro ﬂops 43/51 c) 20 Depth (km) m ) 5 5 Depth (km) Depth (km) 0 (k (k ss C ro m ) b) 20 ) 15 m Dip (km) 10 (k 5 C ro ss 0 Depth (km) m ) a) memory ε fqcy facto F CB (10 5 ) 2 Hz 4 Hz 8 Hz 418 % 274 % 218 % 618 % 500 % 416 % 323% 244% 239% (10 4 ) 2 Hz 4 Hz 8 Hz 329 % 200 % 152 % 534 % 422 % 289 % 239% 217% 194% (10 3 ) 2 Hz 4 Hz 8 Hz 246 % 138 % 98 % 447 % 345 % 213 % 168% 190% 159% MUMPS Users Group Meeting Clamart, France, May 29-30, 2013

Preconditioning with BLR: set of problems N NZ

internal pressure force (challenging for EDF)

82 Preconditioning with BLR: set of problems N NZ Cond application Piston 13E+6 547E+6 51E+5 external pressure force on the top perf001d 20E+6 758E+6 15E+11 cavity hook subjected to internal pressure force (challenging for EDF) 44/51 MUMPS Users Group Meeting Clamart, France, May 29-30, 2013

Preconditioning with BLR: number of iterations Number of iterations needed for the convergence FR is MUMPS 410 single precision means no convergence in 1, 000 seconds FR 10 8 10 7 10 6 10 5 10 4 10 3

83 Preconditioning with BLR: number of iterations Number of iterations needed for the convergence FR is MUMPS 410 single precision means no convergence in 1, 000 seconds FR Piston perf001d perf001d: only 2 iterations to convergence with a BLR double precision precond at ε = 10 8 good solution versus coarse sword of usual single precision! 45/51 MUMPS Users Group Meeting Clamart, France, May 29-30, 2013

84 Preconditioning with BLR: Piston (compr and time) 70 (%) PISTON 60 L CB flops facto (%) QR DROPPING PARAMETER 46/51 MUMPS Users Group Meeting Clamart, France, May 29-30, 2013

85 Preconditioning with BLR: Piston (compr and time) (s) Total execution time Factorization time Solves time (GCPC) PISTON 250 time in seconds FR = 313s FR = 308s FR = 5s QR DROPPING PARAMETER 46/51 MUMPS Users Group Meeting Clamart, France, May 29-30, 2013

86 Preconditioning with BLR: perf001d (compr and time) (%) PERF L CB flops facto (%) QR DROPPING PARAMETER 47/51 MUMPS Users Group Meeting Clamart, France, May 29-30, 2013

87 Preconditioning with BLR: perf001d (compr and time) PERF (s) 450 Total execution time Solves time (GCPC) 400 Factorization time 350 time in seconds FR = 569s FR = 117s FR = 452s QR DROPPING PARAMETER 47/51 MUMPS Users Group Meeting Clamart, France, May 29-30, 2013

88 Software work new FR factorization with two levels of blocking: internal panels: efficiency of BLAS 3 operations external panels: correspond to low-rank blocks done in factorization algorithm: BLR seq unsymm with all MUMPS features and double panel BLR seq symm prototype (single panel, no pivoting) actual flop compression is achieved in both cases MPI symm and unsymm: panel facto rewritten for BLR very short term: BLR seq symm with all MUMPS features and double panel short term: long term: BLR MPI symm and unsymm BLR solution phase actual memory gains 48/51 MUMPS Users Group Meeting Clamart, France, May 29-30, 2013

89 Conclusion & perspectives efficient method on various applicative problems considerable memory reduction & substantial decrease in computations O(N 4/3 ) complexity, comparable to HSS can be used as a preconditioner or as a direct solver good potential for parallelism code is stable on tested problems MPI study on larger and more difficult problems error propagation study absolute or relative dropping parameter? relative to what? (work with S Gratton, M Ngom and D Titley-Peloquin started) 49/51 MUMPS Users Group Meeting Clamart, France, May 29-30, 2013

90 Aknowledgements O Boiteau, B Quinnez and N Tardieu (EDF R&D) S Operto, R Brossier and J Virieux (SEISCOPE Project) for their contribution to the geophysics study the Toulouse Computing Center (CICT) and N Renon S Li, A Napov and F-H Rouet (LBNL Berkeley) Details on this work can be found in: P Amestoy, C Ashcraft, O Boiteau, A Buttari, J-Y L Excellent and C Weisbecker, Improving multifrontal methods by means of block low-rank representations, IRIT technical report RT/APO/12/6, INRIA technical report INRIA/RR-8199, submitted to SIAM SISC, enseeihtfr/documents/rt_apo_12_6_blrpdf, /51 MUMPS Users Group Meeting Clamart, France, May 29-30, 2013

91 Thank you! Any questions?

SUMMARY INTRODUCTION BLOCK LOW-RANK MULTIFRONTAL METHOD

SUMMARY INTRODUCTION BLOCK LOW-RANK MULTIFRONTAL METHOD D frequency-domain seismic modeling with a Block Low-Rank algebraic multifrontal direct solver. C. Weisbecker,, P. Amestoy, O. Boiteau, R. Brossier, A. Buttari, J.-Y. L Excellent, S. Operto, J. Virieux