Multilevel Low-Rank Preconditioners Yousef Saad Department of Computer Science and Engineering University of Minnesota. Modelling 2014 June 2,2014

Size: px

Start display at page:

Download "Multilevel Low-Rank Preconditioners Yousef Saad Department of Computer Science and Engineering University of Minnesota. Modelling 2014 June 2,2014"

Allan Dickerson
5 years ago
Views:

1 Multilevel Low-Rank Preconditioners Yousef Saad Department of Computer Science and Engineering University of Minnesota Modelling 24 June 2,24 Dedicated to Owe Axelsson at the occasion of his 8th birthday

2 Acknowledments Joint work with Ruipeng Li Work supported by NSF-DMS Modelling4 6/2/24 p. 2

3 Intro: ILU-type preconditioners Problem: To solve linear systems Ax = b Common approach: [ grey-box solvers] Krylov subspace accelerator (e.g., GMRES, BiCGSTAB) + Preconditioner Common preconditioners: Incomplete LU factorizations; Relaxation-type; AMG;... Common difficulties of ILUs: Often fail for indefinite problems Not too good for highly parallel environments [GPUs] Modelling4 6/2/24 p. 3

4 Alternatives to ILU preconditioners Time to think about (radical) alternatives? Preconditioners requiring few irregular computations..... that trade volume of computations for speed,.. and, if possible, more robust for indefinite case Possible candidates: Methods based on Multilevel Low- Rank (MLR) approximations Low-rank approximation techniques can be seen everywhere in computational sciences Common approach: truncated SVD.... and more often now : random sampling Modelling4 6/2/24 p. 4

5 Related work: Work on H-matrices [Hackbusch and co-workers, B. Khoromskij, L. Grasedyck, S. Leborne, + many others..] Work on HSS matrices [e.g., J. XIA, S. CHANDRASEKARAN, M. GU, AND X-S. LI 2.] Work on balanced incomplete factorizations (R. Bru et al.) Work on sweeping preconditioners by Engquist and Ying. Work on computing the diagonal of a matrix inverse [Jok Tang and YS (2)..] Modelling4 6/2/24 p. 5

6 MULTI-LEVEL LOW-RANK PRECONDITIONERS

7 Low-rank Multilevel Approximations Starting point: symmetric matrix derived from a 5-point discretization of a 2-D Pb on n x n y grid A = A = A D 2 D 2 A 2... D D α A α D α+ D α+ A α D ny A ny ( ) A A 2 A 2 A 22 ( A A22) + ( A 2 A 2 ) Modelling4 6/2/24 p. 7

8 Corresponding splitting on FD mesh: Modelling4 6/2/24 p. 8

9 A R m m, A 22 R (n m) (n m) In the simplest case second matrix is: ( ) A A 2 = A 2 A 22 ( A A22) + I I Write 2nd matrix as: I I = + I + I I I I I E T = I I E E T Modelling4 6/2/24 p. 9

10 Thus: A = (A + EE T ) }{{} B EE T Note: E R n n x, X R n x n x n x = separator = [O(n /2 ) in 2-D, O(n 2/3 ) in 3-D] B := ( B B2) A = B EE T, R n n, E := ( E E 2 ) R n n x, Next step: use Sherman- Morrison formula: A = B + (B E)X (B E) T X = I E T B E Modelling4 6/2/24 p.

11 Multilevel Low-Rank (MLR) algorithm Use in a recursive framework [apply recursively to B, B 2 ] Next step: lowrank approx. for B E B E U k V T k, U k R n k, V k R n x k, Replace B E by U k Vk T in X = I (ET B )E: X G k = I V k U T k E, ( Rn x n x ) Leads to... Preconditioner M = B + U k H k U T k Use recursivity Can show : H k = (I U T k EV k) and H T k = H k Modelling4 6/2/24 p.

12 Other options explored Another thought : approximate X (only) and exploit recursivity B [v + E X E T B v] However won t work: cost explodes with # levels [recursivity] We will see later how we can use this in DD framework Another possibility: approximate B E on one side only: M = B + B EG k V ku T k = B [I + EG k V ku T k ] However, can show that this is equivalent to the previous method Modelling4 6/2/24 p. 2

13 Recursive multilevel framework ( A i = B i + E i Ei T, B Bi i. Bi2) Next level, set A i B i and A i2 B i2 Repeat on A i, A i2 Last level, factor A i (IC, ILU) Binary tree structure: Modelling4 6/2/24 p. 3

14 Generalization: Domain Decomposition framework Domain partitioned into 2 domains with an edge separator Ω 2 Ω Matrix can be permuted to: P AP T = ˆB ˆF ˆF T C X X T ˆF 2 T C 2 ˆB 2 ˆF 2 Interface nodes in each domain are listed last. Modelling4 6/2/24 p. 4

15 Each matrix ˆB i is of size n i n i (interior var.) and the matrix C i is of size m i m i (interface var.) P AP T = Let: E α = ( B B2) and α used for balancing αi X T α EE T with B i = then we have: { D = α 2 I D 2 = α 2 X T X. ( ˆB i ˆF Better results when using diagonals instead of αi ˆF T i C i + D i ) Modelling4 6/2/24 p. 5

16 Theory: 2-level analysis for model problem Interested in eigenvalues γ j of A B = B EX E T B when A = Pure Laplacean.. They are: γ j = β j = α j = β j α j, j =,, n x with: n y /2 k= n y /2 k= 4 ( sin 2 sin 2 sin 2 n y kπ n y + kπ n y + + sin2 sin 2 n y kπ n y + kπ n y + + sin2 ) 2, jπ 2(n x +) jπ 2(n x +). Modelling4 6/2/24 p. 6

17 Decay of the γ j s when nx = ny = β j sqrt(β j ) α j γ j Note β j are the singular values of B E. In this particular case 3 eigenvectors will capture 92 % of the inverse whereas 5 eigenvectors will capture 97% of the inverse. Modelling4 6/2/24 p. 7

18 EXPERIMENTS

19 A few MATLAB experiments Matlab first Small problems ; real tests later Helmholtz-like problem: 2 u x 2 u 2 y ρu = 6 ρ ( 2x 2 + y 2) in Ω, 2 + Boundary conditions so solution is known ρ = constant selected to make problem more or less difficult Finite differences on a mesh (matrix size 4,96). MLR starts converging for k = 2. ρ = 845 selected so original Laplacean is shifted by.2 6 negative eigenvalues, smallest = Modelling4 6/2/24 p. 9

20 Comparison with ILUTP 2 MLLR k=2 MLLR k=3 MLLR k=5 ILUTP(.) Log of residual norm GMRES(4) iterations ILUTP vs. MLR (E) # levels = 7 for MLR Modelling4 6/2/24 p. 2

21 k nlev=7 nlev=6 nlev=5 nlev=4 nlev= MLR: GMRES(4) iteration counts and fill ratios Modelling4 6/2/24 p. 2

22 Helmoltz-like equation - a 3D case Similar set-up to 2D case grid size n = 24 3 = 3, 824 ρ = 32.5 selected so the shift is.5 - making the problem very indefinite [6 negative eigenvalues, λ min = ] GMRES(4)-MLR iteration counts and fill ratios k nlev=6 nlev=5 nlev= ILUTP fails even for very small values of droptol (large fill) Modelling4 6/2/24 p. 22

23 General matrices 7 matrices from the Univ. Florida sparse matrix collection + one from a shell problem. 7 matrices are SPD Size varies from n =, 224 (HB/bcsstm27) to n = 9, (AG-Monien/3elt dual) nnz varies from nnz = 5, 3 (HB/bcspwr6) to nnz = 355, 46 (Boeing/bcsstk38). Only indefinite cases shown

24 MATRICES (Non SPD) MLR ICT/ILUTP nlev k fill-ratio #its fill-ratio #its HB/bcsstm HB/bcspwr F HB/bcspwr F HB/bcspwr F HB/blckhole F HB/jagmesh Boeing/nasa AG-Monien/3elt_dual F AG-Monien/airfoil_dual F AG-Monien/ukerbe_dual F SHELL/COQUE8E F MLR vs. ICT/ILUTP Modelling4 6/2/24 p. 24

25 Real tests Experimental setting Hardware: Intel Xeon X5675 processor (2 MB Cache, 3.6 GHz, 6-core) C/C++; Intel Math Kernel Library (MKL,version.2) Stop when: r i 8 r or its exceeds 5 Model Problems in 2-D/3-D: u cu = g in Ω + B.C. 2-D: g(x, y) = ( x 2 + y 2 + c ) e xy ; Ω = (, ) 3. 3-D: g(x, y, z) = 6 c ( x 2 + y 2 + z 2) ; Ω = (, ) 3. F.D. Differences discret. Modelling4 6/2/24 p. 25

26 Symmetric indefinite cases c > in u cu; i.e., shifted by si. 2D case: s =., 3D case: s =.5 MLR + GMRES(4) compared to ILDLT + GMRES(4) 2-D problems: #lev= 4, rank= 5, 7, 7 3-D problems: #lev= 5, rank= 5, 7, 7 ILDLT failed for most cases Difficulties in MLR: #lev cannot be large, [no convergence] inefficient factorization at the last level (memory, CPU time)

27 Grid ILDLT-GMRES MLR-GMRES fill p-t its i-t fill p-t its i-t F F F F F F F Modelling4 6/2/24 p. 27

28 General symmetric matrices - Test matrices MATRIX N NNZ SPD DESCRIPTION Andrews/Andrews 6, 76,54 yes computer graphics pb. Williams/cant 62,45 4,7,383 yes FEM cantilever UTEP/Dubcova2 65,25,3,225 yes 2-D/3-D PDE pb. Rothberg/cfd 7,656,825,58 yes CFD pb. Schmid/thermal 82, ,458 yes thermal pb. Rothberg/cfd2 23,44 3,85,46 yes CFD pb. Schmid/thermal2,228,45 8,58,33 yes thermal pb. Cote/vibrobox 2,328 3,7 no vibroacoustic pb. Cunningham/qa8fk 66,27,66,579 no 3-D acoustics pb. Koutsovasilis/F2 7,55 5,294,285 no structural pb. Modelling4 6/2/24 p. 28

29 MATRIX ICT/ILDLT MLR-CG/GMRES fill p-t its i-t k lev fill p-t its i-t Andrews cant F Dubcova cfd thermal cfd F thermal Modelling4 6/2/24 p. 29

30 MATRIX ICT/ILDLT MLR-CG/GMRES fill p-t its i-t k lev fill p-t its i-t vibrobox F qa8fk F F Modelling4 6/2/24 p. 3

31 Avoiding recursivity: standard DD framework Work in progress Goal: avoid recursivity Consider a domain partition in p domains using vertex- based partitioniong (edge-separation) Interface nodes in each domain are listed last Modelling4 6/2/24 p. 3

32 Local view: External interface points Local interface points Interior points ( ) ( ) Bi F i ui Ei T C i y i + ( ) j N i E ij y j = ( fi g i ) Modelling4 6/2/24 p. 32

33 The global system: Global view Global system can be permuted to the form u i s internal variables y interface variables External interface points B... ˆF B 2... ˆF B p ˆF p Ê2 T... ÊT p C Ê T u u 2. y u p = b Local interface points Interior points ˆF i maps local interface points to interior points in domain Ω i Êi T does the reverse operation Modelling4 6/2/24 p. 33

34 Example: nz = 438 Modelling4 6/2/24 p. 34

35 Splitting Split as: Define: [ B Ê T ˆF C ( ) ( ( ) B ˆF B ˆF A Ê T = + C C) Ê T ( ) ( ) α ˆF α F ; E Ê Then: αi αi ] [ ] B + α = 2 ˆF Ê T C + α 2 F E T. I Property: ˆF Ê T is local, i.e., no inter-domain couplings [ B + α A 2 ˆF Ê T C + α 2 I = block-diagonal ] Modelling4 6/2/24 p. 35

36 Low-Rank Approximation DD preconditioners Sherman-Morrison A = A + A F G E T A G I E T A F Options: (a) Approximate A F, E T A, G [as before] (b) Approximate only G [new] (b) requires 2 solves with A. Let G G k Preconditioner M = A + A F G k ET A Modelling4 6/2/24 p. 36

37 Symmetric Positive Definite case Recap: Let G I E T A E I H. Then A = A + A EG E T A Approximate G by G k Matrix A is SPD M = A + (A Can show: λ j (H) <. preconditioner: E)G k (ET A ) Modelling4 6/2/24 p. 37

38 Next, approximate H as H U DU T Then: (I U DU T ) = I + U[(I D) I]U T. Now take rank-k approximation to H: H U k D k U T k G k = I U k D k U T k G k (I U k D k U T k ) = I + U k [(I D k ) I]U T k Observation: A = M + A E[G G ]ET A G k : k largest eigenvalues of G matched others set == Result: AM has n s + k eigenvalues == All others between and k Modelling4 6/2/24 p. 38

39 Alternative: reset lowest eigenvalues to constant Let H = UΛU T = exact (full) diagonalization of H We replaced Λ by: Alternative: replace Λ by λ λ λ 2 λ λ k λ k θ θ Interesting case: θ = λ k+ Question: related approximation to G? Modelling4 6/2/24 p. 39

40 Result: Let γ = /( θ). Then approx. to G is: G k,θ γi + U k[(i D k ) γi]u T k G k : k largest eigenvalues of G matched others set == θ θ = yields previous case When λ k+ θ < we get Result: AM has n s + k eigenvalues == All others Next: An example for a 9 9 Laplacean, 4 domains, s = 9 Modelling4 6/2/24 p. 4

41 k = 5 Eigenvalues of AM for the case θ = 2.5 x Modelling4 6/2/24 p. 4

42 k = 5 Eigenvalues of AM for the case θ = λ k+ 4 x Modelling4 6/2/24 p. 42

43 k = 5 Eigenvalues of AM for the case θ = λ k+ 2 x

44 Proposition Assume θ is so that λ k+ θ <. Then the eigenvalues η i of AM satisfy: η i + θ A/2 A E 2 2. Experiments: For the Laplacean (FD) and when α =, A /2 A E 2 2 = ET A AA E 2 4 regardless of the mesh-size. Being investigated. Best upper bound for θ = λ k+ Assume above is true and set θ = λ k+. Then κ(am ) constant, if k large enough so that λ k+ constant. i.e., need to capture sufficient part of spectrum Modelling4 6/2/24 p. 44

45 The symmetric indefinite case Appeal of this approach over ILU: approximate inverse Not as sensitive to indefiniteness Part of the results shown still hold But λ i (H) can be > now. Parameter α now plays a more important role Work still in progress Modelling4 6/2/24 p. 45

46 Example: Take Laplacean on a 3 3 FD grid. Subtract.4I result: 26 negative eigenvalues λ min = , λ max = Use α = 4., θ =.9; We do test for k = and then k = 5 Modelling4 6/2/24 p. 46

47 k = Eigenvalues of AM [θ =.9, α = 4] x k = All eigenvalues are > Modelling4 6/2/24 p. 47

48 k = 5 Eigenvalues of AM [θ =.9, α = 4] 6 x 2 k = eigenvalues are < Modelling4 6/2/24 p. 48

49 Parallel implementations Recall : [ ] M = A I + EG k,θ ET A G k,θ = γi + U k[(i D k ) γi]u T k Steps involved in applying M to a vector x : ALGORITHM : Preconditioning operation. z = A x // ˆB i -solves and C α solve 2. y = E T z // Interior points to interface (Loc.) 3. y k = G k,θ y // Use Low-Rank approx. 4. z k = Ey k // Interface to interior points (Loc.) 5. u = A (x + z k ) // ˆB i -solves and C α solve Modelling4 6/2/24 p. 49

50 A Solves Note: A = ˆB ˆB 2... ˆB p C α Recall ˆB i = B i + α 2 E i E T i A solve with A amounts to all p ˆB i -solves and a C α -solve Can replace C α Can use any solver for the ˆB i s by a low degree polynomial [Chebyshev] Modelling4 6/2/24 p. 5

51 Parallel tests: Itasca (MSI) HP linux cluster- with Xeons 556 ( Nehalem ) processors 2-D 3-D Mesh Nproc Rank #its Prec-t Iter-t Mesh Nproc Rank #its Prec-t Iter-t Modelling4 6/2/24 p. 5

52 Mixing Divide & Conquer and standard DD Modelling4 6/2/24 p. 52

53 Mixing Divide & Conquer and standard DD Must use a two-sided approximation Back to recursive version Recall A = A + (A E)G (A E)T G I E T (A E) Use a 2-domain partitioning + recursion Approximate A E by a low-rank matrix - get related approximation to G : A E U k Vk T M = A + U k X k X k = I U T k EV k Advantage: Natural way of splitting - no need for balancing U T k Modelling4 6/2/24 p. 53

54 A A is nearly low-rank Similar to experiment shown earlier earlier Eigenvalues of A A & 3 zooms closing in on 5 Original Zoom / Zoom /25 Zoom / Modelling4 6/2/24 p. 54

55 Conclusion Promising alternatives to ILUs can be found in new forms of approximate inverse techniques Seek data-sparsity instead of regular sparsity DD approch easier to implement, easier to understand than recursive approach Advantages of Multilevel Low-Rank preconditioners: Approximate inverses less sensitive to indefiniteness Exploit dense computations Easy to update Modelling4 6/2/24 p. 55

Multilevel low-rank approximation preconditioners Yousef Saad Department of Computer Science and Engineering University of Minnesota

Multilevel low-rank approximation preconditioners Yousef Saad Department of Computer Science and Engineering University of Minnesota SIAM CSE Boston - March 1, 2013 First: Joint work with Ruipeng Li Work