Accelerating the spike family of algorithms by solving linear systems with multiple right-hand sides

1 Accelerating the spike family of algorithms by solving linear systems with multiple right-hand sides Dept. of Computer Engineering & Informatics University of Patras Graduate Program in Computer Science & Technology July 30, 2014

2 Outline 1 Introduction 2 Parallel solution of banded linear systems - the Spike method 3 Solvers for linear systems with multiple right-hand sides 4 Numerical experiments 5 Bibliography

3 Goal and motivation I
Important problem in numerical linear algebra (NLA): efficient solution of $AX = F$, where $A \in \mathbb{R}^{n \times n}$ is a sparse matrix and $F \in \mathbb{R}^{n \times s}$ is the matrix of right-hand sides.
Applications (numerous): modeling of heat propagation, weather forecasting, fluid dynamics, quantum chromodynamics (QCD), simulation of electronic circuits, analysis of civil structures.

4 Goal and motivation II The problem is very important... Numerous methods based on different assumptions... In this thesis Goal: To extend the Spike framework by using techniques for the solution of linear systems with multiple right-hand sides.

5 Characteristics of the problem
The coefficient matrix $A$ can be:
- a general sparse matrix, $\mathrm{nnz}(A) = O(n)$;
- banded with semi-bandwidth $m$ (if $m \ll n$, then $\mathrm{nnz}(A) = O(n)$).
The banded coefficient matrix can be dense within the band or sparse within the band (figures: non-zero patterns of the two cases).
We focus on symmetric positive definite (SPD) matrices.

8 The Spike algorithm I
The Spike algorithm [SamPol11,PolSam06] is a parallel algorithm for the solution of linear systems of the form $AX = F$ with banded $A$ of semi-bandwidth $m$, $m \ll n$. The banded matrix is treated as a block tridiagonal matrix. The algorithm can be divided into two phases: pre-processing and post-processing.

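9 An example for p = 4 partitions
$$
\underbrace{\begin{pmatrix} A_1 & B_1 & & \\ C_2 & A_2 & B_2 & \\ & C_3 & A_3 & B_3 \\ & & C_4 & A_4 \end{pmatrix}}_{A}
\underbrace{\begin{pmatrix} X_1 \\ X_2 \\ X_3 \\ X_4 \end{pmatrix}}_{X}
=
\underbrace{\begin{pmatrix} F_1 \\ F_2 \\ F_3 \\ F_4 \end{pmatrix}}_{F}
$$
with $A_j \in \mathbb{R}^{n_j \times n_j}$, $F_j \in \mathbb{R}^{n_j \times s}$, $B_j, C_j \in \mathbb{R}^{m \times m}$.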

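10 The Spike algorithm II: A = DS
Matrix $A$ is factorized as
$$A = DS \qquad (1)$$
where $D \in \mathbb{R}^{n \times n}$ is block diagonal and $S \in \mathbb{R}^{n \times n}$ is block tridiagonal:
$$
\underbrace{\begin{pmatrix} A_1 & B_1 & & \\ C_2 & A_2 & B_2 & \\ & C_3 & A_3 & B_3 \\ & & C_4 & A_4 \end{pmatrix}}_{A}
=
\underbrace{\begin{pmatrix} A_1 & & & \\ & A_2 & & \\ & & A_3 & \\ & & & A_4 \end{pmatrix}}_{D}
\underbrace{\begin{pmatrix} I_1 & V_1 & & \\ W_2 & I_2 & V_2 & \\ & W_3 & I_3 & V_3 \\ & & W_4 & I_4 \end{pmatrix}}_{S}
$$
It holds that $V_j, W_j \in \mathbb{R}^{n/p \times m}$.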
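
The factorization (1) can be checked numerically on a small example. The following is a minimal NumPy sketch under stated assumptions, not the implementation used in the thesis: the helper names `banded_spd` and `spike_factors` are hypothetical, the partitions are assumed to have equal size with at least $2m$ rows each, and dense `np.linalg.solve` calls stand in for a banded solver.

```python
import numpy as np

def banded_spd(n, m, seed=0):
    """Random symmetric banded matrix, SPD by strict diagonal dominance (test data only)."""
    rng = np.random.default_rng(seed)
    A = np.zeros((n, n))
    for k in range(1, m + 1):
        d = rng.standard_normal(n - k)
        A += np.diag(d, k) + np.diag(d, -k)
    # Diagonal is still zero here, so this makes every row strictly dominant.
    A += np.diag(np.abs(A).sum(axis=1) + 1.0)
    return A

def spike_factors(A, p, m):
    """Return D (block diagonal) and S (identity plus spikes) such that A = D S."""
    n = A.shape[0]
    nj = n // p                      # assumption: p divides n and nj >= 2m
    D = np.zeros_like(A)
    S = np.eye(n)
    for j in range(p):
        lo, hi = j * nj, (j + 1) * nj
        Aj = A[lo:hi, lo:hi]
        D[lo:hi, lo:hi] = Aj
        if j < p - 1:                # right spike: A_j V_j = [0; B_j]
            S[lo:hi, hi:hi + m] = np.linalg.solve(Aj, A[lo:hi, hi:hi + m])
        if j > 0:                    # left spike: A_j W_j = [C_j; 0]
            S[lo:hi, lo - m:lo] = np.linalg.solve(Aj, A[lo:hi, lo - m:lo])
    return D, S

A = banded_spd(24, 2)
D, S = spike_factors(A, p=4, m=2)
print(np.allclose(D @ S, A))
```

The slices `A[lo:hi, hi:hi + m]` and `A[lo:hi, lo - m:lo]` are exactly the extended blocks $[0; B_j]$ and $[C_j; 0]$, because a matrix of semi-bandwidth $m$ couples a partition only to the first $m$ columns of the next one and the last $m$ columns of the previous one.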

11 The Spike algorithm III
Pre-processing stage: solution of the local linear systems
$$A_j [V_j, W_j] = [\hat{B}_j, \hat{C}_j], \quad j = 1, \dots, p. \qquad (2)$$
Notation: $\hat{B}_j = [0_{(n-m) \times m}; B_j]$, $\hat{C}_j = [C_j; 0_{(n-m) \times m}]$.
Post-processing stage: solution of
$$A_j G_j = F_j, \quad j = 1, \dots, p, \qquad (3)$$
and
$$SX = G. \qquad (4)$$
The solution of (3) is performed locally in each processor. The solution of (4) requires communication between neighboring processors.

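12 Solution of SX = G I
Partitioning of $V_j$, $W_j$, $X_j$, $G_j$:
$$
V_j = \begin{pmatrix} V_j^{(1)} \\ V_j^{(\cdot)} \\ V_j^{(q)} \end{pmatrix}, \qquad
W_j = \begin{pmatrix} W_j^{(1)} \\ W_j^{(\cdot)} \\ W_j^{(q)} \end{pmatrix}
$$
where $V_j^{(1)}, V_j^{(\cdot)}, V_j^{(q)}$ and $W_j^{(1)}, W_j^{(\cdot)}, W_j^{(q)}$ represent the first $m$, middle $n_j - 2m$ and last $m$ rows of $V_j$, $W_j$ respectively. We assume the same partition for $X_j$, $G_j$ as well.
Reduced system $S_r X_r = G_r$, example for $p = 3$:
$$
\begin{pmatrix}
I & V_1^{(q)} & 0 & 0 \\
W_2^{(1)} & I & 0 & V_2^{(1)} \\
W_2^{(q)} & 0 & I & V_2^{(q)} \\
0 & 0 & W_3^{(1)} & I
\end{pmatrix}
\begin{pmatrix} X_1^{(q)} \\ X_2^{(1)} \\ X_2^{(q)} \\ X_3^{(1)} \end{pmatrix}
=
\begin{pmatrix} G_1^{(q)} \\ G_2^{(1)} \\ G_2^{(q)} \\ G_3^{(1)} \end{pmatrix}. \qquad (5)
$$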

13 Solution of SX = G II
Intermediate blocks of rows are computed locally as
$$
\begin{aligned}
X_1 &= G_1 - V_1 X_2^{(1)}, && j = 1, \\
X_j &= G_j - V_j X_{j+1}^{(1)} - W_j X_{j-1}^{(q)}, && 2 \le j \le p - 1, \\
X_p &= G_p - W_p X_{p-1}^{(q)}, && j = p.
\end{aligned}
$$
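
The two post-processing steps (reduced system, then local recovery) can be put together in a short sketch. This is an illustrative NumPy version under assumptions that are not part of the slides: the function name `spike_solve` is hypothetical, all partitions have the same size with at least $2m$ rows, $p \ge 2$, and everything is dense and sequential, so no communication is modeled.

```python
import numpy as np

def spike_solve(A, F, p, m):
    """Solve A X = F for banded A (semi-bandwidth m) with p >= 2 equal partitions."""
    n, s = F.shape
    nj = n // p                                   # assumption: p divides n, nj >= 2m
    bounds = [(j * nj, (j + 1) * nj) for j in range(p)]
    V, W, G = {}, {}, {}
    # Local solves A_j [V_j, W_j, G_j] = [B^_j, C^_j, F_j] (0-based partition index j)
    for j, (lo, hi) in enumerate(bounds):
        Aj = A[lo:hi, lo:hi]
        G[j] = np.linalg.solve(Aj, F[lo:hi])
        if j < p - 1:
            V[j] = np.linalg.solve(Aj, A[lo:hi, hi:hi + m])
        if j > 0:
            W[j] = np.linalg.solve(Aj, A[lo:hi, lo - m:lo])

    def top(j):   # position of the first m rows of X_j in the reduced unknown (j >= 1)
        return slice((2 * j - 1) * m, 2 * j * m)

    def bot(j):   # position of the last m rows of X_j in the reduced unknown (j <= p-2)
        return slice(2 * j * m, (2 * j + 1) * m)

    # Reduced system: identity plus the top and bottom m rows of the spikes
    k = 2 * (p - 1) * m
    R, g = np.eye(k), np.zeros((k, s))
    for j in range(p):
        if j > 0:
            g[top(j)] = G[j][:m]
            R[top(j), bot(j - 1)] = W[j][:m]
            if j < p - 1:
                R[top(j), top(j + 1)] = V[j][:m]
        if j < p - 1:
            g[bot(j)] = G[j][-m:]
            R[bot(j), top(j + 1)] = V[j][-m:]
            if j > 0:
                R[bot(j), bot(j - 1)] = W[j][-m:]
    y = np.linalg.solve(R, g)

    # Local recovery of every block of rows
    X = np.empty((n, s))
    for j, (lo, hi) in enumerate(bounds):
        X[lo:hi] = G[j]
        if j < p - 1:
            X[lo:hi] -= V[j] @ y[top(j + 1)]
        if j > 0:
            X[lo:hi] -= W[j] @ y[bot(j - 1)]
    return X

rng = np.random.default_rng(0)
n, m, p, s = 40, 2, 4, 3
A = np.zeros((n, n))
for d in range(1, m + 1):
    off = rng.standard_normal(n - d)
    A += np.diag(off, d) + np.diag(off, -d)
A += np.diag(np.abs(A).sum(axis=1) + 1.0)   # strictly diagonally dominant, hence SPD
F = rng.standard_normal((n, s))
X = spike_solve(A, F, p, m)
print(np.linalg.norm(A @ X - F))
```

For $p = 3$ the matrix `R` assembled here has the same block layout as the reduced system of the previous slide; only the first and last $m$ rows of each spike enter it, which is why its order is $2(p-1)m$ regardless of $n$.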

15 The Spike algorithm and the solvers for mrhs I
Local linear systems in the Spike algorithm: the Spike algorithm leads to the solution of linear systems of the form
$$A_j [V_j, W_j, G_j] = [\hat{B}_j, \hat{C}_j, F_j]. \qquad (6)$$
Each local linear system contains $2m + s$ right-hand sides.
Obvious approach: each right-hand side is solved on its own, either by direct methods (by factorizing $A_j$) or by iterative methods.

16 Solution methods
Direct methods:
- By LU decomposition, $P_j A_j = L_j U_j$ (7). CC: $O(n_j^3)$. Sparse structure? Pardiso, MUMPS, SuperLU... Banded structure? CC: $O(n_j m^2)$.
- By Cholesky decomposition, $A_j = L_j L_j^\top$ (8).
Iterative methods:
- Stationary point methods
- Krylov subspace methods

17 Krylov subspace iterative methods
Solution of $A_j x = b$. The solution is sought in the Krylov subspace
$$K_\mu(A_j, b) = \mathrm{span}\{b, A_j b, \dots, A_j^{\mu - 1} b\}.$$
If $q(A_j) = \alpha_0 I_n + \alpha_1 A_j + \dots + \alpha_{n_j} A_j^{n_j} = 0$, then
$$A_j^{-1} b = -\frac{1}{\alpha_0} \sum_{i=0}^{n_j - 1} \alpha_{i+1} A_j^{i} b. \qquad (9)$$
In the $i$-th iteration, $x_i \in x_0 + K_i$ and $r_i = (b - A_j x_i) \perp C_i$.
Different choices for the subspace $C_i$: $C_i = K_i$ or $C_i = A_j K_i$.
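
The step from the annihilating polynomial to (9) can be spelled out as follows. The derivation assumes $\alpha_0 \neq 0$, which holds for instance when $q$ is the characteristic polynomial of a nonsingular $A_j$, since its constant term is then $\pm\det(A_j)$.

```latex
\begin{aligned}
0 = q(A_j) &= \alpha_0 I + A_j \sum_{i=0}^{n_j-1} \alpha_{i+1} A_j^{i}
  && \text{(factor } A_j \text{ out of the non-constant terms)} \\
\Longrightarrow\quad A_j^{-1} &= -\frac{1}{\alpha_0} \sum_{i=0}^{n_j-1} \alpha_{i+1} A_j^{i}
  && \text{(multiply by } A_j^{-1}/\alpha_0 \text{)} \\
\Longrightarrow\quad A_j^{-1} b &= -\frac{1}{\alpha_0} \sum_{i=0}^{n_j-1} \alpha_{i+1} A_j^{i} b
  \;\in\; K_{n_j}(A_j, b).
\end{aligned}
```

The exact solution therefore lies in a Krylov subspace of dimension at most $n_j$, which is what motivates searching for approximations in the smaller subspaces $K_i$.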

18 The Spike algorithm and the mrhs solvers II
But matrix $A_j$ is the same for each right-hand side.
Direct methods: the factorization $P_j A_j = L_j U_j$ is performed only once; more right-hand sides mean better amortization of the factorization overhead.
Iterative methods: can we employ something similar?

19 The Spike algorithm and the mrhs solvers III Goal: Reduce the solution cost of J linear systems How: Multiple Right Hand Sides Solvers Cost MRH SOLVER K J # rhs Ideally (J K) J

21 The Spike algorithm and the mrhs solvers IV
A case example: suppose we are solving the linear system $A[x^{(1)}, x^{(2)}] = [f^{(1)}, f^{(2)}]$.
First approach: solve each $Ax^{(i)} = f^{(i)}$, $i = 1, 2$, independently (one by one). A very simple concept but... if $f^{(2)} = \gamma f^{(1)}$ then $x^{(2)} = \gamma x^{(1)}$!
MRHS solvers, categories: the above example, although trivial, shows the path. Right-hand sides can be collinear, orthogonal... There are two main classes: a) seed methods and b) block methods.

22 Seed methods for the solution of mrhs problems I
Seed methods, brief description: based on an "artificial" augmentation of the search subspace. The solution of the first right-hand side returns an orthonormal basis $V_i^{(1)}$ of $K_i(A, r_0^{(1)})$. Then
$$x_0^{(1,j)} = x_0^{(1,j)} + V_i^{(1)} \left(T_i^{(1)}\right)^{-1} \left(V_i^{(1)}\right)^\top r_0^{(1,j)} \qquad (10)$$
with
$$T_i^{(1)} = \left(V_i^{(1)}\right)^\top A V_i^{(1)}. \qquad (11)$$
The procedure continues until all right-hand sides are solved. The $j$-th right-hand side is projected on the Krylov subspaces $K_{i_1}(A, r_0^{(1,1)}), \dots, K_{i_{j-1}}(A, r_0^{(j-2,j-1)})$.

23 Seed methods for the solution of mrhs problems II
Algorithm 1 Generic algorithm of seed methods (seed) [SPM89,Saad87,SimG95]
Input: $A \in \mathbb{R}^{n \times n}$, $F \in \mathbb{R}^{n \times s}$, $X_0 \in \mathbb{R}^{n \times s}$, tol
Output: $X \in \mathbb{R}^{n \times s}$
for j = 1, ..., s do
  for i = 1, ... do
    the j-th linear system performs the i-th iteration
    for g = j + 1, ..., s do
      update solution of $x_i^{(j,g)}$
    end for
  end for
end for

24 Seed methods for the solution of mrhs problems III
Algorithm 2 The Single-Seed Conjugate Gradient algorithm (sscg) [ChanWan97]
Input: $A \in \mathbb{R}^{n \times n}$, $F \in \mathbb{R}^{n \times s}$, $X_0 \in \mathbb{R}^{n \times s}$, tol $\in \mathbb{R}$
Output: $X \in \mathbb{R}^{n \times s}$
$R_0 = F - A X_0$
$P_0 = R_0$
for j = 1, ..., s do
  while $\|r_i^{(j)}\| / \|f^{(j)}\| \ge$ tol do
    $\alpha = ((r_i^{(j)})^\top r_i^{(j)}) / ((p_i^{(j)})^\top A p_i^{(j)})$
    $x_{i+1}^{(j)} = x_i^{(j)} + p_i^{(j)} \alpha$;  $r_{i+1}^{(j)} = r_i^{(j)} - \alpha A p_i^{(j)}$
    $\beta = ((r_{i+1}^{(j)})^\top r_{i+1}^{(j)}) / ((r_i^{(j)})^\top r_i^{(j)})$
    $p_{i+1}^{(j)} = r_{i+1}^{(j)} + p_i^{(j)} \beta$
    for k = j + 1, ..., s do
      $\eta_i^{(j,k)} = (p_i^{(j)})^\top r_i^{(j,k)} / ((p_i^{(j)})^\top A p_i^{(j)})$
      $x_{i+1}^{(j,k)} = x_i^{(j,k)} + \eta_i^{(j,k)} p_i^{(j)}$;  $r_{i+1}^{(j,k)} = r_i^{(j,k)} - \eta_i^{(j,k)} A p_i^{(j)}$
    end for
  end while
end for
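
A compact NumPy sketch of Algorithm 2 may help to see where the extra work for the non-seed systems goes. It is an illustration under assumptions, not the code used for the experiments: the function name `sscg`, the zero initial guess, the iteration cap `maxit` and the requirement that every right-hand side be nonzero are choices made here.

```python
import numpy as np

def sscg(A, F, tol=1e-10, maxit=None):
    """Single-seed CG sketch for SPD A and nonzero right-hand sides F (n x s)."""
    n, s = F.shape
    maxit = maxit or 10 * n
    X = np.zeros((n, s))
    R = F.astype(float).copy()          # residuals for the initial guess X0 = 0
    iters = []
    for j in range(s):                  # system j is the current seed
        r = R[:, j].copy()
        p = r.copy()
        fnorm = np.linalg.norm(F[:, j])
        it = 0
        while np.linalg.norm(r) > tol * fnorm and it < maxit:
            Ap = A @ p
            pAp = p @ Ap
            alpha = (r @ r) / pAp
            if j + 1 < s:
                # Project the not yet solved systems onto the current direction p_i
                eta = (p @ R[:, j + 1:]) / pAp
                X[:, j + 1:] += np.outer(p, eta)
                R[:, j + 1:] -= np.outer(Ap, eta)
            # Standard CG update of the seed system
            X[:, j] += alpha * p
            r_new = r - alpha * Ap
            beta = (r_new @ r_new) / (r @ r)
            p = r_new + beta * p
            r = r_new
            it += 1
        R[:, j] = r
        iters.append(it)
    return X, iters

n = 50
A = 4.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
rng = np.random.default_rng(0)
f = rng.standard_normal(n)
F = np.column_stack([f, 2.0 * f, rng.standard_normal(n)])
X, iters = sscg(A, F)
print(iters)
```

The second column of `F` is a multiple of the first, so the projections performed while the first system is the seed already reduce its residual to roughly the seed's level, in line with the collinear example of the earlier slide. Each projection reuses the product `A @ p` of the seed iteration, so the non-seed updates cost no additional matrix-vector products.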

25 Seed methods for the solution of mrhs problems IV
Theorem (ChanWan97). Let $F = [f^{(1)}, \dots, f^{(s)}]$ and suppose that $\mathrm{rank}(F) = k$, $k < s$. Then there exists $\alpha$, independent of the number of steps $m_p$, such that for the residual of the non-solved right-hand sides we have
$$\|r_0^{(j,k)}\| \le \alpha \sum_{h=1}^{j} \beta_{m_p + 1}, \quad k = j + 1, \dots, s,$$
where the value $\beta_{m_p+1}$ stems from Lanczos if we consider that the $h$-th right-hand side is solved by the Lanczos method.
Proof. See [ChanWan97].

26 Block methods for the solution of the mrhs problem
Block methods [Olea80,Gut09,SimG96a,SimG96b]: a generalization of the classic approach. In step $i$, the solution of the $j$-th right-hand side is sought in the block Krylov subspace
$$K^B_\mu(A, R_0) = \mathrm{span}\{R_0, A R_0, \dots, A^{\mu - 1} R_0\}, \qquad x_i^{(j)} \in x_0^{(j)} + K^B_i(A, R_0).$$
The dimension of the search subspace is at most $s$ times bigger than previously,
$$K^B_\mu(A, R_0) = K_\mu(A, r_0^{(1)}) + \dots + K_\mu(A, r_0^{(s)}).$$
Guaranteed to converge in at most $n/s$ steps.
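
As an illustration of the block approach, here is a minimal NumPy sketch of a block conjugate gradient iteration in the form usually attributed to O'Leary. It is a sketch under assumptions rather than the BCG code used later in the experiments: the name `block_cg` is hypothetical, there is no deflation of converged or linearly dependent residual columns, and the small $s \times s$ systems are solved with dense `np.linalg.solve`.

```python
import numpy as np

def block_cg(A, F, tol=1e-8, maxit=500):
    """Block CG sketch for SPD A; all s columns share one block Krylov subspace."""
    X = np.zeros(F.shape)
    R = F.astype(float).copy()           # residual block for X0 = 0
    P = R.copy()
    fnorm = np.linalg.norm(F, axis=0)
    rho = R.T @ R                        # s x s Gram matrix of the residuals
    for it in range(maxit):
        if np.all(np.linalg.norm(R, axis=0) <= tol * fnorm):
            return X, it
        Q = A @ P
        alpha = np.linalg.solve(P.T @ Q, rho)     # s x s step "length"
        X += P @ alpha
        R -= Q @ alpha
        rho_new = R.T @ R
        beta = np.linalg.solve(rho, rho_new)
        P = R + P @ beta
        rho = rho_new
    return X, maxit

n, s = 200, 4
A = 4.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
F = np.random.default_rng(1).standard_normal((n, s))
X, its = block_cg(A, F)
print(its, np.linalg.norm(A @ X - F))
```

For $s = 1$ the recurrences reduce to ordinary CG. Without deflation the Gram matrix `rho` becomes ill conditioned once some columns have converged or the right-hand sides are (nearly) linearly dependent, which is one reason practical block solvers are more elaborate than this sketch.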

27 Numerical solution of a Laplacian PDE, discretized by finite differences
$n_x = n_y = n_z = 15$, $AX = F$, $s = 50$.
Figure: Norm of relative residual for different blocksizes (relative residual against index of iterations; curves for q = 1, q = 10, q = 25 and one further value of q).

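28 An approach based on matrix inversion I
For an SPD matrix $H$ we have
$$|(H^{-1})_{ij}| \le C \lambda^{|i - j|}, \quad i, j = 1, \dots, n, \qquad \text{with } \lambda = \left( \frac{\sqrt{\kappa(H)} - 1}{\sqrt{\kappa(H)} + 1} \right)^{2/m}.$$
The spikes $V_j$, $W_j$ result from the multiplication of $A_j^{-1}$ with $B_j$, $C_j$.
We can approximate submatrices of $A_j^{-1}$ by lower rank matrices. Can we take advantage of it?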
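
Both observations, the decay away from the diagonal and the low rank of off-diagonal submatrices of the inverse, are easy to see on a small example. The sketch below uses a hypothetical test matrix (constant bands, strictly diagonally dominant) and forms the inverse explicitly, which is only reasonable at this toy size; the values of `n`, `m` and `z` are arbitrary choices.

```python
import numpy as np

n, m = 120, 2
# Banded SPD test matrix: -1 on the m off-diagonals, 2m + 2 on the diagonal
A = (2 * m + 2) * np.eye(n)
for k in range(1, m + 1):
    A -= np.eye(n, k=k) + np.eye(n, k=-k)

Ainv = np.linalg.inv(A)

# Largest entry of the inverse at each distance |i - j| from the diagonal
decay = np.array([np.abs(np.diag(Ainv, k)).max() for k in range(n)])
print(decay[[0, 10, 20, 40]])

# Off-diagonal block of the inverse: first n - z rows, last z columns
z = 30
block = Ainv[: n - z, n - z:]
rank = np.linalg.matrix_rank(block, tol=1e-10)
print(rank)
```

The off-diagonal block equals $-S^{-1} A^{(2)} (A^{(4)})^{-1}$ in a $2 \times 2$ partition of the matrix, and the coupling block $A^{(2)}$ of a banded matrix has at most $m$ nonzero columns, so the rank printed here cannot exceed $m$.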

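29 An approach based on matrix inversion II
Express $A_j$ as a $2 \times 2$ block matrix
$$A_j = \begin{pmatrix} A_j^{(1)} & A_j^{(2)} \\ A_j^{(3)} & A_j^{(4)} \end{pmatrix} \qquad (12)$$
where $A_j^{(1)} \in \mathbb{R}^{(n_j - z) \times (n_j - z)}$, $A_j^{(2)} \in \mathbb{R}^{(n_j - z) \times z}$, $A_j^{(3)} \in \mathbb{R}^{z \times (n_j - z)}$, $A_j^{(4)} \in \mathbb{R}^{z \times z}$ and $z \ge m$. Then the last block column of $A_j^{-1}$ is
$$
A_j^{-1} = \begin{pmatrix}
\ast & -\left(A_j^{(1)} - A_j^{(2)} (A_j^{(4)})^{-1} A_j^{(3)}\right)^{-1} A_j^{(2)} (A_j^{(4)})^{-1} \\
\ast & (A_j^{(4)})^{-1} + (A_j^{(4)})^{-1} A_j^{(3)} \left(A_j^{(1)} - A_j^{(2)} (A_j^{(4)})^{-1} A_j^{(3)}\right)^{-1} A_j^{(2)} (A_j^{(4)})^{-1}
\end{pmatrix}
$$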

30 An approach based on matrix inversion III
Spike $V_j$ can be computed as
$$A_j^{(4)} Y_j^{(1)} = \tilde{B}_j \qquad (13)$$
$$S_{A_j^{(4)}} Y_j^{(2)} = A_j^{(2)} Y_j^{(1)} \qquad (14)$$
$$A_j^{(4)} Y_j^{(3)} = A_j^{(3)} Y_j^{(2)} \qquad (15)$$
where $S_{A_j^{(4)}} = A_j^{(1)} - A_j^{(2)} (A_j^{(4)})^{-1} A_j^{(3)}$ is the Schur complement of $A_j^{(4)}$ and by $\tilde{B}_j$ we denote the (upwards) extension of $B_j$ by $z - m$ null rows.
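
Substituting the three solves into the last block column of $A_j^{-1}$ gives the first $n_j - z$ rows of the spike as $-Y_j^{(2)}$ and the last $z$ rows as $Y_j^{(1)} + Y_j^{(3)}$, with the signs taken as the equations are written above. The following NumPy sketch checks this assembly against a direct solve. The function name `spike_v_schur`, the dense solves and the random test data are assumptions made for illustration.

```python
import numpy as np

def spike_v_schur(Aj, Bj, z):
    """Right spike V_j = A_j^{-1} [0; B_j] via the three solves (13)-(15)."""
    nj, m = Aj.shape[0], Bj.shape[0]
    k = nj - z
    A1, A2 = Aj[:k, :k], Aj[:k, k:]
    A3, A4 = Aj[k:, :k], Aj[k:, k:]
    Bt = np.vstack([np.zeros((z - m, m)), Bj])     # B_j extended upwards by z - m null rows
    Y1 = np.linalg.solve(A4, Bt)                   # (13)
    S = A1 - A2 @ np.linalg.solve(A4, A3)          # Schur complement of A4
    Y2 = np.linalg.solve(S, A2 @ Y1)               # (14)
    Y3 = np.linalg.solve(A4, A3 @ Y2)              # (15)
    return np.vstack([-Y2, Y1 + Y3])

rng = np.random.default_rng(0)
nj, m, z = 30, 2, 6
Aj = np.zeros((nj, nj))
for d in range(1, m + 1):
    off = rng.standard_normal(nj - d)
    Aj += np.diag(off, d) + np.diag(off, -d)
Aj += np.diag(np.abs(Aj).sum(axis=1) + 1.0)        # SPD by strict diagonal dominance
Bj = rng.standard_normal((m, m))

V = spike_v_schur(Aj, Bj, z)
V_ref = np.linalg.solve(Aj, np.vstack([np.zeros((nj - m, m)), Bj]))
print(np.linalg.norm(V - V_ref))
```

The potentially expensive step is (14), a solve with the $(n_j - z) \times (n_j - z)$ Schur complement; its right-hand side $A_j^{(2)} Y_j^{(1)}$ is where a low numerical rank can be exploited.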

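31 An approach based on matrix inversion IV
In the same manner, we can express $A_j$ as a $2 \times 2$ block matrix
$$A_j = \begin{pmatrix} A_j^{(1)} & A_j^{(2)} \\ A_j^{(3)} & A_j^{(4)} \end{pmatrix} \qquad (16)$$
where $A_j^{(1)} \in \mathbb{R}^{z \times z}$, $A_j^{(2)} \in \mathbb{R}^{z \times (n_j - z)}$, $A_j^{(3)} \in \mathbb{R}^{(n_j - z) \times z}$, $A_j^{(4)} \in \mathbb{R}^{(n_j - z) \times (n_j - z)}$. Then the first block column of $A_j^{-1}$ is
$$
A_j^{-1} = \begin{pmatrix}
(A_j^{(1)})^{-1} + (A_j^{(1)})^{-1} A_j^{(2)} \left(A_j^{(4)} - A_j^{(3)} (A_j^{(1)})^{-1} A_j^{(2)}\right)^{-1} A_j^{(3)} (A_j^{(1)})^{-1} & \ast \\
-\left(A_j^{(4)} - A_j^{(3)} (A_j^{(1)})^{-1} A_j^{(2)}\right)^{-1} A_j^{(3)} (A_j^{(1)})^{-1} & \ast
\end{pmatrix}
$$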

32 An approach based on matrix inversion V
Spike $W_j$ can be computed as
$$A_j^{(1)} Y_j^{(1)} = \tilde{C}_j \qquad (17)$$
$$S_{A_j^{(1)}} Y_j^{(2)} = A_j^{(3)} Y_j^{(1)} \qquad (18)$$
$$A_j^{(1)} Y_j^{(3)} = A_j^{(2)} Y_j^{(2)} \qquad (19)$$
where $S_{A_j^{(1)}} = A_j^{(4)} - A_j^{(3)} (A_j^{(1)})^{-1} A_j^{(2)}$ is the Schur complement of $A_j^{(1)}$ and by $\tilde{C}_j$ we denote the (downwards) extension of $C_j$ by $z - m$ null rows.
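
How the three solves combine into $W_j$ follows by substituting them into the first block column of $A_j^{-1}$. The short derivation below takes the signs of (17)-(19) as written and writes $S$ for $S_{A_j^{(1)}}$; the final assembly formula is a consequence of these assumptions rather than something stated on the slide.

```latex
\begin{aligned}
W_j = A_j^{-1} \begin{pmatrix} \tilde{C}_j \\ 0 \end{pmatrix}
&= \begin{pmatrix}
(A_j^{(1)})^{-1} \tilde{C}_j + (A_j^{(1)})^{-1} A_j^{(2)} S^{-1} A_j^{(3)} (A_j^{(1)})^{-1} \tilde{C}_j \\
- S^{-1} A_j^{(3)} (A_j^{(1)})^{-1} \tilde{C}_j
\end{pmatrix} \\
&= \begin{pmatrix}
Y_j^{(1)} + (A_j^{(1)})^{-1} A_j^{(2)} Y_j^{(2)} \\
- Y_j^{(2)}
\end{pmatrix}
= \begin{pmatrix}
Y_j^{(1)} + Y_j^{(3)} \\
- Y_j^{(2)}
\end{pmatrix}.
\end{aligned}
```

The first $z$ rows of $W_j$ are therefore $Y_j^{(1)} + Y_j^{(3)}$ and the remaining $n_j - z$ rows are $-Y_j^{(2)}$, mirroring the structure obtained for $V_j$.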

33 Algorithm 3 The sc-spike algorithm
Input: $A_j \in \mathbb{R}^{n \times n}$, $B_j \in \mathbb{R}^{m \times m}$, $C_j \in \mathbb{R}^{m \times m}$, $z \in \mathbb{Z}$
Output: $V_j \in \mathbb{R}^{n \times m}$, $W_j \in \mathbb{R}^{n \times m}$
{Each processor j:}
{compute $V_j$ (if j < p):}
  $A_j^{(4)} Y_j^{(1)} = \hat{B}_j$
  $Y_j^{(2)} = \mathrm{proj}(A_j^{(1)} - A_j^{(2)} (A_j^{(4)})^{-1} A_j^{(3)},\ A_j^{(2)} Y_j^{(1)})$
  $A_j^{(4)} Y_j^{(3)} = A_j^{(3)} Y_j^{(2)}$
{compute $W_j$ (if j > 1):}
  $A_j^{(1)} Y_j^{(1)} = \hat{C}_j$
  $Y_j^{(2)} = \mathrm{proj}(A_j^{(4)} - A_j^{(3)} (A_j^{(1)})^{-1} A_j^{(2)},\ A_j^{(3)} Y_j^{(1)})$
  $A_j^{(1)} Y_j^{(3)} = A_j^{(2)} Y_j^{(2)}$
Solve $A_j G_j = F_j$ by an iterative method
Solve $SX = G$

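34 Algorithm 4 proj
Input: $K \in \mathbb{R}^{n \times n}$, $L \in \mathbb{R}^{n \times s}$
Output: $Y \in \mathbb{R}^{n \times s}$
1. $[Q, R] = \mathrm{QR}(L)$
2. $l = \{ i : l^{(i)} \text{ belongs to the basis of } L \}$
3. Solve $K Y_l = L_l$
4. $D = (L_l^\top L_l)^{-1} (L_l^\top L_{\setminus l})$
5. $Y_{\setminus l} = Y_l D$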
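
The idea behind proj is that only the linearly independent columns of $L$ need a solve with $K$; the remaining solutions are linear combinations of those. A hedged NumPy sketch follows. It replaces steps 1 and 2 by a simple greedy Gram-Schmidt selection (a column-pivoted QR would be the more robust choice), and the names `basis_columns` and `proj` as well as the tolerance are assumptions of this example.

```python
import numpy as np

def basis_columns(L, tol=1e-10):
    """Indices of a set of linearly independent columns of L (greedy stand-in for steps 1-2)."""
    Q = np.zeros((L.shape[0], 0))
    idx = []
    scale = np.linalg.norm(L)
    for i in range(L.shape[1]):
        v = L[:, i] - Q @ (Q.T @ L[:, i])
        v -= Q @ (Q.T @ v)                     # second pass for numerical orthogonality
        nv = np.linalg.norm(v)
        if nv > tol * scale:
            Q = np.hstack([Q, (v / nv)[:, None]])
            idx.append(i)
    return idx

def proj(K, L, tol=1e-10):
    """Solve K Y = L using solves only for the basis columns of L."""
    idx = basis_columns(L, tol)
    Y = np.zeros(L.shape)
    if not idx:                                # L is numerically zero
        return Y
    rest = [i for i in range(L.shape[1]) if i not in idx]
    Ll = L[:, idx]
    Y[:, idx] = np.linalg.solve(K, Ll)         # step 3: the only solves with K
    if rest:
        # step 4: coefficients of the dependent columns in the basis, L_rest = L_l D
        D = np.linalg.solve(Ll.T @ Ll, Ll.T @ L[:, rest])
        Y[:, rest] = Y[:, idx] @ D             # step 5
    return Y

rng = np.random.default_rng(0)
n = 20
K = rng.standard_normal((n, n))
K = K @ K.T + n * np.eye(n)                    # SPD test matrix
L = rng.standard_normal((n, 3)) @ rng.standard_normal((3, 8))   # 8 columns, rank 3
Y = proj(K, L)
print(len(basis_columns(L)), np.linalg.norm(K @ Y - L))
```

Because $L_{\setminus l} = L_l D$, multiplying step 5 by $K$ gives $K Y_{\setminus l} = K Y_l D = L_l D = L_{\setminus l}$, so the number of solves with $K$ equals the rank of $L$ rather than its number of columns.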

36 Computational system and software
Computational system: experiments were performed at Purdue University, on 2 Intel(R) Xeon(R) processors with a total of 12 cores at 2.8 GHz and 2 GB RAM per core.
Software: Fortran90; Intel(R) MPI Library for Linux, 64-bit, Version 4.0; mpiifort Fortran MPI compiler.

37 Numerical experiments with banded matrices I About In this set of experiments we study the sc-spike algorithm Matrices were selected from the University of Florida Sparse Matrix Collection (UFSC) [DavHu11] We also present experiments with Toeplitz matrices

38 Numerical experiments with banded matrices II
Table: Characteristics of the matrices used (UFSC)
A | n | nnz(A) | semi-bandwidth (m) | Application
1. BenElechi1 | 245,874 | 13M | 821 | 2D/3D
2. kim2 | 456,… | …M | … | …D/3D
3. af_5_k… | …,… | …M | 859 | Structural problem
4. mc2depi | 525,… | …M | 513 | 2D/3D

39 Matrices from UFSC I
Figure: Rank of submatrix $(V_1)_{1:n_j - z,\ 1:m}$ ($p = 2, 4$) for the BenElechi1 and mc2depi matrices. Panels: (a) BenElechi1, (b) mc2depi; numerical rank against $z$ ($\times 10^4$).

40 Matrices from UFSC II
Figure: Rank of submatrix $(V_1)_{1:n_j - z,\ 1:m}$ ($p = 2, 4$) for the shipsec5 and kim2 matrices. Panels: (a) af_5_k…, (b) kim2; legends p = 3, p = 4; numerical rank against $z$ ($\times 10^4$).

41 Toeplitz I
$a_{ij} = \mathrm{randn}(1)$, $a_{ii} = \delta \sum_{i=1, i \ne j}^{n} |a_{ij}|$
Figure: Rank of submatrix $(V_1)_{1:n_j - z,\ 1:m}$ for a Toeplitz matrix with varying $m$ and degree of diagonal dominance. Panels: (a) δ = 1/…, (b) δ = 1 + …; numerical rank against $z$ ($\times 10^4$) for m = 100, 200, 300, 400.

42 Toeplitz II
$a_{ij} = 1/|i - j|$, $a_{ii} = \delta \sum_{i=1, i \ne j}^{n} |a_{ij}|$
Figure: Rank of submatrix $(V_1)_{1:n_j - z,\ 1:m}$ for a Toeplitz matrix with varying $m$ and degree of diagonal dominance. Panels: (a) δ = 1/…, (b) δ = 1 + …; numerical rank against $z$ ($\times 10^4$) for m = 100, 200, 300, 400.
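
An experiment of this kind can be mimicked at toy scale. The sketch below is not the setup behind the figures: the sizes, the value `delta = 1.1`, the relative threshold used for the numerical rank, and the helper names `toeplitz_banded` and `spike_rank` are all assumptions, and the diagonal uses the interior off-diagonal row sum so that the matrix stays exactly Toeplitz.

```python
import numpy as np

def toeplitz_banded(n, m, delta):
    """Symmetric banded Toeplitz matrix with a_ij = 1/|i-j| for 0 < |i-j| <= m."""
    A = np.zeros((n, n))
    for k in range(1, m + 1):
        A += (np.eye(n, k=k) + np.eye(n, k=-k)) / k
    offdiag_sum = 2.0 * sum(1.0 / k for k in range(1, m + 1))   # interior row sum
    A += delta * offdiag_sum * np.eye(n)     # delta > 1 gives strict diagonal dominance
    return A

def spike_rank(A, m, z, p=2, tol=1e-8):
    """Numerical rank of the first n_1 - z rows of the spike V_1 (first of p partitions)."""
    n1 = A.shape[0] // p
    V1 = np.linalg.solve(A[:n1, :n1], A[:n1, n1:n1 + m])
    ref = np.linalg.svd(V1, compute_uv=False)[0]          # largest singular value of V_1
    sv = np.linalg.svd(V1[:n1 - z], compute_uv=False)
    return int(np.sum(sv > tol * ref))

A = toeplitz_banded(n=200, m=5, delta=1.1)
ranks = [spike_rank(A, m=5, z=z) for z in (5, 20, 50)]
print(ranks)
```

Since the rows kept for a larger `z` are a subset of those kept for a smaller one, the singular values can only shrink, so the numerical rank measured against a fixed threshold does not increase with `z`. How quickly it drops depends on the decay of the spike, that is, on the degree of diagonal dominance.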

43 Numerical experiments with banded matrices III About Matrices were selected from UFSC F = rand(n, 10) List of algorithms Alg1: All multiple right-hand sides are solved by BCG Alg2: All multiple right-hand sides are solved by BCG, preconditioned by IC(0) Cholesky: All multiple right-hand sides are solved by Cholesky

44 Numerical experiments with banded matrices IV
Table: Running times of the Spike algorithm for a varying number of partitions (s = 10).
Columns: Cholesky, Alg1, Alg2 for each of p = 2, p = 4, p = 8.
Rows: bcsstk…, fv…, t2dah_e, crystm…, wathen…, wathen…, jnbrng…, apache….
Surviving entries: F, F, F, 105.2.

45 The Spike algorithm as a preconditioner
The Spike algorithm can be used as a banded preconditioner $M$. We need $\|M\|_F \approx \|A\|_F$. We need pre-processing of $A$... Reordering? RCM, Fiedler? Weighted Fiedler (WSO) [Mang10]?

46 Reordering of sparse matrices I
Figure: Non-zero pattern before and after RCM and WSO permutations for matrix opf10000. Panels: (a) No reordering, (b) RCM, (c) WSO.

47 Reordering of sparse matrices II (a) No reordering (b) RCM (c) WSO Figure : Non-zero pattern before and after RCM and WSO permutations for matrix nd3k

48 The PSpike algorithm
Algorithmic sketch:
1. Reordering of the coefficient matrix $A$.
2. Extraction of a banded preconditioner $M$ with semi-bandwidth $\hat{m}$.
3. Solve $M^{-1} (P A P^\top) \hat{X} = M^{-1} \hat{F}$, with $\hat{F} = PF$, by a Krylov subspace method. The preconditioner is applied through Spike.
4. Form the final solution, $X = P^\top \hat{X}$.
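
The four steps can be sketched end to end in NumPy. Several simplifications are assumed here and are not part of the slides: the permutation is passed in as an index vector `perm` instead of being computed by RCM or a Fiedler ordering, the Krylov method is a plain preconditioned CG applied column by column, the preconditioner is assumed to be SPD, and a dense solve with $M$ stands in for the Spike solve. All function names are hypothetical.

```python
import numpy as np

def band_part(A, k):
    """Copy of A with all entries outside semi-bandwidth k set to zero."""
    i, j = np.indices(A.shape)
    return np.where(np.abs(i - j) <= k, A, 0.0)

def pcg(A, b, apply_minv, tol=1e-10, maxit=500):
    """Preconditioned CG for SPD A with an SPD preconditioner."""
    x = np.zeros_like(b)
    r = b.copy()
    z = apply_minv(r)
    p = z.copy()
    rz = r @ z
    bnorm = np.linalg.norm(b)
    for _ in range(maxit):
        if np.linalg.norm(r) <= tol * bnorm:
            break
        Ap = A @ p
        alpha = rz / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        z = apply_minv(r)
        rz_new = r @ z
        p = z + (rz_new / rz) * p
        rz = rz_new
    return x

def pspike_sketch(A, F, perm, mhat):
    Ap = A[np.ix_(perm, perm)]                 # step 1: P A P^T
    Fp = F[perm]                               # P F
    M = band_part(Ap, mhat)                    # step 2: banded preconditioner
    def apply_minv(r):                         # stand-in for a Spike solve with M
        return np.linalg.solve(M, r)
    cols = [pcg(Ap, Fp[:, c], apply_minv) for c in range(F.shape[1])]   # step 3
    Xp = np.column_stack(cols)
    X = np.empty_like(Xp)
    X[perm] = Xp                               # step 4: X = P^T X^
    return X

rng = np.random.default_rng(0)
n = 60
B = np.zeros((n, n))
for d in range(1, 4):
    off = rng.standard_normal(n - d)
    B += np.diag(off, d) + np.diag(off, -d)
B += np.diag(np.abs(B).sum(axis=1) + 1.0)      # banded SPD matrix
q = rng.permutation(n)
A = B[np.ix_(q, q)]                            # scrambled ordering hides the band
perm = np.argsort(q)                           # ordering that restores it
F = rng.standard_normal((n, 3))
X = pspike_sketch(A, F, perm, mhat=2)
print(np.linalg.norm(A @ X - F))
```

In this toy setup the test matrix is a scrambled banded matrix, so `perm` recovers the band exactly and the preconditioner misses only the outermost diagonal. For a general sparse matrix the quality of $M$ depends on how much of the matrix the reordering manages to bring close to the diagonal.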

49 Numerical experiments with general sparse matrices
About: matrices were selected from UFSC; different reordering methods were used; a banded preconditioner $M$ with semi-bandwidth $\hat{m}_1$ ($\hat{m}_2$) was extracted such that $\|M\|_F \ge 0.95 \|A\|_F$ ($\|M\|_F \ge 0.99 \|A\|_F$).
Reordering methods: RCM, Fiedler, WSO.
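
The semi-bandwidth needed to capture a given fraction of the Frobenius norm can be computed in one pass over the diagonals. The sketch below reads the criterion as "at least the given fraction", which is an assumption about the lost relation symbol, and the function names are hypothetical.

```python
import numpy as np

def band_part(A, k):
    """Copy of A with all entries outside semi-bandwidth k set to zero."""
    i, j = np.indices(A.shape)
    return np.where(np.abs(i - j) <= k, A, 0.0)

def smallest_semibandwidth(A, fraction=0.95):
    """Smallest k such that ||band_part(A, k)||_F >= fraction * ||A||_F."""
    n = A.shape[0]
    i, j = np.indices(A.shape)
    dist = np.abs(i - j).ravel()
    # energy[d] = sum of squared entries at distance d from the main diagonal
    energy = np.bincount(dist, weights=(A ** 2).ravel(), minlength=n)
    cum = np.cumsum(energy)                    # squared norm of the band of width d
    target = fraction ** 2 * cum[-1]
    return int(np.searchsorted(cum, target))   # first d with cum[d] >= target

rng = np.random.default_rng(0)
n = 40
A = rng.standard_normal((n, n)) / (1.0 + np.abs(np.subtract.outer(np.arange(n), np.arange(n))))
A = A + A.T
m1 = smallest_semibandwidth(A, 0.95)
m2 = smallest_semibandwidth(A, 0.99)
print(m1, m2)
```

Working with squared norms avoids forming the band for every candidate width: the cumulative sum over diagonals gives $\|M\|_F^2$ for all semi-bandwidths at once, and the threshold is squared accordingly.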

50 Table: Characteristics of the matrices selected from UFSC
Matrix | n | Application
1. bcsstk28 | … | Structural problem
2. eurqsa | 7245 | Structural problem
3. OPF_3574 | … | Power electricity
4. dubcova1 | … | 2D/3D
5. raefsky4 | … | Structural problem

51 Table: Original semi-bandwidth $m$ and semi-bandwidths $\hat{m}_1$, $\hat{m}_2$ so that $\|M\|_F \ge 0.95 \|A\|_F$ and $\|M\|_F \ge 0.99 \|A\|_F$ after symmetric reorderings.
Columns: $m$, $\hat{m}_1$, $\hat{m}_2$ for each of RCM, Fiedler, WSO.
Rows: bcsstk28, eurqsa, OPF_3574, dubcova1, raefsky4.

52-56 Figures: the singular values of spike $V_1$ under the symrcm, Fiedler and WSO reorderings (numerical value against index of singular values). Panels: (a) raefsky4 ($\hat{m}_1$), (b) raefsky4 ($\hat{m}_2$), (c) dubcova1 ($\hat{m}_1$), (d) dubcova1 ($\hat{m}_2$), (e) bcsstk28 ($\hat{m}_1$), (f) bcsstk28 ($\hat{m}_2$), (g) eurqsa ($\hat{m}_1$), (h) eurqsa ($\hat{m}_2$), (i) OPF_3574 ($\hat{m}_1$), (j) OPF_3574 ($\hat{m}_2$).

57 Talks and manuscripts
Talks: NUMAN 2012, September 2012, Ioannina, Greece.
Manuscripts: VK, F. Saied, E. Gallopoulos and A. Sameh. Accelerating Spike family of algorithms by solving linear systems with multiple right-hand sides, Technical Report.

58 Acknowledgements PurdueU A. Klinvex, A. Sameh, F. Saied, Y. Zhu, G. Kollias UPatras E. Gallopoulos, E. Kontopoulou, C. Zaroliagis, A. Boudouvis

59 Conclusion
In this thesis:
We studied the combination of Spike algorithms with techniques for solving linear systems with multiple right-hand sides.
We saw that, under certain conditions, replacing direct solvers by these techniques can lead to more efficient schemes.
In the future we seek to develop schemes for accelerating the Spike algorithm when not all right-hand sides are available at once.

61 Bibliography I
T. Chan and L. Wan, Analysis of projection methods for solving linear systems with multiple right-hand sides. SIAM J. Sci. Comput., 18(6), 1997.
T. A. Davis and Y. Hu, The University of Florida sparse matrix collection. ACM Trans. Math. Soft., 38(1), pp. 1-25, 2011.
D. P. O'Leary, The block conjugate gradient algorithms and related methods. Linear Algebra and its Applications, 29, 1980.
F. Zhang, Schur complement and its applications. Springer.
E. Polizzi and A. Sameh, A parallel hybrid banded system solver: the SPIKE algorithm. Parallel Comput., 32(2), 2006, Elsevier Science Publishers B. V.
M. Manguoglu, M. Koyutürk, A. Sameh and A. Grama, Weighted Matrix Ordering and Parallel Banded Preconditioners for Iterative Linear System Solvers. SIAM J. Scientific Computing, 32(3), 2010.
M. Gutknecht and T. Schmelzer, The block grade of a Krylov subspace. Linear Alg. and its Appl., 430(11), 2009.

62 Bibliography II
C. F. Smith, A. F. Peterson and R. Mittra, A conjugate gradient algorithm for the treatment of multiple incident electromagnetic fields. IEEE Trans. Antennas and Propagation, 37, 1989.
V. Simoncini and E. Gallopoulos, An iterative method for nonsymmetric systems with multiple right-hand sides. SIAM J. Sci. Comput., 16, 1995.
Y. Saad, On the Lanczos method for solving symmetric linear systems with multiple right-hand sides. Math. Comp., 48, 1987.
E. Gallopoulos, S. Hatzimihail and V. Kalantzis, Towards Lightweight Projection Methods for Linear Systems with Multiple Right-Hand Sides. Midwest Numerical Analysis Days 2011, presentation.
V. Simoncini and E. Gallopoulos, Convergence properties of block GMRES and matrix polynomials. Linear Alg. and its Appl., 247, 1996.
V. Simoncini and E. Gallopoulos, A hybrid block GMRES method for nonsymmetric systems with multiple right-hand sides. J. of Comput. and Applied Mathematics, 66(1), 1996.
