LU factorization with Panel Rank Revealing Pivoting and its Communication Avoiding version


1 LU factorization with Panel Rank Revealing Pivoting and its Communication Avoiding version
Amal Khabou. Advisor: Laura Grigori. Université Paris Sud 11, INRIA Saclay, France. SIAM PP12, February 17, 2012.

2 Collaborators
Jim Demmel, UC Berkeley; Ming Gu, UC Berkeley.

3 Plan
Motivation: just work, less talk
Communication Avoiding LU (CALU)
LU with Panel Rank Revealing Pivoting (LU_PRRP)
Communication Avoiding LU_PRRP (CALU_PRRP)
Conclusions and future work

5 Lower bound for all direct linear algebra
Let M = local/fast memory size. General lower bounds [Ballard et al., 2011]:
# words moved = Ω(# flops / M^(1/2))
# messages = Ω(# flops / M^(3/2))
Goal: reorganize linear algebra to:
- minimize # words moved
- minimize # messages
- preserve numerical stability (better: improve it)
- not increase the flop count (too much)
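As a back-of-the-envelope illustration, the two bounds can be evaluated for a dense LU-sized flop count (the sizes below are hypothetical, chosen only to make the formulas concrete; they are not from the slides):

```python
# Dense LU does about (2/3) n^3 flops; M is the fast/local memory in words.
n = 4096       # hypothetical matrix dimension
M = 2 ** 20    # hypothetical fast memory: 1 Mi words
flops = (2 / 3) * n ** 3

words_lb = flops / M ** 0.5  # words moved = Omega(#flops / M^(1/2))
msgs_lb = flops / M ** 1.5   # messages    = Omega(#flops / M^(3/2))
print(f"words moved >= ~{words_lb:.3g}, messages >= ~{msgs_lb:.3g}")
```

Note how the message bound is smaller by a factor of M: a communication-optimal algorithm may send few, large messages.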

13 LU factorization (as in ScaLAPACK pdgetrf)
LU factorization on a P = P_r × P_c grid of processors. For ib = 1 to n-1 step b, with A^(ib) = A(ib:n, ib:n):
1. Compute panel factorization (# messages = O(n log_2 P_r)): find the pivot in each column, swap rows.
2. Apply all row permutations (# messages = O(n/b (log_2 P_r + log_2 P_c))): broadcast pivot information along the rows; swap rows at left and right.
3. Compute block row of U (# messages = O(n/b log_2 P_c)): broadcast right the diagonal block of L of the current panel.
4. Update trailing matrix (# messages = O(n/b (log_2 P_r + log_2 P_c))): broadcast right the block column of L; broadcast down the block row of U.
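The four steps above are the standard right-looking blocked algorithm. A sequential sketch (my own toy code and names, ignoring the process grid and all communication) shows how they fit together:

```python
import numpy as np

def blocked_lu(A, b):
    """Right-looking blocked LU with partial pivoting, mirroring the four
    per-panel steps: panel factorization, row swaps, block row of U,
    trailing-matrix update.  Returns perm, L, U with A[perm] = L @ U."""
    n = A.shape[0]
    M = A.astype(float).copy()
    L = np.eye(n)
    perm = np.arange(n)
    for ib in range(0, n, b):
        be = min(ib + b, n)
        # 1) panel factorization: GEPP restricted to columns ib:be
        for k in range(ib, be):
            p = k + int(np.argmax(np.abs(M[k:, k])))
            # 2) apply the row permutation to the whole matrix (and to L)
            M[[k, p]] = M[[p, k]]
            L[[k, p], :k] = L[[p, k], :k]
            perm[[k, p]] = perm[[p, k]]
            L[k + 1:, k] = M[k + 1:, k] / M[k, k]
            M[k + 1:, k] = 0.0
            M[k + 1:, k + 1:be] -= np.outer(L[k + 1:, k], M[k, k + 1:be])
        # 3) block row of U: triangular solve with the unit-lower panel factor
        M[ib:be, be:] = np.linalg.solve(L[ib:be, ib:be], M[ib:be, be:])
        # 4) trailing-matrix update: one matrix-matrix product
        M[be:, be:] -= L[be:, ib:be] @ M[ib:be, be:]
    return perm, L, np.triu(M)
```

In the distributed version each of the four steps becomes a broadcast or exchange along the rows/columns of the grid, with the message counts quoted above.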

16 LU factorization (ScaLAPACK): upper bounds
2D algorithm on a grid P = P_r × P_c, with M = O(n²/P). Lower bounds [Ballard et al., 2011]:
# words moved = Ω(n²/√P)
# messages = Ω(√P)
LU with partial pivoting (ScaLAPACK), with P = √P × √P and b = n/√P (upper bound):
factor exceeding the lower bound for # words moved: log P
factor exceeding the lower bound for # messages: (n/√P) log P
The lower bounds are attained by CALU, an LU factorization with tournament pivoting [Grigori et al., 2011].

19 Stability of the LU factorization
Consider the growth factor [Wilkinson, 1961]
g_W = max_{i,j,k} |a^(k)_{i,j}| / max_{i,j} |a_{i,j}|
where a^(k)_{i,j} denotes the entry in position (i, j) obtained after k steps of elimination.
Worst case growth factor with partial pivoting: g_W ≤ 2^(n-1).
For partial pivoting, this upper bound is attained by the Wilkinson matrix.
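To make the definition and the 2^(n-1) worst case concrete, here is a small experiment (my own helper names): GEPP applied to the classic worst-case matrix, tracking the largest intermediate entry.

```python
import numpy as np

def gepp_growth(A):
    """Gaussian elimination with partial pivoting; returns the Wilkinson
    growth factor max_{i,j,k} |a^(k)_{ij}| / max_{i,j} |a_{ij}|."""
    A = A.astype(float).copy()
    n = A.shape[0]
    max0 = np.abs(A).max()
    gmax = max0
    for k in range(n - 1):
        p = k + int(np.argmax(np.abs(A[k:, k])))
        A[[k, p]] = A[[p, k]]                      # partial pivoting
        A[k + 1:, k] /= A[k, k]                    # multipliers
        A[k + 1:, k + 1:] -= np.outer(A[k + 1:, k], A[k, k + 1:])
        gmax = max(gmax, np.abs(A[k + 1:, k + 1:]).max())
    return gmax / max0

def wilkinson_gepp_matrix(n):
    """The classic worst case for partial pivoting: unit diagonal,
    -1 strictly below the diagonal, last column all ones."""
    A = np.tril(-np.ones((n, n)), -1) + np.eye(n)
    A[:, -1] = 1.0
    return A

n = 10
print(gepp_growth(wilkinson_gepp_matrix(n)))  # prints 512.0, i.e. 2**(n-1)
```

On this matrix partial pivoting never swaps rows, and the last column doubles at every elimination step.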

25 CALU: tournament pivoting, the overall idea
Consider a block algorithm that factors an n × n matrix A by traversing panels of b columns:
A = [A11, A12; A21, A22], where the current panel is W = [A11; A21].
For each panel W:
- Find, at low communication cost, good pivots for the LU factorization of the panel W; return a permutation matrix Π (preprocessing step).
- Apply Π to the input matrix A.
- Compute LU with no pivoting of ΠW; update the trailing matrix.
Benefit: b times fewer messages overall for the panel factorization, hence faster.
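The preprocessing step is a "tournament": each processor proposes b candidate pivot rows via GEPP on its local block, and candidate sets are merged pairwise up a reduction tree. A sequential toy simulation (my own code and names, not the ScaLAPACK/CALU implementation):

```python
import numpy as np

def gepp_rows(B, b):
    """The b pivot rows chosen by LU with partial pivoting on block B."""
    U = B.astype(float).copy()
    idx = np.arange(B.shape[0])
    for k in range(b):
        j = k + int(np.argmax(np.abs(U[k:, k])))
        U[[k, j]] = U[[j, k]]
        idx[[k, j]] = idx[[j, k]]
        U[k + 1:, k + 1:] -= np.outer(U[k + 1:, k] / U[k, k], U[k, k + 1:])
    return idx[:b]

def tournament_pivots(W, b, P=4):
    """Tournament pivoting sketch over P 'processors': each leaf proposes
    its b GEPP pivot rows; pairs of candidate sets are merged by another
    GEPP until b winners remain.  Returns b global row indices of W."""
    blocks = np.array_split(np.arange(W.shape[0]), P)
    cands = [blk[gepp_rows(W[blk], b)] for blk in blocks]
    while len(cands) > 1:                     # binary reduction tree
        cands = [np.concatenate(cands[i:i + 2]) for i in range(0, len(cands), 2)]
        cands = [c[gepp_rows(W[c], b)] for c in cands]
    return cands[0]
```

Each reduction round works on a small 2b × b block, so the panel needs O(log P) messages instead of O(b log P).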

26 Stability of CALU
Worst case analysis of the growth factor for a reduction tree of height H, CALU vs GEPP:
matrix of size m × (b+1): TSLU(b,H) upper bound 2^(b(H+1)); GEPP upper bound 2^b
matrix of size m × n: CALU upper bound 2^(n(H+1)-1); GEPP upper bound 2^(n-1)
The growth factor upper bound of CALU is worse than that of GEPP, but CALU is still stable in practice [Grigori et al., 2011].

27 Motivation
CALU does an optimal amount of communication.
CALU is as stable as GEPP in practice [Grigori et al., 2011].
CALU is worse than GEPP in terms of the theoretical growth factor upper bound.
Our goal: improve the stability of both GEPP and CALU.

29 Strong Rank Revealing QR
To improve the stability of LU: a pivoting strategy based on the strong RRQR factorization
A^T Π = QR = Q [R11, R12; 0, R22].
This factorization satisfies three conditions [Gu and Eisenstat, 1996]:
every singular value of R11 is large;
every singular value of R22 is small;
every element of R11^(-1) R12 is bounded by a given threshold τ.

31 Strong Rank Revealing QR
Result: W^T Π = Q [R11, R12] with max_{i,j} |(R11^(-1) R12)_{i,j}| ≤ τ.
Algorithm (strong RRQR of the panel W):
  Compute W^T Π = QR using QR with column pivoting;
  while there exist i and j such that |(R11^(-1) R12)_{i,j}| > τ do
    swap the corresponding columns and recompute the QR factorization of R Π_{i,j} (QR updates);
  end
After each swap the determinant of R11 increases by a factor of at least τ, and there are only finitely many permutations, so the loop terminates. Strong RRQR does O(mb²) flops in the worst case.
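A minimal QR-with-column-pivoting routine (Householder based; my own sketch, with hypothetical names) illustrates the first step of the algorithm; the while-loop of τ-violating swaps is omitted, since in practice plain column pivoting usually already meets the bound:

```python
import numpy as np

def qrcp(A):
    """QR with column pivoting: at each step, pick the column with the
    largest remaining 2-norm, then apply a Householder reflector.
    Returns Q, R, perm with A[:, perm] = Q @ R."""
    m, n = A.shape
    R = A.astype(float).copy()
    perm = np.arange(n)
    Q = np.eye(m)
    for k in range(min(m, n)):
        # pivot: column of largest residual norm (rows k: only)
        j = k + int(np.argmax(np.sum(R[k:, k:] ** 2, axis=0)))
        R[:, [k, j]] = R[:, [j, k]]
        perm[[k, j]] = perm[[j, k]]
        # Householder reflector annihilating R[k+1:, k]
        x = R[k:, k].copy()
        v = x.copy()
        v[0] += np.sign(x[0] if x[0] != 0 else 1.0) * np.linalg.norm(x)
        nv = np.linalg.norm(v)
        if nv > 0:
            v /= nv
            R[k:, :] -= 2.0 * np.outer(v, v @ R[k:, :])
            Q[:, k:] -= 2.0 * np.outer(Q[:, k:] @ v, v)
    return Q, R, perm
```

Applied to a b × m panel transpose W^T, the first b pivot columns of the permutation identify the b selected rows of W; one could then form R11^(-1) R12 and check the τ bound explicitly.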

36 LU_PRRP: LU with Panel Rank Revealing Pivoting
Consider a block algorithm that factors an n × n matrix A by traversing panels of b columns:
A = [A11, A12; A21, A22], where W = [A11; A21].
For each panel W:
- Perform the strong RRQR factorization of the transpose of the panel: find a permutation Π with W^T Π = Q [R11, R12] such that L21 = R12^T (R11^(-1))^T satisfies max |(L21)_{i,j}| ≤ τ.
- Apply Π^T to the input matrix A.
- Update the trailing matrix:
  Â = Π^T A = [I_b, 0; L21, I_{m-b}] [Â11, Â12; 0, Â22^s], where Â22^s = Â22 - L21 Â12.

39 LU_PRRP: LU with Panel Rank Revealing Pivoting
Perform an additional GEPP on the b × b diagonal block Â11, and update the corresponding block row Â12:
  Â = [I_b, 0; L21, I_{m-b}] [L11, 0; 0, I_{m-b}] [U11, U12; 0, Â22^s]
LU_PRRP has the same memory requirements as the standard LU decomposition. The total cost is about (2/3) n³ + O(n² b) flops, only O(n² b) flops more than GEPP.
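Putting the pieces together, here is a dense, sequential sketch of LU_PRRP (my own code and names). Plain QR with column pivoting on W^T stands in for strong RRQR, which the experimental slides later suggest is usually enough to keep |L21| ≤ τ:

```python
import numpy as np

def panel_pivot_rows(W):
    """Row permutation of panel W whose first b entries are the rows picked
    by QR with column pivoting on W^T (modified Gram-Schmidt form)."""
    m, b = W.shape
    R = W.astype(float).copy()
    rows = np.arange(m)
    for k in range(b):
        j = k + int(np.argmax(np.sum(R[k:] ** 2, axis=1)))
        R[[k, j]] = R[[j, k]]
        rows[[k, j]] = rows[[j, k]]
        q = R[k] / np.linalg.norm(R[k])
        R[k + 1:] -= np.outer(R[k + 1:] @ q, q)   # orthogonalize the rest
    return rows

def gepp(B):
    """LU with partial pivoting on a small block: B[p] = L @ U."""
    m = B.shape[0]
    U = B.astype(float).copy()
    L = np.eye(m)
    p = np.arange(m)
    for k in range(m - 1):
        j = k + int(np.argmax(np.abs(U[k:, k])))
        U[[k, j]] = U[[j, k]]
        p[[k, j]] = p[[j, k]]
        L[[k, j], :k] = L[[j, k], :k]
        L[k + 1:, k] = U[k + 1:, k] / U[k, k]
        U[k + 1:, k] = 0.0
        U[k + 1:, k + 1:] -= np.outer(L[k + 1:, k], U[k, k + 1:])
    return p, L, U

def lu_prrp(A, b):
    """LU_PRRP sketch: per panel, RRQR-style row selection, L21 = A21 A11^{-1},
    Schur update, then GEPP on the diagonal block.  A[perm] = L @ U."""
    n = A.shape[0]
    M = A.astype(float).copy()
    L = np.eye(n)
    perm = np.arange(n)
    for k in range(0, n, b):
        bk = min(b, n - k)
        if n - k > bk:  # panel pivoting only while a trailing part exists
            rows = panel_pivot_rows(M[k:, k:k + bk])
            M[k:] = M[k:][rows]
            L[k:, :k] = L[k:, :k][rows]
            perm[k:] = perm[k:][rows]
        A11 = M[k:k + bk, k:k + bk].copy()
        L21 = np.linalg.solve(A11.T, M[k + bk:, k:k + bk].T).T  # A21 A11^{-1}
        M[k + bk:, k + bk:] -= L21 @ M[k:k + bk, k + bk:]       # Schur complement
        p1, L11, U11 = gepp(A11)                                # GEPP on diagonal block
        perm[k:k + bk] = perm[k:k + bk][p1]
        L[k:k + bk, :k] = L[k:k + bk, :k][p1]
        M[k:k + bk, k + bk:] = np.linalg.solve(L11, M[k:k + bk, k + bk:][p1])
        M[k:k + bk, k:k + bk] = U11
        L[k:k + bk, k:k + bk] = L11
        L[k + bk:, k:k + bk] = L21[:, p1] @ L11  # fold the inner permutation in
        M[k + bk:, k:k + bk] = 0.0
    return perm, L, np.triu(M)
```

The only departures from ordinary blocked GEPP are the RRQR-based row selection and the fact that L21 is formed as A21 A11^{-1} before any elimination inside the block.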

40 Stability of LU_PRRP (1/4)
Worst case analysis of the growth factor, LU_PRRP vs GEPP:
matrix of size m × (b+1): LU_PRRP upper bound 1 + τb; GEPP upper bound 2^b
matrix of size m × n: LU_PRRP upper bound (1 + τb)^(n/b); GEPP upper bound 2^(n-1)
LU_PRRP is more stable than GEPP in terms of the theoretical growth factor upper bound.

41 Stability of LU_PRRP (2/4)
The upper bound (1 + τb)^(n/b) for different panel sizes b, with τ = 2:
b = 8: (1.425)^n; b = 16: (1.244)^n; b = 32: (1.139)^n; b = 64: (1.078)^n; b = 128: (1.044)^n
For all these panel sizes, LU_PRRP is more stable than GEPP, whose bound grows as 2^(n-1).
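The per-step bases follow directly from rewriting the bound as (1 + τb)^(n/b) = ((1 + τb)^(1/b))^n. A quick check that the listed bases match (1 + 2b)^(1/b) to about three decimals:

```python
# e.g. for b = 8, tau = 2: (1 + 16)**(1/8) = 17**0.125 ≈ 1.425
tau = 2
for b, base in [(8, 1.425), (16, 1.244), (32, 1.139), (64, 1.078), (128, 1.044)]:
    assert abs((1 + tau * b) ** (1.0 / b) - base) < 2e-3
print("all table bases match (1 + 2b)^(1/b)")
```

Larger panels give a smaller base per step, which is why the bound improves as b grows.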

45 Stability of LU_PRRP (3/4)
LU_PRRP is more resistant to pathological matrices on which GEPP fails:
- the Wilkinson matrix (the results table lists n, b, g_W, ||U||_1, ||U^(-1)||_1, ||L||_1, ||L^(-1)||_1 and ||PA - LU||_F / ||A||_F; its numeric entries were lost in transcription);
- the Foster matrix: a concrete physical example that arises from using the quadrature method to solve a certain Volterra integral equation [Foster, 1994].

48 Stability of LU_PRRP (4/4)
- the Foster matrix, continued (results table with columns n, b, g_W, ||U||_1, ||U^(-1)||_1, ||L||_1, ||L^(-1)||_1, ||PA - LU||_F / ||A||_F; numeric entries lost in transcription);
- the Wright matrix: two-point boundary value problems solved with the multiple shooting algorithm [Wright, 1993] (results table with the same columns; numeric entries lost in transcription).

54 CALU_PRRP: Communication-Avoiding LU_PRRP
Consider a block algorithm that factors an n × n matrix A by traversing panels of b columns:
A = [A11, A12; A21, A22], where W = [A11; A21].
For each panel W:
- Find, at low communication cost, b pivot rows for the QR factorization of W^T; return a permutation matrix Π (preprocessing step).
- Apply Π^T to the input matrix A.
- Compute QR with no pivoting of W^T Π; update the trailing matrix as in the LU_PRRP algorithm.
- Perform GEPP on the b × b diagonal block and update the corresponding trailing matrix (to obtain the LU decomposition).

58 CALU_PRRP: preprocessing step
Consider the panel W, partitioned over P = 4 processors as W = [A00; A10; A20; A30]. The panel factorization uses the binary reduction tree
A00, A10 → A01;  A20, A30 → A11;  A01, A11 → A02.
1. Compute the strong RRQR factorization of the transpose of each block A_i0 of the panel W, finding permutations Π_i0:
   A00^T Π00 = Q00 R00,  A10^T Π10 = Q10 R10,  A20^T Π20 = Q20 R20,  A30^T Π30 = Q30 R30.
2. Perform log_2 P rounds of strong RRQR factorizations of 2b × b blocks, finding permutations Π01, Π11, Π02:
   A01^T = [(A00^T Π00)(:, 1:b), (A10^T Π10)(:, 1:b)],  A01^T Π01 = Q01 R01
   A11^T = [(A20^T Π20)(:, 1:b), (A30^T Π30)(:, 1:b)],  A11^T Π11 = Q11 R11
   A02^T = [(A01^T Π01)(:, 1:b), (A11^T Π11)(:, 1:b)],  A02^T Π02 = Q02 R02
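The reduction tree can be simulated sequentially (my own toy code and names; QR with column pivoting on the transpose stands in for strong RRQR, and the P "processors" are just row blocks):

```python
import numpy as np

def rrqr_rows(B, b):
    """Select b row indices of B, as QR with column pivoting on B^T would:
    greedily take the row of largest remaining norm, then orthogonalize the
    other rows against it (modified Gram-Schmidt)."""
    R = B.astype(float).copy()
    idx = np.arange(B.shape[0])
    for k in range(b):
        j = k + int(np.argmax(np.sum(R[k:] ** 2, axis=1)))
        R[[k, j]] = R[[j, k]]
        idx[[k, j]] = idx[[j, k]]
        q = R[k] / np.linalg.norm(R[k])
        R[k + 1:] -= np.outer(R[k + 1:] @ q, q)
    return idx[:b]

def calu_prrp_candidates(W, b, P=4):
    """CALU_PRRP preprocessing over P row blocks (P a power of two):
    each leaf proposes b rows; pairs of candidate sets are merged with
    another RRQR-style selection until b rows remain."""
    blocks = np.array_split(np.arange(W.shape[0]), P)
    cands = [blk[rrqr_rows(W[blk], b)] for blk in blocks]
    while len(cands) > 1:                     # binary reduction tree
        cands = [np.concatenate(cands[i:i + 2]) for i in range(0, len(cands), 2)]
        cands = [c[rrqr_rows(W[c], b)] for c in cands]
    return cands[0]
```

This mirrors the tournament of CALU, with the GEPP selection at each tree node replaced by a rank-revealing QR selection.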

60 Stability of CALU_PRRP
Worst case analysis of the growth factor for a reduction tree of height H, CALU_PRRP vs CALU:
matrix of size m × (b+1): TSLU_PRRP(b,H) upper bound (1 + τb)^(H+1); TSLU(b,H) upper bound 2^(b(H+1))
matrix of size m × n: CALU_PRRP upper bound (1 + τb)^((n/b)(H+1)-1); CALU upper bound 2^(n(H+1)-1)
CALU_PRRP is more stable than CALU in terms of the theoretical growth factor upper bound. For the binary reduction tree (H = log_2 P, with b = n/P), it is more stable than GEPP when roughly log_2 P ≤ b / log_2(τb).
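The comparison with GEPP can be checked numerically from the two bounds above (my own back-of-the-envelope check; the -1 terms in the exponents are dropped for simplicity):

```python
import math

# CALU_PRRP bound (1 + tau*b)^((n/b)(H+1)) vs the GEPP bound 2^n:
# comparing exponents, CALU_PRRP wins when (n/b)(H+1) log2(1 + tau*b) <= n,
# i.e. (H+1) log2(1 + tau*b) <= b.  With a binary tree, H = log2 P.
tau, b = 2.0, 64
for P in (16, 256, 4096):
    H = math.log2(P)
    beats_gepp = (H + 1) * math.log2(1 + tau * b) <= b
    print(P, beats_gepp)  # 16: True, 256: True, 4096: False
```

So for a fixed panel size the advantage over GEPP eventually disappears as the tree gets deeper, matching the condition stated above.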

61 CALU_PRRP: experimental results
CALU_PRRP is as stable as GEPP in practice [Khabou et al., 2012].
QR with column pivoting is sufficient to attain the bound τ in practice.

62 Overall summary
LU_PRRP:
- more stable than GEPP in terms of the growth factor upper bound;
- more resistant to pathological matrices;
- same memory requirements as standard LU.
CALU_PRRP:
- an optimal amount of communication, at the cost of some redundant computation;
- more stable than CALU in terms of the growth factor upper bound;
- more stable than GEPP in terms of the growth factor upper bound, under certain conditions.

63 Future work
Estimate the performance of CALU_PRRP on parallel machines based on multicore processors, and compare it with the performance of CALU.
Design a communication avoiding algorithm with smaller bounds on the growth factor than GEPP in general.

64 References
Ballard, G., Demmel, J., Holtz, O., and Schwartz, O. (2011). Minimizing communication in linear algebra. SIAM J. Matrix Anal. Appl., 32.
Foster, L. V. (1994). Gaussian elimination with partial pivoting can fail in practice. SIAM J. Matrix Anal. Appl., 15.
Grigori, L., Demmel, J., and Xiang, H. (2011). CALU: A communication optimal LU factorization algorithm. SIAM Journal on Matrix Analysis and Applications.
Gu, M. and Eisenstat, S. C. (1996). Efficient algorithms for computing a strong rank-revealing QR factorization. SIAM J. Sci. Comput., 17.
Khabou, A., Demmel, J., Grigori, L., and Gu, M. (2012). LU factorization with panel rank revealing pivoting and its communication avoiding version. Technical Report, EECS Department, University of California, Berkeley.
Wilkinson, J. H. (1961). Error analysis of direct methods of matrix inversion. J. Assoc. Comput. Mach., 8.
Wright, S. J. (1993). A collection of problems for which Gaussian elimination with partial pivoting is unstable. SIAM J. Sci. Statist. Comput., 14.


More information

Low rank approximation of a sparse matrix based on LU factorization with column and row tournament pivoting

Low rank approximation of a sparse matrix based on LU factorization with column and row tournament pivoting Low rank approximation of a sparse matrix based on LU factorization with column and row tournament pivoting Laura Grigori, Sebastien Cayrols, James W. Demmel To cite this version: Laura Grigori, Sebastien

More information

1 Multiply Eq. E i by λ 0: (λe i ) (E i ) 2 Multiply Eq. E j by λ and add to Eq. E i : (E i + λe j ) (E i )

1 Multiply Eq. E i by λ 0: (λe i ) (E i ) 2 Multiply Eq. E j by λ and add to Eq. E i : (E i + λe j ) (E i ) Direct Methods for Linear Systems Chapter Direct Methods for Solving Linear Systems Per-Olof Persson persson@berkeleyedu Department of Mathematics University of California, Berkeley Math 18A Numerical

More information

CSE 160 Lecture 13. Numerical Linear Algebra

CSE 160 Lecture 13. Numerical Linear Algebra CSE 16 Lecture 13 Numerical Linear Algebra Announcements Section will be held on Friday as announced on Moodle Midterm Return 213 Scott B Baden / CSE 16 / Fall 213 2 Today s lecture Gaussian Elimination

More information

Numerical Linear Algebra

Numerical Linear Algebra Numerical Linear Algebra Direct Methods Philippe B. Laval KSU Fall 2017 Philippe B. Laval (KSU) Linear Systems: Direct Solution Methods Fall 2017 1 / 14 Introduction The solution of linear systems is one

More information

Low-Rank Approximations, Random Sampling and Subspace Iteration

Low-Rank Approximations, Random Sampling and Subspace Iteration Modified from Talk in 2012 For RNLA study group, March 2015 Ming Gu, UC Berkeley mgu@math.berkeley.edu Low-Rank Approximations, Random Sampling and Subspace Iteration Content! Approaches for low-rank matrix

More information

Linear Systems of n equations for n unknowns

Linear Systems of n equations for n unknowns Linear Systems of n equations for n unknowns In many application problems we want to find n unknowns, and we have n linear equations Example: Find x,x,x such that the following three equations hold: x

More information

Matrix decompositions

Matrix decompositions Matrix decompositions How can we solve Ax = b? 1 Linear algebra Typical linear system of equations : x 1 x +x = x 1 +x +9x = 0 x 1 +x x = The variables x 1, x, and x only appear as linear terms (no powers

More information

Using Random Butterfly Transformations to Avoid Pivoting in Sparse Direct Methods

Using Random Butterfly Transformations to Avoid Pivoting in Sparse Direct Methods Using Random Butterfly Transformations to Avoid Pivoting in Sparse Direct Methods Marc Baboulin 1, Xiaoye S. Li 2 and François-Henry Rouet 2 1 University of Paris-Sud, Inria Saclay, France 2 Lawrence Berkeley

More information

LU Factorization. Marco Chiarandini. DM559 Linear and Integer Programming. Department of Mathematics & Computer Science University of Southern Denmark

LU Factorization. Marco Chiarandini. DM559 Linear and Integer Programming. Department of Mathematics & Computer Science University of Southern Denmark DM559 Linear and Integer Programming LU Factorization Marco Chiarandini Department of Mathematics & Computer Science University of Southern Denmark [Based on slides by Lieven Vandenberghe, UCLA] Outline

More information

Example Linear Algebra Competency Test

Example Linear Algebra Competency Test Example Linear Algebra Competency Test The 4 questions below are a combination of True or False, multiple choice, fill in the blank, and computations involving matrices and vectors. In the latter case,

More information

Accurate and Efficient Matrix Computations with Totally Positive Generalized Vandermonde Matrices Using Schur Functions

Accurate and Efficient Matrix Computations with Totally Positive Generalized Vandermonde Matrices Using Schur Functions Accurate and Efficient Matrix Computations with Totally Positive Generalized Matrices Using Schur Functions Plamen Koev Department of Mathematics UC Berkeley Joint work with Prof. James Demmel Supported

More information

Linear Algebra Linear Algebra : Matrix decompositions Monday, February 11th Math 365 Week #4

Linear Algebra Linear Algebra : Matrix decompositions Monday, February 11th Math 365 Week #4 Linear Algebra Linear Algebra : Matrix decompositions Monday, February 11th Math Week # 1 Saturday, February 1, 1 Linear algebra Typical linear system of equations : x 1 x +x = x 1 +x +9x = 0 x 1 +x x

More information

Numerical Methods I: Numerical linear algebra

Numerical Methods I: Numerical linear algebra 1/3 Numerical Methods I: Numerical linear algebra Georg Stadler Courant Institute, NYU stadler@cimsnyuedu September 1, 017 /3 We study the solution of linear systems of the form Ax = b with A R n n, x,

More information

A communication-avoiding parallel algorithm for the symmetric eigenvalue problem

A communication-avoiding parallel algorithm for the symmetric eigenvalue problem A communication-avoiding parallel algorithm for the symmetric eigenvalue problem Edgar Solomonik 1, Grey Ballard 2, James Demmel 3, Torsten Hoefler 4 (1) Department of Computer Science, University of Illinois

More information

Communication-avoiding parallel and sequential QR factorizations

Communication-avoiding parallel and sequential QR factorizations Communication-avoiding parallel and sequential QR factorizations James Demmel, Laura Grigori, Mark Hoemmen, and Julien Langou May 30, 2008 Abstract We present parallel and sequential dense QR factorization

More information

Enhancing Scalability of Sparse Direct Methods

Enhancing Scalability of Sparse Direct Methods Journal of Physics: Conference Series 78 (007) 0 doi:0.088/7-6596/78//0 Enhancing Scalability of Sparse Direct Methods X.S. Li, J. Demmel, L. Grigori, M. Gu, J. Xia 5, S. Jardin 6, C. Sovinec 7, L.-Q.

More information

Communication-avoiding parallel and sequential QR factorizations

Communication-avoiding parallel and sequential QR factorizations Communication-avoiding parallel and sequential QR factorizations James Demmel Laura Grigori Mark Frederick Hoemmen Julien Langou Electrical Engineering and Computer Sciences University of California at

More information

Avoiding Communication in Linear Algebra

Avoiding Communication in Linear Algebra Avoiding Communication in Linear Algebra Grey Ballard Microsoft Research Faculty Summit July 15, 2014 Sandia National Laboratories is a multi-program laboratory managed and operated by Sandia Corporation,

More information

Sparse BLAS-3 Reduction

Sparse BLAS-3 Reduction Sparse BLAS-3 Reduction to Banded Upper Triangular (Spar3Bnd) Gary Howell, HPC/OIT NC State University gary howell@ncsu.edu Sparse BLAS-3 Reduction p.1/27 Acknowledgements James Demmel, Gene Golub, Franc

More information

The Behavior of Algorithms in Practice 2/21/2002. Lecture 4. ɛ 1 x 1 y ɛ 1 x 1 1 = x y 1 1 = y 1 = 1 y 2 = 1 1 = 0 1 1

The Behavior of Algorithms in Practice 2/21/2002. Lecture 4. ɛ 1 x 1 y ɛ 1 x 1 1 = x y 1 1 = y 1 = 1 y 2 = 1 1 = 0 1 1 8.409 The Behavior of Algorithms in Practice //00 Lecture 4 Lecturer: Dan Spielman Scribe: Matthew Lepinski A Gaussian Elimination Example To solve: [ ] [ ] [ ] x x First factor the matrix to get: [ ]

More information

Review. Example 1. Elementary matrices in action: (a) a b c. d e f = g h i. d e f = a b c. a b c. (b) d e f. d e f.

Review. Example 1. Elementary matrices in action: (a) a b c. d e f = g h i. d e f = a b c. a b c. (b) d e f. d e f. Review Example. Elementary matrices in action: (a) 0 0 0 0 a b c d e f = g h i d e f 0 0 g h i a b c (b) 0 0 0 0 a b c d e f = a b c d e f 0 0 7 g h i 7g 7h 7i (c) 0 0 0 0 a b c a b c d e f = d e f 0 g

More information

Exploiting Low-Rank Structure in Computing Matrix Powers with Applications to Preconditioning

Exploiting Low-Rank Structure in Computing Matrix Powers with Applications to Preconditioning Exploiting Low-Rank Structure in Computing Matrix Powers with Applications to Preconditioning Erin C. Carson, Nicholas Knight, James Demmel, Ming Gu U.C. Berkeley SIAM PP 12, Savannah, Georgia, USA, February

More information

Introduction to Mathematical Programming

Introduction to Mathematical Programming Introduction to Mathematical Programming Ming Zhong Lecture 6 September 12, 2018 Ming Zhong (JHU) AMS Fall 2018 1 / 20 Table of Contents 1 Ming Zhong (JHU) AMS Fall 2018 2 / 20 Solving Linear Systems A

More information

SYMBOLIC AND EXACT STRUCTURE PREDICTION FOR SPARSE GAUSSIAN ELIMINATION WITH PARTIAL PIVOTING

SYMBOLIC AND EXACT STRUCTURE PREDICTION FOR SPARSE GAUSSIAN ELIMINATION WITH PARTIAL PIVOTING SYMBOLIC AND EXACT STRUCTURE PREDICTION FOR SPARSE GAUSSIAN ELIMINATION WITH PARTIAL PIVOTING LAURA GRIGORI, JOHN R. GILBERT, AND MICHEL COSNARD Abstract. In this paper we consider two structure prediction

More information

LAPACK-Style Codes for Pivoted Cholesky and QR Updating. Hammarling, Sven and Higham, Nicholas J. and Lucas, Craig. MIMS EPrint: 2006.

LAPACK-Style Codes for Pivoted Cholesky and QR Updating. Hammarling, Sven and Higham, Nicholas J. and Lucas, Craig. MIMS EPrint: 2006. LAPACK-Style Codes for Pivoted Cholesky and QR Updating Hammarling, Sven and Higham, Nicholas J. and Lucas, Craig 2007 MIMS EPrint: 2006.385 Manchester Institute for Mathematical Sciences School of Mathematics

More information

Minimizing Communication in Linear Algebra. James Demmel 15 June

Minimizing Communication in Linear Algebra. James Demmel 15 June Minimizing Communication in Linear Algebra James Demmel 15 June 2010 www.cs.berkeley.edu/~demmel 1 Outline What is communication and why is it important to avoid? Direct Linear Algebra Lower bounds on

More information

Parallel LU Decomposition (PSC 2.3) Lecture 2.3 Parallel LU

Parallel LU Decomposition (PSC 2.3) Lecture 2.3 Parallel LU Parallel LU Decomposition (PSC 2.3) 1 / 20 Designing a parallel algorithm Main question: how to distribute the data? What data? The matrix A and the permutation π. Data distribution + sequential algorithm

More information

Sparse linear solvers

Sparse linear solvers Sparse linear solvers Laura Grigori ALPINES INRIA and LJLL, UPMC On sabbatical at UC Berkeley March 2015 Plan Sparse linear solvers Sparse matrices and graphs Classes of linear solvers Sparse Cholesky

More information

Roundoff Error. Monday, August 29, 11

Roundoff Error. Monday, August 29, 11 Roundoff Error A round-off error (rounding error), is the difference between the calculated approximation of a number and its exact mathematical value. Numerical analysis specifically tries to estimate

More information

On the Skeel condition number, growth factor and pivoting strategies for Gaussian elimination

On the Skeel condition number, growth factor and pivoting strategies for Gaussian elimination On the Skeel condition number, growth factor and pivoting strategies for Gaussian elimination J.M. Peña 1 Introduction Gaussian elimination (GE) with a given pivoting strategy, for nonsingular matrices

More information

CS412: Lecture #17. Mridul Aanjaneya. March 19, 2015

CS412: Lecture #17. Mridul Aanjaneya. March 19, 2015 CS: Lecture #7 Mridul Aanjaneya March 9, 5 Solving linear systems of equations Consider a lower triangular matrix L: l l l L = l 3 l 3 l 33 l n l nn A procedure similar to that for upper triangular systems

More information

Computing least squares condition numbers on hybrid multicore/gpu systems

Computing least squares condition numbers on hybrid multicore/gpu systems Computing least squares condition numbers on hybrid multicore/gpu systems M. Baboulin and J. Dongarra and R. Lacroix Abstract This paper presents an efficient computation for least squares conditioning

More information

A Computation- and Communication-Optimal Parallel Direct 3-body Algorithm

A Computation- and Communication-Optimal Parallel Direct 3-body Algorithm A Computation- and Communication-Optimal Parallel Direct 3-body Algorithm Penporn Koanantakool and Katherine Yelick {penpornk, yelick}@cs.berkeley.edu Computer Science Division, University of California,

More information

Subset Selection. Deterministic vs. Randomized. Ilse Ipsen. North Carolina State University. Joint work with: Stan Eisenstat, Yale

Subset Selection. Deterministic vs. Randomized. Ilse Ipsen. North Carolina State University. Joint work with: Stan Eisenstat, Yale Subset Selection Deterministic vs. Randomized Ilse Ipsen North Carolina State University Joint work with: Stan Eisenstat, Yale Mary Beth Broadbent, Martin Brown, Kevin Penner Subset Selection Given: real

More information

STAT 309: MATHEMATICAL COMPUTATIONS I FALL 2018 LECTURE 13

STAT 309: MATHEMATICAL COMPUTATIONS I FALL 2018 LECTURE 13 STAT 309: MATHEMATICAL COMPUTATIONS I FALL 208 LECTURE 3 need for pivoting we saw that under proper circumstances, we can write A LU where 0 0 0 u u 2 u n l 2 0 0 0 u 22 u 2n L l 3 l 32, U 0 0 0 l n l

More information

1.5 Gaussian Elimination With Partial Pivoting.

1.5 Gaussian Elimination With Partial Pivoting. Gaussian Elimination With Partial Pivoting In the previous section we discussed Gaussian elimination In that discussion we used equation to eliminate x from equations through n Then we used equation to

More information

Performance Evaluation of Some Inverse Iteration Algorithms on PowerXCell T M 8i Processor

Performance Evaluation of Some Inverse Iteration Algorithms on PowerXCell T M 8i Processor Performance Evaluation of Some Inverse Iteration Algorithms on PowerXCell T M 8i Processor Masami Takata 1, Hiroyuki Ishigami 2, Kini Kimura 2, and Yoshimasa Nakamura 2 1 Academic Group of Information

More information

Matrix Computations: Direct Methods II. May 5, 2014 Lecture 11

Matrix Computations: Direct Methods II. May 5, 2014 Lecture 11 Matrix Computations: Direct Methods II May 5, 2014 ecture Summary You have seen an example of how a typical matrix operation (an important one) can be reduced to using lower level BS routines that would

More information

Low-Complexity Encoding Algorithm for LDPC Codes

Low-Complexity Encoding Algorithm for LDPC Codes EECE 580B Modern Coding Theory Low-Complexity Encoding Algorithm for LDPC Codes Problem: Given the following matrix (imagine a larger matrix with a small number of ones) and the vector of information bits,

More information

y b where U. matrix inverse A 1 ( L. 1 U 1. L 1 U 13 U 23 U 33 U 13 2 U 12 1

y b where U. matrix inverse A 1 ( L. 1 U 1. L 1 U 13 U 23 U 33 U 13 2 U 12 1 LU decomposition -- manual demonstration Instructor: Nam Sun Wang lu-manualmcd LU decomposition, where L is a lower-triangular matrix with as the diagonal elements and U is an upper-triangular matrix Just

More information

Scalable numerical algorithms for electronic structure calculations

Scalable numerical algorithms for electronic structure calculations Scalable numerical algorithms for electronic structure calculations Edgar Solomonik C Berkeley July, 2012 Edgar Solomonik Cyclops Tensor Framework 1/ 73 Outline Introduction Motivation: Coupled Cluster

More information

ANALYTICAL MATHEMATICS FOR APPLICATIONS 2018 LECTURE NOTES 3

ANALYTICAL MATHEMATICS FOR APPLICATIONS 2018 LECTURE NOTES 3 ANALYTICAL MATHEMATICS FOR APPLICATIONS 2018 LECTURE NOTES 3 ISSUED 24 FEBRUARY 2018 1 Gaussian elimination Let A be an (m n)-matrix Consider the following row operations on A (1) Swap the positions any

More information

LU Factorization. LU Decomposition. LU Decomposition. LU Decomposition: Motivation A = LU

LU Factorization. LU Decomposition. LU Decomposition. LU Decomposition: Motivation A = LU LU Factorization To further improve the efficiency of solving linear systems Factorizations of matrix A : LU and QR LU Factorization Methods: Using basic Gaussian Elimination (GE) Factorization of Tridiagonal

More information

I-v k e k. (I-e k h kt ) = Stability of Gauss-Huard Elimination for Solving Linear Systems. 1 x 1 x x x x

I-v k e k. (I-e k h kt ) = Stability of Gauss-Huard Elimination for Solving Linear Systems. 1 x 1 x x x x Technical Report CS-93-08 Department of Computer Systems Faculty of Mathematics and Computer Science University of Amsterdam Stability of Gauss-Huard Elimination for Solving Linear Systems T. J. Dekker

More information

Parallel Numerical Algorithms

Parallel Numerical Algorithms Parallel Numerical Algorithms Chapter 6 Matrix Models Section 6.2 Low Rank Approximation Edgar Solomonik Department of Computer Science University of Illinois at Urbana-Champaign CS 554 / CSE 512 Edgar

More information

Numerical Linear Algebra

Numerical Linear Algebra Chapter 3 Numerical Linear Algebra We review some techniques used to solve Ax = b where A is an n n matrix, and x and b are n 1 vectors (column vectors). We then review eigenvalues and eigenvectors and

More information

. =. a i1 x 1 + a i2 x 2 + a in x n = b i. a 11 a 12 a 1n a 21 a 22 a 1n. i1 a i2 a in

. =. a i1 x 1 + a i2 x 2 + a in x n = b i. a 11 a 12 a 1n a 21 a 22 a 1n. i1 a i2 a in Vectors and Matrices Continued Remember that our goal is to write a system of algebraic equations as a matrix equation. Suppose we have the n linear algebraic equations a x + a 2 x 2 + a n x n = b a 2

More information

COALA: Communication Optimal Algorithms

COALA: Communication Optimal Algorithms COALA: Communication Optimal Algorithms for Linear Algebra Jim Demmel EECS & Math Depts. UC Berkeley Laura Grigori INRIA Saclay Ile de France Collaborators and Supporters Collaborators at Berkeley (campus

More information

BlockMatrixComputations and the Singular Value Decomposition. ATaleofTwoIdeas

BlockMatrixComputations and the Singular Value Decomposition. ATaleofTwoIdeas BlockMatrixComputations and the Singular Value Decomposition ATaleofTwoIdeas Charles F. Van Loan Department of Computer Science Cornell University Supported in part by the NSF contract CCR-9901988. Block

More information

MATH 3511 Lecture 1. Solving Linear Systems 1

MATH 3511 Lecture 1. Solving Linear Systems 1 MATH 3511 Lecture 1 Solving Linear Systems 1 Dmitriy Leykekhman Spring 2012 Goals Review of basic linear algebra Solution of simple linear systems Gaussian elimination D Leykekhman - MATH 3511 Introduction

More information

Sparse Rank-Revealing LU Factorization

Sparse Rank-Revealing LU Factorization Sparse Rank-Revealing LU Factorization SIAM Conference on Optimization Toronto, May 2002 Michael O Sullivan and Michael Saunders Dept of Engineering Science Dept of Management Sci & Eng University of Auckland

More information

Numerical Linear Algebra

Numerical Linear Algebra Numerical Linear Algebra Decompositions, numerical aspects Gerard Sleijpen and Martin van Gijzen September 27, 2017 1 Delft University of Technology Program Lecture 2 LU-decomposition Basic algorithm Cost

More information

Program Lecture 2. Numerical Linear Algebra. Gaussian elimination (2) Gaussian elimination. Decompositions, numerical aspects

Program Lecture 2. Numerical Linear Algebra. Gaussian elimination (2) Gaussian elimination. Decompositions, numerical aspects Numerical Linear Algebra Decompositions, numerical aspects Program Lecture 2 LU-decomposition Basic algorithm Cost Stability Pivoting Cholesky decomposition Sparse matrices and reorderings Gerard Sleijpen

More information

Matrix decompositions

Matrix decompositions Matrix decompositions Zdeněk Dvořák May 19, 2015 Lemma 1 (Schur decomposition). If A is a symmetric real matrix, then there exists an orthogonal matrix Q and a diagonal matrix D such that A = QDQ T. The

More information

Applied Linear Algebra in Geoscience Using MATLAB

Applied Linear Algebra in Geoscience Using MATLAB Applied Linear Algebra in Geoscience Using MATLAB Contents Getting Started Creating Arrays Mathematical Operations with Arrays Using Script Files and Managing Data Two-Dimensional Plots Programming in

More information

Large Scale Sparse Linear Algebra

Large Scale Sparse Linear Algebra Large Scale Sparse Linear Algebra P. Amestoy (INP-N7, IRIT) A. Buttari (CNRS, IRIT) T. Mary (University of Toulouse, IRIT) A. Guermouche (Univ. Bordeaux, LaBRI), J.-Y. L Excellent (INRIA, LIP, ENS-Lyon)

More information

Pivoting. Reading: GV96 Section 3.4, Stew98 Chapter 3: 1.3

Pivoting. Reading: GV96 Section 3.4, Stew98 Chapter 3: 1.3 Pivoting Reading: GV96 Section 3.4, Stew98 Chapter 3: 1.3 In the previous discussions we have assumed that the LU factorization of A existed and the various versions could compute it in a stable manner.

More information

Linear Equations and Matrix

Linear Equations and Matrix 1/60 Chia-Ping Chen Professor Department of Computer Science and Engineering National Sun Yat-sen University Linear Algebra Gaussian Elimination 2/60 Alpha Go Linear algebra begins with a system of linear

More information

Sparse Rank-Revealing LU Factorization

Sparse Rank-Revealing LU Factorization Sparse Rank-Revealing LU Factorization Householder Symposium XV on Numerical Linear Algebra Peebles, Scotland, June 2002 Michael O Sullivan and Michael Saunders Dept of Engineering Science Dept of Management

More information

Lecture 9: Numerical Linear Algebra Primer (February 11st)

Lecture 9: Numerical Linear Algebra Primer (February 11st) 10-725/36-725: Convex Optimization Spring 2015 Lecture 9: Numerical Linear Algebra Primer (February 11st) Lecturer: Ryan Tibshirani Scribes: Avinash Siravuru, Guofan Wu, Maosheng Liu Note: LaTeX template

More information

CHAPTER 6. Direct Methods for Solving Linear Systems

CHAPTER 6. Direct Methods for Solving Linear Systems CHAPTER 6 Direct Methods for Solving Linear Systems. Introduction A direct method for approximating the solution of a system of n linear equations in n unknowns is one that gives the exact solution to

More information

Theoretical Computer Science

Theoretical Computer Science Theoretical Computer Science 412 (2011) 1484 1491 Contents lists available at ScienceDirect Theoretical Computer Science journal homepage: wwwelseviercom/locate/tcs Parallel QR processing of Generalized

More information

S.F. Xu (Department of Mathematics, Peking University, Beijing)

S.F. Xu (Department of Mathematics, Peking University, Beijing) Journal of Computational Mathematics, Vol.14, No.1, 1996, 23 31. A SMALLEST SINGULAR VALUE METHOD FOR SOLVING INVERSE EIGENVALUE PROBLEMS 1) S.F. Xu (Department of Mathematics, Peking University, Beijing)

More information

Orthogonal Transformations

Orthogonal Transformations Orthogonal Transformations Tom Lyche University of Oslo Norway Orthogonal Transformations p. 1/3 Applications of Qx with Q T Q = I 1. solving least squares problems (today) 2. solving linear equations

More information

Parallel Numerical Algorithms

Parallel Numerical Algorithms Parallel Numerical Algorithms Chapter 5 Eigenvalue Problems Section 5.1 Michael T. Heath and Edgar Solomonik Department of Computer Science University of Illinois at Urbana-Champaign CS 554 / CSE 512 Michael

More information