Avoiding Communication in Distributed-Memory Tridiagonalization

1 Avoiding Communication in Distributed-Memory Tridiagonalization

Nicholas Knight, University of California, Berkeley
SIAM CSE 15, March 14, 2015

Joint work with: Grey Ballard (SNL), James Demmel (UCB), Laura Grigori (INRIA), Mathias Jacquelin (LBNL), H. Diep Nguyen (UCB), Edgar Solomonik (ETHZ)

3 The symmetric eigenproblem

Three-phase direct approach for the symmetric eigenproblem:
1. Reduce the symmetric matrix to tridiagonal form by an orthogonal similarity
2. Solve the (symmetric) tridiagonal eigenproblem
3. Apply the inverse orthogonal similarity to the tridiagonal eigenvectors

This talk: improving phase 1 (tridiagonalization) by integrating communication-efficient QR factorization (CAQR).

11 A class of algorithms for QR factorization

(Q, R) = QR(A), where A is m-by-n with m >= n >= 1:

  If n = 1:
    Compute orthogonal Q such that Q^T A = R ∈ span(e_1)
  Else:
    Partition A = [A11, A12; A21, A22] such that A11 is square, nonempty
    (Q_L, R_1) = QR([A11; A21])          -- panel factorization
    [R_2; B]   = Q_L^T [A12; A22]        -- trailing matrix update
    (Q_R, R_3) = QR(B)
    Q = Q_L [I, 0; 0, Q_R]
    R = [R_1, R_2; 0, R_3]
  Return (Q, R)

12 A class of algorithms for QR factorization

(Q, R) = QR_b(A), where A is m-by-n with m >= n >= 1:

  If n = 1:
    Compute orthogonal Q such that Q^T A = R ∈ span(e_1)
  Else:
    Partition A = [A11, A12; A21, A22] such that A11 is b-by-b
    (Q_L, R_1) = QR_b([A11; A21])
    [R_2; B]   = Q_L^T [A12; A22]
    (Q_R, R_3) = QR_b(B)
    Q = Q_L [I, 0; 0, Q_R]
    R = [R_1, R_2; 0, R_3]
  Return (Q, R)

Blocking factor b: b = b(n) ∈ {1, ..., n}, with b(n) > b(b(n)) whenever b(n) > 1 (this guarantees the panel recursion makes progress).

13 Particular choices of blocking function b

- b = n/2, or b = n/k for k >= 2: recursive
- b = 1: right-looking (unblocked)
- b = max{1, n-1}: left-looking (unblocked)
- b = b_1 if n > b_1, else 1: right-looking (1-level blocking)
- b = n - b_1 if n > b_1, else n - 1: left-looking (1-level blocking)
- multi-level blocking, ...
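The recursive template above can be sketched in pure Python. This is a toy illustration rather than anything from the slides: the helper names (`householder_column`, `matmul`) are my own, Q is formed as a dense matrix for clarity, and a real implementation would keep Q in factored form and use tuned BLAS kernels.

```python
from math import hypot

def matmul(A, B):
    """Dense matrix product on lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(row) for row in zip(*A)]

def householder_column(a):
    """Base case n = 1: orthogonal Q (dense, symmetric) with Q^T a in span(e_1)."""
    m = len(a)
    alpha = -hypot(*a) if a[0] >= 0 else hypot(*a)   # sign choice avoids cancellation
    v = list(a)
    v[0] -= alpha                                    # v = a - alpha*e_1
    vv = sum(x * x for x in v)
    Q = [[(1.0 if i == j else 0.0) - (2 * v[i] * v[j] / vv if vv else 0.0)
          for j in range(m)] for i in range(m)]
    return Q, [[alpha]] + [[0.0] for _ in range(m - 1)]

def qr(A, b=lambda n: max(1, n // 2)):
    """(Q, R) = QR(A) following the recursive template; b is the blocking function."""
    m, n = len(A), len(A[0])
    if n == 1:
        return householder_column([row[0] for row in A])
    k = b(n)                                              # A11 is k-by-k
    QL, R1 = qr([row[:k] for row in A], b)                # panel factorization
    top = matmul(transpose(QL), [row[k:] for row in A])   # trailing matrix update
    R2, B = top[:k], top[k:]
    QRight, R3 = qr(B, b)
    # Q = QL [I 0; 0 QRight];  R = [R1 R2; 0 R3]  (R returned m-by-n, zeros included)
    Q = [[QL[i][j] if j < k else
          sum(QL[i][k + s] * QRight[s][j - k] for s in range(m - k))
          for j in range(m)] for i in range(m)]
    R = [R1[i] + R2[i] for i in range(k)] + [R1[k + i] + R3[i] for i in range(m - k)]
    return Q, R
```

Any blocking function satisfying the termination condition can be passed as `b`; the default `n // 2` gives the fully recursive variant.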

16 Householder representation for QR

Use the compact representation [SVL89] for orthogonal matrices arising in QR factorization (see also [SB95]):

  Q = H(Y) = I - Y T Y^T

where Y is unit lower trapezoidal and T is invertible upper triangular. T is determined by Y:

  T^{-T} + T^{-1} = Y^T Y

In practice, keep T explicit. Q and Q^T can then be applied efficiently via matrix multiplication, which is well optimized:

  Q^T A = A - Y T^T Y^T A
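To make the identity concrete, here is a small pure-Python check (my own illustration, not from the slides): given any unit lower trapezoidal Y, defining T^{-1} as the upper triangle of Y^T Y with halved diagonal satisfies T^{-T} + T^{-1} = Y^T Y, and the resulting Q = I - Y T Y^T is orthogonal.

```python
def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(row) for row in zip(*A)]

def inv_upper(U):
    """Back-substitution inverse of an upper triangular matrix."""
    n = len(U)
    X = [[0.0] * n for _ in range(n)]
    for j in range(n):
        X[j][j] = 1.0 / U[j][j]
        for i in range(j - 1, -1, -1):
            X[i][j] = -sum(U[i][k] * X[k][j] for k in range(i + 1, j + 1)) / U[i][i]
    return X

def t_from_y(Y):
    """T with T^{-T} + T^{-1} = Y^T Y: split Y^T Y into its two triangular halves."""
    S = matmul(transpose(Y), Y)
    n = len(S)
    Tinv = [[S[i][j] if i < j else (S[i][i] / 2.0 if i == j else 0.0)
             for j in range(n)] for i in range(n)]
    return inv_upper(Tinv)

# Any unit lower trapezoidal Y then yields an orthogonal Q = I - Y T Y^T:
Y = [[1.0, 0.0], [0.5, 1.0], [-2.0, 3.0], [1.0, -1.0]]
T = t_from_y(Y)
YTYt = matmul(matmul(Y, T), transpose(Y))
Q = [[(1.0 if i == j else 0.0) - YTYt[i][j] for j in range(4)] for i in range(4)]
```

An equivalent, inverse-free form of the identity, obtained by multiplying by T^T on the left and T on the right, is T + T^T = T^T (Y^T Y) T.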

17 Householder QR factorization

(Y, R) = QR(A), where A is m-by-n with m >= n >= 1:

  If n = 1:
    Compute a vector Y such that H(Y)^T A = [R; 0] ∈ span(e_1)
  Else:
    Partition A = [A11, A12; A21, A22] such that A11 is square, nonempty
    (Y_L, R_1) = QR([A11; A21])
    [R_2; B]   = H(Y_L)^T [A12; A22]
    (Y_R, R_3) = QR(B)
    Y = [Y_L, [0; Y_R]]
    R = [R_1, R_2; 0, R_3]
  Return (Y, R)

21 Communication-efficient Householder QR

(Sca)LAPACK (P)xGEQRF uses 1-level blocking to improve data locality in the trailing matrix update (see, e.g., [Pug92]). But the panel factorization can still be a communication bottleneck, e.g., when A is too tall/skinny to benefit from blocking.

Tall/skinny QR (TSQR) [DGHL12] is cheaper than Householder QR when A is sufficiently tall/skinny. Communication-avoiding Householder QR (CAQR) uses TSQR as the base case.

                  Flops                Words                      Messages
  Householder QR  (2mn^2 - 2n^3/3)/p   (2mn + n^2/2)/sqrt(p)      n log p
  CAQR            (2mn^2 - 2n^3/3)/p   (2mn + n^2 log p)/sqrt(p)  (7/2) sqrt(p) log^3 p

Leading-order costs of QR factorization (right-looking with 1-level blocking) of an m-by-n matrix distributed over p processors in 2D fashion.

22 Tall/skinny QR (TSQR)

Key benefits of TSQR:
- sequential: read A once
- parallel: one reduction

Q is represented as a tree of factors (Y_i)_i: Q = ∏_i H(Y_i).
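A minimal sequential sketch of the TSQR reduction (my own illustration, not from the slides): factor each block row locally, then factor the stack of R factors once. The local factorizations here use modified Gram-Schmidt for brevity, whereas practical TSQR uses Householder kernels and, in parallel, a binary reduction tree; with the positive-diagonal normalization below, the final R matches the R of a direct QR of A.

```python
def mgs_qr(A):
    """Thin QR by modified Gram-Schmidt; R has a positive diagonal."""
    m, n = len(A), len(A[0])
    Q = [row[:] for row in A]
    R = [[0.0] * n for _ in range(n)]
    for j in range(n):
        R[j][j] = sum(Q[i][j] ** 2 for i in range(m)) ** 0.5
        for i in range(m):
            Q[i][j] /= R[j][j]
        for k in range(j + 1, n):
            R[j][k] = sum(Q[i][j] * Q[i][k] for i in range(m))
            for i in range(m):
                Q[i][k] -= Q[i][j] * R[j][k]
    return Q, R

def tsqr_r(A, nblocks):
    """R factor via one level of the TSQR tree: QR each block row, then QR the stack."""
    m = len(A)
    step = m // nblocks
    rs = []
    for p in range(nblocks):                  # "local" factorizations (parallel in TSQR)
        _, Rp = mgs_qr(A[p * step:(p + 1) * step])
        rs.extend(Rp)
    _, R = mgs_qr(rs)                         # one reduction step on the stacked R factors
    return R
```

The sequential benefit is visible in the structure: each block row of A is read once, and only the small n-by-n R factors are revisited.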

23 Communication-avoiding QR (CAQR)

CAQR uses TSQR for the panel factorization and applies Q^T to the trailing matrix by tree traversal. This is more complicated than applying I - Y T^T Y^T.

24 CAQR: Householder QR factorization using TSQR

((Y_i)_i, R) = QR(A), where A is m-by-n with m >= n >= 1:

  If A is sufficiently tall/skinny:
    ((Y_i)_i, R) = TSQR(A)
  Else:
    Partition A = [A11, A12; A21, A22] such that A11 is square, nonempty
    ((Y_{L,i})_i, R_1) = QR([A11; A21])
    [R_2; B] = (∏_i H(Y_{L,i}))^T [A12; A22]
    ((Y_{R,i})_i, R_3) = QR(B)
    (Y_i)_i = concat((Y_{L,i})_i, ([0; Y_{R,i}])_i)
    R = [R_1, R_2; 0, R_3]
  Return ((Y_i)_i, R)

25 Householder reconstruction for TSQR

TSQR with Householder reconstruction (TSQR-HR) [BDG+14]: stably convert the TSQR representation (Y_i)_i to the Householder representation Y.

CAQR-HR: simple and well-optimized trailing matrix update; performance portability (xgemm is likely more tuned than xtprfb).

                  Flops                Words                      Messages
  Householder QR  (2mn^2 - 2n^3/3)/p   (2mn + n^2/2)/sqrt(p)      n log p
  CAQR            (2mn^2 - 2n^3/3)/p   (2mn + n^2 log p)/sqrt(p)  (7/2) sqrt(p) log^3 p
  CAQR-HR         (2mn^2 - 2n^3/3)/p   (2mn + n^2/2)/sqrt(p)      6 sqrt(p) log^2 p

Leading-order costs of QR factorization (right-looking with 1-level blocking) of an m-by-n matrix distributed over p processors in 2D fashion.
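As a quick sanity check on these asymptotics, the words/messages columns of the table can be encoded as a small cost model. This is my own sketch: constants come from the leading-order terms tabulated above, and lower-order terms and machine parameters (per-word and per-message costs) are ignored.

```python
from math import log2, sqrt

def qr_costs(m, n, p):
    """Leading-order words moved and messages sent, per the table above."""
    lg = log2(p)
    return {
        "Householder QR": {"words": (2 * m * n + n * n / 2) / sqrt(p),
                           "messages": n * lg},
        "CAQR":           {"words": (2 * m * n + n * n * lg) / sqrt(p),
                           "messages": 3.5 * sqrt(p) * lg ** 3},
        "CAQR-HR":        {"words": (2 * m * n + n * n / 2) / sqrt(p),
                           "messages": 6 * sqrt(p) * lg ** 2},
    }
```

With m = n = 10^4 and p = 1024, for instance, the model gives about 1.9e4 messages for CAQR-HR versus 1e5 for Householder QR, at the same leading-order word count.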

26 CAQR-HR: CAQR with Householder reconstruction

(Y, R) = QR(A), where A is m-by-n with m >= n >= 1:

  If A is sufficiently tall/skinny:
    (Y, R) = TSQR-HR(A)
  Else:
    Partition A = [A11, A12; A21, A22] such that A11 is square, nonempty
    (Y_L, R_1) = QR([A11; A21])
    [R_2; B]   = H(Y_L)^T [A12; A22]
    (Y_R, R_3) = QR(B)
    Y = [Y_L, [0; Y_R]]
    R = [R_1, R_2; 0, R_3]
  Return (Y, R)

27 Householder reconstruction

Key idea: let A = QR be a (thin) QR factorization, written in Householder form as

  A = QR = (I - Y T Y_1^T) R,

and rearrange to obtain an LU factorization,

  [A_1 - R; A_2] = Y (-T Y_1^T R) = L U,

where A = [A_1; A_2] with A_1 square, and Y_1 is the top (square) block of Y.

28 Householder reconstruction

If A_1 - R is nonsingular, let Y = L, where

  (L, U) = LU([A_1 - R; A_2]),

since [A_1 - R; A_2] = Y (-T Y_1^T R).

29 Householder reconstruction

If A_1 - SR is nonsingular for some diagonal sign matrix S, instead compute A = Q̂ R̂ = (QS)(SR), and let Ŷ = L, where

  (L, U) = LU([A_1 - SR; A_2]),

since [A_1 - SR; A_2] = Ŷ (-T̂ Ŷ_1^T SR).

30 Householder reconstruction

It is always possible to pick S such that Q_1 - S is nonsingular, with the largest elements of [Q_1 - S; Q_2] on the diagonal:

  (L, U) = LU([Q_1 - S; Q_2]),

since [Q_1 - S; Q_2] = Ŷ (-T̂ Ŷ_1^T S).
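The idea of slides 27-28 (without the sign-matrix refinement of slides 29-30) can be checked numerically in pure Python. This is my own sketch under the assumption that A_1 - R is nonsingular, with modified Gram-Schmidt standing in for TSQR: take a thin QR of A, run LU without pivoting on A - [R; 0] to get Y = L, recover T = -U R^{-1} Y_1^{-T} from U = -T Y_1^T R, and confirm that I - Y T Y^T is orthogonal with its leading columns equal to the thin Q.

```python
def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(row) for row in zip(*A)]

def mgs_qr(A):
    """Thin QR by modified Gram-Schmidt (positive-diagonal R)."""
    m, n = len(A), len(A[0])
    Q = [row[:] for row in A]
    R = [[0.0] * n for _ in range(n)]
    for j in range(n):
        R[j][j] = sum(Q[i][j] ** 2 for i in range(m)) ** 0.5
        for i in range(m):
            Q[i][j] /= R[j][j]
        for k in range(j + 1, n):
            R[j][k] = sum(Q[i][j] * Q[i][k] for i in range(m))
            for i in range(m):
                Q[i][k] -= Q[i][j] * R[j][k]
    return Q, R

def lu_nopivot(W):
    """Doolittle LU of a tall matrix: W = L U, L unit lower trapezoidal."""
    m, n = len(W), len(W[0])
    L = [[0.0] * n for _ in range(m)]
    U = [[0.0] * n for _ in range(n)]
    for j in range(n):
        for i in range(j + 1):
            U[i][j] = W[i][j] - sum(L[i][k] * U[k][j] for k in range(i))
        for i in range(j, m):
            L[i][j] = (W[i][j] - sum(L[i][k] * U[k][j] for k in range(j))) / U[j][j]
    return L, U

def inv_upper(U):
    """Back-substitution inverse of an upper triangular matrix."""
    n = len(U)
    X = [[0.0] * n for _ in range(n)]
    for j in range(n):
        X[j][j] = 1.0 / U[j][j]
        for i in range(j - 1, -1, -1):
            X[i][j] = -sum(U[i][k] * X[k][j] for k in range(i + 1, j + 1)) / U[i][i]
    return X

def reconstruct_householder(A):
    m, n = len(A), len(A[0])
    Qthin, R = mgs_qr(A)
    W = [[A[i][j] - (R[i][j] if i < n else 0.0) for j in range(n)] for i in range(m)]
    Y, U = lu_nopivot(W)                              # [A1 - R; A2] = Y (-T Y1^T R)
    Y1t_inv = inv_upper(transpose([row[:n] for row in Y[:n]]))   # (Y1^T)^{-1}
    negT = matmul(matmul(U, inv_upper(R)), Y1t_inv)              # U R^{-1} Y1^{-T} = -T
    T = [[-x for x in row] for row in negT]
    return Qthin, R, Y, T

# Demo: reconstruct, then rebuild the full Q = I - Y T Y^T
A = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
Qthin, R, Y, T = reconstruct_householder(A)
YTYt = matmul(matmul(Y, T), transpose(Y))
Qfull = [[(1.0 if i == j else 0.0) - YTYt[i][j] for j in range(4)] for i in range(4)]
```

Because the LU factorization of a matrix with nonsingular leading minors is unique, L here must coincide (in exact arithmetic) with the unit lower trapezoidal Y of the Householder form; the sign matrix S of the later slides is what makes this step well conditioned in general.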

32 Numerical stability

TSQR-HR is as stable as Householder QR (up to constants) [BDG+14]:
- Use TSQR to compute the (thin) QR factorization
- LU factorization (well conditioned, no pivoting)

[Table: errors on tall and skinny matrices (m = 1000, b = 200): for κ(A) ranging from about 1e+02 up to about 1e+15, both ||A - QR||_F and ||I - Q^T Q||_F stay near the 1e-15 level (one entry at 1e-14).]

34 Conclusions

Use communication-efficient QR factorization for tridiagonalization:
- [BDG+14]: approach based on TSQR
- [AHW14]: approach based on Cholesky-QR (with dynamic block size)
- [FNYY14]: approach based on Cholesky-QR (with iterative refinement)

Ongoing work: 3D QR factorization
- Use the recursive approach: b = n/2
- Use 3D matrix multiplication for the trailing matrix update
- Use fewer processors on the base cases (TSQR-HR)
- O(p^{1/6})-fold decrease in # words; O(p^{1/6})-fold increase in # messages

Thank you! knight@cs.berkeley.edu

35 References I

[AHW14] T. Auckenthaler, T. Huckle, and R. Wittmann. A blocked QR-decomposition for the parallel symmetric eigenvalue problem. Parallel Computing, 40(7), 2014.

[BDG+14] G. Ballard, J. Demmel, L. Grigori, M. Jacquelin, H. D. Nguyen, and E. Solomonik. Reconstructing Householder vectors from tall-skinny QR. In 2014 IEEE 28th International Parallel and Distributed Processing Symposium. IEEE, 2014.

[DGHL12] J. Demmel, L. Grigori, M. Hoemmen, and J. Langou. Communication-optimal parallel and sequential QR and LU factorizations. SIAM Journal on Scientific Computing, 34(1):A206-A239, 2012.

36 References II

[FNYY14] T. Fukaya, Y. Nakatsukasa, Y. Yanagisawa, and Y. Yamamoto. CholeskyQR2: a simple and communication-avoiding algorithm for computing a tall-skinny QR factorization on a large-scale parallel system. In Proceedings of the 5th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems. IEEE Press, 2014.

[Pug92] C. Puglisi. Modification of the Householder method based on the compact WY representation. SIAM Journal on Scientific and Statistical Computing, 13(3), 1992.

[SB95] X. Sun and C. Bischof. A basis-kernel representation of orthogonal matrices. SIAM Journal on Matrix Analysis and Applications, 16(4), 1995.

37 References III

[SVL89] R. Schreiber and C. Van Loan. A storage-efficient WY representation for products of Householder transformations. SIAM Journal on Scientific and Statistical Computing, 10(1):53-57, 1989.

[Tis07] A. Tiskin. Communication-efficient parallel generic pairwise elimination. Future Generation Computer Systems, 23(2), 2007.


More information

Main matrix factorizations

Main matrix factorizations Main matrix factorizations A P L U P permutation matrix, L lower triangular, U upper triangular Key use: Solve square linear system Ax b. A Q R Q unitary, R upper triangular Key use: Solve square or overdetrmined

More information

Important Matrix Factorizations

Important Matrix Factorizations LU Factorization Choleski Factorization The QR Factorization LU Factorization: Gaussian Elimination Matrices Gaussian elimination transforms vectors of the form a α, b where a R k, 0 α R, and b R n k 1,

More information

Provably Efficient Algorithms for Numerical Tensor Algebra

Provably Efficient Algorithms for Numerical Tensor Algebra Provably Efficient Algorithms for Numerical Tensor Algebra Edgar Solomonik Electrical Engineering and Computer Sciences University of California at Berkeley Technical Report No. UCB/EECS-2014-170 http://www.eecs.berkeley.edu/pubs/techrpts/2014/eecs-2014-170.html

More information

COALA: Communication Optimal Algorithms

COALA: Communication Optimal Algorithms COALA: Communication Optimal Algorithms for Linear Algebra Jim Demmel EECS & Math Depts. UC Berkeley Laura Grigori INRIA Saclay Ile de France Collaborators and Supporters Collaborators at Berkeley (campus

More information

Out-of-Core SVD and QR Decompositions

Out-of-Core SVD and QR Decompositions Out-of-Core SVD and QR Decompositions Eran Rabani and Sivan Toledo 1 Introduction out-of-core singular-value-decomposition algorithm. The algorithm is designed for tall narrow matrices that are too large

More information

Scientific Computing with Case Studies SIAM Press, Lecture Notes for Unit VII Sparse Matrix

Scientific Computing with Case Studies SIAM Press, Lecture Notes for Unit VII Sparse Matrix Scientific Computing with Case Studies SIAM Press, 2009 http://www.cs.umd.edu/users/oleary/sccswebpage Lecture Notes for Unit VII Sparse Matrix Computations Part 1: Direct Methods Dianne P. O Leary c 2008

More information

Numerical Linear Algebra Primer. Ryan Tibshirani Convex Optimization /36-725

Numerical Linear Algebra Primer. Ryan Tibshirani Convex Optimization /36-725 Numerical Linear Algebra Primer Ryan Tibshirani Convex Optimization 10-725/36-725 Last time: proximal gradient descent Consider the problem min g(x) + h(x) with g, h convex, g differentiable, and h simple

More information

c 2013 Society for Industrial and Applied Mathematics

c 2013 Society for Industrial and Applied Mathematics SIAM J. MATRIX ANAL. APPL. Vol. 34, No. 3, pp. 1401 1429 c 2013 Society for Industrial and Applied Mathematics LU FACTORIZATION WITH PANEL RANK REVEALING PIVOTING AND ITS COMMUNICATION AVOIDING VERSION

More information

Communication-avoiding Krylov subspace methods

Communication-avoiding Krylov subspace methods Motivation Communication-avoiding Krylov subspace methods Mark mhoemmen@cs.berkeley.edu University of California Berkeley EECS MS Numerical Libraries Group visit: 28 April 2008 Overview Motivation Current

More information

Iterative methods, preconditioning, and their application to CMB data analysis. Laura Grigori INRIA Saclay

Iterative methods, preconditioning, and their application to CMB data analysis. Laura Grigori INRIA Saclay Iterative methods, preconditioning, and their application to CMB data analysis Laura Grigori INRIA Saclay Plan Motivation Communication avoiding for numerical linear algebra Novel algorithms that minimize

More information

The Future of LAPACK and ScaLAPACK

The Future of LAPACK and ScaLAPACK The Future of LAPACK and ScaLAPACK Jason Riedy, Yozo Hida, James Demmel EECS Department University of California, Berkeley November 18, 2005 Outline Survey responses: What users want Improving LAPACK and

More information

A model leading to self-consistent iteration computation with need for HP LA (e.g, diagonalization and orthogonalization)

A model leading to self-consistent iteration computation with need for HP LA (e.g, diagonalization and orthogonalization) A model leading to self-consistent iteration computation with need for HP LA (e.g, diagonalization and orthogonalization) Schodinger equation: Hψ = Eψ Choose a basis set of wave functions Two cases: Orthonormal

More information

Numerical Methods in Matrix Computations

Numerical Methods in Matrix Computations Ake Bjorck Numerical Methods in Matrix Computations Springer Contents 1 Direct Methods for Linear Systems 1 1.1 Elements of Matrix Theory 1 1.1.1 Matrix Algebra 2 1.1.2 Vector Spaces 6 1.1.3 Submatrices

More information

Practicality of Large Scale Fast Matrix Multiplication

Practicality of Large Scale Fast Matrix Multiplication Practicality of Large Scale Fast Matrix Multiplication Grey Ballard, James Demmel, Olga Holtz, Benjamin Lipshitz and Oded Schwartz UC Berkeley IWASEP June 5, 2012 Napa Valley, CA Research supported by

More information

A Computation- and Communication-Optimal Parallel Direct 3-body Algorithm

A Computation- and Communication-Optimal Parallel Direct 3-body Algorithm A Computation- and Communication-Optimal Parallel Direct 3-body Algorithm Penporn Koanantakool and Katherine Yelick {penpornk, yelick}@cs.berkeley.edu Computer Science Division, University of California,

More information

Fast QR decomposition of HODLR matrices

Fast QR decomposition of HODLR matrices Fast QR decomposition of HODLR matrices Daniel Kressner Ana Šušnjara Abstract The efficient and accurate QR decomposition for matrices with hierarchical low-rank structures, such as HODLR and hierarchical

More information

Lower Bounds on Algorithm Energy Consumption: Current Work and Future Directions. March 1, 2013

Lower Bounds on Algorithm Energy Consumption: Current Work and Future Directions. March 1, 2013 Lower Bounds on Algorithm Energy Consumption: Current Work and Future Directions James Demmel, Andrew Gearhart, Benjamin Lipshitz and Oded Schwartz Electrical Engineering and Computer Sciences University

More information

Solving large scale eigenvalue problems

Solving large scale eigenvalue problems arge scale eigenvalue problems, Lecture 5, March 23, 2016 1/30 Lecture 5, March 23, 2016: The QR algorithm II http://people.inf.ethz.ch/arbenz/ewp/ Peter Arbenz Computer Science Department, ETH Zürich

More information

Eigenvalue Problems and Singular Value Decomposition

Eigenvalue Problems and Singular Value Decomposition Eigenvalue Problems and Singular Value Decomposition Sanzheng Qiao Department of Computing and Software McMaster University August, 2012 Outline 1 Eigenvalue Problems 2 Singular Value Decomposition 3 Software

More information

Preliminary/Qualifying Exam in Numerical Analysis (Math 502a) Spring 2012

Preliminary/Qualifying Exam in Numerical Analysis (Math 502a) Spring 2012 Instructions Preliminary/Qualifying Exam in Numerical Analysis (Math 502a) Spring 2012 The exam consists of four problems, each having multiple parts. You should attempt to solve all four problems. 1.

More information

Lecture 9: Numerical Linear Algebra Primer (February 11st)

Lecture 9: Numerical Linear Algebra Primer (February 11st) 10-725/36-725: Convex Optimization Spring 2015 Lecture 9: Numerical Linear Algebra Primer (February 11st) Lecturer: Ryan Tibshirani Scribes: Avinash Siravuru, Guofan Wu, Maosheng Liu Note: LaTeX template

More information

COMPUTER SCIENCE 515 Numerical Linear Algebra SPRING 2006 ASSIGNMENT # 4 (39 points) February 27

COMPUTER SCIENCE 515 Numerical Linear Algebra SPRING 2006 ASSIGNMENT # 4 (39 points) February 27 Due Friday, March 1 in class COMPUTER SCIENCE 1 Numerical Linear Algebra SPRING 26 ASSIGNMENT # 4 (9 points) February 27 1. (22 points) The goal is to compare the effectiveness of five different techniques

More information

Section 4.5 Eigenvalues of Symmetric Tridiagonal Matrices

Section 4.5 Eigenvalues of Symmetric Tridiagonal Matrices Section 4.5 Eigenvalues of Symmetric Tridiagonal Matrices Key Terms Symmetric matrix Tridiagonal matrix Orthogonal matrix QR-factorization Rotation matrices (plane rotations) Eigenvalues We will now complete

More information

Total least squares. Gérard MEURANT. October, 2008

Total least squares. Gérard MEURANT. October, 2008 Total least squares Gérard MEURANT October, 2008 1 Introduction to total least squares 2 Approximation of the TLS secular equation 3 Numerical experiments Introduction to total least squares In least squares

More information

Performance of Random Sampling for Computing Low-rank Approximations of a Dense Matrix on GPUs

Performance of Random Sampling for Computing Low-rank Approximations of a Dense Matrix on GPUs Performance of Random Sampling for Computing Low-rank Approximations of a Dense Matrix on GPUs Théo Mary, Ichitaro Yamazaki, Jakub Kurzak, Piotr Luszczek, Stanimire Tomov, Jack Dongarra presenter 1 Low-Rank

More information

LINEAR ALGEBRA: NUMERICAL METHODS. Version: August 12,

LINEAR ALGEBRA: NUMERICAL METHODS. Version: August 12, LINEAR ALGEBRA: NUMERICAL METHODS. Version: August 12, 2000 74 6 Summary Here we summarize the most important information about theoretical and numerical linear algebra. MORALS OF THE STORY: I. Theoretically

More information

Low Rank Approximation of a Sparse Matrix Based on LU Factorization with Column and Row Tournament Pivoting

Low Rank Approximation of a Sparse Matrix Based on LU Factorization with Column and Row Tournament Pivoting Low Rank Approximation of a Sparse Matrix Based on LU Factorization with Column and Row Tournament Pivoting James Demmel Laura Grigori Sebastien Cayrols Electrical Engineering and Computer Sciences University

More information

Matrix decompositions

Matrix decompositions Matrix decompositions How can we solve Ax = b? 1 Linear algebra Typical linear system of equations : x 1 x +x = x 1 +x +9x = 0 x 1 +x x = The variables x 1, x, and x only appear as linear terms (no powers

More information

Numerical Methods I: Eigenvalues and eigenvectors

Numerical Methods I: Eigenvalues and eigenvectors 1/25 Numerical Methods I: Eigenvalues and eigenvectors Georg Stadler Courant Institute, NYU stadler@cims.nyu.edu November 2, 2017 Overview 2/25 Conditioning Eigenvalues and eigenvectors How hard are they

More information

Computing least squares condition numbers on hybrid multicore/gpu systems

Computing least squares condition numbers on hybrid multicore/gpu systems Computing least squares condition numbers on hybrid multicore/gpu systems M. Baboulin and J. Dongarra and R. Lacroix Abstract This paper presents an efficient computation for least squares conditioning

More information

Today s class. Linear Algebraic Equations LU Decomposition. Numerical Methods, Fall 2011 Lecture 8. Prof. Jinbo Bi CSE, UConn

Today s class. Linear Algebraic Equations LU Decomposition. Numerical Methods, Fall 2011 Lecture 8. Prof. Jinbo Bi CSE, UConn Today s class Linear Algebraic Equations LU Decomposition 1 Linear Algebraic Equations Gaussian Elimination works well for solving linear systems of the form: AX = B What if you have to solve the linear

More information

Applied Linear Algebra

Applied Linear Algebra Applied Linear Algebra Peter J. Olver School of Mathematics University of Minnesota Minneapolis, MN 55455 olver@math.umn.edu http://www.math.umn.edu/ olver Chehrzad Shakiban Department of Mathematics University

More information

A hybrid Hermitian general eigenvalue solver

A hybrid Hermitian general eigenvalue solver Available online at www.prace-ri.eu Partnership for Advanced Computing in Europe A hybrid Hermitian general eigenvalue solver Raffaele Solcà *, Thomas C. Schulthess Institute fortheoretical Physics ETHZ,

More information

Numerical Linear Algebra Primer. Ryan Tibshirani Convex Optimization

Numerical Linear Algebra Primer. Ryan Tibshirani Convex Optimization Numerical Linear Algebra Primer Ryan Tibshirani Convex Optimization 10-725 Consider Last time: proximal Newton method min x g(x) + h(x) where g, h convex, g twice differentiable, and h simple. Proximal

More information

lecture 2 and 3: algorithms for linear algebra

lecture 2 and 3: algorithms for linear algebra lecture 2 and 3: algorithms for linear algebra STAT 545: Introduction to computational statistics Vinayak Rao Department of Statistics, Purdue University August 27, 2018 Solving a system of linear equations

More information

LU Factorization. LU Decomposition. LU Decomposition. LU Decomposition: Motivation A = LU

LU Factorization. LU Decomposition. LU Decomposition. LU Decomposition: Motivation A = LU LU Factorization To further improve the efficiency of solving linear systems Factorizations of matrix A : LU and QR LU Factorization Methods: Using basic Gaussian Elimination (GE) Factorization of Tridiagonal

More information

On Incremental 2-norm Condition Estimators

On Incremental 2-norm Condition Estimators On Incremental 2-norm Condition Estimators Jurjen Duintjer Tebbens Institute of Computer Science Academy of Sciences of the Czech Republic duintjertebbens@cs.cas.cz Miroslav Tůma Institute of Computer

More information

be a Householder matrix. Then prove the followings H = I 2 uut Hu = (I 2 uu u T u )u = u 2 uut u

be a Householder matrix. Then prove the followings H = I 2 uut Hu = (I 2 uu u T u )u = u 2 uut u MATH 434/534 Theoretical Assignment 7 Solution Chapter 7 (71) Let H = I 2uuT Hu = u (ii) Hv = v if = 0 be a Householder matrix Then prove the followings H = I 2 uut Hu = (I 2 uu )u = u 2 uut u = u 2u =

More information

On Orthogonal Block Elimination. Christian Bischof and Xiaobai Sun. Argonne, IL Argonne Preprint MCS-P

On Orthogonal Block Elimination. Christian Bischof and Xiaobai Sun. Argonne, IL Argonne Preprint MCS-P On Orthogonal Block Elimination Christian Bischof and iaobai Sun Mathematics and Computer Science Division Argonne National Laboratory Argonne, IL 6439 fbischof,xiaobaig@mcs.anl.gov Argonne Preprint MCS-P45-794

More information

Algorithms and Perturbation Theory for Matrix Eigenvalue Problems and the SVD

Algorithms and Perturbation Theory for Matrix Eigenvalue Problems and the SVD Algorithms and Perturbation Theory for Matrix Eigenvalue Problems and the SVD Yuji Nakatsukasa PhD dissertation University of California, Davis Supervisor: Roland Freund Householder 2014 2/28 Acknowledgment

More information

Matrix Computations: Direct Methods II. May 5, 2014 Lecture 11

Matrix Computations: Direct Methods II. May 5, 2014 Lecture 11 Matrix Computations: Direct Methods II May 5, 2014 ecture Summary You have seen an example of how a typical matrix operation (an important one) can be reduced to using lower level BS routines that would

More information

9. Numerical linear algebra background

9. Numerical linear algebra background Convex Optimization Boyd & Vandenberghe 9. Numerical linear algebra background matrix structure and algorithm complexity solving linear equations with factored matrices LU, Cholesky, LDL T factorization

More information

Tall-and-skinny! QRs and SVDs in MapReduce

Tall-and-skinny! QRs and SVDs in MapReduce A 1 A 2 A 3 Tall-and-skinny! QRs and SVDs in MapReduce Yangyang Hou " Purdue, CS Austin Benson " Stanford University Paul G. Constantine Col. School. Mines " Joe Nichols U. of Minn James Demmel " UC Berkeley

More information