Avoiding Communication in Distributed-Memory Tridiagonalization
Slide 1: Avoiding Communication in Distributed-Memory Tridiagonalization

SIAM CSE 15
Nicholas Knight, University of California, Berkeley
March 14, 2015

Joint work with: Grey Ballard (SNL), James Demmel (UCB), Laura Grigori (INRIA), Mathias Jacquelin (LBNL), H. Diep Nguyen (UCB), Edgar Solomonik (ETHZ)
Slide 2: The symmetric eigenproblem

Three-phase direct approach for the symmetric eigenproblem:
1. Reduce the symmetric matrix to tridiagonal form by an orthogonal similarity
2. Solve the (symmetric) tridiagonal eigenproblem
3. Apply the inverse orthogonal similarity to the tridiagonal eigenvectors

This talk: improving phase 1 (tridiagonalization) by integrating communication-efficient QR factorization (CAQR).
Slide 3: A class of algorithms for QR factorization

(Q, R) = QR(A), where A is m-by-n with m >= n >= 1:
  If n = 1:
    Compute orthogonal Q such that Q^T A = R in span(e_1)
  Else:
    Partition A = [A11, A12; A21, A22] such that A11 is square and nonempty
    (Q_L, R_1) = QR([A11; A21])          -- panel factorization
    [R_2; B] = Q_L^T [A12; A22]          -- trailing matrix update
    (Q_R, R_3) = QR(B)
    Q = Q_L [I, 0; 0, Q_R]
    R = [R_1, R_2; 0, R_3]
  Return (Q, R)
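The recursion above can be sketched in NumPy, forming Q explicitly for clarity (a real implementation keeps Q in factored form); the function name `qr_recursive` and the default blocking function are illustrative, not from the talk.

```python
import numpy as np

def qr_recursive(A, b=lambda n: max(1, n // 2)):
    """Recursive blocked QR following the slides' structure (a sketch).

    Returns an explicit m-by-m orthogonal Q and m-by-n upper-triangular R;
    b(n) is the blocking function choosing the panel width."""
    A = np.asarray(A, dtype=float)
    m, n = A.shape
    if n == 1:
        # Base case: one Householder reflector maps A into span(e_1).
        v = A[:, 0].copy()
        alpha = -np.sign(v[0]) * np.linalg.norm(v) if v[0] != 0 else -np.linalg.norm(v)
        v[0] -= alpha
        vv = v @ v
        Q = np.eye(m) if vv == 0 else np.eye(m) - (2.0 / vv) * np.outer(v, v)
        return Q, Q.T @ A
    k = b(n)
    QL, R1 = qr_recursive(A[:, :k], b)    # panel factorization
    W = QL.T @ A[:, k:]                   # trailing matrix update
    R2, B = W[:k, :], W[k:, :]
    QR_, R3 = qr_recursive(B, b)
    Q = QL.copy()
    Q[:, k:] = Q[:, k:] @ QR_             # Q = QL * [I, 0; 0, QR]
    R = np.zeros((m, n))                  # R = [R1, R2; 0, R3]
    R[:k, :k], R[:k, k:], R[k:, k:] = R1[:k, :k], R2, R3
    return Q, R

rng = np.random.default_rng(1)
A = rng.standard_normal((12, 7))
Q, R = qr_recursive(A)
```

Any blocking function with b(n) < n for n > 1 terminates, so the variants on the next slide can all be plugged in as the `b` argument.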
Slide 3 (with blocking factor): A class of algorithms for QR factorization

(Q, R) = QR_b(A), where A is m-by-n with m >= n >= 1:
  If n = 1:
    Compute orthogonal Q such that Q^T A = R in span(e_1)
  Else:
    Partition A = [A11, A12; A21, A22] such that A11 is b-by-b
    (Q_L, R_1) = QR_b([A11; A21])
    [R_2; B] = Q_L^T [A12; A22]
    (Q_R, R_3) = QR_b(B)
    Q = Q_L [I, 0; 0, Q_R]
    R = [R_1, R_2; 0, R_3]
  Return (Q, R)

Blocking factor b: b = b(n) in {1, ..., n}; if b(n) > 1, then b(n) > b(b(n)).
Slide 4: Particular choices of the blocking function b

- b = n/2, or b = n/k for k >= 2: recursive
- b = 1: right-looking (unblocked)
- b = max{1, n - 1}: left-looking (unblocked)
- b = b_1 if n > b_1, else 1: right-looking (1-level blocking)
- b = n - b_1 if n > b_1, else n - 1: left-looking (1-level blocking)
- multi-level blocking, ...
Slide 5: Householder representation for QR

Use the compact representation [SVL89] for orthogonal matrices arising in QR factorization (see also [SB95]):

  Q = H(Y) = I - Y T Y^T,

where Y is unit lower trapezoidal and T is invertible upper triangular. T is determined by Y via

  T^{-T} + T^{-1} = Y^T Y,

but in practice T is kept explicit. Q and Q^T can then be applied efficiently via matrix multiplication, which is well optimized:

  Q^T A = A - Y T^T Y^T A.
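A small NumPy sketch of building the compact WY representation column by column, then checking the slide's identity T^{-T} + T^{-1} = Y^T Y; the function name `compact_wy` is illustrative.

```python
import numpy as np

def compact_wy(A):
    """Householder QR of m-by-n A, returning (Y, T, R) with Q = I - Y T Y^T,
    Y unit lower trapezoidal and T upper triangular (a sketch; assumes the
    panel has full column rank)."""
    A = np.array(A, dtype=float)
    m, n = A.shape
    Y = np.zeros((m, n))
    T = np.zeros((n, n))
    for j in range(n):
        v = A[j:, j].copy()
        alpha = -np.sign(v[0]) * np.linalg.norm(v) if v[0] != 0 else -np.linalg.norm(v)
        v[0] -= alpha
        tau = 2.0 * v[0] ** 2 / (v @ v)    # reflector scaled so that Y[j, j] = 1
        y = np.zeros(m)
        y[j:] = v / v[0]
        A -= tau * np.outer(y, y @ A)      # apply the reflector to the trailing matrix
        # Grow T:  T <- [[T, -tau * T (Y^T y)], [0, tau]]
        T[:j, j] = -tau * (T[:j, :j] @ (Y[:, :j].T @ y))
        T[j, j] = tau
        Y[:, j] = y
    return Y, T, np.triu(A[:n, :])

rng = np.random.default_rng(2)
A = rng.standard_normal((10, 4))
Y, T, R = compact_wy(A)
Q = np.eye(10) - Y @ T @ Y.T
```

The check below confirms Q is orthogonal, Q^T A recovers R, and T satisfies the determining identity.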
Slide 6: Householder QR factorization

(Y, R) = QR(A), where A is m-by-n with m >= n >= 1:
  If n = 1:
    Compute a vector Y such that H(Y)^T A = [R; 0] in span(e_1)
  Else:
    Partition A = [A11, A12; A21, A22] such that A11 is square and nonempty
    (Y_L, R_1) = QR([A11; A21])
    [R_2; B] = H(Y_L)^T [A12; A22]
    (Y_R, R_3) = QR(B)
    Y = [Y_L, [0; Y_R]]
    R = [R_1, R_2; 0, R_3]
  Return (Y, R)
Slide 7: Communication-efficient Householder QR

(Sca)LAPACK's (P)xGEQRF uses 1-level blocking to improve data locality in the trailing matrix update (see, e.g., [Pug92]). But the panel factorization can still be a communication bottleneck, e.g., when A is too tall/skinny to benefit from blocking.

Tall/skinny QR (TSQR) [DGHL12] is cheaper than Householder QR when A is sufficiently tall/skinny. Communication-avoiding Householder QR (CAQR) uses TSQR as its base case.

Leading-order costs of QR factorization (right-looking with 1-level blocking) of an m-by-n matrix distributed over p processors in 2D fashion:

  Algorithm      | Flops                | Words                     | Messages
  Householder QR | (2mn^2 - 2n^3/3)/p   | (2mn + n^2/2)/sqrt(p)     | n log p
  CAQR           | (2mn^2 - 2n^3/3)/p   | (2mn + n^2 log p)/sqrt(p) | (7/2) sqrt(p) log^3 p
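Evaluating the table's leading-order formulas makes the tradeoff concrete; this is a sketch of the cost model only (function name illustrative, logarithms taken base 2 here).

```python
import math

def qr_costs(m, n, p):
    """Leading-order costs from the table (flops, words, messages) for an
    m-by-n matrix on p processors in a sqrt(p)-by-sqrt(p) 2D layout (sketch)."""
    flops = (2 * m * n**2 - 2 * n**3 / 3) / p
    sp, lg = math.sqrt(p), math.log2(p)
    return {
        "Householder QR": (flops, (2 * m * n + n**2 / 2) / sp, n * lg),
        "CAQR": (flops, (2 * m * n + n**2 * lg) / sp, 3.5 * sp * lg**3),
    }

# Example: a large square matrix on 1024 processors. CAQR sends far fewer
# messages (latency cost), at the price of a modestly larger word volume.
costs = qr_costs(2**17, 2**17, 2**10)
```

For square matrices the message count drops from n log p to O(sqrt(p) log^3 p); the tall/skinny regime is handled directly by the TSQR base case.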
Slide 8: Tall/skinny QR (TSQR)

Key benefits of TSQR:
- sequential: read A once
- parallel: one reduction

Q is represented as a tree of factors (Y_i)_i: Q = prod_i H(Y_i).
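A minimal serial simulation of the TSQR reduction: factor row blocks locally, then repeatedly stack and refactor pairs of R factors up a binary tree. Only R is formed here; the implicit tree of local Q factors is checked indirectly via R^T R = A^T A. The function name `tsqr_r` is illustrative.

```python
import numpy as np

def tsqr_r(A, nblocks=8):
    """R factor of tall-skinny A via a binary TSQR reduction tree (sketch)."""
    # Leaves: local QR of each row block.
    Rs = [np.linalg.qr(blk, mode='r') for blk in np.array_split(A, nblocks)]
    # Tree: stack neighboring R factors pairwise and refactor.
    while len(Rs) > 1:
        Rs = [np.linalg.qr(np.vstack(Rs[i:i + 2]), mode='r')
              for i in range(0, len(Rs), 2)]
    return Rs[0]

rng = np.random.default_rng(3)
A = rng.standard_normal((1000, 5))
R = tsqr_r(A)
```

In parallel, each leaf lives on its own processor and the while-loop is a single reduction, which is the source of the latency savings.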
Slide 9: Communication-avoiding QR (CAQR)

CAQR uses TSQR for the panel factorization and applies Q^T to the trailing matrix by traversing the tree. This is more complicated than applying I - Y T^T Y^T.
Slide 10: CAQR -- Householder QR factorization using TSQR

((Y_i)_i, R) = QR(A), where A is m-by-n with m >= n >= 1:
  If A is sufficiently tall/skinny:
    ((Y_i)_i, R) = TSQR(A)
  Else:
    Partition A = [A11, A12; A21, A22] such that A11 is square and nonempty
    ((Y_{L,i})_i, R_1) = QR([A11; A21])
    [R_2; B] = (prod_i H(Y_{L,i}))^T [A12; A22]
    ((Y_{R,i})_i, R_3) = QR(B)
    (Y_i)_i = concat((Y_{L,i})_i, ([0; Y_{R,i}])_i)
    R = [R_1, R_2; 0, R_3]
  Return ((Y_i)_i, R)
Slide 11: Householder reconstruction for TSQR

TSQR with Householder reconstruction (TSQR-HR) [BDG+14]: stably convert the TSQR representation (Y_i)_i to the Householder representation Y.

CAQR-HR: simple and well-optimized trailing matrix update, and performance portability (xGEMM is likely more tuned than xTPRFB).

Leading-order costs of QR factorization (right-looking with 1-level blocking) of an m-by-n matrix distributed over p processors in 2D fashion:

  Algorithm      | Flops                | Words                     | Messages
  Householder QR | (2mn^2 - 2n^3/3)/p   | (2mn + n^2/2)/sqrt(p)     | n log p
  CAQR           | (2mn^2 - 2n^3/3)/p   | (2mn + n^2 log p)/sqrt(p) | (7/2) sqrt(p) log^3 p
  CAQR-HR        | (2mn^2 - 2n^3/3)/p   | (2mn + n^2/2)/sqrt(p)     | 6 sqrt(p) log^2 p
Slide 12: CAQR-HR -- CAQR with Householder reconstruction

(Y, R) = QR(A), where A is m-by-n with m >= n >= 1:
  If A is sufficiently tall/skinny:
    (Y, R) = TSQR-HR(A)
  Else:
    Partition A = [A11, A12; A21, A22] such that A11 is square and nonempty
    (Y_L, R_1) = QR([A11; A21])
    [R_2; B] = H(Y_L)^T [A12; A22]
    (Y_R, R_3) = QR(B)
    Y = [Y_L, [0; Y_R]]
    R = [R_1, R_2; 0, R_3]
  Return (Y, R)
Slide 13: Householder reconstruction

Key idea: let A = QR be a (thin) QR factorization, written in Householder form as A = QR = (I - Y T Y_1^T) R, where Y_1 is the top block of Y. Rearranging gives an LU factorization:

  [A_1 - R; A_2] = Y (-T Y_1^T R) = L U.
Slide 14: Householder reconstruction

If A_1 - R is nonsingular, let Y = L, where

  (L, U) = LU([A_1 - R; A_2]),   so that   [A_1 - R; A_2] = Y (-T Y_1^T R).
If A_1 - SR is nonsingular for some diagonal sign matrix S, instead compute A = Q^ R^ = (QS)(SR), and let Y^ = L, where

  (L, U) = LU([A_1 - SR; A_2]),   so that   [A_1 - SR; A_2] = Y^ (-T^ Y^_1^T SR).
It is always possible to pick S such that Q_1 - S is nonsingular, with the largest elements of [Q_1 - S; Q_2] on the diagonal:

  (L, U) = LU([Q_1 - S; Q_2]),   so that   [Q_1 - S; Q_2] = Y^ (-T^ Y^_1^T S).
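The reconstruction can be sketched directly: run an unpivoted LU elimination on Q, choosing each sign s_i from the current diagonal entry so the pivot has magnitude at least 1, then recover T from U via U = -T Y_1^T S. The function name `householder_from_q` is illustrative of the [BDG+14] construction, not code from the talk.

```python
import numpy as np

def householder_from_q(Q):
    """Given m-by-b Q with orthonormal columns, reconstruct (Y, T, s) such
    that I - Y T Y^T has first b columns equal to Q @ diag(s) (a sketch)."""
    m, b = Q.shape
    B = Q.copy()
    s = np.empty(b)
    for i in range(b):
        s[i] = -1.0 if B[i, i] >= 0 else 1.0   # s_i = -sign of current diagonal
        B[i, i] -= s[i]                        # pivot magnitude is now >= 1
        B[i + 1:, i] /= B[i, i]                # L column (unit diagonal implicit)
        B[i + 1:, i + 1:] -= np.outer(B[i + 1:, i], B[i, i + 1:])
    Y = np.tril(B, -1)                         # unit lower trapezoidal Y = L
    np.fill_diagonal(Y, 1.0)
    U = np.triu(B[:b, :b])
    T = -U @ np.diag(s) @ np.linalg.inv(Y[:b, :b]).T   # from U = -T Y_1^T S
    return Y, T, s

rng = np.random.default_rng(4)
Q, _ = np.linalg.qr(rng.standard_normal((20, 5)))
Y, T, s = householder_from_q(Q)
H = np.eye(20) - Y @ T @ Y.T
```

In the distributed setting Q comes from TSQR and only the small LU and triangular solves are new work, which is what keeps CAQR-HR's costs close to CAQR's.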
Slide 15: Numerical stability

TSQR-HR is as stable as Householder QR (up to constants) [BDG+14]:
- Use TSQR to compute a (thin) QR factorization
- LU factorization (well conditioned, no pivoting)

Errors on tall and skinny matrices (m = 1000, b = 200): for condition numbers kappa(A) ranging from 1e+02 up to 1e+15, both the backward error ||A - QR||_F and the orthogonality error ||I - Q^T Q||_F stay on the order of 1e-15 (at worst around 1e-14), independent of kappa(A).
Slide 16: Conclusions

Use communication-efficient QR factorization for tridiagonalization:
- [BDG+14]: approach based on TSQR
- [AHW14]: approach based on Cholesky-QR (with dynamic block size)
- [FNYY14]: approach based on Cholesky-QR (with iterative refinement)

Ongoing work: 3D QR factorization
- Use the recursive approach: b = n/2
- Use 3D matrix multiplication for the trailing matrix update
- Use fewer processors on the base cases (TSQR-HR)
- O(p^{1/6})-fold decrease in # words, O(p^{1/6})-fold increase in # messages

Thank you! knight@cs.berkeley.edu
References

[AHW14] T. Auckenthaler, T. Huckle, and R. Wittmann. A blocked QR-decomposition for the parallel symmetric eigenvalue problem. Parallel Computing, 40(7), 2014.

[BDG+14] G. Ballard, J. Demmel, L. Grigori, M. Jacquelin, H. D. Nguyen, and E. Solomonik. Reconstructing Householder vectors from tall-skinny QR. In Parallel and Distributed Processing Symposium, 2014 IEEE 28th International. IEEE, 2014.

[DGHL12] J. Demmel, L. Grigori, M. Hoemmen, and J. Langou. Communication-optimal parallel and sequential QR and LU factorizations. SIAM Journal on Scientific Computing, 34(1):A206-A239, 2012.

[FNYY14] T. Fukaya, Y. Nakatsukasa, Y. Yanagisawa, and Y. Yamamoto. CholeskyQR2: a simple and communication-avoiding algorithm for computing a tall-skinny QR factorization on a large-scale parallel system. In Proceedings of the 5th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems. IEEE Press, 2014.

[Pug92] C. Puglisi. Modification of the Householder method based on the compact WY representation. SIAM Journal on Scientific and Statistical Computing, 13(3), 1992.

[SB95] X. Sun and C. Bischof. A basis-kernel representation of orthogonal matrices. SIAM Journal on Matrix Analysis and Applications, 16(4), 1995.

[SVL89] R. Schreiber and C. Van Loan. A storage-efficient WY representation for products of Householder transformations. SIAM Journal on Scientific and Statistical Computing, 10(1):53-57, 1989.

[Tis07] A. Tiskin. Communication-efficient parallel generic pairwise elimination. Future Generation Computer Systems, 23(2), 2007.
Fast QR decomposition of HODLR matrices Daniel Kressner Ana Šušnjara Abstract The efficient and accurate QR decomposition for matrices with hierarchical low-rank structures, such as HODLR and hierarchical
More informationLower Bounds on Algorithm Energy Consumption: Current Work and Future Directions. March 1, 2013
Lower Bounds on Algorithm Energy Consumption: Current Work and Future Directions James Demmel, Andrew Gearhart, Benjamin Lipshitz and Oded Schwartz Electrical Engineering and Computer Sciences University
More informationSolving large scale eigenvalue problems
arge scale eigenvalue problems, Lecture 5, March 23, 2016 1/30 Lecture 5, March 23, 2016: The QR algorithm II http://people.inf.ethz.ch/arbenz/ewp/ Peter Arbenz Computer Science Department, ETH Zürich
More informationEigenvalue Problems and Singular Value Decomposition
Eigenvalue Problems and Singular Value Decomposition Sanzheng Qiao Department of Computing and Software McMaster University August, 2012 Outline 1 Eigenvalue Problems 2 Singular Value Decomposition 3 Software
More informationPreliminary/Qualifying Exam in Numerical Analysis (Math 502a) Spring 2012
Instructions Preliminary/Qualifying Exam in Numerical Analysis (Math 502a) Spring 2012 The exam consists of four problems, each having multiple parts. You should attempt to solve all four problems. 1.
More informationLecture 9: Numerical Linear Algebra Primer (February 11st)
10-725/36-725: Convex Optimization Spring 2015 Lecture 9: Numerical Linear Algebra Primer (February 11st) Lecturer: Ryan Tibshirani Scribes: Avinash Siravuru, Guofan Wu, Maosheng Liu Note: LaTeX template
More informationCOMPUTER SCIENCE 515 Numerical Linear Algebra SPRING 2006 ASSIGNMENT # 4 (39 points) February 27
Due Friday, March 1 in class COMPUTER SCIENCE 1 Numerical Linear Algebra SPRING 26 ASSIGNMENT # 4 (9 points) February 27 1. (22 points) The goal is to compare the effectiveness of five different techniques
More informationSection 4.5 Eigenvalues of Symmetric Tridiagonal Matrices
Section 4.5 Eigenvalues of Symmetric Tridiagonal Matrices Key Terms Symmetric matrix Tridiagonal matrix Orthogonal matrix QR-factorization Rotation matrices (plane rotations) Eigenvalues We will now complete
More informationTotal least squares. Gérard MEURANT. October, 2008
Total least squares Gérard MEURANT October, 2008 1 Introduction to total least squares 2 Approximation of the TLS secular equation 3 Numerical experiments Introduction to total least squares In least squares
More informationPerformance of Random Sampling for Computing Low-rank Approximations of a Dense Matrix on GPUs
Performance of Random Sampling for Computing Low-rank Approximations of a Dense Matrix on GPUs Théo Mary, Ichitaro Yamazaki, Jakub Kurzak, Piotr Luszczek, Stanimire Tomov, Jack Dongarra presenter 1 Low-Rank
More informationLINEAR ALGEBRA: NUMERICAL METHODS. Version: August 12,
LINEAR ALGEBRA: NUMERICAL METHODS. Version: August 12, 2000 74 6 Summary Here we summarize the most important information about theoretical and numerical linear algebra. MORALS OF THE STORY: I. Theoretically
More informationLow Rank Approximation of a Sparse Matrix Based on LU Factorization with Column and Row Tournament Pivoting
Low Rank Approximation of a Sparse Matrix Based on LU Factorization with Column and Row Tournament Pivoting James Demmel Laura Grigori Sebastien Cayrols Electrical Engineering and Computer Sciences University
More informationMatrix decompositions
Matrix decompositions How can we solve Ax = b? 1 Linear algebra Typical linear system of equations : x 1 x +x = x 1 +x +9x = 0 x 1 +x x = The variables x 1, x, and x only appear as linear terms (no powers
More informationNumerical Methods I: Eigenvalues and eigenvectors
1/25 Numerical Methods I: Eigenvalues and eigenvectors Georg Stadler Courant Institute, NYU stadler@cims.nyu.edu November 2, 2017 Overview 2/25 Conditioning Eigenvalues and eigenvectors How hard are they
More informationComputing least squares condition numbers on hybrid multicore/gpu systems
Computing least squares condition numbers on hybrid multicore/gpu systems M. Baboulin and J. Dongarra and R. Lacroix Abstract This paper presents an efficient computation for least squares conditioning
More informationToday s class. Linear Algebraic Equations LU Decomposition. Numerical Methods, Fall 2011 Lecture 8. Prof. Jinbo Bi CSE, UConn
Today s class Linear Algebraic Equations LU Decomposition 1 Linear Algebraic Equations Gaussian Elimination works well for solving linear systems of the form: AX = B What if you have to solve the linear
More informationApplied Linear Algebra
Applied Linear Algebra Peter J. Olver School of Mathematics University of Minnesota Minneapolis, MN 55455 olver@math.umn.edu http://www.math.umn.edu/ olver Chehrzad Shakiban Department of Mathematics University
More informationA hybrid Hermitian general eigenvalue solver
Available online at www.prace-ri.eu Partnership for Advanced Computing in Europe A hybrid Hermitian general eigenvalue solver Raffaele Solcà *, Thomas C. Schulthess Institute fortheoretical Physics ETHZ,
More informationNumerical Linear Algebra Primer. Ryan Tibshirani Convex Optimization
Numerical Linear Algebra Primer Ryan Tibshirani Convex Optimization 10-725 Consider Last time: proximal Newton method min x g(x) + h(x) where g, h convex, g twice differentiable, and h simple. Proximal
More informationlecture 2 and 3: algorithms for linear algebra
lecture 2 and 3: algorithms for linear algebra STAT 545: Introduction to computational statistics Vinayak Rao Department of Statistics, Purdue University August 27, 2018 Solving a system of linear equations
More informationLU Factorization. LU Decomposition. LU Decomposition. LU Decomposition: Motivation A = LU
LU Factorization To further improve the efficiency of solving linear systems Factorizations of matrix A : LU and QR LU Factorization Methods: Using basic Gaussian Elimination (GE) Factorization of Tridiagonal
More informationOn Incremental 2-norm Condition Estimators
On Incremental 2-norm Condition Estimators Jurjen Duintjer Tebbens Institute of Computer Science Academy of Sciences of the Czech Republic duintjertebbens@cs.cas.cz Miroslav Tůma Institute of Computer
More informationbe a Householder matrix. Then prove the followings H = I 2 uut Hu = (I 2 uu u T u )u = u 2 uut u
MATH 434/534 Theoretical Assignment 7 Solution Chapter 7 (71) Let H = I 2uuT Hu = u (ii) Hv = v if = 0 be a Householder matrix Then prove the followings H = I 2 uut Hu = (I 2 uu )u = u 2 uut u = u 2u =
More informationOn Orthogonal Block Elimination. Christian Bischof and Xiaobai Sun. Argonne, IL Argonne Preprint MCS-P
On Orthogonal Block Elimination Christian Bischof and iaobai Sun Mathematics and Computer Science Division Argonne National Laboratory Argonne, IL 6439 fbischof,xiaobaig@mcs.anl.gov Argonne Preprint MCS-P45-794
More informationAlgorithms and Perturbation Theory for Matrix Eigenvalue Problems and the SVD
Algorithms and Perturbation Theory for Matrix Eigenvalue Problems and the SVD Yuji Nakatsukasa PhD dissertation University of California, Davis Supervisor: Roland Freund Householder 2014 2/28 Acknowledgment
More informationMatrix Computations: Direct Methods II. May 5, 2014 Lecture 11
Matrix Computations: Direct Methods II May 5, 2014 ecture Summary You have seen an example of how a typical matrix operation (an important one) can be reduced to using lower level BS routines that would
More information9. Numerical linear algebra background
Convex Optimization Boyd & Vandenberghe 9. Numerical linear algebra background matrix structure and algorithm complexity solving linear equations with factored matrices LU, Cholesky, LDL T factorization
More informationTall-and-skinny! QRs and SVDs in MapReduce
A 1 A 2 A 3 Tall-and-skinny! QRs and SVDs in MapReduce Yangyang Hou " Purdue, CS Austin Benson " Stanford University Paul G. Constantine Col. School. Mines " Joe Nichols U. of Minn James Demmel " UC Berkeley
More information