Parallel Numerical Algorithms


Chapter 5: Eigenvalue Problems, Section 5.1
Michael T. Heath and Edgar Solomonik
Department of Computer Science, University of Illinois at Urbana-Champaign
CS 554 / CSE 512


For given m x n matrix A, with m > n, the QR factorization has the form

    A = Q [ R ]
          [ O ]

where matrix Q is m x m and orthogonal, and R is n x n and upper triangular. It can be used to solve linear systems, least squares problems, and eigenvalue problems. As with Gaussian elimination, zeros are introduced successively into matrix A, eventually reaching upper triangular form, but using orthogonal transformations instead of elementary eliminators.
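As a small illustration of the least-squares use case (a NumPy sketch, not from the slides), the factorization reduces min ||Ax - b||_2 to a triangular solve with R:

```python
import numpy as np

# Solve an overdetermined system min ||Ax - b||_2 via QR:
# A = QR, then x = R^{-1} (Q^T b).  A and b are random examples.
rng = np.random.default_rng(0)
m, n = 6, 3
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)

Q, R = np.linalg.qr(A)           # reduced QR: Q is m x n, R is n x n upper triangular
x = np.linalg.solve(R, Q.T @ b)  # triangular system R x = Q^T b

# Agrees with the reference least-squares solution
x_ref = np.linalg.lstsq(A, b, rcond=None)[0]
print(np.allclose(x, x_ref))     # True
```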

Methods for QR factorization:
- Householder transformations (elementary reflectors)
- Givens transformations (plane rotations)
- Gram-Schmidt orthogonalization

Householder transformation has form

    H = I - 2 v v^T / (v^T v)

where v is a nonzero vector. From the definition, H = H^T = H^{-1}, so H is both orthogonal and symmetric. For given vector a, choose v so that

    H a = [ alpha ]
          [   0   ]  =  alpha e_1
          [  ...  ]
          [   0   ]

Substituting into the formula for H, we see that we can take v = a - alpha e_1, and to preserve the norm we must have alpha = +/- ||a||_2, with the sign chosen to avoid cancellation.
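This construction can be checked numerically (a small NumPy sketch, not from the slides; `householder_vector` is an illustrative helper name):

```python
import numpy as np

# Build the Householder vector v = a - alpha*e1 with
# alpha = -sign(a[0]) * ||a||_2, the sign choice that avoids
# cancellation in the first component of v.
def householder_vector(a):
    alpha = -np.linalg.norm(a) if a[0] >= 0 else np.linalg.norm(a)
    v = a.copy()
    v[0] -= alpha
    return v, alpha

a = np.array([3.0, 1.0, 5.0, 1.0])          # ||a||_2 = 6
v, alpha = householder_vector(a)
H = np.eye(4) - 2.0 * np.outer(v, v) / (v @ v)
print(np.allclose(H @ a, alpha * np.eye(4)[:, 0]))  # H a = alpha e1: True
```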

Householder QR factorization:

    for k = 1 to n
        alpha_k = -sign(a_kk) * sqrt(a_kk^2 + ... + a_mk^2)
        v_k = [0 ... 0 a_kk ... a_mk]^T - alpha_k e_k
        beta_k = v_k^T v_k
        if beta_k = 0 then continue with next k
        for j = k to n
            gamma_j = v_k^T a_j
            a_j = a_j - (2 gamma_j / beta_k) v_k
        end
    end

Basis-Kernel Representations

A Householder matrix H is represented by H = I - u u^T, i.e., a rank-1 perturbation of the identity. We can combine r Householder matrices H_1, ..., H_r into a rank-r perturbation of the identity,

    H = H_1 ... H_r = I - U V^T,   where U, V in R^(n x r)

Often V = U T, where T is upper triangular and U is lower triangular, yielding

    H = I - U T^T U^T

If H_i = I - u_i u_i^T, then the ith column of U is u_i, while T is defined by T^{-1} + T^{-T} = U^T U.
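The defining identity can be verified numerically (a NumPy sketch, not from the slides; whether T or T^T appears in the rank-r form depends on the product-order convention, and here the product H_1 ... H_r matches I - U T U^T):

```python
import numpy as np

# r normalized reflectors H_i = I - u_i u_i^T (||u_i||^2 = 2 makes
# each H_i orthogonal), combined into a rank-r perturbation of I.
rng = np.random.default_rng(1)
n, r = 6, 3
U = rng.standard_normal((n, r))
U *= np.sqrt(2.0) / np.linalg.norm(U, axis=0)

# T^{-1} + T^{-T} = U^T U with T^{-1} upper triangular implies that
# T^{-1} is the upper-triangular part of G = U^T U with halved diagonal.
G = U.T @ U
Tinv = np.triu(G)
np.fill_diagonal(Tinv, np.diag(G) / 2.0)
T = np.linalg.inv(Tinv)

# Accumulate the product H_1 H_2 ... H_r explicitly for comparison.
H = np.eye(n)
for i in range(r):
    H = H @ (np.eye(n) - np.outer(U[:, i], U[:, i]))

print(np.allclose(H, np.eye(n) - U @ T @ U.T))                 # rank-r form: True
print(np.allclose(np.linalg.inv(T) + np.linalg.inv(T).T, G))   # slide's identity: True
```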

Parallel Householder QR

The basis-kernel representation of Householder transformations allows us to update a trailing matrix B as

    H B = (I - U T^T U^T) B = B - U (T^T (U^T B))

with cost O(n^2 r). Performing such updates is essentially as hard as Schur complement updates in LU. Forming the Householder vector v_k is also analogous to computing multipliers in Gaussian elimination. Thus, a parallel implementation is similar to parallel LU, but with Householder vectors broadcast horizontally instead of multipliers.

Panel

Finding the Householder vector u_i requires computing the norm of the leading vector of the ith trailing matrix, creating a latency bottleneck much like that of pivot-row selection in partial pivoting. Other methods need only S = Theta(log(p)) messages rather than Theta(n). For example, Cholesky-QR and Cholesky-QR2 perform R = Cholesky(A^T A), Q = A R^{-1} (Cholesky-QR2 does one additional step of refinement), requiring only a single allreduce, but losing stability. Unconditional stability and O(log(p)) messages are achieved by the TSQR algorithm with row-wise recursion (akin to tournament pivoting). The basis-kernel representation can be recovered by constructing the first r columns of H.
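Cholesky-QR is simple enough to sketch in a few lines (NumPy illustration, not from the slides; forming A^T A is the single allreduce in a distributed setting, and squaring the condition number is the source of the stability loss):

```python
import numpy as np

# Cholesky-QR: R from a Cholesky factorization of the Gram matrix
# A^T A, then Q = A R^{-1} via a triangular solve.
def cholesky_qr(A):
    G = A.T @ A                      # Gram matrix (one allreduce in parallel)
    R = np.linalg.cholesky(G).T      # upper-triangular Cholesky factor
    Q = np.linalg.solve(R.T, A.T).T  # Q = A R^{-1}
    return Q, R

rng = np.random.default_rng(2)
A = rng.standard_normal((100, 5))    # tall-skinny, well conditioned
Q, R = cholesky_qr(A)
print(np.allclose(Q @ R, A))            # reproduces A: True
print(np.allclose(Q.T @ Q, np.eye(5)))  # Q orthonormal here: True
```

Cholesky-QR2 would repeat the same step on Q to refine its orthogonality.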

A Givens rotation operates on a pair of rows to introduce a single zero. For a given 2-vector a = [a_1 a_2]^T, if

    c = a_1 / sqrt(a_1^2 + a_2^2),   s = a_2 / sqrt(a_1^2 + a_2^2)

then

    G a = [  c  s ] [ a_1 ]  =  [ alpha ]
          [ -s  c ] [ a_2 ]     [   0   ]

The scalars c and s are the cosine and sine of the angle of rotation, and c^2 + s^2 = 1, so G is orthogonal.
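A single rotation is easy to check numerically (NumPy sketch, not from the slides):

```python
import numpy as np

# Construct the Givens rotation that zeros the second component
# of the 2-vector [a1, a2]^T.
def givens(a1, a2):
    r = np.hypot(a1, a2)
    c, s = a1 / r, a2 / r
    return np.array([[c, s], [-s, c]]), r

a = np.array([3.0, 4.0])
G, alpha = givens(a[0], a[1])
print(np.allclose(G @ a, [alpha, 0.0]))  # rotates a onto [alpha, 0]^T: True
print(np.allclose(G @ G.T, np.eye(2)))   # G is orthogonal: True
```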

Givens QR

Givens rotations can be applied systematically to successive pairs of rows of matrix A to zero the entire strict lower triangle. Subdiagonal entries can be annihilated in various possible orderings (but once introduced, zeros should be preserved). Each rotation must be applied to all entries in the relevant pair of rows, not just the entries determining c and s. Once upper triangular form is reached, the product of rotations, Q, is orthogonal, so we have the QR factorization of A.

Parallel Givens

With 1-D partitioning of A by columns, a parallel implementation of Givens QR factorization is similar to parallel Householder QR factorization, with cosines and sines broadcast horizontally and each task updating its portion of the relevant rows. With 1-D partitioning of A by rows, broadcast of cosines and sines is unnecessary, but there is no parallelism unless multiple pairs of rows are processed simultaneously. Fortunately, it is possible to process multiple pairs of rows simultaneously without interfering with each other.

Parallel Givens

The stage at which each subdiagonal entry can be annihilated is shown here for an 8 x 8 example:

     7
     6  8
     5  7  9
     4  6  8 10
     3  5  7  9 11
     2  4  6  8 10 12
     1  3  5  7  9 11 13

Maximum parallelism is n/2, reached at stage n - 1, for an n x n matrix.
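A short Python sketch reproduces the schedule (the closed form n - i + 2j - 1 for entry (i, j) is inferred from the table above, not stated on the slide):

```python
# Stage at which subdiagonal entry (i, j), i > j, is annihilated
# in this ordering of Givens rotations (formula inferred from the
# 8 x 8 table: each column advances by 2, each row up by 1).
def annihilation_stage(n, i, j):
    return n - i + 2 * j - 1

n = 8
for i in range(2, n + 1):
    print(" ".join(f"{annihilation_stage(n, i, j):2d}" for j in range(1, i)))
```

The last stage is 2n - 3 (13 for n = 8), matching the table's largest entry.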

Parallel Givens QR Wavefront [figure omitted]

Parallel Givens

Communication cost is high, but can be reduced by having each task initially reduce its entire local set of rows to upper triangular form, which requires no communication. Then, in a subsequent phase, task pairs cooperate in annihilating additional entries using one row from each of two tasks, exchanging data as necessary. Various strategies can be used for combining results of the first phase, depending on the underlying network topology. Parallel partitioning with slanted panels (slope -2) achieves the same scalability as parallel algorithms for LU without pivoting (see [Tiskin 2007]).
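The two-phase idea can be sketched with a sequential NumPy simulation (not from the slides; the binary combining tree below is one of the "various strategies", and each "task" is just a block of rows here):

```python
import numpy as np

# Phase 1: each task reduces its local rows to triangular form
# (no communication).  Phase 2: pairs of triangles are stacked and
# re-triangularized up a binary tree, as in TSQR.
rng = np.random.default_rng(3)
blocks = [rng.standard_normal((8, 4)) for _ in range(4)]  # 4 tasks, 8 local rows each

tris = [np.linalg.qr(B)[1] for B in blocks]               # phase 1: local R factors

while len(tris) > 1:                                      # phase 2: pairwise combine
    tris = [np.linalg.qr(np.vstack(tris[i:i + 2]))[1]
            for i in range(0, len(tris), 2)]
R = tris[0]

# R agrees with the R factor of the full stacked matrix up to row signs.
R_full = np.linalg.qr(np.vstack(blocks))[1]
print(np.allclose(np.abs(R), np.abs(R_full)))             # True
```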

Parallel Givens

With 2-D partitioning of A, a parallel implementation combines features of the 1-D column and 1-D row algorithms. In particular, sets of rows can be processed simultaneously to annihilate multiple entries, but updating the rows requires horizontal broadcast of cosines and sines.

References

E. Chu and A. George, QR factorization of a dense matrix on a hypercube multiprocessor, SIAM J. Sci. Stat. Comput. 11:990-1028, 1990.
M. Cosnard, J. M. Muller, and Y. Robert, Parallel QR decomposition of a rectangular matrix, Numer. Math. 48:239-249, 1986.
M. Cosnard and Y. Robert, Complexity of parallel QR factorization, J. ACM 33:712-723, 1986.
E. Elmroth and F. G. Gustavson, Applying recursion to serial and parallel QR factorization leads to better performance, IBM J. Res. Develop. 44:605-624, 2000.

B. Hendrickson, Parallel QR factorization using the torus-wrap mapping, Parallel Comput. 19:1259-1271, 1993.
F. T. Luk, A rotation method for computing the QR-decomposition, SIAM J. Sci. Stat. Comput. 7:452-459, 1986.
D. P. O'Leary and P. Whitman, Parallel QR factorization by Householder and modified Gram-Schmidt algorithms, Parallel Comput. 16:99-112, 1990.
A. Pothen and P. Raghavan, Distributed orthogonal factorization: Givens and Householder algorithms, SIAM J. Sci. Stat. Comput. 10:1113-1134, 1989.
A. Tiskin, Communication-efficient parallel generic pairwise elimination, Future Generation Computer Systems 23(2):179-188, 2007.