Bindel, Fall 2016 Matrix Computations (CS 6210)
Notes for 2016-10-31


1 Algorithms

There are several flavors of symmetric eigenvalue solvers for which there is no equivalent (stable) nonsymmetric solver. We discuss four algorithmic ideas: the workhorse QR algorithm that we have already seen, the Jacobi iteration, divide-and-conquer, and bisection with inverse iteration. The details of these algorithms are quite technical (particularly the divide-and-conquer method and bisection with inverse iteration). But even if you are not planning to focus on numerical linear algebra, you should know a little about the shape of these algorithms. Why? Three reasons:

- Intellectual fun!
- To make an informed choice of algorithms.
- To re-use building blocks.

2 Symmetric QR

2.1 The eigenvalues of symmetric A

Like the nonsymmetric QR iteration, the symmetric QR iteration involves an initial reduction, but to tridiagonal form. This is really the same as the Hessenberg reduction step; but a symmetric Hessenberg matrix is tridiagonal, and we can take advantage of that. Similarly, when we perform bulge-chasing, the intermediate matrices remain symmetric, and so we never have a matrix that is more than a few elements away from tridiagonal.

For the symmetric case, there is a small difference in how we choose shifts: Wilkinson shifts are fine, since the eigenvalues are all real and there is no need for the Francis double-shift strategy. But otherwise, the main difference is that each step of the tridiagonal QR iteration maps between representations with O(n) parameters in O(n) time. Each eigenvalue converges in roughly a constant number of iterations, so the cost to compute all eigenvalues of a tridiagonal matrix is O(n^2). Compared to the O(n^3) cost of reducing to tridiagonal form in the first place, the cost of solving for the eigenvalues of the tridiagonal is thus quite modest.
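To make the step concrete, here is a minimal sketch (not from the notes; the function name and structure are mine) of one shifted QR step on a symmetric tridiagonal matrix, using an explicit QR factorization and the Wilkinson shift. A real implementation would perform the same update implicitly by bulge chasing in O(n) work per step.

% Sketch: one explicit shifted QR step on a symmetric tridiagonal T.
function T = tridiag_qr_step(T)
n = length(T);
% Wilkinson shift: eigenvalue of the trailing 2-by-2 block closest to T(n,n)
a = T(n-1,n-1); b = T(n-1,n); c = T(n,n);
d = (a - c)/2;
s = sign(d); if s == 0, s = 1; end
sigma = c - s*b^2/(abs(d) + sqrt(d^2 + b^2));
% Explicit shifted QR step: factor T - sigma*I = QR, then set T = RQ + sigma*I
[Q,R] = qr(T - sigma*eye(n));
T = R*Q + sigma*eye(n);
T = (T + T')/2;   % symmetrize away roundoff; T remains tridiagonal

Repeating this step (and deflating once the trailing off-diagonal entry is negligible) drives T toward diagonal form.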

If we want all the eigenvalues of a sparse matrix, and only want the eigenvalues, the algorithm is basically the fastest option. But if we want eigenvectors as well, then the QR iteration is more expensive, costing an additional O(n^3); other methods run faster.

2.2 QR iteration for singular values

Now consider the case of computing the singular values of a matrix A. We could compute the singular values directly from the eigenvalues of the Gram matrix A^T A (or A A^T, or the Golub-Kahan matrix). But the backward error associated with tridiagonal reduction is proportional to the norm of the matrix A^T A (i.e. the squared norm of A), and this can look quite big compared to the squares of the smallest singular values. So rather than work with A^T A explicitly, we prefer to manipulate A in order to run the same algorithm implicitly.

The first step of the QR iteration for the singular value problem is thus bidiagonalization; that is, we compute
\[
  A = \hat{U} B \hat{V}^T,
\]
where B is an upper bidiagonal matrix. Note that
\[
  A^T A = \hat{V} B^T B \hat{V}^T = \hat{V} T \hat{V}^T,
\]
i.e. B is the Cholesky factor of the tridiagonal matrix T that we would obtain by tridiagonalization of the Gram matrix A^T A. But we can compute B directly by alternately applying transformations to A from the left and the right.

After the bidiagonal reduction, we want to do implicit QR steps. As a shift, we use the square root of the trailing corner element of the tridiagonal B^T B; in terms of B, this is just the norm of the last column:
\[
  \sigma = \sqrt{b_{n-1,n}^2 + b_{n,n}^2}.
\]
With this shift in hand, we could apply the first step of shifted QR and complete the process implicitly via bulge chasing in the same way we did for the nonsymmetric case. In practice, there is an alternate algorithm (the dqds method) that enjoys extra stability benefits, allowing us to compute the singular values of B to high relative accuracy. (Of course, we usually lose high relative accuracy of the small singular values through the initial reduction to bidiagonal form; the backward error for that reduction is only small relative to the norm of A.)
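For reference, here is a minimal sketch (not from the notes) of the bidiagonal reduction by Householder reflections. The function names are mine, the code assumes m >= n, and it returns only B, discarding the orthogonal factors that a real code would accumulate.

% Sketch: Golub-Kahan bidiagonalization A = U*B*V' (U and V not formed here).
function B = bidiagonalize(A)
[m,n] = size(A);                       % assume m >= n
for k = 1:n
    % Left reflection: zero out A(k+1:m, k)
    [v,beta] = house(A(k:m,k));
    A(k:m,k:n) = A(k:m,k:n) - beta*v*(v'*A(k:m,k:n));
    if k <= n-2
        % Right reflection: zero out A(k, k+2:n)
        [v,beta] = house(A(k,k+1:n)');
        A(k:m,k+1:n) = A(k:m,k+1:n) - beta*(A(k:m,k+1:n)*v)*v';
    end
end
B = triu(A) - triu(A,2);               % keep the diagonal and first superdiagonal

% Householder vector: (I - beta*v*v')*x is a multiple of e_1 with magnitude norm(x)
function [v,beta] = house(x)
sigma = norm(x);
v = x;
if sigma == 0, beta = 0; return; end
if x(1) >= 0, v(1) = x(1) + sigma; else, v(1) = x(1) - sigma; end
beta = 2/(v'*v);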

3 Jacobi iteration

A Jacobi rotation is a symmetric transformation that diagonalizes a 2 × 2 (sub)matrix:
\[
  J^T A J = \Lambda.
\]
In terms of scalars, this means solving
\[
  \begin{bmatrix} c & s \\ -s & c \end{bmatrix}^T
  \begin{bmatrix} \alpha & \beta \\ \beta & \gamma \end{bmatrix}
  \begin{bmatrix} c & s \\ -s & c \end{bmatrix}
  =
  \begin{bmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{bmatrix},
\]
where c = cos(θ) and s = sin(θ). A numerically stable method for carrying out this computation is described in Golub and Van Loan (or on Wikipedia!); we will leave the details aside for the purposes of these notes, apart from the sketch after the sweep code below.

Given an algorithm for computing Jacobi rotations, the idea of the Jacobi iteration is to incrementally apply Jacobi rotations to each 2 × 2 principal submatrix of A in turn. The code is remarkably simple:

% [A] = jacobi_sweep(A, nsweeps)
%
% Apply nsweeps Jacobi iteration sweeps (default is 1).
%
function [A] = jacobi_sweep(A, nsweeps)

n = length(A);
if nargin < 2, nsweeps = 1; end

for sweep = 1:nsweeps
  for k = 2:n
    for l = 1:k-1
      J = jacobi_rot(A(k,k), A(k,l), A(l,l));
      A([k l], :) = J'*A([k l], :);
      A(:, [k l]) = A(:, [k l])*J;
    end
  end
end

Each time we apply a Jacobi rotation to rows and columns k and l, we reduce the sum of squares off the main diagonal by a_{kl}^2. The iteration converges quadratically, and typically we can stop after 5-10 sweeps.
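The routine jacobi_rot is not spelled out in the notes. A minimal sketch, assuming the standard stable formulas from Golub and Van Loan and matching the calling convention in jacobi_sweep above, might look like this:

% Sketch: given alpha = A(k,k), beta = A(k,l), gamma = A(l,l) for symmetric A,
% return J = [c s; -s c] such that J'*[alpha beta; beta gamma]*J is diagonal.
function J = jacobi_rot(alpha, beta, gamma)
if beta == 0
    c = 1; s = 0;
else
    tau = (gamma - alpha)/(2*beta);
    % pick the smaller-magnitude tangent root, so the rotation angle is at most pi/4
    if tau >= 0
        t = 1/(tau + sqrt(1 + tau^2));
    else
        t = -1/(-tau + sqrt(1 + tau^2));
    end
    c = 1/sqrt(1 + t^2);
    s = t*c;
end
J = [c s; -s c];

Choosing the smaller root keeps the rotation angle at most π/4, which is the standard choice for numerical stability of the sweep.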

Each sweep costs O(n^3), which is comparable to the cost of the tridiagonalization step that precedes the QR iteration. Hence, Jacobi iteration is rather slow. On the other hand, it tends to compute the small eigenvalues of A much more accurately than competing methods that start with a tridiagonalization step.

3.1 Divide and conquer

The symmetric QR iteration and the Jacobi iteration are methods that an appropriately motivated student could reasonably code. (I do not really recommend this, but it is not implausible.) The divide-and-conquer method, on the other hand, is much more numerically subtle. Nonetheless, the ideas are quite interesting, and it is worth spending a moment describing the strategy at a high level.

Like the symmetric QR iteration, the divide-and-conquer method is preceded by a tridiagonalization process. After tridiagonalization, we think of the tridiagonal matrix as a block 2 × 2 matrix
\[
  T =
  \begin{bmatrix} T_{11} & \beta e_k e_1^T \\ \beta e_1 e_k^T & T_{22} \end{bmatrix}
  =
  \begin{bmatrix} \tilde{T}_{11} & 0 \\ 0 & \tilde{T}_{22} \end{bmatrix}
  + \beta
  \begin{bmatrix} e_k \\ e_1 \end{bmatrix}
  \begin{bmatrix} e_k \\ e_1 \end{bmatrix}^T,
\]
where T_{11} and T_{22} are tridiagonal submatrices of the original tridiagonal, and \tilde{T}_{11} and \tilde{T}_{22} are these submatrices with β subtracted from a corner entry. We compute the eigendecompositions Q_{11}^T \tilde{T}_{11} Q_{11} = D_1 (and similarly for \tilde{T}_{22}) by applying the divide-and-conquer method recursively. We then have
\[
  Q^T T Q = D + \beta u u^T,
\]
where Q is a block diagonal orthogonal matrix with diagonal blocks Q_{11} and Q_{22}, and u consists of the last row of Q_{11} and the first row of Q_{22} stacked atop each other.

Assuming we can perform this reduction, we now need to find solutions to
\[
  \det(D + \beta u u^T - \lambda I) = 0.
\]
Without loss of generality, consider the case where u has no zero elements. (If there is a zero element in u, we have a converged eigenvalue on the diagonal, and can deflate it away.) Then none of the diagonal entries of D are eigenvalues of A, but we can write the eigenvalues of A as solutions to
\[
  f(\lambda) = \det(D + \beta u u^T - \lambda I)/\det(D - \lambda I) = 1 + \beta u^T (D - \lambda I)^{-1} u,
\]

where we have used the identity (a good homework problem)
\[
  \det(I + X Y^T) = \det(I + Y^T X).
\]
The equation f(λ) = 0 is known as a secular equation, and it is a particularly nice type of rational function. The values d_i are poles of f, and there is one solution to f(λ) = 0 between each pair of adjacent poles, as well as one that is either smaller than min d_i or greater than max d_i. We can compute solutions to the secular equation very efficiently using a variant of Newton's method. Naively, this iteration would seem to require O(n) time per step in order to evaluate f and its derivatives at any given point. However, the evaluation time can be reduced significantly using the fast multipole method, so that the overall time is close to O(1) per step; as a consequence, finding all the solutions to one secular equation takes O(n) time.

The difficulty in the divide-and-conquer algorithm lies primarily in obtaining accurate eigenvector estimates. The problem occurs when eigenvalues cluster together; in this case, one tends to compute eigenvector estimates which look good on their own, but which are not orthogonal to each other. Fixing this problem while retaining good performance was a technical tour de force, and we will not even attempt to do it full justice here. Suffice it to say that the divide-and-conquer algorithm is quite fast if we want all eigenvalues and eigenvectors, particularly if the eigenvectors are clustered.
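To make the secular equation concrete, here is a small sketch (not from the notes; the name secular_root and the use of plain bisection are mine) that finds the root of f between two adjacent poles. A production code would instead use a carefully safeguarded Newton-like iteration on a rational model of f.

% Sketch: solve 1 + beta*sum(u.^2./(d - lambda)) = 0 for the root between
% the adjacent poles d(i) and d(i+1); assumes d sorted increasing and beta > 0.
function lambda = secular_root(d, u, beta, i, tol)
f  = @(lam) 1 + beta*sum(u.^2 ./ (d - lam));
lo = d(i); hi = d(i+1);
if nargin < 5, tol = eps*(abs(lo) + abs(hi)); end
for iter = 1:200
    mid = (lo + hi)/2;
    if f(mid) > 0, hi = mid; else, lo = mid; end   % f increases from -inf to +inf on (lo, hi)
    if hi - lo <= tol, break; end
end
lambda = (lo + hi)/2;

Sweeping i over 1, ..., n-1 (plus one more search to the right of max d_i when β > 0) gives all n eigenvalues of D + β u u^T.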

4 Bisection with inverse iteration

Our final algorithm for the tridiagonal eigenvalue problem is based on Sylvester's inertia theorem, which says that congruent matrices have the same inertia. (As with QR and divide-and-conquer, the bisection iteration is typically preceded by a tridiagonal reduction.) In particular, this means that if we perform the symmetric factorization T - σI = LDL^T, the number of positive, negative, and zero diagonal entries of D is the same as the number of eigenvalues of A that are respectively greater than, less than, and equal to σ. The bisection algorithm recursively partitions an initial interval containing all eigenvalues of T into smaller intervals that either

- Contain no eigenvalues of A;
- Contain exactly one eigenvalue of A; or
- Are smaller in length than some tolerance.

The center point of each interval containing exactly one eigenvalue is a reasonable estimate for that eigenvalue, good enough to guarantee convergence of a shift-invert iteration. And this is the bisection algorithm in a nutshell.

The main technical difficulty with the bisection algorithm lies not in the bisection step (which is remarkably well behaved even though it involves an unpivoted factorization of an indefinite matrix), but in the computation of the eigenvectors. As with the divide-and-conquer scheme, bisection with inverse iteration tends to yield eigenvector estimates which individually have small residuals, but which may not be adequately orthogonal for vectors that correspond to eigenvalues in a cluster. One could use explicit re-orthogonalization, but the modern Grail code (the RRR routines in LAPACK nomenclature) manages stability in a much more subtle and clever way. This was the thesis work of Inderjit Dhillon before he turned his energies to machine learning and data mining; more recently, the SVD case was the prize-winning thesis work of Paul Willems. The fact that the algorithm merits multiple highly regarded PhD theses should tell you something of the subtleties in getting it right.

The Grail code is so called because it has optimal complexity for computing eigenvectors (given the eigenvalues): to obtain k eigenvectors requires only O(kn) time. This is generally the fastest way to compute a specified subset of eigenvectors, and is often the fastest way to compute all eigenvectors.
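The inertia computation at the heart of each bisection step is cheap for a tridiagonal matrix, since the diagonal of the LDL^T factorization of T - σI satisfies a simple recurrence. A minimal sketch (not from the notes; the name sturm_count and the zero guard are mine):

% Sketch: count the eigenvalues of the symmetric tridiagonal matrix T that are
% less than sigma.  alpha = main diagonal (length n), beta = off-diagonal (length n-1).
function count = sturm_count(alpha, beta, sigma)
n = length(alpha);
count = 0;
d = alpha(1) - sigma;                  % D(1,1) in T - sigma*I = L*D*L'
if d < 0, count = count + 1; end
for k = 2:n
    if d == 0, d = eps; end            % crude guard against division by zero
    d = (alpha(k) - sigma) - beta(k-1)^2/d;
    if d < 0, count = count + 1; end
end

Comparing sturm_count at the two endpoints of an interval tells us how many eigenvalues the interval contains, which is exactly the information the recursive partitioning needs.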