Algorithms and Perturbation Theory for Matrix Eigenvalue Problems and the SVD


Algorithms and Perturbation Theory for Matrix Eigenvalue Problems and the SVD
Yuji Nakatsukasa
PhD dissertation, University of California, Davis
Supervisor: Roland Freund
Householder 2014

2/28 Acknowledgment
For the supervision and support: Zhaojun Bai, Nick Higham, Françoise Tisseur.
For the collaboration and friendship: Kensuke Aishima, Rüdiger Borsdorf, Stefan Güttel, Vanni Noferini, Alex Townsend.

3/28 Dissertation content: references

I. Matrix decomposition algorithms
- N., Aishima, Yamazaki. dqds with aggressive early deflation. SIMAX, 2012.
- N., Bai, Gygi. Optimizing Halley's iteration for the polar decomposition. SIMAX, 2010.
- N., Higham. Backward stability of polar decomposition algorithms. SIMAX, 2012.
- N., Higham. Spectral divide-and-conquer algorithms for symeig and the SVD. SISC, 2013.

II. Eigenvalue perturbation theory
- Li, N., Truhar, Xu. Perturbation for partitioned Hermitian GEP. SIMAX, 2011.
- N. Absolute/relative Weyl theorem for GEP. LAA, 2010.
- N. Perturbation of a multiple generalized eigenvalue. BIT, 2010.
- N. Gerschgorin-type theorem for GEP in the Euclidean metric. Math. Comp., 2011.
- N. Perturbation for Hermitian block tridiagonal matrices. APNUM, 2012.
- N. Condition numbers of a multiple generalized eigenvalue. Numer. Math., 2012.
- N. The tan θ theorem with relaxed conditions. LAA, 2012.

4/28 Dissertation content: table of contents

I. Matrix decomposition algorithms
- Spectral divide-and-conquer algorithms for eigenproblems: a polar decomposition algorithm (type (3^k, 3^k − 1) Zolotarev) for symeig and the SVD; led to Zolotarev-based algorithms (Tuesday's talk) and generalized eigenproblems; stability proof for polar, symeig, and SVD [N., Higham SIMAX (12), SISC (13)].
- Bidiagonal singular values: dqds + aggressive early deflation [N., Aishima, Yamazaki SIMAX (12)].

II. Eigenvalue perturbation theory
- Weyl-type bounds for generalized eigenproblems
- Off-diagonal and block tridiagonal perturbation
- Eigenvector bounds, tan θ theorem
- Gerschgorin theory for generalized eigenproblems

Today's plan: a few tricks I learned, showing how perturbation theory inspires algorithm design.

5/28 Tricks I've learned

1. (Almost) all matrix iterations employ rational approximation. Examples: QR algorithm, expm, polar, shift-invert Arnoldi.

2. An O(ε) off-diagonal perturbation results in an O(ε²) change in the eigenvalues [Li, Li (05)]:
$$\left|\mathrm{eig}\begin{bmatrix} A_1 & E^T \\ E & A_2 \end{bmatrix} - \mathrm{eig}\begin{bmatrix} A_1 & 0 \\ 0 & A_2 \end{bmatrix}\right| \le \frac{\|E\|^2}{\mathrm{gap}},$$
even in the generalized nonsymmetric case [Li, N., Truhar, Xu SIMAX (11)]:
$$\left|\mathrm{eig}\!\left(\begin{bmatrix} A_1 & E_1^T \\ E_2 & A_2 \end{bmatrix} - \lambda\begin{bmatrix} B_1 & F_1^T \\ F_2 & B_2 \end{bmatrix}\right) - \mathrm{eig}\!\left(\begin{bmatrix} A_1 & 0 \\ 0 & A_2 \end{bmatrix} - \lambda\begin{bmatrix} B_1 & 0 \\ 0 & B_2 \end{bmatrix}\right)\right| \le \frac{(\|E\| + |\lambda|\,\|F\|)^2}{\mathrm{gap}(A_1 - \lambda B_1,\, A_2 - \lambda B_2)}.$$
This can also be proved by a Gerschgorin-type argument [N., Math. Comp. (11)].

3. The influence of diagonal blocks connected by k off-diagonals of O(ε) decays like $O(\epsilon^k/\mathrm{gap})$ [Paige LAA (74), N., APNUM (12)].
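To make trick 2 concrete, here is a minimal NumPy check (not from the dissertation; the block sizes and test matrices are arbitrary illustrative choices) that the eigenvalue change under an O(ε) off-diagonal perturbation scales like ε²/gap:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
A1 = np.diag([1.0, 2.0, 3.0, 4.0])      # spectra of A1 and A2 separated by
A2 = np.diag([10.0, 11.0, 12.0, 13.0])  # gap = 10 - 4 = 6
gap = 6.0
E0 = rng.standard_normal((n, n))
E0 /= np.linalg.norm(E0, 2)             # normalize so ||E||_2 = eps below
Z = np.zeros((n, n))
A_block = np.block([[A1, Z], [Z, A2]])

for eps in [1e-2, 1e-4, 1e-6]:
    A_pert = np.block([[A1, eps * E0.T], [eps * E0, A2]])
    # eigvalsh returns sorted eigenvalues, so the difference pairs them up
    change = np.max(np.abs(np.linalg.eigvalsh(A_pert) - np.linalg.eigvalsh(A_block)))
    print(f"eps = {eps:.0e}: eig change = {change:.2e}, bound eps^2/gap = {eps**2/gap:.2e}")
```

The printed change drops by four orders of magnitude each time ε drops by two, the quadratic behavior the bound predicts.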

6/28 Polar decomposition A = U_p H: algorithms

Scaled Newton iteration (type (2,1) Zolotarev):
$$X_{k+1} = \frac{1}{2}\left(\mu_k X_k + \mu_k^{-1} X_k^{-*}\right), \qquad X_0 = A.$$
- Higham (1986): gave the optimal $\mu_k$ and a cheap approximation.
- Byers-Xu (2008): $\zeta_{k+1} = \sqrt{2/(\zeta_k + 1/\zeta_k)}$, $\zeta_0 = 1/\sqrt{ab}$, with $a \approx \|A\|_2$, $b \approx \sigma_{\min}(A)$.

QDWH (QR-based dynamically weighted Halley), type (3,2) Zolotarev [N., Bai & Gygi (2010)]:
$$X_{k+1} = X_k (a_k I + b_k X_k^* X_k)(I + c_k X_k^* X_k)^{-1}, \qquad X_0 = A/\alpha.$$
Convergence is cubic; 6 iterations suffice in double precision.

QR-based DWH:
$$\begin{bmatrix} \sqrt{c_k}\, X_k \\ I \end{bmatrix} = \begin{bmatrix} Q_1 \\ Q_2 \end{bmatrix} R, \qquad X_{k+1} = \frac{b_k}{c_k} X_k + \frac{1}{\sqrt{c_k}}\left(a_k - \frac{b_k}{c_k}\right) Q_1 Q_2^*.$$

Are the algorithms backward stable? (Experimentally, yes.)
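As a concrete illustration, here is a minimal NumPy sketch of the QR-based QDWH iteration above. The weight formulas for a_k, b_k, c_k follow the published QDWH recipe; the stopping tolerance, the crude initial lower bound on σ_min, and the name qdwh_polar are illustrative choices, not the dissertation's implementation:

```python
import numpy as np

def qdwh_polar(A, maxit=20, tol=1e-14):
    """Sketch of QR-based QDWH: returns (U, H) with A ~ U @ H, U unitary."""
    alpha = np.linalg.norm(A, 2)               # estimate of ||A||_2
    X = A / alpha
    n = X.shape[1]
    l = 1.0 / np.linalg.cond(X)                # crude lower bound on sigma_min(X)
    for _ in range(maxit):
        # dynamically weighted Halley coefficients (QDWH parameter formulas)
        g = (4.0 * (1.0 - l**2) / l**4) ** (1.0 / 3.0)
        a = np.sqrt(1.0 + g) + 0.5 * np.sqrt(
            8.0 - 4.0 * g + 8.0 * (2.0 - l**2) / (l**2 * np.sqrt(1.0 + g)))
        b = (a - 1.0) ** 2 / 4.0
        c = a + b - 1.0
        # inverse-free evaluation via QR of the stacked matrix [sqrt(c) X; I]
        Q, _ = np.linalg.qr(np.vstack([np.sqrt(c) * X, np.eye(n)]))
        Q1, Q2 = Q[: X.shape[0], :], Q[X.shape[0]:, :]
        Xnew = (b / c) * X + (a - b / c) / np.sqrt(c) * (Q1 @ Q2.conj().T)
        l = l * (a + b * l**2) / (1.0 + c * l**2)   # how the bound maps forward
        done = np.linalg.norm(Xnew - X, "fro") <= tol * np.linalg.norm(Xnew, "fro")
        X = Xnew
        if done:
            break
    U = X
    UA = U.conj().T @ A
    H = (UA + UA.conj().T) / 2.0               # Hermitian polar factor
    return U, H

A = np.random.default_rng(1).standard_normal((8, 8))
U, H = qdwh_polar(A)
print(np.linalg.norm(U @ H - A) / np.linalg.norm(A))   # backward error ~ 1e-15
```

With l = 1 the coefficients reduce to (a, b, c) = (3, 1, 3), i.e. plain Halley $f(x) = x(3 + x^2)/(1 + 3x^2)$, which is one way to sanity-check the weight formulas.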

7/28 Backward stability

Assume $\hat H$ is Hermitian. The algorithm is backward stable if
$$\hat U_p \hat H = A + \Delta A, \quad \|\Delta A\| = \epsilon\|A\|, \qquad \hat H = H + \Delta H, \quad \|\Delta H\| = \epsilon\|H\|, \qquad \hat U_p = U_p + \Delta U, \quad \|\Delta U\| = \epsilon\|U_p\|,$$
where H is Hermitian positive semidefinite and U_p is unitary.

Crucial consequence: the resulting symeig and SVD algorithms are backward stable [N. and Higham, SISC (13)].

We develop a global analysis of iterations for the polar decomposition that proves some iterations are backward stable and correctly predicts that others are not. Strategy: take account of the rounding errors within each iteration and of the error propagation between iterations. Key fact: the Hermitian factor H is well-conditioned [Bhatia (94), Higham (08)].
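These conditions are easy to probe numerically. A minimal sanity check using SciPy's polar routine as a stand-in for a backward stable algorithm (scipy.linalg.polar is not the method analyzed here, just a convenient reference implementation):

```python
import numpy as np
from scipy.linalg import polar

rng = np.random.default_rng(0)
A = rng.standard_normal((50, 50))
U, H = polar(A)                    # A = U @ H, H symmetric positive semidefinite

print(np.linalg.norm(U @ H - A, 2) / np.linalg.norm(A, 2))  # residual ||UH - A|| / ||A||
print(np.linalg.norm(U.T @ U - np.eye(50), 2))              # distance from orthogonality
print(np.linalg.norm(H - H.T, 2))                           # Hermitian-ness of H
```

All three quantities should be of order the unit roundoff, matching the backward stability definition above.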

8/28 Statement

Suppose:
1. Iteration form: $X_{k+1} = f_k(X_k)$, $X_0 = A$, $X_k \to U_p$.
2. Mixed stable evaluation of the iteration: there is an $\tilde X_k \in \mathbb{C}^{n\times n}$ such that
$$\hat X_{k+1} = f_k(\tilde X_k) + \epsilon\|\hat X_{k+1}\|_2, \qquad \tilde X_k = \hat X_k + \epsilon\|\hat X_k\|_2.$$
3. Mapping function condition: $f_k$ does not significantly decrease the relative size of the singular values $\sigma_i$:
$$\frac{f_k(\sigma_i)}{\|f_k(\tilde X_k)\|_2} \ge \frac{1}{d}\left(\frac{\sigma_i}{\|\tilde X_k\|_2}\right), \qquad d \gtrsim 1.$$

Theorem 1. Suppose $\|\hat X_\ell^* \hat X_\ell - I\| = \epsilon$, and let $\hat U_p = \hat X_\ell$ and $\hat H = \frac{1}{2}\big(\hat U_p^* A + (\hat U_p^* A)^*\big)$. Then
$$\hat U_p \hat H = A + d\epsilon\|A\|_2, \qquad \hat H = H + d\epsilon\|H\|_2,$$
where H is the Hermitian polar factor of A. Furthermore, $\hat U_p = U_p + d\epsilon\,\kappa_2(A)$.

9/28 Condition on f_k: good mappings

[Plots of f(x) on [m, M]: both mappings move the interval toward 1 without shrinking the relative size of any singular value.]

- QDWH iteration: $f(x) = x\,\dfrac{a + bx^2}{1 + cx^2}$ is a stable mapping, d = 1.
- Scaled Newton iteration: $f(x) = \frac{1}{2}\big(\mu x + (\mu x)^{-1}\big)$ is a stable mapping, d = 1.

10/28 Condition on f_k: bad mappings

[Plot of f(x) on [m, M]: these mappings shrink the relative size of some singular values.]

- Inverse Newton iteration: $f(x) = 2\mu x(1 + \mu^2 x^2)^{-1}$ is an unstable mapping.
- Newton-Schulz iteration: $f(x) = \frac{1}{2}x(3 - x^2)$ is an unstable mapping if $M \approx \sqrt{3}$.
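The mapping condition can be checked numerically by sampling f on [m, M] and comparing each singular value's relative size before and after the map. A small sketch (the interval, the scaling, and the grid are arbitrary illustrative choices):

```python
import numpy as np

scaled_newton = lambda x, mu: 0.5 * (mu * x + 1.0 / (mu * x))    # good mapping
inv_newton    = lambda x, mu: 2.0 * mu * x / (1.0 + (mu * x)**2)  # bad mapping

m, M = 1e-6, 1.0
mu = 1.0 / np.sqrt(m * M)             # 1/sqrt(mM)-type scaling
x = np.geomspace(m, M, 100001)        # surrogate singular values in [m, M]

for name, f in [("scaled Newton", scaled_newton), ("inverse Newton", inv_newton)]:
    fx = f(x, mu)
    # mapping condition: (f(sigma)/max f) / (sigma/M) should stay >= 1/d, d ~ 1
    worst = np.min((fx / fx.max()) / (x / M))
    print(f"{name}: worst ratio after/before = {worst:.2e}")
# scaled Newton keeps the worst ratio at ~1; inverse Newton crushes it (~2e-3
# here), because f decays for x > 1/mu and shrinks the largest singular values.
```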

11/28 QDWH is stable

QR-based implementation (QDWH):
$$\begin{bmatrix} \sqrt{c_k}\, X_k \\ I \end{bmatrix} = \begin{bmatrix} Q_1 \\ Q_2 \end{bmatrix} R, \qquad X_{k+1} = \frac{b_k}{c_k} X_k + \frac{1}{\sqrt{c_k}}\left(a_k - \frac{b_k}{c_k}\right) Q_1 Q_2^*.$$

Use Householder QR factorization with column pivoting and row sorting (or row pivoting). The QR factorization then has row-wise backward errors of order $\rho_i u$, where the growth factors satisfy $\rho_i \le (1 + \sqrt{2})^{n-1}$ (Cox and Higham, 1998); the $\rho_i$ are usually small in practice. One can prove that the mixed stable evaluation condition holds. No pivoting is fine in practice. But the blocking order matters:
$$\begin{bmatrix} I \\ \sqrt{c_k}\, X_k \end{bmatrix} = \begin{bmatrix} Q_2 \\ Q_1 \end{bmatrix} R \quad \text{is unstable.}$$

12/28 Scaled Newton stability

The mixed stable evaluation condition holds if the matrix inverse is computed by a mixed backward-forward stable method, and the condition on f_k holds.

Conclusion: scaled Newton is backward stable.

History:
- Higham (85): raised the question of backward stability.
- Kielbasiński, Ziętak (03): long and complicated analysis proving backward stability, assuming matrix inverses are computed in a mixed backward-forward stable way.
- Byers, Xu (08): a proof with much simpler arguments, but with some incompleteness in the analysis [Kielbasiński, Ziętak (10)].

13/28 Extra: is the (degree-17) Zolotarev polar iteration stable?

1. Mixed stable evaluation of the iteration? There is an $\tilde X_k \in \mathbb{C}^{n\times n}$ such that
$$\hat X_{k+1} = f_k(\tilde X_k) + \epsilon\|\hat X_{k+1}\|_2, \qquad \tilde X_k = \hat X_k + \epsilon\|\hat X_k\|_2.$$
2. Mapping function condition: $f_k$ does not significantly decrease the relative size of $\sigma_i$:
$$\frac{f_k(\sigma_i)}{\|f_k(\tilde X_k)\|_2} \ge \frac{1}{d}\left(\frac{\sigma_i}{\|\tilde X_k\|_2}\right), \qquad d \gtrsim 1.$$

[Plot: the type (7,6) Zolotarev mapping f(x) on [0, 1].]

14/28 Recap: tricks I've learned

1. (Almost) all matrix iterations employ rational approximation: QR algorithm, Zolotarev-(pd, eig, SVD).

2. An O(ε) off-diagonal perturbation results in an O(ε²) change in the eigenvalues [Li, Li (05)]:
$$\left|\mathrm{eig}\begin{bmatrix} A_1 & E^T \\ E & A_2 \end{bmatrix} - \mathrm{eig}\begin{bmatrix} A_1 & 0 \\ 0 & A_2 \end{bmatrix}\right| \le \frac{\|E\|^2}{\mathrm{gap}},$$
even in the generalized nonsymmetric case [Li, N., Truhar, Xu (11)]:
$$\left|\mathrm{eig}\!\left(\begin{bmatrix} A_1 & E_1^T \\ E_2 & A_2 \end{bmatrix} - \lambda\begin{bmatrix} B_1 & F_1^T \\ F_2 & B_2 \end{bmatrix}\right) - \mathrm{eig}\!\left(\begin{bmatrix} A_1 & 0 \\ 0 & A_2 \end{bmatrix} - \lambda\begin{bmatrix} B_1 & 0 \\ 0 & B_2 \end{bmatrix}\right)\right| \le \frac{(\|E\| + |\lambda|\,\|F\|)^2}{\mathrm{gap}(A_1 - \lambda B_1,\, A_2 - \lambda B_2)},$$
provable also via a Gerschgorin-type argument [N. (11)].

3. The influence of diagonal blocks connected by k off-diagonals of O(ε) decays like $O(\epsilon^k/\mathrm{gap})$ [N., APNUM (12)].

15/28 Recap: tricks I've learned

3. The influence of diagonal blocks connected by k off-diagonals of O(ε) decays like $O(\epsilon^k/\mathrm{gap})$. For a symmetric tridiagonal matrix and its trailing submatrix
$$A = \begin{bmatrix} a_1 & e_1 \\ e_1 & a_2 & e_2 \\ & e_2 & \ddots & \ddots \\ & & \ddots & a_{n-1} & e_{n-1} \\ & & & e_{n-1} & a_n \end{bmatrix}, \qquad \hat A = A(k{+}1{:}\mathrm{end},\, k{+}1{:}\mathrm{end}),$$
we have
$$\big|\mathrm{eig}(A) - \mathrm{eig}_m(\hat A)\big| \le \prod_{i=k+1}^{m} \frac{e_i^2}{|a_i - a_k|}, \qquad m = k+1, \dots, n,$$
so many eigenvalues of $\hat A$ match an eigenvalue of A. Proof idea: $\lambda = x^* A x$, where the eigenvector x decays exponentially away from its dominant block.
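A quick NumPy illustration of this decay (the matrix size, ε, and the diagonal are arbitrary illustrative choices): the eigenvalues of the trailing submatrix match eigenvalues of the full matrix far more accurately than the O(ε) size of the discarded couplings would suggest.

```python
import numpy as np

n, k, eps = 12, 3, 1e-3
a = np.arange(1.0, n + 1)                # well-separated diagonal, gap ~ 1
e = eps * np.ones(n - 1)                 # O(eps) off-diagonals
A = np.diag(a) + np.diag(e, 1) + np.diag(e, -1)
Ahat = A[k:, k:]                         # trailing submatrix A(k+1:end, k+1:end)

eigA = np.linalg.eigvalsh(A)
for mu in np.linalg.eigvalsh(Ahat):
    print(f"eig(Ahat) = {mu:.10f}   dist to eig(A) = {np.min(np.abs(eigA - mu)):.2e}")
# the distances are O(eps^2) or far smaller, not O(eps), and they shrink
# further the deeper the eigenvalue sits inside the trailing block
```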

16/28 Standard SVD algorithm [Golub and Kahan (1965)]

1. Reduce A to bidiagonal form B via Householder reflections $H_L$ (left) and $H_R$ (right): $A = U_A B V_A^*$, where $U_A = \prod H_L$ and $V_A = \prod H_R$.
2. Compute the SVD of the bidiagonal matrix: $B = U_B \Sigma V_B^*$.
   - Singular values Σ via dqds.
   - Singular vectors $U_B$, $V_B$ via inverse iteration.
3. Assemble the SVD: $A = (U_A U_B)\,\Sigma\,(V_A V_B)^* = U\Sigma V^*$.
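A minimal sketch of step 1, Golub-Kahan bidiagonalization by Householder reflections (the orthogonal factors are not accumulated here, and the function names are illustrative):

```python
import numpy as np

def house(x):
    """Householder vector v with (I - 2 v v^T / v^T v) x = -sign(x[0]) ||x|| e_1."""
    v = x.astype(float).copy()
    v[0] += np.copysign(np.linalg.norm(x), x[0] if x[0] != 0 else 1.0)
    return v

def bidiagonalize(A):
    """Reduce A (m >= n) to upper bidiagonal form; orthogonal factors not kept."""
    B = A.astype(float).copy()
    m, n = B.shape
    for j in range(n):
        v = house(B[j:, j])                          # zero column j below diagonal
        B[j:, j:] -= 2.0 * np.outer(v, v @ B[j:, j:]) / (v @ v)
        if j < n - 2:
            v = house(B[j, j + 1:])                  # zero row j past superdiagonal
            B[:, j + 1:] -= 2.0 * np.outer(B[:, j + 1:] @ v, v) / (v @ v)
    return B

A = np.random.default_rng(1).standard_normal((6, 4))
B = bidiagonalize(A)
# the orthogonal transformations preserve singular values:
print(np.linalg.svd(A, compute_uv=False) - np.linalg.svd(B, compute_uv=False))
```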

17/28 Computing bidiagonal singular values: historical aspects

- QR algorithm applied to $B^T B$: absolute accuracy [Golub and Kahan (1965)]: $|\hat\sigma_i - \sigma_i| \le O(n)\,\sigma_{\max}\,\epsilon$.
- Refined QR: attains high relative accuracy [Demmel and Kahan (1990)]: $|\hat\sigma_i - \sigma_i| \le 69 n^2 \sigma_i\,\epsilon$.
- dqds: 4-fold speedup + higher relative accuracy [Fernando and Parlett (1994)]: $|\hat\sigma_i - \sigma_i| \le 4 n \sigma_i\,\epsilon$.

Typical relative accuracy for B with $\sigma_{\max} = 1$ and $\sigma_{\min}(B) = 10^{-15}$:

             σ_max    σ_min
QR           1e-15    1e-1
Refined QR   1e-15    1e-14
dqds         1e-15    1e-15

18/28 dqds: pseudocode

The bidiagonal matrix is stored in squared form, $q_i = (B_{i,i})^2$, $e_i = (B_{i,i+1})^2$:
$$B = \begin{bmatrix} \sqrt{q_1} & \sqrt{e_1} \\ & \sqrt{q_2} & \ddots \\ & & \ddots & \sqrt{e_{n-1}} \\ & & & \sqrt{q_n} \end{bmatrix}.$$

Algorithm 1: the dqds algorithm.
for m := 0, 1, ... do
    choose a shift s (≥ 0)
    d_1 := q_1 − s
    for i := 1, ..., n − 1 do
        q̂_i := d_i + e_i
        ê_i := e_i q_{i+1} / q̂_i
        d_{i+1} := d_i q_{i+1} / q̂_i − s
    end for
    q̂_n := d_n
end for

[Flow over time: estimate shift s → dqds sweep → repeat.]

- Root-free: $e_i \to 0$ and $\sqrt{q_i} \to \sigma_i$ with guaranteed high relative accuracy.
- Sequential in nature; has been difficult to parallelize.
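For concreteness, a runnable sketch of the zero-shift variant (dqd) with the conventional bottom-of-the-matrix deflation from the next slide; a production dqds would add shifts, splitting at negligible interior e_i, and more careful deflation criteria:

```python
import numpy as np

def dqd_singular_values(q, e, tol=1e-15, maxit=100000):
    """Singular values of the bidiagonal matrix with diagonal sqrt(q) and
    superdiagonal sqrt(e), via zero-shift dqd sweeps (a teaching sketch)."""
    q, e = list(map(float, q)), list(map(float, e))
    sigmas = []
    for _ in range(maxit):
        if len(q) == 1:
            break
        d = q[0]                         # one dqd sweep (shift s = 0)
        for i in range(len(q) - 1):
            q[i] = d + e[i]
            e[i] = e[i] * q[i + 1] / q[i]
            d = d * q[i + 1] / q[i]
        q[-1] = d
        # conventional deflation: a negligible trailing e isolates a converged q
        while e and e[-1] <= tol * q[-1]:
            e.pop()
            sigmas.append(np.sqrt(q.pop()))
    sigmas.extend(np.sqrt(x) for x in q)
    return np.sort(sigmas)[::-1]

# compare against an explicit SVD of the same bidiagonal matrix
q, e = [4.0, 2.25, 1.0], [0.25, 0.25]
B = np.diag(np.sqrt(q)) + np.diag(np.sqrt(e), 1)
print(dqd_singular_values(q, e))
print(np.linalg.svd(B, compute_uv=False))
```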

19/28 dqds with conventional deflation strategy

Typically, running dqds results in
$$B = \begin{bmatrix} \sqrt{q_1} & \sqrt{e_1} \\ & \sqrt{q_2} & \sqrt{e_2} \\ & & \ddots & \ddots \\ & & & \sqrt{q_{n-1}} & \sqrt{e_{n-1}} \\ & & & & \sqrt{q_n} \end{bmatrix}$$
with $e_{n-1} \to 0$ at convergence factor $\dfrac{\sigma_n^2 - s}{\sigma_{n-1}^2 - s} < 1$.

- When $e_{n-1}$ is negligibly small, set it to 0.
- Then $q_n$ is isolated: a converged singular value.
- Remove the last row and column (deflation), and repeat.

20/28 Aggressive deflation for non-Hermitian eigenproblems [Braman, Byers, Mathias (2003)]

Partition the Hessenberg matrix H with window size k (block sizes n − k − 1, 1, k):
$$H = \begin{bmatrix} H_{11} & H_{12} & H_{13} \\ H_{21} & H_{22} & H_{23} \\ 0 & H_{32} & H_{33} \end{bmatrix}.$$
Compute the Schur decomposition $H_{33} = V T V^*$ (T triangular). Then
$$\begin{bmatrix} I & & \\ & 1 & \\ & & V \end{bmatrix}^* \begin{bmatrix} H_{11} & H_{12} & H_{13} \\ H_{21} & H_{22} & H_{23} \\ 0 & H_{32} & H_{33} \end{bmatrix} \begin{bmatrix} I & & \\ & 1 & \\ & & V \end{bmatrix} = \begin{bmatrix} H_{11} & H_{12} & H_{13}V \\ H_{21} & H_{22} & H_{23}V \\ 0 & t & T \end{bmatrix}.$$
Find the negligible elements of the spike vector $t = V^* H_{32}$ and deflate. This results in significant speed-ups.
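A small SciPy sketch of the spike computation (the nearly converged Hessenberg matrix below is synthetic; in the real algorithm the QR iteration produces it):

```python
import numpy as np
from scipy.linalg import schur

rng = np.random.default_rng(3)
n, k = 40, 10
# synthetic nearly converged Hessenberg: triangular plus a tiny subdiagonal
H = np.triu(rng.standard_normal((n, n))) + np.diag(1e-4 * rng.standard_normal(n - 1), -1)

T, V = schur(H[n - k:, n - k:])          # Schur form of the trailing k-by-k window
spike = H[n - k, n - k - 1] * V[0, :]    # t = V^* H_32; H_32 has one nonzero entry
print(np.abs(spike))                     # most entries far below the coupling size
# eigenvalues whose spike entry is negligible can be deflated immediately
```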

21/28 Aggressive deflation for dqds, version 1: Aggdef(1)

Write $B = \begin{bmatrix} B_1 & E \\ 0 & B_2 \end{bmatrix}$, where $B_2$ is the trailing k×k window and E has a single nonzero entry $\sqrt{e_{n-k}}$ in its bottom-left corner.

1. Compute the small SVD of the window: $B_2 = U\Sigma V^T$.
2. Transform:
$$\begin{bmatrix} I_{n-k} & \\ & U^T \end{bmatrix} \begin{bmatrix} B_1 & E \\ 0 & B_2 \end{bmatrix} \begin{bmatrix} I_{n-k} & \\ & V \end{bmatrix} = \begin{bmatrix} B_1 & EV \\ 0 & \Sigma \end{bmatrix},$$
which turns the single coupling entry into a spike row EV.
3. Find the negligible elements of the spike, which appear thanks to the $O(\epsilon^k/\mathrm{gap})$ effect, and remove the corresponding rows and columns.
4. Reduce the matrix back to bidiagonal form and resume dqds.

Problem: speed + stability.
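A NumPy sketch of steps 1-2 showing the exponential decay of the spike entries (the graded test matrix and the helper name aggdef1_spike are arbitrary illustrative choices):

```python
import numpy as np

def aggdef1_spike(B, k):
    """SVD the trailing k-by-k window of bidiagonal B; return (spike, Sigma)."""
    n = B.shape[0]
    U, S, Vt = np.linalg.svd(B[n - k:, n - k:])
    coupling = B[n - k - 1, n - k]       # the single entry sqrt(e_{n-k})
    spike = coupling * Vt[:, 0]          # nonzero row of E V: coupling * (row 1 of V)
    return spike, S

n, k = 100, 20
rng = np.random.default_rng(4)
B = np.diag(np.sqrt(np.arange(n, 0, -1.0))) + np.diag(0.1 * rng.random(n - 1), 1)
spike, S = aggdef1_spike(B, k)
for sigma, t in zip(S, spike):
    print(f"sigma = {sigma:8.4f}   spike entry = {abs(t):.2e}")
# spike entries attached to the smallest sigmas decay rapidly -> deflate them
```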

22/28 Efficient and stable aggressive deflation: Aggdef(2)

1. Compute $\tilde B_2$ such that $\tilde B_2^T \tilde B_2 = B_2^T B_2 - sI$, where $s = (\sigma_{\min}(B_2))^2$.
2. Apply Givens rotations to $\tilde B_2$ to chase the unwanted entry x down the matrix; set x to 0 when it becomes negligible.
3. Update $B_2$ via $B_2^T B_2 = \tilde B_2^T \tilde B_2 + sI$; deflate, and repeat.

Lemma 2. Aggdef(1) and Aggdef(2) are mathematically equivalent.

             flops    relative accuracy
Aggdef(1)    O(k²)    conditional
Aggdef(2)    O(kl)    guaranteed

k: window size (≪ n); l: number of singular values deflated by Aggdef.

23/28 Aggdef(2) preserves high relative accuracy

By a mixed forward-backward relative error analysis, we establish:

Theorem 3. For $i = 1, \dots, n$,
$$1 - 8n\epsilon \le \frac{\sigma_i(\tilde B)}{\sigma_i(B)} \le 1 + 8n\epsilon.$$

Recall the dqds error bound
$$1 - 4n\epsilon \le \frac{\sigma_i(\tilde B)}{\sigma_i(B)} \le 1 + 4n\epsilon.$$

Hence calling Aggdef(2) maintains high relative accuracy.

24/28 Conventional deflation vs. aggressive deflation

[Diagram: the last four entries of B, labeled 4, 3, 2, 1, under the conventional and the aggressive view; window size k = 4.]

Conventional deflation looks for negligible values $\epsilon_i$ locally: $\epsilon_i = e_{n-i}$, with convergence factor
$$\frac{\epsilon_i^+}{\epsilon_i} \simeq \frac{\sigma_{n-i+1}^2 - s}{\sigma_{n-i}^2 - s}.$$

Aggressive deflation looks for negligible values globally: $\epsilon_i \simeq e_{n-i} \prod_{j=n-k+2}^{n-i} \frac{e_{j-1}}{q_j}$, with convergence factor
$$\frac{\epsilon_i^+}{\epsilon_i} \simeq \frac{\sigma_{n-i+1}^2 - s}{\sigma_{n-k+1}^2 - s}.$$

($\epsilon_i^+$: the value of $\epsilon_i$ after one dqd(s) iteration; k: window size.)

25/28 Convergence factors of $\epsilon_i$

- Conventional: $\epsilon_i = e_{n-i}$, convergence factor $\dfrac{\sigma_{n-i+1}^2 - s}{\sigma_{n-i}^2 - s}$.
- Aggressive: $\epsilon_i \simeq e_{n-i}\displaystyle\prod_{j=n-k+2}^{n-i}\frac{e_{j-1}}{q_j}$, convergence factor $\dfrac{\sigma_{n-i+1}^2 - s}{\sigma_{n-k+1}^2 - s}$.

[Plot of the convergence factors: solid lines dqds (with shift), dashed lines dqd (zero shift).]

- Aggressive deflation is much more powerful.
- Shifts seem unnecessary with aggressive deflation: use dqd (zero shift)?

26/28 Numerical experiments: specifications

algorithm     deflation strategy   shift
LAPACK        conventional         s > 0
dqds+agg1     Aggdef(1)            s > 0
dqds+agg2     Aggdef(2)            s > 0
dqd+agg2      Aggdef(2)            zero shift

Environment: Intel Core i7 2.67GHz processor (4 cores, 8 threads), 12GB RAM.

Test matrices B, specified by diagonals $q_i$ and off-diagonals $e_i$:

 #   n       description
 1   30000   $q_i = n + 1 - i$, $e_i = 1$
 2   30000   $q_{i-1} = \beta q_i$, $e_i = q_i$, $\beta = 1.01$
 3   30000   Toeplitz: $q_i = 1$, $e_i = 2$
 4   30000   $q_{2i-1} = n + 1 - i$, $q_{2i} = i$, $e_i = (n - i)/5$
 5   30000   $q_{i+1} = \beta q_i$ ($i \le n/2$), $q_{n/2} = 1$, $q_{i-1} = \beta q_i$ ($i \ge n/2$), $e_i = 1$, $\beta = 1.01$
 6   30000   Cholesky factor of the tridiagonal (1, 2, 1) matrix
 7   30000   Cholesky factor of the Laguerre matrix
 8   30000   Cholesky factor of the Hermite recurrence matrix
 9   30000   Cholesky factor of the Wilkinson matrix
10   30000   Cholesky factor of the Clement matrix
11   13786   matrix from electronic structure calculations
12   16023   matrix from electronic structure calculations

27/28 Numerical experiments

[Runtime comparison plots for the 12 test matrices above.]

28/28 Summary

- Perturbation theory can inspire algorithm design, and algorithm design inspires perturbation problems.
- Off-diagonal perturbation results in an $O(\epsilon^k)$ eigenvalue change.
- Matrix iterations can be understood through rational approximation theory.

Thesis posted at http://www.opt.mist.i.u-tokyo.ac.jp/nakatsukasa/research.htm

29/28 Backward stability proof of QDWH-eig

Goal: show $\|E\|_2 = \epsilon\|A\|_2$, where $V^T A V = \begin{bmatrix} A_+ & E^T \\ E & A_- \end{bmatrix}$.

Assumptions: $A = \hat U \hat H + \epsilon\|A\|_2$, $\hat U^T \hat U - I = \epsilon$, and $V^T \hat U V = \begin{bmatrix} I & \\ & -I \end{bmatrix} + \epsilon$.

By the assumptions, $A = V \begin{bmatrix} I & \\ & -I \end{bmatrix} V^T \hat H + \epsilon\|A\|_2$, so
$$0 = A - A^T = V\left(\begin{bmatrix} I & \\ & -I \end{bmatrix} V^T \hat H V - V^T \hat H^T V \begin{bmatrix} I & \\ & -I \end{bmatrix}\right) V^T + \epsilon\|A\|_2.$$
Therefore
$$\epsilon\|A\|_2 = \begin{bmatrix} I & \\ & -I \end{bmatrix} V^T \hat H V - V^T \hat H V \begin{bmatrix} I & \\ & -I \end{bmatrix} = 2\begin{bmatrix} 0 & E^T \\ -E & 0 \end{bmatrix},$$
which gives $\|E\|_2 = \epsilon\|A\|_2$, as required.