Algorithms and Perturbation Theory for Matrix Eigenvalue Problems and the SVD


Algorithms and Perturbation Theory for Matrix Eigenvalue Problems and the SVD
Yuji Nakatsukasa
PhD dissertation, University of California, Davis
Supervisor: Roland Freund
Householder 2014

2/28 Acknowledgment
For the supervision and support: Zhaojun Bai, Nick Higham, Françoise Tisseur.
For the collaboration and friendship: Kensuke Aishima, Rüdiger Borsdorf, Stefan Güttel, Vanni Noferini, Alex Townsend.

3/28 Dissertation content: references

I. Matrix decomposition algorithms
- N., Aishima, Yamazaki. dqds with aggressive early deflation. SIMAX, 2012.
- N., Bai, Gygi. Optimizing Halley's iteration for the polar decomposition. SIMAX, 2010.
- N., Higham. Backward stability of polar decomposition algorithms. SIMAX, 2012.
- N., Higham. Spectral divide-and-conquer algorithms for symeig and the SVD. SISC, 2013.

II. Eigenvalue perturbation theory
- Li, N., Truhar, Xu. Perturbation for partitioned Hermitian GEP. SIMAX, 2011.
- N. Absolute/relative Weyl theorem for GEP. LAA, 2010.
- N. Perturbation of a multiple generalized eigenvalue. BIT, 2010.
- N. Gerschgorin-type theorem for GEP in the Euclidean metric. Math. Comp., 2011.
- N. Perturbation for Hermitian block tridiagonal matrices. APNUM, 2012.
- N. Condition numbers of a multiple generalized eigenvalue. Numer. Math., 2012.
- N. The tan θ theorem with relaxed conditions. LAA, 2012.

4/28 Dissertation content: table of contents

I. Matrix decomposition algorithms
- Spectral divide-and-conquer algorithms for eigenproblems: a polar decomposition algorithm (type (3^k, 3^k − 1) Zolotarev) for symeig and the SVD; led to Zolotarev-based algorithms (Tuesday's talk) and generalized eigenproblems; stability proof for polar, symeig, and SVD [N., Higham SIMAX (12), SISC (13)].
- Bidiagonal singular values: dqds + aggressive early deflation [N., Aishima, Yamazaki SIMAX (12)].

II. Eigenvalue perturbation theory
- Weyl-type bounds for generalized eigenproblems
- Off-diagonal and block tridiagonal perturbation
- Eigenvector bounds, tan θ theorem
- Gerschgorin theory for generalized eigenproblems

Today's plan: a few tricks I learned, showing how perturbation theory inspires algorithm design.

5/28 Tricks I've learned

1. (Almost) all matrix iterations employ rational approximation. Examples: QR algorithm, expm, polar, shift-invert Arnoldi.

2. An O(ε) off-diagonal perturbation results in an O(ε²) change in the eigenvalues [Li, Li (05)]:
$$\left|\mathrm{eig}\begin{bmatrix} A_1 & E^T \\ E & A_2 \end{bmatrix} - \mathrm{eig}\begin{bmatrix} A_1 & 0 \\ 0 & A_2 \end{bmatrix}\right| \le \frac{\|E\|^2}{\mathrm{gap}},$$
even in the generalized nonsymmetric case [Li, N., Truhar, Xu SIMAX (11)]:
$$\left|\mathrm{eig}\!\left(\begin{bmatrix} A_1 & E_1^T \\ E_2 & A_2 \end{bmatrix} - \lambda\begin{bmatrix} B_1 & F_1^T \\ F_2 & B_2 \end{bmatrix}\right) - \mathrm{eig}\!\left(\begin{bmatrix} A_1 & 0 \\ 0 & A_2 \end{bmatrix} - \lambda\begin{bmatrix} B_1 & 0 \\ 0 & B_2 \end{bmatrix}\right)\right| \le \frac{(\|E\| + |\lambda|\,\|F\|)^2}{\mathrm{gap}(A_1 - \lambda B_1,\, A_2 - \lambda B_2)}.$$
This can also be proved by a Gerschgorin-type argument [N., Math. Comp. (11)].

3. The influence of diagonal blocks connected by k off-diagonals of O(ε) decays like $O(\epsilon^k/\mathrm{gap})$ [Paige LAA (74), N., APNUM (12)].
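To make trick 2 concrete, here is a minimal NumPy check (not from the dissertation; the block sizes and test matrices are arbitrary illustrative choices) that the eigenvalue change under an O(ε) off-diagonal perturbation scales like ε²/gap:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
A1 = np.diag([1.0, 2.0, 3.0, 4.0])      # spectra of A1 and A2 separated by
A2 = np.diag([10.0, 11.0, 12.0, 13.0])  # gap = 10 - 4 = 6
gap = 6.0
E0 = rng.standard_normal((n, n))
E0 /= np.linalg.norm(E0, 2)             # normalize so ||E||_2 = eps below
Z = np.zeros((n, n))
A_block = np.block([[A1, Z], [Z, A2]])

for eps in [1e-2, 1e-4, 1e-6]:
    A_pert = np.block([[A1, eps * E0.T], [eps * E0, A2]])
    # eigvalsh returns sorted eigenvalues, so the difference pairs them up
    change = np.max(np.abs(np.linalg.eigvalsh(A_pert) - np.linalg.eigvalsh(A_block)))
    print(f"eps = {eps:.0e}: eig change = {change:.2e}, bound eps^2/gap = {eps**2/gap:.2e}")
```

The printed change drops by four orders of magnitude each time ε drops by two, the quadratic behavior the bound predicts.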

6/28 Polar decomposition A = U_p H: algorithms

Scaled Newton iteration (type (2,1) Zolotarev):
$$X_{k+1} = \frac{1}{2}\left(\mu_k X_k + \mu_k^{-1} X_k^{-*}\right), \qquad X_0 = A.$$
- Higham (1986): gave the optimal $\mu_k$ and a cheap approximation.
- Byers-Xu (2008): $\zeta_{k+1} = \sqrt{2/(\zeta_k + 1/\zeta_k)}$, $\zeta_0 = 1/\sqrt{ab}$, with $a \approx \|A\|_2$, $b \approx \sigma_{\min}(A)$.

QDWH (QR-based dynamically weighted Halley), type (3,2) Zolotarev [N., Bai & Gygi (2010)]:
$$X_{k+1} = X_k (a_k I + b_k X_k^* X_k)(I + c_k X_k^* X_k)^{-1}, \qquad X_0 = A/\alpha.$$
Convergence is cubic; 6 iterations suffice in double precision.

QR-based DWH:
$$\begin{bmatrix} \sqrt{c_k}\, X_k \\ I \end{bmatrix} = \begin{bmatrix} Q_1 \\ Q_2 \end{bmatrix} R, \qquad X_{k+1} = \frac{b_k}{c_k} X_k + \frac{1}{\sqrt{c_k}}\left(a_k - \frac{b_k}{c_k}\right) Q_1 Q_2^*.$$

Are the algorithms backward stable? (Experimentally, yes.)
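As a concrete illustration, here is a minimal NumPy sketch of the QR-based QDWH iteration above. The weight formulas for a_k, b_k, c_k follow the published QDWH recipe; the stopping tolerance, the crude initial lower bound on σ_min, and the name qdwh_polar are illustrative choices, not the dissertation's implementation:

```python
import numpy as np

def qdwh_polar(A, maxit=20, tol=1e-14):
    """Sketch of QR-based QDWH: returns (U, H) with A ~ U @ H, U unitary."""
    alpha = np.linalg.norm(A, 2)               # estimate of ||A||_2
    X = A / alpha
    n = X.shape[1]
    l = 1.0 / np.linalg.cond(X)                # crude lower bound on sigma_min(X)
    for _ in range(maxit):
        # dynamically weighted Halley coefficients (QDWH parameter formulas)
        g = (4.0 * (1.0 - l**2) / l**4) ** (1.0 / 3.0)
        a = np.sqrt(1.0 + g) + 0.5 * np.sqrt(
            8.0 - 4.0 * g + 8.0 * (2.0 - l**2) / (l**2 * np.sqrt(1.0 + g)))
        b = (a - 1.0) ** 2 / 4.0
        c = a + b - 1.0
        # inverse-free evaluation via QR of the stacked matrix [sqrt(c) X; I]
        Q, _ = np.linalg.qr(np.vstack([np.sqrt(c) * X, np.eye(n)]))
        Q1, Q2 = Q[: X.shape[0], :], Q[X.shape[0]:, :]
        Xnew = (b / c) * X + (a - b / c) / np.sqrt(c) * (Q1 @ Q2.conj().T)
        l = l * (a + b * l**2) / (1.0 + c * l**2)   # how the bound maps forward
        done = np.linalg.norm(Xnew - X, "fro") <= tol * np.linalg.norm(Xnew, "fro")
        X = Xnew
        if done:
            break
    U = X
    UA = U.conj().T @ A
    H = (UA + UA.conj().T) / 2.0               # Hermitian polar factor
    return U, H

A = np.random.default_rng(1).standard_normal((8, 8))
U, H = qdwh_polar(A)
print(np.linalg.norm(U @ H - A) / np.linalg.norm(A))   # backward error ~ 1e-15
```

With l = 1 the coefficients reduce to (a, b, c) = (3, 1, 3), i.e. plain Halley $f(x) = x(3 + x^2)/(1 + 3x^2)$, which is one way to sanity-check the weight formulas.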

7/28 Backward stability

Assume $\hat H$ is Hermitian. The algorithm is backward stable if
$$\hat U_p \hat H = A + \Delta A, \quad \|\Delta A\| = \epsilon\|A\|, \qquad \hat H = H + \Delta H, \quad \|\Delta H\| = \epsilon\|H\|, \qquad \hat U_p = U_p + \Delta U, \quad \|\Delta U\| = \epsilon\|U_p\|,$$
where H is Hermitian positive semidefinite and U_p is unitary.

Crucial consequence: the resulting symeig and SVD algorithms are backward stable [N. and Higham, SISC (13)].

We develop a global analysis of iterations for the polar decomposition that proves some iterations are backward stable and correctly predicts that others are not. Strategy: take account of the rounding errors within each iteration and of the error propagation between iterations. Key fact: the Hermitian factor H is well-conditioned [Bhatia (94), Higham (08)].
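These conditions are easy to probe numerically. A minimal sanity check using SciPy's polar routine as a stand-in for a backward stable algorithm (scipy.linalg.polar is not the method analyzed here, just a convenient reference implementation):

```python
import numpy as np
from scipy.linalg import polar

rng = np.random.default_rng(0)
A = rng.standard_normal((50, 50))
U, H = polar(A)                    # A = U @ H, H symmetric positive semidefinite

print(np.linalg.norm(U @ H - A, 2) / np.linalg.norm(A, 2))  # residual ||UH - A|| / ||A||
print(np.linalg.norm(U.T @ U - np.eye(50), 2))              # distance from orthogonality
print(np.linalg.norm(H - H.T, 2))                           # Hermitian-ness of H
```

All three quantities should be of order the unit roundoff, matching the backward stability definition above.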

8/28 Statement

Suppose:
1. Iteration form: $X_{k+1} = f_k(X_k)$, $X_0 = A$, $X_k \to U_p$.
2. Mixed stable evaluation of the iteration: there is an $\tilde X_k \in \mathbb{C}^{n\times n}$ such that
$$\hat X_{k+1} = f_k(\tilde X_k) + \epsilon\|\hat X_{k+1}\|_2, \qquad \tilde X_k = \hat X_k + \epsilon\|\hat X_k\|_2.$$
3. Mapping function condition: $f_k$ does not significantly decrease the relative size of the singular values $\sigma_i$:
$$\frac{f_k(\sigma_i)}{\|f_k(\tilde X_k)\|_2} \ge \frac{1}{d}\left(\frac{\sigma_i}{\|\tilde X_k\|_2}\right), \qquad d \gtrsim 1.$$

Theorem 1. Suppose $\|\hat X_\ell^* \hat X_\ell - I\| = \epsilon$, and let $\hat U_p = \hat X_\ell$ and $\hat H = \frac{1}{2}\big(\hat U_p^* A + (\hat U_p^* A)^*\big)$. Then
$$\hat U_p \hat H = A + d\epsilon\|A\|_2, \qquad \hat H = H + d\epsilon\|H\|_2,$$
where H is the Hermitian polar factor of A. Furthermore, $\hat U_p = U_p + d\epsilon\,\kappa_2(A)$.

9/28 Condition on f_k: good mappings

[Plots of f(x) on [m, M]: both mappings move the interval toward 1 without shrinking the relative size of any singular value.]

- QDWH iteration: $f(x) = x\,\dfrac{a + bx^2}{1 + cx^2}$ is a stable mapping, d = 1.
- Scaled Newton iteration: $f(x) = \frac{1}{2}\big(\mu x + (\mu x)^{-1}\big)$ is a stable mapping, d = 1.

10/28 Condition on f_k: bad mappings

[Plot of f(x) on [m, M]: these mappings shrink the relative size of some singular values.]

- Inverse Newton iteration: $f(x) = 2\mu x(1 + \mu^2 x^2)^{-1}$ is an unstable mapping.
- Newton-Schulz iteration: $f(x) = \frac{1}{2}x(3 - x^2)$ is an unstable mapping if $M \approx \sqrt{3}$.
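The mapping condition can be checked numerically by sampling f on [m, M] and comparing each singular value's relative size before and after the map. A small sketch (the interval, the scaling, and the grid are arbitrary illustrative choices):

```python
import numpy as np

scaled_newton = lambda x, mu: 0.5 * (mu * x + 1.0 / (mu * x))    # good mapping
inv_newton    = lambda x, mu: 2.0 * mu * x / (1.0 + (mu * x)**2)  # bad mapping

m, M = 1e-6, 1.0
mu = 1.0 / np.sqrt(m * M)             # 1/sqrt(mM)-type scaling
x = np.geomspace(m, M, 100001)        # surrogate singular values in [m, M]

for name, f in [("scaled Newton", scaled_newton), ("inverse Newton", inv_newton)]:
    fx = f(x, mu)
    # mapping condition: (f(sigma)/max f) / (sigma/M) should stay >= 1/d, d ~ 1
    worst = np.min((fx / fx.max()) / (x / M))
    print(f"{name}: worst ratio after/before = {worst:.2e}")
# scaled Newton keeps the worst ratio at ~1; inverse Newton crushes it (~2e-3
# here), because f decays for x > 1/mu and shrinks the largest singular values.
```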

11/28 QDWH is stable

QR-based implementation (QDWH):
$$\begin{bmatrix} \sqrt{c_k}\, X_k \\ I \end{bmatrix} = \begin{bmatrix} Q_1 \\ Q_2 \end{bmatrix} R, \qquad X_{k+1} = \frac{b_k}{c_k} X_k + \frac{1}{\sqrt{c_k}}\left(a_k - \frac{b_k}{c_k}\right) Q_1 Q_2^*.$$

Use Householder QR factorization with column pivoting and row sorting (or row pivoting). The QR factorization then has row-wise backward errors of order $\rho_i u$, where the growth factors satisfy $\rho_i \le (1 + \sqrt{2})^{n-1}$ (Cox and Higham, 1998); the $\rho_i$ are usually small in practice. One can prove that the mixed stable evaluation condition holds. No pivoting is fine in practice. But the blocking order matters:
$$\begin{bmatrix} I \\ \sqrt{c_k}\, X_k \end{bmatrix} = \begin{bmatrix} Q_2 \\ Q_1 \end{bmatrix} R \quad \text{is unstable.}$$

12/28 Scaled Newton stability

The mixed stable evaluation condition holds if the matrix inverse is computed by a mixed backward-forward stable method, and the condition on f_k holds.

Conclusion: scaled Newton is backward stable.

History:
- Higham (85): raised the question of backward stability.
- Kielbasiński, Ziętak (03): long and complicated analysis proving backward stability, assuming matrix inverses are computed in a mixed backward-forward stable way.
- Byers, Xu (08): a proof with much simpler arguments, but with some incompleteness in the analysis [Kielbasiński, Ziętak (10)].

13/28 Extra: is the (degree-17) Zolotarev polar iteration stable?

1. Mixed stable evaluation of the iteration? There is an $\tilde X_k \in \mathbb{C}^{n\times n}$ such that
$$\hat X_{k+1} = f_k(\tilde X_k) + \epsilon\|\hat X_{k+1}\|_2, \qquad \tilde X_k = \hat X_k + \epsilon\|\hat X_k\|_2.$$
2. Mapping function condition: $f_k$ does not significantly decrease the relative size of $\sigma_i$:
$$\frac{f_k(\sigma_i)}{\|f_k(\tilde X_k)\|_2} \ge \frac{1}{d}\left(\frac{\sigma_i}{\|\tilde X_k\|_2}\right), \qquad d \gtrsim 1.$$

[Plot: the type (7,6) Zolotarev mapping f(x) on [0, 1].]

14/28 Recap: tricks I've learned

1. (Almost) all matrix iterations employ rational approximation: QR algorithm, Zolotarev-(pd, eig, SVD).

2. An O(ε) off-diagonal perturbation results in an O(ε²) change in the eigenvalues [Li, Li (05)]:
$$\left|\mathrm{eig}\begin{bmatrix} A_1 & E^T \\ E & A_2 \end{bmatrix} - \mathrm{eig}\begin{bmatrix} A_1 & 0 \\ 0 & A_2 \end{bmatrix}\right| \le \frac{\|E\|^2}{\mathrm{gap}},$$
even in the generalized nonsymmetric case [Li, N., Truhar, Xu (11)]:
$$\left|\mathrm{eig}\!\left(\begin{bmatrix} A_1 & E_1^T \\ E_2 & A_2 \end{bmatrix} - \lambda\begin{bmatrix} B_1 & F_1^T \\ F_2 & B_2 \end{bmatrix}\right) - \mathrm{eig}\!\left(\begin{bmatrix} A_1 & 0 \\ 0 & A_2 \end{bmatrix} - \lambda\begin{bmatrix} B_1 & 0 \\ 0 & B_2 \end{bmatrix}\right)\right| \le \frac{(\|E\| + |\lambda|\,\|F\|)^2}{\mathrm{gap}(A_1 - \lambda B_1,\, A_2 - \lambda B_2)},$$
provable also via a Gerschgorin-type argument [N. (11)].

3. The influence of diagonal blocks connected by k off-diagonals of O(ε) decays like $O(\epsilon^k/\mathrm{gap})$ [N., APNUM (12)].

15/28 Recap: tricks I've learned

3. The influence of diagonal blocks connected by k off-diagonals of O(ε) decays like $O(\epsilon^k/\mathrm{gap})$. For a symmetric tridiagonal matrix and its trailing submatrix
$$A = \begin{bmatrix} a_1 & e_1 \\ e_1 & a_2 & e_2 \\ & e_2 & \ddots & \ddots \\ & & \ddots & a_{n-1} & e_{n-1} \\ & & & e_{n-1} & a_n \end{bmatrix}, \qquad \hat A = A(k{+}1{:}\mathrm{end},\, k{+}1{:}\mathrm{end}),$$
we have
$$\big|\mathrm{eig}(A) - \mathrm{eig}_m(\hat A)\big| \le \prod_{i=k+1}^{m} \frac{e_i^2}{|a_i - a_k|}, \qquad m = k+1, \dots, n,$$
so many eigenvalues of $\hat A$ match an eigenvalue of A. Proof idea: $\lambda = x^* A x$, where the eigenvector x decays exponentially away from its dominant block.
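A quick NumPy illustration of this decay (the matrix size, ε, and the diagonal are arbitrary illustrative choices): the eigenvalues of the trailing submatrix match eigenvalues of the full matrix far more accurately than the O(ε) size of the discarded couplings would suggest.

```python
import numpy as np

n, k, eps = 12, 3, 1e-3
a = np.arange(1.0, n + 1)                # well-separated diagonal, gap ~ 1
e = eps * np.ones(n - 1)                 # O(eps) off-diagonals
A = np.diag(a) + np.diag(e, 1) + np.diag(e, -1)
Ahat = A[k:, k:]                         # trailing submatrix A(k+1:end, k+1:end)

eigA = np.linalg.eigvalsh(A)
for mu in np.linalg.eigvalsh(Ahat):
    print(f"eig(Ahat) = {mu:.10f}   dist to eig(A) = {np.min(np.abs(eigA - mu)):.2e}")
# the distances are O(eps^2) or far smaller, not O(eps), and they shrink
# further the deeper the eigenvalue sits inside the trailing block
```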

16/28 Standard SVD algorithm [Golub and Kahan (1965)]

1. Reduce A to bidiagonal form B via Householder reflections $H_L$ (left) and $H_R$ (right): $A = U_A B V_A^*$, where $U_A = \prod H_L$ and $V_A = \prod H_R$.
2. Compute the SVD of the bidiagonal matrix: $B = U_B \Sigma V_B^*$.
   - Singular values Σ via dqds.
   - Singular vectors $U_B$, $V_B$ via inverse iteration.
3. Assemble the SVD: $A = (U_A U_B)\,\Sigma\,(V_A V_B)^* = U\Sigma V^*$.
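A minimal sketch of step 1, Golub-Kahan bidiagonalization by Householder reflections (the orthogonal factors are not accumulated here, and the function names are illustrative):

```python
import numpy as np

def house(x):
    """Householder vector v with (I - 2 v v^T / v^T v) x = -sign(x[0]) ||x|| e_1."""
    v = x.astype(float).copy()
    v[0] += np.copysign(np.linalg.norm(x), x[0] if x[0] != 0 else 1.0)
    return v

def bidiagonalize(A):
    """Reduce A (m >= n) to upper bidiagonal form; orthogonal factors not kept."""
    B = A.astype(float).copy()
    m, n = B.shape
    for j in range(n):
        v = house(B[j:, j])                          # zero column j below diagonal
        B[j:, j:] -= 2.0 * np.outer(v, v @ B[j:, j:]) / (v @ v)
        if j < n - 2:
            v = house(B[j, j + 1:])                  # zero row j past superdiagonal
            B[:, j + 1:] -= 2.0 * np.outer(B[:, j + 1:] @ v, v) / (v @ v)
    return B

A = np.random.default_rng(1).standard_normal((6, 4))
B = bidiagonalize(A)
# the orthogonal transformations preserve singular values:
print(np.linalg.svd(A, compute_uv=False) - np.linalg.svd(B, compute_uv=False))
```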

17/28 Computing bidiagonal singular values: historical aspects

- QR algorithm applied to $B^T B$: absolute accuracy [Golub and Kahan (1965)]: $|\hat\sigma_i - \sigma_i| \le O(n)\,\sigma_{\max}\,\epsilon$.
- Refined QR: attains high relative accuracy [Demmel and Kahan (1990)]: $|\hat\sigma_i - \sigma_i| \le 69 n^2 \sigma_i\,\epsilon$.
- dqds: 4-fold speedup + higher relative accuracy [Fernando and Parlett (1994)]: $|\hat\sigma_i - \sigma_i| \le 4 n \sigma_i\,\epsilon$.

Typical relative accuracy for B with $\sigma_{\max} = 1$ and $\sigma_{\min}(B) = 10^{-15}$:

             σ_max    σ_min
QR           1e-15    1e-1
Refined QR   1e-15    1e-14
dqds         1e-15    1e-15

18/28 dqds: pseudocode

The bidiagonal matrix is stored in squared form, $q_i = (B_{i,i})^2$, $e_i = (B_{i,i+1})^2$:
$$B = \begin{bmatrix} \sqrt{q_1} & \sqrt{e_1} \\ & \sqrt{q_2} & \ddots \\ & & \ddots & \sqrt{e_{n-1}} \\ & & & \sqrt{q_n} \end{bmatrix}.$$

Algorithm 1: the dqds algorithm.
for m := 0, 1, ... do
    choose a shift s (≥ 0)
    d_1 := q_1 − s
    for i := 1, ..., n − 1 do
        q̂_i := d_i + e_i
        ê_i := e_i q_{i+1} / q̂_i
        d_{i+1} := d_i q_{i+1} / q̂_i − s
    end for
    q̂_n := d_n
end for

[Flow over time: estimate shift s → dqds sweep → repeat.]

- Root-free: $e_i \to 0$ and $\sqrt{q_i} \to \sigma_i$ with guaranteed high relative accuracy.
- Sequential in nature; has been difficult to parallelize.
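For concreteness, a runnable sketch of the zero-shift variant (dqd) with the conventional bottom-of-the-matrix deflation from the next slide; a production dqds would add shifts, splitting at negligible interior e_i, and more careful deflation criteria:

```python
import numpy as np

def dqd_singular_values(q, e, tol=1e-15, maxit=100000):
    """Singular values of the bidiagonal matrix with diagonal sqrt(q) and
    superdiagonal sqrt(e), via zero-shift dqd sweeps (a teaching sketch)."""
    q, e = list(map(float, q)), list(map(float, e))
    sigmas = []
    for _ in range(maxit):
        if len(q) == 1:
            break
        d = q[0]                         # one dqd sweep (shift s = 0)
        for i in range(len(q) - 1):
            q[i] = d + e[i]
            e[i] = e[i] * q[i + 1] / q[i]
            d = d * q[i + 1] / q[i]
        q[-1] = d
        # conventional deflation: a negligible trailing e isolates a converged q
        while e and e[-1] <= tol * q[-1]:
            e.pop()
            sigmas.append(np.sqrt(q.pop()))
    sigmas.extend(np.sqrt(x) for x in q)
    return np.sort(sigmas)[::-1]

# compare against an explicit SVD of the same bidiagonal matrix
q, e = [4.0, 2.25, 1.0], [0.25, 0.25]
B = np.diag(np.sqrt(q)) + np.diag(np.sqrt(e), 1)
print(dqd_singular_values(q, e))
print(np.linalg.svd(B, compute_uv=False))
```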

19/28 dqds with conventional deflation strategy

Typically, running dqds results in
$$B = \begin{bmatrix} \sqrt{q_1} & \sqrt{e_1} \\ & \sqrt{q_2} & \sqrt{e_2} \\ & & \ddots & \ddots \\ & & & \sqrt{q_{n-1}} & \sqrt{e_{n-1}} \\ & & & & \sqrt{q_n} \end{bmatrix}$$
with $e_{n-1} \to 0$ at convergence factor $\dfrac{\sigma_n^2 - s}{\sigma_{n-1}^2 - s} < 1$.

- When $e_{n-1}$ is negligibly small, set it to 0.
- Then $q_n$ is isolated: a converged singular value.
- Remove the last row and column (deflation), and repeat.

20/28 Aggressive deflation for non-Hermitian eigenproblems [Braman, Byers, Mathias (2003)]

Partition the Hessenberg matrix H with window size k (block sizes n − k − 1, 1, k):
$$H = \begin{bmatrix} H_{11} & H_{12} & H_{13} \\ H_{21} & H_{22} & H_{23} \\ 0 & H_{32} & H_{33} \end{bmatrix}.$$
Compute the Schur decomposition $H_{33} = V T V^*$ (T triangular). Then
$$\begin{bmatrix} I & & \\ & 1 & \\ & & V \end{bmatrix}^* \begin{bmatrix} H_{11} & H_{12} & H_{13} \\ H_{21} & H_{22} & H_{23} \\ 0 & H_{32} & H_{33} \end{bmatrix} \begin{bmatrix} I & & \\ & 1 & \\ & & V \end{bmatrix} = \begin{bmatrix} H_{11} & H_{12} & H_{13}V \\ H_{21} & H_{22} & H_{23}V \\ 0 & t & T \end{bmatrix}.$$
Find the negligible elements of the spike vector $t = V^* H_{32}$ and deflate. This results in significant speed-ups.
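A small SciPy sketch of the spike computation (the nearly converged Hessenberg matrix below is synthetic; in the real algorithm the QR iteration produces it):

```python
import numpy as np
from scipy.linalg import schur

rng = np.random.default_rng(3)
n, k = 40, 10
# synthetic nearly converged Hessenberg: triangular plus a tiny subdiagonal
H = np.triu(rng.standard_normal((n, n))) + np.diag(1e-4 * rng.standard_normal(n - 1), -1)

T, V = schur(H[n - k:, n - k:])          # Schur form of the trailing k-by-k window
spike = H[n - k, n - k - 1] * V[0, :]    # t = V^* H_32; H_32 has one nonzero entry
print(np.abs(spike))                     # most entries far below the coupling size
# eigenvalues whose spike entry is negligible can be deflated immediately
```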

21/28 Aggressive deflation for dqds, version 1: Aggdef(1)

Write $B = \begin{bmatrix} B_1 & E \\ 0 & B_2 \end{bmatrix}$, where $B_2$ is the trailing k×k window and E has a single nonzero entry $\sqrt{e_{n-k}}$ in its bottom-left corner.

1. Compute the small SVD of the window: $B_2 = U\Sigma V^T$.
2. Transform:
$$\begin{bmatrix} I_{n-k} & \\ & U^T \end{bmatrix} \begin{bmatrix} B_1 & E \\ 0 & B_2 \end{bmatrix} \begin{bmatrix} I_{n-k} & \\ & V \end{bmatrix} = \begin{bmatrix} B_1 & EV \\ 0 & \Sigma \end{bmatrix},$$
which turns the single coupling entry into a spike row EV.
3. Find the negligible elements of the spike, which appear thanks to the $O(\epsilon^k/\mathrm{gap})$ effect, and remove the corresponding rows and columns.
4. Reduce the matrix back to bidiagonal form and resume dqds.

Problem: speed + stability.
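A NumPy sketch of steps 1-2 showing the exponential decay of the spike entries (the graded test matrix and the helper name aggdef1_spike are arbitrary illustrative choices):

```python
import numpy as np

def aggdef1_spike(B, k):
    """SVD the trailing k-by-k window of bidiagonal B; return (spike, Sigma)."""
    n = B.shape[0]
    U, S, Vt = np.linalg.svd(B[n - k:, n - k:])
    coupling = B[n - k - 1, n - k]       # the single entry sqrt(e_{n-k})
    spike = coupling * Vt[:, 0]          # nonzero row of E V: coupling * (row 1 of V)
    return spike, S

n, k = 100, 20
rng = np.random.default_rng(4)
B = np.diag(np.sqrt(np.arange(n, 0, -1.0))) + np.diag(0.1 * rng.random(n - 1), 1)
spike, S = aggdef1_spike(B, k)
for sigma, t in zip(S, spike):
    print(f"sigma = {sigma:8.4f}   spike entry = {abs(t):.2e}")
# spike entries attached to the smallest sigmas decay rapidly -> deflate them
```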

22/28 Efficient and stable aggressive deflation: Aggdef(2)

1. Compute $\tilde B_2$ such that $\tilde B_2^T \tilde B_2 = B_2^T B_2 - sI$, where $s = (\sigma_{\min}(B_2))^2$.
2. Apply Givens rotations to $\tilde B_2$ to chase the unwanted entry x down the matrix; set x to 0 when it becomes negligible.
3. Update $B_2$ via $B_2^T B_2 = \tilde B_2^T \tilde B_2 + sI$; deflate, and repeat.

Lemma 2. Aggdef(1) and Aggdef(2) are mathematically equivalent.

             flops    relative accuracy
Aggdef(1)    O(k²)    conditional
Aggdef(2)    O(kl)    guaranteed

k: window size (≪ n); l: number of singular values deflated by Aggdef.

23/28 Aggdef(2) preserves high relative accuracy

By a mixed forward-backward relative error analysis, we establish:

Theorem 3. For $i = 1, \dots, n$,
$$1 - 8n\epsilon \le \frac{\sigma_i(\tilde B)}{\sigma_i(B)} \le 1 + 8n\epsilon.$$

Recall the dqds error bound
$$1 - 4n\epsilon \le \frac{\sigma_i(\tilde B)}{\sigma_i(B)} \le 1 + 4n\epsilon.$$

Hence calling Aggdef(2) maintains high relative accuracy.

24/28 Conventional deflation vs. aggressive deflation

[Diagram: the last four entries of B, labeled 4, 3, 2, 1, under the conventional and the aggressive view; window size k = 4.]

Conventional deflation looks for negligible values $\epsilon_i$ locally: $\epsilon_i = e_{n-i}$, with convergence factor
$$\frac{\epsilon_i^+}{\epsilon_i} \simeq \frac{\sigma_{n-i+1}^2 - s}{\sigma_{n-i}^2 - s}.$$

Aggressive deflation looks for negligible values globally: $\epsilon_i \simeq e_{n-i} \prod_{j=n-k+2}^{n-i} \frac{e_{j-1}}{q_j}$, with convergence factor
$$\frac{\epsilon_i^+}{\epsilon_i} \simeq \frac{\sigma_{n-i+1}^2 - s}{\sigma_{n-k+1}^2 - s}.$$

($\epsilon_i^+$: the value of $\epsilon_i$ after one dqd(s) iteration; k: window size.)

25/28 Convergence factors of $\epsilon_i$

- Conventional: $\epsilon_i = e_{n-i}$, convergence factor $\dfrac{\sigma_{n-i+1}^2 - s}{\sigma_{n-i}^2 - s}$.
- Aggressive: $\epsilon_i \simeq e_{n-i}\displaystyle\prod_{j=n-k+2}^{n-i}\frac{e_{j-1}}{q_j}$, convergence factor $\dfrac{\sigma_{n-i+1}^2 - s}{\sigma_{n-k+1}^2 - s}$.

[Plot of the convergence factors: solid lines dqds (with shift), dashed lines dqd (zero shift).]

- Aggressive deflation is much more powerful.
- Shifts seem unnecessary with aggressive deflation: use dqd (zero shift)?

26/28 Numerical experiments: specifications

algorithm     deflation strategy   shift
LAPACK        conventional         s > 0
dqds+agg1     Aggdef(1)            s > 0
dqds+agg2     Aggdef(2)            s > 0
dqd+agg2      Aggdef(2)            zero shift

Environment: Intel Core i7 2.67GHz processor (4 cores, 8 threads), 12GB RAM.

Test matrices B, specified by diagonals $q_i$ and off-diagonals $e_i$:

 #   n       description
 1   30000   $q_i = n + 1 - i$, $e_i = 1$
 2   30000   $q_{i-1} = \beta q_i$, $e_i = q_i$, $\beta = 1.01$
 3   30000   Toeplitz: $q_i = 1$, $e_i = 2$
 4   30000   $q_{2i-1} = n + 1 - i$, $q_{2i} = i$, $e_i = (n - i)/5$
 5   30000   $q_{i+1} = \beta q_i$ ($i \le n/2$), $q_{n/2} = 1$, $q_{i-1} = \beta q_i$ ($i \ge n/2$), $e_i = 1$, $\beta = 1.01$
 6   30000   Cholesky factor of the tridiagonal (1, 2, 1) matrix
 7   30000   Cholesky factor of the Laguerre matrix
 8   30000   Cholesky factor of the Hermite recurrence matrix
 9   30000   Cholesky factor of the Wilkinson matrix
10   30000   Cholesky factor of the Clement matrix
11   13786   matrix from electronic structure calculations
12   16023   matrix from electronic structure calculations

27/28 Numerical experiments

[Runtime comparison plots for the 12 test matrices above.]

28/28 Summary

- Perturbation theory can inspire algorithm design, and algorithm design inspires perturbation problems.
- Off-diagonal perturbation results in an $O(\epsilon^k)$ eigenvalue change.
- Matrix iterations can be understood through rational approximation theory.

Thesis posted at http://www.opt.mist.i.u-tokyo.ac.jp/nakatsukasa/research.htm

29/28 Backward stability proof of QDWH-eig

Goal: show $\|E\|_2 = \epsilon\|A\|_2$, where $V^T A V = \begin{bmatrix} A_+ & E^T \\ E & A_- \end{bmatrix}$.

Assumptions: $A = \hat U \hat H + \epsilon\|A\|_2$, $\hat U^T \hat U - I = \epsilon$, and $V^T \hat U V = \begin{bmatrix} I & \\ & -I \end{bmatrix} + \epsilon$.

By the assumptions, $A = V \begin{bmatrix} I & \\ & -I \end{bmatrix} V^T \hat H + \epsilon\|A\|_2$, so
$$0 = A - A^T = V\left(\begin{bmatrix} I & \\ & -I \end{bmatrix} V^T \hat H V - V^T \hat H^T V \begin{bmatrix} I & \\ & -I \end{bmatrix}\right) V^T + \epsilon\|A\|_2.$$
Therefore
$$\epsilon\|A\|_2 = \begin{bmatrix} I & \\ & -I \end{bmatrix} V^T \hat H V - V^T \hat H V \begin{bmatrix} I & \\ & -I \end{bmatrix} = 2\begin{bmatrix} 0 & E^T \\ -E & 0 \end{bmatrix},$$
which gives $\|E\|_2 = \epsilon\|A\|_2$, as required.