A Tour of the Lanczos Algorithm and its Convergence Guarantees through the Decades


A Tour of the Lanczos Algorithm and its Convergence Guarantees through the Decades
Qiaochu Yuan, Department of Mathematics, UC Berkeley
Joint work with Prof. Ming Gu and Bo Li
April 17, 2018

Timeline

1950: Lanczos iteration algorithm
1965: Lanczos algorithm for SVD
1966, 1971, 1980: convergence bounds for Lanczos
1974, 1977: block Lanczos algorithm
1975, 1980: convergence bounds for block Lanczos
2009, 2010, 2015: randomized block Lanczos
2015: convergence bounds for randomized block Lanczos

Timeline - Our Work

Our work: 1) convergence bounds for RBL with arbitrary block size b; 2) superlinear convergence for typical data matrices.

Lanczos Iteration Algorithm

Developed by Lanczos in 1950 [Lan50]; a widely used iterative algorithm for computing the extremal eigenvalues, and corresponding eigenvectors, of a large, sparse, symmetric matrix A.

Goal. Given a symmetric matrix $A \in \mathbb{R}^{n \times n}$ with eigenvalues $\lambda_1 > \lambda_2 > \cdots > \lambda_n$ and associated eigenvectors $u_1, \dots, u_n$, we want to find approximations to
- $\lambda_i$, $i = 1, \dots, k$, the k largest eigenvalues of A, and
- $u_i$, $i = 1, \dots, k$, the associated eigenvectors,
where $k \ll n$.

Lanczos - Details

General idea:
1. Select an initial vector v.
2. Construct the Krylov subspace $K(A, v, k) = \mathrm{span}\{v, Av, A^2 v, \dots, A^{k-1} v\}$.
3. Restrict and project A to the Krylov subspace: $T = \mathrm{proj}_K A|_K$.
4. Use the eigenvalues and eigenvectors of T as approximations to those of A.

In matrices:

$$K_k = \begin{bmatrix} v & Av & \cdots & A^{k-1}v \end{bmatrix} \in \mathbb{R}^{n \times k}$$
$$Q_k = \begin{bmatrix} q_1 & q_2 & \cdots & q_k \end{bmatrix} \leftarrow \mathrm{qr}(K_k)$$
$$T_k = Q_k^T A Q_k \in \mathbb{R}^{k \times k}$$

Lanczos - Details

At each step $j = 1, \dots, k$ of the Lanczos iteration, $A Q_j = Q_j T_j + \beta_j q_{j+1} e_j^T$, where $Q_j = [q_1 \; \cdots \; q_j]$ and

$$T_j = \begin{pmatrix} \alpha_1 & \beta_1 & & \\ \beta_1 & \alpha_2 & \ddots & \\ & \ddots & \ddots & \beta_{j-1} \\ & & \beta_{j-1} & \alpha_j \end{pmatrix}.$$

Use the three-term recurrence

$$A q_j = \beta_{j-1} q_{j-1} + \alpha_j q_j + \beta_j q_{j+1}$$

and calculate the $\alpha$'s and $\beta$'s as

$$\alpha_j = q_j^T A q_j, \quad (1)$$
$$r_j = (A - \alpha_j I)\, q_j - \beta_{j-1} q_{j-1}, \quad (2)$$
$$\beta_j = \|r_j\|_2, \qquad q_{j+1} = r_j / \beta_j. \quad (3)$$
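To make the recurrence concrete, here is a minimal NumPy sketch of steps (1)-(3) (our own illustration, not code from the talk); it assumes a dense symmetric A and omits the reorthogonalization that a practical implementation needs in finite precision:

```python
import numpy as np

def lanczos(A, v, k):
    """Minimal Lanczos sketch: build Q_k and the tridiagonal T_k = Q_k^T A Q_k."""
    n = A.shape[0]
    Q = np.zeros((n, k))
    alpha, beta = np.zeros(k), np.zeros(k)
    Q[:, 0] = v / np.linalg.norm(v)
    for j in range(k):
        w = A @ Q[:, j]
        alpha[j] = Q[:, j] @ w              # alpha_j = q_j^T A q_j         (1)
        w -= alpha[j] * Q[:, j]             # r_j = (A - alpha_j I) q_j
        if j > 0:
            w -= beta[j - 1] * Q[:, j - 1]  #       - beta_{j-1} q_{j-1}    (2)
        beta[j] = np.linalg.norm(w)         # beta_j = ||r_j||_2            (3)
        if j + 1 < k:
            if beta[j] < 1e-12:             # lucky breakdown: invariant subspace found
                Q, alpha, beta = Q[:, :j+1], alpha[:j+1], beta[:j+1]
                break
            Q[:, j + 1] = w / beta[j]       # q_{j+1} = r_j / beta_j
    m = len(alpha)
    T = np.diag(alpha) + np.diag(beta[:m-1], 1) + np.diag(beta[:m-1], -1)
    return Q, T

# Ritz values (eigenvalues of T_k) approximate the extremal eigenvalues of A:
rng = np.random.default_rng(0)
M = rng.standard_normal((500, 500)); A = (M + M.T) / 2
Q, T = lanczos(A, rng.standard_normal(500), 30)
print(np.linalg.eigvalsh(T)[-3:])   # compare to np.linalg.eigvalsh(A)[-3:]
```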

Convergence of Lanczos

How well do $\lambda_i^{(k)}$, the eigenvalues of $T_k$, approximate $\lambda_i$, the eigenvalues of A, for $i = 1, \dots, k$? First answered by Kaniel in 1966 [Kan66] and Paige in 1971 [Pai71].

Theorem (Kaniel-Paige Inequality). If v is chosen not orthogonal to the eigenspace associated with $\lambda_1$, then

$$0 \le \lambda_1 - \lambda_1^{(k)} \le (\lambda_1 - \lambda_n)\, \frac{\tan^2 \theta(u_1, v)}{T_{k-1}^2(\gamma_1)} \quad (4)$$

where $T_i(x)$ is the Chebyshev polynomial of degree i, $\theta(\cdot, \cdot)$ is the angle between two vectors, and

$$\gamma_1 = 1 + 2\, \frac{\lambda_1 - \lambda_2}{\lambda_2 - \lambda_n}.$$

Convergence of Lanczos

Later generalized by Saad in 1980 [Saa80].

Theorem (Saad Inequality). For $i = 1, \dots, k$, if v is chosen such that $u_i^T v \ne 0$, then

$$0 \le \lambda_i - \lambda_i^{(k)} \le (\lambda_i - \lambda_n) \left( \frac{L_i^{(k)} \tan \theta(u_i, v)}{T_{k-i}(\gamma_i)} \right)^2 \quad (5)$$

where $T_i(x)$ is the Chebyshev polynomial of degree i, $\theta(\cdot, \cdot)$ is the angle between two vectors, and

$$\gamma_i = 1 + 2\, \frac{\lambda_i - \lambda_{i+1}}{\lambda_{i+1} - \lambda_n}, \qquad L_i^{(k)} = \begin{cases} \prod_{j=1}^{i-1} \frac{\lambda_j^{(k)} - \lambda_n}{\lambda_j^{(k)} - \lambda_i} & \text{if } i \ne 1, \\ 1 & \text{if } i = 1. \end{cases}$$

Aside - Chebyshev Polynomials

Recall

$$T_j(x) = \frac{1}{2} \left[ \left( x + \sqrt{x^2 - 1} \right)^j + \left( x - \sqrt{x^2 - 1} \right)^j \right]. \quad (6)$$

When j is large, $T_j(x) \approx \frac{1}{2}\left(x + \sqrt{x^2 - 1}\right)^j$, and when g is small, $T_j(1 + g) \approx \frac{1}{2}\left(1 + g + \sqrt{2g}\right)^j$. This growth determines the convergence of Lanczos, with

$$g = \Theta\left( \frac{\lambda_i - \lambda_{i+1}}{\lambda_{i+1} - \lambda_n} \right).$$

[Figure: growth of the Chebyshev polynomial (Lanczos) vs. the monomial (power method).]
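The gap in growth rates is easy to check numerically. A small illustration we add here (not from the talk), using the identity $T_j(x) = \cosh(j \cdot \mathrm{arccosh}(x))$ for $x \ge 1$:

```python
import numpy as np

# Compare Chebyshev growth T_j(1+g) (Lanczos) with the monomial (1+g)^j
# (power method) for a small spectral gap g.
g = 0.01
for j in [10, 25, 50, 100]:
    cheb = np.cosh(j * np.arccosh(1 + g))   # T_j(1+g), valid for 1+g >= 1
    mono = (1 + g) ** j
    print(f"j={j:4d}  T_j(1+g)={cheb:12.4e}  (1+g)^j={mono:8.4f}")
```

The Chebyshev value grows like $\frac{1}{2}(1 + g + \sqrt{2g})^j$, which for g = 0.01 is roughly $(1.15)^j$ versus $(1.01)^j$ for the monomial.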

Block Lanczos Algorithm

Introduced by Golub and Underwood in 1977 [GU77] and by Cullum and Donath in 1974 [CD74]. The block generalization of the Lanczos method uses, instead of a single initial vector v, a block of b vectors $V = [v_1 \; \cdots \; v_b]$, and builds the Krylov subspace in q iterations as

$$K(A, V, q) = \mathrm{span}\{V, AV, \dots, A^{q-1}V\}, \qquad k \le b, \quad bq \ll n.$$

Compared to classical Lanczos, block Lanczos
- is more memory- and cache-efficient, using BLAS3 operations;
- has the ability to converge to eigenvalues with cluster size > 1;
- has faster convergence with respect to the number of iterations.

Block Lanczos - Details

At each step $j = 1, \dots, q$ of the block Lanczos iteration, $A [Q_1 \; \cdots \; Q_j] = [Q_1 \; \cdots \; Q_j]\, T_j + Q_{j+1} [\,0 \; \cdots \; 0 \; B_j\,]$, where

$$T_j = \begin{pmatrix} A_1 & B_1^T & & \\ B_1 & A_2 & \ddots & \\ & \ddots & \ddots & B_{j-1}^T \\ & & B_{j-1} & A_j \end{pmatrix}.$$

Use the three-term recurrence

$$A Q_j = Q_{j-1} B_{j-1}^T + Q_j A_j + Q_{j+1} B_j$$

and calculate the A's and B's as

$$A_j = Q_j^T A Q_j, \quad (7)$$
$$Q_{j+1} B_j \leftarrow \mathrm{qr}\left( A Q_j - Q_j A_j - Q_{j-1} B_{j-1}^T \right). \quad (8)$$
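A direct NumPy transcription of recurrences (7)-(8), offered here as an illustrative sketch (again with no reorthogonalization, so only reliable for small q):

```python
import numpy as np

def block_lanczos(A, V, q):
    """Sketch of block Lanczos for symmetric A: returns Q = [Q_1 ... Q_q]
    and the block-tridiagonal projection T = Q^T A Q."""
    n, b = V.shape
    Qs = [np.linalg.qr(V)[0]]                # Q_1 from the starting block
    T = np.zeros((q * b, q * b))
    B_prev = None
    for j in range(q):
        W = A @ Qs[j]
        Aj = Qs[j].T @ W                     # A_j = Q_j^T A Q_j            (7)
        T[j*b:(j+1)*b, j*b:(j+1)*b] = Aj
        W -= Qs[j] @ Aj
        if j > 0:
            W -= Qs[j-1] @ B_prev.T          # subtract Q_{j-1} B_{j-1}^T
        if j + 1 < q:
            Qn, Bj = np.linalg.qr(W)         # Q_{j+1} B_j <- qr(...)       (8)
            T[(j+1)*b:(j+2)*b, j*b:(j+1)*b] = Bj
            T[j*b:(j+1)*b, (j+1)*b:(j+2)*b] = Bj.T
            Qs.append(Qn)
            B_prev = Bj
    return np.hstack(Qs), T
```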

Convergence of Block Lanczos

First analyzed by Underwood in 1975 [Und75] with the introduction of the block algorithm.

Theorem (Underwood Inequality). Let $\lambda_i, u_i$, $i = 1, \dots, n$, be the eigenvalues and eigenvectors of A, and let $U = [u_1 \; \cdots \; u_b]$. If $U^T V$ is of full rank b, then for $i = 1, \dots, b$,

$$0 \le \lambda_i - \lambda_i^{(q)} \le (\lambda_1 - \lambda_n)\, \frac{\tan^2 \Theta(U, V)}{T_{q-1}^2(\rho_i)} \quad (9)$$

where $T_i(x)$ is the Chebyshev polynomial of degree i, $\cos \Theta(U, V) = \sigma_{\min}(U^T V)$, and

$$\rho_i = 1 + 2\, \frac{\lambda_i - \lambda_{b+1}}{\lambda_{b+1} - \lambda_n}.$$

Convergence of Block Lanczos

Theorem (Saad Inequality [Saa80]). Let $\lambda_j, u_j$, $j = 1, \dots, n$, be the eigenvalues and eigenvectors of A, and let $U_i = [u_i \; \cdots \; u_{i+b-1}]$. For $i = 1, \dots, b$, if $U_i^T V$ is of full rank b, then

$$0 \le \lambda_i - \lambda_i^{(q)} \le (\lambda_i - \lambda_n) \left( \frac{L_i^{(q)} \tan \Theta(U_i, V)}{T_{q-i}(\hat{\gamma}_i)} \right)^2 \quad (10)$$

where $T_i(x)$ is the Chebyshev polynomial of degree i, $\cos \Theta(U_i, V) = \sigma_{\min}(U_i^T V)$, and

$$\hat{\gamma}_i = 1 + 2\, \frac{\lambda_i - \lambda_{i+b}}{\lambda_{i+b} - \lambda_n}, \qquad L_i^{(q)} = \begin{cases} \prod_{j=1}^{i-1} \frac{\lambda_j^{(q)} - \lambda_n}{\lambda_j^{(q)} - \lambda_i} & \text{if } i \ne 1, \\ 1 & \text{if } i = 1. \end{cases}$$

Aside - Block Size

Recall, when j is large and g is small,

$$T_j(1 + g) \approx \frac{1}{2} \left( 1 + g + \sqrt{2g} \right)^j. \quad (11)$$

classical: $g = \Theta\left( \frac{\lambda_i - \lambda_{i+1}}{\lambda_{i+1} - \lambda_n} \right)$; block: $g_b = \Theta\left( \frac{\lambda_i - \lambda_{i+b}}{\lambda_{i+b} - \lambda_n} \right)$.

Suppose the eigenvalues are distributed as $\lambda_j > (1 + \epsilon)\lambda_{j+1}$ for all j. Then $\lambda_{i+b} < \lambda_i / (1+\epsilon)^b$, so roughly

$$g_b \gtrsim (1 + \epsilon)^b - 1 \approx b\,\epsilon \approx b\, g,$$

i.e., the block method enjoys an effective gap about b times larger.

[Figure: growth of the Chebyshev polynomial (Lanczos) vs. the monomial (power method), for gaps g and g_b.]

Randomized Block Lanczos

In recent years, there has been increased interest in algorithms to compute low-rank approximations of matrices, with
- high computational-efficiency requirements: running on large matrices;
- low approximation-accuracy requirements: 2-3 digits of accuracy.

Applications are mostly in big-data computations:
- compression of data matrices;
- matrix processing techniques, e.g. PCA;
- optimization of the nuclear-norm objective function.

Randomized Block Lanczos

Grew out of work on randomized algorithms, in particular Randomized Subspace Iteration, by
- Rokhlin, Szlam, and Tygert in 2009 [RST09];
- Halko, Martinsson, Shkolnisky, and Tygert in 2011 [HMST11];
- Gu [Gu15] and Musco and Musco [MM15] in 2015.

Idea: instead of taking an arbitrary initial set of vectors V, an unfortunate choice of which could result in poor convergence, choose V = AΩ, a random projection of the columns of A, to better capture the range space.

RSI & RBL Pseudocode

Algorithm 1: Randomized Subspace Iteration (RSI)
Input: $A \in \mathbb{R}^{m \times n}$, target rank k, block size b, number of iterations q
Output: $B_k \in \mathbb{R}^{m \times n}$, a rank-k approximation
1: Draw $\Omega \in \mathbb{R}^{n \times b}$, $\omega_{ij} \sim N(0, 1)$.
2: Form $K = (AA^T)^{q-1} A\Omega$.
3: Orthogonalize: $\mathrm{qr}(K) \rightarrow Q$.
4: $B_k = Q (Q^T A)_k$.
Here $(M)_k$ denotes the k-truncated SVD of the matrix M.

Algorithm 2: Randomized Block Lanczos (RBL)
Input and Output: as in Algorithm 1.
1: Draw $\Omega \in \mathbb{R}^{n \times b}$, $\omega_{ij} \sim N(0, 1)$.
2: Form $K = [A\Omega,\; (AA^T)A\Omega,\; \dots,\; (AA^T)^{q-1} A\Omega]$.
3: Orthogonalize: $\mathrm{qr}(K) \rightarrow Q$.
4: $B_k = Q (Q^T A)_k$.
Requires $bq \ge k$; numerically unstable as written.
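For concreteness, here is a NumPy sketch of both algorithms (our own code mirroring the pseudocode; the helper name `truncate` is ours). As noted above, forming the whole Krylov block and orthogonalizing once, as RBL does here, is numerically unstable for large q; robust implementations orthogonalize block by block:

```python
import numpy as np

def truncate(Q, A, k):
    """B_k = Q (Q^T A)_k, where (M)_k is the k-truncated SVD of M."""
    U, s, Vt = np.linalg.svd(Q.T @ A, full_matrices=False)
    return Q @ (U[:, :k] * s[:k]) @ Vt[:k]

def rsi(A, k, b, q, seed=0):
    """Randomized Subspace Iteration (Algorithm 1)."""
    rng = np.random.default_rng(seed)
    K = A @ rng.standard_normal((A.shape[1], b))
    for _ in range(q - 1):
        K = A @ (A.T @ K)                    # K = (A A^T)^{q-1} A Omega
    Q, _ = np.linalg.qr(K)
    return truncate(Q, A, k)

def rbl(A, k, b, q, seed=0):
    """Randomized Block Lanczos (Algorithm 2), direct transcription."""
    rng = np.random.default_rng(seed)
    blocks = [A @ rng.standard_normal((A.shape[1], b))]
    for _ in range(q - 1):
        blocks.append(A @ (A.T @ blocks[-1]))
    Q, _ = np.linalg.qr(np.hstack(blocks))   # K = [A Omega, ..., (AA^T)^{q-1} A Omega]
    return truncate(Q, A, k)
```

Both return a rank-k matrix $B_k$; for the same number of matrix-vector products, RBL searches the larger subspace spanned by the full Krylov block.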

Lanczos for SVD

A modification of the Lanczos algorithm can be used to find the extremal singular-value pairs of a non-symmetric matrix $A \in \mathbb{R}^{m \times n}$, utilizing the connections between the singular value decomposition of A and the eigendecompositions of $AA^T$ and $A^T A$:
1. Select a block of initial vectors $V = [v_1 \; \cdots \; v_b]$.
2. Construct Krylov subspaces $K_V(A^T A, V, q)$ and $K_U(AA^T, AV, q)$.
3. Restrict and project A to form $B = \mathrm{proj}_{K_U} A|_{K_V}$.
4. Use the singular values and vectors of B as approximations to those of A.

Golub-Kahan Bidiagonalization [GK65]

$$A [V_1 \; \cdots \; V_j] = [U_1 \; \cdots \; U_j] \begin{pmatrix} A_1 & B_1^T & & \\ & A_2 & \ddots & \\ & & \ddots & B_{j-1}^T \\ & & & A_j \end{pmatrix},$$

$$A^T [U_1 \; \cdots \; U_j] = [V_1 \; \cdots \; V_j \; V_{j+1}] \begin{pmatrix} A_1^T & & & \\ B_1 & A_2^T & & \\ & \ddots & \ddots & \\ & & B_{j-1} & A_j^T \\ & & & B_j \end{pmatrix}.$$

At each step $j = 1, \dots, k$ of the bidiagonalization, calculate the A's and B's as

$$U_j A_j \leftarrow \mathrm{qr}\left( A V_j - U_{j-1} B_{j-1}^T \right), \quad (12)$$
$$V_{j+1} B_j \leftarrow \mathrm{qr}\left( A^T U_j - V_j A_j^T \right). \quad (13)$$
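A minimal NumPy sketch of recurrences (12)-(13) (our illustration; it assumes a dense A and ignores reorthogonalization and deflation):

```python
import numpy as np

def block_bidiag(A, V1, q):
    """Block Golub-Kahan bidiagonalization sketch. The singular values of the
    assembled block-bidiagonal matrix approximate the extremal ones of A."""
    Vs = [np.linalg.qr(V1)[0]]               # orthonormal starting block V_1
    Us, As, Bs = [], [], []
    for j in range(q):
        W = A @ Vs[j]
        if j > 0:
            W -= Us[j-1] @ Bs[j-1].T         # A V_j - U_{j-1} B_{j-1}^T
        Uj, Aj = np.linalg.qr(W)             # U_j A_j <- qr(...)        (12)
        Us.append(Uj); As.append(Aj)
        Vn, Bj = np.linalg.qr(A.T @ Uj - Vs[j] @ Aj.T)
        Vs.append(Vn); Bs.append(Bj)         # V_{j+1} B_j <- qr(...)    (13)
    return Us, Vs, As, Bs
```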

Convergence of RBL

Analyzed by Musco and Musco in 2015 [MM15].

Theorem. For block size $b \ge k$, in $q = \Theta\left(\frac{\log n}{\epsilon}\right)$ iterations of RSI and $q = \Theta\left(\frac{\log n}{\sqrt{\epsilon}}\right)$ iterations of RBL, with constant probability 99/100, the following bounds are satisfied:

$$\sigma_i^2 - \sigma_i^2(B_k) \le \epsilon\, \sigma_{k+1}^2, \qquad i = 1, \dots, k. \quad (14)$$

In the event that $\sigma_{b+1} \le c\,\sigma_k$ with $c < 1$, taking $q = \Theta\left(\frac{\log(n/\epsilon)}{\min(1,\, \sigma_k/\sigma_{b+1} - 1)}\right)$ for RSI and $q = \Theta\left(\frac{\log(n/\epsilon)}{\sqrt{\min(1,\, \sigma_k/\sigma_{b+1} - 1)}}\right)$ for RBL suffices.

Aside: Convergence of RSI

Many of our analysis techniques are similar to those used by Gu to analyze RSI [Gu15], and the bounds for RBL resulting from the current work are similar in form to the bounds for RSI.

Theorem (RSI convergence). Let $B_k$ be the approximation returned by the RSI algorithm on $A = U \Sigma V^T$, with target rank k, block size $b \ge k$, and q iterations. If $\hat{\Omega}_1$ has full row rank in $V^T \Omega = \begin{bmatrix} \hat{\Omega}_1^T & \hat{\Omega}_2^T \end{bmatrix}^T$, then

$$\sigma_j \ge \sigma_j(B_k) \ge \frac{\sigma_j}{\sqrt{1 + \|\hat{\Omega}_2\|_2^2\, \|\hat{\Omega}_1^{\dagger}\|_2^2 \left( \frac{\sigma_{b+1}}{\sigma_j} \right)^{4q+2}}}. \quad (15)$$

Current Work - Core Ideas

The core pieces of our analysis are:
1. the growth behavior of Chebyshev polynomials;
2. the choice of a clever orthonormal basis for the Krylov subspace;
3. the creation of a spectral gap, by separating the spectrum of A into the singular values close to $\sigma_k$ and those sufficiently smaller in magnitude.

Setup & Notation

For block size b and random Gaussian matrix $\Omega \in \mathbb{R}^{n \times b}$, we are interested in the column span of

$$K_q = \begin{bmatrix} A\Omega & (AA^T) A\Omega & \cdots & (AA^T)^{q-1} A\Omega \end{bmatrix}.$$

Let the SVD of $A \in \mathbb{R}^{m \times n}$ be $U \Sigma V^T$. For any $0 \le p \le q$, define

$$\hat{K}_p := U\, T_{2p+1}(\Sigma) \begin{bmatrix} \hat{\Omega} & \Sigma^2 \hat{\Omega} & \cdots & \Sigma^{2(q-p-1)} \hat{\Omega} \end{bmatrix} = U\, T_{2p+1}(\Sigma)\, V_{q-p}.$$

Note $V_{q-p} \in \mathbb{R}^{n \times b(q-p)}$; we require $b(q-p) \ge k$, the target rank.

Lemma. With $\hat{K}_p$ and $K_q$ as previously defined,

$$\mathrm{span}\{\hat{K}_p\} \subseteq \mathrm{span}\{K_q\}. \quad (16)$$

Convergence of RBL

Theorem (Main Result). Let $B_k$ be the approximation returned by the RBL algorithm. With notation as previously defined, if the random starting $\Omega$ is initialized such that the block Vandermonde matrix formed by the sub-blocks of $V_{q-p}$ is invertible, then for all $1 \le j \le k$ and all choices* of s and r,

$$\sigma_j \ge \sigma_j(B_k) \ge \frac{\sigma_j}{\sqrt{1 + C^2\, T_{2p+1}^{-2}\left(1 + 2\, \frac{\sigma_{j+s} - \sigma_{j+s+r+1}}{\sigma_{j+s+r+1}}\right)}} \quad (17)$$

where $p = q - \frac{k+r}{b}$ and C is a constant independent of q.

*s is chosen to be non-zero to handle multiple singular values, and can be set to zero otherwise.

Summary of Convergence Bounds

Rewriting the bounds in comparable forms:

- Saad [Saa80]: $\lambda_j^{(q)} \ge \lambda_j \Big/ \left( 1 + (L_j^{(q)})^2 \tan^2 \Theta(U, V)\, T_{q-j}^{-2}\left(1 + 2\, \frac{\lambda_j - \lambda_{j+b}}{\lambda_{j+b}}\right) \right)$; requires $b \ge k$.
- Musco [MM15], spectrum-independent: $\sigma_j^{(q)} \ge \sigma_j \Big/ \sqrt{1 + C_1^2\, \frac{\log^2 n}{q^2}\, \frac{\sigma_{k+1}^2}{\sigma_j^2}}$; requires $b \ge k$.
- Musco [MM15], spectrum-dependent: $\sigma_j^{(q)} \ge \sigma_j \Big/ \sqrt{1 + C_2^2\, n\, e^{-q \min(1,\, \sigma_k/\sigma_{b+1} - 1)}\, \frac{\sigma_{k+1}^2}{\sigma_j^2}}$; requires $b \ge k$.
- Current work: $\sigma_j^{(q)} \ge \sigma_j \Big/ \sqrt{1 + C_3^2\, T_{2q+1-2(k+r)/b}^{-2}\left(1 + 2\, \frac{\sigma_j - \sigma_{j+r+1}}{\sigma_{j+r+1}}\right)}$; requires $b \ge 1$, $bq \ge k + r$.

Typical Case - Superlinear Convergence

Recall: $\{a_q\}$ converges superlinearly to a if

$$\lim_{q \to \infty} \frac{|a_{q+1} - a|}{|a_q - a|} = 0. \quad (18)$$

In practice, Lanczos algorithms (classical, block, randomized) often exhibit superlinear convergence behavior. It has been shown that classical Lanczos iteration is theoretically superlinearly convergent under certain assumptions about the singular spectrum [Saa94, Li10]. We show this for block Lanczos algorithms, i.e., that under certain assumptions about the singular spectrum, block Lanczos produces rank-k approximations $B_k$ such that $\sigma_j(B_k) \to \sigma_j$ superlinearly.

Typical Case - Superlinear Convergence

A typical data matrix might have a singular value spectrum decaying to 0, i.e., $\sigma_j \to 0$. In this case our bound suggests that convergence is governed by

$$a_q := \left( C(r)\, T_p^{-1}(1 + g) \right)^2 \approx \left( C(r) \cdot 2 \left( 1 + g + \sqrt{2g} \right)^{-p} \right)^2$$

with

$$g = 2\, \frac{\sigma_j - \sigma_{j+r+1}}{\sigma_{j+r+1}}, \qquad p = 2\left( q - \frac{k+r}{b} \right) + 1 = 2q + 1 - 2\, \frac{k+r}{b}.$$

As $\sigma_{j+r+1} \to 0$, $g \to \infty$. We argue that $a_{q+1}/a_q \to 0$ as follows: for all $\epsilon > 0$, choose¹ r so that $1 + g \ge \epsilon^{-1/2}$; then $a_{q+1} \le \epsilon\, a_q$.

¹Recall (1) our main result holds for all r; (2) $k + r = (q - p)b$, so choosing r amounts to choosing q.

Effect of r

There is some optimal value of r, typically non-zero, which achieves the best convergence factor. The balance is between larger (smaller) values of r, which imply a lower (higher) Chebyshev degree but a bigger (smaller) gap.

Figure: value of the reciprocal convergence factor $T_{2q+1-2(k+r)/b}^{-1}\left(1 + 2\, \frac{\sigma_j - \sigma_{j+r+1}}{\sigma_{j+r+1}}\right)$ as r varies, for the Daily Activities and Sports dataset, k = j = 100, b = 10, q = 20.

Numerical Example

Experimentally, choices of smaller block sizes $1 \le b < k$ appear favorable, with superlinear convergence for all block sizes.

Figure: Daily Activities and Sports dataset, $A \in \mathbb{R}^{9120 \times 5625}$, target k = 200, j = 200. Relative error $(\sigma_j - \sigma_j(B_k))/\sigma_j$ versus the number of MATVECs, $b(1 + 2q)$, for RSI with b = 200+2, 200+5, 200+10 and RBL with b = 200, 100, 50, 20, 10, 2.

Numerical Example

Figure: Eigenfaces dataset, $A \in \mathbb{R}^{10304 \times 400}$, target k = 100, j = 100. Relative error $(\sigma_j - \sigma_j(B_k))/\sigma_j$ versus the number of MATVECs, $b(1 + 2q)$, for RSI with b = 100, 100+2, 100+5 and RBL with b = 100, 50, 20, 10, 2, 1.

Conclusions

Both the theoretical analysis and the numerical evidence suggest that, holding the number of matrix-vector operations constant, RBL with a smaller block size b is better, and that for matrices with decaying spectra, RBL achieves superlinear convergence. However, the preference for smaller b must be balanced against the advantages of a larger b for computational efficiency and numerical stability in a practical implementation, and should be investigated further.

References I

Jane Cullum and William E. Donath, A block Lanczos algorithm for computing the q algebraically largest eigenvalues and a corresponding eigenspace of large, sparse, real symmetric matrices, 1974 IEEE Conference on Decision and Control including the 13th Symposium on Adaptive Processes, vol. 13, IEEE, 1974, pp. 505-509.

Gene Golub and William Kahan, Calculating the singular values and pseudo-inverse of a matrix, Journal of the Society for Industrial and Applied Mathematics, Series B: Numerical Analysis 2 (1965), no. 2, 205-224.

Gene Howard Golub and Richard Underwood, The block Lanczos method for computing eigenvalues, Mathematical Software, Elsevier, 1977, pp. 361-377.

References II

Ming Gu, Subspace iteration randomization and singular value problems, SIAM Journal on Scientific Computing 37 (2015), no. 3, A1139-A1173.

Nathan Halko, Per-Gunnar Martinsson, Yoel Shkolnisky, and Mark Tygert, An algorithm for the principal component analysis of large data sets, SIAM Journal on Scientific Computing 33 (2011), no. 5, 2580-2594.

Shmuel Kaniel, Estimates for some computational techniques in linear algebra, Mathematics of Computation 20 (1966), no. 95, 369-378.

Cornelius Lanczos, An iteration method for the solution of the eigenvalue problem of linear differential and integral operators, United States Government Press Office, Los Angeles, CA, 1950.

References III

Ren-Cang Li, Sharpness in rates of convergence for the symmetric Lanczos method, Mathematics of Computation 79 (2010), no. 269, 419-435.

Cameron Musco and Christopher Musco, Randomized block Krylov methods for stronger and faster approximate singular value decomposition, Advances in Neural Information Processing Systems, 2015, pp. 1396-1404.

Christopher Conway Paige, The computation of eigenvalues and eigenvectors of very large sparse matrices, Ph.D. thesis, University of London, 1971.

Vladimir Rokhlin, Arthur Szlam, and Mark Tygert, A randomized algorithm for principal component analysis, SIAM Journal on Matrix Analysis and Applications 31 (2009), no. 3, 1100-1124.

References IV

Yousef Saad, On the rates of convergence of the Lanczos and the block-Lanczos methods, SIAM Journal on Numerical Analysis 17 (1980), no. 5, 687-706.

Yousef Saad, Theoretical error bounds and general analysis of a few Lanczos-type algorithms, 1994.

Richard Underwood, An iterative block Lanczos method for the solution of large sparse symmetric eigenproblems, Tech. report, 1975.