How long does it take to compute the eigenvalues of a random symmetric matrix?

How long does it take to compute the eigenvalues of a random symmetric matrix? Govind Menon (Brown University). Joint work with: Christian Pfrang (Ph.D., Brown 2011), Percy Deift and Tom Trogdon (Courant Institute), Enrique Pujals (IMPA).

The guiding question How do eigenvalue algorithms perform on large, random matrices?* (* This is theoretical numerical analysis -- at least, so far; I'm not a professional numerical linear algebraist...)

Three stories (1) Eigenvalue algorithms as dynamical systems (flashback to the 1980s). (2) Random matrices and empirical universality (with Pfrang, Deift, Trogdon). (3) A few explanations (with Pujals).

Part 1. Eigenvalue algorithms as dynamical systems

The QR factorization Assume L is a real, symmetric matrix. The Gram-Schmidt procedure applied to the columns of L may be viewed as the matrix factorization L = QR, where Q is an orthogonal matrix and R is an upper triangular matrix.

The naive QR algorithm The QR algorithm is an iterative scheme to compute the eigenvalues of L. There are two steps. Step 1: compute the QR factors at iterate k: L_k = Q_k R_k. Step 2: intertwine these factors to obtain the next iterate: L_{k+1} = R_k Q_k. Observe that L_{k+1} = Q_k^T L_k Q_k. The unshifted QR algorithm thus generates a sequence of isospectral matrices, and this sequence typically converges to the desired diagonal matrix of eigenvalues.
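The two steps above can be sketched in a few lines of NumPy (a minimal illustration, not production code; the 3x3 test matrix is invented for the example):

```python
import numpy as np

def unshifted_qr(L, steps=200):
    """Unshifted QR algorithm: factor L_k = Q_k R_k, then set
    L_{k+1} = R_k Q_k = Q_k^T L_k Q_k, so every iterate is isospectral."""
    for _ in range(steps):
        Q, R = np.linalg.qr(L)
        L = R @ Q
    return L

# A small symmetric example (hypothetical data).
A = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 4.0]])
L = unshifted_qr(A)
# The off-diagonal entries decay and the diagonal approaches the eigenvalues.
```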

The practical QR algorithm (1) Reduce the initial matrix to tridiagonal form. (2) Use a shift while iterating, as follows. At the k-th step compute the QR factorization L_k − µ_k I = Q_k R_k, then set L_{k+1} = R_k Q_k + µ_k I. Again we find L_{k+1} = Q_k^T L_k Q_k. The shift parameter µ_k is often chosen to be the eigenvalue of the lower right 2x2 block that is closer to the corner entry of the matrix (the Wilkinson shift).
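A sketch of one shifted step, with the Wilkinson shift computed from the trailing 2x2 block (the closed-form expression for the closer eigenvalue is a standard identity; the demonstration matrix is invented):

```python
import numpy as np

def wilkinson_shift(L):
    """Eigenvalue of the trailing 2x2 block [[a, b], [b, c]] closest to c."""
    a, b, c = L[-2, -2], L[-2, -1], L[-1, -1]
    d = (a - c) / 2.0
    s = np.sign(d) if d != 0 else 1.0
    return c - b * b / (d + s * np.hypot(d, b)) if b != 0 else c

def shifted_qr_step(L):
    """One shifted QR step: L - mu*I = QR, then L <- RQ + mu*I."""
    n = len(L)
    mu = wilkinson_shift(L)
    Q, R = np.linalg.qr(L - mu * np.eye(n))
    return R @ Q + mu * np.eye(n)

L = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 4.0]])
for _ in range(6):
    L = shifted_qr_step(L)
# The bottom off-diagonal entry collapses after only a few steps,
# while the spectrum is preserved.
```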

Jacobi matrices Assume L is tridiagonal,

L = [ a_1   b_1    0    ...    0
      b_1   a_2   b_2   ...    0
       0    b_2   a_3   ...    0
      ...               ...  b_{n-1}
       0    ...       b_{n-1}  a_n ],

and that the off-diagonal entries b_j are positive. This is a practical assumption: we may reduce a full symmetric matrix to a Jacobi matrix with the same spectrum by Householder reflections.
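The Householder reduction can be sketched as follows (an unoptimized illustration that forms each full reflector explicitly; the random test matrix is arbitrary):

```python
import numpy as np

def tridiagonalize(A):
    """Reduce a symmetric matrix to a similar tridiagonal (Jacobi-type)
    matrix by Householder reflections applied on both sides."""
    T = np.array(A, dtype=float)
    n = len(T)
    for k in range(n - 2):
        x = T[k + 1:, k].copy()
        v = x.copy()
        # Choose the sign to avoid cancellation in the reflector.
        v[0] += (np.sign(x[0]) if x[0] != 0 else 1.0) * np.linalg.norm(x)
        nv = np.linalg.norm(v)
        if nv == 0:          # column already reduced
            continue
        v /= nv
        H = np.eye(n)
        H[k + 1:, k + 1:] -= 2.0 * np.outer(v, v)   # Householder reflector
        T = H @ T @ H        # similarity transform: spectrum unchanged
    return T

rng = np.random.default_rng(42)
X = rng.standard_normal((6, 6))
A = (X + X.T) / 2.0
T = tridiagonalize(A)
# T is tridiagonal with the same eigenvalues as A; a diagonal similarity
# can flip signs so that the off-diagonal entries become positive.
```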

Spectral and inverse spectral theory (Stieltjes). The spectral data consist of the eigenvalues Λ of L and the top row u of the eigenvector matrix U:

L = U Λ U^T,  u = (1, 0, ..., 0) U.

Given L, clearly we may find Λ and u. Stieltjes established the converse: given Λ and u we may reconstruct L, via the continued fraction

Σ_{j=1}^n u_j^2 / (z − λ_j) = 1 / (z − a_n − b_{n−1}^2 / (z − a_{n−1} − b_{n−2}^2 / (z − a_{n−2} − ...))).
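Stieltjes' reconstruction can be carried out numerically by running the Lanczos iteration on diag(Λ) with starting vector u; a sketch (the 3x3 input is an arbitrary example):

```python
import numpy as np

def jacobi_from_spectral_data(lam, u):
    """Rebuild the Jacobi matrix from eigenvalues lam and the top row u
    of U, via Lanczos applied to diag(lam) with starting vector u."""
    n = len(lam)
    a = np.zeros(n)
    b = np.zeros(n - 1)
    q = u / np.linalg.norm(u)
    q_prev = np.zeros(n)
    beta = 0.0
    for j in range(n):
        w = lam * q - beta * q_prev   # apply diag(lam), subtract old term
        a[j] = q @ w
        w = w - a[j] * q
        if j < n - 1:
            beta = np.linalg.norm(w)
            b[j] = beta               # off-diagonal entries come out positive
            q_prev, q = q, w / beta
    return np.diag(a) + np.diag(b, 1) + np.diag(b, -1)

# Round trip: the spectral data of a Jacobi matrix determine it uniquely.
L = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 4.0]])
lam, U = np.linalg.eigh(L)
L_rec = jacobi_from_spectral_data(lam, U[0, :])
```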

QR as a dynamical system: the phase space Assume L is a Jacobi matrix with distinct eigenvalues, say L = U Λ U^T. The QR iterates are isospectral Jacobi matrices: L_k = U_k Λ U_k^T, L_0 = L. A more sophisticated form of the inverse spectral theorem reveals that the isospectral manifold is a convex polytope -- the permutahedron. The iterates of the QR algorithm live in the interior of this polytope.

Phase space and QR iterates for 3x3 matrices [Figure: the hexagonal isospectral polytope for 3x3 Jacobi matrices; its vertices are the diagonal matrices diag(λ_σ(1), λ_σ(2), λ_σ(3)), and its edges correspond to matrices with b_1 = 0 or b_2 = 0.] Convention: the eigenvalues are labelled in decreasing order: λ_1 > λ_2 > ... > λ_n. Deift, Nanda, Tomei (SIAM J. Numer. Anal., 1983).

The Toda lattice A Hamiltonian system of n particles with unit mass on the line. Each particle interacts with its neighbors through an exponential potential:

H(p, q) = (1/2) Σ_{j=1}^n p_j^2 + Σ_{j=1}^{n−1} e^{q_j − q_{j+1}},

so the equations of motion are

q̈_j = e^{q_{j−1} − q_j} − e^{q_j − q_{j+1}}.

Particle system to tridiagonal matrices (Flaschka, Manakov) Set a_j = (1/2) p_j and b_j = (1/2) e^{(q_j − q_{j+1})/2}. Then L is the Jacobi matrix with diagonal (a_1, ..., a_n) and off-diagonal (b_1, ..., b_{n−1}), and B is the tridiagonal, skew-symmetric matrix with zero diagonal and off-diagonal entries ±b_j. In these variables the Hamiltonian H(p, q) = (1/2) Σ p_j^2 + Σ e^{q_j − q_{j+1}} becomes H(L) = (1/2) Trace(L^2), and Hamilton's equations q̇ = ∂H/∂p, ṗ = −∂H/∂q become the Lax equation L̇ = BL − LB =: [B, L].
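One can check the Lax form numerically: integrate L̇ = [B, L] with a standard RK4 stepper and verify that the eigenvalues do not move (a rough sketch; the step size and the sign convention for B are choices made here, not taken from the slides):

```python
import numpy as np

def toda_B(L):
    """Skew-symmetric Lax partner: keep the off-diagonal part of L,
    with + above the diagonal and - below (one common convention)."""
    upper = np.triu(L, 1)
    return upper - upper.T

def toda_flow(L, t, steps=2000):
    """Integrate dL/dt = [B(L), L] = B L - L B with classical RK4."""
    h = t / steps
    f = lambda M: toda_B(M) @ M - M @ toda_B(M)
    for _ in range(steps):
        k1 = f(L)
        k2 = f(L + 0.5 * h * k1)
        k3 = f(L + 0.5 * h * k2)
        k4 = f(L + h * k3)
        L = L + (h / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)
    return L

L0 = np.array([[2.0, 1.0, 0.0],
               [1.0, 3.0, 1.0],
               [0.0, 1.0, 4.0]])
L1 = toda_flow(L0, t=1.0)
# The flow is isospectral and preserves symmetry.
```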

Moser's solution formula (1975) The Toda lattice may be solved explicitly by inverse spectral theory. Suppose we know the spectral data for the initial matrix. If we denote

L(t) = U(t) Λ U(t)^T,  u(t) = (1, 0, ..., 0) U(t),

then

v(t) = u(0) e^{Λt},  u(t) = v(t) / |v(t)|.

Thus L(t) may be recovered by Stieltjes' inverse spectral mapping.

Completely integrable Hamiltonian flows on Jacobi matrices Consider a scalar function G with derivative g and extend it to a function on matrices:

L = U Λ U^T,  G(L) = U G(Λ) U^T.

Fundamental fact: the Lax equation

L̇ = [P_s(g(L)), L]

defines a completely integrable Hamiltonian flow, with Hamiltonian H(L) = Tr G(L). Here P_s is the projection onto skew-symmetric matrices. Complete integrability: all these flows commute, and all of them may be solved by Moser's recipe.

Symes' theorem (1980) The iterates of the QR algorithm are exactly the same as the solutions to the QR flow evaluated at integer times! The Hamiltonian for the QR flow is the spectral entropy. [Figure: the permutahedron with vertices labelled by the diagonal matrices diag(λ_σ(1), λ_σ(2), λ_σ(3)) over permutations σ.]

Hamiltonian = algorithm The Toda algorithm: G(x) = x^2/2, H(L) = (1/2) Tr L^2. The QR algorithm: G(x) = x (log x − 1), H(L) = Tr(L log L − L). The signum algorithm (Pfrang, Deift, M.): G(x) = |x|, H(L) = Tr |L|.

Summary (part 1) Unexpected connections between dynamical systems and fundamental iterative algorithms were discovered in the 1980s. Many numerical linear algebra algorithms were shown to be time-1 maps of completely integrable Hamiltonian flows by Deift, Demmel, Li, Tomei and Nanda. Moody Chu and Watkins investigated these ideas from a numerical linear algebra viewpoint. Certain sorting and dynamic programming algorithms can also be recast as gradient flows (Brockett, Bloch, Faybusovich). This includes, for example, a flow that solves the assignment problem -- so purely discrete problems can be solved by continuous methods. Bayer and Lagarias found gradient flows that underlie Karmarkar's algorithm and other interior point methods.

Part 2. Random matrices and empirical universality

Motivation An important feature of many numerical algorithms is that "typical" behaviour is often much better than "worst case" behaviour. It is interesting to try to quantify this. Some examples: 1) Testing Gaussian elimination on random matrices (Goldstine and von Neumann 1947, Demmel 1988, Edelman 1988). 2) Average runtime for the simplex method (Smale, 1983). 3) Smoothed analysis (Spielman, Sankar, Teng, 2004).

Deflation as a stopping criterion When computing eigenvalues we must choose a stopping criterion to decide on convergence. Typically, we try to split the matrix L into blocks as follows:

L = [ L_11  L_12 ; L_21  L_22 ]  →  L̃ = [ L_11  0 ; 0  L_22 ].

We call this deflation. For a given tolerance ε, we define the deflation time

T = min{ k ≥ 0 : max_j |λ_j(L_k) − λ_j(L̃_k)| < ε }.
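As a concrete (hypothetical) implementation of this stopping rule for the unshifted QR iteration, one can scan every split point at each iterate:

```python
import numpy as np

def deflation_time(L, eps=1e-8, max_iter=5000):
    """First iteration k at which some block-diagonal truncation of L_k
    reproduces the spectrum of L to within eps (the deflation time T)."""
    n = len(L)
    true_spec = np.linalg.eigvalsh(L)        # returned in ascending order
    for k in range(max_iter):
        for j in range(1, n):                # candidate split -> diag(L11, L22)
            block_spec = np.sort(np.concatenate([
                np.linalg.eigvalsh(L[:j, :j]),
                np.linalg.eigvalsh(L[j:, j:])]))
            if np.max(np.abs(block_spec - true_spec)) < eps:
                return k
        Q, R = np.linalg.qr(L)               # one unshifted QR step
        L = R @ Q
    return max_iter

L = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 4.0]])
T = deflation_time(L)
```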

Christian Pfrang s thesis (2011) Empirically investigates the performance of three eigenvalue algorithms on random matrices from different ensembles. The algorithms: (1) QR -- with and without shifts. (2) The Toda algorithm. (3) The signum algorithm. The random matrix ensembles: (1) Gaussian Orthogonal Ensemble (GOE). (2) Gaussian Wigner ensemble. (3) Bernoulli ensemble. (4) Hermite-1 ensemble. (5) Jacobi uniform ensemble (JUE). (6) Uniform doubly stochastic Jacobi ensemble (UDSJ)

The numerical experiments Choose: (1) an algorithm (QR, Toda, signum) (2) a deflation tolerance in the range: 1e-2 -- 1e-12 (3) a matrix size between 10--200 (4) an ensemble (1--6) Compute the deflation time and deflation index for a large number of random samples (typically 5000--10000). In this manner, he generates empirical distributions of the deflation time and deflation index. (Various computational tricks are used to streamline this process).
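A scaled-down sketch of such an experiment (GOE samples, unshifted QR, deflation declared when some off-diagonal block is negligible) might look like the following; the block-norm criterion is a cheap stand-in for the eigenvalue-based one, and the sizes and seed are arbitrary:

```python
import numpy as np

def goe(n, rng):
    """Sample a GOE-type symmetric matrix (normalization immaterial here)."""
    X = rng.standard_normal((n, n))
    return (X + X.T) / np.sqrt(2.0)

def deflation_time(L, eps=1e-8, max_iter=4000):
    """Unshifted QR steps until some off-diagonal block has norm < eps."""
    n = len(L)
    for k in range(max_iter):
        if any(np.linalg.norm(L[j:, :j]) < eps for j in range(1, n)):
            return k
        Q, R = np.linalg.qr(L)
        L = R @ Q
    return max_iter

rng = np.random.default_rng(0)
times = [deflation_time(goe(10, rng)) for _ in range(100)]
# `times` is an empirical sample of the deflation-time distribution,
# ready to be histogrammed.
```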

Examples of empirical distributions [Two panels: unshifted QR, GOE data; Toda, GOE data.] These are histograms of the deflation time for a fixed tolerance (1e-8) and matrix sizes 10, 30, 50, ..., 190.

Empirical universality For each algorithm, the rescaled (zero mean, unit variance) empirical distributions collapse onto a single curve that depends only on the algorithm (i.e. not on the ensemble, matrix size or deflation tolerance). Ensembles (1)-(4) have the Wigner law as limiting spectral distribution. Ensembles (5)-(6) do not. However, we find that the QR algorithm with Wilkinson shift yields the same distribution in fluctuations for these ensembles too.

Empirical deflation time statistics: unshifted QR on GOE [Figure 1: The QR algorithm applied to GOE. (a) Histogram of the empirical frequency of τ_{n,ε} as n ranges over 10, 30, ..., 190 for a fixed deflation tolerance ε = 10^{−8}; the curves (there are 10 of them, plotted one on top of another) do not depend significantly on n. (b) Histogram of the empirical frequency of τ_{n,ε} when ε = 10^{−k}, k = 2, 4, 6, 8, for fixed matrix size n = 190; the curves move to the right as ε decreases.]

Empirical universality for QR with shifts [Figure: four panels (a)--(d) of histograms of the normalized deflation time.] The QR algorithm with Wilkinson shift takes very few iterations to deflate. This behavior is insensitive to matrix size. Nevertheless, by choosing very small deflation thresholds, we can begin to see a universal empirical distribution for deflation times.

[Left: unshifted QR; the fit is Gamma(2,1), with exponential tail. Right: Toda; the fit is standard normal.] Rescaled histograms for all matrix sizes, deflation tolerances, and ensembles (1)--(4) (GOE, Bernoulli, Gaussian Wigner, Hermite-1).

Summary (part 2) Numerical experiments reveal universal fluctuations in the distribution of the deflation time for three Hamiltonian eigenvalue algorithms. Such "universality in halting times" is not restricted to these algorithms. Tom Trogdon performed extensive numerical computations on several algorithms, and found similar universality. These include a critical scaling regime for the conjugate gradient method, and an interesting decision time algorithm. These results are reported in Deift, M., Olver, Trogdon (PNAS, 2014).

Part 3. The largest gap conjecture, gradient flows on the permutahedron and sorting networks

The largest gap conjecture Denote the largest gap in the spectrum by

g_n = max_{1 ≤ j ≤ n−1} (λ_j − λ_{j+1}).

Our guess: the observed "universal distribution" for the Toda flow is the edge-scaling limit of 1/g_n. QR and signum are more subtle.
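The quantity in the conjecture is easy to sample numerically; the sketch below collects the rescaled values of 1/g_n for GOE draws (matrix size, sample count, and seed are arbitrary choices):

```python
import numpy as np

def largest_gap(lam):
    """g_n = max_j (lambda_j - lambda_{j+1}), eigenvalues in decreasing order."""
    lam = np.sort(lam)[::-1]
    return np.max(lam[:-1] - lam[1:])

rng = np.random.default_rng(1)
n, samples = 50, 300
inv_gaps = []
for _ in range(samples):
    X = rng.standard_normal((n, n))
    lam = np.linalg.eigvalsh((X + X.T) / np.sqrt(2.0))  # GOE-type sample
    inv_gaps.append(1.0 / largest_gap(lam))
inv_gaps = np.array(inv_gaps)

# Rescale to zero mean and unit variance, as done for the deflation-time
# histograms, before comparing the two empirical distributions.
z = (inv_gaps - inv_gaps.mean()) / inv_gaps.std()
```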

The largest gap conjecture: numerical evidence [Figure: Dim(s) = 40; overlay of the statistics of the largest gap and the observed deflation time, plotted as frequency against halting time fluctuations.]

The largest gap via stochastic Airy The edge-scaling limit of the spectrum is given by the stochastic Airy operator:

(−d²/dx² + x + 2ḃ) ψ = λ ψ,  0 < x < ∞,  ψ(0) = 0,  lim_{x→∞} ψ(x) = 0.

Heuristically, the spectrum of stochastic Airy is a random perturbation of the spectrum of the Airy operator, i.e. of minus the zeros of the Airy function. [Figure: eigenvalues of Airy vs. eigenvalues of stochastic Airy.] An explicit description of the law of the largest gap is not known. Even the fact that it is well-defined requires some care...

Why the largest gap? Conventional wisdom: Toda rates are determined by the smallest gap, since the smallest gap determines the rate of convergence to equilibrium (Moser). But convergence to equilibrium is not the same as deflation. Model example:

ẋ = −µx,  ẏ = −y,  0 < µ < 1.

Convergence to equilibrium occurs at the slow rate µ, but the approach to the x-axis occurs at the fast rate 1.

Understanding deflation (M., Pujals) Main new idea: combine the analysis of Toda with the combinatorics of permutations. Invariant manifolds: each (n−k)-face of the polytope is invariant. Each such face corresponds to block diagonal matrices with k blocks (the figure shows k = 3):

L = diag(L_1, L_2, L_3, ..., L_k).

Toda phase space for 4x4 matrices [Figure: the three-dimensional permutahedron, with vertices labelled by the permutations (1234), (2134), ..., (4321).] Deift, Nanda, Tomei (SIAM J. Numer. Anal., 1983).

Normal hyperbolicity (M., Pujals) Main observation: order relations between the eigenvalues of the blocks imply normal hyperbolicity of the invariant manifolds. Simplest example (enough to understand the largest gap): the stable equilibrium lies at the intersection of (n−1) invariant manifolds, each of the form b_k = 0, 1 ≤ k ≤ n−1. The normal rate of attraction for any such manifold is at least λ_k − λ_{k+1}. This is an easy consequence of the Schur-Horn theorem for Toda. The analogous statement requires more care for other flows.

From deflation to global dynamics... [Figure: numerically computed Toda trajectories on the isospectral manifold.] Deift, Nanda, Tomei (SIAM J. Numer. Anal., 1983).

Sorting networks Projection onto the diagonal of a Jacobi matrix evolving by Toda (Roger Brockett). Angel, Holroyd, Romik, Virág -- Uniform sorting networks (2007).

From random matrix theory to a sorting network With probability 1, each choice of initial condition determines a sorting network. The law of the sorting network is determined by the scattering theorem for Toda (and related flows). [Figures 1 and 2 from J. Moser -- Finitely many mass-points on the line (1976).]

Summary (1) Several iterative eigenvalue algorithms are tied to integrable systems. (2) Empirical universality for fluctuations of deflation times emerges from universality of gap statistics for Tracy-Widom point process. (3) Each algorithm and matrix ensemble determines a class of random sorting networks. It seems reasonable to hope that universality could hold for the sorting networks (this has not been tested).