Using the Karush-Kuhn-Tucker Conditions to Analyze the Convergence Rate of Preconditioned Eigenvalue Solvers

Merico Argentati, University of Colorado Denver
Joint work with Andrew V. Knyazev, Klaus Neymeyr, and Evgueni E. Ovtchinnikov
October 18, 2010

Outline of Presentation

- Introduction
- Preconditioning for linear systems and eigenvalue problems
- Background for our research and analysis
- Preconditioned gradient iteration with a fixed step size
- A convergence rate bound
- Apply Karush-Kuhn-Tucker (KKT) theory (nonlinear programming)
- Conclusions

Introduction: Preconditioned Eigenvalue Solvers

- Preconditioned iterative solvers are designed for extremely large eigenvalue problems and are popular in various application areas.
- The convergence theory of such eigensolvers is still in its infancy, with just a handful of convergence rate bounds derived.
- Only one of these known bounds is sharp: the bound for the simplest preconditioned eigensolver, the preconditioned gradient iteration with a fixed step size.
- We give a textbook proof of this bound, valid also in the complex case, using novel ideas and the Karush-Kuhn-Tucker conditions.

Preconditioning for Linear Systems

- In linear algebra and numerical analysis, a preconditioner of a matrix A is a matrix P such that P⁻¹A has a smaller condition number than A.
- Instead of solving the original linear system Ax = b, one may solve the left (or right) preconditioned system T(Ax − b) = 0, where T = P⁻¹.
- Generally neither T = P⁻¹ nor P is explicitly available, and multiplication by T is done in a matrix-free fashion.
- Preconditioned iterative methods for Ax − b = 0 are, in most cases, mathematically equivalent to standard methods applied to P⁻¹(Ax − b) = 0.
- Examples include the preconditioned Richardson iteration, the conjugate gradient method, the biconjugate gradient method, etc.
- The preconditioned Richardson iteration, for example, implements x_{n+1} = x_n − γ_n T(Ax_n − b), n ≥ 0.
- See http://en.wikipedia.org/wiki/preconditioner.
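
For concreteness, here is a minimal NumPy sketch of the preconditioned Richardson iteration above; the tridiagonal test matrix, the Jacobi-type preconditioner, and the step size are illustrative choices of ours, not part of the talk.

```python
import numpy as np

def richardson(A, b, T, gamma=1.0, tol=1e-10, maxit=5000):
    """Preconditioned Richardson: x_{n+1} = x_n - gamma * T (A x_n - b)."""
    x = np.zeros_like(b)
    for _ in range(maxit):
        r = A @ x - b
        if np.linalg.norm(r) <= tol * np.linalg.norm(b):
            break
        x = x - gamma * (T @ r)
    return x

# Example: SPD tridiagonal A with the Jacobi preconditioner T = diag(A)^{-1}.
n = 20
A = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
b = np.ones(n)
T = np.diag(1.0 / np.diag(A))
x = richardson(A, b, T, gamma=0.5)
print(np.linalg.norm(A @ x - b))  # small residual
```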

Preconditioning for Eigenvalue Problems

- Our goal is to find an eigenpair of Ax = λx, or of the generalized eigenvalue problem Bx = λAx, often for the smallest or largest eigenvalue.
- Suppose the targeted eigenvalue λ* is known (approximately). Then one can compute the corresponding eigenvector from the preconditioned homogeneous system T(A − λ*I)x = 0, e.g., by the Richardson iteration x_{n+1} = x_n − γ_n T(A − λ*I)x_n, n ≥ 0.
- A popular choice for λ* is the Rayleigh quotient µ(x_n) = x_nᵀBx_n / x_nᵀAx_n, which brings preconditioned optimization techniques onto the scene.
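
A toy illustration of this shifted Richardson iteration for an eigenvector, assuming the target eigenvalue is approximately known; the diagonal test matrix, the shift 0.9, and the Jacobi-type T are hypothetical choices of ours.

```python
import numpy as np

A = np.diag([1.0, 2.0, 3.0, 4.0, 5.0])
lam_star = 0.9                     # rough approximation of the target lambda = 1
T = np.diag(1.0 / np.diag(A))      # simple SPD preconditioner
gamma = 1.0                        # fixed step size

x = np.random.default_rng(0).standard_normal(5)
for _ in range(200):
    x = x - gamma * (T @ (A @ x - lam_star * x))  # Richardson step
    x = x / np.linalg.norm(x)                     # keep the iterate normalized
print(np.round(x, 6))  # converges to +/- e_1, the eigenvector for lambda = 1
```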

Preconditioning for Eigenvalue Problems (Cont.)

- Due to the changing shift λ_n, a comprehensive theoretical convergence analysis is much more difficult than in the linear systems case, even for the simplest methods, such as the Richardson iteration.
- Preconditioned eigensolvers have been used in practice, e.g., for band structure calculations, thin elastic structures, electronic structure calculations, etc.
- Examples include the preconditioned gradient iteration, steepest descent, power iteration, inverse iteration, Rayleigh quotient iteration, LOBPCG, etc.

Background for Our Research and Analysis

- Analysis of the convergence rate of the gradient iteration with a fixed step size has been in progress for several years by these collaborators.
- The original proof [Knyazev and Neymeyr (2003)] has recently been significantly simplified and shortened using a gradient flow integration approach [Knyazev and Neymeyr (2009)].
- We use powerful tools, uncommon in numerical linear algebra, and provide "Preconditioned eigenvalue solver convergence theory in a nutshell" [Argentati, Knyazev, Neymeyr, Ovtchinnikov - to be published].
- The problem is captivating and has an interesting geometric interpretation involving norms, angles, closed balls, the Rayleigh quotient, and optimization theory.

Convergence Rate Bound: Problem Definition

We consider a generalized eigenvalue problem Bx = µ(x)Ax, with real symmetric positive definite matrices A and B, with eigenvalues enumerated in decreasing order µ_1 ≥ ... ≥ µ_n > 0.

The objective is to iteratively approximate the largest eigenvalue µ_1 by maximizing the Rayleigh quotient µ(x) = xᵀBx / xᵀAx. We iteratively correct a current iterate x along the preconditioned gradient of the Rayleigh quotient, i.e.

    x′ = x + (1/µ(x)) T(Bx − µ(x)Ax).   (1)

T is a real symmetric positive definite matrix called the preconditioner, satisfying

    (1 − γ) zᵀT⁻¹z ≤ zᵀAz ≤ (1 + γ) zᵀT⁻¹z   for all z,

for a given γ ∈ [0, 1).
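
A self-contained NumPy sketch of iteration (1); the random SPD test matrices and the construction of T as an inexact inverse of A are our own illustrative choices, and the quality constant γ is recovered from the spectrum of TA (the generalized eigenvalues of the pencil (A, T⁻¹)).

```python
import numpy as np

rng = np.random.default_rng(1)
n = 6

def rand_spd(n):
    """Random symmetric positive definite test matrix."""
    M = rng.standard_normal((n, n))
    return M @ M.T + n * np.eye(n)

A, B = rand_spd(n), rand_spd(n)
T = np.linalg.inv(A + 0.1 * np.eye(n))    # inexact inverse of A, SPD

def mu(x):                                # Rayleigh quotient
    return (x @ B @ x) / (x @ A @ x)

# Quality of T: the smallest gamma with
# (1-gamma) z'T^{-1}z <= z'Az <= (1+gamma) z'T^{-1}z
# is read off from the (real, positive) eigenvalues of T A.
ev = np.linalg.eigvals(T @ A).real
gamma = max(1 - ev.min(), ev.max() - 1)
print("gamma =", gamma)                   # < 1 here

x = rng.standard_normal(n)
for _ in range(500):
    x = x + (1.0 / mu(x)) * (T @ (B @ x - mu(x) * (A @ x)))   # iteration (1)

mu1 = np.linalg.eigvals(np.linalg.inv(A) @ B).real.max()
print(mu(x), "->", mu1)                   # mu(x) approaches mu_1
```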

Convergence Rate Bound: Main Theorem

Theorem. If µ_{i+1} < µ(x) ≤ µ_i, then µ(x′) ≥ µ(x), and either µ_i < µ(x′) or

    (µ_i − µ(x′)) / (µ(x′) − µ_{i+1}) ≤ σ² (µ_i − µ(x)) / (µ(x) − µ_{i+1}),   σ = γ + (1 − γ) µ_{i+1}/µ_i.   (2)

Note that:
1. We have σ < 1.
2. σ may be small if γ is small and there is a large gap between µ_{i+1} and µ_i.
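
A quick numerical check of the theorem on a diagonal example (using the simplified setting of the next slide, A = I and B diagonal, with a hypothetical diagonal T satisfying ‖I − T‖ ≤ γ): at every step, the bound (2) holds for the interval (µ_{i+1}, µ_i] containing µ(x).

```python
import numpy as np

rng = np.random.default_rng(2)
mu_all = np.array([10.0, 6.0, 3.0, 2.0, 1.0])    # mu_1 > ... > mu_n > 0
n = mu_all.size
B, gamma = np.diag(mu_all), 0.3
T = np.eye(n) + gamma * np.diag(rng.uniform(-1, 1, n))   # SPD, ||I - T|| <= gamma

def mu(v): return (v @ B @ v) / (v @ v)

x = rng.standard_normal(n)
for _ in range(30):
    m = mu(x)
    k = np.searchsorted(-mu_all, -m)     # mu_{i+1} < m <= mu_i, with mu_i = mu_all[k-1]
    mu_i, mu_ip1 = mu_all[k - 1], mu_all[k]
    sigma = gamma + (1 - gamma) * mu_ip1 / mu_i
    x = x + (1.0 / m) * (T @ (B @ x - m * x))             # one step of (1)
    m_new = mu(x)
    assert m_new >= m                                     # monotonicity
    if m_new <= mu_i:                                     # otherwise mu_i < mu(x')
        lhs = (mu_i - m_new) / (m_new - mu_ip1)
        rhs = sigma**2 * (mu_i - m) / (m - mu_ip1)
        assert lhs <= rhs + 1e-12                         # bound (2)
print("bound (2) held on every step; final mu(x) =", mu(x))
```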

Simplify the Problem

By changing the basis to the eigenvectors of A⁻¹B, we make B diagonal and A = I, and denote the new inner product by (·,·), so

    µ(x)x′ = Bx − (I − T)(Bx − µ(x)x)   (3)

with ‖I − T‖ ≤ γ and µ(x) = (x, Bx)/(x, x). Identity (3) follows from (1) with A = I, since µ(x)x′ = µ(x)x + T(Bx − µ(x)x).
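
Numerically, this change of basis is a generalized eigendecomposition; a small sketch using scipy.linalg.eigh (our choice of tool), whose A-orthonormal eigenvectors simultaneously turn A into the identity and B into a diagonal matrix.

```python
import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(3)
n = 4
M1, M2 = rng.standard_normal((n, n)), rng.standard_normal((n, n))
A = M1 @ M1.T + n * np.eye(n)      # SPD
B = M2 @ M2.T + n * np.eye(n)      # SPD

# Solve B v = mu A v; the columns of V (eigenvectors of A^{-1} B) are A-orthonormal.
w, V = eigh(B, A)

print(np.allclose(V.T @ A @ V, np.eye(n)))    # A becomes the identity
print(np.allclose(V.T @ B @ V, np.diag(w)))   # B becomes diag(mu_n, ..., mu_1)
```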

Geometry of the Problem

- We denote r = Bx − µ(x)x and B_γ = {y : ‖Bx − y‖ ≤ γ‖r‖}, which is a closed ball of radius γ‖r‖ centered at Bx.
- Trivially, µ(x)x′ = Bx − (I − T)r ∈ B_γ, since ‖(I − T)r‖ ≤ γ‖r‖.
- The span{x} does not intersect the ball B_γ = B_γ(x): the distance from Bx to span{x} equals ‖r‖, since µ(x)x is the orthogonal projection of Bx onto span{x}, and γ < 1.
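
Both geometric facts are easy to confirm numerically; in this hypothetical diagonal example, µ(x)x′ stays inside B_γ, and the closest point of span{x} to the ball's center Bx is µ(x)x, at distance exactly ‖r‖ > γ‖r‖.

```python
import numpy as np

rng = np.random.default_rng(8)
B = np.diag([5.0, 2.0, 1.0])
gamma, n = 0.5, 3
T = np.eye(n) + gamma * np.diag(rng.uniform(-1, 1, n))   # ||I - T|| <= gamma

def mu(v): return (v @ B @ v) / (v @ v)

x = rng.standard_normal(n)
r = B @ x - mu(x) * x
x_new = x + (1.0 / mu(x)) * (T @ r)

# mu(x) x' lies in the closed ball of radius gamma*||r|| centered at Bx:
print(np.linalg.norm(B @ x - mu(x) * x_new) <= gamma * np.linalg.norm(r) + 1e-12)

# The distance from Bx to span{x} equals ||r||, attained at t = mu(x):
t = (x @ B @ x) / (x @ x)      # best multiple of x approximating Bx
print(np.isclose(np.linalg.norm(B @ x - t * x), np.linalg.norm(r)))
```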

Motivation for Minimization

- The span{x} does not intersect the ball B_γ = B_γ(x).
- The latter gives cos∠{x, Bx} < cos∠{y, Bx} ≤ 1, and thus (y, Bx) ≠ 0, for any vector y ∈ B_γ.
- By direct calculation,

      cos∠{y, Bx} / cos∠{x, Bx} = ((y, Bx) ‖x‖) / (‖y‖ (x, Bx)),

  and, by the Cauchy-Schwarz inequality in the B-based inner product,

      µ(y)/µ(x) ≥ (cos∠{y, Bx} / cos∠{x, Bx})²,

  thus µ(y) > µ(x) for all y ∈ B_γ.
- This motivates us to vary y ∈ B_γ, intending to minimize µ(y).
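
A brute-force check of this conclusion, sampling random points of B_γ in a hypothetical diagonal example:

```python
import numpy as np

rng = np.random.default_rng(4)
B = np.diag([5.0, 2.0, 1.0])
gamma = 0.5

def mu(v): return (v @ B @ v) / (v @ v)

x = rng.standard_normal(3)
r = B @ x - mu(x) * x
for _ in range(10000):
    d = rng.standard_normal(3)
    d *= gamma * np.linalg.norm(r) * rng.uniform() / np.linalg.norm(d)
    y = B @ x + d                  # random point of the ball B_gamma
    assert mu(y) > mu(x)           # the Rayleigh quotient improves on all of B_gamma
print("mu(y) > mu(x) for all sampled y; mu(x) =", mu(x))
```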

The 2-D Case (Outline of Proof, 1 of 2): This Proof Works for the Complex Case

Proof. For a minimizer y we can show (Bx, y) = ‖y‖². Combining this with the assumption that y ∈ B_γ, we have, for normalized (unit) vectors x̂ and ŷ,

    ‖Bx̂‖² − (Bx̂, ŷ)² ≤ γ² (‖Bx̂‖² − µ²(x̂)).   (4)

Write x̂ = α_1 x_1 + α_2 x_2 with |α_1|² + |α_2|² = 1, and ŷ = β_1 x_1 + β_2 x_2 with |β_1|² + |β_2|² = 1, where x_1, x_2 are unit eigenvectors of B for µ_1 > µ_2. This yields

    |α_1|² = (µ(x) − µ_2)/(µ_1 − µ_2),   |α_2|² = (µ_1 − µ(x))/(µ_1 − µ_2),

and

    |β_1|² = (µ(y) − µ_2)/(µ_1 − µ_2),   |β_2|² = (µ_1 − µ(y))/(µ_1 − µ_2).

Then, expanding (4) in this basis and dividing by |α_1 β_1|, we obtain

    |β_2/β_1| µ_1 − |α_2/α_1| µ_2 ≤ |(β_2/β_1) µ_1 − (α_2/α_1) µ_2| ≤ γ (µ_1 − µ_2) |α_2/α_1|,

using |β_1| ≥ |α_1|, which holds since µ(y) ≥ µ(x).
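
The coefficient formulas are a one-line consequence of µ(x̂) = |α_1|²µ_1 + |α_2|²µ_2 with |α_1|² + |α_2|² = 1; a numerical confirmation:

```python
import numpy as np

rng = np.random.default_rng(5)
mu1, mu2 = 4.0, 1.0
B = np.diag([mu1, mu2])

xhat = rng.standard_normal(2)
xhat /= np.linalg.norm(xhat)       # unit vector with components alpha_1, alpha_2
mux = xhat @ B @ xhat              # Rayleigh quotient of a unit vector

print(np.isclose(xhat[0] ** 2, (mux - mu2) / (mu1 - mu2)))   # |alpha_1|^2
print(np.isclose(xhat[1] ** 2, (mu1 - mux) / (mu1 - mu2)))   # |alpha_2|^2
```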

The 2-D Case (Outline of Proof, 2 of 2): This Proof Works for the Complex Case

Proof (continued). Inserting the expressions for α_i and β_i and simplifying, we get

    ((µ_1 − µ(y))/(µ(y) − µ_2)) · ((µ(x) − µ_2)/(µ_1 − µ(x))) ≤ (γ + (1 − γ) µ_2/µ_1)² = σ²,

i.e.

    (µ_1 − µ(y))/(µ(y) − µ_2) ≤ σ² (µ_1 − µ(x))/(µ(x) − µ_2).   (5)

Since µ(x)x′ ∈ B_γ and y minimizes µ over B_γ, we have µ(x′) ≥ µ(y), and since (µ_1 − d)/(d − µ_2) is decreasing in d, (5) yields

    (µ_1 − µ(x′))/(µ(x′) − µ_2) ≤ (µ_1 − µ(y))/(µ(y) − µ_2) ≤ σ² (µ_1 − µ(x))/(µ(x) − µ_2),   (6)

obtaining (2).

Formulation of the Problem Using the Minimum of µ(y)

Choose µ_{i+1} < κ < µ_i with κ = µ(x). Then

    κ = µ(x) < µ(y*) ≤ µ(w) ≤ µ(x′),

where {x*, y*} is a global minimizer of µ(y) over all admissible pairs, and w ∈ B_γ(x) is a minimizer of µ(w) over the ball for the given x.

Our goal is to prove that

    (µ_i − µ(y*))/(µ(y*) − µ_{i+1}) ≤ σ² (µ_i − µ(x))/(µ(x) − µ_{i+1}).   (7)

In this case we have

    (µ_i − µ(x′))/(µ(x′) − µ_{i+1}) ≤ (µ_i − µ(y*))/(µ(y*) − µ_{i+1}) ≤ σ² (µ_i − µ(x))/(µ(x) − µ_{i+1}),   (8)

since (µ_i − d)/(d − µ_{i+1}) is a decreasing function of d on (µ_{i+1}, µ_i).
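
The monotonicity used in (8) is immediate; in LaTeX, the one-line derivative computation reads:

```latex
\frac{\mathrm{d}}{\mathrm{d}d}\,\frac{\mu_i - d}{d - \mu_{i+1}}
  = \frac{-(d - \mu_{i+1}) - (\mu_i - d)}{(d - \mu_{i+1})^2}
  = -\,\frac{\mu_i - \mu_{i+1}}{(d - \mu_{i+1})^2} \;<\; 0
  \quad\text{for } d \in (\mu_{i+1}, \mu_i).
```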

Setup for Non-Linear Programming

Given µ_{i+1} < µ(x) < µ_i, we want to find y ∈ B_γ such that µ(y) is a minimum. Now we let x and y both vary, but we choose a fixed κ with µ_{i+1} < κ < µ_i and constrain x such that µ(x) = κ. We seek a pair of vectors {x*, y*} that is a solution of the following constrained minimization problem:

    minimize    f(x, y) = µ(y),   x ≠ 0
    subject to  g(x, y) = ‖Bx − y‖² − γ²‖Bx − κx‖² ≤ 0
                h(x, y) = κ(x, x) − (x, Bx) = 0

Reminder: B_γ = {y : ‖Bx − y‖ ≤ γ‖r‖} = {y : ‖Bx − y‖ ≤ γ‖Bx − µ(x)x‖}.
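
As a sanity check, one can hand this constrained program to a general-purpose NLP solver; a sketch with scipy.optimize.minimize (SLSQP, our choice of solver) on a hypothetical 3-by-3 diagonal instance, verifying that the computed minimum respects bound (7). The instance and all names are ours.

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical instance: B diagonal, kappa strictly between mu_2 and mu_1.
mu_all = np.array([4.0, 2.0, 1.0])
B = np.diag(mu_all)
gamma, kappa, n = 0.4, 3.0, 3

def split(z): return z[:n], z[n:]       # one variable vector packs x and y

def f(z):
    _, y = split(z)
    return (y @ B @ y) / (y @ y)        # f(x, y) = mu(y)

def g(z):                               # ball constraint, feasible iff g <= 0
    x, y = split(z)
    r = B @ x - kappa * x
    return np.sum((B @ x - y) ** 2) - gamma ** 2 * np.sum(r ** 2)

def h(z):                               # Rayleigh quotient constraint mu(x) = kappa
    x, _ = split(z)
    return kappa * (x @ x) - x @ B @ x

rng = np.random.default_rng(6)
x0 = rng.standard_normal(n)
z0 = np.concatenate([x0, B @ x0])       # start y at the ball's center Bx

res = minimize(f, z0, method="SLSQP",
               constraints=[{"type": "ineq", "fun": lambda z: -g(z)},  # g <= 0
                            {"type": "eq", "fun": h}])

sigma = gamma + (1 - gamma) * mu_all[1] / mu_all[0]
lhs = (mu_all[0] - res.fun) / (res.fun - mu_all[1])
rhs = sigma ** 2 * (mu_all[0] - kappa) / (kappa - mu_all[1])
print(res.success, lhs <= rhs + 1e-8)   # the computed minimum respects (7)
```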

Apply Karush-Kuhn-Tucker (KKT) Conditions: Real Case

The KKT stationarity condition states that there exist constants a and b such that

    ∇f(x*, y*) + a ∇g(x*, y*) + b ∇h(x*, y*) = 0

at the critical point {x*, y*}. To simplify the notation, in the rest of the proof we drop the superscript * and substitute {x, y} for {x*, y*}. Using r = Bx − κx, we directly calculate the two components of the gradient separately:

    2a (B²x − By − γ² (B − κI) r) − 2b r = 0,   (9)

    (2/(y, y)) (By − µ(y) y) + 2a (y − Bx) = 0.   (10)
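
The two gradient components can be sanity-checked against finite differences; a sketch on a hypothetical instance (the x-part of ∇g matches the bracket in (9), and ∇_y µ matches the first term in (10)):

```python
import numpy as np

# Finite-difference check of the analytic gradients entering (9) and (10),
# on a hypothetical 3-by-3 instance; all names here are ours.
B = np.diag([4.0, 2.0, 1.0])
gamma, kappa, n = 0.4, 3.0, 3
I = np.eye(n)

def mu(y): return (y @ B @ y) / (y @ y)

def g(x, y):
    r = B @ x - kappa * x
    return np.sum((B @ x - y) ** 2) - gamma ** 2 * np.sum(r ** 2)

def grad_x_g(x, y):                 # x-component of grad g, as in (9) with a = 1
    r = B @ x - kappa * x
    return 2 * (B @ (B @ x) - B @ y - gamma ** 2 * ((B - kappa * I) @ r))

def grad_y_mu(y):                   # y-component of grad f, as in (10)
    return 2 * (B @ y - mu(y) * y) / (y @ y)

rng = np.random.default_rng(7)
x, y = rng.standard_normal(n), rng.standard_normal(n)
eps = 1e-6

fd_g = np.array([(g(x + eps * I[i], y) - g(x - eps * I[i], y)) / (2 * eps)
                 for i in range(n)])
print(np.allclose(fd_g, grad_x_g(x, y), atol=1e-5))

fd_f = np.array([(mu(y + eps * I[i]) - mu(y - eps * I[i])) / (2 * eps)
                 for i in range(n)])
print(np.allclose(fd_f, grad_y_mu(y), atol=1e-5))
```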

KKT Conditions - Remarks

- We notice that x ≠ 0 implies y ≠ 0.
- Second, let us temporarily consider a stricter constraint, ‖x‖ = 1, in place of x ≠ 0. Combined with the other constraints, this results in minimization of a smooth function f(x, y) on a compact set, so a solution {x*, y*} exists.
- Finally, let us remove the artificial constraint ‖x‖ = 1 and notice that the solution {x*, y*} is scale-invariant. Thus we can consider the Karush-Kuhn-Tucker (KKT) conditions (see, e.g., Theorem 9.1.1 of a standard nonlinear programming text) in any neighborhood of {x*, y*} that does not include the origin.
- The gradients of g and h are linearly independent, simply since h depends only on x while the y-component of ∇g is nonzero. Thus the stationary point is regular, i.e., the KKT conditions are valid.

The Role of 2-D Invariant Subspaces

Lemma. For a fixed value κ ∈ (µ_n, µ_1), let a pair of vectors {x*, y*} denote a solution of the following constrained minimization problem:

    minimize    f(x, y) = µ(y),   x ≠ 0
    subject to  g(x, y) = ‖Bx − y‖² − γ²‖Bx − κx‖² ≤ 0
                h(x, y) = κ(x, x) − (x, Bx) = 0

Then either y* is an eigenvector, or both x* and y* belong to a two-dimensional invariant subspace of B corresponding to two distinct eigenvalues.

2-D Invariant Subspaces (Outline of Proof, 1 of 2): Real Case

Proof. If y is an eigenvector we are done. Otherwise, introducing two new constants c and d, (10) implies a ≠ 0, and (9) turns into

    B(Bx − γ²r − y) = c r.   (11)

We rewrite (10) as

    By − µ(y)y = d (Bx − y).   (12)

After further manipulations we show that cd > 0, and we have

    (1 − γ²) B³x + (γ²κ − c) B²x − B²y + cκ Bx = 0.

Eliminating x, we obtain p(B)y = (c_3 B³ + c_2 B² + c_1 B + c_0) y = 0, where p(·) is a real polynomial with c_3 = 1 − γ² > 0 and c_0 = cdκ > 0. Since c_3 > 0 and c_0 > 0, the polynomial p must have a negative root, and thus at most two positive roots.

2-D Invariant Subspaces (Outline of Proof, 2 of 2): Real Case

Proof (continued). Since B is diagonal, p(B)y = 0 reads

    diag(p(µ_1), ..., p(µ_n)) (y_1, ..., y_n)ᵀ = 0,

which, since p has at most two positive roots, implies that at most two components of y are nonzero. Solving for Bx in (10), we have

    a (y, y) Bx = (B + (a (y, y) − µ(y)) I) y.

Again, since B is diagonal (and invertible), x and y are both in the same 2-D invariant subspace.

The Complex Case

- The 2-D proof works as is for the complex case.
- For the KKT proof, we write the complex vectors as x = x_1 + i x_2 and y = y_1 + i y_2 and treat the real and imaginary parts as real variables. The formulation then still involves real-valued functions:

      minimize    f(x, y) = µ(y),   x ≠ 0
      subject to  g(x, y) = ‖Bx − y‖² − γ²‖Bx − κx‖² ≤ 0
                  h(x, y) = κ(x, x) − (x, Bx) = 0

- The gradients involve complex quantities, but the proof and conclusions are exactly the same as in the real case.

Conclusions

- Use of the KKT approach provides a simple and elegant solution.
- A detailed understanding of eigenvalue theory is not necessary.
- KKT theory and nonlinear programming should be of interest to a wider audience of mathematicians and graduate students.
- This approach may be useful in analyzing other methods (e.g., LOBPCG).

M. E. Argentati, A. V. Knyazev, K. Neymeyr, and E. E. Ovtchinnikov, Preconditioned eigenvalue solver convergence theory in a nutshell, to be published.

A. V. Knyazev, Computation of eigenvalues and eigenvectors for mesh problems: algorithms and error estimates (in Russian), Dept. Num. Math., USSR Ac. Sci., Moscow, 1986.

A. V. Knyazev, Convergence rate estimates for iterative methods for a mesh symmetric eigenvalue problem, Russian J. Numer. Anal. Math. Modelling, 2 (1987), pp. 371-396.

A. V. Knyazev and K. Neymeyr, A geometric theory for preconditioned inverse iteration. III: A short and sharp convergence estimate for generalized eigenvalue problems, Linear Algebra Appl., 358 (2003), pp. 95-114.

A. V. Knyazev and K. Neymeyr, Gradient flow approach to geometric convergence analysis of preconditioned eigensolvers, SIAM J. Matrix Anal. Appl., 31 (2009), pp. 621-628.

E. E. Ovtchinnikov, Sharp convergence estimates for the preconditioned steepest descent method for Hermitian eigenvalue problems, SIAM J. Numer. Anal., 43 (2006), no. 6, pp. 2668-2689.