
Spectral Graph Theory, Lecture 15
Iterative solvers for linear equations
Daniel A. Spielman
October 21, 2009

15.1 Overview

In this and the next lecture, I will discuss iterative algorithms for solving linear equations in Laplacian matrices and symmetric diagonally dominant matrices. I will begin with an introduction to iterative linear solvers in general, and then finish by explaining how this is relevant to Laplacians.

15.2 Why iterative methods?

One is first taught to solve linear systems like Ax = b by direct methods such as Gaussian elimination, computing the inverse of A, or the LU factorization. However, all of these algorithms can be very slow. This is especially true when A is sparse. Just writing down the inverse takes O(n²) space, and computing the inverse takes O(n³) time if we do it naively. This might be OK if A is dense. But, it is very wasteful if A only has O(n) non-zero entries. In general, we prefer algorithms whose running time is proportional to the number of non-zero entries in the matrix A, and which do not require much more space than that used to store A.

Iterative algorithms solve linear equations while only performing multiplications by A, plus a few vector operations. Unlike the direct methods, which are based on elimination, the iterative algorithms do not produce exact solutions. Rather, they get closer and closer to the solution the longer they work. The advantage of these methods is that they need to store very little, and they are often much faster than the direct methods. When A is symmetric, the running times of these methods are determined by the eigenvalues of A.

15.3 First-Order Richardson Iteration

To get started, we will examine a simple, but sub-optimal, iterative method, Richardson's iteration. The idea of the method is to find an iterative process that has the solution to Ax = b as a fixed point, and which converges.

We observe that if Ax = b, then for any α,

    αAx = αb,
    x + (αA − I)x = αb,
    x = (I − αA)x + αb.

This leads us to the following iterative process:

    x_t = (I − αA) x_{t−1} + αb,

where we will take x_0 = 0. We will show that this converges if I − αA has norm less than 1, and that the convergence rate depends on how much the norm is less than 1.

As we are assuming A is symmetric, I − αA is symmetric as well, and so its norm is the maximum absolute value of its eigenvalues. Let λ_1 ≤ λ_2 ≤ ... ≤ λ_n be the eigenvalues of A. Then, the eigenvalues of I − αA are 1 − αλ_i, and the norm of I − αA is

    max_i |1 − αλ_i| = max(|1 − αλ_1|, |1 − αλ_n|).

This is minimized by taking α = 2/(λ_1 + λ_n), in which case the smallest and largest eigenvalues of I − αA become

    ±(λ_n − λ_1)/(λ_n + λ_1),

and the norm of I − αA becomes

    1 − 2λ_1/(λ_n + λ_1).

To show that x_t converges to the solution x, consider x − x_t. We have

    x − x_t = ((I − αA)x + αb) − ((I − αA)x_{t−1} + αb) = (I − αA)(x − x_{t−1}).

So,

    x − x_t = (I − αA)^t x,

and

    ‖x − x_t‖ = ‖(I − αA)^t x‖ ≤ ‖I − αA‖^t ‖x‖ ≤ (1 − 2λ_1/(λ_n + λ_1))^t ‖x‖ ≤ e^{−2λ_1 t/(λ_n + λ_1)} ‖x‖.
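To make this concrete, here is a minimal numpy sketch of Richardson's iteration with the step size α = 2/(λ_1 + λ_n) chosen above. It is only an illustration: it assumes A is symmetric positive definite and, for simplicity, obtains the extreme eigenvalues from a dense eigensolver, which one would not do in practice (rough estimates of λ_1 and λ_n suffice). The function name richardson and the test matrix are mine, not from the notes.

    import numpy as np

    def richardson(A, b, t):
        """First-order Richardson iteration x_t = (I - alpha*A) x_{t-1} + alpha*b
        with alpha = 2/(lambda_1 + lambda_n).  Assumes A is symmetric positive definite."""
        eigs = np.linalg.eigvalsh(A)
        alpha = 2.0 / (eigs[0] + eigs[-1])
        x = np.zeros_like(b)
        for _ in range(t):
            # one multiplication by A plus O(n) vector work per step;
            # x + alpha*(b - A x) is the same as (I - alpha*A) x + alpha*b
            x = x + alpha * (b - A @ x)
        return x

    # tiny experiment on a random, well-conditioned SPD matrix
    rng = np.random.default_rng(0)
    B = rng.standard_normal((50, 50))
    A = B @ B.T + 50 * np.eye(50)
    b = rng.standard_normal(50)
    x_true = np.linalg.solve(A, b)
    for t in (10, 50, 200):
        err = np.linalg.norm(richardson(A, b, t) - x_true) / np.linalg.norm(x_true)
        print(t, err)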

So, if we want to get a solution x_t with ‖x − x_t‖ ≤ ε‖x‖, the bound above shows that it suffices to run for

    ((λ_n + λ_1)/(2λ_1)) ln(1/ε) ≤ ((λ_n/λ_1 + 1)/2) ln(1/ε)

iterations. The term λ_n/λ_1 is called the condition number of the matrix A, when A is symmetric. (For general matrices, the condition number is defined to be the ratio of the largest to smallest singular value.) It is often written κ(A), and the running time of iterative algorithms is often stated in terms of this quantity. We see that if the condition number is small, then this algorithm quickly provides an approximate solution.

15.4 A polynomial approximation of the inverse

I am now going to give another interpretation of Richardson's iteration. It provides us with a polynomial in A that approximates A^{−1}. In particular, the tth iterate x_t can be expressed in the form p_t(A)b, where p_t is a polynomial of degree t − 1. To see this, note that

    x_t = x − (I − αA)^t x = (I − (I − αA)^t) x = (I − (I − αA)^t) A^{−1} b,

so

    p_t(A) = (I − (I − αA)^t) A^{−1}.

This really is a polynomial in A. The term A^{−1} disappears, as the difference I − (I − αA)^t is a sum of powers of A of degree at least 1 (the I terms cancel).

We will view p_t(A) as a good approximation of A^{−1} if ‖A p_t(A) − I‖ is small. As A, A^{−1} and I all commute, we have

    ‖A p_t(A) − I‖ = ‖p_t(A) A − I‖ = ‖(I − (I − αA)^t) − I‖ = ‖I − αA‖^t.

Thus, the norm of this matrix is at most

    (1 − 2λ_1/(λ_n + λ_1))^t.
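As a sanity check on this interpretation, the following sketch (mine, not from the notes) forms the matrix p_t(A) = (I − (I − αA)^t) A^{−1} explicitly for a small random positive definite A, confirms that p_t(A)b reproduces the tth Richardson iterate, and checks that ‖A p_t(A) − I‖ = ‖I − αA‖^t. The explicit inverse and matrix power appear only to verify the identity; they are not part of the method.

    import numpy as np

    rng = np.random.default_rng(1)
    n = 30
    B = rng.standard_normal((n, n))
    A = B @ B.T + n * np.eye(n)            # symmetric positive definite
    b = rng.standard_normal(n)

    eigs = np.linalg.eigvalsh(A)
    alpha = 2.0 / (eigs[0] + eigs[-1])
    I = np.eye(n)
    M = I - alpha * A

    t = 25
    x = np.zeros(n)                        # x_t computed by the iteration
    for _ in range(t):
        x = M @ x + alpha * b

    # the same vector written as p_t(A) b with p_t(A) = (I - (I - alpha*A)^t) A^{-1}
    p_t_A = (I - np.linalg.matrix_power(M, t)) @ np.linalg.inv(A)
    print(np.linalg.norm(x - p_t_A @ b))                                 # ~ 1e-14
    print(np.linalg.norm(A @ p_t_A - I, 2), np.linalg.norm(M, 2) ** t)   # equal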

15.5 Faster iterations

It is natural to ask if we can find a faster iterative method, say by exploiting the values of more iterates. To do so, we again use an equation in x to define an iteration of which x is a fixed point. For the rest of this lecture, define

    M := I − αA,

and remember that

    x = Mx + αb.

From this we derive

    ωx = ωMx + ωαb,
    x = ωMx + (1 − ω)x + ωαb.

This leads us to consider the iteration

    x_{t+1} = ωM x_t + (1 − ω) x_{t−1} + ωαb.    (15.1)

We have chosen this iteration so that x is a fixed point. Now, let's see how quickly it converges. As before, we find that

    x − x_{t+1} = ωM(x − x_t) + (1 − ω)(x − x_{t−1}).

So, let's set y_t to be the tth error vector,

    y_t = x − x_t,

and observe that these satisfy

    y_{t+1} = ωM y_t + (1 − ω) y_{t−1}.

To express this rule as multiplication by a matrix, we need to consider y_{t+1} and y_t together as one vector. Writing a semicolon to separate block rows, we obtain

    ( y_{t+1} ; y_t ) = [ ωM  (1 − ω)I ; I  0 ] ( y_t ; y_{t−1} ).

Define

    S := [ ωM  (1 − ω)I ; I  0 ],

and let's find the value of ω that minimizes the absolute values of the eigenvalues of S.

If ( u ; v ) is an eigenvector of S with eigenvalue λ, then

    [ ωM  (1 − ω)I ; I  0 ] ( u ; v ) = ( ωMu + (1 − ω)v ; u ) = ( λu ; λv ),

so it must be the case that u = λv. From the top row, we obtain

    λ² v = ωλ Mv + (1 − ω) v.

So, if v is an eigenvector of M with eigenvalue µ_i, λ will be an eigenvalue of S provided that

    λ² = ωµ_i λ + (1 − ω),   that is,   λ² − ωµ_i λ + (ω − 1) = 0.

Generically, each eigenvalue of M leads to two eigenvalues of S, and by counting we see that this provides all the eigenvalues of S. Solving this quadratic equation, we find the eigenvalues

    λ_+ = (ωµ_i + √(ω²µ_i² − 4(ω − 1)))/2   and   λ_− = (ωµ_i − √(ω²µ_i² − 4(ω − 1)))/2.

Assume that all eigenvalues of M lie strictly between −µ and µ. We then find that a good choice for ω is

    ω = 2/(1 + √(1 − µ²)).

The motivation for this choice is that it causes

    ω²µ² = 4(ω − 1),

so for all µ_i between −µ and µ,

    ω²µ_i² ≤ 4(ω − 1),

and the square of the absolute value of the resulting complex eigenvalues λ is

    |λ|² = (ω²µ_i² + (4(ω − 1) − ω²µ_i²))/4 = ω − 1.

So, all of the eigenvalues of S are complex, and have absolute value √(ω − 1) = ωµ/2.

To see that this is an improvement, consider the case in which µ is near 1, say µ = 1 − ε. Then, to first order,

    ωµ/2 = µ/(1 + √(1 − µ²)) = (1 − ε)/(1 + √(1 − (1 − ε)²)) ≈ (1 − ε)/(1 + √(2ε)) ≈ 1 − ε − √(2ε).

This is much further from 1 than 1 − ε, and suggests that our algorithm should take around √(κ(A)) iterations to converge, instead of κ(A).

However, the matrix S was not symmetric, so this argument is not quite precise. To make it precise, we need to expand our initial vector in the eigenbasis of S. Let v_1, ..., v_n be a basis of eigenvectors of M, and for each let v_i^+ and v_i^− be the two induced eigenvectors of S:

    v_i^+ = ( λ_i^+ v_i ; v_i )   and   v_i^− = ( λ_i^− v_i ; v_i ).

We will initialize the algorithm by setting

    x_0 = 0   and   x_1 = αb.
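Here is a minimal numpy sketch of the second-order iteration (15.1) with this choice of ω. It assumes A is symmetric positive definite, takes α = 2/(λ_1 + λ_n) and µ = ‖M‖ = (λ_n − λ_1)/(λ_n + λ_1), and initializes with x_0 = 0 and x_1 = αb as above; the dense eigensolver and the name second_order are my own shortcuts, not part of the method. On a moderately ill-conditioned matrix it illustrates the roughly √κ(A)-iteration behaviour argued above.

    import numpy as np

    def second_order(A, b, t):
        """Second-order iteration x_{t+1} = omega*M x_t + (1 - omega) x_{t-1} + omega*alpha*b,
        with M = I - alpha*A, alpha = 2/(lambda_1 + lambda_n),
        omega = 2/(1 + sqrt(1 - mu^2)), mu = ||M||.  Assumes A is symmetric positive definite."""
        eigs = np.linalg.eigvalsh(A)
        alpha = 2.0 / (eigs[0] + eigs[-1])
        mu = (eigs[-1] - eigs[0]) / (eigs[-1] + eigs[0])   # = ||I - alpha*A||
        omega = 2.0 / (1.0 + np.sqrt(1.0 - mu ** 2))
        x_prev, x = np.zeros_like(b), alpha * b            # x_0 = 0, x_1 = alpha*b
        for _ in range(t - 1):
            Mx = x - alpha * (A @ x)                       # M x: one multiplication by A
            x, x_prev = omega * Mx + (1 - omega) * x_prev + omega * alpha * b, x
        return x

    rng = np.random.default_rng(3)
    B = rng.standard_normal((200, 200))
    A = B @ B.T + 5 * np.eye(200)                          # condition number in the hundreds
    b = rng.standard_normal(200)
    x_true = np.linalg.solve(A, b)
    for t in (20, 60, 200):
        err = np.linalg.norm(second_order(A, b, t) - x_true) / np.linalg.norm(x_true)
        print(t, err)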

15.6 A polynomial from the second-order method

To make this precise, we will see that this method also applies a polynomial in A to b, and that it is a very good approximation of A^{−1}. Our analysis will be facilitated by a judicious choice of x_0 and x_1. We set

    x_0 = 0   and   x_1 = αb.

We then find that

    y_i = [ I  0 ] S^{i−1} [ M ; I ] y_0.

So, there is a polynomial p_i of degree i such that y_i = p_i(A)x. It remains to bound the eigenvalues of p_i(A). With some computation, one can prove that the eigenvalues of p_i(A) are real and have absolute value at most

    (µω/2)^i (1 + i√(1 − µ²)).

I thought I had a way of making this computation obvious, but I was wrong. Sorry about that.

15.7 The best polynomials

To evaluate how close p(A) is to A^{−1}, we measured the eigenvalues of I − A p(A). If we want to find the polynomial in A of degree i − 1 that comes closest to A^{−1}, we can instead set

    q(x) = 1 − x p(x),

and find the polynomial q that minimizes |q(λ_i)| for λ_i an eigenvalue of A. Of course, we need the constant term of q to be 1, so we restrict q(0) = 1.

It is possible to find the optimal polynomial q under the assumption that the eigenvalues of A lie between λ_1 and λ_n. If λ_1 > 0, then the optimal polynomial of degree i may be constructed using Chebyshev polynomials, and guarantees that

    |q_i(λ)| ≤ 2 (1 − 2/(√(λ_n/λ_1) + 1))^i

for all λ ∈ [λ_1, λ_n]. So, to get error ε, the best polynomial will require degree around

    ln(1/ε) (√(λ_n/λ_1) + 1)/2.

This is quadratically better than we got from the Richardson iteration.
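The notes do not spell out the Chebyshev construction, but a standard way to realize q_i is to shift and scale the Chebyshev polynomial T_i from [λ_1, λ_n] to [−1, 1] and normalize so that q_i(0) = 1. The sketch below is my own illustration of that construction, on a hypothetical eigenvalue range; it checks numerically that the maximum of |q_i| over [λ_1, λ_n] stays below the bound 2(1 − 2/(√(λ_n/λ_1) + 1))^i quoted above.

    import numpy as np
    from numpy.polynomial import chebyshev as C

    def q_cheb(i, lam, lam1, lamn):
        """Chebyshev residual polynomial of degree i, normalized so that q(0) = 1:
        q_i(lam) = T_i((lamn + lam1 - 2*lam)/(lamn - lam1)) / T_i((lamn + lam1)/(lamn - lam1))."""
        coeffs = np.zeros(i + 1)
        coeffs[i] = 1.0                                    # coefficient vector of T_i
        shift = lambda z: (lamn + lam1 - 2.0 * z) / (lamn - lam1)
        return C.chebval(shift(lam), coeffs) / C.chebval(shift(0.0), coeffs)

    lam1, lamn = 1.0, 100.0                                # hypothetical eigenvalue range
    kappa = lamn / lam1
    grid = np.linspace(lam1, lamn, 10001)
    for i in (5, 10, 20):
        worst = np.max(np.abs(q_cheb(i, grid, lam1, lamn)))
        bound = 2.0 * (1.0 - 2.0 / (np.sqrt(kappa) + 1.0)) ** i
        print(i, worst, bound)                             # worst <= bound in every row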

15.8 The Conjugate Gradient

For positive-definite systems, there is an algorithm that dominates all of these: the Conjugate Gradient. I do not have time to explain how it works, but I can tell you what it does. In the ith iteration, it finds the optimal approximation x_i to x in the span of

    { b, Ab, A²b, ..., A^{i−1} b }.

It does this while performing an amount of work comparable to that done by the second-order method: one matrix multiplication and a couple of vector operations per iteration.

However, the Conjugate Gradient defines optimality in a slightly counter-intuitive way. It minimizes

    ‖x_i − x‖_A := √( (x_i − x)^T A (x_i − x) ).

However, for many applications this is actually a better way to measure error than the ordinary Euclidean norm! By using Chebyshev polynomials, we can show that the Conjugate Gradient method will always find a vector x_i in the ith iteration that satisfies

    ‖x − x_i‖_A ≤ 2 (1 − 2/(√(λ_n/λ_1) + 1))^i ‖x‖_A.

15.9 Solving equations in Laplacians

It may seem strange to solve linear equations in Laplacian matrices, since they are degenerate. It becomes less strange once you observe that we know what their nullspace is, and can restrict ourselves to working orthogonal to this nullspace. In this case, we only care about the non-zero eigenvalues.

The problem of solving linear equations in Laplacians arises in many applications. I hope I have time to mention some in class. In the meantime, let's look at what happens when we try to solve an equation in the Laplacian of a path graph. If we are going to use an iterative method, it is clear that we will need at least Θ(n) iterations. After all, it takes n/2 iterations before the value of x(1) has any dependence on b(n/2). On the other hand, the unweighted path graph has λ_n/λ_2 = O(n²), so an iterative method can converge in O(n) iterations. This tells us that, up to constant factors, the dependence of the convergence rate on √(λ_n/λ_2) cannot be improved. On the other hand, we see that if λ_2 is large, then we can solve equations in the Laplacian very quickly.

In the next lecture, I will show you a technique for avoiding a dependence on λ_n/λ_2 entirely. By approximating Laplacians by Laplacians of simpler graphs, we will see how to solve any Laplacian system in time O(m^{4/3}).
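To illustrate both points, here is a textbook Conjugate Gradient sketch applied to the Laplacian of the unweighted path on n = 200 vertices. The right-hand side is projected orthogonal to the all-ones nullspace, as discussed above, and the comparison solution is the minimum-norm one. The variable names, stopping tolerance, and the experiment itself are mine, not from the notes.

    import numpy as np

    def conjugate_gradient(A, b, iters):
        """Textbook CG: after i iterations the iterate is the best approximation
        to the solution, in the A-norm, from span{b, Ab, ..., A^{i-1} b}."""
        x = np.zeros_like(b)
        r = b.copy()                      # residual b - A x
        p = r.copy()                      # search direction
        rs = r @ r
        for _ in range(iters):
            Ap = A @ p
            a = rs / (p @ Ap)
            x += a * p
            r -= a * Ap
            rs_new = r @ r
            if np.sqrt(rs_new) < 1e-14 * np.linalg.norm(b):
                break                     # converged to machine precision
            p = r + (rs_new / rs) * p
            rs = rs_new
        return x

    n = 200
    # Laplacian of the unweighted path: 2 on the diagonal (1 at the endpoints), -1 off-diagonal
    L = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
    L[0, 0] = L[-1, -1] = 1

    rng = np.random.default_rng(4)
    b = rng.standard_normal(n)
    b -= b.mean()                                        # work orthogonal to the nullspace
    x_true = np.linalg.lstsq(L, b, rcond=None)[0]        # minimum-norm solution
    for iters in (50, 100, 200):
        x = conjugate_gradient(L, b, iters)
        print(iters, np.linalg.norm(x - x_true) / np.linalg.norm(x_true))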

15.10 Research Project

The reason that I taught this lecture this way is that I have been trying to find a more intuitive explanation for the quadratic speedup of the Chebyshev method over Richardson's. I don't like merely appealing to the properties of Chebyshev polynomials. In fact, I would really prefer a geometric explanation. Please help me find one!