Spectral Graph Theory                                Lecture 17
Iterative solvers for linear equations
Daniel A. Spielman                                   October 31, 2012

17.1 About these notes

These notes are not necessarily an accurate representation of what happened in class. The notes written before class say what I think I should say. The notes written after class say what I wish I had said. Hurricane Sandy is responsible for our missing both a lecture and some of the lecturer's brain cells.

17.2 Overview

In this and the next lecture, I will discuss iterative algorithms for solving linear equations in positive semidefinite matrices, with an emphasis on Laplacian matrices. I will begin with an introduction to iterative linear solvers in general, and then finish by explaining how this is relevant to Laplacians. Today's lecture will cover Richardson's first-order iterative method and the Chebyshev method. The next lecture will focus on the Conjugate Gradient method.

17.3 Why iterative methods?

One is first taught to solve linear systems like $Ax = b$ by direct methods such as Gaussian elimination, computing the inverse of $A$, or the LU factorization. However, all of these algorithms can be very slow. This is especially unfortunate when $A$ is sparse: just writing down the inverse takes $O(n^2)$ space, and computing the inverse takes $O(n^3)$ time if we do it naively. This might be acceptable if $A$ is dense, but it is very wasteful if $A$ has only $O(n)$ non-zero entries. In general, we prefer algorithms whose running time is proportional to the number of non-zero entries in the matrix $A$, and which do not require much more space than that used to store $A$.

Iterative algorithms solve linear equations while only performing multiplications by $A$, along with a few vector operations. Unlike the direct methods, which are based on elimination, the iterative algorithms do not find exact solutions. Rather, they get closer and closer to the solution the longer they work. The advantage of these methods is that they need to store very little, and they are often much faster than the direct methods. When $A$ is symmetric, the running times of these methods are determined by the eigenvalues of $A$. Throughout this lecture we will assume that $A$ is positive definite or positive semidefinite.
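As a quick illustration of this point (my own sketch, not part of the original notes; the tridiagonal test matrix and the helper names are arbitrary choices), the SciPy snippet below hands an iterative solver nothing but a black box that multiplies vectors by $A$. The solver used is conjugate gradient, the subject of the next lecture, but any of the methods discussed today would fit behind the same matrix-vector-product interface.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import LinearOperator, cg

n = 10_000
# A sparse, symmetric, positive definite matrix with O(n) nonzero entries.
A = sp.diags([-1.0, 4.0, -1.0], offsets=[-1, 0, 1], shape=(n, n), format="csr")
b = np.ones(n)

# The solver only ever sees a routine that multiplies a vector by A.
matvecs = [0]
def apply_A(v):
    matvecs[0] += 1
    return A @ v

A_op = LinearOperator((n, n), matvec=apply_A, dtype=float)
x, info = cg(A_op, b)   # conjugate gradient: next lecture's topic
print("converged:", info == 0, "after", matvecs[0], "multiplications by A")
print("residual norm:", np.linalg.norm(b - A @ x))

# Storage comparison: A has about 3n nonzeros, while a dense inverse has n^2 entries.
print("nonzeros in A:", A.nnz, " entries in a dense inverse:", n * n)
```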

17.4 First-Order Richardson Iteration

To get started, we will examine a simple, but sub-optimal, iterative method: Richardson's iteration. The idea of the method is to find an iterative process that has the solution to $Ax = b$ as a fixed point, and which converges. We observe that if $Ax = b$, then for any $\alpha$,

$$\alpha A x = \alpha b
\quad\Longrightarrow\quad x + (\alpha A - I) x = \alpha b
\quad\Longrightarrow\quad x = (I - \alpha A) x + \alpha b.$$

This leads us to the following iterative process:

$$x_t = (I - \alpha A) x_{t-1} + \alpha b, \qquad (17.1)$$

where we will take $x_0 = 0$. We will show that this converges if $I - \alpha A$ has norm less than 1, and that the convergence rate depends on how much the norm is less than 1.

As we are assuming $A$ is symmetric, $I - \alpha A$ is symmetric as well, and so its norm is the maximum absolute value of its eigenvalues. Let $0 < \lambda_1 \leq \lambda_2 \leq \cdots \leq \lambda_n$ be the eigenvalues of $A$. Then the eigenvalues of $I - \alpha A$ are $1 - \alpha \lambda_i$, and the norm of $I - \alpha A$ is

$$\max_i |1 - \alpha \lambda_i| = \max\big(|1 - \alpha \lambda_1|,\ |1 - \alpha \lambda_n|\big).$$

This is minimized by taking

$$\alpha = \frac{2}{\lambda_1 + \lambda_n},$$

in which case the smallest and largest eigenvalues of $I - \alpha A$ become

$$\pm \frac{\lambda_n - \lambda_1}{\lambda_n + \lambda_1},$$

and the norm of $I - \alpha A$ becomes

$$1 - \frac{2 \lambda_1}{\lambda_n + \lambda_1}.$$

While we might not know $\lambda_1$ and $\lambda_n$, a good guess is often sufficient. If we choose an $\alpha \leq 2/(\lambda_1 + \lambda_n)$, then the norm of $I - \alpha A$ is at most $1 - \alpha \lambda_1$.
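Here is a minimal NumPy sketch of this iteration (my addition, not from the notes; the test matrix and the helper name richardson are arbitrary). It runs the update (17.1) with the optimal step size $\alpha = 2/(\lambda_1 + \lambda_n)$ and compares the error with the quantity $\|I - \alpha A\|^t$, which the convergence analysis below shows bounds the relative error.

```python
import numpy as np

def richardson(A, b, alpha, iters):
    """Run x_t = (I - alpha*A) x_{t-1} + alpha*b starting from x_0 = 0."""
    x = np.zeros_like(b)
    for _ in range(iters):
        x = x - alpha * (A @ x) + alpha * b
    return x

rng = np.random.default_rng(0)
n = 50
M = rng.standard_normal((n, n))
A = M @ M.T + n * np.eye(n)          # symmetric positive definite test matrix
b = rng.standard_normal(n)

lam = np.linalg.eigvalsh(A)
lam1, lamn = lam[0], lam[-1]
alpha = 2.0 / (lam1 + lamn)          # the optimal step size derived above
x_star = np.linalg.solve(A, b)       # exact solution, used only to measure the error

for t in [10, 50, 200]:
    x_t = richardson(A, b, alpha, t)
    err = np.linalg.norm(x_star - x_t) / np.linalg.norm(x_star)
    bound = (1 - 2 * lam1 / (lamn + lam1)) ** t
    print(f"t = {t:3d}   relative error {err:.2e}   bound {bound:.2e}")
```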

To show that $x_t$ converges to the solution $x$, consider $x - x_t$. We have

$$x - x_t = \big((I - \alpha A) x + \alpha b\big) - \big((I - \alpha A) x_{t-1} + \alpha b\big) = (I - \alpha A)(x - x_{t-1}).$$

So,

$$x - x_t = (I - \alpha A)^t (x - x_0) = (I - \alpha A)^t x,$$

and

$$\|x - x_t\| = \big\|(I - \alpha A)^t x\big\| \leq \|I - \alpha A\|^t \, \|x\| \leq \left(1 - \frac{2\lambda_1}{\lambda_n + \lambda_1}\right)^t \|x\| \leq e^{-2\lambda_1 t/(\lambda_n + \lambda_1)} \, \|x\|.$$

So, if we want to get a solution $x_t$ with $\|x - x_t\| \leq \epsilon \|x\|$, it suffices to run for

$$\frac{\lambda_n + \lambda_1}{2\lambda_1} \ln(1/\epsilon) = \left(\frac{\lambda_n/\lambda_1 + 1}{2}\right) \ln(1/\epsilon)$$

iterations. The term $\lambda_n / \lambda_1$ is called the condition number of the matrix $A$, when $A$ is symmetric. (For general matrices, the condition number is defined to be the ratio of the largest to the smallest singular value.) It is often written $\kappa(A)$, and the running time of iterative algorithms is often stated in terms of this quantity. We see that if the condition number is small, then this algorithm quickly provides an approximate solution.

17.5 A polynomial approximation of the inverse

I am now going to give another interpretation of Richardson's iteration: it provides us with a polynomial in $A$ that approximates $A^{-1}$. In particular, the $t$th iterate $x_t$ can be expressed in the form $p_t(A) b$, where $p_t$ is a polynomial of degree $t - 1$. We will view $p_t(A)$ as a good approximation of $A^{-1}$ if

$$\|A \, p_t(A) - I\|$$

is small.

From the formula defining Richardson's iteration (17.1), we find

$$x_0 = 0,$$
$$x_1 = \alpha b,$$
$$x_2 = (I - \alpha A)\alpha b + \alpha b,$$
$$x_3 = (I - \alpha A)^2 \alpha b + (I - \alpha A)\alpha b + \alpha b,$$

and in general

$$x_t = \sum_{i=0}^{t-1} (I - \alpha A)^i \alpha b.$$

To get some idea of why this should be an approximation of $A^{-1}$, consider what we get if we let the sum go to infinity. Assuming that the infinite sum converges, we have

$$\alpha \sum_{i=0}^{\infty} (I - \alpha A)^i = \alpha \big(I - (I - \alpha A)\big)^{-1} = \alpha (\alpha A)^{-1} = A^{-1}.$$

So, the Richardson iteration can be viewed as a truncation of this infinite summation.

In general, a polynomial $p_t$ will enable us to compute a solution to precision $\epsilon$ if

$$\|p_t(A) b - x\| \leq \epsilon \|x\|.$$

As $b = A x$, this is equivalent to

$$\|p_t(A) A x - x\| \leq \epsilon \|x\|,$$

and requiring this for every $x$ is equivalent to

$$\|A \, p_t(A) - I\| \leq \epsilon.$$
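The following short check (mine, not in the notes; the helper names are arbitrary) makes this polynomial view concrete: the truncated sum above reproduces the Richardson iterates exactly, and the explicit matrix $p_t(A)$ satisfies $\|A\, p_t(A) - I\| = \|I - \alpha A\|^t$, matching the convergence rate derived for Richardson's iteration.

```python
import numpy as np

def richardson(A, b, alpha, iters):
    # Same update as in the earlier sketch: x_t = (I - alpha*A) x_{t-1} + alpha*b.
    x = np.zeros_like(b)
    for _ in range(iters):
        x = x - alpha * (A @ x) + alpha * b
    return x

def truncated_neumann(A, b, alpha, t):
    """Evaluate p_t(A) b = sum_{i=0}^{t-1} (I - alpha*A)^i alpha*b."""
    total = np.zeros_like(b)
    term = alpha * b                       # the i = 0 term
    for _ in range(t):
        total = total + term
        term = term - alpha * (A @ term)   # multiply the previous term by (I - alpha*A)
    return total

rng = np.random.default_rng(1)
n = 30
M = rng.standard_normal((n, n))
A = M @ M.T + n * np.eye(n)
b = rng.standard_normal(n)
lam = np.linalg.eigvalsh(A)
alpha = 2.0 / (lam[0] + lam[-1])

t = 25
print(np.allclose(richardson(A, b, alpha, t), truncated_neumann(A, b, alpha, t)))  # True

# Form p_t(A) explicitly and check that ||A p_t(A) - I|| equals ||I - alpha*A||^t.
P = sum(alpha * np.linalg.matrix_power(np.eye(n) - alpha * A, i) for i in range(t))
print(np.linalg.norm(A @ P - np.eye(n), 2), np.linalg.norm(np.eye(n) - alpha * A, 2) ** t)
```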

17.6 Better Polynomials

This leads us to the question of whether we can find better polynomial approximations to $A^{-1}$. The reason I ask is that the answer is yes! As $A$, $p_t(A)$ and $I$ all commute, the matrix $A\, p_t(A) - I$ is symmetric and its norm is the maximum absolute value of its eigenvalues. So, it suffices to find a polynomial $p_t$ such that

$$|\lambda_i\, p_t(\lambda_i) - 1| \leq \epsilon, \quad \text{for all eigenvalues } \lambda_i \text{ of } A.$$

To reformulate this, define $q_t(x) = 1 - x\, p_t(x)$. Then, it suffices to find a polynomial $q_t$ of degree $t$ for which $q_t(0) = 1$ and

$$|q_t(x)| \leq \epsilon, \quad \text{for } \lambda_1 \leq x \leq \lambda_n.$$

We will see that there are polynomials of degree

$$\ln(2/\epsilon) \cdot \frac{\sqrt{\lambda_n/\lambda_1} + 1}{2}$$

that allow us to compute solutions of accuracy $\epsilon$. In terms of the condition number of $A$, this is a quadratic improvement over Richardson's first-order method.

17.7 Laplacian Systems

One might at first think that these techniques do not apply to Laplacian systems, as these are always singular. However, we can apply these techniques without change if $b$ is in the span of $L$, that is, if $b$ is orthogonal to the all-1s vector and the graph is connected. In this case the eigenvalue $\lambda_1 = 0$ plays no role in the analysis, and it is replaced by $\lambda_2$. One way of understanding this is to just view $L$ as an operator acting on the space orthogonal to the all-1s vector.

By considering the example of the Laplacian of the path graph, one can show that it is impossible to do much better than the $\sqrt{\kappa}$ iteration bound that I claimed at the end of the last section. To see this, first observe that when one multiplies a vector $x$ by $L$, the entry $(L x)(i)$ depends only on $x(i-1)$, $x(i)$, and $x(i+1)$. So, if we apply a polynomial of degree at most $t$, $x_t(i)$ will only depend on $b(j)$ with $i - t \leq j \leq i + t$. This tells us that we will need a polynomial of degree on the order of $n$ to solve such a system. On the other hand, $\sqrt{\lambda_n / \lambda_2}$ is on the order of $n$ as well. So, we should not be able to solve the system with a polynomial whose degree is significantly less than $\sqrt{\lambda_n / \lambda_2}$.
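To make the locality argument concrete, here is a small numerical illustration (my own addition; the helper path_laplacian is a name I chose). Applying powers of the path Laplacian to a vector supported on a single vertex spreads its support by exactly one vertex per multiplication, so any degree-$t$ polynomial in $L$ can only move information $t$ steps along the path; the last line also confirms that $\sqrt{\lambda_n/\lambda_2}$ is on the order of $n$.

```python
import numpy as np

def path_laplacian(n):
    """Laplacian of the path graph on n vertices."""
    L = np.zeros((n, n))
    for i in range(n - 1):
        L[i, i] += 1
        L[i + 1, i + 1] += 1
        L[i, i + 1] -= 1
        L[i + 1, i] -= 1
    return L

n = 101
L = path_laplacian(n)
v = np.zeros(n)
v[n // 2] = 1.0          # a vector supported on the middle vertex

for t in [1, 5, 20]:
    w = np.linalg.matrix_power(L, t) @ v
    support = np.flatnonzero(np.abs(w) > 1e-12)
    # The support spreads by exactly one vertex per multiplication by L.
    print(f"t = {t:2d}: nonzero entries span vertices {support.min()}..{support.max()}")

lam = np.linalg.eigvalsh(L)
print("sqrt(lambda_n / lambda_2) =", np.sqrt(lam[-1] / lam[1]), " compared with n =", n)
```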

17.8 Chebyshev Polynomials

I'd now like to explain how we find these better polynomials. The key is to transform one of the most fundamental families of polynomials: the Chebyshev polynomials. Just as one must become familiar with the fundamental families of graphs, one should also become familiar with the fundamental families of polynomials. Certain polynomials, such as Chebyshev polynomials, arise as the solutions to many problems. We will see other fundamental families in future lectures.

The $t$th Chebyshev polynomial, written $T_t$, may be defined as the polynomial such that

$$\cos(tx) = T_t(\cos(x)).$$

It might not be obvious that one can express $\cos(tx)$ as a polynomial in $\cos(x)$. But, one can do it by using the identities

$$\cos(x + y) = \cos(x)\cos(y) - \sin(x)\sin(y),$$
$$\sin(x + y) = \sin(x)\cos(y) + \cos(x)\sin(y),$$

and

$$\sin^2(x) = 1 - \cos^2(x).$$

To see how this works, we'll do the cases when $t$ is a power of 2. We have

$$\cos(2x) = \cos(x)\cos(x) - \sin(x)\sin(x) = \cos^2(x) - \sin^2(x) = 2\cos^2(x) - 1.$$

So,

$$T_2(x) = 2x^2 - 1.$$

In general,

$$\cos(2tx) = \cos(tx)\cos(tx) - \sin(tx)\sin(tx) = \cos^2(tx) - \sin^2(tx) = 2\cos^2(tx) - 1.$$

So,

$$T_{2t}(x) = 2\big(T_t(x)\big)^2 - 1.$$

That is,

$$T_t(x) = \cos(t \operatorname{acos}(x)).$$

The expansion in cosines tells us all that we need to know about the values of this polynomial on the interval $[-1, 1]$. We have $\operatorname{acos} : [-1, 1] \to [0, \pi]$. So, on $[-1, 1]$ the polynomial stays between $-1$ and $1$ and has $t$ roots.

To understand the polynomial outside $[-1, 1]$, we write it in terms of the hyperbolic cosine. Recall that the hyperbolic cosine maps the real line to $[1, \infty)$ and is symmetric about the origin. So, the inverse of the hyperbolic cosine may be viewed as a map from $[1, \infty)$ to $[0, \infty)$. The following are the fundamental facts about the hyperbolic cosine that we require:

$$\cosh(x) = \cos(ix),$$
$$\cosh(x) = \frac{1}{2}\left(e^x + e^{-x}\right), \quad \text{and}$$
$$\operatorname{acosh}(x) = \ln\left(x + \sqrt{x^2 - 1}\right), \quad \text{for } x \geq 1.$$

From the first of these identities, we can see that we can equally well define the Chebyshev polynomials using the hyperbolic cosine:

$$\cosh(tx) = T_t(\cosh(x)), \quad \text{and} \quad T_t(x) = \cosh(t \operatorname{acosh}(x)).$$

This allows us to reason about the values of the Chebyshev polynomials on $[1, \infty)$. The key fact that we will use about these polynomials is that they grow quickly in this region.
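A short sanity check of these two representations (an illustration I have added, using NumPy's Chebyshev class; not part of the notes): on $[-1, 1]$ the degree-$t$ Chebyshev polynomial matches $\cos(t \operatorname{acos} x)$ and stays between $-1$ and $1$, while for $x \geq 1$ it matches $\cosh(t \operatorname{acosh} x)$, which is what lets us control its growth there. The last line previews the growth bound proved in the claim that follows.

```python
import numpy as np
from numpy.polynomial import Chebyshev

t = 7
T_t = Chebyshev.basis(t)          # the degree-t Chebyshev polynomial T_t

# On [-1, 1]: T_t(x) = cos(t * acos(x)) and stays between -1 and 1.
xs = np.linspace(-1, 1, 1001)
print(np.allclose(T_t(xs), np.cos(t * np.arccos(xs))))    # True
print(np.abs(T_t(xs)).max() <= 1 + 1e-12)                 # True

# Outside [-1, 1]: T_t(x) = cosh(t * acosh(x)) and grows quickly.
ys = np.linspace(1, 3, 201)
print(np.allclose(T_t(ys), np.cosh(t * np.arccosh(ys))))  # True

# Growth bound proved next: T_t(1 + gamma) >= (1 + sqrt(2*gamma))**t / 2.
gamma = 0.05
print(T_t(1 + gamma), (1 + np.sqrt(2 * gamma)) ** t / 2)
```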

Claim 17.8.1. For $\gamma > 0$,

$$T_t(1 + \gamma) \geq \frac{\left(1 + \sqrt{2\gamma}\right)^t}{2}.$$

Proof. Setting $x = 1 + \gamma$, we compute

$$T_t(x) = \frac{1}{2}\left(e^{t \operatorname{acosh}(x)} + e^{-t \operatorname{acosh}(x)}\right)
\geq \frac{1}{2} e^{t \operatorname{acosh}(x)}
= \frac{1}{2}\left(x + \sqrt{x^2 - 1}\right)^t
= \frac{1}{2}\left(1 + \gamma + \sqrt{(1 + \gamma)^2 - 1}\right)^t
\geq \frac{1}{2}\left(1 + \sqrt{2\gamma}\right)^t,$$

where the last inequality uses $(1 + \gamma)^2 - 1 \geq 2\gamma$.

17.9 How to Use Chebyshev Polynomials

We will exploit the following properties of the Chebyshev polynomials:

1. $T_t$ has degree $t$.
2. $T_t(x) \in [-1, 1]$, for $x \in [-1, 1]$.
3. $T_t(x)$ is monotonically increasing for $x \geq 1$.
4. $T_t(1 + \gamma) \geq (1 + \sqrt{2\gamma})^t / 2$, for $\gamma > 0$.

To express $q(x)$ in terms of a Chebyshev polynomial, we should map the range on which we want $q$ to be small, $[\lambda_{\min}, \lambda_{\max}]$, to $[-1, 1]$. We will accomplish this with the linear map

$$l(x) \stackrel{\text{def}}{=} \frac{\lambda_{\max} + \lambda_{\min} - 2x}{\lambda_{\max} - \lambda_{\min}}.$$

Note that

$$l(x) = \begin{cases} -1 & \text{if } x = \lambda_{\max}, \\ 1 & \text{if } x = \lambda_{\min}, \\ \dfrac{\lambda_{\max} + \lambda_{\min}}{\lambda_{\max} - \lambda_{\min}} & \text{if } x = 0. \end{cases}$$

So, to guarantee that the constant coefficient in $q(x)$ is one, we should set

$$q(x) \stackrel{\text{def}}{=} \frac{T_t(l(x))}{T_t(l(0))}.$$

We know that $|T_t(l(x))| \leq 1$ for $x \in [\lambda_{\min}, \lambda_{\max}]$. To bound $|q(x)|$ for $x$ in this range, we must compute $T_t(l(0))$.

We have

$$l(0) \geq 1 + 2/\kappa(A),$$

and so by properties 3 and 4 of Chebyshev polynomials,

$$T_t(l(0)) \geq \frac{\left(1 + 2/\sqrt{\kappa}\right)^t}{2}.$$

Thus,

$$|q(x)| \leq 2\left(1 + 2/\sqrt{\kappa}\right)^{-t}, \quad \text{for } x \in [\lambda_{\min}, \lambda_{\max}],$$

and so all eigenvalues of $q(A) = I - A\, p(A)$ will have absolute value at most $2(1 + 2/\sqrt{\kappa})^{-t}$.

17.10 Warning

The polynomial approach that I have described here only works in infinite-precision arithmetic. In finite-precision arithmetic one has to be more careful about how one implements these algorithms. This is why the descriptions of methods such as the Chebyshev method found in numerical linear algebra textbooks are more complicated than the one presented here. The algorithms that are actually used are mathematically identical to this one in infinite precision, but they also work in finite precision. The problems with the naive implementation are not rare corner cases; they are the typical experience: in double-precision arithmetic, the polynomial approach to Chebyshev will fail to solve linear systems in random positive definite matrices in as few as 60 dimensions!
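To close, here is a small sketch of the construction of $q$ (added by me, not part of the lecture; the test matrix and helper names are arbitrary). It builds $q(x) = T_t(l(x)) / T_t(l(0))$ for the spectrum of a random positive definite matrix and checks that $q(0) = 1$ while $|q(\lambda_i)| \leq 2(1 + 2/\sqrt{\kappa})^{-t}$ at every eigenvalue, which is the quadratic improvement over Richardson's method. Evaluating the scalar polynomial this way only checks the mathematics; per the warning above, turning it naively into an iterative solver would run into precision problems.

```python
import numpy as np
from numpy.polynomial import Chebyshev

rng = np.random.default_rng(2)
n = 100
M = rng.standard_normal((n, n))
A = M @ M.T + np.eye(n)                 # a positive definite test matrix
lam = np.linalg.eigvalsh(A)
lam_min, lam_max = lam[0], lam[-1]
kappa = lam_max / lam_min

def q(x, t):
    """q(x) = T_t(l(x)) / T_t(l(0)), with l mapping [lam_min, lam_max] to [-1, 1]."""
    T = Chebyshev.basis(t)
    l = lambda y: (lam_max + lam_min - 2 * y) / (lam_max - lam_min)
    return T(l(x)) / T(l(0))

for t in [5, 10, 20]:
    vals = q(lam, t)                    # q evaluated at every eigenvalue of A
    bound = 2 * (1 + 2 / np.sqrt(kappa)) ** (-t)
    print(f"t = {t:2d}   q(0) = {q(0.0, t):.3f}   "
          f"max |q(lambda_i)| = {np.abs(vals).max():.2e}   bound = {bound:.2e}")
```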