Lecture 10: October 27, 2016


Mathematical Toolkit (Autumn 2016)
Lecturer: Madhur Tulsiani

1 The conjugate gradient method

In the last lecture we saw the steepest descent (or gradient descent) method for finding a solution to the linear system $Ax = b$ for $A \succ 0$. The method guarantees $\|x_t - x^*\| \leq \varepsilon \cdot \|x_0 - x^*\|$ after $t = O(\kappa \log(1/\varepsilon))$ iterations, where $\kappa$ is the condition number of the matrix $A$. We will see that the conjugate gradient method can obtain a similar guarantee in $O(\sqrt{\kappa} \log(1/\varepsilon))$ iterations.

For the steepest descent method, if we start from $x_0 = 0$, we get $x_t - x^* = -(I - \eta A)^t x^*$, which gives $x_t = p(A)\,b$ for some polynomial $p$ of degree at most $t-1$. The conjugate gradient method just takes this idea of finding an $x$ of the form $p(A)\,b$ and runs with it. The method finds $x_t = p_t(A)\,b$, where $p_t$ is the best polynomial of degree at most $t-1$, i.e., the polynomial which minimizes the function $\frac{1}{2}\langle Ax, x \rangle - \langle b, x \rangle$. However, the method does not explicitly work with polynomials. Instead, we use the simple observation that any vector of the form $p_t(A)\,b$ lies in the subspace $\mathrm{Span}\{b, Ab, \ldots, A^{t-1}b\}$, and the method finds the best vector in this subspace at every time $t$.

Definition 1.1. Let $\varphi : V \to V$ be a linear operator on a vector space $V$ and let $v \in V$ be a vector. The Krylov subspace of order $t$ defined by $\varphi$ and $v$ is defined as
\[ \mathcal{K}_t(\varphi, v) := \mathrm{Span}\left\{ v, \varphi v, \ldots, \varphi^{t-1} v \right\}. \]

Thus, at step $t$ of the conjugate gradient method, we find the best vector in the space $\mathcal{K}_t(A, b)$ (we will just write the subspace as $\mathcal{K}_t$, since $A$ and $b$ are fixed for the entire argument). The trick, of course, is to be able to do this in an iterative fashion, so that we can quickly update the minimizer in the space $\mathcal{K}_{t-1}$ to the minimizer in the space $\mathcal{K}_t$. This can be done by expressing the minimizer in $\mathcal{K}_{t-1}$ in terms of a convenient orthonormal basis $\{u_0, \ldots, u_{t-2}\}$ for $\mathcal{K}_{t-1}$. It turns out that if we work with a basis which is orthonormal with respect to the inner product $\langle \cdot, \cdot \rangle_A$, then at step $t$ we only need to update the component of the minimizer along the new vector $u_{t-1}$ that we add to obtain a basis for $\mathcal{K}_t$.
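
To make the Krylov subspace and the $A$-inner product concrete, here is a minimal numpy sketch (our own illustration, not part of the notes) that builds a basis $\{u_0, \ldots, u_{t-1}\}$ of $\mathcal{K}_t(A, b)$ which is orthonormal with respect to $\langle \cdot, \cdot \rangle_A$, by running Gram-Schmidt in the $A$-inner product; the function name krylov_basis_A is ours.

    import numpy as np

    def krylov_basis_A(A, b, t):
        """Return u_0, ..., u_{t-1}: an A-orthonormal basis of K_t(A, b).

        Assumes A is symmetric positive definite and that the Krylov vectors
        b, Ab, ..., A^{t-1} b are linearly independent (so in particular t <= n).
        """
        a_inner = lambda x, y: x @ (A @ y)   # <x, y>_A = <Ax, y>
        basis = []
        w = b.astype(float)                  # current Krylov vector, starts at b
        for _ in range(t):
            v = w.copy()
            for u in basis:                  # Gram-Schmidt step in the A-inner product
                v -= a_inner(w, u) * u
            u = v / np.sqrt(a_inner(v, v))   # normalize so that <u, u>_A = 1
            basis.append(u)
            w = A @ w                        # next Krylov vector A^k b
        return basis

In exact arithmetic the returned vectors satisfy $\langle u_i, u_j \rangle_A = \delta_{ij}$, which is exactly the property the algorithm in the next subsection relies on.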

1.1 The algorithm

Recall that we defined the inner product $\langle x, y \rangle_A := \langle Ax, y \rangle$, where $\langle \cdot, \cdot \rangle$ denotes the standard inner product on $\mathbb{R}^n$. As before, we consider the function $f(x) = \frac{1}{2}\langle Ax, x \rangle - \langle b, x \rangle$ and pick $x_t := \arg\min_{x \in \mathcal{K}_t} f(x)$. This can also be thought of as finding the closest point to $x^*$ in the space $\mathcal{K}_t$ under the distance $\|\cdot\|_A$, since
\[ f(x) = \frac{1}{2}\langle Ax, x \rangle - \langle b, x \rangle = \frac{1}{2}\langle Ax, x \rangle - \langle Ax^*, x \rangle = \frac{1}{2}\langle x, x \rangle_A - \langle x^*, x \rangle_A = \frac{1}{2}\|x - x^*\|_A^2 - \frac{1}{2}\|x^*\|_A^2, \]
which gives $x_t = \arg\min_{x \in \mathcal{K}_t} f(x) = \arg\min_{x \in \mathcal{K}_t} \|x - x^*\|_A$. We have already seen how to characterize the closest point in a subspace to a given point.

Let $\{u_0, \ldots, u_{t-1}\}$ be an orthonormal basis for $\mathcal{K}_t$ under the inner product $\langle \cdot, \cdot \rangle_A$. Completing this to an orthonormal basis $\{u_0, \ldots, u_{n-1}\}$ for $\mathbb{R}^n$, let $x^*$ be expressible as
\[ x^* = \sum_{i=0}^{n-1} c_i u_i = \sum_{i=0}^{n-1} \langle x^*, u_i \rangle_A \, u_i. \]
Then we know that the closest point $x_t$ in $\mathcal{K}_t$, under the distance $\|\cdot\|_A$, is given by
\[ x_t = \sum_{i=0}^{t-1} \langle x^*, u_i \rangle_A \, u_i = \sum_{i=0}^{t-1} \langle Ax^*, u_i \rangle \, u_i = \sum_{i=0}^{t-1} \langle b, u_i \rangle \, u_i. \]
Note that even though we do not know $x^*$, we can find $x_t$ given an orthonormal basis $\{u_0, \ldots, u_{t-1}\}$, since we can compute $\langle b, u_i \rangle$ for all $u_i$. This gives the following algorithm:

- Start with $u_0 = b / \|b\|_A$ as an orthonormal basis for $\mathcal{K}_1$.
- Let $x_t = \sum_{i=0}^{t-1} \langle b, u_i \rangle u_i$ for a basis $\{u_0, \ldots, u_{t-1}\}$ orthonormal under the inner product $\langle \cdot, \cdot \rangle_A$.
- Extend $\{u_0, \ldots, u_{t-1}\}$ to a basis of $\mathcal{K}_{t+1}$ by defining
  \[ v_t = A^t b - \sum_{i=0}^{t-1} \langle A^t b, u_i \rangle_A \, u_i \qquad \text{and} \qquad u_t = \frac{v_t}{\sqrt{\langle v_t, v_t \rangle_A}}. \]
- Update $x_{t+1} = x_t + \langle b, u_t \rangle \, u_t$.
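
The following is a small numpy sketch (our own illustration, not an implementation from the notes) of exactly this procedure, including the naive basis extension; the name conjugate_gradient_naive is ours.

    import numpy as np

    def conjugate_gradient_naive(A, b, steps):
        """CG as described above: maintain an A-orthonormal basis of the
        Krylov subspace and project x* onto it, using <x*, u>_A = <b, u>.

        Naive version: extending the basis at step t costs O(t) matrix-vector
        products; the O(1)-per-step variant is the subject of the homework.
        Assumes A is symmetric positive definite and steps <= n.
        """
        a_inner = lambda x, y: x @ (A @ y)      # <x, y>_A
        u = b / np.sqrt(a_inner(b, b))          # u_0: basis of K_1
        basis = [u]
        x = (b @ u) * u                         # x_1 = <b, u_0> u_0
        w = A @ b                               # current Krylov vector A^t b
        for _ in range(1, steps):
            v = w - sum(a_inner(w, u) * u for u in basis)   # extend K_t to K_{t+1}
            u = v / np.sqrt(a_inner(v, v))
            basis.append(u)
            x = x + (b @ u) * u                 # x_{t+1} = x_t + <b, u_t> u_t
            w = A @ w
        return x

On a small positive definite system where $b$ has a component along every eigenvector, conjugate_gradient_naive(A, b, n) should recover $A^{-1}b$ up to numerical error; forming $A^t b$ explicitly is numerically poor, which is another reason to prefer the iterative update from the homework in practice.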

Notice that the basis extension step here seems to require $O(t)$ matrix-vector multiplications in the $t$-th iteration, and thus we will need $O(t^2)$ matrix-vector multiplications in total for $t$ iterations. This would negate the quadratic advantage we are trying to gain over steepest descent. However, in the homework you will see a way of extending the basis using only $O(1)$ matrix-vector multiplications in each step.

1.2 Bounding the number of iterations

Since $x_t$ lies in the subspace $\mathcal{K}_t$, we have $x_t = p(A)b$ for some polynomial $p$ of degree at most $t-1$. Thus,
\[ x_t - x^* = p(A)b - x^* = p(A)A x^* - x^* = (I - p(A)A)(x_0 - x^*), \]
since $x_0 = 0$. We can think of $I - p(A)A$ as a polynomial $q(A)$, where $\deg(q) \le t$ and $q(0) = 1$. Recall from the last lecture that the minimizer of $f(x)$ is the same as the minimizer of $\langle x - x^*, x - x^* \rangle_A = \|x - x^*\|_A^2$. Since $p(A)b$ is the minimizer of $f(x)$ in $\mathcal{K}_t$, we have
\[ \|x_t - x^*\|_A^2 = \min_{q \in Q_t} \|q(A)(x_0 - x^*)\|_A^2, \]
where $Q_t$ is the set of polynomials defined as $Q_t := \{ q \in \mathbb{R}[z] \mid \deg(q) \le t, \ q(0) = 1 \}$.

Use the fact that if $\lambda$ is an eigenvalue of a matrix $M$, then $\lambda^t$ is an eigenvalue of $M^t$ with the same eigenvector, to prove the following.

Exercise 1.2. Let $\lambda_1, \ldots, \lambda_n$ be the eigenvalues of $A$. Then for any polynomial $q$ and any $v \in \mathbb{R}^n$,
\[ \|q(A)v\|_A \le \max_i |q(\lambda_i)| \cdot \|v\|_A. \]

Using the above, we get that
\[ \|x_t - x^*\|_A \le \min_{q \in Q_t} \max_i |q(\lambda_i)| \cdot \|x_0 - x^*\|_A. \]
Thus, the problem of bounding the norm of $x_t - x^*$ is reduced to finding a polynomial $q$ of degree at most $t$ such that $q(0) = 1$ and $|q(\lambda_i)|$ is small for all $i$.

Exercise 1.3. Verify that using $q(z) = \left( 1 - \frac{2z}{\lambda_1 + \lambda_n} \right)^t$ recovers the guarantee of the steepest descent method.
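
The inequality in Exercise 1.2 is easy to check numerically. Below is a short numpy sketch (our own sanity check, not part of the notes): it draws a random positive definite $A$ and a polynomial $q$ with $q(0) = 1$, and confirms that $\|q(A)v\|_A \le \max_i |q(\lambda_i)| \cdot \|v\|_A$.

    import numpy as np

    # Numerical sanity check of Exercise 1.2 on a random SPD matrix.
    rng = np.random.default_rng(0)
    n = 6
    M = rng.standard_normal((n, n))
    A = M @ M.T + n * np.eye(n)             # symmetric positive definite
    v = rng.standard_normal(n)

    coeffs = [1.0, -0.4, 0.05, -0.001]      # q(z) = 1 - 0.4z + 0.05z^2 - 0.001z^3, so q(0) = 1
    qA = sum(c * np.linalg.matrix_power(A, k) for k, c in enumerate(coeffs))

    a_norm = lambda x: np.sqrt(x @ (A @ x))
    lam = np.linalg.eigvalsh(A)
    lhs = a_norm(qA @ v)
    rhs = np.max(np.abs(np.polyval(coeffs[::-1], lam))) * a_norm(v)
    assert lhs <= rhs + 1e-9                # the bound from Exercise 1.2 holds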

Note that the conjugate gradient method itself does not need to know anything about the optimal polynomials in the above bound; the polynomials are only used in the analysis of the bound. The following claim, which can be proved by using slightly modified Chebyshev polynomials, suffices to obtain the desired bound on the number of iterations.

Claim 1.4. For each $t \in \mathbb{N}$, there exists a polynomial $q_t \in Q_t$ such that
\[ |q_t(z)| \le 2 \left( 1 - \frac{2}{\sqrt{\kappa} + 1} \right)^t \qquad \forall z \in [\lambda_1, \lambda_n]. \]

We will prove the claim later using Chebyshev polynomials. However, using the claim we have that
\[ \|x_t - x^*\|_A \le \min_{q \in Q_t} \max_i |q(\lambda_i)| \cdot \|x_0 - x^*\|_A \le 2 \left( 1 - \frac{2}{\sqrt{\kappa} + 1} \right)^t \|x_0 - x^*\|_A. \]
Thus, $O(\sqrt{\kappa} \log(1/\varepsilon))$ iterations suffice to ensure that $\|x_t - x^*\|_A \le \varepsilon \cdot \|x_0 - x^*\|_A$.

1.3 Chebyshev polynomials

The Chebyshev polynomial of degree $t$ is given by the expression
\[ P_t(z) = \frac{1}{2} \left[ \left( z + \sqrt{z^2 - 1} \right)^t + \left( z - \sqrt{z^2 - 1} \right)^t \right]. \]
Note that this is a polynomial, since the odd powers of $\sqrt{z^2 - 1}$ cancel from the two expansions. For $z \in [-1, 1]$ this can also be written as $P_t(z) = \cos(t \cdot \cos^{-1}(z))$, which shows that $P_t(z) \in [-1, 1]$ for all $z \in [-1, 1]$.

Using these polynomials, we can define the required polynomials $q_t$ as
\[ q_t(z) = \frac{P_t\!\left( \frac{\lambda_1 + \lambda_n - 2z}{\lambda_n - \lambda_1} \right)}{P_t\!\left( \frac{\lambda_1 + \lambda_n}{\lambda_n - \lambda_1} \right)}. \]
The denominator is a constant which does not depend on $z$, and the numerator is a polynomial of degree $t$ in $z$. Hence $\deg(q_t) = t$. Also, the denominator ensures that $q_t(0) = 1$. Finally, for $z \in [\lambda_1, \lambda_n]$, we have
\[ -1 \le \frac{\lambda_1 + \lambda_n - 2z}{\lambda_n - \lambda_1} \le 1. \]
Hence, the numerator is in the range $[-1, 1]$ for all $z \in [\lambda_1, \lambda_n]$. This gives
\[ |q_t(z)| \le \frac{1}{P_t\!\left( \frac{\lambda_1 + \lambda_n}{\lambda_n - \lambda_1} \right)} \le 2 \left( \frac{\sqrt{\kappa} - 1}{\sqrt{\kappa} + 1} \right)^t = 2 \left( 1 - \frac{2}{\sqrt{\kappa} + 1} \right)^t. \]
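
As a quick numerical illustration of Claim 1.4 (ours, not from the notes), the sketch below evaluates $q_t$ via the closed-form expression for $P_t$ on a grid of points in $[\lambda_1, \lambda_n]$ and compares it against the bound $2(1 - 2/(\sqrt{\kappa} + 1))^t$; the eigenvalue range and degree are arbitrary example values.

    import numpy as np

    def cheb(t, z):
        """Chebyshev polynomial P_t(z) via the closed-form expression."""
        s = np.sqrt(np.asarray(z, dtype=complex) ** 2 - 1.0)   # complex sqrt also handles |z| < 1
        return (0.5 * ((z + s) ** t + (z - s) ** t)).real

    lam1, lamn, t = 1.0, 100.0, 10          # example eigenvalue range and degree
    kappa = lamn / lam1

    zs = np.linspace(lam1, lamn, 1000)
    qt = cheb(t, (lam1 + lamn - 2 * zs) / (lamn - lam1)) / cheb(t, (lam1 + lamn) / (lamn - lam1))

    bound = 2 * (1 - 2 / (np.sqrt(kappa) + 1)) ** t
    print(np.max(np.abs(qt)), "<=", bound)  # observed maximum of |q_t| vs. the claimed bound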

The last bound above can be computed directly from the first definition of the Chebyshev polynomials.

2 Matrices associated with graphs

Let $G = (V, E)$ be an undirected graph with $V = [n]$. We consider the matrices $A$, $D$ and $L$ associated with the graph $G$, as defined below:
\[ A_{ij} = \begin{cases} 1 & \text{if } \{i, j\} \in E \\ 0 & \text{otherwise} \end{cases} \qquad D_{ij} = \begin{cases} \deg(i) & \text{if } i = j \\ 0 & \text{otherwise} \end{cases} \qquad L = D - A. \]
The matrix $A$ is called the adjacency matrix of the graph, while $L$ is called the Laplacian matrix of the graph. We will later use these matrices to analyze properties of the underlying graph.

Note that all the above matrices are symmetric, since the graph $G$ is undirected. Thus, the eigenvectors of each of these matrices form an orthonormal basis, and the corresponding eigenvalues are real. Conventionally, the eigenvalues of the Laplacian matrix are written in ascending order. Let $\lambda_1 \le \cdots \le \lambda_n$ be the eigenvalues of $L$. The following proposition shows that $\lambda_1 \ge 0$, i.e., the Laplacian matrix is positive semidefinite.

Proposition 2.1. For all $x \in \mathbb{R}^n$,
\[ \langle x, Lx \rangle = \sum_{\{i, j\} \in E} (x_i - x_j)^2. \]

Note that from the above, we also get that $\mathbf{1} \in \ker(L)$, where $\mathbf{1}$ denotes the vector $(1, \ldots, 1)$. In fact, the kernel of the Laplacian matrix corresponds directly to the connected components of $G$.

Proposition 2.2. Let $k$ be the number of connected components of $G$. Then $\dim(\ker(L)) = k$.

In particular, the above says that if $\lambda_2 = 0$, then $G$ can be divided into two parts with no edges crossing between them. While this seems an unnecessarily complicated way of talking about connected components of a graph, stating it in terms of eigenvalues has the advantage that one can talk about robust versions of this statement. One can then make statements of the form: if $\lambda_2$ is close to zero, then $G$ can be divided into two pieces with only a small number of edges crossing between them. One such statement is known as Cheeger's inequality, and we will see it later in the course. The proof of Cheeger's inequality in fact comes with an algorithm for finding a way of dividing $G$ into two pieces with a small number of crossing edges. This procedure is often known as spectral partitioning.
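
To see these definitions in action, here is a small numpy sketch (our own example, not part of the notes) that builds $A$, $D$ and $L$ for a graph on 5 vertices with two connected components and checks Propositions 2.1 and 2.2 numerically.

    import numpy as np

    # Graph on vertices {0,...,4}: a triangle {0,1,2} plus the edge {3,4},
    # so G has two connected components.
    n = 5
    edges = [(0, 1), (1, 2), (0, 2), (3, 4)]

    A = np.zeros((n, n))
    for i, j in edges:
        A[i, j] = A[j, i] = 1.0              # adjacency matrix
    D = np.diag(A.sum(axis=1))               # degree matrix
    L = D - A                                # Laplacian matrix

    # Proposition 2.1: <x, Lx> equals the sum of (x_i - x_j)^2 over edges.
    x = np.random.default_rng(1).standard_normal(n)
    assert np.isclose(x @ L @ x, sum((x[i] - x[j]) ** 2 for i, j in edges))

    # Proposition 2.2: dim(ker(L)) equals the number of connected components.
    eigvals = np.linalg.eigvalsh(L)
    print(np.sum(np.isclose(eigvals, 0.0)))  # prints 2

The indicator vectors of the two components span the kernel of $L$ here, matching Proposition 2.2.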