U.C. Berkeley CS270: Algorithms                                        Lecture 21
Professor Vazirani and Professor Rao                               Scribe: Anupam
Last revised

1 Laplacian systems in nearly linear time

Building upon the ideas introduced in the previous two lectures, we will see an algorithm for solving Laplacian linear systems in time O(m \log^3 n), based on the recent work of Koutis, Miller and Peng [1]. More generally, linear systems in symmetric diagonally dominant matrices reduce to Laplacian systems.

1.1 A recursive solution

The iterative method actually used in the solver is the recursive preconditioned Chebyshev iteration, which converges in O(\sqrt{\kappa}) iterations. Operationally each iteration involves a few matrix-vector multiplications, so for simplicity we will pretend that the iterative method is gradient descent. Recall the computational steps involved in one iteration of the preconditioned gradient descent method:

    r_i = B^{-1} b - B^{-1} A x_i
    \alpha = \frac{r_i^T r_i}{r_i^T B^{-1} A r_i}
    x_{i+1} = x_i + \alpha r_i                                          (1)

We saw that using low stretch spanning trees as preconditioners yields an algorithm with running time O(m^{3/2}). The idea for improving the running time is to add some edges to the low stretch tree to improve the condition number. The preconditioner B is then no longer a tree, so there is no linear-time algorithm for inverting B; to compute B^{-1}(Ay) we need to solve the linear system Bz = Ay. The linear system in B is solved recursively, yielding the following recurrence for the total running time,

    T(m) = \sqrt{\kappa} \, (T(m') + O(m))                              (2)

The number of iterations is \sqrt{\kappa}, T(m') is the time required to solve a linear system in B, and O(m) is the time required for multiplication by A. The running time for such recurrences can be analyzed using the Master theorem.

Theorem 1. The solutions to the recurrence T(n) = a T(n/b) + f(n) are given by:
(i) If f(n) = O(n^{\log_b a - \epsilon}) then T(n) = O(n^{\log_b a}).
(ii) If f(n) = \Theta(n^{\log_b a} \log^k n) then T(n) = O(n^{\log_b a} \log^{k+1} n).
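To make the iteration (1) concrete, here is a minimal numpy sketch of preconditioned gradient descent. The callback solve_precond standing in for B^{-1} is hypothetical: in the actual solver this solve is performed recursively (and only approximately), and the Jacobi preconditioner in the toy example below is used purely for illustration, not the tree-based preconditioner discussed in these notes.

```python
import numpy as np

def precond_gradient_descent(A, b, solve_precond, iters=300):
    """Sketch of the preconditioned gradient descent iteration (1).

    solve_precond(r) should return (an approximation to) B^{-1} r, where B is
    the preconditioner; in the real algorithm this is the recursive solve."""
    x = np.zeros_like(b)
    for _ in range(iters):
        r = solve_precond(b - A @ x)         # r_i = B^{-1} b - B^{-1} A x_i
        denom = r @ solve_precond(A @ r)     # r_i^T B^{-1} A r_i
        if denom <= 1e-15:
            break
        x = x + ((r @ r) / denom) * r        # x_{i+1} = x_i + alpha r_i
    return x

# Toy usage: a tridiagonal SDD system with a Jacobi (diagonal) preconditioner.
n = 20
A = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
b = np.random.default_rng(0).standard_normal(n)
x = precond_gradient_descent(A, b, solve_precond=lambda r: r / np.diag(A))
print(np.linalg.norm(A @ x - b))             # residual shrinks with more iterations
```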

We will show that for all k \in \mathbb{N} it is possible to construct a preconditioner B with m' = m \log^2 n / k edges and condition number \kappa = k, yielding the recurrence

    T(m) = \sqrt{k} \, (T(m \log^2 n / k) + O(m))                       (3)

Choosing k = \log^4 n we have a = b = \log^2 n and f(m) = \Theta(m \log^2 n) in the recurrence. The running time of the algorithm is T(m) = O(m \log^3 n) by case (ii) of the Master theorem, which is almost linear in m.

1.2 Sparsifiers

The preconditioner B should have two properties: (i) B should be a sparse graph, i.e. the number of edges m' = m \log^2 n / k should be small; (ii) the condition number \kappa(B^{-1}A) = k should be small, which is equivalent to B \preceq A \preceq kB by the claim from the previous lecture.

A sparse graph approximating the spectral properties of G is called a spectral sparsifier; it will be helpful to think about sparsifiers in order to construct the preconditioner B. A graph H is said to be an \epsilon-spectral sparsifier for the graph G if for all x \in \mathbb{R}^n,

    (1 - \epsilon) \le \frac{x^T L_H x}{x^T L_G x} \le (1 + \epsilon)   (4)

The Rayleigh quotient characterization of eigenvalues shows that all the eigenvalues of L_H are within a (1 \pm \epsilon) factor of the eigenvalues of L_G. The weaker notion of an \epsilon-cut sparsifier requires that the weight of every cut S \subseteq [n] in H is within a (1 \pm \epsilon) factor of the weight of the corresponding cut in G,

    (1 - \epsilon) \le \frac{w_H(S, \bar{S})}{w_G(S, \bar{S})} \le (1 + \epsilon)   (5)

If x \in \{0,1\}^n is the characteristic vector of a cut, then x^T L x = \sum_{(i,j) \in E} w_{ij} (x_i - x_j)^2 is the weight of the edges across the cut, showing that an \epsilon-spectral sparsifier H is also an \epsilon-cut sparsifier.

1.2.1 Sparsification

Consider the following general algorithm for sparsification by sampling:

1. Draw q independent random samples from E(G), where edge e with weight w_e is sampled with probability p_e; each sampled copy of e is added to the sparsifier with weight w_e / (q p_e).

Why is this a reasonable algorithm? One way to see this is that the expected weight of the edges across a cut (S, \bar{S}) in H is equal to w_G(S, \bar{S}),

    E[w_H(S, \bar{S})] = \sum_{e \in E_G(S, \bar{S})} q p_e \cdot \frac{w_e}{q p_e} = w_G(S, \bar{S})   (6)

The samples are independent, so for large enough q we expect the weights of cuts in H to be concentrated around their expectations.
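The sampling procedure of Section 1.2.1 is easy to simulate. The sketch below is a toy implementation under simplifying assumptions (edges as a plain Python list, uniform sampling values in the demo rather than effective-resistance based ones); it only illustrates the reweighting w_e/(q p_e) that keeps cut weights correct in expectation.

```python
import numpy as np

def sample_sparsifier(edges, weights, values, q, seed=0):
    """Draw q i.i.d. edge samples with probability proportional to values[e];
    each draw of edge e contributes weight w_e / (q p_e), as in Section 1.2.1."""
    rng = np.random.default_rng(seed)
    p = np.asarray(values, dtype=float)
    p = p / p.sum()                               # normalize to a distribution
    samples = rng.choice(len(edges), size=q, p=p)
    H = {}
    for s in samples:
        H[edges[s]] = H.get(edges[s], 0.0) + weights[s] / (q * p[s])
    return H                                      # sparsifier: edge -> new weight

# Demo: complete graph K_6 with unit weights and uniform sampling values.
n = 6
edges = [(i, j) for i in range(n) for j in range(i + 1, n)]
H = sample_sparsifier(edges, weights=np.ones(len(edges)),
                      values=np.ones(len(edges)), q=10)
print(len(H), sum(H.values()))   # few edges; total weight equals 15 in expectation
```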

With what probabilities should the edges be sampled? For a dumbbell shaped graph in which exactly one edge (u, v) joins the two halves, that edge should be chosen with probability 1 in the sparsifier. The example suggests that the probability of sampling an edge should be proportional to the effective resistance of the edge. The effective resistance R(u, v) is the potential difference across u, v when one unit of current is injected at u and withdrawn at v. The effective resistance between u and v is computed by solving a Laplacian linear system,

    R(u, v) = (\chi_u - \chi_v)^T L^+ (\chi_u - \chi_v)                 (7)

Here \chi_u, \chi_v are the characteristic vectors of the vertices u, v, and the equation follows from the action of the Laplacian on potential and current vectors from Lecture 19. Computing the effective resistances is as hard as solving linear systems; explicitly computing the effective resistances is avoided using the oversampling theorem.

Theorem 2. If the edges of G are sampled with probabilities proportional to values p_e \ge w_e R_e and t = \sum_e p_e, then the graph H obtained from the sparsification algorithm with q = O(t \log t) samples satisfies G/2 \preceq H \preceq 3G/2 with high probability.

The oversampling theorem shows that instead of computing the effective resistances explicitly it suffices to find values p_e \ge w_e R_e, provided the total \sum_e p_e can be bounded. Low stretch spanning trees can be used to compute suitable p_e.

1.3 Oversampling low stretch trees

The stretch of an edge e = (u, v) with respect to a spanning tree T is defined as st(e) = w_e \sum_{i \in [k]} \frac{1}{w_i}, where w_1, \dots, w_k are the weights of the tree edges on the unique path in T between u and v. For example, the stretch of a tree edge is 1; if the tree edges are viewed as resistors with resistances 1/w_i, then st(e) is w_e times the effective resistance between u and v in the tree. The total stretch is the sum of the stretches of all the edges of G. Given a weighted graph G, a spanning tree T with total stretch O(m \log n) can be constructed in nearly linear time.

The stretch of edge e satisfies the condition of the oversampling theorem, since st(e) = w_e R_T(e) where R_T(e) is the effective resistance between the endpoints of e in the tree. The effective resistance can only increase if we remove edges from a graph, so R_T(e) \ge R_e (see the next lecture for a proof), and hence st(e) \ge w_e R_e.

Add a copy of the low stretch spanning tree T scaled by a factor of k to G to obtain G' = G + (k-1)T. Since T \preceq G we have G \preceq G' \preceq kG. The edges of G' are sampled with probabilities proportional to st(e)/k; the condition of the oversampling theorem is satisfied since kT is a spanning tree of G'. As we multiplied the weights of the tree edges by k, the total stretch of the non-tree edges decreases to O(m \log n / k). The sum of the sampling values is t = O(m \log n / k) + (n - 1), so the oversampling theorem tells us that if we sample O(t \log t) = O(m \log^2 n / k) edges then with high probability we obtain a graph H such that

    G \preceq G' \preceq 2H \preceq 3G' \preceq 3kG                     (8)

The oversampling procedure therefore produces a preconditioner with O(m \log^2 n / k) edges and condition number O(k), as required in the analysis of the recurrence in Section 1.1.
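The quantities in this section are easy to check numerically on a toy graph. The sketch below computes effective resistances via the pseudoinverse as in (7) (purely for illustration; avoiding such exact solves is the whole point of the lecture) and verifies on a 4-cycle that the stretch of a non-tree edge equals w_e times the tree-path resistance and upper-bounds w_e R_e.

```python
import numpy as np

def laplacian(n, edges, weights):
    """Weighted Laplacian L = sum_e w_e (chi_u - chi_v)(chi_u - chi_v)^T."""
    L = np.zeros((n, n))
    for (u, v), w in zip(edges, weights):
        L[u, u] += w; L[v, v] += w
        L[u, v] -= w; L[v, u] -= w
    return L

def effective_resistance(L_pinv, u, v):
    """R(u, v) = (chi_u - chi_v)^T L^+ (chi_u - chi_v), equation (7)."""
    x = np.zeros(len(L_pinv)); x[u] = 1.0; x[v] = -1.0
    return x @ L_pinv @ x

# Toy graph: a 4-cycle with unit weights; spanning tree T = path 0-1-2-3.
edges, w = [(0, 1), (1, 2), (2, 3), (3, 0)], [1.0, 1.0, 1.0, 1.0]
R_G = effective_resistance(np.linalg.pinv(laplacian(4, edges, w)), 0, 3)          # 0.75
R_T = effective_resistance(np.linalg.pinv(laplacian(4, edges[:3], w[:3])), 0, 3)  # 3.0
stretch = w[3] * R_T            # st(e) = w_e R_T(e) = 3 >= w_e R_e = 0.75
print(R_G, R_T, stretch)
```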

1.4 Proof of the oversampling theorem

The oversampling theorem was not proved in lecture; it was first proved in [2], and we include the proof for completeness.

1.4.1 Factorization of the Laplacian

The Laplacian is positive semidefinite, and we know that every positive semidefinite matrix can be written as BB^t. A natural factorization of the Laplacian is obtained by fixing an arbitrary orientation for the edges of G and considering the n \times m vertex-edge incidence matrix B:

    B_{ve} =  1   if e = (v, w)
             -1   if e = (w, v)                                         (9)
              0   otherwise

It is easily verified that BB^t is the Laplacian of the unweighted graph G, irrespective of the chosen orientation. The Laplacian of a weighted graph is given by B W B^t, where W is the m \times m diagonal matrix with the edge weights w_e on the diagonal.

1.4.2 Energy expressed in two ways

The total energy dissipated in the electrical network can be expressed both in terms of potentials and in terms of currents. The expression in terms of potentials is equal to the value of the Laplacian quadratic form,

    E(\phi) = \sum_{i \sim j} \frac{(\phi_i - \phi_j)^2}{R_{ij}} = \sum_{i \sim j} w_{ij} (\phi_i - \phi_j)^2 = \phi^T L \phi   (10)

Using the factorization L = B W B^t, the energy of the quadratic form can be rewritten as the squared norm of an m-dimensional vector y \in \mathrm{Im}(W^{1/2} B^t),

    E(\phi) = \phi^t B W^{1/2} W^{1/2} B^t \phi = y^t y,   where y = W^{1/2} B^t \phi   (11)

The Laplacian of a candidate sparsifier is of the form L_H = B W^{1/2} S W^{1/2} B^t, since a sparsifier is obtained by re-weighting the edges. The diagonal entries of S are n_e / (q p_e), where p_e is the probability of sampling e and n_e is the number of times e gets sampled. The energy of the electrical flow in H corresponding to the potential vector \phi is y^t S y, where y = W^{1/2} B^t \phi. The spectral sparsification condition (4) states that y^t y and y^t S y are approximately equal for all y \in \mathrm{Im}(W^{1/2} B^t); the restriction to the image is removed by considering the projector \Pi onto \mathrm{Im}(W^{1/2} B^t): for all y \in \mathbb{R}^m,

    1 - \epsilon \le \frac{y^t \Pi S \Pi y}{y^t \Pi \Pi y} \le 1 + \epsilon   (12)

It therefore suffices to find a reweighting S with \| \Pi S \Pi - \Pi \Pi \|_2 \le \epsilon to produce a spectral sparsifier. Denoting W^{1/2} B^t by L^{1/2}, as it is a square root of L, it is easy to see that the projector is \Pi = L^{1/2} L^+ (L^{1/2})^t.
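The factorization and the energy identity above can be verified directly in a few lines of numpy; the following is a toy check on a small random-weight graph, not part of the algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)
n, edges = 5, [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0), (1, 3)]
m = len(edges)
w = rng.uniform(0.5, 2.0, size=m)

# Signed vertex-edge incidence matrix from equation (9); orientation arbitrary.
B = np.zeros((n, m))
for e, (u, v) in enumerate(edges):
    B[u, e], B[v, e] = 1.0, -1.0

L = B @ np.diag(w) @ B.T                  # weighted Laplacian L = B W B^t
phi = rng.standard_normal(n)              # an arbitrary potential vector
y = np.sqrt(w) * (B.T @ phi)              # y = W^{1/2} B^t phi
print(np.isclose(phi @ L @ phi, y @ y))   # energy identity (10)-(11): True
```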

1.4.3 Properties of the projection \Pi

Here are some observations about \Pi:

(i) \Pi has the same nonzero spectrum as L L^+ by the cyclic property (the nonzero eigenvalues of XY and YX coincide); the rank of \Pi is n - 1, since L L^+ is the identity on the (n-1)-dimensional subspace orthogonal to the all-ones vector.

(ii) The diagonal entry \Pi_{ee} = w_e R_e, where R_e = (\chi_u - \chi_v)^T L^+ (\chi_u - \chi_v) is the effective resistance across the endpoints of e = (u, v); this follows by explicitly computing the diagonal entry e_e^t \Pi e_e for \Pi = W^{1/2} B^t L^+ B W^{1/2}.

Let us consider some general properties of projection matrices. A projector P with column vectors v_i can be expressed as follows:

    P = \sum_{i \in [n]} v_i e_i^t                                      (13)

Squaring, and using P^t = P together with the orthonormality of the e_i, we obtain a decomposition of P in terms of its column vectors,

    P = P^2 = P P^t = \sum_{i \in [n]} v_i v_i^t                        (14)

The expression allows us to compute decompositions of \Pi \Pi and \Pi S \Pi in terms of y_e = v_e / \sqrt{p_e}, the rescaled column vectors of \Pi,

    \Pi \Pi = \sum_e p_e y_e y_e^t = E_p[y y^t]                         (15)

To evaluate \Pi S \Pi we expand each factor in the product using the expansion of \Pi from equation (13), recalling that the diagonal entries of S are n_j / (q p_j),

    \Pi S \Pi = \Big( \sum_{i \in [m]} v_i e_i^t \Big) \Big( \sum_{j \in [m]} \frac{n_j}{q p_j} e_j e_j^t \Big) \Big( \sum_{k \in [m]} e_k v_k^t \Big) = \frac{1}{q} \sum_{s \in [q]} y_{e_s} y_{e_s}^t   (16)

where e_s denotes the s-th sampled edge. The closeness of \Pi \Pi and \Pi S \Pi follows from the following concentration result for matrices.

Theorem 3. If \sup_i \| y_i \| \le M and y_1, y_2, \dots, y_q are independent samples drawn from the distribution p, then

    E \, \Big\| E_p[y y^t] - \frac{1}{q} \sum_{s \in [q]} y_s y_s^t \Big\|_2 \le C M \sqrt{\frac{\log q}{q}}

The upper bound M can be obtained as follows: the sampling probabilities p_e are proportional to values that are at least w_e R_e and sum to t, so p_e \ge w_e R_e / t; combined with \| v_e \|^2 = \Pi_{ee} = w_e R_e this gives \| y_e \|^2 = \| v_e \|^2 / p_e \le t, so we may take M = \sqrt{t}. Choosing q = O(t \log t) samples ensures that the left hand side of the concentration bound is a small constant. The number of edges in the spectral sparsifier is \tilde{O}(m \log^2 n); one log factor comes from the total stretch of the spanning tree, t = \tilde{O}(m \log n), while the second arises from the concentration bound.
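The two properties of \Pi stated above can also be checked numerically; the sketch below (on the same toy graph as before) forms \Pi = W^{1/2} B^t L^+ B W^{1/2} and confirms that it is a projector of rank n-1 whose diagonal entries are w_e R_e.

```python
import numpy as np

rng = np.random.default_rng(1)
n, edges = 5, [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0), (1, 3)]
m = len(edges)
w = rng.uniform(0.5, 2.0, size=m)

B = np.zeros((n, m))
for e, (u, v) in enumerate(edges):
    B[u, e], B[v, e] = 1.0, -1.0
W_half = np.diag(np.sqrt(w))
L_pinv = np.linalg.pinv(B @ np.diag(w) @ B.T)

Pi = W_half @ B.T @ L_pinv @ B @ W_half        # Pi = W^{1/2} B^t L^+ B W^{1/2}
print(np.allclose(Pi @ Pi, Pi))                # Pi is a projector
print(np.isclose(np.trace(Pi), n - 1))         # rank(Pi) = trace(Pi) = n - 1

# Property (ii): Pi_ee = w_e R_e for every edge e = (u, v).
for e, (u, v) in enumerate(edges):
    chi = np.zeros(n); chi[u], chi[v] = 1.0, -1.0
    assert np.isclose(Pi[e, e], w[e] * (chi @ L_pinv @ chi))
print("Pi_ee = w_e R_e verified")
```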

References

[1] I. Koutis, G. L. Miller, and R. Peng. Approaching optimality for solving SDD linear systems. In Proceedings of the 51st Annual IEEE Symposium on Foundations of Computer Science (FOCS), pages 235-244. IEEE, 2010.

[2] D. A. Spielman and N. Srivastava. Graph sparsification by effective resistances. In Proceedings of the 40th Annual ACM Symposium on Theory of Computing (STOC), pages 563-568. ACM, 2008.