Constructing Linear-Sized Spectral Sparsification in Almost-Linear Time
Yin Tat Lee (MIT, Cambridge, USA) and He Sun (The University of Bristol, Bristol, UK)

Abstract. We present the first almost-linear time algorithm for constructing linear-sized spectral sparsifiers of graphs. This improves all previous constructions of linear-sized spectral sparsification, which require $\Omega(n^2)$ time [BSS12, Zou12, AZLO15]. A key ingredient in our algorithm is a novel combination of two techniques used in the literature for constructing spectral sparsification: random sampling by effective resistance [SS11], and adaptive construction based on barrier functions [BSS12, AZLO15].

Keywords: algorithmic spectral graph theory, spectral sparsification
1 Introduction

Graph sparsification is the procedure of approximating a graph $G$ by a sparse graph $\tilde{G}$ such that certain quantities of $G$ are preserved in $\tilde{G}$. For instance, spanners are sparse subgraphs in which the distances between any pair of vertices approximate those in the original graph [Che89]; cut sparsifiers are reweighted sparse subgraphs such that the weight of every cut is approximately the same as in the original graph [BK96]. Since both storing and processing large-scale graphs are expensive, graph sparsification is one of the most fundamental building blocks in designing fast graph algorithms, including solving Laplacian systems [ST04, KMP10, KMP11, KLP12, PS14, LPS15], designing approximation algorithms for the maximum flow problem [BK96, KLOS14, She13], and solving streaming problems [KL13, KLM+14]. Beyond graph problems, techniques developed for spectral sparsification are widely used in randomized linear algebra [Mah11, LMP13, CLM+15], sparsifying linear programs [LS14], and various problems in pure mathematics [SS12, Sri12, MSS13, Bar14].

In this work, we study spectral sparsification, introduced by Spielman and Teng [ST11]: a spectral sparsifier is a reweighted sparse subgraph of the original graph such that, for all real vectors, the Laplacian quadratic forms of the subgraph and the original graph are approximately the same. Formally, for any undirected and weighted graph $G = (V, E, w)$ with $n$ vertices and $m$ edges, a subgraph $\tilde{G}$ of $G$, with proper reweighting of the edges, is a $(1+\varepsilon)$-spectral sparsifier if it holds for any $x \in \mathbb{R}^n$ that
$$(1-\varepsilon)\, x^\top L_G\, x \le x^\top L_{\tilde{G}}\, x \le (1+\varepsilon)\, x^\top L_G\, x,$$
where $L_G$ and $L_{\tilde{G}}$ are the respective graph Laplacian matrices of $G$ and $\tilde{G}$. Spielman and Teng [ST11] presented the first algorithm for constructing spectral sparsification.
For any undirected graph $G$ of $n$ vertices, their algorithm runs in $O(n \log^{c} n / \varepsilon^2)$ time for some large constant $c$, and produces a spectral sparsifier with $O(n \log^{c'} n / \varepsilon^2)$ edges for some constant $c'$. Since then, there has been a wealth of work on spectral sparsification. For instance, Spielman and Srivastava [SS11] presented a nearly-linear time algorithm for constructing a spectral sparsifier with $O(n \log n / \varepsilon^2)$ edges. Batson, Spielman and Srivastava [BSS12] presented an algorithm for constructing spectral sparsifiers with $O(n/\varepsilon^2)$ edges, which is optimal up to a constant. However, all previous constructions either require $\Omega(n^2)$ time in order to produce linear-sized sparsifiers [BSS12, Zou12, AZLO15], or run in $O(n \log^{O(1)} n / \varepsilon^2)$ time but produce sparsifiers with a sub-optimal number of edges.

In this paper we present the first almost-linear time algorithm for constructing linear-sized spectral sparsification for graphs. Our result is summarized as follows:

Theorem 1.1. Let $q \ge 10$ be an integer and $0 < \varepsilon \le 1/10$. Let $G = (V, E, w)$ be an undirected and weighted graph with $n$ vertices and $m$ edges. Then, there is an algorithm that outputs a $(1+\varepsilon)$-spectral sparsifier of $G$ with $O(qn/\varepsilon^2)$ edges. The algorithm runs in $\widetilde{O}(q\, m\, n^{5/q} / \varepsilon^{4+4/q})$ time.

Graph sparsification is known to be a special case of sparsifying sums of rank-1 positive semidefinite (PSD) matrices [BSS12, SS11], and our algorithm works in this general setting as well:

Theorem 1.2. Let $q \ge 10$ be an integer and $0 < \varepsilon \le 1/10$. Let $I = \sum_{i=1}^{m} v_i v_i^\top$ be a sum of $m$ rank-1 PSD matrices. Then, there is an algorithm that outputs scalars $\{s_i\}_{i=1}^m$ with $|\{s_i : s_i \ne 0\}| = O(qn/\varepsilon^2)$ such that
$$(1-\varepsilon)\, I \preceq \sum_{i=1}^m s_i\, v_i v_i^\top \preceq (1+\varepsilon)\, I.$$
The algorithm runs in $\widetilde{O}(q\, m\, n^{\omega - 1 + 3/q} / \varepsilon^2)$ time, where $\omega$ is the matrix-multiplication constant.

A key ingredient in our algorithm is a novel combination of two techniques used in the literature for constructing spectral sparsification: random sampling by effective resistance of edges [SS11], and adaptive construction based on barrier functions [BSS12, AZLO15]. We present an overview of the algorithm, and the intuition behind it, in Section 2.

Preliminaries. Let $G = (V, E, w)$ be a connected, undirected and weighted graph with $n$ vertices, $m$ edges, and weight function $w : V \times V \to \mathbb{R}_{\ge 0}$. The Laplacian matrix of $G$ is the $n \times n$ matrix $L_G$ defined by
$$L_G(u, v) = \begin{cases} -w(u, v) & \text{if } u \sim v, \\ \deg(u) & \text{if } u = v, \\ 0 & \text{otherwise,} \end{cases}$$
where $\deg(u) = \sum_{v \sim u} w(u, v)$. It is easy to see that $x^\top L_G\, x = \sum_{u \sim v} w(u, v)\,(x_u - x_v)^2 \ge 0$ for any $x \in \mathbb{R}^n$.

For any matrix $A$, let $\lambda_{\max}(A)$ and $\lambda_{\min}(A)$ be the maximum and minimum eigenvalues of $A$. The condition number of a matrix $A$ is defined by $\lambda_{\max}(A)/\lambda_{\min}(A)$. For any two matrices $A$ and $B$, we write $A \preceq B$ to indicate that $B - A$ is positive semi-definite (PSD), and $A \prec B$ to indicate that $B - A$ is positive definite. For any two matrices $A$ and $B$ of equal dimensions, let $A \bullet B \triangleq \mathrm{tr}(A^\top B)$. For any function $f$, we write $\widetilde{O}(f) \triangleq O(f \log^{O(1)} f)$. For matrices $A$ and $B$, we write $A \approx_{\varepsilon} B$ if $(1-\varepsilon) A \preceq B \preceq (1+\varepsilon) A$.

2 Algorithm

We study the algorithm for sparsifying sums of rank-1 PSD matrices in this section. Our goal is, for any vectors $v_1, \ldots, v_m$ with $\sum_{i=1}^m v_i v_i^\top = I$, to find scalars $\{s_i\}_{i=1}^m$ satisfying $|\{s_i : s_i \ne 0\}| = O(qn/\varepsilon^2)$, such that
$$(1-\varepsilon)\, I \preceq \sum_{i=1}^m s_i\, v_i v_i^\top \preceq (1+\varepsilon)\, I.$$
We will use this algorithm to construct graph sparsifiers in Section 3.

2.1 Overview of Our Approach

Our construction is based on a probabilistic view of the algorithm presented in Batson et al. [BSS12]. We refer to their algorithm as BSS for short, and first give a brief overview of it. At a high level, the BSS algorithm proceeds in iterations, and in iteration $j$ adds a rank-1 matrix $c \cdot v_i v_i^\top$, for some scaling factor $c$, to the currently constructed matrix $A_j$.
To control the spectral properties of the matrix $A_j$, the algorithm maintains two barrier values $u_j$ and $l_j$, with $u_0 > 0$ and $l_0 < 0$ initially. It was proven that in each iteration one can always find a vector in $\{v_i\}_{i=1}^m$ and update $u_j, l_j$ in a proper manner, such that the invariant
$$l_j I \prec A_j \prec u_j I \qquad (2.1)$$
always holds [BSS12]. To guarantee this, Batson et al. [BSS12] introduced the potential function
$$\Phi_{u,l}(A) \triangleq \mathrm{tr}\,(uI - A)^{-1} + \mathrm{tr}\,(A - lI)^{-1} \qquad (2.2)$$
to measure how far the eigenvalues of $A$ are from the barriers $u$ and $l$, since a small value of $\Phi_{u,l}(A)$ implies that no eigenvalue of $A$ is close to $u$ or $l$. With the help of the potential function, it was proven that, after $k = \Theta(n/\varepsilon^2)$ iterations, it holds that $l_k \ge c\, u_k$ for some constant $c$, implying that the resulting matrix $A_k$ has a linear number of terms and, after scaling, satisfies $A_k \approx_{O(\varepsilon)} I$.

The original BSS algorithm is deterministic, and in each iteration the algorithm finds a rank-1 matrix which maximizes certain quantities. To informally explain our algorithm, let us look at the following randomized variant of the BSS algorithm: in each iteration, we choose a vector $v_i$ with probability $p_i$, and add the rank-1 matrix $\Delta_A \triangleq \frac{\varepsilon}{t} \cdot \frac{1}{p_i}\, v_i v_i^\top$ to the current matrix $A$. See Algorithm 1 for a formal description.

Algorithm 1 Randomized BSS algorithm
1: $j = 0$;
2: $l_0 = -8n/\varepsilon$, $u_0 = 8n/\varepsilon$;
3: $A_0 = 0$;
4: while $u_j - l_j < 8n/\varepsilon^2$ do
5:   Let $t = \mathrm{tr}\,(u_j I - A_j)^{-1} + \mathrm{tr}\,(A_j - l_j I)^{-1}$;
6:   Sample a vector $v_i$ with probability $p_i \triangleq \big(v_i^\top (u_j I - A_j)^{-1} v_i + v_i^\top (A_j - l_j I)^{-1} v_i\big)/t$;
7:   $A_{j+1} = A_j + \frac{\varepsilon}{t} \cdot \frac{1}{p_i}\, v_i v_i^\top$;
8:   $u_{j+1} = u_j + \frac{\varepsilon}{t} \cdot \frac{1}{1-\varepsilon}$ and $l_{j+1} = l_j + \frac{\varepsilon}{t} \cdot \frac{1}{1+\varepsilon}$;
9:   $j \leftarrow j + 1$;
10: Return $A_j$;

Let us look at any fixed iteration $j$, and analyze how the added $\Delta_A$ impacts the potential function. We drop the subscript representing the iteration $j$ for simplicity. After adding $\Delta_A$, the first-order approximation of $\Phi_{u,l}$ gives that
$$\Phi_{u,l}(A + \Delta_A) \approx \Phi_{u,l}(A) + (uI - A)^{-2} \bullet \Delta_A - (A - lI)^{-2} \bullet \Delta_A. \qquad (2.3)$$
Since we have that
$$\mathbb{E}[\Delta_A] = \sum_{i=1}^m p_i \cdot \frac{\varepsilon}{t} \cdot \frac{1}{p_i}\, v_i v_i^\top = \frac{\varepsilon}{t} \sum_{i=1}^m v_i v_i^\top = \frac{\varepsilon}{t}\, I,$$
it follows that
$$\mathbb{E}\big[\Phi_{u,l}(A + \Delta_A)\big] \approx \Phi_{u,l}(A) + \frac{\varepsilon}{t}\, \mathrm{tr}\,(uI - A)^{-2} - \frac{\varepsilon}{t}\, \mathrm{tr}\,(A - lI)^{-2} = \Phi_{u,l}(A) - \frac{\varepsilon}{t} \frac{d}{du}\Phi_{u,l}(A) - \frac{\varepsilon}{t} \frac{d}{dl}\Phi_{u,l}(A).$$
Notice that if we increase $u$ by $\varepsilon/t$ and $l$ by $\varepsilon/t$, then $\Phi_{u,l}$ approximately increases by $\frac{\varepsilon}{t} \frac{d}{du}\Phi_{u,l}(A) + \frac{\varepsilon}{t} \frac{d}{dl}\Phi_{u,l}(A)$.
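The sampling step above can be illustrated concretely. The following is a minimal numpy sketch of one iteration of the randomized BSS step on a small synthetic isotropic instance (the dimensions, the seed, and the whitening construction are our own choices for illustration, not from the paper). It checks that the probabilities $p_i$ sum to one and that the rank-1 update is unbiased, i.e. $\mathbb{E}[\Delta_A] = (\varepsilon/t)\, I$:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical isotropic instance: rows v_i with sum_i v_i v_i^T = I.
n, m = 4, 30
V = rng.standard_normal((m, n))
# Whiten so that V^T V = I, i.e. the v_i form a decomposition of the identity.
w_, U = np.linalg.eigh(V.T @ V)
V = V @ U @ np.diag(w_ ** -0.5) @ U.T

eps = 0.1
u, l = 8 * n / eps, -8 * n / eps      # initial barrier values of Algorithm 1
A = np.zeros((n, n))

# One iteration of the randomized BSS step.
Uinv = np.linalg.inv(u * np.eye(n) - A)   # (uI - A)^{-1}
Linv = np.linalg.inv(A - l * np.eye(n))   # (A - lI)^{-1}
t = np.trace(Uinv) + np.trace(Linv)
p = np.array([v @ Uinv @ v + v @ Linv @ v for v in V]) / t

i = rng.choice(m, p=p)                    # sample v_i with probability p_i
A = A + (eps / t) * np.outer(V[i], V[i]) / p[i]

# The probabilities sum to 1, and the update is unbiased:
# E[dA] = sum_i p_i * (eps / (t p_i)) v_i v_i^T = (eps / t) I.
expected = sum(p[j] * (eps / (t * p[j])) * np.outer(V[j], V[j]) for j in range(m))
assert np.isclose(p.sum(), 1.0)
assert np.allclose(expected, (eps / t) * np.eye(n))
```

The whitening trick makes any random vector collection isotropic, which is exactly the normalization $\sum_i v_i v_i^\top = I$ assumed throughout Section 2.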
Hence, comparing $\Phi_{u+\varepsilon/t,\, l+\varepsilon/t}(A + \Delta_A)$ with $\Phi_{u,l}(A)$, the increase of the potential function due to the change of the barrier values is approximately compensated by the drop of the potential function due to the effect of $\Delta_A$. For a more rigorous analysis, we need to look at the higher-order terms, and increase $u$ slightly more than $l$ to compensate for them. Batson et al. [BSS12] give the following estimate:

Lemma 2.1 ([BSS12], proof of Lemmas 3.3 and 3.4). Let $A \in \mathbb{R}^{n \times n}$, and let $u, l$ be parameters satisfying $lI \prec A \prec uI$. Suppose that $w \in \mathbb{R}^n$ satisfies $ww^\top \preceq \delta\,(uI - A)$ and $ww^\top \preceq \delta\,(A - lI)$ for some $0 < \delta < 1$. Then, it holds that
$$\Phi_{u,l}(A + ww^\top) \le \Phi_{u,l}(A) + \frac{w^\top (uI - A)^{-2} w}{1 - \delta} - \frac{w^\top (A - lI)^{-2} w}{1 + \delta}.$$

The estimate above shows that the first-order approximation (2.3) is good if $ww^\top \preceq \delta (uI - A)$ and $ww^\top \preceq \delta (A - lI)$ for small $\delta$. It is easy to check that, by setting $\delta = \varepsilon$, the added matrix $\Delta_A$ satisfies these two conditions, since
$$\frac{\varepsilon}{t} \cdot \frac{1}{p_i}\, v_i v_i^\top = \frac{\varepsilon\, v_i v_i^\top}{v_i^\top (uI - A)^{-1} v_i + v_i^\top (A - lI)^{-1} v_i} \preceq \frac{\varepsilon\, v_i v_i^\top}{v_i^\top (uI - A)^{-1} v_i} \preceq \varepsilon\,(uI - A),$$
where we used the fact that $vv^\top \preceq (v^\top B^{-1} v)\, B$ for any vector $v$ and PSD matrix $B$. Similarly, we have that
$$\frac{\varepsilon}{t} \cdot \frac{1}{p_i}\, v_i v_i^\top \preceq \varepsilon\,(A - lI).$$
Hence, if $\Phi_{u,l}(A)$ is small initially, our crude calculation above gives a good approximation, and $\Phi_{u,l}(A)$ stays small throughout the execution of the whole algorithm. Up to a constant factor, this gives the same result as [BSS12], and therefore Algorithm 1 constructs a $\Theta(n/\varepsilon^2)$-sized $(1 + O(\varepsilon))$-spectral sparsifier.

Our algorithm follows the same framework as Algorithm 1. However, to construct a spectral sparsifier in almost-linear time, we need the sampling probabilities $\{p_i\}_{i=1}^m$ of the vectors to (i) be approximately computable fast, and (ii) be reusable for a few iterations. For fast approximation of the sampling probabilities, we adopt the idea proposed in [AZLO15]: instead of defining the potential function by (2.2), we define it by
$$\Phi_{u,l}(A) \triangleq \mathrm{tr}\,(uI - A)^{-q} + \mathrm{tr}\,(A - lI)^{-q}.$$
Since $q$ is a large constant, the value of the potential function becomes large whenever some eigenvalue of $A$ is close to $u$ or $l$.
Hence, a bounded value of $\Phi_{u,l}(A)$ ensures that the eigenvalues of $A$ never get too close to $u$ or $l$, which in turn allows us to compute the sampling probabilities $\{p_i\}_{i=1}^m$ efficiently simply by Taylor expansion. Moreover, by defining the potential function based on $\mathrm{tr}\,(\cdot)^{-q}$, one can prove a result similar to Lemma 2.1. This gives an alternative analysis of the algorithm presented in [AZLO15], which is the first almost-quadratic time algorithm for constructing linear-sized spectral sparsifiers.

To reuse the sampling probabilities, we re-compute $\{p_i\}_{i=1}^m$ only after every $\Theta(n^{1-1/q})$ iterations: we show that, as long as the sampling probability satisfies
$$p_i \ge \frac{1}{C} \cdot \frac{v_i^\top (uI - A)^{-1} v_i + v_i^\top (A - lI)^{-1} v_i}{\sum_{i=1}^m \big( v_i^\top (uI - A)^{-1} v_i + v_i^\top (A - lI)^{-1} v_i \big)}$$
for some constant $C > 0$, we can still sample $v_i$ with probability $p_i$ and get the same guarantee on the potential function. The reason is as follows. Assume that $\Delta_A = \sum_{i=1}^{T} \Delta_{A,i}$ is the sum
of the sampled matrices within $T = O(n^{1-1/q})$ iterations. If every randomly chosen matrix $\Delta_{A,i}$ satisfies $\Delta_{A,i} \preceq \frac{1}{Cq}\,(uI - A)$, then by the matrix Chernoff bound $\Delta_A \preceq \frac{1}{2}\,(uI - A)$ holds with high probability. By scaling every sampled rank-1 matrix down by a factor of $q$, the sampling probabilities only change by a constant factor within these $T$ iterations. Since we choose $\Theta(n/\varepsilon^2)$ vectors in total, our algorithm only recomputes the sampling probabilities $\Theta(n^{1/q}/\varepsilon^2)$ times. Hence, our algorithm runs in almost-linear time if $q$ is a large constant.

2.2 Algorithm Description

The algorithm follows the same framework as Algorithm 1, and proceeds in iterations. Initially, the algorithm sets $u_0 \triangleq n^{1/q}$, $l_0 \triangleq -n^{1/q}$, and $A_0 \triangleq 0$. After iteration $j$ the algorithm updates $u_j, l_j$ by $\Delta_{u,j}, \Delta_{l,j}$ respectively, i.e., $u_{j+1} \triangleq u_j + \Delta_{u,j}$, $l_{j+1} \triangleq l_j + \Delta_{l,j}$, and updates $A_j$ with respect to the matrices chosen in iteration $j$. The choice of $\Delta_{u,j}$ and $\Delta_{l,j}$ ensures that $l_j I \prec A_j \prec u_j I$ holds for all $j$. In iteration $j$, the algorithm computes the relative effective resistance of every vector $v_i$, defined by
$$R_i(A_j, u_j, l_j) \triangleq v_i^\top (u_j I - A_j)^{-1} v_i + v_i^\top (A_j - l_j I)^{-1} v_i,$$
and samples $N_j$ vectors independently with replacement, where vector $v_i$ is chosen with probability proportional to $R_i(A_j, u_j, l_j)$, and
$$N_j \triangleq \frac{1}{n^{2/q}} \cdot \sum_{i=1}^m R_i(A_j, u_j, l_j) \cdot \min\big\{\lambda_{\min}(u_j I - A_j),\ \lambda_{\min}(A_j - l_j I)\big\}.$$
The algorithm sets $A_{j+1}$ to be the sum of $A_j$ and the sampled $v_i v_i^\top$ with proper reweighting. For technical reasons, we define $\Delta_{u,j}$ and $\Delta_{l,j}$ by
$$\Delta_{u,j} \triangleq (1 + 2\varepsilon) \cdot \frac{\varepsilon}{q} \cdot \frac{N_j}{\sum_{i=1}^m R_i(A_j, u_j, l_j)}, \qquad \Delta_{l,j} \triangleq (1 - 2\varepsilon) \cdot \frac{\varepsilon}{q} \cdot \frac{N_j}{\sum_{i=1}^m R_i(A_j, u_j, l_j)}.$$
See Algorithm 2 for a formal description. We remark that, although exact values of $N_j$ and the relative effective resistances are difficult to compute in almost-linear time, we can use approximate values of $R_i$ and $N_j$ instead. It is easy to see that in each iteration an over-estimate of $R_i$ and an under-estimate of $N_j$, each within a constant factor, suffice for our purpose.

3 Analysis

We analyze Algorithm 2 in this section.
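The batched iteration just described can be sketched in a few lines of numpy. This is a minimal illustration on a synthetic isotropic instance (the sizes, the seed, and the constant-free rounding of $N_j$ are our own simplifications, not the paper's implementation); it verifies that the accumulated update $W_j$ has expectation $\frac{\varepsilon}{q} \cdot \frac{N_j}{\sum_i R_i}\, I$:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical isotropic instance: sum_i v_i v_i^T = I.
n, m, q, eps = 4, 40, 10, 0.1
V = rng.standard_normal((m, n))
w_, U = np.linalg.eigh(V.T @ V)
V = V @ U @ np.diag(w_ ** -0.5) @ U.T     # whiten: V^T V = I

u, l = n ** (1 / q), -n ** (1 / q)        # initial barriers of Algorithm 2
A = np.zeros((n, n))

Uinv = np.linalg.inv(u * np.eye(n) - A)
Linv = np.linalg.inv(A - l * np.eye(n))
# Relative effective resistances R_i(A, u, l).
R = np.array([v @ Uinv @ v + v @ Linv @ v for v in V])

# Batch size N_j: proportional to sum_i R_i times the distance to the barriers.
gap = min(np.linalg.eigvalsh(u * np.eye(n) - A).min(),
          np.linalg.eigvalsh(A - l * np.eye(n)).min())
N = max(1, int(R.sum() * gap / n ** (2 / q)))

# Sample N vectors with replacement; each sampled v_i is scaled by eps / (q R_i).
idx = rng.choice(m, size=N, p=R / R.sum())
W = sum((eps / (q * R[i])) * np.outer(V[i], V[i]) for i in idx)

# E[W] = (eps * N / (q * sum_i R_i)) I, by the same computation as Lemma 3.3.
expected = N * sum((R[i] / R.sum()) * (eps / (q * R[i])) * np.outer(V[i], V[i])
                   for i in range(m))
assert np.allclose(expected, (eps * N / (q * R.sum())) * np.eye(n))
```

Because each sampled term is divided by its own resistance $R_i$, vectors that are nearly aligned with directions close to a barrier are sampled often but with small weight, which is exactly what keeps $W_j$ spectrally bounded.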
To make the calculations less messy, we assume the following:

Assumption 3.1. We always assume that $0 < \varepsilon \le 1/10$, and that $q$ is an integer satisfying $q \ge 10$.
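Before the iteration-by-iteration analysis, a small numeric illustration of the $\mathrm{tr}\,(\cdot)^{-q}$ potential may help. The example values below are hypothetical; the point is that each term of $\Phi_{u,l}(A)$ is at most $\Phi_{u,l}(A)$ itself, so every eigenvalue of $A$ keeps distance at least $\Phi_{u,l}(A)^{-1/q}$ from both barriers:

```python
import numpy as np

# Hypothetical example: Phi_{u,l}(A) via eigenvalues, and the margin it implies.
q, u, l = 10, 2.0, -2.0
lam = np.array([-1.0, 0.0, 0.5, 1.5])   # eigenvalues of a symmetric A with lI < A < uI

phi = np.sum((u - lam) ** -q) + np.sum((lam - l) ** -q)

# (u - lam_i)^{-q} <= phi implies u - lam_i >= phi^{-1/q}, and similarly for l.
margin = phi ** (-1 / q)
assert np.all(lam < u - margin + 1e-12)
assert np.all(lam > l + margin - 1e-12)
```

This margin is what later makes the Taylor-expansion-based approximation of the resistances possible: a bounded potential guarantees that $(u I - A)^{-1}$ and $(A - lI)^{-1}$ are well conditioned.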
Algorithm 2 Algorithm for constructing spectral sparsifiers
Require: $\varepsilon \le 1/10$, $q \ge 10$
1: $j = 0$;
2: $l_0 = -n^{1/q}$, $u_0 = n^{1/q}$, $A_0 = 0$;
3: while $u_j - l_j < 4 n^{1/q}$ do
4:   $W_j = 0$;
5:   Compute $R_i(A_j, u_j, l_j)$ for all vectors $v_i$;
6:   Sample $N_j$ vectors independently with replacement, where every $v_i$ is chosen with probability proportional to $R_i(A_j, u_j, l_j)$. For every sampled $v_i$, add $\frac{\varepsilon}{q}\, R_i(A_j, u_j, l_j)^{-1}\, v_i v_i^\top$ to $W_j$;
7:   $A_{j+1} = A_j + W_j$;
8:   $u_{j+1} = u_j + \Delta_{u,j}$, $l_{j+1} = l_j + \Delta_{l,j}$;
9:   $j = j + 1$;
10: Return $A_j$;

Our analysis is based on a potential function $\Phi_{u,l}$ with barrier values $u, l \in \mathbb{R}$. Formally, for a symmetric matrix $A \in \mathbb{R}^{n \times n}$ with eigenvalues $\lambda_1 \le \cdots \le \lambda_n$ and parameters $u, l$ satisfying $lI \prec A \prec uI$, let
$$\Phi_{u,l}(A) \triangleq \mathrm{tr}\,(uI - A)^{-q} + \mathrm{tr}\,(A - lI)^{-q} = \sum_{i=1}^n \frac{1}{(u - \lambda_i)^q} + \sum_{i=1}^n \frac{1}{(\lambda_i - l)^q}.$$
We show how the potential function evolves after each iteration in Section 3.1. Combining this with the ending condition of the algorithm, we prove in Section 3.2 that the algorithm outputs a linear-sized spectral sparsifier. We prove Theorem 1.1 and Theorem 1.2 in Section 3.3.

3.1 Analysis of a Single Iteration

We analyze the sampling scheme within a single iteration, and drop the subscript representing the iteration $j$ for simplicity. Recall that in each iteration the algorithm samples $N$ vectors independently from $V = \{v_i\}_{i=1}^m$ satisfying $\sum_{i=1}^m v_i v_i^\top = I$, where every vector $v_i$ is sampled with probability $\frac{R_i(A,u,l)}{\sum_{j=1}^m R_j(A,u,l)}$. We use $v^{(1)}, \ldots, v^{(N)}$ to denote these $N$ sampled vectors, and define the reweighted vectors by
$$w_i \triangleq \sqrt{\frac{\varepsilon}{q\, R^{(i)}(A, u, l)}}\; v^{(i)}$$
for any $1 \le i \le N$, where $R^{(i)}$ denotes the relative effective resistance of $v^{(i)}$. Let
$$W \triangleq \sum_{i=1}^N w_i w_i^\top,$$
and we write $W \sim \mathcal{D}(A, u, l)$ to denote that $W$ is sampled in this way with parameters $A$, $u$ and $l$. We will show that with high probability the matrix $W$ satisfies $0 \preceq W \preceq \frac{1}{2}(uI - A)$. We first recall the following matrix Chernoff bound.

Lemma 3.2 (Matrix Chernoff Bound, [Tro12]). Let $\{X_k\}$ be a finite sequence of independent, random, self-adjoint matrices of dimension $n$. Assume that each random matrix satisfies
$X_k \succeq 0$ and $\lambda_{\max}(X_k) \le D$. Let $\mu \triangleq \lambda_{\max}\big(\sum_k \mathbb{E}[X_k]\big)$. Then, it holds for any $\delta \ge 0$ that
$$\mathbb{P}\Big[\lambda_{\max}\Big(\sum_k X_k\Big) \ge (1+\delta)\,\mu\Big] \le n \cdot \left(\frac{e^{\delta}}{(1+\delta)^{1+\delta}}\right)^{\mu/D}.$$

Lemma 3.3. Assume that the number of samples satisfies
$$N \le \frac{1}{n^{2/q}} \cdot \sum_{i=1}^m R_i(A, u, l) \cdot \lambda_{\min}(uI - A).$$
Then, it holds that
$$\mathbb{E}[W] = \frac{\varepsilon}{q} \cdot \frac{N}{\sum_{i=1}^m R_i(A, u, l)} \cdot I, \qquad \text{and} \qquad \mathbb{P}\Big[\,0 \preceq W \preceq \tfrac{1}{2}(uI - A)\,\Big] \ge 1 - \frac{\varepsilon}{100qn}.$$

Proof. By the description of the sampling procedure, it holds that
$$\mathbb{E}\big[w_i w_i^\top\big] = \sum_{j=1}^m \frac{R_j(A, u, l)}{\sum_{t=1}^m R_t(A, u, l)} \cdot \frac{\varepsilon}{q} \cdot \frac{v_j v_j^\top}{R_j(A, u, l)} = \frac{\varepsilon}{q} \cdot \frac{1}{\sum_{t=1}^m R_t(A, u, l)} \cdot I,$$
and
$$\mathbb{E}[W] = \mathbb{E}\Big[\sum_{i=1}^N w_i w_i^\top\Big] = \frac{\varepsilon}{q} \cdot \frac{N}{\sum_{i=1}^m R_i(A, u, l)} \cdot I,$$
which proves the first statement.

Now for the second statement. Let $z_i \triangleq (uI - A)^{-1/2}\, w_i$. It holds that
$$\mathrm{tr}\big(z_i z_i^\top\big) = \mathrm{tr}\big((uI - A)^{-1/2}\, w_i w_i^\top\, (uI - A)^{-1/2}\big) = \frac{\varepsilon}{q} \cdot \frac{v^{(i)\top} (uI - A)^{-1} v^{(i)}}{v^{(i)\top} (uI - A)^{-1} v^{(i)} + v^{(i)\top} (A - lI)^{-1} v^{(i)}} \le \frac{\varepsilon}{q},$$
and hence $\lambda_{\max}(z_i z_i^\top) \le \varepsilon/q$. Moreover, it holds that
$$\mathbb{E}\Big[\sum_{i=1}^N z_i z_i^\top\Big] = \frac{\varepsilon}{q} \cdot \frac{N}{\sum_{t=1}^m R_t(A, u, l)} \cdot (uI - A)^{-1}. \qquad (3.2)$$
This implies that
$$\lambda_{\max}\Big(\mathbb{E}\Big[\sum_{i=1}^N z_i z_i^\top\Big]\Big) \le \frac{\varepsilon}{q} \cdot \frac{N}{\sum_{t=1}^m R_t(A, u, l)} \cdot \lambda_{\max}\big((uI - A)^{-1}\big).$$
By setting
$$\mu = \frac{\varepsilon}{q} \cdot \frac{N}{\sum_{i=1}^m R_i(A, u, l)} \cdot \lambda_{\max}\big((uI - A)^{-1}\big),$$
it holds by the matrix Chernoff bound (cf. Lemma 3.2) that
$$\mathbb{P}\Big[\lambda_{\max}\Big(\sum_{i=1}^N z_i z_i^\top\Big) \ge (1+\delta)\,\mu\Big] \le n \cdot \left(\frac{e^{\delta}}{(1+\delta)^{1+\delta}}\right)^{\mu q/\varepsilon}.$$
Set the value of $1 + \delta$ to be
$$1 + \delta = \frac{1}{2\mu} = \frac{q}{2\varepsilon N} \cdot \sum_{j=1}^m R_j(A, u, l) \cdot \lambda_{\min}(uI - A) \ge \frac{q}{2\varepsilon}\, n^{2/q},$$
where the last inequality follows from the condition on $N$. Hence, it holds that
$$n \cdot \left(\frac{e^{\delta}}{(1+\delta)^{1+\delta}}\right)^{\mu q/\varepsilon} \le n \cdot \left(\frac{e}{1+\delta}\right)^{(1+\delta)\mu q/\varepsilon} \le \frac{\varepsilon}{100qn},$$
and therefore with probability at least $1 - \frac{\varepsilon}{100qn}$ we have
$$\lambda_{\max}\Big(\sum_{i=1}^N z_i z_i^\top\Big) \le (1+\delta)\,\mu = \frac{1}{2},$$
which implies that $0 \preceq \sum_{i=1}^N z_i z_i^\top \preceq \frac{1}{2} I$ and $0 \preceq W \preceq \frac{1}{2}(uI - A)$.

Now we analyze the change of the potential function after each iteration, and show that the expected value of the potential function does not increase over time. By Lemma 3.3, with probability at least $1 - \frac{\varepsilon}{100qn}$ it holds that $0 \preceq W \preceq \frac{1}{2}(uI - A)$. We define the restricted expectation
$$\widetilde{\mathbb{E}}\big[f(W)\big] \triangleq \sum_{W} \mathbb{P}_{W \sim \mathcal{D}(A,u,l)}\Big[W \text{ is chosen and } W \preceq \tfrac{1}{2}(uI - A)\Big] \cdot f(W).$$
Lemma 3.4 below shows how the potential function changes after each iteration, and plays a key role in our analysis. This lemma was first proved in [BSS12] for the case $q = 1$, and was extended in [AZLO15] to general values of $q$. For completeness, we include the proof of the lemma in the appendix.
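The two inequalities of Lemma 3.4 can be checked numerically on a concrete instance. The sketch below uses a hypothetical diagonal $A$ and a small vector $w$ chosen so that the hypotheses $w^\top (uI - A)^{-1} w \le \varepsilon/q$ and $w^\top (A - lI)^{-1} w \le \varepsilon/q$ hold (all values are our own choices for illustration):

```python
import numpy as np

# Numeric sanity check of the two bounds of Lemma 3.4 on a hypothetical instance.
q, eps = 10, 0.1
u, l = 2.0, -2.0
A = np.diag([0.5, 0.2, -0.3])
n = A.shape[0]
w = 0.07 * np.ones(n)

Uinv = np.linalg.inv(u * np.eye(n) - A)
Linv = np.linalg.inv(A - l * np.eye(n))
assert w @ Uinv @ w <= eps / q and w @ Linv @ w <= eps / q   # hypotheses

def tr_neg_pow(M, p):
    # trace of M^{-p} for a symmetric positive definite matrix M
    return float(np.sum(np.linalg.eigvalsh(M) ** -float(p)))

# Lower barrier: adding ww^T pushes eigenvalues away from l, dropping the potential
# by at least q(1 - eps) * w^T (A - lI)^{-(q+1)} w.
lhs_l = tr_neg_pow(A + np.outer(w, w) - l * np.eye(n), q)
rhs_l = tr_neg_pow(A - l * np.eye(n), q) \
        - q * (1 - eps) * (w @ np.linalg.matrix_power(Linv, q + 1) @ w)
assert lhs_l <= rhs_l

# Upper barrier: the potential grows by at most q(1 + eps) * w^T (uI - A)^{-(q+1)} w.
lhs_u = tr_neg_pow(u * np.eye(n) - A - np.outer(w, w), q)
rhs_u = tr_neg_pow(u * np.eye(n) - A, q) \
        + q * (1 + eps) * (w @ np.linalg.matrix_power(Uinv, q + 1) @ w)
assert lhs_u <= rhs_u
```

Such a check does not prove the lemma, but it makes the asymmetry visible: the guaranteed drop at the lower barrier carries a factor $(1-\varepsilon)$ while the allowed growth at the upper barrier carries $(1+\varepsilon)$, which is exactly why $u$ must be advanced slightly faster than $l$.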
Lemma 3.4 ([AZLO15]). Let $q \ge 10$ and $\varepsilon \le 1/10$. Suppose that $w^\top (uI - A)^{-1} w \le \varepsilon/q$ and $w^\top (A - lI)^{-1} w \le \varepsilon/q$. Then it holds that
$$\mathrm{tr}\,\big(A + ww^\top - lI\big)^{-q} \le \mathrm{tr}\,(A - lI)^{-q} - q(1-\varepsilon)\, w^\top (A - lI)^{-(q+1)} w,$$
and
$$\mathrm{tr}\,\big(uI - A - ww^\top\big)^{-q} \le \mathrm{tr}\,(uI - A)^{-q} + q(1+\varepsilon)\, w^\top (uI - A)^{-(q+1)} w.$$

Lemma 3.5. Let $j$ be any iteration. It holds that $\widetilde{\mathbb{E}}\big[\Phi_{u_{j+1},l_{j+1}}(A_{j+1})\big] \le \Phi_{u_j,l_j}(A_j)$.

Proof. Let $w_1 w_1^\top, \ldots, w_{N_j} w_{N_j}^\top$ be the matrices picked in iteration $j$, and define for any $0 \le i \le N_j$ that
$$B_i = A_j + \sum_{t=1}^{i} w_t w_t^\top.$$
We study the change of the potential function after adding each rank-1 matrix within the iteration. For this reason, we use
$$\Delta_u = \frac{\Delta_{u,j}}{N_j} = (1+2\varepsilon)\cdot\frac{\varepsilon}{q}\cdot\frac{1}{\sum_{t=1}^m R_t(A_j, u_j, l_j)} \qquad \text{and} \qquad \Delta_l = \frac{\Delta_{l,j}}{N_j} = (1-2\varepsilon)\cdot\frac{\varepsilon}{q}\cdot\frac{1}{\sum_{t=1}^m R_t(A_j, u_j, l_j)}$$
to express the average change of the barrier values $\Delta_{u,j}$ and $\Delta_{l,j}$. We further define for $0 \le i \le N_j$ that
$$\hat{u}_i = u_j + i\,\Delta_u, \qquad \hat{l}_i = l_j + i\,\Delta_l.$$
Assuming $0 \preceq W_j \preceq \frac{1}{2}(u_j I - A_j)$, we claim that
$$w_i w_i^\top \preceq \frac{\varepsilon}{q}\,\big(\hat{u}_i I - B_{i-1}\big) \qquad \text{and} \qquad w_i w_i^\top \preceq \frac{2\varepsilon}{q}\,\big(B_{i-1} - \hat{l}_i I\big) \qquad (3.3)$$
for any $1 \le i \le N_j$. Based on this, we apply Lemma 3.4 (with $2\varepsilon$ in place of $\varepsilon$ for the lower-barrier bound) and get that
$$\widetilde{\mathbb{E}}\big[\Phi_{\hat{u}_i,\hat{l}_i}(B_{i-1} + w_i w_i^\top)\big] \le \Phi_{\hat{u}_i,\hat{l}_i}(B_{i-1}) + q(1+\varepsilon)\,\big(\hat{u}_i I - B_{i-1}\big)^{-(q+1)} \bullet \mathbb{E}\big[w_i w_i^\top\big] - q(1-2\varepsilon)\,\big(B_{i-1} - \hat{l}_i I\big)^{-(q+1)} \bullet \mathbb{E}\big[w_i w_i^\top\big] \le \Phi_{\hat{u}_i,\hat{l}_i}(B_{i-1}) + q\,\Delta_u\, \mathrm{tr}\,\big(\hat{u}_i I - B_{i-1}\big)^{-(q+1)} - q\,\Delta_l\, \mathrm{tr}\,\big(B_{i-1} - \hat{l}_i I\big)^{-(q+1)}. \qquad (3.4)$$
We define a function $f_i$ by
$$f_i(t) = \mathrm{tr}\,\big((\hat{u}_{i-1} + t\,\Delta_u) I - B_{i-1}\big)^{-q} + \mathrm{tr}\,\big(B_{i-1} - (\hat{l}_{i-1} + t\,\Delta_l) I\big)^{-q}.$$
Notice that
$$\frac{d f_i(t)}{dt} = -q\,\Delta_u\, \mathrm{tr}\,\big((\hat{u}_{i-1} + t\,\Delta_u) I - B_{i-1}\big)^{-(q+1)} + q\,\Delta_l\, \mathrm{tr}\,\big(B_{i-1} - (\hat{l}_{i-1} + t\,\Delta_l) I\big)^{-(q+1)}.$$
11 Since f is convex, we have that df i t dt Putting 3.4 and 3.5 together, we have that f i 1 f i 0 = Φûi t=1,ˆl i B i 1 Φûi 1,ˆl i 1 B i [ ] Ẽ Φûi,ˆl i B i Φûi,ˆl i B i 1 df it dt Φûi 1 t=1,ˆl i 1 B i 1. Repeat this argument, we have that Ẽ [ Φ uj+1,l j+1 A j+1 ] [ = Ẽ ] ΦûNj,ˆl BNj Nj Φû0,ˆl 0 B 0 = Φ uj,l j A j, which proves the statement. So, it suffices to prove the claim 3.3. Since vv v B 1 vb for any vector v and PSD matrix B, we have that v i v i R i A j, u j, l j By the assumption of W j 1 u ji A j, it holds that w i w i = This proves the first statement of the claim. For the second statement, notice that v i v i v i u ji A j 1 v i u j I A j. ε qr i A j, u j, l j v iv i ε q u ji A j ε q û ii B i 1. and hence εn j l j+1 l j = l,j q m t=1 R ta j, u j, l j 1 λ mina j l j I w i w i ε q A j l j I ε q A j ˆl i I ε B i 1 q ˆl i I. 3. Analysis of the Approximation Guarantee In this subsection we will prove that the algorithm produces a linear-sized 1 + Oε-spectral sparsifier. We assume that the algorithm finishes after k iterations, and will prove that the output A k is a 1 + Oε-spectral sparsifier. It suffices to show that the condition number of A k is small, which follows directly from our setting of parameters. Lemma 3.6. The output matrix A k has condition number at most 1 + Oε. Proof. Since the condition number of A k is at most it suffices to prove that u k l k /u k = Oε. u k = 1 u k l 1 k, l k u k 10
Since for any iteration $j$ the increase rate of $\Delta_{u,j} - \Delta_{l,j}$ with respect to $\Delta_{u,j}$ is
$$\frac{\Delta_{u,j} - \Delta_{l,j}}{\Delta_{u,j}} = \frac{(1+2\varepsilon) - (1-2\varepsilon)}{1+2\varepsilon} = \frac{4\varepsilon}{1+2\varepsilon} \le 4\varepsilon,$$
we have that
$$\frac{u_k - l_k}{u_k} = \frac{2 n^{1/q} + \sum_{j=0}^{k-1} (\Delta_{u,j} - \Delta_{l,j})}{n^{1/q} + \sum_{j=0}^{k-1} \Delta_{u,j}} \le \frac{2 n^{1/q} + \sum_{j=0}^{k-1} (\Delta_{u,j} - \Delta_{l,j})}{n^{1/q} + (4\varepsilon)^{-1} \sum_{j=0}^{k-1} (\Delta_{u,j} - \Delta_{l,j})}.$$
By the ending condition of the algorithm, it holds that $u_k - l_k \ge 4 n^{1/q}$, i.e.,
$$\sum_{j=0}^{k-1} (\Delta_{u,j} - \Delta_{l,j}) \ge 2 n^{1/q}.$$
Hence, it holds that
$$\frac{u_k - l_k}{u_k} \le \frac{2 n^{1/q} + 2 n^{1/q}}{n^{1/q} + (4\varepsilon)^{-1} \cdot 2 n^{1/q}} \le 8\varepsilon,$$
which finishes the proof.

Now we prove that the algorithm finishes in $O(q n^{3/q}/\varepsilon^2)$ iterations, and picks $O(qn/\varepsilon^2)$ vectors in total.

Lemma 3.7. The following statements hold: (i) with probability at least $4/5$, the algorithm finishes within $10 q n^{3/q}/\varepsilon^2$ iterations; (ii) with probability at least $4/5$, the algorithm chooses at most $10 q n/\varepsilon^2$ vectors.

Proof. Notice that after iteration $j$ the barrier gap $u_j - l_j$ is increased by
$$\Delta_{u,j} - \Delta_{l,j} = \frac{4\varepsilon^2}{q} \cdot \frac{N_j}{\sum_{i=1}^m R_i(A_j, u_j, l_j)} = \frac{4\varepsilon^2}{q} \cdot \frac{1}{n^{2/q}} \cdot \min\big\{\lambda_{\min}(u_j I - A_j),\ \lambda_{\min}(A_j - l_j I)\big\} \ge \frac{4\varepsilon^2}{q} \cdot \frac{1}{n^{2/q}} \cdot \Phi_{u_j,l_j}(A_j)^{-1/q}.$$
Since the algorithm finishes within $k$ iterations if $\sum_{j=0}^{k-1} (\Delta_{u,j} - \Delta_{l,j}) \ge 2 n^{1/q}$,
it holds that
$$\mathbb{P}\big[\text{algorithm finishes within } k \text{ iterations}\big] \ge \mathbb{P}\Big[\frac{4\varepsilon^2}{q\, n^{2/q}} \sum_{j=0}^{k-1} \Phi_{u_j,l_j}(A_j)^{-1/q} \ge 2 n^{1/q}\Big] = \mathbb{P}\Big[\sum_{j=0}^{k-1} \Phi_{u_j,l_j}(A_j)^{-1/q} \ge \frac{q\, n^{3/q}}{2\varepsilon^2}\Big] \ge \mathbb{P}\Big[\sum_{j=0}^{k-1} \Phi_{u_j,l_j}(A_j)^{1/q} \le \frac{2\varepsilon^2 k^2}{q\, n^{3/q}}\Big],$$
where the last inequality follows from the fact that
$$k^2 \le \Big(\sum_{j=0}^{k-1} \Phi_{u_j,l_j}(A_j)^{1/q}\Big) \Big(\sum_{j=0}^{k-1} \Phi_{u_j,l_j}(A_j)^{-1/q}\Big).$$
By Lemma 3.3, every matrix $W_j$ picked in iteration $j$ satisfies $0 \preceq W_j \preceq \frac{1}{2}(u_j I - A_j)$ with probability at least $1 - \frac{\varepsilon}{100qn}$, and hence with probability at least $9/10$ all matrices picked in $k = 10 q n^{3/q}/\varepsilon^2$ iterations satisfy this condition. Also, by Lemma 3.5 we have that
$$\sum_{j=0}^{k-1} \widetilde{\mathbb{E}}\big[\Phi_{u_j,l_j}(A_j)^{1/q}\big] \le \sum_{j=0}^{k-1} \Big(\widetilde{\mathbb{E}}\big[\Phi_{u_j,l_j}(A_j)\big]\Big)^{1/q} \le 2k, \qquad (3.6)$$
since the initial value of the potential function is at most $2$. Therefore, it holds that
$$\mathbb{P}\big[\text{algorithm finishes in more than } k \text{ iterations}\big] \le \mathbb{P}\Big[\sum_{j=0}^{k-1} \Phi_{u_j,l_j}(A_j)^{1/q} > \frac{2\varepsilon^2 k^2}{q\, n^{3/q}} \text{ and } \forall j:\, W_j \preceq \tfrac{1}{2}(u_j I - A_j)\Big] + \mathbb{P}\Big[\exists j:\, W_j \not\preceq \tfrac{1}{2}(u_j I - A_j)\Big] \le \frac{2k \cdot q\, n^{3/q}}{2\varepsilon^2 k^2} + \frac{1}{10} \le \frac{1}{5},$$
where the second-to-last inequality follows from Markov's inequality and (3.6), and the last inequality follows from our choice of $k = 10 q n^{3/q}/\varepsilon^2$. This proves the first statement.
Now for the second statement. Notice that for every vector chosen in iteration $j$, the barrier gap $u_j - l_j$ is increased on average by
$$\frac{\Delta_{u,j} - \Delta_{l,j}}{N_j} = \frac{4\varepsilon^2}{q \sum_{i=1}^m R_i(A_j, u_j, l_j)}.$$
To bound $\sum_i R_i(A_j, u_j, l_j)$, let the eigenvalues of the matrix $A_j$ be $\lambda_1, \ldots, \lambda_n$. Then, it holds that
$$\sum_{i=1}^m R_i(A_j, u_j, l_j) = \sum_{i=1}^m v_i^\top (u_j I - A_j)^{-1} v_i + \sum_{i=1}^m v_i^\top (A_j - l_j I)^{-1} v_i = \sum_{i=1}^n \frac{1}{u_j - \lambda_i} + \sum_{i=1}^n \frac{1}{\lambda_i - l_j} \le n^{1-1/q} \Big( \sum_{i=1}^n \frac{1}{(u_j - \lambda_i)^q} + \sum_{i=1}^n \frac{1}{(\lambda_i - l_j)^q} \Big)^{1/q} = n^{1-1/q}\, \Phi_{u_j,l_j}(A_j)^{1/q}.$$
Therefore, we have that
$$\frac{\Delta_{u,j} - \Delta_{l,j}}{N_j} \ge \frac{4\varepsilon^2}{q} \cdot \frac{1}{n^{1-1/q}\, \Phi_{u_j,l_j}(A_j)^{1/q}}. \qquad (3.7)$$
Let $v^{(1)}, \ldots, v^{(z)}$ be the vectors sampled by the algorithm, where $v^{(j)}$ is picked in iteration $\tau_j$ for $1 \le j \le z$. Let us first assume that the algorithm could check the ending condition after adding every single vector. In this case, it holds that
$$\mathbb{P}\big[\text{algorithm finishes after choosing } z \text{ vectors}\big] \ge \mathbb{P}\Big[\sum_{j=1}^z \frac{4\varepsilon^2}{q} \cdot \frac{1}{n^{1-1/q}\, \Phi_{u_{\tau_j},l_{\tau_j}}(A_{\tau_j})^{1/q}} \ge 2 n^{1/q}\Big] = \mathbb{P}\Big[\sum_{j=1}^z \Phi_{u_{\tau_j},l_{\tau_j}}(A_{\tau_j})^{-1/q} \ge \frac{q n}{2\varepsilon^2}\Big].$$
Following the same proof as for the first part, and noticing that in the final iteration the algorithm chooses at most $O(n)$ extra vectors, we obtain the second statement.

3.3 Proof of the Main Results

Now we analyze the runtime of the algorithm, and prove the main results. We first analyze the algorithm for sparsifying sums of rank-1 PSD matrices, and prove Theorem 1.2.

Proof of Theorem 1.2. By Lemma 3.7, with probability at least $4/5$ the algorithm chooses at most $10 q n/\varepsilon^2$ vectors, and by Lemma 3.6 the condition number of $A_k$ is at most $1 + O(\varepsilon)$, implying that, after scaling, the matrix $A_k$ is a $(1 + O(\varepsilon))$-approximation of $I$. These two results together prove that $A_k$ is a linear-sized spectral sparsifier.

For the runtime, Lemma 3.7 proves that the algorithm finishes within $10 q n^{3/q}/\varepsilon^2$ iterations, and it is easy to see that all the required quantities in each iteration can be approximately computed in $\widetilde{O}(m\, n^{\omega - 1})$ time using fast matrix multiplication. Therefore, the total runtime of the algorithm is $\widetilde{O}(q\, m\, n^{\omega - 1 + 3/q}/\varepsilon^2)$.
Next we show how to apply our algorithm in the graph setting, and prove Theorem 1.1. Let $L = \sum_{i=1}^m u_i u_i^\top$ be the Laplacian matrix of an undirected graph $G$, where $u_i u_i^\top$ is the Laplacian matrix of the graph consisting of the single edge $e_i$. By setting $v_i = L^{-1/2} u_i$ for $1 \le i \le m$, it is easy to see that constructing a spectral sparsifier of $G$ is equivalent to sparsifying the matrix $\sum_{i=1}^m v_i v_i^\top$. We present in the appendix almost-linear time algorithms to approximate the required quantities $\lambda_{\min}(u_j I - A_j)$, $\lambda_{\min}(A_j - l_j I)$, $v_i^\top (u_j I - A_j)^{-1} v_i$, and $v_i^\top (A_j - l_j I)^{-1} v_i$ in each iteration, and this gives Theorem 1.1.

Proof of Theorem 1.1. By applying the same analysis as in the proof of Theorem 1.2, we know that the output matrix $A_k$ is a linear-sized spectral sparsifier, and it suffices to analyze the runtime of the algorithm. By Lemma 3.3 and the union bound, with probability at least $9/10$ all the matrices picked in $k = 10 q n^{3/q}/\varepsilon^2$ iterations satisfy $W_j \preceq \frac{1}{2}(u_j I - A_j)$. Conditioning on this event, with constant probability $\mathbb{E}[\Phi_{u_j,l_j}(A_j)] = O(1)$ for all iterations $j$, and by Markov's inequality with high probability it holds that $\Phi_{u_j,l_j}(A_j) = O(qn/\varepsilon^2)$ for all iterations $j$. On the other hand, notice that it holds for any $1 \le j \le n$ that
$$(u - \lambda_j)^{-q} \le \sum_{i=1}^n (u - \lambda_i)^{-q} < \Phi_{u,l}(A),$$
which implies that $\lambda_j < u - \Phi_{u,l}(A)^{-1/q}$. Similarly, it holds that $\lambda_j > l + \Phi_{u,l}(A)^{-1/q}$ for any $1 \le j \le n$. Therefore, we have that
$$\Big(l_j + \Omega\big((\varepsilon^2/(qn))^{1/q}\big)\Big) I \preceq A_j \preceq \Big(u_j - \Omega\big((\varepsilon^2/(qn))^{1/q}\big)\Big) I.$$
Since both $u_j$ and $l_j$ are of order $O(n^{1/q})$, we set $\eta = \Omega\big(\varepsilon^{2/q}/n^{2/q}\big)$ and obtain that
$$\big(l_j + |l_j|\,\eta\big) I \preceq A_j \preceq (1 - \eta)\, u_j I.$$
Hence, we can apply Lemma A.5 and Lemma A.6 to compute all required quantities in each iteration up to constant-factor approximation in time
$$\widetilde{O}\Big(\frac{m}{\varepsilon^2 \eta}\Big) = \widetilde{O}\Big(\frac{m\, n^{2/q}}{\varepsilon^{2+2/q}}\Big).$$
Since by Lemma 3.7 the algorithm finishes within $10 q n^{3/q}/\varepsilon^2$ iterations with probability at least $4/5$, the total runtime of the algorithm is
$$\widetilde{O}\Big(\frac{q\, m\, n^{5/q}}{\varepsilon^{4+4/q}}\Big).$$
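The reduction $v_i = L^{-1/2} u_i$ can be demonstrated on a tiny example. Since a graph Laplacian is singular (the all-ones vector lies in its kernel), the sketch below uses the pseudo-inverse square root restricted to the range of $L$; the edge weights and graph are hypothetical choices for this illustration:

```python
import numpy as np

# Hypothetical 4-vertex graph; reduce graph sparsification to the isotropic case
# via v_e = L^{+/2} u_e, where u_e u_e^T is the Laplacian of the single edge e.
edges = {(0, 1): 1.0, (1, 2): 2.0, (2, 3): 1.0, (0, 3): 3.0, (0, 2): 1.0}
n = 4

U_vecs = []
L = np.zeros((n, n))
for (a, b), w in edges.items():
    u_e = np.zeros(n)
    u_e[a], u_e[b] = np.sqrt(w), -np.sqrt(w)
    U_vecs.append(u_e)
    L += np.outer(u_e, u_e)

# Pseudo-inverse square root of L (zero eigenvalue on the all-ones direction).
lam, Q = np.linalg.eigh(L)
inv_sqrt = np.array([lv ** -0.5 if lv > 1e-9 else 0.0 for lv in lam])
L_half_pinv = Q @ np.diag(inv_sqrt) @ Q.T

V = [L_half_pinv @ u_e for u_e in U_vecs]

# The v_e sum to the identity on range(L), i.e. on vectors orthogonal to all-ones.
P = sum(np.outer(v, v) for v in V)
expected = np.eye(n) - np.ones((n, n)) / n
assert np.allclose(P, expected)
# ||v_e||^2 is the weighted effective resistance of edge e; they sum to n - 1.
assert np.isclose(sum(v @ v for v in V), n - 1)
```

The last assertion is the classical fact that weighted effective resistances of a connected graph sum to $n - 1$, which is why effective-resistance-based sampling needs only $\widetilde{O}(n)$-scale budgets.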
Acknowledgments

This work was partially supported by NSF awards. Part of this work was done while both authors were visiting the Simons Institute for the Theory of Computing, UC Berkeley, and while the second author was affiliated with the Max Planck Institute for Informatics, Germany. We thank Zeyuan Allen-Zhu, Zhenyu Liao, and Lorenzo Orecchia for sending us their manuscript of [AZLO15], and for the inspiring talk Zeyuan Allen-Zhu gave at the Simons Institute for the Theory of Computing. Finally, we thank Michael Cohen for pointing out a gap in a previous version of the paper and for his fixes, as well as Lap-Chi Lau for many insightful comments on improving the presentation of the paper.

References

[AZLO15] Zeyuan Allen-Zhu, Zhenyu Liao, and Lorenzo Orecchia. Spectral sparsification and regret minimization beyond multiplicative updates. In 47th Annual ACM Symposium on Theory of Computing (STOC'15), pages 237–245, 2015.

[Bar14] Alexander Barvinok. Thrifty approximations of convex bodies by polytopes. International Mathematics Research Notices, 2014(16), 2014.

[BK96] András Benczúr and David Karger. Approximating s-t minimum cuts in Õ(n²) time. In 28th Annual ACM Symposium on Theory of Computing (STOC'96), pages 47–55, 1996.

[BSS12] Joshua D. Batson, Daniel A. Spielman, and Nikhil Srivastava. Twice-Ramanujan sparsifiers. SIAM Journal on Computing, 41(6):1704–1721, 2012.

[Che89] L. Paul Chew. There are planar graphs almost as good as the complete graph. Journal of Computer and System Sciences, 39(2):205–219, 1989.

[CLM+15] Michael B. Cohen, Yin Tat Lee, Cameron Musco, Christopher Musco, Richard Peng, and Aaron Sidford. Uniform sampling for matrix approximation. In 6th Innovations in Theoretical Computer Science (ITCS'15), 2015.

[KL13] Jonathan A. Kelner and Alex Levin. Spectral sparsification in the semi-streaming setting. Theory of Computing Systems, 53(2):243–262, 2013.

[KLM+14] Michael Kapralov, Yin Tat Lee, Cameron Musco, Christopher Musco, and Aaron Sidford.
Single pass spectral sparsification in dynamic streams. In 55th Annual IEEE Symposium on Foundations of Computer Science (FOCS'14), 2014.

[KLOS14] Jonathan A. Kelner, Yin Tat Lee, Lorenzo Orecchia, and Aaron Sidford. An almost-linear-time algorithm for approximate max flow in undirected graphs, and its multicommodity generalizations. In 25th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA'14), pages 217–226, 2014.

[KLP12] Ioannis Koutis, Alex Levin, and Richard Peng. Improved spectral sparsification and numerical algorithms for SDD matrices. In 29th International Symposium on Theoretical Aspects of Computer Science (STACS'12), pages 266–277, 2012.

[KMP10] Ioannis Koutis, Gary L. Miller, and Richard Peng. Approaching optimality for solving SDD linear systems. In 51st Annual IEEE Symposium on Foundations of Computer Science (FOCS'10), pages 235–244, 2010.

[KMP11] Ioannis Koutis, Gary L. Miller, and Richard Peng. A nearly-m log n time solver for
SDD linear systems. In 52nd Annual IEEE Symposium on Foundations of Computer Science (FOCS'11), 2011.

[LMP13] Mu Li, Gary L. Miller, and Richard Peng. Iterative row sampling. In 54th Annual IEEE Symposium on Foundations of Computer Science (FOCS'13), 2013.

[LPS15] Yin Tat Lee, Richard Peng, and Daniel A. Spielman. Sparsified Cholesky solvers for SDD linear systems. arXiv preprint, 2015.

[LS14] Yin Tat Lee and Aaron Sidford. Path finding methods for linear programming: Solving linear programs in Õ(√rank) iterations and faster algorithms for maximum flow. In 55th Annual IEEE Symposium on Foundations of Computer Science (FOCS'14), pages 424–433, 2014.

[LT76] E. Lieb and W. Thirring. Inequalities for the moments of the eigenvalues of the Schrödinger equation and their relation to Sobolev inequalities. In Studies in Mathematical Physics: Essays in Honor of Valentine Bargmann (E. Lieb, B. Simon, and A. S. Wightman, eds.), 1976.

[Mah11] Michael W. Mahoney. Randomized algorithms for matrices and data. Foundations and Trends in Machine Learning, 3(2):123–224, 2011.

[MSS13] Adam Marcus, Daniel A. Spielman, and Nikhil Srivastava. Interlacing families II: Mixed characteristic polynomials and the Kadison-Singer problem. arXiv preprint, 2013.

[PS14] Richard Peng and Daniel A. Spielman. An efficient parallel solver for SDD linear systems. In 46th Annual ACM Symposium on Theory of Computing (STOC'14), 2014.

[She13] Jonah Sherman. Nearly maximum flows in nearly linear time. In 54th Annual IEEE Symposium on Foundations of Computer Science (FOCS'13), pages 263–269, 2013.

[Sri12] Nikhil Srivastava. On contact points of convex bodies. In Geometric Aspects of Functional Analysis, 2012.

[SS11] Daniel A. Spielman and Nikhil Srivastava. Graph sparsification by effective resistances. SIAM Journal on Computing, 40(6):1913–1926, 2011.

[SS12] Daniel A. Spielman and Nikhil Srivastava. An elementary proof of the restricted invertibility theorem. Israel Journal of Mathematics, 190(1):83–91, 2012.

[ST04] Daniel A. Spielman and Shang-Hua Teng.
Nearly-linear time algorithms for graph partitioning, graph sparsification, and solving linear systems. In 36th Annual ACM Symposium on Theory of Computing (STOC'04), pages 81–90, 2004.

[ST11] Daniel A. Spielman and Shang-Hua Teng. Spectral sparsification of graphs. SIAM Journal on Computing, 40(4):981–1025, 2011.

[Tro12] Joel A. Tropp. User-friendly tail bounds for sums of random matrices. Foundations of Computational Mathematics, 12(4):389–434, 2012.

[Zou12] Anastasios Zouzias. A matrix hyperbolic cosine algorithm and applications. In 39th International Colloquium on Automata, Languages, and Programming (ICALP'12), 2012.
A Omitted Proofs

A.1 Estimates of the Potential Functions

In this subsection we prove Lemma 3.4. We first list the following two lemmas, which will be used in our proof.

Lemma A.1 (Sherman–Morrison Formula). Let $A \in \mathbb{R}^{n \times n}$ be an invertible matrix, and let $u, v \in \mathbb{R}^n$. Suppose that $1 + v^\top A^{-1} u \ne 0$. Then it holds that
$$\big(A + uv^\top\big)^{-1} = A^{-1} - \frac{A^{-1} u v^\top A^{-1}}{1 + v^\top A^{-1} u}.$$

Lemma A.2 (Lieb–Thirring Inequality, [LT76]). Let $A$ and $B$ be positive definite matrices, and let $q \ge 1$. Then it holds that
$$\mathrm{tr}\,(BAB)^q \le \mathrm{tr}\,\big(B^q A^q B^q\big).$$

Proof of Lemma 3.4. Let $Y = A - lI$. By the Sherman–Morrison formula (Lemma A.1), it holds that
$$\mathrm{tr}\,\big(Y + ww^\top\big)^{-q} = \mathrm{tr}\left(Y^{-1} - \frac{Y^{-1} w w^\top Y^{-1}}{1 + w^\top Y^{-1} w}\right)^{q}. \qquad (A.1)$$
By the assumption $w^\top Y^{-1} w \le \varepsilon/q$, we have that
$$\mathrm{tr}\,\big(Y + ww^\top\big)^{-q} \le \mathrm{tr}\left(Y^{-1} - \frac{Y^{-1} w w^\top Y^{-1}}{1 + \varepsilon/q}\right)^{q} \qquad (A.2)$$
$$= \mathrm{tr}\left(Y^{-1/2}\Big(I - \frac{Y^{-1/2} w w^\top Y^{-1/2}}{1 + \varepsilon/q}\Big) Y^{-1/2}\right)^{q} \le \mathrm{tr}\left(Y^{-q/2}\Big(I - \frac{Y^{-1/2} w w^\top Y^{-1/2}}{1 + \varepsilon/q}\Big)^{q} Y^{-q/2}\right) \qquad (A.3)$$
$$= \mathrm{tr}\left(Y^{-q}\Big(I - \frac{Y^{-1/2} w w^\top Y^{-1/2}}{1 + \varepsilon/q}\Big)^{q}\right), \qquad (A.4)$$
where (A.2) uses the fact that $A \preceq B$ implies $\mathrm{tr}\, A^q \le \mathrm{tr}\, B^q$, (A.3) follows from the Lieb–Thirring inequality (Lemma A.2), and (A.4) uses the fact that the trace is invariant under cyclic permutations. Let
$$D = \frac{Y^{-1/2} w w^\top Y^{-1/2}}{1 + \varepsilon/q}.$$
Note that $0 \preceq D \preceq \frac{\varepsilon}{q} I$, and
$$(I - D)^q \preceq I - qD + \frac{q(q-1)}{2}\, D^2 \preceq I - qD + \frac{\varepsilon(q-1)}{2}\, D.$$
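Since the whole proof pivots on the Sherman–Morrison rank-1 update formula, a quick numeric check of Lemma A.1 is worthwhile (the matrix and vectors below are arbitrary well-conditioned choices of ours):

```python
import numpy as np

rng = np.random.default_rng(2)

# Numeric check of the Sherman-Morrison formula (Lemma A.1).
n = 5
A = rng.standard_normal((n, n)) + n * np.eye(n)   # well-conditioned invertible matrix
u = rng.standard_normal(n)
v = rng.standard_normal(n)

Ainv = np.linalg.inv(A)
denom = 1 + v @ Ainv @ u
assert abs(denom) > 1e-9                          # hypothesis of the lemma

lhs = np.linalg.inv(A + np.outer(u, v))
rhs = Ainv - (Ainv @ np.outer(u, v) @ Ainv) / denom
assert np.allclose(lhs, rhs)
```

In the proof, the formula is applied with $u = v = w$ and $A = Y$ (respectively $A = Z$, with a sign flip), which turns the rank-1 barrier update into an explicitly computable correction of $Y^{-1}$.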
Therefore, we have that
$$\left(I - \frac{Y^{-1/2}ww^{\top}Y^{-1/2}}{1 + \varepsilon/q}\right)^{q} \preceq I - \frac{q - \varepsilon(q-1)/2}{1 + \varepsilon/q}\,Y^{-1/2}ww^{\top}Y^{-1/2} \preceq I - q(1-\varepsilon)\,Y^{-1/2}ww^{\top}Y^{-1/2}.$$
This implies that
$$\operatorname{tr}\left(Y + ww^{\top}\right)^{-q} \leq \operatorname{tr}\left(Y^{-q}\left(I - q(1-\varepsilon)\,Y^{-1/2}ww^{\top}Y^{-1/2}\right)\right) = \operatorname{tr} Y^{-q} - q(1-\varepsilon)\,w^{\top}Y^{-(q+1)}w,$$
which proves the first statement.

Now for the second inequality. Let $Z = uI - A$. By the Sherman-Morrison Formula (Lemma A.1), it holds that
$$\operatorname{tr}\left(Z - ww^{\top}\right)^{-q} = \operatorname{tr}\left(Z^{-1} + \frac{Z^{-1}ww^{\top}Z^{-1}}{1 - w^{\top}Z^{-1}w}\right)^{q}. \tag{A.5}$$
By the assumption of $w^{\top}Z^{-1}w \leq \varepsilon/q$, it holds that
$$\operatorname{tr}\left(Z - ww^{\top}\right)^{-q} \leq \operatorname{tr}\left(Z^{-1} + \frac{Z^{-1}ww^{\top}Z^{-1}}{1 - \varepsilon/q}\right)^{q} \tag{A.6}$$
$$= \operatorname{tr}\left(Z^{-1/2}\left(I + \frac{Z^{-1/2}ww^{\top}Z^{-1/2}}{1 - \varepsilon/q}\right)Z^{-1/2}\right)^{q}$$
$$\leq \operatorname{tr}\left(Z^{-q/2}\left(I + \frac{Z^{-1/2}ww^{\top}Z^{-1/2}}{1 - \varepsilon/q}\right)^{q}Z^{-q/2}\right) \tag{A.7}$$
$$= \operatorname{tr}\left(Z^{-q}\left(I + \frac{Z^{-1/2}ww^{\top}Z^{-1/2}}{1 - \varepsilon/q}\right)^{q}\right), \tag{A.8}$$
where (A.6) uses the fact that $A \preceq B$ implies $\operatorname{tr} A^{q} \leq \operatorname{tr} B^{q}$, (A.7) follows from the Lieb-Thirring inequality (Lemma A.2), and (A.8) uses the fact that the trace is invariant under cyclic permutations. Let $E = Z^{-1/2}ww^{\top}Z^{-1/2}$. Combining $E \preceq \frac{\varepsilon}{q}I$ with the assumption that $q \geq 10$ and $\varepsilon \leq 1/10$, we have that
$$\left(I + \frac{E}{1 - \varepsilon/q}\right)^{q} \preceq I + \frac{q}{1 - \varepsilon/q}E + \frac{q(q-1)}{2}\left(1 + \frac{\varepsilon/q}{1 - \varepsilon/q}\right)^{q-2}\frac{E^{2}}{(1 - \varepsilon/q)^{2}} \preceq I + q\left(1 + \frac{\varepsilon}{4}\right)E + 0.7\,\varepsilon q E \preceq I + q(1+\varepsilon)E.$$
Therefore, we have that
$$\operatorname{tr}\left(Z - ww^{\top}\right)^{-q} \leq \operatorname{tr} Z^{-q} + q(1+\varepsilon)\,w^{\top}Z^{-(q+1)}w,$$
which proves the second statement.
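The first bound in Lemma 3.4 is easy to sanity-check numerically on a small dense example. The following sketch (illustrative sizes and a random positive definite $Y$; not part of the paper's algorithm) verifies it; the check for the second bound with $Z = uI - A$ is symmetric.

```python
import numpy as np

rng = np.random.default_rng(0)
n, q, eps = 8, 10, 0.1

# Random well-conditioned positive definite Y, standing in for A - l*I.
M = rng.standard_normal((n, n))
Y = M @ M.T + n * np.eye(n)

# Random direction w, rescaled so that w^T Y^{-1} w = eps/q exactly,
# which is the assumption of the lemma.
w = rng.standard_normal(n)
w *= np.sqrt((eps / q) / (w @ np.linalg.solve(Y, w)))

evals, evecs = np.linalg.eigh(Y)          # Y = V diag(evals) V^T
wr = evecs.T @ w
lhs = np.sum(np.linalg.eigvalsh(Y + np.outer(w, w)) ** (-q))
rhs = np.sum(evals ** (-q)) - q * (1 - eps) * np.sum(wr ** 2 / evals ** (q + 1))
# Lemma 3.4 (first statement):
# tr(Y + ww^T)^{-q} <= tr Y^{-q} - q(1-eps) w^T Y^{-(q+1)} w
assert lhs <= rhs
```

The slack between the two sides is exactly the gap the barrier argument exploits: adding a rank-one term decreases the lower-barrier potential by at least $q(1-\varepsilon)\,w^{\top}Y^{-(q+1)}w$.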
A.2 Implementation of the Algorithm

In this section, we show that the algorithm for constructing graph sparsification runs in almost-linear time. Based on the previous discussion, we only need to prove that, for any iteration $j$, the number of samples $N_j$ and $\{R_i(A_j, u_j, l_j)\}_{i=1}^{m}$ can be approximately computed in almost-linear time. By definition, it suffices to compute $\lambda_{\min}(u_j I - A_j)$, $\lambda_{\min}(A_j - l_j I)$, $v_i^{\top}(u_j I - A_j)^{-1}v_i$, and $v_i^{\top}(A_j - l_j I)^{-1}v_i$ for all $i$. For simplicity we drop the subscript $j$ expressing the iterations in this subsection. We will assume that the following assumption holds on $A$. We remark that an almost-linear time algorithm for computing similar quantities was shown in [AZLO15].

Assumption A.3. Let $L$ and $\widetilde{L}$ be the Laplacian matrices of graph $G$ and its subgraph after reweighting. Let $A = L^{-1/2}\widetilde{L}L^{-1/2}$, and assume that
$$(1+\eta)\,l\cdot I \preceq A \preceq (1-\eta)\,u\cdot I$$
holds for some $0 < \eta < 1$.

Lemma A.4. Under Assumption A.3, the following statements hold:

1. We can construct a matrix $S_u$ such that $S_u \approx_{\varepsilon/10} (uI - A)^{-1/2}$, and $S_u = p(A)$ for a polynomial $p$ of degree $O\left(\frac{\log(1/(\varepsilon\eta))}{\eta}\right)$.

2. We can construct a matrix $S_l$ such that $S_l \approx_{\varepsilon/10} (A - lI)^{-1/2}$. Moreover, $S_l$ is of the form $A'^{-1/2}q(A'^{-1})$, where $q$ is a polynomial of degree $O\left(\frac{\log(1/(\varepsilon\eta))}{\eta}\right)$ and $A' = L^{-1/2}L'L^{-1/2}$ for some Laplacian matrix $L'$.

Proof. By Taylor expansion, it holds that
$$(1 - x)^{-1/2} = 1 + \sum_{k=1}^{\infty}\left(\prod_{j=1}^{k}\frac{2j-1}{2}\right)\frac{x^{k}}{k!}.$$
We define for any $T \in \mathbb{N}$ that
$$p_T(x) = 1 + \sum_{k=1}^{T}\left(\prod_{j=1}^{k}\frac{2j-1}{2}\right)\frac{x^{k}}{k!}.$$
Then, it holds for any $0 < x \leq 1 - \eta$ that
$$p_T(x) \leq (1 - x)^{-1/2} = p_T(x) + \sum_{k=T+1}^{\infty}\left(\prod_{j=1}^{k}\frac{2j-1}{2}\right)\frac{x^{k}}{k!} \leq p_T(x) + \sum_{k=T+1}^{\infty}x^{k} \leq p_T(x) + \frac{(1-\eta)^{T+1}}{\eta}.$$
Hence, it holds that
$$(uI - A)^{-1/2} = u^{-1/2}\left(I - u^{-1}A\right)^{-1/2} \succeq u^{-1/2}\,p_T(u^{-1}A),$$
and
$$(uI - A)^{-1/2} \preceq u^{-1/2}\,p_T(u^{-1}A) + \frac{(1-\eta)^{T+1}}{\eta}\,u^{-1/2}I,$$
since $u^{-1}A \preceq (1-\eta)I$. Notice that $u^{-1/2}I \preceq (uI - A)^{-1/2}$, and therefore
$$(uI - A)^{-1/2} \preceq u^{-1/2}\,p_T(u^{-1}A) + \frac{(1-\eta)^{T+1}}{\eta}\,(uI - A)^{-1/2}.$$
Setting $T = c\cdot\frac{\log(1/(\varepsilon\eta))}{\eta}$ for some constant $c$ and defining $S_u = u^{-1/2}\,p_T(u^{-1}A)$ gives us that $S_u \approx_{\varepsilon/10} (uI - A)^{-1/2}$.

Now for the second statement. Our construction of $S_l$ is based on the case distinction $l > 0$ and $l \leq 0$.

Case 1: $l > 0$. Notice that $(A - lI)^{-1/2} = A^{-1/2}\left(I - lA^{-1}\right)^{-1/2}$, and
$$p_T(lA^{-1}) \preceq \left(I - lA^{-1}\right)^{-1/2} \preceq p_T(lA^{-1}) + \frac{(1-\eta/2)^{T+1}}{\eta/2}\,I.$$
Using the same analysis as before, we have that
$$A^{-1/2}\left(I - lA^{-1}\right)^{-1/2} \approx_{\varepsilon/10} A^{-1/2}\,p_T(lA^{-1}).$$
By defining $S_l = A^{-1/2}\,p_T(lA^{-1})$, i.e., $A' = A$ and $q(A'^{-1}) = p_T(lA'^{-1})$, we have that
$$S_l \approx_{\varepsilon/10} (A - lI)^{-1/2}.$$

Case 2: $l \leq 0$. We look at the matrix
$$A - lI = L^{-1/2}\widetilde{L}L^{-1/2} - lI = L^{-1/2}\left(\widetilde{L} - lL\right)L^{-1/2}.$$
Notice that $\widetilde{L} - lL$ is a Laplacian matrix, and hence this reduces to the case of $l = 0$, for which we simply set $S_l = (A - lI)^{-1/2}$. Therefore, we can write $S_l$ in the desired form, where $A' = A - lI$ and the polynomial $q = 1$.

Lemma A.5 below shows how to estimate $v_i^{\top}(uI - A)^{-1}v_i$ and $v_i^{\top}(A - lI)^{-1}v_i$ for all $v_i$ in nearly-linear time.

Lemma A.5. Let $A = \sum_{i=1}^{m} v_i v_i^{\top}$, and suppose that $A$ satisfies Assumption A.3. Then, we can compute $\{r_i\}_{i=1}^{m}$ and $\{t_i\}_{i=1}^{m}$ in $\widetilde{O}\left(m/(\varepsilon^{2}\eta)\right)$ time such that
$$(1-\varepsilon)\,r_i \leq v_i^{\top}(uI - A)^{-1}v_i \leq (1+\varepsilon)\,r_i, \quad\text{and}\quad (1-\varepsilon)\,t_i \leq v_i^{\top}(A - lI)^{-1}v_i \leq (1+\varepsilon)\,t_i.$$
Proof. Define $u_i = L^{1/2}v_i$ for any $1 \leq i \leq m$. By Lemma A.4, we have that
$$v_i^{\top}(uI - A)^{-1}v_i \approx_{3\varepsilon/10} \left\|p(A)v_i\right\|^{2}, \quad\text{where}\quad p(A)v_i = p\left(L^{-1/2}\widetilde{L}L^{-1/2}\right)L^{-1/2}u_i = L^{1/2}\,p\left(L^{-1}\widetilde{L}\right)L^{-1}u_i.$$
Let $L = B^{\top}B$ for some $B \in \mathbb{R}^{m\times n}$. Then, it holds that
$$v_i^{\top}(uI - A)^{-1}v_i \approx_{3\varepsilon/10} \left\|B\,p\left(L^{-1}\widetilde{L}\right)L^{-1}u_i\right\|^{2}.$$
We invoke the Johnson-Lindenstrauss Lemma and find a random matrix $Q \in \mathbb{R}^{O(\log n/\varepsilon^{2})\times m}$: with high probability, it holds that
$$v_i^{\top}(uI - A)^{-1}v_i \approx_{4\varepsilon/10} \left\|QB\,p\left(L^{-1}\widetilde{L}\right)L^{-1}u_i\right\|^{2}.$$
We apply a nearly-linear time Laplacian solver to compute $QB\,p\left(L^{-1}\widetilde{L}\right)L^{-1}u_i$ for all $\{u_i\}_{i=1}^{m}$ up to $(1\pm\varepsilon/10)$-multiplicative error in $\widetilde{O}\left(m/(\varepsilon^{2}\eta)\right)$ time. This gives the desired $\{r_i\}_{i=1}^{m}$.

The computation for $\{t_i\}_{i=1}^{m}$ is similar. By Lemma A.4, it holds for any $1 \leq i \leq m$ that
$$v_i^{\top}(A - lI)^{-1}v_i \approx_{3\varepsilon/10} \left\|A'^{-1/2}q(A'^{-1})v_i\right\|^{2}, \quad\text{where}\quad q(A'^{-1})v_i = q\left(L^{1/2}L'^{-1}L^{1/2}\right)L^{-1/2}u_i = L^{1/2}\,q\left(L'^{-1}L\right)L^{-1}u_i.$$
Let $L' = B'^{\top}B'$ for some $B' \in \mathbb{R}^{m\times n}$. Then, it holds that
$$v_i^{\top}(A - lI)^{-1}v_i \approx_{3\varepsilon/10} \left\|L'^{-1/2}q\left(LL'^{-1}\right)u_i\right\|^{2} = \left\|B'L'^{-1}q\left(LL'^{-1}\right)u_i\right\|^{2}.$$
We invoke the Johnson-Lindenstrauss Lemma and a nearly-linear time Laplacian solver as before to obtain the required $\{t_i\}_{i=1}^{m}$. The total runtime is $\widetilde{O}\left(m/(\eta\varepsilon^{2})\right)$.

Lemma A.6 shows how to approximate $\lambda_{\min}(uI - A)$ and $\lambda_{\min}(A - lI)$ in nearly-linear time.

Lemma A.6. Under Assumption A.3, we can compute values $\alpha, \beta$ in $\widetilde{O}\left(m/(\eta\varepsilon^{3})\right)$ time such that
$$(1-\varepsilon)\,\alpha \leq \lambda_{\min}(uI - A) \leq (1+\varepsilon)\,\alpha \quad\text{and}\quad (1-\varepsilon)\,\beta \leq \lambda_{\min}(A - lI) \leq (1+\varepsilon)\,\beta.$$
Proof. By Lemma A.4, we have that $S_u \approx_{\varepsilon/10} (uI - A)^{-1/2}$. Hence, $\lambda_{\max}(S_u) \approx_{3\varepsilon/10} \lambda_{\min}(uI - A)^{-1/2}$, and it suffices to estimate $\lambda_{\max}(S_u)$. Since
$$\lambda_{\max}(S_u) \leq \left(\operatorname{tr} S_u^{k}\right)^{1/k} \leq n^{1/k}\,\lambda_{\max}(S_u),$$
by picking $k = O(\log n/\varepsilon)$ we have that $\left(\operatorname{tr} S_u^{k}\right)^{1/k} \approx_{\varepsilon/2} \lambda_{\max}(S_u)$. Notice that
$$\operatorname{tr} S_u^{k} = \operatorname{tr} p^{k}\left(L^{-1/2}\widetilde{L}L^{-1/2}\right) = \operatorname{tr} p^{k}\left(L^{-1}\widetilde{L}\right).$$
Set $\widetilde{L} = \widetilde{B}^{\top}\widetilde{B}$ for some matrix $\widetilde{B} \in \mathbb{R}^{m\times n}$, and we have that
$$\operatorname{tr} S_u^{k} = \operatorname{tr} p^{k}\left(\widetilde{B}L^{-1}\widetilde{B}^{\top}\right).$$
Since we can apply $p^{k}\left(\widetilde{B}L^{-1}\widetilde{B}^{\top}\right)$ to vectors in $\widetilde{O}\left(m/(\eta\varepsilon)\right)$ time, we invoke the Johnson-Lindenstrauss Lemma and approximate $\operatorname{tr} S_u^{k}$ in $\widetilde{O}\left(m/(\eta\varepsilon^{3})\right)$ time.

We approximate $\lambda_{\min}(A - lI)$ in a similar way. Notice that
$$\operatorname{tr} S_l^{4k} = \operatorname{tr}\left(A'^{-1/2}q(A'^{-1})\right)^{4k} = \operatorname{tr}\left(q(A'^{-1})\,A'^{-1}\,q(A'^{-1})\right)^{2k}.$$
Let $z$ be the polynomial defined by $z(x) = x\,q^{2}(x)$, and let $L' = B'^{\top}B'$. Then, we have that
$$\operatorname{tr} S_l^{4k} = \operatorname{tr} z^{2k}\left(A'^{-1}\right) = \operatorname{tr} z^{2k}\left(L^{1/2}L'^{-1}L^{1/2}\right).$$
Applying the same analysis as before, we can estimate the trace in $\widetilde{O}\left(m/(\eta\varepsilon^{3})\right)$ time.
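The machinery of Lemmas A.4 and A.6 can be illustrated end-to-end on a small dense example: build $S_u = u^{-1/2}p_T(u^{-1}A)$ explicitly and recover $\lambda_{\min}(uI - A)$ from a Hutchinson-style randomized estimate of $\operatorname{tr} S_u^{k}$. Everything below is an illustrative sketch (dense matrices, constant $c = 1$, made-up sizes); in the paper $S_u$ is never formed explicitly, it is only applied to vectors through Laplacian solves and a Johnson-Lindenstrauss projection.

```python
import numpy as np

rng = np.random.default_rng(0)
n, eps, eta = 100, 0.1, 0.2

# Dense stand-in for A = L^{-1/2} \tilde{L} L^{-1/2}: a random PSD matrix
# rescaled so that A <= (1 - eta) * u * I with u = 1 (cf. Assumption A.3).
M = rng.standard_normal((n, n))
A = M @ M.T
A *= (1 - eta) / np.linalg.eigvalsh(A)[-1]
u = 1.0

def p_T(x, T):
    # Degree-T truncation of (1 - x)^{-1/2}; the k-th coefficient equals
    # prod_{j=1}^{k} (2j-1)/(2j), accumulated incrementally for stability.
    total, term = 1.0, 1.0
    for k in range(1, T + 1):
        term = term * x * (2 * k - 1) / (2 * k)
        total = total + term
    return total

# S_u = u^{-1/2} p_T(u^{-1} A), built here via a dense eigendecomposition.
T = int(np.ceil(np.log(1 / (eps * eta)) / eta))   # T = O(log(1/(eps*eta))/eta)
lam, V = np.linalg.eigh(A / u)
S_u = (V * p_T(lam, T)) @ V.T / np.sqrt(u)

# Trace-power bound: lambda_max(S_u) <= tr(S_u^k)^{1/k} <= n^{1/k} lambda_max(S_u),
# and n^{1/k} <= e^{eps} once k = ceil(log(n)/eps).
k = int(np.ceil(np.log(n) / eps))

def apply_pow(x, power):
    # Apply S_u^power to a vector using matrix-vector products only.
    for _ in range(power):
        x = S_u @ x
    return x

# Hutchinson-style randomized trace estimation with random sign vectors.
G = rng.choice([-1.0, 1.0], size=(50, n))
tr_est = np.mean([g @ apply_pow(g, k) for g in G])
alpha = tr_est ** (-2.0 / k)      # since lambda_max(S_u) ~ lambda_min(uI - A)^{-1/2}

true_val = np.min(u - np.linalg.eigvalsh(A))     # exact lambda_min(uI - A)
```

With $k = \lceil \log n/\varepsilon \rceil$ the deterministic gap $n^{1/k} \leq e^{\varepsilon}$ is already within the target accuracy, so the cost is dominated by the random probe vectors; replacing the dense matrix-vector products by solver-based applications of $p$ gives the almost-linear running time claimed in Lemma A.6.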
More information