Convex Optimization of Graph Laplacian Eigenvalues
1 Convex Optimization of Graph Laplacian Eigenvalues
Stephen Boyd, Stanford University
(Joint work with Persi Diaconis, Arpita Ghosh, Seung-Jean Kim, Sanjay Lall, Pablo Parrilo, Amin Saberi, Jun Sun, Lin Xiao. Thanks to Almir Mutapcic.)
ICM 2006 Madrid, August 29, 2006
2 Outline
- some basic stuff we'll need: graph Laplacian eigenvalues; convex optimization and semidefinite programming
- the basic idea
- some example problems: distributed linear averaging; fastest mixing Markov chain on a graph; fastest mixing Markov process on a graph; its dual: maximum variance unfolding
- conclusions
4 (Weighted) graph Laplacian
graph G = (V,E) with n = |V| nodes, m = |E| edges; edge weights w_1,...,w_m ∈ R; l ∼ (i,j) means edge l connects nodes i, j
incidence matrix: A_il = 1 if edge l enters node i, −1 if edge l leaves node i, 0 otherwise
(weighted) Laplacian: L = A diag(w) A^T, with entries
L_ij = −w_l if l ∼ (i,j); L_ii = Σ_{l ∼ (i,k)} w_l; L_ij = 0 otherwise
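As a concrete sketch, the construction above is easy to check numerically; the 4-node graph, its edge list, and the weights below are invented for illustration (not from the talk):

```python
import numpy as np

# toy graph (illustrative): nodes {0,1,2,3}
edges = [(0, 1), (1, 2), (2, 3), (0, 2)]
n, m = 4, len(edges)
w = np.array([1.0, 2.0, 1.0, 0.5])           # edge weights w_1,...,w_m

# incidence matrix: column l has +1 at the node edge l enters, -1 where it leaves
A = np.zeros((n, m))
for l, (i, j) in enumerate(edges):
    A[i, l], A[j, l] = 1.0, -1.0             # orientation is arbitrary

# weighted Laplacian L = A diag(w) A^T
L = A @ np.diag(w) @ A.T
```

Each off-diagonal entry comes out as −w_l for the edge l ∼ (i,j), each diagonal entry as the sum of the weights incident to that node, and L1 = 0.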
5 Laplacian eigenvalues
L is symmetric; L1 = 0
we'll be interested in the case when L ⪰ 0 (i.e., L is PSD), always the case when the weights are nonnegative
Laplacian eigenvalues (eigenvalues of L): 0 = λ1 ≤ λ2 ≤ ... ≤ λn
spectral graph theory connects properties of the graph and the λi (with w = 1); e.g.: G connected iff λ2 > 0 (with w = 1)
6 Convex spectral functions
suppose φ is a symmetric convex function of n − 1 variables; then ψ(w) = φ(λ2,...,λn) is a convex function of the weight vector w
examples:
φ(u) = 1^T u (i.e., the sum): ψ(w) = Σ_{i=2}^n λi = Σ_{i=1}^n λi = Tr L = 2·1^T w (twice the total weight)
φ(u) = max_i u_i: ψ(w) = max{λ2,...,λn} = λn (spectral radius)
7 More examples
φ(u) = min_i u_i (concave) gives ψ(w) = λ2, the algebraic connectivity (a concave function of w)
φ(u) = Σ_i 1/u_i (with u_i > 0): ψ(w) = Σ_{i=2}^n 1/λi = Tr L† (pseudo-inverse of L), proportional to the total effective resistance of the graph
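A small numerical check of the identities quoted above (the weighted path here is again an invented example):

```python
import numpy as np

# weighted path 0-1-2-3 (illustrative example)
edges = [(0, 1), (1, 2), (2, 3)]
w = np.array([1.0, 2.0, 0.5])
n = 4
L = np.zeros((n, n))
for (i, j), wl in zip(edges, w):
    L[i, i] += wl; L[j, j] += wl
    L[i, j] -= wl; L[j, i] -= wl

lam = np.sort(np.linalg.eigvalsh(L))   # 0 = lam[0] <= lam[1] <= ... <= lam[n-1]
total = lam.sum()                      # sum of eigenvalues = Tr L = 2 * 1^T w
lam2 = lam[1]                          # algebraic connectivity
```

Since the path is connected, lam2 comes out strictly positive, and the eigenvalue sum equals twice the total weight.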
9 Convex optimization
minimize f(x) subject to x ∈ X
x ∈ R^n is the optimization variable; f is a convex function (we can maximize a concave f by minimizing −f); X ⊆ R^n is a closed convex set
roughly speaking, convex optimization problems are tractable, i.e., easy to solve numerically (tractability depends on how f and X are described)
10 Symmetry in convex optimization
a permutation (matrix) π is a symmetry of the problem if f(πz) = f(z) for all z, and πz ∈ X for all z ∈ X
if π is a symmetry and the convex optimization problem has a solution, it has a solution invariant under π (if x is a solution, so is the average over {x, πx, π²x, ...})
11 Duality in convex optimization
primal: minimize f(x) subject to x ∈ X
dual: maximize g(y) subject to y ∈ Y
y is the dual variable; the dual objective g is concave; Y is closed and convex (various methods can be used to generate g, Y)
p⋆ (d⋆) is the optimal value of the primal (dual) problem
weak duality: for any x ∈ X, y ∈ Y, f(x) ≥ g(y); hence, p⋆ ≥ g(y)
strong duality: for convex problems, provided a constraint qualification holds, there exist x⋆ ∈ X, y⋆ ∈ Y with f(x⋆) = g(y⋆); hence, x⋆ is primal optimal, y⋆ is dual optimal, and p⋆ = d⋆
12 Semidefinite program (SDP)
a particular type of convex optimization problem:
minimize c^T x subject to Σ_{i=1}^n x_i A_i ⪯ B, Fx = g
the variable is x ∈ R^n; the data are c, F, g, and symmetric matrices A_i, B; ⪯ means inequality with respect to the positive semidefinite cone
a generalization of the linear program (LP)
minimize c^T x subject to Σ_{i=1}^n x_i a_i ≤ b, Fx = g
(here a_i, b are vectors; ≤ means componentwise)
13 SDP dual
primal SDP: minimize c^T x subject to Σ_{i=1}^n x_i A_i ⪯ B, Fx = g
dual SDP: maximize −Tr ZB − ν^T g subject to Z ⪰ 0, (F^T ν + c)_i + Tr ZA_i = 0, i = 1,...,n
with (matrix) variable Z and (vector) variable ν
14 SDP algorithms and applications
interior-point algorithms developed since the 1990s solve SDPs very effectively (polynomial time, and they work well in practice); many results for LP extend to SDP
SDP is widely used in many fields (control, combinatorial optimization, machine learning, finance, signal processing, communications, networking, circuit design, mechanical engineering, ...)
16 The basic idea
some interesting weight optimization problems have the common form
minimize ψ(w) = φ(λ2,...,λn) subject to w ∈ W
where φ is symmetric convex and W is closed convex
these are convex optimization problems
we can solve them numerically (up to our ability to store data, compute eigenvalues, ...); for some simple graphs, we can get analytic solutions
the associated dual problems can be quite interesting
18 Distributed averaging
[figure: example graph with node values x_1,...,x_6 and edge weights W_ij]
each node of a connected graph has an initial value x_i(0) ∈ R; the goal is to compute the average 1^T x(0)/n using a distributed iterative method
applications in load balancing, distributed optimization, sensor networks
19 Distributed linear averaging
simple linear iteration: replace each node value with a weighted average of its own and its neighbors' values; repeat
x_i(t+1) = W_ii x_i(t) + Σ_{j ∈ N_i} W_ij x_j(t) = x_i(t) − Σ_{j ∈ N_i} W_ij (x_i(t) − x_j(t))
where W_ii + Σ_{j ∈ N_i} W_ij = 1
we'll assume W_ij = W_ji, i.e., the weights are symmetric
the weights W_ij determine whether convergence to the average occurs, and if so, how fast
classical result: convergence if the weights W_ij (i ≠ j) are small and positive
20 Convergence rate
vector form: x(t+1) = Wx(t) (we take W_ij = 0 for i ≠ j, (i,j) ∉ E)
W satisfies W = W^T, W1 = 1
convergence lim_{t→∞} W^t = (1/n)11^T holds iff ρ(W − (1/n)11^T) = ‖W − (1/n)11^T‖ < 1
(ρ is the spectral radius; ‖·‖ is the spectral norm; they coincide here since W is symmetric)
the asymptotic convergence rate is given by ‖W − (1/n)11^T‖; the convergence time is τ = 1/log(1/‖W − (1/n)11^T‖)
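The iteration and its convergence factor can be sketched as follows; the 5-node graph and the simple constant-weight choice are assumptions for illustration:

```python
import numpy as np

# illustrative graph: path 0-1-2-3-4 plus a chord (0,2)
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (0, 2)]
n = 5
L = np.zeros((n, n))
for i, j in edges:
    L[i, i] += 1; L[j, j] += 1
    L[i, j] -= 1; L[j, i] -= 1

alpha = 1.0 / np.diag(L).max()     # simple constant weight: 1/max degree
W = np.eye(n) - alpha * L          # W = W^T, W1 = 1

x = np.array([1.0, 0.0, 3.0, -2.0, 8.0])
avg = x.mean()
for _ in range(200):
    x = W @ x                      # x(t+1) = W x(t)

J = np.ones((n, n)) / n
rho = np.linalg.norm(W - J, 2)     # convergence factor ||W - (1/n)11^T||
```

Since rho < 1 here, repeated averaging drives every node value to the initial average.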
21 Connection to Laplacian eigenvalues
identifying W_ij with w_l for l ∼ (i,j), we have W = I − L
the convergence rate is given by ‖W − (1/n)11^T‖ = ‖I − L − (1/n)11^T‖ = max{|1 − λ2|,...,|1 − λn|} = max{1 − λ2, λn − 1}
... a convex spectral function, with φ(u) = max_i |1 − u_i|
22 Fastest distributed linear averaging
minimize ‖W − (1/n)11^T‖ subject to W ∈ S, W = W^T, W1 = 1
the optimization variable is W; the problem data is the graph (sparsity pattern S)
in terms of Laplacian eigenvalues, with variable w ∈ R^m:
minimize max{1 − λ2, λn − 1}
these are convex optimization problems, so we can efficiently find the weights that give the fastest possible averaging on a graph
23 Semidefinite programming formulation
introduce a scalar variable s to bound the spectral norm:
minimize s subject to −sI ⪯ I − L − (1/n)11^T ⪯ sI
(for Z = Z^T, ‖Z‖ ≤ s ⟺ −sI ⪯ Z ⪯ sI)
an SDP (hence, can be solved efficiently)
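The norm-bound equivalence in parentheses is easy to confirm numerically on a random symmetric matrix (the matrix below is generated for illustration only):

```python
import numpy as np

rng = np.random.default_rng(0)
B = rng.standard_normal((5, 5))
Z = (B + B.T) / 2                    # a random symmetric matrix
s = np.linalg.norm(Z, 2)             # spectral norm = max_i |eigenvalue_i|

ev = np.linalg.eigvalsh(Z)
lower_ok = np.all(ev >= -s - 1e-9)   # Z + sI is PSD   (-sI <= Z)
upper_ok = np.all(ev <= s + 1e-9)    # sI - Z is PSD   (Z <= sI)
```

Both one-sided matrix inequalities hold exactly when s is at least the largest eigenvalue magnitude, which is why the norm objective can be written as a pair of LMIs.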
24 Constant weight averaging
a simple, traditional method: constant weight on all edges, w = α1, yields the update
x_i(t+1) = x_i(t) + Σ_{j ∈ N_i} α(x_j(t) − x_i(t))
a simple choice: the max-degree weight, α = 1/max_i d_i, where d_i is the degree (number of neighbors) of node i
best constant weight: α⋆ = 2/(λ2 + λn) (λ2, λn are eigenvalues of the unweighted Laplacian, i.e., with w = 1)
for an edge-transitive graph, w_l = α⋆ is optimal
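A sketch comparing the two constant-weight choices above on a small invented graph (a 5-cycle plus a chord, chosen only for illustration):

```python
import numpy as np

# illustrative graph: 5-cycle plus a chord (0,2)
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0), (0, 2)]
n = 5
L = np.zeros((n, n))
for i, j in edges:
    L[i, i] += 1; L[j, j] += 1
    L[i, j] -= 1; L[j, i] -= 1

lam = np.sort(np.linalg.eigvalsh(L))     # unweighted Laplacian eigenvalues
a_md = 1.0 / np.diag(L).max()            # max-degree weight
a_bc = 2.0 / (lam[1] + lam[-1])          # best constant weight

def rho(alpha):
    # ||W - (1/n)11^T|| for W = I - alpha*L, via the eigenvalues of L
    return max(abs(1 - alpha * lam[1]), abs(1 - alpha * lam[-1]))

rho_md, rho_bc = rho(a_md), rho(a_bc)
```

The best constant weight balances the two extreme eigenvalues, so its convergence factor never exceeds the max-degree one.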
25 A small example
[table comparing max-degree, best constant, and optimal weights: ρ = ‖W − (1/n)11^T‖ and τ = 1/log(1/ρ); values lost in transcription]
26 Optimal weights (note: some are negative!)
[figure: graph with optimal edge weights]
λ_i(W): .643, .643, .106, 0.000, .368, .643, .643, ...
27 Larger example
50 nodes, 200 edges
[table comparing max-degree, best constant, and optimal weights: ρ = ‖W − (1/n)11^T‖ and τ = 1/log(1/ρ); values lost in transcription]
28 Eigenvalue distributions
[figure: eigenvalue histograms for the max-degree, best constant, and optimal weights]
29 Optimal weights
[figure: histogram of the optimal weights; some out of 250 are negative]
30 Another example
a cut grid with n = 64 nodes, m = 95 edges
[figure: edge width shows weight value (red for negative)]
τ = 85; max-degree τ = 137
31 Some questions & comments
how much better are the optimal weights than the simple choices? for barbell graphs K_n−K_n, the optimal weights are unboundedly better than max-degree, optimal constant, and several other simple weight choices
what size problems can be handled (on a PC)? interior-point algorithms easily handle problems with 10^4 edges; subgradient-based methods handle problems with 10^6 edges; any symmetry can be exploited for efficiency gains
what happens if we require the weights to be nonnegative? (we'll soon see)
32 Least-mean-square average consensus
include random noise in the averaging process: x(t+1) = Wx(t) + v(t), with v(t) i.i.d., E v(t) = 0, E v(t)v(t)^T = I
steady-state mean-square deviation:
δ_ss = lim_{t→∞} (1/n) E Σ_{i<j} (x_i(t) − x_j(t))² = Σ_{i=2}^n 1/(λ_i(2 − λ_i))
for ρ = max{1 − λ2, λn − 1} < 1
... another convex spectral function
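The closed form for δ_ss can be cross-checked against the matrix series it sums, Σ_{t≥0} Tr[(W − (1/n)11^T)^{2t}(I − (1/n)11^T)], on a small invented graph:

```python
import numpy as np

# illustrative graph: 4-cycle plus a chord (0,2), constant edge weight alpha
edges = [(0, 1), (1, 2), (2, 3), (0, 3), (0, 2)]
n, alpha = 4, 0.2
L = np.zeros((n, n))
for i, j in edges:
    L[i, i] += 1; L[j, j] += 1
    L[i, j] -= 1; L[j, i] -= 1

lam = np.sort(np.linalg.eigvalsh(alpha * L))   # eigenvalues of L(w), w = alpha*1
W = np.eye(n) - alpha * L
J = np.ones((n, n)) / n

# closed form: sum_{i>=2} 1 / (lam_i (2 - lam_i))
delta_formula = np.sum(1.0 / (lam[1:] * (2 - lam[1:])))

# truncated series for the steady-state deviation of x(t+1) = W x(t) + v(t);
# the terms decay geometrically, so 80 terms are ample here
B, P = W - J, np.eye(n) - J
delta_series = sum(np.trace(np.linalg.matrix_power(B, 2 * t) @ P)
                   for t in range(80))
```

The two quantities agree to machine precision, confirming the spectral formula for this example.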
34 Random walk on a graph
Markov chain on the nodes of G, with transition probabilities on the edges: P_ij = Prob(X(t+1) = j | X(t) = i)
we'll focus on symmetric transition probability matrices P (everything extends to the reversible case, with fixed equilibrium distribution)
identifying P_ij with w_l for l ∼ (i,j), we have P = I − L
same as the linear averaging matrix W, but here W_ij ≥ 0 (i.e., w ≥ 0, diag(L) ≤ 1)
35 Mixing rate
the probability distribution π_i(t) = Prob(X(t) = i) satisfies π(t+1)^T = π(t)^T P
since P = P^T and P1 = 1, the uniform distribution π = (1/n)1 is stationary, i.e., ((1/n)1)^T P = ((1/n)1)^T
π(t) → (1/n)1 for any π(0) iff µ = ‖P − (1/n)11^T‖ = ‖I − L − (1/n)11^T‖ < 1
µ is called the second largest eigenvalue modulus (SLEM) of the Markov chain
the SLEM determines the convergence (mixing) rate, e.g., sup_{π(0)} ‖π(t) − (1/n)1‖_tv ≤ (√n/2) µ^t
the associated mixing time is τ = 1/log(1/µ)
36 Fastest mixing Markov chain problem
minimize µ = ‖I − L − (1/n)11^T‖ = max{1 − λ2, λn − 1} subject to w ≥ 0, diag(L) ≤ 1
the optimization variable is w; the problem data is the graph G
same as the fast linear averaging problem, with the additional nonnegativity constraint W_ij ≥ 0 on the weights
a convex optimization problem (indeed, an SDP), hence efficiently solved
37 Two common suboptimal schemes
max-degree chain: w = (1/max_i d_i)1
Metropolis-Hastings chain: w_l = 1/max{d_i, d_j}, where l ∼ (i,j)
(comes from the Metropolis method of generating a reversible MC with uniform stationary distribution)
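The Metropolis-Hastings construction above can be sketched on a small invented graph; the resulting P is a valid symmetric transition matrix with SLEM below 1:

```python
import numpy as np

# illustrative graph: path 0-1-2-3-4 plus a chord (0,2)
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (0, 2)]
n = 5
deg = np.zeros(n, dtype=int)
for i, j in edges:
    deg[i] += 1; deg[j] += 1

L = np.zeros((n, n))
for i, j in edges:
    wl = 1.0 / max(deg[i], deg[j])    # Metropolis-Hastings weight
    L[i, i] += wl; L[j, j] += wl
    L[i, j] -= wl; L[j, i] -= wl

P = np.eye(n) - L                     # symmetric transition matrix
mu = np.linalg.norm(P - np.ones((n, n)) / n, 2)   # SLEM
```

Dividing by the larger of the two endpoint degrees guarantees every row sum of edge weights stays at most 1, so the diagonal holding probabilities are nonnegative.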
38 Small example
optimal transition probabilities (some are zero)
[table: SLEM µ and mixing time τ for the max-degree, Metropolis-Hastings, and optimal chains; the fastest-averaging values µ = .643 and τ = 2.27 are shown for comparison; remaining values lost in transcription]
39 Larger example
50 nodes, 200 edges
[table: SLEM µ and mixing time τ for the max-degree, Metropolis-Hastings, and optimal chains; the fastest-averaging values µ = .902 and τ = 9.7 are shown for comparison; remaining values lost in transcription]
40 Optimal transition probabilities
82 edges (out of 200) have zero transition probability
[figure: distribution of the positive transition probabilities]
41 Subgraph with positive transition probabilities
[figure]
42 Another example
a cut grid with n = 64 nodes, m = 95 edges
[figure: edge width shows weight value (dotted for zero)]
mixing time τ = 89; Metropolis-Hastings mixing time τ = 120
43 Some analytical results
for the path, the fastest mixing MC is the obvious one (P_{i,i+1} = 1/2)
for any edge-transitive graph (hypercube, ring, ...), all edge weights are equal, with value 2/(λ2 + λn) (unweighted Laplacian eigenvalues)
44 Commute time
for a random walk on the graph, P_ij is proportional to w_l for l ∼ (i,j); P_ii = 0
P is not symmetric, but the MC is reversible; we can normalize w so that 1^T w = 1
commute time C_ij: time for the random walk to return to i after visiting j
the expected commute time averaged over all pairs of nodes, C̄ = (1/n²) Σ_{i,j=1}^n E C_ij, is proportional to Σ_{i=2}^n 1/λ_i (Chandra et al., 1989), called the total effective resistance
... another convex spectral function
45 Minimizing average commute time
find the weights that minimize the average commute time on the graph:
minimize C̄ (equivalently, Σ_{i=2}^n 1/λ_i) subject to w ≥ 0, 1^T w = 1
another convex problem of our general form
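The quantity Σ_{i≥2} 1/λ_i is easy to sanity-check on a graph whose effective resistances are known in closed form; the unit-weight path below is an invented example, using the standard identity R_tot = n Σ_{i≥2} 1/λ_i:

```python
import numpy as np

n = 5
L = np.zeros((n, n))
for i in range(n - 1):                # unit-weight path 0-1-2-3-4
    L[i, i] += 1; L[i + 1, i + 1] += 1
    L[i, i + 1] -= 1; L[i + 1, i] -= 1

lam = np.sort(np.linalg.eigvalsh(L))
R_spectral = n * np.sum(1.0 / lam[1:])    # n * sum_{i>=2} 1/lam_i

# on a path, unit resistors add in series: R(i,j) = |i - j|
R_series = sum(j - i for i in range(n) for j in range(i + 1, n))
```

The spectral expression reproduces the total of the pairwise series resistances exactly.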
47 Markov process on a graph
(continuous-time) Markov process on the nodes of G, with transition rate w_l ≥ 0 between nodes i and j, for l ∼ (i,j)
the probability distribution π(t) ∈ R^n satisfies the heat equation π̇(t) = −Lπ(t), so π(t) = e^{−tL} π(0)
π(t) converges to the uniform distribution (1/n)1, for any π(0), iff λ2 > 0
(asymptotic) convergence is as e^{−λ2 t}; λ2 gives the mixing rate of the process
λ2 is a concave, homogeneous function of w (comes from the symmetric concave function φ(u) = min_i u_i)
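The heat-equation dynamics above can be sketched via the eigendecomposition of L; the 4-cycle with unit rates is an invented example:

```python
import numpy as np

# illustrative graph: 4-cycle with unit transition rates
edges = [(0, 1), (1, 2), (2, 3), (0, 3)]
n = 4
L = np.zeros((n, n))
for i, j in edges:
    L[i, i] += 1; L[j, j] += 1
    L[i, j] -= 1; L[j, i] -= 1

lam, V = np.linalg.eigh(L)

def pi_t(pi0, t):
    # pi(t) = e^{-tL} pi(0), via the eigendecomposition of the symmetric L
    return V @ (np.exp(-t * lam) * (V.T @ pi0))

pi0 = np.array([1.0, 0.0, 0.0, 0.0])      # all mass on node 0
u = np.ones(n) / n                        # uniform distribution
lam2 = np.sort(lam)[1]
dev = np.linalg.norm(pi_t(pi0, 5.0) - u)  # decays like e^{-lam2 t}
```

Probability mass is conserved at every t, and the deviation from uniform is bounded by e^{−λ2 t} times the initial deviation.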
48 Fastest mixing Markov process on a graph
maximize λ2 subject to Σ_l d_l² w_l ≤ 1, w ≥ 0
the variable is w ∈ R^m; the data are the graph and normalization constants d_l > 0
a convex optimization problem, hence easily solved
allocate rate across the edges so as to maximize the mixing rate
the constraint is always tight at the solution, i.e., Σ_l d_l² w_l = 1
when d_l² = 1/m, the optimal value is called the absolute algebraic connectivity
49 Interpretation: grounded unit-capacitor RC circuit
the charge vector q(t) satisfies q̇(t) = −Lq(t), with edge weights given by conductances, w_l = g_l
charge equilibrates (i.e., converges to uniform) at a rate determined by λ2
with conductor resistivity ρ, length d_l, and cross-sectional area a_l, we have g_l = a_l/(ρ d_l)
50 Interpretation (continued)
the total conductor volume is Σ_l d_l a_l = ρ Σ_l d_l² w_l
the problem is to choose the conductor cross-sectional areas, subject to a total volume constraint, so as to make the circuit equilibrate charge as fast as possible
[figure: optimal λ2 is .105; uniform allocation of conductance gives λ2 = .073]
51 SDP formulation and dual
alternate formulation: minimize Σ_l d_l² w_l subject to λ2 ≥ 1, w ≥ 0
SDP formulation: minimize Σ_l d_l² w_l subject to L ⪰ I − (1/n)11^T, w ≥ 0
dual problem: maximize Tr X subject to X_ii + X_jj − X_ij − X_ji ≤ d_l² for l ∼ (i,j), 1^T X1 = 0, X ⪰ 0
with variable X ∈ R^{n×n}
52 A maximum variance unfolding problem
use variables x_1,...,x_n ∈ R^n, with X = [x_1 ⋯ x_n]^T [x_1 ⋯ x_n] (i.e., X_ij = x_i^T x_j)
the dual problem becomes the maximum variance unfolding (MVU) problem
maximize Σ_i ‖x_i‖² subject to ‖x_i − x_j‖ ≤ d_l for l ∼ (i,j), Σ_i x_i = 0
position n points in R^n to maximize variance, while respecting local distance constraints
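Weak duality between the two problems can be checked numerically on a unit-distance path (an invented example): any feasible spread-out configuration has total variance at most Σ_l d_l² w_l for any primal-feasible w.

```python
import numpy as np

# unit-distance path 0-1-2-3 (illustrative example), d_l = 1 on every edge
n = 4
Lpath = np.zeros((n, n))
for i in range(n - 1):
    Lpath[i, i] += 1; Lpath[i + 1, i + 1] += 1
    Lpath[i, i + 1] -= 1; Lpath[i + 1, i] -= 1

# primal-feasible w for "minimize sum_l d_l^2 w_l s.t. lam2 >= 1, w >= 0":
# scale unit weights so that lam2(L(w)) = 1
lam2 = np.sort(np.linalg.eigvalsh(Lpath))[1]
w = np.ones(n - 1) / lam2
primal_obj = w.sum()                       # sum_l d_l^2 w_l with d_l = 1

# dual-feasible MVU points: equally spaced on a line, centered at the origin,
# so adjacent distances are exactly 1 and the points sum to zero
x = np.arange(n) - (n - 1) / 2.0
dual_obj = np.sum(x ** 2)                  # total variance sum_i ||x_i||^2
```

The line configuration attains variance 5, below the primal objective of about 5.12, as weak duality requires.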
53 (continued)
similar to semidefinite embedding for unsupervised learning of manifolds (Weinberger & Saul 2004)
surprise: duality between the fastest mixing Markov process and maximum variance unfolding
54 Conclusions
some interesting weight optimization problems have the common form
minimize ψ(w) = φ(λ2,...,λn) subject to w ∈ W
where φ is symmetric convex and W is closed convex
these are convex optimization problems; we can solve them numerically (up to our ability to store data, compute eigenvalues, ...); for some simple graphs, we can get analytic solutions
the associated dual problems can be quite interesting
55 Some references
- Convex Optimization, Cambridge University Press, 2004
- Fast linear iterations for distributed averaging, Systems and Control Letters 53:65-78, 2004
- Fastest mixing Markov chain on a graph, SIAM Review 46(4), 2004
- The fastest mixing Markov process on a graph and a connection to a maximum variance unfolding problem, to appear, SIAM Review 48(4), 2006
- Minimizing effective resistance of a graph, to appear, SIAM Review, 2007
- Distributed average consensus with least-mean-square deviation, MTNS, 2006
these and other papers are available at the author's web site
More information- Well-characterized problems, min-max relations, approximate certificates. - LP problems in the standard form, primal and dual linear programs
LP-Duality ( Approximation Algorithms by V. Vazirani, Chapter 12) - Well-characterized problems, min-max relations, approximate certificates - LP problems in the standard form, primal and dual linear programs
More informationConvex Optimization in Classification Problems
New Trends in Optimization and Computational Algorithms December 9 13, 2001 Convex Optimization in Classification Problems Laurent El Ghaoui Department of EECS, UC Berkeley elghaoui@eecs.berkeley.edu 1
More informationAdvanced SDPs Lecture 6: March 16, 2017
Advanced SDPs Lecture 6: March 16, 2017 Lecturers: Nikhil Bansal and Daniel Dadush Scribe: Daniel Dadush 6.1 Notation Let N = {0, 1,... } denote the set of non-negative integers. For α N n, define the
More informationAgreement algorithms for synchronization of clocks in nodes of stochastic networks
UDC 519.248: 62 192 Agreement algorithms for synchronization of clocks in nodes of stochastic networks L. Manita, A. Manita National Research University Higher School of Economics, Moscow Institute of
More informationEE 227A: Convex Optimization and Applications October 14, 2008
EE 227A: Convex Optimization and Applications October 14, 2008 Lecture 13: SDP Duality Lecturer: Laurent El Ghaoui Reading assignment: Chapter 5 of BV. 13.1 Direct approach 13.1.1 Primal problem Consider
More informationRandom Walks on Graphs. One Concrete Example of a random walk Motivation applications
Random Walks on Graphs Outline One Concrete Example of a random walk Motivation applications shuffling cards universal traverse sequence self stabilizing token management scheme random sampling enumeration
More informationFast ADMM for Sum of Squares Programs Using Partial Orthogonality
Fast ADMM for Sum of Squares Programs Using Partial Orthogonality Antonis Papachristodoulou Department of Engineering Science University of Oxford www.eng.ox.ac.uk/control/sysos antonis@eng.ox.ac.uk with
More informationLagrange Duality. Daniel P. Palomar. Hong Kong University of Science and Technology (HKUST)
Lagrange Duality Daniel P. Palomar Hong Kong University of Science and Technology (HKUST) ELEC5470 - Convex Optimization Fall 2017-18, HKUST, Hong Kong Outline of Lecture Lagrangian Dual function Dual
More informationMarkov Chains, Random Walks on Graphs, and the Laplacian
Markov Chains, Random Walks on Graphs, and the Laplacian CMPSCI 791BB: Advanced ML Sridhar Mahadevan Random Walks! There is significant interest in the problem of random walks! Markov chain analysis! Computer
More informationData Analysis and Manifold Learning Lecture 9: Diffusion on Manifolds and on Graphs
Data Analysis and Manifold Learning Lecture 9: Diffusion on Manifolds and on Graphs Radu Horaud INRIA Grenoble Rhone-Alpes, France Radu.Horaud@inrialpes.fr http://perception.inrialpes.fr/ Outline of Lecture
More information15. Conic optimization
L. Vandenberghe EE236C (Spring 216) 15. Conic optimization conic linear program examples modeling duality 15-1 Generalized (conic) inequalities Conic inequality: a constraint x K where K is a convex cone
More information12. Interior-point methods
12. Interior-point methods Convex Optimization Boyd & Vandenberghe inequality constrained minimization logarithmic barrier function and central path barrier method feasibility and phase I methods complexity
More informationPOLYNOMIAL OPTIMIZATION WITH SUMS-OF-SQUARES INTERPOLANTS
POLYNOMIAL OPTIMIZATION WITH SUMS-OF-SQUARES INTERPOLANTS Sercan Yıldız syildiz@samsi.info in collaboration with Dávid Papp (NCSU) OPT Transition Workshop May 02, 2017 OUTLINE Polynomial optimization and
More informationSpectra of Large Random Stochastic Matrices & Relaxation in Complex Systems
Spectra of Large Random Stochastic Matrices & Relaxation in Complex Systems Reimer Kühn Disordered Systems Group Department of Mathematics, King s College London Random Graphs and Random Processes, KCL
More informationIntroduction to Machine Learning Lecture 7. Mehryar Mohri Courant Institute and Google Research
Introduction to Machine Learning Lecture 7 Mehryar Mohri Courant Institute and Google Research mohri@cims.nyu.edu Convex Optimization Differentiation Definition: let f : X R N R be a differentiable function,
More informationReal Symmetric Matrices and Semidefinite Programming
Real Symmetric Matrices and Semidefinite Programming Tatsiana Maskalevich Abstract Symmetric real matrices attain an important property stating that all their eigenvalues are real. This gives rise to many
More information8.1 Concentration inequality for Gaussian random matrix (cont d)
MGMT 69: Topics in High-dimensional Data Analysis Falll 26 Lecture 8: Spectral clustering and Laplacian matrices Lecturer: Jiaming Xu Scribe: Hyun-Ju Oh and Taotao He, October 4, 26 Outline Concentration
More informationExtreme Abridgment of Boyd and Vandenberghe s Convex Optimization
Extreme Abridgment of Boyd and Vandenberghe s Convex Optimization Compiled by David Rosenberg Abstract Boyd and Vandenberghe s Convex Optimization book is very well-written and a pleasure to read. The
More informationLecture 20: Reversible Processes and Queues
Lecture 20: Reversible Processes and Queues 1 Examples of reversible processes 11 Birth-death processes We define two non-negative sequences birth and death rates denoted by {λ n : n N 0 } and {µ n : n
More informationSome Properties of the Augmented Lagrangian in Cone Constrained Optimization
MATHEMATICS OF OPERATIONS RESEARCH Vol. 29, No. 3, August 2004, pp. 479 491 issn 0364-765X eissn 1526-5471 04 2903 0479 informs doi 10.1287/moor.1040.0103 2004 INFORMS Some Properties of the Augmented
More informationMULTISCALE DISTRIBUTED ESTIMATION WITH APPLICATIONS TO GPS AUGMENTATION AND NETWORK SPECTRA
MULTISCALE DISTRIBUTED ESTIMATION WITH APPLICATIONS TO GPS AUGMENTATION AND NETWORK SPECTRA A DISSERTATION SUBMITTED TO THE DEPARTMENT OF AERONAUTICS AND ASTRONAUTICS AND THE COMMITTEE ON GRADUATE STUDIES
More informationLinearly-solvable Markov decision problems
Advances in Neural Information Processing Systems 2 Linearly-solvable Markov decision problems Emanuel Todorov Department of Cognitive Science University of California San Diego todorov@cogsci.ucsd.edu
More informationAgenda. 1 Cone programming. 2 Convex cones. 3 Generalized inequalities. 4 Linear programming (LP) 5 Second-order cone programming (SOCP)
Agenda 1 Cone programming 2 Convex cones 3 Generalized inequalities 4 Linear programming (LP) 5 Second-order cone programming (SOCP) 6 Semidefinite programming (SDP) 7 Examples Optimization problem in
More informationModule 04 Optimization Problems KKT Conditions & Solvers
Module 04 Optimization Problems KKT Conditions & Solvers Ahmad F. Taha EE 5243: Introduction to Cyber-Physical Systems Email: ahmad.taha@utsa.edu Webpage: http://engineering.utsa.edu/ taha/index.html September
More informationLECTURE 15 Markov chain Monte Carlo
LECTURE 15 Markov chain Monte Carlo There are many settings when posterior computation is a challenge in that one does not have a closed form expression for the posterior distribution. Markov chain Monte
More informationminimize x x2 2 x 1x 2 x 1 subject to x 1 +2x 2 u 1 x 1 4x 2 u 2, 5x 1 +76x 2 1,
4 Duality 4.1 Numerical perturbation analysis example. Consider the quadratic program with variables x 1, x 2, and parameters u 1, u 2. minimize x 2 1 +2x2 2 x 1x 2 x 1 subject to x 1 +2x 2 u 1 x 1 4x
More information15-780: LinearProgramming
15-780: LinearProgramming J. Zico Kolter February 1-3, 2016 1 Outline Introduction Some linear algebra review Linear programming Simplex algorithm Duality and dual simplex 2 Outline Introduction Some linear
More informationStochastic Subgradient Methods
Stochastic Subgradient Methods Stephen Boyd and Almir Mutapcic Notes for EE364b, Stanford University, Winter 26-7 April 13, 28 1 Noisy unbiased subgradient Suppose f : R n R is a convex function. We say
More information1 The independent set problem
ORF 523 Lecture 11 Spring 2016, Princeton University Instructor: A.A. Ahmadi Scribe: G. Hall Tuesday, March 29, 2016 When in doubt on the accuracy of these notes, please cross chec with the instructor
More informationSemidefinite Programming
Semidefinite Programming Basics and SOS Fernando Mário de Oliveira Filho Campos do Jordão, 2 November 23 Available at: www.ime.usp.br/~fmario under talks Conic programming V is a real vector space h, i
More informationPrimal-Dual Geometry of Level Sets and their Explanatory Value of the Practical Performance of Interior-Point Methods for Conic Optimization
Primal-Dual Geometry of Level Sets and their Explanatory Value of the Practical Performance of Interior-Point Methods for Conic Optimization Robert M. Freund M.I.T. June, 2010 from papers in SIOPT, Mathematics
More informationOptimization over Polynomials with Sums of Squares and Moment Matrices
Optimization over Polynomials with Sums of Squares and Moment Matrices Monique Laurent Centrum Wiskunde & Informatica (CWI), Amsterdam and University of Tilburg Positivity, Valuations and Quadratic Forms
More informationConvergence Rate of Markov Chains
Convergence Rate of Markov Chains Will Perkins April 16, 2013 Convergence Last class we saw that if X n is an irreducible, aperiodic, positive recurrent Markov chain, then there exists a stationary distribution
More informationGeometric problems. Chapter Projection on a set. The distance of a point x 0 R n to a closed set C R n, in the norm, is defined as
Chapter 8 Geometric problems 8.1 Projection on a set The distance of a point x 0 R n to a closed set C R n, in the norm, is defined as dist(x 0,C) = inf{ x 0 x x C}. The infimum here is always achieved.
More informationAlgebraic Representation of Networks
Algebraic Representation of Networks 0 1 2 1 1 0 0 1 2 0 0 1 1 1 1 1 Hiroki Sayama sayama@binghamton.edu Describing networks with matrices (1) Adjacency matrix A matrix with rows and columns labeled by
More informationLecture: Expanders, in theory and in practice (2 of 2)
Stat260/CS294: Spectral Graph Methods Lecture 9-02/19/2015 Lecture: Expanders, in theory and in practice (2 of 2) Lecturer: Michael Mahoney Scribe: Michael Mahoney Warning: these notes are still very rough.
More informationCourse Outline. FRTN10 Multivariable Control, Lecture 13. General idea for Lectures Lecture 13 Outline. Example 1 (Doyle Stein, 1979)
Course Outline FRTN Multivariable Control, Lecture Automatic Control LTH, 6 L-L Specifications, models and loop-shaping by hand L6-L8 Limitations on achievable performance L9-L Controller optimization:
More informationMarkov Chains and MCMC
Markov Chains and MCMC Markov chains Let S = {1, 2,..., N} be a finite set consisting of N states. A Markov chain Y 0, Y 1, Y 2,... is a sequence of random variables, with Y t S for all points in time
More informationCopositive Programming and Combinatorial Optimization
Copositive Programming and Combinatorial Optimization Franz Rendl http://www.math.uni-klu.ac.at Alpen-Adria-Universität Klagenfurt Austria joint work with I.M. Bomze (Wien) and F. Jarre (Düsseldorf) IMA
More informationHW1 solutions. 1. α Ef(x) β, where Ef(x) is the expected value of f(x), i.e., Ef(x) = n. i=1 p if(a i ). (The function f : R R is given.
HW1 solutions Exercise 1 (Some sets of probability distributions.) Let x be a real-valued random variable with Prob(x = a i ) = p i, i = 1,..., n, where a 1 < a 2 < < a n. Of course p R n lies in the standard
More information