Convex Optimization of Graph Laplacian Eigenvalues


Stephen Boyd, Stanford University
(joint work with Persi Diaconis, Arpita Ghosh, Seung-Jean Kim, Sanjay Lall, Pablo Parrilo, Amin Saberi, Jun Sun, Lin Xiao; thanks to Almir Mutapcic)
ICM 2006, Madrid, August 29, 2006

Outline
- some basic stuff we'll need: graph Laplacian eigenvalues; convex optimization and semidefinite programming
- the basic idea
- some example problems: distributed linear averaging; fastest mixing Markov chain on a graph; fastest mixing Markov process on a graph; its dual: maximum variance unfolding
- conclusions


(Weighted) graph Laplacian
- graph G = (V, E) with n = |V| nodes, m = |E| edges
- edge weights w_1, ..., w_m ∈ R; l ~ (i, j) means edge l connects nodes i and j
- incidence matrix:

    A_il = +1 if edge l enters node i, -1 if edge l leaves node i, 0 otherwise

- (weighted) Laplacian L = A diag(w) A^T, i.e.,

    L_ij = -w_l if l ~ (i, j);  Σ_{l ~ (i,k)} w_l if i = j;  0 otherwise
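
As a sanity check on these definitions (my addition, not from the talk), here is a minimal NumPy sketch that builds the weighted Laplacian of a small made-up graph from its incidence matrix and verifies L1 = 0:

```python
import numpy as np

# Hypothetical example: a 4-node cycle; edge l connects nodes (i, j).
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
w = np.array([1.0, 2.0, 1.0, 0.5])      # made-up edge weights w_1, ..., w_m

n, m = 4, len(edges)
A = np.zeros((n, m))                    # incidence matrix
for l, (i, j) in enumerate(edges):
    A[i, l] = -1.0                      # edge l leaves node i
    A[j, l] = +1.0                      # edge l enters node j

L = A @ np.diag(w) @ A.T                # weighted Laplacian L = A diag(w) A^T

assert np.allclose(L @ np.ones(n), 0)   # rows sum to zero: L1 = 0
print(np.sort(np.linalg.eigvalsh(L)))   # 0 = lambda_1 <= ... <= lambda_n
```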

Laplacian eigenvalues
- L is symmetric; L1 = 0
- we'll be interested in the case L ⪰ 0 (i.e., L is PSD), which is always the case when the weights are nonnegative
- Laplacian eigenvalues (eigenvalues of L): 0 = λ_1 ≤ λ_2 ≤ ... ≤ λ_n
- spectral graph theory connects properties of the graph to the λ_i (with w = 1); e.g., G is connected iff λ_2 > 0 (with w = 1)

Convex spectral functions
- suppose φ is a symmetric convex function of n-1 variables; then ψ(w) = φ(λ_2, ..., λ_n) is a convex function of the weight vector w
- example: φ(u) = 1^T u (the sum) gives

    ψ(w) = Σ_{i=2}^n λ_i = Σ_{i=1}^n λ_i = Tr L = 2·1^T w  (twice the total weight)

- example: φ(u) = max_i u_i gives ψ(w) = max{λ_2, ..., λ_n} = λ_n (the spectral radius of L)

More examples
- φ(u) = min_i u_i (concave) gives ψ(w) = λ_2, the algebraic connectivity (a concave function of w)
- φ(u) = Σ_i 1/u_i (with u_i > 0) gives

    ψ(w) = Σ_{i=2}^n 1/λ_i = Tr L†,

  proportional to the total effective resistance of the graph
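
To make the eigenvalue-function correspondence concrete, the following sketch (again my addition; the unweighted path graph is just an example) evaluates each ψ(w) above directly from the spectrum:

```python
import numpy as np

def laplacian_spectrum(L):
    """Return lambda_2, ..., lambda_n of a weighted Laplacian L."""
    lam = np.sort(np.linalg.eigvalsh(L))
    return lam[1:]                       # drop lambda_1 = 0

# Example: unweighted path graph on 4 nodes (w = 1).
L = np.array([[ 1., -1.,  0.,  0.],
              [-1.,  2., -1.,  0.],
              [ 0., -1.,  2., -1.],
              [ 0.,  0., -1.,  1.]])

lam = laplacian_spectrum(L)
print("sum            :", lam.sum())           # Tr L = 2 * total weight
print("largest, l_n   :", lam.max())           # spectral radius of L
print("alg. conn, l_2 :", lam.min())           # algebraic connectivity
print("sum of 1/l_i   :", (1.0 / lam).sum())   # ~ total effective resistance
```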


Convex optimization

    minimize f(x) subject to x ∈ X

- x ∈ R^n is the optimization variable
- f is a convex function (we can maximize a concave f by minimizing -f)
- X ⊆ R^n is a closed convex set
- roughly speaking, convex optimization problems are tractable, i.e., easy to solve numerically (tractability depends on how f and X are described)

Symmetry in convex optimization
- a permutation (matrix) π is a symmetry of the problem if f(πz) = f(z) for all z, and πz ∈ X for all z ∈ X
- if π is a symmetry and the convex optimization problem has a solution, it has a solution invariant under π (if x is a solution, so is the average over {x, πx, π²x, ...})

Duality in convex optimization
- primal: minimize f(x) subject to x ∈ X
- dual: maximize g(y) subject to y ∈ Y
- y is the dual variable; the dual objective g is concave; Y is closed and convex (various methods can be used to generate g and Y)
- p* (d*) is the optimal value of the primal (dual) problem
- weak duality: for any x ∈ X, y ∈ Y, f(x) ≥ g(y); hence p* ≥ g(y)
- strong duality: for convex problems, provided a constraint qualification holds, there exist x* ∈ X, y* ∈ Y with f(x*) = g(y*); hence x* is primal optimal, y* is dual optimal, and p* = d*

Semidefinite program (SDP)
- a particular type of convex optimization problem:

    minimize c^T x subject to Σ_{i=1}^n x_i A_i ⪯ B, Fx = g

- variable is x ∈ R^n; data are c, F, g and the symmetric matrices A_i, B
- ⪯ means inequality with respect to the positive semidefinite cone
- a generalization of the linear program (LP)

    minimize c^T x subject to Σ_{i=1}^n x_i a_i ⪯ b, Fx = g

  (here a_i, b are vectors; ⪯ means componentwise)

SDP dual
- primal SDP: minimize c^T x subject to Σ_{i=1}^n x_i A_i ⪯ B, Fx = g
- dual SDP:

    maximize -Tr(ZB) - ν^T g
    subject to Z ⪰ 0, (F^T ν + c)_i + Tr(ZA_i) = 0, i = 1, ..., n

- with (matrix) variable Z and (vector) variable ν

SDP algorithms and applications
- interior-point algorithms developed since the 1990s solve SDPs very effectively (polynomial time, and they work well in practice)
- many results for LP have been extended to SDP
- SDP is widely used in many fields (control, combinatorial optimization, machine learning, finance, signal processing, communications, networking, circuit design, mechanical engineering, ...)


The basic idea
- some interesting weight optimization problems have the common form

    minimize ψ(w) = φ(λ_2, ..., λ_n) subject to w ∈ W

  where φ is symmetric convex and W is closed convex
- these are convex optimization problems
- we can solve them numerically (up to our ability to store data, compute eigenvalues, ...)
- for some simple graphs, we can get analytic solutions
- the associated dual problems can be quite interesting


Distributed averaging
[figure: a 6-node example graph with node values x_1, ..., x_6 and edge weights W_ij]
- each node of a connected graph has an initial value x_i(0) ∈ R; the goal is to compute the average 1^T x(0)/n using a distributed iterative method
- applications: load balancing, distributed optimization, sensor networks

Distributed linear averaging
- simple linear iteration: replace each node value with a weighted average of its own and its neighbors' values; repeat:

    x_i(t+1) = W_ii x_i(t) + Σ_{j ∈ N_i} W_ij x_j(t) = x_i(t) - Σ_{j ∈ N_i} W_ij (x_i(t) - x_j(t)),

  where W_ii + Σ_{j ∈ N_i} W_ij = 1
- we'll assume W_ij = W_ji, i.e., the weights are symmetric
- the weights W_ij determine whether convergence to the average occurs, and if so, how fast
- classical result: convergence if the weights W_ij (i ≠ j) are small and positive

Convergence rate
- vector form: x(t+1) = Wx(t) (we take W_ij = 0 for i ≠ j, (i, j) ∉ E)
- W satisfies W = W^T, W1 = 1
- convergence:

    lim_{t→∞} W^t = (1/n)11^T  ⟺  ρ(W - (1/n)11^T) = ‖W - (1/n)11^T‖ < 1

  (ρ is the spectral radius; ‖·‖ is the spectral norm)
- the asymptotic convergence rate is given by ‖W - (1/n)11^T‖; the convergence time is τ = 1/log(1/‖W - (1/n)11^T‖)
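
A small NumPy helper (my addition) that evaluates this convergence measure and convergence time for any candidate weight matrix W; the 3-node path and the 0.3 edge weight are made up for illustration:

```python
import numpy as np

def convergence_time(W):
    """rho = ||W - (1/n)11^T|| and tau = 1/log(1/rho) for x(t+1) = W x(t)."""
    n = W.shape[0]
    E = W - np.ones((n, n)) / n
    rho = np.linalg.norm(E, 2)            # spectral norm
    tau = 1.0 / np.log(1.0 / rho) if rho < 1 else np.inf
    return rho, tau

# Example: constant weight 0.3 on the two edges of a 3-node path.
W = np.array([[0.7, 0.3, 0.0],
              [0.3, 0.4, 0.3],
              [0.0, 0.3, 0.7]])
print(convergence_time(W))               # rho = 0.7, tau ~ 2.8
```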

Connection to Laplacian eigenvalues
- identifying W_ij with w_l for l ~ (i, j), we have W = I - L
- the convergence rate is given by

    ‖W - (1/n)11^T‖ = ‖I - L - (1/n)11^T‖ = max{|1 - λ_2|, ..., |1 - λ_n|} = max{1 - λ_2, λ_n - 1}

- ... a convex spectral function, with φ(u) = max_i |1 - u_i|

Fastest distributed linear averaging

    minimize ‖W - (1/n)11^T‖
    subject to W ∈ S, W = W^T, W1 = 1

- optimization variable is W; problem data is the graph (the sparsity pattern S)
- in terms of Laplacian eigenvalues, with variable w ∈ R^m: minimize max{1 - λ_2, λ_n - 1}
- these are convex optimization problems, so we can efficiently find the weights that give the fastest possible averaging on a graph

Semidefinite programming formulation
- introduce a scalar variable s to bound the spectral norm:

    minimize s
    subject to -sI ⪯ I - L - (1/n)11^T ⪯ sI

  (for Z = Z^T, ‖Z‖ ≤ s ⟺ -sI ⪯ Z ⪯ sI)
- an SDP (hence it can be solved efficiently); see the sketch below
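
A hedged CVXPY sketch of this SDP (my addition; the talk gives no code, and the 5-node example graph is made up). CVXPY turns the pair of linear matrix inequalities into a standard-form SDP for us; the same L(w) construction is reused in the later sketches.

```python
import cvxpy as cp
import numpy as np

# Hypothetical example graph: 5 nodes, 6 edges.
n = 5
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (0, 2), (1, 3)]

w = cp.Variable(len(edges))              # one weight per edge
s = cp.Variable()                        # bound on the spectral norm

L = 0                                    # affine map w -> L(w)
for l, (i, j) in enumerate(edges):
    e = np.zeros((n, 1)); e[i], e[j] = 1.0, -1.0
    L = L + w[l] * (e @ e.T)             # per-edge Laplacian contribution

I, J = np.eye(n), np.ones((n, n)) / n

# minimize s subject to -sI <= I - L - (1/n)11^T <= sI (PSD order)
prob = cp.Problem(cp.Minimize(s),
                  [I - L - J << s * I,
                   I - L - J >> -s * I])
prob.solve()
print("optimal rho =", prob.value)
```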

Constant weight averaging
- a simple, traditional method: constant weight on all edges, w = α1, which yields the update

    x_i(t+1) = x_i(t) + Σ_{j ∈ N_i} α (x_j(t) - x_i(t))

- a simple choice: the max-degree weight α = 1/max_i d_i, where d_i is the degree (number of neighbors) of node i
- best constant weight: α* = 2/(λ_2 + λ_n) (λ_2, λ_n are eigenvalues of the unweighted Laplacian, i.e., with w = 1)
- for an edge-transitive graph, w_l = α* is optimal
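
These two constant-weight choices are cheap to compute; a small sketch (my addition; the 3-node path example is made up):

```python
import numpy as np

def constant_weights(L_unweighted, degrees):
    """Max-degree and best constant edge weights from the unweighted Laplacian."""
    lam = np.sort(np.linalg.eigvalsh(L_unweighted))
    alpha_md = 1.0 / degrees.max()            # max-degree weight
    alpha_bc = 2.0 / (lam[1] + lam[-1])       # best constant: 2/(lambda_2 + lambda_n)
    return alpha_md, alpha_bc

# Example: unweighted path on 3 nodes (degrees 1, 2, 1).
Lp = np.array([[1., -1., 0.], [-1., 2., -1.], [0., -1., 1.]])
print(constant_weights(Lp, np.array([1, 2, 1])))
```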

A small example

                           max-degree   best constant   optimal
  ρ = ‖W - (1/n)11^T‖        0.779         0.712         0.643
  τ = 1/log(1/ρ)             4.01          2.94          2.27

Optimal weights (note: some are negative!)
[figure: the example graph annotated with the optimal edge weights, several of them negative]
λ_i(W): -0.643, -0.643, -0.106, 0.000, 0.368, 0.643, 0.643, 1.000

Larger example
- 50 nodes, 200 edges

                           max-degree   best constant   optimal
  ρ = ‖W - (1/n)11^T‖        0.971         0.947         0.902
  τ = 1/log(1/ρ)             33.5          18.3          9.7

Eigenvalue distributions
[figure: histograms of the eigenvalues on [-1, 1] for the max-degree, best constant, and optimal weights]

Optimal weights
[figure: histogram of the optimal weights, spread over roughly (-0.5, 0.5)]
- 69 out of 250 weights are negative

Another example
- a cut grid with n = 64 nodes, m = 95 edges
[figure: the cut grid; edge width shows weight value (red for negative)]
- optimal convergence time τ = 85; max-degree τ = 137

Some questions & comments
- how much better are the optimal weights than the simple choices? for barbell graphs K_n-K_n, the optimal weights are unboundedly better than the max-degree, best constant, and several other simple weight choices
- what size problems can be handled (on a PC)? interior-point algorithms easily handle problems with 10^4 edges; subgradient-based methods handle problems with 10^6 edges; any symmetry can be exploited for efficiency gains
- what happens if we require the weights to be nonnegative? (we'll soon see)

Least-mean-square average consensus
- include random noise in the averaging process: x(t+1) = Wx(t) + v(t), where v(t) is i.i.d. with E v(t) = 0, E v(t)v(t)^T = I
- steady-state mean-square deviation:

    δ_ss = lim_{t→∞} E (1/n) Σ_{i<j} (x_i(t) - x_j(t))² = Σ_{i=2}^n 1/(λ_i(2 - λ_i)),

  valid when ρ = max{1 - λ_2, λ_n - 1} < 1
- ... another convex spectral function
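
Given symmetric averaging weights, δ_ss is a simple function of the Laplacian spectrum; a sketch (my addition; the 3-node example with edge weight 0.6 is made up) that evaluates it:

```python
import numpy as np

def steady_state_msd(L):
    """delta_ss = sum over lambda_2..lambda_n of 1/(lambda_i (2 - lambda_i))."""
    lam = np.sort(np.linalg.eigvalsh(L))[1:]       # lambda_2, ..., lambda_n
    rho = max(1.0 - lam.min(), lam.max() - 1.0)
    assert rho < 1, "the noise-free iteration must converge"
    return np.sum(1.0 / (lam * (2.0 - lam)))

# Example: 3-node path with both edge weights 0.6.
L = np.array([[0.6, -0.6, 0.0], [-0.6, 1.2, -0.6], [0.0, -0.6, 0.6]])
print(steady_state_msd(L))
```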


Random walk on a graph
- Markov chain on the nodes of G, with transition probabilities on the edges: P_ij = Prob(X(t+1) = j | X(t) = i)
- we'll focus on symmetric transition probability matrices P (everything extends to the reversible case, with fixed equilibrium distribution)
- identifying P_ij with w_l for l ~ (i, j), we have P = I - L
- the same as the linear averaging matrix W, but here W_ij ≥ 0 (i.e., w ⪰ 0, diag(L) ⪯ 1)

Mixing rate
- the probability distribution π_i(t) = Prob(X(t) = i) satisfies π(t+1)^T = π(t)^T P
- since P = P^T and P1 = 1, the uniform distribution π = (1/n)1 is stationary, i.e., ((1/n)1)^T P = ((1/n)1)^T
- π(t) → (1/n)1 for any π(0) iff µ = ‖P - (1/n)11^T‖ = ‖I - L - (1/n)11^T‖ < 1
- µ is called the second largest eigenvalue modulus (SLEM) of the Markov chain
- the SLEM determines the convergence (mixing) rate, e.g.,

    sup_{π(0)} ‖π(t) - (1/n)1‖_tv ≤ (√n/2) µ^t

- the associated mixing time is τ = 1/log(1/µ)

Fastest mixing Markov chain problem

    minimize µ = ‖I - L - (1/n)11^T‖ = max{1 - λ_2, λ_n - 1}
    subject to w ⪰ 0, diag(L) ⪯ 1

- optimization variable is w; problem data is the graph G
- the same as the fast linear averaging problem, with the additional nonnegativity constraint W_ij ≥ 0 on the weights
- a convex optimization problem (indeed, an SDP), hence efficiently solved
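
In code, the FMMC problem is the fastest-averaging sketch from before plus the two extra constraints; a hedged CVXPY sketch (my addition) on the same made-up 5-node graph:

```python
import cvxpy as cp
import numpy as np

n = 5
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (0, 2), (1, 3)]

w = cp.Variable(len(edges), nonneg=True)     # w >= 0
s = cp.Variable()

L = 0
for l, (i, j) in enumerate(edges):
    e = np.zeros((n, 1)); e[i], e[j] = 1.0, -1.0
    L = L + w[l] * (e @ e.T)

I, J = np.eye(n), np.ones((n, n)) / n

prob = cp.Problem(cp.Minimize(s),
                  [I - L - J << s * I,
                   I - L - J >> -s * I,
                   cp.diag(L) <= 1])          # keeps P = I - L entrywise nonnegative
prob.solve()
print("optimal SLEM mu =", prob.value)
```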

Two common suboptimal schemes
- max-degree chain: w = (1/max_i d_i) 1
- Metropolis-Hastings chain: w_l = 1/max{d_i, d_j} for l ~ (i, j) (comes from the Metropolis method of generating a reversible MC with uniform stationary distribution)
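
Both schemes are one assignment per edge; a NumPy sketch (my addition) that builds the two transition matrices for the symmetric case:

```python
import numpy as np

def suboptimal_chains(edges, n):
    """Max-degree and Metropolis-Hastings transition matrices (symmetric case)."""
    deg = np.zeros(n)
    for i, j in edges:
        deg[i] += 1; deg[j] += 1

    P_md, P_mh = np.zeros((n, n)), np.zeros((n, n))
    for i, j in edges:
        P_md[i, j] = P_md[j, i] = 1.0 / deg.max()            # max-degree chain
        P_mh[i, j] = P_mh[j, i] = 1.0 / max(deg[i], deg[j])  # Metropolis-Hastings
    for P in (P_md, P_mh):
        np.fill_diagonal(P, 1.0 - P.sum(axis=1))             # holding probabilities
    return P_md, P_mh
```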

Small example
- optimal transition probabilities (some are zero)
[figure: the example graph annotated with the optimal transition probabilities]

                   max-degree   M.-H.   optimal   (fastest averaging)
  SLEM µ             0.779      0.774    0.681        (0.643)
  mixing time τ      4.01       3.91     2.60         (2.27)

Larger example
- 50 nodes, 200 edges

                   max-degree   M.-H.   optimal   (fastest averaging)
  SLEM µ             0.971      0.949    0.915        (0.902)
  mixing time τ      33.5       19.1     11.3         (9.7)

Optimal transition probabilities
- 82 edges (out of 200) have zero transition probability
[figure: histogram of the positive transition probabilities, spread over (0, 1]]

Subgraph with positive transition probabilities
[figure: the subgraph consisting of the edges with positive optimal transition probability]

Another example
- a cut grid with n = 64 nodes, m = 95 edges
[figure: the cut grid; edge width shows weight value (dotted for zero)]
- optimal mixing time τ = 89; Metropolis-Hastings mixing time τ = 120

Some analytical results
- for a path, the fastest mixing MC is the obvious one (P_{i,i+1} = 1/2)
- for any edge-transitive graph (hypercube, ring, ...), all edge weights are equal, with value 2/(λ_2 + λ_n) (unweighted Laplacian eigenvalues)

Commute time
- random walk on the graph: P_ij proportional to w_l for l ~ (i, j); P_ii = 0
- P is not symmetric, but the MC is reversible; we can normalize w so that 1^T w = 1
- commute time C_ij: the time for the random walk to return to i after visiting j
- expected commute time, averaged over all pairs of nodes:

    C̄ = (1/n²) Σ_{i,j=1}^n E C_ij = (2/(n-1)) Σ_{i=2}^n 1/λ_i

  (Chandra et al., 1989); this is proportional to the total effective resistance of the graph
- ... another convex spectral function

Minimizing average commute time
- find weights that minimize the average commute time on the graph:

    minimize C̄ = (2/(n-1)) Σ_{i=2}^n 1/λ_i
    subject to w ⪰ 0, 1^T w = 1

- another convex problem of our general form
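
One way to express Σ_{i=2}^n 1/λ_i in CVXPY (my addition, not from the talk) uses matrix_frac: Tr((L + (1/n)11^T)⁻¹) = Σ_{i=2}^n 1/λ_i + 1, since adding (1/n)11^T only moves the zero eigenvalue (in the 1 direction) to 1. A sketch on the same made-up graph:

```python
import cvxpy as cp
import numpy as np

n = 5
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (0, 2), (1, 3)]
w = cp.Variable(len(edges), nonneg=True)

L = 0
for l, (i, j) in enumerate(edges):
    e = np.zeros((n, 1)); e[i], e[j] = 1.0, -1.0
    L = L + w[l] * (e @ e.T)

# matrix_frac(I, P) = Tr(P^{-1}), convex in the affine argument P = L + 11^T/n.
obj = cp.matrix_frac(np.eye(n), L + np.ones((n, n)) / n)
prob = cp.Problem(cp.Minimize(obj), [cp.sum(w) == 1])
prob.solve()
print("sum of 1/lambda_i =", prob.value - 1)
```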


Markov process on a graph
- (continuous-time) Markov process on the nodes of G, with transition rate w_l ≥ 0 between nodes i and j, for l ~ (i, j)
- the probability distribution π(t) ∈ R^n satisfies the heat equation

    π̇(t) = -Lπ(t),  so  π(t) = e^{-tL} π(0)

- π(t) converges to the uniform distribution (1/n)1, for any π(0), iff λ_2 > 0
- (asymptotic) convergence is as e^{-λ_2 t}; λ_2 gives the mixing rate of the process
- λ_2 is a concave, homogeneous function of w (it comes from the symmetric concave function φ(u) = min_i u_i)

Fastest mixing Markov process on a graph

    maximize λ_2
    subject to Σ_l d_l² w_l ≤ 1, w ⪰ 0

- variable is w ∈ R^m; data are the graph and the normalization constants d_l > 0
- a convex optimization problem, hence easily solved
- interpretation: allocate rate across the edges so as to maximize the mixing rate
- the constraint is always tight at the solution, i.e., Σ_l d_l² w_l = 1
- when d_l² = 1/m, the optimal value is called the absolute algebraic connectivity
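
Since L1 = 0 for every w, the constraint λ_2 ≥ s is exactly the LMI L ⪰ s(I - (1/n)11^T), which gives a direct CVXPY sketch (my addition; graph and d_l are made up, with d_l² = 1/m so the optimal value is the absolute algebraic connectivity):

```python
import cvxpy as cp
import numpy as np

n = 5
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (0, 2), (1, 3)]
m = len(edges)
d2 = np.full(m, 1.0 / m)                 # d_l^2 = 1/m (assumed normalization)

w = cp.Variable(m, nonneg=True)
s = cp.Variable()

L = 0
for l, (i, j) in enumerate(edges):
    e = np.zeros((n, 1)); e[i], e[j] = 1.0, -1.0
    L = L + w[l] * (e @ e.T)

I, J = np.eye(n), np.ones((n, n)) / n

# lambda_2(L) >= s  <=>  L - s(I - (1/n)11^T) is PSD (L is always 0 on 1)
prob = cp.Problem(cp.Maximize(s),
                  [L >> s * (I - J), d2 @ w <= 1])
prob.solve()
print("absolute algebraic connectivity ~", prob.value)
```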

Interpretation: grounded unit-capacitor RC circuit
- the charge vector q(t) satisfies q̇(t) = -Lq(t), with edge weights given by the conductances, w_l = g_l
- charge equilibrates (i.e., converges to uniform) at a rate determined by λ_2
- with conductor resistivity ρ, length d_l, and cross-sectional area a_l, we have g_l = a_l/(ρ d_l)

- total conductor volume is Σ_l d_l a_l = ρ Σ_l d_l² w_l
- the problem is to choose the conductor cross-sectional areas, subject to a total volume constraint, so as to make the circuit equilibrate charge as fast as possible
- optimal λ_2 is 0.105; uniform allocation of conductance gives λ_2 = 0.073

SDP formulation and dual
- alternate formulation:

    minimize Σ_l d_l² w_l subject to λ_2 ≥ 1, w ⪰ 0

- SDP formulation:

    minimize Σ_l d_l² w_l subject to L ⪰ I - (1/n)11^T, w ⪰ 0

- dual problem:

    maximize Tr X
    subject to X_ii + X_jj - X_ij - X_ji ≤ d_l² for l ~ (i, j), 1^T X1 = 0, X ⪰ 0

  with variable X ∈ R^{n×n}

A maximum variance unfolding problem
- use variables x_1, ..., x_n ∈ R^n, with X the Gram matrix X = [x_1 ··· x_n]^T [x_1 ··· x_n]
- the dual problem becomes the maximum variance unfolding (MVU) problem:

    maximize Σ_i ‖x_i‖²
    subject to ‖x_i - x_j‖ ≤ d_l for l ~ (i, j), Σ_i x_i = 0

- position n points in R^n so as to maximize variance, while respecting the local distance constraints

- similar to semidefinite embedding for unsupervised learning of manifolds (Weinberger & Saul, 2004)
- surprise: duality between the fastest mixing Markov process and maximum variance unfolding

Conclusions
- some interesting weight optimization problems have the common form

    minimize ψ(w) = φ(λ_2, ..., λ_n) subject to w ∈ W

  where φ is symmetric convex and W is closed convex
- these are convex optimization problems; we can solve them numerically (up to our ability to store data, compute eigenvalues, ...)
- for some simple graphs, we can get analytic solutions
- the associated dual problems can be quite interesting

Some references
- Convex Optimization, Cambridge University Press, 2004
- Fast linear iterations for distributed averaging, Systems and Control Letters 53:65-78, 2004
- Fastest mixing Markov chain on a graph, SIAM Review 46(4):667-689, 2004
- Fastest mixing Markov process on a graph and a connection to a maximum variance unfolding problem, to appear, SIAM Review 48(4), 2006
- Minimizing effective resistance of a graph, to appear, SIAM Review, 2007
- Distributed average consensus with least-mean-square deviation, MTNS, pp. 2768-2776, 2006

these and other papers available at www.stanford.edu/~boyd