
Lecture 18 and 19, Spring 2013 - EE 194, Advanced Control (Prof. Khan)
Mar. 27 (Wed.) and Apr. 01 (Mon.), 2013

I. GRAPH THEORY

A graph, G, is defined to be a collection of two sets: (i) a vertex-set, V = {1, ..., N}, that is a collection of nodes (vertices); and (ii) an edge-set, E ⊆ V × V, that is a collection of edges. The edge-set, E, is defined as a set of ordered pairs (i, j) with i, j ∈ V such that j is connected to i, to be interpreted as "j can send information to i." Formally,

    E = {(i, j) | j → i},    (1)

and a graph is denoted by G = (V, E).

A graph is said to be undirected if (i, j) ∈ E ⟺ (j, i) ∈ E for all i and j. A graph that does not satisfy this property is called a directed graph or a digraph. Unless otherwise stated, we deal explicitly with undirected graphs in the following.

The neighborhood of a node i is defined as

    N_i = {j | (i, j) ∈ E}.    (2)

The degree of a node i is defined as the number of nodes that can send information to node i, i.e., |N_i|. For directed graphs, there are two different notions of degree: in-degree and out-degree.

A. Graph theory and linear algebra

Analysis of graphs is typically carried out via matrix theory. For this purpose, we define matrices that describe a graph (as opposed to the set notation above). The adjacency matrix, A = {a_ij}, of a graph is defined as

    a_ij = 1 if j → i,  and  a_ij = 0 otherwise.    (3)

Sometimes it is assumed that (i, i) ∈ E. With this assumption, the adjacency matrix has all 1's on the main diagonal.

Remark 1. The adjacency matrix of an undirected graph is symmetric.
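
As a concrete illustration of definitions (1)-(3), the following is a minimal sketch that builds the adjacency matrix, neighborhoods, and degrees of a small undirected graph with numpy; the edge list is a made-up example, not from the notes.

```python
import numpy as np

N = 4
# Undirected edges {i, j}: store both directions since (i, j) in E <=> (j, i) in E.
edges = [(1, 2), (2, 3), (3, 4), (4, 1), (1, 3)]

A = np.zeros((N, N))
for i, j in edges:
    A[i - 1, j - 1] = 1   # j -> i
    A[j - 1, i - 1] = 1   # i -> j (undirected)

# Neighborhood N_i and degree |N_i| of each node.
for i in range(N):
    neighbors = (np.flatnonzero(A[i]) + 1).tolist()
    print(f"N_{i + 1} = {neighbors}, degree = {len(neighbors)}")

# Remark 1: the adjacency matrix of an undirected graph is symmetric.
assert np.allclose(A, A.T)
```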

The incidence matrix, C = {c_ij}, of a graph is defined as an N × M matrix (where M is the total number of edges) such that for the mth edge (i, j) ∈ E, the mth column of C has a 1 at the ith location, a −1 at the jth location, and zeros everywhere else. The degree matrix, D, is defined as a diagonal matrix that has |N_i| as the ith element on the main diagonal.

The following definitions of a graph Laplacian, L = {l_ij}, are equivalent:

    (i)   L = D − A.    (4)

    (ii)  l_ij = |N_i| if j = i;  l_ij = −1 if i ≠ j and j → i;  l_ij = 0 otherwise.    (5)

    (iii) L = CC^T.    (6)

Remark 2. The Laplacian, L, is symmetric and positive-semidefinite.
Proof: Obvious from definition (iii).

The eigenvalues of L are denoted by λ_1, λ_2, ..., λ_N; the following convention is typically employed:

    0 = λ_1 ≤ λ_2 ≤ ... ≤ λ_N.

Remark 3. The Laplacian, L, is singular (rank-deficient), i.e., it has at least one 0 eigenvalue.
Proof: Each row-sum is 0.

A path of length K between node i_1 ∈ V and node i_{K+1} ∈ V is defined as a sequence of edges (i_1, i_2), (i_2, i_3), ..., (i_K, i_{K+1}) in E with distinct intermediate nodes i_2, ..., i_K. An undirected graph is said to be connected if there exists a path from each i ∈ V to each j ∈ V. A graph is said to be complete or all-to-all if (i, j) ∈ E for all i and j. If a graph is not connected, then it can be partitioned into connected components.
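
A quick numerical check of the three equivalent Laplacian definitions, continuing the 4-node example above (a sketch; the orientation chosen for each edge column of C is arbitrary and does not affect CC^T).

```python
import numpy as np

edges = [(1, 2), (2, 3), (3, 4), (4, 1), (1, 3)]
N, M = 4, len(edges)

A = np.zeros((N, N))
for i, j in edges:
    A[i - 1, j - 1] = A[j - 1, i - 1] = 1

D = np.diag(A.sum(axis=1))          # degree matrix
L1 = D - A                          # definition (i)

C = np.zeros((N, M))                # incidence matrix: +1 at i, -1 at j for the mth edge
for m, (i, j) in enumerate(edges):
    C[i - 1, m], C[j - 1, m] = 1, -1
L2 = C @ C.T                        # definition (iii)

assert np.allclose(L1, L2)
assert np.allclose(L1.sum(axis=1), 0)            # Remark 3: row sums are zero => singular
assert np.all(np.linalg.eigvalsh(L1) >= -1e-12)  # Remark 2: positive-semidefinite
```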

II. WELL-KNOWN RESULTS

A diagonally-dominant matrix, A, is such that |a_ii| ≥ Σ_{j≠i} |a_ij| for all i. A strictly diagonally-dominant matrix, A, is such that |a_ii| > Σ_{j≠i} |a_ij| for all i.

Lemma 1 (Gershgorin circle theorem). Let A = {a_ij} ∈ C^{N×N}. Let D_i be the closed disc centered at a_ii with radius Σ_{j≠i} |a_ij|. Then every eigenvalue of A lies in ∪_i D_i.

Corollary 1. A symmetric diagonally-dominant matrix with non-negative diagonal elements is PSD.
Proof: Follows from the Gershgorin circle theorem.

Corollary 2. A Laplacian matrix is PSD.
Proof: Laplacian matrices are symmetric and diagonally-dominant with non-negative elements on the main diagonal.

Lemma 2. Let G be connected and let λ_1 ≤ λ_2 ≤ ... ≤ λ_N be the Laplacian eigenvalues. Then λ_2 > 0.
Proof: Let u = [u_1, u_2, ..., u_N]^T be an eigenvector of L with eigenvalue 0. Since Lu = 0 and u^T L u = u^T CC^T u = ||C^T u||^2, we have C^T u = 0. Now,

    C^T u = 0  ⟹  u_i − u_j = 0, ∀ (i, j) ∈ E.    (7)

This implies that u_i = u_j for all (i, j) ∈ E. As the graph is connected, we have u_i = u_j for all i, j ∈ V, and the only normalized eigenvector that satisfies Lu = 0 is

    u = (1/√N) [1, 1, ..., 1]^T    (N elements).    (8)

Hence, there is only one 0 eigenvalue, and λ_2 > 0 since L is PSD.

Lemma 3. The number of connected components of a graph equals the multiplicity of the 0 eigenvalue of its Laplacian.
Proof: A disconnected graph is a union of some number of connected components. Each such component is a connected graph on its own and contributes exactly one 0 eigenvalue.
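
Lemma 2 is easy to see numerically: for a small connected graph, only the smallest Laplacian eigenvalue is zero, and its eigenvector is proportional to the all-ones vector. A sketch with a made-up 4-node connected graph:

```python
import numpy as np

A = np.array([[0, 1, 1, 1],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [1, 0, 1, 0]], dtype=float)   # connected undirected graph
L = np.diag(A.sum(axis=1)) - A

lam, U = np.linalg.eigh(L)      # eigenvalues in ascending order
print(np.round(lam, 10))        # lam[0] ~ 0 and lam[1] > 0 since the graph is connected

u0 = U[:, 0]
print(u0 / u0[0])               # proportional to [1, 1, ..., 1]^T, i.e., u = (1/sqrt(N)) 1
```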

Example 1. Consider a network with N nodes and no edges. There are N connected components (each node is its own component). From the above lemma, the Laplacian should have N zero eigenvalues. This can be verified, as the Laplacian in this case is the 0 matrix.

An irreducible matrix is such that it cannot be transformed into a block upper-triangular matrix by any simultaneous row-column permutation. A block upper-triangular matrix is one that can be decomposed into the form

    [ X  Y ]
    [ 0  Z ].

Remark 4. A matrix is irreducible if and only if its associated graph is strongly-connected. A symmetric matrix is irreducible if and only if its associated graph is connected.

A primitive matrix is such that it is non-negative, square, and its pth integer power, for some p > 0, has all positive elements.

Remark 5. A primitive matrix is irreducible.
Proof: Exercise.

Remark 6. An irreducible matrix is not necessarily primitive unless it has a strictly positive diagonal.
Proof: Exercise.

The following statements can be proved (see the numerical sketch below):
(i) A graph is connected if and only if its Laplacian is irreducible.
(ii) For a complete graph, λ_2 = ... = λ_N = N.

The algebraic connectivity of a graph is defined as the second-smallest eigenvalue of its Laplacian, i.e., λ_2. For connected graphs, this measures the strength of connectivity.

Remark 7. In a connected graph, adding an edge does not decrease λ_2.
Proof: Exercise.
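
The following sketch checks Lemma 3 and statement (ii) above: a graph made of two disconnected edges has two zero Laplacian eigenvalues, and a complete graph on N nodes has λ_2 = ... = λ_N = N. The graphs are made-up examples.

```python
import numpy as np

def laplacian(A):
    return np.diag(A.sum(axis=1)) - A

# Two connected components: edge {1, 2} and edge {3, 4}.
A_disc = np.array([[0, 1, 0, 0],
                   [1, 0, 0, 0],
                   [0, 0, 0, 1],
                   [0, 0, 1, 0]], dtype=float)
lam = np.linalg.eigvalsh(laplacian(A_disc))
print("zero eigenvalues:", np.sum(np.abs(lam) < 1e-10))   # 2 = number of connected components

# Complete graph on N nodes: eigenvalues are 0 and N (with multiplicity N - 1).
N = 5
A_complete = np.ones((N, N)) - np.eye(N)
print(np.round(np.linalg.eigvalsh(laplacian(A_complete)), 10))   # [0, 5, 5, 5, 5]
```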

A. Types of graphs

A k-regular graph is such that each node is connected to exactly k other nodes. A nearest-neighbor graph is such that each node is connected to all of the nodes within a certain communication radius. An m-circulant graph is such that each node is connected to its m forward and m backward neighbors.

Remark 8. The adjacency and Laplacian matrices of a circulant graph are circulant matrices. The eigenvalues and eigenvectors of a circulant matrix are known in closed form; the eigenvectors of a circulant matrix, for instance, are given by the well-known DFT (Vandermonde) matrix.

The above graphs are referred to as structured graphs. Typically, such graphs are highly clustered (clustering measures how many of a node's neighbors are neighbors of each other), but they have a large mean shortest path. This can also be related to the graph diameter (the largest shortest path).

A random graph with 0 ≤ p ≤ 1 is such that every two nodes, i and j, are connected with probability p. Random graphs have a smaller average shortest path but suffer from weak clustering.

Example 2. The above is one of the Erdős-Rényi graph generation models. An alternative is to randomly pick (with uniform probability) one graph out of all possible graphs with N nodes and K edges.

Consider the following graph generation: take a structured graph and a number 0 ≤ p ≤ 1. For each edge in the graph, rewire it with probability p to a node chosen uniformly at random. This is the Watts-Strogatz model: when p is small and the starting graph is circulant, the resulting graph is shown to exhibit the small-world principle, i.e., a small average shortest path and large clustering.

Example 3. Transportation networks, the electric power grid, networks of brain neurons, social networks, six degrees of separation, the email-forwarding experiment, the author-collaboration network, and the famous Erdős number. My Erdős number is 5 via three paths (maybe 4, but I cannot prove it); the mean is 4.6: Khan, U. → Moura, J. → Püschel, M. → Beth, T. → Mullin, R. → Erdős, P.
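
Remark 8 can be verified directly: the Laplacian of an m-circulant graph is a circulant matrix, so its eigenvalues are given by the DFT of its first row. A minimal sketch (the graph size and m are made up for the example):

```python
import numpy as np

N, m = 10, 2                       # 2-circulant graph on 10 nodes
first_row = np.zeros(N)
for k in range(1, m + 1):          # connect to m forward and m backward neighbors
    first_row[k] = first_row[-k] = 1

A = np.array([np.roll(first_row, i) for i in range(N)])   # circulant adjacency matrix
L = np.diag(A.sum(axis=1)) - A                             # also circulant

eig_direct = np.sort(np.linalg.eigvalsh(L))
eig_dft = np.sort(np.fft.fft(L[0]).real)   # DFT of the first row gives the eigenvalues
print(np.allclose(eig_direct, eig_dft))    # True
```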

Lec 19: Monday, Apr. 01, 2013

III. MORE ON MATRICES

Given an N × N matrix, A = {a_ij}, its associated graph is defined as G_A = (V_A, E_A) such that V_A = {1, ..., N} and E_A = {(i, j) | a_ij ≠ 0}. The notions of a graph and a matrix are thus related in both directions:

    graph → adjacency matrix;    matrix → associated graph.

The notions of irreducibility and strong-connectivity are also related:

    strongly-connected graph → irreducible adjacency matrix;    irreducible matrix → strongly-connected associated graph.

The following are equivalent:
(i) The matrix cannot be arranged into a block upper-triangular matrix by arbitrary row-column permutations (i.e., it is irreducible).
(ii) The associated graph of the matrix is strongly-connected.
In addition, (iii) if a matrix, A, is irreducible, then each of its columns and each of its rows has at least one non-zero element.

A primitive matrix is such that it is non-negative, square, and its pth integer power, for some p > 0, has all positive elements. A non-negative matrix, A = {a_ij} ∈ R^{N×N}, is such that all of its elements are non-negative, i.e., a_ij ≥ 0 for all i, j. We denote this by A ≥ 0 or A ∈ R^{N×N}_{≥0}. Furthermore, A ≥ B ⟺ A − B ≥ 0, where B ∈ R^{N×N}_{≥0}.

A. Examples

Example 4. The matrix

    A = [ 0 1 0
          0 0 1
          1 0 0 ]

is irreducible: its associated graph, the directed cycle 1 → 2 → 3 → 1, is strongly-connected. However, this matrix is not primitive (see the sketch below).

Example 5. A non-negative, square, irreducible matrix with all positive diagonal elements is primitive.
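
A small numerical check of Examples 4 and 5 (a sketch): the 3-cycle adjacency matrix is irreducible, but its powers cycle through permutation matrices and never become entrywise positive, so it is not primitive; adding a strictly positive diagonal makes it primitive.

```python
import numpy as np

A = np.array([[0, 1, 0],
              [0, 0, 1],
              [1, 0, 0]], dtype=float)
N = A.shape[0]

# Irreducible: (I + A)^(N-1) > 0 (see Lemma 5 below).
print(np.all(np.linalg.matrix_power(np.eye(N) + A, N - 1) > 0))          # True

# Not primitive: no power of A is entrywise positive (powers cycle with period 3).
print([bool(np.all(np.linalg.matrix_power(A, p) > 0)) for p in range(1, 7)])  # all False

# With a strictly positive diagonal it becomes primitive (Example 5).
B = A + np.eye(N)
print(np.all(np.linalg.matrix_power(B, 3) > 0))                           # True
```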

B. Results

Lemma 4. Let A ∈ R^{N×N}_{≥0} be irreducible and let x ≥ 0. If Ax = 0, then x = 0.

Proof: From (iii) above, each column of A has at least one non-zero element, so there exists b > 0 such that the jth column-sum satisfies Σ_i a_ij ≥ b > 0 for all j. Assume on the contrary that x ≠ 0 and Ax = 0. Then,

    0 = Σ_i (Ax)_i = Σ_i Σ_j a_ij x_j = Σ_j (Σ_i a_ij) x_j ≥ b Σ_j x_j.

Since b > 0, we have Σ_j x_j ≤ 0, and since x ≥ 0, we must have x = 0, which is a contradiction.

Lemma 5. A non-negative matrix, A ∈ R^{N×N}, is irreducible if and only if (I + A)^{N−1} > 0.

Proof (sketch): A is non-negative, so (I + A) is non-negative with strictly positive diagonal. An irreducible non-negative matrix with all positive diagonal elements is primitive with index of primitivity of N − 1.
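
Lemma 5 gives a simple computational test for irreducibility. The sketch below applies it to a reducible (block upper-triangular) matrix and to an irreducible one; both matrices are made-up examples.

```python
import numpy as np

def is_irreducible(A):
    """Test (I + A)^(N-1) > 0 for a non-negative square matrix A (Lemma 5)."""
    N = A.shape[0]
    return bool(np.all(np.linalg.matrix_power(np.eye(N) + A, N - 1) > 0))

# Reducible: block upper-triangular; node 3 cannot reach nodes 1 and 2.
A_red = np.array([[0, 1, 1],
                  [1, 0, 1],
                  [0, 0, 0]], dtype=float)

# Irreducible: a directed 3-cycle.
A_irr = np.array([[0, 1, 0],
                  [0, 0, 1],
                  [1, 0, 0]], dtype=float)

print(is_irreducible(A_red))   # False
print(is_irreducible(A_irr))   # True
```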

C. Topology

Theorem 1 (Brouwer Fixed Point). Let B^n be the closed unit-disk in R^n, i.e., B^n = {x ∈ R^n | x_1^2 + ... + x_n^2 ≤ 1}. Every continuous function, f : B^n → B^n, has at least one fixed point, i.e., there exists x ∈ B^n such that f(x) = x.

(A remarkable result from topology: equivalently, every map that encloses your current location has a "You are here" point.)

The closed unit-disk in R is the line segment [−1, 1]. The closed unit-disk in R^2 is the disc of unit radius centered at (0, 0).

Corollary 3. Let S be the unit simplex:

    S = {x ∈ R^n | x ≥ 0 and Σ_i x_i = 1}.

If f : S → S is a continuous function, then there exists a w ∈ S such that f(w) = w.

Proof: Because the properties involved (continuity, being a fixed point) are invariant under homeomorphisms (topological equivalence), the fixed-point theorem holds for every set that is homeomorphic to a closed ball. In the language of topology, a coffee cup = a donut.

Example 6. Every closed interval, [a, b] ⊂ R, is homeomorphic to the closed unit-disk in R. Let f : [a, b] → [a, b] be any continuous function; by the Brouwer fixed-point theorem, its graph must intersect the straight line f(x) = x.

[Figure: the graph of a continuous f : [a, b] → [a, b] crossing the line f(x) = x at a fixed point.]
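
A minimal numerical illustration of Example 6 (a sketch with a made-up continuous map): for a continuous f : [a, b] → [a, b], the function g(x) = f(x) − x changes sign on [a, b], so a fixed point can be located by bisection.

```python
import numpy as np

a, b = 0.0, 1.0
f = lambda x: 0.5 * np.cos(x) + 0.25     # continuous, maps [0, 1] into [0, 1]
g = lambda x: f(x) - x                   # g(a) >= 0 and g(b) <= 0

lo, hi = a, b
for _ in range(60):                      # bisection keeps a sign change in [lo, hi]
    mid = 0.5 * (lo + hi)
    if g(lo) * g(mid) <= 0:
        hi = mid
    else:
        lo = mid

x_star = 0.5 * (lo + hi)
print(x_star, f(x_star))                 # f(x_star) == x_star up to numerical precision
```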

[Figure: Tufts University Medford/Somerville campus map with building directory.]

D. Vector and matrix norms

The max-norm, ||x||_∞, of a vector, x, is defined as its maximum absolute value, i.e., ||x||_∞ = max_i |x_i|. Given a vector w > 0, the weighted max-norm, ||x||_∞^w, of a vector, x, is defined as max_i |x_i| / w_i. The Euclidean norm or 2-norm of a vector, x, is defined as ||x||_2 = sqrt(x_1^2 + ... + x_n^2).

Example 7. Notice the difference between the max-norm and the 2-norm: in R^2, the set ||x||_∞ < α is a square with side 2α, whereas ||x||_2 < α is a disc of radius α centered at (0, 0).

The p-norm of a vector, x, is defined as ||x||_p = (|x_1|^p + ... + |x_n|^p)^{1/p}.

Let A ∈ R^{m×n}. Given vector norms, ||·||, on R^n and R^m, we can define an induced matrix norm as follows:

    ||A|| = max { ||Ax|| : x ∈ R^n and ||x|| = 1 }
          = max { ||Ax|| / ||x|| : x ∈ R^n and x ≠ 0 }.

Example 8. Given the weighted max-norm and A ∈ R^{N×N}, the induced matrix norm is

    ||A||_∞^w = max_{x ≠ 0} ||Ax||_∞^w / ||x||_∞^w.

The Frobenius norm of a matrix is defined as

    ||A||_F = sqrt( Σ_i Σ_j |a_ij|^2 ) = sqrt( trace(AA^T) ).

The spectral radius, ρ(A), of a matrix, A, is defined as max_i |λ_i|, where the λ_i are the eigenvalues of A. The spectral radius can also be given by Gelfand's formula:

    ρ(A) = lim_{k→∞} ||A^k||^{1/k},

where ||·|| is a consistent matrix norm. (All induced norms are consistent.)

Lemma 6. Any induced norm, ||·||, satisfies ρ(A) ≤ ||A||.
Proof: Can be proved via Gelfand's formula.

Lemma 7. ρ(A) ≤ ||A||_F.
Proof: ρ(A) ≤ ||A||_2 ≤ ||A||_F.
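
The norm inequalities above are easy to sanity-check numerically. A sketch comparing ρ(A) with the induced 2-norm, the induced max-norm, and the Frobenius norm of a random matrix, and illustrating Gelfand's formula:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 5))

rho = np.max(np.abs(np.linalg.eigvals(A)))        # spectral radius
norm2 = np.linalg.norm(A, 2)                      # induced 2-norm (largest singular value)
norm_inf = np.linalg.norm(A, np.inf)              # induced max-norm (max absolute row sum)
norm_fro = np.linalg.norm(A, 'fro')               # Frobenius norm

print(rho <= norm2, rho <= norm_inf, rho <= norm_fro)   # Lemmas 6 and 7: all True

# Gelfand's formula: ||A^k||^(1/k) -> rho(A) as k -> infinity.
for k in [1, 10, 50, 200]:
    print(k, np.linalg.norm(np.linalg.matrix_power(A, k), 2) ** (1.0 / k))
```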

IV. PERRON-FROBENIUS

Theorem 2 (Perron-Frobenius). Let A be an N × N non-negative matrix with eigenvalues, λ_i, ordered as |λ_1| ≤ ... ≤ |λ_N|. (Arguably, this is the most important theorem in distributed algorithms.) If A is irreducible, then:
(a) There exists w > 0 such that Aw = ρ(A)w.
(b) The eigenvector w is unique up to scalar multiplication.

Proof: The case N = 1 is trivial, so it is assumed that N ≥ 2.

(a) This is an existence statement, so we first construct the element. Consider the following set:

    S = {x ∈ R^N | x ≥ 0 and Σ_i x_i = 1}.

It can be shown that Ax ≠ 0 for any x ∈ S.[1] Define a function[2]

    f : S → S,    f(x) = Ax / (1^T Ax).

From the Brouwer fixed-point theorem, there exists some w ∈ S such that f(w) = w. This can be written as

    f(w) = Aw / (1^T Aw) = w    ⟹    Aw = (1^T Aw) w,

i.e., w is an eigenvector of A with eigenvalue λ ≜ 1^T Aw > 0. Now,

    (I + A)w = (1 + λ)w    ⟹    (I + A)^{N−1} w = (1 + λ)^{N−1} w.

Since A ≥ 0 is irreducible, (I + A) is non-negative and irreducible, and (I + A)^{N−1} > 0, so

    (I + A)^{N−1} w > 0    ⟹    (1 + λ)^{N−1} w > 0    ⟹    w > 0,

since w ≥ 0. Now we show that ρ(A) = λ. Firstly, λ ≤ ρ(A) by definition. On the other hand, using the weighted max-norm with weight vector w,

    ρ(A) ≤ ||A||_∞^w = ||Aw||_∞^w (Exercise) = ||λw||_∞^w = λ ||w||_∞^w = λ.

We conclude that λ ≤ ρ(A) ≤ λ, so ρ(A) = λ.

(b) Exercise: prove uniqueness.

[1] Suppose the elements of x sum to 1 but Ax = 0; then A is not irreducible, which is a contradiction. Furthermore, if Ax = 0, then x = 0 (Lemma 4), but note that x = 0 ∉ S.
[2] Since Ax ≠ 0, the denominator 1^T Ax is never zero and f is well-defined.
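
The map f(x) = Ax / (1^T Ax) used in the proof can also be iterated numerically; for a primitive matrix this normalized power iteration converges to the Perron vector w > 0 with Aw = ρ(A)w. A sketch with a made-up non-negative irreducible matrix:

```python
import numpy as np

A = np.array([[1, 2, 0],
              [0, 1, 3],
              [2, 0, 1]], dtype=float)   # non-negative, irreducible, positive diagonal => primitive

x = np.ones(3) / 3                        # start in the unit simplex S
for _ in range(200):
    x = A @ x / np.sum(A @ x)             # f(x) = Ax / (1^T A x), maps S into S

lam = np.sum(A @ x)                       # at the fixed point, Aw = (1^T A w) w
print("w =", x)                           # strictly positive Perron vector
print("lambda =", lam, " rho(A) =", np.max(np.abs(np.linalg.eigvals(A))))
```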

Lec 4: Wednesday, Feb. 01, 2012

Remarks: Recap of Perron-Frobenius.

- The largest eigenvalue of a non-negative, irreducible matrix is positive and real, i.e., λ_N ∈ R_{>0}.
- The eigenvector corresponding to λ_N of a non-negative, irreducible matrix is strictly positive and is unique up to scalar multiplication.
- For non-negative irreducible matrices, λ_N > |λ_{N−1}| is not necessarily true. See the next comment.
- A matrix that is non-negative and irreducible but not primitive can have |λ_{N−1}| = λ_N. An example is

      A = [ 0 1 0
            0 0 1
            1 0 0 ].

  The eigenvalues are λ_{1,2} = −0.5 ± j0.866 and λ_3 = 1; note |λ_{1,2}| = 1. However, the eigenvector corresponding to λ_3 = 1 only has to be strictly positive, whereas for the other eigenvalues with |λ| = 1, the eigenvectors may not be strictly positive.
- Perron-Frobenius for primitive matrices: the theorem's statement plus λ_N > |λ_i| for all i ≠ N.

A. Eigenspace

Let A ∈ R^{n×n}. Any v that satisfies Av = λv is called a right eigenvector of A. Similarly, any w that satisfies w^T A = λ w^T is called a left eigenvector of A. By definition, the left eigenvectors of A are the right eigenvectors of A^T; this can be seen from

    w^T A = λ w^T    ⟺    A^T w = λ w.

We call the collection {v, λ} the eigenspace of A and the collection {w, λ} the eigenspace of A^T. For a symmetric matrix, A = A^T, the left eigenvectors are the same as the right eigenvectors, and thus A and A^T have the same eigenspace.

When we decompose a matrix as A = V D V^{−1}, the matrix V consists of the right eigenvectors of A and the matrix V^{−1} contains the left eigenvectors of A (as the rows of V^{−1}). This can be shown as follows:

    A = V D V^{−1}    ⟹    A^T = V^{−T} D V^T ≜ W D W^{−1}.

Since A^T = W D W^{−1}, each column of W is a right eigenvector of A^T. Since W = (V^{−1})^T, each column of W is a row of V^{−1}. (A numerical check follows below.)

A normal matrix is one that can be diagonalized by a unitary matrix V, i.e., A = V D V^{−1} with V V^T = I (V unitary and real). A symmetric matrix is a normal matrix. Do A and A^T have the same eigenspace? Not unless A is normal, i.e., AA^T = A^T A. As we have shown above, the relationship between the left and right eigenvectors is given by W = (V^{−1})^T. If A is normal, then V^{−1} = V^T and W = V.
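
The relation W = (V^{−1})^T between left and right eigenvectors can be checked directly (a sketch; the matrix is a made-up non-symmetric example):

```python
import numpy as np

A = np.array([[2.0, 1.0, 0.0],
              [0.5, 2.0, 1.0],
              [0.0, 0.5, 2.0]])

lam, V = np.linalg.eig(A)           # columns of V: right eigenvectors, A = V D V^{-1}
W = np.linalg.inv(V).T              # columns of W: left eigenvectors of A

# Each column of W is a right eigenvector of A^T with the same eigenvalue.
for i in range(3):
    print(np.allclose(A.T @ W[:, i], lam[i] * W[:, i]))   # True

# A is not normal here, so A and A^T do not share the same eigenspace (W != V up to scaling).
print(np.allclose(A @ A.T, A.T @ A))   # False
```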

All of the above can be rewritten for complex-valued matrices if we replace the transpose with the Hermitian (complex conjugate transpose).

B. Stochastic matrices

A row (column)-stochastic matrix is a non-negative matrix whose row (column) sums are 1.

Lemma 8. The eigenvalues of a row-stochastic matrix lie in the closed unit disc.
Proof: Gershgorin's circle theorem.

Lemma 9. The spectral radius of a row-stochastic matrix is 1.
Proof: Note that 1 is an eigenvalue and, by the above lemma, no other eigenvalue exceeds 1 in magnitude.

Lemma 10. The eigenvalues of an irreducible row-stochastic matrix follow

    |λ_1| ≤ ... ≤ |λ_{N−1}| ≤ λ_N = 1.

The right eigenvector, v_N, corresponding to λ_N = 1 is a vector of all (positive) constants, i.e., v_N = (1/√N)[1 1 ... 1]^T after normalization. In addition, if W is primitive (which can be ensured by adding a strictly positive diagonal), then |λ_{N−1}| < λ_N = 1.
Proof: Perron-Frobenius; W with a strictly positive diagonal is primitive.

A doubly-stochastic matrix is one that is both row-stochastic and column-stochastic (i.e., A^T is also row-stochastic).

C. Average-consensus algorithm

Consider a strongly-connected graph, G = (V, E), with N nodes. Let the ith node possess a real number, x_i(0). Each node implements the following algorithm:

    x_i(k + 1) = Σ_{j ∈ {i} ∪ N_i} w_ij x_j(k),

where w_ij > 0 for j = i and for (i, j) ∈ E, such that Σ_j w_ij = 1. The network-level algorithm can be summarized as

    x_{k+1} = W x_k,    (9)

where W = {w_ij} is a weight matrix that collects the w_ij.
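
A quick check of Lemmas 8, 9, and 10 (a sketch): build a row-stochastic, irreducible weight matrix with positive diagonal for a small connected graph and inspect its spectrum. The weight rule below (uniform weights over each node's closed neighborhood) is just one simple made-up choice.

```python
import numpy as np

A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)   # connected undirected graph
N = A.shape[0]

# Row-stochastic weights: w_ij = 1 / (deg_i + 1) for j in {i} U N_i, zero otherwise.
W = (A + np.eye(N)) / (A.sum(axis=1) + 1)[:, None]

lam = np.linalg.eigvals(W)
lam = lam[np.argsort(np.abs(lam))]
print(np.round(np.abs(lam), 4))    # all magnitudes <= 1 (Lemmas 8 and 9)
print(np.round(lam[-1].real, 4))   # lambda_N = 1
# |lambda_{N-1}| < 1 since W is primitive (irreducible with positive diagonal).
```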

Remark 9. The weight matrix, W, is row-stochastic and irreducible. With w_ii > 0 for all i, it is further primitive. From the PF theorem, the eigenvalues, λ_i, of W are such that

    |λ_1| ≤ ... ≤ |λ_{N−1}| < λ_N = 1.

The right eigenvector, v_N, corresponding to λ_N = 1 is a strictly positive vector of all constants, i.e., v_N = (1/√N)[1 1 ... 1]^T (N elements).

Let v_i be the right eigenvector corresponding to λ_i; then W = V D V^{−1}, where V = [v_N, ..., v_1] and D is a diagonal matrix with λ_N, ..., λ_1 on the main diagonal. Let u_N^T, ..., u_1^T denote the corresponding rows of V^{−1} (the left eigenvectors). Consider the asymptotic behavior of (9):

    x_{k+1} = W^{k+1} x_0
            = V D^{k+1} V^{−1} x_0
            = [v_N, ..., v_1] diag(λ_N^{k+1}, ..., λ_1^{k+1}) [u_N, ..., u_1]^T x_0
            = v_N u_N^T x_0 + Σ_{i=1}^{N−1} λ_i^{k+1} v_i u_i^T x_0,

so that, since λ_N = 1 and |λ_i| < 1 for i < N,

    x_∞ ≜ lim_{k→∞} x_{k+1} = v_N u_N^T x_0.

If, in addition, W is symmetric, then u_N = v_N and

    x_∞ = v_N v_N^T x_0 = (1/√N) 1_N (1/√N) 1_N^T x_0 = (1/N) 1_N 1_N^T x_0,    (10)

where it can be verified that 1_N^T x_0 / N is the average of the initial conditions.

Summary:

Agreement: If G is strongly-connected and the weights are such that (i) w_ij > 0 for all (i, j) ∈ E and for (i, i), for all i ∈ V; and (ii) Σ_j w_ij = 1; then the update in (9) converges to an agreement over all of the nodes in the network.

Average-consensus: If G is connected and the weights are such that (i) w_ij > 0 for all (i, j) ∈ E and for (i, i), for all i ∈ V; (ii) Σ_j w_ij = 1; and (iii) w_ij = w_ji; then the update in (9) converges to the average of the nodal initial conditions.
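
A minimal simulation of (9) with symmetric weights (a sketch; the graph, initial conditions, and the Metropolis-type weight rule below are made-up illustrative choices that yield a symmetric, row-stochastic, primitive W):

```python
import numpy as np

A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)   # connected undirected graph
N = A.shape[0]
deg = A.sum(axis=1)

# Metropolis-type weights: symmetric, row-stochastic, positive diagonal => doubly stochastic.
W = np.zeros((N, N))
for i in range(N):
    for j in range(N):
        if A[i, j]:
            W[i, j] = 1.0 / (1.0 + max(deg[i], deg[j]))
    W[i, i] = 1.0 - W[i].sum()

x0 = np.array([3.0, -1.0, 7.0, 5.0])         # initial conditions x_i(0)
x = x0.copy()
for _ in range(200):
    x = W @ x                                 # x_{k+1} = W x_k

print(x)                                      # every entry converges to the average
print(np.mean(x0))                            # 3.5
```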