Matrix Theory, Math 6304
Lecture Notes from March 22, 2016, taken by Kazem Safari

1.1 Applications of Courant-Fischer and min-max or max-min

Last time: Courant-Fischer max-min or min-max characterization of eigenvalues.

Warm-up: Sums of eigenvalues from optimization problems.

1.1.1 Proposition. Let $A \in M_n$ be Hermitian, i.e. $A = A^*$, with eigenvalues $\lambda_1 \le \lambda_2 \le \dots \le \lambda_n$. Then
$$\sum_{j=1}^{k} \lambda_j = \min_{P \in \mathcal{P}_k} \operatorname{tr}[AP],$$
where $\mathcal{P}_k$ denotes the set of orthogonal projections of rank $k$ (defined below).

In order to prove this remarkable result, we need to recall some orthogonal projection theory from Real Analysis (for the proofs, cf. Folland, Real Analysis: Modern Techniques and Their Applications, Section 5.5). If $A$ is a closed subspace of a Hilbert space $H$ (for the purpose of defining the orthogonal projection, it suffices for the subset $A$ to be closed and convex; in this course $H = \mathbb{R}^n$ or $\mathbb{C}^n$, so we can identify each linear map with a matrix and vice versa), then for every $x \in H$ we can define
$$\delta = \inf_{y \in A} \|x - y\|.$$
It then follows that $\delta < \infty$ and there exists a unique $z \in A$ that achieves this infimum, i.e.
$$\delta = \inf_{y \in A} \|x - y\| = \|x - z\|.$$
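In the finite-dimensional setting ($H = \mathbb{R}^n$), this closest point can be computed by least squares. A minimal numerical sketch, assuming numpy; the subspace (column span of $M$) and test vector are arbitrary examples:

```python
import numpy as np

rng = np.random.default_rng(0)
M = rng.standard_normal((5, 2))      # columns span a 2-dim subspace A of R^5
x = rng.standard_normal(5)

# closest point z in A = range(M): solve min_c ||x - M c||
c, *_ = np.linalg.lstsq(M, x, rcond=None)
z = M @ c

# the residual x - z is orthogonal to every column of M, i.e. to A
assert np.allclose(M.T @ (x - z), 0)

# z achieves the infimum: any other point of A is at least as far from x
y = M @ rng.standard_normal(2)
assert np.linalg.norm(x - z) <= np.linalg.norm(x - y)
```

The map $x \mapsto z$ is exactly the orthogonal projection named next.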
$P(x) := z$ is called the orthogonal projection onto the closed subspace $A$. Then we have $x - P(x) \in A^\perp$, where $A^\perp = \{x \in H : \langle x, a \rangle = 0 \text{ for all } a \in A\}$. And every element $x \in H$ can be uniquely written as
$$x = Px + (x - Px), \quad \text{where } Px \in A \text{ and } x - Px \in A^\perp.$$
In other words: $H = A \oplus A^\perp$.

1.1.2 Proposition. The orthogonal projection operator $P : H \to A$ has the following properties:

1) $P$ is a linear continuous map, and $\|P\| \le 1$.
2) $P^2 = P$, i.e. $P|_A = \mathrm{id}$.
3) $R(P) = A$ and $\operatorname{null}(P) = A^\perp$.
4) $P^* = P$.
5) $\operatorname{rank}(P) = \operatorname{tr}(P)$.
6) Any eigenvalue of $P$ is either 0 or 1.

Conversely:

1.1.3 Proposition. Suppose that $P \in L(H, H)$ satisfies $P^2 = P = P^*$. Then $R(P)$ is closed and $P$ is the orthogonal projection onto $R(P)$.

1.1.4 Definition. We define $\mathcal{P}_k$ as the set of orthogonal projections of rank $k$. (Unfortunately, the space of orthogonal projections of rank $k$ is not a linear space, but it is what we call a variety.)
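Before turning to the proof, both Proposition 1.1.2 and the warm-up identity can be sanity-checked numerically. A minimal sketch, assuming numpy, with a random Hermitian matrix and a random rank-$k$ projection as examples:

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 6, 3

# a random Hermitian matrix and its eigenvalues in increasing order
G = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
A = (G + G.conj().T) / 2
lam, U = np.linalg.eigh(A)                  # eigh sorts eigenvalues ascending

# an orthogonal projection of rank k onto a random k-dim subspace
Q, _ = np.linalg.qr(rng.standard_normal((n, k))
                    + 1j * rng.standard_normal((n, k)))
P = Q @ Q.conj().T

# properties from Proposition 1.1.2: P^2 = P = P*, rank(P) = tr(P) = k
assert np.allclose(P @ P, P) and np.allclose(P.conj().T, P)
assert np.isclose(np.trace(P).real, k)

# tr(AP) is at least the sum of the k smallest eigenvalues ...
assert np.trace(A @ P).real >= lam[:k].sum() - 1e-9

# ... with equality when P projects onto span{u_1, ..., u_k}
P_min = U[:, :k] @ U[:, :k].conj().T
assert np.isclose(np.trace(A @ P_min).real, lam[:k].sum())
```

The last two assertions are exactly the two halves of the proof that follows: a lower bound over all of $\mathcal{P}_k$, and a particular projection attaining it.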
Proof of the warm-up. Consider $n$ orthonormal eigenvectors $\{u_j\}_{j=1}^n$ corresponding to the eigenvalues $\{\lambda_j\}_{j=1}^n$ of our Hermitian matrix $A$. If we define $U = [u_1 \dots u_n]$, then one eigendecomposition of $A$ is $A = U \Lambda U^*$, where $U U^* = U^* U = I$ and $\Lambda = \operatorname{diag}[\lambda_1, \dots, \lambda_n]$. Therefore
$$A = \sum_{j=1}^{n} \lambda_j u_j u_j^*.$$
Next, if $P \in \mathcal{P}_k$, then since $\|P\| \le 1$,
$$\|P u_j\|^2 \le \|u_j\|^2 = 1.$$
Now, since $P^2 = P$ and $P^* = P$,
$$k = \operatorname{rank}(P) = \operatorname{tr}(P) = \operatorname{tr}(U^* P U) = \operatorname{tr}\big([u_i^* P u_j]_{i,j=1}^{n}\big) = \sum_{j=1}^{n} u_j^* P u_j = \sum_{j=1}^{n} \operatorname{tr}[P u_j u_j^*] = \sum_{j=1}^{n} \langle P u_j, u_j \rangle = \sum_{j=1}^{n} \langle P^2 u_j, u_j \rangle.$$
So, by the definition of the adjoint operator and $P = P^*$,
$$\langle P^2 u_j, u_j \rangle = \langle P u_j, P^* u_j \rangle = \langle P u_j, P u_j \rangle = \|P u_j\|^2.$$
Thus, if we define $x_j := \operatorname{tr}(P u_j u_j^*)$, then by the same procedure as above,
$$x_j = \sum_{l=1}^{n} \langle P u_j u_j^* u_l, u_l \rangle = \sum_{l=1}^{n} \langle \delta_{j,l} P u_j, u_l \rangle = \langle P u_j, u_j \rangle = \|P u_j\|^2.$$
Then, by combining the two previous results, for each $j$ we have
$$0 \le x_j \le 1 \quad \text{and} \quad \sum_{j=1}^{n} x_j = k.$$
Now let $X_k := \{x \in [0,1]^n : \sum_{j=1}^{n} x_j = k\}$. Then, since $\operatorname{tr}$ is linear and $\operatorname{tr}(AB) = \operatorname{tr}(BA)$ for all $A, B \in M_n(\mathbb{C})$,
$$\min_{P \in \mathcal{P}_k} \operatorname{tr}[AP] = \min_{P \in \mathcal{P}_k} \operatorname{tr}[PA] = \min_{P \in \mathcal{P}_k} \operatorname{tr}\Big[P \sum_{j=1}^{n} \lambda_j u_j u_j^*\Big] = \min_{P \in \mathcal{P}_k} \sum_{j=1}^{n} \lambda_j \operatorname{tr}[P u_j u_j^*] = \min_{P \in \mathcal{P}_k} \sum_{j=1}^{n} \lambda_j x_j.$$
Now we are going to invoke the variational principle in optimization, which in essence says that by properly relaxing the constraints of a structurally highly complicated problem, the min only goes lower and the max only goes higher:
$$\min_{P \in \mathcal{P}_k} \sum_{j=1}^{n} \lambda_j x_j \ge \min_{x \in X_k} \sum_{j=1}^{n} \lambda_j x_j,$$
which turns the problem into minimizing over a linear polytope.

Claim (this problem is very similar to the water-filling algorithm in Signal Processing):
$$\min_{x \in X_k} \sum_{j=1}^{n} \lambda_j x_j = \sum_{j=1}^{k} \lambda_j.$$

Proof of the claim. If $x_l > 0$ for some $l > k$, then since $\lambda_l \ge \lambda_k$ there exists an $\epsilon \ge 0$ such that $\lambda_l = \lambda_k + \epsilon$. Then, looking at the terms with indices $1, \dots, k$ and $l$, and moving the weight $x_l$ onto the $k$-th coordinate,
$$\sum_{j=1}^{k-1} \lambda_j x_j + \lambda_k x_k + \lambda_l x_l = \sum_{j=1}^{k-1} \lambda_j x_j + \lambda_k x_k + (\lambda_k + \epsilon) x_l = \sum_{j=1}^{k-1} \lambda_j x_j + \lambda_k (x_k + x_l) + \epsilon x_l \ge \sum_{j=1}^{k-1} \lambda_j x_j + \lambda_k (x_k + x_l).$$
Meaning: whenever any eigenvalue greater than $\lambda_k$ carries a positive weight, we can redistribute that weight among the eigenvalues less than or equal to $\lambda_k$ and achieve a value that is at most as large. Therefore, at a minimizer, we may take
$$x_{k+1} = x_{k+2} = \dots = x_n = 0.$$
On the other hand, since $x \in X_k$ and each $x_j \le 1$, the first $k$ coordinates must carry full weight:
$$x_1 = x_2 = \dots = x_k = 1,$$
which yields the value $\sum_{j=1}^{k} \lambda_j$ and proves the claim.

Conversely, choosing
$$P = \sum_{j=1}^{k} u_j u_j^*,$$
it is straightforward to check that $P \in L(H, H)$ and $P^2 = P = P^*$. Therefore, by Proposition 1.1.3, $P$ is the orthogonal projection onto $R(P)$, which is a closed linear subspace of $H$. Moreover, writing $u_j = (u_j^1, \dots, u_j^n)^T$ in coordinates, we have
$$u_j u_j^* = \begin{pmatrix} u_j^1 \\ u_j^2 \\ \vdots \\ u_j^n \end{pmatrix} \begin{pmatrix} \overline{u_j^1} & \overline{u_j^2} & \cdots & \overline{u_j^n} \end{pmatrix} = \begin{pmatrix} u_j^1 \overline{u_j^1} & u_j^1 \overline{u_j^2} & \cdots & u_j^1 \overline{u_j^n} \\ u_j^2 \overline{u_j^1} & u_j^2 \overline{u_j^2} & \cdots & u_j^2 \overline{u_j^n} \\ \vdots & \vdots & \ddots & \vdots \\ u_j^n \overline{u_j^1} & u_j^n \overline{u_j^2} & \cdots & u_j^n \overline{u_j^n} \end{pmatrix}.$$
Therefore we can easily see that
$$\operatorname{tr}(P) = \sum_{j=1}^{k} \|u_j\|^2 = k.$$
But $\operatorname{rank}(P) = \operatorname{tr}(P) = k$ by Proposition 1.1.2, therefore we see that $P \in \mathcal{P}_k$ as well. Thus:
min P P tr[ap ] tr[ap ] tr[a u j u j] tr[au j u j] tr[λ j u j u j] λ j tr(u j u j) λ j u j 2 λ j. And now we conclude that min tr[ap ] λ j. Moral: We can replace /min or min/ by a sequence of minimizations over P k, and therefore, we can relate the spectrum of submatrices to the whole matrix. Recall: 1.1.5 Theorem (Courant-Fischer). Suppose A M n is Hermitian, i.e. A A. Now, for each 1 k n, let {S α k } α I k, where α I k denote the set of all k dimensional linear subspaces of H, and enumerate the n eigenvalues λ 1,..., λ n (counting multiplicity) in increasing order, i.e. λ 1 λ 2,..., λ n. Then, we have 7
(i) $\displaystyle \min_{\alpha \in I_k} \max_{x \in S_\alpha^k \setminus \{0\}} \frac{\langle Ax, x \rangle}{\|x\|^2} = \lambda_k$.

(ii) $\displaystyle \max_{\alpha \in I_{n-k+1}} \min_{x \in S_\alpha^{n-k+1} \setminus \{0\}} \frac{\langle Ax, x \rangle}{\|x\|^2} = \lambda_k$.

Proof of part (ii). Let $W = \operatorname{span}\{u_1, u_2, \dots, u_k\}$, so $\dim W = k$. Then if $\dim S^{n-k+1} = n-k+1$, by dimension counting, i.e.
$$\dim(W \cap S^{n-k+1}) \ge \dim W + \dim S^{n-k+1} - n = k + (n-k+1) - n = 1,$$
we get $S^{n-k+1} \cap W \ne \{0\}$, and therefore there exists an $x \in (S^{n-k+1} \cap W) \setminus \{0\}$, with
$$x = \sum_{j=1}^{k} \langle x, u_j \rangle u_j.$$
Therefore
$$R_A(x) = \frac{\langle Ax, x \rangle}{\|x\|^2} = \frac{\sum_{j=1}^{k} \lambda_j |\langle x, u_j \rangle|^2}{\|x\|^2} \le \lambda_k,$$
since $\lambda_k \ge \lambda_{k-1} \ge \dots \ge \lambda_1$, and $\sum_{j=1}^{k} |\langle x, u_j \rangle|^2 = \|x\|^2$ since $\{u_j\}_{j=1}^{k}$ is an orthonormal basis for $W$. Therefore we have
$$\min_{x \in S^{n-k+1},\, x \ne 0} \frac{\langle Ax, x \rangle}{\|x\|^2} \le \lambda_k.$$
So, since the choice of $S^{n-k+1}$ was arbitrary,
$$\max_{S^{n-k+1}} \min_{x \in S^{n-k+1},\, x \ne 0} \frac{\langle Ax, x \rangle}{\|x\|^2} \le \lambda_k.$$
But, on the other hand, there is a special choice of $S^{n-k+1}$, namely $S^{n-k+1} = \operatorname{span}\{u_k, u_{k+1}, \dots, u_n\}$, and then we have $S^{n-k+1} \cap W = \operatorname{span}\{u_k\}$. Finally, using Rayleigh-Ritz for $A|_{S^{n-k+1}}$,
$$\min_{x \in S^{n-k+1},\, x \ne 0} \frac{\langle Ax, x \rangle}{\|x\|^2} = \text{smallest eigenvalue of } A|_{S^{n-k+1}} = \lambda_k.$$

Note: If $k = 1$ in (i) or (ii), we recover Rayleigh-Ritz as a special case.

Counterexample in the non-Hermitian case. Let $N = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}$ be the nilpotent matrix, and define the Rayleigh quotient $R_N(x)$ exactly as above in the Hermitian case. Then it is easy to see that the only eigenvalue of $N$ is zero, while the maximum value of the Rayleigh quotient is $1/2$. That is, the maximum value of the Rayleigh quotient is larger than the maximum eigenvalue.

Applications of Courant-Fischer

1.1.6 Theorem (Weyl). Let $A, B \in M_n$ be Hermitian with eigenvalues $\{\lambda_j(A)\}_{j=1}^n$, $\{\lambda_j(B)\}_{j=1}^n$, and $\{\lambda_j(A+B)\}_{j=1}^n$, all arranged in non-decreasing order. ($A$ and $B$ could be thought of as the kinetic and potential energy matrices of the Schrödinger Hamiltonian operator in Quantum Mechanics.) We then have
$$\lambda_k(A) + \lambda_1(B) \le \lambda_k(A+B) \le \lambda_k(A) + \lambda_n(B).$$

Proof. We know from Rayleigh-Ritz that for any nonzero vector $x \in \mathbb{C}^n$:
$$\lambda_1(B) \le \frac{\langle Bx, x \rangle}{\|x\|^2} \le \lambda_n(B).$$
So, in order to prove the first inequality, applying Courant-Fischer to $A + B$ we have
$$\lambda_k(A+B) = \min_{S^k} \max_{x \in S^k,\, x \ne 0} \frac{\langle (A+B)x, x \rangle}{\|x\|^2} = \min_{S^k} \max_{x \in S^k,\, x \ne 0} \left( \frac{\langle Ax, x \rangle}{\|x\|^2} + \frac{\langle Bx, x \rangle}{\|x\|^2} \right) \quad (*)$$
$$\ge \min_{S^k} \max_{x \in S^k,\, x \ne 0} \left( \frac{\langle Ax, x \rangle}{\|x\|^2} + \lambda_1(B) \right) = \min_{S^k} \max_{x \in S^k,\, x \ne 0} \frac{\langle Ax, x \rangle}{\|x\|^2} + \lambda_1(B) = \lambda_k(A) + \lambda_1(B),$$
where the last equality follows from Courant-Fischer for $A$. Now, to prove the second inequality, we instead estimate the second term in $(*)$ by $\langle Bx, x \rangle / \|x\|^2 \le \lambda_n(B)$, which similarly gives
$$\lambda_k(A+B) \le \lambda_k(A) + \lambda_n(B).$$

In special cases, we can deduce simpler inequalities.

1.1.7 Definition. A matrix $B \in M_n$ is called positive semidefinite if it is Hermitian and, for each $x \in \mathbb{C}^n$, $\langle Bx, x \rangle \ge 0$.
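Weyl's inequalities lend themselves to a quick numerical sanity check; a sketch assuming numpy, with a random Hermitian pair as an example:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5

def rand_hermitian(n):
    # a random Hermitian matrix (G + G*) / 2
    G = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    return (G + G.conj().T) / 2

A, B = rand_hermitian(n), rand_hermitian(n)
lam_A = np.linalg.eigvalsh(A)        # eigvalsh returns ascending order
lam_B = np.linalg.eigvalsh(B)
lam_AB = np.linalg.eigvalsh(A + B)

# lambda_k(A) + lambda_1(B) <= lambda_k(A+B) <= lambda_k(A) + lambda_n(B)
tol = 1e-9
assert np.all(lam_A + lam_B[0] <= lam_AB + tol)
assert np.all(lam_AB <= lam_A + lam_B[-1] + tol)
```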
1.1.8 Corollary. Let $A, B \in M_n$ be Hermitian with $B$ positive semidefinite. Then $\lambda_k(A) \le \lambda_k(A+B)$.

Proof. Follows from Weyl's Theorem and the fact that $\lambda_1(B) \ge 0$.

1.1.9 Remark. A positive semidefinite rank-one matrix $B$ is of the form $B = zz^*$: since $B$ is Hermitian of rank one, we can write it as $B = \lambda u u^*$ with $\lambda \ge 0$, so we can choose $z = \sqrt{\lambda}\, u$.

1.1.10 Theorem (Interlacing Theorem). Let $A \in M_n$ be Hermitian and $z \in \mathbb{C}^n$. If $\{\lambda_j(A)\}_{j=1}^n$ and $\{\lambda_j(A \pm zz^*)\}_{j=1}^n$ are in non-decreasing order, then the eigenvalues interlace, that is,
$$\lambda_k(A \pm zz^*) \le \lambda_{k+1}(A) \le \lambda_{k+2}(A \pm zz^*).$$
(If you imagine the eigenvalues of $A \pm zz^*$ and $A$ arranged in ascending order on two parallel vertical lines, then their comparative order somehow resembles how you use your shoe-laces to tie your shoes.)

Application of the min-max / max-min Theorem in Game Theory: Finding the Nash Equilibrium (cf. John von Neumann and Oskar Morgenstern, Theory of Games and Economic Behavior, Princeton University Press, 1947).

Game theory attempts to mathematically explain behavior in situations in which an individual's outcome depends on the actions of others.

1.1.11 Definition. An $n$-person game is one in which there are $n$ players and a payoff function, which assigns an $n$-vector to each terminal vertex of the game, indicating each player's earnings.

1.1.12 Definition. A strategy refers to a player's plan specifying which choices it will make in every possible situation, leading to an eventual outcome. Let $\Sigma_i$ denote the set of all strategies for player $i$. In order to decide which strategy is best, player $i$ will have to choose the strategy which maximizes its payoff (i.e., the $i$-th component of the payoff function). Letting $\pi$ denote the payoff resulting from a certain combination of strategies, we can derive a mathematical expression for the payoff function, given that player $i$ uses strategy $\sigma_i \in \Sigma_i$:
π(σ 1, σ 2,..., σ n ) (π 1 (σ 1...σ n ), π 2 (σ 1...σ n ),..., π n (σ 1...σ n )) where σ 1 represents player 1 s strategy, σ 2 represents player 2 s strategy, and so on, while π 1 represents the probability of player 1 choosing strategy σ 1, π 2 represents the probability of player 2 choosing strategy σ 2, and so on. It is possible to express this function through an n-dimensional array of n-vectors, called the normal form of the game. 1.1.13 Definition. A strategy n-tuple (σ 1, σ 2,..., σ n ) is said to be a Nash equilibrium if and only if no player has any reason to change its strategy, assuming the other players do not change theirs. That is, the strategy n-tuple (σ 1, σ 2,..., σ n ) is in equilibrium, for any i 1,...n, and any σ i Σ i : π i (σ 1,..., σ i 1, σ i, σ i+1,..., σ n ) π i (σ 1, σ 2,..., σ n ) 1.1.14 Definition. A mixed strategy is a probability distribution on the set of a players pure strategies. When a player has a finite number of m strategies, its mixed strategy can be expressed as an m-vector, x (x 1,..., x m ) such that x i 0 and n i1 x i 1 Suppose players (1,2) have pay-off matrices (A n, B n ). Let X denote the set of all mixed strategies for player 1, and Y represent the set of all mixed strategies for player 2. If player 1 chooses mixed strategy x while player 2 chooses mixed strategy y, then the expected pay-off matrices can be written as P A x Ay and P B y Bx. So the Nash Equilibrium would be: P A min y x y Ax P B min x y x By It is straightforward to check that in the case of the pay-off matrix of the famous prisoner s dilemma, the NE is in fact the pair of smallest eigenvalues. 12
Prisoner's dilemma. Example of a PD payoff matrix $M$ (entries are the pairs (row player's payoff, column player's payoff)):

                            Cooperate (with other)    Defect (betray other)
    Cooperate (with other)        (2, 2)                    (0, 3)
    Defect (betray other)         (3, 0)                    (1, 1)

Since
$$A = \begin{pmatrix} 2 & 0 \\ 3 & 1 \end{pmatrix}, \qquad B = \begin{pmatrix} 2 & 3 \\ 0 & 1 \end{pmatrix},$$
the Nash equilibrium payoff is $(1, 1) = (\lambda_{\min}(A), \lambda_{\min}(B))$.
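The example can be verified directly; a minimal sketch assuming numpy, checking both the spectra of $A$ and $B$ and that mutual defection is an equilibrium with payoff $(1, 1)$:

```python
import numpy as np

# row player's payoffs A and column player's payoffs B;
# index 0 = Cooperate, index 1 = Defect
A = np.array([[2.0, 0.0],
              [3.0, 1.0]])
B = np.array([[2.0, 3.0],
              [0.0, 1.0]])

# the smallest eigenvalue of each payoff matrix is 1
assert np.isclose(min(np.linalg.eigvals(A).real), 1.0)
assert np.isclose(min(np.linalg.eigvals(B).real), 1.0)

# (Defect, Defect) is a Nash equilibrium: neither player gains by deviating
i, j = 1, 1
assert A[1 - i, j] <= A[i, j]        # row player's deviation to Cooperate
assert B[i, 1 - j] <= B[i, j]        # column player's deviation to Cooperate
print(A[i, j], B[i, j])              # prints 1.0 1.0
```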