Matrix Theory, Math6304 Lecture Notes from March 22, 2016 taken by Kazem Safari


1.1 Applications of Courant-Fischer and min-max / max-min

Last time: Courant-Fischer max-min / min-max characterization of eigenvalues.

Warm-up: Sums of eigenvalues from optimization problems.

1.1.1 Proposition. Let $A \in M_n$ be Hermitian, i.e. $A = A^*$, with eigenvalues $\lambda_1 \leq \lambda_2 \leq \dots \leq \lambda_n$. Then
\[ \sum_{j=1}^{k} \lambda_j = \min_{P \in \mathcal{P}_k} \mathrm{tr}[AP], \]
where $\mathcal{P}_k$ denotes the set of orthogonal projections of rank $k$ (see Definition 1.1.4 below).

In order to prove this remarkable result, we need to recall some orthogonal projection theory from real analysis.[1] If $V$ is a closed subspace[2] of a Hilbert space $H$,[3] then for every $x \in H$ we can define
\[ \delta = \inf_{y \in V} \|x - y\|. \]
It then follows that $\delta < \infty$ and that there exists a unique $z \in V$ that achieves this infimum, i.e.
\[ \delta = \inf_{y \in V} \|x - y\| = \|x - z\|. \]

[1] For the proof of these results cf. Folland, Real Analysis: Modern Techniques and Their Applications, Section 5.5.
[2] For the purpose of defining the orthogonal projection, it suffices for our subset $V$ to be closed and convex.
[3] In this course $H = \mathbb{R}^n$ or $\mathbb{C}^n$, so we can identify each linear map with a matrix and vice versa.

$P(x) := z$ is called the orthogonal projection onto the closed subspace $V$. Then we have $x - P(x) \in V^\perp$,[4] and every element $x \in H$ can be uniquely written as
\[ x = Px + (x - Px), \]
where $Px \in V$ and $x - Px \in V^\perp$. In other words, $H = V \oplus V^\perp$.

1.1.2 Proposition. The orthogonal projection operator $P : H \to V$ has the following properties:
1) $P$ is a continuous linear map, and $\|P\| \leq 1$.
2) $P^2 = P$, i.e. $P|_V = \mathrm{id}$.
3) $R(P) = V$ and $\mathrm{null}(P) = V^\perp$.
4) $P^* = P$.
5) $\mathrm{rank}(P) = \mathrm{tr}(P)$.
6) Any eigenvalue of $P$ is either 0 or 1.

Conversely:

1.1.3 Proposition. Suppose that $P \in L(H, H)$ satisfies $P^2 = P = P^*$. Then $R(P)$ is closed and $P$ is the orthogonal projection onto $R(P)$.

1.1.4 Definition. We define $\mathcal{P}_k$ as the set of orthogonal projections of rank $k$.[5]

[4] $V^\perp = \{x \in H : \langle x, a \rangle = 0 \text{ for all } a \in V\}$
[5] Unfortunately, the set of orthogonal projections of rank $k$ is not a linear space, but it is what we call a variety.
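As a quick numerical illustration of Propositions 1.1.2 and 1.1.3 (a minimal sketch, not part of the original notes, assuming numpy is available), one can build the orthogonal projection onto a random subspace from an orthonormal basis and check the listed properties:

    import numpy as np

    rng = np.random.default_rng(0)
    n, k = 5, 2
    # orthonormal basis Q of a random k-dimensional subspace of C^n
    Z = rng.standard_normal((n, k)) + 1j * rng.standard_normal((n, k))
    Q, _ = np.linalg.qr(Z)
    P = Q @ Q.conj().T                          # orthogonal projection onto span(Z)

    assert np.allclose(P @ P, P)                # 2) idempotent: P^2 = P
    assert np.allclose(P, P.conj().T)           # 4) self-adjoint: P* = P
    print(np.trace(P).real)                     # 5) tr(P) = rank(P) = k = 2
    print(np.round(np.linalg.eigvalsh(P), 8))   # 6) eigenvalues are all 0 or 1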

Proof of the warm-up. Consider $n$ orthonormal eigenvectors $\{u_j\}_{j=1}^n$ corresponding to the eigenvalues $\{\lambda_j\}_{j=1}^n$ of our Hermitian matrix $A$. If we define $U = [u_1 \dots u_n]$, then one eigendecomposition of $A$ is $A = U \Lambda U^*$, where $UU^* = U^*U = I$ and $\Lambda = \mathrm{diag}[\lambda_1, \dots, \lambda_n]$. Therefore
\[ A = \sum_{j=1}^n \lambda_j u_j u_j^*. \]

Next, if $P \in \mathcal{P}_k$, then $\|P u_j\|^2 \leq \|u_j\|^2 = 1$, since $\|P\| \leq 1$. Now, since $P^2 = P$ and $P^* = P$,
\[
k = \mathrm{rank}(P) = \mathrm{tr}(P) = \mathrm{tr}(U^* P U)
= \mathrm{tr}\left( \begin{bmatrix} u_1^* \\ u_2^* \\ \vdots \\ u_n^* \end{bmatrix} P \, [u_1 \, u_2 \dots u_n] \right)
= \mathrm{tr} \begin{bmatrix}
u_1^* P u_1 & u_1^* P u_2 & \dots & u_1^* P u_n \\
u_2^* P u_1 & u_2^* P u_2 & \dots & u_2^* P u_n \\
\vdots & \vdots & \ddots & \vdots \\
u_n^* P u_1 & u_n^* P u_2 & \dots & u_n^* P u_n
\end{bmatrix}
\]
\[
= \sum_{j=1}^n \mathrm{tr}[P u_j u_j^*]
= \sum_{j=1}^n u_j^* P u_j
= \sum_{j=1}^n \langle P u_j, u_j \rangle
= \sum_{j=1}^n \langle P^2 u_j, u_j \rangle.
\]

So by the definition of the adjoint operator,
\[ \langle P^* P u_j, u_j \rangle = \langle P u_j, P u_j \rangle = \|P u_j\|^2. \]
Thus, if we define $x_j := \mathrm{tr}(P u_j u_j^*)$, then by the same procedure as above,
\[ x_j = \sum_{l=1}^n \langle P u_j u_j^* u_l, u_l \rangle = \sum_{l=1}^n \langle \delta_{j,l} P u_j, u_l \rangle = \langle P u_j, u_j \rangle = \|P u_j\|^2. \]
Combining the two previous results, for each $j$ we have
\[ 0 \leq x_j \leq 1 \quad \text{and} \quad \sum_{j=1}^n x_j = k. \]

Now let $X_k := \{x \in [0,1]^n : \sum_{j=1}^n x_j = k\}$. Then[6]
\[
\min_{P \in \mathcal{P}_k} \mathrm{tr}[AP]
= \min_{P \in \mathcal{P}_k} \mathrm{tr}[PA]
= \min_{P \in \mathcal{P}_k} \mathrm{tr}\Big[P \sum_{j=1}^n \lambda_j u_j u_j^*\Big]
= \min_{P \in \mathcal{P}_k} \sum_{j=1}^n \lambda_j \,\mathrm{tr}[P u_j u_j^*]
= \min_{P \in \mathcal{P}_k} \sum_{j=1}^n \lambda_j x_j.
\]

[6] Since $\mathrm{tr}$ is linear and $\mathrm{tr}(AB) = \mathrm{tr}(BA)$ for all $A, B \in M_n(\mathbb{C})$.

Now we invoke the variational principle in optimization, which in essence says that by properly relaxing the constraints of a structurally complicated problem, the min can only go lower and the max higher:
\[ \min_{P \in \mathcal{P}_k} \sum_{j=1}^n \lambda_j x_j \;\geq\; \min_{x \in X_k} \sum_{j=1}^n \lambda_j x_j, \]
which turns the problem into a minimization over a linear $k$-polytope.

Claim.[7]
\[ \min_{x \in X_k} \sum_{j=1}^n \lambda_j x_j = \sum_{j=1}^k \lambda_j. \]

Proof of the claim. If $x_l > 0$ for some $l > k$, then there exists $\varepsilon \geq 0$ such that $\lambda_l = \lambda_k + \varepsilon$, and
\[
\sum_{j=1}^n \lambda_j x_j
= \sum_{j \neq l} \lambda_j x_j + \lambda_l x_l
= \sum_{j \neq l} \lambda_j x_j + (\lambda_k + \varepsilon) x_l
= \sum_{j \neq k, l} \lambda_j x_j + \lambda_k (x_k + x_l) + \varepsilon x_l
\geq \sum_{j \neq k, l} \lambda_j x_j + \lambda_k (x_k + x_l).
\]
Meaning: whenever an eigenvalue greater than $\lambda_k$ carries positive weight, we can redistribute that weight onto the eigenvalues less than or equal to $\lambda_k$ without increasing the overall value. Therefore, at the minimum, we may assume
\[ x_{k+1} = x_{k+2} = \dots = x_n = 0, \]
and since $x \in X_k$, the remaining weight must fully load the first $k$ coordinates:
\[ x_1 = x_2 = \dots = x_k = 1. \]
This proves the claim.

[7] This problem is very similar to the water-filling algorithm in signal processing.
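The claim can also be checked numerically as a linear program over the polytope $X_k$ (a sketch, not part of the original notes, assuming scipy is available):

    import numpy as np
    from scipy.optimize import linprog

    rng = np.random.default_rng(1)
    lam = np.sort(rng.standard_normal(8))     # stand-in eigenvalues, increasing order
    n, k = len(lam), 3
    # minimize <lam, x> subject to sum(x) = k and 0 <= x_j <= 1
    res = linprog(c=lam, A_eq=np.ones((1, n)), b_eq=[k], bounds=[(0, 1)] * n)
    print(res.fun, lam[:k].sum())             # LP optimum equals sum of the k smallest
    print(np.round(res.x, 6))                 # all weight sits on the first k coordinates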

Conversely, choosing $P = \sum_{j=1}^k u_j u_j^*$, it is straightforward to check that $P \in L(H, H)$ and $P^2 = P = P^*$. Therefore, by Proposition 1.1.3, $P$ is the orthogonal projection onto $R(P)$, which is a closed linear subspace of $H$. Moreover, writing $u_j = (u_j^1, u_j^2, \dots, u_j^n)^T$ entrywise, we have
\[
u_j u_j^* = \begin{bmatrix} u_j^1 \\ u_j^2 \\ \vdots \\ u_j^n \end{bmatrix}
\begin{bmatrix} \overline{u_j^1} & \overline{u_j^2} & \dots & \overline{u_j^n} \end{bmatrix}
= \begin{bmatrix}
u_j^1 \overline{u_j^1} & u_j^1 \overline{u_j^2} & \dots & u_j^1 \overline{u_j^n} \\
u_j^2 \overline{u_j^1} & u_j^2 \overline{u_j^2} & \dots & u_j^2 \overline{u_j^n} \\
\vdots & \vdots & \ddots & \vdots \\
u_j^n \overline{u_j^1} & u_j^n \overline{u_j^2} & \dots & u_j^n \overline{u_j^n}
\end{bmatrix}.
\]
Therefore we can easily see that
\[ \mathrm{tr}(P) = \sum_{j=1}^k \|u_j\|^2 = k. \]
But $\mathrm{rank}(P) = \mathrm{tr}(P) = k$ by Proposition 1.1.2, so $P \in \mathcal{P}_k$ as well. Thus:

min P P tr[ap ] tr[ap ] tr[a u j u j] tr[au j u j] tr[λ j u j u j] λ j tr(u j u j) λ j u j 2 λ j. And now we conclude that min tr[ap ] λ j. Moral: We can replace /min or min/ by a sequence of minimizations over P k, and therefore, we can relate the spectrum of submatrices to the whole matrix. Recall: 1.1.5 Theorem (Courant-Fischer). Suppose A M n is Hermitian, i.e. A A. Now, for each 1 k n, let {S α k } α I k, where α I k denote the set of all k dimensional linear subspaces of H, and enumerate the n eigenvalues λ 1,..., λ n (counting multiplicity) in increasing order, i.e. λ 1 λ 2,..., λ n. Then, we have 7

Recall:

1.1.5 Theorem (Courant-Fischer). Suppose $A \in M_n$ is Hermitian, i.e. $A = A^*$. For each $1 \leq k \leq n$, let $\{S_k^\alpha\}_{\alpha \in I_k}$ denote the collection of all $k$-dimensional linear subspaces of $H$, and enumerate the $n$ eigenvalues $\lambda_1, \dots, \lambda_n$ (counting multiplicity) in increasing order, i.e. $\lambda_1 \leq \lambda_2 \leq \dots \leq \lambda_n$. Then we have:

(i) $\displaystyle \min_{\alpha \in I_k} \; \max_{x \in S_k^\alpha \setminus \{0\}} \frac{\langle Ax, x \rangle}{\|x\|^2} = \lambda_k$.

(ii) $\displaystyle \max_{\alpha \in I_{n-k+1}} \; \min_{x \in S_{n-k+1}^\alpha \setminus \{0\}} \frac{\langle Ax, x \rangle}{\|x\|^2} = \lambda_k$.

Proof of part (ii). Let $W = \mathrm{span}\{u_1, u_2, \dots, u_k\}$, so $\dim W = k$. If $\dim S_{n-k+1} = n-k+1$, then by dimension counting, i.e.
\[ \dim(W \cap S_{n-k+1}) \geq \dim W + \dim S_{n-k+1} - n = k + (n-k+1) - n = 1, \]
we get $S_{n-k+1} \cap W \neq \{0\}$, and therefore there exists $x \in (S_{n-k+1} \cap W) \setminus \{0\}$ with
\[ x = \sum_{j=1}^k \langle x, u_j \rangle u_j. \]
Therefore
\[ R_A(x) = \frac{\langle Ax, x \rangle}{\|x\|^2} = \frac{\sum_{j=1}^k \lambda_j |\langle x, u_j \rangle|^2}{\|x\|^2} \leq \lambda_k, \]
since $\lambda_k \geq \lambda_{k-1} \geq \dots \geq \lambda_1$ and $\sum_{j=1}^k |\langle x, u_j \rangle|^2 = \|x\|^2$, because $\{u_j\}_{j=1}^k$ is an orthonormal basis for $W$. Therefore we have
\[ \min_{x \in S_{n-k+1}, \, x \neq 0} \frac{\langle Ax, x \rangle}{\|x\|^2} \leq \lambda_k. \]

Since the choice of $S_{n-k+1}$ was arbitrary,
\[ \max_{S_{n-k+1}} \; \min_{x \in S_{n-k+1}, \, x \neq 0} \frac{\langle Ax, x \rangle}{\|x\|^2} \leq \lambda_k. \]
But, on the other hand, there is a special choice of $S_{n-k+1}$, namely $S_{n-k+1} = \mathrm{span}\{u_k, u_{k+1}, \dots, u_n\}$, and then we have $S_{n-k+1} \cap W = \mathrm{span}\{u_k\}$. Finally, using Rayleigh-Ritz for $A|_{S_{n-k+1}}$:
\[ \min_{x \in S_{n-k+1}, \, x \neq 0} \frac{\langle Ax, x \rangle}{\|x\|^2} = \text{smallest eigenvalue of } A|_{S_{n-k+1}} = \lambda_k. \]

Note: If $k = 1$ in (i) or (ii), we recover Rayleigh-Ritz as a special case.

Counterexample in the non-Hermitian case. Let
\[ N = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix} \]
be the nilpotent matrix, and define the Rayleigh quotient $R_N(x)$ exactly as above in the Hermitian case. Then it is easy to see that the only eigenvalue of $N$ is zero, while the maximum value of the Rayleigh quotient is $1/2$. That is, the maximum value of the Rayleigh quotient is larger than the maximum eigenvalue.
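The counterexample is easy to confirm numerically (a sketch, not in the original notes): the Hermitian part of $N$ has largest eigenvalue $1/2$, which is the maximum of the Rayleigh quotient, while $N$ itself has only the eigenvalue 0.

    import numpy as np

    N = np.array([[0.0, 1.0], [0.0, 0.0]])
    print(np.linalg.eigvals(N))          # both eigenvalues of N are 0
    # max of <Nx,x>/||x||^2 over real x equals the top eigenvalue of the Hermitian part
    H = (N + N.T) / 2
    print(np.linalg.eigvalsh(H)[-1])     # 0.5 > 0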

Applications of Courant-Fischer

1.1.6 Theorem (Weyl). Let $A, B \in M_n$ be Hermitian with eigenvalues $\{\lambda_j(A)\}_{j=1}^n$, $\{\lambda_j(B)\}_{j=1}^n$ and $\{\lambda_j(A+B)\}_{j=1}^n$, all arranged in non-decreasing order.[8] We then have
\[ \lambda_k(A) + \lambda_1(B) \leq \lambda_k(A+B) \leq \lambda_k(A) + \lambda_n(B). \]

Proof. We know from Rayleigh-Ritz that for any nonzero vector $x \in \mathbb{C}^n$:
\[ \lambda_1(B) \leq \frac{\langle Bx, x \rangle}{\|x\|^2} \leq \lambda_n(B). \]
So, in order to prove the first inequality, applying Courant-Fischer to $A + B$ we have
\[
\lambda_k(A+B) = \min_{S_k} \max_{x \in S_k, \, x \neq 0} \frac{\langle (A+B)x, x \rangle}{\|x\|^2}
= \min_{S_k} \max_{x \in S_k, \, x \neq 0} \left( \frac{\langle Ax, x \rangle}{\|x\|^2} + \frac{\langle Bx, x \rangle}{\|x\|^2} \right) \quad (*)
\]
\[
\geq \min_{S_k} \max_{x \in S_k, \, x \neq 0} \left( \frac{\langle Ax, x \rangle}{\|x\|^2} + \lambda_1(B) \right)
= \min_{S_k} \max_{x \in S_k, \, x \neq 0} \frac{\langle Ax, x \rangle}{\|x\|^2} + \lambda_1(B)
= \lambda_k(A) + \lambda_1(B),
\]
where the last equality follows from Courant-Fischer for $A$. To prove the second inequality, we instead estimate the second term in $(*)$ by $\langle Bx, x \rangle / \|x\|^2 \leq \lambda_n(B)$, which similarly gives $\lambda_k(A+B) \leq \lambda_k(A) + \lambda_n(B)$.

[8] $A$ and $B$ could be considered as the kinetic and potential energy matrices of the Schrödinger Hamiltonian operator in quantum mechanics.
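Weyl's inequalities are easy to test numerically (a minimal sketch assuming numpy, not from the original notes):

    import numpy as np

    rng = np.random.default_rng(3)
    n = 7
    def rand_herm(n):
        G = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
        return (G + G.conj().T) / 2

    A, B = rand_herm(n), rand_herm(n)
    a = np.linalg.eigvalsh(A)       # increasing order
    b = np.linalg.eigvalsh(B)
    ab = np.linalg.eigvalsh(A + B)
    # lambda_k(A) + lambda_1(B) <= lambda_k(A+B) <= lambda_k(A) + lambda_n(B)
    assert np.all(a + b[0] <= ab + 1e-8)
    assert np.all(ab <= a + b[-1] + 1e-8)
    print("Weyl bounds hold for every k")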

In special cases, we can deduce simpler inequalities.

1.1.7 Definition. A matrix $B \in M_n$ is called positive semidefinite if it is Hermitian and $\langle Bx, x \rangle \geq 0$ for each $x \in \mathbb{C}^n$.

1.1.8 Corollary. Let $A, B \in M_n$ be Hermitian with $B$ positive semidefinite. Then $\lambda_k(A) \leq \lambda_k(A + B)$.

Proof. Follows from Weyl's theorem and the fact that $\lambda_1(B) \geq 0$.

1.1.9 Remark. A positive semidefinite rank-one matrix $B$ is of the form $B = zz^*$. Indeed, since $B$ is Hermitian of rank one, we can write it as $B = \lambda uu^*$ with $\lambda \geq 0$ by positive semidefiniteness, so we can choose $z = \sqrt{\lambda}\, u$.

1.1.10 Theorem (Interlacing Theorem).[9] Let $A \in M_n$ be Hermitian and $z \in \mathbb{C}^n$. If $\{\lambda_j(A)\}_{j=1}^n$ and $\{\lambda_j(A \pm zz^*)\}_{j=1}^n$ are in non-decreasing order, then the eigenvalues interlace, that is, for $1 \leq k \leq n-2$:
\[ \lambda_k(A \pm zz^*) \leq \lambda_{k+1}(A) \leq \lambda_{k+2}(A \pm zz^*). \]

[9] If you imagine the eigenvalues of $A \pm zz^*$ and of $A$ arranged in ascending order on two parallel vertical lines, then their comparative order somewhat resembles the way shoelaces criss-cross when you tie your shoes.
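The interlacing pattern can likewise be observed numerically for a rank-one update (a sketch assuming numpy; not part of the original notes):

    import numpy as np

    rng = np.random.default_rng(4)
    n = 6
    G = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    A = (G + G.conj().T) / 2
    z = rng.standard_normal(n) + 1j * rng.standard_normal(n)
    lam = np.linalg.eigvalsh(A)                                # increasing order
    for s in (+1, -1):
        mu = np.linalg.eigvalsh(A + s * np.outer(z, z.conj()))  # spectrum of A +/- zz*
        assert np.all(mu[:-1] <= lam[1:] + 1e-8)    # lambda_k(A+-zz*) <= lambda_{k+1}(A)
        assert np.all(lam[1:-1] <= mu[2:] + 1e-8)   # lambda_{k+1}(A) <= lambda_{k+2}(A+-zz*)
    print("interlacing holds for both signs")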

Application of the min-max / max-min Theorem in Game Theory:[10] Finding the Nash Equilibrium

Game theory attempts to mathematically explain behavior in situations in which an individual's outcome depends on the actions of others.

1.1.11 Definition. An $n$-person game is one in which there are $n$ players and a payoff function, which assigns an $n$-vector to each terminal vertex of the game, indicating each player's earnings.

1.1.12 Definition. A strategy refers to a player's plan specifying which choices it will make in every possible situation, leading to an eventual outcome. Let $\Sigma_i$ denote the set of all strategies for player $i$. In order to decide which strategy is best, player $i$ will choose the strategy which maximizes its payoff (i.e., the $i$-th component of the payoff function). Writing the payoff function $\pi$ in terms of the strategies used, given that each player $i$ uses strategy $\sigma_i \in \Sigma_i$:
\[ \pi(\sigma_1, \sigma_2, \dots, \sigma_n) = \big(\pi_1(\sigma_1, \dots, \sigma_n), \pi_2(\sigma_1, \dots, \sigma_n), \dots, \pi_n(\sigma_1, \dots, \sigma_n)\big), \]
where $\sigma_i$ represents player $i$'s strategy and $\pi_i$ represents player $i$'s payoff under that combination of strategies. It is possible to express this function through an $n$-dimensional array of $n$-vectors, called the normal form of the game.

1.1.13 Definition. A strategy $n$-tuple $(\bar\sigma_1, \bar\sigma_2, \dots, \bar\sigma_n)$ is said to be a Nash equilibrium if and only if no player has any reason to change its strategy, assuming the other players do not change theirs. That is, $(\bar\sigma_1, \bar\sigma_2, \dots, \bar\sigma_n)$ is in equilibrium if, for any $i = 1, \dots, n$ and any $\sigma_i \in \Sigma_i$,
\[ \pi_i(\bar\sigma_1, \dots, \bar\sigma_{i-1}, \sigma_i, \bar\sigma_{i+1}, \dots, \bar\sigma_n) \leq \pi_i(\bar\sigma_1, \bar\sigma_2, \dots, \bar\sigma_n). \]

1.1.14 Definition. A mixed strategy is a probability distribution on the set of a player's pure strategies. When a player has a finite number $m$ of pure strategies, its mixed strategy can be expressed as an $m$-vector $x = (x_1, \dots, x_m)$ such that $x_i \geq 0$ and $\sum_{i=1}^m x_i = 1$.

Suppose players 1 and 2 have payoff matrices $A$ and $B$. Let $X$ denote the set of all mixed strategies for player 1, and $Y$ the set of all mixed strategies for player 2. If player 1 chooses mixed strategy $x$ while player 2 chooses mixed strategy $y$, then the expected payoffs are $\langle x, Ay \rangle$ and $\langle y, Bx \rangle$, and the equilibrium values are given by the max-min problems
\[ P_A = \max_{x \in X} \min_{y \in Y} \langle x, Ay \rangle, \qquad P_B = \max_{y \in Y} \min_{x \in X} \langle y, Bx \rangle. \]
It is straightforward to check that, in the case of the payoff matrix of the famous prisoner's dilemma, the Nash equilibrium payoffs are in fact the pair of smallest eigenvalues.

[10] Cf. John von Neumann and Oskar Morgenstern, Theory of Games and Economic Behavior, Princeton University Press, 1947.

Prisoner's dilemma. Example of a PD payoff matrix $M$ (in each pair, the first entry is player 1's payoff and the second is player 2's):

                              Cooperate (with other)   Defect (betray other)
    Cooperate (with other)           (2, 2)                   (0, 3)
    Defect (betray other)            (3, 0)                   (1, 1)

Since
\[ A = \begin{bmatrix} 2 & 0 \\ 3 & 1 \end{bmatrix}, \qquad B = \begin{bmatrix} 2 & 3 \\ 0 & 1 \end{bmatrix}, \]
the Nash equilibrium payoff is $(1, 1) = (\lambda_{\min}(A), \lambda_{\min}(B))$.
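As a closing sketch (not in the original notes, assuming numpy), one can brute-force the pure Nash equilibrium of the prisoner's dilemma and compare it with the smallest eigenvalues noted above:

    import numpy as np

    A = np.array([[2, 0], [3, 1]])   # player 1's payoffs (rows: own C/D, columns: opponent C/D)
    B = np.array([[2, 3], [0, 1]])   # player 2's payoffs, indexed the same way

    labels = ("Cooperate", "Defect")
    for i in range(2):
        for j in range(2):
            # (i, j) is a pure NE iff neither player gains by deviating unilaterally
            if A[i, j] == A[:, j].max() and B[i, j] == B[i, :].max():
                print("NE:", labels[i], "/", labels[j], "payoffs", (A[i, j], B[i, j]))
    # smallest eigenvalues of A and B: both equal 1, matching the NE payoff (1, 1)
    print(np.linalg.eigvals(A).real.min(), np.linalg.eigvals(B).real.min())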