BACKGROUND: WEAK CONVERGENCE, LINEAR ALGEBRA 1. WEAK CONVERGENCE

Definition 1. Let $(X,d)$ be a complete, separable metric space (also known as a Polish space). The Borel $\sigma$-algebra on $X$ is the minimal $\sigma$-algebra containing the open (and hence also closed) subsets of $X$. If $\mu_n$ and $\mu$ are finite Borel measures on $X$, then $\mu_n$ converges weakly (or in distribution) to $\mu$, written $\mu_n \Rightarrow \mu$, if for every bounded, continuous function $f : X \to \mathbb{R}$,
$$\lim_{n\to\infty} \int f\,d\mu_n = \int f\,d\mu. \tag{1}$$

Proposition 1. If $X$ is a compact metric space then the space $C(X)$ of continuous, real-valued functions on $X$ is separable in the topology of uniform convergence, that is, there is a countable subset $G \subset C(X)$ such that for every function $f \in C(X)$ and every $\varepsilon > 0$ there exists $g \in G$ such that $\|f - g\|_\infty < \varepsilon$.

Proof. See your favorite analysis textbook. In the special case where $X$ is a compact subset of $\mathbb{R}^d$ for some $d < \infty$ (which is probably the only case that will occur in this course) the proposition follows easily from the Weierstrass Polynomial Approximation Theorem.

Proposition 2. (Weierstrass) Let $X$ be a compact subset of $\mathbb{R}^d$. Then the polynomial functions in $d$ variables on $X$ are uniformly dense in $C(X)$, that is, for every function $f \in C(X)$ and every $\varepsilon > 0$ there is a polynomial $p(x) = p(x_1, x_2, \ldots, x_d)$ such that $\|f - p\|_\infty < \varepsilon$.

Proposition 3. (Riesz Representation Theorem)
(A) Let $X$ be a compact metric space. Then every bounded, positive, linear functional $\lambda : C(X) \to \mathbb{R}$ is given by integration against a finite, positive, Borel measure $\mu$: that is,
$$\lambda(g) = \int g\,d\mu \quad \forall\, g \in C(X). \tag{2}$$
(B) Let $X$ be a Polish space. Then every bounded, positive, linear functional $\lambda : C_c(X) \to \mathbb{R}$ on the space $C_c(X)$ of continuous functions with compact support is given by integration against a positive Borel measure $\mu$ that attaches finite mass to every compact subset of $X$. Thus, (2) holds for all $g \in C_c(X)$.

Note: A linear functional $\lambda : C(X) \to \mathbb{R}$ is said to be bounded if there is a constant $A < \infty$ such that $|\lambda(g)| \le A\|g\|_\infty$ for every $g \in C(X)$, and positive if $\lambda(g) \ge 0$ for every nonnegative continuous function $g : X \to \mathbb{R}$.

Proof. See Rudin, Real and Complex Analysis, ch. 2 for the standard proof.

Proposition 4. (Helly's Selection Principle) Let $X$ be a compact metric space. Then every sequence $\mu_n$ of finite positive Borel measures for which $\sup_n \mu_n(X) < \infty$ has a weakly convergent subsequence.

Proof. Let $G \subset C(X)$ be a countable, uniformly dense set of continuous functions on $X$. For every $g \in G$ the sequence $\int g\,d\mu_n$ is bounded in $\mathbb{R}$, since $A := \sup_n \mu_n(X) < \infty$, and so by Cantor's diagonal argument there is a subsequence $\mu_k$ of $\mu_n$ such that
$$\lambda(g) := \lim_{k\to\infty} \int g\,d\mu_k \tag{3}$$
exists and is finite for all $g \in G$. Since $G$ is uniformly dense in $C(X)$, it follows by a routine argument that the convergence (3) holds for all $g \in C(X)$. Furthermore, it is easy to check that the limit $\lambda(g)$ is a linear functional on the space $C(X)$, and is positive and bounded (because $|\lambda(g)| \le A\|g\|_\infty$). Therefore, by the Riesz Representation Theorem, there is a finite, positive, Borel measure $\mu$ on $X$ such that
$$\lambda(g) = \int g\,d\mu \quad \forall\, g \in C(X).$$
This, together with the convergence (3), implies that $\mu_k \Rightarrow \mu$.

Definition 2. Let $X$ be a Polish space. A family $\{\mu_i\}_{i \in I}$ of finite, positive Borel measures on $X$ is said to be tight if for every $\varepsilon > 0$ there is a compact subset $K \subset X$ such that
$$\mu_i(K^c) < \varepsilon \quad \forall\, i \in I.$$

Proposition 5. Let $X$ be a Polish space. Then every tight sequence $\mu_n$ of finite, positive Borel measures on $X$ such that $A := \sup_n \mu_n(X) < \infty$ has a weakly convergent subsequence.

Remark 1. This is usually called Prohorov's theorem. The converse is also true, but will not be needed. Note that the characterization of weak sequential compactness provided by this theorem is highly specific to the notion of weak convergence given in Definition 1: if (1) were only required to hold for continuous functions $f : X \to \mathbb{R}$ with compact support, then the tightness hypothesis would not be needed. See Billingsley, Convergence of Probability Measures, ch. 1 for further information.

Proof. The hypothesis that $X$ is a Polish space guarantees that there is a sequence of compact subsets $K_n$ whose union is $X$, and such that each $K_n$ is contained in the interior of $K_{n+1}$. (For $X = \mathbb{R}^d$ this is obvious: just take $K_n = [-n,n]^d$. Note that the requirement $K_n \subset \operatorname{int}(K_{n+1})$ ensures that there is a continuous function $f : X \to [0,1]$ that is 1 on $K_n$ and 0 off $K_{n+1}$, by the Urysohn Lemma.)
By the Helly Selection Principle, for each $K_n$ there is a subsequence of $\mu_n$ that converges weakly on $K_n$. Hence, by the Cantor diagonal argument, there is a subsequence $\mu_k$ that converges weakly to a measure $\nu_n$ on each $K_n$. Observe that $\nu_n(K_n) \le A$ for each $n$. The problem is to show that $\mu_k$ converges weakly to a finite measure on $X$.

(1) Choose $f \in C_b(X)$. For any $\varepsilon > 0$ there exists $n$ so large that $\mu_k(K_n^c) < \varepsilon$ for every $k \ge 1$. (Exercise: why?) Now
$$\lim_{k\to\infty} \int_{K_n} f\,d\mu_k = \int_{K_n} f\,d\nu_n \quad\text{and}\quad \Big|\int_{K_n^c} f\,d\mu_k\Big| \le \varepsilon\|f\|_\infty \quad \forall\, k.$$
Since $\varepsilon > 0$ is arbitrary, it follows routinely that
$$\lambda(f) := \lim_{k\to\infty} \int f\,d\mu_k$$
exists. Since $\sup_k \mu_k(X) \le A$, it is clear that $|\lambda(f)| \le A\|f\|_\infty$, so the functional $\lambda$ is bounded. It is easy to check that $\lambda$ is linear and positive.

(2) It remains to show that the functional $f \mapsto \lambda(f)$ is given by integration against a finite Borel measure on $X$. For this, note that the restriction of $\lambda$ to the space $C_c(X)$ of continuous functions with compact support is bounded, linear, and positive, so the Riesz Representation Theorem implies that there is a positive measure $\nu$ such that
$$\lambda(f) = \int f\,d\nu \quad \forall\, f \in C_c(X).$$
To see that this equality extends to all $f \in C_b(X)$, observe that each such $f$ can be arbitrarily well approximated by functions in $C_c(X)$. In particular, for each $n$ there is a continuous function $h_n : X \to [0,1]$ that is 1 on $K_n$ and 0 off $K_{n+1}$. For each $n$ the function $h_n f$ is continuous, has support contained in $K_{n+1}$, and agrees with $f$ on $K_n$. Furthermore, since $f - f h_n$ is bounded in absolute value by $\|f\|_\infty$,
$$|\lambda(f) - \lambda(f h_n)| \le 2\varepsilon\|f\|_\infty$$
for all $n$ large enough that $\mu_k(K_n^c) < \varepsilon$. Therefore, by the Dominated Convergence Theorem,
$$\lambda(f) = \lim_{n\to\infty} \lambda(f h_n) = \lim_{n\to\infty} \int f h_n\,d\nu = \int f\,d\nu.$$
Finally, observe that this identity, with $f \equiv 1$, implies that the total mass of $\nu$ is at most $A$, because $\lambda(1) = \lim_k \mu_k(X) \le A$.

Proposition 6. Let $\mu, \nu$ be finite Borel measures, both with compact support $K \subset \mathbb{R}^d$. If $\mu$ and $\nu$ have the same moments, that is, if for every monomial $x_1^{n_1} x_2^{n_2} \cdots x_d^{n_d}$,
$$\int x_1^{n_1} x_2^{n_2} \cdots x_d^{n_d}\,d\mu = \int x_1^{n_1} x_2^{n_2} \cdots x_d^{n_d}\,d\nu, \tag{4}$$
then they are equal as measures. More generally, if $\mu$ and $\nu$ are finite positive Borel measures on $\mathbb{R}^d$ with finite moment generating functions in a neighborhood of the origin, then equality of moments (4) implies that $\mu = \nu$.

Remark 2. In general, a probability measure is not uniquely determined by its moments, even when they are all finite. For further information on this subject, see, e.g., Rudin, Real and Complex Analysis, chapter on the Denjoy-Carleman theorem.

Proof. Consider first the case where both $\mu$ and $\nu$ are supported by a compact set $K$. Equality of moments (4) implies that $\int f\,d\mu = \int f\,d\nu$ for every polynomial $f$, and therefore, by the Weierstrass theorem, for all continuous functions $f$ on $K$. This in turn implies that $\mu(F) = \nu(F)$ for every rectangle $F$, by an easy approximation argument, and therefore for every Borel set $F$.
Now consider the case where both measures have finite moment generating functions in a neighborhood of 0, that is,
$$\int e^{\theta^T x}\,d\mu(x) + \int e^{\theta^T x}\,d\nu(x) < \infty$$
for all $\theta$ in a neighborhood of the origin in $\mathbb{R}^d$. In this case the moment generating functions extend to complex arguments $z = (z_1, z_2, \ldots, z_d)$, and define holomorphic (see Remark 3 below) functions of $d$ variables:
$$\varphi_\mu(z) := \int e^{z^T x}\,d\mu(x) \quad\text{and}\quad \varphi_\nu(z) := \int e^{z^T x}\,d\nu(x). \tag{5}$$
Equality of moments implies that these two holomorphic functions have the same power series coefficients, and therefore are equal in a neighborhood of zero. It now follows that the two measures are the same (e.g., by the Fourier inversion theorem).

Remark 3. That the functions $\varphi_\mu(z)$ and $\varphi_\nu(z)$ defined by (5) are holomorphic in their arguments $z_1, z_2, \ldots, z_d$ is a consequence of the Cauchy, Morera, and Fubini theorems, by a standard argument, using the fact that $e^{z^T x}$ is holomorphic in $z$ for each $x$. The Morera theorem states that a continuous function $f(\zeta)$ is holomorphic in a domain $\Omega \subset \mathbb{C}$ if it integrates to 0 on every closed curve $\gamma$ that is contractible in $\Omega$. The Cauchy integral theorem asserts that if $f(\zeta)$ is holomorphic in a domain $\Omega$ then it integrates to zero on every closed curve $\gamma$ that is contractible in $\Omega$. Thus, if $f(\zeta, x)$ is holomorphic in $\zeta$ for each $x \in X$, and $\mu$ is a Borel measure on $X$, then $\int_X f(\zeta, x)\,d\mu(x)$ is holomorphic in $\zeta$, provided the conditions of the Fubini theorem are satisfied.

Proposition 7. (Method of Moments) Let $\mu$ be a probability measure on $\mathbb{R}^d$ with all moments finite, and such that it is the only finite measure on $\mathbb{R}^d$ with these moments. Let $\mu_n$ be a sequence of probability measures on $\mathbb{R}^d$ whose moments converge as $n \to \infty$ to the moments of $\mu$. Then $\mu_n \Rightarrow \mu$.

Proof. Convergence of the second moments $\int x_j^2\,d\mu_n(x)$ is enough to ensure that the sequence $\mu_n$ is tight (by Chebyshev's inequality). Thus, every subsequence of $\mu_n$ has a subsequence that converges weakly, by Prohorov's theorem. But any possible limit must have moments that agree with those of $\mu$; since $\mu$ is uniquely determined by its moments, it is the only possible accumulation point of the sequence $\mu_n$. Hence $\mu_n \Rightarrow \mu$.

The method of moments is the most useful tool for proving convergence of empirical spectral distributions in the theory of random matrices. There is one other useful tool, the Stieltjes transform:

Definition 3. Let $\mu$ be a finite, positive Borel measure on $\mathbb{R}$. The Stieltjes transform $F_\mu(z)$ is the holomorphic (see Remark 3) function of $z \in \mathbb{C} \setminus \mathbb{R}$ defined by
$$F_\mu(z) = \int (x - z)^{-1}\,d\mu(x).$$

Remark 4. Since the measure $\mu$ is supported by $\mathbb{R}$, the Stieltjes transform satisfies $F_\mu(\bar z) = \overline{F_\mu(z)}$.

Proposition 8. Let $\mu_n$ be a tight sequence of probability measures on $\mathbb{R}$.
If the Stieltjes transforms $F_n(z)$ of the measures $\mu_n$ converge for all $z$ in a set $A \subset \mathbb{C} \setminus \mathbb{R}$ with an accumulation point in $\mathbb{C} \setminus \mathbb{R}$ to a limit function $F(z)$ defined on $A$, then the sequence $\mu_n$ converges weakly to a probability measure whose Stieltjes transform agrees with $F(z)$ on $A$.

Proof. Since the sequence $\mu_n$ is tight, every subsequence has a weakly convergent subsequence. For every convergent subsequence $\mu_k$, the Stieltjes transforms $F_k(z)$ converge to the Stieltjes transform of the limit measure for all $z \in \mathbb{C} \setminus \mathbb{R}$, because $x \mapsto (x - z)^{-1}$ is a bounded, continuous function of $x \in \mathbb{R}$. (Note: This function is complex-valued, but its real and imaginary parts are bounded, continuous, real-valued functions.) Thus, the limit measure must have a Stieltjes transform that agrees with $F(z)$ on the set $A$. Since the set $A$ has an accumulation point, analyticity guarantees that the Stieltjes transform of the limit measure is uniquely determined by its values $F(z)$ on $A$.
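The two properties of the Stieltjes transform used above (the conjugation symmetry of Remark 4 and the boundedness of $x \mapsto (x-z)^{-1}$) can be checked numerically for a discrete measure. The following is only a sketch: the function name `stieltjes`, the atoms, and the test point `z` are illustrative choices, not part of the notes.

```python
def stieltjes(atoms, z):
    """Stieltjes transform of the uniform probability measure on `atoms`.

    `atoms` is a list of real numbers (hypothetical support points) and
    `z` is a complex number off the real axis.
    """
    return sum(1.0 / (x - z) for x in atoms) / len(atoms)


atoms = [-1.0, 0.5, 2.0]   # illustrative support points
z = 0.3 + 0.7j             # illustrative point in the upper half-plane

value = stieltjes(atoms, z)

# Remark 4: F(conj z) = conj F(z).
symmetry_gap = abs(stieltjes(atoms, z.conjugate()) - value.conjugate())

# |(x - z)^(-1)| <= 1/|Im z| for real x, so |F(z)| <= 1/|Im z|
# when the measure has total mass 1.
bound = 1.0 / abs(z.imag)

print(symmetry_gap, abs(value), bound)
```

The bound $|F_\mu(z)| \le 1/|\operatorname{Im} z|$ is what makes $x \mapsto (x-z)^{-1}$ an admissible test function for weak convergence in the proof of Proposition 8.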

2. MATRIX THEORY: TRACE AND DETERMINANT

2.1. Trace. The trace of a square matrix $M = (m_{i,j})$, denoted by $\operatorname{tr}(M)$, is the sum $\sum_i m_{i,i}$ of its diagonal entries. The following is elementary but important:

Proposition 9. Let $A = (a_{i,j})_{m \times n}$ and $B = (b_{i,j})_{n \times m}$ be real or complex matrices. Then
$$\operatorname{tr}(AB) = \operatorname{tr}(BA) = \sum_{i=1}^{m} \sum_{j=1}^{n} a_{i,j} b_{j,i}. \tag{6}$$
Consequently, if $A$ and $B$ are similar square matrices, that is, if there is an invertible matrix $U$ such that $B = U^{-1}AU$, then
$$\operatorname{tr}(A) = \operatorname{tr}(B). \tag{7}$$

A square matrix $A$ is diagonalizable if there exist an invertible matrix $U$ and a diagonal matrix $D$ such that $A = U^{-1}DU$. You should recall that $A$ is diagonalizable if and only if the underlying vector space has a basis consisting of eigenvectors of $A$; in this case, the diagonalization $A = U^{-1}DU$ is obtained by letting $U^{-1}$ be the matrix whose columns are the eigenvectors, and $D$ the diagonal matrix whose diagonal entries are the corresponding eigenvalues. Thus, for diagonalizable $A$, the trace $\operatorname{tr}(A)$ is just the sum of the eigenvalues. More generally:

Corollary 10. Assume that $A$ is a diagonalizable $n \times n$ matrix with eigenvalues $\lambda_1, \ldots, \lambda_n$. Then for any integer $k \ge 0$,
$$\operatorname{tr}(A^k) = \sum_{i=1}^{n} \lambda_i^k. \tag{8}$$
Furthermore, for any analytic function $f(z)$ defined by a power series $f(z) = \sum_{k=0}^{\infty} a_k z^k$ with radius of convergence $R > 0$, if $A$ has spectral radius $< R$ (that is, if all eigenvalues of $A$ have absolute value $< R$) then
$$\operatorname{tr}(f(A)) = \sum_{i=1}^{n} f(\lambda_i). \tag{9}$$

Both assertions are easy consequences of Proposition 9. Relation (8) is especially useful in conjunction with the method of moments, as it gives an effective way of computing the moments of the empirical spectral distribution (defined below). Similarly, relation (9) gives a handle on various transforms of the empirical spectral distribution, in particular, the Stieltjes transform.

Definition 4. If $A$ is a diagonalizable $n \times n$ matrix with eigenvalues $\lambda_i$, the empirical spectral distribution $F^A$ of $A$ is defined to be the uniform distribution on the eigenvalues (counted according to multiplicity), that is,
$$F^A := n^{-1} \sum_{i=1}^{n} \delta_{\lambda_i}. \tag{10}$$
Relation (8) implies
$$\int \lambda^k\,F^A(d\lambda) = n^{-1} \operatorname{tr}(A^k). \tag{11}$$
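Relations (6), (8) and (11) are easy to test on small integer matrices, where all arithmetic is exact. The sketch below uses plain Python lists of rows; the helper names (`matmul`, `trace`, `matpow`, `esd_moment`) and the example matrices are illustrative assumptions, and the symmetric matrix `S` was chosen because its eigenvalues, 1 and 3, are known in closed form.

```python
from fractions import Fraction


def matmul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def trace(M):
    """Sum of the diagonal entries of a square matrix."""
    return sum(M[i][i] for i in range(len(M)))


def matpow(A, k):
    """k-th power of a square matrix (k >= 0), by repeated multiplication."""
    n = len(A)
    P = [[int(i == j) for j in range(n)] for i in range(n)]  # identity
    for _ in range(k):
        P = matmul(P, A)
    return P


def esd_moment(eigs, k):
    """k-th moment of the uniform distribution on the eigenvalues `eigs`."""
    return Fraction(sum(lam ** k for lam in eigs), len(eigs))


# Relation (6) for rectangular factors: A is 2 x 3 and B is 3 x 2.
A = [[1, 2, 3], [4, 5, 6]]
B = [[1, 0], [0, 1], [2, 2]]
tr_AB, tr_BA = trace(matmul(A, B)), trace(matmul(B, A))

# Relations (8) and (11) for S = [[2, 1], [1, 2]], with eigenvalues 1 and 3.
S = [[2, 1], [1, 2]]
S_eigs = [1, 3]
moments_from_trace = [Fraction(trace(matpow(S, k)), len(S)) for k in range(6)]
moments_from_eigs = [esd_moment(S_eigs, k) for k in range(6)]

print(tr_AB, tr_BA, moments_from_trace == moments_from_eigs)
```

The point of (11) in practice is the left column of this comparison: the moments are computed from matrix entries alone, without finding any eigenvalues.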

2.2. Determinant. The determinant of an $n \times n$ matrix $A = (a_{i,j})$ is defined by the combinatorial formula
$$\det(A) := \sum_{\sigma \in S_n} (-1)^{\sigma} \prod_{i=1}^{n} a_{i,\sigma(i)}. \tag{12}$$
The sum is over the set $S_n$ of permutations of the index set $[n]$, and for each permutation $\sigma \in S_n$, the sign (or parity) $(-1)^{\sigma}$ is defined to be $(-1)^{n - N(\sigma)}$, where $N(\sigma)$ is the number of cycles (fixed points included) in the cycle representation of $\sigma$. Note that if $A$ is diagonal, then its determinant is just the product of the diagonal entries (the only permutation that gives a nonzero contribution to the sum (12) is the identity permutation). Also, the combinatorial formula (12) implies
$$\det(A^T) = \det(A), \tag{13}$$
where $A^T$ is the transpose of $A$. (Proof: exercise. You will need the fact, which you should also prove, that the parity $(-1)^{\sigma}$ of any permutation $\sigma$ is the same as that of its inverse.) The following recursive algorithm for computing the determinant is another routine consequence of the definition: for any fixed row index $k$,
$$\det(A) = \sum_{j=1}^{n} (-1)^{j+k} a_{k,j} \det(A_{k,j}), \tag{14}$$
where $A_{k,j}$ is the $(n-1) \times (n-1)$ matrix obtained from $A$ by deleting the $k$th row and the $j$th column.

Proposition 11. The determinant $\det(A)$ is multilinear and antisymmetric in the columns of $A$, that is, for column vectors $a_i$ and $a_i'$, scalars $b, b'$, and permutations $\sigma$,
$$\det(b a_1 + b' a_1', a_2, a_3, \ldots, a_n) = b \det(a_1, a_2, \ldots, a_n) + b' \det(a_1', a_2, \ldots, a_n) \tag{15}$$
and
$$\det(a_{\sigma(1)}, a_{\sigma(2)}, \ldots, a_{\sigma(n)}) = (-1)^{\sigma} \det(a_1, a_2, \ldots, a_n). \tag{16}$$

These are both easy consequences of the definition (12). (For the second equation (16), use the fact that the parity functional is multiplicative, that is, $(-1)^{\sigma\tau} = (-1)^{\sigma} (-1)^{\tau}$.) Proposition 11 implies that the determinant is unaffected by column operations (recall that a column operation consists of adding a scalar multiple of a column to another column), and that column transpositions change the sign of the determinant. In view of the transpose rule (13), this also implies that the determinant is unaffected by row operations, and that row transpositions change the sign of the determinant.

Corollary 12.
The determinant $\det(A)$ is zero if $A$ is not full rank, that is, if the columns are linearly dependent.

Proof. If a column of $A$ is 0, then the determinant is zero, because each product in the sum (12) has a factor from this column. Also, if two columns of $A$ are identical then $\det(A) = 0$, by antisymmetry. (Apply (16) with $\sigma$ = the transposition that switches the two identical columns. Since the parity $(-1)^{\sigma}$ of a transposition is always $-1$, (16) implies that $\det(A) = -\det(A)$.) Hence, if two columns of $A$ are proportional, then $\det(A) = 0$. Now suppose that the columns of $A$ are linearly dependent. Then there is a nontrivial relation among the columns; without loss of generality (by applying a suitable permutation to the columns and using (16)), the relation is of the form
$$a_1 + \beta_2 a_2 + \cdots + \beta_n a_n = 0,$$
where $a_i$ is the $i$th column of $A$ and the $\beta_i$ are scalars. But this implies that a sequence of column operations (successively replace the first column by its sum with $\beta_j$ times the $j$th column) will lead to a matrix whose first column is 0. Thus, the determinant of $A$ is 0.

Proposition 13. If $A$ and $B$ are both $n \times n$ matrices, then
$$\det(AB) = \det(A)\det(B) = \det(B)\det(A) = \det(BA). \tag{17}$$
Consequently, if $A$ is invertible then $\det(A^{-1}) = \det(A)^{-1} \ne 0$, and if the matrices $A$ and $B$ are similar, then they have the same determinant. In particular, if $A$ is diagonalizable with eigenvalues $\lambda_i$, then
$$\det(A) = \prod_{i=1}^{n} \lambda_i. \tag{18}$$

Proof. (Sketch) The only nontrivial assertion is the product rule $\det(AB) = \det(A)\det(B)$. It is easy to check that this identity holds for upper triangular matrices $A$ and $B$. The general case can be deduced from this using the fact that any square matrix can be put into upper triangular form by either row or column operations and transpositions.

3. HERMITIAN, UNITARY, AND ORTHOGONAL MATRICES

3.1. Spectral Theorems. A Hermitian matrix is a square complex matrix that equals its conjugate transpose. A matrix with real entries is therefore Hermitian if and only if it is symmetric. A unitary matrix is a square complex matrix whose conjugate transpose is its inverse. In spectral problems, it is often advantageous to take a coordinate-free approach, using an inner product to define Hermitian and unitary operators. Thus, let $V$ be a complex vector space, and let $\langle \cdot, \cdot \rangle$ be a complex inner product on $V$. Recall the definition (see Rudin, Real and Complex Analysis, ch. 4):

Definition 5. An inner product on a complex vector space $V$ is a mapping $V \times V \to \mathbb{C}$ that satisfies
(a) $\langle u, v \rangle = \overline{\langle v, u \rangle}$.
(b) $\langle \alpha u + \alpha' u', v \rangle = \alpha \langle u, v \rangle + \alpha' \langle u', v \rangle$.
(c) $\langle u, u \rangle > 0$ for all $u \ne 0$.
(d) $\langle 0, 0 \rangle = 0$.

The difference between the real and complex cases is rule (a); this together with (b) implies that $\langle u, \alpha v \rangle = \bar\alpha \langle u, v \rangle$. The natural inner product on $\mathbb{C}^n$ is $\langle u, v \rangle = \sum_i u_i \bar v_i$.
More generally, the natural inner product on the space $L^2(\mu)$ of square-integrable complex-valued functions on a measure space $(\Omega, \mathcal{F}, \mu)$ is
$$\langle f, g \rangle = \int f \bar g\,d\mu;$$
when $\mu$ is a probability measure this is a complex analogue of the covariance. An inner product space is a vector space equipped with an inner product. Two vectors $u, v$ in an inner product space are said to be orthogonal if $\langle u, v \rangle = 0$. An orthonormal set is a set of unit vectors such that any two are orthogonal. You should recall that the Gram-Schmidt algorithm produces orthonormal bases.

Definition 6. Let $V$ be a finite-dimensional inner product space. For any operator (i.e., linear transformation) $T : V \to V$ the adjoint $T^*$ is the unique linear transformation such that
$$\langle Tu, v \rangle = \langle u, T^* v \rangle \quad \forall\, u, v \in V. \tag{19}$$

The linear transformation $T$ is called Hermitian if $T = T^*$, and unitary if $T^{-1} = T^*$; equivalently,
$$\langle Tu, v \rangle = \langle u, Tv \rangle \quad \text{(Hermitian)} \tag{20}$$
$$\langle Tu, Tv \rangle = \langle u, v \rangle \quad \text{(Unitary)} \tag{21}$$

Theorem 14. (Spectral Theorem for Hermitian Operators) Let $T$ be a Hermitian operator on a finite-dimensional inner product space $V$. Then all eigenvalues $\lambda_i$ of $T$ are real, and there is an orthonormal basis $\{u_i\}$ consisting of eigenvectors of $T$. Thus,
$$Tv = \sum_i \lambda_i \langle v, u_i \rangle u_i \quad \forall\, v \in V. \tag{22}$$

Theorem 15. (Spectral Theorem for Unitary Operators) Let $T$ be a unitary operator on a finite-dimensional inner product space $V$. Then all eigenvalues $\lambda_i$ of $T$ have absolute value 1, and there is an orthonormal basis $\{u_i\}$ consisting of eigenvectors of $T$. Thus,
$$Tv = \sum_i \lambda_i \langle v, u_i \rangle u_i \quad \forall\, v \in V. \tag{23}$$

Some of the important elements of the proofs are laid out below. Consider first the case where $T$ is Hermitian. Suppose that $Tv = \lambda v$ with $v \ne 0$; then
$$\lambda \langle v, v \rangle = \langle \lambda v, v \rangle = \langle Tv, v \rangle = \langle v, Tv \rangle = \langle v, \lambda v \rangle = \bar\lambda \langle v, v \rangle.$$
Thus, $\lambda = \bar\lambda$, so all eigenvalues of $T$ are real. A similar argument shows that eigenvalues of unitary operators must be complex numbers of absolute value 1.

Next, there is the notion of an invariant subspace: a linear subspace $W$ of $V$ is invariant for $T$ if $TW \subset W$. If $T$ is Hermitian (respectively, unitary) and $W$ is an invariant subspace, then the restriction $T|_W$ of $T$ to $W$ is also Hermitian (respectively, unitary). Also, if $T$ is invertible, as is the case if $T$ is unitary, then $W$ is an invariant subspace if and only if $TW = W$. The following is an easy exercise:

Proposition 16. Let $T$ be either Hermitian or unitary. If $T$ has an invariant subspace $W$, then the orthogonal complement $W^{\perp}$ of $W$ (see footnote 1) is also an invariant subspace for $T$.

Proof of Theorems 14 and 15. The proof is by induction on the dimension of $V$. Dimension 1 is trivial. Now every linear operator $T$ on a complex vector space $V$ has at least one eigenvector $v$. (Proof: The characteristic polynomial $\det(\lambda I - T)$ has a zero, since $\mathbb{C}$ is algebraically closed. For any such root $\lambda$ the linear transformation $\lambda I - T$ must be singular, by Proposition 13, and so the equation $(\lambda I - T)v = 0$ has a nonzero solution $v$.)
If $v$ is an eigenvector of $T$, then the one-dimensional subspace of $V$ spanned by $v$ is invariant, and so its orthogonal complement $W$ is also invariant. But $\dim(W)$ is less than $\dim(V)$, so the induction hypothesis applies to $T|_W$: in particular, $W$ has an orthonormal basis consisting of eigenvectors of $T$. When augmented by the vector $v/\sqrt{\langle v, v \rangle}$, this gives an orthonormal basis of $V$ made up entirely of eigenvectors of $T$.

1 The orthogonal complement $W^{\perp}$ is defined to be the set of all vectors $u$ such that $u$ is orthogonal to every $w \in W$.
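The spectral representation (22) can be verified by hand on a $2 \times 2$ example. The sketch below is an illustration only: the Hermitian matrix `T`, its eigenpairs (eigenvalues 1 and 3, found by direct calculation), the test vector, and the helper names are all assumptions made for this example, and the inner product is the natural one on $\mathbb{C}^2$, linear in the first argument.

```python
import math


def inner(u, v):
    """Natural inner product on C^n: sum of u_i * conj(v_i)."""
    return sum(a * b.conjugate() for a, b in zip(u, v))


def apply(T, v):
    """Matrix-vector product, with T given as a list of rows."""
    return [sum(T[i][j] * v[j] for j in range(len(v))) for i in range(len(T))]


# Illustrative Hermitian matrix (equal to its conjugate transpose).
T = [[2, 1j], [-1j, 2]]

# Its eigenpairs, worked out by hand: T(1, i) = (1, i) and T(1, -i) = 3(1, -i).
s = 1 / math.sqrt(2)
eigenpairs = [(1.0, [s, s * 1j]), (3.0, [s, -s * 1j])]


def spectral_apply(pairs, v):
    """Right side of (22): sum over i of lambda_i <v, u_i> u_i."""
    out = [0j] * len(v)
    for lam, u in pairs:
        coeff = lam * inner(v, u)
        out = [o + coeff * uk for o, uk in zip(out, u)]
    return out


v = [1 + 2j, -3j]                       # arbitrary test vector
direct = apply(T, v)                    # left side of (22)
via_spectrum = spectral_apply(eigenpairs, v)
gap = max(abs(a - b) for a, b in zip(direct, via_spectrum))
print(gap)
```

Note that both eigenvalues are real even though the eigenvectors are genuinely complex, as Theorem 14 predicts.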

3.2. Orthogonal Matrices: Spectral Theory. An orthogonal matrix is a unitary matrix whose entries are real; equivalently, a real matrix whose transpose is its inverse. Because an orthogonal matrix is unitary, the Spectral Theorem for unitary operators implies that its eigenvalues are complex numbers of modulus 1, and that there is an orthonormal basis of eigenvectors. This doesn't tell the whole story, however, because in many circumstances one is interested in the action of an orthogonal matrix on a real vector space. An orthogonal linear transformation of a real inner product space need not have real eigenvectors: for instance, the matrix
$$R_\theta := \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}$$
acting on $\mathbb{R}^2$ has no real eigenvectors unless $\theta$ is an integer multiple of $\pi$, because $R_\theta$ rotates every nonzero vector through an angle of $\theta$.

Lemma 17. Let $T$ be an orthogonal $n \times n$ matrix, and let $v$ be a (possibly complex) eigenvector with eigenvalue $\lambda$. Then the complex conjugate $\bar v$ is also an eigenvector, with eigenvalue $\bar\lambda$.

The proof is trivial, but the result is important because it implies the following structure theorem for orthogonal linear transformations of real inner product spaces.

Corollary 18. Let $T$ be an orthogonal $n \times n$ matrix acting on the real inner product space $\mathbb{R}^n$. Then $\mathbb{R}^n$ decomposes as an orthogonal direct sum of one- or two-dimensional invariant subspaces for $T$; on each two-dimensional subspace $T$ acts as a rotation matrix $R_\theta$, and on each one-dimensional subspace as $\pm 1$. In other words, in a suitable orthonormal basis $T$ is represented by a matrix in block form (where all but the last two blocks are of size $2 \times 2$)
$$\begin{pmatrix} R_{\theta_1} & 0 & \cdots & 0 & 0 & 0 \\ 0 & R_{\theta_2} & \cdots & 0 & 0 & 0 \\ \vdots & \vdots & \ddots & \vdots & \vdots & \vdots \\ 0 & 0 & \cdots & R_{\theta_k} & 0 & 0 \\ 0 & 0 & \cdots & 0 & I & 0 \\ 0 & 0 & \cdots & 0 & 0 & -I \end{pmatrix}$$

Proof. The only possible real eigenvalues are $\pm 1$. On the space of eigenvectors with eigenvalue $+1$ the matrix $T$ acts as the identity, and on the space of eigenvectors with eigenvalue $-1$ the matrix $T$ acts as $(-1)$ times the identity. Consequently, each of these subspaces splits as a direct sum of one-dimensional subspaces. Let $v = u + iw$ be a complex eigenvector with real and imaginary parts $u, w$ and eigenvalue $\lambda = e^{i\theta}$.
By Lemma 17, $\bar v = u - iw$ is an eigenvector with eigenvalue $e^{-i\theta}$. Adding and subtracting the eigenvector equations for these two eigenvectors shows that the two-dimensional real subspace of $\mathbb{R}^n$ spanned by $u, w$ is invariant for $T$, and that the restriction of $T$ to this subspace is just the rotation by $\theta$. It is routine to check that the two-dimensional invariant subspaces obtained in this manner are mutually orthogonal, and that each is orthogonal to the one-dimensional invariant subspaces corresponding to eigenvalues $\pm 1$.

3.3. Minimax Characterization of Eigenvalues. Let $T : V \to V$ be a Hermitian operator on a finite-dimensional vector space $V$ of dimension $n$. According to the Spectral Theorem 14, the eigenvalues $\lambda_i$ of $T$ are real, and there is an orthonormal basis consisting of eigenvectors $u_i$. Because the eigenvalues are real, they are linearly ordered:
$$\lambda_1 \le \lambda_2 \le \cdots \le \lambda_n. \tag{24}$$

Proposition 19.
$$\lambda_n = \max_{u : \|u\| = 1} \langle Tu, u \rangle \tag{25}$$
and
$$\lambda_1 = \min_{u : \|u\| = 1} \langle Tu, u \rangle. \tag{26}$$

Proof. Any unit vector $u$ has an expansion $u = \sum_{i=1}^{n} \alpha_i u_i$ in the eigenvectors of $T$, where the (complex) scalars $\alpha_i$ satisfy $\sum_i |\alpha_i|^2 = 1$. It follows that
$$Tu = \sum_{i=1}^{n} \alpha_i \lambda_i u_i \implies \langle Tu, u \rangle = \sum_{i=1}^{n} \lambda_i |\alpha_i|^2.$$
Since $u$ is a unit vector, the assignment $i \mapsto |\alpha_i|^2$ is a probability distribution on the index set $[n]$. Clearly, the probability distribution that maximizes the expectation $\sum_i \lambda_i |\alpha_i|^2$ puts all of its mass on the indices $i$ for which $\lambda_i$ is maximal. Thus, the maximal expectation is $\lambda_n$. Similarly, the minimum expectation is $\lambda_1$.

The minimax characterization is a generalization of Proposition 19 to the entire spectrum. This characterization is best described using the terminology of game theory. The game is as follows: First, I pick a linear subspace $W \subset V$ of dimension $k$; then you pick a unit vector $u$ in $W$. I pay you $\langle Tu, u \rangle$. If we both behave rationally (not always a sure thing on my end, but for the sake of argument let's assume that in this case I do) then I should choose the subspace spanned by the eigenvectors $u_1, u_2, \ldots, u_k$, and then you should choose $u = u_k$, so that the payoff is $\lambda_k$. That this is in fact the optimal strategy is the content of the minimax theorem:

Theorem 20. (Minimax Characterization of Eigenvalues)
$$\lambda_k = \min_{W : \dim(W) = k}\ \max_{u \in W : \|u\| = 1} \langle Tu, u \rangle = \max_{W : \dim(W) = n-k+1}\ \min_{u \in W : \|u\| = 1} \langle Tu, u \rangle. \tag{27}$$

Proof. The second equality is obtained from the first by applying the result to the Hermitian operator $-T$, so only the first equality need be proved. It is clear that the right side of the first equality in (27) is no larger than $\lambda_k$ (see the comments preceding the statement of the theorem), so what must be proved is that for any subspace $W$ of dimension $k$,
$$\max_{u \in W : \|u\| = 1} \langle Tu, u \rangle \ge \lambda_k.$$
Let $u_i$ be an orthonormal basis of $V$ such that $Tu_i = \lambda_i u_i$, where the eigenvalues $\lambda_i$ are arranged in increasing order as in (24). The subspace $W$ has an orthonormal basis $w_i$ of cardinality $k$, and the vectors in this basis can be expanded as linear combinations of the eigenvectors $u_j$:
$$w_i = \sum_{j=1}^{n} \beta_{i,j} u_j.$$
Because the vectors $w_i$ form an orthonormal basis of $W$, the rows of the matrix $(\beta_{i,j})$ are orthonormal, that is, the $k$ vectors $\beta_i$ with entries $\beta_{i,j}$ are orthonormal. Consider the $k \times (k-1)$ matrix
$$B := \begin{pmatrix} \beta_{1,1} & \beta_{1,2} & \cdots & \beta_{1,k-1} \\ \beta_{2,1} & \beta_{2,2} & \cdots & \beta_{2,k-1} \\ \vdots & \vdots & & \vdots \\ \beta_{k,1} & \beta_{k,2} & \cdots & \beta_{k,k-1} \end{pmatrix};$$
the rank of $B$ is at most $k-1$, so there is a nontrivial linear combination of the rows that sums to zero. Without loss of generality, the coefficients in this linear combination can be scaled so as to form a $k$-vector $\alpha$ of norm 1, that is,
$$\sum_{i=1}^{k} |\alpha_i|^2 = 1 \quad\text{and}\quad \sum_{i=1}^{k} \alpha_i \beta_{i,j} = 0 \quad \forall\, 1 \le j \le k-1.$$
Thus,
$$w := \sum_{i=1}^{k} \alpha_i w_i = \sum_{i=1}^{k} \sum_{j=1}^{n} \alpha_i \beta_{i,j} u_j = \sum_{i=1}^{k} \sum_{j=k}^{n} \alpha_i \beta_{i,j} u_j =: \sum_{j=k}^{n} \gamma_j u_j$$
is a unit vector in $W$ that is orthogonal to the first $k-1$ eigenvectors $u_j$. Since $w$ is a unit vector, $\sum_j |\gamma_j|^2 = 1$, so it follows that
$$\langle Tw, w \rangle = \sum_{j=k}^{n} \lambda_j |\gamma_j|^2 \ge \lambda_k.$$

Corollary 21. (Eigenvalue interlacing) Let $T : V \to V$ be Hermitian, and let $W \subset V$ be a linear subspace of dimension $n-1$, where $n = \dim(V)$. Then the eigenvalues of the restriction $T|_W$ are interlaced with those of $T$ on $V$: that is, if the eigenvalues of $T$ are $\lambda_1 \le \lambda_2 \le \cdots \le \lambda_n$ and the eigenvalues of $T|_W$ are $\mu_1 \le \mu_2 \le \cdots \le \mu_{n-1}$, then
$$\lambda_1 \le \mu_1 \le \lambda_2 \le \mu_2 \le \cdots \le \lambda_{n-1} \le \mu_{n-1} \le \lambda_n. \tag{28}$$

Proof. It suffices to prove that $\mu_k \ge \lambda_k$, because the reverse inequalities then follow by considering $-T$. By Theorem 20,
$$\lambda_k = \min_{S \subset V : \dim(S) = k}\ \max_{u \in S : \|u\| = 1} \langle Tu, u \rangle \quad\text{and}\quad \mu_k = \min_{S \subset W : \dim(S) = k}\ \max_{u \in S : \|u\| = 1} \langle Tu, u \rangle,$$
where the minima are taken over linear subspaces $S$ of $V$ and $W$, respectively. Since $W \subset V$, every linear subspace of $W$ is a linear subspace of $V$, and so the first min is taken over a larger collection than the second. Thus, $\mu_k \ge \lambda_k$.

3.4. Empirical spectral distributions. Recall that the empirical spectral distribution of a diagonalizable matrix is the uniform distribution on eigenvalues (counted according to multiplicity). The empirical spectral distribution is the object of primary interest in the study of random matrices. Thus, it is useful to know how changes in the entries of a matrix affect the empirical spectral distribution. In this section we give two useful bounds on the magnitude of the change in the empirical spectral distribution under certain types of matrix perturbations.

Definition 7. The Lévy distance between two probability distributions $\mu, \nu$ on $\mathbb{R}$ with cumulative distribution functions $F = F_\mu$ and $G = F_\nu$ is defined to be
$$L(\mu,\nu) := \inf\{\varepsilon > 0 : F(x - \varepsilon) - \varepsilon \le G(x) \le F(x + \varepsilon) + \varepsilon \ \ \forall\, x \in \mathbb{R}\}. \tag{29}$$
Observe that if $\|F - G\|_\infty < \varepsilon$ then $L(F, G) < \varepsilon$; the converse, however, need not hold. Moreover, the Lévy distance characterizes convergence in distribution, that is, $\lim_{n \to \infty} L(F_n, F) = 0$ if and only if $F_n \Longrightarrow F$.
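The gap between the Lévy distance and the sup-norm distance can be seen on a small numerical example. The following Python sketch is not part of the original notes and uses only the standard library; the function names (`ecdf`, `levy_condition`, `levy_distance`, `sup_distance`) are illustrative. It assumes both distributions are uniform on finitely many atoms, tests the two-sided inequality in the definition of the Lévy distance only at the jump points of the step functions involved (which is enough for right-continuous step functions), and approximates the infimum by bisection.

```python
def ecdf(atoms, x, shift=0.0):
    """CDF at x of the uniform distribution on `atoms` translated by `shift`."""
    return sum(1 for a in atoms if a + shift <= x) / len(atoms)

def levy_condition(f_atoms, g_atoms, eps):
    """Check F(x - eps) - eps <= G(x) <= F(x + eps) + eps at every jump point."""
    # F(x - eps) is the CDF of the atoms shifted by +eps; F(x + eps) by -eps.
    points = ([a + eps for a in f_atoms] + [a - eps for a in f_atoms]
              + list(g_atoms))
    for x in points:
        g = ecdf(g_atoms, x)
        if ecdf(f_atoms, x, +eps) - eps > g:
            return False
        if g > ecdf(f_atoms, x, -eps) + eps:
            return False
    return True

def levy_distance(f_atoms, g_atoms, iterations=60):
    """Bisection approximation of the Levy distance (which never exceeds 1)."""
    lo, hi = 0.0, 1.0
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if levy_condition(f_atoms, g_atoms, mid):
            hi = mid   # the condition is monotone in eps, so shrink from above
        else:
            lo = mid
    return hi

def sup_distance(f_atoms, g_atoms):
    """sup_x |F(x) - G(x)|, attained at a jump point of F or G."""
    points = list(f_atoms) + list(g_atoms)
    return max(abs(ecdf(f_atoms, x) - ecdf(g_atoms, x)) for x in points)

# Point masses at 0 and at 0.1: close in Levy distance, far in sup norm.
print(levy_distance([0.0], [0.1]))  # approximately 0.1
print(sup_distance([0.0], [0.1]))   # 1.0
```

In this example the two point masses are at Lévy distance about 0.1 although their distribution functions differ by 1 in sup norm, which illustrates why the converse implication fails.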

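Corollary 22. (Perturbation Inequality) Let $A$ and $B$ be Hermitian operators on $V = \mathbb{C}^n$ relative to the standard inner product, and let $F^A$ and $F^B$ be their empirical spectral distributions. Then

$$L(F^A, F^B) \le n^{-1} \operatorname{rank}(A - B). \tag{30}$$

Proof. It suffices to prove this for Hermitian operators $A, B$ that differ by a rank-1 operator $\Delta = A - B$, because operators that differ by a rank-$k$ operator can be connected by a chain of $k+1$ Hermitian operators whose successive differences are all rank-1. The operator $\Delta$ is Hermitian, so if it is rank-1 then it has a single nonzero eigenvalue $\delta$, with corresponding eigenvector $w$. Let $W$ be the $(n-1)$-dimensional subspace orthogonal to $w$; then $\Delta|_W = 0$ and so the restrictions $A|_W$ and $B|_W$ are identical. Let $\mu_1 \le \cdots \le \mu_{n-1}$ be the eigenvalues of $A|_W = B|_W$. By the Eigenvalue Interlacing Theorem (Corollary 21), the (ordered) eigenvalues $\lambda_i^B$ of $B$ are interlaced with the sequence $\mu_i$, and so are the eigenvalues $\lambda_i^A$ of $A$. Consequently, it is impossible for either

$$\lambda^A_{k+2} < \lambda^B_k \quad\text{or}\quad \lambda^B_{k+2} < \lambda^A_k$$

to occur for any $k$. In fact, interlacing with the common sequence $\mu_i$ shows that, for every real $x$, the number of eigenvalues of $A$ in $(-\infty, x]$ and the number of eigenvalues of $B$ in $(-\infty, x]$ each equal either $m$ or $m+1$, where $m$ is the number of indices $i$ with $\mu_i \le x$. Hence $\|F^A - F^B\|_\infty \le n^{-1}$, which gives (30) when $\operatorname{rank}(A - B) = 1$.

Proposition 23. Let $A$ and $B$ be $n \times n$ Hermitian matrices with eigenvalues $\lambda_i$ and $\mu_i$, respectively, listed in decreasing order. Then

$$\sum_{i=1}^{n} (\lambda_i - \mu_i)^2 \le \operatorname{Tr}\,(A - B)^2. \tag{31}$$

Proof. Expand the squares on both sides of the inequality to obtain the equivalent statement

$$\sum_{i=1}^{n} (\lambda_i^2 - 2\lambda_i \mu_i + \mu_i^2) \le \operatorname{Tr} A^2 + \operatorname{Tr} B^2 - 2 \operatorname{Tr} AB \tag{32}$$

(this also uses the fact that $\operatorname{Tr} AB = \operatorname{Tr} BA$). Since $\operatorname{Tr} A^2 = \sum_i \lambda_i^2$ and $\operatorname{Tr} B^2 = \sum_i \mu_i^2$, proving inequality (32) is tantamount to proving

$$\operatorname{Tr} AB \le \sum_{i=1}^{n} \lambda_i \mu_i. \tag{33}$$

Neither the trace nor the spectrum of a diagonalizable matrix depends on which basis for the vector space is used. Consequently, we can work in the orthonormal basis of eigenvectors of $A$, that is to say, we may assume that $A$ is diagonal. Since $B$ is also Hermitian, there is a unitary matrix $U$ that diagonalizes $B$. Thus,

$$A = \operatorname{diag}(\lambda_1, \lambda_2, \ldots, \lambda_n) \quad\text{and}\quad B = U \operatorname{diag}(\mu_1, \mu_2, \ldots, \mu_n)\, U^*,$$

and so (33) is equivalent to

$$\sum_{i,j} \lambda_i \mu_j |U_{i,j}|^2 \le \sum_{i} \lambda_i \mu_i. \tag{34}$$

Now $U$ is unitary, so the matrix $(|U_{i,j}|^2)_{i,j}$ is doubly stochastic ($U$ unitary means that the rows and columns are orthonormal).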
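As a sanity check on the passage from (33) to (34), the following Python sketch (standard library only, not part of the original notes) works out the case $n = 2$ with a real rotation matrix $U$, which is a special case of a unitary matrix. The function name `rotation_check` and the sample values are illustrative assumptions. It compares $\operatorname{Tr} AB$ with the weighted double sum over the squared entries of $U$ and returns the matrix of squared entries so that its row and column sums can be inspected.

```python
import math

def rotation_check(lam, mu, theta):
    """For A = diag(lam), B = U diag(mu) U^T with U a 2x2 rotation by theta,
    return (Tr(AB), sum_{i,j} lam_i mu_j U_ij^2, matrix of U_ij^2)."""
    c, s = math.cos(theta), math.sin(theta)
    U = [[c, -s], [s, c]]  # real orthogonal, hence unitary
    # B = U diag(mu) U^T, written out entry by entry
    B = [[sum(U[i][k] * mu[k] * U[j][k] for k in range(2)) for j in range(2)]
         for i in range(2)]
    trace_AB = sum(lam[i] * B[i][i] for i in range(2))  # A is diagonal
    P = [[U[i][j] ** 2 for j in range(2)] for i in range(2)]
    weighted = sum(lam[i] * mu[j] * P[i][j]
                   for i in range(2) for j in range(2))
    return trace_AB, weighted, P

# Eigenvalues listed in decreasing order, as in Proposition 23.
lam, mu = [3.0, 1.0], [2.0, -1.0]
trace_AB, weighted, P = rotation_check(lam, mu, 0.4)
print(abs(trace_AB - weighted) < 1e-12)                   # the two sides agree
print(weighted <= sum(l * m for l, m in zip(lam, mu)))    # bound (34) holds here
```

Each row and each column of `P` sums to $\cos^2\theta + \sin^2\theta = 1$, which is the doubly stochastic property in this special case.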
Hence, inequality (34) will follow from the inequality

$$\sum_{i,j} \lambda_i \mu_j p_{i,j} \le \sum_{i} \lambda_i \mu_i, \tag{35}$$

where $(p_{i,j})$ is any doubly stochastic matrix. There are various ways to prove (35). Following is a short and painless proof that uses Birkhoff's theorem on doubly stochastic matrices. Birkhoff's theorem states that the convex set $S$ of doubly stochastic $n \times n$ matrices has as its extreme points the permutation matrices; that is, every doubly stochastic matrix is a convex combination of permutation matrices.

To see how Birkhoff's theorem applies to (35), consider the problem of maximizing the left side over all doubly stochastic matrices $(p_{i,j})$. Since the left side is a linear form in the variables $p_{i,j}$, and since the set $S$ of doubly stochastic $n \times n$ matrices is a convex set, the maximum must occur at one of the extreme points of $S$. By Birkhoff's theorem, the extreme points of $S$ are the permutation matrices (see Exercise 1 below), so it suffices to check that

$$\max_{\sigma} \sum_{i} \lambda_i \mu_{\sigma(i)} = \sum_{i} \lambda_i \mu_i,$$

where the max is over the set of all permutations $\sigma$. For this, just check that if $\sigma$ is not the identity permutation, then the left side can be increased (or at least not decreased) by switching two indices $i, i'$ for which $\lambda_i, \lambda_{i'}$ and $\mu_{\sigma(i)}, \mu_{\sigma(i')}$ are in opposite relative order.

Exercise 1. Prove Birkhoff's theorem, that is, show that every doubly stochastic matrix can be written as a convex combination of permutation matrices. Example:

$$\begin{pmatrix} .3 & .7 \\ .7 & .3 \end{pmatrix} = .3 \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} + .7 \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}.$$
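Both ingredients of this argument can be tried out on small cases. The Python sketch below is not from the original notes; it uses only the standard library, and the names `birkhoff_decomposition` and `best_pairing` are illustrative. The first function peels permutation matrices off a doubly stochastic matrix greedily, searching all permutations by brute force, so it is only practical for small $n$. It relies on the fact (a consequence of Hall's marriage theorem) that a nonzero multiple of a doubly stochastic matrix always has a permutation supported on its positive entries, and with floating-point input it may stop with a small residual left over. It illustrates the statement of Birkhoff's theorem and is not a proof of it. The second function checks by brute force which permutation maximizes the sum appearing in the last display.

```python
from itertools import permutations

def birkhoff_decomposition(p, tol=1e-12):
    """Greedily write a doubly stochastic matrix as a list of
    (weight, permutation) pairs; brute force, intended for small n."""
    n = len(p)
    residual = [row[:] for row in p]
    terms = []
    while True:
        # find a permutation whose entries in the residual are all positive
        sigma = next((s for s in permutations(range(n))
                      if all(residual[i][s[i]] > tol for i in range(n))), None)
        if sigma is None:
            break
        weight = min(residual[i][sigma[i]] for i in range(n))
        for i in range(n):
            residual[i][sigma[i]] -= weight  # zeroes at least one entry
        terms.append((weight, sigma))
    return terms

def best_pairing(lam, mu):
    """Return (max over sigma of sum_i lam[i] * mu[sigma(i)], a maximizer)."""
    n = len(lam)
    score = lambda s: sum(lam[i] * mu[s[i]] for i in range(n))
    best = max(permutations(range(n)), key=score)
    return score(best), best

# The 2x2 example from Exercise 1: weight .3 on the identity, .7 on the swap.
print(birkhoff_decomposition([[0.3, 0.7], [0.7, 0.3]]))

# Both sequences decreasing: the identity pairing is optimal.
print(best_pairing([3, 2, 1], [5, 4, 0]))  # (23, (0, 1, 2))
```

A permutation is represented here as a tuple `s` with `s[i]` the column paired with row `i`, so `(0, 1)` is the identity and `(1, 0)` is the swap.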