PROOF OF TWO MATRIX THEOREMS VIA TRIANGULAR FACTORIZATIONS

ROY MATHIAS*

* Institute for Mathematics and its Applications, 514 Vincent Hall, 206 Church St. SE, University of Minnesota, Minneapolis, MN 55455. On leave from Department of Mathematics, College of William & Mary, Williamsburg, VA 23187. e-mail: na.mathias@na-net.ornl.gov.

Abstract. We present elementary proofs of the Cauchy-Binet Theorem on determinants and of the fact that the eigenvalues of a matrix are continuous functions of its entries by first reducing these problems to triangular matrices.

Key words. Eigenvalue Continuity, Cauchy-Binet Theorem, Determinant

AMS(MOS) subject classifications. 15A15, 15A18

In this paper we present elementary proofs of the Cauchy-Binet Theorem on determinants and of the fact that the eigenvalues of a matrix are continuous functions of its entries. Both these results are useful, but are not usually proved in the textbooks nor in many of the references in which they are used.

Let $M_n$ denote the space of $n \times n$ complex matrices. Given $A \in M_n$ let $\lambda_i(A)$, $i = 1, \ldots, n$, denote its eigenvalues, and let $\sigma(A) = \{\lambda_1(A), \ldots, \lambda_n(A)\}$ denote the collection of the eigenvalues of $A$ counting multiplicity. We will measure the distance between the eigenvalues of $A, B \in M_n$ by the matching distance

(1)    $d_M(\sigma(A), \sigma(B)) \equiv \min_{\pi} \max_{i=1,\ldots,n} |\lambda_i(A) - \lambda_{\pi(i)}(B)|$

where the minimum is taken over all permutations $\pi$ on $\{1, \ldots, n\}$. This measure is not, in general, easy to compute, since by Hall's Marriage Theorem, for any $t \geq 0$,

(2)    $d_M(\sigma(A), \sigma(B)) \leq t$

if and only if for each $k = 1, \ldots, n$ and each subset $\alpha \subseteq \{1, \ldots, n\}$ of cardinality $k$ there is a subset $\beta \subseteq \{1, \ldots, n\}$ of cardinality $k$ such that

(3)    $\max_{i \in \alpha} \min_{j \in \beta} |\lambda_i(A) - \lambda_j(B)| \leq t$.

(See for example [1, Chapter 1] for an exposition of the application of Hall's Theorem to the matching distance and a discussion of other measures of the distance between eigenvalues.) However, if we take $t \geq 0$ such that

(4)    $t < \frac{1}{2} \min\{|\lambda_i(A) - \lambda_j(A)| : \lambda_i(A) \neq \lambda_j(A)\}$

(taking the minimum to be $+\infty$ if the eigenvalues of $A$ are all identical) then, because no point can be within a distance $t$ of two distinct eigenvalues of $A$, one can easily show
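For small matrices the matching distance (1) can be computed directly from the definition by minimizing over all $n!$ permutations. A minimal sketch in Python (the function name is ours, not from the paper):

```python
from itertools import permutations

def matching_distance(eigs_a, eigs_b):
    """Matching distance d_M between two lists of eigenvalues:
    minimize over permutations pi the largest |a_i - b_pi(i)|.
    Brute force, O(n!), so only suitable for small n."""
    n = len(eigs_a)
    assert len(eigs_b) == n
    return min(
        max(abs(eigs_a[i] - eigs_b[p[i]]) for i in range(n))
        for p in permutations(range(n))
    )

# Matching 0 <-> -0.05 and 2 <-> 2.1 gives the minimal value 0.1;
# the identity matching would give 2.1.
d = matching_distance([0.0, 2.0], [2.1, -0.05])
```

In practice one would use a polynomial-time assignment algorithm rather than brute force, but the enumeration above matches the definition (1) exactly.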
that (2) holds if and only if for each eigenvalue $\lambda_i(A)$ of multiplicity $k$ there are at least $k$ eigenvalues of $B$ in $D_t(\lambda_i(A)) \equiv \{z \in \mathbf{C} : |z - \lambda_i(A)| \leq t\}$.

When we say that the eigenvalues of a matrix are a continuous function of its entries we mean that given any $A = [a_{ij}]_{i,j=1}^n \in M_n$ and $\epsilon > 0$ there exists $\delta > 0$ such that whenever $B = [b_{ij}]_{i,j=1}^n \in M_n$ is such that $|a_{ij} - b_{ij}| \leq \delta$, $i, j = 1, \ldots, n$, then

(5)    $d_M(\sigma(A), \sigma(B)) \leq \epsilon$.

This is equivalent to saying that if $A_k \to A$ then for each $k$ there is a permutation $\pi_k$ such that $\lim_{k \to \infty} \lambda_{\pi_k(i)}(A_k) = \lambda_i(A)$; or in words, that if the eigenvalues of $A_k$ are suitably ordered then they converge to those of $A$.

Theorem 1. The eigenvalues of $A \in M_n$ are a continuous function of its entries.

The usual proof of the continuity of the eigenvalues of a matrix uses the fact that the eigenvalues of the matrix are the roots of its characteristic polynomial and the fact that the roots of a polynomial are continuous functions of its coefficients [6, Appendix K]. The proof of the latter depends on Rouché's Theorem, which one may want to avoid in a course in Linear Algebra. Our proof is based on a compactness argument and Schur's Triangularization Theorem [3, Theorem 2.3.1]:

Theorem 2 (Schur). Let $A \in M_n$ have eigenvalues $\lambda_1, \ldots, \lambda_n$, given in any order. Then there is a unitary $U \in M_n$ such that $T \equiv U^* A U$ is upper triangular and $t_{ii} = \lambda_i$.

Proof (of Theorem 1). The proof is by contradiction. Take

$t = \frac{1}{2} \min\{|\lambda_i - \lambda_j| : \lambda_i \neq \lambda_j, \text{ and } \lambda_i, \lambda_j \text{ are eigenvalues of } A\}$.

That is, $t$ is half the minimum distance between distinct eigenvalues of $A$. Suppose that $A_k \to A$ and that the eigenvalues of $A_k$ do not converge to those of $A$. Then there must be an $\epsilon > 0$, an eigenvalue $\lambda$ of $A$ of multiplicity $m$, and a subsequence $A_{i_j}$ such that the number of eigenvalues of $A_{i_j}$ in $D_\epsilon(\lambda) = \{z : |z - \lambda| \leq \epsilon\}$ is less than $m$ for $j = 1, 2, \ldots$. We may assume without loss of generality that the subsequence $\{A_{i_j}\}$ is actually $\{A_j\}$, the sequence itself, and that $\epsilon < t$.
By Theorem 2 there are unitary matrices $U_j$ such that $T_j \equiv U_j^* A_j U_j$ is upper triangular and its first $n - m + 1$ diagonal entries lie outside $D_\epsilon(\lambda)$. By the compactness
of the set of unitary matrices there is a subsequence $\{U_{i_j}\}$ that converges to a unitary $U$. Then

$U^* A U = \lim_{j \to \infty} U_{i_j}^* A_{i_j} U_{i_j} = \lim_{j \to \infty} T_{i_j}$.

Therefore, $U^* A U$ is upper triangular and its $n - m + 1$ leading diagonal entries, which are $n - m + 1$ of the eigenvalues of $A$, lie outside $D_\epsilon(\lambda)$. This contradicts the assumption that $\lambda$ is an eigenvalue of $A$ of multiplicity $m$. □

This approach to the proof of Theorem 1 has also been noted in [2, Problem 7.1.6]. The fact that the eigenvalues of a matrix are continuous functions of its entries is very useful in numerical linear algebra. It is used in the proof of Gershgorin's Theorem, stated below, which gives a quantitative version of the continuity of the eigenvalues of a matrix.

Theorem 3 (Gershgorin). Let $A = [a_{ij}] \in M_n$. The eigenvalues of $A$ lie in the union of the discs

$D_i \equiv \{z : |z - a_{ii}| \leq \sum_{j \neq i} |a_{ij}|\}$.

Furthermore, if the union of any $k$ of these discs is disjoint from the remaining $n - k$ then there are precisely $k$ eigenvalues of $A$ in the union of these $k$ discs.

The second part of the theorem is what makes it useful. For example, from the first part alone one can only conclude that the eigenvalues of

$\begin{pmatrix} 1 & \epsilon \\ \epsilon & 0 \end{pmatrix}$,    $\epsilon < 1/2$,

lie in $\{z : |z - 1| \leq \epsilon\} \cup \{z : |z| \leq \epsilon\}$, and not that there is one eigenvalue near 0 and one near 1. The continuity of the eigenvalues is essential in existing proofs of the second part of Gershgorin's Theorem (see e.g., [3, Theorem 6.1.1]). The first part, which is used mainly as a criterion for nonsingularity or to provide a bound on the spectral radius, can be proved by considering the $i$th component of the equation $(A - \lambda I)x = 0$, where $x_i$ is the largest entry of $x$ in absolute value.

It was necessary to use the Schur decomposition of the $A_j$ rather than the Jordan decomposition $S_j A_j S_j^{-1} = J_j$, where $J_j$ is a Jordan canonical form of $A_j$.
The reason is that the set of invertible matrices is not compact, so we cannot be sure that $\{S_j\}$ has a convergent subsequence, and even if it does we cannot be sure that the limit is invertible. The Schur decomposition can also be used to prove the Cayley-Hamilton Theorem (for complex matrices) [3, Theorem 2.4.2], the Spectral Radius Theorem [3, Corollary 5.6.14], and some results relating the eigenvalues and singular values of a matrix [5].
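Both tools used in this section are easy to sanity-check numerically. The sketch below, assuming NumPy and SciPy are available, computes a complex Schur form $A = U T U^*$ of the $2 \times 2$ Gershgorin example above (with $\epsilon = 0.25$) and verifies that the diagonal entries of $T$, which are the eigenvalues, lie in the Gershgorin discs:

```python
import numpy as np
from scipy.linalg import schur

eps = 0.25                      # any eps < 1/2 keeps the two discs disjoint
A = np.array([[1.0, eps],
              [eps, 0.0]])

# Complex Schur form: A = U T U*, with T upper triangular and
# diag(T) the eigenvalues of A (Theorem 2).
T, U = schur(A.astype(complex), output='complex')
assert np.allclose(A, U @ T @ U.conj().T)
assert np.allclose(T, np.triu(T))
assert np.allclose(U @ U.conj().T, np.eye(2))

# Gershgorin discs: centre a_ii, radius = off-diagonal absolute row sum.
centres = np.diag(A)
radii = np.abs(A).sum(axis=1) - np.abs(centres)
in_disc = [any(abs(lam - c) <= r + 1e-12 for c, r in zip(centres, radii))
           for lam in np.diag(T)]
```

Here the two discs are disjoint, and in accordance with the second part of Theorem 3 one eigenvalue (about 1.059) lies in the disc about 1 and the other (about -0.059) in the disc about 0.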
Now we will prove the Cauchy-Binet Theorem. First we must introduce some notation. Given integers $k, n$ with $1 \leq k \leq n$ we let the $\binom{n}{k}$ subsets of $\{1, \ldots, n\}$ of cardinality $k$ be ordered in some fixed order. Henceforth, when we use the term "subset" we will mean a subset of $\{1, \ldots, n\}$ of cardinality $k$. Let $\mathcal{A}$ denote the collection of these subsets, and let $A \in M_n$. Given $\alpha, \beta \in \mathcal{A}$ we define $A(\alpha, \beta)$ to be the $k \times k$ submatrix of $A$ lying in the rows corresponding to the indices in $\alpha$ and the columns corresponding to the indices in $\beta$. We define the $k$th compound of $A$ to be the $\binom{n}{k} \times \binom{n}{k}$ matrix with entries indexed by the elements of $\mathcal{A}$ given by

$C_k(A)_{\alpha,\beta} = \det A(\alpha, \beta)$,    $\alpha, \beta \in \mathcal{A}$.

Notice that $C_n(A) = \det(A)$, and that $C_{n-1}(A) = \mathrm{adj}(A)^T$, the transpose of the classical adjoint of $A$, if the subsets are ordered $\{2, \ldots, n\}, \{1, 3, \ldots, n\}, \ldots, \{1, \ldots, n-1\}$. The basic result on compounds is

Theorem 4 (Cauchy-Binet). Let $A, B \in M_n$ and let $k$ be an integer between 1 and $n$. Then

(6)    $C_k(AB) = C_k(A) C_k(B)$.

One can see that the result is true for $k = n$ since $\det(AB) = \det(A)\det(B)$, and for $k = n - 1$ since

$C_{n-1}(AB) = (\mathrm{adj}(AB))^T = (\mathrm{adj}(B)\,\mathrm{adj}(A))^T = \mathrm{adj}(A)^T \mathrm{adj}(B)^T = C_{n-1}(A) C_{n-1}(B)$.

Using Theorems 2 and 4 one can show that the eigenvalues of $C_k(A)$ are all products of $k$ eigenvalues of $A$. The details can be found in [4, Chapter 19, F.2.b]. One can also define a $k$th additive compound of $A$ by

$\Delta_k(A) \equiv \left. \frac{d}{dt} C_k(I + tA) \right|_{t=0}.$

The eigenvalues of $\Delta_k(A)$ are all sums of $k$ eigenvalues of $A$ [4, Chapter 19, F.3]. These two compounds are very useful in proving results for sums and products of the eigenvalues and singular values of matrices [4, Chapters 9 and 20].

We will use the following lemma to reduce the proof of Theorem 4 to the case where $B$ is an elementary matrix, that is, $B = [b_{ij}]_{i,j=1}^n$ is of the form

(7)    $b_{ij} = \begin{cases} 1 & i = j \\ x & i = l,\ j = m \\ 0 & \text{otherwise} \end{cases}$

for some $l \neq m$. The lemma can be proved by a variant of the method used to prove the existence of row-reduced echelon form.
Note that the $L_i$ in the lemma are not necessarily lower triangular, and the $R_i$ are not necessarily upper triangular.
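The compound $C_k(A)$ is easy to build directly from the definition, which makes Theorem 4 convenient to spot-check numerically, both for an elementary matrix of the form (7) and for a random dense factor. A sketch assuming NumPy (the helper name is ours):

```python
import numpy as np
from itertools import combinations

def compound(A, k):
    """kth compound of A: the matrix of determinants of all k-by-k
    submatrices, rows/columns indexed by k-subsets in a fixed order."""
    n = A.shape[0]
    subsets = list(combinations(range(n), k))
    C = np.empty((len(subsets), len(subsets)))
    for i, alpha in enumerate(subsets):
        for j, beta in enumerate(subsets):
            C[i, j] = np.linalg.det(A[np.ix_(alpha, beta)])
    return C

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 4))

# An elementary matrix as in (7): identity plus x in position (l, m).
B = np.eye(4)
B[0, 2] = 0.7            # l = 1, m = 3, x = 0.7 (in 1-based indexing)

# Cauchy-Binet (6), here for k = 2, with an elementary factor ...
assert np.allclose(compound(A @ B, 2), compound(A, 2) @ compound(B, 2))

# ... with a random dense factor, and the special case C_n(A) = det(A).
B2 = rng.standard_normal((4, 4))
assert np.allclose(compound(A @ B2, 2), compound(A, 2) @ compound(B2, 2))
assert np.allclose(compound(A, 4), np.linalg.det(A))
```

Building $C_k(A)$ entry by entry costs $\binom{n}{k}^2$ determinant evaluations, so this is purely an illustration of the definition, not an efficient algorithm.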
Lemma 5. Let $A \in M_n$. Then there are elementary matrices $L_i$, $i = 1, \ldots, l$, and $R_i$, $i = 1, \ldots, r$, and a diagonal matrix $D$ such that

$A = L_1 \cdots L_l D R_1 \cdots R_r$.

Proof (of Theorem 4). In view of Lemma 5 it is sufficient to prove Theorem 4 in the case that $B$ is a diagonal matrix or an elementary matrix (7). The case of diagonal $B$ is much easier and we omit it. We will assume that $l < m$ (so that $B$ is upper triangular) as the other case can be dealt with in the same way. We will verify (6) in this special case by direct computation. First we will compute $C_k(B)$, then $C_k(AB)$, and then check that (6) holds.

We define $s(\beta)$ to be the cardinality of $\beta \cap \{l+1, \ldots, m-1\}$. Also, given a subset $\beta$ containing $m$ but not $l$ we define the `complementary' subset $\beta' \equiv \beta \setminus \{m\} \cup \{l\}$. For any subsets $\alpha$ and $\beta$ one can check, by explicitly writing down the matrix $B(\alpha, \beta)$, that

(8)    $C_k(B)_{\alpha,\beta} = \det(B(\alpha, \beta)) = \begin{cases} 1 & \text{if } \alpha = \beta \\ (-1)^{s(\beta)} x & \text{if } m \in \beta,\ l \notin \beta, \text{ and } \alpha = \beta' \\ 0 & \text{otherwise.} \end{cases}$

The second case is the only one that is not immediate. In this case one can check that

$B(\alpha, \beta) = I_{j_1} \oplus \tilde{B} \oplus I_{j_2}$

where $j_1 + j_2 + s(\beta) + 1 = k$ and $\tilde{B}$ is $(s(\beta)+1) \times (s(\beta)+1)$ and of the form

$\tilde{B} = \begin{pmatrix} 0 & & & x \\ 1 & 0 & & \\ & \ddots & \ddots & \\ & & 1 & 0 \end{pmatrix}.$

One can now check that

$\det(B(\alpha, \beta)) = \det(\tilde{B}) = (-1)^{s(\beta)} x$

by applying $s(\beta)$ column interchanges to transform $\tilde{B}$ to be diagonal.

Now fix $\alpha$ and $\beta$. Let us compute $C_k(AB)$. There are several cases. If $m \notin \beta$ then one can easily check that $A(\alpha, \beta) = (AB)(\alpha, \beta)$ and hence that $C_k(AB)_{\alpha,\beta} = C_k(A)_{\alpha,\beta}$. Now consider the case $m \in \beta$. Let $a_i$ denote the $k$-vector consisting of the entries in the $i$th column of $A$ and the rows corresponding to $\alpha$. Let the elements of $\beta$ be $i_1 < \cdots < i_k$, and assume that $i_r = m$. Then writing $(AB)(\alpha, \beta)$ column-wise we have

$(AB)(\alpha, \beta) = [a_{i_1} \cdots a_{i_{r-1}}\ (a_{i_r} + x a_l)\ a_{i_{r+1}} \cdots a_{i_k}].$
So, using the Laplace expansion of the determinant on the $r$th column of this matrix, we have

(9)    $\det((AB)(\alpha, \beta)) = \det([a_{i_1} \cdots a_{i_k}]) + \det([a_{i_1} \cdots a_{i_{r-1}}\ (x a_l)\ a_{i_{r+1}} \cdots a_{i_k}])$
       $= \det(A(\alpha, \beta)) + x \det([a_{i_1} \cdots a_{i_{r-1}}\ a_l\ a_{i_{r+1}} \cdots a_{i_k}]).$

Call the second matrix in the final expression $C$. There are two subcases. If $l \in \beta$ then two of the columns of $C$ are the same, so $\det(C) = 0$. Otherwise $l \notin \beta$ and the columns of $C$ are those of $A(\alpha, \beta')$ but in a different order. By the definition of $s(\beta)$, we know that $i_{r-s(\beta)-1} < l < i_{r-s(\beta)}$. So, if the $r$th column of $C$, which is $a_l$, is moved to the $[r - s(\beta)]$th position then the resulting matrix is $A(\alpha, \beta')$. This transformation can be accomplished by interchanging $a_l$ with the column on its left $s(\beta)$ times, so

$\det(C) = (-1)^{s(\beta)} \det(A(\alpha, \beta')).$

To summarize, we have shown

(10)    $C_k(AB)_{\alpha,\beta} = \begin{cases} C_k(A)_{\alpha,\beta} & \text{if } m \notin \beta \text{ or } \{l, m\} \subseteq \beta \\ C_k(A)_{\alpha,\beta} + (-1)^{s(\beta)} x\, C_k(A)_{\alpha,\beta'} & \text{otherwise.} \end{cases}$

From (8) one can verify that

$[C_k(A) C_k(B)]_{\alpha,\beta} = \sum_{\gamma \in \mathcal{A}} C_k(A)_{\alpha,\gamma} C_k(B)_{\gamma,\beta}$

is the same as (10). □

If $k = n$ the proof simplifies greatly, since we only have to show that $\det(AB) = \det(A)$ when $B$ is an elementary matrix. This follows easily from (9) and the fact that the second matrix in (9) has two identical columns and hence has determinant 0. Thus we have a proof of the multiplicativity of the determinant that does not involve much computation.

We could have avoided the use of Lemma 5 in the proof of Theorem 4 by using the LU factorization of $B$, a better known fact. One way to do this is to use the fact that any matrix $B \in M_n$ may be factored as $B = PLDU$ where $P$ is a permutation matrix, $L$ is a product of lower triangular elementary matrices, $U$ is a product of upper triangular elementary matrices, and $D$ is diagonal. However, this would require that one verify (6) for permutation matrices as well as for elementary matrices. Another way to avoid the use of Lemma 5 is to use the fact that if the leading principal minors of $B$ are non-zero then $B = LDU$, with $L$, $D$ and $U$ as before.
To verify (6) for such $B$ it is again sufficient to verify it for diagonal and elementary matrices. One can then obtain (6) for general $B$ by a limiting argument. The approach employed in this paper is perhaps the simplest.

REFERENCES
[1] R. Bhatia. Perturbation Bounds for Matrix Eigenvalues. Pitman Research Notes in Mathematics 162. Longman Scientific and Technical, New York, 1987.
[2] G. H. Golub and C. F. Van Loan. Matrix Computations. The Johns Hopkins University Press, Baltimore, second edition, 1989.
[3] R. A. Horn and C. R. Johnson. Matrix Analysis. Cambridge University Press, New York, 1985.
[4] A. W. Marshall and I. Olkin. Inequalities: Theory of Majorization and its Applications. Academic Press, London, 1979.
[5] R. Mathias. Two theorems on singular values and eigenvalues. Amer. Math. Monthly, 97(1):47-50, 1990.
[6] A. M. Ostrowski. Solution of Equations and Systems of Equations. Academic Press, London, 1960.