Baltzer Journals April 4, 996 Bounds for the Trace of the Inverse and the Determinant of Symmetric Positive Denite Matrices haojun Bai ; and Gene H. Golub ; y Department of Mathematics, University of Kentucky, Lexington, KY 40506 E-mail: bai@ms.uky.edu Scientic Computing and Computational Mathematics Program, Computer Science Department, Stanford University, Stanford, CA 94305 E-mail: golub@sccm.stanford.edu. Dedicated to T. Rivlin on the occasion of his 70th birthday Lower and upper bounds are given for the trace of the inverse tr(a? ) and the determinant det(a) of a symmetric positive denite matrix A. They are derived by applying Gaussian quadrature and related theory. The bounds for det(a) appears to be new. For the bounds of tr(a? ), the Kantorovich inequality is available for providing such bounds. In a number of examples, our bounds are found to be tighter when simple trial vectors are used in Kantorovich's bound. The new bounds are equivalent to Robinson and Wathen's variational bounds. But our bounds are directly derived for the quantity instead of the summation of bounds for each diagonal entry of A?. Subject classication: AMS(MOS) 65F0 Keywords: trace, inverse, determinant, quadrature Introduction There are a number of applications where it is desired to estimate the bounds for the quantities of the trace of the inverse tr(a? ) and the determinant det(a) of a matrix A, such as in the study of fractals [4, 8], lattice Quantum Chromodynamics (QCD) [5, 3], crystals [, ], the generalized cross-validation and its applications (see [7] and references therein). In this paper, we focus on deriving lower and upper bounds for the quantities tr(a? ) The author was supported in part by by an NSF grant ASC-933958 and in part by an DOE grant DE-FG03-94ER59. y The work of this author was in part supported by NSF under grant CCR-9505393.
Bai and Golub / Bounds for tr(a? ) and det(a) and det(a) of a symmetric positive denite matrix A. Throughout this paper, A will denote an n-by-n symmetric positive denite matrix with eigenvalues n : (A) is the set of all eigenvalues. The parameters and denote the bounds for the smallest and largest eigenvalues and n of A, 0 < ; n : a ij or (A) ij will denote the (i; j) entry of a matrix A. Using the eigenvalue decomposition and the denition of matrix function [6], it is easy to prove the identity ln(det(a)) = tr(ln(a)) () for a symmetric positive denite matrix A. Therefore, instead of bounding det(a), we will bound ln(det(a)), which is turned into bounding tr(ln(a)). Under this reformulation, the problems of bounding the quantities tr(a? ) and det(a) are unied to bound tr(f(a)) for f() =? and ln, respectively. We rst note that the Kantorovich inequality (x T A? x)(x T Ax) 4 n + n + (x T x) holds for all vectors x. For derivation of this inequality, see [9]. By using a simple trial vector x = (0; : : : ; 0; ; 0; : : :; 0) T in the inequality and noting that the upper bound is monotonically increasing in n =, it yields (A? ) ii 4a ii + + : () The summation of the upper bound for all diagonal entries of A? gives a Kantorovich's upper bound for tr(a? ). Another approach for bounding tr(a? ) is to use variational functional, Robinson and Wathen [3] show that + (? a ii) (a ii? s ii ) (A? ) ii? (a ii? ) (s ii? a ii ) ; (3) where s ii = P n k= a ik. Hence the sums of the lower and upper bounds for all diagonal entries of A? give Robinson and Wathen's lower and upper bounds for tr(a? ), respectively. Two other types of bounds for (A? ) ii are also presented in [3], but more information is required. The third approach is to use Gaussian quadrature and related theory. As discussed in [4, 5, ], one rst bounds the quantity x T f(a)x for a given vector x, then probabilistic bounds and estimates can be obtained for tr(f(a)) by using Monte Carlo simulation. We refer to [] for details. The new bounds derived in this paper also use Gaussian quadrature and related theory. But they are exact lower and upper bounds for tr(f(a)) instead of probabilistic
Bai and Golub / Bounds for tr(a? ) and det(a) 3 bounds as given in []. The new bounds are derived by directly considering the quantity tr(f(a)) instead of each diagonal entry of f(a). They are very cheap to compute. In a number of examples our bounds are found to be tighter than Kantorovich's upper bound, and are equivalent to the Robinson and Wathen's bounds, which is computationally more expensive. Our experiences indicate that the probabilistic bounds presented in [] are the most accurate, but they are also the most expensive ones in terms of computational costs and memory requirement. In section we present the lower and upper bounds for tr(a? ) and tr(ln(a)). A number of examples, coming from the dierent applications, and comparisons with the Kantorovich's upper bound, Robinson and Wathen's bounds and the estimates using Monte Carlo simulation are given in section 3. We give concluding remarks in section 4. Bounds for tr(a? ) and tr(ln(a)) Let r = tr(a r ) = nx i= r i = r d(); (4) where the weight function () of the Stieltjes integral is () = P n j= I(? j), and I() is the unit step function: I() = 0 if < 0 and I() = if 0. Note that one can easily compute 0 = n; = nx i= a ii ; = nx i;j= a ij = kak F : Our rst task is to use 0, and and the parameters and to determine a lower and an upper bound for? = tr(a? ). The approach is to use the classical Gaussian quadrature and related theory, see for example []. Specically, we use the Gauss-Radau quadrature rule. By the rule, the integral in (4) can be written as r = r d() = r + R[ r ]; (5) where r is the following quadrature formula r = w 0 t r 0 + w t r ; (6) w 0 and w are weights and to be determined. t 0 and t are nodes. The node t 0 is prescribed, say t 0 = or. t is unknown and to be determined. The remainder R[ r ] = 6 r(r? )(r? )r?3 (? t 0 )(? t ) d() for some < <. If R[ r ] 0, r is a lower bound of r and if R[ r ] 0, r is a upper bound of r. From (6), we see that r satises a second order dierence equation c r + d r?? r? = 0 (7)
Bai and Golub / Bounds for tr(a? ) and det(a) 4 for certain coecients c and d. The nodes t 0 and t are the roots of the characteristic polynomial p() = c + d? : (8) To determine the coecients c and d, by using (7) with the fact r = r for r = 0; ;, and the prescribed node t 0 being the root of the characteristic polynomial (8), we have c + d? 0 = 0; ct 0 + dt 0? = 0: Solving the above linear equations for c and d yields c = d t 0 t 0? 0 Once having the coecients c and d, the node t of the quadrature r is given by t =?=(t 0 c). For determining the weights w 0 and w, we note that Then i.e., w0 w = w 0 t 0 + w t ; = w 0 t 0 + w t : = t0 t t 0 t? To bound tr(a? ) =?, writing the dierence equation (7) with r =, we have c + d 0?? = 0:? = c + d 0 = 0 t 0 t 0? 0 : By the Gauss-Radau quadrature rule (5) with r =?, we have where the remainder? =? + R[? ]; R[? ] =? (? t 4 0 )(? t ) d() for some < <. If the prescribed node t 0 =, R[? ] 0, then? is an upper bound of?. If t 0 =, R[? ] 0, then? is a lower bound of?. In summary, we have the following bounds for tr(a? ). :
Bai and Golub / Bounds for tr(a? ) and det(a) 5 Theorem (Lower and upper bounds for tr(a? )) Let A be an n-by-n symmetric positive denite matrix, = tr(a), = kak F and (A) [; ] with > 0, then n? n tr(a? ) n? n Let us turn to our second task for bounding tr(ln(a)). Note that the identity () can be further written as ln(det(a)) = tr(ln A) = nx i= (ln i ) = (ln )d(); where the weight function () of the Stieltjes integral is the same as the weight function dened in (4). Again, by using the Gauss-Radau quadrature rule, we have tr(ln(a)) = where the quadrature term I[ln ] is The remainder (ln )d() = I[ln ] + R[ln ]; I[ln ] = w 0 ln(t 0 ) + w ln(t ): R[ln ] = (? t 3 3 0 )(? t ) d() for some < <. Therefore, if t 0 =, R[ln ] 0, then I[ln ] is a lower bound of tr(ln(a)). If t 0 =, R[ln ] 0, then I[ln ] is an upper bound. Therefore, we derive the following bounds for tr(ln(a)). Theorem (Lower and upper bounds for tr(ln(a))) Let A be an n-by-n symmetric positive denite matrix, = tr(a), = kak F and (A) [; ] with > 0, then ln ln t t t? where tr(ln(a)) ln ln t t t t =? n? and t =? n?? The bounds (9) and (0) involve only the trace of the lower orders of the matrix power A r, namely, 0 = tr(a 0 ) = n, = tr(a ) and = tr(a ). They can be easily computed. If the trace of the higher orders of the matrix power A r, r 3, are available, then using the principles discussed above, we can derive tighter bounds. The parameters and for the bounds of eigenvalues of A must be provided, which happens to be required in all such bounds discussed in Section. (9) (0)
Bai and Golub / Bounds for tr(a? ) and det(a) 6 Table Lower and upper bounds for tr(a? ) Matrix (order) \Exact" MC estimation Lower bound Upper bound Poisson (900) 5:644 0 5:00 0 :6085 0 8:74445 0 3 Wathen (34) 6:60 0 6:09 0 4:4944 0 9:945 0 Heat ow (65) 3:657 0 3:6579 0 3:59979 0 3:73996 0 3 Examples In this section, we use four examples to show the tightness of the bounds given in (9) and (0). We will also compare with the Kantorovich's upper bound (), Robinson and Wathen's bounds (3) and the approach using Monte Carlo simulation (henceforth the MC estimation) described in []. Numerical experiments are carried out in Matlab environment on a SUN Sparcstation 0. The so-called \exact" value of tr(a? ) is computed by analytic formulas if available, or by rst computing the inverse using function inv in Matlab, and then calculating the trace. For the \exact" value of tr(ln(a)) = ln(det(a)), we use the analytic formula if available, or rst compute the Cholesky decomposition of A using Matlab function chol and then compute the natural logarithm of the product of diagonals of Cholesky factor. Example (Pei matrix): Consider the so-called n-by-n Pei matrix A = I + uu T, where u = (; ; : : :; ) T [8]. It is easy to see that A has two distinct eigenvalues and n +. The eigenvalue has multiplicity n?. If > 0, A is symmetric positive denite. By the Sherman-Morrison formula [6], the inverse of A can be written as A? = I? ( + n) uut Then tr(a? ) = n? n and tr(ln(a)) = (n? ) ln + ln( + n). It is easy to compute (+n) that = tr(a) = ( +)n and = tr(a ) = n +( +)n. If let parameters = and = n+, then by straightforward algebraic calculation, both lower and upper bounds for tr(a? ) in (9) are equal to the exact value. Similarly, one can also easily show that both lower and upper bounds for tr(ln(a)) in (0) are also equal to the exact value. For this example, the bounds (9) and (0) are perfect! Of course, one can also verify that for this example, there are no intergration errors (the remainders are zero) in the Gauss-Radau quadrature rule. For this example, Robinson and Wathen's bounds for tr(a? ) are also equal to the exact value. Example (Poisson matrix): The matrix of order m is a block tridiagonal matrix from the 5-point central dierence discretization of the -D Poisson's equation on a m m square mesh [8]. It can be shown that the parameters and for the bounds of the smallest and largest eigenvalues are = m+ and = 8, respectively. In Tables and, we have tabulated the exact values for tr(a? ) and tr(ln(a)), the estimated values by the MC estimation []. and the lower and upper bounds given in (9) and (0) for the 900 by 900 Poisson matrix (i.e., m = 30).
Bai and Golub / Bounds for tr(a? ) and det(a) 7 Table Lower and upper bounds for tr(ln(a)) Matrix (order) \Exact" MC estimation Lower bound Upper bound Poisson (900) :06500 0 3 :0603 0 3 4:7386 0 :6857 0 3 Wathen (34)?:007 0?:063 0?:0065 0?6:86595 0 Heat ow (65) 3:5679 0 3:5075 0 3:47348 0 3:54997 0 The Kantorovich's upper bound for tr(a? ) is :008 0 4. Robinson and Wathen's lower and upper bounds are :60969 0 and 8:7379 0 3. Note that the estimates from the Monte Carlo simulation for tr(a? ) and tr(ln(a)) are only at % and 0.4% of relative errors of the actual values, respectively []. Example 3 (Wathen matrix): The matrix A is \wathen(n x ; n y )" in the set of Matlab test matrices collection by Higham [8]. It is a consistent mass matrix in nite element computations for a regular n x -by-n y grid of 8-node (serendipity) elements in space dimensions (see [6]). The resulting matrix is of order n = 3n x n y + n x + n y +. Let D be the diagonals of A, then realistic bounds for the eigenvalues of D? A are given by Wathen [7]. In our numerical experiment, we let n x = n y = 0. Then the matrix A is of order n = 34 and = 0:5 and = 4:5. The bounds for tr(d A? D ) and tr(ln(d? AD? )) are tabulated in Table and. The Kantorovich's upper bound for tr(d A? D ) is :70974 0 3 and the Robinson and Wathen's lower and upper bounds are 4:5380 0 and 9:995 0, respectively. Again, note that the estimated values of the MC estimation are only at 0.8% and 0.% of relative errors of the actual values. Example 4: This test matrix is from [0], see also [3]. The matrix is resulted from the implicit nite dierent discretization of a linear heat ow problem. It is a m by m block tridiagonal matrix of the form A = 0 B @ D C C D......... C C D where D is a m m tridiagonal matrix with + 4 on diagonal, and? on super- and sub-diagonal, and C is a diagonal matrix with diagonal entries. is the ratio of the time step and the square of grid size. The Gershgorin circle theorem gives = and = + 8 for the eigenvalue bounds of A. We have tabulated in Tables and the bounds for tr(a? ) and tr(ln(a)), respectively, where n = 65 (m = 5) and = 0:5. The Kantorovich's upper bound for tr(a? ) is 4:369 0 and the Robinson and Wathen's lower and upper bounds are 3:59996 0 and 3:7397 0, respectively. C A ;
Bai and Golub / Bounds for tr(a? ) and det(a) 8 4 Concluding Remarks Simple bounds for tr(a? ) and tr(ln(a)) of a symmetric positive denite matrix A are derived by using Gaussian quadrature and related theory. The bounds involve only n (the order of A), tr(a) and kak F and the parameters and, namely for the bounds of eigenvalues of A. From the numerical examples presented in Section 3 and numerous other experiments conducted, our bounds for tr(a? ) are found to be tighter when simple trial vectors are used in the Kantorovich's bound, and are equivalent to the Robinson and Wathen's bounds. But it is generally poorer than the probablistic bounds and estimations for tr(a? ) and ln(det(a)) derived by using Gaussian quadrature and Monte Carlo simulation []. The latter costs signicantly more arithmetic operations and memory. But a fully parallelism scheme can be developed for the simulation []. We point out that using Gaussian quadrature and related theory, we have the advantage of easily extending the approach for tr(a? ) to tr(ln(a)), while the approaches based on Kantorovich inequality and variational inequality do not enjoy. If the traces of higher orders of the matrix power A r, r 3, are available, then bounds can be further tightened by using the same technique. Moreover, one could use modied moments to get improved estimates of the quadrature rule. 5 Afternote While we were nishing up this paper, we read an article by Ortner and Krauter on lower bounds for the determinant and the trace of the inverse in the most recent issue of Linear Algebra and its Applications []. One central problem studied by Ortner and Krauter is to nd lower bounds of tr(p? ) and det(p? ), where P = m XT X, X is a given m- by-n (m n) full rank matrix, whose rows have unit length. This problem arises from accuracy considerations in real second-rank tensor measurements of single crystals []. By using standard matrix theory, one can show that the condition number of the matrix X is related to the quantity tr(p? ), namely, F (X) = kxk F kx y k F = p tr(p? ): where X y is the Moore-Penrose inverse of X [6]. Therefore, a lower bound of tr(p? ) also gives a lower bound of the condition number of X. Using an approach based on matrix theory and combinatorics, various lower bounds of tr(p? ) are derived in []. The sharpness of those lower bounds are demonstrated for small m and n ([, Example 3]). Our approach yields the same lower bounds for their set of test problems, provided that the extreme eigenvalues of P are available. As indicated by Ortner and Krauter [], in most cases, it is very hard, if not impossible, to nd a suitable conguration of the row vectors of X to attain the optimal lower bound. Since our approach gives both lower and upper bounds of the quantity tr(p? ), it might provide a way to assess how far a given conguration is from the optimal conguration. It would be interesting to make further investigations in this direction.
Bai and Golub / Bounds for tr(a? ) and det(a) 9 References []. Bai, M. Fahey, and G. Golub. Some large scale matrix computation problems. J. of Comput. and Appl. Math., 996. to appear. [] P. Davis and P. Rabinowitz. Methods of Numerical Integration. Academic Press, NY, 984. [3] S. Dong and K. Liu. Stochastic estimation with z noise. Physics Letters B, 38 (994) 30{36. [4] G. Golub and G. Meurant. Matrices, moments and quadrature. in Proceedings of the 5th Dundee Conference, June 993, D. F. Griths and G. A. Watson, Eds., Longman Scientic & Technical, 994. [5] G. Golub and. Strakos. Estimates in quadratic formulas. Numerical Algorithms, 8 (994) 4-68. [6] G. Golub and C. Van Loan. Matrix Computations. Johns Hopkins University Press, Baltimore, MD, nd edition, 989. [7] G. Golub and U. Von Matt. Generalized cross-validation for large scale problems. Report SCCM-96-05, Stanford University, Stanford, 996. [8] N. J. Higham. The test matrix toolbox for Matlab. Numerical Analysis Report 37, University of Manchester, Manchester, 993. [9] A. S. Householder. The Theory of Matrices in Numerical Analysis. Blaisdell, New York, NY, 964. Dover edition 975. [0] A. R. Mitchell and D. F. Griths. The Finite Dierence Method in Partial Dierential Equations. Wiley, NY, 980. [] B. Ortner. On the selection of measurement directions in second-rank tensor (e.g. elastic strain) determination of single crystals. J. Appl. Cryst., (989) 6{. [] B. Ortner and A. R. Krauter. Lower bounds for the determinant and the trace of a class of Hermitian matrices. Lin. Alg. and Appl., 36 (996) 47{80. [3] P. Robinson and A. J. Wathen. Variational bounds on the entries of the inverse of a matrix. IMA Journal of Numerical Analysis, (99) 463{486. [4] B. Sapoval, Th. Gobron, and A. Margolina. Vibrations of fractal drums. Phy. Rev. Lett., 67 (99) 974{977. [5] J. C. Sexton and D. H. Weingarten. Systematic expansion for full QCD based on the valence approximation, IBM T. J. Watson Research Center, Preprint, 994 [6] G. Strang and G. Fix. An Analysis of the Finite Element Method. Prentice Hall, Englewood Clis, NJ, 973. [7] A. J. Wathen. Realistic eigenvalue bounds for the Galerkin mass matrix. IMA Journal of Numerical Analysis, 7 (987) 449{457. [8] S. Y. Wu, J. A. Cocks, and C. S. Jayanthi. An accelerated inversion algorithm using the resolvent matrix method. Comput. Phy. Comm., 7 (99) 5{33.