Teil 2: Hierarchische Matrizen


1 Teil 2: Hierarchische Matrizen. Mario Bebendorf, Universität Leipzig.

2 Fast Summation Methods
Fast multipole methods (Rokhlin); panel clustering (Hackbusch/Nowak); mosaic-skeleton methods (Tyrtyshnikov); hierarchical matrices (Hackbusch and Hackbusch/Khoromskij).
Main idea: Approximation errors can be tolerated as long as their size is of the order of the discretization error.
Aim: Solution of linear systems with complexity O(n log n).
Here: Emphasis is laid on algorithmic aspects; approximation theory is neglected.

3 Low-Rank Matrices
Let m, n ∈ N and let A ∈ C^{m×n} be a matrix. The rank of A is the dimension of its range, rank A := dim{Ax : x ∈ C^n}.
Theorem. Let m, n, k ∈ N. Then it holds that
(a) rank A ≤ min{m, n} for all A ∈ C^{m×n};
(b) rank(AB) ≤ min{rank A, rank B} for all A ∈ C^{m×p} and all B ∈ C^{p×n};
(c) rank(A + B) ≤ rank A + rank B for all A, B ∈ C^{m×n}.
We denote the set of matrices A ∈ C^{m×n} having at most k linearly independent rows or columns by C^{m×n}_k := {A ∈ C^{m×n} : rank A ≤ k}.
NOTE: C^{m×n}_k is not a linear space; the rank of the sum of two rank-k matrices is in general only bounded by 2k. The rank of each sub-block of A ∈ C^{m×n}_k is bounded by k.

4 Efficient representation
Since among the n columns of A ∈ C^{m×n}_k only k are sufficient to represent the whole matrix by linear combination, the entrywise representation of A contains redundancies which can be removed by changing to another kind of representation.
Theorem. A matrix A ∈ C^{m×n} belongs to C^{m×n}_k if and only if there are matrices U ∈ C^{m×k} and V ∈ C^{n×k} such that
A = UV^H. (1)
The representation (1) of matrices from C^{m×n}_k is called outer-product form. If u_i, v_i, i = 1,...,k, denote the columns of U and V, respectively, then (1) can equivalently be written as A = Σ_{i=1}^k u_i v_i^H.
Hence, instead of storing the m·n entries of A ∈ C^{m×n}_k, we can equally store the vectors u_i, v_i, i = 1,...,k, which require k(m + n) units of storage.

5 In addition to reducing the storage requirements, the outer-product form (1) also facilitates matrix-vector multiplications: Ax = UV^H x = U(V^H x), i.e., instead of computing the update y := y + Ax in the usual (entrywise) way, A can alternatively be multiplied by x using the following two-step procedure:
(a) compute z := V^H x ∈ C^k;
(b) compute y := y + Uz.
Hence, instead of the 2mn arithmetic operations required by the entrywise representation, the outer-product form amounts to 2k(m + n) − k operations.
Assume for a moment that m = n. The outer-product form replaces one dimension in the complexity of matrices from C^{m×n}_k by 2k, i.e., instead of m·n = m² we have k(m + n) = 2km; this representation is not advantageous for large k. The following definition characterizes matrices for which the outer-product form is advantageous compared with the entrywise representation.
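The two-step procedure is easy to state in code. The following NumPy sketch (our own illustration with made-up names, not part of the lecture material) performs the update y := y + U(V^H x) without ever forming UV^H:

import numpy as np

def lowrank_matvec(U, V, x, y=None):
    """Update y := y + (U V^H) x in the outer-product form, cost about 2k(m+n)."""
    if y is None:
        y = np.zeros(U.shape[0], dtype=np.result_type(U, x))
    z = V.conj().T @ x          # step (a): z := V^H x, about 2kn operations
    y = y + U @ z               # step (b): y := y + U z, about 2km operations
    return y

# quick check against the entrywise product
rng = np.random.default_rng(0)
m, n, k = 1000, 800, 5
U, V, x = rng.standard_normal((m, k)), rng.standard_normal((n, k)), rng.standard_normal(n)
assert np.allclose(lowrank_matvec(U, V, x), (U @ V.conj().T) @ x)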

6 Definition. A matrix A ∈ C^{m×n}_k is called a matrix of low rank if k(m + n) < m·n.
Besides arithmetic operations, the norm of a matrix is also often required. The Frobenius norm
||A||_F := (trace A^H A)^{1/2} = (Σ_{i=1}^m Σ_{j=1}^n |a_ij|²)^{1/2}
of A ∈ C^{m×n}_k can be computed with 2k²(m + n) operations since
||UV^H||_F² = Σ_{i,j=1}^k (u_i^H u_j)(v_i^H v_j). (2)
Similarly, the spectral norm
||A||_2 := √ρ(A^H A) = √ρ(VU^H UV^H) = √ρ(U^H U V^H V),
where ρ(A) denotes the spectral radius of A, can be evaluated with O(k²(m + n)) arithmetic operations.

7 Sometimes it is useful that U and V have orthonormal columns, i.e., U^H U = I_k = V^H V. In this case we have to introduce an additional diagonal coefficient matrix Σ = diag(σ_1,...,σ_k) ∈ R^{k×k} and replace (1) by the singular value decomposition
A = UΣV^H = Σ_{i=1}^k σ_i u_i v_i^H. (3)
If the representation (3) with matrices U, V having orthonormal columns is employed, the computation of the Frobenius norm simplifies to
||UΣV^H||_F = ||Σ||_F = (Σ_{i=1}^k σ_i²)^{1/2},
which requires only O(k) operations. Since the spectral norm is unitarily invariant, too, we have ||UΣV^H||_2 = ||Σ||_2 = σ_1, where we assume that the singular values σ_i, i = 1,...,k, of A are non-increasingly ordered, i.e., σ_1 ≥ σ_2 ≥ ... ≥ σ_k.

8 Adding and multiplying low-rank matrices
We consider the multiplication of low-rank matrices A ∈ C^{m×p}_{k_A} and B ∈ C^{p×n}_{k_B} in outer-product representation A = U_A V_A^H and B = U_B V_B^H.
The rank of the product AB is bounded by min{k_A, k_B}. Hence, the outer-product form will be advantageous for the product AB as well. There are two possibilities for computing AB = UV^H:
(a) U := U_A (V_A^H U_B) and V := V_B, using 2 k_A k_B (m + p) − k_B(m + k_A) operations;
(b) U := U_A and V := V_B (U_B^H V_A), using 2 k_A k_B (p + n) − k_A(n + k_B) operations.
Depending on the quantities k_A, k_B, m, and n, either representation should be chosen.
If exactly one of the matrices A or B is stored entrywise, say B, we have the outer-product representation AB = UV^H with U := U_A ∈ C^{m×k_A} and V := B^H V_A ∈ C^{n×k_A}. This requires k_A(2p − 1)n operations.

9 If A and B are to be added, the sum A + B has the outer-product representation A + B = UV^H with U := [U_A, U_B] ∈ C^{m×k} and V := [V_A, V_B] ∈ C^{n×k}, which guarantees that
rank(A + B) ≤ k_A + k_B =: k. (4)
Hence, apart from the reorganization of the data structure, no numerical operations are required for adding two matrices in outer-product form.
A bound smaller than (4) for the rank of A + B cannot be found for general low-rank matrices. Hence, the rank of A + B will be considerably larger than the ranks of A and B, although A + B might be close to a matrix of much smaller rank.

10 Approximation by low-rank matrices
Although matrices usually have full rank, they can often be approximated by matrices of much lower rank. The following theorem states that the closest matrix in C^{m×n}_k to a given matrix from C^{m×n}, m ≥ n, can be obtained from the singular value decomposition (SVD) A = UΣV^H with U^H U = I_n = V^H V and a diagonal matrix Σ ∈ R^{n×n} with entries σ_1 ≥ ... ≥ σ_n. Interestingly, this result is valid for any unitarily invariant norm.
Theorem. Let the SVD A = UΣV^H of A ∈ C^{m×n}, m ≥ n, be given. Then for k ∈ N satisfying k ≤ n it holds that
min_{M ∈ C^{m×n}_k} ||A − M|| = ||A − A_k|| = ||Σ − Σ_k||,
where A_k := UΣ_k V^H ∈ C^{m×n}_k and Σ_k := diag(σ_1,...,σ_k, 0,...,0) ∈ R^{n×n}.
Note that the approximant UΣ_k V^H has the representation (3). If the outer-product representation is preferred, either U or V has to be multiplied by Σ_k.

11 If the spectral norm ||·||_2 is used in the previous theorem, then ||A − A_k||_2 = σ_{k+1}. In the case of the Frobenius norm one has instead
||A − A_k||_F² = Σ_{l=k+1}^n σ_l².
The information about the error ||A − A_k|| = ||Σ − Σ_k|| can also be used in the opposite way. If on the other hand a relative accuracy ε > 0 of the approximant A_k is prescribed, i.e., ||A − A_k||_2 < ε ||A||_2, then due to the previous theorem the required rank k(ε) is given by
k(ε) := min{k ∈ N : σ_{k+1} < ε σ_1}.
In order to find the optimal approximant A_k, it remains to compute the singular value decomposition, which requires O(mn²) operations for general matrices A ∈ C^{m×n}, m ≥ n. If the given matrix A has low rank, then its SVD can be computed with significantly fewer operations.
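As an illustration of the rank criterion k(ε), the following NumPy sketch (our own example, not from the slides) truncates an SVD at the smallest k with σ_{k+1} < ε σ_1:

import numpy as np

def truncate_svd(A, eps):
    """Best approximation of A with relative spectral-norm accuracy eps."""
    U, s, Vh = np.linalg.svd(A, full_matrices=False)
    k = int(np.count_nonzero(s >= eps * s[0]))   # k(eps) = min{k : sigma_{k+1} < eps*sigma_1}
    return U[:, :k], s[:k], Vh[:k, :]

A = np.vander(np.linspace(0.0, 1.0, 50), 40)     # nearly rank-deficient test matrix
U, s, Vh = truncate_svd(A, 1e-8)
print(len(s), np.linalg.norm(A - (U * s) @ Vh, 2) / np.linalg.norm(A, 2))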

12 Singular value decomposition of low-rank matrices
For matrices A = UV^H ∈ C^{m×n}_k it is possible to compute an SVD with complexity O(k²(m + n)). Assume we have computed the QR decompositions U = Q_U R_U and V = Q_V R_V of U ∈ C^{m×k} and V ∈ C^{n×k}, respectively. Note that this can be done with 4k²(m + n) operations. The outer product of the two k×k upper triangular matrices R_U and R_V is then decomposed using the SVD of R_U R_V^H:
R_U R_V^H = Û Σ̂ V̂^H.
Computing R_U R_V^H needs k²(k + 1) operations, and the cost of the SVD amounts to 21k³ operations. Since Q_U Û and Q_V V̂ both have orthonormal columns,
A = UV^H = (Q_U Û) Σ̂ (Q_V V̂)^H
is an SVD of A. The number of arithmetic operations for the SVD of a rank-k matrix thus sums up as follows:
QR decomposition of U and V: 4k²(m + n)
Computing R_U R_V^H: k²(k + 1)
SVD of R_U R_V^H: 21k³
Computing Q_U Û and Q_V V̂: k(2k − 1)(m + n)
Total: about 6k²(m + n) + 22k³ operations.
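The procedure translates almost line by line into the following NumPy sketch (again our own illustration; the variable names are assumptions):

import numpy as np

def svd_of_lowrank(U, V):
    """SVD of A = U V^H given in outer-product form, in O(k^2(m+n) + k^3)."""
    QU, RU = np.linalg.qr(U)                            # U = Q_U R_U
    QV, RV = np.linalg.qr(V)                            # V = Q_V R_V
    Uh, sigma, Vh = np.linalg.svd(RU @ RV.conj().T)     # k x k core SVD
    return QU @ Uh, sigma, QV @ Vh.conj().T             # A = (Q_U Uh) diag(sigma) (Q_V Vh^H)^H

rng = np.random.default_rng(1)
U, V = rng.standard_normal((600, 8)), rng.standard_normal((400, 8))
Uhat, sigma, Vhat = svd_of_lowrank(U, V)
assert np.allclose(U @ V.conj().T, (Uhat * sigma) @ Vhat.conj().T)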

13 Approximate addition of low-rank matrices
When computing the sum of two low-rank matrices, we have to deal with the problem that C^{m×n}_k is not a linear space. The sum A + B of two matrices from C^{m×n}_k might, however, be close to a matrix of much smaller rank. In this case the sum of two low-rank matrices can be truncated to rank k using the SVD of low-rank matrices from the last section. This truncation will be referred to as the rounded addition.
Theorem. Let A ∈ C^{m×n}_{k_A}, B ∈ C^{m×n}_{k_B}, and k ∈ N with k ≤ k_A + k_B. Then a matrix S ∈ C^{m×n}_k satisfying
||A + B − S|| = min_{M ∈ C^{m×n}_k} ||A + B − M||
with respect to any unitarily invariant norm can be computed with about 6(k_A + k_B)²(m + n) + 22(k_A + k_B)³ operations.
In some applications it may also occur that the sum of several low-rank matrices A_i ∈ C^{m×n}_{k_i}, i = 1,...,l, has to be rounded. In this case, the complexity analysis reveals a factor (Σ_{i=1}^l k_i)² in front of m + n. In order to avoid this, we gradually compute the rounded sum pairwise.
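On top of the routine from the previous sketch, the rounded addition is a few lines (again only an illustration of ours): concatenate the factors, compute the SVD of the rank-(k_A + k_B) sum, and keep the k dominant singular triplets.

import numpy as np   # svd_of_lowrank as defined in the previous sketch

def rounded_add(UA, VA, UB, VB, k):
    """Best rank-k approximation of UA VA^H + UB VB^H."""
    Uhat, sigma, Vhat = svd_of_lowrank(np.hstack([UA, UB]), np.hstack([VA, VB]))
    return Uhat[:, :k] * sigma[:k], Vhat[:, :k]          # truncated factors U, V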

14 Exploiting the SVD for the rounded addition
The rounded addition is the most time-consuming part in the arithmetic of hierarchical matrices. The numerical effort can be reduced if, for instance, the SVD representation is used. Assume that A = U_A Σ_A V_A^H and B = U_B Σ_B V_B^H, where U_A, V_A, U_B, and V_B have orthonormal columns and Σ_A ∈ R^{k_A×k_A} and Σ_B ∈ R^{k_B×k_B} are diagonal matrices. Then
A + B = [U_A, U_B] diag(Σ_A, Σ_B) [V_A, V_B]^H.
Assume that k_A ≥ k_B. In order to reestablish a representation of type (3), we have to orthogonalize the columns of the matrices [U_A, U_B] and [V_A, V_B]. Let X_U := U_A^H U_B ∈ C^{k_A×k_B} and Y_U := U_B − U_A X_U ∈ C^{m×k_B}. Furthermore, let Q_U R_U = Y_U, Q_U ∈ C^{m×k_B}, be a QR decomposition of Y_U. Then
[U_A, U_B] = [U_A, Q_U] [[I, X_U], [0, R_U]]
is a QR decomposition of [U_A, U_B].

15 Similarly,
[V_A, V_B] = [V_A, Q_V] [[I, X_V], [0, R_V]]
is a QR decomposition of [V_A, V_B], where X_V := V_A^H V_B ∈ C^{k_A×k_B} and Q_V R_V = Y_V is a QR decomposition of Y_V := V_B − V_A X_V ∈ C^{n×k_B}. We obtain
A + B = [U_A, Q_U] [[Σ_A + X_U Σ_B X_V^H, X_U Σ_B R_V^H], [R_U Σ_B X_V^H, R_U Σ_B R_V^H]] [V_A, Q_V]^H.
From this point on one proceeds in the same way as for the SVD of low-rank matrices. For the complexity analysis we concentrate on those terms which depend on m. The orthogonalization of [U_A, U_B] using the previous method requires
Computing X_U: 2 k_A k_B m
Computing Y_U: (k_A + 1) k_B m
Decomposing Y_U: 4 k_B² m
Total: (3k_A + 4k_B + 1) k_B m
operations, while the orthogonalization of [U_A, U_B] using the QR decomposition needs 4(k_A + k_B)² m operations.

16 If k := k_A = k_B, then the proposed variant requires (7k + 1)km operations, while 16k²m operations are needed to decompose [U_A, U_B].
Table: CPU times of the old and the new variant of the rounded addition and the resulting gain for several combinations of m, n, k_A, and k_B. The presented CPU times are the times for additions with accuracy ε = 10^{-2}.

17 Agglomerating low-rank blocks
We will come across the problem of unifying neighboring blocks into a single one for the purpose of saving memory. This operation will be referred to as agglomeration. Assume that a 2×2 block matrix
[[A_1, A_2], [A_3, A_4]]
consisting of four low-rank matrices A_i = U_i V_i^H, i = 1,...,4, each having rank at most k, is to be approximated by a single matrix A = UV^H ∈ C^{m×n}_k. Since
[[A_1, A_2], [A_3, A_4]] = [[A_1, 0], [0, 0]] + [[0, A_2], [0, 0]] + [[0, 0], [A_3, 0]] + [[0, 0], [0, A_4]],
this problem may be regarded as a rounded addition of four low-rank matrices. Therefore, a best approximation in C^{m×n}_k can be computed using the SVD of low-rank matrices.

18 Compared with the rounded addition of general low-rank matrices, the presence of zeros should be taken into account. Since
[[A_1, A_2], [A_3, A_4]] = Û V̂^H,
where
Û := [[U_1, U_2, 0, 0], [0, 0, U_3, U_4]] and V̂ := [[V_1, 0, V_3, 0], [0, V_2, 0, V_4]],
it is enough to compute QR decompositions of [U_1, U_2], [U_3, U_4], [V_1, V_3], and [V_2, V_4]. The overall number of arithmetic operations is made up of these four QR decompositions, the computation of R_Û R_V̂^H, its SVD (21(4k)³ operations), and the assembly of the unitary factors; in total it is of the order k²(m + n) + k³.
The amount of operations can be reduced if each of the matrices [A_1, A_2] and [A_3, A_4] is agglomerated before agglomerating the results.

19 Degenerate Kernels
Typically, only sub-blocks of A ∈ C^{I×J}, I = {1,...,m} and J := {1,...,n}, can be approximated by low-rank matrices.
COMMON: the ith component of a vector x ∈ C^I is denoted by x_i.
GENERALIZATION: If t ⊂ I, then x_t ∈ C^t denotes the restriction of x to the indices in t. Consequently, A_ts or A_b denotes the restriction of a given matrix A ∈ C^{m×n} to the indices in b := t × s, where t ⊂ I and s ⊂ J.
In this part we consider matrices A ∈ R^{I×J} with blocks A_ts of the form A_ts = Λ_{1,t} 𝒜 Λ*_{2,s}, which arise from the discretization of integral operators
(𝒜v)(y) = ∫_Ω κ(x, y) v(x) dμ_x,  y ∈ Ω.
Here, Ω ⊂ R^d denotes the domain of integration and μ is an associated measure. If Ω is a (d−1)-dimensional manifold in R^d, for instance, then μ denotes the surface measure. Furthermore,
(Λ_{1,t} f)_i = ∫_Ω f(x) ψ_i(x) dμ_x,  (Λ_{2,s} f)_j = ∫_Ω f(x) ϕ_j(x) dμ_x,
and Λ*_{2,s} : R^s → L²(Γ) is the adjoint of Λ_{2,s} : L²(Γ) → R^s, defined by (Λ*_{2,s} z, f)_{L²(Γ)} = z^T (Λ_{2,s} f) for all z ∈ R^s, f ∈ L²(Γ).

20 Each set of rows t ⊂ I and each set of columns s ⊂ J is connected with the subdomains
Y_t := ∪_{i∈t} Y_i and X_s := ∪_{j∈s} X_j,
which are the unions of the supports Y_i ⊂ Ω and X_j ⊂ Ω of the finite element basis functions ψ_i, ϕ_j defined on the computational domain Ω.
Assume that the kernel function κ is degenerate on Y_t × X_s. Later on, we will find conditions on t and s for this assumption to hold.
Definition. Let D_1, D_2 ⊂ R^d be two domains. A kernel function κ : D_1 × D_2 → R is called degenerate if k ∈ N and functions u_l : D_1 → R and v_l : D_2 → R, l = 1,...,k, exist such that
κ(x, y) = Σ_{l=1}^k u_l(x) v_l(y),  x ∈ D_1, y ∈ D_2.
The number k is called the degree of degeneracy.

21 The rank of A_ts is bounded by k. To see this, let a_l = Λ_{1,t} v_l ∈ R^t and b_l = Λ_{2,s} u_l ∈ R^s, l = 1,...,k. For z ∈ R^s we have b_l^T z = (u_l, Λ*_{2,s} z)_{L²(X_s)}. Since for y ∈ Y_t
(𝒜Λ*_{2,s} z)(y) = ∫_{X_s} κ(x, y)(Λ*_{2,s} z)(x) dμ_x = ∫_{X_s} Σ_{l=1}^k u_l(x) v_l(y)(Λ*_{2,s} z)(x) dμ_x
= Σ_{l=1}^k v_l(y) ∫_{X_s} u_l(x)(Λ*_{2,s} z)(x) dμ_x = Σ_{l=1}^k v_l(y) b_l^T z,
we obtain for the sub-block A_ts of A
A_ts = Λ_{1,t} 𝒜 Λ*_{2,s} = Λ_{1,t} Σ_{l=1}^k v_l b_l^T = Σ_{l=1}^k (Λ_{1,t} v_l) b_l^T = Σ_{l=1}^k a_l b_l^T.
Therefore, degenerate kernels lead to low-rank matrices if t and s are large enough compared with k, i.e., if k(|t| + |s|) < |t||s|.
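A small numerical illustration (our own toy example): for the degenerate kernel κ(x, y) = Σ_{l<k} x^l y^l the resulting block factors as A = UV^T and therefore has rank at most k, no matter how large |t| and |s| are.

import numpy as np

k = 4
x = np.linspace(0.0, 1.0, 200)            # points associated with the cluster t
y = np.linspace(2.0, 3.0, 150)            # points associated with the cluster s
U = np.vander(x, k, increasing=True)      # U[i, l] = u_l(x_i) = x_i^l
V = np.vander(y, k, increasing=True)      # V[j, l] = v_l(y_j) = y_j^l
A = U @ V.T                               # A_ij = kappa(x_i, y_j)
print(np.linalg.matrix_rank(A))           # prints 4 (= k)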

22 Asymptotically smooth kernels
If 𝒜 is an elliptic operator, then its kernel function κ is asymptotically smooth.
Definition. A function κ : Ω × R^d → R satisfying κ(x, ·) ∈ C^∞(R^d \ {x}) for all x ∈ Ω is called asymptotically smooth in Ω with respect to y if constants c and γ can be found such that for all x ∈ Ω and all α ∈ N^d
|∂_y^α κ(x, y)| ≤ c p! γ^p r^{−p} sup_{z ∈ B_r(y)} |κ(x, z)|
for all y ∈ R^d \ {x}, where r = |x − y|/2 and p = |α|.
Let κ : D_1 × D_2 → R be analytic with respect to its second argument y and let ξ_{D_2} denote the Chebyshev center of D_2. Then κ has a Taylor expansion
κ(x, y) = Σ_{|α| < p} (1/α!) ∂_y^α κ(x, ξ_{D_2}) (y − ξ_{D_2})^α + R_p(x, y),
where
R_p(x, y) := Σ_{|α| ≥ p} (1/α!) ∂_y^α κ(x, ξ_{D_2}) (y − ξ_{D_2})^α
denotes the remainder of the expansion, which converges to zero for p → ∞. The rate of convergence can, however, be arbitrarily bad.

23 Note that
T_p[κ](x, y) := Σ_{|α| < p} (1/α!) ∂_y^α κ(x, ξ_{D_2}) (y − ξ_{D_2})^α
is a degenerate kernel approximation. Since T_p[κ](x, ·) ∈ Π^d_{p−1}, the degree of degeneracy is the dimension of the space of d-variate polynomials of order at most p − 1,
k = dim Π^d_{p−1} ∼ p^d.
The importance of asymptotic smoothness is that it leads to exponential convergence of the Taylor series if D_1 and D_2 are far enough away from each other. For the following lemma we assume that
η dist(ξ_{D_2}, D_1) ≥ ρ_{D_2},
where η > 0 and ξ_{D_2} is the Chebyshev center of D_2 ⊂ R^d, i.e., the center of the ball with minimum radius ρ_{D_2} containing D_2.
Lemma. Assume that the above condition holds with η > 0 satisfying 2γ√d η < 1. If κ is asymptotically smooth on the convex set D_2 with respect to y, then with r := |x − ξ_{D_2}|/2 it holds for all x ∈ D_1 and y ∈ D_2 that
|κ(x, y) − T_p[κ](x, y)| ≤ (2γ√d η)^p / (1 − 2γ√d η) · sup_{z ∈ B_r(ξ_{D_2})} |κ(x, z)|.

24 Proof. For the remainder R_p it holds that
|R_p(x, y)| ≤ Σ_{|α| ≥ p} (1/α!) |∂_y^α κ(x, ξ_{D_2})| |(y − ξ_{D_2})^α|
≤ c sup_{z ∈ B_r(ξ_{D_2})} |κ(x, z)| Σ_{|α| ≥ p} (|α|!/α!) (2γ)^{|α|} |x − ξ_{D_2}|^{−|α|} |(y − ξ_{D_2})^α|
= c sup_{z ∈ B_r(ξ_{D_2})} |κ(x, z)| Σ_{l ≥ p} (2γ/|x − ξ_{D_2}|)^l Σ_{|α| = l} (l!/α!) |(y − ξ_{D_2})^α|
≤ c sup_{z ∈ B_r(ξ_{D_2})} |κ(x, z)| Σ_{l ≥ p} (2γ√d |y − ξ_{D_2}| / |x − ξ_{D_2}|)^l
≤ c sup_{z ∈ B_r(ξ_{D_2})} |κ(x, z)| Σ_{l ≥ p} (2γ√d η)^l
≤ c (2γ√d η)^p / (1 − 2γ√d η) sup_{z ∈ B_r(ξ_{D_2})} |κ(x, z)|
due to Σ_{|α| = l} (l!/α!) |ξ^α| = (Σ_{i=1}^d |ξ_i|)^l ≤ d^{l/2} |ξ|^l for all ξ ∈ R^d.

25 The previous lemma shows that the Taylor expansion of asymptotically smooth kernels converges exponentially with convergence rate 2γ√d η < 1. Thus, p ∼ |log ε| is required to achieve a given approximation accuracy ε > 0. For the degree k of degeneracy of T_p[κ] it follows that
k ∼ p^d ∼ |log ε|^d.
Note that if κ is asymptotically smooth only with respect to the first argument x, then ρ_{D_2} has to be replaced by ρ_{D_1} in the condition above. If κ is asymptotically smooth with respect to both variables, then the symmetric condition
min{ρ_{D_1}, ρ_{D_2}} ≤ η dist(D_1, D_2)
is sufficient.
In order to be able to approximate the block A_ts, we have to satisfy the last condition for D_1 = Y_t and D_2 = X_s. A block t × s will therefore be called admissible if the condition
min{diam Y_t, diam X_s} ≤ η dist(Y_t, X_s)
is satisfied.

26 Matrix partitioning
When constructing partitions, we have to account for two contrary aims:
- the partition has to be fine enough such that most of the blocks can be successfully approximated;
- the number of blocks must be as small as possible in order to be able to guarantee efficient arithmetic operations.
Finding an optimal partition is a difficult task since the set of all possible partitions is too large to be searched. Instead of searching for the best partition, we will therefore construct partitions which are quasi-optimal in the sense that they can be computed with almost linear costs and allow approximants of logarithmic-linear complexity.

27 Note that at least for the diagonal entries (i, i), i ∈ I, the admissibility condition will always be violated. In order to guarantee that A_b, (i, i) ∈ b, has low rank, the dimensions of A_b therefore have to be small. This leads to the following definition.
Definition. A partition P is called admissible if each block t × s ∈ P is either admissible or small, i.e., the cardinalities |t| and |s| of t and s satisfy min{|t|, |s|} ≤ n_min with a given minimal dimension n_min ∈ N.

28 Tensor vs. hierarchical partitions
In this section we assume that I = J. For the comparison of tensor and hierarchical partitions we will investigate the memory consumption and the number of arithmetic operations required to multiply such matrices by a vector if
(a) each diagonal block is stored as a dense matrix;
(b) all other blocks t × s are assumed to be admissible and are stored as rank-1 matrices.
Let us first consider the case of tensor partitions P = P_I × P_I = {t_k × t_l : t_k, t_l ∈ P_I}, where P_I := {t_k, k = 1,...,p} is a partition of the index set I, i.e.,
I = ∪_{k=1}^p t_k and t_k ∩ t_l = ∅ for k ≠ l.

29 Storing A requires Σ_{k=1}^p |t_k|² units of storage for the diagonal and
Σ_{k≠l} (|t_k| + |t_l|) = 2(p − 1) Σ_{k=1}^p |t_k| = 2(p − 1)n
units for the off-diagonal blocks. Due to the Cauchy-Schwarz inequality
n² = (Σ_{k=1}^p |t_k|)² ≤ p Σ_{k=1}^p |t_k|²,
at least n²/p + 2(p − 1)n units of storage are necessary to hold A. The minimum of the last expression is attained for p = √(n/2), resulting in a minimum amount of storage of order n^{3/2}. Hence, the required amount of storage resulting from tensor product partitions is not competitive.

30 Let us now check whether a hierarchical partition leads to almost linear complexity. We restrict ourselves to the case n = 2^p for some p ∈ N. Assume that t has already been generated from I after l subdivisions. Subdividing t = {i_1,...,i_{2^{p−l}}} into two parts t_1 = {i_1,...,i_{2^{p−l−1}}} and t_2 = {i_{2^{p−l−1}+1},...,i_{2^{p−l}}} of equal size, we obtain a 2×2 block partition of A_tt ∈ R^{t×t}:
A_tt = [[A_{t_1 t_1}, A_{t_1 t_2}], [A_{t_2 t_1}, A_{t_2 t_2}]],
where A_{t_i t_j} ∈ R^{t_i × t_j}, i, j = 1, 2. The diagonal blocks A_{t_1 t_1} and A_{t_2 t_2} are subdivided in the same way as A_tt; i.e., their off-diagonal blocks are again restricted to rank-1 matrices while their diagonal blocks are subdivided, and so on.
Figure: Partition for p = 4.

31 Let N^st_p denote the amount of storage which is required to hold such a matrix. Since the off-diagonal blocks on level p require 4·2^{p−1} = 2^{p+1} units of storage, we find the recursive relation
N^st_p = 2 N^st_{p−1} + 2^{p+1} with N^st_0 = 1.
Resolving this recursion, we obtain N^st_p = (2p + 1) 2^p = 2n log₂ n + n.
Using the 2×2 blocking above, the matrix-vector product with x ∈ C^I can be computed recursively by
A_tt x_t = [A_{t_1 t_1} x_{t_1} + A_{t_1 t_2} x_{t_2}; A_{t_2 t_1} x_{t_1} + A_{t_2 t_2} x_{t_2}], where x_t = [x_{t_1}; x_{t_2}] and x_{t_1} ∈ C^{t_1}, x_{t_2} ∈ C^{t_2}.
Hence, for Ax we need the results of the products A_{t_1 t_1} x_{t_1} and A_{t_2 t_2} x_{t_2}, which have half the size. For the number N^MV_p of operations it holds that
N^MV_p = 2 N^MV_{p−1} + 2^{p+2} − 2.
Multiplying the rank-1 matrices A_{t_1 t_2} and A_{t_2 t_1} by x_{t_2} and x_{t_1} and adding the results each requires 4·2^{p−1} − 1 operations. With N^MV_0 = 2 we obtain
N^MV_p = 4p·2^p + 2 = 4n log₂ n + 2.
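The recursions for N^st_p and N^MV_p correspond to the following recursive sketch (a toy, HODLR-like format of our own making: dense diagonal leaves, rank-1 off-diagonal blocks):

import numpy as np

def build_h(A, nmin=2):
    """Hierarchical format of the model partition; off-diagonal blocks are
    truncated to rank 1, so the result is exact only if they have rank 1."""
    n = A.shape[0]
    if n <= nmin:
        return {"dense": A}
    h = n // 2
    def rank1(B):
        U, s, Vh = np.linalg.svd(B, full_matrices=False)
        return U[:, :1] * s[:1], Vh[:1, :].conj().T
    return {"split": h, "d1": build_h(A[:h, :h], nmin), "d2": build_h(A[h:, h:], nmin),
            "u12": rank1(A[:h, h:]), "u21": rank1(A[h:, :h])}

def h_matvec(H, x):
    """Recursive block matrix-vector product, mirroring the recursion for N_p^MV."""
    if "dense" in H:
        return H["dense"] @ x
    h = H["split"]
    (U12, V12), (U21, V21) = H["u12"], H["u21"]
    y1 = h_matvec(H["d1"], x[:h]) + U12 @ (V12.conj().T @ x[h:])
    y2 = U21 @ (V21.conj().T @ x[:h]) + h_matvec(H["d2"], x[h:])
    return np.concatenate([y1, y2])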

32 Cluster trees
The presented partition is too special since non-admissible blocks can only appear on the diagonal. In the following it will be described how hierarchical partitions can be constructed for arbitrary admissibility conditions. In order to partition the set of matrix indices I × J hierarchically into sub-blocks, we first need a rule to subdivide the index sets I and J. This leads to the so-called cluster tree, which contains a hierarchy of partitions.
Definition. A tree T_I = (V, E) with vertices V and edges E is called a cluster tree for a set I ⊂ N if the following conditions hold:
(a) I is the root of T_I;
(b) t = ∪_{t' ∈ S(t)} t' for all t ∈ V \ L(T_I);
(c) the degree deg t := |S(t)| ≥ 2 of each vertex t ∈ V \ L(T_I) is bounded from below.
Here, the set of sons S(t) := {t' ∈ V : (t, t') ∈ E} of t ∈ V is pairwise disjoint and L(T_I) := {t ∈ V : S(t) = ∅} denotes the set of leaves of T_I.
Condition (b) implies that t ⊂ I for all t ∈ T_I and that each level
T_I^{(l)} := {t ∈ T_I : level t = l}
of T_I contains a partition of I.

33 In the sequel we will identify the set of vertices V with the cluster tree T_I. The purpose of S is to generate subdomains of minimal diameter.
Figure: Three levels (2nd, 3rd, and 4th) of a cluster tree.
For practical purposes it is useful to work with clusters having a minimal size n_min > 1 rather than subdividing the clusters until only one index is left.

34 Since the number of leaves |L(T_I)| is bounded by |I|/n_min provided that |t| ≥ n_min for all t ∈ T_I, the following estimate shows that the complexity of storing a cluster tree is still linear. The proof uses the property that each subtree of T_I is a cluster tree.
Lemma. Let q := min_{t ∈ T_I \ L(T_I)} deg t ≥ 2. Then for the number of vertices in T_I it holds that
|T_I| ≤ (q|L(T_I)| − 1)/(q − 1) ≤ 2|L(T_I)| − 1.
Proof. We cut down the tree T vertex by vertex, starting from the leaves of T \ L(T), in k steps until only the root is left. Let T_l denote the tree after l steps and q_l the degree of the lth removed vertex. Then |T_{l+1}| = |T_l| − q_l and |L(T_{l+1})| = |L(T_l)| − q_l + 1. After k steps |T_k| = 1 = |L(T_k)|, where
|T_k| = |T| − Σ_{l=1}^{k−1} q_l and |L(T_k)| = |L(T)| − Σ_{l=1}^{k−1} (q_l − 1).
Hence, |T| = |L(T)| + k − 1, and from q_l ≥ q it follows that k(q − 1) ≤ |L(T)| + q − 2.

35 The number of vertices in T_I is therefore always linear. In order to guarantee a logarithmic depth of T_I, we have to ensure that each subdivision by S produces clusters of comparable cardinality.
Definition. A tree T_I is called balanced if
R := min_{t ∈ T_I \ L(T_I)} min{|t_1|/|t_2| : t_1, t_2 ∈ S(t)}
is bounded from below by a positive constant independent of |I|.
Figure: A balanced and an unbalanced tree.
By L(T_I) := max_{t ∈ T_I} level t + 1 we denote the depth of the cluster tree T_I. The depth of balanced cluster trees depends logarithmically on |I|.

36 Lemma. Let T_I be a balanced cluster tree. Then for the depth of T_I it holds that
L(T_I) ≤ log_{1+R}(|I|/n_min) + 1 ≲ log |I| (11)
and |t| ≤ |I|(1 + R)^{−l}, where l denotes the level of t ∈ T_I.
Proof. For t ∈ T_I \ L(T_I) and t' ∈ S(t) we observe that
|t|/|t'| = (|t'| + Σ_{s ∈ S(t), s ≠ t'} |s|)/|t'| = 1 + Σ_{s ≠ t'} |s|/|t'| ≥ 1 + (|S(t)| − 1)R ≥ 1 + R.
Let e_1,...,e_{L−1} be a sequence of edges from the root v_1 := I to a deepest vertex v_L in T_I. Furthermore, let v_2,...,v_{L−1} be the intermediate vertices. Then from
(1 + R)|v_{l+1}| ≤ |v_l|, l = 1,...,L − 1,
we obtain (1 + R)^{L−1}|v_L| ≤ |I|, which gives (L − 1) log(1 + R) ≤ log(|I|/|v_L|) ≤ log(|I|/n_min).

37 The following lemma will be helpful for many of the complexity estimates.
Lemma. Let T_I be a cluster tree for I. Then
Σ_{t ∈ T_I} |t| ≤ L(T_I)|I| and Σ_{t ∈ T_I} |t| log |t| ≤ L(T_I)|I| log |I|. (12)
Proof. Each of the L(T_I) levels in T_I is made of disjoint subsets of I. The second estimate follows from log |t| ≤ log |I| for t ∈ T_I.
For each vertex t in T_I its indices have to be stored. It is desirable that clusters are contiguous. For this purpose, the set I should be reordered: if t is to be subdivided into t_1 and t_2, we rearrange the indices in t so that max t_1 < min t_2. This can be done during the construction of T_I. In this case, a cluster t can be represented by its minimal and maximal index. With this simplification, storing T_I requires |T_I| ≲ |I| units of storage even for unbalanced trees; the permutation requires an additional |I| units of storage. Note that due to the hierarchical character of cluster trees, this reordering does not change the previously generated clusters.

38 Construction of cluster trees
In this section we concentrate on how a cluster tree T_I is constructed from an index set I ⊂ N such that the diameters of the X_t are as small as possible. We assume that the X_i, i ∈ I, are quasi-uniform, i.e., there is a constant c_U > 0 such that
max_{i ∈ I} μ(X_i) ≤ c_U min_{i ∈ I} μ(X_i).
The expression μ(M) denotes the m-dimensional measure of an m-dimensional manifold M ⊂ R^d. We assume that the computational domain Ω is an m-dimensional manifold, i.e., there is a constant c_Ω > 0 such that for all z ∈ Ω
μ(Ω ∩ B_r(z)) ≤ c_Ω r^m for all r > 0. (13)
In addition, we assume that only a bounded number of the sets X_i overlap, i.e., there is ν ∈ N such that
|{i ∈ I : int X_i ∩ int X_j ≠ ∅}| ≤ ν for all j ∈ I. (14)
The above assumptions are in accordance with usual applications such as finite element discretizations.

39 We use a clustering strategy which is based on the principal component analysis (PCA). In order to be able to apply it, we select arbitrary but fixed points z_i ∈ X_i, i = 1,...,n, e.g., the centroid of X_i if X_i is polygonal. A cluster t ⊂ I is subdivided by the hyperplane through the centroid
m_t := Σ_{i ∈ t} μ(X_i) z_i / Σ_{i ∈ t} μ(X_i)
of t with normal w_t, where w_t is the main direction of t.
Definition. A vector w ∈ R^d, |w|_2 = 1, satisfying
Σ_{i ∈ t} |w^T (z_i − m_t)|² = max_{|v|_2 = 1} Σ_{i ∈ t} |v^T (z_i − m_t)|²
is called a main direction of the cluster t.

40 Note that with the symmetric positive semidefinite covariance matrix
C_t := Σ_{i ∈ t} (z_i − m_t)(z_i − m_t)^T ∈ R^{d×d}
it holds that
Σ_{i ∈ t} |v^T (z_i − m_t)|² = Σ_{i ∈ t} v^T (z_i − m_t)(z_i − m_t)^T v = v^T C_t v for all v ∈ R^d.
Hence, by the variational representation of eigenvalues,
max_{|v|_2 = 1} Σ_{i ∈ t} |v^T (z_i − m_t)|² = max_{|v|_2 = 1} v^T C_t v = λ_max(C_t),
one observes that the maximum is attained for the eigenvector corresponding to the largest eigenvalue λ_max of C_t. Using w_t, one possibility is to define the sons S(t) = {t_1, t_2} of t by
t_1 = {i ∈ t : w_t^T (z_i − m_t) > 0} and t_2 := t \ t_1.
This subdivision generates geometrically balanced cluster trees in the sense that there are constants c_g, c_G > 0 such that for each level l = 0,...,L(T_I) − 1
(diam X_t)^m ≤ c_g 2^{−l} and μ(X_t) ≥ 2^{−l}/c_G for all t ∈ T_I^{(l)}. (15)
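A minimal sketch of the PCA-based subdivision (our own code; `points` holds the z_i and `weights` the measures μ(X_i)):

import numpy as np

def pca_split(points, weights):
    """Split a cluster by the hyperplane through its centroid m_t with normal
    w_t, the eigenvector of C_t belonging to the largest eigenvalue."""
    m = (weights[:, None] * points).sum(axis=0) / weights.sum()   # centroid m_t
    d = points - m
    C = d.T @ d                                                   # covariance matrix C_t
    w = np.linalg.eigh(C)[1][:, -1]                               # main direction w_t
    left = d @ w > 0.0
    return np.flatnonzero(left), np.flatnonzero(~left)            # index sets t_1, t_2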

41 The following lemma shows that geometrically balanced cluster trees are balanced for quasi-uniform grids.
Lemma. Assume that the X_i, i ∈ I, are quasi-uniform. Then a geometrically balanced cluster tree is balanced.
Proof. Let t_1, t_2 ∈ T_I^{(l)} be two clusters from the same level l of T_I. From (14), (13), and (15) we see that
|t_1| min_{i ∈ t_1} μ(X_i) ≤ Σ_{i ∈ t_1} μ(X_i) ≤ ν μ(X_{t_1}) ≤ ν c_Ω (diam X_{t_1})^m ≤ ν c_Ω c_g 2^{−l}.
The last estimate together with
|t_2| max_{i ∈ t_2} μ(X_i) ≥ Σ_{i ∈ t_2} μ(X_i) ≥ μ(X_{t_2}) ≥ 2^{−l}/c_G
leads to
|t_1|/|t_2| ≤ ν c_Ω c_g c_G max_{i ∈ t_2} μ(X_i)/min_{i ∈ t_1} μ(X_i) ≤ ν c_Ω c_g c_G c_U,
which proves the assertion.
Since subdividing t ⊂ I requires O(|t|) operations for geometrically balanced clustering, we see that constructing T_I takes of the order of L(T_I)|I| operations.

42 Block Cluster Trees
Since our aim is to find an admissible partition of I × J, we consider cluster trees T_{I×J} for I × J, which will be referred to as block cluster trees. Let T_I and T_J be cluster trees for I and J with corresponding mappings S_I and S_J. We will consider block cluster trees T_{I×J} which are defined by the following mapping S_{I×J}:
S_{I×J}(t × s) = ∅ if t × s is admissible or S_I(t) = ∅ or S_J(s) = ∅, and S_{I×J}(t × s) = S_I(t) × S_J(s) else.
The depth L(T_{I×J}) of the tree T_{I×J} is obviously bounded by the minimum of the depths of the generating cluster trees T_I and T_J, i.e., L(T_{I×J}) ≤ min{L(T_I), L(T_J)}. The leaves of T_{I×J} constitute an admissible partition P := L(T_{I×J}).
Usually, a block cluster tree is built from binary cluster trees, i.e., deg t = 2 for t ∈ T_I \ L(T_I). In this case, T_{I×J} is a quadtree. Note that each block t × s ∈ T_{I×J} consists of clusters t ∈ T_I and s ∈ T_J of the same level from their respective cluster trees.

43 The sparsity constant
A measure for the complexity of a partition is the so-called sparsity constant; cf. [4]. The sparsity is the maximum number of blocks in the partition that are associated with a given row or column cluster.
Definition. Let T_I and T_J be cluster trees for the index sets I and J and let T_{I×J} be a block cluster tree for I × J. For a cluster t ∈ T_I we denote the maximum number of blocks t × s ∈ T_{I×J} by
c^r_sp(T_{I×J}, t) := |{s ⊂ J : t × s ∈ T_{I×J}}|.
Similarly, for a given cluster s ∈ T_J,
c^c_sp(T_{I×J}, s) := |{t ⊂ I : t × s ∈ T_{I×J}}|
stands for the maximum number of blocks t × s ∈ T_{I×J}. The sparsity constant c_sp of a block cluster tree T_{I×J} is then defined as
c_sp(T_{I×J}) := max{max_{t ∈ T_I} c^r_sp(T_{I×J}, t), max_{s ∈ T_J} c^c_sp(T_{I×J}, s)}.
Lemma. Assume that T_I and T_J are geometrically balanced, i.e., (15) is satisfied. Then
c_sp(T_{I×J}) ≤ 2 ν c_Ω c_g c_G (2 + 1/η)^m.

44 Proof. Let t ∈ T_I^{(l)} with an associated point z_t ∈ X_t. The estimates (14) and (15) guarantee that each neighborhood
N_ρ := {s ∈ T_J^{(l)} : max_{x ∈ X_s} |x − z_t| ≤ ρ}, ρ > 0,
of t contains at most ν c_G c_Ω 2^l ρ^m clusters s from the same level l in T_J. This follows from
|N_ρ| 2^{−l}/c_G ≤ Σ_{s ∈ N_ρ} μ(X_s) ≤ ν μ(X_{N_ρ}) ≤ ν c_Ω ρ^m. (16)
Let s ∈ T_J be such that t × s ∈ T_{I×J}. Furthermore, let t' and s' be the father clusters of t and s, respectively. Assume that max_{x ∈ X_{s'}} |x − z_t| ≥ ρ, where
ρ := min{diam X_{t'}, diam X_{s'}}/η + diam X_{t'} + diam X_{s'} ≤ (c_g 2^{−(l−1)})^{1/m} (2 + 1/η).
Then
dist(X_{t'}, X_{s'}) ≥ max_{x ∈ X_{s'}} |x − z_t| − diam X_{t'} − diam X_{s'} ≥ min{diam X_{t'}, diam X_{s'}}/η
implies that t' × s' is admissible. Thus, T_{I×J} cannot contain t × s, which is a contradiction. It follows that max_{x ∈ X_s} |x − z_t| < ρ ≤ (c_g 2^{−(l−1)})^{1/m} (2 + 1/η). From (16) we obtain
c^r_sp = |{s ∈ T_J : t × s ∈ P}| ≤ 2 ν c_Ω c_g c_G (2 + 1/η)^m.
Interchanging the roles of t and s, one shows that c^c_sp is bounded by the same constant.

45 Many algorithms in the context of hierarchical matrices can be applied blockwise. In this case, the cost of the algorithm is the sum over the costs of each block. Assume that on each block the costs are bounded by a constant c > 0. Then
Σ_{t×s ∈ T_{I×J}} c ≤ c Σ_{t ∈ T_I'} |{s ⊂ J : t × s ∈ T_{I×J}}| ≤ c c^r_sp |T_I'| ≤ c c_sp |T_I'|. (17)
By interchanging the roles of t and s, we also obtain Σ_{t×s ∈ T_{I×J}} c ≤ c c_sp |T_J'|. If the cost of the algorithm on each block t × s ∈ P is bounded by c(|t| + |s|), then the overall cost can be estimated as
Σ_{t×s ∈ T_{I×J}} c(|t| + |s|) = c Σ_{t ∈ T_I'} |{s ∈ T_J' : t × s ∈ T_{I×J}}| |t| + c Σ_{s ∈ T_J'} |{t ∈ T_I' : t × s ∈ T_{I×J}}| |s|
≤ c c_sp (Σ_{t ∈ T_I'} |t| + Σ_{s ∈ T_J'} |s|) ≤ c c_sp [L(T_I)|I| + L(T_J)|J|] ≤ c c_sp L(T_{I×J}) [|I| + |J|]
due to (12). Here, T_I' and T_J' denote the subtrees of T_I and T_J, respectively, which are actually used to construct T_{I×J}, i.e., T_I' := {t ∈ T_I : there are t' ⊂ t and s ∈ T_J such that t' × s ∈ T_{I×J}}.

46 Lemma. Let T_I and T_J be cluster trees for the index sets I and J, respectively. For the number of blocks in the partition L(T_{I×J}) and for the number of vertices in T_{I×J} it holds that
|L(T_{I×J})| ≤ |T_{I×J}| ≤ 2 c_sp min{|L(T_I)|, |L(T_J)|}.
If |t| ≥ n_min for all t ∈ T_I ∪ T_J, then |T_{I×J}| ≤ 2 c_sp min{|I|, |J|}/n_min.
Proof. Estimate (17) applied to |T_{I×J}| = Σ_{t×s ∈ T_{I×J}} 1 gives |T_{I×J}| ≤ c_sp min{|T_I'|, |T_J'|}. The lemma on the number of vertices of a cluster tree states that |T_I| ≤ 2|L(T_I)| and |T_J| ≤ 2|L(T_J)|.
Lemma. Let T_I and T_J be balanced cluster trees. Taking into account the cost of checking the admissibility condition for a block, the number of operations for constructing the block cluster tree is of the order c_sp [|I| log |I| + |J| log |J|].
Proof. Use (17) and (11).
Hence, the time required to compute an admissible matrix partition can be neglected compared with the rest of the computation.

47 The Set of Hierarchical Matrices
Definition. The set of hierarchical matrices on the block cluster tree T_{I×J} with admissible partition P := L(T_{I×J}) and blockwise rank k is defined as
H(T_{I×J}, k) = {A ∈ C^{I×J} : rank A_b ≤ k for all admissible b = t × s ∈ P}.
For the sake of brevity, elements from H(T_{I×J}, k) will be called H-matrices.
Remark. For an efficient treatment of admissible blocks the outer-product representation should be used. Additionally, it is advisable not to use the maximum rank k but the actual rank of the respective block as the number of columns of the factors. Storing non-admissible blocks entrywise will increase the efficiency.

48 We have already seen that the storage requirements for an admissible block b = t × s ∈ L(T_{I×J}) of A ∈ H(T_{I×J}, k) are N_st(A_b) = k(|t| + |s|). A non-admissible block A_b, b ∈ P, is stored entrywise and thus requires |t||s| units of storage. Since min{|t|, |s|} ≤ n_min, we have
|t||s| = min{|t|, |s|} max{|t|, |s|} ≤ n_min (|t| + |s|). (18)
Hence, for storing A_b, b ∈ P, at most max{k, n_min}(|t| + |s|) units of storage are required. Using (17), we obtain the following theorem.
Theorem. Let c_sp be the sparsity constant of the partition P. The storage requirements N_st for A ∈ H(T_{I×J}, k) are bounded by
N_st(A) ≤ c_sp max{k, n_min} [L(T_I)|I| + L(T_J)|J|].
If T_I and T_J are balanced cluster trees, we have N_st(A) ≲ max{k, n_min} [|I| log |I| + |J| log |J|].

49 Sparse H-matrices
Although H-matrices are primarily aimed at dense matrices, sparse matrices A which vanish on admissible blocks are also in H(T_{I×I}, n_min). Since the size of one of the clusters corresponding to a non-admissible block is less than or equal to n_min, the rank of each such block A_b does not exceed n_min.
The following lemma shows that the storage requirements are actually linear.
Lemma. Storing FE matrices A as an H-matrix requires O(n) units of storage.

50 Proof. Since A vanishes on admissible blocks, we only have to estimate the number of non-admissible blocks. Let t ∈ T_I^{(l)} be a cluster from the lth level of T_I. The number of elements of the set
N(t) := {s ∈ T_J^{(l)} : η dist(X_t, X_s) ≤ min{diam X_t, diam X_s}}
is bounded by a constant, since due to (15)
|N(t)| 2^{−l}/c_G ≤ Σ_{s ∈ N(t)} μ(X_s) ≤ ν μ(X_{N(t)}) ≤ ν c_Ω (diam X_t + η^{−1} min{diam X_t, diam X_s})^d ≤ ν c_Ω (1 + 1/η)^d c_g 2^{−l}
gives |N(t)| ≤ ν c_g c_G c_Ω (1 + 1/η)^d. Since there are only |T_I| ≲ |I| clusters t ∈ T_I, we obtain the desired result.

51 Matrix-Vector Multiplication
Multiplying an H-matrix A ∈ H(T_{I×J}, k) or its Hermitian transpose A^H by a vector x can be done blockwise:
Ax = Σ_{t×s ∈ P} A_ts x_s and A^H x = Σ_{t×s ∈ P} (A_ts)^H x_t.
Since each admissible block t × s has the outer-product representation A_ts = UV^H, U ∈ C^{t×k}, V ∈ C^{s×k}, at most 2k(|t| + |s|) operations are required to compute the matrix-vector products A_ts x_s = UV^H x_s and (A_ts)^H x_t = VU^H x_t. If t × s is non-admissible, then A_ts is stored entrywise and min{|t|, |s|} ≤ n_min. As in (18) we see that in this case 2|t||s| ≤ 2 n_min (|t| + |s|) arithmetic operations are required.
Theorem. For the number of operations N_MV required for one matrix-vector multiplication Ax of A ∈ H(T_{I×J}, k) by a vector x ∈ C^J it holds that
N_MV(A) ≤ 2 c_sp max{k, n_min} [L(T_I)|I| + L(T_J)|J|].
If T_I and T_J are balanced cluster trees, we have N_MV(A) ≲ max{k, n_min} [|I| log |I| + |J| log |J|].
Hence, H-matrices are well suited for iterative schemes such as Krylov subspace methods, which the matrix enters only through the matrix-vector product.

52 Blockwise and global norms
From the analysis we will usually obtain estimates on each of the blocks b of a partition P. However, such estimates are finally required for the whole matrix. If we are interested in the Frobenius norm, blockwise estimates directly translate to global estimates using
||A||_F² = Σ_{b ∈ P} ||A_b||_F².
For the spectral norm the situation is a bit more difficult. We can, however, exploit the structure of the partition P together with the following lemma.
Lemma. Consider the r × r block matrix
A = [[A_11, ..., A_1r], ..., [A_r1, ..., A_rr]]
with A_ij ∈ C^{m_i × n_j}, i, j = 1,...,r. Then it holds that
max_{i,j=1,...,r} ||A_ij||_2 ≤ ||A||_2 ≤ (max_{i=1,...,r} Σ_{j=1}^r ||A_ij||_2)^{1/2} (max_{j=1,...,r} Σ_{i=1}^r ||A_ij||_2)^{1/2}.

53 Theorem. Let P be the set of leaves of a block cluster tree T_{I×J}. Then for A, B ∈ H(T_{I×J}, k) it holds that
(a) max_{b ∈ P} ||A_b||_2 ≤ ||A||_2 ≤ c_sp L(T_{I×J}) max_{b ∈ P} ||A_b||_2;
(b) ||A||_2 ≤ c_sp L(T_{I×J}) ||B||_2 provided that max_{b ∈ P} ||A_b||_2 ≤ max_{b ∈ P} ||B_b||_2.
Proof. Let A_l denote the part of A which corresponds to the blocks of P from the lth level of T_{I×J}, i.e.,
(A_l)_b = A_b if b ∈ T_{I×J}^{(l)} ∩ P, and (A_l)_b = 0 else.
Since A_l has tensor structure with at most c_sp blocks per block row or block column, the last lemma gives
||A_l||_2 ≤ c_sp max_{b ∈ T_{I×J}^{(l)} ∩ P} ||A_b||_2,
such that
||A||_2 ≤ Σ_{l=1}^{L(T_{I×J})} ||A_{l−1}||_2 ≤ c_sp Σ_{l=1}^{L(T_{I×J})} max_{b ∈ T_{I×J}^{(l−1)} ∩ P} ||A_b||_2 ≤ c_sp L(T_{I×J}) max_{b ∈ P} ||A_b||_2.
The estimate max_{b ∈ P} ||A_b||_2 ≤ max_{b ∈ P} ||B_b||_2 ≤ ||B||_2 gives the second part of the assertion.

54 Adding H-Matrices
Since H(T_{I×J}, k) is not a linear space, we have to approximate the sum A + B by a matrix S ∈ H(T_{I×J}, k) if we want to avoid that the rank, and hence the complexity, grows with each addition:
- use the rounded addition on each admissible block;
- on non-admissible blocks, the usual addition is employed.
Since the rounded addition gives a blockwise best approximation, S is a best approximation in the Frobenius norm,
||A + B − S||_F ≤ ||A + B − M||_F for all M ∈ H(T_{I×J}, k).
Using the last theorem, this estimate for the spectral norm reads
||A + B − S||_2 ≤ c_sp L(T_{I×J}) ||A + B − M||_2 for all M ∈ H(T_{I×J}, k).
The following bound on the number of arithmetic operations results from the complexity of the rounded addition, (17), and (18).
Theorem. Let A, B ∈ H(T_{I×J}, k). A matrix S ∈ H(T_{I×J}, k) satisfying the above error estimates can be computed with at most of the order of
c_sp k² [L(T_I)|I| + L(T_J)|J|] + c_sp k³ min{|T_I|, |T_J|}
arithmetic operations.

55 Preserving Positivity
If the smallest eigenvalue is close to the origin compared with the rounding accuracy, it may happen that the rounded result becomes indefinite although it would be positive definite in exact arithmetic.
Assume that Â ∈ C^{I×I} is the Hermitian positive definite result of an exact addition of two matrices from H(T_{I×I}, k) and let A ∈ H(T_{I×I}, k) be its H-matrix approximant. For a moment we assume that Â and A differ only on a single off-diagonal block t × s ∈ P. Then an error matrix EF^H, E ∈ C^{t×k}, F ∈ C^{s×k}, satisfying
max{||E||_2, ||F||_2} ≤ ε
is associated with t × s, i.e., A_ts = Â_ts − EF^H. Due to symmetry, FE^H is the error matrix on the block s × t.

56 We modify the approximant A in such a manner that the new approximant Ã can be guaranteed to be positive definite. This is done by adding EE^H to A_tt and FF^H to A_ss, so that
[[Ã_tt, Ã_ts], [Ã_ts^H, Ã_ss]] := [[A_tt, A_ts], [A_ts^H, A_ss]] + [[EE^H, 0], [0, FF^H]] = [[Â_tt, Â_ts], [Â_ts^H, Â_ss]] + [[EE^H, −EF^H], [−FE^H, FF^H]].
Since
[[EE^H, −EF^H], [−FE^H, FF^H]] = [E; −F][E; −F]^H
is positive semi-definite, the eigenvalues of Ã are not smaller than those of Â. Therefore, Ã is Hermitian positive definite and
||[[Ã_tt, Ã_ts], [Ã_ts^H, Ã_ss]] − [[Â_tt, Â_ts], [Â_ts^H, Â_ss]]||_2 = ||[E; −F]||_2² ≤ 2ε².

57 Note that adding EE^H to A_tt and FF^H to A_ss leads to a rounding error which in turn has to be added to the diagonal sub-blocks of t × t and s × s in order to preserve positivity. For this purpose, we add a positive semi-definite matrix. Let t_1 and t_2 be the sons of t and let s_1 and s_2 be the sons of s. If we define
Ã_tt := A_tt + [E_{t_1}; E_{t_2}][E_{t_1}; E_{t_2}]^H ≼ A_tt + [[2E_{t_1}E_{t_1}^H, 0], [0, 2E_{t_2}E_{t_2}^H]],
the problem of adding EE^H to A_tt is reduced to adding 2E_{t_1}E_{t_1}^H to A_{t_1 t_1} and 2E_{t_2}E_{t_2}^H to A_{t_2 t_2}. Applying this idea recursively, adding EE^H to A_tt can finally be done by adding a multiple of E_{t'}E_{t'}^H to the dense matrix block A_{t' t'} for each leaf t' of T_I from the set of descendants of t.

58 We obtain the following two algorithms, addsym_stab and addsym_diag. The first adds a matrix UV^H of low rank to an off-diagonal block t × s, while the latter adds EE^H to the diagonal block t × t. Note that we assume that a Hermitian matrix is represented by its upper triangular part only.

procedure addsym_stab(t, s, U, V, var A)
  if t × s is non-admissible then
    add UV^H to A_ts without approximation
  else
    add UV^H to A_ts using the rounded addition
    denote by EF^H the rounding error
    addsym_diag(t, E, A)
    addsym_diag(s, F, A)
  endif

procedure addsym_diag(t, E, var A)
  if t × t is a leaf then
    add EE^H to A_tt without approximation
  else
    addsym_diag(t_1, √2 E_{t_1}, A)
    addsym_diag(t_2, √2 E_{t_2}, A)
  endif

59 Theorem. Let A, B be Hermitian and let λ_i, i ∈ I, denote the eigenvalues of A + B. Assume that S_H ∈ H(T_{I×I}, k) has precision ε. Using the stabilized rounded addition on each block leads to a matrix S̃_H ∈ H(T_{I×I}, k) with eigenvalues λ̃_i ≥ λ_i, i ∈ I, satisfying
||A + B − S̃_H||_2 ≲ L(T_I)|I| ε.
At most of the order of (max{k², n_min} + n_min k) L(T_I)|I| operations are needed to construct S̃_H.

60 Multiplying H-Matrices
Let A ∈ H(T_{I×J}, k_A) and B ∈ H(T_{J×K}, k_B) be two hierarchical matrices. We compute an approximation C ∈ H(T_{I×K}, k) of the product. Assume that A and B are subdivided according to their block cluster trees T_{I×J} and T_{J×K}:
A = [[A_11, A_12], [A_21, A_22]], B = [[B_11, B_12], [B_21, B_22]].
Then AB has the block structure
AB = [[A_11 B_11 + A_12 B_21, A_11 B_12 + A_12 B_22], [A_21 B_11 + A_22 B_21, A_21 B_12 + A_22 B_22]].
Assume that the products A_ik B_kj, i, j, k = 1, 2, each of which has half the size of AB, have been computed.
- Round the sums A_i1 B_1j + A_i2 B_2j to rank-k matrices R_ij.
- If C has a 2×2 block structure in T_{I×K}, then C_ij := C_ij + R_ij, i, j = 1, 2;
- else agglomerate [[R_11, R_12], [R_21, R_22]] to a single rank-k matrix and add it to C.
The complexity of this multiplication can be shown to be of the order k² L²(T_I)|I| + k³|I| for I = J.

61 Hierarchical Inversion
Assume that a block A_tt is subdivided into sub-blocks in the following way:
A_tt = [[A_{t_1 t_1}, A_{t_1 t_2}], [A_{t_2 t_1}, A_{t_2 t_2}]],
where t_1 and t_2 denote the sons of t in T_I. For the exact inverse of A_tt it holds that
A_tt^{−1} = [[A_{t_1 t_1}^{−1} + A_{t_1 t_1}^{−1} A_{t_1 t_2} S^{−1} A_{t_2 t_1} A_{t_1 t_1}^{−1}, −A_{t_1 t_1}^{−1} A_{t_1 t_2} S^{−1}], [−S^{−1} A_{t_2 t_1} A_{t_1 t_1}^{−1}, S^{−1}]],
where S denotes the Schur complement S := A_{t_2 t_2} − A_{t_2 t_1} A_{t_1 t_1}^{−1} A_{t_1 t_2} of A_{t_1 t_1} in A_tt. Let T ∈ H(T_{I×I}, k), which together with C is initialized to zero.

procedure inverth(t, A, var C)
  if t ∈ L(T_I) then
    C_tt := A_tt^{−1} (the usual inverse)
  else
    inverth(t_1, A, C)
    T_{t_1 t_2} := T_{t_1 t_2} − C_{t_1 t_1} A_{t_1 t_2}
    T_{t_2 t_1} := T_{t_2 t_1} − A_{t_2 t_1} C_{t_1 t_1}
    A_{t_2 t_2} := A_{t_2 t_2} + A_{t_2 t_1} T_{t_1 t_2}
    inverth(t_2, A, C)
    C_{t_1 t_2} := C_{t_1 t_2} + T_{t_1 t_2} C_{t_2 t_2}
    C_{t_2 t_1} := C_{t_2 t_1} + C_{t_2 t_2} T_{t_2 t_1}
    C_{t_1 t_1} := C_{t_1 t_1} + T_{t_1 t_2} C_{t_2 t_1}

The complexity of the computation of the H-inverse is determined by the cost of the H-matrix multiplication.
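The recursion behind inverth is easiest to see on dense blocks. The following NumPy sketch (our own illustration; a genuine H-inverse would replace every operation by truncated block arithmetic) reproduces the Schur-complement formula:

import numpy as np

def block_inverse(A, nmin=8):
    """Recursive inversion via the Schur complement S = A22 - A21 A11^{-1} A12."""
    n = A.shape[0]
    if n <= nmin:
        return np.linalg.inv(A)
    h = n // 2
    A11, A12, A21, A22 = A[:h, :h], A[:h, h:], A[h:, :h], A[h:, h:]
    C11 = block_inverse(A11, nmin)
    T12, T21 = -C11 @ A12, -A21 @ C11
    C22 = block_inverse(A22 + A21 @ T12, nmin)      # inverse of the Schur complement
    C12, C21 = T12 @ C22, C22 @ T21
    return np.block([[C11 + T12 @ C21, C12], [C21, C22]])

A = 20 * np.eye(64) + np.random.default_rng(3).standard_normal((64, 64))
print(np.linalg.norm(block_inverse(A) @ A - np.eye(64)))   # close to 0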

62 Hierarchical LU Decomposition
Although the hierarchical inversion has almost linear complexity, the following hierarchical LU decomposition provides a significantly more efficient alternative. To define the H-LU decomposition, we exploit the hierarchical block structure of a block A_tt, t ∈ T_I \ L(T_I):
A_tt = [[A_{t_1 t_1}, A_{t_1 t_2}], [A_{t_2 t_1}, A_{t_2 t_2}]] = [[L_{t_1 t_1}, 0], [L_{t_2 t_1}, L_{t_2 t_2}]] [[U_{t_1 t_1}, U_{t_1 t_2}], [0, U_{t_2 t_2}]],
where t_1, t_2 ∈ T_I denote the sons of t in T_I. Hence, the LU decomposition of a block A_tt is reduced to the following four problems on the sons of t × t:
(a) compute L_{t_1 t_1} and U_{t_1 t_1} from the LU decomposition L_{t_1 t_1} U_{t_1 t_1} = A_{t_1 t_1};
(b) compute U_{t_1 t_2} from L_{t_1 t_1} U_{t_1 t_2} = A_{t_1 t_2};
(c) compute L_{t_2 t_1} from L_{t_2 t_1} U_{t_1 t_1} = A_{t_2 t_1};
(d) compute L_{t_2 t_2} and U_{t_2 t_2} from the LU decomposition L_{t_2 t_2} U_{t_2 t_2} = A_{t_2 t_2} − L_{t_2 t_1} U_{t_1 t_2}.
If a block t × t ∈ L(T_{I×I}) is a leaf, the usual pivoted LU decomposition is employed. For (a) and (d), two LU decompositions of half the size have to be computed.
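A dense sketch of steps (a)-(d) (our own code, unpivoted, so it is meant for matrices such as positive definite ones; an H-LU would replace the triangular solves by the recursive block substitutions described next):

import numpy as np
from scipy.linalg import solve_triangular

def block_lu(A, nmin=8):
    """Recursive block LU decomposition A = L U following (a)-(d)."""
    n = A.shape[0]
    if n <= nmin:                         # leaf: tiny unpivoted Doolittle LU
        L, U = np.eye(n), A.astype(float).copy()
        for j in range(n - 1):
            L[j+1:, j] = U[j+1:, j] / U[j, j]
            U[j+1:, j:] -= np.outer(L[j+1:, j], U[j, j:])
        return L, np.triu(U)
    h = n // 2
    L11, U11 = block_lu(A[:h, :h], nmin)                        # (a)
    U12 = solve_triangular(L11, A[:h, h:], lower=True)          # (b)
    L21 = solve_triangular(U11.T, A[h:, :h].T, lower=True).T    # (c)
    L22, U22 = block_lu(A[h:, h:] - L21 @ U12, nmin)            # (d)
    Z = np.zeros((h, n - h))
    return np.block([[L11, Z], [L21, L22]]), np.block([[U11, U12], [Z.T, U22]])

A = 20 * np.eye(64) + np.random.default_rng(4).standard_normal((64, 64))
L, U = block_lu(A)
print(np.linalg.norm(L @ U - A))          # close to 0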

63 In order to solve (b), i.e., a problem of the structure L_tt B_ts = A_ts for B_ts, where L_tt is a lower triangular matrix and t × s ∈ T_{I×I}, we use a recursive block forward substitution. If the block t × s is not a leaf in T_{I×I}, from the subdivision of the blocks A_ts, B_ts, and L_tt into their sub-blocks,
[[L_{t_1 t_1}, 0], [L_{t_2 t_1}, L_{t_2 t_2}]] [[B_{t_1 s_1}, B_{t_1 s_2}], [B_{t_2 s_1}, B_{t_2 s_2}]] = [[A_{t_1 s_1}, A_{t_1 s_2}], [A_{t_2 s_1}, A_{t_2 s_2}]],
one observes that B_ts can be found from the following equations:
L_{t_1 t_1} B_{t_1 s_1} = A_{t_1 s_1},
L_{t_1 t_1} B_{t_1 s_2} = A_{t_1 s_2},
L_{t_2 t_2} B_{t_2 s_1} = A_{t_2 s_1} − L_{t_2 t_1} B_{t_1 s_1},
L_{t_2 t_2} B_{t_2 s_2} = A_{t_2 s_2} − L_{t_2 t_1} B_{t_1 s_2},
which are again of type (b). If, on the other hand, t × s is a leaf, the usual forward substitution is applied. Similarly, one can solve (c) by recursive block backward substitution.
The complexity of the above recursions is determined by the complexity of the hierarchical matrix multiplication, which can be estimated as O(k² |I| log² |I|) for two matrices from H(T_{I×I}, k).

64 In the case of positive definite matrices A it is possible to define an H-version of the Cholesky decomposition of a block A_tt, t ∈ T_I \ L(T_I):
A_tt = [[A_{t_1 t_1}, A_{t_1 t_2}], [A_{t_1 t_2}^H, A_{t_2 t_2}]] = [[L_{t_1 t_1}, 0], [L_{t_2 t_1}, L_{t_2 t_2}]] [[L_{t_1 t_1}, 0], [L_{t_2 t_1}, L_{t_2 t_2}]]^H.
This factorization is recursively computed by
L_{t_1 t_1} L_{t_1 t_1}^H = A_{t_1 t_1},
L_{t_1 t_1} L_{t_2 t_1}^H = A_{t_1 t_2},
L_{t_2 t_2} L_{t_2 t_2}^H = A_{t_2 t_2} − L_{t_2 t_1} L_{t_2 t_1}^H,
using the usual Cholesky decomposition on the leaves of T_{I×I}. The second equation L_{t_1 t_1} L_{t_2 t_1}^H = A_{t_1 t_2} is solved for L_{t_2 t_1} in a similar way as U_{t_1 t_2} was obtained in the LU decomposition.

65 Once A has been decomposed, the solution of Ax = b can be found by forward/backward substitution: L_H y = b and U_H x = y. Since L_H and U_H are H-matrices, y_t, t ∈ T_I \ L(T_I), can be computed recursively by solving the following systems for y_{t_1} and y_{t_2}:
L_{t_1 t_1} y_{t_1} = b_{t_1} and L_{t_2 t_2} y_{t_2} = b_{t_2} − L_{t_2 t_1} y_{t_1}.
If t ∈ L(T_I) is a leaf, a usual triangular solver is used. The backward substitution can be done analogously. These substitutions are exact, and their complexity is determined by the complexity of the hierarchical matrix-vector multiplication, which is O(k |I| log |I|) for multiplying an H(T_{I×I}, k)-matrix by a vector.
NOTE: We have assumed that the blockwise rank k required for a given approximation accuracy stays bounded during the computation. For elliptic problems the required rank can be proved to depend logarithmically on n and on the accuracy ε.

66 Adaptive Cross Approximation (Computation of Approximants for BEM)
We assume that a partition has been generated. Blocks b ∈ P which do not satisfy the admissibility condition are generated and stored without approximation. All other blocks b ∈ P satisfy the admissibility condition and can be treated independently from each other. Therefore, in the rest of this section we focus on a single block A_ts = Λ_{1,t} 𝒜 Λ*_{2,s} of a discrete integral operator, which is identified with A ∈ R^{m×n}.
The idea of the algorithm is as follows. Starting from R_0 := A, find a nonzero pivot in R_k, say (i_k, j_k), and subtract a scaled outer product of the i_kth row and the j_kth column:
R_{k+1} := R_k − [(R_k)_{i_k j_k}]^{−1} (R_k)_{1:m, j_k} (R_k)_{i_k, 1:n}, (20)
where we use the notations (R_k)_{i, 1:n} and (R_k)_{1:m, j} for the ith row and the jth column of R_k, respectively. It will turn out that j_k should be chosen as the maximum element in modulus of the i_kth row, i.e.,
|(R_k)_{i_k j_k}| = max_{j=1,...,n} |(R_k)_{i_k j}|.
The choice of i_k is a bit more delicate.

67 Example. Applying two steps of (20) to a small matrix, with the pivots (i_1, j_1), (i_2, j_2), (i_3, j_3) chosen as described above, yields residuals R_1 and R_2. Apparently, the size of the entries decreases from step to step.

68 Since in the kth step only the entries in the j_kth column and the i_kth row of R_k are used to compute R_{k+1}, there is no need to build the whole matrix R_k.

k := 1; Z := ∅
repeat
  find i_k as described later on
  ṽ_k := a_{i_k, 1:n}
  for l = 1,...,k − 1 do ṽ_k := ṽ_k − (u_l)_{i_k} v_l
  Z := Z ∪ {i_k}
  if ṽ_k does not vanish then
    j_k := argmax_{j=1,...,n} |(ṽ_k)_j|; v_k := (ṽ_k)_{j_k}^{−1} ṽ_k
    u_k := a_{1:m, j_k}
    for l = 1,...,k − 1 do u_k := u_k − (v_l)_{j_k} u_l
    k := k + 1
until the stopping criterion (21) is fulfilled or Z = {1,...,m}

The matrix S_k := Σ_{l=1}^k u_l v_l^T will be used as an approximation of A = S_k + R_k. Obviously, the rank of S_k is bounded by k.
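For illustration, here is a compact NumPy version of the listing (our own sketch; choosing the next pivot row as the largest entry in modulus of the new column is a common heuristic, since the slides defer this point):

import numpy as np

def aca(get_row, get_col, m, n, eps=1e-6, eta=0.5, max_rank=None):
    """Partially pivoted adaptive cross approximation of an m x n block.
    get_row(i) / get_col(j) return a single row / column; only O(k) of them
    are ever generated."""
    us, vs, Z, i = [], [], set(), 0
    max_rank = max_rank or min(m, n)
    frob2 = 0.0                                        # ||S_k||_F^2, updated via (2)
    while len(Z) < m and len(us) < max_rank:
        v = get_row(i).astype(float).copy()
        for u, w in zip(us, vs):                       # row i of the residual R_k
            v -= u[i] * w
        Z.add(i)
        if np.max(np.abs(v)) > 1e-14:
            j = int(np.argmax(np.abs(v)))
            v /= v[j]
            u = get_col(j).astype(float).copy()
            for uu, ww in zip(us, vs):                 # column j of the residual R_k
                u -= ww[j] * uu
            # stopping criterion (21): compare u_{k+1} v_{k+1}^T with ||S_k||_F
            if us and np.linalg.norm(u) * np.linalg.norm(v) <= \
                      eps * (1 - eta) / (1 + eps) * np.sqrt(frob2):
                break
            frob2 += (np.linalg.norm(u) * np.linalg.norm(v)) ** 2 \
                     + 2.0 * sum((u @ uu) * (v @ ww) for uu, ww in zip(us, vs))
            us.append(u); vs.append(v)
            i = int(np.argmax(np.abs(u)))              # heuristic choice of the next row
        elif len(Z) < m:
            i = min(set(range(m)) - Z)
    return np.column_stack(us), np.column_stack(vs)    # A ~ U V^T

# usage on an asymptotically smooth kernel, 1/|x-y|, on two well-separated intervals
x, y = np.linspace(0.0, 1.0, 300), np.linspace(3.0, 4.0, 250)
U, V = aca(lambda i: 1.0 / np.abs(x[i] - y), lambda j: 1.0 / np.abs(x - y[j]), len(x), len(y))
A = 1.0 / np.abs(x[:, None] - y[None, :])
print(U.shape[1], np.linalg.norm(A - U @ V.T) / np.linalg.norm(A))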

69 Let ε > 0 be given. The following condition on k,
||u_{k+1}||_2 ||v_{k+1}||_2 ≤ ε(1 − η)/(1 + ε) ||S_k||_F, (21)
can be used as a stopping criterion. Assume that ||R_{k+1}||_F ≤ η ||R_k||_F with the parameter η from the admissibility condition; then
||R_k||_F ≤ ||R_{k+1}||_F + ||u_{k+1} v_{k+1}^T||_F ≤ η ||R_k||_F + ||u_{k+1}||_2 ||v_{k+1}||_2.
Hence,
||R_k||_F ≤ 1/(1 − η) ||u_{k+1}||_2 ||v_{k+1}||_2 ≤ ε/(1 + ε) ||S_k||_F ≤ ε/(1 + ε) (||A||_F + ||R_k||_F).
From the last estimate we obtain ||R_k||_F ≤ ε ||A||_F, i.e., condition (21) guarantees a relative approximation error ε. Due to (2), the Frobenius norm of S_k can be computed with O(k²(m + n)) complexity. Therefore, the amount of numerical work required by the algorithm above is of the order |Z|²(m + n), which leads to an overall logarithmic-linear complexity due to (17).
Remark. If the costs for generating the matrix entries dominate the algebraic transformations of the algorithm, then its complexity scales like |Z|(m + n).

70 What remains is to estimate the remainder R_k. For this purpose, the entries of R_k will be estimated by the approximation error
F^Ξ_ts := max_{j ∈ s} inf_{p ∈ span Ξ} ||𝒜Λ*_{2,j} − p||_{∞, Y_t}
in an arbitrary system of functions Ξ := {ξ_1,...,ξ_k} with ξ_1 = 1. Note that 𝒜Λ*_{2,j} is an asymptotically smooth function since supp Λ*_{2,j} = X_j. For the linear operators Λ_{1,i}, i = 1,...,k, corresponding to the first k rows in A we assume that
det [Λ_{1,i} ξ_j]_{i,j=1,...,k} ≠ 0,
which can be guaranteed by the choice of the pivoting rows i_k.
Theorem. For i = 1,...,m and j = 1,...,n it holds that
|(R_k)_{ij}| ≤ c (1 + I^Ξ_k)(1 + 2^k) ||ψ_i||_{L^1} F^Ξ_ts. (22)

71 The similarity of ACA and the LU factorization can be seen from the representation
R_k = (I − γ_k R_{k−1} e_k e_k^T) R_{k−1} = L_k R_{k−1}, γ_k := [(R_{k−1})_{kk}]^{−1},
with the m × m matrix L_k, which coincides with the identity except in its kth column; there the diagonal entry vanishes and the entries below the diagonal are −(R_{k−1})_{ik}/(R_{k−1})_{kk}, i = k + 1,...,m. Hence, L_k differs from a Gauss matrix only in the position (k, k). It is known that during the LU decomposition the so-called growth of entries may happen. Note that this is reflected by the factor 2^k in (22). However, this growth is also known to be rarely observable in practice.

72 Numerical Example
In order to show that a thorough implementation of ACA can handle nonsmooth geometries, we consider the Dirichlet boundary value problem
Δu = 0 in Ω, u = g on Γ.
Boundary integral equation for the unknown t := ∂_ν u:
Vt = (1/2 I + K)g with (Vu)(y) = 1/(4π) ∫_Γ u(x)/|x − y| ds_x and (Ku)(y) = 1/(4π) ∫_Γ u(x) ∂_{ν_x} (1/|x − y|) ds_x.
A Galerkin discretization with piecewise linear functions ϕ_j and piecewise constant functions ψ_i leads to
Vx = b, b = (1/2 M + K) g,
where for i = 1,...,n and j = 1,...,n
V_ij = (Vψ_j, ψ_i)_{L²(Γ)}, K_ij = (Kϕ_j, ψ_i)_{L²(Γ)}, and M_ij = (ϕ_j, ψ_i)_{L²(Γ)}.
Furthermore, g ∈ R^n is the vector minimizing ||g − Σ_{j=1}^n g_j ϕ_j||_{L²(Γ)}. The solution x defines an approximation t_h := Σ_{i=1}^n x_i ψ_i of t.

73
Since $\mathcal{V}$ is coercive, its Galerkin stiffness matrix V is symmetric positive definite. Hence, in contrast to the H-matrix approximant $K_{\mathcal{H}}$ to K, we may generate only the upper triangular part of the approximant $V_{\mathcal{H}}$ to V.

Table: Approximation results for V and ε = 10⁻⁴ (columns: η, setup time, storage in MB, and compression ratio, for two problem sizes n).

Table: Approximation results for K and ε = 10⁻⁴ (same columns).

For the computation of the singular integrals we have used O. Steinbach's semi-analytic quadrature routines OSTBEM.

74 Helmholtz equation
Boundary integral formulations can be derived and are particularly useful for the Helmholtz equation
$$\Delta u + \omega^2 u = 0 \ \text{in } \Omega^c, \qquad u = 1 \ \text{on } \partial\Omega,$$
where $\Omega \subset \mathbb{R}^d$ is a bounded domain and $\Omega^c = \mathbb{R}^d \setminus \Omega$. The mesh size h has to be chosen such that ωh is constant when discretizing $\partial\Omega$. As a consequence, the required rank satisfies $k \sim h^{-1} \sim n^{1/(d-1)}$, and methods based on low-rank approximants will not be able to achieve logarithmic-linear complexity for large ω.
In the next table we compare the rank k required for a prescribed accuracy ε = 10⁻⁴ by ACA applied to an admissible sub-block with the rank of the low-rank approximant resulting from the SVD, for increasing wave numbers ω.

Table: rank k and CPU time of the SVD and of ACA on an admissible sub-block for increasing wave numbers ω.
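The growth of the rank with ω can be reproduced with a small experiment on the Helmholtz kernel evaluated between two well-separated point clusters; the cluster geometry and the tolerance below are illustrative assumptions.

```python
import numpy as np

# numerical rank (tolerance 1e-4) of exp(i*omega*|x-y|)/|x-y| on an
# admissible (well-separated) pair of synthetic point clusters
rng = np.random.default_rng(7)
X = rng.random((200, 3))                               # source cluster in [0,1]^3
Y = rng.random((200, 3)) + np.array([3.0, 0.0, 0.0])   # separated target cluster

for omega in [2, 8, 32, 128]:
    R = np.linalg.norm(X[:, None, :] - Y[None, :, :], axis=-1)
    A = np.exp(1j * omega * R) / R
    s = np.linalg.svd(A, compute_uv=False)
    k = int(np.sum(s > 1e-4 * s[0]))
    print(omega, k)      # the rank grows with the wave number
```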

75 Using Hierarchical Matrices for Preconditioning
Consider a sequence of linear systems
$$A_n x_n = b_n, \qquad n \in \mathbb{N}, \qquad (23)$$
where each $A_n \in \mathbb{C}^{n\times n}$ is invertible.
The convergence of Krylov subspace methods is determined by the distribution of the eigenvalues of $A_n$; the eigenvalues are determined by the mapping properties of the underlying differential or integral operator $\mathcal{A}$; a large condition number can arise even for small n. Therefore, if (23) is to be solved iteratively, one has to incorporate a preconditioner.
IDEA: generate preconditioners C from low-accuracy approximations of the inverse of A: $C \approx A^{-1}$. Usually more efficient: an approximate LU decomposition, $C = (L_{\mathcal{H}} U_{\mathcal{H}})^{-1}$.
How much accuracy is required for preconditioning?
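A sketch of the idea in SciPy terms: the (here deliberately perturbed, hence "low-accuracy") LU factors are never inverted explicitly; the preconditioner $C = (LU)^{-1}$ is applied through two triangular solves inside a Krylov method. All matrices below are synthetic stand-ins for $A_{\mathcal{H}}$, $L_{\mathcal{H}}$, $U_{\mathcal{H}}$.

```python
import numpy as np
from scipy.linalg import lu, solve_triangular
from scipy.sparse.linalg import LinearOperator, gmres

rng = np.random.default_rng(1)
n = 200
A = np.eye(n) + rng.standard_normal((n, n)) / np.sqrt(n)
b = rng.standard_normal(n)

# "low-accuracy" factors: exact LU of a slightly perturbed copy of A
P, L, U = lu(A + 1e-2 * rng.standard_normal((n, n)) / np.sqrt(n))

def apply_C(r):
    # C r = (P L U)^{-1} r, realized by two triangular solves
    y = solve_triangular(L, P.T @ r, lower=True)
    return solve_triangular(U, y, lower=False)

M = LinearOperator((n, n), matvec=apply_C)
x, info = gmres(A, b, M=M)
print(info, np.linalg.norm(A @ x - b) / np.linalg.norm(b))
```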

76 Hermitian positive definite coefficient matrices
Use the preconditioned conjugate gradient method (PCG) to solve Hermitian positive definite systems. The condition
$$\|I - A_{\mathcal{H}}C\|_2 \le \varepsilon < 1, \qquad (24)$$
in which ε does not depend on n, leads to a well-conditioned matrix $A_{\mathcal{H}}C$. The following lemma provides an estimate on the precision ε of the preconditioner.

Lemma
Assume that (24) holds. Then
$$\operatorname{cond}_2(A_{\mathcal{H}}C) = \|A_{\mathcal{H}}C\|_2\,\|(A_{\mathcal{H}}C)^{-1}\|_2 \le \frac{1+\varepsilon}{1-\varepsilon}.$$

Proof. The assertion follows from the triangle inequality and from the Neumann series:
$$\|A_{\mathcal{H}}C\|_2 \le \|I\|_2 + \|I - A_{\mathcal{H}}C\|_2 \le 1 + \varepsilon, \qquad \|(A_{\mathcal{H}}C)^{-1}\|_2 \le \sum_{k=0}^{\infty}\|I - A_{\mathcal{H}}C\|_2^k = \frac{1}{1-\varepsilon}.$$
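A quick numerical sanity check of the lemma, with a randomly perturbed inverse standing in for a low-accuracy preconditioner; the sizes and the perturbation level are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100
B = rng.standard_normal((n, n))
A = np.eye(n) + B @ B.T / n                     # Hermitian positive definite test matrix
E = rng.standard_normal((n, n)); E = (E + E.T) / 2
C = np.linalg.inv(A) + 1e-3 * E                 # Hermitian, low-accuracy "preconditioner"

eps = np.linalg.norm(np.eye(n) - A @ C, 2)
cond = np.linalg.cond(A @ C, 2)
print(eps < 1, cond <= (1 + eps) / (1 - eps))   # True True
```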

77
The choice ε = 0.5, for instance, guarantees that $\operatorname{cond}_2(A_{\mathcal{H}}C) \le 3$, and hence problem-independent convergence rates. In order to be able to apply PCG, C additionally needs to be Hermitian positive definite. It is interesting to see that this is already guaranteed by condition (24).

Lemma
Assume that $A_{\mathcal{H}}$ is Hermitian positive definite. Then any Hermitian matrix C satisfying (24) is positive definite, too.

Proof. According to the assumptions, the square root $A_{\mathcal{H}}^{1/2}$ of $A_{\mathcal{H}}$ is defined. Since $A_{\mathcal{H}}C$ is similar to the Hermitian matrix $A_{\mathcal{H}}^{1/2}CA_{\mathcal{H}}^{1/2}$, the eigenvalues of $A_{\mathcal{H}}C$ are real. Moreover, for the smallest eigenvalue of $A_{\mathcal{H}}^{1/2}CA_{\mathcal{H}}^{1/2}$ it follows that
$$\lambda_{\min}\bigl(A_{\mathcal{H}}^{1/2}CA_{\mathcal{H}}^{1/2}\bigr) = \lambda_{\min}(A_{\mathcal{H}}C) \ge 1 - \varepsilon.$$
Let $x \ne 0$ and $y = A_{\mathcal{H}}^{-1/2}x$. Then $y \ne 0$ and we have
$$x^HCx = y^HA_{\mathcal{H}}^{1/2}CA_{\mathcal{H}}^{1/2}y \ge (1-\varepsilon)\,\|y\|_2^2 > 0,$$
which proves that C is positive definite.

78
From the approximation by H-matrices, usually error estimates of the form
$$\|A_{\mathcal{H}} - C^{-1}\|_2 \le \varepsilon\,\|A_{\mathcal{H}}\|_2 \quad\text{or}\quad \|A_{\mathcal{H}}^{-1} - C\|_2 \le \varepsilon\,\|A_{\mathcal{H}}^{-1}\|_2 \qquad (25)$$
instead of (24) are satisfied.

Lemma
Assume that (25) holds with ε > 0 such that $\varepsilon\operatorname{cond}_2(A_{\mathcal{H}}) < 1$. Then
$$\operatorname{cond}_2(A_{\mathcal{H}}C) \le \frac{1 + \varepsilon\operatorname{cond}_2(A_{\mathcal{H}})}{1 - \varepsilon\operatorname{cond}_2(A_{\mathcal{H}})}.$$

Proof. Assume first that $\|A_{\mathcal{H}} - C^{-1}\|_2 \le \varepsilon\|A_{\mathcal{H}}\|_2$. Since
$$\|I - (A_{\mathcal{H}}C)^{-1}\|_2 = \|(A_{\mathcal{H}} - C^{-1})A_{\mathcal{H}}^{-1}\|_2 \le \varepsilon\,\|A_{\mathcal{H}}\|_2\,\|A_{\mathcal{H}}^{-1}\|_2 = \varepsilon\operatorname{cond}_2(A_{\mathcal{H}}),$$
one can apply the proof of the second last lemma with $A_{\mathcal{H}}C$ replaced by $(A_{\mathcal{H}}C)^{-1}$. If $\|A_{\mathcal{H}}^{-1} - C\|_2 \le \varepsilon\|A_{\mathcal{H}}^{-1}\|_2$, then
$$\|I - A_{\mathcal{H}}C\|_2 = \|A_{\mathcal{H}}(A_{\mathcal{H}}^{-1} - C)\|_2 \le \varepsilon\,\|A_{\mathcal{H}}\|_2\,\|A_{\mathcal{H}}^{-1}\|_2 = \varepsilon\operatorname{cond}_2(A_{\mathcal{H}})$$
gives the assertion.

79
The stronger condition $\varepsilon\operatorname{cond}_2(A_{\mathcal{H}}) < 1$ implies that ε has to be small if $A_{\mathcal{H}}$ is not well-conditioned. This will NOT destroy the almost linear complexity, since it will be seen that the complexity of the H-matrix approximation depends logarithmically on the accuracy ε.
In the non-Hermitian case, not the spectral condition number of the coefficient matrix but the distance of a cluster of eigenvalues to the origin usually determines the convergence rate of appropriate Krylov subspace methods such as GMRes, BiCGStab, and MinRes. For the convergence of GMRes, for instance, the numerical range
$$F(A_{\mathcal{H}}C) := \{\,x^HA_{\mathcal{H}}Cx : x \in \mathbb{C}^n,\ \|x\|_2 = 1\,\}$$
of $A_{\mathcal{H}}C$ is of particular importance. It is known that
$$\|b - A_{\mathcal{H}}x_k\|_2 \le 2\left(\frac{r}{|z|}\right)^k\|b\|_2$$
provided $F(A_{\mathcal{H}}C) \subset B_r(z)$, where $B_r(z)$ denotes the closed disc around z with radius r. Condition (24) implies that $F(A_{\mathcal{H}}C) \subset B_\varepsilon(1)$, which follows from
$$|x^HA_{\mathcal{H}}Cx - 1| = |x^H(A_{\mathcal{H}}C - I)x| \le \|I - A_{\mathcal{H}}C\|_2 \le \varepsilon \quad\text{for all } x \in \mathbb{C}^n,\ \|x\|_2 = 1.$$
Therefore, (24) also leads to a problem-independent convergence of GMRes:
$$\|b - A_{\mathcal{H}}x_k\|_2 \le 2\varepsilon^k\,\|b\|_2.$$

80 Industrial Applications
In the following numerical results we will use $C := U_{\mathcal{H}}^{-1}L_{\mathcal{H}}^{-1}$ as an explicit preconditioner.

Figure: Low-precision Cholesky decomposition.

If $A_{\mathcal{H}}$ is Hermitian positive definite, $C := L_{\mathcal{H}}^{-T}L_{\mathcal{H}}^{-1}$ is used as a preconditioner. The ability to compute preconditioners from the matrix approximant $A_{\mathcal{H}}$ in a black-box way is one of the advantages of H-matrices over fast multipole methods.

81
The first example is the boundary integral equation
$$\frac12\,u(y) + \int_\Gamma \frac{(\nu_x,\, y - x)}{|x - y|^3}\,u(x)\,ds_x = \int_\Gamma \frac{\partial_{\nu_x}u(x)}{|x - y|}\,ds_x, \qquad y \in \Gamma,$$
with given Neumann boundary condition $\partial_\nu u = g$ on the surface Γ. The dimension of the coefficient matrix A arising from a collocation method is n = 3.
Since the kernel of A is one-dimensional (the surface is simply connected), the extended system
$$\begin{bmatrix} A & v \\ w^T & 0 \end{bmatrix}\begin{bmatrix} x \\ \lambda \end{bmatrix} = \begin{bmatrix} b \\ 0 \end{bmatrix}, \qquad (26)$$
with $v \notin \operatorname{Im}A$ and $w \in \operatorname{Ker}A$, which is uniquely solvable, has to be considered instead. Note that (26) arises from adding the auxiliary condition $x \perp \operatorname{Ker}A$ by Lagrangian multipliers.

Figure: Device with electric potential (by courtesy of ABB Schweiz AG).
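A toy illustration of this bordering trick with synthetic data; the construction of A, v, and w below is purely illustrative, whereas in the application they come from the collocation matrix and its kernel.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 50
B = rng.standard_normal((n, n))
w = np.ones(n) / np.sqrt(n)
A = B - np.outer(B @ w, w)             # force A w = 0, so w spans ker(A)
v = np.ones(n)                          # assumed not to lie in im(A)
b = A @ rng.standard_normal(n)          # consistent right-hand side

# bordered system (26): [[A, v], [w^T, 0]] [x; lambda] = [b; 0]
K = np.block([[A, v[:, None]], [w[None, :], np.zeros((1, 1))]])
sol = np.linalg.solve(K, np.concatenate([b, [0.0]]))
x, lam = sol[:n], sol[n]
print(np.linalg.norm(A @ x - b), abs(w @ x), lam)   # small, small, ~0
```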

82
In the following table we compare the results obtained by fast methods and a standard solution strategy, i.e., A is built without approximation and the augmented system (26) is solved using Gaussian elimination. For the kernel vectors we have used $v = w = (1, \dots, 1)^T$.

Table: storage (matrix, preconditioner) and CPU time (matrix setup, preconditioner setup, solution) for the standard strategy, the fast multipole implementation Mbit, and ACA.

In the row Mbit, the results obtained with an implementation of the fast multipole method can be found. The preconditioner was computed from the approximant $A_{\mathcal{H}}$ of A with precision δ = 0.1. Although the double-layer potential operator is asymptotically well conditioned, the augmented coefficient matrix is ill-conditioned even for small problem sizes. This is due to the geometry and its discretization with strongly non-uniform triangles.

83 Mixed boundary value problems
We consider the following device, which is connected to an electric source of opposite voltages on the dark parts of the left picture. The discrete single-layer potential operator V and the discrete hypersingular operator D are symmetric positive definite matrices. Hence, the coefficient matrix
$$A := \begin{bmatrix} -V & K \\ K^T & D \end{bmatrix}$$
is symmetric and non-singular, since the Schur complement $S := D + K^TV^{-1}K$ of $-V$ in A is symmetric positive definite.
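A toy check of this block structure with random SPD stand-ins for V and D; the sign convention follows the reconstruction above, and all data are synthetic.

```python
import numpy as np

rng = np.random.default_rng(5)
n1, n2 = 40, 30
M1 = rng.standard_normal((n1, n1)); V = M1 @ M1.T + n1 * np.eye(n1)   # SPD stand-in
M2 = rng.standard_normal((n2, n2)); D = M2 @ M2.T + n2 * np.eye(n2)   # SPD stand-in
K = rng.standard_normal((n1, n2))

A = np.block([[-V, K], [K.T, D]])
S = D + K.T @ np.linalg.solve(V, K)                 # Schur complement of -V in A
print(np.allclose(A, A.T),                          # A is symmetric
      np.all(np.linalg.eigvalsh(S) > 0),            # S is positive definite
      np.linalg.matrix_rank(A) == n1 + n2)          # hence A is non-singular
```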

84
We employ the preconditioner
$$C := \hat U^{-1}\begin{bmatrix} L_1^{-T} & \\ & L_2^{-T} \end{bmatrix}, \qquad \hat U := \begin{bmatrix} I & L_1^{-T}X \\ & I \end{bmatrix},$$
where $L_1$ and $L_2$ denote lower triangular H-matrices such that
$$\|I - (L_1L_1^T)^{-1}V\|_2 < \delta, \qquad \|I - (L_2L_2^T)^{-1}D\|_2 < \delta,$$
and X is an H-matrix satisfying $\|K - L_1X\|_2 < \delta$. Note that $L_2$ is defined to be the approximate Cholesky factor of D but not of $D + X^TX$; i.e., instead of approximating the original coefficient matrix A, $C^TC$ approximates the matrix
$$\begin{bmatrix} -V & K \\ K^T & D - K^TV^{-1}K \end{bmatrix}^{-1}.$$

Theorem
The eigenvalues of $C^TAC$ are contained in $[-1 - c\delta,\ -1 + c\delta] \cup [\gamma^{1/3}(1 - c\delta),\ 1 + c\delta]$, where
$$c := 4(c_K + 2)\max\{\|V^{-1}\|_2,\ \|D^{-1}\|_2\}\,\max\{1,\ \delta(c_K + 2)\|V^{-1}\|_2\} + 1.$$

85
Since $C^TAC$ is symmetric indefinite, we employ MinRes for the iterative solution of the preconditioned linear system. For the kth residual vector $r_k$ it holds that
$$\|r_k\|_2 \le 2\left(\frac{\sqrt{(a+\rho)(b+\rho)} - \sqrt{(a-\rho)(b-\rho)}}{\sqrt{(a+\rho)(b+\rho)} + \sqrt{(a-\rho)(b-\rho)}}\right)^{k/2}\|r_0\|_2, \qquad k = 1, 2, \dots,$$
where the spectrum is assumed to be enclosed in the negative and positive intervals $[-a-\rho, -a+\rho] \cup [b-\rho, b+\rho]$. In order to obtain a convergence rate which is independent of the discretization, we therefore have to guarantee that $c\delta < \tfrac12$.

Table: preconditioner setup time and storage, number of MinRes iterations, and solution time for two problem sizes n and several accuracies δ.

86 Mixed BVPs with vanishing Dirichlet part
The coarsest discretization of the following surface contains only 4 Dirichlet triangles. If the Dirichlet part vanished completely, the hypersingular operator would not be coercive at all. After generating the approximant with accuracy ε = 1, we recompress a copy of the coefficient matrix to a blockwise relative accuracy δ. The hierarchical Cholesky decomposition fails to compute unless we use a stabilized variant based on the stabilization technique.

Table: preconditioner setup time and storage, number of iterations, and solution time for two problem sizes n and several accuracies δ.

Iterating without any preconditioner does not converge at all.
