Max-Planck-Institut für Mathematik in den Naturwissenschaften Leipzig

$\mathcal{H}^2$-matrix approximation of integral operators by interpolation

by Wolfgang Hackbusch and Steffen Börm

Preprint no.: 04 2001
$\mathcal{H}^2$-Matrix Approximation of Integral Operators by Interpolation

Wolfgang Hackbusch and Steffen Börm
Max-Planck-Institut für Mathematik in den Naturwissenschaften
Inselstr. 22–26, D-04103 Leipzig, Germany

Abstract

Typical panel clustering methods for the fast evaluation of integral operators are based on the Taylor expansion of the kernel function and therefore usually require the user to implement the evaluation of the derivatives of this function up to an arbitrary degree. We propose an alternative approach that replaces the Taylor expansion by simple polynomial interpolation. By applying the interpolation idea to the approximating polynomials on different levels of the cluster tree, the matrix-vector multiplication can be performed in only $O(np^d)$ operations for a polynomial order of $p$ and an $n$-dimensional trial space. The main advantage of our method, compared to other methods, is its simplicity: only pointwise evaluations of the kernel and of simple polynomials have to be implemented.

1 Introduction

We consider an integral equation on a submanifold $\Gamma$ of $\mathbb{R}^d$ for $d \in \mathbb{N}$, namely

  Find $u$ (in a suitable space) satisfying

    $\lambda u(x) + \int_\Gamma k(x,y)\, u(y)\, dy = f(x)$  for all $x \in \Gamma$  (1.1)

for a right-hand side $f$, a parameter $\lambda \in \mathbb{R}$ and a kernel function $k : \mathbb{R}^d \times \mathbb{R}^d \to \mathbb{R}$ that has the asymptotic smoothness (cf. (5.1)) typical for BEM kernels.

We use a Galerkin approach for the discretisation of equation (1.1), i.e., we choose a family $(\varphi_i)_{i \in I}$ of suitable basis functions, set $V_h := \operatorname{span}\{\varphi_i : i \in I\}$ and get the discrete problem

  Find $u_h \in V_h$ satisfying

    $\lambda \int_\Gamma u_h(x)\, v_h(x)\, dx + \int_\Gamma \int_\Gamma k(x,y)\, u_h(y)\, v_h(x)\, dy\, dx = \int_\Gamma f(x)\, v_h(x)\, dx$  (1.2)

  for all $v_h \in V_h$.

We introduce matrices $M, K \in \mathbb{R}^{I \times I}$ and vectors $\hat u, \hat f \in \mathbb{R}^I$ by setting

  $M_{ij} = \int_\Gamma \varphi_j(x)\, \varphi_i(x)\, dx$,  $\hat f_j = \int_\Gamma f(x)\, \varphi_j(x)\, dx$,  $K_{ij} = \int_\Gamma \int_\Gamma k(x,y)\, \varphi_j(y)\, \varphi_i(x)\, dy\, dx$,  $u_h = \sum_{i \in I} \hat u_i \varphi_i$

for all $i, j \in I$, which allows us to rewrite (1.2) in the form

  $(\lambda M + K)\, \hat u = \hat f$.  (1.3)
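As a concrete illustration of the discrete system (1.3) — not part of the paper — the following sketch assembles $M$, $K$ and $\hat f$ for piecewise-constant basis functions on $[0,1]$ with a logarithmic model kernel, using crude one-point (midpoint) quadrature; the function name and the diagonal regularisation are our own simplifications.

```python
import numpy as np

def galerkin_system(n, kernel=lambda x, y: -np.log(np.abs(x - y))):
    # Piecewise-constant basis on n equal panels of [0, 1];
    # all integrals are approximated by midpoint quadrature.
    h = 1.0 / n
    mid = (np.arange(n) + 0.5) * h
    M = h * np.eye(n)                    # mass matrix M_ij = integral of phi_j phi_i
    X, Y = np.meshgrid(mid, mid, indexing="ij")
    # crude regularisation on the diagonal, where the kernel is singular:
    K = h * h * kernel(X, Y + 1e-12 * (X == Y))
    f = h * np.ones(n)                   # right-hand side for f(x) = 1
    return M, K, f

M, K, f = galerkin_system(64)
u = np.linalg.solve(1.0 * M + K, f)      # discrete system (lambda*M + K) u = f, lambda = 1
```

Note that $K$ is dense: every entry couples two panels through the kernel, which is the starting point for the compression techniques discussed below.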
In typical applications, the basis functions $(\varphi_i)_{i \in I}$ have local support, so the matrix $M$ is sparse, i.e., for $n := |I|$, the construction of the matrix and its multiplication by a given vector can be accomplished in $O(n)$ operations.

Since the support of the integral kernel $k(\cdot,\cdot)$ is not local, the matrix $K$ is not sparse, so a straightforward computation of its entries requires at least $O(n^2)$ operations, and a multiplication of a vector by this matrix requires the same amount of work.

There are different approaches for reducing the complexity: If the domain is very regular, the resulting matrix has a Toeplitz-type structure and can therefore be evaluated efficiently by performing fast Fourier transformations. If the domain is smooth, it is possible to use wavelet compression, and this approach can be extended to piecewise smooth domains [1].

In this article, we consider a variant of the panel clustering method [8] and of the mosaic-skeleton matrix approach [10], using the framework of $\mathcal{H}^2$-matrices introduced in [7]. The idea of our variant is to replace $K$ by a suitable approximation and thereby to reduce the complexity to $O(np^d)$, where $p$ is a parameter corresponding to the "quality" of the approximation and $d$, as above, is the dimension of the space in which the manifold $\Gamma$ is embedded.

We approximate the kernel function $k(\cdot,\cdot)$ by an interpolant. Since $k(\cdot,\cdot)$ will usually not be smooth on the entire domain, we do not use a global interpolant, but choose subsets where $k(\cdot,\cdot)$ is smooth and interpolate on each of these subsets. If we organise the subsets in a hierarchical way, we end up with a multilevel block matrix in which each block has rank $p^d$. The structure is that of an $\mathcal{H}$-matrix described in [4, 5]. In this form, the approximating matrix requires only $O(np^d \log n)$ units of storage; constructing it and multiplying it by vectors requires $O(np^d \log n)$ operations.
If we exploit the fact that the kernel is locally approximated by polynomials that depend only on $\tau$ and $\sigma$ and have a fixed degree, we find that the structure resembles that of the $\mathcal{H}^2$-matrices described in [7]. Using an algorithm similar to the fast Fourier transformation, but not as limited in scope, we can reduce the storage requirements and the number of operations to $O(np^d)$.

Since only quadrature and polynomial interpolation are needed in our algorithm, its implementation is very simple compared to wavelet compression techniques or panel clustering approaches based on Taylor expansion.

In typical applications, the order $p$ will be chosen proportional to $\log n$, so the complexity will be $O(n \log^d n)$, i.e., not optimal. In [2], the use of anisotropic interpolation on domains adapted to the boundary is suggested in order to reduce this complexity. We propose a different approach, namely a variable-order expansion along the lines of [9]. This requires only a very minor modification of the original constant-order algorithm and allows us to attain optimal complexity.

2 Cluster Tree, Block Partitioning and Types of $\mathcal{H}$-Matrices

In order to keep the selection of the subsets mentioned above algorithmically manageable, we base them on a hierarchical splitting of the index set $I$ corresponding to the basis functions: Let $T(I)$ be a binary tree. $T(I)$ is called a binary cluster tree if it satisfies the following conditions:

1. $T(I) \subseteq \mathcal{P}(I)$, i.e., each node of $T(I)$ is a subset of the index set $I$.
2. $I$ is the root of $T(I)$.
3. If $\tau \in T(I)$ is a leaf, then $|\tau| \le C p^d$, i.e., the leaves consist of a relatively small number of indices.
4. If $\tau \in T(I)$ is not a leaf, then it has exactly two sons and is their disjoint union.

For each $\tau \in T(I)$, we denote the set of its sons by $S(\tau) \subseteq T(I)$.
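The paper does not prescribe a particular splitting strategy; geometric bisection along the widest coordinate direction is one common way to satisfy the four conditions above. The following sketch — with our own names `Cluster` and `build_cluster_tree` — builds such a binary cluster tree for basis functions identified with points.

```python
import numpy as np

class Cluster:
    # A node of the binary cluster tree T(I): an index set plus its sons S(tau).
    def __init__(self, indices, sons=()):
        self.indices = indices        # subset of the index set I
        self.sons = list(sons)        # empty for leaves, exactly two otherwise

def build_cluster_tree(indices, points, leaf_size):
    # Geometric bisection: split along the coordinate in which the bounding
    # box of the cluster's points is widest, giving each son half the indices,
    # so every non-leaf is the disjoint union of its two sons.
    if len(indices) <= leaf_size:
        return Cluster(indices)
    pts = points[indices]
    direction = np.argmax(pts.max(axis=0) - pts.min(axis=0))
    order = indices[np.argsort(pts[:, direction])]
    half = len(order) // 2
    sons = [build_cluster_tree(order[:half], points, leaf_size),
            build_cluster_tree(order[half:], points, leaf_size)]
    return Cluster(order, sons)

# Example: basis functions located on the unit circle, as in Figure 1.
t = np.linspace(0.0, 2.0 * np.pi, 100, endpoint=False)
points = np.stack([np.cos(t), np.sin(t)], axis=1)
tree = build_cluster_tree(np.arange(100), points, leaf_size=8)
```

The recursion depth is $O(\log n)$ for quasi-uniformly distributed points, which is the situation assumed in the complexity estimates below.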
The support of a cluster $\tau \in T(I)$ is given by the union of the supports of the basis functions corresponding to its elements, i.e.,

  $\Omega_\tau := \bigcup_{i \in \tau} \Omega_i$,  where  $\Omega_i := \operatorname{supp} \varphi_i$  for all $i \in I$.
Next, we need an admissibility condition that allows us to select pairs $(\tau, \sigma) \in T(I) \times T(I)$ such that the kernel $k(\cdot,\cdot)$ is smooth enough on the domain associated with $\tau \times \sigma$. We are going to define our interpolants on axis-parallel boxes $B_\tau$ containing the set $\Omega_\tau$, so we need the kernel to be smooth on $B_\tau \times B_\sigma$ if we want a good approximation. This can be expressed in quantitative terms by the inequality

  $\max\{\operatorname{diam}(B_\tau), \operatorname{diam}(B_\sigma)\} \le 2\eta \operatorname{dist}(B_\tau, B_\sigma)$,  (2.1)

where $\eta \in\, ]0,1[$ is a parameter controlling the trade-off between the number of admissible blocks, i.e., the algorithmic complexity, and the speed of convergence, i.e., the quality of the approximation. The condition (2.1) is especially suited for kernel functions with the property (5.1).

Figure 1 shows a cluster splitting of the unit circle. For some clusters, the corresponding boxes $B_\tau$ are highlighted in grey.

Figure 1: Dyadic clustering of the unit circle.

The index set $I \times I$ corresponding to the matrix $K \in \mathbb{R}^{I \times I}$ is partitioned into blocks by the algorithm shown in Table 1:

procedure BuildPartitioning($\tau$, $\sigma$, var $P$);
begin
  if $(\tau, \sigma)$ meets condition (2.1) then
    $P := P \cup \{\tau \times \sigma\}$
  else if $S(\tau) \ne \emptyset$ and $S(\sigma) \ne \emptyset$ then
    for $\tau' \in S(\tau)$, $\sigma' \in S(\sigma)$ do
      BuildPartitioning($\tau'$, $\sigma'$, $P$)
  else
    $P := P \cup \{\tau \times \sigma\}$
end

Table 1: BuildPartitioning algorithm

Calling this procedure with $\tau = \sigma = I$ and $P = \emptyset$ creates a block partitioning of $I \times I$ consisting of admissible blocks and of non-admissible blocks corresponding to leaf clusters of $T(I)$. An example of an admissible partitioning corresponding to the splitting outlined in Figure 1 can be found in Figure 2.

The complexity of algorithms for the creation of suitable cluster trees and block partitionings has been analysed in detail in [3]: For typical quasi-uniform grids, a "good" cluster tree can be created in $O(n \log n)$ operations, and the computation of the block partitioning can be accomplished in $O(n)$ operations.

Based on the cluster tree and the block partitioning, we define the following three types of data-sparse matrices:

Definition 2.1
($\mathcal{H}$-matrices) Let $A \in \mathbb{R}^{I \times I}$ be a matrix and let $P$ be a block partitioning of $I \times I$ consisting of admissible blocks and leaf blocks.

1. Let $k \in \mathbb{N}$. $A$ is called an $\mathcal{H}$-matrix of rank $k$ if

  $\operatorname{rank}(A|_{\tau \times \sigma}) \le k$  (2.2)
Figure 2: Block partitioning for the unit circle with dyadic clustering.

holds for each $\tau \times \sigma \in P$.

2. A family $V = (V_\tau)_{\tau \in T(I)}$ is called a cluster basis if there is a $k_\tau \in \mathbb{N}$ for each $\tau \in T(I)$ such that $V_\tau \in \mathbb{R}^{\tau \times k_\tau}$. Let $V_c, V_r$ be cluster bases. $A$ is called a uniform $\mathcal{H}$-matrix with respect to $V_c$ and $V_r$ if for each $\tau \times \sigma \in P$ there is a matrix $S_{\tau\sigma}$ satisfying

  $A|_{\tau \times \sigma} = V_{c,\tau}\, S_{\tau\sigma}\, V_{r,\sigma}^T$.  (2.3)

In this context, $V_c$ is called the column basis of $A$ and $V_r$ is called the row basis of $A$.

3. A cluster basis $V = (V_\tau)_{\tau \in T(I)}$ is called nested if there are transfer matrices $B_{\tau',\tau} \in \mathbb{R}^{k_{\tau'} \times k_\tau}$ for all $\tau \in T(I)$ with $S(\tau) \ne \emptyset$ and $\tau' \in S(\tau)$ satisfying

  $V_\tau|_{\tau' \times k_\tau} = V_{\tau'}\, B_{\tau',\tau}$.  (2.4)

$A$ is called an $\mathcal{H}^2$-matrix with respect to $V_c$ and $V_r$ if it is a uniform $\mathcal{H}$-matrix with respect to these cluster bases and if both bases are nested.

3 $\mathcal{H}^2$ Construction

3.1 Approximation of the Kernel

For each cluster $\tau \in T(I)$, we fix a family $(x_\nu^\tau)_{\nu \in I_\tau}$ of interpolation points (e.g., tensor products of the zeroes of Chebyshev polynomials) in $B_\tau$ and the family $(p_\nu^\tau)_{\nu \in I_\tau}$ of corresponding Lagrange polynomials. On a given admissible block $\tau \times \sigma \in P$, we approximate the kernel function $k(\cdot,\cdot)$ by its interpolant

  $\tilde k^{\tau\sigma}(x,y) := \sum_{\nu \in I_\tau} \sum_{\mu \in I_\sigma} k(x_\nu^\tau, x_\mu^\sigma)\, p_\nu^\tau(x)\, p_\mu^\sigma(y)$.  (3.1)

3.2 Approximation of the Discrete Operator

The discrete Galerkin operator is given by

  $K_{ij} := \int_{\Omega_i} \int_{\Omega_j} k(x,y)\, \varphi_j(y)\, \varphi_i(x)\, dy\, dx$

for finite element basis functions $\varphi_i, \varphi_j$ having their supports in $\Omega_i, \Omega_j$.
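The interpolant (3.1) is a separable, hence low-rank, approximation of the kernel on an admissible block. The following sketch — with our own helper names and one-dimensional boxes for brevity — builds it from Chebyshev points and Lagrange polynomials and measures the error for the model kernel $k(x,y) = 1/|x-y|$ on the admissible pair $B_\tau = [0,1]$, $B_\sigma = [2,3]$ (here $\operatorname{dist} = 1$ and both diameters equal $1$, so (2.1) holds with $\eta = 1/2$).

```python
import numpy as np

def cheb_points(a, b, m):
    # m+1 Chebyshev points transformed from [-1, 1] to [a, b]
    i = np.arange(m + 1)
    xi = np.cos((2 * i + 1) * np.pi / (2 * m + 2))
    return 0.5 * ((1 + xi) * b + (1 - xi) * a)

def lagrange_matrix(nodes, x):
    # L[i, nu] = value at x[i] of the nu-th Lagrange polynomial w.r.t. `nodes`
    L = np.ones((len(x), len(nodes)))
    for nu in range(len(nodes)):
        for mu in range(len(nodes)):
            if mu != nu:
                L[:, nu] *= (x - nodes[mu]) / (nodes[nu] - nodes[mu])
    return L

def interpolant_error(kernel, box_t, box_s, m, nx=50):
    # Compare k with its tensor interpolant (3.1) on B_tau x B_sigma
    xt = cheb_points(*box_t, m)
    xs = cheb_points(*box_s, m)
    S = kernel(xt[:, None], xs[None, :])        # S[nu, mu] = k(x_nu, x_mu), cf. (3.3)
    gx = np.linspace(*box_t, nx)
    gy = np.linspace(*box_s, nx)
    Kt = lagrange_matrix(xt, gx) @ S @ lagrange_matrix(xs, gy).T
    K = kernel(gx[:, None], gy[None, :])
    return np.abs(K - Kt).max()

kernel = lambda x, y: 1.0 / np.abs(x - y)       # model kernel, singular at x = y
errors = [interpolant_error(kernel, (0.0, 1.0), (2.0, 3.0), m) for m in (2, 4, 6, 8)]
```

Because the boxes are well separated, the error decays exponentially in the interpolation order, which is exactly the behaviour quantified by Lemma 5.1 below.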
For $i \in \tau$ and $j \in \sigma$, we can replace $k(\cdot,\cdot)$ by $\tilde k^{\tau\sigma}(\cdot,\cdot)$ and find

  $\tilde K_{ij} := \int_{\Omega_i} \int_{\Omega_j} \tilde k^{\tau\sigma}(x,y)\, \varphi_i(x)\, \varphi_j(y)\, dy\, dx = \sum_{\nu \in I_\tau} \sum_{\mu \in I_\sigma} k(x_\nu^\tau, x_\mu^\sigma) \int_{\Omega_i} p_\nu^\tau(x)\, \varphi_i(x)\, dx \int_{\Omega_j} p_\mu^\sigma(y)\, \varphi_j(y)\, dy.$

Introducing matrices $V^\tau \in \mathbb{R}^{\tau \times I_\tau}$, $V^\sigma \in \mathbb{R}^{\sigma \times I_\sigma}$ and $S^{\tau\sigma} \in \mathbb{R}^{I_\tau \times I_\sigma}$ with

  $V_{i\nu}^\tau := \int_{\Omega_i} p_\nu^\tau(x)\, \varphi_i(x)\, dx$,  $V_{j\mu}^\sigma := \int_{\Omega_j} p_\mu^\sigma(y)\, \varphi_j(y)\, dy$,  (3.2)

  $S_{\nu\mu}^{\tau\sigma} := k(x_\nu^\tau, x_\mu^\sigma)$,  (3.3)

we get

  $\tilde K_{ij} = \big(V^\tau S^{\tau\sigma} (V^\sigma)^T\big)_{ij}$,  (3.4)

i.e., a representation as a uniform $\mathcal{H}$-matrix.

3.3 Nested Bases

Let $Q_p$ be the space of polynomials spanned by tensor products of polynomials of degree up to $p$. The interpolation operator is a projection onto this space, so if we use $Q_p$ as the space of interpolation polynomials on all clusters, we can express polynomials corresponding to father clusters in terms of polynomials corresponding to son clusters without loss, i.e., for a cluster $\tau \in T(I)$ with a son $\tau' \in T(I)$, we find

  $p_\nu^\tau(x) = \sum_{\nu' \in I_{\tau'}} p_\nu^\tau(x_{\nu'}^{\tau'})\, p_{\nu'}^{\tau'}(x)$.  (3.5)

By introducing a matrix $B_{\tau',\tau} \in \mathbb{R}^{I_{\tau'} \times I_\tau}$ by setting

  $(B_{\tau',\tau})_{\nu'\nu} := p_\nu^\tau(x_{\nu'}^{\tau'})$,  (3.6)

we get

  $V_{i\nu}^\tau = \int_{\Omega_i} p_\nu^\tau(x)\, \varphi_i(x)\, dx = \sum_{\nu' \in I_{\tau'}} p_\nu^\tau(x_{\nu'}^{\tau'}) \int_{\Omega_i} p_{\nu'}^{\tau'}(x)\, \varphi_i(x)\, dx = \sum_{\nu' \in I_{\tau'}} (B_{\tau',\tau})_{\nu'\nu}\, V_{i\nu'}^{\tau'} = \big(V^{\tau'} B_{\tau',\tau}\big)_{i\nu}$  (3.7)

for all $i \in \tau'$, so the bases are nested in the way required for $\mathcal{H}^2$-matrices.

4 Algorithms and Complexity

We require the block partitioning $P$ to be sparse in the sense of [3], i.e., that there exists a constant $C_{sp}$ satisfying

  $\max_{\tau \in T(I)} |\{\sigma \in T(I) : \tau \times \sigma \in P\}| \le C_{sp}$,  (4.1)

  $\max_{\sigma \in T(I)} |\{\tau \in T(I) : \tau \times \sigma \in P\}| \le C_{sp}$.  (4.2)

For standard situations with quasi-uniform meshes, this estimate has been established in [3]. We further assume that there is a constant $C_L$ satisfying

  $S(\tau) = \emptyset \;\Rightarrow\; |I_\tau| \le |\tau| \le C_L |I_\tau|$  (4.3)

for all $\tau \in T(I)$, i.e., the size of a leaf cluster is comparable to the number of polynomials used in the interpolation on this cluster. Note that, due to the requirement of Subsection 3.3, we have $|I_\tau| = p^d$ for all clusters $\tau \in T(I)$.
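The exactness of the re-expansion (3.5) — the father's Lagrange polynomials lie in $Q_p$ and are therefore reproduced exactly by interpolation on the son's points — can be checked numerically. This sketch (our own helper names; one space dimension; same order on father and son) builds the transfer matrix $B_{\tau',\tau}$ via (3.6) and compares direct evaluation of the father's polynomials with evaluation through the son's basis.

```python
import numpy as np

def cheb_points(a, b, m):
    # m+1 Chebyshev points transformed from [-1, 1] to [a, b]
    i = np.arange(m + 1)
    xi = np.cos((2 * i + 1) * np.pi / (2 * m + 2))
    return 0.5 * ((1 + xi) * b + (1 - xi) * a)

def lagrange_matrix(nodes, x):
    # L[i, nu] = value at x[i] of the nu-th Lagrange polynomial w.r.t. `nodes`
    L = np.ones((len(x), len(nodes)))
    for nu in range(len(nodes)):
        for mu in range(len(nodes)):
            if mu != nu:
                L[:, nu] *= (x - nodes[mu]) / (nodes[nu] - nodes[mu])
    return L

# Father box [0, 2] with son box [0, 1]; the same order p on every cluster.
p = 4
x_father = cheb_points(0.0, 2.0, p)
x_son = cheb_points(0.0, 1.0, p)
# (3.6): B[nu', nu] = father polynomial p_nu^tau evaluated at son point x_nu'^tau'
B = lagrange_matrix(x_father, x_son)
# (3.5): on the son's box, the two evaluation routes must agree exactly.
x = np.linspace(0.0, 1.0, 33)
direct = lagrange_matrix(x_father, x)       # p_nu^tau(x)
via_son = lagrange_matrix(x_son, x) @ B     # sum over nu' of p_nu^tau(x_nu') p_nu'(x)
```

In the variable-order setting of Section 6.3, the son uses fewer points than the father and the same identity holds only approximately.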
procedure ForwardTransformation($\tau$);
begin
  if $S(\tau) = \emptyset$ then
    $x_\tau := (V^\tau)^T x|_\tau$
  else begin
    $x_\tau := 0$;
    for $\tau' \in S(\tau)$ do begin
      ForwardTransformation($\tau'$);
      $x_\tau := x_\tau + B_{\tau',\tau}^T x_{\tau'}$
    end
  end
end

Table 2: ForwardTransformation algorithm

4.1 Matrix-Vector Multiplication

We split the matrix-vector multiplication $y := \tilde K x$ into three steps: First, the products of the input vector with the matrices $(V^\tau)^T$ are computed by recursively applying equation (3.7). Then, the resulting coefficients are multiplied by the matrices $S^{\tau\sigma}$. In the last step, the result is transformed back from the coefficients of the cluster bases into the standard basis.

4.1.1 Forward Transformation

In this first step, we want to compute $x_\tau := (V^\tau)^T x|_\tau$ for all $\tau \in T(I)$. If $\tau$ is not a leaf, i.e., if $S(\tau) \ne \emptyset$, we have

  $(V^\tau)^T x|_\tau = \sum_{\tau' \in S(\tau)} B_{\tau',\tau}^T (V^{\tau'})^T x|_{\tau'} = \sum_{\tau' \in S(\tau)} B_{\tau',\tau}^T x_{\tau'}$,

so the algorithm shown in Table 2 can be used to compute all coefficients.

Now we want to analyse the complexity of this algorithm. For each $\tau \in T(I)$, we have $|I_\tau| = p^d$ and find that $O(p^{2d})$ operations are required for the computation of $x_\tau$. Due to the lower bound in condition (4.3), there are no more than $n/p^d$ leaves in $T(I)$, so there are fewer than $2n/p^d$ nodes. Therefore, the full forward transformation takes $O(np^d)$ operations to complete.

4.1.2 Multiplication

In the second step, we want to compute

  $y_\tau := \sum_{\sigma \in P_\tau} S^{\tau\sigma} x_\sigma$

for all $\tau \in T(I)$, where

  $P_\tau := \{\sigma \in T(I) : \tau \times \sigma \in P\}$.

Due to the sparsity condition (4.1), we have $|P_\tau| \le C_{sp}$, so the computation of $y_\tau$ requires $O(C_{sp} p^{2d})$ operations. The cluster tree has at most $2n/p^d$ nodes, so the second step takes $O(np^d)$ operations.

4.1.3 Backward Transformation

In the last step, we transform the result from the cluster bases back into the standard basis. The procedure shown in Table 3 is the adjoint of the forward transformation. By the same arguments as in the case of the forward transformation, we find that only $O(np^d)$ operations are required for the backward transformation, so the complete matrix-vector multiplication can be performed in $O(np^d)$ operations.
procedure BackwardTransformation($\tau$);
begin
  if $S(\tau) = \emptyset$ then
    $y|_\tau := V^\tau y_\tau$
  else
    for $\tau' \in S(\tau)$ do begin
      $y_{\tau'} := y_{\tau'} + B_{\tau',\tau}\, y_\tau$;
      BackwardTransformation($\tau'$)
    end
end

Table 3: BackwardTransformation algorithm

4.2 Discretisation

4.2.1 Basis Transformation Matrices

We first consider the computation of the matrices $B_{\tau',\tau}$. We are using tensor product interpolation, so the polynomials $p_\nu^\tau$ and the corresponding interpolation points $x_\nu^\tau$ can be written in the form

  $p_\nu^\tau(x_1, \dots, x_d) = p_{\nu_1}^{\tau,1}(x_1) \cdots p_{\nu_d}^{\tau,d}(x_d)$,  $x_\nu^\tau = (x_{\nu_1}^{\tau,1}, \dots, x_{\nu_d}^{\tau,d})$

for one-dimensional Lagrange polynomials $(p_j^{\tau,i})$ and corresponding interpolation points $(x_j^{\tau,i})$ on the real line. Equation (3.6) takes the form

  $(B_{\tau',\tau})_{\nu'\nu} = p_\nu^\tau(x_{\nu'}^{\tau'}) = p_{\nu_1}^{\tau,1}(x_{\nu'_1}^{\tau',1}) \cdots p_{\nu_d}^{\tau,d}(x_{\nu'_d}^{\tau',d})$,

so it has the tensor product structure

  $B_{\tau',\tau} = B_{\tau',\tau,1} \otimes \cdots \otimes B_{\tau',\tau,d}$  with  $(B_{\tau',\tau,i})_{\nu'_i \nu_i} = p_{\nu_i}^{\tau,i}(x_{\nu'_i}^{\tau',i})$.

The computation of one of these latter matrices requires $O(p^3)$ operations; the computation of all of them requires $O(dp^3)$. This means that, since there are $O(n/p^d)$ clusters and each cluster requires $O(dp^3)$ units of storage, the matrices $B_{\tau',\tau}$ can be stored in the form of tensor products in $O(dnp^{3-d})$ units of storage. Of course, we can also construct the full matrix $B_{\tau',\tau}$ from the tensor product form. This requires $O(dp^3 + p^{2d})$ operations per matrix and leads to a total of $O(dnp^{3-d} + np^d)$ operations.

4.2.2 Leaf Basis Matrices

The computation of the matrices $V^\tau$ for leaves $\tau \in T(I)$ corresponds to the computation of the integrals

  $V_{i\nu}^\tau = \int_{\Omega_i} p_\nu^\tau(x)\, \varphi_i(x)\, dx$,

where $\varphi_i$ is a finite element basis function, $p_\nu^\tau$ a tensor product polynomial in $Q_p$, and $i \in \tau$, $\nu \in I_\tau$. The computation of each of these entries using Gauß quadrature of order $q$ requires $O(q^d)$ operations, so all leaf basis matrices can be computed using $O(np^d q^d)$ operations and stored in $O(np^d)$ units of memory.
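The recursions of Tables 2 and 3 can be sketched directly in code. The following toy example — with our own names; a hand-built tree of one root and two leaves, rank $k = 2$ everywhere — verifies that the forward transformation reproduces $(V^\tau)^T x|_\tau$ through the transfer matrices alone, and that the backward transformation is its adjoint.

```python
import numpy as np

class Cluster:
    def __init__(self, indices, V=None, sons=(), B=()):
        self.indices = indices  # global indices of the cluster
        self.V = V              # leaf basis matrix V^tau (leaves only)
        self.sons = list(sons)  # son clusters
        self.B = list(B)        # transfer matrices B_{tau', tau}, one per son

def forward(tau, x, coeff):
    # Table 2: compute x_tau = (V^tau)^T x|_tau for every cluster
    if not tau.sons:
        coeff[id(tau)] = tau.V.T @ x[tau.indices]
    else:
        acc = 0
        for son, B in zip(tau.sons, tau.B):
            forward(son, x, coeff)
            acc = acc + B.T @ coeff[id(son)]
        coeff[id(tau)] = acc
    return coeff

def backward(tau, y, coeff):
    # Table 3 (adjoint of Table 2): scatter y_tau back to the standard basis
    if not tau.sons:
        y[tau.indices] += tau.V @ coeff[id(tau)]
    else:
        for son, B in zip(tau.sons, tau.B):
            coeff[id(son)] = coeff.get(id(son), 0) + B @ coeff[id(tau)]
            backward(son, y, coeff)

rng = np.random.default_rng(0)
left = Cluster(np.arange(0, 4), V=rng.standard_normal((4, 2)))
right = Cluster(np.arange(4, 8), V=rng.standard_normal((4, 2)))
root = Cluster(np.arange(8), sons=[left, right],
               B=[rng.standard_normal((2, 2)), rng.standard_normal((2, 2))])
# Nestedness (2.4): the root basis is only represented implicitly via V and B.
V_root = np.vstack([left.V @ root.B[0], right.V @ root.B[1]])

x = rng.standard_normal(8)
coeff = forward(root, x, {})
```

In a full matrix-vector multiplication, the coefficients $y_\tau$ fed into `backward` would come from the multiplication phase $y_\tau = \sum_\sigma S^{\tau\sigma} x_\sigma$.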
4.2.3 Admissible Blocks

According to (3.3), the coefficient matrices $S^{\tau\sigma}$ corresponding to admissible blocks are defined by

  $S_{\nu\mu}^{\tau\sigma} = k(x_\nu^\tau, x_\mu^\sigma)$,

so building one of these matrices requires $O(p^{2d})$ operations. By the same arguments as before, we get a total of $O(np^d)$ operations for the process of building the coefficients for all admissible blocks.
4.2.4 Non-admissible Blocks

To compute the matrices corresponding to non-admissible blocks, we have to approximate the integrals

  $S_{ij}^{\tau\sigma} = \int_{\Omega_i} \int_{\Omega_j} k(x,y)\, \varphi_j(y)\, \varphi_i(x)\, dy\, dx.$

As before, we use quadrature of order $q$ and find that we need $O(np^d q^d)$ operations and $O(np^d)$ units of memory.

5 Error Analysis

We assume that the kernel $k(\cdot,\cdot)$ is asymptotically smooth, i.e., that there are constants $C_{as}(n,m)$ satisfying

  $|\partial_x^\alpha \partial_y^\beta k(x,y)| \le C_{as}(|\alpha|,|\beta|)\, \|x-y\|^{-|\alpha|-|\beta|}\, |k(x,y)|$  (5.1)

for all multi-indices $\alpha, \beta \in \mathbb{N}_0^d$ and all $x, y \in \mathbb{R}^d$ with $x \ne y$.

Lemma 5.1 (Kernel approximation) Let $\tau \times \sigma \in P$ be an admissible block and assume that (5.1) holds for $k(\cdot,\cdot)$. Then the kernel $\tilde k^{\tau\sigma}(\cdot,\cdot)$ from (3.1) satisfies the estimate

  $\|k - \tilde k^{\tau\sigma}\|_{\infty, B_\tau \times B_\sigma} \le C_{apx}(p)\, \eta^{p+1}\, \|k\|_{\infty, B_\tau \times B_\sigma}$  (5.2)

with the constant

  $C_{apx}(p) := \dfrac{\sqrt 2\, d\, (C_{as}(0,p+1) + C_{as}(p+1,0))\, (C \log(p+1))^{2d}}{(p+1)!\; 2^{p/2}}$.  (5.3)

The behaviour of this constant will be discussed in Remark 5.2.

Proof. The interpolated kernel $\tilde k^{\tau\sigma}(\cdot,\cdot)$ can be written in the form $\tilde k^{\tau\sigma} = I^p_{B_\tau \times B_\sigma} k$ by using the tensor product interpolation operator. Applying the error estimate of Lemma A.1, we find

  $\|k - \tilde k^{\tau\sigma}\|_{\infty, B_\tau \times B_\sigma} = \|k - I^p_{B_\tau \times B_\sigma} k\|_{\infty, B_\tau \times B_\sigma} \le \dfrac{4^{-p}}{2(p+1)!}\, (C \log(p+1))^{2d}\, \operatorname{diam}(B_\tau \times B_\sigma)^{p+1} \sum_{j=1}^{2d} \|\partial_j^{p+1} k\|_{\infty, B_\tau \times B_\sigma}.$

By using the admissibility condition (2.1), we find

  $\operatorname{diam}(B_\tau \times B_\sigma) = \max\{\|(x_1,y_1) - (x_2,y_2)\|_2 : (x_1,y_1), (x_2,y_2) \in B_\tau \times B_\sigma\}$
  $= \max\{\sqrt{\|x_1-x_2\|_2^2 + \|y_1-y_2\|_2^2} : x_1, x_2 \in B_\tau,\; y_1, y_2 \in B_\sigma\}$
  $\le \sqrt{\operatorname{diam}(B_\tau)^2 + \operatorname{diam}(B_\sigma)^2} \le \sqrt2 \max\{\operatorname{diam}(B_\tau), \operatorname{diam}(B_\sigma)\} \le 2\sqrt2\, \eta \operatorname{dist}(B_\tau, B_\sigma),$

and therefore

  $\|k - \tilde k^{\tau\sigma}\|_{\infty, B_\tau \times B_\sigma} \le \dfrac{\sqrt2\, (C \log(p+1))^{2d}}{(p+1)!\; 2^{p/2}}\, \eta^{p+1}\, \operatorname{dist}(B_\tau, B_\sigma)^{p+1} \sum_{j=1}^{2d} \|\partial_j^{p+1} k\|_{\infty, B_\tau \times B_\sigma}.$

The partial derivatives can be estimated by using the asymptotic smoothness (5.1) as follows:

  $|\partial_j^{p+1} k(x,y)| \le C_{as}(p+1,0)\, \|x-y\|^{-(p+1)}\, |k(x,y)|$
  $\le C_{as}(p+1,0)\, \operatorname{dist}(B_\tau, B_\sigma)^{-(p+1)}\, |k(x,y)|$

for $j \le d$ and

  $|\partial_j^{p+1} k(x,y)| \le C_{as}(0,p+1)\, \operatorname{dist}(B_\tau, B_\sigma)^{-(p+1)}\, |k(x,y)|$

for $j > d$, so we can use

  $\sum_{j=1}^{2d} \|\partial_j^{p+1} k\|_{\infty, B_\tau \times B_\sigma} \le d\, (C_{as}(p+1,0) + C_{as}(0,p+1))\, \operatorname{dist}(B_\tau, B_\sigma)^{-(p+1)}\, \|k\|_{\infty, B_\tau \times B_\sigma}$

to get the desired result

  $\|k - \tilde k^{\tau\sigma}\|_{\infty, B_\tau \times B_\sigma} \le \dfrac{\sqrt2\, (C \log(p+1))^{2d}}{(p+1)!\; 2^{p/2}}\, \eta^{p+1}\, \operatorname{dist}(B_\tau, B_\sigma)^{p+1}\, \operatorname{dist}(B_\tau, B_\sigma)^{-(p+1)}\, d\, (C_{as}(0,p+1) + C_{as}(p+1,0))\, \|k\|_{\infty, B_\tau \times B_\sigma} = C_{apx}(p)\, \eta^{p+1}\, \|k\|_{\infty, B_\tau \times B_\sigma}.$

Remark 5.2 (Convergence) For typical kernels, an estimate of the type

  $C_{as}(n,m) \le c_1\, c_0^{n+m}\, (n+m)!$

holds, leading to

  $C_{apx}(p) \le \sqrt2\, (C \log(p+1))^{2d}\; 2 c_1 d\; c_0^{p+1}\; 2^{-p/2}.$

If $c_0 \eta < \sqrt2$, the approximation $\tilde k(\cdot,\cdot)$ will converge to $k(\cdot,\cdot)$ essentially like $(c_0 \eta / \sqrt2)^{p+1}$.

If the continuous operator is $W$-elliptic for some Hilbert space $W$ with $W_h := \operatorname{span}\{\varphi_i : i \in I\} \subseteq W$, then we can proceed as in [6, Lemma 3] to prove the estimate

  $\|u - \tilde u_h\|_W \le c \Big( \inf_{v_h \in W_h} \|u - v_h\|_W + C_{apx}(p)\, \eta^{p+1}\, \|\tilde K\|_{W_h \to W'}\, \|u_h\|_W \Big).$

6 Extensions

Our construction can be extended to other integral operators and other discretisation techniques by a simple modification of the definition of $V^\tau$:

6.1 Collocation

If we want to discretise by the collocation method, we have an additional family $(\xi_i)_{i \in I}$ of collocation points and the discrete operator is given by

  $K_{ij} := \int_{\Omega_j} k(\xi_i, y)\, \varphi_j(y)\, dy.$

Replacing once more $k(\cdot,\cdot)$ by $\tilde k^{\tau\sigma}(\cdot,\cdot)$, we find

  $\tilde K_{ij} := \int_{\Omega_j} \tilde k^{\tau\sigma}(\xi_i, y)\, \varphi_j(y)\, dy = \sum_{\nu \in I_\tau} \sum_{\mu \in I_\sigma} k(x_\nu^\tau, x_\mu^\sigma)\, p_\nu^\tau(\xi_i) \int_{\Omega_j} p_\mu^\sigma(y)\, \varphi_j(y)\, dy,$
so we can define $V^{\tau,\mathrm{coll}} \in \mathbb{R}^{\tau \times I_\tau}$ by setting

  $V^{\tau,\mathrm{coll}}_{i\nu} := p_\nu^\tau(\xi_i)$

and get

  $\tilde K_{ij} = \big(V^{\tau,\mathrm{coll}} S^{\tau\sigma} (V^\sigma)^T\big)_{ij}$,

i.e., the same result as in (3.4) with only $V^\tau$ replaced by $V^{\tau,\mathrm{coll}}$. As before, $V^{\tau,\mathrm{coll}}$ can be expressed in terms of the matrices $V^{\tau',\mathrm{coll}}$ corresponding to the sons $\tau'$ of $\tau$.

6.2 Double Layer Potential

If we want to discretise the double layer potential (using the Galerkin approach again), we have to compute

  $K_{ij} := \int_{\Omega_i} \int_{\Omega_j} \frac{\partial}{\partial n_y} k(x,y)\, \varphi_j(y)\, \varphi_i(x)\, dy\, dx.$

Replacing $k(\cdot,\cdot)$ by its approximation $\tilde k^{\tau\sigma}(\cdot,\cdot)$, we get

  $\tilde K_{ij} := \int_{\Omega_i} \int_{\Omega_j} \frac{\partial}{\partial n_y} \tilde k^{\tau\sigma}(x,y)\, \varphi_j(y)\, \varphi_i(x)\, dy\, dx = \sum_{\nu \in I_\tau} \sum_{\mu \in I_\sigma} k(x_\nu^\tau, x_\mu^\sigma) \int_{\Omega_i} p_\nu^\tau(x)\, \varphi_i(x)\, dx \int_{\Omega_j} \frac{\partial}{\partial n_y} p_\mu^\sigma(y)\, \varphi_j(y)\, dy,$

so by defining $V^{\sigma,\mathrm{DLP}} \in \mathbb{R}^{\sigma \times I_\sigma}$ as

  $V^{\sigma,\mathrm{DLP}}_{j\mu} := \int_{\Omega_j} \frac{\partial}{\partial n_y} p_\mu^\sigma(y)\, \varphi_j(y)\, dy,$

we once more get the representation

  $\tilde K_{ij} = \big(V^\tau S^{\tau\sigma} (V^{\sigma,\mathrm{DLP}})^T\big)_{ij}$

similar to (3.4).

This means that for the typical variants of the integral operator (single layer potential, double layer potential and its transpose, hypersingular operator) and for the typical discretisation methods (collocation and Galerkin), the matrices $S^{\tau\sigma}$ and $B_{\tau',\tau}$ stay the same; only the matrices $V^\tau$ corresponding to the leaves of the cluster tree have to be modified.

The construction leads directly to $\mathcal{H}^2$-matrices, so if tensor products of polynomials of degree up to $p$ are used for a kernel function defined in $\mathbb{R}^d$, the rank will be $k = p^d$ and the complexity of the matrix-vector multiplication will be $O(nk) = O(np^d)$.

6.3 Variable Order Approximation

In [9], a variation of the $\mathcal{H}^2$-technique is investigated that varies the polynomial order corresponding to the size of the clusters: For small clusters, a low-order approximation is sufficient, while larger clusters require a higher order. The same approach can be used in the context of our method: If the polynomial order corresponding to a son cluster $\tau'$ is lower than that of its father, equation (3.5) yields only an approximation

  $p_\nu^\tau(x) \approx \tilde p_\nu^\tau(x) := \sum_{\nu' \in I_{\tau'}} p_\nu^\tau(x_{\nu'}^{\tau'})\, p_{\nu'}^{\tau'}(x)$

of the possibly higher-order polynomial $p_\nu^\tau$ by lower-order polynomials $p_{\nu'}^{\tau'}$.
If the growth of the polynomial order is "slow enough", e.g., only polynomial with respect to the level of the clusters, then the storage requirements and the complexity of the discretisation and of the matrix-vector multiplication can be reduced to $O(n)$, i.e., to optimal complexity.
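To see why a level-dependent order yields linear complexity, one can count interpolation coefficients over the levels of a balanced binary tree. The sketch below — with our own function names — uses the linear-in-level schedule $p(\ell) = 1 + (\ell_{\max} - \ell)$ from the experiments of Section 7 and counts $\sum_\ell 2^\ell\, p(\ell)^d$ for $d = 1$; this sum stays proportional to the number of leaves, in contrast to the constant-order case.

```python
def order_schedule(depth, base=1, increment=1):
    # Variable-order rule as in Table 4: p(level) = base + increment*(depth - level),
    # i.e. small clusters (deep levels) get low order, large clusters high order.
    return {level: base + increment * (depth - level) for level in range(depth + 1)}

def total_coefficients(depth, d=1):
    # Number of interpolation coefficients over all 2^level clusters per level.
    p = order_schedule(depth)
    return sum((2 ** level) * (p[level] ** d) for level in range(depth + 1))
```

For $d = 1$ the sum $\sum_{\ell=0}^{L} 2^\ell (L - \ell + 1)$ is bounded by $4 \cdot 2^L$, so the total work per leaf stays bounded as the tree grows.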
7 Numerical Experiments

We applied our technique to a simple model problem: the discretisation of the single layer potential on the unit circle by a Galerkin approach using a regular grid of piecewise constant functions. The results are reported in Table 4.

Table 4: Unit circle, single layer potential, $\eta = 0.8$, $p(\ell) = 1 + (\ell_{\max} - \ell)$.

  n        Build/s   MVM/s   $\|\tilde A - A\|_2 / \|A\|_2$   Memory per DoF
  1024       1.05     0.02   0.000483583                      4147
  2048       2.30     0.06   0.00026483                       4605
  4096       4.97     0.14   0.000140073                      4929
  8192      10.2      0.31   0.0000725354                     5162
  16384     21.0      0.62   0.0000370742                     5324
  32768     40.96     1.22   0.0000187955                     5434
  65536     83.54     2.54                                    5507
  131072   167.79     5.16                                    5554
  262144   338.22    10.27                                    5584
  524288   675.67    20.8                                     5602

We can see the linear growth of the time for the construction of the matrix and for the matrix-vector multiplication. The relative approximation error appears to be proportional to the discretisation error. The memory requirements per degree of freedom are bounded.

A Multidimensional Interpolation Estimate

We denote the Chebyshev points in the interval $[-1,1]$ by $(\xi_i)_{i=0}^p$, i.e., we have

  $\xi_i = \cos\left(\dfrac{2i+1}{2p+2}\, \pi\right)$,

and the corresponding Lagrange polynomials by $(p_i)_{i=0}^p$. The one-dimensional interpolation operator is given by

  $\tilde I^p : C[-1,1] \to P_p$, $u \mapsto \sum_{i=0}^p u(\xi_i)\, p_i$.

This operator satisfies the error estimate

  $\|\tilde I^p u - u\|_{\infty,[-1,1]} \le \dfrac{2^{-p}}{(p+1)!}\, \|u^{(p+1)}\|_{\infty,[-1,1]}$  (A.1)

and the stability estimate

  $\|\tilde I^p u\|_{\infty,[-1,1]} \le C \log(p+1)\, \|u\|_{\infty,[-1,1]}$.  (A.2)

We are interested in a result for the $d$-dimensional tensor product interpolation operator $I^p_B$ on a box

  $B := \prod_{j=1}^d [a_j, b_j]$.

The interpolation error can be bounded as follows:

Lemma A.1 (Tensor product interpolation) We have

  $\|I^p_B u - u\|_{\infty,B}$
  $\le \dfrac{2^{-p}}{(p+1)!} \sum_{j=1}^d (C \log(p+1))^{j-1} \left(\dfrac{b_j - a_j}{2}\right)^{p+1} \|\partial_j^{p+1} u\|_{\infty,B} \le \dfrac{4^{-p}}{2(p+1)!}\, (C \log(p+1))^d\, \operatorname{diam}(B)^{p+1} \sum_{j=1}^d \|\partial_j^{p+1} u\|_{\infty,B}$  (A.3)

for $u \in C^{p+1}(B)$.

Proof. We denote by

  $\lambda_j : [-1,1] \to [a_j, b_j]$, $t \mapsto \tfrac12 \big((1+t) b_j + (1-t) a_j\big)$,

the transformation from the reference interval to $[a_j, b_j]$. For each $j \in \{1, \dots, d\}$, we introduce the operator

  $I^p_j : C(B) \to C(B)$, $u \mapsto \Big( x \mapsto \sum_{i=0}^p u(x_1, \dots, x_{j-1}, \lambda_j(\xi_i), x_{j+1}, \dots, x_d)\, p_i(\lambda_j^{-1}(x_j)) \Big)$,

interpolating functions in the $j$-th coordinate direction only. The estimates (A.1) and (A.2) can be applied to $I^p_j$ and take the form

  $\|I^p_j u - u\|_{\infty,B} \le \dfrac{2^{-p}}{(p+1)!} \left(\dfrac{b_j - a_j}{2}\right)^{p+1} \|\partial_j^{p+1} u\|_{\infty,B}$,  (A.4)

  $\|I^p_j u\|_{\infty,B} \le (C \log(p+1))\, \|u\|_{\infty,B}$.  (A.5)

The tensor product interpolation operator can be written as $I^p_B = I^p_1 \circ \cdots \circ I^p_d = \prod_{j=1}^d I^p_j$. Due to this product representation and the telescoping identity $\prod_{j=1}^d I^p_j u - u = \sum_{k=1}^d \big( \prod_{j=1}^{k-1} I^p_j \big) (I^p_k u - u)$, we can derive the following estimate:

  $\|I^p_B u - u\|_{\infty,B} = \Big\| \sum_{k=1}^d \Big( \prod_{j=1}^{k-1} I^p_j \Big) (I^p_k u - u) \Big\|_{\infty,B} \overset{\text{(A.5)}}{\le} \sum_{k=1}^d (C \log(p+1))^{k-1}\, \|I^p_k u - u\|_{\infty,B} \overset{\text{(A.4)}}{\le} \dfrac{2^{-p}}{(p+1)!} \sum_{k=1}^d (C \log(p+1))^{k-1} \left(\dfrac{b_k - a_k}{2}\right)^{p+1} \|\partial_k^{p+1} u\|_{\infty,B}.$

References

[1] W. Dahmen and R. Schneider: Wavelets on manifolds I: Construction and domain decomposition, SIAM J. Math. Anal. 31 (1999) 184–230.
[2] K. Giebermann: Multilevel approximation of boundary integral operators, Computing 67 (2001) 183–207.

[3] L. Grasedyck: Theorie und Anwendungen Hierarchischer Matrizen, PhD thesis, Universität Kiel, 2001.

[4] W. Hackbusch: A sparse matrix arithmetic based on H-matrices. Part I: Introduction to H-matrices, Computing 62 (1999) 89–108.

[5] W. Hackbusch and B. Khoromskij: A sparse matrix arithmetic based on H-matrices. Part II: Application to multi-dimensional problems, Computing 64 (2000) 21–47.

[6] W. Hackbusch and B. Khoromskij: A sparse H-matrix arithmetic: General complexity estimates, J. Comp. Appl. Math. 125 (2000) 479–501.

[7] W. Hackbusch, B. Khoromskij, and S. Sauter: On H²-matrices. In: Lectures on Applied Mathematics (H. Bungartz, R. Hoppe, and C. Zenger, eds.), 2000, pp. 9–29.

[8] W. Hackbusch and Z. P. Nowak: On the fast matrix multiplication in the boundary element method by panel clustering, Numerische Mathematik 54 (1989) 463–491.

[9] S. Sauter: Variable order panel clustering, Computing 64 (2000) 223–261.

[10] E. Tyrtyshnikov: Mosaic-skeleton approximations, Calcolo 33 (1996) 47–57.