Transactions on Modelling and Simulation vol 19, 1998 WIT Press, www.witpress.com, ISSN 1743-355X

Cost estimation of the panel clustering method applied to 3-D elastostatics

Ken Hayami* & Stefan A. Sauter†

* Department of Mathematical Engineering and Information Physics, Graduate School of Engineering, University of Tokyo, 113-8656 Tokyo, Japan
EMail: hayami@simplex.t.u-tokyo.ac.jp
† Lehrstuhl für Praktische Mathematik, Mathematisches Seminar Bereich II, Christian-Albrechts-Universität zu Kiel, D-24098 Kiel, Germany
EMail: sas@numerik.uni-kiel.de

Abstract

We present an efficient algorithm for the panel clustering method for the three-dimensional elastostatic problem, which reduces the memory and computational work of the boundary element method. To keep the polynomial expansions needed for the clustering simple, the fundamental solution for the displacement components is expressed as a linear combination of partial derivatives of the distance $r$ between the observation point and the field point. Comparing the estimates of the computational work shows that this approach is far more efficient than the standard expression.

1 Introduction

Although the Boundary Element Method (BEM) enjoys the advantage of boundary-only discretization, its dense matrix formulation causes a serious computational difficulty for large-scale three-dimensional problems: the conventional approach requires $O(N^2)$ memory and $O(N^3)$ computational work, where $N$ is the number of unknowns. The situation is even worse for the three-dimensional elastostatic problem, where the number of unknowns is three times that of the potential problem, since the unknown at each node is a vector instead of a scalar.

34 Boundary Element Research In Europe

Hackbusch and Nowak [1] proposed the panel clustering method in order to overcome this difficulty. The main idea is to approximate the far field using polynomial expansions around a centre of a cluster of panels (boundary elements), thus reducing the $O(N^2)$ dense matrix-vector multiplication to a sparse matrix-vector multiplication whose cost is $O(N)$ times a fixed power of $\log N$, for each iteration of the iterative linear solver.

The authors previously proposed a formulation of the panel clustering method for the three-dimensional elastostatic problem [2]. In this paper we present an efficient algorithm for calculating the expansion coefficients for the panel clustering, which is the most time- and memory-consuming part of the method, and we estimate the computational costs.

2 The boundary element formulation of 3-D elastostatics

The boundary integral equation for the three-dimensional (linear, isotropic) elastostatic problem is given by

$$\int_\Gamma u^*_{lk}(x,y)\, p_k(y)\, d\Gamma(y) = \int_\Gamma p^*_{lk}(x,y)\, u_k(y)\, d\Gamma(y) + c_{lk}(x)\, u_k(x), \quad (l = 1,2,3), \tag{1}$$

where we have used Einstein's convention for the summation over the repeated index $k = 1,2,3$, $\Gamma$ is the boundary of the domain under consideration, $u_k$, $p_k$ are the displacement and traction components, respectively, $c_{lk}(x) = \frac{1}{2}\delta_{lk}$ when $\Gamma$ is smooth at $x$, and the body force term has been neglected.

$u^*_{lk}(x,y)$ is the fundamental displacement, which is usually given by

$$u^*_{lk}(x,y) = \frac{1}{16\pi\mu(1-\nu)\,r}\left\{(3-4\nu)\,\delta_{lk} + r_{,l}\, r_{,k}\right\}, \tag{2}$$

where $\mu$ is the shear modulus, $\nu$ is Poisson's ratio, $r := |x - y|$, $r_{,k} := \partial r/\partial y_k$, $x = (x_1,x_2,x_3)^T$ and $y = (y_1,y_2,y_3)^T$. However, it will prove useful [2] to use the alternative expression:

$$u^*_{lk}(x,y) = \frac{1}{8\pi\mu}\left\{ r_{,mm}\,\delta_{lk} - \frac{1}{2(1-\nu)}\, r_{,lk} \right\}. \tag{3}$$
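The equivalence of the standard form of the fundamental displacement and the form built from second derivatives of $r$ can be checked numerically. The sketch below is only an illustration, not the authors' code: the function names are hypothetical, and it relies on the closed forms $r_{,k} = (y_k - x_k)/r$, $r_{,lk} = (\delta_{lk} - r_{,l} r_{,k})/r$ and $r_{,mm} = 2/r$, which hold for $r = |x - y|$ in three dimensions.

```python
import math


def _r_and_gradient(x, y):
    """Distance r = |y - x| and its gradient r_{,k} = dr/dy_k."""
    d = [y[i] - x[i] for i in range(3)]
    r = math.sqrt(sum(c * c for c in d))
    return r, [c / r for c in d]


def kelvin_standard(x, y, mu, nu):
    """u*_lk from the standard (Kelvin) form: products of first derivatives."""
    r, dr = _r_and_gradient(x, y)
    pref = 1.0 / (16.0 * math.pi * mu * (1.0 - nu) * r)
    return [[pref * ((3.0 - 4.0 * nu) * (l == k) + dr[l] * dr[k])
             for k in range(3)] for l in range(3)]


def kelvin_via_r_derivatives(x, y, mu, nu):
    """u*_lk as a linear combination of second derivatives of r."""
    r, dr = _r_and_gradient(x, y)
    # Hessian of r: r_{,lk} = (delta_lk - r_{,l} r_{,k}) / r
    hess = [[((l == k) - dr[l] * dr[k]) / r for k in range(3)]
            for l in range(3)]
    laplacian = hess[0][0] + hess[1][1] + hess[2][2]  # r_{,mm} = 2 / r
    pref = 1.0 / (8.0 * math.pi * mu)
    return [[pref * (laplacian * (l == k) - hess[l][k] / (2.0 * (1.0 - nu)))
             for k in range(3)] for l in range(3)]


def max_difference(x, y, mu, nu):
    """Largest entrywise gap between the two expressions."""
    a = kelvin_standard(x, y, mu, nu)
    b = kelvin_via_r_derivatives(x, y, mu, nu)
    return max(abs(a[l][k] - b[l][k]) for l in range(3) for k in range(3))
```

Calling `max_difference((0.0, 0.0, 0.0), (1.0, 2.0, 3.0), 1.0, 0.3)` should return a value at rounding-error level. The practical difference between the two forms is that the second is linear in derivatives of $r$, so its expansion needs no products of polynomials.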

The fundamental traction component is given by

$$p^*_{lk}(x,y) = \sigma^*_{lkj}\, n_j = \left\{ \lambda\, u^*_{lm,m}\, \delta_{kj} + \mu\,(u^*_{lk,j} + u^*_{lj,k}) \right\} n_j, \tag{4}$$

where $n_j$ is the component of the unit outward normal vector at $y \in \Gamma$.

Next, the boundary $\Gamma$ is discretized into boundary elements or panels $\pi_\alpha$, $(\alpha = 1,\ldots,n)$. Although the following discussion is also valid for higher order elements, assume constant elements, and let $x^\alpha$ be the point representing $\pi_\alpha$, to obtain

$$\sum_{\beta=1}^{n} p_k^{\beta} \int_{\pi_\beta} u^*_{lk}(x^\alpha, y)\, d\Gamma(y) = \sum_{\beta=1}^{n} u_k^{\beta} \int_{\pi_\beta} p^*_{lk}(x^\alpha, y)\, d\Gamma(y) + c_{lk}(x^\alpha)\, u_k^{\alpha}, \quad (\alpha = 1,\ldots,n;\ l = 1,2,3), \tag{5}$$

where $u_k^\alpha = u_k(x^\alpha)$ etc.

Given the boundary data, equation (5) is a system of linear equations for the unknown boundary displacement and traction components $u_k^\alpha$ and $p_k^\alpha$, where the matrix is dense and nonsymmetric. Hence, if it is solved using LU decomposition, the computational work is $O(N^3)$ and the memory $O(N^2)$, where $N = 3n$ is the number of unknowns.

Alternatively, one could use iterative solvers for nonsymmetric matrices, such as the GMRES method. Since the system of equations arising from boundary integral equations is usually well conditioned, the method should converge within $M < N$ iterations, with the use of a suitable preconditioner when necessary. In this method, the dominant part of the computation is the dense matrix-vector multiplication corresponding to equation (5) for the unknown boundary data or iteration vector, which costs $O(N^2)$ for each iteration. Hence, the amount of computational work is reduced from $O(N^3)$ to $O(MN^2)$, but the memory required is still $O(N^2)$, and it is this memory bottleneck that hinders the solution of large-scale problems using the boundary element method.

3 The panel clustering method

The matrix-vector product for the iterative solution of equation (5) is dense because the observation point $x^\alpha$ on the element $\Gamma_\alpha$ is related to all the elements $\Gamma_\beta$ on the boundary through the kernels $p^*_{lk}(x^\alpha,y)$ and $u^*_{lk}(x^\alpha,y)$.
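For a sense of scale of this dense coupling, the following sketch evaluates the storage and the operation count of one dense matrix-vector product for $N = 3n$ unknowns. It is an illustration only: the function name is hypothetical, and the 8 bytes per matrix entry (double precision) is an assumption, not a figure from the paper.

```python
def dense_bem_costs(n_elements, bytes_per_entry=8):
    """Storage and per-product work for the dense system with N = 3n unknowns.

    bytes_per_entry=8 assumes double-precision storage (an assumption).
    """
    n_unknowns = 3 * n_elements          # three displacement components per node
    entries = n_unknowns ** 2            # dense N x N matrix
    return {
        "unknowns": n_unknowns,
        "matrix_bytes": entries * bytes_per_entry,
        # each of the N rows needs N multiplications and N - 1 additions
        "matvec_ops": 2 * entries - n_unknowns,
    }
```

With `n_elements = 10_000` the matrix alone occupies 7.2e9 bytes under these assumptions, which illustrates why the $O(N^2)$ memory, rather than the work per iteration, is the first limit reached.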

The panel clustering method [1] was proposed to reduce the required memory and computational work for computing such matrix-vector products. The method makes use of polynomial expansions to approximate the integral kernels for clusters of elements which are sufficiently far from the observation point, thus reducing the amount of computation and required memory.

Let

$$\int_\Gamma k(x,y)\, u(y)\, d\Gamma(y) \tag{6}$$

represent the integrals in equation (1). In the far field (i.e. when $|x - y|$ is sufficiently large) $k(x,y)$ is approximated by a piecewise polynomial $k_m(x,y)$ with respect to $y$, which may be based on the Taylor expansion around a centre $y_\tau \in \mathbb{R}^3$ expressed as global polynomials of $y$:

$$k_m(x,y) = \sum_{i \in I_m} \kappa_i(x; y_\tau)\, \Phi_i(y), \tag{7}$$

where $I_m$ is an index set of size $\# I_m \le C_I m^3$. The integer $m$ is the order of the expansion. The functions $\Phi_i(y)$ must be independent of $x$ and the centre $y_\tau$. Furthermore, integrals of $\Phi_i(y)$ over single boundary elements $\Gamma_\beta$ must be easily computable. Usually, $\Phi_i(y)$ is a (piecewise) polynomial over each element $\Gamma_\beta$.

The error of the expansion of equation (7) depends on the order $m$ and on the distance $|y - y_\tau|$ from the centre, i.e.

$$|k(x,y) - k_m(x,y)| \le C_1 (C_2 \eta)^m\, |k(x,y)| \quad \text{for all } |y - y_\tau| \le \eta\, |x - y_\tau|,$$

where $0 < \eta < 1$.

Next, "clusters" $\tau$, which are unions of several boundary elements $\pi_\beta$, are introduced. For each observation point $x$, the boundary $\Gamma$ can be represented by a union of a certain number of elements and clusters:

$$\Gamma = \pi_1 \cup \pi_2 \cup \ldots \cup \pi_p \cup \tau_1 \cup \tau_2 \cup \ldots \cup \tau_c \quad (\pi_i: \text{elements},\ \tau_j: \text{clusters}). \tag{8}$$

Since the clusters $\tau_j$ are unions of many elements, the sum $p + c$ can be considerably smaller ($O(\log n)$) than the number of elements $n$. The integral of equation (6) can now be expressed as

$$\int_\Gamma k(x,y)\, u(y)\, d\Gamma(y) = \sum_{i=1}^{p} \int_{\pi_i} k(x,y)\, u(y)\, d\Gamma(y) + \sum_{j=1}^{c} \int_{\tau_j} k(x,y)\, u(y)\, d\Gamma(y). \tag{9}$$

The first term, corresponding to the "near field" $\pi_1 \cup \pi_2 \cup \ldots \cup \pi_p$, will be evaluated directly. The clusters $\tau_j$ correspond to the "far field", where the expansion (7) around a centre $y_\tau = y_{\tau_j}$ of $\tau_j$ can be exploited, i.e. we can approximate the integral over $\tau_j$ by replacing $k$ by $k_m$ to obtain

$$\int_{\tau_j} k(x,y)\, u(y)\, d\Gamma(y) \approx \int_{\tau_j} k_m(x,y)\, u(y)\, d\Gamma(y) = \sum_{i \in I_m} \kappa_i(x; y_{\tau_j})\, J^i_{\tau_j}(u).$$

Since the quantities

$$J^i_{\tau_j}(u) := \int_{\tau_j} \Phi_i(y)\, u(y)\, d\Gamma(y), \quad (j = 1,\ldots,c)$$

are independent of $x$, they will be computed in the first phase for all indices $i$ and clusters $\tau$. Then the evaluation of equation (9) can be performed for all element nodes $x^\alpha$, $1 \le \alpha \le n$, by

$$\int_{\tau_j} k_m(x^\alpha,y)\, u(y)\, d\Gamma(y) = \sum_{i \in I_m} \kappa_i(x^\alpha; y_{\tau_j})\, J^i_{\tau_j}(u),$$

which has $\# I_m = O(m^3)$ terms independent of the size of the cluster $\tau_j$, and $J^i_{\tau_j}(u)$ can be shared among different $x^\alpha$.

By taking a hierarchy of clusters, the number of all possible clusters (consisting of more than one element) for all $x^\alpha$, $1 \le \alpha \le n$, can be kept under the number of elements $n$ [1].

The error of approximating the integral $\int_{\tau_j} k(x,y)\, u(y)\, d\Gamma(y)$ can be controlled by keeping the size of the clusters sufficiently small compared to the distance from $x$. A partition of $\Gamma$ by equation (8) is called "admissible" with respect to $x$ when the clusters are sufficiently small in this sense. The admissible covering with the smallest number of members is used.

4 Computation of the expansion coefficients

The computation of the expansion coefficients $\kappa_i(x; y_\tau)$ in (7) is the most time-consuming part of the panel clustering method. Let $k_m(x,y)$ be the Taylor expansion of $k(x,y)$ around $y_\tau$ to the $m$-th order, i.e.

$$k_m(x,y) = \sum_{|\nu| < m} \kappa_\nu(x, y_\tau)\, y^\nu,$$

where $\nu = (\nu_1,\nu_2,\nu_3) \in \mathbb{N}_0^3$ is a multi-index, $|\nu| := \nu_1 + \nu_2 + \nu_3$, $\mathbb{N}_0 := \{0\} \cup \mathbb{N}$, and $z^\nu := z_1^{\nu_1} z_2^{\nu_2} z_3^{\nu_3}$ for $z = (z_1,z_2,z_3)^T$.

Sauter [3] proposed an efficient algorithm for calculating $\kappa_\nu(x, y_\tau)$ when $k(x,y)$ is a power of the distance $\|x - y\|$, with the exponent parametrised by $\gamma$, which is given in the following. Let $[x]$ denote the integer part of $x \in \mathbb{R}$ [...].

Algorithm 1

a) Compute $s_l(y_\tau)$ for $0 \le l < m$ and $u_i^{\nu_i}$ for $0 \le \nu_i < m$, $1 \le i \le 3$.

b) Compute for all $\nu_3, l \in \mathbb{N}_0$; $\nu_3 + l < m$: [...]

c) Compute $D_l^{\nu_2,\nu_3}$ for $\nu_2, \nu_3, l \in \mathbb{N}_0$; $\nu_2 + \nu_3 + l < m$: [...]

d) Compute $D^\nu$ for all $|\nu| < m$: [...]

e) Compute the coefficients $\kappa_\nu(x, y_\tau)$ using the following recursion:

1. For $|\nu| = m - 1, \ldots, 0$, compute [...] where $e_i$ is the unit vector such that $(e_i)_j = \delta_{ij}$.

2. Compute for all $\nu \in \mathbb{N}_0^3$, $|\nu| < m$: [...]

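The total computational work for computing $\kappa_\nu(x, y_\tau)$, $|\nu| < m$, for fixed $x, y_\tau$ is

$$W_m = \left( \tfrac{5}{12} m^4 + \tfrac{25}{12} m^3 + \tfrac{43}{12} m^2 + \tfrac{71}{12} m \right) \text{eo} + 1\ \text{neo}, \tag{10}$$

where eo stands for an elementary operation such as $\pm, \times, \div$, and neo stands for a non-elementary operation (namely square root). The memory required for $\kappa_\nu(x, y_\tau)$, $|\nu| < m$, is

$$N_m = \#\{\nu : |\nu| < m\} = \tfrac{1}{6}(m^3 + 3m^2 + 2m).$$

Next, we will make use of the expression (3) and the preceding algorithm to obtain polynomial expansions for the three-dimensional elastostatic kernels.

First, consider approximating the fundamental displacement $u^*_{lk}(x,y)$ for a cluster $\tau$ of elements by the polynomial expansion obtained by the $m$-th order Taylor expansion around $y = y_\tau$, i.e.

$$u^*_{lk}(x,y) \approx \sum_{|\nu| < m} \kappa^{lk}_\nu(x, y_\tau)\, y^\nu.$$

Then, the integral over the cluster $\tau$ is approximated by

$$\int_\tau u^*_{lk}(x,y)\, p_k(y)\, d\Gamma(y) \approx \sum_{|\nu| < m} \kappa^{lk}_\nu(x, y_\tau) \int_\tau y^\nu\, p_k(y)\, d\Gamma(y) \approx \sum_{|\nu| < m} \kappa^{lk}_\nu(x, y_\tau) \sum_{\beta=1}^{p} p_k^\beta\, J^\nu_\beta, \tag{11}$$

where the last approximation is valid for constant elements, where $p_k(y) \approx p_k^\beta$, $1 \le k \le 3$, for $y \in \pi_\beta$, and the far field coefficients are given by

$$J^\nu_\beta := \int_{\pi_\beta} y^\nu\, d\Gamma(y). \tag{12}$$

Let $f_p$ denote the $p$-th order (Taylor) expansion of $f$. Then, we have

Lemma 1

$$(r_{,lk})_m = \sum_{|\nu| < m} (\nu_l + 1)(\nu_k + 1)\, \kappa_{\nu + e_l + e_k}(x, y_\tau)\, y^\nu \quad (l \ne k),$$

$$(r_{,\underline{l}\,\underline{l}})_m = \sum_{|\nu| < m} (\nu_l + 1)(\nu_l + 2)\, \kappa_{\nu + 2e_l}(x, y_\tau)\, y^\nu.$$

Here the underline denotes that Einstein's convention for repeated indices does not apply. $\kappa_\nu$ can be obtained by Algorithm 1 with $\gamma = -1$.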
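The coefficient bookkeeping of this section can be checked mechanically. The sketch below is an illustration under stated assumptions, not the authors' algorithm: it stores a polynomial in $y$ as a dictionary from multi-indices $\nu$ to coefficients, confirms the count $N_m = \#\{\nu : |\nu| < m\}$, and verifies that differentiating twice shifts coefficients by the factors $(\nu_l+1)(\nu_k+1)$ for $l \ne k$ and $(\nu_l+1)(\nu_l+2)$ for $l = k$. All function names are hypothetical.

```python
from itertools import product


def multi_indices_below(m):
    """All multi-indices nu in N_0^3 with |nu| = nu_1 + nu_2 + nu_3 < m."""
    return [nu for nu in product(range(m), repeat=3) if sum(nu) < m]


def n_coeffs(m):
    """Closed form N_m = (m^3 + 3 m^2 + 2 m) / 6."""
    return (m ** 3 + 3 * m ** 2 + 2 * m) // 6


def differentiate(poly, axis):
    """Term-by-term derivative of sum poly[nu] * y^nu with respect to y_axis."""
    out = {}
    for nu, coeff in poly.items():
        if nu[axis] == 0:
            continue  # the term does not depend on y_axis
        lowered = list(nu)
        lowered[axis] -= 1
        key = tuple(lowered)
        out[key] = out.get(key, 0) + nu[axis] * coeff
    return out


def shifted_coeffs(kappa, l, k, m):
    """Second-derivative coefficients via the index-shift rule, for |nu| < m.

    kappa must hold coefficients for all |nu| < m + 2.
    """
    out = {}
    for nu in multi_indices_below(m):
        raised = list(nu)
        raised[l] += 1
        raised[k] += 1
        if l != k:
            factor = (nu[l] + 1) * (nu[k] + 1)
        else:
            factor = (nu[l] + 1) * (nu[l] + 2)
        out[nu] = factor * kappa[tuple(raised)]
    return out
```

Because the shift reaches two orders higher, coefficients of $r$ up to $|\nu| < m + 2$ are needed to expand its second derivatives to order $m$, and one order more again for each further derivative.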

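Thus, (3) gives for $l \ne k$,

$$\kappa^{lk}_\nu(x, y_\tau) = -\frac{1}{16\pi\mu(1-\nu)}\, (\nu_l + 1)(\nu_k + 1)\, \kappa_{\nu + e_l + e_k}(x, y_\tau) \tag{13}$$

and for $l = k$,

$$\kappa^{\underline{l}\,\underline{l}}_\nu(x, y_\tau) = \frac{1}{8\pi\mu} \sum_{i=1}^{3} a_{li}\, (\nu_i + 1)(\nu_i + 2)\, \kappa_{\nu + 2e_i}(x, y_\tau), \tag{14}$$

where $a_{li} := 1 - \delta_{li}/\{2(1-\nu)\}$. (In the factor $1 - \nu$, $\nu$ is Poisson's ratio; $\nu_l$, $\nu_k$, $\nu_i$ are components of the multi-index.) Note that it is only necessary to compute and store $\kappa^{lk}_\nu$ for $1 \le l \le k \le 3$, due to symmetry, so that the memory required for $|\nu| < m$ is $6N_m = m^3 + 3m^2 + 2m$.

As for the fundamental traction $p^*_{lk}$, it is a function not only of spatial derivatives of $r$ but also of the unit outward normal $n_j(y)$ to the boundary $\Gamma$, which is generally only piecewise polynomial (cf. (4)), so that it is not possible to obtain a global expansion for $p^*_{lk}(x,y)$. Instead, we will expand parts of $p^*_{lk}$ by global polynomials, as follows. From (4), we have

$$\int_\tau p^*_{lk}(x,y)\, u_k(y)\, d\Gamma(y) \approx \sum_{|\nu| < m} \left\{ \kappa^{\lambda,l}_\nu(x, y_\tau) \sum_{\beta=1}^{p} J^{\nu,k}_\beta\, u_k^\beta + \kappa^{\mu,lkj}_\nu(x, y_\tau) \sum_{\beta=1}^{p} J^{\nu,j}_\beta\, u_k^\beta \right\}, \tag{15}$$

where $\kappa^{\lambda,l}_\nu$ and $\kappa^{\mu,lkj}_\nu$ denote the expansion coefficients of $\lambda u^*_{lm,m}$ and $\mu(u^*_{lk,j} + u^*_{lj,k})$, respectively, the far field coefficients are given by $J^{\nu,j}_\beta := J^\nu_\beta\, n_j^\beta$, and constant elements, $n_j(y) \approx n_j^\beta$, $u_k(y) \approx u_k^\beta$ for $y \in \pi_\beta$, are used. (Higher order elements can be treated similarly.)

The expansion coefficients for $u^*_{lk,j}$ can be computed using

Lemma 2 [...]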

5 Estimates of computational costs

First, the computational work to obtain the expansion coefficients will be estimated. According to (10), the work to obtain $\kappa_\nu$ for $|\nu| < m + 3$ is

$$W_{m+3} = \left( \tfrac{5}{12} m^4 + \tfrac{85}{12} m^3 + \tfrac{269}{6} m^2 + \tfrac{386}{3} m + 140 \right) \text{eo} + 1\ \text{neo}.$$

Next, $\kappa^{lk}_\nu$ for $1 \le l \le k \le 3$ and $|\nu| < m + 1$ is obtained using (13) and (14) with work $14(m^3 + 6m^2 + 11m + 6)$ eo. Finally, the expansion coefficients of $\lambda u^*_{lm,m}$, $1 \le l \le 3$, and of $\mu(u^*_{lk,j} + u^*_{lj,k})$, $1 \le l \le 3$, $1 \le k \le j \le 3$, for $|\nu| < m$ are computed using Lemma 2 with work $\left( \tfrac{27}{2} m^3 + \tfrac{81}{2} m^2 + 27m \right)$ eo. Hence, the total work to compute the necessary expansion coefficients is

$$W = \left( \tfrac{5}{12} m^4 + \tfrac{415}{12} m^3 + \tfrac{508}{3} m^2 + \tfrac{929}{3} m + 224 \right) \text{eo} + 1\ \text{neo}.$$

The memory required to store the expansion coefficients for $|\nu| < m$ for $u^*_{lk}$, $1 \le l \le k \le 3$, is $6N_m$, for $\lambda u^*_{lm,m}$, $1 \le l \le 3$, is $3N_m$, and for $\mu(u^*_{lk,j} + u^*_{lj,k})$, $1 \le l \le 3$, $1 \le j \le k \le 3$, is $18N_m$, which gives a total of $27N_m$, where $N_m = \#\{\nu : |\nu| < m\} = \tfrac{1}{6}(m^3 + 3m^2 + 2m)$.

On the other hand, if we had used the standard expression (2) for $u^*_{lk}$, the work estimate would have been a polynomial of degree six in $m$ (in eo, plus 1 neo), which is greater than the $W$ above for $m > 2$. The standard expression requires $O(m^6)$ work because it involves multiplication of polynomials with $O(m^3)$ terms each, such as $(r_{,l})_m$ and $(r_{,k})_m$, which requires $O(m^6)$ work if done in the straightforward way. The required memory using the standard expression is the same as above.

Since the number of admissible clusters per observation point is $O(\log n)$ and the order of expansion $m$ necessary for an approximation consistent with the discretization error is $O(\log n)$, where $n$ is the number of boundary elements (observation points) [1], the total work necessary for the computation of the expansion coefficients is $O(n \log n \cdot m^4) = O(n \log^5 n)$. The memory required is $O(n \log n \cdot m^3) = O(n \log^4 n)$.
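The work estimates of this section can be cross-checked with exact rational arithmetic. The sketch below is an illustration, not part of the original analysis: the function names are hypothetical, and the polynomial coefficients are those of (10) and of the estimates above as read here. It confirms that the total is the sum of the three contributions, with Sauter's algorithm run to order $m + 3$.

```python
from fractions import Fraction as F


def work_sauter(m):
    """Elementary operations of (10) for the coefficients with |nu| < m."""
    return F(5, 12) * m**4 + F(25, 12) * m**3 + F(43, 12) * m**2 + F(71, 12) * m


def work_displacement(m):
    """Work for the displacement coefficients, |nu| < m + 1, via (13), (14)."""
    return 14 * (m**3 + 6 * m**2 + 11 * m + 6)


def work_traction(m):
    """Work for the traction-related coefficients, |nu| < m, via Lemma 2."""
    return F(27, 2) * m**3 + F(81, 2) * m**2 + 27 * m


def work_total(m):
    """Stated total work in elementary operations (the 1 neo is not included)."""
    return (F(5, 12) * m**4 + F(415, 12) * m**3 + F(508, 3) * m**2
            + F(929, 3) * m + 224)


def n_coeffs(m):
    """N_m = #{nu : |nu| < m} = (m^3 + 3 m^2 + 2 m) / 6."""
    return F(m**3 + 3 * m**2 + 2 * m, 6)


def memory_total(m):
    """6 N_m + 3 N_m + 18 N_m stored coefficients per cluster and point."""
    return 27 * n_coeffs(m)


def totals_are_consistent(max_order=20):
    """True if the total equals the sum of its three parts for all tested m."""
    return all(
        work_total(m) == work_sauter(m + 3) + work_displacement(m) + work_traction(m)
        for m in range(max_order + 1)
    )
```

The leading term $\tfrac{5}{12} m^4$ comes entirely from Sauter's algorithm; the two conversion steps are cubic in $m$, which is why the total stays $O(m^4)$ per cluster and observation point.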

Next, the work for the computation of the far field coefficients of (12) etc. is $n$ times a fixed power of $m$, i.e. $n$ times a power of $\log n$ [1], and the same holds for the memory.

As for the work for each iteration, assuming that the number of elements in a cluster is $p$, the work per iteration for computing $\int_\tau u^*_{lk}(x,y)\, p_k(y)\, d\Gamma(y)$ using (11) is $3\{(6p + 3)N_m - 1\}$ eo and that for computing $\int_\tau p^*_{lk}(x,y)\, u_k(y)\, d\Gamma(y)$ using (15) is $3\{(24p + 10)N_m - 1\}$ eo, so that the work for a matrix-vector multiplication is again $n$ times a power of $m = O(\log n)$, and the work and memory for other parts of the iteration, such as in GMRES, are $O(n)$.

Hence, the dominant part of the whole algorithm is the computation of the expansion coefficients, which consumes $O(n \log^5 n)$ work and $O(n \log^4 n)$ memory.

6 Conclusions

We presented an algorithm for applying the panel clustering method to the three-dimensional elastostatic problem and estimated its computational complexity. It was shown that the computation of the expansion coefficients is the dominant part of the algorithm and that expressing the fundamental displacement as a linear combination (3) of the second order spatial derivatives of the distance between the observation point and the field point substantially reduces the computational costs.

References

[1] Hackbusch, W. and Nowak, Z.P., On the fast matrix multiplication in the boundary element method by panel clustering, Numerische Mathematik, Vol. 54, pp. 463-491, 1989.
[2] Hayami, K. and Sauter, S., Application of the panel clustering method to the three-dimensional elastostatic problem, Boundary Elements XIX, Computational Mechanics Publications, pp. 625-634, 1997.
[3] Sauter, S., Der Aufwand der Panel-Clustering-Methode für Integralgleichungen, Bericht Nr. 9115, Institut für Informatik und Praktische Mathematik, Christian-Albrechts-Universität Kiel, 1991.