
BDDC by a frontal solver and the stress computation in a hip joint replacement

arXiv:0802.4295v1 [math.NA] 28 Feb 2008

Jakub Šístek (a), Jaroslav Novotný (b), Jan Mandel (c), Marta Čertíková (a), Pavel Burda (a)

(a) Department of Mathematics, Faculty of Mechanical Engineering, Czech Technical University in Prague
(b) Institute of Thermomechanics, Academy of Sciences of the Czech Republic
(c) Department of Mathematical Sciences, University of Colorado Denver

Email addresses: jakub.sistek@fs.cvut.cz (Jakub Šístek), novotny@it.cas.cz (Jaroslav Novotný), jan.mandel@cudenver.edu (Jan Mandel), marta.certikova@fs.cvut.cz (Marta Čertíková), pavel.burda@fs.cvut.cz (Pavel Burda).

Preprint submitted to Elsevier 6 January 2018

Abstract

A parallel implementation of the BDDC method using the frontal solver is employed to solve systems of linear equations from finite element analysis, and incorporated into a standard finite element system for engineering analysis by linear elasticity. Results of computation of stress in a hip replacement are presented. The part is made of titanium and loaded by the weight of the human body.

Key words: domain decomposition, iterative substructuring, finite elements, linear elasticity, parallel algorithms

1 Introduction

Parallel numerical solution of linear problems arising from linearized isotropic elasticity discretized by finite elements is important in many areas of engineering. The matrix of the system is typically large, sparse, and ill-conditioned. The classical frontal solver [4] has become a popular direct method for solving problems with such matrices arising from finite element analyses. However, for large problems, the computational cost of direct solvers makes them less competitive compared to iterative methods, such as the preconditioned conjugate gradients (PCG). The goal is then to design efficient preconditioners that result in a lower overall cost and can be implemented in parallel, which has given rise to the field of domain decomposition and iterative substructuring [10].

The Balancing Domain Decomposition based on Constraints (BDDC) [3] is one of the most advanced preconditioners of this class. However, the additional custom coding effort required can be an obstacle to the use of the method in an existing finite element code. We propose an implementation of BDDC built on top of common components of existing finite element codes, namely the frontal solver and the element stiffness matrix generation. The implementation requires only a minimal amount of additional code and it is therefore of interest. For an important alternative implementation of BDDC, see [5].

The frontal solver was used to implement a limited variant of BDDC in [1,9]. The implementation takes advantage of the existing integration of the frontal solver into the finite element methodology and of its implementation of constraints, which is well-suited for BDDC. However, the frontal solver naturally treats only point constraints, while an efficient BDDC method in three dimensions requires constraints on averages [6]. In this paper, we extend the implementation to constraints on averages and apply it to a problem in biomechanics.

2 Mathematical formulation of BDDC

Consider the problem in a variational form

$$a(u, v) = \langle f, v \rangle \quad \forall v \in V, \qquad (1)$$

where $V$ is a finite element space of $\mathbb{R}^3$-valued piecewise polynomial functions $v$ continuous on a given domain $\Omega \subset \mathbb{R}^3$, satisfying homogeneous Dirichlet boundary conditions, and

$$a(u, v) = \int_\Omega \left( \lambda \, \mathrm{div}\, u \, \mathrm{div}\, v + \tfrac{1}{2} \mu \, (\nabla u + \nabla^T u) : (\nabla v + \nabla^T v) \right). \qquad (2)$$

The solution $u \in V$ represents the vector field of displacement. It is known that $a(u, v)$ is a symmetric positive definite bilinear form on $V$.
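As a short aside, the form (2) can be rewritten in the notation common in elasticity texts. The symbol $\varepsilon$ for the linearized strain tensor is an assumption introduced here for illustration and does not appear in the original; $\lambda$ and $\mu$ are read as the Lamé parameters of the material:

$$\varepsilon(u) = \tfrac{1}{2}\,(\nabla u + \nabla^T u), \qquad a(u, v) = \int_\Omega \left( \lambda \, \mathrm{div}\, u \, \mathrm{div}\, v + 2 \mu \, \varepsilon(u) : \varepsilon(v) \right).$$

The factor $2\mu$ follows from $(\nabla u + \nabla^T u) : (\nabla v + \nabla^T v) = 4\, \varepsilon(u) : \varepsilon(v)$ combined with the factor $\tfrac{1}{2}\mu$ in (2). Symmetry of $a(\cdot,\cdot)$ is visible directly from this form.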
An equivalent formulation of (1) is to find a solution $u$ to a linear system $Au = f$, where $A = (a_{ij})$ is the stiffness matrix computed as $a_{ij} = a(\varphi_i, \varphi_j)$, and $\{\varphi_i\}$ is a finite element basis of $V$ corresponding to the set of unknowns, also called degrees of freedom, defined as values of displacement at the nodes of a given triangulation of the domain. The domain $\Omega$ is decomposed into nonoverlapping subdomains $\Omega_i$, $i = 1, \dots, N$, also called substructures. Unknowns common to at least two subdomains are called boundary unknowns and the union of all boundary unknowns is called the interface $\Gamma$.

The first step is the reduction of the problem to the interface. The space $V$ is decomposed as the $a$-orthogonal direct sum $V = V_1 \oplus \dots \oplus V_N \oplus V_\Gamma$, where $V_i$ is the space of all functions from $V$ with nonzero values only inside $\Omega_i$ (in particular, they are zero on $\Gamma$), and $V_\Gamma$ is the $a$-orthogonal complement of all spaces $V_i$;

$$V_\Gamma = \{ v \in V : a(v, w) = 0 \quad \forall w \in V_i, \ i = 1, \dots, N \}.$$

Functions from $V_\Gamma$ are fully determined by their values at unknowns on $\Gamma$ and the discrete harmonic condition that they have minimal energy on every subdomain. They are represented in the computation by their values on the interface $\Gamma$. The solution satisfies $u = u_\Gamma + \sum_{i=1}^{N} u_i$, where $u_i$ are solutions of the local problems $A_i u_i = f_i$ on every $\Omega_i$, with zero Dirichlet boundary condition on the interface $\Gamma$, and $u_\Gamma$ solves the Schur complement problem in the space of discrete harmonic functions $V_\Gamma$,

$$S u_\Gamma = f_\Gamma.$$

Once the solution $u_\Gamma$ on the interface $\Gamma$ is found, the solutions $u_i$ in the interiors of subdomains are computed by prescribing the solution on $\Gamma$ as a Dirichlet boundary condition. See [10] for more details.

The BDDC method is a particular kind of preconditioner for the reduced problem $S u_\Gamma = f_\Gamma$. The main idea of the BDDC preconditioner in an abstract form [8] is to construct an auxiliary finite dimensional space $W$ such that $V_\Gamma \subset W$ and extend the bilinear form $a(\cdot, \cdot)$ to a form $\tilde a(\cdot, \cdot)$ defined on $W \times W$ and such that solving the variational problem (1) with $\tilde a(\cdot, \cdot)$ in place of $a(\cdot, \cdot)$ is cheaper and can be split into independent computations done in parallel. Then the solution restricted to $V_\Gamma$ is used for the preconditioning of $S$. More precisely, let $E : W \to V_\Gamma$ be a given projection of $W$ onto $V_\Gamma$, and $r = f_\Gamma - S u_\Gamma$ the residual. Then the output of the BDDC preconditioner is $v = Ew$, where

$$w \in W : \quad \tilde a(w, z) = (r, Ez) \quad \forall z \in W. \qquad (3)$$

In terms of operators, $v = E \tilde S^{-1} E^T r$, where $\tilde S$ is the operator associated with the bilinear form $\tilde a$ (but not computed explicitly as a matrix).

The choice of the space $W$ and the projection $E$ are standard [3,8]. All functions from $V_\Gamma$ are continuous on the domain $\Omega$. In order to design the space $W$, we relax the continuity on the interface $\Gamma$. On $\Gamma$, we select coarse degrees of freedom and define $W$ as the space of finite element functions with minimal energy on every subdomain, continuous across $\Gamma$ only at coarse degrees of freedom. The coarse degrees of freedom can be values at subdomain corners or averages over subdomain faces or edges. The continuity condition then means that the values of the corresponding unknowns, or averages, respectively, on neighbouring subdomains coincide. The bilinear form $a(\cdot, \cdot)$ from (2) is extended to $\tilde a(\cdot, \cdot)$ on $W \times W$ by integrating over the subdomains $\Omega_i$ separately and adding the results.

The projection $E : W \to V_\Gamma$ is realized as a weighted average of values from different subdomains at unknowns on the interface $\Gamma$, which results in functions continuous across the interface, followed by the solution of local subdomain problems, which makes the averaged function discrete harmonic. To assure good performance regardless of different stiffness of the subdomains [6], the weights are chosen proportional to the corresponding diagonal entries of the subdomain stiffness matrices.

The space $W$ is further decomposed as the $\tilde a$-orthogonal direct sum $W = W_1 \oplus \dots \oplus W_N \oplus W_C$, where $W_i$ is the space of functions with nonzero values only in $\Omega_i$ (i.e. they have zero values at coarse unknowns and they are generally not continuous at other unknowns on $\Gamma$) and $W_C$ is the coarse space, defined as the $\tilde a$-orthogonal complement of all spaces $W_i$;

$$W_C = \{ v \in W : \tilde a(v, w) = 0 \quad \forall w \in W_i, \ i = 1, \dots, N \}.$$

Functions from $W_C$ are fully determined by their values at coarse degrees of freedom (where they are continuous) and have minimal energy. Thus, they are generally discontinuous across $\Gamma$ outside of coarse unknowns. The solution $w \in W$ from (3) is now split accordingly as $w = w_C + \sum_{i=1}^{N} w_i$, where $w_C$, determined by

$$w_C \in W_C : \quad \tilde a(w_C, v) = (r, Ev) \quad \forall v \in W_C, \qquad (4)$$

is called the coarse correction, and $w_i$, determined by

$$w_i \in W_i : \quad \tilde a(w_i, v) = (r, Ev) \quad \forall v \in W_i, \qquad (5)$$

are the corrections from the substructures $\Omega_i$. See [6,7,8] for further details.

3 BDDC implementation based on frontal solver

The frontal solver implements the solution of a square linear system with some of the variables having prescribed fixed values. Equations that correspond to the fixed variables are omitted and the values of these variables are substituted into the solution vector directly. The output of the solver consists of the solution and the resulting imbalance in the equations, called reaction forces. More precisely, consider a block decomposition of the vector of unknowns $x$ with the second block consisting of all fixed variables, and write a system matrix $A$ with the same block decomposition.
Then on exit from the frontal solver,

$$\begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} f_1 \\ f_2 \end{bmatrix} + \begin{bmatrix} 0 \\ r_2 \end{bmatrix}, \qquad (6)$$

where the fixed variable values $x_2$ and the load vectors $f_1$ and $f_2$ are the inputs, while the solution $x_1$ and the reaction $r_2$ are the outputs.

In this section, we drop the subdomain subscript $i$ and we write subdomain vectors $w$ in the block form with the second block consisting of unknowns that are also coarse degrees of freedom (from now on, coarse unknowns), denoted by the subscript $c$, and the first block consisting of the remaining degrees of freedom, denoted by the subscript $f$. The vector of the coarse degrees of freedom given by averages is written as $Cw$, where each row of $C$ contains the coefficients of the average that defines that degree of freedom (zeros and ones for arithmetic averages). Then subdomain vectors $w \in W$ are characterized by $w_c = 0$, $Cw = 0$. Assume that $C = [C_f \ C_c]$, with $C_c = 0$, that is, the averages do not involve single variable coarse degrees of freedom; then $Cw = C_f w_f$.

Denote the substructure local stiffness matrix by $K$. This matrix is obtained by the subassembly of element matrices only of elements in the substructure. The matrix $K$ is singular for floating subdomains (subdomains not touching Dirichlet boundary conditions), but the block $K_{ff}$ is nonsingular if there are enough coarse unknowns to eliminate the rigid body modes, which will be assumed.

We now show how to solve (4) and (5) using the frontal solver. In the case when there are no averages as coarse degrees of freedom, we recover the previous method from [1,9]. The local substructure problems (5) are written in the frontal solver form (6) as

$$\begin{bmatrix} K_{ff} & K_{fc} & C_f^T \\ K_{cf} & K_{cc} & 0 \\ C_f & 0 & 0 \end{bmatrix} \begin{bmatrix} w_f \\ w_c \\ \mu \end{bmatrix} = \begin{bmatrix} r \\ 0 \\ 0 \end{bmatrix} + \begin{bmatrix} 0 \\ \mathrm{Rea} \\ 0 \end{bmatrix}, \qquad (7)$$

where $w_c = 0$, $r$ is the part in the $f$ block of the residual in the PCG method distributed to the substructures by the operator $E^T$, and $\mathrm{Rea}$ is the reaction. The constraint $w_c = 0$ is enforced by marking the $w_c$ unknowns as fixed, while the remaining constraints $C_f w_f = 0$ are enforced via the Lagrange multiplier $\mu$. Using the fact that $w_c = 0$, we get from (7) that

$$K_{ff} w_f = -C_f^T \mu + r, \qquad (8)$$
$$K_{cf} w_f = \mathrm{Rea}, \qquad (9)$$
$$C_f w_f = 0. \qquad (10)$$

From (8), $w_f = K_{ff}^{-1} \left( -C_f^T \mu + r \right)$. Now substituting $w_f$ into (10), we get the dual problem for $\mu$,

$$C_f K_{ff}^{-1} C_f^T \mu = C_f K_{ff}^{-1} r. \qquad (11)$$

The matrix $C_f K_{ff}^{-1} C_f^T$ is dense but small, with the order equal to the number of averages on the subdomain, and it is constructed by solving the system $K_{ff} U = C_f^T$ with multiple right-hand sides by the frontal solver and then the multiplication $C_f U$. After solving problem (11), we substitute for $\mu$ in (8) and find $w_f$ from (8)-(9) by the frontal solver, considering both $w_c = 0$ and $\mu$ fixed. The factorization in the frontal solver for (7) and the factorization of the dual matrix $C_f K_{ff}^{-1} C_f^T$ need to be computed only once in the setup phase.

Note that while the residual in the PCG method applied to the reduced problem is given at the interface only, the right-hand side in (7) has the dimension of all degrees of freedom on the subdomain. This is corrected naturally by extending the residual to subdomain interiors by zeros, which is required by the condition that the solution $w$ of (7) is discrete harmonic inside the subdomain. Similarly, only interface values of $w_i$ are used after the solution of (7) in further PCG computation. Such an approach is equivalent to computing with explicit Schur complements. Aware of this, we make no distinction in notation between these vectors given on the whole subdomain and on the corresponding interface.

The coarse problem (4) is solved by the frontal solver just like a finite element problem, with the subdomains playing the role of elements. It only remains to specify the basis functions of $W_C$ on the subdomain and compute the local subdomain coarse matrix efficiently. Each coarse basis function is a column vector of values of unknowns on the subdomain and it is associated with one coarse degree of freedom, which has value 1, while all other coarse degrees of freedom have value 0. Denote by $\psi^c$ the matrix whose columns are coarse basis functions associated with the coarse unknowns at corners, and by $\psi^{avg}$ the matrix made out of the coarse basis functions associated with averages. To find the coarse basis functions, we proceed similarly as in (7) and write the equations for the coarse basis functions in the frontal solver form, now with multiple right-hand sides,

$$\begin{bmatrix} K_{ff} & K_{fc} & C_f^T \\ K_{cf} & K_{cc} & 0 \\ C_f & 0 & 0 \end{bmatrix} \begin{bmatrix} \psi_f^c & \psi_f^{avg} \\ I & 0 \\ \lambda^c & \lambda^{avg} \end{bmatrix} = \begin{bmatrix} 0 & 0 \\ 0 & 0 \\ 0 & I \end{bmatrix} + \begin{bmatrix} 0 & 0 \\ \mathrm{Rea}^c & \mathrm{Rea}^{avg} \\ 0 & 0 \end{bmatrix}, \qquad (12)$$

where $\mathrm{Rea}^c$ and $\mathrm{Rea}^{avg}$ are matrices of reactions. Denote $\psi_f = [\psi_f^c \ \ \psi_f^{avg}]$, $\psi_c = [\psi_c^c \ \ \psi_c^{avg}] = [I \ \ 0]$, $\lambda = [\lambda^c \ \ \lambda^{avg}]$, $\mathrm{Rea} = [\mathrm{Rea}^c \ \ \mathrm{Rea}^{avg}]$, and $R = [0 \ \ I]$ with blocks of the same size.
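As a brief aside before (12) is reduced, the frontal-solver interface (6) and the local solve (7)-(11) can be illustrated on a tiny example. The following is a minimal sketch, not the authors' implementation: a dense Gaussian elimination stands in for the frontal factorization, and the names `solve`, `solve_with_fixed`, and `local_correction`, as well as the 5-unknown spring-chain matrix, are hypothetical choices made only for illustration. The matrix `C` is stored at full subdomain width with zero columns at the coarse unknowns, reflecting the assumption $C_c = 0$.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def solve(A, b):
    """Dense Gaussian elimination with partial pivoting.

    Illustrative stand-in for the factorization done by a frontal solver.
    """
    n = len(A)
    M = [[float(v) for v in A[i]] + [float(b[i])] for i in range(n)]
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(M[i][k]))
        M[k], M[p] = M[p], M[k]
        for i in range(k + 1, n):
            m = M[i][k] / M[k][k]
            for j in range(k, n + 1):
                M[i][j] -= m * M[k][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - dot(M[i][i + 1:n], x[i + 1:])) / M[i][i]
    return x


def solve_with_fixed(A, f, fixed):
    """Mimic the interface (6): `fixed` maps index -> prescribed value.

    Returns the full solution x and the reactions r2 = (A x - f) at fixed rows.
    """
    n = len(A)
    free = [i for i in range(n) if i not in fixed]
    # A11 x1 = f1 - A12 x2
    rhs = [f[i] - sum(A[i][j] * v for j, v in fixed.items()) for i in free]
    x1 = solve([[A[i][j] for j in free] for i in free], rhs)
    x = [0.0] * n
    for i, v in zip(free, x1):
        x[i] = v
    for i, v in fixed.items():
        x[i] = v
    reactions = {i: dot(A[i], x) - f[i] for i in fixed}
    return x, reactions


def local_correction(K, c_idx, C, r):
    """Solve (7) via the dual problem (11); entries of r at c_idx are ignored."""
    n = len(K)
    fixed = {i: 0.0 for i in c_idx}                           # w_c = 0
    r = [0.0 if i in fixed else float(r[i]) for i in range(n)]
    U = [solve_with_fixed(K, row, fixed)[0] for row in C]     # K_ff U = C_f^T
    y = solve_with_fixed(K, r, fixed)[0]                      # K_ff y = r
    D = [[dot(ci, uj) for uj in U] for ci in C]               # C_f K_ff^{-1} C_f^T
    mu = solve(D, [dot(ci, y) for ci in C])                   # dual problem (11)
    rhs = [r[i] - sum(mu[k] * C[k][i] for k in range(len(C))) for i in range(n)]
    w, rea = solve_with_fixed(K, rhs, fixed)                  # (8)-(9), mu known
    return w, mu, rea


# Hypothetical floating "subdomain": a chain of unit springs (K is singular).
K = [[1, -1, 0, 0, 0],
     [-1, 2, -1, 0, 0],
     [0, -1, 2, -1, 0],
     [0, 0, -1, 2, -1],
     [0, 0, 0, -1, 1]]
C = [[0, 0, 0, 1, 1]]   # one arithmetic average over the last two unknowns
r = [0, 1, 0, 0, 0]     # residual, zero at the coarse unknown 0
w, mu, rea = local_correction(K, [0], C, r)
print(w, mu, rea)
```

In this sketch the coarse unknown 0 is marked as fixed, which removes the rigid body mode of the chain, and the single average is enforced through the multiplier `mu`. A production code would reuse one factorization of $K_{ff}$ for all solves rather than eliminating from scratch each time, which is the point made about the setup phase above.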
Then (12) becomes

$$K_{ff} \psi_f + K_{fc} \psi_c = -C_f^T \lambda, \qquad (13)$$
$$K_{cf} \psi_f + K_{cc} \psi_c = \mathrm{Rea}, \qquad (14)$$
$$C_f \psi_f = R. \qquad (15)$$

From (13), we get $\psi_f = -K_{ff}^{-1} \left( K_{fc} \psi_c + C_f^T \lambda \right)$. Substituting $\psi_f$ into (15), we derive the dual problem for the Lagrange multipliers

$$C_f K_{ff}^{-1} C_f^T \lambda = -\left( R + C_f K_{ff}^{-1} K_{fc} \psi_c \right), \qquad (16)$$

which is solved for $\lambda$ with multiple right-hand sides. Since $\psi_c$ is known, we can use the frontal solver to solve (13)-(14) to find $\psi_f$. Finally, we construct the local coarse matrix corresponding to the subdomain as

$$K_C = \psi^T K \psi = \psi^T \begin{bmatrix} -C_f^T \lambda \\ \mathrm{Rea} \end{bmatrix} = \begin{bmatrix} \psi_f^T & \psi_c^T \end{bmatrix} \begin{bmatrix} -C_f^T \lambda \\ \mathrm{Rea} \end{bmatrix} = -\psi_f^T C_f^T \lambda + \begin{bmatrix} I \\ 0 \end{bmatrix} \mathrm{Rea},$$

where $\psi = [\psi^c \ \ \psi^{avg}]$.

At the end of the setup phase, the matrix of the coarse problem is factored by the frontal solver, using subdomain coarse matrices as input. Note that the factorizations in the subdomain solution and in the computation of the coarse basis functions are the same, and need to be computed only once in the setup phase.

4 Numerical results

The structural analysis of the replacement of the hip joint construction loaded by pressure from body weight is an important problem in bioengineering. The hip replacement consists of several parts made of titanium; here we consider the central part of the replacement joint. The problem was simplified to stationary linearized elasticity.

The highest stress was reached in the notches of the holders. In the original design, the holders of the hip replacement had a thickness of 2 mm, which led to a maximal von Mises stress of about 1,500 MPa. As the yield point of titanium is about 800 MPa, the geometry of the construction had to be modified. The thickness of the holders was increased to 3 mm, the radii of the notches were increased, and the notches were made smaller, as shown in Fig. 1. The maximal von Mises stress on this new construction was only about 540 MPa, which satisfied the demands for the strength of the construction [2,11].

Fig. 1. Hip joint replacement, von Mises stresses in improved design.

Fig. 2. Hip joint replacement, division into 32 subdomains

The computation needs 400 minutes when using a serial frontal solver on a Compaq Alpha server ES47 at the Institute of Thermomechanics, Academy of Sciences of the Czech Republic. With 32 subdomains and corner coarse degrees of freedom only, BDDC on a single Alpha processor took one tenth of that time, only 40 minutes. Further improvement (Tables 1 and 2) was obtained by adding averages and using a parallel computer, namely 16 Intel Itanium 2 processors (1.5 GHz) of the SGI Altix 4700 computer in the CTU Supercomputing Centre, Prague. The decompositions into 16 and 32 subdomains were obtained by the package METIS.

The mesh consists of 33,186 quadratic elements resulting in 544,734 unknowns. In the case of 16 subdomains, the interface was divided into 100 corners, 12 edges, and 35 faces, and in the case of 32 subdomains, into 200 corners, 12 edges, and 66 faces. In the first column of the tables, no additional averages are considered and only corners were used in the construction of $W_C$. Then we enforce the equality of arithmetic averages over all edges, over all faces, and over all edges and faces, respectively.

We observe that while the implementation of averages leads to a negligible increase in the computational cost of the factorizations, it considerably improves the condition number, and thus reduces the overall time of solution. Also, the decomposition into 32 subdomains leads to significantly lower computational times than the division into 16 subdomains.
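The link between condition number and iteration count invoked above can be made explicit by the standard PCG error bound. This is a textbook result added here for illustration, not a statement from the original text; $\kappa$ denotes the condition number of the preconditioned operator and $e_k$ the error after $k$ iterations, measured in the energy norm of $S$ (both symbols are notation assumed for this note):

$$\| e_k \|_S \le 2 \left( \frac{\sqrt{\kappa} - 1}{\sqrt{\kappa} + 1} \right)^{k} \| e_0 \|_S .$$

The bound is only an upper estimate, so iteration counts need not scale exactly with $\sqrt{\kappa}$; a large estimated condition number can still come with a moderate number of iterations.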

coarse problem         corners   corners+edges   corners+faces   corners+edges+faces
iterations                  79              77              43                    41
cond. number est.        1,173           1,173             258                   258
factorization (sec)         88              77             108                    84
pcg iter (sec)             126             107              61                    55
total (sec)                250             223             207                   177

Table 1. Hip joint replacement, 16 subdomains

coarse problem         corners   corners+edges   corners+faces   corners+edges+faces
iterations                  76              60              42                    35
cond. number est.       15,561             223             279                    67
factorization (sec)         53              51              56                    52
pcg iter (sec)              87              65              50                    40
total (sec)                157             135             124                   111

Table 2. Hip joint replacement, 32 subdomains

5 Conclusion

We have presented an application of a standard frontal solver within the iterative substructuring method BDDC. The method was applied to an industrial stress analysis problem. The numerical results show that the improvement of preconditioning by additional constraints on edges and faces is significant and leads to considerable savings of computational time, while the additional cost is negligible. In addition, the total time was lower for more subdomains. For a large number of subdomains, of course, the solution of the coarse problem will eventually become a bottleneck.

Acknowledgements

This research has been supported by the Czech Science Foundation under grant 106/08/0403 and by the U.S. National Science Foundation under grant DMS-0713876. A part of this work was done while Jakub Šístek was visiting at the University of Colorado Denver. The authors would also like to thank Bedřich Sousedík for his help in both typesetting and proofreading of the manuscript.

References

[1] P. Burda, M. Čertíková, J. Novotný, and J. Šístek. BDDC method with simplified coarse problem and its parallel implementation. Proceedings of MIS 2007, Josefův Důl, Czech Republic, January 13-20, pp. 3-9. Matfyzpress, Praha, 2007. http://ulita.ms.mff.cuni.cz/pub/mis

[2] M. Čertíková, J. Tuzar, B. Sousedík, and J. Novotný. Stress computation of the hip joint replacement using the finite element method. In Proceedings of Software a algoritmy numerické matematiky, Srní, Czech Republic, September, pp. 31-40. Charles University, Praha, 2005.

[3] C. R. Dohrmann. A preconditioner for substructuring based on constrained energy minimization. SIAM J. Sci. Comput., 25:246-258, 2003.

[4] B. M. Irons. A frontal solution scheme for finite element analysis. Int. J. Numer. Methods Eng., 2:5-32, 1970.

[5] J. Li and O. B. Widlund. FETI-DP, BDDC, and block Cholesky methods. Internat. J. Numer. Methods Engrg., 66:250-271, 2006.

[6] J. Mandel and C. R. Dohrmann. Convergence of a balancing domain decomposition by constraints and energy minimization. Numer. Linear Algebra Appl., 10:639-659, 2003.

[7] J. Mandel, C. R. Dohrmann, and R. Tezaur. An algebraic theory for primal and dual substructuring methods by constraints. Appl. Numer. Math., 54:167-193, 2005.

[8] J. Mandel and B. Sousedík. BDDC and FETI-DP under minimalist assumptions. Computing, 81:269-280, 2007.

[9] J. Šístek, M. Čertíková, P. Burda, E. Neumanová, S. Pták, J. Novotný, and A. Damašek. Development of an efficient parallel BDDC solver for linear elasticity problems. In: Blaheta, R. and Starý, J., eds, Proceedings of Seminar on Numerical Analysis, SNA 07, Ostrava, Czech Republic, January 22-26, pp. 105-108. Institute of Geonics AS CR, Ostrava, 2007.

[10] A. Toselli and O. Widlund. Domain Decomposition Methods - Algorithms and Theory. Springer-Verlag, Berlin Heidelberg, 2005.

[11] J. Tuzar. Mathematical modeling of hip joint replacement. (Matematické modelování náhrady kyčelního kloubu, in Czech), 2005. Master thesis, CVUT.
