PIERS ONLINE, VOL. 6, NO. 7, 2010 679 An H-LU Based Direct Finite Element Solver Accelerated by Nested Dissection for Large-scale Modeling of ICs and Packages Haixin Liu and Dan Jiao School of Electrical and Computer Engineering, Purdue University 465 Northwestern Avenue, West Lafayette, IN 47907, USA Abstract In this work, we prove that for the sparse matrix resulting from a finite-elementbased analysis of electrodynamic problems, its inverse has a data-sparse H-matrix approximation with error well controlled. Based on this proof, we develop a fast direct finite element solver. In this direct solver, the H-matrix-based LU factorization is developed, which is further accelerated by nested dissection. We show that the proposed direct solver has an O(kN logn) memory complexity and O(k 2 Nlog 2 N) time complexity, where k is a small number that is adaptively determined based on accuracy requirements, and N is the number of. A comparison with the state-of-the-art direct finite element solver that employs the most advanced sparse matrix solution has shown a clear advantage of the proposed solver. Applications to large-scale package modeling involving millions of have demonstrated the accuracy and almost linear complexity of the proposed direct solver. In addition, the proposed method is applicable to arbitrarily-shaped three-dimensional structures and arbitrary inhomogeneity. 1. INTRODUCTION A finite element method (FEM) based analysis of a large-scale IC and package problem generally results in a large-scale system matrix. Although the matrix is sparse, solving it can be a computational challenge when the problem size is large. There exists a general mathematical framework called the Hierarchical (H) Matrix framework [1 4], which enables a highly compact representation and efficient numerical computation of the dense matrices. It has been shown that the storage requirements and matrix-vector multiplications using H matrices are of complexity O(N logn), and the inverse of an H-matrix can be obtained in O(Nlog 2 N) complexity. In [5, 6], we developed an H-matrix based solver to efficiently compute and store the inverse of a finite element matrix. In this work, we develop an LU-factorization based fast finite-element solver. We then further accelerate the H-based LU solver by Nested Dissection [7]. The main contribution of this work is four-fold. First, we theoretically prove the existence of an H-matrix-based representation of the FEM matrix and its inverse for electrodynamic problems. The existence of an H-matrix approximation so far was only proved for elliptic partial differential equations (PDE) [8], whereas the Maxwell s equations are hyperbolic in nature. Second, we develop an H-matrix-based LU solver of O(kN logn) memory complexity and O(k 2 Nlog 2 N) time complexity for solving vector wave equations, where k is a variable that is adaptively determined based on the accuracy requirement, which is also small compared to N. The H-based LU is further accelerated by Nested Dissection [7]. Third, we develop a theoretical analysis of the complexity and accuracy for the proposed fast direct solver. In addition, we compare the proposed direct solver with the state-of-the-art direct sparse solver such as 5.0 [9]. has incorporated almost all the advanced sparse matrix techniques such as the multifrontal method and the approximate minimum degree (AMD) ordering for solving large-scale sparse matrices. The proposed solver is shown to outperform the 5.0 in both matrix decomposition and matrix solution time without sacrificing accuracy. 2. ON THE EXISTENCE OF H-MATRIX REPRESENTATION OF THE FEM MATRIX AND ITS INVERSE FOR ELECTRODYNAMIC PROBLEMS It has been proven in the mathematical literature that the FEM matrix resulting from the analysis of elliptic partial differential equations such as a Poisson equation has an H-matrix representation. Moreover, its inverse also allows for a date-sparse H-matrix approximation [8]. However, the full Maxwell s equations are hyperbolic partial differential equations in nature. Therefore, the proof developed for elliptic PDE-based equations does not apply to the wave equation, which governs all the electrodynamic phenomena. The existence of an H-matrix representation of the FEM system
PIERS ONLINE, VOL. 6, NO. 7, 2010 680 matrix for electrodynamic analysis is obvious based on the definition of an H-matrix [1 4]. In the following, we rigorously prove that the inverse of the FEM matrix also allows for an H-matrix representation. Consider the electric field E due to an arbitrary current distribution J in free space. The current distribution J can always be decomposed into a group of electric dipoles Ĩil i, where Ĩi is the current of the i-th element and l i is the length of the i-th current element. An FEM-based solution to the second-order vector wave equation subject to boundary conditions results in a linear system of equations Y{E} = {I} (1) where the right-hand-side vector {I} has the following entries I i = jωµ 0 Ĩ i l i. (2) On the other hand, E due to any current distribution J can be evaluated from the following integral ( E = jωµ 0 JG 0 + 1 ) V k0 2 J G 0 dv (3) where G 0 is the free-space Green s function. For a group of electric dipoles Ĩil i, the E field radiated by them at any point in the computational domain can be written as {E} = Z{I} (4) where {I} vector is the same as that in (1), Z is a dense matrix having the following elements { 1 Z mn = jωµ 0ˆt m (r m ) jωµ ˆl n (r n)g 0 (r m, r n) j 0 ωε ˆt [ m (r m ) (ˆl n (r )G 0 (r, r ))] } (5) and {E} vector has the following entries E m = ˆt m (r m ) E(r m ) (6) where ˆt m is the unit vector tangential to the m-th edge, ˆl n is the unit vector tangential to the n-th current element, r m denotes the center point of the m-th edge, r n denotes the point where the n-th current element is located. Comparing (1) to (4), it is clear that the inverse of the FEM matrix Y is Z. Since we have proved in [10 12] that the Z resulting from an integral-equation based analysis can be represented by an H-matrix with error well controlled, we prove that Y s inverse has an H-matrix representation. Following a proof similar to the above, we can show that the inverse of the FEM matrix in a non-uniform material can also be represented by an H-matrix. 3. PROPOSED FAST DIRECT FEM SOLVER In [5, 6], we developed an H-inverse based fast direct FEM solver. Since what is to be solved in (1) is Y 1 {I} instead of Y 1, an LU-factorization-based direct solution is generally more efficient than an inverse-based direct solution. In addition, in the LU factorization, the input matrix can be overwritten by L and U factors, thus the memory usage can be saved by half. The proposed LU-based direct solution has three components: (1) H-based recursive LU factorization; (2) matrix solution by H-based backward and forward substitution; and (3) acceleration by nested dissection. 3.1. Recursive LU Factorization and Matrix Solution We use an H-matrix block Y tt to demonstrate the H-LU factorization process, where t is a non-leaf cluster in the cluster tree T I [5, 6]. If t is a non-leaf, block t t is not a leaf block and hence Y tt can be subdivided into four sub blocks: ( ) Yt1t Y tt = 1 Y t1t 2 (7) Y t2t 1 Y t2t 2 where t 1 and t 2 are the children of t in the cluster tree T I.
PIERS ONLINE, VOL. 6, NO. 7, 2010 681 Assuming Y can be factorized into L and U matrices, Y can also be written as: ( ) ( ) ( ) Lt1t Y tt = L tt U tt = 1 0 Ut1t 1 U t1t 2 Lt1t = 1 U t1t 1 L t1t 1 U t1t 2 L t2t 2 0 U t2t 2 U t1t 1 U t1t 2 + L t2t 2 U t2t 2 (8) By comparing (7) and (8), the LU factorization can be computed recursively by the following four steps: 1) Compute L t1t1 and U t1t1 by H-LU factorization Y t1t1 = L t1t1 U t1t1 ; 2) Compute U t1t2 by solving L t1t1 U t1t2 = Y t1t2 ; 3) Compute L t2t1 by solving L t2t1 U t1t1 = Y t2t1 ; and 4) Compute L t2t2 and U t2t2 by H-LU factorization L t2t2 U t2t2 = Y t2t2 L t2t1 U t1t2. If t t is a leaf block, Y tt is not subdivided. It is stored in full matrix format, and factorized by a conventional pivoted LU factorization. Matrix solution by backward and forward substitution can be done in a similar hierarchical way. 3.2. Acceleration by Nested Dissection It is known that the smaller the number of nonzero elements to be processed in LU factorization, the better the computational efficiency. Nested dissection [7] can be used as an ordering technique to reduce the number of non-zero blocks to be computed in the LU factorization. In addition, this scheme naturally fits the H-based framework compared to many other ordering techniques. It serves an efficient approach to construct a block cluster tree [5]. We divide the computational domain into three parts: two domain clusters D1 and D2 which do not interact with each other and one interface cluster I which interacts with both domain clusters. Since the domain clusters D1 and D2 do not have interaction, their crosstalk entries in the FEM matrix Y are all zero. If we order the in D1 and D2 first and the in I last, the resultant matrix will have large zero blocks. These zero blocks are preserved during the LU factorization, and hence the computation cost of LU factorization is reduced. We further partition the domain clusters D1 and D2 into three parts. This process continues until the number of in each cluster is smaller than leafsize (n min ), or no interface edges can be found to divide the domain. Since the matrices in the non-zero blocks are stored and processed by H-matrix techniques in the proposed direct solver, the computational complexity is significantly reduced compared to a conventional nested dissection based LU factorization. 4. COMPLEXITY AND ACCURACY ANALYSIS 4.1. Complexity Analysis The storage complexity and inverse complexity of an H-matrix are shown to be O(kN logn), and O(kN log 2 N) respectively in [6] for solving electrodynamic problems. Here, we only analyze the complexity of an H-based LU factorization. As can be seen from Section 3.1, the LU factorization of Y tt is computed in four steps. In these four steps, Y t1t1, Y t1t2, and Y t2t1 are computed once, Y t2t2 is computed twice. Since in inverse, each block is computed twice, the complexity of H-based LU factorization is bounded by the H-based inverse, which is O(Nlog 2 N). 4.2. Accuracy Analysis From the proof given in Section 2, it can be seen that the inverse of the FEM matrix Y has an H-matrix-based representation. In such a representation, which block is admissible and which block is inadmissible are determined by an admissibility condition [5, 6]. Rigorously speaking, this admissibility condition should be defined based on Y 1. However, since Y 1 is unknown, we determine it based on Y. Apparently, this will induce error. However, as analyzed in Section 2, the Y s inverse can be mapped to the dense matrix formed for an integral operator. For this dense matrix, the admissibility condition used to construct an H-matrix representation has the same form as that used in the representation of the FEM matrix Y [10, 12]. Thus, the H-matrix structure, i.e, which block can have a potential low-rank approximation and which block is a full matrix, is formed correctly for Y 1. In addition, the accuracy of the admissibility condition can be controlled. In the LU factorization process, the rank of each admissible block is adaptively determined based on a required level of accuracy. If the rank is determined to be a full rank based on the adaptive scheme, then a full rank will be used. Thus, the low-rank approximation for each admissible block is also error controllable. Based on the aforementioned two facts, the error of the proposed direct solver is controllable.
PIERS ONLINE, VOL. 6, NO. 7, 2010 682 D = 1000 um, W = 100 um, S = 50 um Metal conductivity: 5.8 10 7 S/m Perfect electric conductor D 15 um 15 um ε 650 um, r = 3.4 30 um, r = 3.4 The bottom is backed by a perfect electric conductor ε W S W Figure 1: Geometry and material of a package inductor. Figure 2: A 7 7 inductor array. CPU time (s) 10 5 10 4 10 3 O(Nlog 2 N) CPU time (s) 10 2 10 1 10 0 O(NlogN) storage (GB) 10 2 10 1 O(NlogN) LU error A-LU / A 10-6 10-8 10-10 Figure 3: Performance of the proposed LU-based direct solver for simulating an inductor array from a 2 2 array to a 7 7 array. (a) CPU time for LU factorization. (b) CPU time for solving one right hand side. (c) Storage. (d) Accuracy. 5. NUMERICAL RESULTS A package inductor array is simulated to demonstrate the accuracy and efficiency of the proposed direct solver. The configuration of each inductor is shown in Figure 1, and a 7 7 inductor array is shown in Figure 2. In this example, H-LU factorization with nested dissection is used to directly solve the FEM matrix. Simulation is done at 10 GHz for the inductor array from 2 2 to 7 7, the number of of which is from 117,287 to 1,415,127. The simulation parameters were chosen as: n min = 32 and η = 1. The rank k was adaptively decided. In Figure 3(a), we plot the LU factorization time of the proposed direct solver, and that of 5.0 with respect to the number of. The proposed solver demonstrates a complexity of O(Nlog 2 N), which agrees very well with the theoretical analysis, whereas has a much higher complexity. In Figure 3(b), we plot the matrix solution time of the proposed direct solver, and that of for one right hand side. Once again, the proposed direct solver outperforms. In addition, the proposed direct solver is shown to have an O(N logn) complexity in matrix solution (backward and forward substitution). In Figure 3(c), we plot the storage requirement of the proposed direct solver and that of in simulating this example. Even though the storage of the proposed solver is shown to be a little bit higher than that of, the complexity of the proposed solver is lower, and hence for larger number of, the proposed solver will outperform in storage. In Figure 3(d), we plot the relative error of the proposed direct FEM solver. Good accuracy is observed in the entire range. The proposed direct FEM solver has also been successfully applied to the modeling of on-chip circuits. 6. CONCLUSIONS In this work, we proved the existence of an H-matrix-based representation of the inverse of the FEM matrix for solving electrodynamic problems. We developed a direct LU-based FEM solver of significantly reduced complexity. The time and storage complexity were shown to be O(Nlog 2 N)
PIERS ONLINE, VOL. 6, NO. 7, 2010 683 and O(N logn) respectively. In addition, we accelerated the direct solver by nested dissection. Numerical experiments and a comparison with the state-of-the-art sparse matrix solver have demonstrated its superior performance in modeling large-scale circuit and package problems involving millions of. ACKNOWLEDGMENT This work was supported by NSF under award No. 0747578 and No. 0702567. REFERENCES 1. Hackbusch, W. and B. Khoromaskij, A sparse matrix arithmetic based on matrices. Part I: Introduction to matrices, Computing, Vol. 62, 89 108, 1999. 2. Hackbusch, W. and B. N. Khoromskij, A sparse-matrix arithmetic. Part II: Application to multi-dimensional problems, Computing, Vol. 64, 21 47, 2000. 3. Borm, S., L. Grasedyck, and W. Hackbusch, Hierarchical matrices, Lecture Note 21 of the Max Planck Institute for Mathematics in the Sciences, 2003. 4. Grasedyck, L. and W. Hackbusch, Construction and arithmetics of matrices, Computing, Vol. 70, No. 4, 295 344, August 2003. 5. Liu, H. and D. Jiao, A direct finite-element-based solver of significantly reduced complexity for solving large-scale electromagnetic problems, International Microwave Symposium (IMS), 4, June 2009. 6. Liu, H. and D. Jiao, Performance analysis of the H-matrix-based fast direct solver for finiteelement-based analysis of electromagnetic problems, IEEE International Symposium on Antennas and Propagation, 4, June 2009. 7. George, A., Nested dissection of a regular finite element mesh, SIAM J. on Numerical Analysis, Vol. 10, No. 2, 345 363, April 1973. 8. Bebendorf, M. and W. Hackbusch, Existence of H-matrix approximants to the inverse FEmatrix of elliptic operators with L -coefficients, Numerische Mathematik, Vol. 95, 1 28, 2003. 9. 5.0, http://www.cise.ufl.edu/research/sparse/umfpack/. 10. Chai, W. and D. Jiao, H- and H 2 -matrix-based fast integral-equation solvers for large-scale electromagnetic analysis, IET Microwaves, Antennas & Propagation, accepted for publication, 2009. 11. Chai, W. and D. Jiao, An H 2 -matrix-based integral-equation solver of reduced complexity and controlled accuracy for solving electrodynamic problems, IEEE Trans. Antennas Propagat., Vol. 57, No. 10, 3147 3159, October 2009. 12. Chai, W. and D. Jiao, An H 2 -matrix-based integral-equation solver of linear-complexity for large-scale full-wave modeling of 3D circuits, IEEE 17th Conference on Electrical Performance of Electronic Packaging (EPEP), 283 286, October 2008.