FAST SPARSE SELECTED INVERSION


SIAM J. MATRIX ANAL. APPL. Vol. 36, No. 3, c 2015 Society for Industrial and Applied Mathematics

FAST SPARSE SELECTED INVERSION

JIANLIN XIA, YUANZHE XI, STEPHEN CAULEY, AND VENKATARAMANAN BALAKRISHNAN

Abstract. We propose a fast structured selected inversion method for extracting the diagonal blocks of the inverse of a sparse symmetric matrix A, using the multifrontal method and rank structures. When A arises from the discretization of some PDEs and has a low-rank property (the intermediate dense matrices in the factorization have small off-diagonal numerical ranks), structured approximations of the diagonal blocks and certain off-diagonal blocks of A^{-1} that are needed to find the diagonal blocks of A^{-1} can be quickly computed. A structured multifrontal LDL factorization is first computed for A with a forward traversal of an assembly tree, which yields a sequence of local data-sparse factors. The factors are used in a backward traversal of the tree for the structured inversion. The intermediate operations in the inversion are performed in hierarchically semiseparable or low-rank forms. With the assumptions of data sparsity and appropriate rank conditions, the theoretical structured inversion cost is proportional to the matrix size n times a low-degree polylogarithmic function of n after structured factorizations. The memory counts are similar. In comparison, existing direct selected inversion methods cost O(n^{3/2}) flops in two dimensions and O(n^2) flops in three dimensions for both the factorization and the inversion, with O(n^{4/3}) memory in three dimensions. Additional formulas for efficient structured operations are also derived. Numerical tests on two- and three-dimensional discretized PDEs and more general sparse matrices are done to demonstrate the performance.

Key words. fast selected inversion, data sparsity, structured multifrontal method, low-rank property, HSS matrix, linear complexity

AMS subject classifications. 15A23, 65F05, 65F30, 65F50

DOI. 10.1137/...X

1. Introduction. Extracting selected entries (often the diagonal) of the inverse of a sparse matrix, usually called selected inversion, is critical in many scientific computing problems. Examples include preconditioning, uncertainty quantification in risk analysis [3], electronic structure calculations within the density functional theory framework [6], and simulations of nanotransistors and silicon nanowires [8]. The diagonal entries of the inverse are useful in providing significant insights into the original problem or its solution.

The goal of this paper is to present an efficient structured method for computing the diagonal of A^{-1} as a column vector, denoted by diag(A^{-1}), as well as the diagonal blocks of A^{-1}, for a large n x n sparse symmetric matrix A. The method also produces some off-diagonal blocks of A^{-1}. For convenience, we usually just mention diag(A^{-1}).

In recent years, a lot of effort has been made in the extraction of diag(A^{-1}). Since A^{-1} is often fully dense, a brute-force formation of A^{-1} is generally impractical. On the other hand, it is possible to find diag(A^{-1}) without forming the entire inverse.

* Received by the editors February 8, 2014; accepted for publication (in revised form) by S. Le Borne July 7, 2015; published electronically September 2015.
+ Department of Mathematics and Department of Computer Science, Purdue University, West Lafayette, IN (xiaj@math.purdue.edu, yxi@math.purdue.edu). The research of the first author was supported in part by an NSF CAREER Award DMS-1255416 and an NSF grant DMS-1115572.
+ Athinoula A. Martinos Center for Biomedical Imaging, Department of Radiology, Massachusetts General Hospital, Harvard University, Charlestown, MA 02129 (stcauley@nmr.mgh.harvard.edu).
+ School of Electrical and Computer Engineering, Purdue University, West Lafayette, IN (ragu@ecn.purdue.edu).

If A is diagonally dominant or positive definite, A^{-1} may have many small entries. Based on this property, a probing method is proposed in [33]. It exploits the pattern of a sparsified A^{-1} together with some standard graph theories, and computes diag(A^{-1}) by solving a sequence of linear systems with a preconditioned Krylov subspace method. Later, several approaches were proposed for more general matrices. The fast inverse with nested dissection method in [3, 4] and the selected inversion method in [5] use domain decomposition and compute some hierarchical Schur complements of the interior points for each subdomain. This is followed by the extraction of the diagonal entries in a top-down pass. The method Selinv in [6, 7] uses a supernodal left-looking LDL factorization of A to improve the efficiency. The work in [] focuses on the computation of a subset of A^{-1} by accessing only part of the factors, where the LU or LDL factorization of A is held in out-of-core storage. All these methods in [, 3, 5, 6] are direct methods.

For iterative methods, a Lanczos-type algorithm is first used in [35]. Later, a divide-and-conquer (DC) method and a domain decomposition (DD) method are presented in [34]. The DC method assumes that the matrix can be recursively decomposed into a block-diagonal plus low-rank form, where the decomposed problem is solved and corrected by the Sherman-Morrison-Woodbury (SMW) formula at each recursion level. The DD method solves each local subdomain problem and then modifies the result by a global Schur complement. Both methods use iterative solvers and sparse approximation techniques to speed up the computations.

Our scheme for the selected inversion is a direct method. As in [5, 6, 7], the scheme includes two stages, a block LDL factorization stage and an inversion stage. We incorporate various sparse and structured matrix techniques, especially a structured multifrontal factorization and a structured selected inversion, to gain significant efficiency and storage benefits, as outlined below.

1. General sparse matrices and the multifrontal method. Our method is applicable to symmetric discretized matrices on both two-dimensional (2D) and three-dimensional (3D) domains, as well as more general symmetric sparse matrices, as long as certain rank structures exist, as explained in the next item. Just like in modern sparse direct solvers, we apply the nested dissection ordering [4] to the mesh or adjacency graph by calling some graph partitioning tools. Thus, our inversion method does not rely on the special shape of the computational domain, and is more generally applicable than those in [5, 34]. To enhance the data locality of the later inversion, we use the multifrontal method [3] to compute a block LDL factorization of A. This method converts the overall sparse factorization into a sequence of operations on some local dense matrices called frontal matrices, following a nice tree structure called the assembly tree. This makes it convenient to manage and access data in the inversion.

2. Fast structured factorization stage. To speed up the factorization and the later inversion, we further incorporate rank structured techniques. For problems such as some discretized elliptic/elasticity equations and Helmholtz equations with low or medium frequencies, the dense intermediate matrices in the sparse factorization often have certain rank structures (see, e.g., [, , 6, 7, 9, 3, 4]). For these cases, the problem or the discretized matrix A is often said to have a low-rank property.
We perform the LDL factorization of A with the structured multifrontal methods in [39, 40, 4], which further approximate the frontal matrices in the multifrontal method by hierarchically semiseparable (HSS) forms [0, 4]. Such HSS forms take much less storage than the original dense ones. The local dense operations are thus converted into a series of fast structured ones. The complexity of the LDL factorization is

O(n) times a low-degree polylogarithmic function of n in two dimensions, and O(n) to O(n^{4/3}) times a low-degree polylogarithmic function of n in three dimensions, depending on certain rank conditions. Moreover, after the factorization, the factors are in data-sparse forms and with storage roughly proportional to n, which makes the later efficient inversion feasible. Related to the later inversion, we also show a fast Schur complement computation strategy in Theorem 3.4 using a concept similar to the reduced HSS matrix in [39]. The formula takes advantage of an HSS inverse computation that is needed in the inversion stage, and significantly saves the Schur complement computation cost in the factorization stage.

3. Fast structured selected inversion stage. After the structured multifrontal factorization, the factors are represented by a sequence of HSS or low-rank forms. This yields a structured approximation A~ to A. The inversion procedure is performed on these data-sparse factors instead of the original dense ones. In Theorem 3.8, we show the rank structures in the diagonal blocks and selected off-diagonal blocks of A~^{-1}. They are either HSS or low-rank forms. The major operations of the inversion are then HSS inversions, multiplications of HSS and low-rank matrices, and low-rank updates, which can all be quickly performed. This thus significantly improves the efficiency and the storage over the standard direct selected inversion. In Theorem 3.5, we also derive another formula to quickly apply the inverse of the leading part of an HSS form to its off-diagonal part, as needed in the inversion.

4. Nearly linear theoretical complexity for selected inversion. The complexity of the inversion algorithm is analyzed with a complexity optimization strategy and a rank relaxation idea in [39]. Unlike in [39, 4], where the factorization cost is optimized, here we minimize the selected inversion cost. A switching level in the assembly tree is used to shift from dense local inversions to HSS ones. The structured selected inversion has almost linear complexity for the discretized matrices with the low-rank property in both two and three dimensions. More specifically, if the off-diagonal numerical ranks of the intermediate frontal matrices are bounded by r, then the selected inversion cost is O(rn) in two dimensions and O(r^{3/2} n) in three dimensions. If the off-diagonal numerical ranks grow with n following the rank patterns in Tables 4-6, then the theoretical selected inversion cost and the storage are proportional to n times low-degree polylogarithmic functions of n. In contrast, the methods in [3, 5, 6, 7] cost O(n^{3/2}) in two dimensions and O(n^2) in three dimensions, for both the LDL factorization and the selected inversion, and need O(n^{4/3}) storage in three dimensions.

Due to the data sparsity, our method has the potential to be extended to the fast extraction of general off-diagonal entries of A^{-1}. Numerical tests in terms of discretized Helmholtz equations in both two and three dimensions as well as various more general problems from a sparse matrix collection are performed. Significant performance gain is observed for the structured selected inversion over the standard direct one.

We would like to mention that an alternative way is to use H- or H^2-matrices [, 6, 7, 7, 0, ], especially for 2D and 3D discretized PDEs. That is, the discretized operators can be first approximated by H- or H^2-matrices and then a related inversion procedure is applied.
In fact, the methods in [7, ] involving nested dissection and the one in [0] involving weak admissibility conditions share some concepts similar to ours. In particular, if H^2-matrix techniques are applied to certain algebraic operators, strict O(n) inversion complexity is possible. Here, by using the multifrontal method, we seek to take advantage of its data locality and related well-studied graph techniques for sparse matrices.

The remaining sections of the paper are organized as follows. Section 2 briefly reviews the structured multifrontal factorization with HSS techniques. Section 3 describes the structured selected inversion algorithm in detail. We discuss the complexity optimization in section 4. The numerical results are shown in section 5. Section 6 concludes the paper. In the presentation, we use the following notation:
- A|_{s x t} represents a submatrix of A specified by the row index set s and the column index set t, and A|_s consists of the selected rows of A specified by s;
- diag(D) denotes the diagonal entries of D as a column vector; on the other hand, diag(D_1, D_2) denotes a block diagonal matrix with the diagonal blocks D_1 and D_2;
- T represents a full postordered binary tree with its nodes denoted by i = 1, 2, ..., root, where root is the root;
- c_1 and c_2 denote the left and the right children of a nonleaf node i in T, respectively;
- sib(i) and par(i) denote the sibling and the parent of a node i, respectively;
- boldface symbols such as F and S are for the dense matrices inside the multifrontal factorization, and boldface i is for a node of the assembly tree.

2. Structured multifrontal LDL factorization. We first briefly review the structured multifrontal method in [39] and also a block LDL variation, which will be used in the factorization stage before the selected inversion.

2.1. HSS matrix and algorithms. The structured multifrontal method incorporates HSS structures into the multifrontal method. An N x N HSS matrix F with a corresponding HSS tree T can be defined as follows [0, 38, 4]. Let each node i of a full binary tree T be associated with a consecutive index set t_i \subset I = {1 : N}, which satisfies t_i \cup t_{sib(i)} = t_{par(i)}, t_i \cap t_{sib(i)} = \emptyset, and t_{root} = I. The index sets t_i associated with all the leaves i together form I. An HSS matrix F is given by F = D_{root}, where D_i is recursively defined for each nonleaf node i and its children c_1 and c_2 as

(2.1)  D_i = F|_{t_i x t_i} = \begin{bmatrix} D_{c_1} & U_{c_1} B_{c_1} V_{c_2}^T \\ U_{c_2} B_{c_2} V_{c_1}^T & D_{c_2} \end{bmatrix},

(2.2)  U_i = \begin{bmatrix} U_{c_1} R_{c_1} \\ U_{c_2} R_{c_2} \end{bmatrix},  V_i = \begin{bmatrix} V_{c_1} W_{c_1} \\ V_{c_2} W_{c_2} \end{bmatrix}.

The size |t_i| of D_i satisfies |t_i| = |t_{c_1}| + |t_{c_2}|. Notice that this structure is essentially the same as the matrix family M_{k,tau} used in [0] based on a weak admissibility condition. Here, D_i, U_i, R_i, etc., are called the HSS generators associated with node i. Due to (2.2), we also call U_i, V_i associated with a nonleaf node i nested basis matrices. The HSS rank of F is defined to be

r = max_{i = 1, 2, ..., root} max( rank F|_{t_i x (I \ t_i)}, rank F|_{(I \ t_i) x t_i} ),

where F|_{t_i x (I \ t_i)} and F|_{(I \ t_i) x t_i} are called the ith HSS block row and column, respectively. An illustration can be found in Figure 2. If F is symmetric, then U_i = V_i, R_i = W_i, B_{c_2} = B_{c_1}^T.

To construct an HSS form, the HSS blocks are compressed hierarchically in a bottom-up traversal of T [6, 4]. F is partitioned following the index sets t_i. For each

Fig. 1. Illustration of the nested dissection ordering [4] of a general mesh and a two-layer tree structure [39, 4]: (i) mesh; (ii) outer assembly tree and inner HSS trees. The outer tree in (ii) is the assembly tree, and each inner tree is an HSS tree. Each leaf node of the assembly tree corresponds to the interior points of a subdomain and each nonleaf node corresponds to a separator or interface.

leaf i, a QR or rank-revealing QR factorization is computed: F|_{t_i x (I \ t_i)} = U_i H_i. For a nonleaf node i, appropriate subblocks of H_{c_1} and H_{c_2} are merged and compressed to yield [R_{c_1}; R_{c_2}]. Similar operations are applied to the HSS block columns to obtain V_i and W_i. After such compression, B_i can be conveniently identified. The overall HSS construction from dense F costs O(rN^2) flops. Later, this HSS construction can be applied to the frontal matrices in the multifrontal method. To avoid such dense HSS construction, randomized methods can be used [40].

If r is small, we can quickly perform various HSS operations. For example, ULV-type algorithms [0, 4] can be used to factorize F in O(r^2 N) flops, and linear system solution with the factors costs only O(rN) flops. Similarly, the multiplication and the explicit inversion of HSS matrices [0, 6] can be done in O(r^2 N) flops. Even if the HSS blocks at different hierarchical levels have ranks growing with the block sizes, the factorization complexity may still be quite satisfactory, as long as the growth follows certain patterns [4, , 38]; see section 4. Recently, HSS techniques were embedded into sparse matrix computations and helped the development of some fast direct solvers and efficient preconditioners [39, 40, 4]. The sparse factorization described in the next subsection is an example.

2.2. Structured multifrontal block LDL factorization. The multifrontal method [3] can be used to compute a block LDL factorization of A:

(2.3)  A = L Lambda L^T.

To improve the efficiency, the fast structured sparse solver in [39] can be conveniently modified to compute an approximate multifrontal block LDL factorization. That is, structured approximations to L and Lambda will be computed. This is briefly reviewed here as a background for the later structured selected inversion.

The matrix A is first reordered with nested dissection to reduce fill-in [4]. The variables or mesh points are grouped into separators, which are used to recursively divide the mesh or adjacency graph into smaller pieces. As in [39], we use some graph partitioning tools [5, 30] to partition the graph, so that our method is not restricted to any particular domain shape or mesh structure. For illustration, an example is shown in Figure 1(i). The multifrontal method performs the sparse factorization via some local factorizations of a series of smaller dense frontal matrices. A postordered assembly tree is formed to organize the elimination of the separators. Each node of the assembly tree corresponds

to a separator in the mesh. Later, we do not distinguish between a node of the assembly tree and a separator in the mesh. Label the nodes/separators as i = 1, 2, ..., root. See the outer tree in Figure 1(ii). Let s_i be the index set of the mesh points in separator i, and N_i be the set of neighbors [39] of a node i, defined as follows.

Definition 2.1. A neighbor j of node i is an ancestor node of i in the assembly tree satisfying either (1) A|_{s_j x s_i} has nonzero entries, or (2) a nonzero entry or fill-in is introduced into the (s_j, s_i) block due to the elimination of a descendant of i in the factorization, i.e., A|_{s_j x s_i} = 0 but L|_{s_j x s_i} has nonzero entries.

For example, in Figure 1, N_1 = {3, 7, 15}, N_3 = {7, 15}. Note that N_3 includes separator 15 because the elimination of separator 1 connects separators 3 and 15.

Before reviewing the structured multifrontal method, we emphasize the following:
- The global sparse factorization is governed by the multifrontal framework, which follows the assembly tree. HSS techniques are applied locally to the intermediate frontal matrices only. That is, each local HSS tree corresponds to an individual node/separator in the global assembly tree; see Figure 1(ii).
- The two types of trees (the assembly tree and the HSS trees) are independent of each other. In an HSS tree, a parent node corresponds to an index set that includes the child index sets. In the assembly tree, a parent node and its children correspond to different mesh point sets. That is, each node corresponds to an independent separator in the mesh, regardless of the parent/child relationship.

The structured multifrontal scheme proceeds as follows. For each node i, let

(2.4)  F_i^0 = \begin{bmatrix} A|_{s_i x s_i} & (A|_{s_{N_i} x s_i})^T \\ A|_{s_{N_i} x s_i} & 0 \end{bmatrix},

where s_{N_i} can be understood similarly to s_i. If i is a leaf, the associated frontal matrix is simply F_i = F_i^0. If i is a nonleaf node with children c_1 and c_2, form a frontal matrix F_i by an assembly operation called extend-add [3]:

(2.5)  F_i = F_i^0 <+> S_{c_1} <+> S_{c_2},

where S_{c_1} and S_{c_2} are called update matrices and are obtained from early steps (similar to (2.8) below), and the extend-add operator <+> means that the matrices are permuted and extended to match the global indices in {s_i, s_{N_i}}.

The structured multifrontal LDL factorization follows the framework in [39] (partially structured) or [40, 4] (fully structured). Partition F_i as

(2.6)  F_i = \begin{bmatrix} F_{i,i} & F_{N_i,i}^T \\ F_{N_i,i} & F_{N_i,N_i} \end{bmatrix},

so that F_{i,i} corresponds to the index set s_i. Construct an HSS approximation to F_i with generators D_i, U_i, etc., so that F_{i,i} and F_{N_i,N_i} correspond to two sibling nodes k_1 and k_2 of the HSS tree, respectively; see Figure 2. Next, compute a block LDL factorization

F_i = \begin{bmatrix} I & \\ L_{N_i,i} & I \end{bmatrix} \begin{bmatrix} F_{i,i} & \\ & S_i \end{bmatrix} \begin{bmatrix} I & L_{N_i,i}^T \\ & I \end{bmatrix},

where

(2.7)  L_{N_i,i} = F_{N_i,i} F_{i,i}^{-1},

Fig. 2. Illustration of an HSS form and the corresponding HSS tree used for a frontal matrix F_i.

and S_i is the update matrix of the form

(2.8)  S_i = F_{N_i,N_i} - F_{N_i,i} F_{i,i}^{-1} F_{N_i,i}^T.

After the HSS approximation of F_i, F_{N_i,i}, as one of its off-diagonal blocks, looks like

(2.9)  F_{N_i,i} ≈ U_{k_2} B_{k_2} U_{k_1}^T.

We then have

(2.10)  S_i ≈ F_{N_i,N_i} - U_{k_2} B_{k_2} (U_{k_1}^T F_{i,i}^{-1} U_{k_1}) B_{k_2}^T U_{k_2}^T.

Remark 2.1. HSS approximation errors are extensively studied in [37]. For example, if all the HSS blocks are compressed with a relative accuracy tau, then the relative HSS approximation error in the Frobenius norm is O(sqrt(r) log N * tau). In other words, the error accumulation factor is O(sqrt(r) log N). This may possibly even be reduced to O(log N) [5, Corollary 6.8]. If all the frontal matrices are approximated, we expect a similar effect in the error accumulation, due to the similarity in the hierarchical structures of the assembly tree and the HSS tree. That is, the overall accumulation factor is likely a low-order power of O(sqrt(r) log n). A more precise estimation will be conducted in the future for all the factorization steps. Numerical tests indicate that the overall approximation error is usually well controlled.

Since we are interested in the selected inversion, unlike the method in [39], an HSS inverse F_{i,i}^{-1} is computed here (see section 3.1). Theorem 3.4 below indicates that the information in the inversion of F_{i,i} can be used to quickly form (2.10). In practice, a switching level l_s [39, 4] is also involved, so that standard dense factorizations are used when a node of the assembly tree is below level l_s; this is to optimize the complexity (see section 4).
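As a concrete (if purely dense) illustration of the elimination step in (2.6)-(2.8), the NumPy sketch below forms L_{N_i,i} and the update matrix S_i for a small randomly generated frontal matrix. The sizes and data are hypothetical, and no HSS compression is involved; in the structured method, F_{i,i} would be an HSS form and F_{N_i,i} would be kept in the low-rank form (2.9).

```python
import numpy as np

# A minimal dense sketch of one frontal elimination step, following (2.6)-(2.8).
# The frontal matrix F_i and the block sizes below are hypothetical test data.
rng = np.random.default_rng(0)
ni, nN = 8, 5                                   # |s_i| and |s_{N_i}| (made up)
F = rng.standard_normal((ni + nN, ni + nN))
F = F + F.T + 20.0 * np.eye(ni + nN)            # symmetric, well conditioned

F_ii = F[:ni, :ni]                              # F_{i,i}
F_Ni = F[ni:, :ni]                              # F_{N_i,i}
F_NN = F[ni:, ni:]                              # F_{N_i,N_i}

L_Ni = F_Ni @ np.linalg.inv(F_ii)               # (2.7): L_{N_i,i} = F_{N_i,i} F_{i,i}^{-1}
S_i  = F_NN - L_Ni @ F_Ni.T                     # (2.8): update matrix (Schur complement)

# Check the block LDL^T identity below (2.6): F_i = L diag(F_{i,i}, S_i) L^T.
L = np.block([[np.eye(ni), np.zeros((ni, nN))],
              [L_Ni,       np.eye(nN)      ]])
D = np.block([[F_ii,               np.zeros((ni, nN))],
              [np.zeros((nN, ni)), S_i              ]])
assert np.allclose(L @ D @ L.T, F)
```

In the structured scheme, F_{i,i}^{-1} is never formed densely; Theorem 3.4 below replaces the product U_{k_1}^T F_{i,i}^{-1} U_{k_1} in (2.10) by a much smaller one.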

Remark 2.2. The structured multifrontal LDL factorization we implement involves a shortcut in [39] which does not affect the performance of the later inversion. The shortcut is to use dense update matrices, so that dense frontal matrices are formed first and then approximated by HSS forms. This has O(rN^2) complexity for local HSS compression, and makes the overall multifrontal factorization cost suboptimal, but is much simpler to use. The reasons for this shortcut are clarified in [39]. This multifrontal algorithm is designed to take quite general sparse matrices as inputs, without specific information on the PDE or geometry. If the update matrices are in HSS forms, they may potentially need to be permuted arbitrarily in the assembly operation (2.5). Designing such a general scheme would be unnecessarily sophisticated. However, when the update matrices are kept dense, (2.5) is straightforward.

As shown in [39], for 2D and 3D problems satisfying certain rank patterns, the difference in the complexity between this factorization with the shortcut and a fully structured factorization with all HSS local operations is insignificant. For both versions, the factorization complexity is up to O(n log n) in two dimensions and up to O(n^{4/3} log n) in three dimensions. Most importantly, the resulting factors are still structured, just like in a fully structured multifrontal method. Therefore, this shortcut does not affect the complexity of the later selected inversion, which is the focus of this work.

3. Structured sparse selected inversion. We then describe our structured selected inversion scheme for A^{-1} based on its structured LDL factors. After the structured multifrontal factorization, suppose A is approximated by A~. That is, the structured factors are the exact factors of A~. Since no approximation is involved in the inversion stage, we use notation such as F_{i,i} to denote its HSS approximation, and L to denote the structured factor. This highly simplifies the notation that is otherwise very messy due to the multilevel approximations in the factorization stage. Let

(3.1)  C = A~^{-1}.

Thus, our selected inversion computes diag(C), which approximates diag(A^{-1}). We start with the discussion of an HSS inversion algorithm, as needed in the scheme. The HSS inversion also benefits the computation of the update matrices S_i.

3.1. HSS inversion and fast computation of the update matrices. An HSS inversion algorithm is proposed in [6], and the idea is to recursively apply the SMW formula after writing an HSS matrix as a block diagonal matrix plus a low-rank update (see (2.1)). A slightly modified form of the SMW formula is used in [6] and can be derived based on the standard version in a clearer way as follows. Assume all the matrices in the following derivation are of appropriate sizes and all the relevant inverses exist; then

(3.2)
(D + U B V^T)^{-1}
  = D^{-1} - D^{-1} U (B^{-1} + V^T D^{-1} U)^{-1} V^T D^{-1}
  = D^{-1} - D^{-1} U (B^{-1} + D̂)^{-1} V^T D^{-1}          (D̂ = V^T D^{-1} U)
  = D^{-1} - D^{-1} U D̂^{-1} (D̂^{-1} + B)^{-1} B V^T D^{-1}
  = D^{-1} - D^{-1} U D̂^{-1} (D̂^{-1} + B)^{-1} [(D̂^{-1} + B) - D̂^{-1}] V^T D^{-1}
  = D^{-1} - D^{-1} U D̂^{-1} V^T D^{-1} + D^{-1} U D̂^{-1} (D̂^{-1} + B)^{-1} D̂^{-1} V^T D^{-1}.

With this formula, a simplification of the HSS inversion algorithm in [6] for a symmetric HSS matrix F is outlined below. The inputs of this algorithm are the HSS generators U_i, R_i, B_i, D_i of F, and the outputs are the HSS generators Ũ_i, R̃_i, B̃_i, D̃_i of F^{-1}. In the following two subsections, D̄_i, Ū_i, Ḡ_i, D̂_i, Ĝ_i are intermediate results in the inversion, and G_i for a nonleaf node i is used only for the derivation and is not actually computed.

3.1.1. Basic HSS inversion in terms of two levels. To motivate the HSS inversion procedure, first consider a two-level symmetric HSS form

(3.3)  D_3 = diag(D_1, D_2) + diag(U_1, U_2) \begin{bmatrix} & B_1 \\ B_1^T & \end{bmatrix} diag(U_1, U_2)^T.
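As a small numerical illustration of (3.3), the following sketch (with hypothetical generator sizes and random data) assembles such a two-level symmetric HSS matrix explicitly and confirms that its off-diagonal blocks have rank at most r, which is exactly the structure the inversion below exploits.

```python
import numpy as np

# A minimal sketch of the two-level symmetric HSS form (3.3):
#   D_3 = diag(D_1, D_2) + diag(U_1, U_2) [[0, B_1], [B_1^T, 0]] diag(U_1, U_2)^T.
# The generator sizes and values below are made up for illustration.
rng = np.random.default_rng(1)
n1, n2, r = 7, 6, 2
D1 = rng.standard_normal((n1, n1)); D1 = D1 + D1.T + 10.0 * np.eye(n1)
D2 = rng.standard_normal((n2, n2)); D2 = D2 + D2.T + 10.0 * np.eye(n2)
U1 = rng.standard_normal((n1, r))
U2 = rng.standard_normal((n2, r))
B1 = rng.standard_normal((r, r)) / 10.0

D3 = np.block([[D1,               U1 @ B1 @ U2.T],
               [U2 @ B1.T @ U1.T, D2            ]])

# Each HSS block row/column has rank at most r.
assert np.linalg.matrix_rank(D3[:n1, n1:]) <= r
```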

According to (3.2),

(3.4)  D_3^{-1} = diag(G_1, G_2) + diag(Ũ_1, Ũ_2) D̄_3^{-1} diag(Ũ_1, Ũ_2)^T,

where

(3.5)  G_j = D_j^{-1} - D_j^{-1} U_j D̂_j^{-1} U_j^T D_j^{-1},   Ũ_j = D_j^{-1} U_j D̂_j^{-1},   j = 1, 2,

with

(3.6)  D̂_j = U_j^T D_j^{-1} U_j,   j = 1, 2,

(3.7)  D̄_i = \begin{bmatrix} D̂_{c_1}^{-1} & B_{c_1} \\ B_{c_1}^T & D̂_{c_2}^{-1} \end{bmatrix},   i = 3 (c_1 = 1, c_2 = 2).

3.1.2. General HSS inversion. To generalize to more levels, we consider a postordered HSS tree consisting of three levels and 7 nodes, with node 7 as the root, nodes 3, 6 the children of 7 at the next level, and nodes 1, 2 the children of 3 and 4, 5 the children of 6 at the leaf level. A corresponding symmetric HSS matrix looks like

D_7 = diag(D_3, D_6) + diag(U_3, U_6) \begin{bmatrix} & B_3 \\ B_3^T & \end{bmatrix} diag(U_3, U_6)^T,

where D_3, D_6, U_3, U_6 are in nested forms just like (2.1)-(2.2) with i = 3 or 6. Since D_3 and D_6 are two-level HSS forms, the two-level HSS inversion above yields D_3^{-1} in (3.4) and also D_6^{-1}. The Ũ generators associated with the leaves are obtained. On the other hand, by treating D_7 itself as a two-level HSS form, we can get

(3.8)  D_7^{-1} = diag(G_3, G_6) + diag(Ũ_3, Ũ_6) D̄_7^{-1} diag(Ũ_3, Ũ_6)^T,

where G_3, G_6, Ũ_3, Ũ_6 are defined just like in (3.5) with j = 3 or 6, and D̄_7 is defined just like in (3.7) with i = 7. Let

Ĝ_7 = D̄_7^{-1} = \begin{bmatrix} Ĝ_{7;1,1} & Ĝ_{7;1,2} \\ Ĝ_{7;2,1} & Ĝ_{7;2,2} \end{bmatrix},

where Ĝ_7 is partitioned conformably with (3.8). Then

D_7^{-1} = \begin{bmatrix} G_3 + Ũ_3 Ĝ_{7;1,1} Ũ_3^T & Ũ_3 Ĝ_{7;1,2} Ũ_6^T \\ Ũ_6 Ĝ_{7;2,1} Ũ_3^T & G_6 + Ũ_6 Ĝ_{7;2,2} Ũ_6^T \end{bmatrix}.

Thus, we can set

(3.9)  B̃_3 = Ĝ_{7;1,2},   D̃_3 = G_3 + Ũ_3 Ĝ_{7;1,1} Ũ_3^T,   Ũ_3 = D_3^{-1} U_3 D̂_3^{-1},

where D̂_3 is defined just like in (3.6) with j = 3. Here, we focus on node 3 and its descendants. The study of node 6 and its descendants is similar. We then need to resolve the following issues:
1. D̄_7 and Ũ_3 involve D̂_3, and the computation of D̂_3 = U_3^T D_3^{-1} U_3 needs D_3^{-1} and a nested basis matrix U_3, which are not explicitly available;
2. Ũ_3 should appear as a nested basis matrix, so we need to find R̃_1 and R̃_2;
3. D̃_3 itself is a two-level HSS form, so we need to find D̃_1, D̃_2, B̃_1.
For these purposes, we introduce the following simple lemma.

Lemma 3.1. The matrices in (3.4)-(3.7) satisfy

U_j^T Ũ_j = I,   U_j^T G_j U_j = 0,   j = 1, 2,
diag(U_{c_1}, U_{c_2})^T D_i^{-1} diag(U_{c_1}, U_{c_2}) = D̄_i^{-1},   i = 3.
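Before turning to the proof, a quick numerical check of (3.4)-(3.7) and of the identities in Lemma 3.1 may be helpful. In the sketch below, the generators are hypothetical random data, and dense inverses are formed only for validation; the structured algorithm never forms them.

```python
import numpy as np

# Numerical check of the two-level inversion (3.4)-(3.7) and of Lemma 3.1.
rng = np.random.default_rng(2)
n1, n2, r = 7, 6, 2
D1 = rng.standard_normal((n1, n1)); D1 = D1 + D1.T + 10.0 * np.eye(n1)
D2 = rng.standard_normal((n2, n2)); D2 = D2 + D2.T + 10.0 * np.eye(n2)
U1, U2 = rng.standard_normal((n1, r)), rng.standard_normal((n2, r))
B1 = rng.standard_normal((r, r)) / 10.0
D3 = np.block([[D1, U1 @ B1 @ U2.T], [U2 @ B1.T @ U1.T, D2]])

inv = np.linalg.inv
Dh1, Dh2 = U1.T @ inv(D1) @ U1, U2.T @ inv(D2) @ U2          # (3.6)  D-hat_j
G1 = inv(D1) - inv(D1) @ U1 @ inv(Dh1) @ U1.T @ inv(D1)      # (3.5)  G_j
G2 = inv(D2) - inv(D2) @ U2 @ inv(Dh2) @ U2.T @ inv(D2)
Ut1, Ut2 = inv(D1) @ U1 @ inv(Dh1), inv(D2) @ U2 @ inv(Dh2)  # (3.5)  U-tilde_j
Dbar3 = np.block([[inv(Dh1), B1], [B1.T, inv(Dh2)]])         # (3.7)  D-bar_3

blkdiag = lambda X, Y: np.block([[X, np.zeros((X.shape[0], Y.shape[1]))],
                                 [np.zeros((Y.shape[0], X.shape[1])), Y]])
D3inv = blkdiag(G1, G2) + blkdiag(Ut1, Ut2) @ inv(Dbar3) @ blkdiag(Ut1, Ut2).T
assert np.allclose(D3inv, inv(D3))                           # (3.4)

# Lemma 3.1: U_j^T U-tilde_j = I, U_j^T G_j U_j = 0, and the reduced-matrix identity.
assert np.allclose(U1.T @ Ut1, np.eye(r)) and np.allclose(U2.T @ Ut2, np.eye(r))
assert np.allclose(U1.T @ G1 @ U1, 0) and np.allclose(U2.T @ G2 @ U2, 0)
assert np.allclose(blkdiag(U1, U2).T @ inv(D3) @ blkdiag(U1, U2), inv(Dbar3))
```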

10 9 J. XIA, Y. XI, S. CAULEY, AND V. BALAKRISHNAN Proof. From 3.5 and 3.6, we have U j Ũj = Uj D j U j ˆDj = U j G ju j = U j D j D j ˆD j ˆD j = I, U j ˆDj Uj D U j = Uj D j U j Uj D j U j ˆD j Uj D j U j = ˆD j From these two results and 3.4, for i =3wehave j ˆD j ˆD j ˆD j =0. diaguc,uc Di diagu c,u c =diaguc,uc [diagg c,g c +diagũc, Ũc D i diagũ c, Ũ c ] diagu c,u c =diaguc G c Ũ c,uc G c U c +diaguc Ũ c,uc Ũ c D i diagũ c U c, Ũ c U c = diag0, 0 + diagi,i D i diagi,i= D i. hen we resolve the three issues above Lemma 3.. First, we derive a convenient way to find ˆD 3 =U3 D3 U 3. his is based on the following lemma. Lemma 3.. D i in 3.7 satisfies 3.0 Ui D i U i = Ū i D i Rc Ū i with Ū i =, i =3. R c hus, ˆD i =Ū i D i Ū i which does not involve the nested forms D i,u i. Proof. According to the nested form of U i and Lemma 3., Ui D i U i = Rc Rc U U Di U Rc U R c = Ū i D i Ū i. Second, we find R and R in the nested form of Ũ3: R = D3 R U 3 ˆD 3 from 3.9. Ũ Ũ By Lemma 3., we can multiply diagu,u on the left of both sides to get R U = R U D3 U 3 ˆD U 3 = U D3 U R 3. U R = D 3 Ū 3 ˆD3 from Lemma 3. and Ū3 in 3.0. his gives a formula for computing R and R. hird, we find D, D, B. From 3.4, 3.9, and the nested form of Ũ3, wehave D 3 = G 3 + Ũ3Ĝ7;,Ũ 3 = D 3 =D 3 D 3 U 3 ˆD 3 U 3 D 3 +Ũ3Ĝ7;,Ũ 3 Ũ3 ˆD 3 Ũ3 + Ũ3Ĝ7;,Ũ 3 from 3.9 =diagg,g + diagũ, Ũ D 3 diagũ, Ũ [ D = diagg,g +diagũ, Ũ ˆD 3 Ũ3 ˆD 3 Ũ3 + Ũ3Ĝ7;,Ũ 3 R R R 3 ˆD 3 R R ] + Ĝ 7;, R R R diagũ, Ũ.

11 FAS SPARSE SELECED INVERSION 93 Define R Ḡ 3 = D 3 R = = D 3 D 3 D 3 Ĝ 3 = Ḡ3 + ˆD 3 R R D 3 Ū 3 ˆD3 ˆD 3 ˆD 3 Ū3 D 3, D 3 from 3. Ū 3 ˆD3 Ū3 R Ĝ3;, Ĝ Ĝ 7;, R R R 3;, Ĝ 3;, Ĝ 3;,, where Ĝ3 is partitioned conformably. hen D 3 =diagg,g + diagũ, ŨĜ3 diagũ, Ũ G + = ŨĜ3;,Ũ Ũ Ĝ 3;, Ũ Ũ Ĝ 3;, Ũ G + ŨĜ3;,Ũ. hus, we obtain the following generators: 3.4 D = G + ŨĜ3;,Ũ, D = G + ŨĜ3;,Ũ, B = Ĝ3;,. In general HSS inversion with more levels, the generators of F can be similarly found. he derivation is similar to the process above. Lemma 3.3 below shows some essential ideas by induction. Another way is based on a telescoping HSS representation in [6]. We do not repeat the details since they do not affect the understanding of our later discussions on selected inversion. he procedure can be organized into two traversals of the HSS tree. In a bottom-up traversal, define hierarchically 3.5 D i = D i, Ū i = U i, ˆDi =Ū i ˆDc B D i = c Rc Bc, Ū i =, ˆDi =Ū ˆDc R i c D i Ū i D i Ū i i: leaf i: nonleaf. he derivations above and Lemma 3.3 below indicate that, D i, Ūi can be understood as generators of an intermediate reduced HSS matrix [6, 39]. hen compute Rc 3.6 Ũ i = D i Ū i ˆDi i: leaf or = D i Ū i ˆDi i: nonleaf; R c see, e.g., 3.5 and 3.. Proceed with these steps for the nonleaf nodes i. When the node i =root is reached, compute only D i as in 3.5. In a top-down traversal of, we find the D, B generators of F.Fori =root, D i let Ĝi and partition Ĝi conformably following 3.5 as Ĝi;, Ĝ 3.7 Ĝ i = i;,. hen let Ĝ i;, Ĝ i;, 3.8 Bc = Ĝi;,. For a nonleaf node i<root, let 3.9 Ḡ i = D i D i Ū i ˆDi Ūi D i, Ĝ i = Ḡi + Rc R c Ĝ pari;j,j R c R c,

12 94 J. XIA, Y. XI, S. CAULEY, AND V. BALAKRISHNAN where j = or, depending on whether i is a left or a right child of pari. Ḡi can be understood as a diagonal block of the inverse of a reduced HSS matrix [6]; see, e.g., 3.5 and 3.. Ĝ i is the sum of Ḡ i and a low-rank contribution from the parent level; see, e.g., 3.3. hen partition Ĝi as in 3.7, and set B c as in 3.8. Repeat these until all the nonleaf nodes are visited. hen for each leaf i, define Ḡ i as in 3.9 and let 3.0 Di = Ḡi + ŨiĜpari;j,jŨ i, where j = or, depending on whether i is a left or a right child of pari; see, e.g., 3.4.henwehavetheHSSgenerators D i, Ũi, R i, B i of F Fast computation of the update matrices. he HSS inversion is applied to F in.6 when we compute the diagonal blocks of C. In addition, the information computed in the inversion of F can also help to speed up the computation of the update matrix S i in.0 in the structured multifrontal method, as well as some additional computations in the selected inversion. For this purpose, we first show that the results in Lemmas 3. and 3. can be generalized. Lemma 3.3. Let G i = D i 3. D i U i ˆDi Ui D i for a node i root, then Ui D i U i = Ū i D i Ū i = ˆD i, Ui Ũ i = I, Ui G i U i =0, diaguc,uc D i diagu c,u c = D i i: nonleaf. Proof. We only need to prove 3.. he reason is, once 3. holds, the other formulas can be proved in the same way as in the proof of Lemma 3.. We show Ui D i U i = Ū i D i Ū i = ˆD i by induction. Let i denote the subtree of associated with node i and its descendants. he induction is done on the number of levels l of i. he result holds for i with two levels, as in Lemma 3.. Assume the result holds for any subtree of with up to l levels. We show it also holds for i with l levels. Writing D i as a block diagonal plus a low-rank form see, e.g., 3.3 and applying the modified SMW formula 3. yield 3. D i = diagdc,dc diagdc U c,dc U c ] [H H B c Bc + H H diaguc Dc,Uc Dc, where 3.3 H = diag Uc Dc U c,uc Dc U c. According to the nested form of U i in., we have Ui D i U i = Rc Rc Rc H R R c Rc HH Rc H c R c + Rc Rc HH B c + H H Rc H R c B c = Rc Rc Uc Dc U c B c Bc Uc Dc U c Rc. R c

Fig. 3. A pictorial illustration of the formula in Theorem 3.4.

Since c_1 and c_2 are at level l - 1, by induction,

(3.24)  U_{c_1}^T D_{c_1}^{-1} U_{c_1} = Ū_{c_1}^T D̄_{c_1}^{-1} Ū_{c_1} = D̂_{c_1},   U_{c_2}^T D_{c_2}^{-1} U_{c_2} = Ū_{c_2}^T D̄_{c_2}^{-1} Ū_{c_2} = D̂_{c_2}.

Thus,

U_i^T D_i^{-1} U_i = \begin{bmatrix} R_{c_1}^T & R_{c_2}^T \end{bmatrix} \begin{bmatrix} D̂_{c_1}^{-1} & B_{c_1} \\ B_{c_1}^T & D̂_{c_2}^{-1} \end{bmatrix}^{-1} \begin{bmatrix} R_{c_1} \\ R_{c_2} \end{bmatrix} = Ū_i^T D̄_i^{-1} Ū_i.

Equation (3.22) illustrates an idea of reduced matrices just like that in [39]. Thus, to compute the update matrix S_i in (2.10), we can avoid directly using the HSS form of F_{i,i} (= D_{k_1}) in U_{k_1}^T F_{i,i}^{-1} U_{k_1}. The following result is a direct corollary of Lemma 3.3.

Theorem 3.4. Suppose D_i, U_i, etc., are the HSS generators of F_{i,i} in (2.6). Then

U_{k_1}^T F_{i,i}^{-1} U_{k_1} = Ū_{k_1}^T D̄_{k_1}^{-1} Ū_{k_1},

where the HSS tree of F_i has k nodes and D̄_{k_1} is given in (3.15) with i = k_1 in the HSS inversion of F_{i,i}. Therefore, S_i in (2.10) can be quickly computed as

S_i ≈ F_{N_i,N_i} - U_{k_2} B_{k_2} (Ū_{k_1}^T D̄_{k_1}^{-1} Ū_{k_1}) B_{k_2}^T U_{k_2}^T.

See Figure 3 for an illustration. This theorem indicates that D̄_{k_1} plays a role similar to the final reduced matrix defined in [39], where F_{i,i} is factorized by ULV-type algorithms [0], yet here we take advantage of HSS inversion. If F_{i,i} has size N and HSS rank r, this theorem can help to reduce the complexity of computing U_{k_1}^T F_{i,i}^{-1} U_{k_1} from O(r^2 N) (with HSS inversion) to only O(r^3) with the simple formula Ū_{k_1}^T D̄_{k_1}^{-1} Ū_{k_1}.

3.2. Basic ideas of the structured selected inversion. Similarly to the existing direct selected inversion methods in [3, 5, 6, 7], the extraction of diag(C) includes two stages, a forward one (block LDL factorization) and a backward one (inversion). The basic ideas of our structured selected inversion can be illustrated with a simple example. Consider a sparse symmetric matrix A after applying nested dissection, as well as its block LDL factorization:

A = \begin{bmatrix} A_{11} & & A_{31}^T \\ & A_{22} & A_{32}^T \\ A_{31} & A_{32} & A_{33} \end{bmatrix} = \begin{bmatrix} I & & \\ & I & \\ L_{31} & L_{32} & I \end{bmatrix} \begin{bmatrix} A_{11} & & \\ & A_{22} & \\ & & F_{33} \end{bmatrix} \begin{bmatrix} I & & L_{31}^T \\ & I & L_{32}^T \\ & & I \end{bmatrix},

where A_{33} corresponds to the top-level separator, and

L_{31} = A_{31} A_{11}^{-1},   L_{32} = A_{32} A_{22}^{-1},   F_{33} = A_{33} - L_{31} A_{11} L_{31}^T - L_{32} A_{22} L_{32}^T.

When A results from a problem with the low-rank property, F_{33} can be approximated by an HSS form, and the above factorization is performed recursively in a

structured way as the structured multifrontal factorization in section 2. Then the resulting factors are structured or data sparse. In the inversion stage, we have

(3.25)
A^{-1} = \begin{bmatrix} I & & -L_{31}^T \\ & I & -L_{32}^T \\ & & I \end{bmatrix} \begin{bmatrix} A_{11}^{-1} & & \\ & A_{22}^{-1} & \\ & & F_{33}^{-1} \end{bmatrix} \begin{bmatrix} I & & \\ & I & \\ -L_{31} & -L_{32} & I \end{bmatrix}
       = \begin{bmatrix} A_{11}^{-1} + L_{31}^T F_{33}^{-1} L_{31} & L_{31}^T F_{33}^{-1} L_{32} & -L_{31}^T F_{33}^{-1} \\ L_{32}^T F_{33}^{-1} L_{31} & A_{22}^{-1} + L_{32}^T F_{33}^{-1} L_{32} & -L_{32}^T F_{33}^{-1} \\ -F_{33}^{-1} L_{31} & -F_{33}^{-1} L_{32} & F_{33}^{-1} \end{bmatrix}.

Thus,

(3.26)  diag(A^{-1}) = \begin{bmatrix} diag(A_{11}^{-1} + L_{31}^T F_{33}^{-1} L_{31}) \\ diag(A_{22}^{-1} + L_{32}^T F_{33}^{-1} L_{32}) \\ diag(F_{33}^{-1}) \end{bmatrix},

where -F_{33}^{-1} L_{31}, -F_{33}^{-1} L_{32}, and F_{33}^{-1} are submatrices of A^{-1} corresponding to the nonzero blocks of L. This also means that the extraction of diag(A^{-1}) needs the computation of the off-diagonal blocks of A^{-1} that fall into the block nonzero pattern of the L factor from the LDL factorization [3].

With the structured factorization, L and F_{33} are approximated by data-sparse forms, and all the operations in (3.26) can be performed via structured (HSS or low-rank) operations. For example, the HSS inversion in section 3.1 can be used to compute F_{33}^{-1}, and multiplications of HSS matrices and low-rank matrices are used for F_{33}^{-1} L_{31} and F_{33}^{-1} L_{32}. The idea can then be recursively applied.

Remark 3.1. Since C is generally a dense matrix, a full direct inversion for (3.26) costs O(n^3), which is prohibitive for large n and is impractical. In Table IV of [7], some test examples are given. For small or modest n, the selected inversion is already at least 3 times faster than the full inversion, and often hundreds of times faster. For example, for the test matrix pwtk of size n = 217,918 in section 5, the selected inversion is 353 times faster. In this work, we will further show that our structured selected inversion is faster than the nonstructured selected inversion.

3.3. Structures within C and general structured selected inversion. In the general selected inversion, the first stage is a structured multifrontal LDL factorization as in section 2.2, and the second stage is a structured inversion. In the factorization stage, we traverse the assembly tree in its postorder, and the resulting factors are in data-sparse forms (HSS or low rank). We then focus on the inversion stage, where we traverse the assembly tree in its reverse postorder. We use C_{i,i} to denote the diagonal block C|_{s_i x s_i} of C in (3.1) corresponding to node i, and use C_{N_i,i} to denote the off-diagonal block C|_{s_{N_i} x s_i} of C, with N_i in Definition 2.1. See (3.25) for an example where, for i = 1, we have C_{1,1} corresponding to A_{11}^{-1} + L_{31}^T F_{33}^{-1} L_{31} and C_{N_1,1} corresponding to -F_{33}^{-1} L_{31}. For the node k = root, the corresponding diagonal block of C is C_{k,k} = F_k^{-1}.
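The following dense NumPy sketch carries out (3.25)-(3.26) on a small random matrix with the above block pattern (the (1,2) block is zero). The block sizes and data are hypothetical, and every dense operation here is exactly what the structured method replaces by an HSS or low-rank one.

```python
import numpy as np

# Dense illustration of (3.25)-(3.26): selected inversion of a 3x3 block matrix
# with zero (1,2) block (nested dissection pattern). Sizes/data are made up.
rng = np.random.default_rng(3)
n1, n2, n3 = 6, 5, 4
def spd(n):                                   # random symmetric positive definite block
    X = rng.standard_normal((n, n))
    return X @ X.T + n * np.eye(n)
A11, A22 = spd(n1), spd(n2)
A31, A32 = rng.standard_normal((n3, n1)), rng.standard_normal((n3, n2))
A33 = spd(n3) + A31 @ A31.T + A32 @ A32.T     # keeps A positive definite
A = np.block([[A11, np.zeros((n1, n2)), A31.T],
              [np.zeros((n2, n1)), A22, A32.T],
              [A31, A32, A33]])

inv = np.linalg.inv
# forward stage: block LDL^T factors and the last Schur complement F_33
L31, L32 = A31 @ inv(A11), A32 @ inv(A22)
F33 = A33 - L31 @ A11 @ L31.T - L32 @ A22 @ L32.T

# backward stage: only the blocks of C = A^{-1} matching the pattern of L
C33 = inv(F33)
C31, C32 = -C33 @ L31, -C33 @ L32             # off-diagonal blocks of A^{-1}
C11 = inv(A11) + L31.T @ C33 @ L31
C22 = inv(A22) + L32.T @ C33 @ L32

# (3.26): the assembled diagonal agrees with diag(A^{-1})
d = np.concatenate([np.diag(C11), np.diag(C22), np.diag(C33)])
assert np.allclose(d, np.diag(inv(A)))
```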

Apply the HSS inversion procedure in section 3.1 to the HSS form of the final frontal matrix F_k. The diagonal blocks of C_{k,k} are simply the D̃ generators of the resulting HSS form of C_{k,k} as in (3.20).

For a node i with i < k, we show a structured computation of C_{i,i}. Let l_i be the row index in A that the first row of A|_{s_i x s_i} in (2.4) corresponds to. The derivation of the representation of C_{i,i} follows directly from (3.25)-(3.26):

(3.27)  C_{i,i} = F_{i,i}^{-1} + (L|_{(l_{i+1}:n) x (l_i:l_{i+1}-1)})^T C|_{(l_{i+1}:n) x (l_{i+1}:n)} L|_{(l_{i+1}:n) x (l_i:l_{i+1}-1)}.

Since L|_{(l_{i+1}:n) x (l_i:l_{i+1}-1)} has many zero blocks following nested dissection, only the entries of C|_{(l_{i+1}:n) x (l_{i+1}:n)} within the block nonzero pattern of L|_{(l_{i+1}:n) x (l_i:l_{i+1}-1)} are needed to compute C_{i,i} [3]. Thus, following Definition 2.1 for N_i, we let

L_{N_i,i} = \begin{bmatrix} L|_{s_{j_1} x s_i} \\ L|_{s_{j_2} x s_i} \\ \vdots \\ L|_{s_{j_alpha} x s_i} \end{bmatrix},  where we suppose

(3.28)  N_i = {j_1, j_2, ..., j_alpha}.

We can similarly form a matrix C_{N_i,N_i} from C|_{(l_{i+1}:n) x (l_{i+1}:n)} so that the following formulas are used for the actual construction of C_{i,i} in (3.27):

(3.29)  C_{i,i} = F_{i,i}^{-1} + L_{N_i,i}^T C_{N_i,N_i} L_{N_i,i} = F_{i,i}^{-1} - L_{N_i,i}^T C_{N_i,i},  with

(3.30)  C_{N_i,i} = -C_{N_i,N_i} L_{N_i,i}.

The extraction of the diagonal blocks of C_{i,i} in (3.29) involves these steps:
1. apply HSS inversion to the HSS form of F_{i,i} to extract the diagonal generators of F_{i,i}^{-1};
2. compute C_{N_i,i} in (3.30) via the low-rank structure of L_{N_i,i} and the rank structure of C_{N_i,N_i} that has already been computed;
3. compute L_{N_i,i}^T C_{N_i,i} in (3.29) via the low-rank structures of L_{N_i,i} and C_{N_i,i}.
The first step is shown in section 3.1. We thus focus on the latter two steps.

First, consider L_{N_i,i}. Recall that F_{N_i,i} is in a low-rank form (2.9). Thus, L_{N_i,i} is also in a low-rank form (noticing the notation assumption above (3.1)):

(3.31)  L_{N_i,i} = F_{N_i,i} F_{i,i}^{-1} = U_{k_2} B_{k_2} U_{k_1}^T F_{i,i}^{-1}.

We need to compute U_{k_1}^T F_{i,i}^{-1}, where U_{k_1} is a nested basis matrix. Here, instead of using HSS solutions or HSS matrix-vector multiplications, we can compute U_{k_1}^T F_{i,i}^{-1} with a more compact form based on an idea similar to Theorem 3.4.

Theorem 3.5. With the notation in Lemma 3.3 and Theorem 3.4, we have

U_{k_1}^T F_{i,i}^{-1} = Ū_{k_1}^T D̄_{k_1}^{-1} diag(Ũ_{c_1}^T, Ũ_{c_2}^T),

where c_1 and c_2 are the left and right children of k_1, respectively.

Proof. Similarly to the proof of Lemma 3.3, we use induction and just show the general induction step U_i^T D_i^{-1} = Ū_i^T D̄_i^{-1} diag(Ũ_{c_1}^T, Ũ_{c_2}^T) for a nonleaf node i. According to (3.22), (3.23), and the nested form of U_i,

U_i^T D_i^{-1} = \begin{bmatrix} R_{c_1}^T & R_{c_2}^T \end{bmatrix} diag(U_{c_1}^T D_{c_1}^{-1}, U_{c_2}^T D_{c_2}^{-1})
  - \begin{bmatrix} R_{c_1}^T & R_{c_2}^T \end{bmatrix} diag(U_{c_1}^T D_{c_1}^{-1} U_{c_1}, U_{c_2}^T D_{c_2}^{-1} U_{c_2}) \left[ H - H \left( \begin{bmatrix} & B_{c_1} \\ B_{c_1}^T & \end{bmatrix} + H \right)^{-1} H \right] diag(U_{c_1}^T D_{c_1}^{-1}, U_{c_2}^T D_{c_2}^{-1}).

16 98 J. XIA, Y. XI, S. CAULEY, AND V. BALAKRISHNAN According to 3.3 and 3.4, H = diag ˆD c, ˆD c. hus, U i D i = Ū i ˆDc B c B c ˆDc diag ˆD cuc Dc, ˆD c Uc Dc = Ū i D i diag ˆD c Uc Dc, ˆD c Uc Dc. D c and D c are submatrices of D i and are also HSS. Let j and j be the children of c. By induction, ˆD c U c D c = ˆD c Ū c D c diagũ j, Ũ j = ˆD c Ūc D c diagũ j, Ũ j = R j R j diag Ũj, Ũ j =Ũ c, where 3.6 is used with i replaced by c. Similarly, we have ˆD c Uc Dc = Ũ c,and then the theorem holds. he benefit of this formula can be seen from a pictorial illustration similar to Figure 3. By this formula, L Ni,i in 3.3 can be written in a compact form L Ni,i = U kb k Ūk D k diagũ k, Ũ k. Note that at this point, we leave L Ni,i in the above low-rank form, which can be conveniently multiplied by the structured form of C Ni,N i later. Next, we study the structures of some blocks of C. For convenience, we write down the following simple lemma, which can be proven by definition; see, e.g., [0]. Lemma 3.6. If F is an invertible matrix with HSS rank bounded by r, thenf also has HSS rank bounded by r. Another lemma shows how the addition of matrices with the same off-diagonal nested bases preserves the structure, and is a direct extension of the results in [39, 4] and [40, Proposition 3.3]. Lemma 3.7. Assume F is an HSS matrix with generators D, B, U, V,etc.,and corresponds to an HSS tree with root k. Let H be any square matrix with size equal to the column size of the nested basis matrix U k. hen the following matrix is also an HSS matrix: F + U k HV k, and its U, V, R, W generators are the same as those of F,anditsD, B generators have the same sizes as and can be obtained by updating the D, B generators of F. We can now prove an important theorem that discloses the rank structures of C and C Ni,i. heorem 3.8. Assume all the frontal matrices in the multifrontal factorization of A can be approximated by HSS matrices with HSS ranks bounded by r, so that the factorization is the exact one of Ã. hen for C in 3., a C is an HSS matrix with HSS rank bounded by r; b C Ni,i is a low-rank form with rank bounded by r.

17 FAS SPARSE SELECED INVERSION 99 Proof. We prove a first. For a node i of,considerc in 3.9. he HSS structure of F follows from Lemma 3.6. According to 3.3 and heorem 3.5, 3.3 C = F + L N i,i C N i,n i L Ni,i = F + diagũk, Ũk [Bk k U k C Ni,N U kb i k k }{{ ] diagũ k }, Ũ k H Ūk D F + diagũk, Ũk H diagũ k, Ũ k. Ūk D his indicates that diagũk, Ũk gives a column basis of L N i,i C N i,n i L Ni,i. On the other hand, Ũk and Ũk are also the generators of F that give the column bases of its appropriate off-diagonal blocks. In another word, the off-diagonal blocks of F and L N i,i C N i,n i L Ni,i have the same nested basis information. More specifically, we can partition the right-hand side of 3.3 conformably into a block form: F C = F Ũk H H Ũ F F + k Ũ k H H Ũk F = + Ũk H Ũk F + Ũk H Ũk F + Ũk H Ũk F. + Ũk H Ũk According to Lemma 3.7, the, and, blocks of the above matrix are HSS forms, and the HSS generators are just those of F, except the entries but not the sizes of the D, B generators of F are updated. In addition, the, block is F + Ũk H Ũ k = Ũk B k + H Ũ k, and the size of B k + H is bounded by r. hus, C has the same Ũ, R generators as F, and the summation on the right-hand side of 3.3 does not increase the HSS rank, which remains bounded by r. For b, C Ni,i has a low-rank form C Ni,i = C Ni,N i L Ni,i = C Ni,N i U kb k Ūk D k diagũ k, Ũ k. Since B k is an HSS generator of F i and has size bounded by r, the right-hand side has rank bounded by r. herefore, the blocks of C Ni,N i in 3.9 can be conveniently represented by HSS or low-rank forms, which then participate in the computation of C. his is illustrated in Figure 4. he formula 3.3 in the proof above gives our structured method for computing C. For notational convenience, let 3.33 U Ni = C Ni,N i U k, B i = B k, V i = then Ūk D k diagũ k, Ũ k ; C Ni,i = U Ni B i V i, C = F L N i,ic Ni,i = F + V i B i U k U Ni B i V i.

18 300 J. XIA, Y. XI, S. CAULEY, AND V. BALAKRISHNAN i Structures in the factors ii Structures in block lower-triangular C Fig. 4. he rank structures in the LDL factors of à and in C, and the pieces marked in red that are needed to compute C in 3.9. We use the low-rank form 3.34 to form C Ni,i, which is then used in the computation of C as in he structured forms of both C Ni,i and C then participate in the later inversion steps associated with the descendants of node i. One task is to compute the product U Ni = C Ni,N i U k in Note that U k is a column basis matrix of an off-diagonal block of F i in.9, and is hierarchically represented by lower-level generators. hus, the cost of computing U Ni is comparable to the multiplication of an HSS matrix and a nested basis matrix. For simplicity, we first briefly show how to compute C Ni,N i U k with an explicit form of U k. Formeshes with local connectivity, N i in 3.8 usually has only a finite/small number of nodes. hus, this only involves a small number of HSS matrix-vector products and low-rank matrix-vector products. hat is, we partition U k into block rows U j, k, for j N i,so that the row size of U j, k, is equal to the row size of C j,j. Also partition U Ni into block C j,ni U k for j N i similarly holds for each j N i.hus, rows U i j 3.36 U i j = C j,j U j, k, t N i,t<j U j t B t Vt U t, k V j B j UN j Û j, k, where Ûj, k is obtained by stacking all the blocks U t, k for t N i, t > j. he first term on the right-hand side involves HSS matrix-vector products, and the remaining two terms are low-rank matrix-vector products. A fast HSS matrix-vector multiplication scheme can be found in [9]. U i j can then be quickly computed. After this, stack all the pieces U i j to get U Ni. Moreover, since U k is a nested basis matrix, the cost of multiplying C Ni,N i and U k may be lower than straightforward HSS matrix-vector multiplications. he is because of the existence of possible rank patterns across different levels of the HSS tree see the next section. hat is, in., U i may have a larger column size than U c or U c, i.e., the HSS block corresponding to i may have a higher rank than the individual HSS blocks corresponding to c and c. hus, multiplying a matrix by the nested form of U i may be cheaper than by the explicit form of U i. With multilevel hierarchy, the difference in the efficiency can be very significant.
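As a generic illustration of the block-row products behind (3.36), the sketch below multiplies a two-block matrix whose off-diagonal blocks are stored as outer products by a tall basis matrix, block row by block row, without ever assembling the off-diagonal blocks densely. The sizes, ranks, and blocks are hypothetical, and the diagonal blocks are kept dense here, whereas they would be HSS forms (with fast matrix-vector products) in the actual scheme.

```python
import numpy as np

# Block-row products with off-diagonal blocks kept in low-rank (outer-product) form.
rng = np.random.default_rng(4)
m1, m2, r, q = 9, 8, 2, 3
C11, C22 = rng.standard_normal((m1, m1)), rng.standard_normal((m2, m2))
P, Q = rng.standard_normal((m1, r)), rng.standard_normal((m2, r))   # C12 stored as P Q^T
Uk = rng.standard_normal((m1 + m2, q))                               # tall basis matrix
U1, U2 = Uk[:m1, :], Uk[m1:, :]

# block-row-by-block-row products, touching only the stored factors
Y1 = C11 @ U1 + P @ (Q.T @ U2)      # first block row of the product
Y2 = Q @ (P.T @ U1) + C22 @ U2      # second block row (C21 stored as Q P^T)
Y = np.vstack([Y1, Y2])

# validation against the explicitly assembled block matrix
C = np.block([[C11, P @ Q.T], [Q @ P.T, C22]])
assert np.allclose(Y, C @ Uk)
```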

19 FAS SPARSE SELECED INVERSION 30 able Major structured operations for computing C Ni,i and C. C Ni,i Formulas C Ni,N i U k U Ni Bi V i Structured operations Multiplication of HSS and nested basis matrices Multiplication of nested basis matrices F HSS inversion C V i B i U ku Ni Bi V i Multiplications of nested basis matrices F + V i B i U ku Ni Bi V i HSS generator update Overall, computing C Ni,i and C involves the structured operations in able. Some of the matrix products are related to the multiplications of HSS matrices. See [8, section 3.] for an HSS multiplication scheme. hat is, the generators of the product matrix can be computed following a bottom-up traversal of the HSS tree and a top-down traversal. he latter traversal is not needed during the multiplication of an HSS matrix and a nested basis matrix. Interested readers can find the detailed derivation of the algorithm in [8, proof of heorem 3..]. he results of the matrix multiplications in able are also structured, i.e., can be represented by nested basis matrices. One way to understand this is to use the idea of the fast multipole method FMM [9] when A corresponds to a discretized Green s function, together with the fact that C Ni,i is an off-diagonal block of A. Sometimes, after the multiplications, a recompression step [38, section 5] may be used to recover compact structured forms. hat is, in a bottom-up traversal of the HSS tree, the basis matrices are compressed, and related upper-level generators are modified. hen in a top-down traversal, the B i generators are compressed, and the related lower-level generators are modified. As an example, for the separator i = in Figures and 4, we have N i = {3, 7, 5}. hen C 3,3 V 3 B 3 C N,N = U 3 7 U 3 5 B3 V3 U 3 7 U 3 5 C 7,7 V 7 B 7 UN 7 U N7 B7 V 7 C 5,5 See Figure 5 for an illustration of the multiplication of C N,N with U k, whereanested form of U k is shown in Figure 6. he overall structured selected inversion scheme is summarized in Algorithm. As in the structured multifrontal factorization, a switching level l s is involved for the optimization of the complexity in the next section. Remark 3.. hroughout the multifrontal method coupled with nested dissection, we only need to work on L Ni,i and C Ni,N i, which contain the local subblocks of L and C, respectively. his naturally takes advantage of the nonzero pattern of L and has nice data locality. On the other hand, the method in [5, 6] uses the indices of the nonzero entries of L and it is not immediately clear which dense pieces need to be put together. In addition, in [6], to find C Ni,i, the blocks C j,i for all the ancestors j of i are visited. Here, we only visit N i, which is often just a small subset of the ancestors. N i is determined in the symbolic factorization stage after nested dissection. Remark 3.3. In terms of the memory, in lines 5, 6, 3, and 4 of Algorithm, the storage for the dense or structured blocks of L and Λ may be used to at least partly store C Ni,i and C. hus, the overall memory for the extraction of diagc is.

20 30 J. XIA, Y. XI, S. CAULEY, AND V. BALAKRISHNAN Fig. 5. he structure of C N N related to Figure 4 zoomed in and the multiplication of C N N with U k, whereu k is shown in Figure 6. Fig. 6. A nested basis form of U k in Figure 5. Algorithm. Structured selected inversion for extracting the diagonal blocks of A. : procedure SINV : Find the diagonal generators of the HSS inverse C k,k = F k for k root 3: for node/separator i of from k todo 4: if i is at level l < l s then Dense extraction below the switching level l s 5: C Ni,i C Ni,N i L Ni,i From 3.30; e.g., 3.5 6: C F L N i,i C N i,i From 3.9; e.g., 3.5 7: else Dense matrix L Ni,i below l s 8: Find the HSS generators D i and Ũi of F Structured F 9: Form B i and V i as in 3.33 Structures of C Ni,i 0: for node/separator j N i do Structures of C Ni,i : Form U Nj as in 3.33 with i replaced by j With appropriate changes of the detailed matrices : end for 3: C Ni,i U Ni Bi V i Low-rank approximation of C Ni,i as in able 4: C F + V i B i U ku Ni Bi V i As in able 5: end if 6: end for 7: end procedure

Table 2
Factorization cost xi_fact, inversion cost xi_inv, and storage sigma_mem for the extraction of diag(C) for A discretized on a regular mesh, with the methods in [3, 5, 6, 7] or the dense multifrontal method.

  Mesh                          xi_fact       xi_inv        sigma_mem
  2D m x m, n = m^2             O(n^{1.5})    O(n^{1.5})    O(n log n)
  3D m x m x m, n = m^3         O(n^2)        O(n^2)        O(n^{4/3})

Table 3
Inversion cost xi~_inv and storage sigma~_mem of the structured method for the extraction of diag(C) for A discretized on a regular mesh, where r is the maximal HSS rank of all the frontal matrices. In practice, the HSS ranks may depend on n. The results here based on a maximum rank bound r would then overestimate the actual costs.

  Mesh                          xi~_inv         sigma~_mem
  2D m x m, n = m^2             O(rn)           O(rn)
  3D m x m x m, n = m^3         O(r^{3/2} n)    O(r^{1/2} n)

4. Complexity optimization. For the purpose of later comparison, we restate the complexity and memory usage for the direct selected inversion algorithms in [3, 5, 6, 7] as well as the inversion algorithm based on the dense multifrontal method. Assume A is discretized on a 2D m x m mesh or a 3D m x m x m mesh. Then the cost xi_fact of factorizing A, the cost xi_inv of extracting diag(C), and the memory sigma_mem are reported in Table 2. Here, sigma_mem measures the storage for the LDL factors and the blocks of C corresponding to the block nonzero pattern of the factors. Later, we also use tilde notation for the counts of the structured method. For example, xi~_inv denotes the cost of extracting diag(C) with the structured inversion.

Then we turn to the analysis of our structured method when A has the low-rank property. The analysis is similar to that for the structured direct solvers in [39, 40]. We first present the traditional case when the HSS ranks of all the frontal matrices in the LDL factorization are bounded by r, and then show some similar results by letting r grow following certain patterns. In the following remarks, the cost of factorizing A and the memory may be slightly different from those in [39, 40], since we try to optimize the inversion cost.

Remark 4.1. Suppose the structured multifrontal factorization and Algorithm 1 are applied to a discretized matrix A on a 2D m x m mesh (n = m^2) or a 3D m x m x m mesh (n = m^3), and the HSS ranks of the frontal matrices in the multifrontal method are bounded by r. Choose the switching level l_s = O(log m) - O(log r) of the assembly tree so that the inversion costs before and after l_s are the same. Then after xi~_fact = O(rn log n) flops in two dimensions and xi~_fact = O(rn^{4/3}) flops in three dimensions for the structured factorization, the optimal structured inversion costs xi~_inv are O(rn) in two dimensions and O(r^{3/2} n) in three dimensions; the memory requirements sigma~_mem are O(n log r) + O(n log log n) in two dimensions and O(r^{1/2} n) in three dimensions (see Table 3).
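For a rough sense of the gap between Tables 2 and 3, the short sketch below evaluates the leading inversion-cost terms for a few hypothetical values of n and a fixed rank bound r; constants and polylogarithmic refinements are ignored, so the numbers are only indicative.

```python
# Indicative comparison of the leading inversion-cost terms in Tables 2 and 3
# (constants and polylog factors ignored; r is a hypothetical rank bound).
r = 20
for n in (10**6, 10**7, 10**8):
    dense2d, struct2d = n**1.5, r * n            # 2D: O(n^{1.5}) vs. O(rn)
    dense3d, struct3d = n**2,   r**1.5 * n       # 3D: O(n^2)     vs. O(r^{3/2} n)
    print(f"n = {n:.0e}:  2D {dense2d:.1e} vs {struct2d:.1e},"
          f"  3D {dense3d:.1e} vs {struct3d:.1e}")
```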


More information

INTERCONNECTED HIERARCHICAL RANK-STRUCTURED METHODS FOR DIRECTLY SOLVING AND PRECONDITIONING THE HELMHOLTZ EQUATION

INTERCONNECTED HIERARCHICAL RANK-STRUCTURED METHODS FOR DIRECTLY SOLVING AND PRECONDITIONING THE HELMHOLTZ EQUATION MATHEMATCS OF COMPUTATON Volume 00, Number 0, Pages 000 000 S 0025-5718XX0000-0 NTERCONNECTED HERARCHCAL RANK-STRUCTURED METHODS FOR DRECTLY SOLVNG AND PRECONDTONNG THE HELMHOLTZ EQUATON XAO LU, JANLN

More information

5.1 Banded Storage. u = temperature. The five-point difference operator. uh (x, y + h) 2u h (x, y)+u h (x, y h) uh (x + h, y) 2u h (x, y)+u h (x h, y)

5.1 Banded Storage. u = temperature. The five-point difference operator. uh (x, y + h) 2u h (x, y)+u h (x, y h) uh (x + h, y) 2u h (x, y)+u h (x h, y) 5.1 Banded Storage u = temperature u= u h temperature at gridpoints u h = 1 u= Laplace s equation u= h u = u h = grid size u=1 The five-point difference operator 1 u h =1 uh (x + h, y) 2u h (x, y)+u h

More information

Notes on Householder QR Factorization

Notes on Householder QR Factorization Notes on Householder QR Factorization Robert A van de Geijn Department of Computer Science he University of exas at Austin Austin, X 7872 rvdg@csutexasedu September 2, 24 Motivation A fundamental problem

More information

arxiv: v1 [cs.na] 20 Jul 2015

arxiv: v1 [cs.na] 20 Jul 2015 AN EFFICIENT SOLVER FOR SPARSE LINEAR SYSTEMS BASED ON RANK-STRUCTURED CHOLESKY FACTORIZATION JEFFREY N. CHADWICK AND DAVID S. BINDEL arxiv:1507.05593v1 [cs.na] 20 Jul 2015 Abstract. Direct factorization

More information

A direct solver for elliptic PDEs in three dimensions based on hierarchical merging of Poincaré-Steklov operators

A direct solver for elliptic PDEs in three dimensions based on hierarchical merging of Poincaré-Steklov operators (1) A direct solver for elliptic PDEs in three dimensions based on hierarchical merging of Poincaré-Steklov operators S. Hao 1, P.G. Martinsson 2 Abstract: A numerical method for variable coefficient elliptic

More information

Sparse factorization using low rank submatrices. Cleve Ashcraft LSTC 2010 MUMPS User Group Meeting April 15-16, 2010 Toulouse, FRANCE

Sparse factorization using low rank submatrices. Cleve Ashcraft LSTC 2010 MUMPS User Group Meeting April 15-16, 2010 Toulouse, FRANCE Sparse factorization using low rank submatrices Cleve Ashcraft LSTC cleve@lstc.com 21 MUMPS User Group Meeting April 15-16, 21 Toulouse, FRANCE ftp.lstc.com:outgoing/cleve/mumps1 Ashcraft.pdf 1 LSTC Livermore

More information

A fast randomized algorithm for computing a Hierarchically Semi-Separable representation of a matrix

A fast randomized algorithm for computing a Hierarchically Semi-Separable representation of a matrix A fast randomized algorithm for computing a Hierarchically Semi-Separable representation of a matrix P.G. Martinsson, Department of Applied Mathematics, University of Colorado at Boulder Abstract: Randomized

More information

Numerical Methods I: Numerical linear algebra

Numerical Methods I: Numerical linear algebra 1/3 Numerical Methods I: Numerical linear algebra Georg Stadler Courant Institute, NYU stadler@cimsnyuedu September 1, 017 /3 We study the solution of linear systems of the form Ax = b with A R n n, x,

More information

Optimal Interface Conditions for an Arbitrary Decomposition into Subdomains

Optimal Interface Conditions for an Arbitrary Decomposition into Subdomains Optimal Interface Conditions for an Arbitrary Decomposition into Subdomains Martin J. Gander and Felix Kwok Section de mathématiques, Université de Genève, Geneva CH-1211, Switzerland, Martin.Gander@unige.ch;

More information

9. Numerical linear algebra background

9. Numerical linear algebra background Convex Optimization Boyd & Vandenberghe 9. Numerical linear algebra background matrix structure and algorithm complexity solving linear equations with factored matrices LU, Cholesky, LDL T factorization

More information

An H-LU Based Direct Finite Element Solver Accelerated by Nested Dissection for Large-scale Modeling of ICs and Packages

An H-LU Based Direct Finite Element Solver Accelerated by Nested Dissection for Large-scale Modeling of ICs and Packages PIERS ONLINE, VOL. 6, NO. 7, 2010 679 An H-LU Based Direct Finite Element Solver Accelerated by Nested Dissection for Large-scale Modeling of ICs and Packages Haixin Liu and Dan Jiao School of Electrical

More information

Multilevel low-rank approximation preconditioners Yousef Saad Department of Computer Science and Engineering University of Minnesota

Multilevel low-rank approximation preconditioners Yousef Saad Department of Computer Science and Engineering University of Minnesota Multilevel low-rank approximation preconditioners Yousef Saad Department of Computer Science and Engineering University of Minnesota SIAM CSE Boston - March 1, 2013 First: Joint work with Ruipeng Li Work

More information

Solving linear systems (6 lectures)

Solving linear systems (6 lectures) Chapter 2 Solving linear systems (6 lectures) 2.1 Solving linear systems: LU factorization (1 lectures) Reference: [Trefethen, Bau III] Lecture 20, 21 How do you solve Ax = b? (2.1.1) In numerical linear

More information

Research Reports on Mathematical and Computing Sciences

Research Reports on Mathematical and Computing Sciences ISSN 1342-284 Research Reports on Mathematical and Computing Sciences Exploiting Sparsity in Linear and Nonlinear Matrix Inequalities via Positive Semidefinite Matrix Completion Sunyoung Kim, Masakazu

More information

Key words. spectral method, structured matrix, low-rank property, matrix-free direct solver, linear complexity

Key words. spectral method, structured matrix, low-rank property, matrix-free direct solver, linear complexity SIAM J. SCI. COMPUT. Vol. 0, No. 0, pp. 000 000 c XXXX Society for Industrial and Applied Mathematics FAST STRUCTURED DIRECT SPECTRAL METHODS FOR DIFFERENTIAL EQUATIONS WITH VARIABLE COEFFICIENTS, I. THE

More information

An Efficient Solver for Sparse Linear Systems based on Rank-Structured Cholesky Factorization

An Efficient Solver for Sparse Linear Systems based on Rank-Structured Cholesky Factorization An Efficient Solver for Sparse Linear Systems based on Rank-Structured Cholesky Factorization David Bindel Department of Computer Science Cornell University 15 March 2016 (TSIMF) Rank-Structured Cholesky

More information

Numerical Linear Algebra

Numerical Linear Algebra Numerical Linear Algebra Decompositions, numerical aspects Gerard Sleijpen and Martin van Gijzen September 27, 2017 1 Delft University of Technology Program Lecture 2 LU-decomposition Basic algorithm Cost

More information

Program Lecture 2. Numerical Linear Algebra. Gaussian elimination (2) Gaussian elimination. Decompositions, numerical aspects

Program Lecture 2. Numerical Linear Algebra. Gaussian elimination (2) Gaussian elimination. Decompositions, numerical aspects Numerical Linear Algebra Decompositions, numerical aspects Program Lecture 2 LU-decomposition Basic algorithm Cost Stability Pivoting Cholesky decomposition Sparse matrices and reorderings Gerard Sleijpen

More information

Improvements for Implicit Linear Equation Solvers

Improvements for Implicit Linear Equation Solvers Improvements for Implicit Linear Equation Solvers Roger Grimes, Bob Lucas, Clement Weisbecker Livermore Software Technology Corporation Abstract Solving large sparse linear systems of equations is often

More information

LU Factorization. Marco Chiarandini. DM559 Linear and Integer Programming. Department of Mathematics & Computer Science University of Southern Denmark

LU Factorization. Marco Chiarandini. DM559 Linear and Integer Programming. Department of Mathematics & Computer Science University of Southern Denmark DM559 Linear and Integer Programming LU Factorization Marco Chiarandini Department of Mathematics & Computer Science University of Southern Denmark [Based on slides by Lieven Vandenberghe, UCLA] Outline

More information

Bindel, Spring 2016 Numerical Analysis (CS 4220) Notes for

Bindel, Spring 2016 Numerical Analysis (CS 4220) Notes for Cholesky Notes for 2016-02-17 2016-02-19 So far, we have focused on the LU factorization for general nonsymmetric matrices. There is an alternate factorization for the case where A is symmetric positive

More information

This can be accomplished by left matrix multiplication as follows: I

This can be accomplished by left matrix multiplication as follows: I 1 Numerical Linear Algebra 11 The LU Factorization Recall from linear algebra that Gaussian elimination is a method for solving linear systems of the form Ax = b, where A R m n and bran(a) In this method

More information

Practical Linear Algebra: A Geometry Toolbox

Practical Linear Algebra: A Geometry Toolbox Practical Linear Algebra: A Geometry Toolbox Third edition Chapter 12: Gauss for Linear Systems Gerald Farin & Dianne Hansford CRC Press, Taylor & Francis Group, An A K Peters Book www.farinhansford.com/books/pla

More information

ENHANCING PERFORMANCE AND ROBUSTNESS OF ILU PRECONDITIONERS BY BLOCKING AND SELECTIVE TRANSPOSITION

ENHANCING PERFORMANCE AND ROBUSTNESS OF ILU PRECONDITIONERS BY BLOCKING AND SELECTIVE TRANSPOSITION SIAM J. SCI. COMPUT. Vol. 39, No., pp. A303 A332 c 207 Society for Industrial and Applied Mathematics ENHANCING PERFORMANCE AND ROBUSTNESS OF ILU PRECONDITIONERS BY BLOCKING AND SELECTIVE TRANSPOSITION

More information

Bridging the gap between flat and hierarchical low-rank matrix formats: the multilevel BLR format

Bridging the gap between flat and hierarchical low-rank matrix formats: the multilevel BLR format Bridging the gap between flat and hierarchical low-rank matrix formats: the multilevel BLR format Amestoy, Patrick and Buttari, Alfredo and L Excellent, Jean-Yves and Mary, Theo 2018 MIMS EPrint: 2018.12

More information

Exploiting off-diagonal rank structures in the solution of linear matrix equations

Exploiting off-diagonal rank structures in the solution of linear matrix equations Stefano Massei Exploiting off-diagonal rank structures in the solution of linear matrix equations Based on joint works with D. Kressner (EPFL), M. Mazza (IPP of Munich), D. Palitta (IDCTS of Magdeburg)

More information

Key words. numerical linear algebra, direct methods, Cholesky factorization, sparse matrices, mathematical software, matrix updates.

Key words. numerical linear algebra, direct methods, Cholesky factorization, sparse matrices, mathematical software, matrix updates. ROW MODIFICATIONS OF A SPARSE CHOLESKY FACTORIZATION TIMOTHY A. DAVIS AND WILLIAM W. HAGER Abstract. Given a sparse, symmetric positive definite matrix C and an associated sparse Cholesky factorization

More information

A Sparse QS-Decomposition for Large Sparse Linear System of Equations

A Sparse QS-Decomposition for Large Sparse Linear System of Equations A Sparse QS-Decomposition for Large Sparse Linear System of Equations Wujian Peng 1 and Biswa N. Datta 2 1 Department of Math, Zhaoqing University, Zhaoqing, China, douglas peng@yahoo.com 2 Department

More information

Scientific Computing

Scientific Computing Scientific Computing Direct solution methods Martin van Gijzen Delft University of Technology October 3, 2018 1 Program October 3 Matrix norms LU decomposition Basic algorithm Cost Stability Pivoting Pivoting

More information

Contents. Preface... xi. Introduction...

Contents. Preface... xi. Introduction... Contents Preface... xi Introduction... xv Chapter 1. Computer Architectures... 1 1.1. Different types of parallelism... 1 1.1.1. Overlap, concurrency and parallelism... 1 1.1.2. Temporal and spatial parallelism

More information

Linear Algebra Section 2.6 : LU Decomposition Section 2.7 : Permutations and transposes Wednesday, February 13th Math 301 Week #4

Linear Algebra Section 2.6 : LU Decomposition Section 2.7 : Permutations and transposes Wednesday, February 13th Math 301 Week #4 Linear Algebra Section. : LU Decomposition Section. : Permutations and transposes Wednesday, February 1th Math 01 Week # 1 The LU Decomposition We learned last time that we can factor a invertible matrix

More information

The antitriangular factorisation of saddle point matrices

The antitriangular factorisation of saddle point matrices The antitriangular factorisation of saddle point matrices J. Pestana and A. J. Wathen August 29, 2013 Abstract Mastronardi and Van Dooren [this journal, 34 (2013) pp. 173 196] recently introduced the block

More information

Algorithms for Fast Linear System Solving and Rank Profile Computation

Algorithms for Fast Linear System Solving and Rank Profile Computation Algorithms for Fast Linear System Solving and Rank Profile Computation by Shiyun Yang A thesis presented to the University of Waterloo in fulfillment of the thesis requirement for the degree of Master

More information

Downloaded 08/28/12 to Redistribution subject to SIAM license or copyright; see

Downloaded 08/28/12 to Redistribution subject to SIAM license or copyright; see SIAM J. MATRIX ANAL. APPL. Vol. 26, No. 3, pp. 621 639 c 2005 Society for Industrial and Applied Mathematics ROW MODIFICATIONS OF A SPARSE CHOLESKY FACTORIZATION TIMOTHY A. DAVIS AND WILLIAM W. HAGER Abstract.

More information

7. LU factorization. factor-solve method. LU factorization. solving Ax = b with A nonsingular. the inverse of a nonsingular matrix

7. LU factorization. factor-solve method. LU factorization. solving Ax = b with A nonsingular. the inverse of a nonsingular matrix EE507 - Computational Techniques for EE 7. LU factorization Jitkomut Songsiri factor-solve method LU factorization solving Ax = b with A nonsingular the inverse of a nonsingular matrix LU factorization

More information

c 2017 Society for Industrial and Applied Mathematics

c 2017 Society for Industrial and Applied Mathematics SIAM J. SCI. COMPUT. Vol. 39, No. 4, pp. A1710 A1740 c 2017 Society for Industrial and Applied Mathematics ON THE COMPLEXITY OF THE BLOCK LOW-RANK MULTIFRONTAL FACTORIZATION PATRICK AMESTOY, ALFREDO BUTTARI,

More information

AMS526: Numerical Analysis I (Numerical Linear Algebra) Lecture 23: GMRES and Other Krylov Subspace Methods; Preconditioning

AMS526: Numerical Analysis I (Numerical Linear Algebra) Lecture 23: GMRES and Other Krylov Subspace Methods; Preconditioning AMS526: Numerical Analysis I (Numerical Linear Algebra) Lecture 23: GMRES and Other Krylov Subspace Methods; Preconditioning Xiangmin Jiao SUNY Stony Brook Xiangmin Jiao Numerical Analysis I 1 / 18 Outline

More information

CLASSICAL ITERATIVE METHODS

CLASSICAL ITERATIVE METHODS CLASSICAL ITERATIVE METHODS LONG CHEN In this notes we discuss classic iterative methods on solving the linear operator equation (1) Au = f, posed on a finite dimensional Hilbert space V = R N equipped

More information

BLAS: Basic Linear Algebra Subroutines Analysis of the Matrix-Vector-Product Analysis of Matrix-Matrix Product

BLAS: Basic Linear Algebra Subroutines Analysis of the Matrix-Vector-Product Analysis of Matrix-Matrix Product Level-1 BLAS: SAXPY BLAS-Notation: S single precision (D for double, C for complex) A α scalar X vector P plus operation Y vector SAXPY: y = αx + y Vectorization of SAXPY (αx + y) by pipelining: page 8

More information

Numerical Methods I Non-Square and Sparse Linear Systems

Numerical Methods I Non-Square and Sparse Linear Systems Numerical Methods I Non-Square and Sparse Linear Systems Aleksandar Donev Courant Institute, NYU 1 donev@courant.nyu.edu 1 MATH-GA 2011.003 / CSCI-GA 2945.003, Fall 2014 September 25th, 2014 A. Donev (Courant

More information

A robust multilevel approximate inverse preconditioner for symmetric positive definite matrices

A robust multilevel approximate inverse preconditioner for symmetric positive definite matrices DICEA DEPARTMENT OF CIVIL, ENVIRONMENTAL AND ARCHITECTURAL ENGINEERING PhD SCHOOL CIVIL AND ENVIRONMENTAL ENGINEERING SCIENCES XXX CYCLE A robust multilevel approximate inverse preconditioner for symmetric

More information

Multilevel Low-Rank Preconditioners Yousef Saad Department of Computer Science and Engineering University of Minnesota. Modelling 2014 June 2,2014

Multilevel Low-Rank Preconditioners Yousef Saad Department of Computer Science and Engineering University of Minnesota. Modelling 2014 June 2,2014 Multilevel Low-Rank Preconditioners Yousef Saad Department of Computer Science and Engineering University of Minnesota Modelling 24 June 2,24 Dedicated to Owe Axelsson at the occasion of his 8th birthday

More information

Domain decomposition on different levels of the Jacobi-Davidson method

Domain decomposition on different levels of the Jacobi-Davidson method hapter 5 Domain decomposition on different levels of the Jacobi-Davidson method Abstract Most computational work of Jacobi-Davidson [46], an iterative method suitable for computing solutions of large dimensional

More information

(17) (18)

(17) (18) Module 4 : Solving Linear Algebraic Equations Section 3 : Direct Solution Techniques 3 Direct Solution Techniques Methods for solving linear algebraic equations can be categorized as direct and iterative

More information

IMPROVING THE PERFORMANCE OF SPARSE LU MATRIX FACTORIZATION USING A SUPERNODAL ALGORITHM

IMPROVING THE PERFORMANCE OF SPARSE LU MATRIX FACTORIZATION USING A SUPERNODAL ALGORITHM IMPROVING THE PERFORMANCE OF SPARSE LU MATRIX FACTORIZATION USING A SUPERNODAL ALGORITHM Bogdan OANCEA PhD, Associate Professor, Artife University, Bucharest, Romania E-mail: oanceab@ie.ase.ro Abstract:

More information

Bindel, Fall 2016 Matrix Computations (CS 6210) Notes for

Bindel, Fall 2016 Matrix Computations (CS 6210) Notes for 1 Logistics Notes for 2016-08-26 1. Our enrollment is at 50, and there are still a few students who want to get in. We only have 50 seats in the room, and I cannot increase the cap further. So if you are

More information

LU Factorization with Panel Rank Revealing Pivoting and Its Communication Avoiding Version

LU Factorization with Panel Rank Revealing Pivoting and Its Communication Avoiding Version LU Factorization with Panel Rank Revealing Pivoting and Its Communication Avoiding Version Amal Khabou James Demmel Laura Grigori Ming Gu Electrical Engineering and Computer Sciences University of California

More information

MATRIX ALGEBRA AND SYSTEMS OF EQUATIONS. + + x 1 x 2. x n 8 (4) 3 4 2

MATRIX ALGEBRA AND SYSTEMS OF EQUATIONS. + + x 1 x 2. x n 8 (4) 3 4 2 MATRIX ALGEBRA AND SYSTEMS OF EQUATIONS SYSTEMS OF EQUATIONS AND MATRICES Representation of a linear system The general system of m equations in n unknowns can be written a x + a 2 x 2 + + a n x n b a

More information

On the Skeel condition number, growth factor and pivoting strategies for Gaussian elimination

On the Skeel condition number, growth factor and pivoting strategies for Gaussian elimination On the Skeel condition number, growth factor and pivoting strategies for Gaussian elimination J.M. Peña 1 Introduction Gaussian elimination (GE) with a given pivoting strategy, for nonsingular matrices

More information

9. Numerical linear algebra background

9. Numerical linear algebra background Convex Optimization Boyd & Vandenberghe 9. Numerical linear algebra background matrix structure and algorithm complexity solving linear equations with factored matrices LU, Cholesky, LDL T factorization

More information

Preliminary/Qualifying Exam in Numerical Analysis (Math 502a) Spring 2012

Preliminary/Qualifying Exam in Numerical Analysis (Math 502a) Spring 2012 Instructions Preliminary/Qualifying Exam in Numerical Analysis (Math 502a) Spring 2012 The exam consists of four problems, each having multiple parts. You should attempt to solve all four problems. 1.

More information

Solving an Elliptic PDE Eigenvalue Problem via Automated Multi-Level Substructuring and Hierarchical Matrices

Solving an Elliptic PDE Eigenvalue Problem via Automated Multi-Level Substructuring and Hierarchical Matrices Solving an Elliptic PDE Eigenvalue Problem via Automated Multi-Level Substructuring and Hierarchical Matrices Peter Gerds and Lars Grasedyck Bericht Nr. 30 März 2014 Key words: automated multi-level substructuring,

More information

THE H 2 -matrix is a general mathematical framework [1],

THE H 2 -matrix is a general mathematical framework [1], Accuracy Directly Controlled Fast Direct Solutions of General H -Matrices and ts Application to Electrically Large ntegral-equation-based Electromagnetic Analysis Miaomiao Ma, Student Member, EEE, and

More information

SPARSE signal representations have gained popularity in recent

SPARSE signal representations have gained popularity in recent 6958 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 57, NO. 10, OCTOBER 2011 Blind Compressed Sensing Sivan Gleichman and Yonina C. Eldar, Senior Member, IEEE Abstract The fundamental principle underlying

More information

Direct Methods for Solving Linear Systems. Simon Fraser University Surrey Campus MACM 316 Spring 2005 Instructor: Ha Le

Direct Methods for Solving Linear Systems. Simon Fraser University Surrey Campus MACM 316 Spring 2005 Instructor: Ha Le Direct Methods for Solving Linear Systems Simon Fraser University Surrey Campus MACM 316 Spring 2005 Instructor: Ha Le 1 Overview General Linear Systems Gaussian Elimination Triangular Systems The LU Factorization

More information

Scientific Computing WS 2018/2019. Lecture 9. Jürgen Fuhrmann Lecture 9 Slide 1

Scientific Computing WS 2018/2019. Lecture 9. Jürgen Fuhrmann Lecture 9 Slide 1 Scientific Computing WS 2018/2019 Lecture 9 Jürgen Fuhrmann juergen.fuhrmann@wias-berlin.de Lecture 9 Slide 1 Lecture 9 Slide 2 Simple iteration with preconditioning Idea: Aû = b iterative scheme û = û

More information

Numerical Methods in Matrix Computations

Numerical Methods in Matrix Computations Ake Bjorck Numerical Methods in Matrix Computations Springer Contents 1 Direct Methods for Linear Systems 1 1.1 Elements of Matrix Theory 1 1.1.1 Matrix Algebra 2 1.1.2 Vector Spaces 6 1.1.3 Submatrices

More information

QR FACTORIZATIONS USING A RESTRICTED SET OF ROTATIONS

QR FACTORIZATIONS USING A RESTRICTED SET OF ROTATIONS QR FACTORIZATIONS USING A RESTRICTED SET OF ROTATIONS DIANNE P. O LEARY AND STEPHEN S. BULLOCK Dedicated to Alan George on the occasion of his 60th birthday Abstract. Any matrix A of dimension m n (m n)

More information

30.3. LU Decomposition. Introduction. Prerequisites. Learning Outcomes

30.3. LU Decomposition. Introduction. Prerequisites. Learning Outcomes LU Decomposition 30.3 Introduction In this Section we consider another direct method for obtaining the solution of systems of equations in the form AX B. Prerequisites Before starting this Section you

More information

Computation fundamentals of discrete GMRF representations of continuous domain spatial models

Computation fundamentals of discrete GMRF representations of continuous domain spatial models Computation fundamentals of discrete GMRF representations of continuous domain spatial models Finn Lindgren September 23 2015 v0.2.2 Abstract The fundamental formulas and algorithms for Bayesian spatial

More information

Linear Solvers. Andrew Hazel

Linear Solvers. Andrew Hazel Linear Solvers Andrew Hazel Introduction Thus far we have talked about the formulation and discretisation of physical problems...... and stopped when we got to a discrete linear system of equations. Introduction

More information

Revised Simplex Method

Revised Simplex Method DM545 Linear and Integer Programming Lecture 7 Marco Chiarandini Department of Mathematics & Computer Science University of Southern Denmark Outline 1. 2. 2 Motivation Complexity of single pivot operation

More information

Sparse Linear Systems. Iterative Methods for Sparse Linear Systems. Motivation for Studying Sparse Linear Systems. Partial Differential Equations

Sparse Linear Systems. Iterative Methods for Sparse Linear Systems. Motivation for Studying Sparse Linear Systems. Partial Differential Equations Sparse Linear Systems Iterative Methods for Sparse Linear Systems Matrix Computations and Applications, Lecture C11 Fredrik Bengzon, Robert Söderlund We consider the problem of solving the linear system

More information

Algorithms to solve block Toeplitz systems and. least-squares problems by transforming to Cauchy-like. matrices

Algorithms to solve block Toeplitz systems and. least-squares problems by transforming to Cauchy-like. matrices Algorithms to solve block Toeplitz systems and least-squares problems by transforming to Cauchy-like matrices K. Gallivan S. Thirumalai P. Van Dooren 1 Introduction Fast algorithms to factor Toeplitz matrices

More information

Lecture 2 INF-MAT : , LU, symmetric LU, Positve (semi)definite, Cholesky, Semi-Cholesky

Lecture 2 INF-MAT : , LU, symmetric LU, Positve (semi)definite, Cholesky, Semi-Cholesky Lecture 2 INF-MAT 4350 2009: 7.1-7.6, LU, symmetric LU, Positve (semi)definite, Cholesky, Semi-Cholesky Tom Lyche and Michael Floater Centre of Mathematics for Applications, Department of Informatics,

More information

SYMBOLIC AND EXACT STRUCTURE PREDICTION FOR SPARSE GAUSSIAN ELIMINATION WITH PARTIAL PIVOTING

SYMBOLIC AND EXACT STRUCTURE PREDICTION FOR SPARSE GAUSSIAN ELIMINATION WITH PARTIAL PIVOTING SYMBOLIC AND EXACT STRUCTURE PREDICTION FOR SPARSE GAUSSIAN ELIMINATION WITH PARTIAL PIVOTING LAURA GRIGORI, JOHN R. GILBERT, AND MICHEL COSNARD Abstract. In this paper we consider two structure prediction

More information

arxiv: v1 [math.ra] 13 Jan 2009

arxiv: v1 [math.ra] 13 Jan 2009 A CONCISE PROOF OF KRUSKAL S THEOREM ON TENSOR DECOMPOSITION arxiv:0901.1796v1 [math.ra] 13 Jan 2009 JOHN A. RHODES Abstract. A theorem of J. Kruskal from 1977, motivated by a latent-class statistical

More information

Block Low-Rank (BLR) approximations to improve multifrontal sparse solvers

Block Low-Rank (BLR) approximations to improve multifrontal sparse solvers Block Low-Rank (BLR) approximations to improve multifrontal sparse solvers Joint work with Patrick Amestoy, Cleve Ashcraft, Olivier Boiteau, Alfredo Buttari and Jean-Yves L Excellent, PhD started on October

More information

Intrinsic products and factorizations of matrices

Intrinsic products and factorizations of matrices Available online at www.sciencedirect.com Linear Algebra and its Applications 428 (2008) 5 3 www.elsevier.com/locate/laa Intrinsic products and factorizations of matrices Miroslav Fiedler Academy of Sciences

More information

8.3 Householder QR factorization

8.3 Householder QR factorization 83 ouseholder QR factorization 23 83 ouseholder QR factorization A fundamental problem to avoid in numerical codes is the situation where one starts with large values and one ends up with small values

More information

STAT 309: MATHEMATICAL COMPUTATIONS I FALL 2018 LECTURE 13

STAT 309: MATHEMATICAL COMPUTATIONS I FALL 2018 LECTURE 13 STAT 309: MATHEMATICAL COMPUTATIONS I FALL 208 LECTURE 3 need for pivoting we saw that under proper circumstances, we can write A LU where 0 0 0 u u 2 u n l 2 0 0 0 u 22 u 2n L l 3 l 32, U 0 0 0 l n l

More information

Fast Structured Spectral Methods

Fast Structured Spectral Methods Spectral methods HSS structures Fast algorithms Conclusion Fast Structured Spectral Methods Yingwei Wang Department of Mathematics, Purdue University Joint work with Prof Jie Shen and Prof Jianlin Xia

More information

Kasetsart University Workshop. Multigrid methods: An introduction

Kasetsart University Workshop. Multigrid methods: An introduction Kasetsart University Workshop Multigrid methods: An introduction Dr. Anand Pardhanani Mathematics Department Earlham College Richmond, Indiana USA pardhan@earlham.edu A copy of these slides is available

More information

The amount of work to construct each new guess from the previous one should be a small multiple of the number of nonzeros in A.

The amount of work to construct each new guess from the previous one should be a small multiple of the number of nonzeros in A. AMSC/CMSC 661 Scientific Computing II Spring 2005 Solution of Sparse Linear Systems Part 2: Iterative methods Dianne P. O Leary c 2005 Solving Sparse Linear Systems: Iterative methods The plan: Iterative

More information

Incomplete LU Preconditioning and Error Compensation Strategies for Sparse Matrices

Incomplete LU Preconditioning and Error Compensation Strategies for Sparse Matrices Incomplete LU Preconditioning and Error Compensation Strategies for Sparse Matrices Eun-Joo Lee Department of Computer Science, East Stroudsburg University of Pennsylvania, 327 Science and Technology Center,

More information

Ann. Funct. Anal. 5 (2014), no. 2, A nnals of F unctional A nalysis ISSN: (electronic) URL:

Ann. Funct. Anal. 5 (2014), no. 2, A nnals of F unctional A nalysis ISSN: (electronic) URL: Ann Funct Anal 5 (2014), no 2, 127 137 A nnals of F unctional A nalysis ISSN: 2008-8752 (electronic) URL:wwwemisde/journals/AFA/ THE ROOTS AND LINKS IN A CLASS OF M-MATRICES XIAO-DONG ZHANG This paper

More information

ECE133A Applied Numerical Computing Additional Lecture Notes

ECE133A Applied Numerical Computing Additional Lecture Notes Winter Quarter 2018 ECE133A Applied Numerical Computing Additional Lecture Notes L. Vandenberghe ii Contents 1 LU factorization 1 1.1 Definition................................. 1 1.2 Nonsingular sets

More information

Dense LU factorization and its error analysis

Dense LU factorization and its error analysis Dense LU factorization and its error analysis Laura Grigori INRIA and LJLL, UPMC February 2016 Plan Basis of floating point arithmetic and stability analysis Notation, results, proofs taken from [N.J.Higham,

More information

Effective Resistance and Schur Complements

Effective Resistance and Schur Complements Spectral Graph Theory Lecture 8 Effective Resistance and Schur Complements Daniel A Spielman September 28, 2015 Disclaimer These notes are not necessarily an accurate representation of what happened in

More information