FAST SPARSE SELECTED INVERSION


SIAM J. MATRIX ANAL. APPL. Vol. 36, No. 3, c 2015 Society for Industrial and Applied Mathematics

FAST SPARSE SELECTED INVERSION

JIANLIN XIA, YUANZHE XI, STEPHEN CAULEY, AND VENKATARAMANAN BALAKRISHNAN

Abstract. We propose a fast structured selected inversion method for extracting the diagonal blocks of the inverse of a sparse symmetric matrix A, using the multifrontal method and rank structures. When A arises from the discretization of some PDEs and has a low-rank property (the intermediate dense matrices in the factorization have small off-diagonal numerical ranks), structured approximations of the diagonal blocks and certain off-diagonal blocks of A^{-1} that are needed to find the diagonal blocks of A^{-1} can be quickly computed. A structured multifrontal LDL factorization is first computed for A with a forward traversal of an assembly tree, which yields a sequence of local data-sparse factors. The factors are used in a backward traversal of the tree for the structured inversion. The intermediate operations in the inversion are performed in hierarchically semiseparable or low-rank forms. With the assumptions of data sparsity and appropriate rank conditions, the theoretical structured inversion cost is proportional to the matrix size n times a low-degree polylogarithmic function of n after structured factorizations. The memory counts are similar. In comparison, existing direct selected inversion methods cost O(n^{3/2}) flops in two dimensions and O(n^2) flops in three dimensions for both the factorization and the inversion, with O(n^{4/3}) memory in three dimensions. Additional formulas for efficient structured operations are also derived. Numerical tests on two- and three-dimensional discretized PDEs and more general sparse matrices are done to demonstrate the performance.

Key words. fast selected inversion, data sparsity, structured multifrontal method, low-rank property, HSS matrix, linear complexity

AMS subject classifications. 15A23, 65F05, 65F30, 65F50

DOI. 10.1137/...X

1. Introduction. Extracting selected entries (often the diagonal) of the inverse of a sparse matrix, usually called selected inversion, is critical in many scientific computing problems. Examples include preconditioning, uncertainty quantification in risk analysis [3], electronic structure calculations within the density functional theory framework [6], and simulations of nanotransistors and silicon nanowires [8]. The diagonal entries of the inverse are useful in providing significant insights into the original problem or its solution.

The goal of this paper is to present an efficient structured method for computing the diagonal of A^{-1} as a column vector, denoted by diag(A^{-1}), as well as the diagonal blocks of A^{-1}, for a large n x n sparse symmetric matrix A. The method also produces some off-diagonal blocks of A^{-1}. For convenience, we usually just mention diag(A^{-1}).

In recent years, a lot of effort has been made in the extraction of diag(A^{-1}). Since A^{-1} is often fully dense, a brute-force formation of A^{-1} is generally impractical. On the other hand, it is possible to find diag(A^{-1}) without forming the entire inverse.

* Received by the editors February 8, 2014; accepted for publication (in revised form) by S. Le Borne July 7, 2015; published electronically September 2015.
+ Department of Mathematics and Department of Computer Science, Purdue University, West Lafayette, IN (xiaj@math.purdue.edu, yxi@math.purdue.edu). The research of the first author was supported in part by an NSF CAREER Award DMS-1255416 and an NSF grant DMS-1115572.
+ Athinoula A. Martinos Center for Biomedical Imaging, Department of Radiology, Massachusetts General Hospital, Harvard University, Charlestown, MA 02129 (stcauley@nmr.mgh.harvard.edu).
+ School of Electrical and Computer Engineering, Purdue University, West Lafayette, IN (ragu@ecn.purdue.edu).

If A is diagonally dominant or positive definite, A^{-1} may have many small entries. Based on this property, a probing method is proposed in [33]. It exploits the pattern of a sparsified A^{-1} together with some standard graph theories, and computes diag(A^{-1}) by solving a sequence of linear systems with a preconditioned Krylov subspace method. Later, several approaches were proposed for more general matrices. The fast inverse with nested dissection method in [3, 4] and the selected inversion method in [5] use domain decomposition and compute some hierarchical Schur complements of the interior points for each subdomain. This is followed by the extraction of the diagonal entries in a top-down pass. The method Selinv in [6, 7] uses a supernodal left-looking LDL factorization of A to improve the efficiency. The work in [] focuses on the computation of a subset of A^{-1} by accessing only part of the factors, where the LU or LDL factorization of A is held in out-of-core storage. All these methods in [, 3, 5, 6] are direct methods.

For iterative methods, a Lanczos-type algorithm is first used in [35]. Later, a divide-and-conquer (DC) method and a domain decomposition (DD) method are presented in [34]. The DC method assumes that the matrix can be recursively decomposed into a block-diagonal plus low-rank form, where the decomposed problem is solved and corrected by the Sherman-Morrison-Woodbury (SMW) formula at each recursion level. The DD method solves each local subdomain problem and then modifies the result by a global Schur complement. Both methods use iterative solvers and sparse approximation techniques to speed up the computations.

Our scheme for the selected inversion is a direct method. As in [5, 6, 7], the scheme includes two stages, a block LDL factorization stage and an inversion stage. We incorporate various sparse and structured matrix techniques, especially a structured multifrontal factorization and a structured selected inversion, to gain significant efficiency and storage benefits, as outlined below.

1. General sparse matrices and the multifrontal method. Our method is applicable to symmetric discretized matrices on both two-dimensional (2D) and three-dimensional (3D) domains, as well as more general symmetric sparse matrices, as long as certain rank structures exist, as explained in the next item. Just like in modern sparse direct solvers, we apply the nested dissection ordering [4] to the mesh or adjacency graph by calling some graph partitioning tools. Thus, our inversion method does not rely on the special shape of the computational domain, and is more generally applicable than those in [5, 34]. To enhance the data locality of the later inversion, we use the multifrontal method [3] to compute a block LDL factorization of A. This method converts the overall sparse factorization into a sequence of operations on some local dense matrices called frontal matrices, following a nice tree structure called the assembly tree. This makes it convenient to manage and access data in the inversion.

2. Fast structured factorization stage. To speed up the factorization and the later inversion, we further incorporate rank structured techniques. For problems such as some discretized elliptic/elasticity equations and Helmholtz equations with low or medium frequencies, the dense intermediate matrices in the sparse factorization often have certain rank structures (see, e.g., [, , 6, 7, 9, 3, 4]). For these cases, the problem or the discretized matrix A is often said to have a low-rank property.
We perform the LDL factorization of A with the structured multifrontal methods in [39, 40, 4], which further approximate the frontal matrices in the multifrontal method by hierarchically semiseparable (HSS) forms [0, 4]. Such HSS forms take much less storage than the original dense ones. The local dense operations are thus converted into a series of fast structured ones. The complexity of the LDL factorization is

O(n) times a low-degree polylogarithmic function of n in two dimensions, and O(n) to O(n^{4/3}) times a low-degree polylogarithmic function of n in three dimensions, depending on certain rank conditions. Moreover, after the factorization, the factors are in data-sparse forms and with storage roughly proportional to n, which makes the later efficient inversion feasible. Related to the later inversion, we also show a fast Schur complement computation strategy in Theorem 3.4 using a concept similar to the reduced HSS matrix in [39]. The formula takes advantage of an HSS inverse computation that is needed in the inversion stage, and significantly saves the Schur complement computation cost in the factorization stage.

3. Fast structured selected inversion stage. After the structured multifrontal factorization, the factors are represented by a sequence of HSS or low-rank forms. This yields a structured approximation A~ to A. The inversion procedure is performed on these data-sparse factors instead of the original dense ones. In Theorem 3.8, we show the rank structures in the diagonal blocks and selected off-diagonal blocks of A~^{-1}. They are either HSS or low-rank forms. The major operations of the inversion are then HSS inversions, multiplications of HSS and low-rank matrices, and low-rank updates, which can all be quickly performed. This thus significantly improves the efficiency and the storage over the standard direct selected inversion. In Theorem 3.5, we also derive another formula to quickly apply the inverse of the leading part of an HSS form to its off-diagonal part, as needed in the inversion.

4. Nearly linear theoretical complexity for selected inversion. The complexity of the inversion algorithm is analyzed with a complexity optimization strategy and a rank relaxation idea in [39]. Unlike in [39, 4], where the factorization cost is optimized, here we minimize the selected inversion cost. A switching level in the assembly tree is used to shift from dense local inversions to HSS ones. The structured selected inversion has almost linear complexity for the discretized matrices with the low-rank property in both two and three dimensions. More specifically, if the off-diagonal numerical ranks of the intermediate frontal matrices are bounded by r, then the selected inversion cost is O(rn) in two dimensions and O(r^{3/2} n) in three dimensions. If the off-diagonal numerical ranks grow with n following the rank patterns in Tables 4-6, then the theoretical selected inversion cost and the storage are proportional to n times low-degree polylogarithmic functions of n. In contrast, the methods in [3, 5, 6, 7] cost O(n^{3/2}) in two dimensions and O(n^2) in three dimensions, for both the LDL factorization and the selected inversion, and need O(n^{4/3}) storage in three dimensions.

Due to the data sparsity, our method has the potential to be extended to the fast extraction of general off-diagonal entries of A^{-1}. Numerical tests in terms of discretized Helmholtz equations in both two and three dimensions as well as various more general problems from a sparse matrix collection are performed. Significant performance gain is observed for the structured selected inversion over the standard direct one.

We would like to mention that an alternative way is to use H- or H^2-matrices [, 6, 7, 7, 0, ], especially for 2D and 3D discretized PDEs. That is, the discretized operators can be first approximated by H- or H^2-matrices and then a related inversion procedure is applied.
In fact, the methods in [7, ] involving nested dissection and the one in [0] involving weak admissibility conditions share some concepts similar to ours. In particular, if H^2-matrix techniques are applied to certain algebraic operators, strict O(n) inversion complexity is possible. Here, by using the multifrontal method, we seek to take advantage of its data locality and related well-studied graph techniques for sparse matrices.

The remaining sections of the paper are organized as follows. Section 2 briefly reviews the structured multifrontal factorization with HSS techniques. Section 3 describes the structured selected inversion algorithm in detail. We discuss the complexity optimization in section 4. The numerical results are shown in section 5. Section 6 concludes the paper. In the presentation, we use the following notation:
- A|_{s x t} represents a submatrix of A specified by the row index set s and the column index set t, and A|_s consists of the selected rows of A specified by s;
- diag(D) denotes the diagonal entries of D as a column vector; on the other hand, diag(D_1, D_2) denotes a block diagonal matrix with the diagonal blocks D_1 and D_2;
- T represents a full postordered binary tree with its nodes denoted by i = 1, 2, ..., root, where root is the root;
- c_1 and c_2 denote the left and the right children of a nonleaf node i in T, respectively;
- sib(i) and par(i) denote the sibling and the parent of a node i, respectively;
- boldface symbols such as F and S are for the dense matrices inside the multifrontal factorization, and boldface i is for a node of the assembly tree.

2. Structured multifrontal LDL factorization. We first briefly review the structured multifrontal method in [39] and also a block LDL variation, which will be used in the factorization stage before the selected inversion.

2.1. HSS matrix and algorithms. The structured multifrontal method incorporates HSS structures into the multifrontal method. An N x N HSS matrix F with a corresponding HSS tree T can be defined as follows [0, 38, 4]. Let each node i of a full binary tree T be associated with a consecutive index set t_i \subset I = {1 : N}, which satisfies t_i \cup t_{sib(i)} = t_{par(i)}, t_i \cap t_{sib(i)} = \emptyset, and t_{root} = I. The index sets t_i associated with all the leaves i together form I. An HSS matrix F is given by F = D_{root}, where D_i is recursively defined for each nonleaf node i and its children c_1 and c_2 as

(2.1)  D_i = F|_{t_i x t_i} = \begin{bmatrix} D_{c_1} & U_{c_1} B_{c_1} V_{c_2}^T \\ U_{c_2} B_{c_2} V_{c_1}^T & D_{c_2} \end{bmatrix},

(2.2)  U_i = \begin{bmatrix} U_{c_1} R_{c_1} \\ U_{c_2} R_{c_2} \end{bmatrix},  V_i = \begin{bmatrix} V_{c_1} W_{c_1} \\ V_{c_2} W_{c_2} \end{bmatrix}.

The size |t_i| of D_i satisfies |t_i| = |t_{c_1}| + |t_{c_2}|. Notice that this structure is essentially the same as the matrix family M_{k,tau} used in [0] based on a weak admissibility condition. Here, D_i, U_i, R_i, etc., are called the HSS generators associated with node i. Due to (2.2), we also call U_i, V_i associated with a nonleaf node i nested basis matrices. The HSS rank of F is defined to be

r = max_{i = 1, 2, ..., root} max( rank F|_{t_i x (I \ t_i)}, rank F|_{(I \ t_i) x t_i} ),

where F|_{t_i x (I \ t_i)} and F|_{(I \ t_i) x t_i} are called the ith HSS block row and column, respectively. An illustration can be found in Figure 2. If F is symmetric, then U_i = V_i, R_i = W_i, B_{c_2} = B_{c_1}^T.

To construct an HSS form, the HSS blocks are compressed hierarchically in a bottom-up traversal of T [6, 4]. F is partitioned following the index sets t_i. For each

Fig. 1. Illustration of the nested dissection ordering [4] of a general mesh and a two-layer tree structure [39, 4]: (i) mesh; (ii) outer assembly tree and inner HSS trees. The outer tree in (ii) is the assembly tree, and each inner tree is an HSS tree. Each leaf node of the assembly tree corresponds to the interior points of a subdomain and each nonleaf node corresponds to a separator or interface.

leaf i, a QR or rank-revealing QR factorization is computed: F|_{t_i x (I \ t_i)} = U_i H_i. For a nonleaf node i, appropriate subblocks of H_{c_1} and H_{c_2} are merged and compressed to yield [R_{c_1}; R_{c_2}]. Similar operations are applied to the HSS block columns to obtain V_i and W_i. After such compression, B_i can be conveniently identified. The overall HSS construction from dense F costs O(rN^2) flops. Later, this HSS construction can be applied to the frontal matrices in the multifrontal method. To avoid such dense HSS construction, randomized methods can be used [40].

If r is small, we can quickly perform various HSS operations. For example, ULV-type algorithms [0, 4] can be used to factorize F in O(r^2 N) flops, and linear system solution with the factors costs only O(rN) flops. Similarly, the multiplication and the explicit inversion of HSS matrices [0, 6] can be done in O(r^2 N) flops. Even if the HSS blocks at different hierarchical levels have ranks growing with the block sizes, the factorization complexity may still be quite satisfactory, as long as the growth follows certain patterns [4, , 38]; see section 4. Recently, HSS techniques were embedded into sparse matrix computations and helped the development of some fast direct solvers and efficient preconditioners [39, 40, 4]. The sparse factorization described in the next subsection is an example.

2.2. Structured multifrontal block LDL factorization. The multifrontal method [3] can be used to compute a block LDL factorization of A:

(2.3)  A = L Lambda L^T.

To improve the efficiency, the fast structured sparse solver in [39] can be conveniently modified to compute an approximate multifrontal block LDL factorization. That is, structured approximations to L and Lambda will be computed. This is briefly reviewed here as a background for the later structured selected inversion.

The matrix A is first reordered with nested dissection to reduce fill-in [4]. The variables or mesh points are grouped into separators, which are used to recursively divide the mesh or adjacency graph into smaller pieces. As in [39], we use some graph partitioning tools [5, 30] to partition the graph, so that our method is not restricted to any particular domain shape or mesh structure. For illustration, an example is shown in Figure 1(i). The multifrontal method performs the sparse factorization via some local factorizations of a series of smaller dense frontal matrices. A postordered assembly tree is formed to organize the elimination of the separators. Each node of the assembly tree corresponds

to a separator in the mesh. Later, we do not distinguish between a node of the assembly tree and a separator in the mesh. Label the nodes/separators as i = 1, 2, ..., root. See the outer tree in Figure 1(ii). Let s_i be the index set of the mesh points in separator i, and N_i be the set of neighbors [39] of a node i, defined as follows.

Definition 2.1. A neighbor j of node i is an ancestor node of i in the assembly tree satisfying either (1) A|_{s_j x s_i} has nonzero entries, or (2) a nonzero entry or fill-in is introduced into the (s_j, s_i) block due to the elimination of a descendant of i in the factorization, i.e., A|_{s_j x s_i} = 0 but L|_{s_j x s_i} has nonzero entries.

For example, in Figure 1, N_1 = {3, 7, 15}, N_3 = {7, 15}. Note that N_3 includes separator 15 because the elimination of separator 1 connects separators 3 and 15.

Before reviewing the structured multifrontal method, we emphasize the following:
- The global sparse factorization is governed by the multifrontal framework, which follows the assembly tree. HSS techniques are applied locally to the intermediate frontal matrices only. That is, each local HSS tree corresponds to an individual node/separator in the global assembly tree; see Figure 1(ii).
- The two types of trees (the assembly tree and the HSS trees) are independent of each other. In an HSS tree, a parent node corresponds to an index set that includes the child index sets. In the assembly tree, a parent node and its children correspond to different mesh point sets. That is, each node corresponds to an independent separator in the mesh, regardless of the parent/child relationship.

The structured multifrontal scheme proceeds as follows. For each node i, let

(2.4)  F_i^0 = \begin{bmatrix} A|_{s_i x s_i} & (A|_{s_{N_i} x s_i})^T \\ A|_{s_{N_i} x s_i} & 0 \end{bmatrix},

where s_{N_i} can be understood similarly to s_i. If i is a leaf, the associated frontal matrix is simply F_i = F_i^0. If i is a nonleaf node with children c_1 and c_2, form a frontal matrix F_i by an assembly operation called extend-add [3]:

(2.5)  F_i = F_i^0 <+> S_{c_1} <+> S_{c_2},

where S_{c_1} and S_{c_2} are called update matrices and are obtained from early steps (similar to (2.8) below), and the extend-add operator <+> means that the matrices are permuted and extended to match the global indices in {s_i, s_{N_i}}.

The structured multifrontal LDL factorization follows the framework in [39] (partially structured) or [40, 4] (fully structured). Partition F_i as

(2.6)  F_i = \begin{bmatrix} F_{i,i} & F_{N_i,i}^T \\ F_{N_i,i} & F_{N_i,N_i} \end{bmatrix},

so that F_{i,i} corresponds to the index set s_i. Construct an HSS approximation to F_i with generators D_i, U_i, etc., so that F_{i,i} and F_{N_i,N_i} correspond to two sibling nodes k_1 and k_2 of the HSS tree, respectively; see Figure 2. Next, compute a block LDL factorization

F_i = \begin{bmatrix} I & \\ L_{N_i,i} & I \end{bmatrix} \begin{bmatrix} F_{i,i} & \\ & S_i \end{bmatrix} \begin{bmatrix} I & L_{N_i,i}^T \\ & I \end{bmatrix},

where

(2.7)  L_{N_i,i} = F_{N_i,i} F_{i,i}^{-1},

Fig. 2. Illustration of an HSS form and the corresponding HSS tree used for a frontal matrix F_i.

and S_i is the update matrix of the form

(2.8)  S_i = F_{N_i,N_i} - F_{N_i,i} F_{i,i}^{-1} F_{N_i,i}^T.

After the HSS approximation of F_i, F_{N_i,i}, as one of its off-diagonal blocks, looks like

(2.9)  F_{N_i,i} ≈ U_{k_2} B_{k_2} U_{k_1}^T.

We then have

(2.10)  S_i ≈ F_{N_i,N_i} - U_{k_2} B_{k_2} (U_{k_1}^T F_{i,i}^{-1} U_{k_1}) B_{k_2}^T U_{k_2}^T.

Remark 2.1. HSS approximation errors are extensively studied in [37]. For example, if all the HSS blocks are compressed with a relative accuracy tau, then the relative HSS approximation error in the Frobenius norm is O(sqrt(r) log N * tau). In other words, the error accumulation factor is O(sqrt(r) log N). This may possibly even be reduced to O(log N) [5, Corollary 6.8]. If all the frontal matrices are approximated, we expect a similar effect in the error accumulation, due to the similarity in the hierarchical structures of the assembly tree and the HSS tree. That is, the overall accumulation factor is likely a low-order power of O(sqrt(r) log n). A more precise estimation will be conducted in the future for all the factorization steps. Numerical tests indicate that the overall approximation error is usually well controlled.

Since we are interested in the selected inversion, unlike the method in [39], an HSS inverse F_{i,i}^{-1} is computed here (see section 3.1). Theorem 3.4 below indicates that the information in the inversion of F_{i,i} can be used to quickly form (2.10). In practice, a switching level l_s [39, 4] is also involved, so that standard dense factorizations are used when a node of the assembly tree is below level l_s; this is to optimize the complexity (see section 4).
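As a concrete (if purely dense) illustration of the elimination step in (2.6)-(2.8), the NumPy sketch below forms L_{N_i,i} and the update matrix S_i for a small randomly generated frontal matrix. The sizes and data are hypothetical, and no HSS compression is involved; in the structured method, F_{i,i} would be an HSS form and F_{N_i,i} would be kept in the low-rank form (2.9).

```python
import numpy as np

# A minimal dense sketch of one frontal elimination step, following (2.6)-(2.8).
# The frontal matrix F_i and the block sizes below are hypothetical test data.
rng = np.random.default_rng(0)
ni, nN = 8, 5                                   # |s_i| and |s_{N_i}| (made up)
F = rng.standard_normal((ni + nN, ni + nN))
F = F + F.T + 20.0 * np.eye(ni + nN)            # symmetric, well conditioned

F_ii = F[:ni, :ni]                              # F_{i,i}
F_Ni = F[ni:, :ni]                              # F_{N_i,i}
F_NN = F[ni:, ni:]                              # F_{N_i,N_i}

L_Ni = F_Ni @ np.linalg.inv(F_ii)               # (2.7): L_{N_i,i} = F_{N_i,i} F_{i,i}^{-1}
S_i  = F_NN - L_Ni @ F_Ni.T                     # (2.8): update matrix (Schur complement)

# Check the block LDL^T identity below (2.6): F_i = L diag(F_{i,i}, S_i) L^T.
L = np.block([[np.eye(ni), np.zeros((ni, nN))],
              [L_Ni,       np.eye(nN)      ]])
D = np.block([[F_ii,               np.zeros((ni, nN))],
              [np.zeros((nN, ni)), S_i              ]])
assert np.allclose(L @ D @ L.T, F)
```

In the structured scheme, F_{i,i}^{-1} is never formed densely; Theorem 3.4 below replaces the product U_{k_1}^T F_{i,i}^{-1} U_{k_1} in (2.10) by a much smaller one.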

Remark 2.2. The structured multifrontal LDL factorization we implement involves a shortcut in [39] which does not affect the performance of the later inversion. The shortcut is to use dense update matrices, so that dense frontal matrices are formed first and then approximated by HSS forms. This has O(rN^2) complexity for local HSS compression, and makes the overall multifrontal factorization cost suboptimal, but is much simpler to use. The reasons for this shortcut are clarified in [39]. This multifrontal algorithm is designed to take quite general sparse matrices as inputs, without specific information on the PDE or geometry. If the update matrices are in HSS forms, they may potentially need to be permuted arbitrarily in the assembly operation (2.5). Designing such a general scheme would be unnecessarily sophisticated. However, when the update matrices are kept dense, (2.5) is straightforward.

As shown in [39], for 2D and 3D problems satisfying certain rank patterns, the difference in the complexity between this factorization with the shortcut and a fully structured factorization with all HSS local operations is insignificant. For both versions, the factorization complexity is up to O(n log n) in two dimensions and up to O(n^{4/3} log n) in three dimensions. Most importantly, the resulting factors are still structured, just like in a fully structured multifrontal method. Therefore, this shortcut does not affect the complexity of the later selected inversion, which is the focus of this work.

3. Structured sparse selected inversion. We then describe our structured selected inversion scheme for A^{-1} based on its structured LDL factors. After the structured multifrontal factorization, suppose A is approximated by A~. That is, the structured factors are the exact factors of A~. Since no approximation is involved in the inversion stage, we use notation such as F_{i,i} to denote its HSS approximation, and L to denote the structured factor. This highly simplifies the notation that is otherwise very messy due to the multilevel approximations in the factorization stage. Let

(3.1)  C = A~^{-1}.

Thus, our selected inversion computes diag(C), which approximates diag(A^{-1}). We start with the discussion of an HSS inversion algorithm, as needed in the scheme. The HSS inversion also benefits the computation of the update matrices S_i.

3.1. HSS inversion and fast computation of the update matrices. An HSS inversion algorithm is proposed in [6], and the idea is to recursively apply the SMW formula after writing an HSS matrix as a block diagonal matrix plus a low-rank update (see (2.1)). A slightly modified form of the SMW formula is used in [6] and can be derived based on the standard version in a clearer way as follows. Assume all the matrices in the following derivation are of appropriate sizes and all the relevant inverses exist; then

(3.2)
(D + U B V^T)^{-1}
  = D^{-1} - D^{-1} U (B^{-1} + V^T D^{-1} U)^{-1} V^T D^{-1}
  = D^{-1} - D^{-1} U (B^{-1} + D̂)^{-1} V^T D^{-1}          (D̂ = V^T D^{-1} U)
  = D^{-1} - D^{-1} U D̂^{-1} (D̂^{-1} + B)^{-1} B V^T D^{-1}
  = D^{-1} - D^{-1} U D̂^{-1} (D̂^{-1} + B)^{-1} [(D̂^{-1} + B) - D̂^{-1}] V^T D^{-1}
  = D^{-1} - D^{-1} U D̂^{-1} V^T D^{-1} + D^{-1} U D̂^{-1} (D̂^{-1} + B)^{-1} D̂^{-1} V^T D^{-1}.

With this formula, a simplification of the HSS inversion algorithm in [6] for a symmetric HSS matrix F is outlined below. The inputs of this algorithm are the HSS generators U_i, R_i, B_i, D_i of F, and the outputs are the HSS generators Ũ_i, R̃_i, B̃_i, D̃_i of F^{-1}. In the following two subsections, D̄_i, Ū_i, Ḡ_i, D̂_i, Ĝ_i are intermediate results in the inversion, and G_i for a nonleaf node i is used only for the derivation and is not actually computed.

3.1.1. Basic HSS inversion in terms of two levels. To motivate the HSS inversion procedure, first consider a two-level symmetric HSS form

(3.3)  D_3 = diag(D_1, D_2) + diag(U_1, U_2) \begin{bmatrix} & B_1 \\ B_1^T & \end{bmatrix} diag(U_1, U_2)^T.
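As a small numerical illustration of (3.3), the following sketch (with hypothetical generator sizes and random data) assembles such a two-level symmetric HSS matrix explicitly and confirms that its off-diagonal blocks have rank at most r, which is exactly the structure the inversion below exploits.

```python
import numpy as np

# A minimal sketch of the two-level symmetric HSS form (3.3):
#   D_3 = diag(D_1, D_2) + diag(U_1, U_2) [[0, B_1], [B_1^T, 0]] diag(U_1, U_2)^T.
# The generator sizes and values below are made up for illustration.
rng = np.random.default_rng(1)
n1, n2, r = 7, 6, 2
D1 = rng.standard_normal((n1, n1)); D1 = D1 + D1.T + 10.0 * np.eye(n1)
D2 = rng.standard_normal((n2, n2)); D2 = D2 + D2.T + 10.0 * np.eye(n2)
U1 = rng.standard_normal((n1, r))
U2 = rng.standard_normal((n2, r))
B1 = rng.standard_normal((r, r)) / 10.0

D3 = np.block([[D1,               U1 @ B1 @ U2.T],
               [U2 @ B1.T @ U1.T, D2            ]])

# Each HSS block row/column has rank at most r.
assert np.linalg.matrix_rank(D3[:n1, n1:]) <= r
```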

According to (3.2),

(3.4)  D_3^{-1} = diag(G_1, G_2) + diag(Ũ_1, Ũ_2) D̄_3^{-1} diag(Ũ_1, Ũ_2)^T,

where

(3.5)  G_j = D_j^{-1} - D_j^{-1} U_j D̂_j^{-1} U_j^T D_j^{-1},   Ũ_j = D_j^{-1} U_j D̂_j^{-1},   j = 1, 2,

with

(3.6)  D̂_j = U_j^T D_j^{-1} U_j,   j = 1, 2,

(3.7)  D̄_i = \begin{bmatrix} D̂_{c_1}^{-1} & B_{c_1} \\ B_{c_1}^T & D̂_{c_2}^{-1} \end{bmatrix},   i = 3 (c_1 = 1, c_2 = 2).

3.1.2. General HSS inversion. To generalize to more levels, we consider a postordered HSS tree consisting of three levels and 7 nodes, with node 7 as the root, nodes 3, 6 the children of 7 at the next level, and nodes 1, 2 the children of 3 and 4, 5 the children of 6 at the leaf level. A corresponding symmetric HSS matrix looks like

D_7 = diag(D_3, D_6) + diag(U_3, U_6) \begin{bmatrix} & B_3 \\ B_3^T & \end{bmatrix} diag(U_3, U_6)^T,

where D_3, D_6, U_3, U_6 are in nested forms just like (2.1)-(2.2) with i = 3 or 6. Since D_3 and D_6 are two-level HSS forms, the two-level HSS inversion above yields D_3^{-1} in (3.4) and also D_6^{-1}. The Ũ generators associated with the leaves are obtained. On the other hand, by treating D_7 itself as a two-level HSS form, we can get

(3.8)  D_7^{-1} = diag(G_3, G_6) + diag(Ũ_3, Ũ_6) D̄_7^{-1} diag(Ũ_3, Ũ_6)^T,

where G_3, G_6, Ũ_3, Ũ_6 are defined just like in (3.5) with j = 3 or 6, and D̄_7 is defined just like in (3.7) with i = 7. Let

Ĝ_7 = D̄_7^{-1} = \begin{bmatrix} Ĝ_{7;1,1} & Ĝ_{7;1,2} \\ Ĝ_{7;2,1} & Ĝ_{7;2,2} \end{bmatrix},

where Ĝ_7 is partitioned conformably with (3.8). Then

D_7^{-1} = \begin{bmatrix} G_3 + Ũ_3 Ĝ_{7;1,1} Ũ_3^T & Ũ_3 Ĝ_{7;1,2} Ũ_6^T \\ Ũ_6 Ĝ_{7;2,1} Ũ_3^T & G_6 + Ũ_6 Ĝ_{7;2,2} Ũ_6^T \end{bmatrix}.

Thus, we can set

(3.9)  B̃_3 = Ĝ_{7;1,2},   D̃_3 = G_3 + Ũ_3 Ĝ_{7;1,1} Ũ_3^T,   Ũ_3 = D_3^{-1} U_3 D̂_3^{-1},

where D̂_3 is defined just like in (3.6) with j = 3. Here, we focus on node 3 and its descendants. The study of node 6 and its descendants is similar. We then need to resolve the following issues:
1. D̄_7 and Ũ_3 involve D̂_3, and the computation of D̂_3 = U_3^T D_3^{-1} U_3 needs D_3^{-1} and a nested basis matrix U_3, which are not explicitly available;
2. Ũ_3 should appear as a nested basis matrix, so we need to find R̃_1 and R̃_2;
3. D̃_3 itself is a two-level HSS form, so we need to find D̃_1, D̃_2, B̃_1.
For these purposes, we introduce the following simple lemma.

Lemma 3.1. The matrices in (3.4)-(3.7) satisfy

U_j^T Ũ_j = I,   U_j^T G_j U_j = 0,   j = 1, 2,
diag(U_{c_1}, U_{c_2})^T D_i^{-1} diag(U_{c_1}, U_{c_2}) = D̄_i^{-1},   i = 3.
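Before turning to the proof, a quick numerical check of (3.4)-(3.7) and of the identities in Lemma 3.1 may be helpful. In the sketch below, the generators are hypothetical random data, and dense inverses are formed only for validation; the structured algorithm never forms them.

```python
import numpy as np

# Numerical check of the two-level inversion (3.4)-(3.7) and of Lemma 3.1.
rng = np.random.default_rng(2)
n1, n2, r = 7, 6, 2
D1 = rng.standard_normal((n1, n1)); D1 = D1 + D1.T + 10.0 * np.eye(n1)
D2 = rng.standard_normal((n2, n2)); D2 = D2 + D2.T + 10.0 * np.eye(n2)
U1, U2 = rng.standard_normal((n1, r)), rng.standard_normal((n2, r))
B1 = rng.standard_normal((r, r)) / 10.0
D3 = np.block([[D1, U1 @ B1 @ U2.T], [U2 @ B1.T @ U1.T, D2]])

inv = np.linalg.inv
Dh1, Dh2 = U1.T @ inv(D1) @ U1, U2.T @ inv(D2) @ U2          # (3.6)  D-hat_j
G1 = inv(D1) - inv(D1) @ U1 @ inv(Dh1) @ U1.T @ inv(D1)      # (3.5)  G_j
G2 = inv(D2) - inv(D2) @ U2 @ inv(Dh2) @ U2.T @ inv(D2)
Ut1, Ut2 = inv(D1) @ U1 @ inv(Dh1), inv(D2) @ U2 @ inv(Dh2)  # (3.5)  U-tilde_j
Dbar3 = np.block([[inv(Dh1), B1], [B1.T, inv(Dh2)]])         # (3.7)  D-bar_3

blkdiag = lambda X, Y: np.block([[X, np.zeros((X.shape[0], Y.shape[1]))],
                                 [np.zeros((Y.shape[0], X.shape[1])), Y]])
D3inv = blkdiag(G1, G2) + blkdiag(Ut1, Ut2) @ inv(Dbar3) @ blkdiag(Ut1, Ut2).T
assert np.allclose(D3inv, inv(D3))                           # (3.4)

# Lemma 3.1: U_j^T U-tilde_j = I, U_j^T G_j U_j = 0, and the reduced-matrix identity.
assert np.allclose(U1.T @ Ut1, np.eye(r)) and np.allclose(U2.T @ Ut2, np.eye(r))
assert np.allclose(U1.T @ G1 @ U1, 0) and np.allclose(U2.T @ G2 @ U2, 0)
assert np.allclose(blkdiag(U1, U2).T @ inv(D3) @ blkdiag(U1, U2), inv(Dbar3))
```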

10 9 J. XIA, Y. XI, S. CAULEY, AND V. BALAKRISHNAN Proof. From 3.5 and 3.6, we have U j Ũj = Uj D j U j ˆDj = U j G ju j = U j D j D j ˆD j ˆD j = I, U j ˆDj Uj D U j = Uj D j U j Uj D j U j ˆD j Uj D j U j = ˆD j From these two results and 3.4, for i =3wehave j ˆD j ˆD j ˆD j =0. diaguc,uc Di diagu c,u c =diaguc,uc [diagg c,g c +diagũc, Ũc D i diagũ c, Ũ c ] diagu c,u c =diaguc G c Ũ c,uc G c U c +diaguc Ũ c,uc Ũ c D i diagũ c U c, Ũ c U c = diag0, 0 + diagi,i D i diagi,i= D i. hen we resolve the three issues above Lemma 3.. First, we derive a convenient way to find ˆD 3 =U3 D3 U 3. his is based on the following lemma. Lemma 3.. D i in 3.7 satisfies 3.0 Ui D i U i = Ū i D i Rc Ū i with Ū i =, i =3. R c hus, ˆD i =Ū i D i Ū i which does not involve the nested forms D i,u i. Proof. According to the nested form of U i and Lemma 3., Ui D i U i = Rc Rc U U Di U Rc U R c = Ū i D i Ū i. Second, we find R and R in the nested form of Ũ3: R = D3 R U 3 ˆD 3 from 3.9. Ũ Ũ By Lemma 3., we can multiply diagu,u on the left of both sides to get R U = R U D3 U 3 ˆD U 3 = U D3 U R 3. U R = D 3 Ū 3 ˆD3 from Lemma 3. and Ū3 in 3.0. his gives a formula for computing R and R. hird, we find D, D, B. From 3.4, 3.9, and the nested form of Ũ3, wehave D 3 = G 3 + Ũ3Ĝ7;,Ũ 3 = D 3 =D 3 D 3 U 3 ˆD 3 U 3 D 3 +Ũ3Ĝ7;,Ũ 3 Ũ3 ˆD 3 Ũ3 + Ũ3Ĝ7;,Ũ 3 from 3.9 =diagg,g + diagũ, Ũ D 3 diagũ, Ũ [ D = diagg,g +diagũ, Ũ ˆD 3 Ũ3 ˆD 3 Ũ3 + Ũ3Ĝ7;,Ũ 3 R R R 3 ˆD 3 R R ] + Ĝ 7;, R R R diagũ, Ũ.

11 FAS SPARSE SELECED INVERSION 93 Define R Ḡ 3 = D 3 R = = D 3 D 3 D 3 Ĝ 3 = Ḡ3 + ˆD 3 R R D 3 Ū 3 ˆD3 ˆD 3 ˆD 3 Ū3 D 3, D 3 from 3. Ū 3 ˆD3 Ū3 R Ĝ3;, Ĝ Ĝ 7;, R R R 3;, Ĝ 3;, Ĝ 3;,, where Ĝ3 is partitioned conformably. hen D 3 =diagg,g + diagũ, ŨĜ3 diagũ, Ũ G + = ŨĜ3;,Ũ Ũ Ĝ 3;, Ũ Ũ Ĝ 3;, Ũ G + ŨĜ3;,Ũ. hus, we obtain the following generators: 3.4 D = G + ŨĜ3;,Ũ, D = G + ŨĜ3;,Ũ, B = Ĝ3;,. In general HSS inversion with more levels, the generators of F can be similarly found. he derivation is similar to the process above. Lemma 3.3 below shows some essential ideas by induction. Another way is based on a telescoping HSS representation in [6]. We do not repeat the details since they do not affect the understanding of our later discussions on selected inversion. he procedure can be organized into two traversals of the HSS tree. In a bottom-up traversal, define hierarchically 3.5 D i = D i, Ū i = U i, ˆDi =Ū i ˆDc B D i = c Rc Bc, Ū i =, ˆDi =Ū ˆDc R i c D i Ū i D i Ū i i: leaf i: nonleaf. he derivations above and Lemma 3.3 below indicate that, D i, Ūi can be understood as generators of an intermediate reduced HSS matrix [6, 39]. hen compute Rc 3.6 Ũ i = D i Ū i ˆDi i: leaf or = D i Ū i ˆDi i: nonleaf; R c see, e.g., 3.5 and 3.. Proceed with these steps for the nonleaf nodes i. When the node i =root is reached, compute only D i as in 3.5. In a top-down traversal of, we find the D, B generators of F.Fori =root, D i let Ĝi and partition Ĝi conformably following 3.5 as Ĝi;, Ĝ 3.7 Ĝ i = i;,. hen let Ĝ i;, Ĝ i;, 3.8 Bc = Ĝi;,. For a nonleaf node i<root, let 3.9 Ḡ i = D i D i Ū i ˆDi Ūi D i, Ĝ i = Ḡi + Rc R c Ĝ pari;j,j R c R c,

12 94 J. XIA, Y. XI, S. CAULEY, AND V. BALAKRISHNAN where j = or, depending on whether i is a left or a right child of pari. Ḡi can be understood as a diagonal block of the inverse of a reduced HSS matrix [6]; see, e.g., 3.5 and 3.. Ĝ i is the sum of Ḡ i and a low-rank contribution from the parent level; see, e.g., 3.3. hen partition Ĝi as in 3.7, and set B c as in 3.8. Repeat these until all the nonleaf nodes are visited. hen for each leaf i, define Ḡ i as in 3.9 and let 3.0 Di = Ḡi + ŨiĜpari;j,jŨ i, where j = or, depending on whether i is a left or a right child of pari; see, e.g., 3.4.henwehavetheHSSgenerators D i, Ũi, R i, B i of F Fast computation of the update matrices. he HSS inversion is applied to F in.6 when we compute the diagonal blocks of C. In addition, the information computed in the inversion of F can also help to speed up the computation of the update matrix S i in.0 in the structured multifrontal method, as well as some additional computations in the selected inversion. For this purpose, we first show that the results in Lemmas 3. and 3. can be generalized. Lemma 3.3. Let G i = D i 3. D i U i ˆDi Ui D i for a node i root, then Ui D i U i = Ū i D i Ū i = ˆD i, Ui Ũ i = I, Ui G i U i =0, diaguc,uc D i diagu c,u c = D i i: nonleaf. Proof. We only need to prove 3.. he reason is, once 3. holds, the other formulas can be proved in the same way as in the proof of Lemma 3.. We show Ui D i U i = Ū i D i Ū i = ˆD i by induction. Let i denote the subtree of associated with node i and its descendants. he induction is done on the number of levels l of i. he result holds for i with two levels, as in Lemma 3.. Assume the result holds for any subtree of with up to l levels. We show it also holds for i with l levels. Writing D i as a block diagonal plus a low-rank form see, e.g., 3.3 and applying the modified SMW formula 3. yield 3. D i = diagdc,dc diagdc U c,dc U c ] [H H B c Bc + H H diaguc Dc,Uc Dc, where 3.3 H = diag Uc Dc U c,uc Dc U c. According to the nested form of U i in., we have Ui D i U i = Rc Rc Rc H R R c Rc HH Rc H c R c + Rc Rc HH B c + H H Rc H R c B c = Rc Rc Uc Dc U c B c Bc Uc Dc U c Rc. R c

Fig. 3. A pictorial illustration of the formula in Theorem 3.4.

Since c_1 and c_2 are at level l - 1, by induction,

(3.24)  U_{c_1}^T D_{c_1}^{-1} U_{c_1} = Ū_{c_1}^T D̄_{c_1}^{-1} Ū_{c_1} = D̂_{c_1},   U_{c_2}^T D_{c_2}^{-1} U_{c_2} = Ū_{c_2}^T D̄_{c_2}^{-1} Ū_{c_2} = D̂_{c_2}.

Thus,

U_i^T D_i^{-1} U_i = \begin{bmatrix} R_{c_1}^T & R_{c_2}^T \end{bmatrix} \begin{bmatrix} D̂_{c_1}^{-1} & B_{c_1} \\ B_{c_1}^T & D̂_{c_2}^{-1} \end{bmatrix}^{-1} \begin{bmatrix} R_{c_1} \\ R_{c_2} \end{bmatrix} = Ū_i^T D̄_i^{-1} Ū_i.

Equation (3.22) illustrates an idea of reduced matrices just like that in [39]. Thus, to compute the update matrix S_i in (2.10), we can avoid directly using the HSS form of F_{i,i} (= D_{k_1}) in U_{k_1}^T F_{i,i}^{-1} U_{k_1}. The following result is a direct corollary of Lemma 3.3.

Theorem 3.4. Suppose D_i, U_i, etc., are the HSS generators of F_{i,i} in (2.6). Then

U_{k_1}^T F_{i,i}^{-1} U_{k_1} = Ū_{k_1}^T D̄_{k_1}^{-1} Ū_{k_1},

where the HSS tree of F_i has k nodes and D̄_{k_1} is given in (3.15) with i = k_1 in the HSS inversion of F_{i,i}. Therefore, S_i in (2.10) can be quickly computed as

S_i ≈ F_{N_i,N_i} - U_{k_2} B_{k_2} (Ū_{k_1}^T D̄_{k_1}^{-1} Ū_{k_1}) B_{k_2}^T U_{k_2}^T.

See Figure 3 for an illustration. This theorem indicates that D̄_{k_1} plays a role similar to the final reduced matrix defined in [39], where F_{i,i} is factorized by ULV-type algorithms [0], yet here we take advantage of HSS inversion. If F_{i,i} has size N and HSS rank r, this theorem can help to reduce the complexity of computing U_{k_1}^T F_{i,i}^{-1} U_{k_1} from O(r^2 N) (with HSS inversion) to only O(r^3) with the simple formula Ū_{k_1}^T D̄_{k_1}^{-1} Ū_{k_1}.

3.2. Basic ideas of the structured selected inversion. Similarly to the existing direct selected inversion methods in [3, 5, 6, 7], the extraction of diag(C) includes two stages, a forward one (block LDL factorization) and a backward one (inversion). The basic ideas of our structured selected inversion can be illustrated with a simple example. Consider a sparse symmetric matrix A after applying nested dissection, as well as its block LDL factorization:

A = \begin{bmatrix} A_{11} & & A_{31}^T \\ & A_{22} & A_{32}^T \\ A_{31} & A_{32} & A_{33} \end{bmatrix} = \begin{bmatrix} I & & \\ & I & \\ L_{31} & L_{32} & I \end{bmatrix} \begin{bmatrix} A_{11} & & \\ & A_{22} & \\ & & F_{33} \end{bmatrix} \begin{bmatrix} I & & L_{31}^T \\ & I & L_{32}^T \\ & & I \end{bmatrix},

where A_{33} corresponds to the top-level separator, and

L_{31} = A_{31} A_{11}^{-1},   L_{32} = A_{32} A_{22}^{-1},   F_{33} = A_{33} - L_{31} A_{11} L_{31}^T - L_{32} A_{22} L_{32}^T.

When A results from a problem with the low-rank property, F_{33} can be approximated by an HSS form, and the above factorization is performed recursively in a

structured way as the structured multifrontal factorization in section 2. Then the resulting factors are structured or data sparse. In the inversion stage, we have

(3.25)
A^{-1} = \begin{bmatrix} I & & -L_{31}^T \\ & I & -L_{32}^T \\ & & I \end{bmatrix} \begin{bmatrix} A_{11}^{-1} & & \\ & A_{22}^{-1} & \\ & & F_{33}^{-1} \end{bmatrix} \begin{bmatrix} I & & \\ & I & \\ -L_{31} & -L_{32} & I \end{bmatrix}
       = \begin{bmatrix} A_{11}^{-1} + L_{31}^T F_{33}^{-1} L_{31} & L_{31}^T F_{33}^{-1} L_{32} & -L_{31}^T F_{33}^{-1} \\ L_{32}^T F_{33}^{-1} L_{31} & A_{22}^{-1} + L_{32}^T F_{33}^{-1} L_{32} & -L_{32}^T F_{33}^{-1} \\ -F_{33}^{-1} L_{31} & -F_{33}^{-1} L_{32} & F_{33}^{-1} \end{bmatrix}.

Thus,

(3.26)  diag(A^{-1}) = \begin{bmatrix} diag(A_{11}^{-1} + L_{31}^T F_{33}^{-1} L_{31}) \\ diag(A_{22}^{-1} + L_{32}^T F_{33}^{-1} L_{32}) \\ diag(F_{33}^{-1}) \end{bmatrix},

where -F_{33}^{-1} L_{31}, -F_{33}^{-1} L_{32}, and F_{33}^{-1} are submatrices of A^{-1} corresponding to the nonzero blocks of L. This also means that the extraction of diag(A^{-1}) needs the computation of the off-diagonal blocks of A^{-1} that fall into the block nonzero pattern of the L factor from the LDL factorization [3].

With the structured factorization, L and F_{33} are approximated by data-sparse forms, and all the operations in (3.26) can be performed via structured (HSS or low-rank) operations. For example, the HSS inversion in section 3.1 can be used to compute F_{33}^{-1}, and multiplications of HSS matrices and low-rank matrices are used for F_{33}^{-1} L_{31} and F_{33}^{-1} L_{32}. The idea can then be recursively applied.

Remark 3.1. Since C is generally a dense matrix, a full direct inversion for (3.26) costs O(n^3), which is prohibitive for large n and is impractical. In Table IV of [7], some test examples are given. For small or modest n, the selected inversion is already at least 3 times faster than the full inversion, and often hundreds of times faster. For example, for the test matrix pwtk of size n = 217,918 in section 5, the selected inversion is 353 times faster. In this work, we will further show that our structured selected inversion is faster than the nonstructured selected inversion.

3.3. Structures within C and general structured selected inversion. In the general selected inversion, the first stage is a structured multifrontal LDL factorization as in section 2.2, and the second stage is a structured inversion. In the factorization stage, we traverse the assembly tree in its postorder, and the resulting factors are in data-sparse forms (HSS or low rank). We then focus on the inversion stage, where we traverse the assembly tree in its reverse postorder. We use C_{i,i} to denote the diagonal block C|_{s_i x s_i} of C in (3.1) corresponding to node i, and use C_{N_i,i} to denote the off-diagonal block C|_{s_{N_i} x s_i} of C, with N_i in Definition 2.1. See (3.25) for an example where, for i = 1, we have C_{1,1} corresponding to A_{11}^{-1} + L_{31}^T F_{33}^{-1} L_{31} and C_{N_1,1} corresponding to -F_{33}^{-1} L_{31}. For the node k = root, the corresponding diagonal block of C is C_{k,k} = F_k^{-1}.
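The following dense NumPy sketch carries out (3.25)-(3.26) on a small random matrix with the above block pattern (the (1,2) block is zero). The block sizes and data are hypothetical, and every dense operation here is exactly what the structured method replaces by an HSS or low-rank one.

```python
import numpy as np

# Dense illustration of (3.25)-(3.26): selected inversion of a 3x3 block matrix
# with zero (1,2) block (nested dissection pattern). Sizes/data are made up.
rng = np.random.default_rng(3)
n1, n2, n3 = 6, 5, 4
def spd(n):                                   # random symmetric positive definite block
    X = rng.standard_normal((n, n))
    return X @ X.T + n * np.eye(n)
A11, A22 = spd(n1), spd(n2)
A31, A32 = rng.standard_normal((n3, n1)), rng.standard_normal((n3, n2))
A33 = spd(n3) + A31 @ A31.T + A32 @ A32.T     # keeps A positive definite
A = np.block([[A11, np.zeros((n1, n2)), A31.T],
              [np.zeros((n2, n1)), A22, A32.T],
              [A31, A32, A33]])

inv = np.linalg.inv
# forward stage: block LDL^T factors and the last Schur complement F_33
L31, L32 = A31 @ inv(A11), A32 @ inv(A22)
F33 = A33 - L31 @ A11 @ L31.T - L32 @ A22 @ L32.T

# backward stage: only the blocks of C = A^{-1} matching the pattern of L
C33 = inv(F33)
C31, C32 = -C33 @ L31, -C33 @ L32             # off-diagonal blocks of A^{-1}
C11 = inv(A11) + L31.T @ C33 @ L31
C22 = inv(A22) + L32.T @ C33 @ L32

# (3.26): the assembled diagonal agrees with diag(A^{-1})
d = np.concatenate([np.diag(C11), np.diag(C22), np.diag(C33)])
assert np.allclose(d, np.diag(inv(A)))
```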

Apply the HSS inversion procedure in section 3.1 to the HSS form of the final frontal matrix F_k. The diagonal blocks of C_{k,k} are simply the D̃ generators of the resulting HSS form of C_{k,k} as in (3.20).

For a node i with i < k, we show a structured computation of C_{i,i}. Let l_i be the row index in A that the first row of A|_{s_i x s_i} in (2.4) corresponds to. The derivation of the representation of C_{i,i} follows directly from (3.25)-(3.26):

(3.27)  C_{i,i} = F_{i,i}^{-1} + (L|_{(l_{i+1}:n) x (l_i:l_{i+1}-1)})^T C|_{(l_{i+1}:n) x (l_{i+1}:n)} L|_{(l_{i+1}:n) x (l_i:l_{i+1}-1)}.

Since L|_{(l_{i+1}:n) x (l_i:l_{i+1}-1)} has many zero blocks following nested dissection, only the entries of C|_{(l_{i+1}:n) x (l_{i+1}:n)} within the block nonzero pattern of L|_{(l_{i+1}:n) x (l_i:l_{i+1}-1)} are needed to compute C_{i,i} [3]. Thus, following Definition 2.1 for N_i, we let

L_{N_i,i} = \begin{bmatrix} L|_{s_{j_1} x s_i} \\ L|_{s_{j_2} x s_i} \\ \vdots \\ L|_{s_{j_alpha} x s_i} \end{bmatrix},  where we suppose

(3.28)  N_i = {j_1, j_2, ..., j_alpha}.

We can similarly form a matrix C_{N_i,N_i} from C|_{(l_{i+1}:n) x (l_{i+1}:n)} so that the following formulas are used for the actual construction of C_{i,i} in (3.27):

(3.29)  C_{i,i} = F_{i,i}^{-1} + L_{N_i,i}^T C_{N_i,N_i} L_{N_i,i} = F_{i,i}^{-1} - L_{N_i,i}^T C_{N_i,i},  with

(3.30)  C_{N_i,i} = -C_{N_i,N_i} L_{N_i,i}.

The extraction of the diagonal blocks of C_{i,i} in (3.29) involves these steps:
1. apply HSS inversion to the HSS form of F_{i,i} to extract the diagonal generators of F_{i,i}^{-1};
2. compute C_{N_i,i} in (3.30) via the low-rank structure of L_{N_i,i} and the rank structure of C_{N_i,N_i} that has already been computed;
3. compute L_{N_i,i}^T C_{N_i,i} in (3.29) via the low-rank structures of L_{N_i,i} and C_{N_i,i}.
The first step is shown in section 3.1. We thus focus on the latter two steps.

First, consider L_{N_i,i}. Recall that F_{N_i,i} is in a low-rank form (2.9). Thus, L_{N_i,i} is also in a low-rank form (noticing the notation assumption above (3.1)):

(3.31)  L_{N_i,i} = F_{N_i,i} F_{i,i}^{-1} = U_{k_2} B_{k_2} U_{k_1}^T F_{i,i}^{-1}.

We need to compute U_{k_1}^T F_{i,i}^{-1}, where U_{k_1} is a nested basis matrix. Here, instead of using HSS solutions or HSS matrix-vector multiplications, we can compute U_{k_1}^T F_{i,i}^{-1} with a more compact form based on an idea similar to Theorem 3.4.

Theorem 3.5. With the notation in Lemma 3.3 and Theorem 3.4, we have

U_{k_1}^T F_{i,i}^{-1} = Ū_{k_1}^T D̄_{k_1}^{-1} diag(Ũ_{c_1}^T, Ũ_{c_2}^T),

where c_1 and c_2 are the left and right children of k_1, respectively.

Proof. Similarly to the proof of Lemma 3.3, we use induction and just show the general induction step U_i^T D_i^{-1} = Ū_i^T D̄_i^{-1} diag(Ũ_{c_1}^T, Ũ_{c_2}^T) for a nonleaf node i. According to (3.22), (3.23), and the nested form of U_i,

U_i^T D_i^{-1} = \begin{bmatrix} R_{c_1}^T & R_{c_2}^T \end{bmatrix} diag(U_{c_1}^T D_{c_1}^{-1}, U_{c_2}^T D_{c_2}^{-1})
  - \begin{bmatrix} R_{c_1}^T & R_{c_2}^T \end{bmatrix} diag(U_{c_1}^T D_{c_1}^{-1} U_{c_1}, U_{c_2}^T D_{c_2}^{-1} U_{c_2}) \left[ H - H \left( \begin{bmatrix} & B_{c_1} \\ B_{c_1}^T & \end{bmatrix} + H \right)^{-1} H \right] diag(U_{c_1}^T D_{c_1}^{-1}, U_{c_2}^T D_{c_2}^{-1}).

16 98 J. XIA, Y. XI, S. CAULEY, AND V. BALAKRISHNAN According to 3.3 and 3.4, H = diag ˆD c, ˆD c. hus, U i D i = Ū i ˆDc B c B c ˆDc diag ˆD cuc Dc, ˆD c Uc Dc = Ū i D i diag ˆD c Uc Dc, ˆD c Uc Dc. D c and D c are submatrices of D i and are also HSS. Let j and j be the children of c. By induction, ˆD c U c D c = ˆD c Ū c D c diagũ j, Ũ j = ˆD c Ūc D c diagũ j, Ũ j = R j R j diag Ũj, Ũ j =Ũ c, where 3.6 is used with i replaced by c. Similarly, we have ˆD c Uc Dc = Ũ c,and then the theorem holds. he benefit of this formula can be seen from a pictorial illustration similar to Figure 3. By this formula, L Ni,i in 3.3 can be written in a compact form L Ni,i = U kb k Ūk D k diagũ k, Ũ k. Note that at this point, we leave L Ni,i in the above low-rank form, which can be conveniently multiplied by the structured form of C Ni,N i later. Next, we study the structures of some blocks of C. For convenience, we write down the following simple lemma, which can be proven by definition; see, e.g., [0]. Lemma 3.6. If F is an invertible matrix with HSS rank bounded by r, thenf also has HSS rank bounded by r. Another lemma shows how the addition of matrices with the same off-diagonal nested bases preserves the structure, and is a direct extension of the results in [39, 4] and [40, Proposition 3.3]. Lemma 3.7. Assume F is an HSS matrix with generators D, B, U, V,etc.,and corresponds to an HSS tree with root k. Let H be any square matrix with size equal to the column size of the nested basis matrix U k. hen the following matrix is also an HSS matrix: F + U k HV k, and its U, V, R, W generators are the same as those of F,anditsD, B generators have the same sizes as and can be obtained by updating the D, B generators of F. We can now prove an important theorem that discloses the rank structures of C and C Ni,i. heorem 3.8. Assume all the frontal matrices in the multifrontal factorization of A can be approximated by HSS matrices with HSS ranks bounded by r, so that the factorization is the exact one of Ã. hen for C in 3., a C is an HSS matrix with HSS rank bounded by r; b C Ni,i is a low-rank form with rank bounded by r.

17 FAS SPARSE SELECED INVERSION 99 Proof. We prove a first. For a node i of,considerc in 3.9. he HSS structure of F follows from Lemma 3.6. According to 3.3 and heorem 3.5, 3.3 C = F + L N i,i C N i,n i L Ni,i = F + diagũk, Ũk [Bk k U k C Ni,N U kb i k k }{{ ] diagũ k }, Ũ k H Ūk D F + diagũk, Ũk H diagũ k, Ũ k. Ūk D his indicates that diagũk, Ũk gives a column basis of L N i,i C N i,n i L Ni,i. On the other hand, Ũk and Ũk are also the generators of F that give the column bases of its appropriate off-diagonal blocks. In another word, the off-diagonal blocks of F and L N i,i C N i,n i L Ni,i have the same nested basis information. More specifically, we can partition the right-hand side of 3.3 conformably into a block form: F C = F Ũk H H Ũ F F + k Ũ k H H Ũk F = + Ũk H Ũk F + Ũk H Ũk F + Ũk H Ũk F. + Ũk H Ũk According to Lemma 3.7, the, and, blocks of the above matrix are HSS forms, and the HSS generators are just those of F, except the entries but not the sizes of the D, B generators of F are updated. In addition, the, block is F + Ũk H Ũ k = Ũk B k + H Ũ k, and the size of B k + H is bounded by r. hus, C has the same Ũ, R generators as F, and the summation on the right-hand side of 3.3 does not increase the HSS rank, which remains bounded by r. For b, C Ni,i has a low-rank form C Ni,i = C Ni,N i L Ni,i = C Ni,N i U kb k Ūk D k diagũ k, Ũ k. Since B k is an HSS generator of F i and has size bounded by r, the right-hand side has rank bounded by r. herefore, the blocks of C Ni,N i in 3.9 can be conveniently represented by HSS or low-rank forms, which then participate in the computation of C. his is illustrated in Figure 4. he formula 3.3 in the proof above gives our structured method for computing C. For notational convenience, let 3.33 U Ni = C Ni,N i U k, B i = B k, V i = then Ūk D k diagũ k, Ũ k ; C Ni,i = U Ni B i V i, C = F L N i,ic Ni,i = F + V i B i U k U Ni B i V i.

18 300 J. XIA, Y. XI, S. CAULEY, AND V. BALAKRISHNAN i Structures in the factors ii Structures in block lower-triangular C Fig. 4. he rank structures in the LDL factors of à and in C, and the pieces marked in red that are needed to compute C in 3.9. We use the low-rank form 3.34 to form C Ni,i, which is then used in the computation of C as in he structured forms of both C Ni,i and C then participate in the later inversion steps associated with the descendants of node i. One task is to compute the product U Ni = C Ni,N i U k in Note that U k is a column basis matrix of an off-diagonal block of F i in.9, and is hierarchically represented by lower-level generators. hus, the cost of computing U Ni is comparable to the multiplication of an HSS matrix and a nested basis matrix. For simplicity, we first briefly show how to compute C Ni,N i U k with an explicit form of U k. Formeshes with local connectivity, N i in 3.8 usually has only a finite/small number of nodes. hus, this only involves a small number of HSS matrix-vector products and low-rank matrix-vector products. hat is, we partition U k into block rows U j, k, for j N i,so that the row size of U j, k, is equal to the row size of C j,j. Also partition U Ni into block C j,ni U k for j N i similarly holds for each j N i.hus, rows U i j 3.36 U i j = C j,j U j, k, t N i,t<j U j t B t Vt U t, k V j B j UN j Û j, k, where Ûj, k is obtained by stacking all the blocks U t, k for t N i, t > j. he first term on the right-hand side involves HSS matrix-vector products, and the remaining two terms are low-rank matrix-vector products. A fast HSS matrix-vector multiplication scheme can be found in [9]. U i j can then be quickly computed. After this, stack all the pieces U i j to get U Ni. Moreover, since U k is a nested basis matrix, the cost of multiplying C Ni,N i and U k may be lower than straightforward HSS matrix-vector multiplications. he is because of the existence of possible rank patterns across different levels of the HSS tree see the next section. hat is, in., U i may have a larger column size than U c or U c, i.e., the HSS block corresponding to i may have a higher rank than the individual HSS blocks corresponding to c and c. hus, multiplying a matrix by the nested form of U i may be cheaper than by the explicit form of U i. With multilevel hierarchy, the difference in the efficiency can be very significant.
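As a generic illustration of the block-row products behind (3.36), the sketch below multiplies a two-block matrix whose off-diagonal blocks are stored as outer products by a tall basis matrix, block row by block row, without ever assembling the off-diagonal blocks densely. The sizes, ranks, and blocks are hypothetical, and the diagonal blocks are kept dense here, whereas they would be HSS forms (with fast matrix-vector products) in the actual scheme.

```python
import numpy as np

# Block-row products with off-diagonal blocks kept in low-rank (outer-product) form.
rng = np.random.default_rng(4)
m1, m2, r, q = 9, 8, 2, 3
C11, C22 = rng.standard_normal((m1, m1)), rng.standard_normal((m2, m2))
P, Q = rng.standard_normal((m1, r)), rng.standard_normal((m2, r))   # C12 stored as P Q^T
Uk = rng.standard_normal((m1 + m2, q))                               # tall basis matrix
U1, U2 = Uk[:m1, :], Uk[m1:, :]

# block-row-by-block-row products, touching only the stored factors
Y1 = C11 @ U1 + P @ (Q.T @ U2)      # first block row of the product
Y2 = Q @ (P.T @ U1) + C22 @ U2      # second block row (C21 stored as Q P^T)
Y = np.vstack([Y1, Y2])

# validation against the explicitly assembled block matrix
C = np.block([[C11, P @ Q.T], [Q @ P.T, C22]])
assert np.allclose(Y, C @ Uk)
```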

19 FAS SPARSE SELECED INVERSION 30 able Major structured operations for computing C Ni,i and C. C Ni,i Formulas C Ni,N i U k U Ni Bi V i Structured operations Multiplication of HSS and nested basis matrices Multiplication of nested basis matrices F HSS inversion C V i B i U ku Ni Bi V i Multiplications of nested basis matrices F + V i B i U ku Ni Bi V i HSS generator update Overall, computing C Ni,i and C involves the structured operations in able. Some of the matrix products are related to the multiplications of HSS matrices. See [8, section 3.] for an HSS multiplication scheme. hat is, the generators of the product matrix can be computed following a bottom-up traversal of the HSS tree and a top-down traversal. he latter traversal is not needed during the multiplication of an HSS matrix and a nested basis matrix. Interested readers can find the detailed derivation of the algorithm in [8, proof of heorem 3..]. he results of the matrix multiplications in able are also structured, i.e., can be represented by nested basis matrices. One way to understand this is to use the idea of the fast multipole method FMM [9] when A corresponds to a discretized Green s function, together with the fact that C Ni,i is an off-diagonal block of A. Sometimes, after the multiplications, a recompression step [38, section 5] may be used to recover compact structured forms. hat is, in a bottom-up traversal of the HSS tree, the basis matrices are compressed, and related upper-level generators are modified. hen in a top-down traversal, the B i generators are compressed, and the related lower-level generators are modified. As an example, for the separator i = in Figures and 4, we have N i = {3, 7, 5}. hen C 3,3 V 3 B 3 C N,N = U 3 7 U 3 5 B3 V3 U 3 7 U 3 5 C 7,7 V 7 B 7 UN 7 U N7 B7 V 7 C 5,5 See Figure 5 for an illustration of the multiplication of C N,N with U k, whereanested form of U k is shown in Figure 6. he overall structured selected inversion scheme is summarized in Algorithm. As in the structured multifrontal factorization, a switching level l s is involved for the optimization of the complexity in the next section. Remark 3.. hroughout the multifrontal method coupled with nested dissection, we only need to work on L Ni,i and C Ni,N i, which contain the local subblocks of L and C, respectively. his naturally takes advantage of the nonzero pattern of L and has nice data locality. On the other hand, the method in [5, 6] uses the indices of the nonzero entries of L and it is not immediately clear which dense pieces need to be put together. In addition, in [6], to find C Ni,i, the blocks C j,i for all the ancestors j of i are visited. Here, we only visit N i, which is often just a small subset of the ancestors. N i is determined in the symbolic factorization stage after nested dissection. Remark 3.3. In terms of the memory, in lines 5, 6, 3, and 4 of Algorithm, the storage for the dense or structured blocks of L and Λ may be used to at least partly store C Ni,i and C. hus, the overall memory for the extraction of diagc is.

20 30 J. XIA, Y. XI, S. CAULEY, AND V. BALAKRISHNAN Fig. 5. he structure of C N N related to Figure 4 zoomed in and the multiplication of C N N with U k, whereu k is shown in Figure 6. Fig. 6. A nested basis form of U k in Figure 5. Algorithm. Structured selected inversion for extracting the diagonal blocks of A. : procedure SINV : Find the diagonal generators of the HSS inverse C k,k = F k for k root 3: for node/separator i of from k todo 4: if i is at level l < l s then Dense extraction below the switching level l s 5: C Ni,i C Ni,N i L Ni,i From 3.30; e.g., 3.5 6: C F L N i,i C N i,i From 3.9; e.g., 3.5 7: else Dense matrix L Ni,i below l s 8: Find the HSS generators D i and Ũi of F Structured F 9: Form B i and V i as in 3.33 Structures of C Ni,i 0: for node/separator j N i do Structures of C Ni,i : Form U Nj as in 3.33 with i replaced by j With appropriate changes of the detailed matrices : end for 3: C Ni,i U Ni Bi V i Low-rank approximation of C Ni,i as in able 4: C F + V i B i U ku Ni Bi V i As in able 5: end if 6: end for 7: end procedure

Table 2
Factorization cost xi_fact, inversion cost xi_inv, and storage sigma_mem for the extraction of diag(C) for A discretized on a regular mesh, with the methods in [3, 5, 6, 7] or the dense multifrontal method.

  Mesh                          xi_fact       xi_inv        sigma_mem
  2D m x m, n = m^2             O(n^{1.5})    O(n^{1.5})    O(n log n)
  3D m x m x m, n = m^3         O(n^2)        O(n^2)        O(n^{4/3})

Table 3
Inversion cost xi~_inv and storage sigma~_mem of the structured method for the extraction of diag(C) for A discretized on a regular mesh, where r is the maximal HSS rank of all the frontal matrices. In practice, the HSS ranks may depend on n. The results here based on a maximum rank bound r would then overestimate the actual costs.

  Mesh                          xi~_inv         sigma~_mem
  2D m x m, n = m^2             O(rn)           O(rn)
  3D m x m x m, n = m^3         O(r^{3/2} n)    O(r^{1/2} n)

4. Complexity optimization. For the purpose of later comparison, we restate the complexity and memory usage for the direct selected inversion algorithms in [3, 5, 6, 7] as well as the inversion algorithm based on the dense multifrontal method. Assume A is discretized on a 2D m x m mesh or a 3D m x m x m mesh. Then the cost xi_fact of factorizing A, the cost xi_inv of extracting diag(C), and the memory sigma_mem are reported in Table 2. Here, sigma_mem measures the storage for the LDL factors and the blocks of C corresponding to the block nonzero pattern of the factors. Later, we also use tilde notation for the counts of the structured method. For example, xi~_inv denotes the cost of extracting diag(C) with the structured inversion.

Then we turn to the analysis of our structured method when A has the low-rank property. The analysis is similar to that for the structured direct solvers in [39, 40]. We first present the traditional case when the HSS ranks of all the frontal matrices in the LDL factorization are bounded by r, and then show some similar results by letting r grow following certain patterns. In the following remarks, the cost of factorizing A and the memory may be slightly different from those in [39, 40], since we try to optimize the inversion cost.

Remark 4.1. Suppose the structured multifrontal factorization and Algorithm 1 are applied to a discretized matrix A on a 2D m x m mesh (n = m^2) or a 3D m x m x m mesh (n = m^3), and the HSS ranks of the frontal matrices in the multifrontal method are bounded by r. Choose the switching level l_s = O(log m) - O(log r) of the assembly tree so that the inversion costs before and after l_s are the same. Then after xi~_fact = O(rn log n) flops in two dimensions and xi~_fact = O(rn^{4/3}) flops in three dimensions for the structured factorization, the optimal structured inversion costs xi~_inv are O(rn) in two dimensions and O(r^{3/2} n) in three dimensions; the memory requirements sigma~_mem are O(n log r) + O(n log log n) in two dimensions and O(r^{1/2} n) in three dimensions (see Table 3).
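For a rough sense of the gap between Tables 2 and 3, the short sketch below evaluates the leading inversion-cost terms for a few hypothetical values of n and a fixed rank bound r; constants and polylogarithmic refinements are ignored, so the numbers are only indicative.

```python
# Indicative comparison of the leading inversion-cost terms in Tables 2 and 3
# (constants and polylog factors ignored; r is a hypothetical rank bound).
r = 20
for n in (10**6, 10**7, 10**8):
    dense2d, struct2d = n**1.5, r * n            # 2D: O(n^{1.5}) vs. O(rn)
    dense3d, struct3d = n**2,   r**1.5 * n       # 3D: O(n^2)     vs. O(r^{3/2} n)
    print(f"n = {n:.0e}:  2D {dense2d:.1e} vs {struct2d:.1e},"
          f"  3D {dense3d:.1e} vs {struct3d:.1e}")
```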


More information

INTERCONNECTED HIERARCHICAL RANK-STRUCTURED METHODS FOR DIRECTLY SOLVING AND PRECONDITIONING THE HELMHOLTZ EQUATION

INTERCONNECTED HIERARCHICAL RANK-STRUCTURED METHODS FOR DIRECTLY SOLVING AND PRECONDITIONING THE HELMHOLTZ EQUATION MATHEMATCS OF COMPUTATON Volume 00, Number 0, Pages 000 000 S 0025-5718XX0000-0 NTERCONNECTED HERARCHCAL RANK-STRUCTURED METHODS FOR DRECTLY SOLVNG AND PRECONDTONNG THE HELMHOLTZ EQUATON XAO LU, JANLN

More information

5.1 Banded Storage. u = temperature. The five-point difference operator. uh (x, y + h) 2u h (x, y)+u h (x, y h) uh (x + h, y) 2u h (x, y)+u h (x h, y)

5.1 Banded Storage. u = temperature. The five-point difference operator. uh (x, y + h) 2u h (x, y)+u h (x, y h) uh (x + h, y) 2u h (x, y)+u h (x h, y) 5.1 Banded Storage u = temperature u= u h temperature at gridpoints u h = 1 u= Laplace s equation u= h u = u h = grid size u=1 The five-point difference operator 1 u h =1 uh (x + h, y) 2u h (x, y)+u h

More information

Notes on Householder QR Factorization

Notes on Householder QR Factorization Notes on Householder QR Factorization Robert A van de Geijn Department of Computer Science he University of exas at Austin Austin, X 7872 rvdg@csutexasedu September 2, 24 Motivation A fundamental problem

More information

arxiv: v1 [cs.na] 20 Jul 2015

arxiv: v1 [cs.na] 20 Jul 2015 AN EFFICIENT SOLVER FOR SPARSE LINEAR SYSTEMS BASED ON RANK-STRUCTURED CHOLESKY FACTORIZATION JEFFREY N. CHADWICK AND DAVID S. BINDEL arxiv:1507.05593v1 [cs.na] 20 Jul 2015 Abstract. Direct factorization

More information

A direct solver for elliptic PDEs in three dimensions based on hierarchical merging of Poincaré-Steklov operators

A direct solver for elliptic PDEs in three dimensions based on hierarchical merging of Poincaré-Steklov operators (1) A direct solver for elliptic PDEs in three dimensions based on hierarchical merging of Poincaré-Steklov operators S. Hao 1, P.G. Martinsson 2 Abstract: A numerical method for variable coefficient elliptic

More information

Sparse factorization using low rank submatrices. Cleve Ashcraft LSTC 2010 MUMPS User Group Meeting April 15-16, 2010 Toulouse, FRANCE

Sparse factorization using low rank submatrices. Cleve Ashcraft LSTC 2010 MUMPS User Group Meeting April 15-16, 2010 Toulouse, FRANCE Sparse factorization using low rank submatrices Cleve Ashcraft LSTC cleve@lstc.com 21 MUMPS User Group Meeting April 15-16, 21 Toulouse, FRANCE ftp.lstc.com:outgoing/cleve/mumps1 Ashcraft.pdf 1 LSTC Livermore

More information

A fast randomized algorithm for computing a Hierarchically Semi-Separable representation of a matrix

A fast randomized algorithm for computing a Hierarchically Semi-Separable representation of a matrix A fast randomized algorithm for computing a Hierarchically Semi-Separable representation of a matrix P.G. Martinsson, Department of Applied Mathematics, University of Colorado at Boulder Abstract: Randomized

More information

Numerical Methods I: Numerical linear algebra

Numerical Methods I: Numerical linear algebra 1/3 Numerical Methods I: Numerical linear algebra Georg Stadler Courant Institute, NYU stadler@cimsnyuedu September 1, 017 /3 We study the solution of linear systems of the form Ax = b with A R n n, x,

More information

Optimal Interface Conditions for an Arbitrary Decomposition into Subdomains

Optimal Interface Conditions for an Arbitrary Decomposition into Subdomains Optimal Interface Conditions for an Arbitrary Decomposition into Subdomains Martin J. Gander and Felix Kwok Section de mathématiques, Université de Genève, Geneva CH-1211, Switzerland, Martin.Gander@unige.ch;

More information

9. Numerical linear algebra background

9. Numerical linear algebra background Convex Optimization Boyd & Vandenberghe 9. Numerical linear algebra background matrix structure and algorithm complexity solving linear equations with factored matrices LU, Cholesky, LDL T factorization

More information

An H-LU Based Direct Finite Element Solver Accelerated by Nested Dissection for Large-scale Modeling of ICs and Packages

An H-LU Based Direct Finite Element Solver Accelerated by Nested Dissection for Large-scale Modeling of ICs and Packages PIERS ONLINE, VOL. 6, NO. 7, 2010 679 An H-LU Based Direct Finite Element Solver Accelerated by Nested Dissection for Large-scale Modeling of ICs and Packages Haixin Liu and Dan Jiao School of Electrical

More information

Multilevel low-rank approximation preconditioners Yousef Saad Department of Computer Science and Engineering University of Minnesota

Multilevel low-rank approximation preconditioners Yousef Saad Department of Computer Science and Engineering University of Minnesota Multilevel low-rank approximation preconditioners Yousef Saad Department of Computer Science and Engineering University of Minnesota SIAM CSE Boston - March 1, 2013 First: Joint work with Ruipeng Li Work

More information

Solving linear systems (6 lectures)

Solving linear systems (6 lectures) Chapter 2 Solving linear systems (6 lectures) 2.1 Solving linear systems: LU factorization (1 lectures) Reference: [Trefethen, Bau III] Lecture 20, 21 How do you solve Ax = b? (2.1.1) In numerical linear

More information

Research Reports on Mathematical and Computing Sciences

Research Reports on Mathematical and Computing Sciences ISSN 1342-284 Research Reports on Mathematical and Computing Sciences Exploiting Sparsity in Linear and Nonlinear Matrix Inequalities via Positive Semidefinite Matrix Completion Sunyoung Kim, Masakazu

More information

Key words. spectral method, structured matrix, low-rank property, matrix-free direct solver, linear complexity

Key words. spectral method, structured matrix, low-rank property, matrix-free direct solver, linear complexity SIAM J. SCI. COMPUT. Vol. 0, No. 0, pp. 000 000 c XXXX Society for Industrial and Applied Mathematics FAST STRUCTURED DIRECT SPECTRAL METHODS FOR DIFFERENTIAL EQUATIONS WITH VARIABLE COEFFICIENTS, I. THE

More information

An Efficient Solver for Sparse Linear Systems based on Rank-Structured Cholesky Factorization

An Efficient Solver for Sparse Linear Systems based on Rank-Structured Cholesky Factorization An Efficient Solver for Sparse Linear Systems based on Rank-Structured Cholesky Factorization David Bindel Department of Computer Science Cornell University 15 March 2016 (TSIMF) Rank-Structured Cholesky

More information

Numerical Linear Algebra

Numerical Linear Algebra Numerical Linear Algebra Decompositions, numerical aspects Gerard Sleijpen and Martin van Gijzen September 27, 2017 1 Delft University of Technology Program Lecture 2 LU-decomposition Basic algorithm Cost

More information

Program Lecture 2. Numerical Linear Algebra. Gaussian elimination (2) Gaussian elimination. Decompositions, numerical aspects

Program Lecture 2. Numerical Linear Algebra. Gaussian elimination (2) Gaussian elimination. Decompositions, numerical aspects Numerical Linear Algebra Decompositions, numerical aspects Program Lecture 2 LU-decomposition Basic algorithm Cost Stability Pivoting Cholesky decomposition Sparse matrices and reorderings Gerard Sleijpen

More information

Improvements for Implicit Linear Equation Solvers

Improvements for Implicit Linear Equation Solvers Improvements for Implicit Linear Equation Solvers Roger Grimes, Bob Lucas, Clement Weisbecker Livermore Software Technology Corporation Abstract Solving large sparse linear systems of equations is often

More information

LU Factorization. Marco Chiarandini. DM559 Linear and Integer Programming. Department of Mathematics & Computer Science University of Southern Denmark

LU Factorization. Marco Chiarandini. DM559 Linear and Integer Programming. Department of Mathematics & Computer Science University of Southern Denmark DM559 Linear and Integer Programming LU Factorization Marco Chiarandini Department of Mathematics & Computer Science University of Southern Denmark [Based on slides by Lieven Vandenberghe, UCLA] Outline

More information

Bindel, Spring 2016 Numerical Analysis (CS 4220) Notes for

Bindel, Spring 2016 Numerical Analysis (CS 4220) Notes for Cholesky Notes for 2016-02-17 2016-02-19 So far, we have focused on the LU factorization for general nonsymmetric matrices. There is an alternate factorization for the case where A is symmetric positive

More information

This can be accomplished by left matrix multiplication as follows: I

This can be accomplished by left matrix multiplication as follows: I 1 Numerical Linear Algebra 11 The LU Factorization Recall from linear algebra that Gaussian elimination is a method for solving linear systems of the form Ax = b, where A R m n and bran(a) In this method

More information

Practical Linear Algebra: A Geometry Toolbox

Practical Linear Algebra: A Geometry Toolbox Practical Linear Algebra: A Geometry Toolbox Third edition Chapter 12: Gauss for Linear Systems Gerald Farin & Dianne Hansford CRC Press, Taylor & Francis Group, An A K Peters Book www.farinhansford.com/books/pla

More information

ENHANCING PERFORMANCE AND ROBUSTNESS OF ILU PRECONDITIONERS BY BLOCKING AND SELECTIVE TRANSPOSITION

ENHANCING PERFORMANCE AND ROBUSTNESS OF ILU PRECONDITIONERS BY BLOCKING AND SELECTIVE TRANSPOSITION SIAM J. SCI. COMPUT. Vol. 39, No., pp. A303 A332 c 207 Society for Industrial and Applied Mathematics ENHANCING PERFORMANCE AND ROBUSTNESS OF ILU PRECONDITIONERS BY BLOCKING AND SELECTIVE TRANSPOSITION

More information

Bridging the gap between flat and hierarchical low-rank matrix formats: the multilevel BLR format

Bridging the gap between flat and hierarchical low-rank matrix formats: the multilevel BLR format Bridging the gap between flat and hierarchical low-rank matrix formats: the multilevel BLR format Amestoy, Patrick and Buttari, Alfredo and L Excellent, Jean-Yves and Mary, Theo 2018 MIMS EPrint: 2018.12

More information

Exploiting off-diagonal rank structures in the solution of linear matrix equations

Exploiting off-diagonal rank structures in the solution of linear matrix equations Stefano Massei Exploiting off-diagonal rank structures in the solution of linear matrix equations Based on joint works with D. Kressner (EPFL), M. Mazza (IPP of Munich), D. Palitta (IDCTS of Magdeburg)

More information

Key words. numerical linear algebra, direct methods, Cholesky factorization, sparse matrices, mathematical software, matrix updates.

Key words. numerical linear algebra, direct methods, Cholesky factorization, sparse matrices, mathematical software, matrix updates. ROW MODIFICATIONS OF A SPARSE CHOLESKY FACTORIZATION TIMOTHY A. DAVIS AND WILLIAM W. HAGER Abstract. Given a sparse, symmetric positive definite matrix C and an associated sparse Cholesky factorization

More information

A Sparse QS-Decomposition for Large Sparse Linear System of Equations

A Sparse QS-Decomposition for Large Sparse Linear System of Equations A Sparse QS-Decomposition for Large Sparse Linear System of Equations Wujian Peng 1 and Biswa N. Datta 2 1 Department of Math, Zhaoqing University, Zhaoqing, China, douglas peng@yahoo.com 2 Department

More information

Scientific Computing

Scientific Computing Scientific Computing Direct solution methods Martin van Gijzen Delft University of Technology October 3, 2018 1 Program October 3 Matrix norms LU decomposition Basic algorithm Cost Stability Pivoting Pivoting

More information

Contents. Preface... xi. Introduction...

Contents. Preface... xi. Introduction... Contents Preface... xi Introduction... xv Chapter 1. Computer Architectures... 1 1.1. Different types of parallelism... 1 1.1.1. Overlap, concurrency and parallelism... 1 1.1.2. Temporal and spatial parallelism

More information

Linear Algebra Section 2.6 : LU Decomposition Section 2.7 : Permutations and transposes Wednesday, February 13th Math 301 Week #4

Linear Algebra Section 2.6 : LU Decomposition Section 2.7 : Permutations and transposes Wednesday, February 13th Math 301 Week #4 Linear Algebra Section. : LU Decomposition Section. : Permutations and transposes Wednesday, February 1th Math 01 Week # 1 The LU Decomposition We learned last time that we can factor a invertible matrix

More information

The antitriangular factorisation of saddle point matrices

The antitriangular factorisation of saddle point matrices The antitriangular factorisation of saddle point matrices J. Pestana and A. J. Wathen August 29, 2013 Abstract Mastronardi and Van Dooren [this journal, 34 (2013) pp. 173 196] recently introduced the block

More information

Algorithms for Fast Linear System Solving and Rank Profile Computation

Algorithms for Fast Linear System Solving and Rank Profile Computation Algorithms for Fast Linear System Solving and Rank Profile Computation by Shiyun Yang A thesis presented to the University of Waterloo in fulfillment of the thesis requirement for the degree of Master

More information

Downloaded 08/28/12 to Redistribution subject to SIAM license or copyright; see

Downloaded 08/28/12 to Redistribution subject to SIAM license or copyright; see SIAM J. MATRIX ANAL. APPL. Vol. 26, No. 3, pp. 621 639 c 2005 Society for Industrial and Applied Mathematics ROW MODIFICATIONS OF A SPARSE CHOLESKY FACTORIZATION TIMOTHY A. DAVIS AND WILLIAM W. HAGER Abstract.

More information

7. LU factorization. factor-solve method. LU factorization. solving Ax = b with A nonsingular. the inverse of a nonsingular matrix

7. LU factorization. factor-solve method. LU factorization. solving Ax = b with A nonsingular. the inverse of a nonsingular matrix EE507 - Computational Techniques for EE 7. LU factorization Jitkomut Songsiri factor-solve method LU factorization solving Ax = b with A nonsingular the inverse of a nonsingular matrix LU factorization

More information

c 2017 Society for Industrial and Applied Mathematics

c 2017 Society for Industrial and Applied Mathematics SIAM J. SCI. COMPUT. Vol. 39, No. 4, pp. A1710 A1740 c 2017 Society for Industrial and Applied Mathematics ON THE COMPLEXITY OF THE BLOCK LOW-RANK MULTIFRONTAL FACTORIZATION PATRICK AMESTOY, ALFREDO BUTTARI,

More information

AMS526: Numerical Analysis I (Numerical Linear Algebra) Lecture 23: GMRES and Other Krylov Subspace Methods; Preconditioning

AMS526: Numerical Analysis I (Numerical Linear Algebra) Lecture 23: GMRES and Other Krylov Subspace Methods; Preconditioning AMS526: Numerical Analysis I (Numerical Linear Algebra) Lecture 23: GMRES and Other Krylov Subspace Methods; Preconditioning Xiangmin Jiao SUNY Stony Brook Xiangmin Jiao Numerical Analysis I 1 / 18 Outline

More information

CLASSICAL ITERATIVE METHODS

CLASSICAL ITERATIVE METHODS CLASSICAL ITERATIVE METHODS LONG CHEN In this notes we discuss classic iterative methods on solving the linear operator equation (1) Au = f, posed on a finite dimensional Hilbert space V = R N equipped

More information

BLAS: Basic Linear Algebra Subroutines Analysis of the Matrix-Vector-Product Analysis of Matrix-Matrix Product

BLAS: Basic Linear Algebra Subroutines Analysis of the Matrix-Vector-Product Analysis of Matrix-Matrix Product Level-1 BLAS: SAXPY BLAS-Notation: S single precision (D for double, C for complex) A α scalar X vector P plus operation Y vector SAXPY: y = αx + y Vectorization of SAXPY (αx + y) by pipelining: page 8

More information

Numerical Methods I Non-Square and Sparse Linear Systems

Numerical Methods I Non-Square and Sparse Linear Systems Numerical Methods I Non-Square and Sparse Linear Systems Aleksandar Donev Courant Institute, NYU 1 donev@courant.nyu.edu 1 MATH-GA 2011.003 / CSCI-GA 2945.003, Fall 2014 September 25th, 2014 A. Donev (Courant

More information

A robust multilevel approximate inverse preconditioner for symmetric positive definite matrices

A robust multilevel approximate inverse preconditioner for symmetric positive definite matrices DICEA DEPARTMENT OF CIVIL, ENVIRONMENTAL AND ARCHITECTURAL ENGINEERING PhD SCHOOL CIVIL AND ENVIRONMENTAL ENGINEERING SCIENCES XXX CYCLE A robust multilevel approximate inverse preconditioner for symmetric

More information

Multilevel Low-Rank Preconditioners Yousef Saad Department of Computer Science and Engineering University of Minnesota. Modelling 2014 June 2,2014

Multilevel Low-Rank Preconditioners Yousef Saad Department of Computer Science and Engineering University of Minnesota. Modelling 2014 June 2,2014 Multilevel Low-Rank Preconditioners Yousef Saad Department of Computer Science and Engineering University of Minnesota Modelling 24 June 2,24 Dedicated to Owe Axelsson at the occasion of his 8th birthday

More information

Domain decomposition on different levels of the Jacobi-Davidson method

Domain decomposition on different levels of the Jacobi-Davidson method hapter 5 Domain decomposition on different levels of the Jacobi-Davidson method Abstract Most computational work of Jacobi-Davidson [46], an iterative method suitable for computing solutions of large dimensional

More information

(17) (18)

(17) (18) Module 4 : Solving Linear Algebraic Equations Section 3 : Direct Solution Techniques 3 Direct Solution Techniques Methods for solving linear algebraic equations can be categorized as direct and iterative

More information

IMPROVING THE PERFORMANCE OF SPARSE LU MATRIX FACTORIZATION USING A SUPERNODAL ALGORITHM

IMPROVING THE PERFORMANCE OF SPARSE LU MATRIX FACTORIZATION USING A SUPERNODAL ALGORITHM IMPROVING THE PERFORMANCE OF SPARSE LU MATRIX FACTORIZATION USING A SUPERNODAL ALGORITHM Bogdan OANCEA PhD, Associate Professor, Artife University, Bucharest, Romania E-mail: oanceab@ie.ase.ro Abstract:

More information

Bindel, Fall 2016 Matrix Computations (CS 6210) Notes for

Bindel, Fall 2016 Matrix Computations (CS 6210) Notes for 1 Logistics Notes for 2016-08-26 1. Our enrollment is at 50, and there are still a few students who want to get in. We only have 50 seats in the room, and I cannot increase the cap further. So if you are

More information

LU Factorization with Panel Rank Revealing Pivoting and Its Communication Avoiding Version

LU Factorization with Panel Rank Revealing Pivoting and Its Communication Avoiding Version LU Factorization with Panel Rank Revealing Pivoting and Its Communication Avoiding Version Amal Khabou James Demmel Laura Grigori Ming Gu Electrical Engineering and Computer Sciences University of California

More information

MATRIX ALGEBRA AND SYSTEMS OF EQUATIONS. + + x 1 x 2. x n 8 (4) 3 4 2

MATRIX ALGEBRA AND SYSTEMS OF EQUATIONS. + + x 1 x 2. x n 8 (4) 3 4 2 MATRIX ALGEBRA AND SYSTEMS OF EQUATIONS SYSTEMS OF EQUATIONS AND MATRICES Representation of a linear system The general system of m equations in n unknowns can be written a x + a 2 x 2 + + a n x n b a

More information

On the Skeel condition number, growth factor and pivoting strategies for Gaussian elimination

On the Skeel condition number, growth factor and pivoting strategies for Gaussian elimination On the Skeel condition number, growth factor and pivoting strategies for Gaussian elimination J.M. Peña 1 Introduction Gaussian elimination (GE) with a given pivoting strategy, for nonsingular matrices

More information

9. Numerical linear algebra background

9. Numerical linear algebra background Convex Optimization Boyd & Vandenberghe 9. Numerical linear algebra background matrix structure and algorithm complexity solving linear equations with factored matrices LU, Cholesky, LDL T factorization

More information

Preliminary/Qualifying Exam in Numerical Analysis (Math 502a) Spring 2012

Preliminary/Qualifying Exam in Numerical Analysis (Math 502a) Spring 2012 Instructions Preliminary/Qualifying Exam in Numerical Analysis (Math 502a) Spring 2012 The exam consists of four problems, each having multiple parts. You should attempt to solve all four problems. 1.

More information

Solving an Elliptic PDE Eigenvalue Problem via Automated Multi-Level Substructuring and Hierarchical Matrices

Solving an Elliptic PDE Eigenvalue Problem via Automated Multi-Level Substructuring and Hierarchical Matrices Solving an Elliptic PDE Eigenvalue Problem via Automated Multi-Level Substructuring and Hierarchical Matrices Peter Gerds and Lars Grasedyck Bericht Nr. 30 März 2014 Key words: automated multi-level substructuring,

More information

THE H 2 -matrix is a general mathematical framework [1],

THE H 2 -matrix is a general mathematical framework [1], Accuracy Directly Controlled Fast Direct Solutions of General H -Matrices and ts Application to Electrically Large ntegral-equation-based Electromagnetic Analysis Miaomiao Ma, Student Member, EEE, and

More information

SPARSE signal representations have gained popularity in recent

SPARSE signal representations have gained popularity in recent 6958 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 57, NO. 10, OCTOBER 2011 Blind Compressed Sensing Sivan Gleichman and Yonina C. Eldar, Senior Member, IEEE Abstract The fundamental principle underlying

More information

Direct Methods for Solving Linear Systems. Simon Fraser University Surrey Campus MACM 316 Spring 2005 Instructor: Ha Le

Direct Methods for Solving Linear Systems. Simon Fraser University Surrey Campus MACM 316 Spring 2005 Instructor: Ha Le Direct Methods for Solving Linear Systems Simon Fraser University Surrey Campus MACM 316 Spring 2005 Instructor: Ha Le 1 Overview General Linear Systems Gaussian Elimination Triangular Systems The LU Factorization

More information

Scientific Computing WS 2018/2019. Lecture 9. Jürgen Fuhrmann Lecture 9 Slide 1

Scientific Computing WS 2018/2019. Lecture 9. Jürgen Fuhrmann Lecture 9 Slide 1 Scientific Computing WS 2018/2019 Lecture 9 Jürgen Fuhrmann juergen.fuhrmann@wias-berlin.de Lecture 9 Slide 1 Lecture 9 Slide 2 Simple iteration with preconditioning Idea: Aû = b iterative scheme û = û

More information

Numerical Methods in Matrix Computations

Numerical Methods in Matrix Computations Ake Bjorck Numerical Methods in Matrix Computations Springer Contents 1 Direct Methods for Linear Systems 1 1.1 Elements of Matrix Theory 1 1.1.1 Matrix Algebra 2 1.1.2 Vector Spaces 6 1.1.3 Submatrices

More information

QR FACTORIZATIONS USING A RESTRICTED SET OF ROTATIONS

QR FACTORIZATIONS USING A RESTRICTED SET OF ROTATIONS QR FACTORIZATIONS USING A RESTRICTED SET OF ROTATIONS DIANNE P. O LEARY AND STEPHEN S. BULLOCK Dedicated to Alan George on the occasion of his 60th birthday Abstract. Any matrix A of dimension m n (m n)

More information

30.3. LU Decomposition. Introduction. Prerequisites. Learning Outcomes

30.3. LU Decomposition. Introduction. Prerequisites. Learning Outcomes LU Decomposition 30.3 Introduction In this Section we consider another direct method for obtaining the solution of systems of equations in the form AX B. Prerequisites Before starting this Section you

More information

Computation fundamentals of discrete GMRF representations of continuous domain spatial models

Computation fundamentals of discrete GMRF representations of continuous domain spatial models Computation fundamentals of discrete GMRF representations of continuous domain spatial models Finn Lindgren September 23 2015 v0.2.2 Abstract The fundamental formulas and algorithms for Bayesian spatial

More information

Linear Solvers. Andrew Hazel

Linear Solvers. Andrew Hazel Linear Solvers Andrew Hazel Introduction Thus far we have talked about the formulation and discretisation of physical problems...... and stopped when we got to a discrete linear system of equations. Introduction

More information

Revised Simplex Method

Revised Simplex Method DM545 Linear and Integer Programming Lecture 7 Marco Chiarandini Department of Mathematics & Computer Science University of Southern Denmark Outline 1. 2. 2 Motivation Complexity of single pivot operation

More information

Sparse Linear Systems. Iterative Methods for Sparse Linear Systems. Motivation for Studying Sparse Linear Systems. Partial Differential Equations

Sparse Linear Systems. Iterative Methods for Sparse Linear Systems. Motivation for Studying Sparse Linear Systems. Partial Differential Equations Sparse Linear Systems Iterative Methods for Sparse Linear Systems Matrix Computations and Applications, Lecture C11 Fredrik Bengzon, Robert Söderlund We consider the problem of solving the linear system

More information

Algorithms to solve block Toeplitz systems and. least-squares problems by transforming to Cauchy-like. matrices

Algorithms to solve block Toeplitz systems and. least-squares problems by transforming to Cauchy-like. matrices Algorithms to solve block Toeplitz systems and least-squares problems by transforming to Cauchy-like matrices K. Gallivan S. Thirumalai P. Van Dooren 1 Introduction Fast algorithms to factor Toeplitz matrices

More information

Lecture 2 INF-MAT : , LU, symmetric LU, Positve (semi)definite, Cholesky, Semi-Cholesky

Lecture 2 INF-MAT : , LU, symmetric LU, Positve (semi)definite, Cholesky, Semi-Cholesky Lecture 2 INF-MAT 4350 2009: 7.1-7.6, LU, symmetric LU, Positve (semi)definite, Cholesky, Semi-Cholesky Tom Lyche and Michael Floater Centre of Mathematics for Applications, Department of Informatics,

More information

SYMBOLIC AND EXACT STRUCTURE PREDICTION FOR SPARSE GAUSSIAN ELIMINATION WITH PARTIAL PIVOTING

SYMBOLIC AND EXACT STRUCTURE PREDICTION FOR SPARSE GAUSSIAN ELIMINATION WITH PARTIAL PIVOTING SYMBOLIC AND EXACT STRUCTURE PREDICTION FOR SPARSE GAUSSIAN ELIMINATION WITH PARTIAL PIVOTING LAURA GRIGORI, JOHN R. GILBERT, AND MICHEL COSNARD Abstract. In this paper we consider two structure prediction

More information

arxiv: v1 [math.ra] 13 Jan 2009

arxiv: v1 [math.ra] 13 Jan 2009 A CONCISE PROOF OF KRUSKAL S THEOREM ON TENSOR DECOMPOSITION arxiv:0901.1796v1 [math.ra] 13 Jan 2009 JOHN A. RHODES Abstract. A theorem of J. Kruskal from 1977, motivated by a latent-class statistical

More information

Block Low-Rank (BLR) approximations to improve multifrontal sparse solvers

Block Low-Rank (BLR) approximations to improve multifrontal sparse solvers Block Low-Rank (BLR) approximations to improve multifrontal sparse solvers Joint work with Patrick Amestoy, Cleve Ashcraft, Olivier Boiteau, Alfredo Buttari and Jean-Yves L Excellent, PhD started on October

More information

Intrinsic products and factorizations of matrices

Intrinsic products and factorizations of matrices Available online at www.sciencedirect.com Linear Algebra and its Applications 428 (2008) 5 3 www.elsevier.com/locate/laa Intrinsic products and factorizations of matrices Miroslav Fiedler Academy of Sciences

More information

8.3 Householder QR factorization

8.3 Householder QR factorization 83 ouseholder QR factorization 23 83 ouseholder QR factorization A fundamental problem to avoid in numerical codes is the situation where one starts with large values and one ends up with small values

More information

STAT 309: MATHEMATICAL COMPUTATIONS I FALL 2018 LECTURE 13

STAT 309: MATHEMATICAL COMPUTATIONS I FALL 2018 LECTURE 13 STAT 309: MATHEMATICAL COMPUTATIONS I FALL 208 LECTURE 3 need for pivoting we saw that under proper circumstances, we can write A LU where 0 0 0 u u 2 u n l 2 0 0 0 u 22 u 2n L l 3 l 32, U 0 0 0 l n l

More information

Fast Structured Spectral Methods

Fast Structured Spectral Methods Spectral methods HSS structures Fast algorithms Conclusion Fast Structured Spectral Methods Yingwei Wang Department of Mathematics, Purdue University Joint work with Prof Jie Shen and Prof Jianlin Xia

More information

Kasetsart University Workshop. Multigrid methods: An introduction

Kasetsart University Workshop. Multigrid methods: An introduction Kasetsart University Workshop Multigrid methods: An introduction Dr. Anand Pardhanani Mathematics Department Earlham College Richmond, Indiana USA pardhan@earlham.edu A copy of these slides is available

More information

The amount of work to construct each new guess from the previous one should be a small multiple of the number of nonzeros in A.

The amount of work to construct each new guess from the previous one should be a small multiple of the number of nonzeros in A. AMSC/CMSC 661 Scientific Computing II Spring 2005 Solution of Sparse Linear Systems Part 2: Iterative methods Dianne P. O Leary c 2005 Solving Sparse Linear Systems: Iterative methods The plan: Iterative

More information

Incomplete LU Preconditioning and Error Compensation Strategies for Sparse Matrices

Incomplete LU Preconditioning and Error Compensation Strategies for Sparse Matrices Incomplete LU Preconditioning and Error Compensation Strategies for Sparse Matrices Eun-Joo Lee Department of Computer Science, East Stroudsburg University of Pennsylvania, 327 Science and Technology Center,

More information

Ann. Funct. Anal. 5 (2014), no. 2, A nnals of F unctional A nalysis ISSN: (electronic) URL:

Ann. Funct. Anal. 5 (2014), no. 2, A nnals of F unctional A nalysis ISSN: (electronic) URL: Ann Funct Anal 5 (2014), no 2, 127 137 A nnals of F unctional A nalysis ISSN: 2008-8752 (electronic) URL:wwwemisde/journals/AFA/ THE ROOTS AND LINKS IN A CLASS OF M-MATRICES XIAO-DONG ZHANG This paper

More information

ECE133A Applied Numerical Computing Additional Lecture Notes

ECE133A Applied Numerical Computing Additional Lecture Notes Winter Quarter 2018 ECE133A Applied Numerical Computing Additional Lecture Notes L. Vandenberghe ii Contents 1 LU factorization 1 1.1 Definition................................. 1 1.2 Nonsingular sets

More information

Dense LU factorization and its error analysis

Dense LU factorization and its error analysis Dense LU factorization and its error analysis Laura Grigori INRIA and LJLL, UPMC February 2016 Plan Basis of floating point arithmetic and stability analysis Notation, results, proofs taken from [N.J.Higham,

More information

Effective Resistance and Schur Complements

Effective Resistance and Schur Complements Spectral Graph Theory Lecture 8 Effective Resistance and Schur Complements Daniel A Spielman September 28, 2015 Disclaimer These notes are not necessarily an accurate representation of what happened in

More information