EFFECTIVE AND ROBUST PRECONDITIONING OF GENERAL SPD MATRICES VIA STRUCTURED INCOMPLETE FACTORIZATION


JIANLIN XIA AND ZIXING XIN

Abstract. For general symmetric positive definite (SPD) matrices, we present a framework for designing effective and robust black-box preconditioners via structured incomplete factorization. In a scaling-and-compression strategy, off-diagonal blocks are first scaled on both sides by the inverses of the factors of the corresponding diagonal blocks and then compressed into low-rank approximations. ULV-type factorizations are then computed. A resulting prototype preconditioner is always positive definite. Generalizations to practical hierarchical multilevel preconditioners are given. Systematic analysis of the approximation error, robustness, and effectiveness is shown for both the prototype preconditioner and the multilevel generalization. In particular, we show how local scaling and compression control the approximation accuracy and robustness, and how aggressive compression leads to efficient preconditioners that can significantly reduce the condition number and improve the eigenvalue clustering. A result is also given to show when the multilevel preconditioner preserves positive definiteness. The costs to apply the multilevel preconditioners are about O(N), where N is the matrix size. Numerical tests on several ill-conditioned problems show the effectiveness and robustness even if the compression uses very small numerical ranks. In addition, significant robustness and effectiveness benefits can be observed as compared with a standard rank structured preconditioner based on direct off-diagonal compression.

Key words. effective preconditioning, approximation error, preservation of positive definiteness, structured incomplete factorization, scaling-and-compression strategy, multilevel scheme

AMS subject classifications. 15A23, 65F10, 65F30

1. Introduction. Preconditioners play a key role in iterative solutions of linear systems. For symmetric positive definite (SPD) matrices, incomplete factorizations have often been used to construct robust preconditioners; see [1, 2, 3, 13, 19, 22] for some examples. In recent years, a strategy based on low-rank approximations has been successfully used to design preconditioners, where certain off-diagonal blocks are approximated by low-rank forms. Such rank structured preconditioners often target some specific problems, usually PDEs and integral equations (see, e.g., [6, 10, 11, 14]). For more general sparse matrices, some algebraic preconditioners that incorporate low-rank approximations appear in [15, 16, 17, 5].

For general dense SPD matrices A, limited work is available on the feasibility and effectiveness of preconditioning techniques based on low-rank off-diagonal approximations. In general, when the off-diagonal blocks of A are directly approximated by low-rank forms with very low accuracy, it is not clear how effective the resulting preconditioner is. The preconditioner may also fail to preserve the positive definiteness that is crucial in many applications. Preliminary studies are conducted in [1, 9, 3], where sequential Schur complement computations are used to ensure positive definiteness via an idea of Schur monotonicity or Schur compensation. In [9], the effectiveness for preconditioning is analyzed in terms of a special case.

In this paper, we consider the preconditioning of a general SPD matrix A based on incomplete factorizations that involve low-rank off-diagonal approximations. For convenience, all our discussions are in terms of real dense SPD matrices. We present a framework to construct effective and robust black-box structured preconditioners that are convenient to analyze.
Department of Mathematics, Purdue University, West Lafayette, IN (xiaj@math.purdue.edu, zxin@purdue.edu). The research of Jianlin Xia was supported in part by an NSF CAREER Award (DMS).

In the framework, we compute a structured incomplete factorization (SIF)

    A ≈ \widetilde{L} \widetilde{L}^T,

where appropriate off-diagonal blocks are first scaled on both sides and then compressed into low-rank approximations. The resulting preconditioners, called SIF preconditioners, can effectively reduce the condition number of A and bring the eigenvalues to cluster around 1. They are also robust in the sense that they can preserve the positive definiteness, either unconditionally (in a prototype case) or under a condition (in a general case). More specifically, our main contributions include: (1) a fundamental SIF preconditioning framework, (2) the multilevel generalization, and (3) systematic analysis. The details are as follows.

We give a fundamental SIF framework for designing effective and robust structured preconditioners for A. The framework includes a scaling-and-compression strategy for matrix approximation and a ULV-type factorization. That is, instead of directly compressing an off-diagonal block, we scale the off-diagonal block on both sides with the inverses of the factors of the corresponding diagonal blocks and then perform the compression. A sequence of orthogonal and triangular factors (called ULV factors) is then computed. \widetilde{L} is defined by these ULV factors, and \widetilde{L}\widetilde{L}^T serves as the preconditioner.

We show that this preconditioner has superior effectiveness and robustness. The basic idea is illustrated in terms of a prototype preconditioner, which is shown to be always positive definite regardless of the accuracy used in the compression. We can also show that the local compression tolerance τ precisely controls the overall relative approximation accuracy of the matrix and the factor (see Theorem 2.4). Furthermore, we can justify the effectiveness of the preconditioner based on how τ impacts the eigenvalue distribution and the condition number of the preconditioned matrix \widetilde{L}^{-1} A \widetilde{L}^{-T}. In fact, the eigenvalues of \widetilde{L}^{-1} A \widetilde{L}^{-T} cluster inside [1 − τ, 1 + τ]. Moreover, as long as the singular values of the scaled off-diagonal block decay even slightly, we can aggressively truncate them so as to get a compact preconditioner \widetilde{L}\widetilde{L}^T that significantly reduces the condition number of \widetilde{L}^{-1} A \widetilde{L}^{-T}. This is rigorously characterized in terms of a representation (1 + τ)/(1 − τ) in Theorem 2.5.

We further generalize the prototype preconditioner to multiple levels, where multilevel scaling-and-compression operations are combined with a hierarchical ULV-type factorization. The robustness, accuracy, and effectiveness of the multilevel preconditioner \widetilde{L}\widetilde{L}^T are also shown. A condition on τ is given to ensure that the preconditioner is positive definite (Theorem 3.1). We also prove a bound on the condition number of \widetilde{L}^{-1} A \widetilde{L}^{-T} in a form similar to that in the prototype case, i.e., (1 + ϵ)/(1 − ϵ) in Theorem 3.2, where ϵ is related to τ. The eigenvalues of \widetilde{L}^{-1} A \widetilde{L}^{-T} cluster inside [1/(1 + ϵ), 1/(1 − ϵ)]. The accuracy and effectiveness analysis confirms that, with low-accuracy approximation, the preconditioners can significantly reduce the condition number and improve the eigenvalue clustering.

The SIF framework and preconditioning techniques have some attractive features.
1. The scaling strategy can enhance the compressibility of the off-diagonal blocks in some problems, where those blocks may have singular values very close to each other. An example is the five-point discrete Laplacian matrix from a 2D regular grid, which has the identity matrix as the nonzero subblock in its off-diagonal blocks.
2. The scaling-and-compression strategy also propagates diagonal block information to off-diagonal blocks, so that it is convenient to justify the robustness and effectiveness. In the earlier work in [9], only the leading diagonal block information is propagated to an off-diagonal block.

3. Unlike the method in [9] that involves Schur complement computations in a sequential fashion, our SIF framework computes hierarchical ULV-type factorizations and avoids the use of sequential Schur complement computations. The construction of the preconditioner follows a binary tree, where the operations at the same level can be performed simultaneously. This potentially leads to much better scalability.
4. It is convenient to obtain a general multilevel preconditioner by repeatedly applying the scaling-and-compression strategy. The cost to apply the resulting preconditioner is O(N log N), where N is the order of A. A modified version is also provided and costs only O(N) to apply. The O(N log N) version is generally more effective, while the O(N) version is slightly more robust in practice.
5. Systematic analysis of the approximation accuracy, robustness, and effectiveness is given, not only for the prototype preconditioner, but also for the multilevel one. Multilevel error accumulation is analyzed. The eigenvalue clustering and the resulting condition number are also shown. The studies in [9] are restricted to a special 1-level case.

The performance of the preconditioners is illustrated in some numerical tests. In particular, we use aggressive compression with very small ranks in the approximation of the scaled off-diagonal blocks and still get satisfactory convergence in iterative solutions. For some highly ill-conditioned problems, such as some interpolation matrices in radial basis function methods, our preconditioners can still preserve positive definiteness and also greatly accelerate the convergence. On the other hand, a structured preconditioner based on direct off-diagonal compression (as in traditional rank structured methods) yields slower convergence and also fails to be positive definite.

The outline of the paper is as follows. We first present the fundamental SIF preconditioning framework in Section 2; the prototype preconditioner and its analysis are included. The generalization of the framework and analysis to multiple levels is given in Section 3, followed by numerical tests in Section 4. The following notation will be used throughout the discussions. λ(A) denotes an eigenvalue of A, and we usually use λ(A) to mean any eigenvalue of A in general; λ_1(A) and λ_N(A) specifically denote the largest and smallest eigenvalues of A, respectively. ρ(A) denotes the spectral radius of A. κ(A) is the 2-norm condition number of A. diag(·) represents a diagonal or block diagonal matrix with the given diagonal entries/blocks.

2. Fundamental SIF preconditioning framework and analysis. One essential idea of our preconditioners is a scaling-and-compression strategy: the off-diagonal blocks are first scaled on both sides by the inverses of the factors of the corresponding diagonal blocks and then compressed. The work in [9] only uses the scaling on the left by the inverse of the leading factor, and the scaling on both sides is mentioned without details. The double-sided scaling strategy was first formally mentioned in our presentation [6]. Here, we show that, with the double-sided scaling and compression, it becomes very convenient to justify the effectiveness and also to generalize to multiple levels in the next section. Also unlike in [9], our schemes do not need sequential Schur complement computations.

In this section, we present our scaling-and-compression strategy in a prototype preconditioner. Then we rigorously show its properties. The preconditioner and analysis will be generalized to multiple levels in Section 3.

2.1. Scaling-and-compression strategy. Consider an N × N SPD matrix A partitioned into a 2 × 2 block form

(2.1)    A = [ A_{11}  A_{12} ; A_{12}^T  A_{22} ],

where the two diagonal blocks A_{11} and A_{22} are square matrices. Suppose the Cholesky factorizations of the diagonal blocks look like

(2.2)    A_{11} = L_1 L_1^T,    A_{22} = L_2 L_2^T.

Then

(2.3)    A = diag(L_1, L_2) [ I  C ; C^T  I ] diag(L_1, L_2)^T,

where the identity matrices may have different sizes, and

(2.4)    C = L_1^{-1} A_{12} L_2^{-T}.

Compute an SVD

(2.5)    C = ( U_1  \hat U_1 ) diag(Σ, \hat Σ) ( U_2  \hat U_2 )^T = U_1 Σ U_2^T + \hat U_1 \hat Σ \hat U_2^T,

where Σ = diag(σ_1, ..., σ_r) is for the leading singular values σ_1, ..., σ_r of C, and \hat Σ = diag(σ_{r+1}, σ_{r+2}, ...) is for the remaining singular values σ_{r+1}, σ_{r+2}, ... of C. Σ is a square matrix, and \hat Σ may be rectangular. ( U_1  \hat U_1 ) and ( U_2  \hat U_2 ) are square matrices. Suppose all the singular values are ordered from the largest to the smallest, and

(2.6)    σ_{r+1} ≤ τ ≤ σ_r.

Later, τ will be used as a tolerance for the truncation of the singular values in \hat Σ. Before we proceed, we give the following simple lemma.

Lemma 2.1. Suppose C is an m × n matrix with singular values σ_i, i = 1, 2, ..., min{m, n}. Let H = [ I  C ; C^T  I ]. Then H has an eigenvalue 1 with multiplicity m + n − 2 min{m, n}, and its remaining eigenvalues are given by 1 ± σ_i, i = 1, 2, ..., min{m, n}. Furthermore, H is SPD if and only if σ_i < 1, i = 1, 2, ..., min{m, n}.

Proof. Suppose \bar Σ is an m × n diagonal matrix with the main diagonal entries given by σ_i, i = 1, 2, ..., min{m, n}. Then H is orthogonally similar to [ I  \bar Σ ; \bar Σ^T  I ], whose eigenvalues are either 1 or 1 ± σ_i. This result then immediately implies that H is positive definite if and only if σ_i < 1, i = 1, 2, ..., min{m, n}.
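To make the scaling-and-compression step concrete, the following Python sketch (ours, not from the paper; the function and variable names are illustrative) carries out (2.2)-(2.6) for a dense SPD matrix: it factorizes the diagonal blocks, scales the off-diagonal block on both sides, and truncates its SVD at the absolute tolerance τ.

    import numpy as np
    from scipy.linalg import cholesky, solve_triangular, svd

    def scale_and_compress(A, m, tau):
        # Split A into 2 x 2 blocks with a leading m x m diagonal block.
        A11, A12, A22 = A[:m, :m], A[:m, m:], A[m:, m:]
        L1 = cholesky(A11, lower=True)             # A11 = L1 L1^T  (2.2)
        L2 = cholesky(A22, lower=True)             # A22 = L2 L2^T
        # Double-sided scaling: C = L1^{-1} A12 L2^{-T}  (2.4)
        C = solve_triangular(L1, A12, lower=True)
        C = solve_triangular(L2, C.T, lower=True).T
        # Truncated SVD: keep the singular values above the tolerance tau  (2.5)-(2.6)
        U, s, Vt = svd(C, full_matrices=False)
        r = int(np.count_nonzero(s > tau))
        return L1, L2, U[:, :r], s[:r], Vt[:r].T   # L1, L2, U1, diag(Sigma), U2

By Lemma 2.2 below, every singular value of C is below 1, so this truncation always keeps σ_i < 1.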

Since A is SPD, so is [ I  C ; C^T  I ] in (2.3). Then Lemma 2.1 leads to the following result.

Lemma 2.2. All the singular values σ_i of C in (2.4) satisfy σ_i < 1, i = 1, 2, ....

To construct a structured incomplete preconditioner, we truncate the singular values of C in (2.4) by dropping \hat Σ in (2.5). Then

(2.7)    A ≈ \widetilde A = diag(L_1, L_2) [ I  U_1 Σ U_2^T ; U_2 Σ U_1^T  I ] diag(L_1, L_2)^T.

We can continue to factorize the matrix in the middle of the right-hand side so as to obtain a preconditioner. For the analysis purpose, we tentatively use the Cholesky factorization. (Later in Section 2.4, this will be replaced by a more practical ULV factorization.) That is,

(2.8)    [ I  U_1 Σ U_2^T ; U_2 Σ U_1^T  I ] = [ I  0 ; U_2 Σ U_1^T  I ] [ I  0 ; 0  \widetilde S ] [ I  U_1 Σ U_2^T ; 0  I ],

where \widetilde S is the Schur complement

(2.9)    \widetilde S = I − U_2 Σ^2 U_2^T.

From Proposition 2.3 below, we can see that \widetilde S is positive definite, so assume \widetilde S = \widetilde D \widetilde D^T is its Cholesky factorization. Thus,

    [ I  U_1 Σ U_2^T ; U_2 Σ U_1^T  I ] = [ I  0 ; U_2 Σ U_1^T  \widetilde D ] [ I  U_1 Σ U_2^T ; 0  \widetilde D^T ].

Then a prototype preconditioner looks like

(2.10)    \widetilde A = \widetilde L \widetilde L^T,  with  \widetilde L = diag(L_1, L_2) [ I  0 ; U_2 Σ U_1^T  \widetilde D ] = [ L_1  0 ; L_2 U_2 Σ U_1^T  L_2 \widetilde D ].

Remark 2.1. Our scheme here is more general than the one in [9], and there are some fundamental differences.
1. Double-sided scaling L_1^{-1} A_{12} L_2^{-T} is used here in (2.4), while single-sided scaling L_1^{-1} A_{12} is used in [9]. The double-sided scaling and compression make it convenient to control the approximation accuracy of \widetilde A, and can be used to produce more general and comprehensive analysis (see the next subsection). The analysis in [9] is restricted to a special case.
2. Explicit Schur complement computation is needed in [9] to get a structured Cholesky factorization, which is essentially a sequential process. Our ULV factorization in Section 2.4 will avoid the Schur complements. The mention of the Schur complement above is only for the analysis purpose, and is not needed in the implementation.
3. The generalization of the prototype preconditioner (based on 1-level block partitioning) to multiple levels is also more natural via repeated application. We will further provide accuracy, effectiveness, and robustness analysis for our multilevel version (see Section 3). No such type of analysis is available in [9].
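A dense illustration of the prototype factor (2.10) can be assembled from the output of the sketch above. This only mirrors the algebra (the practical scheme in Section 2.4 replaces the explicit Schur complement and its Cholesky factorization with a ULV factorization), and the helper name prototype_factor is ours.

    import numpy as np
    from scipy.linalg import cholesky

    def prototype_factor(L1, L2, U1, sigma, U2):
        # Assemble Ltil with A ~ Ltil Ltil^T as in (2.10).
        n2 = L2.shape[0]
        S = np.eye(n2) - (U2 * sigma**2) @ U2.T    # Schur complement (2.9), always SPD
        D = cholesky(S, lower=True)                # S = D D^T
        top = np.hstack([L1, np.zeros((L1.shape[0], n2))])
        bot = np.hstack([L2 @ (U2 * sigma) @ U1.T, L2 @ D])
        return np.vstack([top, bot])               # the prototype factor Ltil

The product Ltil @ Ltil.T is then positive definite no matter how aggressively the singular values were truncated, which is exactly the content of Proposition 2.3 below.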

2.2. Robustness and accuracy. We now discuss the robustness and accuracy of the prototype preconditioner \widetilde A in (2.7) and (2.10). We show that \widetilde A is guaranteed to be positive definite, and that it approximates A with a relative error bound precisely controlled by the absolute compression tolerance τ. The tolerance τ also controls the approximation accuracy of \widetilde L.

Proposition 2.3. \widetilde A in (2.7), resulting from the truncation of \hat Σ in (2.5), is always positive definite.

Proof. There are multiple ways to show the result. We prove it from an aspect that clearly reflects the robustness of the preconditioner. Since [ I  C ; C^T  I ] is SPD, the following Schur complement is also SPD:

(2.11)    S = I − C^T C = I − U_2 Σ^2 U_2^T − \hat U_2 \hat Σ^T \hat Σ \hat U_2^T.

Note that \hat Σ is allowed to be rectangular. On the other hand, (2.9) means

(2.12)    \widetilde S = S + \hat U_2 \hat Σ^T \hat Σ \hat U_2^T.

Thus, \widetilde S in (2.9) is SPD. Accordingly, [ I  U_1 Σ U_2^T ; U_2 Σ U_1^T  I ] in (2.8) is SPD, and (2.7) then indicates that \widetilde A is SPD.

The relationship in (2.12) is consistent with the idea of Schur monotonicity or Schur compensation in [1, 9], and leads to the enhanced robustness of \widetilde A. An alternative proof can be based on Lemma 2.1.

We then show the approximation accuracy of \widetilde A and \widetilde L in terms of the truncation tolerance τ.

Theorem 2.4. Suppose L is the exact lower triangular Cholesky factor of A. Then

    ‖A − \widetilde A‖_2 ≤ τ ‖A‖_2,    ‖\widetilde L − L‖_2 ≤ τ ( 1 + d_n ( sqrt(1 − σ_n^2) / (1 − σ_1^2) ) τ ) ‖L‖_2,

where d_n = 1/2 + log_2 n and n is the column size of C.

Proof. Note that (2.6) and Lemma 2.2 imply τ < 1. According to (2.3), (2.5), and (2.7), the approximation error matrix is given by

    A − \widetilde A = diag(L_1, L_2) [ 0  \hat U_1 \hat Σ \hat U_2^T ; \hat U_2 \hat Σ^T \hat U_1^T  0 ] diag(L_1, L_2)^T
                     = [ 0  L_1 \hat U_1 \hat Σ \hat U_2^T L_2^T ; L_2 \hat U_2 \hat Σ^T \hat U_1^T L_1^T  0 ].

Thus,

    ‖A − \widetilde A‖_2 = ‖L_1 \hat U_1 \hat Σ \hat U_2^T L_2^T‖_2 ≤ ‖L_1‖_2 ‖\hat Σ‖_2 ‖L_2‖_2 = σ_{r+1} sqrt(‖A_{11}‖_2 ‖A_{22}‖_2) ≤ τ ‖A‖_2,

where we have used the relationship between the 2-norm of an SPD matrix and the 2-norm of its Cholesky factor.

Next, consider the approximation accuracy of \widetilde L. Suppose S = D D^T is the Cholesky factorization of S in (2.11). According to (2.3) and (2.5),

(2.13)    L = diag(L_1, L_2) [ I  0 ; C^T  D ] = [ L_1  0 ; L_2 ( U_2 Σ U_1^T + \hat U_2 \hat Σ^T \hat U_1^T )  L_2 D ].

This together with (2.10) yields

    \widetilde L − L = [ 0  0 ; −L_2 \hat U_2 \hat Σ^T \hat U_1^T  L_2 ( \widetilde D − D ) ].

Thus,

    ‖\widetilde L − L‖_2 ≤ ‖L_2‖_2 ( ‖\hat U_2 \hat Σ^T \hat U_1^T‖_2 + ‖\widetilde D − D‖_2 )
                        ≤ ( τ + ‖\widetilde D − D‖_2 ) ‖L_2‖_2
                        ≤ ( τ + ‖\widetilde D − D‖_2 ) sqrt(‖A‖_2)
                        = ( τ + ‖\widetilde D − D‖_2 ) ‖L‖_2.

According to [9], the Cholesky factor \widetilde D of \widetilde S satisfies the approximation error bound

    ‖\widetilde D − D‖_2 ≤ d_n ‖S^{-1}‖_2 ‖D‖_2 ‖\widetilde S − S‖_2 = d_n ‖S^{-1}‖_2 ‖D‖_2 σ_{r+1}^2,

where d_n = 1/2 + log_2 n and n is the column size of C (and also the size of S). From (2.5) and (2.11), and noticing Lemma 2.1, we can see that

    ‖S^{-1}‖_2 = 1 / (1 − σ_1^2),    ‖D‖_2 = sqrt(1 − σ_n^2).

Thus,

    ‖\widetilde D − D‖_2 ≤ d_n ( sqrt(1 − σ_n^2) / (1 − σ_1^2) ) σ_{r+1}^2 ≤ d_n ( sqrt(1 − σ_n^2) / (1 − σ_1^2) ) τ^2,

    ‖\widetilde L − L‖_2 ≤ ( τ + d_n ( sqrt(1 − σ_n^2) / (1 − σ_1^2) ) τ^2 ) ‖L‖_2 = τ ( 1 + d_n ( sqrt(1 − σ_n^2) / (1 − σ_1^2) ) τ ) ‖L‖_2.

Theorem 2.4 and Lemma 2.2 indicate that the absolute tolerance τ in (2.6) for the compression of C essentially plays the role of a relative error bound in the approximation of A by \widetilde A. τ also controls the relative approximation accuracy of the Cholesky factor L. In fact, if τ is not too large and satisfies d_n ( sqrt(1 − σ_n^2) / (1 − σ_1^2) ) τ = O(1), then

    ‖\widetilde L − L‖_2 = O(τ) ‖L‖_2.
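The two bounds above are easy to check numerically. The snippet below (an illustration with our own choice of test matrix and tolerance, reusing the two sketches given earlier) verifies the relative error bound of Theorem 2.4 and the unconditional positive definiteness of Proposition 2.3.

    import numpy as np

    rng = np.random.default_rng(0)
    n = 200
    B = rng.standard_normal((n, n))
    A = B @ B.T + n * np.eye(n)                    # a generic SPD test matrix (illustrative)
    tau = 0.5
    L1, L2, U1, sigma, U2 = scale_and_compress(A, n // 2, tau)
    Ltil = prototype_factor(L1, L2, U1, sigma, U2)
    Atil = Ltil @ Ltil.T
    rel_err = np.linalg.norm(A - Atil, 2) / np.linalg.norm(A, 2)
    print(rel_err <= tau)                          # ||A - Atil||_2 <= tau ||A||_2 (Theorem 2.4)
    print(np.all(np.linalg.eigvalsh(Atil) > 0))    # Atil stays positive definite (Proposition 2.3)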

2.3. Effectiveness of the preconditioner. Theorem 2.4 indicates that we can use τ to control how accurately \widetilde A approximates A and \widetilde L approximates L. Moreover, since we are interested in preconditioning, we would like to show that τ does not have to be very small to yield an effective preconditioner. A larger tolerance τ means a more compact SIF preconditioner that is more efficient to apply. Note that our scheme here is different from and more general than the one in [9] (see Remark 2.1). However, we can still prove effectiveness results similar to those in [9] and even stronger, though the double-sided scaling makes the analysis much less trivial here.

Theorem 2.5. For \widetilde L in (2.10), we have

    \widetilde L^{-1} A \widetilde L^{-T} = [ I  \hat C ; \hat C^T  I ],  where  \hat C = \hat U_1 \hat Σ \hat U_2^T \widetilde D^{-T},  and

(2.14)    ‖\hat C‖_2 = σ_{r+1} ≤ τ.

Moreover,

(2.15)    ‖\widetilde L^{-1} A \widetilde L^{-T} − I‖_2 = σ_{r+1},
(2.16)    |λ( \widetilde L^{-1} A \widetilde L^{-T} ) − 1| ≤ σ_{r+1},
(2.17)    κ( \widetilde L^{-1} A \widetilde L^{-T} ) = (1 + σ_{r+1}) / (1 − σ_{r+1}) ≤ (1 + τ) / (1 − τ).

Proof. According to (2.10) and (2.13),

    \widetilde L^{-1} A \widetilde L^{-T} = [ I  0 ; U_2 Σ U_1^T  \widetilde D ]^{-1} [ I  C ; C^T  I ] [ I  U_1 Σ U_2^T ; 0  \widetilde D^T ]^{-1}.

Using C = U_1 Σ U_2^T + \hat U_1 \hat Σ \hat U_2^T together with U_1^T \hat U_1 = 0 and U_2^T \hat U_2 = 0, a direct block computation gives

    \widetilde L^{-1} A \widetilde L^{-T} = [ I  \hat C ; \hat C^T  \widetilde D^{-1} ( \hat U_2 \hat Σ^T \hat Σ \hat U_2^T + S ) \widetilde D^{-T} ] = [ I  \hat C ; \hat C^T  I ],

where the last equality follows from (2.12) and \widetilde S = \widetilde D \widetilde D^T.

Next, we show (2.14). Notice that the 2-norm of a symmetric matrix is equal to its spectral radius. We have

    ‖\hat C‖_2^2 = ρ( \hat C^T \hat C ) = ρ( \widetilde D^{-1} \hat U_2 \hat Σ^T \hat Σ \hat U_2^T \widetilde D^{-T} ) = ρ( \widetilde S^{-1} \hat U_2 \hat Σ^T \hat Σ \hat U_2^T ) = ρ( ( I − U_2 Σ^2 U_2^T )^{-1} \hat U_2 \hat Σ^T \hat Σ \hat U_2^T ).

According to the Sherman-Morrison-Woodbury formula,

    ( I − U_2 Σ^2 U_2^T )^{-1} = I + U_2 Σ ( I − Σ U_2^T U_2 Σ )^{-1} Σ U_2^T = I + U_2 Σ^2 ( I − Σ^2 )^{-1} U_2^T.

Thus,

    ‖\hat C‖_2^2 = ρ( ( I + U_2 Σ^2 ( I − Σ^2 )^{-1} U_2^T ) \hat U_2 \hat Σ^T \hat Σ \hat U_2^T ) = ρ( \hat U_2 \hat Σ^T \hat Σ \hat U_2^T ) = ‖\hat U_2 \hat Σ^T \hat Σ \hat U_2^T‖_2 = σ_{r+1}^2,

where U_2^T \hat U_2 = 0 is used. The remaining results can then be conveniently shown. (2.15) is obvious. Lemma 2.1 then means that [ I  \hat C ; \hat C^T  I ] has the largest eigenvalue 1 + σ_{r+1} and the smallest eigenvalue 1 − σ_{r+1}. This leads to (2.16) and (2.17).

Theorem 2.5 indicates that τ also controls how close the preconditioned matrix \widetilde L^{-1} A \widetilde L^{-T} is to I and how closely λ( \widetilde L^{-1} A \widetilde L^{-T} ) clusters around 1. Moreover, the effectiveness of the preconditioner when τ is not small can be interpreted from the theorem similarly to [9]. That is, when the singular values σ_i of C decay, the condition number (1 + σ_{r+1}) / (1 − σ_{r+1}) after truncation decays much faster. In [9], this is illustrated in terms of a special case. A similar study of the decay property is also done in [17] for some Schur complement based preconditioners. Here, we can further rigorously characterize this as follows. Suppose s(t) is a differentiable and non-increasing function with t > 0 and 0 ≤ s(t) < 1. Its decay rate at t can be reflected by its slope s'(t). Then for f(t) = (1 + s(t)) / (1 − s(t)),

    f'(t) = θ(t) s'(t),  with  θ(t) = 2 / (1 − s(t))^2.

This indicates that the slope of f(t) at t is that of s(t) enlarged by a magnification factor θ(t). The closer t is to 0 (or s is to 1), the larger θ(t) is and the faster f(t) decays. Accordingly, this means that a relatively small t can already bring f(t) to a reasonable magnitude. That is, by keeping a relatively small number of the largest singular values σ_i of C (σ_i ∈ [0, 1)), we can get a reasonable condition number after preconditioning. Notice that the singular value truncation is controlled by (2.6). We can thus use a relatively large tolerance τ for preconditioning. For example, for selected values of τ, the corresponding bounds for κ( \widetilde L^{-1} A \widetilde L^{-T} ) and the decay magnification factors are given in Table 2.1. For τ as large as 0.95, we already have a reasonable condition number bound (1 + τ)/(1 − τ) = 39. In practice, the faster σ_i decays, the better the preconditioner works.

Table 2.1: Understanding the decay behavior of the condition number of the preconditioned matrix in terms of the decay of the singular values σ_i when the singular values smaller than or equal to τ are truncated. [The table lists selected values of τ together with the bounds (1 + τ)/(1 − τ) and the magnification factors 2/(1 − τ)^2.]

In particular, for a 5-point discrete 2D Laplacian matrix A evenly partitioned into a 2 × 2 block form, we can compute C and investigate its singular values σ_i, and then show the resulting condition number if all the singular values smaller than or equal to σ_i are truncated. See Figure 2.1.
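A minimal computation of the quantities behind Table 2.1, for a few sample values of τ of our own choosing, would look as follows.

    import numpy as np

    # Condition number bound (1+tau)/(1-tau) after truncation at tolerance tau,
    # and the decay magnification factor theta(tau) = 2/(1-tau)^2 from Section 2.3.
    for tau in [0.5, 0.8, 0.9, 0.95]:
        f = (1 + tau) / (1 - tau)
        theta = 2 / (1 - tau) ** 2
        print(f"tau={tau:4.2f}  kappa bound={f:6.1f}  magnification={theta:7.1f}")

For τ = 0.95 this reproduces the bound 39 quoted above.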

Clearly, for smaller indices i, the magnification factors θ_i are larger and the condition number decays faster. By keeping only a small number of singular values, or using a small numerical rank r, we can already get a reasonable condition number.

Fig. 2.1: For a 5-point discrete 2D Laplacian matrix A evenly partitioned into a 2 × 2 block form, (a) shows how the nonzero singular values σ_i of C decay, (b) shows how the condition number (1 + σ_i)/(1 − σ_i) of the preconditioned matrix decays when the singular values smaller than or equal to σ_i are truncated, and (c) shows the magnification factor θ_i = 2/(1 − σ_i)^2 that magnifies the decay. (Horizontal axes: index i.)

Remark 2.2. For some model problems such as the 5-point discrete Laplacian matrix, the analytical derivation of the singular values of C will appear in [31]. It can further be shown that κ( \widetilde L^{-1} A \widetilde L^{-T} ) = O(N) when r is set to be O(1). This is beyond the focus here.

2.4. Basic ULV factorization scheme. As mentioned in Remark 2.1, the preconditioning scheme in [9] has several limitations, one of which is the sequential computation of Schur complements. This can be avoided by using a ULV-type factorization that is similar to, but simpler than, the ones in [5, 8] for hierarchically semiseparable (HSS) matrices. The idea is to introduce zeros into the scaled (1,2) and (2,1) off-diagonal blocks simultaneously and then obtain reduced matrices. More specifically, for U_i, i = 1, 2 in (2.7), suppose Q_i is an orthogonal matrix such that

    Q_i U_i = [ 0 ; I ].

Then

    [ I  U_1 Σ U_2^T ; U_2 Σ U_1^T  I ] = diag(Q_1, Q_2)^T [ I  E ; E^T  I ] diag(Q_1, Q_2),  with  E = ( Q_1 U_1 ) Σ ( Q_2 U_2 )^T = [ 0  0 ; 0  Σ ].

Let Π_3 be an appropriate permutation matrix such that

    [ I  E ; E^T  I ] = Π_3 [ I  0 ; 0  D_3 ] Π_3^T,  with  D_3 = [ I  Σ ; Σ  I ],

where D_3 is a small 2r × 2r matrix called the reduced matrix; Π_3 is used to assemble D_3. Then directly compute a Cholesky factorization D_3 = L_3 L_3^T to complete the factorization. This in turn gives a factorization of \widetilde A in (2.7) that looks like

(2.18)    \widetilde A = \widetilde L \widetilde L^T,  with  \widetilde L = diag(L_1, L_2) diag(Q_1^T, Q_2^T) Π_3 diag(I, L_3).

\widetilde L is said to be a ULV factor, since it is given by a sequence of orthogonal and triangular matrices. The term ULV factor here follows the name in HSS ULV factorizations [5], and differs from the traditional sense of ULV factors. Notice that the operations associated with i = 1, 2 can be performed independently, and this avoids the sequential computation of a Schur complement. Thus, the overall SIF framework includes the scaling-and-compression strategy followed by the ULV factorization. With the 1-level block partitioning, we get the prototype preconditioner (2.18).
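A sketch of this 1-level ULV step is given below (ours and purely illustrative): Q_i is built from a full QR completion of U_i so that Q_i U_i = [0; I], and the reduced matrix D_3 = [ I  Σ ; Σ  I ] is formed and factorized. The permutation Π_3 simply gathers the last r indices of each block and is left implicit here.

    import numpy as np
    from scipy.linalg import cholesky

    def zero_out(U):
        # Orthogonal Q with Q @ U = [[0], [I]], built from a full QR completion of U.
        n, r = U.shape
        W, _ = np.linalg.qr(U, mode='complete')   # first r columns of W span range(U)
        return np.vstack([W[:, r:].T, U.T])       # rows: orthogonal complement of U, then U^T

    def ulv_reduced_matrix(U1, sigma, U2):
        # 1-level ULV step (Section 2.4): introduce zeros into the scaled off-diagonal
        # blocks and form the small reduced matrix D3 = [[I, S], [S, I]].
        r = len(sigma)
        Q1, Q2 = zero_out(U1), zero_out(U2)
        S = np.diag(sigma)
        D3 = np.block([[np.eye(r), S], [S, np.eye(r)]])
        L3 = cholesky(D3, lower=True)             # D3 = L3 L3^T, SPD since every sigma_i < 1
        return Q1, Q2, D3, L3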

3. Practical multilevel SIF preconditioners and analysis. The fundamental ideas for the prototype preconditioner in the previous section can be conveniently generalized to practical preconditioners with multiple levels of partitioning. In the following, we describe our multilevel SIF framework, including a multilevel scaling-and-compression strategy combined with a multilevel ULV factorization. We have two versions of the multilevel generalization, with some differences in performance. To describe the preconditioners and the analysis, we list commonly used notation as follows. I = {1 : N} is the entire index set for A; for an index subset s_i, I \ s_i is its complement. T represents a postordered binary tree with the nodes labeled as i = 1, 2, ..., root, where root denotes the root node; suppose root is at level 0 and the leaves are at the bottom level L. sib(i) and par(i) denote the sibling and the parent of node i in T, respectively. Each node i of T is associated with a set s_i of consecutive indices; these sets satisfy s_root = I, and s_i = s_{c1} ∪ s_{c2} with s_{c1} ∩ s_{c2} = ∅ if i is a nonleaf node with children c_1 and c_2. A|_{s_i × s_j} represents the submatrix of A with row index set s_i and column index set s_j, and b|_{s_i} represents the subvector selected from a vector b with the index set s_i.

3.1. Multilevel SIF preconditioner. To generalize the ideas in the previous section to multiple levels, we can apply the scheme repeatedly to the diagonal blocks A_{11} and A_{22} in (2.1). This can be done following a binary tree T, and the operations corresponding to the nodes at the same level of T can be performed simultaneously. We can further avoid the top-down recursion by organizing all the local operations along a bottom-up traversal of T. At each level of T, suppose A is partitioned following the index sets s_i associated with the nodes i at that level. The number of partition levels L will be decided at the end of this subsection.

In the traversal of T, for a leaf node i, directly compute the Cholesky factorization of the corresponding diagonal block

(3.1)    A|_{s_i × s_i} = L_i L_i^T.

Also, for later notational consistency, set \hat L_i ≡ L_i.

For a nonleaf node i, suppose c_1 and c_2 are its left and right children, respectively. Then

    A|_{s_i × s_i} = [ A|_{s_{c1} × s_{c1}}  A|_{s_{c1} × s_{c2}} ; A|_{s_{c2} × s_{c1}}  A|_{s_{c2} × s_{c2}} ].

By induction as in (3.6) below, the diagonal blocks have been approximately factorized as

(3.2)    A|_{s_{c1} × s_{c1}} ≈ L_{c1} L_{c1}^T,    A|_{s_{c2} × s_{c2}} ≈ L_{c2} L_{c2}^T.

Thus,

(3.3)    A|_{s_i × s_i} ≈ diag(L_{c1}, L_{c2}) [ I  L_{c1}^{-1} A|_{s_{c1} × s_{c2}} L_{c2}^{-T} ; L_{c2}^{-1} A|_{s_{c2} × s_{c1}} L_{c1}^{-T}  I ] diag(L_{c1}, L_{c2})^T.

Then, just like in the 1-level scheme in Section 2, we compute the compression (approximate SVD)

(3.4)    L_{c1}^{-1} A|_{s_{c1} × s_{c2}} L_{c2}^{-T} ≈ U_{c1} Σ_{c1} U_{c2}^T,

and introduce zeros into this block by using orthogonal matrices Q_{c1} and Q_{c2} such that

(3.5)    Q_{c1} U_{c1} = [ 0 ; I ],    Q_{c2} U_{c2} = [ 0 ; I ].

Since L_{c1} and L_{c2} are structured factors, the formation of L_{c1}^{-1} A|_{s_{c1} × s_{c2}} L_{c2}^{-T} in (3.4) involves multiple structured solutions. This yields a factorization just like in (2.18):

(3.6)    A|_{s_i × s_i} ≈ L_i L_i^T,  with  L_i = diag(L_{c1}, L_{c2}) diag(Q_{c1}^T, Q_{c2}^T) Π_i diag(I, \hat L_i),

where Π_i is an appropriate permutation matrix to assemble the reduced matrix

    D_i = [ I  Σ_{c1} ; Σ_{c1}  I ],

and \hat L_i is the lower triangular Cholesky factor of D_i. (Later in Section 3.2, we will discuss a condition that guarantees the positive definiteness.) The formula gives a recursive representation for L_i. Thus, it is clear that the algorithm generates a sequence of orthogonal matrices Q_i and triangular matrices \hat L_i. These matrices Q_i and \hat L_i, together with the permutations Π_i, define a ULV factor \widetilde L in an approximate ULV factorization

(3.7)    A ≈ \widetilde A = \widetilde L \widetilde L^T,  where  \widetilde L ≡ L_root.

Our multilevel SIF preconditioner is then \widetilde L \widetilde L^T.
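The following recursion is a dense illustration of (3.1)-(3.6) (our own sketch, not the actual structured algorithm): it applies the scaling-and-compression strategy level by level but assembles explicit factors instead of keeping the ULV form, so it is suitable only for small verification experiments. The function name sif_factor and the leaf_size parameter are ours.

    import numpy as np
    from scipy.linalg import cholesky, svd

    def sif_factor(A, leaf_size, tau):
        # Returns an explicit factor Ltil with A ~ Ltil @ Ltil.T.
        n = A.shape[0]
        if n <= leaf_size:
            return cholesky(A, lower=True)                  # (3.1) at the leaves
        m = n // 2
        Lc1 = sif_factor(A[:m, :m], leaf_size, tau)         # (3.2) children factors
        Lc2 = sif_factor(A[m:, m:], leaf_size, tau)
        # (3.4): scale the off-diagonal block on both sides and compress.
        # (The real algorithm uses structured solves; here the factors are dense.)
        C = np.linalg.solve(Lc1, A[:m, m:])
        C = np.linalg.solve(Lc2, C.T).T
        U, s, Vt = svd(C, full_matrices=False)
        k = max(1, int(np.count_nonzero(s > tau)))
        U1, sig, U2 = U[:, :k], s[:k], Vt[:k].T
        # Assemble the local factor exactly as in the prototype case (2.10).
        S = np.eye(n - m) - (U2 * sig**2) @ U2.T
        D = cholesky(S, lower=True)
        top = np.hstack([Lc1, np.zeros((m, n - m))])
        bot = np.hstack([Lc2 @ (U2 * sig) @ U1.T, Lc2 @ D])
        return np.vstack([top, bot])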

The application of the preconditioner is also done following the traversal of T. For example, consider the solution of the following linear system via forward substitution:

    \widetilde L y = b.

Partition b into pieces b|_{s_i} following the leaf level index sets s_i. For each leaf i, compute

    y_i = Q_i \hat L_i^{-1} b|_{s_i}.

For each nonleaf node i with children c_1 and c_2, compute

    y_i = Q_i diag(I, \hat L_i^{-1}) Π_i^T [ y_{c1} ; y_{c2} ],

with Q_i omitted when i = root. When root is reached, we obtain the solution y ≡ y_root. The backward substitution is done similarly.

The costs of the construction and application of the preconditioner can be conveniently counted. Suppose r is the maximum rank used for the compression (truncated SVD) in (3.4). The repeated diagonal partitioning is done until the finest level diagonal block sizes are around r; thus, L ≈ log_2(N/r). Then the approximate factorization costs O(rN^2 log(N/r)) flops. Notice that A is a general SPD matrix; this cost can be reduced when A is sparse or has some special properties. To get a preconditioner, we set r = O(1). The cost to apply the resulting preconditioner is O(N log N), and the storage is also O(N log N). The log N term in the costs is due to the size of Q_i, which is roughly N/2^l if i is at level l. Q_i results from the extension of U_i into an orthogonal matrix, which can be done via Householder transformations for convenience (see (3.5)); thus, Q_i can be represented in terms of r Householder vectors. We can remove the log N term in a modified scheme in Section 3.3.
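Assuming a node data structure that stores the pieces named in (3.6) (the leaf Cholesky factors, the orthogonal blocks Q_i, the permutations Π_i, and the small factors \hat L_i), the forward substitution described above could be organized as the recursion below; the attribute names (is_leaf, L, Q, left, right, size, P, Lhat) are hypothetical and only illustrate the traversal.

    import numpy as np

    def ulv_forward_solve(node, b):
        # Solve Ltil y = b by the bottom-up traversal of Section 3.1 (sketch).
        if node.is_leaf:
            y = np.linalg.solve(node.L, b)           # y_i = Q_i Lhat_i^{-1} b|_{s_i}
            return node.Q @ y
        y1 = ulv_forward_solve(node.left, b[:node.left.size])
        y2 = ulv_forward_solve(node.right, b[node.left.size:])
        z = node.P.T @ np.concatenate([y1, y2])      # Pi_i^T gathers the reduced part
        k = node.Lhat.shape[0]
        z[-k:] = np.linalg.solve(node.Lhat, z[-k:])  # apply diag(I, Lhat_i^{-1})
        return node.Q @ z                            # Q_i is the identity at the root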

3.2. Robustness, accuracy, and effectiveness of the multilevel preconditioner. We now provide analysis for the multilevel preconditioner. For convenience, introduce the notation A^{(l)} corresponding to the levels l = 0, 1, ..., L of T so as to track the error accumulation at each level. Initially, A^{(L)} ≡ A. For each leaf node i at level L, we compute the Cholesky factorization (3.1); at this level, no approximation is involved. Then for each node i at level l < L, the scaling-and-compression strategy in (3.2)-(3.4) is applied, producing an approximate factorization as in (3.6). This yields the approximation of A^{(l+1)} by A^{(l)}, where the diagonal block A^{(l+1)}|_{s_i × s_i} is approximated by A^{(l)}|_{s_i × s_i} ≡ L_i L_i^T. This process continues and produces a sequence of approximations to A:

(3.8)    A ≡ A^{(L)} ≈ A^{(L−1)} ≈ ... ≈ A^{(0)} ≡ \widetilde A,

where \widetilde A is the final preconditioner in a factorized form. Since A^{(L−1)} approximates A with modified diagonal blocks, A^{(L−1)} may not be positive definite. Similarly, A^{(l)} for l < L may not be positive definite. We would like to study the levelwise error accumulation and give a condition for A^{(l)} to remain positive definite.

Theorem 3.1. Let τ be the tolerance of the truncated SVD in the scaling-and-compression strategy in (3.2)-(3.4) used to generate the matrices A^{(l)}, l = L−1, L−2, ..., 0 in (3.8), with L ≥ 1. If

(3.9)    (1 + τ)^L − 1 < 1 / κ(A),

then each A^{(l)} is positive definite, and

    ‖A − \widetilde A‖_2 ≤ [ (1 + τ)^L − 1 ] ‖A‖_2.

Proof. (3.9) means

(3.10)    [ (1 + τ)^L − 1 ] ‖A‖_2 < λ_N(A),

where λ_N(A) is the smallest eigenvalue of A. For l = L−1, L−2, ..., 0, we show

    ‖A − A^{(l)}‖_2 ≤ [ (1 + τ)^{L−l} − 1 ] ‖A‖_2 ≤ [ (1 + τ)^L − 1 ] ‖A‖_2 < λ_N(A).

Then Weyl's theorem means that each A^{(l)} is SPD.

For l = L−1, the matrix A^{(L−1)} is obtained from A through the 1-level scaling and compression, where A|_{s_i × s_i} is approximated as in (3.6) for each node i at level L−1. Thus, Theorem 2.4 indicates

    ‖A|_{s_i × s_i} − A^{(L−1)}|_{s_i × s_i}‖_2 ≤ τ ‖A|_{s_i × s_i}‖_2.

Accordingly, since A − A^{(L−1)} is block diagonal,

    ‖A − A^{(L−1)}‖_2 ≤ τ max_{i at level L−1} ‖A|_{s_i × s_i}‖_2 ≤ τ ‖A‖_2.

Since τ ≤ (1 + τ)^L − 1, (3.10) means

    ‖A − A^{(L−1)}‖_2 ≤ [ (1 + τ)^L − 1 ] ‖A‖_2 < λ_N(A).

Thus, A^{(L−1)} is SPD. The factorization can then proceed to level l = L−2 (when L ≥ 2). Similarly to the argument above, we have ‖A^{(L−2)} − A^{(L−1)}‖_2 ≤ τ ‖A^{(L−1)}‖_2. By the triangle inequality,

(3.11)    ‖A − A^{(L−2)}‖_2 ≤ ‖A − A^{(L−1)}‖_2 + ‖A^{(L−1)} − A^{(L−2)}‖_2
                           ≤ ‖A − A^{(L−1)}‖_2 + τ ( ‖A − A^{(L−1)}‖_2 + ‖A‖_2 )
                           = (1 + τ) ‖A − A^{(L−1)}‖_2 + τ ‖A‖_2
                           ≤ [ (1 + τ) τ + τ ] ‖A‖_2 = [ (1 + τ)^2 − 1 ] ‖A‖_2.

Then by (3.10), ‖A − A^{(L−2)}‖_2 ≤ [ (1 + τ)^L − 1 ] ‖A‖_2 < λ_N(A). This leads to the positive definiteness of A^{(L−2)}. The process then proceeds by induction. In fact, just like in (3.11), for any level l < L−1 we get the error accumulation pattern

    ‖A − A^{(l)}‖_2 ≤ (1 + τ) ‖A − A^{(l+1)}‖_2 + τ ‖A‖_2 ≤ [ (1 + τ)^{L−l} − 1 ] ‖A‖_2.

Thus,

    ‖A − \widetilde A‖_2 = ‖A − A^{(0)}‖_2 ≤ [ (1 + τ)^L − 1 ] ‖A‖_2 < λ_N(A).

This completes the proof.

Thus, if τ is smaller than roughly 1/(L κ(A)), then the preconditioner \widetilde A is always positive definite, and it approximates A to an accuracy of about Lτ. Of course, this is still a very conservative estimate. In practice, the positive definiteness is often well preserved by \widetilde A for various ill-conditioned matrices A, even with relatively large τ (see the examples in Section 4). Our modified scheme in Section 3.3 has even better robustness in practice. In our future work, we will investigate the feasibility of improving the methods and analysis to avoid the term κ(A) in the condition.

The following theorem shows the effectiveness of the multilevel preconditioner in terms of the eigenvalues and the condition number of the preconditioned matrix \widetilde L^{-1} A \widetilde L^{-T}. We also show how close \widetilde L^{-1} A \widetilde L^{-T} is to I.

Theorem 3.2. With the same conditions as in Theorem 3.1 and with \widetilde L in (3.7), the eigenvalues of \widetilde L^{-1} A \widetilde L^{-T} satisfy

(3.12)    1 / (1 + ϵ) ≤ λ( \widetilde L^{-1} A \widetilde L^{-T} ) ≤ 1 / (1 − ϵ),  where  ϵ = [ (1 + τ)^L − 1 ] κ(A) < 1.

Accordingly,

(3.13)    ‖\widetilde L^{-1} A \widetilde L^{-T} − I‖_2 ≤ ϵ / (1 − ϵ),    κ( \widetilde L^{-1} A \widetilde L^{-T} ) ≤ (1 + ϵ) / (1 − ϵ).

Proof. (3.9) means ϵ < 1. Note that the eigenvalues of \widetilde L^{-1} A \widetilde L^{-T} are the inverses of the eigenvalues of L^{-1} \widetilde A L^{-T}, where A = L L^T. By Theorem 3.1,

    ‖L^{-1} \widetilde A L^{-T} − I‖_2 = ‖L^{-1} ( \widetilde A − A ) L^{-T}‖_2 ≤ ‖A − \widetilde A‖_2 ‖L^{-1}‖_2^2 ≤ [ (1 + τ)^L − 1 ] ‖A‖_2 ‖A^{-1}‖_2 = ϵ.

Note that

    ‖L^{-1} \widetilde A L^{-T} − I‖_2 = max{ |λ_N( L^{-1} \widetilde A L^{-T} ) − 1|, |λ_1( L^{-1} \widetilde A L^{-T} ) − 1| }.

Thus,

(3.14)    |λ_N( L^{-1} \widetilde A L^{-T} ) − 1| ≤ ϵ.

This yields λ_N( L^{-1} \widetilde A L^{-T} ) ≥ 1 − ϵ: in fact, either λ_N( L^{-1} \widetilde A L^{-T} ) ≥ 1, or otherwise (3.14) means 1 − λ_N( L^{-1} \widetilde A L^{-T} ) ≤ ϵ. Accordingly,

(3.15)    λ_1( \widetilde L^{-1} A \widetilde L^{-T} ) = 1 / λ_N( L^{-1} \widetilde A L^{-T} ) ≤ 1 / (1 − ϵ).

On the other hand,

    λ_1( L^{-1} \widetilde A L^{-T} ) ≤ ‖L^{-1} \widetilde A L^{-T}‖_2 ≤ 1 + ‖L^{-1} \widetilde A L^{-T} − I‖_2 ≤ 1 + ϵ.

Accordingly,

(3.16)    λ_N( \widetilde L^{-1} A \widetilde L^{-T} ) = 1 / λ_1( L^{-1} \widetilde A L^{-T} ) ≥ 1 / (1 + ϵ).

Combining (3.15) and (3.16) yields (3.12) and the second bound in (3.13). As a consequence,

    ‖\widetilde L^{-1} A \widetilde L^{-T} − I‖_2 = max{ |λ_1( \widetilde L^{-1} A \widetilde L^{-T} ) − 1|, |λ_N( \widetilde L^{-1} A \widetilde L^{-T} ) − 1| } ≤ max{ 1/(1 − ϵ) − 1, 1 − 1/(1 + ϵ) } = ϵ / (1 − ϵ).

Therefore, in the multilevel case, ϵ plays a role similar to τ in the 1-level case in Theorem 2.5. We can then similarly understand the effectiveness of the preconditioner when aggressive compression with relatively large τ is used, as at the end of Section 2.3 (Table 2.1 and Figure 2.1).

3.3. Modified multilevel SIF preconditioner. In the multilevel scheme in Section 3.1, the resulting Q_i factors have sizes equal to the sizes of the corresponding index sets s_i. Such sizes increase when we advance to upper level nodes. There are different ways to save some computational costs. One way is to instead scale and compress the off-diagonal block rows A|_{s_i × (I \ s_i)} in a hierarchical scheme. A|_{s_i × (I \ s_i)} is essentially what is called an HSS block row (a block row without the diagonal block) [8]. Some nested basis matrices will be produced. This can avoid the scaling and compression of large dense off-diagonal blocks, since A|_{s_i × (I \ s_i)} may include subblocks that have already been compressed in previous steps. However, it makes the scaling on the right more delicate. Thus, we use a modified version. That is, we first apply the left scaling to A|_{s_i × (I \ s_i)}, perform the compression, and compute the ULV factorization. Since the entire block row A|_{s_i × (I \ s_i)}, instead of just A|_{s_i × s_j} for j = sib(i), is scaled on the left, the symmetry indicates that the right-side scaling is implicitly built in. This will become clear later in (3.21) and (3.22).

We would first like to mention a notation issue. In the algorithm, when a block row A|_{s_i × (I \ s_i)} of A is compressed by a rank-revealing QR factorization as A|_{s_i × (I \ s_i)} ≈ U_i T_i, the matrix T_i can reuse the storage for A|_{s_i × (I \ s_i)}. For convenience, we write this compression step as

    A|_{s_i × (I \ s_i)} ≈ U_i \hat A|_{\hat s_i × (I \ s_i)},

where \hat A|_{\hat s_i × (I \ s_i)} indicates the storage of T_i in A with the row index set \hat s_i and column index set I \ s_i. Such a notation scheme is used in [7] and makes it convenient to identify the subblocks that have been compressed in earlier steps. The symbols \hat A are only for the purpose of organizing the compression steps and do not imply any extra storage.

During the factorization, we try to take advantage of previous compression information as much as possible by keeping track of compressed subblocks. This is done with the aid of a visited set as in [7].

Definition 3.3. For a node i of a postordered binary tree, the visited set V_i is defined to be

    V_i = { j : j < i and sib(j) is an ancestor of i }.

The visited set essentially corresponds to the roots of all the subtrees with indices smaller than i. Those nodes have been traversed, and correspond to blocks that are highly compressed.
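Definition 3.3 translates directly into code. In the sketch below (ours), parent and sibling are assumed lookup tables (dictionaries) for a postordered binary tree with integer node labels, with the parent of the root set to None.

    def visited_set(i, parent, sibling):
        # V_i = { j : j < i and sib(j) is an ancestor of i }  (Definition 3.3)
        ancestors = set()
        p = parent.get(i)
        while p is not None:
            ancestors.add(p)
            p = parent.get(p)
        return {j for j in range(1, i) if sibling.get(j) in ancestors}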

A detailed explanation of this and a pictorial illustration of V_i can be found in [7]. In fact, V_i can be made more general to include the roots of all the subtrees previously visited, not necessarily those smaller than i. We follow the original definition in [7] just for the convenience of presentation.

The modified multilevel SIF scheme works as follows. For a leaf node i of T, compute the Cholesky factorization as in (3.1) and set \hat L_i ≡ L_i. We then scale and compress the off-diagonal block row A|_{s_i × (I \ s_i)}. Note that, due to the symmetry, some subblocks of A|_{s_i × (I \ s_i)} may have been compressed in the previous steps. Assume

(3.17)    V_i = { k_1, k_2, ..., k_α : k_1 < k_2 < ... < k_α },

where α is the number of elements in V_i and depends on i. Form

    Ω_i = ( ( \hat A|_{\hat s_{k_1} × s_i} )^T   ...   ( \hat A|_{\hat s_{k_α} × s_i} )^T   A|_{s_i × s_i^+} ),

where s_i^+ contains all the indices larger than those in s_i, and \hat A|_{\hat s_{k_1} × s_i}, ..., \hat A|_{\hat s_{k_α} × s_i} are results from earlier compression steps. More specifically, each block A|_{s_k × s_i} was previously compressed as U_k \hat A|_{\hat s_k × s_i} during the compression step associated with node k ∈ V_i. Due to the symmetry, A|_{s_i × s_k} should be compressed at step i, and the row basis matrix U_k can be ignored; thus, the factor \hat A|_{\hat s_k × s_i} is collected into Ω_i. This can be understood as in Figure 3.3 in [7]; we skip the technical details.

Scale Ω_i as Θ_i ≡ \hat L_i^{-1} Ω_i, and then compute a rank-revealing QR (RRQR) factorization

    Θ_i ≈ U_i ( \hat A|_{\hat s_i × \hat s_{k_1}}   ...   \hat A|_{\hat s_i × \hat s_{k_α}}   \hat A|_{\hat s_i × s_i^+} ),

where \hat s_i is the row index set used to identify the second factor. Also, find an orthogonal matrix Q_i such that

    Q_i U_i = [ 0 ; I ],

and set

    L_i = \hat L_i Q_i^T.

The process above indicates that L_i [ 0 ; I ] = \hat L_i U_i, and it is an approximate column basis matrix for Ω_i. Similarly, for j = sib(i), we also obtain an approximate column basis matrix L_j [ 0 ; I ] for Ω_j. These yield

(3.18)    A|_{s_i × s_j} ≈ ( L_i [ 0 ; I ] ) B_i ( L_j [ 0 ; I ] )^T = L_i diag(0, B_i) L_j^T,

where

(3.19)    B_i = \hat A|_{\hat s_i × \hat s_j},  formed at the compression step associated with node ι = max{i, j}.

For a nonleaf node i with children c_1 and c_2, the induction process below shows that L_{c1} [ 0 ; I ] and L_{c2} [ 0 ; I ] are approximate column basis matrices for A|_{s_{c1} × (I \ s_{c1})} and A|_{s_{c2} × (I \ s_{c2})}, respectively, and

(3.20)    A|_{s_{c1} × s_{c1}} ≈ L_{c1} L_{c1}^T,    A|_{s_{c2} × s_{c2}} ≈ L_{c2} L_{c2}^T,
          A|_{s_{c1} × s_{c2}} ≈ ( L_{c1} [ 0 ; I ] ) B_{c1} ( L_{c2} [ 0 ; I ] )^T = L_{c1} diag(0, B_{c1}) L_{c2}^T,  with  B_{c1} = \hat A|_{\hat s_{c1} × \hat s_{c2}}.

It is thus clear that A|_{s_i × s_i} is approximated as

(3.21)    A|_{s_i × s_i} ≈ [ L_{c1} L_{c1}^T   L_{c1} diag(0, B_{c1}) L_{c2}^T ; L_{c2} diag(0, B_{c1})^T L_{c1}^T   L_{c2} L_{c2}^T ]

(3.22)              = diag(L_{c1}, L_{c2}) [ I  diag(0, B_{c1}) ; diag(0, B_{c1})^T  I ] diag(L_{c1}, L_{c2})^T.

(3.21) and (3.22) precisely explain how the right-side scaling is implicitly done in the process. Again, with a permutation matrix Π_i, the matrix in the middle of the right-hand side of (3.22) can be assembled as

    Π_i [ I  0 ; 0  D_i ] Π_i^T,  where  D_i = [ I  B_{c1} ; B_{c1}^T  I ].

Compute a Cholesky factorization D_i = \check L_i \check L_i^T; then

(3.23)    A|_{s_i × s_i} ≈ \hat L_i \hat L_i^T,  with  \hat L_i = diag(L_{c1}, L_{c2}) Π_i diag(I, \check L_i).

At this point, we can use \hat L_i^{-1} to scale the off-diagonal block row A|_{s_i × (I \ s_i)} associated with i and then compress. Again by induction,

(3.24)    A|_{s_i × (I \ s_i)} = [ A|_{s_{c1} × (I \ s_i)} ; A|_{s_{c2} × (I \ s_i)} ] ≈ diag( L_{c1} [ 0 ; I ], L_{c2} [ 0 ; I ] ) [ Ω_{i,1} ; Ω_{i,2} ],

where

    Ω_i = [ Ω_{i,1} ; Ω_{i,2} ] = [ \hat A|_{\hat s_{c1} × \hat s_{k_1}}  ...  \hat A|_{\hat s_{c1} × \hat s_{k_α}}  \hat A|_{\hat s_{c1} × s_i^+} ; \hat A|_{\hat s_{c2} × \hat s_{k_1}}  ...  \hat A|_{\hat s_{c2} × \hat s_{k_α}}  \hat A|_{\hat s_{c2} × s_i^+} ],

and the notation in (3.17) is assumed. Then

    \hat L_i^{-1} A|_{s_i × (I \ s_i)} ≈ diag(I, \check L_i^{-1}) Π_i^T diag(L_{c1}^{-1}, L_{c2}^{-1}) diag( L_{c1} [ 0 ; I ], L_{c2} [ 0 ; I ] ) [ Ω_{i,1} ; Ω_{i,2} ] = [ 0 ; \check L_i^{-1} Ω_i ].

Thus, we just need to compress

    Θ_i ≡ \check L_i^{-1} Ω_i.

We can then perform an RRQR factorization

    Θ_i ≈ U_i ( \hat A|_{\hat s_i × \hat s_{k_1}}  ...  \hat A|_{\hat s_i × \hat s_{k_α}}  \hat A|_{\hat s_i × s_i^+} ).

Again, find an orthogonal matrix Q_i such that Q_i U_i = [ 0 ; I ]. Let

(3.25)    L_i = \hat L_i diag(I, Q_i^T) = diag(L_{c1}, L_{c2}) Π_i diag(I, \check L_i Q_i^T).

Then we can verify that L_i L_i^T = \hat L_i \hat L_i^T and

    A|_{s_i × (I \ s_i)} ≈ ( L_i [ 0 ; I ] ) ( \hat A|_{\hat s_i × \hat s_{k_1}}  ...  \hat A|_{\hat s_i × \hat s_{k_α}}  \hat A|_{\hat s_i × s_i^+} ).

This confirms the induction about the column basis matrices of the forms L_i [ 0 ; I ] (see, e.g., (3.21) and (3.24)). Moreover, (3.23) can be rewritten as

    A|_{s_i × s_i} ≈ L_i L_i^T,  with L_i in (3.25).

This confirms the induction process in (3.20). In addition, we can extract B_i just like in (3.19) so as to obtain an approximation like in (3.18).

The factorization process then continues until root is reached, and we obtain an approximate factorization like in (3.7). A major difference is that \widetilde L is now given by the recursion (3.25), where Q_i is a small matrix of size O(r) when r is the numerical rank in the compression. For a general dense matrix A, this scheme costs O(rN^2) to compute the approximate factorization. The solution procedure can be designed similarly to that at the end of Section 3.1. With r = O(1) in preconditioning, the cost to apply the resulting preconditioner is O(N), and the storage is O(N). This scheme shares some similarities with the one in Section 3.1, but is much more sophisticated; we thus skip the other analysis. We expect slightly reduced effectiveness but enhanced robustness, as confirmed by the numerical tests in the next section.

Remark 3.1. In this scheme, and also in the one in Section 3.1, we may replace the truncated SVDs or RRQR factorizations by other appropriate methods to improve the efficiency. In particular, if A is a sparse matrix or a dense one that admits fast matrix-vector multiplications, randomized construction schemes similar to those in [18, 20, 30] can be used so that the cost to construct the preconditioner can be reduced to roughly O(N). Relevant details will appear in [31], as this work focuses on general SPD matrices. Here, for the compression of dense blocks, either a truncated SVD or a randomized method is used, depending on the block sizes.

4. Numerical experiments. In this section, we demonstrate the performance of the new preconditioners, which are used in the preconditioned conjugate gradient method (PCG) to solve some ill-conditioned linear systems. We compare the effectiveness and robustness of the following preconditioners:
bdiag: the block diagonal preconditioner;
HSS: HSS approximation in ULV factors [8] based on direct off-diagonal compression;
NEW0: the multilevel SIF preconditioner in Section 3.1;
NEW1: the modified multilevel SIF preconditioner in Section 3.3.
For convenience, the following notation is used in the discussions:
r: the numerical rank bound used in the off-diagonal compression;
A_prec: the preconditioned matrix (e.g., \widetilde L^{-1} A \widetilde L^{-T} when a preconditioner \widetilde L \widetilde L^T is used); for some cases where HSS loses positive definiteness, factors from nonsymmetric HSS factorizations are used;
γ = ‖Ax − b‖_2 / ‖b‖_2: the relative residual, where b is generated with the vector of all ones as the exact solution;

N_it: the number of iterations to reach a given accuracy.

Several ill-conditioned matrices are tested. Since our studies are concerned with general SPD matrices, all the test matrices are treated as dense ones, and we do not take advantage of possible special structures within some examples. In all the tests, we keep the numerical rank bound r very small, so that the costs of applying our preconditioners by substitution are nearly linear in N. Such costs are almost negligible as compared with the dense matrix-vector multiplication costs. Thus, it is important to reduce the number of iterations in PCG. For consistency, the diagonal block sizes of bdiag, and also the finest level diagonal block sizes of HSS, NEW0, and NEW1, are all set to be r.

Example 1. Consider a matrix A with entries

    A_{ij} = (ij)^{1/4} π / ( 16 + (i − j)^2 ).

The matrix can be shown to be SPD and ill conditioned based on a result in [4]. We test different matrix sizes N, but fix r = 5 when generating the preconditioners. PCG with the four preconditioners is used to reach the relative accuracy γ ≤ 10^{−12}. The tests are in Matlab and run on an Intel Xeon E5 processor with 64GB of memory.

Table 4.1 shows the results. We report the condition numbers before and after preconditioning, the numbers of iterations N_it, and the relative residuals γ. For all the tests, NEW0 and NEW1 remain SPD, although r is very small. However, HSS fails to preserve the positive definiteness; thus, a nonsymmetric HSS factorization is used in the solution. NEW0 and NEW1 significantly reduce the condition numbers and lead to much faster convergence than bdiag and HSS. HSS gives a reasonable reduction in the condition numbers, but the numbers of iterations are higher. The performance can also be observed from Figure 4.1(a), which shows the actual convergence behaviors for N = 3200. The fast convergence with the new preconditioners can also be confirmed by the eigenvalue distribution of A_prec; see Figure 4.1(b), where the eigenvalues of A_prec with NEW0 or NEW1 closely cluster around 1. This is not the case for bdiag or HSS.

Fig. 4.1: Example 1. (a) Convergence of PCG with the preconditioners for the matrix with N = 3200; (b) the eigenvalues λ(A) and λ(A_prec) before and after preconditioning.

In addition, NEW0 fully respects the scaling-and-compression strategy and is the most effective. NEW1 needs less memory than NEW0 (O(N) versus O(N log N)); see Figure 4.2(a).
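For completeness, a PCG run with such a factorized preconditioner can be set up as below. The matrix formula follows our reading of Example 1 above and should be treated as an assumption, and the sketch reuses the sif_factor illustration from Section 3.1 rather than the actual structured implementation.

    import numpy as np
    from scipy.sparse.linalg import cg, LinearOperator

    N = 1000
    i = np.arange(1, N + 1)
    # Assumed test matrix: A_ij = (ij)^{1/4} * pi / (16 + (i - j)^2)
    A = (np.outer(i, i) ** 0.25) * np.pi / (16.0 + (i[:, None] - i[None, :]) ** 2)

    Ltil = sif_factor(A, leaf_size=5, tau=0.5)     # dense illustration of the SIF factor

    def apply_prec(x):                              # x -> (Ltil Ltil^T)^{-1} x
        return np.linalg.solve(Ltil.T, np.linalg.solve(Ltil, x))

    M = LinearOperator((N, N), matvec=apply_prec)
    b = A @ np.ones(N)                              # exact solution of all ones
    # (The tolerance keyword is tol or rtol depending on the SciPy version.)
    x, info = cg(A, b, M=M, maxiter=500)
    print(info, np.linalg.norm(A @ x - b) / np.linalg.norm(b))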

Table 4.1: Example 1. Performance of the preconditioners in PCG (the condition numbers κ(A) and κ(A_prec), the iteration counts N_it, the relative residuals γ, and the preservation of positive definiteness for each tested N). The condition numbers are not evaluated for N = 25,600 and 51,200 since it is too slow to form A_prec. [Numerical entries omitted.]

The storage of HSS is nearly the same as that of NEW1 and is not shown. We also report the runtime to construct the two new preconditioners and to apply them to one vector in Figure 4.2(b). The construction of HSS is about 3 times faster than that of NEW1, and the application of HSS has runtime close to that of NEW1. However, the iteration time of PCG with HSS is much longer due to the larger number of iterations. The iteration time of PCG with bdiag is also quite a bit larger than with the new preconditioners. The construction and application of bdiag are more efficient in Matlab due to the simple block diagonal structure, while the new preconditioners involve more sophisticated structured matrix operations, such as the application of many local Householder matrices. Thus, the advantage of the new preconditioners over bdiag in iteration time is not as large as the advantage in the numbers of iterations. In practice, the reduction of the number of iterations is essential, since each dense matrix-vector multiplication costs O(N^2), while each application of these preconditioners costs only about O(N).

Example 2. We next consider some interpolation matrices A for the following radial basis functions (RBFs) of t:

    f_1(t, µ) = e^{−µ^2 t^2},    f_2(t, µ) = 1 / (µ^2 t^2 + 1),    f_3(t, µ) = sech(µt),

where µ is the shape parameter. For convenience, choose t to be between 0 and N − 1. It is well known that the RBF matrices are very ill conditioned for small shape parameters. For some cases, the condition numbers are nearly exponential in 1/µ or 1/µ^2 (see, e.g., [4]).
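The three RBF interpolation matrices can be generated as follows; the exact placement of the shape parameter and the exponents follows our reconstruction of the formulas above, and the chosen µ is only an example.

    import numpy as np

    def rbf_matrix(kind, N, mu):
        t = np.arange(N, dtype=float)
        r = t[:, None] - t[None, :]
        if kind == "gaussian":
            return np.exp(-(mu * r) ** 2)          # f1(t, mu) = exp(-mu^2 t^2)  (assumed form)
        if kind == "inverse_quadratic":
            return 1.0 / ((mu * r) ** 2 + 1.0)     # f2(t, mu) = 1/(mu^2 t^2 + 1)
        if kind == "sech":
            return 1.0 / np.cosh(mu * r)           # f3(t, mu) = sech(mu t)
        raise ValueError(kind)

    A = rbf_matrix("sech", 1000, 0.02)             # a small mu gives severe ill-conditioning
    print(np.linalg.cond(A))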

Fig. 4.2: Example 1. (a) Storage (number of nonzeros) for NEW0 and NEW1, with O(N log N) and O(N) reference lines; (b) the runtime to construct the two preconditioners and to apply them to one vector, with O(N log N) and O(N) reference lines.

We test several matrices with different parameters µ. The matrix sizes are set to be N = 1000; similar results are observed for other sizes. We fix r = 7 when generating the preconditioners. See Table 4.2 for the results. PCG with NEW0 or NEW1 converges quickly, and NEW0 is the most effective. On the other hand, NEW1 remains positive definite for all the test cases, while NEW0 loses positive definiteness for one case (indicated by "Almost" in Table 4.2). In that case, two small intermediate reduced matrices in the construction of the preconditioner become indefinite; a very small diagonal shift is added to make these reduced matrices positive definite. In addition, HSS again becomes indefinite for all the tests.

Table 4.2: Example 2. Performance of the preconditioners in PCG for the RBF matrices generated from e^{−µ^2 t^2}, sech(µt), and 1/(µ^2 t^2 + 1) with two shape parameters µ each: κ(A) (ranging from roughly 10^5 to 10^10), κ(A_prec), N_it, γ, and the preservation of positive definiteness (bdiag and NEW1: yes; NEW0: yes except one "Almost" case; HSS: no). [Remaining numerical entries omitted.]

Figure 4.3 also shows the actual convergence and eigenvalue distributions for one matrix. The new preconditioners yield much faster convergence and better eigenvalue clustering.

Fig. 4.3: Example 2. Convergence of PCG with the preconditioners for one of the sech(µt) matrices in Table 4.2, and the eigenvalues λ(A) and λ(A_prec) before and after preconditioning.

Example 3. We further test some highly ill-conditioned matrices from different applications, for which NEW1 still demonstrates superior robustness and also leads to much faster convergence than HSS.

LinProg (N = 301, κ(A) = 2.09e11): a matrix of the form CΛC^T as used in some linear programming problems, where C is a rectangular matrix (nemspmm2 from the linear programming test matrix set Meszaros in [27]) and Λ is a diagonal matrix with diagonal entries evenly spaced between 10^{−5} and 1. We set r = 9.

MaxwSchur (N = 3947, κ(A) = 4.78e9): a dense normal matrix in [9] constructed based on a Schur complement in the direct factorization of a discretized 3D Maxwell equation. We set r = 20.

BC25Schur (N = 186, κ(A) = 3.8e11): a dense intermediate Schur complement in the multifrontal factorization [8] of the BCS structural engineering matrix BCSSTK25 from the Harwell-Boeing Collection [1]. We set r = 7.

BCSSTK13 (N = 2003, κ(A) = 1.10e10): the matrix BCSSTK13, directly treated as dense, from the Harwell-Boeing Collection [1]. We set r = 10.

For each matrix, NEW1 remains positive definite, while HSS loses the positive definiteness. The convergence of PCG with NEW1 is also faster, as shown in Figure 4.4. In Table 4.3, we also show the convergence results for one matrix with different compression rank bounds r. A larger rank bound r leads to better convergence, but the preconditioner construction and application cost more.

5. Conclusions. We have presented the SIF framework and its analysis for preconditioning general SPD matrices. The SIF techniques include the scaling-and-compression strategy and the ULV factorization. A prototype preconditioner is used to illustrate the basic ideas and the fundamental analysis. Generalizations to practical multilevel schemes are then made and also analyzed. Systematic analysis for the robustness, accuracy control, and effectiveness of the preconditioners is given.


More information

Solving linear equations with Gaussian Elimination (I)

Solving linear equations with Gaussian Elimination (I) Term Projects Solving linear equations with Gaussian Elimination The QR Algorithm for Symmetric Eigenvalue Problem The QR Algorithm for The SVD Quasi-Newton Methods Solving linear equations with Gaussian

More information

Matrix decompositions

Matrix decompositions Matrix decompositions Zdeněk Dvořák May 19, 2015 Lemma 1 (Schur decomposition). If A is a symmetric real matrix, then there exists an orthogonal matrix Q and a diagonal matrix D such that A = QDQ T. The

More information

Numerical Methods in Matrix Computations

Numerical Methods in Matrix Computations Ake Bjorck Numerical Methods in Matrix Computations Springer Contents 1 Direct Methods for Linear Systems 1 1.1 Elements of Matrix Theory 1 1.1.1 Matrix Algebra 2 1.1.2 Vector Spaces 6 1.1.3 Submatrices

More information

Preconditioning Techniques Analysis for CG Method

Preconditioning Techniques Analysis for CG Method Preconditioning Techniques Analysis for CG Method Huaguang Song Department of Computer Science University of California, Davis hso@ucdavis.edu Abstract Matrix computation issue for solve linear system

More information

1. Introduction. In this paper, we consider the eigenvalue solution for a non- Hermitian matrix A:

1. Introduction. In this paper, we consider the eigenvalue solution for a non- Hermitian matrix A: A FAST CONTOUR-INTEGRAL EIGENSOLVER FOR NON-HERMITIAN MATRICES XIN YE, JIANLIN XIA, RAYMOND H. CHAN, STEPHEN CAULEY, AND VENKATARAMANAN BALAKRISHNAN Abstract. We present a fast contour-integral eigensolver

More information

Solving Symmetric Indefinite Systems with Symmetric Positive Definite Preconditioners

Solving Symmetric Indefinite Systems with Symmetric Positive Definite Preconditioners Solving Symmetric Indefinite Systems with Symmetric Positive Definite Preconditioners Eugene Vecharynski 1 Andrew Knyazev 2 1 Department of Computer Science and Engineering University of Minnesota 2 Department

More information

Lecture Note 7: Iterative methods for solving linear systems. Xiaoqun Zhang Shanghai Jiao Tong University

Lecture Note 7: Iterative methods for solving linear systems. Xiaoqun Zhang Shanghai Jiao Tong University Lecture Note 7: Iterative methods for solving linear systems Xiaoqun Zhang Shanghai Jiao Tong University Last updated: December 24, 2014 1.1 Review on linear algebra Norms of vectors and matrices vector

More information

Lecture 3: QR-Factorization

Lecture 3: QR-Factorization Lecture 3: QR-Factorization This lecture introduces the Gram Schmidt orthonormalization process and the associated QR-factorization of matrices It also outlines some applications of this factorization

More information

SOLVING SPARSE LINEAR SYSTEMS OF EQUATIONS. Chao Yang Computational Research Division Lawrence Berkeley National Laboratory Berkeley, CA, USA

SOLVING SPARSE LINEAR SYSTEMS OF EQUATIONS. Chao Yang Computational Research Division Lawrence Berkeley National Laboratory Berkeley, CA, USA 1 SOLVING SPARSE LINEAR SYSTEMS OF EQUATIONS Chao Yang Computational Research Division Lawrence Berkeley National Laboratory Berkeley, CA, USA 2 OUTLINE Sparse matrix storage format Basic factorization

More information

Sparse factorization using low rank submatrices. Cleve Ashcraft LSTC 2010 MUMPS User Group Meeting April 15-16, 2010 Toulouse, FRANCE

Sparse factorization using low rank submatrices. Cleve Ashcraft LSTC 2010 MUMPS User Group Meeting April 15-16, 2010 Toulouse, FRANCE Sparse factorization using low rank submatrices Cleve Ashcraft LSTC cleve@lstc.com 21 MUMPS User Group Meeting April 15-16, 21 Toulouse, FRANCE ftp.lstc.com:outgoing/cleve/mumps1 Ashcraft.pdf 1 LSTC Livermore

More information

Review Questions REVIEW QUESTIONS 71

Review Questions REVIEW QUESTIONS 71 REVIEW QUESTIONS 71 MATLAB, is [42]. For a comprehensive treatment of error analysis and perturbation theory for linear systems and many other problems in linear algebra, see [126, 241]. An overview of

More information

EIGENVALUE PROBLEMS. EIGENVALUE PROBLEMS p. 1/4

EIGENVALUE PROBLEMS. EIGENVALUE PROBLEMS p. 1/4 EIGENVALUE PROBLEMS EIGENVALUE PROBLEMS p. 1/4 EIGENVALUE PROBLEMS p. 2/4 Eigenvalues and eigenvectors Let A C n n. Suppose Ax = λx, x 0, then x is a (right) eigenvector of A, corresponding to the eigenvalue

More information

Scientific Computing: Solving Linear Systems

Scientific Computing: Solving Linear Systems Scientific Computing: Solving Linear Systems Aleksandar Donev Courant Institute, NYU 1 donev@courant.nyu.edu 1 Course MATH-GA.2043 or CSCI-GA.2112, Spring 2012 September 17th and 24th, 2015 A. Donev (Courant

More information

Fast Structured Spectral Methods

Fast Structured Spectral Methods Spectral methods HSS structures Fast algorithms Conclusion Fast Structured Spectral Methods Yingwei Wang Department of Mathematics, Purdue University Joint work with Prof Jie Shen and Prof Jianlin Xia

More information

Preliminary/Qualifying Exam in Numerical Analysis (Math 502a) Spring 2012

Preliminary/Qualifying Exam in Numerical Analysis (Math 502a) Spring 2012 Instructions Preliminary/Qualifying Exam in Numerical Analysis (Math 502a) Spring 2012 The exam consists of four problems, each having multiple parts. You should attempt to solve all four problems. 1.

More information

Multigrid absolute value preconditioning

Multigrid absolute value preconditioning Multigrid absolute value preconditioning Eugene Vecharynski 1 Andrew Knyazev 2 (speaker) 1 Department of Computer Science and Engineering University of Minnesota 2 Department of Mathematical and Statistical

More information

Linear Algebra Massoud Malek

Linear Algebra Massoud Malek CSUEB Linear Algebra Massoud Malek Inner Product and Normed Space In all that follows, the n n identity matrix is denoted by I n, the n n zero matrix by Z n, and the zero vector by θ n An inner product

More information

Comparative Analysis of Mesh Generators and MIC(0) Preconditioning of FEM Elasticity Systems

Comparative Analysis of Mesh Generators and MIC(0) Preconditioning of FEM Elasticity Systems Comparative Analysis of Mesh Generators and MIC(0) Preconditioning of FEM Elasticity Systems Nikola Kosturski and Svetozar Margenov Institute for Parallel Processing, Bulgarian Academy of Sciences Abstract.

More information

Course Notes: Week 1

Course Notes: Week 1 Course Notes: Week 1 Math 270C: Applied Numerical Linear Algebra 1 Lecture 1: Introduction (3/28/11) We will focus on iterative methods for solving linear systems of equations (and some discussion of eigenvalues

More information

Bindel, Spring 2016 Numerical Analysis (CS 4220) Notes for

Bindel, Spring 2016 Numerical Analysis (CS 4220) Notes for Cholesky Notes for 2016-02-17 2016-02-19 So far, we have focused on the LU factorization for general nonsymmetric matrices. There is an alternate factorization for the case where A is symmetric positive

More information

Jae Heon Yun and Yu Du Han

Jae Heon Yun and Yu Du Han Bull. Korean Math. Soc. 39 (2002), No. 3, pp. 495 509 MODIFIED INCOMPLETE CHOLESKY FACTORIZATION PRECONDITIONERS FOR A SYMMETRIC POSITIVE DEFINITE MATRIX Jae Heon Yun and Yu Du Han Abstract. We propose

More information

9.1 Preconditioned Krylov Subspace Methods

9.1 Preconditioned Krylov Subspace Methods Chapter 9 PRECONDITIONING 9.1 Preconditioned Krylov Subspace Methods 9.2 Preconditioned Conjugate Gradient 9.3 Preconditioned Generalized Minimal Residual 9.4 Relaxation Method Preconditioners 9.5 Incomplete

More information

Iterative Methods for Solving A x = b

Iterative Methods for Solving A x = b Iterative Methods for Solving A x = b A good (free) online source for iterative methods for solving A x = b is given in the description of a set of iterative solvers called templates found at netlib: http

More information

ANONSINGULAR tridiagonal linear system of the form

ANONSINGULAR tridiagonal linear system of the form Generalized Diagonal Pivoting Methods for Tridiagonal Systems without Interchanges Jennifer B. Erway, Roummel F. Marcia, and Joseph A. Tyson Abstract It has been shown that a nonsingular symmetric tridiagonal

More information

Sparsity-Preserving Difference of Positive Semidefinite Matrix Representation of Indefinite Matrices

Sparsity-Preserving Difference of Positive Semidefinite Matrix Representation of Indefinite Matrices Sparsity-Preserving Difference of Positive Semidefinite Matrix Representation of Indefinite Matrices Jaehyun Park June 1 2016 Abstract We consider the problem of writing an arbitrary symmetric matrix as

More information

Math 671: Tensor Train decomposition methods

Math 671: Tensor Train decomposition methods Math 671: Eduardo Corona 1 1 University of Michigan at Ann Arbor December 8, 2016 Table of Contents 1 Preliminaries and goal 2 Unfolding matrices for tensorized arrays The Tensor Train decomposition 3

More information

Scientific Computing: Dense Linear Systems

Scientific Computing: Dense Linear Systems Scientific Computing: Dense Linear Systems Aleksandar Donev Courant Institute, NYU 1 donev@courant.nyu.edu 1 Course MATH-GA.2043 or CSCI-GA.2112, Spring 2012 February 9th, 2012 A. Donev (Courant Institute)

More information

Numerical Linear Algebra

Numerical Linear Algebra Numerical Linear Algebra Decompositions, numerical aspects Gerard Sleijpen and Martin van Gijzen September 27, 2017 1 Delft University of Technology Program Lecture 2 LU-decomposition Basic algorithm Cost

More information

Program Lecture 2. Numerical Linear Algebra. Gaussian elimination (2) Gaussian elimination. Decompositions, numerical aspects

Program Lecture 2. Numerical Linear Algebra. Gaussian elimination (2) Gaussian elimination. Decompositions, numerical aspects Numerical Linear Algebra Decompositions, numerical aspects Program Lecture 2 LU-decomposition Basic algorithm Cost Stability Pivoting Cholesky decomposition Sparse matrices and reorderings Gerard Sleijpen

More information

1. Fast Iterative Solvers of SLE

1. Fast Iterative Solvers of SLE 1. Fast Iterative Solvers of crucial drawback of solvers discussed so far: they become slower if we discretize more accurate! now: look for possible remedies relaxation: explicit application of the multigrid

More information

A Cholesky LR algorithm for the positive definite symmetric diagonal-plus-semiseparable eigenproblem

A Cholesky LR algorithm for the positive definite symmetric diagonal-plus-semiseparable eigenproblem A Cholesky LR algorithm for the positive definite symmetric diagonal-plus-semiseparable eigenproblem Bor Plestenjak Department of Mathematics University of Ljubljana Slovenia Ellen Van Camp and Marc Van

More information

AMS Mathematics Subject Classification : 65F10,65F50. Key words and phrases: ILUS factorization, preconditioning, Schur complement, 1.

AMS Mathematics Subject Classification : 65F10,65F50. Key words and phrases: ILUS factorization, preconditioning, Schur complement, 1. J. Appl. Math. & Computing Vol. 15(2004), No. 1, pp. 299-312 BILUS: A BLOCK VERSION OF ILUS FACTORIZATION DAVOD KHOJASTEH SALKUYEH AND FAEZEH TOUTOUNIAN Abstract. ILUS factorization has many desirable

More information

CLASSICAL ITERATIVE METHODS

CLASSICAL ITERATIVE METHODS CLASSICAL ITERATIVE METHODS LONG CHEN In this notes we discuss classic iterative methods on solving the linear operator equation (1) Au = f, posed on a finite dimensional Hilbert space V = R N equipped

More information

Bindel, Fall 2016 Matrix Computations (CS 6210) Notes for

Bindel, Fall 2016 Matrix Computations (CS 6210) Notes for 1 Logistics Notes for 2016-08-26 1. Our enrollment is at 50, and there are still a few students who want to get in. We only have 50 seats in the room, and I cannot increase the cap further. So if you are

More information

Domain decomposition on different levels of the Jacobi-Davidson method

Domain decomposition on different levels of the Jacobi-Davidson method hapter 5 Domain decomposition on different levels of the Jacobi-Davidson method Abstract Most computational work of Jacobi-Davidson [46], an iterative method suitable for computing solutions of large dimensional

More information

The Singular Value Decomposition

The Singular Value Decomposition The Singular Value Decomposition An Important topic in NLA Radu Tiberiu Trîmbiţaş Babeş-Bolyai University February 23, 2009 Radu Tiberiu Trîmbiţaş ( Babeş-Bolyai University)The Singular Value Decomposition

More information

Last Time. Social Network Graphs Betweenness. Graph Laplacian. Girvan-Newman Algorithm. Spectral Bisection

Last Time. Social Network Graphs Betweenness. Graph Laplacian. Girvan-Newman Algorithm. Spectral Bisection Eigenvalue Problems Last Time Social Network Graphs Betweenness Girvan-Newman Algorithm Graph Laplacian Spectral Bisection λ 2, w 2 Today Small deviation into eigenvalue problems Formulation Standard eigenvalue

More information

6.4 Krylov Subspaces and Conjugate Gradients

6.4 Krylov Subspaces and Conjugate Gradients 6.4 Krylov Subspaces and Conjugate Gradients Our original equation is Ax = b. The preconditioned equation is P Ax = P b. When we write P, we never intend that an inverse will be explicitly computed. P

More information

Lecture # 20 The Preconditioned Conjugate Gradient Method

Lecture # 20 The Preconditioned Conjugate Gradient Method Lecture # 20 The Preconditioned Conjugate Gradient Method We wish to solve Ax = b (1) A R n n is symmetric and positive definite (SPD). We then of n are being VERY LARGE, say, n = 10 6 or n = 10 7. Usually,

More information

An H-LU Based Direct Finite Element Solver Accelerated by Nested Dissection for Large-scale Modeling of ICs and Packages

An H-LU Based Direct Finite Element Solver Accelerated by Nested Dissection for Large-scale Modeling of ICs and Packages PIERS ONLINE, VOL. 6, NO. 7, 2010 679 An H-LU Based Direct Finite Element Solver Accelerated by Nested Dissection for Large-scale Modeling of ICs and Packages Haixin Liu and Dan Jiao School of Electrical

More information

Enhancing Scalability of Sparse Direct Methods

Enhancing Scalability of Sparse Direct Methods Journal of Physics: Conference Series 78 (007) 0 doi:0.088/7-6596/78//0 Enhancing Scalability of Sparse Direct Methods X.S. Li, J. Demmel, L. Grigori, M. Gu, J. Xia 5, S. Jardin 6, C. Sovinec 7, L.-Q.

More information

Scientific Computing

Scientific Computing Scientific Computing Direct solution methods Martin van Gijzen Delft University of Technology October 3, 2018 1 Program October 3 Matrix norms LU decomposition Basic algorithm Cost Stability Pivoting Pivoting

More information

be a Householder matrix. Then prove the followings H = I 2 uut Hu = (I 2 uu u T u )u = u 2 uut u

be a Householder matrix. Then prove the followings H = I 2 uut Hu = (I 2 uu u T u )u = u 2 uut u MATH 434/534 Theoretical Assignment 7 Solution Chapter 7 (71) Let H = I 2uuT Hu = u (ii) Hv = v if = 0 be a Householder matrix Then prove the followings H = I 2 uut Hu = (I 2 uu )u = u 2 uut u = u 2u =

More information

lecture 2 and 3: algorithms for linear algebra

lecture 2 and 3: algorithms for linear algebra lecture 2 and 3: algorithms for linear algebra STAT 545: Introduction to computational statistics Vinayak Rao Department of Statistics, Purdue University August 27, 2018 Solving a system of linear equations

More information

Lecture 9: Numerical Linear Algebra Primer (February 11st)

Lecture 9: Numerical Linear Algebra Primer (February 11st) 10-725/36-725: Convex Optimization Spring 2015 Lecture 9: Numerical Linear Algebra Primer (February 11st) Lecturer: Ryan Tibshirani Scribes: Avinash Siravuru, Guofan Wu, Maosheng Liu Note: LaTeX template

More information

Multilevel Preconditioning of Graph-Laplacians: Polynomial Approximation of the Pivot Blocks Inverses

Multilevel Preconditioning of Graph-Laplacians: Polynomial Approximation of the Pivot Blocks Inverses Multilevel Preconditioning of Graph-Laplacians: Polynomial Approximation of the Pivot Blocks Inverses P. Boyanova 1, I. Georgiev 34, S. Margenov, L. Zikatanov 5 1 Uppsala University, Box 337, 751 05 Uppsala,

More information

Math 471 (Numerical methods) Chapter 3 (second half). System of equations

Math 471 (Numerical methods) Chapter 3 (second half). System of equations Math 47 (Numerical methods) Chapter 3 (second half). System of equations Overlap 3.5 3.8 of Bradie 3.5 LU factorization w/o pivoting. Motivation: ( ) A I Gaussian Elimination (U L ) where U is upper triangular

More information

Contents. Preface... xi. Introduction...

Contents. Preface... xi. Introduction... Contents Preface... xi Introduction... xv Chapter 1. Computer Architectures... 1 1.1. Different types of parallelism... 1 1.1.1. Overlap, concurrency and parallelism... 1 1.1.2. Temporal and spatial parallelism

More information

Numerical Linear Algebra

Numerical Linear Algebra Numerical Linear Algebra Direct Methods Philippe B. Laval KSU Fall 2017 Philippe B. Laval (KSU) Linear Systems: Direct Solution Methods Fall 2017 1 / 14 Introduction The solution of linear systems is one

More information

AMS526: Numerical Analysis I (Numerical Linear Algebra)

AMS526: Numerical Analysis I (Numerical Linear Algebra) AMS526: Numerical Analysis I (Numerical Linear Algebra) Lecture 3: Positive-Definite Systems; Cholesky Factorization Xiangmin Jiao Stony Brook University Xiangmin Jiao Numerical Analysis I 1 / 11 Symmetric

More information

Conjugate Gradient Method

Conjugate Gradient Method Conjugate Gradient Method direct and indirect methods positive definite linear systems Krylov sequence spectral analysis of Krylov sequence preconditioning Prof. S. Boyd, EE364b, Stanford University Three

More information

Section 4.5 Eigenvalues of Symmetric Tridiagonal Matrices

Section 4.5 Eigenvalues of Symmetric Tridiagonal Matrices Section 4.5 Eigenvalues of Symmetric Tridiagonal Matrices Key Terms Symmetric matrix Tridiagonal matrix Orthogonal matrix QR-factorization Rotation matrices (plane rotations) Eigenvalues We will now complete

More information

Scientific Computing WS 2018/2019. Lecture 9. Jürgen Fuhrmann Lecture 9 Slide 1

Scientific Computing WS 2018/2019. Lecture 9. Jürgen Fuhrmann Lecture 9 Slide 1 Scientific Computing WS 2018/2019 Lecture 9 Jürgen Fuhrmann juergen.fuhrmann@wias-berlin.de Lecture 9 Slide 1 Lecture 9 Slide 2 Simple iteration with preconditioning Idea: Aû = b iterative scheme û = û

More information

Incomplete LU Preconditioning and Error Compensation Strategies for Sparse Matrices

Incomplete LU Preconditioning and Error Compensation Strategies for Sparse Matrices Incomplete LU Preconditioning and Error Compensation Strategies for Sparse Matrices Eun-Joo Lee Department of Computer Science, East Stroudsburg University of Pennsylvania, 327 Science and Technology Center,

More information

LU Factorization. LU factorization is the most common way of solving linear systems! Ax = b LUx = b

LU Factorization. LU factorization is the most common way of solving linear systems! Ax = b LUx = b AM 205: lecture 7 Last time: LU factorization Today s lecture: Cholesky factorization, timing, QR factorization Reminder: assignment 1 due at 5 PM on Friday September 22 LU Factorization LU factorization

More information

Numerical Analysis Lecture Notes

Numerical Analysis Lecture Notes Numerical Analysis Lecture Notes Peter J Olver 8 Numerical Computation of Eigenvalues In this part, we discuss some practical methods for computing eigenvalues and eigenvectors of matrices Needless to

More information

On solving linear systems arising from Shishkin mesh discretizations

On solving linear systems arising from Shishkin mesh discretizations On solving linear systems arising from Shishkin mesh discretizations Petr Tichý Faculty of Mathematics and Physics, Charles University joint work with Carlos Echeverría, Jörg Liesen, and Daniel Szyld October

More information

Notes on Eigenvalues, Singular Values and QR

Notes on Eigenvalues, Singular Values and QR Notes on Eigenvalues, Singular Values and QR Michael Overton, Numerical Computing, Spring 2017 March 30, 2017 1 Eigenvalues Everyone who has studied linear algebra knows the definition: given a square

More information

Multilevel Low-Rank Preconditioners Yousef Saad Department of Computer Science and Engineering University of Minnesota. Modelling 2014 June 2,2014

Multilevel Low-Rank Preconditioners Yousef Saad Department of Computer Science and Engineering University of Minnesota. Modelling 2014 June 2,2014 Multilevel Low-Rank Preconditioners Yousef Saad Department of Computer Science and Engineering University of Minnesota Modelling 24 June 2,24 Dedicated to Owe Axelsson at the occasion of his 8th birthday

More information

Computational Methods. Eigenvalues and Singular Values

Computational Methods. Eigenvalues and Singular Values Computational Methods Eigenvalues and Singular Values Manfred Huber 2010 1 Eigenvalues and Singular Values Eigenvalues and singular values describe important aspects of transformations and of data relations

More information

AMS526: Numerical Analysis I (Numerical Linear Algebra)

AMS526: Numerical Analysis I (Numerical Linear Algebra) AMS526: Numerical Analysis I (Numerical Linear Algebra) Lecture 24: Preconditioning and Multigrid Solver Xiangmin Jiao SUNY Stony Brook Xiangmin Jiao Numerical Analysis I 1 / 5 Preconditioning Motivation:

More information

In this section again we shall assume that the matrix A is m m, real and symmetric.

In this section again we shall assume that the matrix A is m m, real and symmetric. 84 3. The QR algorithm without shifts See Chapter 28 of the textbook In this section again we shall assume that the matrix A is m m, real and symmetric. 3.1. Simultaneous Iterations algorithm Suppose we

More information

14.2 QR Factorization with Column Pivoting

14.2 QR Factorization with Column Pivoting page 531 Chapter 14 Special Topics Background Material Needed Vector and Matrix Norms (Section 25) Rounding Errors in Basic Floating Point Operations (Section 33 37) Forward Elimination and Back Substitution

More information

BlockMatrixComputations and the Singular Value Decomposition. ATaleofTwoIdeas

BlockMatrixComputations and the Singular Value Decomposition. ATaleofTwoIdeas BlockMatrixComputations and the Singular Value Decomposition ATaleofTwoIdeas Charles F. Van Loan Department of Computer Science Cornell University Supported in part by the NSF contract CCR-9901988. Block

More information

Linear Solvers. Andrew Hazel

Linear Solvers. Andrew Hazel Linear Solvers Andrew Hazel Introduction Thus far we have talked about the formulation and discretisation of physical problems...... and stopped when we got to a discrete linear system of equations. Introduction

More information

Math 411 Preliminaries

Math 411 Preliminaries Math 411 Preliminaries Provide a list of preliminary vocabulary and concepts Preliminary Basic Netwon s method, Taylor series expansion (for single and multiple variables), Eigenvalue, Eigenvector, Vector

More information

A direct solver for elliptic PDEs in three dimensions based on hierarchical merging of Poincaré-Steklov operators

A direct solver for elliptic PDEs in three dimensions based on hierarchical merging of Poincaré-Steklov operators (1) A direct solver for elliptic PDEs in three dimensions based on hierarchical merging of Poincaré-Steklov operators S. Hao 1, P.G. Martinsson 2 Abstract: A numerical method for variable coefficient elliptic

More information

SCALABLE HYBRID SPARSE LINEAR SOLVERS

SCALABLE HYBRID SPARSE LINEAR SOLVERS The Pennsylvania State University The Graduate School Department of Computer Science and Engineering SCALABLE HYBRID SPARSE LINEAR SOLVERS A Thesis in Computer Science and Engineering by Keita Teranishi

More information

A fast randomized algorithm for computing a Hierarchically Semi-Separable representation of a matrix

A fast randomized algorithm for computing a Hierarchically Semi-Separable representation of a matrix A fast randomized algorithm for computing a Hierarchically Semi-Separable representation of a matrix P.G. Martinsson, Department of Applied Mathematics, University of Colorado at Boulder Abstract: Randomized

More information

LU Factorization. Marco Chiarandini. DM559 Linear and Integer Programming. Department of Mathematics & Computer Science University of Southern Denmark

LU Factorization. Marco Chiarandini. DM559 Linear and Integer Programming. Department of Mathematics & Computer Science University of Southern Denmark DM559 Linear and Integer Programming LU Factorization Marco Chiarandini Department of Mathematics & Computer Science University of Southern Denmark [Based on slides by Lieven Vandenberghe, UCLA] Outline

More information

Notes on Householder QR Factorization

Notes on Householder QR Factorization Notes on Householder QR Factorization Robert A van de Geijn Department of Computer Science he University of exas at Austin Austin, X 7872 rvdg@csutexasedu September 2, 24 Motivation A fundamental problem

More information

Improvements for Implicit Linear Equation Solvers

Improvements for Implicit Linear Equation Solvers Improvements for Implicit Linear Equation Solvers Roger Grimes, Bob Lucas, Clement Weisbecker Livermore Software Technology Corporation Abstract Solving large sparse linear systems of equations is often

More information