A robust inner-outer HSS preconditioner


NUMERICAL LINEAR ALGEBRA WITH APPLICATIONS
Numer. Linear Algebra Appl. 2011; 00:1-0 [Version: 2002/09/18 v1.02]

A robust inner-outer HSS preconditioner

Jianlin Xia
Department of Mathematics, Purdue University, West Lafayette, IN 47907, U.S.A.

SUMMARY

This paper presents an inner-outer preconditioner for symmetric positive definite matrices based on the hierarchically semiseparable (HSS) matrix representation. A sequence of new HSS algorithms is developed, including some ULV-type HSS methods and triangular HSS methods. During the construction of this preconditioner, off-diagonal blocks are compressed in parallel, and an approximate HSS form is used as the outer HSS representation. In the meantime, the internal dense diagonal blocks are approximately factorized into HSS Cholesky factors. Such inner factorizations are guaranteed to exist for any approximation accuracy. The inner-outer preconditioner combines the advantages of both direct HSS and triangular HSS methods, and is both scalable and robust. A systematic complexity analysis and a discussion of the errors are presented. Various tests on practical numerical problems are used to demonstrate the efficiency and robustness. In particular, the effectiveness of the preconditioner in iterative solutions is shown for some ill-conditioned problems. This work also gives a practical way of developing inner-outer preconditioners using direct rank-structured factorizations rather than iterative methods. Copyright (c) 2011 John Wiley & Sons, Ltd.

KEY WORDS: robust preconditioner; hierarchically semiseparable (HSS) matrix; inner-outer HSS algorithm; ULV factorization

1. INTRODUCTION

In this paper, we present a robust and scalable inner-outer preconditioner based on hierarchically semiseparable (HSS) matrix structures [7, 8, 35]. It is known that dense intermediate matrices in many practical applications have certain low-rank structure or low-rank off-diagonal blocks, based on the early studies by Rokhlin [28], Gohberg, Kailath, and Koltracht [12], and on the ideas of the fast multipole method (FMM) by Greengard and Rokhlin [17]. The HSS representation by Chandrasekaran, Dewilde, Gu, et al. is a stable special form of FMM representations and of the H- and H^2-matrices by Hackbusch, et al. [5, 19, 20]. HSS matrices have been widely used in the development of fast numerical methods for sparse matrices, PDEs, integral equations, Toeplitz problems, and more [7, 22, 23, 26, 31, 34, etc.]. HSS methods can often be used to solve related problems with nearly linear complexity.

Correspondence to: Jianlin Xia, Department of Mathematics, Purdue University, West Lafayette, IN 47907, U.S.A. xiaj@math.purdue.edu
The research of Jianlin Xia was supported in part by NSF grants DMS and CHE.
Copyright (c) 2011 John Wiley & Sons, Ltd.

An HSS form has a nice binary tree structure, and the operations are conducted locally at multiple hierarchical levels. The applications of HSS methods and similar structures as preconditioners have been considered by Dewilde, Gu, et al. [18, 30, 36]. These methods can be considered as structured incomplete factorizations. Robustness enhancement has also been studied. In previous studies for general symmetric positive definite (SPD) matrices, it is known that incomplete Cholesky factorizations exist for M-matrices and certain H-matrices; see, e.g., [27] by Meijerink and van der Vorst and [25] by Manteuffel. Some robust or stabilized methods have been proposed for general SPD matrices; see, e.g., [3, 4] by Benzi, Cullum, and Tuma, and [21] by Kaporin.

In fact, rank-structured matrix techniques have also been shown to be very useful in robust preconditioning. Robustness techniques for H-matrices are discussed by Bebendorf and Hackbusch [2]. For an SPD matrix, the structured methods in [18] by Gu, Li, and Vassilevski and in [36] by Xia and Gu guarantee that the structured preconditioners are always positive definite. In particular, it is discussed in [36] that HSS methods have the potential to work as effective preconditioners for problems where the low-rank structure is insignificant, or when the problems have only a weak low-rank property. However, both methods in [18, 36] involve Schur complement computations in approximate triangular factorizations, and are not easily parallelizable.

Recently, the scalability of HSS methods has been investigated by Wang, Li, et al. [32]. It is shown that both the direct construction of an HSS form and the solution of an HSS system with a ULV factorization procedure are highly parallelizable. In ULV, U and V represent orthogonal matrices and L represents triangular ones. On the other hand, the direct HSS construction may not preserve positive definiteness, so the ULV factorization may break down. Direct HSS methods and triangular HSS factorizations also have different efficiency. For example, the direct construction of an HSS approximation to a general SPD matrix costs 3rn^2 flops under certain assumptions, where r is a related off-diagonal rank bound [33]. Then the ULV factorization and solution in [35] cost at least (56/3) r^2 n flops. In contrast, an HSS Cholesky factorization costs about (11/2) r n^2 flops, but the triangular HSS solution is more efficient and needs only 10 r n flops.

1.1. Main results

The task of this work is to combine the benefits of both direct HSS and triangular HSS methods. That is, we propose an efficient and effective preconditioner for SPD matrices which is both scalable and robust. Motivated by the ideas of inner-outer iterative preconditioning by Saad [29], Golub and Ye [14], Bai, Benzi, and Chen [1], et al., we design an inner-outer HSS factorization scheme. We give a sequence of new HSS algorithms, including both direct and triangular HSS ones. The inner-outer preconditioner then uses a direct HSS construction and ULV factorization as the outer scheme, where an inner triangular HSS factorization is applied to the diagonal blocks. Both the inner and the outer schemes approximate certain off-diagonal blocks with compression or rank-revealing factorizations. Just like some inner-outer iteration-based preconditioners [1, 14], we can also use lower accuracies in the inner HSS factorizations so as to save costs. Moreover, we allow the flexibility of adjusting the levels of hierarchical operations in both the inner and the outer HSS methods.
The level of outer operations can be designed to fit the available parallel computing resources, and the inner one can then be adjusted to enhance the robustness. Here, robustness means that our inner scheme guarantees the existence of a local HSS Cholesky factorization regardless of the approximation accuracy, unlike ULV HSS methods, which may break down at certain stages of the hierarchical operations (see, e.g., Example 2 below).

At the outer HSS levels, even if breakdown occurs, we can use simple diagonal shifting. Since the number of outer HSS levels is often small, the effect of diagonal shifting on the approximation accuracy is limited, as illustrated in Section 5.

The individual new HSS algorithms here are compared with similar existing ones, and generally have better efficiency and scalability. For example, for ULV HSS factorizations, we use local triangular factorizations for diagonal blocks, which not only are more efficient than the QR factorizations used in [8], but also enable us to use the new triangular HSS algorithms as inner operations. Our triangular HSS factorization only needs to compress off-diagonal blocks about half the size of those used in [36], while keeping about the same accuracy. Unlike the triangular HSS solution procedure in [24], which is sequential, a scalable triangular ULV factorization scheme with similar cost is proposed. The detailed complexity of these new HSS algorithms is analyzed. The analysis can help us compare different methods and provide improvements and optimization; see, e.g., (39) and (40). Discussions on the errors are given.

Several numerical examples are used to demonstrate the efficiency and robustness of the algorithms. We show the effectiveness of the inner-outer HSS preconditioning for some ill-conditioned problems, such as a Toeplitz problem generated by a Gaussian radial basis function and a linear elasticity PDE. We observe that, with efficient inner-outer HSS preconditioning, the preconditioned conjugate gradient method converges quickly for all the examples, and the convergence is often much faster than with block diagonal preconditioning. The overall inner-outer preconditioning cost is often insignificant when we manually set the off-diagonal rank bounds to be small. Our methods can also be built into sparse factorization or preconditioning frameworks as in [15, 16, 34] by Grasedyck, Le Borne, Kriemann, Xia, et al.

1.2. Outline and notation

The remaining sections are organized as follows. Section 2 provides some new HSS factorization algorithms for SPD matrices. Additional HSS solution algorithms and our inner-outer HSS preconditioner are given in Section 3. The complexity and error analyses are presented in Section 4. Sections 5 and 6 are devoted to some numerical experiments and concluding remarks, respectively.

For convenience, the following notation is used in the presentation:

For an index set t_i contained in I = {1, 2, ..., n}, let t_i^c, t_i, t_i^r partition I (t_i^c U t_i U t_i^r = I), where all indices in t_i^c are smaller than those in t_i, and all indices in t_i^r are larger than those in t_i.

A_{t_i x t_j} is the submatrix of a matrix A with row index set t_i and column index set t_j. If a rank-revealing QR (RRQR) factorization is computed for A_{t_i x t_j}, we sometimes write A_{t_i x t_j} ~ U_i A_{^t_i x t_j}, which can be understood as meaning that the factor A_{^t_i x t_j} is still stored in A with row index set ^t_i.

T is a postordered full binary tree with nodes ordered as 1, 2, .... That is, each non-leaf node i and its children c_{i,1}, c_{i,2} are ordered following c_{i,1} < c_{i,2} < i. Sometimes we just write the children as c_1, c_2 when no confusion is caused. For a node i, denote its parent and sibling by par(i) and sib(i), respectively.

diag(A_1, ..., A_k) is a block diagonal matrix with diagonal blocks A_1, ..., A_k.

The definition and notation for HSS structures are briefly reviewed as follows, following the postordering HSS form in [35, 36].

Definition 1.1. Assume A is an n x n dense matrix.
Let T be a postordered full binary tree with 2k - 1 nodes, and let t_i be an index set associated with each node i of T. We say T is a postordered HSS tree and A is in an HSS form if:

1. For each non-leaf node i with children c_1 and c_2, t_{c_1} U t_{c_2} = t_i, t_{c_1} and t_{c_2} are disjoint, and t_{2k-1} = I = {1, 2, ..., n}.

2. There exist matrices D_i, U_i, V_i, R_i, W_i, B_i (called HSS generators) associated with each node i satisfying

   D_i = [ D_{c_1},  U_{c_1} B_{c_1} V_{c_2}^T ;  U_{c_2} B_{c_2} V_{c_1}^T,  D_{c_2} ],   U_i = [ U_{c_1} R_{c_1} ; U_{c_2} R_{c_2} ],   V_i = [ V_{c_1} W_{c_1} ; V_{c_2} W_{c_2} ],    (1)

so that D_{2k-1} = A_{t_{2k-1} x t_{2k-1}} = A. Here, U_{2k-1}, V_{2k-1}, R_{2k-1}, W_{2k-1}, B_{2k-1} are empty matrices.

A is then in a data-sparse form given by the generators. For a non-leaf node i, the generators D_i, U_i, V_i are recursively defined and are not explicitly stored. Define the HSS block row and the HSS block column of node i as A_{t_i x (I \ t_i)} and A_{(I \ t_i) x t_i}, respectively. Clearly, U_i gives a column basis for the HSS block row, and V_i^T gives a row basis for the HSS block column. U_i and V_i are also called cluster bases in [5]. The maximum numerical rank of all HSS blocks is called the HSS rank of A.

The following is a block 4 x 4 HSS matrix example:

   A = [ D_1,                       U_1 B_1 V_2^T,               U_1 R_1 B_3 W_4^T V_4^T,   U_1 R_1 B_3 W_5^T V_5^T ;
         U_2 B_2 V_1^T,             D_2,                         U_2 R_2 B_3 W_4^T V_4^T,   U_2 R_2 B_3 W_5^T V_5^T ;
         U_4 R_4 B_6 W_1^T V_1^T,   U_4 R_4 B_6 W_2^T V_2^T,     D_4,                       U_4 B_4 V_5^T ;
         U_5 R_5 B_6 W_1^T V_1^T,   U_5 R_5 B_6 W_2^T V_2^T,     U_5 B_5 V_4^T,             D_5 ].    (2)

See Figure 1. Obviously, if A is symmetric, we can set [35]

   D_i = D_i^T,   V_i = U_i,   B_j = B_i^T with j = sib(i).    (3)

Figure 1. HSS tree for the block 4 x 4 HSS matrix (2): leaves 1, 2, 4, 5 (associated with D_1, D_2, D_4, D_5), non-leaf nodes 3 and 6, and root 7 (D_7 = A), over two levels.

A given HSS form generally enables us to conduct related matrix operations, such as matrix inversions and multiplications, in linear complexity if the HSS rank is small. Otherwise, we can use compact HSS forms as efficient preconditioners.
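To make Definition 1.1 and the example (2)-(3) concrete, the following minimal NumPy sketch assembles the dense matrix represented by the block 4 x 4 symmetric HSS form from randomly chosen generators. The generator sizes m and r, and all variable names, are illustrative assumptions for the example rather than anything prescribed by the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
m, r = 8, 2  # leaf block size and HSS rank (illustrative values)

# Generators for the leaves 1, 2, 4, 5 of the tree in Figure 1, using the
# symmetric conventions (3): V_i = U_i, W_i = R_i, and B_j = B_i^T for j = sib(i).
D, U, R = {}, {}, {}
for i in (1, 2, 4, 5):
    X = rng.standard_normal((m, m))
    D[i] = X @ X.T + m * np.eye(m)                       # SPD diagonal generators
    U[i] = np.linalg.qr(rng.standard_normal((m, r)))[0]  # orthonormal column bases
    R[i] = rng.standard_normal((r, r))
B = {1: rng.standard_normal((r, r)),   # B_2 = B_1^T
     3: rng.standard_normal((r, r)),   # B_6 = B_3^T
     4: rng.standard_normal((r, r))}   # B_5 = B_4^T

# Dense matrix represented by the block 4 x 4 HSS form (2).
A = np.block([
  [D[1],                                   U[1] @ B[1] @ U[2].T,                    U[1] @ R[1] @ B[3] @ R[4].T @ U[4].T, U[1] @ R[1] @ B[3] @ R[5].T @ U[5].T],
  [U[2] @ B[1].T @ U[1].T,                 D[2],                                    U[2] @ R[2] @ B[3] @ R[4].T @ U[4].T, U[2] @ R[2] @ B[3] @ R[5].T @ U[5].T],
  [U[4] @ R[4] @ B[3].T @ R[1].T @ U[1].T, U[4] @ R[4] @ B[3].T @ R[2].T @ U[2].T,  D[4],                                 U[4] @ B[4] @ U[5].T],
  [U[5] @ R[5] @ B[3].T @ R[1].T @ U[1].T, U[5] @ R[5] @ B[3].T @ R[2].T @ U[2].T,  U[5] @ B[4].T @ U[4].T,               D[5]],
])

assert np.allclose(A, A.T)
# Every HSS block row has (numerical) rank at most r, e.g. that of leaf 1:
print("rank of HSS block row of node 1:", np.linalg.matrix_rank(A[:m, m:]))
```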

2. NEW SPD HSS FACTORIZATION ALGORITHMS

We show how to convert a dense SPD matrix into data-sparse forms, by either direct HSS construction with ULV factorization or HSS Cholesky factorization. In this section, for convenience, each subsection uses the same notation for the generators of the related HSS form. To help the reader understand the algorithms, we briefly summarize the major framework of each algorithm in Table I.

Table I. Framework of the major algorithms in Section 2.
  HSS construction (Section 2.1):
    Leaf level: off-diagonal block row compression for U generators;
    Non-leaf levels: off-diagonal block row compression for R generators;
    Similar off-diagonal block column compression.
  ULV HSS factorization (Section 2.2, and similarly for Section 2.4):
    Diagonal block factorization;
    Introducing zeros into off-diagonal blocks while preserving identity diagonal blocks;
    Partial diagonal elimination;
    Merging and recursion.
  Robust HSS Cholesky factorization (Section 2.3):
    Leaf level: diagonal block factorization; computation of the off-diagonal block row of the factor and its compression for U generators; partial Schur complement formation;
    Non-leaf levels: off-diagonal block row compression for R generators;
    Similar off-diagonal block column compression.

2.1. Scalable symmetric HSS construction

The HSS construction algorithm in [35] uses nested compression of the HSS blocks of a given matrix A. However, it is a sequential process, since early compression results participate in later compression. The algorithm can be modified into a new scalable one. In particular, for a symmetric matrix A, we only need to compress the block rows. Moreover, we can further reduce the compression cost. The method has two stages and is briefly explained as follows. For simplicity, assume the row size and numerical rank of all HSS block rows are m and r, respectively.

During the first stage, all HSS block rows are compressed.

If i is a leaf, compute an RRQR factorization

   A_{t_i x (I \ t_i)} ~ U_i A_{^t_i x (I \ t_i)}.

If i is a non-leaf node with children c_1 and c_2, compute an RRQR factorization of the matrix

   [ A_{^t_{c_1} x (I \ t_i)} ; A_{^t_{c_2} x (I \ t_i)} ] ~ [ R_{c_1} ; R_{c_2} ] A_{^t_i x (I \ t_i)}.

Then by recursion,

   A_{t_i x (I \ t_i)} ~ diag(U_{c_1}, U_{c_2}) [ A_{^t_{c_1} x (I \ t_i)} ; A_{^t_{c_2} x (I \ t_i)} ] ~ diag(U_{c_1}, U_{c_2}) [ R_{c_1} ; R_{c_2} ] A_{^t_i x (I \ t_i)} = U_i A_{^t_i x (I \ t_i)}.

That is, we obtain the orthonormal column basis U_i for each HSS block row. These compression steps are done independently of the other nodes at the same level of the HSS tree. After this stage, all U and R generators are available.
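As a small illustration of the leaf-level stage, the sketch below compresses each HSS block row of a synthetic symmetric matrix with a truncated pivoted QR (a simple RRQR substitute), and then reads off a B generator for a pair of sibling leaves in the spirit of (4) in the next stage. The test matrix, the tolerance, and the helper names are assumptions made for this example only.

```python
import numpy as np
from scipy.linalg import qr

def rrqr_rowbasis(block, tol):
    """Compress a block row: block ~ U @ block_hat via truncated pivoted QR."""
    Q, Rfac, piv = qr(block, pivoting=True, mode='economic')
    d = np.abs(np.diag(Rfac))
    k = max(1, int(np.sum(d > tol * d[0])))    # numerical rank for tolerance tol
    U = Q[:, :k]                               # orthonormal column basis
    return U, U.T @ block                      # basis and compressed rows

rng = np.random.default_rng(1)
n, m, tol = 64, 16, 1e-8
# A synthetic symmetric matrix whose off-diagonal blocks are numerically low rank.
x = np.linspace(0, 1, n)
A = 1.0 / (1.0 + 25.0 * (x[:, None] - x[None, :])**2) + n * np.eye(n)

# Leaf-level stage: compress each HSS block row A(t_i, I \ t_i) independently.
U = {}
for leaf, start in enumerate(range(0, n, m)):
    rows = np.arange(start, start + m)
    cols = np.setdiff1d(np.arange(n), rows)
    U[leaf], _ = rrqr_rowbasis(A[np.ix_(rows, cols)], tol)

# B generator for the sibling leaves 0 and 1: A(t_0, t_1) ~ U_0 B_0 U_1^T,
# so B_0 can be read off by projecting onto the already computed bases.
t0, t1 = np.arange(0, m), np.arange(m, 2 * m)
B0 = U[0].T @ A[np.ix_(t0, t1)] @ U[1]
err = np.linalg.norm(A[np.ix_(t0, t1)] - U[0] @ B0 @ U[1].T)
print("off-diagonal approximation error:", err)
```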

In the next stage, we compute the B generators. Let j = sib(i). According to the first stage, we have A_{t_i x t_j} ~ U_i A_{^t_i x t_j}. On the other hand, in the HSS approximation, A_{t_i x t_j} has the form U_i B_i U_j^T. Thus, we can let

   B_i = A_{^t_i x t_j} U_j   if i is a leaf,   and   B_i = A_{^t_i x t_j} diag( U_{c_{j,1}}, U_{c_{j,2}} ) [ R_{c_{j,1}} ; R_{c_{j,2}} ]   otherwise.    (4)

Note that the method in [35] needs additional compression steps to get B_i, which are generally more expensive than (4).

2.2. Improved ULV-type HSS factorization

The ULV HSS Cholesky factorization in [35] computes an explicit ULV-type factorization of an HSS matrix A. The basic idea is similar to the implicit ULV solution in [8]. However, for SPD matrices, as pointed out in [35], the efficiency of these ULV algorithms can be improved. Moreover, we can take advantage of certain special forms of the generators. The main steps are illustrated in Figure 2 and are explained as follows. For simplicity, assume all leaf-level HSS block rows have row size m_i = m, and the ranks of all HSS blocks are r.

Since U_i forms a basis for the column space of the HSS block row of node i, we sometimes write this block row as U_i T_i. If node i is a leaf, compute a Cholesky factorization D_i = L_i L_i^T, and then a full QL factorization of L_i^{-1} U_i as

   L_i^{-1} U_i = Q_i [ 0 ; U~_i ],    (5)

where the zero block has m - r rows and U~_i is r x r. See Figure 2(i)-(ii). Then apply Q_i^T to the HSS block row on the left:

   Q_i^T L_i^{-1} ( U_i T_i ) = [ 0 ; U~_i T_i ].

This also means Q_i is applied to the HSS block column on the right. Notice that the diagonal block remains an identity matrix. See Figure 2(iii). After this, the leading m - r rows of block row i can be eliminated. This is done for all the leaves.

If node i is a non-leaf node with its children c_1 and c_2 being leaves, assume the previous operations have been applied to both c_1 and c_2. That is,

   diag( Q_{c_1}^T L_{c_1}^{-1}, Q_{c_2}^T L_{c_2}^{-1} ) [ D_{c_1}, U_{c_1} B_{c_1} U_{c_2}^T ; U_{c_2} B_{c_1}^T U_{c_1}^T, D_{c_2} ] diag( L_{c_1}^{-T} Q_{c_1}, L_{c_2}^{-T} Q_{c_2} )
     = [ I, diag( 0, U~_{c_1} B_{c_1} U~_{c_2}^T ) ; diag( 0, U~_{c_2} B_{c_1}^T U~_{c_1}^T ), I ],    (6)

where diag(.) denotes a block diagonal matrix with the given blocks on the diagonal. Then we merge the appropriate blocks on the right-hand side of (6) and remove c_1 and c_2 from the HSS tree, so that i becomes a leaf with D and U generators

   D_i = [ I, U~_{c_1} B_{c_1} U~_{c_2}^T ; U~_{c_2} B_{c_1}^T U~_{c_1}^T, I ],   U_i = [ U~_{c_1} R_{c_1} ; U~_{c_2} R_{c_2} ].    (7)

Note that we are using D_i and U_i to mean the generators of a smaller matrix after the elimination, instead of those of A. This smaller matrix is called a reduced HSS matrix. See Figure 2(iv). The same operations then apply recursively. At the root level, a direct Cholesky factorization is computed. Figure 3 shows how an HSS tree is reduced along the factorization.

Figure 2. An improved ULV HSS factorization scheme for an SPD matrix: (i) Cholesky factorization of the diagonal blocks; (ii) introducing zeros into the off-diagonal blocks; (iii) after introducing zeros; (iv) merging the remaining blocks and repeating.

Figure 3. How the HSS tree is reduced along the ULV HSS factorization: (i) symmetric HSS tree; (ii) after eliminating one level; (iii) until the root.

Note that, due to the special form of D_i in (7), the intermediate Cholesky factorization D_i = L_i L_i^T at non-leaf levels can be done quickly. That is, we have

   L_i = [ I, 0 ; U~_{c_2} B_{c_1}^T U~_{c_1}^T, L~_i ],   with   L~_i L~_i^T = I - U~_{c_2} B_{c_1}^T U~_{c_1}^T U~_{c_1} B_{c_1} U~_{c_2}^T.    (8)

Similarly, the update L_i^{-1} U_i for (7) can also be done quickly as

   L_i^{-1} U_i = [ U~_{c_1} R_{c_1} ; L~_i^{-1} U~_{c_2} ( R_{c_2} - B_{c_1}^T U~_{c_1}^T U~_{c_1} R_{c_1} ) ].    (9)
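The following NumPy sketch checks the structured formulas (8)-(9) on synthetic r x r generators: only an r x r Cholesky factorization and a few small products are needed for the reduced block. The scaling applied to keep the block positive definite, and all variable names, are assumptions of the example, not part of the algorithm.

```python
import numpy as np

rng = np.random.default_rng(5)
r = 4
Ut1 = rng.standard_normal((r, r))       # U~_{c1}
Ut2 = rng.standard_normal((r, r))       # U~_{c2}
B = rng.standard_normal((r, r))         # B_{c1}
G = Ut1 @ B @ Ut2.T
scale = 0.5 / np.linalg.norm(G, 2)      # keep ||G|| < 1 so that D_i in (7) is SPD
B, G = B * scale, G * scale

# Reduced diagonal block (7) and its structured Cholesky factor (8).
D = np.block([[np.eye(r), G], [G.T, np.eye(r)]])
Lt = np.linalg.cholesky(np.eye(r) - G.T @ G)           # only an r x r Cholesky
L = np.block([[np.eye(r), np.zeros((r, r))], [G.T, Lt]])
print("Cholesky residual (8):", np.linalg.norm(L @ L.T - D))

# Structured update (9) of L_i^{-1} U_i with U_i = [U~_{c1} R_{c1}; U~_{c2} R_{c2}].
R1, R2 = rng.standard_normal((r, r)), rng.standard_normal((r, r))
U = np.vstack([Ut1 @ R1, Ut2 @ R2])
top = Ut1 @ R1
bot = np.linalg.solve(Lt, Ut2 @ R2 - G.T @ (Ut1 @ R1))
print("update residual (9):", np.linalg.norm(np.vstack([top, bot]) - np.linalg.solve(L, U)))
```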

Formulas (8)-(9) can be carefully organized so that they only need six matrix multiplications and one triangular solution. This improved algorithm is thus more efficient than the one in [35], which requires two dense square matrix products of the form Q_i^T D_i Q_i and a full QR factorization of a size (m - r) x r dense block for each node i. In fact, according to Table VI in Section 4, this new ULV factorization with m = 2r costs (35/3) r^2 n flops, while the one in [35] costs (56/3) r^2 n flops.

Our algorithm computes a ULV-type HSS factorization

   A = L^ L^T,    (10)

where L^ is given by a sequence of orthogonal and lower triangular matrices. We call L^ a ULV factor, which is formed recursively. Let A^(l) be the reduced matrix at level l obtained from A^(l+1) after the elimination of all nodes at level l + 1, where A^(l_max) = A for the leaf level l_max. Assume the nodes at level l are j_1, j_2, ..., j_a. Also let P_l be a permutation matrix which performs all the merge steps during the elimination of level l + 1 and also forms the new reduced HSS matrix. That is, P_l collects the (zero-padded) transformed bases of the sibling pairs,

   P_l X_l diag( [ U_{c_{j_1,1}} R_{c_{j_1,1}} ; U_{c_{j_1,2}} R_{c_{j_1,2}} ], ..., [ U_{c_{j_a,1}} R_{c_{j_a,1}} ; U_{c_{j_a,2}} R_{c_{j_a,2}} ] ) = [ 0 ; diag( U_{j_1}, ..., U_{j_a} ) ],    (11)

   P_l X_l A^(l+1) X_l^T P_l^T = diag( I, A^(l) ),    (12)

where the U_{j} on the right-hand side of (11) are the generators (7) of the reduced HSS matrix, and

   X_l = diag( diag( Q_{c_{j_1,1}}^T L_{c_{j_1,1}}^{-1}, Q_{c_{j_1,2}}^T L_{c_{j_1,2}}^{-1} ), ..., diag( Q_{c_{j_a,1}}^T L_{c_{j_a,1}}^{-1}, Q_{c_{j_a,2}}^T L_{c_{j_a,2}}^{-1} ) ).

Then we let L^ = L^(l_max), and

   A^(l) = L^(l) L^(l)T,   l = l_max, l_max - 1, ..., 1, 0,

where L^(0) is the Cholesky factor associated with the root, and

   L^(l+1) = X_l^{-1} P_l^T diag( I, L^(l) ) P_l,   l = 0, 1, ..., l_max - 1.    (13)

2.3. Robust HSS Cholesky factorization

2.3.1. Motivation of robust and effective HSS preconditioning. The factorization method in [36] computes an approximate Cholesky factorization A ~ R^T R for a dense SPD matrix A, where R is an upper triangular HSS matrix. The approximation R^T R is guaranteed to exist and to be positive definite, regardless of the approximation accuracy. The basic idea is called Schur compensation and can be illustrated as follows. Consider a block 2 x 2 SPD matrix and the Cholesky factorization A_{11} = D_1 D_1^T of its (1,1) block:

   A = [ A_{11}, A_{21}^T ; A_{21}, A_{22} ] = [ D_1, 0 ; A_{21} D_1^{-T}, I ] [ I, 0 ; 0, S ] [ D_1^T, D_1^{-1} A_{21}^T ; 0, I ],    (14)

where S = A_{22} - A_{21} D_1^{-T} D_1^{-1} A_{21}^T is the Schur complement. Compute an SVD

   D_1^{-1} A_{21}^T = U_1 Sigma U_2^T + U^_1 Sigma^ U^_2^T,

where the singular values in Sigma are larger than a tolerance tau, and those in Sigma^ are smaller than tau. (A relative tolerance tau may be used.) If all the singular values in Sigma^ are dropped, then S is approximated by

   S~ = A_{22} - U_2 Sigma^2 U_2^T = S + U^_2 Sigma^2^ U^_2^T = S + O(tau^2).    (15)

That is, a positive semidefinite term is automatically added to the Schur complement. Compute a Cholesky factorization S~ = D_2 D_2^T. We then get an approximate Cholesky factorization A ~ R^T R.
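The Schur compensation idea (14)-(15) can be verified numerically in a few lines. The NumPy sketch below truncates the off-diagonal block very aggressively and checks that the compensated Schur complement remains positive definite and dominates the exact one; the block sizes, the tolerance, and the names are assumptions of the example.

```python
import numpy as np

rng = np.random.default_rng(3)
n1, n2, tau = 20, 20, 1e-1                      # deliberately low accuracy
M = rng.standard_normal((n1 + n2, n1 + n2))
A = M @ M.T + 0.1 * np.eye(n1 + n2)             # SPD test matrix
A11, A21, A22 = A[:n1, :n1], A[n1:, :n1], A[n1:, n1:]

D1 = np.linalg.cholesky(A11)                    # A11 = D1 D1^T
T = np.linalg.solve(D1, A21.T)                  # D1^{-1} A21^T
W, s, Vt = np.linalg.svd(T, full_matrices=False)
keep = s > tau * s[0]                           # drop singular values below tau*sigma_1

S_exact = A22 - A21 @ np.linalg.solve(A11, A21.T)    # exact Schur complement
U2 = Vt[keep].T
S_tilde = A22 - U2 @ np.diag(s[keep]**2) @ U2.T      # approximation (15)

# The dropped term is positive semidefinite, so S_tilde stays SPD even though
# the off-diagonal block was compressed very aggressively.
print("kept singular values:", int(keep.sum()), "of", len(s))
print("min eig of exact S:  ", np.linalg.eigvalsh(S_exact).min())
print("min eig of S_tilde:  ", np.linalg.eigvalsh(S_tilde).min())
print("S_tilde - S is PSD:  ", np.linalg.eigvalsh(S_tilde - S_exact).min() >= -1e-12)
```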

It is also shown in [36] that, with certain additional techniques, a modification of R can work as an effective preconditioner when the low-rank property is not significant. That is, if the numerical rank for a given tolerance is large, we can still manually set a small rank to get a low-accuracy R. The idea is then generalized so that the Cholesky factor of A is approximated by an HSS matrix. RRQR factorizations are often used to replace SVDs in the compression, since they are generally more efficient.

2.3.2. Improved HSS Cholesky factorization. The HSS Cholesky factorization algorithm in [36] uses a form for R where only D, U, R, B generators are needed. R is computed block rowwise. The computed off-diagonal block column R_{t_i^c x t_i} and off-diagonal block row R_{t_i x t_i^r} associated with node i are put together as [ R_{t_i^c x t_i}^T, R_{t_i x t_i^r} ], which is compressed to provide the column basis U_i. This is equivalent to constructing a form for a symmetric full matrix whose block upper triangular part is R. Here, we propose a new procedure which has several advantages:

1. Instead of compressing [ R_{t_i^c x t_i}^T, R_{t_i x t_i^r} ], we compress R_{t_i x t_i^r} and R_{t_i^c x t_i} independently, so as to get their column basis U_i and row basis V_i^T, respectively. This implementation is much simpler.

2. The HSS form of the Cholesky factor we obtain is more compact, since the individual numerical ranks of R_{t_i^c x t_i} and R_{t_i x t_i^r} are usually smaller than that of [ R_{t_i^c x t_i}^T, R_{t_i x t_i^r} ], if the same compression accuracy is used.

3. The algorithm in [36] maintains a compressed form (see (22) below) only for Schur complement computations. Here, this compressed form is also used to speed up the computation of the V_i generators.

The new HSS Cholesky factorization procedure is described with the aid of the following definition.

Definition 2.1. [36] For a node i of a postordered binary tree T, the set of predecessors of i is defined to be

   pred(i) = {i}   if i is the root of T,   and   pred(i) = {i} U pred(par(i))   otherwise.

A visited set V_i (the set of visited nodes before i whose siblings have not been visited) is defined to be

   V_i = { j : j is a left node and sib(j) is in pred(i) }.    (16)

We can still use the HSS tree as in Definition 1.1 for R, but some generators do not exist. The following result can be easily verified.

Observation 1. For an HSS matrix to be block triangular, some of its generators are empty matrices, as specified in Table II.

Table II. Empty generators of triangular HSS matrices, where j is the last (largest-labeled) leaf of the HSS tree of the matrix.
                           Lower triangular     Upper triangular
  par(i) in pred(1):       U_i, R_i             V_i, W_i
  par(i) in pred(j):       V_i, W_i             U_i, R_i
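To make Definition 2.1 concrete, here is a tiny Python sketch that computes pred(i) and the visited set V_i of (16) for the 7-node postordered tree of Figure 1. The dictionaries encoding the tree are assumptions of the example.

```python
# Postordered perfect binary tree with 7 nodes: leaves 1, 2, 4, 5; parents 3, 6; root 7.
parent = {1: 3, 2: 3, 4: 6, 5: 6, 3: 7, 6: 7, 7: None}
sibling = {1: 2, 2: 1, 4: 5, 5: 4, 3: 6, 6: 3}
left_nodes = {1, 4, 3}            # first (smaller-labeled) child of each parent

def pred(i):
    """Predecessors of i: the path from i up to the root (Definition 2.1)."""
    return {i} if parent[i] is None else {i} | pred(parent[i])

def visited_set(i):
    """V_i = { j : j is a left node and sib(j) in pred(i) }, see (16)."""
    p = pred(i)
    return {j for j in left_nodes if sibling[j] in p}

for i in (1, 2, 4, 5, 3, 6, 7):
    print(i, "pred =", sorted(pred(i)), " V_i =", sorted(visited_set(i)))
```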

The factorization is conducted along the postordering traversal of the HSS tree, for nodes i = 1, 2, ....

If i is a leaf, let A^(i-1)_{t_{i-1}^r x t_{i-1}^r} denote the Schur complement due to the previous i - 1 factorization steps. A^(i-1)_{t_{i-1}^r x t_{i-1}^r} is only partially formed; that is, its first block row A^(i-1)_{t_i x t_{i-1}^r} is available (see (23) below). Note that t_{i-1}^r = t_i U t_i^r. Compute a block row [ D_i^T, R_{t_i x t_i^r} ] of R as in the usual Cholesky factorization. That is, compute the following Cholesky factorization and update the off-diagonal block as

   A^(i-1)_{t_i x t_i} = D_i D_i^T,   R_{t_i x t_i^r} = D_i^{-1} A^(i-1)_{t_i x t_i^r}.    (17)

Then compress R_{t_i x t_i^r} with an RRQR factorization

   R_{t_i x t_i^r} ~ U_i R_{^t_i x t_i^r}.    (18)

The off-diagonal block column R_{t_i^c x t_i} has already been partially compressed in previous steps, so that R_{t_j x t_i} ~ U_j R_{^t_j x t_i} for all j in V_i = {j_1, j_2, ..., j_s}, where t_i^c = t_{j_1} U t_{j_2} U ... U t_{j_s}. Ignoring the previously computed U bases, we only need to compress

   [ R_{^t_{j_1} x t_i} ; R_{^t_{j_2} x t_i} ; ... ; R_{^t_{j_s} x t_i} ] ~ [ R_{^t_{j_1} x ^t_i} ; R_{^t_{j_2} x ^t_i} ; ... ; R_{^t_{j_s} x ^t_i} ] V_i^T.    (19)

Note that (19) can be replaced by a more efficient step, as illustrated in (26) and (27) below.

If i is a non-leaf node, let c_1 and c_2 be its children. Clearly, t_{c_2}^r = t_i^r. The blocks R_{t_{c_1} x t_i^r} and R_{t_{c_2} x t_i^r} have been compressed in previous steps. With the basis matrices ignored, we compress the following matrix:

   [ R_{^t_{c_1} x t_i^r} ; R_{^t_{c_2} x t_i^r} ] ~ [ R_{c_1} ; R_{c_2} ] R_{^t_i x t_i^r}.    (20)

Similarly, the further compression of R_{t_i^c x t_i} = [ R_{t_i^c x t_{c_1}}, R_{t_i^c x t_{c_2}} ] is done by ignoring certain existing V basis matrices. That is, compress

   [ R_{^t_{j_1} x ^t_{c_1}}, R_{^t_{j_1} x ^t_{c_2}} ; ... ; R_{^t_{j_s} x ^t_{c_1}}, R_{^t_{j_s} x ^t_{c_2}} ] ~ [ R_{^t_{j_1} x ^t_i} ; ... ; R_{^t_{j_s} x ^t_i} ] [ W_{c_1}^T, W_{c_2}^T ].    (21)

After these compression steps, U_i and V_i are implicitly available due to the recursions in (1). According to Observation 1, if i is in pred(1), then (18) and (20) are not needed. Also, if i is in pred(j), where j is the last (largest-labeled) leaf, then (19) and (21) are not needed. Furthermore, if i is a right node with sibling j, we set B_j = R_{^t_j x ^t_i}.

Before the traversal of each leaf i, we partially form A^(i-1)_{t_i x t_{i-1}^r}, which is the first block row of the new Schur complement. Similar to [36], we also maintain a compact approximation

   R_{t_i^c x t_{i-1}^r} ~ Q_{i-1} Y_{i-1},    (22)

where Q_{i-1} has orthonormal columns and is not stored. Once (22) holds, then

   A^(i-1)_{t_i x t_{i-1}^r} = A_{t_i x t_{i-1}^r} - R_{t_i^c x t_i}^T R_{t_i^c x t_{i-1}^r} ~ A_{t_i x t_{i-1}^r} - Y_{i-1;1}^T Y_{i-1},    (23)

where a partitioned form of (22) is used:

   [ R_{t_i^c x t_i}, R_{t_i^c x t_i^r} ] ~ Q_{i-1} [ Y_{i-1;1}, Y_{i-1;2} ].    (24)

Here, Y_{i-1} is obtained approximately by recursion. For any i - 1 in pred(1), set Y_{i-1} = R_{^t_{i-1} x t_{i-1}^r}. Otherwise, let j be the largest leaf before i, so that Y_{j-1} is available from a previous step. Then Y_{i-1} can be approximately computed with an RRQR factorization

   [ Y_{j-1;2} ; R_{^t_j x t_j^r} ] ~ Q_{i-1} Y_{i-1}.    (25)

See [36] for the detailed derivation.

Furthermore, unlike [36], here we can use Y_{i-1} to improve the efficiency of computing V_i. Due to (24), the HSS block column R_{t_i^c x t_i} is already in the compressed form R_{t_i^c x t_i} ~ Q_{i-1} Y_{i-1;1}. Thus, to get the V_i^T which gives a row basis for R_{t_i^c x t_i}, we just compress Y_{i-1;1} with an RRQR factorization

   Y_{i-1;1} ~ T_i V_i^T,    (26)

where T_i is not stored. Then set

   [ R_{^t_{j_1} x ^t_i} ; R_{^t_{j_2} x ^t_i} ; ... ; R_{^t_{j_s} x ^t_i} ] = [ R_{^t_{j_1} x t_i} ; R_{^t_{j_2} x t_i} ; ... ; R_{^t_{j_s} x t_i} ] V_i.    (27)

Equations (26) and (27) can be used to replace (19). This generally reduces the compression cost, since the difference between the costs of (19) and (26)-(27) is

   [ 4m(sr)r - 2r^2(sr + m) + (4/3) r^3 ] - [ 2srmr + 4mr*r - 2r^2(r + m) + (4/3) r^3 ] = 2r^2 ( (m - r)s + r - 2m ).

Here, m > r in general, and s can be as large as O(log n).

2.4. Triangular ULV HSS factorization

Next, we study the ULV factorization of triangular HSS matrices. Notice that the ULV algorithm in Section 2.2 cannot be applied in a straightforward way, due to the nonsymmetry. We consider a block lower triangular HSS matrix L. An upper triangular HSS factorization procedure can be derived similarly.

A triangular HSS solution algorithm is given in [23, 24]. That algorithm traverses the HSS tree with depth-first sweeps, so that newly computed solution entries can be used to update the right-hand side, or so that information is propagated to all related nodes of the HSS tree. It implicitly computes a triangular HSS matrix-vector multiplication by recursion, and is thus not suitable for parallelization. Here, we provide a new triangular ULV HSS solver which is scalable. Again, we convert the diagonal blocks into identity matrices so as to avoid multiplications of square dense blocks. The main steps are illustrated in Figure 4 and explained briefly as follows.

If node i is a leaf, multiply D_i^{-1} on the left of block row i, so that the diagonal block becomes an identity matrix, and update U_i as D_i^{-1} U_i. Compute a full QL factorization of D_i^{-1} U_i,

   D_i^{-1} U_i = Q_i [ 0 ; U~_i ],    (28)

where the zero block has m - r rows and U~_i is r x r. Then let Q_i^T V_i = [ V-_i ; V~_i ], where the partition follows that of (28). Clearly, the leading (m - r) x (m - r) subblock of the diagonal block becomes an identity matrix, which is the only nonzero block in those rows and can be eliminated.

Figure 4. A triangular ULV HSS factorization scheme: (i) factorization of the diagonal blocks; (ii) introduction of zeros into the off-diagonal blocks; (iii) restoring the identity diagonal blocks; (iv) merging the remaining blocks and repeating.

If node i is a non-leaf node with its children c_1 and c_2 being leaves, assume the previous operations have been applied to both c_1 and c_2. That is,

   diag( Q_{c_1}^T D_{c_1}^{-1}, Q_{c_2}^T D_{c_2}^{-1} ) [ D_{c_1}, 0 ; U_{c_2} B_{c_2} V_{c_1}^T, D_{c_2} ] diag( Q_{c_1}, Q_{c_2} ) = [ I, 0 ; diag( 0, U~_{c_2} B_{c_2} V~_{c_1}^T ), I ].    (29)

Then we merge the appropriate blocks on the right-hand side of (29) and remove c_1 and c_2 from the HSS tree, so that i becomes a leaf with D and U generators

   D_i = [ I, 0 ; U~_{c_2} B_{c_2} V~_{c_1}^T, I ],   U_i = [ U~_{c_1} R_{c_1} ; U~_{c_2} R_{c_2} ].    (30)

L is thus reduced to a smaller matrix, and the same operations then apply recursively.

Note that, due to the special form of D_i in (30), the intermediate update step D_i^{-1} U_i can be done quickly as

   D_i^{-1} U_i = [ U~_{c_1} R_{c_1} ; U~_{c_2} ( R_{c_2} - B_{c_2} V~_{c_1}^T U~_{c_1} R_{c_1} ) ].    (31)

The products in (29)-(31) can be carefully organized so that only five multiplications of r x r matrices are needed.

3. INNER-OUTER HSS PRECONDITIONER

3.1. Improved ULV HSS solution

Due to the improved ULV-type HSS factorization in Section 2.2, the solution procedure with the ULV factor is much simpler than the one in [35]. For example, for L^ in (10), we consider the solution of the following system by forward substitution:

   L^ y = b.

In the process, each node i of the HSS tree is associated with b_i, a piece of b, and y_i, a piece of y. These pieces are defined recursively. First, b is partitioned into pieces b_i according to the leaf-level sizes of the L_i. For each leaf i, let

   b~_i = Q_i^T L_i^{-1} b_i.    (32)

Partition b~_i as b~_i = [ b~_{i,1} ; b~_{i,2} ], with m - r and r rows, respectively, and set y~_i = b~_{i,1}. Assume similar operations have also been conducted for j = sib(i). Then for p = par(i), set b_p = [ b~_{i,2} ; b~_{j,2} ] if i < j, or b_p = [ b~_{j,2} ; b~_{i,2} ] otherwise. Similar operations can then be applied to the upper-level nodes. These steps proceed until the root node 2k - 1 is reached, where

   y~_{2k-1} = L_{2k-1}^{-1} b_{2k-1}.    (33)

After all these y~_i pieces are computed, we traverse the HSS tree top-down. Let y_{2k-1} = y~_{2k-1}. For a non-leaf node i with children c_1, c_2, partition y~_i as y~_i = [ y~_{i;1} ; y~_{i;2} ], each piece with r rows, and compute

   y_{c_1} = L_{c_1}^{-T} Q_{c_1} [ y~_{c_1} ; y~_{i;1} ],   y_{c_2} = L_{c_2}^{-T} Q_{c_2} [ y~_{c_2} ; y~_{i;2} ].    (34)

When all the nodes are visited, the pieces y_i associated with all the leaves can be assembled into y. See Figure 6 below.
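The following self-contained NumPy sketch carries out the one-level case (two leaves and a root) end to end: the leaf transformations of Section 2.2, the reduced root block (7), and a solve of A x = b through the transformed coordinates, checked against a dense solve. It is written directly from the underlying factorization identity rather than as the paper's recursive multi-level implementation, and the sizes and helper names are assumptions of the example.

```python
import numpy as np
from scipy.linalg import cholesky, solve_triangular, qr

def full_ql(T):
    """Full QL factorization T = Q @ [[0],[E]] of a tall matrix (E lower triangular),
    obtained from a full QR factorization of the row/column-reversed matrix."""
    Qr, Rr = qr(T[::-1, ::-1], mode='full')
    return Qr[::-1, ::-1], Rr[::-1, ::-1]

rng = np.random.default_rng(4)
m, r = 8, 2
# One-level symmetric HSS matrix (two leaves and a root), as in (1)-(3).
X1, X2 = rng.standard_normal((m, m)), rng.standard_normal((m, m))
D1, D2 = X1 @ X1.T + m * np.eye(m), X2 @ X2.T + m * np.eye(m)
U1, U2 = rng.standard_normal((m, r)), rng.standard_normal((m, r))
B1 = rng.standard_normal((r, r))
A = np.block([[D1, U1 @ B1 @ U2.T], [U2 @ B1.T @ U1.T, D2]])
b = rng.standard_normal(2 * m)

# Leaf steps of Section 2.2: D_i = L_i L_i^T and L_i^{-1} U_i = Q_i [0; U~_i] as in (5).
L1, L2 = cholesky(D1, lower=True), cholesky(D2, lower=True)
Q1, E1 = full_ql(solve_triangular(L1, U1, lower=True))
Q2, E2 = full_ql(solve_triangular(L2, U2, lower=True))
Ut1, Ut2 = E1[m - r:], E2[m - r:]

# Reduced root block (7) and its Cholesky factor.
Droot = np.block([[np.eye(r), Ut1 @ B1 @ Ut2.T], [Ut2 @ B1.T @ Ut1.T, np.eye(r)]])
Lroot = cholesky(Droot, lower=True)

# Solve A x = b: bottom-up transformation as in (32), root solve, top-down pass as in (34).
bt1 = Q1.T @ solve_triangular(L1, b[:m], lower=True)
bt2 = Q2.T @ solve_triangular(L2, b[m:], lower=True)
z = solve_triangular(Lroot.T,
                     solve_triangular(Lroot, np.concatenate([bt1[m - r:], bt2[m - r:]]),
                                      lower=True), lower=False)
x1 = solve_triangular(L1.T, Q1 @ np.concatenate([bt1[:m - r], z[:r]]), lower=False)
x2 = solve_triangular(L2.T, Q2 @ np.concatenate([bt2[:m - r], z[r:]]), lower=False)
x = np.concatenate([x1, x2])
print("error vs. dense solve:", np.linalg.norm(x - np.linalg.solve(A, b)))
```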

3.2. Triangular ULV HSS solution

We then consider the ULV solution of a lower triangular HSS system L x = b. Initially, partition b conformably following the sizes of the leaf-level diagonal blocks D_i in L. For each leaf i of the HSS tree, compute

   b~_i = Q_i^T D_i^{-1} b_i.

Similarly, the structure of D_i in (30) can be used to accelerate the computation of D_i^{-1} b_i. Partition b~_i as b~_i = [ b~_{i,1} ; b~_{i,2} ], with m - r and r rows, respectively, and set x~_i = b~_{i,1}. Then, similar to the previous subsection, we form b_p by stacking b~_{i,2} and b~_{j,2} for j = sib(i). The process then proceeds until the root node 2k - 1 is reached, where x~_{2k-1} = D_{2k-1}^{-1} b_{2k-1}.

After all the solution pieces x~_i are computed, we traverse the HSS tree top-down. Let x_{2k-1} = x~_{2k-1}. For a non-leaf node i with children c_1, c_2, partition x~_i as x~_i = [ x~_{i;1} ; x~_{i;2} ], each piece with r rows, and compute

   x_{c_1} = Q_{c_1} [ x~_{c_1} ; x~_{i;1} ],   x_{c_2} = Q_{c_2} [ x~_{c_2} ; x~_{i;2} ].    (35)

When all the nodes are visited, the pieces x_i associated with all the leaves can be assembled into x.

3.3. Inner-outer HSS solution

With all the new HSS algorithms above, we are ready to present our inner-outer HSS algorithms. In the inner-outer factorization, A is approximated by an HSS form with the HSS tree T, where each diagonal block D_i is approximately factorized as D_i ~ L_i L_i^T, and L_i is an inner lower triangular HSS form with the HSS tree T_0. This is illustrated in Figure 5. Then ULV factorizations are computed for both the inner and the outer HSS forms, using Sections 2.3 and 2.2, respectively. In the solution stage, we follow an outer solution procedure given by the ULV solution in Section 3.1. In the meantime, for any solution step involving an L_i associated with a leaf i, such as (32), we use the triangular HSS solution in Section 3.2. The solution algorithm is summarized in Table III.

Figure 5. A pictorial illustration of the inner-outer HSS process, where each leaf D_i block of the outer HSS tree in Figure 1 is approximately factorized into an HSS Cholesky factor as the inner HSS matrix.

In practical implementations of the algorithm as a preconditioner, additional techniques can be utilized. One is that we can traverse the HSS trees levelwise, so that the algorithms can be implemented in parallel. For example, when all the y_i pieces are obtained as in (34), they can first be assembled together at each level. Assume the solution pieces provided at each level l form a vector y-_l. Then the solution y is formed by collecting all the y-_l with appropriate permutations (Figure 6). That is, we define

   y^(l) = P_l [ y-_l ; y^(l-1) ],   l = 1, 2, ..., l_max,    (36)

with y^(0) = y-_0 = y_{2k-1}. Then y = y^(l_max).

Another technique is that we can use diagonal shifts in the outer factorization step to enhance robustness. This is not needed for the inner factorizations.

Algorithm 1. Inner-outer HSS solution.

  subroutine io_hsssol(A, b)
    Input:  A in factorized form, where the outer ULV factor is given by the Q_i and L_i,
            and each leaf L_i is given by its inner ULV factor; b: right-hand side
    Output: solution y
    Partition b into pieces b_i for the leaves i at the bottom level l_max
    for each leaf i, compute y~_i = Q_i^T [ tri_ulvsol(L_i, b_i) ]
    for l = l_max - 1, l_max - 2, ..., 1 do
      for all nodes i at level l do
        for each child c of i, compute b~_c = Q_c^T L_c^{-1} b_c
        partition each b~_c as [ y~_{c,1} ; y~_{c,2} ] with m - r and r rows;
        let b_i = [ y~_{c_1,2} ; y~_{c_2,2} ] and keep y~_{c_1} = y~_{c_1,1}, y~_{c_2} = y~_{c_2,1}
      end for
    end for
    Compute y~_{2k-1} = L_{2k-1}^{-1} b_{2k-1}
    for l = 0, 1, ..., l_max - 1 do
      for all nodes i at level l do
        partition y~_i as y~_i = [ y~_{i;1} ; y~_{i;2} ], each piece with r rows
        for j = 1, 2:
          if c_j is a non-leaf node, compute y_{c_j} = L_{c_j}^{-T} Q_{c_j} [ y~_{c_j} ; y~_{i;j} ]
          if c_j is a leaf,          compute y_{c_j} = tri_ulvsol( L_{c_j}^T, Q_{c_j} [ y~_{c_j} ; y~_{i;j} ] )
      end for
    end for
  end subroutine

Table III. An inner-outer HSS solution algorithm (Algorithm 1), where tri_ulvsol represents the solution scheme in Section 3.2.

Since D_i has the form in (7) for a non-leaf node i in T, we can add the following shift to D_i:

   s_i = epsilon I.    (37)

Notice that this diagonal shift is only needed for the non-leaf nodes in T. In parallel computing, the number of non-leaf nodes may be set to be twice the number of processors, and the effect on the accuracy is limited.
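A minimal sketch of the shift (37) in action: if an (approximate) diagonal block fails to be positive definite, the Cholesky factorization is simply retried with D + epsilon*I. The 2 x 2 block and the value of epsilon are artificial choices for the example; in practice the shift size is chosen relative to the scaling of D_i.

```python
import numpy as np

def shifted_cholesky(D, eps):
    """Cholesky with the diagonal shift (37): retry with D + eps*I when the
    (approximate) block fails to be positive definite."""
    try:
        return np.linalg.cholesky(D), 0          # no shift needed
    except np.linalg.LinAlgError:
        return np.linalg.cholesky(D + eps * np.eye(D.shape[0])), 1

# A slightly indefinite block, as may arise after aggressive off-diagonal truncation.
D = np.array([[1.0, 1.001], [1.001, 1.0]])
L, n_shift = shifted_cholesky(D, eps=1e-2)
print("shifts used:", n_shift, " residual:", np.linalg.norm(L @ L.T - D))
```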

Figure 6. How the solution pieces generated in Algorithm 1 are assembled into the solution y: the pieces y_i associated with the nodes at each level l of the HSS tree are assembled into y-_l, and then into y^(l) as in (36).

4. ANALYSIS

We collect all the new HSS algorithms in this work as follows:

Direct-HSS: the new HSS algorithms based on direct HSS construction and ULV operations:
  Direct-HSSconstr: the HSS construction procedure in Section 2.1.
  Direct-HSSfact: the ULV HSS factorization procedure in Section 2.2.
  Direct-HSSsol: the ULV HSS solution procedure in Section 3.1.

Tri-HSS: the new HSS algorithms based on triangular HSS operations:
  Tri-HSSconstr: the HSS Cholesky factorization procedure in Section 2.3.
  Tri-HSSsol: the triangular ULV HSS factorization procedure in Section 2.4 together with the solution in Section 3.2.

I/O-HSS: the new inner-outer HSS algorithms:
  I/O-HSSconstr: the procedure with outer Direct-HSSconstr and inner Tri-HSSconstr.
  I/O-HSSfact: the procedure with outer Direct-HSSfact.
  I/O-HSSsol: the procedure with outer Direct-HSSsol and inner Tri-HSSsol.

For convenience, we use the following notation in the analysis and the numerical experiments:

For the outer HSS algorithms, n, r, m, and T are the matrix size, HSS rank, leaf-level diagonal block size, and outer HSS tree, respectively. The number of leaves in T is k.
For the inner HSS algorithms, we similarly use the notation n_0, r_0, m_0, T_0, and k_0. Here, n_0 = m.
xi^1, xi^2, and xi^3 are the flop counts of the above three types of algorithms. For example, xi^2_constr is the cost of Tri-HSSconstr.

4.1. Complexity count: the new HSS Cholesky factorization as an example

As an example, we count the flops of Tri-HSSconstr; the other algorithms can then be studied easily. For simplicity, assume the inner HSS tree T_0 is a perfect binary tree and m_0 = O(r_0). Since there are k_0 leaves, k_0 m_0 = n_0. We also assume the root of T_0 is at level 0, so that there are in total about log_2 k_0 levels. Our complexity analysis uses the flop counts of some basic matrix operations in Table IV. Also, two useful formulas are needed:

   sum_{i at level l} n_i = sum_{j=1}^{2^l} ( k_0 - j k_0 / 2^l ) m_0 = (1/2) k_0 m_0 ( 2^l - 1 ),   sum_{i at level l} s_i = l 2^{l-1},    (38)

Table IV. Flop counts (leading terms only) of some basic matrix operations, which can be found in, say, [10, 13], or can be derived from those. The RRQR factorization in our work is based on the modified Gram-Schmidt process with column pivoting [13].
  Cholesky factorization of an m_0 x m_0 matrix:                      (1/3) m_0^3
  Product of an m_0 x q matrix and a q x r_0 matrix:                  2 m_0 q r_0
  RRQR factorization of an m_0 x q matrix with rank r_0:              4 m_0 q r_0 - 2 r_0^2 ( m_0 + q ) + (4/3) r_0^3
  Full QR factorization of an m_0 x r_0 tall matrix (m_0 > r_0):      2 r_0^2 ( m_0 - r_0 / 3 )
  Product of the Q factor and an m_0 x q matrix:                      2 r_0 q ( 2 m_0 - r_0 )
  Solution of an order-m_0 triangular system L x = b:                 m_0^2

In (38), n_i is the column size of R_{t_i x t_i^r}, and s_i is the cardinality of V_i in (16). The first formula holds because there are 2^l nodes at each level l, with the n_i values ( k_0 - j k_0 / 2^l ) m_0, j = 1, 2, ..., 2^l. The second formula is derived in [33].

(a) Elimination cost. For each leaf i, the elimination step (17) costs (1/3) m_0^3 + m_0^2 n_i flops. Thus, the total elimination cost for all the leaves is

   sum_{i: leaf} ( (1/3) m_0^3 + m_0^2 n_i ) = (1/3) k_0 m_0^3 + (1/2) k_0 ( k_0 - 1 ) m_0^3 ~ (1/2) k_0^2 m_0^3.

(b) Compression cost. For each leaf i, the RRQR step (18) is applied to an m_0 x n_i block to get U_i. For each non-leaf node i at level l > 0, the RRQR step (20) is applied to a 2 r_0 x n_i block to get R_{c_1}, R_{c_2}. Using Table IV and (38), the total cost of all the HSS block row compression is therefore

   sum_{i: leaf} [ 4 m_0 n_i r_0 - 2 r_0^2 ( m_0 + n_i ) + (4/3) r_0^3 ] + sum_{l=1}^{log_2 k_0 - 1} sum_{i at level l} [ 4 (2 r_0) n_i r_0 - 2 r_0^2 ( 2 r_0 + n_i ) + (4/3) r_0^3 ] ~ 2 r_0 k_0^2 m_0^2 + 2 r_0^2 k_0^2 m_0.

For each leaf i, to compute V_i with (26)-(27), we need the compression of a size r_0 x m_0 block and the multiplication of an m_0 x r_0 matrix and an r_0 x s_i r_0 matrix. For each non-leaf node i at level l > 0, the RRQR step (21) is applied to a 2 r_0 x s_i r_0 block to get the W generators. The total cost of all the HSS block column compression is then

   sum_{i: leaf} [ 4 m_0 r_0^2 - 2 r_0^2 ( m_0 + r_0 ) + (4/3) r_0^3 + 2 s_i r_0^2 m_0 ] + sum_{l=1}^{log_2 k_0 - 1} sum_{i at level l} [ 8 s_i r_0^3 - 2 r_0^2 ( 2 r_0 + s_i r_0 ) + (4/3) r_0^3 ] ~ 4 k_0 r_0^2 m_0 + O( k_0 r_0^2 m_0 log_2 k_0 ),

which is of lower order than the block row compression cost.

(c) Schur complement computation cost. Similarly, we can count the cost of (23) and (25) for each leaf i; the total is about ( m_0 r_0 + 3 r_0^2 ) k_0^2 m_0.

Therefore, the total factorization cost is roughly

   xi^2_constr ~ ( (1/2) m_0 + 3 r_0 + 5 r_0^2 / m_0 ) n_0^2.    (39)

This formula can be used to minimize xi^2_constr.
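As a quick check of the minimization that leads to (40) below, the following short SymPy computation treats the coefficient of n_0^2 in (39) as a function of the leaf size m_0. It assumes the cost model exactly as written in (39); it is only a verification of that algebra.

```python
import sympy as sp

m0, r0 = sp.symbols('m0 r0', positive=True)
# Coefficient of n_0^2 in the leading-order cost (39).
cost = sp.Rational(1, 2) * m0 + 3 * r0 + 5 * r0**2 / m0

m0_opt = sp.solve(sp.diff(cost, m0), m0)[0]
print("optimal leaf size m0      =", m0_opt)                                  # sqrt(10)*r0
print("optimal cost / (r0 n0^2)  =", sp.simplify(cost.subs(m0, m0_opt) / r0)) # 3 + sqrt(10)
print("cost at m0 = 2 r0         =", sp.simplify(cost.subs(m0, 2 * r0) / r0)) # 13/2
```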

It can easily be shown that the optimal cost is

   xi^2_constr ~ ( 3 + sqrt(10) ) r_0 n_0^2 ~ 6.2 r_0 n_0^2,    (40)

which is achieved when m_0 = sqrt(10) r_0. If we choose m_0 = 2 r_0, as is often done, xi^2_constr ~ (13/2) r_0 n_0^2, which is slightly larger than the optimal cost. On the other hand, the original version in [36] with m_0 = 2 r_0 costs at least (11/2) r~_0 n_0^2 flops [33], where the corresponding rank r~_0 may be as large as 2 r_0. Similarly, the costs of the new triangular ULV HSS factorization and solution are

   xi^2_fact ~ r_0 ( m_0 + 2 r_0 ) n_0 + O( r_0^3 n_0 / m_0 )   and   xi^2_sol ~ ( m_0 + 4 r_0 + 8 r_0^2 / m_0 ) n_0,

respectively.

4.2. Complexity of the inner-outer HSS algorithm and other algorithms

The complexity of the other algorithms involved can be counted similarly; see Table V.

Table V. Flop counts (leading terms only) of the HSS algorithms proposed in this work, in terms of the outer parameters n, r, m and the inner parameters n_0, r_0, m_0.
  Direct-HSS:  construction  O( r n^2 )  (about 3 r n^2 under the assumptions of [33]);
               ULV factorization  ( (1/3) m^2 + m r + 2 r^2 ) n + O( r^3 n / m );
               ULV solution  ( m + 4 r + 8 r^2 / m ) n.
  Tri-HSS:     factorization  ( (1/2) m_0 + 3 r_0 + 5 r_0^2 / m_0 ) n_0^2;
               ULV factorization  r_0 ( m_0 + 2 r_0 ) n_0 + O( r_0^3 n_0 / m_0 );
               ULV solution  ( m_0 + 4 r_0 + 8 r_0^2 / m_0 ) n_0.
  I/O-HSS:     construction  the outer Direct-HSS construction plus ( (1/2) m_0 + 3 r_0 + 5 r_0^2 / m_0 ) m n for the inner factorizations of the k diagonal blocks;
               factorization  the outer Direct-HSS ULV factorization with the m x m diagonal-block factorizations replaced by inner Tri-HSS ULV factorizations;
               solution  the outer Direct-HSS ULV solution with the leaf-level triangular solves replaced by inner Tri-HSS ULV solutions.

In particular, with certain choices of m and m_0, we obtain the sample counts in Table VI. It can then be verified that the new HSS algorithms proposed in this work are generally much faster than similar existing ones; see, e.g., Section 2.2.

Table VI. Flop counts (leading terms only) as in Table V, with r = r_0, m_0 = 2 r_0, and m = n/4. (These assumptions may not be satisfied in the numerical experiments in the next section.)
  Direct-HSS:  construction 3 r n^2;   ULV factorization (35/3) r^2 n;   ULV solution 10 r n.
  Tri-HSS:     factorization (13/2) r_0 n_0^2;   ULV factorization O( r_0^2 n_0 );   ULV solution 10 r_0 n_0.
  I/O-HSS:     construction (23/8) r n^2;   factorization (73/3) r^2 n;   solution 14 r n.

4.3. Approximation errors

The off-diagonal compression introduces errors into the approximations, which can be roughly measured by the following result.

Proposition 4.1. For a given SPD matrix A, let tau be the relative tolerance in the off-diagonal compression by truncated SVD. Then after the outer HSS construction for A, the HSS approximation A~ satisfies

   || A - A~ ||_2 / || A ||_2 <= O( tau log n ).

Sketch of the proof. The proof uses the fact that the U, V generators have orthonormal columns. The HSS tree has O(log n) levels, and the approximation error accumulates additively as the compression moves bottom-up along the tree.

This proposition can be applied to HSS constructions with a given relative tolerance tau for the off-diagonal singular values. For an off-diagonal block B, the singular values sigma_i that satisfy sigma_i < sigma_1 tau are dropped, where sigma_1 is the largest singular value of B. The number of remaining singular values is the numerical rank of B. The proposition gives a bound for the error introduced in the approximation of A.

The error analysis for the HSS Cholesky factorization is less straightforward. As in [18], the compression of R_{t_i x t_i^r} = D_i^{-1} A^(i-1)_{t_i x t_i^r} in (17) with an absolute error tau_0 introduces an error of e = O( || D_i ||_2 tau_0 ) = O( || A ||_2^{1/2} tau_0 ) into A^(i-1)_{t_i x t_i^r}. If a relative tolerance is used, this becomes more complicated. With hierarchical compression, the error introduced into the original off-diagonal block A_{t_i x t_i^r} can be as large as O( e log n ). Moreover, the compression of R_{t_i x t_i^r} adds an error of e~ = O( tau_0^2 ) to the Schur complement S_i, which continues to be factorized approximately, and e~ may be magnified by O( n ). Such an analysis may be too pessimistic, and a detailed study is not the main focus of this work. In practice, the error is often well controlled. Since our objective is preconditioning, a lower accuracy is allowed in the inner HSS Cholesky factorization. The feasibility is verified by the numerical experiments.
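The relative-tolerance truncation referred to above can be illustrated with a few lines of NumPy: singular values below tau * sigma_1 of an off-diagonal block are dropped, and the resulting numerical rank and error vary with tau. The kernel block chosen here is an arbitrary smooth example, not one from the paper.

```python
import numpy as np

def truncate_svd(B, tau):
    """Drop singular values sigma_i < tau * sigma_1 of an off-diagonal block B."""
    U, s, Vt = np.linalg.svd(B, full_matrices=False)
    k = int(np.sum(s >= tau * s[0]))          # numerical rank for relative tolerance tau
    return U[:, :k] @ np.diag(s[:k]) @ Vt[:k], k

x, y = np.linspace(0, 1, 60), np.linspace(2, 3, 40)
B = 1.0 / (x[:, None] - y[None, :])           # a smooth (numerically low-rank) block
for tau in (1e-2, 1e-6, 1e-10):
    Bk, k = truncate_svd(B, tau)
    print(f"tau = {tau:.0e}:  rank = {k:2d},  relative error = "
          f"{np.linalg.norm(B - Bk) / np.linalg.norm(B):.1e}")
```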

5. NUMERICAL EXPERIMENTS

The inner-outer HSS algorithms and the others are implemented in Matlab and are tested on various examples. For convenience, we list additional notation as follows, beyond that summarized at the beginning of Section 4:

  kappa_2(A):  2-norm condition number of A
  e:           relative residual || A_1 x - b ||_2 / || b ||_2
  epsilon:     size of the shift in (37)
  N_iter:      number of preconditioned conjugate gradient (PCG) iterations
  N_shift:     number of times shifts (37) are used to avoid breakdown in I/O-HSS
  xi_iter:     total PCG cost (flops)

Example 1. Let

   A_0 = ( x_i x_j )_{n x n},    (41)

where the points x_i = cos( (2i + 1) pi / (2n) ) are the zeros of the nth Chebyshev polynomial, as used in [7, 9].

We demonstrate the performance of the algorithms for a linear system with an SPD coefficient matrix:

   A_1 x = b,   with   A_1 = A_0 A_0^T + 2 I.    (42)

The 2I diagonal term ensures that A_1 remains positive definite in our numerical tests in the presence of numerical errors on the computer. This has nearly no effect on the off-diagonal numerical ranks in our factorizations, and does not change the nature of the tests. A_1 has size n ranging from 500 to 160,000, and b is generated from a random exact x. Several aspects are tested.

First, both the Tri-HSS and Direct-HSS algorithms proposed here are faster than the original versions in [7, 36]. For example, for n = 4000, m_0 = 50, and compression accuracy tau_0 = 1E-6, our Tri-HSSconstr and Tri-HSSsol directly applied to A_1 cost about 8.00E8 and 6.09E5 flops, respectively, while the original versions in [36] cost 1.24E9 and 6.11E5 flops, respectively.

In the second test, we use the I/O-HSS algorithms to directly solve the system (42). Different inner and outer off-diagonal compression accuracies are used, namely tau_0 and tau, respectively. No diagonal shift is involved here, i.e., epsilon = 0 in (37). The performance is demonstrated in Table VII. It can be seen that, for a reasonable accuracy, I/O-HSSconstr has about O(n^2) complexity, and I/O-HSSfact and I/O-HSSsol have nearly O(n) complexity.

Table VII. Example 1: Flop counts xi^3 and relative residuals e of the inner-outer HSS algorithms for solving (42), where m = 1000, m_0 = 50, epsilon = 0, tau = 1E-8, tau_0 = 1E-6 (the matrix size n roughly doubles from column to column).
  xi^3_constr:  4.84E6   2.07E7   8.56E7   3.47E8   1.42E9   5.49E9
  xi^3_fact:    3.30E5   6.88E5   1.40E6   2.79E6   5.77E6   1.06E7
  xi^3_sol:     7.91E4   1.70E5   3.51E5   7.07E5   1.43E6   2.78E6
  e:            1.68E-8 and of a similar order in the remaining columns

The I/O-HSS algorithms are also compared with the Tri-HSS and Direct-HSS algorithms when the latter are directly applied to A_1 to reach about the same accuracy. The same set of parameters m_0, tau_0 is used by Tri-HSS and Direct-HSS. Note that the ULV factorization in Tri-HSS is intended for parallel implementation and may be slower than the sequential triangular solver in [24]; thus, here we use the one in [24], which does not have a factorization stage. From Figure 7, we observe that the xi_sol costs of all three types of algorithms are comparable to each other, while I/O-HSS can take into consideration both the scalability and the robustness.

In the third test, we examine the effectiveness of I/O-HSS as a preconditioner in PCG for A_1 and also for

   A_2 = A_0 A_0^T A_0 A_0^T + 2 I.

The matrices A_1 and A_2 have 2-norm condition numbers kappa_2 = 5.4E6 and kappa_2 = 6.2E13, respectively. In I/O-HSSconstr, we only keep very few columns in the off-diagonal compression; that is, we manually set the numerical rank r to be a small number and set r_0 = r. Thus, it is very efficient to obtain and apply the preconditioner. The results are shown in Table VIII, and the details of the iteration processes are shown in Figure 8. We can see that PCG with I/O-HSS converges much faster and costs far less than with block diagonal preconditioning.

Lastly, we test the performance of the inner-outer preconditioning when lower accuracies are used in the inner Tri-HSS factorization. For A_1 in Table VIII, we use the outer HSS rank bound r = 7, and the inner HSS rank bound r_0 varies from 3 to 7. See Figure 9 for the performance. We see that fast convergence is achieved for all these r_0 values. In fact, the convergence behaviors for r_0 = 4, 5, 6, 7 are almost the same.
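To show the shape of this experiment, here is a small self-contained Python sketch that builds A_1 as in (42) from the Chebyshev points and runs PCG with the block diagonal preconditioner used as the baseline above. The kernel defining A_0 in (41) does not come through cleanly in this transcription, so the sketch simply takes a smooth symmetric kernel as a stand-in; the sizes, tolerance, and all names are assumptions of the example, and no HSS preconditioner is constructed here.

```python
import numpy as np

def chebyshev_points(n):
    i = np.arange(n)
    return np.cos((2 * i + 1) * np.pi / (2 * n))     # zeros of the n-th Chebyshev polynomial

def pcg(A, b, apply_prec, tol=1e-10, maxiter=500):
    """Textbook preconditioned conjugate gradient iteration."""
    x, r = np.zeros_like(b), b.copy()
    z = apply_prec(r)
    p, rz = z.copy(), r @ z
    for it in range(1, maxiter + 1):
        Ap = A @ p
        alpha = rz / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        if np.linalg.norm(r) <= tol * np.linalg.norm(b):
            return x, it
        z = apply_prec(r)
        rz_new = r @ z
        p = z + (rz_new / rz) * p
        rz = rz_new
    return x, maxiter

n, m = 1000, 250                                     # matrix size and diagonal block size
x_pts = chebyshev_points(n)
kernel = lambda s, t: np.sqrt(np.abs(s - t))         # stand-in for the kernel in (41)
A0 = kernel(x_pts[:, None], x_pts[None, :])
A1 = A0 @ A0.T + 2.0 * np.eye(n)                     # (42)
rng = np.random.default_rng(7)
b = A1 @ rng.standard_normal(n)

# Block diagonal (block Jacobi) preconditioner: the baseline compared against in Table VIII.
chol_blocks = [np.linalg.cholesky(A1[k:k + m, k:k + m]) for k in range(0, n, m)]
def block_diag_prec(v):
    out = np.empty_like(v)
    for L, k in zip(chol_blocks, range(0, n, m)):
        out[k:k + m] = np.linalg.solve(L.T, np.linalg.solve(L, v[k:k + m]))
    return out

x_sol, iters = pcg(A1, b, block_diag_prec)
print("PCG iterations:", iters, " relative residual:",
      np.linalg.norm(b - A1 @ x_sol) / np.linalg.norm(b))
```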

Figure 7. Example 1: Flop counts of I/O-HSS, Tri-HSS, and Direct-HSS for solving (42), where m = 1000, m_0 = 50, tau = 1E-8, tau_0 = 1E-6, and epsilon = 0 in (37). (Panels: (i) xi_constr, (ii) xi_fact, (iii) xi_sol; flops versus the matrix size n.)

Table VIII. Example 1: Performance of PCG with block diagonal preconditioning and with I/O-HSS preconditioning for A_1 and A_2 of size n = 4000, where m = 1000, m_0 = 50, r = 7, and epsilon = 0 in (37).
  Matrix A_1 (kappa_2 = 5.4E6):   block diagonal: e = ...E-15;   I/O-HSS: xi_iter = ...E8, e = 6.63E-15.
  Matrix A_2 (kappa_2 = 6.2E13):  block diagonal: e = ...E-9;    I/O-HSS: xi_iter = ...E8, e = 2.48E-15.

Figure 8. Example 1: Convergence of PCG with block diagonal preconditioning and with I/O-HSS preconditioning for A_1 and A_2 of size n = 4000, where m = 1000, m_0 = 50, r = 7, and epsilon = 0. (Panels: (i) A_1, (ii) A_2; relative residual e versus the number of iterations N_iter.)

Example 2. We then consider a Toeplitz matrix A generated by a Gaussian radial basis function,

   v( t_i ) = e^{ -delta^2 t_i^2 },

where t_i = i - 1.
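A quick NumPy/SciPy sketch of this family: the first column of the Toeplitz matrix is the decaying Gaussian kernel sampled at t_i = i - 1, and the measured condition number can be compared with the estimate e^{pi^2/(4 delta^2)} quoted below. The sign convention in the exponent and the sample values of delta and n are assumptions of the example.

```python
import numpy as np
from scipy.linalg import toeplitz

n = 400
t = np.arange(n, dtype=float)
for delta in (0.9, 0.7, 0.5):
    A = toeplitz(np.exp(-delta**2 * t**2))    # A_{ij} = exp(-delta^2 (i - j)^2)
    print(f"delta = {delta}:  cond(A) = {np.linalg.cond(A):.2e},  "
          f"exp(pi^2/(4 delta^2)) = {np.exp(np.pi**2 / (4 * delta**2)):.2e}")
```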

Figure 9. Example 1: Convergence of PCG with I/O-HSS preconditioning for A_1 in Table VIII with different inner rank bounds r_0 (relative residual e versus the number of iterations N_iter, for r_0 = 3, 4, 5, 6, 7).

The Toeplitz structure is not our focus and is ignored. The matrix is highly ill conditioned for small delta; in fact, its condition number is about e^{pi^2 / (4 delta^2)} [6]. We consider the matrix with size n = ....

If the regular Direct-HSS methods are directly applied with a low accuracy tau, they actually break down, since some intermediate diagonal blocks corresponding to non-leaf nodes of T fail to be positive definite. We may then use shifts as in (37). The number of nodes with such breakdown or shifting is denoted N_shift. However, I/O-HSS can significantly enhance the robustness: the larger m is, the more robust the method is. See Table IX. If no inner factorization is used, then too many shifting steps are needed, and the preconditioner fails to be useful in practice (Table X). PCG with I/O-HSS preconditioning also costs much less than with block diagonal preconditioning. The detailed convergence is shown in Figure 10.

Table IX. Example 2: Costs xi^3 of the I/O-HSS algorithms and the number N_shift of shifts, compared with Direct-HSS, where m_0 = 50, r = 10, and epsilon = 1E-4 in (37) (the I/O-HSS columns correspond to increasing outer leaf sizes m).
                 Direct-HSS   I/O-HSS (m increasing)
  xi^3_constr:   6.11E9       5.42E9  5.49E9  5.64E9  5.79E9  5.95E9  6.11E9  6.33E9  6.48E9
  xi^3_fact:     2.98E7       1.88E7  1.95E7  1.98E7  1.99E7  2.01E7  2.01E7  2.02E7  2.02E7
  xi^3_sol:      2.62E6       1.38E6  1.33E6  1.31E6  1.30E6  1.30E6  1.29E6  1.29E6  1.29E6
  N_shift:       ...

Table X. Example 2: Convergence of PCG with block diagonal preconditioning and with I/O-HSS preconditioning, where m_0 = 50, r = 10, and epsilon = 1E-4 in (37).
                        N_iter   xi_iter    e
  Block diagonal        ...      ...        ...E-14
  Direct-HSS            Failure
  I/O-HSS (r = 10)      ...      ...        ...E-16


More information

6.4 Krylov Subspaces and Conjugate Gradients

6.4 Krylov Subspaces and Conjugate Gradients 6.4 Krylov Subspaces and Conjugate Gradients Our original equation is Ax = b. The preconditioned equation is P Ax = P b. When we write P, we never intend that an inverse will be explicitly computed. P

More information

Numerical Methods in Matrix Computations

Numerical Methods in Matrix Computations Ake Bjorck Numerical Methods in Matrix Computations Springer Contents 1 Direct Methods for Linear Systems 1 1.1 Elements of Matrix Theory 1 1.1.1 Matrix Algebra 2 1.1.2 Vector Spaces 6 1.1.3 Submatrices

More information

Arnoldi Methods in SLEPc

Arnoldi Methods in SLEPc Scalable Library for Eigenvalue Problem Computations SLEPc Technical Report STR-4 Available at http://slepc.upv.es Arnoldi Methods in SLEPc V. Hernández J. E. Román A. Tomás V. Vidal Last update: October,

More information

Parallel Numerical Algorithms

Parallel Numerical Algorithms Parallel Numerical Algorithms Chapter 6 Matrix Models Section 6.2 Low Rank Approximation Edgar Solomonik Department of Computer Science University of Illinois at Urbana-Champaign CS 554 / CSE 512 Edgar

More information

Sparse factorization using low rank submatrices. Cleve Ashcraft LSTC 2010 MUMPS User Group Meeting April 15-16, 2010 Toulouse, FRANCE

Sparse factorization using low rank submatrices. Cleve Ashcraft LSTC 2010 MUMPS User Group Meeting April 15-16, 2010 Toulouse, FRANCE Sparse factorization using low rank submatrices Cleve Ashcraft LSTC cleve@lstc.com 21 MUMPS User Group Meeting April 15-16, 21 Toulouse, FRANCE ftp.lstc.com:outgoing/cleve/mumps1 Ashcraft.pdf 1 LSTC Livermore

More information

Conjugate Gradient Method

Conjugate Gradient Method Conjugate Gradient Method direct and indirect methods positive definite linear systems Krylov sequence spectral analysis of Krylov sequence preconditioning Prof. S. Boyd, EE364b, Stanford University Three

More information

Multilevel low-rank approximation preconditioners Yousef Saad Department of Computer Science and Engineering University of Minnesota

Multilevel low-rank approximation preconditioners Yousef Saad Department of Computer Science and Engineering University of Minnesota Multilevel low-rank approximation preconditioners Yousef Saad Department of Computer Science and Engineering University of Minnesota SIAM CSE Boston - March 1, 2013 First: Joint work with Ruipeng Li Work

More information

LU Factorization. Marco Chiarandini. DM559 Linear and Integer Programming. Department of Mathematics & Computer Science University of Southern Denmark

LU Factorization. Marco Chiarandini. DM559 Linear and Integer Programming. Department of Mathematics & Computer Science University of Southern Denmark DM559 Linear and Integer Programming LU Factorization Marco Chiarandini Department of Mathematics & Computer Science University of Southern Denmark [Based on slides by Lieven Vandenberghe, UCLA] Outline

More information

Numerical Linear Algebra Primer. Ryan Tibshirani Convex Optimization /36-725

Numerical Linear Algebra Primer. Ryan Tibshirani Convex Optimization /36-725 Numerical Linear Algebra Primer Ryan Tibshirani Convex Optimization 10-725/36-725 Last time: proximal gradient descent Consider the problem min g(x) + h(x) with g, h convex, g differentiable, and h simple

More information

Preliminary/Qualifying Exam in Numerical Analysis (Math 502a) Spring 2012

Preliminary/Qualifying Exam in Numerical Analysis (Math 502a) Spring 2012 Instructions Preliminary/Qualifying Exam in Numerical Analysis (Math 502a) Spring 2012 The exam consists of four problems, each having multiple parts. You should attempt to solve all four problems. 1.

More information

Scientific Computing with Case Studies SIAM Press, Lecture Notes for Unit VII Sparse Matrix

Scientific Computing with Case Studies SIAM Press, Lecture Notes for Unit VII Sparse Matrix Scientific Computing with Case Studies SIAM Press, 2009 http://www.cs.umd.edu/users/oleary/sccswebpage Lecture Notes for Unit VII Sparse Matrix Computations Part 1: Direct Methods Dianne P. O Leary c 2008

More information

Contents. Preface... xi. Introduction...

Contents. Preface... xi. Introduction... Contents Preface... xi Introduction... xv Chapter 1. Computer Architectures... 1 1.1. Different types of parallelism... 1 1.1.1. Overlap, concurrency and parallelism... 1 1.1.2. Temporal and spatial parallelism

More information

Review Questions REVIEW QUESTIONS 71

Review Questions REVIEW QUESTIONS 71 REVIEW QUESTIONS 71 MATLAB, is [42]. For a comprehensive treatment of error analysis and perturbation theory for linear systems and many other problems in linear algebra, see [126, 241]. An overview of

More information

This can be accomplished by left matrix multiplication as follows: I

This can be accomplished by left matrix multiplication as follows: I 1 Numerical Linear Algebra 11 The LU Factorization Recall from linear algebra that Gaussian elimination is a method for solving linear systems of the form Ax = b, where A R m n and bran(a) In this method

More information

Numerical Methods I: Numerical linear algebra

Numerical Methods I: Numerical linear algebra 1/3 Numerical Methods I: Numerical linear algebra Georg Stadler Courant Institute, NYU stadler@cimsnyuedu September 1, 017 /3 We study the solution of linear systems of the form Ax = b with A R n n, x,

More information

AMS526: Numerical Analysis I (Numerical Linear Algebra)

AMS526: Numerical Analysis I (Numerical Linear Algebra) AMS526: Numerical Analysis I (Numerical Linear Algebra) Lecture 24: Preconditioning and Multigrid Solver Xiangmin Jiao SUNY Stony Brook Xiangmin Jiao Numerical Analysis I 1 / 5 Preconditioning Motivation:

More information

A Block Compression Algorithm for Computing Preconditioners

A Block Compression Algorithm for Computing Preconditioners A Block Compression Algorithm for Computing Preconditioners J Cerdán, J Marín, and J Mas Abstract To implement efficiently algorithms for the solution of large systems of linear equations in modern computer

More information

Matrices. Chapter What is a Matrix? We review the basic matrix operations. An array of numbers a a 1n A = a m1...

Matrices. Chapter What is a Matrix? We review the basic matrix operations. An array of numbers a a 1n A = a m1... Chapter Matrices We review the basic matrix operations What is a Matrix? An array of numbers a a n A = a m a mn with m rows and n columns is a m n matrix Element a ij in located in position (i, j The elements

More information

9. Numerical linear algebra background

9. Numerical linear algebra background Convex Optimization Boyd & Vandenberghe 9. Numerical linear algebra background matrix structure and algorithm complexity solving linear equations with factored matrices LU, Cholesky, LDL T factorization

More information

Lecture 3: QR-Factorization

Lecture 3: QR-Factorization Lecture 3: QR-Factorization This lecture introduces the Gram Schmidt orthonormalization process and the associated QR-factorization of matrices It also outlines some applications of this factorization

More information

An H-LU Based Direct Finite Element Solver Accelerated by Nested Dissection for Large-scale Modeling of ICs and Packages

An H-LU Based Direct Finite Element Solver Accelerated by Nested Dissection for Large-scale Modeling of ICs and Packages PIERS ONLINE, VOL. 6, NO. 7, 2010 679 An H-LU Based Direct Finite Element Solver Accelerated by Nested Dissection for Large-scale Modeling of ICs and Packages Haixin Liu and Dan Jiao School of Electrical

More information

Scientific Computing

Scientific Computing Scientific Computing Direct solution methods Martin van Gijzen Delft University of Technology October 3, 2018 1 Program October 3 Matrix norms LU decomposition Basic algorithm Cost Stability Pivoting Pivoting

More information

1. Introduction. In this paper, we consider the eigenvalue solution for a non- Hermitian matrix A:

1. Introduction. In this paper, we consider the eigenvalue solution for a non- Hermitian matrix A: A FAST CONTOUR-INTEGRAL EIGENSOLVER FOR NON-HERMITIAN MATRICES XIN YE, JIANLIN XIA, RAYMOND H. CHAN, STEPHEN CAULEY, AND VENKATARAMANAN BALAKRISHNAN Abstract. We present a fast contour-integral eigensolver

More information

Lecture 9: Numerical Linear Algebra Primer (February 11st)

Lecture 9: Numerical Linear Algebra Primer (February 11st) 10-725/36-725: Convex Optimization Spring 2015 Lecture 9: Numerical Linear Algebra Primer (February 11st) Lecturer: Ryan Tibshirani Scribes: Avinash Siravuru, Guofan Wu, Maosheng Liu Note: LaTeX template

More information

Preface to the Second Edition. Preface to the First Edition

Preface to the Second Edition. Preface to the First Edition n page v Preface to the Second Edition Preface to the First Edition xiii xvii 1 Background in Linear Algebra 1 1.1 Matrices................................. 1 1.2 Square Matrices and Eigenvalues....................

More information

Bindel, Spring 2016 Numerical Analysis (CS 4220) Notes for

Bindel, Spring 2016 Numerical Analysis (CS 4220) Notes for Cholesky Notes for 2016-02-17 2016-02-19 So far, we have focused on the LU factorization for general nonsymmetric matrices. There is an alternate factorization for the case where A is symmetric positive

More information

Homework 2 Foundations of Computational Math 2 Spring 2019

Homework 2 Foundations of Computational Math 2 Spring 2019 Homework 2 Foundations of Computational Math 2 Spring 2019 Problem 2.1 (2.1.a) Suppose (v 1,λ 1 )and(v 2,λ 2 ) are eigenpairs for a matrix A C n n. Show that if λ 1 λ 2 then v 1 and v 2 are linearly independent.

More information

Linear Least squares

Linear Least squares Linear Least squares Method of least squares Measurement errors are inevitable in observational and experimental sciences Errors can be smoothed out by averaging over more measurements than necessary to

More information

Numerical Linear Algebra Primer. Ryan Tibshirani Convex Optimization

Numerical Linear Algebra Primer. Ryan Tibshirani Convex Optimization Numerical Linear Algebra Primer Ryan Tibshirani Convex Optimization 10-725 Consider Last time: proximal Newton method min x g(x) + h(x) where g, h convex, g twice differentiable, and h simple. Proximal

More information

Multilevel Preconditioning of Graph-Laplacians: Polynomial Approximation of the Pivot Blocks Inverses

Multilevel Preconditioning of Graph-Laplacians: Polynomial Approximation of the Pivot Blocks Inverses Multilevel Preconditioning of Graph-Laplacians: Polynomial Approximation of the Pivot Blocks Inverses P. Boyanova 1, I. Georgiev 34, S. Margenov, L. Zikatanov 5 1 Uppsala University, Box 337, 751 05 Uppsala,

More information

ITERATIVE METHODS FOR SPARSE LINEAR SYSTEMS

ITERATIVE METHODS FOR SPARSE LINEAR SYSTEMS ITERATIVE METHODS FOR SPARSE LINEAR SYSTEMS YOUSEF SAAD University of Minnesota PWS PUBLISHING COMPANY I(T)P An International Thomson Publishing Company BOSTON ALBANY BONN CINCINNATI DETROIT LONDON MADRID

More information

Direct and Incomplete Cholesky Factorizations with Static Supernodes

Direct and Incomplete Cholesky Factorizations with Static Supernodes Direct and Incomplete Cholesky Factorizations with Static Supernodes AMSC 661 Term Project Report Yuancheng Luo 2010-05-14 Introduction Incomplete factorizations of sparse symmetric positive definite (SSPD)

More information

Vector and Matrix Norms. Vector and Matrix Norms

Vector and Matrix Norms. Vector and Matrix Norms Vector and Matrix Norms Vector Space Algebra Matrix Algebra: We let x x and A A, where, if x is an element of an abstract vector space n, and A = A: n m, then x is a complex column vector of length n whose

More information

The antitriangular factorisation of saddle point matrices

The antitriangular factorisation of saddle point matrices The antitriangular factorisation of saddle point matrices J. Pestana and A. J. Wathen August 29, 2013 Abstract Mastronardi and Van Dooren [this journal, 34 (2013) pp. 173 196] recently introduced the block

More information

Linear Solvers. Andrew Hazel

Linear Solvers. Andrew Hazel Linear Solvers Andrew Hazel Introduction Thus far we have talked about the formulation and discretisation of physical problems...... and stopped when we got to a discrete linear system of equations. Introduction

More information

A Backward Stable Hyperbolic QR Factorization Method for Solving Indefinite Least Squares Problem

A Backward Stable Hyperbolic QR Factorization Method for Solving Indefinite Least Squares Problem A Backward Stable Hyperbolic QR Factorization Method for Solving Indefinite Least Suares Problem Hongguo Xu Dedicated to Professor Erxiong Jiang on the occasion of his 7th birthday. Abstract We present

More information

Linear Algebra: A Constructive Approach

Linear Algebra: A Constructive Approach Chapter 2 Linear Algebra: A Constructive Approach In Section 14 we sketched a geometric interpretation of the simplex method In this chapter, we describe the basis of an algebraic interpretation that allows

More information

INTERCONNECTED HIERARCHICAL RANK-STRUCTURED METHODS FOR DIRECTLY SOLVING AND PRECONDITIONING THE HELMHOLTZ EQUATION

INTERCONNECTED HIERARCHICAL RANK-STRUCTURED METHODS FOR DIRECTLY SOLVING AND PRECONDITIONING THE HELMHOLTZ EQUATION MATHEMATCS OF COMPUTATON Volume 00, Number 0, Pages 000 000 S 0025-5718XX0000-0 NTERCONNECTED HERARCHCAL RANK-STRUCTURED METHODS FOR DRECTLY SOLVNG AND PRECONDTONNG THE HELMHOLTZ EQUATON XAO LU, JANLN

More information

ACM106a - Homework 2 Solutions

ACM106a - Homework 2 Solutions ACM06a - Homework 2 Solutions prepared by Svitlana Vyetrenko October 7, 2006. Chapter 2, problem 2.2 (solution adapted from Golub, Van Loan, pp.52-54): For the proof we will use the fact that if A C m

More information

Conjugate gradient method. Descent method. Conjugate search direction. Conjugate Gradient Algorithm (294)

Conjugate gradient method. Descent method. Conjugate search direction. Conjugate Gradient Algorithm (294) Conjugate gradient method Descent method Hestenes, Stiefel 1952 For A N N SPD In exact arithmetic, solves in N steps In real arithmetic No guaranteed stopping Often converges in many fewer than N steps

More information

Topics. The CG Algorithm Algorithmic Options CG s Two Main Convergence Theorems

Topics. The CG Algorithm Algorithmic Options CG s Two Main Convergence Theorems Topics The CG Algorithm Algorithmic Options CG s Two Main Convergence Theorems What about non-spd systems? Methods requiring small history Methods requiring large history Summary of solvers 1 / 52 Conjugate

More information

Sparsity-Preserving Difference of Positive Semidefinite Matrix Representation of Indefinite Matrices

Sparsity-Preserving Difference of Positive Semidefinite Matrix Representation of Indefinite Matrices Sparsity-Preserving Difference of Positive Semidefinite Matrix Representation of Indefinite Matrices Jaehyun Park June 1 2016 Abstract We consider the problem of writing an arbitrary symmetric matrix as

More information

AMS526: Numerical Analysis I (Numerical Linear Algebra)

AMS526: Numerical Analysis I (Numerical Linear Algebra) AMS526: Numerical Analysis I (Numerical Linear Algebra) Lecture 3: Positive-Definite Systems; Cholesky Factorization Xiangmin Jiao Stony Brook University Xiangmin Jiao Numerical Analysis I 1 / 11 Symmetric

More information

BLAS: Basic Linear Algebra Subroutines Analysis of the Matrix-Vector-Product Analysis of Matrix-Matrix Product

BLAS: Basic Linear Algebra Subroutines Analysis of the Matrix-Vector-Product Analysis of Matrix-Matrix Product Level-1 BLAS: SAXPY BLAS-Notation: S single precision (D for double, C for complex) A α scalar X vector P plus operation Y vector SAXPY: y = αx + y Vectorization of SAXPY (αx + y) by pipelining: page 8

More information

Incomplete Factorization Preconditioners for Least Squares and Linear and Quadratic Programming

Incomplete Factorization Preconditioners for Least Squares and Linear and Quadratic Programming Incomplete Factorization Preconditioners for Least Squares and Linear and Quadratic Programming by Jelena Sirovljevic B.Sc., The University of British Columbia, 2005 A THESIS SUBMITTED IN PARTIAL FULFILMENT

More information

LU Factorization with Panel Rank Revealing Pivoting and Its Communication Avoiding Version

LU Factorization with Panel Rank Revealing Pivoting and Its Communication Avoiding Version LU Factorization with Panel Rank Revealing Pivoting and Its Communication Avoiding Version Amal Khabou James Demmel Laura Grigori Ming Gu Electrical Engineering and Computer Sciences University of California

More information

Preconditioning Techniques for Large Linear Systems Part III: General-Purpose Algebraic Preconditioners

Preconditioning Techniques for Large Linear Systems Part III: General-Purpose Algebraic Preconditioners Preconditioning Techniques for Large Linear Systems Part III: General-Purpose Algebraic Preconditioners Michele Benzi Department of Mathematics and Computer Science Emory University Atlanta, Georgia, USA

More information

12. Cholesky factorization

12. Cholesky factorization L. Vandenberghe ECE133A (Winter 2018) 12. Cholesky factorization positive definite matrices examples Cholesky factorization complex positive definite matrices kernel methods 12-1 Definitions a symmetric

More information

AMS Mathematics Subject Classification : 65F10,65F50. Key words and phrases: ILUS factorization, preconditioning, Schur complement, 1.

AMS Mathematics Subject Classification : 65F10,65F50. Key words and phrases: ILUS factorization, preconditioning, Schur complement, 1. J. Appl. Math. & Computing Vol. 15(2004), No. 1, pp. 299-312 BILUS: A BLOCK VERSION OF ILUS FACTORIZATION DAVOD KHOJASTEH SALKUYEH AND FAEZEH TOUTOUNIAN Abstract. ILUS factorization has many desirable

More information

Scientific Computing WS 2018/2019. Lecture 9. Jürgen Fuhrmann Lecture 9 Slide 1

Scientific Computing WS 2018/2019. Lecture 9. Jürgen Fuhrmann Lecture 9 Slide 1 Scientific Computing WS 2018/2019 Lecture 9 Jürgen Fuhrmann juergen.fuhrmann@wias-berlin.de Lecture 9 Slide 1 Lecture 9 Slide 2 Simple iteration with preconditioning Idea: Aû = b iterative scheme û = û

More information

Orthogonal Transformations

Orthogonal Transformations Orthogonal Transformations Tom Lyche University of Oslo Norway Orthogonal Transformations p. 1/3 Applications of Qx with Q T Q = I 1. solving least squares problems (today) 2. solving linear equations

More information

Block-tridiagonal matrices

Block-tridiagonal matrices Block-tridiagonal matrices. p.1/31 Block-tridiagonal matrices - where do these arise? - as a result of a particular mesh-point ordering - as a part of a factorization procedure, for example when we compute

More information

Fast Hessenberg QR Iteration for Companion Matrices

Fast Hessenberg QR Iteration for Companion Matrices Fast Hessenberg QR Iteration for Companion Matrices David Bindel Ming Gu David Garmire James Demmel Shivkumar Chandrasekaran Fast Hessenberg QR Iteration for Companion Matrices p.1/20 Motivation: Polynomial

More information

Numerical Methods. Elena loli Piccolomini. Civil Engeneering. piccolom. Metodi Numerici M p. 1/??

Numerical Methods. Elena loli Piccolomini. Civil Engeneering.  piccolom. Metodi Numerici M p. 1/?? Metodi Numerici M p. 1/?? Numerical Methods Elena loli Piccolomini Civil Engeneering http://www.dm.unibo.it/ piccolom elena.loli@unibo.it Metodi Numerici M p. 2/?? Least Squares Data Fitting Measurement

More information

Dense LU factorization and its error analysis

Dense LU factorization and its error analysis Dense LU factorization and its error analysis Laura Grigori INRIA and LJLL, UPMC February 2016 Plan Basis of floating point arithmetic and stability analysis Notation, results, proofs taken from [N.J.Higham,

More information

Bindel, Fall 2016 Matrix Computations (CS 6210) Notes for

Bindel, Fall 2016 Matrix Computations (CS 6210) Notes for 1 Logistics Notes for 2016-08-26 1. Our enrollment is at 50, and there are still a few students who want to get in. We only have 50 seats in the room, and I cannot increase the cap further. So if you are

More information

AM 205: lecture 8. Last time: Cholesky factorization, QR factorization Today: how to compute the QR factorization, the Singular Value Decomposition

AM 205: lecture 8. Last time: Cholesky factorization, QR factorization Today: how to compute the QR factorization, the Singular Value Decomposition AM 205: lecture 8 Last time: Cholesky factorization, QR factorization Today: how to compute the QR factorization, the Singular Value Decomposition QR Factorization A matrix A R m n, m n, can be factorized

More information

Boundary Value Problems - Solving 3-D Finite-Difference problems Jacob White

Boundary Value Problems - Solving 3-D Finite-Difference problems Jacob White Introduction to Simulation - Lecture 2 Boundary Value Problems - Solving 3-D Finite-Difference problems Jacob White Thanks to Deepak Ramaswamy, Michal Rewienski, and Karen Veroy Outline Reminder about

More information

LU Factorization. LU factorization is the most common way of solving linear systems! Ax = b LUx = b

LU Factorization. LU factorization is the most common way of solving linear systems! Ax = b LUx = b AM 205: lecture 7 Last time: LU factorization Today s lecture: Cholesky factorization, timing, QR factorization Reminder: assignment 1 due at 5 PM on Friday September 22 LU Factorization LU factorization

More information

A Chebyshev-based two-stage iterative method as an alternative to the direct solution of linear systems

A Chebyshev-based two-stage iterative method as an alternative to the direct solution of linear systems A Chebyshev-based two-stage iterative method as an alternative to the direct solution of linear systems Mario Arioli m.arioli@rl.ac.uk CCLRC-Rutherford Appleton Laboratory with Daniel Ruiz (E.N.S.E.E.I.H.T)

More information

ETNA Kent State University

ETNA Kent State University C 8 Electronic Transactions on Numerical Analysis. Volume 17, pp. 76-2, 2004. Copyright 2004,. ISSN 1068-613. etnamcs.kent.edu STRONG RANK REVEALING CHOLESKY FACTORIZATION M. GU AND L. MIRANIAN Abstract.

More information

Index. book 2009/5/27 page 121. (Page numbers set in bold type indicate the definition of an entry.)

Index. book 2009/5/27 page 121. (Page numbers set in bold type indicate the definition of an entry.) page 121 Index (Page numbers set in bold type indicate the definition of an entry.) A absolute error...26 componentwise...31 in subtraction...27 normwise...31 angle in least squares problem...98,99 approximation

More information

Linear algebra issues in Interior Point methods for bound-constrained least-squares problems

Linear algebra issues in Interior Point methods for bound-constrained least-squares problems Linear algebra issues in Interior Point methods for bound-constrained least-squares problems Stefania Bellavia Dipartimento di Energetica S. Stecco Università degli Studi di Firenze Joint work with Jacek

More information

Notes on Eigenvalues, Singular Values and QR

Notes on Eigenvalues, Singular Values and QR Notes on Eigenvalues, Singular Values and QR Michael Overton, Numerical Computing, Spring 2017 March 30, 2017 1 Eigenvalues Everyone who has studied linear algebra knows the definition: given a square

More information

Parallel Singular Value Decomposition. Jiaxing Tan

Parallel Singular Value Decomposition. Jiaxing Tan Parallel Singular Value Decomposition Jiaxing Tan Outline What is SVD? How to calculate SVD? How to parallelize SVD? Future Work What is SVD? Matrix Decomposition Eigen Decomposition A (non-zero) vector

More information

Enhancing Scalability of Sparse Direct Methods

Enhancing Scalability of Sparse Direct Methods Journal of Physics: Conference Series 78 (007) 0 doi:0.088/7-6596/78//0 Enhancing Scalability of Sparse Direct Methods X.S. Li, J. Demmel, L. Grigori, M. Gu, J. Xia 5, S. Jardin 6, C. Sovinec 7, L.-Q.

More information

On the Preconditioning of the Block Tridiagonal Linear System of Equations

On the Preconditioning of the Block Tridiagonal Linear System of Equations On the Preconditioning of the Block Tridiagonal Linear System of Equations Davod Khojasteh Salkuyeh Department of Mathematics, University of Mohaghegh Ardabili, PO Box 179, Ardabil, Iran E-mail: khojaste@umaacir

More information

A fast randomized algorithm for approximating an SVD of a matrix

A fast randomized algorithm for approximating an SVD of a matrix A fast randomized algorithm for approximating an SVD of a matrix Joint work with Franco Woolfe, Edo Liberty, and Vladimir Rokhlin Mark Tygert Program in Applied Mathematics Yale University Place July 17,

More information

Linear Algebra, part 3 QR and SVD

Linear Algebra, part 3 QR and SVD Linear Algebra, part 3 QR and SVD Anna-Karin Tornberg Mathematical Models, Analysis and Simulation Fall semester, 2012 Going back to least squares (Section 1.4 from Strang, now also see section 5.2). We

More information

Notes on PCG for Sparse Linear Systems

Notes on PCG for Sparse Linear Systems Notes on PCG for Sparse Linear Systems Luca Bergamaschi Department of Civil Environmental and Architectural Engineering University of Padova e-mail luca.bergamaschi@unipd.it webpage www.dmsa.unipd.it/

More information

LU factorization with Panel Rank Revealing Pivoting and its Communication Avoiding version

LU factorization with Panel Rank Revealing Pivoting and its Communication Avoiding version 1 LU factorization with Panel Rank Revealing Pivoting and its Communication Avoiding version Amal Khabou Advisor: Laura Grigori Université Paris Sud 11, INRIA Saclay France SIAMPP12 February 17, 2012 2

More information

Applied Mathematics 205. Unit II: Numerical Linear Algebra. Lecturer: Dr. David Knezevic

Applied Mathematics 205. Unit II: Numerical Linear Algebra. Lecturer: Dr. David Knezevic Applied Mathematics 205 Unit II: Numerical Linear Algebra Lecturer: Dr. David Knezevic Unit II: Numerical Linear Algebra Chapter II.3: QR Factorization, SVD 2 / 66 QR Factorization 3 / 66 QR Factorization

More information