Sparse Direct Solvers


1 Sparse Direct Solvers — Alfredo Buttari (with slides from Prof. P. Amestoy, Dr. J.-Y. L'Excellent and Dr. B. Uçar), alfredo.buttari@enseeiht.fr

2 Sparse Matrix Factorizations
$A \in \mathbb{R}^{m \times m}$, symmetric positive definite $\rightarrow LL^T = A$, solve $Ax = b$
$A \in \mathbb{R}^{m \times m}$, symmetric $\rightarrow LDL^T = A$, solve $Ax = b$
$A \in \mathbb{R}^{m \times m}$, unsymmetric $\rightarrow LU = A$, solve $Ax = b$
$A \in \mathbb{R}^{m \times n}$, $m \neq n$ $\rightarrow QR = A$: $\min_x \|Ax - b\|$ if $m > n$; $\min \|x\|$ such that $Ax = b$ if $n > m$

3 Factorization of sparse matrices: problems
The factorization of a sparse matrix is problematic due to the presence of fill-in. The basic LU step:
$a^{(k+1)}_{i,j} = a^{(k)}_{i,j} - a^{(k)}_{i,k}\, a^{(k)}_{k,j} / a^{(k)}_{k,k}$
Even if $a^{(k)}_{i,j}$ is null, $a^{(k+1)}_{i,j}$ can be a nonzero (a fill-in).
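
To make fill-in concrete, here is a small NumPy sketch (an illustration, not from the slides) using the classic arrowhead example: eliminating the dense variable first fills the factor completely, while eliminating it last produces no fill-in at all.

```python
import numpy as np

# Arrowhead matrix: nonzeros only in the first row/column and on the
# diagonal; strictly diagonally dominant, hence SPD.
n = 5
A = 5.0 * np.eye(n)
A[0, :] = 1.0
A[:, 0] = 1.0
A[0, 0] = 5.0

L = np.linalg.cholesky(A)
print(np.count_nonzero(np.abs(L) > 1e-12))   # 15: fully dense factor

# Eliminating the "arrow" variable last avoids all fill-in:
p = np.arange(n)[::-1]
Lp = np.linalg.cholesky(A[np.ix_(p, p)])
print(np.count_nonzero(np.abs(Lp) > 1e-12))  # 9: diagonal + last row only
```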

4 Factorization of sparse matrices: problems
Which kind of problems does fill-in pose?
- the factorization is more expensive
- a higher amount of memory is required
- more complicated algorithms are needed to achieve the factorization
The amount of fill-in must be
- predicted, using elimination trees
- reduced (possibly), using ordering algorithms
These steps, moreover, must complete much faster than the actual factorization. The basic tools to achieve all this are GRAPHS.

5 Graph theory definitions and notions

6 Graph notations and definitions
A graph $G = (V, E)$ consists of a finite set $V$, called the vertex set, and a finite, binary relation $E$ on $V$, called the edge set. Three standard graph models:
- Undirected graph: the edges are unordered pairs of vertices, i.e., $\{u, v\} \in E$ for some $u, v \in V$.
- Directed graph: the edges are ordered pairs of vertices, that is, $(u, v)$ and $(v, u)$ are two different edges.
- Bipartite graph: $G = (U \cup V, E)$ consists of two disjoint vertex sets $U$ and $V$ such that for each edge $(u, v) \in E$, $u \in U$ and $v \in V$.
An ordering or labelling of $G = (V, E)$ having $n$ vertices, i.e., $|V| = n$, is a mapping of $V$ onto $1, 2, \ldots, n$.

7 Matrices and graphs: Rectangular matrices
The rows/columns and nonzeros of a given sparse matrix correspond (with natural labelling) to the vertices and edges, respectively, of a graph. For rectangular matrices the model is a bipartite graph: the nodes corresponding to the rows of the matrix are grouped into a vertex set $R$ and the columns into the other vertex set $C$, such that for each $a_{ij} \neq 0$, $(r_i, c_j)$ is an edge.

8 Matrices and graphs: Square unsymmetric pattern
The rows/columns and nonzeros of a given sparse matrix correspond (with natural labelling) to the vertices and edges, respectively, of a graph. For square matrices with unsymmetric pattern two models are possible: a bipartite graph as before, or a directed graph in which the set of rows/cols corresponds to the vertex set $V$ such that for each $a_{ij} \neq 0$, $(v_i, v_j)$ is an edge. A transposed view is possible too, i.e., the edge $(v_i, v_j)$ directed from column $i$ to row $j$. Usually self-loops are omitted.

9 Matrices and graphs: Symmetric pattern
The rows/columns and nonzeros of a given sparse matrix correspond (with natural labelling) to the vertices and edges, respectively, of a graph. For square matrices with symmetric pattern, bipartite and directed graphs can be used as before; the natural model is an undirected graph in which the set of rows/cols corresponds to the vertex set $V$ such that for each $a_{ij}, a_{ji} \neq 0$, $\{v_i, v_j\}$ is an edge. No self-loops.
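
As an illustration of the undirected graph model, a minimal sketch (assuming SciPy) that builds the adjacency lists of a symmetric-pattern matrix, omitting self-loops:

```python
import numpy as np
from scipy.sparse import csr_matrix

# Pattern of a small symmetric matrix (the diagonal is ignored, since
# the undirected graph model has no self-loops).
A = csr_matrix(np.array([[4., 1., 0., 1.],
                         [1., 4., 1., 0.],
                         [0., 1., 4., 1.],
                         [1., 0., 1., 4.]]))

def adjacency(A):
    """Undirected graph of a symmetric-pattern matrix: adj[i] = neighbors of i."""
    n = A.shape[0]
    adj = {i: set() for i in range(n)}
    rows, cols = A.nonzero()
    for i, j in zip(rows, cols):
        if i != j:                      # drop self-loops
            adj[i].add(j)
            adj[j].add(i)
    return adj

print(adjacency(A))   # {0: {1, 3}, 1: {0, 2}, 2: {1, 3}, 3: {0, 2}}
```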

10 Definitions: Edges, degrees, and paths
Many definitions for directed and undirected graphs are the same. We will use $(u, v)$ to refer to an edge of an undirected or directed graph to avoid repeated definitions.
An edge $(u, v)$ is said to be incident on the vertices $u$ and $v$. For any vertex $u$, the vertices in $adj(u) = \{v : (u, v) \in E\}$ are called the neighbors of $u$. The vertices in $adj(u)$ are said to be adjacent to $u$. The degree of a vertex is the number of edges incident on it.
A path $p$ of length $k$ is a sequence of vertices $\langle v_0, v_1, \ldots, v_k \rangle$ where $(v_{i-1}, v_i) \in E$ for $i = 1, \ldots, k$. The two end points $v_0$ and $v_k$ are said to be connected by the path $p$, and the vertex $v_k$ is said to be reachable from $v_0$.

11 Definitions: Components
An undirected graph is said to be connected if every pair of vertices is connected by a path. The connected components of an undirected graph are the equivalence classes of vertices under the "is reachable from" relation.
A directed graph is said to be strongly connected if every pair of vertices is reachable from each other. The strongly connected components of a directed graph are the equivalence classes of vertices under the "are mutually reachable" relation.

12 Definitions: Trees and spanning trees
A tree is a connected, acyclic, undirected graph. If an undirected graph is acyclic but disconnected, then it is a forest. Properties of trees: any two vertices are connected by a unique path, and $|E| = |V| - 1$.
A rooted tree is a tree with a distinguished vertex $r$, called the root. There is a unique path from the root $r$ to every other vertex $v$. Any vertex $y$ in that path is called an ancestor of $v$. If $y$ is an ancestor of $v$, then $v$ is a descendant of $y$. The subtree rooted at $v$ is the tree induced by the descendants of $v$, rooted at $v$.
A spanning tree of a connected graph $G = (V, E)$ is a tree $T = (V, F)$ such that $F \subseteq E$.

13 Ordering of the vertices of a rooted tree
A topological ordering of a rooted tree is an ordering that numbers children vertices before their parent. A preorder of a rooted tree is an ordering that numbers children vertices after their parent. A postorder is a topological ordering which numbers the vertices in any subtree consecutively.
[Figure: a connected graph G on vertices u, v, w, x, y, z; a rooted spanning tree with a topological ordering; the same tree with a postordering.]
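
A short sketch of a postorder computation on a rooted tree (a hypothetical helper; the children-list input format is an assumption):

```python
def postorder(children, root):
    """Postorder of a rooted tree given children lists: every subtree
    is numbered consecutively, parents after all their children."""
    order, stack = [], [(root, iter(children.get(root, ())))]
    while stack:
        node, it = stack[-1]
        child = next(it, None)
        if child is None:
            order.append(node)      # all children done -> emit node
            stack.pop()
        else:
            stack.append((child, iter(children.get(child, ()))))
    return order

# Tree rooted at 'z': z -> {y, v}, y -> {x}, x -> {u, w}
children = {'z': ['y', 'v'], 'y': ['x'], 'x': ['u', 'w']}
print(postorder(children, 'z'))   # ['u', 'w', 'x', 'y', 'v', 'z']
```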

14 Permutation matrices
A permutation matrix is a square $(0,1)$-matrix where each row and each column has a single 1. If $P$ is a permutation matrix, $PP^T = I$, i.e., it is an orthogonal matrix.
Let $A$ be a $3 \times 3$ matrix and suppose we want to permute its columns as $[2, 1, 3]$. Define $p_{2,1} = 1$, $p_{1,2} = 1$, $p_{3,3} = 1$, and $B = AP$ (if column $j$ is to be at position $i$, set $p_{ji} = 1$).
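
A small NumPy sketch of this rule (the matrix entries are made up for illustration; the slide's original example matrix did not survive transcription):

```python
import numpy as np

A = np.array([[1., 2., 3.],
              [4., 5., 6.],
              [7., 8., 9.]])

# Permute columns as [2, 1, 3] (1-based): column j goes to position i
# by setting P[j-1, i-1] = 1.
perm = [2, 1, 3]
P = np.zeros((3, 3))
for i, j in enumerate(perm):
    P[j - 1, i] = 1.0

B = A @ P
print(B)                                  # columns of A reordered as 2, 1, 3
print(np.allclose(P @ P.T, np.eye(3)))    # True: P is orthogonal
```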

15 Definitions: Reducibility
Reducible matrix: an $n \times n$ square matrix is reducible if there exists an $n \times n$ permutation matrix $P$ such that
$PAP^T = \begin{pmatrix} A_{11} & A_{12} \\ O & A_{22} \end{pmatrix}$,
where $A_{11}$ is an $r \times r$ submatrix and $A_{22}$ is an $(n-r) \times (n-r)$ submatrix, with $1 \le r < n$.
Irreducible matrix: there is no such permutation matrix.
Theorem: an $n \times n$ square matrix is irreducible iff its directed graph is strongly connected. Proof: follows by definition.
Why is reducibility important?

16 Definitions: Cliques and independent sets
Clique: in an undirected graph $G = (V, E)$, a set of vertices $S \subseteq V$ is a clique if for all $s, t \in S$ we have $(s, t) \in E$. In a symmetric matrix $A$, a clique corresponds to a subset of rows $R$ and the corresponding columns such that the matrix $A(R, R)$ is full.

17 Depth First Search of a graph
It is a searching algorithm that advances as deep as possible by exploring the children of a node before its siblings. It produces a spanning forest of the graph, and the order in which the nodes are visited corresponds to its preorder.

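A minimal iterative DFS sketch that returns both the preorder and the spanning forest (as a parent map); the adjacency-dict input format is an assumption:

```python
def dfs_forest(adj):
    """DFS over all vertices: returns the preorder and the spanning
    forest as a parent map (roots map to None)."""
    preorder, parent, seen = [], {}, set()
    for start in adj:
        if start in seen:
            continue
        stack = [(start, None)]
        while stack:
            u, p = stack.pop()
            if u in seen:
                continue
            seen.add(u)
            parent[u] = p                 # tree edge of the spanning forest
            preorder.append(u)            # visit order = preorder
            for v in reversed(adj[u]):    # keep the natural child order
                if v not in seen:
                    stack.append((v, u))
    return preorder, parent

adj = {0: [1, 2], 1: [0, 3], 2: [0], 3: [1]}
print(dfs_forest(adj))   # ([0, 1, 3, 2], {0: None, 1: 0, 3: 1, 2: 0})
```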

30 Analysis

31 Matrices and graphs
Predicting the structure of the factors helps in reducing the memory requirements, in achieving high performance, and in simplifying the algorithms. We will consider the Cholesky factorization $A = LL^T$: in this case, structural and numerical aspects are neatly separated. For the general case (e.g., LU factorization) pivoting is necessary and depends on the actual numerical values; there are combinatorial tools for these cases, but we will not cover them. Structure prediction algorithms should run, preferably, faster than the numerical computations that will follow.

32 Symmetric matrices and graphs
Assumptions: $A$ symmetric, and pivots are chosen on the diagonal. The structure of $A$ is represented by the graph $G = (V, E)$:
- vertices are associated to columns: $V = \{1, \ldots, n\}$
- edges $E$ are defined by: $(i, j) \in E \leftrightarrow a_{ij} \neq 0$
- $G$ is undirected (symmetry of $A$)

33 Symmetric matrices and graphs
Remarks:
- number of nonzeros in column $j$ = $|adj_G(j)|$
- symmetric permutation ≡ renumbering the graph
[Figure: a symmetric matrix and its corresponding graph.]

34 The elimination model for symmetric matrices
A symmetric, positive definite matrix can be factorized by means of the Cholesky algorithm:
for k = 1,...,n do
  $l_{kk} = \sqrt{a_{kk}^{(k-1)}}$
  for i = k+1,...,n do
    $l_{ik} = a_{ik}^{(k-1)} / l_{kk}$
    for j = k+1,...,i do
      $a_{ij}^{(k)} = a_{ij}^{(k-1)} - l_{ik} l_{jk}$
    end for
  end for
end for
[Example: a $4 \times 4$ symmetric matrix with lower-triangular entries $a_{11}, a_{21}, \ldots, a_{44}$.]
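
A direct NumPy transcription of the (right-looking) algorithm above — a dense sketch for illustration; sparse codes obviously avoid touching the zeros:

```python
import numpy as np

def cholesky_right_looking(A):
    """Right-looking Cholesky sketch: returns L (lower triangular)
    with A = L @ L.T for a symmetric positive definite A."""
    A = A.astype(float).copy()
    n = A.shape[0]
    for k in range(n):
        A[k, k] = np.sqrt(A[k, k])                 # l_kk
        A[k+1:, k] /= A[k, k]                      # scale column k
        for j in range(k + 1, n):                  # rank-1 update of the
            A[j:, j] -= A[j:, k] * A[j, k]         # trailing submatrix
    return np.tril(A)

A = np.array([[4., 2., 2.], [2., 5., 3.], [2., 3., 6.]])
L = cholesky_right_looking(A)
print(np.allclose(L @ L.T, A))   # True
```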

35 The elimination graph model for symmetric matrices
Let $A$ be a symmetric positive definite matrix of order $n$. The $LL^T$ factorization can be described by the equation:
$A = A_0 = \begin{pmatrix} d_1 & v_1^T \\ v_1 & \bar A_1 \end{pmatrix} = \begin{pmatrix} \sqrt{d_1} & 0 \\ v_1/\sqrt{d_1} & I_{n-1} \end{pmatrix} \begin{pmatrix} 1 & 0 \\ 0 & A_1 \end{pmatrix} \begin{pmatrix} \sqrt{d_1} & v_1^T/\sqrt{d_1} \\ 0 & I_{n-1} \end{pmatrix} = L_1 \begin{pmatrix} 1 & 0 \\ 0 & A_1 \end{pmatrix} L_1^T$,
where $A_1 = \bar A_1 - \frac{v_1 v_1^T}{d_1}$.
The basic step is then applied to $A_1$ to obtain $A_2$, and so on:
$A = (L_1 L_2 \cdots L_{n-1})\, I_n\, (L_{n-1}^T \cdots L_2^T L_1^T) = LL^T$

36 The basic step: $A_1 = \bar A_1 - \frac{v_1 v_1^T}{d_1}$
What is $v_1 v_1^T$ in terms of structure? $v_1$ is a column of $A$, hence it contains the neighbors of the corresponding vertex. $v_1 v_1^T$ results in a dense subblock in $A_1$, i.e., the elimination of a node results in the creation of a clique that connects all the neighbors of the eliminated node. If any of the nonzeros in this dense submatrix are not in $A$, then we have fill-ins.

37 The elimination process in the graphs
$G^0 \leftarrow (V, E)$ undirected graph of $A$
for k = 1 : n-1 do
  $V \leftarrow V - \{k\}$  {remove vertex k}
  $E \leftarrow E - \{(k, \ell) : \ell \in adj(k)\} \cup \{(x, y) : x \in adj(k)$ and $y \in adj(k)\}$
  $G^k \leftarrow (V, E)$
end for
The $G^k$ are the so-called elimination graphs (Parter, 61).
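
One elimination step can be sketched in a few lines (hypothetical helper names): eliminating a vertex merges its neighborhood into a clique and reports the fill edges created.

```python
def eliminate(adj, k):
    """One elimination-graph step: remove vertex k and turn its
    neighborhood into a clique. Returns the set of fill edges created."""
    nbrs = adj.pop(k)
    fill = set()
    for x in nbrs:
        adj[x].discard(k)
        for y in nbrs:
            if y != x and y not in adj[x]:
                adj[x].add(y)
                fill.add((min(x, y), max(x, y)))
    return fill

# Graph of a 4x4 "ring" matrix: edges 0-1, 1-2, 2-3, 3-0.
adj = {0: {1, 3}, 1: {0, 2}, 2: {1, 3}, 3: {0, 2}}
print(eliminate(adj, 0))   # {(1, 3)}: eliminating 0 connects 1 and 3
print(adj)                 # {1: {2, 3}, 2: {1, 3}, 3: {1, 2}}
```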

38 A sequence of elimination graphs
[Figure: elimination graphs $G^0, G^1, G^2, G^3$ and the corresponding reduced matrices $A_0, A_1, A_2, A_3$, showing the progressive construction of the factor entries $l_{11}, l_{21}, l_{22}, \ldots, l_{66}$.]

39 Elimination process: Formal definitions
Deficiency of a vertex: $D(v)$ is the set of edges defined by
$D(v) = \{(x, y) : x \in adj(v),\ y \in adj(v),\ y \notin adj(x),\ x \neq y\}$
v-elimination graph: apply the elimination process to the vertex $v$ of $G$ to obtain
$G_v = (V - \{v\},\ E(V - \{v\}) \cup D(v))$.

40 Elimination process: Formal definitions
For a graph $G = (V, E)$, the elimination process
$P(G) = [G = G_0, G_1, G_2, \ldots, G_{n-1}]$
is the sequence of elimination graphs defined by $G_0 = G$, $G_i = (G_{i-1})_i$.
Let $G_i = (V_i, E_i)$ for $i = 0, 1, \ldots, n-1$. The fill-in $F(G)$ is defined by
$F(G) = \bigcup_{i=1}^{n-1} \Delta_i$, where $\Delta_i = D(i)$ in $G_{i-1}$,
and the filled graph is defined by $G^+ = (V, E \cup F(G))$.
For a matrix $A$, $\Delta_i$ corresponds to the new nonzero elements, the fill-ins, created during the $i$-th step of elimination.

41 Elimination process: Formal definitions
Continuing from the previous sample matrix, we have the filled graph $G^+(A)$.
[Figure: $G^+(A) = G(F)$, where $F = L + L^T$.]

42 Elimination process
Fill-path theorem [Rose, Tarjan, Lueker 76]: Let $G = (V, E)$ be an ordered graph. Then $(v, w)$ is an edge of $G^+ = (V, E \cup F(G))$ iff there exists a path $\mu = [v = v_1, v_2, \ldots, v_{k+1} = w]$ in $G$ such that $v_i < \min\{v, w\}$ for $2 \le i \le k$.

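A brute-force sketch of the theorem (illustration only, with an adjacency-dict input): an edge $(v, w)$ is filled iff $v$ and $w$ are joined by a path running through vertices numbered below $\min\{v, w\}$.

```python
def fill_edges(adj, n):
    """Fill-in via the fill-path theorem: (v, w) is a filled edge iff v
    and w are connected through vertices all numbered below min(v, w)."""
    fill = set()
    for v in range(n):
        for w in range(v + 1, n):
            if w in adj[v]:
                continue                      # already an edge of G
            stack, seen, found = [v], {v}, False
            while stack and not found:
                u = stack.pop()
                for x in adj[u]:
                    if x == w:
                        found = True
                        break
                    if x < v and x not in seen:   # intermediate < min(v, w)
                        seen.add(x)
                        stack.append(x)
            if found:
                fill.add((v, w))
    return fill

# Ring 0-1-2-3-0: eliminating in natural order fills (1, 3).
adj = {0: {1, 3}, 1: {0, 2}, 2: {1, 3}, 3: {0, 2}}
print(fill_edges(adj, 4))   # {(1, 3)}
```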

46 Set-up
Reminder:
- A spanning tree of a connected graph $G = (V, E)$ is a tree $T = (V, F)$ such that $F \subseteq E$.
- A topological ordering of a rooted tree is an ordering that numbers children vertices before their parent.
- A postorder is a topological ordering which numbers the vertices in any subtree consecutively.
Let $A$ be an $n \times n$ symmetric positive-definite and irreducible matrix, $A = LL^T$ its Cholesky factorization, and $G^+(A)$ its filled graph (the graph of $F = L + L^T$).

47 A first definition
Since $A$ is irreducible, each of the first $n-1$ columns of $L$ has at least one off-diagonal nonzero (prove?).
For each column $j < n$ of $L$, remove all the nonzeros in column $j$ except the first one below the diagonal. Let $L_t$ denote the remaining structure and consider the matrix $F_t = L_t + L_t^T$. The graph $G(F_t)$ is a tree called the elimination tree.

48 A first definition
The elimination tree of $A$ is a spanning tree of $G^+(A)$ satisfying the relation
$PARENT[j] = \min\{i > j : \ell_{ij} \neq 0\}$.
[Figure: a 10-node example on vertices a(1) through j(10) showing $G(A)$, the filled graph $G^+(A) = G(F)$, and the elimination tree $T(A)$.]
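
A sketch that reads the parents off an explicit factor using the relation above; computing the tree from $L$ is purely illustrative, since real codes derive it symbolically from $A$ in near-linear time:

```python
import numpy as np

def etree_from_L(L):
    """Elimination tree from an explicit Cholesky factor:
    parent[j] = min{ i > j : L[i, j] != 0 }, or -1 for a root."""
    n = L.shape[0]
    parent = np.full(n, -1)
    for j in range(n):
        below = np.nonzero(np.abs(L[j+1:, j]) > 1e-12)[0]
        if below.size:
            parent[j] = j + 1 + below[0]
    return parent

A = np.array([[4., 1., 0., 0.],
              [1., 4., 1., 0.],
              [0., 1., 4., 1.],
              [0., 0., 1., 4.]])
L = np.linalg.cholesky(A)
print(etree_from_L(L))   # [1 2 3 -1]: a chain, since A is tridiagonal
```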

49 A second definition: Represents column dependencies
Dependency between columns of $L$: column $i > j$ depends on column $j$ iff $\ell_{ij} \neq 0$.
- Use a directed graph to express this dependency (edge from $j$ to $i$ if column $i$ depends on column $j$).
- Simplify redundant dependencies (transitive reduction): remove a directed edge $(j, i)$ if there is a path of length greater than one from $j$ to $i$.
The transitive reduction of the directed filled graph gives the elimination tree structure.

50 Directed filled graph and its transitive reduction
[Figure: the directed filled graph on nodes a(1) through j(10), its transitive reduction, and the resulting elimination tree $T(A)$.]

51 A third definition: DFS tree
Theorem: the elimination tree $T(A)$ of a connected graph $G(A)$ is a depth-first search tree of the filled graph $G^+(A)$.
Proof: let $x_1, x_2, \ldots, x_n$ be the node ordering of $G^+(A)$. Consider the depth-first search subject to the following tie-breaking rule: when there is a choice of more than one node to explore next, always pick the one with the largest subscript. With this additional rule the depth-first search will construct $T(A)$.

52 A depth-first search tree of the filled graph
Any DFS on an undirected graph produces only tree and back edges.
[Figure: DFS of the filled graph on nodes a through j, annotated with discovery/finish times.]

53 Path characterization of filled edges
Because there is no cross edge in the DFS tree of an undirected graph, two nodes that belong to two distinct subtrees cannot be connected.
Nonzeros of $L$: if $\ell_{ij} \neq 0$, then node $x_i$ is an ancestor of $x_j$ in $T(A)$.
Some zeros of $L$: let $T[x_i]$ and $T[x_j]$ be two disjoint subtrees of $T(A)$. Then $\ell_{st} = 0$ for any $x_s \in T[x_i]$ and $x_t \in T[x_j]$.

54 Fill-in entries
Fill-path theorem [Rose, Tarjan, Lueker 76]: Let $G = (V, E)$ be an ordered graph. Then $(v, w)$ is an edge of $G^+ = (V, E \cup F(G))$ iff there exists a path $\mu = [v = v_1, v_2, \ldots, v_{k+1} = w]$ in $G$ such that $v_i < \min\{v, w\}$ for $2 \le i \le k$.
Restating using the elimination tree: let $i > j$. Then $\ell_{ij} \neq 0$ iff there exists a path $x_i, x_{p_1}, \ldots, x_{p_k}, x_j$ in the graph of $A$ such that $\{x_{p_1}, \ldots, x_{p_k}\} \subseteq T[x_j]$. Here $T[x_j]$ is the set of nodes in the subtree rooted at $x_j$.

55 Uses of the elimination tree
The elimination tree has several uses in the factorization of a sparse matrix:
- it expresses the order in which variables can be eliminated: because the elimination of a variable only affects the elimination of its ancestors, any topological order of the elimination tree will lead to a correct result and to the same fill-in
- it expresses concurrency: because variables in separate subtrees do not affect each other, they can be eliminated in parallel
- it can be used to characterize the structure of the factors: elimination graphs can obviously be used to determine the structure and size of the factors, but this results in a complexity proportional to the number of nonzeros in the factors. Instead, the elimination tree can be used to compute the row and column counts of the factors with a cost proportional to $O(nnz(A))$ for big enough matrices

56 The analysis phase
The determination of the structure of the factors is commonly referred to as symbolic factorization because it only involves symbolic computations and no numerical operations. In modern software packages, the symbolic factorization is commonly done in a preprocessing phase called the analysis phase. The analysis phase is essential for the actual factorization of a matrix and may include many other (symbolic) operations, as we will see later. Once the analysis phase is complete, the actual matrix factorization can take place.

57 Matrix factorization

58 Cholesky on a dense matrix
left-looking Cholesky
for k = 1,...,n do
  for i = k,...,n do
    for j = 1,...,k-1 do
      $a_{ik}^{(k)} = a_{ik}^{(k)} - l_{ij} l_{kj}$
    end for
  end for
  $l_{kk} = \sqrt{a_{kk}^{(k-1)}}$
  for i = k+1,...,n do
    $l_{ik} = a_{ik}^{(k-1)} / l_{kk}$
  end for
end for

right-looking Cholesky
for k = 1,...,n do
  $l_{kk} = \sqrt{a_{kk}^{(k-1)}}$
  for i = k+1,...,n do
    $l_{ik} = a_{ik}^{(k-1)} / l_{kk}$
    for j = k+1,...,i do
      $a_{ij}^{(k)} = a_{ij}^{(k-1)} - l_{ik} l_{jk}$
    end for
  end for
end for

In the left-looking variant, column k is modified using the previously computed columns before it is factored; in the right-looking variant, column k, once computed, immediately updates the columns to its right.
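
A NumPy sketch of the left-looking variant, for comparison with the right-looking one shown earlier (dense, for illustration):

```python
import numpy as np

def cholesky_left_looking(A):
    """Left-looking Cholesky sketch: column k is updated with all the
    previously computed columns just before it is factored."""
    n = A.shape[0]
    L = np.tril(A.astype(float))
    for k in range(n):
        for j in range(k):                     # "look left": apply the
            L[k:, k] -= L[k:, j] * L[k, j]     # updates from column j
        L[k, k] = np.sqrt(L[k, k])
        L[k+1:, k] /= L[k, k]
    return L

A = np.array([[4., 2., 2.], [2., 5., 3.], [2., 3., 6.]])
L = cholesky_left_looking(A)
print(np.allclose(L @ L.T, A))   # True
```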

59 Cholesky on a sparse matrix
The Cholesky factorization of a sparse matrix can be achieved with a left-looking, right-looking or multifrontal method.
Reference case: a regular 3x3 grid ordered by nested dissection. Nodes in the separators are ordered last (see the section on orderings).
Notation:
- cdiv(j): divide column j by a scalar
- cmod(j,k): update column j with column k, k < j
- struct(L(1:k,j)): the structure of the L(1:k,j) submatrix

60 Sparse left-looking Cholesky
left-looking
for j = 1 to n do
  for k in struct(L(j,1:j-1)) do
    cmod(j,k)
  end for
  cdiv(j)
end for
In the left-looking method, before variable j is eliminated, column j is updated with all the columns that have a nonzero on row j. In the example above, struct(L(7,1:6)) = {1, 3, 4, 6}. This corresponds to receiving updates from nodes lower in the subtree rooted at j. The filled graph is necessary to determine the structure of each row.

61 Sparse right-looking Cholesky
right-looking
for k = 1 to n do
  cdiv(k)
  for j in struct(L(k+1:n,k)) do
    cmod(j,k)
  end for
end for
In the right-looking method, after variable k is eliminated, column k is used to update all the columns corresponding to nonzeros in column k. In the example above, struct(L(4:9,3)) = {7, 8, 9}. This corresponds to sending updates to nodes higher in the elimination tree. The filled graph is necessary to determine the structure of each column.

62 The Multifrontal method
Take as an example a simple 3x3 sparse matrix where no fill-in is generated:
$A = \begin{pmatrix} a_{11} & & a_{13} \\ & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{pmatrix}$
Its factorization can be achieved in three simple steps. The right-looking approach results in:
Step 1: $l_{11} = \sqrt{a_{11}}$, $l_{31} = a_{31}/l_{11}$, $a'_{33} = a_{33} - l_{31} l_{31}$
Step 2: $l_{22} = \sqrt{a_{22}}$, $l_{32} = a_{32}/l_{22}$, $a''_{33} = a'_{33} - l_{32} l_{32}$
Step 3: $l_{33} = \sqrt{a''_{33}}$
These computations are inefficient: heavy use of indirect addressing, no vectorization nor cache reuse.

63 BLAS operations
High efficiency can be achieved if the computations on a sparse matrix can be rearranged as operations on dense matrices/blocks. This allows the use of efficient BLAS routines:
- Level-1 BLAS: vector-vector operations like inner product or vector sum. O(n) operations are performed on O(n) data. Vectorizable but limited by bus speed.
- Level-2 BLAS: matrix-vector operations like the matrix-vector product. O(n^2) operations are performed on O(n^2) data. Vectorizable but limited by bus speed.
- Level-3 BLAS: matrix-matrix operations like the matrix-matrix product or rank-k update. O(n^3) operations are performed on O(n^2) data. Vectorizable and very efficient thanks to good exploitation of the memory hierarchy.

64 The Multifrontal Method
REMEMBER: each time a pivot is eliminated, a clique is formed in the graph. A clique is a set of fully connected nodes, i.e., a graph associated to a dense submatrix. The nonzero values concerned by an elimination step can thus be stored in a dense matrix, and the operations can be carried out by means of BLAS operations.

65 The Multifrontal Method
Thanks to the associativity of addition, the three steps before can be rewritten as:
Step 1: $\begin{pmatrix} a_{11} & a_{13} \\ a_{31} & 0 \end{pmatrix} \rightarrow \begin{pmatrix} l_{11} \\ l_{31} \end{pmatrix},\ b = -l_{31} l_{31}$
Step 2: $\begin{pmatrix} a_{22} & a_{23} \\ a_{32} & 0 \end{pmatrix} \rightarrow \begin{pmatrix} l_{22} \\ l_{32} \end{pmatrix},\ c = -l_{32} l_{32}$
Step 3: $l_{33} = \sqrt{a_{33} + b + c}$

66 The Multifrontal Method
In the general case b and c are dense submatrices (Schur complements), called contribution blocks, which will be assembled in some sophisticated way into other dense matrices:
$\begin{pmatrix} f_{11} & f_{12} & \cdots & f_{1n} \\ f_{21} & f_{22} & \cdots & f_{2n} \\ \vdots & & & \vdots \\ f_{n1} & f_{n2} & \cdots & f_{nn} \end{pmatrix} \rightarrow \begin{pmatrix} l_{11} & & & \\ l_{21} & cb_{22} & \cdots & cb_{2n} \\ \vdots & & & \vdots \\ l_{n1} & cb_{n2} & \cdots & cb_{nn} \end{pmatrix}$
where $l_{11} = \sqrt{f_{11}}$, $\begin{pmatrix} l_{21} \\ \vdots \\ l_{n1} \end{pmatrix} = \begin{pmatrix} f_{21} \\ \vdots \\ f_{n1} \end{pmatrix} / l_{11}$, and
$\begin{pmatrix} cb_{22} & \cdots & cb_{2n} \\ \vdots & & \vdots \\ cb_{n2} & \cdots & cb_{nn} \end{pmatrix} = \begin{pmatrix} f_{22} & \cdots & f_{2n} \\ \vdots & & \vdots \\ f_{n2} & \cdots & f_{nn} \end{pmatrix} - \begin{pmatrix} l_{21} \\ \vdots \\ l_{n1} \end{pmatrix} \begin{pmatrix} l_{21} \\ \vdots \\ l_{n1} \end{pmatrix}^T$

67 The Multifrontal Method
The elimination tree can be regarded as a graph of dependencies which defines where/how to assemble the contribution blocks and which variable to eliminate at each step.

68 The Multifrontal Method
Front associated with variable 4:
$\begin{pmatrix} a_{44} & a_{46} & a_{47} \\ a_{64} & & \\ a_{74} & & \end{pmatrix} \rightarrow \begin{pmatrix} l_{44} & & \\ l_{64} & b_{66} & b_{67} \\ l_{74} & b_{76} & b_{77} \end{pmatrix}$

69 The Multifrontal Method
Front associated with variable 5:
$\begin{pmatrix} a_{55} & a_{56} & a_{59} \\ a_{65} & & \\ a_{95} & & \end{pmatrix} \rightarrow \begin{pmatrix} l_{55} & & \\ l_{65} & c_{66} & c_{69} \\ l_{95} & c_{96} & c_{99} \end{pmatrix}$

70 The Multifrontal Method
Front associated with variable 6, which assembles the contribution blocks b and c produced by its children:
$\begin{pmatrix} a_{66} & 0 & a_{68} & a_{69} \\ 0 & & & \\ a_{86} & & & \\ a_{96} & & & \end{pmatrix} + \begin{pmatrix} b_{66} & b_{67} & & \\ b_{76} & b_{77} & & \\ & & & \\ & & & \end{pmatrix} + \begin{pmatrix} c_{66} & & & c_{69} \\ & & & \\ & & & \\ c_{96} & & & c_{99} \end{pmatrix} \rightarrow \begin{pmatrix} l_{66} & & & \\ l_{76} & d_{77} & d_{78} & d_{79} \\ l_{86} & d_{87} & d_{88} & d_{89} \\ l_{96} & d_{97} & d_{98} & d_{99} \end{pmatrix}$

71 The Multifrontal Method
A dense matrix, called the frontal matrix, is associated with each node of the elimination tree. The multifrontal method consists in a bottom-up traversal of the tree where at each node two operations are done:
- Assembly: nonzeros from the original matrix are assembled together with the contribution blocks from the children nodes into the frontal matrix.
- Elimination: a partial factorization of the frontal matrix is done. The variables associated with the node of the tree (called fully assembled) can be eliminated. This step produces part of the final factors and a Schur complement (contribution block) that will be assembled into the parent node's frontal matrix.
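
The elimination step on a front can be sketched as a partial dense factorization (a hypothetical helper; npiv is the number of fully assembled variables):

```python
import numpy as np

def partial_factor(F, npiv):
    """Eliminate the first npiv (fully assembled) variables of a frontal
    matrix F: returns the factor columns and the contribution block."""
    F = F.astype(float).copy()
    for k in range(npiv):
        F[k, k] = np.sqrt(F[k, k])
        F[k+1:, k] /= F[k, k]
        F[k+1:, k+1:] -= np.outer(F[k+1:, k], F[k+1:, k])
    return np.tril(F[:, :npiv]), F[npiv:, npiv:]   # L-part, Schur complement

# Front of order 3 with one fully assembled variable.
F = np.array([[4., 2., 2.], [2., 1., 0.], [2., 0., 3.]])
Lpart, cb = partial_factor(F, 1)
print(Lpart.ravel())   # [2. 1. 1.]: one column of the factor
print(cb)              # [[ 0. -1.] [-1.  2.]]: contribution block
```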

72 The Multifrontal Method: example

73 Solve

74 Solve
Once the matrix is factorized, the problem can be solved against one or more right-hand sides:
$AX = LL^T X = B, \quad A, L \in \mathbb{R}^{n \times n},\ X \in \mathbb{R}^{n \times k},\ B \in \mathbb{R}^{n \times k}$
The solution of this problem can be achieved in two steps:
- forward substitution: $LZ = B$
- backward substitution: $L^T X = Z$
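
A small sketch of the two substitution steps (dense and unblocked, for illustration):

```python
import numpy as np

def forward_sub(L, b):
    """Solve L z = b, L lower triangular."""
    n = L.shape[0]
    z = np.zeros(n)
    for i in range(n):
        z[i] = (b[i] - L[i, :i] @ z[:i]) / L[i, i]
    return z

def backward_sub(L, z):
    """Solve L.T x = z, traversing from the bottom up."""
    n = L.shape[0]
    x = np.zeros(n)
    for i in range(n - 1, -1, -1):
        x[i] = (z[i] - L[i+1:, i] @ x[i+1:]) / L[i, i]
    return x

A = np.array([[4., 2., 2.], [2., 5., 3.], [2., 3., 6.]])
b = np.array([1., 2., 3.])
L = np.linalg.cholesky(A)
x = backward_sub(L, forward_sub(L, b))
print(np.allclose(A @ x, b))   # True
```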

75 Solve: left-looking

76 Solve: right-looking

77 Direct solvers: resume
The solution of a sparse linear system can be achieved in three phases that include at least the following operations:
- Analysis: symbolic factorization, elimination tree computation (see note below)
- Factorization: the actual matrix factorization
- Solve: forward substitution, backward substitution
These phases, especially the analysis, can include many other operations. Some of them are presented next.
Note: the elimination tree is strictly needed only for the multifrontal method, but in every case it can always be used to compute the symbolic factorization in a cheap way.

78 Orderings

79 A sequence of elimination graphs
The nodes don't have to be eliminated in the natural order:
[Figure: elimination graphs and the corresponding reduced matrices when the nodes are eliminated in the order 4, 2, 3, 1, 5, 6, producing factor entries $l_{44}, l_{24}, l_{22}, l_{32}, l_{33}, l_{12}, l_{13}, l_{11}, l_{53}, l_{51}, l_{55}, l_{61}, l_{65}, l_{66}$.]

80 Elimination process: ordered graphs
For an undirected graph $G = (V, E)$ with $|V| = n$, an ordering $\sigma$ of $V$ is a bijection $\sigma: \{1, \ldots, n\} \leftrightarrow V$.
For an ordered graph $G_\sigma = (V, E, \sigma)$, the elimination process
$P(G_\sigma) = [G = G_0, G_1, G_2, \ldots, G_{n-1}]$
is the sequence of elimination graphs defined by $G_0 = G$, $G_i = (G_{i-1})_{\sigma(i)}$.
Let $G_i = (V_i, E_i)$ for $i = 0, 1, \ldots, n-1$. The fill-in $F(G_\sigma)$ is defined by
$F(G_\sigma) = \bigcup_{i=1}^{n-1} \Delta_i$, where $\Delta_i = D(\sigma(i))$ in $G_{i-1}$,
and the filled graph is defined by $G_\sigma^+ = (V, E \cup F(G_\sigma))$.
For a matrix $A$, $\Delta_i$ corresponds to the new nonzero elements, the fill-ins, created during the $i$-th step of elimination.

81 Elimination process
Fill-path theorem for ordered graphs: Let $G_\sigma = (V, E, \sigma)$ be an ordered graph. Then $(v, w)$ is an edge of $G_\sigma^+ = (V, E \cup F(G_\sigma))$ iff there exists a path $\mu = [v = v_1, v_2, \ldots, v_{k+1} = w]$ in $G$ such that $\sigma^{-1}(v_i) < \min\{\sigma^{-1}(v), \sigma^{-1}(w)\}$ for $2 \le i \le k$.
[Example: $\sigma = \{4, 2, 3, 1, 5, 6\}$, with filled graph $G_\sigma^+(A) = G(F)$, $F = L + L^T$.]

82 Elimination process: Formal definitions
Given a graph $G = (V, E)$, an ordering $\sigma$ of $V$ is a perfect elimination ordering of $G$ if $F(G_\sigma) = \emptyset$.
The ordering $\sigma$ is a perfect elimination ordering if $w \in adj(v)$, $x \in adj(v)$, and $\sigma^{-1}(v) < \min\{\sigma^{-1}(w), \sigma^{-1}(x)\}$ in $G_\sigma$ imply either $(w, x) \in E$ or $w = x$. In other words, when $v$ is to be eliminated (both $w$ and $x$ not eliminated yet), there is an edge $(w, x)$.
A graph which has a perfect elimination ordering is a perfect elimination graph. Any filled graph $G_\sigma^+$ is a perfect elimination graph, since $\sigma$ is a perfect elimination ordering for it.

83 Fill-reducing ordering methods
Three main classes of methods for minimizing fill-in during factorization:
- Local approaches: at each step of the factorization, select the pivot that is likely to minimize fill-in. The method is characterized by the way pivots are selected: Markowitz criterion (for a general matrix), minimum degree (for symmetric matrices).
- Global approaches: the matrix is permuted so as to confine the fill-in within certain parts of the permuted matrix: Cuthill-McKee, reverse Cuthill-McKee, nested dissection.
- Hybrid approaches: first permute the matrix globally to confine the fill-in, then reorder small parts using local heuristics.

84 Local heuristics to reduce fill-in during factorization
Let $G(V, E)$ be the graph associated with a matrix $A$ that we want to order using local heuristics, and let Metric be such that $Metric(v_i) < Metric(v_j)$ implies $v_i$ is better than $v_j$:
1: $G^0 \leftarrow (V, E)$ undirected graph of $A$
2: for i = 1 : n-1 do
3:   let k be a vertex that minimizes the metric
4:   $V \leftarrow V - \{k\}$  {remove vertex k}
5:   $E \leftarrow E - \{(k, \ell) : \ell \in adj(k)\} \cup \{(x, y) : x \in adj(k)$ and $y \in adj(k)\}$
6:   update $Metric(v_j)$ for all non-selected nodes $v_j$
7: end for
Step 6 should only be applied to nodes for which the metric value might have changed.

85 Reordering unsymmetric matrices: Markowitz criterion
At step k of Gaussian elimination, let
$r_i^{(k)}$ = number of nonzeros in row i of $A^{(k)}$
$c_j^{(k)}$ = number of nonzeros in column j of $A^{(k)}$
Markowitz criterion: the candidate pivot $a_{ij}$ should minimize $(r_i^{(k)} - 1)(c_j^{(k)} - 1)$ over all $i, j \ge k$.
Minimum degree, i.e., the Markowitz criterion for symmetric matrices: the candidate (diagonal) pivot $a_{jj}$ should minimize $(c_j^{(k)} - 1)$ over all $j \ge k$.

86 Minimum degree algorithm
Step 1: select the vertex that possesses the smallest number of neighbors in $G^0$.
[Figure: (a) sparse symmetric matrix, (b) elimination graph. The node/variable selected is 1, of degree 2.]

87 Illustration
Step 1: elimination of the pivot.
[Figure: (a) elimination graph, (b) factors and active submatrix, distinguishing initial nonzeros, nonzeros in the factors, and fill-in.]

88 Minimum degree algorithm based on elimination graphs
$\forall i \in [1, n]$: $d_i = |adj_{G^0}(i)|$
for k = 1 to n do
  $p = \arg\min_{i \in V^{k-1}} (d_i)$
  for each $i \in adj_{G^{k-1}}(p)$ do
    $adj_{G^k}(i) = (adj_{G^{k-1}}(i) \cup adj_{G^{k-1}}(p)) \setminus \{i, p\}$
    $d_i = |adj_{G^k}(i)|$
  end for
  $V^k = V^{k-1} \setminus \{p\}$
end for
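
A compact sketch of this basic loop on dict-of-sets elimination graphs (illustration only; production codes use quotient graphs and approximate degrees, as discussed next):

```python
def minimum_degree_order(adj):
    """Basic minimum degree on elimination graphs: repeatedly eliminate
    a vertex of smallest degree and turn its neighborhood into a clique."""
    adj = {v: set(nbrs) for v, nbrs in adj.items()}
    order = []
    while adj:
        p = min(adj, key=lambda v: (len(adj[v]), v))   # smallest degree
        nbrs = adj.pop(p)
        for x in nbrs:
            adj[x].discard(p)
            adj[x] |= nbrs - {x}     # merge: the neighborhood becomes a clique
        order.append(p)
    return order

# Small grid-like example: the degree-2 vertices go first.
adj = {1: {2, 4}, 2: {1, 3, 5}, 3: {2, 6}, 4: {1, 5}, 5: {2, 4, 6}, 6: {3, 5}}
print(minimum_degree_order(adj))   # [1, 3, 4, 2, 5, 6]
```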

89 Illustration (cont'd)
[Figure: graphs $G^1, G^2, G^3$ and the corresponding reduced matrices: (a) elimination graphs, (b) factors and active submatrices, distinguishing original nonzeros, modified original nonzeros, fill-in, and nonzeros in the factors.]

90 Minimum degree algorithm
Minimum degree does not always minimize fill-in! Consider the following matrix. Remark: using the initial ordering, there is no fill-in.
Step 1 of minimum degree: select pivot 5 (minimum degree = 2). In the updated graph the edge (4,6) is added, i.e., fill-in.
[Figure: the matrix, its elimination graph, and the updated graph after eliminating node 5.]

91 Efficient implementation of minimum degree
Reduce time complexity:
1. Accelerate the selection of pivots and the update of the graph:
1.1 Supervariables (or indistinguishable nodes): if several variables have the same adjacency structure in $G^k$, they can be eliminated simultaneously.
1.2 Two non-adjacent nodes of the same degree can be eliminated simultaneously (multiple eliminations).
1.3 The degree update of the neighbours of the pivot can be done in an approximate way (Approximate Minimum Degree).

92 Efficient implementation of minimum degree
Reduce memory complexity:
2. Decrease the size of the working space. Using the elimination graph, the working space is of order O(nnz(L)).
Fill-in: let pivot be the pivot at step k. If $i \in Adj_{G^{k-1}}(pivot)$ then $Adj_{G^{k-1}}(pivot) \subseteq Adj_{G^k}(i)$: the structure of the pivot column is included in the filled structure of column i. We can then use an implicit representation of fill-in by defining the notion of element (a variable already eliminated) and the quotient graph. A variable of the quotient graph is adjacent to variables and elements. One can show that for all $k \in [1 \ldots n]$, the size of the quotient graph is O(nnz(A)).

93 Influence on the structure of factors
Harwell-Boeing matrix: dwt_592.rua, structural computing on a submarine. NZ(LU factors) = 58202.
[Figure: the matrix pattern and the pattern of its factors.]

94 Influence on the structure of factors
Structure of the factors after a minimum degree (MMD) permutation.
[Figure: permuted matrix and its factors.]
Detection of supervariables allows one to build more regularly structured factors (easier factorization).

95 Comparison of 3 implementations of minimum degree
V0 is the initial algorithm (based on the elimination graph); MMD is V0 + 1.2 + 2 (Multiple Minimum Degree, Liu, 85, 89); AMD is V0 + 1.3 + 2 (Approximate Minimum Degree, Amestoy et al., 95).
[Table: execution times on a SunSparc 10 for matrices dwt, Wang, Orani; minimum memory sizes 250KB (V0), 110KB (MMD), 110KB (AMD).]
The fill-in is similar. Memory space for MMD and AMD: 2 NZ integers. V0 was not able to perform the reordering for the 2 last matrices (lack of memory after 2 hours of computation).

96 Minimum fill based algorithm
$Metric(v_i)$ is the amount of fill-in that $v_i$ would introduce if it were selected as a pivot. This corresponds to the cardinality of the deficiency of the vertex.
Illustration: r has degree d = 4 and a fill-in metric of d(d-1)/2 = 6, whereas s has degree d = 5 but a fill-in metric of d(d-1)/2 - 9 = 1.
[Figure: vertices r and s with neighborhoods j1..j4 and i1..i5.]

97 Minimum fill-in properties
The situation typically occurs when $\{i_1, i_2, i_3\}$ and $\{i_2, i_3, i_4, i_5\}$ were adjacent to two already selected nodes (here $e_2$ and $e_1$).
[Figure: vertices r, s, i1..i5, j1..j4; e1 and e2 are previously selected nodes.]
The elimination of a node $v_k$ affects the degree of the nodes adjacent to $v_k$. The fill-in metric of $adj(adj(v_k))$ is also affected. Illustration: selecting r affects the fill-in metric of $i_1$ (because of the fill edge $(j_3, j_4)$).

98 How to compute the fill-in metrics
Only nodes adjacent to the current pivot are updated, and only approximated metrics (using clique structures) are computed.
Let $d_k$ be the degree of node k; $d_k(d_k - 1)/2$ is an upper bound of the fill (for s: $d_s = 5 \rightarrow d_s(d_s-1)/2 = 10$). Several possibilities:
1. Deduct the clique area of the last selected pivot adjacent to k (for s: the clique of $e_2$).
2. Deduct the largest clique area among all adjacent selected pivots (for s: the clique of $e_1$).
3. If for $d_k$ we use AMD instead, then the cliques of all adjacent selected pivots can be deducted.

99 CM and RCM: Definitions
Bandwidth: a structurally symmetric matrix A is said to have bandwidth 2m + 1 if m is the smallest integer such that $a_{ij} = 0$ whenever $|i - j| > m$. If no interchanges are performed during elimination, fill-in occurs only within the band.
Profile: define a bandwidth for each row i: m(i) is the smallest integer such that $a_{ij} = 0$ whenever $i - j > m(i)$, for $j < i$. The profile of a symmetric matrix is $\sum_i m(i)$. If no interchanges are performed during elimination, no fill-in occurs ahead of the first entry in each row.
Block tridiagonal form: nonzeros are in the diagonal blocks or in a block just above or just below the diagonal.

100 CM: Algorithm
Level sets are built from a vertex of minimum degree. At any level, priority is given to a vertex with a smaller number of neighbors.
pick a vertex v and order it as the first vertex
S ← {v}
while S ≠ V do
  S' ← all vertices in V \ S which are adjacent to S
  order vertices in S' in increasing order of degree
  S ← S ∪ S'
end while
(example from Duff, Erisman, Reid)

101 CM vs RCM
RCM: simply reverse the order found by the CM algorithm. It does not change the bandwidth but improves the storage requirements. (Liu and Sherman, 76)
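
SciPy ships an RCM implementation (scipy.sparse.csgraph.reverse_cuthill_mckee); a small sketch that scrambles a banded pattern and recovers a small bandwidth:

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import reverse_cuthill_mckee

# A tridiagonal pattern scrambled by a random symmetric permutation,
# then recovered with RCM to shrink the bandwidth back.
n = 12
band = np.eye(n) + np.eye(n, k=1) + np.eye(n, k=-1)
p = np.random.default_rng(0).permutation(n)
A = csr_matrix(band[np.ix_(p, p)])

perm = reverse_cuthill_mckee(A, symmetric_mode=True)
B = A[perm, :][:, perm]

def bandwidth(M):
    i, j = M.nonzero()
    return int(np.max(np.abs(i - j)))

print(bandwidth(A), "->", bandwidth(B))   # the bandwidth shrinks back to 1
```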

102 Illustration: Reverse Cuthill-McKee on matrix dwt_592
Harwell-Boeing matrix: dwt_592, structural computing on a submarine. NZ(LU factors) = 58202.
[Figure: original matrix and factorized matrix patterns.]

103 Illustration: Reverse Cuthill-McKee on matrix dwt_592
NZ(LU factors) = 16924.
[Figure: permuted matrix (RCM) and factorized permuted matrix patterns.]

104 Nested dissection
Fill-path theorem [Rose, Tarjan, Lueker 76]: Let $G = (V, E)$ be an ordered graph. Then $(v, w)$ is an edge of $G^+ = (V, E \cup F(G))$ iff there exists a path $\mu = [v = v_1, v_2, \ldots, v_{k+1} = w]$ in $G$ such that $v_i < \min\{v, w\}$ for $2 \le i \le k$.
All the paths connecting one subdomain to the other are contained in the separator (bisector). Thus, there cannot be any $\ell_{ij}$ where $v_i \in D_1$ and $v_j \in D_2$.

105 ND of a regular square mesh
The nested dissection method aims at partitioning the domain so that fill-in is only generated internally within each subdomain and on the interfaces, by recursively computing bisectors. The nested dissection method also produces an elimination tree.
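
The recursive principle can be sketched on the simplest possible mesh, a 1D path, where the middle vertex is the bisector and is numbered after the two halves (illustration only):

```python
def nested_dissection_1d(lo, hi, order):
    """Recursive ND sketch on a path graph lo..hi: the middle vertex is
    the separator and is numbered after both halves."""
    if lo > hi:
        return
    mid = (lo + hi) // 2
    nested_dissection_1d(lo, mid - 1, order)   # left subdomain first
    nested_dissection_1d(mid + 1, hi, order)   # then right subdomain
    order.append(mid)                          # separator last

order = []
nested_dissection_1d(0, 6, order)
print(order)   # [0, 2, 1, 4, 6, 5, 3]: vertex 3 (the top separator) is last
```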

106 Nested dissection
A matrix from the UFL sparse matrix collection, with 4045 nonzeros.
[Figure: the matrix and its factors under the natural ordering, RCM, and nested dissection; nnz(L_ND) = 4287.]

107 Orderings: comparison of methods
Millions of entries in the original matrix and in the factors:
[Table: matrices gupta, ship, twotone, wang, xenon; columns A, METIS, AMF, AMD.]
METIS (Karypis and Kumar) and SCOTCH (Pellegrini) are global strategies (recursive nested dissection based orderings). PORD (Schulze, Paderborn Univ.) is a recursive dissection based on a bottom-up strategy to build the separator. AMD (Amestoy, Davis and Duff) is a local strategy based on Approximate Minimum Degree. AMF (Amestoy) is a local strategy based on Approximate Minimum Fill.

108 Impact of fill-reducing heuristics
Number of operations (millions):
[Table: matrices gupta, ship, twotone, wang, xenon; columns METIS, SCOTCH, PORD, AMF, AMD.]
METIS (Karypis and Kumar) and SCOTCH (Pellegrini) are global strategies (recursive nested dissection based orderings). PORD (Schulze, Paderborn Univ.) is a recursive dissection based on a bottom-up strategy to build the separator. AMD (Amestoy, Davis and Duff) is a local strategy based on Approximate Minimum Degree. AMF (Amestoy) is a local strategy based on Approximate Minimum Fill.

109 Tree Amalgamation

110 Tree Amalgamation
The whole factorization is recast into a sequence of partial dense factorizations of the type:
$l_{11} = \sqrt{f_{11}}, \quad \begin{pmatrix} l_{21} \\ \vdots \\ l_{n1} \end{pmatrix} = \begin{pmatrix} f_{21} \\ \vdots \\ f_{n1} \end{pmatrix} / l_{11}, \quad \begin{pmatrix} cb_{22} & \cdots & cb_{2n} \\ \vdots & & \vdots \\ cb_{n2} & \cdots & cb_{nn} \end{pmatrix} = \begin{pmatrix} f_{22} & \cdots & f_{2n} \\ \vdots & & \vdots \\ f_{n2} & \cdots & f_{nn} \end{pmatrix} - \begin{pmatrix} l_{21} \\ \vdots \\ l_{n1} \end{pmatrix} \begin{pmatrix} l_{21} \\ \vdots \\ l_{n1} \end{pmatrix}^T$
These are still only Level-2 BLAS operations. How to get the efficiency of Level-3 BLAS?

111 Tree Amalgamation
Amalgamation without fill-in consists in merging all the frontal matrices related to pivots whose columns in the factor L have the same structure. The subset of nodes containing these pivots is called a supernode. All the pivots in a supernode can thus be eliminated at once within the same frontal matrix.

112 Tree Amalgamation
Amalgamation with fill-in is based on the same principle, except that it groups together pivots whose column structure in L is not exactly the same. If the generated fill-in does not exceed a certain threshold, the extra cost is overcome by the gained efficiency.

113 Tree Amalgamation
After amalgamation:
$A = \begin{pmatrix} A_{11} & A_{21}^T \\ A_{21} & A_{22} \end{pmatrix} \rightarrow \begin{pmatrix} L_{11} & \\ L_{21} & CB \end{pmatrix}$
$L_{11} L_{11}^T = A_{11}$ (Cholesky factorization)
$L_{21} = A_{21} L_{11}^{-T}$
$CB = A_{22} - L_{21} L_{21}^T$
All the operations related to the frontal matrix can then be done through Level-3 BLAS routines.
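
A sketch of the amalgamated (blocked) partial factorization using dense kernels (assuming SciPy; the matrix and block sizes are made up for illustration):

```python
import numpy as np
from scipy.linalg import cholesky, solve_triangular

def partial_factor_blocked(F, npiv):
    """Blocked (supernodal) partial factorization: eliminate the first
    npiv pivots of a front with matrix-matrix (BLAS-3) kernels."""
    A11 = F[:npiv, :npiv]
    A21 = F[npiv:, :npiv]
    A22 = F[npiv:, npiv:]
    L11 = cholesky(A11, lower=True)                       # dense Cholesky
    L21 = solve_triangular(L11, A21.T, lower=True).T      # L21 = A21 L11^{-T}
    CB = A22 - L21 @ L21.T                                # Schur complement
    return L11, L21, CB

rng = np.random.default_rng(0)
M = rng.standard_normal((5, 5))
F = M @ M.T + 5 * np.eye(5)          # a generic SPD front
L11, L21, CB = partial_factor_blocked(F, 2)

# CB equals the trailing Schur complement of the full factorization:
L = np.linalg.cholesky(F)
print(np.allclose(CB, F[2:, 2:] - L[2:, :2] @ L[2:, :2].T))   # True
```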

114 Tree Amalgamation
Liu, Ng, Peyton 93: column j is the first node of a fundamental supernode if and only if node j has two or more children in the elimination tree, or j is a leaf of some row subtree of the elimination tree of A.

115 Tree traversal orders

116 Equivalent reorderings
A tree can be regarded as a dependency graph where each node produces data that is then processed by its parent. By this definition, a tree has to be traversed in topological order, but there are still many topological traversals of the same tree.
Definition: two orderings P and Q are equivalent if the structures of the filled graphs of $PAP^T$ and $QAQ^T$ are the same (that is, they are isomorphic). Equivalent orderings result in the same amount of fill-in and computation during the factorization.
To ease the notation, we discuss only one ordering with respect to A, i.e., P is an equivalent ordering of A if the filled graph of A and that of $PAP^T$ are isomorphic.

117 Equivalent reorderings
All topological orderings of T(A) are equivalent:
- Let P be the permutation matrix corresponding to a topological ordering of T(A). Then $G^+(PAP^T)$ and $G^+(A)$ are isomorphic.
- Let P be the permutation matrix corresponding to a topological ordering of T(A). Then the elimination trees $T(PAP^T)$ and $T(A)$ are isomorphic.
Because the fill-in won't change, we have the freedom to choose any specific topological order that will provide other properties.

118 Tree traversal orders
Which specific topological order should we choose? A postorder. Why? Because the data produced by nodes is consumed by their parents in LIFO order. In the multifrontal method, we can thus use a stack memory where contribution blocks are pushed as soon as they are produced by the elimination on a frontal matrix, and popped at the moment the parent node is assembled. This provides better data locality and makes the memory management easier.

119 The multifrontal method (Duff, Reid 83)
[Figure: a matrix A, its factors L+U with fill-in, and the elimination tree.]
Memory is divided into two parts (that can overlap in time): the factors, and the active memory (the active frontal matrix plus the stack of contribution blocks). The elimination tree represents the task dependencies.

120 Example 1: Processing a wide tree
[Figure: evolution of the memory (unused space, stack space, factor space, non-free space) while traversing a wide tree.]

121 Example 2: Processing a deep tree
[Figure: evolution of the memory while traversing a deep tree: allocation, assembly, factorization, and stacking steps; unused, factor, stack, and non-free memory space.]

122 Postorder traversals: memory
A postorder provides good data locality and better memory consumption than a general topological order, since parent nodes are assembled as soon as their children have been processed. But there are still many postorders of the same tree. Which one should we choose? The one that minimizes memory consumption.
[Figure: the same tree processed with the best postorder (abcdefghi) and the worst one (hfdbacegi); root i, internal nodes g, e, c, leaves a, b, d, f, h.]

123 Modelization of the problem
$M_i$: memory peak for the complete subtree rooted at i; $temp_i$: temporary memory produced by node i; $m_{parent}$: memory for storing the parent.
[Figure: a parent node with children producing $temp_1, temp_2, temp_3$ and subtree peaks $M_1, M_2, M_3$.]
$M_{parent} = \max\Big( \max_{j=1}^{nbchildren} \big(M_j + \sum_{k=1}^{j-1} temp_k\big),\; m_{parent} + \sum_{j=1}^{nbchildren} temp_j \Big) \quad (1)$

124 Modelization of the problem
Objective: order the children so as to minimize $M_{parent}$ in Formula (1).

125 Memory-minimizing schedules
Theorem [Liu, 86]: the minimum of $\max_j (x_j + \sum_{i=1}^{j-1} y_i)$ is obtained when the sequence $(x_i, y_i)$ is sorted in decreasing order of $x_i - y_i$.
Corollary: an optimal child sequence is obtained by rearranging the children nodes in decreasing order of $M_i - temp_i$.
Interpretation: at each level of the tree, a child with a relatively large peak of memory in its subtree ($M_i$ large with respect to $temp_i$) should be processed first.
⇒ Apply on the complete tree starting from the leaves (or from the root with a recursive approach).

126 Optimal tree reordering
Objective: minimize the peak of stack memory.
Tree_Reorder(T):
  for all i in the set of root nodes do
    Process_Node(i)
  end for
Process_Node(i):
  if i is a leaf then
    $M_i = m_i$
  else
    for j = 1 to nbchildren do
      Process_Node(j-th child)
    end for
    reorder the children of i in decreasing order of $(M_j - temp_j)$
    compute $M_i$ at node i using Formula (1)
  end if
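
A recursive sketch of this reordering (hypothetical helper names; it returns the peak of Formula (1) after sorting the children by $M_j - temp_j$):

```python
def reorder_tree(children, m, temp, root):
    """Liu's reordering: process children in decreasing order of
    M_j - temp_j and return the resulting peak M_root (Formula (1))."""
    kids = children.get(root, [])
    if not kids:
        return m[root]
    M = {j: reorder_tree(children, m, temp, j) for j in kids}
    kids.sort(key=lambda j: M[j] - temp[j], reverse=True)
    peak, stacked = 0.0, 0.0
    for j in kids:                        # children processed in sequence:
        peak = max(peak, stacked + M[j])  # peak of child j on top of the
        stacked += temp[j]                # contribution blocks stacked so far
    return max(peak, m[root] + stacked)   # assembling the parent front

children = {0: [1, 2]}                    # root 0 with two leaf children
m = {0: 4.0, 1: 6.0, 2: 3.0}              # front sizes
temp = {1: 1.0, 2: 2.0}                   # contribution block sizes
print(reorder_tree(children, m, temp, 0))  # 7.0: child 1 (M - temp = 5) first
```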

127 Parallelism

128 Parallelization: two levels of parallelism
- Tree parallelism: arising from sparsity, it is formalized by the fact that nodes in separate subtrees of the elimination tree can be eliminated at the same time.
- Node parallelism: within each node, a parallel dense LU factorization (BLAS).
[Figure: fronts along the tree, with decreasing tree parallelism and increasing node parallelism towards the root.]

129 Exploiting the second level of parallelism is crucial
Multifrontal factorization of matrix BCSSTK15, MFlops (speed-up):

  Computer          (1) tree only    (2) both levels
  Alliant FX/            (1.9)          34 (4.3)
  IBM 3090J/6VF          (2.1)         227 (3.8)
  CRAY                   (1.8)         404 (2.3)
  CRAY Y-MP              (2.3)        1119 (4.8)

In column (1), we exploit only parallelism from the tree. In column (2), we combine the two levels of parallelism.

130 Other features
Dynamic management of parallelism:
- a pool of tasks for exploiting the two levels of parallelism
- assembly operations are also parallel (but involve indirect addressing)
Dynamic management of data:
- storage of the LU factors, frontal and contribution matrices
- the amount of memory available may conflict with exploiting maximum parallelism

131 Task mapping and scheduling
Assign tasks to processors to achieve a goal: makespan minimization, memory minimization, ... There are many approaches:
- static: build the schedule before the execution and follow it at run-time. Advantage: very efficient, since it has a global view of the system. Drawback: requires a very good modelization of the platform.
- dynamic: take scheduling decisions dynamically at run-time. Advantage: reactive to the evolution of the platform and easy to use on several platforms. Drawback: decisions are taken with local criteria (a decision which seems good at time t can have very bad consequences at time t+1).

132 Influence of scheduling on the makespan Objective: Assign processes/tasks to processors so that the completion time, also called the makespan is minimized. (We may also say that we minimize the maximum total processing time on any processor.)

133 Task scheduling on shared memory computers
The data can be shared between processors without any communication. Dynamic scheduling of the tasks (pool of ready tasks): each processor selects a task, and the order can influence the performance.
[Figure: an example of a topological ordering that is good with respect to time but not so good in terms of working memory.]

134 Static scheduling: proportional mapping
Main objective: reduce the volume of communication between processors. Recursively partition the processors equally between the children of a given node; initially all processors are assigned to the root node. Good at localizing communication, but not so easy if there is no overlapping between processor partitions at each step.
[Figure: mapping of the tasks of a tree onto 5 processors: the root gets {1,2,3,4,5}, its children {1,2,3} and {4,5}, and so on.]
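
A toy sketch of proportional mapping (all names and the rounding rule are assumptions; real schedulers refine this with work estimates and relaxations):

```python
def proportional_mapping(children, work, procs, node, mapping):
    """Proportional mapping: split the processor set among the children
    of a node proportionally to the work in their subtrees."""
    mapping[node] = procs
    kids = children.get(node, [])
    if not kids:
        return
    subtree = lambda v: work[v] + sum(subtree(c) for c in children.get(v, []))
    total = sum(subtree(c) for c in kids)
    start = 0
    for c in kids:
        share = max(1, round(len(procs) * subtree(c) / total))
        end = min(len(procs), start + share)
        proportional_mapping(children, work, procs[start:end] or procs[-1:],
                             c, mapping)
        start = end

children = {0: [1, 2], 1: [3, 4]}
work = {0: 10, 1: 6, 2: 4, 3: 3, 4: 3}
mapping = {}
proportional_mapping(children, work, list(range(5)), 0, mapping)
print(mapping)  # {0: [0,1,2,3,4], 1: [0,1,2,3], 3: [0,1], 4: [2,3], 2: [4]}
```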

135 Mapping of the tree onto the processors
Objective: find a layer $L_0$ such that the subtrees rooted at $L_0$ can be mapped onto the processors with a good balance.
Construction and mapping of the initial level $L_0$:
Let $L_0 \leftarrow$ roots of the assembly tree
repeat
  (Step A) find the node q in $L_0$ whose subtree has the largest computational cost
  (Step B) set $L_0 \leftarrow (L_0 \setminus \{q\}) \cup \{$children of q$\}$
  (Step C) greedily map the nodes of $L_0$ onto the processors and estimate the load unbalance
until load unbalance < threshold

136 Decomposition of the tree into levels
The level $L_0$ is determined based on subtree costs.
[Figure: a tree with the subtree roots forming $L_0$.]
The mapping of the top of the tree can be dynamic. This could be useful for both shared and distributed memory algorithms.

137 Assumptions and Notations
Assumptions: we assume that each column of L / each node of the tree is assigned to a single processor. Each processor is in charge of computing cdiv(j) for the columns j that it owns.
Notation:
- mycols(p) is the set of columns owned by processor p
- map(j) gives the processor owning column j (or task j)
- procs(L(:,k)) = {map(j) : j ∈ struct(L(:,k))} (only the processors in procs(L(:,k)) require updates from column k; they correspond to ancestors of k in the tree)
- father(j) is the father of node j in the elimination tree

138 Computational strategies for parallel direct solvers
The parallel algorithm is characterized by its computational dependency graph and its communication graph. There are three classical approaches to distributed memory parallelism:
1. Fan-in: the fan-in algorithm is very similar to the left-looking approach and is demand-driven: the data required are aggregated update columns computed by the sending processor.
2. Fan-out: the fan-out algorithm is very similar to the right-looking approach and is data-driven: data is sent as soon as it is produced.
3. Multifrontal: the communication pattern follows a bottom-up traversal of the tree. Messages are contribution blocks and are sent to the processor mapped onto the father node.

139 Fan-in variant (similar to left-looking)
fan-in
for j = 1 to n
  u = 0
  for all k in (struct(L(j,1:j-1)) ∩ mycols(p))
    cmod(u,k)
  end for
  if map(j) != p
    send u to processor map(j)
  else
    incorporate u in column j
    receive all the updates on column j and incorporate them
    cdiv(j)
  end if
end for

140 Fan-in variant
[Figure: a subtree mapped onto processors P0, P1, P2, P3 with the parent task on P4; the following slides animate the computation and the communication of the aggregated updates towards P4.]


146 Fan-in variant
If $map(i) = P0$ for all children i and $map(father) \neq P0$, (only) one message is sent by P0: the fan-in variant exploits the data locality of the proportional mapping.
[Figure: all children mapped on P0, the father on another processor.]

147 Fan-out variant (similar to right-looking)
fan-out
for all leaf nodes j in mycols(p)
  cdiv(j)
  send column L(:,j) to procs(L(:,j))
  mycols(p) = mycols(p) - {j}
end for
while mycols(p) != ∅
  receive any column (say L(:,k))
  for j in struct(L(:,k)) ∩ mycols(p)
    cmod(j,k)
    if column j is completely updated
      cdiv(j)
      send column L(:,j) to procs(L(:,j))
      mycols(p) = mycols(p) - {j}
    end if
  end for
end while


More information

MULTI-LAYER HIERARCHICAL STRUCTURES AND FACTORIZATIONS

MULTI-LAYER HIERARCHICAL STRUCTURES AND FACTORIZATIONS MULTI-LAYER HIERARCHICAL STRUCTURES AND FACTORIZATIONS JIANLIN XIA Abstract. We propose multi-layer hierarchically semiseparable MHS structures for the fast factorizations of dense matrices arising from

More information

Contents. Preface... xi. Introduction...

Contents. Preface... xi. Introduction... Contents Preface... xi Introduction... xv Chapter 1. Computer Architectures... 1 1.1. Different types of parallelism... 1 1.1.1. Overlap, concurrency and parallelism... 1 1.1.2. Temporal and spatial parallelism

More information

Preliminaries and Complexity Theory

Preliminaries and Complexity Theory Preliminaries and Complexity Theory Oleksandr Romanko CAS 746 - Advanced Topics in Combinatorial Optimization McMaster University, January 16, 2006 Introduction Book structure: 2 Part I Linear Algebra

More information

QR FACTORIZATIONS USING A RESTRICTED SET OF ROTATIONS

QR FACTORIZATIONS USING A RESTRICTED SET OF ROTATIONS QR FACTORIZATIONS USING A RESTRICTED SET OF ROTATIONS DIANNE P. O LEARY AND STEPHEN S. BULLOCK Dedicated to Alan George on the occasion of his 60th birthday Abstract. Any matrix A of dimension m n (m n)

More information

Enhancing Scalability of Sparse Direct Methods

Enhancing Scalability of Sparse Direct Methods Journal of Physics: Conference Series 78 (007) 0 doi:0.088/7-6596/78//0 Enhancing Scalability of Sparse Direct Methods X.S. Li, J. Demmel, L. Grigori, M. Gu, J. Xia 5, S. Jardin 6, C. Sovinec 7, L.-Q.

More information

LINEAR SYSTEMS (11) Intensive Computation

LINEAR SYSTEMS (11) Intensive Computation LINEAR SYSTEMS () Intensive Computation 27-8 prof. Annalisa Massini Viviana Arrigoni EXACT METHODS:. GAUSSIAN ELIMINATION. 2. CHOLESKY DECOMPOSITION. ITERATIVE METHODS:. JACOBI. 2. GAUSS-SEIDEL 2 CHOLESKY

More information

Research Reports on Mathematical and Computing Sciences

Research Reports on Mathematical and Computing Sciences ISSN 1342-284 Research Reports on Mathematical and Computing Sciences Exploiting Sparsity in Linear and Nonlinear Matrix Inequalities via Positive Semidefinite Matrix Completion Sunyoung Kim, Masakazu

More information

Classical Complexity and Fixed-Parameter Tractability of Simultaneous Consecutive Ones Submatrix & Editing Problems

Classical Complexity and Fixed-Parameter Tractability of Simultaneous Consecutive Ones Submatrix & Editing Problems Classical Complexity and Fixed-Parameter Tractability of Simultaneous Consecutive Ones Submatrix & Editing Problems Rani M. R, Mohith Jagalmohanan, R. Subashini Binary matrices having simultaneous consecutive

More information

Preliminaries. Graphs. E : set of edges (arcs) (Undirected) Graph : (i, j) = (j, i) (edges) V = {1, 2, 3, 4, 5}, E = {(1, 3), (3, 2), (2, 4)}

Preliminaries. Graphs. E : set of edges (arcs) (Undirected) Graph : (i, j) = (j, i) (edges) V = {1, 2, 3, 4, 5}, E = {(1, 3), (3, 2), (2, 4)} Preliminaries Graphs G = (V, E), V : set of vertices E : set of edges (arcs) (Undirected) Graph : (i, j) = (j, i) (edges) 1 2 3 5 4 V = {1, 2, 3, 4, 5}, E = {(1, 3), (3, 2), (2, 4)} 1 Directed Graph (Digraph)

More information

Efficient Sparse LU Factorization with Partial Pivoting on Distributed Memory Architectures

Efficient Sparse LU Factorization with Partial Pivoting on Distributed Memory Architectures IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, VOL. 9, NO. 2, FEBRUARY 1998 109 Efficient Sparse LU Factorization with Partial Pivoting on Distributed Memory Architectures Cong Fu, Xiangmin Jiao,

More information

AM205: Assignment 2. i=1

AM205: Assignment 2. i=1 AM05: Assignment Question 1 [10 points] (a) [4 points] For p 1, the p-norm for a vector x R n is defined as: ( n ) 1/p x p x i p ( ) i=1 This definition is in fact meaningful for p < 1 as well, although

More information

ENHANCING PERFORMANCE AND ROBUSTNESS OF ILU PRECONDITIONERS BY BLOCKING AND SELECTIVE TRANSPOSITION

ENHANCING PERFORMANCE AND ROBUSTNESS OF ILU PRECONDITIONERS BY BLOCKING AND SELECTIVE TRANSPOSITION SIAM J. SCI. COMPUT. Vol. 39, No., pp. A303 A332 c 207 Society for Industrial and Applied Mathematics ENHANCING PERFORMANCE AND ROBUSTNESS OF ILU PRECONDITIONERS BY BLOCKING AND SELECTIVE TRANSPOSITION

More information

A Review of Matrix Analysis

A Review of Matrix Analysis Matrix Notation Part Matrix Operations Matrices are simply rectangular arrays of quantities Each quantity in the array is called an element of the matrix and an element can be either a numerical value

More information

SCALABLE HYBRID SPARSE LINEAR SOLVERS

SCALABLE HYBRID SPARSE LINEAR SOLVERS The Pennsylvania State University The Graduate School Department of Computer Science and Engineering SCALABLE HYBRID SPARSE LINEAR SOLVERS A Thesis in Computer Science and Engineering by Keita Teranishi

More information

An exploration of matrix equilibration

An exploration of matrix equilibration An exploration of matrix equilibration Paul Liu Abstract We review three algorithms that scale the innity-norm of each row and column in a matrix to. The rst algorithm applies to unsymmetric matrices,

More information

9. Numerical linear algebra background

9. Numerical linear algebra background Convex Optimization Boyd & Vandenberghe 9. Numerical linear algebra background matrix structure and algorithm complexity solving linear equations with factored matrices LU, Cholesky, LDL T factorization

More information

Linear-Time Algorithms for Finding Tucker Submatrices and Lekkerkerker-Boland Subgraphs

Linear-Time Algorithms for Finding Tucker Submatrices and Lekkerkerker-Boland Subgraphs Linear-Time Algorithms for Finding Tucker Submatrices and Lekkerkerker-Boland Subgraphs Nathan Lindzey, Ross M. McConnell Colorado State University, Fort Collins CO 80521, USA Abstract. Tucker characterized

More information

Direct and Incomplete Cholesky Factorizations with Static Supernodes

Direct and Incomplete Cholesky Factorizations with Static Supernodes Direct and Incomplete Cholesky Factorizations with Static Supernodes AMSC 661 Term Project Report Yuancheng Luo 2010-05-14 Introduction Incomplete factorizations of sparse symmetric positive definite (SSPD)

More information

A Sparse QS-Decomposition for Large Sparse Linear System of Equations

A Sparse QS-Decomposition for Large Sparse Linear System of Equations A Sparse QS-Decomposition for Large Sparse Linear System of Equations Wujian Peng 1 and Biswa N. Datta 2 1 Department of Math, Zhaoqing University, Zhaoqing, China, douglas peng@yahoo.com 2 Department

More information

7. LU factorization. factor-solve method. LU factorization. solving Ax = b with A nonsingular. the inverse of a nonsingular matrix

7. LU factorization. factor-solve method. LU factorization. solving Ax = b with A nonsingular. the inverse of a nonsingular matrix EE507 - Computational Techniques for EE 7. LU factorization Jitkomut Songsiri factor-solve method LU factorization solving Ax = b with A nonsingular the inverse of a nonsingular matrix LU factorization

More information

Algorithms for Fast Linear System Solving and Rank Profile Computation

Algorithms for Fast Linear System Solving and Rank Profile Computation Algorithms for Fast Linear System Solving and Rank Profile Computation by Shiyun Yang A thesis presented to the University of Waterloo in fulfillment of the thesis requirement for the degree of Master

More information

AMS526: Numerical Analysis I (Numerical Linear Algebra)

AMS526: Numerical Analysis I (Numerical Linear Algebra) AMS526: Numerical Analysis I (Numerical Linear Algebra) Lecture 3: Positive-Definite Systems; Cholesky Factorization Xiangmin Jiao Stony Brook University Xiangmin Jiao Numerical Analysis I 1 / 11 Symmetric

More information

FAST STRUCTURED EIGENSOLVER FOR DISCRETIZED PARTIAL DIFFERENTIAL OPERATORS ON GENERAL MESHES

FAST STRUCTURED EIGENSOLVER FOR DISCRETIZED PARTIAL DIFFERENTIAL OPERATORS ON GENERAL MESHES Proceedings of the Project Review, Geo-Mathematical Imaging Group Purdue University, West Lafayette IN, Vol. 1 2012 pp. 123-132. FAST STRUCTURED EIGENSOLVER FOR DISCRETIZED PARTIAL DIFFERENTIAL OPERATORS

More information

MATRIX ALGEBRA AND SYSTEMS OF EQUATIONS. + + x 1 x 2. x n 8 (4) 3 4 2

MATRIX ALGEBRA AND SYSTEMS OF EQUATIONS. + + x 1 x 2. x n 8 (4) 3 4 2 MATRIX ALGEBRA AND SYSTEMS OF EQUATIONS SYSTEMS OF EQUATIONS AND MATRICES Representation of a linear system The general system of m equations in n unknowns can be written a x + a 2 x 2 + + a n x n b a

More information

Multifrontal QR factorization in a multiprocessor. environment. Abstract

Multifrontal QR factorization in a multiprocessor. environment. Abstract Multifrontal QR factorization in a multiprocessor environment P. Amestoy 1, I.S. Du and C. Puglisi Abstract We describe the design and implementation of a parallel QR decomposition algorithm for a large

More information

Matrix Computations: Direct Methods II. May 5, 2014 Lecture 11

Matrix Computations: Direct Methods II. May 5, 2014 Lecture 11 Matrix Computations: Direct Methods II May 5, 2014 ecture Summary You have seen an example of how a typical matrix operation (an important one) can be reduced to using lower level BS routines that would

More information

Lecture 9: Numerical Linear Algebra Primer (February 11st)

Lecture 9: Numerical Linear Algebra Primer (February 11st) 10-725/36-725: Convex Optimization Spring 2015 Lecture 9: Numerical Linear Algebra Primer (February 11st) Lecturer: Ryan Tibshirani Scribes: Avinash Siravuru, Guofan Wu, Maosheng Liu Note: LaTeX template

More information

Downloaded 08/28/12 to Redistribution subject to SIAM license or copyright; see

Downloaded 08/28/12 to Redistribution subject to SIAM license or copyright; see SIAMJ.MATRIX ANAL. APPL. c 1997 Society for Industrial and Applied Mathematics Vol. 18, No. 1, pp. 140 158, January 1997 012 AN UNSYMMETRIC-PATTERN MULTIFRONTAL METHOD FOR SPARSE LU FACTORIZATION TIMOTHY

More information

Computation of the mtx-vec product based on storage scheme on vector CPUs

Computation of the mtx-vec product based on storage scheme on vector CPUs BLAS: Basic Linear Algebra Subroutines BLAS: Basic Linear Algebra Subroutines BLAS: Basic Linear Algebra Subroutines Analysis of the Matrix Computation of the mtx-vec product based on storage scheme on

More information

Downloaded 08/28/12 to Redistribution subject to SIAM license or copyright; see

Downloaded 08/28/12 to Redistribution subject to SIAM license or copyright; see SIAM J. MATRIX ANAL. APPL. Vol. 22, No. 4, pp. 997 1013 c 2001 Society for Industrial and Applied Mathematics MULTIPLE-RANK MODIFICATIONS OF A SPARSE CHOLESKY FACTORIZATION TIMOTHY A. DAVIS AND WILLIAM

More information

Incomplete Cholesky preconditioners that exploit the low-rank property

Incomplete Cholesky preconditioners that exploit the low-rank property anapov@ulb.ac.be ; http://homepages.ulb.ac.be/ anapov/ 1 / 35 Incomplete Cholesky preconditioners that exploit the low-rank property (theory and practice) Artem Napov Service de Métrologie Nucléaire, Université

More information

Fundamentals of Engineering Analysis (650163)

Fundamentals of Engineering Analysis (650163) Philadelphia University Faculty of Engineering Communications and Electronics Engineering Fundamentals of Engineering Analysis (6563) Part Dr. Omar R Daoud Matrices: Introduction DEFINITION A matrix is

More information

Numerical Methods - Numerical Linear Algebra

Numerical Methods - Numerical Linear Algebra Numerical Methods - Numerical Linear Algebra Y. K. Goh Universiti Tunku Abdul Rahman 2013 Y. K. Goh (UTAR) Numerical Methods - Numerical Linear Algebra I 2013 1 / 62 Outline 1 Motivation 2 Solving Linear

More information

A sparse multifrontal solver using hierarchically semi-separable frontal matrices

A sparse multifrontal solver using hierarchically semi-separable frontal matrices A sparse multifrontal solver using hierarchically semi-separable frontal matrices Pieter Ghysels Lawrence Berkeley National Laboratory Joint work with: Xiaoye S. Li (LBNL), Artem Napov (ULB), François-Henry

More information

Linear Solvers. Andrew Hazel

Linear Solvers. Andrew Hazel Linear Solvers Andrew Hazel Introduction Thus far we have talked about the formulation and discretisation of physical problems...... and stopped when we got to a discrete linear system of equations. Introduction

More information

Solving PDEs with CUDA Jonathan Cohen

Solving PDEs with CUDA Jonathan Cohen Solving PDEs with CUDA Jonathan Cohen jocohen@nvidia.com NVIDIA Research PDEs (Partial Differential Equations) Big topic Some common strategies Focus on one type of PDE in this talk Poisson Equation Linear

More information

Using Laplacian Eigenvalues and Eigenvectors in the Analysis of Frequency Assignment Problems

Using Laplacian Eigenvalues and Eigenvectors in the Analysis of Frequency Assignment Problems Using Laplacian Eigenvalues and Eigenvectors in the Analysis of Frequency Assignment Problems Jan van den Heuvel and Snežana Pejić Department of Mathematics London School of Economics Houghton Street,

More information

A dissection solver with kernel detection for unsymmetric matrices in FreeFem++

A dissection solver with kernel detection for unsymmetric matrices in FreeFem++ . p.1/21 11 Dec. 2014, LJLL, Paris FreeFem++ workshop A dissection solver with kernel detection for unsymmetric matrices in FreeFem++ Atsushi Suzuki Atsushi.Suzuki@ann.jussieu.fr Joint work with François-Xavier

More information

An approximate minimum degree algorithm for matrices with dense rows

An approximate minimum degree algorithm for matrices with dense rows RT/APO/08/02 An approximate minimum degree algorithm for matrices with dense rows P. R. Amestoy 1, H. S. Dollar 2, J. K. Reid 2, J. A. Scott 2 ABSTRACT We present a modified version of the approximate

More information

Bridging the gap between flat and hierarchical low-rank matrix formats: the multilevel BLR format

Bridging the gap between flat and hierarchical low-rank matrix formats: the multilevel BLR format Bridging the gap between flat and hierarchical low-rank matrix formats: the multilevel BLR format Amestoy, Patrick and Buttari, Alfredo and L Excellent, Jean-Yves and Mary, Theo 2018 MIMS EPrint: 2018.12

More information

Direct Methods for Solving Linear Systems. Simon Fraser University Surrey Campus MACM 316 Spring 2005 Instructor: Ha Le

Direct Methods for Solving Linear Systems. Simon Fraser University Surrey Campus MACM 316 Spring 2005 Instructor: Ha Le Direct Methods for Solving Linear Systems Simon Fraser University Surrey Campus MACM 316 Spring 2005 Instructor: Ha Le 1 Overview General Linear Systems Gaussian Elimination Triangular Systems The LU Factorization

More information

arxiv: v1 [cs.na] 20 Jul 2015

arxiv: v1 [cs.na] 20 Jul 2015 AN EFFICIENT SOLVER FOR SPARSE LINEAR SYSTEMS BASED ON RANK-STRUCTURED CHOLESKY FACTORIZATION JEFFREY N. CHADWICK AND DAVID S. BINDEL arxiv:1507.05593v1 [cs.na] 20 Jul 2015 Abstract. Direct factorization

More information

Towards a stable static pivoting strategy for the sequential and parallel solution of sparse symmetric indefinite systems

Towards a stable static pivoting strategy for the sequential and parallel solution of sparse symmetric indefinite systems Technical Report RAL-TR-2005-007 Towards a stable static pivoting strategy for the sequential and parallel solution of sparse symmetric indefinite systems Iain S. Duff and Stéphane Pralet April 22, 2005

More information

Numerical Methods I Non-Square and Sparse Linear Systems

Numerical Methods I Non-Square and Sparse Linear Systems Numerical Methods I Non-Square and Sparse Linear Systems Aleksandar Donev Courant Institute, NYU 1 donev@courant.nyu.edu 1 MATH-GA 2011.003 / CSCI-GA 2945.003, Fall 2014 September 25th, 2014 A. Donev (Courant

More information

STAT 309: MATHEMATICAL COMPUTATIONS I FALL 2018 LECTURE 13

STAT 309: MATHEMATICAL COMPUTATIONS I FALL 2018 LECTURE 13 STAT 309: MATHEMATICAL COMPUTATIONS I FALL 208 LECTURE 3 need for pivoting we saw that under proper circumstances, we can write A LU where 0 0 0 u u 2 u n l 2 0 0 0 u 22 u 2n L l 3 l 32, U 0 0 0 l n l

More information

Undirected Graphs. V = { 1, 2, 3, 4, 5, 6, 7, 8 } E = { 1-2, 1-3, 2-3, 2-4, 2-5, 3-5, 3-7, 3-8, 4-5, 5-6 } n = 8 m = 11

Undirected Graphs. V = { 1, 2, 3, 4, 5, 6, 7, 8 } E = { 1-2, 1-3, 2-3, 2-4, 2-5, 3-5, 3-7, 3-8, 4-5, 5-6 } n = 8 m = 11 Undirected Graphs Undirected graph. G = (V, E) V = nodes. E = edges between pairs of nodes. Captures pairwise relationship between objects. Graph size parameters: n = V, m = E. V = {, 2, 3,,,, 7, 8 } E

More information

Pivoting. Reading: GV96 Section 3.4, Stew98 Chapter 3: 1.3

Pivoting. Reading: GV96 Section 3.4, Stew98 Chapter 3: 1.3 Pivoting Reading: GV96 Section 3.4, Stew98 Chapter 3: 1.3 In the previous discussions we have assumed that the LU factorization of A existed and the various versions could compute it in a stable manner.

More information

Using Random Butterfly Transformations to Avoid Pivoting in Sparse Direct Methods

Using Random Butterfly Transformations to Avoid Pivoting in Sparse Direct Methods Using Random Butterfly Transformations to Avoid Pivoting in Sparse Direct Methods Marc Baboulin 1, Xiaoye S. Li 2 and François-Henry Rouet 2 1 University of Paris-Sud, Inria Saclay, France 2 Lawrence Berkeley

More information

Directed Graphs (Digraphs) and Graphs

Directed Graphs (Digraphs) and Graphs Directed Graphs (Digraphs) and Graphs Definitions Graph ADT Traversal algorithms DFS Lecturer: Georgy Gimel farb COMPSCI 220 Algorithms and Data Structures 1 / 74 1 Basic definitions 2 Digraph Representation

More information

Sparse factorization using low rank submatrices. Cleve Ashcraft LSTC 2010 MUMPS User Group Meeting April 15-16, 2010 Toulouse, FRANCE

Sparse factorization using low rank submatrices. Cleve Ashcraft LSTC 2010 MUMPS User Group Meeting April 15-16, 2010 Toulouse, FRANCE Sparse factorization using low rank submatrices Cleve Ashcraft LSTC cleve@lstc.com 21 MUMPS User Group Meeting April 15-16, 21 Toulouse, FRANCE ftp.lstc.com:outgoing/cleve/mumps1 Ashcraft.pdf 1 LSTC Livermore

More information

Parallel Scientific Computing

Parallel Scientific Computing IV-1 Parallel Scientific Computing Matrix-vector multiplication. Matrix-matrix multiplication. Direct method for solving a linear equation. Gaussian Elimination. Iterative method for solving a linear equation.

More information

Multifrontal Incomplete Factorization for Indefinite and Complex Symmetric Systems

Multifrontal Incomplete Factorization for Indefinite and Complex Symmetric Systems Multifrontal Incomplete Factorization for Indefinite and Complex Symmetric Systems Yong Qu and Jacob Fish Departments of Civil, Mechanical and Aerospace Engineering Rensselaer Polytechnic Institute, Troy,

More information

Numerical Linear Algebra Primer. Ryan Tibshirani Convex Optimization /36-725

Numerical Linear Algebra Primer. Ryan Tibshirani Convex Optimization /36-725 Numerical Linear Algebra Primer Ryan Tibshirani Convex Optimization 10-725/36-725 Last time: proximal gradient descent Consider the problem min g(x) + h(x) with g, h convex, g differentiable, and h simple

More information

MTH 464: Computational Linear Algebra

MTH 464: Computational Linear Algebra MTH 464: Computational Linear Algebra Lecture Outlines Exam 2 Material Prof. M. Beauregard Department of Mathematics & Statistics Stephen F. Austin State University February 6, 2018 Linear Algebra (MTH

More information

Exploiting Fill-in and Fill-out in Gaussian-like Elimination Procedures on the Extended Jacobian Matrix

Exploiting Fill-in and Fill-out in Gaussian-like Elimination Procedures on the Extended Jacobian Matrix 2nd European Workshop on AD 1 Exploiting Fill-in and Fill-out in Gaussian-like Elimination Procedures on the Extended Jacobian Matrix Andrew Lyons (Vanderbilt U.) / Uwe Naumann (RWTH Aachen) 2nd European

More information

Matrix Assembly in FEA

Matrix Assembly in FEA Matrix Assembly in FEA 1 In Chapter 2, we spoke about how the global matrix equations are assembled in the finite element method. We now want to revisit that discussion and add some details. For example,

More information

Numerical Methods I Solving Square Linear Systems: GEM and LU factorization

Numerical Methods I Solving Square Linear Systems: GEM and LU factorization Numerical Methods I Solving Square Linear Systems: GEM and LU factorization Aleksandar Donev Courant Institute, NYU 1 donev@courant.nyu.edu 1 MATH-GA 2011.003 / CSCI-GA 2945.003, Fall 2014 September 18th,

More information

MAA507, Power method, QR-method and sparse matrix representation.

MAA507, Power method, QR-method and sparse matrix representation. ,, and representation. February 11, 2014 Lecture 7: Overview, Today we will look at:.. If time: A look at representation and fill in. Why do we need numerical s? I think everyone have seen how time consuming

More information

AN EFFICIENT APPROACH FOR MULTIFRONTAL AL- GORITHM TO SOLVE NON-POSITIVE-DEFINITE FI- NITE ELEMENT EQUATIONS IN ELECTROMAGNETIC PROBLEMS

AN EFFICIENT APPROACH FOR MULTIFRONTAL AL- GORITHM TO SOLVE NON-POSITIVE-DEFINITE FI- NITE ELEMENT EQUATIONS IN ELECTROMAGNETIC PROBLEMS Progress In Electromagnetics Research, PIER 95, 2 33, 29 AN EFFICIENT APPROACH FOR MULTIFRONTAL AL- GORITHM TO SOLVE NON-POSITIVE-DEFINITE FI- NITE ELEMENT EQUATIONS IN ELECTROMAGNETIC PROBLEMS J. Tian,

More information

The Solution of Linear Systems AX = B

The Solution of Linear Systems AX = B Chapter 2 The Solution of Linear Systems AX = B 21 Upper-triangular Linear Systems We will now develop the back-substitution algorithm, which is useful for solving a linear system of equations that has

More information

Notes on the Matrix-Tree theorem and Cayley s tree enumerator

Notes on the Matrix-Tree theorem and Cayley s tree enumerator Notes on the Matrix-Tree theorem and Cayley s tree enumerator 1 Cayley s tree enumerator Recall that the degree of a vertex in a tree (or in any graph) is the number of edges emanating from it We will

More information

Graph Theorizing Peg Solitaire. D. Paul Hoilman East Tennessee State University

Graph Theorizing Peg Solitaire. D. Paul Hoilman East Tennessee State University Graph Theorizing Peg Solitaire D. Paul Hoilman East Tennessee State University December 7, 00 Contents INTRODUCTION SIMPLE SOLVING CONCEPTS 5 IMPROVED SOLVING 7 4 RELATED GAMES 5 5 PROGENATION OF SOLVABLE

More information

k-protected VERTICES IN BINARY SEARCH TREES

k-protected VERTICES IN BINARY SEARCH TREES k-protected VERTICES IN BINARY SEARCH TREES MIKLÓS BÓNA Abstract. We show that for every k, the probability that a randomly selected vertex of a random binary search tree on n nodes is at distance k from

More information

Chapter 7 Network Flow Problems, I

Chapter 7 Network Flow Problems, I Chapter 7 Network Flow Problems, I Network flow problems are the most frequently solved linear programming problems. They include as special cases, the assignment, transportation, maximum flow, and shortest

More information

Sparse Linear Systems. Iterative Methods for Sparse Linear Systems. Motivation for Studying Sparse Linear Systems. Partial Differential Equations

Sparse Linear Systems. Iterative Methods for Sparse Linear Systems. Motivation for Studying Sparse Linear Systems. Partial Differential Equations Sparse Linear Systems Iterative Methods for Sparse Linear Systems Matrix Computations and Applications, Lecture C11 Fredrik Bengzon, Robert Söderlund We consider the problem of solving the linear system

More information

APPARC PaA3a Deliverable. ESPRIT BRA III Contract # Reordering of Sparse Matrices for Parallel Processing. Achim Basermannn.

APPARC PaA3a Deliverable. ESPRIT BRA III Contract # Reordering of Sparse Matrices for Parallel Processing. Achim Basermannn. APPARC PaA3a Deliverable ESPRIT BRA III Contract # 6634 Reordering of Sparse Matrices for Parallel Processing Achim Basermannn Peter Weidner Zentralinstitut fur Angewandte Mathematik KFA Julich GmbH D-52425

More information