Block-tridiagonal matrices (p. 1/31)
Block-tridiagonal matrices: where do these arise?
- as a result of a particular mesh-point ordering;
- as part of a factorization procedure, for example when we compute the eigenvalues of a matrix.
Block-tridiagonal matrices

[Figure: a two-dimensional domain partitioned into strips $\Omega_1$, $\Omega_2$, $\Omega_3$.]

Consider a two-dimensional domain partitioned into strips. Assume that points on the lines of intersection are coupled only to their nearest neighbors in the underlying mesh (and that we do not have periodic boundary conditions). Hence, there is no coupling between subdomains except through the "glue" on the interfaces.
Block-tridiagonal matrices

When the subdomains are ordered lexicographically from left to right, a subdomain $\Omega_i$ becomes coupled only to its predecessor $\Omega_{i-1}$ and its successor $\Omega_{i+1}$, and the corresponding matrix takes the form of a block-tridiagonal matrix $A = \mathrm{tridiag}(A_{i,i-1}, A_{ii}, A_{i,i+1})$, or

$$A = \begin{pmatrix}
A_{11} & A_{12} & & 0\\
A_{21} & A_{22} & A_{23} & \\
& \ddots & \ddots & \ddots\\
0 & & A_{n,n-1} & A_{nn}
\end{pmatrix}.$$

For definiteness we let the boundary meshline $\overline{\Omega}_i \cap \overline{\Omega}_{i+1}$ belong to $\Omega_i$. In order to preserve the sparsity pattern we shall factor $A$ without use of permutations. Naturally, the lines of intersection do not have to be straight.
Block-tridiagonal matrices

How do we factorize a (block-)tridiagonal matrix?
Let $A$ be block-tridiagonal, expressed as $A = D_A - L_A - U_A$.

Convenient: seek $D$, $L$, $U$ such that $A = (D - L)\,D^{-1}(D - U)$, where $D$ is (block) diagonal, $L = L_A$ and $U = U_A$.

Direct computation:
$$(D - L)D^{-1}(D - U) = D - L - U + L D^{-1} U = D_A - L_A - U_A,$$
i.e., $D_A = D + L D^{-1} U$.

Important: $L$ and $U$ are strictly lower and upper triangular.
$A = (D - L)D^{-1}(D - U)$ for pointwise tridiagonal matrices

The relation $D_A = D + L D^{-1} U$ reads

$$\mathrm{diag}(a_{11},\dots,a_{nn}) = \mathrm{diag}(d_1,\dots,d_n)
+ \begin{pmatrix} 0 & & \\ a_{21} & \ddots & \\ & a_{n,n-1} & 0 \end{pmatrix}
\mathrm{diag}(d_1^{-1},\dots,d_n^{-1})
\begin{pmatrix} 0 & a_{12} & \\ & \ddots & a_{n-1,n} \\ & & 0 \end{pmatrix}.$$

Factorization algorithm:
$$d_1 = a_{11}, \qquad d_i = a_{ii} - a_{i,i-1}\, d_{i-1}^{-1}\, a_{i-1,i}, \quad i = 2, \dots, n.$$
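The pivot recursion $d_1 = a_{11}$, $d_i = a_{ii} - a_{i,i-1} d_{i-1}^{-1} a_{i-1,i}$ can be sketched as follows (a minimal scalar illustration; the function name and the three-array storage of the tridiagonal matrix are our own convention, not from the slides):

```python
def factor_tridiag(diag, sub, sup):
    """Pivots d_i of the factorization A = (D - L) D^{-1} (D - U) of a
    tridiagonal matrix with diag[i] = a_{i,i}, sub[i] = a_{i+1,i} and
    sup[i] = a_{i,i+1} (0-based arrays).
    Recursion: d_1 = a_{11}, d_i = a_{ii} - a_{i,i-1} d_{i-1}^{-1} a_{i-1,i}."""
    d = [diag[0]]
    for i in range(1, len(diag)):
        d.append(diag[i] - sub[i - 1] * sup[i - 1] / d[i - 1])
    return d
```

For example, for $\mathrm{tridiag}(-1, 2, -1)$ of order 3 the pivots come out as $2, 3/2, 4/3$.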
$A = (D - L)D^{-1}(D - U)$ for pointwise tridiagonal matrices

Solution of systems with $(D - L)D^{-1}(D - U)\,x = b$.
Block-tridiagonal matrices

Let $A$ be block-tridiagonal, expressed as $A = D_A - L - U$. One can envisage three major versions of the factorization algorithm:

(i) $A = (D - L)\,D^{-1}(D - U)$
(ii) $A = (D - L)(I - \tilde U)$
(iii) $A = (I - \tilde L)\,D\,(I - \tilde U)$ (inverse-free substitutions),

where $\tilde L D = L$ and $D \tilde U = U$. In all versions $D = \mathrm{diag}(D_1, \dots, D_n)$ is obtained from $D_1 = A_{11}$, $D_i = A_{ii} - A_{i,i-1} D_{i-1}^{-1} A_{i-1,i}$.

In (iii), since $\tilde U$ is strictly (block) upper triangular and hence nilpotent,
$$(I - \tilde U)^{-1} = I + \tilde U + \tilde U^2 + \cdots = (I + \tilde U)(I + \tilde U^2)(I + \tilde U^4)\cdots,$$
and similarly for $(I - \tilde L)^{-1}$.
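The inverse-free substitution idea exploits that a strictly triangular factor is nilpotent. A small pure-Python sketch (helper names are ours) of the product expansion $(I - U)^{-1} = (I + U)(I + U^2)(I + U^4)\cdots$:

```python
def matmul(A, B):
    """Dense product of two square matrices stored as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def inv_unit_upper(U):
    """Invert I - U for a strictly upper triangular U without any explicit
    inversion, via (I - U)^{-1} = (I + U)(I + U^2)(I + U^4)...
    Since U^n = 0, only about log2(n) factors are needed."""
    n = len(U)
    I = [[float(i == j) for j in range(n)] for i in range(n)]
    result, P, k = I, U, 1
    while k < n:  # powers of U commute, so the multiplication order is free
        result = matmul(result, [[I[i][j] + P[i][j] for j in range(n)]
                                 for i in range(n)])
        P = matmul(P, P)
        k *= 2
    return result
```

Each factor $(I + U^{2^k})$ only needs matrix-matrix products, which parallelize well; that is the point of version (iii).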
Existence of factorization for block-tridiagonal matrices

We assume that the matrices are real. It can be shown that the pivot block arising at stage $r$ is always nonsingular for two important classes of matrices, namely for matrices $A$ (of order $n$) which are
- positive definite, i.e., $x^T A x > 0$ for all $x \in \mathbb{R}^n$, $x \neq 0$;
- blockwise generalized diagonally dominant (also called block $H$-matrices), i.e., for which the diagonal blocks $A_{ii}$ are nonsingular and
$$\|A_{ii}^{-1}\|^{-1} \ge \|A_{i-1,i}\| + \|A_{i+1,i}\|, \quad i = 1, 2, \dots, n$$
(here $A_{01} = 0$, $A_{n+1,n} = 0$).
A factorization passes through stages $r = 1, 2, \dots, n$

For two important classes of matrices it holds that the successive pivot matrices which arise after every factorization stage are nonsingular.

At every stage the current matrix $A^{(r)}$ is partitioned into $2 \times 2$ blocks,

$$A^{(1)} = A = \begin{pmatrix}
A_{11} & A_{12} & & 0\\
A_{21} & A_{22} & A_{23} & \\
& \ddots & \ddots & \ddots\\
0 & & A_{n,n-1} & A_{nn}
\end{pmatrix}
= \begin{pmatrix} A^{(1)}_{11} & A^{(1)}_{12} \\[2pt] A^{(1)}_{21} & A^{(1)}_{22} \end{pmatrix}.$$

At the $r$th stage we compute $\bigl(A^{(r)}_{11}\bigr)^{-1}$ and factor $A^{(r)}$,

$$A^{(r)} = \begin{pmatrix} I & 0 \\ A^{(r)}_{21} \bigl(A^{(r)}_{11}\bigr)^{-1} & I \end{pmatrix}
\begin{pmatrix} A^{(r)}_{11} & A^{(r)}_{12} \\ 0 & A^{(r+1)} \end{pmatrix},$$

where $A^{(r+1)} = A^{(r)}_{22} - A^{(r)}_{21} \bigl(A^{(r)}_{11}\bigr)^{-1} A^{(r)}_{12}$ is the so-called Schur complement.
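One stage can be sketched by forming the Schur complement explicitly. The helper below (our notation, a scalar illustration) eliminates the first p rows and columns by Gaussian elimination without pivoting; the trailing block that remains is exactly $A_{22} - A_{21} A_{11}^{-1} A_{12}$:

```python
def schur_complement(A, p):
    """Return S = A22 - A21 A11^{-1} A12 for the 2x2 block partition of A
    at block size p, by eliminating the first p columns with Gaussian
    elimination (no pivoting, as on the slide)."""
    n = len(A)
    M = [row[:] for row in A]           # work on a copy
    for k in range(p):                  # eliminate column k below row k
        for i in range(k + 1, n):
            m = M[i][k] / M[k][k]
            for j in range(k, n):
                M[i][j] -= m * M[k][j]
    return [row[p:] for row in M[p:]]   # trailing block = Schur complement
```

For a tridiagonal matrix only the top-left entry of the trailing block actually changes, which is the observation used on the next slide.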
Existence of factorization for block-tridiagonal matrices

The factorization of a block matrix is equivalent to its block Gaussian elimination. Note then that the only block in $A^{(r)}_{22}$ which is affected by the elimination (of the block $A^{(r)}_{21}$) is the top block of the block-tridiagonal matrix $A^{(r)}_{22}$, i.e., $A^{(r+1)}_{11}$, the new pivot matrix.

We show that for the above matrix classes the Schur complement $A^{(r+1)} = A^{(r)}_{22} - A^{(r)}_{21} \bigl(A^{(r)}_{11}\bigr)^{-1} A^{(r)}_{12}$ belongs to the same class as $A^{(r)}$, i.e., in particular that the pivot blocks are nonsingular.
Lemma 1. Let $A = \begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix}$ be positive definite. Then $A_{ii}$, $i = 1, 2$, and the Schur complement $S = A_{22} - A_{21} A_{11}^{-1} A_{12}$ are also positive definite.

Proof. There holds $x^T A x = x_1^T A_{11} x_1$ for all $x = \begin{pmatrix} x_1 \\ 0 \end{pmatrix}$. Hence $x_1^T A_{11} x_1 > 0$ for all $x_1 \neq 0$, i.e., $A_{11}$ is positive definite. Similarly, it can be shown that $A_{22}$ is positive definite.

Since $A$ is nonsingular, $x^T A^{-1} x = y^T A^T A^{-1} A y = y^T A^T y = y^T A y$ for $y = A^{-1} x$, so $x^T A^{-1} x > 0$ for all $x \neq 0$, i.e., the inverse of $A$ is also positive definite.

Use now the explicit form of the inverse computed by use of the factorization,

$$A^{-1} = \begin{pmatrix} I & -A_{11}^{-1} A_{12} \\ 0 & I \end{pmatrix}
\begin{pmatrix} A_{11}^{-1} & 0 \\ 0 & S^{-1} \end{pmatrix}
\begin{pmatrix} I & 0 \\ -A_{21} A_{11}^{-1} & I \end{pmatrix}
= \begin{pmatrix} * & * \\ * & S^{-1} \end{pmatrix},$$

where $*$ indicates entries not important for the present discussion. Hence, since $A^{-1}$ is positive definite, so is its diagonal block $S^{-1}$. Hence the inverse of $S^{-1}$, and therefore also $S$, is positive definite. $\square$
Corollary 1. When $A^{(r)}$ is positive definite, $A^{(r+1)}$ and, in particular, $A^{(r+1)}_{11}$ are positive definite.

Proof. $A^{(r+1)}$ is a Schur complement of $A^{(r)}$, so by Lemma 1, $A^{(r+1)}$ is positive definite when $A^{(r)}$ is. In particular, its top diagonal block is positive definite. $\square$
Lemma 2. Let $A = \begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix}$ be blockwise generalized diagonally dominant, where $A$ is block-tridiagonal. Then the Schur complement $S = A_{22} - A_{21} A_{11}^{-1} A_{12}$ is also blockwise generalized diagonally dominant.

Proof (hint). Since the only block in $S$ which has changed from $A_{22}$ is its top block, which becomes the new pivot $A^{(r+1)}_{11}$, it suffices to show that this block is nonsingular and that the first block column of $S$ is generalized diagonally dominant.
Linear recursions

Consider the solution of the linear system of equations $Ax = b$, where $A$ has already been factorized as $A = LU$. The matrices $L = (l_{ij})$ and $U = (u_{ij})$ are lower and upper triangular, respectively (for simplicity we assume unit diagonals; otherwise divide by $l_{ii}$ and $u_{ii}$, respectively). To compute $x$, we must perform two steps:

forward substitution: $Lz = b$, i.e.,
$$z_1 = b_1, \qquad z_i = b_i - \sum_{j=1}^{i-1} l_{ij} z_j, \quad i = 2, 3, \dots, n;$$

backward substitution: $Ux = z$, i.e.,
$$x_n = z_n, \qquad x_i = z_i - \sum_{j=i+1}^{n} u_{ij} x_j, \quad i = n-1, n-2, \dots, 1.$$
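In code the two sweeps look as follows (a minimal sketch assuming unit diagonals in both factors, matching the formulas above; the names are ours):

```python
def forward_substitution(L, b):
    """Solve L z = b for unit lower triangular L:
    z_1 = b_1, z_i = b_i - sum_{j<i} l_ij z_j."""
    z = []
    for i in range(len(b)):
        z.append(b[i] - sum(L[i][j] * z[j] for j in range(i)))
    return z

def backward_substitution(U, z):
    """Solve U x = z for unit upper triangular U:
    x_n = z_n, x_i = z_i - sum_{j>i} u_ij x_j."""
    n = len(z)
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = z[i] - sum(U[i][j] * x[j] for j in range(i + 1, n))
    return x
```

Note how each `z[i]` depends on all earlier `z[j]`: this data dependence is exactly what makes the recursion sequential, as discussed next.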
While the implementation of the forward and backward substitution on a serial computer is trivial, implementing them on a vector or parallel computer system is problematic. The reason is that these relations are particular examples of a linear recursion, which is an inherently sequential process. A general $m$-level recurrence relation reads
$$x_i = a_{i,1} x_{i-1} + a_{i,2} x_{i-2} + \cdots + a_{i,m} x_{i-m} + b_i,$$
and the performance of its straightforward vector or parallel implementation is degraded by the backward data dependencies.
Block-tridiagonal matrices

Can we somehow speed up the solution of systems with bi- or tridiagonal matrices?
Multifrontal solution methods

[Figure: (a) the two-way frontal method, with the unknowns numbered 1, 3, 5, 7, 9, ... from one end and 2, 4, 6, 8, ... from the other; $x_{n_0}$ is the middle node. (b) The structure of the correspondingly reordered matrix.]

Any tridiagonal or block-tridiagonal matrix can be attacked in parallel from both ends, after a proper numbering of the unknowns. It can be seen that we can work independently on the odd-numbered and even-numbered points until we have eliminated all entries except the final corner one.
Hence, the factorization and the forward substitution can proceed in parallel for the two fronts (the even and the odd). At the final point we can either continue in parallel with the back substitution to compute the solution at all the other interior points, or we can apply the same type of two-way frontal method to each of the two pieces which have been split off by the already computed solution at the middle point.

This method of recursively dividing the domain into smaller and smaller pieces, which can all be handled in parallel, can be continued for $\log_2 n$ steps, after which we have just one unknown per subinterval.
The idea of performing Gaussian elimination from both ends of a tridiagonal matrix, also called twisted factorization, was first proposed by Babuška in 1972. Note that in this method no back substitution is required.
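A sketch of the two-way elimination for a scalar tridiagonal system (our implementation, not from the slides): the two sweeps run from both ends and meet at the middle unknown, which is computed first; the substitution then proceeds outward, again in parallel for the two halves.

```python
def twisted_solve(diag, sub, sup, rhs):
    """Two-way (twisted) elimination for a tridiagonal system with
    diag[i] = a_{ii}, sub[i] = a_{i+1,i}, sup[i] = a_{i,i+1} (0-based)."""
    n = len(diag)
    d, g, m = diag[:], rhs[:], n // 2
    for i in range(1, m):                 # front sweeping down from the top
        t = sub[i - 1] / d[i - 1]
        d[i] -= t * sup[i - 1]
        g[i] -= t * g[i - 1]
    for i in range(n - 2, m, -1):         # front sweeping up from the bottom
        t = sup[i] / d[i + 1]
        d[i] -= t * sub[i]
        g[i] -= t * g[i + 1]
    if m > 0:                             # the two fronts meet at index m
        t = sub[m - 1] / d[m - 1]
        d[m] -= t * sup[m - 1]
        g[m] -= t * g[m - 1]
    if m < n - 1:
        t = sup[m] / d[m + 1]
        d[m] -= t * sub[m]
        g[m] -= t * g[m + 1]
    x = [0.0] * n
    x[m] = g[m] / d[m]                    # middle unknown solved first
    for i in range(m - 1, -1, -1):        # substitute outward, both directions
        x[i] = (g[i] - sup[i] * x[i + 1]) / d[i]
    for i in range(m + 1, n):
        x[i] = (g[i] - sub[i - 1] * x[i - 1]) / d[i]
    return x
```

The two `for` loops in each phase are independent of one another, so they can run on two processors concurrently.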
Odd-even elimination / cyclic reduction / divide-and-conquer

We sketch some parallel computation methods for recurrence relations. The methods are applicable to general (block-)band matrices. For simplicity of presentation, the idea is illustrated on one-level and two-level scalar recursions:
$$x_1 = b_1, \qquad x_i = a_i x_{i-1} + b_i, \quad i = 2, 3, \dots, n;$$
$$a_{i,i-1} x_{i-1} + a_{ii} x_i + a_{i,i+1} x_{i+1} = b_i, \quad i = 1, 2, \dots, n, \qquad a_{10} = a_{n,n+1} = 0.$$
The corresponding matrix-vector equivalent of the above recursions is to solve a system $Ax = b$, where $A$ is lower bidiagonal and tridiagonal, respectively.
An idea to gain some parallelism when solving linear recursions is to reduce the size of the corresponding linear system by eliminating the odd-indexed unknowns from the even-numbered equations (or vice versa). This elimination can be done in parallel for all equations, because the odd-numbered equations are mutually uncoupled, and so are the even-numbered ones. The same elimination can then be applied to the reduced system, and so on. Every elimination step reduces the number of coupled equations to about half its previous order, and eventually we are left with a single equation or a system of uncoupled equations.

[Figure: unknowns 1-7; after one elimination step only the even-numbered unknowns 2, 4, 6 remain coupled.]
In the odd-even elimination (or odd-even reduction) method we eliminate the odd-numbered unknowns (i.e., numbers $\equiv 1 \pmod 2$) and are left with a tridiagonal system for the even-numbered (i.e., numbers $\equiv 0 \pmod 2$) unknowns. The method is repeated: we eliminate the unknowns $\equiv 2 \pmod 4$ and are left with the unknowns $\equiv 0 \pmod 4$, and so on. Eventually we are left with just a single equation, which we solve. At this point we can use back substitution to compute the remaining unknowns.
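The reduction can be sketched recursively (our implementation, 0-based arrays: `sub[i]` and `sup[i]` multiply $x_{i-1}$ and $x_{i+1}$ in equation i, with `sub[0] = sup[n-1] = 0`):

```python
def cyclic_reduction(diag, sub, sup, rhs):
    """Odd-even (cyclic) reduction for a tridiagonal system.  The 1-based
    odd unknowns (0-based even indices) are eliminated; the half-size
    system for the remaining unknowns is solved recursively; finally the
    eliminated unknowns are recovered by back substitution."""
    n = len(diag)
    if n == 1:
        return [rhs[0] / diag[0]]
    ra, rb, rc, rf = [], [], [], []
    for j in range(1, n, 2):              # the unknowns that are kept
        a2, b2, c2, f2 = diag[j], sub[j], sup[j], rhs[j]
        a_new = a2 - b2 * sup[j - 1] / diag[j - 1]
        b_new = -b2 * sub[j - 1] / diag[j - 1]
        f_new = f2 - b2 * rhs[j - 1] / diag[j - 1]
        c_new = c2
        if j + 1 < n:                     # eliminate the neighbour above too
            a_new -= c2 * sub[j + 1] / diag[j + 1]
            c_new = -c2 * sup[j + 1] / diag[j + 1]
            f_new -= c2 * rhs[j + 1] / diag[j + 1]
        ra.append(a_new); rb.append(b_new); rc.append(c_new); rf.append(f_new)
    y = cyclic_reduction(ra, rb, rc, rf)  # reduced system of half the size
    x = [0.0] * n
    for k, j in enumerate(range(1, n, 2)):
        x[j] = y[k]
    for i in range(0, n, 2):              # back-substitute eliminated unknowns
        t = rhs[i]
        if i > 0:
            t -= sub[i] * x[i - 1]
        if i + 1 < n:
            t -= sup[i] * x[i + 1]
        x[i] = t / diag[i]
    return x
```

The loop over `j` has no dependencies between iterations, which is the source of parallelism; the recursion depth is about $\log_2 n$.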
...the odd-even simultaneous...

There exists a second version of this method, called the odd-even simultaneous elimination. In it we eliminate the odd-numbered unknowns from the even-numbered equations and, simultaneously, the even-numbered unknowns from the odd-numbered equations. In this way we are left with two decoupled systems, one for the even-numbered unknowns and one for the odd-numbered unknowns. The same method can then be applied recursively to these two sets in parallel. Hence, in this method we do not reduce the size of the problem; instead we successively decouple it into smaller and smaller subproblems. Eventually we arrive at a system in diagonal form, which we solve for all unknowns in parallel. Therefore, in this method there is no need to perform back substitution.
...the odd-even...

[Figure: two elimination steps of the simultaneous elimination method on eight unknowns.]
...the odd-even...

The computational complexity of the sequential LU factorization with forward and back substitution for tridiagonal matrices is $8n$ flops. In the odd-even simultaneous elimination we perform $9n \log_2 n$ flops to transform the system and $n$ flops to solve the final diagonal system. Hence the redundancy of the odd-even simultaneous elimination method is about $\frac{9}{8} \log_2 n$, which is the price we pay to get a fully parallel method.
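To make the comparison concrete, take for instance $n = 1024 = 2^{10}$ (our numbers, inserted purely for illustration):

```latex
\text{sequential: } 8n = 8192 \text{ flops}, \qquad
\text{odd-even: } 9n\log_2 n + n = 92160 + 1024 = 93184 \text{ flops},
```

so the redundancy is $93184 / 8192 \approx 11.4 \approx \tfrac{9}{8}\log_2 n$; the extra work is acceptable because every one of the $O(\log_2 n)$ steps acts on all equations simultaneously.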
FMB - NLA

Algebraic description of the odd-even...

Consider the three-term recursion, written out for three consecutive equations:
$$b_{2i} u_{2i-1} + a_{2i} u_{2i} + c_{2i} u_{2i+1} = f_{2i},$$
$$b_{2i+1} u_{2i} + a_{2i+1} u_{2i+1} + c_{2i+1} u_{2i+2} = f_{2i+1},$$
$$b_{2i+2} u_{2i+1} + a_{2i+2} u_{2i+2} + c_{2i+2} u_{2i+3} = f_{2i+2}.$$
We multiply the first equation by $-b_{2i+1}/a_{2i}$, the third by $-c_{2i+1}/a_{2i+2}$, and add the resulting equations to the second equation. The resulting equation is
$$b^{(1)}_{2i+1} u_{2i-1} + a^{(1)}_{2i+1} u_{2i+1} + c^{(1)}_{2i+1} u_{2i+3} = f^{(1)}_{2i+1}, \quad i = 0, 1, \dots,$$
where
$$b^{(1)}_{2i+1} = -\frac{b_{2i+1} b_{2i}}{a_{2i}}, \qquad
a^{(1)}_{2i+1} = a_{2i+1} - \frac{b_{2i+1} c_{2i}}{a_{2i}} - \frac{c_{2i+1} b_{2i+2}}{a_{2i+2}},$$
$$c^{(1)}_{2i+1} = -\frac{c_{2i+1} c_{2i+2}}{a_{2i+2}}, \qquad
f^{(1)}_{2i+1} = f_{2i+1} - \frac{b_{2i+1} f_{2i}}{a_{2i}} - \frac{c_{2i+1} f_{2i+2}}{a_{2i+2}}.$$
Next, the odd-even reduction is repeated for all odd-numbered equations. The resulting system can be reduced in a similar way, and eventually we are left with just one equation.
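A quick numeric check of the reduced-coefficient formulas (our example: the system $-u_{j-1} + 2u_j - u_{j+1} = f_j$ of order 5 with exact solution $u_j = j + 1$, reducing around the odd point $j = 3$):

```python
# Coefficients of -u_{j-1} + 2 u_j - u_{j+1} = f_j, j = 0..4 (0-based),
# with b[0] = c[4] = 0 at the boundary; exact solution u = (1,2,3,4,5).
a = [2.0] * 5
b = [0.0, -1.0, -1.0, -1.0, -1.0]   # sub-diagonal
c = [-1.0, -1.0, -1.0, -1.0, 0.0]   # super-diagonal
u = [1.0, 2.0, 3.0, 4.0, 5.0]
f = [b[j] * (u[j - 1] if j else 0) + a[j] * u[j]
     + c[j] * (u[j + 1] if j < 4 else 0) for j in range(5)]

j = 3  # odd point: combine rows j-1, j, j+1 as in the formulas above
b1 = -b[j] * b[j - 1] / a[j - 1]
a1 = a[j] - b[j] * c[j - 1] / a[j - 1] - c[j] * b[j + 1] / a[j + 1]
c1 = -c[j] * c[j + 1] / a[j + 1]
f1 = f[j] - b[j] * f[j - 1] / a[j - 1] - c[j] * f[j + 1] / a[j + 1]

# The reduced equation couples u_{j-2}, u_j, u_{j+2} and must hold
# for the exact solution (c1 = 0 here, since c[j+1] = 0).
assert abs(b1 * u[j - 2] + a1 * u[j] - f1) < 1e-12
```

The same check can be repeated after each reduction level; only the coefficient arrays shrink.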
Similarly, for the even points we get
$$b^{(1)}_{2i} u_{2i-2} + a^{(1)}_{2i} u_{2i} + c^{(1)}_{2i} u_{2i+2} = f^{(1)}_{2i}, \quad i = 1, 2, \dots,$$
where $b^{(1)}_{2i}$, $a^{(1)}_{2i}$ and $c^{(1)}_{2i}$ are defined accordingly.

It is interesting to note that for a sufficiently diagonally dominant matrix, the reduction can be terminated, or truncated, after fewer than $O(\log_2 n)$ steps, since the reduced system can then be considered numerically (i.e., up to machine precision) as a diagonal system.
With the same indices, for a block-tridiagonal system $A = \mathrm{blocktridiag}(B_i, A_i, C_i)$ we get
$$B^{(1)}_{2i+1} = -B_{2i+1} A_{2i}^{-1} B_{2i},$$
$$A^{(1)}_{2i+1} = A_{2i+1} - B_{2i+1} A_{2i}^{-1} C_{2i} - C_{2i+1} A_{2i+2}^{-1} B_{2i+2},$$
$$C^{(1)}_{2i+1} = -C_{2i+1} A_{2i+2}^{-1} C_{2i+2}.$$
Some keywords to discuss
- Load balancing for cyclic reduction methods
- Divide-and-conquer techniques
- Domain decomposition ordering