An efficient algorithm for modelling progressive damage accumulation in disordered materials


INTERNATIONAL JOURNAL FOR NUMERICAL METHODS IN ENGINEERING
Int. J. Numer. Meth. Engng 2005; 62
Published online 15 February 2005 in Wiley InterScience (www.interscience.wiley.com). DOI: 10.1002/nme.1257

An efficient algorithm for modelling progressive damage accumulation in disordered materials

Phani Kumar V. V. Nukala 1, Srđan Šimunović 1 and Murthy N. Guddati 2

1 Computer Science and Mathematics Division, Oak Ridge National Laboratory, Oak Ridge, TN, U.S.A.
2 Department of Civil Engineering, North Carolina State University, Raleigh, NC, U.S.A.

SUMMARY

This paper presents an efficient algorithm for the simulation of progressive fracture in disordered quasi-brittle materials using discrete lattice networks. The main computational bottleneck involved in modelling fracture simulations using large discrete lattice networks stems from the fact that a new large set of linear equations needs to be solved every time a lattice bond is broken. Using the present algorithm, the computational complexity of solving the new set of linear equations after breaking a bond reduces to two simple triangular solves (forward elimination and backward substitution) using the already Cholesky-factored matrix. This algorithm using the direct sparse solver is faster than Fourier accelerated iterative solvers such as the preconditioned conjugate gradient (PCG) solvers, and eliminates the critical slowing down associated with the iterative solvers that is especially severe close to the percolation critical points. Numerical results using random resistor networks for modelling the fracture and damage evolution in disordered materials substantiate the efficiency of the present algorithm. In particular, the proposed algorithm is especially advantageous for fracture simulations wherein ensemble averaging of numerical results is necessary to obtain a realistic lattice system response. Copyright 2005 John Wiley & Sons, Ltd.

KEY WORDS: damage evolution; brittle materials; random thresholds model; lattice network; statistical physics

Correspondence to: P. K. V. V. Nukala, Computer Science and Mathematics Division, Oak Ridge National Laboratory, Oak Ridge, TN, U.S.A. E-mail: nukalapk@ornl.gov

The submitted manuscript has been authored by a contractor of the U.S. Government under Contract No. DE-AC05-00OR22725. Accordingly, the U.S. Government retains a non-exclusive, royalty-free license to publish or reproduce the published form of this contribution, or allow others to do so, for U.S. Government purposes.

Contract/grant sponsor: U.S. Department of Energy; contract/grant number: DE-AC05-00OR22725

Received 9 June 2004
Revised 20 September 2004
Accepted 18 October 2004

1. INTRODUCTION

Progressive damage evolution leading to failure of disordered quasi-brittle materials has been studied extensively using various types of discrete lattice models [1-4]. Figure 1 presents a discrete lattice network with a triangular lattice topology. The essential features of discrete lattice models used for modelling progressive damage evolution are: disorder, the elastic response characteristics of the lattice bonds, and a breaking rule for each of the bonds in the lattice. The disorder in the system is introduced either by random dilution of bonds or by randomly prescribing different elastic stiffness constants or breaking thresholds to each of the bonds in the lattice. The elastic response of the individual bonds is typically described by random fuse models, central-force (spring) models, bond-bending spring models, and beam-type models. Depending on the elastic response characteristics of the individual bonds, various types of breaking rules are prescribed. The elastic and breaking characteristics of each bond in the lattice are taken to represent the mesoscopic response of the material. The mechanical breakdown of disordered materials is often modelled using its electrical analogue. Based on this analogy, the mechanical bonds in the lattice are modelled by electrical fuses, and the mechanical strength of a bond is modelled by the fuse threshold current. The principal advantage of using the electrical fuse network is that it reduces the number of degrees of freedom by replacing the vector problem with a scalar one [5, 6].

Figure 1. Random thresholds fuse network with triangular lattice topology. Each of the bonds in the network is a fuse with unit electrical conductance and a breaking threshold randomly assigned based on a (uniform) probability distribution. The behaviour of the fuse is linear up to the breaking threshold. Periodic boundary conditions are applied in the horizontal direction and a unit voltage difference, V = 1, is applied between the top and bottom bus bars of the lattice system. As the current I flowing through the lattice network is increased, the fuses burn out one by one.

Numerical simulations based on electrical and mechanical idealizations have shown excellent agreement between the scaling properties of the two models [5], provided an equivalence is established between electrical current, voltage and conductance in the electrical model and mechanical stress, strain and Young's modulus in the mechanical model, respectively. Within this framework of the random thresholds fuse model, the lattice is initially fully intact with bonds having the same conductance, but the bond breaking thresholds, t, are randomly distributed based on a threshold probability distribution, p(t). The burning of a fuse occurs irreversibly, whenever the electrical current in the fuse exceeds the breaking threshold current value, t, of the fuse. Numerically, the damage evolution in materials is simulated by setting a unit voltage difference, V = 1, between the top and bottom bus bars of the lattice system, and the Kirchhoff equations are solved to determine the current flowing in each of the fuses. Subsequently, for each fuse j, the ratio between the current i_j and the breaking threshold t_j is evaluated, and the bond j_c having the largest value, max_j i_j / t_j, is irreversibly removed (burnt). The current is redistributed instantaneously after a fuse is burnt, implying that the current relaxation in the lattice system is much faster than the breaking of a fuse. We assume that the loading process is quasi-static in the sense that the current is redistributed instantaneously after a fuse is burnt, and only one fuse is burnt in a given load step. Each time a fuse is burnt, it is necessary to re-calculate the current distribution in the lattice to determine the subsequent breaking of fuses. This process of breaking fuses irreversibly one after another is repeated until the lattice network finally becomes disconnected. It is assumed that successive fuse failures leading ultimately to the failure of the lattice system are similar to the breakdown of quasi-brittle materials.

Numerical simulation of fracture using large discrete lattice networks is often hampered by the high computational cost associated with solving a new large set of linear equations every time a new lattice bond is broken. This becomes especially severe with increasing lattice system size, L, as the number of broken bonds at failure, n_f, follows a power-law scaling given by n_f = O(L^{1.8}). In addition, since the response of the lattice system corresponds to a specific realization of the random breaking thresholds, an ensemble averaging of numerical results over N_config configurations is necessary to obtain a realistic representation of the lattice system response. This further increases the computational time required to perform simulations on large lattice systems.

Fourier accelerated PCG iterative solvers [7-9] have been used in the past for simulation of breakdowns of large lattices. However, in fracture simulations using discrete lattice networks, these methods exhibit critical slowing down, particularly in the vicinity of percolation critical thresholds. As the lattice system gets closer to macroscopic fracture, the condition number of the system of linear equations increases, thereby increasing the number of iterations required to attain a fixed accuracy. This becomes particularly significant for large lattices. Furthermore, the Fourier acceleration technique is not effective when fracture simulation is performed using central-force and bond-bending lattice models [8].

This study presents two classes of algorithms to speed up lattice simulations for modelling of fracture in disordered quasi-brittle materials. The first class of algorithms is based on Davis and Hager's [10, 11] multiple-rank sparse Cholesky factorization downdate using direct solvers, and the second class of algorithms is based on PCG iterative techniques using circulant and block-circulant preconditioners. The aim of the paper is to develop an algorithm that significantly reduces the average computational time required to break one lattice configuration.
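Before turning to the algebraic formulation, the following minimal sketch makes the quasi-static breaking loop described above concrete. It is illustrative only: it assumes the prescribed bus-bar degrees of freedom have already been eliminated, so that the assembled conductance matrix is positive definite and b carries the contributions of the prescribed unit voltage; the incidence-matrix assembly it uses is introduced in Section 2. Re-solving from scratch at every step, as the spsolve call does here, is exactly the bottleneck that the algorithms of this paper remove.

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

def break_one_by_one(G, k, t, b, n_break):
    """Quasi-static loop sketch: solve the Kirchhoff equations, burn the
    fuse with the largest current-to-threshold ratio, and repeat.

    G: N x M sparse incidence matrix (one column g_e per fuse),
    k: fuse conductances, t: random breaking thresholds,
    b: nodal current vector induced by the prescribed voltage.
    """
    alive = np.ones(G.shape[1], dtype=bool)
    for _ in range(n_break):
        Ga = G[:, alive].tocsc()
        A = Ga @ sp.diags(k[alive]) @ Ga.T         # A_n = G_n Sigma_n G_n^t
        x = spla.spsolve(A.tocsc(), b)             # full re-solve each step
        i_e = k[alive] * (Ga.T @ x)                # current in each intact fuse
        j_c = np.argmax(np.abs(i_e) / t[alive])    # most critical fuse
        alive[np.flatnonzero(alive)[j_c]] = False  # burn it irreversibly
    return alive
```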

The paper is organized as follows. Section 2 presents the algebraic problem of modelling progressive damage evolution using discrete lattice models. Section 3 describes the computational algorithms based on multiple-rank sparse Cholesky factorization update using direct solvers for random fuse networks. In Section 4, we present alternative block-circulant PCG iterative algorithms for solving the Kirchhoff equations. Section 5 presents the numerical simulations on 2D triangular lattice systems and compares the computational efficiency of the algorithms based on direct solvers with the iterative solvers. Concluding remarks are summarized in Section 6.

2. ALGEBRAIC PROBLEM

Consider a lattice system with a total number of bonds N_el. Let \mathcal{N} = {1, 2, 3, ..., N_el} denote the set of individual bonds in the lattice system. After n bonds are broken, let S^b_n denote an ordered set of broken bonds such that the element number of the jth broken bond, where j = 1, 2, ..., n, is given by e_j = S^b_n(j). The set of unbroken bonds after n bonds are broken is defined as S^u_n = \mathcal{N} \setminus S^b_n. The cardinalities of the sets S^b_n and S^u_n are n and M = (N_el - n), respectively. Based on the above definitions, it is clear that S^b_n \subset S^b_{n+1} and S^u_n \supset S^u_{n+1} for each n = 0, 1, 2, ..., with S^b_0 = \emptyset and S^u_0 = \mathcal{N}.

Algebraically, fracture of discrete lattices by breaking one bond at a time is equivalent to solving a new set of linear equations (Kirchhoff equations in the case of an electrical analogue)

A_n x_n = b_n,  n = 0, 1, 2, ...   (1)

every time a new lattice bond is broken. In Equation (1), each matrix A_n is an N x N symmetric and positive definite matrix (the lattice conductance matrix in the case of fuse models and the lattice stiffness matrix in the case of spring and beam models), b_n is the N x 1 (given) applied nodal current or force vector, x_n is the N x 1 (unknown) nodal potential or displacement vector, and N is the number of degrees of freedom (unknowns) in the lattice. The subscript n in Equation (1) indicates that A_n and b_n are evaluated after the nth bond is broken. The solution x_n, obtained after the nth bond is broken, is used in determining the subsequent ((n + 1)th) bond to be broken. The matrix A_n and the vector b_n are obtained from the element (bond) matrices using the standard finite element assembly procedure (for example, see p. 42 of Reference [12]), denoted as

A_n = \mathcal{A}_{e \in S^u_n} k_e,  b_n = \mathcal{A}_{e \in S^u_n} f_e   (2)

where k_e and f_e are the element conductivity matrix and the current vector, respectively, and \mathcal{A} denotes the standard finite element assembly operator. For the case of the random fuse model, the assembly procedure is equivalent to

A_n = G_n \Sigma_n G_n^t = \sum_{e \in S^u_n} k_e g_e g_e^t,  n = 0, 1, 2, ...   (3)

where g_e is the column vector (with entries +1, -1, or 0) of the N x M incidence matrix G_n associated with the element e. Similarly, k_e is the (positive) conductivity of the element e, and \Sigma_n is the M x M positive definite diagonal conductivity matrix with diagonal entries k_e corresponding to each of the elements of the set S^u_n. In the case of central-force (spring) models, Equation (3) is still valid except that k_e now denotes the spring constant of the element e, and the entries of g_e are either n_x, n_y, n_z, -n_x, -n_y, -n_z or 0, where (n_x, n_y, n_z) denote the direction cosines of the element (see Equation (A1) in Appendix A).

Mathematically, in the case of the fuse and spring models, the breaking of a bond is equivalent to a rank-one downdate of the matrix A_n. In the case of beam models, the breaking of a bond is equivalent to a multiple-rank (rank-6) downdate of the matrix A_n. In the following, we present the process of breaking in the context of a fuse model. The methodology for central-force (spring) and beam models is similar and is presented in Appendix A.

As discussed before, let N (the dimension of A_n, where n = 0, 1, 2, ...) denote the total number of degrees of freedom in the lattice system and let A_n represent the stiffness matrix of the random fuse network system in which n fuses are either missing (random dilution) or have been burnt during the analysis. Let us also assume that a fuse ij (the (n + 1)th fuse) is burnt when the externally applied voltage is increased gradually. In the above description, i and j refer to the global degrees of freedom connected by the fuse before it is broken. For the scalar random fuse model, the degrees of freedom i and j are also equivalent to the nodes i and j connected by the fuse before it is broken. The new stiffness matrix A_{n+1} of the lattice system after the fuse ij is burnt is given by

A_{n+1} = G_{n+1} \Sigma_{n+1} G_{n+1}^t = G_n \Sigma_n G_n^t - k_n v_n v_n^t = A_n - k_n v_n v_n^t,  n = 0, 1, 2, ...   (4)

where v_n is a sparse vector with only a few (at most two) non-zero entries,

v_n^t = {0 ... 1_i ... -1_j ... 0}   (5)

i.e. +1 in position i and -1 in position j, and k_n is a positive constant denoting the conductance of the fuse ij (the (n + 1)th broken bond) before it is broken. It is to be noted that for each n = 0, 1, 2, ..., we have v_n = g_{e_{n+1}} and k_n = k_{e_{n+1}}, where e_{n+1} denotes the element number of the (n + 1)th broken bond with connecting nodes i and j, and S^b_{n+1} = S^b_n \cup {e_{n+1}}. For the random fuse and spring models, Equation (4) follows directly from Equation (3) by noting that breaking the (n + 1)th fuse (removing an element from the lattice) is equivalent to deleting the corresponding column v_n associated with the (n + 1)th burnt fuse from the incidence matrix G_n. In particular, the relation A_n = G_n \Sigma_n G_n^t (Equation (3)) holds for each n, where G_n is the result of deleting the columns v_0, v_1, ..., v_{n-1} from G_0. The matrix G_0 is an N x N_el rectangular incidence matrix with entries +1, -1, or 0 corresponding to each of the bonds (elements) of the original intact lattice. Similarly, the matrix \Sigma_n is obtained by successively deleting the rows and columns corresponding to the diagonal entries k_0, k_1, ..., k_{n-1} of \Sigma_0.
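As a quick sanity check of Equations (3)-(5), the toy network below (a hypothetical four-node example, not taken from the paper) verifies that deleting a column of the incidence matrix is the same as the rank-one downdate A_n - k_n v_n v_n^t:

```python
import numpy as np

# Four nodes, five fuses: (0,1), (1,2), (2,3), (3,0), (0,2).
edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
N, M = 4, len(edges)
k = np.ones(M)                       # unit conductances

G = np.zeros((N, M))                 # incidence matrix, entries +1/-1
for e, (i, j) in enumerate(edges):
    G[i, e], G[j, e] = 1.0, -1.0

A0 = G @ np.diag(k) @ G.T            # A_0 = G_0 Sigma_0 G_0^t

e = 4                                # burn fuse (0, 2): v_0 = g_e
v = G[:, e]
A1 = A0 - k[e] * np.outer(v, v)      # Equation (4), rank-one downdate

keep = [0, 1, 2, 3]                  # assemble from the surviving fuses instead
assert np.allclose(A1, G[:, keep] @ np.diag(k[keep]) @ G[:, keep].T)
```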

Furthermore, it is noted that if the Cholesky factorizations are

P A_n P^t = L_n L_n^t   (6)

for each n = 0, 1, 2, ..., where P is a permutation matrix chosen to preserve the sparsity of L_n, then the sparsity pattern of L_{n+1} is contained in that of L_n. Hence, for all n, the sparsity pattern of L_n is contained in that of L_0.

Based on the above description of the successive A_n for n = 0, 1, 2, ..., an updating scheme of some kind is therefore likely to be more efficient than solving the new set of equations formed by Equation (1) from scratch for each n. In particular, since the successive matrices A_n, for each n = 0, 1, 2, ..., differ by a rank-one matrix, we successively downdate the Cholesky factorizations L_n of A_n using the sparse Cholesky factorization downdate algorithm of Davis and Hager [10, 11]. For the numerical simulations, we explore two variants of this approach. The successive rank-one downdate [10] L_n -> L_{n+1} for n = 0, 1, 2, ... leads to the Solver Type 1 algorithm, and the multiple-rank ((n + 1 - m)-rank) downdate [11] L_m -> L_{n+1}, where L_m is the Cholesky factor of A_m for 0 <= m <= n, leads to the Solver Type 2 algorithm. The multiple-rank update of the sparse Cholesky factorization is computationally superior to an equivalent series of rank-one updates, since the multiple-rank update makes one pass through L in computing the new entries, while a series of rank-one updates requires multiple passes through L [11]. In addition, given the Cholesky factorization L_m of A_m for m = 0, 1, 2, ..., it is possible that a direct updating of the solution x_{n+1} for n = m, m + 1, m + 2, ..., based on p = (n - m) saxpy vector updates (see Section 3.2 for details), may sometimes be cheaper than the successive sparse Cholesky updates of Solver Type 1. Hence, in this work, we compare the efficiency of both Solver Types 1 and 2 for solving the random thresholds fuse model.

3. ALGORITHMS BASED ON SPARSE DIRECT SOLVERS

3.1. Sparse Cholesky factorization update: L_m -> L_{m+p} for any p

The algorithm presented in References [10, 11] is based on the analysis and manipulation of the underlying graph structure of the stiffness matrix A and on the methodology presented in References [13, 14] for modifying a dense Cholesky factorization. This algorithm incorporates the change in the sparsity pattern of L and is optimal in the sense that the computational time required is proportional to the number of changing non-zero entries in L. In particular, since the breaking of fuses is equivalent to removing edges in the underlying graph structure of the stiffness matrix A, the new sparsity pattern of the modified L must be a subset of the sparsity pattern of the original L. Denoting the sparsity pattern of L by \mathcal{L}, we have

\mathcal{L}_n \subseteq \mathcal{L}_m,  m < n   (7)

Therefore, we can use the modified dense Cholesky factorization update [10] and work only on the non-zero entries in \mathcal{L}. Furthermore, since the changing non-zero entries in L depend on the ith and jth degrees of freedom of the fuse ij that is broken (at most two non-zero entries in v_n, for all n), it is only necessary to modify the non-zero elements of a submatrix of L.

The multiple-rank sparse Cholesky downdate algorithm downdates the Cholesky factorization L_m of the matrix A_m to L_{n+1} of the new matrix A_{n+1}, where A_{n+1} = A_m + \sigma Y Y^t, \sigma = -1, and Y represents an N x p rank-p matrix. The pseudo-code in Algorithm 1 follows the matlab syntax [15] and its sparse matrix functionalities very closely. The following notation is used:

zeros(m, n): an m x n matrix of zero entries.
L(i_1 : i_2, j): the entries corresponding to the i_1th to i_2th rows of column j of the matrix L (and similarly for \bar{L}).
[ilist, jlist, val] = find(L(i_1 : i_2, j)): extracts the sparsity pattern of L(i_1 : i_2, j). That is, the non-zeros of L(i_1 : i_2, j) are stored in val, and the corresponding row and column indices are stored in ilist and jlist, respectively.
j + ilist: increment each of the entries of ilist by j.
length(ilist): length of the vector ilist.
sparse(ilist, jlist, val, m, n): create an m x n sparse matrix with non-zero entries val located at the row and column indices given by ilist and jlist, respectively.

Using the above notation, the multiple-rank sparse Cholesky factor update (\sigma = +1) or downdate (\sigma = -1) algorithm, L_{n+1} = SpChol(L_m, Y, \sigma), where L_{n+1} L_{n+1}^t = L_m L_m^t + \sigma Y Y^t and Y represents an N x p rank-p matrix, is given by

Algorithm 1 (SpChol(L, Y, \sigma): rank-p sparse Cholesky update/downdate algorithm [11])
1: Convert L L^t -> \bar{L} D \bar{L}^t [15]
2: Initialize: ilist = jlist = vlist = zeros(nnz(\bar{L}), 1); ifree = 1
3: Set \alpha_i = 1 for all i = 1, 2, ..., p
4: for j = 1 to N do
5:   Apply Algorithm 5 of Davis and Hager [10] to update D_{jj} (Algorithm 2 below)
6:   Update the non-zeros of column j of \bar{L} (Algorithm 3 below)
7: end for
8: nz = ifree - 1
9: Update \bar{L} = I + sparse(ilist(1 : nz), jlist(1 : nz), vlist(1 : nz), N, N)
10: Convert \bar{L} D \bar{L}^t -> L L^t [15]

Algorithm 2 (Algorithm 5 from Davis and Hager [10])
1: for i = 1 to p do
2:   if Y_{ji} != 0 then
3:     \bar{\alpha} = \alpha_i + \sigma Y_{ji}^2 / D_{jj}   {\sigma = +1 for update or -1 for downdate}
4:     D_{jj} = \bar{\alpha} D_{jj}
5:     \gamma_i = \sigma Y_{ji} / D_{jj}
6:     D_{jj} = D_{jj} / \alpha_i
7:     \alpha_i = \bar{\alpha}
8:   end if
9: end for

Algorithm 3 (pseudo-code for updating the non-zeros of column j of \bar{L})
1: if j < N then
2:   [iylist, jylist, yval] = find(\bar{L}((j+1) : N, j))
3:   for i = 1 to p do
4:     if Y_{ji} != 0 and yval != \emptyset then
5:       Y_{j+iylist, i} = Y_{j+iylist, i} - Y_{ji} yval
6:       [mm, nn, ww] = find(Y((j+1) : N, i))
7:       vv = zeros(N, 1)
8:       vv_{j+iylist} = yval; vv_{j+mm} = vv_{j+mm} + \gamma_i ww
9:       [iylist, jylist, yval] = find(vv((j+1) : N))
10:     end if
11:   end for
12:   nz = length(iylist)
13:   ilist(ifree : (ifree + nz - 1)) = iylist + j
14:   jlist(ifree : (ifree + nz - 1)) = j
15:   vlist(ifree : (ifree + nz - 1)) = yval
16:   ifree = ifree + nz
17: end if

When the factorization of the matrix A is available as L D L^t instead of L L^t, the conversions between the L L^t and L D L^t forms in Algorithm 1 are not performed. The reader is referred to References [10, 11] for a comprehensive presentation of the sparse Cholesky factorization update/downdate algorithm.

Solver Type 1: Given the factorization L_m of A_m, the rank-1 sparse Cholesky modification (Algorithm 1 with p = 1) is used to update the factorization L_{n+1} for all subsequent values of n = m, m + 1, .... Once the factorization L_{n+1} of A_{n+1} is obtained, the solution vector x_{n+1} is obtained from L_{n+1} L_{n+1}^t x_{n+1} = b_{n+1} by two triangular solves.

Solver Type 2: Given the factorization L_m of A_m for m <= n, the solution vector x_{n+1} after the (n + 1)th fuse is burnt, for n = m, m + 1, m + 2, ..., is obtained by two triangular solves using the factor L_m on a sparse load vector with only a few non-zeros, and (n + 2 - m) saxpy vector updates (see Section 3.2). The triangular solves are further simplified by the fact that they are performed on a trivial load vector (at most two non-zeros), and hence can be performed much more efficiently than O(nnz(L_m)), where nnz(L_m) denotes the number of non-zeros of the Cholesky factor L_m of A_m. Since the storage and computational cost associated with p = (n - m) saxpy vector updates can become expensive, we use the multiple-rank (rank-p) sparse Cholesky update (Algorithm 1) L_m -> L_{m+maxupd+1} at every maxupd (defined later) interval. In the following, we present an algorithm for updating the solution x_n -> x_{n+1}, given the factor L_m of A_m for m <= n. This algorithm is used in conjunction with the multiple-rank sparse Cholesky update L_m -> L_{m+maxupd+1} of Solver Type 2.
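The sparse algorithms above are easiest to understand through their dense analogue. The sketch below is an illustrative dense rank-one L D L^t update/downdate following the same \alpha/\gamma recurrences as Algorithms 1-3, but without the sparsity bookkeeping that is the actual contribution of References [10, 11]; it is verified against a fresh factorization.

```python
import numpy as np

def ldl_rank1(L, d, w, sigma):
    """Overwrite unit-lower-triangular L and diagonal d of A = L D L^t
    so that they factor A + sigma * w w^t (sigma=+1 update, -1 downdate)."""
    w = w.copy()
    a = float(sigma)                 # running scalar carried to the trailing block
    for j in range(len(d)):
        p = w[j]
        dbar = d[j] + a * p * p      # new diagonal entry D_jj
        gamma = a * p / dbar
        a *= d[j] / dbar
        d[j] = dbar
        w[j + 1:] -= p * L[j + 1:, j]
        L[j + 1:, j] += gamma * w[j + 1:]

rng = np.random.default_rng(0)
B = rng.standard_normal((5, 5))
A = B @ B.T + 10.0 * np.eye(5)       # safely positive definite
w = 0.3 * rng.standard_normal(5)     # small enough that A - w w^t stays SPD

C = np.linalg.cholesky(A)            # A = C C^t
d = np.diag(C) ** 2                  # convert to the L D L^t form
L = C / np.diag(C)

ldl_rank1(L, d, w, -1.0)             # downdate, as when a fuse is burnt
assert np.allclose(L @ np.diag(d) @ L.T, A - np.outer(w, w))
```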

3.2. Update x_n -> x_{n+1}, given A_m = L_m L_m^t for any m <= n

Since the successive matrices A_n and A_{n+1} differ by a rank-one matrix for each n, it is possible to use the Sherman-Morrison-Woodbury formula [15] to update the solution vector x_{n+1} directly, knowing the sparse Cholesky factor L_m of A_m for any m <= n. In the following, we assume that the factor L_m of the stiffness matrix A_m is available (after the mth bond is broken) for any 0 <= m <= n, and we update the solution vector x_n -> x_{n+1} for all n = m, m + 1, .... Let b_n and x_n represent the load vector and solution vector after the nth bond is broken. Similarly, let us assume that a fuse ij (the (n + 1)th fuse) is burnt when the externally applied voltage is increased gradually, such that b_{n+1} and x_{n+1} represent the new load and solution vectors, and k_n is the conductance of the fuse ij (the (n + 1)th broken bond) before it is broken. In addition, let v_{(n-m)} denote the sparse vector corresponding to the (n + 1)th broken bond for any n >= m (similar to Equation (5)) with only a few (at most two) non-zero entries. The implementation details involved in the updating of x_{n+1} from x_n using the factor L_m are described below:

Algorithm 4 (solution vector update: x_n -> x_{n+1}, given A_m = L_m L_m^t for any m <= n)
1: Solve L_m L_m^t u_{(n-m)} = v_{(n-m)}
2: if n != m then
3:   u_{(n-m)} <- u_{(n-m)} + \tilde{u}_{(n-m)}, where \tilde{u}_{(n-m)} = \sum_{l=0}^{n-m-1} \alpha_l (u_l^t v_{(n-m)}) u_l
4: end if
5: Evaluate \alpha_{(n-m)} = k_n / (1 - k_n v_{(n-m)}^t u_{(n-m)})
6: Store u_{(n-m)} and \alpha_{(n-m)} (used in the future evaluation of \tilde{u}_{(n-m+s)}, where s = 1, 2, ...)
7: Update the load vector: b_n -> b_{n+1} (see Appendix B, Remark 5)
8: Compute \beta = \alpha_{(n-m)} (u_{(n-m)}^t b_{n+1})
9: if (i or j is prescribed) then
10:   \beta = \beta - k_n
11: end if
12: Update x_{n+1} = x_n + \beta u_{(n-m)}

Algorithm 4 is repeated for all values of n = m, m + 1, ... until either the lattice system has fractured or the number of updates (n + 1 - m) exceeds maxupd, defined as the maximum number of updates between successive Cholesky factor updates. It is noted that the storage and computational requirements can become prohibitively expensive, especially for large lattices, as the number of updates p = (n - m) increases. Hence, it is necessary to limit the maximum number of updates between two successive Cholesky factor updates to a certain maxupd. That is, it is necessary to update L_m -> L_{m+maxupd+1} using the multiple-rank (rank-p) sparse Cholesky update algorithm at every maxupd interval. Once L_m has been updated, the counter m in Algorithm 4 is reset, i.e. m <- (m + maxupd + 1), and Algorithm 4 is repeated for all n = m, m + 1, ... until the lattice system fractures.

Notes on Algorithm 4: The implementation of Algorithm 4 takes advantage of the fact that v_{(n-m)} is a sparse vector with at most two non-zeros (see Equations (5) and (B1)). Hence, the computational cost of Step 1 is much less than O(nnz(L_m)). For the same reason, the scalar products v_{(n-m)}^t u_l and v_{(n-m)}^t u_{(n-m)} can be evaluated trivially, as described below. Depending on whether the degrees of freedom i and j of the broken bond ij (the (n + 1)th broken bond) are constrained/prescribed (see Appendix B, Remarks 3 and 4), we have the following expressions for evaluating the scalar products in Steps 3 and 5 of Algorithm 4:

1: if i and j are free degrees of freedom then
2:   v_{(n-m)}^t = {0 ... 1_i ... -1_j ... 0}
3:   v_{(n-m)}^t u_l = (u_l)_i - (u_l)_j and v_{(n-m)}^t u_{(n-m)} = (u_{(n-m)})_i - (u_{(n-m)})_j
4: else if j is a constrained/prescribed degree of freedom then
5:   v_{(n-m)}^t = {0 ... 1_i ... 0}
6:   v_{(n-m)}^t u_l = (u_l)_i and v_{(n-m)}^t u_{(n-m)} = (u_{(n-m)})_i
7: else if i is a constrained/prescribed degree of freedom then
8:   v_{(n-m)}^t = {0 ... -1_j ... 0}
9:   v_{(n-m)}^t u_l = -(u_l)_j and v_{(n-m)}^t u_{(n-m)} = -(u_{(n-m)})_j
10: end if

Similarly, the updating of the load vector b_n -> b_{n+1} in Step 7 of Algorithm 4 is computed trivially by updating a single entry of the vector b_n, as shown below (see Appendix B, Remark 5):

1: if j is a prescribed degree of freedom then
2:   (b_{n+1})_i = (b_n)_i - k_n
3: else if i is a prescribed degree of freedom then
4:   (b_{n+1})_j = (b_n)_j - k_n
5: end if

Based on the above implementation of Algorithm 4, the computational cost involved in breaking the (n + 1)th fuse ij (say, when i and j are free degrees of freedom, as in Section 2) is (n + 2 - m) saxpy vector updates, one vector inner product, and two triangular solves L_m L_m^t u_{(n-m)} = v_{(n-m)} on a sparse vector v_{(n-m)} with at most two non-zeros. The computational cost of the associated triangular solves is much less than O(nnz(L_m)) because of the sparsity of the vector v_{(n-m)}.
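For the single-step case m = n (no accumulated \tilde{u} term), Algorithm 4 reduces to the plain Sherman-Morrison formula. The dense sketch below, using SciPy's cho_factor/cho_solve in place of the sparse factor L_m, illustrates Steps 1, 5, 8 and 12 for a fuse between two free degrees of freedom, where the load vector is unchanged:

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve

def update_solution(cho_A, x_n, b, v, k_n):
    """Steps 1, 5, 8 and 12 of Algorithm 4 for m = n and free dofs i, j:
    return x_{n+1} solving (A_n - k_n v v^t) x = b without refactorizing."""
    u = cho_solve(cho_A, v)                 # Step 1: u = A_n^{-1} v
    alpha = k_n / (1.0 - k_n * (v @ u))     # Step 5
    beta = alpha * (u @ b)                  # Step 8 (b_{n+1} = b_n here)
    return x_n + beta * u                   # Step 12

rng = np.random.default_rng(1)
B = rng.standard_normal((6, 6))
A = B @ B.T + 10.0 * np.eye(6)              # stand-in for an SPD conductance matrix
b = rng.standard_normal(6)
v = np.zeros(6); v[1], v[4] = 1.0, -1.0     # fuse between free dofs 1 and 4

c = cho_factor(A)
x_n = cho_solve(c, b)
x_n1 = update_solution(c, x_n, b, v, 1.0)
assert np.allclose(x_n1, np.linalg.solve(A - np.outer(v, v), b))
```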

The optimum number of steps between successive factorization updates of the matrix A is determined by minimizing the cpu time required for the entire analysis. Let t_fac denote the average cpu time required for performing the factorization, or a multiple-rank sparse update of the Cholesky factor, A_m = L_m L_m^t, and let t_back denote the average cpu time required for solving L_m L_m^t u_{(n-m)} = v_{(n-m)} with a sparse RHS vector v_{(n-m)}. Let t_upd denote the average cpu time required for a single rank-1 update of the solution \tilde{u}_{n+1-m}. Note that the evaluation of \tilde{u}_{n+1-m} requires (n - m) saxpy vector updates. Let the estimated number of steps for lattice system failure be n_steps. Then, the total cpu time required for solving the linear system of equations until lattice system failure is given by

\Psi = n_fac t_fac + n_steps t_back + \frac{n_rank1 (n_rank1 + 1)}{2} n_fac t_upd = n_fac t_fac + n_steps t_back + \frac{(n_steps - n_fac) n_steps}{2 n_fac} t_upd   (8)

where n_fac denotes the number of factorization updates until lattice system failure and n_rank1 = (n_steps - n_fac)/n_fac denotes the average number of rank-1 updates per factorization. The optimum number of factorizations, n_optfac, for the entire analysis is obtained by minimizing the function \Psi via \partial\Psi/\partial n_fac = 0. The maximum number of vector updates, maxupd, between successive factorization updates is then estimated as

maxupd = (n_steps - n_optfac)/n_optfac   (9)

It should be noted that the above estimate of maxupd is based only on minimizing the computational cost. However, for large lattice systems, in-core storage of the Cholesky factor L_m and the u_l vectors, where l = 0, 1, 2, ..., p and p = (n - m), may also become a limiting factor. In such cases, maxupd is chosen based on storage constraints.

3.3. Alternative algorithms based on sparse direct solvers

In general, the multiple-rank update L_m -> L_{n+1} as in Solver Type 2, after fuses n = m, m + 1, ..., m + maxupd are broken, is expected to be computationally cheaper than performing a direct factorization of the new stiffness matrix A_{m+maxupd+1} [10, 11]. However, as the number of updates p = (n - m) increases, especially close to macroscopic fracture, where A_{n+1} is much sparser than A_m, a direct factorization of A_{n+1} may become cheaper than multiple-rank updating of L_m -> L_{n+1}. In order to take advantage of the much sparser A_{n+1}, and for comparison of the different solver types based on sparse direct solvers, we use the following Solver Types 3 and 4 in the numerical simulation of the random thresholds fuse model.

Solver Type 3: Given the factor L_m of A_m for m <= n, the solution vector x_{n+1} after the (n + 1)th fuse is burnt, for n = m, m + 1, m + 2, ..., m + maxupd, is obtained by two triangular solves using L_m L_m^t on a sparse load vector with at most two non-zeros, and (n + 2 - m) saxpy vector updates (see Section 3.2). The only difference between Solver Types 2 and 3 is that instead of multiple-rank updating L_m -> L_{m+maxupd+1} after every maxupd steps, we refactorize A_{m+maxupd+1} directly to obtain L_{m+maxupd+1}.

Solver Type 4: Along with Solver Type 3, we have also investigated an alternative modified direct method using dense matrix updates. The details of one such algorithm are given in Appendix C. However, based on the operation counts and the actual cpu times measured during numerical simulations, the algorithm presented in Appendix C was found to be computationally inefficient compared with the other algorithms presented in this section.
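The maxupd estimate of Equations (8) and (9) in Section 3.2 has a closed form: setting \partial\Psi/\partial n_fac = 0 gives n_optfac = n_steps \sqrt{t_upd/(2 t_fac)}. A small helper (illustrative only; the timings are assumed to be measured averages from a few trial steps) might read:

```python
import math

def optimal_maxupd(n_steps, t_fac, t_upd):
    """Equations (8)-(9): minimizing Psi over n_fac gives
    n_optfac = n_steps * sqrt(t_upd / (2 * t_fac)), and maxupd follows.
    (t_back multiplies n_steps only, so it drops out of the minimization.)"""
    n_optfac = n_steps * math.sqrt(t_upd / (2.0 * t_fac))
    return max(1, round((n_steps - n_optfac) / n_optfac))

# e.g. ~1.2e5 breaking steps, 2 s per factorization, 1 ms per rank-1 update
print(optimal_maxupd(120_000, t_fac=2.0, t_upd=1e-3))   # ~62
```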

4. CIRCULANT PRECONDITIONERS FOR CG ITERATIVE SOLVERS

This study also investigates the use of PCG algorithms for re-solving the new set of Kirchhoff equations (stiffness matrix) every time a fuse is burnt. In particular, we investigate the choice of circulant and block-circulant matrices as preconditioners for the Laplacian operator on a fractal network. The main advantage is that the preconditioners can be diagonalized by discrete Fourier matrices, and hence the inversion of an N x N circulant matrix can be done in O(N log N) operations by using FFTs of size N. In addition, since the convergence rate of the PCG method depends on the condition number and the clustering of the eigenvalues of the preconditioned system, it is possible to choose a circulant preconditioner that minimizes the condition number of the preconditioned system [16, 17]. Furthermore, these circulant preconditioned systems exhibit favourable clustering of eigenvalues. In general, the more clustered the eigenvalues are, the faster the convergence rate is.

As noted earlier, since the initial lattice grid is uniform, the Laplacian operator (Kirchhoff equations) on the initial uniform grid results in a Toeplitz matrix A_0. Hence, a fast Poisson-type solver with a circulant preconditioner can be used to obtain the solution in O(N log N) operations using FFTs of size N. However, as the lattice bonds are broken successively, the initial uniform lattice grid becomes a fractal network. Consequently, although the matrix A_0 is Toeplitz (also block Toeplitz with Toeplitz blocks) initially, the subsequent matrices A_n, for each n, are not Toeplitz matrices. However, A_n may still possess block structure, with many of the blocks being Toeplitz blocks, depending on the pattern of broken bonds.

Even with the above observation that A_n for n > 0 is not Toeplitz, for the purposes of direct comparison with previously published algorithms for solving random thresholds fuse models, this study explores the choice of optimal [17-20] and superoptimal [16, 17] circulant preconditioners, along with incomplete Cholesky preconditioners [15], for the Laplacian operator (Kirchhoff equations) on a fractal network. In addition, since the matrices A_n in general have block structure with many blocks being Toeplitz blocks, we also use block-circulant matrices [17, 21] as preconditioners for the stiffness matrix. In the literature [7-9], Fourier-accelerated PCG has been used to accelerate the iterative solution of random resistor networks near the percolation critical threshold. However, the type of ensemble-averaged circulant preconditioner used in these studies is not optimal in the sense described in References [17-20], and hence is expected to take more CG iterations than the optimal and superoptimal circulant preconditioners.

Consider the N x N stiffness matrix A. The optimal circulant preconditioner c(A) [18] is defined as the minimizer of ||C - A||_F over all N x N circulant matrices C. In the above description, ||.||_F denotes the Frobenius norm [15]. The optimal circulant preconditioner c(A) is uniquely determined by A, and is given by

c(A) = F^* \delta(F A F^*) F   (10)

where F denotes the discrete Fourier matrix, \delta(A) denotes the diagonal matrix whose diagonal is equal to the diagonal of the matrix A, and ^* denotes the adjoint (i.e. conjugate transpose). It should be noted that the diagonal of F A F^* represents the eigenvalues of the matrix c(A) and can be obtained in O(N log N) operations by taking the FFT of the first column of c(A).

The first column vector of T. Chan's optimal circulant preconditioner matrix that minimizes the norm ||C - A||_F is given by

c_i = \frac{1}{N} \sum_{j=1}^{N} a_{j, (j-i+1) \bmod N}   (11)

The above formula can be interpreted simply as follows: the element c_i is the arithmetic average of those diagonal elements of A, extended to length N by wrapping around, that contain the element a_{i,1}. If the matrix A is Hermitian, the eigenvalues of c(A) are bounded below and above by

\lambda_{min}(A) <= \lambda_{min}(c(A)) <= \lambda_{max}(c(A)) <= \lambda_{max}(A)   (12)

where \lambda_{min}(.) and \lambda_{max}(.) denote the minimum and maximum eigenvalues, respectively. Based on the above result, if A is positive definite, then the circulant preconditioner c(A) is also positive definite. In particular, if the circulant preconditioner is such that the spectrum of the preconditioned system is clustered around one, then the convergence of the solution will be fast. The superoptimal circulant preconditioner t(A) [16] is based on the idea of minimizing the norm ||I - C^{-1} A||_F over all non-singular circulant matrices C. Since the asymptotic convergence of the t(A)-preconditioned system is the same as that of c(A) for large N, in this study we limit ourselves to the investigation of preconditioned systems using c(A) given by Equation (11). The computational cost associated with the solution of the preconditioned system c(A)z = r is the initialization cost of nnz(A) for setting the first column of c(A) using Equation (11) during the first iteration, and O(N log N) during every iteration step.

Since the Laplacian operator on a discrete lattice network results in block structure in the stiffness matrix, we also use block-circulant matrices [17, 21] as preconditioners for the stiffness matrix. In order to distinguish the block-circulant preconditioners from the above-described circulant preconditioners, we refer henceforth to the latter as point-circulant preconditioners.
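A compact way to realize the point-circulant preconditioner is to accumulate the first column of c(A) directly from the non-zeros of A (the nnz(A) initialization cost noted above) and to apply c(A)^{-1} with FFTs in each CG iteration. The sketch below is an illustrative SciPy implementation under those assumptions:

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, cg

def chan_eigenvalues(A):
    """Eigenvalues of T. Chan's optimal circulant c(A), Equation (11):
    average each wrapped diagonal of A into the first column of c(A),
    then take its FFT. Cost: nnz(A) + O(N log N)."""
    N = A.shape[0]
    Ac = A.tocoo()
    col = np.zeros(N)
    np.add.at(col, (Ac.row - Ac.col) % N, Ac.data / N)
    return np.fft.fft(col)

def point_circulant_pcg(A, b):
    """CG on A x = b, applying c(A)^{-1} via FFTs at every iteration."""
    lam = chan_eigenvalues(A)
    M = LinearOperator(A.shape, dtype=float,
                       matvec=lambda r: np.fft.ifft(np.fft.fft(r) / lam).real)
    return cg(A, b, M=M)
```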

4.1. Block-circulant preconditioners

Let the matrix A be partitioned into r x r blocks such that each block is an s x s matrix. That is, N = rs, and

A = \begin{bmatrix} A_{1,1} & A_{1,2} & \cdots & A_{1,r} \\ A_{2,1} & A_{2,2} & \cdots & A_{2,r} \\ \vdots & \vdots & & \vdots \\ A_{r,1} & A_{r,2} & \cdots & A_{r,r} \end{bmatrix}   (13)

Although the point-circulant preconditioner c(A) defined by Equation (11) can be used as a preconditioner, in general the block structure is not preserved by using c(A) as a preconditioner. In contrast, the block-circulant preconditioners obtained by using circulant approximations for each of the blocks preserve the block structure of A. The block-circulant preconditioner of A can be expressed as

c_B(A) = \begin{bmatrix} c(A_{1,1}) & c(A_{1,2}) & \cdots & c(A_{1,r}) \\ c(A_{2,1}) & c(A_{2,2}) & \cdots & c(A_{2,r}) \\ \vdots & \vdots & & \vdots \\ c(A_{r,1}) & c(A_{r,2}) & \cdots & c(A_{r,r}) \end{bmatrix}   (14)

It is the minimizer of ||C - A||_F over all matrices C that are r x r block matrices with s x s circulant blocks. The spectral properties given by Equation (12) for point-circulant preconditioners also extend to the block-circulant preconditioners [17, 21]. That is,

\lambda_{min}(A) <= \lambda_{min}(c_B(A)) <= \lambda_{max}(c_B(A)) <= \lambda_{max}(A)   (15)

In particular, if A is positive definite, then the block-circulant preconditioner c_B(A) is also positive definite.

The computational cost associated with the block-circulant preconditioners can be estimated as follows. Since the stiffness matrix A is real symmetric for the type of problems considered in this study, in the following we assume a block symmetric structure for A, i.e. A_{j,i} = A_{i,j}^t. In forming the block-circulant preconditioner given by Equation (14), it is necessary to obtain point-circulant preconditioners for each of the r x r block matrices of order s. The point-circulant approximation for each of the s x s blocks requires O(s log s) operations. This cost is in addition to the cost of forming the first column vectors (Equation (11)) for each of the c(A_{i,j}) blocks, which is given by nnz(A) operations. Since there are r(r + 1)/2 blocks, we need O(r^2 s log s) operations to form

\Delta = (I \otimes F) c_B(A) (I \otimes F^*) = \begin{bmatrix} \delta(F A_{1,1} F^*) & \delta(F A_{1,2} F^*) & \cdots & \delta(F A_{1,r} F^*) \\ \delta(F A_{2,1} F^*) & \delta(F A_{2,2} F^*) & \cdots & \delta(F A_{2,r} F^*) \\ \vdots & \vdots & & \vdots \\ \delta(F A_{r,1} F^*) & \delta(F A_{r,2} F^*) & \cdots & \delta(F A_{r,r} F^*) \end{bmatrix}   (16)

where \otimes refers to the Kronecker tensor product and I is an r x r identity matrix. In order to solve the preconditioned equation c_B(A)z = r, Equation (16) is permuted to obtain a block-diagonal matrix of the form

\tilde{\Delta} = P \Delta P^t = \begin{bmatrix} \tilde{\Delta}_{1,1} & & & 0 \\ & \tilde{\Delta}_{2,2} & & \\ & & \ddots & \\ 0 & & & \tilde{\Delta}_{s,s} \end{bmatrix}   (17)

where P is the permutation matrix such that

[\tilde{\Delta}_{k,k}]_{ij} = [\delta(F A_{i,j} F^*)]_{kk},  1 <= i, j <= r,  1 <= k <= s   (18)

During each iteration, in order to solve the preconditioned system c_B(A)z = r, it is necessary to solve with the block-diagonal matrix \tilde{\Delta}. This task can be performed by first factorizing each of the \tilde{\Delta}_{k,k} blocks during the first iteration, and then subsequently using these factored matrices to perform the triangular solves. Hence, without considering the first-iteration cost of factorizing each of the block diagonals, during each iteration the number of operations involved in the solves with \tilde{\Delta} is

delops = O(\sum_{k=1}^{s} nnz(L_{\tilde{\Delta}_{k,k}}))   (19)

where nnz(L_{\tilde{\Delta}_{k,k}}) denotes the number of non-zeros in the Cholesky factorization of \tilde{\Delta}_{k,k}. Therefore, the system c_B(A)z = r can be solved in O(rs log s) + delops operations per iteration. Thus, we conclude that for the block-circulant preconditioner, the initialization cost is nnz(A) + O(r^2 s log s), plus the cost associated with the factorization of each of the diagonal blocks \tilde{\Delta}_{k,k} during the first iteration, and O(rs log s) + delops during every other iteration. Although from the point of view of operational cost per iteration the point-circulant preconditioner may prove advantageous for some problems, it is not clear whether the point-circulant or the block-circulant preconditioner is closer to the matrix A in terms of the number of CG iterations necessary for convergence. Hence, we investigate both point- and block-circulant preconditioners in obtaining the solution of the linear system Ax = b using iterative techniques. In addition, we also employ the commonly used point and block versions of the incomplete Cholesky preconditioners to solve Ax = b.

Remark 1
In the case of a 2D discrete lattice network with periodic boundary conditions in the horizontal direction and a constant voltage difference between the top and bottom of the lattice network, A is a block tri-diagonal real symmetric matrix. Under these circumstances, the initialization cost is nnz(A) + O(rs log s). Since each of the diagonal blocks \tilde{\Delta}_{k,k} is tri-diagonal, during each iteration the solution involving \tilde{\Delta} can be obtained in O(rs) operations. Thus, the cost per iteration is O(rs log s) + O(rs) = O(rs log s) operations. The total computational cost involved in using the block-circulant preconditioner for a symmetric block tri-diagonal matrix is therefore the initialization cost of nnz(A) + O(rs log s), and O(rs log s) operations per iteration step. This is significantly less than the cost of using a generic block-circulant preconditioner. It should be noted that the block tri-diagonal structure of A does not change the computational cost associated with using a point-circulant preconditioner to solve Ax = b.
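The following dense sketch mirrors Equations (14)-(18): Chan's approximation of each s x s block, FFTs along the block dimension, one small r x r solve per Fourier mode (the \tilde{\Delta}_{k,k} of Equation (17)), and an inverse FFT. It is written for clarity rather than speed; in the algorithm above, the \tilde{\Delta}_{k,k} would be factorized once at the first iteration and reused.

```python
import numpy as np

def block_circulant_solve(blocks, rhs):
    """Apply c_B(A)^{-1} to rhs. blocks[i][j] is the dense s x s block
    A_{i,j}; rhs has length r*s and is ordered block by block."""
    r, s = len(blocks), blocks[0][0].shape[0]
    col = np.zeros((r, r, s))                  # first columns of each c(A_{i,j})
    a, b = np.indices((s, s))
    for i in range(r):
        for j in range(r):
            np.add.at(col[i, j], ((a - b) % s).ravel(),
                      blocks[i][j].ravel() / s)
    lam = np.fft.fft(col, axis=2)              # the delta(F A_{i,j} F^*) entries
    rhat = np.fft.fft(rhs.reshape(r, s), axis=1)
    zhat = np.empty_like(rhat)
    for k in range(s):                         # one r x r solve per mode k
        zhat[:, k] = np.linalg.solve(lam[:, :, k], rhat[:, k])
    return np.fft.ifft(zhat, axis=1).real.reshape(r * s)
```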

5. MODEL PROBLEM: TRIANGULAR LATTICE SYSTEM

Consider a 2D random thresholds fuse lattice system of size L x L with periodic boundary conditions in the horizontal direction and a unit voltage difference, V = 1, applied between the top and bottom bus bars of the lattice system (see Figure 1). For the triangular lattice topology, N_el = (3L + 1)(L + 1) and N = L(L + 1). The model problem considered in this study is well described in References [22, 23], Reference [1, p. 45], Reference [2, p. 231], and the references therein. The elastic response of each bond in the lattice is linear up to an assigned threshold value, at which brittle failure of the bond occurs. The disorder in the system is introduced by assigning random maximum threshold current values t (equivalent to the breaking stress in the mechanical problem) to each of the fuses (bonds) in the lattice, based on an assumed probability distribution. The electrical conductance (stiffness in the mechanical problem) is assumed to be the same, and equal to unity, for all the bonds in the lattice. This is justified because the conductance (or stiffness) of a heterogeneous solid converges rapidly to its scale-independent continuum value. A uniform probability distribution, which is constant between 0 and 1, is chosen as the probability distribution of failure thresholds. A broad thresholds distribution represents large disorder and exhibits diffusive damage (uncorrelated burning of fuses) leading to progressive damage localization, whereas a very narrow thresholds distribution exhibits brittle failure in which the propagation of a single crack causes material failure. This relatively simple model has been extensively used in the literature [1, 2, 22-25] for simulating fracture and progressive damage evolution in brittle materials, and provides a meaningful benchmark for comparing the performance of different numerical algorithms.

As mentioned earlier, damage evolution in materials is simulated by slowly increasing the external voltage on the lattice system until the current i flowing in an individual fuse exceeds its breaking threshold t. At this point, the fuse is burnt irreversibly. Each time a fuse is removed, the electrical current is instantaneously redistributed and a new system of Kirchhoff equations is solved to determine the fuse that is going to burn up under the redistributed currents. The simulation is initiated with an intact lattice, and the burning of fuses under a gradual increase of external voltage is repeated until the entire lattice system falls apart. Figure 2 presents snapshots of progressive damage evolution for the case of a uniformly distributed random thresholds model problem in a triangular fuse lattice system of size L = 256. Similarly, Figures 3 and 4 present snapshots of the damage and the scaled stress distribution, respectively, in a triangular spring lattice system of size L = 256 (see Appendix A for algorithmic details of the Cholesky downdating scheme for simulating a lattice system with spring elements).

5.1. Brief summary of the solver types used in the numerical simulations

For the numerical simulation of a random thresholds fuse model, we use the following four alternative solver types, presented once again for quick reference:

Solver Type 1: Given the factor L_m of A_m, the rank-1 sparse Cholesky update/downdate (Algorithm 1) presented in Section 3.1 is used to downdate the factorization L_{n+1} for all subsequent values of n = m, m + 1, .... Once the factorization L_{n+1} of A_{n+1} is obtained, the solution vector x_{n+1} is obtained by two triangular solves of L_{n+1} L_{n+1}^t x_{n+1} = b_{n+1}.

Solver Type 2: Given the factor L_m of A_m, the solution vector x_{n+1} after the (n + 1)th fuse is burnt, for n = m, m + 1, ..., m + maxupd, is obtained using the algorithm presented in Section 3.2. The multiple-rank sparse Cholesky update/downdate algorithm presented in Section 3.1 is used to downdate the factorization L_m -> L_{m+maxupd+1} after every maxupd steps.

Solver Type 3: Assuming that the factor L_m of the matrix A_m is available after the mth fuse is burnt, the implementation follows Algorithm 4 presented in Section 3.2 to determine the solution vector x_{n+1} after the (n + 1)th fuse is burnt, for n = m, m + 1, ..., m + maxupd. Factorization of the matrix A is performed every maxupd steps.

Figure 2. Snapshots of damage in a typical triangular lattice system of size L = 256. (a)-(i) represent the snapshots of damage after n_b bonds are broken, beginning with n_b = 6250 at (a); snapshot (d) corresponds to the peak load and snapshot (i) to failure. (The remaining n_b values, and the bond counts at peak load and at failure, are not recoverable here.)

Solver Type 4: Assuming that the factor L_m of A_m is available, the dense matrix update (Algorithm 5) presented in Appendix C is used to obtain the solution vector x_{n+1} after the (n + 1)th fuse is burnt, for n = m, m + 1, ..., m + maxupd. Once again, factorization of the matrix A is performed every maxupd steps.

In addition to the above four types of direct solvers, we have performed numerical simulations using PCG iterative solvers. The first iterative solver is CG without any preconditioner.

Figure 3. Snapshots of damage (broken bonds) evolution in a typical triangular spring lattice system of size L = 256. (1)-(9) represent the snapshots of damage after n_b bonds are broken, beginning with n_b = 5000 at (1); snapshot (5) is just after the peak load and snapshot (9) is close to failure. (The remaining n_b values, and the bond counts at peak load and at failure, are not recoverable here.)

An incomplete Cholesky factorization with a drop tolerance of 10^{-3} is used as the second PCG iterative solver. The drop tolerance sets the small elements of the Cholesky factor to zero after they are computed but before they update other coefficients. Elements are dropped if they are smaller than the drop tolerance times the norm of the column of the Cholesky factor, they are not on the diagonal, and they are not in the non-zero pattern of A_n. In addition, the factorization is modified so that the row sums of L_n L_n^t are equal to the row sums of A_n. Circulant (Equation (11)) and block-circulant (Equation (14)) PCG solvers have also been used to accelerate the iterative solution.
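SciPy does not ship the modified incomplete Cholesky factorization used here, but its ILU factorization offers the same drop-tolerance behaviour for experimentation. The sketch below (an approximate stand-in, not the paper's preconditioner) wires it into PCG:

```python
import scipy.sparse.linalg as spla

def ilu_pcg(A, b, drop_tol=1e-3):
    """PCG with a drop-tolerance incomplete factorization: entries below
    drop_tol (relative) are discarded from the factor, as described above.
    Note: spilu computes an ILU, not the modified incomplete Cholesky."""
    ilu = spla.spilu(A.tocsc(), drop_tol=drop_tol)
    M = spla.LinearOperator(A.shape, matvec=ilu.solve, dtype=A.dtype)
    return spla.cg(A, b, M=M)
```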

Figure 4. Scaled stress distribution corresponding to the snapshots of damage in Figure 3.

Once again, it should be noted that the Fourier-accelerated PCG presented in References [7-9] is not optimal in the sense described in References [17-20], and hence is expected to take more CG iterations compared with the optimal and superoptimal circulant preconditioners.

In the numerical simulations using Solver Types 1-4, the maximum number of vector updates, maxupd, is chosen to be a constant for a given lattice size L. We choose maxupd = 25 for L = {4, 8, 16, 24, 32}, maxupd = 50 for L = 64, and maxupd = 100 for L = {128, 256, 512}. For L = 512, maxupd is limited to 100 by memory constraints. By keeping the maxupd value constant, it is possible to compare realistically the computational cost associated with the different solver types. Moreover, the relative cpu times taken by these algorithms remain the same even when the simulations are performed on different platforms.

It is well known that the convergence rate of an iterative solver depends on the quality of the starting vector. When the PCG iterative solvers described in Section 4 are used in updating the solution x_n -> x_{n+1}, we use either the solution from the previous converged state, i.e. x_n, or the zero vector, 0, as the starting vector for the CG iteration of x_{n+1}. The choice between x_n and 0 as the starting vector depends on which vector is closer to b_{n+1} in the 2-norm. That is, if ||b_{n+1} - A_{n+1} x_n||_2 <= ||b_{n+1}||_2, then x_n is used; otherwise, 0 is used as the starting vector for the x_{n+1} CG iteration. A fixed relative residual tolerance \epsilon is used in the CG iteration.

5.2. Numerical simulation results

Tables I-IV present the cpu and wall-clock times taken for one configuration (simulation) using Solver Types 1-4, respectively. These tables also indicate the number of configurations used for ensemble averaging.

Table I. Solver type 1. Columns: lattice size, CPU (s), wall-clock time (s), number of simulations. (The tabulated values are not recoverable here.)

Table II. Solver type 2. Same columns; values not recoverable here.

Table III. Solver type 3. Same columns; values not recoverable here.

Table IV. Solver type 4. Same columns; values not recoverable here.


More information

Applications of Randomized Methods for Decomposing and Simulating from Large Covariance Matrices

Applications of Randomized Methods for Decomposing and Simulating from Large Covariance Matrices Applications of Randomized Methods for Decomposing and Simulating from Large Covariance Matrices Vahid Dehdari and Clayton V. Deutsch Geostatistical modeling involves many variables and many locations.

More information

Key words. numerical linear algebra, direct methods, Cholesky factorization, sparse matrices, mathematical software, matrix updates.

Key words. numerical linear algebra, direct methods, Cholesky factorization, sparse matrices, mathematical software, matrix updates. ROW MODIFICATIONS OF A SPARSE CHOLESKY FACTORIZATION TIMOTHY A. DAVIS AND WILLIAM W. HAGER Abstract. Given a sparse, symmetric positive definite matrix C and an associated sparse Cholesky factorization

More information

Linear System of Equations

Linear System of Equations Linear System of Equations Linear systems are perhaps the most widely applied numerical procedures when real-world situation are to be simulated. Example: computing the forces in a TRUSS. F F 5. 77F F.

More information

L. Vandenberghe EE133A (Spring 2017) 3. Matrices. notation and terminology. matrix operations. linear and affine functions.

L. Vandenberghe EE133A (Spring 2017) 3. Matrices. notation and terminology. matrix operations. linear and affine functions. L Vandenberghe EE133A (Spring 2017) 3 Matrices notation and terminology matrix operations linear and affine functions complexity 3-1 Matrix a rectangular array of numbers, for example A = 0 1 23 01 13

More information

Direct and Incomplete Cholesky Factorizations with Static Supernodes

Direct and Incomplete Cholesky Factorizations with Static Supernodes Direct and Incomplete Cholesky Factorizations with Static Supernodes AMSC 661 Term Project Report Yuancheng Luo 2010-05-14 Introduction Incomplete factorizations of sparse symmetric positive definite (SSPD)

More information

LINEAR SYSTEMS (11) Intensive Computation

LINEAR SYSTEMS (11) Intensive Computation LINEAR SYSTEMS () Intensive Computation 27-8 prof. Annalisa Massini Viviana Arrigoni EXACT METHODS:. GAUSSIAN ELIMINATION. 2. CHOLESKY DECOMPOSITION. ITERATIVE METHODS:. JACOBI. 2. GAUSS-SEIDEL 2 CHOLESKY

More information

Incomplete Cholesky preconditioners that exploit the low-rank property

Incomplete Cholesky preconditioners that exploit the low-rank property anapov@ulb.ac.be ; http://homepages.ulb.ac.be/ anapov/ 1 / 35 Incomplete Cholesky preconditioners that exploit the low-rank property (theory and practice) Artem Napov Service de Métrologie Nucléaire, Université

More information

Lecture 18 Classical Iterative Methods

Lecture 18 Classical Iterative Methods Lecture 18 Classical Iterative Methods MIT 18.335J / 6.337J Introduction to Numerical Methods Per-Olof Persson November 14, 2006 1 Iterative Methods for Linear Systems Direct methods for solving Ax = b,

More information

Applied Linear Algebra

Applied Linear Algebra Applied Linear Algebra Peter J. Olver School of Mathematics University of Minnesota Minneapolis, MN 55455 olver@math.umn.edu http://www.math.umn.edu/ olver Chehrzad Shakiban Department of Mathematics University

More information

Lecture 17: Iterative Methods and Sparse Linear Algebra

Lecture 17: Iterative Methods and Sparse Linear Algebra Lecture 17: Iterative Methods and Sparse Linear Algebra David Bindel 25 Mar 2014 Logistics HW 3 extended to Wednesday after break HW 4 should come out Monday after break Still need project description

More information

5.1 Banded Storage. u = temperature. The five-point difference operator. uh (x, y + h) 2u h (x, y)+u h (x, y h) uh (x + h, y) 2u h (x, y)+u h (x h, y)

5.1 Banded Storage. u = temperature. The five-point difference operator. uh (x, y + h) 2u h (x, y)+u h (x, y h) uh (x + h, y) 2u h (x, y)+u h (x h, y) 5.1 Banded Storage u = temperature u= u h temperature at gridpoints u h = 1 u= Laplace s equation u= h u = u h = grid size u=1 The five-point difference operator 1 u h =1 uh (x + h, y) 2u h (x, y)+u h

More information

Numerical Methods I: Numerical linear algebra

Numerical Methods I: Numerical linear algebra 1/3 Numerical Methods I: Numerical linear algebra Georg Stadler Courant Institute, NYU stadler@cimsnyuedu September 1, 017 /3 We study the solution of linear systems of the form Ax = b with A R n n, x,

More information

Scientific Computing WS 2018/2019. Lecture 9. Jürgen Fuhrmann Lecture 9 Slide 1

Scientific Computing WS 2018/2019. Lecture 9. Jürgen Fuhrmann Lecture 9 Slide 1 Scientific Computing WS 2018/2019 Lecture 9 Jürgen Fuhrmann juergen.fuhrmann@wias-berlin.de Lecture 9 Slide 1 Lecture 9 Slide 2 Simple iteration with preconditioning Idea: Aû = b iterative scheme û = û

More information

Uniform Boundedness of a Preconditioned Normal Matrix Used in Interior Point Methods

Uniform Boundedness of a Preconditioned Normal Matrix Used in Interior Point Methods Uniform Boundedness of a Preconditioned Normal Matrix Used in Interior Point Methods Renato D. C. Monteiro Jerome W. O Neal Takashi Tsuchiya March 31, 2003 (Revised: December 3, 2003) Abstract Solving

More information

Scientific Computing: Solving Linear Systems

Scientific Computing: Solving Linear Systems Scientific Computing: Solving Linear Systems Aleksandar Donev Courant Institute, NYU 1 donev@courant.nyu.edu 1 Course MATH-GA.2043 or CSCI-GA.2112, Spring 2012 September 17th and 24th, 2015 A. Donev (Courant

More information

CME 302: NUMERICAL LINEAR ALGEBRA FALL 2005/06 LECTURE 6

CME 302: NUMERICAL LINEAR ALGEBRA FALL 2005/06 LECTURE 6 CME 302: NUMERICAL LINEAR ALGEBRA FALL 2005/06 LECTURE 6 GENE H GOLUB Issues with Floating-point Arithmetic We conclude our discussion of floating-point arithmetic by highlighting two issues that frequently

More information

Bindel, Fall 2016 Matrix Computations (CS 6210) Notes for

Bindel, Fall 2016 Matrix Computations (CS 6210) Notes for 1 Iteration basics Notes for 2016-11-07 An iterative solver for Ax = b is produces a sequence of approximations x (k) x. We always stop after finitely many steps, based on some convergence criterion, e.g.

More information

Algorithm for Sparse Approximate Inverse Preconditioners in the Conjugate Gradient Method

Algorithm for Sparse Approximate Inverse Preconditioners in the Conjugate Gradient Method Algorithm for Sparse Approximate Inverse Preconditioners in the Conjugate Gradient Method Ilya B. Labutin A.A. Trofimuk Institute of Petroleum Geology and Geophysics SB RAS, 3, acad. Koptyug Ave., Novosibirsk

More information

Introduction to Mathematical Programming

Introduction to Mathematical Programming Introduction to Mathematical Programming Ming Zhong Lecture 6 September 12, 2018 Ming Zhong (JHU) AMS Fall 2018 1 / 20 Table of Contents 1 Ming Zhong (JHU) AMS Fall 2018 2 / 20 Solving Linear Systems A

More information

Solving Dense Linear Systems I

Solving Dense Linear Systems I Solving Dense Linear Systems I Solving Ax = b is an important numerical method Triangular system: [ l11 l 21 if l 11, l 22 0, ] [ ] [ ] x1 b1 = l 22 x 2 b 2 x 1 = b 1 /l 11 x 2 = (b 2 l 21 x 1 )/l 22 Chih-Jen

More information

AMS526: Numerical Analysis I (Numerical Linear Algebra)

AMS526: Numerical Analysis I (Numerical Linear Algebra) AMS526: Numerical Analysis I (Numerical Linear Algebra) Lecture 3: Positive-Definite Systems; Cholesky Factorization Xiangmin Jiao Stony Brook University Xiangmin Jiao Numerical Analysis I 1 / 11 Symmetric

More information

Jae Heon Yun and Yu Du Han

Jae Heon Yun and Yu Du Han Bull. Korean Math. Soc. 39 (2002), No. 3, pp. 495 509 MODIFIED INCOMPLETE CHOLESKY FACTORIZATION PRECONDITIONERS FOR A SYMMETRIC POSITIVE DEFINITE MATRIX Jae Heon Yun and Yu Du Han Abstract. We propose

More information

AMS526: Numerical Analysis I (Numerical Linear Algebra for Computational and Data Sciences)

AMS526: Numerical Analysis I (Numerical Linear Algebra for Computational and Data Sciences) AMS526: Numerical Analysis I (Numerical Linear Algebra for Computational and Data Sciences) Lecture 1: Course Overview; Matrix Multiplication Xiangmin Jiao Stony Brook University Xiangmin Jiao Numerical

More information

Basic Concepts in Linear Algebra

Basic Concepts in Linear Algebra Basic Concepts in Linear Algebra Grady B Wright Department of Mathematics Boise State University February 2, 2015 Grady B Wright Linear Algebra Basics February 2, 2015 1 / 39 Numerical Linear Algebra Linear

More information

Chapter 2 Finite Element Formulations

Chapter 2 Finite Element Formulations Chapter 2 Finite Element Formulations The governing equations for problems solved by the finite element method are typically formulated by partial differential equations in their original form. These are

More information

Review Questions REVIEW QUESTIONS 71

Review Questions REVIEW QUESTIONS 71 REVIEW QUESTIONS 71 MATLAB, is [42]. For a comprehensive treatment of error analysis and perturbation theory for linear systems and many other problems in linear algebra, see [126, 241]. An overview of

More information

Jordan Journal of Mathematics and Statistics (JJMS) 5(3), 2012, pp A NEW ITERATIVE METHOD FOR SOLVING LINEAR SYSTEMS OF EQUATIONS

Jordan Journal of Mathematics and Statistics (JJMS) 5(3), 2012, pp A NEW ITERATIVE METHOD FOR SOLVING LINEAR SYSTEMS OF EQUATIONS Jordan Journal of Mathematics and Statistics JJMS) 53), 2012, pp.169-184 A NEW ITERATIVE METHOD FOR SOLVING LINEAR SYSTEMS OF EQUATIONS ADEL H. AL-RABTAH Abstract. The Jacobi and Gauss-Seidel iterative

More information

Numerical Linear Algebra Primer. Ryan Tibshirani Convex Optimization

Numerical Linear Algebra Primer. Ryan Tibshirani Convex Optimization Numerical Linear Algebra Primer Ryan Tibshirani Convex Optimization 10-725 Consider Last time: proximal Newton method min x g(x) + h(x) where g, h convex, g twice differentiable, and h simple. Proximal

More information

Reduction of Random Variables in Structural Reliability Analysis

Reduction of Random Variables in Structural Reliability Analysis Reduction of Random Variables in Structural Reliability Analysis S. Adhikari and R. S. Langley Department of Engineering University of Cambridge Trumpington Street Cambridge CB2 1PZ (U.K.) February 21,

More information

A MULTIGRID ALGORITHM FOR. Richard E. Ewing and Jian Shen. Institute for Scientic Computation. Texas A&M University. College Station, Texas SUMMARY

A MULTIGRID ALGORITHM FOR. Richard E. Ewing and Jian Shen. Institute for Scientic Computation. Texas A&M University. College Station, Texas SUMMARY A MULTIGRID ALGORITHM FOR THE CELL-CENTERED FINITE DIFFERENCE SCHEME Richard E. Ewing and Jian Shen Institute for Scientic Computation Texas A&M University College Station, Texas SUMMARY In this article,

More information

Contents. Preface... xi. Introduction...

Contents. Preface... xi. Introduction... Contents Preface... xi Introduction... xv Chapter 1. Computer Architectures... 1 1.1. Different types of parallelism... 1 1.1.1. Overlap, concurrency and parallelism... 1 1.1.2. Temporal and spatial parallelism

More information

Advanced Digital Signal Processing -Introduction

Advanced Digital Signal Processing -Introduction Advanced Digital Signal Processing -Introduction LECTURE-2 1 AP9211- ADVANCED DIGITAL SIGNAL PROCESSING UNIT I DISCRETE RANDOM SIGNAL PROCESSING Discrete Random Processes- Ensemble Averages, Stationary

More information

2012 Sohan Sudhir Kale

2012 Sohan Sudhir Kale 2012 Sohan Sudhir Kale ELASTIC-PLASTIC-BRITTLE TRANSITIONS USING ANTI-PLANE ELASTIC SPRING LATTICE MODEL BY SOHAN SUDHIR KALE THESIS Submitted in partial fulfillment of the requirements for the degree

More information

Preconditioning Techniques Analysis for CG Method

Preconditioning Techniques Analysis for CG Method Preconditioning Techniques Analysis for CG Method Huaguang Song Department of Computer Science University of California, Davis hso@ucdavis.edu Abstract Matrix computation issue for solve linear system

More information

Sparse Linear Systems. Iterative Methods for Sparse Linear Systems. Motivation for Studying Sparse Linear Systems. Partial Differential Equations

Sparse Linear Systems. Iterative Methods for Sparse Linear Systems. Motivation for Studying Sparse Linear Systems. Partial Differential Equations Sparse Linear Systems Iterative Methods for Sparse Linear Systems Matrix Computations and Applications, Lecture C11 Fredrik Bengzon, Robert Söderlund We consider the problem of solving the linear system

More information

PowerPoints organized by Dr. Michael R. Gustafson II, Duke University

PowerPoints organized by Dr. Michael R. Gustafson II, Duke University Part 3 Chapter 10 LU Factorization PowerPoints organized by Dr. Michael R. Gustafson II, Duke University All images copyright The McGraw-Hill Companies, Inc. Permission required for reproduction or display.

More information

Review of Basic Concepts in Linear Algebra

Review of Basic Concepts in Linear Algebra Review of Basic Concepts in Linear Algebra Grady B Wright Department of Mathematics Boise State University September 7, 2017 Math 565 Linear Algebra Review September 7, 2017 1 / 40 Numerical Linear Algebra

More information

Scientific Computing: An Introductory Survey

Scientific Computing: An Introductory Survey Scientific Computing: An Introductory Survey Chapter 2 Systems of Linear Equations Prof. Michael T. Heath Department of Computer Science University of Illinois at Urbana-Champaign Copyright c 2002. Reproduction

More information

VORONOI APPLIED ELEMENT METHOD FOR STRUCTURAL ANALYSIS: THEORY AND APPLICATION FOR LINEAR AND NON-LINEAR MATERIALS

VORONOI APPLIED ELEMENT METHOD FOR STRUCTURAL ANALYSIS: THEORY AND APPLICATION FOR LINEAR AND NON-LINEAR MATERIALS The 4 th World Conference on Earthquake Engineering October -7, 008, Beijing, China VORONOI APPLIED ELEMENT METHOD FOR STRUCTURAL ANALYSIS: THEORY AND APPLICATION FOR LINEAR AND NON-LINEAR MATERIALS K.

More information

PILE SOIL INTERACTION MOMENT AREA METHOD

PILE SOIL INTERACTION MOMENT AREA METHOD Pile IGC Soil 2009, Interaction Moment Guntur, INDIA Area Method PILE SOIL INTERACTION MOMENT AREA METHOD D.M. Dewaikar Professor, Department of Civil Engineering, IIT Bombay, Mumbai 400 076, India. E-mail:

More information

Computational Linear Algebra

Computational Linear Algebra Computational Linear Algebra PD Dr. rer. nat. habil. Ralf Peter Mundani Computation in Engineering / BGU Scientific Computing in Computer Science / INF Winter Term 2017/18 Part 2: Direct Methods PD Dr.

More information

Math 307 Learning Goals. March 23, 2010

Math 307 Learning Goals. March 23, 2010 Math 307 Learning Goals March 23, 2010 Course Description The course presents core concepts of linear algebra by focusing on applications in Science and Engineering. Examples of applications from recent

More information

Lecture Notes to Accompany. Scientific Computing An Introductory Survey. by Michael T. Heath. Chapter 2. Systems of Linear Equations

Lecture Notes to Accompany. Scientific Computing An Introductory Survey. by Michael T. Heath. Chapter 2. Systems of Linear Equations Lecture Notes to Accompany Scientific Computing An Introductory Survey Second Edition by Michael T. Heath Chapter 2 Systems of Linear Equations Copyright c 2001. Reproduction permitted only for noncommercial,

More information

Numerical Analysis: Solving Systems of Linear Equations

Numerical Analysis: Solving Systems of Linear Equations Numerical Analysis: Solving Systems of Linear Equations Mirko Navara http://cmpfelkcvutcz/ navara/ Center for Machine Perception, Department of Cybernetics, FEE, CTU Karlovo náměstí, building G, office

More information

Cholesky factorisations of linear systems coming from a finite difference method applied to singularly perturbed problems

Cholesky factorisations of linear systems coming from a finite difference method applied to singularly perturbed problems Cholesky factorisations of linear systems coming from a finite difference method applied to singularly perturbed problems Thái Anh Nhan and Niall Madden The Boundary and Interior Layers - Computational

More information

Robust Preconditioned Conjugate Gradient for the GPU and Parallel Implementations

Robust Preconditioned Conjugate Gradient for the GPU and Parallel Implementations Robust Preconditioned Conjugate Gradient for the GPU and Parallel Implementations Rohit Gupta, Martin van Gijzen, Kees Vuik GPU Technology Conference 2012, San Jose CA. GPU Technology Conference 2012,

More information

Linear System of Equations

Linear System of Equations Linear System of Equations Linear systems are perhaps the most widely applied numerical procedures when real-world situation are to be simulated. Example: computing the forces in a TRUSS. F F 5. 77F F.

More information

Iterative solution methods and their rate of convergence

Iterative solution methods and their rate of convergence Uppsala University Graduate School in Mathematics and Computing Institute for Information Technology Numerical Linear Algebra FMB and MN Fall 2007 Mandatory Assignment 3a: Iterative solution methods and

More information

AMS Mathematics Subject Classification : 65F10,65F50. Key words and phrases: ILUS factorization, preconditioning, Schur complement, 1.

AMS Mathematics Subject Classification : 65F10,65F50. Key words and phrases: ILUS factorization, preconditioning, Schur complement, 1. J. Appl. Math. & Computing Vol. 15(2004), No. 1, pp. 299-312 BILUS: A BLOCK VERSION OF ILUS FACTORIZATION DAVOD KHOJASTEH SALKUYEH AND FAEZEH TOUTOUNIAN Abstract. ILUS factorization has many desirable

More information

Poisson Solvers. William McLean. April 21, Return to Math3301/Math5315 Common Material.

Poisson Solvers. William McLean. April 21, Return to Math3301/Math5315 Common Material. Poisson Solvers William McLean April 21, 2004 Return to Math3301/Math5315 Common Material 1 Introduction Many problems in applied mathematics lead to a partial differential equation of the form a 2 u +

More information

Conjugate Gradient (CG) Method

Conjugate Gradient (CG) Method Conjugate Gradient (CG) Method by K. Ozawa 1 Introduction In the series of this lecture, I will introduce the conjugate gradient method, which solves efficiently large scale sparse linear simultaneous

More information

Cholesky factorisation of linear systems coming from finite difference approximations of singularly perturbed problems

Cholesky factorisation of linear systems coming from finite difference approximations of singularly perturbed problems Cholesky factorisation of linear systems coming from finite difference approximations of singularly perturbed problems Thái Anh Nhan and Niall Madden Abstract We consider the solution of large linear systems

More information

J.I. Aliaga 1 M. Bollhöfer 2 A.F. Martín 1 E.S. Quintana-Ortí 1. March, 2009

J.I. Aliaga 1 M. Bollhöfer 2 A.F. Martín 1 E.S. Quintana-Ortí 1. March, 2009 Parallel Preconditioning of Linear Systems based on ILUPACK for Multithreaded Architectures J.I. Aliaga M. Bollhöfer 2 A.F. Martín E.S. Quintana-Ortí Deparment of Computer Science and Engineering, Univ.

More information

Scientific Computing: Dense Linear Systems

Scientific Computing: Dense Linear Systems Scientific Computing: Dense Linear Systems Aleksandar Donev Courant Institute, NYU 1 donev@courant.nyu.edu 1 Course MATH-GA.2043 or CSCI-GA.2112, Spring 2012 February 9th, 2012 A. Donev (Courant Institute)

More information

Notes on PCG for Sparse Linear Systems

Notes on PCG for Sparse Linear Systems Notes on PCG for Sparse Linear Systems Luca Bergamaschi Department of Civil Environmental and Architectural Engineering University of Padova e-mail luca.bergamaschi@unipd.it webpage www.dmsa.unipd.it/

More information

Gaussian Elimination and Back Substitution

Gaussian Elimination and Back Substitution Jim Lambers MAT 610 Summer Session 2009-10 Lecture 4 Notes These notes correspond to Sections 31 and 32 in the text Gaussian Elimination and Back Substitution The basic idea behind methods for solving

More information

9. Iterative Methods for Large Linear Systems

9. Iterative Methods for Large Linear Systems EE507 - Computational Techniques for EE Jitkomut Songsiri 9. Iterative Methods for Large Linear Systems introduction splitting method Jacobi method Gauss-Seidel method successive overrelaxation (SOR) 9-1

More information

MATRIX ALGEBRA AND SYSTEMS OF EQUATIONS. + + x 1 x 2. x n 8 (4) 3 4 2

MATRIX ALGEBRA AND SYSTEMS OF EQUATIONS. + + x 1 x 2. x n 8 (4) 3 4 2 MATRIX ALGEBRA AND SYSTEMS OF EQUATIONS SYSTEMS OF EQUATIONS AND MATRICES Representation of a linear system The general system of m equations in n unknowns can be written a x + a 2 x 2 + + a n x n b a

More information

RESEARCH ARTICLE. A strategy of finding an initial active set for inequality constrained quadratic programming problems

RESEARCH ARTICLE. A strategy of finding an initial active set for inequality constrained quadratic programming problems Optimization Methods and Software Vol. 00, No. 00, July 200, 8 RESEARCH ARTICLE A strategy of finding an initial active set for inequality constrained quadratic programming problems Jungho Lee Computer

More information

Homework 2 Foundations of Computational Math 2 Spring 2019

Homework 2 Foundations of Computational Math 2 Spring 2019 Homework 2 Foundations of Computational Math 2 Spring 2019 Problem 2.1 (2.1.a) Suppose (v 1,λ 1 )and(v 2,λ 2 ) are eigenpairs for a matrix A C n n. Show that if λ 1 λ 2 then v 1 and v 2 are linearly independent.

More information

6.4 Krylov Subspaces and Conjugate Gradients

6.4 Krylov Subspaces and Conjugate Gradients 6.4 Krylov Subspaces and Conjugate Gradients Our original equation is Ax = b. The preconditioned equation is P Ax = P b. When we write P, we never intend that an inverse will be explicitly computed. P

More information

Preface to the Second Edition. Preface to the First Edition

Preface to the Second Edition. Preface to the First Edition n page v Preface to the Second Edition Preface to the First Edition xiii xvii 1 Background in Linear Algebra 1 1.1 Matrices................................. 1 1.2 Square Matrices and Eigenvalues....................

More information

Linear Algebra Review

Linear Algebra Review Chapter 1 Linear Algebra Review It is assumed that you have had a course in linear algebra, and are familiar with matrix multiplication, eigenvectors, etc. I will review some of these terms here, but quite

More information

SOLVING SPARSE LINEAR SYSTEMS OF EQUATIONS. Chao Yang Computational Research Division Lawrence Berkeley National Laboratory Berkeley, CA, USA

SOLVING SPARSE LINEAR SYSTEMS OF EQUATIONS. Chao Yang Computational Research Division Lawrence Berkeley National Laboratory Berkeley, CA, USA 1 SOLVING SPARSE LINEAR SYSTEMS OF EQUATIONS Chao Yang Computational Research Division Lawrence Berkeley National Laboratory Berkeley, CA, USA 2 OUTLINE Sparse matrix storage format Basic factorization

More information

A Block Compression Algorithm for Computing Preconditioners

A Block Compression Algorithm for Computing Preconditioners A Block Compression Algorithm for Computing Preconditioners J Cerdán, J Marín, and J Mas Abstract To implement efficiently algorithms for the solution of large systems of linear equations in modern computer

More information

6. Iterative Methods for Linear Systems. The stepwise approach to the solution...

6. Iterative Methods for Linear Systems. The stepwise approach to the solution... 6 Iterative Methods for Linear Systems The stepwise approach to the solution Miriam Mehl: 6 Iterative Methods for Linear Systems The stepwise approach to the solution, January 18, 2013 1 61 Large Sparse

More information

Numerical Linear Algebra

Numerical Linear Algebra Numerical Linear Algebra The two principal problems in linear algebra are: Linear system Given an n n matrix A and an n-vector b, determine x IR n such that A x = b Eigenvalue problem Given an n n matrix

More information

Math 471 (Numerical methods) Chapter 3 (second half). System of equations

Math 471 (Numerical methods) Chapter 3 (second half). System of equations Math 47 (Numerical methods) Chapter 3 (second half). System of equations Overlap 3.5 3.8 of Bradie 3.5 LU factorization w/o pivoting. Motivation: ( ) A I Gaussian Elimination (U L ) where U is upper triangular

More information

AMS526: Numerical Analysis I (Numerical Linear Algebra)

AMS526: Numerical Analysis I (Numerical Linear Algebra) AMS526: Numerical Analysis I (Numerical Linear Algebra) Lecture 24: Preconditioning and Multigrid Solver Xiangmin Jiao SUNY Stony Brook Xiangmin Jiao Numerical Analysis I 1 / 5 Preconditioning Motivation:

More information

1 Multiply Eq. E i by λ 0: (λe i ) (E i ) 2 Multiply Eq. E j by λ and add to Eq. E i : (E i + λe j ) (E i )

1 Multiply Eq. E i by λ 0: (λe i ) (E i ) 2 Multiply Eq. E j by λ and add to Eq. E i : (E i + λe j ) (E i ) Direct Methods for Linear Systems Chapter Direct Methods for Solving Linear Systems Per-Olof Persson persson@berkeleyedu Department of Mathematics University of California, Berkeley Math 18A Numerical

More information

Direct solution methods for sparse matrices. p. 1/49

Direct solution methods for sparse matrices. p. 1/49 Direct solution methods for sparse matrices p. 1/49 p. 2/49 Direct solution methods for sparse matrices Solve Ax = b, where A(n n). (1) Factorize A = LU, L lower-triangular, U upper-triangular. (2) Solve

More information