PhD Thesis. Sparse preconditioners for dense linear systems from electromagnetic applications


N° Ordre: 1879

PhD Thesis
Speciality: Computer Science (Informatique)

Sparse preconditioners for dense linear systems from electromagnetic applications

defended on 23 April 2002 at the Institut National Polytechnique de Toulouse
by Bruno CARPENTIERI (CERFACS)

before the jury composed of:

G. Alléon, EADS
M. Daydé, Professor at ENSEEIHT (President)
I. S. Duff, Project Leader at CERFACS and Group Leader at the Rutherford Appleton Laboratory
L. Giraud, CERFACS
G. Meurant, CEA (Referee)
Y. Saad, Professor at the University of Minnesota (Referee)
S. Piperno, INRIA-CERMICS

CERFACS report: TH/PA/02/48


Acknowledgments

I wish to express my sincere gratitude to Iain S. Duff and Luc Giraud, who introduced me to the subject of this thesis and guided my research with keen interest. They taught me the enjoyment of both rigour and simplicity, and let me experience the freedom and the excitement of personal discovery. Without their professional advice and their trust in me, this thesis would not have been possible.

My sincere thanks go to Michel Daydé for his continued support in the development of my research at CERFACS. I am grateful to Gérard Meurant and Yousef Saad, who agreed to act as referees for my thesis; it was an honour for me to benefit from their feedback on my research work. I wish to thank Guillaume Alléon and Serge Piperno, who opened the door to enriching collaborations with EADS and INRIA-CERMICS, respectively, and agreed to take part in my jury. Guillaume Sylvand at INRIA-CERMICS deserves thanks for providing me with codes and valuable support.

Grateful acknowledgments go to the EMC Team at CERFACS for their interest in my work, in particular to Mbarek Fares, who provided me with the CESC code, and to Francis Collino and Florence Millot for many fertile discussions.

I would like to thank sincerely all the members of the Parallel Algorithms Team and CSG at CERFACS for their professional and friendly support, and Brigitte Yzel for her kind help on many occasions. The Parallel Algorithms Team provided a stimulating environment in which to develop my thesis. I am grateful to the many visitors and colleagues who, at different stages, shared my enjoyment of this research.

Above all, I wish to express my deep gratitude to my family and friends for their presence and continued support.

This work was supported by INDAM under the grant Borsa di Studio per l'Estero A.A. (Provvedimento del Presidente del 30 Aprile 1998), and by CERFACS.

- B. C.


To my family


"Don't just say it is impossible without putting in a sincere effort. Observe the word 'impossible' carefully... you can see 'I'm possible'. What really matters is your attitude and your perception."

Anonymous


Abstract

In this work, we investigate the use of sparse approximate inverse preconditioners for the solution of large dense complex linear systems arising from integral equations in electromagnetism. The goal of this study is the development of robust and parallelizable preconditioners that can easily be integrated in simulation codes able to treat large configurations. We first adapt to the dense situation the preconditioners initially developed for sparse linear systems. We compare their respective numerical behaviours and propose a robust pattern selection strategy for Frobenius-norm minimization preconditioners. Our approach has been implemented by another PhD student in a large parallel code that exploits a fast multipole calculation for the matrix-vector products in the Krylov iterations. This enables us to study the numerical scalability of our preconditioner on large academic and industrial test problems in order to identify its limitations. To remove these limitations we propose an embedded scheme. This inner-outer technique enables us to significantly reduce the computational cost of the simulation and to improve the robustness of the preconditioner. In particular, we were able to solve a linear system with more than a million unknowns arising from a simulation on a real aircraft, a solution that was out of reach with our initial technique. Finally, we perform a preliminary study of a spectral two-level preconditioner to enhance the robustness of our preconditioner. This numerical technique exploits spectral information of the preconditioned system to build a low-rank update of the preconditioner.

Keywords: Krylov subspace methods, preconditioning techniques, sparse approximate inverse, Frobenius-norm minimization method, nonzero pattern selection strategies, electromagnetic scattering applications, boundary element method, fast multipole method.


Contents

1 Introduction
    The physical problem and applications
    The mathematical problem
    Numerical solution of Maxwell's equations
        Differential equation methods
        Integral equation methods
    Direct versus iterative solution methods
    A sparse approach for solving scattering problems

2 Iterative solution via preconditioned Krylov solvers of dense systems in electromagnetism
    Introduction and motivation
    Preconditioning based on sparsification strategies
        SSOR
        Incomplete Cholesky factorization
        AINV
        SPAI
        SLU
        Other preconditioners
    Concluding remarks

3 Sparse pattern selection strategies for robust Frobenius-norm minimization preconditioners
    Introduction and motivation
    Pattern selection strategies for Frobenius-norm minimization methods in electromagnetism
        Algebraic strategy
        Topological strategy
        Geometric strategy
    Numerical experiments
        Strategies for the coefficient matrix
        Numerical results
    Concluding remarks

4 Symmetric Frobenius-norm minimization preconditioners in electromagnetism
    Comparison with standard preconditioners
    Symmetrization strategies for the Frobenius-norm minimization method
    Concluding remarks

5 Combining fast multipole techniques and approximate inverse preconditioners for large parallel electromagnetics calculations
    The fast multipole method
    Implementation of the Frobenius-norm minimization preconditioner in the fast multipole framework
    Numerical scalability of the preconditioner
    Improving the preconditioner robustness using embedded iterations
    Concluding remarks

6 Spectral two-level preconditioner
    Introduction and motivation
    Two-level preconditioner via low-rank spectral updates
        Additive formulation
        Numerical experiments
    Symmetric formulation
    Multiplicative formulation of low-rank spectral updates
        Numerical experiments
    Concluding remarks

7 Conclusions and perspectives

A Numerical results with the two-level spectral preconditioner
    A.1 Effect of the low-rank updates on the GMRES convergence
    A.2 Experiments with the operator $W^H = V_\varepsilon^H M^{-1}$
    A.3 Cost of the eigencomputation
    A.4 Sensitivity of the preconditioner to the accuracy of the eigencomputation
    A.5 Experiments with a poor preconditioner M
    A.6 Numerical results for the symmetric formulation
    A.7 Numerical results for the multiplicative formulation

List of Tables

- Number of matrix-vector products needed by some unpreconditioned Krylov solvers to reduce the residual by a prescribed factor.
- Number of iterations using both symmetric and unsymmetric preconditioned Krylov methods to reduce the normwise backward error by $10^{-5}$ on Example 1. The symbol "-" means that convergence was not obtained after 500 iterations; the symbol "*" means that the method is not applicable.
- Number of iterations required by different Krylov solvers preconditioned by SSOR to reduce the residual by a prescribed factor. The symbol "-" means that convergence was not obtained after 500 iterations.
- Number of iterations, varying the sparsity level of $\tilde{A}$ and the level of fill-in (one table for each of the five test examples).
- Number of SQMR iterations, varying the shift parameter for various levels of fill-in in IC.
- Number of iterations required by different Krylov solvers preconditioned by AINV to reduce the residual by a prescribed factor. The symbol "-" means that convergence was not obtained after 500 iterations.
- Number of iterations required by different Krylov solvers preconditioned by AINV to reduce the residual by a prescribed factor, with the preconditioner computed using the dense coefficient matrix. The symbol "-" means that convergence was not obtained after 500 iterations.
- Number of iterations required by different Krylov solvers preconditioned by SPAI to reduce the residual by a prescribed factor. The symbol "-" means that convergence was not obtained after 500 iterations.
- Number of iterations required by different Krylov solvers preconditioned by SLU to reduce the residual by a prescribed factor. The symbol "-" means that convergence was not obtained after 500 iterations.
- Number of iterations using the preconditioners based on dense A.
- Number of iterations for GMRES(50) preconditioned with different values for the density of M, using the same pattern for A and larger patterns. A geometric approach is adopted to construct the patterns. The test problem is Example 1; this is representative of the general behaviour observed.
- Number of iterations to solve the set of test problems.
- CPU time to compute the preconditioners.
- Number of iterations to solve the set of test models using a multiple-density geometric strategy to construct the preconditioner. The pattern imposed on M is twice as dense as that imposed on A.
- Number of iterations to solve the set of test models using a topological strategy to sparsify A and a geometric strategy for the preconditioner. The pattern imposed on M is twice as dense as that imposed on A.
- Number of iterations with some standard preconditioners computed using sparse A (algebraic).
- Number of iterations on the test examples using the same pattern for the preconditioners.
- Number of iterations for $M^{Sym}_{Frob}$ combined with SQMR using three times more non-zeros in $\tilde{A}$ than in the preconditioner.
- Number of iterations of SQMR with $M^{Sym}_{Frob}$ for different values of the density of M, using the same pattern for A and larger patterns.
- Number of iterations of SQMR with $M^{Aver}_{Frob}$ for different values of the density of M, using the same pattern for A and larger patterns.
- Number of iterations of SQMR with $M^{Sym}_{Frob}$ with different orderings.
- Number of iterations on the test examples using the same pattern for the preconditioners. An algebraic pattern is used to sparsify A.
- Number of iterations for $M^{Sym}_{Frob}$ combined with SQMR using three times more non-zeros in $\tilde{A}$ than in the preconditioner. An algebraic pattern is used to sparsify A.
- Number of iterations of SQMR with $M^{Sym}_{Frob}$ for different values of the density of M, using the same pattern for A and larger patterns. A geometric approach is adopted to construct the pattern for the preconditioner and an algebraic approach for the coefficient matrix.
- Number of iterations of SQMR with $M^{Aver}_{Frob}$ for different values of the density of M, using the same pattern for A and larger patterns. A geometric approach is adopted to construct the pattern for the preconditioner and an algebraic approach for the coefficient matrix.
- Number of iterations of SQMR with $M^{Sym}_{Frob}$ with different orderings. An algebraic pattern is used to sparsify A.
- Total number of matrix-vector products required to converge on a sphere for problems of increasing size and a prescribed tolerance. The size of the leaf-boxes in the oct-tree associated with the preconditioner is a fixed number of wavelengths.
- Elapsed time required to build the preconditioner and by GMRES(30) to converge on a sphere for problems of increasing size, on eight processors of a Compaq Alpha server.
- Total number of matrix-vector products required to converge on an aircraft for problems of increasing size and a prescribed tolerance.
- Elapsed time required to build the preconditioner and by GMRES(30) to converge on an aircraft for problems of increasing size, on eight processors of a Compaq Alpha server.
- Elapsed time to build the preconditioner, elapsed time to solve the problem, and total number of matrix-vector products using GMRES(30) on an aircraft, varying the parameters controlling the density of the preconditioner (eight Compaq processors). Stagnation after 1000 iterations is marked by a symbol.
- Tests on the parallel scalability of the code relative to the construction and application of the preconditioner and to the matrix-vector product operation for problems of increasing size. The test example is the Airbus aircraft.
- Global elapsed time and total number of matrix-vector products required to converge on a sphere, varying the size of the restart parameter and the maximum number of inner GMRES iterations per FGMRES preconditioning step (eight Compaq processors).
- Global elapsed time and total number of matrix-vector products required to converge on an aircraft, varying the size of the restart parameter and the maximum number of inner GMRES iterations per FGMRES preconditioning step (eight Compaq processors).
- Total number of matrix-vector products required to converge on a sphere for problems of increasing size and a prescribed tolerance.
- Total number of matrix-vector products required to converge on an aircraft for problems of increasing size and a prescribed tolerance.
- Effect of shifting the eigenvalues nearest zero on the convergence of GMRES(10) for Example 2. We show the magnitude of successively shifted eigenvalues and the number of iterations required when these eigenvalues are shifted. A tolerance of $10^{-8}$ is required in the iterative solution.
- The same for Example 5.
- Number of iterations required by GMRES(10) preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by $10^{-8}$ for increasing size of the coarse space (one table for each of Examples 1 to 5). Different choices are considered for the operator $W^H$.
- Number of matrix-vector products required by the IRAM algorithm to compute approximate eigenvalues nearest zero and the corresponding right eigenvectors.
- Number of amortization vectors required by the IRAM algorithm to compute approximate eigenvalues nearest zero and the corresponding right eigenvectors. The computation of the amortization vectors is relative to GMRES(10) and a prescribed tolerance.
- Number of iterations required by GMRES(10) preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by $10^{-8}$ for increasing size of the coarse space. The formulation of Theorem 2 with the choice $W^H = V_\varepsilon^H M^{-1}$ is used for the low-rank updates. The computation of Ritz pairs is carried out at machine precision.

A.1.1–A.1.10 Number of iterations required by GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by $10^{-8}$ (odd-numbered tables) or $10^{-5}$ (even-numbered tables) for increasing size of the coarse space.
A.2.11–A.2.20 The same, where the formulation of Theorem 2 with the choice $W^H = V_\varepsilon^H M^{-1}$ is used for the low-rank updates.
A.3.21–A.3.25 Number of matrix-vector products, CPU time and amortization vectors required by the IRAM algorithm to compute approximate eigenvalues nearest zero and the corresponding eigenvectors. The computation of the amortization vectors is relative to GMRES(10) and a prescribed tolerance.
A.4.26–A.4.35 Number of iterations required by GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by $10^{-8}$ or $10^{-5}$ for increasing size of the coarse space. The formulation of Theorem 2 with the choice $W^H = V_\varepsilon^H M^{-1}$ is used for the low-rank updates. The computation of Ritz pairs is carried out at machine precision.
A.5.36–A.5.45 Number of iterations required by GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by $10^{-8}$ or $10^{-5}$ for increasing size of the coarse space. The formulation of Theorem 2 with the choice $W^H = V_\varepsilon^H M^{-1}$ is used for the low-rank updates. The same nonzero structure is imposed on A and M.
A.5.46–A.5.50 Number of matrix-vector products, CPU time and amortization vectors required by the IRAM algorithm to compute approximate eigenvalues nearest zero and the corresponding eigenvectors. The computation of the amortization vectors is relative to GMRES(10) and a prescribed tolerance.
A.6.51–A.6.60 Number of iterations required by GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by $10^{-8}$ or $10^{-5}$ for increasing size of the coarse space. The symmetric formulation of Theorem 2 with the choice $W = V_\varepsilon$ is used for the low-rank updates.
A.6.61–A.6.62 The same for SQMR instead of GMRES.
A.7.63–A.7.72 Number of iterations required by GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by $10^{-8}$ or $10^{-5}$ for increasing size of the coarse space. The preconditioner is updated in multiplicative form.
A.7.73–A.7.74 The same for SQMR, where the symmetric formulation of Theorem 2 with the choice $W = V_\varepsilon$ is used for the low-rank updates.


List of Figures

- Example of a discretized mesh.
- Meshes associated with the test examples.
- Eigenvalue distribution in the complex plane of the coefficient matrix of a test example.
- Pattern structure of the large entries of A.
- Nonzero pattern for A when the smallest entries are discarded.
- Sensitivity of SQMR convergence to the SSOR parameter ω (two figures).
- Incomplete factorization algorithm: $M = LDL^T$.
- The spectrum of the matrix preconditioned with IC(1), the condition number of L, and the number of iterations with SQMR for various values of the shift parameter τ. The test problem is Example 1 and the density of $\tilde{A}$ is around 3%.
- The eigenvalue distribution on the square [-1, 1] of the matrix preconditioned with IC(1), the condition number of L, and the number of iterations with SQMR for various values of the shift parameter τ. The test problem is Example 1 and the density of $\tilde{A}$ is around 3%.
- The eigenvalue distribution on the square [-0.3, 0.3] of the matrix preconditioned with IC(1), the condition number of L, and the number of iterations with SQMR for various values of the shift parameter τ. The test problem is Example 1 and the density of $\tilde{A}$ is around 3%.
- The biconjugation algorithm: $M = ZD^{-1}Z^T$.
- Sparsity patterns of the inverse of A (on the left) and of the inverse of its lower triangular factor (on the right), where all the entries whose relative magnitude is below a threshold are dropped. The test problem, representative of the general trend, is a small sphere.
- Histograms of the magnitude of the entries of the first column of $A^{-1}$ and of its lower triangular factor. A similar behaviour has been observed for all the other columns. The test problem, representative of the general trend, is a small sphere.
- Pattern structure of $A^{-1}$.
- Example of a discretized mesh.
- Topological neighbours of a DOF in the mesh.
- Topological localization in the mesh of the large entries of A. The test problem is Example 1 and is representative of the general behaviour.
- Topological localization in the mesh of the large entries of $A^{-1}$. The test problem is Example 1 and is representative of the general behaviour.
- Evolution of the density of the computed pattern for an increasing number of levels. The test problem is Example 1; this is representative of the general behaviour.
- Geometric localization in the mesh of the large entries of A. The test problem is Example 1; this is representative of the general behaviour.
- Geometric localization in the mesh of the large entries of $A^{-1}$. The test problem is Example 1; this is representative of the general behaviour.
- Evolution of the density of the computed pattern for larger geometric neighbourhoods. The test problem is Example 1; this is representative of the general behaviour.
- Mesh of a test example.
- Nonzero pattern for $A^{-1}$ when the smallest entries are discarded.
- Sparsity pattern of the inverse of sparse A associated with Example 1. The pattern has been sparsified with the same value of the threshold used for the sparsification displayed in an earlier figure.
- CPU time for the construction of the preconditioner using different numbers of nonzeros in the patterns for A and M. The test problem is Example 1; this is representative of the other examples.
- Eigenvalue distribution of the coefficient matrix preconditioned using a single-density strategy.
- Eigenvalue distribution of the coefficient matrix preconditioned using a multiple-density strategy.
- Interactions in the one-level FMM. For each leaf-box, the interactions with the grey neighbouring leaf-boxes are computed directly. The contributions of far-away cubes are computed approximately: the multipole expansions of far-away boxes are translated to local expansions for the leaf-box, these contributions are summed together, and the total field induced by the far-away cubes is evaluated from the local expansions.
- The oct-tree in the FMM algorithm. The maximum number of children is eight; the actual number corresponds to the subset of the eight that intersect the object (courtesy of G. Sylvand, INRIA-CERMICS).
- Interactions in the multilevel FMM. The interactions for the grey boxes are computed directly. Dashed lines denote the interaction list for the observation box, which consists of those cubes that are not neighbours of the cube itself but whose parent is a neighbour of the cube's parent. The interactions of the cubes in the list are computed using the FMM. All the other interactions are computed hierarchically on a coarser level, denoted by solid lines.
- Mesh associated with the Airbus aircraft (courtesy of EADS). The surface is discretized by triangles.
- The RCS curve for an Airbus aircraft. The problem is formulated using the EFIE formulation. The quantity reported on the ordinate axis indicates the value of the energy radiated back at different incidence angles.
- The RCS curve for an Airbus aircraft. The problem is formulated using the CFIE formulation and a tolerance of $10^{-6}$ in the iterative solution. The quantity reported on the ordinate axis indicates the value of the energy radiated back at different incidence angles.
- Effect of the restart parameter on GMRES stagnation on an aircraft test case.
- Inner-outer solution schemes in the FMM context: sketch of the algorithm.
- Convergence history of restarted GMRES for different values of the restart on an aircraft test case.
- Effect of the restart parameter on FGMRES stagnation on an aircraft test case using GMRES(20) as inner solver.
- Eigenvalue distribution of the coefficient matrix preconditioned by the Frobenius-norm minimization method.
- Number of iterations required by GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by $10^{-8}$ and $10^{-5}$ for increasing size of the coarse space (one figure for each of Examples 1 to 5).
- Number of iterations required by GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by $10^{-8}$ for three choices of the restart and increasing size of the coarse space (three figures).
- Eigenvalue distribution of the coefficient matrix preconditioned by a Frobenius-norm minimization method on Example 2. The same sparsity pattern is used for A and for the preconditioner.
- Convergence of GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by $10^{-8}$ and $10^{-5}$ for increasing size of the coarse space (one figure for each of Examples 1 to 4). The formulation of Theorem 2 with the choice $W^H = V_\varepsilon^H M^{-1}$ is used for the low-rank updates. The same nonzero structure is used for A and M.
- Number of iterations required by SQMR preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by $10^{-5}$ for increasing size of the coarse space (one figure for each of Examples 1 to 5). The symmetric formulation of Theorem 2 with the choice $W = V_\varepsilon$ is used for the low-rank updates.
- Convergence of GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce the normwise backward error by $10^{-8}$ and $10^{-5}$ for an increasing number of corrections (figures for Examples 1, 3 and 4). The symmetric formulation of Theorem 2 with the choice $W = V_\varepsilon$ is used for the low-rank updates. The preconditioner is updated in multiplicative form.

Chapter 1

Introduction

This thesis considers the problem of designing effective preconditioning strategies for the iterative solution of boundary integral equations in electromagnetism. An accurate numerical solution of these problems is required in the simulation of many industrial processes, such as the prediction of the Radar Cross Section (RCS) of arbitrarily shaped 3D objects like aircraft, the analysis of the electromagnetic compatibility of electrical devices with their environment, and many others. In the last 20 years, owing to the impressive development of computer technology and to the introduction of fast methods, which require less computational cost and fewer memory resources, a rigorous numerical solution of many of these applications has become possible [29]. Nowadays, challenging problems in an industrial setting demand a continuous reduction in the computational complexity of the numerical methods employed. The aim of this research is to investigate the use of sparse linear algebra techniques, with particular emphasis on preconditioning, for the solution of the dense linear systems of equations arising from scattering problems expressed in an integral formulation.

In this chapter, we illustrate the motivation of our research and present the major topics discussed in the thesis. In Section 1.1, we describe the physical problem we are interested in and give some examples of applications. In Section 1.2, we formulate the mathematical problem and, in Section 1.3, we give an overview of the principal approaches generally used to solve scattering problems. Finally, in Section 1.4, we discuss direct and iterative solution strategies and introduce some issues relevant to the design of the preconditioner.

1.1 The physical problem and applications

Electromagnetic scattering problems address the physical issue of detecting the diffraction pattern of the electromagnetic radiation scattered from a large and complex body when it is illuminated by an incident incoming wave. A good understanding of these phenomena is crucial to the design of many industrial devices like radars, antennae, computer microprocessors, optical fibre systems, cellular telephones, transistors, modems, and so on. Electronic circuits produce and are subject to electromagnetic interference, and ensuring reduced radiation and signal distortion has become a major issue in the design of modern electronic devices. The increase of currents and frequencies in industrial simulations makes electromagnetic compatibility requirements more difficult to meet and demands an accurate analysis prior to the design phase.

The study of electromagnetic scattering is required in radar applications, where a target is illuminated by incident radiation and the energy radiated back to the radar is analysed to retrieve information on the target. The amount of radiated energy depends on the radar cross-section of the target, on its shape, on the material of which it is composed, and on the wavelength of the incident radiation. Radar measurements are vital for estimating surface currents in oceanography, for mapping precipitation areas and detecting wind direction and speed in meteorological and climatic studies, as well as in the production of accurate weather forecasts, geophysical prospecting from remote-sensing data, wireless communication and bioelectromagnetics. In particular, the computation of the radar cross-section is used to identify unknown targets as well as to design stealth technology. Modern targets reduce their observability by using new materials. Engineers design, develop and test absorbing materials which can control radiation, reduce the signatures of military systems, preserve electromagnetic compatibility with other devices, and isolate recording studios and listening rooms. A good knowledge of the electromagnetic properties of materials can be critical for economic competitiveness and technological advances in many industrial sectors. All these simulations can be very demanding in terms of computer resources; they require innovative algorithms and the use of high-performance computers to afford a rigorous numerical solution.

1.2 The mathematical problem

The mathematical formulation of scattering problems relies on Maxwell's equations, originally introduced by James Clerk Maxwell in 1864 in the article "A Dynamical Theory of the Electromagnetic Field" [103] as 20 scalar equations.

Maxwell's equations were reformulated in the 1880s as a set of four vector differential equations describing the time and space evolution of the electric and magnetic fields around the scatterer. They are:

$$\nabla \times H = J + \frac{\partial D}{\partial t}, \qquad \nabla \times E = -\frac{\partial B}{\partial t}, \qquad \nabla \cdot D = \rho, \qquad \nabla \cdot B = 0. \qquad (1.2.1)$$

The vector fields which appear in (1.2.1) are the electric field E(x, t), the magnetic field H(x, t), the magnetic flux density B(x, t) and the electric flux density D(x, t). Equations (1.2.1) also involve the current density J(x, t) and the charge density ρ(x, t). Given a vector field A represented in Cartesian coordinates in the form $A(x, y, z) = A_x(x, y, z)\,\mathbf{i} + A_y(x, y, z)\,\mathbf{j} + A_z(x, y, z)\,\mathbf{k}$, the components of the curl operator $\nabla \times A$ are

$$(\nabla \times A)_x = \frac{\partial A_z}{\partial y} - \frac{\partial A_y}{\partial z}, \qquad (\nabla \times A)_y = \frac{\partial A_x}{\partial z} - \frac{\partial A_z}{\partial x}, \qquad (\nabla \times A)_z = \frac{\partial A_y}{\partial x} - \frac{\partial A_x}{\partial y}.$$

The divergence operator $\nabla \cdot A$ in Cartesian coordinates is

$$\nabla \cdot A = \frac{\partial A_x}{\partial x} + \frac{\partial A_y}{\partial y} + \frac{\partial A_z}{\partial z}.$$

The continuity equation, which expresses the conservation of charge, relates the quantities J and ρ:

$$\frac{\partial \rho}{\partial t} + \nabla \cdot J = 0.$$
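Note that the continuity equation is consistent with (1.2.1); as a short check, taking the divergence of the first equation in (1.2.1) and using the vector identity $\nabla \cdot (\nabla \times H) = 0$ together with Gauss's law $\nabla \cdot D = \rho$ gives

$$0 = \nabla \cdot (\nabla \times H) = \nabla \cdot J + \frac{\partial}{\partial t}(\nabla \cdot D) = \nabla \cdot J + \frac{\partial \rho}{\partial t}.$$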

where the functions ɛ(x) and µ(x) are the electric permittivity and the magnetic permeability, respectively. In a vacuum D = E and B = H. This equality can be assumed valid, up to some approximation, when the medium is the air. In this case, Maxwell's equations can be simplified and read:

\[
\nabla \times H = J + \frac{\partial E}{\partial t}, \qquad
\nabla \times E = -\frac{\partial H}{\partial t}, \qquad
\nabla \cdot E = \rho, \qquad
\nabla \cdot H = 0.
\tag{1.2.2}
\]

Boundary conditions are associated with system (1.2.2) to describe different physical situations. For scattering from perfect conductors, which represents an important model problem in industrial simulations, the electric field vanishes inside the object and the total tangential electric field on the surface of the scatterer is zero. Absorbing radiation conditions at infinity are imposed, like the Silver-Müller radiation condition [25]

\[
\lim_{r \to \infty} \left( H^s \times x - r E^s \right) = 0 \quad \text{uniformly in all directions } \hat{x} = x/|x|,
\]

where r = |x| and H^s and E^s are the scattered parts of the fields. A further simplification comes when Maxwell's equations are formulated in the frequency domain rather than in the time domain. Since the sum of two solutions is still a solution, Fourier transformations can be introduced to remove the time-dependency from system (1.2.2), and to write it in the form of a set of several time-independent systems, each corresponding to one fixed value of the frequency. All the quantities in (1.2.2) are assumed to have harmonic behaviour in time, that is, they can be written in the form A(x,t) = A(x)e^{iωt} (ω is a constant), and their time dependency is completely determined by the amplitude and the relative phase. For a dielectric body the new system assumes the form:

\[
\nabla \times H = +i\omega E, \qquad
\nabla \times E = -i\omega H, \qquad
\nabla \cdot E = 0, \qquad
\nabla \cdot H = 0,
\tag{1.2.3}
\]

where now E = E(x) and H = H(x). Here ω = ck = 2πc/λ is referred to as the angular frequency, k as the wave number and λ as the wavelength of the electromagnetic wave. The constant c is the speed of light.
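For orientation, the relation ω = ck = 2πc/λ can be evaluated in a few lines; this is a minimal illustrative sketch (the 1 GHz frequency is a hypothetical value of ours, not taken from the thesis):

```python
from math import pi

c = 299_792_458.0        # speed of light in vacuum (m/s)
f = 1.0e9                # a hypothetical 1 GHz illuminating wave
omega = 2 * pi * f       # angular frequency
lam = c / f              # wavelength: lambda = c/f, about 0.30 m here
k = omega / c            # wave number: k = omega/c = 2*pi/lambda
print(f"lambda = {lam:.3f} m, k = {k:.3f} rad/m")
```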

1.3 Numerical solution of Maxwell's equations

A popular solution approach eliminates the magnetic field H from (1.2.3) and obtains a vector Helmholtz equation with a divergence condition:

\[
\begin{cases}
\Delta E + k^2 E = 0, \\
\nabla \cdot E = 0.
\end{cases}
\tag{1.3.4}
\]

System (1.3.4) is challenging to solve. An analytic solution can be computed when the geometry of the scatterer is very regular, as in the case of a sphere or a spheroid. More complicated boundaries require the use of numerical techniques. Objects of interest in industrial applications generally have a large dimension in terms of wavelengths, and the computation of their scattering cross-section can be very demanding in terms of computer resources. Until the emergence of high-performance computers in the early eighties, the solution was afforded by using approximate high-frequency techniques such as the shooting and bouncing ray method (SBR) [101]. Basically, ray-based asymptotic methods like SBR and the uniform theory of diffraction rely on the idea that EM scattering becomes a localized phenomenon as the size of the scatterer increases with respect to the wavelength. In the last 20 years, the impressive advance in computer technology and the introduction of fast methods, which have lower computational and memory requirements, have made a rigorous numerical solution affordable for many practical applications. Nowadays, computer scientists generally adopt two distinct approaches for the numerical solution, based on either differential or integral equation methods.

1.3.1 Differential equation methods

The first approach solves system (1.3.4) for the electric field surrounding the scatterer by differential equation methods. Classical discretization schemes like the finite-element method (FEM) [125, 145] or the finite-difference method (FDM) [99, 137] can be used to discretize the continuous model and give rise to a sparse linear system of equations (a one-dimensional sketch is given below). The domain outside the object is truncated and an artificial boundary is introduced to simulate an infinite volume [20, 83, 85]. Absorbing boundary conditions do not alter the sparsity structure of the matrix from the discretization, but they have to be imposed at some distance from the scatterer. More accurate exterior boundary conditions, based on integral equations, allow us to bring the exterior boundary of the simulation region closer to the surface of the scatterer and to limit the size of the linear system to solve [89, 104]. As they are based on integral equations, they result in a part of the matrix in the final system being dense, which can increase the overall solution cost.
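To make the structure of such discretizations concrete, here is a minimal, self-contained sketch (ours, not from the thesis) that assembles the standard second-order finite-difference approximation of the one-dimensional Helmholtz equation u'' + k²u = f on a uniform grid with homogeneous Dirichlet conditions; the resulting matrix is sparse and tridiagonal:

```python
import numpy as np
import scipy.sparse as sp

def helmholtz_1d(n, k, L=1.0):
    """Assemble the 1D Helmholtz operator u'' + k^2 u on n interior grid
    points of (0, L) with homogeneous Dirichlet boundary conditions,
    using second-order central differences. Returns a sparse matrix."""
    h = L / (n + 1)
    main = (-2.0 / h**2 + k**2) * np.ones(n)   # diagonal: -2/h^2 + k^2
    off = (1.0 / h**2) * np.ones(n - 1)        # off-diagonals: 1/h^2
    return sp.diags([off, main, off], [-1, 0, 1], format="csr")

A = helmholtz_1d(1000, k=2 * np.pi * 10)  # about 10 wavelengths in the domain
print(A.shape, A.nnz)  # (1000, 1000) with ~3000 nonzeros: a very sparse system
```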

The discretization of large 3D domains may suffer from grid dispersion errors, which occur when a wave has a different phase velocity on the grid compared with the exact solution [9, 90, 100]. Grid dispersion errors accumulate in space and, for 2D and 3D problems over large simulation regions, their effect can be troublesome, introducing spurious solutions into the computation. The effect of grid dispersion errors can be reduced by using finer grids or higher-order accurate differential equation solvers, which substantially increase the problem size, or by coupling the differential equation solver with an integral equation solver. Because of the sparsity structure of the discretization matrix, differential equation methods have become popular solution methods for EM problems.

1.3.2 Integral equation methods

An alternative class of methods is represented by integral equation solvers. Using the equivalence principle, system (1.3.4) can be recast in the form of four integral equations which relate the electric and magnetic fields E and H to the equivalent electric and magnetic currents J and M on the surface of the object. Integral equation methods solve for the induced currents globally, whereas differential equation methods solve for the fields. The electric-field integral equation (EFIE) expresses the electric field E outside the object in terms of the induced current J. In the case of harmonic time dependency it reads

\[
E(x) = \nabla \int_{\Gamma} G(x, x')\, \rho(x')\, d^3x' \;-\; ik \int_{\Gamma} G(x, x')\, J(x')\, d^3x' \;+\; E^E(x),
\tag{1.3.5}
\]

where E^E is the electric field due to external sources, and G is the Green's function for scattering problems:

\[
G(x, x') = \frac{e^{ik|x - x'|}}{|x - x'|}.
\]

The EFIE provides a first-kind integral equation which is well known to be ill-conditioned, but it is the only integral formulation that can be used for open targets. Another formulation, referred to as the magnetic-field integral equation (MFIE), expresses the magnetic field outside the object in terms of the induced current and allows the calculation of the magnetic field outside the object. Both formulations suffer from interior resonances, which can make the numerical solution more problematic at some frequencies known as resonant frequencies. The problem of interior resonances is particularly troubling for large objects. A possible remedy is to combine the EFIE and MFIE formulations linearly. The resulting equation, known as the combined-field integral equation (CFIE), does not suffer from internal resonance and is much better conditioned, as it generally provides an integral equation of the second kind, but it can be used only for closed targets.
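The Green's kernel is simple to evaluate away from its singularity at x = x'; the following small sketch (ours, purely illustrative) shows both the evaluation and the reason the integrals in (1.3.5) need care when the observation and source points approach each other:

```python
import numpy as np

def green(x, xp, k):
    """Free-space Green's kernel e^{ik|x - xp|}/|x - xp| of (1.3.5).
    x, xp: 3-vectors; k: wave number. Singular when x == xp."""
    r = np.linalg.norm(np.asarray(x) - np.asarray(xp))
    return np.exp(1j * k * r) / r

k = 2 * np.pi                                          # wavelength 1 in these units
print(green([0.0, 0.0, 0.0], [0.5, 0.0, 0.0], k))      # a finite complex value
# As xp -> x the kernel blows up like 1/r, so the surface integrals are
# weakly singular and the diagonal (self-interaction) terms need special
# quadrature treatment.
print(abs(green([0.0, 0.0, 0.0], [1e-8, 0.0, 0.0], k)))  # ~1e8
```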

Owing to these nice properties, the use of the CFIE formulation is considered mandatory for closed surfaces. The resulting EFIE, MFIE and CFIE are converted into matrix equations by the Method of Moments [86]. The unknown current J(x) on the surface of the object is expanded into a set of basis functions B_i, i = 1, 2, ..., N:

\[
J(x) = \sum_{i=1}^{N} J_i B_i(x).
\]

This expansion is introduced into (1.3.5), and the discretized equation is applied to a set of test functions. A linear system of equations is finally obtained, whose unknowns are the coefficients of the expansion. The entries of the coefficient matrix are expressed in terms of surface integrals and assume the simplified form

\[
A_{KL} = \int\!\!\int G(x, y)\, B_K(x) \cdot B_L(y)\, dL(y)\, dK(x).
\tag{1.3.6}
\]

When m-point Gauss quadrature formulae are used to compute the surface integrals in (1.3.6), the entries of the coefficient matrix have the form

\[
A_{KL} = \sum_{i=1}^{m} \sum_{j=1}^{m} \omega_i \omega_j\, G(x_{K_i}, y_{L_j})\, B_K(x_{K_i}) \cdot B_L(y_{L_j}),
\]

as sketched in the code below. The resulting linear system is dense and complex: unsymmetric in the case of the MFIE and CFIE, and symmetric but non-Hermitian in the case of the EFIE formulation.
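The double quadrature rule above can be transcribed directly; this is an illustrative sketch of ours, with a constant stand-in basis function and two hypothetical, well-separated point sets. Real RWG bases and the singular self-interaction entries require dedicated treatment:

```python
import numpy as np

def green(x, y, k):
    r = np.linalg.norm(x - y)
    return np.exp(1j * k * r) / r

def gauss_entry(k, B_K, B_L, pts_K, w_K, pts_L, w_L):
    """A_KL ~= sum_i sum_j w_i w_j G(x_i, y_j) B_K(x_i) . B_L(y_j),
    the m-point rule written out as two nested loops."""
    acc = 0.0 + 0.0j
    for xi, wi in zip(pts_K, w_K):
        for yj, wj in zip(pts_L, w_L):
            acc += wi * wj * green(xi, yj, k) * np.dot(B_K(xi), B_L(yj))
    return acc

k = 2 * np.pi
pts_K = [np.array([0.0, 0.0, 0.0]), np.array([0.1, 0.0, 0.0])]  # patch K
pts_L = [np.array([1.0, 0.0, 0.0]), np.array([1.1, 0.0, 0.0])]  # patch L
w = [0.5, 0.5]                                # hypothetical quadrature weights
B = lambda x: np.array([1.0, 0.0, 0.0])       # constant stand-in basis
print(gauss_entry(k, B, B, pts_K, w, pts_L, w))
```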

For homogeneous or layered homogeneous dielectric bodies, integral equations are discretized on the surface of the object or at the discontinuous interfaces between two different materials. Thus the number of unknowns is generally much smaller when compared to the discretization of large 3D spaces by finite-difference or finite-element methods. However, the global coupling of the induced currents in the problem results in dense matrices. The cost of the solution associated with these dense matrices has for a long time precluded the popularity of integral solution methods in EM. In recent years, their application in the context of the study of radar targets made of different materials, together with the availability of larger computer resources, has motivated an increasing interest in integral methods.

Throughout this thesis, we focus on preconditioning strategies for the EFIE formulation of scattering problems. In the integral equation context that we consider, the problems are discretized by the Method of Moments using the Rao-Wilton-Glisson (RWG) basis functions [116]. The surface of the object is modelled by a triangular faceted mesh (see Figure 1.3.1), and each RWG basis function is assigned to one interior edge of the mesh. Each unknown in the problem represents the vectorial flux across one edge of the triangular mesh. The total number of unknowns is given by the number of interior edges, which is about one and a half times the number of triangular facets. In order to have a correct approximation of the oscillating solution of Maxwell's equations, physical constraints impose that the average edge length a lie between 0.1λ and 0.2λ, where λ is the wavelength of the incoming wave [11]. Two factors mainly affect the dimension N of the linear system to solve, namely the total surface area and the frequency of the problem. For a given target the size of the system is proportional to the square of the frequency, and the memory cost for the storage of the N² complex numbers of the full discretization matrix is proportional to the fourth power of the frequency. This cost increases drastically when a fine discretization is required, as is the case for rough geometries, and can make the numerical solution of medium-size problems unaffordable even on modern computers. Nowadays a typical electromagnetic problem in industry can have hundreds of thousands or a few million unknowns.

Figure 1.3.1: Example of discretized mesh.

1.4 Direct versus iterative solution methods

Direct methods are often the method of choice for the solution of these systems in an industrial environment because they are reliable and predictable both in terms of accuracy and cost. Dense linear algebra packages such as LAPACK [5] provide reliable implementations of the LU factorization attaining good performance on modern computer architectures. In particular, they use Level 3 BLAS [51, 52] for block operations, which

enable us to exploit data locality in the cache memory. Except when the geometries are very irregular, the coefficient matrices of the discretized problem are not very ill-conditioned, and direct methods compute fairly accurate solutions. The factorization can be performed once and then reused to compute the solution for all excitations. In industrial simulations, objects are illuminated at several, slightly different incidence directions, and hundreds of thousands of systems often have to be solved for the same application, all having the same coefficient matrix and a different right-hand side.

For the solution of large-scale problems, direct methods become impractical even on large parallel platforms because they require the storage of the N² single or double precision complex entries of the coefficient matrix and O(N³) floating-point operations to compute the factorization, where N denotes the size of the linear system; a short calculation is sketched below. Some direct solvers with reduced computational complexity have been introduced for the case when the solution is sought for blocks of right-hand sides, like the EADS out-of-core parallel solver [1], the Nested Equivalence Principle Algorithm (NEPAL) [30, 31] and the Recursive Aggregate T-Matrix Algorithm (RATMA) [31, 32], but the computational cost remains a bottleneck for large-scale applications. Although, in the last twenty years, computer technology has gone from Flops to Gigaflops, that is, a speed-up factor of 10⁹, the size of the largest dense problems solved on current architectures has increased by only a factor of three [56, 57].
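To put these asymptotic costs in perspective, here is a back-of-the-envelope sketch (illustrative numbers of ours, not measurements from the thesis) of the storage and factorization cost of a dense complex N × N system:

```python
# Back-of-the-envelope cost of dense LU for an N x N complex system:
# N^2 entries at 16 bytes each (double complex) and O(N^3) operations.
for N in (10_000, 100_000, 1_000_000):
    storage_gb = 16 * N**2 / 1e9       # matrix storage in GB
    flops = (8.0 / 3.0) * N**3         # roughly (8/3) N^3 real flops,
                                       # counting a complex multiply-add
                                       # as about 8 real operations
    print(f"N = {N:>9,}: {storage_gb:14,.1f} GB, {flops:9.2e} flops")
# N = 100,000 already needs ~160 GB for the matrix alone, and
# N = 1,000,000 needs ~16,000 GB: out of reach for an in-core dense solver.
```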

1.4.1 A sparse approach for solving scattering problems

It can be argued that all large dense matrices hide some structure behind their N² entries. The structure sometimes emerges naturally at the matrix level (Toeplitz, circulant, orthogonal matrices) and sometimes can be identified from the origin of the problem. When the number of unknowns is large, the discretized problem reflects more closely the properties of the continuous problem, and the entries of the discretization matrix are far from arbitrary. Exploiting this structure can enable the use of sparse linear algebra techniques and lead to a sensible reduction of the overall solution cost. The use of iterative methods is promising from this viewpoint because they simply require a routine to compute matrix-vector products and do not need knowledge of all the entries of the coefficient matrix. Special properties of the problem can be profitably used to reduce the computational cost of this procedure. Under favourable conditions, iterative methods improve the approximate solution at each step, and one can stop the iteration as soon as the required accuracy is obtained.

In the last decades, active research efforts have been devoted to understanding the theoretical and numerical properties of modern iterative solvers. Although they still cannot compete with direct solvers in terms of robustness, they have been successfully used in many contexts. In particular, it is now established that iterative solvers have to be used with some form of preconditioning to be effective on challenging problems, like those arising in industry (see, for instance, [2, 41, 60, 146]). Provided we have fast matrix-vector multiplications and robust preconditioners, the iterative solution via modern Krylov solvers can be an alternative to direct methods. There are active research efforts on fast methods [4, 82] to perform matrix-vector products with O(N log N) computational complexity. These methods, generally referred to as hierarchical methods, were introduced originally in the context of particle simulations as a way to reduce costs and enable the solution of large problems, or to demand more accuracy in the computation [6, 8]. Hierarchical methods can be effective in boundary element applications, and many research efforts have been successful in this direction, including strategies for parallel distributed memory implementations [45, 46, 47, 79, 80].

In this thesis, we focus on the other key component of Krylov methods in this context; that is, we study the design of robust preconditioning techniques. The design of the preconditioner is generally very problem-dependent and can take great advantage of a good knowledge of the underlying physical problem. General-purpose preconditioners can fail on specific classes of problems, and for some of them a good preconditioner is not known yet. A preconditioner M is required to be a good approximation of A in some sense (or of A⁻¹, depending on the context), to be easy to compute, and cheap to store and to apply. For electromagnetic scattering problems expressed in integral formulation, some special constraints are required in addition to the usual ones. For large problems the use of fast methods is mandatory for the matrix-vector products. When fast methods are used, the coefficient matrix is not completely stored in memory and only some of the entries, corresponding to the near-field interactions, are explicitly computed and available for the construction of the preconditioner. Hierarchical methods are often implemented in parallel, partitioning the domain among different processors, and the matrix-vector products are computed in a distributed manner, trying to meet the goals of both load balancing and reduced communication. Thus, parallelism is a relevant factor to consider in the design of the preconditioner. Nowadays the typical problem size in the electromagnetic industry is continually increasing, and the effectiveness of preconditioned Krylov subspace solvers should be combined with the property of numerical scalability; that is, the numerical behaviour of the preconditioner should not depend on the mesh size or on the frequency of the problem. Finally, matrices arising from the discretization of integral equations can be highly indefinite, and many standard preconditioners can exhibit surprisingly poor performance on them.

This manuscript is structured as follows. In Chapter 2, we establish the need for preconditioning linear systems of equations which arise from the

discretization of boundary integral equations in electromagnetism, and we test and compare several standard preconditioners computed from a sparse approximation of the dense coefficient matrix. We study their numerical behaviour on a set of model problems arising from both academic and industrial applications, and gain some insight into potential causes of failure. In Chapter 3, we focus our analysis on sparse approximate inverse methods and we propose some efficient static nonzero pattern selection strategies for the construction of a robust Frobenius-norm minimization preconditioner in electromagnetism. We introduce suitable strategies to identify the relevant entries to consider in the original matrix A, as well as an appropriate sparsity structure for the approximate inverse. In Chapter 4, we illustrate the numerical and computational efficiency of the proposed preconditioner on a set of model problems, and we complete the study by considering two symmetric preconditioners based on Frobenius-norm minimization. In Chapter 5, we consider the implementation of the Frobenius-norm minimization preconditioner within the code that implements the Fast Multipole Method (FMM). We combine the sparse approximate inverse preconditioner with fast multipole techniques for the solution of huge electromagnetic problems. We study the numerical and parallel scalability of the implementation and we investigate the numerical behaviour of inner-outer iterative solution schemes implemented in a multipole context with different levels of accuracy for the matrix-vector products in the inner and outer loops. In Chapter 6, we introduce an algebraic multilevel strategy based on low-rank updates of the preconditioner, computed by using spectral information of the preconditioned matrix. We illustrate the computational and numerical efficiency of the algorithm on a set of model problems that is representative of real electromagnetic calculations. We finally draw some conclusions arising from the work and address perspectives for future research.


Chapter 2

Iterative solution via preconditioned Krylov solvers of dense systems in electromagnetism

In this chapter we establish the need for preconditioning linear systems of equations which arise from the discretization of boundary integral equations in electromagnetism. In Section 2.1, we illustrate the numerical behaviour of iterative Krylov solvers on a set of model problems arising from both industrial and academic applications. The numerical results suggest the need for preconditioning to effectively reduce the number of iterations required to obtain convergence. In Section 2.2, we introduce the idea of preconditioning based on sparsification strategies, and we test and compare several standard preconditioners computed from a sparse approximation of the dense coefficient matrix. We study their numerical behaviour on model problems and gain some insight into potential causes of failure.

2.1 Introduction and motivation

In this section we study the numerical behaviour of several iterative solvers for the solution of linear systems of the form

\[
Ax = b,
\tag{2.1.1}
\]

where the coefficient matrix A arises from the discretization of boundary integral equations in electromagnetism. Among different integral

formulations, here we focus on the EFIE formulation (1.3.5), because it is more general and more difficult to solve. We use the following Krylov methods: restarted GMRES [123]; Bi-CGSTAB [142] and Bi-CGSTAB(2) [129]; symmetric [69], nonsymmetric [67] and transpose-free QMR [66]; and CGS [131]. As a set of model problems for the numerical experiments we consider the following geometries, arising from both academic and industrial applications, which are representative of the general numerical behaviour observed. For physical consistency we have set the frequency of the wave so that there are about ten discretization points per wavelength [11].

Example 1: a cylinder with a hollow inside, a matrix of order n = 1080, see Figure 2.1.1(a);

Example 2: a cylinder with a break on the surface, a matrix of order n = 1299, see Figure 2.1.1(b);

Example 3: a satellite, a matrix of order n = 1701, see Figure 2.1.1(c);

Example 4: a parallelepiped, a matrix of order n = 2016, see Figure 2.1.1(d); and

Example 5: a sphere, a matrix of order n = 2430, see Figure 2.1.1(e).

The first three examples are considered because they can be representative of real industrial simulations. The geometries of Examples 4 and 5 are very regular, and they are mainly introduced to study the numerical behaviour of the proposed methods on smooth surfaces. In spite of their small dimension, these problems are not easy to solve. Except for two of the model problems, the sphere and the parallelepiped, the problems are tough because their geometries have open surfaces. Larger problems will be examined in Chapter 5 when we consider the multipole method.

Figure 2.1.1: Meshes associated with the test examples: (a) Example 1; (b) Example 2; (c) Example 3; (d) Example 4; (e) Example 5.

Table 2.1.1 shows the number of matrix-vector products needed by each of the solvers to reduce the residual by a factor of 10⁻⁵. This tolerance can be accurate enough for engineering purposes, as it enables a fairly accurate localization of the distribution of the currents on the surface of the object. In each case, we take as initial guess x₀ = 0, and the right-hand side is such that the exact solution of the system is known. In the GMRES code [63] and the symmetric QMR code [62] (referred to as SQMR in the forthcoming tables), the iterations are stopped when, for the current approximation x_m, the computed value of

\[
\frac{\|r_m\|_2}{\alpha \|x_m\|_2 + \beta}
\]

satisfies a fixed tolerance. Here r_m is the residual vector r_m = b − Ax_m, and standard choices for the constants α and β in backward error analysis are α = ‖A‖₂ and β = ‖b‖₂. In all our tests we use α = 0 and β = ‖b‖₂ = ‖r₀‖₂ because of the zero initial guess; a small sketch of this criterion is given below. For CGS and Bi-CGSTAB, we use the implementations provided by the HSL 2000 [87] subroutines MI06 and MI03 respectively, suitably adapted to complex arithmetic. These routines accept the current approximation x_m when

\[
\|b - Ax_m\|_2 \le \max(\|b - Ax_0\|_2\, \varepsilon_1,\ \varepsilon_2),
\]

where ε₁ and ε₂ are user-defined tolerances. In our case we take ε₁ equal to the required accuracy, and ε₂ = 0.0. For Bi-CGSTAB(2) we use the implementation developed by D. Fokkema of the Bi-CGSTAB(l) algorithm, which introduces some enhancements to improve stability and robustness, as explained in [127] and [128]. The algorithm stops the iterations when the relative residual norm ‖r_n‖₂/‖r₀‖₂ becomes smaller than a fixed tolerance. In the tests with nonsymmetric QMR (referred to as UQMR in the forthcoming tables) and TFQMR, we use, respectively, the ZUCPL and ZUTFX routines provided in QMRPACK [70]. In particular, ZUCPL implements a double complex nonsymmetric QMR algorithm based on the coupled two-term look-ahead Lanczos variant (see [68]). Both ZUCPL and ZUTFX stop the iterations when the relative residual norm ‖r_n‖₂/‖r₀‖₂ becomes smaller than a fixed tolerance. Notice that, since x₀ = 0, all the stopping criteria are equivalent, allowing a fair comparison among all these methods. All the numerical experiments reported in this section correspond to runs on a Sun workstation in double complex arithmetic, and Level 2 BLAS operations are used to carry out the dense matrix-vector products. In connection with GMRES, we test different values of the restart m, from 10 up to 110. We recall that each iteration involves one matrix-vector product for restarted GMRES and SQMR, two for Bi-CGSTAB and CGS, three for UQMR and four for TFQMR.
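For concreteness, the normwise backward-error criterion described above can be written in a few lines. This is a generic sketch of ours; the names are not taken from any of the codes cited:

```python
import numpy as np

def backward_error(A, x, b, alpha=0.0, beta=None):
    """Normwise backward error ||b - A x||_2 / (alpha ||x||_2 + beta).
    Standard choices are alpha = ||A||_2 and beta = ||b||_2; the runs
    described above use alpha = 0 and beta = ||b||_2 = ||r_0||_2 (valid
    for a zero initial guess), which reduces the criterion to the
    relative residual norm."""
    r = b - A @ x
    if beta is None:
        beta = np.linalg.norm(b)
    return np.linalg.norm(r) / (alpha * np.linalg.norm(x) + beta)

# A Krylov loop would stop once backward_error(A, x_m, b) <= 1e-5.
```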

[Table 2.1.1: columns Example, Size, GMRES(m) for m = 10, 30, 50, 80, 110, Bi-CGSTAB, Bi-CGSTAB(2), SQMR, UQMR, TFQMR, CGS; the iteration counts were lost in transcription.]

Table 2.1.1: Number of matrix-vector products needed by some unpreconditioned Krylov solvers to reduce the residual by a factor of 10⁻⁵.

Except for SQMR, all the other solvers exhibit very slow convergence on the first three examples, which correspond to irregular geometries and are more difficult to solve. The last two examples are easier because the geometries are very regular; however, the iterative solution is still expensive in terms of the number of matrix-vector products. These experiments reveal the remarkable robustness of SQMR, which clearly outperforms the non-symmetric solvers on all the test cases, even GMRES with large restarts. The results also reveal the good performance of Bi-CGSTAB(2) compared to the standard Bi-CGSTAB method, which generally requires at least one third more matrix-vector products to converge. On the most difficult problems, slow convergence is essentially due to the bad spectral properties of the coefficient matrix. Figure 2.1.2 plots the distribution of the eigenvalues in the complex plane for Example 3; the eigenvalues are scattered from the left to the right of the spectrum, many of them have a large negative real part, and no clustering appears. Such a distribution is not at all favourable for the rapid convergence of Krylov solvers. Krylov methods look for the solution of the system in the Krylov space

\[
\mathcal{K}_k(A, b) = \mathrm{span}\{b, Ab, A^2 b, \ldots, A^{k-1} b\}.
\]

This is a good space from which to construct approximate solutions of a nonsingular linear system because it is intimately related to A⁻¹. The inverse of any nonsingular matrix A can be written in terms of powers of A with the help of the minimal polynomial q(t) of A, which is the unique monic polynomial of minimum degree such that q(A) = 0.

Figure 2.1.2: Eigenvalue distribution in the complex plane (real axis versus imaginary axis) of the coefficient matrix of Example 3.

If the minimal polynomial of A has degree m, then the solution of Ax = b lies in the space K_m(A, b). Consequently, the smaller the degree of the minimal polynomial, the faster the expected rate of convergence of a Krylov method (see [88]). If preconditioning A by a nonsingular matrix M causes the eigenvalues of M⁻¹A to fall into a few clusters, say t of them, whose diameters are small enough, then M⁻¹A behaves numerically like a matrix with t distinct eigenvalues. As a result, we would expect t iterations of a Krylov method to produce reasonably accurate approximations; the small experiment below illustrates this effect. It has been shown in [74, 122, 148] that in practice, with the availability of a high-quality preconditioner, the choice of the Krylov subspace accelerator is not so critical.
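The following small experiment (ours, for illustration; the keyword `rtol` follows recent SciPy releases, older ones call it `tol`) builds a symmetric matrix whose eigenvalues form three tight clusters and checks that full GMRES essentially converges in one iteration per cluster:

```python
import numpy as np
from scipy.sparse.linalg import gmres

rng = np.random.default_rng(0)
n = 300
# Eigenvalues in three tight clusters around 1, 4 and 9.
eigs = np.repeat([1.0, 4.0, 9.0], n // 3) + 1e-10 * rng.standard_normal(n)
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))  # random orthogonal basis
A = Q @ np.diag(eigs) @ Q.T                       # symmetric, clustered spectrum
b = rng.standard_normal(n)

residuals = []
x, info = gmres(A, b, rtol=1e-8, restart=n, maxiter=n,
                callback=residuals.append, callback_type="pr_norm")
print(len(residuals))   # typically 3: roughly one iteration per cluster
```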

2.2 Preconditioning based on sparsification strategies

A preconditioner M should satisfy the following demands:

- M is a good approximation to A in some sense (sometimes to A⁻¹, depending on the context);
- the construction and storage of M are not expensive;
- the system Mx = b is much easier to solve than the original one.

The transformed preconditioned system has the form M⁻¹Ax = M⁻¹b if preconditioning from the left, and AM⁻¹y = b, with x = M⁻¹y, when preconditioning from the right. For a preconditioner M given in the form M = M₁M₂, it is also possible to consider the two-sided preconditioned system M₁⁻¹AM₂⁻¹z = M₁⁻¹b, with x = M₂⁻¹z.

Most of the existing preconditioners can be divided into either implicit or explicit form. A preconditioner is said to be of implicit form if its application, within each step of an iterative method, requires the solution of a linear system; it is implicitly defined by any nonsingular matrix M ≈ A. The most important example of this class is represented by incomplete factorization methods, where M is implicitly defined by M = L̄Ū, with L̄ and Ū generally triangular matrices that approximate the exact L and U factors of a standard factorization of A according to some dropping strategy adopted during the factorization. It is well known that these methods are sensitive to indefiniteness in the coefficient matrix A and can lead to unstable triangular solves and very poor preconditioners (see [34]). Another important drawback of ILU techniques is that they are not naturally suitable for a parallel implementation, since the sparse triangular solves can lead to a severe degradation of performance on vector and parallel machines. Explicit preconditioning techniques try to mitigate such difficulties. They directly approximate A⁻¹ by a product M of sparse matrices, so that the preconditioning operation reduces to forming one or more matrix-vector products. Consequently the application of the preconditioner should be easier to parallelize, with different strategies depending on the particular architecture. In addition, some of these techniques can also perform the construction phase in parallel. On certain indefinite problems with large nonsymmetric parts, these methods have provided better results than techniques based on incomplete factorizations (see [35]), representing an efficient alternative for the solution of difficult applications. A comparison of approximate inverses and ILU can be found in [76].

In the next sections, we study the numerical behaviour of several standard preconditioners, both of implicit and of explicit form, in combination with Krylov methods for the solution of systems (2.1.1). All the preconditioners are computed from a sparse approximation of the dense coefficient matrix. On general problems, this approach can cause a severe deterioration of the quality of the preconditioner; in the BEM context, it is likely to be more effective since a very sparse matrix can retain the most relevant contributions to the singular integrals. In Figure 2.2.3 we depict the pattern structure of the large entries in the discretization matrix for Example 5, which is representative of the general trend. Large to small entries are depicted in different colours, from red to green, yellow

and blue. The picture shows that, in the discretization matrix, only a small set of entries generally have large magnitude. The largest entries are located on the main diagonal, and only a few adjacent bands have entries of high magnitude. Most of the remaining entries generally have much smaller modulus. In Figure 2.2.4, we plot for the same example the matrix obtained by scaling A = [a_ij] so that max_{i,j} |a_ij| = 1, and discarding from A all entries less than ε = 0.05 in modulus. This matrix is 98.5% sparse. The figure emphasizes the presence of the strong coupling among neighbouring edges introduced in the geometrical domain by the Boundary Element Method, and suggests the possibility of extracting a sparsity pattern from A by simply discarding elements of negligible magnitude, which correspond to weak coupling contributions between distant nodes.

Figure 2.2.3: Pattern structure of the large entries of A. The test problem is Example 5.

The dropping operation is generally referred to as sparsification. The idea of sparsifying dense matrices before computing the preconditioner was introduced by Kolotilina [93] in the context of sparse approximate inverse methods. Alléon et al. [2], Chen [28] and Vavasis [144] used this idea for the preconditioning of dense systems from the discretization of boundary integral equations, and Tang and Wan [140] in the context of multigrid methods. Similar ideas are also exploited by Ruge and Stüben [118] in the

Figure 2.2.4: Nonzero pattern of A when the smallest entries are discarded. The test problem is Example 5.

context of algebraic multigrid methods. On sparse systems, sparsification can be helpful to identify the most relevant connections in the direct problem, especially when the coefficient matrix contains many small entries or is fairly dense (see [33] and [91]). Several heuristics can be used to sparsify A and try to retain the main contributions to the singular integrals; a sketch of the first one is given after this list. Some approaches are the following:

- find, in each column of A, the k entries of largest modulus, where k ≪ n is a positive integer. The choice of the parameter k is generally problem-dependent. The resulting matrix will have exactly k·n entries;

- for each column of A, select the row indices of the k largest entries in modulus and then, for each row index i corresponding to one of these entries, perform the same search on column i. These new row indices are added to the previous ones to form the nonzero pattern for the column. This heuristic, referred to as neighbours of neighbours, is described in detail in [36];

- the same approach as in the previous heuristic, but performing more than one iteration, and halving the number of largest entries to be located at each iteration in order to preserve sparsity. In practice, two iterations are enough [2];

- scale A such that its largest entry has magnitude equal to 1, and retain in the pattern only the elements located in positions (i, j) such that |a_ij| > ε, where the threshold parameter ε ∈ (0, 1). This heuristic was proposed by Kolotilina in [93].

Combinations of these approaches can also be used.
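A minimal sketch of the first heuristic (keep the k largest entries of each column, then symmetrize the pattern as described in the next paragraph); the function name and the random stand-in matrix are ours:

```python
import numpy as np
import scipy.sparse as sp

def sparsify_largest_k(A, k):
    """Keep, in each column of the dense matrix A, the k entries of
    largest modulus; then symmetrize the pattern so that the sparsified
    matrix stays structurally (and here numerically) symmetric."""
    n = A.shape[1]
    mask = np.zeros(A.shape, dtype=bool)
    for j in range(n):
        rows = np.argpartition(np.abs(A[:, j]), -k)[-k:]  # k largest rows
        mask[rows, j] = True
    mask |= mask.T                       # symmetrize the pattern
    return sp.csr_matrix(np.where(mask, A, 0.0))

# Example on a random complex symmetric stand-in for a small BEM matrix:
rng = np.random.default_rng(1)
A = rng.standard_normal((500, 500)) + 1j * rng.standard_normal((500, 500))
A = (A + A.T) / 2                        # EFIE matrices are symmetric
A_tilde = sparsify_largest_k(A, k=20)
print(A_tilde.nnz / 500**2)              # density after symmetrization
```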

In the numerical experiments, the preconditioners considered are constructed from the sparse near-field approximation of A, computed by using the first heuristic. We will refer to this matrix as sparsified(A) and denote it by Ã. We symmetrize the pattern after computing it in order to preserve symmetry in Ã. We consider the following methods, implemented as right preconditioners:

- SSOR(ω), where ω is the relaxation parameter;
- IC(k), the incomplete Cholesky factorization technique with k levels of fill-in, i.e. taking for the factors a sparsity pattern based on position and prescribed in advance;
- AINV, the approximate inverse method introduced in [16] that uses a dropping strategy based on values;
- SPAI, a Frobenius-norm minimization technique with the adaptive strategy proposed by Gould and Scott [76] for the selection of the sparsity pattern of the preconditioner.

In order to illustrate the trend in the behaviour of these preconditioners, we first show in Table 2.2.2 the number of iterations required to compute the solution of Example 1. All the preconditioners are computed using the same sparse approximation of the original matrix and all have roughly the same number of nonzero entries. In the incomplete Cholesky factorization, no additional level of fill-in was allowed in the factors; with AINV, we selected a suitable dropping threshold (around 10⁻³) to obtain the same degree of density as the other methods; and finally, with SPAI, we chose a priori, for each column of M, the same fixed maximum number of nonzeros as in the computation of sparsified(A). In the SSOR method, we choose ω = 1. In Table 2.2.2 we give the number of iterations for both GMRES and SQMR, which actually also corresponds to the number of matrix-vector products, the most time-consuming part of the algorithms. We intend, in the following sections, to understand the numerical behaviour of these methods on electromagnetic problems, identifying some potential causes of failure.

[Table 2.2.2: Example 1, density of Ã = 4%, density of M = 4%; rows None, Jacobi, SSOR, IC(0), AINV, SPAI; columns GMRES(50), GMRES(110), GMRES(∞), SQMR; only one entry survived transcription, 159 for IC(0).]

Table 2.2.2: Number of iterations using both symmetric and unsymmetric preconditioned Krylov methods to reduce the normwise backward error by 10⁻⁵ on Example 1. The symbol "-" means that convergence was not obtained after 500 iterations. The symbol "*" means that the method is not applicable.

2.2.1 SSOR

The SSOR preconditioner is the most basic preconditioning method apart from a diagonal scaling. It is defined as

\[
M = (D + \omega E)\, D^{-1} (D + \omega E^T),
\]

where E is the strictly lower triangular part of Ã, and D is the diagonal matrix whose nonzero entries are the diagonal entries of Ã. In the case ω = 1, D + E is the lower part of Ã, including the diagonal, and D + Eᵀ is the upper part of Ã. We recall that Ã is symmetric, because A is symmetric and we use a symmetric pattern for the sparsification; a small construction sketch is given below. In Table 2.2.3 we show the number of iterations required by different Krylov solvers preconditioned by SSOR to reduce the residual by a factor of 10⁻⁵. For those experiments we use ω = 1 to compute the preconditioner and we consider increasing values of the density of the matrix Ã. Although very cheap to compute, SSOR is not very robust. Increasing the density of the sparse approximation of A does not help to improve its performance, and indeed on some problems it behaves like a diagonal scaling (ω = 0). In Figures 2.2.5 and 2.2.6 we illustrate the sensitivity of the SQMR convergence to the parameter ω for Examples 1 and 4. When SSOR is used as a stationary iterative solver, the relaxation parameter ω is selected in the interval [0, 2]. When SSOR is used as a preconditioner, the choice of the ω parameter might be less constraining; thus we also show experiments with values a bit larger than 2.0.
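A direct transcription of this definition (our sketch; Ã is any sparse symmetric matrix, for instance the output of the sparsification snippet above): applying M⁻¹ amounts to one forward triangular solve, one diagonal scaling and one backward triangular solve.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import spsolve_triangular

def ssor_factors(A_tilde, omega=1.0):
    """Build the factors of the SSOR preconditioner
    M = (D + omega*E) D^{-1} (D + omega*E^T) from the sparsified matrix."""
    D = sp.diags(A_tilde.diagonal())
    E = sp.tril(A_tilde, k=-1)           # strictly lower triangular part
    lower = (D + omega * E).tocsr()
    upper = (D + omega * E.T).tocsr()
    return lower, D, upper

def apply_ssor_inv(lower, D, upper, y):
    """z = M^{-1} y: forward solve, multiply by D, backward solve."""
    w = spsolve_triangular(lower, y, lower=True)
    w = D @ w                            # undoes the D^{-1} in the middle
    return spsolve_triangular(upper, w, lower=False)
```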

[Table 2.2.3, part 1: Examples 1 to 4; rows: five increasing densities of Ã starting at 2%; columns: GMRES(m) for m = 10, 30, 50, 80, 110, Bi-CGSTAB, UQMR, SQMR, TFQMR; the iteration counts were lost in transcription.]

[Table 2.2.3, part 2: Example 5, same layout; the iteration counts were lost in transcription.]

Table 2.2.3: Number of iterations required by different Krylov solvers preconditioned by SSOR to reduce the residual by a factor of 10⁻⁵. The symbol "-" means that convergence was not obtained after 500 iterations.

Figure 2.2.5: Sensitivity of the SQMR convergence to the SSOR parameter ω for Example 1 (size = 1080, density of sparsified(A) = 6%; SQMR iterations versus the value of ω).

2.2.2 Incomplete Cholesky factorization

Incomplete factorization methods are one of the most natural ways to construct preconditioners of implicit type. In the general nonsymmetric case, they start from a factorization method, such as the LU or Cholesky

Figure 2.2.6: Sensitivity of the SQMR convergence to the SSOR parameter ω for Example 4 (size = 2016, density of sparsified(A) = 6%; SQMR iterations versus the value of ω).

decomposition or even the QR factorization, that decomposes the matrix into a product of triangular factors, and modify it to reduce the construction cost. The basic idea is to keep the factors artificially sparse, for instance by dropping some elements in prescribed nondiagonal positions during the standard Gaussian elimination algorithm. It is well known that, even when the matrix is sparse, the triangular factors L and U, and similarly the unitary factor Q and the upper triangular factor R, can often be fairly dense. The preconditioning operation z = M⁻¹y is computed by solving the linear system L̄Ūz = y, where L̄ ≈ L and Ū ≈ U, and this is performed in two distinct steps:

1. solve L̄w = y;
2. solve Ūz = w.

ILU preconditioners are amongst the most reliable in a general setting. Originally developed for sparse matrices, they can also be applied to dense systems by extracting a sparsity pattern in advance and performing the incomplete factorization on the sparsified matrix. This class has been intensively studied, and successfully employed on a wide range of symmetric problems, providing a good balance between computational costs and the reduction of the number of iterations (see [27] and [55]). Well-known theoretical results on the existence and stability of the factorization can be proved for the class of M-matrices [105], and recent studies involve more general symmetric matrices, both structured and unstructured.

In this section, we consider the incomplete Cholesky factorization and denote it by IC. We assume that the standard IC factorization M of Ã is given in the following form:

\[
M = L D L^T,
\tag{2.2.2}
\]

where D and L stand for, respectively, the diagonal matrix and the unit lower triangular matrix whose entries are computed by means of the algorithm given in Figure 2.2.7. The set F of fill-in entries to be kept is given by

\[
F = \{ (k, i) \;:\; \mathrm{lev}(l_{k,i}) \le l \},
\]

where the integer l denotes a user-specified maximal fill-in level. The level lev(l_{k,i}) of the coefficient l_{k,i} of L is defined by:

Initialization: lev(l_{k,i}) = 0 if l_{k,i} ≠ 0 or k = i, and lev(l_{k,i}) = ∞ otherwise.

Factorization: lev(l_{k,i}) = min { lev(l_{k,i}), lev(l_{i,j}) + lev(l_{k,j}) + 1 }.

The resulting preconditioner is usually denoted by IC(l). Alternative strategies that dynamically discard fill-in entries are summarized in [122]. In Tables 2.2.4 to 2.2.8, we display the number of iterations using an incomplete Cholesky factorization preconditioner on the five model problems. In this and in the forthcoming tables the symbol "-" means that convergence was not obtained after 500 iterations. We show results for increasing values of the density of the sparse approximation of A as well as various levels of fill-in. The general trend is that increasing the fill-in generally produces a much more robust preconditioner than applying IC(0) to a denser sparse approximation of the original matrix. Moreover, IC(l) with l ≥ 1 may deliver a good rate of convergence provided the coefficient matrix is not too sparse, as we get closer to the exact LDLᵀ factorization. However, on indefinite problems the numerical behaviour of IC can be fairly chaotic. This can be observed in Table 2.2.8 for Example 5. The factorization of a very sparse approximation (up to 2%) of the coefficient matrix can be stable and deliver a good rate of convergence, especially if at least one level of fill-in is retained. For higher values of the density of the approximation of A, the factors may become very ill-conditioned and consequently the preconditioner is very poor. As shown in the tables, ill-conditioning of the factors is not related to ill-conditioning of the matrix Ã. This behaviour has already been observed on sparse real indefinite systems; see for instance [34]. As an attempt at a possible remedy, following [109, 110], we apply IC(l) to a perturbation of Ã by a complex diagonal matrix. More specifically, we

Compute D and L:

Initialization phase:
  d_{i,i} = ã_{i,i},  i = 1, 2, ..., n
  l_{i,j} = ã_{i,j},  i = 2, ..., n;  j = 1, 2, ..., i−1

Incomplete factorization process:
  do j = 1, 2, ..., n−1
    do i = j+1, j+2, ..., n
      l_{i,j} = l_{i,j} / d_{j,j}
      d_{i,i} = d_{i,i} − l_{i,j}² d_{j,j}
      do k = i+1, i+2, ..., n
        if (i, k) ∈ F then l_{k,i} = l_{k,i} − l_{i,j} l_{k,j}
      end do
    end do
  end do

Figure 2.2.7: Incomplete factorization algorithm - M = LDLᵀ.

use

\[
\tilde{A}_\tau = \tilde{A} + i\,\tau h\, r,
\tag{2.2.3}
\]

where r = diag(Re(A)) = diag(Re(Ã)), and τ stands for a nonnegative real parameter, while

\[
h = n^{-1/d} \quad \text{with } d = 3 \text{ (the space dimension)}.
\tag{2.2.4}
\]

The intention is to move the eigenvalues of the preconditioned system along the imaginary axis and thus avoid a possible eigenvalue cluster close to zero; a sketch combining (2.2.3) with the factorization of Figure 2.2.7 is given below. In Table 2.2.9, we show the number of SQMR iterations for different values of τ, the shift parameter, and various levels of fill-in in the preconditioner. The value of the shift is problem-dependent, and should be selected to ensure a good balance between making the factorization process more stable and not perturbing the coefficient matrix significantly. A good value can be between 0 and 2. Although it is not easy to tune and its effect is difficult to predict, a small diagonal shift can help to compute a more stable factorization, and in some cases the performance of the preconditioner can significantly improve.
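A compact dense-matrix transcription of Figure 2.2.7 combined with the shift (2.2.3) (our sketch; a production code would work on sparse storage and guard against zero pivots):

```python
import numpy as np

def ic_ldlt(A, F):
    """Incomplete LDL^T factorization of the symmetric matrix A, keeping
    fill-in only where the boolean pattern F is True, following the
    algorithm of Figure 2.2.7. Returns (L, d), with L unit lower
    triangular and d the diagonal of D."""
    n = A.shape[0]
    L = np.tril(np.where(F, A, 0.0), k=-1).astype(complex)
    d = A.diagonal().astype(complex).copy()
    for j in range(n - 1):
        for i in range(j + 1, n):
            if L[i, j] == 0.0:
                continue                      # entry outside the pattern
            L[i, j] /= d[j]
            d[i] -= L[i, j] ** 2 * d[j]
            for k in range(i + 1, n):
                if F[k, i]:
                    L[k, i] -= L[i, j] * L[k, j]
    np.fill_diagonal(L, 1.0)
    return L, d

def shifted(A_tilde, tau, d_dim=3):
    """Complex diagonal shift of (2.2.3)-(2.2.4): A_tau = A~ + i*tau*h*r,
    with h = n^{-1/d} and r = diag(Re(A~))."""
    n = A_tilde.shape[0]
    h = n ** (-1.0 / d_dim)
    r = np.diag(np.real(np.diag(A_tilde)))
    return A_tilde + 1j * tau * h * r
```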

[Table 2.2.4: Example 1. Densities of Ã = 2%, 3%, 4%, 5%, 6% (κ∞(Ã) = 5350 at the 5% level; the other values were lost); rows IC(0), IC(1), IC(2) with the density of M (IC(0) matches the density of Ã; IC(1) grows from 4.5% to 21.7%; IC(2) from 7.8% to 39.0%); columns GMRES(30), GMRES(50), SQMR; the few surviving iteration counts cannot be reliably attributed to columns.]

Table 2.2.4: Number of iterations, varying the sparsity level of Ã and the level of fill-in, on Example 1.

In Figures 2.2.8 to 2.2.10 we illustrate the effect of this shift strategy on the eigenvalue distribution of the preconditioned matrix. For each value of the shift parameter τ, we display κ(L), the condition number (calculated using the LAPACK package) of the computed factor L, and the number of iterations required by SQMR. The eigenvalues are scattered all over the complex plane when no shift is used, whereas they look more clustered when a shift is applied. As we mentioned before, a clustered spectrum of the preconditioned matrix is usually considered a desirable property for the fast convergence of Krylov solvers. However, for incomplete factorizations the condition number of the factors plays a more important role in the rate of convergence of the Krylov iterations than the eigenvalue distribution does. In fact, if the triangular factors computed by the incomplete factorization process are very ill-conditioned, the long recurrences associated

with the triangular solves are unstable and the use of the preconditioner may be totally ineffective. An auto-tuned strategy might be designed, which consists in incrementing the value of the shift and computing a new incomplete factorization whenever the condition number of the current factor is too large. Although time-consuming, this strategy might construct a robust shifted IC factorization on highly indefinite problems.

[Table 2.2.5: Example 2. Densities of Ã = 2%, 3%, 4%, 5%, 6% with κ∞(Ã) = (lost), 998, 737, 647, 648; rows IC(0), IC(1), IC(2) with the density of M (IC(1) from 4.1% to 15.9%; IC(2) from 6.6% to 28.2%); columns GMRES(30), GMRES(50), SQMR; the iteration counts were lost in transcription.]

Table 2.2.5: Number of iterations, varying the sparsity level of Ã and the level of fill-in, on Example 2.

[Table 2.2.6: Example 3. Densities of Ã = 2%, 3%, 4%, 5%, 6%; κ∞(Ã) = 9568, 1874 and 1403 at the 4%, 5% and 6% levels (the first two values were lost); rows IC(0), IC(1), IC(2) with the density of M (IC(1) from 4.5% to 15.8%; IC(2) from 7.0% to 24.5%); columns GMRES(30), GMRES(50), SQMR; the iteration counts were lost in transcription.]

Table 2.2.6: Number of iterations, varying the sparsity level of Ã and the level of fill-in, on Example 3.

[Table 2.2.7: Example 4. Densities of Ã = 2%, 3%, 4%, 5%, 6% with κ∞(Ã) = 541, 346, 322, 369, 370; rows IC(0), IC(1), IC(2) with the density of M (IC(1) from 5.1% to 18.6%; IC(2) from 8.6% to 30.2%); columns GMRES(30), GMRES(50), SQMR; the iteration counts were lost in transcription.]

Table 2.2.7: Number of iterations, varying the sparsity level of Ã and the level of fill-in, on Example 4.

[Table 2.2.8: Example 5. Densities of Ã = 2%, 3%, 4%, 5%, 6% with κ∞(Ã) = 263, 270, 253, 285, 294; rows IC(0), IC(1), IC(2) with the density of M (IC(1) from 5.1% to 18.8%; IC(2) from 9.1% to 29.6%); columns: density of M, κ(L), GMRES(30), GMRES(50), SQMR; the κ(L) and iteration values were lost in transcription.]

Table 2.2.8: Number of iterations, varying the sparsity level of Ã and the level of fill-in, on Example 5.

[Table 2.2.9: one block per example at a fixed density of Ã (Example 1: 5%; Example 2: 2%; Example 3: 3%; Example 4: 4%; Example 5: 4%); rows IC(0), IC(1), IC(2) with the density of M; columns: increasing values of the shift parameter τ; the τ values and iteration counts were lost in transcription.]

Table 2.2.9: Number of SQMR iterations, varying the shift parameter, for various levels of fill-in in IC.

[Figure 2.2.8: eight panels for increasing values of the shift τ; the surviving SQMR iteration counts across the panels are +500, +500, 313, 161, 117, 104, 95 and 94; the τ and κ(L) values were lost in transcription.]

Figure 2.2.8: The spectrum of the matrix preconditioned with IC(1), the condition number of L, and the number of iterations with SQMR for various values of the shift parameter τ. The test problem is Example 1 and the density of Ã is around 3%.

[Figure 2.2.9: eight panels for increasing values of the shift τ, with the same SQMR iteration counts as in Figure 2.2.8; the τ and κ(L) values were lost in transcription.]

Figure 2.2.9: The eigenvalue distribution on the square [-1, 1] of the matrix preconditioned with IC(1), the condition number of L, and the number of iterations with SQMR for various values of the shift parameter τ. The test problem is Example 1 and the density of Ã is around 3%.

[Figure 2.2.10: eight panels for increasing values of the shift τ, with the same SQMR iteration counts as in Figure 2.2.8; the τ and κ(L) values were lost in transcription.]

Figure 2.2.10: The eigenvalue distribution on the square [-0.3, 0.3] of the matrix preconditioned with IC(1), the condition number of L, and the number of iterations with SQMR for various values of the shift parameter τ. The test problem is Example 1 and the density of Ã is around 3%.

2.2.3 AINV

An alternative way to construct a preconditioner is to compute an explicit approximation of the inverse of the coefficient matrix. In this section we consider two techniques: the first constructs an approximation of the inverse of the factors using an Ã-biconjugation process [19], and the other is a Frobenius-norm minimization technique [93]. If the matrix Ã can be written in the form LDLᵀ, where L is unit lower triangular and D is diagonal, then its inverse can be decomposed as

\[
\tilde{A}^{-1} = L^{-T} D^{-1} L^{-1} = Z D^{-1} Z^T,
\]

where Z = L⁻ᵀ is unit triangular. Factorized sparse approximate inverse techniques compute sparse approximations Z̄ ≈ Z, so that the resulting preconditioner will be M = Z̄D̄⁻¹Z̄ᵀ ≈ Ã⁻¹, for D̄ ≈ D. In the approach known as AINV, the triangular factors are computed by means of a set of Ã-biconjugate vectors {z_i}_{i=1}^n, such that z_iᵀÃz_j = 0 if and only if i ≠ j. Then, introducing the matrix Z = [z₁, z₂, ..., z_n], the relation

\[
Z^T \tilde{A} Z = D = \mathrm{diag}(p_1, p_2, \ldots, p_n)
\]

holds, where p_i = z_iᵀÃz_i ≠ 0, and the inverse is equal to

\[
\tilde{A}^{-1} = Z D^{-1} Z^T = \sum_{i=1}^{n} \frac{z_i z_i^T}{p_i}.
\]

The sets of Ã-biconjugate vectors are computed by means of a (two-sided) Gram-Schmidt orthogonalization process with respect to the bilinear form associated with Ã. A sketch of the algorithm is given in Figure 2.2.11. In exact arithmetic this process can be completed if and only if Ã admits an LU factorization. AINV does not require a pattern prescribed in advance for the approximate inverse factors: sparsity is preserved during the process by discarding elements of the computed approximate inverse factor having magnitude smaller than a given positive threshold.

An alternative approach was proposed by Kolotilina and Yeremin in a series of papers [95, 96, 97, 98]. This approach, known as FSAI, approximates Ã⁻¹ by the factorization GᵀG, where G is a sparse lower triangular matrix approximating the inverse of the lower triangular Cholesky factor L of Ã. This technique has obtained good results on some difficult problems and is suitable for parallel implementation, but it requires an a priori prescription of the sparsity pattern for the approximate factors. The approximate inverse factor is computed by minimizing ‖I − GL‖²_F, which can be accomplished without knowing the Cholesky factor L by solving the

Compute D⁻¹ and Z:

Initialization phase:
  z_i^(0) = e_i (1 ≤ i ≤ n),  Ã = [a₁, ..., a_n]

The biconjugation algorithm:
  do i = 1, 2, ..., n
    do j = i, i+1, ..., n
      p_j^(i−1) = a_iᵀ z_j^(i−1)
    end do
    do j = i+1, ..., n
      z_j^(i) = z_j^(i−1) − (p_j^(i−1) / p_i^(i−1)) z_i^(i−1)
    end do
    z_i = z_i^(i−1),  p_i = p_i^(i−1)
  end do

Figure 2.2.11: The biconjugation algorithm - M = ZD⁻¹Zᵀ.

normal equations

\[
\{G L L^T\}_{ij} = \{L^T\}_{ij}, \qquad (i, j) \in S_L,
\tag{2.2.5}
\]

where S_L is a lower triangular nonzero pattern for G. Equation (2.2.5) can be replaced by

\[
\{\bar{G}\tilde{A}\}_{ij} = I_{ij}, \qquad (i, j) \in S_L,
\tag{2.2.6}
\]

where Ḡ = D̂⁻¹G and D̂ is the diagonal of L. Then, each row of Ḡ can be computed independently by solving a small linear system. The preconditioned linear system has the form

\[
G \tilde{A} G^T = \hat{D}\, \bar{G} \tilde{A} \bar{G}^T \hat{D}.
\]

The matrix D̂ is not known and is generally chosen so that the diagonal of GÃGᵀ is all ones.
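A dense transcription of Figure 2.2.11 with value-based dropping (our sketch; the actual AINV code of [16] works with sparse data structures and guards against the breakdown p_i = 0):

```python
import numpy as np

def ainv(A, drop_tol=1e-3):
    """A-biconjugation as in Figure 2.2.11, for symmetric A, with entries
    below drop_tol discarded to preserve sparsity. Returns Z (columns z_i)
    and p, so that A^{-1} ~= Z diag(1/p) Z^T. Plain transposes (no complex
    conjugation) are used, as appropriate for complex symmetric matrices."""
    n = A.shape[0]
    Z = np.eye(n, dtype=A.dtype)              # z_i^(0) = e_i
    p = np.zeros(n, dtype=A.dtype)
    for i in range(n):
        pj = A[:, i] @ Z[:, i:]               # p_j = a_i^T z_j, j >= i
        p[i] = pj[0]
        # z_j <- z_j - (p_j / p_i) z_i for j > i, then drop small entries
        Z[:, i + 1:] -= Z[:, [i]] * (pj[1:] / p[i])
        Z[np.abs(Z) < drop_tol] = 0.0
    return Z, p

def apply_ainv(Z, p, y):
    """Apply M = Z diag(1/p) Z^T to a vector y."""
    return Z @ ((Z.T @ y) / p)
```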

Recently, another matrix inversion technique based on incomplete biconjugation has been proposed in [148]. The idea is to compute a lower unit triangular matrix L = [L₁, L₂, ..., L_n] of order n such that LᵀÃL is a diagonal nonsingular matrix, say D⁻¹ = diag[d₁₁⁻¹, d₂₂⁻¹, ..., d_nn⁻¹]. This is equivalent to the relations

\[
L_i^T \tilde{A} L_j
\begin{cases}
= 0 & \text{if } i \ne j, \\
\ne 0 & \text{if } i = j.
\end{cases}
\tag{2.2.7}
\]

In other words, L_i and L_j are Ã-biconjugate, and the inverse can then be written as Ã⁻¹ = LDLᵀ. A procedure computes the inverse factors of Ã using relations (2.2.7) and preserves a sparsity pattern for the factor L by discarding entries of small modulus.

In Table 2.2.10 we show the number of iterations needed by GMRES and SQMR preconditioned by AINV to reduce the normwise backward error by 10⁻⁵ on the five examples considered. On the most difficult problems, the performance of this preconditioner is very poor. For low values of the density of Ã, AINV is less effective than a diagonal scaling, and its quality does not improve even when the dense coefficient matrix is used for the construction, as shown in the results of Table 2.2.11. Neither re-ordering nor shift strategies improve the effectiveness of the preconditioner. In particular, we performed experiments with the reverse Cuthill-McKee ordering [37], the minimum degree ordering [71, 141] and the spectral nested dissection ordering [114]. The best performance was observed with the minimum degree algorithm, which in some cases enables a smaller normwise backward error to be reached at the end of convergence. We mention that very similar, or sometimes more disappointing, results have been observed with the FSAI method and the other factorized approximate inverse proposed in [148].

Table: Number of iterations required by different Krylov solvers (GMRES(m), m = 50, 110, and SQMR) preconditioned by AINV to reduce the residual by $10^{-5}$, for Examples 1-5 with densities of $\tilde A$ ranging from 2% to 10%. The symbol "-" means that convergence was not obtained after 500 iterations.

Table: Number of iterations required by different Krylov solvers (GMRES(m), m = 50, 110, and SQMR) preconditioned by AINV to reduce the residual by $10^{-5}$, for Examples 1-5 with densities of $\tilde A$ ranging from 2% to 10%. The preconditioner is computed using the dense coefficient matrix. The symbol "-" means that convergence was not obtained after 500 iterations.

Possible causes of failure of factorized approximate inverses

One potential difficulty with the factorized approximate inverse method AINV is the tuning of the threshold parameter that controls the fill-in in the inverse factors. For a typical example we display in the figure below the sparsity pattern of $A^{-1}$ (on the left) and of $L^{-1}$, the inverse of its Cholesky factor (on the right), where all the entries smaller than a prescribed threshold have been dropped after a symmetric scaling such that $\max_i |a_{ji}| = \max_i |l_{ji}| = 1$. The location of the large entries in the inverse matrix exhibits some structure. In addition, only a very small number of its entries have large magnitude compared to the others, which are much smaller. This fact has been successfully exploited to define various a priori pattern selection strategies for Frobenius-norm minimization preconditioners [2, 22] in non-factorized form. On the contrary, the inverse factors that are explicitly approximated by AINV and by FSAI can be totally unstructured, as shown in part (b) of that figure. In this case, the a priori selection of a sparse pattern for the factors can be extremely hard, as no real structure is revealed, preventing the use of techniques like FSAI.

In the second figure below we plot the magnitude of the entries in the first column of $A^{-1}$ (on the left) and of $L^{-1}$ (on the right), respectively, with respect to their row index. These plots indicate that any dropping strategy, either static or dynamic, may be very difficult to tune, as it can easily discard relevant information and potentially lead to a very poor preconditioner. Selecting too small a threshold would retain too many entries and lead to a fairly dense preconditioner; for instance, on the small example considered, if a threshold of 0.05 is used the preconditioner is 14.8% dense. A larger threshold would yield a sparser preconditioner but might discard too many entries of moderate magnitude that are important for the preconditioner; on the same example, all the entries with magnitude smaller than 0.2 must be dropped to keep the density of the inverse factor around 3%. Because of these issues, finding the appropriate threshold to achieve a good trade-off between sparsity and numerical efficiency is challenging and very problem-dependent.

SPAI

Frobenius-norm minimization is a natural approach for building explicit preconditioners. This method computes a sparse approximate inverse as the matrix $M = \{m_{ij}\}$ which minimizes $\|I - M\tilde A\|_F$ (or $\|I - \tilde A M\|_F$ for right preconditioning) subject to certain sparsity constraints. Early references to this class can be found in [12, 13, 14, 65], and in [2] for some applications to boundary element matrices in electromagnetism. The Frobenius norm is usually chosen since it allows the decoupling of the constrained minimization problem into n independent linear least-squares

problems, one for each column of M (when preconditioning from the right) or row of M (when preconditioning from the left). The independence of these least-squares problems follows immediately from the identity

$\|I - M\tilde A\|_F^2 = \|I - \tilde A M^T\|_F^2 = \sum_{j=1}^n \|e_j - \tilde A m_j\|_2^2 \qquad (2.2.8)$

where $e_j$ is the j-th unit vector and $m_j$ is the column vector representing the j-th row of M.

Figure: Sparsity patterns of the inverse of A (on the left, density 8.75%) and of the inverse of its lower triangular factor (on the right, density 29.39%), where all the entries whose relative magnitude is smaller than the threshold are dropped. The test problem, representative of the general trend, is a small sphere.

Figure: Histograms of the magnitude of the entries of the first column of $A^{-1}$ (a) and of the first column of the inverse of its lower triangular factor (b). A similar behaviour has been observed for all the other columns. The test problem, representative of the general trend, is a small sphere.

In the case of right preconditioning, the analogous relation

$\|I - \tilde A M\|_F^2 = \sum_{j=1}^n \|e_j - \tilde A m_j\|_2^2 \qquad (2.2.9)$

holds, where $m_j$ is the column vector representing the j-th column of M. Clearly, there is considerable scope for parallelism in this approach. However, the preconditioner is not guaranteed to be nonsingular, and the symmetry of $\tilde A$ is generally not preserved in M.

The main issue in the computation of the sparse approximate inverse is the selection of the nonzero pattern of M, that is, the set of indices

$S = \{(i,j) \in [1,n]^2 \ \text{s.t.} \ m_{ij} \neq 0\}.$

If the sparsity pattern of M is known, the nonzero structure of the j-th column of M is automatically determined and defined as

$J = \{i \in [1,n] \ \text{s.t.} \ (i,j) \in S\}.$

The least-squares solution involves only the columns of $\tilde A$ indexed by $J$; we denote this subset by $\tilde A(:,J)$. Because $\tilde A$ is sparse, many rows in $\tilde A(:,J)$ are usually null and do not affect the solution of the least-squares problems (2.2.9). Thus, if $I$ is the set of indices corresponding to the nonzero rows in $\tilde A(:,J)$, and if we define $\hat A = \tilde A(I,J)$, $\hat m_j = m_j(J)$ and $\hat e_j = e_j(I)$, the actual reduced least-squares problems to solve are

$\min \|\hat e_j - \hat A \hat m_j\|_2, \quad j = 1, \dots, n. \qquad (2.2.10)$

Usually problems (2.2.10) have much smaller size than problems (2.2.9).

Two different approaches can be followed for the selection of the sparsity pattern of M: an adaptive technique that dynamically tries to identify the best structure for M, and a static technique, where the pattern of M is prescribed a priori based on some heuristics. The idea is to keep M reasonably sparse while trying to capture the large entries of the inverse, which are expected to contribute the most to the quality of the preconditioner. A static approach, which requires an a priori nonzero pattern for the preconditioner, introduces significant scope for parallelism and has the advantage that the memory storage requirements and computational cost for the setup phase are known in advance. However, it can be very problem dependent. A dynamic approach is generally effective but is usually very expensive. These methods usually start with a simple initial guess, like a diagonal matrix, and then improve the pattern until a criterion of the form $\|\tilde A m_j - e_j\|_2 < \varepsilon$ (for each j) is satisfied for a given $\varepsilon > 0$, $e_j$ being the j-th column of the identity matrix, or until a maximum number of nonzeros in the j-th column $m_j$ of M has been reached.
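As an illustration of the reduced least-squares construction (2.2.10), the following is a minimal sketch, assuming SciPy, for a right preconditioner with a prescribed static pattern; the function name and the pattern encoding (a list of row-index sets J, one per column) are our own.

    import numpy as np
    import scipy.sparse as sp

    def frobenius_spai(A_sparse, pattern):
        """Right Frobenius-norm minimization preconditioner, static pattern.

        A_sparse : sparse approximation of the coefficient matrix
        pattern  : pattern[j] = row indices J allowed in column j of M
        """
        n = A_sparse.shape[1]
        A_csc = A_sparse.tocsc()
        rows, cols, vals = [], [], []
        for j in range(n):
            J = np.asarray(pattern[j])
            AJ = A_csc[:, J]                    # columns of A indexed by J
            I = np.unique(AJ.nonzero()[0])      # nonzero rows -> reduced problem
            Ahat = AJ[I, :].toarray()
            ehat = (I == j).astype(float)       # e_j restricted to the rows I
            mhat, *_ = np.linalg.lstsq(Ahat, ehat, rcond=None)
            rows.extend(J); cols.extend([j] * len(J)); vals.extend(mhat)
        return sp.csc_matrix((vals, (rows, cols)), shape=(n, n))

With the strategies discussed later, pattern[j] would hold, for instance, the k largest entries of column j of $\tilde A$, the level-k topological neighbours, or the edges within a given geometric radius.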

Different strategies can be adopted to enrich the initial nonzero structure of the j-th column of the preconditioner. The method known as SPAI [84] uses a heuristic to select the new indices by predicting those that can most effectively reduce the residual

$\|r\|_2 = \|\tilde A(:,J)\hat m_j - \hat e_j\|_2. \qquad (2.2.11)$

Grote and Huckle [84] propose solving a one-dimensional minimization problem. If $L = \{\ell \ \text{s.t.} \ r(\ell) \neq 0\}$, then the new candidates are selected from $\tilde I = \{j \ \text{s.t.} \ \tilde A(L,j) \neq 0\}$. They suggest solving, for each $j \in \tilde I$, the following problem:

$\min_{\mu_j} \|r + \mu_j \tilde A e_j\|_2.$

The solution of this problem is

$\mu_j = -\frac{r^T \tilde A e_j}{\|\tilde A e_j\|_2^2},$

and the residual norm of the updated solution is given by

$\rho_j = \sqrt{\|r\|_2^2 - \frac{(r^T \tilde A e_j)^2}{\|\tilde A e_j\|_2^2}}.$

The proposed heuristic selects the indices which maximize $\frac{(r^T \tilde A e_j)^2}{\|\tilde A e_j\|_2^2}$, that is, those giving the largest reduction of the residual. More than one new candidate can be selected at a time, and the algorithm stops when either a maximum number of nonzeros per column is reached or the required accuracy is achieved. The algorithm can deliver very good preconditioners even on hard problems, but at the cost of considerable time and memory, although the execution time can be significantly reduced by exploiting parallelism. A comparison in terms of construction cost with ILU-type methods can be found in [18, 76].
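A minimal sketch of this one-dimensional minimization step, assuming NumPy, real arithmetic and a dense matrix; the function name and the top-k selection are our own.

    import numpy as np

    def spai_candidates(A, r, J, k=2):
        """Rank candidate columns by the Grote-Huckle criterion.

        A : coefficient matrix (dense, for illustration)
        r : current full-length residual of the column under construction
        J : indices already in the pattern of that column
        k : number of new indices to add per step
        """
        A = np.asarray(A)
        L = np.nonzero(r)[0]                       # rows where r is nonzero
        cand = np.setdiff1d(np.nonzero(np.any(A[L, :], axis=0))[0], J)
        num = (r @ A[:, cand]) ** 2                # (r^T A e_j)^2 per candidate
        den = np.sum(A[:, cand] ** 2, axis=0)      # ||A e_j||_2^2 per candidate
        gain = num / den                           # residual-norm reduction
        return cand[np.argsort(gain)[::-1][:k]]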

In the table below, we show the number of iterations needed by Krylov solvers preconditioned by SPAI to solve the model problems. As for the other preconditioners, we consider different levels of density in the sparse approximation of A. Provided the preconditioner is dense enough, SPAI is quite effective in reducing the number of iterations. Also, the quality of the preconditioner on difficult problems can be remarkably improved if the dense coefficient matrix is used for the construction. For instance, on Example 1, if SPAI is computed using the full A, then a density of 2% for the approximate inverse enables the convergence of GMRES(80) in 75 iterations, whereas convergence is not achieved within 500 iterations if the approximate inverse is computed using a sparse approximation of A. However, the adaptive strategy requires a prohibitive time: the construction of the approximate inverse using 6% density for $\tilde A$ takes nearly one hour of computation on an SGI Origin 2000 for Example 4 and three hours for Example 5. When using the dense matrix A in the computation, the construction of the preconditioner for the same examples takes more than one day.

SLU

In this section we use the sparsified matrix $\tilde A$ as an implicit preconditioner; that is, the sparsified matrix is factorized using ME47, a sparse direct solver from HSL [87], and those exact factors are used as the preconditioner. It thus represents an extreme case with respect to ILU(0), since complete fill-in is allowed in the factors. This method will be referred to as SLU. In the table below we show the number of iterations required by different Krylov solvers preconditioned by SLU to reduce the normwise backward error by $10^{-5}$. This approach, although not easily parallelizable, is generally quite effective on this class of applications for dense enough sparse approximations of A. However, as shown in the table, when the preconditioner is very sparse, the numerical quality of this approach deteriorates and the Frobenius-norm minimization method is more robust.
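The same idea is easy to prototype with SciPy's sparse LU factorization in place of the ME47 solver used in the experiments; a minimal sketch, where the sparsification threshold and the names are our own:

    import numpy as np
    import scipy.sparse as sp
    import scipy.sparse.linalg as spla

    def slu_preconditioner(A_dense, threshold=0.05):
        """Factorize a sparsified copy of A exactly (complete fill-in allowed)
        and wrap the solve as an operator usable by Krylov solvers."""
        A_sparse = sp.csc_matrix(
            np.where(np.abs(A_dense) > threshold, A_dense, 0.0))
        lu = spla.splu(A_sparse)
        n = A_dense.shape[0]
        return spla.LinearOperator((n, n), matvec=lu.solve,
                                   dtype=A_dense.dtype)

    # usage with restarted GMRES:
    # x, info = spla.gmres(A_dense, b, M=slu_preconditioner(A_dense), restart=50)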

Table: Number of iterations required by different Krylov solvers (GMRES(m) for m = 10, 30, 50, 80, 110, Bi-CGStab, UQMR and TFQMR) preconditioned by SPAI to reduce the residual by $10^{-5}$, for Examples 1-5 at densities of $\tilde A$ from 2% to 10%. The symbol "-" means that convergence was not obtained after 500 iterations.

Table: Number of iterations required by different Krylov solvers (GMRES(m) for m = 10, 30, 50, 80, 110, Bi-CGStab, UQMR and TFQMR) preconditioned by SLU to reduce the residual by $10^{-5}$, for Examples 1-5 at densities of $\tilde A$ from 2% to 10%. The symbol "-" means that convergence was not obtained after 500 iterations.

Other preconditioners

A third class of explicit methods deserves to be mentioned here, although we will not consider it in our numerical experiments. It is based on ILU techniques: in the general nonsymmetric case it builds the sparse approximate inverse by first performing an incomplete LU factorization $\tilde A \approx \bar L \bar U$ and then approximately inverting the $\bar L$ and $\bar U$ factors by solving the 2n triangular linear systems

$\bar L x_i = e_i, \quad \bar U y_i = e_i \quad (1 \le i \le n).$

These two systems are solved approximately, either by prescribing two sparsity patterns for the approximate inverses of $\bar L$ and $\bar U$ and using a Frobenius-type method, or by the adaptive SPAI method without any pattern given in advance. Another approach, which has provided better results, consists in solving the 2n triangular systems by customary forward and backward substitution, respectively, and adopting a dropping strategy, based either on position or on values, to maintain sparsity in the columns of the approximate inverse factors. Generally two different levels of incompleteness are applied, rather than one as in the other approximate inverse methods. These preconditioners are not easy to use; relying on an ILU factorization, they are almost useless for highly nonsymmetric, indefinite matrices, and since incomplete processes are strongly sequential, the preconditioner building phase is not entirely parallelizable, although the independence of the two triangular solves suggests good scope for parallelism. References to this class can be found in [3, 40, 133].

2.3 Concluding remarks

In this chapter we have established the need for preconditioning linear systems of equations which arise from the discretization of boundary integral equations in electromagnetism. We have discussed several standard preconditioners based on sparsification strategies and have studied and compared their numerical behaviour on a set of model problems that may be representative of real electromagnetic calculations. We have shown that the incomplete factorization process is highly unstable on indefinite matrices like those arising from the discretization of the EFIE formulation. Using numerical experiments, we have shown that the triangular factors computed by the factorization can be very ill-conditioned, and that the long recurrences associated with the triangular solves are unstable. As an attempt at a possible remedy, we have introduced a small complex shift to move the eigenvalues of the preconditioned system along the imaginary axis and thus try to avoid a possible cluster of eigenvalues close to zero. A small diagonal complex shift can help to compute a more stable factorization.

However, suitable strategies have to be introduced to tune the optimal value of the shift and to predict its effect.

Factorized approximate inverses, namely AINV and FSAI, exhibit poor convergence behaviour because the inverse factors can be totally unstructured; neither reordering nor shift strategies improve their effectiveness. Any dropping strategy, either static or dynamic, may be very difficult to tune, as it can easily discard relevant information and potentially lead to a very poor preconditioner.

Among the different techniques, Frobenius-norm minimization methods are quite efficient because they deliver a good rate of convergence. However, they require a high computational effort, so their use is mainly effective in a parallel setting. To be computationally affordable on dense linear systems, Frobenius-norm minimization preconditioning techniques require a suitable strategy to identify the relevant entries to consider in the original matrix A, in order to define small least-squares problems, as well as an appropriate sparsity structure for the approximate inverse. Prescribing a pattern in advance for the preconditioner can greatly reduce the amount of work in terms of CPU time. The problem of cost is evident for the computation of SPAI, since fast convergence can be obtained for high values of the sparsity ratio, but then the adaptive strategy requires a prohibitive time and computational cost in a sequential environment.

Compared to sparse approximate inverse methods, SSOR is generally slower, but it is very cheap to compute. Its main drawback is that it is not parallelizable; in addition, for much larger problems, the cost per iteration will grow, so that this preconditioner will no longer be competitive with the other techniques. Finally, the SLU preconditioner, although generally quite effective on this class of applications, is not easily parallelizable and requires dense enough sparse approximations of A. This preconditioner can be expensive in terms of both memory and CPU time for the solution of large problems, and thus it is mainly interesting for comparison purposes.


Chapter 3

Sparse pattern selection strategies for robust Frobenius-norm minimization preconditioner

In the previous chapter, we established the need for preconditioning linear systems of equations arising from the discretization of boundary integral equations (expressed via the EFIE formulation) in electromagnetism. We briefly discussed some preconditioners and compared their performance on a set of model problems arising both from academic and from industrial applications. The numerical results suggest that sparse approximate inverse techniques can be good candidates to precondition this class of problems efficiently. In particular, the Frobenius-norm minimization approach can greatly reduce the number of iterations needed compared with the implicit approach based on incomplete factorization. In addition, Frobenius-norm minimization is inherently parallel. To be computationally affordable on dense linear systems, Frobenius-norm minimization preconditioners require a suitable strategy to identify the relevant entries to consider in the original matrix A, in order to define small least-squares problems, as well as an appropriate sparsity structure for the approximate inverse. In this chapter, we propose some efficient static nonzero pattern selection strategies both for the preconditioner and for the selection of the entries of A. In Section 3.1, we overview both dynamic and static approaches to compute the sparsity pattern of Frobenius-norm minimization preconditioners. In Section 3.2, we introduce and compare some strategies to prescribe in advance the nonzero structure of the preconditioner in electromagnetic applications. In Section 3.3, we propose the use of a different

pattern selection procedure for the original matrix from that used for the preconditioner, and finally, in Section 3.4, we illustrate the numerical and computational efficiency of the proposed preconditioners on a set of model problems.

3.1 Introduction and motivation

We introduced Frobenius-norm minimization in Section 2.2. The idea is to compute the sparse approximate inverse of a matrix A as the matrix M which minimizes $\|I - MA\|_F$ (or $\|I - AM\|_F$ for right preconditioning) subject to certain sparsity constraints. The main issue is the selection of the nonzero pattern of M. The idea is to keep M reasonably sparse while trying to capture the large entries of the inverse, which are expected to contribute the most to the quality of the preconditioner. For this purpose, two approaches can be followed: an adaptive technique that dynamically tries to identify the best structure for M, and a static technique, where the pattern of M is prescribed a priori based on some heuristics.

A simple approach is to prescribe the locations of the nonzeros of M before computing their actual values. When the coefficient matrix has a special structure or special properties, efforts have been made to find a pattern that can retain the entries of $A^{-1}$ having large modulus [42, 48, 49, 138], and indeed some theoretical studies have shown that there are cases where the large entries in $A^{-1}$ are clustered near the diagonal [58, 106]. If A is row diagonally dominant, then the entries in the inverse decay columnwise, and vice versa [138]. When A is a banded SPD matrix, the entries of $A^{-1}$ decay exponentially along each row or column; more precisely, if $b_{ij}$ is the element located at the i-th row and j-th column of $A^{-1}$, then

$|b_{ij}| \le C \gamma^{|i-j|} \qquad (3.1.1)$

where $\gamma < 1$ and $C > 0$ are constants. In this case a banded M would be a good approximation to $A^{-1}$ [49]. For many PDE problems the entries of the inverse exhibit some decaying behaviour, and a good sparse pattern for the approximate inverse can be computed in advance. However, the constant C in relation (3.1.1) can be very large and the decay unacceptably slow, or the decay can be non-monotonic and thus hardly predictable [139].

For sparse matrices, the nonzero structure of the approximate inverse can be computed from graph information of the coefficient matrix. The sparsity structure of a sparse matrix A of order n is represented by a directed graph G(A) whose vertices are the integers {1, 2, ..., n} and whose edges connect pairs of distinct vertices (i, j) corresponding to nonzero off-diagonal entries $a_{ij}$ of A. The inverse will contain a nonzero in the (i, j) location whenever there is a directed path connecting vertex i to vertex j in G(A) [72].
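The decay bound (3.1.1) is easy to observe numerically; a small sketch, assuming NumPy (the example matrix is our own):

    import numpy as np

    # A banded, strongly diagonally dominant SPD matrix: tridiag(-1, 4, -1).
    n = 50
    A = 4 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
    B = np.linalg.inv(A)

    # The entries of the inverse decay exponentially away from the diagonal,
    # |B[i, j]| <= C * gamma**|i - j| with gamma < 1 (here gamma is about
    # 0.27), so a banded M is a good sparse approximation of the inverse.
    i = n // 2
    for d in (0, 2, 5, 10):
        print(d, abs(B[i, i + d]))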

Several heuristics can be used to traverse the graph along specific directions and select a suitable subset of vertices of G(A) to construct the sparsity pattern of the approximate inverse. Benson and Frederickson [13] define the structure of the j-th column of the approximate inverse, in the case of structurally symmetric matrices with a full diagonal, by selecting in G(A) vertex j and its q-th level nearest neighbours. They call matrices defined with these patterns q-local matrices. A 0-local matrix has a diagonal structure, while a 1-local matrix has the same sparsity pattern as A. Taking for the sparse approximate inverse the same pattern as A generally works well only for specific classes of problems; using more levels can improve the quality of the preconditioner, but the storage can become prohibitive as q is increased, and even q = 2 is impractical in many cases [61].

The direction of the path in the graph can be selected based on physical considerations dictated by the decay of the magnitude of the entries observed in the discrete Green's function for many problems [139]. The discrete Green's function can be considered as a row or a column of the exact inverse depicted on the physical computational grid. Dropping or sparsification can help to identify the most relevant interactions in the direct problem and select suitable search directions in the graph. For instance, dropping entries of A smaller than a global threshold can detect anisotropy in the underlying problem and reveal it when no additional physical information is available. Chow [33] proposes combining sparsification with the use of patterns of powers of the sparsified matrix for preconditioning linear systems arising from the discretization of PDE problems. Sparsification can remarkably reduce the construction cost of the preconditioner, and the use of matrix powers enables the largest entries of the Green's function to be retained. A post-processing stage, called filtration, can be included to drop small-magnitude entries in the sparse approximate inverse and reduce the cost of storing and applying the preconditioner. However, the choice of these parameters is problem-dependent, and this strategy is not guaranteed to be effective on systems not arising from PDEs.
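A sketch of Chow's sparsification-plus-powers heuristic just described, assuming SciPy; the threshold, power and names are our own.

    import numpy as np
    import scipy.sparse as sp

    def power_pattern(A, threshold=0.01, q=2):
        """Nonzero pattern of the q-th power of a sparsified A.

        Sparsification keeps only the strong couplings; the boolean power
        of the sparsified matrix then adds level-q neighbours along them.
        """
        A = np.asarray(A)
        keep = np.abs(A) > threshold * np.abs(A).max()
        S = sp.csr_matrix(keep.astype(np.int64))
        P = S.copy()
        for _ in range(q - 1):
            P = P @ S
            P.data[:] = 1       # keep 0/1 values, only the pattern matters
        return P.nonzero()       # (row, col) index arrays of the pattern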

The difficulty in extracting a good sparsity pattern for the approximate inverse of matrices with a general sparsity pattern has motivated the investigation of adaptive strategies that compute the pattern of the approximate inverse dynamically. The adaptive procedure known as SPAI has already been described in Section 2.2. The procedure described in [35] uses a few steps of an iterative solver, like the minimal residual method, to approximately minimize the least-squares problems introduced above. The sparsity pattern automatically emerges during the computation, and a dual threshold strategy is adopted to drop small entries either in the search directions or in the iterates. To control costs, operations must be performed in sparse-sparse mode, meaning that sparse matrix-sparse vector multiplications are performed. These algorithms usually compute the approximate inverse starting from an initial pattern, and estimate the accuracy of the computed preconditioner by monitoring the 2-norm of the residual R = I - AM. If the norm is larger than a user-defined threshold, or the number of nonzeros used is less than a fixed maximum, the pattern is enlarged according to some heuristics and the approximate inverse is recomputed. The process is repeated until the required accuracy is attained. We refer to these as adaptive procedures. We have mentioned the problem of cost for the computation of SPAI: fast convergence can be obtained for high values of the sparsity ratio, but then the adaptive strategy requires a prohibitive time and computational cost in a sequential environment. In general, adaptive strategies can solve much more general or harder problems, but tend to be very expensive. The use of effective static pattern selection strategies can greatly reduce the amount of work in terms of CPU time and substantially improve the overall setup process, introducing significant scope for parallelism. Also, the memory storage requirements and computational cost for the setup phase are known in advance. In the next sections, we investigate nonzero pattern selection strategies for the computation of sparse approximate inverses on electromagnetic problems. We consider both methods based on the magnitude of the entries and methods which exploit geometric or topological information from the underlying meshes. The pattern is computed in a preprocessing step and then used to compute the entries of the preconditioner.

3.2 Pattern selection strategies for Frobenius-norm minimization methods in electromagnetism

Algebraic strategy

The boundary element method discretizes integral equations on the surface of the scattering object, generally introducing a very localized strong coupling among the edges in the underlying mesh. Each edge is strongly connected to only a few neighbours while, although not null, far-away connections are much weaker. This means that a very sparse matrix can still retain the most relevant contributions from the singular integrals that give rise to dense matrices. Owing to the decay of the discrete Green's function, the inverse of A may exhibit a structure very similar to that of A. Figure 3.2.1 shows the typical decay of the discrete Green's function for Example 5, a scattering problem for a small sphere, which is representative of the general trend. In the density coloured plot, large to small magnitude entries in the inverse matrix

are depicted in different colours, from red through green and yellow to blue. The discrete Green's function peaks at a point, then decays rapidly, and far from the diagonal only a small set of entries have large magnitude.

Figure 3.2.1: Pattern structure of $A^{-1}$. The test problem is Example 5.

In this case, a good pattern for the sparse approximate inverse is likely to be the nonzero pattern of a sparse approximation to A, constructed by dropping all the entries lower than a prescribed global threshold, as suggested for instance in [93]. We refer to this approach as the algebraic approach. The dropping heuristics described in Section 2.2 can be used to compute the sparse pattern for the approximate inverse. In [2], these approaches were compared, and similar results were observed in their ability to cluster the eigenvalues of the preconditioned matrices. The first and the last heuristics are the simplest, and are more suitable for parallel implementation. In addition, the first one has the advantage of placing the number of nonzero entries in the approximate inverse under complete user control, and of achieving perfect load balancing in a parallel implementation. A drawback common to all heuristics is that we need some deus ex machina to find optimal values for the parameters. In the numerical experiments, we have selected the strategy where, for each column of A, the k entries (with k << n a positive integer) of largest modulus are retained. The algebraic strategy generally works well and competes with the approach that adaptively defines the nonzero pattern as implemented in the SPAI preconditioner described in reference [84].
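A minimal sketch of this per-column selection, assuming NumPy; the names are our own.

    import numpy as np

    def k_largest_pattern(A, k):
        """For each column of A, keep the row indices of the k entries of
        largest modulus; returns pattern[j] = sorted retained rows.

        The number of nonzeros per column (hence memory use and load
        balance) is under complete user control.
        """
        A = np.asarray(A)
        # argpartition finds the k largest |entries| per column without
        # fully sorting each column
        idx = np.argpartition(-np.abs(A), k - 1, axis=0)[:k, :]
        return [np.sort(idx[:, j]) for j in range(A.shape[1])]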

Nevertheless, the algebraic strategy suffers from some drawbacks that put severe limits on its use in practical applications. For large problems, accessing all the entries of the matrix A becomes too expensive or even impossible. This is the case in the fast multipole framework, where all the entries of the matrix A are not even available. In addition, on complex geometries, a pattern for the sparse approximate inverse computed by using information solely from A may lead to a poor preconditioner. These two main drawbacks motivate the investigation of more appropriate techniques to define a sparsity pattern for the preconditioner. Because we work in an integral equation context, we can use more information than just the entries of the matrix of the discretized problem. In particular, we can exploit the underlying mesh and extract further relevant information to construct the preconditioner. Two types of information are available from the mesh: the connectivity graph, describing the topological neighbourhoods among the edges, and the coordinates of the nodes in the mesh, describing geometric neighbourhoods among the edges.

Topological strategy

In the integral equation context that we consider, the surface of the object is discretized by a triangular mesh (see Figure 3.2.2). Each degree of freedom (DOF), representing an unknown in the linear system, corresponds to the vectorial flux across an edge in the mesh. When the object geometry is smooth, only neighbouring edges can have a strong interaction with each other, while far-away connections are generally much weaker. Thus an effective pattern for the sparse approximate inverse can be prescribed by exploiting topological information related to the near field. The sparsity pattern for any row of the preconditioner can be defined according to the concept of level k neighbours, as introduced in [115]. Figure 3.2.3 shows the hierarchical representation of the mesh in terms of topological levels. Level 1 neighbours of a DOF are the DOF itself plus the four DOFs belonging to the two triangles that share the edge corresponding to the DOF. Level 2 neighbours are all the level 1 neighbours plus the DOFs in the triangles that are neighbours of the two triangles considered at level 1, and so forth. In Figures 3.2.4 and 3.2.5 we plot, for each pair of DOFs of the mesh for Example 1, the magnitude of the associated entry in A and $A^{-1}$ with respect to their relative level of neighbours. The large entries in $A^{-1}$ derive from the interaction of a very localized set of edges in the mesh, so that by retaining a few levels of neighbours for each DOF an effective preconditioner

is likely to be constructed. Three levels can generally provide a good pattern for constructing an effective sparse approximate inverse. Using more levels increases the computational cost but does not substantially improve the quality of the preconditioner. We will refer to this pattern selection strategy as the topological strategy; a sketch of its construction is given after the figures below.

Figure 3.2.2: Example of a discretized mesh.

Figure 3.2.3: Topological neighbours of a DOF in the mesh.
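The level-k construction amounts to a breadth-first traversal of the mesh connectivity graph; a sketch, assuming SciPy and an edge-adjacency matrix of the mesh (how that adjacency is assembled from the triangles is left out, and the names are our own):

    import scipy.sparse.csgraph as csgraph

    def level_k_pattern(edge_adjacency, k):
        """pattern[j] = DOFs within k hops of DOF j in the connectivity graph.

        edge_adjacency : sparse n x n matrix with a nonzero in (i, j) when
                         DOFs i and j share a triangle (level 1 neighbours).
        """
        n = edge_adjacency.shape[0]
        # breadth-first distances from every DOF (fine for moderate n)
        dist = csgraph.shortest_path(edge_adjacency, method='D',
                                     unweighted=True)
        return [(dist[j, :] <= k).nonzero()[0] for j in range(n)]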

In Figure 3.2.6 we show how the density of nonzeros in the preconditioner evolves when the number of levels is increased. It can be seen that for up to five levels the preconditioner is still sparse, with a density lower than 10%. Considering too many topological levels may cause the unnecessary introduction of nonzeros in the sparse approximation; some of these nonzero entries do not contribute much to the quality of the approximation.

Figure 3.2.4: Topological localization in the mesh for the large entries of A (magnitude versus level). The test problem is Example 1 and is representative of the general behaviour.

Geometric strategy

When the object geometry is not smooth, two edges that are far away in the topological sense can have a strong interaction with each other, so that they are strongly coupled in the inverse matrix. For the scattering problem of Example 1, we plot in Figures 3.2.7 and 3.2.8, for the interaction of each pair of edges in the mesh, the magnitude of the associated entry in A and $A^{-1}$ with respect to their distance in terms of wavelength. The largest entries of $A^{-1}$ on smooth geometries may come from the interaction of a geometrically localized set of edges in the mesh. If we construct the sparse pattern for the inverse by only using information related to A, we may retain many small entries in the preconditioner, contributing marginally to its quality, while neglecting some of the large ones and thus potentially damaging the quality of the preconditioner. Also, when the surface of the object is very non-smooth, these large entries may come from the interaction of edges that are far away or not connected in a topological sense, but that are neighbours in a geometric sense. Thus they cannot be detected by using only topological information related to the near field. Figure 3.2.8 suggests that we can

select the pattern for the preconditioner using physical information, that is: for each edge we select all those edges within a sufficiently large sphere that defines our geometric neighbourhood. By using a suitable size for this sphere, we hope to include the most relevant contributions to the inverse and consequently to obtain an effective sparse approximate inverse. This selection strategy will be referred to as the geometric strategy. In Figure 3.2.9 we show how the density of nonzeros in the preconditioner evolves when the radius of the sphere increases.

Figure 3.2.5: Topological localization in the mesh for the large entries of $A^{-1}$ (magnitude versus level). The test problem is Example 1 and is representative of the general behaviour.
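A sketch of the geometric selection just described, assuming SciPy and the midpoint coordinates of the mesh edges; the radius is expressed in the same units as the coordinates, and the names are our own.

    import numpy as np
    from scipy.spatial import cKDTree

    def geometric_pattern(edge_centres, radius):
        """pattern[j] = indices of all edges whose midpoints lie within a
        sphere of the given radius around edge j (e.g. 0.12 wavelengths).

        edge_centres : (n, 3) array of edge midpoint coordinates
        """
        tree = cKDTree(np.asarray(edge_centres))
        return tree.query_ball_point(edge_centres, r=radius)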

Numerical experiments

In this section, we compare the different strategies described above in the solution of our test problems. Using the three pattern selection strategies for M, we denote by:

$M_a$, the preconditioner computed by using the algebraic strategy;
$M_t$, the preconditioner computed by using the topological strategy;
$M_g$, the preconditioner computed by using the geometric strategy;
SPAI, the preconditioner constructed by using the dynamic strategy implemented by [77] and described in Section 2.2.

To evaluate the effectiveness of the proposed strategies, we first consider using the dense matrix A to construct the preconditioners $M_a$, $M_t$, $M_g$ and SPAI. This requires the solution of large dense least-squares problems.

Figure 3.2.6: Evolution of the density of the pattern computed for increasing numbers of levels (density in percent versus number of levels). The test problem is Example 1. This is representative of the general behaviour.

The density of the preconditioner varies from one problem to another for the same value of the distance parameter chosen to define $M_g$. As Figure 3.2.9 shows, and tests on all the other examples confirm, the entries corresponding to edges contained within a sphere of radius 0.12 times the wavelength retain many of the large entries of the inverse while giving rise to quite a sparse preconditioner. For all our numerical experiments, we choose the value of k in the construction of $M_a$ and SPAI, and the level of neighbours used to generate $M_t$, so that they have the same density as $M_g$, when necessary discarding some small entries of the preconditioner so that all have the same number of entries. As for the numerical experiments reported in the previous chapter, we show results for different Krylov solvers. The stopping criterion in all cases consists in reducing the normwise backward error by $10^{-5}$. The symbol "-" means that convergence was not obtained after 500 iterations. In each case, we took as the initial guess $x_0 = 0$, and the right-hand side was such that the exact solution of the system was known. We performed different tests with different known solutions, observing identical results. All the numerical experiments were performed in double precision complex arithmetic on an SGI Origin 2000, and the numbers of iterations reported here are for left preconditioning. Very similar results were obtained when preconditioning from the right.

From the results shown in Table 3.2.1, we first note that all the preconditioners accelerate the convergence of the Krylov solvers, and in some cases enable convergence when the unpreconditioned solver diverges

or converges very slowly. These numerical experiments also highlight the advantages of the geometric strategy. It not only outperforms the algebraic approach and is more robust than the topological approach, which has a similar computational complexity, but it also generally outperforms the adaptive approach implemented in SPAI, which is much more sophisticated and more expensive in execution time and memory. SPAI competes with $M_g$ only on Example 1, where the density of the preconditioner is higher. This trend, namely that the denser the preconditioner the more efficient SPAI is, has been observed on many other examples. However, for sparse preconditioners, SPAI may be quite poor, as illustrated on Example 4, where preconditioned GMRES(30) or Bi-CGStab are slower than without a preconditioner and the iteration diverges for GMRES(10) with the SPAI preconditioner, while it converges with the other three preconditioners. On the non-smooth geometry, that is Example 2, an explanation of why the geometric approach should lead to a better sparse preconditioner is suggested by the mesh plot of Example 2 shown in the next section: some edges that are far away in the connectivity graph, those from each side of the break, are weakly connected in the mesh but can have a strong interaction with each other and can lead to large entries in the inverse matrix.

Figure 3.2.7: Geometric localization in the mesh for the large entries of A (magnitude versus distance). The test problem is Example 1. This is representative of the general behaviour.

Figure 3.2.8: Geometric localization in the mesh for the large entries of $A^{-1}$ (magnitude versus distance). The test problem is Example 1. This is representative of the general behaviour.

Figure 3.2.9: Evolution of the density of the pattern computed for larger geometric neighbourhoods (density in percent versus distance/wavelength). The test problem is Example 1. This is representative of the general behaviour.

Table 3.2.1: Number of iterations using the preconditioners based on dense A. For each of Examples 1, 2 and 4 (densities of M equal to 5.03%, 1.59% and 1.04%, respectively), the table reports the iteration counts of GMRES(m) (m = 10, 30, 50, 80, 110), Bi-CGStab, UQMR and TFQMR without preconditioning, with diagonal scaling $M_j$, and with $M_a$, $M_t$, $M_g$ and SPAI.

3.3 Strategies for the coefficient matrix

When the coefficient matrix of the linear system is dense, the construction of even a very sparse preconditioner may become too expensive in execution time as the problem size increases. Both memory and execution time are significantly reduced by replacing A with a sparse approximation. On general problems, this approach can cause a severe deterioration of the quality of the preconditioner; in the context of the Boundary Element Method (BEM), since a very sparse matrix can retain the most relevant contributions to the singular integrals, it is likely to be more effective. The

use of a sparse matrix substantially reduces the size of the least-squares problems, which can then be efficiently solved by direct methods. The algebraic heuristic described in the previous sections is well suited to sparsifying A. In [2] the same nonzero sparsity pattern is selected both for A and for M; in that case, especially when the pattern is very sparse, the computed preconditioner may be poor on some geometries.

Figure: Mesh of Example 2.

The effect of replacing A with its sparse approximation is highlighted in the figure below, where we display the sparsified pattern of the inverse of the sparsified A. We see that the resulting pattern is very different from the sparsified pattern of the inverse of A shown earlier. A possible remedy is to increase the density of the patterns for both A and M. To a certain extent this improves the convergence, but the computational cost of generating the preconditioner grows almost cubically with the density. A cheaper remedy is to choose a different number of nonzeros for the patterns of A and M, with fewer entries in the preconditioner than in $\tilde A$, the sparse approximation of A. To illustrate this effect, we show in Table 3.3.2 the number of iterations of preconditioned GMRES(50), where the preconditioners are built by using either the same sparsity pattern for A or a two, three or five times denser pattern for A. Except when the preconditioner is very sparse, increasing the density of the pattern imposed on A for a given density of M accelerates the convergence, as expected, getting quite rapidly very close to the number of iterations required when using the full A. The additional cost in terms of CPU time is negligible, as can be seen in the CPU-time figure below for experiments on Example 1. This is due to the fact that the complexity of the QR factorization used to solve the least-squares problems is proportional to the square of the number of columns times the number of rows. Thus, increasing the number of rows, that is, the number of entries of $\tilde A$, is much cheaper in terms of overall CPU time than increasing the density of the preconditioner, that is, the number of columns in the least-squares problems. Notice that this

observation is true for both left and right preconditioning because, according to (2.2.8) and (2.2.9), the smaller dimension of the matrices involved in the least-squares problems always corresponds to the entries of M to be computed, and the larger to the entries of the sparsified matrix from A.

Figure: Nonzero pattern for $A^{-1}$ when the smallest entries are discarded. The test problem is Example 5.

Table 3.3.2: Number of iterations for GMRES(50) preconditioned with different values for the density of M, using the same pattern for A ("same") and patterns two, three and five times denser, as well as the full A. A geometric approach is adopted to construct the patterns. The test problem is Example 1. This is representative of the general behaviour observed.
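Returning to the complexity of the least-squares solves, the cost argument can be made explicit. For a reduced problem with $p$ rows (entries retained from $\tilde A$) and $q$ columns (entries of M to be computed), Householder QR costs on the order of

$2pq^2 - \tfrac{2}{3}q^3 = O(pq^2)$ flops,

so doubling the pattern density of $\tilde A$ (doubling $p$) roughly doubles the construction cost, whereas doubling the density of M (doubling $q$) roughly quadruples it.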

Figure: Sparsity pattern of the inverse of the sparse approximation of A associated with Example 1. The pattern has been sparsified with the same value of the threshold used for the sparsified $A^{-1}$ displayed earlier.

3.4 Numerical results

We report in this section on the numerical results obtained by replacing A with its sparse approximation in the construction of the preconditioner. In Table 3.4.3 we use the following notation:

$M_a^a$, introduced in [2] and computed by using algebraic information from A; the same pattern is used for the preconditioner;
$M_t^a$, constructed by using the algebraic strategy to sparsify A and the topological strategy to prescribe the pattern for the preconditioner;
$M_g^a$, constructed by using the geometric approach for the preconditioner and an algebraic heuristic for A with the same density as for the preconditioner;
$M_t^{2a}$, similar to $M_t^a$, but the density of the pattern imposed on A is twice that imposed on $M_t^a$;
$M_g^{2a}$, similar to $M_g^a$ but, as in the previous case, the density of the pattern imposed on A is twice that imposed on $M_g^a$.

Figure: CPU time for the construction of the preconditioner using a different number of nonzeros in the patterns for A and M (density ratios 2:1, 3:1, 5:1 and full A, plotted against the density of the preconditioning matrix). The test problem is Example 1. This is representative of the other examples.

For the sake of comparison, we also report the number of iterations without using a preconditioner and with only a diagonal scaling, denoted by $M_j$ (j stands for Jacobi preconditioner). Other combinations are possible for defining the selection strategies for the patterns of A and M. Here we focus on the most promising ones, which use information from the mesh to retain the large entries of the inverse, and the algebraic strategy for A to capture the most relevant contributions to the singular integrals. We also consider the preconditioner $M_a^a$ to compare with previous tests [2] that were performed on geometries different from those considered here. We show, in Table 3.4.3, the results of our numerical experiments. For each example, we give the number of iterations required by each preconditioned solver.

Table 3.4.3 (Examples 1-4; continued on next page): Number of iterations of GMRES(m) (m = 10, 30, 50, 80, 110), Bi-CGStab, UQMR and TFQMR without preconditioning, with $M_j$, and with $M_a^a$, $M_t^a$, $M_g^a$, $M_t^{2a}$ and $M_g^{2a}$. The densities of M are 5.03% (Example 1), 1.59% (Example 2), 2.35% (Example 3) and 1.04% (Example 4).

Table 3.4.3 (continued from previous page): Example 5, with density of M equal to 0.63%. Number of iterations to solve the set of test problems.

Table 3.4.4: CPU time to compute the preconditioners $M_a^a$, $M_t^a$, $M_t^{2a}$, $M_g^a$ and $M_g^{2a}$ for Examples 1-5.

In Table 3.4.4, we show the CPU time required to compute the preconditioners when the least-squares problems are solved using LAPACK routines. The CPU time for constructing $M_t^a$ and $M_t^{2a}$ is in some cases much larger than that needed for $M_g^a$ and $M_g^{2a}$. The reason is that, in the topological strategy, it is not possible to prescribe a value for the density exactly. Thus, for each problem, we select a suitable number of levels of neighbours to obtain the closest number of nonzeros to that retained in the pattern based on the geometric approach. After the construction of the

preconditioner, we drop its smallest entries to ensure an identical number of nonzeros for the two strategies. The results illustrate that considering a pattern for A twice as dense as that for M does not cause a significant growth in the computational time, although it enables us to construct a more robust preconditioner.

We first observe that using a sparse approximation of A reduces the convergence rate of the preconditioned iterations when the nonzero pattern imposed on the preconditioner is very sparse. However, if we adopt the geometric strategy to define the sparsity pattern for the approximate inverse, the convergence rate is not affected very much. For larger values of the density, the difference in the number of iterations between using the full A or an algebraic sparse approximation becomes negligible. For all the experiments, $M_g^a$ still outperforms $M_a^a$ and is generally more robust than $M_t^a$; the most efficient and robust preconditioner is $M_g^{2a}$. The multiple density strategy allows us to improve the efficiency and the robustness of the Frobenius-norm preconditioner on this class of problems without requiring any more time for the construction of the preconditioner. For all the test examples, it enables us to get the fastest convergence, even for GMRES with a low restart parameter, on problems where neither $M_a^a$ nor $M_g^a$ converges. The effectiveness of this multiple density heuristic is illustrated in the two figures below, where we see the effect of preconditioning on the clustering of the eigenvalues of A for the most difficult problem, Example 2. The eigenvalues of the preconditioned matrices are in both cases well clustered around the point (1.0, 0.0) (with a more effective clustering for $M_g^{2a}$), but those obtained by using the multiple density strategy are further from the origin. This is highly desirable when trying to improve the convergence of Krylov solvers. Another advantage of this multiple density heuristic is that it generally allows us to reduce the density of the preconditioner (and thus its construction cost) while preserving its numerical quality. Although no specific results are reported to illustrate this aspect, this behaviour may be partially observed in the tables above.

Figure: Eigenvalue distribution for the coefficient matrix preconditioned by using a single density strategy on Example 2 (imaginary axis versus real axis).

3.5 Concluding remarks

We have presented some a priori pattern selection strategies for the construction of a robust sparse Frobenius-norm minimization preconditioner for electromagnetic scattering problems expressed in integral formulation. We have shown that, by using additional geometric information from the underlying mesh, it is possible to construct robust sparse preconditioners at an affordable computational and memory cost. The topological strategy requires less computational effort to construct the pattern but, since the density is a step function of the number of levels, the construction of the preconditioner can require some additional computation. Also, it may not handle very well complex geometries where some parts of the object are not connected. By retaining two different densities in the patterns of A and M, we can greatly decrease the computational cost for the construction of the preconditioner, usually a bottleneck for this family of methods, while preserving the efficiency and increasing the robustness of the resulting preconditioner.

Although sparsifying A using an algebraic dropping strategy seems to be the most natural approach to obtain a sparse approximation of A when all its entries are available, either the topological or the geometric criterion can be used to define the sparse approximation of A. Those alternatives are attractive in a multipole framework, where all the entries of A are not computed. The geometric approach can also be used to sparsify A without noticeably deteriorating the quality of the preconditioner. This is shown in Table 3.5.5, where $M_g^{2g}$ is constructed by exploiting geometric information

in the patterns of both A and M, but choosing twice as dense a pattern for A as for M. As suggested by Figure 3.2.4, due to the strongly localized coupling introduced by the discretization of the integral equations, the topological approach can also provide a good sparse approximation of A, by retaining just a few levels of neighbouring edges for each DOF in the mesh. The numerical behaviour of this approach is illustrated in Table 3.5.6. In both cases the resulting preconditioner is still robust and better suited for a fast multipole framework, since it does not require knowledge of the location of the largest entries in A.

Figure: Eigenvalue distribution for the coefficient matrix preconditioned by using a multiple density strategy on Example 2 (imaginary axis versus real axis).

Table 3.5.5: Number of iterations of GMRES(m) (m = 10, 30, 50, 80, 110), Bi-CGStab, UQMR and TFQMR with $M_g^{2g}$ on Examples 1-5, using a multiple density geometric strategy to construct the preconditioner. The pattern imposed on A is twice as dense as that imposed on M.

Table 3.5.6: Number of iterations of GMRES(m) (m = 10, 30, 50, 80, 110), Bi-CGStab, UQMR and TFQMR with $M_g^{2t}$ on Examples 1-5, using a topological strategy to sparsify A and a geometric strategy for the preconditioner. The pattern imposed on A is twice as dense as that imposed on M.


Chapter 4

Symmetric Frobenius-norm minimization preconditioners in electromagnetism

In the previous chapter we introduced and compared some strategies to compute a priori the nonzero sparsity pattern of Frobenius-norm minimization preconditioners in electromagnetic applications. The results of the numerical experiments suggest that, by using additional geometric information from the underlying mesh, it is possible to construct very sparse preconditioners and to make them more robust. In this chapter, we illustrate the numerical and computational efficiency of the proposed preconditioner. In Section 4.1, we assess the effectiveness of the sparse approximate inverse compared with standard methods for the solution of a set of model problems that are representative of real electromagnetic calculations. In Section 4.2, we complete the study by considering two symmetric preconditioners based on Frobenius-norm minimization.

4.1 Comparison with standard preconditioners

In this section we assess the performance of the proposed Frobenius-norm minimization approach. In Table 4.1.1, we show the numerical results observed on Examples 1-5 with some standard preconditioners, of both explicit and implicit form: diagonal scaling, SSOR, ILU(0), SPAI and SLU applied to a sparse approximation of A constructed using the algebraic approach. All these preconditioners, except SLU, exhibit much poorer acceleration capabilities than $M_g^{2a}$. If we reduce the density of the preconditioner in Examples 1 and 3, $M_g^{2a}$ converges more slowly but becomes the most efficient.

It should also be noted that SPAI works reasonably well when computed using the dense A (see Table 3.2.1), but with the sparse A it does not converge on Example 2 (see Table 4.1.1). In addition, following [35], we performed some numerical experiments where we obtained an approximate $m_j$ from (2.2.9) by dropping the smallest entries of the iterates computed by a few steps of either the minimal residual method or GMRES. Unfortunately, the performance of these approaches for dynamically defining the pattern of the preconditioner was disappointing: they only improved on the unpreconditioned case when a relatively large number of iterations was used to build the preconditioner, making them unaffordable for our problems.

The purpose of this study is to understand the numerical behaviour of the preconditioners. Nevertheless, we do recognize that some of the simple strategies have a much lower cost for building the preconditioner and so could result in a faster solution. When SSOR converges, it is often the fastest in terms of the CPU time for the overall solution of the linear system. When the solution is performed for only one right-hand side, the construction cost of the other preconditioners cannot be compensated for by the reduction in the number of iterations; the matrix-vector product is performed using BLAS kernels that make the iteration cost quite cheap for the problem sizes we have considered. For instance, when solving Example 1 with GMRES(50) on a SUN Enterprise, SSOR converges in 31.4 seconds, while $M_g^{2a}$ requires 190 seconds for the construction and 7.6 seconds for the iterations. However, in electromagnetism applications, the same linear system has to be solved with many right-hand sides when illuminating an object with various waves corresponding to different angles of incidence. For that example, if we have more than eight right-hand sides, the construction cost of $M_g^{2a}$ is overcome by the time saved in the iterations, and $M_g^{2a}$ becomes more efficient than SSOR. In addition, the construction and the application of $M_g^{2a}$ are fully parallelizable, while the parallelization of SSOR requires some reordering of the equations that may be difficult to implement efficiently on a distributed memory platform.
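The break-even point quoted above follows directly from these timings: with a construction cost of 190 s and a per-solve saving of 31.4 - 7.6 = 23.8 s, the preconditioner pays off after

$\lceil 190 / (31.4 - 7.6) \rceil = \lceil 7.98 \rceil = 8$

right-hand sides, consistent with the figure of eight right-hand sides given above.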

[Table 4.1.1: Number of iterations with some standard preconditioners computed using sparse A (algebraic). One block per example, each reporting the rows M_j (diagonal scaling), SSOR, ILU(0), SPAI, SLU and M_{2a}^{g} against the columns GMRES(m) for m = 10, 30, 50, 80, 110, Bi-CGStab, UQMR and TFQMR. Densities of M: Example 1, 5.03%; Example 2, 1.59%; Example 3, 2.35%; Example 4, 1.04%; Example 5, 0.63%. The iteration counts themselves were lost in this transcription.]

4.2 Symmetrization strategies for the Frobenius-norm minimization method

The linear systems arising from the discretization by BEM can be symmetric non-Hermitian in the Electric Field Integral Equation formulation (EFIE), or unsymmetric in the Combined Field Integral Equation formulation (CFIE). In this thesis, as mentioned in the previous chapters, we only consider cases where the matrix is symmetric, because EFIE usually gives rise to linear systems that are more difficult to solve with iterative methods. Another motivation to focus only on the EFIE formulation is that it does not impose any restriction on the geometry of the scattering obstacle, as CFIE does, and in this respect it is more general. However, the sparse approximate inverse computed by the Frobenius-norm minimization method is not guaranteed to be symmetric, and usually is not, even if a symmetric pattern is imposed on M; consequently it might not fully exploit all the characteristics of the linear system, and it prevents the use of symmetric Krylov solvers. To complete the earlier studies, in this section we consider two possible symmetrization strategies for Frobenius-norm minimization using a prescribed pattern for the preconditioner based on geometric information. As before, all the preconditioners are computed using as input Ã, a sparse approximation of the dense coefficient matrix A. If M_{Frob} denotes the unsymmetric matrix resulting from the minimization (2.2.9), the first strategy simply averages its off-diagonal entries, that is

    M_{Frob}^{Aver} = (M_{Frob} + M_{Frob}^T) / 2.        (4.2.1)

An alternative way to construct a symmetric sparse approximate inverse is to compute only the lower triangular part, including the diagonal, of the preconditioner.

The nonzeros calculated are reflected with respect to the diagonal and are used to update the right-hand sides of the subsequent least-squares problems involved in the construction of the remaining columns of the preconditioner. More precisely, in the computation of the k-th column of the preconditioner, the entries m_{ik} for i < k are set to the values m_{ki} that are already available, and only the lower diagonal entries are computed. The entries m_{ki} are then used to update the right-hand sides of the least-squares problems which involve the remaining unknowns m_{ik}, for k <= i. The least-squares problems are as follows:

    min || ê_j - Ã m̂_j ||_2^2,        (4.2.2)

where ê_j = e_j - Σ_{k<j} ã_k m_{kj} and m̂_j = (0, ..., 0, m_{jj}, ..., m_{nj})^T. In the following, this preconditioner is referred to as M_{Frob}^{Sym}. It should be noted that the preconditioner built using this approach no longer minimizes any Frobenius norm, and it might be sensitive to the ordering of the columns. In addition, if m denotes the number of nonzero entries in M_{Frob}^{Sym}, this method only computes (m+n)/2 of them. Thus the overall computational cost for the construction of M_{Frob}^{Sym} can be considerably smaller than for M_{Frob}^{Aver}, as the least-squares problems are usually solved by QR factorizations whose complexity is quadratic in the number of unknowns and linear in the number of equations; halving the number of unknowns per column therefore reduces the factorization cost by roughly a factor of four.

To study the numerical behaviour of these preconditioners, we consider the same set of test examples used for the experiments with unsymmetric preconditioners. We recall that, for physical consistency, we have set the frequency of the incident wave for all the examples so that there are about ten discretization points per wavelength. We investigate the behaviour of the preconditioners when used to accelerate restarted GMRES, amongst the unsymmetric solvers, and symmetric QMR, denoted by SQMR in the forthcoming tables, amongst the symmetric solvers. As in the previous tests, the stopping criterion in all cases consists in reducing the original residual by 10^{-5}, which can then be related to a normwise backward error. In all the tables, the symbol "-" means that convergence was not obtained after 500 iterations. All the numerical experiments are performed in double precision complex arithmetic on an SGI Origin 2000, and the numbers of iterations reported in this section are for right preconditioning. The number of iterations for both GMRES and SQMR also corresponds to the number of matrix-vector products, which is the most time-consuming part of the algorithms. Nevertheless, it should be noted that, for the other parts of the algorithms, the coupled two-term recurrences of SQMR are much cheaper than the orthogonalization and least-squares solution involved in GMRES. From a memory point of view, SQMR is also much less demanding; if we used the same memory workspace for GMRES as for SQMR, the largest affordable restart would be 5.
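A compact way to contrast the two strategies is the following dense Python sketch; it is our own illustration with hypothetical names, and it assumes the pattern patt is boolean, symmetric and contains the full diagonal. sym_average implements the averaging (4.2.1), while frob_sym implements the sequential scheme (4.2.2), reflecting each computed entry across the diagonal and updating the subsequent right-hand sides.

```python
import numpy as np

def sym_average(M_frob):
    """Averaging strategy (4.2.1): M_Frob^Aver = (M_Frob + M_Frob^T)/2."""
    return 0.5 * (M_frob + M_frob.T)

def frob_sym(A, patt):
    """Sequential strategy (4.2.2) for M_Frob^Sym on a dense toy problem.
    Entries above the diagonal are reflected from earlier columns and
    only the lower part of each column is solved for."""
    n = A.shape[0]
    M = np.zeros_like(A)
    for j in range(n):
        rows = np.where(patt[:, j])[0]
        upper, lower = rows[rows < j], rows[rows >= j]
        e = np.zeros(n, dtype=A.dtype)
        e[j] = 1.0
        # update the right-hand side with the already-fixed entries m_kj
        rhs = e - A[:, upper] @ M[upper, j]
        # least-squares for the remaining unknowns m_ij, i >= j
        m_low, *_ = np.linalg.lstsq(A[:, lower], rhs, rcond=None)
        M[lower, j] = m_low
        M[j, lower] = m_low              # reflect across the diagonal
    return M
```

Note how frob_sym solves for roughly half as many unknowns per column; this is the source of the savings quantified in the Relative Flops columns of the tables below.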

In Table 4.2.2, we show the numerical behaviour of the different Frobenius-norm minimization type preconditioners, both symmetric and unsymmetric. In the following we consider a geometric approach to define the sparsity pattern for Ã, as it is the only one that can be efficiently implemented in a parallel fast multipole environment [23]. We compare the unsymmetric preconditioner M_{Frob} and the two symmetric preconditioners M_{Frob}^{Aver} and M_{Frob}^{Sym}. The column entitled "Relative Flops" displays the ratio σ_QR(M)/σ_QR(M_{Frob}), where σ_QR(M) represents the number of floating-point operations required by the sequence of QR factorizations used to build the preconditioner M, that is either M = M_{Frob}^{Aver} or M = M_{Frob}^{Sym}. In this table, it can be seen that M_{Frob}^{Aver} almost always requires fewer iterations than M_{Frob}^{Sym}, which imposes the symmetry directly and consequently computes only half of the entries. Since M_{Frob}^{Sym} computes fewer entries, the associated values in the column "Relative Flops" are all less than one, and close to a third in all cases. On the hardest test cases (Examples 1 and 3), the combination of SQMR and M_{Frob}^{Aver} needs less than half the number of iterations of M_{Frob} with GMRES(30), and is only very slightly less efficient than M_{Frob} with GMRES(80). On the less difficult problems, SQMR with M_{Frob}^{Aver} converges between 21 and 37% faster than GMRES(80) with M_{Frob}, and between 31 and 43% faster than GMRES(30) with M_{Frob}. M_{Frob}^{Sym}, which computes only half of the entries of the preconditioner, has a poor convergence behaviour on the hardest problems and is slightly less efficient than M_{Frob}^{Aver} on the other problems when used with SQMR. Nevertheless, we should mention that, for the sake of comparison, these preliminary results were obtained using the set of parameters for the densities of Ã and M that were the best for M_{Frob}, and consequently nearly optimal for M_{Frob}^{Aver}; the performance of M_{Frob}^{Sym} might be improved, as shown by the results reported in Table 4.2.3. These first experiments reveal the remarkable robustness of SQMR when used in combination with a symmetric preconditioner. This combination generally outperforms GMRES even for large restarts.

The best alternative for significantly improving the behaviour of M_{Frob}^{Sym} is to enlarge considerably the density of Ã and only marginally increase the density of the preconditioner. In Table 4.2.3, we show the number of iterations observed with this strategy, which consists in using a density of Ã that is three times larger than that of M_{Frob}^{Sym}; we recall that for M_{Frob}^{Aver} and M_{Frob} a density of Ã twice as large as that of the preconditioner is usually the best trade-off between computing cost and numerical efficiency. It can be seen that M_{Frob}^{Sym} then becomes slightly better than M_{Frob}^{Aver} (as reported in Table 4.2.2) while being less expensive to build. In this table, we consider the same values of σ_QR(M_{Frob}) as those in Table 4.2.2 to evaluate the ratio "Relative Flops".

[Table 4.2.2: Number of iterations on the test examples using the same pattern for the preconditioners. For each example, the rows M_{Frob}, M_{Frob}^{Aver} and M_{Frob}^{Sym} are reported against the columns GMRES(30), GMRES(80), GMRES(∞), SQMR and Relative Flops (σ_QR(M_{Frob}) normalized to 1.00; SQMR is not applicable to the unsymmetric M_{Frob}). Densities of Ã / M: Example 1, 10.13% / 5.03%; Example 2, 3.17% / 1.99%; Example 3, 4.72% / 2.35%; Example 4, 2.08% / 1.04%; Example 5, 1.25% / 0.62%. The iteration counts themselves were lost in this transcription.]

[Table 4.2.3: Number of iterations for M_{Frob}^{Sym} combined with SQMR using three times more nonzeros in Ã than in the preconditioner; columns GMRES(30), GMRES(80), GMRES(∞), SQMR and Relative Flops. Densities of Ã / M: Example 1, 11.98% / 6.10%; Example 2, 5.94% / 2.04%; Example 3, 11.01% / 3.14%; Example 4, 2.08% / 1.19%; Example 5, 1.98% / 0.62%.]

To illustrate the effect of the densities of Ã and of the preconditioners, we performed experiments with preconditioned SQMR, where the preconditioners are built using either the same sparsity pattern for Ã as for M, or a two, three or five times denser pattern for Ã. We report in Tables 4.2.4 and 4.2.5 the number of SQMR iterations for M_{Frob}^{Sym} and for M_{Frob}^{Aver}, respectively. In these tables, M_{Frob}^{Sym} always requires more iterations than M_{Frob}^{Aver} for the same values of the density of Ã and of the preconditioner, but its computation costs about a quarter of the flops in each test.

[Table 4.2.4: Number of iterations of SQMR with M_{Frob}^{Sym} on Example 1 for different values of the density of M; rows correspond to the density strategy for Ã (same pattern as M, and 2, 3 or 5 times denser). The iteration counts were lost in this transcription.]

[Table 4.2.5: Number of iterations of SQMR with M_{Frob}^{Aver} on Example 1 for different values of the density of M; rows correspond to the density strategy for Ã (same pattern as M, and 2, 3 or 5 times denser).]

Because the construction of M_{Frob}^{Sym} depends on the ordering selected, a natural question concerns the sensitivity of the quality of the preconditioner to this choice. In particular, in [54] it is shown that the numerical behaviour of IC is very dependent on the ordering, and a similar study with comparable conclusions for AINV is described in [17]. In Table 4.2.6, we display the number of iterations with SQMR, selecting the same density parameters as those used for the experiments reported in Table 4.2.5, but using different orderings to permute the original pattern of M_{Frob}^{Sym}. More precisely, we consider the reverse Cuthill-McKee ordering [37] (RCM), the minimum degree ordering [71, 141] (MD), the spectral nested dissection ordering [114] (SND) and, lastly, a reordering of the matrix that puts the denser rows and columns first (DF). It can be seen that M_{Frob}^{Sym} is not too sensitive to the ordering, and none of the tested orderings appears superior to the others.
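As an illustration of how such a permutation can be applied in practice, the sketch below uses SciPy's reverse Cuthill-McKee routine to reorder the pattern symmetrically before running the column-by-column construction; this is our example, not the code used for the experiments, which relied on the orderings of [37, 71, 114, 141] cited above.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import reverse_cuthill_mckee

def rcm_reorder(patt):
    """Symmetrically permute a boolean sparsity pattern with reverse
    Cuthill-McKee; M_Frob^Sym is then built column by column in the
    permuted order."""
    p = reverse_cuthill_mckee(csr_matrix(patt), symmetric_mode=True)
    return patt[np.ix_(p, p)], p
```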

[Table 4.2.6: Number of iterations of SQMR with M_{Frob}^{Sym} under different orderings (Original, RCM, MD, SND, DF). Densities of Ã / M: Example 1, 11.98% / 6.10%; Example 2, 5.94% / 2.04%; Example 3, 11.01% / 3.14%; Example 4, 2.08% / 1.19%; Example 5, 1.98% / 0.62%.]

For comparison, in Table 4.2.7 we report comparative results amongst the different Frobenius-norm minimization type preconditioners, both symmetric and unsymmetric, obtained when the algebraic dropping strategy is used to sparsify the coefficient matrix. In this case, M_{Frob}^{Aver} always performs better than M_{Frob}^{Sym} but is at least three times more expensive to compute. On Examples 1 and 3, the hardest test cases, the combination of SQMR and M_{Frob}^{Aver} needs up to 65% more iterations than GMRES(80) with M_{Frob}, but competes with GMRES(30) with M_{Frob}. On the less difficult problems, SQMR with M_{Frob}^{Aver} converges between 18 and 35% faster than GMRES(80) with M_{Frob}, and between 20 and 47% faster than GMRES(30) with M_{Frob}. The best alternative to significantly improve the behaviour of M_{Frob}^{Sym} remains to enlarge notably the density of Ã and only marginally that of the preconditioner. This can be observed in Table 4.2.8, where we show the number of iterations obtained with this strategy, which consists in using a density of Ã that is at most three times larger than that of M_{Frob}^{Sym}. Once again, the behaviour of M_{Frob}^{Sym} is comparable to that of M_{Frob}^{Aver} as described in Table 4.2.7, while being less expensive to build.
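The algebraic dropping strategy referred to above amounts to keeping the largest-magnitude entries of the coefficient matrix up to a target density. The following is a minimal sketch under our own naming; the thresholding details of the actual code may differ.

```python
import numpy as np

def algebraic_pattern(A, density):
    """Algebraic dropping: keep the density * n^2 largest-magnitude
    entries of the dense matrix A; the diagonal is always retained."""
    n = A.shape[0]
    nnz = max(1, int(density * n * n))
    thresh = np.partition(np.abs(A), -nnz, axis=None)[-nnz]
    patt = np.abs(A) >= thresh
    np.fill_diagonal(patt, True)
    return patt
```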

In Tables 4.2.9 and 4.2.10 we illustrate the effect of the density of the approximation of the original matrix and of the preconditioners on the convergence of SQMR. The preconditioners are built using either the same sparsity pattern for Ã as for M, or a two, three or five times denser pattern for Ã. We report in Tables 4.2.9 and 4.2.10, respectively, the number of SQMR iterations when an algebraic approach is used for Ã and a geometric approach is selected for the patterns of M_{Frob}^{Sym} and M_{Frob}^{Aver}. If we compare these results with those reported in Table 4.2.4, it can be seen that, on hard problems, using geometric information even to prescribe the pattern of Ã is beneficial. M_{Frob}^{Sym} remains rather insensitive to the ordering, as shown by the results of Table 4.2.11.

[Table 4.2.7: Number of iterations on the test examples using the same pattern for the preconditioners; an algebraic pattern is used to sparsify A. Rows M_{Frob}, M_{Frob}^{Aver} and M_{Frob}^{Sym}; columns GMRES(30), GMRES(80), GMRES(110), SQMR and Relative Flops (σ_QR(M_{Frob}) normalized to 1.00). Densities of Ã / M: Example 1, 10.19% / 5.03%; Example 2, 3.18% / 1.99%; Example 3, 4.69% / 2.35%; Example 4, 2.10% / 1.04%; Example 5, 1.27% / 0.62%.]

[Table 4.2.8: Number of iterations for M_{Frob}^{Sym} combined with SQMR using three times more nonzeros in Ã than in the preconditioner; an algebraic pattern is used to sparsify A. Columns GMRES(30), GMRES(80), GMRES(∞), SQMR and Relative Flops. Densities of Ã / M: Example 1, 12% / 6%; Example 2, 5.97% / 2.04%; Example 3, 11.08% / 3.14%; Example 4, 2.10% / 1.19%; Example 5, 1.87% / 0.62%.]

[Table 4.2.9: Number of iterations of SQMR with M_{Frob}^{Sym} on Example 1 for different values of the density of M; rows correspond to the density strategy for Ã (same pattern as M, and 2, 3 or 5 times denser). A geometric approach is adopted for the pattern of the preconditioner and an algebraic approach for the pattern of the coefficient matrix.]

[Table 4.2.10: Number of iterations of SQMR with M_{Frob}^{Aver} on Example 1 for different values of the density of M; rows correspond to the density strategy for Ã (same pattern as M, and 2, 3 or 5 times denser). A geometric approach is adopted for the pattern of the preconditioner and an algebraic approach for the pattern of the coefficient matrix.]

[Table 4.2.11: Number of iterations of SQMR with M_{Frob}^{Sym} under different orderings (Original, RCM, MD, SND, DF); an algebraic pattern is used to sparsify A. Densities of Ã / M: Example 1, 12% / 6%; Example 2, 5.97% / 2.04%; Example 3, 11.08% / 3.14%; Example 4, 2.10% / 1.19%; Example 5, 1.87% / 0.62%.]

4.3 Concluding remarks

In this chapter we have assessed the performance of the Frobenius-norm minimization preconditioner for the solution of dense complex symmetric non-Hermitian systems of equations arising from electromagnetic applications. The set of problems used for the numerical experiments can be considered representative of larger systems. We have also investigated the use of symmetric preconditioners, which reflect the symmetry of the original matrix in the associated preconditioner and enable us to use a symmetric Krylov solver that can be cheaper than GMRES iterations. Both M_{Frob}^{Aver} and M_{Frob}^{Sym} appear to be efficient and robust.

Through numerical experiments, we have shown that M_{Frob}^{Sym} is not too sensitive to the column ordering, while M_{Frob}^{Aver} is totally insensitive to it. In addition, M_{Frob}^{Aver} is straightforward to parallelize, even though it requires more flops for its construction. It would probably be the preconditioner of choice in a parallel distributed fast multipole environment, but possibilities for parallelizing M_{Frob}^{Sym} also exist, for instance by using colouring techniques to detect independent subsets of columns that can be computed in parallel; a sketch is given below. In a multipole context the algorithm must be recast by blocks, and Level 2 BLAS operations have to be used for the least-squares updates. Finally, the major benefit of these two preconditioners is the remarkable robustness they exhibit when used in conjunction with SQMR.
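To illustrate the colouring idea, a greedy colouring of the column dependency graph (our sketch, with hypothetical names) partitions the columns of M_{Frob}^{Sym} into independent subsets: two columns interact only when the symmetric pattern couples them through a reflected entry, so columns sharing a colour can be computed concurrently, with the colour classes processed one after another.

```python
import numpy as np

def greedy_column_colouring(patt):
    """Greedy colouring of the column dependency graph of a symmetric
    boolean pattern: columns j and k interact when patt[j, k] is True,
    so the columns of one colour class form an independent subset."""
    n = patt.shape[0]
    colour = -np.ones(n, dtype=int)
    for j in range(n):
        used = {colour[k] for k in np.where(patt[:, j])[0] if colour[k] >= 0}
        c = 0
        while c in used:
            c += 1
        colour[j] = c
    return colour
```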


Chapter 5

Combining fast multipole techniques and approximate inverse preconditioners for large parallel electromagnetics calculations

In this chapter we consider the implementation of the Frobenius-norm minimization preconditioner described in Chapter 3 within a code that implements the Fast Multipole Method (FMM). We combine the sparse approximate inverse preconditioner with fast multipole techniques for the solution of very large electromagnetic problems. The chapter is organized as follows: in Section 5.1 we briefly overview the FMM. In Section 5.2 we describe the implementation of the Frobenius-norm minimization preconditioner in the parallel multipole context developed in [135]. In Section 5.3 we study the numerical and parallel scalability of the implementation for the solution of large problems. Finally, in Section 5.4 we investigate the numerical behaviour of inner-outer iterative solution schemes implemented in a multipole context with different levels of accuracy for the matrix-vector products in the inner and outer loops. We consider in particular FGMRES as the outer solver, with an inner GMRES iteration preconditioned by the Frobenius-norm minimization method; a generic sketch of this scheme is given below. We illustrate the robustness and effectiveness of this scheme for the solution of problems with up to one million unknowns.
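The inner-outer scheme previewed here, and studied in Section 5.4, can be sketched generically as follows. This is our own illustration, not the parallel code of [135]: matvec stands for the (FMM) matrix-vector product and inner_solve for a few GMRES steps on the system preconditioned by the Frobenius-norm minimization method. Because the preconditioning operation then changes from step to step, the flexible variant FGMRES must store the preconditioned basis vectors.

```python
import numpy as np

def fgmres(matvec, inner_solve, b, m=30, tol=1e-5, max_restarts=20):
    """Restarted flexible GMRES, FGMRES(m), with a variable (iterative)
    right preconditioner; convergence is checked at each restart."""
    n = b.size
    x = np.zeros_like(b)
    bnorm = np.linalg.norm(b)
    for _ in range(max_restarts):
        r = b - matvec(x)
        beta = np.linalg.norm(r)
        if beta <= tol * bnorm:
            break
        V = np.zeros((n, m + 1), dtype=b.dtype)
        Z = np.zeros((n, m), dtype=b.dtype)   # preconditioned basis
        H = np.zeros((m + 1, m), dtype=b.dtype)
        V[:, 0] = r / beta
        k = m
        for j in range(m):
            Z[:, j] = inner_solve(V[:, j])    # inner GMRES sweep
            w = matvec(Z[:, j])
            for i in range(j + 1):            # modified Gram-Schmidt
                H[i, j] = np.vdot(V[:, i], w)
                w = w - H[i, j] * V[:, i]
            H[j + 1, j] = np.linalg.norm(w)
            if H[j + 1, j] == 0.0:            # lucky breakdown
                k = j + 1
                break
            V[:, j + 1] = w / H[j + 1, j]
        e1 = np.zeros(k + 1, dtype=b.dtype)
        e1[0] = beta
        y, *_ = np.linalg.lstsq(H[:k + 1, :k], e1, rcond=None)
        x = x + Z[:, :k] @ y
    return x
```

In the scheme of Section 5.4, the outer matrix-vector products use an accurate FMM while the inner ones can use a cheaper, less accurate FMM.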

5.1 The fast multipole method

The FMM, introduced by Greengard and Rokhlin [82], provides an algorithm for computing approximate matrix-vector products for electromagnetic scattering problems. The method is fast in the sense that the computation of one matrix-vector product costs O(n log n) arithmetic operations instead of the usual O(n^2), and is approximate in the sense that the relative error with respect to the exact computation is around 10^{-3} [38, 135]. It is based on truncated series expansions of the Green's function for the electric-field integral equation (EFIE). The EFIE can be written as

    E(x) = ∫_Γ G(x, x') ρ(x') d^3x' - (ik/c) ∫_Γ G(x, x') J(x') d^3x' + E^E(x),        (5.1.1)

where E^E is the electric field due to external sources, J(x) is the current density, ρ(x) is the charge density, and the constants k and c are the wavenumber and the speed of light, respectively. The Green's function G can be expressed as

    G(x, x') = e^{ik|x - x'|} / |x - x'|.        (5.1.2)

The EFIE is converted into matrix equations by the Method of Moments [86]. The unknown current J(x) on the surface of the object is expanded into a set of basis functions B_i, i = 1, 2, ..., N:

    J(x) = Σ_{i=1}^{N} J_i B_i(x).

This expansion is introduced in (5.1.1), and the discretized equation is applied to a set of test functions. A linear system is finally obtained. The entries of the coefficient matrix of the system are expressed in terms of surface integrals, and have the form

    A_{KL} = ∫∫ G(x, y) B_K(x) · B_L(y) dL(y) dK(x).        (5.1.3)

When m-point Gauss quadrature formulae are used to compute the surface integrals in (5.1.3), the entries of the coefficient matrix take the form

    A_{KL} = Σ_{i=1}^{m} Σ_{j=1}^{m} ω_i ω_j G(x_{K_i}, y_{L_j}) B_K(x_{K_i}) · B_L(y_{L_j}).        (5.1.4)
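As a small worked illustration of (5.1.2) and (5.1.4), the sketch below evaluates one near-field matrix entry by Gauss quadrature; the helper names and array layouts are our own assumptions.

```python
import numpy as np

def green(x, y, k):
    """Green's function (5.1.2): exp(ik|x-y|) / |x-y|."""
    r = np.linalg.norm(x - y)
    return np.exp(1j * k * r) / r

def entry_AKL(xg, wK, BK, yg, wL, BL, k):
    """Quadrature form (5.1.4) of one coefficient A_KL.
    xg, yg: (m, 3) Gauss points; wK, wL: (m,) weights;
    BK, BL: (m, 3) values of the vector basis functions at the points."""
    a = 0.0 + 0.0j
    for i in range(len(xg)):
        for j in range(len(yg)):
            a += wK[i] * wL[j] * green(xg[i], yg[j], k) * np.dot(BK[i], BL[j])
    return a
```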

Single and multilevel variants of the FMM exist and, for the multilevel algorithm, there are adaptive variants that handle inhomogeneous discretizations efficiently. In the one-level algorithm, the 3D obstacle is entirely enclosed in a large rectangular domain, and the domain is divided into eight boxes (four in 2D). Each box is recursively divided until the length of the edges of the boxes at the current level is small enough compared with the wavelength. The neighbourhood of a box is defined by the box itself and its 26 adjacent neighbours (eight in 2D). The interactions of degrees of freedom within nearby boxes are computed exactly from (5.1.4), where the Green's function is expressed via (5.1.2). The contributions of far-away cubes are computed approximately. For each far-away box, the effect of a large number of degrees of freedom is concentrated into one multipole coefficient, computed using a truncated series expansion of the Green's function:

    G(x, y) = Σ_{p=1}^{P} ψ_p(x) φ_p(y).        (5.1.5)

The expansion (5.1.5) separates the Green's function into two sets of terms, ψ_p and φ_p, that depend on the observation point x and on the source (or evaluation) point y, respectively. In (5.1.5) the origin of the expansion is near the source point, and the observation point x is far away. Local coefficients for the observation cubes are computed by summing together the multipole coefficients of far-away boxes, and the total effect of the far field on each observation point is evaluated from the local expansions (see Figure 5.1.1 for a 2D illustration). Local and multipole coefficients can be computed in a preprocessing step; the approximate computation of the far field enables us to reduce the computational cost of the matrix-vector product to O(n^{3/2}) in the basic one-level algorithm.

In the hierarchical multilevel algorithm, the obstacle is enclosed in a cube, the cube is divided into eight subcubes, and each subcube is recursively divided until the size of the smallest box is generally half of a wavelength. Tree-structured data is used at all levels. In particular, only non-empty cubes are indexed and recorded in the data structure. The resulting tree is called an oct-tree (see Figure 5.1.2) and we refer to its leaves as the leaf-boxes. The oct-tree provides a hierarchical representation of the computational domain partitioned by boxes. Each box has one parent in the oct-tree, except for the largest cube which encloses the whole domain, and up to eight children. Obviously, the leaf-boxes have no children. Multipole coefficients are computed for all cubes at the lowest level of the oct-tree, that is for the leaf-boxes. Multipole coefficients of the parent cubes in the hierarchy are computed by summing together contributions from the multipole coefficients of their children. The process is repeated recursively up to the coarsest possible level.

[Figure 5.1.2: The oct-tree in the FMM algorithm. The maximum number of children is eight; the actual number corresponds to the subset of the eight that intersect the object (courtesy of G. Sylvand, INRIA CERMICS).]
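The oct-tree construction described above can be sketched in a few lines of Python; this is our own illustration, where leaf_size plays the role of the half-wavelength edge-length threshold and, as in the actual data structure, only non-empty children are recorded.

```python
from dataclasses import dataclass, field
import numpy as np

@dataclass
class Box:
    center: np.ndarray
    size: float                       # edge length of the cube
    points: np.ndarray                # indices of degrees of freedom inside
    children: list = field(default_factory=list)

def build_octree(box, coords, leaf_size):
    """Recursively split non-empty boxes into up to eight children until
    the edge length falls below leaf_size (about half a wavelength)."""
    if box.size <= leaf_size:
        return box
    # assign each point to its octant (one 0/1 bit per axis)
    side = (coords[box.points] > box.center).astype(int)
    code = side[:, 0] * 4 + side[:, 1] * 2 + side[:, 2]
    for c in range(8):
        pts = box.points[code == c]
        if pts.size:                  # only non-empty cubes are kept
            shift = (np.array([c // 4, (c // 2) % 2, c % 2]) - 0.5) * box.size / 2
            child = Box(box.center + shift, box.size / 2, pts)
            box.children.append(build_octree(child, coords, leaf_size))
    return box
```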

For each observation cube, an interaction list is defined that consists of those cubes that are not neighbours of the cube itself but whose parent is a neighbour of the cube's parent. In Figure 5.1.3 we denote by dashed lines the interaction list for the observation cube in the 2D case. The interactions of degrees of freedom within neighbouring boxes are computed exactly, while the interactions between cubes in the interaction list are computed using the FMM. All the other interactions are computed hierarchically on a coarser level, traversing the oct-tree. Both the computational cost and the memory requirement of the algorithm are of order O(n log n). For further details on the algorithmic steps see [39, 115, 124], and [38, 44, 45, 46] for recent theoretical investigations. Parallel implementations of hierarchical methods have been described in [78, 79, 80, 81, 126, 149].

[Figure 5.1.1: Interactions in the one-level FMM. For each leaf-box, the interactions with the grey neighbouring leaf-boxes are computed directly. The contributions of far-away cubes are computed approximately: the multipole expansions of far-away boxes are translated to local expansions for the leaf-box; these contributions are summed together and the total field induced by far-away cubes is evaluated from the local expansions.]

[Figure 5.1.3: Interactions in the multilevel FMM. The interactions for the grey boxes are computed directly. The dashed lines denote the interaction list for the observation box, which consists of those cubes that are not neighbours of the cube itself but whose parent is a neighbour of the cube's parent. The interactions of the cubes in the list are computed using the FMM. All the other interactions are computed hierarchically on a coarser level, denoted by solid lines.]
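Continuing the oct-tree sketch above, the interaction list of a box follows directly from its definition; here we assume, as an illustration, that each Box additionally carries a parent link and that neighbours(b) returns the boxes adjacent to b on its own level, including b itself.

```python
def interaction_list(box, neighbours):
    """Cubes that are not neighbours of `box` but whose parent is a
    neighbour of box's parent (the multilevel FMM interaction list)."""
    near = set(map(id, neighbours(box)))
    return [child for p in neighbours(box.parent)
            for child in p.children
            if id(child) not in near]
```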

5.2 Implementation of the Frobenius-norm minimization preconditioner in the fast multipole framework

An efficient implementation of the Frobenius-norm minimization preconditioner in the FMM context exploits the box-wise partitioning of the domain. The subdivision of the computational domain into boxes uses geometric information from the obstacle, that is the spatial coordinates of its degrees of freedom. As we know from Chapter 3, this information can be profitably used to compute an effective a priori sparsity pattern for the approximate inverse. In the FMM implementation, we adopt the following criterion: the nonzero structure of each column of the preconditioner is defined by retaining all the edges within a given leaf-box and those in one level of neighbouring boxes. We recall that the neighbourhood of a box is defined by the box itself and its 26 adjacent neighbours (eight in 2D). The sparse approximation of the dense coefficient matrix is defined by retaining the entries associated with edges included in the given leaf-box as well as those belonging to two levels of neighbours. The actual entries of the approximate inverse are computed column by column by solving independent least-squares problems. The main advantage of defining the patterns of the preconditioner and of the original sparsified matrix box-wise is that we only have to compute one QR factorization per leaf-box. Indeed, the least-squares problems corresponding to edges within the same box are identical, because they are defined using the same nonzero structure and the same entries of A. This means that the QR factorization can be performed once and reused many times, significantly improving the efficiency of the computation. The preconditioner has a sparse block structure; each block is dense and is associated with one leaf-box.
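The QR reuse can be pictured with the following dense sketch (our naming: A_box is the least-squares matrix shared by all the columns attached to one leaf-box, and E_box collects the corresponding canonical right-hand sides, one unit vector per edge of the box).

```python
import numpy as np

def columns_of_leaf_box(A_box, E_box):
    """One QR factorization per leaf-box: every column of the
    preconditioner attached to an edge of the box solves a least-squares
    problem with the same matrix, so Q and R are computed once and
    reused for all the right-hand sides in E_box."""
    Q, R = np.linalg.qr(A_box)
    return np.linalg.solve(R, Q.conj().T @ E_box)
```

One factorization thus serves every edge of the box, which is what makes the box-wise pattern definition cheap.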

Its construction can use a partitioning different from the one used to approximate the dense coefficient matrix and represented by the oct-tree. The size of the smallest boxes in the partitioning associated with the preconditioner is a user-defined parameter that can be tuned to control the number of nonzeros computed per row, that is, the density of the preconditioner. According to our criterion, the larger the size of the leaf-boxes, the larger the geometric neighbourhood that determines the sparsity structure of the columns of the preconditioner. Parallelism can be exploited by assigning disjoint subsets of leaf-boxes to different processors and performing the least-squares solutions independently on each processor. Communication is required only to get information on the entries of the coefficient matrix from neighbouring leaf-boxes.

5.3 Numerical scalability of the preconditioner

In this section we show results concerning the numerical scalability of the Frobenius-norm minimization preconditioner. They have been obtained by illuminating the same obstacle at increasing values of the frequency. The surface of the object is always discretized using ten points per wavelength. We consider two test examples: a sphere of radius 1 metre


More information

Multigrid absolute value preconditioning

Multigrid absolute value preconditioning Multigrid absolute value preconditioning Eugene Vecharynski 1 Andrew Knyazev 2 (speaker) 1 Department of Computer Science and Engineering University of Minnesota 2 Department of Mathematical and Statistical

More information

Inexact inverse iteration with preconditioning

Inexact inverse iteration with preconditioning Department of Mathematical Sciences Computational Methods with Applications Harrachov, Czech Republic 24th August 2007 (joint work with M. Robbé and M. Sadkane (Brest)) 1 Introduction 2 Preconditioned

More information

Block preconditioners for saddle point systems arising from liquid crystal directors modeling

Block preconditioners for saddle point systems arising from liquid crystal directors modeling Noname manuscript No. (will be inserted by the editor) Block preconditioners for saddle point systems arising from liquid crystal directors modeling Fatemeh Panjeh Ali Beik Michele Benzi Received: date

More information

Parallel Preconditioning Methods for Ill-conditioned Problems

Parallel Preconditioning Methods for Ill-conditioned Problems Parallel Preconditioning Methods for Ill-conditioned Problems Kengo Nakajima Information Technology Center, The University of Tokyo 2014 Conference on Advanced Topics and Auto Tuning in High Performance

More information

Finding Rightmost Eigenvalues of Large, Sparse, Nonsymmetric Parameterized Eigenvalue Problems

Finding Rightmost Eigenvalues of Large, Sparse, Nonsymmetric Parameterized Eigenvalue Problems Finding Rightmost Eigenvalues of Large, Sparse, Nonsymmetric Parameterized Eigenvalue Problems AMSC 663-664 Final Report Minghao Wu AMSC Program mwu@math.umd.edu Dr. Howard Elman Department of Computer

More information

MODIFICATION AND COMPENSATION STRATEGIES FOR THRESHOLD-BASED INCOMPLETE FACTORIZATIONS

MODIFICATION AND COMPENSATION STRATEGIES FOR THRESHOLD-BASED INCOMPLETE FACTORIZATIONS MODIFICATION AND COMPENSATION STRATEGIES FOR THRESHOLD-BASED INCOMPLETE FACTORIZATIONS S. MACLACHLAN, D. OSEI-KUFFUOR, AND YOUSEF SAAD Abstract. Standard (single-level) incomplete factorization preconditioners

More information

GEOPHYSICAL INVERSE THEORY AND REGULARIZATION PROBLEMS

GEOPHYSICAL INVERSE THEORY AND REGULARIZATION PROBLEMS Methods in Geochemistry and Geophysics, 36 GEOPHYSICAL INVERSE THEORY AND REGULARIZATION PROBLEMS Michael S. ZHDANOV University of Utah Salt Lake City UTAH, U.S.A. 2OO2 ELSEVIER Amsterdam - Boston - London

More information

Spectral analysis of complex shifted-laplace preconditioners for the Helmholtz equation

Spectral analysis of complex shifted-laplace preconditioners for the Helmholtz equation Spectral analysis of complex shifted-laplace preconditioners for the Helmholtz equation C. Vuik, Y.A. Erlangga, M.B. van Gijzen, and C.W. Oosterlee Delft Institute of Applied Mathematics c.vuik@tudelft.nl

More information

Cholesky factorisations of linear systems coming from a finite difference method applied to singularly perturbed problems

Cholesky factorisations of linear systems coming from a finite difference method applied to singularly perturbed problems Cholesky factorisations of linear systems coming from a finite difference method applied to singularly perturbed problems Thái Anh Nhan and Niall Madden The Boundary and Interior Layers - Computational

More information

A Block Compression Algorithm for Computing Preconditioners

A Block Compression Algorithm for Computing Preconditioners A Block Compression Algorithm for Computing Preconditioners J Cerdán, J Marín, and J Mas Abstract To implement efficiently algorithms for the solution of large systems of linear equations in modern computer

More information

Chromatically Unique Bipartite Graphs With Certain 3-independent Partition Numbers III ABSTRACT

Chromatically Unique Bipartite Graphs With Certain 3-independent Partition Numbers III ABSTRACT Malaysian Chromatically Journal of Mathematical Unique Biparte Sciences Graphs with 1(1: Certain 139-16 3-Independent (007 Partition Numbers III Chromatically Unique Bipartite Graphs With Certain 3-independent

More information

The Augmented Block Cimmino Distributed method

The Augmented Block Cimmino Distributed method The Augmented Block Cimmino Distributed method Iain S. Duff, Ronan Guivarch, Daniel Ruiz, and Mohamed Zenadi Technical Report TR/PA/13/11 Publications of the Parallel Algorithms Team http://www.cerfacs.fr/algor/publications/

More information

INCREMENTAL INCOMPLETE LU FACTORIZATIONS WITH APPLICATIONS TO TIME-DEPENDENT PDES

INCREMENTAL INCOMPLETE LU FACTORIZATIONS WITH APPLICATIONS TO TIME-DEPENDENT PDES INCREMENTAL INCOMPLETE LU FACTORIZATIONS WITH APPLICATIONS TO TIME-DEPENDENT PDES C. CALGARO, J. P. CHEHAB, AND Y. SAAD Abstract. This paper addresses the problem of computing preconditioners for solving

More information

Chapter 7 Iterative Techniques in Matrix Algebra

Chapter 7 Iterative Techniques in Matrix Algebra Chapter 7 Iterative Techniques in Matrix Algebra Per-Olof Persson persson@berkeley.edu Department of Mathematics University of California, Berkeley Math 128B Numerical Analysis Vector Norms Definition

More information

Microcosmo e Macrocosmo

Microcosmo e Macrocosmo Microcosmo e Macrocosmo Paolo de Bernardis Dipartimento di Fisica Sapienza Università di Roma Lezioni della Cattedra Fermi 23 Gennaio 2014 Dipartimento di Fisica Sapienza Università di Roma Friedman s

More information

Two-level Domain decomposition preconditioning for the high-frequency time-harmonic Maxwell equations

Two-level Domain decomposition preconditioning for the high-frequency time-harmonic Maxwell equations Two-level Domain decomposition preconditioning for the high-frequency time-harmonic Maxwell equations Marcella Bonazzoli 2, Victorita Dolean 1,4, Ivan G. Graham 3, Euan A. Spence 3, Pierre-Henri Tournier

More information

AMS526: Numerical Analysis I (Numerical Linear Algebra)

AMS526: Numerical Analysis I (Numerical Linear Algebra) AMS526: Numerical Analysis I (Numerical Linear Algebra) Lecture 24: Preconditioning and Multigrid Solver Xiangmin Jiao SUNY Stony Brook Xiangmin Jiao Numerical Analysis I 1 / 5 Preconditioning Motivation:

More information

MULTI-LAYER HIERARCHICAL STRUCTURES AND FACTORIZATIONS

MULTI-LAYER HIERARCHICAL STRUCTURES AND FACTORIZATIONS MULTI-LAYER HIERARCHICAL STRUCTURES AND FACTORIZATIONS JIANLIN XIA Abstract. We propose multi-layer hierarchically semiseparable MHS structures for the fast factorizations of dense matrices arising from

More information