Two-sided Eigenvalue Algorithms for Modal Approximation


Two-sided Eigenvalue Algorithms for Modal Approximation

Master's thesis submitted to the Faculty of Mathematics at Chemnitz University of Technology

presented by: B.Sc. Patrick Kürschner
Supervisor: Prof. Dr. Peter Benner
Advisor: Dr. Michiel E. Hochstenbach

Chemnitz, June 14, 2010


ACKNOWLEDGEMENTS

My primary thanks go to my teacher and supervisor Prof. Dr. Peter Benner for helping me write this thesis and for guiding me during all these years as a student and scientific research assistant. Without his supervision and the opportunity to work in his research group, I probably would not have discovered numerical linear algebra, systems and control theory, and model order reduction as such extremely interesting fields of modern mathematics. Secondly, I thank my advisor Dr. Michiel E. Hochstenbach for the initial idea for the topic of this thesis, for all the advice and hints he gave me in our many inspiring discussions, and of course for his hospitality during my stay in Eindhoven, which was sadly much too short. I am also very grateful to Dr. Joost Rommes for answering a lot of my questions in the countless conversations that helped me gain a deeper understanding of the investigated methods. Of course, many further thanks go to my friends and colleagues with whom I had the pleasure to live and work, which made the last years such an unforgettable time. Unfortunately, I cannot mention every single person, but only a few. I especially thank, for instance, my dear colleagues Dr. Jens Saak and Matthias Voigt for the daily coffee breaks in our office involving many encouraging conversations. I also want to thank Alexander Bernhardt and Gordon Schmidt for reading parts of this work. Furthermore, I thank all my other friends, who probably only rarely caught sight of me during the last weeks of the development of this thesis. Finally, I am also deeply grateful for the constant support my family gave me during my studies.


Abstract

Large-scale linear time invariant (LTI) systems arise in many physical and technical fields. An approximation of these large systems, e.g., with model order reduction techniques, is crucial for a cost-efficient simulation. In this thesis we focus on a model order reduction method based on modal approximation, where the LTI system is projected onto the left and right eigenspaces corresponding to the dominant poles of the system. These dominant poles are related to the most dominant parts of the residue expansion of the transfer function and usually form a small subset of the eigenvalues of the system matrices. The computation of these dominant poles can be a formidable task, since they can lie anywhere inside the spectrum and the corresponding left eigenvectors have to be approximated as well. We investigate the subspace accelerated dominant pole algorithm and the two-sided and alternating Jacobi-Davidson methods for this modal truncation approach. These methods can be seen as subspace accelerated versions of certain Rayleigh quotient iterations. Several strategies that admit an efficient computation of several dominant poles of single-input single-output LTI systems are examined. Since dominant poles can lie in the interior of the spectrum, we also discuss harmonic subspace extraction approaches which might improve the convergence of the methods. Extensions of the modal approximation approach and the applied eigenvalue solvers to multi-input multi-output systems are also examined. The discussed eigenvalue algorithms and the model order reduction approach are tested on several practically relevant LTI systems.


Contents

List of Figures
List of Tables
List of Algorithms

1 Introduction
2 Mathematical basics
  2.1 Eigenvalue problems
    2.1.1 The standard eigenvalue problem
    2.1.2 The generalized eigenvalue problem
    2.1.3 Quadratic and polynomial eigenvalue problems
    2.1.4 The singular value decomposition
  2.2 Methods for eigenvalue problems
  2.3 Systems and control theory
    2.3.1 Linear time invariant state-space systems
    2.3.2 Linear descriptor systems
    2.3.3 Second-order systems
  2.4 Model order reduction
    2.4.1 The common principle of model order reduction
    2.4.2 Modal approximation
3 Rayleigh Quotient Iterations
  3.1 The standard Rayleigh Quotient Iteration
  3.2 The two-sided Rayleigh Quotient Iteration
  3.3 The Dominant Pole Algorithm
  3.4 The Alternating Rayleigh Quotient Iteration
  3.5 The Half-Step Rayleigh Quotient Iteration
  3.6 Numerical example
4 Two-sided subspace accelerated eigenvalue methods
  4.1 The Subspace Accelerated Dominant Pole Algorithm
  4.2 The two-sided Jacobi-Davidson algorithm
    4.2.1 The new correction equations
    4.2.2 Computing more than one eigentriplet
    4.2.3 Inexact solution and preconditioning of the correction equations
  4.3 The Alternating Jacobi-Davidson algorithm
    4.3.1 An alternating subspace accelerated scheme
    4.3.2 Computing dominant poles
    4.3.3 Deflation, restarts and inexact solution of the correction equations
5 Further improvements and generalizations
  5.1 Harmonic subspace extraction
    5.1.1 One-sided harmonic subspace extraction
    5.1.2 Two-sided harmonic subspace extraction
  5.2 MIMO Systems
    5.2.1 Multivariable transfer functions
    5.2.2 The Subspace Accelerated MIMO Dominant Pole Algorithm
    5.2.3 Computation of MIMO dominant poles with 2-JD
6 Numerical examples
7 Summary and Outlook
  7.1 Conclusions
  7.2 Future research perspectives
Bibliography
Theses
Declaration of Authorship/Selbstständigkeitserklärung

List of Figures

1.1 Schematic overview of model order reduction (MOR).
2.1 (a) Bode plot of the transfer function of the CD player [8] SISO system of order n = 120 in a double logarithmic plot. (b) Sigma plot of the full 2×2 MIMO system.
2.2 3-D Bode plot of H(s), eigenvalues and dominant poles in the region [−2, 0] × i[0, 20] ⊂ C of the New England test system [26] of order n = 66.
2.3 (a) Eigenvalues and 6 dominant poles in [−2, 0] × i[0, 10] ⊂ C. (b) Bode magnitude plot of the transfer function of the New England test system and imaginary parts of the dominant poles.
2.4 Bode plot of the original New England test system and reduced order models with p = 3 (k = 5 eigenvalues/states) and p = 6 (k = 11 eigenvalues/states) dominant poles according to (2.15).
3.1 (a) Convergence histories of RQI, 2-RQI, ARQI, DPA and HSRQI for the New England test system. (b) The same as (a), but for the PEEC patch antenna model [8].
4.1 Bode magnitude plot of the original transfer function of the FOM model [8], and modal equivalents H_i where the dominant pole p_i for i = 1, 2, 3 is deflated. The dominant poles are p_1 = −1 ± 100i, p_2 = −1 ± 200i and p_3 = −1 ± 400i. H_4 shows the result when all three poles are removed. The vertical dashed lines mark Im(p_j) for j = 1, 2, 3.
6.1 Convergence histories for 2-JD, SA2RQI and SADPA for the PEEC model [8] (n = 480) with bi-E-orthogonal (a) and orthogonal (b) search spaces.
6.2 (a) Bode plot and (b) relative error of the original PEEC model and the modal equivalents of order k = 80 obtained directly with the QZ algorithm. Figures (c) and (d) show the results obtained with 2-JD.
6.3 Convergence histories for 2-JD, SA2RQI and SADPA for the BIPS model (n = 13,251). All linear systems were solved exactly.
6.4 (a) Bode plot and (b) relative error of the original BIPS model and the reduced order models (r.o.m.) of order k = 100 obtained with 2-JD, SA2RQI and SADPA.
6.5 (a) Convergence histories for 2-JD, SA2RQI and SADPA for the BIPS system (n = 13,251). All linear systems were solved with 10 steps of GMRES and LU = iE − A as fixed preconditioner. (b) The same as (a), but the preconditioner is updated after a triplet has been detected and after a restart.
6.6 Convergence histories for SAARQI and AJD for the clamped beam model (n = 348). All linear systems were solved exactly using LU decompositions.
6.7 (a) Bode plot and (b) relative error of the original beam model and the k = 14 modal equivalents obtained with AJD and SAARQI.
6.8 Convergence histories for 2-JD with standard, generalized two-sided harmonic (gen. 2-harm.) and double one-sided harmonic Petrov-Galerkin (double 1-harm.) extraction for the BIPS model and τ = i, γ = 0.95. All linear systems were solved with 10 steps of GMRES and LU = τE − A.
6.9 Sigma plot of the complete model of the ISS system 3×3 transfer function and k = 40 modal equivalents computed with 2-JD.
6.10 Sigma plot of the complete model of the ISS system 3×3 transfer function and k = 40 modal equivalents computed with SAMDP.
6.11 Sigma plot of the complete model of the BIPS 8×8 transfer function and k = 251 modal equivalents computed with 2-JD (a) and SAMDP (b).

List of Tables

2.1 Dominant poles and corresponding scaled residues of the New England test system.
6.1 Excerpt of the found poles and corresponding residues of the BIPS system for the three methods. Iteration numbers marked with brackets represent poles that were found after the first 50 iterations, while a minus sign indicates that the pole was not found by the particular method.
6.2 Summary of the found poles and corresponding residues of the BIPS system using different subspace extractions. A minus sign indicates that the pole was not found.


List of Algorithms

3.1 Rayleigh quotient iteration (RQI)
3.2 Two-sided Rayleigh quotient iteration (2-RQI)
3.3 Dominant Pole Algorithm (DPA)
3.4 Alternating Rayleigh quotient iteration (ARQI)
3.5 Half-step Rayleigh quotient iteration (HSRQI)
4.1 Subspace Accelerated Dominant Pole Algorithm (SADPA)
4.2 (Λ, Q, Z) = Sort(S, T, b, c)
4.3 Basic bi-E-orthogonal two-sided Jacobi-Davidson algorithm
4.4 Efficient exact solution of the correction equations of Algorithm 4.3 (bi-E-orthogonal 2-JD)
4.5 Bi-E-orthogonal two-sided Jacobi-Davidson algorithm for dominant pole computation
4.6 Alternating Jacobi-Davidson method
5.1 (Λ, Q, Z) = SortHarm(S_1, S_2, T_1, T_2, b, c, V, W, τ, γ)
5.2 (Λ, Q, Z) = SortM(S, T, B, C, V, W)

1 Introduction

Dynamical systems governed by systems of differential equations are among the most widely used tools to describe technical and physical phenomena. The simulation of these phenomena via the solution of the underlying equations reveals insights into the dynamical behavior of the system and is a cornerstone of the production cycle of modern technical devices, since it is common practice to simulate a device on the computer before actually realizing it. In the last decades, however, the size of these dynamical systems began to increase drastically: on the one hand because one is nowadays interested in a more realistic description of the phenomena involved, so that more details have to be considered, and on the other hand simply due to the significantly increased complexity of the systems. Electrical circuits provide a good example to illustrate this issue. An electrical circuit consists of circuit elements such as resistors, inductors, capacitors and transistors, and is usually described by Kirchhoff's laws and the characteristic equations of the circuit elements [15, 35]. This leads to systems of nonlinear differential-algebraic equations [24]. However, a modern integrated circuit has a huge number of circuit elements packed very densely in a small space. To guarantee a realistic description of such a circuit from a physical point of view, other effects arising from the electromagnetic field or even heat conduction may have to be taken into account as well. Altogether the result is a very large-scale system of circuit equations where the number of unknowns can easily exceed one million. Other fields of application where such large-scale dynamical systems arise are, e.g., vibration analysis of mechanical structures, chemical and biological engineering, and power supply networks. The computational effort for the simulation of these large systems is therefore very high and can even be beyond the capabilities of modern high-end computers. In the last decades this has led to an increased interest in model order reduction, a research area that addresses the approximation of dynamical systems. Model order reduction is schematically illustrated in Figure 1.1 using linear time invariant dynamical systems as example.

[Figure 1.1: Schematic overview of model order reduction (MOR): the original system E ẋ(t) = A x(t) + B u(t), y(t) = C x(t) is reduced to Ẽ (d/dt)x̃(t) = Ã x̃(t) + B̃ u(t), ỹ(t) = C̃ x̃(t).]

The dark green squares and rectangles represent the system matrices, and their size stands for the dimension of the matrices and likewise for the order of the system. The large arrow in the middle represents the intrinsic model order reduction which reduces the size of the matrices, and hence the order of the system. In this sense the goal of model order reduction is to obtain a so-called reduced order model (r.o.m.) of strongly decreased size which is much easier to simulate than the large-scale original model. This is usually achieved by determining the dominant parts of the original system, which have a significant contribution to the system dynamics, and neglecting the less important segments, which normally outnumber the dominant parts. Of course, this reduced order model should be accurate enough to reflect the dynamic behavior of the original model adequately. There exist several model order reduction techniques for this purpose, for instance balanced truncation and Krylov subspace projection methods [2, 5, 14, 28]. In this thesis we investigate another technique for linear time invariant systems, namely modal approximation (or modal truncation). Thereby the original system is projected onto the right and left eigenspaces associated with the dominant poles of the system. Dominant poles are eigenvalues of the system matrices that have a significant contribution to the dynamical behavior of the system. Thus, modal truncation boils down to the computation of a number of eigentriplets of the corresponding large-scale eigenvalue problem. Algorithms for this often formidable task and their application within modal approximation are the essential topics of this thesis.

The remainder of this thesis is structured as follows. In Chapter 2 we review the necessary fundamentals of eigenvalue theory and of systems and control theory, and we investigate modal approximation as a model order reduction method. Since we have to compute eigenvalues and eigenvectors for this strategy, we discuss some basic methods for eigenvalue computations based on the Rayleigh quotient in Chapter 3. Chapter 4 represents the main part of this thesis and deals with improved versions of the basic iterations mentioned before which admit the computation of a number of eigentriplets of the system matrices and can hence directly be used to generate reduced order models. Our main interest in this context is in Jacobi-Davidson style eigenvalue solvers. Chapter 5 introduces some generalizations of these methods from both a numerical and an application-oriented point of view. Numerical experiments of the eigenvalue computation and the intrinsic modal approximation are presented in Chapter 6. Finally, Chapter 7 gives conclusions and some future research perspectives.

2 Mathematical basics

2.1 Eigenvalue problems

This section briefly reviews some concepts from eigenvalue theory which are necessary for the following sections and chapters of this thesis. More details and information can be found, for instance, in the textbooks [13, 33, 59].

2.1.1 The standard eigenvalue problem

For a matrix A ∈ C^{n×n} the standard eigenvalue problem is defined as

    Ax = λx,  x ≠ 0,

with unknowns λ ∈ C and x ∈ C^n. The scalar λ is a root of the characteristic polynomial of A, that is p_A(λ) := det(A − λI) = 0, and is called eigenvalue. The nonzero vector x is a (right) eigenvector for λ. We refer to a pair (λ, x) as an eigenpair of A. Similarly, a nonzero vector y ∈ C^n for which y^*A = λy^* holds is a left eigenvector for λ. Together with λ and x this forms an eigentriplet (λ, x, y) of A. The set Λ(A) = {λ ∈ C : det(A − λI) = 0} contains all eigenvalues of A and is referred to as the spectrum of A. The multiplicity of a root of det(A − λI) is called algebraic multiplicity and is denoted by α(λ). The corresponding geometric multiplicity β(λ) of the eigenvalue λ is the dimension of the null space of A − λI. If A has s ≤ n distinct eigenvalues or, equivalently, det(A − λI) has s ≤ n distinct roots, denoted by λ_j, and α(λ_j) = β(λ_j) holds for all j = 1, …, s, then the matrix is diagonalizable (or nondefective). In this case there exist n linearly independent eigenvectors x_1, …, x_n such that

    X^{-1}AX = diag(λ_1, …, λ_n)

with a nonsingular matrix X ∈ C^{n×n} which has the right eigenvectors as columns. The corresponding left eigenvectors y_1, …, y_n satisfy y_i^*A = λ_i y_i^* and can (in the nondefective case) be scaled such that y_i^* x_j = δ_ij, i, j = 1, …, n. However, if x_i, y_i are scaled such that ‖x_i‖_2 = ‖y_i‖_2 = 1 and if α(λ_i) = 1, then

    κ(λ_i) := 1 / |y_i^* x_i|

defines the condition number of the simple eigenvalue λ_i. The left eigenvectors of A are also the rows of X^{-1} and therefore it is possible to write Y^*AX = diag(λ_1, …, λ_n) and Y^*X = I. The columns of the nonsingular matrix Y ∈ C^{n×n} are then the left eigenvectors. Symmetric matrices A = A^T ∈ R^{n×n} and Hermitian matrices A = A^* ∈ C^{n×n} have only real eigenvalues and their eigenvectors form a complete orthonormal basis. This property of the eigenvectors is also shared by normal matrices, that is, matrices for which AA^* = A^*A holds. Hence, a right eigenvector for an eigenvalue of a symmetric, Hermitian or normal matrix is also a left eigenvector for the same eigenvalue. A matrix that is not diagonalizable is called defective. In this case there are defective eigenvalues with α(λ) > β(λ). For general square matrices there exists a Schur decomposition

    Q^*AQ = T = diag(λ_1, …, λ_n) + N

with a unitary matrix Q ∈ C^{n×n} and a strictly upper triangular (nilpotent) matrix N. A decomposition that also reveals the algebraic and geometric multiplicities of the eigenvalues is the Jordan decomposition or Jordan canonical form

    X^{-1}AX = diag(J_1, …, J_s).

Each Jordan block J_i ∈ C^{m_i×m_i} is upper triangular with a single eigenvalue λ_i on its diagonal and ones along the first superdiagonal, and it holds that m_1 + … + m_s = n. The Jordan blocks with m_i > 1 correspond to defective eigenvalues.
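As a quick numerical illustration of these notions, the following small Python sketch (using NumPy/SciPy on a random test matrix, so all names and data here are illustrative and not taken from the thesis) computes eigentriplets (λ, x, y), checks the defining relations, and applies the scaling y_i^* x_i = 1 used above.

import numpy as np
from scipy import linalg

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 6))                       # small illustrative test matrix

# scipy returns eigenvalues plus left and right eigenvectors; a column y of the
# left matrix satisfies y^* A = lambda y^*
lam, Y, X = linalg.eig(A, left=True, right=True)

i = 0
x, y = X[:, i], Y[:, i]
print(np.allclose(A @ x, lam[i] * x))                 # right eigenvector: A x = lambda x
print(np.allclose(y.conj() @ A, lam[i] * y.conj()))   # left eigenvector:  y^* A = lambda y^*

# for a simple eigenvalue, rescale x so that y^* x = 1
x = x / (y.conj() @ x)
print(np.round(y.conj() @ x, 12))                     # ~1

# eigenvalue condition number kappa = 1 / |y^* x| with ||x||_2 = ||y||_2 = 1
x0 = X[:, i] / np.linalg.norm(X[:, i])
y0 = Y[:, i] / np.linalg.norm(Y[:, i])
print(1.0 / abs(y0.conj() @ x0))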

2.1.2 The generalized eigenvalue problem

The problem to find scalars λ and nonzero vectors x that satisfy

    Ax = λEx,

where A, E ∈ C^{n×n}, is called generalized eigenvalue problem. Analogous to the standard eigenvalue problem, the scalar λ is a root of det(A − λE) and is referred to as generalized eigenvalue with corresponding (right) eigenvector x. The pair (λ, x) is now called generalized eigenpair of the matrix pair (A, E) or of the pencil A − λE. A nonzero vector y ∈ C^n that fulfills

    y^*A = λy^*E

is then a left eigenvector of (A, E) or, altogether, (λ, x, y) is a (generalized) eigentriplet of the pair (A, E). As for the standard eigenvalue problem, the set of all eigenvalues of the pair (A, E) is called the spectrum of (A, E) and is denoted by Λ(A, E). For general matrices A, E it is possible to find unitary matrices Q, Z ∈ C^{n×n} which simultaneously triangularize A and E to a generalized Schur decomposition:

    Q^*AZ = T,  Q^*EZ = S.

The matrices S, T are upper triangular and their diagonal entries reveal the eigenvalues of (A, E) by λ_i = t_ii / s_ii if s_ii ≠ 0. If s_ii = 0, there is an eigenvalue λ_i = ∞, and if s_ii = t_ii = 0 for some i, then Λ(A, E) = C. In this last case the pair (A, E) is called singular, and in all former cases it is called regular. If there are n linearly independent right and left eigenvectors x_i and y_i, then the pair (A, E) is called diagonalizable or nondefective. Since in this case it holds that y_i^*Ex_j = 0 for i ≠ j, it is possible to write

    Y^*AX = Λ_A,  Y^*EX = Λ_E,

with Y = [y_1, …, y_n], X = [x_1, …, x_n] ∈ C^{n×n} and diagonal matrices Λ_A and Λ_E. Additionally, if Λ_E is nonsingular it follows that Λ_A Λ_E^{-1} =: Λ = diag(λ_1, …, λ_n). Right and left eigenvectors corresponding to finite eigenvalues of a nondefective pair (A, E) can be scaled so that y_i^*Ax_i = λ_i and y_i^*Ex_j = δ_ij holds. An important special case is A = A^* and E = E^* > 0. Then Λ(A, E) ⊂ R and there exists a nonsingular eigenvector matrix X ∈ C^{n×n} such that X^*AX = diag(λ_1, …, λ_n) and X^*EX = I. Here the columns x_i of X are both right and left eigenvectors corresponding to the eigenvalue λ_i and they are orthogonal with respect to the inner product induced by E or, in other words, bi-E-orthogonal. If A or E is nonsingular, the generalized eigenvalue problem can be transformed into a standard eigenproblem by multiplying with A^{-1} or E^{-1}, respectively. However, in most cases this is not reasonable from a numerical point of view. For arbitrary pairs (A, E) there exist, in analogy to the Jordan decomposition for the standard eigenvalue problem, the Weierstrass and Weierstrass-Schur decompositions.
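A minimal sketch of how generalized eigentriplets can be computed numerically (again with SciPy and a randomly generated regular pencil, purely for illustration) is given below; it verifies the defining relations and the scaling y_i^* E x_i = 1 that reappears later in the dominant pole definitions.

import numpy as np
from scipy import linalg

rng = np.random.default_rng(1)
n = 6
A = rng.standard_normal((n, n))
E = rng.standard_normal((n, n))            # generically nonsingular, so (A, E) is regular

lam, Y, X = linalg.eig(A, E, left=True, right=True)

i = 0
x, y = X[:, i], Y[:, i]
print(np.allclose(A @ x, lam[i] * (E @ x)))                 # A x = lambda E x
print(np.allclose(y.conj() @ A, lam[i] * (y.conj() @ E)))   # y^* A = lambda y^* E

# scale the right eigenvector so that y^* E x = 1 (finite simple eigenvalue assumed)
x = x / (y.conj() @ E @ x)
print(np.round(y.conj() @ E @ x, 12))                       # ~1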

2.1.3 Quadratic and polynomial eigenvalue problems

Another generalization of the standard eigenproblem is the quadratic eigenvalue problem of the form

    (λ²M + λL + K)x = 0,  x ≠ 0,

where M, L, K ∈ C^{n×n}. Eigenvalues and right and left eigenvectors are defined in the same way as for the standard or generalized eigenproblem. The next generalization is the polynomial eigenvalue problem

    (Σ_{i=0}^{p} λ^i A_i) x = 0,  x ≠ 0,

with A_i ∈ C^{n×n}. Polynomial eigenproblems have np eigenvalues and up to np right and left eigenvectors, which implies that, if there are more than n eigenvectors, they are not linearly independent. A usual approach to deal with quadratic and polynomial problems is to reformulate them as an equivalent generalized or standard eigenproblem. For instance, the quadratic eigenproblem can be rewritten as L(λ)z = 0, where

    L(λ) := λ [ M  0 ] + [  L  K ]   and   z := [ λx ]
              [ 0  I ]   [ −I  0 ]              [  x ].

We refer to [54] for more details and a nice collection of examples of quadratic eigenproblems.

2.1.4 The singular value decomposition

For an arbitrary matrix A ∈ C^{m×n} there exists a singular value decomposition (SVD) of the form

    A = UΣV^*,  U ∈ C^{m×m}, V ∈ C^{n×n} unitary, and Σ ∈ R^{m×n}.

We assume without loss of generality m ≥ n. Then

    Σ = [ Σ_1 ],  Σ_1 = diag(σ_1, …, σ_n) ∈ R^{n×n}.
        [  0  ]

For m < n one can simply consider A^*. The diagonal entries σ_i of Σ_1 are called singular values of A and are ordered such that σ_1 ≥ σ_2 ≥ … ≥ σ_r > σ_{r+1} = … = σ_n = 0, where r := rank(A). The columns u and v of the unitary matrices U and V are called left and right singular vectors, respectively, and are scaled so that ‖u‖_2 = ‖v‖_2 = 1. Together with a corresponding singular value they form a singular triplet (σ, u, v) that satisfies Av = σu and A^*u = σv. The square roots of the nonzero eigenvalues of A^*A and AA^* are the nonzero singular values of A. The corresponding eigenvectors of A^*A (AA^*) are the right (left) singular vectors of A. Another connection between singular values and eigenvalues is given by the augmented matrix [13, Section 8.6]

    M := [ 0   A^* ] ∈ C^{(m+n)×(m+n)}.
         [ A    0  ]

The absolute values of the eigenvalues of M are the singular values of A, and the eigenvectors of M can be decomposed into an upper and a lower part. The upper part corresponds to the right and the lower part to the left singular vectors.
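To make the linearization step from Subsection 2.1.3 concrete, here is a small Python sketch (NumPy/SciPy, with random test matrices chosen only for illustration) that builds the companion-type pencil above and checks that an eigenvalue of the linearized generalized problem indeed satisfies the original quadratic eigenproblem.

import numpy as np
from scipy import linalg

rng = np.random.default_rng(2)
n = 5
M = np.eye(n) + 0.1 * rng.standard_normal((n, n))
L = rng.standard_normal((n, n))
K = rng.standard_normal((n, n))

# linearization L(lam) z = (lam*A1 + A0) z = 0 with z = [lam*x; x]
A1 = np.block([[M, np.zeros((n, n))], [np.zeros((n, n)), np.eye(n)]])
A0 = np.block([[L, K], [-np.eye(n), np.zeros((n, n))]])

# equivalent generalized eigenproblem: (-A0) z = lam * A1 * z
lam, Z = linalg.eig(-A0, A1)

i = 0
x = Z[n:, i]                                    # lower part of z is the eigenvector x
residual = (lam[i] ** 2 * M + lam[i] * L + K) @ x
print(np.linalg.norm(residual))                 # ~0: (lam^2 M + lam L + K) x = 0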

2.2 Methods for eigenvalue problems

In this section we give a brief overview of some of the methods for eigenvalue problems. For more detailed descriptions we refer to [3, 13, 53, 56]. Methods for eigenvalue problems are usually distinguished between full space methods for dense matrices of moderate size and iterative subspace methods for very large and sparse matrices. Full space methods compute the complete set of eigenvalues and, if necessary, the eigenvectors or invariant subspaces, too. Although these methods are sometimes referred to as direct methods, they are also of an iterative nature. For the standard eigenproblem with A ∈ C^{n×n} the QR method can be used to compute a Schur decomposition. If the matrix is symmetric, there exist, among others, the symmetric QR method and Jacobi methods. Symmetric tridiagonal eigenproblems can be solved efficiently with divide and conquer methods. The QZ method computes a generalized Schur decomposition of a pair (A, E). Since full space methods usually transform the original matrices to diagonal or triangular form by applying transformation matrices, they have a complexity of O(n³) and therefore have a limited range of applications. For large-scale sparse matrices, which play an essential role in scientific computations, iterative subspace methods come into the picture, which normally focus on the computation of only a fraction of the spectrum and, if necessary, the corresponding eigenvectors. A matrix A ∈ C^{n×n} is called sparse if the number of nonzero elements is only of order O(n). Iterative subspace methods work only with matrix-vector products on the original matrix A, which are inexpensive if A is sparse, and hence they can theoretically be applied to large sparse matrices of unlimited size. In such methods, the eigenproblem is usually projected onto a lower dimensional subspace. The projected eigenproblem is then of small or moderate size and can be solved with the full space methods mentioned above. The projection is usually carried out by imposing a certain Galerkin condition on the approximate eigenpair (θ, v). Its most basic form is given by the Ritz-Galerkin projection. There the eigenvector approximations are represented by v := V_k x ∈ C^n with a nonzero coefficient vector x ∈ C^k and a matrix V_k ∈ C^{n×k} whose columns are orthonormal and span the k-dimensional search space V_k ⊂ C^n, in practice with k ≪ n. The corresponding residual of an approximate eigenpair (θ, v) is then required to be orthogonal to the whole subspace V_k:

    r := Av − θv ⟂ V_k,

or equivalently,

    V_k^* A V_k x = θx.

Clearly, (θ, x) is an eigenpair of the reduced matrix M_k := V_k^* A V_k ∈ C^{k×k} and can be computed efficiently by standard methods for small eigenproblems. The eigenvector x is lifted up to the n-dimensional space by v := V_k x, and the resulting pair (θ, v) is called a Ritz pair of A with respect to the subspace V_k.
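The following short Python sketch (with a random matrix and a random orthonormal basis, both purely illustrative) carries out exactly this Rayleigh-Ritz extraction and verifies the Galerkin condition V_k^* r = 0 for the computed Ritz pair.

import numpy as np

rng = np.random.default_rng(3)
n, k = 200, 10
A = rng.standard_normal((n, n))

# orthonormal basis V_k of a k-dimensional search space (here simply a random subspace)
V, _ = np.linalg.qr(rng.standard_normal((n, k)))

M = V.conj().T @ A @ V                  # reduced matrix M_k = V_k^* A V_k
theta, S = np.linalg.eig(M)             # eigenpairs of the small reduced problem

j = 0
v = V @ S[:, j]                         # lift the coefficient vector: Ritz vector v = V_k x
r = A @ v - theta[j] * v                # residual of the Ritz pair (theta, v)
print(np.linalg.norm(V.conj().T @ r))   # ~0: the residual is orthogonal to the search space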

Afterwards, the subspace V_k is expanded orthogonally by a new basis vector which is derived from v, and the whole process is repeated with this (k+1)-dimensional subspace. In this thesis, we will work mainly with Petrov-Galerkin projections where a second subspace X_k is used such that

    r := AV_k x − θV_k x ⟂ X_k

holds. Note that X_k is often called the test subspace in this context. The eigenpairs (θ, x) of the reduced eigenproblem

    X_k^* A V_k x = θ X_k^* V_k x

lead to Petrov pairs (θ, v := V_k x) with respect to the search subspace V_k and the test subspace X_k. Note that, depending on the choice of X_k, the reduced eigenproblem can in general be a generalized one, even if the original problem is not. One choice is to derive the basis vectors of X_k from the approximate left eigenvectors of A. This Petrov-Galerkin projection can easily be extended to generalized eigenvalue problems and to a two-sided Petrov-Galerkin projection if the left eigenvectors are sought as well. This two-sided approach will be the common feature of the eigenvalue methods to be investigated in Chapter 4. Important and well known iterative methods working with the Ritz-Galerkin projection are the Krylov subspace methods, where V_k is constructed to be a Krylov subspace

    V_k = K(A, q_1, k) := span{q_1, Aq_1, …, A^{k−1}q_1}

for some initial vector q_1 ∈ C^n. Two prominent methods using this framework are the Lanczos method for Hermitian and the Arnoldi method [13, Ch. 9] for general square matrices. The Lanczos method produces a matrix V_k ∈ C^{n×k} with orthonormal columns that span a k-dimensional Krylov subspace such that the transformed matrix T_k := V_k^* A V_k ∈ C^{k×k} is tridiagonal and its eigenvalues are good approximations of eigenvalues of A. Similarly, the Arnoldi method produces a matrix V_k such that H_k := V_k^* A V_k is upper Hessenberg. For both methods there exist generalizations for the generalized eigenvalue problem and for the computation of the left eigenvectors which include a one- or two-sided Petrov-Galerkin type projection [56]. In this thesis we focus on another important class of subspace methods, namely the Jacobi-Davidson methods. Initially proposed by G. L. G. Sleijpen and H. A. van der Vorst [48] for the standard linear eigenvalue problem, the Jacobi-Davidson method has been improved and generalized in several ways to handle various problems, for instance generalized eigenvalue problems [7, 12, 37], quadratic and polynomial eigenvalue problems [7, 21, 49], singular value problems [18], and even nonlinear eigenvalue problems [6, 45, 46]. Although we investigate these methods in greater detail in the remaining chapters, we give here a rough description of their functioning.

The original basic Jacobi-Davidson method [48] uses two principles. At first, the so-called Davidson principle [10], where a Ritz-Galerkin type projection is applied as described previously. But now the constructed subspace V_k is in general not a Krylov subspace. Let V_k be the matrix with the orthonormal basis vectors of V_k as columns. The reduced eigenvalue problem is then given by

    V_k^* A V_k x = θx

with the reduced matrix M_k := V_k^* A V_k ∈ C^{k×k} which has, unlike in the Lanczos or Arnoldi method, no special tridiagonal or Hessenberg form. For a Ritz pair (θ, v := V_k x) of A with respect to V_k obtained from this small eigenvalue problem, the second principle is used to find a correction t for the Ritz vector v. In the Jacobi-Davidson method this is done by the Jacobi style correction [22], which finds a correction t ∈ C^n orthogonal to the Ritz vector v. This can be done by applying a Newton scheme to the function F_w : C^{n+1} → C^{n+1}, see [34, 47]:

    F_w(λ, x) = [ Ax − λx  ]
                [ w^*x − 1 ].

The vector w ∈ C^n is used to induce a suitable scaling of the vector x. Clearly, an exact eigenpair (λ, x) with w^*x = 1 is a root of F_w and satisfies F_w(λ, x) = 0. We now look for a better approximation (θ^+, v^+) = (θ + μ, v + t) of the previously generated Ritz pair, where the improvement t is sought in the orthogonal complement v^⟂ := {z ∈ C^n : v^*z = 0} of v. After some manipulations this yields the linear system of equations [48]

    (I − vv^*)(A − θI)(I − vv^*) t = −r,

which is called the Jacobi-Davidson correction equation. The subspace V_k is then expanded orthogonally, e.g. using Gram-Schmidt orthogonalization, with t as new basis vector v_{k+1}. The whole process is then repeated with the enlarged search space until the norm of the residual ‖r‖_2 is smaller than a given tolerance. If the correction equation is solved exactly, the Jacobi-Davidson method is an exact Newton method [11, 47]. However, in practice it is often sufficient to solve the correction equation only up to a moderate accuracy by applying a small number of steps of an iterative method for linear systems, such as CG, BiCG or GMRES [43]. In [58] it is shown that the error induced by this inexact solution of the correction equation is in some sense minimized due to the projection onto the orthogonal complement of the previous eigenvector approximation. In Sections 4.2 and 4.3 we investigate the two-sided and the alternating Jacobi-Davidson methods [20, 50] which compute a number of approximate eigentriplets (λ, x, y) of a matrix pair (A, E). The two-sided Jacobi-Davidson method uses a two-sided Petrov-Galerkin projection to compute approximations of right and left eigenvectors simultaneously, while the alternating Jacobi-Davidson method applies a Ritz-Galerkin projection alternately to (A, E) and (A^*, E^*) to produce approximations of right and left eigenvectors in each even and odd iteration, respectively.
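To illustrate the interplay of subspace expansion and the correction equation, the following bare-bones Python sketch implements the basic one-sided Jacobi-Davidson iteration for a standard eigenproblem (not the two-sided variants developed later in this thesis). All data are random and only illustrative; moreover, the correction equation is solved exactly via the closed-form solution t = ε(A − θI)^{-1}v − v with ε = 1/(v^*(A − θI)^{-1}v), whereas in practice one would use a few steps of a preconditioned iterative solver such as GMRES.

import numpy as np

rng = np.random.default_rng(4)
n = 100
A = rng.standard_normal((n, n))
I = np.eye(n)

v0 = rng.standard_normal(n)
V = (v0 / np.linalg.norm(v0)).reshape(n, 1)            # initial 1-dimensional search space

for it in range(25):
    theta_all, S = np.linalg.eig(V.conj().T @ A @ V)   # reduced eigenproblem
    j = np.argmax(np.abs(theta_all))                   # target: largest |theta| (illustration)
    theta = theta_all[j]
    v = V @ S[:, j]
    v = v / np.linalg.norm(v)                          # Ritz vector
    r = A @ v - theta * v                              # residual
    if np.linalg.norm(r) < 1e-10:
        break
    # exact solution of (I - vv^*)(A - theta I)(I - vv^*) t = -r with t orthogonal to v
    z = np.linalg.solve(A - theta * I, v)
    t = z / (v.conj() @ z) - v
    # orthogonal expansion of the search space (Gram-Schmidt against V)
    t = t - V @ (V.conj().T @ t)
    V = np.hstack([V, (t / np.linalg.norm(t)).reshape(n, 1)])

print(it, theta, np.linalg.norm(A @ v - theta * v))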

2.3 Systems and control theory

This section covers some fundamentals of systems and control theory that are necessary for the remainder of this thesis. Very good and more detailed introductions can be found, for instance, in [2, 4, 9, 23].

2.3.1 Linear time invariant state-space systems

A linear time invariant (LTI) state-space system is of the form

    ẋ(t) = Ax(t) + Bu(t),  x(t_0) = x_0,
    y(t) = Cx(t) + Du(t),                                   (2.1)

with a state-space matrix A ∈ R^{n×n}, input and output maps B ∈ R^{n×m} and C ∈ R^{p×n}, and a direct transmission map D ∈ R^{p×m}. Furthermore, x(t) ∈ R^n is called the state vector, u(t) ∈ R^m is called the input or control vector, and y(t) ∈ R^p is referred to as the output vector. Instead of the form (2.1), we will often denote such systems by tuples (A, B, C, D). If the direct transmission map satisfies D = 0, we remove D from the tuple and denote the system only by (A, B, C). The ordinary differential equation of (2.1) is called the state equation and the algebraic equation below it is called the output equation. The order of such a system is defined by the dimension n of the state-space matrix A. If m, p > 1, then (2.1) is called a multi-input multi-output (MIMO) system; if m = p = 1, it is a single-input single-output (SISO) system. In the sequel we assume without loss of generality that t_0 = 0 and x_0 = 0. SISO systems are often written in the form

    ẋ(t) = Ax(t) + bu(t),  x_0 = 0,                         (2.2)
    y(t) = cx(t) + du(t)

with A, x(t) as in (2.1), input and output vectors b, c ∈ R^n, d ∈ R, and u(t), y(t) ∈ R. Dynamical systems of the form (2.1) and (2.2) are used to characterize the dynamics of certain physical and technical models. However, in realistic models the relations between x(t), u(t) and y(t) can be nonlinear. These nonlinear models can under certain conditions be linearized to get LTI systems. Other physical phenomena, for instance the heat transfer within a material, are modeled by instationary partial differential equations. A discretization of the space variables using finite elements or finite differences then leads to ordinary differential equations which can, if necessary, again be linearized to obtain LTI systems. See [9, Section 5.2] for a nice collection of examples of these concepts. Of great importance in systems and control theory is the transfer function of the system (2.1). The transfer function can be obtained by applying the Laplace transform

    L{f}(s) := ∫_0^∞ e^{−st} f(t) dt

to the state and output equations. This yields

    sX(s) = AX(s) + BU(s),
    Y(s) = CX(s) + DU(s),

where X(s), U(s) and Y(s) are the Laplace transforms of x(t), u(t) and y(t), respectively. The upper equation can be rearranged to X(s) = (sI − A)^{-1}BU(s) and inserted into the output equation such that Y(s) = (C(sI − A)^{-1}B + D)U(s). The term in the brackets is called the transfer function H : C → C^{p×m} of (2.1) and is defined by

    H(s) = C(sI − A)^{-1}B + D.                             (2.3)

It relates inputs to outputs in the frequency domain by the relation Y(s) = H(s)U(s). The poles of (2.3) are a subset of Λ(A) and play an essential role for model order reduction based on modal approximation. The transfer function H(s) of a SISO system (2.2) is often illustrated via the Bode magnitude plot, which is a logarithmic plot of |H(iω)| versus frequencies ω ∈ R^+. Usually, the magnitude |H(s)| for s ∈ C is expressed in decibels via the gain

    G(s) := 20 log_10(|H(s)|).                              (2.4)

With this logarithmic scaling the Bode magnitude plot shows the graph of (ω, G(iω)). Another useful illustration are 3-D Bode plots [55]. In the MIMO case H(s) is a matrix-valued function and an appropriate illustration are so-called sigma plots, which depict the largest and smallest singular values σ_max and σ_min of H(iω) versus frequencies ω ∈ R^+, where a logarithmic scaling similar to (2.4) can be used as well. In Figure 2.1 we illustrate the Bode magnitude and sigma plots of the CD player example [8] of order n = 120 (available at http://www.slicot.org/index.php?site=examples). The SISO system (A, b_2, c_1) in the Bode magnitude plot of Figure 2.1a is extracted from the full MIMO system by taking only the second column b_2 of B and the first row c_1 of C. Figure 2.1b shows the sigma plot of the full MIMO system (A, B, C, D = 0) with p = m = 2. Observe the peaks in both plots, which are caused by the dominant poles of H(s), which usually form a small subset of Λ(A). These dominant poles are defined and described in more detail in Subsection 2.4.2 and can be considered as the backbone of our modal truncation approach.
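As a small illustration of (2.3) and (2.4), the following Python sketch evaluates the gain G(iω) of a randomly generated, artificially stabilized SISO system over a frequency grid; the system and all parameter values are made up for demonstration and are not one of the benchmark models used in this thesis.

import numpy as np

rng = np.random.default_rng(5)
n = 50
A = rng.standard_normal((n, n))
A -= (np.max(np.linalg.eigvals(A).real) + 1.0) * np.eye(n)   # shift spectrum into C^- (stable)
b = rng.standard_normal((n, 1))
c = rng.standard_normal((1, n))

def gain_db(s):
    """G(s) = 20*log10(|H(s)|) with H(s) = c (sI - A)^{-1} b (SISO, d = 0)."""
    H = c @ np.linalg.solve(s * np.eye(n) - A, b)
    return 20.0 * np.log10(abs(H[0, 0]))

omega = np.logspace(-1, 3, 200)                  # frequency grid on the imaginary axis
G = np.array([gain_db(1j * w) for w in omega])   # data for a Bode magnitude plot
print(G.min(), G.max())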

[Figure 2.1: (a) Bode plot of the transfer function of the CD player [8] SISO system of order n = 120 in a double logarithmic plot. (b) Sigma plot of the full 2×2 MIMO system.]

An LTI system is called asymptotically stable if lim_{t→∞} x(t) = 0. The solution of the system (2.1) is given by

    x(t) = e^{At} x_0 + ∫_0^t e^{A(t−τ)} B u(τ) dτ,
    y(t) = Cx(t) + Du(t).                                   (2.5)

Hence, an LTI system is asymptotically stable if Λ(A) ⊂ C^−, that is, all eigenvalues of A have negative real parts. Matrices A with this property are also called Hurwitz. Another important system theoretic property is passivity, which means that the system generates no energy and absorbs energy only from sources that are used to excite it. Passivity can be investigated with the transfer function H(s). A system is passive if and only if H(s) is positive real. Positive realness is given when H(s) is analytic for all s ∈ C^+, H(s̄) = H̄(s) for all s ∈ C, and H(s) + H^*(s) ≥ 0 (in the SISO case, Re(H(s)) ≥ 0) for all s ∈ C^+. A system is called controllable if every state x(t) can be reached via an appropriate control u(t) from the initial state x_0 = 0 for any t > 0. From the solution (2.5) of the differential equation it follows that the reachable states are spanned by the columns of e^{At}B, and the Cayley-Hamilton theorem implies that controllability is equivalent to the condition that the controllability matrix

    C(A, B) = [B, AB, …, A^{n−1}B]

has rank n. A dual property of controllability is observability. A system is observable if the initial state x_0 is uniquely determined by the input and the output. Equivalently, with u(t) ≡ 0, the output y(t) ≡ 0 implies that x_0 = 0. Another equivalent condition for observability is that the observability matrix

    O(A, C) = [C^*, A^*C^*, …, (A^{n−1})^*C^*]

has full rank n. If a system is both controllable and observable, it is called minimal.
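The rank conditions above are easy to check numerically for small systems. The sketch below (random matrices, purely illustrative) forms the controllability and observability matrices and tests minimality; for large-scale systems one would of course never build these matrices explicitly.

import numpy as np

rng = np.random.default_rng(6)
n, m, p = 5, 2, 1
A = rng.standard_normal((n, n))
B = rng.standard_normal((n, m))
C = rng.standard_normal((p, n))

# controllability matrix C(A, B) = [B, AB, ..., A^{n-1}B]
ctrb = np.hstack([np.linalg.matrix_power(A, i) @ B for i in range(n)])
# observability matrix, here stacked row-wise as [C; CA; ...; CA^{n-1}]
obsv = np.vstack([C @ np.linalg.matrix_power(A, i) for i in range(n)])

controllable = np.linalg.matrix_rank(ctrb) == n
observable = np.linalg.matrix_rank(obsv) == n
print(controllable, observable, controllable and observable)   # minimal if both hold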

2.3.2 Linear descriptor systems

A modification of (2.1) is the class of linear time invariant descriptor systems

    Eẋ(t) = Ax(t) + Bu(t),  x(0) = x_0,
    y(t) = Cx(t) + Du(t)                                    (2.6)

with an additional, possibly singular descriptor matrix E ∈ R^{n×n}. The vector x(t) ∈ R^n is in this case called the descriptor or generalized state vector. We will also denote descriptor systems by tuples (E, A, B, C, D). If E is indeed singular, then the first equation in (2.6) is a differential-algebraic equation (DAE). DAEs have their own special properties which distinguish them from the ordinary differential equations of the standard state-space systems. See [24] for detailed information. The transfer function of (2.6) is, if the matrix pair (A, E) is regular, given by

    H(s) = C(sE − A)^{-1}B + D

and its poles form a subset of Λ(A, E). If E is singular (DAE case), Λ(A, E) contains eigenvalues at infinity. A descriptor system is called asymptotically stable if the finite eigenvalues of the pair (A, E) have negative real parts. Controllability and observability can be generalized for descriptor systems, too. Most of the systems we consider in this thesis will be in descriptor form.

2.3.3 Second-order systems

Another generalization of the standard LTI systems are second-order linear time invariant dynamical systems of the form

    Mẍ(t) + Lẋ(t) + Kx(t) = Bu(t),
    y(t) = Cx(t) + Du(t)                                    (2.7)

with three system matrices M, L, K ∈ R^{n×n}. All other matrices and vectors have the same size and meaning as in the standard and descriptor case. Since second-order systems often arise in structural system analysis, M is called the mass matrix, L is the damping matrix, and K is referred to as the stiffness matrix. The transfer function of (2.7) is defined as

    H(s) = C(s²M + sL + K)^{-1}B + D.                       (2.8)

Obviously, the poles of the transfer function (2.8) are a subset of the eigenvalues λ_i ∈ C of the quadratic eigenproblem (λ²M + λL + K)x = 0, x ≠ 0. Second-order systems can be transformed to first-order descriptor systems by applying the same linearization technique as for the quadratic eigenvalue problem. Assuming a nonsingular K, we can define matrices

    A := [  0    K ],   E := [ K  0 ] ∈ R^{2n×2n},
         [ −K   −L ]         [ 0  M ]

B_l = [0, B^T]^T ∈ R^{2n×m} and C_l = [C, 0] ∈ R^{p×2n}, and a vector z = [x^T, ẋ^T]^T ∈ R^{2n} such that the corresponding linear system is given by

    Eż(t) = Az(t) + B_l u(t),  z(t_0) = z_0,
    y(t) = C_l z(t) + Du(t).

Note that, similar to quadratic eigenvalue problems, other linearizations are possible [54].
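The following Python sketch (with a small, artificially chosen mass-spring-damper type model; all matrices are illustrative) builds this first-order pencil (A, E), with the sign placement as reconstructed above, and checks that its eigenvalues together with the upper parts of its eigenvectors solve the original quadratic eigenproblem.

import numpy as np

rng = np.random.default_rng(7)
n = 4
M = np.eye(n)                                      # mass matrix
L = 0.1 * np.eye(n)                                # light damping
K = np.diag(np.arange(1.0, n + 1.0))               # nonsingular stiffness matrix

# first-order descriptor form with z = [x; xdot]
A = np.block([[np.zeros((n, n)), K], [-K, -L]])
E = np.block([[K, np.zeros((n, n))], [np.zeros((n, n)), M]])

# E is nonsingular here, so the generalized problem can be reduced to a standard one
lam, Z = np.linalg.eig(np.linalg.solve(E, A))

i = 0
x = Z[:n, i]                                       # upper part of z is the eigenvector x
print(np.linalg.norm((lam[i] ** 2 * M + lam[i] * L + K) @ x))   # ~0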

2.4 Model order reduction

2.4.1 The common principle of model order reduction

The goal of model order reduction is to reduce the order of a given dynamical system, for instance in order to allow a simulation with a reduced computational effort. For a descriptor system (2.6) this can be expressed as finding a reduced order model of order k ≪ n,

    Ẽ (d/dt)x̃ = Ã x̃ + B̃ u,
    ỹ = C̃ x̃ + D̃ u,

with Ã, Ẽ ∈ R^{k×k}, B̃ ∈ R^{k×m}, C̃ ∈ R^{p×k}, D̃ ∈ R^{p×m} and x̃ = x̃(t) ∈ R^k, u = u(t) ∈ R^m, ỹ = ỹ(t) ∈ R^p. There are mainly three prominent methods of model order reduction of linear time invariant systems: Krylov subspace methods, balanced truncation, and modal approximation. See e.g. [2, 5, 14, 28] for more details on these approaches. The common principle of all those methods can again be considered as a Petrov-Galerkin type projection. For the state equation of the original system (2.6) one could write

    Eẋ − Ax − Bu ⟂ C^n  (i.e., = 0).

Let X_k, Y_k be two k-dimensional subspaces of C^n spanned by the basis vectors x_1, …, x_k and y_1, …, y_k, respectively, and let X_k, Y_k ∈ C^{n×k} be the corresponding basis matrices with the basis vectors as columns. The space X_k is called the search space and Y_k is the test space. In a Petrov-Galerkin projection we use an oblique projection of the system onto X_k along Y_k. A representation of the state vector x in X_k is then given by x = X_k x̃ with x̃ ∈ C^k. The associated Petrov-Galerkin condition is

    EX_k (d/dt)x̃ − AX_k x̃ − Bu ⟂ Y_k,                      (2.9)

which is equivalent to

    Y_k^* E X_k (d/dt)x̃ − Y_k^* A X_k x̃ − Y_k^* B u = 0.

Together with the corresponding projected output equation this leads to

    Y_k^* E X_k (d/dt)x̃ = Y_k^* A X_k x̃ + Y_k^* B u,
    ỹ = C X_k x̃ + D u.

Obviously, this is a reduced order model

    (Ẽ, Ã, B̃, C̃, D̃) = (Y_k^* E X_k, Y_k^* A X_k, Y_k^* B, C X_k, D)

of order k. The question arises how to choose the subspaces X_k, Y_k adequately, such that the reduced order model yields a good approximation of the dynamics of the original model. The availability of an a-priori error bound is also an often requested goal. We are now going to review how the spaces X_k and Y_k can be chosen for modal approximation, since this is the model order reduction technique of interest in this thesis.
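A compact Python sketch of this projection step (with random system matrices and, for simplicity, random search and test bases; all sizes and data are illustrative) shows how the reduced matrices are formed. With such arbitrary subspaces the reduced model is of course not expected to be accurate; modal approximation replaces them by eigenspaces of (A, E).

import numpy as np

rng = np.random.default_rng(8)
n, k, m, p = 100, 8, 2, 2
E = np.eye(n)
A = rng.standard_normal((n, n)) - 3.0 * np.sqrt(n) * np.eye(n)   # crude stabilization
B = rng.standard_normal((n, m))
C = rng.standard_normal((p, n))
D = np.zeros((p, m))

Xk = np.linalg.qr(rng.standard_normal((n, k)))[0]    # basis of the search space
Yk = np.linalg.qr(rng.standard_normal((n, k)))[0]    # basis of the test space

# Petrov-Galerkin projection: reduced order model of order k
Et, At = Yk.conj().T @ E @ Xk, Yk.conj().T @ A @ Xk
Bt, Ct = Yk.conj().T @ B, C @ Xk

# evaluate both transfer functions H(s) = C (sE - A)^{-1} B + D at one frequency
s = 2.0j
H_full = C @ np.linalg.solve(s * E - A, B) + D
H_red = Ct @ np.linalg.solve(s * Et - At, Bt) + D
print(np.linalg.norm(H_full - H_red))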

2.4.2 Modal approximation

For modal approximation one chooses the subspaces X_k, Y_k of the Petrov-Galerkin projection (2.9) as the right and left eigenspaces corresponding to a set of k eigenvalues of the pair (A, E). For this purpose, we consider the eigenvalue decomposition of a nondefective pair (A, E):

    Y^*AX = Λ = diag(λ_1, …, λ_n),  Y^*EX = I.              (2.10)

The matrices X, Y ∈ C^{n×n} contain the right and left eigenvectors x_i and y_i corresponding to the eigenvalues λ_i (i = 1, …, n), that is, Ax_i = λ_i Ex_i and y_i^*A = λ_i y_i^*E. Now suppose the eigenvalue decomposition can be partitioned as

    [Y_1, Y_2]^* A [X_1, X_2] = [ Λ_1   0  ],   [Y_1, Y_2]^* E [X_1, X_2] = I,
                                [  0   Λ_2 ]

where Λ_1 ∈ C^{k×k} contains the k eigenvalues of interest and the other block matrices are of appropriate size. The reduced order model is then obtained via

    (Ẽ, Ã, B̃, C̃, D̃) := (Y_1^* E X_1 = I_k, Y_1^* A X_1 = Λ_1, Y_1^* B, C X_1, D),

and the subspaces X_k, Y_k are trivially given by span(X_1), span(Y_1). The natural question that arises is: which subset of Λ(A, E) should be selected in order to obtain a good approximation of the system's behavior in the reduced order model? For the answer we start, for a better illustration, with the SISO descriptor case and investigate the transfer function (2.3)

    H : C → C,  H(s) = c(sE − A)^{-1}b + d,                 (2.11)

which is a scalar rational function whose poles form a subset of Λ(A, E).

Remark 2.1: Since the matrix pairs (A, E) of descriptor systems are in most cases real, we consider only this case in this thesis and refer to a pole of (2.11) as either an eigentriplet (λ, x, y) if λ ∈ R or to a pair of complex conjugate eigentriplets if λ ∈ C.

Since the absolute value |H(s)| of the transfer function H(s) maps complex numbers s = Re(s) + i Im(s) to real numbers z := |H(s)| ∈ R, we can write it as a bivariate function z = H(x, y) for (x := Re(s), y := Im(s)) ∈ R × R. A very interesting and revealing illustration for our purpose is a 3-D Bode plot [55], which is a surface plot of the gain (2.4) against s = x + iy ∈ C. Figure 2.2 shows the 3-D Bode plot of H(s) of the New England test system [26] (available at http://sites.google.com/site/rommes/software) in a region in the left half plane. The poles of H(s) (eigenvalues of A) are marked as black dots in the Re(s)-Im(s)-plane. Note that the cut-section of the surface with the Im(s)-z-plane corresponds to the Bode plot of H(s) and is therefore emphasized as a thick black curve. Observe that the function values grow towards infinity as s approaches an eigenvalue λ of (A, E). However, the poles marked as green dots elevate the function values in a stronger way and cause peaks in the Bode plot. In the following, these poles will be referred to as dominant poles.

[Figure 2.2: 3-D Bode plot of H(s), eigenvalues and dominant poles in the region [−2, 0] × i[0, 20] ⊂ C of the New England test system [26] of order n = 66.]

To investigate which specific poles cause these peaks we rewrite H(s) as a sum of residues R_j ∈ C over the r ≤ n finite poles [23]:

    H(s) = Σ_{j=1}^{r} R_j/(s − λ_j) + R_∞ + d,  R_j := (c x_j)(y_j^* b),   (2.12)

where R_∞ is the constant contribution of the infinite eigenvalues and is often zero. This expression can be obtained by inserting the eigenvalue decomposition (2.10) into (2.11) [23] or by rewriting H(s) as a partial fraction expansion [1]. Since |H(s)| grows towards infinity as s approaches λ ∈ Λ(A, E), the peaks in the Bode plot occur at frequencies ω ∈ R^+ which are close to the imaginary parts of certain eigenvalues λ_j. To clarify which specific eigenvalues cause this behavior, let λ = λ_n = α + iβ ∈ Λ(A, E), let R = R_n be the corresponding residue, assume R_∞ = 0, d = 0, and consider the limit of H(iω) for ω towards Im(λ) = β:

    lim_{ω→β} H(iω) = lim_{ω→β} Σ_{j=1}^{n} R_j/(iω − λ_j)
                    = lim_{ω→β} [ R_n/(iω − (α + iβ)) + Σ_{j=1}^{n−1} R_j/(iω − λ_j) ]
                    = −R/α + H_{n−1}(iβ).

We see that if ω is close to Im(λ) and |R|/|Re(λ)| is large, then |H(iω)| in the Bode magnitude plot is large as well, which establishes a first criterion of modal dominance. Note that due to the scaling by Re(λ), dominant poles are usually positioned close to the imaginary axis, as can be observed in Figure 2.2.
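For small dense test problems the residues in (2.12) and the resulting dominance ranking can be computed directly from a full eigendecomposition. The sketch below (NumPy/SciPy, with a random stable SISO descriptor system that is purely illustrative and not one of the thesis benchmarks) scales each eigentriplet so that y_j^* E x_j = 1 and then sorts the poles by the dominance measure |R_j|/|Re(λ_j)| motivated above; for truly large-scale problems this is exactly what the dominant pole algorithms of the later chapters avoid.

import numpy as np
from scipy import linalg

rng = np.random.default_rng(9)
n = 30
E = np.eye(n)
A = rng.standard_normal((n, n)) - 2.0 * np.sqrt(n) * np.eye(n)   # stable test pencil
b = rng.standard_normal(n)
c = rng.standard_normal(n)

lam, Y, X = linalg.eig(A, E, left=True, right=True)

# residues R_j = (c x_j)(y_j^* b) with the scaling y_j^* E x_j = 1
R = np.empty(n, dtype=complex)
for j in range(n):
    xj = X[:, j] / (Y[:, j].conj() @ E @ X[:, j])
    R[j] = (c @ xj) * (Y[:, j].conj() @ b)

dominance = np.abs(R) / np.abs(lam.real)        # |R_j| / |Re(lambda_j)|
for j in np.argsort(-dominance)[:5]:            # the five most dominant poles
    print(np.round(lam[j], 4), float(np.round(dominance[j], 6)))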

If this scaling is omitted, the residue magnitude |R| alone can also be used as an indicator for modal dominance. In some applications one is interested in the poles closest to zero, which can be emphasized by a modal dominance indicator of the form |R|/|λ|. This quantity can be derived in a similar way as above by considering the limit of H(s) for s towards zero. Altogether we get the following three definitions of modal dominance.

Definition 2.2: Let λ_i ∈ Λ(A, E) be a pole of the transfer function H(s) of a SISO system (E, A, b, c, d) with corresponding left and right eigenvectors y_i and x_i which are scaled so that y_i^*Ex_i = 1. Then λ_i is called a dominant pole if

    |R_i| > |R_j|,                                          (2.13)
    |R_i|/|Re(λ_i)| > |R_j|/|Re(λ_j)|,                      (2.14)
or
    |R_i|/|λ_i| > |R_j|/|λ_j|                               (2.15)

holds for all j ≠ i.

For the more general MIMO systems, the residue representation of H(s) is

    H(s) = Σ_{i=1}^{r} R_i/(s − λ_i) + R_∞ + D,  R_i := (Cx_i)(y_i^*B) ∈ C^{p×m}.   (2.16)

In this case peaks occur in the sigma plot of H(s). By using the spectral norm ‖·‖_2 this leads to similar definitions of modal dominance for MIMO systems.

Definition 2.3: Let (λ_i, x_i, y_i) be an eigentriplet of (A, E) with y_i^*Ex_i = 1. Then the pole λ_i of the transfer function H(s) of a MIMO system (E, A, B, C, D) is called a dominant pole if

    ‖R_i‖_2 > ‖R_j‖_2,                                      (2.17)
    ‖R_i‖_2/|Re(λ_i)| > ‖R_j‖_2/|Re(λ_j)|,                  (2.18)
or
    ‖R_i‖_2/|λ_i| > ‖R_j‖_2/|λ_j|                           (2.19)

holds for all j ≠ i. Note that ‖R‖_2 = σ_max(R).

Note that none of the introduced dominance criteria depends on the direct transmission map d or D. For both SISO and MIMO systems several other dominance measures are possible [1, 57]. In Figure 2.3a, 6 dominant poles with respect to the dominance definition (2.15) of the New England test system are plotted together with Λ(A) in a region of the Re(λ)-Im(λ)-plane. Note that this plot is essentially the same