Preconditioned multishift BiCG for H_2-optimal model reduction
Mian Ilyas Ahmad, Daniel B. Szyld, and Martin B. van Gijzen
Report 12-06-15, Department of Mathematics, Temple University, June 2012. Revised March 2013.
This report is available at http://www.math.temple.edu/~szyld

PRECONDITIONED MULTISHIFT BICG FOR H_2-OPTIMAL MODEL REDUCTION

MIAN ILYAS AHMAD, DANIEL B. SZYLD, AND MARTIN B. VAN GIJZEN

Abstract. Modern methods for H_2-optimal model order reduction include the Iterative Rational Krylov Algorithm (IRKA, [Gugercin, Antoulas, and Beattie, 2008]) and Parameterized Model Reduction (PMOR, [Baur, Beattie, Benner, and Gugercin, 2011]). In every IRKA or PMOR iteration, two sequences of shifted linear systems need to be solved, one for the shifted matrices and one for their adjoints, using the same sequence of shifts. In this paper we propose a computationally efficient way of solving both sequences of problems together using the bi-conjugate gradient method (BiCG). The idea is to construct in advance bases for the two Krylov subspaces (one for the matrix and one for its adjoint), suitably preconditioned. These bases are then reused inside the parameterized model reduction methods for the other shifts, without the need for additional matrix-vector products. The performance of our proposed implementation is illustrated through numerical experiments.

Key words. Model order reduction, IRKA, PMOR, shifted linear systems, preconditioning, BiCG

AMS subject classifications. 65F10, 65F50, 41A05

1. Introduction. Numerical simulations are often based on large-scale complex models that are used to measure and control the behavior of some output parameters with respect to a given set of inputs. The idea of model reduction is to approximate this input-output behavior by a much simpler model that can predict the actual behavior. Model reduction is needed in many applications in various areas; for example, in electronics it is used to predict the behavior of complicated interconnect systems [9]; see [1] for an overview of model reduction for large-scale dynamical systems.

Consider a single-input/single-output (SISO) linear system represented in state-space form by

    ẋ(t) = A x(t) + b u(t),   y(t) = c^T x(t),   (1.1)

where A ∈ R^{n×n}, b, c ∈ R^n, with n being the order (or dimension) of the system, and x(t) ∈ R^n, u(t), y(t) ∈ R are the state, input, and output, respectively. It is in general assumed that A is stable, i.e., all its eigenvalues are in the left half plane. The transfer function associated with (1.1) is G(s) = c^T (sI − A)^{−1} b. Model reduction consists of constructing another system, of much smaller dimension m ≪ n, similar to (1.1), i.e.,

    ẋ_m(t) = A_m x_m(t) + b_m u(t),   y_m(t) = c_m^T x_m(t),   (1.2)

with A_m ∈ R^{m×m}, b_m, c_m, x_m(t) ∈ R^m. One goal is that (1.2) reflects (1.1) well, for example, by having its transfer function G_m(s) = c_m^T (sI − A_m)^{−1} b_m be close to G(s) in some sense. For example, H_2-model order reduction seeks a stable minimizer of the H_2-norm of the difference G − G_m over all possible choices of stable matrices

    [ A_m    b_m ]
    [ c_m^T  0   ],

This version dated 25 March 2013.
Max Planck Institute for Dynamics of Complex Technical Systems, Sandtorstr. 1, 39106 Magdeburg, Germany. imahmad@mpi-magdeburg.mpg.de
Temple University, Department of Mathematics, College of Science and Technology, 1805 N. Broad Street, Philadelphia, PA 19122-6094, USA. szyld@temple.edu. Research supported in part by the U.S. Department of Energy under grant DE-FG02-05ER25672 and the U.S. National Science Foundation under grant DMS-1115520.
Delft Institute of Applied Mathematics, Mekelweg 4, 2628 CD Delft, The Netherlands. M.B.vanGijzen@tudelft.nl

with A_m ∈ R^{m×m}, and where the H_2-norm is defined as

    ‖G‖²_{H_2} := (1/2π) ∫_{−∞}^{∞} |G(jω)|² dω.

One desirable feature is that with the reduced-order system one obtains moment matching, i.e., that the two transfer functions, and some of their derivatives, have the same value at certain prescribed points.

Various approaches have been proposed in the literature for model reduction [1, 4, 14, 19-21, 27, 33, 36, 37]. The balanced truncation [26] and optimal Hankel-norm approximation [15] methods guarantee stability of the approximate model and have a priori error bounds. However, these techniques involve the solution of Lyapunov equations and decompositions of matrices, making them computationally expensive for large-scale systems and unable to exploit sparsity or other structure efficiently. The alternative is to project onto appropriate (left and right) Krylov subspaces, whose construction involves only matrix-vector products (or solutions of shifted systems) of dimension n, and which can achieve moment matching without explicit computation of moments; see further Section 2. These properties make Krylov methods computationally efficient and a better choice for model reduction of large-scale systems.

The standard Arnoldi and Lanczos algorithms [2, 24, 31, 32] are two well-known techniques that compute orthogonal and bi-orthogonal bases, respectively, of a Krylov subspace of the form

    K_i(A,b) := span{b, Ab, A²b, ..., A^{i−1}b};

thus an element of this subspace can be written as p(A)b, where p(z) is a polynomial of degree at most i − 1. A variety of projection techniques utilize these bases to compute reduced-order approximate models for large linear systems. It was shown that the resulting reduced-order approximation achieves moment matching at infinity. An important extension of the Arnoldi/Lanczos algorithm is the rational Arnoldi/Lanczos algorithm [16, 28, 29], which computes orthogonal/bi-orthogonal bases for subspaces whose elements are of the form r(A)b, where r(z) is a rational function. The reduced-order approximation obtained via projection onto these rational Krylov subspaces has the advantage that it achieves moment matching at a predefined set of interpolation points, in contrast to the Arnoldi/Lanczos approximation, which achieves moment matching only at infinity. Thus the rational Arnoldi/Lanczos approximations depend strongly on the choice of interpolation points.

Various techniques have been proposed in the literature for the selection of interpolation points in the rational Krylov method. In 2008, the iterative rational Krylov algorithm (IRKA) [19] was proposed, which updates the interpolation points in such a way that on convergence these points are the mirror images (negatives) of the eigenvalues of the reduced-order approximation; this is in fact a necessary condition for H_2-optimal approximation [25, 38]. See also the more recent paper [5], where similar ideas are extended to more general interpolatory projection methods, including parameterized model reduction (PMOR).

In this paper, we suggest an alternative implementation of methods such as IRKA and PMOR, using a variant of the bi-conjugate gradient (BiCG) method [10]. In IRKA and other interpolatory projection methods, at every iteration j, two systems

    (s_k^j I − A)x = b   and   (s̄_k^j I − A^T)x̃ = c

(s̄ denotes the conjugate of s) have to be solved for every interpolation point s_k^j ∈ S_m^j = {s_1^j, ..., s_m^j}.
BiCG can solve both of these shifted linear systems simultaneously. We mention that we feel justified in using approximate solutions of the shifted systems within IRKA, PMOR, and similar methods, since the reduced-order models obtained are optimal for a model close to the original problem, as demonstrated in the very recent paper [6]. Both IRKA as implemented in [19] and the implementation with preconditioned BiCG proposed here converge to the same local minimizer of the H_2-optimal model reduction problem. However, the proposed version is based on the observation that Krylov subspaces are shift-invariant [12, 22], that is,

    K_i(A,b) = K_i((sI − A), b).   (1.3)
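As a quick numerical illustration of (1.3), the following MATLAB fragment (our own toy sketch, not part of the original experiments) builds bases for K_i(A,b) and K_i((sI − A),b) with the same b and checks that they span the same subspace:

    % Toy check of the shift invariance (1.3); all names are illustrative.
    n = 40; rng(0);
    A = randn(n); b = randn(n,1); s = 1.5; i = 6;
    K1 = b; K2 = b;
    for k = 1:i-1
      K1 = [K1, A*K1(:,end)];               % basis of K_i(A, b)
      K2 = [K2, (s*eye(n) - A)*K2(:,end)];  % basis of K_i(sI - A, b)
    end
    subspace(K1, K2)                        % angle ~ 0: same subspace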

The proposal here is to use a variant of the BiCG algorithm for shifted linear systems that exploits this property by constructing approximate solutions for the sequence of shifted systems utilizing only the basis vectors for K_i(A,b) and K_i(A^T,c) (the subspaces of the seed systems). This means that algorithms which require the solution of many sequences of shifted linear systems (such as IRKA) can be implemented using only the matrix-vector multiplications associated with the seed systems. Thus the proposed algorithm reduces the computational cost of IRKA and similar methods. The construction of a new basis vector requires a matrix-vector multiplication with A for K_i(A,b) and with A^T for K_i(A^T,c). Therefore, by reusing the basis vectors for m_s shifted systems, we save m_s − 1 matrix-vector products with A and with A^T in every iteration, compared to solving these systems one by one with the standard BiCG algorithm (or other methods). In the proposed implementation, the basis vectors that are utilized by the m_s shifted systems can be computed in advance.

In this paper, we also discuss the question of how to simultaneously precondition a sequence of shifted linear systems, without losing the shift-invariance property of the corresponding Krylov subspaces. Polynomial preconditioners form a class of preconditioners that satisfy this property, as is explained in [11]. We explain in general how to apply polynomial preconditioning, and also present a polynomial preconditioner for matrices with a strongly complex spectrum. Our numerical experiments show that for our test problems these preconditioners are highly effective.

The remainder of this paper is organized as follows. Section 2 discusses rational Krylov methods for H_2-optimal model reduction, in particular IRKA; Section 3 presents a variant of BiCG that can be used to simultaneously solve a sequence of shifted systems; Section 4 explains how this variant of BiCG can be used to give a computationally attractive implementation of IRKA; Section 5 discusses the question of how to precondition a sequence of shifted linear systems; Section 6 presents numerical experiments that show the efficiency of our approach.

2. Rational Krylov methods and H_2-optimal model reduction. We begin by sketching an algorithm for rational Arnoldi that, given A, b, and a set of frequencies s_i (the poles of the rational function r), builds the subspaces of vectors of the form r(A)b. This is presented in Algorithm 2.1, where V* stands for the adjoint of V, i.e., its conjugate transpose.

Algorithm 2.1 The Rational Arnoldi Algorithm
1: Inputs: A ∈ R^{n×n}, b ∈ R^n, S = {s_1,...,s_K} ⊂ C closed under conjugation, with multiplicities M = {m_{s_1},...,m_{s_K}}, and tol > 0.
2: Initialize: V_m = [ ]; j = 0
3: for k = 1 to K do
4:   if s_k = ∞ then ṽ_{j+1} = b; ṽ_{m+1} = ṽ_{j+1}; else ṽ_{j+1} = (s_k I − A)^{−1} b end
5:   if k > 1 then v̂_{j+1} = ṽ_{j+1} − V_m V_m* ṽ_{j+1} end
6:   if ‖v̂_{j+1}‖ < tol then stop end
7:   v_{j+1} = v̂_{j+1}/‖v̂_{j+1}‖; V_m = [V_m, v_{j+1}]; j = j + 1
8:   for i = 1 to m_{s_k} − 1 do
9:     if s_k = ∞ then ṽ_{j+1} = A v_j; ṽ_{m+1} = ṽ_{j+1}; else ṽ_{j+1} = (s_k I − A)^{−1} v_j end
10:    v̂_{j+1} = ṽ_{j+1} − V_m V_m* ṽ_{j+1}
11:    if ‖v̂_{j+1}‖ < tol then stop end
12:    v_{j+1} = v̂_{j+1}/‖v̂_{j+1}‖; V_m = [V_m, v_{j+1}]; j = j + 1
13:  end for
14: end for
15: v̂_{m+1} = A ṽ_{m+1} − V_m V_m* A ṽ_{m+1}
16: if ‖v̂_{m+1}‖ < tol then stop end
17: v_{m+1} = v̂_{m+1}/‖v̂_{m+1}‖
18: Outputs: V_m; v_{m+1}

As we discussed in the introduction, the H_2-optimal model reduction problem is to identify a stable reduced-order system G_m(s) which is the best approximation of G(s) in terms of the H_2-norm, i.e.,

    G_m(s) = arg min_{Ĝ_m(s) stable, dim(Ĝ_m(s)) = m} ‖G − Ĝ_m‖_{H_2}.   (2.1)

The set of stable m-th order dynamical systems is not convex, and therefore problem (2.1) can have multiple minimizers. An iterative algorithm may thus not converge to the global minimizer of (2.1), but only to a local minimizer. Most approaches for solving (2.1) derive first-order necessary conditions for local optimality. It was shown in [19] that the H_2-norm of the error is small if the reduced-order model G_m(s) interpolates G(s) at −λ_i(A) and −λ_i(A_m), where λ_i(A) stands for the i-th eigenvalue of A. Since λ_i(A_m) is not known a priori, several approaches were developed to minimize the H_2-error by choosing the interpolation points to be −λ_i(A) [17, 18]. However, Meier and Luenberger [25] showed that interpolation at −λ_i(A_m) is more useful, and is a necessary condition for H_2-optimality. The following lemma gives this condition.

Lemma 2.1. [25] Given a stable SISO system G(s) = c^T(sI − A)^{−1}b, let G_m(s) = c_m^T(sI_m − A_m)^{−1}b_m be a local minimizer of dimension m for the optimal H_2-model reduction problem (2.1), and suppose that G_m(s) has simple poles at −s_i, i = 1,...,m. Then G_m(s) interpolates both G(s) and its first derivative at s_i, i = 1,...,m:

    G_m(s_i) = G(s_i),   G_m′(s_i) = G′(s_i),   s_i ∈ S_m := {s_1,...,s_m}.   (2.2)

It was shown in [19] that the necessary optimality conditions of Hyland-Bernstein [20] and Wilson [36] are equivalent to the Meier-Luenberger conditions [25] in the case of continuous-time SISO systems having simple poles; some new necessary conditions based on the H_2-inner product were also stated there. The equivalence of necessary optimality conditions for continuous-time MIMO (multiple inputs/multiple outputs) systems with multiple poles was derived in [35]. Similar results on the equivalence of necessary optimality conditions are formulated in [7] for discrete-time MIMO systems.

An iterative algorithm based on a rational Krylov method was proposed in [19], which on convergence achieves moment matching and satisfies the interpolation-based optimality conditions. This IRKA algorithm utilizes rational Krylov steps in constructing G_m(s) and does not require Lyapunov solvers or dense matrix decompositions, which makes it very effective, especially for large-scale systems. The necessary conditions given in (2.2) are equivalent to the root-finding problem

    λ(A_m(S_m)) = −S_m,   (2.3)

where λ(·) represents the eigenvalues, S_m are the required roots, and we have written A_m(S_m) to emphasize the dependence of A_m on the interpolation points. A Newton framework is used in [19] to compute the interpolation points (roots), successively updating them by S_m^{i+1} = −λ(A_m(S_m^i)). Using Lemma 2.1, these successive updates are possible via the rational Krylov method. This leads to the iterative rational Krylov algorithm (IRKA) [19], given in Algorithm 2.2, where (s_m^j I − A)* = (s̄_m^j I − A^T), and Ran(W) stands for the range of the matrix W.

Algorithm 2.2 The iterative rational Krylov algorithm
1: Inputs: A, b, c, m and tol
2: Initialize: Select S_m^0 = {s_1^0,...,s_m^0} ∈ C^m closed under conjugation, and set j = 0 and error > tol.
3: while error > tol do
4:   Use the rational Arnoldi algorithm (Algorithm 2.1) to obtain V_m and W_m such that
       Ran(V_m) = colsp[(s_1^j I − A)^{−1} b, ..., (s_m^j I − A)^{−1} b],
       Ran(W_m) = colsp[(s_1^j I − A)^{−*} c, ..., (s_m^j I − A)^{−*} c].
5:   Compute A_m = (W_m* V_m)^{−1} W_m* A V_m
6:   error = ‖sort(eig(−A_m)) − sort(S_m^j)‖
7:   Update the interpolation points: S_m^{j+1} = eig(−A_m); j = j + 1
8: end while
9: Compute A_m = (W_m* V_m)^{−1} W_m* A V_m; b_m = (W_m* V_m)^{−1} W_m* b; c_m = V_m* c
10: Outputs: A_m, b_m, c_m.
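To make the structure of Algorithm 2.2 concrete, the following is a minimal MATLAB sketch of an IRKA loop with dense direct solves; the function name, the sorting-based stopping test, and all other implementation details are our own illustrative choices, not the authors' code:

    function [Am, bm, cm] = irka_direct(A, b, c, s, tol, maxit)
    % Minimal IRKA sketch (cf. Algorithm 2.2), dense direct solves.
    % s: initial shifts, closed under conjugation.
      n = size(A,1); m = numel(s); s = s(:); I = eye(n);
      for j = 1:maxit
        V = zeros(n,m); W = zeros(n,m);
        for k = 1:m
          V(:,k) = (s(k)*I - A)  \ b;   % right interpolation space
          W(:,k) = (s(k)*I - A)' \ c;   % left space: (s_k I - A)^{-*} c
        end
        Am   = (W'*V) \ (W'*A*V);
        snew = sort(eig(-Am));          % mirror images of the reduced poles
        err  = norm(snew - sort(s));    % heuristic shift-change measure
        s    = snew;
        if err < tol, break, end
      end
      bm = (W'*V) \ (W'*b);  cm = V'*c;
    end

Here (s(k)*I - A)' \ c solves the adjoint system (s̄_k I − A^T)x̃ = c, i.e., it applies (s_k I − A)^{−*}.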
3. Shifted variants of the BiCG algorithm. In this section, we propose an alternative implementation of the IRKA method using a variant of the bi-conjugate gradient (BiCG) method [10]. The approach utilizes the shift-invariance property (1.3) of Krylov subspaces. We begin with a brief description of the standard BiCG method.

BiCG is based on the nonsymmetric Lanczos algorithm and can simultaneously solve a linear system Ax = b and a transposed system A^T x̃ = c. For a given A ∈ R^{n×n} and two vectors b, c ∈ R^n with c^T b ≠ 0, it computes two approximate solutions, x_m and x̃_m, such that the following holds.

(i) A-orthogonal search directions: Given x_0 and x̃_0, for i = 1, 2, ..., until convergence,

    x_i = x_{i−1} + α_i d_i   and   x̃_i = x̃_{i−1} + ᾱ_i d̃_i,   (3.1)

where α_i ∈ C, ᾱ_i is its complex conjugate, and the directions d_i, d̃_j ∈ C^n are A-orthogonal, that is,

    d̃_i* A d_j = 0 for i ≠ j.   (3.2)

(ii) Orthogonal residuals: Let r̃_i = c − A^T x̃_i and r_j = b − A x_j be the residuals for i, j = 0, 1, 2, ...; then

    r̃_i* r_j = 0 for i ≠ j.   (3.3)

A version of the BiCG algorithm is given in Algorithm 3.1.

Algorithm 3.1 The bi-conjugate gradient algorithm
1: Inputs: A, b, c, x_0, x̃_0, max_it and tol
2: Initialize: r_0 = b − A x_0 = d_0, r̃_0 = c − A^T x̃_0 = d̃_0, and error = max{‖r_0‖/‖b‖, ‖r̃_0‖/‖c‖}
3: for i = 0 to max_it do
4:   α_i = (r̃_i* r_i)/(d̃_i* A d_i)
5:   x_{i+1} = x_i + α_i d_i, x̃_{i+1} = x̃_i + ᾱ_i d̃_i
6:   r_{i+1} = r_i − α_i A d_i, r̃_{i+1} = r̃_i − ᾱ_i A^T d̃_i
7:   β_{i+1} = −(r̃_{i+1}* r_{i+1})/(r̃_i* r_i)
8:   d_{i+1} = r_{i+1} − β_{i+1} d_i, d̃_{i+1} = r̃_{i+1} − β̄_{i+1} d̃_i
9:   error = max{‖r_{i+1}‖/‖b‖, ‖r̃_{i+1}‖/‖c‖}
10:  if error < tol then break end
11: end for
12: Outputs: x_{i+1}, x̃_{i+1}.
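For concreteness, here is a direct MATLAB transcription of Algorithm 3.1 as an unpreconditioned sketch (ours; no breakdown safeguards), which also records the seed residuals and scalars that the shifted variants below reuse:

    function [x, xt, Rhist, alpha, beta] = bicg_pair(A, b, c, max_it, tol)
    % Sketch of Algorithm 3.1 with zero initial guesses; also records the
    % residuals r_i and the scalars alpha_i, beta_i for later reuse.
      x = zeros(size(b)); xt = zeros(size(c));
      r = b; rt = c; d = r; dt = rt;
      rho = rt'*r; Rhist = r; alpha = []; beta = [];
      for i = 1:max_it
        Ad  = A*d; Atdt = A'*dt;
        al  = rho/(dt'*Ad);
        x   = x + al*d;         xt = xt + conj(al)*dt;
        r   = r - al*Ad;        rt = rt - conj(al)*Atdt;
        rhon = rt'*r;
        be  = -rhon/rho;        % sign convention of Algorithm 3.1
        rho = rhon;
        d   = r - be*d;         dt = rt - conj(be)*dt;
        Rhist = [Rhist, r]; alpha = [alpha; al]; beta = [beta; be];
        if max(norm(r)/norm(b), norm(rt)/norm(c)) < tol, break, end
      end
    end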

In order to solve a sequence of shifted linear systems by using basis vectors associated with A and A^T, we consider a seed system and a shifted linear system, i.e.,

    Ax = b,   (sI − A)x^s = b,   (3.4)

where A ∈ R^{n×n}, x, x^s, b ∈ R^n, and s ∈ C. Let x_i ∈ K_i(A,b) and x_i^s ∈ K_i((sI − A),b) be the approximations of the above linear systems obtained after i iterations of the BiCG algorithm (Algorithm 3.1). We can write them as

    x_i = p_{i−1}(A)b   and   x_i^s = p_{i−1}^s(sI − A)b,   (3.5)

where p_{i−1} and p_{i−1}^s are polynomials of degree less than or equal to i − 1. Then the associated residuals r_i = b − A x_i and r_i^s = b − (sI − A)x_i^s can be written as

    r_i = q_i(A)b   and   r_i^s = q_i^s(sI − A)b,   (3.6)

where q_i(t) = 1 − t p_{i−1}(t) and q_i^s(t) = 1 − t p_{i−1}^s(t) are polynomials of degree i such that q_i(0) = q_i^s(0) = 1, called the residual polynomials. It turns out that these residuals must be collinear [22].

Lemma 3.1. Let r_i = b − A x_i and r_i^s = b − (sI − A)x_i^s be the residuals after the i-th iteration of BiCG applied to the two systems in (3.4). Then there exists ζ_i^s ∈ C such that r_i = ζ_i^s r_i^s.

Proof. We have r_i, r_i^s ∈ K_{i+1}(A,b) = K_{i+1}((sI − A),b), and furthermore r_i^s ⊥ K_i((s̄I − A^T),c) and r_i ⊥ K_i(A^T,c). Since K_i(A^T,c) = K_i((s̄I − A^T),c), the lemma follows.

As a consequence, we can write

    r_i^s = (1/ζ_i^s) r_i,   (3.7)

which implies

    r_i^s = q_i^s(sI − A)b = (1/ζ_i^s) q_i(A)b.

Since q_i^s(s − t) = (1/ζ_i^s) q_i(t) and q_i^s(0) = 1,

    ζ_i^s = q_i(s).   (3.8)

Expressions (3.7) and (3.8) show that the residual associated with the shifted linear system in (3.4) can be expressed in terms of the residual associated with the seed linear system Ax = b. In the following, we give expressions for the parameters associated with the shifted BiCG method; cf. [12]. From the BiCG algorithm (Algorithm 3.1), we have, for i = 1, 2, ...,

    d_{i−1} = (1/β_i)(r_i − d_i),   A d_i = (1/α_i)(r_i − r_{i+1}),

and therefore

    r_i = r_{i−1} − α_{i−1} A d_{i−1} = r_{i−1} − (α_{i−1}/β_i) A (r_i − d_i)
        = r_{i−1} − (α_{i−1}/β_i) A r_i + (α_{i−1}/(β_i α_i)) (r_i − r_{i+1}),

which results in the following three-term recurrence for the residual:

    r_{i+1} = −α_i A r_i + (β_i α_i/α_{i−1}) r_{i−1} + (1 − β_i α_i/α_{i−1}) r_i.   (3.9)

In terms of the polynomial representation used in (3.6), this can be written as

    q_{i+1}(t) = −α_i t q_i(t) + (β_i α_i/α_{i−1}) q_{i−1}(t) + (1 − β_i α_i/α_{i−1}) q_i(t).

Setting t = s and using (3.8), we obtain

    ζ_{i+1}^s = (1 − α_i s − β_i α_i/α_{i−1}) ζ_i^s + (β_i α_i/α_{i−1}) ζ_{i−1}^s.   (3.10)
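Lemma 3.1 is easy to observe numerically: running the bicg_pair sketch above on the seed system and, independently, on one shifted system with the same b and c produces residual sequences that are collinear up to rounding. The following toy test is illustrative only:

    % Toy illustration of Lemma 3.1 (collinear residuals).
    n = 100; rng(1);
    A = randn(n) + n*eye(n); b = randn(n,1); c = randn(n,1); s = 2.5;
    [~, ~, R ] = bicg_pair(A,            b, c, 8, 0);
    [~, ~, Rs] = bicg_pair(s*eye(n) - A, b, c, 8, 0);
    for i = 1:size(R,2)
      fprintf('i = %d, angle(r_i, r_i^s) = %.2e\n', i-1, subspace(R(:,i), Rs(:,i)));
    end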

Since the residuals are collinear (3.7), the residual for the shifted system satisfies

    r_{i+1}^s = (1/ζ_{i+1}^s) [ −α_i ζ_i^s A r_i^s + (β_i α_i/α_{i−1}) ζ_{i−1}^s r_{i−1}^s + (1 − β_i α_i/α_{i−1}) ζ_i^s r_i^s ]
              = (α_i ζ_i^s/ζ_{i+1}^s)(sI − A) r_i^s + (β_i α_i ζ_{i−1}^s/(α_{i−1} ζ_{i+1}^s)) r_{i−1}^s
                + ( ζ_i^s/ζ_{i+1}^s − ζ_i^s β_i α_i/(α_{i−1} ζ_{i+1}^s) − s α_i ζ_i^s/ζ_{i+1}^s ) r_i^s.

Comparing this equation with the three-term recurrence (similar to (3.9)) for the shifted system, given by

    r_{i+1}^s = −α_i^s (sI − A) r_i^s + (β_i^s α_i^s/α_{i−1}^s) r_{i−1}^s + (1 − β_i^s α_i^s/α_{i−1}^s) r_i^s,   (3.11)

we obtain

    α_i^s = −α_i (ζ_i^s/ζ_{i+1}^s),   (3.12)

    β_i^s = (ζ_{i−1}^s/ζ_i^s)² β_i,   (3.13)

    ζ_{i+1}^s = (1 − α_i s) ζ_i^s + (β_i α_i/α_{i−1})(ζ_{i−1}^s − ζ_i^s).   (3.14)

For more details on the derivation of these expressions, see [12]. It is easy to see that the above expressions also hold for i = 0, if we initialize ζ_{−1}^s = 1. A shifted variant of the BiCG method is given in Algorithm 3.2. Note that in finding an update for the approximate solution of the shifted systems, Algorithm 3.2 uses x_{i+1}^s = x_i^s + α_i^s d_i^s, where d_i^s = r_i^s − β_i^s d_{i−1}^s. Since r_i^s can be expressed in terms of r_i, no matrix-vector multiplications are required for the shifted systems if the residual r_i is known. In the following, we use this useful result to propose an iterative algorithm for the H_2-optimal model reduction problem.

4. Iterative shifted BiCG for H_2-optimal model reduction. In this section we develop an implementation of the shifted bi-conjugate gradient method (MS-BiCG) to compute local minimizers of the H_2-optimal model reduction problem; it is given in Algorithm 4.1. Observe that, since this algorithm utilizes a shifted variant of the BiCG method, it can deal with a sequence of shifted systems and transposed shifted systems using only the matrix-vector multiplications associated with the seed systems, i.e., the systems with no shifts. This saves the burden of computing matrix-vector multiplications corresponding to each shifted system. Observe also that the proposed algorithm involves Krylov iterations to solve the linear systems with the shifted matrices (s_i I − A), i = 1,...,m. This allows us to tune the accuracy of the approximate solutions of these systems by setting max_it and tol appropriately. Again, we refer to the very recent paper [6] for a justification of using these approximations.

In our proposed implementation, we divide the work of the shifted BiCG algorithm (Algorithm 3.2) into two parts. The first part involves the computation of all the residuals R = [r_1, r_2, ..., r_m] and R̃ = [r̃_1, r̃_2, ..., r̃_m] of the seed systems and of all the scalar parameters A = [α_1,...,α_m] and B = [β_1,...,β_m]. The second part uses these residuals to compute the collinearity coefficients of the residuals of the shifted systems, and performs the rest of the IRKA method. The first part is given in Algorithm 4.2. Note that this algorithm does not compute the approximations x_m and x̃_m, since only the residuals and the scalar parameters are required for the solution of the shifted linear systems. The residuals represent BiCG polynomials of degree m that are linked to the set of interpolation points S_m = [s_1,...,s_m] according to (3.7), but are computed independently of S_m. This means that we can compute all the Krylov subspaces associated with S_m in advance.

Algorithm 3.2 The shifted bi-conjugate gradient algorithm
1: Inputs: A, b, c, S_m = {s_1,...,s_m} ∈ C^m, max_it and tol
2: Initialize: x_0 = x̃_0 = zeros(n,1), r_0 = b = d_0, r̃_0 = c = d̃_0, ζ_o^s = ζ^s = ones(m,1), ζ_n^s = α^s = β^s = zeros(m,1), x^s = x̃^s = d^s = d̃^s = zeros(n,m)
3: error = max{‖r_0‖/‖b‖, ‖r̃_0‖/‖c‖}
4: for i = 0 to max_it do
5:   {Linear system with no shift}
6:   α_i = (r̃_i* r_i)/(d̃_i* A d_i)
7:   x_{i+1} = x_i + α_i d_i, x̃_{i+1} = x̃_i + ᾱ_i d̃_i
8:   r_{i+1} = r_i − α_i A d_i, r̃_{i+1} = r̃_i − ᾱ_i A^T d̃_i
9:   β_{i+1} = −(r̃_{i+1}* r_{i+1})/(r̃_i* r_i)
10:  d_{i+1} = r_{i+1} − β_{i+1} d_i, d̃_{i+1} = r̃_{i+1} − β̄_{i+1} d̃_i
11:  error = max{‖r_{i+1}‖/‖b‖, ‖r̃_{i+1}‖/‖c‖}
12:  if error < tol then break end
13:  {Shifted systems}
14:  for k = 1,...,m do
15:    ζ_n^s(k) = (1 − α_i s_k) ζ^s(k) + (α_i β_i/α_{i−1})(ζ_o^s(k) − ζ^s(k))
16:    α^s(k) = −α_i (ζ^s(k)/ζ_n^s(k))
17:    β^s(k) = (ζ_o^s(k)/ζ^s(k))² β_i
18:    for j = 1,...,n do
19:      d^s(j,k) = (1/ζ^s(k)) r_i(j) − β^s(k) d^s(j,k)
20:      d̃^s(j,k) = (1/ζ̄^s(k)) r̃_i(j) − β̄^s(k) d̃^s(j,k)
21:      x^s(j,k) = x^s(j,k) + α^s(k) d^s(j,k)
22:      x̃^s(j,k) = x̃^s(j,k) + ᾱ^s(k) d̃^s(j,k)
23:    end for
24:    ζ_o^s(k) = ζ^s(k); ζ^s(k) = ζ_n^s(k)
25:  end for
26: end for
27: Outputs: x_m^s, x̃_m^s.

Algorithm 4.1 Iterative shifted bi-conjugate gradient method
1: Inputs: A, b, c, m, max_it and tol
2: Initialize: S_m^0 = {s_1^0,...,s_m^0} ∈ C^m, V_m = [ ], W_m = [ ], set j = 0 and error > tol.
3: while error > tol do
4:   Call the shifted BiCG algorithm (Algorithm 3.2) to compute x_m^s and x̃_m^s, with A, b, c, S_m^j, max_it and tol as inputs.
5:   V_m := [x_m^s], W_m := [x̃_m^s]
6:   Compute A_m = (W_m* V_m)^{−1} W_m* A V_m; b_m = (W_m* V_m)^{−1} W_m* b; c_m = V_m* c
7:   error = ‖sort(eig(−A_m)) − sort(S_m^j)‖
8:   Update the interpolation points: S_m^{j+1} = eig(−A_m); j = j + 1
9: end while
10: Outputs: A_m, b_m, c_m.

The second part of Algorithm 3.2 involves the computation of all the search directions d_i^s and d̃_i^s and of all the scalar parameters Z^s = [ζ_1^s,...,ζ_m^s], A^s = [α_1^s,...,α_m^s], and B^s = [β_1^s,...,β_m^s] needed to obtain the approximations to the shifted and transposed shifted systems; it is given in Algorithm 4.3. This part requires the output of the first part, Algorithm 4.2. The advantage of this division of the work is that the outputs of Algorithm 4.2 can be reused for different sequences of shifted linear systems without recomputing them for each sequence; see further our numerical experiments in Section 6. Of course, we have to keep in mind that by doing so, we incur the expense of the additional storage of the computed residuals.

Algorithm 4.2 Krylov basis for the shifted BiCG method
1: Inputs: A, b, c, max_it and tol
2: n := size(A,1)
3: Initialize: R = R̃ = zeros(n, m+1), A = B = zeros(m+1, 1)
4: R(:,1) = r_1 := b, R̃(:,1) = r̃_1 := c, A(1) = 1, B(1) = 1, d_1 = r_1, d̃_1 = r̃_1, error = max{‖r_1‖/‖b‖, ‖r̃_1‖/‖c‖}
5: for i = 1 to max_it do
6:   α_i = (r̃_i* r_i)/(d̃_i* A d_i)
7:   r_{i+1} = r_i − α_i A d_i, r̃_{i+1} = r̃_i − ᾱ_i A^T d̃_i
8:   β_{i+1} = −(r̃_{i+1}* r_{i+1})/(r̃_i* r_i)
9:   d_{i+1} = r_{i+1} − β_{i+1} d_i, d̃_{i+1} = r̃_{i+1} − β̄_{i+1} d̃_i
10:  error = max{‖r_{i+1}‖/‖b‖, ‖r̃_{i+1}‖/‖c‖}
11:  if error < tol then max_it = i; break end
12:  R(:,i+1) = r_{i+1}, R̃(:,i+1) = r̃_{i+1}, A(i+1) = α_i, B(i+1) = β_i
13: end for
14: Outputs: R, R̃, A, B

Algorithm 4.3 IRKA using the shifted bi-conjugate gradient method
1: Inputs: A, b, c, m, max_it and tol
2: n := size(A,1)
3: Call Algorithm 4.2 with inputs A, b, c, max_it and tol to obtain R, R̃, A, B.
4: Initialize: S_m^0 = [s_1^0,...,s_m^0] ∈ C^m, Z^s = Ẑ^s = Z̃^s = ones(m,1), A^s = B^s = zeros(m,1), x^s = x̃^s = d^s = d̃^s = zeros(n,m), V_m = [ ], W_m = [ ], and j = 0
5: while error > tol do
6:   for i = 0 to m do
7:     α = A(i+1), α_1 = A(i), β = B(i+1)
8:     r = R(:,i), r̃ = R̃(:,i)
9:     Z̃^s = (1 − α S_m(i)) .* Z^s + (α β/α_1) (Ẑ^s − Z^s)
10:    B^s = ((Ẑ^s ./ Z^s).^2) β
11:    A^s = −(Z^s ./ Z̃^s) α; k = 1
12:    while k < m + 1 do
13:      d^s(:,k) = (1/Z^s(k)) r − B^s(k) d^s(:,k)
14:      d̃^s(:,k) = (1/Z̄^s(k)) r̃ − B̄^s(k) d̃^s(:,k)
15:      x^s(:,k) = x^s(:,k) + A^s(k) d^s(:,k)
16:      x̃^s(:,k) = x̃^s(:,k) + Ā^s(k) d̃^s(:,k)
17:      k = k + 1
18:    end while
19:    Ẑ^s = Z^s, Z^s = Z̃^s
20:  end for
21:  V_m = x^s and W_m = x̃^s
22:  Compute A_m = (W_m* V_m)^{−1} W_m* A V_m; b_m = (W_m* V_m)^{−1} W_m* b; c_m = V_m* c
23:  error = ‖sort(eig(−A_m)) − sort(S_m^j)‖
24:  Update the interpolation points: S_m^{j+1} = eig(−A_m); j = j + 1
25: end while
26: Outputs: A_m, b_m, c_m.
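As a small self-contained illustration of this second part, the following MATLAB sketch (ours; the storage conventions alpha(i+1) = α_i and beta(i) = β_i, matching the bicg_pair sketch of Section 3, are assumptions) rebuilds the BiCG iterate of one shifted system purely from the stored seed residuals and scalars, using (3.12)-(3.14):

    function xs = shifted_solution(R, alpha, beta, s)
    % R = [r_0 ... r_N]: seed BiCG residuals; alpha, beta: seed scalars.
    % Returns the BiCG iterate for (s*I - A)x = b after N steps, built
    % from vector updates only (no products with A); cf. Algorithm 4.3.
      [n, Np1] = size(R); N = Np1 - 1;
      zo = 1; z = 1;                            % zeta_{-1}^s, zeta_0^s
      xs = zeros(n,1); ds = zeros(n,1);
      for i = 0:N-1
        a = alpha(i+1);                         % alpha_i
        if i == 0, a1 = 1; be = 0;              % unused terms at i = 0
        else       a1 = alpha(i); be = beta(i); % alpha_{i-1}, beta_i
        end
        zn = (1 - a*s)*z + (a*be/a1)*(zo - z);  % eq. (3.14): zeta_{i+1}^s
        as = -a*z/zn;                           % eq. (3.12): alpha_i^s
        bs = (zo/z)^2*be;                       % eq. (3.13): beta_i^s
        ds = R(:,i+1)/z - bs*ds;                % d_i^s = r_i^s - beta_i^s d_{i-1}^s
        xs = xs + as*ds;                        % x_{i+1}^s = x_i^s + alpha_i^s d_i^s
        zo = z; z = zn;
      end
    end

Under these conventions, b − (sI − A)·xs agrees with R(:,end)/ζ_N^s up to rounding, which is precisely the collinearity relation (3.7).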

5. Preconditioning of the shifted problem. In order to accelerate the convergence of the iterative linear equation solver, one normally uses a suitably chosen preconditioner. For shifted problems this means that, in the case of right preconditioning, the iterative solver (implicitly or explicitly) solves the system

    (sI − A) M_s^{−1} y^s = b,   x^s = M_s^{−1} y^s.   (5.1)

The multishift Krylov methods, like the BiCG-based algorithms described in the previous section, all rely on the fact that Krylov subspaces are shift-invariant: K_i(A,b) = K_i((sI − A),b). Preconditioned multishift methods should satisfy the same property, that is, all the Krylov subspaces for the preconditioned shifted systems must be the same. Without further assumptions on the preconditioners M_s this is clearly not the case. So the question is how to construct preconditioners M_s that can be used with multishift Krylov subspace methods. We next discuss a framework for this; cf. [11, 12, 22].

5.1. A framework for constructing preconditioners for the shifted problems. Let us assume that a matrix M independent of s is given, and that for any shift s a matrix M_s exists such that

    K_i(AM^{−1}, b) = K_i((sI − A)M_s^{−1}, b);   (5.2)

then a basis for the Krylov subspace K_i(AM^{−1},b) also spans the Krylov subspace K_i((sI − A)M_s^{−1},b) for the shifted problem. The matrix M_s is then the preconditioner that is applied to the shifted system. Although M_s is not explicitly needed to generate a basis for K_i((sI − A)M_s^{−1},b), it is explicitly needed to compute the solution x^s of the unpreconditioned system from the solution y^s of the right-preconditioned system.

It is easy to see that condition (5.2) is satisfied if an s-dependent parameter η_s and a matrix M independent of s exist such that

    (sI − A)M_s^{−1} = η_s I − AM^{−1},   (5.3)

that is, such that the preconditioned shifted matrix can be written as a shifted preconditioned matrix. We have to select M such that the matrices M_s are efficient preconditioners for the shifted systems.

We first consider the situation in which we choose M and η_s such that the shifted matrix sI − A can be factored out easily. To illustrate this, we consider the shift-and-invert preconditioner M = σI − A, in which σ is a suitably chosen shift. Substituting this in the right-hand side of (5.3) gives

    (sI − A)M_s^{−1} = η_s I − A(σI − A)^{−1}.

After some manipulation, this can be rewritten as

    (sI − A)M_s^{−1} = (1 + η_s) ( (σ η_s/(1 + η_s)) I − A ) (σI − A)^{−1}.

To avoid the solution of a system with (sI − A) for the unpreconditioned system, we choose

    σ η_s/(1 + η_s) = s,   i.e.,   η_s = s/(σ − s).

The matrix M_s then becomes

    M_s = (1/(1 + η_s)) (σI − A).

As a result, solving systems with both M and M_s only involves the matrix σI − A; only one LU decomposition of σI − A has to be computed, which can be used both for the preconditioning operations and for computing the solutions of the unpreconditioned shifted systems.
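The identity (5.3) for this choice of M_s is easy to verify numerically; the following MATLAB fragment is a toy check of ours:

    % Toy check of (5.3) with M = sigma*I - A and eta_s = s/(sigma - s).
    n = 50; rng(2);
    A = randn(n); s = 0.7; sigma = 3.0; I = eye(n);
    eta = s/(sigma - s);
    M   = sigma*I - A;
    Ms  = M/(1 + eta);            % M_s = (sigma*I - A)/(1 + eta_s)
    lhs = (s*I - A)/Ms;           % (sI - A) * inv(M_s)
    rhs = eta*I - A/M;            % eta_s*I - A * inv(M)
    norm(lhs - rhs, 'fro')        % ~ machine precision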

The shift-and-invert preconditioner described above has, however, some important disadvantages. The preconditioner is very effective for shifted systems with s close to σ, but is much less effective for shifts further away from σ. Another disadvantage is that computing the LU decomposition of σI − A may still be prohibitively expensive for large 3D problems. We next discuss a class of preconditioners that does not have these disadvantages.

In [11, 12, 22] it is explained that polynomial preconditioners satisfy (5.3), that is, if M^{−1} is a polynomial in A, then for every shift s an η_s exists such that M_s^{−1} is a polynomial in A. Many polynomial preconditioners have been proposed in the literature, e.g., [3, 23, 30, 34], and any of these can be chosen as preconditioner for A. The question that we address next is what the resulting polynomial preconditioner M_s^{−1} = p_n^s(A) and the parameter η_s are for a given shift s. Let

    p_n(A) = Σ_{i=0}^{n} γ_i A^i   (5.4)

be the polynomial preconditioner of degree n for A, and let p_n^s(A) = Σ_{i=0}^{n} γ_i^s A^i be the polynomial preconditioner that, as a result, is (implicitly) applied to the shifted system

    (sI − A)x^s = b.

In order to determine p_n^s(A) given p_n(A), we write (cf. (5.3))

    (sI − A)p_n^s(A) = η_s I − A p_n(A).   (5.5)

We need to find the parameters γ_i^s, i = 0,...,n, and η_s, given s and γ_i, i = 0,...,n. Substituting the sum (5.4) into (5.5) gives

    Σ_{i=0}^{n} s γ_i^s A^i − Σ_{i=0}^{n} γ_i^s A^{i+1} − η_s I + Σ_{i=0}^{n} γ_i A^{i+1} = 0.

Shifting the index in the second and last sums gives

    Σ_{i=0}^{n} s γ_i^s A^i − Σ_{i=1}^{n+1} γ_{i−1}^s A^i − η_s I + Σ_{i=1}^{n+1} γ_{i−1} A^i = 0.

Taking out the terms for i = 0 and i = n+1 and combining the other terms yields

    (s γ_0^s − η_s) I + (γ_n − γ_n^s) A^{n+1} + Σ_{i=1}^{n} (s γ_i^s − γ_{i−1}^s + γ_{i−1}) A^i = 0.

Equating the coefficients of like powers gives

    γ_n^s = γ_n,   (5.6)
    γ_{i−1}^s = γ_{i−1} + s γ_i^s,   i = n, n−1, ..., 1,
    η_s = s γ_0^s.

These difference equations can be solved to give

    γ_i^s = Σ_{j=i}^{n} γ_j s^{j−i},   i = n, n−1, ..., 0,   and   η_s = s Σ_{j=0}^{n} γ_j s^j,

which gives η_s and the parameters of the polynomial p_n^s(A) in closed form.
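A quick MATLAB verification of these closed-form formulas on a random matrix (again a toy check of ours, not part of the paper's experiments):

    % Toy verification of (5.5)-(5.6): build gamma^s by back-substitution
    % and check that (sI - A) p_n^s(A) equals eta_s I - A p_n(A).
    n = 3; gam = [0.9; -0.3; 0.05; 0.01];   % arbitrary gamma_0..gamma_n
    s = 1.2; m = 30; rng(3); A = randn(m)/m; I = eye(m);
    gs = zeros(n+1,1); gs(n+1) = gam(n+1);  % gamma_n^s = gamma_n
    for i = n:-1:1
      gs(i) = gam(i) + s*gs(i+1);           % gamma_{i-1}^s = gamma_{i-1} + s*gamma_i^s
    end
    eta = s*gs(1);                          % eta_s = s*gamma_0^s
    P = zeros(m); Ps = zeros(m);
    for i = 0:n
      P = P + gam(i+1)*A^i; Ps = Ps + gs(i+1)*A^i;
    end
    norm((s*I - A)*Ps - (eta*I - A*P), 'fro')   % ~ 0, i.e. (5.5) holds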

The above procedure explains how to compute p_n^s(A) from a suitably chosen preconditioning polynomial p_n(A) for A. This preconditioning polynomial should be such that A p_n(A) ≈ I, or, equivalently, such that the residual polynomial q_{n+1}(A) := I − A p_n(A) ≈ O. The usual way to construct a polynomial preconditioner is to first construct a residual polynomial q_{n+1}(t) that is small on a region in the complex plane that contains the spectrum. The coefficients of this residual polynomial determine the coefficients of the polynomial preconditioner. Let q_{n+1}(t) be given as

    q_{n+1}(t) = Π_{i=1}^{n+1} (1 − ω_i t),

in which the parameters ω_i are the reciprocals of the roots of q_{n+1}(t). Note that the requirement q(0) = 1 for residual polynomials is then automatically satisfied. In order to find the parameters γ_i of the polynomial p_n(t) = Σ_{i=0}^{n} γ_i t^i, we first express q_{n+1}(t) as

    q_{n+1}(t) = 1 − Σ_{i=1}^{n+1} γ_{i−1} t^i.

The parameters γ_i can then be found from the recursion

    q_j(t) = q_{j−1}(t) − ω_j t q_{j−1}(t),   j = 1 : n+1,   q_0(t) = 1.

It is a natural choice to take the ω_i equal to the reciprocals of the Chebyshev nodes on the interval [l, u], i.e.,

    ω_i = 2/( u + l − (u − l) cos(π ψ_i) ),   ψ_i = (2i − 1)/(2(n + 1)),   i = 1,...,n+1.

Here l and u are lower and upper bounds, respectively, on the spectrum if the eigenvalues are real. If the eigenvalues are complex, l and u are the foci of the ellipse that encloses the spectrum.
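The recursion translates directly into a few lines of MATLAB operating on coefficient vectors; the fragment below (an illustrative sketch, not the authors' code) produces the γ_i for a given degree and a real spectral interval [l, u]:

    % Coefficients gamma_0..gamma_n of p_n(t) from the Chebyshev residual
    % polynomial q_{n+1}(t) = prod_i (1 - omega_i t) on [l, u].
    n = 4; l = 0.1; u = 10;
    i   = (1:n+1)';
    psi = (2*i - 1)/(2*(n + 1));
    om  = 2./(u + l - (u - l)*cos(pi*psi));  % reciprocals of Chebyshev nodes
    q = 1;                                   % q_0(t) = 1; coefficients, low to high
    for j = 1:n+1
      q = [q, 0] - om(j)*[0, q];             % q_j = q_{j-1} - omega_j t q_{j-1}
    end
    gam = -q(2:end);                         % since q_{n+1}(t) = 1 - sum gamma_{i-1} t^i
    % p_n(t) = gam(1) + gam(2)*t + ... + gam(n+1)*t^n, and 1 - t*p_n(t) = q_{n+1}(t)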

6. Numerical experiments. In this section we present numerical results for three different test problems. The first two problems are drawn from reservoir simulation. The system matrices for these test problems are nonsymmetric, but their eigenvalues are real. The third test problem, the classical EADY problem [8], is used to demonstrate how our approach can be applied to models with a system matrix with a strongly complex spectrum. The numerical experiments were performed on a laptop computer with an Intel Core i5 processor and 4 GB of RAM. The computations were performed using MATLAB version 7.14.

6.1. Test problems from reservoir simulation.

6.1.1. Flow in a cylindrical reservoir. The flow in a porous medium in a cylindrical reservoir around a well can, under certain assumptions, be described by the radial diffusivity equation

    (1/r) ∂/∂r ( r ∂p/∂r ) + (1/r²) ∂²p/∂θ² + ∂²p/∂z² = ∂p/∂t,   r ∈ (1,R), θ ∈ (−π,π], z ∈ (0,D), t > 0.   (6.1)

In this equation p is the pressure in the reservoir, r is the distance to the center of the well, R is the radius of the reservoir, θ is the angular coordinate, z is the vertical coordinate, D is the depth of the reservoir, and t is the time. All these variables are dimensionless. The initial condition is

    p(0,r,θ,z) = 0,   r ∈ (1,R).

A no-flow boundary condition is imposed on the top and on the bottom of the reservoir, i.e.,

    ∂p/∂z = 0 if z = 0 or z = D.

Because of the cylindrical domain, the following periodic boundary condition holds:

    p(t,r,0,z) = p(t,r,2π,z).

For the boundary conditions at the well at r = 1 and at the outer boundary at r = R, we consider two physically important cases, which define our test problems.

Test problem 1: prescribed well pressure. For the first test problem, the pressure in the borehole is prescribed and there is no inflow through the outer boundary. In dimensionless variables these boundary conditions read

    p(t,1,θ,z) = p_well,   and   ∂p(t,R,θ,z)/∂r = 0,   t > 0.   (6.2)

The variable of interest is the outflow from the reservoir into the borehole (which determines the production rate), which is given by

    v_well = ∂p(t,1,θ,z)/∂r.

The control variable is p_well. The above problem is discretized with the finite difference method. The outflow into the borehole is calculated using a second-order finite difference formula. This leads to a dynamical system of the form

    [ ṗ      ]   [ A    b ] [ p      ]
    [ v_well ] = [ c^T  d ] [ p_well ].

Test problem 2: prescribed well flux. For the second test problem, the well flux is prescribed and the pressure at the outer boundary is constant. The latter boundary condition is an approximation to an infinitely large reservoir. In dimensionless variables, the corresponding boundary conditions are

    ∂p(t,1,θ,z)/∂r = 1,   and   p(t,R,θ,z) = 0,   t > 0.   (6.3)

For this problem, the output variable is the borehole pressure p_well and the control variable is v_well, and the discretized dynamical system is given by

    [ ṗ      ]   [ A    b ] [ p      ]
    [ p_well ] = [ c^T  0 ] [ v_well ].

For both problems, the matrices A have real eigenvalues, since they correspond to a discretized Laplacian (the matrices for the two test problems differ due to the different boundary conditions). However, due to the Neumann boundary conditions and to the cylindrical coordinates, the matrices are nonsymmetric.

6.1.2. Parameter settings. The radius R and depth D of the reservoir are R = 200 and D = 10. The domain is discretized with an equidistant grid using 199 points in the radial direction (excluding the prescribed boundary), 11 points in the z-direction, and 12 points in the angular direction. The number of state variables is therefore 26,268 for both test problems. The dimension m of the reduced model is taken to be 8, which gives a good correspondence between the responses of the reduced and unreduced systems.
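For frequency-response plots such as those in Figures 6.1-6.3, SISO transfer function samples can be computed directly from the state-space data. The fragment below is a generic sketch that assumes matrices A, b, c from some discretization are already in the workspace; it is not the script used to produce the figures:

    % Sample the transfer function G(jw) = c^T inv(jw*I - A) b on a grid.
    w = logspace(-8, 2, 60);
    G = zeros(size(w));
    I = speye(size(A,1));
    for k = 1:numel(w)
      G(k) = c.' * ((1i*w(k)*I - A) \ b);
    end
    loglog(w, abs(G)); xlabel('Frequency (rad/s)'); ylabel('|G(j\omega)|');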

IRKA is initialized using the harmonic Ritz values after m = 8 Arnoldi iterations with a constant starting vector. The IRKA iterations are terminated once the maximum difference in the shifts between two consecutive iterations is smaller than 10⁻⁵. The different BiCG-type iterative linear equation solvers used in the experiments are terminated for a specific shifted linear system once the corresponding residuals satisfy ‖r_i^s‖ ≤ 10⁻⁸ ‖b‖ and ‖r̃_i^s‖ ≤ 10⁻⁸ ‖c‖. Note that the norms of the residuals of the shifted systems can be obtained almost for free using (3.7). The maximum number of iterations is set to 10,000. The algebraically largest harmonic Ritz value is taken as upper bound for the algebraically largest eigenvalue. Minus the maximum absolute row sum is taken as lower bound for the algebraically smallest eigenvalue.

6.1.3. Numerical results. To solve the sequence of linear systems we have used the following techniques:
1. Direct solution (using MATLAB's \ command): every system is solved individually.
2. BiCG (Algorithm 3.1) combined with the polynomial preconditioner: the bases for the Krylov subspaces for A and A^T are recomputed for every shift in every IRKA iteration; systems with A and its transpose for the same shift are solved simultaneously.
3. Shifted BiCG (MS-BiCG, Algorithm 3.2) combined with the polynomial preconditioner: in every IRKA iteration the bases for the Krylov subspaces are computed once for solving the m shifted systems with A and the m adjoint shifted systems.
4. Two-part MS-BiCG (Algorithm 4.3): the bases for the Krylov subspaces are computed once before the IRKA iteration loop; all shifted systems inside the IRKA loop are solved using vector updates only.

Test problem 1: prescribed well pressure. Figure 6.1 shows in the left panel the Hankel singular values of the reduced system. For comparison we also include the Hankel singular values of a 1D version of the problem, in which the angular and depth dependence is ignored. Due to symmetry, the response of the 1D problem should be the same as for the 3D problem. The right panel shows the time response of the unreduced 3D problem, of the reduced 3D problem, and of the 1D problem. Clearly, the response of the reduced model is almost the same as that of the original model.

[Fig. 6.1. Response for test problem 1. Left: singular values (dB) versus frequency (rad/s) for the 1D full model and the reduced 3D model. Right: time response of the well flux for the 1D full model, the 3D full model, and the reduced 3D model.]

Table 6.1 shows the number of IRKA iterations, the total number of matrix-vector multiplications with A (MATVECs), the total number of BiCG iterations (as a measure of the vector overhead), and the CPU times in seconds for the different solution techniques. Not included are the results without preconditioning, since for some linear systems no convergence occurred within the maximum number of iterations.

The number of IRKA iterations is the same for all tabulated methods, as is to be expected if all linear systems are solved to sufficient accuracy; cf. [6].

    Method                     | IRKA-its. | BiCG-its. | MATVECs    | CPU-time [s]
    Direct solution            | 50        | -         | -          | 804
    BiCG, pp = 5               | 50        | 11971     | 73826      | 299
    BiCG, pp = 10              | 50        | 6774      | 78514      | 296
    MS-BiCG, pp = 5            | 50        | 4086      | 26516      | 116
    MS-BiCG, pp = 10           | 50        | 2305      | 29355      | 118
    Two-part MS-BiCG, pp = 5   | 50        | 4224      | 2540 (90)  | 28
    Two-part MS-BiCG, pp = 10  | 50        | 2401      | 4561 (51)  | 28

Table 6.1. Results for test problem 1. pp is the degree of the polynomial preconditioner.

As the results show, solving the linear systems with standard BiCG with polynomial preconditioning already gives a performance improvement of a factor of about three over the direct solver. Using MS-BiCG with the polynomial preconditioner gives a further speed-up of a factor of three. Using the two-part MS-BiCG algorithm yields another big gain in performance, which results in a total speed-up of almost a factor of 30 with respect to IRKA with the direct solution method. The performance gain of the two-part MS-BiCG algorithm comes at the expense of a higher memory consumption than for the other two BiCG variants. The number in parentheses in the MATVECs column for the two-part MS-BiCG algorithm indicates the dimension i of the Krylov subspace K_i(A,b). Since the bases for both K_i(A,b) and K_i(A^T,c) need to be stored, additional memory of 2i vectors of the size of the number of state variables is needed. Note that taking a higher degree for the polynomial preconditioner not only reduces the computing time but, more importantly, also reduces the storage requirements. We remark that the storage requirement of the direct solution method is equivalent to the storage of 264 vectors (i.e., the number of points in the angular direction times the number of points in the z-direction times 2).

Test problem 2: prescribed well flux. Figure 6.2 shows in the left panel the Hankel singular values of the 1D model and of the reduced 3D model for the second test problem. The right panel shows the time response of the unreduced 3D problem, of the reduced 3D problem, and of the 1D problem. Also for the second test problem, the response of the reduced model is almost the same as that of the original model.

[Fig. 6.2. Response for test problem 2. Left: singular values (dB) versus frequency (rad/s) for the 1D full model and the reduced 3D model. Right: time response of the well pressure for the 1D full model, the 3D full model, and the reduced 3D model.]

Table 6.2 shows the number of IRKA iterations, the total number of matrix-vector multiplications with A (MATVECs), and the CPU times in seconds for the different solution techniques. Also for the second test problem, preconditioning was necessary to obtain convergence for all the linear systems within the maximum number of iterations.

Breakdown occurred for one of the systems that was solved with BiCG preconditioned with a polynomial preconditioner of degree 5, which also resulted in non-convergence of IRKA. The number of IRKA iterations is again the same for all other tabulated methods.

    Method                     | IRKA-its. | BiCG-its. | MATVECs    | CPU-time [s]
    Direct solution            | 34        | -         | -          | 518
    BiCG, pp = 5               | n.c.      | -         | -          | -
    BiCG, pp = 10              | 34        | 7264      | 82624      | 313
    MS-BiCG, pp = 5            | 34        | 5881      | 36646      | 178
    MS-BiCG, pp = 10           | 34        | 2232      | 27272      | 114
    Two-part MS-BiCG, pp = 5   | 34        | 6077      | 2566 (201) | 72
    Two-part MS-BiCG, pp = 10  | 34        | 2335      | 3490 (70)  | 25

Table 6.2. Results for test problem 2. pp is the degree of the polynomial preconditioner.

The results for the second test problem also show that an enormous gain in performance can be obtained if the linear systems are solved with one of the proposed BiCG variants instead of with the direct solver. Using the two-part MS-BiCG algorithm with a polynomial preconditioner of degree 10 yields a speed-up of about a factor of 20 with respect to IRKA with the direct solution method.

6.2. The EADY test problem. The test problems in the previous section have a real spectrum. In that case it is relatively easy to construct an effective polynomial preconditioner if accurate bounds on the smallest and largest eigenvalues are available. In the complex case, an ellipse has to be computed that encloses the spectrum. For the Chebyshev polynomial preconditioner to be effective, this ellipse should not enclose the origin. Unfortunately, it is not always possible to construct such an ellipse, even if all the eigenvalues are located in the same half plane. To illustrate this, we consider the classical EADY test problem [8], which is a model of the atmospheric storm track. The system matrix A has dimension 598 and is dense. The left panel of Figure 6.3 shows the spectrum and the bounding ellipse. Although all eigenvalues have a negative real part, the ellipse extends into the right half plane. The dimension of the reduced model is chosen to be m = 8. The right panel shows the frequency response of the full and of the reduced model.

[Fig. 6.3. Eady problem. Left: spectrum of the EADY matrix with the bounding ellipse. Right: singular values (dB) versus frequency (rad/s) of the full model and of the reduced model (m = 8).]

In order to make a suitable preconditioner for the case where the ellipse extends into the right half plane, we construct a polynomial preconditioner not for A, but for the shifted matrix σI − A. Hence we compute an approximate shift-and-invert (SI) preconditioner. Clearly, the shift σ has to be chosen such that the eigenvalues of σI − A are in the right half plane. To solve the sequence of linear systems we have used the following techniques:

1. Direct solution (using MATLAB's \ command): every system is solved individually.
2. BiCG (Algorithm 3.1), without preconditioner, with the SI preconditioner, and with a polynomial preconditioner of degree 5.
3. Shifted BiCG (MS-BiCG, Algorithm 3.2), without preconditioner, with the SI preconditioner, and with a polynomial preconditioner of degree 5.
4. Two-part MS-BiCG (Algorithm 4.3), without preconditioner, with the SI preconditioner, and with a polynomial preconditioner of degree 5.

The parameters for IRKA and for the BiCG algorithms are chosen as specified in Section 6.1.2. The ellipse is constructed to enclose the eight harmonic Ritz values that are computed to initialize the IRKA algorithm. The shift is chosen equal to half the length of the minor axis, which guarantees that the ellipse around the spectrum of the shifted matrix is in the right half plane.

Table 6.3 gives the results for the different methods; recall that the number in parentheses indicates the dimension i of the Krylov subspace K_i(A,b). The exact SI preconditioner and the polynomial preconditioner (which is an approximate SI preconditioner) behave similarly: the number of BiCG iterations is almost the same for the two preconditioners. IRKA with two-part MS-BiCG and either the shift-and-invert preconditioner or the polynomial preconditioner of degree 5 is the fastest method: IRKA with these linear system solvers is about eight times faster than IRKA with the direct solver.

    Method, preconditioner     | IRKA-its. | BiCG-its.   | MATVECs | CPU-time
    Direct solution            | 50        | -           | -       | 45
    BiCG, None                 | 44        | 122676      | 122676  | 146
    BiCG, SI                   | 49        | 34672       | -       | 163
    BiCG, pp = 5               | 50        | 39387       | 238322  | 233
    MS-BiCG, None              | 50        | 25057       | 25057   | 36
    MS-BiCG, SI                | 49        | 7589        | -       | 38
    MS-BiCG, pp = 5            | 49        | 7832        | 48952   | 53
    Two-part MS-BiCG, None     | 49        | 21872 (566) | 566     | 11
    Two-part MS-BiCG, SI       | 49        | 6101 (133)  | -       | 5
    Two-part MS-BiCG, pp = 5   | 45        | 6285 (147)  | 2682    | 6

Table 6.3. Eady problem. pp is the degree of the polynomial preconditioner.

7. Concluding remarks. The computationally most demanding part of IRKA is the solution of sequences of shifted linear systems. In this paper we have proposed variants of IRKA that use preconditioned versions of the shifted BiCG algorithm. By means of numerical experiments we have shown the efficiency of the resulting algorithms.

Our choice for the shifted BiCG algorithm was mainly motivated by the fact that BiCG simultaneously solves systems with A and with its adjoint, which is what is needed in IRKA. However, another Krylov method for sequences of linear systems, such as shifted (restarted) GMRES [13], may sometimes be preferable, in particular if c^T b ≈ 0. Many of the ideas that we have presented in this paper carry over directly to this algorithm. In particular, it is still sufficient to store only the bases for the Krylov subspaces K_i(A,b) and K_i(A^T,c) to solve all the shifted systems without additional matrix-vector multiplications. Also, the polynomial preconditioner for sequences of shifted systems that we have presented can be used with any shifted Krylov method.

Acknowledgements. The third author thanks Professor Danny Sorensen for introducing him to the area of model order reduction during a sabbatical stay at the Delft University of Technology. Part of this work was performed while the second author was visiting Delft. The warm hospitality received is greatly appreciated. The authors thank Professor Valeria Simoncini for her very useful comments on an earlier version of the manuscript.

REFERENCES

[1] A. C. Antoulas. Approximation of Large-Scale Dynamical Systems.
SIAM, Philadelphia, 2005.

[2] W. Arnoldi. The principle of minimized iterations in the solution of the matrix eigenvalue problem. Quarterly of Applied Mathematics, 9:17-29, 1951.
[3] S. Ashby, T. Manteuffel, and J. Otto. A comparison of adaptive Chebyshev and least squares polynomial preconditioning for Hermitian positive definite linear systems. SIAM Journal on Scientific and Statistical Computing, 13:1-29, 1992.
[4] L. Baratchart, M. Cardelli, and M. Olivi. Identification and rational L2 approximation: a gradient algorithm. Automatica, 27:413-418, 1991.
[5] U. Baur, C. A. Beattie, P. Benner, and S. Gugercin. Interpolatory projection methods for parameterized model reduction. SIAM Journal on Scientific Computing, 33:2489-2518, 2011.
[6] C. A. Beattie, S. Gugercin, and S. Wyatt. Inexact solves in interpolatory model reduction. Linear Algebra and its Applications, 436:2916-2943, 2012.
[7] A. Bunse-Gerstner, D. Kubalinska, G. Vossen, and D. Wilczek. H2-norm optimal model reduction for large scale discrete dynamical MIMO systems. Journal of Computational and Applied Mathematics, 233:1202-1216, 2010.
[8] Y. Chahlaoui and P. Van Dooren. A collection of benchmark examples for model reduction of linear time invariant dynamical systems. SLICOT Working Note 2002-2, February 2002.
[9] L. Daniel, O. Siong, L. Chay, K. Lee, and J. White. A multiparameter moment-matching model-reduction approach for generating geometrically parameterized interconnect performance models. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 23:678-693, 2004.
[10] R. Fletcher. Conjugate gradient methods for indefinite systems. In G. Watson, editor, Numerical Analysis - Dundee 1975, volume 506 of Lecture Notes in Mathematics, pages 73-89. Springer, Heidelberg, 1976.
[11] R. W. Freund. On conjugate gradient type methods and polynomial preconditioners for a class of complex non-Hermitian matrices. Numerische Mathematik, 57:285-312, 1990.
[12] A. Frommer. BiCGStab(l) for families of shifted linear systems. Computing, 70:87-109, 2003.
[13] A. Frommer and U. Glässner. Restarted GMRES for shifted linear systems. SIAM Journal on Scientific Computing, 19:15-26, 1998.
[14] P. Fulcheri and M. Olivi. Matrix rational H2 approximation: A gradient algorithm based on Schur analysis. SIAM Journal on Control and Optimization, 36:2103-2127, 1998.
[15] K. Glover. All optimal Hankel-norm approximations of linear multivariable systems and their L∞-error bounds. International Journal of Control, 39(6):1115-1193, 1984.
[16] E. Grimme. Krylov projection methods for model reduction. PhD thesis, University of Illinois at Urbana-Champaign, Urbana, Illinois, 1997.
[17] S. Gugercin. Projection Methods for Model Reduction of Large-Scale Dynamical Systems. PhD thesis, Rice University, Houston, TX, 2002.
[18] S. Gugercin and A. Antoulas. An H2 error expression for the Lanczos procedure. In 42nd IEEE Conference on Decision and Control, 2003.
[19] S. Gugercin, A. C. Antoulas, and C. Beattie. H2 model reduction for large-scale linear dynamical systems. SIAM Journal on Matrix Analysis and Applications, 30:609-638, 2008.
[20] D. Hyland and D. Bernstein. The optimal projection equations for model reduction and the relationships among the methods of Wilson, Skelton, and Moore. IEEE Transactions on Automatic Control, AC-30(12):1201-1211, 1985.
[21] I. Jaimoukha and E. Kasenally.
Oblique projection methods for large scale model reduction. SIAM Journal on Matrix Analysis and Applications, 16:602-627, 1995.
[22] B. Jegerlehner. Krylov space solvers for shifted linear systems. Report IUHET-353, December 1996.
[23] O. Johnson, C. Micchelli, and G. Paul. Polynomial preconditioning for conjugate gradient calculations. SIAM Journal on Numerical Analysis, 20:362-376, 1983.
[24] C. Lanczos. Solution of systems of linear equations by minimized iterations. Journal of Research of the National Bureau of Standards, 49:33-53, 1952.
[25] L. Meier and D. Luenberger. Approximation of linear constant systems. IEEE Transactions on Automatic Control, 12:585-588, 1967.
[26] B. Moore. Principal component analysis in linear systems: controllability, observability, and model reduction. IEEE Transactions on Automatic Control, 26:17-32, 1981.
[27] T. Penzl. Algorithms for model reduction of large dynamical systems. Linear Algebra and its Applications, 415:322-343, 2006.
[28] A. Ruhe. Rational Krylov algorithms for nonsymmetric eigenvalue problems. II. Matrix pairs. Linear Algebra and its Applications, 197-198:283-295, 1994.
[29] A. Ruhe. The rational Krylov algorithm for nonsymmetric eigenvalue problems. III. Complex shifts for real matrices. BIT Numerical Mathematics, 34:165-176, 1994.
[30] Y. Saad. Practical use of polynomial preconditionings for the conjugate gradient method. SIAM Journal on Scientific and Statistical Computing, 6:865-881, 1985.
[31] Y. Saad. Iterative Methods for Sparse Linear Systems. PWS, 1996. Second edition, SIAM, Philadelphia, 2003.
[32] V. Simoncini and D. B. Szyld. Recent computational developments in Krylov subspace methods for linear systems. Numerical Linear Algebra with Applications, 14:1-59, 2007.
[33] J. T. Spanos, M. H. Milman, and D. L. Mingori. A new algorithm for L2 optimal model reduction. Automatica, 28:897-909, 1992.