ADAPTIVE TANGENTIAL INTERPOLATION IN RATIONAL KRYLOV SUBSPACES FOR MIMO MODEL REDUCTION

V. DRUSKIN, V. SIMONCINI, AND M. ZASLAVSKY

Abstract. Model reduction approaches have proven to be powerful techniques in the numerical simulation of very large dynamical systems. The presence of multiple inputs and outputs (MIMO systems) makes the reduction process even more challenging. We consider projection-based approaches where the reduction of complexity is achieved by direct projection of the problem onto a rational Krylov subspace of significantly smaller dimension. We present an effective way to treat multiple inputs by dynamically choosing the next direction vectors to expand the space. We apply the new strategy to the approximation of the transfer matrix function and to the solution of the Lyapunov matrix equation. Numerical results confirm that the new approach is competitive with respect to state-of-the-art methods both in terms of CPU time and memory requirements.

1. Introduction. Numerical simulation of large-scale dynamical systems has proven to be a very useful tool for studying complex physical and engineering problems. For further optimization of the system, or to solve related inverse problems, observations under multiple external inputs and/or outputs are typically required, giving rise to so-called multiple-input multiple-output (MIMO) systems. Even for a single input, with a significant level of detail in the problem description, the system size becomes intractable, and its direct solution appears to be infeasible due to the enormous memory and computational requirements. Obviously, multiple inputs make the complexity even worse. Model reduction approaches have proven to be powerful techniques to overcome this problem, aiming at producing low-rank dynamical systems that have features rather similar to those of the original full-scale system.

Consider the continuous time-invariant dynamical system with multiple inputs b_1, b_2, ..., b_p ∈ R^n,

    u'(t) = A u(t),   u(0) = B,                                        (1.1)

where B = [b_1, ..., b_p] ∈ R^{n×p}. In applications, the b_i, i = 1, ..., p, are determined by the shape of the sources and the type of excitation signal. For a point source located at element k of the dynamical system and an impulse-type excitation, one can set b_i = e_k, where e_k is the k-th unit vector. For the step-off excitation mode, b_i = A^{-1} e_k can be used. Throughout the paper we assume that the system (1.1) is passive, i.e., the field of values of A, defined as W(A) = { x* A x : x ∈ C^n, ||x|| = 1 }, is a subset of C^-, where ||·|| is the Euclidean vector norm and x* is the conjugate transpose of x. The Laplace image U(s) = ∫_0^{+∞} u(t) e^{-st} dt of u(t) satisfies

    U(s) = (A - sI)^{-1} B,                                            (1.2)

where I is the identity matrix. One typical application problem is the approximation of C u(t) in (1.1) for all positive times t ≥ 0 (or, equivalently, of C U(s) for s ∈ iR), where C ∈ R^{r×n} is the matrix of outputs (receivers); another problem of interest, in the context of system stability analysis, is the solution of the associated Lyapunov matrix equations.

Version of November 12, 2012. Schlumberger-Doll Research, 1 Hampshire St., Cambridge, MA 02139, USA (druskin1@slb.com). Dipartimento di Matematica, Università di Bologna, Piazza di Porta S. Donato 5, I-40127 Bologna, Italy, and CIRSA, Ravenna, Italy (valeria@dm.unibo.it). Schlumberger-Doll Research, 1 Hampshire St., Cambridge, MA 02139, USA (mzaslavsky@slb.com).
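To make the setting concrete, the following is a small numerical illustration of (1.1)-(1.2). It is only a sketch with assumed toy data (a 1D finite-difference Laplacian standing in for A, and impulse-type unit-vector inputs), not data or code from the paper.

```python
# Minimal sketch (not the authors' code): a toy passive system (1.1) and its
# Laplace image U(s) = (A - sI)^{-1} B from (1.2).  The 1D finite-difference
# Laplacian used for A and the unit-vector inputs are illustrative assumptions.
import numpy as np

n = 100
A = -2.0 * np.eye(n) + np.eye(n, k=1) + np.eye(n, k=-1)   # symmetric, W(A) in C^-
B = np.eye(n)[:, [0, n // 2, n - 1]]                       # impulse-type inputs b_i = e_k

def laplace_image(s):
    """Evaluate U(s) = (A - s I)^{-1} B, cf. (1.2)."""
    return np.linalg.solve(A - s * np.eye(n), B)

print(laplace_image(1.0j).shape)                           # one n-by-p block per frequency
```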

In this paper, we consider MIMO problems and reduced-order models where the reduction of complexity is achieved by direct Galerkin-based projection of the problem onto a subspace of significantly smaller dimension. The projection technique has become an increasingly popular tool for solving stable dynamical systems since the early 80's (see, for example, [2] for an overview). Typically, a main ingredient of projection-based model reduction is given by Krylov-type subspaces. The polynomial Krylov subspace for a single input (p = 1) is defined as K_m(A, b) = span{b, Ab, ..., A^{m-1} b}, and its block analogue for multiple inputs, K_m(A, B) = range([B, AB, ..., A^{m-1} B]), can be used efficiently only for problems of rather moderate size (see, e.g., [17, 23, 4]). A more general type of subspace, namely the rational Krylov subspace (RKS), was first introduced by Ruhe [37, 38]. Its single-input and multiple-input (block) versions have the forms K_m(A, b) = span{(A - s_1 I)^{-1} b, ..., (A - s_m I)^{-1} b} and K_m(A, B) = range([(A - s_1 I)^{-1} B, ..., (A - s_m I)^{-1} B]), respectively, assuming that all parameters s_1, ..., s_m are different from each other. An abundance of analysis and benchmarks shows that it has superior properties for approximating dynamical systems (see, e.g., [24], [19], [20], [21], [33], [8]); the choice of the parameters s_1, ..., s_m (called shifts) is crucial for the efficiency of the approach.

Based on estimates of the spectral interval, an a-priori optimal choice of the multiple shifts for the single-input case with symmetric A was suggested in [19], [28], [8]. As was shown in these papers, the shifts can be chosen as Zolotarev interpolation points for a given spectral interval (or for its estimate). The obtained approximants have an asymptotically optimal convergence rate for the error upper bound. This bound is attained in the case of a spectral measure uniformly distributed in the spectral interval. Indeed, for highly non-uniform distributions of the spectral points, faster convergence can be achieved by using an efficient adaptive algorithm for choosing the shifts, optimized for the given spectral measure. For a single input, such an approach was suggested in [20] for symmetric problems and further investigated in [21] for the non-symmetric case as well as for the Lyapunov equation. For the problem with symmetric A, the H_2-optimal (Hardy space optimal) shifts (a.k.a. Meier-Luenberger points in control theory) are known to lie in W(-A). Hence, the shifts can be chosen one by one in a greedy way, i.e., by selecting the next shift where the residual of the current subspace approximation is largest with respect to s ∈ W(-A) [20]. This approach was extended in [21] to nonsymmetric A (and also applied to the Lyapunov problem) by using, instead of W(-A), a specially chosen, dynamically updated (via the current Ritz values) subset of it. Thanks to a simple formula for the residual of the RKS approximation, the next shift can be found by minimization of an explicitly given rational scalar function. Importantly, the above procedure generates a sequence of nested rational Krylov subspaces, which can be terminated upon achieving the desired accuracy.

When dealing with multiple inputs, the same shifts, chosen a priori or adaptively determined for one particular input, can be exploited. However, the generated subspace may be too large, thus containing redundant information, and far from optimal. In fact, by adding vectors to the subspace we strive to minimize the error.
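As a point of reference for the spaces just recalled, the sketch below builds an orthonormal basis of a block rational Krylov subspace for a dense test matrix. The test matrix, the chosen shifts and the two-pass Gram-Schmidt orthogonalization are assumptions of the sketch, not the paper's implementation.

```python
# Sketch (assumptions: dense A, distinct real shifts) of the block rational Krylov
# subspace K_m(A,B) = range([(A - s_1 I)^{-1}B, ..., (A - s_m I)^{-1}B]); the
# orthonormal basis is accumulated one column block at a time.
import numpy as np

def block_rational_krylov(A, B, shifts):
    n = A.shape[0]
    V = np.zeros((n, 0))
    for s in shifts:
        W = np.linalg.solve(A - s * np.eye(n), B)     # new block (A - s_j I)^{-1} B
        W -= V @ (V.T @ W)                            # orthogonalize against current basis
        W -= V @ (V.T @ W)                            # second pass for numerical safety
        Q, _ = np.linalg.qr(W)
        V = np.hstack([V, Q])
    return V

rng = np.random.default_rng(0)
n, p = 200, 2
A = -2.0 * np.eye(n) + np.eye(n, k=1) + np.eye(n, k=-1)
B = rng.standard_normal((n, p))
V = block_rational_krylov(A, B, shifts=[1.0, 5.0, 25.0])
print(V.shape, np.linalg.norm(V.T @ V - np.eye(V.shape[1])))  # (n, m*p), orthogonality ~0
```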
However, not all the vectors from the block contribute equally to the error reduction. Indeed, the size of the blocks can be reduced using deflation algorithms. One possible deflation strategy is to eliminate (almost) linearly dependent vectors from the blocks via an SVD of these blocks [1], and it is known to work well only for (almost) linearly dependent inputs. Another approach to deflate the subspace is via an SVD of the residual matrix, i.e., to eliminate the singular vectors corresponding to small singular values of the residual block.

On examples with block GMRES for the solution of linear systems, this strategy was shown to be very promising [14]; however, to avoid a significant error in the final solution, only the very small singular values should be truncated (see [29, 36]); a more sophisticated deflation procedure during the generation of the approximation space was recently proposed in [13].

In this paper, we suggest spanning the rational Krylov subspace on matrices (A - s_i I)^{-1} F_i, i = 1, ..., with properly chosen shifts s_i and right-hand side matrices F_i, whose number of columns is possibly lower than p. Therefore, we implicitly eliminate some vectors from the original blocks, in a dynamic manner. This generalization of the block RKS was first suggested by Ruhe [39] in the eigenvalue context, and further considered in [43] in the framework of solving linear systems (with infinite s_i, i = 1, ...). However, the authors of these papers did not propose a systematic way to choose s_i and F_i, i = 1, ...

The concept of tangential interpolation was first introduced by Mark Krein [32] in the framework of the rational interpolation problem of matrix-valued functions. For the problem of approximating (1.2), instead of interpolating the full matrix U(s) ∈ C^{n×p} at a given s (corresponding to the block RKS in the projection framework), tangential interpolation refers to interpolation of the projection of U(s) onto some direction d. Such an approach adds significant flexibility in the choice of interpolation sampling; moreover, the H_2-optimal approximation of MIMO transfer functions can be obtained via tangential interpolation [16], [12], [25]. In the projection-based model reduction framework, the tangential interpolation at points s_i, i = 1, ..., m, and directions d_i, i = 1, ..., m, can be achieved by using the subspace

    T_m = range([(A - s_1 I)^{-1} B d_1, ..., (A - s_m I)^{-1} B d_m]).

In this paper, we introduce an adaptive choice of (s_i, d_i), i = 1, ..., m, to optimize the approximation error. We acknowledge the approach proposed in [12], [25], based on the theory of H_2-optimal model reduction. It iteratively constructs a sequence of subspaces with different (s_i, d_i)_{i=1}^m until convergence to the one satisfying necessary H_2-optimality conditions. However, this optimization problem is not convex and has multiple local minima. Moreover, even when convergent, this approach is efficient only for those applications where what really matters is only the size of the reduced problem, and the cost of its computation can be neglected. In our case, for the solution of either (1.1) or (1.2), the dominating computational cost is the construction of the rational Krylov subspaces (i.e., system solves with the matrices (A - s_i I)). Therefore, this method is not feasible for our purposes.

In this paper, we extend the approach of [20], [21] to the case of multiple inputs. When the shifts s_i, i = 1, ..., are well separated from the spectrum of A, the residual provides a rather good estimate of the approximation error. Hence, it seems natural to choose the next shift s_{i+1} and tangential direction d_{i+1} in a greedy fashion from the residual of the approximation to (A - sI)^{-1} B d. In particular, the next pair (s_{i+1}, d_{i+1}) maximizes the Euclidean norm of the residual with respect to s on a subset of W(-A) (given by a dynamically updated estimate of the convex envelope of the spectrum of A, as in [21]) and unit d.
For fixed s, this problem is obviously convex; however, when s is varied, the optimization problem is known to have multiple maxima. For the single-input case, in [20, 21] we proposed a robust maximization algorithm based on a simple explicit scalar formula for the residual of the RKS approximation. Though we do not have such a formula for the multiple-input case, we show here that the maximization problem can still be solved rather efficiently.

We also note that for each new shift we may choose multiple (l > 1) principal directions, so as to add the corresponding l vectors to the subspace. This generalization of our approach may have worse convergence properties in terms of space dimension; however, it may be better suited to parallel computing and to CPU time in general. The flexibility in the number of principal directions constitutes one more advantage of our approach compared with deflated block methods, where the block sizes are automatically determined. We also briefly discuss the application of our approach to the solution of linear systems using preconditioned GMRES.

We illustrate the performance of both the single direction (TRKSM) and multi-direction (MultiDir TRKSM) approaches for two important problems of control theory: the approximation of the transfer function C U(s), and the solution of the Lyapunov equation. The new algorithms are compared with the extended Krylov subspace method (EKSM) and the original block rational Krylov subspace method (block RKSM) for publicly available benchmark problems and for easily reproducible discretized operators. For problems with multiple inputs, we show that our tangential adaptive approaches allow one to obtain a reduced system with the same accuracy as block RKSM and EKSM, but of significantly smaller size and in general with lower CPU time requirements.

2. Tangential rational Krylov subspace. For the sake of the derivation of our new strategy, let us consider the problem of computing

    U(s) = (A - sI)^{-1} B,   s ∈ iR,                                  (2.1)

where B ∈ R^{n×p} is a tall matrix of linearly independent inputs, and A ∈ R^{n×n}. We approximate U by U_m as

    U_m(s) = V_m (H_m - sI)^{-1} V_m^* B,

where V_m ∈ R^{n×m} has orthonormal columns, which form a basis of some m-dimensional subspace of R^n, and H_m = V_m^* A V_m. We want to extend the subspace by a vector

    v_{m+1} = (A - s_{m+1} I)^{-1} f_{m+1},                            (2.2)

where f_{m+1} = B d_{m+1}, with the direction vector d_{m+1} ∈ R^p, ||d_{m+1}|| = 1, and pole s_{m+1} ∈ S_m. Here S_m ⊂ C^+ is a set defined similarly as in [20], [21], and it encloses the eigenvalues of -A. To this end, we introduce the n×p residual matrix

    r_m(s) = (A - sI) V_m (H_m - sI)^{-1} V_m^* B - B.                 (2.3)

Proposition 2.1. It holds that

    T_{m+1} := range([V_m, v_{m+1}]) = range([V_m, (A - s_{m+1} I)^{-1} r_m(s_{m+1}) d_{m+1}]),

and dim(T_{m+1}) = m+1 if and only if r_m(s_{m+1}) d_{m+1} ≠ 0.

Proof. We observe that

    (A - s_{m+1} I)^{-1} r_m(s_{m+1}) d_{m+1} = V_m (H_m - s_{m+1} I)^{-1} V_m^* B d_{m+1} - (A - s_{m+1} I)^{-1} B d_{m+1}
                                              = V_m (H_m - s_{m+1} I)^{-1} V_m^* B d_{m+1} - v_{m+1}.

Since V_m (H_m - s_{m+1} I)^{-1} V_m^* B d_{m+1} ∈ range(V_m), the first result follows. If r_m(s_{m+1}) d_{m+1} = 0, then dim(range([V_m, v_{m+1}])) = m. If r_m(s_{m+1}) d_{m+1} ≠ 0, then let w = (I - V_m V_m^*)(A - s_{m+1} I)^{-1} r_m(s_{m+1}) d_{m+1}. Obviously, it holds that W[(A - s_{m+1} I)^{-1}] ⊂ C^- for Re(s_{m+1}) ≥ 0, so, using the orthogonality of r_m(s_{m+1}) d_{m+1} with respect to range(V_m), we obtain

    (r_m(s_{m+1}) d_{m+1})^* w = (r_m(s_{m+1}) d_{m+1})^* (A - s_{m+1} I)^{-1} r_m(s_{m+1}) d_{m+1} < 0.

Hence, w ≠ 0, and sufficiency is proved.
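The next sketch checks the statements above numerically: it forms the residual (2.3) for an arbitrary orthonormal basis and verifies that expanding the space with (A - s_{m+1}I)^{-1} B d, as in (2.2), or with (A - s_{m+1}I)^{-1} r_m(s_{m+1}) d, as in Proposition 2.1, produces the same enlarged subspace. The diagonal test matrix and the random trial space are assumptions used only for illustration.

```python
# Numerical check (a sketch, not the paper's code) of the residual (2.3) and of
# Proposition 2.1: expanding the space with (A - sI)^{-1} B d or with
# (A - sI)^{-1} r_m(s) d spans the same enlarged subspace.
import numpy as np

rng = np.random.default_rng(1)
n, p, m = 120, 3, 4
A = -np.diag(rng.uniform(1.0, 10.0, n))            # symmetric stable test matrix
B = rng.standard_normal((n, p))
V, _ = np.linalg.qr(rng.standard_normal((n, m)))   # some m-dimensional trial space
H = V.T @ A @ V

def residual(s):                                   # r_m(s) from (2.3)
    Y = np.linalg.solve(H - s * np.eye(m), V.T @ B)
    return (A - s * np.eye(n)) @ (V @ Y) - B

s_new, d = 2.0, np.array([1.0, 0.0, 0.0])
v1 = np.linalg.solve(A - s_new * np.eye(n), B @ d)                 # candidate (2.2)
v2 = np.linalg.solve(A - s_new * np.eye(n), residual(s_new) @ d)   # Proposition 2.1 variant

Q, _ = np.linalg.qr(np.column_stack([V, v1]))      # basis of range([V_m, v_{m+1}])
print(np.linalg.norm(v1 - V @ (V.T @ v1)) / np.linalg.norm(v1))    # O(1): v1 is a new direction
print(np.linalg.norm(v2 - Q @ (Q.T @ v2)) / np.linalg.norm(v2))    # ~1e-15: same enlarged space
```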

Let us construct the M-dimensional tangential rational Krylov subspace (TRKS)

    T_M = range([(A - s_1 I)^{-1} B d_1, ..., (A - s_M I)^{-1} B d_M])          (2.4)

by ensuring that Proposition 2.1 holds. The computed approximation space satisfies the following tangential interpolation property.

Proposition 2.2. Let s_m, d_m, m = 1, ..., M, satisfy Proposition 2.1. Then

    U_M(s_m) d_m = v_m   and   r_M(s_m) d_m = 0,   m = 1, ..., M.

Proof. By construction v_m ∈ T_M, and it exactly satisfies the Galerkin equation V_M^*[(A - s_m I) v_m - B d_m] = 0. Likewise, U_M(s_m) d_m belongs to the same subspace and satisfies the same Galerkin equation. The statement of the proposition then follows from the uniqueness of the solution of the latter.

Define the exact and approximate square matrix-valued transfer functions as H(s) = B^*(A - sI)^{-1} B and H_M(s) = B^* V_M (H_M - sI)^{-1} V_M^* B, respectively. Then Proposition 2.2 yields H(s_i) d_i = H_M(s_i) d_i, i = 1, ..., M. For symmetric A we will in addition have a derivative interpolation property, that is,

    d_i^* (d/ds) H(s)|_{s=s_i} d_i = d_i^* (d/ds) H_M(s)|_{s=s_i} d_i,   i = 1, ..., M.

3. Adaptive choice of poles and directions. The performance of the tangential approach depends on the choice of the poles and of the direction vectors. Both choices are made adaptively, so that the whole algorithm is self-improving.

3.1. Single pole and principal direction. For B with a single column, it was first proposed in [20], and then further explored in [21], to determine the next pole as

    s_{m+1} = arg max_{s ∈ S_m} q_m(s),                                (3.1)

where S_m is a region of the complex plane approximating the mirrored spectral region of A, while

    q_m(s) = ∏_{j=1}^m |s - s_j| / |s - λ_j|

is the residual norm; here s_j, λ_j, j = 1, ..., m, are the previously computed poles and the current Ritz values, respectively. Following our scalar adaptive approach, we want to choose the tangential vector d_{m+1} and the interpolation point s_{m+1} so that

    (d_{m+1}, s_{m+1}) = arg max_{s ∈ S_m, d ∈ R^p, ||d|| = 1} ||r_m(s) d||,    (3.2)

with r_m(s) d as in (2.3). Obviously, the residual can be written as the quadratic form ||r_m(s) d||^2 = d^* Q_m(s) d, where Q_m(s) ∈ R^{p×p} is a symmetric nonnegative definite matrix that fails to have full rank only for s at the previous interpolation points. For p = 1 (the SISO case) we have Q_m = q_m^2. Generally, for MIMO problems we do not have an explicit scalar formula for Q_m(s) as in the SISO case; thus the computation of ||r_m(s) d|| is more costly, though with a cost depending mainly on m and p, but not on n.

Therefore, (3.2) can be solved by computing a series of small singular value decompositions, and then d_{m+1} is obtained as the principal right singular vector of r_m(s_{m+1}). The TRKS approximation becomes exact when max_{s ∈ S_m} ||r_m(s)|| = 0; it is reasonable to stop the process when ||r_m(s_{m+1})|| is small enough.

The procedure introduced in (3.2) minimizes the interpolation error in a greedy fashion by putting the new interpolation point where the error is largest. Intuitively, we expect such an approach to produce near-optimal sequences of nested subspaces. Some reasoning for such an approach based on potential theory was presented in [20], [21], [26] for the SISO case, but even for this simple case a rigorous analysis is still missing. An ad hoc greedy concept for the optimal choice of interpolation points (in the SISO formulation) first appeared in [24], and was further developed for parametric model reduction in [30].

For optimal interpolation, such as H_2-optimal interpolation, more sophisticated conditions involving the eigenpairs of H_m must be imposed; we refer to recent results that have uncovered the whole theory [16], [12], [25]. Algorithms for obtaining the H_2-optimal interpolation have been devised, although their computational cost (in terms of the number of linear solves) is significantly higher than that of the method we are presenting here, and they do not produce nested subspaces. Undoubtedly, however, H_2-optimal interpolation becomes the method of choice when the final subspace size is critical, for example in inverse problems [22]. Even in such cases, though, our algorithm can be used as an initial guess for the H_2-optimal algorithms (IRKA [25]). A preliminary comparison of the two approaches is performed in Example 5.3.

3.2. Multiple principal directions. The TRKS in (2.4) is generated by adding one vector at a time; therefore, systems with a single right-hand side are solved sequentially. On the other hand, if the original rational Krylov method is used, the approximation space is expanded by p vectors at a time, so that linear systems with p right-hand sides need to be solved at each iteration. The sequential nature of the generation process in the TRKS case thus makes a parallel implementation more difficult compared with block rational Krylov methods. More importantly, from an algorithmic viewpoint, we have noticed in our experiments that when the rank p of B is quite high, the tangential algorithm may compute poles with high multiplicity, close to p, simply because different relevant directions associated with the same shift are required. The whole sequential process may thus become very expensive.

To mitigate these shortcomings we suggest a multi-directional variant of the algorithm, where in (3.2) we select d_{m+1}^{(i)}, i = 1, ..., l, l ≤ p, for a single shift s_{m+1}, where the d_{m+1}^{(i)} are the right singular vectors corresponding to the l largest singular values of r_m(s_{m+1}). At this point, we notice that the block RKSM corresponds to keeping all p directions. The number of new directions can be chosen in several different ways, for example by balancing the subspace approximation quality against the cost of updating the system matrix in the multiple right-hand side linear solvers, or simply so as to be divisible by the number of processors in a parallel environment.
Our practical selection strategy chooses l so that the first l singular values σ^(1), ..., σ^(l) of r_m(s_{m+1}) satisfy

    σ^(i) > (1/10) σ^(1),   i = 1, ..., l,                              (3.3)

so that only the principal directions are considered. Clearly, l ≥ 1, and for l = 1 we recover the approach of the previous section. We found the factor 1/10 to be particularly effective; however, other values may be considered.
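A literal rendering of the selection rule (3.3) is given below; the residual matrix is replaced by a synthetic stand-in, and the 1/10 factor is kept as in the text.

```python
# Sketch of the multi-direction selection rule (3.3): keep every right singular
# vector of r_m(s_{m+1}) whose singular value exceeds one tenth of the largest.
# The matrix R below is only a synthetic stand-in for the residual r_m(s_{m+1}).
import numpy as np

def principal_directions(R, factor=0.1):
    """Return the tangential directions selected by criterion (3.3)."""
    _, sv, Vt = np.linalg.svd(R, full_matrices=False)
    l = int(np.sum(sv > factor * sv[0]))      # number of retained directions, l >= 1
    return Vt[:l].T                           # p x l matrix of directions

rng = np.random.default_rng(2)
R = rng.standard_normal((500, 6)) @ np.diag([10.0, 4.0, 2.0, 0.5, 0.1, 0.01])
D = principal_directions(R)
print(D.shape)    # e.g. (6, 3): three directions pass the 1/10 threshold here
```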

Remark 3.1. The multi-directional approach allows, in principle, a shift multiplicity (i.e., the number of directions associated with the same shift) at most equal to the number of inputs. For higher multiplicity, the derivatives of (A - sI)^{-1} r_m(s) d_{m+1} with respect to s must be employed.

Remark 3.2. The outlined procedure is very general, as it could be applied to any Krylov subspace based method. As an example, when solving the multiple right-hand side linear system AX = B, let us assume that V_m contains the basis of the already computed approximation space (with poles at infinity). Then the space can be enriched by adding the first l ≤ k left singular vectors of r_m(0) = A V_m Y - B, where Y is the solution to the projected problem in Range(V_m). The matrix of the orthogonal basis of the enlarged approximation space is thus given as [V_m, r_m(0) d_1, ..., r_m(0) d_l]. Such an approach can be viewed as a particular case of the vector continuation approach discussed in [43].

4. Computational considerations. We start with the description of a few simple results that will be convenient for the actual computation of the pair (d_i, s_i). Here we restrict our discussion to the case of a single direction d, keeping in mind that the computational strategy is completely analogous for multiple directions.

Proposition 4.1. Given a matrix V_m with m orthonormal columns, let H_m = V_m^* A V_m and r_m(s) = (A - sI) V_m Y_m(s) - B, with Y_m(s) = (H_m - sI)^{-1} V_m^* B. Then

    r_m(s) = (A V_m - V_m H_m) Y_m(s) - (I - V_m V_m^*) B.

Proof. The result follows from

    r_m(s) = A V_m Y_m(s) - V_m s (H_m - sI)^{-1} V_m^* B - B
           = A V_m Y_m(s) + V_m (H_m - sI)(H_m - sI)^{-1} V_m^* B - V_m H_m (H_m - sI)^{-1} V_m^* B - B
           = A V_m Y_m(s) + V_m V_m^* B - V_m H_m Y_m(s) - B.

We observe that if B ∈ Range(V_m), we have

    max_{s ∈ S_m} ||r_m(s)|| = max_{s ∈ S_m} ||(A V_m - V_m H_m) Y_m(s)||,        (4.1)

with A V_m - V_m H_m not depending on s. The result above is general, as it holds for any approximation space spanned by the columns of V_m. In the case when the approximation space is given by our rational Krylov space, the following result holds.

Proposition 4.2. Let K_m = [(A - s_1 I)^{-1} B d_1, ..., (A - s_m I)^{-1} B d_m], and let K_m = V_m R_m be the skinny QR decomposition of K_m. Let D_m = [d_1, ..., d_m]. Then

    A V_m - V_m H_m = (I - V_m V_m^*) B D_m R_m^{-1}.

Therefore, r_m(s) = (I - V_m V_m^*) B (D_m R_m^{-1} Y_m(s) - I).

Proof. Let us drop the subscript m when clear from the context, and let S = diag(s_1, ..., s_m). From A(A - s_i I)^{-1} B d_i = B d_i + s_i (A - s_i I)^{-1} B d_i it follows that A V = A K R^{-1} = (B D + K S) R^{-1}. Using H = V^* A V = R^{-*} K^* A K R^{-1}, we have

    A V - V H = B D R^{-1} + K S R^{-1} - K R^{-1} R^{-*} K^* A V
              = B D R^{-1} + K S R^{-1} - K R^{-1} R^{-*} K^* (B D + K S) R^{-1}
              = B D R^{-1} - K R^{-1} R^{-*} K^* B D R^{-1}
              = B D R^{-1} - V V^* B D R^{-1};

the first result follows.

Substituting this expression for A V - V H into the residual, we get

    r(s) = (I - V V^*) B D R^{-1} Y(s) - (I - V V^*) B = (I - V V^*) B (D R^{-1} Y(s) - I).

The expression above allows us to significantly reduce the computational cost while seeking the next pole and direction. Indeed, solving max_{s ∈ S_m} ||r_m(s)|| for a large sample of values of s would seem to require the computation of the tall matrix r_m(s) at each value of s. Proposition 4.2 shows that this is not necessary, and only computations with small matrices are required as s varies. Indeed, let Q L = (I - V_m V_m^*) B be the skinny QR decomposition of (I - V_m V_m^*) B. Then

    s_{m+1} = arg max_{s ∈ S_m} ||r_m(s)|| = arg max_{s ∈ S_m} ||L (D_m R_m^{-1} Y_m(s) - I)||,     (4.2)

which requires the computation of the norm of a p×p matrix for each value of s. Then d_{m+1} is computed as

    d_{m+1} = arg max_{||d|| = 1} ||r_m(s_{m+1}) d||,                    (4.3)

which is thus determined as the largest right singular vector of the matrix r_m(s_{m+1}). Due to the relations established above, it is possible to compute the requested singular vectors from the singular value decomposition of the p×p matrix L (D_m R_m^{-1} Y_m(s_{m+1}) - I).

The main steps of the algorithm for generating the rational Krylov subspace with adaptive choice of possibly complex poles and tangential interpolation are summarized as follows. The part on the pole selection corresponds to that described in [21], with the variant of the matrix problem in (4.2).

Algorithm 1. Given A, B, m_max, s_0^(1), s_0^(2) ∈ C
  1. Set s_1 = s_0^(1). Compute V_1 = QR((A - s_0^(1) I)^{-1} B)
  2. For m = 1, ..., m_max - 1
  3.   Update H_m = V_m^* A V_m and related quantities
  5.   Compute s_{m+1}:
         If s_m ∈ C \ R and s_m ≠ conj(s_{m-1}), or {m = 1 and Im(s_1) ≠ 0}, then s_{m+1} = conj(s_m)
         else:
           Compute {λ_1, ..., λ_m}, the eigenvalues of H_m
           Determine S_m, the convex hull of {-λ_1, ..., -λ_m, s_0^(1), conj(s_0^(1)), s_0^(2), conj(s_0^(2))}
           Solve (4.2)
  6.   Compute the leading right singular vectors of r_m(s_{m+1}) (or of L(D_m R_m^{-1} Y_m(s_{m+1}) - I); cf. Proposition 4.2, with criterion (3.3)) to define d_m ∈ R^{p×l}
  7.   ṽ_{m+1} = (A - s_{m+1} I)^{-1} B d_m
  8.   Orthogonalize ṽ_{m+1} with respect to the previous basis vectors
  9.   v_{m+1} = QR(ṽ_{m+1}),  V_{m+1} = [V_m, v_{m+1}]

We would like to stress that in Algorithm 1, d_m is a p×l matrix, with the value of l = l_m depending on the iteration m, so that both ṽ_{m+1} and v_{m+1} may be matrices if l > 1. Therefore, the approximation space is expanded by l_m new vectors at each iteration m.

Remark 4.3. In both algorithms we have preferred to use (A - s_{m+1} I)^{-1} B d_m instead of the algebraically equivalent (according to the first result of Proposition 2.1) update (A - s_{m+1} I)^{-1} r_m(s_{m+1}) d_{m+1}. Obviously, the relation between the two only holds in exact arithmetic, and if (A - s_{m+1} I)^{-1} is applied exactly. At this point it is not clear whether using (A - s_{m+1} I)^{-1} r_m(s_{m+1}) d_{m+1} would lead to a more stable procedure. Preliminary experiments did not lead to a definite answer.
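The following sketch verifies Proposition 4.2 and the small-matrix formulation (4.2) on dense test data: the residual norm computed directly from (2.3) agrees with ||L(D_m R_m^{-1} Y_m(s) - I)|| up to rounding. The matrix, shifts and directions are arbitrary choices made only for the sketch.

```python
# Sketch verifying the small-matrix residual formula of Proposition 4.2:
# ||r_m(s)|| = ||L (D_m R_m^{-1} Y_m(s) - I)||, so the pole search (4.2) only
# involves p x p quantities as s varies.  Dense, assumed test data.
import numpy as np

rng = np.random.default_rng(3)
n, p = 300, 2
A = -np.diag(rng.uniform(0.5, 50.0, n))
B = rng.standard_normal((n, p))
shifts, D, K = [1.0, 4.0, 16.0], [], []
for s in shifts:
    d = rng.standard_normal(p); d /= np.linalg.norm(d)        # tangential directions
    D.append(d)
    K.append(np.linalg.solve(A - s * np.eye(n), B @ d))
K, D = np.column_stack(K), np.column_stack(D)                 # K_m and D_m (p x m)
V, R = np.linalg.qr(K)                                        # K_m = V_m R_m
H = V.T @ A @ V
Q, L = np.linalg.qr(B - V @ (V.T @ B))                        # (I - V V^T) B = Q L, L is p x p

def res_norm_small(s):                                        # right-hand side of (4.2)
    Y = np.linalg.solve(H - s * np.eye(H.shape[0]), V.T @ B)
    return np.linalg.norm(L @ (D @ np.linalg.solve(R, Y) - np.eye(p)), 2)

def res_norm_full(s):                                         # direct use of (2.3)
    Y = np.linalg.solve(H - s * np.eye(H.shape[0]), V.T @ B)
    return np.linalg.norm((A - s * np.eye(n)) @ (V @ Y) - B, 2)

for s in [2.0, 8.0, 30.0]:
    print(abs(res_norm_small(s) - res_norm_full(s)))          # differences at rounding level
```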

5. Transfer function approximation. We implemented the described tangential method for approximating the transfer matrix function H(ω) = C(A - ωI)^{-1} B, ω ∈ iR. We are interested in comparing the new tangential approach (TRKSM and MultiDir TRKSM) with the block adaptive rational Krylov method (RKSM; [20], [21]) and the extended Krylov subspace method (EKSM, [18], [42]). RKSM is a block version of the SISO adaptive approach, which corresponds to repeating each newly determined shift p times, while EKSM is implemented in its multiple-column version, the way it is described in [42]. For ease of presentation, the case of a single direction will be considered in the tangential approach.

We would like to stress that for C ≠ B^*, our approximation to H(ω) simply corresponds to a multiplication of C by our approximation to (A - ωI)^{-1} B, therefore matching at most m moments with a subspace of size m (optimal methods such as IRKA can match up to 2m moments and require 2m linear solves per iteration, due to the two different right and left subspaces in the Petrov-Galerkin formulation). Nonetheless, we show that the tangential approach provides a very satisfactory approximation with small m. In this performance evaluation we are mainly interested in the accuracy obtained by the approximation space; therefore we shall not report CPU times. We recall that we expect the new tangential approach to be more expensive than the block adaptive RKSM and EKSM, for the same space dimension, because a larger number of solves with different coefficient matrices is required. We stress that the goal in the transfer function context is to determine a rich and small reduced space, possibly allowing for larger computational costs. In this section and in the rest of the paper, Î_p denotes the matrix of the first p columns of the identity matrix, whose size can be deduced from the context.

Example 5.1. We consider three data sets from the Oberwolfach collection [15]: the CDPlayer set, with n = 120 and given B and C with p = r = 2; the EADY set, with n = 598, where we considered C^* = B = A^{-1} Î_10, p = 10; and finally the FLOW set, with n = 6996 and given C with five rows and B with a single column. In this latter case, we switched the roles of B and C and worked with A^* so as to simulate five inputs. We built a space of dimension 10 in all cases, and we evaluated the Euclidean norm of the error, ||H(ω) - H_m(ω)||, for ω in a given interval, with respect to the true transfer function H(ω). The results are reported in Figure 5.1; in all cases, real shifts were used. In all cases the accuracy obtained by the tangential approach is significantly higher, for the same space dimension. The figures confirm that in the original block rational Krylov space, redundant information is provided by the multiple columns, which does not improve the quality of the space. On the other hand, the tangential approach is able to select the highest quality directions for the approximation of the transfer function matrix.

Example 5.2. We consider the 10,000×10,000 matrix stemming from the finite difference discretization of the Laplace operator in [0,1]×[0,1], with Dirichlet boundary conditions, and C^* = B = A^{-1} Î_10. In the left plot of Figure 5.2 we report the error ||H(ω) - H_m(ω)|| for a subspace of dimension 100, when the block RKSM, EKSM and the new tangential RKSM method are used.
In the right plot, the overall error ||H(ω) - H_m(ω)||_{L2} over the whole range of ω's is computed, as a function of the space dimension, for all considered methods. Both plots show the effectiveness of the tangential approach.

In particular, for a MIMO system with ten inputs, the block RKSM space of dimension 100 has generated only 10 different poles, while the EKSM space contains at most five powers of A or A^{-1}, affecting the accuracy of both approaches.

Fig. 5.1. Example 5.1. Matrices from the Oberwolfach collection. Error ||H(ω) - H_m(ω)|| for ω ∈ iR for the original block RKSM and for the new tangential approach; space dimension equal to m = 10. From the top, clockwise: CDPlayer, EADY, and FLOW.

Example 5.3. In this example we compare the new tangential procedure with the H_2-optimal IRKA method [25], [7]. Figure 5.3 shows the error ||H(ω) - H_m(ω)|| for two typical small benchmark problems; larger problems required a huge amount of computation for the iterative IRKA procedure to achieve a relative accuracy of 10^{-4} for its final poles. The left plot shows the error for the dataset ISS, with n = 270 and p = r = 3 ([15]), while the right plot shows the error for the dataset CDPlayer, with n = 120 and p = r = 2 ([15]); in both cases, an approximation space of size 10 was requested and complex poles were used in all methods. Although in general our procedure cannot compete with the optimal IRKA method in terms of accuracy for a fixed space dimension, our experiments showed that it compares rather well when looking at the error H_2-norm. On the other hand, the cost of the whole tangential procedure is less than the cost of a single iteration of IRKA, and it is thus very affordable, especially on large problems. For completeness, we report that IRKA applied to the ISS (resp. CDPlayer) dataset required 4 (resp. 8) iterations to converge, and every iteration generated two tangential-type rational Krylov subspaces of size 10 each. The same plots emphasize, once again, the gain of using the tangential approach over the original block rational Krylov method.
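For readers who want to reproduce this kind of error curve, the sketch below evaluates ||H(ω) - H_m(ω)|| on a frequency grid for a Galerkin-projected transfer function. The basis used here is a plain, non-adaptive rational Krylov space and the data are synthetic, so it only illustrates the error measure, not the methods compared above.

```python
# Sketch of the error measure used in this section: ||H(w) - H_m(w)|| over a grid
# of frequencies on the imaginary axis, with H_m obtained by Galerkin projection
# onto a given basis V.  Test matrix, basis and C = B^T are assumptions.
import numpy as np

rng = np.random.default_rng(4)
n, p = 400, 3
A = -np.diag(np.linspace(0.1, 100.0, n))
B = rng.standard_normal((n, p))
C = B.T                                                   # square transfer function

K = np.column_stack([np.linalg.solve(A - s * np.eye(n), B) for s in (0.5, 5.0, 50.0)])
V, _ = np.linalg.qr(K)                                    # non-adaptive rational Krylov basis
H_red, B_red, C_red = V.T @ A @ V, V.T @ B, C @ V

def transfer_error(omega):
    full = C @ np.linalg.solve(A - 1j * omega * np.eye(n), B)
    red = C_red @ np.linalg.solve(H_red - 1j * omega * np.eye(V.shape[1]), B_red)
    return np.linalg.norm(full - red, 2)

for omega in np.logspace(-2, 2, 5):
    print(f"omega = {omega:8.2f}   error = {transfer_error(omega):.2e}")
```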

Fig. 5.2. Example 5.2. A is the Laplace operator, B = A^{-1} Î_10. Left: error ||H(ω) - H_m(ω)||, ω ∈ i[10^{-4}, 10^{4}], for a space of dimension 100. Right: error norm ||H - H_m||_{L2} as a function of the space dimension.

Fig. 5.3. Example 5.3. Error ||H(ω) - H_m(ω)|| for a space of dimension 10. Left: dataset ISS, ω ∈ i[10^{-1}, 10^{3}]. Right: dataset CDPlayer, ω ∈ i[10^{0}, 10^{4}].

6. Solution of the Lyapunov equation. In this section we explore the approximate solution of the large-scale Lyapunov equation

    A X + X A^* + B B^* = 0,                                            (6.1)

with B of rank (significantly) larger than one. Projection-type methods have emerged as state-of-the-art solvers for this class of equations, delivering a low-rank approximate solution X̃ in factored form, X̃ = Z Z^*, so that only the tall matrix Z needs to be stored. The projection is usually performed onto a standard, global, extended or rational Krylov subspace, generated iteratively with A and the columns of B [40], [27], [42], [21]. A condition on the residual matrix, such as the Galerkin condition, is then imposed to extract the solution.

More precisely, if the orthonormal columns of V_m span the approximation space, then we can write X̃ = V_m Y_m V_m^* for some matrix Y_m and impose the Galerkin condition V_m^*(A X̃ + X̃ A^* + B B^*) V_m = 0 (see, e.g., [40]). The matrix Y_m can then be uniquely determined (uniqueness is ensured by the requirement that the field of values of A, and thus of V_m^* A V_m, is contained in the left-half complex plane) as the solution of the resulting small Lyapunov equation

    V_m^* A V_m Y + Y V_m^* A^* V_m + V_m^* B B^* V_m = 0.              (6.2)

In exact arithmetic, Y is symmetric and positive (semi)definite, so that it can be factorized as Y = L L^*, giving X̃ = V_m L (V_m L)^*. In practice, an SVD truncation of L is performed to obtain a low-rank approximation of Y, so that V_m L has the smallest possible number of columns.

In [21] the adaptive rational Krylov subspace was shown to be competitive on particularly hard problems with a rank-one B, for which a large dimension was still required by other approximation spaces, such as the extended Krylov space, to deliver a sufficiently accurate solution. Though a block version of the adaptive procedure can be readily obtained, as discussed in [21], the approximation space is expected to grow very rapidly, by p columns at a time. The tangential version of the adaptive rational Krylov subspace should be able to significantly reduce the space dimension for p ≫ 1, while keeping the computational costs limited; we explore its behavior in this section. For completeness, we shall consider both the single direction and the multiple direction cases, though the experimental results seem to favor the latter.

For the solution of the matrix equation, it is advisable to include the original matrix B in the approximation space, so that for the single direction case we shall use

    range([B, (A - s_1 I)^{-1} B d_1, ..., (A - s_m I)^{-1} B d_m]),    (6.3)

whose basis we shall still denote by V_m, though its dimension is now p + m, for B of size n×p and d_i ∈ R^p, i = 1, ..., m.
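The sketch below goes through the Galerkin extraction just described: project onto a space of the form (6.3), solve the small Lyapunov equation (6.2) with a standard dense solver, and truncate the eigendecomposition of Y to obtain a low-rank factor Z. The basis construction, shifts and truncation threshold are assumptions of the example, and SciPy's solve_continuous_lyapunov stands in for a Bartels-Stewart-type routine.

```python
# Sketch of the Galerkin extraction (6.2): project, solve the small Lyapunov
# equation, and return a low-rank factor Z with X ~ Z Z^T.  The basis V below is
# a generic rational Krylov basis containing B, standing in for the adaptive one.
import numpy as np
from scipy.linalg import solve_continuous_lyapunov

rng = np.random.default_rng(5)
n, p = 300, 2
A = -np.diag(rng.uniform(0.5, 20.0, n))
B = rng.standard_normal((n, p))

blocks = [B] + [np.linalg.solve(A - s * np.eye(n), B) for s in (1.0, 10.0)]
V, _ = np.linalg.qr(np.column_stack(blocks))              # basis of a (6.3)-type space

H, Bm = V.T @ A @ V, V.T @ B
Y = solve_continuous_lyapunov(H, -Bm @ Bm.T)              # small equation (6.2)

w, X = np.linalg.eigh(Y)                                  # truncate tiny eigenvalues of Y
keep = w > 1e-12 * w.max()
Z = V @ (X[:, keep] * np.sqrt(w[keep]))                   # X_approx = Z Z^T

R = A @ (Z @ Z.T) + (Z @ Z.T) @ A.T + B @ B.T             # dense residual, for checking only
print(Z.shape, np.linalg.norm(R) / np.linalg.norm(B @ B.T))
```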

6.1. Implementation aspects. The solution matrix Y may be computed periodically during the generation of the tangential rational space, by solving the matrix equation (6.2) with V_m^* A V_m = H_m. Due to its reduced size, a Schur-based scheme such as the Bartels-Stewart method can be used [6]. Convergence can be monitored by computing the residual norm, although the residual matrix should not be formed explicitly for large A. The following proposition allows us to obtain the norm of the residual at a low computational cost, without explicitly computing the associated matrix.

Proposition 6.1. Let the orthonormal columns of V_m span the space (6.3), and let X̃ = V_m Y_m V_m^*, B_m = V_m^* B. Define also D_m = [d_1, ..., d_m], the matrix of the tangential directions, and S_m = diag(0, ..., 0, s_1, ..., s_m), of size p + m. Let also [A B, V_m] = Q_b R_b be the skinny QR decomposition of [A B, V_m]. Then

    || A X̃ + X̃ A^* + B B^* || = || W [ O, Y_m ; Y_m, B_m B_m^* ] W^* ||,

where ||·|| is any unitarily invariant norm, with

    W := R_b [ [ I_p, 0 ; 0, B_m D_m ] R^{-1} + [ 0 ; R S_m R^{-1} ],  [ 0 ; I_{p+m} ] ],

and R = [B_m, *], where the * values are the Gram-Schmidt orthogonalization coefficients obtained during the generation of the basis in V_m.

Proof. We recall that B belongs to the space, so that B = V_m B_m with B_m = V_m^* B. We can write (see, e.g., [34] for similar derivations)

    A X̃ + X̃ A^* + B B^* = [A V_m, V_m] [ O, Y_m ; Y_m, B_m B_m^* ] [A V_m, V_m]^*.

We next implicitly derive a skinny QR decomposition of [A V_m, V_m]. Let K = [B, (A - s_1 I)^{-1} B d_1, ..., (A - s_m I)^{-1} B d_m], so that V_m R = K for some upper triangular matrix R. We also notice that for any scalar s and vector d, A(A - sI)^{-1} B d = B d + s (A - sI)^{-1} B d. Therefore,

    A V_m = [A B, A(A - s_1 I)^{-1} B d_1, ..., A(A - s_m I)^{-1} B d_m] R^{-1}
          = [A B, B d_1, ..., B d_m] R^{-1} + [0, (A - s_1 I)^{-1} B d_1 s_1, ..., (A - s_m I)^{-1} B d_m s_m] R^{-1}
          = [A B, V_m B_m D_m] R^{-1} + [B, (A - s_1 I)^{-1} B d_1, ..., (A - s_m I)^{-1} B d_m] S_m R^{-1}
          = [A B, V_m B_m D_m] R^{-1} + V_m R S_m R^{-1}
          = [A B, V_m] ( [ I_p, 0 ; 0, B_m D_m ] R^{-1} + [ 0 ; R S_m R^{-1} ] ).

Therefore,

    [A V_m, V_m] = [A B, V_m] [ [ I_p, 0 ; 0, B_m D_m ] R^{-1} + [ 0 ; R S_m R^{-1} ],  [ 0 ; I_{p+m} ] ].        (6.4)

Using the QR factorization of [A B, V_m], the result follows.

We observe that all the quantities defining the matrix W in Proposition 6.1 are available during the computation, except for the (2p+m)×(2p+m) matrix R_b, which can be updated by means of a Gram-Schmidt process during the construction of V_m. Moreover, the QR decomposition of [A B, V_m] can be updated iteratively.

The expression for A V_m obtained in the proof of Proposition 6.1 can be used to simplify the computation of (A V_m - V_m H_m) Y(s) in (4.1) during the determination of the next shift and tangential direction. An explicit formula is obtained in the following corollary.

Corollary 6.2. With the definitions and notation of Proposition 6.1, it holds that

    || (A V_m - V_m H_m) Y(s) || = || W [ I_{p+m} ; -H_m ] Y(s) ||.

Proof. We write (A V_m - V_m H_m) Y(s) = [A V_m, V_m] [ I_{p+m} ; -H_m ] Y(s). The result then follows from (6.4) and the QR decomposition [A B, V_m] = Q_b R_b.

The main steps of the overall algorithm are depicted next.

Algorithm 2. Given A, B, m_max, s_0^(1), s_0^(2) ∈ C
  1. Set D = [ ], S = s_0^(1)
  2. Compute [V_m, B_m] = QR(B), [Q, R_b] = QR([A B, V])
  3. Compute H_m = V_m^* A V_m, W_H = R_b [I; -H_m]
  4. While (not converged)
    4.1 Select the new shift s_{m+1} as in Algorithm 1
    4.2 Select the new direction(s) d_{m+1} = arg max_{||d||=1} ||r_m(s_{m+1}) d||^2, using Corollary 6.2
    4.3 Update the diagonal of S with s_{m+1}, and the columns of D with d_{m+1}
    4.4 Solve (A - s_{m+1} I) v = B d_{m+1}
    4.5 Orthogonalize v with respect to V_m to get v_{m+1}, V_{m+1} = [V_m, v_{m+1}]
    4.6 Update H_{m+1} = [H_m, V_m^* A v_{m+1}; v_{m+1}^* A V_m, v_{m+1}^* A v_{m+1}]
    4.7 Update B_m and R = [B_m, *] with the new Gram-Schmidt coefficients
    4.8 Solve H_{m+1} Y + Y H_{m+1}^* + B_m B_m^* = 0 to get Y_{m+1}
    4.9 Compute the reduced residual norm using Proposition 6.1
    4.10 If converged, then exit the while statement
    4.11 Update [Q, R_b] = QR([A B, V_{m+1}])
    4.12 end
  5. Reduce the rank of Y_{m+1}:
    5.1 Compute the eigenvalue decomposition Y_{m+1} = X Λ X^*
    5.2 Truncate: X = [X_1, X_2], Λ = diag(Λ_1, Λ_2), with diag(Λ_1) > 10^{-12}
    5.3 Z = V_{m+1} X_1 Λ_1^{1/2}

The maximization problem in step 4.2 could be performed using the less expensive Frobenius norm when p is very large. The same could be done for computing the residual norm. Both quantities could also be estimated by means of an iterative procedure that approximates the largest singular value of a matrix. For very large space dimensions, valuable savings can be observed if this estimation is implemented. The truncation of the final approximate solution in step 5 is done via an eigenvalue decomposition of the positive semidefinite matrix.
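To tie the pieces together, here is a deliberately simplified, single-direction loop in the spirit of Algorithm 2. It is a sketch under strong assumptions (dense symmetric A, a fixed candidate grid of poles instead of the adaptive set S_m, and no use of the cheap residual formulas of Propositions 4.2 and 6.1), so it illustrates the flow of the method rather than the authors' implementation.

```python
# Simplified single-direction loop in the spirit of Algorithm 2 (a sketch, not the
# authors' implementation): the pole is picked from a fixed candidate grid by
# maximizing the shifted-system residual norm, the tangential direction is its
# principal right singular vector, and the projected equation (6.2) is solved at
# every step.  Dense linear algebra throughout; residuals are formed explicitly.
import numpy as np
from scipy.linalg import solve_continuous_lyapunov

def trksm_lyapunov_sketch(A, B, candidate_poles, m_max=40, tol=1e-8):
    n, p = B.shape
    V, _ = np.linalg.qr(B)                       # start the space with B, as in (6.3)
    nrmBB = np.linalg.norm(B @ B.T)
    for _ in range(m_max):
        H = V.T @ A @ V
        best = None
        for s in candidate_poles:                # greedy pole/direction choice, cf. (3.2)
            Y = np.linalg.solve(H - s * np.eye(H.shape[0]), V.T @ B)
            R = (A - s * np.eye(n)) @ (V @ Y) - B
            sv, Dt = np.linalg.svd(R, full_matrices=False)[1:]
            if best is None or sv[0] > best[0]:
                best = (sv[0], s, Dt[0])
        _, s_new, d_new = best
        v = np.linalg.solve(A - s_new * np.eye(n), B @ d_new)     # expansion vector (2.2)
        v -= V @ (V.T @ v); v -= V @ (V.T @ v)
        V = np.column_stack([V, v / np.linalg.norm(v)])
        H = V.T @ A @ V
        Ylyap = solve_continuous_lyapunov(H, -(V.T @ B) @ (V.T @ B).T)   # equation (6.2)
        X = V @ Ylyap @ V.T                                       # dense, for checking only
        res = np.linalg.norm(A @ X + X @ A.T + B @ B.T) / nrmBB
        if res < tol:
            break
    return V, Ylyap, res

A = -np.diag(np.linspace(0.5, 500.0, 200))
B = np.random.default_rng(6).standard_normal((200, 3))
V, Y, res = trksm_lyapunov_sketch(A, B, candidate_poles=np.logspace(-1, 3, 30))
print(V.shape[1], res)                            # reduced dimension and relative residual
```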

6.2. Numerical experiments. In the following we report on a few numerical experiments we have performed for solving the Lyapunov equation. The aim of these experiments is to illustrate the potential of the tangential approach, both for the single and for the multiple direction cases. To this end, we shall mainly focus on symmetric problems, for which the comparison with the other methods can be done more easily when the inner system solves cannot be performed with a direct method. Indeed, most of the considered methods require dealing with multiple right-hand side linear systems, whose effective solution is a research topic by itself, especially in the nonsymmetric case. In the symmetric case, we solve these systems iteratively, using preconditioned block CG [41], [3] with algebraic multigrid (AMG) as preconditioner [11]. In the nonsymmetric case we only report results with direct inner system solves. Clearly, direct methods should not be used for very large coefficient matrices.

In all tests, the stopping tolerance for the Frobenius norm of the Lyapunov equation residual was set to 10^{-6}. The matrix B was normalized to have unit 1-norm. The solution of the projected problem was computed using the Bartels-Stewart algorithm [6], available as source code in Matlab. The projected problem was solved at each iteration, that is, at the generation of each new shift. The considered methods advance with a different number of vectors at each iteration; therefore the size of the projected equation may vary significantly for the same number of computed shifts. All results were obtained with Matlab 7.13 (R2011b) on a machine running Linux with 8GB memory.

Example 6.3. We consider solving (6.1) for A of size 125,000×125,000, the discretization of the Laplace operator in the unit cube with homogeneous Dirichlet boundary conditions. Different choices of the columns of B were performed, as reported in Table 6.1.

Table 6.1
Example 6.3. 3D Laplace operator. Comparison of all methods. Inner systems are solved by AMG preconditioned block CG.

                     EKSM            block RKSM       TRKSM            MultiDir TRKSM
B                CPU     space    CPU     space    CPU     space    CPU     space
                 time    dim.     time    dim.     time    dim.     time    dim.
A^{-1} Î_5       16.81    40      24.67    35      17.79    22      13.83    25
A^{-1} Î_10      37.57    80      46.36    70      37.35    44      27.66    48
A^{-1} Î_15      52.74   120      71.20   105      56.21    64      37.03    66
A^{-1} Î_20      74.56   160      97.04   140      81.58    84      48.28    83
rand(n,10)       36.69    80      47.18    70      36.99    44      25.89    48
rand(n,20)       78.61   160      97.27   140      75.56    84      47.87    83
rand(n,30)      147.97   300     161.83   210     117.30   117      85.34   139
rand(n,40)      220.91   400     243.23   320     145.16   142     105.47   167

For these data, we see that the original block rational Krylov method and its single-vector tangential variant have large computational costs, while the former even has memory requirements comparable to those of EKSM. The multi-direction scheme consistently shows the best performance in terms of CPU time over all other block methods, while greatly limiting the memory demands of the original block rational method. In particular, the larger space dimension when using more than one direction at each iteration shows that the multi-direction strategy is capable of selecting and preserving important information during the iteration, while discarding redundant information.

Example 6.4. We consider the set Rail from the Oberwolfach collection ([15]), with two symmetric matrices of size n = 20209 and n = 79841, respectively, and B having seven mutually orthogonal columns. We also test with the first p columns of A^{-1} as B. The numerical results in Table 6.2 immediately show that EKSM has convergence problems (loss of basis rank) for certain choices of the columns of B; this problem does not occur in the rational Krylov methods. Except for the choice of B given in the data set, for which the original block method provides average performance, the original rational scheme shows high computational costs and memory requirements for both problem sizes. These shortcomings are cured by the tangential approaches. For both problem dimensions, the CPU times of the multi-direction scheme consistently outperform those of the single direction approach for B from the data set. We believe that the orthogonality of these vectors at least partially explains the good performance of the original block method.

Example 6.5. We consider two nonsymmetric matrices, again from the Oberwolfach collection ([15]). The data set FLOW comes with a matrix A of size 9669 and C with five columns, while the data set CHIP, with matrix A of size 20082, only provides a single column B = C. For the latter data set, we added the first four columns of the identity matrix so as to obtain p = 5. We compared all the considered methods and used a direct method for solving the linear systems required at each iteration by all procedures. In particular, for the extended Krylov subspace method the matrix A can be factorized at the beginning, and then only the factors are back-solved at each iteration. For all rational methods, the Matlab backslash function was used [31]. The stopping tolerance for the Lyapunov solver was set to 10^{-6}. The Frobenius norm of the absolute residual was used in the stopping test. The results for the two datasets are reported in Table 6.3 and Table 6.4, respectively.

Table 6.2
Example 6.4. Rail set. Comparison of all block methods. Inner systems solved by AMG preconditioned block CG.

                       EKSM            block RKSM       TRKSM            MultiDir TRKSM
B                  CPU     space    CPU     space    CPU     space    CPU     space
                   time    dim.     time    dim.     time    dim.     time    dim.
n = 20209
  p = 7            20.16   196      14.07    91      14.08    85       9.58    87
  A^{-1} Î_5        4.51    20       6.80    45       0.89     7       1.15     9
  A^{-1} Î_10       6.41    60      18.03   150       1.15    14       1.43    17
  A^{-1} Î_15      12.02   120      29.35   240       1.28    20       1.59    23
  A^{-1} Î_20        -       -      22.82   180       1.33    25       1.64    28
n = 79841
  p = 7            94.09   238      60.09    98      86.96    93      53.60    95
  A^{-1} Î_5       16.16    30      20.01    35       1.98     7       1.98     7
  A^{-1} Î_10      27.89    60      60.90   110       2.81    13       3.62    14
  A^{-1} Î_15        -       -     102.04   195       2.96    18       3.50    19
  A^{-1} Î_20        -       -     145.62   260       3.08    23       3.62    24

Fig. 6.1. Example 6.5. FLOW dataset, direct inner solves. Shifts adaptively determined by the three rational Krylov methods, with B of rank p = 5.

They readily show that all the rational Krylov approaches are very effective compared with the extended Krylov one, and this is mainly due to the presence of the multiple vectors. The tangential strategy significantly reduces the memory allocation of the rational Krylov method, while the multi-direction version is also able to deliver low CPU times. We stress that when a direct method is used for solving the inner linear systems, methods that simultaneously solve for many right-hand sides have a great advantage: on the one hand, a large portion of the solution procedure is performed on the matrix once for all p systems; on the other hand, the final stage can be applied simultaneously to all right-hand sides, increasing data locality. Although the new multi-direction tangential approach pays the price of solving more systems with different coefficient matrices, the convergence improvement is such that this cost is completely overcome. For the FLOW data set, in Figure 6.1 we also report the final distribution of the adaptively determined shifts s_1, s_2, ..., for all rational methods, with B of rank p = 5. We readily observe that TRKSM generates the largest number of poles, distributed more or less evenly throughout the interval.

A similar but less populated distribution is obtained with the multi-direction approach. The original block RKS method has the lowest number of parameters, more clustered towards the two ends of the interval.

Table 6.3
Example 6.5. FLOW set. Comparison of different block methods. Inner systems solved by a direct method.

          EKSM            block RKSM       TRKSM            MultiDir TRKSM
p     CPU     space    CPU     space    CPU     space    CPU     space
      time    dim.     time    dim.     time    dim.     time    dim.
2      1.09   104       1.32    16       0.86    17       0.83    17
3     26.36   420       7.77    96       2.84    42       2.78    42
4     25.20   448       5.29   108       3.52    64       3.48    64
5     33.30   510       9.38   165       3.66    91       3.62    91

Table 6.4
Example 6.5. CHIP set. Comparison of different block methods. Inner systems solved by a direct method.

          EKSM            block RKSM       TRKSM            MultiDir TRKSM
p     CPU     space    CPU     space    CPU     space    CPU     space
      time    dim.     time    dim.     time    dim.     time    dim.
2     24.87   212      21.99    34      33.54    36      17.70    32
3     26.53   288      19.92    51      40.48    44      18.44    44
4     33.70   376      24.02    76      46.46    50      18.86    52
5     47.05   460      35.85    95      54.96    60      19.50    60

Example 6.6. In this example we perform a deeper analysis of the cost of the methods as a function of the number of columns of B, where the n×40 matrix B is chosen to have random components uniformly distributed in (0,1). We first use the 90,000×90,000 matrix A stemming from the discretization of the self-adjoint operator L(u) = (e^{xy} u_x)_x + (e^{xy} u_y)_y. The left plot of Figure 6.2 shows how the space dimension at convergence grows with p, the number of columns of B. The right plot of Figure 6.2 reports the CPU time required by all methods to solve the problem for the given p. In all cases, the inner solves are performed with PCG and AMG preconditioning [11]. The plots show the typical expected performance of these methods when the inner system is solved iteratively: TRKSM performs very well in terms of memory requirements, though its computational cost tends to grow unsatisfactorily with p. The multi-direction version maintains a qualitatively similar approximation space, while keeping the cost low. Both block methods, EKSM and RKSM, become increasingly inefficient as p grows.

The second group of experiments is performed with the matrix CHIP, of size 20,090, and a direct method is used to solve all systems. Results are reported in Figure 6.3; the matrix B was chosen as before. On the one hand, the picture reporting the space dimension is analogous to that for L, showing the significant memory savings of both tangential approaches. On the other hand, the computational costs associated with the direct solver are significantly higher, making the single direction tangential method too costly compared with the other approaches, especially for large p. Multi-