ADAPTIVE RATIONAL KRYLOV SUBSPACES FOR LARGE-SCALE DYNAMICAL SYSTEMS

V. DRUSKIN AND V. SIMONCINI

Abstract. The rational Krylov space is recognized as a powerful tool within Model Order Reduction techniques for linear dynamical systems. However, its success has been hindered by the lack of a parameter-free procedure that would effectively generate the sequence of shifts used to build the space. In this paper we propose an adaptive computation of these shifts. The whole procedure only requires injecting some initial rough estimate of the spectral region of the matrix, while further information is automatically generated during the process. The approach is a full generalization to the nonsymmetric case of the idea first proposed in [19], and it is used for two important problems in control: the approximation of the transfer function and the numerical solution of large Lyapunov equations. The procedure can be naturally extended to other related problems, such as the solution of the Sylvester equation, and parametric or higher order systems. Several numerical experiments are proposed to assess the quality of the rational projection space over its most natural competitors.

1. Introduction. The time-invariant linear system

    $x'(t) = Ax(t) + Bu(t)$,   $x(0) = x_0$,   $y(t) = Cx(t)$,

arises in a great variety of application problems such as signal processing, and system and control theory. Here we assume that $A$ is negative definite, that is, its numerical range $W(A) = \{x^*Ax : x \in \mathbb{C}^n, \, x^*x = 1\}$ is a strict subset of the left half plane $\mathbb{C}^-$. A major task in the analysis of linear dynamical systems consists in reducing the continuous-time system

    $\Sigma = \begin{pmatrix} A & B \\ C & \end{pmatrix}$,   $A \in \mathbb{R}^{n \times n}$,

onto the significantly smaller system $\widehat{\Sigma} = \begin{pmatrix} \widetilde A & \widetilde B \\ \widetilde C & \end{pmatrix}$, with $\widetilde A$ of size $m \ll n$, while maintaining the most relevant properties of the original system $\Sigma$. We refer the reader to [44], [10] for a discussion of the state-of-the-art methods and for collections of challenging problems. In this paper we shall focus only on the single-input single-output case, so that both matrices $B$ and $C$ are vectors, namely $B = b$ and $C = c$. Various approaches have been explored in the literature, based either on invariant subspaces (see, e.g., [21], [45], [7] and references therein) or on projection spaces (see, e.g., [1] for a general presentation). Within the second class a leading role is played by Krylov subspace type methods, see also [23], [2], [13]. The standard Krylov subspace is defined as $K_m(A,b) = \mathrm{span}\{b, Ab, \ldots, A^{m-1}b\}$, with $\dim(K_m(A,b)) \le m$; by projecting the system matrices onto this space, it is sometimes possible to obtain

Version of April 18. Scientific Advisor, Ph.D., Schlumberger Doll Research, 1 Hampshire St., Cambridge, MA, USA (druskin1@slb.com). Dipartimento di Matematica, Università di Bologna, Piazza di Porta S. Donato 5, Bologna, Italy, and CIRSA, Ravenna, Italy (valeria@dm.unibo.it).

a sufficiently accurate reduced system with a moderate space dimension. Other more effective variants have been analyzed, the most general of which is given by the rational Krylov subspace, defined as

    $K_m(A,b,\mathbf{s}) = \mathrm{span}\{(A - s_1 I)^{-1}b, \ldots, \prod_{j=1}^{m}(A - s_j I)^{-1}b\}$,   (1.1)

where $\mathbf{s} = [s_1, s_2, \ldots, s_m] \in \mathbb{C}^m$. The rational Krylov method was originally proposed by Ruhe [39] in the context of approximating interior eigenvalues, in which appropriately chosen shifts accelerate convergence to the sought-after spectral region. Within model order reduction, the role of rational Krylov subspaces is rather different, as they are particularly well suited for approximating the behavior of the transfer function on the imaginary axis. Indeed, the rational Krylov space is now acknowledged as being one of the most powerful projection approaches for reducing the given linear dynamical system. Major efforts have been devoted to the characterization and parameter evaluation of this space, see, e.g., [25], [26], [48], [36]. However, the success of the rational Krylov subspace has been hindered by the lack of a parameter-free procedure. In most cases, information on the problem or a trial-and-error approach has been used to tune the choice of the shifts $s_j$. In the context of $H_2$-optimal reduction, an interesting attempt to provide an automatic selection has been recently proposed in [28]. However, the computational and memory costs of this approach have not been fully assessed. We also mention the early efforts due to Grimme [27] for determining a sequence of shifts. Similar concepts were recently employed in a greedy sampling approach for multi-parameter model reduction considered in [50]. However, neither of these works relied on a simple rational function like the one we are going to introduce. In this paper we propose an adaptive computation of the shifts for building the rational space. The whole procedure only requires injecting some initial rough estimate of the spectral region of the matrix $A$, while further information is automatically generated during the process. The overall process is basically parameter-free, although performance can be further improved if accurate initial estimates are at hand. The approach is a generalization to the challenging nonsymmetric case of the idea first proposed in [19], where it was used for a specific symmetric application problem, and it is investigated here for two important problems in control: the approximation of the transfer function and the numerical solution of large Lyapunov equations. The procedure is very general and can be naturally extended to other related problems, such as the solution of the Sylvester equation, and parametric or higher order systems [3], [24]. The new procedure is tested on available benchmark problems of medium and large dimensions; our experimental results show that the adaptive approach allows one to obtain a reduced system of very small dimension compared to available projection-type techniques. Throughout the paper the Euclidean norm for vectors and the induced norm for matrices are used, except when the Frobenius norm is employed, defined as $\|A\|_F^2 = \sum_{i,j} |a_{i,j}|^2$, where $a_{i,j}$ are the entries of $A$. Moreover, $A^\top$ denotes the transpose of a matrix $A$, and $A^*$ its conjugate transpose.

2. Computation of the Rational Krylov subspace.
In this section we briefly recall the algorithm for computing an orthonormal basis of (1.1) and a representation matrix for $A$ in the same subspace, and we describe the proposed dynamic computation of the shifts. In the next sections the use of the computed space in two related contexts will be described.

Assume that a sequence of parameters $\{s_1, s_2, \ldots, s_m\}$ is given. An orthonormal basis can be generated using a standard Arnoldi-type process with a modified Gram-Schmidt (re)orthogonalization procedure, starting with $(A - s_1 I)^{-1}b$ and then continuing with a new shift at each iteration [39]. We will use $V_m$ to denote the matrix whose columns are the computed orthonormal basis vectors and $T_m := V_m^*AV_m$; since we require that the numerical range of $A$ is in $\mathbb{C}^-$, the matrix $T_m$ is stable. The standard rational Krylov subspace procedure is outlined in the following pseudo-code.

Algorithm 0. Given $A$, $b$, $m_{\max}$, $\{s_1, s_2, \ldots, s_{m_{\max}}\}$
  1. Compute $\tilde v_1 = (A - s_1 I)^{-1}b$, $v_1 = \tilde v_1/\|\tilde v_1\|$, $V_1 = v_1$
  2. For $m = 1, \ldots, m_{\max}-1$
  3.   Compute the new elements of $T_m = V_m^*AV_m$
  4.   $\tilde v_{m+1} = (A - s_{m+1}I)^{-1}v_m$
  5.   for $j = 1:m$   % orthogonalization step
         $\tilde v_{m+1} = \tilde v_{m+1} - v_j(v_j^*\tilde v_{m+1})$
       enddo
  6.   Scale $v_{m+1} = \tilde v_{m+1}/\|\tilde v_{m+1}\|$, $V_{m+1} = [V_m, v_{m+1}]$

In our setting, we assume that we are not given the sequence of poles $s_1, s_2, \ldots, s_{m_{\max}}$, and thus we need to include a procedure to automatically generate the shift sequence. Given $s_1$, at each subsequent iteration a new shift $s_{m+1}$ is selected to compute a new basis vector as $\tilde v_{m+1} = (A - s_{m+1}I)^{-1}v_m$. We start by observing that any vector $u \in K_m(A,b,\mathbf{s})$ can be written as

    $u = p_{m-1}(A)\,q_m(A)^{-1}b$,   (2.1)

where $p_{m-1}$ is a polynomial of degree at most $m-1$, while $q_m$ is a fixed polynomial of degree $m$, whose roots are the components of $\mathbf{s}$. In the context of solving shifted systems of the type $(A - sI)x = b$, an approximate solution $x_m \in K_m(A,b,\mathbf{s})$ can be determined as $x_m = V_m(T_m - sI)^{-1}V_m^*b$ (obtained by imposing the Galerkin orthogonality condition on the corresponding residual [42, sec. 5.2]). We can make a more precise statement about the structure of $x_m$ with the help of the so-called skeleton approximation $f_{\theta_1,\ldots,\theta_m,s_1,\ldots,s_m}(\theta,s)$ introduced in [51], which is an $[m-1/m]$ rational function of each variable, interpolating $(\theta + s)^{-1}$ at $\theta = \theta_i$, $s = s_i$, $i = 1,\ldots,m$. It was shown in [18] that $x_m = f_{\lambda_1,\ldots,\lambda_m,s_1,\ldots,s_m}(A,s)b$, where $\lambda_j$, $j = 1,\ldots,m$, are the eigenvalues of $T_m$. So the residual of the resolvent can be viewed as the relative interpolation error of $(\theta+s)^{-1}$ by the skeleton approximation and can be represented as

    $b - (A - sI)V_m(T_m - sI)^{-1}V_m^*b = \dfrac{r_m(A)b}{r_m(s)}$,   $r_m(z) = \prod_{j=1}^{m} \dfrac{z - \lambda_j}{z - s_j}$;   (2.2)

see [19]. We stress here that the use of shifted systems to introduce our strategy is very natural, as both application problems we shall analyze in the sequel are strictly related to shifted systems.
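For reference, the construction of $V_m$ and $T_m$ performed by Algorithm 0, and used in (2.2), can be sketched in MATLAB as follows. This is only a minimal illustration under the assumption that the shifts are supplied in a vector s and that sparse direct solves are affordable; function and variable names are ours, not those of the implementation in the Appendix.

    % Minimal sketch of Algorithm 0: rational Krylov basis with given shifts.
    % A: n-by-n (sparse) matrix, b: n-vector, s: vector of m_max shifts.
    function [V, T] = rksm_fixed_shifts(A, b, s)
      n = size(A, 1);  m_max = length(s);  I = speye(n);
      V = zeros(n, m_max);
      w = (A - s(1)*I) \ b;                 % first basis vector
      V(:,1) = w / norm(w);
      for m = 1:m_max-1
          w = (A - s(m+1)*I) \ V(:,m);      % new direction from the next shift
          for j = 1:m                       % modified Gram-Schmidt step
              w = w - V(:,j) * (V(:,j)' * w);
          end
          V(:,m+1) = w / norm(w);
      end
      T = V' * (A * V);                     % representation matrix T_m = V_m'*A*V_m
    end

In practice $T_m$ is updated incrementally (cf. step 3 of Algorithm 0 and Proposition 4.1 below), rather than formed by an explicit product at the end as in this sketch.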

We further observe that the characteristic polynomial of $T_m$ minimizes $\|p(A)q_m(A)^{-1}b\|$ among all monic polynomials $p$ of degree $m$, so that the numerator in (2.2) satisfies

    $\|r_m(A)b\| = \min_{\theta_1,\ldots,\theta_m} \left\| \prod_{j=1}^{m}(A - \theta_j I)(A - s_j I)^{-1} b \right\|$.   (2.3)

With this result in mind, it was proposed in [19] for $A$ symmetric to select the next shift $s_{m+1}$ as

    $s_{m+1} = \arg\max_{s \in [-\lambda_{\max}, -\lambda_{\min}]} \dfrac{1}{|r_m(s)|}$,

where $[\lambda_{\min}, \lambda_{\max}]$ is an estimate of $A$'s spectral interval. When working with nonsymmetric matrices, in principle we may allow both sets of interpolatory points $\{\lambda_j\}$ and $\{s_j\}$ to be complex. Obviously, (2.2) and (2.3) are also valid for nonsymmetric $A$, provided $A$ and $T_m$ do not have Jordan blocks of size larger than 1. We thus extend the adaptive approach of [19] to a nonsymmetric $A$, by obtaining the next pole $s_{m+1}$ so that

    $s_{m+1} = \arg\max_{s \in \partial S_m} \dfrac{1}{|r_m(s)|}$,   (2.4)

where $S_m \subset \mathbb{C}^+$ approximates the mirrored spectral region of $A$, and $\partial S_m$ is its border. It is important to realize that the problem in (2.4) determines the next pole for the expansion of the rational Krylov space, which will also be the extra pole of the function $r_{m+1}$, compared to $r_m$. Therefore, at each iteration the function $r_m$ changes with respect to $r_{m-1}$ as follows: the numerator may change completely, since its roots are fully recomputed, whereas the denominator polynomial only gains an extra root. The procedure to numerically address (2.4) starts with two given values $s_0^{(1)} \in \mathbb{C}$, $s_0^{(2)} \in \mathbb{C}$ which, together with their complex conjugates, define a rough approximation of the mirrored spectral region of $A$; typically $s_0^{(1)}$ approximates the part of the spectrum closest to zero, and $s_0^{(2)}$ that farthest from zero. We set $s_1 = s_0^{(1)}$ as first shift; moreover, the next few poles are chosen in $\{\bar s_0^{(1)}, s_0^{(2)}, \bar s_0^{(2)}\}$. This way, four poles at most are determined from the initial values. To compute the next pole, we first need to determine $S_m$: we propose to take as $S_m$ the convex hull of the set $\{-\bar\lambda_1, \ldots, -\bar\lambda_m, s_0^{(1)}, \bar s_0^{(1)}, s_0^{(2)}, \bar s_0^{(2)}\}$, where the $\lambda_j$'s are the eigenvalues of $T_m = V_m^*AV_m$. Once $S_m$ is available, problem (2.4) is numerically solved by determining the maximum over each edge of the polygon $\partial S_m$. This can be done either with an optimized minimization routine, or simply by sampling values of $|r_m(s)|^{-1}$ on each edge and taking the maximum. Then $s_{m+1}$ is obtained as the corresponding argument. Finally, if the determined $s_{m+1}$ is complex, the subsequent pole is automatically selected as $s_{m+2} = \bar s_{m+1}$, so that a real basis can be obtained. Indeed, whenever data are real and shifts are complex conjugate, it is possible to maintain a real basis and projection matrix $T_m$, although the expensive solution of the linear system still requires complex arithmetic; we refer the reader to [40] for a possible implementation. The proposed rational Krylov subspace method with adaptive shift selection is summarized in the following algorithm.

Algorithm 1. Given $A$, $b$, $m_{\max}$, $s_0^{(1)}, s_0^{(2)} \in \mathbb{C}$
  1. Set $s_1 = s_0^{(1)}$. Compute $\tilde v_1 = (A - s_1 I)^{-1}b$, $v_1 = \tilde v_1/\|\tilde v_1\|$, $V_1 = v_1$
  2. For $m = 1, \ldots, m_{\max}-1$
  3.   Update $T_m = V_m^*AV_m$
  4.   Compute $s_{m+1}$:
         If $s_m \in \mathbb{C}\setminus\mathbb{R}$ and $s_m \ne \bar s_{m-1}$, or {$m = 1$ and $\Im(s_1) \ne 0$}
           then $s_{m+1} = \bar s_m$
         else (1):
           Compute the eigenvalues $\{\lambda_1, \ldots, \lambda_m\}$ of $T_m$
           Determine $S_m$, the convex hull of $\{-\bar\lambda_1, \ldots, -\bar\lambda_m, s_0^{(1)}, \bar s_0^{(1)}, s_0^{(2)}, \bar s_0^{(2)}\}$
           Solve (2.4)
  5.   $\tilde v_{m+1} = (A - s_{m+1}I)^{-1}v_m$
  6.   for $j = 1:m$   % orthogonalization step
         $\tilde v_{m+1} = \tilde v_{m+1} - v_j(v_j^*\tilde v_{m+1})$
       enddo
  7.   Scale $v_{m+1} = \tilde v_{m+1}/\|\tilde v_{m+1}\|$, $V_{m+1} = [V_m, v_{m+1}]$

Except for the extra step 4, the algorithm is the same as Algorithm 0. Note that the final space dimension may differ by one with respect to $m_{\max}$, to accommodate all conjugate pairs. A possible Matlab ([34]) implementation of the whole algorithm is given in the Appendix. A typical setting for the selection of the next shift is depicted in Figure 2.1 for the CDPlayer data (cf. Example 3.1). In particular, the figure shows the convex hull $S_m$ (continuous polygon) for $m = 20$ (left plot) and $m = 30$ (right plot), the eigenvalues of $A$ ('+'), and the previously selected shifts. Here $s_0^{(1)} = 10^{-1}$ and $s_0^{(2)} = 800 + \mathrm{i}\,5\cdot 10^4$.

Fig. 2.1. CDPlayer data set, m = 20 (left) and m = 30 (right). Typical location of $S_m$ (solid polygon), eigenvalues of $A$ ('+'), and previously selected shifts.

The typical pole distribution is apparent in the plot. The poles tend towards the border of the convex hull of the mirrored eigenvalues of $A$, according to the approximation obtained by the current rational Ritz values. Note that the extra 10 poles from the left to the right plot cluster in the leftmost (closest to zero) part of the spectrum, suggesting that the remaining spectral region is already well captured.

(1) As already mentioned, the actual procedure also includes $s_0^{(2)}$ as a shift in the first few iterations. See the Appendix for a more complete implementation.
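Step 4 can be implemented, for instance, by sampling $1/|r_m(s)|$ on the edges of the polygon $S_m$, as in the following hedged MATLAB sketch; the sampling density and the helper name are illustrative choices of ours, and the Appendix contains the complete implementation.

    % Sketch of the selection of s_{m+1} in (2.4): sample 1/|r_m(s)| on the
    % edges of the convex hull S_m and return the maximizer.
    % T: current projected matrix, shifts: poles used so far (column vector),
    % s01, s02: initial rough estimates of the mirrored spectral region.
    function snew = next_shift(T, shifts, s01, s02)
      lam = eig(full(T));                                  % current Ritz values
      pts = [-conj(lam); s01; conj(s01); s02; conj(s02)];  % mirrored Ritz values + estimates
      k = convhull(real(pts), imag(pts));                  % vertices of S_m (closed polygon)
      best = -inf;  snew = pts(k(1));
      for j = 1:length(k)-1                                % walk along each edge
          t = linspace(0, 1, 200).';
          edge = (1-t)*pts(k(j)) + t*pts(k(j+1));
          for i = 1:length(edge)
              % 1/|r_m(s)| = prod|s - s_j| / prod|s - lambda_j|
              val = prod(abs(edge(i) - shifts)) / prod(abs(edge(i) - lam));
              if val > best, best = val; snew = edge(i); end
          end
      end
    end

If all poles are required to be real (see the remark below), the convex hull is unnecessary and the sampling can be restricted to the real interval determined by $s_0^{(1)}$ and $s_0^{(2)}$.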

A detailed analysis of the convergence of rational Ritz values towards certain eigenvalues of $A$ for $A$ symmetric was recently developed in [5]. A complete theoretical description of the convergence of rational Ritz values in the nonsymmetric case remains a challenging open problem. We also remark that if all poles are requested to be real, then there is no need to compute the convex hull. Instead, the interval determined by the initial real values $s_0^{(1)}, s_0^{(2)}$ can be used as $S_m$. The values $\lambda_1, \ldots, \lambda_m$ can still be allowed to be complex in the definition of $r_m$. As we already mentioned, the idea of adaptive pole selection was first discussed in Grimme's Ph.D. thesis. In the introduction to [27] it was suggested to choose the next pole at the maximum of the error function he was trying to monitor; however, this idea was not further explored later in the thesis. A somewhat related idea was more recently explored in [38], which still inherits the same main drawback, i.e., an extra computational cost required for the determination of a more accurate reference solution to evaluate the error function. This drawback was partially circumvented in [54] in a rather general parametric setting by using the residual norm as the error estimate. However, the parametric setting lacks an explicit residual formula similar to (2.2), which is probably why the algorithm of [54] is limited to the selection of an optimal subset of a given candidate (finite) set of interpolation points, without a clear recipe of how to choose the latter. In [27], E. Grimme primarily suggested to use either a few a-priori chosen real poles logarithmically distributed on the frequency interval of interest, or linearly/logarithmically spaced imaginary poles, whose number is however difficult to predict, according to Grimme. The possibility of using general complex shifts is also mentioned there, but not fully explored. It is also interesting that the pole selection has been discussed more extensively in the literature associated with the solution of the Lyapunov equation by means of the ADI method (cf. sec. 4), for which, however, an a-priori selection is the standard practice.

2.1. Analysis of the proposed shift selection. It is known that standard Krylov subspace based methods may be accurate at the local level, but are far less so in the $H_\infty$ or $H_2$ norms, compared with methods such as balanced truncation; see, e.g., [1]. On the other hand, it was recently shown in [17] that the Rational Krylov subspace using real shifts obtained via the classical Zolotaryov solution yields an asymptotically optimal linear convergence rate for the approximation of the resolvent on the whole imaginary axis for bounded positive-definite self-adjoint operators with continuous spectrum. In particular, the following optimal convergence rate for the $L_\infty$ error of the approximation of the resolvent on $i\mathbb{R}$ was derived:

    $O\left[\exp\left(-\dfrac{\pi^2\, m\,(1+o(1))}{2\log\frac{4\lambda_{\max}}{\lambda_{\min}}}\right)\right]$,   for $\dfrac{\lambda_{\max}}{\lambda_{\min}} \gg 1$.   (2.5)

In [17] the authors also proposed the computation of a nested sequence of shifts (so-called equidistributed sequences, or EDS), so that the asymptotically optimal rational space with convergence given by (2.5) could be computed iteratively. The EDS is generated as a pseudo-random sequence distributed with some density $\rho_{[a,b]}(x)$ on $[\lambda_{\min}, \lambda_{\max}]$.
This density corresponds to the equilibrium charge distribution (for $x < 0$) of the condenser with positive and negative plates $[-\lambda_{\max}, -\lambda_{\min}]$ and $[\lambda_{\min}, \lambda_{\max}]$, respectively. Bound (2.5) corresponds to the worst possible scenario

for a given spectral interval, since it assumes there is no possibility for essential spectral adaptation. It is exact for the cases when the spectral measures are proportional to the limiting measure of the EDS, i.e., to the above mentioned $\rho_{[\lambda_{\min},\lambda_{\max}]}(x)$. In fact, convergence can be better for essentially different spectral measures. The Iterative Rational Krylov Algorithm (IRKA) proposed in [28] generates a sequence of Krylov subspaces with the shifts converging to those satisfying the first-order $H_2$-optimality conditions

    $s_i = -\lambda_i$,   $i = 1, \ldots, m$,   (2.6)

where $\lambda_i$ are the eigenvalues of $T_m$. The main disadvantage of this method is that it requires the construction of many Krylov subspaces which will not be utilized in the final model, since only the last subspace is used. We refer the reader to section 3.2 for additional discussion. An adaptive strategy allowing one to beat (2.5) when possible was proposed in [19], again for $A$ symmetric. As discussed in the previous section, the next (real) pole $s_{m+1}$ is chosen to be the value that yields the maximum of $|r_m|^{-1}$ over the mirrored image of the real spectral interval of $A$; this way, the next function will have damped values around the new pole. At every iteration the algorithm in [19] adds a new pole in the denominator of $r_m$ and completely changes the numerator.

Remark 2.1. The adaptive approach exploits a rather natural greedy strategy to select the next pole at the maximum of the residual on $\partial S_m \subset \mathbb{C}^+$, so that the choice of $S_m$ is crucial for the entire process to succeed. For example, if we allowed $S_m$ to include the Ritz values $\lambda_j$, then the algorithm would obviously fail. Our approach to the choice of $S_m$ can be justified using potential theory arguments. We refer the interested reader to [43] for a thorough analysis of these potential theory tools; here we just outline a basic reasoning and reserve a more rigorous and detailed analysis for future work. Following [17], the problem

    $\min_{\theta_1,\ldots,\theta_m,\,s_1,\ldots,s_m}\; \max_{s \in i\mathbb{R}\cup\{\infty\}} \left\| f_{\theta_1,\ldots,\theta_m,s_1,\ldots,s_m}(A,s)b - (A + sI)^{-1}b \right\|$

can be approximately reduced to the generalized third Zolotaryov problem on the extended complex plane ([43]), given by

    $\min_{\theta_1,\ldots,\theta_m,\,s_1,\ldots,s_m}\; \dfrac{\max_{\lambda \in \mathrm{spec}(A)} |r(\lambda)|}{\min_{s \in i\mathbb{R}\cup\{\infty\}} |r(s)|}$,   where $r(z) = \prod_{j=1}^{m} \dfrac{z - \theta_j}{z - s_j}$,   (2.7)

and $\mathrm{spec}(A)$ is $A$'s spectrum. Interestingly, the solutions of the Zolotaryov problem satisfy (2.6), and the Zolotaryov optimal solution is also approximately $H_2$-optimal. Let us assume that $n \gg m$ and that $A$ can be treated as an infinite-dimensional operator with a spectral measure of positive logarithmic capacity. Then asymptotically optimal (in the Cauchy-Hadamard sense) solutions of the Zolotaryov problem will be given by the minimal energy charge distributions on the plane condenser with positive and negative plates $\pm\mathrm{spec}(A)$, respectively ([43]), i.e., the $s_i$ will be the negative equilibrium charge on the plate $-\mathrm{spec}(A)$. If $\mathrm{spec}(A)$ were known, then indeed the ideal solution would be to solve (2.7). Instead, one can use an a-priori estimate of the spectral domain and its mirrored reflection as condenser plates to obtain the shifts a-priori, e.g., as done for the already mentioned EquiDistributed Sequences of shifts. Our adaptive algorithm uses the true $\mathrm{spec}(A)$ as the positive plate of the

condenser and an iteratively updated mirrored spectral estimate $S_m$ for the negative one. Thanks to the optimal property of the Ritz values given by (2.3), for any fixed $s_1,\ldots,s_m$, the set $\theta_1 = \lambda_1, \ldots, \theta_m = \lambda_m$ approximately minimizes the energy of the potential $\log|r_m(s)|$ on the plate $\mathrm{spec}(A)$ for fixed $s_i$. Then, to minimize the energy of the charges $s_i$ on $S_m$, it is reasonable to choose the next charge $s_{m+1}$ at the maximum of the logarithmic potential $\log|r_m(s)|$ on $\partial S_m$, which precisely corresponds to (2.4). See also the discussion in [19] on the connection with the Leja-Bagby approach.

The discussion above justifies the use of real shifts for $A$ symmetric. It was experimentally shown that such an approach always gives a linear convergence rate that is at least as good as the one using the EDS points of [17], although significant speedups were observed for nonuniform spectral distributions. The problem of which shifts should be used for a nonsymmetric $A$ with complex spectrum remains open. As a generalization we can either consider real shifts, with $s_0^{(1)}, s_0^{(2)}$ delimiting the mirrored interval of the eigenvalue real parts, or complex shifts taken over a mirrored estimate of $W(A)$ or of the convex hull of the spectral region. From Remark 2.1, $S_m$ must be complex for problems with complex spectra; however, the actual choice between real and complex poles is in most cases based on computational grounds, and it depends on the application. Our experience shows that complex poles yield a rich space already for quite small $m$, thus they are particularly appropriate for the model order reduction treated in section 3. On the other hand, in the solution of Lyapunov equations, where the computational costs are in general higher, the cost of dealing with the solution of a different complex linear system at each iteration can be prohibitive, and thus real arithmetic is preferred throughout.

3. Evaluation of the transfer function. We consider the approximation of the transfer function

    $h(\omega) = c(A - \omega i E)^{-1}b$,   $\omega \in \mathbb{R}$,   (3.1)

with $A$ negative (semi-)definite and $E = LL^*$ symmetric positive definite. This setting corresponds to a time-invariant system with $x'$ replaced by $Ex'$. Since $c(A - \omega i E)^{-1}b = cL^{-*}(L^{-1}AL^{-*} - \omega i I)^{-1}L^{-1}b$, the discussion of the previous sections applies to the matrix $L^{-1}AL^{-*}$. For simplicity of notation, in the following we shall still refer to $A$, keeping in mind that $A$ should be replaced by $L^{-1}AL^{-*}$ when $E$ is not the identity; similarly for $c$ and $b$. Also, we observe that when solving shifted systems with coefficient matrix $L^{-1}AL^{-*} - s_i I$, we can in fact use $L^{-1}(A - s_i E)L^{-*}$. In case of singular $E$, one could consider building the rational Krylov subspace using, e.g., the matrix $(A - s_i E)^{-1}E$; see, e.g., [26], [27]. The poles are selected accordingly. Let $h_{\rm approx}$ be a reduced transfer function obtained by projection onto a space of dimension $m$, using a projection matrix $V$ with orthonormal columns, namely

    $h_{\rm approx}(\omega) = cV(V^*AV - \omega i I_m)^{-1}(V^*b)$.

The $H_\infty$-norm of the error is given as (cf., e.g., [1, sec. 5.3])

    $\|h - h_{\rm approx}\|_{H_\infty} = \sup_{\omega \in \mathbb{R}} |h(\omega) - h_{\rm approx}(\omega)|$.

The adaptive procedure in Algorithm 1 appears to numerically maintain the high convergence rate obtained by the theoretical Rational Krylov space with $A$ symmetric and a-priori computed weights, as described in [17]; see also [19].
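The quantities above can be evaluated directly, as in the following MATLAB sketch. This is only an illustration: the frequency grid is an assumption of ours, the supremum defining the $H_\infty$ error is merely approximated by the maximum over the sampled frequencies, and the sketch assumes the standard case $E = I$ (for general $E$, the transformation described above applies first).

    % Sketch: evaluate h(w) and h_approx(w) on a frequency grid and estimate
    % the H-infinity error.  A, E, b, c and the orthonormal basis V are assumed
    % to be available in the workspace (E = speye(n) in the standard case).
    omega = logspace(-2, 6, 400);            % illustrative frequency samples
    T  = V' * (A * V);   cV = c * V;   Vb = V' * b;
    m  = size(V, 2);
    err = zeros(size(omega));
    for k = 1:length(omega)
        h_true   = c  * ((A - 1i*omega(k)*E)      \ b);    % full model (expensive)
        h_approx = cV * ((T - 1i*omega(k)*eye(m)) \ Vb);   % reduced model
        err(k)   = abs(h_true - h_approx);
    end
    hinf_err = max(err);                     % sampled estimate of ||h - h_approx||_Hinf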

Fig. 3.1. Diagonal matrix with log-uniformly distributed values in the spectral interval. $H_\infty$-norm error (left), and $|h(\omega) - h_{\rm approx}(\omega)|$ (right) for m = 20. For Rational Krylov, the initial parameters $s_0^{(1)}, s_0^{(2)}$ were estimated from rough spectral bounds.

As an example, we consider a diagonal matrix of size 900 with log-uniformly distributed (eigen)values in an interval $[-1, \lambda_{\max}]$, with $\lambda_{\max} < 0$ very close to zero, and $b = c = \mathbf{1}$. The left plot of Figure 3.1 shows the error $H_\infty$-norm in (3.1) for various projection methods as the space dimension increases: Standard Krylov (using $K_m(A,b)$), Inverted-Krylov (using $K_m(A^{-1},b)$; see, e.g., [1] for their description in context), Extended Krylov (using $\mathbf{K}_m(A,b) = K_m(A,b) + K_m(A^{-1}, A^{-1}b)$) [15], [46], [47], and the Rational Krylov space method, both with a-priori EquiDistributed Sequences of shifts (dash-dotted, only for theoretical purposes) and with adaptive (solid) real shifts.

Fig. 3.2. Diagonal matrix with linearly distributed values in the spectral interval. Error $H_\infty$-norm.

The latter two spaces are the only ones that maintain a good convergence rate, in spite of the large condition number and densely distributed spectrum of the matrix. In particular, we remark that the log-uniform spectral distribution is asymptotically close to $\rho_{[\lambda_{\min},\lambda_{\max}]}(x)$ for large condition numbers [31], i.e., it is asymptotically

optimal for the theoretical EDS-point-based method. However, the adaptive shifts are so well chosen that no significant difference is observed with respect to the former. For the sake of completeness, in the right plot of Figure 3.1 we also report the behavior of the methods for $\omega \in [10^{-10}, 10^{10}]$ and $m = 20$, showing that the other projection methods may behave well locally, cf. the frequency range up to $10^4$, but perform worse over the whole axis. It is also interesting to remark that for spectral measures significantly different from $\rho_{[\lambda_{\min},\lambda_{\max}]}(x)$, the adaptive algorithm is observed to show a faster convergence rate than that with the EDS points. This fact is confirmed by the convergence history in Figure 3.2, where the diagonal matrix $A$ now consists of linearly distributed (eigen)values in the same spectral interval.

Fig. 3.3. Nonsymmetric matrix stemming from (3.2). Left: convergence history of the transfer function approximation for various methods versus space dimension. Right: spectral quantities involved in the analysis (eigenvalues of A, convex hull, adaptive poles, W(A)).

The theory in [17] does not cover the case of nonsymmetric $A$ with complex spectrum. However, we hope that for moderate imaginary parts the convergence behavior will not be very different. We consider computing the next shift by maximizing $1/|r_m|$ over certain intervals, where the zeros of $r_m$ are rational Ritz values, while the poles of $r_m$ are the previously computed shifts. In the left plot of Figure 3.3 we show the $H_\infty$-norm of the error of all methods versus the space dimension, for the matrix $A$ stemming from the centered finite difference discretization of the operator

    $\mathcal{L}(u) = (\exp(-xy)\,u_x)_x + (\exp(xy)\,u_y)_y - (10(x+y)\,u)_x$,   (3.2)

on the unit square with homogeneous Dirichlet boundary conditions. The eigenvalues of $A$ are contained in a rectangular box of the complex plane (cf. right plot of Figure 3.3). The solid line reports the accuracy of the Rational Krylov subspace method on the whole imaginary axis as the space dimension increases, when using complex shifts over the convex hull of the rational Ritz values; the initial values $s_0^{(1)}$ and $s_0^{(2)}$ were computed automatically as rough spectral estimates. For theoretical purposes, we consider generalizing the idea of EDS shifts to the complex case.

Table 3.1
Relevant data information for the test problems

    Pb         n      $\|A\|_F$   condest(A)   $\|E\|_F$   condest(E)
    FOM        1006      ...          ...          Id           Id
    CDPlayer   120       ...          ...          Id           Id
    EADY       598       ...          ...          Id           Id
    FLOW       9669      ...          ...          ...          ...

As it was mentioned already, the EDS for symmetric negative definite problems is a (randomized) approximation of the equilibrium charge (in the left complex half plane) of the plane condenser whose positive and negative plates are, respectively, $A$'s spectral interval and its mirrored reflection with respect to the imaginary axis. The spectral interval of a symmetric matrix is its numerical range, so a natural extension to a nonsymmetric matrix is to consider its numerical range and its mirrored reflection as the condenser's plates. Following this reasoning, a possible rough approximation of the a-priori reference shifts in the numerical range is computed by using 50 points on the spectral convex hull of $A$: these are taken to be the vertices of the convex hull of the complex spectrum of $A$ (cf. right plot of Figure 3.3). The convergence of the rational Krylov method with these precomputed shifts is also reported in the plots of Figure 3.3, as a dash-dotted line. The similarity with the adaptive method in both cases is striking. Indeed, finding such a-priori shifts requires the solution of the full eigenproblem for $A$, whereas computing the adaptive shifts involves negligible additional cost. We also ran the rational Krylov subspace method with points from the border of the mirrored numerical range of $A$, a much larger region of $\mathbb{C}^+$ in this case (cf. right plot of Figure 3.3); however, convergence was significantly slower.

3.1. Numerical experiments. In the following we report on our experience with various nonsymmetric benchmark problems, whose main properties are reported in Table 3.1. For each test case, we show the Bode plot of the true transfer function and of its approximations, and also the error $|h(\omega) - h_{\rm approx}(\omega)|$ in the frequency range of interest (in most cases, this range is part of the data set), keeping in mind that this latter plot only reports on the local behavior of the methods. As in the previous test, we compare the performance of the Standard Krylov, Inverted-Krylov, and Extended Krylov subspace methods with that of the new adaptive Rational Krylov subspace method. To simplify the presentation, we consider approximating the transfer function around zero. Numerical results reported in the literature for this problem typically focus on the performance of the considered methods in terms of accuracy of the transfer function and of size of the projection space; we thus adopt the same comparison criteria here. The selection of the reduced subspace dimension is a crucial step in the overall reduction process. In the following we shall not dwell on this question, as it is common to all projection-type methods; the subspace dimension will thus be fixed a-priori. Nonetheless, in [27, Ch. 5] it is possible to find a more extensive discussion on stopping criteria. In addition to the simple, but possibly expensive, approach of point-wise evaluating $|h(\omega) - h_{\rm approx}(\omega)|$ at some selected points, Grimme analyzes approximations to the error, or rather its trend, by using residuals. Denoting by $x_m(\omega)$ an approximation to $x(\omega) = (A - i\omega I)^{-1}b$ and by $d_m(\omega) = b - (A - i\omega I)x_m(\omega)$

its residual, one can observe that $h(\omega) - h_{\rm approx}(\omega) = c\,x(\omega) - c\,x_m(\omega) = c(A - i\omega I)^{-1}d_m(\omega)$, so that

    $|h(\omega) - h_{\rm approx}(\omega)| \le \|c(A - i\omega I)^{-1}\| \, \|d_m(\omega)\|$.

Therefore, if $\|c(A - i\omega I)^{-1}\|$ is moderate for all $\omega$, the quantity $\|d_m(\omega)\|$ may be used to provide a first estimate of the quality of the approximation. For reduction processes associated with Krylov subspaces, the norm $\|d_m(\omega)\|$ could be obtained quite cheaply as $\omega$ varies, and thus a stopping criterion based on $\|d_m(\omega)\|$ could be included; see, e.g., [27, sec. 5.2].
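A direct (not the cheapest) way to monitor this indicator is sketched below in MATLAB; the frequency samples are an illustrative assumption of ours, and in an actual implementation the Krylov structure would be exploited to evaluate $\|d_m(\omega)\|$ more cheaply.

    % Sketch: residual-based error indicator for the reduced transfer function.
    % x_m(w) = V*(T - i*w*I)^{-1}*(V'*b),  d_m(w) = b - (A - i*w*I)*x_m(w).
    omega = logspace(-2, 6, 100);             % illustrative frequency samples
    T = V' * (A * V);   Vb = V' * b;   m = size(V, 2);
    resnrm = zeros(size(omega));
    for k = 1:length(omega)
        y  = (T - 1i*omega(k)*eye(m)) \ Vb;   % small m-by-m solve
        xm = V * y;                           % approximate x(w)
        resnrm(k) = norm(b - (A*xm - 1i*omega(k)*xm));
    end
    % If ||c*(A - i*w*I)^{-1}|| stays moderate, max(resnrm) gives a first
    % indication of the quality of the current reduced model.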

Fig. 3.4. Example 3.1. FOM data set. Space dimension m = 20. Left: Bode plot. Right: error $|h(\omega) - h_{\rm approx}(\omega)|$ as $\omega$ varies.

Fig. 3.5. Example 3.1. CDPlayer data set. Space dimension m = 20. Left: Bode plot. Right: error $|h(\omega) - h_{\rm approx}(\omega)|$ as $\omega$ varies.

Example 3.1. We consider a few benchmark problems from the NICONET repository ([35]) and the Oberwolfach collection [14]. All relevant information on the problems is reported in Table 3.1. The performance of the methods is shown in the plots of Figures 3.4-3.6, which report the Bode plot of the transfer function (left plots) and the associated errors (right plots) for the reference (small) space dimension $m = 20$. In all cases, the matrices provided in the data set were used; otherwise, we set $c = b$ and $E = I$ where needed. In almost all cases, the frequency range was also provided. For the CDplayer example, the second input and first output were used; otherwise the first input and output were selected. For the CDplayer data set we used $s_0^{(1)} = 10^{-1}$ and $s_0^{(2)} = 800 + \mathrm{i}\,5\cdot 10^4$. For the FOM matrix we used $s_0^{(1)} = 1$ and $s_0^{(2)} = 1000$. Finally, for the Eady data set, we did not use any a-priori information, and we used the estimates $s_0^{(1)} = \|A\|_F/\mathrm{condest}(A)$, and $s_0^{(2)}$ as the eigenvalue with largest real part of $A$ (using Matlab's eigs), with a rough absolute stopping tolerance of $10^{-5}$; cf. also the discussion on initial value selection later in this section. All results consistently confirm the good performance of the rational Krylov subspace method compared with the other projection strategies, for a given final reduced space dimension.

Fig. 3.6. Example 3.1. EADY data set. Space dimension m = 20. Left: Bode plot. Right: error $|h(\omega) - h_{\rm approx}(\omega)|$ as $\omega$ varies.

We next report on the performance of the rational Krylov subspace as a reduction space on a larger problem.

Example 3.2. We consider the nonsymmetric linear system stemming from a 2D convective thermal flow problem (set flow meter model v0.5 in [14]). The involved matrices have size 9669 and we consider the first output vector. The starting shifts $s_0^{(1)}, s_0^{(2)}$ were approximated as $s_0^{(1)} = \|A\|_F/(\mathrm{condest}(A)\,\|E\|_F)$ and $s_0^{(2)} \approx \lambda_{\max}(-A,E)$. All subsequent poles were selected to be real. The results of the approximation for $m = 80$ are shown in the plots of Figure 3.7, except for the Arnoldi method, which did not deliver a useful result because the output vector $c$ appears to be orthogonal to the generated space. The error plot fully confirms our previous findings, showing a consistent error reduction throughout the considered frequency range.
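The rough default estimates used in these examples can be obtained, for instance, as in the following MATLAB sketch; the eigs options shown are an assumption of ours and may have to be adapted to the MATLAB version at hand.

    % Sketch: rough default estimates of the initial values s0_1, s0_2 when no
    % a-priori spectral information is available (cf. the Eady and flow-meter
    % settings above).
    s0_1 = norm(A, 'fro') / condest(A);       % magnitude of the spectral part closest to 0
    lmax = eigs(A, 1, 'largestreal', 'Tolerance', 1e-5);  % eigenvalue with largest real part
    s0_2 = lmax;                              % depending on the sign convention, the
                                              % mirrored value -lmax may be used instead
    % With a mass matrix E, one may use instead
    %   s0_1 = norm(A, 'fro') / (condest(A) * norm(E, 'fro'));
    % and s0_2 as an estimate of lambda_max(-A, E), e.g. eigs(-A, E, 1, 'largestreal').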

Fig. 3.7. Example 3.2. Flow problem, size n = 9669. Space dimension m = 20. Left: Bode plot. Right: error $|h(\omega) - h_{\rm approx}(\omega)|$ as $\omega$ varies.

Our final considerations of this section concern the choice of the initial values $s_0^{(1)}, s_0^{(2)}$. These values are used in the determination of $S_m$ to generate the set of points used to compute the next shift in Algorithm 1; they are also used as the first few poles. Some problems appeared to be sensitive to the accuracy of the initial values as estimates of the spectral region, therefore any relevant a-priori knowledge could be exploited here. As an example, the left plot of Figure 3.8 reports the $H_\infty$ error norm as a function of the space dimension, for different starting values $s_0^{(1)}$, $s_0^{(2)}$ for the FOM dataset. All real poles were selected. The matrix $A$ is normal, with $\Re(\lambda) \in [-1000, -1]$, and the thickest curve corresponds to the selection $s_0^{(1)} = 1$, $s_0^{(2)} = 1000$. Note that the value $s_0^{(1)} = 18$ was obtained with the following rough estimate: $s_0^{(1)} = \|A\|_F/\mathrm{condest}(A)$, which is the default approximation used in all our tests when no spectral information is provided. Figure 3.8 shows that the method seems to perform less well when the real part of the spectral interval is not captured. It is also interesting that the use of complex shifts allows us to recover the best observed convergence rate (cf. right plot of Figure 3.8).

3.2. Comparison with IRKA. We complete our experimental analysis of the reduction of the transfer function by performing a preliminary experimental comparison with the Iterative Rational Krylov Algorithm (IRKA) developed in [28]. For a chosen space dimension $m$, and starting with an initial guess of poles $\mathbf{s}^{(0)} \in \mathbb{C}^m$, IRKA generates a sequence of biorthogonal bases $V$, $W$ for the two rational Krylov subspaces $K_m(A,b,\mathbf{s}^{(j)})$ and $K_m(A^*,c^*,\mathbf{s}^{(j)})$, $j = 1, 2, \ldots$, respectively. At each cycle $j$, the next set of poles $\mathbf{s}^{(j+1)}$ is computed as the mirrored eigenvalues of $W^*AV$. Upon convergence, the last set of poles will coincide with the mirrored eigenvalues of the projected matrix (interpolation points), and first-order necessary conditions for $H_2$ optimality will be satisfied; we refer the reader to [28] for both algorithmic and theoretical details. We consider the CD Player data set (cf. Example 3.1), which was extensively discussed in [28, sec. 5.2]. The starting set of poles for IRKA was chosen randomly as described in [28, sec. 5.2].
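The IRKA cycle just described can be sketched as follows in MATLAB. This is only a simplified illustration of the fixed-point iteration (not the implementation of [28]); the one-sided orthogonalization of the bases, the convergence test, and all variable names (s0, maxcycles, tol) are assumptions of ours.

    % Hedged sketch of the IRKA cycle for a SISO system (A, b, c): bases for
    % the two rational Krylov spaces are built, and the poles are updated as
    % mirrored eigenvalues of the projected matrix.
    s = s0(:);  m = length(s);  n = size(A,1);  I = speye(n);
    for cycle = 1:maxcycles
        V = zeros(n,m);  W = zeros(n,m);
        for j = 1:m
            V(:,j) = (A  - s(j)*I) \ b;            % input-side space
            W(:,j) = (A' - conj(s(j))*I) \ c';     % output-side space
        end
        [V,~] = qr(V,0);  [W,~] = qr(W,0);
        M = (W'*V) \ (W'*(A*V));                   % Petrov-Galerkin projected matrix
        snew = -eig(M);                            % mirrored eigenvalues as new poles
        if norm(sort(snew) - sort(s)) / norm(s) < tol, s = snew; break; end
        s = snew;
    end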

Fig. 3.8. FOM data set. Matrix A is normal, with eigenvalues whose real parts lie in $[-1000, -1]$. Left: dependence of the $H_\infty$ error norm on the choice of the real starting parameters $s_0^{(1)}$, $s_0^{(2)}$ when either of them is moved away from the exact value (all poles are chosen to be real). The thickest curve is for both parameters equal to the exact values $s_0^{(1)} = 1$, $s_0^{(2)} = 1000$. Right: error in the $H_\infty$-norm for real and complex shifts.

For the Rational Krylov Subspace method (RKSM) we selected $s_0^{(1)} = 10^{-1}$ and $s_0^{(2)} = 800 + \mathrm{i}\,5\cdot 10^4$, assuming that a rough knowledge of the spectral region is available, as done for IRKA.

Fig. 3.9. CD Player data. RKSM vs IRKA. Error as a function of frequency, in the range $[10^{-1}, 10^6]$. For IRKA, the number of cycles was 8 (dim=10), 28 (dim=20), 41 (dim=30).

Although our algorithm is expected to perform well in terms of the $H_\infty$ error norm, we do not expect it to satisfy $H_2$ optimality conditions. On the other hand, the cost of RKSM is approximately half the cost of a single cycle of IRKA for the same space dimension, and it is thus more affordable on large-scale problems. The plots of Figure 3.9 report the error $|h(\omega) - h_{\rm approx}(\omega)|$ for $\omega \in [10^{-1}, 10^6]$, for IRKA and RKSM and $m = 10, 20, 30$, while Table 3.2 shows the relevant global error norm information. The IRKA stopping tolerance, based on the relative change of the poles between cycles, was set to a fixed value (2). We observe that IRKA requires 8, 28 and 41 cycles to meet the stopping criterion for $m = 10, 20, 30$, respectively.

(2) We did not notice any difference in the quality of the results, in terms of obtained error norms, for lower tolerances.

Table 3.2
$H_2$, $H_1$ and $H_\infty$ error norms for IRKA and RKSM for different space dimensions.

                         IRKA                            RKSM
    dim   # cycles   $H_2$    $H_1$    $H_\infty$    $H_2$    $H_1$    $H_\infty$
    10    8           ...      ...       ...          ...      ...       ...
    20    28          ...      ...       ...          ...      ...       ...
    30    41          ...      ...       ...          ...      ...       ...

Table 3.3
$H_2$, $H_1$ and $H_\infty$ error norms for IRKA for different space dimensions. Number of cycles to achieve error norms comparable to those of RKSM.

    dim   # cycles   $H_2$    $H_1$    $H_\infty$
    10    ...         ...      ...       ...
    20    ...         ...      ...       ...
    30    ...         ...      ...       ...

We also observe that the number of cycles tends to increase with the subspace dimension. Therefore, in terms of costs, RKSM is significantly superior to IRKA. On the other hand, IRKA does provide better $H_2$ and $H_\infty$ error norms for all space dimensions: the difference with respect to the error in RKSM is of one order of magnitude. Note, however, that this difference is less pronounced in the $H_1$ norm. For completeness, Table 3.3 reports the number of IRKA cycles needed to obtain error norms comparable to those obtained by a single run of RKSM. We stress that the objective of IRKA is to obtain the most accurate reduced order model of a given size, when the cost of subspace generation can be neglected. Such problems arise when a reduced order model will be used many times, for example in optimization routines. RKSM is intended to obtain an approximant of a given accuracy with minimal cost, measured as the number of linear solves. The experiment confirms that both approaches serve best in their respective areas. We also notice that IRKA is based on a two-sided Petrov-Galerkin algorithm, while RKSM is a Galerkin-type algorithm. They coincide for symmetric transfer functions; however, for nonsymmetric ones they are indeed different. IRKA matches $2m$ moments, whereas RKSM matches only $m$, for the sake of preserving passivity. It can be shown that the two-sided Petrov-Galerkin algorithm has a residual formula similar to that of RKSM in (2.2), so that the same adaptive algorithm can in principle be applied to IRKA. Such a possibility can be a subject of future research.

4. The Lyapunov equation. We consider the numerical approximation to the solution of the algebraic equation

    $AX + XA^* + bb^* = 0$.

Consider a subspace $\mathcal{K}$ of $\mathbb{R}^n$, and let $V$ be a matrix whose orthonormal columns span $\mathcal{K}$. We project the Lyapunov equation onto this subspace $\mathcal{K}$: we set $H = V^*AV$, and seek an approximate solution in $\mathcal{K}$ in the form $\widetilde X = VYV^*$. The projected problem may be obtained by imposing that the residual $R = A\widetilde X + \widetilde XA^* + bb^*$ be orthogonal to the approximation space $\mathcal{K}$, the so-called Galerkin condition. In matrix terms,

this can be written as $V^*RV = 0$. Since $V$ has orthonormal columns, this yields the following projected Lyapunov equation

    $V^*AV\,Y + Y\,V^*A^*V + V^*bb^*V = 0$,   (4.1)

whose solution $Y$ defines $\widetilde X$. Since we assume that $A$ is negative definite, $V^*AV$ is stable and equation (4.1) admits a unique solution. Since $Y$ is symmetric and positive (semi-)definite, the approximate solution can be written as $\widetilde X = ZZ^*$ and thus only $Z$ needs to be stored, where the number of columns of $Z$ corresponds to the number of eigenvalues of $Y$ that are larger than, say, the available machine accuracy. The projection is effective if a good approximation is attained for a small dimensional subspace, so that (4.1) can be cheaply solved by means of procedures based on matrix factorizations; cf. [4], [30], [32]. In turn, this effectiveness is more likely to take place whenever the exact solution can be accurately approximated by low-rank matrices. Projection methods explored in the literature differ in the way the space $\mathcal{K}$, and in some cases the matrix $V$, is chosen; these include the standard Krylov subspace $K_m(A,b)$ first proposed in [41], and the much more recent Extended Krylov subspace $\mathbf{K}_m(A,b) = K_m(A,b) + K_m(A^{-1}, A^{-1}b)$ [46]. A related approach is the cyclic low-rank Smith method (see, e.g., [37]), which is an effective implementation of the classical ADI method [22], [33]. A lot of effort has recently been put into the development of a parameter-free and computationally effective cyclic Smith method; see, e.g., [8] and references therein. It is important to realize that the cyclic Smith method determines the solution in a rational Krylov subspace, but in general without imposing an orthogonalization condition [16]. The effectiveness of the computation in the Smith method relies solely on the closeness of the employed parameters to the optimal ones, for which a fast asymptotic convergence rate can be analytically derived [22]. If such accuracy is not attained, then a much larger approximation space for the Smith process may be required, also possibly yielding an approximate solution factor of large rank. This weakness in the parameter selection makes the ADI-type approach inferior to the Extended Krylov subspace method, in terms of memory and computational time, both in the symmetric and nonsymmetric cases; see, e.g., the numerical experiments reported in [46]. In this paper we want to show that the adaptive computation of the parameters in Algorithm 1 allows us to generate an effective rational Krylov subspace, which may compete with the Extended Krylov subspace. In particular, our experiments show that the generated approximation space is always (much) smaller than that computed by the Extended Krylov subspace method. The final space dimension is probably the most explicit indicator of the performance of the Rational Krylov method compared to the Extended one. The latter requires a system solve at every other iteration with the same coefficient matrix; therefore, either the matrix can be factorized once and for all, or an effective preconditioner can be precomputed if an iterative solver is employed. On the other hand, the rational Krylov space requires a different system solve at each iteration, since the shifted matrix changes, so that the number of solves equals the space dimension. All other computations are comparable when $A$ is nonsymmetric. As a result, for a fixed space dimension, the generation of the rational space is more expensive than that of the extended space.
Therefore, the rational Krylov subspace is competitive only if it can generate a sufficiently good approximation earlier, that is, with a smaller space, than the Extended Krylov space. We have seen in the previous sections that this may be the case for the transfer function. In section 4.1 we show that this is so also for some data sets in this context.
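For illustration, the projection step (4.1) and the extraction of a truncated low-rank factor $Z$ can be sketched in MATLAB as follows; the sketch assumes a basis $V$ is already available and uses lyap from the Control System Toolbox (any small dense Lyapunov solver could be substituted), with an illustrative truncation threshold.

    % Sketch: Galerkin projection of A*X + X*A' + b*b' = 0 onto range(V),
    % as in (4.1), and truncated low-rank factor Z with X_m ~ Z*Z'.
    T  = V' * (A * V);            % projected matrix (stable if W(A) is in C^-)
    rb = V' * b;
    Y  = lyap(T, rb * rb');       % small dense solve of T*Y + Y*T' + rb*rb' = 0
    [Q, D] = eig((Y + Y') / 2);   % symmetrize and diagonalize
    d  = diag(D);
    keep = d > eps * max(d);      % drop eigenvalues at machine-precision level
    Z  = V * (Q(:, keep) * diag(sqrt(d(keep))));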

The use of the rational Krylov subspace has an extra theoretical motivation. Let $X_m = VYV^*$ be an approximate solution, where $Y$ solves the reduced equation (4.1). Let $T = V^*AV$ and $e = V^*b$. Then, using the analytic expression for the solution $X$ (see, e.g., [1]), we can write

    $X - X_m = \int_{\mathbb{R}} \left( x(\omega)x(\omega)^* - \tilde x(\omega)\tilde x(\omega)^* \right) d\omega$,

with $x(\omega) = (A - \omega i I)^{-1}b$, $\tilde x(\omega) = V(T - \omega i I)^{-1}e$. This relation emphasizes that the resolvent function $(A - \omega i I)^{-1}$ plays a crucial role also in the approximation of the Lyapunov equation, and in particular, we can expect that the behavior of the error $x(\omega) - \tilde x(\omega)$ for $\omega \in \mathbb{R}$ influences that of $X - X_m$. A convergence analysis of the rational Krylov method with fixed shifts for the Lyapunov equation was recently performed in [16, sec. 4]. In the previous sections we defined the rational Krylov subspace as in (1.1). However, the inclusion of the vector $b$ may be beneficial, and thus for the computation associated with the Lyapunov equation we use

    $K_m(A,b,\mathbf{s}) = \mathrm{span}\{b, (A - s_2 I)^{-1}b, \ldots, \prod_{j=2}^{m}(A - s_j I)^{-1}b\}$.   (4.2)

In particular, if $\|b\| = 1$ and $b$ is the vector used to generate the space, then $b = V_m e_1$, where $V_m$ is the basis for (4.2) obtained by Gram-Schmidt orthogonalization. In the following we shall consider real shifts, therefore all computation is performed in real arithmetic. We introduce a few additional computational details. The computation of $T_m$ can be performed in a more efficient manner than the explicit product $T_m = V_m^*AV_m$, so as to avoid extra matrix-vector multiplications per iteration. In [6], for instance, it was recently suggested to generate the basis by means of $(A - s_j I)^{-1}Av$, $j = 2, 3, \ldots$; see also [29, Ch. 5]. Alternatively, the following proposition describes the procedure suggested by Ruhe, see, e.g., [40], where an extra multiplication by $A$ is only performed to build the projected matrix.

Proposition 4.1 ([40]). Let the columns of $V_m$ be an orthonormal basis of the rational Krylov subspace, let $H_{m+1,m} = [H_m; h_{m+1,m}e_m^*]$ be the matrix of orthogonalization coefficients, and let $D_m = \mathrm{diag}(s_2, s_3, \ldots, s_{m+1})$. With the notation above, it holds that

    $T_m := V_m^*AV_m = (I_m + H_m D_m - V_m^*Av_{m+1}\,h_{m+1,m}\,e_m^*)\,H_m^{-1}$.

The construction described in Proposition 4.1 reveals that this approach requires the knowledge of $s_{m+1}$ in order to compute $T_m$. This simply amounts to postponing step 3 of Algorithm 1 to after step 6. The following proposition shows that the norm associated with the current residual can be computed at low cost.

Proposition 4.2. Let $\mathcal{K}$ be the rational Krylov subspace in (4.2) of dimension $m$, and let $V_m \in \mathbb{R}^{n\times m}$ contain an orthonormal basis of $\mathcal{K}$. Let $X_m = V_m Y_m V_m^*$ be the approximate solution obtained in $\mathcal{K}$. With the same notation as in Proposition 4.1, the residual $R_m = AX_m + X_mA^* + bb^*$ satisfies

    $\|R_m\|_F = \|SJS^*\|_F$,   $J = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}$,
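For completeness, the update of Proposition 4.1 can be sketched in MATLAB as follows; H, D and V are assumed to hold, respectively, the orthogonalization coefficients $H_{m+1,m}$, the diagonal matrix $\mathrm{diag}(s_2,\ldots,s_{m+1})$ and the current basis $[V_m, v_{m+1}]$ (the variable names are ours).

    % Sketch of Proposition 4.1: recover T_m = V_m'*A*V_m from the
    % orthogonalization coefficients, using a single product A*v_{m+1}.
    m  = size(H, 2);              % H is the (m+1)-by-m coefficient matrix
    Hm = H(1:m, 1:m);
    em = zeros(m, 1);  em(m) = 1;
    w  = V(:, 1:m)' * (A * V(:, m+1));       % the only extra multiplication by A
    T  = (eye(m) + Hm*D - w * H(m+1, m) * em') / Hm;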


More information

An Arnoldi Method for Nonlinear Symmetric Eigenvalue Problems

An Arnoldi Method for Nonlinear Symmetric Eigenvalue Problems An Arnoldi Method for Nonlinear Symmetric Eigenvalue Problems H. Voss 1 Introduction In this paper we consider the nonlinear eigenvalue problem T (λ)x = 0 (1) where T (λ) R n n is a family of symmetric

More information

Deep Linear Networks with Arbitrary Loss: All Local Minima Are Global

Deep Linear Networks with Arbitrary Loss: All Local Minima Are Global homas Laurent * 1 James H. von Brecht * 2 Abstract We consider deep linear networks with arbitrary convex differentiable loss. We provide a short and elementary proof of the fact that all local minima

More information

4.8 Arnoldi Iteration, Krylov Subspaces and GMRES

4.8 Arnoldi Iteration, Krylov Subspaces and GMRES 48 Arnoldi Iteration, Krylov Subspaces and GMRES We start with the problem of using a similarity transformation to convert an n n matrix A to upper Hessenberg form H, ie, A = QHQ, (30) with an appropriate

More information

Introduction. Chapter One

Introduction. Chapter One Chapter One Introduction The aim of this book is to describe and explain the beautiful mathematical relationships between matrices, moments, orthogonal polynomials, quadrature rules and the Lanczos and

More information

SUMMARY OF MATH 1600

SUMMARY OF MATH 1600 SUMMARY OF MATH 1600 Note: The following list is intended as a study guide for the final exam. It is a continuation of the study guide for the midterm. It does not claim to be a comprehensive list. You

More information

FEM and sparse linear system solving

FEM and sparse linear system solving FEM & sparse linear system solving, Lecture 9, Nov 19, 2017 1/36 Lecture 9, Nov 17, 2017: Krylov space methods http://people.inf.ethz.ch/arbenz/fem17 Peter Arbenz Computer Science Department, ETH Zürich

More information

University of Maryland Department of Computer Science TR-5009 University of Maryland Institute for Advanced Computer Studies TR April 2012

University of Maryland Department of Computer Science TR-5009 University of Maryland Institute for Advanced Computer Studies TR April 2012 University of Maryland Department of Computer Science TR-5009 University of Maryland Institute for Advanced Computer Studies TR-202-07 April 202 LYAPUNOV INVERSE ITERATION FOR COMPUTING A FEW RIGHTMOST

More information

DELFT UNIVERSITY OF TECHNOLOGY

DELFT UNIVERSITY OF TECHNOLOGY DELFT UNIVERSITY OF TECHNOLOGY REPORT -09 Computational and Sensitivity Aspects of Eigenvalue-Based Methods for the Large-Scale Trust-Region Subproblem Marielba Rojas, Bjørn H. Fotland, and Trond Steihaug

More information

On Solving Large Algebraic. Riccati Matrix Equations

On Solving Large Algebraic. Riccati Matrix Equations International Mathematical Forum, 5, 2010, no. 33, 1637-1644 On Solving Large Algebraic Riccati Matrix Equations Amer Kaabi Department of Basic Science Khoramshahr Marine Science and Technology University

More information

ECS231 Handout Subspace projection methods for Solving Large-Scale Eigenvalue Problems. Part I: Review of basic theory of eigenvalue problems

ECS231 Handout Subspace projection methods for Solving Large-Scale Eigenvalue Problems. Part I: Review of basic theory of eigenvalue problems ECS231 Handout Subspace projection methods for Solving Large-Scale Eigenvalue Problems Part I: Review of basic theory of eigenvalue problems 1. Let A C n n. (a) A scalar λ is an eigenvalue of an n n A

More information

Course Notes: Week 1

Course Notes: Week 1 Course Notes: Week 1 Math 270C: Applied Numerical Linear Algebra 1 Lecture 1: Introduction (3/28/11) We will focus on iterative methods for solving linear systems of equations (and some discussion of eigenvalues

More information

Introduction to Iterative Solvers of Linear Systems

Introduction to Iterative Solvers of Linear Systems Introduction to Iterative Solvers of Linear Systems SFB Training Event January 2012 Prof. Dr. Andreas Frommer Typeset by Lukas Krämer, Simon-Wolfgang Mages and Rudolf Rödl 1 Classes of Matrices and their

More information

6.4 Krylov Subspaces and Conjugate Gradients

6.4 Krylov Subspaces and Conjugate Gradients 6.4 Krylov Subspaces and Conjugate Gradients Our original equation is Ax = b. The preconditioned equation is P Ax = P b. When we write P, we never intend that an inverse will be explicitly computed. P

More information

Matrices and Vectors. Definition of Matrix. An MxN matrix A is a two-dimensional array of numbers A =

Matrices and Vectors. Definition of Matrix. An MxN matrix A is a two-dimensional array of numbers A = 30 MATHEMATICS REVIEW G A.1.1 Matrices and Vectors Definition of Matrix. An MxN matrix A is a two-dimensional array of numbers A = a 11 a 12... a 1N a 21 a 22... a 2N...... a M1 a M2... a MN A matrix can

More information

On fast trust region methods for quadratic models with linear constraints. M.J.D. Powell

On fast trust region methods for quadratic models with linear constraints. M.J.D. Powell DAMTP 2014/NA02 On fast trust region methods for quadratic models with linear constraints M.J.D. Powell Abstract: Quadratic models Q k (x), x R n, of the objective function F (x), x R n, are used by many

More information

Principal Component Analysis

Principal Component Analysis Machine Learning Michaelmas 2017 James Worrell Principal Component Analysis 1 Introduction 1.1 Goals of PCA Principal components analysis (PCA) is a dimensionality reduction technique that can be used

More information

MATH 20F: LINEAR ALGEBRA LECTURE B00 (T. KEMP)

MATH 20F: LINEAR ALGEBRA LECTURE B00 (T. KEMP) MATH 20F: LINEAR ALGEBRA LECTURE B00 (T KEMP) Definition 01 If T (x) = Ax is a linear transformation from R n to R m then Nul (T ) = {x R n : T (x) = 0} = Nul (A) Ran (T ) = {Ax R m : x R n } = {b R m

More information

EXTENDED KRYLOV SUBSPACE FOR PARAMETER DEPENDENT SYSTEMS

EXTENDED KRYLOV SUBSPACE FOR PARAMETER DEPENDENT SYSTEMS EXTENDED KRYLOV SUBSPACE FOR PARAMETER DEPENDENT SYSTEMS V. SIMONCINI Abstract. The Extended Krylov Subspace has recently received considerable attention as a powerful tool for matrix function evaluations

More information

A PRIMER ON SESQUILINEAR FORMS

A PRIMER ON SESQUILINEAR FORMS A PRIMER ON SESQUILINEAR FORMS BRIAN OSSERMAN This is an alternative presentation of most of the material from 8., 8.2, 8.3, 8.4, 8.5 and 8.8 of Artin s book. Any terminology (such as sesquilinear form

More information

15 Singular Value Decomposition

15 Singular Value Decomposition 15 Singular Value Decomposition For any high-dimensional data analysis, one s first thought should often be: can I use an SVD? The singular value decomposition is an invaluable analysis tool for dealing

More information

Stabilization and Acceleration of Algebraic Multigrid Method

Stabilization and Acceleration of Algebraic Multigrid Method Stabilization and Acceleration of Algebraic Multigrid Method Recursive Projection Algorithm A. Jemcov J.P. Maruszewski Fluent Inc. October 24, 2006 Outline 1 Need for Algorithm Stabilization and Acceleration

More information

Algorithms that use the Arnoldi Basis

Algorithms that use the Arnoldi Basis AMSC 600 /CMSC 760 Advanced Linear Numerical Analysis Fall 2007 Arnoldi Methods Dianne P. O Leary c 2006, 2007 Algorithms that use the Arnoldi Basis Reference: Chapter 6 of Saad The Arnoldi Basis How to

More information

Chapter 3 Transformations

Chapter 3 Transformations Chapter 3 Transformations An Introduction to Optimization Spring, 2014 Wei-Ta Chu 1 Linear Transformations A function is called a linear transformation if 1. for every and 2. for every If we fix the bases

More information

Solution of eigenvalue problems. Subspace iteration, The symmetric Lanczos algorithm. Harmonic Ritz values, Jacobi-Davidson s method

Solution of eigenvalue problems. Subspace iteration, The symmetric Lanczos algorithm. Harmonic Ritz values, Jacobi-Davidson s method Solution of eigenvalue problems Introduction motivation Projection methods for eigenvalue problems Subspace iteration, The symmetric Lanczos algorithm Nonsymmetric Lanczos procedure; Implicit restarts

More information

Matrix Functions and their Approximation by. Polynomial methods

Matrix Functions and their Approximation by. Polynomial methods [ 1 / 48 ] University of Cyprus Matrix Functions and their Approximation by Polynomial Methods Stefan Güttel stefan@guettel.com Nicosia, 7th April 2006 Matrix functions Polynomial methods [ 2 / 48 ] University

More information

IDR(s) as a projection method

IDR(s) as a projection method Delft University of Technology Faculty of Electrical Engineering, Mathematics and Computer Science Delft Institute of Applied Mathematics IDR(s) as a projection method A thesis submitted to the Delft Institute

More information

Applied Mathematics 205. Unit II: Numerical Linear Algebra. Lecturer: Dr. David Knezevic

Applied Mathematics 205. Unit II: Numerical Linear Algebra. Lecturer: Dr. David Knezevic Applied Mathematics 205 Unit II: Numerical Linear Algebra Lecturer: Dr. David Knezevic Unit II: Numerical Linear Algebra Chapter II.3: QR Factorization, SVD 2 / 66 QR Factorization 3 / 66 QR Factorization

More information

Summary of Iterative Methods for Non-symmetric Linear Equations That Are Related to the Conjugate Gradient (CG) Method

Summary of Iterative Methods for Non-symmetric Linear Equations That Are Related to the Conjugate Gradient (CG) Method Summary of Iterative Methods for Non-symmetric Linear Equations That Are Related to the Conjugate Gradient (CG) Method Leslie Foster 11-5-2012 We will discuss the FOM (full orthogonalization method), CG,

More information

Lecture notes: Applied linear algebra Part 1. Version 2

Lecture notes: Applied linear algebra Part 1. Version 2 Lecture notes: Applied linear algebra Part 1. Version 2 Michael Karow Berlin University of Technology karow@math.tu-berlin.de October 2, 2008 1 Notation, basic notions and facts 1.1 Subspaces, range and

More information

Key words. conjugate gradients, normwise backward error, incremental norm estimation.

Key words. conjugate gradients, normwise backward error, incremental norm estimation. Proceedings of ALGORITMY 2016 pp. 323 332 ON ERROR ESTIMATION IN THE CONJUGATE GRADIENT METHOD: NORMWISE BACKWARD ERROR PETR TICHÝ Abstract. Using an idea of Duff and Vömel [BIT, 42 (2002), pp. 300 322

More information

Multigrid absolute value preconditioning

Multigrid absolute value preconditioning Multigrid absolute value preconditioning Eugene Vecharynski 1 Andrew Knyazev 2 (speaker) 1 Department of Computer Science and Engineering University of Minnesota 2 Department of Mathematical and Statistical

More information

Iterative methods for Linear System

Iterative methods for Linear System Iterative methods for Linear System JASS 2009 Student: Rishi Patil Advisor: Prof. Thomas Huckle Outline Basics: Matrices and their properties Eigenvalues, Condition Number Iterative Methods Direct and

More information

Review problems for MA 54, Fall 2004.

Review problems for MA 54, Fall 2004. Review problems for MA 54, Fall 2004. Below are the review problems for the final. They are mostly homework problems, or very similar. If you are comfortable doing these problems, you should be fine on

More information

Valeria Simoncini and Daniel B. Szyld. Report October 2009

Valeria Simoncini and Daniel B. Szyld. Report October 2009 Interpreting IDR as a Petrov-Galerkin method Valeria Simoncini and Daniel B. Szyld Report 09-10-22 October 2009 This report is available in the World Wide Web at http://www.math.temple.edu/~szyld INTERPRETING

More information

Arnoldi Methods in SLEPc

Arnoldi Methods in SLEPc Scalable Library for Eigenvalue Problem Computations SLEPc Technical Report STR-4 Available at http://slepc.upv.es Arnoldi Methods in SLEPc V. Hernández J. E. Román A. Tomás V. Vidal Last update: October,

More information

MATH 583A REVIEW SESSION #1

MATH 583A REVIEW SESSION #1 MATH 583A REVIEW SESSION #1 BOJAN DURICKOVIC 1. Vector Spaces Very quick review of the basic linear algebra concepts (see any linear algebra textbook): (finite dimensional) vector space (or linear space),

More information

Hessenberg eigenvalue eigenvector relations and their application to the error analysis of finite precision Krylov subspace methods

Hessenberg eigenvalue eigenvector relations and their application to the error analysis of finite precision Krylov subspace methods Hessenberg eigenvalue eigenvector relations and their application to the error analysis of finite precision Krylov subspace methods Jens Peter M. Zemke Minisymposium on Numerical Linear Algebra Technical

More information

EXTENDED KRYLOV SUBSPACE FOR PARAMETER DEPENDENT SYSTEMS

EXTENDED KRYLOV SUBSPACE FOR PARAMETER DEPENDENT SYSTEMS EXTENDED KRYLOV SUBSPACE FOR PARAMETER DEPENDENT SYSTEMS V. SIMONCINI Abstract. The Extended Krylov Subspace has recently received considerable attention as a powerful tool for matrix function evaluations

More information

7 Principal Component Analysis

7 Principal Component Analysis 7 Principal Component Analysis This topic will build a series of techniques to deal with high-dimensional data. Unlike regression problems, our goal is not to predict a value (the y-coordinate), it is

More information

A Chebyshev-based two-stage iterative method as an alternative to the direct solution of linear systems

A Chebyshev-based two-stage iterative method as an alternative to the direct solution of linear systems A Chebyshev-based two-stage iterative method as an alternative to the direct solution of linear systems Mario Arioli m.arioli@rl.ac.uk CCLRC-Rutherford Appleton Laboratory with Daniel Ruiz (E.N.S.E.E.I.H.T)

More information

Iterative Rational Krylov Algorithm for Unstable Dynamical Systems and Generalized Coprime Factorizations

Iterative Rational Krylov Algorithm for Unstable Dynamical Systems and Generalized Coprime Factorizations Iterative Rational Krylov Algorithm for Unstable Dynamical Systems and Generalized Coprime Factorizations Klajdi Sinani Thesis submitted to the Faculty of the Virginia Polytechnic Institute and State University

More information

DELFT UNIVERSITY OF TECHNOLOGY

DELFT UNIVERSITY OF TECHNOLOGY DELFT UNIVERSITY OF TECHNOLOGY REPORT 14-1 Nested Krylov methods for shifted linear systems M. Baumann and M. B. van Gizen ISSN 1389-652 Reports of the Delft Institute of Applied Mathematics Delft 214

More information

ON ORTHOGONAL REDUCTION TO HESSENBERG FORM WITH SMALL BANDWIDTH

ON ORTHOGONAL REDUCTION TO HESSENBERG FORM WITH SMALL BANDWIDTH ON ORTHOGONAL REDUCTION TO HESSENBERG FORM WITH SMALL BANDWIDTH V. FABER, J. LIESEN, AND P. TICHÝ Abstract. Numerous algorithms in numerical linear algebra are based on the reduction of a given matrix

More information

Math 504 (Fall 2011) 1. (*) Consider the matrices

Math 504 (Fall 2011) 1. (*) Consider the matrices Math 504 (Fall 2011) Instructor: Emre Mengi Study Guide for Weeks 11-14 This homework concerns the following topics. Basic definitions and facts about eigenvalues and eigenvectors (Trefethen&Bau, Lecture

More information

Exploiting off-diagonal rank structures in the solution of linear matrix equations

Exploiting off-diagonal rank structures in the solution of linear matrix equations Stefano Massei Exploiting off-diagonal rank structures in the solution of linear matrix equations Based on joint works with D. Kressner (EPFL), M. Mazza (IPP of Munich), D. Palitta (IDCTS of Magdeburg)

More information

BASIC ALGORITHMS IN LINEAR ALGEBRA. Matrices and Applications of Gaussian Elimination. A 2 x. A T m x. A 1 x A T 1. A m x

BASIC ALGORITHMS IN LINEAR ALGEBRA. Matrices and Applications of Gaussian Elimination. A 2 x. A T m x. A 1 x A T 1. A m x BASIC ALGORITHMS IN LINEAR ALGEBRA STEVEN DALE CUTKOSKY Matrices and Applications of Gaussian Elimination Systems of Equations Suppose that A is an n n matrix with coefficents in a field F, and x = (x,,

More information

IDR(s) Master s thesis Goushani Kisoensingh. Supervisor: Gerard L.G. Sleijpen Department of Mathematics Universiteit Utrecht

IDR(s) Master s thesis Goushani Kisoensingh. Supervisor: Gerard L.G. Sleijpen Department of Mathematics Universiteit Utrecht IDR(s) Master s thesis Goushani Kisoensingh Supervisor: Gerard L.G. Sleijpen Department of Mathematics Universiteit Utrecht Contents 1 Introduction 2 2 The background of Bi-CGSTAB 3 3 IDR(s) 4 3.1 IDR.............................................

More information

Linear Algebra and Eigenproblems

Linear Algebra and Eigenproblems Appendix A A Linear Algebra and Eigenproblems A working knowledge of linear algebra is key to understanding many of the issues raised in this work. In particular, many of the discussions of the details

More information

Computational Linear Algebra

Computational Linear Algebra Computational Linear Algebra PD Dr. rer. nat. habil. Ralf-Peter Mundani Computation in Engineering / BGU Scientific Computing in Computer Science / INF Winter Term 2018/19 Part 4: Iterative Methods PD

More information

Mathematical Optimisation, Chpt 2: Linear Equations and inequalities

Mathematical Optimisation, Chpt 2: Linear Equations and inequalities Mathematical Optimisation, Chpt 2: Linear Equations and inequalities Peter J.C. Dickinson p.j.c.dickinson@utwente.nl http://dickinson.website version: 12/02/18 Monday 5th February 2018 Peter J.C. Dickinson

More information

ITERATIVE METHODS BASED ON KRYLOV SUBSPACES

ITERATIVE METHODS BASED ON KRYLOV SUBSPACES ITERATIVE METHODS BASED ON KRYLOV SUBSPACES LONG CHEN We shall present iterative methods for solving linear algebraic equation Au = b based on Krylov subspaces We derive conjugate gradient (CG) method

More information

Computational Linear Algebra

Computational Linear Algebra Computational Linear Algebra PD Dr. rer. nat. habil. Ralf Peter Mundani Computation in Engineering / BGU Scientific Computing in Computer Science / INF Winter Term 2017/18 Part 3: Iterative Methods PD

More information

Math 307 Learning Goals. March 23, 2010

Math 307 Learning Goals. March 23, 2010 Math 307 Learning Goals March 23, 2010 Course Description The course presents core concepts of linear algebra by focusing on applications in Science and Engineering. Examples of applications from recent

More information

B553 Lecture 5: Matrix Algebra Review

B553 Lecture 5: Matrix Algebra Review B553 Lecture 5: Matrix Algebra Review Kris Hauser January 19, 2012 We have seen in prior lectures how vectors represent points in R n and gradients of functions. Matrices represent linear transformations

More information

APPENDIX A. Background Mathematics. A.1 Linear Algebra. Vector algebra. Let x denote the n-dimensional column vector with components x 1 x 2.

APPENDIX A. Background Mathematics. A.1 Linear Algebra. Vector algebra. Let x denote the n-dimensional column vector with components x 1 x 2. APPENDIX A Background Mathematics A. Linear Algebra A.. Vector algebra Let x denote the n-dimensional column vector with components 0 x x 2 B C @. A x n Definition 6 (scalar product). The scalar product

More information

EE731 Lecture Notes: Matrix Computations for Signal Processing

EE731 Lecture Notes: Matrix Computations for Signal Processing EE731 Lecture Notes: Matrix Computations for Signal Processing James P. Reilly c Department of Electrical and Computer Engineering McMaster University September 22, 2005 0 Preface This collection of ten

More information

Part III. 10 Topological Space Basics. Topological Spaces

Part III. 10 Topological Space Basics. Topological Spaces Part III 10 Topological Space Basics Topological Spaces Using the metric space results above as motivation we will axiomatize the notion of being an open set to more general settings. Definition 10.1.

More information

1. General Vector Spaces

1. General Vector Spaces 1.1. Vector space axioms. 1. General Vector Spaces Definition 1.1. Let V be a nonempty set of objects on which the operations of addition and scalar multiplication are defined. By addition we mean a rule

More information

Iterative solvers for saddle point algebraic linear systems: tools of the trade. V. Simoncini

Iterative solvers for saddle point algebraic linear systems: tools of the trade. V. Simoncini Iterative solvers for saddle point algebraic linear systems: tools of the trade V. Simoncini Dipartimento di Matematica, Università di Bologna and CIRSA, Ravenna, Italy valeria@dm.unibo.it 1 The problem

More information

The quadratic eigenvalue problem (QEP) is to find scalars λ and nonzero vectors u satisfying

The quadratic eigenvalue problem (QEP) is to find scalars λ and nonzero vectors u satisfying I.2 Quadratic Eigenvalue Problems 1 Introduction The quadratic eigenvalue problem QEP is to find scalars λ and nonzero vectors u satisfying where Qλx = 0, 1.1 Qλ = λ 2 M + λd + K, M, D and K are given

More information

October 25, 2013 INNER PRODUCT SPACES

October 25, 2013 INNER PRODUCT SPACES October 25, 2013 INNER PRODUCT SPACES RODICA D. COSTIN Contents 1. Inner product 2 1.1. Inner product 2 1.2. Inner product spaces 4 2. Orthogonal bases 5 2.1. Existence of an orthogonal basis 7 2.2. Orthogonal

More information

EINDHOVEN UNIVERSITY OF TECHNOLOGY Department of Mathematics and Computer Science. CASA-Report November 2013

EINDHOVEN UNIVERSITY OF TECHNOLOGY Department of Mathematics and Computer Science. CASA-Report November 2013 EINDHOVEN UNIVERSITY OF TECHNOLOGY Department of Mathematics and Computer Science CASA-Report 13-26 November 213 Polynomial optimization and a Jacobi-Davidson type method for commuting matrices by I.W.M.

More information

Typical Problem: Compute.

Typical Problem: Compute. Math 2040 Chapter 6 Orhtogonality and Least Squares 6.1 and some of 6.7: Inner Product, Length and Orthogonality. Definition: If x, y R n, then x y = x 1 y 1 +... + x n y n is the dot product of x and

More information

Linear Regression and Its Applications

Linear Regression and Its Applications Linear Regression and Its Applications Predrag Radivojac October 13, 2014 Given a data set D = {(x i, y i )} n the objective is to learn the relationship between features and the target. We usually start

More information

Reduction of Finite Element Models of Complex Mechanical Components

Reduction of Finite Element Models of Complex Mechanical Components Reduction of Finite Element Models of Complex Mechanical Components Håkan Jakobsson Research Assistant hakan.jakobsson@math.umu.se Mats G. Larson Professor Applied Mathematics mats.larson@math.umu.se Department

More information

Ir O D = D = ( ) Section 2.6 Example 1. (Bottom of page 119) dim(v ) = dim(l(v, W )) = dim(v ) dim(f ) = dim(v )

Ir O D = D = ( ) Section 2.6 Example 1. (Bottom of page 119) dim(v ) = dim(l(v, W )) = dim(v ) dim(f ) = dim(v ) Section 3.2 Theorem 3.6. Let A be an m n matrix of rank r. Then r m, r n, and, by means of a finite number of elementary row and column operations, A can be transformed into the matrix ( ) Ir O D = 1 O

More information

Mathematical foundations - linear algebra

Mathematical foundations - linear algebra Mathematical foundations - linear algebra Andrea Passerini passerini@disi.unitn.it Machine Learning Vector space Definition (over reals) A set X is called a vector space over IR if addition and scalar

More information

Elementary linear algebra

Elementary linear algebra Chapter 1 Elementary linear algebra 1.1 Vector spaces Vector spaces owe their importance to the fact that so many models arising in the solutions of specific problems turn out to be vector spaces. The

More information