$\mathcal{H}_2$-optimal model reduction of MIMO systems

P. Van Dooren, K. A. Gallivan, P.-A. Absil

Abstract
We consider the problem of approximating a $p\times m$ rational transfer function $H(s)$ of high degree by another $p\times m$ rational transfer function $\hat H(s)$ of much smaller degree. We derive the gradients of the $\mathcal{H}_2$-norm of the approximation error and show how stationary points can be described via tangential interpolation.

Keywords: Multivariable systems, model reduction, optimal $\mathcal{H}_2$ approximation, tangential interpolation.

1 Introduction
In this paper we consider the problem of approximating a real $p\times m$ rational transfer function $H(s)$ of McMillan degree $N$ by a real $p\times m$ rational transfer function $\hat H(s)$ of lower McMillan degree $n$, using the $\mathcal{H}_2$-norm as approximation criterion. A transfer function has an unbounded $\mathcal{H}_2$-norm if it is not proper (here a rational transfer function is called proper if it is zero at $s=\infty$). We therefore constrain both $H(s)$ and $\hat H(s)$ to be proper. Such transfer functions have state-space realizations
$$(A,B,C)\in\mathbb{R}^{N\times N}\times\mathbb{R}^{N\times m}\times\mathbb{R}^{p\times N},\qquad (\hat A,\hat B,\hat C)\in\mathbb{R}^{n\times n}\times\mathbb{R}^{n\times m}\times\mathbb{R}^{p\times n}$$
satisfying
$$H(s) := C(sI_N-A)^{-1}B,\qquad \hat H(s) := \hat C(sI_n-\hat A)^{-1}\hat B.$$
The realization $\{\hat A,\hat B,\hat C\}$ is not unique. For any matrix $T\in GL(n,\mathbb{R})$, the triple
$$\{\hat A_T,\hat B_T,\hat C_T\} := \{T^{-1}\hat AT,\; T^{-1}\hat B,\; \hat CT\}$$
defines the same transfer function:
$$\hat H(s) = \hat C(sI_n-\hat A)^{-1}\hat B = \hat C_T(sI_n-\hat A_T)^{-1}\hat B_T.$$
It is known (see e.g. Theorem 4.7 in Byrnes and Falb [3]) that the geometric quotient of $\mathbb{R}^{n\times n}\times\mathbb{R}^{n\times m}\times\mathbb{R}^{p\times n}$ under $GL(n,\mathbb{R})$ is a smooth, irreducible variety of dimension $n(m+p)$. This implies that the set $\mathrm{Rat}_n^{p,m}$ of $p\times m$ proper rational transfer functions of degree $n$ can be parameterized, in a locally smooth manner, with only $n(m+p)$ real parameters.
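The non-uniqueness of the realization is easy to observe numerically. The sketch below assumes NumPy. The helper `transfer`, the sizes, the evaluation point and the random data are hypothetical choices made only for illustration. It evaluates $\hat C(sI_n-\hat A)^{-1}\hat B$ for a random triple and for its transformed triple $\{T^{-1}\hat AT, T^{-1}\hat B, \hat CT\}$ at an arbitrary complex point. It then compares the $n(m+p)$ generic parameter count with the number of matrix entries.

```python
import numpy as np

def transfer(A, B, C, s):
    """Evaluate C (sI - A)^{-1} B at a complex point s (s must not be an eigenvalue of A)."""
    return C @ np.linalg.solve(s * np.eye(A.shape[0]) - A, B)

rng = np.random.default_rng(0)
n, m, p = 4, 2, 3
Ah = rng.standard_normal((n, n))
Bh = rng.standard_normal((n, m))
Ch = rng.standard_normal((p, n))

# Random change of state coordinates; the diagonal shift is an assumption that
# keeps T safely invertible for typical draws.
T = rng.standard_normal((n, n)) + 3.0 * np.eye(n)
Tinv = np.linalg.inv(T)
AT, BT, CT = Tinv @ Ah @ T, Tinv @ Bh, Ch @ T   # {T^{-1} Â T, T^{-1} B̂, Ĉ T}

s = 0.3 + 1.7j
H1 = transfer(Ah, Bh, Ch, s)
H2 = transfer(AT, BT, CT, s)
print("max |H1 - H2| =", np.max(np.abs(H1 - H2)))
print("n(m+p) =", n * (m + p), "versus", n * n + n * m + n * p, "matrix entries")
```

The gap between the 36 entries of $(\hat A,\hat B,\hat C)$ and the 20 parameters is exactly the $n^2$ degrees of freedom of $T$.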
A possible approach for building a reduced-order model $\{\hat A,\hat B,\hat C\}$ from a full-order model $\{A,B,C\}$ is tangential interpolation. It can always be achieved (see [4]) by solving two Sylvester equations for the unknowns $W,V\in\mathbb{R}^{N\times n}$,
$$AV - V\Sigma_\sigma + BR = 0, \qquad (2)$$
$$W^TA - \Sigma_\mu^TW^T + L^TC = 0, \qquad (3)$$
and constructing the reduced-order model of degree $n$ as follows:
$$\{\hat A,\hat B,\hat C\} := \{(W^TV)^{-1}W^TAV,\; (W^TV)^{-1}W^TB,\; CV\}, \qquad (4)$$
provided the matrix $W^TV$ is invertible (which also implies that $V$ and $W$ must have full rank $n$).

The interpolation conditions $\{\Sigma_\sigma,R\}$ and $\{\Sigma_\mu,L\}$, where $\Sigma_\mu,\Sigma_\sigma\in\mathbb{R}^{n\times n}$, $R\in\mathbb{R}^{m\times n}$ and $L\in\mathbb{R}^{p\times n}$, are known to uniquely determine the projected system $\{\hat A,\hat B,\hat C\}$ [4]. The equations above can be expressed in another coordinate system by applying invertible transformations of the type $\{Q^{-1}\Sigma_\sigma Q, RQ\}$ and $\{P^{-1}\Sigma_\mu P, LP\}$ to the interpolation conditions. This yields transformed matrices $VQ$ and $WP$ but does not affect the transfer function of the reduced-order model $\{\hat A,\hat B,\hat C\}$ (see [4]). Therefore, the interpolation conditions essentially impose $n(m+p)$ real conditions, since $\Sigma_\sigma$ and $\Sigma_\mu$ can be transformed to their Jordan canonical form.

In the case that both matrices are simple (no Jordan blocks of size larger than 1), we can assume $\Sigma_\sigma$ and $\Sigma_\mu$ to be block diagonal. Each real condition contributes a diagonal block $\sigma_i$ (or $\mu_i$). Each pair of complex conjugate conditions contributes a $2\times 2$ diagonal block
$$\begin{bmatrix}\sigma_i & \sigma_{i+1}\\ -\sigma_{i+1} & \sigma_i\end{bmatrix}\quad\text{or}\quad\begin{bmatrix}\mu_i & \mu_{i+1}\\ -\mu_{i+1} & \mu_i\end{bmatrix}.$$
We refer to [1] for a more elaborate discussion of this point and for a discrete-time version of the results of this paper.

In this paper we first compute the gradients of the $\mathcal{H}_2$ error of the approximation problem. We then show that its stationary points satisfy special tangential interpolation conditions. These conditions generalize earlier results for SISO systems and help in understanding numerical algorithms that solve this model reduction problem.

Corresponding author. E-mail address: paul.vandooren@uclouvain.be
CESAME, Université catholique de Louvain, B-1348 Louvain-la-Neuve, Belgium.
School of Computational Science, Florida State University, Tallahassee FL 32306, USA.
This paper presents research supported by the Belgian Network DYSCO (Dynamical Systems, Control, and Optimization), funded by the Interuniversity Attraction Poles Programme, initiated by the Belgian State, Science Policy Office, and by the National Science Foundation under contract OCI-03-24944. The scientific responsibility rests with its authors.

2 The $\mathcal{H}_2$ approximation problem
Let $E(s)$ be an arbitrary proper transfer function, with realization triple $\{A_e,B_e,C_e\}$. If $E(s)$ is unstable, its $\mathcal{H}_2$-norm is defined to be $\infty$. Otherwise, its squared $\mathcal{H}_2$-norm is defined as the trace of a matrix integral [2]:
$$J := \|E(s)\|_{\mathcal{H}_2}^2 := \mathrm{tr}\int_{-\infty}^{\infty}E(j\omega)^HE(j\omega)\,\frac{d\omega}{2\pi} = \mathrm{tr}\int_{-\infty}^{\infty}E(j\omega)E(j\omega)^H\,\frac{d\omega}{2\pi}.$$
By Parseval's identity, this can also be expressed using the state-space realization as (see [2])
$$J = \mathrm{tr}\int_0^\infty[C_e\exp(A_et)B_e][C_e\exp(A_et)B_e]^T\,dt = \mathrm{tr}\int_0^\infty[C_e\exp(A_et)B_e]^T[C_e\exp(A_et)B_e]\,dt.$$
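The two expressions for $J$ can be compared numerically on a small example. The sketch below assumes NumPy and a hypothetical random stable test system. $A$ is built so that $A+A^T$ is negative definite, which guarantees stability and $\|e^{At}\|\le e^{-t}$.

The frequency-domain integral is approximated by Gauss–Legendre quadrature after the change of variables $\omega=\tan\theta$. The time-domain (impulse-response) integral is truncated at $t_{\max}=40$ and computed with $e^{At}$ obtained from an eigendecomposition. That step presumes $A$ is diagonalizable, which holds for generic draws. The quadrature choices are illustrative, not the paper's method.

```python
import numpy as np

# Hypothetical test system: A + A^T = -2(M M^T + I) is negative definite,
# so A is stable and ||exp(At)|| <= exp(-t).
rng = np.random.default_rng(1)
N, m, p = 6, 2, 3
S = rng.standard_normal((N, N))
M = rng.standard_normal((N, N))
A = (S - S.T) - (M @ M.T + np.eye(N))
B = rng.standard_normal((N, m))
C = rng.standard_normal((p, N))

x, w = np.polynomial.legendre.leggauss(1000)   # Gauss-Legendre nodes and weights on (-1, 1)

# Frequency domain: J = (1/2π) ∫ ||H(jω)||_F^2 dω with ω = tan θ, θ in (-π/2, π/2).
J_freq = 0.0
for xk, wk in zip(x, w):
    th = 0.5 * np.pi * xk
    Hjw = C @ np.linalg.solve(1j * np.tan(th) * np.eye(N) - A, B)
    J_freq += wk * 0.5 * np.pi * np.linalg.norm(Hjw, "fro") ** 2 / np.cos(th) ** 2
J_freq /= 2.0 * np.pi

# Time domain: J = ∫_0^∞ ||C exp(At) B||_F^2 dt, truncated at t_max
# (the neglected tail is bounded by a multiple of exp(-2 t_max)).
t_max = 40.0
lam, V = np.linalg.eig(A)                       # assumes A is diagonalizable (generic case)
CV, VinvB = C @ V, np.linalg.solve(V, B)
J_time = 0.0
for xk, wk in zip(x, w):
    t = 0.5 * t_max * (xk + 1.0)
    h = (CV * np.exp(lam * t)) @ VinvB          # C V diag(e^{λt}) V^{-1} B = C exp(At) B
    J_time += wk * 0.5 * t_max * np.linalg.norm(h, "fro") ** 2

print("frequency domain:", J_freq, " time domain:", J_time)
```

The trace of $E^HE$ is evaluated as a squared Frobenius norm, which is the same quantity written more compactly.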
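This can also be related to an expression involving the Gramians $P_e$ and $Q_e$, defined as
$$P_e := \int_0^\infty[\exp(A_et)B_e][\exp(A_et)B_e]^T\,dt,\qquad Q_e := \int_0^\infty[C_e\exp(A_et)]^T[C_e\exp(A_et)]\,dt.$$
These are also known to be the solutions of the Lyapunov equations
$$A_eP_e + P_eA_e^T + B_eB_e^T = 0,\qquad Q_eA_e + A_e^TQ_e + C_e^TC_e = 0. \qquad (5)$$
Using these, it easily follows that the squared $\mathcal{H}_2$-norm of $E(s)$ can also be expressed as
$$J = \mathrm{tr}(B_e^TQ_eB_e) = \mathrm{tr}(C_eP_eC_e^T). \qquad (6)$$
We now apply this to the error function $E(s) := H(s) - \hat H(s)$. A realization of $E(s)$ in partitioned form is given by
$$\{A_e,B_e,C_e\} := \left\{\begin{bmatrix}A & 0\\ 0 & \hat A\end{bmatrix},\ \begin{bmatrix}B\\ \hat B\end{bmatrix},\ \begin{bmatrix}C & -\hat C\end{bmatrix}\right\},$$
and the Lyapunov equations (5) become
$$P_e := \begin{bmatrix}P & X\\ X^T & \hat P\end{bmatrix},\qquad \begin{bmatrix}A & 0\\ 0 & \hat A\end{bmatrix}\begin{bmatrix}P & X\\ X^T & \hat P\end{bmatrix} + \begin{bmatrix}P & X\\ X^T & \hat P\end{bmatrix}\begin{bmatrix}A^T & 0\\ 0 & \hat A^T\end{bmatrix} + \begin{bmatrix}B\\ \hat B\end{bmatrix}\begin{bmatrix}B^T & \hat B^T\end{bmatrix} = 0, \qquad (7)$$
and
$$Q_e := \begin{bmatrix}Q & Y\\ Y^T & \hat Q\end{bmatrix},\qquad \begin{bmatrix}A^T & 0\\ 0 & \hat A^T\end{bmatrix}\begin{bmatrix}Q & Y\\ Y^T & \hat Q\end{bmatrix} + \begin{bmatrix}Q & Y\\ Y^T & \hat Q\end{bmatrix}\begin{bmatrix}A & 0\\ 0 & \hat A\end{bmatrix} + \begin{bmatrix}C^T\\ -\hat C^T\end{bmatrix}\begin{bmatrix}C & -\hat C\end{bmatrix} = 0. \qquad (8)$$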
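The partitioned Gramians can be checked on a small random example. The sketch below assumes NumPy. The helpers `solve_sylvester` and `random_stable` and the test data are hypothetical. The Kronecker-product Sylvester solver is only sensible for small dimensions; a Bartels–Stewart routine such as `scipy.linalg.solve_sylvester` would be the usual choice at scale.

The code builds the error realization $\{A_e,B_e,C_e\}$, solves the Lyapunov equations (5), and confirms that the two trace formulas in (6) agree. It also checks that they agree with the block expansions of (9) and (10) that follow.

```python
import numpy as np

def solve_sylvester(A, B, Q):
    """Solve A X + X B = Q via the Kronecker form (I ⊗ A + B^T ⊗ I) vec(X) = vec(Q)."""
    m, n = Q.shape
    K = np.kron(np.eye(n), A) + np.kron(B.T, np.eye(m))
    return np.linalg.solve(K, Q.reshape(-1, order="F")).reshape((m, n), order="F")

def random_stable(k, rng):
    """Random A with A + A^T negative definite, hence Hurwitz stable."""
    S, M = rng.standard_normal((k, k)), rng.standard_normal((k, k))
    return (S - S.T) - (M @ M.T + np.eye(k))

rng = np.random.default_rng(2)
N, n, m, p = 8, 3, 2, 2
A, B, C = random_stable(N, rng), rng.standard_normal((N, m)), rng.standard_normal((p, N))
Ah, Bh, Ch = random_stable(n, rng), rng.standard_normal((n, m)), rng.standard_normal((p, n))

# Partitioned realization of E(s) = H(s) - Ĥ(s).
Ae = np.block([[A, np.zeros((N, n))], [np.zeros((n, N)), Ah]])
Be = np.vstack([B, Bh])
Ce = np.hstack([C, -Ch])

# Lyapunov equations (5).
Pe = solve_sylvester(Ae, Ae.T, -Be @ Be.T)    # A_e P_e + P_e A_e^T + B_e B_e^T = 0
Qe = solve_sylvester(Ae.T, Ae, -Ce.T @ Ce)    # A_e^T Q_e + Q_e A_e + C_e^T C_e = 0

J_obs = np.trace(Be.T @ Qe @ Be)              # tr(B_e^T Q_e B_e)
J_ctrl = np.trace(Ce @ Pe @ Ce.T)             # tr(C_e P_e C_e^T)

# Blocks of (7), (8) and the expanded forms (9), (10).
P, X, Ph = Pe[:N, :N], Pe[:N, N:], Pe[N:, N:]
Q, Y, Qh = Qe[:N, :N], Qe[:N, N:], Qe[N:, N:]
J9 = np.trace(B.T @ Q @ B + 2 * B.T @ Y @ Bh + Bh.T @ Qh @ Bh)
J10 = np.trace(C @ P @ C.T - 2 * C @ X @ Ch.T + Ch @ Ph @ Ch.T)
print(J_obs, J_ctrl, J9, J10)
```

The constant terms $\mathrm{tr}(B^TQB)$ and $\mathrm{tr}(CPC^T)$ both equal $\|H\|_{\mathcal{H}_2}^2$, which is why they can be dropped during optimization.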
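To minimize the $\mathcal{H}_2$-norm $J$ of the error function $E(s)$, we must minimize
$$J = \mathrm{tr}\left(\begin{bmatrix}B^T & \hat B^T\end{bmatrix}\begin{bmatrix}Q & Y\\ Y^T & \hat Q\end{bmatrix}\begin{bmatrix}B\\ \hat B\end{bmatrix}\right) = \mathrm{tr}\left(B^TQB + 2B^TY\hat B + \hat B^T\hat Q\hat B\right), \qquad (9)$$
where $Q$, $Y$ and $\hat Q$ depend on $A$, $\hat A$, $C$ and $\hat C$ through the Lyapunov equation (8). Equivalently,
$$J = \mathrm{tr}\left(\begin{bmatrix}C & -\hat C\end{bmatrix}\begin{bmatrix}P & X\\ X^T & \hat P\end{bmatrix}\begin{bmatrix}C^T\\ -\hat C^T\end{bmatrix}\right) = \mathrm{tr}\left(CPC^T - 2CX\hat C^T + \hat C\hat P\hat C^T\right), \qquad (10)$$
where $P$, $X$ and $\hat P$ depend on $A$, $\hat A$, $B$ and $\hat B$ through the Lyapunov equation (7). Note that the terms $B^TQB$ and $CPC^T$ in the above expressions are constant, and hence can be discarded in the optimization.

3 Optimality conditions
The expansions above can be used to express first-order optimality conditions for the squared $\mathcal{H}_2$-norm in terms of the gradients of $J$ with respect to $\hat A$, $\hat B$ and $\hat C$. We define a gradient as follows.

Definition 3.1 The gradient of a real scalar function $f(X)$ of a real matrix variable $X\in\mathbb{R}^{n\times p}$ is the real matrix $\nabla_Xf(X)\in\mathbb{R}^{n\times p}$ defined by
$$[\nabla_Xf(X)]_{i,j} = \frac{\partial}{\partial x_{i,j}}f(X),\qquad i=1,\dots,n,\quad j=1,\dots,p.$$
It yields the expansion
$$f(X+\Delta) = f(X) + \langle\nabla_Xf(X),\Delta\rangle + O(\|\Delta\|^2),\qquad\text{where}\quad\langle M,N\rangle := \mathrm{tr}(M^TN).$$

The following lemma is useful in the derivation of our results (see [7]).

Lemma 3.2 If $AM + MB + C = 0$ and $NA + BN + D = 0$, then $\mathrm{tr}(CN) = \mathrm{tr}(DM)$.

Starting from the characterizations (7),(10) and (8),(9) of the $\mathcal{H}_2$ norm and using Lemma 3.2, we easily derive succinct forms of the gradients. This theorem is originally due to Wilson [8].

Theorem 3.3 The gradients $\nabla_{\hat A}J$, $\nabla_{\hat B}J$ and $\nabla_{\hat C}J$ of $J := \|E(s)\|_{\mathcal{H}_2}^2$ are given by
$$\nabla_{\hat A}J = 2(\hat Q\hat P + Y^TX),\qquad \nabla_{\hat B}J = 2(\hat Q\hat B + Y^TB),\qquad \nabla_{\hat C}J = 2(\hat C\hat P - CX), \qquad (11)$$
where
$$A^TY + Y\hat A - C^T\hat C = 0,\qquad \hat A^T\hat Q + \hat Q\hat A + \hat C^T\hat C = 0, \qquad (12)$$
$$X^TA^T + \hat AX^T + \hat BB^T = 0,\qquad \hat P\hat A^T + \hat A\hat P + \hat B\hat B^T = 0. \qquad (13)$$

Proof. To find an expression for $\nabla_{\hat A}J$ we consider the characterization
$$J = \mathrm{tr}(B^TQB + 2B^TY\hat B + \hat B^T\hat Q\hat B),\qquad A^TY + Y\hat A - C^T\hat C = 0,\qquad \hat A^T\hat Q + \hat Q\hat A + \hat C^T\hat C = 0.$$
Then the first-order perturbation $\Delta_J$ corresponding to $\Delta_{\hat A}$ is given by
$$\Delta_J = \mathrm{tr}(2\hat BB^T\Delta_Y + \hat B\hat B^T\Delta_{\hat Q}),$$
where $\Delta_Y$ and $\Delta_{\hat Q}$ depend on $\Delta_{\hat A}$ via the equations
$$A^T\Delta_Y + \Delta_Y\hat A + Y\Delta_{\hat A} = 0,\qquad \hat A^T\Delta_{\hat Q} + \Delta_{\hat Q}\hat A + \Delta_{\hat A}^T\hat Q + \hat Q\Delta_{\hat A} = 0. \qquad (14)$$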
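Applying Lemma 3.2 to the Sylvester equations (13) and (14) gives
$$\mathrm{tr}(\hat BB^T\Delta_Y) = \mathrm{tr}(X^TY\Delta_{\hat A}),\qquad \mathrm{tr}(\hat B\hat B^T\Delta_{\hat Q}) = \mathrm{tr}\big(\hat P(\Delta_{\hat A}^T\hat Q + \hat Q\Delta_{\hat A})\big),$$
and therefore
$$\Delta_J = \mathrm{tr}\big(2X^TY\Delta_{\hat A} + \hat P(\Delta_{\hat A}^T\hat Q + \hat Q\Delta_{\hat A})\big) = \mathrm{tr}(2X^TY\Delta_{\hat A} + 2\hat P\hat Q\Delta_{\hat A}) = \langle 2(\hat Q\hat P + Y^TX),\Delta_{\hat A}\rangle.$$
Since $\Delta_J$ also equals $\langle\nabla_{\hat A}J,\Delta_{\hat A}\rangle$, it follows that $\nabla_{\hat A}J = 2(\hat Q\hat P + Y^TX)$.

To find an expression for $\nabla_{\hat B}J$, we perturb $\hat B$ in the characterization $J = \mathrm{tr}(B^TQB + 2B^TY\hat B + \hat B^T\hat Q\hat B)$. This yields the first-order perturbation
$$\Delta_J = \mathrm{tr}(2B^TY\Delta_{\hat B} + \Delta_{\hat B}^T\hat Q\hat B + \hat B^T\hat Q\Delta_{\hat B}) = \langle 2(Y^TB + \hat Q\hat B),\Delta_{\hat B}\rangle.$$
Since $\Delta_J$ also equals $\langle\nabla_{\hat B}J,\Delta_{\hat B}\rangle$, it follows that $\nabla_{\hat B}J = 2(\hat Q\hat B + Y^TB)$. In a similar fashion, the first-order perturbation of $J = \mathrm{tr}(CPC^T - 2CX\hat C^T + \hat C\hat P\hat C^T)$ gives $\nabla_{\hat C}J = 2(\hat C\hat P - CX)$.

The gradient forms of Theorem 3.3 allow us to derive our fundamental theoretical result.

Theorem 3.4 At every stationary point of $J$ where $\hat P$ and $\hat Q$ are invertible, we have the following identities:
$$\hat A = W^TAV,\quad \hat B = W^TB,\quad \hat C = CV,\quad W^TV = I_n,\qquad\text{with}\quad W := -Y\hat Q^{-1},\quad V := X\hat P^{-1}, \qquad (15)$$
where $X$, $Y$, $\hat P$ and $\hat Q$ satisfy the Sylvester equations (12),(13).

Proof. Since we are at a stationary point of $J$, the gradients with respect to $\hat A$, $\hat B$ and $\hat C$ must be zero:
$$\hat Q\hat P + Y^TX = 0,\qquad \hat Q\hat B + Y^TB = 0,\qquad \hat C\hat P - CX = 0.$$
Since $\hat P$ and $\hat Q$ are invertible, we can define $W := -Y\hat Q^{-1}$ and $V := X\hat P^{-1}$. Using the symmetry of $\hat P$ and $\hat Q$, it then follows that $W^TV = I_n$, $\hat B = W^TB$ and $\hat C = CV$. Multiplying the first equation of (13) on the right by $W$ and using $X^T = \hat PV^T$ yields
$$\hat PV^TA^TW + \hat A\hat PV^TW + \hat BB^TW = 0.$$
Using $V^TW = I_n$, $B^TW = \hat B^T$ and the second equation of (13), it then follows that $\hat PV^TA^TW = \hat P\hat A^T$. Since $\hat P$ is invertible, this gives $\hat A = W^TAV$.

If we rewrite the above theorem as a projection problem, then we are constructing a projector $\Pi := VW^T$ (implying $W^TV = I_n$). Here $V$ and $W$ are given by the following transposed Sylvester equations:
$$\hat QW^TA + \hat A^T\hat QW^T + \hat C^TC = 0,\qquad AV\hat P + V\hat P\hat A^T + B\hat B^T = 0. \qquad (16)$$
Notice that $\hat P$ and $\hat Q$ can be interpreted as normalizations that ensure $W^TV = I_n$.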
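A finite-difference check is a cheap way to validate gradient formulas such as (11). The sketch below assumes NumPy and uses hypothetical helper names and random stable test data; the reduced model is deliberately not a stationary point.

The code solves (12) and (13), plus the Lyapunov equation for $Q$, with a small Kronecker-product Sylvester solver, and evaluates $J$ from (9). It then compares central differences of $J$ along random directions $\Delta$ with $\langle\nabla J,\Delta\rangle=\mathrm{tr}(\nabla J^T\Delta)$.

```python
import numpy as np

def solve_sylvester(A, B, Q):
    """Solve A X + X B = Q by vectorization (adequate for small dimensions)."""
    m, n = Q.shape
    K = np.kron(np.eye(n), A) + np.kron(B.T, np.eye(m))
    return np.linalg.solve(K, Q.reshape(-1, order="F")).reshape((m, n), order="F")

def random_stable(k, rng):
    """Random A with A + A^T negative definite, hence Hurwitz stable."""
    S, M = rng.standard_normal((k, k)), rng.standard_normal((k, k))
    return (S - S.T) - (M @ M.T + np.eye(k))

def cost_and_grads(A, B, C, Ah, Bh, Ch):
    """J = ||H - Ĥ||^2_{H2} from (9) and the gradients (11) of Theorem 3.3."""
    Q = solve_sylvester(A.T, A, -C.T @ C)        # A^T Q + Q A + C^T C = 0
    Y = solve_sylvester(A.T, Ah, C.T @ Ch)       # A^T Y + Y Â - C^T Ĉ = 0        (12)
    Qh = solve_sylvester(Ah.T, Ah, -Ch.T @ Ch)   # Â^T Q̂ + Q̂ Â + Ĉ^T Ĉ = 0      (12)
    X = solve_sylvester(A, Ah.T, -B @ Bh.T)      # A X + X Â^T + B B̂^T = 0 ((13) transposed)
    Ph = solve_sylvester(Ah, Ah.T, -Bh @ Bh.T)   # Â P̂ + P̂ Â^T + B̂ B̂^T = 0      (13)
    J = np.trace(B.T @ Q @ B + 2 * B.T @ Y @ Bh + Bh.T @ Qh @ Bh)
    return J, 2 * (Qh @ Ph + Y.T @ X), 2 * (Qh @ Bh + Y.T @ B), 2 * (Ch @ Ph - C @ X)

rng = np.random.default_rng(3)
N, n, m, p = 8, 3, 2, 2
A, B, C = random_stable(N, rng), rng.standard_normal((N, m)), rng.standard_normal((p, N))
Ah, Bh, Ch = random_stable(n, rng), rng.standard_normal((n, m)), rng.standard_normal((p, n))
J, gA, gB, gC = cost_and_grads(A, B, C, Ah, Bh, Ch)

def J_at(Ah_, Bh_, Ch_):
    """Cost only, for finite differences in the reduced-model parameters."""
    return cost_and_grads(A, B, C, Ah_, Bh_, Ch_)[0]

# Central differences along random directions versus <grad, Δ> = tr(grad^T Δ).
eps = 1e-6
dA, dB, dC = (rng.standard_normal(Z.shape) for Z in (Ah, Bh, Ch))
fd_A = (J_at(Ah + eps * dA, Bh, Ch) - J_at(Ah - eps * dA, Bh, Ch)) / (2 * eps)
fd_B = (J_at(Ah, Bh + eps * dB, Ch) - J_at(Ah, Bh - eps * dB, Ch)) / (2 * eps)
fd_C = (J_at(Ah, Bh, Ch + eps * dC) - J_at(Ah, Bh, Ch - eps * dC)) / (2 * eps)
print(fd_A, np.sum(gA * dA))
print(fd_B, np.sum(gB * dB))
print(fd_C, np.sum(gC * dC))
```

Central differences are used because their $O(\epsilon^2)$ truncation error leaves the comparison dominated by rounding, well below the tolerance of the check.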
It was shown in [4] that projecting a system via Sylvester equations always amounts to satisfying tangential interpolation conditions. The Sylvester equations (16) show that the parameters of reduced-order models corresponding to stationary points must have specific relationships with the parameters of the tangential interpolation conditions (2),(3),(4).

First, multiply the equations (16) on the left by $\hat Q^{-1}$ and on the right by $\hat P^{-1}$, respectively, and compare them with (3) and (2). This gives $\Sigma_\mu^T = -\hat Q^{-1}\hat A^T\hat Q$ and $\Sigma_\sigma = -\hat P\hat A^T\hat P^{-1}$, with directions $L = \hat C\hat Q^{-1}$ and $R = \hat B^T\hat P^{-1}$. Thus $\Sigma_\sigma$ and $\Sigma_\mu$ are both similar to $-\hat A$: the left and right interpolation points must be identical and equal to the negatives of the poles of the reduced-order model.

For SISO systems, choosing identical left and right interpolation point sets implies that $\hat H(s)$ and $H(s)$, and at least their first derivatives, match at the interpolation points. Theorem 3.4 therefore generalizes to MIMO systems the conditions of [6] on the $\mathcal{H}_2$-norm stationary points of SISO systems. For MIMO systems, however, the simple additive result for the orders of rational interpolation in the SISO case is replaced by more complicated tangential conditions. These require the definition of tangential direction vectors, which can be vector polynomials in $s$. The Sylvester equations (16) show that these direction vectors are also related to parameters of realizations of $\hat H(s)$. If the Sylvester equations are expressed in the coordinate system with $\hat A$ in Jordan form, then the transformed $\hat B$ and $\hat C$ contain the parameters that define the tangential interpolation directions.

4 Tangential interpolation revisited
Theorem 3.4 provides the fundamental characterization of the stationary points of $J$ via tangential interpolation conditions and their relationship to the realizations of $\hat H(s)$. It is instructive to illustrate those relationships in a particular coordinate system and to derive an explicit form of the tangential interpolation conditions. We assume here that all poles of $\hat H(s)$ are distinct but possibly complex (the so-called generic case). Hence the transfer functions $H(s)$ and $\hat H(s)$ have real realizations $\{A,B,C\}$ and $\{\hat A,\hat B,\hat C\}$ with $\hat A$ diagonalizable.
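The projection result of [4] quoted at the start of this discussion can be made concrete in the simplest setting. That setting uses real, distinct interpolation points and $\Sigma_\sigma=\Sigma_\mu=\mathrm{diag}(\sigma_i)$.

The sketch below assumes NumPy; the points $\sigma_i$, the directions in $R$ and $L$, and the helper names are hypothetical choices. It solves (2) and (3), builds the reduced model (4), and checks right and left tangential interpolation at each $\sigma_i$. Because left and right points coincide, it also checks the matching of the bitangential derivative $l_i^TH'(\sigma_i)r_i$.

```python
import numpy as np

def solve_sylvester(A, B, Q):
    """Solve A X + X B = Q by vectorization (adequate for small dimensions)."""
    m, n = Q.shape
    K = np.kron(np.eye(n), A) + np.kron(B.T, np.eye(m))
    return np.linalg.solve(K, Q.reshape(-1, order="F")).reshape((m, n), order="F")

def random_stable(k, rng):
    """Random A with A + A^T negative definite, hence Hurwitz stable."""
    S, M = rng.standard_normal((k, k)), rng.standard_normal((k, k))
    return (S - S.T) - (M @ M.T + np.eye(k))

def tf(A, B, C, s):
    """C (sI - A)^{-1} B."""
    return C @ np.linalg.solve(s * np.eye(A.shape[0]) - A, B)

def dtf(A, B, C, s):
    """d/ds C (sI - A)^{-1} B = -C (sI - A)^{-2} B."""
    Rs = np.linalg.inv(s * np.eye(A.shape[0]) - A)
    return -C @ Rs @ Rs @ B

rng = np.random.default_rng(4)
N, n, m, p = 10, 3, 2, 3
A, B, C = random_stable(N, rng), rng.standard_normal((N, m)), rng.standard_normal((p, N))

sigma = np.array([0.5, 1.0, 2.0])        # hypothetical real interpolation points
Sig = np.diag(sigma)                     # Σ_σ = Σ_µ (identical left and right points)
R = rng.standard_normal((m, n))          # right directions r_i = R[:, i]
L = rng.standard_normal((p, n))          # left directions  l_i = L[:, i]

V = solve_sylvester(A, -Sig, -B @ R)       # (2):            A V - V Σ_σ + B R = 0
W = solve_sylvester(A.T, -Sig, -C.T @ L)   # (3) transposed: A^T W - W Σ_µ + C^T L = 0

E = W.T @ V                                # assumed invertible, as required by (4)
Ahat = np.linalg.solve(E, W.T @ A @ V)
Bhat = np.linalg.solve(E, W.T @ B)
Chat = C @ V

errs = []
for i, s in enumerate(sigma):
    r, l = R[:, i], L[:, i]
    H, Hh = tf(A, B, C, s), tf(Ahat, Bhat, Chat, s)
    dH, dHh = dtf(A, B, C, s), dtf(Ahat, Bhat, Chat, s)
    errs += [np.max(np.abs(H @ r - Hh @ r)),   # right: H(σ_i) r_i = Ĥ(σ_i) r_i
             np.max(np.abs(l @ H - l @ Hh)),   # left:  l_i^T H(σ_i) = l_i^T Ĥ(σ_i)
             abs(l @ dH @ r - l @ dHh @ r)]    # bitangential: l_i^T H'(σ_i) r_i = l_i^T Ĥ'(σ_i) r_i
print("largest interpolation residual:", max(errs))
```

With real points the whole computation stays real. Complex conjugate points would use the $2\times 2$ real blocks described in the Introduction instead.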
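The interpretation of these conditions for multiple or higher-order poles is more involved and can be found in an extended version of this paper [1]. Given our assumptions, $\hat H(s)$ has the partial fraction expansion
$$\hat H(s) = \sum_{i=1}^n\frac{\hat c_i\hat b_i^H}{s-\lambda_i}, \qquad (17)$$
where $\hat b_i\in\mathbb{C}^m$ and $\hat c_i\in\mathbb{C}^p$, and where $\{(\lambda_i,\hat b_i,\hat c_i),\ i=1,\dots,n\}$ is a self-conjugate set. We must keep in mind that the number of parameters in $\{\hat A,\hat B,\hat C\}$ is not minimal, and hence that the gradient conditions of Theorem 3.3 must be redundant. We make this more explicit in the theorem below.

For this we need $s_i$ and $t_i^H$, the complex right and left eigenvectors of the real matrix $\hat A$ corresponding to the complex eigenvalue $\lambda_i$, normalized so that $t_i^Hs_j = \delta_{ij}$. Because of the expansion (17), we then have
$$\hat As_i = \lambda_is_i,\qquad \hat Cs_i = \hat c_i,\qquad t_i^H\hat A = \lambda_it_i^H,\qquad t_i^H\hat B = \hat b_i^H.$$

Theorem 4.1 Let $\hat H(s) = \sum_{i=1}^n\hat c_i\hat b_i^H/(s-\lambda_i)$ have distinct first-order poles, where $\{(\lambda_i,\hat b_i,\hat c_i),\ i=1,\dots,n\}$ is self-conjugate. Then
$$-\tfrac12(\nabla_{\hat B}J)^Ts_i = [H^T(-\lambda_i) - \hat H^T(-\lambda_i)]\,\hat c_i, \qquad (18)$$
$$-\tfrac12\,t_i^H(\nabla_{\hat C}J)^T = \hat b_i^H[H^T(-\lambda_i) - \hat H^T(-\lambda_i)], \qquad (19)$$
$$\tfrac12\,t_i^H(\nabla_{\hat A}J)^Ts_i = \hat b_i^H\left.\frac{d}{ds}[H^T(s) - \hat H^T(s)]\right|_{s=-\lambda_i}\hat c_i, \qquad (20)$$
$$\tfrac12\,t_i^H(\nabla_{\hat A}J)^Ts_j = \frac{1}{2(\lambda_j-\lambda_i)}\left[\hat b_i^H(\nabla_{\hat B}J)^Ts_j - t_i^H(\nabla_{\hat C}J)^T\hat c_j\right],\qquad i\neq j. \qquad (21)$$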
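Proof. Define $y_i := Ys_i$, $q_i := \hat Qs_i$, $x_i := Xt_i$ and $p_i := \hat Pt_i$. Then from (12),(13) we have
$$(A^T+\lambda_iI)y_i = C^T\hat c_i,\qquad (\hat A^T+\lambda_iI)q_i = -\hat C^T\hat c_i,\qquad x_i^H(A^T+\lambda_iI) = -\hat b_i^HB^T,\qquad p_i^H(\hat A^T+\lambda_iI) = -\hat b_i^H\hat B^T.$$
It follows that
$$y_i = (A^T+\lambda_iI)^{-1}C^T\hat c_i,\qquad q_i = -(\hat A^T+\lambda_iI)^{-1}\hat C^T\hat c_i, \qquad (22)$$
$$x_i^H = -\hat b_i^HB^T(A^T+\lambda_iI)^{-1},\qquad p_i^H = -\hat b_i^H\hat B^T(\hat A^T+\lambda_iI)^{-1}. \qquad (23)$$
Using $H^T(-\lambda) = -B^T(A^T+\lambda I)^{-1}C^T$ and $\hat H^T(-\lambda) = -\hat B^T(\hat A^T+\lambda I)^{-1}\hat C^T$, we obtain
$$-\tfrac12(\nabla_{\hat B}J)^Ts_i = -(\hat B^T\hat Q + B^TY)s_i = [H^T(-\lambda_i) - \hat H^T(-\lambda_i)]\,\hat c_i,$$
$$-\tfrac12\,t_i^H(\nabla_{\hat C}J)^T = -t_i^H(\hat P\hat C^T - X^TC^T) = \hat b_i^H[H^T(-\lambda_i) - \hat H^T(-\lambda_i)].$$
From (22),(23) it also follows that
$$\tfrac12\,t_i^H(\nabla_{\hat A}J)^Ts_j = t_i^H(\hat P\hat Q + X^TY)s_j = \hat b_i^H\left[\hat B^T(\hat A^T+\lambda_iI)^{-1}(\hat A^T+\lambda_jI)^{-1}\hat C^T - B^T(A^T+\lambda_iI)^{-1}(A^T+\lambda_jI)^{-1}C^T\right]\hat c_j.$$
Using $\frac{d}{ds}H(s) = -C(sI-A)^{-2}B$ and $\frac{d}{ds}\hat H(s) = -\hat C(sI-\hat A)^{-2}\hat B$, we obtain for $i=j$
$$\tfrac12\,t_i^H(\nabla_{\hat A}J)^Ts_i = \hat b_i^H\left.\frac{d}{ds}[H^T(s) - \hat H^T(s)]\right|_{s=-\lambda_i}\hat c_i.$$
For $i\neq j$ we use the identity
$$(M+\lambda_iI)^{-1}(M+\lambda_jI)^{-1} = \frac{1}{\lambda_i-\lambda_j}\left[(M+\lambda_jI)^{-1} - (M+\lambda_iI)^{-1}\right]$$
to obtain
$$\tfrac12\,t_i^H(\nabla_{\hat A}J)^Ts_j = \frac{1}{\lambda_i-\lambda_j}\,\hat b_i^H\Big\{[H^T(-\lambda_j) - \hat H^T(-\lambda_j)] - [H^T(-\lambda_i) - \hat H^T(-\lambda_i)]\Big\}\hat c_j.$$
Finally, by (18) and (19),
$$\tfrac12\,t_i^H(\nabla_{\hat A}J)^Ts_j = \frac{1}{2(\lambda_j-\lambda_i)}\left[\hat b_i^H(\nabla_{\hat B}J)^Ts_j - t_i^H(\nabla_{\hat C}J)^T\hat c_j\right].$$

Let $S := [s_1,\dots,s_n]$. The above theorem shows that the off-diagonal elements of $S^{-1}(\nabla_{\hat A}J)^TS$ vanish when $\nabla_{\hat B}J$ and $\nabla_{\hat C}J$ vanish. Therefore we need to impose conditions only on $\mathrm{diag}(S^{-1}(\nabla_{\hat A}J)^TS)$, on $\nabla_{\hat B}J$ and on $\nabla_{\hat C}J$ to characterize stationary points of $J$. These are exactly $n(m+p)$ conditions, since the vectors $\hat b_i$ or $\hat c_i$ can be scaled as indicated in Section 1. Moreover, one can view them as $n(m+p)$ real conditions, since the poles $\lambda_i$ come in complex conjugate pairs. The following corollary easily follows.

Corollary 4.2 If $\nabla_{\hat B}J = 0$, $\nabla_{\hat C}J = 0$ and $\mathrm{diag}(S^{-1}(\nabla_{\hat A}J)^TS) = 0$, then $\nabla_{\hat A}J = 0$. In that case the following tangential interpolation conditions are satisfied for all $\lambda_i$, $i=1,\dots,n$:
$$[H^T(-\lambda_i) - \hat H^T(-\lambda_i)]\,\hat c_i = 0,\qquad \hat b_i^H[H^T(-\lambda_i) - \hat H^T(-\lambda_i)] = 0,\qquad \hat b_i^H\left.\frac{d}{ds}[H^T(s) - \hat H^T(s)]\right|_{s=-\lambda_i}\hat c_i = 0. \qquad (24)$$
Notice that we retrieve the conditions of [6] for the SISO case, since then $\hat b_i^H$ and $\hat c_i$ are just nonzero scalars that can be divided out.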
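The conditions above then become the familiar $2n$ interpolation conditions
$$H(-\lambda_i) = \hat H(-\lambda_i),\qquad \left.\frac{d}{ds}H(s)\right|_{s=-\lambda_i} = \left.\frac{d}{ds}\hat H(s)\right|_{s=-\lambda_i},\qquad i=1,\dots,n.$$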
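The identities of Theorem 4.1 make no stationarity assumption, so they can be checked at an arbitrary reduced model with distinct poles. The sketch below assumes NumPy, with hypothetical random stable test systems and helper names.

The code computes the gradients (11) from (12),(13) and diagonalizes $\hat A$. Taking $S^{-1}$ as the matrix of left eigenvectors enforces $t_i^Hs_j=\delta_{ij}$. It then measures the residuals of (18)–(21) as stated in the code comments, using the plain transpose $E^T(s)=H^T(s)-\hat H^T(s)$ and its derivative at $s=-\lambda_i$.

```python
import numpy as np

def solve_sylvester(A, B, Q):
    """Solve A X + X B = Q by vectorization (adequate for small dimensions)."""
    m, n = Q.shape
    K = np.kron(np.eye(n), A) + np.kron(B.T, np.eye(m))
    return np.linalg.solve(K, Q.reshape(-1, order="F")).reshape((m, n), order="F")

def random_stable(k, rng):
    """Random A with A + A^T negative definite, hence Hurwitz stable."""
    S, M = rng.standard_normal((k, k)), rng.standard_normal((k, k))
    return (S - S.T) - (M @ M.T + np.eye(k))

def tf(A, B, C, s):
    return C @ np.linalg.solve(s * np.eye(A.shape[0]) - A, B)

def dtf(A, B, C, s):
    Rs = np.linalg.inv(s * np.eye(A.shape[0]) - A)
    return -C @ Rs @ Rs @ B

rng = np.random.default_rng(5)
N, n, m, p = 8, 3, 2, 2
A, B, C = random_stable(N, rng), rng.standard_normal((N, m)), rng.standard_normal((p, N))
Ah, Bh, Ch = random_stable(n, rng), rng.standard_normal((n, m)), rng.standard_normal((p, n))

# Gradients (11) from (12) and (13); (Â, B̂, Ĉ) is NOT a stationary point here.
Y = solve_sylvester(A.T, Ah, C.T @ Ch)
Qh = solve_sylvester(Ah.T, Ah, -Ch.T @ Ch)
X = solve_sylvester(A, Ah.T, -B @ Bh.T)
Ph = solve_sylvester(Ah, Ah.T, -Bh @ Bh.T)
gA, gB, gC = 2 * (Qh @ Ph + Y.T @ X), 2 * (Qh @ Bh + Y.T @ B), 2 * (Ch @ Ph - C @ X)

lam, S = np.linalg.eig(Ah)   # columns of S: right eigenvectors s_i
TH = np.linalg.inv(S)        # rows of S^{-1}: left eigenvectors t_i^H, with t_i^H s_j = δ_ij
c_hat = Ch @ S               # column i: ĉ_i = Ĉ s_i
bH_hat = TH @ Bh             # row i:    b̂_i^H = t_i^H B̂

def Et(s):
    """E^T(s) = H^T(s) - Ĥ^T(s) (plain transpose, no conjugation)."""
    return (tf(A, B, C, s) - tf(Ah, Bh, Ch, s)).T

def dEt(s):
    return (dtf(A, B, C, s) - dtf(Ah, Bh, Ch, s)).T

res = []
for i in range(n):
    si, ti_H, ci, bi_H = S[:, i], TH[i, :], c_hat[:, i], bH_hat[i, :]
    res.append(np.max(np.abs(-0.5 * gB.T @ si - Et(-lam[i]) @ ci)))       # (18)
    res.append(np.max(np.abs(-0.5 * ti_H @ gC.T - bi_H @ Et(-lam[i]))))   # (19)
    res.append(abs(0.5 * ti_H @ gA.T @ si - bi_H @ dEt(-lam[i]) @ ci))    # (20)
    for j in range(n):
        if j != i:                                                        # (21)
            lhs = 0.5 * ti_H @ gA.T @ S[:, j]
            rhs = (bi_H @ gB.T @ S[:, j] - ti_H @ gC.T @ c_hat[:, j]) / (2 * (lam[j] - lam[i]))
            res.append(abs(lhs - rhs))
print("largest residual of (18)-(21):", max(res))
```

Since the residuals vanish away from stationary points as well, a check like this tests the algebra of Theorem 4.1 directly, independently of any optimization run.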
5 Concluding remarks
The $\mathcal{H}_2$ norm of a stable proper transfer function $E(s)$ is a smooth function of the parameters $\{A_e,B_e,C_e\}$ of its state-space realization. The squared norm of $E(s)$ is differentiable with respect to $\{A_e,B_e,C_e\}$ as long as $A_e$ is stable: the Lyapunov equations are then invertible linear maps, and the trace is a smooth function of its arguments. If $\hat H(s)$ is an isolated local minimum of the error function $\|H(s)-\hat H(s)\|_{\mathcal{H}_2}^2$, then the continuity of the norm implies that a small perturbation of $H(s)$ will also induce only a small perturbation of that local minimum. This explains why we can characterize the optimality conditions without assuming anything about the structure of the poles of the transfer functions $H(s)$ and $\hat H(s)$.

Those ideas also lead to algorithms. One can view (12),(13) and (15) as two coupled equations
$$(X,Y,\hat P,\hat Q) = F(\hat A,\hat B,\hat C)\qquad\text{and}\qquad(\hat A,\hat B,\hat C) = G(X,Y,\hat P,\hat Q),$$
for which we have a fixed point $(\hat A,\hat B,\hat C) = G(F(\hat A,\hat B,\hat C))$ at every stationary point of $J(\hat A,\hat B,\hat C)$. This naturally suggests the iterative procedure
$$(X,Y,\hat P,\hat Q)_{i+1} = F\big((\hat A,\hat B,\hat C)_{i+1}\big),\qquad(\hat A,\hat B,\hat C)_{i+1} = G\big((X,Y,\hat P,\hat Q)_i\big),$$
which is expected to converge to a nearby fixed point. This is essentially the idea behind existing algorithms that use Sylvester equations in their iterations (see [2]). Another approach would be to use the gradients or the interpolation conditions of Theorem 4.1 to develop descent methods or even Newton-like methods, as was done for the SISO case in [5].

The two fundamental contributions of this paper are as follows. First, Theorem 3.4 characterizes the stationary points of $J$ via tangential interpolation conditions and relates them to the realizations of $\hat H(s)$. Second, this can be done using Sylvester equations without assuming anything about the structure of either $H(s)$ or $\hat H(s)$. This provides a framework to relate existing algorithms and to develop and understand new ones.

References
[1] P.-A. Absil, K. A. Gallivan and P. Van Dooren. Multivariable H2-optimal approximation and tangential interpolation. Internal CESAME Report, Catholic University of Louvain, September 2007.
[2] A. Antoulas. Approximation of Large-Scale Dynamical Systems. SIAM Publications, Philadelphia, 2005.
[3] C. Byrnes and P. Falb. Applications of algebraic geometry in systems theory. American Journal of Mathematics, 101(2):337–363, April 1979.
[4] K. A. Gallivan, A. Vandendorpe and P. Van Dooren. Model reduction of MIMO systems via tangential interpolation. SIAM J. Matrix Anal. Appl., 26(2):328–349, 2004.
[5] S. Gugercin, A. Antoulas and C. Beattie. Rational Krylov methods for optimal H2 model reduction. Submitted for publication, 2006.
[6] L. Meier and D. Luenberger. Approximation of linear constant systems. IEEE Trans. Aut. Contr., 12:585–588, 1967.
[7] W.-Y. Yan and J. Lam. An approximate approach to H2 optimal model reduction. IEEE Trans. Aut. Contr., 44(7):1341–1358, 1999.
[8] D. A. Wilson. Optimum solution of model reduction problem. Proc. Inst. Elec. Eng., 117:1161–1165, 1970.