Gramian Based Model Reduction of Large-scale Dynamical Systems

Gramian Based Model Reduction of Large-scale Dynamical Systems Paul Van Dooren Université catholique de Louvain Center for Systems Engineering and Applied Mechanics B-1348 Louvain-la-Neuve, Belgium vdooren@csam.ucl.ac.be Origin and motivation Ex: Storm surge forecasting Typical techniques (Gramians) Linear time-invariant Linear time-varying Non-linear Numerical/Algorithmic issues (Krylov) 1

Storm surge forecasting in the North Sea see Verlaan-Heemink 97 depth 16 35 14 3 12 Wick 25 1 2 n 8 6 4 North Shields Lowestoft Sheerness Dover Huibertgat Harlingen Delfzijl Den Helder IJmuiden-II HoekvanHolland Vlissingen 15 1 2 5 2 4 6 8 1 12 14 16 18 2 m Problem Using measurements predict the state of the North Sea variables in order to operate the sluices in due time (6h.) Solution x(t). = [h(t), v x (t), v y (t)] satisfies the shallow water equations x(t)/ t = F (x(t), w(t)) y(t) = G(x(t), v(t)) with measurements y(t) and noise processes v(.), w(.) = estimate and predict ˆx(t) using Kalman filtering 2

Some data : very few measurements (x s and + s) 51 5 49 water level current HF radar station 48 47 46 Scheveningen Noordwijk 45 44 Hook of Holland 43 42 2 4 6 8 1 but discretized state is very large-scale (6. variables) 51 5 49 48 y [km] 47 46 45 44 43 42 2 4 6 8 1 x [km] 3

Nevertheless it works... True water levels Estimated water levels ZUNOWAK True water levels (t=1) ZUNOWAK Estimated water levels (t=1).7.1 6.6.5.2.3.4 6.7.6.5.4.3.2.1 4 GERMANY 4 GERMANY 2 ENGLAND THE NETHERLANDS 2 ENGLAND THE NETHERLANDS BELGIUM FRANCE RIJKSWATERSTAAT RIKZ BELGIUM FRANCE RIJKSWATERSTAAT RIKZ 2 4 6 8 ZUNOWAK True water levels (t=3) 2 4 6 8 ZUNOWAK Estimated water levels (t=3) 6 4 1 1.2.2.8.6.4.2 GERMANY 6 4 1 1.4 1.2.8.6.2.4.2 GERMANY 2 ENGLAND THE NETHERLANDS 2 ENGLAND THE NETHERLANDS 1.4 BELGIUM FRANCE RIJKSWATERSTAAT RIKZ BELGIUM FRANCE RIJKSWATERSTAAT RIKZ 2 4 6 8 ZUNOWAK True water levels (t=8) 2 4 6 8 ZUNOWAK Estimated water levels (t=8) 6 6 4 -.2 -.2 -.2 -.8 -.6 -.4 -.2 GERMANY 4 -.2 -.2 -.6 -.4 GERMANY 2.2 ENGLAND.4.6 THE NETHERLANDS 2 ENGLAND.6.4.2 THE NETHERLANDS BELGIUM FRANCE RIJKSWATERSTAAT RIKZ BELGIUM FRANCE RIJKSWATERSTAAT RIKZ 2 4 6 8 2 4 6 8 Reconstruction works well around estuarium 4

Visualisation of computed variance of the error std water level [m].4 16.35 14.3 12.25 1 n.2 8.15 6 4.1 2.5 2 4 6 8 1 m 12 14 16 18 2 Standard deviation of filter using 8 measurement locations std water level [m].4 16.35 14.3 12.25 1 n.2 8.15 6 4.1 2.5 2 4 6 8 1 m 12 14 16 18 2 Standard deviation of filter using measurement locations (1 used for validation) 5

Suppose a good discretization x(.) R N is given Dynamical systems modeled via explicit equations y 1 (.) y 2 (.). y p (.) x(t) R N, N >> m, p u 1 (.) u 2 (.). u m (.) discretize = continuous-time { ẋ(t) = G(x(t), u(t)) y(t) = H(x(t), u(t)) discrete-time { x(k + 1) = G(x(k), u(k)) y(k) = H(x(k), u(k)) linearize { ẋ(t)=a(t)x(t)+b(t)u(t) y(t) = C(t)x(t)+D(t)u(t) { x(k+1) = A(k)x(k)+B(k)u(k) y(k) = C(k)x(k)+D(k)u(k) { ẋ(t) = Ax(t) + Bu(t) y(t) = Cx(t) + Du(t) freeze time { x(k + 1) = Ax(k) + Bu(k) y(k) = Cx(k) + Du(k) Many control problems require N 3 /( t) operations. Because of cubic complexity in N model reduction 6

Linear Time Invariant Systems Given large model {A NN, B Nm, C pn } { ẋ(t) = Ax(t) + Bu(t) y(t) = Cx(t) + Du(t), u(t) R m, y(t) R p, x(t) R N, N << m, p find small model {Ânn, ˆB nm, Ĉpn } with n << N { ˆx(t) = Âˆx(t) + ˆBu(t) ŷ(t) = Ĉ ˆx(t) + ˆDu(t), driven by the same input u(t) with small error y(t) ŷ(t) Model reduction = find a smaller model, i.e. n << N : approximation problem stability is important measure is important 7

How to capture the essence of the system? Transfer functions and norms H(s) = C(sI N A) 1 B + D, Ĥ(s) = Ĉ(sI n Â) 1 ˆB + ˆD, are p m rational matrices try to match frequency responses 1 2 1 2 1 1 1 1 1 1 Frequency response 1 1 1 2 Frequency response 1 1 1 2 1 3 1 3 1 4 1 4 1 5 1 1 1 1 1 1 2 1 3 1 4 1 5 Full order model 1 5 1 1 1 1 1 1 2 1 3 1 4 1 5 Reduced order model Theory : by minimizing their difference using H(.) Ĥ(.) balanced truncation (Moore 81). = sup σ max {H(jω) Ĥ(jω)} ω optimal Hankel norm approximation (Adamian-Arov-Krein 71, Glover 9) interpolation (Gragg-Lindquist 83) Other references : Gallivan-Grimme-VanDooren, Jaimoukha-Kasanelly, Villemagne-Skelton, Boley, Craig, Freund-Feldman, Sorensen-Antoulas,... 8

Why. norm? Fourier transforms of signals : u f (ω) = Fu(t), y f (ω) = Fy(t), ŷ f (ω) = Fŷ(t) yields y f (ω) = H(jω)u f (ω), ŷ f (ω) = Ĥ(jω)u f(ω). and hence a bound for e(t). = [y(t) ŷ(t)] : Fe(t) = e f (ω) = [H(jω) Ĥ(jω)]u f(ω). Minimize worst case error e f (ω) 2 for u f (ω) 2 = 1 by minimizing H(.) Ĥ(.). = sup H(jω) Ĥ(jω) 2, ω but this is a difficult norm to handle! 9

Use Hankel norm instead of. approximations Consider the mapping past inputs = future outputs Continuous-time y(t) = Ce A(t τ) Bu(τ)dτ = Ce At y(t) = Ce At x(), x() = e Aτ Bu( τ)dτ, e Aτ Bu( τ)dτ. 1 1.8.8.6.6.4.4.2.2.2.2.4.4.6 4 35 3 25 2 15 1 5 5 1 Continuous input u(t).6 1 5 5 1 15 2 25 3 35 4 Continuous output y(t) Discrete-time y(k) = CA (k j) Bu(j) = CA k A j Bu( j), y(k) = CA k x(), x() = A j Bu( j). 1 1.8.8.6.6.4.4.2.2.2.2.4.4.6 4 35 3 25 2 15 1 5 5 1 Discrete input u(k).6 1 5 5 1 15 2 25 3 35 4 Discrete output y(k) 1

N N Gramians derived from the Hankel map From y([, )) = Ox(), x() = Cu((, )), define the dual maps O : y([, )) x(), C : x() u((, )) and the (observability and controllability) Gramians G o. = O O, G c. = CC Continuous-time G o = + (Ce At ) T (Ce At )dt, G c = + (e At B)(e At B) T dt, Discrete-time G o = + (CA k ) T (CA k ), G c = + (A k B)(A k B) T, Gramians can be viewed as energy functions G c for past inputs x() G o for x() future outputs Perform ordered eigendecomposition Λ = T 1 (G c G o )T (with λ n >> λ n+1 ) and project on the first n coordinates : T 1 AT =. [ ] A11 A 12, T 1 B. [ ] B1 =, CT =. [ ] C A 21 A 22 B 1 C 2. 2 {Â, ˆB, Ĉ}. = {A 11, B 1, C 1 } 11

Interpolating the frequency response H(ω) seems a good idea since Parseval s theorem implies and G o = 1 2π G c = 1 2π + G o = 1 2π G c = 1 2π + + + ( jωi A T ) 1 C T C(jωI A) 1 dω, (jωi A) 1 BB T ( jωi A T ) 1 dω. (e jω I A T ) 1 C T C(e jω I A) 1 (e jω I A) 1 BB T (e jω I A T ) 1. What technique to use? The discrete-time case : y u 1 C y 1 y 2 = OC u 2 u 3 = CA [ CA 2 B AB A 2 B... ]...... suggests to use Krylov spaces! K j (M, R) = Im { R, MR, M 2 R,..., M j 1 R } u 1 u 2 u 3. 12

Rational interpolation and moment matching Let X and Y define a projector (Y T X = I n ) and {Â, ˆB, Ĉ, D} = { Y T AX, Y T B, CX, D } Taylor series of H(s). = C(sI A) 1 B + D around H(s) = H + H 1 s 1 + H 2 s 2 +, where the moments H i are equal to : H = D, H i = CA i 1 B, i = 1, 2,... The reduced order model Ĥ(s). = Ĉ(sI Â) 1 ˆB + ˆD has a similar expansion with moments Ĥi : Ĥ(s) = Ĥ + Ĥ1s 1 + Ĥ2s 2 +, Ĥ = ˆD, Ĥ i = ĈÂi 1 ˆB, i = 1, 2,... Moment matching of both models (Padé approximation) is obtained by using Krylov spaces (Gragg-Lindquist 83) 13

Theorem : Let m = p, Y T X = I n and assume ImX = Im [ B, AB, A 2 B,... A k 1 B ], ] ImY = Im [C T, A T C T, A 2T C T,... A (k 1)T C T then the first 2k moments match : H j = Ĥj, j = 1,..., 2k. Rational Krylov methods extend this to several points σ i : multipoint (Padé) approximations are obtained by just using modified Krylov spaces : (Gallivan-Grimme-VanDooren 97) ImX = i K ji {(A σ i I) 1, (A σ i I) 1 B} ImY = i K ji {(A T σ i I) 1, (A T σ i I) 1 C T } in the above theorem 14

Comparison Optimal and rational approximations 15th order approximation of 12 th order CD player 1 1 5 5 Frequency responses in db 5 1 Frequency responses in db 5 1 15 15 2 2 25 1 1 1 1 1 1 2 1 3 1 4 1 5 15th order Hankel norm approximation 25 1 1 1 1 1 1 2 1 3 1 4 1 5 15th order balanced truncation 1 5 Frequency responses in db 5 1 15 2 25 1 1 1 1 1 1 2 1 3 1 4 1 5 15th order RK with points 1,2,2,2,2,2,2,2,2,3,4,4,5,5,Inf Legend : Hankel norm - - Balanced truncation - - - Rational Krylov Errors T (.) ˆT (.) ln( T (.) ˆT (.) ) Hankel.2 6.1 Balanc..4 4.1 Rat. Kr. 4.2 1.5 Rational approximation looks better on a logarithmic scale 15

Time-varying systems Approximate the (discrete) time-varying systems { x(k + 1) = A(k)x(k) + B(k)u(k) y(k) = C(k)x(k) + D(k)u(k) by a lower order models of same type. We notice that y(k) C(k) y(k+1) y(k+2) = C(k+1)A(k) C(k+2)A(k+1)A(k) x(k),.. x(k) = [ B(k 1) A (k 1) B (k 2) A (k 1) A (k 2) B (k 3)... ] u(k 1) u(k 2) u(k 3). which again suggests Krylov. Use low rank approximations G o (k) S o (k)s T o (k), G c (k) S c (k)s T c (k), where S o (k) and S c (k) are N n matrices Such approximations are obtained e.g. by keeping only the n dominant singular vectors at step k of the Krylov recurrence : S c (k) = SV D n [B(k), A(k)S c (k 1)] 16

Let s go back to the Storm Surge example The important matrix in this Kalman filtering problem is P (k) = E{(x(k) ˆx(k))(x(k) ˆx(k)) T } which represents the error covariance of the estimation. If we approximate P (k) S(k)S(k) T, S(k) R N n, then we obtain the recurrence S(k + 1) SV D n [ R(k) C(k)S(k) A(k)S(k) B(k)Q(k) where the new factor S(k + 1) stays of rank n by a projection (using the SVD). The only big matrix involved here is the sparse matrix A(k) which is multiplied with only n columns Verlaan-Heemink 97 ] 17

Time-varying linearized problems Consider the discrete-time system { x(k + 1) = G(x(k), u(k)) y(k) = H(x(k), u(k)). One could linearize along a nominal trajectory (x(k), u(k)) and get A(.), B(.), C(.), D(.) from Taylor expansion of G(.,.), H(.,.) Simpler idea (POD) : (Holmes-Lumley-Berkooz 96) Use the energy function G. = k f k=k i x(k)x(k) T. From x(k + 1) = A(k)x(k) with initial conditions x(k i ), we have x(k) = Φ(k, k i )x(k i ). Therefore G looks like a Gramian : k f (Φ(k, k i )x(k i ))(Φ(k, k i )x(k i )) T. k=k i Now project on its dominant subspace (POD) 18

Example : Use POD in CVD reactor (Ly-Tran 99) Schematic representation of a horizontal quartz reactor in a steel confinement shell Compute state trajectories for one typical input : Snap shots of typical states Ten dominant states 19

Concluding remarks large-scale is typically sparse time-stepping (simulation) is cheaper than control (optimization) find an energy function that is cheap and project on its dominant features Future work find error bounds on the fly incorporate projections in closed loop Further reading P. Van Dooren, Gramian based model reduction of large-scale dynamical systems, in Numerical Analysis 1999, Chapman and Hall, pp. 231-247, CRC Press, London, 2. M. Verlaan and A. Heemink, Tidal flow forecasting using reduced rank square root filters, Stochastic Hydrology and Hydraulics, 11, pp. 349-368, 1997. See also SIAM short course notes on http://www.auto.ucl.ac.be/ vdooren/ 2