On the numerical solution of large-scale linear matrix equations V. Simoncini Dipartimento di Matematica, Università di Bologna (Italy) valeria.simoncini@unibo.it 1
Some matrix equations Sylvester matrix equation AX+XB +D = 0 Eigenvalue pbs and tracking, Control, MOR, Assignment pbs, Riccati eqn Lyapunov matrix equation AX+XA +D = 0, D = D Stability analysis in Control and Dynamical systems, Signal processing, eigenvalue computations Algebraic Riccati equation AX+XA XBB X+D = 0, D = D Lancaster-Rodman 95, Konstantinov-Gu-Mehrmann-Petkov, 02, Bini-Iannazzo-Meini 12 Focus: All or some of the matrices are large (and possibly sparse) 2
Some matrix equations Sylvester matrix equation AX+XB +D = 0 Eigenvalue pbs and tracking, Control, MOR, Assignment pbs, Riccati eqn Lyapunov matrix equation AX+XA +D = 0, D = D Stability analysis in Control and Dynamical systems, Signal processing, eigenvalue computations Algebraic Riccati equation AX+XA XBB X+D = 0, D = D Lancaster-Rodman 95, Konstantinov-Gu-Mehrmann-Petkov, 02, Bini-Iannazzo-Meini 12 Focus: All or some of the matrices are large (and possibly sparse) 3
Some matrix equations Sylvester matrix equation AX+XB +D = 0 Eigenvalue pbs and tracking, Control, MOR, Assignment pbs, Riccati eqn Lyapunov matrix equation AX+XA +D = 0, D = D Stability analysis in Control and Dynamical systems, Signal processing, eigenvalue computations Algebraic Riccati equation AX+XA XBB X+D = 0, D = D Lancaster-Rodman 95, Konstantinov-Gu-Mehrmann-Petkov, 02, Bini-Iannazzo-Meini 12 Focus: All or some of the matrices are large (and possibly sparse) 4
Some matrix equations Sylvester matrix equation AX+XB +D = 0 Eigenvalue pbs and tracking, Control, MOR, Assignment pbs, Riccati eqn Lyapunov matrix equation AX+XA +D = 0, D = D Stability analysis in Control and Dynamical systems, Signal processing, eigenvalue computations Algebraic Riccati equation AX+XA XBB X+D = 0, D = D Lancaster-Rodman 95, Konstantinov-Gu-Mehrmann-Petkov, 02, Bini-Iannazzo-Meini 12 Focus: All or some of the matrices are large (and possibly sparse) 5
Solving the Lyapunov equation. The problem Approximate X in: AX +XA +BB = 0 A R n n neg.real B R n p, 1 p n Time-invariant linear dynamical system: x (t) = Ax(t)+Bu(t), x(0) = x 0 Closed form solution: X = 0 e ta BB e ta dt = 0 xx dt with x = exp( ta)b. X symmetric semidef. see, e.g., Antoulas 05, Benner 06 6
Solving the Lyapunov equation. The problem Approximate X in: AX +XA +BB = 0 A R n n neg.real B R n p, 1 p n Time-invariant linear system: x (t) = Ax(t)+Bu(t), x(0) = x 0 Closed form solution: X = 0 e ta BB e ta dt X symmetric semidef. see, e.g., Antoulas 05, Benner 06 7
Linear systems vs linear matrix equations Large linear systems: Ax = b, A R n n Krylov subspace methods (CG, MINRES, GMRES, BiCGSTAB, etc.) Preconditioners: find P such that AP 1 x = b x = P 1 x is easier and fast to solve Large linear matrix equation: AX +XA +BB = 0 No preconditioning to preserve symmetry X is a large, dense matrix low rank approximation X X = ZZ, Ztall 8
Linear systems vs linear matrix equations Large linear systems: Ax = b, A R n n Krylov subspace methods (CG, MINRES, GMRES, BiCGSTAB, etc.) Preconditioners: find P such that AP 1 x = b x = P 1 x is easier and fast to solve Large linear matrix equations: AX +XA +BB = 0 No preconditioning - to preserve symmetry X is a large, dense matrix low rank approximation X X = ZZ, Ztall 9
Linear systems vs linear matrix equations Large linear systems: Ax = b, A R n n Krylov subspace methods (CG, MINRES, GMRES, BiCGSTAB, etc.) Preconditioners: find P such that AP 1 x = b x = P 1 x is easier and fast to solve Large linear matrix equations: AX +XA +BB = 0 Kronecker formulation: (A I +I A)x = b x = vec(x) 10
Given an approximation space K, Projection-type methods X X m col(x m ) K Galerkin condition: R := AX m +X m A +BB K VmRV m = 0 K = Range(V m ) Assume VmV m = I m and let X m := V m Y m Vm. Projected Lyapunov equation: Vm(AV m Y m Vm +V m Y m VmA + BB )V m = 0 (VmAV m )Y m +Y m (VmA V m ) + VmBB V m = 0 Early contributions: Saad 90, Jaimoukha & Kasenally 94, for K = K m (A,B) = Range([B,AB,...,A m 1 B]) 11
Given an approximation space K, Projection-type methods X X m col(x m ) K Galerkin condition: R := AX m +X m A +BB K VmRV m = 0 K = Range(V m ) Assume VmV m = I m and let X m := V m Y m Vm. Projected Lyapunov equation: Vm(AV m Y m Vm +V m Y m VmA +BB )V m = 0 (VmAV m )Y m +Y m (VmA V m )+VmBB V m = 0 Early contributions: Saad 90, Jaimoukha & Kasenally 94, for K = K m (A,B) = Range([B,AB,...,A m 1 B]) 12
Given an approximation space K, Projection-type methods X X m col(x m ) K Galerkin condition: R := AX m +X m A +BB K VmRV m = 0 K = Range(V m ) Assume VmV m = I m and let X m := V m Y m Vm. Projected Lyapunov equation: Vm(AV m Y m Vm +V m Y m VmA +BB )V m = 0 (VmAV m )Y m +Y m (VmA V m )+VmBB V m = 0 Early contributions: Saad 90, Jaimoukha & Kasenally 94, for K = K m (A,B) = Range([B,AB,...,A m 1 B]) 13
More recent options as approximation space Enrich space to decrease space dimension Extended Krylov subspace K = K m (A,B)+K m (A 1,A 1 B), that is, K = Range([B,A 1 B,AB,A 2 B,A 2,A 3 B,...,]) (Druskin & Knizhnerman 98, Simoncini 07) Rational Krylov subspace K = Range([B,(A s 1 I) 1 B,...,(A s m I) 1 B]) usually, {s 1,...,s m } C + chosen a-priori In both cases, for Range(V m ) = K, projected Lyapunov equation: (VmAV m )Y m +Y m (VmA V m )+VmBB V m = 0 X m = V m Y m Vm 14
More recent options as approximation space Enrich space to decrease space dimension Extended Krylov subspace K = K m (A,B)+K m (A 1,A 1 B), that is, K = Range([B,A 1 B,AB,A 2 B,A 2,A 3 B,...,]) (Druskin & Knizhnerman 98, Simoncini 07) Rational Krylov subspace K = Range([B,(A s 1 I) 1 B,...,(A s m I) 1 B]) usually, {s 1,...,s m } C + chosen a-priori In both cases, for Range(V m ) = K, projected Lyapunov equation: (VmAV m )Y m +Y m (VmA V m )+VmBB V m = 0 X m = V m Y m Vm 15
More recent options as approximation space Enrich space to decrease space dimension Extended Krylov subspace K = K m (A,B)+K m (A 1,A 1 B), that is, K = Range([B,A 1 B,AB,A 2 B,A 2,A 3 B,...,]) (Druskin & Knizhnerman 98, Simoncini 07) Rational Krylov subspace K = Range([B,(A s 1 I) 1 B,...,(A s m I) 1 B]) usually, {s 1,...,s m } C + chosen a-priori In both cases, for Range(V m ) = K, projected Lyapunov equation: (VmAV m )Y m +Y m (VmA V m )+VmBB V m = 0 X m = V m Y m Vm 16
Rational Krylov Subspaces. A long tradition... In general, K m (A,B,s) = Range([(A s 1 I) 1 B,(A s 2 I) 1 B,...,(A s m I) 1 B]) Eigenvalue problems (Ruhe, 1984) Model Order Reduction (transfer function evaluation) In Alternating Direction Implicit iteration (ADI) for linear matrix equations 17
Rational Krylov Subspaces in MOR. Choice of poles. K m (A,B,s) = Range([(A s 1 I) 1 B,(A s 2 I) 1 B,...,(A s m I) 1 B]) cf. General discussion in Antoulas, 2005. Many contributions: Gallivan, Grimme, Van Dooren (1996, ad-hoc poles) Penzl (1999-2000, ADI shifts - preprocessing, Ritz values)... Sabino (2006 - tuning within preprocessing) IRKA Gugercin, Antoulas, Beattie (2008) Druskin, Lieberman, Simoncini, Zaslavski (adaptive greedy procedure) Güttel, Knizhnerman (black-box for matrix functions)... 18
Alternating Direction Implicit iteration (ADI) - Wachspress (see, e.g., Li 2000, Penzl 2000) X 0 = 0, X j = 2p j (A+p j I) 1 BB (A+p j I) j = 1,...,l +(A+p j I) 1 (A p j I)X j 1 (A p j I) (A+p j I) with φ l (t) = l (t p j ), {p 1,...,p l } = argmin max j=1 t Λ(A) φ l (t) φ l ( t) Implementation aspects: Benner, Saak, Quintana-Ortì 2,... Convergence depends on choice of poles {p j } More advanced approach: Galerkin-Projection Accelerated ADI (Benner, Saak, tr 2010) 19
ADI and Rational Krylov subspaces Let B = b (vector). Main consideration (see, e.g., Li, Wright 2000) col(x m (ADI) ) K m (A,b,s) and also, for U m = [(A s 1 I) 1 b,...,(a s m I) 1 b], X (ADI) m = U m α 1 U m with α Cauchy matrix Equivalence between ADI and RKSM: ADI coincides with the Galerkin solution X m in Rational Krylov space if and only if s j = λ j where λ j = eigs(v mav m ) Ritz values (suitably ordered) Druskin, Knizhnerman, S. 11, Beckermann 11, Flagg 09, Gugercin, Flagg 12 20
ADI and Rational Krylov subspaces Let B = b (vector). Main consideration (see, e.g., Li, Wright 2000) col(x m (ADI) ) K m (A,b,s) and also, for U m = [(A s 1 I) 1 b,...,(a s m I) 1 b], X (ADI) m = U m α 1 U m with α Cauchy matrix Equivalence between ADI and RKSM: ADI coincides with the Galerkin solution X m in Rational Krylov space if and only if s j = λ j where λ j = eigs(v mav m ) Ritz values (suitably ordered) Druskin, Knizhnerman, S. 11, Beckermann 11, Flagg 09, Gugercin, Flagg 12 21
Typical behavior of ADI and generic RKSM for the same poles Operator: L(u) = u+(50xu x ) x +(50yu y ) y on [0,1] 2 4 10 2 3 2 10 0 10 2 RKSM cyclic ADI imaginary part 1 0 1 2 3 F norm of error 10 4 10 6 10 8 10 10 4 0 1 2 3 4 5 6 7 8 9 real part 10 12 0 5 10 15 20 25 30 35 40 45 number of iterations Same non-optimal 20 poles, repeated cyclically. 22
Expected performance (from Oberwolfach Collection) 10 2 10 4 backward error 10 4 10 6 10 8 10 10 adaptive RKSM RKSM ADI backward error 10 6 10 8 10 10 adaptive RKSM RKSM ADI 10 12 10 12 10 14 10 14 0 10 20 30 40 50 60 number of system solves 0 5 10 15 20 25 30 35 40 number of system solves Left: rail problem, A symmetric. Right: flow meter model v0.5 problem, A nonsymmetric. ADI and RKSM use 10 non-optimal poles cyclically (computed a-priori with lyapack, Penzl 2000) 23
Multiterm linear matrix equation A 1 XB 1 +A 2 XB 2 +...+A l XB l = C Applications: Matrix least squares Control Stochastic PDEs... Main device: Kronecker formulation ( B 1 A 1 +...+B l A l ) x = c Iterative methods: matrix-matrix multiplications and rank truncation (Benner, Breiten, Bouhamidi, Chehab, Damm, Grasedyck, Jbilou, Kressner, Matthies, Onwunta, Raydan, Stoll, Tobler, Zander,...) 24
Multiterm linear matrix equation A 1 XB 1 +A 2 XB 2 +...+A l XB l = C Applications: Matrix least squares Control Stochastic PDEs... Main device: Kronecker formulation ( B 1 A 1 +...+B l A l ) x = c Iterative methods: matrix-matrix multiplications and rank truncation (Benner, Breiten, Bouhamidi, Chehab, Damm, Grasedyck, Jbilou, Kressner, Matthies, Onwunta, Raydan, Stoll, Tobler, Zander,...) 25
Other related matrix equations More exotic linear matrix equations Sylvester-like BX +f(x)a = C typically (but not only!) f(x) = X, f(x) = X, or f(x) = X (Bevis, Braden, Byers, Chiang, De Terán, Dopico, Duan, Feng, Guillery, Hall, Hartwig, Ikramov, Kressner, Montealegre, Reyes, Schröder, Vorntsov, Watkins, Wu,...) 26
The -Sylvester matrix equations Solve for X: AX +X B = C, ( ) A unique solution exists for any C R n n iff A λb is regular and spec(a,b )\{1} is reciprocal free (with 1 having at most algebraic multiplicity 1) Small scale: Bartel-Stewart type algorithm (De Teran, Dopico, 2011) If X 0 is the unique solution to the Sylvester eqn AXA B XB = C C A 1 B then X 0 is the unique solution to ( ) 27
The large scale -Sylvester matrix equations AX +X B = C 1 C 2, C 1,C 2 R n r, r n Find: X X m = V m Y m W m R n n Orthogonality (Petrov-Galerkin) condition: W m(ax m +X mb C 1 C 2 )W m = 0 (the orthogonality space is different from the approximation space) Reduced T-Sylvester equation: (W mav m )Y m +Y m(v mbw m ) = (W mc 1 )(W mc 2 ) Key issue: Choice of V m,w m 28
The selection of V m,w m Exploit the generalized Schur decomposition: A = WT A V and B = WT B V (W, V orthogonal) from which B A = V T 1 B T AV and B V = WT B Therefore: Range(V m ) B AV = V T 1 B T A Range(W m ) = B Range(V m ) and B V = WT B good approx to invariant subspaces of B A Range(V m ) = K m (B T A,B T [C 1,C 2 ]) Range(W m ) = B Range(V m ) 29
The selection of V m,w m Exploit the generalized Schur decomposition: A = WT A V and B = WT B V (W, V orthogonal) from which B A = V T 1 B T AV and B V = WT B Therefore: Range(V m ) B AV = V T 1 B T A Range(W m ) = B Range(V m ) and B V = WT B good approx to invariant subspaces of B A Range(V m ) = K m (B A,B [C 1,C 2 ]), Range(W m ) = B Range(V m ) 30
The selection of V m,w m Range(V m ) = K m (B A,B [C 1,C 2 ]), Range(W m ) = B Range(V m ) Algorithmic considerations: Range(W m ) = K m (AB, [C 1,C 2 ]) so that Range(C 1 ) Range(C 2 ) Range(W m ) If C 1 = C 2 then Range(V m ) = K m (B A,B C 1 ) The role of A and B can be reversed (A B, B A, C 1 C 2 ) Remark: Enriched spaces can be used... 31
Computational considerations n = 10 4. A and B: finite difference discretizations in [0,1] 2 of a(u) = ( exp( xy)u x ) x +( exp(xy)u y ) y +100xu x +γu b(u) = u xx u yy, γ = 5 10 4 tol = 10 10 EK BK BK-TR EK-SYLV iterations 8 83 8 8 dim. approx. space 32 166 16 32 time (seconds) 1.7 58.1 0.7 2.4 BK-TR: Standard Krylov subspace, roles of A and B reversed All eigenvalues of B A are well outside the unit circle 32
Computational considerations n = 10 4. A and B: finite difference discretizations in [0,1] 2 of a(u) = ( exp( xy)u x ) x +( exp(xy)u y ) y +100xu x +γu, b(u) = u xx u yy +100xu x, γ = 5 10 4 tol = 10 10 EK BK* BK-TR* EK-SYLV* iterations 29 100 100 100 dim. approx. space 116 200 200 400 time (seconds) 10.9 70.7 63.8 521.2 eigenvalues of B A are now located inside and outside the unit circle 33
Conclusions Large advances in solving really large linear matrix equations Matrix equation challenges rely on strength and maturity of linear system solvers Some references V. S., Computational methods for linear matrix equations SIAM Review, v.58, n.3, 2016 Froilan Dopico, Javier Gonzalez, Daniel Kressner and V. S. Projection methods for large T-Sylvester equations Mathematics of Computations, v.85, n.301, 2016 Stephen D. Shank, V. S. and Daniel B. Szyld Efficient low-rank solutions of Generalized Lyapunov equations Numerische Mathematik, v.134, n.2, 2016 34