Non-Intrusive Solution of Stochastic and Parametric Equations

Hermann G. Matthies (a), Loïc Giraldi (b), Alexander Litvinenko (c), Dishi Liu (d), and Anthony Nouy (b)

(a) TU Braunschweig, Brunswick, Germany
(b) École Centrale de Nantes, GeM, Nantes, France
(c) KAUST, Thuwal, Saudi Arabia
(d) Institute of Aerodynamics and Flow Control, DLR, Brunswick, Germany

wire@tu-bs.de — http://www.wire.tu-bs.de
Overview

1. Parametric equations
2. Stochastic model problem
3. Plain vanilla Galerkin
4. To be or not to be intrusive
5. Numerical comparison
6. Galerkin and low-rank tensor approximation
7. Non-intrusive computation
8. Numerical examples
General mathematical setup

Consider an operator equation for a physical system modelled by $A$:
$$A(p; u) = f(p), \qquad u \in \mathcal{U},\ f \in \mathcal{F},$$
with $\mathcal{U}$ the space of states and $\mathcal{F} = \mathcal{U}^*$ the dual space of actions / forcings. Operator and rhs depend on parameters $p$; the problem is well posed for all $p \in \mathcal{P}$.

An iterative solver, convergent for all values of $p$, iterates for $k = 0, 1, \dots$:
$$u^{(k+1)}(p) = S\big(p;\, u^{(k)}(p),\, R(p; u^{(k)}(p))\big), \qquad u^{(k)}(p) \to u^*(p),$$
where $S$ is one cycle of the solver, and the residuum is
$$R(u^{(k)}) := R(p; u^{(k)}(p)) := f(p) - A(p; u^{(k)}).$$
When the residuum vanishes, $R(p; u^*(p)) = 0$, the mapping $S$ has a fixed point: $u^*(p) = S(p; u^*(p), 0)$.
Model stochastic problem

[Figure: 2D aquifer model with stochastic data — geometry with sources, outflow, and Dirichlet b.c.]

$$-\nabla\cdot\big(\kappa(x, p)\, \nabla u(x, p)\big) = f(x, p) \ \text{ and b.c.}, \qquad x \in G \subset \mathbb{R}^d,$$
$$-\big(\kappa(x, p)\, \nabla u(x, p)\big)\cdot n = g(x, p), \qquad x \in \Gamma \subset \partial G,\ p \in \mathcal{P}.$$
Here $\kappa$ is the stochastic conductivity, and $f$ and $g$ are stochastic sinks and sources. One $p$ is a realisation of $\kappa, f, g$.
Preconditioned residual

In the iteration, set $u^{(k+1)} = u^{(k)} + \Delta u^{(k)}$ with
$$\Delta u^{(k)} := S\big(p; u^{(k)}, R(p; u^{(k)})\big) - u^{(k)},$$
and usually $P(\Delta u^{(k)}) = R(p; u^{(k)})$, so that
$$S(p; u^{(k)}) = u^{(k)} + P^{-1}\big(R(p; u^{(k)})\big) \quad \text{(list of arguments shortened)}.$$
Here $P$ is some preconditioner, which may depend on $p$, the iteration counter $k$, and the current iterate $u^{(k)}$; e.g. in Newton's method $P = D_u A(p; u^{(k)})$.
Iteration

Algorithm:
  Start with some initial guess $u^{(0)}$; $k \leftarrow 0$
  while no convergence do
    Compute $\Delta u^{(k)} \leftarrow S\big(p; u^{(k)}, R(p; u^{(k)})\big) - u^{(k)}$
    $u^{(k+1)} \leftarrow u^{(k)} + \Delta u^{(k)}$
    $k \leftarrow k + 1$
  end while

Uniform contraction: for all $p, u, v$:
$$\big\| S\big(p; u(p), R(p; u(p))\big) - S\big(p; v(p), R(p; v(p))\big) \big\|_{\mathcal{U}} \le \varrho\, \| u(p) - v(p) \|_{\mathcal{U}}, \qquad \varrho < 1.$$
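A minimal sketch of this iteration for a single parameter value, under the assumption of a linear toy operator and a Jacobi preconditioner (all names are hypothetical, not from the deck):

```python
import numpy as np

def solve_fixed_point(A, f, P_inv, u0, tol=1e-10, max_iter=100):
    """Preconditioned residual iteration for one parameter value p:
    u^(k+1) = u^(k) + P^{-1} R(u^(k)),  with residuum R(u) = f - A(u)."""
    u = u0.copy()
    for k in range(max_iter):
        residuum = f - A(u)                 # R(p; u^(k))
        if np.linalg.norm(residuum) < tol:  # convergence reached
            break
        u = u + P_inv(residuum)             # one solver cycle S
    return u

# toy linear operator A(u) = M u with Jacobi preconditioner P = diag(M)
M = np.array([[4.0, 1.0], [1.0, 3.0]])
f = np.array([1.0, 2.0])
u = solve_fixed_point(lambda v: M @ v, f, lambda r: r / np.diag(M),
                      u0=np.zeros(2))
print(u, np.linalg.norm(M @ u - f))  # solution and residuum norm
```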
Discretisation I

Let $\mathcal{S} \subset \mathbb{R}^{\mathcal{P}}$ be an appropriate Hilbert space of real functions on $\mathcal{P}$; look for the solution in the tensor space $\mathbf{U} := \mathcal{U} \otimes \mathcal{S}$, so that $u(p) = \sum_\iota u_\iota\, \varsigma_\iota(p)$. Normally one discretises $\mathcal{U}$ first by choosing a finite-dimensional $\mathcal{U}_N \subset \mathcal{U}$, but the results here are independent of that.

Direct integration: to compute a Quantity of Interest (QoI)
$$Q(u) = \int_{\mathcal{P}} Q(u(p), p)\, \mu(dp) \approx \sum_z w_z\, Q(u(p_z), p_z),$$
the integrand and $u(p_z)$ have to be computed for all $p_z$: expensive! But decoupled, non-intrusive solves. Want to replace it by a proxy / meta model or emulator $u(p) \approx u_M(p)$.
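A minimal sketch of direct integration, with a one-dimensional closed-form stand-in for the expensive solve (assumptions: uniform measure on $[-1,1]$, hypothetical names):

```python
import numpy as np

# Direct integration of a QoI: Q(u) ≈ sum_z w_z Q(u(p_z), p_z).
# Hypothetical scalar model: u(p) solves (2 + p) u = 1; QoI is u itself.
nodes, weights = np.polynomial.legendre.leggauss(8)  # quadrature on [-1, 1]
weights = weights / 2.0                              # uniform measure mu

def solve(p):            # expensive "full solve" at one parameter value
    return 1.0 / (2.0 + p)

qoi = sum(w * solve(p) for p, w in zip(nodes, weights))
print(qoi)  # ≈ ∫ 1/(2+p) dp/2 = log(3)/2 ≈ 0.5493
```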
Discretisation II

(Further) discretise by choosing $\mathcal{S}_M = \operatorname{span}\{\Psi_\alpha(p)\} \subset \mathcal{S}$ to give $\mathcal{U} \otimes \mathcal{S}_M = \mathbf{U}_M \subset \mathcal{U} \otimes \mathcal{S} = \mathbf{U}$. Ansatz:
$$u_M(p) = \sum_\alpha u_\alpha\, \Psi_\alpha(p) \in \mathbf{U}_M.$$
Often $\Psi_\alpha(p) = \Psi_\alpha(\theta(p))$, where the $\theta(p) = [\dots, \theta_l(p), \dots]$ are independent. If $\Psi_\alpha(\theta) = \prod_l \psi_{\alpha_l}(\theta_l)$ with multi-index $\alpha = (\dots, \alpha_l, \dots)$, then $\mathcal{S}_M = \bigotimes_l \mathcal{S}_{M,l}$ allows for higher-degree tensors. Computation is simplest when the $\Psi_\alpha$ are orthogonal (orthonormal), e.g. in the inner product
$$\langle \phi, \varphi \rangle_{\mathcal{S}} = \int_{\mathcal{P}} \phi(p)\,\varphi(p)\, \mu(dp).$$
How to determine the unknown coefficients $u_\alpha$?
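As an illustration, assuming a uniform parameter measure on $[-1,1]^2$, a tensor-product orthonormal basis can be built from rescaled Legendre polynomials (a sketch; all names are hypothetical):

```python
import numpy as np
from itertools import product

def psi(n, t):
    """Legendre polynomial of degree n, orthonormal w.r.t. the uniform
    measure on [-1, 1] (assumed here as the parameter distribution)."""
    c = np.zeros(n + 1); c[n] = 1.0
    return np.polynomial.legendre.legval(t, c) * np.sqrt(2 * n + 1)

def Psi(alpha, theta):
    """Tensor-product basis Psi_alpha(theta) = prod_l psi_{alpha_l}(theta_l)."""
    return np.prod([psi(a, t) for a, t in zip(alpha, theta)])

# all multi-indices alpha with per-dimension degree <= 2 in 2 variables
multi_indices = list(product(range(3), repeat=2))
print(Psi((1, 2), (0.3, -0.5)))
```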
Solution procedures I

Projection of the solution $u(p)$, or of the residuum $R(p; u(p))$.

Interpolation: determine the $u_\alpha$ by the interpolating condition
$$\forall p_\beta: \quad u(p_\beta) \overset{!}{=} u_M(p_\beta) = \sum_\alpha u_\alpha\, \Psi_\alpha(p_\beta).$$
Simplest when the Kronecker-δ property $\Psi_\alpha(p_\beta) = \delta_{\alpha\beta}$ is satisfied. Solve the equation at the interpolation points $p_\beta$ — decoupled, non-intrusive solves.

Pseudo-spectral projection: simple as the $\Psi_\alpha$ are orthonormal. Compute the projection inner product (integral) by quadrature, i.e.
$$u_\alpha = \int_{\mathcal{P}} \Psi_\alpha(p)\, u(p)\, \mu(dp) \approx \sum_z w_z\, \Psi_\alpha(p_z)\, u(p_z);$$
solve the equation at the quadrature points $p_z$ — decoupled, non-intrusive solves.
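A minimal pseudo-spectral sketch, with a closed-form stand-in for the full solve (assumed scalar model, hypothetical names):

```python
import numpy as np

def psi(n, t):  # orthonormal Legendre basis, uniform measure on [-1, 1]
    c = np.zeros(n + 1); c[n] = 1.0
    return np.polynomial.legendre.legval(t, c) * np.sqrt(2 * n + 1)

def u_exact(p):                  # stand-in for a full solve at p
    return 1.0 / (2.0 + p)

nodes, weights = np.polynomial.legendre.leggauss(10)
weights = weights / 2.0          # uniform measure on [-1, 1]

# pseudo-spectral projection: u_alpha ≈ sum_z w_z Psi_alpha(p_z) u(p_z)
coeffs = [sum(w * psi(a, p) * u_exact(p) for p, w in zip(nodes, weights))
          for a in range(5)]

u_M = lambda p: sum(c * psi(a, p) for a, c in enumerate(coeffs))
print(u_M(0.4), u_exact(0.4))    # proxy vs. exact solve
```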
Solution procedures II

The mapping $u(\cdot) \mapsto u_M(\cdot)$ is a projection $\Pi$. To describe a general projection, choose $\hat{\mathcal{S}}_M = \operatorname{span}\{\Phi_\alpha(p)\}$; the projection is orthogonal to $\hat{\mathcal{S}}_M$:
$$\forall \varphi \in \hat{\mathcal{S}}_M: \quad \langle (I - \Pi)u,\, \varphi \rangle = 0, \quad \text{i.e. } \hat{\mathcal{S}}_M \perp \operatorname{im}(I - \Pi).$$
Approximation properties are determined by $\mathcal{S}_M$, stability by $\hat{\mathcal{S}}_M$.

Collocation / interpolation: solve the equation at the collocation / interpolation points $p_\beta$, i.e. $\Phi_\beta(p) = \delta(p - p_\beta)$:
$$R\big(p_\beta; u_M(p_\beta)\big) = R\big(p_\beta; \textstyle\sum_\alpha u_\alpha \Psi_\alpha(p_\beta)\big) \overset{!}{=} 0.$$
With the Kronecker-δ property: $R(p_\beta; u_\beta) = 0$ — same as interpolation, decoupled, non-intrusive solves.

We worry about the norm $\|\Pi\|$: the norm of the collocation projector $\Pi_C$ may grow with $M$.
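A minimal collocation sketch (hypothetical scalar model; `scipy` is used for the Lagrange interpolant). Chebyshev points are chosen here because they keep the norm of $\Pi_C$ (the Lebesgue constant) growing only slowly with $M$:

```python
import numpy as np
from scipy.interpolate import lagrange

def solve(p):                       # full solve at one collocation point
    return 1.0 / (2.0 + p)

p_beta = np.cos(np.pi * np.arange(6) / 5)   # Chebyshev-Lobatto points
u_M = lagrange(p_beta, [solve(p) for p in p_beta])
print(u_M(0.4), solve(0.4))         # interpolant vs. exact solve
```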
Projectors

The pseudo-spectral projector $\Pi_P$ is orthogonal, i.e. $\|\Pi_P\| = 1$. This means that $\hat{\mathcal{S}}_M = \mathcal{S}_M$, normally $\Phi_\alpha = \Psi_\alpha$.

Galerkin: apply Galerkin weighting:
$$\forall \beta: \quad \langle \Phi_\beta(p),\, R(p; u_M(p)) \rangle = \big\langle \Phi_\beta(p),\, R\big(p; \textstyle\sum_\alpha u_\alpha \Psi_\alpha(p)\big) \big\rangle = 0.$$
Coupled equations — is it intrusive? When solved in a partitioned way, with the residua computed by quadrature, it is non-intrusive: it needs only residua at the quadrature points. To have the norm of the projector as small as possible (Bubnov-Galerkin), choose the orthogonal projection $\Phi_\alpha = \Psi_\alpha$.
Galerkin on iteration equation

Trick: project the iteration equation. Set $u^{(k)}(p) = \sum_\alpha u^{(k)}_\alpha \Psi_\alpha(p)$ and $\mathbf{u}^{(k)} = [\dots, u^{(k)}_\beta, \dots]$:
$$u^{(k+1)} = u^{(k)} + \Delta u^{(k)} = S\big(u^{(k)}, R(u^{(k)})\big) \ \Rightarrow\ \mathbf{u}^{(k+1)} = \mathbf{u}^{(k)} + \Delta_M \mathbf{u}^{(k)},$$
with
$$\Delta_M \mathbf{u}^{(k)} := \big[\dots,\ \langle \Psi_\beta,\, S\big(p; u^{(k)}(p), R(p; u^{(k)}(p))\big) \rangle,\ \dots\big] - \mathbf{u}^{(k)}.$$
Define a mapping $\mathbf{S}(\mathbf{u})$:
$$\mathbf{S}(\mathbf{u}) := \big[\dots,\ \langle \Psi_\beta,\, S\big(p; \textstyle\sum_\alpha u_\alpha \Psi_\alpha(p),\, R(p; \sum_\alpha u_\alpha \Psi_\alpha(p))\big) \rangle,\ \dots\big];$$
then $\Delta_M \mathbf{u}^{(k)} = \mathbf{S}(\mathbf{u}^{(k)}) - \mathbf{u}^{(k)}$ and $\mathbf{u}^{(k+1)} = \mathbf{u}^{(k)} + \Delta_M \mathbf{u}^{(k)} = \mathbf{S}(\mathbf{u}^{(k)})$.
Convergence

  Start with some initial guess $\mathbf{u}^{(0)}$; $k \leftarrow 0$
  while no convergence do
    Compute $\Delta_M \mathbf{u}^{(k)}$ as above
    $\mathbf{u}^{(k+1)} \leftarrow \mathbf{u}^{(k)} + \Delta_M \mathbf{u}^{(k)}$
    $k \leftarrow k + 1$
  end while

This is a nonlinear block Jacobi algorithm.

Theorem: the mapping $\mathbf{S}$ has the same contraction factor $\varrho$. This means that the simple nonlinear block Jacobi algorithm converges as before.
The myth about intrusiveness

Folklore: Galerkin methods are intrusive. They can be, but they don't have to be. Question: to be or not to be intrusive?

The stochastic Galerkin conditions for the iteration equation require $\mathbf{S}(\mathbf{u}^{(k)})$, approximated by
$$\mathbf{S}(\mathbf{u}^{(k)}) \approx \mathbf{S}_Z(\mathbf{u}^{(k)}) = \Big[\dots,\ \sum_z \upsilon_z\, \Psi_\beta(p_z)\, S\big(p_z;\, u^{(k)}(p_z),\, R(p_z; u^{(k)}(p_z))\big),\ \dots\Big],$$
to give $\Delta_M \mathbf{u}^{(k)} \approx \Delta_Z \mathbf{u}^{(k)} = \mathbf{S}_Z(\mathbf{u}^{(k)}) - \mathbf{u}^{(k)}$. This requires only the evaluation of the preconditioned residuum — one iteration with the solver — at each $p_z$. The theorem still holds with $\Delta_M \mathbf{u}^{(k)}$ replaced by $\Delta_Z \mathbf{u}^{(k)}$ in the algorithm.
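A sketch of this non-intrusive Galerkin iteration: the deterministic solver cycle is called once per quadrature point and the result is re-projected onto the basis (assumed scalar model with a Richardson-type solver cycle; all names hypothetical):

```python
import numpy as np

def psi(n, t):  # orthonormal Legendre basis, uniform measure on [-1, 1]
    c = np.zeros(n + 1); c[n] = 1.0
    return np.polynomial.legendre.legval(t, c) * np.sqrt(2 * n + 1)

def solver_cycle(p, u):
    """One cycle of a deterministic solver at parameter p (assumed model:
    (2 + p) u = 1, simple Richardson update, contraction < 1)."""
    return u + 0.4 * (1.0 - (2.0 + p) * u)

nodes, weights = np.polynomial.legendre.leggauss(10)
weights = weights / 2.0
n_basis = 5
u_hat = np.zeros(n_basis)           # Galerkin coefficients u^(k)

for k in range(60):                 # block iteration; the theorem applies
    u_at = [sum(u_hat[a] * psi(a, p) for a in range(n_basis)) for p in nodes]
    s_at = [solver_cycle(p, u) for p, u in zip(nodes, u_at)]  # non-intrusive
    u_hat = np.array([sum(w * psi(b, p) * s
                          for p, w, s in zip(nodes, weights, s_at))
                      for b in range(n_basis)])

print(u_hat[:3])  # compare with the projection of 1/(2+p)
```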
Numerical example

[Figure: resistor network with nodes 1-6 connected by resistors R.]

$$A(p; u) := K u + \lambda_1(p_1)\, (u^T u)\, u = \lambda_2(p_2)\, f_0 =: f(p), \qquad f_0 := [1, 0, 0, 0, 0]^T.$$
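A sketch of the corresponding residuum, with a stand-in stiffness matrix $K$ (the actual matrix of the resistor network in the figure is not reproduced here) and the Case 1 coefficients from the next slide:

```python
import numpy as np

# Residuum of the example: R(p; u) = f(p) - A(p; u), with
#   A(p; u) = K u + lambda_1(p_1) (u^T u) u,   f(p) = lambda_2(p_2) f_0.
# K below is a stand-in SPD matrix; the actual network matrix differs.
K = 2.0 * np.eye(5) - np.eye(5, k=1) - np.eye(5, k=-1)
f0 = np.array([1.0, 0.0, 0.0, 0.0, 0.0])
lam1 = lambda p1: p1 + 2.0          # Case 1 from the next slide
lam2 = lambda p2: p2 + 25.0

def residual(p, u):
    p1, p2 = p
    return lam2(p2) * f0 - (K @ u + lam1(p1) * (u @ u) * u)

print(residual((0.0, 0.0), np.zeros(5)))   # initial residuum = f(p)
```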
Numerical example spec

           Case 1     Case 2       Case 3      Case 4
λ₁(p₁)     p₁ + 2     p₁ + 1.1     p₁ + 2      sin(4 p₁ + 2)
λ₂(p₂)     p₂ + 25    25 p₂ + 0.5  10 p₂ + 30  10 sin(p₂) + 30
c.o.v.     2.5e-2     2.9e+1       1.7e-1      2.2e-1

[Figure: RMSE vs. convergence criterion ε_tol (10⁻¹⁰ … 10⁰) for 2nd- to 5th-order polynomials.]
Numerical results

order    solver calls    ε(L₂(u))          ε(L₁(u))          ε(L₂(R_u))
m        P      G        P        G        P        G        P        G
2        79     90       6.1e-5   6.1e-5   3.5e-5   3.5e-5   4.1e-5   4.1e-5
3        161    192      3.9e-6   3.9e-6   2.3e-6   2.3e-6   2.6e-6   2.6e-6
4        284    325      2.7e-7   2.7e-7   1.6e-7   1.6e-7   1.8e-7   1.8e-7
5        458    540      2.0e-8   2.0e-8   1.2e-8   1.2e-8   1.4e-8   1.4e-8
(P: projection, G: Galerkin)

Low-rank approximation: write $\mathbf{u} := [\dots, u_\alpha, \dots] = (u_{\alpha,n})$,
$$\mathbf{u} = \sum_{\alpha,n} u_{\alpha,n}\, e^\alpha \otimes e_n \approx \mathbf{u}_r = \sum_{j=1}^{r-1} w_j \otimes \eta_j.$$
Use faster global methods than block Jacobi, e.g. Quasi-Newton. Try to keep a low-rank tensor approximation throughout, from input fields to output solution.
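To make the low-rank format concrete: for a plain coefficient array, the best rank-$r$ approximation in the Frobenius norm is given by the truncated SVD. A small sketch with synthetic data (the deck instead builds the approximation greedily, two slides on):

```python
import numpy as np

# Coefficient array u = (u_{alpha,n}): rows = basis index alpha (size M),
# columns = spatial dof n (size N). Synthetic exactly-rank-3 data.
rng = np.random.default_rng(0)
U = rng.normal(size=(20, 3)) @ rng.normal(size=(3, 50))

# truncated SVD gives the best rank-r approximation u_r = sum_j w_j ⊗ eta_j
W, s, Vt = np.linalg.svd(U, full_matrices=False)
r = 3
u_r = (W[:, :r] * s[:r]) @ Vt[:r, :]
print(np.linalg.norm(U - u_r))   # ~0: rank 3 captures everything here
```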
Successive rank-one updates (SR1U)

Assume a functional $J(p; u)$ such that $A(p; u) - f(p) = \delta_u J(p; u) = 0$, so that solving is equivalent to minimising $J$ for each $p$. Build the solution rank-one by rank-one, i.e. to the already computed $\mathbf{u}_r := \sum_{j=1}^{r-1} w_j \otimes \eta_j$ add a new term $w_r \otimes \eta_r$ through
$$\min_{w_r, \eta_r} J(\mathbf{u}_r + w_r \otimes \eta_r) \quad \Leftrightarrow \quad \delta_{w,\eta}\, J(\mathbf{u}_r + w_r \otimes \eta_r) = 0:$$
successive rank-one updates (SR1U), a.k.a. proper generalised decomposition (PGD). This Galerkin procedure only solves small problems; good approximations often come with small $r$.
Low-rank approximation (basic PGD)

Define $J_r(w_r, \eta_r) := J(\mathbf{u}_r(p) + w_r \otimes \eta_r)$. New $w_r$ and $\eta_r$ are found via the system
$$\delta_w J_r(w_r, \eta_r) = 0, \qquad \delta_\eta J_r(w_r, \eta_r) = 0, \qquad \|w_r\| = 1.$$

Block-Jacobi solver:
  $\mathbf{u}_1 \leftarrow 0$; $\eta_1 \leftarrow 1$; $w_1 \leftarrow 0$
  for $r = 1, \dots$, until $\mathbf{u}_r + w_r \otimes \eta_r$ accurate enough do
    while no convergence do
      $\eta_r \leftarrow \eta_r / \|\eta_r\|$; solve $\delta_w J_r(w_r, \eta_r) = 0$ for $w_r$
      $w_r \leftarrow w_r / \|w_r\|$; solve $\delta_\eta J_r(w_r, \eta_r) = 0$ for $\eta_r$
    end while
    $\mathbf{u}_{r+1} \leftarrow \mathbf{u}_r + w_r \otimes \eta_r$
  end for
Output: a basic (greedy) low-rank approximation $\mathbf{u}_r$.
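The following is a minimal numerical sketch of this greedy block-Jacobi loop, under the simplifying assumptions of a linear parametric problem (so $J$ is quadratic) and a parametric factor discretised at quadrature nodes; all names are hypothetical:

```python
import numpy as np

# Greedy PGD for a parametric linear problem A(p) u = f, A(p) = K + p I,
# p in [-1, 1]; the parametric factors eta_j are stored at quadrature nodes.
K = 4.0 * np.eye(4) - np.eye(4, k=1) - np.eye(4, k=-1)
f = np.array([1.0, 0.0, 0.0, 1.0])
nodes, wq = np.polynomial.legendre.leggauss(8)
wq = wq / 2.0                                 # uniform measure on [-1, 1]
A = [K + p * np.eye(4) for p in nodes]        # A(p_z) at the nodes

u = np.zeros((8, 4))                          # u_r(p_z), one row per node
for _ in range(4):                            # greedy rank-one corrections
    w, eta = np.zeros(4), np.ones(8)
    for _ in range(30):                       # alternating block iteration
        eta = eta / np.linalg.norm(eta)
        # delta_w J_r = 0: weighted system for the deterministic factor w
        Aw = sum(q * e**2 * Az for q, e, Az in zip(wq, eta, A))
        bw = sum(q * e * (f - Az @ uz)
                 for q, e, Az, uz in zip(wq, eta, A, u))
        w = np.linalg.solve(Aw, bw)
        w = w / np.linalg.norm(w)
        # delta_eta J_r = 0: decoupled scalar equations at each node p_z
        eta = np.array([w @ (f - Az @ uz) / (w @ (Az @ w))
                        for Az, uz in zip(A, u)])
    u = u + np.outer(eta, w)                  # u_{r+1} = u_r + w (x) eta

err = max(np.linalg.norm(Az @ uz - f) for Az, uz in zip(A, u))
print(err)   # small after four rank-one updates (greedy, not optimal)
```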
Non-intrusive residual for PGD

Non-intrusive approximation of the first equation: $\delta_w J_r(w_r, \eta_r) = 0 \Leftrightarrow \langle \delta_u J(\mathbf{u}_r + w_r \otimes \eta_r), \eta_r \rangle_{\mathcal{S}} = 0$ in $\mathcal{U}$:
$$0 = \int_{\mathcal{P}} R\big(p;\, \mathbf{u}_r(p) + w_r \eta_r(p)\big)\, \eta_r(p)\, \mu(dp) \approx \sum_z \upsilon_z\, R\big(p_z;\, \mathbf{u}_r(p_z) + w_r \eta_r(p_z)\big)\, \eta_r(p_z).$$
Second equation: $\delta_\eta J_r(w_r, \eta_r) = 0 \Leftrightarrow \langle \delta_u J(\mathbf{u}_r + w_r \otimes \eta_r), w_r \rangle_{\mathcal{U}} = 0$ in $\mathcal{S}$, i.e. $\forall \lambda \in \mathcal{S}$:
$$0 = \int_{\mathcal{P}} \big\langle R\big(p;\, \mathbf{u}_r(p) + w_r \eta_r(p)\big),\, w_r \big\rangle_{\mathcal{U}}\, \lambda(p)\, \mu(dp) \approx \sum_z \upsilon_z\, \big\langle R\big(p_z;\, \mathbf{u}_r(p_z) + w_r \eta_r(p_z)\big),\, w_r \big\rangle_{\mathcal{U}}\, \lambda(p_z).$$
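A sketch of how these two quadrature-approximated stationarity conditions can be assembled from nothing but black-box residuum calls at the quadrature points (the quantity counted in Table 2 below); the model inside `R` is a hypothetical stand-in, and $\lambda$ is tested against point evaluations at the nodes:

```python
import numpy as np

nodes, wq = np.polynomial.legendre.leggauss(8)
wq = wq / 2.0                      # uniform measure on [-1, 1]

def R(p, u):
    """Black-box residuum R(p; u) = f(p) - A(p; u) (hypothetical model)."""
    A = (4.0 + p) * np.eye(3)
    return np.array([1.0, 0.0, 0.0]) - A @ u

def delta_w(u_r, w, eta):
    """First condition: sum_z v_z R(p_z; u_r(p_z) + w eta(p_z)) eta(p_z)."""
    return sum(q * e * R(p, uz + w * e)
               for q, e, p, uz in zip(wq, eta, nodes, u_r))

def delta_eta(u_r, w, eta):
    """Second condition, one scalar per node: <R(p_z; ...), w> = 0."""
    return np.array([w @ R(p, uz + w * e)
                     for p, uz, e in zip(nodes, u_r, eta)])

u_r = np.zeros((8, 3))             # current rank-(r-1) iterate at the nodes
w, eta = np.ones(3), np.ones(8)
print(delta_w(u_r, w, eta), delta_eta(u_r, w, eta))
```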
Recent improvements

- Increase $\mathbf{u}_r$ by more than one term at a time (e.g. 5-10 terms), at the price of larger systems to be solved.
- Use a faster algorithm than block Jacobi, e.g. Quasi-Newton methods (here BFGS).
- Use previous iterates as control variates to need fewer integration points per iteration.
- Increase the accuracy of the integration (number of integration points) as the iteration converges.
PGD accuracy

The preconditioner matrix is chosen to be the linear part $B$ of the Hessian of the functional $J$. The low-rank approximations are also compared to the full-rank Galerkin approximation computed with the block-Jacobi algorithm introduced in [8], with a stagnation criterion of $10^{-10}$. The comparison is made in Table 1 for total degrees $d = 2, 3, 4, 5$ and ranks $r = 1, \dots, 5$.

                             d = 2     d = 3     d = 4     d = 5
Block-Jacobi solver [8]      5.14e-5   3.31e-6   2.31e-7   1.70e-8
Basic PGD (Algorithm 1)
  r = 1                      2.34e-3   2.34e-3   2.34e-3   2.34e-3
  r = 2                      9.67e-5   8.22e-5   8.22e-5   8.22e-5
  r = 3                      5.14e-5   3.39e-6   8.03e-7   7.78e-7
  r = 4                      5.14e-5   3.31e-6   2.34e-7   3.63e-8
  r = 5                      5.14e-5   3.31e-6   2.31e-7   1.71e-8
Improved PGD (Algorithm 3)
  r = 1                      2.34e-3   2.34e-3   2.34e-3   2.34e-3
  r = 2                      5.14e-5   3.31e-6   2.85e-7   1.95e-7
  r = 3                      5.14e-5   3.31e-6   2.31e-7   1.79e-8
  r = 4                      5.14e-5   3.31e-6   2.31e-7   1.79e-8
  r = 5                      5.14e-5   3.31e-6   2.31e-7   1.76e-8

Table 1: Relative error of the approximations from the block-Jacobi solver, the basic PGD, and the improved algorithm for different total degrees $d$ and ranks $r$.
PGD calls

The greedy approximation gives satisfying results, even if the result is not optimal compared to the approximation resulting from a direct optimisation in low-rank subsets. For the rest of this section, we focus on $d = 5$ and measure the efficiency of the different algorithms by counting the number of calls to the residual $R(p_z; \mathbf{u}_r(p_z)) = f(p_z) - A(p_z; \mathbf{u}_r(p_z))$. The results are reported in Table 2.

                              r = 1     r = 2     r = 3     r = 4     r = 5
Basic PGD (Algorithm 1)
  Relative error              2.34e-3   8.22e-5   7.78e-7   3.63e-8   1.71e-8
  Residual calls              1044      2160      3096      3816      4464
Improved algorithm (Alg. 3)
  Relative error              2.34e-3   1.95e-7   1.79e-8   1.79e-8   1.79e-8
  Residual calls              1044      2304      2700      2844      3024

Table 2: Number of calls to the residual and corresponding relative error for different ranks $r$ for the basic PGD and the improved algorithm.

Both algorithms behave similarly up to $r = 2$. From $r = 3$ on, Algorithm 3 becomes the more efficient one for computing the low-rank approximation. However, the block-Jacobi solver only requires 540 calls to the residual. This suggests that the classical algorithms for computing low-rank approximations of solutions of nonlinear equations must be reconsidered in terms of efficiency and intrusiveness, and that different approaches must be proposed.
Obstacle example

[Figure 2: Obstacle $g(p; x)$ (a) and solution $u(p; x)$ (b) as functions of $x$ and $p$ [3].]
Obstacle example convergence

[Figure 3: Relative error vs. rank $r$ of the approximation (ranks up to 80, errors from $10^{0}$ down to $10^{-9}$) for the SVD of the $L^2$-projection, Algorithm 1, and Algorithm 3.]
Conclusion

- Parametric problems can be emulated.
- Galerkin methods can be non-intrusive.
- Convergence can be accelerated by faster global algorithms.
- For efficiency, try to use sparse representations throughout; an ansatz in low-rank tensor products saves storage as well as computation.
- PGD / SR1U is inherently a Galerkin procedure. It can also be non-intrusive.
- Low-rank tensor representations can be very accurate.