Numerical Approximation of Stochastic Elliptic Partial Differential Equations
Hermann G. Matthies, Andreas Keese
Institut für Wissenschaftliches Rechnen, Technische Universität Braunschweig
wire@tu-bs.de
http://www.wire.tu-bs.de
Overview
1. Stochastic model problem and stochastic PDE
2. Spatial discretisation of random fields (Karhunen-Loève)
3. Stochastic Galerkin methods
4. Structure of the discrete equations
5. Solvers for linear SPDEs
6. Hierarchical parallelisation
7. Numerical example
Model Problem
[Figure: geometry of a 2-D aquifer model]
Simple stationary model of groundwater flow:
−∇·(κ(x) ∇u(x)) = f(x) + b.c., x ∈ G ⊂ R^d,
κ(x) ∂u(x)/∂n = g(x), x ∈ ∂G,
u hydraulic head, κ conductivity, f and g sinks and sources.
Stochastic Model
Uncertainty of the system parameters, e.g.
κ = κ(x,ω) = κ̄(x) + κ̃(x,ω), f = f(x,ω), g = g(x,ω)
are stochastic fields; ω ∈ Ω = probability space of all realisations, with probability measure P.
Assumption: 0 < κ₋ ≤ κ(x,ω) ≤ κ₊ < ∞.
Possibilities: transformation κ(x,ω) = φ(x, γ(x,ω)) =: F(κ(x)) of a Gaussian field γ with given 2nd-order statistics. E.g. κ(x,ω) has a marginal β(1/2,1/2)-distribution if (with a(x) > 0)
κ(x,ω) = a(x) + b(x) cos(π Φ(γ(x,ω))),
or a log-normal distribution (violates the boundedness assumption):
κ(x,ω) = a(x) + exp(γ(x,ω)).
[Figure: realisation of Φ(γ(x,ω)) over x]
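A minimal sketch of this pointwise transformation of a Gaussian field, done sample-by-sample in pure Python. The standard normal CDF Φ is built from `math.erf`; the functions `a`, `b` and all numerical values are illustrative assumptions, not the values used on the slides.

```python
import math, random

def Phi(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def kappa_beta(gamma, a=1.0, b=0.5):
    """Transform one Gaussian value to a conductivity with an arcsine
    (beta(1/2,1/2)-type) marginal on [a - b, a + b] (assumed a, b)."""
    return a + b * math.cos(math.pi * Phi(gamma))

def kappa_lognormal(gamma, a=0.1):
    """Shifted log-normal marginal; unbounded above, which is why it
    violates the uniform upper bound assumed on the slide."""
    return a + math.exp(gamma)

random.seed(0)
samples = [kappa_beta(random.gauss(0.0, 1.0)) for _ in range(10000)]
# Every transformed value stays inside the bounds [a - b, a + b] = [0.5, 1.5].
```

In a real computation the same transform is applied at every spatial point of a correlated Gaussian field realisation; here only the marginal is demonstrated.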
Realisation of κ(x, ω)
[Figure: one realisation of the conductivity field κ]
Stochastic PDE and Variational Form
Insert the stochastic parameters into the PDE → stochastic PDE:
−∇·(κ(x,ω) ∇u(x,ω)) = f(x,ω), x ∈ G, + b.c. on ∂G.
The solution u(x,ω) is a stochastic field in tensor product form W ⊗ S:
u(x,ω) = Σ_µ v_µ(x) u^(µ)(ω),
where W is a normal spatial Sobolev space and S a space of random variables, e.g. S = L²(Ω,P) if u has finite variance, otherwise a space of stochastic distributions.
Well-posed variational formulation: find u ∈ W ⊗ S such that ∀v ∈ W ⊗ S:
a(κ; v, u) := ∫_Ω ∫_G ∇v(x,ω) · (κ(x,ω) ∇u(x,ω)) dx dP(ω)
= ∫_Ω [ ∫_G v(x,ω) f(x,ω) dx + ∫_∂G v(x,ω) g(x,ω) ds(x) ] dP(ω) =: ⟨⟨f, v⟩⟩.
Theoretical Matters
Ghanem & Spanos (1991): idea of stochastic Galerkin methods, first formulation.
Holden et al. (1996): products between RVs as Wick products; interpretation doubtful. Classical PDE in space. Existence via S-transform.
Benth & Gjerde (1998): convergence rates for polynomial chaos approximation in a variational setting. Use of Lax-Milgram.
M. & Bucher (1999): variational formulation of Wick product SPDEs, equivalence with the S-transform.
Besold (2000): existence via Lax-Milgram for normal product SPDEs.
Babuška et al. (2002): p-version FE convergence rates for the solution of a restricted model (finite-dimensional probability space of independent RVs).
M. & Keese (2003): stability of the Galerkin approximation for Karhunen-Loève and polynomial chaos expansions of κ. Nonlinear SPDEs via monotone operator theory.
Statistics
Desirable: uncertainty quantification. The goal is to compute statistics which are deterministic functionals of the solution:
Ψ_u(x) = E(Ψ(u(x,·))) := ∫_Ω Ψ(u(x,ω)) dP(ω),
e.g. ū(x) = E(u(x)), or var_u(x) = E((ũ(x))²), or Pr{u(x) > u₀} = E(χ_{u(x) > u₀}).
Principal approach:
1. Discretise the spatial (and temporal) part (e.g. via finite elements, finite differences).
2. Approximate the stochastic fields in finitely many independent random variables (RVs): stochastic discretisation on a spatially continuous or discrete representation.
3. Compute the solution/statistics: via high-dimensional integration (e.g. Monte Carlo, Smolyak, FORM), or approximate the solution with a response surface and then integrate. Stochastic Galerkin methods are a form of response-surface computation.
Example Solution
[Figures: geometry with sources, inflow/outflow and Dirichlet b.c.; realisation of κ; realisation of the solution; mean of the solution; variance of the solution]
Random Fields
Consider a region in space G. A random field is a RV κ_x(ω) at each x ∈ G; alternatively, for each ω ∈ Ω a random function, a realisation κ_ω(x) on G.
Covariance considered at different positions:
C_κ(x₁, x₂) := E(κ̃(x₁,·) κ̃(x₂,·)).
(If κ̄(x) ≡ κ̄ and C_κ(x₁, x₂) = c_κ(x₁ − x₂), the process is (weakly) homogeneous.)
We have to deal with another RV at each point x (uncountably many).
Need to discretise the spatial aspect (generalise the Fourier representation). One possibility is the Karhunen-Loève expansion; it can also serve for system (information) reduction.
Need to discretise the random variables. One possibility is the polynomial chaos expansion.
Karhunen-Loève Expansion
[Figures: KL modes]
Eigenvalue problem for the Karhunen-Loève expansion (KLE). Other names: proper orthogonal decomposition (POD), singular value decomposition (SVD), principal component analysis (PCA):
∫_G C_κ(x, y) g_j(y) dy = κ_j² g_j(x)
gives the spectrum {κ_j²} and orthogonal KLE eigenfunctions g_j(x).
Representation of κ:
κ(x,ω) = κ̄(x) + Σ_{j=1}^∞ κ_j g_j(x) ξ_j(ω) =: Σ_{j=0}^∞ κ_j g_j(x) ξ_j(ω)
with centred, uncorrelated random variables ξ_j(ω).
Truncation after the m largest eigenvalues is optimal in variance → expansion in m RVs.
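In a discrete setting the KLE eigenvalue problem becomes a symmetric matrix eigenvalue problem for the sampled covariance. The following pure-Python sketch builds the covariance matrix of an assumed exponential kernel on a 1-D grid and extracts the dominant eigenpair (the largest κ_j² and its KL mode) by simple power iteration; kernel, grid and correlation length are illustrative choices, not those of the slides.

```python
import math

# Assumed covariance kernel C(x, y) = exp(-|x - y| / corr_len) on a 1-D grid.
n = 50
xs = [i / (n - 1) for i in range(n)]
corr_len = 0.2  # assumed correlation length
C = [[math.exp(-abs(xi - xj) / corr_len) for xj in xs] for xi in xs]

def matvec(A, v):
    return [sum(a * b for a, b in zip(row, v)) for row in A]

def power_iteration(A, iters=200):
    """Dominant eigenpair of a symmetric positive matrix by power iteration,
    normalising with the max-norm so the returned scale is the eigenvalue."""
    v = [1.0] * len(A)
    lam = 0.0
    for _ in range(iters):
        w = matvec(A, v)
        lam = max(abs(c) for c in w)
        v = [c / lam for c in w]
    return lam, v

lam1, g1 = power_iteration(C)
# lam1 approximates the largest discrete eigenvalue (~ kappa_1^2 up to grid
# weighting), g1 the corresponding KL mode sampled at the grid points.
```

A production code would instead use a dense or sparse symmetric eigensolver and keep the m largest eigenpairs, consistent with the variance-optimal truncation above.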
Representation in Gaussian Random Variables
Represent the {ξ_j(ω)} in (uncorrelated =) independent Gaussian RVs θ_m(ω) = (θ_1(ω), ..., θ_m(ω)) (one for each KLE function) on an m-dimensional space Θ^(m) ⊆ R^m with Gaussian measure Γ_m:
dΓ_m(θ_m) = (2π)^{−m/2} exp(−|θ_m|²/2) dθ_m = Π_{j=1}^m (2π)^{−1/2} exp(−θ_j²/2) dθ_j.
Each function (RV) ξ_j(θ_m) of these Gaussian RVs θ_m(ω) (with finite variance) can be approximated arbitrarily well (in variance, i.e. in the L²(Ω,P)-norm) by polynomials in (θ_1(ω), ..., θ_m(ω)) → Wiener's polynomial chaos.
[Figures: realisations of the field for two truncation levels m]
Polynomial Chaos Expansion
Much more is actually true (polynomial chaos expansion, PCE):
Theorem [Norbert Wiener]: Any RV r(ω) ∈ L²(Ω,P) (with finite variance) can be represented in orthogonal polynomials of the Gaussian RVs {θ_m(ω)}_{m=1}^∞ =: θ(ω):
r(ω) = Σ_{α∈J} ϱ^(α) H_α(θ(ω)), with H_α(θ(ω)) = Π_{j=1}^∞ h_{α_j}(θ_j(ω)),
where the h_ℓ(ϑ) are the usual Hermite polynomials, and
J := {α | α = (α_1, ..., α_j, ...), α_j ∈ N₀, |α| := Σ_{j} α_j < ∞}
are multi-indices, where only finitely many of the α_j are non-zero.
In other words, the H_α are an orthogonal basis in L²(Ω), with
⟨H_α, H_β⟩ = E(H_α H_β) = α! δ_{αβ}, where α! := Π_{j=1}^∞ (α_j!).
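A small sketch of the univariate building blocks: probabilists' Hermite polynomials from the three-term recurrence, with the orthogonality relation E(h_m h_n) = n! δ_{mn} checked crudely by Monte Carlo (sample size and seed are arbitrary choices).

```python
import math, random

def hermite(n, x):
    """Probabilists' Hermite polynomial h_n(x) via the recurrence
    h_{n+1}(x) = x*h_n(x) - n*h_{n-1}(x), h_0 = 1, h_1 = x."""
    h0, h1 = 1.0, x
    if n == 0:
        return h0
    for k in range(1, n):
        h0, h1 = h1, x * h1 - k * h0
    return h1

# Crude Monte Carlo check of E[h_m(theta) h_n(theta)] = n! * delta_mn
# for theta ~ N(0,1); an illustration, not a proof.
random.seed(1)
Z = 200000
thetas = [random.gauss(0.0, 1.0) for _ in range(Z)]
e22 = sum(hermite(2, t) ** 2 for t in thetas) / Z            # ~ 2! = 2
e12 = sum(hermite(1, t) * hermite(2, t) for t in thetas) / Z  # ~ 0
```

For the multivariate H_α one simply multiplies univariate factors h_{α_j}(θ_j), exactly as in the theorem above.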
Computational Approaches
Variational formulation discretised in space, e.g. via finite elements, u(x,ω) = N(x)ᵀ u(ω):
A(ω)[u(ω)] = f(ω), or A(θ)[u(θ)] = f(θ).
The principal computational approaches are:
Monte Carlo / direct integration: directly compute the statistic by quadrature, Ψ_u(x) = E(Ψ(·, u(x,·))) = ∫_Θ Ψ(θ, N(x)ᵀ u(θ)) dΓ(θ), via numerical integration.
Perturbation: assume that the stochastics is a small perturbation around the mean value, do a Taylor expansion and truncate, usually after the linear or quadratic term.
Response surface: try to find a functional fit u(θ) ≈ v(θ), then compute with v.
Stochastic Galerkin: use an ansatz u(θ) = Σ_β u^(β) H_β(θ) for the solution in the stochastic dimension, then apply the Galerkin method to obtain the u^(β).
Monte Carlo / Direct Integration
Numerical integration of Ψ_u proceeds as follows:
1. Select an integration rule, i.e. sampling points {θ_z}_{z=1}^Z and corresponding weights w_z.
2. At each θ_z, first solve the equations for u(θ_z) for the fixed realisation θ_z, then evaluate the integrand Ψ(θ_z, u(x, θ_z)). Use of normal, existing software.
3. Approximate Ψ_u(x) = ∫_Θ Ψ(θ, N(x)ᵀ u(θ)) dΓ(θ) ≈ Σ_{z=1}^Z w_z Ψ(θ_z, N(x)ᵀ u(θ_z)).
For Monte Carlo (MC), the points θ_z are random according to the measure Γ, and the weights are w_z = 1/Z.
For quasi-Monte Carlo (QMC), the points θ_z are non-random, from number-theoretic low-discrepancy series, and the weights are still w_z = 1/Z.
For product Gauss or sparse-grid Smolyak rules, the points θ_z and weights w_z are known.
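The three steps can be sketched for a toy scalar "model": each sampled realisation θ_z triggers one deterministic solve (here a single-unknown stand-in κ(θ)·u = f for the full FEM solve), and the statistic is the weighted sum with w_z = 1/Z. The conductivity law and all numbers are illustrative assumptions.

```python
import math, random

def solve_realisation(theta, f=1.0):
    """Stand-in for the deterministic solver at one realisation theta_z:
    a one-unknown problem kappa(theta) * u = f with an assumed bounded
    conductivity kappa in [1, 3]."""
    kappa = 2.0 + math.sin(theta)
    return f / kappa

random.seed(2)
Z = 100000
# Step 1: MC rule -> Gaussian sample points, weights w_z = 1/Z.
# Step 2: solve per realisation.  Step 3: average the statistic Psi(u) = u.
samples = [solve_realisation(random.gauss(0.0, 1.0)) for _ in range(Z)]
mean_u = sum(samples) / Z
var_u = sum((s - mean_u) ** 2 for s in samples) / Z
```

Swapping the sample generator for a low-discrepancy sequence or a Smolyak rule (with its own weights) changes only step 1; steps 2 and 3 reuse the same deterministic solver, which is the point made on the slide.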
High-Dimensional Integration
We want
E(ψ(θ)) = ∫_{Θ_m} ψ(θ_m) dΓ_m(θ_m) = ∫_{θ_1} ··· ∫_{θ_m} ψ(θ_1, ..., θ_m) dΓ_1(θ_1) ··· dΓ_1(θ_m).
With Z realisations and expected error ε:
Pure Monte Carlo (MC): ε = O(‖ψ‖₂ Z^{−1/2}).
Quasi-Monte Carlo (QMC): ε = O(‖ψ‖_BV Z^{−1} (log Z)^m).
Quadrature formulas (integrand in C^r(Θ_m)):
Full product k-point Gauss: ε = O(Z^{−(2r−1)/m}), Z = O(k^m).
Sparse-grid Smolyak: ε = O(Z^{−r} (log Z)^{(m−1)(r+1)}), Z = O((2^k/k!) m^k).
Perturbation
Assume that the stochastic input (e.g. κ(x,θ) = κ̄(x) + κ̃(x,θ)) is a small variation around the mean.
Do a Taylor expansion of the solution w.r.t. the stochastic input quantity. Equate terms of equal order. Approximate the mean and covariance of the solution.
Limited to small stochastic variations; only gives mean and covariance, nothing else. Other statistics are very difficult and unreliable.
Derivatives are often computed via a dual/adjoint approach.
Response Surface
Assume a simple functional form for v(θ) ≈ u(θ), usually a low-order polynomial.
Determine it by sampling, i.e. computing the solution at some points θ_m, and (least-squares) fitting. Use of normal, existing software.
Once the response surface v(θ) has been determined, evaluate statistics as in the direct integration approach, with v in place of u; evaluation of the solution is now much quicker through the response surface.
As described, this works well only for low dimension m of the stochastic space.
In some sense a Galerkin solution is also a response surface.
Stochastic Galerkin
Recipe: stochastic ansatz and projection in the stochastic dimensions:
u(θ) = Σ_β H_β(θ) u^(β) =: H(θ)u, with H(θ) = (..., H_β(θ), ...), u = (..., u^(β), ...)ᵀ.
Goal: compute the coefficients u^(β).
1. Through stochastic (intrusive) Galerkin methods, ∀γ: E((f(θ) − A(θ)[H(θ)u]) H_γ(θ)) = 0, giving #dof_space × #dof_stoch equations.
2. Non-intrusive / uncoupled direct projection: u^(β) is also a statistic, the integral being computed through numerical (e.g. MC, QMC, or Smolyak) quadrature, ∀β: u^(β) = E(u(θ) H_β(θ)), giving many problems of size #dof_space.
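The non-intrusive projection can be sketched for a toy scalar "solution" u(θ) = exp(θ), whose exact PCE coefficients w.r.t. probabilists' Hermite polynomials are known to be √e / β!. The sketch estimates E(u(θ) H_β(θ)) by Monte Carlo and divides by β! to account for the non-normalised basis; the toy model, sample size and seed are assumptions for illustration.

```python
import math, random

def hermite(n, x):
    """Probabilists' Hermite polynomial via the three-term recurrence
    (redefined here to keep the sketch self-contained)."""
    h0, h1 = 1.0, x
    if n == 0:
        return h0
    for k in range(1, n):
        h0, h1 = h1, x * h1 - k * h0
    return h1

random.seed(3)
Z = 400000
thetas = [random.gauss(0.0, 1.0) for _ in range(Z)]
# u^(beta) = E(u(theta) h_beta(theta)) / beta!  by plain MC quadrature,
# one independent projection per coefficient (the "uncoupled" property).
coeffs = [sum(math.exp(t) * hermite(b, t) for t in thetas) / (Z * math.factorial(b))
          for b in range(3)]
# Exact values for this toy model: sqrt(e)/beta! for beta = 0, 1, 2.
```

In the PDE setting each evaluation of u(θ_z) is a full deterministic solve, so the same samples are reused for all coefficients, exactly as done with `thetas` here.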
Results of Galerkin Method
[Figures: error in the mean for the Galerkin scheme (m = 6, k = 2); Prob{u(x) > 3.25} from MC simulations of the response surface; coefficient u^(α) = E(u(ω) H_α) for one multi-index α; error in u^(α) for the Galerkin scheme]
Galerkin Methods for the Linear Case
∀γ satisfy:
Σ_β [ ∫_G ∇N(x) E(κ(x,θ) H_β(θ) H_γ(θ)) ∇N(x)ᵀ dx ] u^(β) = E(f(θ) H_γ(θ)) =: f^(γ).
More efficient representation through expansion of κ in KLE and PCE and analytic computation of the expectations:
κ(x,θ) = Σ_{j=0}^∞ κ_j ξ_j(θ) g_j(x) ≈ Σ_{j=0}^r κ_j Σ_{α ∈ J_{2k,m}} ξ_j^(α) H_α(θ) g_j(x),
where J_{k,m} = {α ∈ J : |α| ≤ k, ı > m ⇒ α_ı = 0}.
Resulting Equations
Insertion of the expansion of κ (∀γ satisfy):
Σ_β [ Σ_j Σ_α ξ_j^(α) E(H_α H_β H_γ) ∫_G ∇N(x) κ_j g_j(x) ∇N(x)ᵀ dx ] u^(β) = f^(γ),
with Δ^(α)_{β,γ} := E(H_α H_β H_γ) and K_j := ∫_G ∇N(x) κ_j g_j(x) ∇N(x)ᵀ dx.
K_j is the stiffness matrix of a FEM discretisation for the "material" κ_j g_j(x) → use a deterministic FEM program in black-box fashion. Needed:
1. Possibility to change the material parameter.
2. Function returning the mass matrix ∫_G N(x) N(x)ᵀ dx for the numerical KLE decomposition.
3. Functions which give the residuum and (possibly) the Jacobi matrix K_j.
4. Function which solves for one realisation, u(θ) = K^{−1}(θ) f(θ).
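The expectations Δ^(α)_{β,γ} = E(H_α H_β H_γ) can indeed be computed analytically. A sketch using the classical closed-form formula for the univariate triple product of probabilists' Hermite polynomials, extended to multi-indices by taking the product over the univariate factors:

```python
import math

def triple_product_1d(a, b, c):
    """E[h_a h_b h_c] for probabilists' Hermite polynomials under a standard
    Gaussian: a! b! c! / ((s-a)! (s-b)! (s-c)!) when s = (a+b+c)/2 is an
    integer with s >= max(a, b, c), and 0 otherwise (classical formula)."""
    if (a + b + c) % 2:
        return 0
    s = (a + b + c) // 2
    if s < max(a, b, c):
        return 0
    return (math.factorial(a) * math.factorial(b) * math.factorial(c)
            // (math.factorial(s - a) * math.factorial(s - b) * math.factorial(s - c)))

def delta(alpha, beta, gamma):
    """Delta^(alpha)_{beta,gamma} = E(H_alpha H_beta H_gamma); the
    multi-indices are passed as equal-length tuples."""
    return math.prod(triple_product_1d(a, b, c)
                     for a, b, c in zip(alpha, beta, gamma))

# Sanity checks against direct Gaussian moments:
# E[h_1 h_1 h_0] = E[theta^2] = 1,  E[h_2 h_2 h_2] = E[(theta^2 - 1)^3] = 8.
```

The vanishing pattern of `triple_product_1d` is what produces the block sparsity of the global matrix shown on a later slide.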
Approximation Theory
Stability of the discrete approximation under truncated KLE and PCE (M. & Keese, 2003): the matrix stays uniformly positive definite. Convergence follows from Céa's lemma.
Convergence rates under stochastic regularity (Benth & Gjerde, 1998) in stochastic Hilbert spaces. Error estimation via dual weighted residuals is possible.
Stochastic Hilbert spaces start with the formal PCE: R(θ) = Σ_{α∈J} R^(α) H_α(θ). Define for ρ and p the norm (with (2N)^β := Π_{j∈N} (2j)^{β_j}):
‖R‖²_{ρ,p} = Σ_α |R^(α)|² (α!)^{1+ρ} (2N)^{pα}.
Stochastic Hilbert Spaces: Convergence Rates
Define for ρ and p:
(S)_{ρ,p} = {R(θ) = Σ_{α∈J} R^(α) H_α(θ) : ‖R‖_{ρ,p} < ∞}.
These are Hilbert spaces; the duals are denoted by (S)_{−ρ,−p}, and L²(Ω) = (S)_{0,0}.
Let P_{k,m} be the orthogonal projection from (S)_{ρ,p} onto the finite-dimensional subspace spanned by the {H_α}, α ∈ J_{k,m}.
Theorem [Benth & Gjerde, Keese]: Let p ∈ R, q > p + r₀, where r₀ ≈ 1.53 solves r = 2^r (r − 1), and let ρ ≥ 0. Then for any R ∈ (S)_{ρ,q}:
‖R − P_{k,m}(R)‖_{ρ,p} ≤ ‖R‖_{ρ,q} c(m, k, q − p),
where c(m, k, r) = ( c₁(r) m^{1−r} + c₂(r) 2^{(1−r)(k+1)} )^{1/2}, with
c₁(r) = (2^r (r − 1) − r)^{−1} and c₂(r) = 2^{r−1} (r − 1) c₁(r).
E.g., if r = 2, then c(m, k, r)² = (1/2)(m^{−1} + (1/2)^k).
Tensor Product Structure
The equations have the structure of a tensor product:
K u = Σ_j Σ_α ξ_j^(α) Δ^(α) ⊗ K_j u = f,
i.e. in block form
Σ_j Σ_α ξ_j^(α)
[ Δ^(α)_{β₁,γ₁} K_j   Δ^(α)_{β₂,γ₁} K_j   ··· ]   [ u^(β₁)  ]   [ f^(γ₁)  ]
[ Δ^(α)_{β₁,γ₂} K_j   Δ^(α)_{β₂,γ₂} K_j   ··· ] · [   ⋮     ] = [   ⋮     ]
[       ⋮                   ⋮              ⋱  ]   [ u^(β_N) ]   [ f^(γ_N) ]
#dof_space × #dof_stoch linear equations.
Exploit parallelism in the multiplication: parallel operator sum, distributed block vector.
The block matrix is efficiently stored and used in tensor representation.
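A minimal sketch of the matrix-vector product in tensor representation: each spatial operator K_j is applied once per block and combined through the stochastic matrices Δ^(α), so the global block matrix is never assembled. All sizes and values are tiny illustrative stand-ins.

```python
def tensor_matvec(xis, Deltas, Ks, u):
    """(K u)^(gamma) = sum_j sum_alpha xi_j^(alpha) Delta^(alpha)_{beta,gamma} K_j u^(beta).
    xis[j][a] = xi_j^(alpha); Deltas[a] = Delta^(alpha) (n_stoch x n_stoch);
    Ks[j] = spatial matrix (n_space x n_space); u = block vector given as a
    list of n_stoch spatial vectors."""
    n_stoch, n_space = len(u), len(u[0])
    out = [[0.0] * n_space for _ in range(n_stoch)]
    for j, K in enumerate(Ks):
        # Apply the spatial operator once per j, reuse it for every alpha.
        Ku = [[sum(K[r][c] * ub[c] for c in range(n_space)) for r in range(n_space)]
              for ub in u]
        for a, D in enumerate(Deltas):
            w = xis[j][a]
            if w == 0.0:
                continue
            for g in range(n_stoch):
                for b in range(n_stoch):
                    d = w * D[b][g]
                    if d != 0.0:
                        for r in range(n_space):
                            out[g][r] += d * Ku[b][r]
    return out

# Tiny usage example: one K_j = 2*I, Delta = I, xi = 1, so K u doubles each block.
xis = [[1.0]]
Deltas = [[[1.0, 0.0], [0.0, 1.0]]]
Ks = [[[2.0, 0.0], [0.0, 2.0]]]
u = [[1.0, 1.0], [1.0, 1.0]]
v = tensor_matvec(xis, Deltas, Ks, u)
```

The `j` loop and the `b` loop are exactly the operator sum and block-vector sum that the later parallelisation slides distribute over processors.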
Sparsity Structure
[Figure: non-zero blocks of the global matrix for increasing degree of H_α]
Matrix Expansion Errors
H_α is orthogonal to polynomials of degree < |α| ⇒ Δ^(α)_{βγ} = E(H_α H_β H_γ) = 0 for |α| > |β| + |γ|, hence the sum over α is finite.
One can choose J such that the expansion
Σ_{j=0}^J Σ_α ξ_j^(α) Δ^(α) ⊗ K_j u
is arbitrarily close to
Σ_β [ ∫_G ∇N(x) E(κ(x,θ) H_β(θ) H_γ(θ)) ∇N(x)ᵀ dx ] u^(β),
and hence the global matrix is uniformly positive definite.
This stability argument is only valid for the (intrusive) Galerkin method.
Properties of the Global Equations
K u = Σ_j Σ_α ξ_j^(α) Δ^(α) ⊗ K_j u = f
Each K_j is symmetric, and each Δ^(α) ⇒ the block matrix K is symmetric.
The SPDE is positive definite; with an appropriate expansion of κ, K is uniformly positive definite → stability.
Solving the equations: never assemble the block matrix explicitly; use K only as a multiplication. Use a Krylov method (here CG) with a preconditioner.
Block-Diagonal Preconditioner
Let K₀ = K̄ = stiffness matrix for the average material κ̄(x). Use the deterministic solver as preconditioner:
P = diag(K₀, ..., K₀) = I ⊗ K₀.
A good preconditioner when the variance of κ is not too large. Otherwise use P = block-diag(K); this may again be done with an existing deterministic solver.
The block-diagonal P is well suited for parallelisation.
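A minimal preconditioned conjugate gradient sketch on flat vectors. In the SPDE setting `apply_A` would be the tensor matvec with K and `apply_P` the blockwise solve with the mean stiffness matrix K₀; here both are small dense stand-ins chosen only so the example runs.

```python
def pcg(apply_A, apply_P, b, tol=1e-10, maxit=200):
    """Preconditioned CG for symmetric positive definite apply_A, with
    apply_P an approximation of the inverse of A."""
    x = [0.0] * len(b)
    r = b[:]                                   # r = b - A*0
    z = apply_P(r)                             # preconditioned residual
    p = z[:]
    rz = sum(ri * zi for ri, zi in zip(r, z))
    for _ in range(maxit):
        Ap = apply_A(p)
        alpha = rz / sum(pi * api for pi, api in zip(p, Ap))
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        if sum(ri * ri for ri in r) ** 0.5 < tol:
            break
        z = apply_P(r)
        rz_new = sum(ri * zi for ri, zi in zip(r, z))
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x

# Toy SPD system; the "preconditioner" inverts only the diagonal, in the
# spirit of the mean-based P = I (x) K_0 above.
A = [[4.0, 1.0], [1.0, 2.0]]
apply_A = lambda v: [sum(a * vi for a, vi in zip(row, v)) for row in A]
apply_P = lambda v: [v[0] / 4.0, v[1] / 2.0]
x = pcg(apply_A, apply_P, [1.0, 1.0])          # exact solution is (1/7, 3/7)
```

Since both A-products and P-solves act blockwise, this iteration parallelises along the block structure, which is what the following slides exploit.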
Parallelising the Matrix-Vector Product
∀γ: (K u)^(γ) = Σ_j Σ_β Σ_α ξ_j^(α) Δ^(α)_{β,γ} K_j u^(β)
K_j = deterministic solver; this may be a (lower-level) parallel program to compute K_j u^(β).
Parallelise the operator sum in j → several instances of the deterministic solver in parallel.
Distribute u and f → parallelise the sum in β.
The sum in α may also be done in parallel, but is usually not essential.
Parallelisation
∀γ: (K u)^(γ) = Σ_j Σ_β Σ_α ξ_j^(α) Δ^(α)_{β,γ} K_j u^(β)
Obviously parallel in γ.
Block vectors u and f distributed; may be replicated in order to reduce communication.
Matrices K_j distributed over processors; may be replicated in order to reduce parallel communication → use more processors than the number of K_j.
Several processor groups, where each uses a subset of the K_j and stores a subset of u and f.
Speedup Measurements I
On a Cray T3E. Constant problem size: N_space = 75, N_stoch = 54 (9 KLE terms, degree-3 polynomial chaos).
2 operators K_j. Scaling up to > 5 · 10⁶ equations.
Distributed operator, distributed block vector (best use of memory):
[Figures: efficiency and relative time used vs. number of processors]
Speedup Measurements II
Replicated operator, distributed block vector (compromise):
[Figure: efficiency vs. number of processors]
Operators distributed, replicated block vector (parallelise for speed):
[Figure: efficiency vs. number of processors]
Speedup Measurements III
Execution time, problem size scaled with the number of processors:
[Figure: time [s] vs. number of processors]
Outlook
Stochastic problems are at their very beginning (like FEM in the 1960s).
Deterministic programs can be used in this framework.
Nonlinear (and instationary) problems are possible (but much more work).
Development of a framework for stochastic coupling and parallelisation.
Hierarchical parallelisation is well possible, with different objectives: memory saver, time saver, or a combination of both.
At Last...
Thank you for your attention!
Email: wire@tu-bs.de
URL: http://www.wire.tu-bs.de