Quantifying Uncertainty: Modern Computational Representation of Probability and Applications
Hermann G. Matthies, with Andreas Keese
Technische Universität Braunschweig
wire@tu-bs.de, http://www.wire.tu-bs.de
Overview
1. Background: Motivation and Model Problem
2. Discretisation of Random Fields
3. Stochastic Galerkin Methods
4. Low Rank Tensor Product Approximations
5. Structure of Discrete Equations and Solvers
6. Keeping Low Rank: Operation Counts
Why Probabilistic or Stochastic Models?
Many descriptions (especially of future events) contain elements which are uncertain and not precisely known: for example future rainfall, or the discharge from a river; more generally, the action of the surrounding environment. The system itself may contain only incompletely known parameters, processes or fields (not possible or too costly to measure). There may be small, unresolved scales in the model, which act as a kind of background noise. All of these introduce some uncertainty into the model.
Ontology and Modelling
A bit of ontology: uncertainty may be aleatoric, which means random and not reducible, or epistemic, which means due to incomplete knowledge. Stochastic models can give quantitative information about uncertainty, and they are used for both types of uncertainty. Possible areas of use: reliability, heterogeneous materials, upscaling, incomplete knowledge of details, uncertain [inter-]action with the environment, random loading, etc.
Quantification of Uncertainty
Uncertainty may be modelled in different ways:
Intervals / convex sets do not give a degree of uncertainty; quantification is only through the size of the sets.
Fuzzy and possibilistic approaches model quantitative possibility with certain rules, a generalisation of set membership.
Evidence theory models basic probability, but also (as a generalisation) belief (a kind of lower bound) and plausibility (a kind of upper bound) in a quantitative way; mathematically these are not measures.
Stochastic / probabilistic methods model probability quantitatively, and have the most developed theory.
Physical Models
Models for a system S may be stationary with state u, exterior action f and random model description (realisation) ω ∈ Ω, with probability measure P:
S(u, ω) = f(ω).
Evolution in time may be discrete (e.g. a Markov chain), driven by a discrete random process:
u_{n+1} = F(u_n, ω),
or continuous (e.g. a Markov process, a stochastic differential equation), driven by random processes:
du = −(S(u, ω) − f(ω, t)) dt + B(u, ω) dW(ω, t) + P(u, ω) dQ(ω, t).
In this Itô evolution equation, W(ω, t) is a Wiener process, and Q(ω, t) is a (compensated) Poisson process.
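The continuous evolution above can be sketched with a scalar Euler–Maruyama loop. This is only an assumed toy instance: S(u) = u, f ≡ 1 and B = 0.1 are illustrative choices, and the Poisson (jump) part is omitted.

```python
import numpy as np

# Euler-Maruyama sketch of du = -(S(u) - f) dt + B dW for a scalar state,
# with illustrative choices S(u) = u, f = 1, B = 0.1 (jump part omitted)
rng = np.random.default_rng(7)
dt, T = 1e-3, 5.0
u = 0.0
for _ in range(int(T / dt)):
    dW = np.sqrt(dt) * rng.standard_normal()   # Wiener increment
    u += -(u - 1.0) * dt + 0.1 * dW
```

With these choices u relaxes towards the stationary state S(u) = f, i.e. u ≈ 1, up to small stochastic fluctuations.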
Model Problem
(Figure: geometry of the aquifer domain.)
Model: a simple stationary model of groundwater flow,
−∇·(κ(x) ∇u(x)) = f(x) plus b.c., x ∈ G ⊂ R^d,
κ(x) ∂u(x)/∂n = g(x), x ∈ Γ ⊂ ∂G,
with u the hydraulic head, κ the conductivity, f and g sinks and sources.
Model Stochastic Problem
(Figure: aquifer geometry with sources, Dirichlet b.c., inflow and outflow boundaries.)
Model: the same simple stationary model of groundwater flow, now with stochastic data:
−∇·(κ(x, ω) ∇u(x, ω)) = f(x, ω) plus b.c., x ∈ G ⊂ R^d, ω ∈ Ω,
κ(x, ω) ∂u(x, ω)/∂n = g(x, ω), x ∈ Γ ⊂ ∂G, ω ∈ Ω,
with κ a stochastic conductivity, f and g stochastic sinks and sources.
Realisation of κ(x, ω)
(Figure: one realisation of the stochastic conductivity field.)
Stochastic PDE and Variational Form
The solution u(x, ω) is a stochastic field in a tensor product space W ⊗ S, where W is a Sobolev space of spatial functions and S a space of random variables:
u(x, ω) = Σ_µ v_µ(x) u^(µ)(ω).
Variational formulation: find u ∈ W ⊗ S such that for all v ∈ W ⊗ S:
a(v, u) := ∫_Ω ∫_G ∇v(x, ω) · (κ(x, ω) ∇u(x, ω)) dx P(dω)
= ∫_Ω [ ∫_G v(x, ω) f(x, ω) dx + ∫_Γ v(x, ω) g(x, ω) ds(x) ] P(dω) =: ⟨⟨f, v⟩⟩.
Mathematical Results
Existence of a solution u ∈ W ⊗ S with a(v, u) = ⟨⟨f, v⟩⟩ for all v is, under certain conditions, guaranteed by the Lax–Milgram lemma; the problem is well-posed in the sense of Hadamard (existence, uniqueness, continuous dependence on the data f, g in the L²-norm and on κ in the L∞-norm). An approximation may be achieved by Galerkin methods, with convergence established via Céa's lemma; Galerkin methods are stable if no "variational crimes" are committed. Good approximating subspaces of W ⊗ S have to be found, and efficient numerical procedures worked out.
Functionals
Desirable: uncertainty quantification, or optimisation under uncertainty. The goal is to compute functionals of the solution:
Ψ_u = ⟨Ψ(u)⟩ := E(Ψ(u)) := ∫_Ω ∫_G Ψ(u(x, ω), x, ω) dx P(dω),
e.g. ū = E(u), or var u = E(ũ²) with ũ = u − ū, or
P{u ≤ u₀} = P({ω ∈ Ω : u(ω) ≤ u₀}) = E(χ_{u ≤ u₀}).
All desired quantities are usually expected values of some functional, to be computed via (high dimensional) integration over Ω.
General Computational Approach
Principal approach:
1. Discretise / approximate the physical model (e.g. via finite elements or finite differences), and approximate the stochastic model (processes, fields) in finitely many independent random variables (RVs): the stochastic discretisation.
2. Compute statistics via (high dimensional) integration over Ω (e.g. Monte Carlo, quasi-Monte Carlo, Smolyak = sparse grids): either via direct integration, where each integration point ω_z ∈ Ω requires one expensive PDE solution (with rough data), or by approximating the solution with some response surface, then integrating by sampling a cheap expression at each integration point.
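As a toy illustration of step 2, here is direct Monte Carlo integration with a hypothetical cheap response surface u(θ) = exp(θ) standing in for the expensive PDE solve; its exact mean under θ ~ N(0,1) is exp(1/2).

```python
import numpy as np

# Monte Carlo estimate of a statistic E(u); the PDE solve per sample is
# replaced by an assumed cheap response surface u(theta) = exp(theta)
rng = np.random.default_rng(0)
theta = rng.standard_normal(200_000)   # integration points omega_z
mean_mc = np.exp(theta).mean()         # MC estimate of E(u)
exact = np.exp(0.5)                    # E(exp(theta)) for theta ~ N(0,1)
```

The sampling error decays like M^(-1/2); with a real PDE model each of the 200 000 samples would be one full solve, which is exactly why response surfaces are attractive.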
Computational Requirements
How can a stochastic process be represented for computation, for simulation or otherwise? Best would be as some combination of countably many independent random variables (RVs). How can the required integrals or expectations be computed numerically? Best would be to have the probability measure as a product measure P = P₁ ⊗ ... ⊗ P_l; then integrals can be computed as iterated one-dimensional integrals via Fubini's theorem:
∫_Ω Ψ P(dω) = ∫_{Ω₁} ... ∫_{Ω_l} Ψ P₁(dω₁) ... P_l(dω_l).
Random Fields
Split κ into the mean κ̄(x) = E(κ(x, ·)) and the fluctuating part κ̃(x, ω). The covariance may be considered at different positions:
C_κ(x, x′) := E(κ̃(x, ·) κ̃(x′, ·)).
If κ̄(x) ≡ κ̄ and C_κ(x, x′) = c_κ(x − x′), the process is homogeneous; here the representation through the spectrum as a Fourier sum is well known. We need to discretise the spatial aspect (generalising the Fourier representation): one possibility is the Karhunen–Loève expansion (KLE). We also need to discretise each of the random variables in the Fourier synthesis: one possibility is Wiener's polynomial chaos expansion (PCE).
Karhunen–Loève Expansion I
(Figure: plots of KL eigenfunctions.)
Other names for the KLE: Proper Orthogonal Decomposition (POD), Singular Value Decomposition (SVD), Principal Component Analysis (PCA). One computes the spectrum {κ_j²} ⊂ R₊ and orthogonal KLE eigenfunctions g_j(x):
∫_G C_κ(x, y) g_j(y) dy = κ_j² g_j(x), with ∫_G g_j(x) g_k(x) dx = δ_jk.
Mercer's representation of C_κ:
C_κ(x, y) = Σ_{j=1}^∞ κ_j² g_j(x) g_j(y).
Karhunen–Loève Expansion II
(Figure: further KL eigenfunctions.)
Representation of κ:
κ(x, ω) = κ̄(x) + Σ_{j=1}^∞ κ_j g_j(x) ξ_j(ω) =: Σ_{j=0}^∞ κ_j g_j(x) ξ_j(ω),
with centred, normalised, uncorrelated random variables ξ_j(ω):
E(ξ_j) = 0, E(ξ_j ξ_k) =: ⟨ξ_j, ξ_k⟩_{L²(Ω)} = δ_jk.
Karhunen–Loève Expansion III
(Figure: realisations with increasing numbers of KLE modes.)
Truncating after the m largest eigenvalues is optimal in variance and gives an expansion in m RVs.
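A minimal discrete sketch of the truncated KLE: eigendecomposition of the exponential covariance C(x, y) = exp(−|x − y|) sampled on a 1-D grid. Grid size, domain and correlation length are illustrative assumptions, not values from the slides.

```python
import numpy as np

# discrete KLE: eigendecomposition of an exponential covariance matrix
x = np.linspace(0.0, 1.0, 200)
C = np.exp(-np.abs(x[:, None] - x[None, :]))

lam, g = np.linalg.eigh(C)              # eigenvalues in ascending order
lam, g = lam[::-1], g[:, ::-1]          # re-order: largest first

def truncation_error(m):
    # relative Frobenius error of the rank-m Mercer sum
    Cm = (g[:, :m] * lam[:m]) @ g[:, :m].T
    return np.linalg.norm(C - Cm) / np.linalg.norm(C)

err5, err50 = truncation_error(5), truncation_error(50)
```

Because the eigenvalues decay, a handful of modes already captures most of the variance, which is the point of truncating after the m largest eigenvalues.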
Karhunen–Loève Expansion IV
(Figure: KLE modes for a 3-D domain.)
Karhunen–Loève Expansion V
Reminder: the SVD of a matrix is W = U Σ Vᵀ = Σ_j σ_j u_j v_jᵀ with UᵀU = I, VᵀV = I, and Σ = diag(σ_j). The σ_j are the singular values of W, and the σ_j² are the eigenvalues of WᵀW or WWᵀ.
To every random field w(x, ω) ∈ L²(G) ⊗ L²(Ω) associate a linear map W : L²(G) → L²(Ω):
W : L²(G) ∋ v ↦ W(v)(ω) = ⟨v(·), w(·, ω)⟩_{L²(G)} = ∫_G v(x) w(x, ω) dx ∈ L²(Ω).
The KLE is the SVD of the map W; the covariance operator is C_w := W*W.
Karhunen–Loève Expansion VI
⟨u, C_w v⟩_{L²(G)} = ⟨u, W*W v⟩_{L²(G)} = ⟨W(u), W(v)⟩_{L²(Ω)} = E(W(u) W(v))
= E( ∫_G u(x) w(x, ω) dx ∫_G v(y) w(y, ω) dy )
= ∫_G u(x) ∫_G E(w(x, ω) w(y, ω)) v(y) dy dx
= ∫_G u(x) ∫_G C_w(x, y) v(y) dy dx.
The covariance operator C_w is represented by the covariance kernel C_w(x, y). Truncating the KLE is therefore the same as what is done when truncating an SVD: finding a sparse representation (model reduction).
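The identification of the KLE with the SVD can be checked numerically on a sampled field: with realisations stored as the columns of a matrix W, the squared singular values of W equal the eigenvalues of the (unnormalised) covariance W Wᵀ. Sizes are illustrative.

```python
import numpy as np

# KLE = SVD: squared singular values of the sample matrix equal the
# eigenvalues of the sample covariance W W^T
rng = np.random.default_rng(1)
n, M = 50, 40
W = rng.standard_normal((n, M))              # columns = realisations

U, s, Vt = np.linalg.svd(W, full_matrices=False)
lam = np.linalg.eigvalsh(W @ W.T)[::-1]      # eigenvalues, largest first
```

The left singular vectors in U are the discrete analogue of the KLE eigenfunctions g_j.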
Polynomial Chaos Expansion in Gaussians
Each ξ_j(ω) = Σ_α ξ_j^(α) H_α(θ(ω)) from the KLE may be expanded in a polynomial chaos expansion (PCE), with orthogonal (uncorrelated, independent) polynomials of Gaussian RVs {θ_m(ω)}_{m=1}^∞ =: θ(ω):
H_α(θ(ω)) = ∏_{j=1}^∞ h_{α_j}(θ_j(ω)),
where the h_l(ϑ) are the usual Hermite polynomials, and
J := {α | α = (α₁, ..., α_j, ...), α_j ∈ N₀, |α| := Σ_j α_j < ∞}
are multi-indices, of which only finitely many components α_j are non-zero. Here
⟨H_α, H_β⟩_{L²(Ω)} = E(H_α H_β) = α! δ_αβ, where α! := ∏_{j=1}^∞ (α_j!).
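The 1-D factors of this basis, the probabilists' Hermite polynomials, and their orthogonality E(h_a h_b) = a! δ_ab can be verified with Gauss–Hermite quadrature; H_α is then just the product of such factors over the active indices.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermeval, hermegauss

# Gauss quadrature for the standard Gaussian measure: hermegauss gives
# nodes/weights for exp(-x^2/2), so normalise weights by sqrt(2 pi)
x, w = hermegauss(20)
w = w / np.sqrt(2.0 * np.pi)

def He(n, x):
    # probabilists' Hermite polynomial of degree n
    c = np.zeros(n + 1); c[n] = 1.0
    return hermeval(x, c)

def inner(a, b):
    # E(He_a(theta) He_b(theta)) for theta ~ N(0,1)
    return float(np.sum(w * He(a, x) * He(b, x)))
```

With 20 nodes the quadrature is exact for polynomials up to degree 39, so these inner products are exact to rounding, e.g. inner(3, 3) = 3! = 6.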
Polynomial Chaos
(Figure: surface plots of two-dimensional Hermite polynomials H_α.)
First Summary
The representation in uncorrelated, independent Gaussian RVs allows direct simulation: independent Gaussian RVs can be produced in the computer by random generators (quasi-random), or by direct hardware (random). Each finite dimensional joint Gaussian measure Γ_m(dθ) is a product measure, allowing Fubini's theorem: Γ_m(dθ) = ∏_{l=1}^m Γ₁(dθ_l). The polynomial expansion theorem allows approximation of any random variable, in particular u(x, ω) = u(x, θ), even "wild" ones with no variance or no mean, unlike Monte Carlo. Still open: how to actually compute u(θ)? How to perform the integration?
Computational Approaches
The principal computational approaches are:
Direct integration (e.g. Monte Carlo): directly compute the statistic by quadrature, Ψ_u = E(Ψ(u(ω), ω)) = ∫_Θ Ψ(u(θ), θ) Γ(dθ), by numerical integration.
Perturbation: assume the stochastics is a small perturbation around the mean value, do a Taylor expansion, and truncate, usually after the linear or quadratic term.
Response surface: try to find a functional fit u(θ) ≈ v(θ), then compute with v; the integrand is now cheap.
Stochastic Galerkin: one possible way of constructing a response surface.
Stability Issues
For direct integration approaches, the direct expansions (both KLE and PCE) pose stability problems: both only converge in L², not in L∞ (uniformly) as required, so the spatially discrete problems to compute u(θ_z) for a specific realisation θ_z (as in Monte Carlo) may not be well posed. Convergence of the KLE may be uniform if the covariance C_κ(x, x′) is smooth enough, but this is not possible e.g. for C_κ(x, x′) = exp(−a|x − x′|). Truncation of the PCE gives a polynomial; as soon as one α_j is odd, there are regions where κ is negative. Compare approximating exp(ξ) with a Taylor polynomial truncated at an odd power. This cannot be repaired; it is like a negative Jacobian in normal FEM. A remedy: the method κ(x, ω) = φ(x, γ(x, ω)) is possible with a KLE of the Gaussian field γ(x, ω).
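One common choice for the transform φ, shown here only as an assumed example, is the exponential, giving a lognormal field: κ = exp(γ) is positive for every realisation, even though the truncated Gaussian field γ itself takes negative values. The 3-term expansion below is a toy stand-in for a truncated KLE of γ.

```python
import numpy as np

# positivity-preserving transform kappa = exp(gamma) of a truncated
# Gaussian field; the 3 "modes" here are illustrative, not a real KLE
rng = np.random.default_rng(2)
x = np.linspace(0.0, 1.0, 100)
g = np.stack([np.ones_like(x), np.sin(np.pi * x), np.cos(np.pi * x)])
xi = rng.standard_normal((3, 1000))   # 1000 realisations of the RVs
gamma = g.T @ xi                      # truncated Gaussian field
kappa = np.exp(gamma)                 # pointwise transform phi = exp
```

Every entry of kappa is strictly positive by construction, so the discrete problems for individual realisations stay well posed.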
Stochastic Galerkin I
The variational formulation is discretised in space, e.g. via the finite element ansatz u(x, ω) = Σ_{l=1}^n u_l(ω) N_l(x) = [N₁(x), ..., N_n(x)][u₁(ω), ..., u_n(ω)]ᵀ = N(x)ᵀ u(ω):
K(ω)[u(ω)] = f(ω).
Recipe: stochastic ansatz and projection in the stochastic dimensions,
u(θ) = Σ_β u^(β) H_β(θ) = [..., u^(β), ...][..., H_β(θ), ...]ᵀ = u H(θ).
Goal: compute the coefficients u^(β) through stochastic Galerkin methods,
∀γ: E( (f(θ) − K(θ)[u H(θ)]) H_γ(θ) ) = 0,
which requires the solution of one huge system, and only integrals of residuals.
Stochastic Galerkin II
Of course we cannot use all α ∈ J, but take only a finite subset
J_{k,m} = {α ∈ J : |α| ≤ k, α_j = 0 for j > m} ⊂ J.
Let S_{k,m} = span{H_α : α ∈ J_{k,m}}; then
dim S_{k,m} = (m + k)! / (m! k!).
It is better to use other subsets; best is an adaptive choice. The dimension grows quickly with m and k, for example:
m = 4, k = 3: dim S_{k,m} = 35; m = 6, k = 3: 84; m = 10, k = 5: 3003; m = 20, k = 5: 53130; m = 50, k = 5: ≈ 3.5 × 10⁶.
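The dimension formula can be checked by enumerating the multi-index set directly (the brute-force enumeration below is only for small k and m):

```python
from math import comb
from itertools import product

def J_km(k, m):
    # all multi-indices alpha in N_0^m with |alpha| <= k
    return [a for a in product(range(k + 1), repeat=m) if sum(a) <= k]

def dim_S(k, m):
    # closed form (m+k)! / (m! k!) = binomial(m+k, k)
    return comb(m + k, k)
```

Counting the enumerated set reproduces the binomial formula, e.g. dim_S(3, 4) = 35.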
Simpler Case of Additive Noise
Assume first that K is not random. Then for all γ ∈ J_{k,m} the coefficients satisfy:
Σ_{β ∈ J_{k,m}} E(H_γ(θ) H_β(θ)) K u^(β) = γ! K u^(γ) = E(f(θ) H_γ(θ)) =: γ! f^(γ).
There are too many f^(γ); use the discrete version of the KLE of f(θ):
f(θ) = f̄ + Σ_l ϕ_l φ_l(θ) f_l = f̄ + Σ_l ϕ_l Σ_γ φ_l^(γ) H_γ(θ) f_l.
In particular f^(γ) = Σ_l ϕ_l φ_l^(γ) f_l. The SVD of f(θ) is, with Φ = (φ_l^(γ)), ϕ = diag(ϕ_l), and F = [..., f_l, ...]:
f = [..., f^(γ), ...] = F ϕ Φᵀ = Σ_l ϕ_l f_l φ_lᵀ.
Solution for Additive Noise
Observe K ū = f̄, and set V := K⁻¹ F (i.e. solve K V = F); then
u(θ) = ū + V ϕ Φᵀ H(θ) = ū + Σ_l ϕ_l (K⁻¹ f_l) φ_lᵀ H(θ).
But this is not the SVD of u(θ)! That is obtained via an eigenproblem: the covariance C_f := E(f̃(θ) ⊗ f̃(θ)) = F ϕ² Fᵀ, hence
C_u := E(ũ(θ) ⊗ ũ(θ)) = K⁻¹ F ϕ² (K⁻¹ F)ᵀ = V ϕ² Vᵀ.
Even fewer terms are needed with the SVD of u(θ), obtained from the eigenproblem
C_u U = (V ϕ² Vᵀ) U = U υ².
Sparsification is thus achieved for u via the SVD with small m:
u = [..., u^(β), ...] = [u₁, ..., u_m] diag(υ_m) [y_m^(β)]ᵀ = U υ Yᵀ.
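A small numerical sketch of the additive-noise recipe, with random SPD stand-ins for K and the KLE factors of f: solve K V = F once, then the solution covariance is the congruence V ϕ² Vᵀ, which indeed equals K⁻¹ C_f K⁻ᵀ.

```python
import numpy as np

# additive noise: K deterministic, rhs given by a rank-L KLE F phi Phi^T
rng = np.random.default_rng(3)
n, L = 30, 4
A = rng.standard_normal((n, n))
K = A @ A.T + n * np.eye(n)             # SPD deterministic operator
F = rng.standard_normal((n, L))         # KLE modes f_l of the rhs
phi = np.diag([2.0, 1.0, 0.5, 0.25])    # singular values varphi_l

V = np.linalg.solve(K, F)               # solve K V = F
Cu = V @ phi**2 @ V.T                   # C_u = V phi^2 V^T
Cf = F @ phi**2 @ F.T                   # C_f = F phi^2 F^T
Cu_direct = np.linalg.solve(K, np.linalg.solve(K, Cf).T)  # K^-1 Cf K^-1
```

Only L = 4 deterministic solves are needed, instead of one per PCE coefficient f^(γ).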
Galerkin Methods for the General Linear Case
For all γ ∈ J_{k,m} = {α ∈ J : |α| ≤ k, α_j = 0 for j > m} the coefficients satisfy:
Σ_β [ ∫_G ∇N(x) E(κ(x, θ) H_β(θ) H_γ(θ)) ∇N(x)ᵀ dx ] u^(β) = E(f(θ) H_γ(θ)) =: γ! f^(γ).
A more efficient representation comes through direct expansion of κ in KLE and PCE and analytic computation of the expectations:
κ(x, θ) = Σ_{j=0}^∞ κ_j ξ_j(θ) g_j(x) ≈ Σ_{j=0}^r κ_j Σ_{α ∈ J_{k,m}} ξ_j^(α) H_α(θ) g_j(x).
Resulting Equations
Insertion of the expansion of κ gives (for all γ, summation over β implied):
Σ_j Σ_α ξ_j^(α) E(H_α H_β H_γ) [ ∫_G ∇N(x) κ_j g_j(x) ∇N(x)ᵀ dx ] u^(β) = f^(γ),
with Δ^(α)_{β,γ} := E(H_α H_β H_γ), and K_j the stiffness matrix of a FEM discretisation for the "material" κ_j g_j(x). A deterministic FEM program can be used in black-box fashion. The equations have the structure of a tensor product (for storage and use):
K u = Σ_j Σ_α ξ_j^(α) Δ^(α) ⊗ K_j u = f,
i.e. (#dof space) × (#dof stochastic) linear equations.
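The entries Δ^(α)_{β,γ} = E(H_α H_β H_γ) factor over the 1-D indices, and each 1-D factor has a known closed form for probabilists' Hermite polynomials: E(He_a He_b He_c) = a! b! c! / ((s−a)! (s−b)! (s−c)!) when a + b + c = 2s is even and s ≥ max(a, b, c), and 0 otherwise. A sketch of that 1-D factor:

```python
from math import factorial

def delta(a, b, c):
    # E(He_a He_b He_c) under the standard Gaussian measure:
    # nonzero only if a+b+c is even and satisfies the triangle condition
    s2 = a + b + c
    if s2 % 2 or max(a, b, c) > s2 // 2:
        return 0
    s = s2 // 2
    return (factorial(a) * factorial(b) * factorial(c)
            // (factorial(s - a) * factorial(s - b) * factorial(s - c)))
```

For example E(He₁ He₁ He₂) = E(x · x · (x² − 1)) = E(x⁴) − E(x²) = 3 − 1 = 2, which the formula reproduces; this is why the Δ^(α) never need to be stored.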
Sparsity Structure
(Figure: non-zero blocks of Δ^(α) for increasing degree of H_α.)
Properties of Global Equations
K u = Σ_j Σ_α ξ_j^(α) Δ^(α) ⊗ K_j u = f.
Each K_j is symmetric, and so is each Δ^(α); hence the block matrix K is symmetric. With an appropriate expansion of κ, K is uniformly positive definite. Never assemble the block matrix explicitly: the Δ^(α) are known analytically and need not be stored; use K only as a multiplication. Use a Krylov method (here CG) with a pre-conditioner.
Block-Diagonal Pre-Conditioner
Let K₀ = the stiffness matrix for the average material κ̄(x). Use the deterministic solver as a pre-conditioner:
P = diag(K₀, ..., K₀) = I ⊗ K₀.
This is a good pre-conditioner when the variance of κ is not too large; otherwise use P = block-diag(K). This may again be done with an existing deterministic solver. The block-diagonal P is well suited for parallelisation.
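The matrix-free multiplication and the block-diagonal pre-conditioner can be sketched as follows; the Δ_j and K_j here are random SPD stand-ins (assumed data, not real FEM matrices), and u is stored as an N × M array of spatial times stochastic coefficients.

```python
import numpy as np

# matrix-free action of K = sum_j Delta_j (x) K_j on u stored as N x M
rng = np.random.default_rng(4)
N, M, terms = 20, 15, 3
Ks, Ds = [], []
for _ in range(terms):
    A = rng.standard_normal((N, N)); Ks.append(A @ A.T + N * np.eye(N))
    B = rng.standard_normal((M, M)); Ds.append(B @ B.T + M * np.eye(M))

def apply_K(U):
    # (Delta (x) K) vec(U) = vec(K U Delta^T) in row-major vectorisation
    return sum(K @ U @ D.T for K, D in zip(Ks, Ds))

def apply_Pinv(U):
    # block-diagonal pre-conditioner P = I (x) K_0: one deterministic
    # solve per stochastic block
    return np.linalg.solve(Ks[0], U)
```

The N·M × N·M Kronecker block matrix is never formed; each application costs a few dense products instead of a huge assembled multiplication, and apply_Pinv reuses only the deterministic operator K₀.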
Sparsification
Goal: as with additive noise, compute only with the f_l, u_l from the SVD. Start the iteration with a tensor product of low rank L:
f = [..., f^(γ), ...] = F ϕ Φᵀ = Σ_l ϕ_l f_l φ_lᵀ.
At each iteration k, the rank of the iterative approximation u_k increases by the number of terms in the matrix sum. Therefore, in each iteration, perform an SVD of u_k and reduce the rank again to L. The resulting iteration converges to the SVD, i.e. the discrete KLE, of u.
Operation Count
K u = Σ_j Σ_α ξ_j^(α) Δ^(α) ⊗ K_j u = f.
Assume the sum in K has K terms, u and f have size N × M, and f = F ϕ Φᵀ = Σ_l ϕ_l f_l φ_lᵀ has L ≪ N, L ≪ M terms. Each application of K to a full u_k needs K·M multiplications with the K_j plus K·N multiplications with the Δ^(α). Each application of K to a low rank tensor product u_k needs only K·L multiplications with the K_j plus K·L multiplications with the Δ^(α), which is much less. Storage is reduced from N·M to L·(N + M).
Second Summary
Stochastic calculations produce huge amounts of data, which are expensive to operate on and to store. The results live in very high dimensional spaces, but they have a natural tensor product structure. Low rank tensor product approximations reduce operations and storage; they are a model reduction. This procedure may be extended into the non-linear regime.
Example Solution
(Figures: geometry with sources, Dirichlet b.c., inflow and outflow; a realisation of κ; a realisation of the solution; mean of the solution; variance of the solution; Pr{u(x) > 8}.)
Approximation Theory
Stability of the discrete approximation under truncated KLE and PCE: the matrix stays uniformly positive definite, and convergence follows from Céa's lemma. Convergence rates hold under stochastic regularity in stochastic Hilbert spaces (a stochastic regularity theory is needed). Error estimation via dual weighted residuals is possible.
Stochastic Hilbert spaces start with the formal PCE R(θ) = Σ_{α ∈ J} R^(α) H_α(θ). Define for ρ and p the norm (with (N)^β := ∏_j j^{β_j}):
‖R‖²_{ρ,p} = Σ_α (R^(α))² (α!)^{1+ρ} (N)^{pα}.
Stochastic Hilbert Spaces: Convergence Rates
Define for ρ, p ≥ 0:
(S)_{ρ,p} = {R(θ) = Σ_{α ∈ J} R^(α) H_α(θ) : ‖R‖_{ρ,p} < ∞}.
These are Hilbert spaces; the duals are denoted by (S)_{−ρ,−p}, and L²(Ω) = (S)_{0,0}. Let P_{k,m} be the orthogonal projection from (S)_{ρ,p} onto the finite dimensional subspace S_{k,m}.
Theorem: Let p > 0, r > 0 and ρ ≥ 0. Then for any R ∈ (S)_{ρ,−p}:
‖R − P_{k,m}(R)‖_{ρ,−p} ≤ ‖R‖_{ρ,−p+r} c(m, k, r), where c(m, k, r) = c₁(r) m^{−r} + c₂(r) 2^{−kr}.
But dim S_{k,m} grows too quickly with k and m; sparser spaces and error estimates are needed.
Results of Galerkin Method
(Figures: error ×10⁻⁴ in the mean (m = 6, k = 4) and in the standard deviation (m = 6); a PCE coefficient u^(α); error ×10⁻⁴ in u^(α) for the Galerkin scheme.)
Example: Computation of Moments
Let M^(k)_f := E(f(ω) ⊗ ... ⊗ f(ω)) (k times) = E(f^⊗k), a totally symmetric tensor, with M^(1)_f = f̄ and, for the centred field, M^(2)_f̃ = C_f. The KLE of C_f is
C_f = Σ_l ϕ_l² f_l f_lᵀ = Σ_l ϕ_l² f_l ⊗ f_l.
For a deterministic operator K, just compute K v_l = f_l and K ū = f̄; then
M^(2)_u = C_u = E(ũ ⊗ ũ) = Σ_l ϕ_l² v_l ⊗ v_l.
As ũ(θ) = Σ_l Σ_α ϕ_l φ_l^(α) H_α(θ) v_l, the multi-point correlation results as
M^(k)_u = Σ_{l₁,...,l_k} ( ∏_{m=1}^k ϕ_{l_m} ) Σ_{α^(1),...,α^(k)} ( ∏_{n=1}^k φ_{l_n}^{(α^(n))} ) E(H_{α^(1)} ⋯ H_{α^(k)}) v_{l₁} ⊗ ⋯ ⊗ v_{l_k}.
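A small sketch of the second moment: with random stand-ins for K and the KLE modes f_l, the covariance of u is assembled directly from the deterministic solves K v_l = f_l.

```python
import numpy as np

# C_u = sum_l varphi_l^2 v_l v_l^T from L deterministic solves K v_l = f_l
rng = np.random.default_rng(6)
n, L = 25, 3
A = rng.standard_normal((n, n))
K = A @ A.T + n * np.eye(n)             # SPD deterministic operator
fs = rng.standard_normal((n, L))        # KLE modes f_l (columns)
phi2 = np.array([4.0, 1.0, 0.25])       # varphi_l^2

vs = np.linalg.solve(K, fs)             # K v_l = f_l, columnwise
Cu = (vs * phi2) @ vs.T                 # sum_l varphi_l^2 v_l v_l^T
```

The result is a symmetric positive semi-definite n × n matrix built from only L solves; higher moments follow the same pattern with more tensor factors.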
Non-Linear Equations
Example: take κ(x, u, ω) = a(x, ω) + b(x, ω) u, with a and b random. Space discretisation generates a non-linear equation A(Σ_β u^(β) H_β(θ)) = f(θ). Projection onto the PCE:
a[u] := [..., E( H_α(θ) A(Σ_β u^(β) H_β(θ)) ), ...] = f = F̂ ϕ Φᵀ.
The expressions in a[u] need high dimensional integration (in each iteration), e.g. Monte Carlo or Smolyak (sparse grid) quadrature; after that, a[u] should be sparsified or compressed. The residual equation to be solved is r(u) := f − a[u] = 0.
Solution of Non-Linear Equations
As the model problem is the gradient of some functional, use a quasi-Newton method (here BFGS with line search) to solve in the k-th iteration:
(Δu)_k = H_k r(u_{k−1}), u_k = u_{k−1} + (Δu)_k,
H_k = H₀ + Σ_{j=1}^k ( a_j p_j ⊗ p_j + b_j q_j ⊗ q_j ).
The tensors p_j and q_j are computed from the residual and the last increment; notice the tensor products of (hopefully sparse) tensors. A pre-conditioner H₀ is needed for good convergence: one may use the linear solver described before, or just its pre-conditioner (which again uses the deterministic solver), i.e. H₀ = (Dr(u₀))⁻¹ or H₀ = (I ⊗ K₀)⁻¹.
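A minimal sketch of the quasi-Newton idea on a quadratic model residual r(u) = f − K u (the gradient system of ½ uᵀKu − fᵀu), with the inverse-BFGS update and H₀ taken as the inverse of a nearby "mean-material" operator K₀. All matrices are random illustrative stand-ins, and no line search is used.

```python
import numpy as np

# BFGS sketch: H approximates the inverse Jacobian, started from
# the deterministic pre-conditioner H_0 = K_0^{-1}
rng = np.random.default_rng(8)
n = 10
A = rng.standard_normal((n, n))
K0 = A @ A.T + n * np.eye(n)            # mean-material operator
B = rng.standard_normal((n, n))
K = K0 + 0.05 * (B @ B.T)               # perturbed "stochastic" operator
f = rng.standard_normal(n)

u = np.zeros(n)
H = np.linalg.inv(K0)                   # H_0
g = K @ u - f                           # gradient = -r(u)
for _ in range(50):
    s = -H @ g                          # step (Delta u)_k = H r
    u = u + s
    g_new = K @ u - f
    if np.linalg.norm(g_new) < 1e-12:
        break
    y = g_new - g
    rho = 1.0 / (y @ s)
    I = np.eye(n)
    H = (I - rho * np.outer(s, y)) @ H @ (I - rho * np.outer(y, s)) \
        + rho * np.outer(s, s)          # inverse BFGS update
    g = g_new
```

The rank-one terms added to H each iteration mirror the p_j ⊗ p_j and q_j ⊗ q_j updates on the slide; in the tensor-product setting these low-rank updates are exactly what keeps the iterate sparse.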
Third Summary
Stochastic Galerkin methods work, and they are computationally feasible on today's hardware. They are numerically stable and have a variational convergence theory behind them. They can use existing software efficiently; a software framework is being built for easy integration of existing software.
Important Features of Stochastic Galerkin
For efficiency, try to use sparse representations throughout: the ansatz in tensor products, as well as storage of the solution, residual, and matrix in tensor products, and sparse grids for the integration. In contrast to MCS (Monte Carlo simulation), these methods are stable and have only cheap integrands. They can be coupled to existing software, and are only marginally more complicated to use than MCS.
Outlook
Stochastic problems are at their very beginning (like FEM in the 1960s): it is not yet clear when to choose which stochastic discretisation. Non-linear (and instationary) problems are possible, but require much more work. A framework for stochastic coupling and parallelisation is under development. The computational algorithms have to be developed further; hierarchical parallelisation is well possible.