A Spectral Approach to Linear Bayesian Updating

Oliver Pajonk (1,2), Bojana V. Rosić (1), Alexander Litvinenko (1), and Hermann G. Matthies (1)
(1) Institute of Scientific Computing, TU Braunschweig, Germany
(2) SPT Group GmbH, Hamburg, Germany
Outline
- Motivation / Problem Statement
- Proposed Solution
- Examples
  - Non-trivial scalar update
  - Gauss-linear model
  - State estimation of the Lorenz-63 model
- Conclusions / Outlook
Motivation
- Application: stochastic inverse & control problems on dynamical systems
  - Uncertain parameter & state estimation from noisy evidence
  - Subsequent optimal control under uncertainty (or closed-loop; not considered further in this talk)
- Most existing (linear) Bayesian methods use sampling
- Spectral representations of RVs possess nice convergence properties and are deterministic: make use of that!
- Updating and spectral representation should be tightly integrated (uncertainty quantification)
- Approaches exist, but have specific problems
Linear Bayesian Updating of Polynomial Chaos Coefficients (1/3)

First ingredient: the linear update formula for RVs (Kalman (1960) for Gaussian RVs; Luenberger (1969) for L² RVs):

    x'(ω) = x(ω) + K (z(ω) − y(ω)),   with K = C_xy (C_z + C_y)^{-1}

Second ingredient: the polynomial chaos expansion (possibly other spectral expansions):

    r(ω) = Σ_{α∈J} r_α H_α(θ_1(ω), …, θ_k(ω))

with y(ω) = h(x(ω)) and z(ω) = h(x̂) + ε(ω), where x(ω) is the prior, x'(ω) the posterior, x̂ the unknown truth to identify, ε(ω) the measurement error, z(ω) the evidence + error model, and h the measurement operator.

Projection onto the PCE gives, for all α ∈ J:  x'_α = x_α + K (z_α − y_α).
Truncation and limitation to a finite number of basis RVs θ_i(ω) gives, for all α ∈ J_Z:  x'_α = x_α + K (z_α − y_α).

This LPCU is a formula which actually can be implemented.
Linear Bayesian Updating of Polynomial Chaos Coefficients (2/3)

Write the truncated PCE as matrices of coefficient column vectors, X = […, x_α, …], so that

    X' = X + K (Z − Y).

Kalman gain computation:
- Introduce the diagonal Gram matrix Δ with Δ_αβ = E[H_α H_β] = diag(α!).
- Some matrix shorthands: X̃ = X_{α>0} (varying part), X̄ = X_{α=0} = E[x] (mean), therefore X = [X̄ X̃].
- The involved (cross-)covariance matrices are then easy to approximate, e.g. C_xy ≈ X̃ Δ Ỹ^T, and the computation of K is complete.
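As a concrete sketch, the coefficient update and gain computation above fit in a few lines of NumPy. The shapes, function name, and toy test case are illustrative (not from the talk); a Hermite PCE with Gram diagonal α! and coefficient matrices whose first column is the mean (α = 0) are assumed:

```python
import numpy as np

def lpcu_update(X, Y, Z, gram):
    """Sketch of the LPCU coefficient update (illustrative shapes).

    X    : (n, P) prior state PCE coefficients, column 0 = mean
    Y    : (m, P) forecast-measurement coefficients, y = h(x)
    Z    : (m, P) evidence coefficients (evidence + error model)
    gram : (P-1,) Gram diagonal alpha! for the alpha > 0 columns
    """
    Xt, Yt, Zt = X[:, 1:], Y[:, 1:], Z[:, 1:]   # varying parts (alpha > 0)
    D = np.diag(gram)
    C_xy = Xt @ D @ Yt.T                        # cross-covariance C_xy
    C_y = Yt @ D @ Yt.T                         # forecast covariance C_y
    C_z = Zt @ D @ Zt.T                         # evidence covariance C_z
    K = C_xy @ np.linalg.inv(C_z + C_y)         # Kalman gain
    return X + K @ (Z - Y)                      # coefficient update
```

For a scalar Gauss-linear sanity check (prior N(0, 1), identity h, evidence mean 1 with unit error variance on an independent basis RV), this reproduces the Kalman result: posterior mean 0.5 and posterior variance 0.5.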
Linear Bayesian Updating of Polynomial Chaos Coefficients (3/3)

Now all ingredients of the formula are ready... aren't they? Not quite.

Important: ε(ω) and y(ω) are assumed to be uncorrelated, therefore Z − Y (as defined above) is technically wrong!

The problem: C_yε = C_yz = 0 is assumed for the formula; the variances of z(ω) and y(ω) must add.

Possible solutions: 1. ignore it, 2. fix it, 3. circumvent it.
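A tiny Monte-Carlo check (a toy scalar example, not from the talk) makes the issue concrete: if the evidence z re-uses the same basis RV as the forecast y, the variance of z − y no longer equals C_y + C_z, so C_yz = 0 is violated:

```python
import numpy as np

rng = np.random.default_rng(0)
theta = rng.standard_normal(200_000)   # basis RV carrying the forecast
eps = rng.standard_normal(200_000)     # independent measurement-error RV

y = 2.0 * theta                        # forecast, C_y = 4
z_indep = 1.0 + 3.0 * eps              # evidence with independent error, C_z = 9
z_colin = 1.0 + 3.0 * theta            # "co-linear" evidence on the shared basis RV

var_correct = np.var(z_indep - y)      # close to C_y + C_z = 13, as assumed
var_colin = np.var(z_colin - y)        # close to (3 - 2)^2 = 1: variances no longer add
```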
The Implementation Issue and Possible Treatments
- Ignore it: simply re-use the same basis RVs for the evidence error and the forward model. In other words: directly compute Z − Y, thereby introducing a correlation between ε(ω) and y(ω). See this as a necessary approximation (denoted "co-linear" in the following). However, this approach over-estimates the posterior variance.
- Fix it: introduce new basis RVs with each update. Not really applicable for sequential updating in dynamical systems.
- Circumvent it: come up with some consistent way to avoid the additional RVs; e.g. Zeng (2010) and Blanchard (2010) argue that the PCE coefficients are zero due to independence. However, this approach under-estimates the posterior variance (cf. Burgers (1998) for the EnKF case).
Circumvent It, Differently: A Square Root Implementation (1/3)

Idea of square root approaches: update the mean x̄ and the varying part x̃(ω) of the RV x(ω) independently (cf. Potter (1963)).

First: the update for the mean remains as-is: X̄' = X̄ + K (Z̄ − Ȳ).

Second: the update for the varying part:
- Realise that C_x ≈ X̃ Δ X̃^T = (X̃ Δ^{1/2}) (X̃ Δ^{1/2})^T = S S^T (Δ is diagonal).
- S is a (very specific) square root of the prior covariance matrix.
- Idea: transform S into S', a square root of the posterior covariance matrix?
- Ansatz: find a matrix A with S' = S A T (T some orthonormal matrix).
Circumvent It, Differently: A Square Root Implementation (2/3)

Start with the update for the covariance (e.g. Kalman, 1960; requires a linear measurement operator H):

    C_x' = (I − K H) C_x = C_x − C_x H^T (H C_x H^T + C_z)^{-1} H C_x
         = S S^T − S S^T H^T (H S S^T H^T + C_z)^{-1} H S S^T = S M S^T

Substituting C_x ≈ S S^T gives M = I − S^T H^T (H S S^T H^T + C_z)^{-1} H S.

A matrix A with A A^T = M would be a solution to the ansatz. Therefore:
- Compute the eigenvalue decomposition H S S^T H^T + C_z = B Λ B^T.
- Then M = I − S^T H^T B Λ^{-1} B^T H S = I − (Λ^{-1/2} B^T H S)^T (Λ^{-1/2} B^T H S) = I − W^T W with W = Λ^{-1/2} B^T H S (cf. Evensen, 2004: EnSRF).
- With the SVD W = U Σ V^T:

    M = I − (U Σ V^T)^T (U Σ V^T) = I − V Σ^T Σ V^T = V (I − Σ^T Σ) V^T = [V (I − Σ^T Σ)^{1/2}] [V (I − Σ^T Σ)^{1/2}]^T,

giving the desired square root of M: A = V (I − Σ^T Σ)^{1/2}.
Circumvent It, Differently: A Square Root Implementation (3/3)

With this result, the ansatz becomes: S' = S A T = S V (I − Σ^T Σ)^{1/2} T.

It remains to choose T. See V as a mapping between the normalized PCE space and the covariance structure space. We need to map back; therefore choose T = V^T, giving the final update equation for the varying part:

    S' = S V (I − Σ^T Σ)^{1/2} V^T.

To obtain the posterior PCE, set X̃' = S' Δ^{-1/2} and join with the updated mean: X' = [X̄' X̃']. This will be denoted as SRPCU in the following.

The approach is not unique:
- Use a pre-multiplication ansatz
- Use a different derivation for the square root
- The choice of T may be conclusive, but it is still a choice! Others exist.
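The varying-part update above can be sketched directly from the derivation. Shapes and names are illustrative assumptions: S is an n×P prior square root, H a linear m×n measurement matrix, C_z the evidence error covariance; singular values beyond min(m, P) contribute ones on the diagonal of (I − Σ^T Σ)^{1/2}:

```python
import numpy as np

def srpcu_sqrt_update(S, H, C_z):
    """Sketch of the SRPCU square-root update S -> S' (illustrative shapes)."""
    P = S.shape[1]
    # Eigenvalue decomposition of the innovation covariance: H S S^T H^T + C_z = B Lam B^T
    lam, B = np.linalg.eigh(H @ S @ S.T @ H.T + C_z)
    W = np.diag(lam ** -0.5) @ B.T @ H @ S             # so that M = I - W^T W
    _, sigma, Vt = np.linalg.svd(W)                    # SVD: W = U Sigma V^T
    d = np.ones(P)                                     # diagonal of (I - Sigma^T Sigma)^{1/2}
    d[: sigma.size] = np.sqrt(np.clip(1.0 - sigma ** 2, 0.0, None))
    return S @ Vt.T @ np.diag(d) @ Vt                  # S' = S V (I - Sigma^T Sigma)^{1/2} V^T
```

By construction, S' S'^T then equals the Kalman posterior covariance (I − K H) C_x, which is easy to verify numerically on a small random example.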
Numerical Example: Non-trivial Scalar Update

Numerical Example: Non-trivial Scalar Update (Zoom)
Numerical Example: Lorenz-63 State Estimation
- System model: 3D, non-linear, chaotic
- Task: state estimation from evidence
- Noise model: N(μ = 0, σ = 3) for all three variables
- Initial conditions & parameters: standard choices
- Governing equations:

    dx/dt = s (y − x)
    dy/dt = r x − y − x z
    dz/dt = x y − b z
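The forward model is straightforward to sketch; assuming the usual "standard choices" s = 10, r = 28, b = 8/3 and a simple fixed-step RK4 integrator (any ODE solver would do):

```python
import numpy as np

def lorenz63(u, s=10.0, r=28.0, b=8.0 / 3.0):
    """Right-hand side of the Lorenz-63 system (standard parameters assumed)."""
    x, y, z = u
    return np.array([s * (y - x), r * x - y - x * z, x * y - b * z])

def integrate(u0, dt, n_steps):
    """Fixed-step RK4 integration; returns the trajectory including u0."""
    traj = np.empty((n_steps + 1, 3))
    u = np.asarray(u0, dtype=float)
    traj[0] = u
    for i in range(n_steps):
        k1 = lorenz63(u)
        k2 = lorenz63(u + 0.5 * dt * k1)
        k3 = lorenz63(u + 0.5 * dt * k2)
        k4 = lorenz63(u + dt * k3)
        u = u + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)
        traj[i + 1] = u
    return traj
```

Synthetic evidence for the state estimation experiment is then obtained by adding the N(0, σ = 3) noise to trajectory samples of this model.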
Following Plots: Some Functionals and their Reliability
- Functionals f: root mean square error RMSE = ( (1/N) Σ_{i=1}^N E[(x_i − x̂_i)²] )^{1/2}, variance, skewness, and kurtosis
- 1000 repetitions of each experiment; the probabilistic components are randomized (evidence noise, initial ensemble noise, simulated data noise)
- Plots contain M(f), the sample mean, and a reliability measure of the functional f based on V(f), the unbiased sample variance; both are computed over the 1000 repetitions.
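A minimal sketch of these diagnostics (function names are illustrative; the unbiased sample variance uses `ddof=1` as in the slide's V(·)):

```python
import numpy as np

def rmse(estimate, truth):
    """Root mean square error between estimated and true states."""
    e = np.asarray(estimate, dtype=float) - np.asarray(truth, dtype=float)
    return np.sqrt(np.mean(e ** 2))

def reliability_summary(f_values):
    """Sample mean M(f) and unbiased sample variance V(f) over repetitions."""
    f = np.asarray(f_values, dtype=float)
    return f.mean(), f.var(ddof=1)
```

In the experiments, `rmse` would be evaluated once per repetition and `reliability_summary` over the 1000 resulting values.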
Lorenz-63: RMSE (SRPCU, EnSRF)

Lorenz-63: Variance (SRPCU, EnSRF)

Lorenz-63: Skewness (SRPCU, EnSRF)

Lorenz-63: Kurtosis (SRPCU, EnSRF)
Lorenz-63: PDF Estimates @ t = 80 (SRPCU, EnSRF)
- EnSRF: tends to produce outliers and clusters (known effect)
- SRPCU: smooth, non-Gaussian
Lorenz-63: PDF Estimates @ t = 80 (SRPCU, EnKF)
- EnKF: smooth, but strong tendency towards Gaussian estimates
- SRPCU: smooth, non-Gaussian
Conclusions
- Fully deterministic method (as opposed to EnKF, EnSRF): there are applications where this is mandatory
- Very efficient: the update takes practically no time; the evolution of the model is the expensive part
- Exact for Gauss-linear problems (as theory would predict)
- Higher moments are transferred from prior to posterior; mean and variance are corrected
- Avoids the outlier problem of EnSRF
- Avoids the growing PCE basis problem of correct, but non-square-root schemes
- Drawback: only the evidence and the assumed covariance enter the update, not the distributional form (e.g. non-Gaussian)
Outlook
- Combination with adaptive subspace selection schemes
- Other spectral expansions; collocation methods (cf. Zeng (2010))
- Different V to change the variance re-distribution
- Iterative variants for improved non-linear identification
- Pre-multiplication schemes (as opposed to the post-multiplication used here)
- Non-linear h(·)?
- Regularization techniques like covariance localization (cf. EnKF)
References
- Pajonk, O.; Rosić, B. V.; Litvinenko, A. & Matthies, H. G., A Deterministic Filter for Non-Gaussian Bayesian Estimation, Physica D: Nonlinear Phenomena, 2012, 241, 775-788, DOI:10.1016/j.physd.2012.01.001
- Rosić, B. V.; Litvinenko, A.; Pajonk, O. & Matthies, H. G., Direct Bayesian Update of Polynomial Chaos Representations, Journal of Computational Physics, 2011, submitted for publication

Related methods:
- Blanchard, E. D.; Sandu, A. & Sandu, C., A Polynomial Chaos-Based Kalman Filter Approach for Parameter Estimation of Mechanical Systems, Journal of Dynamic Systems, Measurement, and Control, ASME, 2010, 132, 061404
- Zeng, L. & Zhang, D., A Stochastic Collocation Based Kalman Filter for Data Assimilation, Computational Geosciences, Springer Netherlands, 2010, 14, 721-744
- Saad, G. A., Stochastic Data Assimilation with Application to Multi-Phase Flow and Health Monitoring Problems, Faculty of the Graduate School, University of Southern California, 2007

Bibliography:
- Kálmán, R. E., A New Approach to Linear Filtering and Prediction Problems, Transactions of the ASME - Journal of Basic Engineering, 1960, 82, 35-45
- Potter, J. E. & Stern, R. G., Statistical Filtering of Space Navigation Measurements, Proceedings of the AIAA Guidance and Control Conference, Massachusetts Institute of Technology, August 1963
- Evensen, G., Sampling Strategies and Square Root Analysis Schemes for the EnKF, Ocean Dynamics, 2004, 54, 539-560
Numerical Example: Gauss-Linear Model (1/3)
Case with σ = 0.01:
- Mean good for all methods (watch the scale!)
- Dirac evidence underestimates the variance; the RMSE is smaller
- Co-linear evidence overestimates the variance, but on the order of the EnKF noise (not visible)
Numerical Example: Gauss-Linear Model (2/3)
Case with σ = 0.1:
- Mean bad for co-linear and Dirac evidence
- Variance estimates are consistently wrong for both, but not completely off
- RMSE therefore worse than the KF solution
Numerical Example: Gauss-Linear Model (3/3)
Case with σ = 1.0:
- Mean is partly completely off; variance too
- LPCU with square root, EnKF, and EnSRF reproduce the KF result in all cases
- Other LPCU variants over-/underestimate the variance