Background and methods NCEO, Dept. of Meteorology, Univ. of Reading 710 March 2018, Univ. of Reading
Bayes' Theorem Bayes' Theorem p(x y) = posterior distribution = p(x) p(y x) p(y) prior distribution likelihood normalizing constant Prior distribution: PDF of the state before observations are considered (e.g. PDF of model forecast). Likelihood: PDF of observations given that the state is x. Posterior: PDF of the state after the observations have been considered.
The Gaussian assumption PDFs are often described by Gaussians (normal distributions). Gaussian PDFs are described by a mean and covariance only. P(x) = x N(x b,b) 1 (2π) n det(b) -10-5 0 5 epsilon1 ε = x x b 10 15 20-10 -5 0 5 20 15 10 exp 1 2 epsilon2 ( x x b ) T B 1 ( x x b)
Meaning of x and y x a analysis; x b background state; δ x increment (perturbation) y observations; y m = H (x) model observations. H (x) is the observation operator / forward model. Sometimes x and y are for only one time (3DVar). x-vectors have n elements; y-vectors have p elements.
Back to the Gaussian assumption Prior: mean x b, covariance B P(x) = 1 (2π) n det(b) exp 1 2 ( x x b ) T B 1 ( x x b) Likelihood: mean H (x), covariance R P(y x) = 1 (2π) p det(r) exp 1 2 (y H (x))t R 1 (y H (x)) Posterior p(x y) = p(x) p(y x) p(y) exp 1 [ (x x b ) T B 1 ( x x b) 2 ] +(y H (x)) T R 1 (y H (x))
Variational DA the idea In Var., we seek a solution that maximizes the posterior probability p(x y) (maximum-a-posteriori). This is the most likely state given the observations (and the background), called the analysis, x a. Maximizing p(x y) is equivalent to minimizing ln p(x y) J(x) (a least-squares problem). p(x y) = C exp 1 2 J(x) = lnc + 1 2 [ (x x b ) T B 1 ( x x b) ] +(y H (x)) T R 1 (y H (x)) ( x x b ) T B 1 ( x x b) + 1 2 (y H (x))t R 1 (y H (x)) = constant (ignored) + J b (x) + J o (x)
Four-dimensional Var (4DVar) Aim To nd the `best' estimate of the true state of the system (analysis), consistent with the observations, the background, and the system dynamics.
Towards a 4DVar cost function Consider the observation operator in this case: x 0 H 0 (x 0 ) H (x) = H. =. H N (x N ) x N So the J o is (assume that R is block diagonal): J o = 1 2 (y H (x))t R 1 (y H (x)) = T 1 y 0 H 0 (x 0 ) R 0 0 0 y 0 H 0 (x 0 ) 1. 2. 0.. 0. y N H N (x N ) 0 0 R N y N H N (x N ) = 1 2 N i=0 (y i H i (x i )) T R 1 i (y i H i (x i )) where x i+1 = M i (x i )
The 4DVar cost function (`full 4DVar') Let (a) T A 1 (a) (a) T A 1 ( ) J(x) = 1 ( x0 x b ) T 0 B 1 0 2 ( ) + 1 2 (y H (x))t R 1 ( ) = 1 ( x0 x b ) T 0 B 1 0 2 ( ) + 1 2 N i=0 subject to x i+1 = M i (x i ) (y i H i (x i )) T R 1 i ( ) x b 0 a-priori (background) state at t 0. y i observations at t i. H i (x i ) observation operator at t i. B 0 background error covariance matrix at t 0. R i observation error covariance matrix at t i.
How to minimize this cost function? Minimize J(x) iteratively The gradient of the cost function Use the gradient of J at each iteration: x k+1 0 = x k 0 + α J(x k 0) J/ (x 0 ) 1 J(x 0 ) =. J/ (x 0 ) n J points in the direction of steepest descent. Methods: steepest descent (inecient), conjugate gradient (more ecient),...
The gradient of the cost function (wrt x(t 0 )) Either: 1 Di. J(x 0 ) w.r.t. x 0 with x i = M i 1 (M i 2 ( M 0 (x 0 ))). 2 Di. J(x) = J(x 0,x 1,...,x N ) w.r.t. x 0,x 1,...,x N subject to the constraint N 1 L(x,λ) = J(x) + x i+1 M i (x i ) = 0 i=0 Each approach leads to the adjoint method λ T i+1 (x i+1 M i (x i )). An ecient means of computing the gradient. Uses the linearized/adjoint of M i and H i : M T i and H T i.
The adjoint method Equivalent gradient formula: 1 J J(x 0 ) = B 1 ( 0 x0 x b ) 0 N i=0 M T 0...M T i 1 HT i R 1 i (y i H i (x i )) 2 λ N+1 = 0 λ i = H T i R 1 i (y i H i (x i )) + M T i λ i+1 λ 0 = J o J = J b + J o = B 1 0 ( x0 x b ) 0 + λ 0
The adjoint method
Simplications and complications The full 4DVar method is expensive and dicult to solve. Model M i is non-linear. Observation operators, H i can be non-linear. Linear H quadratic cost function easy(er) to minimize, J o 1 2 (y ax)2 /σ 2 o. Non-linear H non-quadratic cost function hard to minimize, J o 1 2 (y f (x))2 /σ 2 o. Later will recognise that models are `wrong'! Look for simplications: Incremental 4DVar (linearized 4DVar) 3D-FGAT 3DVar Complications: Weak constraint (imperfect model)
Incremental 4DVar 1 definitions: x R i+1(k) = M i ( ) x R i(k) x i = x R i(k) + δ x i x b 0 = x R 0(k) + δ xb 0 x i+1 = M i (x i ) H i (x i ) H i (x R i(k) δ x i+1 M i(k) δ x i ) + H i(k) δ x i δ x i M i 1(k) M i 2(k)...M 0(k) δ x 0
Incremental 4DVar 2 J (k) (δ x 0 ) = 1 ( δ x0 δ x b ) T ( ) 0 B 1 2 1 2 N i=0 ( 0 y i H i (x R i(k) ) H i(k)δ x i ) T R 1 i ( ) `Inner loop': iterations to nd δ x 0 (as adjoint method). `Outer loop' (k): iterate x R 0(k+1) = xr 0(k) + δ x 0 Inner loop is exactly quadratic (e.g. has a unique minimum). Inner loop can be simplied (lower res., simplied physics).
Simplication 1: 3D-FGAT Three dimensional variational data assimilation with first guess (i.e. x R i(k) ) is computed at the appropriate time. Simplication is that M i(k) I, i.e. δ x i = M i 1(k)...M 0(k) δ x 0 δ x 0. J(k) 3DFGAT (δ x 0 ) = 1 ( δ x0 δ x b ) T 0 B 1 0 2 N ( 1 2 i=0 ( ) y i H i (x R i(k) ) H i(k)δ x 0 ) T R 1 i ( )
Simplication 2: 3DVar This has no time dependence within the assimilation window. Not used (these days 3D-Var really means 3D-FGAT). J(k) 3DVar (δ x 0 ) = 1 ( δ x0 δ x b 0 2 N ( 1 2 i=0 ) T B 1 0 ( ) y i H i (x R 0(k) ) H i(k)δ x 0 ) T R 1 i ( )
Properties of 4DVar Observations are treated at the correct time. Use of dynamics means that more information can be obtained from observations. Covariance B 0 is implicitly evolved, B i = ( M i 1(k)...M 0(k) ) B0 ( Mi 1(k)...M 0(k) ) T. In practice development of linear and adjoint models is complex. But note M i, H i, M i, H i, M T i, and HT i are subroutines, and so `matrices' are usually not in explicit matrix form. Standard 4DVar assumes the model is perfect. This can lead to sub-optimalities. Weak-constraint 4DVar relaxes this assumption.
Weak constraint 4DVar Modify evolution equation: x i+1 = M i (x i ) + η i where η i N(0,Q i ) `State formulation' of WC4DVar J wc (x 0,...,x N ) = J b + J o + 1 2 `Error formulation' of WC4DVar N 1 i=0 J wc (x o,η 0...,η N 1 ) = J b + J o + 1 2 (x i+1 M i (x i )) T Q 1 i ( ) N 1 i=0 η T i Q 1 i η i
Implementation of weak constraint 4DVar Vector to be determined (`control vector') increases from n in 4DVar to n + n(n 1) in WC4DVar. The model error covariance matrices, Q i, need to be estimated. How? The `state' formulation (determine x 0,...,x N ) and the `error' formulation (determine x 0,η 0...,η N 1 ) are mathematically equivalent, but can behave dierently in practice. There is an incremental form of WC4DVar.
Summary of 4DVar The variational method forms the basis of many operational weather and ocean forecasting systems, including at ECMWF, the Met Oce, Météo-France, etc. It allows complicated observation operators to be used (e.g. for assimilation of satellite data). It has been very successful. Incremental (quasi-linear) versions are usually implemented. It requires specication of B 0, the background error cov. matrix, and R i, the observation error cov. matrix. 4DVar requires the development of linear and adjoint models not a simple task! Weak constraint formulations require the additional specication of Q i.
Some challenges ahead Methods assume that error cov. matrices are correctly known. Representing B 0. Better models of B 0. Flow dependency (e.g. Ensemble-Var or hybrid methods). Representing R i. Allowing for observation error covariances. Representing Q i. Numerical conditioning of the problem. Application to more complicated systems (e.g. high-resolution models, coupled atmosphere-ocean DA, chemical DA). Variational bias correction. Moist processes, inc. clouds. Eective use on massively parallel computer architectures.
Selected References Original application of 4DVar: Talagrand O, Courtier P, Variational assimilation of meteorological observations with the adjoint vorticity equation I: Theory, Q. J. R. Meteorol. Soc. 113, 13111328 (1987). Excellent tutorial on Var: Schlatter TW, Variational assimilation of meteorological observations in the lower atmosphere: A tutorial on how it works, J. Atmos. Sol. Terr. Phys. 62, 10571070 (2000). Incremental 4DVar: Courtier P, Thepaut J-N, Hollingsworth A, A strategy for operational implementation of 4D-Var, using an incremental approach, Q. J. R. Meteorol. Soc. 120, 13671387 (1994). High-resolution application of 4DVar: Park SK, Zupanski D, Four-dimensional variational data assimilation for mesoscale and storm scale applications, Meteorol. Atmos. Phys. 82, 173208 (2003). Met Oce 4DVar: Rawlins F, Ballard SP, Bovis KJ, Clayton AM, Li D, Inverarity GW, Lorenc AC, Payne TJ, The Met Oce global four-dimensional variational data assimilation scheme, Q. J. R. Meteorol. Soc. 133, 347362 (2007). Weak constraint 4DVar: Tremolet Y, Model-error estimation in 4D-Var, Q. J. R. Meteorol. Soc. 133, 12671280 (2007). Inner and outer loops: Lawless, Gratton & Nichols, QJRMS, 2005; Gratton, Lawless & Nichols, SIAM J. on Optimization (2007). More detailed survey of variational methods than can be done in this lecture (plus ensemble-variational, hybrid methods): Bannister R.N., A review of operational methods of variational and ensemble-variational data assimilation, Q.J.R. Meteor. Soc. 143, 607-633 (2017).