2015 GSI Community Tutorial, NCAR Foothills Campus, Boulder, CO, August 11-14, 2015

Fundamentals of Data Assimilation

Milija Zupanski
Cooperative Institute for Research in the Atmosphere
Colorado State University, Fort Collins, Colorado
Outline

- Motivation
- Basic concepts
- General challenges of data assimilation
- Data assimilation methodologies
- Relevance of forecast error covariance
- Future of DA
Motivation for Data Assimilation

Prediction models
- Models are described by a set of equations used to simulate real-world processes and predict their future behavior. In geosciences, these equations typically form a system of partial differential equations (PDEs).
- Various parameters can impact the performance of a PDE model: initial conditions (IC), model errors (ME), empirical parameters (EP), ...
- Our knowledge of these parameters is never perfect, implying uncertainty of the prediction.

Observations
- Various measurements are available: satellite, radar, ships, aircraft, surface, ...
- Measuring instruments have errors, and other errors (e.g., representativeness) exist as well.
- Measurements therefore carry numerous errors, implying uncertainty of the observations.
Forecast

An uncertain forecast can be represented by a dynamic-stochastic process:

dx_t = m(x_t, t) dt + g(x_t, t) dβ_t

where m is the dynamics (model time evolution), g the stochastic forcing, x the state vector, and β a random vector.

State vector (x)
- A (smallest) subset of variables defining a dynamical/physical system.
- Typically it refers to the initial conditions only, but it may also include model errors and empirical parameters.

x = (p, T, q, u, v, q_cloud, q_snow, O_3, T_soil, q_soil, ...)^T,  with  p = (p_1, ..., p_N)^T,  T = (T_1, ..., T_N)^T, ...

Prediction models are fundamentally probabilistic!
Observations

- In general, observed variables are nonlinearly related to model variables.
- Observations include instrument and representativeness errors:

y = h(x_t) + ε

where h is the nonlinear mapping from model to observation space, y the observation vector, and ε the observation error.

- Given the probabilistic character of the model state x_t and the existence of observation errors, the observation transformation equation implies the probabilistic character of the observations as well.
Uncertainty

- Defines how reliable a state vector estimate is.
- A measure of missing knowledge.
- Important for decision-making.
- Classical information theory (Shannon): uncertainty is related to probability and entropy; probability and entropy are measures of incomplete knowledge.

[Figures: Hurricane Ike (2008) wind speed probability; entropy and order]
Data assimilation process

[Diagram: guess forecast and observations blended into an analysis]

The mathematical method used to blend the information from models and observations is called data assimilation. Data assimilation has the goal of producing optimal estimates of the state and its uncertainty.
Data assimilation is probabilistic

- Insufficient knowledge of the input information implies insufficient knowledge of the output:
  - model equations and input parameters are imperfect, thus the model forecast will be imperfect
  - observations are imperfect
- Uncertainties and imperfect knowledge are best measured by probability:
  - make an assumption regarding an adequate probability distribution, or
  - (if possible) form a histogram and deduce the best probability distribution
- The input to data assimilation can be represented by a joint probability density function (pdf) p(X, Y), where X is the state variable and Y the observations.
Bayesian principle in data assimilation

Bayes' theorem relates the joint PDF to the prior and conditional PDFs:

p(X, Y) = p(Y) p(X | Y) = p(X) p(Y | X)   ⇒   p(X | Y) = p(Y | X) p(X) / p(Y)

- p(X): prior PDF
- p(X | Y): conditional (posterior) PDF

- It is implicitly assumed that it is easier to calculate the prior and the conditional PDFs than the joint PDF.
- A learning algorithm: the probability estimate is updated as additional evidence is acquired.
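As a minimal numerical illustration of this update, consider a hypothetical two-state discrete example (the prior and likelihood values here are invented for illustration, not from the slides):

```python
import numpy as np

prior = np.array([0.5, 0.5])       # p(X): equal prior belief in two states
likelihood = np.array([0.8, 0.2])  # p(Y | X): how well each state explains the observed Y

# Bayes: posterior ∝ likelihood × prior, normalized by p(Y) = Σ_x p(Y|x) p(x)
posterior = likelihood * prior
posterior /= posterior.sum()
print(posterior)  # -> [0.8 0.2]
```

The normalization step is exactly the division by p(Y) in the theorem; the evidence shifts belief toward the state that better explains the observation.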
Prior and conditional probability density functions

- Prior: defines the knowledge about the dynamical state before the new observations Y_N are assimilated:

p(X) = p(X | Y_{N-1}, ..., Y_1)

where Y_{N-1}, ..., Y_1 are old observations and Y_N is the new observation not yet used.

- Conditional probability of the new observations with respect to the prior state:

p(Y_N | X)
Gaussian assumption

- A probability density function (PDF) can be highly nonlinear, or have numerous unknown parameters that need to be estimated.
- One of the simplest and most widely applicable PDFs is the Gaussian PDF:
  - errors of physical processes tend to accumulate near zero (i.e., small errors dominate)
  - the Gaussian PDF has the smallest number of unknown parameters (mean and covariance, the first two moments)

p(z) = 1 / (σ √(2π)) · exp( −(z − µ)² / (2σ²) )
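The density above translates directly into code; a minimal sketch using only the Python standard library (the function name is illustrative):

```python
import math

def gaussian_pdf(z, mu, sigma):
    """Univariate Gaussian density p(z) from the slide."""
    return math.exp(-(z - mu) ** 2 / (2.0 * sigma ** 2)) / (sigma * math.sqrt(2.0 * math.pi))

# Peak density of a standard Gaussian is 1/sqrt(2*pi)
print(round(gaussian_pdf(0.0, 0.0, 1.0), 4))  # -> 0.3989
```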
Other relevant non-Gaussian PDFs

Double exponential (Laplacian) PDF:

p_L(x | µ, b) = (1 / 2b) · exp( −|x − µ| / b )

- Wind speed errors can be described using the Laplacian PDF.
- Sharp gradient fields (e.g., atmospheric fronts) exhibit Laplacian PDFs.

Other non-Gaussian (skewed) PDFs are also relevant:
- lognormal PDF (humidity, cloud variables)
- gamma-function PDFs (precipitation)
Multivariate Gaussian prior PDF

Multivariate Gaussian PDF:

p(x) = 1 / ( (2π)^{N/2} |P_f|^{1/2} ) · exp( −½ (x − x_f)^T P_f^{−1} (x − x_f) )

where x is the state variable, P_f the covariance (the index f denotes forecast), | · | the determinant, and N the dimension of the state vector.

- The covariance is independent of x.
- The error of the forecast is defined as a difference between states.
- Only the mean and covariance are required:

p(x) ∝ exp( −½ (x − x_f)^T P_f^{−1} (x − x_f) )
Multivariate Gaussian conditional PDF

Probability of the observations given the prior state:

p(y | x) ∝ exp( −½ (y − h(x))^T R^{−1} (y − h(x)) )

where y is the observation vector, R the observation error covariance, and h the nonlinear observation operator (mapping from model to observation space).

- A zero-mean observation error is implicitly assumed in the above equation.
- The error of the observation is defined as the difference between the observation and the model guess value at the observation location.
- Some observations, such as satellite radiances, have a non-zero mean error.
Data assimilation with Gaussian PDFs

- Maximum a posteriori estimate: find the optimal state x_opt that maximizes the posterior probability density function p(x | y):

x_opt = argmax_x p(x | y)

- Minimum variance estimate: find the optimal state x_opt with the smallest error (variance), where L is a loss function of the conditional mean:

x_opt = argmin_x E( L[ E(x | y) ] )

Both estimates are identical for Gaussian PDFs; otherwise they differ.
Gaussian posterior PDF

Given the prior and conditional PDFs, the posterior is

p(x | y) ∝ exp( −½ (y − h(x))^T R^{−1} (y − h(x)) − ½ (x − x_f)^T P_f^{−1} (x − x_f) )

The negative log-likelihood function (also referred to as the cost function) is

f(x) = −log p(x | y) = ½ (x − x_f)^T P_f^{−1} (x − x_f) + ½ (y − h(x))^T R^{−1} (y − h(x))

Consequence: minimizing the cost function = maximizing the posterior PDF:

x_opt = argmax_x p(x | y) = argmin_x f(x)
Optimality conditions for the minimum

Taylor expansion for a multivariate functional:

f(x + εp) = f(x) + ε p^T [∇f] + ½ ε² p^T [∇²f] p + O(ε³)

where ε is a constant parameter, p a perturbation vector, and x the control vector. Writing the gradient as ∇f = g(x) and the Hessian as ∇²f = G(x):

f(x + εp) = f(x) + ε p^T g + ½ ε² p^T G p + O(ε³)

At a minimum x*:

g(x*) = 0   and   G(x*) > 0
Minimize cost function: Optimal solution

f(x) = ½ (x − x_f)^T P_f^{−1} (x − x_f) + ½ (y − h(x))^T R^{−1} (y − h(x))

(1) Zero gradient:

g(x) = P_f^{−1} (x − x_f) − H^T R^{−1} (y − h(x)) = 0,   where  H = ∂h/∂x

(2) Positive definite Hessian (z^T G z > 0 for any z ≠ 0):

G = P_f^{−1} + H^T R^{−1} H > 0

Since P_f^{−1} + H^T R^{−1} H is positive definite and symmetric, the minimum exists.
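For a linear observation operator h(x) = Hx, the optimality condition g(x) = 0 becomes a linear system whose matrix is the Hessian. A small NumPy sketch under that assumption (all dimensions and covariance values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 3, 2
Pf = np.diag([1.0, 2.0, 0.5])        # forecast error covariance (assumed values)
R = 0.25 * np.eye(m)                 # observation error covariance
H = rng.standard_normal((m, n))      # linear observation operator
xf = np.array([1.0, -1.0, 0.5])      # first guess
y = H @ xf + 0.1                     # synthetic observations

# g(x) = Pf^{-1}(x - xf) - H^T R^{-1}(y - Hx) = 0  is  G x = Pf^{-1} xf + H^T R^{-1} y
G = np.linalg.inv(Pf) + H.T @ np.linalg.inv(R) @ H           # Hessian
xa = np.linalg.solve(G, np.linalg.inv(Pf) @ xf + H.T @ np.linalg.inv(R) @ y)

# Verify the optimality condition g(xa) = 0
g = np.linalg.inv(Pf) @ (xa - xf) - H.T @ np.linalg.inv(R) @ (y - H @ xa)
print(np.allclose(g, 0.0))           # -> True
```

Because G is symmetric positive definite, `solve` succeeds and the stationary point is the unique minimum.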
Role of the Hessian (second derivative)

The optimal solution requires the inverse Hessian:

g(x) = P_f^{−1} (x − x_f) − H^T R^{−1} (y − h(x))
G = P_f^{−1} + H^T R^{−1} H
G d = −g   ⇒   d = G^{−1} (−g)

Best conditioning of this system of linear equations is obtained by a change of variable: with G = E E^T and x − x_f = E^{−T} w, the transformed Hessian becomes

G_w = E^{−1} (E E^T) E^{−T} = I
One-point DA algorithm derivation

Minimize the quadratic cost function (linear observation operator H):

f(x) = ½ (x − x_f)^T P_f^{−1} (x − x_f) + ½ (y − Hx)^T R^{−1} (y − Hx)

(1) Setting the gradient to zero:

∇f(x) = 0   ⇒   P_f^{−1} (x − x_f) − H^T R^{−1} (y − Hx) = 0

x = (I + P_f H^T R^{−1} H)^{−1} (x_f + P_f H^T R^{−1} y)

One-point DA, with a single observation at a grid point (H = I, P_f = σ_f², R = σ_o²):

x_a = (1 + σ_f²/σ_o²)^{−1} (x_f + (σ_f²/σ_o²) y)
One-point DA (1)

x_a = (1 + σ_f²/σ_o²)^{−1} (x_f + (σ_f²/σ_o²) y)

x_a = [ σ_o² / (σ_f² + σ_o²) ] x_f + [ σ_f² / (σ_f² + σ_o²) ] y

x_a = α x_f + β y,   α + β = ?

- The analysis is a linear combination of the first guess and observation vectors; that is, the analysis is an interpolation between observation and first guess.
- Uncertainty defines the interpolation weights.
One-point DA (2)

Note that the interpolation weights are normalized:

σ_o² / (σ_f² + σ_o²) + σ_f² / (σ_f² + σ_o²) = (σ_o² + σ_f²) / (σ_f² + σ_o²) = 1   ⇒   α + β = 1

x_a = α x_f + (1 − α) y,   α = σ_o² / (σ_f² + σ_o²)

Normalization of the weights assures that the analysis lies between the guess and the observation.

α = σ_o² / (σ_f² + σ_o²) = 1 / ( (σ_f/σ_o)² + 1 )

Only the ratio between the uncertainties is important!
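The scalar analysis above is only a few lines of code; a minimal sketch (the function name is illustrative):

```python
def one_point_analysis(xf, y, sigma_f, sigma_o):
    """Scalar analysis: uncertainty-weighted average of first guess and observation."""
    alpha = sigma_o**2 / (sigma_f**2 + sigma_o**2)   # weight on the first guess
    return alpha * xf + (1.0 - alpha) * y

# Equal confidence (sigma_f = sigma_o): analysis is halfway between guess and observation
print(one_point_analysis(0.0, 2.0, 1.0, 1.0))   # -> 1.0
```

Note that scaling both sigmas by the same factor leaves the result unchanged, consistent with "only the ratio matters".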
One-point DA (3)

x_a = [ σ_o² / (σ_f² + σ_o²) ] x_f + [ σ_f² / (σ_f² + σ_o²) ] y

[Diagrams: first guess x_f with error bar b = σ_f (forecast error) and observation y with error bar a = σ_o (observation error), with the optimal analysis x_a between them; (1) large confidence in observations, a < b; (2) equal confidence in observations and first guess, a = b]

The interpretation of data assimilation is simple; the complexity comes from the high-dimensional state and nonlinear operators.
Example 1

Given σ_f/σ_o → ∞, what is the analysis?

x_a = [1 / (1 + ω²)] x_f + [ω² / (1 + ω²)] y,   ω = σ_f / σ_o

x_a = lim_{ω→∞} [1 / (1 + ω²)] x_f + lim_{ω→∞} [ω² / (1 + ω²)] y

lim_{ω→∞} 1 / (1 + ω²) = 0,   lim_{ω→∞} ω² / (1 + ω²) = lim_{ω→∞} 1 / (1/ω² + 1) = 1

⇒ x_a = y

No confidence in the forecast implies the analysis is drawn to the observation.
Example 2

Given σ_f/σ_o = 0, what is the analysis?

x_a = [1 / (1 + ω²)] x_f + [ω² / (1 + ω²)] y,   ω = σ_f / σ_o

x_a = lim_{ω→0} [1 / (1 + ω²)] x_f + lim_{ω→0} [ω² / (1 + ω²)] y

lim_{ω→0} 1 / (1 + ω²) = 1,   lim_{ω→0} ω² / (1 + ω²) = 0

⇒ x_a = x_f

No confidence in the observations implies no impact of data assimilation.
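Both limiting cases can be checked numerically with the same weight formula (the function name and the extreme ω values are illustrative):

```python
def analysis(xf, y, omega):
    """x_a for weight ratio omega = sigma_f / sigma_o."""
    return xf / (1.0 + omega**2) + y * omega**2 / (1.0 + omega**2)

print(analysis(0.0, 1.0, 1e6))   # omega -> infinity: analysis ~ y = 1
print(analysis(0.0, 1.0, 1e-6))  # omega -> 0: analysis ~ xf = 0
```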
Challenges of realistic data assimilation

The challenges are:
- High dimensionality of state and observations
  - impacts the degrees of freedom of the forecast error covariance and the acceptable choices of DA methodology
- Nonlinearity of simulated physical processes and observation operators
  - need the capability to handle nonlinearities
- Computation
  - costly integration of realistic forecast models; matrix inversion
- Observation errors
  - bias correction, correlated observation errors
- Multivariate character of the DA problem
  - dynamical stability of the analysis
Impact of high dimensionality on matrices in data assimilation

Several possible formulations of the (linear) data assimilation analysis equations:

(1) x = x_f + P_f H^T (H P_f H^T + R)^{−1} [y − h(x_f)]        (KF, observation-space VAR)

(2) x = x_f + (P_f^{−1} + H^T R^{−1} H)^{−1} H^T R^{−1} [y − h(x_f)]        (model-space VAR)

(3) x = x_f + P_f^{1/2} (I + P_f^{T/2} H^T R^{−1} H P_f^{1/2})^{−1} P_f^{T/2} H^T R^{−1} [y − h(x_f)]        (sqrt, ensemble space)

Q: Which version is best for high-dimensional applications?
It all depends on the limitation of the matrix inversion due to dimensionality.
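The algebraic equivalence of forms (1) and (2) (the Sherman-Morrison-Woodbury identity) can be checked numerically on a small random example; all dimensions and values below are illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 4, 2
A = rng.standard_normal((n, n))
Pf = A @ A.T + np.eye(n)            # symmetric positive definite forecast covariance
H = rng.standard_normal((m, n))     # linear observation operator
R = 0.5 * np.eye(m)                 # observation error covariance
xf = rng.standard_normal(n)
d = rng.standard_normal(m)          # innovation y - h(xf)

# (1) observation-space form: inverse of an m x m matrix
x1 = xf + Pf @ H.T @ np.linalg.solve(H @ Pf @ H.T + R, d)

# (2) model-space form: inverse of an n x n matrix
x2 = xf + np.linalg.solve(np.linalg.inv(Pf) + H.T @ np.linalg.inv(R) @ H,
                          H.T @ np.linalg.inv(R) @ d)

print(np.allclose(x1, x2))          # -> True
```

In practice the choice between them hinges on whether the observation dimension m or the state dimension n yields the cheaper inversion.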
Possible approximations used in practice

(1) Neglect the forecast covariance impact in the matrix inverse: H P_f H^T + R ≈ R

x = x_f + P_f H^T R^{−1} [y − h(x_f)]

(2) Neglect the observation impact in the matrix inverse: H^T R^{−1} H ≈ 0

x = x_f + P_f H^T R^{−1} [y − h(x_f)]

(3) Use all terms, but in a low-dimensional subspace: dim(I + P_f^{T/2} H^T R^{−1} H P_f^{1/2}) ~ O(100)

x = x_f + P_f^{1/2} (I + P_f^{T/2} H^T R^{−1} H P_f^{1/2})^{−1} P_f^{T/2} H^T R^{−1} [y − h(x_f)]
Practical data assimilation algorithms: Basic methods

Variational data assimilation (3D-Var, 4D-Var)
- maximum a posteriori estimate
- iterative minimization has an advantage for nonlinear operators
- forecast uncertainty is pre-defined and time-independent (i.e., static)
- forecast uncertainty has all degrees of freedom
- employs an adjoint (i.e., transpose) operator

Ensemble Kalman filter data assimilation (EnKF, EnSRF)
- minimum variance estimate
- assumes the linear KF analysis solution
- statistical sampling of the forecast error covariance
- forecast uncertainty is flow-dependent (from ensemble forecasts)
- reduced number of degrees of freedom
- no need for an adjoint; uses differences of nonlinear functions
Variational data assimilation

3D-Var cost function (one observation time):

f(x) = ½ (x − x_f)^T P_f^{−1} (x − x_f) + ½ (y − h(x))^T R^{−1} (y − h(x))

4D-Var cost function (sum over observation times k = 1, ..., T):

f(x) = ½ (x − x_f)^T P_f^{−1} (x − x_f) + ½ Σ_{k=1}^{T} [y − h(m(x))]_k^T R_k^{−1} [y − h(m(x))]_k

- 4D-Var allows a smooth transition to the forecast after data assimilation.
- The 4D-Var analysis is more costly to calculate than 3D-Var.
- 3D-Var can improve considerably with a better definition of the background error covariance P_f.
Practical variational DA

f(x) = ½ (x − x_f)^T P_f^{−1} (x − x_f) + ½ (y − h(x))^T R^{−1} (y − h(x))

- The forecast error covariance is high-dimensional (O(10^14) elements).
- Matrix inversion in high dimensions is costly or impossible to compute, and likely ill-conditioned.
- Always use a change of variable, x − x_f = P_f^{1/2} w (or x − x_f = P_f w):

f(w) = ½ w^T w + ½ [y − h(x_f + P_f^{1/2} w)]^T R^{−1} [y − h(x_f + P_f^{1/2} w)]

- This is practical if we assume a simple (e.g., diagonal) observation error covariance.
- Otherwise, one needs to think of another change of variable.
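For a linear observation operator, the preconditioned problem in w has a closed-form minimizer and reproduces the model-space analysis. A small NumPy sketch under those assumptions (dimensions, seed, and the Cholesky choice of P_f^{1/2} are all illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
n, m = 4, 3
A = rng.standard_normal((n, n))
Pf = A @ A.T + np.eye(n)             # SPD forecast error covariance
L = np.linalg.cholesky(Pf)           # one choice of Pf^{1/2}
H = rng.standard_normal((m, n))      # linear observation operator
R = np.eye(m)
xf = rng.standard_normal(n)
d = rng.standard_normal(m)           # innovation y - h(xf)

# Minimize f(w) = 1/2 w^T w + 1/2 (d - H L w)^T R^{-1} (d - H L w):
S = H @ L
w = np.linalg.solve(np.eye(n) + S.T @ np.linalg.inv(R) @ S, S.T @ np.linalg.inv(R) @ d)
xa = xf + L @ w                      # map back: x - xf = Pf^{1/2} w

# Same analysis as the unpreconditioned model-space solution
x_ref = xf + np.linalg.solve(np.linalg.inv(Pf) + H.T @ np.linalg.inv(R) @ H,
                             H.T @ np.linalg.inv(R) @ d)
print(np.allclose(xa, x_ref))        # -> True
```

The payoff is that the w-space Hessian I + S^T R^{-1} S is well conditioned and no inverse of P_f is ever formed.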
What is ensemble data assimilation?

(1) Forecast uncertainty is calculated from multiple model forecasts (ensembles): an initial uncertainty at time t is evolved by the model M(x) into a forecast uncertainty at time t+1.

(2) The analysis employs the ensemble information and produces its own uncertainty, with the gain K(x) relating the dynamical model (phase) space and the observation space.

[Schematic: ensemble trajectories spreading from t to t+1; analysis mapping between model and observation spaces]
Ensemble Kalman filters

Monte Carlo (Evensen 1994; Houtekamer and Mitchell 1998):
(1) Perturb the observations using an assumed observation error PDF N(0, R): y_i = y + ε_i
(2) Calculate an ensemble of analyses, one for each realization of the observations:

x_i^a = x_i^f + [P_f H^T]_ens ( [H P_f H^T]_ens + R )^{−1} ( y_i − h(x_i^f) ),   x_a = (1/N) Σ_i x_i^a

Deterministic, square-root filters (Bishop et al. 2001; Anderson 2003; Whitaker and Hamill 2002; Hunt et al. 2007):
(1) No perturbed observations; use the actual measurements.
(2) Only a single analysis is calculated:

x_a = x_f + [P_f H^T]_ens ( [H P_f H^T]_ens + R )^{−1} ( y − h(x_f) )

In both cases the ensemble covariances are built from observation-space differences h(x_i^f) − h(x_f), with no linearized H required.
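A minimal perturbed-observation (Monte Carlo) EnKF sketch in NumPy. It assumes a linear observation operator standing in for h; the dimensions, ensemble size, seed, and covariance values are all illustrative, not from the slides:

```python
import numpy as np

rng = np.random.default_rng(3)
n, m, N = 5, 3, 200                      # state dim, obs dim, ensemble size (assumed)
H = rng.standard_normal((m, n))          # linear observation operator stand-in for h
R = 0.1 * np.eye(m)                      # observation error covariance
truth = rng.standard_normal(n)
y = H @ truth + rng.multivariate_normal(np.zeros(m), R)

Xf = truth[:, None] + rng.standard_normal((n, N))   # forecast ensemble (columns = members)

# Ensemble estimates of [Pf H^T] and [H Pf H^T] from perturbations about the mean
xbar = Xf.mean(axis=1, keepdims=True)
Xp = (Xf - xbar) / np.sqrt(N - 1)
PfHT = Xp @ (H @ Xp).T
HPfHT = (H @ Xp) @ (H @ Xp).T

# Perturbed-observation update: one analysis per member, y_i = y + eps_i
Xa = np.empty_like(Xf)
for i in range(N):
    y_i = y + rng.multivariate_normal(np.zeros(m), R)
    Xa[:, i] = Xf[:, i] + PfHT @ np.linalg.solve(HPfHT + R, y_i - H @ Xf[:, i])

x_a = Xa.mean(axis=1)                    # analysis = mean of the analysis ensemble
```

No adjoint or explicit n x n covariance matrix appears anywhere; everything is built from the N ensemble columns, which is the practical appeal of the method.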
Flow-dependent forecast error covariance

[Schematic: grid point x, with obs 1 upstream in a dynamically significant region and obs 2 nearby]

Geographically distant observations can bring more information than close-by observations if they lie in a dynamically significant region.
Impact of static error covariance

[Schematic: the same grid point x, now with an isotropic correlation length scale around it]

Low-valued information (obs 2) will be assimilated instead of high-valued information (obs 1).
Forecast error covariance: Degrees of Freedom

Singular value decomposition [Golub and van Loan (1989)]:

P_f^{1/2} = V Σ W^T = Σ_i σ_i v_i w_i^T

P_f = P_f^{1/2} ( P_f^{1/2} )^T = V Σ W^T ( W Σ V^T ) = V Σ² V^T = Σ_i σ_i² v_i v_i^T

where v_i, w_i are the singular vectors and σ_i the singular values:

V = ( v_1  v_2  ...  v_N ),   Σ = diag( σ_1, σ_2, ..., σ_N ),   σ_1 ≥ σ_2 ≥ ... ≥ σ_N ≥ 0

with N_S the state vector dimension, N ≤ N_S, and orthonormal singular vectors v_i^T v_j = δ_ij.

DOF = number of singular values greater than zero = rank of the matrix.
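The rank/DOF statement is easy to verify numerically. A sketch with a hypothetical 20-dimensional state whose error covariance square root has only 3 independent columns (all values illustrative):

```python
import numpy as np

rng = np.random.default_rng(4)
Pf_half = rng.standard_normal((20, 3))      # reduced-rank square root P_f^{1/2}
Pf = Pf_half @ Pf_half.T                    # P_f = P_f^{1/2} (P_f^{1/2})^T, 20 x 20

_, s, _ = np.linalg.svd(Pf_half, full_matrices=False)
dof = int(np.sum(s > 1e-10 * s[0]))         # singular values above numerical zero
print(dof)                                  # -> 3 = rank of P_f
```

Even though P_f is a 20 x 20 matrix, its uncertainty lives in only 3 directions, which is exactly the reduced-rank situation of ensemble DA.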
Reduced-rank aspect of (ensemble) data assimilation

Only a limited number of ensembles can be calculated:
- high cost of forecast model integration
- high-dimensional state vector

[Schematic: ensemble (phase) space as a small subspace of the full model (phase) space]

Observations outside the ensemble space cannot be assimilated.

Hybrid variational-ensemble methods improve the DOF problem by creating uncertainty in all parts of the model space (e.g., by combining flow-dependent and static error covariances).
Why is forecast error covariance important?

x_a = x_f + P_f H^T (H P_f H^T + R)^{−1} [y − h(x_f)] = x_f + P_f z_obs

with z_obs = H^T (H P_f H^T + R)^{−1} [y − h(x_f)]. The analysis update is

x_a − x_f = Σ_i σ_i² v_i v_i^T z_obs = Σ_i µ_i v_i,   µ_i = σ_i² v_i^T z_obs

- The analysis update is a linear combination of the forecast error covariance singular vectors.
- Analysis increments are defined in the subspace spanned by the forecast error covariance singular vectors.
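This subspace property can be checked numerically: with a reduced-rank P_f = P_f^{1/2} (P_f^{1/2})^T, the increment P_f z_obs stays in the column space of P_f^{1/2}. All dimensions below are illustrative:

```python
import numpy as np

rng = np.random.default_rng(5)
n, m, k = 8, 4, 3
Pf_half = rng.standard_normal((n, k))     # reduced-rank square root: k error directions
Pf = Pf_half @ Pf_half.T
H = rng.standard_normal((m, n))           # linear observation operator
R = np.eye(m)
d = rng.standard_normal(m)                # innovation y - h(xf)

increment = Pf @ H.T @ np.linalg.solve(H @ Pf @ H.T + R, d)   # x_a - x_f = Pf z_obs

# Projecting the increment onto the column space of Pf_half leaves it unchanged,
# so no component of the update lies outside the spanned subspace.
Q, _ = np.linalg.qr(Pf_half)
print(np.allclose(Q @ (Q.T @ increment), increment))          # -> True
```

Any part of the innovation pointing outside that k-dimensional subspace is simply lost, which is why the choice of forecast error covariance controls what the analysis can correct.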
Practical data assimilation algorithms: Hybrid methods

Hybrid variational-ensemble data assimilation
- combined static and flow-dependent error covariance
- iterative minimization
- sequential method

J(x, α) = β_f (δx_f)^T P_VAR^{−1} (δx_f) + β_e α^T (P_ENS ∘ L)^{−1} α + ½ (y − H δx_tot)^T R^{−1} (y − H δx_tot)

δx_tot = δx_f + Σ_k α_k ∘ [P_ENS^{1/2}]_k,   1/β_f + 1/β_e = 1

4D-EN-VAR
- 4-D control variable: simultaneous adjustment in time and space
- sequential method
- increased dimension of the control vector
- reduced rank, but could be used as a hybrid
What are future possibilities for DA methodology?

(1) Particle filters
- a solution to the curse of dimensionality may exist
- computational improvements may help with the large ensemble size requirement

(2) Other hybrid options
- improve existing hybrid DA methods by other means

(3) Maximum entropy filters, information filters
- more general than using a typical log-likelihood function
- related to Shannon information theory

(4) Development of reduced-rank modeling
- if reduced-rank modeling becomes a reality, it would imply a reduced number of DOF
- this would directly benefit ensemble DA by alleviating the issue of small ensemble size

(5) Improved computing efficiency
- parallel computing on a large number of CPUs
Future of data assimilation

Practice:
- great need for combining information from observations and models with increased dimensionality and complexity
- high temporal frequency observations (e.g., geostationary satellites)
- coupled data assimilation

Theory:
- increase the generality of the mathematical formalism (fewer auxiliary parameters)
- reduce the number of assumptions (Gaussian pdf, uncorrelated observation errors, ...)