
Introduction to Ocean Data Assimilation

Edward D. Zaron
Department of Civil and Environmental Engineering, Portland State University, Portland, OR, USA
P.O. Box 751, Portland, OR 97207, USA. ezaron@coas.oregonstate.edu; WWW: ~zaron/.

Preprint submitted to GODAE/Bluelink Summer School, 17 November 2009

"Conventional ocean modeling consists of solving the model equations as accurately as possible, and then comparing the results with observations. While encouraging levels of quantitative agreement have been obtained, as a rule there is significant quantitative disagreement owing to many sources of error: model formulation, model inputs, computation and the data themselves. Computational errors aside, the errors made both in formulating the model and in specifying its inputs usually exceed the errors in the data. Thus it is unsatisfactory to have a model solution which is uninfluenced by the data." Bennett (1992)

Abstract

Data assimilation is the process of hindcasting, nowcasting, and forecasting using all available information. This chapter summarizes the methods and applications of data assimilation in the ocean.

Key words: data assimilation

1 Introduction

There are many technologies for observing the ocean. Examples include instruments for taking measurements at fixed points, such as acoustic Doppler velocimeters; horizontal and vertical profilers, such as towed conductivity-temperature-pressure sensors (CTDs); and spatially extensive, nearly instantaneous or synoptic measurements, such as satellite imagery or radiometry. Every measurement system is defined by the physical variables it measures; by its space-time resolution and averaging characteristics, which determine how high-frequency information is aliased to lower frequencies; and by the noise and bias properties of the instrumentation.

Given the large size of the ocean, and the great expense of measurement and observation systems, no practicable observation system completely determines the state of the ocean. Hence, models are necessary to complement the basic observations. However, the ocean itself is a turbulent fluid, and small changes in initial conditions can have a significant impact on the subsequent evolution of the fluid. Hence, even if it were possible to completely solve the partial differential equations of fluid motion, the prediction of the oceanic state would be limited by the accuracy of the initial conditions and the boundary data (e.g., the air-sea flux of momentum). Numerical ocean models additionally require discretization or truncation of the degrees of freedom of the continuum equations, and the parameterization of the effects of the neglected motions on the resolved scales is a significant source of error in our ability to simulate the fluid flow accurately. It is these two considerations, the relative paucity of observational data and the limitations of models, which provide the impetus for data assimilation.

Generally, one would like to predict the oceanic state, i.e., the space-time fields of temperature, salinity, pressure, and three-dimensional velocity, within some spatial domain and time period. One has some set of observations within this same space-time domain, but, even if they were error-free, they would constrain only a finite number of degrees of freedom of the fluid. The goal of data assimilative modeling is to find an estimate of the oceanic state variables, and possibly other controls such as fluxes through the air-sea boundary layer, which is maximally consistent with observations and numerical model dynamics, allowing for errors in both.

Data assimilation, as described here, is a relatively new field, since it is dependent on significant computational resources for solving the equations of fluid motion in the atmosphere and ocean. But the basic problems of ocean data assimilation have a long history, with mathematical roots in probability and estimation theory, inverse theory, and the classical calculus of variations. The operational roots of data assimilation are closely tied to the weather prediction community, which has long dealt with the problem of how to smooth and interpolate sparse measurements in order to optimize subsequent weather predictions.

This introduction to the subject of ocean data assimilation is selective. The goal is to touch on the major points of theory and implementation, with an eye towards common themes which have been developed in the primary literature. After reading this chapter, the reader should be well-prepared to survey any of the many existing texts and introductions to data assimilation (Gelb, 1974; Wahba, 1990; Daley, 1991; Bennett, 1992; Parker, 1994; Wunsch, 1996; Bennett, 2002; Kalnay, 2003; Evensen, 2007).

The article begins by reviewing different views of the purpose of data assimilation, which is not simply the assimilation of data into prognostic ocean models. Then the basic theory is introduced as an application of Bayes Theorem, several current ocean data assimilation methods are derived from it, and the first part of the article closes by outlining the basic components common to all data assimilation methodologies. The second part of the article provides an overview of implementation issues common across data assimilation methodologies, with the goal of introducing the reader to the central issues in the analysis of data assimilation systems. Notation and nomenclature vary widely in the literature, and an effort has been made to use a consistent but minimal notation which is in accord with the recent literature. A glossary of annotated definitions of significant terms is provided as an appendix.

2 The purpose of data assimilation

Much like the ancient Indian parable of the blind men and the elephant (Strong, 2007), there are several different perspectives on the purpose of data assimilation. The literature concerned with each of these areas is quite diverse, and the disparate nomenclature and perspectives can sometimes obscure common themes and methodological approaches.

Interpolation, extrapolation, and smoothing. The purpose of data assimilation is to estimate the state of the ocean using all information available, including dynamics (e.g., the equations of motion) and observations. The end goal of the data assimilation is to produce the most accurate analysis fields, which are smoothly and consistently gridded from sparse or irregularly distributed data, and whose dynamical relationships are in reasonable balance with prior physical considerations, such as geostrophic balance. Where measurements are sparse, the analysis fields ought to interpolate the measurements, or nearly so, with allowance for the measurement error. Where measurements are absent, they ought to be extrapolated from nearby measurements, consistent with a realistic term balance in the dynamic model. Where measurements are dense, redundant, or particularly inaccurate, the analysis field ought to be plausibly smooth, containing no more structure than warranted by the observations and the dynamics. Representative papers from this genre are many, e.g., Oke et al. (2002); Paduan and Shulman (2004); Moore et al. (2004).

Parameter calibration. The purpose of data assimilation is to develop the most accurate model of the ocean, by systematically adjusting unknown or uncertain parameters so that model predictions are maximally congruent with calibration data.

The emphasis is on adjusting what may be highly uncertain or difficult-to-measure physical parameters, e.g., scalar parameters involved in turbulence sub-models, or fields such as the seabed bathymetry. From the perspective of parameter calibration, the end goal of data assimilation is to produce the best possible model for future prognostic or data assimilative studies, one which maximizes the information gained, neither over- nor under-fitting the calibration data. There is a significant oceanographic literature in this area, but parameter estimation generally involves the solution of strongly nonlinear inverse problems, which can be more complex than state estimation (Lardner et al., 1993; Heemink et al., 2002; Losch and Wunsch, 2003; Mourre et al., 2004).

Hypothesis testing. The purpose of data assimilation is to systematically test or validate an ocean prediction system, which includes as subcomponents a model of hypothesized ocean dynamics, its error model, and an error model for the validation data. The thorough study of analysis increments, model inhomogeneities, data misfits, and their relations to the hypothesized dynamics and error models is emphasized. The end goal from this perspective is a definitive test of the ocean prediction system, and an analysis of the primary flaws in the dynamical model. Dee and da Silva (1999) and Bennett et al. (2006) are representative examples.

3 Mathematical Formulation

The different purposes of data assimilation involve the optimal utilization of information from different sources. Bayes Theorem is a concise foundation for these ideas, since it is concerned with the combination of information as expressed in probabilities. Optimization criteria and statistical estimators may be derived by considering the posterior probability of $x$, the state to be estimated, conditioned on $y$, the observations. A non-rigorous introduction is presented here; non-trivial details concerning the applicability of probability densities to function spaces are glossed over. Wahba (1990) contains an introduction to the central issues and is a good entry point to the specialized literature.

3.1 Bayes Theorem

In principle there is a probability density function (pdf), $P(\epsilon_f, \epsilon_i, \epsilon_b)$, which specifies completely the distribution of the errors in the model forcings $\epsilon_f$, the initial conditions $\epsilon_i$, and the boundary conditions $\epsilon_b$. Hence, there is a probability density for the oceanic state, $P(x)$, which may be computed from these inputs. Likewise, there is a probability density which describes the measurement errors, $P(\epsilon_m)$, which is usually expressed in terms of $P(y|x)$, the pdf of the observations conditioned on the oceanic state, $x$.

The joint probability of the state and the measurements, $P(x, y)$ (the probability of $x$ and $y$), and the conditional probability are related by the definition

$P(x, y) = P(x|y)\, P(y)$.  (1)

Bayes Theorem is derived by combining this relationship with its counterpart, $P(x, y) = P(y|x)\, P(x)$, and solving for the conditional probability,

$P(x|y) = P(y|x)\, P(x) / P(y)$.  (2)

Equation (2) is a simple prescription for combining information from both the dynamics and the data. Given estimates of the errors in the initial conditions, boundary forcing, or other model inhomogeneities, one can, in principle, find $P(x)$, the probability distribution of the oceanic state in the absence of measurements. Knowledge of the measurement system determines $P(y|x)$, the probability distribution of the observations, conditioned on the oceanic state. With these quantities in hand, it is simply a matter of computation to find the posterior probability distribution of the oceanic state conditioned on the observations, $P(x|y)$. The denominator, $P(y) = \int P(y|x) P(x)\, dx$, can be computed; however, since this pdf is independent of $x$, it merely serves to normalize $P(x|y)$.

It is a matter of choice to select the best estimator for $x$, the analysis $x_a$, among the following:

(1) the maximum likelihood estimate, $x_a = \arg\max_x P(x|y)$;
(2) the expected value, $x_a = \int x\, P(x|y)\, dx$; or
(3) the median, $\int_{-\infty}^{x_a} P(x|y)\, dx = 0.5$.

The distinctions between different data assimilation methods arise from the following considerations.

The definition of the oceanic state variables. Implicit in the above discussion is the assumption that the oceanic state consists of the full space-time fields of momentum, buoyancy, and pressure within a region of the ocean, over some time interval of consideration. The number of state variables may be considerably reduced in practice, depending on context, by using diagnostic relations amongst the variables. Dimensionality is important. Consider, for example, the fields in a regional ocean model defined on a spatial grid of $N_X = 200$ by $N_Y = 200$ horizontal grid points and $N_Z = 30$ vertical grid points, at $N_T = 1000$ time points. A sequential assimilation scheme might estimate initial conditions of sea-surface height at $N = N_X N_Y$ grid points, resulting in a cardinality of $N = 4 \times 10^4$ for the state variable $x$. Alternately, if $x$ is taken as the initial conditions for the $N = N_X N_Y N_Z \times 4$ values of the horizontal velocity, buoyancy, and pressure fields $(u, v, b, p)$, one has $N = 4.8 \times 10^6$ unknowns in the state vector. In some versions of so-called weak-constraint 4-D variational assimilation (W4D-Var), one seeks an optimal state estimate of the above fields at all $N_T$ time steps, which yields a cardinality of $N = 4.8 \times 10^9$ for the unknown state.

Complexity of the error models for the dynamics and observations. If the errors in the initial conditions, boundary conditions, etc., can be adequately approximated by multivariate Gaussian distributions, then the implementation of the above Bayesian analysis procedure is greatly simplified. In fact, these assumptions are practically universal in ocean data assimilation. Even where non-Gaussian empirical distributions are used to define the probability distributions, the optimal state estimate is usually selected by a minimum variance criterion.

Complexity of the model dynamics. Even if the errors are correctly described by Gaussian distributions, it may be that the model dynamics are sufficiently nonlinear to render the probability distribution of the model state $P(x)$ non-Gaussian. Differing treatments of the nonlinearity in the model dynamics yield both formal and practical differences between the various data assimilation algorithms.

3.2 Examples

To make these ideas concrete, we consider first a trivial example, namely, the estimation of a scalar by combining information from a climatology and a single observation. Then the more general multivariate estimation problem is written out in detail; it provides the foundation for the analysis of most ocean data assimilation methods, if not the actual solution procedure.

3.2.1 Estimation of a Scalar

Assume that one wishes to estimate a scalar, say, temperature, denoted $x$. A climatology has been constructed, from which can be approximated the probability distribution function

$P(x) = (2\pi\sigma_x^2)^{-1/2} \exp\left( -\frac{(x - x_b)^2}{2\sigma_x^2} \right)$,  (3)

where the background $x_b$ is the climatological mean. In other words, the null dynamics of a climatology is used for the background.

A thermometer provides an observation of temperature with finite accuracy. Hence, given temperature $x$, the probability distribution of the observations is assumed

to be a Gaussian also,

$P(y|x) = (2\pi\sigma_y^2)^{-1/2} \exp\left( -\frac{(y - x)^2}{2\sigma_y^2} \right)$.  (4)

In other words, the measurements are assumed to be unbiased, and the standard deviation of the measurement error is $\sigma_y$. It is left as an exercise to the reader to use the definition

$P(y) = \int P(y|x)\, P(x)\, dx$  (5)

to show that $P(y)$ is a Gaussian, with mean $x_b$ and variance $\sigma_x^2 + \sigma_y^2$.

Application of Bayes Theorem is straightforward, and one finds that $P(x|y)$ is Gaussian. The maximum likelihood, mean, and median estimators all coincide, yielding

$x_a = x_b + \sigma_x^2 (\sigma_x^2 + \sigma_y^2)^{-1} (y - x_b)$.  (6)

The variance of this estimate is

$\sigma_a^2 = (\sigma_x^{-2} + \sigma_y^{-2})^{-1}$.  (7)

In spite of its simplicity, this example shows some of the key features of advanced linear data assimilation methods. First, note that the optimal estimate in equation (6) is a linear combination of the background $x_b$ and the residual $y - x_b$, where the term

$\delta x_a = \sigma_x^2 (\sigma_x^2 + \sigma_y^2)^{-1} (y - x_b)$  (8)

is called the analysis increment. Note the limits $\sigma_x \to 0$ (perfect background) and $\sigma_y \to 0$ (perfect data), which yield $x_a = x_b$ and $x_a = y$, respectively. Furthermore, the estimated variance of the optimum (7) is less than the variance of either the background or the data, separately; combining information from the background and the observations has reduced the uncertainty in the unknown state, $x$.

A final point to note is the dependence of the optimal estimate on the error models for the background and data. In this case, the error model assumes Gaussian errors, which are uniquely defined by their means and variances. What if the errors are biased, with a nonzero mean? Then the analysis will also be biased. What if the variances are incorrect? Consider the sensitivities defined by the derivatives of the analysis with respect to the variances,

$\frac{\partial x_a}{\partial \sigma_x^2} = \frac{\sigma_y^2}{\sigma_x^2} \frac{\delta x_a}{\sigma_x^2 + \sigma_y^2}$  (9)

and

$\frac{\partial x_a}{\partial \sigma_y^2} = -\frac{\delta x_a}{\sigma_x^2 + \sigma_y^2}$.  (10)

Sensitivity to the mis-specification of a variance is greatest when the difference between the background and the data is large compared to the sum of the prior variances. Hence, a badly mis-specified error model can have a significant impact on the quality of analyses, particularly when the errors are under-estimated. This is the essence of the garbage in = garbage out phenomenon in data assimilation.
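The scalar update is simple enough to verify numerically. The following sketch, with illustrative numbers that are assumptions of this note rather than values from the text, evaluates equations (6)-(8) and confirms that the analysis variance (7) is smaller than either prior variance.

```python
import numpy as np

# Scalar Bayesian update of eqs. (6)-(8): a minimal sketch, assuming a Gaussian
# climatological background and a single unbiased Gaussian observation.
def scalar_analysis(x_b, var_b, y, var_y):
    """Combine background x_b (variance var_b) with observation y (variance var_y)."""
    gain = var_b / (var_b + var_y)               # scalar analogue of the gain K
    increment = gain * (y - x_b)                 # analysis increment, eq. (8)
    x_a = x_b + increment                        # analysis, eq. (6)
    var_a = 1.0 / (1.0 / var_b + 1.0 / var_y)    # analysis variance, eq. (7)
    return x_a, var_a

# Example: climatological mean 15 C (std 2 C); thermometer reads 18 C (std 1 C).
x_a, var_a = scalar_analysis(15.0, 2.0**2, 18.0, 1.0**2)
print(x_a, np.sqrt(var_a))  # the analysis is pulled toward the more accurate datum
```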

3.2.2 Estimation of a Vector

It is customary and instructive to generalize the above example to the estimation of a vector, assuming all errors are still Gaussian. This is Gauss-Markov smoothing, which forms the basis for so many other estimation algorithms.

Assume one wishes to estimate a vector $x \in R^N$, given a background $x_b$ and a vector of observations $y \in R^M$. For concreteness, assume that $x$ represents the complete set of initial conditions for a linear prognostic ocean model at $t = t_0$, and assume the dynamical model may be integrated to yield predictions of the oceanic state at a later time; thus,

$x(t) = M(t, t_0)[x]$,  (11)

where $M(t_2, t_1)[x(t_1)]$ represents the time integration of the dynamical equations that maps the solution $x(t_1)$ to $x(t_2)$, and $x = x(t_0)$ is implicit. Furthermore, assume that each element of the observation vector $y = \{y_i\}_{i=1}^{M}$ may be represented as the action of a linear operator on $x(t_i)$,

$y_i = \hat{h}_i\, x(t_i)$,  (12)

where $\hat{h}_i \in R^{1 \times N}$. Using the evolution operator $M$, each measurement can be related to the initial state by the dynamical model,

$y_i = \hat{h}_i\, M(t_i, t_0)[x(t_0)]$,  (13)

which allows one to define the measurement operators in terms of their action on the vector $x$ to be estimated,

$y_i = h_i\, x$,  (14)

where $h_i = \hat{h}_i\, M(t_i, t_0)$. Next, define the matrix $H \in R^{M \times N}$ by collecting the measurement operators together so that $y = Hx$. With these definitions, one has set up the so-called 4D-Var problem, assuming both the dynamics and the measurement operators are linear. The notation follows Ide et al. (1997).
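To make equations (11)-(14) concrete, the sketch below builds the composite matrix $H$ for a hypothetical linear model whose evolution operator is a matrix power, with each $\hat{h}_i$ sampling one grid point at one time; the model, grid size, and observation locations are all illustrative assumptions, not taken from the text.

```python
import numpy as np

# A minimal sketch of eqs. (11)-(14): for a linear model x(t_{k+1}) = M x(t_k),
# a point observation at grid j and step k gives the row h = e_j^T M^k acting
# on the initial condition x(t_0).  All numbers are illustrative.
N = 50                                       # state dimension
M_step = np.eye(N, k=-1) * 0.98              # hypothetical one-step evolution matrix
M_step[0, -1] = 0.98                         # periodic advection with slight damping

def obs_row(j, k):
    """Row h_i = h_hat_i M(t_i, t_0) for a point measurement at grid j, step k."""
    h_hat = np.zeros(N)
    h_hat[j] = 1.0                           # h_hat_i samples a single grid point
    return h_hat @ np.linalg.matrix_power(M_step, k)

# Collect the measurements into H so that y = H x(t_0).
obs = [(10, 5), (25, 5), (40, 10)]           # (grid point, time step) pairs
H = np.vstack([obs_row(j, k) for j, k in obs])
print(H.shape)                               # (3, 50)
```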

To apply Bayes Theorem, it is necessary to state the probability densities of $x - x_b$ and of the measurement error $\epsilon = y - Hx$. These shall both be assumed to be multivariate Gaussian with zero means; the covariance of $x$ is denoted $B \in R^{N \times N}$, i.e.,

$\langle (x - x_b)(x - x_b)^T \rangle = B$,  (15)

and the covariance of $\epsilon$ is denoted $R \in R^{M \times M}$. Applying equation (2), one finds that $P(x|y)$ is proportional to $\exp(-\frac{1}{2} J(x))$, where

$J(x) = (x - x_b)^T B^{-1} (x - x_b) + (y - Hx)^T R^{-1} (y - Hx)$,  (16)

which is the objective function that forms the basis for so-called variational data assimilation. With a bit of linear algebra, one finds the value $x = x_a$ which minimizes $J$ is

$x_a = x_b + K (y - H x_b)$,  (17)

where the analysis increment is $\delta x_a = K (y - H x_b)$, and $K$ takes the form

$K = B H^T (H B H^T + R)^{-1}$.  (18)

The full expression for the analysis error, the error covariance of $x_a$, is denoted $P_a \in R^{N \times N}$,

$P_a = (B^{-1} + H^T R^{-1} H)^{-1}$.  (19)

Remarks:

(1) Equation (17) can be derived as the Best Linear Unbiased Estimator, which minimizes the expected error $\langle (e_k^T (x_a - x))^2 \rangle$, where $e_k \in R^N$ is the basis vector pointing in direction $k$. Similarly, the estimator in (17) minimizes the expected mean square error $\mathrm{Tr}\,\langle (x_a - x)(x_a - x)^T \rangle / N$.

(2) $J$ as written in (16) is also called the penalty function or cost function. The analysis field $x_a$ is its minimizer.

(3) $\frac{1}{2} J$ is the negative of the log-likelihood function, when the errors are modeled as Gaussian.

(4) Because $H$ is linear, $J$ is convex and possesses a unique minimum. When $H$ represents a nonlinear operator, there may be multiple minima. Note that in our notation, $H$ contains both the measurement operators per se, $\hat{h}_i$, and the model dynamics $M$.

(5) Additional constraints may be added to the objective function, say, to suppress certain dynamics. These can complicate the solution procedure considerably and may obscure the failure of $B$ or $R$ to properly account for the covariance structure of the errors.

(6) The conditioning of the objective function refers to the ratio of the largest to smallest eigenvalues of the Hessian matrix of second derivatives of $J$. The eigenvalue spectrum of the Hessian can be interpreted as the curvature along the principal axes of the iso-surfaces of $J$.

(7) When the assumptions regarding the Gaussian errors are correct, the Hessian matrix $\mathcal{H} = \partial^2 J / \partial x^2$ and the analysis error covariance $P_a$ are related by $P_a = (\frac{1}{2} \mathcal{H})^{-1}$.

(8) The above formalism can be applied to continuous fields, rather than vectors in $R^N$. In that case $x$ is generally a vector function and $M$ represents an integro-differential operator. $J$ is then a penalty functional, and the stationarity condition for the minimum, $\nabla J = 0$, must be derived using the calculus of variations; the result is the Euler-Lagrange equation.

(9) When continuous fields are used, there are close ties to the theory of smoothing splines, with $B^{-1}$ being a positive-definite symmetric differential operator.

(10) The interpretation in terms of continuous fields is essential to properly understanding the conditioning of the objective function as model resolution is increased toward the continuum limit. Likewise, the language of linear algebra is not adequate for analyzing the spatial regularity (differentiability) of the analysis increments.
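Returning to the finite-dimensional case, equations (17)-(19) can be evaluated directly whenever $B$ and $R$ are small enough to form explicitly. The sketch below does so for synthetic covariances (an exponential correlation model is assumed here for $B$) and verifies that assimilation reduces the total variance.

```python
import numpy as np

# Direct evaluation of eqs. (17)-(19) for small N and M: a sketch with
# synthetic matrices, feasible only when B and R can be formed explicitly.
rng = np.random.default_rng(0)
N, M = 50, 3
H = rng.standard_normal((M, N))                  # stand-in observation matrix
i = np.arange(N)
B = np.exp(-np.abs(i[:, None] - i[None, :]) / 5.0)  # assumed exponential correlation
R = 0.1 * np.eye(M)                              # uncorrelated observation errors

x_b = np.zeros(N)                                # background
y = rng.standard_normal(M)                       # synthetic observations

K = B @ H.T @ np.linalg.inv(H @ B @ H.T + R)     # gain, eq. (18)
x_a = x_b + K @ (y - H @ x_b)                    # analysis, eq. (17)
P_a = np.linalg.inv(np.linalg.inv(B) + H.T @ np.linalg.inv(R) @ H)  # eq. (19)
print(np.trace(P_a) < np.trace(B))               # True: uncertainty is reduced
```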

Several important generalizations of the above are considered next, with particular attention paid to the forecast cycle, which leads to an evolution equation for $P$ and the Kalman Filter, as well as a consideration of nonlinearity in both the ocean dynamics and the measurement operators.

3.3 Sequential Filtering Algorithms

Consider now the problem of sequential estimation, where one assumes that initial conditions $x(t_i)$ at $t_i$ propagate forward to time $t_{i+1}$ according to $x(t_{i+1}) = M(t_{i+1}, t_i)[x(t_i)] + \eta_i$, where $\eta_i$ is random model noise with zero mean and known covariance. One also has a vector of observations $y_i$ collected in the interval $[t_i, t_{i+1}]$. Assume one intends to cycle the assimilation, beginning with a previous analysis at $t_i$, leading to a background forecast at $t_{i+1}$, and ending with an analysis at $t_{i+1}$. As before, the analysis optimally incorporates the forecast and the observations; but rather than estimating initial conditions to improve future forecasts, one adjusts the forecast to produce an analysis. The additional complication in this case is that the analysis covariance from step $i$ becomes the forecast covariance at step $i+1$; hence, the covariance is evolved together with the state itself. The notation $x^f_i$ denotes the background forecast and $x^a_i$ the analysis, both at $t_i$. For consistency with notation in the literature, $P^f_i$ is used for the forecast covariance, and $P^a_i$ is used for the analysis covariance at time $t_i$.

Because $\eta_i$ is unknown, the forecast is computed from the previous analysis with

$x^f_{i+1} = M(t_{i+1}, t_i)\, x^a_i$.  (20)

Assuming $\langle \eta_i \eta_i^T \rangle = Q_i$ is the model noise covariance, the forecast error covariance evolves according to

$P^f_{i+1} = M(t_{i+1}, t_i)\, P^a_i\, M(t_{i+1}, t_i)^T + Q_i$.  (21)

With these two pieces of information, one can find the analysis at $t_{i+1}$ using the previously derived results from Bayes Theorem,

$x^a_{i+1} = x^f_{i+1} + K_i (y_i - H_i x^f_{i+1})$,  (22)

where the Kalman gain matrix $K_i$ is

$K_i = P^f_{i+1} H_i^T (H_i P^f_{i+1} H_i^T + R_i)^{-1}$.  (23)

The analysis error covariance,

$P^a_{i+1} = ((P^f_{i+1})^{-1} + H_i^T R_i^{-1} H_i)^{-1}$,  (24)

is customarily written in terms of the Kalman gain and the forecast covariance by using the Sherman-Morrison-Woodbury formula (Golub and Van Loan, 1989),

$P^a_{i+1} = P^f_{i+1} - P^f_{i+1} H_i^T (R_i + H_i P^f_{i+1} H_i^T)^{-1} H_i P^f_{i+1}$  (25)
$= (I - P^f_{i+1} H_i^T (R_i + H_i P^f_{i+1} H_i^T)^{-1} H_i)\, P^f_{i+1}$  (26)
$= (I - K_i H_i)\, P^f_{i+1}$.  (27)

Remarks:

(1) Equations (24) and (27) are equivalent, but note that (27) only requires the inversion of an $M \times M$ matrix (in the definition of $K_i$), while equation (24) appears to require the inversion of an $N \times N$ matrix. This reduction of apparent rank is a consequence of the fact that at most $M$ degrees of freedom are actually constrained by the data.

(2) Implicit in the above notation is the linearity of the model evolution operator, $M(t_{i+1}, t_i)$. When this operator is linear, the above algorithm constitutes the Kalman Filter. When $M(t_{i+1}, t_i)$ is nonlinear, a linear approximation must be used in the forecast covariance evolution equation (21), and the above algorithm is called the Extended Kalman Filter.

(3) Recall that the analysis fields are a function of the model dynamics, data values, observation operators, model error covariance, and observation error covariance. If the model error covariance (the system noise) is underestimated, the filter equations can lock on to an overly optimistic estimate of the forecast error covariance. Once this occurs, further data are discounted and do little to improve the analysis. It is essential to monitor the performance of data assimilation algorithms and verify that the analysis increments and innovation vectors are within nominal ranges.

(4) In the Extended Kalman Filter, the evolution equations for the forecast error covariance matrix can be unstable, particularly when the time interval $[t_i, t_{i+1}]$ is long compared to the time scales of nonlinear instabilities in the dynamic model. In this case, care must be exercised in choosing the appropriate linear approximation of the model dynamics. Alternately, the data must be sufficiently dense in space and time to permit frequent analysis updates, keeping the forecast covariance realistic.

(5) Many different computational approaches have been proposed to enable the implementation of the Extended Kalman Filter in oceanography. These generally amount to making some approximations in the filter evolution equations (e.g., steady state), reducing the rank of the forecast or analysis covariance matrices (by orthogonal decomposition, e.g., error subspace statistical estimation), reducing the information content of the covariance matrices (by basing them on ensembles of model states, as in the Ensemble Kalman Filter), or reducing the dimensionality of the state to be estimated (reduced-order models).
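One forecast/analysis cycle of equations (20)-(27) is compactly expressed in code. The sketch below, with synthetic stand-ins for the dynamics, covariances, and observations, cycles a small linear Kalman Filter until its analysis covariance approaches a steady state.

```python
import numpy as np

# One forecast/analysis cycle of the linear Kalman Filter, eqs. (20)-(27).
# A minimal sketch with synthetic matrices; all numbers are illustrative.
def kf_cycle(x_a, P_a, M, Q, H, R, y):
    """Propagate the analysis (x_a, P_a) through one cycle and assimilate y."""
    x_f = M @ x_a                                  # forecast, eq. (20)
    P_f = M @ P_a @ M.T + Q                        # forecast covariance, eq. (21)
    S = H @ P_f @ H.T + R                          # innovation covariance (M x M)
    K = P_f @ H.T @ np.linalg.inv(S)               # Kalman gain, eq. (23)
    x_a_new = x_f + K @ (y - H @ x_f)              # analysis, eq. (22)
    P_a_new = (np.eye(len(x_a)) - K @ H) @ P_f     # analysis covariance, eq. (27)
    return x_a_new, P_a_new

rng = np.random.default_rng(1)
N, M_obs = 20, 4
M_dyn = 0.95 * np.eye(N)                           # stable stand-in dynamics
Q, R = 0.01 * np.eye(N), 0.1 * np.eye(M_obs)
H = rng.standard_normal((M_obs, N))
x_a, P_a = np.zeros(N), np.eye(N)
for _ in range(10):                                # cycle the filter
    y = rng.standard_normal(M_obs)                 # synthetic observations
    x_a, P_a = kf_cycle(x_a, P_a, M_dyn, Q, H, R, y)
print(np.trace(P_a))                               # settles toward a steady state
```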

3.4 Generalized Inversion

The progression from Sections 3.2.1 and 3.2.2 to Section 3.3 is intended to illustrate commonalities in the approaches to data assimilation. However, the choice of state space, the sources of model noise, and the modeling of errors are determined by the particulars of the application and context, and it is important to have a general formalism for developing data assimilation applications.

Bennett (1992) introduced the notion of the generalized inverse of a dynamical model. The ordinary inverse corresponds to the usual notion of non-data-assimilative ocean modeling: starting with a well-posed system of differential equations, the ordinary inversion of these dynamics is effected by some means, thus resulting in the unique solution of the equations. The notion of generalized inversion is to incorporate observations, which formally over-determine the dynamical equations. Since the full system of equations is now ill-posed, additional assumptions regarding the errors must be provided to find the solution which is most compatible with the model and the observations.

This approach works systematically from a consideration of the sources of error in the dynamics and a definition of the observing system, and proceeds to derive the conditions for the optimal solution, the analysis. Once these have been derived, one can consider approximations and simplifications for implementing solution algorithms.

A WORKED EXAMPLE. HOW COMPLEX? TBD.

4 Summary of Part I: Components of Data Assimilation Systems

The general approach for developing data assimilation systems is outlined above. There are many, many details specific to particular applications and solution algorithms, which will be introduced in subsequent chapters by Brasseur (Kalman Filters) and Moore (Variational Assimilation). The following list defines the elements common to all approaches.

- There is a definition of the system state which is to be estimated.
- There is a dynamical model which provides the background estimate of the system state.
- There is a definition of the control variables, the sources of system noise, which are also to be estimated.
- There is an error model for the system noise.
- There is a definition of the observing system which describes how measured values are related to the system state.
- There is an error model for the observing system.
- There is an optimality criterion which incorporates the above components.
- There is a solution algorithm which computes the analysis state and other quantities of interest, such as the estimated analysis error.

5 Analysis of Data Assimilation Systems

5.1 Solution Algorithms

As mentioned previously, solution methods are determined by the following considerations.

(1) The cardinality of the state space. Typically $N$ is so large that the $N \times N$ matrices written in the previous section cannot be explicitly constructed. Instead, one expresses the computations in terms of matrix-vector products, so the large matrices are never constructed.

(2) The dimension of the observation vector. For linear models and observing systems it can be shown that the $M$ observations constrain only the $M$-dimensional observable subspace of the $N$-dimensional state space. Hence, computational efficiency may be optimized by restricting operations to the $M$-dimensional observation space.

(3) The effective rank of the background covariance. In practice the dimension $M$ is too large to carry out the above algorithms as written. Instead, the effective rank or the number of degrees of freedom is truncated in some way, leading to sub-optimal approximations to the optimality criterion.

The following survey of solution algorithms highlights these considerations in the development of practicable data assimilation algorithms.

5.1.1 Variational Data Assimilation

Variational data assimilation algorithms are so called because they are generally derived from a stated objective function, $J(x)$, and the calculus of variations is used to derive the first-order optimality condition, $\nabla J|_{x = x_a} = 0$. Iterative solvers, such as conjugate-gradient methods or Newton's method, may be used to solve the optimality condition,

$B^{-1} (x_a - x_b) - H^T R^{-1} (y - H x_a) = 0$,  (28)

but note that the size of $B \in R^{N \times N}$ makes the computation of $B^{-1}$ impossible except in very special cases. Assuming it is possible to compute the matrix-vector product $Bx$ without explicitly constructing $B$, one can rewrite the optimality condition as

$(I + B H^T R^{-1} H)\, x_a = x_b + B H^T R^{-1} y$,  (29)

where $I$ is the $N \times N$ identity matrix. Equation (29) is sometimes referred to as the primal formulation of the variational data assimilation problem, in contrast to the dual formulation, derived below. Note that the transpose of the evolution operator, $M^T$, the so-called adjoint model, is implicit in the definition of $H^T$. This form of variational assimilation essentially constitutes the 4D-Var algorithm used at ECMWF and NCEP. Note that preconditioners are frequently used to accelerate the convergence of the iterative solver. Depending on model complexity, iterative solvers are truncated after some small predetermined number of steps $P < M$, or when the elements of the innovation vector are comparable to the measurement error.
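A matrix-free solution of the primal system (29) can be sketched as follows. Since the operator in (29) is not symmetric, GMRES is used here in place of the preconditioned conjugate-gradient iterations (in transformed variables) of operational systems; the matrices are small synthetic stand-ins, and only their action on vectors is exposed to the solver.

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, gmres

# Matrix-free solve of the primal system (29): a sketch in which B and H enter
# only through their action on vectors, as in realistic applications.
rng = np.random.default_rng(2)
N, M = 200, 15
H = rng.standard_normal((M, N))
i = np.arange(N)
B = np.exp(-np.abs(i[:, None] - i[None, :]) / 10.0)  # assumed background covariance
Rinv = np.eye(M) / 0.1                               # diagonal R, easily inverted

def apply_A(v):
    """Action of (I + B H^T R^{-1} H) on a vector; the N x N matrix is never formed."""
    return v + B @ (H.T @ (Rinv @ (H @ v)))

A = LinearOperator((N, N), matvec=apply_A)
x_b = np.zeros(N)
y = rng.standard_normal(M)
rhs = x_b + B @ (H.T @ (Rinv @ y))                   # right-hand side of (29)
x_a, info = gmres(A, rhs)                            # Krylov iteration (GMRES,
print(info, np.linalg.norm(y - H @ x_a))             # since (29) is non-symmetric)
```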

5.1.2 Incremental 4D-Var

The incremental formulation of 4D-Var writes the above optimality condition in terms of $\delta x_a = x_a - x_g$, where $x_g$ is a first-guess field, which may or may not coincide with the background. Equation (29) becomes

$(I + B H^T R^{-1} H)\, \delta x_a = x_b - x_g + B H^T R^{-1} (y - H x_g)$.  (30)

Treatment of the nonlinearity generally involves linearization around $x_g$, $x_b$, or their linear combination. In addition to the iterative solution of the linear system (30), an outer level of iteration around a sequence of first-guess states may be necessary in strongly nonlinear problems.

The incremental formulation also exposes the idea that one can use different models to compute $x_g$ and $\delta x_a$. For example, if the model is too computationally intensive to embed in the iterative solver, it is possible to use reduced physics or reduced resolution for $M$ inside the $H$ operator on the left-hand side, while still using the complete model to compute $H x_g$ on the right-hand side.

5.1.3 The Dual Formulation

Notice that the left-hand sides of equations (28), (29), and (30) all involve $N \times N$ matrices. The linear algebra can be considerably simplified by noting that $H^T R^{-1} H$ is of rank $M$. Application of the Sherman-Morrison-Woodbury formula leads to the following equivalent form of the optimality condition (28),

$x_a = x_b + B H^T w$,  (31)
$(H B H^T + R)\, w = y - H x_b$,  (32)

where $H B H^T + R$ is an $M \times M$ matrix.

This formulation has deeper roots than it might appear. When $x_a$ and $x_b$ are taken as functions, and the evolution operator $M$ is an integro-differential operator, Bennett (1992) shows how (31)-(32) may be derived from the Euler-Lagrange equations for the extremum of $J$. The analysis increment is a linear combination of the $M$ columns of $B H^T$, which may be used to diagnose unambiguously the features of the analysis corresponding to particular observations. The matrix $H B H^T$ is the expected covariance of observations of the forecast, neglecting measurement noise; its analysis can provide a wealth of information concerning the design of the observing system.

Solving for the vector $w \in R^M$ may be done by direct matrix inversion if the construction of the $M \times M$ matrix on the left-hand side of (32) is feasible; otherwise, iterative solvers may be applied. When (32) is multiplied by $R^{-1}$, the conditioning of (29) and (32) are formally identical, and an iterative solver for the latter is known as the Physical-space Statistical Analysis System (PSAS) (Cohn et al., 1998). As with the primal formulation, preconditioning the linear system (32) is often an essential part of realistic applications. Treatment of nonlinearity is also an important issue, which may be handled via an incremental approach. An iterative solver built on this idea is the core functionality of the Inverse Ocean Model (IOM), a model- and platform-independent data assimilation toolkit (Bennett et al., 2008; Muccino et al., 2008).
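The dual calculation is correspondingly brief when the $M \times M$ matrix can be formed. The following sketch, with the same kind of synthetic stand-ins as above, solves (32) directly and assembles the analysis (31) from the columns of $B H^T$ (the representers, in Bennett's terminology).

```python
import numpy as np

# The dual (PSAS-like) calculation of eqs. (31)-(32): a sketch with synthetic
# matrices.  Only the M x M system (H B H^T + R) w = y - H x_b is solved; the
# analysis is then a combination of the M columns of B H^T.
rng = np.random.default_rng(3)
N, M = 200, 15
H = rng.standard_normal((M, N))
i = np.arange(N)
B = np.exp(-np.abs(i[:, None] - i[None, :]) / 10.0)
R = 0.1 * np.eye(M)

x_b = np.zeros(N)
y = rng.standard_normal(M)

rep = B @ H.T                                  # representer matrix, N x M
D = H @ rep + R                                # M x M stabilized representer matrix
w = np.linalg.solve(D, y - H @ x_b)            # eq. (32)
x_a = x_b + rep @ w                            # eq. (31)
print(np.linalg.norm(y - H @ x_a))             # residual reflects R; it is not zero
```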

5.1.4 Kalman Filter

Sequential data assimilation problems are typically cast as the Kalman Filter equations already derived. Practical implementation generally requires some simplification of the forecast or analysis covariance evolution equations, via reduction of the effective or actual rank of these matrices, or of the dimension of the state space. Alternate formulations are also possible in terms of inverse covariances (information matrices) or square roots of the covariance (square-root filters). There is a large literature. Three approaches are outlined in the next sections.

5.1.5 Model Reduction

Among the vast array of techniques for model reduction, the most rudimentary involve projecting the dynamics onto a small number of degrees of freedom by spectral truncation, grid coarsening, or other projections. Because the solutions of interest depend not only on the model dynamics, but also on the degrees of freedom in the unknown model system noise, other approaches to reducing the degrees of freedom involve analysis and reduction via empirical orthogonal functions (EOFs) or similar, to define the reduced state space of the stochastically driven model. Further control of the reduced dimensions can be had by analyzing modeled states but weighting their importance via metrics related to specific phenomena, observing systems, or error metrics.
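A minimal version of the rudimentary EOF truncation is sketched below; the archive of model states is synthetic, and the retained dimension $P$ is an arbitrary illustrative choice.

```python
import numpy as np

# Rudimentary model reduction by EOF truncation: a sketch.  A set of model
# states (columns of X) defines the EOFs; the leading P of them span a reduced
# state space onto which dynamics and covariances may be projected.
rng = np.random.default_rng(4)
N, n_states, P = 500, 80, 10
X = rng.standard_normal((N, n_states))            # synthetic archive of model states

Xm = X - X.mean(axis=1, keepdims=True)            # remove the mean state
U, s, _ = np.linalg.svd(Xm, full_matrices=False)  # EOFs = left singular vectors
E = U[:, :P]                                      # leading-P EOF basis, N x P

x = rng.standard_normal(N)                        # a full state vector
a = E.T @ x                                       # reduced coordinates (P values)
x_red = E @ a                                     # projection back to state space
print(np.linalg.norm(x - x_red) / np.linalg.norm(x))  # relative truncation error
```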

5.1.6 Error Subspace Statistical Estimation

Reflecting on the fact that knowledge of the statistical properties of the background and the model forcing errors is generally poor, Lermusiaux suggested Error Subspace Statistical Estimation (ESSE), which uses reduced-rank representations of the forecast and analysis error covariances. Rather than attempting to manipulate $N \times N$ covariance matrices, in ESSE the covariances are approximated by rank-$P$ objects, constructed to approximate the $P$ most significant modes of uncertainty. This reduced-rank approach to covariance modeling leads to adjoint-free versions of generalized inversion. Given a rank-$P$ decomposition $B = U \Lambda U^T$, where $U \in R^{N \times P}$ is orthogonal and $\Lambda \in R^{P \times P}$ is the diagonal matrix of singular values of $B$, one can explicitly compute $HU$, from which one may find $B H^T = U \Lambda (HU)^T$ as needed. The Sherman-Morrison-Woodbury formula then provides a means to solve (31) and (32) requiring only rank-$P$ matrix inversions.

5.1.7 Ensemble Methods

The principle of most ensemble methods is to use a set of sample realizations of forecasts to estimate the forecast covariance directly. The Kalman gain matrix can be estimated from the same ensemble, thus permitting one to compute an ensemble of analyses. From these, the analysis covariance can be estimated, and the process continued. The appeal of this approach is that it may, in principle, be applied directly to linear or nonlinear models. Furthermore, even if the statistics of the system noise are not Gaussian, the analysis field approximately satisfies a minimum variance criterion, to within the limits of accuracy of the sample statistics.

Two difficulties arise in practice. First, a large number of ensemble members is necessary to accurately estimate the off-diagonal elements of the forecast covariance matrix. The sample variance of a Gaussian random variable $x$ converges like $\sqrt{2}\,\sigma_x^2 / \sqrt{E}$, where the true variance is $\sigma_x^2$ and the sample size is $E$. However, the sample covariance of two correlated random variables $x$ and $y$ converges like $(\sigma_{xy}^2 + \sigma_x^2 \sigma_y^2)^{1/2} / \sqrt{E}$, where $\sigma_{xy}$ is the covariance. When the correlation between the variables is small, the sample covariance is dominated by sampling error. For this reason, the sample covariance must be localized or tapered to reduce distant correlations. This operation increases the effective rank of the covariance, but it must be done with careful consideration of the dynamical correlations one wishes to preserve.

The other principal difficulty is that the members of the forecast ensemble are not independent after the Kalman Filter has been running. This can contribute to a loss of variance and filter lock-on. Hence, various strategies for covariance inflation and filter re-initialization have been developed.

The Ensemble Kalman Filter is but one member of a general class of so-called particle filters which have received considerable attention in recent years. The field is developing rapidly to produce strategies for propagating and updating ensembles to accurately represent non-Gaussian probability densities.
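The localization operation is an elementwise (Schur) product of the sample covariance with a tapering function, which preserves positive semi-definiteness. The sketch below uses a simple Gaussian taper as a stand-in for the compactly supported taper functions used in practice; all sizes and scales are illustrative assumptions.

```python
import numpy as np

# Sample forecast covariance from a small ensemble, with distance-based
# tapering (localization): a sketch with synthetic correlated states.
rng = np.random.default_rng(5)
N, E = 100, 20                                    # state size, ensemble size
dist = np.abs(np.subtract.outer(np.arange(N), np.arange(N)))
truth_corr = np.exp(-dist / 10.0)                 # true correlation structure
L = np.linalg.cholesky(truth_corr + 1e-8 * np.eye(N))
ens = L @ rng.standard_normal((N, E))             # E correlated sample states

dev = ens - ens.mean(axis=1, keepdims=True)
P_f = dev @ dev.T / (E - 1)                       # raw sample covariance

taper = np.exp(-(dist / 15.0) ** 2)               # localization function
P_loc = taper * P_f                               # Schur (elementwise) product
print(abs(P_f[0, 60]), abs(P_loc[0, 60]))         # spurious distant correlation damped
```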

5.2 Covariance Modeling

Setting aside the solution algorithms for data assimilation with realistic ocean dynamics and observing systems, which are generally technological issues from the perspective of oceanography, there are important scientific questions related to defining correct error models for the dynamics and observing systems.

As discussed here, there are three components of the error model which need to be specified a priori. The background error, denoted $B$ or $P^f$, is the spatial covariance of errors in the field to be analyzed. The system noise, denoted $Q$, is the covariance of the unknown model forcing errors, an object which describes a space-time correlation structure. Lastly, there is the observation error covariance, $R$, which should be determinable from the measurement devices, independent of the dynamical model.

5.2.1 Estimating the background error

For optimal interpolation, 4D-Var, or sequential Kalman Filters, it is necessary to estimate the error in the background solution. In principle this can be estimated from a large ensemble of previous analyses. Another approach relies on making two predictions with different lead times, say 12 hr and 24 hr, and regarding the difference as an estimate of the forecast error. If an ensemble filter is used, it may be sufficient to retain an ensemble of forecast error fields, and use these to synthesize the sample covariance as needed. Otherwise, the structure of the background error is usually parameterized in terms of an amplitude (variance) and a set of correlation lengths, aligned with some orthogonal basis; a simple parameterization of this kind is sketched below. Implementations are described in Bennett (1992); Weaver and Courtier (2001); Purser et al. (2003a,b); Zaron (2006a).

5.2.2 Estimating the system noise

System noise may arise from incorrect forcing functions, for example, coarsely gridded wind stress or open-boundary conditions derived from climatology, or it may arise from dynamical approximations or truncation errors in solving the dynamical equations. In the former case, the errors can generally be characterized by consideration of the data source. In the latter case, it can be difficult to quantify the errors, as they are likely to be state- and resolution-dependent.
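A one-dimensional version of the variance-and-correlation-length parameterization of Section 5.2.1 is sketched below; the functional form (an exponential correlation) and all numerical values are illustrative assumptions.

```python
import numpy as np

# Parameterized background-error covariance: a sketch in which B is defined by
# a variance amplitude and a single correlation length on a 1-D grid.
def background_covariance(coords, variance, corr_length):
    """B_ij = variance * exp(-|r_i - r_j| / corr_length)."""
    dist = np.abs(np.subtract.outer(coords, coords))
    return variance * np.exp(-dist / corr_length)

coords = np.linspace(0.0, 500.0, 101)            # grid coordinates (km, assumed)
B = background_covariance(coords, variance=0.25, corr_length=50.0)

# Draw a random background-error field consistent with B, via Cholesky.
L = np.linalg.cholesky(B + 1e-10 * np.eye(len(coords)))
sample = L @ np.random.default_rng(6).standard_normal(len(coords))
print(sample.std())                              # approximately sqrt(0.25) = 0.5
```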

5.2.3 Validation of Error Models

Having used the above-described techniques to parameterize the errors, it is essential to have a methodology to validate, a posteriori, the hypothesized error models. When the hypothesized dynamics and error models are correct, the minimum value of the objective function, $J(x_a)$, is a chi-squared variable with $M$ degrees of freedom (Bennett, 1992). This criterion can be used to accept or reject, in total, the hypothesized dynamics, observations, and their error models.

A finer-grained approach can be used to analyze components of the objective function, comparing model and observations, or observation sub-types, separately. For example, let $J = J_B + J_R$ represent the two parts of the objective function from the background and the observations,

$J_B(x) = (x - x_b)^T B^{-1} (x - x_b)$,  (33)
$J_R(x) = (y - Hx)^T R^{-1} (y - Hx)$.  (34)

It may be shown that the expected values of these terms are

$\langle J_B(x_a) \rangle = \mathrm{Tr}(H B H^T D^{-1})$  (35)

and

$\langle J_R(x_a) \rangle = \mathrm{Tr}(R D^{-1})$,  (36)

where $D = H B H^T + R$ is the matrix appearing on the left-hand side of equation (32). See Talagrand (1999), Desroziers and Ivanov (2001), and Bennett (2002) for derivations and applications.

A related class of techniques for calibrating error models is based on the generalized cross-validation statistic, an estimate of the prediction error of the analysis at the observation sites. Using the notation developed above, the generalized cross-validation statistic is given by

$GCV(B, R; y, x_b) = \frac{(y - H x_a)^T (y - H x_a)}{M (1 - \mu/M)^2}$,  (37)

where $\mu = \mathrm{Tr}(R D^{-1})$. Optimizing this statistic amounts to selecting the error model which approximately maximizes the accuracy of prediction at each data site, using the data at all other sites. It is a useful countermeasure to avoid over-fitting the data, which occurs when one simply minimizes the mean-square innovation vector. Applications to data assimilation may be found in Wahba et al. (1995) and Zaron (2006b).

The key benefit of these metrics is that they can be computed from a small sample of data assimilative forecast/analysis cycles, and the error models can be re-tuned to yield improved results. Note that direct construction and inversion of $M \times M$ matrices is generally not required, as any of the solution algorithms in Section 5.1 may be combined with a randomized trace estimator (Girard, 1989; Hutchinson, 1989) to evaluate the matrix traces as needed.
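The randomized trace estimator is easily sketched. The code below estimates $\mu = \mathrm{Tr}(R D^{-1})$ in the manner of Hutchinson (1989), using only solves against $D$, and compares the estimate with the exact trace for a small synthetic system.

```python
import numpy as np

# Hutchinson randomized trace estimation of mu = Tr(R D^{-1}) from eq. (37):
# a sketch.  Only products D^{-1} z (i.e., solves D w = z) are required, so the
# estimate is compatible with the matrix-free solvers of Section 5.1.
rng = np.random.default_rng(7)
M = 40
A_ = rng.standard_normal((M, M))
HBHT = A_ @ A_.T                                 # synthetic stand-in for H B H^T
R = 0.1 * np.eye(M)
D = HBHT + R

def hutchinson_trace(apply_mat, n, n_samples=200):
    """Estimate Tr(F) from the average of z^T F z over random +/-1 probes z."""
    total = 0.0
    for _ in range(n_samples):
        z = rng.choice([-1.0, 1.0], size=n)      # Rademacher probe vector
        total += z @ apply_mat(z)
    return total / n_samples

apply_RDinv = lambda z: R @ np.linalg.solve(D, z)   # R D^{-1} z via one solve
mu_est = hutchinson_trace(apply_RDinv, M)
print(mu_est, np.trace(R @ np.linalg.inv(D)))       # estimate vs. exact value
```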

5.3 Conditioning and Stability

There is a deep analogy between the trivial univariate data assimilation of Section 3.2.1 and the multivariate dual formulation of data assimilation of Section 5.1.3. Assume, for the sake of example, that the observation errors are uncorrelated, with each observation having the same uncertainty. In other words, assume $R = \sigma_y^2 I$, where $\sigma_y$ is the nominal measurement error and $I$ is the $M \times M$ identity matrix. When this is not the case, the observations can be non-dimensionalized and the errors decorrelated (i.e., "pre-whitened") by multiplication of the data vector by $R^{-1/2}$, the inverse Cholesky factor of $R$.

With this diagonal structure for the observation errors, it is possible to write out the solution of (32) in terms of an orthogonal decomposition (the singular value decomposition; Golub and Van Loan, 1989) of $H B H^T = U \Lambda U^T$,

$w = (U \Lambda U^T + \sigma_y^2 I)^{-1} (y - H x_b)$  (38)
$= U (\Lambda + \sigma_y^2 I)^{-1} U^T (y - H x_b)$  (39)
$= \sum_{i=1}^{M} u_i \frac{1}{\lambda_i + \sigma_y^2} u_i^T (y - H x_b)$,  (40)

where $U = \{u_i\}$ is an $M \times M$ orthonormal matrix, and $\Lambda$ is the diagonal matrix of singular values $\{\lambda_i\}$ arranged along the diagonal. Applying the observation operator and projecting onto the $i$-th orthogonal mode, one finds

$u_i^T H x_a = u_i^T H x_b + \frac{\lambda_i}{\lambda_i + \sigma_y^2} u_i^T (y - H x_b)$.  (41)

The key analogy with the univariate case is evident if one identifies $\sigma_x^2$ in equation (6) with $\lambda_i$ in (41). In the limit $\lambda_i \ll \sigma_y^2$ (perfect model), the analysis makes no correction to the background associated with mode $i$. In the other limit, $\sigma_y^2 \ll \lambda_i$ (perfect data), the analysis is identically equal to the observation associated with mode $i$.

Bennett (1992) applies this analysis to evaluate the design of observing arrays. Each mode $u_i$ corresponds to a so-called antenna array mode associated with the observing system, dynamics, and hypothesized error models. The modes may be classified according to whether they are approximately interpolated ($\sigma_y^2 \ll \lambda_i$) or smoothed ($\lambda_i \ll \sigma_y^2$). The effective number of degrees of freedom determined by the observing system is given by the number of modes for which $\sigma_y^2 \ll \lambda_i$. Information about redundant observation sites can be gained from the structure of the $u_i$ modes.
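The array-mode analysis is a small eigenvalue computation once $H B H^T$ is available. The sketch below, again with synthetic stand-ins, computes the mode weights of equation (41) and counts the effective degrees of freedom.

```python
import numpy as np

# Array-mode (antenna-mode) analysis of eqs. (38)-(41): a sketch.  The spectrum
# of H B H^T relative to sigma_y^2 classifies each mode as approximately
# interpolated (lambda >> sigma_y^2) or smoothed (lambda << sigma_y^2).
rng = np.random.default_rng(8)
N, M = 200, 15
H = rng.standard_normal((M, N))
i = np.arange(N)
B = np.exp(-np.abs(i[:, None] - i[None, :]) / 10.0)
sigma_y2 = 0.5

lam, U = np.linalg.eigh(H @ B @ H.T)         # eigen-decomposition of H B H^T
lam, U = lam[::-1], U[:, ::-1]               # sort modes by decreasing lambda

weights = lam / (lam + sigma_y2)             # per-mode weight in eq. (41)
n_eff = int(np.sum(lam > sigma_y2))          # effective degrees of freedom
print(np.round(weights, 2), n_eff)           # which modes the data constrain
```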

6 Summary and Conclusions

In summary, ocean data assimilation comprises a set of techniques for estimating the oceanic state using as much information as possible, by combining model predictions with observed data in an optimal manner. Optimality is defined by maximum likelihood or minimum variance criteria. Application of these optimality criteria to forecasting the real ocean is difficult due to the large dimensionality, or number of degrees of freedom, of the oceanic state to be estimated. Consequently, practical algorithms are developed through approaches which either truncate the representation of the oceanic state, reduce the degrees of freedom to be estimated, or solve sub-optimal criteria for the state estimate.

As computing power increases, there are fewer technological obstacles to operational data assimilative ocean forecasting. Scientific attention then focuses on developing and validating error models for the dynamics, initial conditions, and boundary forcing (Chapnik et al., 2006). Observational impact studies are another new avenue of study, which may be useful for improving data quality control, observing system design, and calibrating covariance models (Baker and Daley, 2000; Gelaro and Zhu, 2009). The existence of new ocean observing initiatives (NSF-OOI), satellite observation capabilities (SWOT, SSS), and the growing investments in operational ocean forecasting capabilities (e.g., BlueLink, NOPP-HYCOM) suggests that expertise in ocean data assimilation will be in demand for the foreseeable future.

7 Appendix A: Glossary

Analysis: The analysis is the end point or result of a data assimilation. It is the best estimate of the true state of the ocean at a given time, or within a given time interval. If the analysis is retrospective, i.e., it is the best estimate of the oceanic state at some past time conditioned upon measurements both before and after the analysis time, it is called a re-analysis. Typically the analysis is presented as a set of uniformly gridded oceanic state variables (sea-surface height, current vectors, temperature, salinity, etc.), on the same discrete grid as the ocean model. The analysis may be the end result of a forecast system, or it may provide input for the computation of other diagnostics, such as the transport across transects. Sometimes the analysis is compared with new observations, either to verify the analysis or to assess the quality of the new observations.

Analysis increment: The analysis increment is the difference between the analysis field and the background. Equivalently, the analysis increment is the correction to the background field which results in the optimal analysis.

Background: The background state, sometimes called the first guess, is the prediction of the oceanic state without data assimilation. In the absence of other information, a climatology or other dynamics-free estimate of the ocean may serve as the background.

Control variables: The control variables, sometimes simply called the controls, are the independent quantities to be estimated in the data assimilation. The dynamical model consists of a set of diagnostic or prognostic relations which relate the control variables to the state variables. There is not generally a unique partition between control variables and state variables, but the controls are generally regarded as inputs while the state is regarded as an output. For example, in the 4D-Var algorithm, the model's initial conditions are regarded as the control variables, although these same initial conditions and the resulting forecast may be regarded as state variables. In the Kalman Filter, the model's initial conditions and the system noise are considered the control variables.

Data assimilation: Data assimilation is the systematic methodology of incorporating information from an observing system into a dynamical model in such a manner that an optimality criterion is satisfied. Optimality criteria typically express a maximum likelihood or minimum mean square error criterion. Note that the aforementioned criteria are often identical when the error models are Gaussian. Also, in practice, many data assimilation systems find analysis states which only approximately satisfy the stated optimality criterion. This is usually considered acceptable because the optimality criteria are based on error models which are themselves approximate.

Dynamical model: It is assumed that the state of the ocean is predicted or modeled by a set of dynamics, e.g., Newton's laws expressed in the usual formulations of continuum mechanics, such as the Navier-Stokes equations or the shallow water equations. The dynamical model is assumed to be formulated as a mathematically well-posed initial-boundary-value problem.

Error model: An error model is a description of the probability distribution of some, possibly multivariate or field, quantity. For example, an error model for a measurement of temperature might minimally declare that the errors have zero mean (are unbiased), known variance $\sigma^2$, and are Gaussian distributed. Hence, an error model for an observing system would minimally consist of error models for the individual observations. Likewise, an error model for a set of dynamics would minimally consist of error sub-models for the initial conditions, boundary conditions, and other model inhomogeneities. Each of these sub-models would be characterized by its own space-time covariance structure, as appropriate.


More information

Numerical Weather Prediction: Data assimilation. Steven Cavallo

Numerical Weather Prediction: Data assimilation. Steven Cavallo Numerical Weather Prediction: Data assimilation Steven Cavallo Data assimilation (DA) is the process estimating the true state of a system given observations of the system and a background estimate. Observations

More information

Sensitivity analysis in variational data assimilation and applications

Sensitivity analysis in variational data assimilation and applications Sensitivity analysis in variational data assimilation and applications Dacian N. Daescu Portland State University, P.O. Box 751, Portland, Oregon 977-751, U.S.A. daescu@pdx.edu ABSTRACT Mathematical aspects

More information

Efficient Data Assimilation for Spatiotemporal Chaos: a Local Ensemble Transform Kalman Filter

Efficient Data Assimilation for Spatiotemporal Chaos: a Local Ensemble Transform Kalman Filter Efficient Data Assimilation for Spatiotemporal Chaos: a Local Ensemble Transform Kalman Filter arxiv:physics/0511236 v1 28 Nov 2005 Brian R. Hunt Institute for Physical Science and Technology and Department

More information

Variational Data Assimilation Current Status

Variational Data Assimilation Current Status Variational Data Assimilation Current Status Eĺıas Valur Hólm with contributions from Mike Fisher and Yannick Trémolet ECMWF NORDITA Solar and stellar dynamos and cycles October 2009 Eĺıas Valur Hólm (ECMWF)

More information

Mathematical Concepts of Data Assimilation

Mathematical Concepts of Data Assimilation Mathematical Concepts of Data Assimilation N.K. Nichols 1 Introduction Environmental systems can be realistically described by mathematical and numerical models of the system dynamics. These models can

More information

Introduction to initialization of NWP models

Introduction to initialization of NWP models Introduction to initialization of NWP models weather forecasting an initial value problem traditionally, initialization comprised objective analysis of obs at a fixed synoptic time, i.e. 00Z or 12Z: data

More information

Ensemble Data Assimilation and Uncertainty Quantification

Ensemble Data Assimilation and Uncertainty Quantification Ensemble Data Assimilation and Uncertainty Quantification Jeff Anderson National Center for Atmospheric Research pg 1 What is Data Assimilation? Observations combined with a Model forecast + to produce

More information

Lecture notes on assimilation algorithms

Lecture notes on assimilation algorithms Lecture notes on assimilation algorithms Elías alur Hólm European Centre for Medium-Range Weather Forecasts Reading, UK April 18, 28 1 Basic concepts 1.1 The analysis In meteorology and other branches

More information

Quantifying observation error correlations in remotely sensed data

Quantifying observation error correlations in remotely sensed data Quantifying observation error correlations in remotely sensed data Conference or Workshop Item Published Version Presentation slides Stewart, L., Cameron, J., Dance, S. L., English, S., Eyre, J. and Nichols,

More information

Parallel Algorithms for Four-Dimensional Variational Data Assimilation

Parallel Algorithms for Four-Dimensional Variational Data Assimilation Parallel Algorithms for Four-Dimensional Variational Data Assimilation Mie Fisher ECMWF October 24, 2011 Mie Fisher (ECMWF) Parallel 4D-Var October 24, 2011 1 / 37 Brief Introduction to 4D-Var Four-Dimensional

More information

Consider the joint probability, P(x,y), shown as the contours in the figure above. P(x) is given by the integral of P(x,y) over all values of y.

Consider the joint probability, P(x,y), shown as the contours in the figure above. P(x) is given by the integral of P(x,y) over all values of y. ATMO/OPTI 656b Spring 009 Bayesian Retrievals Note: This follows the discussion in Chapter of Rogers (000) As we have seen, the problem with the nadir viewing emission measurements is they do not contain

More information

Lecture 1: Primal 4D-Var

Lecture 1: Primal 4D-Var Lecture 1: Primal 4D-Var Outline ROMS 4D-Var overview 4D-Var concepts Primal formulation of 4D-Var Incremental approach used in ROMS The ROMS I4D-Var algorithm ROMS 4D-Var Priors f b, B f Ensemble 4D-Var

More information

Gaussian Process Approximations of Stochastic Differential Equations

Gaussian Process Approximations of Stochastic Differential Equations Gaussian Process Approximations of Stochastic Differential Equations Cédric Archambeau Dan Cawford Manfred Opper John Shawe-Taylor May, 2006 1 Introduction Some of the most complex models routinely run

More information

Brian J. Etherton University of North Carolina

Brian J. Etherton University of North Carolina Brian J. Etherton University of North Carolina The next 90 minutes of your life Data Assimilation Introit Different methodologies Barnes Analysis in IDV NWP Error Sources 1. Intrinsic Predictability Limitations

More information

Variational data assimilation

Variational data assimilation Background and methods NCEO, Dept. of Meteorology, Univ. of Reading 710 March 2018, Univ. of Reading Bayes' Theorem Bayes' Theorem p(x y) = posterior distribution = p(x) p(y x) p(y) prior distribution

More information

Linear Models 1. Isfahan University of Technology Fall Semester, 2014

Linear Models 1. Isfahan University of Technology Fall Semester, 2014 Linear Models 1 Isfahan University of Technology Fall Semester, 2014 References: [1] G. A. F., Seber and A. J. Lee (2003). Linear Regression Analysis (2nd ed.). Hoboken, NJ: Wiley. [2] A. C. Rencher and

More information

Weak Constraints 4D-Var

Weak Constraints 4D-Var Weak Constraints 4D-Var Yannick Trémolet ECMWF Training Course - Data Assimilation May 1, 2012 Yannick Trémolet Weak Constraints 4D-Var May 1, 2012 1 / 30 Outline 1 Introduction 2 The Maximum Likelihood

More information

Feb 21 and 25: Local weighted least squares: Quadratic loess smoother

Feb 21 and 25: Local weighted least squares: Quadratic loess smoother Feb 1 and 5: Local weighted least squares: Quadratic loess smoother An example of weighted least squares fitting of data to a simple model for the purposes of simultaneous smoothing and interpolation is

More information

Local Ensemble Transform Kalman Filter

Local Ensemble Transform Kalman Filter Local Ensemble Transform Kalman Filter Brian Hunt 11 June 2013 Review of Notation Forecast model: a known function M on a vector space of model states. Truth: an unknown sequence {x n } of model states

More information

Structural Uncertainty in Health Economic Decision Models

Structural Uncertainty in Health Economic Decision Models Structural Uncertainty in Health Economic Decision Models Mark Strong 1, Hazel Pilgrim 1, Jeremy Oakley 2, Jim Chilcott 1 December 2009 1. School of Health and Related Research, University of Sheffield,

More information

A Note on the Particle Filter with Posterior Gaussian Resampling

A Note on the Particle Filter with Posterior Gaussian Resampling Tellus (6), 8A, 46 46 Copyright C Blackwell Munksgaard, 6 Printed in Singapore. All rights reserved TELLUS A Note on the Particle Filter with Posterior Gaussian Resampling By X. XIONG 1,I.M.NAVON 1,2 and

More information

Linear Regression and Its Applications

Linear Regression and Its Applications Linear Regression and Its Applications Predrag Radivojac October 13, 2014 Given a data set D = {(x i, y i )} n the objective is to learn the relationship between features and the target. We usually start

More information

Data assimilation; comparison of 4D-Var and LETKF smoothers

Data assimilation; comparison of 4D-Var and LETKF smoothers Data assimilation; comparison of 4D-Var and LETKF smoothers Eugenia Kalnay and many friends University of Maryland CSCAMM DAS13 June 2013 Contents First part: Forecasting the weather - we are really getting

More information

Simple Examples. Let s look at a few simple examples of OI analysis.

Simple Examples. Let s look at a few simple examples of OI analysis. Simple Examples Let s look at a few simple examples of OI analysis. Example 1: Consider a scalar prolem. We have one oservation y which is located at the analysis point. We also have a ackground estimate

More information

Stochastic Spectral Approaches to Bayesian Inference

Stochastic Spectral Approaches to Bayesian Inference Stochastic Spectral Approaches to Bayesian Inference Prof. Nathan L. Gibson Department of Mathematics Applied Mathematics and Computation Seminar March 4, 2011 Prof. Gibson (OSU) Spectral Approaches to

More information

Efficient Data Assimilation for Spatiotemporal Chaos: a Local Ensemble Transform Kalman Filter

Efficient Data Assimilation for Spatiotemporal Chaos: a Local Ensemble Transform Kalman Filter arxiv:physics/0511236v2 [physics.data-an] 29 Dec 2006 Efficient Data Assimilation for Spatiotemporal Chaos: a Local Ensemble Transform Kalman Filter Brian R. Hunt Institute for Physical Science and Technology

More information

Review of Classical Least Squares. James L. Powell Department of Economics University of California, Berkeley

Review of Classical Least Squares. James L. Powell Department of Economics University of California, Berkeley Review of Classical Least Squares James L. Powell Department of Economics University of California, Berkeley The Classical Linear Model The object of least squares regression methods is to model and estimate

More information

5.1 2D example 59 Figure 5.1: Parabolic velocity field in a straight two-dimensional pipe. Figure 5.2: Concentration on the input boundary of the pipe. The vertical axis corresponds to r 2 -coordinate,

More information

Revision of TR-09-25: A Hybrid Variational/Ensemble Filter Approach to Data Assimilation

Revision of TR-09-25: A Hybrid Variational/Ensemble Filter Approach to Data Assimilation Revision of TR-9-25: A Hybrid Variational/Ensemble ilter Approach to Data Assimilation Adrian Sandu 1 and Haiyan Cheng 1 Computational Science Laboratory Department of Computer Science Virginia Polytechnic

More information

Introduction to ensemble forecasting. Eric J. Kostelich

Introduction to ensemble forecasting. Eric J. Kostelich Introduction to ensemble forecasting Eric J. Kostelich SCHOOL OF MATHEMATICS AND STATISTICS MSRI Climate Change Summer School July 21, 2008 Co-workers: Istvan Szunyogh, Brian Hunt, Edward Ott, Eugenia

More information

An introduction to data assimilation. Eric Blayo University of Grenoble and INRIA

An introduction to data assimilation. Eric Blayo University of Grenoble and INRIA An introduction to data assimilation Eric Blayo University of Grenoble and INRIA Data assimilation, the science of compromises Context characterizing a (complex) system and/or forecasting its evolution,

More information

EnKF Review. P.L. Houtekamer 7th EnKF workshop Introduction to the EnKF. Challenges. The ultimate global EnKF algorithm

EnKF Review. P.L. Houtekamer 7th EnKF workshop Introduction to the EnKF. Challenges. The ultimate global EnKF algorithm Overview 1 2 3 Review of the Ensemble Kalman Filter for Atmospheric Data Assimilation 6th EnKF Purpose EnKF equations localization After the 6th EnKF (2014), I decided with Prof. Zhang to summarize progress

More information

X t = a t + r t, (7.1)

X t = a t + r t, (7.1) Chapter 7 State Space Models 71 Introduction State Space models, developed over the past 10 20 years, are alternative models for time series They include both the ARIMA models of Chapters 3 6 and the Classical

More information

[y i α βx i ] 2 (2) Q = i=1

[y i α βx i ] 2 (2) Q = i=1 Least squares fits This section has no probability in it. There are no random variables. We are given n points (x i, y i ) and want to find the equation of the line that best fits them. We take the equation

More information

MA 575 Linear Models: Cedric E. Ginestet, Boston University Regularization: Ridge Regression and Lasso Week 14, Lecture 2

MA 575 Linear Models: Cedric E. Ginestet, Boston University Regularization: Ridge Regression and Lasso Week 14, Lecture 2 MA 575 Linear Models: Cedric E. Ginestet, Boston University Regularization: Ridge Regression and Lasso Week 14, Lecture 2 1 Ridge Regression Ridge regression and the Lasso are two forms of regularized

More information

Dynamic System Identification using HDMR-Bayesian Technique

Dynamic System Identification using HDMR-Bayesian Technique Dynamic System Identification using HDMR-Bayesian Technique *Shereena O A 1) and Dr. B N Rao 2) 1), 2) Department of Civil Engineering, IIT Madras, Chennai 600036, Tamil Nadu, India 1) ce14d020@smail.iitm.ac.in

More information

Linear algebra for MATH2601: Theory

Linear algebra for MATH2601: Theory Linear algebra for MATH2601: Theory László Erdős August 12, 2000 Contents 1 Introduction 4 1.1 List of crucial problems............................... 5 1.2 Importance of linear algebra............................

More information

The Canadian approach to ensemble prediction

The Canadian approach to ensemble prediction The Canadian approach to ensemble prediction ECMWF 2017 Annual seminar: Ensemble prediction : past, present and future. Pieter Houtekamer Montreal, Canada Overview. The Canadian approach. What are the

More information

DATA ASSIMILATION FOR FLOOD FORECASTING

DATA ASSIMILATION FOR FLOOD FORECASTING DATA ASSIMILATION FOR FLOOD FORECASTING Arnold Heemin Delft University of Technology 09/16/14 1 Data assimilation is the incorporation of measurement into a numerical model to improve the model results

More information

Introduction to Ensemble Kalman Filters and the Data Assimilation Research Testbed

Introduction to Ensemble Kalman Filters and the Data Assimilation Research Testbed Introduction to Ensemble Kalman Filters and the Data Assimilation Research Testbed Jeffrey Anderson, Tim Hoar, Nancy Collins NCAR Institute for Math Applied to Geophysics pg 1 What is Data Assimilation?

More information

arxiv: v1 [physics.ao-ph] 23 Jan 2009

arxiv: v1 [physics.ao-ph] 23 Jan 2009 A Brief Tutorial on the Ensemble Kalman Filter Jan Mandel arxiv:0901.3725v1 [physics.ao-ph] 23 Jan 2009 February 2007, updated January 2009 Abstract The ensemble Kalman filter EnKF) is a recursive filter

More information

Ergodicity in data assimilation methods

Ergodicity in data assimilation methods Ergodicity in data assimilation methods David Kelly Andy Majda Xin Tong Courant Institute New York University New York NY www.dtbkelly.com April 15, 2016 ETH Zurich David Kelly (CIMS) Data assimilation

More information

Some ideas for Ensemble Kalman Filter

Some ideas for Ensemble Kalman Filter Some ideas for Ensemble Kalman Filter Former students and Eugenia Kalnay UMCP Acknowledgements: UMD Chaos-Weather Group: Brian Hunt, Istvan Szunyogh, Ed Ott and Jim Yorke, Kayo Ide, and students Former

More information

INTRODUCTION TO PATTERN RECOGNITION

INTRODUCTION TO PATTERN RECOGNITION INTRODUCTION TO PATTERN RECOGNITION INSTRUCTOR: WEI DING 1 Pattern Recognition Automatic discovery of regularities in data through the use of computer algorithms With the use of these regularities to take

More information

Math 423/533: The Main Theoretical Topics

Math 423/533: The Main Theoretical Topics Math 423/533: The Main Theoretical Topics Notation sample size n, data index i number of predictors, p (p = 2 for simple linear regression) y i : response for individual i x i = (x i1,..., x ip ) (1 p)

More information

1 Computing with constraints

1 Computing with constraints Notes for 2017-04-26 1 Computing with constraints Recall that our basic problem is minimize φ(x) s.t. x Ω where the feasible set Ω is defined by equality and inequality conditions Ω = {x R n : c i (x)

More information

Fundamentals. CS 281A: Statistical Learning Theory. Yangqing Jia. August, Based on tutorial slides by Lester Mackey and Ariel Kleiner

Fundamentals. CS 281A: Statistical Learning Theory. Yangqing Jia. August, Based on tutorial slides by Lester Mackey and Ariel Kleiner Fundamentals CS 281A: Statistical Learning Theory Yangqing Jia Based on tutorial slides by Lester Mackey and Ariel Kleiner August, 2011 Outline 1 Probability 2 Statistics 3 Linear Algebra 4 Optimization

More information

Aspects of the practical application of ensemble-based Kalman filters

Aspects of the practical application of ensemble-based Kalman filters Aspects of the practical application of ensemble-based Kalman filters Lars Nerger Alfred Wegener Institute for Polar and Marine Research Bremerhaven, Germany and Bremen Supercomputing Competence Center

More information

Lecture 4: Types of errors. Bayesian regression models. Logistic regression

Lecture 4: Types of errors. Bayesian regression models. Logistic regression Lecture 4: Types of errors. Bayesian regression models. Logistic regression A Bayesian interpretation of regularization Bayesian vs maximum likelihood fitting more generally COMP-652 and ECSE-68, Lecture

More information

Comparison of 3D-Var and LETKF in an Atmospheric GCM: SPEEDY

Comparison of 3D-Var and LETKF in an Atmospheric GCM: SPEEDY Comparison of 3D-Var and LEKF in an Atmospheric GCM: SPEEDY Catherine Sabol Kayo Ide Eugenia Kalnay, akemasa Miyoshi Weather Chaos, UMD 9 April 2012 Outline SPEEDY Formulation Single Observation Eperiments

More information

The Kalman Filter. Data Assimilation & Inverse Problems from Weather Forecasting to Neuroscience. Sarah Dance

The Kalman Filter. Data Assimilation & Inverse Problems from Weather Forecasting to Neuroscience. Sarah Dance The Kalman Filter Data Assimilation & Inverse Problems from Weather Forecasting to Neuroscience Sarah Dance School of Mathematical and Physical Sciences, University of Reading s.l.dance@reading.ac.uk July

More information

New Fast Kalman filter method

New Fast Kalman filter method New Fast Kalman filter method Hojat Ghorbanidehno, Hee Sun Lee 1. Introduction Data assimilation methods combine dynamical models of a system with typically noisy observations to obtain estimates of the

More information

Signal Processing - Lecture 7

Signal Processing - Lecture 7 1 Introduction Signal Processing - Lecture 7 Fitting a function to a set of data gathered in time sequence can be viewed as signal processing or learning, and is an important topic in information theory.

More information

1 Data Arrays and Decompositions

1 Data Arrays and Decompositions 1 Data Arrays and Decompositions 1.1 Variance Matrices and Eigenstructure Consider a p p positive definite and symmetric matrix V - a model parameter or a sample variance matrix. The eigenstructure is

More information

Learning Gaussian Process Models from Uncertain Data

Learning Gaussian Process Models from Uncertain Data Learning Gaussian Process Models from Uncertain Data Patrick Dallaire, Camille Besse, and Brahim Chaib-draa DAMAS Laboratory, Computer Science & Software Engineering Department, Laval University, Canada

More information

Implicit sampling for particle filters. Alexandre Chorin, Mathias Morzfeld, Xuemin Tu, Ethan Atkins

Implicit sampling for particle filters. Alexandre Chorin, Mathias Morzfeld, Xuemin Tu, Ethan Atkins 0/20 Implicit sampling for particle filters Alexandre Chorin, Mathias Morzfeld, Xuemin Tu, Ethan Atkins University of California at Berkeley 2/20 Example: Try to find people in a boat in the middle of

More information

Four-Dimensional Ensemble Kalman Filtering

Four-Dimensional Ensemble Kalman Filtering Four-Dimensional Ensemble Kalman Filtering B.R. Hunt, E. Kalnay, E.J. Kostelich, E. Ott, D.J. Patil, T. Sauer, I. Szunyogh, J.A. Yorke, A.V. Zimin University of Maryland, College Park, MD 20742, USA Ensemble

More information

Local Ensemble Transform Kalman Filter: An Efficient Scheme for Assimilating Atmospheric Data

Local Ensemble Transform Kalman Filter: An Efficient Scheme for Assimilating Atmospheric Data Local Ensemble Transform Kalman Filter: An Efficient Scheme for Assimilating Atmospheric Data John Harlim and Brian R. Hunt Department of Mathematics and Institute for Physical Science and Technology University

More information

ECE521 week 3: 23/26 January 2017

ECE521 week 3: 23/26 January 2017 ECE521 week 3: 23/26 January 2017 Outline Probabilistic interpretation of linear regression - Maximum likelihood estimation (MLE) - Maximum a posteriori (MAP) estimation Bias-variance trade-off Linear

More information

The Local Ensemble Transform Kalman Filter (LETKF) Eric Kostelich. Main topics

The Local Ensemble Transform Kalman Filter (LETKF) Eric Kostelich. Main topics The Local Ensemble Transform Kalman Filter (LETKF) Eric Kostelich Arizona State University Co-workers: Istvan Szunyogh, Brian Hunt, Ed Ott, Eugenia Kalnay, Jim Yorke, and many others http://www.weatherchaos.umd.edu

More information

Relationship between Singular Vectors, Bred Vectors, 4D-Var and EnKF

Relationship between Singular Vectors, Bred Vectors, 4D-Var and EnKF Relationship between Singular Vectors, Bred Vectors, 4D-Var and EnKF Eugenia Kalnay and Shu-Chih Yang with Alberto Carrasi, Matteo Corazza and Takemasa Miyoshi 4th EnKF Workshop, April 2010 Relationship

More information

A Comparison of Error Subspace Kalman Filters

A Comparison of Error Subspace Kalman Filters Tellus 000, 000 000 (0000) Printed 4 February 2005 (Tellus LATEX style file v2.2) A Comparison of Error Subspace Kalman Filters By LARS NERGER, WOLFGANG HILLER and JENS SCHRÖTER Alfred Wegener Institute

More information

Frequentist-Bayesian Model Comparisons: A Simple Example

Frequentist-Bayesian Model Comparisons: A Simple Example Frequentist-Bayesian Model Comparisons: A Simple Example Consider data that consist of a signal y with additive noise: Data vector (N elements): D = y + n The additive noise n has zero mean and diagonal

More information

Lecture 1 Data assimilation and tropospheric chemistry. H. Elbern

Lecture 1 Data assimilation and tropospheric chemistry. H. Elbern Lecture 1 Data assimilation and tropospheric chemistry H. Elbern Rhenish Institute for Environmental Research at the University of Cologne 1 Special challenges of tropospheric chemistry data assimilation

More information

A Global Atmospheric Model. Joe Tribbia NCAR Turbulence Summer School July 2008

A Global Atmospheric Model. Joe Tribbia NCAR Turbulence Summer School July 2008 A Global Atmospheric Model Joe Tribbia NCAR Turbulence Summer School July 2008 Outline Broad overview of what is in a global climate/weather model of the atmosphere Spectral dynamical core Some results-climate

More information

ESTIMATING CORRELATIONS FROM A COASTAL OCEAN MODEL FOR LOCALIZING AN ENSEMBLE TRANSFORM KALMAN FILTER

ESTIMATING CORRELATIONS FROM A COASTAL OCEAN MODEL FOR LOCALIZING AN ENSEMBLE TRANSFORM KALMAN FILTER ESTIMATING CORRELATIONS FROM A COASTAL OCEAN MODEL FOR LOCALIZING AN ENSEMBLE TRANSFORM KALMAN FILTER Jonathan Poterjoy National Weather Center Research Experiences for Undergraduates, Norman, Oklahoma

More information

Ensembles and Particle Filters for Ocean Data Assimilation

Ensembles and Particle Filters for Ocean Data Assimilation DISTRIBUTION STATEMENT A. Approved for public release; distribution is unlimited. Ensembles and Particle Filters for Ocean Data Assimilation Robert N. Miller College of Oceanic and Atmospheric Sciences

More information

Uncertainty quantification for Wavefield Reconstruction Inversion

Uncertainty quantification for Wavefield Reconstruction Inversion Uncertainty quantification for Wavefield Reconstruction Inversion Zhilong Fang *, Chia Ying Lee, Curt Da Silva *, Felix J. Herrmann *, and Rachel Kuske * Seismic Laboratory for Imaging and Modeling (SLIM),

More information

Demonstration and Comparison of of Sequential Approaches for Altimeter Data Assimilation in in HYCOM

Demonstration and Comparison of of Sequential Approaches for Altimeter Data Assimilation in in HYCOM Demonstration and Comparison of of Sequential Approaches for Altimeter Data Assimilation in in HYCOM A. Srinivasan, E. P. Chassignet, O. M. Smedstad, C. Thacker, L. Bertino, P. Brasseur, T. M. Chin,, F.

More information

Comparison of of Assimilation Schemes for HYCOM

Comparison of of Assimilation Schemes for HYCOM Comparison of of Assimilation Schemes for HYCOM Ashwanth Srinivasan, C. Thacker, Z. Garraffo, E. P. Chassignet, O. M. Smedstad, J. Cummings, F. Counillon, L. Bertino, T. M. Chin, P. Brasseur and C. Lozano

More information

Convergence of Square Root Ensemble Kalman Filters in the Large Ensemble Limit

Convergence of Square Root Ensemble Kalman Filters in the Large Ensemble Limit Convergence of Square Root Ensemble Kalman Filters in the Large Ensemble Limit Evan Kwiatkowski, Jan Mandel University of Colorado Denver December 11, 2014 OUTLINE 2 Data Assimilation Bayesian Estimation

More information

Structure in Data. A major objective in data analysis is to identify interesting features or structure in the data.

Structure in Data. A major objective in data analysis is to identify interesting features or structure in the data. Structure in Data A major objective in data analysis is to identify interesting features or structure in the data. The graphical methods are very useful in discovering structure. There are basically two

More information

Forecasting and data assimilation

Forecasting and data assimilation Supported by the National Science Foundation DMS Forecasting and data assimilation Outline Numerical models Kalman Filter Ensembles Douglas Nychka, Thomas Bengtsson, Chris Snyder Geophysical Statistics

More information

Development of Stochastic Artificial Neural Networks for Hydrological Prediction

Development of Stochastic Artificial Neural Networks for Hydrological Prediction Development of Stochastic Artificial Neural Networks for Hydrological Prediction G. B. Kingston, M. F. Lambert and H. R. Maier Centre for Applied Modelling in Water Engineering, School of Civil and Environmental

More information