Lagrangian Data Assimilation and its Applications to Geophysical Fluid Flows


Lagrangian Data Assimilation and its Applications to Geophysical Fluid Flows

by Laura Slivinski
B.S., University of Maryland; College Park, MD, 2009
Sc.M., Brown University; Providence, RI, 2010

A dissertation submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy in The Division of Applied Mathematics at Brown University

PROVIDENCE, RHODE ISLAND
May 2014

© Copyright 2014 by Laura Slivinski

This dissertation by Laura Slivinski is accepted in its present form by The Division of Applied Mathematics as satisfying the dissertation requirement for the degree of Doctor of Philosophy.

Date    Björn Sandstede, Ph.D., Advisor

Recommended to the Graduate Council

Date    Martin Maxey, Ph.D., Reader
Date    Elaine Spiller, Ph.D., Reader

Approved by the Graduate Council

Date    Peter Weber, Dean of the Graduate School

Vitae

Degrees and Honors

University of Maryland, Mathematics: B.S., cum laude, 2009
Brown University, Applied Mathematics: Sc.M., 2010
Brown University, Applied Mathematics: Ph.D., expected 2014

Publications

L.C. Slivinski and C. Snyder. Particle filtering in high-dimensional nonlinear systems. In preparation.
L.C. Slivinski, E.T. Spiller, A.S. Apte, and B. Sandstede. A hybrid particle-ensemble Kalman filter for Lagrangian data assimilation. Submitted.
L.C. Slivinski, A.R. Margetts, and D.W. Bliss. Sparse space-time equalization with l_1 norm. Asilomar Conference on Signals, Systems, and Computers, Pacific Grove, CA, November 6-9.

Selected Presentations

A hybrid particle-ensemble Kalman filter scheme for Lagrangian data assimilation. SIAM Conference on Uncertainty Quantification, Savannah, GA.
Particle filtering for nonlinear systems: proposals and scalability. IMA Hot Topics Workshop: Predictability in Earth System Processes, University of Minnesota, MN.
Lagrangian data assimilation and its application to geophysical fluid flows. (Poster) Sixth WMO Symposium on Data Assimilation, College Park, MD.
Lagrangian data assimilation and its application to geophysical fluid flows. SIAM Conference on Applications of Dynamical Systems, Snowbird, UT.
Sparse space-time equalization with l_1 norm. (Poster) IEEE Asilomar Conference on Signals, Systems, and Computers, Pacific Grove, CA.

Professional Experience

Visitor, National Center for Atmospheric Research, Summer 2013
Intern, MIT Lincoln Laboratory, Summer 2010
Intern, National Security Agency, Summer 2008
Intern, Orbital Sciences Corporation, Summer 2006 & 2007, Winter 2008

Teaching Experience

Teaching Assistant, Brown University
APMA1210 (Operations Research: Deterministic Models), Fall 2010
APMA1200 (Operations Research: Probabilistic Models), Spring 2011
APMA1650 (Statistical Inference I), Fall 2011

Teaching Assistant, University of Maryland
Math003 (Algebra I and II), Fall-Spring 2008

Acknowledgments

I would first like to thank my advisor, Björn Sandstede, for introducing me to my thesis topic and allowing me freedom of direction in pursuing it, as well as for many opportunities to travel and discuss my research with scientists around the world. I would also like to thank Professor M. Maxey and Professor E. Spiller for donating their time to serve on my thesis committee. I would like to particularly thank Professor Spiller for suggesting the algorithm that is a focus of my thesis and continuing our collaboration over the last several years through visits, conferences, and Skype calls. Dr. C. Snyder, Professor A. Apte, and Professor C.K.R.T. Jones have also been fantastic collaborators, and have been instrumental in allowing me further opportunities for scientific travel. I have made many meaningful friendships while at Brown, and would like to thank every one of these friends for my experience over the last few years. In particular, I would like to thank Kelly McQuighan for sharing an office with me for four years, discussing math and non-math, and introducing me to rock climbing. Finally, I would like to thank my family for their unending support through all of my endeavors. I wouldn't have been able to do this without you.

Contents

Vitae
Acknowledgments

1 Introduction

2 Review and Preliminaries
   Data Assimilation
   Ensemble Methods
      Ensemble Kalman Filter
      Particle Filter
   Lagrangian Data Assimilation
   Motivation for a Hybrid Filter

3 Hybrid Filter Algorithm
   The Proposed Filter
   Setup
   Between Updates
   Update - No Resampling
   Update - With Resampling
   Calculating the Covariance Matrices
   Multiple Drifters
   Expected Benefits and Outcomes

4 Application to the Linear Shallow Water Equations
   Model Description
   Scenario 1 - Single Step, Bimodal Prior
   Scenario 2 - Long Trajectory; Undamped, Unforced Model
   Scenario 3 - Long Trajectory; Damped, Forced Model
   Discussion

5 Application to the Nonlinear Shallow Water Equations
   Model Description
   Previous Results on Drifter Deployment
   Results
      Numerical Implementation
      Results - Ensemble Kalman Filter
      Results - Hybrid Filter
   Discussion

6 Particle Filtering in High-Dimensional Nonlinear Systems
   Review of Previous Asymptotic Results
      Setup
      Asymptotic Behavior of Weights
      Calculation of τ in the Linear, Gaussian Case
      Numerical Results
   Numerical Model
   Extension to Nonlinear Case: Standard Proposal
   Optimal Proposal
      Sampling Optimal Proposal in Nonlinear Systems
      Numerical Results
   Performance of Standard and Optimal Proposals
   Discussion

7 Conclusion

A Derivation of the Shallow Water Equations
B Derivation of the Linear Shallow Water Equations and Solutions

List of Tables

2.1 Effect of different resampling methods on sampling noise, demonstrated using the errors in variance of the resampled ensemble compared to the variance of the original ensemble, before resampling
Root mean squared error of each filter over assimilation window: scenario 2 (without damping or forcing)
Root mean squared error of each filter over assimilation window: scenario 3 (with damping and forcing)
Parameter values used in the nonlinear shallow water equations in [57]
Spread-skill relation for the EnKF in the Lorenz 96 system
Measure of nonlinearity in the Lorenz 96 system, as a function of increasing integration steps. Ratio = 1 for a linear system
Average (over assimilation steps) of skewness values of {log(s_i)}_i for varying observation error, for a short assimilation length and long assimilation length
Variance of each Q_i and the relative error between Q and Q, as the initial ensemble spread varies. Calculation of Q_i and Q described in text
Variance of each Q_i and the relative error between Q and Q, as the model noise σ varies
Number of times the resampling threshold was hit, for varying state dimension N_x and varying ensemble size N_e, for both the standard and optimal proposals
Errors of PF mean from truth (averaged over last 200 of 300 assimilation steps, with 95% confidence bounds) and number of times the resampling threshold was hit for varying system noise σ, for both standard (std) and optimal (opt) proposals

List of Figures

2.1 Diagram of sequential data assimilation
Evolution of ensemble under Lorenz 96 model: no assimilation. Each plot shows the ensemble members (thin blue lines), the ensemble mean (thick dark blue line), and the truth (red line). The x-axis indexes the element of the state vector, while the y-axis gives the value of each element
Evolution of ensemble under Lorenz 96 model: EnKF with localization. Each plot shows the ensemble members (thin blue lines), the ensemble mean (thick dark blue line), the truth (red line), and the observations (green crosses). The x-axis indexes the element of the state vector, while the y-axis gives the value of each element
Effect of varying localization radius on the performance of the EnKF in the Lorenz 96 system. See text for parameter values used
EnKF after a single update step with a bimodal prior distribution (red), Gaussian likelihood (green dashed) with narrow (left) and wide (right) variance, and the true Bayes posterior (thin black). The approximate distribution of the EnKF analysis ensemble is shown in thick blue
Particle filter posterior distribution after a single update step with a bimodal prior distribution (red), Gaussian likelihood (green dashed) with narrow (left) and wide (right) variance, and the true Bayes posterior (thin black). The approximate distribution of the particle filter updated ensemble is shown in thick magenta
Snapshot of the velocity field (blue arrows) arising from the linear shallow water equations and a sample drifter trajectory (red asterisks)
Setup for each scenario. Left: snapshot in time of flow field (u, v) (arrows) and height field h (shading), scenario 1: no noise, damping, or forcing. Right: true drifter trajectories; black circles, scenario 2 (no damping or forcing) and red crosses, scenario 3 (with damping and forcing)
Comparison of posterior distributions of particle filter (blue dashed curve), ensemble Kalman filter (red dotted curve), and hybrid filter (green dash-dotted curve): single forecast & update step of stationary linear shallow water equations. Bimodal prior on y; Gaussian priors on u_0, u_1, v_1, h
4.3 Evolution of filter means. Scenario 2a: no damping or forcing, high observation frequency
Evolution of errors. Scenario 2a: no damping or forcing, high observation frequency
Evolution of filter means. Scenario 2b: no damping or forcing, low observation frequency
Evolution of errors. Scenario 2b: no damping or forcing, low observation frequency
Distributions of drifter variables. Scenario 2a: no damping or forcing, high observation frequency
Evolution of filter means. Scenario 3a: with damping and forcing, high observation frequency
Evolution of errors. Scenario 3a: with damping and forcing, high observation frequency
Evolution of filter means. Scenario 3b: with damping and forcing, low observation frequency
Evolution of errors. Scenario 3b: with damping and forcing, low observation frequency
Distributions of drifter variables. Scenario 3a: with damping and forcing, high observation frequency
Trajectories of the true drifters, over a period of 300 days, for each launch strategy. Light gray lines are lines of constant height. Figure from [57]
Errors in (a) kinetic energy, (b) height field, and (c) drifter position of each of the deployment strategies, as well as the errors without assimilation. Errors are calculated as described in the text. Figure from [57]
True height field of nonlinear shallow water equations at initial assimilation time
True trajectories of drifters under the nonlinear shallow water equations
Evolution of the true velocity vector field for the nonlinear shallow water equations
Errors for each deployment strategy, using the EnKF
Spatial errors in kinetic energy (EnKF). Left to right: 25 days, 75 days, and 150 days. Top to bottom: uniform release, saddle release, center release, and mixed release. Drifter locations for each release and each time are given in white asterisks. Axes represent length in meters, errors are in m/s
Same as Fig. 5.7 but for height field; errors are in meters
Errors for each deployment strategy, using the hybrid filter
Evolution of the state x from observation k to observation k+1, where the noise is additive at intermediate integration steps
Numerical validation of Eqn. (6.8), for varying values of λ_j. The numerical best-fit line is denoted by the solid line, and the theoretical relationship is given by the dashed line. From [62]
6.3 Numerical estimation of (2 log(N_e))^{1/2}/τ versus the time average and 95% confidence intervals of E[1/w_max] - 1, calculated using the standard proposal. Blue represents calculating τ from the true eigenvalues, and red represents calculations based on the diagonal entries of the matrix. The black line represents the theoretical relationship between (2 log(N_e))^{1/2}/τ and E[1/w_max] - 1
Numerical estimation of (2 log(N_e))^{1/2}/τ versus the time average and 95% confidence intervals of E[1/w_max] - 1, calculated using the optimal proposal, with approximations as described in the text. Blue represents calculating τ from the true eigenvalues, and red represents calculations based on the diagonal entries of the matrix. The black line represents the theoretical relationship between (2 log(N_e))^{1/2}/τ and E[1/w_max] - 1
Average errors for standard proposal (solid line) and optimal proposal (dashed line), as a function of ensemble size and for varying state dimensions: N_x = 5 (blue), N_x = 10 (red) and N_x = 20 (black)
Comparison of maximum weight after one assimilation step as a function of state dimension, using the standard proposal (blue) and the optimal proposal (red)

Chapter One

Introduction

Numerical weather prediction has been a challenging problem for meteorologists for decades. Not only are well-developed atmospheric models necessary to make accurate predictions, but the initial conditions of the system must also be known to some degree of accuracy [37]. However, even small errors in these initial conditions may lead to forecasts which are wildly inaccurate due to the inherent chaos in the atmosphere [65]. Luckily, scientists generally have observations of the system of interest which may be used to pull the predictions back towards the true state. Methods which combine predictions from models with observed data fall into a category known as data assimilation (DA). Talagrand [64] defines the purpose of assimilation as using "all available information [to] determine as accurately as possible the state of the atmospheric or oceanic flow." This is a large and active field of study, and different data assimilation algorithms are used in operational centers for real-time weather prediction [53]. In practice, many of these algorithms fall under the category of sequential data assimilation. Generally, this refers to algorithms which update the state estimate every time an observation becomes available. For weather prediction, observations are available about every six hours. The sequential algorithm updates the estimate with an observation and uses the system evolution model to make predictions of the state at future times, until the next observation is available, and the process repeats. The major differences across sequential data assimilation algorithms arise in the way they utilize the observations to update the estimate. Ideally, this estimate will include all inherent uncertainty arising from the uncertainty in the initial conditions, the observations, and potentially the model itself. Thus, the ideal estimate would be an entire probability distribution on the state; probabilistically, the best way to calculate this distribution is to use Bayes' rule, an equation derived

from the definitions of conditional probability which describes how to calculate the distribution of a state of interest given an observation. Although this Bayesian approach is the ideal way to implement data assimilation (in a probabilistic sense), it is often infeasible; thus, most DA schemes must make certain approximations. While numerical weather prediction and the assimilation of satellite data is one of the most well-known applications of data assimilation, it is by no means the only one. The oceans are another system which must be well understood in order to make accurate predictions regarding the climate on longer (seasonal to decadal) time scales. Additionally, understanding the currents in a body of water is necessary to make predictions regarding the paths of substances, such as oil after a spill. In addition to the currents, this system requires knowledge of temperature, salinity, and pressure, for instance. Lagrangian ocean instruments (drifters, floats, and gliders), which are advected by velocity fields while taking measurements, provide a significant and important source of this data. However, the Lagrangian nature of the data makes its assimilation into models a nontrivial task [43, 58, 47]. In particular, one would like to estimate the currents (or velocity field) in the ocean, but the instruments may only provide data about their locations at discrete points in time; the transformation from data to state of interest, then, may be highly nonlinear. On the other hand, Lagrangian data assimilation avoids directly computing this transformation; instead, the drifter positions are augmented to the velocity variables, which are then updated indirectly from the drifter observations. This method also avoids approximations necessary to transform the observations into the state space. It has been developed and applied successfully in several studies over the last decade [35, 43, 58, 66]. However, Lagrangian data assimilation is not without its own difficulties. First, the state of interest is generally very high-dimensional: it may consist of velocity, temperature, salinity, pressure, etc. at grid points across the domain of the model.

As computational power increases, the resolution of these models grows, causing the state dimension to grow larger. A second major difficulty of the Lagrangian data assimilation approach is that the Lagrangian trajectories, which are now part of the state of interest, are highly nonlinear; many traditional data assimilation algorithms are based on linearizing assumptions. Two well-known data assimilation schemes, the ensemble Kalman filter and the particle filter, each have strengths and weaknesses in the context of Lagrangian data assimilation. The ensemble Kalman filter is computationally feasible in high-dimensional systems, but fails when the system is highly nonlinear. The particle filter deals with nonlinear systems very well, but quickly becomes computationally intractable with increasing dimension. The ensemble Kalman filter has previously been used in Lagrangian data assimilation [35, 58], but in cases where the drifter trajectories are highly nonlinear (such as when the time between observations becomes long) the ensemble Kalman filter has been shown to fail [5]. This motivates the development of a hybrid particle-ensemble Kalman filter, which will be the focus of this work. We aim to show that this filter overcomes the two main challenges of Lagrangian data assimilation, that is, the high dimensionality of the flow variable and the high degree of nonlinearity of the drifter variable. To this end, the filter algorithm consists of applying the ensemble Kalman filter to the high-dimensional, relatively nonlinear flow variable, and the particle filter to the low-dimensional, highly nonlinear drifter variable. We test this algorithm in several situations, each of which consists of a synthetic truth and observations so that we may compare the filter output to the truth. We apply this filter first to the simple test case of the linear shallow water equations, for which the velocity field can be parameterized by a low-dimensional flow variable. In this case, the particle filter is tractable on the entire system, and we may compare

statistics from the distributions of the particle filter to those of the ensemble Kalman filter and the hybrid filter. We next apply the hybrid filter to the nonlinear shallow water equations, for which the particle filter is intractable. In this case, we compare the ability of the means of the hybrid filter and the ensemble Kalman filter to track the true state of the system. Finally, we provide further motivation for the development of this filter by showing some numerical results regarding the scaling of the particle filter in nonlinear systems. This work is organized as follows. In Chapter 2, we provide an overview of data assimilation from a Bayesian viewpoint, a discussion of the ensemble Kalman filter and the particle filter, and a discussion of Lagrangian data assimilation. In Chapter 3 we give a detailed discussion of the development and implementation of the hybrid particle-ensemble Kalman filter, along with notation and setup for the following two chapters. Chapter 4 provides results and discussion of the application of the hybrid filter to the linear shallow water equations. An application to the nonlinear shallow water equations and an investigation of different drifter deployment strategies is given in Chapter 5. In Chapter 6, we investigate the scalability of the particle filter in high-dimensional nonlinear systems. Finally, Chapter 7 includes an overall discussion, conclusion, and directions for future work.

Chapter Two

Review and Preliminaries

2.1 Data Assimilation

In this work we will take a Bayesian approach to data assimilation; see, for example, [3, 73]. To this end, we assume we have a system state x whose dynamics are given by

dx/dt = f(x),    (2.1)
x(0) = x_0.      (2.2)

The model f may include stochastic noise terms, or it may be deterministic. Further, we assume that there is uncertainty in the initial conditions, so that there is some prior probability distribution on x_0 given by p(x_0). We also assume that we have noisy observations available at discrete times t_k given by

y_k = h(x(t_k)) + \epsilon_k,  k = 1, ..., K.    (2.3)

While the observation noise does not have to be Gaussian, we will assume in this work that \epsilon_k ~ N(0, R). From a Bayesian viewpoint, the goal is to find the posterior distribution on x_0 (in the deterministic model case) or on paths {x_t} (in the stochastic case), given the observations y = {y_k}_{k=1,...,K}. Ideally, we want to sample from this distribution. Assuming for the moment that we are in the deterministic model case, then by Bayes' rule,

p(x_0 | y) = p(y | x_0) p(x_0) / p(y)    (2.4)
           \propto p(y | x_0) p(x_0).    (2.5)

This assumes that we have observations up to time K, and want to assimilate them all at once. However, a more realistic situation may be that observations are being collected in real time, and we want to assimilate them sequentially as they come in. In this case, it might not make sense to find an updated distribution on x_0 every time, before evolving that distribution forward to the current time in order to make a prediction. Instead, we want an estimate of (or at least a sample from) the distribution on the state at the current time t_k given the observations up to and including the current time; we can then evolve this estimate forward in time to make a prediction, update that prediction when a new observation is available, and so on. This is known as sequential data assimilation, and we now describe it in more detail. Let y_{1:k} = [y_1, ..., y_k]. Then, the distribution we are interested in is

p(x_k | y_{1:k}) \propto p(y_{1:k} | x_k) p(x_k)    (2.6)

up to a multiplicative constant of proportionality that depends only on y_{1:k}. In the case that the model and observation operator are both linear and the initial distribution of the system state, observation likelihood, and model noise (if it exists) are all Gaussian, the exact Bayesian posterior distribution can be calculated at each step. Since Gaussian distributions remain Gaussian under linear transformations and products of Gaussian densities are proportional to Gaussian densities, the posterior distribution will always be Gaussian. Finally, since Gaussian distributions in R^n are completely determined by the mean and covariance, any data assimilation scheme in this case need only keep track of the mean and covariance. For the linear Gaussian case, we introduce the following notation. The state

evolves between discrete times according to the linear model M:

x_k = M x_{k-1} + \eta_k,    (2.7)
\eta_k ~ N(0, Q)             (2.8)

and the observations are given by

y_k = H x_k + \epsilon_k,    (2.9)
\epsilon_k ~ N(0, R),        (2.10)

where H is the linear observation operator. Let \bar{x}_k denote the mean of the state at time k, and P_k denote the covariance at time k. The Kalman filter was originally presented in [36] as a solution of the linear filtering problem, and a derivation of the update equations for the data assimilation problem is given in [14]. This filter, along with several other sequential DA methods, involves two steps which are repeated whenever an observation becomes available: a forecast step and an analysis step (see Figure 2.1). In the following, a superscript f will denote forecast while a superscript a will denote analysis. The forecast estimate and covariance are found by evolving the previous estimate and covariance forward under the model:

\hat{x}^f_k = M \hat{x}^a_{k-1}    (2.11)
P^f_k = M P^a_{k-1} M^T + Q.       (2.12)

Assuming an observation y_k is available at time k, the Kalman update equations

give the analysis estimate and covariance after assimilating this observation:

\hat{x}^a_k = \hat{x}^f_k + K_k (y_k - H \hat{x}^f_k),    (2.13)
P^a_k = (I - K_k H) P^f_k                                  (2.14)

where

K_k = P^f_k H^T (H P^f_k H^T + R)^{-1}    (2.15)

is the so-called Kalman gain matrix. We now show that the Kalman update equations agree with the Bayesian posterior statistics (following [60]). More precisely, assume the prior distribution (given information about all previous observations) is given by

p(x_k | y_{1:k-1}) = (1/C_1) exp( -(x_k - x^f_k)^T (P^f_k)^{-1} (x_k - x^f_k) / 2 )    (2.16)

where C_1 is a normalization constant independent of x_k, and define the observation likelihood as

p(y_k | x_k) = (1/C_2) exp( -(y_k - H x_k)^T R^{-1} (y_k - H x_k) / 2 ),    (2.17)

where C_2 also does not depend on x_k. Note that x_k is random here, but y_k is the (deterministic) observation itself. By Bayes' rule then, the posterior distribution is given by

p(x_k | y_{1:k}) = (1/C_3) exp( -J(x_k) / 2 ),    (2.18)
J(x_k) = (x_k - x^f_k)^T (P^f_k)^{-1} (x_k - x^f_k) + (y_k - H x_k)^T R^{-1} (y_k - H x_k)    (2.19)

where C_3 is again independent of x_k. After some algebra, we have

J(x_k) = x_k^T ( (P^f_k)^{-1} + H^T R^{-1} H ) x_k - 2 ( H^T R^{-1} y_k + (P^f_k)^{-1} x^f_k )^T x_k + const    (2.20)
       = (x_k - x^a_k)^T (P^a_k)^{-1} (x_k - x^a_k) + const,    (2.21)

where

x^a_k = P^a_k ( H^T R^{-1} y_k + (P^f_k)^{-1} x^f_k ),    P^a_k = ( (P^f_k)^{-1} + H^T R^{-1} H )^{-1}.    (2.22)

These can be rewritten as

x^a_k = x^f_k + P^f_k H^T (H P^f_k H^T + R)^{-1} (y_k - H x^f_k),    (2.23)
P^a_k = (I - P^f_k H^T (H P^f_k H^T + R)^{-1} H) P^f_k.              (2.24)

Thus when the forecast mean is taken to be the Kalman forecast estimate (\hat{x}^f_k = x^f_k), the Kalman update equations provide the Bayesian mean and covariance in the linear Gaussian case.

Figure 2.1: Diagram of sequential data assimilation (forecast at each time, analysis when an observation arrives, repeat).
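To make the forecast-analysis cycle concrete, the following is a minimal NumPy sketch of Equations (2.11)-(2.15); the function and variable names are ours, not part of any standard library.

```python
import numpy as np

def kalman_cycle(x_a, P_a, y, M, H, Q, R):
    """One Kalman filter cycle: forecast (2.11)-(2.12), then analysis (2.13)-(2.15)."""
    # Forecast: evolve the previous analysis mean and covariance under the model
    x_f = M @ x_a
    P_f = M @ P_a @ M.T + Q
    # Kalman gain (2.15)
    K = P_f @ H.T @ np.linalg.inv(H @ P_f @ H.T + R)
    # Analysis: update with the innovation y - H x_f, Eqns. (2.13)-(2.14)
    x_a_new = x_f + K @ (y - H @ x_f)
    P_a_new = (np.eye(len(x_f)) - K @ H) @ P_f
    return x_a_new, P_a_new
```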

In the next section we discuss ensemble methods for data assimilation. We will focus on these methods primarily because they are the basis for the hybrid particle-ensemble Kalman filter developed in this thesis. However, another category of data assimilation methods includes variational approaches [15]. Generally, these methods find the best estimate of the system by minimizing some cost function. If this cost function only incorporates observations at a single given time, it is known as 3DVAR; 4DVAR, on the other hand, incorporates observations over a window of time. In the linear Gaussian case, the 4DVAR estimate coincides with the maximum a posteriori estimate (which is the mean in this case). Additionally, variational methods have been used successfully in operational centers all over the globe [20, 41]. (See [6] for a mathematical and statistical overview of variational methods as compared to ensemble methods.) However, variational methods have drawbacks as well. First, these methods require an adjoint model to compute the gradient of the cost function, which can be computationally expensive. Second, they require defining the background covariance matrix a priori, which requires a fair amount of insight into the specific problem. Finally, these methods are designed to provide an accurate estimate of the state, but not a probability distribution; that is, they provide the best-guess estimate, but do not quantify the uncertainty surrounding it.

2.2 Ensemble Methods

When the model dynamics are nonlinear, all approaches mentioned so far either fail or begin to run into difficulties. In particular, the Kalman filter as described above is impossible to implement: the forecast covariance matrix P^f cannot be explicitly calculated without making some linearizing assumptions. Variational methods (in their traditional formulation) become unwieldy as the model gets more complicated, due to the need to calculate the adjoint. In addition, nonlinear models result in

non-Gaussian distributions, for which functional forms may be difficult or impossible to calculate. This motivates the development of ensemble methods, which essentially approximate a continuous probability space with an ensemble of possible states. In this section, we discuss two common ensemble methods: the ensemble Kalman filter and the particle filter.

2.2.1 Ensemble Kalman Filter

As mentioned above, the ensemble Kalman filter approximates the Bayesian posterior probability distribution using an ensemble of possible states. Generally, to incorporate observations when available, each ensemble member is updated in a way similar to that of the traditional Kalman filter. This method, with some algorithmic improvements, has been used in various fields including atmospheric and oceanic applications, satellite data assimilation, and traffic flow estimation [33, 34, 74]. In particular, this method is well known to be successful in high-dimensional data assimilation problems (unlike the particle filter, as we will see below). Here, we assume we have a map F that discretizes in time the continuous model f of our dynamical system above. In particular:

x_k = F(x_{k-1}) + \eta_k,    (2.25)
\eta_k ~ N(0, Q)              (2.26)

and the observations are again given by

y_k = H x_k + \epsilon_k,    (2.27)
\epsilon_k ~ N(0, R).        (2.28)

Like the Kalman filter, the ensemble Kalman filter (EnKF) [21] can be formulated as an iterative scheme which consists of a forecast step and an analysis step whenever an observation is available. Unlike the Kalman filter, the EnKF employs an ensemble of state vectors {x_i}_{i=1,...,N_e} to represent the posterior distribution. The forecast step for the EnKF simply requires evolving the previous ensemble estimate forward under the model:

x^f_i(t_k) = F(x^a_i(t_{k-1})) + \eta_k.    (2.29)

At analysis times, the members themselves are updated according to an ensemble approximation of the traditional Kalman filter update step. However, since the true forecast covariance cannot be calculated, it is approximated as the sample covariance from the ensemble:

P^f(t_k) = 1/(N_e - 1) \sum_{i=1}^{N_e} ( x^f_i(t_k) - \bar{x}^f(t_k) ) ( x^f_i(t_k) - \bar{x}^f(t_k) )^T,    (2.30)
\bar{x}^f(t_k) = 1/N_e \sum_{i=1}^{N_e} x^f_i(t_k).    (2.31)

The ensemble members are then updated at time t_k according to

x^a_i(t_k) = x^f_i(t_k) + K(t_k) ( y(t_k) - H x^f_i(t_k) + \epsilon_i ),    (2.32)
K(t_k) = P^f(t_k) H^T ( H P^f(t_k) H^T + R )^{-1},                          (2.33)
\epsilon_i ~ N(0, R).                                                        (2.34)

This formulation is the so-called perturbed-observation EnKF (see [10, 33, 22]). Here, K is the Kalman gain matrix and the \epsilon_i are the observation perturbations. These perturbations are necessary to ensure that the posterior covariance has the correct size and structure (in a Bayesian sense) in the linear, Gaussian case. In high-dimensional problems, the EnKF as described above may fail. In particular, spurious correlations may arise between data points that are (physically) far away from each other, due to an ensemble size that is smaller than the dimension of the system. To avoid this problem, Houtekamer and Mitchell [33] suggest covariance localization, essentially suppressing correlations between points that are further apart than a given radius. Gaspari and Cohn [24] suggest a smooth correlation function for such localization. This is now widely used (e.g. [30, 34]) and is known as the Gaspari-Cohn correlation function. Another common problem that the EnKF encounters, especially in the presence of relatively small ensembles, is over-tightening of the covariance matrices. Research has been done in the area of covariance inflation in order to ameliorate this problem, including scaling the covariance by a constant coefficient [2] and adaptive inflation [1]. Few theoretical results on convergence of the EnKF in general situations are available; however, [46] gives a proof of convergence of the EnKF algorithm to that of the Kalman filter in the limit of large ensembles, for the case of a linear model and normal distributions.
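As a concrete illustration, here is a sketch of the perturbed-observation analysis step of Equations (2.30)-(2.34), with optional covariance localization applied as an elementwise (Schur) product. Forming P^f explicitly as below is only feasible for moderate state dimensions, and the function and argument names are ours.

```python
import numpy as np

def enkf_analysis(X_f, y, H, R, loc=None, rng=None):
    """Perturbed-observation EnKF analysis, Eqns. (2.30)-(2.34).
    X_f: (N_x, N_e) forecast ensemble; loc: optional (N_x, N_x) localization matrix."""
    rng = np.random.default_rng() if rng is None else rng
    N_x, N_e = X_f.shape
    A = (X_f - X_f.mean(axis=1, keepdims=True)) / np.sqrt(N_e - 1)
    P_f = A @ A.T                                     # sample covariance (2.30)
    if loc is not None:
        P_f = loc * P_f                               # Schur-product localization
    K = P_f @ H.T @ np.linalg.inv(H @ P_f @ H.T + R)  # Kalman gain (2.33)
    # Perturbed observations (2.34): each member sees y + eps_i
    eps = rng.multivariate_normal(np.zeros(len(y)), R, size=N_e).T
    return X_f + K @ (y[:, None] - H @ X_f + eps)     # analysis ensemble (2.32)
```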

To demonstrate some simple behavior of the EnKF on a nonlinear system, we consider the Lorenz 96 equations with 40 variables [45]. The deterministic form of these equations is given by

dx^{(j)}/dt = ( x^{(j+1)} - x^{(j-2)} ) x^{(j-1)} - x^{(j)} + F,    (2.35)

for j = 1, ..., 40, with F = 8 here. Note that this is the traditional choice of forcing parameter to represent the usual level of chaos in the system. To solve these equations, we use a third-order Runge-Kutta scheme with a fixed time step dt. As a demonstration of the chaos in this system, we first run an ensemble under this model without performing any data assimilation. Specifically, we pick a random initial condition which we then evolve under the model until t = 10 in order to create an initial true state which has already been spun up. We then draw an ensemble of 20 members with mean equal to the truth and standard deviation equal to 0.01, so that the initial spread is quite small. We then allow the ensemble members to run under the model independently from the truth, and show how the spread evolves in time in Figure 2.2. The leftmost plot shows that, from t = 0 to t = 1, the ensemble (blue) does not spread away from the truth (red). However, at t = 4, the ensemble has already spread significantly; at t = 7, the ensemble members look so different from each other that the mean of the ensemble is essentially the zero state, unlike the truth. On the other hand, if we implement the EnKF to assimilate observations at t = 1, 2, 3, ..., the ensemble stays much closer to the truth, and the ensemble mean is generally a good estimate of the truth (see Figure 2.3). In this case, we still have an ensemble size of 20 with initial mean equal to the truth and initial spread equal to 0.01, but we will also assimilate observations.
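A sketch of the model and integrator used in these experiments follows; the specific third-order Runge-Kutta variant (Kutta's classical scheme below) is our assumption, since the text does not name one.

```python
import numpy as np

def lorenz96(x, F=8.0):
    """Lorenz 96 tendency, Eqn. (2.35), with cyclic indexing in j."""
    return (np.roll(x, -1) - np.roll(x, 2)) * np.roll(x, 1) - x + F

def rk3_step(x, dt, F=8.0):
    """One step of Kutta's third-order Runge-Kutta scheme (assumed variant)."""
    k1 = lorenz96(x, F)
    k2 = lorenz96(x + 0.5 * dt * k1, F)
    k3 = lorenz96(x - dt * k1 + 2.0 * dt * k2, F)
    return x + dt * (k1 + 4.0 * k2 + k3) / 6.0
```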

Figure 2.2: Evolution of ensemble under Lorenz 96 model: no assimilation. Each plot shows the ensemble members (thin blue lines), the ensemble mean (thick dark blue line), and the truth (red line). The x-axis indexes the element of the state vector, while the y-axis gives the value of each element.

Assume we have full observations, so that H = I, and R = \sigma^2_{obs} I with \sigma^2_{obs} = 0.1; that is, the observation noise is independent in space and time. We also use a Gaspari-Cohn localization function with radius 5 (this value was tuned, and will be explored further below). In this example, the ensemble spread (light blue) and mean (dark blue) lie almost entirely beneath the truth (red). As expected, assimilating observations has greatly improved the mean estimate of this chaotic system.

Figure 2.3: Evolution of ensemble under Lorenz 96 model: EnKF with localization. Each plot shows the ensemble members (thin blue lines), the ensemble mean (thick dark blue line), the truth (red line), and the observations (green crosses). The x-axis indexes the element of the state vector, while the y-axis gives the value of each element.

In the next example, we show the effect of varying the localization radius in the EnKF. Here, we have decreased the ensemble size to 5 but kept the number of variables (i.e., the state dimension) at N_x = 40. We lengthen the time between observations to 3 and increase \sigma^2_{obs} to 1. Finally, we assimilate every third observation (x = 1, 4, 7, ...). Figure 2.4 shows the results of this experiment.

The x-axis displays the localization radius, and the y-axis displays the root mean squared error of the EnKF analysis mean from the truth, averaged over all spatial variables as well as over the entire time window. Fifty total observations were assimilated. In particular, note that there is a range of values of the localization radius that improves the results; when the radius increases (little to no localization), the performance decreases.

Figure 2.4: Effect of varying localization radius on the performance of the EnKF in the Lorenz 96 system. See text for parameter values used.

With improvements such as localization, the ensemble Kalman filter yields reasonable results in some nonlinear cases, as shown above, and in many high-dimensional cases [57]. However, the ensemble Kalman filter has its drawbacks as well: it is based on a scheme which provides the optimal posterior mean and covariance when the prior and likelihood are Gaussian and the model is linear. Thus, when the true posterior distribution is non-Gaussian, the ensemble Kalman filter will often produce a posterior distribution which is close to Gaussian. Methods like covariance localization or inflation do not overcome this basic shortcoming of the EnKF, which is its inability to capture highly non-Gaussian distributions. This was not a problem in the above Lorenz 96 experiments, since the time between observations was fairly short, and thus the model is approximately linear between observations. However, it has been observed [5, 4] that the EnKF may fail when the time between observations is long enough to allow the nonlinearity to become significant.
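For reference, here is a sketch of the Gaspari-Cohn fifth-order piecewise rational correlation function used for localization in the experiments above; the cyclic-distance construction for the Lorenz 96 domain is our addition.

```python
import numpy as np

def gaspari_cohn(dist, c):
    """Gaspari-Cohn correlation: compactly supported, exactly zero beyond 2c."""
    r = np.abs(np.asarray(dist, dtype=float)) / c
    rho = np.zeros_like(r)
    m = r <= 1.0
    rho[m] = -0.25*r[m]**5 + 0.5*r[m]**4 + 0.625*r[m]**3 - (5.0/3.0)*r[m]**2 + 1.0
    m = (r > 1.0) & (r < 2.0)
    rho[m] = (r[m]**5/12.0 - 0.5*r[m]**4 + 0.625*r[m]**3 + (5.0/3.0)*r[m]**2
              - 5.0*r[m] + 4.0 - 2.0/(3.0*r[m]))
    return rho

# Localization matrix for the cyclic 40-variable Lorenz 96 domain, radius 5:
idx = np.arange(40)
d = np.abs(idx[:, None] - idx[None, :])
loc = gaspari_cohn(np.minimum(d, 40 - d), c=5.0)
```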

In Figure 2.5 we show a simple case in which the EnKF completely fails to capture the true Bayesian posterior distribution. In this example, we consider a single variable and a single update step of the EnKF. The prior distribution (red) is strongly bimodal, and the observation likelihood (green dashed) is Gaussian. The EnKF posterior distribution is the continuous approximation based on the discrete analysis ensemble. In this case, the true Bayes posterior (thin black) is also bimodal, and the EnKF posterior (thick blue) differs strongly from it. In the case of a narrow likelihood (left), the EnKF algorithm overly trusts the observations, and its posterior is very close to the Gaussian likelihood. In the case of a broad likelihood (right), the EnKF posterior no longer resembles a Gaussian distribution, but it still fails to capture the true posterior distribution. Note here that the ensemble size used for the EnKF is N_e = 10^5, which is very large for a one-dimensional problem; thus, the failure of the EnKF in this case is not due to sample impoverishment, but rather to an inherent characteristic of the algorithm.

Figure 2.5: EnKF after a single update step with a bimodal prior distribution (red), Gaussian likelihood (green dashed) with narrow (left) and wide (right) variance, and the true Bayes posterior (thin black). The approximate distribution of the EnKF analysis ensemble is shown in thick blue.

2.2.2 Particle Filter

Gordon et al. introduced the particle filter (or "bootstrap filter") as a method of sequential importance sampling for nonlinear, non-Gaussian problems [27]. The main idea behind the particle filter, as opposed to the ensemble Kalman filter, is that the ensemble members have associated weights, which are updated in a way that is consistent with Bayes' rule. Thus, in its basic form, the particle filter makes no assumptions on the form of the prior or posterior distributions, and so can handle non-Gaussian distributions. For an overview of the field, including some of the derivations to follow, see for example [44, 17, 16, 69]. For a discussion of some convergence results, see [12]. Various particle filtering methods have been applied successfully for different purposes, such as target tracking [26], position and navigation [28], time series analysis [40], and parameter estimation [18]. Generally, particle filters are most useful on low-dimensional problems; below, we will explore the reasons behind this, but first we describe the algorithm. We will denote the weighted ensemble at time k as {x^k_i, w^k_i}_{i=1,...,N_e}. Unlike the ensemble Kalman filter, there is no clear separation between the forecast and analysis steps. Instead, we view the algorithm as a sequential Monte Carlo sampling scheme: when an observation is available, the particles are sampled from a proposal distribution, and the weights are subsequently updated to incorporate the observation (in a way that is consistent with Bayes' rule). We will discuss two of the more common proposal distributions: the standard proposal and the optimal proposal. Note that "optimal" here does not refer to the performance of the filter, but rather to the minimum variance across weights.

When an observation becomes available, we want to approximate the posterior distribution of the state given the observation with a weighted ensemble of states. That is, if the observation is available at time k, we are interested in

p(x_k | y_{0:k}) \approx \sum_{i=1}^{N_e} w^k_i \delta(x_k - x^k_i).

We will briefly review the derivation based on importance sampling [17, 61]. Suppose we want to sample from the distribution of states at the current time and at the previous observation time, given the observation at the current time: p(x_k, x_{k-1} | y_{0:k}). We will see later (when discussing the optimal proposal) how this formulation of the joint density will aid in the development of methods to sample states at the current time. Since this distribution is unknown, we instead sample from a proposal distribution denoted \pi(x_k, x_{k-1} | y_{0:k}), which we can choose, and then assign weights to each member of the sample. This weighted ensemble is then a sample from the correct distribution, if the weights are given by

w^k_i \propto p(x^k_i, x^{k-1}_i | y_{0:k}) / \pi(x^k_i, x^{k-1}_i | y_{0:k})    (2.36)

where the constant of proportionality is defined so that the sum of all weights is 1. Since we will always be conditioning on y_{0:k-1}, we omit this term in what follows, and consider only y_k. We will also choose a proposal density of the form \pi(x_k, x_{k-1} | y_k) = \pi(x_{k-1}) \pi(x_k | x_{k-1}, y_k). Then by Bayes' rule, since p(x^k_i, x^{k-1}_i | y_k) = p(y_k | x^k_i) p(x^k_i | x^{k-1}_i) p(x^{k-1}_i),

w^k_i \propto p(y_k | x^k_i) p(x^k_i | x^{k-1}_i) p(x^{k-1}_i) / ( \pi(x^{k-1}_i) \pi(x^k_i | x^{k-1}_i, y_k) )    (2.37)

and since w^{k-1}_i = p(x^{k-1}_i) / \pi(x^{k-1}_i) (recall we dropped the dependence on y_{0:k-1} in

our notation), this reduces to an iterative form:

w^k_i \propto [ p(y_k | x^k_i) p(x^k_i | x^{k-1}_i) / \pi(x^k_i | x^{k-1}_i, y_k) ] w^{k-1}_i.    (2.38)

The simplest choice for a proposal density is the standard proposal (or bootstrap filter), in which \pi(x_k, x_{k-1} | y_k) is chosen to be p(x_k | x_{k-1}); one can easily sample from this distribution by simply evolving each particle under the stochastic dynamics of the system. Then the weight update reduces to

w^k_i \propto p(y_k | x^k_i) w^{k-1}_i,    (2.39)

so that the weights are updated using the likelihood of the observation given each particle, p(y_k | x^k_i), and then normalized to sum to 1. This likelihood can be computed directly in the case of additive observation noise whose distribution has a functional form. Since the proposal density can be any probability distribution that can be explicitly calculated and sampled from, we could also choose to use a proposal which includes more information than the standard proposal. Doucet et al. [17] discuss the so-called optimal proposal, which includes information about the previous state as well as the current observation: \pi(x_k | x_{k-1}, y_k) = p(x_k | x_{k-1}, y_k). Using Bayes' rule again, we have that

p(x_k | x_{k-1}, y_k) = p(y_k | x_k) p(x_k | x_{k-1}) / p(y_k | x_{k-1}),    (2.40)

or equivalently

p(y_k | x^k_i) p(x^k_i | x^{k-1}_i) = p(y_k | x^{k-1}_i) p(x^k_i | x^{k-1}_i, y_k),    (2.41)

so the weight update now reduces to

w^k_i \propto p(y_k | x^{k-1}_i) w^{k-1}_i.    (2.42)

Note that drawing from the proposal and updating the weights are both more complicated in the case of the optimal proposal than for the standard proposal, due to conditioning on the observation at the current time. When we are in the linear Gaussian case, these distributions can be calculated explicitly. To this end, assume the model and observations are as given in Equations (2.7)-(2.10). Then

x_k | x_{k-1}, y_k ~ N(\bar{x}_k, P),      (2.43)
\bar{x}_k = (I - KH) M x_{k-1} + K y_k,    (2.44)
P = (I - KH) Q,                             (2.45)
K = Q H^T (H Q H^T + R)^{-1}.               (2.46)

In this case, the weights have an analytic update expression, since

y_k | x_{k-1} ~ N( H M x_{k-1}, H Q H^T + R ).    (2.47)

Thus, the particles at time k are first sampled from Equation (2.43), and then their weights are updated according to

w^k_i \propto exp{ -(1/2) ( y_k - H M x^{k-1}_i )^T ( H Q H^T + R )^{-1} ( y_k - H M x^{k-1}_i ) } w^{k-1}_i.    (2.48)

When the model is nonlinear, these distributions will likely be non-Gaussian, so the above derivations do not hold. In this case, approximations must be made which generally add computational time to the algorithm.
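The two weight updates can be sketched as follows. Here `step_model` and `obs_lik` are hypothetical placeholders for the stochastic model transition and the observation likelihood, and the second function implements the linear Gaussian formulas (2.43)-(2.48) directly; a sketch, not a definitive implementation.

```python
import numpy as np

def pf_step_standard(X, w, y, step_model, obs_lik):
    """Standard proposal: sample from p(x_k | x_{k-1}), reweight by (2.39)."""
    X = step_model(X)              # one stochastic model step per particle
    w = w * obs_lik(y, X)          # multiply by p(y_k | x_k^i)
    return X, w / w.sum()          # normalize to sum to 1

def pf_step_optimal_linear(X, w, y, M, H, Q, R, rng):
    """Optimal proposal in the linear Gaussian case, Eqns. (2.43)-(2.48).
    X: (N_e, N_x) array of particles; rows are x_{k-1}^i."""
    K = Q @ H.T @ np.linalg.inv(H @ Q @ H.T + R)   # gain (2.46)
    P = (np.eye(Q.shape[0]) - K @ H) @ Q           # proposal covariance (2.45)
    S = H @ Q @ H.T + R                            # covariance of y_k | x_{k-1}, (2.47)
    for i in range(len(X)):
        m = M @ X[i]
        innov = y - H @ m
        # weight update (2.48), using p(y_k | x_{k-1}^i)
        w[i] *= np.exp(-0.5 * innov @ np.linalg.solve(S, innov))
        # draw the new particle from (2.43)-(2.44): mean m + K(y - H m), covariance P
        X[i] = m + K @ innov + rng.multivariate_normal(np.zeros(len(m)), P)
    return X, w / w.sum()
```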

Despite this added effort, there are cases in which the optimal proposal yields a significant performance gain over the standard proposal with the same number of particles; these results, along with a more detailed investigation of the optimal proposal in nonlinear systems, will be investigated in detail in Chapter 6. Thus the optimal proposal may be more computationally tractable than the standard proposal in terms of the number of particles needed for an acceptable error level. To demonstrate the benefit of the particle filter over the EnKF in non-Gaussian situations, we run the same example shown in Figure 2.5 with the standard particle filter replacing the ensemble Kalman filter. Results from this experiment are shown in Figure 2.6. In particular, the kernel density estimate of the updated weighted particles from the particle filter lies almost entirely on top of the true Bayes posterior distribution. Note that the ensemble size is large enough (N_e = 10^5) to avoid filter divergence in a single step (see the following discussion of weight collapse).

Figure 2.6: Particle filter posterior distribution after a single update step with a bimodal prior distribution (red), Gaussian likelihood (green dashed) with narrow (left) and wide (right) variance, and the true Bayes posterior (thin black). The approximate distribution of the particle filter updated ensemble is shown in thick magenta.

Since this method of updating weights is iterative, once one particle starts to accumulate a high weight, eventually this particle ends up with a weight of almost

1 and the other particles have weights close to 0. This phenomenon is known as weight collapse or filter collapse [17, 44], and various methods of resampling have been suggested to avoid it. van Leeuwen [69] gives an overview of several common resampling schemes, including probabilistic resampling, residual resampling, stochastic universal sampling, and Monte Carlo Metropolis-Hastings, which will be discussed here. In all of the following algorithms, the particles have already been sampled from the proposal distribution and their weights have been updated. After this, the particles are resampled according to the chosen scheme, and the weights are reset to 1/N_e. Probabilistic resampling is the method of resampling described in the original paper on particle filtering [27]. It is also the most straightforward method: the particles are sampled directly from the weighted ensemble. However, this method also introduces the most resampling noise, due to the random nature of the resampling. Residual resampling, which was introduced in [44], does not add as much noise as probabilistic resampling. This method consists of first making n copies of particle i, where n = floor(N_e w_i). Then, the rest of the particles are sampled from the distribution given by the residuals N_e w_i - floor(N_e w_i) left over after the integer part is subtracted from N_e w_i. Stochastic universal resampling has the lowest sampling noise of these methods [40]. This algorithm is computed as follows. First, order all the weights from smallest to largest, so that there are N_e bins of size corresponding to the weights; that is, (0, w_1) is the bin corresponding to particle 1, (w_1, w_1 + w_2) corresponds to particle 2, etc. Draw a number \alpha randomly from the uniform density on [0, 1/N_e]. Suppose \alpha falls in the j-th bin; then add the j-th particle to the resampled ensemble. Repeat with \alpha + 1/N_e, ..., \alpha + (N_e - 1)/N_e. Thus particles with larger weight will have larger

bins, and will be resampled more often.

Table 2.1: Effect of different resampling methods (probabilistic, residual, stochastic universal, Metropolis-Hastings) on sampling noise, demonstrated using the errors in variance of the resampled ensemble compared to the variance of the original ensemble, before resampling.

The final resampling algorithm discussed here is the Monte Carlo Metropolis-Hastings algorithm, originally presented in [31] and applied to particle filtering in [19]. First, choose one particle (say x_1) to be a member of the resampled ensemble. The second particle, x_2, will also be added to this ensemble if w_2 > w_1. If w_2 <= w_1, then choose a number \nu from the uniform distribution on [0, 1]. If \nu < w_2/w_1, add x_2 to the new ensemble; if \nu >= w_2/w_1, then reject x_2 and duplicate x_1 as the second particle of the new ensemble. Repeat this algorithm with each particle from the previous ensemble, comparing against the last particle chosen for the new ensemble. To compare the amount of sampling noise added by each of the methods described here in a simple example, we generate a sample of size 100 from a standard normal distribution and assign equal weight to each sample member. We then apply each method to the original ensemble, and calculate the variance of the resampled ensemble. We do this 100 times and take the average error of each resampled variance (from the true value of 1) over the realizations. Table 2.1 shows the resulting errors. Metropolis-Hastings resampling has variance closest to the true value of 1, while probabilistic resampling has the most sampling error.
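A sketch of stochastic universal resampling as described above; we skip the initial sort, which is not required for the method to be unbiased, and the function name is ours.

```python
import numpy as np

def su_resample(w, rng):
    """Stochastic universal resampling: one uniform draw, N_e equally spaced
    pointers; particles with larger weight occupy larger bins and are copied
    more often. Returns indices into the original ensemble."""
    N_e = len(w)
    edges = np.cumsum(w)                    # right edges of the weight bins
    alpha = rng.uniform(0.0, 1.0 / N_e)     # single random offset
    return np.searchsorted(edges, alpha + np.arange(N_e) / N_e)
```

After resampling, all weights are reset to 1/N_e, and jitter may be added as described below.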

All of the methods of resampling described here will make identical copies of particles with high weight. For systems with a large amount of model noise, or systems whose dynamics do not collapse to a low-dimensional attractor, this may not be a problem; identical particles will quickly spread out. However, for systems that have little to no model noise or degenerate dynamics, this will result in the same filter degeneracy as would occur without resampling. To avoid this, jitter may be added to the particles after resampling. The simplest method of adding jitter, and that which will be used in all experiments in this work, is simply to add zero-mean Gaussian random noise to each particle. These methods also require a resampling condition, to determine when to resample. Resampling could be done at every assimilation step; however, since any form of resampling will add some sampling noise, it should only be done when the filter is close to collapse (but, clearly, before total collapse). A simple condition would be to resample when the largest weight is greater than some threshold, say 0.7. A more common condition, which will be used in all experiments in this work, is based on the effective sample size defined in [42]. This quantity is defined as

N_eff \equiv 1 / \sum_{i=1}^{N_e} w_i^2.    (2.49)

Note that, when one particle has gained all of the weight, this effective sample size reduces to 1; when all the particles have equal weight, the effective sample size is N_e. Resampling occurs when this quantity drops below some threshold, i.e., when N_eff < N_eff^thresh. Often, N_eff^thresh is some percentage of the total ensemble size, for example 0.1 N_e.

Work on particle filters is ongoing, and there are several variations on the traditional particle filter method as presented above. Many of these variations are also discussed in [69]. For example, the auxiliary particle filter introduced in [52] uses observations from the next point in time to reweight the particles at the current time, and then evolves this ensemble forward to represent the proposal density.
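Putting the pieces together, here is a sketch of the resampling condition of Eqn. (2.49) combined with additive Gaussian jitter; the jitter amplitude below is purely illustrative, not a value taken from the text.

```python
import numpy as np

def maybe_resample(X, w, rng, frac=0.1, jitter=0.01):
    """Resample when N_eff = 1 / sum(w_i^2), Eqn. (2.49), drops below frac * N_e."""
    N_e = len(w)
    if 1.0 / np.sum(w**2) < frac * N_e:
        idx = su_resample(w, rng)          # e.g. stochastic universal resampling (above)
        X = X[idx] + jitter * rng.standard_normal(X.shape)  # break up duplicate copies
        w = np.full(N_e, 1.0 / N_e)        # reset to uniform weights
    return X, w
```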

The backtracking particle filter, as its name suggests, involves going back in time to before the filter collapsed and altering the method in some way; this filter, and four different ways of performing the altering step, are discussed in [63]. Other approximations to the particle filter include the merging particle filter [49], kernel dressing [2], and the maximum entropy particle filter [39]. However, like the ensemble Kalman filter, the particle filter faces an inherent problem which cannot be solved by resampling: without a large ensemble size, the weights may collapse after a single step. The necessary ensemble size depends on the size of the system, and so the particle filter is generally computationally infeasible in high-dimensional systems (see [70, 8]). Bickel et al. [9], Bengtsson et al. [7], and Snyder et al. [62] have studied, in various contexts, the dependence of this ensemble size on the dimension of the system. In [7], it is shown that the necessary ensemble size depends exponentially on an effective dimension, which the authors of [62] show can be calculated from statistics of the system (in the linear Gaussian case). In Chapter 6, it is shown that these results can be numerically extended from linear systems to nonlinear systems.

2.3 Lagrangian Data Assimilation

The goal of Lagrangian data assimilation is to estimate the Eulerian flow field of a system (say, currents in the ocean) given Lagrangian observations of the positions of passive tracers (e.g. drifters or floats) subject to the flow. In this case, the

dynamical system of interest is

\dot{x}_F = f_1(x_F),         (2.50a)
\dot{x}_D = f_2(x_F, x_D),    (2.50b)

where x_F denotes the Eulerian velocity field (generally, this is a solution to a partial differential equation (PDE) which is discretized over a grid) and x_D denotes the position of the drifter(s) [35]. The observations are then

y = x_D + \epsilon,      (2.51)
\epsilon ~ N(0, R),      (2.52)

where \epsilon represents the observation noise with covariance R. One challenge of assimilating data from Lagrangian observations is that the velocity fields are usually defined on grids, and there is no guarantee that the observations will fall exactly on grid points. The approach of Molcard et al. [48] was to convert Lagrangian paths into velocities at nearby grid points, and assimilate this velocity data. However, this method is not true Lagrangian data assimilation, since the Lagrangian observations are converted into Eulerian data before assimilating. One true Lagrangian data assimilation approach involves appending the drifter position x_D to the velocity field x_F, so that the state of interest is x = [x_F, x_D]^T, the model is f = [f_1, f_2]^T, and the observation operator is simply H = [0 I]. This method was developed and applied successfully in several theoretical and methodological studies over the last decade [35, 43, 58, 57, 5, 66, 63].
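In code, the augmented formulation amounts to an observation operator selecting the drifter block, plus an evaluation of the gridded velocity at off-grid drifter positions. The bilinear interpolation below is an assumed choice (the text only requires evaluating the Eulerian field at the drifter locations), and all names are ours.

```python
import numpy as np

def H_drifters(N_F, N_D):
    """Observation operator H = [0 I]: observe only the 2*N_D drifter coordinates
    of the augmented state x = [x_F, x_D]."""
    H = np.zeros((2 * N_D, N_F + 2 * N_D))
    H[:, N_F:] = np.eye(2 * N_D)
    return H

def bilinear(field, px, py, dx, dy):
    """Evaluate a gridded field at an off-grid point (px, py)."""
    i, j = int(px // dx), int(py // dy)
    ax, ay = px / dx - i, py / dy - j
    return ((1 - ax) * (1 - ay) * field[i, j] + ax * (1 - ay) * field[i + 1, j]
            + (1 - ax) * ay * field[i, j + 1] + ax * ay * field[i + 1, j + 1])

def drifter_tendency(x_D, u, v, dx, dy):
    """f_2 of Eqn. (2.50b): each drifter moves with the local fluid velocity.
    x_D: (N_D, 2) positions; u, v: gridded velocity components."""
    return np.array([[bilinear(u, px, py, dx, dy), bilinear(v, px, py, dx, dy)]
                     for px, py in x_D])
```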

Now, f_1 can be linear or nonlinear, but the evolution of x_D will always be nonlinear (typically to a high degree), resulting in non-Gaussian distributions on (at least) the drifter positions [5]. Figure 2.7 shows an example of a flow field (blue arrows) arising from the linear shallow water equations (discussed in detail in following sections), and a sample drifter trajectory, shown at discrete points in time by red asterisks. Note the saddle point in the center of the flow; non-Gaussian distributions may arise here, since, before an observation is assimilated, there will be some probability that the drifter moves in the positive y direction, and some probability that it moves in the negative y direction, but very small probability that the drifter remains at the saddle point.

Figure 2.7: Snapshot of the velocity field (blue arrows) arising from the linear shallow water equations and a sample drifter trajectory (red asterisks).

Thus, non-Gaussian distributions may arise in the context of Lagrangian data assimilation. In addition, in the general case, the flow field x_F may be high-dimensional. These two defining characteristics lead to complications with traditional data assimilation algorithms, including the ensemble Kalman filter (which fails in highly non-Gaussian situations) and the particle filter (which is infeasible in high-dimensional systems). However, we aim to show that a hybrid particle-ensemble Kalman

filter avoids many of these issues.

2.4 Motivation for a Hybrid Filter

Previous work on hybrid schemes includes the ensemble Kalman-particle filter (EnKPF) [23], the weighted ensemble Kalman filter (WEnKF) [50], the hybrid grid/particle filter [55, 56], and many hybrid ensemble-variational schemes, such as [29, 71, 11]. The EnKPF algorithm of Frei and Künsch provides a continuous interpolation between the EnKF and the particle filter, depending on the interpolation parameter. The WEnKF of Papadakis and Mémin is primarily a particle filter in which the proposal distribution is the posterior of the ensemble Kalman filter analysis. As mentioned above, neither the particle filter nor the ensemble Kalman filter is ideal in the case of Lagrangian data assimilation. The aim of the hybrid particle-ensemble Kalman filter is to exploit the advantages of each filter by splitting the drifter coordinates away from the flow variables. The high-dimensional, relatively Gaussian flow part is estimated via the ensemble Kalman filter, and the low-dimensional, highly chaotic and possibly non-Gaussian drifter variables are estimated via a particle filter. This hybrid particle-ensemble Kalman filter is based primarily on the hybrid grid/particle filter of Salman [55, 56]. That method was also derived in the context of Lagrangian data assimilation, and introduced the idea of splitting the state space into the Eulerian flow variables and the Lagrangian position variables, and using different methods for each part. We now provide details on that method before describing the major differences between it and the hybrid particle-ensemble Kalman

As above, let $x = [x^F, x^D]^T$ and let $N_F = \dim(x^F)$, $2N_D = \dim(x^D)$. The stochastic evolution of the variables is given by

$$dx = m(x, t)\,dt + \eta(t)\,dt. \qquad (2.53)$$

In this method, the uncertainties in the initial conditions are represented by a probability density function (PDF), $\phi\{x(t_0)\}$. The evolution of $\phi$ is then given by the Fokker-Planck equation

$$\frac{\partial \phi}{\partial t} + \sum_{i=1}^{N_{dim}} \frac{\partial (m_i \phi)}{\partial x_i} = \frac{1}{2} \sum_{i,j=1}^{N_{dim}} Q_{ij} \frac{\partial^2 \phi}{\partial x_i \partial x_j}, \qquad (2.54)$$

where $Q = E[\eta \eta^T]$ is the covariance matrix of the stochastic forcing and $N_{dim} = N_F + 2N_D$. The appropriate boundary conditions for this second-order differential equation are

$$\phi(x, t) \to 0 \quad \text{as } x_i \to \pm\infty. \qquad (2.55)$$

The Fokker-Planck equation then reduces to the two equations

$$\frac{\partial \phi_F}{\partial t} + \sum_{i=1}^{N_F} \frac{\partial (m_{F,i}\, \phi_F)}{\partial x_{F,i}} = \frac{1}{2} \sum_{i,j=1}^{N_F} Q_{FF,ij} \frac{\partial^2 \phi_F}{\partial x_{F,i}\, \partial x_{F,j}}, \qquad (2.56)$$

$$\frac{\partial \phi_D}{\partial t} + \sum_{i=1}^{2N_D} \frac{\partial}{\partial x_{D,i}} \int m_{D,i}\, \phi \; dx_F = \frac{1}{2} \sum_{i,j=1}^{2N_D} Q_{DD,ij} \frac{\partial^2 \phi_D}{\partial x_{D,i}\, \partial x_{D,j}}, \qquad (2.57)$$

where $m$, $\phi$, and $Q$ have been decomposed into the flow part ($F$) and the drifter part ($D$). In this method, the PDF in the observation (drifter) space is retained in its continuous form, and the marginal PDF of the flow variables is approximated using

a particle filter:

$$\phi_F(x_F, t) \approx \frac{1}{N_e} \sum_{k=1}^{N_e} w_k\, \delta(x_F - x_{F,k}). \qquad (2.58)$$

Using this equation together with the definition of marginal density, we have

$$\phi(x, t) = \frac{1}{N_e} \sum_{k=1}^{N_e} \delta(x_F - x_{F,k})\, \phi_{D,k}(x_D, t), \qquad (2.59)$$

$$\phi_D(x_D, t) = \frac{1}{N_e} \sum_{k=1}^{N_e} \phi_{D,k}(x_D, t), \qquad (2.60)$$

$$w_k = \int \phi_{D,k}(x_D, t)\, dx_D. \qquad (2.61)$$

After some substitution and use of properties of the Fokker-Planck equation, the final result is

$$\frac{\partial \phi_{D,k}}{\partial t} + \sum_{i=1}^{2N_D} \frac{\partial (m_{D,ik}\, \phi_{D,k})}{\partial x_{D,ik}} = \frac{1}{2} \sum_{i,j=1}^{2N_D} Q_{DD,ij} \frac{\partial^2 \phi_{D,k}}{\partial x_{D,ik}\, \partial x_{D,jk}} \qquad (2.62)$$

for $k = 1, \ldots, N_e$.

In summary, to approximate the full continuous PDF $\phi(x_F, x_D, t)$, the hybrid grid/particle filter approximates the flow phase space with a Monte Carlo (particle) method and keeps a continuous PDF for the drifter phase space. Essentially, each ensemble member of the flow $x_{F,k}$ has associated with it a functional form of the marginal distribution of the drifters $\phi_{D,k}(x_D, t)$, which is then updated with the observation according to Bayes' rule.

This method was tested on a regularized vortex model with two vortices and a single drifter in [56]; in this case, it is important to note that the dimension of the flow variable is 4 (the position of each vortex). It was found that the numerical solver of the drifter evolution equation must be high-order and monotonic to avoid spurious negative probabilities.
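To illustrate the structure of this representation, the sketch below evaluates the mixture approximation (2.59)-(2.61) when each conditional drifter PDF $\phi_{D,k}$ is stored in Gaussian form; the Gaussian storage format and all sizes are assumptions made purely for illustration, not Salman's actual grid-based discretization.

```python
import numpy as np

rng = np.random.default_rng(0)
N_e, dim_D = 5, 2

# Assumed storage (illustration only): each conditional drifter PDF
# phi_{D,k} is a Gaussian with total mass w_k, mean mu_k, and a shared
# covariance C, so that w_k = integral of phi_{D,k}, cf. Eq. (2.61).
w = rng.random(N_e); w /= w.sum()
mus = rng.normal(size=(N_e, dim_D))
C = 0.1 * np.eye(dim_D)
Cinv = np.linalg.inv(C)
norm = 1.0 / np.sqrt((2 * np.pi) ** dim_D * np.linalg.det(C))

def gauss(x, mu):
    d = x - mu
    return norm * np.exp(-0.5 * d @ Cinv @ d)

def phi_D(x_D):
    """Marginal drifter density, Eq. (2.60): average over the N_e
    conditionals, each of which integrates to its weight w_k."""
    return np.mean([w[k] * gauss(x_D, mus[k]) for k in range(N_e)])

print(phi_D(np.zeros(dim_D)))
```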

It was also found that, like any particle filtering scheme, resampling is necessary to avoid sample impoverishment. This study compared the hybrid grid/particle filter to the perturbed-observation formulation of the EnKF discussed above, and to the particle filter with the same resampling scheme used in the hybrid grid/particle filter. As expected based on previous studies of Lagrangian data assimilation using the EnKF, the EnKF diverged when the time between observations was long, despite larger ensemble sizes; in this case, though, the hybrid and particle filters were able to track the truth well. In addition, the hybrid filter did not require as large an ensemble as the particle filter for similar performance; however, because of the continuous PDF required by the hybrid filter, the particle filter is still computationally less expensive than the hybrid filter in this example.

The main difference between the hybrid particle-ensemble Kalman filter and the grid/particle filter described above lies in how the drifter distribution is represented. Salman uses an advection-diffusion equation to solve the Fokker-Planck equation associated with the drifter evolution (Equation (2.50b)) in order to propagate the probability density function of the drifter variables $x^D$, and then updates that density using Bayes' rule. This process effectively gives a weighted ensemble of the flow variables $x^F$ which is resampled to get an ensemble with equal weights. In contrast, we use a Monte Carlo approximation of the Fokker-Planck equation, by using an ensemble of drifter positions, each of which is propagated using Equation (2.50b). Additionally, instead of resampling the flow variables in each update step, we use a version of the EnKF for weighted ensembles.

There are two main reasons for choosing a combined particle-ensemble Kalman filter strategy instead of the hybrid grid/particle filter strategy of Salman. First, the flow is usually high dimensional, so a particle filter approximation of the Eulerian variables (as used in the work of Salman) will not work well.

Since these variables are usually not very nonlinear, we choose to work with an EnKF approximation for the updates of these variables. Second, solving a Fokker-Planck equation for the drifter distribution function (as done in the work of Salman) can by itself be quite computationally challenging, and in the case of multiple drifters may not be feasible at all. Hence we choose to work with a Monte Carlo approximation given by a weighted ensemble of drifters, with the weights updated in a manner similar to the particle filter described above. We therefore expect the method that we propose to work well even for realistic models of ocean flow, augmented by equations for drifter dynamics.

Chapter Three

Hybrid Filter Algorithm

3.1 The Proposed Filter

Since neither the particle filter nor the ensemble Kalman filter is ideal individually in the case of Lagrangian data assimilation, the aim of the hybrid particle-ensemble Kalman filter proposed here is to exploit the advantages of each filter by splitting the drifter coordinates away from the flow variables. The high-dimensional, relatively linear and Gaussian flow part is estimated via the ensemble Kalman filter, and the low-dimensional, highly nonlinear and possibly non-Gaussian drifter variables are estimated via a particle filter. In this chapter, we first provide notation and setup. We then describe the filter algorithm in detail and address complications that arise in the context of this filter. In the following chapters, we test this algorithm on two different Lagrangian data assimilation problems.

3.1.1 Setup

Let $x^F \in \mathbb{R}^{N_F}$ denote the (potentially high dimensional) flow variable, and let $x^D$ denote the drifter position variable, which consists of the x and y components of each of the $N_D$ drifters, so that $x^D \in \mathbb{R}^{2N_D}$. We assume a planar fluid flow in which we can only observe the position of the drifter on the surface, and not the height of the fluid at its location. Recall that these variables evolve according to

$$\dot{x}^F = f_1(x^F), \qquad (3.1a)$$
$$\dot{x}^D = f_2(x^F, x^D). \qquad (3.1b)$$

At discrete times $k$, we have observations of the drifter available:

$$y^k = x^{D,k} + \epsilon^k, \qquad \epsilon^k \sim \mathcal{N}(0, R). \qquad (3.2)$$

Let $p(x)$ denote a probability density function associated with the random variable $x$. At time $k$, the joint distribution of the flow and drifter variables is

$$p(x^{F,k}, x^{D,k}) = p(x^{D,k} \mid x^{F,k})\, p(x^{F,k}). \qquad (3.3)$$

We discretely approximate the marginal distribution on the flow $p(x^{F,k})$ with an ensemble of $N_e$ weighted states, $\{x^{F,k}_i, w^k_i\}_{i=1}^{N_e}$. Initially, we set $w^k_i = 1/N_e$, so that the joint distribution is approximated by

$$p(x^{F,k}, x^{D,k}) \approx \frac{1}{N_e} \sum_{i=1}^{N_e} p(x^{D,k} \mid x^{F,k}_i)\, \delta(x^{F,k} - x^{F,k}_i). \qquad (3.4)$$

Next, we approximate the conditional distribution of the drifters given the flow with a weighted ensemble of $M$ states:

$$p(x^{D,k} \mid x^{F,k}_i) \approx \sum_{j=1}^{M} w^k_{i,j}\, \delta(x^{D,k} - x^{D,k}_{i,j}), \qquad (3.5)$$

where $\{x^{D,k}_{i,j}\}_{j=1,\ldots,M}$ is the ensemble of drifter states associated with (and subject to) the flow $x^{F,k}_i$. Thus the full joint distribution is approximated discretely as

$$p(x^{F,k}, x^{D,k}) \approx \frac{1}{N_e} \sum_{i=1}^{N_e} \sum_{j=1}^{M} w^k_{i,j}\, \delta(x^{D,k} - x^{D,k}_{i,j})\, \delta(x^{F,k} - x^{F,k}_i), \qquad (3.6)$$

where $\frac{1}{N_e} \sum_{i=1}^{N_e} \sum_{j=1}^{M} w^k_{i,j} = 1$. For simplicity, we will absorb the factor $1/N_e$ into the weights, so that the distribution is now

$$p(x^{F,k}, x^{D,k}) \approx \sum_{i=1}^{N_e} \sum_{j=1}^{M} w^k_{i,j}\, \delta(x^{D,k} - x^{D,k}_{i,j})\, \delta(x^{F,k} - x^{F,k}_i) \qquad (3.7)$$

and $\sum_{i=1}^{N_e} \sum_{j=1}^{M} w^k_{i,j} = 1$. We will denote this ensemble by $\{x^{F,k}_i, x^{D,k}_{i,j}, w^k_{i,j}\}_{i=1,j=1}^{i=N_e,j=M}$. We also define $w^k_i$ in terms of $w^k_{i,j}$ for general times $k$ in Equation (3.8); then the weighted ensemble representing the marginal distribution of the flow is given by $\{x^{F,k}_i, w^k_i\}_{i=1}^{N_e}$. As with the typical particle filter, we will assume that, at time 0, $w^0_{i,j} = 1/(M N_e)$, and the ensemble members have all been drawn independently from their respective prior distributions. Finally, define the following quantities at time $k$:

$$w^k_i = \sum_j w^k_{i,j}, \qquad (3.8)$$
$$\bar{x}^{D,k}_i = \frac{1}{w^k_i} \sum_j x^{D,k}_{i,j}\, w^k_{i,j}, \qquad (3.9)$$
$$\bar{x}^{F,k} = \sum_i x^{F,k}_i\, w^k_i, \qquad (3.10)$$
$$\bar{x}^{D,k} = \sum_{i,j} x^{D,k}_{i,j}\, w^k_{i,j} = \sum_i \bar{x}^{D,k}_i\, w^k_i. \qquad (3.11)$$

Thus $\bar{x}^{D,k}_i$ denotes the mean of the drifter particles associated with flow member $i$, while $\bar{x}^{D,k}$ is the mean over all the drifter particles, and $\bar{x}^{F,k}$ denotes the mean of the flow variables.
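As a concrete illustration of this two-level ensemble, the following sketch stores the flow members, drifter particles, and weights as arrays and computes the quantities (3.8)-(3.11); the array shapes are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(1)
N_e, M, N_F, N_D = 4, 3, 6, 1          # illustrative sizes

x_F = rng.normal(size=(N_e, N_F))      # flow members x^F_i
x_D = rng.normal(size=(N_e, M, 2*N_D)) # drifter particles x^D_{i,j}
w = np.full((N_e, M), 1.0 / (M * N_e)) # weights w_{i,j}, summing to 1

w_i = w.sum(axis=1)                                          # Eq. (3.8)
xbar_D_i = (w[:, :, None] * x_D).sum(axis=1) / w_i[:, None]  # Eq. (3.9)
xbar_F = (w_i[:, None] * x_F).sum(axis=0)                    # Eq. (3.10)
xbar_D = (w[:, :, None] * x_D).sum(axis=(0, 1))              # Eq. (3.11)

# Consistency check of the two expressions in Eq. (3.11):
assert np.allclose(xbar_D, (w_i[:, None] * xbar_D_i).sum(axis=0))
```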

3.1.2 Between Updates

Suppose, at time $k-1$, we have the ensemble $\{x^{F,k-1}_i, x^{D,k-1}_{i,j}, w^{k-1}_{i,j}\}_{i=1,j=1}^{i=N_e,j=M}$, and our next observation is at time $k$. Before assimilating the observation, we must obtain an ensemble at time $k$. This is generally done by first numerically integrating each flow member $x^{F,k-1}_i$ according to the model given by Equation (3.1a). The drifter particles $x^{D,k-1}_{i,j}$ are then numerically advected according to Equation (3.1b), subject to the $i$th flow member:

$$\dot{x}^D_{i,j} = f_2(x^F_i, x^D_{i,j}). \qquad (3.12)$$

The weights $w^{k-1}_{i,j}$ are then tested to determine whether the resampling condition is met. That is, at time $k$, the prior weights are used to calculate the effective sample size and to determine whether $N_{eff} \le N_{eff}^{thresh}$, with $N_{eff} = 1 / \sum_{i=1}^{N} w_i^2$. (This is done to avoid the computational effort of saving two sets of weights at every step, since the prior weights are necessary for the update step with resampling.)

3.1.3 Update - No Resampling

The transition described above yields the prior ensemble of state values at time $k$, $\{x^{F,k}_i, x^{D,k}_{i,j}\}$, which, along with the weights $\{w^{k-1}_{i,j}\}$, describes the prior distribution at time $k$. Suppose an observation $y^k$ is available at time $k$, and the weights have not yet crossed the pre-determined threshold. The weights $w^{k-1}_{i,j}$ will be updated to $w^k_{i,j}$, but the ensemble members $\{x^{F,k}_i\}_i$ and the particles $\{x^{D,k}_{i,j}\}_{i,j}$ themselves remain unchanged.

Following the standard sequential importance sampling (SIS) particle filter [17], the weight update equation is given by

$$w^k_{i,j} = \frac{p(y^k \mid x^{D,k}_{i,j})\, w^{k-1}_{i,j}}{\sum_{l,m} p(y^k \mid x^{D,k}_{l,m})\, w^{k-1}_{l,m}}. \qquad (3.13)$$

That is, the updated weights are found by multiplying the likelihood of each particle by its previous weight and normalizing to sum to 1. Then we have

$$p(x^{F,k}, x^{D,k} \mid y^k) \approx \sum_{i=1}^{N_e} \sum_{j=1}^{M} w^k_{i,j}\, \delta(x^{D,k} - x^{D,k}_{i,j})\, \delta(x^{F,k} - x^{F,k}_i), \qquad (3.14)$$

the discrete approximation of the joint posterior distribution of the flow and drifters conditioned on all the observations up to and including the observation at time $k$.
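A minimal sketch of this no-resampling update, assuming a Gaussian observation likelihood $p(y \mid x^D) = \mathcal{N}(y; x^D, R)$ consistent with Eq. (3.2); the array shapes are illustrative.

```python
import numpy as np

def sis_weight_update(w_prev, x_D, y, R):
    """Eq. (3.13): multiply each weight by the particle's likelihood
    p(y | x^D_{i,j}) and renormalize. w_prev has shape (N_e, M); x_D has
    shape (N_e, M, d); y has shape (d,); R is the d x d observation
    error covariance."""
    Rinv = np.linalg.inv(R)
    innov = y - x_D                                   # y - x^D_{i,j}
    # Gaussian log-likelihoods, up to an additive constant that cancels
    # in the normalization.
    loglik = -0.5 * np.einsum('ijd,de,ije->ij', innov, Rinv, innov)
    w = w_prev * np.exp(loglik - loglik.max())        # max-shift for stability
    return w / w.sum()
```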

3.1.4 Update - With Resampling

In this section, we discuss how to update the full ensemble when the weights cross the resampling threshold: $N_{eff} < N_{eff}^{thresh}$. Since this update occurs entirely at time $k$, we drop the time dependence. Instead, we use notation that indicates whether a variable has been updated using the observation $y^k := y$ or not: the superscript $f$ (forecast) denotes variables which have not yet been updated, and the superscript $a$ (analysis) denotes variables which have been updated with the observation $y$. In particular, note that $w^f_{i,j}$ denotes $w^{k-1}_{i,j}$, since the weights do not change when the particles themselves are evolved forward in time, and $w^a_{i,j}$ denotes the weights at time $t_k$ after they are updated according to the observation.

Traditionally, when applying the particle filter, one would resample $(x_i)_{i=1}^N$ from $\sum_{i=1}^N w_i\, \delta(x - x_i)$ when some predetermined threshold of the effective sample size is hit; that is, the particles are resampled from the approximate full distribution on $x$. In the hybrid filter algorithm, the drifter variables will instead be resampled from the approximation of the marginal distribution of the drifters conditioned on their respective flow members: $(x^{a,D}_{i,j})_{i=1,j=1}^{i=N_e,j=M}$ are sampled from

$$\sum_{i=1}^{N_e} \sum_{j=1}^{M} w^a_{i,j}\, \delta(x^D - x^{f,D}_{i,j}),$$

after which the weights are reset to $w_{i,j} = 1/(M N_e)$.

The flow variables will be resampled from the EnKF posterior distribution. More specifically, they will be resampled from the EnKF approximation of the joint distribution between the flow and the averaged drifters $\bar{x}^D$, marginalized over $\bar{x}^D$: $p(x^F \mid y) = \int p(x^F, \bar{x}^D \mid y)\, d\bar{x}^D$. Essentially, this resampling is performed by updating each flow member according to the EnKF analysis step. We now describe the update process in more detail.

Let $A^f_F$ be the $N_F \times N_e$ matrix with $i$th column given by $x^{f,F}_i$, and let $\tilde{A}^f_D$ be the $2N_D \times N_e$ matrix with $i$th column given by the average $\bar{x}^{f,D}_i$ of the drifters associated with $x^{f,F}_i$, defined in Equation (3.9). Since this algorithm will use the perturbed-observation formulation of the EnKF, define the $2N_D \times N_e$ matrix $\Upsilon$ of perturbed observations:

$$\Upsilon = [y + \epsilon_1, y + \epsilon_2, \ldots, y + \epsilon_{N_e}], \qquad (3.15)$$

where the distribution of each $\epsilon_i$ will be discussed below, since the perturbations must account for the fact that the ensembles of flow members and drifter particles have associated weights.

In the context of Lagrangian data assimilation, it is common to divide the covariance $P$ into four blocks: the covariance of the flow, the covariance of the drifters,

and the cross-covariances between the flow and drifters:

$$P = \begin{bmatrix} P_{FF} & P_{FD} \\ P_{FD}^T & P_{DD} \end{bmatrix}. \qquad (3.16)$$

Since this algorithm uses a different ensemble size for the flow members $x^F_i$ and the drifter particles $x^D_{i,j}$, the calculation of these covariance matrices is not as straightforward as with the traditional EnKF. The method for calculating these matrices for the hybrid filter will be discussed in detail below. Other than the differences in $\Upsilon$ and $P^f$ described below, the update step on the flow members for the hybrid filter has the same formulation as the traditional EnKF update in the Lagrangian case:

$$A^a_F = A^f_F + P^f_{FD} (P^f_{DD} + R)^{-1} (\Upsilon - \tilde{A}^f_D). \qquad (3.17)$$

In particular, note that $P^f_{FD} (P^f_{DD} + R)^{-1}$ is the upper block of the Kalman gain matrix; since the drifter variables are updated separately, the lower block is not needed here.

In the Gaussian case with a linear model, the traditional ensemble Kalman filter can be shown to give the correct Bayesian posterior mean and covariance in the limit as $N_e \to \infty$ [46]. In order for this to hold in the weighted case, the observation perturbations must have the correct distribution when considered as weighted samples. Specifically, let $\Upsilon$ be as defined in Equation (3.15). In the traditional EnKF, $\epsilon_i \sim \mathcal{N}(0, R)$; in our case, however, we need the weighted ensemble $\{\epsilon_i, w^f_i\}$ to be a discrete approximation of the continuous normal distribution with mean 0 and covariance $R$.
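The following sketch implements the flow update (3.17) as plain linear algebra, taking the forecast matrices, perturbed observations, and covariance blocks as given; it is a schematic under those assumptions rather than a full filter implementation.

```python
import numpy as np

def enkf_flow_update(A_F, A_D_tilde, Upsilon, P_FD, P_DD, R):
    """Eq. (3.17): analysis update of the flow members.
    A_F:        N_F x N_e matrix of forecast flow members
    A_D_tilde:  2N_D x N_e matrix of averaged forecast drifter positions
    Upsilon:    2N_D x N_e matrix of perturbed observations, Eq. (3.15)
    P_FD, P_DD: forecast covariance blocks; R: observation covariance."""
    # K1 = P_FD (P_DD + R)^{-1}, the upper block of the Kalman gain.
    # Solve a transposed linear system instead of forming the inverse.
    K1 = np.linalg.solve((P_DD + R).T, P_FD.T).T
    return A_F + K1 @ (Upsilon - A_D_tilde)
```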

This can be achieved by setting

$$\epsilon_i = R^{1/2} (P^f_{DD})^{-1/2} (z_i - m^f_{DD}), \qquad (3.18)$$

where $R$ is the observation noise covariance, $m^f_{DD}$ is the (appropriately weighted) mean of the forecast drifter particles stored in $\tilde{A}^f_D$, and $\{z_i, w_i\}$ is a weighted sample from $\mathcal{N}(m^f_{DD}, P^f_{DD})$. This will be described in further detail after the following lemma.

Lemma. The EnKF update on a weighted ensemble has the same posterior mean and covariance as the traditional EnKF in the Gaussian case, and thus the correct Bayes posterior mean and covariance in this case, provided the observation perturbations satisfy $\sum_{i=1}^N \epsilon_i w_i = 0$ and $\sum_{i=1}^N \epsilon_i \epsilon_i^T w_i = R$.

Proof. Suppose, at some time $t_k$, the true state of the system is $x$ and we have an observation available given by $y = Hx + \epsilon$, where $\epsilon \sim \mathcal{N}(0, R)$. To represent the initial uncertainty in the true state, we have a weighted ensemble of states $\{x^f_i, w_i\}_{i=1,\ldots,N}$ which represents a normal distribution with mean $m_0$ and covariance $C_0$. That is,

$$m_0 = \sum_{i=1}^N x^f_i w_i, \qquad (3.19)$$
$$C_0 = \sum_{i=1}^N (x^f_i - m_0)(x^f_i - m_0)^T w_i. \qquad (3.20)$$

The goal is to show that, after updating the ensemble members themselves (but not the weights) via the Kalman update step, the posterior mean and covariance of the updated ensemble are equivalent to the true Bayes posterior mean and covariance. That is, we want the updated ensemble $\{x^a_i, w_i\}_{i=1,\ldots,N}$ to have mean and

covariance

$$m_1 = m_0 + K(y - Hm_0), \qquad (3.21)$$
$$C_1 = (I - KH)\, C_0, \qquad (3.22)$$

where the Kalman gain matrix is as usual: $K = C_0 H^T (H C_0 H^T + R)^{-1}$. The ensemble is updated according to the ensemble Kalman update, in the perturbed-observation format:

$$x^a_i = x^f_i + K(y - Hx^f_i + \epsilon_i). \qquad (3.23)$$

Then the updated mean is

$$m_1 = \sum_{i=1}^N x^a_i w_i \qquad (3.24)$$
$$= \sum_{i=1}^N w_i \left( x^f_i + K(y - Hx^f_i + \epsilon_i) \right) \qquad (3.25)$$
$$= \sum_{i=1}^N w_i x^f_i + Ky - KH \sum_{i=1}^N w_i x^f_i + K \sum_{i=1}^N w_i \epsilon_i \qquad (3.26)$$
$$= m_0 + K(y - Hm_0) + K \sum_{i=1}^N w_i \epsilon_i. \qquad (3.27)$$

Thus, for the updated mean to coincide with the correct Bayes posterior mean, we need $\sum_i w_i \epsilon_i = 0$.

The updated covariance is now

$$C_1 = \sum_{i=1}^N (x^a_i - m_1)(x^a_i - m_1)^T w_i \qquad (3.28)$$
$$= \sum_{i=1}^N w_i \left( x^f_i + K(y - Hx^f_i + \epsilon_i) - m_0 - K(y - Hm_0) \right) (\,\cdot\,)^T \qquad (3.29)$$
$$= \sum_{i=1}^N w_i \left( (I - KH)x^f_i - (I - KH)m_0 + K\epsilon_i \right) (\,\cdot\,)^T \qquad (3.30)$$
$$= \sum_{i=1}^N w_i \left( (I - KH)(x^f_i - m_0) + K\epsilon_i \right) (\,\cdot\,)^T \qquad (3.31)$$
$$= \sum_{i=1}^N w_i \left( (I - KH)(x^f_i - m_0) \right) (\,\cdot\,)^T + \sum_{i=1}^N w_i \left[ (I - KH)(x^f_i - m_0)(K\epsilon_i)^T + K\epsilon_i \left( (I - KH)(x^f_i - m_0) \right)^T \right] + \sum_{i=1}^N K \epsilon_i \epsilon_i^T K^T w_i, \qquad (3.32)$$

where $(\,\cdot\,)^T$ represents the transpose of the term in parentheses immediately preceding it. In this expression, the first term is equivalent to $(I - KH) C_0 (I - KH)^T$, and the second (cross) term is 0 as long as we assume independence between the noise terms $\epsilon_i$ and the ensemble members $x_i$. Now, if $\sum_i \epsilon_i \epsilon_i^T w_i = R$, then the third term reduces to $KRK^T$. Thus, we have

$$C_1 = (I - KH) C_0 (I - KH)^T + KRK^T \qquad (3.33)$$
$$= C_0 - KHC_0 - C_0 H^T K^T + K(HC_0 H^T + R) K^T \qquad (3.34)$$
$$= (I - KH) C_0, \qquad (3.35)$$

as desired.

Therefore the weighted EnKF update step gives the correct posterior mean and covariance in the Gaussian case provided that the perturbations $\epsilon_i$ satisfy

$$\sum_{i=1}^N \epsilon_i w_i = 0, \qquad (3.36)$$
$$\sum_{i=1}^N \epsilon_i \epsilon_i^T w_i = R. \qquad (3.37)$$

Essentially, the weighted ensemble $\{\epsilon_i, w_i\}$ must approximate the Gaussian distribution with mean 0 and covariance $R$.

Now we show how to draw such perturbed observations. First, draw another sample with the same distribution as the ensemble members, say $z_i$, so that $\{z_i, w_i\}_{i=1,\ldots,N}$ is a weighted sample from $\mathcal{N}(m_0, C_0)$. Then define the perturbations as

$$\epsilon_i = R^{1/2} C_0^{-1/2} (z_i - m_0). \qquad (3.38)$$

First, note that the square roots and inverses are well defined since $R$ and $C_0$ are covariance matrices (note they are also symmetric). Then, the mean of the perturbations is

$$\sum_{i=1}^N \epsilon_i w_i = R^{1/2} C_0^{-1/2} \left( \sum_{i=1}^N w_i z_i - m_0 \right) = 0, \qquad (3.39)$$

and the covariance is

$$\sum_{i=1}^N w_i \epsilon_i \epsilon_i^T = \sum_{i=1}^N R^{1/2} C_0^{-1/2} (z_i - m_0)(z_i - m_0)^T C_0^{-1/2} R^{1/2} w_i \qquad (3.40)$$
$$= R^{1/2} C_0^{-1/2} C_0 C_0^{-1/2} R^{1/2} \qquad (3.41)$$
$$= R. \qquad (3.42)$$

Thus, replacing $m_0$ and $C_0$ with $m^f_{DD}$ and $P^f_{DD}$ respectively, we have the method for applying the weighted EnKF to the flow variables. However, we must still address the calculation of $P_{DD}$ in this weighted, two-ensemble situation.
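A small numerical sketch of the construction (3.38), using symmetric matrix square roots; generating the weighted sample $\{z_i, w_i\}$ by direct draws from the forecast statistics is purely an illustrative choice.

```python
import numpy as np
from scipy.linalg import sqrtm

def weighted_perturbations(z, w, m0, C0, R):
    """Eq. (3.38): eps_i = R^{1/2} C0^{-1/2} (z_i - m0), built so that the
    weighted mean is 0 and the weighted covariance is R whenever {z_i, w_i}
    has weighted mean m0 and weighted covariance C0."""
    T = sqrtm(R) @ np.linalg.inv(sqrtm(C0))
    return (z - m0) @ T.T

# Illustrative check on synthetic numbers:
rng = np.random.default_rng(2)
N, d = 2000, 2
z = rng.multivariate_normal([0.5, -1.0], [[0.3, 0.1], [0.1, 0.2]], size=N)
w = np.full(N, 1.0 / N)
m0 = w @ z
C0 = (z - m0).T @ ((z - m0) * w[:, None])
R = 0.05 * np.eye(d)
eps = weighted_perturbations(z, w, m0, C0, R)
print(w @ eps)                          # ~ 0, cf. Eq. (3.36)
print(eps.T @ (eps * w[:, None]))       # ~ R, cf. Eq. (3.37)
```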

3.1.5 Calculating the Covariance Matrices

There are essentially two choices for calculating $P^f_{FD}$ and $P^f_{DD}$: they can be calculated using the full ensemble

$$\{x^{f,F}_i, x^{f,D}_{i,j}, w^f_{i,j}\}_{i=1,j=1}^{i=N_e,j=M} \qquad (3.43)$$

or the averaged ensemble

$$\{x^{f,F}_i, \bar{x}^{f,D}_i, w^f_i\}_{i=1}^{N_e}. \qquad (3.44)$$

These two ensembles have the same means but different covariances, as we now show. To this end, we derive and compare the statistics for the full ensemble $\{x^F_i, x^D_{i,j}, w_{i,j}\}$ and for the averaged ensemble $\{x^F_i, \bar{x}^D_i, w_i\}$. Let $x = [x^F, x^D]$, and consider (as above) the following decomposition of the covariance matrix into the flow-flow covariance, drifter-drifter covariance, and flow-drifter cross-covariance:

$$P = \begin{bmatrix} P_{FF} & P_{FD} \\ P_{FD}^T & P_{DD} \end{bmatrix}. \qquad (3.45)$$

The full ensemble $\{x^F_i, x^D_{i,j}, w_{i,j}\}$ has mean and covariance

$$\bar{x}^F_{full} = \sum_i x^F_i w_i, \qquad (3.46)$$
$$\bar{x}^D_{full} = \sum_{i,j} x^D_{i,j} w_{i,j}, \qquad (3.47)$$
$$P_{full} = \sum_{i,j} w_{i,j} (x_{i,j} - \bar{x})(x_{i,j} - \bar{x})^T. \qquad (3.48)$$

In particular,

$$P_{FF,full} = \sum_i w_i \left( x^F_i - \bar{x}^F \right) \left( x^F_i - \bar{x}^F \right)^T, \qquad (3.49)$$
$$P_{FD,full} = \sum_{i,j} w_{i,j} \left( x^F_i - \bar{x}^F \right) \left( x^D_{i,j} - \bar{x}^D \right)^T, \qquad (3.50)$$
$$P_{DD,full} = \sum_{i,j} w_{i,j} \left( x^D_{i,j} - \bar{x}^D \right) \left( x^D_{i,j} - \bar{x}^D \right)^T. \qquad (3.51)$$

The averaged ensemble $\{x^F_i, \bar{x}^D_i, w_i\}$ has mean

$$\bar{x}^F_{avg} = \sum_i x^F_i w_i, \qquad (3.52)$$
$$\bar{x}^D_{avg} = \sum_i \bar{x}^D_i w_i = \sum_{i,j} x^D_{i,j} w_{i,j}, \qquad (3.53)$$

which is equivalent to the mean of the full ensemble, and the covariances are

$$P_{FF,avg} = \sum_i w_i \left( x^F_i - \bar{x}^F \right) \left( x^F_i - \bar{x}^F \right)^T, \qquad (3.54)$$
$$P_{FD,avg} = \sum_i w_i \left( x^F_i - \bar{x}^F \right) \left( \bar{x}^D_i - \bar{x}^D \right)^T, \qquad (3.55)$$
$$P_{DD,avg} = \sum_i w_i \left( \bar{x}^D_i - \bar{x}^D \right) \left( \bar{x}^D_i - \bar{x}^D \right)^T. \qquad (3.56)$$

Clearly, $P_{FF,full} = P_{FF,avg}$. We will show that $P_{FD,full} = P_{FD,avg}$ as well, but that $P_{DD,full} \neq P_{DD,avg}$. Indeed,

$$P_{FD,full} = \sum_{i,j} w_{i,j} \left( x^F_i - \bar{x}^F \right) \left( x^D_{i,j} - \bar{x}^D \right)^T \qquad (3.57)$$
$$= \sum_i \left( x^F_i - \bar{x}^F \right) \left[ \sum_j w_{i,j} \left( x^D_{i,j} - \bar{x}^D \right)^T \right] \qquad (3.58)$$
$$= \sum_i \left( x^F_i - \bar{x}^F \right) \left[ \left( \sum_j w_{i,j} (x^D_{i,j})^T \right) - w_i (\bar{x}^D)^T \right] \qquad (3.59)$$
$$= \sum_i \left( x^F_i - \bar{x}^F \right) \left( \bar{x}^D_i - \bar{x}^D \right)^T w_i \qquad (3.60)$$
$$= P_{FD,avg}, \qquad (3.61)$$

as claimed. If the quadratic form in the drifter covariance is expanded, only one of the three terms differs between the full distribution and the averaged distribution:

$$P_{DD,full} - P_{DD,avg} = \sum_{i,j} w_{i,j} (x^D_{i,j})(x^D_{i,j})^T - \sum_{i,j} w_{i,j} (\bar{x}^D_i)(x^D_{i,j})^T \qquad (3.62)$$
$$= \sum_{i,j} w_{i,j} \left( (x^D_{i,j})(x^D_{i,j})^T - (\bar{x}^D_i)(x^D_{i,j})^T \right). \qquad (3.63)$$

Thus, this term determines how close the prior of the full distribution is to the prior of the averaged distribution.

We now argue that the averaged ensemble should be used in the EnKF update of the flow variable, so that, in the linear Gaussian case, the posterior (analysis) covariance of the flow is consistent with that of the traditional EnKF. Indeed, note that the posterior mean and covariance of the flow variables (after the EnKF update) depend on which ensemble (full or averaged) is used to calculate $P^f_{DD}$ and $P^f_{FD}$.
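The following sketch computes both sets of covariance blocks from a synthetic two-level ensemble; the sizes and random data are illustrative. It confirms numerically that $P_{FD}$ agrees for the two choices while $P_{DD}$ does not.

```python
import numpy as np

rng = np.random.default_rng(3)
N_e, M, N_F, d = 5, 4, 3, 2
x_F = rng.normal(size=(N_e, N_F))
x_D = rng.normal(size=(N_e, M, d))
w = rng.random((N_e, M)); w /= w.sum()

w_i = w.sum(axis=1)
xbar_D_i = (w[:, :, None] * x_D).sum(axis=1) / w_i[:, None]
xbar_F = w_i @ x_F
xbar_D = (w[:, :, None] * x_D).sum(axis=(0, 1))

dF = x_F - xbar_F                       # flow anomalies
dD_full = x_D - xbar_D                  # drifter anomalies, full ensemble
dD_avg = xbar_D_i - xbar_D              # drifter anomalies, averaged ensemble

# Eqs. (3.50)-(3.51) versus Eqs. (3.55)-(3.56):
P_FD_full = np.einsum('ij,if,ijd->fd', w, dF, dD_full)
P_FD_avg = np.einsum('i,if,id->fd', w_i, dF, dD_avg)
P_DD_full = np.einsum('ij,ijd,ije->de', w, dD_full, dD_full)
P_DD_avg = np.einsum('i,id,ie->de', w_i, dD_avg, dD_avg)

print(np.allclose(P_FD_full, P_FD_avg))   # True, cf. Eq. (3.61)
print(np.allclose(P_DD_full, P_DD_avg))   # False in general, cf. Eq. (3.63)
```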

Define

$$K^{(1)} = P^f_{FD} (P^f_{DD} + R)^{-1}, \qquad (3.64)$$

the upper block of the Kalman gain matrix. Then the posterior mean of the flow is given by

$$\bar{x}^{a,F} = \bar{x}^{f,F} + K^{(1)} (y - \bar{x}^{f,D}), \qquad (3.65)$$

and the posterior covariance can be shown to be

$$P^a_{FF} = P^f_{FF} - K^{(1)} (P^f_{FD})^T - P^f_{FD} K^{(1)T} + K^{(1)} (P^f_{DD} + R) K^{(1)T}. \qquad (3.66)$$

Now, if the covariances from the full ensemble are used, this expression cannot be simplified further. However, if the covariances from the averaged ensemble are used, it simplifies to

$$P^a_{FF} = P^f_{FF} - K^{(1)} (P^f_{FD})^T, \qquad (3.67)$$

which is the same form given by the traditional EnKF for Lagrangian data assimilation. Thus, since the prior statistics on the flow use the averaged ensemble, the update step on the flow should also use the averaged ensemble; in the linear Gaussian case, this leads to posterior statistics which are consistent with those of the traditional EnKF. (In particular, the innovations $(y - \bar{x}^{f,D}_i)$ depend on the averaged statistics, which prevents further simplification of the posterior covariance if the full statistics are used in $K^{(1)}$.)

We now present a concise description of the implementation of this update with resampling. We consider the prior/forecast ensemble to be two ensembles: one for the flow variables, $\{x^{f,F}_i, w^f_i\}$ (with $w_i$ as defined in Eqn. (3.8)), and one for the drifter variables, $\{x^{f,D}_{i,j}, w^f_{i,j}\}$.

The updating/resampling algorithm proceeds as follows:

PF-EnKF hybrid update algorithm

1. Change the state values of the flow ensemble members using the observation $y$ with the EnKF update given in Eqn. (3.17). Note that the covariances $P^f_{FD}$ and $P^f_{DD}$ should be obtained using the averaged drifter ensemble as described above. Then we have $\{x^{a,F}_i, w^f_i\}$.

2. Find $w^a_{i,j}$ using $y$ and $w^f_{i,j}$ with the standard particle filter update described in Eqn. (3.13). Then we have $\{x^{f,D}_{i,j}, w^a_{i,j}\}$.

3. Now $\{x^{a,F}_i, w^f_i\}$ and $\{x^{f,D}_{i,j}, w^a_{i,j}\}$ together represent the posterior distribution at time $t_k$ which has incorporated the observation $y$.

4. Resample the flow variables from $\{x^{a,F}_i, w^f_i\}$ and the drifter variables from $\{x^{f,D}_{i,j}, w^a_{i,j}\}$ using standard methods; call these $\{\check{x}^{a,F}_i\}$ and $\{\check{x}^{a,D}_{i,j}\}$ respectively. (For the examples presented in Chapter 4, we use an MCMC-based resampling outlined in [19, 69].) Note, for a specific flow member $i = m$, if $y$ falls far from the support of $\{x^{f,D}_{m,j}\}$, we recommend resampling the drifter variables for the $m$th flow member around the observation using the observation error statistics.

5. Set $w^a_{i,j} = 1/(M N_e)$, $x^{a,F}_i = \check{x}^{a,F}_i$, and $x^{a,D}_{i,j} = \check{x}^{a,D}_{i,j}$. Then the posterior is represented by $\{x^{a,F}_i, x^{a,D}_{i,j}, w^a_{i,j}\}$, and sequential filtering proceeds as normal.

In particular, note that the EnKF update on the flow variables uses the prior weights, as updating both the weights and the flow members would lead to the observations being incorporated into the flow update twice. However, since the weights must all be equal at the end of the full update, the final flow members must be resampled from $\{x^{a,F}_i, w^f_i\}$, so that the ensemble with equal weights approximates the same distribution. A schematic implementation of the five steps above is sketched below.
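The sketch strings the pieces together for a single update-with-resampling step, using simple multinomial resampling in place of the MCMC-based scheme cited above and standard-normal observation perturbations in place of the weighted construction (3.38); all sizes, the Gaussian likelihood, and these simplifications are illustrative assumptions.

```python
import numpy as np

def hybrid_update(x_F, x_D, w, y, R, rng):
    """One PF-EnKF hybrid analysis step (steps 1-5), with multinomial
    resampling standing in for the MCMC-based scheme used in the thesis.
    x_F: (N_e, N_F); x_D: (N_e, M, d); w: (N_e, M) prior weights; y: (d,)."""
    N_e, M, d = x_D.shape
    w_i = w.sum(axis=1)
    xbar_D_i = (w[:, :, None] * x_D).sum(axis=1) / w_i[:, None]

    # Weighted forecast statistics from the *averaged* ensemble (Sec. 3.1.5).
    xbar_F, xbar_D = w_i @ x_F, w_i @ xbar_D_i
    dF, dD = x_F - xbar_F, xbar_D_i - xbar_D
    P_FD = dF.T @ (dD * w_i[:, None])
    P_DD = dD.T @ (dD * w_i[:, None])

    # Step 1: perturbed-observation EnKF update of the flow, Eq. (3.17).
    # (Simplification: perturbations drawn directly from N(0, R) rather
    # than via the weighted construction of Eq. (3.38).)
    eps = rng.multivariate_normal(np.zeros(d), R, size=N_e)
    K1 = np.linalg.solve((P_DD + R).T, P_FD.T).T
    x_F_a = x_F + (y + eps - xbar_D_i) @ K1.T

    # Step 2: particle-filter weight update, Eq. (3.13).
    innov = y - x_D
    loglik = -0.5 * np.einsum('ijd,de,ije->ij', innov, np.linalg.inv(R), innov)
    w_a = w * np.exp(loglik - loglik.max()); w_a /= w_a.sum()

    # Step 4: resample flow members with the prior weights w_i, and drifter
    # particles with the posterior weights w_a.
    idx_F = rng.choice(N_e, size=N_e, p=w_i)
    flat = rng.choice(N_e * M, size=N_e * M, p=w_a.ravel())
    x_D_a = x_D.reshape(N_e * M, d)[flat].reshape(N_e, M, d)

    # Step 5: reset to equal weights.
    return x_F_a[idx_F], x_D_a, np.full((N_e, M), 1.0 / (N_e * M))
```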

At any point in time, the ensemble $\{x^F_i, x^D_{i,j}, w_{i,j}\}_{i=1,j=1}^{i=N_e,j=M}$ may be used to calculate statistics of interest such as the mean or covariance, as with a typical particle filter. The mean flow state is given by $\bar{x}^F = \sum_{i=1}^{N_e} x^F_i w_i$, and the mean drifter state is given by $\bar{x}^D = \sum_{j=1}^{M} \sum_{i=1}^{N_e} x^D_{i,j} w_{i,j}$. The covariance matrices are subject to the same complications described above: they can be calculated using either the full ensemble of drifter particles or the averaged ensemble. Define $x_{i,j} = [x^F_i, x^D_{i,j}]^T$; then the covariance matrix of the full ensemble is $P_{full} = \sum_{j=1}^{M} \sum_{i=1}^{N_e} (x_{i,j} - \bar{x})(x_{i,j} - \bar{x})^T w_{i,j}$. Next let $\tilde{x}_i = [x^F_i, \bar{x}^D_i]^T$; then the covariance matrix of the averaged ensemble is $P_{avg} = \sum_{i=1}^{N_e} (\tilde{x}_i - \bar{x})(\tilde{x}_i - \bar{x})^T w_i$.

3.2 Multiple Drifters

Here we briefly address the question of multiple drifters. For simplicity, we restrict to the case of two drifters. Thus the three variables of interest are the flow field, the first drifter position, and the second drifter position: $x^F$, $x^{D1}$, $x^{D2}$. Assume they have marginal distributions given by $q_1(x^{D1})$, $q_2(x^{D2})$, $q_3(x^F)$, and assume the joint distribution between drifters is $q_{1,2}(x^{D1}, x^{D2})$, with no assumption of independence for now. Let the full joint distribution be given by $g(x^{D1}, x^{D2}, x^F)$, and note

that $q_{1,2}(x^{D1}, x^{D2}) = \int g(x^{D1}, x^{D2}, x^F)\, dx^F$. Define the means as

$$\bar{x}^{D1} = \int x\, q_1(x)\, dx, \qquad \bar{x}^{D2} = \int x\, q_2(x)\, dx, \qquad \bar{x}^F = \int x\, q_3(x)\, dx. \qquad (3.68)$$

Then the covariance between drifters is given by

$$\mathrm{cov}(x^{D1}, x^{D2}) = E\left[ (x^{D1} - \bar{x}^{D1})(x^{D2} - \bar{x}^{D2})^T \right] \qquad (3.69)$$
$$= \int (x^{D1} - \bar{x}^{D1})(x^{D2} - \bar{x}^{D2})^T\, q_{1,2}(x^{D1}, x^{D2})\, dx^{D1}\, dx^{D2}.$$

If the distributions on the drifters are independent, then

$$q_{1,2}(x^{D1}, x^{D2}) = q_1(x^{D1})\, q_2(x^{D2}) \qquad (3.70)$$

and

$$\mathrm{cov}(x^{D1}, x^{D2}) = \int (x^{D1} - \bar{x}^{D1})(x^{D2} - \bar{x}^{D2})^T\, q_1(x^{D1})\, q_2(x^{D2})\, dx^{D1}\, dx^{D2}. \qquad (3.71)$$

Now assume that the continuous probability distributions are approximated by discrete distributions (a finite number of possible states with associated weights, or probabilities). We consider two ways of doing this: first, representing the full joint distribution on $(x^F, x^{D1}, x^{D2})$; second, representing the two joint distributions on $(x^F, x^{D1})$ and $(x^F, x^{D2})$ separately.

In the first case, we have one ensemble $\{(x^F, x^{D1}, x^{D2})^{(k)}\}_{k=1,\ldots,MN_e}$ and one set of weights $\{w^{(k)}\}_{k=1,\ldots,MN_e}$, where $\sum_k w^{(k)} = 1$. In order to calculate the cross-covariance

between drifters, we need weights that represent the distribution $q_{1,2}(x^{D1}, x^{D2})$. Recall that for discrete probability mass functions, the marginal distribution is given by $p(x^{D1}, x^{D2}) = \sum_{x^F} p(x^{D1}, x^{D2}, x^F)$; but in the case of the hybrid filter, each $(x^{D1}, x^{D2})$ pair has only one $x^F$ associated with it, so the sum is essentially over a single term. Thus $p(x^{D1}, x^{D2}) = p(x^{D1}, x^{D2}, x^F)$; that is, the weights which determine the full joint distribution on $(x^{D1}, x^{D2}, x^F)$ also determine the joint distribution on $(x^{D1}, x^{D2})$. So the cross-covariance between drifters is approximated as

$$\mathrm{cov}(x^{D1}, x^{D2}) = \int (x^{D1} - \bar{x}^{D1})(x^{D2} - \bar{x}^{D2})^T\, q_{1,2}(x^{D1}, x^{D2})\, dx^{D1}\, dx^{D2} \qquad (3.72)$$
$$\approx \sum_k (x^{(k)}_{D1} - \bar{x}^{D1})(x^{(k)}_{D2} - \bar{x}^{D2})^T\, w^{(k)}. \qquad (3.73)$$

In the second case, we would have two ensembles, one for each drifter:

$$\{(x^F, x^{D1})^{(k)}\}_{k=1,\ldots,MN_e}, \qquad \{(x^F, x^{D2})^{(k)}\}_{k=1,\ldots,MN_e}, \qquad (3.74)$$

and associated weights

$$\{w^{(k)}_1\}_{k=1,\ldots,MN_e}, \qquad \sum_k w^{(k)}_1 = 1, \qquad (3.75)$$
$$\{w^{(l)}_2\}_{l=1,\ldots,MN_e}, \qquad \sum_l w^{(l)}_2 = 1. \qquad (3.76)$$

However, in order to calculate the covariance, we still need the joint distribution between the drifters. If we can assume that they are independent, then the approximation of the joint distribution is given by a product of the weights:

$$\mathrm{cov}(x^{D1}, x^{D2}) = \int (x^{D1} - \bar{x}^{D1})(x^{D2} - \bar{x}^{D2})^T\, q_1(x^{D1})\, q_2(x^{D2})\, dx^{D1}\, dx^{D2} \qquad (3.77)$$
$$\approx \sum_k \sum_l (x^{(k)}_{D1} - \bar{x}^{D1})(x^{(l)}_{D2} - \bar{x}^{D2})^T\, w^{(k)}_1 w^{(l)}_2. \qquad (3.78)$$
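A small sketch contrasting the two discrete estimators (3.73) and (3.78) on synthetic weighted samples; the data are illustrative, and scalar drifter coordinates are used for brevity. The validity of the independence assumption behind (3.78) is examined next.

```python
import numpy as np

rng = np.random.default_rng(4)
K = 1000                                  # number of joint samples (M * N_e)
# Correlated synthetic drifter samples with joint weights w:
x1, x2 = rng.multivariate_normal([0, 0], [[1.0, 0.6], [0.6, 1.0]], size=K).T
w = rng.random(K); w /= w.sum()

m1, m2 = w @ x1, w @ x2

# Eq. (3.73): cross-covariance from the joint weights.
cov_joint = w @ ((x1 - m1) * (x2 - m2))

# Eq. (3.78): product-of-marginals estimator. The double sum factors into a
# product of two weighted anomaly means, each of which is exactly zero, so
# treating the samples as independent discards the correlation entirely.
cov_indep = np.outer(w, w).ravel() @ np.outer(x1 - m1, x2 - m2).ravel()

print(cov_joint)   # close to the true cross-covariance 0.6
print(cov_indep)   # exactly 0
```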

However, this is not necessarily a valid assumption. To see this, suppose the evolution of the variables is linear (or linearized) and discrete in time, so that the evolution from time 0 to time 1 can be written as

$$\begin{bmatrix} x^F \\ x^{D1} \\ x^{D2} \end{bmatrix}(1) = M \begin{bmatrix} x^F \\ x^{D1} \\ x^{D2} \end{bmatrix}(0), \qquad (3.79)$$

where

$$M = \begin{bmatrix} A & 0 & 0 \\ B_1 & C_1 & 0 \\ B_2 & 0 & C_2 \end{bmatrix} \qquad (3.80)$$

to capture the dependence of the drifter variables on the flow. Additionally, suppose that the covariance is initially block diagonal, and in particular that there is no cross-covariance between drifters:

$$P_0 = \begin{bmatrix} P_F & 0 & 0 \\ 0 & P_{D1} & 0 \\ 0 & 0 & P_{D2} \end{bmatrix}. \qquad (3.81)$$

The evolution of the covariance matrix is given by

$$P_1 = M P_0 M^T \qquad (3.82)$$
$$= \begin{bmatrix} A & 0 & 0 \\ B_1 & C_1 & 0 \\ B_2 & 0 & C_2 \end{bmatrix} \begin{bmatrix} P_F & 0 & 0 \\ 0 & P_{D1} & 0 \\ 0 & 0 & P_{D2} \end{bmatrix} \begin{bmatrix} A^T & B_1^T & B_2^T \\ 0 & C_1^T & 0 \\ 0 & 0 & C_2^T \end{bmatrix} \qquad (3.83)$$
$$= \begin{bmatrix} A P_F A^T & A P_F B_1^T & A P_F B_2^T \\ B_1 P_F A^T & B_1 P_F B_1^T + C_1 P_{D1} C_1^T & B_1 P_F B_2^T \\ B_2 P_F A^T & B_2 P_F B_1^T & B_2 P_F B_2^T + C_2 P_{D2} C_2^T \end{bmatrix}. \qquad (3.84)$$

In particular, this matrix is no longer block diagonal. Cross-covariances between the drifters are induced by their dependence on the flow; note that the covariance between drifters is a function of the initial covariance of the flow part and of the operators which evolve the drifters forward. Therefore, any assumption that the drifter variables are independent becomes inaccurate as soon as the variables are evolved forward in time.

So, in subsequent experiments we deal with multiple drifters by approximating the full joint distribution on $(x^{D1}, x^{D2}, x^F)$, as in the first case described above. However, based on the specific structure of the model, the cross-covariances that arise between drifters may still be relatively small compared to the other covariances. In particular, if the drifters are far away from each other in the flow domain, the drifters can likely be assumed independent without much loss of information. This assumption of independence may simplify calculations within the hybrid filter, but this will be the subject of future work.
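The block computation (3.84) can be checked numerically; the matrices below are random stand-ins chosen only to exhibit the induced off-diagonal drifter-drifter blocks.

```python
import numpy as np

rng = np.random.default_rng(5)
nF, nD = 3, 2
A, B1, B2 = (rng.normal(size=(n, m)) for n, m in
             [(nF, nF), (nD, nF), (nD, nF)])
C1, C2 = rng.normal(size=(nD, nD)), rng.normal(size=(nD, nD))

Z = np.zeros
M = np.block([[A,  Z((nF, nD)), Z((nF, nD))],
              [B1, C1,          Z((nD, nD))],
              [B2, Z((nD, nD)), C2         ]])

def spd(n):   # random symmetric positive-definite block
    X = rng.normal(size=(n, n)); return X @ X.T + n * np.eye(n)

P0 = np.block([[spd(nF),     Z((nF, nD)), Z((nF, nD))],
               [Z((nD, nF)), spd(nD),     Z((nD, nD))],
               [Z((nD, nF)), Z((nD, nD)), spd(nD)    ]])

P1 = M @ P0 @ M.T
# The drifter-drifter cross block equals B1 P_F B2^T, cf. Eq. (3.84):
print(np.allclose(P1[nF:nF+nD, nF+nD:], B1 @ P0[:nF, :nF] @ B2.T))  # True
```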

3.3 Expected Benefits and Outcomes

The hybrid filter presented above is developed mainly for the case of Lagrangian data assimilation, in which the flow field has been discretized to a high-dimensional vector and the drifter trajectories are highly nonlinear. In this case, the particle filter is completely intractable, whereas the ensemble Kalman filter breaks down when the prior distribution is highly non-Gaussian. The latter situation arises when the drifter dynamics, near either a saddle or a center, lead to non-Gaussian distributions [4]. In these cases, we expect the hybrid filter to outperform the ensemble Kalman filter, since the hybrid filter employs a particle filter on the drifters, which can effectively approximate such non-Gaussian distributions.

When the flow leads to highly non-Gaussian distributions on the drifter position, the ensemble Kalman filter may break down. In these cases, although the hybrid filter may take more machine time to run due to the large number of drifter particles, it produces much more accurate results than the ensemble Kalman filter, as shown in the numerical examples in the following chapter. In any case, the increase in computation is largely nominal, since running many more evolutions of the drifter particles is significantly cheaper than evolving more realizations of the flow. In addition, each drifter particle is independent given the flow field ensemble, so these advections can easily be parallelized. We also expect that this general methodology, of splitting the state space into two parts and applying different assimilation techniques to each, will be useful in other contexts as well.

In the following chapter, we test the hybrid filter on a model for which the flow is low-dimensional, and compare the results of the hybrid filter to those of both the ensemble Kalman filter and the particle filter. We expect the hybrid filter to be comparable

to the particle filter, and to outperform the EnKF when the drifter trajectory is highly nonlinear. Subsequently, we test the hybrid filter on a high-dimensional flow model and compare the results to those of the EnKF, as well as to the results of previous work [57].

Chapter Four

Application to the Linear Shallow Water Equations

In this chapter, we test the hybrid filter on a model with a low-dimensional flow variable. This model was chosen because the particle filter is tractable for Lagrangian data assimilation when the flow is low-dimensional, and we are particularly interested in comparing the hybrid and ensemble Kalman filter posterior distributions to the particle filter posterior. Since Lagrangian data assimilation leads to non-Gaussian distributions, we need a method that can handle non-Gaussianity as a benchmark against which to compare both the traditional method (the EnKF) and the new method (the hybrid filter). Thus, we compare the posterior distributions (after updating with an observation) of the hybrid filter, EnKF, and particle filter, considering the particle filter posterior as an approximation of the true Bayesian posterior.

In addition, since this is a synthetic-truth experiment, we have the true state of the system available and can therefore calculate the errors between the means of the filters and the truth (since the filter means are often used as the estimate of the state). We can also include the truth in the investigation of the posterior distributions, to see where the truth lies within the support of the distributions. For example, there are some situations in which none of the filter means estimates the truth well; however, the support of the distributions from the particle filter and hybrid filter includes the truth, while the distribution from the EnKF assigns zero probability to the true state. We will also see that the hybrid filter is able to capture the non-Gaussian distributions arising from nonlinearities in the drifter trajectories better than the EnKF does. Finally, we show that the hybrid filter is also better able to track the true state as a function of assimilation step than the EnKF.

4.1 Model Description

We apply the particle filter, ensemble Kalman filter, and hybrid filter to the linear shallow water equations with a single drifter. This model, and the decomposed solution given below, are based on [51] and were used as a test problem in [5]. Derived from the Navier-Stokes equations under certain assumptions and approximations, the linear shallow water equations describe the time evolution of the horizontal velocity $u$, the meridional velocity $v$, and the offset from the mean height field $h$, and are given by

$$\frac{\partial u}{\partial t} = v - \frac{\partial h}{\partial x}, \qquad \frac{\partial v}{\partial t} = -u - \frac{\partial h}{\partial y}, \qquad \frac{\partial h}{\partial t} = -\frac{\partial u}{\partial x} - \frac{\partial v}{\partial y}. \qquad (4.1)$$

For simplicity, we use periodic boundary conditions so that explicit solutions to this model can be found as sums of Fourier modes:

$$u(x, y, t) = l \sin(kx) \cos(ly)\, u_0 + \cos(my)\, u_1(t),$$
$$v(x, y, t) = -k \cos(kx) \sin(ly)\, u_0 + \cos(my)\, v_1(t), \qquad (4.2)$$
$$h(x, y, t) = -\sin(kx) \sin(ly)\, u_0 + \sin(my)\, h_1(t),$$

where the Fourier amplitudes solve linear ordinary differential equations. (See Appendix B for a derivation of the linear shallow water equations and these solutions.)

We will consider a perturbed version of this system, so that the flow is no longer in the Gaussian regime. These equations include noise $\eta \sim \mathcal{N}(0, Q)$ (independent across the three variables), periodic forcing $\cos(t) F$, and damping $\delta$, where each of the variables $\eta$, $F$, and $\delta$ has three components: $\eta = [\eta^{(1)}, \eta^{(2)}, \eta^{(3)}]$, etc. The perturbed amplitude equations are

$$\dot{u}_0 = 0,$$
$$\dot{u}_1 = v_1 + \eta^{(1)} + \cos(t) F^{(1)} - \delta^{(1)} \operatorname{sign}(u_1),$$
$$\dot{v}_1 = -u_1 - m h_1 + \eta^{(2)} + \cos(t) F^{(2)} - \delta^{(2)} \operatorname{sign}(v_1), \qquad (4.3)$$
$$\dot{h}_1 = m v_1 + \eta^{(3)} + \cos(t) F^{(3)} - \delta^{(3)} \operatorname{sign}(h_1).$$

Then, the position of the drifter $x^D = (x, y)$ solves

$$\dot{x} = u(x, y, t), \qquad \dot{y} = v(x, y, t). \qquad (4.4)$$

In particular, even if the flow evolution given in Equations (4.1) is linear in $(u, v, h)$, the drifter evolution (4.4) will be nonlinear in $(x, y)$ unless $(u, v, h)$ is constant. In this model, the Eulerian variables of interest are $x^F = (u_0, u_1, v_1, h_1) := (x^F_1, x^F_2, x^F_3, x^F_4)$, whereas the drifter variables are $x^D = (x, y)$. We observe a noisy measurement of $(x, y)$, for which the covariance is assumed to be

$$R = \sigma_R^2 \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} \qquad (4.5)$$

for some scalar $\sigma_R^2$.
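As an illustration of the forward model, the following sketch advances the amplitude equations (4.3) with an Euler-Maruyama step and advects a drifter with the velocity field (4.2); the parameter values and step size are arbitrary choices for demonstration, and the signs follow the reconstruction above.

```python
import numpy as np

k = l = m = 1
Q, F, delta = 0.01, np.array([0.2, 1.0, 1.0]), np.array([0.7, 0.1, 0.1])

def velocity(u0, u1, v1, pos):
    """Velocity field (4.2) evaluated at the drifter position."""
    x, y = pos
    u = l * np.sin(k * x) * np.cos(l * y) * u0 + np.cos(m * y) * u1
    v = -k * np.cos(k * x) * np.sin(l * y) * u0 + np.cos(m * y) * v1
    return np.array([u, v])

def step(state, pos, t, dt, rng):
    """One Euler-Maruyama step of Eqs. (4.3)-(4.4); state = (u0,u1,v1,h1)."""
    u0, u1, v1, h1 = state
    eta = rng.normal(0.0, np.sqrt(Q * dt), size=3)
    du1 = (v1 + np.cos(t) * F[0] - delta[0] * np.sign(u1)) * dt + eta[0]
    dv1 = (-u1 - m * h1 + np.cos(t) * F[1] - delta[1] * np.sign(v1)) * dt + eta[1]
    dh1 = (m * v1 + np.cos(t) * F[2] - delta[2] * np.sign(h1)) * dt + eta[2]
    new_state = np.array([u0, u1 + du1, v1 + dv1, h1 + dh1])
    new_pos = pos + velocity(u0, u1, v1, pos) * dt
    return new_state, new_pos

rng = np.random.default_rng(6)
state = np.array([1.0, 0.5, 0.9, 1.0])
pos, dt = np.array([np.pi / 2, np.pi]), 0.01
for n in range(1000):                      # integrate to t = 10
    state, pos = step(state, pos, n * dt, dt, rng)
```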

All experiments in this chapter and the following chapter fall under the twin-experiment setup. Generally, this means that the same model used to generate the truth is also used in the forecast step of the data assimilation method (although the realizations of the model noise will differ). Specifically, a true initial condition (say $x^{true}(0)$) is chosen for both the flow and the drifter. These are evolved forward together under the model, for the length of time that the assimilation algorithm will run, to generate the true flow evolution and the true drifter trajectory. The observations are then determined by sampling the true drifter position at given discrete points in time and adding random noise with mean 0 and covariance $R$. Thus, we know that the true distribution of the observation errors is the same as that used in the assimilation algorithm, which is often not the case in operational data assimilation.

The truth and observations constitute one side of the twin experiment, and the other side is the assimilation algorithm. Specifically, we return to the true initial condition and initialize our data assimilation ensemble by drawing samples from a distribution with mean $x^{true}(0) + \text{offset}$, for some offset from the mean, and some covariance, say $P$. We then use the same model as was used to generate the truth to evolve the ensemble forward to the first observation time, and assimilate the observation. We therefore have the true value of the flow and the drifter available at every observation time, to which we can compare our assimilated results.

For the following experiments, we estimate the Fourier amplitudes as the flow variables. Since we are only estimating a relatively small number of variables, the particle filter is tractable. With enough particles, it can also be assumed to provide an approximation to the true Bayesian posterior distribution, since it captures all non-Gaussian behavior. In particular, this choice of system makes it easy to compare the marginal distributions of the flow from each filter graphically. In some experiments we will also compare the errors between the mean of each filter and the truth as a function of time, for the flow and drifter variables separately.
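A minimal sketch of the twin-experiment scaffolding; the placeholder forward model, offset, initial covariance, and observation schedule below are illustrative stand-ins for the scenario-specific values given in the following sections.

```python
import numpy as np

rng = np.random.default_rng(7)

def model_step(state, pos, t, dt):
    """Placeholder forward model; in the experiments this is the integrator
    for Eqs. (4.3)-(4.4) sketched above."""
    return state, pos + dt * np.array([np.cos(t), np.sin(t)])

sigma_R2 = 0.1
R = sigma_R2 * np.eye(2)
dt, obs_every, n_steps = 0.01, 10, 1000

# One side of the twin experiment: synthetic truth and noisy observations.
state, pos = np.array([1.0, 0.5, 0.9, 1.0]), np.array([np.pi / 2, np.pi])
truth, observations = [], []
for n in range(n_steps):
    state, pos = model_step(state, pos, n * dt, dt)
    if (n + 1) % obs_every == 0:
        truth.append((state.copy(), pos.copy()))
        observations.append(pos + rng.multivariate_normal(np.zeros(2), R))

# Other side: an ensemble initialized offset from the true initial condition.
N_ens = 50
ens_flow = rng.multivariate_normal(np.array([0.5, 0.9, 1.0]) + 0.5,
                                   np.eye(3), size=N_ens)
ens_drifter = rng.multivariate_normal(np.array([np.pi / 2, np.pi]) + 0.1,
                                      0.1 * np.eye(2), size=N_ens)
```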

At a given assimilation step, these errors are calculated according to

$$\text{flow error} = \left( \sum_{m=1}^{N_F} \left( \bar{x}^F_m - x^{F,true}_m \right)^2 \right)^{1/2}, \qquad (4.6)$$

$$\text{drifter error} = \frac{1}{\sigma_R} \left( (\bar{x} - x^{true})^2 + (\bar{y} - y^{true})^2 \right)^{1/2}, \qquad (4.7)$$

where $N_F$ may be 3 or 4, depending on the scenario (described in the following subsections) and on whether or not we estimate $u_0$ in that case. In particular, note that the error on the drifter is normalized by the observation error standard deviation $\sigma_R$.

In the remainder of this chapter, we explore three scenarios: first, in Section 4.2, a single-step update in which a bimodal prior distribution is enforced; second, in Section 4.3, a long trajectory within the linear, undamped, unforced regime in which the drifter crosses through several cells; third, in Section 4.4, a long trajectory in the damped and forced regime. In scenario 1, no noise, forcing, or damping is added to the system. Figure 4.1 (left) shows a snapshot in time of the flow field in this case. (Exact parameters for each scenario are given in the subsections below.) In scenarios 2 and 3, nonzero noise is added to the evolution of the flow, and the drifter trajectories cross between several cells. The true trajectories for these cases are given in Figure 4.1 (right): black circles represent the drifter trajectory in scenario 2, with no damping or forcing, and red crosses represent the drifter trajectory in scenario 3, with damping and forcing. In these two final cases, we run the filters over much longer time windows than in the first scenario.
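These error metrics translate directly into code; a small sketch, with $\sigma_R$ and the state layout assumed as above.

```python
import numpy as np

def flow_error(flow_mean, flow_truth):
    """Eq. (4.6): Euclidean distance between the filter's flow mean and truth."""
    return np.sqrt(np.sum((np.asarray(flow_mean) - np.asarray(flow_truth)) ** 2))

def drifter_error(drifter_mean, drifter_truth, sigma_R):
    """Eq. (4.7): drifter-position error normalized by sigma_R."""
    return np.linalg.norm(np.asarray(drifter_mean) - np.asarray(drifter_truth)) / sigma_R
```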

Figure 4.1: Setup for each scenario. Left: snapshot in time of the flow field (u, v) (arrows) and height field h (shading) for scenario 1 (no noise, damping, or forcing). Right: true drifter trajectories; black circles, scenario 2 (no damping or forcing); red crosses, scenario 3 (with damping and forcing).

4.2 Scenario 1 - Single Step, Bimodal Prior

In this simple case, we consider the marginal posterior distributions on the four flow variables $u_0, u_1, v_1, h_1$ and the drifter coordinates $x$ and $y$ after a single forecast-update step of each assimilation algorithm. The particle filter update step includes Metropolis-Hastings resampling, and the hybrid filter update step includes the EnKF update on the flow variables described in Section 3.1.4. In this case, $k = l = m = 1$ and no noise, damping, or forcing is added to the system: $Q = 0$, $F = 0$, and $\delta = 0$. We let the EnKF ensemble size and the number of particles for the particle filter both be $N = 10^5$. The ensemble of flow members for the hybrid filter is $N_e = 1000$ and the number of drifter particles for each flow member is $M = 100$, so that the total number of particles in the hybrid filter is $M N_e = 10^5$. Since the dimension of the estimated state is relatively low and only one update step is performed, the particle filter distribution is taken to be an approximation of the true Bayesian posterior.

The prior distributions on each of the flow variables $u_0, u_1, v_1, h_1$ are Gaussian. The prior distribution on $x$ is also Gaussian, while the prior distribution on $y$ is bimodal, to simulate a saddle case.

Figure 4.2: Comparison of posterior distributions of the particle filter (blue dashed curve), ensemble Kalman filter (red dotted curve), and hybrid filter (green dash-dotted curve): single forecast and update step of the stationary linear shallow water equations. Bimodal prior on $y$; Gaussian priors on $u_0, u_1, v_1, h_1$.

Based on previous applications of the EnKF to a bimodal distribution, we expect the EnKF to fail to capture the true distribution of the $y$ coordinate, but we expect the hybrid filter to capture this distribution more accurately (since the algorithm uses a particle filter on the drifter variables). Indeed, in Figure 4.2, the particle filter posterior on the $y$ coordinate is highly non-Gaussian, and while the hybrid filter captures this shape, the EnKF posterior is much closer to Gaussian. The particle filter posterior on the $x$ coordinate is much closer to Gaussian, and while the EnKF posterior is more accurate than for the $y$ coordinate, it still does not quite capture the covariance of the particle filter, while the hybrid filter does. The hybrid filter and EnKF are equivalent on the flow variables, since the hybrid filter employs the EnKF update on these variables. In this case, since the flow variables evolve linearly, the EnKF posterior and particle filter posterior distributions are fairly close to each other.

4.3 Scenario 2 - Long Trajectory; Undamped, Unforced Model

Next, we test the performance of each filter in the case where a drifter passes through many cells in the flow. Within this scenario, we run experiments for two sets of observations: high-frequency observations and low-frequency observations. Here, we estimate only three flow variables $(u_1, v_1, h_1)$ using Equations (4.3), and the drifter $(x, y)$ using Equations (4.4), with wave numbers $k = l = m = 4$, model noise covariance $Q = 0.01 I$, and $F = \delta = 0$. The observation error covariance is $R = 0.1 I$. The high-frequency case uses 400 observations with $T_{final} = 10$, and the low-frequency case uses 100 observations over the same time window. The true initial conditions are $(u_0^{true}(0), u_1^{true}(0), v_1^{true}(0), h_1^{true}(0), x^{true}(0), y^{true}(0)) = (1, 0.5, 0.9, 1, \pi/2, \pi)$.

In each case, the initial ensembles for the filters are drawn from Gaussian distributions which are centered away from the truth, in order to judge whether the filters are able to recover from this initial error. The initial ensembles for the flow variables are drawn from distributions with mean $(u_1^{true}(0) + 0.5, v_1^{true}(0) + 0.5, h_1^{true}(0) + 0.5)$ and covariance $I$. The initial ensembles for the drifter variables are drawn from distributions with mean $(x^{true}(0) + 0.1, y^{true}(0) + 0.1)$ and covariance $0.1 I$. The particle filter uses ensemble size $N = 10^5$ and the EnKF uses ensemble size $N_e = 50$. The hybrid filter uses $N_e = 50$ and $M = 2000$; that is, each of the 50 flow members has 2000 drifter particles associated with it.

Figures 4.3-4.6 show the results for the high-frequency and low-frequency cases in this scenario. In particular, Figures 4.3 and 4.5 show the evolution of each of the three flow variables $u_1, v_1, h_1$ as a function of time, as well as the path of the drifter. The black lines show the true evolution, while blue, red, and green represent the time evolution of the means of the PF, EnKF, and hybrid filter, respectively.

Figures 4.4 and 4.6 show the errors of the means (of the flow and drifter variables separately) from the truth as functions of time; the vertical dashed lines denote the times at which the hybrid filter met the criterion for the EnKF update.

Figure 4.3: Scenario 2a: no damping or forcing, high observation frequency. Evolution of the particle filter (blue dashed), EnKF (red dotted), and hybrid filter (green dash-dotted) means of flow variables $u_1$ (top left), $v_1$ (top right), and $h_1$ (bottom left) as a function of assimilation step, and trajectory of the drifter over the entire assimilation window (bottom right). True evolutions are given in black.

In the high-frequency case, all three filters are able to estimate $v_1$ and $h_1$ fairly well, although the EnKF tends to overestimate the amplitude of the periodic oscillation of the flow variables. All of the filters have more difficulty estimating $u_1$, but the EnKF completely loses the true evolution of this variable. This, combined with the weaker performance of the EnKF in estimating $v_1$ and $h_1$, results in the EnKF having consistently worse flow error than the hybrid and particle filters. Although all of the filters seem to have slightly increasing error in the flow variables towards the end of the assimilation window, this is likely due to the presence of model error.

Figure 4.4: Scenario 2a: no damping or forcing, high observation frequency. Errors of the means of the particle filter (blue dashed), EnKF (red dotted), and hybrid filter (green dash-dotted) from the truth as functions of assimilation step. Vertical dashed lines represent steps at which the hybrid filter performed the EnKF update, according to the resampling threshold described in the text.

All filters have very similar behavior in terms of drifter error. There are some peaks in the drifter error plot (Figure 4.4), when the means lose track of the true drifter trajectory; however, despite the errors being similar in magnitude for all filters, the hybrid filter gives higher probability to the truth in these cases. For example, Figure 4.7 shows the posterior distributions of the drifter variables from each filter, along with the truth, at assimilation step 159 (almost halfway through the time window), when the largest peak in drifter error occurs. Clearly, though the filters capture the $y$ coordinate well, they have trouble estimating the $x$ coordinate in this case. The EnKF (red dotted curve) and particle filter (blue dashed curve) are far from the truth (vertical black line) in their estimates of the $x$ coordinate; in fact, the truth lies in the low-probability tails of each of these distributions.

Figure 4.5: Scenario 2b: no damping or forcing, low observation frequency. Evolution of the particle filter (blue dashed), EnKF (red dotted), and hybrid filter (green dash-dotted) means of flow variables $u_1$ (top left), $v_1$ (top right), and $h_1$ (bottom left) as a function of assimilation step, and trajectory of the drifter over the entire assimilation window (bottom right). True evolutions are given in black.

On the other hand, the hybrid filter distribution (green dash-dotted curve) is shifted closer to the truth, and has larger probability at the value of the truth than the EnKF or the particle filter. Thus, even though the means of all of the filters have large errors in the drifter position at this time step, the hybrid filter distribution is much closer to the truth.

In the low-frequency case, the results from above are amplified. In particular, the hybrid filter and particle filter each estimate the flow variables much more accurately than the EnKF. Figure 4.5 shows that, as before, the EnKF tends to overestimate the amplitudes of $v_1$ and $h_1$. Once again, all of the filters have difficulty estimating $u_1$, although the hybrid filter seems to have the best performance in this case.

Figure 4.6: Scenario 2b: no damping or forcing, low observation frequency. Errors of the means of the particle filter (blue dashed), EnKF (red dotted), and hybrid filter (green dash-dotted) from the truth as functions of assimilation step. Vertical dashed lines represent steps at which the hybrid filter performed the EnKF update, according to the resampling threshold described in the text.

Figure 4.6 shows the errors of each filter for the flow and drifter variables. In this case, it is clear that the EnKF has consistently larger error not only in the flow but also in the drifter estimate. Table 4.1 shows the errors of the drifter and flow estimates averaged over the entire time window for each filter, as well as the error of the flow averaged over the time window (2.5, 10) (after some spinup of the filter). In particular, the particle filter provides the best estimate of the flow on average, though the hybrid errors are similar. The EnKF has the poorest performance in all cases in scenario 2, but for a low frequency of observations, this difference becomes more apparent.

Figure 4.7: Distributions of $x$ (left) and $y$ (right) drifter variables for the particle filter (blue dashed), EnKF (red dotted), and hybrid filter (green dash-dotted); scenario 2a: no damping or forcing, high observation frequency.

Table 4.1: Root mean squared error (drifter RMSE; flow RMSE over the entire window; flow RMSE after spinup) of the EnKF, PF, and hybrid filter for high and low observation frequencies: scenario 2 (without damping or forcing).

In particular, note that good estimation of the drifter position does not guarantee good estimation of the flow. For example, in the high observation frequency case, the hybrid filter estimates the drifter position (slightly) better than the particle filter, but the particle filter estimates the flow better than the hybrid filter. This is a general trait of Lagrangian data assimilation: since the drifter positions are observed, they are often easier for the filter to estimate than the flow. However, since the flow is the state of interest, comparison of filter performance should focus on flow estimation.

4.4 Scenario 3 - Long Trajectory; Damped, Forced Model

In the final scenario, we add nonzero damping and forcing to the ODEs for the flow variables (see Equations (4.3)). This case estimates $(u_1, v_1, h_1, x, y)$ with $k = l = m = 4$, $F = [0.2, 1, 1]$, and $\delta = [0.7, 0.1, 0.1]$. As in scenario 2, $Q = 0.01 I$ and $R = 0.1 I$. The same time window and observation frequencies as in scenario 2 are used: $T_{final} = 10$, with 400 observations in the high-frequency case and 100 observations in the low-frequency case. The true initial conditions are $(u_0^{true}(0), u_1^{true}(0), v_1^{true}(0), h_1^{true}(0), x^{true}(0), y^{true}(0)) = (1, 0.5, 0.9, 1, \pi/2, \pi)$. The initial ensembles for the flow variables are drawn from Gaussian distributions with mean $(u_1^{true}(0) + 0.5, v_1^{true}(0) + 0.5, h_1^{true}(0) + 0.5)$ and covariance $I$. The initial ensembles for the drifter variables are drawn from Gaussian distributions with mean $(x^{true}(0) + 0.5, y^{true}(0) + 0.5)$ and covariance $0.5 I$. As in scenario 2, the particle filter uses ensemble size $N = 10^5$, the EnKF uses ensemble size $N_e = 50$, and the hybrid filter ensemble sizes are $N_e = 50$ and $M = 2000$.

Figures 4.8-4.11 show the results for scenarios 3a and 3b. As above, the evolutions of the means of each variable from each filter are shown, as well as the evolution of the errors of the means from the truth, for both a high frequency and a low frequency of observations. With a high frequency of observations, the EnKF takes longer to converge to the true flow state than the hybrid filter or PF does, but eventually all the filters are comparable (see Figure 4.9). This is likely due to the damping in the system, which pulls all of the evolutions towards the zero state (see Figure 4.8). However, the EnKF clearly fails in its estimation of the drifter position when the true drifter approaches a saddle in the flow. In particular, note the spike in the drifter error of the EnKF at time $t \approx 3.5$ in Figure 4.9. This corresponds to the drifter coordinate of $(0.85, 1.8)$ in the bottom right plot of Figure 4.8.

Figure 4.8: Scenario 3a: with damping and forcing, high observation frequency. Evolution of the particle filter (blue dashed), EnKF (red dotted), and hybrid filter (green dash-dotted) means of flow variables $u_1$ (top left), $v_1$ (top right), and $h_1$ (bottom left) as a function of assimilation step, and trajectory of the drifter over the entire assimilation window (bottom right). True evolutions are given in black.

Figure 4.12 shows the posterior distributions of the $x$ and $y$ coordinates of the drifter for each filter at the time of this breakdown. In particular, although each filter does fairly well at estimating the $x$ coordinate, the EnKF fails to capture the true $y$ coordinate (the truth is not even contained in the support of the EnKF posterior at this time).

In the case of a low frequency of observations, however, the EnKF does not converge to the true flow at all within the time window. In addition, the EnKF estimate of the drifter trajectory is severely degraded when the drifter first crosses between cells (in the time window (3.5, 5) in Figure 4.11). This occurs near the saddle point around $(0.85, 1.8)$, as in scenario 3a, although in scenario 3b the error is much more pronounced and lasts for a longer period of time.

Figure 4.9: Scenario 3a: with damping and forcing, high observation frequency. Errors of the means of the particle filter (blue dashed), EnKF (red dotted), and hybrid filter (green dash-dotted) from the truth as functions of assimilation step. Vertical dashed lines represent steps at which the hybrid filter performed the EnKF update, according to the resampling threshold described in the text.

This is exactly the case in which we would expect the EnKF performance to degrade and the hybrid filter to provide a large improvement: the drifter trajectory is highly nonlinear, the evolution of the flow is somewhat nonlinear, and the time between observations is long. Once again, the poorest estimation by all of the filters is of $u_1$. The over-damping of the filter estimates of $u_1$ in Figure 4.8 may be due to the size of the damping/forcing used in this system. Another possibility is that the truth in this case is a draw from a probability space of time trajectories, for which the filter means provide the (zero-state) mean of this space.

Table 4.2 gives the averaged errors of each filter for scenario 3. As above, the table includes the errors for the drifter and flow over the entire window, as well as the errors for the flow over the time window (2.5, 10).

Figure 4.10: Scenario 3b: with damping and forcing, low observation frequency. Evolution of the particle filter (blue dashed), EnKF (red dotted), and hybrid filter (green dash-dotted) means of flow variables $u_1$ (top left), $v_1$ (top right), and $h_1$ (bottom left) as a function of assimilation step, and trajectory of the drifter over the entire assimilation window (bottom right). True evolutions are given in black.

On average, the EnKF has the poorest performance, and the particle filter generally has the best performance. In particular, these errors show the extent to which the EnKF breaks down in the low-frequency case.

4.5 Discussion

The numerical experiments presented in this chapter demonstrate that the hybrid filter consistently outperforms the ensemble Kalman filter and often performs on par with the posterior densities estimated by the particle filter when the flow is low-dimensional and nearly linear. In these experiments, the hybrid filter estimated the full posterior distribution much more accurately than the EnKF.

Figure 4.11: Scenario 3b: with damping and forcing, low observation frequency. Errors of the means of the particle filter (blue dashed), EnKF (red dotted), and hybrid filter (green dash-dotted) from the truth as functions of assimilation step. Vertical dashed lines represent steps at which the hybrid filter performed the EnKF update, according to the resampling threshold described in the text.

In this case, the hybrid filter estimated the full posterior distribution much more accurately than the EnKF. Many applications involve sampling from this posterior in order to get a sense of different possible outcomes, as well as the variability among them; thus, an incorrect posterior distribution would result in incorrect samples (even if the distribution has the correct mean and covariance). Therefore, in cases where the true posterior distribution is highly non-Gaussian, the EnKF will likely give poor results regardless of algorithmic improvements such as covariance inflation. In the cases shown here, the hybrid filter overcame this problem and yielded posterior distributions which more closely represented those of the particle filter.

When damping and forcing were added to the dynamical system on the Fourier amplitudes of the linear shallow water equations, the hybrid filter showed an improvement in estimation over the EnKF. In particular, when the time between observations became long, the EnKF broke down while the mean of the hybrid filter provided accurate estimates of the truth. This is precisely the case that motivated the development of the hybrid filter, as drifter path nonlinearity is hard to avoid when the time between observations is long.
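The hybrid updates marked by the vertical lines in Figures 4.9 and 4.11 are triggered by a weight-degeneracy threshold. As a hedged illustration only (the function names and the one-half threshold are our own choices, not necessarily the settings used in these experiments), the standard effective-sample-size diagnostic looks like:

```python
import numpy as np

def effective_sample_size(weights):
    """Kong-Liu-Wong effective sample size of (re)normalized particle weights."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    return 1.0 / np.sum(w**2)

def trigger_hybrid_update(weights, frac=0.5):
    """Flag an EnKF-style update/resampling step when the effective
    sample size drops below a fixed fraction of the ensemble size."""
    return effective_sample_size(weights) < frac * len(weights)
```

When the weights are nearly uniform the effective sample size is close to the ensemble size; as the weights collapse onto a few particles it drops toward one, which is the regime in which the hybrid filter switches to the EnKF update.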

Figure 4.12: Distributions of the x (left) and y (right) drifter variables for the particle filter (blue dashed), EnKF (red dotted), and hybrid filter (green dash-dotted); scenario 3a: with damping and forcing, high observation frequency.

Table 4.2: Root mean squared error of each filter over the assimilation window: scenario 3 (with damping and forcing). (Numerical entries were lost in transcription.)

observation frequency   method   drifter RMSE   flow RMSE (entire window)   flow RMSE (after spinup)
high                    EnKF
high                    PF
high                    hybrid
low                     EnKF
low                     PF
low                     hybrid

While the linear shallow water equations were a useful model for comparing the ability of each filter to approximate the Bayesian posterior distributions, we are also interested in the more realistic case of a nonlinear, high-dimensional flow model. To this end, we will apply the hybrid filter and ensemble Kalman filter to the nonlinear shallow water equations in the next chapter. In this case, we will be unable to

implement the particle filter, and thus we will not be able to judge which filter best approximates the true Bayesian posterior distribution. However, we will still be able to compare the means of each filter to the true state of the system.

Chapter Five

Application to the Nonlinear Shallow Water Equations

5.1 Model Description

To test the hybrid method on a fully nonlinear, high-dimensional flow variable, we use the nonlinear shallow water equations as a model; see [25, 13, 59]. As with the linear shallow water equations, the flow variables u(x, y, t), v(x, y, t), and h(x, y, t) denote, respectively, the zonal velocity, meridional velocity, and height offset at location (x, y) and time t. Assume we are on a domain of size L_x × L_y. Without forcing or dissipation, these variables evolve according to

$$\begin{aligned}
\frac{\partial u}{\partial t} + u\frac{\partial u}{\partial x} + v\frac{\partial u}{\partial y} - fv &= -g\frac{\partial h}{\partial x}\\
\frac{\partial v}{\partial t} + u\frac{\partial v}{\partial x} + v\frac{\partial v}{\partial y} + fu &= -g\frac{\partial h}{\partial y}\\
\frac{\partial h}{\partial t} + \frac{\partial}{\partial x}(hu) + \frac{\partial}{\partial y}(hv) &= 0
\end{aligned} \tag{5.1}$$

where f is the Coriolis parameter (arising from the motion of a fluid on a rotating plane) and g is the gravitational acceleration. A derivation of these equations from first principles is provided in Appendix A.

In this chapter, we attempt to reproduce the results of [57], which will be discussed in detail in the following section. Here, we present the version of the shallow water equations used in that study, which includes wind forcing and dissipation due to viscosity in the velocity fields. The resulting model is

$$\begin{aligned}
\frac{\partial u}{\partial t} &= -u\frac{\partial u}{\partial x} - v\frac{\partial u}{\partial y} + fv - g'\frac{\partial h}{\partial x} + F_u + \frac{1}{h}\left(\frac{\partial \tau_{xx}}{\partial x} + \frac{\partial \tau_{xy}}{\partial y}\right)\\
\frac{\partial v}{\partial t} &= -u\frac{\partial v}{\partial x} - v\frac{\partial v}{\partial y} - fu - g'\frac{\partial h}{\partial y} + \frac{1}{h}\left(\frac{\partial \tau_{yx}}{\partial x} + \frac{\partial \tau_{yy}}{\partial y}\right)\\
\frac{\partial h}{\partial t} &= -\frac{\partial}{\partial x}(hu) - \frac{\partial}{\partial y}(hv)
\end{aligned} \tag{5.2}$$

where g' is the reduced gravity, F_u is a horizontal wind forcing in the zonal (x) direction, and f is the Coriolis parameter. The dissipation terms are given by

$$\tau_{ij} = \mu h\left(\frac{\partial u_i}{\partial x_j} + \frac{\partial u_j}{\partial x_i} - \delta_{ij}\frac{\partial u_k}{\partial x_k}\right) \tag{5.3}$$

for all combinations of i and j, where the indices range over the components x and y, and µ is a constant eddy viscosity. This form of the dissipation term leads to a self-consistent formulation of the shallow water equations (i.e., it satisfies energy and momentum conservation principles) [59]. The β-plane approximation is used for the Coriolis term. This plane spans several degrees of latitude and is the setting in which large wave formations are studied [13]; in particular, the Coriolis parameter is Taylor expanded and approximated as

$$f = f_0 + \beta y. \tag{5.4}$$

The wind forcing is

$$F_u = \frac{\tau_o}{\rho H_o(t)}\cos(2\pi y / L_y), \tag{5.5}$$

$$H_o(t) = \frac{1}{L_x L_y}\int_0^{L_x}\!\!\int_0^{L_y} h(x, y, t)\,dx\,dy, \tag{5.6}$$

where τ_o is the wind stress, ρ is the density of water, and H_o(t) is the average water depth as a function of time.

In this case, instead of parameterizing the flow variables (u, v, h), we discretize each variable over the domain. In these experiments, we use a Chebyshev grid with N_x × N_y grid points, with N_x = N_y = 50. Thus, the flow variable x_F = (u, v, h)^T has dimension N_F = 3 · 50 · 50 = 7500, which is too high a dimension to apply the particle filter.
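To make the dynamics concrete, here is a minimal sketch of one way to evaluate and step the unforced system (5.1). It assumes, purely for illustration, a uniform doubly periodic grid with centered differences; the experiments below instead use a Chebyshev spectral discretization with the boundary conditions of the next section, and all function names here are our own.

```python
import numpy as np

def swe_rhs(u, v, h, dx, dy, f, g):
    """Right-hand side of the unforced shallow water equations (5.1),
    using second-order centered differences; arrays are indexed [i, j]
    with i the x-direction and j the y-direction, periodic for simplicity."""
    def ddx(q):
        return (np.roll(q, -1, axis=0) - np.roll(q, 1, axis=0)) / (2.0 * dx)
    def ddy(q):
        return (np.roll(q, -1, axis=1) - np.roll(q, 1, axis=1)) / (2.0 * dy)
    du = -u * ddx(u) - v * ddy(u) + f * v - g * ddx(h)
    dv = -u * ddx(v) - v * ddy(v) - f * u - g * ddy(h)
    dh = -ddx(h * u) - ddy(h * v)
    return du, dv, dh

def rk4_step(u, v, h, dt, dx, dy, f, g):
    """One classical fourth-order Runge-Kutta step for the state (u, v, h)."""
    k1 = swe_rhs(u, v, h, dx, dy, f, g)
    k2 = swe_rhs(u + 0.5*dt*k1[0], v + 0.5*dt*k1[1], h + 0.5*dt*k1[2], dx, dy, f, g)
    k3 = swe_rhs(u + 0.5*dt*k2[0], v + 0.5*dt*k2[1], h + 0.5*dt*k2[2], dx, dy, f, g)
    k4 = swe_rhs(u + dt*k3[0], v + dt*k3[1], h + dt*k3[2], dx, dy, f, g)
    un = u + dt/6.0 * (k1[0] + 2*k2[0] + 2*k3[0] + k4[0])
    vn = v + dt/6.0 * (k1[1] + 2*k2[1] + 2*k3[1] + k4[1])
    hn = h + dt/6.0 * (k1[2] + 2*k2[2] + 2*k3[2] + k4[2])
    return un, vn, hn
```

Each Runge-Kutta step advances the full discretized flow state; in the experiments of this chapter the spatial derivatives are instead computed with Chebyshev differentiation matrices, sketched later in this section.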

Therefore, we will only consider the ensemble Kalman filter and the hybrid filter in this section.

Finally, assuming we have N_D drifters, we define the drifter variable as x_D = (x^{(1)}, y^{(1)}, ..., x^{(N_D)}, y^{(N_D)}). For j = 1, ..., N_D, the drifters evolve according to the flow:

$$\dot{x}^{(j)} = u(x^{(j)}, y^{(j)}, t) \tag{5.7}$$
$$\dot{y}^{(j)} = v(x^{(j)}, y^{(j)}, t) \tag{5.8}$$

which will be numerically implemented using a pseudospectral Chebyshev algorithm, described below.

5.2 Previous Results on Drifter Deployment

Salman et al. [57] investigate the effects of different drifter deployments (based on knowledge of the flow field) on the performance of the EnKF. Since the true flow field was known a priori, it was possible to release the drifters so as to target certain structures in the flow. The authors considered four deployment strategies in their work: first, a uniform release across the entire domain; second, a release targeting saddle points (intersections of attracting and repelling material lines) in the flow; third, a release targeting vortex centers; and fourth, a mixed release targeting both saddle points and vortex centers. The true trajectories of the drifters for each of these strategies, over a period of 300 days, are given in Figure 5.1. For each of these experiments, nine drifters were released, and an ensemble of size 80 was used to estimate the drifter positions, the kinetic energy (a function of u and v), and the height field h.

The assimilation method used was the perturbed-observation ensemble Kalman filter with Gaspari-Cohn localization (discussed in Chapter 2) with a radius of 600 km.

Figure 5.1: Trajectories of the true drifters, over a period of 300 days, for each launch strategy. Light gray lines are lines of constant height. Figure from [57].

They used the shallow water model with wind forcing and diffusion described above, with boundary and initial conditions given by

$$\begin{aligned}
u(x, y, t)\big|_{\partial\Omega} &= 0\\
v(x, y, t)\big|_{\partial\Omega} &= 0\\
u(x, y, 0) &= 0\\
v(x, y, 0) &= 0\\
h(x, y, 0) &= H_o
\end{aligned} \tag{5.9}$$

where ∂Ω is the boundary of the domain. They used a uniform mesh with N_x = N_y = 100 grid points to discretize the flow. The exact parameters used in this study are provided in Table 5.1. The ensemble members each had randomly perturbed initial height fields, with mean 50 m and variance 50 m². The truth run and ensemble members were all spun up (evolved forward for a fixed amount of time); after spinup, the drifters were released in the truth run to generate the true drifter trajectories.
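For reference, here is a minimal sketch of the standard fifth-order Gaspari-Cohn taper used for this kind of localization. The vectorized function and its interface are our own; the piecewise polynomial is the widely used Gaspari and Cohn form, with correlations cut off exactly at twice the length scale c.

```python
import numpy as np

def gaspari_cohn(dist, c):
    """Gaspari-Cohn compactly supported correlation function.
    dist: array of separation distances; c: localization length scale.
    Returns taper weights in [0, 1] that vanish for dist >= 2c."""
    r = np.abs(np.asarray(dist, dtype=float)) / c
    w = np.zeros_like(r)
    inner = r <= 1.0
    outer = (r > 1.0) & (r < 2.0)
    ri = r[inner]
    w[inner] = (-0.25*ri**5 + 0.5*ri**4 + 0.625*ri**3
                - (5.0/3.0)*ri**2 + 1.0)
    ro = r[outer]
    w[outer] = ((1.0/12.0)*ro**5 - 0.5*ro**4 + 0.625*ro**3
                + (5.0/3.0)*ro**2 - 5.0*ro + 4.0 - (2.0/3.0)/ro)
    return w
```

The taper weight multiplies the sample covariance between each drifter observation and each grid point, suppressing long-range correlations that a finite ensemble estimates poorly; how the quoted 600 km radius maps onto c versus the 2c cutoff is a convention choice we do not pin down here.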

Table 5.1: Parameter values used in the nonlinear shallow water equations in [57]. (Entries marked "lost" could not be recovered in transcription.)

Parameter   Value
L_x         2000 km
L_y         2000 km
f_0         (lost) s^{-1}
β           (lost) m^{-1} s^{-1}
H_o         (lost) m
g'          0.02 m s^{-2}
ρ           1000 kg m^{-3}
τ_o         (lost) N m^{-2}
Δx          20 km
Δy          20 km
Δt          12 min
µ           400 m² s^{-1}

Random noise with variance σ²_obs = 200 m² was then added to the true drifter trajectories to generate noisy observations. They then assimilated 300 observations, and calculated the errors of the mean from the truth at each assimilation step according to:

$$\text{KE error} = \left(\frac{\displaystyle\sum_{i=1,j=1}^{N_x-1,\,N_y-1} (\bar{u}_{i,j} - u^t_{i,j})^2 + (\bar{v}_{i,j} - v^t_{i,j})^2}{\displaystyle\sum_{i=1,j=1}^{N_x,\,N_y} (u^t_{i,j})^2 + (v^t_{i,j})^2}\right)^{1/2} \tag{5.10}$$

$$\text{h error} = \left(\frac{\displaystyle\sum_{i=1,j=1}^{N_x-1,\,N_y-1} (\bar{h}_{i,j} - h^t_{i,j})^2}{\displaystyle\sum_{i=1,j=1}^{N_x,\,N_y} (h^t_{i,j})^2}\right)^{1/2} \tag{5.11}$$

$$x_D\ \text{error} = \left(\frac{\displaystyle\sum_{j=1}^{N_D} (\bar{x}^{(j)} - x^{t,(j)})^2 + (\bar{y}^{(j)} - y^{t,(j)})^2}{\sigma^2_{obs}\, N_D}\right)^{1/2} \tag{5.12}$$

where the overline represents the filter mean, the superscript t denotes the truth, and the subscript i, j denotes the value of the variable at grid point (i, j).
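A hedged sketch of how the diagnostics (5.10)-(5.12) can be computed follows; the function name and array conventions are ours, and for simplicity all sums run over the full grid rather than the slightly different interior/full limits in (5.10)-(5.11).

```python
import numpy as np

def filter_errors(u_m, v_m, h_m, xd_m, u_t, v_t, h_t, xd_t, sigma2_obs):
    """Normalized errors of the filter mean against the truth.
    u_m, v_m, h_m: filter-mean fields; u_t, v_t, h_t: true fields.
    xd_m, xd_t: drifter positions, shape (N_D, 2); sigma2_obs: obs variance."""
    # Kinetic-energy error (5.10): velocity misfit, normalized by true KE.
    ke_err = np.sqrt(np.sum((u_m - u_t)**2 + (v_m - v_t)**2)
                     / np.sum(u_t**2 + v_t**2))
    # Height error (5.11): height misfit, normalized by the true field.
    h_err = np.sqrt(np.sum((h_m - h_t)**2) / np.sum(h_t**2))
    # Drifter error (5.12): position misfit in units of the obs spread.
    n_d = xd_t.shape[0]
    xd_err = np.sqrt(np.sum((xd_m - xd_t)**2) / (sigma2_obs * n_d))
    return ke_err, h_err, xd_err
```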

The results from this study are given in Figure 5.2. The authors found that the mixed (saddle and center) deployment estimated the kinetic energy of the truth the best. However, two different strategies were able to predict the height field the best: the uniform release and the release that targeted only the saddle points. Every deployment was able to track the drifter trajectories well.

Figure 5.2: Errors in (a) kinetic energy, (b) height field, and (c) drifter position of each of the deployment strategies, as well as the errors without assimilation. Errors are calculated as described in the text. Figure from [57].

To attempt to explain this difference, the authors consider the spatial errors of kinetic energy and height field over the domain. They find that the kinetic energy errors are mainly localized around the areas which are very active (e.g., the western boundary current near the unsteady meandering jet) while the height field errors are mainly on the eastern side of the domain. Therefore, to estimate the velocity field well, the drifters need to target areas of high activity in the flow; on the other hand, to estimate the height field well, the drifters need to be evenly spread across the entire domain. This explains why the deployment which targets different structures in the flow estimates the kinetic energy well, while the deployments which ensure the drifters spread out across the entire domain (uniform and saddle) estimate the height field well. However, it was not explained why the kinetic energy errors are localized to areas of high activity and the height field errors are not.

One possible explanation for this could revolve around the fact that the height

field is a global quantity, while the velocity field is a local quantity. In particular, note that the shallow water equations do not ever make assumptions that would violate the incompressibility of the fluid; thus, if the height field decreases at one location in the fluid, this must be accounted for elsewhere in the domain. We now explore this in a more rigorous setting.

First, based on a simple derivation with the Navier-Stokes equations, we show that the height field is a global quantity. We note that the Navier-Stokes equation for the motion of a fluid, combined with the continuity equation arising from conservation of mass, may be rewritten so that the pressure is the inverse Laplacian of a function of the velocity. Since the Laplacian is a global operator, this implies that the pressure is a global quantity over the domain. The Navier-Stokes equation for velocity v, pressure p, density ρ, viscosity coefficient ν, and other body forces f is

$$\rho\left(\mathbf{v}_t + \mathbf{v}\cdot\nabla\mathbf{v}\right) = -\nabla p + \nu\Delta\mathbf{v} + \mathbf{f} \tag{5.13}$$

and the associated continuity equation is

$$\rho_t + \nabla\cdot(\rho\mathbf{v}) = 0. \tag{5.14}$$

Assume (as we do in Appendix A when deriving the shallow water equations) that the density is approximately constant in space and time, so that $\rho_t = 0$, and without loss of generality assume ρ = 1. Then the continuity equation implies $\nabla\cdot\mathbf{v} = 0$ and it can be shown that the pressure is the inverse Laplacian of a function of the

velocity. Taking the divergence of the Navier-Stokes equation, we have

$$\begin{aligned}
\nabla\cdot\left(\mathbf{v}_t + \mathbf{v}\cdot\nabla\mathbf{v}\right) &= \nabla\cdot\left(-\nabla p + \nu\Delta\mathbf{v} + \mathbf{f}\right)\\
\Rightarrow\quad (\nabla\cdot\mathbf{v})_t + \nabla\cdot(\mathbf{v}\cdot\nabla\mathbf{v}) + \Delta p &= \nabla\cdot(\nu\Delta\mathbf{v} + \mathbf{f})\\
\Rightarrow\quad \Delta p &= \nabla\cdot\left(\nu\Delta\mathbf{v} - \mathbf{v}\cdot\nabla\mathbf{v} + \mathbf{f}\right) - (\nabla\cdot\mathbf{v})_t\\
\Rightarrow\quad \Delta p &= \nabla\cdot\left(\nu\Delta\mathbf{v} - \mathbf{v}\cdot\nabla\mathbf{v} + \mathbf{f}\right),
\end{aligned} \tag{5.15}$$

where the last step uses $\nabla\cdot\mathbf{v} = 0$. Under the further assumptions made when deriving the shallow water equations, the forcing term f will include the Coriolis forces on the velocity, but this does not change the basic derivation. Next, recall (from Appendix A) that under the assumptions that the fluid is homogeneous, the pressure is constant above the fluid surface, and the bottom surface is flat, the pressure is proportional to the height h. Note that the last assumption we needed to derive the shallow water equations was that the flow is barotropic; this assumption, that the velocities are independent of the height of the fluid, does not affect the above derivation. Thus, in the shallow water setup, the height field h is the inverse Laplacian of a function of the velocity field, which implies that the height field is a global quantity, as we wished to show.

On the other hand, the velocity field is a local quantity. In addition, because of the added wind forcing and viscosity, the kinetic energy is not necessarily conserved. This could explain why the uniform launch, and eventually the saddle launch, predict the height field better than the other methods, while they do not predict the kinetic energy as well. Observations are needed across the entire flow domain (either by spreading them out initially, or by targeting saddle points which cause the drifters to quickly explore a large space) in order to correctly constrain the height field globally. They are not needed within every vortex center, since the height in those areas will be constrained by the model to conserve mass (a numerical illustration of this globality follows).
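To make the globality of the inverse Laplacian concrete, the following sketch (ours, and purely illustrative; it uses a periodic FFT-based Poisson solver rather than the Chebyshev/Dirichlet setup of this chapter) shows that a source confined to a single grid cell perturbs the solution of $\Delta p = s$ at every point of the domain.

```python
import numpy as np

def solve_poisson_periodic(s):
    """Solve Laplacian(p) = s on a periodic unit square via FFT.
    The zero mode is set to 0, fixing the arbitrary additive constant."""
    n = s.shape[0]
    k = 2.0 * np.pi * np.fft.fftfreq(n, d=1.0 / n)  # angular wavenumbers
    kx, ky = np.meshgrid(k, k, indexing="ij")
    denom = -(kx**2 + ky**2)
    denom[0, 0] = 1.0                  # avoid divide-by-zero at the zero mode
    p_hat = np.fft.fft2(s) / denom
    p_hat[0, 0] = 0.0                  # enforce zero-mean solution
    return np.real(np.fft.ifft2(p_hat))

# A source bump confined to one cell perturbs p at every grid point:
n = 64
s = np.zeros((n, n))
s[10, 10] = 1.0
s -= s.mean()                          # zero-mean source for solvability
p = solve_poisson_periodic(s)
print(np.abs(p[n - 1, n - 1]) > 0)     # True: the far corner feels the bump
```

The discrete Green's function of the Laplacian has global support, so even the grid point farthest from the source registers a nonzero response, mirroring the argument above for the height field.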

However, the same is not true for the kinetic energy. If there are no observations within a vortex, the model will not help to constrain the velocity in that area towards the correct value. This could explain why the mixed deployment, with some drifters spreading out across the domain and some targeting vortex centers, performs the best at predicting the kinetic energy. In fact, for the same reason as above, we may not expect the height field to be predicted as well as it was in [57] in general, without drifter observations covering the entire domain.

In addition, because of the global nature of the height field, localization may actually hurt the results more than it helps in this case. The localization in [57] depended only on the distance from the grid point to the drifter, and not on whether the variable was the zonal velocity, meridional velocity, or height field. Thus, in the following experiments, we have implemented the same localization. However, this may suppress correlations that are not spurious artifacts of the finite ensemble size, but real correlations due to the nature of the flow. This would suggest that the localization radius should perhaps be larger for covariances involving the height field variables than for the velocity field. This could also explain why we might expect to see a more drastic improvement in kinetic energy after assimilation than we would for the height field.

5.3 Results

In this section, we test both the EnKF and the hybrid filter on the nonlinear shallow water equations, for different drifter deployments as in [57]. Before presenting the results, we first describe the setup and how it differs from the one used in the previous investigation.

5.3.1 Numerical Implementation

In this experiment, we wanted to test the hybrid particle-ensemble Kalman filter on a fully nonlinear system with a high-dimensional flow part, which is a more typical situation in Lagrangian data assimilation. Therefore, we tested both the EnKF and the hybrid filter using the same model as in [57], with the same parameter values for the shallow water equations presented above. However, the authors of [57] used a central differencing scheme for the evolution of the velocity and a multidimensional positive definite advection transport algorithm for the evolution of the height field, whereas we will use a Chebyshev spectral algorithm (see [67] for details). In particular, we discretize the flow on a Chebyshev grid of size N_x × N_y where N_x = N_y = 50. The Chebyshev grid is not uniform; rather, the grid points are clustered more tightly near the boundaries. This is appropriate for this problem, since the most active areas of the flow tend to be near the boundaries. In addition, the decomposition into spectral components on Chebyshev points allows us to deal with the Dirichlet boundary conditions more easily. To evolve the flow variables (u, v, h), we use a fourth-order Runge-Kutta method in physical space, and calculate the spatial derivatives of each variable with Chebyshev differentiation matrices. We advect the drifter variables using a fourth-order Runge-Kutta method as well, and we calculate the velocity (u, v) at the exact drifter location via a spectral decomposition. For the EnKF, we used N_e = 50 and T_final = 300 days, and for the hybrid filter we used N_e = 50, M = 50, and T_final = 40 days (due to computational limits). We used Gaspari-Cohn localization with a radius of 600 km (as above) for both the EnKF and hybrid filter experiments. We also attempted to use the same drifter deployment strategies, but since our numerical implementation of the flow is different from the one used in [57], the true flow field and drifter trajectories may be slightly different.
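For reference, here is a minimal NumPy sketch of the standard Chebyshev differentiation matrix; the construction follows Trefethen's well-known cheb routine [67], and the port and names are ours.

```python
import numpy as np

def cheb(n):
    """Chebyshev differentiation matrix D and grid x (Trefethen's cheb).
    x holds the n+1 Chebyshev points on [-1, 1]; D @ f(x) approximates
    f'(x) spectrally for smooth f."""
    if n == 0:
        return np.zeros((1, 1)), np.array([1.0])
    x = np.cos(np.pi * np.arange(n + 1) / n)
    c = np.hstack([2.0, np.ones(n - 1), 2.0]) * (-1.0) ** np.arange(n + 1)
    dx = x[:, None] - x[None, :]
    d = np.outer(c, 1.0 / c) / (dx + np.eye(n + 1))
    d -= np.diag(d.sum(axis=1))        # diagonal entries: negative row sums
    return d, x

# Example: differentiate sin(x) on 50 Chebyshev points
d, x = cheb(50)
print(np.max(np.abs(d @ np.sin(x) - np.cos(x))))  # ~1e-13
```

Derivatives in x and y on the tensor-product grid are then obtained by applying D along the corresponding array axis, after mapping [-1, 1] to the physical domain.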

In particular, we used the same initial release coordinates regardless of whether they still match exactly with the structures; however, the trajectories in our study behave fairly similarly to those in [57]. Figure 5.3 shows a snapshot of the height field h; shading represents height. Figure 5.5 shows the evolution of the velocity field, sampled every 25 days. Note that, as in the previous study, there is one major vortex in the southwestern quadrant of the domain, and several other vortices north of that one, causing a meandering jet which flows from west to east. Additionally, note that the northernmost vortex in the first panel breaks apart over the course of the experiment; this allows drifters released within that vortex to escape and explore more of the domain, as we will see. Finally, note that the western side of the domain is the most active region; the velocity field on the eastern half has relatively low magnitude, so drifters released in that half of the domain will not travel a great distance.

Figure 5.4 shows the resulting true drifter trajectories in our study. Colors represent the different deployment locations: uniform (green), center (blue), saddle (red), and mixed (magenta, right panel), and the arrows represent a snapshot in time of the gradient flow field. With the exception of the drifters targeting the vortex that breaks apart, all the drifters which targeted vortex centers stayed fairly close to their release locations, and the drifters which targeted saddles spread across the western half of the domain fairly well; however, some of the saddle drifters eventually became trapped, the effect of which we will see in the following sections.

5.3.2 Results - Ensemble Kalman Filter

The errors of the EnKF mean estimate as a function of time, split into kinetic energy, height field, and drifter position as above, are given in Figure 5.6. The solid black

Figure 5.3: True height field (shading) at initial assimilation time. Axes and shading scales are in meters.

line gives the evolution of the kinetic energy and height field errors of an ensemble with the same initial statistics, but without assimilating observations. As expected, each deployment strategy captures the drifter position equally well. The center deployment gives a poor estimate of the kinetic energy and the height field, though its estimate of the height field improves around t = 150 days. This may be due to the drifters in the vortex near (500, 1200), which follow the vortex as it drifts southward. In addition, at around t = 160 days, the drifters escape the vortex near (300, 1300) and move southeast towards another orbit; this would improve the velocity estimates for this deployment as well. The uniform deployment gives the best estimate of the height field for most of the time window, but a poor estimate of the kinetic energy, although its errors initially decrease. The mixed deployment estimates the kinetic energy well until about t = 175 days; at this point, the errors dramatically increase. This is likely due to two of the drifters getting trapped in a vortex; note that the drifters targeting the saddle point at around (400, 1100) do not escape towards the right side of the domain, but rather get trapped near (200, 1500). Once they have stopped exploring the space, they cannot estimate the kinetic energy well. We see similar results for the saddle launch, although the increase in errors is delayed until around t = 225 days. Since the saddle launch includes a drifter which explores the

Figure 5.4: True trajectories of drifters under the nonlinear shallow water equations described in the text. Left: Drifters are grouped according to deployment location: uniform (green), center (blue), and saddle (red). Black arrows represent a snapshot in time of the gradient flow field, corresponding to the initial assimilation time. Right: Specific drifters used in the mixed deployment strategy. Axes represent length in meters.

center of the domain, it is able to estimate the velocity field better than the mixed deployment; however, it also includes the saddle-launch drifters which get trapped in a vortex, and these eventually degrade the velocity estimate of the saddle deployment. Finally, the saddle and uniform deployments estimate the height field well. The saddle deployment's estimate of the height field degrades after t = 200 days, likely due to the drifter which travels away from the energetic eastern region towards the center of the domain. However, none of the deployments provide a drastic decrease in height field errors; note the scale.

These results are similar, although not identical, to those of the previous study. In both studies, the uniform launch estimates the height field well, and the mixed deployment estimates the kinetic energy fairly well (for some amount of time). However, the magnitude of the decrease in the height field errors is much smaller in this study than in the previous one. Additionally, the uniform deployment gives a much poorer estimate of the kinetic energy in this study than in the previous one.

We also consider the spatial height and kinetic energy errors for each deployment

Figure 5.5: Evolution of the true velocity vector field for the nonlinear shallow water equations, sampled every 25 days (panels at 1, 26, 51, 76, 101, 126, 151, 176, 201, 226, 251, and 276 days). Axes are in meters.

at several fixed points in time; these results are given in Figures 5.7 and 5.8. As expected, the kinetic energy errors are almost entirely confined to the left region of the domain, which has the most small-scale structure. The height field errors are spread more evenly across the entire domain, although the regions of largest error are also on the left side of the domain.

Figure 5.6: Errors for each deployment strategy, using the EnKF: uniform (green), saddle (red), center (blue), and mixed deployment (magenta), compared to a run with no assimilation (black).

In particular, note row three of Fig. 5.7. This shows that the center deployment strategy makes the largest errors in velocity estimation along the manifolds separating the vortices, but after a drifter escapes a vortex (around day 100), these errors dissipate. This can be seen in the smaller magnitude errors in row three, column three (center deployment at day 150). These figures also explain the initial decrease but ultimate increase in kinetic energy error for the uniform deployment; see row one of Fig. 5.7. Note that initially, three drifters are near the left boundary of the domain where much of the activity in the velocity field is, and thus the errors are fairly small. However, as time goes on, most of the drifters move away from that area and towards the right half of the domain, where there is little activity; velocity errors near the double-gyre structures to the left are then allowed to increase with no effect from the observations. These results are likely due to the local nature of the kinetic energy; without drifters in and near the vortices, the velocities in that area are very poorly estimated.

Figure 5.8 also provides support for our hypothesis regarding the height field as a global quantity, which therefore requires observations spread uniformly across the domain. Note row three, which shows the spatial height errors for the center deployment. The third plot (150 days) in particular shows that the height field is well-estimated in two of the vortices with drifters, but the height in the eastern side

of the domain, especially along the right and lower boundaries, is not well estimated. In fact, the estimate of the height field along the right boundary becomes poorer as time goes on, since the drifters do not explore that area.

5.3.3 Results - Hybrid Filter

We next perform the same experiment using the hybrid filter; the errors in these estimates are given in Figure 5.9. Again, these errors are plotted as a function of time and include the errors calculated without any assimilation. The errors in kinetic energy are similar to those from the EnKF; in particular, the center deployment has the worst performance, while the saddle and mixed deployments initially have comparable results, but the saddle case seems to have worsening errors as time goes on. The magnitudes of the errors are, on average, slightly larger than those for the EnKF. The height field errors are also fairly different from those of the EnKF. In particular, although the uniform deployment does well in both the hybrid and EnKF cases, the center deployment is comparable for the hybrid case. The mixed deployment has the worst performance, followed by the saddle deployment. However, the magnitudes of the height field errors for the uniform and center cases are slightly smaller than those for the EnKF.

Each of the strategies performs slightly worse at estimating the drifter position than the EnKF, particularly the saddle and mixed deployments, whose errors increase above 3 before starting to level off. This may be due to sample impoverishment, as we are essentially attempting to estimate an 18-dimensional variable with 2500 particles. This is further supported by the frequency of resampling; the resampling threshold was hit every other time step, which means that the weights started to collapse after two steps. In addition, in this experiment, we did not resample the

Figure 5.7: Spatial errors in kinetic energy (EnKF). Left to right: 25 days, 75 days, and 150 days. Top to bottom: uniform release, saddle release, center release, and mixed release. Drifter locations for each release and each time are given in white asterisks. Axes represent length in meters; errors are in m/s.

Figure 5.8: Same as Fig. 5.7 but for the height field; errors are in meters.


More information

Relationship between Singular Vectors, Bred Vectors, 4D-Var and EnKF

Relationship between Singular Vectors, Bred Vectors, 4D-Var and EnKF Relationship between Singular Vectors, Bred Vectors, 4D-Var and EnKF Eugenia Kalnay and Shu-Chih Yang with Alberto Carrasi, Matteo Corazza and Takemasa Miyoshi 4th EnKF Workshop, April 2010 Relationship

More information

A Spectral Approach to Linear Bayesian Updating

A Spectral Approach to Linear Bayesian Updating A Spectral Approach to Linear Bayesian Updating Oliver Pajonk 1,2, Bojana V. Rosic 1, Alexander Litvinenko 1, and Hermann G. Matthies 1 1 Institute of Scientific Computing, TU Braunschweig, Germany 2 SPT

More information

Adaptive ensemble Kalman filtering of nonlinear systems. Tyrus Berry and Timothy Sauer George Mason University, Fairfax, VA 22030

Adaptive ensemble Kalman filtering of nonlinear systems. Tyrus Berry and Timothy Sauer George Mason University, Fairfax, VA 22030 Generated using V3.2 of the official AMS LATEX template journal page layout FOR AUTHOR USE ONLY, NOT FOR SUBMISSION! Adaptive ensemble Kalman filtering of nonlinear systems Tyrus Berry and Timothy Sauer

More information

Data Assimilation for Dispersion Models

Data Assimilation for Dispersion Models Data Assimilation for Dispersion Models K. V. Umamaheswara Reddy Dept. of Mechanical and Aerospace Engg. State University of New Yor at Buffalo Buffalo, NY, U.S.A. venatar@buffalo.edu Yang Cheng Dept.

More information

DATA ASSIMILATION FOR FLOOD FORECASTING

DATA ASSIMILATION FOR FLOOD FORECASTING DATA ASSIMILATION FOR FLOOD FORECASTING Arnold Heemin Delft University of Technology 09/16/14 1 Data assimilation is the incorporation of measurement into a numerical model to improve the model results

More information

Bayesian Statistics and Data Assimilation. Jonathan Stroud. Department of Statistics The George Washington University

Bayesian Statistics and Data Assimilation. Jonathan Stroud. Department of Statistics The George Washington University Bayesian Statistics and Data Assimilation Jonathan Stroud Department of Statistics The George Washington University 1 Outline Motivation Bayesian Statistics Parameter Estimation in Data Assimilation Combined

More information

Accelerating the spin-up of Ensemble Kalman Filtering

Accelerating the spin-up of Ensemble Kalman Filtering Accelerating the spin-up of Ensemble Kalman Filtering Eugenia Kalnay * and Shu-Chih Yang University of Maryland Abstract A scheme is proposed to improve the performance of the ensemble-based Kalman Filters

More information