(Extended) Kalman Filter
Brian Hunt, 7 June 2013

Goals of Data Assimilation (DA) Estimate the state of a system based on both current and all past observations of the system, using a model for the system dynamics. Perform the estimation iteratively: compute the current estimate in terms of a recent past estimate. Ideally, quantify the uncertainty in the state estimate.

Terminology and Notation Forecast model: a known function M on a vector space of model states. Truth: an unknown sequence {x_n} of model states to be estimated. Model error: δ_n = x_{n+1} - M(x_n). Observations: a sequence {y_n} of vectors in observation space (which may depend on n). Forward operator: a known function H_n from model space to observation space. Observation error: ε_n = y_n - H_n(x_n).

When DA is not Necessary If the forward operator H_n is invertible and ε_n = 0, then x_n = H_n^{-1}(y_n). If H_n is invertible and the statistics of ε_n are known, then we can compute the pdf of x_n (but data assimilation can improve the estimate). Note: the pdf (probability density function) of x gives the relative likelihood of the possible values of x. The maximizer ("mode") of the pdf is the most likely value.

More Terminology Background ("first guess"): the estimate x_n^b of the current model state x_n given past observations y_1, ..., y_{n-1}. Analysis: the estimate x_n^a of x_n given current and past observations y_1, ..., y_n. A data assimilation cycle consists of: Analysis step: determine the analysis x_n^a from the background x_n^b and the observations y_n. Forecast step: typically x_{n+1}^b = M(x_n^a).

Remarks on the Analysis Step If the observation error ε_n is zero, we should seek x_n^a close to x_n^b such that H_n(x_n^a) = y_n. Otherwise, we should just make H_n(x_n^a) closer to y_n than H_n(x_n^b) is. How much closer depends on the relative uncertainties of the background estimate x_n^b and the observation y_n. The better we understand the uncertainties, the clearer it is how to do the analysis step.

Bayes Rule Definition of conditional probability: P(V | W) = P(V and W)/P(W). Then P(V | W) P(W) = P(V and W) = P(V) P(W | V). Corollary: P(V | W) = P(V) P(W | V)/P(W); that is, posterior = prior × likelihood / normalization.
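A tiny numerical illustration of the corollary in Python (the event probabilities below are made up, purely to show the arithmetic):

    # Bayes rule: P(V|W) = P(V) P(W|V) / P(W); posterior = prior * likelihood / normalization.
    # The prior and likelihood values are arbitrary, chosen only for illustration.
    prior_V = 0.3                 # P(V)
    p_W_given_V = 0.8             # P(W | V)
    p_W_given_notV = 0.1          # P(W | not V)

    p_W = p_W_given_V * prior_V + p_W_given_notV * (1 - prior_V)   # normalization P(W)
    posterior_V_given_W = prior_V * p_W_given_V / p_W
    print(posterior_V_given_W)    # about 0.774, larger than the prior 0.3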

Bayesian Data Assimilation Assume that the statistics of the model error δ_n and the observation error ε_n are known. Theoretically, given an analysis pdf p(x_{n-1} | y_1, ..., y_{n-1}), we can use the forecast model to determine a background ("prior") pdf p(x_n | y_1, ..., y_{n-1}). The forward operator tells us p(y_n | x_n). Bayes rule tells us that the analysis ("posterior") pdf p(x_n | y_1, ..., y_n) is proportional to p(x_n | y_1, ..., y_{n-1}) p(y_n | x_n).
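To make the recursion concrete, here is a minimal sketch in Python of one full Bayesian cycle for a scalar state, with the pdf represented on a grid. The model M, the noise variances, the grid, and the observed value are illustrative assumptions, not part of the slides.

    import numpy as np

    # One Bayesian DA cycle for a scalar state, with the pdf held on a grid.
    x = np.linspace(-5.0, 5.0, 401)                 # grid of possible state values
    dx = x[1] - x[0]

    def gauss(z, mean, var):
        return np.exp(-0.5 * (z - mean) ** 2 / var) / np.sqrt(2 * np.pi * var)

    M = lambda z: 0.9 * z                           # forecast model (assumed, for illustration)
    q, r = 0.1, 0.5                                 # model / observation error variances

    analysis_pdf = gauss(x, 0.0, 1.0)               # p(x_{n-1} | y_1, ..., y_{n-1})

    # Forecast step: p(x_n | past) = integral of p(x_n | x_{n-1}) p(x_{n-1} | past) dx_{n-1}
    transition = gauss(x[:, None], M(x)[None, :], q)    # p(x_n | x_{n-1}) on the grid
    background_pdf = transition @ analysis_pdf * dx

    # Analysis step: multiply the background pdf by the likelihood p(y_n | x_n), then normalize.
    y = 1.2                                         # current observation (made up)
    likelihood = gauss(y, x, r)                     # p(y_n | x_n), with H = identity here
    new_analysis_pdf = background_pdf * likelihood
    new_analysis_pdf /= new_analysis_pdf.sum() * dx

    print("analysis mean:", (x * new_analysis_pdf).sum() * dx)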

Advantages and Disadvantages Advantage: the analysis step is simple, just multiply two functions. Disadvantage: the forecast step is generally infeasible in practice. If x is high-dimensional, we can't numerically keep track of an arbitrary pdf for x: too much information! We need to make some simplifying assumptions.

Linearity and Gaussianity Assume that M and H_n are linear. Assume model and observation errors are Gaussian with known covariances and no time correlations: δ_n ~ N(0, Q_n) and ε_n ~ N(0, R_n). Then in the analysis step, a Gaussian background pdf leads to a Gaussian analysis pdf. Gaussian input yields Gaussian output in the forecast step too. Let the background pdf have mean x_n^b and covariance P_n^b.

Bayesian DA with Gaussians The (unnormalized) background pdf is
exp[-(x_n - x_n^b)^T (P_n^b)^{-1} (x_n - x_n^b)/2]
The pdf of y_n given x_n is
exp[-(H_n x_n - y_n)^T R_n^{-1} (H_n x_n - y_n)/2]
The analysis pdf is then proportional to exp(-J_n/2), where
J_n = (x_n - x_n^b)^T (P_n^b)^{-1} (x_n - x_n^b) + (H_n x_n - y_n)^T R_n^{-1} (H_n x_n - y_n)
To find the mean and covariance of the analysis pdf, we want to write
J_n = (x_n - x_n^a)^T (P_n^a)^{-1} (x_n - x_n^a) + c

The Kalman Filter [Kalman 1960] After some linear algebra, the analysis mean x_n^a and covariance P_n^a are
x_n^a = x_n^b + K_n (y_n - H_n x_n^b)
P_n^a = [(P_n^b)^{-1} + H_n^T R_n^{-1} H_n]^{-1} = [I + P_n^b H_n^T R_n^{-1} H_n]^{-1} P_n^b
where K_n = P_n^a H_n^T R_n^{-1} is the Kalman gain matrix. The forecast step is x_{n+1}^b = M x_n^a and P_{n+1}^b = M P_n^a M^T + Q_n.
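A direct transcription of these formulas into Python/numpy for a small made-up linear system (the matrices M, H, Q, R and the numbers are illustrative assumptions):

    import numpy as np

    # One Kalman filter analysis step and forecast step, exactly as written above.
    M = np.array([[1.0, 0.1], [0.0, 1.0]])       # forecast model (illustrative)
    H = np.array([[1.0, 0.0]])                   # forward operator (illustrative)
    Q = 0.01 * np.eye(2)                         # model error covariance
    R = np.array([[0.25]])                       # observation error covariance

    xb = np.array([0.0, 1.0])                    # background mean x_n^b
    Pb = np.eye(2)                               # background covariance P_n^b
    y = np.array([0.4])                          # current observation y_n (made up)

    # Analysis: P_n^a = [(P_n^b)^{-1} + H^T R^{-1} H]^{-1}, K_n = P_n^a H^T R^{-1}
    Pa = np.linalg.inv(np.linalg.inv(Pb) + H.T @ np.linalg.inv(R) @ H)
    K = Pa @ H.T @ np.linalg.inv(R)
    xa = xb + K @ (y - H @ xb)

    # Forecast: x_{n+1}^b = M x_n^a, P_{n+1}^b = M P_n^a M^T + Q_n
    xb_next = M @ xa
    Pb_next = M @ Pa @ M.T + Q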

Observation Space Formulation After some further linear algebra, the Kalman filter analysis equations can be written
K_n = P_n^b H_n^T [H_n P_n^b H_n^T + R_n]^{-1}
x_n^a = x_n^b + K_n (y_n - H_n x_n^b)
P_n^a = (I - K_n H_n) P_n^b
The size of the matrix that must be inverted is determined by the number of (current) observations, not by the number of model state variables.
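A quick numerical check, with random made-up covariances, that this observation-space form gives the same analysis covariance as the form on the previous slide while only inverting a matrix of the size of the observation space:

    import numpy as np

    # Compare the information form and the observation-space form of the analysis covariance.
    rng = np.random.default_rng(0)
    n_x, n_y = 4, 2                                # state and observation dimensions (illustrative)
    H = rng.standard_normal((n_y, n_x))
    A = rng.standard_normal((n_x, n_x)); Pb = A @ A.T + np.eye(n_x)   # SPD background covariance
    B = rng.standard_normal((n_y, n_y)); R = B @ B.T + np.eye(n_y)    # SPD observation covariance

    # Information form: P^a = [(P^b)^{-1} + H^T R^{-1} H]^{-1}
    Pa_info = np.linalg.inv(np.linalg.inv(Pb) + H.T @ np.linalg.inv(R) @ H)

    # Observation-space form: K = P^b H^T [H P^b H^T + R]^{-1}, P^a = (I - K H) P^b
    K = Pb @ H.T @ np.linalg.inv(H @ Pb @ H.T + R)
    Pa_obs = (np.eye(n_x) - K @ H) @ Pb

    print(np.allclose(Pa_info, Pa_obs))            # True: only a 2x2 matrix was inverted for K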

Example Assume that M = H_n = I, that x is a scalar, and that Q_n = 0 and R_n = r > 0. We are making independent measurements y_1, y_2, ... of a constant-in-time quantity x. The analysis equations are
x_n^a = x_n^b + P_n^a r^{-1} (y_n - x_n^b)
(P_n^a)^{-1} = (P_n^b)^{-1} + r^{-1}
Start with a uniform prior pdf: (P_1^b)^{-1} = 0 and x_1^b arbitrary. Then by induction, P_{n+1}^b = P_n^a = r/n and x_{n+1}^b = x_n^a = (y_1 + ... + y_n)/n.
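A short numerical check of this example (the measurement values are made up):

    import numpy as np

    # With M = H = 1, Q = 0, R = r, and a uniform prior, the analysis equals the
    # running mean of the observations and P_n^a = r/n.
    r = 0.5
    y = np.array([1.3, 0.7, 1.1, 0.9, 1.0])      # made-up measurements

    xa, Pa_inv = 0.0, 0.0                        # x_1^b arbitrary, (P_1^b)^{-1} = 0
    for n, yn in enumerate(y, start=1):
        xb, Pb_inv = xa, Pa_inv                  # forecast: M = I, Q = 0
        Pa_inv = Pb_inv + 1.0 / r                # (P_n^a)^{-1} = (P_n^b)^{-1} + r^{-1}
        Pa = 1.0 / Pa_inv
        xa = xb + Pa * (1.0 / r) * (yn - xb)     # x_n^a = x_n^b + P_n^a r^{-1} (y_n - x_n^b)
        assert np.isclose(Pa, r / n)
        assert np.isclose(xa, y[:n].mean())      # running mean of y_1, ..., y_n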

A Least Squares Formulation In terms of all the observations y_1, ..., y_n, what problem did we solve to estimate x_n? Assume no model error (δ_n = 0). The likelihood of a model trajectory x_1, ..., x_n is exp(-J_n/2), where
J_n = Σ_{i=1}^{n} (H_i x_i - y_i)^T R_i^{-1} (H_i x_i - y_i)
Problem: minimize the cost function J_n(x_1, ..., x_n) subject to the constraints x_{i+1} = M x_i.

Kalman Filter Revisited The Kalman filter expresses the minimizer x_n^a of J_n in terms of the minimizer x_{n-1}^a of J_{n-1} as follows. It expresses J_{n-1} as a function of x_{n-1} only. It keeps track of an auxiliary matrix P_{n-1}^a that encodes the second derivative (Hessian) of J_{n-1}. Assuming it has done so correctly at time n-1, the next slide explains why it does so at time n.

Kalman Filter Revisited If x_{n-1}^a minimizes J_{n-1} and P_{n-1}^a encodes its Hessian, then
J_{n-1} = (x_{n-1} - x_{n-1}^a)^T (P_{n-1}^a)^{-1} (x_{n-1} - x_{n-1}^a) + c_{n-1}
Substituting x_n = M x_{n-1}, x_n^b = M x_{n-1}^a, and P_n^b = M P_{n-1}^a M^T yields
J_{n-1} = (x_n - x_n^b)^T (P_n^b)^{-1} (x_n - x_n^b) + c_{n-1}
We get the same cost function as before:
J_n = J_{n-1} + (H_n x_n - y_n)^T R_n^{-1} (H_n x_n - y_n)
The KF completes the square as before.

Nonlinear Least Squares Now let's eliminate the assumption that M and H_i are linear. As before, assume no model error and Gaussian observation errors. The maximum likelihood estimate for the true trajectory is the minimizer of
J_n = Σ_{i=1}^{n} (H_i(x_i) - y_i)^T R_i^{-1} (H_i(x_i) - y_i)
subject to the constraints x_{i+1} = M(x_i).

Approximate Solution Methods Use an approximate solution at time n-1 to find an approximate solution at time n. If we track covariances associated with our estimates, we can write
J_{n-1} ≈ (x_{n-1} - x_{n-1}^a)^T (P_{n-1}^a)^{-1} (x_{n-1} - x_{n-1}^a) + c
As a further approximation, we can write
J_{n-1} ≈ (x_n - x_n^b)^T (P_n^b)^{-1} (x_n - x_n^b) + c
It seems clear that x_n^b should be M(x_{n-1}^a), but what choice of P_n^b is best?

Extended Kalman Filter Matching the second derivatives of the two approximate cost functions yields
P_n^b = (DM) P_{n-1}^a (DM)^T
where DM is the derivative of M at x_{n-1}^a. The remaining equations are like the Kalman filter (linearizing H_n near x_n^b). Advantage: the approximation error may be smaller than for other methods. Disadvantage: for a high-dimensional model, the covariance forecast is computationally expensive.
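A sketch of one extended Kalman filter cycle in Python; the nonlinear model M, its Jacobian DM, the forward operator H, and all numbers are illustrative assumptions, and a small model error term Q is included in the covariance forecast, which the no-model-error derivation above omits:

    import numpy as np

    # One EKF forecast/analysis cycle for a made-up two-variable nonlinear model.
    def M(x):                                   # nonlinear forecast model (illustrative)
        return np.array([x[0] + 0.1 * x[1], x[1] - 0.1 * np.sin(x[0])])

    def DM(x):                                  # Jacobian of M at x
        return np.array([[1.0, 0.1], [-0.1 * np.cos(x[0]), 1.0]])

    H = np.array([[1.0, 0.0]])                  # linear forward operator, for simplicity
    Q = 0.01 * np.eye(2)                        # model error term (omitted in the derivation above)
    R = np.array([[0.1]])

    xa = np.array([0.5, 0.0])                   # previous analysis mean
    Pa = 0.5 * np.eye(2)                        # previous analysis covariance
    y = np.array([0.7])                         # current observation (made up)

    # Forecast step: x^b = M(x^a), P^b = (DM) P^a (DM)^T (+ Q)
    xb = M(xa)
    Pb = DM(xa) @ Pa @ DM(xa).T + Q

    # Analysis step: the usual Kalman update, with H linearized about x^b if nonlinear
    K = Pb @ H.T @ np.linalg.inv(H @ Pb @ H.T + R)
    xa_new = xb + K @ (y - H @ xb)
    Pa_new = (np.eye(2) - K @ H) @ Pb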

Extended KF (Square Root Form) If M is computed by solving a system of differential equations, then DM is computed by solving the associated tangent linear model (TLM). If P_{n-1}^a = X_{n-1}^a (X_{n-1}^a)^T, then compute X_n^b = (DM) X_{n-1}^a, followed by P_n^b = X_n^b (X_n^b)^T. This is easier if X_{n-1}^a has (many) fewer columns than rows; the resulting covariance has reduced rank. The Kalman covariance update becomes
X_n^a = X_n^b [I - (H_n X_n^b)^T (H_n X_n^b (H_n X_n^b)^T + R_n)^{-1} (H_n X_n^b)]^{1/2}
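A numerical check, with random made-up matrices, that this square-root update (as written above, with S = H_n X_n^b) reproduces the analysis covariance (I - K_n H_n) P_n^b:

    import numpy as np

    # Verify that X^a = X^b [I - S^T (S S^T + R)^{-1} S]^{1/2} gives X^a (X^a)^T = (I - K H) P^b.
    rng = np.random.default_rng(1)
    n_x, n_y, k = 6, 3, 4                        # state dim, obs dim, number of columns (illustrative)
    Xb = rng.standard_normal((n_x, k))
    H = rng.standard_normal((n_y, n_x))
    B = rng.standard_normal((n_y, n_y)); R = B @ B.T + np.eye(n_y)

    Pb = Xb @ Xb.T
    S = H @ Xb

    def sym_sqrt(A):                             # symmetric square root via eigendecomposition
        w, V = np.linalg.eigh(A)
        return V @ np.diag(np.sqrt(np.clip(w, 0.0, None))) @ V.T

    T = np.eye(k) - S.T @ np.linalg.inv(S @ S.T + R) @ S
    Xa = Xb @ sym_sqrt(T)

    K = Pb @ H.T @ np.linalg.inv(H @ Pb @ H.T + R)
    print(np.allclose(Xa @ Xa.T, (np.eye(n_x) - K @ H) @ Pb))   # True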

Tangent Linear Model Suppose x_n = x(n), where dx/dt = F(x). Then for all solutions, M(x(0)) = x(1). Consider a family of solutions with x_γ(0) = x_0 + γv; then DM(x_0) v = (∂/∂γ) x_γ(1) at γ = 0. Let v(t) = (∂/∂γ) x_γ(t) at γ = 0. Substituting x_γ(t) into the ODE and differentiating with respect to γ yields
dv/dt = DF(x_0(t)) v
Compute v(1) with v(0) = v to get DM(x_0) v.
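A sketch of the TLM computation in Python: integrate dv/dt = DF(x(t)) v along the base trajectory and compare DM(x_0) v with a finite difference of M. The vector field F, the forward-Euler integrator, and the step sizes are illustrative assumptions:

    import numpy as np

    # Tangent linear model for a simple pendulum-like system (illustrative).
    def F(x):
        return np.array([x[1], -np.sin(x[0])])

    def DF(x):                                           # Jacobian of F
        return np.array([[0.0, 1.0], [-np.cos(x[0]), 0.0]])

    def M_and_DMv(x0, v, dt=1e-3, T=1.0):
        """Integrate x and the tangent vector v from t = 0 to t = T (forward Euler)."""
        x, vt = x0.copy(), v.copy()
        for _ in range(int(T / dt)):
            vt = vt + dt * DF(x) @ vt                    # dv/dt = DF(x(t)) v
            x = x + dt * F(x)                            # dx/dt = F(x)
        return x, vt                                     # M(x0) and DM(x0) v

    x0 = np.array([1.0, 0.0])
    v = np.array([0.3, -0.2])
    _, DMv = M_and_DMv(x0, v)

    eps = 1e-6                                           # finite-difference check of DM(x0) v
    fd = (M_and_DMv(x0 + eps * v, np.zeros(2))[0] - M_and_DMv(x0, np.zeros(2))[0]) / eps
    print(np.allclose(DMv, fd, atol=1e-4))               # approximately equal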

Ensemble Kalman Filter Use an ensemble of model states whose mean and covariance are transformed according to the Kalman filter equations. Forecast each ensemble member separately. Advantage: Relatively easy to implement and the analysis step is computationally efficient. Disadvantage: Only represents uncertainty in a space whose dimension is bounded by the ensemble size (inherently reduced rank).
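A minimal sketch of an ensemble Kalman filter analysis step, in the "perturbed observations" variant (the slides do not single out a particular formulation); the ensemble size, forward operator, and numbers are illustrative assumptions:

    import numpy as np

    # EnKF analysis step with perturbed observations; each member would be forecast
    # separately by M in a full cycle.
    rng = np.random.default_rng(2)
    n_x, n_y, N = 5, 2, 20                        # state dim, obs dim, ensemble size (illustrative)
    H = rng.standard_normal((n_y, n_x))
    R = np.eye(n_y)
    y = rng.standard_normal(n_y)                  # current observation (made up)

    Xf = rng.standard_normal((n_x, N)) + 1.0      # forecast ensemble (stand-in for M applied to members)

    xf_mean = Xf.mean(axis=1, keepdims=True)
    A = (Xf - xf_mean) / np.sqrt(N - 1)           # perturbation matrix, so P^b is approximately A A^T
    Pb = A @ A.T

    # Kalman gain from the sample covariance, then update each member with a
    # perturbed observation y + e_i, where e_i ~ N(0, R).
    K = Pb @ H.T @ np.linalg.inv(H @ Pb @ H.T + R)
    E = rng.multivariate_normal(np.zeros(n_y), R, size=N).T
    Xa = Xf + K @ (y[:, None] + E - H @ Xf)

    print(Xa.mean(axis=1))                        # analysis ensemble mean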

3D-Var Replace P_n^b with a time-independent background covariance matrix B, determined empirically. Numerically minimize the resulting cost function (allowing nonlinear H_n). Advantage: the covariance B and associated matrices (B^{1/2} is used in the analysis) only need to be computed once. Disadvantage: ignores time dependence of background uncertainty, which can vary considerably.
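A sketch of a 3D-Var analysis in Python: numerically minimize the cost function with a fixed background covariance B and a nonlinear forward operator. B, H, and the numbers are illustrative assumptions:

    import numpy as np
    from scipy.optimize import minimize

    # 3D-Var: minimize (x - x^b)^T B^{-1} (x - x^b) + (H(x) - y)^T R^{-1} (H(x) - y).
    B = np.array([[1.0, 0.3], [0.3, 1.0]])        # static background covariance (illustrative)
    R = np.array([[0.1]])
    xb = np.array([1.0, 0.5])                     # background state
    y = np.array([1.8])                           # observation (made up)

    def H(x):                                     # nonlinear forward operator (illustrative)
        return np.array([x[0] ** 2 + x[1]])

    Binv, Rinv = np.linalg.inv(B), np.linalg.inv(R)

    def cost(x):
        db = x - xb
        dy = H(x) - y
        return db @ Binv @ db + dy @ Rinv @ dy

    xa = minimize(cost, xb).x                     # analysis = numerical minimizer of the cost
    print(xa)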

(Strong Constraint) 4D-Var [Le Dimet & Talagrand 1985] Numerically minimize the cost function
J_n = (x_{n-p} - x_{n-p}^b)^T B^{-1} (x_{n-p} - x_{n-p}^b) + Σ_{i=n-p}^{n} (H_i(x_i) - y_i)^T R_i^{-1} (H_i(x_i) - y_i)
subject to the constraints x_{i+1} = M(x_i). Advantage: accuracy, especially as p increases. Disadvantage: difficult to implement and computationally expensive.
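A sketch of strong-constraint 4D-Var: substituting the constraints x_{i+1} = M(x_i) makes J a function of the state at the start of the window, which is then minimized numerically. The model, forward operator, window length, and synthetic observations are illustrative assumptions:

    import numpy as np
    from scipy.optimize import minimize

    # Strong-constraint 4D-Var over a window of length p, minimized over the initial state.
    def M(x):                                        # nonlinear forecast model (illustrative)
        return np.array([x[0] + 0.1 * x[1], 0.95 * x[1] + 0.1 * np.sin(x[0])])

    def H(x):                                        # forward operator: observe first component
        return np.array([x[0]])

    p = 4                                            # window length
    B = np.eye(2); R = np.array([[0.05]])
    Binv, Rinv = np.linalg.inv(B), np.linalg.inv(R)
    xb0 = np.array([0.8, 0.1])                       # background at the start of the window
    ys = [np.array([0.9 + 0.02 * i]) for i in range(p + 1)]   # made-up observations

    def cost(x0):
        J = (x0 - xb0) @ Binv @ (x0 - xb0)
        x = x0
        for i in range(p + 1):
            d = H(x) - ys[i]
            J += d @ Rinv @ d
            x = M(x)                                 # enforce x_{i+1} = M(x_i)
        return J

    x0_analysis = minimize(cost, xb0).x              # optimal state at the start of the window
    print(x0_analysis)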