Bayesian Statistics and Data Assimilation
Jonathan Stroud
Department of Statistics, The George Washington University


2. Outline

- Motivation
- Bayesian Statistics
- Parameter Estimation in Data Assimilation
- Combined State and Parameter Estimation within the EnKF:
  - Physical Parameters
  - Observation Error Variance
  - Observation Error Covariance

3. Motivation

Physical models and data assimilation systems often involve unknown parameters:
- physical model parameters
- error covariance parameters
- covariance inflation factors
- localization radius

Use data to estimate the parameters, either off-line or sequentially. Two common approaches to parameter estimation:
- Maximum likelihood estimation
- Bayesian estimation

4. Statistical Methods for Parameter Estimation

Maximum likelihood approach:
- Specify a likelihood function for the data.
- Choose the parameters that maximize this function.

Bayesian approach:
- Assign the parameters a prior probability distribution.
- Update the prior distribution using Bayes' theorem.
- Summarize the parameters using the posterior distribution.

5. Bayesian Parameter Estimation

A Bayesian model includes the following components (where d = data and α = unknown parameters):

- Likelihood function: p(d | α)
- Prior distribution: p(α)
- Posterior distribution, via Bayes' theorem:

    p(α | d) = p(α) p(d | α) / p(d)

The parameters can be summarized using the posterior mean, standard deviation, or 95% posterior intervals.

6. Example 1: Normal Data with Unknown Mean

Let d_1, ..., d_n be iid samples from a normal distribution with unknown mean θ and known variance v. Writing d̄ for the sample mean, the likelihood function is

    p(d | θ) = (2πv)^{-n/2} exp{ -(d̄ - θ)² / (2v/n) }.

The standard prior distribution is a normal, θ ~ N(θ_b, v_b):

    p(θ) = (2πv_b)^{-1/2} exp{ -(θ - θ_b)² / (2v_b) }.

The posterior distribution (likelihood × prior) is

    p(θ | d) ∝ exp{ -(d̄ - θ)² / (2v/n) - (θ - θ_b)² / (2v_b) }.

7. Example 1: Normal Data with Unknown Mean

The posterior distribution is normal, θ | d ~ N(θ_a, v_a), with mean and information (precision)

    θ_a = [ (v/n) θ_b + v_b d̄ ] / [ v_b + (v/n) ],
    v_a^{-1} = v_b^{-1} + (v/n)^{-1}.

The posterior mean is a weighted average of the prior mean and the sample mean. The posterior information is the sum of the prior and data information.
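As a concrete illustration (a minimal sketch, not from the talk), the following Python function performs this conjugate update; plugging in the prior N(0,1) and a likelihood summary of d̄ = 2 with v/n = 4 reproduces the N(0.4, 0.8) posterior shown on the next slide.

```python
# Conjugate normal-normal update: posterior for an unknown mean theta.
# Prior: theta ~ N(theta_b, v_b).  Data summary: sample mean dbar with
# sampling variance v/n.  Posterior: theta | d ~ N(theta_a, v_a).

def normal_posterior(theta_b, v_b, dbar, v_over_n):
    """Return the posterior mean and variance for the normal mean."""
    precision = 1.0 / v_b + 1.0 / v_over_n             # information adds
    v_a = 1.0 / precision
    theta_a = v_a * (theta_b / v_b + dbar / v_over_n)  # weighted average
    return theta_a, v_a

# Values matching the figure on the next slide:
# prior N(0, 1), likelihood N(2, 4)  ->  posterior N(0.4, 0.8).
print(normal_posterior(theta_b=0.0, v_b=1.0, dbar=2.0, v_over_n=4.0))
```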

8. Example 1: Normal Data with Unknown Mean

[Figure: prior N(0,1), likelihood N(2,4), and posterior N(0.4,0.8) densities.]

9. Example 2: Normal with Unknown Variance

Let d_1, ..., d_n be iid samples from a normal distribution with mean zero and unknown variance v. The likelihood function is

    p(d | v) = (2πv)^{-n/2} exp( -Σ_{i=1}^n d_i² / (2v) ).

The prior distribution is inverse gamma, v ~ IG(m/2, s/2):

    p(v) = [ (s/2)^{m/2} / Γ(m/2) ] v^{-m/2-1} exp( -s / (2v) ).

The posterior distribution is

    p(v | d) ∝ v^{-(m+n)/2-1} exp( -(s + Σ d_i²) / (2v) ).

10. Example 2: Normal with Unknown Variance

The posterior is also inverse gamma:

    v | d ~ IG( (m+n)/2, (s + Σ d_i²)/2 ).

The two parameters are updated by adding the sample size and the data sum of squares, respectively.
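The update is one line of bookkeeping. A minimal sketch (illustrative, not from the talk); the data summaries n = 20 and Σ d_i² = 100 are hypothetical values chosen so that the prior IG(5, 10) yields the IG(15, 60) posterior plotted on the next slide.

```python
# Conjugate inverse-gamma update for an unknown normal variance v.
# Prior: v ~ IG(m/2, s/2).  Posterior: v | d ~ IG((m+n)/2, (s + sum d_i^2)/2).

def ig_posterior(m, s, n, sum_sq):
    """Update the IG shape/scale from the sample size and sum of squares."""
    return (m + n) / 2.0, (s + sum_sq) / 2.0

# Hypothetical data summaries matching the next slide's figure:
# prior IG(5, 10) (i.e. m = 10, s = 20) plus n = 20, sum d_i^2 = 100
# gives the posterior IG(15, 60).
shape, scale = ig_posterior(m=10, s=20, n=20, sum_sq=100)
print(shape, scale)  # 15.0 60.0
```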

11. Example 2: Normal with Unknown Variance

[Figure: prior IG(5,10), likelihood IG(10,50), and posterior IG(15,60) densities, with their means marked.]

12. Conjugate Priors

The normal and inverse gamma priors above are called conjugate priors: a prior is conjugate when the prior and posterior distributions belong to the same family. Conjugate priors are convenient because the posterior can be updated by updating a set of sufficient statistics.

Other examples of conjugate priors:
- Inverse-Wishart, for the covariance matrix of a normal.
- Normal-Inverse Gamma, for the mean and variance of a normal.
- Dirichlet, for the probabilities of a discrete distribution.

13. Sequential Bayesian Estimation

If the data d_1, d_2, ..., d_n are assimilated sequentially, we want to update the parameters α after each new observation. Under the Bayesian approach, this requires calculating the sequence of posterior distributions

    p(α | d_1), p(α | d_1, d_2), ..., p(α | d_1, d_2, ..., d_n).

This is done by applying Bayes' theorem recursively after each new observation:

    p(α | d_1, ..., d_k) ∝ p(d_k | α) p(α | d_1, ..., d_{k-1}),  for each k.
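To make the recursion concrete, here is a small illustrative sketch (not from the talk) that tracks the posterior on a discrete grid of α values; the observation model d_k ~ N(0, α) is an assumption chosen for simplicity. Each step multiplies the current weights by the new likelihood and renormalizes.

```python
import numpy as np
from scipy.stats import norm

# Recursive Bayes on a grid:
# p(alpha | d_1..d_k)  ∝  p(d_k | alpha) p(alpha | d_1..d_{k-1}).
# Illustrative model (an assumption, not from the talk): d_k ~ N(0, alpha).

rng = np.random.default_rng(0)
alpha_true = 2.0
grid = np.linspace(0.5, 5.0, 200)          # candidate alpha values
post = np.ones_like(grid) / grid.size      # flat prior on the grid

for k in range(500):
    d_k = rng.normal(0.0, np.sqrt(alpha_true))
    post *= norm.pdf(d_k, loc=0.0, scale=np.sqrt(grid))  # multiply by likelihood
    post /= post.sum()                                   # renormalize

print("posterior mean:", (grid * post).sum())  # concentrates near alpha_true
```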

14. Sequence of Posterior Distributions

[Figure: posterior distributions at increasing times (t = 1, 2, 4, ..., 100, 200), each panel marking the MLE, posterior mode, and posterior mean.]

15. Sequential Bayesian Estimates

[Figure: MLEs of α and sequential Bayesian estimates: posterior mean, posterior mode, and 90% posterior CI over time.]

16. Convergence of the Posterior Distribution

1. The posterior distribution converges to the true parameter value.
2. If the model is correct, and certain regularity conditions hold, the posterior distribution converges to a normal distribution with mean equal to the true value and covariance equal to the asymptotic covariance matrix.

17. Parameter Estimation in Data Assimilation

Many parameter estimation methods have been proposed for data assimilation systems.

Maximum likelihood estimation:
- Dee (1995) and Dee & Da Silva (1999): error covariances
- Mitchell & Houtekamer (2000): error covariances (EnKF)
- Li, Kalnay & Miyoshi (2007): variance/covariance inflation

Bayesian estimation:
- Anderson & Anderson (1999): state augmentation
- Stroud & Bengtsson (2007): observation error variance
- Anderson (2007, 2009): covariance inflation factors
- Miyoshi (2011): covariance inflation factors

18. Estimation of Physical Parameters

State augmentation is used to estimate unknown parameters θ in the physical model M(x_t, θ). Define the augmented state vector z_t = (x_t, θ_t), and the augmented model as

    (x_t, θ_t) = (M(x_{t-1}, θ_{t-1}), θ_{t-1}) + (w_t, 0).

Specify an initial prior distribution, θ_0 ~ p(θ_0). Then standard data assimilation methods are applied to z_t to estimate the posterior distribution, p(θ | d_1, ..., d_t), at each time t.
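A minimal sketch of the augmented forecast step, under assumed names (model_step, q_std, and the demo model are illustrative, not from the talk): each ensemble member carries a state block and a parameter block; the model advances the state with the member's own parameter value, and the parameter block persists unchanged.

```python
import numpy as np

def augmented_forecast(Z, model_step, state_dim, q_std, rng):
    """Advance an augmented ensemble Z (columns are members [x; theta]).

    model_step(x, theta) -> next state; theta persists unchanged.
    q_std is the model-error standard deviation (diagonal Q assumed).
    """
    Zf = Z.copy()
    for i in range(Z.shape[1]):
        x, theta = Z[:state_dim, i], Z[state_dim:, i]
        Zf[:state_dim, i] = model_step(x, theta) + q_std * rng.standard_normal(state_dim)
        # parameter block: theta_t = theta_{t-1} (a random walk could be added)
    return Zf

# Hypothetical demo: linear model x_t = theta * x_{t-1} on a 2-D state.
rng = np.random.default_rng(3)
Z = rng.standard_normal((3, 50))            # 2 state variables + 1 parameter, 50 members
step = lambda x, theta: theta[0] * x
print(augmented_forecast(Z, step, state_dim=2, q_std=0.1, rng=rng).shape)
```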

19. Example: Lorenz 63 Model

Model equations:

    dx/dt = σ(y - x)
    dy/dt = ρx - y - xz
    dz/dt = xy - βz

[Figure: Lorenz 63 butterfly attractor in (x, y, z) space.]

The state vector is x = (x, y, z), and the parameter vector is θ = (σ, ρ, β). The parameters σ = 10, ρ = 28, β = 8/3 give the famous butterfly.

Generate data with time step dt = .01 and observation noise = 1. Run the ETKF with state augmentation on z_t = (x_t, θ_t), with ensemble size 100 and variance inflation factor 4.
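For reference, a minimal Python sketch (not the talk's code) of the Lorenz 63 equations with a classical fourth-order Runge-Kutta step at the stated dt = .01 and the standard parameter values.

```python
import numpy as np

def lorenz63(state, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """Lorenz 63 tendencies: dx = sigma(y-x), dy = rho*x - y - x*z, dz = x*y - beta*z."""
    x, y, z = state
    return np.array([sigma * (y - x), rho * x - y - x * z, x * y - beta * z])

def rk4_step(f, state, dt):
    """One classical fourth-order Runge-Kutta step."""
    k1 = f(state)
    k2 = f(state + 0.5 * dt * k1)
    k3 = f(state + 0.5 * dt * k2)
    k4 = f(state + dt * k3)
    return state + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)

# Trace out the butterfly with dt = .01.
state = np.array([1.0, 1.0, 1.0])
for _ in range(1000):
    state = rk4_step(lorenz63, state, dt=0.01)
print(state)
```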

20. Sequential Parameter Estimates: Lorenz 63

[Figure: posterior means and 90% posterior CIs for σ, ρ, and β over time.]

21. Estimation of Covariance Parameters

State augmentation does not work well for parameters in the background or error covariance matrices P, Q, and R. Dee (1995), Dee & Da Silva (1999), and Mitchell & Houtekamer (2000) proposed maximum likelihood estimation for these parameters.

Assuming the innovations d are normal with mean zero and covariance D(α), the likelihood function is

    p(d | α) ∝ |D(α)|^{-1/2} exp{ -(1/2) d' D(α)^{-1} d },

and the maximum likelihood estimator is

    α̂_ML = argmax_α p(d | α).

22. Estimation of Covariance Parameters

Maximum likelihood (ML) works well for large samples, but has problems for recursive estimation. D95 and MH00 proposed the recursive ML estimator

    α̂_t = (1 - γ_t) α̂_{t-1} + γ_t ( argmax_α p(d_t | α) ).

Setting γ_t = 1/t makes α̂_t the running mean of the single-time ML estimates. They also considered defining α̂_t as the median of the ML estimates.

23. Simple Scalar Example

Mitchell & Houtekamer (2000) proposed the following example: generate data d ~ N(0, 2 + α), with true value α = .3. Since α ≥ 0, the single-sample ML estimator is

    α̂_ML = max(0, d² - 2).
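A quick simulation sketch (an illustration, not from the talk) shows why the recursive mean of these single-sample ML estimates is biased: truncating at zero discards the negative part of d² - 2, so the running mean settles above the true α = .3.

```python
import numpy as np

# Mitchell & Houtekamer scalar example: d ~ N(0, 2 + alpha), alpha_true = 0.3.
# Single-sample ML estimate: max(0, d^2 - 2).  Its running mean (the
# gamma_t = 1/t recursion) converges to E[max(0, d^2 - 2)] > 0.3.

rng = np.random.default_rng(1)
alpha_true = 0.3
d = rng.normal(0.0, np.sqrt(2.0 + alpha_true), size=200_000)
ml_single = np.maximum(0.0, d**2 - 2.0)

print("running mean of ML estimates:", ml_single.mean())    # noticeably above 0.3
print("untruncated mean of d^2 - 2 :", (d**2 - 2.0).mean())  # near 0.3
```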

24. Mitchell & Houtekamer Example

[Figure: recursive estimates of α: true value, recursive mean, recursive median, and Bayes estimator.]

The recursive ML estimators do not converge to the true value. The Bayes estimator does converge.

25. Bayesian Parameter Estimation in the EnKF

We propose the following generic EnKF algorithm for combined estimation of the states z and covariance parameters α:

1. Assume a prior distribution for the parameters, α ~ p(α).
2. Generate a forecast ensemble of parameters and states:
   α_i^f ~ p(α),  z_i^f ~ p(z | α_i^f).
3. Update the prior distribution via Bayes' theorem:
   p(α | d) ∝ p(α) p(d | α).
4. Generate an analysis ensemble of parameters and states:
   α_i ~ p(α | d),  z_i ~ p(z | α_i, d).

26. Model 1: Unknown Observation Variance

Stroud & Bengtsson (2007) considered the case where R = αR*, Q = αQ*, and D = αD* for known matrices R*, Q*, and D*.

1. Assume an inverse gamma prior distribution: α ~ IG(n/2, s/2).
2. Generate the forecast ensemble:
   α_i^f ~ IG(n/2, s/2),  z_{t,i}^f = M(z_{t-1,i}) + N(0, α_i^f Q*).
3. Update the parameters of the inverse gamma distribution:
   n' = n + p,  s' = s + d' (D*)^{-1} d.
4. Generate the analysis ensemble:
   α_i ~ IG(n'/2, s'/2),  z_{t,i} = z_{t,i}^f + K( d + N(0, α_i R*) ).
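A sketch of step 3, the scalar bookkeeping at the heart of this scheme (illustrative, with assumed variable names): given an innovation vector d and the known innovation covariance shape D*, the IG hyperparameters grow by the observation count and the standardized innovation sum of squares.

```python
import numpy as np

def update_ig_hyperparams(n, s, innovation, D_star):
    """Step 3 of the unknown-variance EnKF: n' = n + p, s' = s + d' (D*)^{-1} d."""
    p = innovation.size
    quad = innovation @ np.linalg.solve(D_star, innovation)
    return n + p, s + quad

# Hypothetical example: 3 observations, D* = I.
n_new, s_new = update_ig_hyperparams(n=3.0, s=6.0,
                                     innovation=np.array([1.0, -2.0, 0.5]),
                                     D_star=np.eye(3))
print(n_new, s_new)  # 6.0 11.25
```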

27. Example: Lorenz 96 Model

The Lorenz 96 model mimics advection on a latitude circle. The model is highly nonlinear (chaotic), containing quadratic terms:

    ẋ_{t,j} = (x_{t,j+1} - x_{t,j-2}) x_{t,j-1} - x_{t,j} + F.

The state vector has 40 variables, x = (x_1, ..., x_40). The model parameter is F, the forcing variable.
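A minimal sketch of the Lorenz 96 tendencies (illustrative, not the talk's code), using np.roll to handle the periodic latitude circle; the Euler demo step is just to show the function in use.

```python
import numpy as np

def lorenz96(x, F=8.0):
    """Lorenz 96 tendencies on a periodic domain:
    dx_j/dt = (x_{j+1} - x_{j-2}) * x_{j-1} - x_j + F."""
    return (np.roll(x, -1) - np.roll(x, 2)) * np.roll(x, 1) - x + F

# 40-variable state, forcing F = 8, simple Euler integration as a demo.
x = 8.0 + 0.01 * np.random.default_rng(2).standard_normal(40)
for _ in range(1000):
    x = x + 0.01 * lorenz96(x)
print(x[:5])
```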

28. Model & Assimilation Settings

- Time step dt = .05 or .25.
- Forcing F = 8 (known or unknown).
- Observations at every location.
- Error covariances: Q = 0, R = αI; true α = 4.
- EnKF with ensemble size m = 100.
- Covariance localization with cutoff radius c = 10.
- Covariance inflation factor 1.

29. Sequential Bayesian Estimates of α (dt = .05)

[Figure: sequential posterior estimates of α over time under three priors: α | Y_0 ~ IG(1.5, 6), IG(15, 240), and IG(15, 15).]

30. Sequential Bayesian Estimates of α (dt = .25)

[Figure: same experiment with dt = .25; sequential posterior estimates of α under the priors α | Y_0 ~ IG(1.5, 6), IG(15, 240), and IG(15, 15).]

31. Sequential Bayesian Estimates of (α, F)

[Figure: joint sequential estimates of α and F over time, under priors α | Y_0 ~ IG(15, 15) and F | Y_0 ~ N(8, 1).]

32. Sequential Estimates of (α, F): Sparse Network

[Figure: sequential estimates of α and F with a sparse observation network, under priors α | Y_0 ~ IG(15, 15) and F | Y_0 ~ N(8, 1).]

33. Spatially- and Temporally-Varying Scale Factors

[Figure: sequential estimates of two scale factors α_1 and α_2, with priors such as α_1 | Y_0 ~ IG(1.5, 6) and α_2 | Y_0 ~ IG(1.5, 3) or IG(1.5, 13.5).]

34. Estimation of Spatial Correlation Parameters: Discrete Representation

Assume R is defined by a covariance model K(r; α).

1. Assume a discrete prior on a grid of parameter values α*: α ~ Mult(α*, π).
2. Generate the forecast ensemble.
3. Estimate the innovation mean d̄ and covariance D(α).
4. Update the posterior distribution:
   p(α | d) ∝ Mult(α | α*, π) p(d | α) = Mult(α | α*, π').
5. Generate the analysis ensemble:
   α_i ~ Mult(α*, π'),  z_{t,i} = z_{t,i}^f + K(α_i)( d + N(0, R(α_i)) ).
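The weight update in step 4 is straightforward to implement. A minimal sketch under assumed names (D_of_alpha maps a grid value to its innovation covariance; the scalar-parameter demo model is hypothetical), working in log space for numerical stability.

```python
import numpy as np

def update_grid_weights(alphas, weights, innovation, D_of_alpha):
    """Discrete Bayes update for covariance parameters:
    pi'_j ∝ pi_j * N(d; 0, D(alpha_j)), evaluated via log-likelihoods."""
    logpost = np.log(weights)
    for j, a in enumerate(alphas):
        D = D_of_alpha(a)
        _, logdet = np.linalg.slogdet(D)
        quad = innovation @ np.linalg.solve(D, innovation)
        logpost[j] += -0.5 * (logdet + quad)
    logpost -= logpost.max()          # stabilize before exponentiating
    post = np.exp(logpost)
    return post / post.sum()

# Hypothetical scalar-parameter example: D(alpha) = alpha * I.
alphas = np.linspace(0.5, 4.0, 8)
weights = np.full(8, 1.0 / 8)
d = np.array([1.2, -0.7, 2.1])
print(update_grid_weights(alphas, weights, d, lambda a: a * np.eye(3)))
```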

35. Estimation of Spatial Correlation Parameters: Gaussian Approximation

Assume R is defined by a covariance model K(r; α).

1. Assume a normal prior on the parameters: α ~ N(m, C).
2. Generate the forecast ensemble.
3. Estimate the innovation mean d̄ and covariance D(α).
4. Update the posterior distribution:
   p(α | d) ∝ N(α | m, C) p(d | α) ≈ N(α | m', C').
5. Generate the analysis ensemble:
   α_i ~ N(m', C'),  z_{t,i} = z_{t,i}^f + K(α_i)( d + N(0, R(α_i)) ).

36. Grid vs Normal Posteriors: Linear Model

[Figure: true, grid-based, and normal-approximation posteriors at several times for a linear model.]

37. Sequential Posterior Estimates: Linear Model

[Figure: sequential posterior estimates of three parameters for the linear model.]

38. Lorenz 96 Model & Assimilation Settings

- Time step dt = .01.
- Perfect model, F = 8 known.
- Observations at 40 locations.
- R defined by the Matérn correlation model:

    K(r) = α / (Γ(ν) 2^{ν-1}) (r/λ)^ν K_ν(r/λ);  α, λ, ν > 0.

- EnKF with ensemble size m = 100.
- Covariance localization with radius r = 12.
- No covariance inflation.
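For concreteness, a small sketch (not from the talk) evaluating this Matérn covariance with SciPy's modified Bessel function K_ν; since (r/λ)^ν K_ν(r/λ) → Γ(ν) 2^{ν-1} as r → 0, the r = 0 value is set explicitly to the limiting variance α.

```python
import numpy as np
from scipy.special import gamma, kv  # kv = modified Bessel function K_nu

def matern(r, alpha, lam, nu):
    """Matérn covariance K(r) = alpha / (Gamma(nu) 2^{nu-1}) (r/lam)^nu K_nu(r/lam).

    The r = 0 limit equals alpha (the variance); it is handled explicitly
    because kv diverges there.
    """
    r = np.asarray(r, dtype=float)
    scaled = r / lam
    out = np.full(r.shape, alpha)
    pos = scaled > 0
    out[pos] = (alpha / (gamma(nu) * 2 ** (nu - 1))
                * scaled[pos] ** nu * kv(nu, scaled[pos]))
    return out

print(matern(np.array([0.0, 1.0, 5.0]), alpha=4.0, lam=2.0, nu=1.5))
```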

39. Sequential Posterior Distributions: Discrete

[Figure: discrete posterior distributions on the parameter grid at increasing times (t = 0, 5, 50, ...).]

40. Sequential Bayesian Estimates: Discrete

[Figure: posterior means and 95% intervals for the Matérn parameters α, λ, and ν over time.]

41. Conclusions

- Bayesian methods are useful for parameter estimation in DA.
- Two new algorithms for combined state and parameter estimation within the EnKF.
- Easily combined with state augmentation.
- Good convergence properties (unlike recursive ML).
- Conjugate priors allow for easy updating.
- Would love to collaborate with you on this topic!

42. Computational Methods

Bayesian and ML methods rely heavily on calculation of the likelihood. Several approximate methods have been proposed for computing the likelihood for large spatial data sets:
- Spectral approximations (Whittle, 1953)
- Approximate likelihood (Vecchia, 1988)
- Covariance localization (Kaufman et al., 2008)

These methods can be applied in data assimilation systems.
