Strathprints Institutional Repository


Gejadze, I. Yu. and Shutyaev, V. and Le Dimet, F.X. (2013) Analysis error covariance versus posterior covariance in variational data assimilation. Quarterly Journal of the Royal Meteorological Society, 139 (676). 1826-1841. ISSN 0035-9009, http://dx.doi.org/10.1002/qj.2070

This version is available at http://strathprints.strath.ac.uk/42544/

Strathprints is designed to allow users to access the research output of the University of Strathclyde. Unless otherwise explicitly stated on the manuscript, Copyright and Moral Rights for the papers on this site are retained by the individual authors and/or other copyright owners. Please check the manuscript for details of any other licences that may have been applied. You may not engage in further distribution of the material for any profit-making activities or any commercial gain. You may freely distribute both the url (http://strathprints.strath.ac.uk/) and the content of this paper for research or private study, educational, or not-for-profit purposes without prior permission or charge. Any correspondence concerning this service should be sent to the Strathprints administrator: strathprints@strath.ac.uk

Quarterly Journal of the Royal Meteorological Society, Q. J. R. Meteorol. Soc. (2012)

Analysis error covariance versus posterior covariance in variational data assimilation

I. Yu. Gejadze,(a)* V. Shutyaev (b) and F.-X. Le Dimet (c)

(a) Department of Civil Engineering, University of Strathclyde, Glasgow, UK
(b) Institute of Numerical Mathematics, Russian Academy of Sciences, MIPT, Moscow, Russia
(c) MOISE project, LJK, University of Grenoble, France

*Correspondence to: I. Yu. Gejadze, Department of Civil Engineering, University of Strathclyde, 107 Rottenrow, Glasgow G4 0NG, UK. E-mail: igor.gejadze@strath.ac.uk

The problem of variational data assimilation for a nonlinear evolution model is formulated as an optimal control problem to find the initial condition function (analysis). The data contain errors (observation and background errors); hence there is an error in the analysis. For mildly nonlinear dynamics the analysis error covariance can be approximated by the inverse Hessian of the cost functional in the auxiliary data assimilation problem, and for stronger nonlinearity by the effective inverse Hessian. However, it has been noticed that the analysis error covariance is not the posterior covariance from the Bayesian perspective. While these two are equivalent in the linear case, the difference may become significant in practical terms as the nonlinearity level rises. For the proper Bayesian posterior covariance a new approximation via the Hessian is derived and its effective counterpart is introduced. An approach for computing the mentioned estimates in a matrix-free environment using the Lanczos method with preconditioning is suggested. Numerical examples which validate the developed theory are presented for the model governed by Burgers' equation with a nonlinear viscous term. Copyright (c) 2012 Royal Meteorological Society

Key Words: large-scale flow models; nonlinear dynamics; data assimilation; optimal control; analysis error covariance; Bayesian posterior covariance; Hessian

Received 3 February 2012; Revised 25 September 2012; Accepted 4 October 2012; Published online in Wiley Online Library

Citation: Gejadze IY, Shutyaev V, Le Dimet F-X. 2012. Analysis error covariance versus posterior covariance in variational data assimilation. Q. J. R. Meteorol. Soc. DOI:10.1002/qj.2070

1. Introduction

Over the past two decades, methods of data assimilation (DA) have become vital tools for analysis and prediction of complex physical phenomena in various fields of science and technology, but particularly in large-scale geophysical applications such as numerical weather and ocean prediction. Among the few feasible methods for solving these problems, the variational data assimilation method called 4D-Var is the preferred method implemented at some major operational centres, such as the UK Met Office, ECMWF, Meteo France and GMAO (USA). The key ideas of the method were introduced by Sasaki (1955), Penenko and Obraztsov (1976) and Le Dimet and Talagrand (1986). Assuming that an adequate dynamical model describing the evolution of the state u is given, the 4D-Var method consists in minimization of a specially designed cost functional J(u) which includes two parts: the squared weighted residual between model predictions and instrumental observations taken over the finite observation period [0, T]; and the squared weighted difference between the solution and the prior estimate of u, known as the background term u_b. Without this term one would simply get the generalized nonlinear least-squares problem (Hartley and Booker, 1965), whereas in the presence of the background term the cost functional is similar to that considered in Tikhonov's regularization theory (Tikhonov,

1963). The modern implementation of the method in meteorology is known as the incremental approach (see Courtier et al., 1994). Curiously, it took over a decade for the data assimilation community to realize that the incremental approach is nothing else but the Gauss-Newton method applied for solving the optimality system associated with J(u) (see Lawless et al., 2005). The error in the optimal solution (the 'analysis error') is naturally defined as the difference between the solution u and the true state u_t; this error is quantified by the analysis error covariance matrix (see, for example, Thacker, 1989; Rabier and Courtier, 1992; Fisher and Courtier, 1995; Yang et al., 1996; Gejadze et al., 2008). This perception of uncertainties in the 4D-Var method is probably inherited from the nonlinear least-squares (or nonlinear regression) theory (Hartley and Booker, 1965). A less widespread point of view is to consider the 4D-Var method in the framework of Bayesian methods. Among the first to write on the Bayesian perspective on DA one should probably mention Lorenc (1986) and Tarantola (1987). For a comprehensive review of the recent advances in DA from this point of view see, for example, Wikle and Berliner (2007) and Stuart (2010). So far, it has been recognized that for Gaussian data errors (which include observation and background/prior errors) the Bayesian approach leads to the same standard 4D-Var cost functional J(u) to be minimized. However, it is not yet widely recognized that the conception of the estimation error in the Bayesian theory is somewhat different from the nonlinear least-squares theory and, as a result, the Bayesian posterior covariance is not exactly the analysis error covariance. These two are conceptually different objects, which can sometimes be approximated by the same estimate. In the linear case they are quantitatively equal; in the nonlinear case the difference may become quite noticeable in practical terms.

Note that the analysis error covariance computed at the optimal solution can also be named 'posterior', because it is, in some way, conditioned on the data (observations and background/prior). However, this is not the same as the Bayesian posterior covariance. An important issue is the relationship between the analysis error covariance, the Bayesian posterior covariance and the Hessian 𝓗 = J''(u). A well-known fact, which can be found in any textbook on statistics (e.g. Draper and Smith, 1966), is that in the case of a linear dependence between the state variables (exogenous variables) and the observations (endogenous variables) the analysis error covariance is equal to 𝓗^{-1}. For the nonlinear case this is transformed into the statement that the analysis error covariance can be approximated by H^{-1}, where H is a linearized approximation to 𝓗. Since the analysis error covariance is often confused with the Bayesian posterior covariance, the latter is also thought to be approximately equal to H^{-1}. This misconception often becomes apparent when one applies, or intends to apply, elements of the variational approach in the framework of sequential methods (filtering) (see, for example, Dobricic, 2009, p. 274; Auvinen et al., 2010, p. 319; Zupanski et al., 2008, p. 1043). In the 4D-Var framework, the analysis error covariance must be considered to evaluate the confidence intervals/regions of the analysis or the corresponding forecast. However, it is the Bayesian posterior covariance which should be used as a basis for evaluating the background covariance for the next assimilation window if the Bayesian approach is to be consistently followed. In this paper we carefully consider the relationships between the two mentioned covariances and the Hessians 𝓗 and H. A new estimate of the Bayesian posterior covariance via the Hessians has been suggested and its effective counterpart (similar to the 'effective inverse Hessian'; see Gejadze et al., 2011) has been introduced.
We believe these are new results which may have both theoretical and applied value, not only for data assimilation but also in the more general framework of inverse problems and parameter estimation theory (Tarantola, 2005). The issue of computational efficiency is not considered to be of major importance in this paper; however, all introduced estimates are, in principle, computable in large-scale problem set-ups.

The paper is organized as follows. In section 3 we state the variational DA problem to identify the initial conditions for a nonlinear evolution model. In section 4 the equation for the analysis error is given through the errors in the input data using the Hessian of the auxiliary DA problem, and the basic relationship between the analysis error covariance and the inverse of this Hessian is established. Similarly, in section 5 the expression for the Bayesian posterior covariance involving the original Hessian of J(u) and the Hessian of the auxiliary DA problem is derived. In section 6 the effective estimates are introduced and in section 7 the key implementation issues are considered. In section 8 the asymptotic properties of the regularized least-squares estimator and of the Bayesian estimator are briefly discussed. The details of the numerical implementation are presented in section 9 and the numerical results which validate the presented theory in section 10. The main results of this paper are summarized in the Conclusions. The Appendix contains additional material on the asymptotic properties of the estimators.

2. Overview

Let u be the initial state of a dynamical system and y incomplete observations of the system. It is possible to write the initialization-to-data map as y = G(u) + ξ_o, where G represents the mapping from the initial state to the observations and ξ_o is a random variable from the Gaussian N(0, V_o). The objective is to find u from y. In the Bayesian formulation u has the prior density ρ_prior from the Gaussian N(u_b, V_b). The posterior density ρ_post is given by Bayes' rule as

ρ_post(u) = const exp( -Φ(u) - (1/2)||V_b^{-1/2}(u - u_b)||^2 ),  Φ(u) = (1/2)||V_o^{-1/2}(y - G(u))||^2

(for details see section 5). The 4D-Var solution, which coincides with the maximizer of the posterior density, is found by minimizing Φ(u) + (1/2)||V_b^{-1/2}(u - u_b)||^2 (see Eqs (2)-(3)). The minimizer ū solves the optimality system

DΦ(ū) + V_b^{-1}(ū - u_b) = 0

(see Eqs (4)-(6)). With this notation the paper addresses the following issues:

(i) The posterior covariance is given by E_post[(u - u_mean)(u - u_mean)*], where u_mean = E_post u and E_post denotes averaging (expectation) with respect to ρ_post (see Eq. (32)). The posterior covariance is often approximated by trying to find the second moment of ρ_post centred around ū instead of u_mean (see Eq. (33)), which is natural because ū is the output of 4D-Var. In the linear Gaussian set-up u_mean and ū coincide. This is not true in general, but can be expected to be a good approximation if the volume of data is large and/or the noise is small (see section 8).

(ii) The analysis error covariance is associated with trying to find an approximation around the truth u_t, as the data are also assumed to come from the truth: y = G(u_t) + ξ_o, u_b = u_t + ξ_b, where ξ_o ~ N(0, V_o) and ξ_b ~ N(0, V_b) are the observation and background errors, respectively. The analysis error is defined as δu = u - u_t and its covariance is given by E_a[(u - u_t)(u - u_t)*] = E_a[δu δu*] (see Eq. (22)), where E_a denotes averaging (expectation) with respect to the analysis error density ρ_a which, taking into account the definitions of the data y and u_b, can be defined as follows:

ρ_a(u) = const exp( -Φ(u) - (1/2)||V_b^{-1/2}(u - u_t)||^2 ),  Φ(u) = (1/2)||V_o^{-1/2}(G(u_t) - G(u))||^2.

The analysis error covariance can be approximated by the inverse of the Hessian H of the auxiliary cost function

(1/2)||V_o^{-1/2} DG(u_t) v||^2 + (1/2)||V_b^{-1/2} v||^2,

where v is a function belonging to the state space (see Eqs (20)-(21)). Since u_t is not known, ū is used instead of u_t.

(iii) Owing to the different centring of the Gaussian data, the posterior covariance and the analysis error covariance are different objects and should not be confused. They are equal in the linear case.

(iv) Computing DΦ to find the 4D-Var solution requires computing (DG)*, and this may be found from an adjoint computation (see Eq. (5)). Computing the approximation of the posterior covariance at ū requires finding the Hessian 𝓗(ū) = D²Φ(ū) + V_b^{-1} (see Eqs (45)-(47)) and inverting it.
The second derivative D²Φ(ū) requires computing D²G(ū). Important (and sometimes expensive to compute) terms coming from F''(ū), in the notation to follow, cannot be neglected here.

(v) The posterior covariance can be approximated using the formula which includes both Hessians 𝓗 and H (see Eq. (51)). Other, successively coarser, approximations include 𝓗^{-1} and H^{-1}. The latter coincides with the approximation of the analysis error covariance. The actual implementation of the algorithms for computing the above estimates is detailed in the paper. Owing to the presence of linearization errors, the effective values of all the covariance estimates are to be preferred (see section 6) if they are computationally affordable.

(vi) We put a distance metric (see Eq. (68)) on operators/matrices and use this to compare all of the different notions of covariance. It is important to distinguish between differences arising from conceptual shifts of perspective and those arising from approximations. For example, H^{-1} must be used for estimating the analysis error covariance, not 𝓗^{-1}. In this case, the latter (if available by means of a different approach; see, for example, Yang et al., 1996) can be used as an approximation to H^{-1}. Vice versa, it is 𝓗^{-1} that should be used for estimating the posterior covariance, not H^{-1}. However, the latter can be used to approximate 𝓗^{-1}.

3. Statement of the problem

Consider the mathematical model of a physical process that is described by the evolution problem:

∂ϕ/∂t = F(ϕ) + f,  t ∈ (0, T),  ϕ|_{t=0} = u,  (1)

where ϕ = ϕ(t) is the unknown function belonging, for any t ∈ (0, T), to a state space X, u ∈ X, and F is a nonlinear operator mapping X into X. Let Y = L_2(0, T; X) be a space of functions ϕ(t) with values in X, with norm ||·||_Y = (·,·)_Y^{1/2}, and let f ∈ Y. Suppose that for a given u ∈ X, f ∈ Y there exists a unique solution ϕ ∈ Y to Eq. (1). Let u_t be the true initial state and ϕ_t the solution to the problem (Eq. (1)) with u = u_t, i.e. the true state evolution.
We define the input data as follows: the background function u_b ∈ X, u_b = u_t + ξ_b, and the observations y ∈ Y_o, y = Cϕ_t + ξ_o, where C : Y → Y_o is a linear bounded operator (the observation operator) and Y_o is an observation space. The functions ξ_b ∈ X and ξ_o ∈ Y_o may be regarded as the background and the observation error, respectively. We assume that these errors are normally distributed (Gaussian) with zero mean and covariance operators V_b = E[(·, ξ_b)_X ξ_b] and V_o = E[(·, ξ_o)_{Y_o} ξ_o], i.e. ξ_b ~ N(0, V_b), ξ_o ~ N(0, V_o), where '~' is read 'is distributed as'. We also assume that ξ_o, ξ_b are mutually uncorrelated and that V_b, V_o are positive definite, and hence invertible. Let us formulate the following DA problem (optimal control problem) with the aim of identifying the initial condition: for given f ∈ Y find u ∈ X and ϕ ∈ Y such that they satisfy Eq. (1), and on the set of solutions to Eq. (1) a cost functional J(u) takes the minimum value, i.e.

J(u) = inf_{v ∈ X} J(v),  (2)

where

J(u) = (1/2)(V_b^{-1}(u - u_b), u - u_b)_X + (1/2)(V_o^{-1}(Cϕ - y), Cϕ - y)_{Y_o}.  (3)

The necessary optimality condition reduces the problem (Eqs (2)-(3)) to the following system (Lions, 1968):

∂ϕ/∂t = F(ϕ) + f,  ϕ|_{t=0} = u,  (4)

-∂ϕ*/∂t - (F'(ϕ))* ϕ* = -C* V_o^{-1} (Cϕ - y),  (5)

V_b^{-1}(u - u_b) - ϕ*|_{t=0} = 0,  (6)

with the unknowns ϕ, ϕ*, u, where (F'(ϕ))* is the adjoint to the Frechet derivative of F, and C* is the adjoint to C defined by (Cϕ, ψ)_{Y_o} = (ϕ, C*ψ)_Y, ϕ ∈ Y, ψ ∈ Y_o. All adjoint variables throughout the paper satisfy the trivial terminal condition, e.g. ϕ*|_{t=T} = 0. Having assumed that the system (Eqs (4)-(6)) has a unique solution, we will study the impact of the errors ξ_b, ξ_o on the optimal solution u.

4. The analysis error covariance via the inverse Hessian

In this section an equation for the analysis error is derived through the errors in the input data, the approximate relationship between the analysis error covariance and the Hessian of the auxiliary DA problem is established, and the validity of this approximation is discussed.

Let us define the analysis (optimal solution) error δu = u - u_t and the corresponding (related via Eq. (7)) field deviation δϕ = ϕ - ϕ_t. Assuming F is continuously Frechet differentiable, there exists ϕ̃ = ϕ_t + τ(ϕ - ϕ_t), τ ∈ [0, 1], such that the Taylor-Lagrange formula (Marchuk et al., 1996) is valid: F(ϕ) - F(ϕ_t) = F'(ϕ̃)δϕ. Then from Eqs (4)-(6) we get

∂δϕ/∂t - F'(ϕ̃)δϕ = 0,  δϕ|_{t=0} = δu,  (7)

-∂ϕ*/∂t - (F'(ϕ))* ϕ* = -C* V_o^{-1} (Cδϕ - ξ_o),  (8)

V_b^{-1}(δu - ξ_b) - ϕ*|_{t=0} = 0.  (9)

Let us introduce the operator R(ϕ) : X → Y as follows:

R(ϕ)v = ψ,  v ∈ X,  (10)

where ψ is the solution of the tangent linear problem

∂ψ/∂t - F'(ϕ)ψ = 0,  ψ|_{t=0} = v.  (11)

The adjoint operator R*(ϕ) : Y → X acts on a function g ∈ Y according to the formula

R*(ϕ)g = ψ*|_{t=0},  (12)

where ψ* is the solution to the adjoint problem

-∂ψ*/∂t - (F'(ϕ))* ψ* = g,  ψ*|_{t=T} = 0.  (13)

Then the system for the errors (Eqs (7)-(9)) can be represented as a single operator equation for δu:

H(ϕ, ϕ̃)δu = V_b^{-1}ξ_b + R*(ϕ)C* V_o^{-1} ξ_o,  (14)

where

H(ϕ, ϕ̃) = V_b^{-1} + R*(ϕ)C* V_o^{-1} C R(ϕ̃).  (15)

The operator H(ϕ, ϕ̃) : X → X can be defined by the successive solution of the following problems:

∂ψ/∂t - F'(ϕ̃)ψ = 0,  ψ|_{t=0} = v,  (16)

-∂ψ*/∂t - (F'(ϕ))* ψ* = -C* V_o^{-1} C ψ,  (17)

H(ϕ, ϕ̃)v = V_b^{-1} v - ψ*|_{t=0}.
(18)

In general, the operator H(ϕ, ϕ̃) is neither symmetric nor positive definite. However, if both its entries are the same, i.e. ϕ = ϕ̃ = θ, it becomes the Hessian H(θ) of the cost function J_1 in the following optimal control problem: find δu and δϕ such that

J_1(δu) = inf_v J_1(v),  (19)

where

J_1(δu) = (1/2)(V_b^{-1}(δu - ξ_b), δu - ξ_b)_X + (1/2)(V_o^{-1}(Cδϕ - ξ_o), Cδϕ - ξ_o)_{Y_o},  (20)

and δϕ satisfies the problem

∂δϕ/∂t - F'(θ)δϕ = 0,  δϕ|_{t=0} = δu.  (21)

We shall call the problem (Eqs (19)-(20)) the auxiliary DA problem and the entry θ in Eq. (21) the origin of the Hessian H(θ). Let us note that any ξ_b ∈ X and ξ_o ∈ Y_o can be considered in Eq. (20), including ξ_b = 0 and ξ_o = 0. Further, we assume that the optimal solution (analysis) error δu is unbiased, i.e. E[δu] = 0 (the validity of this assumption in the nonlinear case will be discussed in section 8), with the analysis error covariance operator

V_δu = E[(·, δu)_X δu] = E[(·, u - u_t)_X (u - u_t)].  (22)

In order to evaluate V_δu we express δu from Eq. (14), then apply the expectation E to (·, δu)_X δu. Let us note, however, that the functions ϕ, ϕ̃ in Eqs (7)-(9) depend on ξ_b, ξ_o, and so do the operators R(ϕ̃), R*(ϕ); hence it is not possible to represent δu through ξ_b, ξ_o in an explicit form. Therefore, before applying E we need to introduce some approximations of the operators involved in Eq. (14) that are independent of ξ_b, ξ_o. Consider the functions ϕ̃ = ϕ_t + τδϕ

and ϕ = ϕ_t + δϕ in Eqs (7)-(9). As far as we assume that E[δu] ≈ 0, it is natural to consider E[δϕ] ≈ 0. Thus the best value of ϕ and ϕ̃ independent of ξ_o, ξ_b is apparently ϕ_t, and we can use the following approximations:

R(ϕ̃) ≈ R(ϕ_t),  R*(ϕ) ≈ R*(ϕ_t).  (23)

Then Eq. (14) reduces to

H(ϕ_t)δu = V_b^{-1}ξ_b + R*(ϕ_t)C* V_o^{-1} ξ_o,  (24)

where

H(·) = V_b^{-1} + R*(·)C* V_o^{-1} C R(·).  (25)

Now we express δu from Eq. (24):

δu = H^{-1}(ϕ_t)(V_b^{-1}ξ_b + R*(ϕ_t)C* V_o^{-1} ξ_o),

and obtain the expression for the analysis error covariance as follows:

V_δu = H^{-1}(ϕ_t)(V_b^{-1} + R*(ϕ_t)C* V_o^{-1} C R(ϕ_t))H^{-1}(ϕ_t) = H^{-1}(ϕ_t)H(ϕ_t)H^{-1}(ϕ_t) = H^{-1}(ϕ_t).  (26)

In practice the true field ϕ_t is not known (apart from the identical twin experiment set-up); thus we have to use its best available approximation ϕ̄, associated with a certain unique optimal solution ū defined by the real data (ū_b, ȳ), i.e. we have to use

V_δu = H^{-1}(ϕ̄).  (27)

This formula is equivalent to a well-established result (see Thacker, 1989; Rabier and Courtier, 1992; Courtier et al., 1994), which is usually deduced (without considering the exact equation (14)) by straightforwardly simplifying the original nonlinear DA problem (Eqs (2)-(3)) under the assumption that

F(ϕ) - F(ϕ_t) ≈ F'(ϕ)δϕ, ∀ϕ,  (28)

which is called the tangent linear hypothesis (TLH). In particular, in Rabier and Courtier (1992, p. 671), the error equation is actually derived in the form

(V_b^{-1} + R*(ϕ)C* V_o^{-1} C R(ϕ))δu = V_b^{-1}ξ_b + R*(ϕ)C* V_o^{-1} ξ_o.  (29)

It is obvious that the operators R(ϕ), R*(ϕ) in this equation depend on the errors via ϕ and cannot be treated as constant with respect to δu when computing the expectation E[(·, δu)_X δu], as has been done by Rabier and Courtier (1992). From Eq. (29) the authors nevertheless deduce the formula (27); hence there is no difference in practical terms between the two approaches.
However, it is clear from our derivation that the best estimate of V_δu via the inverse Hessian can be achieved given the origin ϕ_t. The error in this estimate is an averaged (over all possible realizations of ϕ and ϕ̃) error due to the transitions R(ϕ̃) → R(ϕ_t) and R*(ϕ) → R*(ϕ_t); we shall call it the 'linearization error'. The use of ϕ̄ instead of ϕ_t in the Hessian computations leads to another error, which we shall call the 'origin error'. It is important to distinguish these two errors. The first one is related to the method in use and can be eliminated if the error equation (14) is satisfied exactly for each realization of the errors; this can be achieved by solving the perturbed original DA problem in a Monte Carlo loop with a large sample size, for example. The second one, however, cannot be eliminated by any method, given that the state estimate almost always differs from the truth. It should be mentioned in advance that the origin error can be significantly larger than the linearization error. This means, for example, that the use of the computationally expensive Monte Carlo method instead of the inverse Hessian may lead to only a marginal quality improvement. This issue is discussed in Gejadze et al. (2011), and a method of assessing the possible magnitude of the origin error is the subject of a forthcoming paper.

In the context of our approach, the tangent linear hypothesis should rather be considered in the form

F(ϕ) - F(ϕ_t) ≈ F'(ϕ_t)δϕ, ∀ϕ.  (30)

There is a clear difference between Eqs (30) and (28). For example, if we assume that E[δϕ] = 0, then E[F'(ϕ_t)δϕ] = 0; however, E[F'(ϕ)δϕ] = E[F'(ϕ_t + δϕ)δϕ] ≠ 0. One can easily imagine situations in which the condition (30) is far less demanding than (28). It is customarily said in the geophysical literature that V_δu can be approximated by the inverse Hessian if the TLH (Eq. (28)) is valid, which should be true if the nonlinearity is mild and/or the error δu and, subsequently, δϕ are small. We would say, more precisely, that the linearization error in V_δu approximated by H^{-1}(ϕ_t) is small if the TLH (Eq. (30)) is valid. Moreover, we derive Eq. (26) via Eq. (14).
From this derivation one can see that the validity of Eq. (26) depends on the accuracy of the approximations (23), which may still be accurate even when Eq. (30) is not satisfied. This partially explains why in practice the approximation (26) is reasonably accurate when Eq. (30) is evidently not satisfied. Another reason is rooted in the stochastic properties of the nonlinear least-squares estimator, as discussed in section 6. However, it is hardly possible to judge the magnitude of the origin error on the basis of the condition (30) being valid or not.

5. Posterior covariance

In this section the expression for the Bayesian posterior covariance involving the Hessians of the original functional J(u) and the auxiliary functional J_1(δu) is derived, and its possible approximations are discussed. The results of this section demonstrate that the analysis error covariance and the Bayesian posterior covariance are different objects and should not be confused.

Given u_b ~ N(ū_b, V_b) and y ~ N(ȳ, V_o), the following expression for the posterior distribution of u is derived from the Bayes theorem (for details see Stuart, 2010):

p(u|ȳ) = const exp( -(1/2)(V_b^{-1}(u - ū_b), u - ū_b)_X ) exp( -(1/2)(V_o^{-1}(Cϕ - ȳ), Cϕ - ȳ)_{Y_o} ).  (31)

It follows from Eq. (31) that the solution to the variational DA problem (Eqs (2)-(3)) with the data y = ȳ and u_b = ū_b

is equal to the mode of p(u|ȳ) (see, for example, Lorenc, 1986; Tarantola, 1987). Accordingly, the Bayesian posterior covariance has to be defined by

V_δu = E[(·, u - E[u])_X (u - E[u])],  (32)

with u ~ p(u|ȳ). Clearly, in order to compute V_δu by the Monte Carlo method, one must generate a sample of pseudorandom realizations u from p(u|ȳ). In particular, in the ensemble filtering methods (see Evensen, 2003; Zupanski et al., 2008) these are produced by solving the optimal control problem (i.e. an inverse problem!) for independently perturbed data at the current time step: by explicitly using the Kalman update formula in the EnKF of Evensen (2003), or by minimizing the nonlinear cost function in the MLEF of Zupanski et al. (2008). Then the sample mean and the sample covariance (equivalent to Eq. (32)) are computed. As far as the ensemble filtering methods are considered a special case of Bayesian sequential estimation (Wikle and Berliner, 2007, p. 10), we may call the covariance obtained by the described method the Bayesian posterior covariance.

Following a similar approach in variational DA, one should consider u to be the solutions to the DA problem (Eqs (2)-(3)) with the perturbed data u_b = ū_b + ξ_b and y = ȳ + ξ_o, where ξ_b ~ N(0, V_b), ξ_o ~ N(0, V_o). Further, we assume that E[u] = ū, where ū is the solution to the unperturbed problem (Eqs (2)-(3)), in which case V_δu can be approximated as follows:

V_δu = E[(·, u - ū)_X (u - ū)] = E[(·, δu)_X δu].  (33)

We will show that this covariance is different from the classical analysis error covariance (Rabier and Courtier, 1992) evaluated at the optimal solution ū. Now, in order to build the posterior error covariance, let us consider the unperturbed optimality system (Eqs (4)-(6)) with fixed u_b = ū_b, y = ȳ:

∂ϕ̄/∂t = F(ϕ̄) + f,  ϕ̄|_{t=0} = ū,  (34)

-∂ϕ̄*/∂t - (F'(ϕ̄))* ϕ̄* = -C* V_o^{-1} (Cϕ̄ - ȳ),  (35)

V_b^{-1}(ū - ū_b) - ϕ̄*|_{t=0} = 0,  (36)

with the solution {ū, ϕ̄, ϕ̄*}. Let us now introduce the perturbations as follows: u_b = ū_b + ξ_b, y = ȳ + ξ_o, with ξ_b ∈ X, ξ_o ∈ Y_o. The perturbed solution {u, ϕ, ϕ*} satisfies Eqs (4)-(6). Let us denote δu = u - ū, δϕ = ϕ - ϕ̄ and δϕ* = ϕ* - ϕ̄*.
Then from Eqs (4)-(6) and (34)-(36) we obtain for {δu, δϕ, δϕ*}:

∂δϕ/∂t = F(ϕ) - F(ϕ̄),  δϕ|_{t=0} = δu,  (37)

-∂δϕ*/∂t - (F'(ϕ))* δϕ* = [(F'(ϕ))* - (F'(ϕ̄))*] ϕ̄* - C* V_o^{-1} (Cδϕ - ξ_o),  (38)

V_b^{-1}(δu - ξ_b) - δϕ*|_{t=0} = 0.  (39)

Using the Taylor-Lagrange formulas

F(ϕ) = F(ϕ̄) + F'(ϕ̃_1)δϕ,  F'(ϕ) = F'(ϕ̄) + F''(ϕ̃_2)δϕ,

with ϕ̃_1 = ϕ̄ + τ_1 δϕ, ϕ̃_2 = ϕ̄ + τ_2 δϕ, τ_1, τ_2 ∈ [0, 1], we derive the system for the errors:

∂δϕ/∂t = F'(ϕ̃_1)δϕ,  δϕ|_{t=0} = δu,  (40)

-∂δϕ*/∂t - (F'(ϕ))* δϕ* = [(F''(ϕ̃_2))* ϕ̄*]* δϕ - C* V_o^{-1} (Cδϕ - ξ_o),  (41)

V_b^{-1}(δu - ξ_b) - δϕ*|_{t=0} = 0,  (42)

which is equivalent to a single operator equation for δu:

𝓗(ϕ, ϕ̃_1, ϕ̃_2)δu = V_b^{-1}ξ_b + R*(ϕ)C* V_o^{-1} ξ_o,  (43)

where

𝓗(ϕ, ϕ̃_1, ϕ̃_2) = V_b^{-1} + R*(ϕ)(C* V_o^{-1} C - [(F''(ϕ̃_2))* ϕ̄*]*) R(ϕ̃_1).  (44)

Here the operators R and R* are defined in section 4, and 𝓗(ϕ, ϕ̃_1, ϕ̃_2) : X → X can be defined by the successive solution of the following problems:

∂ψ/∂t = F'(ϕ̃_1)ψ,  ψ|_{t=0} = v,  (45)

-∂ψ*/∂t - (F'(ϕ))* ψ* = [(F''(ϕ̃_2))* ϕ̄*]* ψ - C* V_o^{-1} C ψ,  (46)

𝓗(ϕ, ϕ̃_1, ϕ̃_2)v = V_b^{-1} v - ψ*|_{t=0}.  (47)

Let us underline that the term involving F'' on the right-hand side of Eq. (41) is of first-order accuracy with respect to δϕ, the same as C* V_o^{-1} C δϕ, and therefore it cannot be neglected in the derivation of the covariance. In general, the operator 𝓗(ϕ, ϕ̃_1, ϕ̃_2) is neither symmetric nor positive definite. However, if all its entries are the same, i.e. ϕ = ϕ̃_1 = ϕ̃_2, it becomes the Hessian 𝓗(ϕ) of the cost function in the original DA problem (Eqs (2)-(3)), which is symmetric and, moreover, positive definite if u is a minimum point of J(u). Equation (46) is often referred to as the second-order adjoint model (Le Dimet et al., 2002). Technically, this is simply an adjoint model with a specially defined source term.

As before, we assume that E[δu] ≈ 0. Let us accept the following approximations:

R(ϕ̃_1) ≈ R(ϕ̄),  R*(ϕ) ≈ R*(ϕ̄),  [(F''(ϕ̃_2))* ϕ̄*]* ≈ [(F''(ϕ̄))* ϕ̄*]*.  (48)

Then the exact error equation (43) is approximated as follows:

𝓗(ϕ̄)δu = V_b^{-1}ξ_b + R*(ϕ̄)C* V_o^{-1} ξ_o,  (49)

𝓗(ϕ̄) = V_b⁻¹ + R*(ϕ̄)(C* V_o⁻¹ C − [(F″(ϕ̄)) ·]* ϕ̄*) R(ϕ̄).  (50)

Now we express δu from Eq. (49): δu = 𝓗⁻¹(ϕ̄)(V_b⁻¹ ξ_b + R*(ϕ̄) C* V_o⁻¹ ξ_o), and obtain an approximate expression for the posterior error covariance:

V_δu ≈ V₁ = 𝓗⁻¹(ϕ̄)(V_b⁻¹ + R*(ϕ̄) C* V_o⁻¹ C R(ϕ̄)) 𝓗⁻¹(ϕ̄) = 𝓗⁻¹(ϕ̄) H(ϕ̄) 𝓗⁻¹(ϕ̄),  (51)

where H(ϕ̄) is the Hessian of the cost function J₁ in the auxiliary DA problem (Eqs (19)–(20)), computed at θ = ϕ̄. Obviously, the above double-product formula could be overly sensitive to the errors due to the approximations (48). By assuming H(ϕ̄) 𝓗⁻¹(ϕ̄) ≈ I we obtain a more stable (but, possibly, less accurate) approximation:

V_δu ≈ V₂ = 𝓗⁻¹(ϕ̄).  (52)

It is interesting to note that 𝓗⁻¹(ϕ̄) is known as the asymptotic Bayesian covariance in the framework of the Bayesian asymptotic theory (see Heyde and Johnstone, 1979; Kim, 1994). By assuming 𝓗⁻¹(ϕ̄) ≈ H⁻¹(ϕ̄) we obtain from Eq. (51) yet another (more crude than Eq. (52)) approximation:

V_δu ≈ V₃ = H⁻¹(ϕ̄),  (53)

i.e. the inverse Hessian of the auxiliary DA problem can be considered as an approximation to both the posterior error covariance and the analysis error covariance evaluated at ϕ̄.

6. Effective covariance estimates

At the end of section 4 the linearization and origin errors in the analysis error covariance were discussed. We saw that the linearization error can be relatively small even though the TLH is violated to a certain degree. However, when the nonlinearity becomes stronger and/or the input data errors become larger, the inverse Hessian may not properly approximate the analysis error covariance (even for the known true state), in which case the effective inverse Hessian (see Gejadze et al., 2011) should be used instead:

V_δu ≈ E[H⁻¹(ϕ)].  (54)

Apparently, the same must be true for the posterior error covariance computed by Eq. (51). Following the reasoning of Gejadze et al. (2011), let us consider the discretized nonlinear error equation (43) and write down the expression for δu:

δu = 𝓗⁻¹(ϕ, ϕ̃₁, ϕ̃₂)(V_b⁻¹ ξ_b + R*(ϕ) C* V_o⁻¹ ξ_o).
For the covariance V_δu we have an expression as follows:

V_δu = E[𝓗⁻¹ V_b⁻¹ ξ_b ξ_bᵀ V_b⁻¹ 𝓗⁻¹] + E[𝓗⁻¹ R*(ϕ) C* V_o⁻¹ ξ_o ξ_oᵀ V_o⁻¹ C R(ϕ) 𝓗⁻¹]
      + E[𝓗⁻¹ V_b⁻¹ ξ_b ξ_oᵀ V_o⁻¹ C R(ϕ) 𝓗⁻¹] + E[𝓗⁻¹ R*(ϕ) C* V_o⁻¹ ξ_o ξ_bᵀ V_b⁻¹ 𝓗⁻¹],  (55)

where 𝓗⁻¹ = 𝓗⁻¹(ϕ, ϕ̃₁, ϕ̃₂). As discussed in Gejadze et al. (2011), we approximate the products ξ_b ξ_bᵀ, ξ_o ξ_oᵀ, ξ_b ξ_oᵀ and ξ_o ξ_bᵀ in (55) by E[ξ_b ξ_bᵀ] = V_b, E[ξ_o ξ_oᵀ] = V_o, and E[ξ_b ξ_oᵀ] = 0, E[ξ_o ξ_bᵀ] = 0 (since ξ_b and ξ_o are mutually uncorrelated), respectively. Thus we write an approximation of V_δu as follows:

V_δu ≈ E[𝓗⁻¹(ϕ, ϕ̃₁, ϕ̃₂) H(ϕ) 𝓗⁻¹(ϕ, ϕ̃₁, ϕ̃₂)].

First, we substitute the possibly asymmetric and indefinite operator 𝓗(ϕ, ϕ̃₁, ϕ̃₂) with the Hessian 𝓗(ϕ), in which case we obtain

V_δu ≈ V₁ᵉ = E[𝓗⁻¹(ϕ) H(ϕ) 𝓗⁻¹(ϕ)].  (56)

Here we keep in mind that ϕ := ϕ(u) = ϕ(ū + δu), where δu is a random vector; therefore it is the variable of integration in E. Next, by assuming H(ϕ) 𝓗⁻¹(ϕ) ≈ I we obtain a more stable (but, possibly, less accurate) approximation:

V_δu ≈ V₂ᵉ = E[𝓗⁻¹(ϕ)].  (57)

Finally, by assuming 𝓗⁻¹(ϕ) ≈ H⁻¹(ϕ) we obtain yet another (more crude than Eq. (57)) approximation:

V_δu ≈ V₃ᵉ = E[H⁻¹(ϕ)],  (58)

which is equivalent to Eq. (54). Therefore, the effective inverse Hessian can also be considered as an approximation to the posterior error covariance.

7. Implementation remarks

In this section the key implementation issues, including preconditioning, regularization and computation of the effective covariance estimates, are considered.

7.1. Preconditioning

Preconditioning can be used to accelerate computation of the inverse Hessian by iterative methods such as BFGS or Lanczos. The latter evaluates the eigenvalues and eigenvectors (or, more precisely, the Ritz values and Ritz vectors) of an operator using the operator-vector action result. Since H is self-adjoint, we must consider a projected Hessian in a symmetric form:

H̄(·) = (B⁻¹)* H(·) B⁻¹,

with some operator B: X → X, defined in such a way that: (a) most eigenvalues of H̄ are clustered around 1; (b) there are only a few eigenvalues significantly different from 1 (dominant eigenvalues).
A sensible approximation of H̄⁻¹ can be obtained using these dominant eigenvalues and the corresponding eigenvectors, the number of which is expected to be much smaller than the state-vector dimension M. After that, having computed H̄⁻¹, one can easily recover H⁻¹ using the formula H⁻¹(·) = B⁻¹ H̄⁻¹(·)(B⁻¹)*. By comparing Eq. (50) to (25) we notice that 𝓗(·) is different from H(·) due to the presence of the second-order

term [(F″(·)) ·]* ϕ̄*. If we assume that the difference between 𝓗(·) and H(·) is not large, then H^{−1/2}(·) can be used for efficient preconditioning of 𝓗(·). Thus we will look for the projected Hessian:

𝓗̄(·) = H^{−1/2}(·) 𝓗(·) H^{−1/2}(·),  (59)

in which case the posterior error covariance V_δu can be approximated by the following estimates:

V₁ = H^{−1/2}(ϕ̄) 𝓗̄⁻²(ϕ̄) H^{−1/2}(ϕ̄),  (60)
V₂ = H^{−1/2}(ϕ̄) 𝓗̄⁻¹(ϕ̄) H^{−1/2}(ϕ̄).  (61)

It is clear, therefore, that H^{−1/2}(ϕ̄) has to be computed first. For computing H⁻¹(·) itself, the preconditioning in the form B⁻¹ = V_b^{1/2} is used. The result can be presented in limited-memory form:

H⁻¹(·) = V_b^{1/2} H̄⁻¹(·) V_b^{1/2},  (62)

with

H̄⁻¹(·) = I + Σ_{i=1}^{K₁} (s_i⁻¹ − 1) U_i U_iᵀ,  (63)

where {s_i, U_i}, i = 1, …, K₁ ≪ M, are the eigenvalues and eigenvectors of H̄(·) for which the values of s_i⁻¹ − 1 are most significant. The matrix functions theory (see, for example, Bellman, 1960) asserts that for any symmetric matrix A (which may be presented in the form A = B D Bᵀ, where D is a diagonal matrix and B is an orthogonal matrix) and for any function f the following definition holds: f(A) = B f(D) Bᵀ. In particular, if f is the power function, we obtain

A^α = B D^α Bᵀ,  α ∈ R.  (64)

For example, if H̄ is presented in the form H̄ = U S Uᵀ (symmetric eigenvalue decomposition), then H̄⁻¹ = U S⁻¹ Uᵀ. Assuming that only the K₁ first eigenvalues are distinct from 1, i.e. (s_i⁻¹ − 1) ≈ 0 for i > K₁, we obtain Eq. (63). Let us mention that in the geophysical literature the expression (63) is usually derived in a more cumbersome way by considering the Sherman–Morrison–Woodbury inversion formula (see, for example, Powell and Moore, 2009). Given the pairs {s_i, U_i}, the limited-memory square-root operator H̄^{−1/2}(·) can be computed as follows:

H̄^{−1/2}(·) = I + Σ_{i=1}^{K₁} (s_i^{−1/2} − 1) U_i U_iᵀ.  (65)

Thus we can compute H^{−1/2}(·)v = V_b^{1/2} H̄^{−1/2}(·)v, which is needed for Eq. (59). Another way to compute H^{−1/2}(·)v is the recursive procedure suggested in Tshimanga et al. (2008, Appendix A, Theorem 2). The operators 𝓗̄⁻¹(·) and 𝓗̄⁻²(·) can also be computed by the Lanczos algorithm in the limited-memory form equivalent to (Eq.
(63)):

𝓗̄⁻¹(·) = I + Σ_{i=1}^{K₂} (λ_i⁻¹ − 1) U_i U_iᵀ,  (66)
𝓗̄⁻²(·) = I + Σ_{i=1}^{K₂} (λ_i⁻² − 1) U_i U_iᵀ,  (67)

where {λ_i, U_i}, i = 1, …, K₂, are the dominant eigenvalues and eigenvectors of 𝓗̄(·), the number of which is expected to be much smaller than K₁. The advantage of computing V₁ or V₂ in the form (60), (61) is therefore obvious: the second-order adjoint model has to be called only K₂ times.

7.2. Regularization

Let us consider two symmetric positive definite M × M matrices A and B and introduce the divergence matrix Γ(A, B) = B^{−1/2} A B^{−1/2}. We define the Riemann distance between A and B as follows:

µ(A, B) = ||log Γ(A, B)|| = (Σ_{i=1}^M log² γ_i)^{1/2},  (68)

where γ_i are the eigenvalues of Γ(A, B) (see, for example, Moakher, 2005). Comparing Eqs (60) and (61) and taking into account Eqs (66) and (67), we notice that the Riemann distance between V₃ = H⁻¹ and V₂ is defined by (λ_i⁻¹ − 1), whereas the distance between V₃ and V₁ is defined by (λ_i⁻² − 1). Therefore, the norm of V₁ can be significantly larger than that of V₂, which clearly explains the increased sensitivity of V₁ (as compared to V₂) to the approximation error due to the transitions (Eq. (48)). A simple approach to regularize V₁ is to present it in the form

V₁ = H^{−1/2}(ϕ̄) 𝓗̄^{−(1+α)}(ϕ̄) H^{−1/2}(ϕ̄),  (69)

with

𝓗̄^{−(1+α)}(·) = I + Σ_{i=1}^{K₂} (λ_i^{−(1+α)} − 1) U_i U_iᵀ,  (70)

where α = α(λ₁, …, λ_{K₂}) ∈ (0, 1). The idea of this approach is to bound the distance between V₁ and V₂ dependent on the values (λ_i⁻¹ − 1). For example, the following rule defining α is suggested and used in computations:

α = cos(πx/2), x ≤ 1;  α = 0, x > 1;  x = log_β(1/λ_max),  (71)

where λ_max < 1 is the eigenvalue for which 1 − λ_i takes the largest positive value, and β > 1 is the regularization parameter to be chosen. Let us note that if all λ_i ≥ 1, no regularization is required, i.e. α = 1.
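The limited-memory forms (63), (65) and (70) are all instances of one fact from matrix functions theory: if a symmetric matrix has all eigenvalues equal to 1 except a few dominant ones, any power of it is the identity plus a short sum of rank-one updates. This can be checked on a small synthetic matrix (the eigenvalues and dimensions below are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(2)

# Synthetic symmetric matrix with K dominant eigenvalues, the rest equal to 1.
M, K = 50, 3
Q, _ = np.linalg.qr(rng.standard_normal((M, M)))   # orthogonal eigenvectors
s = np.ones(M)
s[:K] = [40.0, 9.0, 2.5]                           # dominant eigenvalues
A = Q @ np.diag(s) @ Q.T

def lm_power(a):
    """Limited-memory A^a built from the K dominant eigenpairs only,
    in the spirit of Eqs (63), (65), (70): I + sum_k (s_k^a - 1) U_k U_k^T."""
    out = np.eye(M)
    for k in range(K):
        out += (s[k]**a - 1.0) * np.outer(Q[:, k], Q[:, k])
    return out

A_inv_sqrt = lm_power(-0.5)                        # analogue of Eq. (65)
A_full = Q @ np.diag(s**-0.5) @ Q.T                # f(A) = B f(D) B^T, Eq. (64)
```

The two constructions agree to machine precision, which is why storing only the K dominant eigenpairs suffices in practice.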

7.3. Computation of the effective estimates

Let us consider, for example, Eq. (58): V_δu ≈ V₃ᵉ = E[H⁻¹(ϕ)]. The field ϕ = ϕ(x, t) in this equation corresponds to the perturbed optimal solution u = ū + δu, which is the solution of the optimality system (Eqs (4)–(6)) with the perturbed data u_b = ū_b + ξ_b and y = ȳ + ξ_o. Given a set of independent perturbations ξ_{b,i}, ξ_{o,i}, i = 1, …, L, where L is the sample size, one can compute a set of u_i and then V₃ᵉ as a sample mean:

V₃ᵉ = (1/L) Σ_{i=1}^L H⁻¹(ϕ(u_i)).  (72)

Clearly, this method is very expensive because it requires a set of optimal solutions to be computed. A far more feasible method is suggested in Gejadze et al. (2011). The idea of the method is to substitute a set of optimal solutions by a set of functions which belong to, and best represent, the same (as the optimal solutions) probability distribution. Assuming that u has a close-to-normal distribution, we are looking for V₃ᵉ which satisfies the system as follows:

V₃ᵉ = E[H⁻¹(ϕ(u))],  u ~ N(ū, V₃ᵉ).  (73)

A very significant reduction of computational costs can be achieved if H^{−1/2}(ϕ(ū)) is used for preconditioning when computing H⁻¹(ϕ(u_i)) (see also Gejadze et al., 2011). In the same way as V₃ᵉ, the estimates V₂ᵉ and V₁ᵉ can be computed.

8. Asymptotic properties of the analysis and posterior errors

In this section the asymptotic properties of the regularized least-squares estimator (4D-Var) and of the Bayesian estimator are discussed. These are important properties which justify the use of the Hessian-based approximations of the covariances considered in this paper. Let us consider the error equations (14) and (43). Both these equations can be rewritten in an equivalent form (see Appendix):

J″(ũ) δu = −J′(û),  (74)

where J is the cost functional (3), ũ = û + τ(u − û), τ ∈ [0, 1], and δu = u − û (û = u_t and û = ū for Eqs (14) and (43), correspondingly). This form of the error equation coincides with the equation obtained in Amemiya (1983) while considering the nonlinear least-squares estimation problem for a cost functional similar to Eq.
(3), but without the penalty (background) term. In this case, the statistical properties of the nonlinear least-squares estimator have been analysed by many authors. For a univariate case, the classical result (see Jennrich, 1969) states that δu is consistent and asymptotically normal if ξ_o is an independent identically distributed (i.i.d.) random variable with E[ξ_o] = 0 and E[ξ_o²] = σ² < ∞. In the data assimilation problem (Eqs (1)–(3)) 'asymptotically' means that, given the observation array, T → ∞ given the finite observation time step dt, or dt → 0 given the finite observation window [0, T]. Let us stress that for the asymptotic normality of δu the error ξ_o is not required to be normal. This original result has been generalized to the multivariate case and to the case of serially correlated, yet identically distributed, observations by White and Domowitz (1984), whereas an even more general case is considered in Yuan and Jennrich (1998).

In the present paper we consider the complete cost functional (Eq. (3)) and, correspondingly, both J″ and J′ in Eq. (74) contain additional terms, i.e.

J″(ũ) = V_b⁻¹ + R*(ϕ̃)(C* V_o⁻¹ C − [(F″(ϕ̃)) ·]* ϕ̃*) R(ϕ̃),
J′(û) = −(V_b⁻¹ ξ_b + R*(ϕ̂) C* V_o⁻¹ ξ_o).

To analyse a possible impact of these terms let us follow the reasoning of Amemiya (1983, pp. 337–345). It is concluded that the error δu is consistent and asymptotically normal when: (a) the right-hand side of the error equation is normal; (b) the left-hand-side matrix converges in probability to a non-random value. These conditions are met under certain general regularity requirements on the function F(ϕ), which are incomparably weaker than the tangent linear hypothesis and do not depend on the magnitude of the input errors. It is easy to see that the first condition holds if ξ_b is normally distributed. Since V_b⁻¹ is a constant matrix, the second condition always holds as long as it holds for R*(ϕ̃)(C* V_o⁻¹ C − [(F″(ϕ̃)) ·]* ϕ̃*) R(ϕ̃). Therefore, one may conclude that δu from Eqs (14) and (43) is bound to remain asymptotically normal. In practice, the observation window [0, T] and the observation time step dt are always finite, implying a finite number of i.i.d.
observations. Moreover, it is not easy to assess how large the number of observations must be for the desired asymptotic properties to be reasonably approximated. Some nonlinear least-squares problems in which the normality of the estimation error holds for practically relevant sample sizes are said to exhibit a close-to-linear statistical behaviour. The method suggested in Ratkowsky (1983) to verify this behaviour is, essentially, a normality test applied to a generated sample of optimal solutions, which is hardly feasible for large-scale applications. Nevertheless, for certain highly nonlinear evolution models it is reasonable to expect that the distribution of δu might be reasonably close to normal if the number of i.i.d. observations is significant in time (typically, in variational DA for the medium-range weather forecast one uses T = 6 h with the observation step dt = 2 min), and the observation network is sufficiently dense in space.

9. Numerical validation

In this section the details of the numerical implementation are provided. These include the description of the numerical experiments and of the numerical model.

9.1. Description of numerical experiments

In order to validate the presented theory a series of numerical experiments has been performed. We assign a certain function u to be the true initial state u_t. Given u_t, we compute a large (L = 2500) ensemble of optimal solutions {u_i(u_t)}, i = 1, …, L, by solving L times the data assimilation problem (Eqs (2)–(3)) with the perturbed data u_b = u_t + ξ_b and y = Cϕ_t + ξ_o, where ξ_b ~ N(0, V_b) and

I. Yu. Gejadze et al. ξ o N (0, V o ). Based on ths ensemle the sample mean and sample covarance matrx are computed. The latter s further processed to flter out the samplng error (as descred n Gejadze et al., 2011); the result s consdered to e the reference ( true ) value ˆV of the analyss error covarance matrx V δu. Ovously, each ensemle memer u (u t ) may e regarded as a unque true optmal soluton ū condtoned on a come true mplementaton of the random processes ξ and ξ o, whch defne the nput data ū and ȳ. Next we choose ū to e a certan u (u t ) for whch the statstcs d = (u u t ) T ˆ (u u t ) are close enough to the statevector dmenson M (d has χ 2 -dstruton wth M degrees of freedom). For any ū we compute a large (L = 2500) ensemle of optmal solutons {u (ū)}, = 1,..., L y solvng L tmes the data assmlaton prolem (Eqs (2) (3)) wth the pertured data u =ū + ξ and y =ȳ + ξ o. Based on ths ensemle the sample mean and sample covarance matrx are computed. The latter s further processed to flter out the samplng error; the result s consdered to e the reference ( true ) value ˆV of the posteror error covarance matrx V δu assocated wth chosen ū. Next we compute the estmates of V δu : V 1 y Eq. (51), V 2 y Eq. (52), V 3 y Eq. (53), V1 e e e y Eq. (56), V2 y Eq. (57) and V3 y Eq. (58), and compare them to ˆV. The accuracy of approxmatons of V δu y dfferent V can e quantfed y the Remann dstanceµ(v, V δu ) defned y Eq. (68). It s also worth notng that H 1 = Ŵ(H 1, V )and H 1 = Ŵ(H 1, H 1 ). Snce the computatonal effcency s not the major ssue n ths paper, the effectve estmates V1 e, V 2 e and V 3 e are evaluated as the sample mean (see Eq. (72) for V3 e ) usng the frst 100 memers of the ensemle {u (ū)}, whch are avalale after computng the reference posteror error covarance ˆV. 9.2. 
Numerical model

As a nonlinear evolution model for ϕ(x, t) we use the 1D Burgers equation with a nonlinear viscous term:

∂ϕ/∂t + (1/2) ∂(ϕ²)/∂x = ∂/∂x (ν(ϕ) ∂ϕ/∂x),  (75)

where ϕ = ϕ(x, t), t ∈ (0, T), x ∈ (0, 1), with the Neumann boundary conditions

∂ϕ/∂x |_{x=0} = ∂ϕ/∂x |_{x=1} = 0  (76)

and the viscosity coefficient

ν(ϕ) = ν₀ + ν₁ (∂ϕ/∂x)²,  ν₀, ν₁ = const > 0.  (77)

The nonlinear diffusion term, with ν(ϕ) dependent on ∂ϕ/∂x, is introduced to mimic the eddy viscosity (turbulence), which depends on the field gradients (pressure, temperature) rather than on the field value itself. This type of ν(ϕ) also allows us to formally qualify the problem (Eqs (75)–(77)) as strongly nonlinear (see Fučík and Kufner, 1980). Let us mention that Burgers equations are sometimes considered in the DA context as a simple model of atmospheric flow motion.

We use the implicit time discretization as follows:

(ϕⁱ − ϕⁱ⁻¹)/h_t + ∂/∂x ((1/2) w(ϕⁱ) ϕⁱ − ν(ϕⁱ) ∂ϕⁱ/∂x) = 0,  (78)

where i = 1, …, N is the time integration index and h_t = T/N is the time step. The spatial operator is discretized on a uniform grid (h_x is the spatial discretization step, j = 1, …, M is the node number, M is the total number of grid nodes) using the 'power law' first-order scheme as described in Patankar (1980), which yields quite a stable discretization scheme (this scheme allows ν(ϕ) as small as 0.5 × 10⁻⁴ for M = 200 without noticeable oscillations). For each time step we perform nonlinear iterations on the coefficients w(ϕ) = ϕ and ν(ϕ), assuming initially that ν(ϕⁱ) = ν(ϕⁱ⁻¹) and w(ϕⁱ) = ϕⁱ⁻¹, and keep iterating until Eq. (78) is satisfied (i.e. the norm of the left-hand side in Eq. (78) becomes smaller than a threshold ε₁ = 10⁻¹² M^{1/2}).

In all computations presented in this paper the following parameters are used: observation period T = 0.32, discretization steps h_t = 0.004 and h_x = 0.005, state-vector dimension M = 200, and parameters in Eq. (77) ν₀ = 10⁻⁴, ν₁ = 10⁻⁶. For the numerical experiments two initial conditions u_t = ϕ_t(x, 0) have been chosen; these will be referred to below as case A and case B. For each case, the state evolution ϕ_t(x, t) is presented in Figure 1 (left) and (right), respectively.
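The structure of the scheme (78), an implicit step with Picard iterations on the lagged coefficients w(ϕ) and ν(ϕ), can be sketched as follows. Note that this sketch uses plain central differences rather than the power-law scheme of Patankar (1980), and the grid, viscosity values and initial state are illustrative choices, not the paper's configuration:

```python
import numpy as np

M, ht, hx = 101, 0.004, 0.01
nu0, nu1 = 1e-2, 1e-6                      # illustrative (larger nu0 than the paper's)
x = np.linspace(0.0, 1.0, M)
phi = 0.5 + 0.5 * np.cos(2 * np.pi * x)    # a smooth hypothetical initial state

def step(phi_prev, n_iter=5):
    """One implicit time step of (phi - phi_prev)/ht
    + d/dx( w*phi/2 - nu*dphi/dx ) = 0 with lagged coefficients."""
    phi_new = phi_prev.copy()
    for _ in range(n_iter):                # Picard iterations on w and nu
        w = phi_new                        # w(phi) = phi, lagged
        nu = nu0 + nu1 * np.gradient(phi_new, hx)**2
        A = np.zeros((M, M))
        b = phi_prev / ht
        for j in range(1, M - 1):          # interior nodes, central differences
            A[j, j-1] = -w[j-1] / (4*hx) - nu[j] / hx**2
            A[j, j]   = 1.0/ht + 2*nu[j] / hx**2
            A[j, j+1] =  w[j+1] / (4*hx) - nu[j] / hx**2
        A[0, 0] = A[-1, -1] = 1.0          # Neumann BCs: one-sided dphi/dx = 0
        A[0, 1] = A[-1, -2] = -1.0
        b[0] = b[-1] = 0.0
        phi_new = np.linalg.solve(A, b)
    return phi_new

for _ in range(10):
    phi = step(phi)
```

In a production code the dense solve would be replaced by a tridiagonal one, and the fixed iteration count by the residual-norm stopping rule described above.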
A well-known property of Burgers solutions is that a smooth initial condition evolves into shocks. However, the diffusion term in the form of Eq. (77) helps to limit the field gradients and to avoid the typical oscillations. The first initial condition is a lifted cos function. Apparently, the areas to the left of the minimum points at x = 0.5 and x = 1 are the areas where the shocks form. The level of nonlinearity related to the convective term can easily be controlled in this case by adding a constant. In the second case, we combine two cos functions of different frequency and sign. Moreover, in the area x ∈ (0.45, 0.55) one has ϕ_t(x, 0) = 0, i.e. only the nonlinear diffusion process initially takes place in this part of the domain. Different observation schemes are used: for case A the sensor location coordinates are x̂_k = {0.35, 0.4, 0.5, 0.6, 0.65}; and for case B, x̂_k = {0.35, 0.45, 0.5, 0.55, 0.65}.

9.3. Additional details

The consistent tangent linear and adjoint models (operators R and R*) have been generated by the automatic differentiation tool TAPENADE (Hascoët and Pascual, 2004) from the forward model code implementing Eq. (78). The consistent second-order term [(F″(·)) ·]* ϕ̄* has been generated in the same way from the piece of the code describing the local spatial discretization stencil, then manually introduced as a source term into the adjoint model (Eq. (17)) to form the second-order adjoint model. Both adjoint models have been validated using the standard gradient tests. Solutions to the DA problem (Eqs (2)–(3)) have been obtained using the limited-memory BFGS minimization algorithm (Liu and Nocedal, 1989). For each set of perturbations the problem is solved twice: first starting from the unperturbed state u_t (or ū), then starting from the background u_b = u_t + ξ_b (or u_b = ū + ξ_b). If close results are obtained, the solution is accepted as an ensemble member. This is done to avoid difficulties related to a possible multi-extrema nature of the cost function (Eq. (3)). In all computations reported in this paper less than 3% of solutions have eventually been discarded for each ensemble.
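The fixed-point system (73) used to compute the effective estimates can be sketched in scalar form; the 1D Gauss–Newton "Hessian" H(u) below is a hypothetical stand-in for the paper's operator H(ϕ(u)), and all values are illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)

# Scalar sketch of the fixed-point system (73): V3e = E[H^{-1}(u)] with
# u ~ N(u_bar, V3e), evaluated by a sample mean as in Eq. (72).
vb, vo, u_bar = 0.5, 0.1, 0.8
dg = lambda u: 1.0 + 0.9 * u**2            # hypothetical tangent-linear factor
H  = lambda u: 1.0/vb + dg(u)**2 / vo      # hypothetical Gauss-Newton Hessian at u

V3e = 1.0 / H(u_bar)                       # start from the point estimate V3
for _ in range(30):                        # fixed-point iteration for (73)
    u = u_bar + np.sqrt(V3e) * rng.standard_normal(20000)
    V3e = np.mean(1.0 / H(u))              # sample mean over the synthetic sample
```

The point of the construction is that the synthetic sample u ~ N(ū, V₃ᵉ) replaces the expensive set of genuine optimal solutions while representing the same (assumed close-to-normal) distribution.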

Figure 1. Field evolution. Left: case A; right: case B.

Figure 2. Correlation function.

The eigenvalue analysis of the operators has been performed by the Implicitly Restarted Arnoldi Method (symmetric driver dsdrv1, ARPACK library; Lehoucq et al., 1998). The operators H̄(ϕ(ū)) and 𝓗̄(ϕ(ū)) needed for evaluating V₁,₂,₃ have been computed without limiting the number of Lanczos iterations. However, when computing the effective values V₁,₂,₃ᵉ, the number of iterations has been limited to 20 and only the converged eigenpairs (parameter tol = 0.001 in dsdrv1) have been used to form H̄(ϕ(u_i)) and 𝓗̄(ϕ(u_i)). The background error covariance V_b is computed assuming that the background error belongs to the Sobolev space W₂²(0, 1) (see Gejadze et al., 2010, for details). The resulting correlation function is presented in Figure 2; the background error variance is σ_b² = 0.02 and the observation error variance is σ_o² = 0.001.

10. Numerical results

In this section we consider the numerical results which validate the presented theory. For a given true initial state (case A or case B), from the first 50 members of the corresponding ensemble {u_i(u_t)} we choose 10 optimal solutions ū such that the Riemann distance µ(H⁻¹(ū), 𝓗⁻¹(ū)) given by Eq. (68) is most significant. These solutions are numbered as ū_k, k = 1, …, 10, and referred to below as case Ak or case Bk. For each ū_k we compute the sample {u_i(ū_k)}, then, consequently, the sample mean, the sample covariance and the reference posterior error covariance V̂ related to ū_k. Finally we compute the estimates V₁,₂,₃ and V₁,₂,₃ᵉ and the measures µ(V, V̂) and µ(Vᵉ, V̂). The results are summarized in Table 1.

The first column of Table 1 contains µ²(V₃, V̂), which is the squared Riemann distance between the posterior covariance V̂ and its most crude estimate V₃ = H⁻¹(ϕ̄).
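The Riemann distance (68) used in these comparisons is straightforward to compute: Γ(A, B) = B^{−1/2} A B^{−1/2} is similar to L⁻¹ A L⁻ᵀ for the Cholesky factor L of B, so the γ_i come from a symmetric eigenproblem. The matrices below are arbitrary SPD examples:

```python
import numpy as np

def riemann_distance(A, B):
    """mu(A, B) = ||log Gamma(A, B)|| = (sum_i log^2 gamma_i)^{1/2}, Eq. (68).
    Uses L^{-1} A L^{-T} (L = chol(B)), which is similar to B^{-1/2} A B^{-1/2}."""
    L = np.linalg.cholesky(B)
    Linv = np.linalg.inv(L)
    gamma = np.linalg.eigvalsh(Linv @ A @ Linv.T)
    return float(np.sqrt(np.sum(np.log(gamma)**2)))

A = np.diag([4.0, 1.0, 0.25])
B = np.eye(3)
d = riemann_distance(A, B)   # sqrt(log(4)^2 + 0 + log(1/4)^2) = sqrt(2)*log(4)
```

Since inverting both matrices only negates the logarithms, the measure is symmetric in its arguments, as a distance between covariance estimates should be.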
Thus we expect µ(V₃, V̂) to have the largest value among all measures involving other estimates of V_δu. Let us recall that H⁻¹(ϕ̄) is usually considered as an approximation to the analysis error covariance V_δu (see Eq. (27)). The latter is sometimes regarded as the Bayesian posterior covariance, which is a conceptual mistake. Technically, the difference is clear: for computing the posterior covariance one must take into account the second-order term, whereas in computing the analysis error covariance this term simply does not appear. The second column of Table 1 contains µ²(V₂, V̂), where V₂ = 𝓗⁻¹(ϕ̄). Let us recall that 𝓗⁻¹(ϕ̄) is considered the asymptotic posterior covariance in Bayesian theory. The third column contains µ²(V₁, V̂), where V₁ = 𝓗⁻¹(ϕ̄) H(ϕ̄) 𝓗⁻¹(ϕ̄) is the posterior covariance estimate suggested in this paper.

According to the theory presented, for small input errors ξ_o, ξ_b one should expect µ(V₁, V̂) < µ(V₂, V̂) < µ(V₃, V̂). In practice, this relation may not stand (as can be seen from the table) due to linearization errors, as discussed in section 6. In this case one should expect this behaviour to be true at least for the effective estimates, i.e.

µ(V₁ᵉ, V̂) < µ(V₂ᵉ, V̂) < µ(V₃ᵉ, V̂) < µ(V₃, V̂).  (79)

Looking at Table 1 we note that this condition always holds, which validates the presented theory. In some cases the overall reduction of the Riemann distance (compare µ(V₁ᵉ, V̂) to µ(V₃, V̂)) is about an order of magnitude or even larger. In some cases, e.g. A5 and B2, this reduction is not significant. It is difficult, therefore, to warrant a certain level of distance reduction for each particular case, and this should be assessed in an average sense. The table additionally

Table 1. Summary of numerical experiments: squared Riemann distance µ²(·, ·).

Case  µ²(V₃,V̂)  µ²(V₂,V̂)  µ²(V₁,V̂)  µ²(V₃ᵉ,V̂)  µ²(V₂ᵉ,V̂)  µ²(V₁ᵉ,V̂)
A1     3.817     3.058     4.738     2.250      1.418      1.151
A2    17.89     18.06     21.50      2.535      1.778      1.602
A3    10.89      9.183     8.988     4.725      3.013      2.627
A4     3.489     2.960     4.286     2.190      1.342      1.079
A5     5.832     5.070     5.778     3.710      2.886      2.564
A6     9.048     8.362    10.16      2.539      1.748      1.474
A7    20.21     19.76     22.24      4.290      3.508      3.383
A8     1.133     0.585     1.419     1.108      0.466      0.246
A9    20.18     20.65     24.52      2.191      1.986      1.976
A10   10.01      8.521     8.411     3.200      2.437      2.428
B1     7.271     6.452     6.785     2.852      1.835      1.476
B2    16.42     14.89     14.70     15.61      14.11      13.77
B3     9.937    10.70     17.70      4.125      3.636      3.385
B4     6.223     5.353    11.50      2.600      1.773      1.580
B5    10.73      9.515     9.875     4.752      3.178      2.530
B6     6.184     4.153     8.621     4.874      2.479      1.858
B7     9.551     9.818    26.51      3.912      3.195      2.971
B8     5.948     4.845     7.105     3.484      2.105      1.745
B9    17.10     15.69     16.77      5.025      4.186      3.854
B10    8.230     7.685     7.827     3.787      2.861      2.647

demonstrates the correctness and potential of the effective-value approach suggested in Gejadze et al. (2011). By comparing µ(V₃ᵉ, V̂) and µ(V₃, V̂) in cases A2, A7, A9 and B9, one can note that the Riemann distance is drastically reduced if the effective inverse Hessian is used instead of the inverse Hessian at the point ū.

The following examples show what the Riemann distance actually means in terms of the error in a covariance estimate. Let us consider the mean deviation vector σ and the correlation matrix r defined as follows:

σ(i) = V(i, i)^{1/2},  r(i, j) = V(i, j)/(σ(i) σ(j)),  i, j = 1, …, M,

and denote by σ₃, σ₁,₂,₃ᵉ, σ̂ the mean deviation vectors and by r₃, r₁,₂,₃ᵉ, r̂ the correlation matrices associated correspondingly with V₃, V₁,₂,₃ᵉ and V̂. Naturally, σ̂ and r̂ are used as the reference values. The mean deviation error is characterized by the vector

ε = log₂(σ/σ̂).  (80)

The logarithmic error (Eq. (80)) is particularly appropriate when comparing positive quantities, since it shows (symmetrically!)
how many times the reference value is either over- or underestimated. The error in the correlation matrix is characterized by

ǫ = r − r̂.  (81)

Let us denote by ε₃, ε₁,₂,₃ᵉ the error vectors associated with σ₃, σ₁,₂,₃ᵉ, and by ǫ₃, ǫ₁,₂,₃ᵉ the error matrices associated with r₃, r₁,₂,₃ᵉ. For demonstration, two cases for each initial condition have been chosen: A2, A8 and B6, B9. The reference mean deviation σ̂ for cases A and B is presented in Figure 3 (left) and (right), correspondingly. In Figure 4 the logarithmic error ε (see Eq. (80)) is shown as follows: ε₃ (the error associated with V₃ = H⁻¹) as the boundary of the light-filled area 3; ε₃ᵉ (the error associated with V₃ᵉ = E[H⁻¹]) in line 3e; ε₂ᵉ (the error associated with V₂ᵉ = E[𝓗⁻¹]) in line 2e; and ε₁ᵉ (the error associated with V₁ᵉ = E[𝓗⁻¹H𝓗⁻¹]) as the boundary of the dark-filled area 1e.

The presented figures confirm the main result: the mean deviation error is the largest for the posterior covariance estimated by V₃ (boundary of area 3) and the smallest for V₁ᵉ. For example, in case A2 (upper/left panel), area 0.48 < x < 0.5, or case B9 (lower/right panel), area 0.5 < x < 0.52, the estimated σ is about three times smaller than the actual value. If the effective estimate V₃ᵉ is used (line 3e), this error is noticeably reduced. In case B6 (upper/right panel) no benefit from using V₃ᵉ instead of V₃ can be noticed; however, the benefit of using the estimates V₂ᵉ (line 2e) and V₁ᵉ (boundary of area 1e) is clearly manifested. On the other hand, case B9 represents an example where no noticeable benefit is achieved when using V₂ᵉ and V₁ᵉ instead of V₃ᵉ. Nevertheless, it is obvious from the figures that V₁ᵉ is, on average, the best estimate available (see also Table 1). Case B6 (upper/right panel) is also interesting in that σ associated with V₃ᵉ and V₃ is mainly overestimated (ε > 0). Relying on all 20 cases considered in the numerical simulation, one may conclude that V₃ = H⁻¹ is more likely to provide underestimated values of σ. The absolute error in the correlation matrix ǫ (see Eq. (81)) is shown in Figure 5.
Here, for each case considered, sub-cases (a), (b) and (c), displaying ǫ₃, ǫ₃ᵉ and ǫ₁ᵉ correspondingly, are presented. The distance between an element ǫ(i, j) and the diagonal element ǫ(i, i) is counted by (j − i)h_x along the axis x′. The features to be noticed in Figure 5 are similar to those discussed previously. As before, the error associated with V₃ = H⁻¹ (sub-case (a)) is the largest and the error associated with V₁ᵉ (sub-case (c)) is the smallest. In case A2, the main error reduction is achieved by using the effective estimate V₃ᵉ instead of the point estimate V₃, whereas usage of V₁ᵉ instead of V₃ᵉ does not make too much difference. The opposite behaviour can be observed in case B6.
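The error measures (80) and (81) are easy to reproduce. The two covariance matrices below are made-up examples chosen so that the standard deviations differ while the correlations agree, illustrating that ε and ǫ probe different aspects of an estimate:

```python
import numpy as np

def sigma_and_corr(V):
    """Mean deviation vector sigma(i) = V(i,i)^{1/2} and correlation matrix
    r(i,j) = V(i,j)/(sigma(i)*sigma(j))."""
    s = np.sqrt(np.diag(V))
    return s, V / np.outer(s, s)

V    = np.array([[0.04, 0.010], [0.010, 0.09]])   # hypothetical estimate
Vhat = np.array([[0.16, 0.020], [0.020, 0.09]])   # hypothetical reference

s, r = sigma_and_corr(V)
s_hat, r_hat = sigma_and_corr(Vhat)
eps     = np.log2(s / s_hat)    # Eq. (80): -1 means sigma is underestimated 2x
eps_cor = r - r_hat             # Eq. (81)
```

Here the first standard deviation is underestimated by a factor of 2 (ε = −1) while the correlation error is zero, so the two measures can disagree about the quality of an estimate.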

Figure 3. Reference mean deviation σ̂(x) (corresponds to V̂). Left: cases A2, A8; right: cases B6, B9.

Figure 4. Logarithmic errors in the mean deviation (Eq. (80)): ε₃(x), ε₃ᵉ(x), ε₂ᵉ(x) and ε₁ᵉ(x).

11. Conclusions

In this paper we consider the hind-cast (initialization) data assimilation problem, which is a typical problem in meteorology and oceanography. The problem is formulated as an initial-value control problem for a nonlinear evolution model governed by partial differential equations, and the solution method (called 4D-Var) consists in the minimization of the cost function (Eq. (3)) under the constraints (Eq. (1)). In finite dimensions this is equivalent to solving a regularized nonlinear least-squares problem. The statistical properties of the optimal solution (analysis) error are usually quantified by the analysis error covariance matrix: an approach possibly inherited from nonlinear regression theory, where a similarly defined covariance matrix is used to quantify the asymptotic properties of the nonlinear least-squares estimator. Less often has the 4D-Var method been considered in the Bayesian perspective, but this point of view is becoming increasingly popular. In particular, it is recognized that in the case of Gaussian input errors the Bayesian approach yields the same cost functional as considered in 4D-Var.
However, some authors seem to fall short in

I. Yu. Gejadze et al. (a) () () () () () (c) (c) (c) (d) () () () () (c) (c) Fgure 5. Asolute errors n the correlaton matrx (Eq. (81)): ǫ 3 (x, x ), su-case (a); ǫ e 3 (x, x ), su-case (); ǫ e 1 (x, x ), su-case (c). Ths fgure s avalale n colour onlne at wleyonlnelrary.com/journal/qj recognzng that n ths case t would e consstent to utlze a somewhat dfferent error measure, namely the proper (Bayesan) posteror covarance. Let us note that the analyss error covarance s sometmes called posteror n the sense that t s condtoned on the data,.e. t s otaned after the data have een assmlated. The man purpose of ths paper has een to demonstrate that the analyss error covarance and the Bayesan posteror covarance are dfferent ojects and ths dfference s not merely a sutle theoretcal ssue. In ths paper the dfference etween the analyss error covarance and the Bayesan posteror covarance has een thoroughly examned. These two conceptually dfferent ojects are quanttatvely equal n the lnear case, ut may sgnfcantly dffer n the nonlnear case. The analyss error covarance can e approxmated y the nverse Hessan of the auxlary DA prolem (Eqs (19) (20)),.e. y V 3,ory ts effectve value V e 3. The Bayesan posteror covarance has to e approxmated y a doule-product formula (Eq. (51)),.e. y V 1, or y ts effectve value V e 1. The dfference etween V 1 and V 3 s due to the presence of the second-order term n Eq. (46), whch vanshes n the lnear case. Thus, techncally, the second-order adjont analyss s nvolved when dealng