A Reduced Rank Kalman Filter
Ross Bannister, October/November 2009

These notes give the detailed workings of Fisher's reduced rank Kalman filter (RRKF), which was developed for use in a variational environment (Fisher, 1998). These notes are my interpretation of the method and an outline of how I would implement it. Fisher's approach uses flow-dependent information from a Hessian singular vector (HSV) calculation and I propose an alternative that uses information from an available ensemble, made up of ensemble members (EMs).

1. What is a reduced rank Kalman filter?

In VAR, the B-matrix used is a static representation of forecast error covariances and as such VAR would be expected to perform poorly in an environment where the actual forecast error covariance statistics change significantly from case to case (e.g. Fisher, 2007; Bannister, 2008). The RRKF is an attempt to blend flow-dependent aspects into the VAR problem in a mathematically formal manner. Two sources of flow-dependent information are considered here: firstly HSV information (as in Fisher's original formulation (Fisher, 1998)) and secondly EM information. A RRKF in this context may be regarded as a modification to the existing B-matrix in VAR that allows the dynamical evolution of a subspace of the state vector (where the subspace is defined as that spanned by either the HSVs or by the EMs). Each approach is outlined below.

2. Source A of flow-dependent information to blend with the B-matrix (Hessian singular vectors)

One way of identifying the subspace that will be treated with explicit flow dependence is to use a HSV calculation. Let the size of the subspace be K, which can be chosen arbitrarily, but restricted in practice by cost. K represents the number of singular vectors. The size of the full model space is N and in these notes it is recognised that K << N. Fisher defines the subspace by the K most rapidly growing HSVs. The reason why they are chosen to be singular vectors of the Hessian will become evident.

In order to specify the problem that must be solved to compute the HSVs, we introduce two norms as follows.

• Let the covariance matrix P^a be the error covariance of the analysis of the previous cycle. In order for Fisher's method to work, it must be possible to act with the matrix P^{a-1} (or an approximation of it).
• Let the matrix W be the norm used to measure the size of a perturbation. It must be possible to act with the matrix W^{-1}.

Let the time of the previous data assimilation cycle be -t and the time of the present analysis be 0. States that have no time label are valid at time 0 by default. Let the tangent linear model, M_{0<--t}, act on perturbations at time -t and give a perturbation at time 0,

    δx = M_{0<--t} δx(-t).   (1)

If δx(-t) were known, then the size of δx according to the W-norm would be J_1:

    J_1 = δx^T W^{-1} δx = δx^T(-t) M^T_{0<--t} W^{-1} M_{0<--t} δx(-t).   (2)

The HSVs are defined as those δx(-t) that maximise J_1 subject to the constraint that δx(-t) is distributed according to P^a, i.e.

    δx^T(-t) P^{a-1} δx(-t) - const = 0,   (3)

for an arbitrary constant, 'const'. The constrained optimisation problem may therefore be posed as

    ∂/∂δx(-t) [J_1 - λ (δx^T(-t) P^{a-1} δx(-t) - const)] = 0,   (4)

where λ is the Lagrange multiplier. Differentiating and setting the solutions to δx_k(-t) (with associated Lagrange multiplier λ_k) gives

    M^T_{0<--t} W^{-1} M_{0<--t} δx_k(-t) = λ_k P^{a-1} δx_k(-t),   (5)

which is a generalised eigenvalue problem. The δx_k(-t) are the HSVs. The set of vectors P^{a-1/2} δx_k(-t) are eigenvectors of (W^{-1/2} M_{0<--t} P^{a1/2})^T (W^{-1/2} M_{0<--t} P^{a1/2}) and are thus the right singular vectors of the matrix W^{-1/2} M_{0<--t} P^{a1/2}. Let s_k = M_{0<--t} δx_k(-t). Those s_k with the largest λ_k define the subspace whose background errors are treated explicitly by the RRKF.

A general perturbation at time 0, δx, has a part δx_s that lies in this subspace, which can be found as a linear combination of the s_k. Identification of this subspace can be simplified by first constructing an orthogonalized and normalized set of vectors, ŝ_k (e.g. by the Gram-Schmidt procedure, see App. A). Then

    δx_s = Ŝ a,   (6)

where Ŝ is the N × K matrix of ŝ_k vectors and a is the K-element vector of (as yet unknown) coefficients. Orthogonalization should be done with respect to an inner product that nondimensionalizes the components (the W^{-1} inner product achieves this, so this matrix shall be used) such that

    Ŝ^T W^{-1} Ŝ = I   (7)

(the B^{-1} norm could, in principle, be used instead). The benefit of first orthogonalizing the vectors is to allow a to be found easily from δx:

    a = Ŝ^T W^{-1} δx.   (8)

The part of δx that is not within the chosen subspace is the residual

    δx_s̄ = δx - δx_s,   (9)

which is orthogonal to δx_s under the W^{-1} norm (see App. B). The way that the HSVs may be used in the RRKF is covered in Sec. 6.
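As an illustration of (5) (not part of Fisher's formulation; this is purely my own toy example), the Python/NumPy sketch below solves the generalised eigenvalue problem for small random matrices standing in for M_{0<--t}, W and P^a, and evolves the leading HSVs to time 0 to give the s_k. In an operational system these matrices are of course available only as operators, not explicit arrays.

    import numpy as np
    from scipy.linalg import eigh

    rng = np.random.default_rng(0)
    N, K = 8, 3                          # toy model dimension and subspace size

    def random_spd(n):
        A = rng.standard_normal((n, n))
        return A @ A.T + n * np.eye(n)   # well-conditioned SPD stand-in

    W  = random_spd(N)                   # perturbation norm
    Pa = random_spd(N)                   # analysis error covariance of the previous cycle
    M  = rng.standard_normal((N, N))     # stand-in for the tangent linear model M_{0<- -t}

    # Generalised eigenproblem (5): M^T W^{-1} M dx(-t) = lambda P^{a-1} dx(-t)
    A = M.T @ np.linalg.solve(W, M)      # M^T W^{-1} M (symmetric)
    lam, V = eigh(A, np.linalg.inv(Pa))  # eigh solves A v = lambda B v for symmetric A, SPD B
    order = np.argsort(lam)[::-1]        # most rapidly growing first
    dx_hsv = V[:, order[:K]]             # the K leading HSVs, valid at time -t
    s = M @ dx_hsv                       # s_k = M_{0<- -t} dx_k(-t)
    print("leading HSV eigenvalues:", lam[order[:K]])

scipy.linalg.eigh is used here because (5) is a symmetric-definite generalised eigenproblem; any equivalent (and, in practice, matrix-free) solver would do.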

3. Source B of flow-dependent information to blend with the B-matrix (ensemble members)

Another way of identifying the subspace that will be treated with explicit flow dependence is to use information extracted from an ensemble. Let the number of EMs be L, which can be chosen arbitrarily, but restricted in practice by cost. From the L ensemble members, a K-dimensional subspace can be determined (K ≤ L) that will be used to describe the covariances explicitly.

The size of the full model space is N and in these notes it is assumed that K, L << N. In a similar way to the procedure for the HSVs in Sec. 2, the EMs may be orthogonalized by the Gram-Schmidt procedure (see App. A) and placed in the columns of the matrix Ŝ. The relationship between a vector in the L-dimensional space spanned by the EMs, a, and the model space, δx_s, is given by (6), but where now a is an L-element vector and Ŝ is the N × L matrix of orthogonalized EMs. Orthogonalization is performed with respect to the W^{-1} inner product described by (7) (Eqs. (8) and (9) then follow). The procedure to use the EMs in the RRKF is a little more involved than the HSV procedure. The way that the EMs may be used in the RRKF is covered in Sec. 7.

4. The background cost function in the new subspace

The next task is to outline the way that the flow-dependent information, whether from HSVs or from EMs, can be combined with static error covariance information in the VAR formulation. The usual background cost function in VAR is J^b:

    J^b = (1/2) (δx - δx^b)^T B^{-1} (δx - δx^b),   (10)

where δx^b = x^b - x_g, δx = x - x_g, x^b is the background state and x_g is a reference (or guess) state. The B-matrix in (10) is static.

Equation (10) may be written in terms of the components δx_s and δx_s̄ by substituting (9) into (10). This gives three parts: (i) a part that involves only the special subspace that has been identified (the K-dimensional subspace introduced in Secs. 2 and 3), (ii) the part that couples this subspace with the rest of the state, and (iii) the part that involves only the rest of the state:

    J^b = (1/2) (δx_s - δx_s^b)^T B^{-1} (δx_s - δx_s^b) + (δx_s̄ - δx_s̄^b)^T B^{-1} (δx_s - δx_s^b)
          + (1/2) (δx_s̄ - δx_s̄^b)^T B^{-1} (δx_s̄ - δx_s̄^b).   (11)

This cost function is identical to (10). The RRKF is constructed by imposing a flow-dependent error covariance matrix for the first two terms (B → P^f) but keeping the static B-matrix in the last term:

    J^b → (1/2) (δx_s - δx_s^b)^T P^{f-1} (δx_s - δx_s^b) + α (δx_s̄ - δx_s̄^b)^T P^{f-1} (δx_s - δx_s^b)
          + (1/2) (δx_s̄ - δx_s̄^b)^T B^{-1} (δx_s̄ - δx_s̄^b).   (12)

The factor α, added by Fisher (1998), is to help ensure that J^b is convex. The flow-dependent information provided by the K special vectors will be used to determine P^f. It will be introduced into the problem by a modification to the standard control variable transform used in VAR.
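To make the splitting in (11) concrete, the following sketch (my own toy check, Python/NumPy, using random stand-ins for W, B and the subspace) builds a W^{-1}-orthonormal basis Ŝ, splits two perturbations into subspace part and residual using (6), (8) and (9), and confirms numerically that the three terms of (11) reproduce J^b of (10) exactly.

    import numpy as np

    rng = np.random.default_rng(1)
    N, K = 10, 3

    def random_spd(n):
        A = rng.standard_normal((n, n))
        return A @ A.T + n * np.eye(n)

    W, B = random_spd(N), random_spd(N)
    Winv, Binv = np.linalg.inv(W), np.linalg.inv(B)

    # Build a W^{-1}-orthonormal basis Shat of a random K-dimensional subspace, eq. (7)
    raw = rng.standard_normal((N, K))
    Shat = np.zeros((N, K))
    for i in range(K):
        t = raw[:, i] - Shat[:, :i] @ (Shat[:, :i].T @ Winv @ raw[:, i])   # Gram-Schmidt, App. A
        Shat[:, i] = t / np.sqrt(t @ Winv @ t)

    dx  = rng.standard_normal(N)          # delta x
    dxb = rng.standard_normal(N)          # delta x^b

    def split(v):                         # eqs (6), (8), (9)
        vs = Shat @ (Shat.T @ Winv @ v)
        return vs, v - vs

    dxs, dxr   = split(dx)                # subspace part and residual of delta x
    dxbs, dxbr = split(dxb)               # ... and of delta x^b

    Jb_full  = 0.5 * (dx - dxb) @ Binv @ (dx - dxb)                        # eq. (10)
    Jb_split = (0.5 * (dxs - dxbs) @ Binv @ (dxs - dxbs)
                +      (dxr - dxbr) @ Binv @ (dxs - dxbs)
                + 0.5 * (dxr - dxbr) @ Binv @ (dxr - dxbr))                # eq. (11)
    print(np.isclose(Jb_full, Jb_split))  # True: (11) is identical to (10)

The exact agreement relies only on the decomposition (9); no approximation is made until B is replaced by P^f in (12).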

5. Control variable transforms

It is usual in VAR to make a change of variable from model variables (δx) to control variables (often named χ). The control variable transform used in standard VAR is here denoted L and in the RRKF there is an additional transform denoted X as follows:

    δx = L X χ,   (13)

where, by design, X is an orthogonal matrix,

    X X^T = I   (14)

(see below), and L is the usual control variable transform used in VAR,

    L X (L X)^T = L L^T = B.   (15)

Substituting (13) into (12) gives

    J^b = (1/2) (χ_s - χ_s^b)^T X^T L^T P^{f-1} L X (χ_s - χ_s^b) + α (χ_s̄ - χ_s̄^b)^T X^T L^T P^{f-1} L X (χ_s - χ_s^b)
          + (1/2) (χ_s̄ - χ_s̄^b)^T X^T L^T B^{-1} L X (χ_s̄ - χ_s̄^b),   (16)

where χ_s = X^T L^{-1} δx_s, χ_s^b = X^T L^{-1} δx_s^b, χ_s̄ = X^T L^{-1} δx_s̄ and χ_s̄^b = X^T L^{-1} δx_s̄^b. The matrix X is not present in standard VAR, but is introduced in (13) to isolate the special subspace identified in Secs. 2 and 3 from the remainder of the state space.

As it stands, (16) looks complicated to treat. Two substitutions are made as follows:

    X^T L^T B^{-1} L X = I,   (17)
    X^T L^T P^{f-1} L X = P_χ^{f-1},   (18)

where (17) follows from (14) and (15), and (18) is a definition. With these substitutions, (16) is

    J^b = (1/2) (χ_s - χ_s^b)^T P_χ^{f-1} (χ_s - χ_s^b) + α (χ_s̄ - χ_s̄^b)^T P_χ^{f-1} (χ_s - χ_s^b)
          + (1/2) (χ_s̄ - χ_s̄^b)^T (χ_s̄ - χ_s̄^b).   (19)

The part of P_χ^{f-1} that is important is derived in Secs. 6 and 7 from the K-dimensional subspaces identified from the HSV or EM calculations respectively.

The key to simplifying (19) is in the design of X. Let X have the following properties.

• X^T acting on any vector in the subspace, L^{-1} δx_s (where δx_s is a vector that exists entirely in the special K-dimensional subspace), gives a vector that is non-zero only in the first K elements:

    X^T L^{-1} δx_s = (α_1, ..., α_K, 0, ..., 0)^T.   (20)

• X^T acting on a vector in the remaining space, L^{-1} δx_s̄, gives a vector that is non-zero only in the remaining N - K elements.

It is possible to define a suitable X that satisfies these conditions by using a sequence of Householder transformations (see App. C). The important observation is that in (19), only the first K columns of P_χ^{f-1} need to be known.
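The sketch below (Python/NumPy, my own illustration with random stand-ins for B and the subspace basis Ŝ) builds one possible X by taking a complete QR factorisation of L^{-1}Ŝ; NumPy's QR is itself based on Householder reflections, so this is a shortcut for the explicit sequence constructed in App. C. It then checks properties (14), (17) and (20); the second bullet point above is not tested here, since it depends on how the residual space is defined.

    import numpy as np

    rng = np.random.default_rng(2)
    N, K = 8, 3

    B = rng.standard_normal((N, N)); B = B @ B.T + N * np.eye(N)   # static covariance
    L = np.linalg.cholesky(B)                                      # L L^T = B, eq. (15)
    Shat = np.linalg.qr(rng.standard_normal((N, K)))[0]            # stand-in subspace basis (N x K)

    # Complete QR of L^{-1} Shat: Q is N x N orthogonal, R is zero below row K
    Q, R = np.linalg.qr(np.linalg.solve(L, Shat), mode='complete')
    X = Q                                                          # take X = Q

    print(np.allclose(X @ X.T, np.eye(N)))                         # (14): X is orthogonal
    print(np.allclose(X.T @ L.T @ np.linalg.solve(B, L) @ X,
                      np.eye(N)))                                  # (17): X^T L^T B^{-1} L X = I
    a = rng.standard_normal(K)
    v = X.T @ np.linalg.solve(L, Shat @ a)                         # X^T L^{-1} dx_s for dx_s = Shat a
    print(np.allclose(v[K:], 0.0))                                 # (20): zero below element K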

6. Determination of P_χ^{f-1} for the Hessian singular vectors

The procedure to calculate the first K columns of P_χ^{f-1} using the HSVs is now described. Following Fisher (1998), let

    Z = P^{f-1} Ŝ,   (21)

where Ŝ is the N × K matrix whose columns are the ŝ_k and Z is the N × K result after acting with the inverse of the flow-dependent error covariance matrix. Equation (21) is now developed, using the definition (18) along the way:

    Z = P^{f-1} L X X^T L^{-1} Ŝ,
    X^T L^T Z = X^T L^T P^{f-1} L X X^T L^{-1} Ŝ
              = P_χ^{f-1} Î X^T L^{-1} Ŝ,   (22)

or, with the dimensions of each matrix indicated,

    X^T    L^T    Z     =  P_χ^{f-1}  Î      X^T    L^{-1}  Ŝ
    (N×N)  (N×N)  (N×K)    (N×K)      (K×N)  (N×N)  (N×N)   (N×K),   (23)

where Î is the following non-square quasi-identity matrix:

    Î_(K×N) = ( I_(K×K)   0_(K×(N-K)) ).   (24)

This matrix is included to remove the superfluous zero elements in rows i > K of X^T L^{-1} Ŝ (which are zero by the design of X). In (23) and (24), labels have been added to the matrices to indicate their dimensions. Equation (22) leads to

    P_χ^{f-1} = X^T L^T Z (Î X^T L^{-1} Ŝ)^{-1},   (25)

where the operator inverted is a calculable K × K matrix, which we assume is non-singular. Note that (25) is for only part of the inverse covariance matrix and so is not symmetric. The matrix yet to be found is X^T L^T Z, which is now obtained from the HSVs (Sec. 2). By the definition of Z (21), X^T L^T Z is

    X^T L^T Z = X^T L^T P^{f-1} Ŝ,   (26)

whose right hand side can be found from (5). Let the columns of a new matrix, Ŝ_{-t}, be those K vectors at time -t that evolve into the columns of Ŝ (the columns of Ŝ_{-t} are the states δx_k(-t) in (5)):

    Ŝ = M_{0<--t} Ŝ_{-t}.   (27)

This is useful in the derivation to follow. Write (5) in complete matrix form:

    M^T_{0<--t} W^{-1} M_{0<--t} Ŝ_{-t} = P^{a-1} Ŝ_{-t} Λ,   (28)

where Λ is the diagonal matrix of λ_k. Also important is the propagation of the error covariances (ignoring the model error contribution):

    P^f = M_{0<--t} P^a M^T_{0<--t}.   (29)

These equations can be manipulated to give the matrix in (26) required to complete (25). Starting from (28), and using (27) and (29) along the way:

    P^a M^T_{0<--t} W^{-1} Ŝ Λ^{-1} = Ŝ_{-t},
    M_{0<--t} P^a M^T_{0<--t} W^{-1} Ŝ Λ^{-1} = M_{0<--t} Ŝ_{-t},
    P^f W^{-1} Ŝ Λ^{-1} = Ŝ,
    W^{-1} Ŝ Λ^{-1} = P^{f-1} Ŝ,
    X^T L^T W^{-1} Ŝ Λ^{-1} = X^T L^T P^{f-1} Ŝ
                            = X^T L^T Z   by (26),

    P_χ^{f-1} = X^T L^T W^{-1} Ŝ Λ^{-1} (Î X^T L^{-1} Ŝ)^{-1}   by (25).   (30)

The right hand side of (30) is known and thus all relevant elements of the background cost function (19) are now calculable given the HSV results.
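A minimal sketch (Python/NumPy, toy dimensions, my own illustration) of how the first K columns of P_χ^{f-1} would be assembled from (30), assuming the HSV quantities Ŝ and Λ and the transforms L and X are already available as in Secs. 2 and 5. Here they are replaced by random stand-ins, and the K × K inverse in (30) is applied by a linear solve rather than an explicit inversion.

    import numpy as np

    rng = np.random.default_rng(3)
    N, K = 8, 3

    W = rng.standard_normal((N, N)); W = W @ W.T + N * np.eye(N)    # perturbation norm
    B = rng.standard_normal((N, N)); B = B @ B.T + N * np.eye(N)
    L = np.linalg.cholesky(B)                                       # L L^T = B
    Shat = rng.standard_normal((N, K))                              # stand-in for the orthogonalized, evolved HSVs
    lam  = np.sort(rng.uniform(1.0, 10.0, K))[::-1]                 # stand-in for the diagonal of Lambda

    # X chosen as in Sec. 5 so that X^T L^{-1} Shat is non-zero only in its first K rows
    X = np.linalg.qr(np.linalg.solve(L, Shat), mode='complete')[0]

    Ihat = np.hstack([np.eye(K), np.zeros((K, N - K))])             # quasi-identity, eq. (24)

    # Equation (30): X^T L^T W^{-1} Shat Lambda^{-1} (Ihat X^T L^{-1} Shat)^{-1}
    lhs = X.T @ L.T @ np.linalg.solve(W, Shat) / lam                # X^T L^T W^{-1} Shat Lambda^{-1}
    small = Ihat @ X.T @ np.linalg.solve(L, Shat)                   # the K x K matrix to be inverted
    Pchi_cols = np.linalg.solve(small.T, lhs.T).T                   # right-multiply by small^{-1}
    print(Pchi_cols.shape)                                          # (N, K): the first K columns of P_chi^{f-1}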

7. Completing the calculation with information from the ensemble

The EMs are now used to determine the first K columns of P_χ^{f-1}. Let the columns of S contain the L raw EMs (L ≥ K). The forecast error covariance matrix in state space is then

    P^f = 1/(L-1) S S^T,   (31)

which is too large to compute explicitly. The transform between the ensemble expressed in the orthogonalized ensemble subspace and the ensemble in state space is

    S = Ŝ S_sub,   (32)

where Ŝ is the N × L orthogonalized matrix of EMs, as used in (6), and S_sub is the L × L matrix of EMs in the orthogonalized ensemble representation. The orthogonality of Ŝ is specified in (7). The inverse of (32) is

    S_sub = Ŝ^T W^{-1} S.   (33)

The ensemble forecast error covariance matrix in EM subspace is

    P^f_L = 1/(L-1) S_sub S_sub^T,   (34)

which is easily calculable, as are its eigenvectors and eigenvalues. For reference, the relationship between P^f in (31) and P^f_L in (34) is, using (32),

    P^f = Ŝ P^f_L Ŝ^T.   (35)

Interest here is in the K eigenvectors of P^f_L with the largest eigenvalues, where 0 < K ≤ L. These are used to define the special K-dimensional subspace. First let U_L be the L × L matrix of all L eigenvectors, where

    U_L^T U_L = I and U_L U_L^T = I,   (36)

and let Λ_L be the diagonal L × L matrix of eigenvalues. Equation (34) may be decomposed as

    P^f_L = U_L Λ_L U_L^T.   (37)

Similarly, let U_K be the L × K matrix of the K eigenvectors with the largest eigenvalues, where

    U_K^T U_K = I.   (38)

The following also holds, as long as it acts only on vectors entirely in the column space of U_K:

    U_K U_K^T = I.   (39)

The K largest eigenvalues are assembled into the diagonal matrix Λ_K. The L × L covariance matrix, but of rank K, is then (by analogy to (37))

    P^f_K = U_K Λ_K U_K^T.   (40)

These K eigenvectors may be projected back to model space using (6):

    U = Ŝ U_K,   (41)

where (as explained in Sec. 3) Ŝ is the N × L matrix of orthogonalized ensemble members. The inverse of (41) makes use of (7):

    U_K = Ŝ^T W^{-1} U.   (42)

The projections of these K eigenvectors to model space span the special subspace and so the K columns of U can be used to define the Householder matrix X. Recall the two bullet points at the end of Sec. 5, where δx_s is a vector that spans this special space. Appendix C shows how X can be constructed from U.
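The steps (31)-(41) are summarised in the sketch below (Python/NumPy, my own toy illustration; the ensemble size L is written Lm in the code to avoid clashing with the transform L, and W and the raw EMs are random stand-ins).

    import numpy as np

    rng = np.random.default_rng(4)
    N, Lm, K = 12, 6, 3                      # model size, ensemble size L, retained modes K

    W = rng.standard_normal((N, N)); W = W @ W.T + N * np.eye(N)
    Winv = np.linalg.inv(W)

    S = rng.standard_normal((N, Lm))         # raw ensemble perturbations (columns)

    # W^{-1} Gram-Schmidt (App. A) to build Shat with Shat^T W^{-1} Shat = I, eq. (7)
    Shat = np.zeros((N, Lm))
    for i in range(Lm):
        t = S[:, i] - Shat[:, :i] @ (Shat[:, :i].T @ Winv @ S[:, i])
        Shat[:, i] = t / np.sqrt(t @ Winv @ t)

    S_sub = Shat.T @ Winv @ S                # ensemble in the orthogonalized basis, eq. (33)
    PfL = S_sub @ S_sub.T / (Lm - 1)         # subspace covariance, eq. (34)

    evals, UL = np.linalg.eigh(PfL)          # eq. (37); eigh returns ascending eigenvalues
    order = np.argsort(evals)[::-1]
    UK, LamK = UL[:, order[:K]], evals[order[:K]]   # leading K modes, eqs (38), (40)

    U = Shat @ UK                            # back to model space, eq. (41)
    print(np.allclose(UK, Shat.T @ Winv @ U))       # consistent with eq. (42)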

The aim of the following procedure is to use information from the EMs to define the first K columns of P_χ^{f-1}, as defined in (18) and used in (19). In a similar fashion to the HSV approach in Sec. 6, let

    Z = P^{f-1} U   (43)
      = P^{f-1} L X X^T L^{-1} U,
    X^T L^T Z = X^T L^T P^{f-1} L X X^T L^{-1} U
              = P_χ^{f-1} Î X^T L^{-1} U,   (44)

where Î is the K × N non-square quasi-identity matrix defined by (24). This has to be included because P_χ^{f-1} is defined only with the first K columns. Equation (44) leads to

    P_χ^{f-1} = X^T L^T Z (Î X^T L^{-1} U)^{-1},   (45)

where the operator inverted is a calculable K × K matrix, which we assume is non-singular. Note that (45) is for only part of the inverse covariance matrix and so is not symmetric. The matrix yet to be found is X^T L^T Z, which is now obtained from the EMs (Sec. 3).

The P^f that is of interest exists only in the K-dimensional subspace. The P^f that exists in the L-dimensional subspace is (35). This can be modified to exist in the K-dimensional subspace by replacing P^f_L in (35) with P^f_K found from (40):

    P^f = Ŝ P^f_L Ŝ^T → Ŝ P^f_K Ŝ^T = Ŝ U_K Λ_K U_K^T Ŝ^T.   (46)

Equation (46) is developed as follows (steps marked with a * need special note - see below):

    I = Ŝ U_K Λ_K U_K^T Ŝ^T P^{f-1},
    Ŝ^T W^{-1} = U_K Λ_K U_K^T Ŝ^T P^{f-1},
    U_K^T Ŝ^T W^{-1} = Λ_K U_K^T Ŝ^T P^{f-1},
    Λ_K^{-1} U_K^T Ŝ^T W^{-1} = U_K^T Ŝ^T P^{f-1},
    U_K Λ_K^{-1} U_K^T Ŝ^T W^{-1} = Ŝ^T P^{f-1}.   (47*)

The following, derived from (7) and (39), holds as long as it acts only on vectors entirely in the column space of Ŝ:

    Ŝ Ŝ^T W^{-1} = I.   (48*)

When used with (47) this gives

    U_K Λ_K^{-1} U_K^T Ŝ^T W^{-1} = Ŝ^T W^{-1} W P^{f-1},
    Ŝ U_K Λ_K^{-1} U_K^T Ŝ^T W^{-1} = W P^{f-1},
    W^{-1} Ŝ U_K Λ_K^{-1} U_K^T Ŝ^T W^{-1} = P^{f-1}.   (49)
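The starred restrictions mean that (49) inverts P^f only on the retained subspace. The sketch below (my own numerical check, Python/NumPy, with the same random stand-ins as the previous sketch) illustrates this: applying the operator of (49) and then the rank-K P^f of (46) recovers a vector that lies inside the subspace, but not an arbitrary vector.

    import numpy as np

    rng = np.random.default_rng(5)
    N, Lm, K = 12, 6, 3

    W = rng.standard_normal((N, N)); W = W @ W.T + N * np.eye(N)
    Winv = np.linalg.inv(W)

    # W^{-1}-orthonormal ensemble basis Shat and its leading eigenpairs (as after (42))
    S = rng.standard_normal((N, Lm))
    Shat = np.zeros((N, Lm))
    for i in range(Lm):
        t = S[:, i] - Shat[:, :i] @ (Shat[:, :i].T @ Winv @ S[:, i])
        Shat[:, i] = t / np.sqrt(t @ Winv @ t)
    S_sub = Shat.T @ Winv @ S
    evals, UL = np.linalg.eigh(S_sub @ S_sub.T / (Lm - 1))
    order = np.argsort(evals)[::-1]
    UK, LamK = UL[:, order[:K]], evals[order[:K]]

    Pf     = Shat @ UK @ np.diag(LamK) @ UK.T @ Shat.T                         # rank-K P^f, eq. (46)
    Pf_inv = Winv @ Shat @ UK @ np.diag(1.0 / LamK) @ UK.T @ Shat.T @ Winv     # candidate P^{f-1}, eq. (49)

    w = Shat @ UK @ rng.standard_normal(K)      # a vector inside the retained subspace
    v = rng.standard_normal(N)                  # an arbitrary vector
    print(np.allclose(Pf @ (Pf_inv @ w), w))    # True: (49) inverts P^f on the subspace
    print(np.allclose(Pf @ (Pf_inv @ v), v))    # False: outside the subspace it does not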

The steps marked with a (*) require special attention, as these statements are appropriate only under special circumstances. This may require some checking. Equation (49) is used in the definition of Z (43), which is itself used for the definition of P_χ^{f-1} in (45). This gives

    P_χ^{f-1} = X^T L^T W^{-1} Ŝ U_K Λ_K^{-1} U_K^T Ŝ^T W^{-1} U (Î X^T L^{-1} U)^{-1},   (50)

which is an N × K matrix as required.

8. Differentiating the cost function J

The background part of the cost function, (19), is now defined, where P_χ^{f-1} is found depending upon whether the HSV or the EM source of information is used. For the HSV calculation, P_χ^{f-1} is found from (30) and for the EM calculation, P_χ^{f-1} is found from (50). For VAR, the derivatives of J^b and J^o (the observation part of the cost function) are required with respect to each component of the control vector χ. Recall that the control vector comprises two parts: one that describes the special subspace, χ_s (non-zero only in the first K components), and another that describes the remainder, χ_s̄ (non-zero only in the last N - K components):

    χ = χ_s + χ_s̄,   (51)

and the gradient vector, ∂J/∂χ, has a similar structure (i.e. ∂J/∂χ_s is non-zero only in the first K components and ∂J/∂χ_s̄ is non-zero only in the last N - K components). J is the total cost function, which is the sum of the background part, J^b in (19), and an observation part, J^o. Let the three terms defining J^b in (19) be written separately:

    J = J^I + J^II + J^III + J^o,   (52)

where

    J^I = (1/2) (χ_s - χ_s^b)^T P_χ^{f-1} (χ_s - χ_s^b),   (53)
    J^II = α (χ_s̄ - χ_s̄^b)^T P_χ^{f-1} (χ_s - χ_s^b),   (54)
    J^III = (1/2) (χ_s̄ - χ_s̄^b)^T (χ_s̄ - χ_s̄^b).   (55)

It is assumed that ∂J^o/∂δx has already been found, e.g. by the conventional adjoint method (e.g. Rodgers, 2000; Bannister, 2001). There are eight contributions to the gradient vector, viz. ∂J^I/∂χ_s, ∂J^I/∂χ_s̄, ∂J^II/∂χ_s, ∂J^II/∂χ_s̄, ∂J^III/∂χ_s, ∂J^III/∂χ_s̄, ∂J^o/∂χ_s, and ∂J^o/∂χ_s̄, where each contributes to the total gradient vector in the following way:

    ∂J/∂χ = ∂J^I/∂χ_s + ∂J^I/∂χ_s̄ + ∂J^II/∂χ_s + ∂J^II/∂χ_s̄ + ∂J^III/∂χ_s + ∂J^III/∂χ_s̄ + ∂J^o/∂χ_s + ∂J^o/∂χ_s̄.   (56)

Note that derivatives with respect to χ_s give a vector that is non-zero only in the first K components, and so imply differentiating with respect to the first K components of χ only. Similarly, derivatives with respect to χ_s̄ give a vector that is non-zero only in the last N - K components, and so imply differentiating with respect to the last N - K components of χ only.
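The following short sketch (Python/NumPy, my own illustration with a random N × K array standing in for the computed columns of P_χ^{f-1}) evaluates the three background terms (53)-(55); it makes explicit that only the first K columns of P_χ^{f-1} are ever needed.

    import numpy as np

    rng = np.random.default_rng(6)
    N, K, alpha = 10, 3, 0.5

    Pchi_cols = rng.standard_normal((N, K))   # first K columns of P_chi^{f-1} (from (30) or (50))
    chi  = rng.standard_normal(N)             # control vector
    chib = rng.standard_normal(N)             # its background counterpart
    d = chi - chib

    J_I   = 0.5 * d[:K] @ Pchi_cols[:K, :] @ d[:K]          # eq. (53): subspace term
    J_II  = alpha * d[K:] @ Pchi_cols[K:, :] @ d[:K]        # eq. (54): coupling term
    J_III = 0.5 * d[K:] @ d[K:]                             # eq. (55): static remainder
    print(J_I, J_II, J_III, J_I + J_II + J_III)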

It will be useful to expand out the matrix notation in (53)-(55) as follows (in the following, ignore the χ_s^b and χ_s̄^b terms for simplicity; they can be put back in later with no loss of generality). Whether we are interested in ∂/∂χ_s or ∂/∂χ_s̄ will be controlled by the range of the index k (1 to K or K + 1 to N respectively):

    J^I = (1/2) Σ_{i=1}^{K} (χ_s)_i Σ_{j=1}^{K} (P_χ^{f-1})_{ij} (χ_s)_j = (1/2) Σ_{i=1}^{K} χ_i Σ_{j=1}^{K} (P_χ^{f-1})_{ij} χ_j,   (57)

    J^II = α Σ_{i=K+1}^{N} (χ_s̄)_i Σ_{j=1}^{K} (P_χ^{f-1})_{ij} (χ_s)_j = α Σ_{i=K+1}^{N} χ_i Σ_{j=1}^{K} (P_χ^{f-1})_{ij} χ_j,   (58)

    J^III = (1/2) Σ_{i=K+1}^{N} χ_i².   (59)

Each contribution in (56) is now addressed in turn.

Contribution (i): ∂J^I/∂χ_s
Differentiating (57) with respect to χ_k (components 1 ≤ k ≤ K) gives

    ∂J^I/∂(χ_s)_k = (1/2) Σ_{i=1}^{K} δ_{ik} Σ_{j=1}^{K} (P_χ^{f-1})_{ij} χ_j + (1/2) Σ_{i=1}^{K} χ_i Σ_{j=1}^{K} (P_χ^{f-1})_{ij} δ_{kj}
                  = (1/2) Σ_{j=1}^{K} (P_χ^{f-1})_{kj} χ_j + (1/2) Σ_{i=1}^{K} χ_i (P_χ^{f-1})_{ik}
                  = Σ_{j=1}^{K} (P_χ^{f-1})_{kj} χ_j,   (60)

where the last step uses the symmetry of the leading K × K block of P_χ^{f-1}.

Contribution (ii): ∂J^I/∂χ_s̄
Differentiating (57) with respect to χ_k (components K + 1 ≤ k ≤ N) gives

    ∂J^I/∂(χ_s̄)_k = 0.   (61)

Contribution (iii): ∂J^II/∂χ_s
Differentiating (58) with respect to χ_k (components 1 ≤ k ≤ K) gives

    ∂J^II/∂(χ_s)_k = α Σ_{i=K+1}^{N} χ_i Σ_{j=1}^{K} (P_χ^{f-1})_{ij} δ_{jk} = α Σ_{i=K+1}^{N} χ_i (P_χ^{f-1})_{ik}.   (62)

Contribution (iv): ∂J^II/∂χ_s̄
Differentiating (58) with respect to χ_k (components K + 1 ≤ k ≤ N) gives

    ∂J^II/∂(χ_s̄)_k = α Σ_{i=K+1}^{N} δ_{ik} Σ_{j=1}^{K} (P_χ^{f-1})_{ij} χ_j = α Σ_{j=1}^{K} (P_χ^{f-1})_{kj} χ_j.   (63)

Contribution (v): ∂J^III/∂χ_s
Differentiating (59) with respect to χ_k (components 1 ≤ k ≤ K) gives

    ∂J^III/∂(χ_s)_k = 0.   (64)

Contribution (vi): ∂J^III/∂χ_s̄
Differentiating (59) with respect to χ_k (components K + 1 ≤ k ≤ N) gives

    ∂J^III/∂(χ_s̄)_k = Σ_{i=K+1}^{N} χ_i δ_{ik} = χ_k.   (65)

Contributions (vii) and (viii): ∂J^o/∂χ_s and ∂J^o/∂χ_s̄
For this term it is not necessary to distinguish between parts of the state vector. Using the definition of the control variable transform in (13), δx can be expanded as

    δx_m = Σ_{n=1}^{N} L_{mn} Σ_{p=1}^{N} X_{np} χ_p.   (66)

Therefore

    ∂δx_m/∂χ_q = Σ_{n=1}^{N} L_{mn} Σ_{p=1}^{N} X_{np} δ_{pq} = Σ_{n=1}^{N} L_{mn} X_{nq}.   (67)

Using the chain rule, and then feeding in (67), gives

    ∂J^o/∂χ_q = Σ_{m=1}^{N} (∂δx_m/∂χ_q) (∂J^o/∂δx_m) = Σ_{n=1}^{N} (X^T)_{qn} Σ_{m=1}^{N} (L^T)_{nm} ∂J^o/∂δx_m.   (68)

Equation (68) is equivalent to the matrix operation

    ∂J^o/∂χ = X^T L^T ∂J^o/∂δx,   (69)

which is the standard result of the adjoint of the control variable transform.
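Collecting contributions (i)-(viii), the sketch below (Python/NumPy, my own illustration, consistent with the cost-term sketch above; X, L and ∂J^o/∂δx are random stand-ins) assembles the gradient from (60)-(65) and adds the observation gradient mapped through the adjoint transform (69).

    import numpy as np

    rng = np.random.default_rng(7)
    N, K, alpha = 10, 3, 0.5

    Pchi_cols = rng.standard_normal((N, K))   # first K columns of P_chi^{f-1}
    chi = rng.standard_normal(N)              # control vector (background terms omitted, as in (57)-(59))

    grad = np.zeros(N)
    grad[:K] += Pchi_cols[:K, :] @ chi[:K]            # (60): dJ^I/dchi_s
    grad[:K] += alpha * Pchi_cols[K:, :].T @ chi[K:]  # (62): dJ^II/dchi_s
    grad[K:] += alpha * Pchi_cols[K:, :] @ chi[:K]    # (63): dJ^II/dchi_sbar
    grad[K:] += chi[K:]                               # (65): dJ^III/dchi_sbar

    # Observation term, eq. (69): grad_chi(J^o) = X^T L^T grad_dx(J^o)
    B = rng.standard_normal((N, N)); B = B @ B.T + N * np.eye(N)
    L = np.linalg.cholesky(B)                         # L L^T = B
    X = np.linalg.qr(rng.standard_normal((N, N)))[0]  # stand-in orthogonal X
    gradJo_dx = rng.standard_normal(N)                # assumed already available (adjoint of the obs. operator)
    grad += X.T @ L.T @ gradJo_dx
    print(grad)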

References

Bannister R.N., 2001, Elementary 4d-Var, DARC Technical Report No. 2, 16 pp.

Bannister R.N., 2008, A review of forecast error covariance statistics in atmospheric variational data assimilation. I: Characteristics and measurements of forecast error covariances, Q.J.R. Meteorol. Soc., 134, 1951-1970.

Fisher M., 1998, Development of a simplified Kalman filter, ECMWF Tech. Memo. 260 [available from ECMWF, Shinfield Park, Reading, Berkshire, RG2 9AX, UK].

Fisher M., 2007, The sensitivity of analysis errors to the specification of background error covariances, pp. 27-36 of ECMWF Workshop Proceedings on Flow-dependent Aspects of Data Assimilation, 11-13 June 2007, ECMWF, Reading, UK.

Press W.H., Teukolsky S.A., Vetterling W.T., Flannery B.P., 1986, Numerical Recipes in Fortran, Cambridge University Press.

Rodgers C.D., 2000, Inverse Methods for Atmospheric Sounding: Theory and Practice (Sec. 8.2.3), Series on Atmospheric, Oceanic and Planetary Physics, Vol. 2, World Scientific.

Appendix A: the Gram-Schmidt procedure

The creation of an orthogonalized set of vectors, ŝ_i, from a set that is made up of non-orthogonal vectors, s_i, may be performed by the Gram-Schmidt procedure. The space spanned by each set of vectors is the same, but the orthogonalized set is more convenient to work with. Let orthogonalization be performed with respect to the W^{-1} inner product.

Let the first vector of the orthogonalized set, ŝ_1, be the first vector of the non-orthogonal set, s_1, but normalized to have unit length (under the W^{-1} norm):

    ŝ_1 = (1/N_1) s_1,   (A.1)
    ŝ_1^T W^{-1} ŝ_1 = 1,   (A.2)
    N_1 = (s_1^T W^{-1} s_1)^{1/2}.   (A.3)

For i ≥ 1, the (i+1)th orthogonalized vector is defined as the (i+1)th non-orthogonal vector minus a linear combination of all previously defined (and normalized) vectors:

    ŝ_{i+1} = (1/N_{i+1}) (s_{i+1} - Σ_{j=1}^{i} α_{j,i+1} ŝ_j).   (A.4)

The coefficients, α_{j,i+1}, are chosen for orthogonality as follows:

    ŝ_j^T W^{-1} ŝ_{i+1} = δ_{j,i+1}.   (A.5)

Performing an inner product of (A.4) with ŝ_k (where 1 ≤ k ≤ i) gives

    ŝ_k^T W^{-1} ŝ_{i+1} = (1/N_{i+1}) (ŝ_k^T W^{-1} s_{i+1} - Σ_{j=1}^{i} α_{j,i+1} ŝ_k^T W^{-1} ŝ_j) = 0.   (A.6)

This is set equal to zero by (A.5) and by the fact that k ≤ i (so k ≠ i+1). Further use of (A.5) leads to

    ŝ_k^T W^{-1} s_{i+1} - α_{k,i+1} = 0,
    α_{j,i+1} = ŝ_j^T W^{-1} s_{i+1}.   (A.7)

This determines the coefficients. Let t_{i+1} be the part of (A.4) inside the brackets (i.e. the unnormalized vector):

    ŝ_{i+1} = (1/N_{i+1}) t_{i+1},   (A.8)

where

    t_{i+1} = s_{i+1} - Σ_{j=1}^{i} α_{j,i+1} ŝ_j,   (A.9)

which can be calculated now that the coefficients are known (A.7). N_{i+1} then follows in a similar way to (A.2) and (A.3):

    ŝ_{i+1}^T W^{-1} ŝ_{i+1} = 1,   (A.10)
    N_{i+1} = (t_{i+1}^T W^{-1} t_{i+1})^{1/2}.   (A.11)
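A direct transcription of (A.1)-(A.11) into Python/NumPy follows (my own sketch; a toy W is used and no re-orthogonalization or rank checking is attempted).

    import numpy as np

    def gram_schmidt_winv(S, Winv):
        """Orthonormalize the columns of S under the W^{-1} inner product, eqs (A.1)-(A.11)."""
        N, K = S.shape
        Shat = np.zeros((N, K))
        for i in range(K):
            alpha = Shat[:, :i].T @ Winv @ S[:, i]     # coefficients, eq. (A.7)
            t = S[:, i] - Shat[:, :i] @ alpha          # unnormalized vector, eq. (A.9)
            norm = np.sqrt(t @ Winv @ t)               # eq. (A.11)
            Shat[:, i] = t / norm                      # eqs (A.4), (A.8)
        return Shat

    if __name__ == "__main__":
        rng = np.random.default_rng(8)
        N, K = 10, 4
        W = rng.standard_normal((N, N)); W = W @ W.T + N * np.eye(N)
        Winv = np.linalg.inv(W)
        Shat = gram_schmidt_winv(rng.standard_normal((N, K)), Winv)
        print(np.allclose(Shat.T @ Winv @ Shat, np.eye(K)))   # eq. (7) holds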

Appendix B: Proof of the orthogonality of the residual in (9)

In these notes a vector in model space, δx, is divided into a part that is spanned by the columns of Ŝ, δx_s, and a residual, δx_s̄, given by (9). Here it is proved that any vector that exists only in the spanned space is 'W^{-1}-orthogonal' to one that exists only in the residual space. Mathematically, it is to be shown that

    δx_s̄^T W^{-1} δx_s = 0.   (B.1)

First eliminate δx_s̄ using (9):

    δx_s̄^T W^{-1} δx_s = (δx - δx_s)^T W^{-1} δx_s.   (B.2)

δx_s can be written in terms of δx by combining (6) and (8):

    δx_s = Ŝ Ŝ^T W^{-1} δx.   (B.3)

Substituting (B.3) into (B.2) gives

    δx_s̄^T W^{-1} δx_s = (δx - Ŝ Ŝ^T W^{-1} δx)^T W^{-1} Ŝ Ŝ^T W^{-1} δx
                       = δx^T W^{-1} Ŝ Ŝ^T W^{-1} δx - δx^T W^{-1} Ŝ Ŝ^T W^{-1} Ŝ Ŝ^T W^{-1} δx.   (B.4)

Next use the orthogonality of the subspace (7), which shows that the two terms in (B.4) are equal and opposite, thus giving zero and proving (B.1):

    δx_s̄^T W^{-1} δx_s = δx^T W^{-1} Ŝ Ŝ^T W^{-1} δx - δx^T W^{-1} Ŝ Ŝ^T W^{-1} δx = 0.   (B.5)

Appendix C: Design of the sequence of Householder transforms

It is now shown how X can be formulated to achieve property (20). Fisher (1998) states that this can be achieved with a sequence of Householder transformations. A single Householder transformation, P (e.g. Press et al., 1986), may be written as follows:

    P = I - 2 u u^T / (u^T u),   (C.1)

where

    u = x ∓ |x| e_1.   (C.2)

The vector x is arbitrary and e_1 is a vector which is zero valued except for the first element, which has unit value (the properties of the Householder transformation hold for a general single element being chosen instead, although here we always choose the first element). P is useful because it has the following properties.

• The Householder transformation is orthogonal:

    P P^T = (I - 2 u u^T/(u^T u)) (I - 2 u u^T/(u^T u))
          = I - 4 u u^T/(u^T u) + 4 u u^T u u^T/(u^T u)²
          = I - 4 u u^T/(u^T u) + 4 u u^T/(u^T u)
          = I.   (C.3)

• When acting on the state x, which is used to define P in (C.1) and (C.2), the result is a vector with all but the first element zero:

    P x = (I - 2 u u^T/(u^T u)) x
        = (I - 2 u (x ∓ |x| e_1)^T/(u^T u)) x
        = x - 2 u (x^T x ∓ |x| x_1)/(2 x^T x ∓ 2 |x| x_1)
        = x - u
        = ± |x| e_1.   (C.4)

• When acting on a state x', which is orthogonal to the state x used to define P in (C.1) and (C.2), the result is a vector with zero in the first element. Note that

    u^T x' = x^T x' ∓ |x| e_1^T x' = ∓ |x| x'_1   (C.5)

(since x^T x' = 0), and then

    P x' = (I - 2 u u^T/(u^T u)) x'
         = x' ± 2 |x| x'_1 (x ∓ |x| e_1)/(2 x^T x ∓ 2 |x| x_1)
         = x' ± (|x| x'_1 x ∓ |x|² x'_1 e_1)/(x^T x ∓ |x| x_1).   (C.6)

Equation (C.6) has no weight in element 1. To show this, take a scalar product with e_1:

    e_1^T P x' = x'_1 ± (|x| x'_1 x_1 ∓ |x|² x'_1)/(x^T x ∓ |x| x_1) = x'_1 - x'_1 = 0,   (C.7)

since |x|² = x^T x. In these equations, x'_1 and x_1 are the first components of x' and x respectively.

The first property gives P^T = P = P^{-1}.
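A numerical confirmation of these three properties (Python/NumPy, my own sketch) is given below, for a random x and a second vector made orthogonal to it; the lower sign in (C.2) is chosen, i.e. u = x - |x| e_1.

    import numpy as np

    rng = np.random.default_rng(9)
    N = 6
    x = rng.standard_normal(N)

    e1 = np.zeros(N); e1[0] = 1.0
    u = x - np.linalg.norm(x) * e1                 # eq. (C.2), taking the minus sign
    P = np.eye(N) - 2.0 * np.outer(u, u) / (u @ u) # eq. (C.1)

    xp = rng.standard_normal(N)
    xp -= (xp @ x) / (x @ x) * x                   # make x' orthogonal to x

    print(np.allclose(P @ P.T, np.eye(N)))             # (C.3): P is orthogonal
    print(np.allclose(P @ x, np.linalg.norm(x) * e1))  # (C.4): P x = +|x| e_1 for this sign choice
    print(np.isclose((P @ xp)[0], 0.0))                # (C.7): first element of P x' is zero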

These properties can be combined to give X in the following way. Defining R^(0) = L^{-1} Ŝ for the HSV calculation or R^(0) = L^{-1} U for the EM calculation, let X^T R^(0) be a matrix of two parts:

    X^T R^(0) = [ A ]
                [ 0 ],   (C.8)

where A is a K × K matrix, consistent with the required property (20). In fact, by the way that X is to be formed, A will turn out to be upper triangular. Let

    X^T R^(0) = P_K ... P_k ... P_2 P_1 R^(0).   (C.9)

Each P_l transformation is Householder-like according to the following, e.g. for P_1:

    P_1 R^(0) = (I - 2 u u^T/(u^T u)) R^(0),   u = r^(0)_1 ∓ |r^(0)_1| e^(N)_1,   (C.10)

where r^(0)_1 is the first column of R^(0) and e^(N)_1 is the N-element vector with all but the first element zero (which is unity). This generates a new matrix R^(1) = P_1 R^(0) which has the form

    R^(1) = [ r^(1)_11   r^(1)_12   r^(1)_13   ...   r^(1)_1K ]
            [ 0          r^(1)_22   r^(1)_23   ...   r^(1)_2K ]
            [ 0          r^(1)_32   r^(1)_33   ...   r^(1)_3K ]
            [ ...                                             ]
            [ 0          r^(1)_k2   r^(1)_k3   ...   r^(1)_kK ]
            [ ...                                             ]
            [ 0          r^(1)_N2   r^(1)_N3   ...   r^(1)_NK ],   (C.11)

having only the first element of the first column non-zero (since P_1 is designed in terms of the first column of R^(0)). The aim now is to act with an (N-1) × (N-1)-element Householder operator on R^(1), excluding the first row and first column:

    P_2 R^(1) = [ 1   0                    ] R^(1),   u = r^(1)_2 ∓ |r^(1)_2| e^(N-1)_1,   (C.12)
                [ 0   I - 2 u u^T/(u^T u)  ]

where the partitioned-off (lower right) part of P_2 is an (N-1) × (N-1) matrix, r^(1)_2 is the (N-1)-element second column of R^(1) (excluding the first component), and e^(N-1)_1 is the (N-1)-element vector with all but the first element zero (which is unity). This generates a new matrix R^(2) = P_2 R^(1) which has the form

    R^(2) = [ r^(1)_11   r^(1)_12   r^(1)_13   ...   r^(1)_1K ]
            [ 0          r^(2)_22   r^(2)_23   ...   r^(2)_2K ]
            [ 0          0          r^(2)_33   ...   r^(2)_3K ]
            [ ...                                             ]
            [ 0          0          r^(2)_k3   ...   r^(2)_kK ]
            [ ...                                             ]
            [ 0          0          r^(2)_N3   ...   r^(2)_NK ].   (C.13)

The kth operator, P_k, has the form

    P_k R^(k-1) = [ I   0                    ] R^(k-1),   u = r^(k-1)_k ∓ |r^(k-1)_k| e^(N-k+1)_1,   (C.14)
                  [ 0   I - 2 u u^T/(u^T u)  ]

where the partitioned-off part of P_k is an (N-k+1) × (N-k+1) matrix, r^(k-1)_k is the (N-k+1)-element kth column of R^(k-1) (excluding the first k-1 components) and e^(N-k+1)_1 is the (N-k+1)-element vector with all but the first element zero (which is unity). After all K operators have acted, the result is

    R^(K) = X^T R^(0) = [ r^(1)_11   r^(1)_12   r^(1)_13   ...   r^(1)_1K ]
                        [ 0          r^(2)_22   r^(2)_23   ...   r^(2)_2K ]
                        [ 0          0          r^(3)_33   ...   r^(3)_3K ]
                        [ ...                                             ]
                        [ 0          0          0          ...   r^(K)_KK ]
                        [ 0          0          0          ...   0        ]
                        [ ...                                             ],   (C.15)

where the top section is the matrix A in (C.8), and the bottom section comprises zeros.
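The sketch below (Python/NumPy, my own toy implementation) applies the sequence (C.9)-(C.14) to a random N × K matrix standing in for R^(0), checks that the result has the form (C.8)/(C.15), and checks that the accumulated product of reflections is orthogonal. (For brevity, no care is taken over the sign choice in u, which a production implementation would pick for numerical stability.)

    import numpy as np

    rng = np.random.default_rng(10)
    N, K = 8, 3
    R = rng.standard_normal((N, K))            # R^(0); in the text this is L^{-1} Shat or L^{-1} U
    Xt = np.eye(N)                             # accumulates X^T = P_K ... P_2 P_1, eq. (C.9)

    for k in range(K):
        r = R[k:, k]                           # k-th column, excluding the first k components
        e1 = np.zeros(N - k); e1[0] = 1.0
        u = r - np.linalg.norm(r) * e1         # as in (C.10), (C.12), (C.14)
        Pk = np.eye(N)
        Pk[k:, k:] -= 2.0 * np.outer(u, u) / (u @ u)   # Householder block in the lower-right partition
        R = Pk @ R                             # R^(k) = P_k R^(k-1)
        Xt = Pk @ Xt                           # accumulate the product of reflections

    # R now has the form (C.15): upper-triangular A on top, zeros below
    print(np.allclose(np.tril(R[:K, :], -1), 0.0), np.allclose(R[K:, :], 0.0))
    print(np.allclose(Xt @ Xt.T, np.eye(N)))   # the accumulated product is orthogonal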

It should also be shown that X is orthogonal, as required by (14). From (C.9) this is easy to show given that each P_k has the property that P_k P_k^T = I:

    X^T X = P_K ... P_k ... P_2 P_1 P_1^T P_2^T ... P_k^T ... P_K^T = I.   (C.16)

It remains to be shown that the string of Householder operators P_K ... P_k ... P_2 P_1 acting on a state, r^(0) (which is orthogonal to all columns of R^(0)), gives a state that is zero in the first K elements. All P_k are formed in the same way as shown above (i.e. with respect to the R^(k) matrices).

First let r^(1) = P_1 r^(0). Since r^(0) is orthogonal to r^(0)_1 (the latter is the vector used to define P_1 in (C.10)), then by property (C.7) the vector r^(1) has zero in the first element. Next, let r^(2) = P_2 r^(1). By a similar argument, if the vector formed from the last N-1 components of r^(1) is orthogonal to the vector formed from the last N-1 components of r^(1)_2 (the latter is the vector used to define P_2 in (C.12)), then by property (C.7) the vector r^(2) will have zero in the first two elements. Because the first element of r^(1) = P_1 r^(0) is always zero, the remaining (N-1)-component inner product in question is equal to the full N-component inner product, as the first element contributes zero. The orthogonality test is therefore satisfied if the following N-component inner product is zero:

    (P_1 r^(0))^T P_1 r^(0)_2 = r^(0)T P_1^T P_1 r^(0)_2 = r^(0)T r^(0)_2 = 0.   (C.17)

This is satisfied because the vector r^(0) is orthogonal to r^(0)_2 by definition. These arguments continue for all K operators. The final result is a vector of zeros in the first K components.