A reduced and limited-memory preconditioned approach for the 4D-Var data-assimilation problem

Quarterly Journal of the Royal Meteorological Society, Q. J. R. Meteorol. Soc. 137, January 2011 Part B

A reduced and limited-memory preconditioned approach for the 4D-Var data-assimilation problem

S. Gratton (a,*), P. Laloyaux (b), A. Sartenaer (b) and J. Tshimanga (a)
(a) INPT-ENSEEIHT and IRIT, Toulouse, France
(b) FUNDP, Namur, Belgium
*Correspondence to: Dr S. Gratton, CERFACS, 42 av. G. Coriolis, 31057 Toulouse, France. E-mail: gratton@cerfacs.fr

We recall a theoretical analysis of the equivalence between the Kalman filter and the four-dimensional variational (4D-Var) approach to solve data-assimilation problems. This result is then extended to cover the comparison of the singular evolutive extended Kalman (SEEK) filter with a reduced variant of the 4D-Var algorithm. We next concentrate on the solution of the 4D-Var, which is usually computed with a (truncated) Gauss-Newton algorithm using a preconditioned conjugate-gradient-like (CG) method. Motivated by the equivalence of the above-mentioned algorithms, we explore techniques used in the SEEK filter and based on empirical orthogonal functions (EOFs) as an attempt to accelerate the Gauss-Newton method further. This leads to the development of an appropriate starting point for the CG method, together with that of a powerful limited-memory preconditioner (LMP), as shown by preliminary numerical experiments performed on a shallow-water model. Copyright (c) 2011 Royal Meteorological Society

Key Words: SEEK filter; empirical orthogonal functions

Received 28 April 2010; Revised 27 October 2010; Accepted 11 November 2010; Published online in Wiley Online Library 28 February 2011

Citation: Gratton S, Laloyaux P, Sartenaer A, Tshimanga J. 2011. A reduced and limited-memory preconditioned approach for the 4D-Var data-assimilation problem. Q. J. R. Meteorol. Soc. 137. DOI: 10.1002/qj

1. Introduction

To compute forecasts of the state of the ocean and of the atmosphere, dynamical systems are integrated on high-performance computers within a given time window starting from an initial state (also called the initial condition). This time window is related to the relevant time-scales of the physical situation considered. The ability to make an accurate forecast requires both that the dynamical system be a good representation of reality and that the initial condition be known precisely. Numerical weather forecasts have been performed since the 1950s and have witnessed an explosion of activity since the 1980s. Over the years, the quality of the models and of the methods for using observations has improved continuously, resulting in major forecast advancements. This progress is also due to the increased quantity of observational data and to the increased power of the supercomputers available. The problem of determining the initial condition for a forecast is very complex and has become an active area of research in its own right. Data-assimilation methods aim at specifying the initial condition of a dynamical system by combining information from both a numerical model and observations (Kalnay, 2003; Rabier, 2005). There are two main categories of data-assimilation algorithms: sequential methods based on statistical estimation theory and variational methods based on optimal control theory. Sequential methods, originally intended for linear dynamical systems and linear observational operators, were introduced by Kalman (1960) with the Kalman filter, and have been successfully applied in a wide range of engineering applications.
Adaptations of this filter have been developed to overcome the nonlinearities and the computational cost inherent in large-scale observational systems (Todling and Cohn, 1994; Rozier et al., 2007). This has led to the development of suboptimal Kalman filters, among them the well-known singular evolutive extended Kalman (SEEK) filter (Pham et al., 1998; Hoteit and Pham, 2003), which makes use of so-called empirical orthogonal functions (EOFs). Variational methods for data-assimilation have undergone rapid development since their introduction (Le Dimet and Talagrand, 1986). These are based on the minimization of a sum of squares of misfits, weighted to take the statistical properties of the noise in the problem into account. The conventional formulation is the four-dimensional variational problem (4D-Var) (Courtier et al., 1994). As for the Kalman filter, adaptations have been proposed for large nonlinear systems in numerical weather prediction. The focus of the research has recently shifted towards the development of reduced control-space methods. A reduced 4D-Var formulation has been proposed by Robert et al. (2005), which solves the 4D-Var problem in a subspace spanned by EOFs. Yaremchuk et al. (2009) and Wang and Li (2009) also make use of EOFs in their strategies, while Daescu and Navon (2007) and Lawless et al. (2008) study a method based on the reduction of the model itself.

The equivalence between the Kalman filter and 4D-Var has been established for linear model and observation operators (Strang and Borre, 1997; Li and Navon, 2001). It is the main purpose of the first part of this article to prove such an equivalence between the SEEK filter and the reduced 4D-Var. The second part of the article concentrates on the solution of 4D-Var, which amounts to the solution of a nonlinear weighted least-squares problem. This problem is generally solved by a (truncated) Gauss-Newton method (Gratton et al., 2007) using a preconditioned conjugate-gradient (CG) algorithm. In Tshimanga et al. (2008), a class of limited-memory preconditioners (LMPs) is proposed and studied to accelerate the convergence of the Gauss-Newton process. Motivated by the equivalence proved in the first part, our goal here is to go one step further and to explore the techniques used in the SEEK filter, which are based on relevant information contained in the EOFs, as an attempt to accelerate the Gauss-Newton method further. We will show that this leads to the development of an appropriate starting point for the CG method, together with that of a powerful preconditioner belonging to the LMP class proposed in Tshimanga et al. (2008).

This article describes the sequential and the variational approaches to data-assimilation in section 2 and investigates their theoretical connections in section 3. The major result is the equivalence between the SEEK filter and the reduced 4D-Var. In section 4 we describe the Gauss-Newton method to solve the 4D-Var problem. We propose in section 5 a new approach to solve the 4D-Var problem that improves the CG method used within the Gauss-Newton algorithm by combining a Ritz-Galerkin starting point and a limited-memory preconditioner based on the leading EOFs. In section 6, we interpret the numerical results obtained for a data-assimilation problem in a shallow-water model. Finally, we conclude and examine the perspectives for future research in section 7.

2. Sequential and variational approaches

For consistency with the literature on the subject, the notation proposed by Ide et al. (1997) will be adopted here as far as possible. A discrete model for the evolution of an atmospheric, oceanic or coupled system is governed by the equation

x^t_{i+1} = \mathcal{M}_{i+1,i} x^t_i,    (1)

where x^t_i \in \mathbb{R}^n is a vector representing the true state at time t_i and \mathcal{M}_{i+1,i} is a model operator. The superscript t is introduced to denote the true state of the system and \mathbb{R}^n is the state space.
The model operator \mathcal{M}_{i+1,i} describes the system transition from t_i to t_{i+1} and represents the integration in time of some partial differential equations governing the evolution of the system. Calligraphic fonts are used whenever the considered operators are assumed to be nonlinear. The model operator takes x^t_i as argument and we assume, throughout the article, that the model is perfect in the sense that no noise has been added to the system (Eq. (1)). At each time t_i, the state vector is observed according to

y_i = \mathcal{H}_i x^t_i + \epsilon_i,    (2)

where y_i \in \mathbb{R}^{p_i} is the observation vector, with \mathbb{R}^{p_i} the observation space at time t_i, \mathcal{H}_i is an observation operator and \epsilon_i \in \mathbb{R}^{p_i} is the observational noise due to instrumental and representativeness errors. This noise is assumed to consist of uncorrelated random vectors with zero means and symmetric positive-definite covariance matrices R_i \in \mathbb{R}^{p_i \times p_i}. When the model and observation operators are linear, the dynamical system is described by

x^t_{i+1} = M_{i+1,i} x^t_i,    (3)

where M_{i+1,i} \in \mathbb{R}^{n \times n} is an n by n matrix, and the observations are obtained by

y_i = H_i x^t_i + \epsilon_i,    (4)

where H_i \in \mathbb{R}^{p_i \times n} is a p_i by n matrix. The distribution of the observations in space and time can be very nonuniform, with regions that are relatively data-rich while others are much more poorly observed. Moreover, the size of the observation vectors y_i is very much smaller than the size of the state space (p_i << n).

The background, denoted by x^b \in \mathbb{R}^n, is a prior estimate of the state vector. This additional information is available everywhere in space at the initial time t_0. We suppose that the background errors are unbiased and uncorrelated with the observation errors. The background-error covariance matrix is symmetric positive-definite and denoted by B \in \mathbb{R}^{n \times n}.

We are now able to describe the two main approaches to solve data-assimilation problems. We suppose that a background x^b is available at the initial time t_0 and that we can perform N observations from time t_1 to time t_N. All information used to estimate the state of the system is obtained within the time interval [t_0, t_N], which is called the assimilation window.

In sequential data-assimilation, the observations y_1, ..., y_N are available sequentially at times t_1, ..., t_N. At each observation time t_i, the method constructs an analysis state x^a_i, which is an estimate of the true model state x^t_i using the available observations up to this time, y_1, ..., y_i, and the background x^b. The Kalman filter, applied to linear dynamical systems such as Eq. (3) with linear observation operators such as Eq. (4), computes the best (minimal variance) linear unbiased estimate of the model system at each time step. We give the equations of the Kalman filter in Algorithm 1 (see Table I), which mainly involves two steps.

Table I. Algorithm 1: Kalman filter with perfect model operators.
1:  x^a_0 = x^b
2:  P^a_0 = B
3:  for i = 1 to N do
4:    (* Forecast step *)
5:    x^f_i = M_{i,i-1} x^a_{i-1}
6:    P^f_i = M_{i,i-1} P^a_{i-1} M^T_{i,i-1}
7:    (* Analysis step *)
8:    K_i = P^f_i H^T_i (H_i P^f_i H^T_i + R_i)^{-1}
9:    x^a_i = x^f_i + K_i (y_i - H_i x^f_i)
10:   P^a_i = (I_n - K_i H_i) P^f_i
11: end for

In the forecast step, we integrate the last analysis state to obtain a forecast x^f_i and update the covariance matrix accordingly. In the analysis step, we correct the forecast using the observation y_i and the Kalman gain K_i to obtain the analysis at time t_i. We also update the analysis covariance matrix, denoted by P^a_i. The matrix I_n \in \mathbb{R}^{n \times n} denotes the identity matrix of order n.

The Kalman filter has a simple expression, but its application in realistic ocean and atmospheric models encounters major difficulties: nonlinearity, computational cost and storage in large-scale applications, and the model error. Since we assume that our model is perfect, we will not deal with model errors in this work. Nonlinearity can be partially addressed by linearization of the model and observation operators around the state estimate, and by performing iterations. This leads to the so-called extended Kalman filter and iterated Kalman filter (Jazwinski, 1970; Gelb, 1974; Bell and Cathey, 1993). The dominant computational cost of the Kalman filter often comes from the computation of the error covariance matrices P^f_i and P^a_i, at lines 6 and 10 of Algorithm 1 respectively, since it implies matrix-matrix multiplications with ranks equal to the size of the state vector, this being larger than 10^6 in many realistic applications. Moreover, storing the error covariance matrices P^f_i and P^a_i explicitly may be unrealistic for large-scale problems.

To overcome these difficulties, a set of suboptimal Kalman filters has been developed (Rozier et al., 2007). The singular evolutive extended Kalman filter (SEEK filter) is one of them, and has been proposed in Pham et al. (1998). It consists of approximating the error covariance matrices by singular low-rank matrices. Finding a realistic background-error covariance matrix in operational models is also a serious problem. The SEEK filter determines it from a set of state vectors, x_1, ..., x_l \in \mathbb{R}^n (l >= 2), supposedly representative of the variability of the system and computed for a time interval before the assimilation window. Regarding the state vector as a random vector and having a sample of realizations, one calculates the sample covariance matrix using the formula

S = \frac{1}{l-1} \sum_{i=1}^{l} (x_i - \bar{x})(x_i - \bar{x})^T,    (5)

where \bar{x} is the sample mean. Formally, one considers a spectral decomposition of this covariance matrix,

S = L_0 U_0 L_0^T + \hat{L}_0 \hat{U}_0 \hat{L}_0^T,    (6)

where the diagonal matrix U_0 \in \mathbb{R}^{r \times r} contains the r largest eigenvalues of S and the diagonal matrix \hat{U}_0 \in \mathbb{R}^{(n-r) \times (n-r)} contains the remaining eigenvalues, while the matrices L_0 \in \mathbb{R}^{n \times r} and \hat{L}_0 \in \mathbb{R}^{n \times (n-r)} are formed with the corresponding eigenvectors, called empirical orthogonal functions (EOFs).

Table II. Algorithm 2: SEEK filter with perfect model operators.
1:  x^a_0 = x^b
2:  P^a_0 = L_0 U_0 L_0^T
3:  for i = 1 to N do
4:    (* Forecast step *)
5:    x^f_i = M_{i,i-1} x^a_{i-1}
6:    L_i = M_{i,i-1} L_{i-1}
7:    P^f_i = L_i U_{i-1} L^T_i
8:    (* Analysis step *)
9:    U_i = (U^{-1}_{i-1} + L^T_i H^T_i R^{-1}_i H_i L_i)^{-1}
10:   K_i = L_i U_i L^T_i H^T_i R^{-1}_i
11:   x^a_i = x^f_i + K_i (y_i - H_i x^f_i)
12:   P^a_i = L_i U_i L^T_i
13: end for
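As a concrete, minimal illustration of Algorithms 1 and 2 and of the EOF construction in Eqs (5)-(6), the following Python sketch contrasts one Kalman forecast/analysis cycle with the corresponding low-rank SEEK cycle. All dimensions, the linear operators M and H, the noise levels and the random sample are toy assumptions introduced only for this sketch; they are not the configuration used in the experiments of section 6.

```python
import numpy as np

def eofs_from_sample(X, r):
    """Sample covariance (Eq. (5)) and its r leading EOFs (Eq. (6)).
    X is an (l, n) array whose rows are state vectors."""
    xbar = X.mean(axis=0)
    S = (X - xbar).T @ (X - xbar) / (X.shape[0] - 1)
    vals, vecs = np.linalg.eigh(S)              # eigenvalues in ascending order
    idx = np.argsort(vals)[::-1][:r]            # indices of the r largest
    return vecs[:, idx], np.diag(vals[idx]), S  # L0, U0, S

def kalman_step(xa, Pa, M, H, R, y):
    """One forecast/analysis cycle of Algorithm 1 (perfect linear model)."""
    xf, Pf = M @ xa, M @ Pa @ M.T                       # forecast step
    K = Pf @ H.T @ np.linalg.inv(H @ Pf @ H.T + R)      # Kalman gain
    xa_new = xf + K @ (y - H @ xf)                      # analysis step
    Pa_new = (np.eye(len(xa)) - K @ H) @ Pf
    return xa_new, Pa_new

def seek_step(xa, L, U, M, H, R, y):
    """One cycle of the SEEK filter (Algorithm 2): corrections are made
    only in the subspace spanned by the (evolved) columns of L."""
    xf, Lf = M @ xa, M @ L                              # forecast step
    U_new = np.linalg.inv(np.linalg.inv(U)
                          + Lf.T @ H.T @ np.linalg.inv(R) @ H @ Lf)
    K = Lf @ U_new @ Lf.T @ H.T @ np.linalg.inv(R)      # reduced gain
    return xf + K @ (y - H @ xf), Lf, U_new             # analysis step

# Toy setting (all numbers are illustrative assumptions).
rng = np.random.default_rng(0)
n, p, r, l = 20, 5, 4, 30
M = np.eye(n) + 0.01 * rng.standard_normal((n, n))      # toy linear model
H = rng.standard_normal((p, n))                          # toy observation operator
R = 1e-2 * np.eye(p)
X = rng.standard_normal((l, n))                          # sample of past states
L0, U0, S = eofs_from_sample(X, r)
xb = X.mean(axis=0)
y = H @ (M @ xb) + 1e-1 * rng.standard_normal(p)         # synthetic observation
# Kalman uses the full sample covariance; SEEK starts from its rank-r part
# and from the projected background, as assumed later in Theorem 2.
xa_kf, _ = kalman_step(xb, S, M, H, R, y)
xa_seek, _, _ = seek_step(L0 @ L0.T @ xb, L0, U0, M, H, R, y)
```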
The columns of the partition matrix [L_0 \hat{L}_0] form an orthonormal basis of \mathbb{R}^n called the EOF basis. We are interested in situations in which using only the dominant EOFs stored in L_0 enables one to obtain a satisfactory approximate solution. In particular, the initial error covariance matrix in the SEEK filter is then defined by P^a_0 = L_0 U_0 L_0^T, where only L_0 and U_0 have to be computed. This low-rank matrix plays the role of the background-error covariance matrix B and keeps most of the information contained in S, since U_0 contains the largest eigenvalues. The equations of the SEEK filter, given in Algorithm 2 (see Table II), follow from the equations of the Kalman filter and from this low-rank approximation (see Pham et al., 1998). Notice that the error covariance matrices at lines 7 and 12 are low-rank matrices and are computed only for diagnostic reasons. The SEEK filter performs corrections only in the directions spanned by the columns of L_i. These columns evolve since they are integrated in time at each iteration.

There are physical considerations regarding oceanographic or atmospheric models that support the above approximation. The reason is that most of the variability of the oceanic and atmospheric systems can be described by a low-dimensional subspace. Indeed, the ocean and atmosphere are basically forced and dissipative dynamical systems that exhibit an attractor, meaning that asymptotically the trajectories of the state vector belong only to a small part of the phase space. In the vicinity of this attractor, orthogonal perturbations will be naturally damped while tangent perturbations will not. This is why the SEEK filter performs corrections only in the directions spanned by the columns of L_i, which are the main directions of the variability tangent to the attractor. The choice of the subspace defining the directions of correction is not unique. It is possible to use, for instance, singular vectors, Lyapunov vectors or bred vectors. These are computed using the tangent linear model or the nonlinear model and contain, in a sense, the main directions of variability of the system. Durbiano (2001) performed a study of these families of vectors. When the data-assimilation is performed on a shallow-water model, she concluded that the EOF basis is clearly superior to the other subspaces.

Variational data-assimilation is the second approach to data-assimilation. The conventional formulation of this approach is the so-called four-dimensional variational problem (4D-Var). The 4D-Var problem can be stated as that of determining the system state x at the initial time t_0 that produces a trajectory that best fits, in a nonlinear weighted least-squares sense, both the background x^b and the observations y_1, ..., y_N. Mathematically, we formulate the minimization problem as a nonlinear least-squares problem:

\min_{x \in \mathbb{R}^n} J(x) = \frac{1}{2} (x - x^b)^T B^{-1} (x - x^b) + \frac{1}{2} \sum_{i=1}^{N} (y_i - \mathcal{H}_i \mathcal{M}_{i,0} x)^T R_i^{-1} (y_i - \mathcal{H}_i \mathcal{M}_{i,0} x),    (7)

where \mathcal{M}_{i,0} = \mathcal{M}_{i,i-1} \cdots \mathcal{M}_{1,0} is an operator that describes the system transition from time t_0 to time t_i. This method uses all available observations at the same time to define the objective function. If the model and observation operators are linear, we replace the nonlinear operators by M_{i,0} = M_{i,i-1} \cdots M_{1,0} and H_i, and the 4D-Var becomes a linear least-squares problem. Note that the 4D-Var formulation (Eq. (7)) can be written in a more compact form:

\min_{x \in \mathbb{R}^n} J(x) = \frac{1}{2} \|x - x^b\|^2_{B^{-1}} + \frac{1}{2} \sum_{i=1}^{N} \|y_i - \mathcal{H}_i \mathcal{M}_{i,0} x\|^2_{R_i^{-1}},    (8)

where weighted norms, defined by \|z\|^2_{B^{-1}} = z^T B^{-1} z and \|z\|^2_{R_i^{-1}} = z^T R_i^{-1} z, are used.

As for the Kalman filter, it is possible to reduce the computational cost of the method. The 4D-Var problem (Eq. (8)) can be defined in a subspace of low dimension. Let us express the variable x in the EOF basis as x = L_0 x_1 + \hat{L}_0 x_2. The 4D-Var problem using this decomposition can then be stated as

\min_{x_1, x_2} J(x_1, x_2) = \frac{1}{2} \|L_0 x_1 + \hat{L}_0 x_2 - x^b\|^2_{B^{-1}} + \frac{1}{2} \sum_{i=1}^{N} \|y_i - \mathcal{H}_i \mathcal{M}_{i,0} (L_0 x_1 + \hat{L}_0 x_2)\|^2_{R_i^{-1}}.

To perform corrections in the same directions as those generated by the SEEK filter, the components of x in the range of \hat{L}_0 are neglected. A new optimization problem depending only on x = x_1 \in \mathbb{R}^r can thus be defined, which approximates the previous one:

\min_{x \in \mathbb{R}^r} J(x) = \frac{1}{2} \|L_0 x - x^b\|^2_{B^{-1}} + \frac{1}{2} \sum_{i=1}^{N} \|y_i - \mathcal{H}_i \mathcal{M}_{i,0} L_0 x\|^2_{R_i^{-1}}.    (9)

In many oceanographic and atmospheric problems (Hoteit and Pham, 2003), one can choose the dimension of the subspace spanned by L_0 to be very small compared with the dimension of the state space, since only a few EOFs explain most of the variability of the system. As a consequence, the computational cost to solve the approximate problem (Eq. (9)), called reduced 4D-Var, is very much lower than that of the solution of the initial 4D-Var problem (Eq. (8)).

3. Connections between sequential and variational approaches

The connections between sequential and variational methods have been known for a long time. For linear model and observation operators, both the Kalman filter and 4D-Var yield the same estimate of the system state (theoretically) at the end of the assimilation window, as shown in Strang and Borre (1997) and Li and Navon (2001). Their proofs are based on a block matrix and on a statistical view, respectively. In this section, we propose an alternative proof for the equivalence between the Kalman filter and 4D-Var in the specific case of a perfect linear model operator and a noisy linear observation operator. This proof, which is based on quadratic optimization techniques, is both short and elementary. Moreover, we use some results from this proof to establish a useful connection between the SEEK filter and the reduced 4D-Var problem presented in section 2.
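Before turning to these connections, the following minimal Python sketch makes the two formulations concrete by evaluating the full 4D-Var cost of Eq. (8) and the reduced cost of Eq. (9) for linear operators. The operators, covariances, dimensions and data below are illustrative assumptions only.

```python
import numpy as np

def full_4dvar_cost(x, xb, Binv, Ms, Hs, Rinvs, ys):
    """Full 4D-Var cost, Eq. (8), with linear operators.
    Ms[i] maps the initial state to time t_{i+1} (i.e. M_{i+1,0})."""
    J = 0.5 * (x - xb) @ Binv @ (x - xb)
    for M, H, Rinv, y in zip(Ms, Hs, Rinvs, ys):
        d = y - H @ (M @ x)
        J += 0.5 * d @ Rinv @ d
    return J

def reduced_4dvar_cost(xr, xb, Binv, L0, Ms, Hs, Rinvs, ys):
    """Reduced 4D-Var cost, Eq. (9): the control variable lives in R^r and
    the state is reconstructed as L0 @ xr."""
    return full_4dvar_cost(L0 @ xr, xb, Binv, Ms, Hs, Rinvs, ys)

# Toy usage (illustrative dimensions only): N = 3 observation times.
rng = np.random.default_rng(1)
n, r, p, N = 12, 3, 4, 3
Mstep = np.eye(n) + 0.05 * rng.standard_normal((n, n))
Ms = [np.linalg.matrix_power(Mstep, i + 1) for i in range(N)]   # M_{i,0}
Hs = [rng.standard_normal((p, n)) for _ in range(N)]
Rinvs = [1e2 * np.eye(p) for _ in range(N)]
Binv = np.eye(n)
xb = rng.standard_normal(n)
ys = [Hs[i] @ (Ms[i] @ xb) + 0.1 * rng.standard_normal(p) for i in range(N)]
L0 = np.linalg.qr(rng.standard_normal((n, r)))[0]               # orthonormal columns
print(full_4dvar_cost(xb, xb, Binv, Ms, Hs, Rinvs, ys),
      reduced_4dvar_cost(L0.T @ xb, xb, Binv, L0, Ms, Hs, Rinvs, ys))
```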
In order to be as concise as possible, we set M_i = M_{i,0}, H = H_i and R = R_i throughout this section. Generalizing the proofs would just result in heavier notation.

3.1. Connection between 4D-Var / Kalman filter

Theorem 1. Suppose that the model operator is perfect and linear, that the observation operator is linear and that the same background x^b, observations y_i, background covariance matrix B and observation covariance matrices R_i are given. Then the analysis state x^a_i computed at time t_i by the Kalman filter and the solution produced by 4D-Var using the first i observations and integrated up to time t_i are identical and have the same covariance matrices. Both methods hence produce the same results at the end of the assimilation window.

Proof. We first introduce the notation x^k_l to denote the solution of the 4D-Var (8) (with linear model and observation operators) using the first l observations and integrated up to time t_k,

\min_{x \in \mathbb{R}^n} J_l(x) = \frac{1}{2} \|x - x^b\|^2_{B^{-1}} + \frac{1}{2} \sum_{i=1}^{l} \|y_i - H M_i x\|^2_{R^{-1}},

and use P^k_l to denote the covariance matrix of x^k_l. We thus want to prove, for each observation time l = 1, ..., N, that the solution of the 4D-Var using the first l observations and integrated up to time t_l is equal to the analysis state produced by the Kalman filter after l iterations, i.e. x^l_l = x^a_l, and that both state vectors have the same covariance matrices, i.e. P^l_l = P^a_l. By the word equivalence we mean, throughout the article, the fact that the considered methods produce the same results at the end of the assimilation window.
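Before giving the proof, the statement can be illustrated numerically on a small linear system. The sketch below (toy dimensions, operators and noise, all assumed for illustration) runs Algorithm 1 over N steps and, independently, solves the normal equations of the linear 4D-Var directly before integrating the minimizer up to t_N; the two analyses agree to rounding error.

```python
import numpy as np

rng = np.random.default_rng(2)
n, p, N = 8, 3, 4                                     # toy dimensions (assumptions)
M = np.eye(n) + 0.02 * rng.standard_normal((n, n))    # one-step model M_{i+1,i}
H = rng.standard_normal((p, n))
R = 1e-2 * np.eye(p)
B = np.eye(n)
xb = rng.standard_normal(n)
xt = xb + 0.1 * rng.standard_normal(n)                # synthetic "true" initial state
ys = []
for _ in range(N):                                    # observations along the truth
    xt = M @ xt
    ys.append(H @ xt + 0.01 * rng.standard_normal(p))

# Kalman filter (Algorithm 1).
xa, Pa = xb.copy(), B.copy()
for y in ys:
    xf, Pf = M @ xa, M @ Pa @ M.T
    K = Pf @ H.T @ np.linalg.inv(H @ Pf @ H.T + R)
    xa, Pa = xf + K @ (y - H @ xf), (np.eye(n) - K @ H) @ Pf

# Linear 4D-Var: solve the normal equations of Eq. (8) directly,
# then integrate the optimal initial state up to t_N.
A, b = np.linalg.inv(B), np.linalg.inv(B) @ xb
for i, y in enumerate(ys, start=1):
    Mi = np.linalg.matrix_power(M, i)                 # M_{i,0} for this toy model
    G = H @ Mi
    A += G.T @ np.linalg.inv(R) @ G
    b += G.T @ np.linalg.inv(R) @ y
x0 = np.linalg.solve(A, b)
x4d = np.linalg.matrix_power(M, N) @ x0

print(np.max(np.abs(x4d - xa)))   # at rounding-error level: the analyses coincide
```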

The proof is by induction. To prove that x^1_1 = x^a_1 and P^1_1 = P^a_1, we first notice that x_0 = x^b = x^a_0, since x^b is the solution of the 4D-Var problem using no observations. Therefore, the covariance matrix of x_0 is given by P_0 = B = P^a_0. We next define the 4D-Var problem using only the first observation y_1 at time t_1 as

\min_{x \in \mathbb{R}^n} J_1(x) = \frac{1}{2} \|x - x_0\|^2_{(P_0)^{-1}} + \frac{1}{2} \|y_1 - H M_1 x\|^2_{R^{-1}}.    (10)

The solution x_1 of this problem is computed by nullifying its gradient,

(P_0)^{-1} (x_1 - x_0) - M_1^T H^T R^{-1} (y_1 - H M_1 x_1) = 0.    (11)

Defining

W \equiv (P_0)^{-1} + M_1^T H^T R^{-1} H M_1,    (12)

we obtain, by simple extraction of the vector x_1 from Eq. (11),

x_1 = W^{-1} ( M_1^T H^T R^{-1} y_1 + (P_0)^{-1} x_0 ).    (13)

Note that the matrix W is invertible since it is the sum of a positive-definite and a positive-semi-definite matrix, which gives a positive-definite matrix. From the Sherman-Morrison-Woodbury formula (see, for example, Horn and Johnson, 2006, p. 18), we have that

W^{-1} = P_0 - P_0 M_1^T H^T (R + H M_1 P_0 M_1^T H^T)^{-1} H M_1 P_0.    (14)

Using this formula in Eq. (13), developing the product and reordering the terms, we obtain

x_1 = x_0 + P_0 M_1^T H^T R^{-1} y_1 - P_0 M_1^T H^T (R + H M_1 P_0 M_1^T H^T)^{-1} H M_1 P_0 ( M_1^T H^T R^{-1} y_1 + (P_0)^{-1} x_0 ).    (15)

Observing that

H M_1 P_0 ( M_1^T H^T R^{-1} y_1 + (P_0)^{-1} x_0 ) = (H M_1 P_0 M_1^T H^T + R) R^{-1} y_1 - y_1 + H M_1 x_0,    (16)

we obtain, after appropriate simplifications,

x_1 = x_0 + P_0 M_1^T H^T (R + H M_1 P_0 M_1^T H^T)^{-1} (y_1 - H M_1 x_0)    (17)

as the solution of the 4D-Var (Eq. (10)). If we next integrate this solution x_1 up to the first time step, we find

x^1_1 = M_1 x_0 + M_1 P_0 M_1^T H^T (R + H M_1 P_0 M_1^T H^T)^{-1} (y_1 - H M_1 x_0).

It thus follows from Algorithm 1 that x^1_1 = x^a_1, since M_1 x_0 = M_1 x^a_0 = x^f_1, M_1 P_0 M_1^T = M_1 P^a_0 M_1^T = P^f_1 and we use the same observation y_1. Moreover, observe from Eq. (11) that W, as defined in Eq. (12), is the Hessian matrix of J_1. Following Rabier and Courtier (1992), who proved that the covariance matrix of a 4D-Var solution is equal to the inverse of the Hessian, we can thus conclude that the covariance matrix of x_1 is given by

P_1 = W^{-1}.    (18)

The covariance matrix of x^1_1 is given by

P^1_1 = Cov(x^1_1, x^1_1) = Cov(M_1 x_1, M_1 x_1) = M_1 Cov(x_1, x_1) M_1^T = M_1 P_1 M_1^T.

By Eqs (14) and (18), we deduce from this last expression that

P^1_1 = M_1 P_0 M_1^T - M_1 P_0 M_1^T H^T (R + H M_1 P_0 M_1^T H^T)^{-1} H M_1 P_0 M_1^T,

which corresponds to the analysis-error covariance matrix P^a_1 produced by the Kalman filter (see Algorithm 1), since M_1 P_0 M_1^T = P^f_1.

Now, we suppose that when the first j observation vectors are available, the solution x_j of the corresponding 4D-Var problem

\min_{x \in \mathbb{R}^n} J_j(x) = \frac{1}{2} \|x - x_0\|^2_{(P_0)^{-1}} + \frac{1}{2} \sum_{i=1}^{j} \|y_i - H M_i x\|^2_{R^{-1}}

satisfies M_j x_j = x^j_j = x^a_j and P^j_j = P^a_j. We know that the Hessian matrix of J_j is (P_j)^{-1}, allowing us to express the quadratic function J_j using its Taylor expansion around the solution x_j as

J_j(x) = J_j(x_j) + \frac{1}{2} (x - x_j)^T (P_j)^{-1} (x - x_j),

remembering that \nabla J_j(x_j) = 0. We add the next available observation vector y_{j+1} to this expression and obtain the expression of the 4D-Var problem using the first j + 1 observation vectors,

\min_{x \in \mathbb{R}^n} J_{j+1}(x) = J_j(x_j) + \frac{1}{2} \|x - x_j\|^2_{(P_j)^{-1}} + \frac{1}{2} \|y_{j+1} - H M_{j+1} x\|^2_{R^{-1}}.    (19)

It remains to prove that x^{j+1}_{j+1} = x^a_{j+1} and that P^{j+1}_{j+1} = P^a_{j+1}. Observing that the first term of Eq. (19) is constant, we can apply the same reasoning as that used to deduce Eqs (17) and (18) from Eq. (10), and write the solution of Eq. (19) as

x_{j+1} = x_j + P_j M_{j+1}^T H^T (R + H M_{j+1} P_j M_{j+1}^T H^T)^{-1} (y_{j+1} - H M_{j+1} x_j),    (20)

with the covariance matrix

P_{j+1} = ( (P_j)^{-1} + M_{j+1}^T H^T R^{-1} H M_{j+1} )^{-1}.    (21)

If we integrate the solution x_{j+1} up to the time step t_{j+1}, we have

x^{j+1}_{j+1} = M_{j+1} x_j + M_{j+1} P_j M_{j+1}^T H^T (R + H M_{j+1} P_j M_{j+1}^T H^T)^{-1} (y_{j+1} - H M_{j+1} x_j).

Observing that

P^j_j = Cov(x^j_j, x^j_j) = Cov(M_j x_j, M_j x_j) = M_j Cov(x_j, x_j) M_j^T = M_j P_j M_j^T,

we can thus conclude that x^{j+1}_{j+1} = x^a_{j+1} from Algorithm 1 and the recurrence assumptions, since we have

M_{j+1} x_j = M_{j+1,j} M_j x_j = M_{j+1,j} x^j_j = M_{j+1,j} x^a_j = x^f_{j+1}

and

M_{j+1} P_j M_{j+1}^T = M_{j+1,j} M_j P_j M_j^T M_{j+1,j}^T = M_{j+1,j} P^j_j M_{j+1,j}^T = M_{j+1,j} P^a_j M_{j+1,j}^T = P^f_{j+1},    (22)

and the same observation y_{j+1}. The covariance matrix of x^{j+1}_{j+1} is given by

P^{j+1}_{j+1} = Cov(x^{j+1}_{j+1}, x^{j+1}_{j+1}) = Cov(M_{j+1} x_{j+1}, M_{j+1} x_{j+1}) = M_{j+1} Cov(x_{j+1}, x_{j+1}) M_{j+1}^T = M_{j+1} P_{j+1} M_{j+1}^T.

Replacing P_{j+1} by its expression Eq. (21) and using the right-hand side of Eq. (14) (with subscripts j and j + 1 instead of 0 and 1, respectively), we obtain

P^{j+1}_{j+1} = M_{j+1} P_j M_{j+1}^T - M_{j+1} P_j M_{j+1}^T H^T (R + H M_{j+1} P_j M_{j+1}^T H^T)^{-1} H M_{j+1} P_j M_{j+1}^T.    (23)

Using Eq. (22) in Eq. (23), we obtain P^{j+1}_{j+1} = P^a_{j+1}, which ends the proof.

Note that the proof of this theorem shows (see Eqs (20) and (21)) how to update the solution of the 4D-Var and its covariance matrix incrementally when observations are obtained sequentially, without processing all the information from the beginning.

3.2. Connection between reduced 4D-Var / SEEK filter

In a recent article, Krysta et al. (submitted) proposed a hybridization between the reduced 4D-Var and a SEEK smoother (derived from the SEEK filter) with the purpose of ensuring the evolution of the background-error covariance matrix within the reduced 4D-Var. In doing so, the authors have shown the equivalence between the reduced 4D-Var and a SEEK smoother. In what follows, we investigate the connection between the reduced 4D-Var and the SEEK filter itself, in the particular case in which the background covariance matrix B is equal to the sample covariance matrix S computed from a set of state vectors (see Eq. (5)). In this specific case, the reduced 4D-Var problem (Eq. (9), with linear model and observation operators) can be written as

\min_{x \in \mathbb{R}^r} J(x) = \frac{1}{2} \|L_0 x - x^b\|^2_{S^{-1}} + \frac{1}{2} \sum_{i=1}^{N} \|y_i - H M_i L_0 x\|^2_{R^{-1}}.    (24)

Before proving that this problem is equivalent to the SEEK filter, we have to reformulate it in another way. This is the purpose of the following lemma.

Lemma 1. The solution of the reduced 4D-Var Eq. (24) is equal to the solution of

\min_{x \in \mathbb{R}^r} J(x) = \frac{1}{2} \|x - \bar{x}^b\|^2_{U_0^{-1}} + \frac{1}{2} \sum_{i=1}^{N} \|y_i - H M_i L_0 x\|^2_{R^{-1}},    (25)

where \bar{x}^b = L_0^T x^b is the reduced background and U_0 is the reduced background covariance matrix, which is a diagonal matrix containing the r largest eigenvalues of S.

Proof. Since the columns of [L_0 \hat{L}_0] form an orthonormal basis of \mathbb{R}^n, we have the equality [L_0 \hat{L}_0][L_0 \hat{L}_0]^T = I_n (see Meyer, 2000, p. 32), which can be rewritten as L_0 L_0^T + \hat{L}_0 \hat{L}_0^T = I_n and used to decompose the background vector x^b as

x^b = L_0 L_0^T x^b + \hat{L}_0 \hat{L}_0^T x^b.    (26)

Substituting Eq. (26) in the background term of Eq. (24), we obtain

\|L_0 x - x^b\|^2_{S^{-1}} = \|L_0 (x - L_0^T x^b) - \hat{L}_0 \hat{L}_0^T x^b\|^2_{S^{-1}},

the right-hand side of which can be reformulated as

\|L_0 (x - L_0^T x^b)\|^2_{S^{-1}} + \|\hat{L}_0 \hat{L}_0^T x^b\|^2_{S^{-1}} - 2 ( L_0 (x - L_0^T x^b) )^T S^{-1} ( \hat{L}_0 \hat{L}_0^T x^b ).

We observe that the third term is equal to zero, using Eq. (6) and the decomposition of S^{-1} in the basis [L_0 \hat{L}_0],

S^{-1} = L_0 U_0^{-1} L_0^T + \hat{L}_0 \hat{U}_0^{-1} \hat{L}_0^T,    (27)

and twice the equality L_0^T \hat{L}_0 = 0. The second term is constant and can thus be ignored for the minimization.
Finally, we can reformulate the first term as

\|L_0 (x - L_0^T x^b)\|^2_{S^{-1}} = \|x - L_0^T x^b\|^2_{L_0^T S^{-1} L_0},

which is equal to

\|x - \bar{x}^b\|^2_{U_0^{-1}},    (28)

by Eq. (27) and where \bar{x}^b = L_0^T x^b. This concludes the proof, since the background term of Eq. (24) is equal to Eq. (28) plus a constant term.

We can now prove the main theoretical result of this article, i.e. the equivalence between specific formulations of the SEEK filter and the reduced 4D-Var. More precisely, we consider a SEEK filter using, as initial analysis x^a_0, the projection of the background x^b on the subspace spanned by the columns of L_0, and the reduced 4D-Var Eq. (24), in which the sample covariance matrix S plays the role of the background covariance matrix B.

Theorem 2. Suppose that the model operator is perfect and linear, that the observation operator is linear and that the same background x^b, observations y_i and observation covariance matrices R_i are given. Assume moreover that the initial analysis of the SEEK filter x^a_0 is equal to L_0 L_0^T x^b and that its covariance matrix P^a_0 is equal to L_0 U_0 L_0^T. Then the analysis state x^a_i computed at time t_i by the SEEK filter and the solution produced by the reduced 4D-Var Eq. (24), using the first i observations, prolongated in the full space and integrated up to time t_i, are identical and have the same covariance matrices. Both methods hence produce the same results at the end of the assimilation window.

Proof. We have to prove that the reduced 4D-Var Eq. (24) is equivalent to the SEEK filter (with a specific choice for x^a_0). Using Lemma 1, this amounts to proving the equivalence of the latter with Eq. (25). To this aim, we use the same scheme as in the proof of Theorem 1. We first introduce the notation \bar{x}_l to denote the solution of the reduced 4D-Var Eq. (25) using the first l observations and \bar{P}_l to denote the covariance matrix of \bar{x}_l. Moreover, we introduce the notation x^k_l to denote the prolongation of the solution \bar{x}_l integrated up to time t_k and P^k_l to denote the covariance matrix of x^k_l. We thus want to prove that the solution of the reduced 4D-Var Eq. (25) using the first l observations, prolongated and integrated up to time t_l, is equal to the analysis state produced by the SEEK filter after l iterations, i.e. x^l_l = x^a_l, and that both state vectors have the same covariance matrices, i.e. P^l_l = P^a_l, for each observation time l = 1, ..., N.

The proof is again by induction. To prove that x^1_1 = x^a_1 and P^1_1 = P^a_1, we first notice that x^0_0 = L_0 \bar{x}_0 = L_0 \bar{x}^b = L_0 L_0^T x^b = x^a_0, since \bar{x}^b is the solution of the reduced 4D-Var problem using no observations, and from the definition of \bar{x}^b in Lemma 1 and the assumption on x^a_0. Therefore, the covariance matrix of x^0_0 is given by P^0_0 = L_0 U_0 L_0^T = P^a_0. We next define the reduced 4D-Var problem when only one observation vector is available at the first time step as

\min_{x \in \mathbb{R}^r} \bar{J}_1(x) = \frac{1}{2} \|x - \bar{x}_0\|^2_{(\bar{P}_0)^{-1}} + \frac{1}{2} \|y_1 - H M_1 L_0 x\|^2_{R^{-1}}.

Using the equivalent of Eqs (12), (13) and (18) from Theorem 1, adapted to the reduced space, we obtain as solution

\bar{x}_1 = \bar{P}_1 ( L_0^T M_1^T H^T R^{-1} y_1 + (\bar{P}_0)^{-1} \bar{x}_0 ),    (29)

where

\bar{P}_1 = ( (\bar{P}_0)^{-1} + L_0^T M_1^T H^T R^{-1} H M_1 L_0 )^{-1}    (30)

is its covariance matrix. The solution (Eq. (29)) can now be rewritten as

\bar{x}_1 = \bar{P}_1 L_0^T M_1^T H^T R^{-1} y_1 + \bar{P}_1 [ (\bar{P}_0)^{-1} + L_0^T M_1^T H^T R^{-1} H M_1 L_0 - L_0^T M_1^T H^T R^{-1} H M_1 L_0 ] \bar{x}_0,

or, equivalently, using Eq. (30),

\bar{x}_1 = \bar{x}_0 + \bar{P}_1 L_0^T M_1^T H^T R^{-1} (y_1 - H M_1 L_0 \bar{x}_0).    (31)

If we formulate this reduced solution in the full space \mathbb{R}^n and integrate it up to the first time step, we obtain

x^1_1 = M_1 L_0 \bar{x}_1 = M_1 L_0 \bar{x}_0 + M_1 L_0 \bar{P}_1 L_0^T M_1^T H^T R^{-1} (y_1 - H M_1 L_0 \bar{x}_0),

and retrieve the solution of Algorithm 2 after one iteration.
Indeed, by the algorithm, we have that M_1 L_0 \bar{x}_0 = M_1 x^0_0 = M_1 x^a_0 = x^f_1 and, using Eq. (30) and the equality \bar{P}_0 = U_0, that \bar{P}_1 = U_1. This and the fact that we use the same observation y_1 imply that x^1_1 = x^a_1. Moreover, the covariance matrix of x^1_1 is given by

P^1_1 = Cov(x^1_1, x^1_1) = Cov(M_1 L_0 \bar{x}_1, M_1 L_0 \bar{x}_1) = M_1 L_0 Cov(\bar{x}_1, \bar{x}_1) L_0^T M_1^T = M_1 L_0 \bar{P}_1 L_0^T M_1^T = L_1 U_1 L_1^T = P^a_1,

which is the analysis-error covariance matrix given by the SEEK filter after one iteration.

Now, we suppose that when the first j observation vectors are available, the solution \bar{x}_j of the corresponding reduced 4D-Var problem

\min_{x \in \mathbb{R}^r} \bar{J}_j(x) = \frac{1}{2} \|x - \bar{x}_0\|^2_{(\bar{P}_0)^{-1}} + \frac{1}{2} \sum_{i=1}^{j} \|y_i - H M_i L_0 x\|^2_{R^{-1}}

satisfies M_j L_0 \bar{x}_j = x^j_j = x^a_j and P^j_j = P^a_j. We also assume that \bar{P}_j = U_j. Since (\bar{P}_j)^{-1} is the Hessian matrix of \bar{J}_j, we can express the quadratic function \bar{J}_j using its Taylor expansion around the solution \bar{x}_j as

\bar{J}_j(x) = \bar{J}_j(\bar{x}_j) + \frac{1}{2} (x - \bar{x}_j)^T (\bar{P}_j)^{-1} (x - \bar{x}_j),

remembering that \nabla \bar{J}_j(\bar{x}_j) = 0. We add the next available observation vector y_{j+1} to this expression and obtain the expression of the reduced 4D-Var problem using the first j + 1 observation vectors,

\min_{x \in \mathbb{R}^r} \bar{J}_{j+1}(x) = \bar{J}_j(\bar{x}_j) + \frac{1}{2} \|x - \bar{x}_j\|^2_{(\bar{P}_j)^{-1}} + \frac{1}{2} \|y_{j+1} - H M_{j+1} L_0 x\|^2_{R^{-1}}.

Taking into account that \bar{J}_j(\bar{x}_j) is constant, the solution of this problem, based on the same reasoning used to derive Eqs (30) and (31), may be written as

\bar{x}_{j+1} = \bar{x}_j + \bar{P}_{j+1} L_0^T M_{j+1}^T H^T R^{-1} (y_{j+1} - H M_{j+1} L_0 \bar{x}_j),

with the covariance matrix

\bar{P}_{j+1} = ( (\bar{P}_j)^{-1} + L_0^T M_{j+1}^T H^T R^{-1} H M_{j+1} L_0 )^{-1}.

Using Algorithm 2, we remark that \bar{P}_{j+1} = U_{j+1} since \bar{P}_j = U_j. If we formulate the solution \bar{x}_{j+1} in the full space and integrate it up to the time step t_{j+1}, we have

x^{j+1}_{j+1} = M_{j+1} L_0 \bar{x}_{j+1} = M_{j+1} L_0 \bar{x}_j + M_{j+1} L_0 \bar{P}_{j+1} L_0^T M_{j+1}^T H^T R^{-1} (y_{j+1} - H M_{j+1} L_0 \bar{x}_j).

Using the recurrence assumptions, we retrieve the SEEK filter solution x^a_{j+1} and its covariance matrix after j + 1 iterations, since \bar{P}_{j+1} = U_{j+1},

M_{j+1} L_0 \bar{x}_j = M_{j+1,j} M_j L_0 \bar{x}_j = M_{j+1,j} x^j_j = M_{j+1,j} x^a_j = x^f_{j+1},

and

P^{j+1}_{j+1} = Cov(x^{j+1}_{j+1}, x^{j+1}_{j+1}) = Cov(M_{j+1} L_0 \bar{x}_{j+1}, M_{j+1} L_0 \bar{x}_{j+1}) = M_{j+1} L_0 Cov(\bar{x}_{j+1}, \bar{x}_{j+1}) L_0^T M_{j+1}^T = M_{j+1} L_0 \bar{P}_{j+1} L_0^T M_{j+1}^T = L_{j+1} U_{j+1} L_{j+1}^T = P^a_{j+1}

by Algorithm 2. This ends the proof.

4. Enhanced Gauss-Newton method for 4D-Var

In atmospheric and oceanographic applications, due to nonlinear model and observation operators, the 4D-Var Eq. (7) is a nonlinear least-squares problem. A common method for solving this problem is the Gauss-Newton algorithm (Bjorck, 1996; Nocedal and Wright, 2006). To introduce this method, we reformulate Eq. (7) as

\min_{x \in \mathbb{R}^n} J(x) = \frac{1}{2} (x - x^b)^T B^{-1} (x - x^b) + \frac{1}{2} (\mathcal{G}(x) - y)^T R^{-1} (\mathcal{G}(x) - y),    (32)

where y = ((y_1)^T, ..., (y_N)^T)^T \in \mathbb{R}^p (with y_i \in \mathbb{R}^{p_i} and p = \sum_{i=1}^{N} p_i) is the observation vector and the covariance matrix R \in \mathbb{R}^{p \times p} is formed with each R_i for i = 1, ..., N. The operator \mathcal{G} : \mathbb{R}^n \to \mathbb{R}^p maps the initial condition into the space of the observation vector and is given by

\mathcal{G}(x) = ( \mathcal{H}_1 \mathcal{M}_{1,0} x; \; \ldots; \; \mathcal{H}_i \mathcal{M}_{i,0} x; \; \ldots; \; \mathcal{H}_N \mathcal{M}_{N,0} x ),

where the observation and model operators are assumed to be nonlinear. We can rewrite Eq. (32) in a more compact form as

\min_{x \in \mathbb{R}^n} J(x) = \frac{1}{2} \|f(x)\|^2_2,    (33)

with

f(x) = \begin{pmatrix} B^{-1/2} (x - x^b) \\ R^{-1/2} (\mathcal{G}(x) - y) \end{pmatrix}.

The matrices B^{-1/2} and R^{-1/2} are the inverses of the square-root factors of B = B^{1/2} (B^{1/2})^T and R = R^{1/2} (R^{1/2})^T, respectively. The Gauss-Newton method is an iterative method that solves Eq. (33) through the solution of a sequence of linear least-squares problems of the form

\min_{\delta x \in \mathbb{R}^n} \frac{1}{2} \|F_k \delta x + f(x_k)\|^2_2,    (34)

where x_k is the current estimate of the initial condition and F_k is the Jacobian matrix of f at x_k, given by

F_k = \begin{pmatrix} B^{-1/2} \\ R^{-1/2} G_k \end{pmatrix},

with G_k the Jacobian matrix of \mathcal{G} at x_k. The solution of Eq. (34), denoted \delta x_k, can easily be computed by nullifying its gradient, yielding a sequence of symmetric positive-definite linear systems to solve,

F_k^T F_k \delta x = -F_k^T f(x_k),    (35)

known as the normal equations. We rewrite the system Eq. (35) in a more compact form as

A_k \delta x = b_k,    (36)

where

A_k = F_k^T F_k = B^{-1} + G_k^T R^{-1} G_k    (37)

is a symmetric positive-definite approximation of the Hessian matrix of J at x_k that ignores terms involving second-order derivatives of f, while

b_k = -F_k^T f(x_k) = B^{-1} (x^b - x_k) + G_k^T R^{-1} (y - \mathcal{G}(x_k))

is the negative of the gradient of J at x_k. A new estimate of the solution of Eq. (33) is computed using the update formula x_{k+1} = x_k + \delta x_k. A nonlinear model integration is then performed from this new approximation to update the reference trajectory and to compute f(x_{k+1}). We summarize the Gauss-Newton method in Algorithm 3 (see Table III).

Table III. Algorithm 3: Gauss-Newton method.
x_0 = x^b
for k = 0, 1, 2, ... do
  \delta x_k = \arg\min_{\delta x \in \mathbb{R}^n} \frac{1}{2} \|F_k \delta x + f(x_k)\|^2_2
  x_{k+1} = x_k + \delta x_k
end for

The sequence of linear least-squares problems (Eq. (34)) can be viewed as a sequence of (strictly convex) quadratic problems that approximate the nonlinear function J given in Eq. (32). They can be rewritten in the well-known incremental 4D-Var formulation

\min_{\delta x \in \mathbb{R}^n} \frac{1}{2} (\delta x - d_{b_k})^T B^{-1} (\delta x - d_{b_k}) + \frac{1}{2} (G_k \delta x - d_{o_k})^T R^{-1} (G_k \delta x - d_{o_k}),    (38)

where d_{b_k} = x^b - x_k and d_{o_k} = y - \mathcal{G}(x_k) are departure vectors.

Though symmetric positive-definite linear systems like Eq. (36) could be solved using direct methods based on the Cholesky factorization of A_k or on the QR factorization of F_k, for instance, iterative methods are better suited for large operational systems such as those we are interested in. Moreover, the matrices A_k cannot be stored and only matrix-vector products involving them can be computed. The enhanced Gauss-Newton method we consider here thus makes use of a conjugate-gradient-like (CG) method. Additionally, in order to reduce the computational cost, the Jacobian matrices G_k in the incremental 4D-Var Eq. (38) are approximated and only a limited number of outer Gauss-Newton iterations and inner CG iterations are performed (Weaver et al., 2003). As pointed out by Lawless et al. (2004) and Gratton et al. (2007), this amounts to the incremental 4D-Var introduced by Courtier et al. (1994).

Focusing now on the CG method to solve Eq. (36), it is well known (see Nocedal and Wright, 2006, for instance) that its convergence rate is influenced by the starting point, the right-hand side and the actual distribution of the eigenvalues (e.g. the existence of eigenvalue clusters). Therefore, the development of good starting points and preconditioners is essential to obtain a better convergence rate. We address these issues in the next section.

5. Using EOFs within the Gauss-Newton method

Robert et al. (2005) (see also Robert et al., 2006) make use of the EOFs to define an appropriate reduced control space and an estimate of the background-error covariance matrix. In their approach, an optimal solution is sought in the full space, but the problem of computational cost is addressed by considering a reduced 4D-Var to provide a relevant initial guess for the full-space minimization. Their approach consists of performing a few iterations to solve the first incremental 4D-Var Eq. (38) approximately, reduced to the subspace spanned by L_0 and with the sample covariance matrix S as an approximation of the background-error covariance matrix,

\min_{\delta x \in \mathbb{R}^r} \frac{1}{2} \|L_0 \delta x - d_{b_0}\|^2_{S^{-1}} + \frac{1}{2} \|G_0 L_0 \delta x - d_{o_0}\|^2_{R^{-1}}.    (39)

Indeed, for x_0 = x^b, i.e. d_{b_0} = 0, and by Eq. (27), the background term in Eq. (39) is equivalent to that in Robert et al. (2005), which may be written in our notation as \frac{1}{2} \|\delta x\|^2_{U_0^{-1}}. The prolongation of the approximate solution is then used as the starting point of a Gauss-Newton method to solve the full 4D-Var (Eq. (7)), where the background-error covariance matrix B is used as preconditioner. They apply this reduced-order approach to the Océan PArallélisé (OPA) model with its Tropical Pacific Ocean configuration and the variational data-assimilation package OPAVAR (Weaver et al., 2003). Note that Theorem 2 of section 3.2 gives theoretical foundations for this approach. Indeed, under the assumptions of Theorem 2 and taking d_{b_0} = x^b, d_{o_0} = y and the Jacobian of the nonlinear operator \mathcal{G} at x_0 as linear model and observation operators, the reduced 4D-Var Eq. (39) (see Eq. (24)) and the SEEK filter are equivalent. This motivates the relevance of solving Eq. (39) approximately to obtain a cheap initial guess.
In what follows, we propose, and combine, two ways of exploiting the relevant information contained in the EOFs to accelerate the Gauss-Newton method presented in section 4.

First, to define an appropriate starting point for the CG method for the first linear least-squares problem (Eq. (34)),

\min_{\delta x \in \mathbb{R}^n} \frac{1}{2} \|F_0 \delta x + f(x_0)\|^2_2,    (40)

we consider the reduced form

\min_{\delta x \in \mathbb{R}^r} \frac{1}{2} \|F_0 L_0 \delta x + f(x_0)\|^2_2,    (41)

yielding the symmetric positive-definite linear system

L_0^T F_0^T F_0 L_0 \delta x = -L_0^T F_0^T f(x_0),

which can be rewritten as

L_0^T A_0 L_0 \delta x = L_0^T b_0    (42)

using the notation introduced above. The solution of Eq. (42), denoted by \delta x_0, is then prolongated in the full space, yielding L_0 \delta x_0. As previously, the reduced linear least-squares problem Eq. (41) can be rewritten in the incremental formulation

\min_{\delta x \in \mathbb{R}^r} \frac{1}{2} (L_0 \delta x - d_{b_0})^T B^{-1} (L_0 \delta x - d_{b_0}) + \frac{1}{2} (G_0 L_0 \delta x - d_{o_0})^T R^{-1} (G_0 L_0 \delta x - d_{o_0}),    (43)

which differs from Eq. (39) only by the use of the background-error covariance matrix B instead of the sample covariance matrix S. Observe that the prolongation of the solution \delta x_0 of Eq. (41) satisfies, by Eq. (42),

L_0 \delta x_0 = L_0 (L_0^T A_0 L_0)^{-1} L_0^T b_0 = L_0 (L_0^T A_0 L_0)^{-1} L_0^T A_0 \, \delta x,    (44)

where \delta x denotes the solution of the full problem Eq. (40), and thus corresponds to the solution of the full linear least-squares problem Eq. (40) projected onto the range of L_0 along the orthogonal complement of the range of A_0 L_0 (see Saad, 2008, for a review of projections). In our approach, instead of performing only a few iterations on the reduced problem as in Robert et al. (2005), we compute Eq. (44) and choose it as a starting point to solve the full linear least-squares problem Eq. (40). This starting point is called the Ritz-Galerkin starting point with respect to the subspace spanned by L_0 (van der Vorst, 2003). The matrix inverse in Eq. (44) is computed using a direct method, since its dimension is equal to r, which is very small in practice.

Considering again the first outer iteration of the Gauss-Newton method, the convergence rate of the CG method is strongly influenced by the distribution of the eigenvalues of A_0 and is roughly bounded by a function of the condition number of A_0 (see Nocedal and Wright, 2006, for details). When this condition number is large, it is advisable to use a preconditioned version of the CG method. It is widely recognized that there is no universal way to design a preconditioner for all types of problems (Benzi, 2002) and that designing preconditioners is a very broad and active area of research (van der Vorst, 2003). Ideally, the preconditioner must be a symmetric positive-definite matrix which is both cheap to compute and easy to apply. Its expected effect is a reduction of the condition number and/or a clustering of the spectrum of A_0. Gratton et al. (2011) introduced a class of preconditioners called limited-memory preconditioners (LMPs), defined by

H = [I_n - Z (Z^T A Z)^{-1} Z^T A] M [I_n - A Z (Z^T A Z)^{-1} Z^T] + Z (Z^T A Z)^{-1} Z^T,

where A is the matrix of the system to solve, the columns of the matrix Z \in \mathbb{R}^{n \times r} are linearly independent vectors and the symmetric positive-definite matrix M plays the role of a possible first-level preconditioner. This first-level preconditioner usually depends on the physics of the application. In variational data-assimilation problems, it is equal to the background-error covariance matrix B and is able to cluster most eigenvalues at 1. In order to improve the efficiency of this first-level preconditioner, the columns of the matrix Z in the LMP are built to contain directions in a low-dimensional subspace that are left out by the first-level preconditioner and slow down the convergence of the CG method.

Recently, Tshimanga et al. (2008) used the LMP technique above in a 4D-Var context. Their idea is to exploit information gained when solving one system to build an LMP for the next system of the sequence. The information could be, for example, Ritz pairs or descent directions generated by the CG. Their method, however, is unable to exploit the LMP technique to design a preconditioner for the first system to solve in the sequence, since in this case no information is yet available through a CG. Here we propose such a preconditioner, based on EOFs. More precisely, we choose as preconditioner the LMP

H_0 = [I_n - L_0 (L_0^T A_0 L_0)^{-1} L_0^T A_0] B [I_n - A_0 L_0 (L_0^T A_0 L_0)^{-1} L_0^T] + L_0 (L_0^T A_0 L_0)^{-1} L_0^T.    (45)

The cost of building this preconditioner is dominated by the product A_0 L_0; thus it is available without any extra computational cost, since this product has already been computed for the starting point Eq. (44). In the numerical experiments that follow, we assume that the Hessian approximations A_k do not change significantly from one outer iteration to the next and keep the preconditioner unchanged for each system, since it is available from the first to the last system of the sequence.
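To make the combination concrete, the sketch below (a toy illustration with assumed operators and sizes, not the authors' code or the shallow-water configuration of section 6) builds A_0 and b_0 for the first outer iteration, computes the Ritz-Galerkin starting point of Eq. (44), reuses the small r-by-r factor to form the LMP of Eq. (45), and then runs a preconditioned CG on the system of Eq. (36).

```python
import numpy as np

def lmp(A, Z, M1):
    """Limited-memory preconditioner in the form of Eq. (45): Z holds the
    subspace directions (here the leading EOFs) and M1 is the first-level
    preconditioner (here the background covariance B)."""
    n = A.shape[0]
    W = np.linalg.inv(Z.T @ A @ Z)
    left = np.eye(n) - Z @ W @ Z.T @ A
    right = np.eye(n) - A @ Z @ W @ Z.T
    return left @ M1 @ right + Z @ W @ Z.T

def pcg(A, b, x0, H, tol=1e-10, maxit=50):
    """Preconditioned conjugate gradients for the SPD system A x = b."""
    x = x0.copy()
    res = b - A @ x
    z = H @ res
    d = z.copy()
    for _ in range(maxit):
        Ad = A @ d
        alpha = (res @ z) / (d @ Ad)
        x = x + alpha * d
        res_new = res - alpha * Ad
        if np.linalg.norm(res_new) < tol * np.linalg.norm(b):
            break
        z_new = H @ res_new
        beta = (res_new @ z_new) / (res @ z)
        d = z_new + beta * d
        res, z = res_new, z_new
    return x

# Toy first outer iteration (sizes and operators are illustrative assumptions).
rng = np.random.default_rng(3)
n, r, p = 60, 5, 20
B = np.eye(n)                                      # first-level preconditioner
G0 = rng.standard_normal((p, n))                   # Jacobian of G at x_0
Rinv = 1e2 * np.eye(p)
A0 = np.linalg.inv(B) + G0.T @ Rinv @ G0           # Gauss-Newton matrix, Eq. (37)
b0 = G0.T @ Rinv @ rng.standard_normal(p)          # toy right-hand side of Eq. (36)
L0 = np.linalg.qr(rng.standard_normal((n, r)))[0]  # orthonormal "EOF" directions

W0 = np.linalg.inv(L0.T @ A0 @ L0)                 # small r-by-r inverse
dx_start = L0 @ (W0 @ (L0.T @ b0))                 # Ritz-Galerkin starting point, Eq. (44)
H0 = lmp(A0, L0, B)                                # EOF-based LMP, Eq. (45)
dx = pcg(A0, b0, dx_start, H0)                     # preconditioned CG on Eq. (36)
```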
6. Numerical experiments

The evolution model we consider is the one-dimensional shallow-water system describing the flow of a fluid over an obstacle (Lawless et al., 2003). The governing equations can be written as

\frac{Du}{Dt} + \frac{\partial \phi}{\partial z} = -g \frac{\partial \bar{h}}{\partial z}, \qquad \frac{D\phi}{Dt} + \phi \frac{\partial u}{\partial z} = 0,

where D/Dt = \partial/\partial t + u \, \partial/\partial z is the material derivative and g is the gravitational constant. In these equations, u = u(z, t) is the velocity of the fluid, \phi = g h(z, t) is the geopotential, with h(z, t) > 0 the depth of the fluid above the orography, and \bar{h} = \bar{h}(z) is the height of the bottom orography. The problem is defined on the domain z \in [0, l], with periodic boundary conditions, and for t \in [0, T]. The height of the obstacle is given by

\bar{h}(z) = h_c ( 1 - (z - l/2)^2 / a^2 )  for |z - l/2| <= a,  and  \bar{h}(z) = 0  otherwise,

where h_c is the maximum height of the obstacle and a is half the length over which the base of the obstacle extends. In our experiments, a is set to 0.4 m and h_c to 0.05 m. The values of u and \phi are specified everywhere at the initial time as

u(z, 0) = u_0(z) = u_0, \qquad \phi(z, 0) = \phi_0(z) = g (h_0 - \bar{h}(z)),

with u_0 = 0.1 m s^{-1} and h_0 = 0.2 m. At time t = 0, the fluid is impulsively put in motion with the constant velocity u_0. From this impulse, a wave motion develops and moves away from the obstacle in both directions. In our example, the domain is defined to be periodic over 250 grid points, with a distance of \Delta z = 0.01 m between them, so that z \in [0 m, 2.5 m]. For simplicity, we set the gravitational constant g to 10 m s^{-2}. A fixed time step \Delta t is used for the model integration. The state vector of the system at a time t is given by the velocity and the geopotential at each of the 250 grid points and is thus of dimension n = 500.

Figure 1. Velocity, geopotential and water level at a given time for the shallow-water model.

The framework for the numerical tests is the classical twin experiment. The system period, i.e. the average time for a wave to cover the whole basin, is empirically estimated to be 400 time steps. It is divided into 10 assimilation windows and we solve the data-assimilation problem for the first assimilation window only, hence composed of 40 time steps. We first compute reference states by integrating the model from an arbitrary initial condition over 40 time steps. Observations are then generated from the reference states by adding Gaussian noise. To build the sample covariance matrix S given by Eq. (5), required to obtain L_0, one computes a set of l = 50 state vectors by integrating the model over 400 time steps from an arbitrary initial condition occurring before the assimilation window, storing the state vectors produced every eighth time step. Note that the background x^b is computed as the average of these 50 state vectors, while the background covariance matrix B is computed using a Laplacian-based correlation model, with a correlation length-scale equal to 1.5 m and a fixed variance. The spectral decomposition (Eq. (6)) of S is then applied to compute the EOF basis. To estimate the percentage of variation accounted for by the first r EOFs, we use the formula

\frac{\sum_{i=1}^{r} \lambda_i}{\sum_{i=1}^{n} \lambda_i},

where the \lambda_i are the eigenvalues of S in decreasing order. We plot this percentage in Figure 2 (for r = 1, ..., 50) and see that retaining the first five EOFs enables us to explain 80% of the system's variability. This choice for r is a good compromise, since it allows us to account for most of the variation without increasing too much the dimension of the subspace spanned by the selected EOFs. Note that all these parameters are chosen according to the choices made for operational data-assimilation problems.

Figure 2. Percentage of the explained system's variability versus the number of selected EOFs.

The set of experiments is designed to illustrate the impact of the Ritz-Galerkin starting point, Eq. (44), and of the LMP, Eq. (45), when the subspace spanned by L_0 is defined by the five dominant EOFs. We allow for three outer Gauss-Newton iterations with five inner CG iterations to solve each linear system, which corresponds to realistic heuristics for operational 4D-Var problems. We consider a first set of three experiments with an increasing number of observations (see Figures 3-5). These observations are generated at each time step, implying that N = 40. A velocity and geopotential observation is generated at the 50th grid point for the first experiment (p_i = 2), at the 50th and 200th grid points for the second experiment (p_i = 4), and at the 50th, 125th and 200th grid points for the third one (p_i = 6). In these experiments, the observation-error covariance matrix R is set to a multiple \mu of the identity matrix, with \mu = 10^{-4}. These experiments yield an observation vector space of dimension p = 80, p = 160 and p = 240, respectively (see Figure 1). We next consider a second set of two experiments in which \mu is set to \mu = 10^{-5} and \mu = 10^{-3}, successively, for the case p = 240 (see Figures 6 and 7).

Figures 3-7 show the history of the nonlinear and incremental cost functions for the three outer iterations (the nonlinear with circles and the incremental with lines). Note that the nonlinear function value is available only at outer iterations, when a linearization is performed in the Gauss-Newton algorithm. For the incremental cost functions, the curves are placed one after the other in sequence and the inner iterations are cumulated.
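Returning to the EOF-selection step described above, the fraction of variability explained by the first r EOFs can be computed in a few lines. The sketch below uses a synthetic sample with illustrative shapes only (it is not the shallow-water sample of the experiments), and the chosen sizes and spectrum are assumptions.

```python
import numpy as np

def explained_variability(X, r):
    """Fraction of the system's variability captured by the r leading EOFs,
    i.e. the sum of the r largest eigenvalues of S over the sum of all of them."""
    Xc = X - X.mean(axis=0)
    S = Xc.T @ Xc / (X.shape[0] - 1)             # sample covariance, Eq. (5)
    lam = np.sort(np.linalg.eigvalsh(S))[::-1]   # eigenvalues in decreasing order
    return lam[:r].sum() / lam.sum()

# Toy sample of l = 50 states of dimension n = 500 (illustrative shapes only).
rng = np.random.default_rng(4)
X = rng.standard_normal((50, 500)) @ np.diag(np.linspace(2.0, 0.1, 500))
print([round(explained_variability(X, r), 2) for r in (1, 5, 10, 20)])
```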


More information

Observation Impact Assessment for Dynamic. Data-Driven Coupled Chaotic System

Observation Impact Assessment for Dynamic. Data-Driven Coupled Chaotic System Applied Mathematical Sciences, Vol. 10, 2016, no. 45, 2239-2248 HIKARI Ltd, www.m-hikari.com http://dx.doi.org/10.12988/ams.2016.65170 Observation Impact Assessment for Dynamic Data-Driven Coupled Chaotic

More information

B-Preconditioned Minimization Algorithms for Variational Data Assimilation with the Dual Formulation

B-Preconditioned Minimization Algorithms for Variational Data Assimilation with the Dual Formulation B-Preconditioned Minimization Algorithms for Variational Data Assimilation with the Dual Formulation S. Gürol, A. T. Weaver, A. M. Moore, A. Piacentini, H. G. Arango and S. Gratton Technical Report TR/PA/12/118

More information

M.A. Botchev. September 5, 2014

M.A. Botchev. September 5, 2014 Rome-Moscow school of Matrix Methods and Applied Linear Algebra 2014 A short introduction to Krylov subspaces for linear systems, matrix functions and inexact Newton methods. Plan and exercises. M.A. Botchev

More information

Efficient Data Assimilation for Spatiotemporal Chaos: a Local Ensemble Transform Kalman Filter

Efficient Data Assimilation for Spatiotemporal Chaos: a Local Ensemble Transform Kalman Filter Efficient Data Assimilation for Spatiotemporal Chaos: a Local Ensemble Transform Kalman Filter arxiv:physics/0511236 v1 28 Nov 2005 Brian R. Hunt Institute for Physical Science and Technology and Department

More information

R. E. Petrie and R. N. Bannister. Department of Meteorology, Earley Gate, University of Reading, Reading, RG6 6BB, United Kingdom

R. E. Petrie and R. N. Bannister. Department of Meteorology, Earley Gate, University of Reading, Reading, RG6 6BB, United Kingdom A method for merging flow-dependent forecast error statistics from an ensemble with static statistics for use in high resolution variational data assimilation R. E. Petrie and R. N. Bannister Department

More information

Assimilation Techniques (4): 4dVar April 2001

Assimilation Techniques (4): 4dVar April 2001 Assimilation echniques (4): 4dVar April By Mike Fisher European Centre for Medium-Range Weather Forecasts. able of contents. Introduction. Comparison between the ECMWF 3dVar and 4dVar systems 3. he current

More information

The Conjugate Gradient Method

The Conjugate Gradient Method The Conjugate Gradient Method Lecture 5, Continuous Optimisation Oxford University Computing Laboratory, HT 2006 Notes by Dr Raphael Hauser (hauser@comlab.ox.ac.uk) The notion of complexity (per iteration)

More information

Stability of Ensemble Kalman Filters

Stability of Ensemble Kalman Filters Stability of Ensemble Kalman Filters Idrissa S. Amour, Zubeda Mussa, Alexander Bibov, Antti Solonen, John Bardsley, Heikki Haario and Tuomo Kauranne Lappeenranta University of Technology University of

More information

Optimal state estimation for numerical weather prediction using reduced order models

Optimal state estimation for numerical weather prediction using reduced order models Optimal state estimation for numerical weather prediction using reduced order models A.S. Lawless 1, C. Boess 1, N.K. Nichols 1 and A. Bunse-Gerstner 2 1 School of Mathematical and Physical Sciences, University

More information

Relationship between Singular Vectors, Bred Vectors, 4D-Var and EnKF

Relationship between Singular Vectors, Bred Vectors, 4D-Var and EnKF Relationship between Singular Vectors, Bred Vectors, 4D-Var and EnKF Eugenia Kalnay and Shu-Chih Yang with Alberto Carrasi, Matteo Corazza and Takemasa Miyoshi ECODYC10, Dresden 28 January 2010 Relationship

More information

Conjugate Gradient Method

Conjugate Gradient Method Conjugate Gradient Method direct and indirect methods positive definite linear systems Krylov sequence spectral analysis of Krylov sequence preconditioning Prof. S. Boyd, EE364b, Stanford University Three

More information

Weak Constraints 4D-Var

Weak Constraints 4D-Var Weak Constraints 4D-Var Yannick Trémolet ECMWF Training Course - Data Assimilation May 1, 2012 Yannick Trémolet Weak Constraints 4D-Var May 1, 2012 1 / 30 Outline 1 Introduction 2 The Maximum Likelihood

More information

Chapter 7 Iterative Techniques in Matrix Algebra

Chapter 7 Iterative Techniques in Matrix Algebra Chapter 7 Iterative Techniques in Matrix Algebra Per-Olof Persson persson@berkeley.edu Department of Mathematics University of California, Berkeley Math 128B Numerical Analysis Vector Norms Definition

More information

On Lagrange multipliers of trust-region subproblems

On Lagrange multipliers of trust-region subproblems On Lagrange multipliers of trust-region subproblems Ladislav Lukšan, Ctirad Matonoha, Jan Vlček Institute of Computer Science AS CR, Prague Programy a algoritmy numerické matematiky 14 1.- 6. června 2008

More information

Four-Dimensional Ensemble Kalman Filtering

Four-Dimensional Ensemble Kalman Filtering Four-Dimensional Ensemble Kalman Filtering B.R. Hunt, E. Kalnay, E.J. Kostelich, E. Ott, D.J. Patil, T. Sauer, I. Szunyogh, J.A. Yorke, A.V. Zimin University of Maryland, College Park, MD 20742, USA Ensemble

More information

Numerical Methods I Non-Square and Sparse Linear Systems

Numerical Methods I Non-Square and Sparse Linear Systems Numerical Methods I Non-Square and Sparse Linear Systems Aleksandar Donev Courant Institute, NYU 1 donev@courant.nyu.edu 1 MATH-GA 2011.003 / CSCI-GA 2945.003, Fall 2014 September 25th, 2014 A. Donev (Courant

More information

Numerical methods part 2

Numerical methods part 2 Numerical methods part 2 Alain Hébert alain.hebert@polymtl.ca Institut de génie nucléaire École Polytechnique de Montréal ENE6103: Week 6 Numerical methods part 2 1/33 Content (week 6) 1 Solution of an

More information

Phase space, Tangent-Linear and Adjoint Models, Singular Vectors, Lyapunov Vectors and Normal Modes

Phase space, Tangent-Linear and Adjoint Models, Singular Vectors, Lyapunov Vectors and Normal Modes Phase space, Tangent-Linear and Adjoint Models, Singular Vectors, Lyapunov Vectors and Normal Modes Assume a phase space of dimension N where Autonomous governing equations with initial state: = is a state

More information

17 Solution of Nonlinear Systems

17 Solution of Nonlinear Systems 17 Solution of Nonlinear Systems We now discuss the solution of systems of nonlinear equations. An important ingredient will be the multivariate Taylor theorem. Theorem 17.1 Let D = {x 1, x 2,..., x m

More information

A Computational Framework for Quantifying and Optimizing the Performance of Observational Networks in 4D-Var Data Assimilation

A Computational Framework for Quantifying and Optimizing the Performance of Observational Networks in 4D-Var Data Assimilation A Computational Framework for Quantifying and Optimizing the Performance of Observational Networks in 4D-Var Data Assimilation Alexandru Cioaca Computational Science Laboratory (CSL) Department of Computer

More information

Data Assimilation: Finding the Initial Conditions in Large Dynamical Systems. Eric Kostelich Data Mining Seminar, Feb. 6, 2006

Data Assimilation: Finding the Initial Conditions in Large Dynamical Systems. Eric Kostelich Data Mining Seminar, Feb. 6, 2006 Data Assimilation: Finding the Initial Conditions in Large Dynamical Systems Eric Kostelich Data Mining Seminar, Feb. 6, 2006 kostelich@asu.edu Co-Workers Istvan Szunyogh, Gyorgyi Gyarmati, Ed Ott, Brian

More information

Applied Mathematics 205. Unit V: Eigenvalue Problems. Lecturer: Dr. David Knezevic

Applied Mathematics 205. Unit V: Eigenvalue Problems. Lecturer: Dr. David Knezevic Applied Mathematics 205 Unit V: Eigenvalue Problems Lecturer: Dr. David Knezevic Unit V: Eigenvalue Problems Chapter V.4: Krylov Subspace Methods 2 / 51 Krylov Subspace Methods In this chapter we give

More information

Abstract 2. ENSEMBLE KALMAN FILTERS 1. INTRODUCTION

Abstract 2. ENSEMBLE KALMAN FILTERS 1. INTRODUCTION J5.4 4D ENSEMBLE KALMAN FILTERING FOR ASSIMILATION OF ASYNCHRONOUS OBSERVATIONS T. Sauer George Mason University, Fairfax, VA 22030 B.R. Hunt, J.A. Yorke, A.V. Zimin, E. Ott, E.J. Kostelich, I. Szunyogh,

More information

j=1 r 1 x 1 x n. r m r j (x) r j r j (x) r j (x). r j x k

j=1 r 1 x 1 x n. r m r j (x) r j r j (x) r j (x). r j x k Maria Cameron Nonlinear Least Squares Problem The nonlinear least squares problem arises when one needs to find optimal set of parameters for a nonlinear model given a large set of data The variables x,,

More information

The Kalman filter is arguably one of the most notable algorithms

The Kalman filter is arguably one of the most notable algorithms LECTURE E NOTES «Kalman Filtering with Newton s Method JEFFREY HUMPHERYS and JEREMY WEST The Kalman filter is arguably one of the most notable algorithms of the 0th century [1]. In this article, we derive

More information

Stabilization and Acceleration of Algebraic Multigrid Method

Stabilization and Acceleration of Algebraic Multigrid Method Stabilization and Acceleration of Algebraic Multigrid Method Recursive Projection Algorithm A. Jemcov J.P. Maruszewski Fluent Inc. October 24, 2006 Outline 1 Need for Algorithm Stabilization and Acceleration

More information

Lecture 2: Linear Algebra Review

Lecture 2: Linear Algebra Review EE 227A: Convex Optimization and Applications January 19 Lecture 2: Linear Algebra Review Lecturer: Mert Pilanci Reading assignment: Appendix C of BV. Sections 2-6 of the web textbook 1 2.1 Vectors 2.1.1

More information

(Extended) Kalman Filter

(Extended) Kalman Filter (Extended) Kalman Filter Brian Hunt 7 June 2013 Goals of Data Assimilation (DA) Estimate the state of a system based on both current and all past observations of the system, using a model for the system

More information

Maximum Likelihood Ensemble Filter Applied to Multisensor Systems

Maximum Likelihood Ensemble Filter Applied to Multisensor Systems Maximum Likelihood Ensemble Filter Applied to Multisensor Systems Arif R. Albayrak a, Milija Zupanski b and Dusanka Zupanski c abc Colorado State University (CIRA), 137 Campus Delivery Fort Collins, CO

More information

A Comparison of Error Subspace Kalman Filters

A Comparison of Error Subspace Kalman Filters Tellus 000, 000 000 (0000) Printed 4 February 2005 (Tellus LATEX style file v2.2) A Comparison of Error Subspace Kalman Filters By LARS NERGER, WOLFGANG HILLER and JENS SCHRÖTER Alfred Wegener Institute

More information

CS 542G: Robustifying Newton, Constraints, Nonlinear Least Squares

CS 542G: Robustifying Newton, Constraints, Nonlinear Least Squares CS 542G: Robustifying Newton, Constraints, Nonlinear Least Squares Robert Bridson October 29, 2008 1 Hessian Problems in Newton Last time we fixed one of plain Newton s problems by introducing line search

More information

Iterative methods for Linear System of Equations. Joint Advanced Student School (JASS-2009)

Iterative methods for Linear System of Equations. Joint Advanced Student School (JASS-2009) Iterative methods for Linear System of Equations Joint Advanced Student School (JASS-2009) Course #2: Numerical Simulation - from Models to Software Introduction In numerical simulation, Partial Differential

More information

Lecture notes on assimilation algorithms

Lecture notes on assimilation algorithms Lecture notes on assimilation algorithms Elías alur Hólm European Centre for Medium-Range Weather Forecasts Reading, UK April 18, 28 1 Basic concepts 1.1 The analysis In meteorology and other branches

More information

The Bock iteration for the ODE estimation problem

The Bock iteration for the ODE estimation problem he Bock iteration for the ODE estimation problem M.R.Osborne Contents 1 Introduction 2 2 Introducing the Bock iteration 5 3 he ODE estimation problem 7 4 he Bock iteration for the smoothing problem 12

More information

FIXED POINT ITERATIONS

FIXED POINT ITERATIONS FIXED POINT ITERATIONS MARKUS GRASMAIR 1. Fixed Point Iteration for Non-linear Equations Our goal is the solution of an equation (1) F (x) = 0, where F : R n R n is a continuous vector valued mapping in

More information

Relationship between Singular Vectors, Bred Vectors, 4D-Var and EnKF

Relationship between Singular Vectors, Bred Vectors, 4D-Var and EnKF Relationship between Singular Vectors, Bred Vectors, 4D-Var and EnKF Eugenia Kalnay and Shu-Chih Yang with Alberto Carrasi, Matteo Corazza and Takemasa Miyoshi 4th EnKF Workshop, April 2010 Relationship

More information

New Fast Kalman filter method

New Fast Kalman filter method New Fast Kalman filter method Hojat Ghorbanidehno, Hee Sun Lee 1. Introduction Data assimilation methods combine dynamical models of a system with typically noisy observations to obtain estimates of the

More information

Convex Optimization. Problem set 2. Due Monday April 26th

Convex Optimization. Problem set 2. Due Monday April 26th Convex Optimization Problem set 2 Due Monday April 26th 1 Gradient Decent without Line-search In this problem we will consider gradient descent with predetermined step sizes. That is, instead of determining

More information

Lagrangian data assimilation for point vortex systems

Lagrangian data assimilation for point vortex systems JOT J OURNAL OF TURBULENCE http://jot.iop.org/ Lagrangian data assimilation for point vortex systems Kayo Ide 1, Leonid Kuznetsov 2 and Christopher KRTJones 2 1 Department of Atmospheric Sciences and Institute

More information

Solution of Linear Equations

Solution of Linear Equations Solution of Linear Equations (Com S 477/577 Notes) Yan-Bin Jia Sep 7, 07 We have discussed general methods for solving arbitrary equations, and looked at the special class of polynomial equations A subclass

More information

Kalman Filter and Ensemble Kalman Filter

Kalman Filter and Ensemble Kalman Filter Kalman Filter and Ensemble Kalman Filter 1 Motivation Ensemble forecasting : Provides flow-dependent estimate of uncertainty of the forecast. Data assimilation : requires information about uncertainty

More information

Structure in Data. A major objective in data analysis is to identify interesting features or structure in the data.

Structure in Data. A major objective in data analysis is to identify interesting features or structure in the data. Structure in Data A major objective in data analysis is to identify interesting features or structure in the data. The graphical methods are very useful in discovering structure. There are basically two

More information

Quantifying observation error correlations in remotely sensed data

Quantifying observation error correlations in remotely sensed data Quantifying observation error correlations in remotely sensed data Conference or Workshop Item Published Version Presentation slides Stewart, L., Cameron, J., Dance, S. L., English, S., Eyre, J. and Nichols,

More information

March 8, 2010 MATH 408 FINAL EXAM SAMPLE

March 8, 2010 MATH 408 FINAL EXAM SAMPLE March 8, 200 MATH 408 FINAL EXAM SAMPLE EXAM OUTLINE The final exam for this course takes place in the regular course classroom (MEB 238) on Monday, March 2, 8:30-0:20 am. You may bring two-sided 8 page

More information

Data assimilation concepts and methods March 1999

Data assimilation concepts and methods March 1999 Data assimilation concepts and methods March 1999 By F. Bouttier and P. Courtier Abstract These training course lecture notes are an advanced and comprehensive presentation of most data assimilation methods

More information

Local Ensemble Transform Kalman Filter: An Efficient Scheme for Assimilating Atmospheric Data

Local Ensemble Transform Kalman Filter: An Efficient Scheme for Assimilating Atmospheric Data Local Ensemble Transform Kalman Filter: An Efficient Scheme for Assimilating Atmospheric Data John Harlim and Brian R. Hunt Department of Mathematics and Institute for Physical Science and Technology University

More information

Newton s Method. Ryan Tibshirani Convex Optimization /36-725

Newton s Method. Ryan Tibshirani Convex Optimization /36-725 Newton s Method Ryan Tibshirani Convex Optimization 10-725/36-725 1 Last time: dual correspondences Given a function f : R n R, we define its conjugate f : R n R, Properties and examples: f (y) = max x

More information

The amount of work to construct each new guess from the previous one should be a small multiple of the number of nonzeros in A.

The amount of work to construct each new guess from the previous one should be a small multiple of the number of nonzeros in A. AMSC/CMSC 661 Scientific Computing II Spring 2005 Solution of Sparse Linear Systems Part 2: Iterative methods Dianne P. O Leary c 2005 Solving Sparse Linear Systems: Iterative methods The plan: Iterative

More information

Numerical Methods for PDE-Constrained Optimization

Numerical Methods for PDE-Constrained Optimization Numerical Methods for PDE-Constrained Optimization Richard H. Byrd 1 Frank E. Curtis 2 Jorge Nocedal 2 1 University of Colorado at Boulder 2 Northwestern University Courant Institute of Mathematical Sciences,

More information

Linear Regression and Its Applications

Linear Regression and Its Applications Linear Regression and Its Applications Predrag Radivojac October 13, 2014 Given a data set D = {(x i, y i )} n the objective is to learn the relationship between features and the target. We usually start

More information

Nonlinear error dynamics for cycled data assimilation methods

Nonlinear error dynamics for cycled data assimilation methods Nonlinear error dynamics for cycled data assimilation methods A J F Moodey 1, A S Lawless 1,2, P J van Leeuwen 2, R W E Potthast 1,3 1 Department of Mathematics and Statistics, University of Reading, UK.

More information

Background Error Covariance Modelling

Background Error Covariance Modelling Background Error Covariance Modelling Mike Fisher Slide 1 Outline Diagnosing the Statistics of Background Error using Ensembles of Analyses Modelling the Statistics in Spectral Space - Relaxing constraints

More information

Gaussian Process Approximations of Stochastic Differential Equations

Gaussian Process Approximations of Stochastic Differential Equations Gaussian Process Approximations of Stochastic Differential Equations Cédric Archambeau Dan Cawford Manfred Opper John Shawe-Taylor May, 2006 1 Introduction Some of the most complex models routinely run

More information

Randomized Hessian Estimation and Directional Search

Randomized Hessian Estimation and Directional Search Randomized Hessian Estimation and Directional Search D. Leventhal A.S. Lewis September 4, 008 Key words: derivative-free optimization, directional search, quasi-newton, random search, steepest descent

More information

Arnoldi Methods in SLEPc

Arnoldi Methods in SLEPc Scalable Library for Eigenvalue Problem Computations SLEPc Technical Report STR-4 Available at http://slepc.upv.es Arnoldi Methods in SLEPc V. Hernández J. E. Román A. Tomás V. Vidal Last update: October,

More information

Optimization and Root Finding. Kurt Hornik

Optimization and Root Finding. Kurt Hornik Optimization and Root Finding Kurt Hornik Basics Root finding and unconstrained smooth optimization are closely related: Solving ƒ () = 0 can be accomplished via minimizing ƒ () 2 Slide 2 Basics Root finding

More information

Efficient Data Assimilation for Spatiotemporal Chaos: a Local Ensemble Transform Kalman Filter

Efficient Data Assimilation for Spatiotemporal Chaos: a Local Ensemble Transform Kalman Filter arxiv:physics/0511236v2 [physics.data-an] 29 Dec 2006 Efficient Data Assimilation for Spatiotemporal Chaos: a Local Ensemble Transform Kalman Filter Brian R. Hunt Institute for Physical Science and Technology

More information

Domain decomposition on different levels of the Jacobi-Davidson method

Domain decomposition on different levels of the Jacobi-Davidson method hapter 5 Domain decomposition on different levels of the Jacobi-Davidson method Abstract Most computational work of Jacobi-Davidson [46], an iterative method suitable for computing solutions of large dimensional

More information

A projected Hessian for full waveform inversion

A projected Hessian for full waveform inversion CWP-679 A projected Hessian for full waveform inversion Yong Ma & Dave Hale Center for Wave Phenomena, Colorado School of Mines, Golden, CO 80401, USA (c) Figure 1. Update directions for one iteration

More information

A Penalized 4-D Var data assimilation method for reducing forecast error

A Penalized 4-D Var data assimilation method for reducing forecast error A Penalized 4-D Var data assimilation method for reducing forecast error M. J. Hossen Department of Mathematics and Natural Sciences BRAC University 66 Mohakhali, Dhaka-1212, Bangladesh I. M. Navon Department

More information

Iterative methods for Linear System

Iterative methods for Linear System Iterative methods for Linear System JASS 2009 Student: Rishi Patil Advisor: Prof. Thomas Huckle Outline Basics: Matrices and their properties Eigenvalues, Condition Number Iterative Methods Direct and

More information

Relative Merits of 4D-Var and Ensemble Kalman Filter

Relative Merits of 4D-Var and Ensemble Kalman Filter Relative Merits of 4D-Var and Ensemble Kalman Filter Andrew Lorenc Met Office, Exeter International summer school on Atmospheric and Oceanic Sciences (ISSAOS) "Atmospheric Data Assimilation". August 29

More information

Fitting Linear Statistical Models to Data by Least Squares I: Introduction

Fitting Linear Statistical Models to Data by Least Squares I: Introduction Fitting Linear Statistical Models to Data by Least Squares I: Introduction Brian R. Hunt and C. David Levermore University of Maryland, College Park Math 420: Mathematical Modeling February 5, 2014 version

More information

A Newton-Galerkin-ADI Method for Large-Scale Algebraic Riccati Equations

A Newton-Galerkin-ADI Method for Large-Scale Algebraic Riccati Equations A Newton-Galerkin-ADI Method for Large-Scale Algebraic Riccati Equations Peter Benner Max-Planck-Institute for Dynamics of Complex Technical Systems Computational Methods in Systems and Control Theory

More information

ITERATIVE METHODS BASED ON KRYLOV SUBSPACES

ITERATIVE METHODS BASED ON KRYLOV SUBSPACES ITERATIVE METHODS BASED ON KRYLOV SUBSPACES LONG CHEN We shall present iterative methods for solving linear algebraic equation Au = b based on Krylov subspaces We derive conjugate gradient (CG) method

More information

Fitting Linear Statistical Models to Data by Least Squares I: Introduction

Fitting Linear Statistical Models to Data by Least Squares I: Introduction Fitting Linear Statistical Models to Data by Least Squares I: Introduction Brian R. Hunt and C. David Levermore University of Maryland, College Park Math 420: Mathematical Modeling January 25, 2012 version

More information

Lecture 4: 4D-Var Diagnostics

Lecture 4: 4D-Var Diagnostics Lecture 4: 4D-Var Diagnostics Outline Posterior/analysis error estimates Consistency checks, hypothesis tests, degrees of freedom & information content Array modes Clipped analyses Posterior Error Estimates

More information

Local Ensemble Transform Kalman Filter

Local Ensemble Transform Kalman Filter Local Ensemble Transform Kalman Filter Brian Hunt 11 June 2013 Review of Notation Forecast model: a known function M on a vector space of model states. Truth: an unknown sequence {x n } of model states

More information

Lecture Notes 1: Vector spaces

Lecture Notes 1: Vector spaces Optimization-based data analysis Fall 2017 Lecture Notes 1: Vector spaces In this chapter we review certain basic concepts of linear algebra, highlighting their application to signal processing. 1 Vector

More information

IPAM Summer School Optimization methods for machine learning. Jorge Nocedal

IPAM Summer School Optimization methods for machine learning. Jorge Nocedal IPAM Summer School 2012 Tutorial on Optimization methods for machine learning Jorge Nocedal Northwestern University Overview 1. We discuss some characteristics of optimization problems arising in deep

More information