Evaluation of Analysis by Cross-Validation, Part II: Diagnostic and Optimization of Analysis Error Covariance


Richard Ménard * and Martin Deshaies-Jacques

Air Quality Research Division, Environment and Climate Change Canada, 2121 Transcanada Highway, Dorval, QC H9P 1J3, Canada; martin.deshaies-jacques@canada.ca
* Correspondence: richard.menard@canada.ca

Received: 7 November 2017; Accepted: 13 February 2018; Published: 15 February 2018

Abstract: We present a general theory of the estimation of analysis error covariances based on cross-validation, as well as a geometric interpretation of the method. In particular, we use the variance of passive observation-minus-analysis residuals and show that the true analysis error variance can be estimated without relying on the optimality assumption. This approach is used to obtain near-optimal analyses, which are then used to evaluate the air quality analysis error using several different methods at active and passive observation sites. We compare the estimates according to the method of Hollingsworth-Lönnberg, the method of Desroziers et al., a new diagnostic we developed, and the perceived analysis error computed from the analysis scheme, and conclude that, as long as the analysis is near optimal, all estimates agree within a certain error margin.

Keywords: data assimilation; statistical diagnostics of analysis residuals; estimation of analysis error; air quality model diagnostics; Desroziers et al. method; cross-validation

1. Introduction

At Environment and Climate Change Canada (ECCC) we have been producing hourly surface pollutant analyses covering North America [1-3] using an optimum interpolation scheme that combines the operational air quality forecast model GEM-MACH output [4] with real-time hourly observations of O3, PM2.5, PM10, NO2, and SO2 from the AirNow gateway, with additional observations from Canada. These analyses are not used to initialize the air quality model, and we wish to evaluate them by cross-validation, that is, by leaving out a subset of observations from the analysis and using them for verification. Observations used to produce the analysis are called active observations, while those used for verification are called passive observations.

In the first part of this study, Ménard and Deshaies-Jacques [5], we examined different verification metrics using either active or passive observations. As we changed the ratio of observation error to background error variances γ = σ_o²/σ_b², while keeping the sum σ_o² + σ_b² equal to var(O − B), we found a minimum in var(O − A) in passive observation space. In this second part, we formalize this result, develop the principles of estimation of the analysis error covariance by cross-validation, and apply them to estimate and optimize the analysis error covariance of ECCC's surface analyses of O3 and PM2.5.

When we refer to the analysis error, or the analysis error covariance, it is important to distinguish the perceived analysis error from the true analysis error [6]. The perceived analysis error is the analysis error that results from the analysis algorithm itself, whereas the true analysis error is the difference between the analysis and the true state. Analysis schemes are usually derived from an optimization of some sort. In a variational analysis scheme, for example, the analysis is obtained by minimizing a cost function with some given or prescribed observation error and background error covariances, R̃ and B̃

respectively. In a linear unbiased analysis scheme, the gain matrix K̃ is obtained by minimum variance estimation, yielding an expression of the form K̃ = B̃H^T(HB̃H^T + R̃)^{-1}, where H is the observation operator. The perceived analysis error covariance is then derived as à = (I − K̃H)B̃. In deriving this expression for the perceived analysis error covariance we in fact assume that the given error covariances, R̃ and B̃, are the error covariances with respect to the true state, i.e., the true error covariances. We also assume that the observation operator is not an approximation with some error, but is the true, error-free observation operator. Of course, in real applications neither R̃ nor B̃ is a covariance measure with respect to the true state, but only a more or less accurate estimate of one.

Daley [6] argued that, in principle, for an arbitrary gain matrix K̃ the true analysis error covariance A can be computed as

A = (I − K̃H) B (I − K̃H)^T + K̃ R K̃^T,

provided that we know the true observation and background error covariances, R and B. This expression is a quadratic matrix equation, and has the property that the true analysis error variance tr(A) is minimum when K̃ = BH^T(HBH^T + R)^{-1} = K. In that sense, the analysis is then truly optimal. The optimal gain matrix K is called the Kalman gain. This illustrates that although an analysis is obtained through some minimization principle, the resulting analysis error is not necessarily the true analysis error.

One of the main sources of information used to obtain the true R and B is the var(O − B) statistic. However, it has always been argued that this is not possible without making some assumptions [7-9], the most useful one being that background errors are spatially correlated while observation errors are spatially uncorrelated, or at least correlated on a much shorter length-scale. Even under those assumptions, different estimation methods, such as the Hollingsworth-Lönnberg method [10] and maximum likelihood, give different error variances and different correlation lengths [11]. Other methods use var(O − B) for rescaling but assume that the observation error is known. The assumption that the observation error is known is also debated, as observation errors contain representativeness errors [12] that include observation operator errors. How to obtain an optimal analysis is thus unclear.

The evaluation of the true or perceived analysis error covariance using its own active observations is also a misleading problem, unless the analysis is already optimal. Hollingsworth and Lönnberg [13] first addressed this issue, noting that in the case of an optimal gain (i.e., an optimal analysis), the statistics of the observation-minus-analysis residuals O − Â are related to the analysis error by

E[(O − Â)(O − Â)^T] = R − H Â H^T,

where Â is the optimal analysis error covariance, and H and R are the observation operator and observation error covariance, respectively. The caret (ˆ) over A indicates that the analysis uses an optimal gain. In the context of spatially uncorrelated observation errors, the off-diagonal elements of E[(O − Â)(O − Â)^T] would then give the analysis error covariance in observation space. Hollingsworth and Lönnberg [13] argued that for most practical purposes the negative intercept of E[(O − Â)(O − Â)^T] at zero distance and the prescribed observation weight should be nearly equal, and thus could be used as an assessment of the optimality of an analysis. However, in cases where such agreement does not exist, an estimate of the actual analysis error is not possible.
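To make the distinction between perceived and true analysis error concrete, the following minimal NumPy sketch (our own illustration; the 1D grid, covariance values and function names are invented, not from the paper) computes the perceived covariance à = (I − K̃H)B̃ alongside Daley's true covariance for the same, possibly suboptimal, gain:

```python
import numpy as np

def perceived_analysis_cov(B_p, R_p, H):
    """Perceived analysis error covariance A~ = (I - K~H) B~, computed from
    the prescribed covariances B~ and R~ only."""
    K = B_p @ H.T @ np.linalg.inv(H @ B_p @ H.T + R_p)
    return (np.eye(B_p.shape[0]) - K @ H) @ B_p, K

def true_analysis_cov(K, B, R, H):
    """Daley's formula A = (I - KH) B (I - KH)^T + K R K^T, valid for an
    arbitrary gain K when B and R are the *true* error covariances."""
    ImKH = np.eye(B.shape[0]) - K @ H
    return ImKH @ B @ ImKH.T + K @ R @ K.T

# Invented 1D example: 3 grid points, observations at grid points 0 and 2.
dist = np.abs(np.subtract.outer(np.arange(3), np.arange(3)))
B_true = 4.0 * np.exp(-dist / 2.0)
R_true = 1.0 * np.eye(2)
H = np.array([[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])

# Prescribed covariances with a deliberately wrong variance ratio gamma.
A_perceived, K_tilde = perceived_analysis_cov(0.5 * B_true, 2.0 * np.eye(2), H)
A_true = true_analysis_cov(K_tilde, B_true, R_true, H)
print(np.trace(A_perceived), np.trace(A_true))  # the two can differ widely
```

With a misspecified gain, tr(Ã) and tr(A) diverge, which is precisely why diagnostics that do not assume optimality are needed.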
Another method, proposed by Desroziers et al. [14], argues that the diagnostic E[(O − Â)(Â − B)^T] should be equal to the analysis error covariance in observation space, but, again, only if the gain is optimal and the innovation covariance consistency is respected [15].

The impasse in the estimation of the true analysis error seems to be tied to using active observations, i.e., using the same observations as those used to create the analysis. A robust approach that does not require an optimal analysis is to use observations whose errors are uncorrelated with the analysis error. For example, if we assume that observation errors are temporally (serially) uncorrelated, an estimate of the analysis error can be made with the help of a forecast model initialized by the analysis, by verifying the forecast against these observations. This is the essential assumption used traditionally in meteorological data assimilation to assess the analysis error indirectly, by comparing the resulting forecast with observations valid at the forecast time. As the forecast error grows with time, the observation-minus-forecast statistics can be used to assess whether one analysis is better than another. In a somewhat different method, but making the same assumption, Daley [6]

used the temporal (serial) correlation of innovations to diagnose the optimality of the gain matrix. This property was first established in the context of Kalman filter estimation theory by Kailath [16]. However, both the traditional meteorological forecast approach and the Daley method [6] are subject to limitations: they assume that the model forecast has no bias and that the analysis corrections are made correctly on all variables needed to initialize the model. In practice, improper initialization of unobserved meteorological variables gives rise to spin-up problems or imbalances. Furthermore, with the traditional meteorological approach, compensation due to model error can occur, so that an optimal analysis does not necessarily yield an optimal forecast [7].

An alternative approach, introduced by Marseille et al. [17], which we will follow here, is to use independent or passive observations to assess the analysis error. The essential assumption of this method is that the observations have spatially uncorrelated errors, so that the observations used for verification, i.e., the passive observations, have errors uncorrelated with the analysis. The advantage of this approach is that it does not involve any model to propagate the analysis information to a later time. Marseille et al. [17] then showed that by multiplying the Kalman gain by an appropriate scalar value, one can reduce the analysis error. In this paper, we go further by using the principles of error covariance estimation to obtain a near-optimal Kalman gain. In addition, we impose the innovation covariance consistency [15] and show that all diagnostics of the analysis error variance nearly agree with one another. These include those of Hollingsworth and Lönnberg [13], Desroziers et al. [14], and new diagnostics that we will introduce.

The paper is organized as follows. In Section 2 we first present the theory and diagnostics of the analysis error covariance in both passive and active observation spaces, as well as a geometrical representation. This leads us to a method to minimize the true analysis error variance. In Section 3, we present the experimental setup, describe how we obtain near-optimal analyses, present the results of several diagnostics in active and passive observation spaces, and compare them with the analysis error variance obtained from the optimum interpolation scheme itself. In Section 4, we discuss the statistical assumptions being used, how and whether they can be extended, and how this formalism can be used in other applications, such as the estimation of correlated observation errors with satellite observations. Finally, we draw some conclusions in Section 5.

2. Theoretical Framework

This section is composed of three main parts. In Sections 2.1 and 2.2, we first describe the diagnostics that yield the true error covariances using passive observations, whether the analysis is optimal or not. We then give a geometric interpretation in Section 2.3 that indicates the way to obtain an optimal analysis, that is, one that minimizes the true analysis error. Lastly, in Sections 2.4 and 2.5, we formulate diagnostics of the analysis error for optimal analyses.

2.1. Diagnostic of the Analysis Error Covariance in Passive Observation Space

Let us decompose the observation space into two disjoint sets: the active observation set, or training set, {y}, used to create the analysis, and the independent, or passive, observation set {y_c}, used to evaluate the analysis.
An analysis built from the prescribed background and observation error covariances, B̃ and R̃ respectively, is given by

x^a = x^f + B̃H^T (HB̃H^T + R̃)^{-1} d = x^f + K̃ d,   (1)

where K̃ is the gain matrix built from the prescribed error covariances, H is the observation operator for the active observation set, d is the active innovation vector, d = y − Hx^f = ε^o − Hε^f, and x^f is the

background state or model forecast, ε^o is the active observation error, and ε^f the background error. The observation-minus-analysis residual (O − A) for the active set is given by [14,15,18]

(O − A) = y − Hx^a = ε^o − Hε^a = d − HB̃H^T (HB̃H^T + R̃)^{-1} d = R̃(HB̃H^T + R̃)^{-1} d,   (2)

where ε^a is the analysis error. The analysis interpolated at the passive observation sites can be denoted by H_c x^a, where H_c is the observation operator at the passive observation sites. The observation-minus-analysis residual at the passive observation sites, (O − A)_c, is then given by

(O − A)_c = y_c − H_c x^a = ε^o_c − H_c ε^a = d_c − H_c B̃H^T (HB̃H^T + R̃)^{-1} d,   (3)

where d_c = y_c − H_c x^f = ε^o_c − H_c ε^f is the innovation at the passive observation sites. Note that the formalism introduced here is general and can be used with any set of independent observations, such as different instruments or observation networks, as long as the H_c operator is properly defined. Consequently, for generality, we distinguish the passive or independent observation error, ε^o_c, from the active observation error ε^o.

There are two important statistical assumptions from which we derive the cross-validation diagnostics. Assuming that observation errors are spatially uncorrelated, it follows that

E[ε^o (ε^o_c)^T] = 0,   (4)

where E[·] is the mathematical expectation, representing the mean over an ensemble of realizations. It has been argued by Marseille et al. [17] that representativeness error can violate this assumption for a close pair of active-passive observations, but we will neglect this effect. Also, assuming that observation errors are uncorrelated with the background error, we have

E[ε^o (Hε^f)^T] = 0,   E[ε^o_c (H_c ε^f)^T] = 0.   (5)

We come now to the most important property: since the analysis is a linear combination of the active observations and the background state, the analysis error is then uncorrelated with the passive observation errors,

E[(H_c ε^a)(ε^o_c)^T] = 0,   (6)

and thus we get the following cross-validation diagnostic in passive observation space,

E[(O − A)_c (O − A)_c^T] = R_c + H_c A H_c^T,   (7)

similarly to Marseille et al. [17]. A very important point to note is that A is the true analysis error covariance; the diagnostic does not assume that the gain is optimal. The matrices in Equation (7) have the dimension of the passive observation space, and R_c is the observation error covariance matrix of the passive or independent observations.
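For illustration, Equation (7) translates directly into a sample estimator. The following sketch (our own; the array shapes, function name and synthetic residuals are hypothetical) recovers H_c A H_c^T from passive residual statistics without any optimality assumption:

```python
import numpy as np

def passive_diagnostic(omA_c, R_c):
    """Sample version of Eq. (7): E[(O-A)_c (O-A)_c^T] = R_c + H_c A H_c^T.
    omA_c: (n_samples, n_passive) array of passive observation-minus-analysis
    residuals; R_c: (n_passive, n_passive) passive observation error covariance.
    Returns the implied *true* analysis error covariance H_c A H_c^T."""
    resid_cov = np.cov(omA_c, rowvar=False)  # E[(O-A)_c (O-A)_c^T]
    return resid_cov - R_c                   # optimality of the gain not assumed

# Hypothetical usage: 60 daily residuals at 3 passive sites, sigma_oc^2 = 1.
rng = np.random.default_rng(0)
omA_c = rng.normal(size=(60, 3))             # synthetic stand-in data
HcAHc = passive_diagnostic(omA_c, np.eye(3))
print(np.trace(HcAHc) / 3)  # mean analysis error variance at passive sites
```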

2.2. A Complete Set of Diagnostics of Error Covariances in Passive Observation Space

It is also possible to define a set of diagnostics that would determine, in principle, the true error covariances R, B and A. From Equations (4) and (5) we get another cross-validation diagnostic,

E[(O − B)_c (O − B)^T] = H_c B H^T.   (8)

This diagnostic is related to the Hollingsworth-Lönnberg [10] estimation of the spatially correlated part of the innovation, and in practice it can be dominated by sampling error. Note that it is not a square matrix, and an estimation of the parameters of B may not be trivial. We can also obtain the innovation covariance matrix in passive observation space (identical in fact to the one in active observation space) as

E[(O − B)_c (O − B)_c^T] = R_c + H_c B H_c^T.   (9)

The system of Equations (7)-(9) gives a complete set of equations to determine the true R, B and A at the passive observation sites, provided that by interpolation/extrapolation we can obtain H_c B H_c^T from H_c B H^T. Nevertheless, and for the sake of completeness, we also investigated the meaning of the statistic E[(O − A)_c (O − B)^T] and came to the conclusion that it can be interpreted as the misfit between the Desroziers et al. estimate of B [14] and the true value B. We recall that the first iterate of the Desroziers et al. estimate for B in active observation space is given by H B^D H^T = E[(Hx^a − Hx^f)(y − Hx^f)^T] [15], where we use the superscript D to indicate the Desroziers et al. first-iterate estimate. We can actually generalize this diagnostic to be a cross-covariance between state space and active observation space as

B^D H^T = E[(x^a − x^f)(y − Hx^f)^T].   (10)

By applying H_c to Equation (10) we can then introduce a generalized Desroziers et al. estimate of the background error covariance B between the passive and active observation spaces as

E[(A − B)_c (O − B)^T] = H_c B^D H^T.   (11)

Since (O − A)_c = (O − B)_c − (A − B)_c, we get with Equations (8) and (11),

E[(O − A)_c (O − B)^T] = H_c (B − B^D) H^T,   (12)

that is, the difference between the true B and the Desroziers et al. first estimate B^D in the cross passive-active observation spaces. Similarly to Equation (8), this diagnostic requires spatial interpolation of error covariances from the active sites to the passive sites of basically noisy statistics. The estimation of B from this diagnostic is further complicated by the fact that what is being interpolated, that is B − B^D, may not even be positive definite, although the augmented matrix

cov( [ (O − B) ; (O − A)_c ] ) = [ R + HBH^T ,  H(B − B^D)H_c^T ;  H_c(B − B^D)H^T ,  R_c + H_c A H_c^T ]   (13)

is positive definite. Except for the diagonal of Equation (13), we have not attempted in this study to conduct this complete estimation of R, B and A, but rather focused on getting a reliable estimate of the analysis error covariance A.

2.3. Geometrical Interpretation

A geometrical illustration of some of the relationships obtained above can be made by using a Hilbert space representation of random variables in observation space. A 2D representation for the analysis of a scalar quantity was used in Desroziers et al. [14] to illustrate their a posteriori diagnostics. We will generalize this approach to include passive observations by considering a 3D representation. As in Desroziers et al. [14], let us consider the analysis of a scalar quantity. Several variables are to be considered in this observation space: y^o, the active observation (or measurement) of the scalar quantity; y^b, the background (or prior) value equivalent in observation space (i.e., y^b = Hx^b); y^a, the analysis in observation space (i.e., y^a = Hx^a); and, for verification, y^c, an independent observation (or passive

observation) that is not used to compute the analysis. Each of these quantities is a random variable, as they each contain random errors, and any linear combination of random variables in observation space also belongs to observation space. For example, y^o − y^b is the innovation (commonly denoted by O-B), which belongs to observation space; y^a − y^b is the analysis increment in observation space (commonly denoted by A-B); and y^o − y^a is the analysis residual in observation space (commonly denoted by O-A). We can also define an inner product of any two random variables y_1, y_2 in observation space as

⟨y_1, y_2⟩ := E[(y_1 − E(y_1))(y_2 − E(y_2))].   (14)

The squared norm then represents the variance, so the inner product has the following geometric interpretation:

‖y‖² := ⟨y, y⟩ = σ_y²,   (15)

⟨y_1, y_2⟩ = ‖y_1‖ ‖y_2‖ cos θ,   (16)

where cos θ is the correlation coefficient. Uncorrelated random variables are thus statistically orthogonal. With this inner product, the observation space forms a Hilbert space of random variables.

Figure 1 illustrates the statistical relationship in observation space between: the active observation y^o (illustrated as O in the figure), the prior or background y^b (i.e., B), the analysis y^a (i.e., A), and the independent observation y^c (i.e., O_c). The origin T corresponds to the truth of the scalar quantity, and also corresponds to the zero of the central moment of each random variable, e.g., y − E[y], since each variable is assumed to be unbiased. We also assume that the background, active and passive observation errors are uncorrelated with one another, so the three axes, ε^o for the active observation error, ε^b for the background error, and ε^o_c for the passive observation error, are orthogonal. The plane defined by the ε^o and ε^b axes is the space where the analysis takes place, and is called the analysis plane. However, since we define the analysis to be linear and unbiased, only linear combinations of the form y^a = k y^o + (1 − k) y^b, where k is a constant, are allowed. The analysis A then lies on the line (B, O).

The thick lines in Figure 1 represent the norm of the associated error. For example, the thick line along the ε^o axis depicts the (active) observation standard deviation σ_o, and similarly for the other axes and the other random variables. Since the active observation error is uncorrelated with the background error, the triangle OTB is a right triangle, and by the Pythagoras theorem we have ‖y^o − y^b‖² := ⟨(O − B), (O − B)⟩ = σ_o² + σ_b². This is the usual statement that the innovation variance is the sum of the background and observation error variances. The analysis is optimum when the analysis error ‖ε^a‖² = σ_a² is minimum, in which case the line (T, A) is perpendicular to the line (O, B).

Now let us consider the passive observation O_c. The passive observation error is perpendicular to the analysis plane, thus the triangle O_c T A is a right triangle, and

‖y^c − y^a‖² := ⟨(O − A)_c, (O − A)_c⟩ = σ_c² + σ_a²,   (17)

where σ_c² is the passive observation error variance. The most important fact to stress here is that the orthogonality expressed in Equation (17) holds whether or not the analysis is optimal. Furthermore, as the distance ‖y^c − y^a‖² varies with the position of A along the line (O, B), it reaches a minimum value when σ_a² is minimum, that is, when the analysis is optimal. We thus also argue from this representation that there is always a minimum, and the minimum is unique. Finally, we note that T B O_c is also a right triangle, so that ‖y^c − y^b‖² := ⟨(O − B)_c, (O − B)_c⟩ = σ_c² + σ_b², which is the scalar version of Equation (9).
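The scalar geometry can be checked numerically. In this Monte Carlo sketch (illustrative values, ours, not from the paper), the identity of Equation (17) holds for every weight k, and var(O − A)_c is smallest at the optimal weight:

```python
import numpy as np

# Check of Eq. (17) for a scalar analysis y_a = k*y_o + (1-k)*y_b (truth = 0):
# var(y_c - y_a) = sigma_c^2 + sigma_a^2 for *any* weight k, and is minimized
# at the optimal weight k = sigma_b^2 / (sigma_b^2 + sigma_o^2).
rng = np.random.default_rng(1)
n = 200_000
sig_o2, sig_b2, sig_c2 = 1.0, 2.0, 1.5        # invented error variances
eps_o = rng.normal(scale=np.sqrt(sig_o2), size=n)   # active obs error
eps_b = rng.normal(scale=np.sqrt(sig_b2), size=n)   # background error
eps_c = rng.normal(scale=np.sqrt(sig_c2), size=n)   # passive obs error

for k in (0.3, sig_b2 / (sig_b2 + sig_o2), 0.9):
    eps_a = k * eps_o + (1 - k) * eps_b             # analysis error
    lhs = np.var(eps_c - eps_a)                     # var(O - A)_c
    rhs = sig_c2 + np.var(eps_a)                    # sigma_c^2 + sigma_a^2
    print(f"k={k:.3f}  var(O-A)_c={lhs:.3f}  sig_c^2+sig_a^2={rhs:.3f}")
# The middle k (the optimal weight) yields the smallest var(O-A)_c.
```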

Figure 1. Hilbert space representation of a scalar analysis and of the cross-validation problem. The arrows indicate the directions of variability of the random variables, and the plane defined by the background and observation errors ε^b, ε^o defines the analysis plane. The thick lines represent the norm associated with the different random variables. T indicates the truth, O the active observation, B the background, A the analysis and O_c the passive observation.

To extend this formalism to random vectors requires defining a proper matricial inner product, as briefly described in Caines [19]. It is important here to distinguish the stochastic metric space from the observation vector space. In the stochastic metric space a scalar need not be a number, but only a non-random quantity invariant with respect to E[·]. Hence, we can define a Hilbert space with the following (stochastic scalar) matricial inner product: ⟨y, w⟩ = E[(y − E(y))(w − E(w))^T] = cov(y, w). The matrix nature of cov(y, w) pertains to the observation vector space, but it remains a scalar with respect to the stochastic Hilbert space herein defined. In order to obtain a true scalar (in ℝ), one would need to define a metric on the observation space matrices as well, such as the trace. Also, to be able to compare active and passive observations, projectors need to be introduced on the observation space, implying yet another metric structure on the observation space. We do not carry out this formalism here, as it would represent a rather lengthy development that would distract us from the main purpose of this paper; this will be considered in a future manuscript. Finally, we remark that a Hilbert space representation of random variables in infinite dimensional space (i.e., continuous space) can also be defined; see Appendix 1 of Cohn [20].

2.4. Error Covariance Diagnostics in Active Observation Space for Optimal Analysis
An analysis is optimal if the analysis error E[(ε^a)^T ε^a] is minimum. This implies that the gain matrix using the prescribed error covariances, K̃ in Equation (1), must be identical to the gain using the true error covariances, i.e., K̃ = BH^T(HBH^T + R)^{-1} [14,15]. It is important to mention that the necessary and sufficient conditions to obtain the true error covariances HBH^T and R in observation space are: (1) the Kalman gain condition, HK̃ = HK, and (2) the innovation covariance consistency, E[(O − B)(O − B)^T] = HBH^T + R. For a proof, see the theorem on error covariance estimates in Ménard [15].

From the optimality of the analysis (or Kalman gain) alone, we derive that E[(Â − T)(O − B)^T] = 0 or E[(Â − T)(O − Â)^T] = 0. Indeed, from Equation (2) we get (O − Â) = R(HBH^T + R)^{-1} d, and for the analysis error in observation space we get

(Â − T) = R(HBH^T + R)^{-1} Hε^f + HBH^T (HBH^T + R)^{-1} ε^o,

from which we derive the expectations above. Using the geometrical representation of Section 2.3, the distance between A and T is minimum when the triangles TÂO and TÂB (Figure 1) are right triangles. We should also note that for the scalar problem the Kalman gain depends

only on the ratio of the observation to background error variances, and thus the scalar Kalman gain is optimal if the ratio of the prescribed error variances is equal to the ratio of the true error variances.

If, in addition to the optimality of the analysis or Kalman gain, we add the innovation covariance consistency, then we get three different statistical diagnostics of the (optimal) analysis error covariance. Hollingsworth and Lönnberg [13] were the first to introduce a statistical diagnostic of the analysis error in active observation space:

E[(O − Â)(O − Â)^T] = R − H Â_HL H^T.   (18)

Here we use the subscript HL to indicate that this is the Hollingsworth-Lönnberg estimate. Equation (18) can be obtained from the covariance of (O − Â) and the fact that, for an optimal gain matrix, HÂH^T = R − R(HBH^T + R)^{-1}R, which derives from the usual formula Â = B − BH^T(HBH^T + R)^{-1}HB. Geometrically, it derives from the fact that the triangle TÂO is a right triangle, and from the innovation covariance consistency, which implies that the triangle TOB is a right triangle. Using data to construct E[(O − Â)(O − Â)^T], an estimated analysis error covariance obtained from Equation (18) is symmetric but could be non-positive definite, as it is obtained by subtracting two positive definite matrices. The effect of misspecification of the prescribed error covariances resulting from a lack of innovation covariance consistency will be discussed in the results Section 3 and in Appendix B.

Inspired by the geometrical interpretation that TÂB is also a right triangle, we derived the following diagnostic,

E[(Â − B)(Â − B)^T] = HBH^T − H Â_MDJ H^T = H(B − Â_MDJ)H^T,   (19)

where MDJ stands for Ménard-Deshaies-Jacques. This relationship is obtained by using the expression (Â − B) = HBH^T(HBH^T + R)^{-1} d, the innovation covariance consistency, and the formula for the optimal analysis error covariance Â = B − BH^T(HBH^T + R)^{-1}HB. As for the HL diagnostic, the estimated error covariance obtained from this diagnostic is symmetric by construction but may not be positive definite. Another way of looking at Equation (19) is that it expresses, in observation space, the error reduction due to the use of observations.

Another diagnostic of the analysis error covariance was proposed by Desroziers et al. [14]. By combining (O − Â) and (Â − B) we get

E[(O − Â)(Â − B)^T] = H Â_D H^T,   (20)

where the subscript D denotes the Desroziers et al. estimate. By construction, this estimated analysis error covariance is not necessarily symmetric. A geometrical derivation is provided in Appendix A. We also provide in Appendix B a sensitivity analysis on the departure from innovation covariance consistency for each diagnostic in both active and passive observation spaces.

2.5. Error Covariance Diagnostics in Passive Observation Space for Optimal Analysis

We can also derive optimal analysis diagnostics in passive observation space. Considering the 3D geometric interpretation, and in particular the tetrahedron (O_c, T, A, B), we notice that since TÂB is a right triangle, so is O_c ÂB, which is the projection of the triangle TÂB on the plane passing through O_c, O and B. We thus have E[(Â − B)_c (Â − B)_c^T] + E[(O − Â)_c (O − Â)_c^T] = E[(O − B)_c (O − B)_c^T]. Combining this result with Equation (7) and using E[(O − B)_c (O − B)_c^T] = H_c B H_c^T + R_c, we then get

E[(Â − B)_c (Â − B)_c^T] = H_c B H_c^T − H_c Â_MDJ H_c^T.   (21)

Note that our analysis diagnostic is the only diagnostic that is valid in both active and passive observation spaces, i.e., Equation (21) is similar to Equation (19).
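In sample form, the active-space diagnostics of Equations (18)-(20) amount to simple covariances of the residual time series. A sketch follows (our array names and shapes; the residuals are assumed unbiased, which the paper arranges by construction):

```python
import numpy as np

def analysis_error_estimates(omB, omA, amB, R):
    """Sample versions of Eqs. (18)-(20) in active observation space.
    omB, omA, amB: (n_samples, n_obs) time series of O-B, O-A and A-B
    residuals (assumed unbiased); R: prescribed observation error covariance.
    Each return value is an estimate of H A^ H^T."""
    n = omB.shape[0]
    HL = R - (omA.T @ omA) / n         # Eq. (18), Hollingsworth-Lonnberg
    HBHt = (omB.T @ omB) / n - R       # from innovation covariance consistency
    MDJ = HBHt - (amB.T @ amB) / n     # Eq. (19), Menard-Deshaies-Jacques
    D = (omA.T @ amB) / n              # Eq. (20), Desroziers et al.; may be asymmetric
    return HL, MDJ, D
```

When the analysis is near optimal and the innovation covariance consistency holds, the three returned matrices should nearly coincide, which is the agreement verified in Section 3.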

The other, less direct, diagnostic for an optimal analysis is simply based on Equation (7), that is,

argmin_{γ, L_c} {E[(O − A)_c (O − A)_c^T]} = R_c + H_c Â(γ, L_c) H_c^T.   (22)

These five diagnostics will be used later in the results section.

3. Results with Near Optimal Analyses

3.1. Experimental Setup

We give here only a short summary of the experimental setup used in this study; more details can be found in the Part I paper (Ménard and Deshaies-Jacques [5]). A series of hourly analyses of O3 and PM2.5 at 21 UTC for a period of 60 days (14 June to 12 August 2014) were performed using an optimum interpolation scheme combining the operational air quality model GEM-MACH forecast and real-time AirNow observations (see Section 2 of [5] for further details). The analyses are made off-line, so they are not used to initialize the model. As input error covariances, we use uniform observation and background error variances, with R̃ = σ_o² I and B̃ = σ_b² C, where C is a homogeneous isotropic error correlation based on a second-order autoregressive model. The correlation length is estimated by a maximum likelihood method using, at first, error variances obtained from a local Hollingsworth-Lönnberg fit [11] (and only for the purpose of obtaining a first estimate of the correlation length).

We then conduct a series of analyses by changing the error variance ratio γ = σ_o²/σ_b², while at the same time respecting the innovation variance consistency condition, σ_o² + σ_b² = var(O − B). This basically amounts to searching for the minimum of the trace (tr) of Equation (7) while the trace of the innovation covariance consistency, tr[R̃ + HB̃H^T] = tr{E[(O − B)(O − B)^T]}, is respected. First, the observations are separated into 3 sets of equal number, distributed randomly in space. By leaving out one set of observations for verification and constructing analyses with the remaining 2 sets, we construct a cross-validation setup from which we can evaluate the diagnostic var(O − A)_c = tr{E[(O − A)_c (O − A)_c^T]} in passive observation space. This constitutes our first-guess experiment, which we will refer to as iter 0. No observation or model bias correction was applied, nor were the means of the innovations at the stations removed prior to performing the analysis. The variance statistics are first computed at each station using the 60 daily members, and then averaged over the domain to give the equivalent of tr{E[(O − A)_c (O − A)_c^T]}. We repeat the procedure by exhausting all possible permutations, in this case 3, and the mean value statistics for the three verifying subsets are then averaged. More details can be found in Section 2 and the beginning of Section 3 of Part I [5].

The red curve in Figure 2 illustrates how this diagnostic varies with γ and exhibits a minimum. This minimum can easily be understood by referring to Figure 1: as the analysis point A changes position along the line (O, B), the distance O_c A reaches a minimum, and this is what we observe in Figure 2. In our next step, iter 1, we first re-estimate the correlation length by applying a maximum likelihood method, as in Ménard [15], using the iter 0 error variances that are consistent with the optimal ratio γ̂ obtained in iter 0 and the innovation variance consistency. Then, with this new correlation length, we estimate a new optimal ratio γ̂ (iter 1), which turns out to be very close to the value obtained in iter 0. We recall that we use uniform error variances, both to keep things simple and because the optimal ratio is obtained by minimizing a domain-averaged variance var(O − A)_c. A summary of the error covariance parameters obtained for iter 0 and iter 1 is presented in Table 1.
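A pseudo-implementation of the iter 0 scan may clarify the procedure; in this sketch, oi_analysis stands in for the optimum interpolation scheme of the paper and is assumed, not shown, and all names are ours:

```python
import numpy as np

def scan_gamma(y, H, yc, Hc, xf, C, var_omb, gammas, oi_analysis):
    """Scan gamma = sigma_o^2/sigma_b^2 subject to the innovation variance
    consistency sigma_o^2 + sigma_b^2 = var(O-B), and return the gamma that
    minimizes var(O-A) over the passive (held-out) sites, per Eq. (22)."""
    scores = []
    for g in gammas:
        sig_b2 = var_omb / (1.0 + g)      # so that sig_o2 + sig_b2 = var(O-B)
        sig_o2 = var_omb - sig_b2         # and sig_o2 = g * sig_b2
        xa = oi_analysis(xf, y, H, B=sig_b2 * C, R=sig_o2 * np.eye(len(y)))
        scores.append(np.mean((yc - Hc @ xa) ** 2))  # var(O-A)_c, trace of Eq. (7)
    return gammas[int(np.argmin(scores))], scores
```

In the actual experiments this scan is repeated over all three passive/active permutations and the scores averaged before locating the minimum.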

Figure 2. Red line: variance of observation-minus-analysis of O3 in passive observation space. Blue line: variance of observation-minus-model.

Table 1. Input error statistics for the first experiment and for the optimized variance ratio experiment.

Experiment      L_c (km)   var(O − B)   γ̂ = σ̂_o²/σ̂_b²   σ̂_o²   σ̂_b²   χ²/N_s
O3 iter 0
O3 iter 1
PM2.5 iter 0
PM2.5 iter 1

We have also added to Table 1 the χ²/N_s diagnostic values [15], where χ²/N_s is the 60-day mean value of χ²_k/N_s(k), N_s(k) being the total number of observations available at time k, and

χ²_k = d_k^T (HB̃H^T + R̃)^{-1} d_k.

The χ²/N_s diagnostic [15] is closely related to the J_min diagnostic used in variational methods, and should have a value of 1 when the innovations are consistent with the prescribed error statistics. When there is innovation covariance consistency, then χ²/N_s = 1, but the reverse is not true. We observe that there was a significant improvement from iter 0 to iter 1 in terms of χ²/N_s, but it is still not equal to one. We thus refer to the analysis of iter 1 as near optimal.

The repeated application of this sequence of estimation methods, i.e., find the correlation length by maximum likelihood and then optimize the variance ratio, converges very fast, and in practice there is no need to go beyond iter 1. Figure 3 displays iterates 0 to 4 of our estimation procedure for O3; with one iteration update we nearly converge. A similar procedure was used in Ménard [15], where the variance and the correlation length (estimated by maximum likelihood) were estimated in sequence, which taught us that a slow and fictitious drift in the estimated variances and correlation length can occur when the correlation model is not the true correlation. In view of similar considerations that may occur here, we do not extend our iteration procedure beyond the first iterate.
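For reference, the χ²/N_s diagnostic quoted in Table 1 can be computed as in this sketch (our notation; a fixed observation network is assumed for simplicity, whereas in the experiments N_s varies with time):

```python
import numpy as np

def chi2_per_obs(innovations, HBHt, R):
    """Mean of chi2_k / N_s(k) with chi2_k = d_k^T (H B~ H^T + R~)^{-1} d_k.
    innovations: (n_times, n_obs) array of innovation vectors d_k;
    HBHt, R: prescribed covariances in active observation space.
    The result should be ~1 when innovations match the prescribed statistics."""
    S_inv = np.linalg.inv(HBHt + R)
    vals = [d @ S_inv @ d / d.size for d in innovations]
    return float(np.mean(vals))
```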
3.2. Statistical Diagnostics of the Analysis Error Variance

For each of these experiments, the statistics related to the diagnostics of the analysis error variance discussed in Section 2 are computed, and the results are presented in Table 2 for the verification made against active observations, and in Table 3 for the verification made against passive observations. If the analysis were truly optimal, the different diagnostics would all agree.

Figure 3. Optimal estimates of σ_o², σ_b², and maximum likelihood estimate of the correlation length L_c for the first four iterates. Blue: the optimal background error variance; green: the optimal observation error variance; red: the correlation length (in km, with labels on the right side of the figure).

Table 2. Analysis statistics against the active observations.

Experiment      var(Â − B)   tr(HÂ_MDJ H^T)/N_s   tr(HÂ_D H^T)/N_s   var(O − Â)   tr(HÂ_HL H^T)/N_s
O3 iter 0
O3 iter 1
PM2.5 iter 0
PM2.5 iter 1

In the second, third and last columns of Table 2 are tabulated the estimates of the analysis error variance at the active location sites, i.e., tr(HÂH^T)/N_s, obtained by three different methods. The second column is the estimate given by our method, σ_b² − var(Â − B) = tr(HÂ_MDJ H^T)/N_s. The third column is the Desroziers et al. estimate of the analysis error [14], Equation (20), and the last column is the estimate using the method proposed by Hollingsworth and Lönnberg [13], Equation (18). We note that the analysis error variance estimates provided by the first two methods are fairly consistent for an updated correlation length estimate, i.e., iter 1 (but not iter 0). We also note that χ²/N_s is closer to one for iter 1. These two facts indicate that the updated correlation length (iter 1) with uniform error variances is closer to innovation covariance consistency. The Hollingsworth and Lönnberg [13] method, however, is very sensitive and negatively biased in the absence of innovation covariance consistency.
The estimates of the analysis error variance at the passive observation locations, i.e., tr(H_c A H_c^T)/N_s, provided by two different methods, are given by Equation (21) in column 3 and by Equation (22) in column 5 of Table 3. As for the estimates at the active locations (Table 2), there is a general agreement on the analysis error estimates with the updated correlation length (iter 1), although this distinction is not that clear for PM2.5.

Table 3. Analysis statistics against the passive observations.

Experiment      var[(Â − B)_c]   tr(H_c Â_MDJ H_c^T)/N_s   var[(O − Â)_c]   var[(O − Â)_c] − σ_oc²
O3 iter 0
O3 iter 1
PM2.5 iter 0
PM2.5 iter 1

We note also that the analysis error variance at the active sites is smaller than the analysis error variance at the passive observation sites. This reflects in particular the fact that, since the passive

observations are away from the active observation sites, the reduction of variance at the passive observation sites is smaller than at the active observation sites.

3.3. Comparison with the Perceived Analysis Error Variance

We computed the analysis error covariance à resulting from the analysis scheme, the so-called perceived analysis error covariance [6], using the expression

à = B̃ − B̃H^T (HB̃H^T + R̃)^{-1} HB̃ = B̃ − GG^T.   (23)

Contrary to the statistical diagnostics above, the perceived analysis error variance is obtained at each model grid point. We then compared the perceived analysis error variance at the active observation sites with the estimated active analysis error variance obtained from the previous diagnostics. Since we showed that the different diagnostics agree in both active and passive observation spaces when the analysis is optimal, this would indicate that if the perceived analysis error variance agrees with the analysis error variance of the diagnostics, then the whole map of analysis error variance is credible.

In order to calculate the perceived analysis error covariance, Equation (23), we first perform a Cholesky decomposition of HB̃H^T + R̃ = LL^T, where L is a lower triangular matrix. Then, with a forward substitution, we obtain L^{-1}, from which we compute G = B̃H^T L^{-T}. The perceived analysis error variance for the ozone optimal analysis (i.e., O3 iter 1) is displayed in Figure 4 (a similar figure for PM2.5 is given in the supplementary material). We note that although the input statistics used for the analysis are uniform (i.e., uniform background and observation error variances, and a homogeneous correlation model), the computed analysis error variance at the active observation locations displays large variations, which is attributed to the non-uniform spatial distribution of the active observations.

In Figure 5 we display a histogram of those variances for the ozone optimal analysis O3 iter 1 (panel b) and for the first experiment O3 iter 0 (panel a) without optimization (a similar figure for PM2.5 is given in the supplementary material). Note that the median or mean values of the variances are significantly different between the optimal and non-optimal analysis cases. Although the observation and background errors are uniform in both the optimal and non-optimal analyses, in the optimal analysis case the perceived analysis uncertainty at the observation locations is distributed more equally across all values, indicating that there is a better propagation of information, as measured by the analysis error variance, across the different observation sites. The exception is the isolated observation sites, for which we observe a maximum on the high end of the histograms. At those sites the analysis error variance is simply obtained by the scalar equation 1/σ_a² = 1/σ_o² + 1/σ_b². For O3 iter 1 the scalar analysis error variance gives 16.2, and for O3 iter 0 we get 15.0, thus explaining the secondary maxima on the high end of the histograms.

Figure 4. Analysis error variance for the ozone optimal analysis case O3 iter 1. (a) Analysis error variance on the model grid; (b) at the active observation sites. Note that the color bars of the left and right panels are different; the maximum of the color bar for the left panel corresponds to σ_o² + σ_b².
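The Cholesky computation of Equation (23) described above can be sketched as follows (SciPy notation; the function and variable names are ours):

```python
import numpy as np
from scipy.linalg import cholesky, solve_triangular

def perceived_variance(B, H, R):
    """Perceived analysis error variance, Eq. (23): diag(A~) = diag(B~ - G G^T),
    with H B~ H^T + R~ = L L^T and G = B~ H^T L^{-T}.
    B: (n_grid, n_grid) prescribed background covariance; H: (n_obs, n_grid)
    observation operator; R: (n_obs, n_obs) prescribed observation covariance."""
    L = cholesky(H @ B @ H.T + R, lower=True)
    # Forward substitution: solve L X = H B~, so that G = X^T = B~ H^T L^{-T}.
    G = solve_triangular(L, H @ B, lower=True).T
    return np.diag(B) - np.sum(G * G, axis=1)  # grid-point perceived variance
```

Only the diagonal of GG^T is formed, which keeps the cost linear in the number of grid points rather than quadratic.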

Figure 5. Distribution (histogram) of the ozone analysis error variance at the active observation locations. (a) First analysis experiment O3 iter 0 (no optimization), left panel; (b) optimal analysis case O3 iter 1. Note that the scales are different between the left and right panels.

The mean perceived analysis error variance for all experiments is presented in Table 4. Comparing these values with the estimated values of the analysis error variance based on the diagnostics in Table 2, we note that for both optimal experiments, O3 iter 1 and PM2.5 iter 1, the perceived analysis error variance roughly agrees with all the analysis error variances estimated with the diagnostics (Table 2).

Table 4. Perceived analysis error variance. Mean over the active observation sites.

Experiment      Perceived tr(HÃ_P H^T)/N_s
O3 iter 0
O3 iter 1
PM2.5 iter 0
PM2.5 iter 1

However, for the non-optimal analyses, O3 iter 0 and PM2.5 iter 0, there is a general disagreement between all the estimated values. Looking more closely, however, we note that the agreement in the optimal case is not perfect. The perceived analysis error variance is about 20% lower than the best estimates, tr(HÂ_MDJ H^T)/N_s and tr(HÂ_D H^T)/N_s. The χ²/N_s values in the optimal cases are slightly above one, thus indicating that the innovation covariance consistency is not exact and some further tuning of the error statistics could be done. More on that matter will be presented in Section 4.5.

4. Discussion on Statistical Assumptions and Practical Applications

4.1. Representativeness Error with In Situ Observations
The statistical diagnostics presented in Section 2 derive from the assumption that observation errors are horizontally uncorrelated and uncorrelated with the background error. Although this assumption is never entirely satisfied in reality, there are ways to work around it. In the case of in situ observations, and assuming that any systematic error has been removed, random errors are still present due to the difference between the observation and the model's equivalent of the observation, called the representativeness error (see Janjic et al. [12] for a review). Representativeness error is due to unresolved scales and processes in the model and to interpolation or forward observation model errors. These errors are typically roughly at the scale of the model grid [21,22], so typically a few tens of kilometers for air quality models. This should not be confused with the representativeness of an observation, where, for example, remote stations are representative of a large area (e.g., several

hundreds of kilometers), whereas urban and suburban stations are at the scale of human activity in cities, traffic, industries, etc., and are, depending on the chemical species, representative of a few kilometers or less. The representativeness error of in situ measurements can be discarded altogether by simply filtering out any pair of observations that are within a range of a few model grid sizes, both in the assimilation and the estimation of error statistics [1] and in the pairs of passive-active observations for cross-validation [17]. Once this filtering is done, the assumption that observation errors are spatially uncorrelated and uncorrelated with the background error then applies.

4.2. Correlated Observation-Background Errors

In any case, it is interesting to show how the different diagnostics introduced in Section 2 depend on the statistical assumptions about the observation error. One way to get an understanding of the effect of these assumptions is to look at the problem from a geometrical point of view, using the representation introduced in Section 2.3. Note that the same results can be obtained analytically, but the geometrical interpretation gives a simple and appealing way of looking at the problem.

Let us consider the effect on the analysis of an observation error correlated with the background error. The case where the observation error is uncorrelated with the background error is represented in panel (a) of Figure 6, and the correlated case in panel (b).

Figure 6. Geometrical representation of the analysis. (a) For observation errors uncorrelated with the background error; (b) with correlated errors. T indicates the truth, O the observation, B the background and Â the optimal analysis.

The observation and background error variances are kept unchanged, with the same (O, T) length and (B, T) length in both panels. In the case of correlated errors, the angle BTO is no longer a right angle. Yet, it is still possible to obtain an optimal analysis, Â, as a linear combination of the observation and the background on the line (O, B), for which the distance from Â to T (i.e., the analysis error variance) is minimum. In this case, (Â, T) ⊥ (O, B).
Note that for strongly correlated errors and when σ_b² > σ_o², although Â is still on the line (O, B), it may actually lie outside the segment [O, B]. Yet, the principles and the theory still hold in that case.

When the observation error is uncorrelated with the background error, (O, T) ⊥ (B, T), the triangles OTÂ and BTÂ are similar, and it follows that ‖O − Â‖ · ‖Â − B‖ = ‖Â − T‖², which is the Desroziers et al. [14] diagnostic for the analysis error variance. However, when the observation error is correlated with the background error (right panel of Figure 6), the triangles OTÂ and BTÂ are no longer similar triangles, and the Desroziers et al. [14] diagnostic for the analysis error does not hold (see the derivation in Appendix A). The HL diagnostic, Equation (18), and the MDJ diagnostic, Equation (19), depend only on having the right triangles OTÂ and BTÂ, and not on the orthogonality of (B, T) with (O, T). Therefore, the HL and MDJ diagnostics are valid with or without correlated observation-background errors.
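This behavior is easy to reproduce numerically. In the following Monte Carlo sketch (illustrative numbers, ours, not from the paper), the HL and MDJ scalar diagnostics recover the true analysis error variance while the Desroziers et al. diagnostic is offset by the error cross-covariance c:

```python
import numpy as np

# Scalar optimal analysis with cov(eps_o, eps_b) = c; truth = 0.
rng = np.random.default_rng(2)
n, sig_o2, sig_b2, c = 500_000, 1.0, 2.0, 0.4
cov = np.array([[sig_o2, c], [c, sig_b2]])
eps_o, eps_b = rng.multivariate_normal([0.0, 0.0], cov, size=n).T

k = (sig_b2 - c) / (sig_o2 + sig_b2 - 2 * c)   # optimal scalar weight
eps_a = k * eps_o + (1 - k) * eps_b            # analysis error

true_A = np.var(eps_a)
HL = sig_o2 - np.var(eps_o - eps_a)            # Eq. (18), scalar form
MDJ = sig_b2 - np.var(eps_a - eps_b)           # Eq. (19), scalar form
D = np.mean((eps_o - eps_a) * (eps_a - eps_b)) # Eq. (20), scalar form
print(true_A, HL, MDJ, D)  # HL ~ MDJ ~ true_A, while D ~ true_A - c
```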

4.3. Estimation of Satellite Observation Errors with In Situ Observation Cross-Validation

One of the important problems in satellite assimilation is the estimation of the satellite observation error, which could be addressed with a simple modification of our cross-validation procedure. Let us assume that we have in situ observations whose errors are uncorrelated among themselves (or use a filter with a minimum distance, as discussed in Section 4.1), with the background errors, and with the satellite observation errors. Yet, the satellite observation errors could be correlated with the background error. The satellite observations could come from a multi-channel instrument with channel-correlated observation errors, as found with many instruments, and yet our validation procedure can still be used.

Let us consider analyses that comprise satellite and in situ observations but, for the purpose of cross-validation, use only two-thirds of the in situ observations in the analysis, keeping the remaining one-third passive to carry out the cross-validation procedure. The first thing to note is that the passive in situ observations have errors uncorrelated with the analysis error (the analysis being composed of the satellite observations and two-thirds of the in situ observations). We then use Equation (7), where the interpolation of the analysis is made only at the in situ active observations. Minimizing E[(O − A)_c^T (O − A)_c] (i.e., the trace of the left-hand side of Equation (7)) results in finding the optimal in situ observation weight. Then, computing the analysis error covariance in satellite observation space from the analysis scheme (either from a Hessian of a variational cost function, or with an explicit gain as in Equation (23)), i.e., H_sat Â H_sat^T, we use the HL formulation, Equation (18), to obtain the satellite observation error covariance,

H_sat Â H_sat^T + E[(O − A)_sat (O − A)_sat^T] = R_sat    (24)

Equation (24) has the important property that the estimated observation error covariance is symmetric and positive definite by construction. Then, a new analysis could be carried out to obtain a more realistic H_sat Â H_sat^T, with a resulting updated R_sat, and so forth until convergence; a sketch of this iteration is given below.
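A minimal sketch of this fixed-point iteration, under simplifying assumptions (a synthetic one-dimensional grid, satellite observations only, a known B, and the expected residual covariance used in place of sample statistics; all variable names are ours, not from the paper), could look as follows:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 40, 15                          # hypothetical state and satellite obs-space sizes
x = np.linspace(0.0, 1.0, n)
B = 0.5 * np.exp(-((x[:, None] - x[None, :]) / 0.1) ** 2)   # background error covariance
H = np.zeros((m, n))                                        # observation operator
H[np.arange(m), rng.choice(n, m, replace=False)] = 1.0

R_true = 0.3 * np.eye(m)               # the (unknown) target satellite R
R_est = 1.5 * np.eye(m)                # deliberately wrong first guess
G = H @ B @ H.T                        # HBH^T, held fixed for simplicity

for k in range(20):
    S = G + R_est                                    # prescribed innovation covariance
    HK = G @ np.linalg.inv(S)                        # HK, with gain K = BH^T S^{-1}
    HAH = G - HK @ G                                 # perceived H A H^T in obs space
    # Expected covariance of the O - A residuals; in practice this term would be
    # estimated from actual observation-minus-analysis samples.
    res_cov = (np.eye(m) - HK) @ (G + R_true) @ (np.eye(m) - HK).T
    R_est = HAH + res_cov                            # the Equation (24) update
    print(k, float(np.abs(np.diag(R_est - R_true)).max()))
```

The target is a fixed point of this update: when the gain is computed with the correct R_sat, H Â H^T + (I − HK)(HBH^T + R_sat)(I − HK)^T = R_sat exactly, and the printed diagonal error shrinks toward zero.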
4.4. Remark on Cross-Validation of Satellite Retrievals

As a last remark, it appears that the cross-validation of satellite retrieval observations using a k-fold approach, where observations are used as passive observations to validate the analysis, can be a difficult problem. Retrievals from passive remote sensing at nadir generally involve a prior, a climatology, or a model assumption over different regions, and are thus likely to have spatially correlated errors and errors correlated with the background error. It does not mean, however, that nothing can be done in that case. For example, for certain sensors, such as infrared sensors, it is possible to disentangle the prior from the retrieval, so that by an appropriate transformation of the measurements, the observations can be practically decorrelated from the background [23,24]. However, to the authors' knowledge, such an approach has never been undertaken for visible measurements such as those for NO2 or AODs.

4.5. Lack of Innovation Covariance Consistency and Its Relevance to Statistical Diagnostics

The error covariance diagnostics for an optimal analysis, presented in Sections 2.4 and 2.5, depend on the innovation covariance consistency, E[(O − B)(O − B)^T] = HBH^T + R, and our results presented in Section 3 have shown that the different estimates of the optimal analysis error variance are close, but do not strictly agree with each other. This disagreement is related to the lack of innovation consistency as follows. Let us introduce a departure matrix Δ from the innovation covariance consistency as,

E[dd^T](HBH^T + R)^{-1} = I + Δ    (25)

The trace of Equation (25), which is related to χ², is given by

E[χ²] = tr{E[dd^T](HBH^T + R)^{-1}} = N_s + tr(Δ)    (26)

We recall that in the experiment iter 1 we obtained χ²/N_s values of 1.36 for O3 and 1.25 for PM2.5 (see Table 1), indicating that the innovation covariance consistency is deficient, although less seriously than in the experiment iter 0, where values of 2 and higher were obtained. If we take into account the fact that there can be a difference between E[dd^T] and (HBH^T + R), and we rederive the (active) analysis error covariance for the HL, MDJ and D schemes, we get (see Appendix B)

tr{HÂ_HL H^T} = tr{HÂ_true H^T} + tr{ΔR} − tr{Δ HBH^T} + tr{error_MDJ}    (27)

tr{HÂ_MDJ H^T} = tr{HÂ_true H^T} − tr{error_MDJ}    (28)

tr{HÂ_D H^T} = tr{HÂ_true H^T} − tr{error_D}    (29)

where tr{error_MDJ} = tr{(HBH^T)(HBH^T + R)^{-1} Δ (HBH^T)} and tr{error_D} = tr{R(HBH^T + R)^{-1} Δ (HBH^T)}. We note that although the error terms are complex expressions, they all depend linearly on Δ. Thus, the disagreement between the HL, MDJ and D analysis error variance estimates is due to the lack of innovation covariance consistency; a small numerical illustration is given below.
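As a worked scalar illustration (our own numbers, with assumed σ_o^2 = 1, σ_b^2 = 2, hence γ = 0.5), the following few lines evaluate the scalar error terms (A13), (A16) and (A19) of Appendix B and combine them following the signs of Equations (27)-(29), showing the three estimates drifting apart linearly in Δ:

```python
sig_o2, sig_b2 = 1.0, 2.0                 # assumed scalar error variances
gamma = sig_o2 / sig_b2                   # gamma = sigma_o^2 / sigma_b^2
true_var = sig_o2 * sig_b2 / (sig_o2 + sig_b2)   # optimal analysis error variance

for delta in (0.0, 0.1, 0.25):            # departure from innovation consistency
    err_mdj = delta * sig_b2 / (1.0 + gamma)                      # Equation (A16)
    err_d = delta * sig_o2 / (1.0 + gamma)                        # Equation (A19)
    err_hl = delta * (sig_b2 / (1.0 + gamma) - sig_b2 + sig_o2)   # Equation (A13)
    # HL, MDJ and D estimates following Equations (27)-(29)
    print(delta, true_var + err_hl, true_var - err_mdj, true_var - err_d)
```

At Δ = 0 all three estimates equal the true value (2/3 here); at Δ = 0.25 they have already spread to roughly 0.75, 0.33 and 0.5, even though each error term is only linear in Δ.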
5. Conclusions

We showed that the analysis error variance can be estimated and optimized, without using a model forecast, by partitioning the original observation data set into a training set, used to create the analysis, and an independent (or passive) set, used to evaluate the analysis. This kind of evaluation by partitioning is called cross-validation. The method derives from assuming that the observations have spatially uncorrelated errors or, minimally, that the independent (or passive) observations have errors uncorrelated with those of the active observations and uncorrelated with the background error. This leads to the important property that the passive observations are uncorrelated with the analysis error and can then be used to evaluate the analysis [17]. We have developed a theoretical framework and a geometric interpretation that have allowed us to derive a number of statistical estimation formulas for the analysis error covariance that can be used in both passive and active observation spaces. It is shown that by minimizing the variance of the observation-minus-analysis residuals in passive observation space we actually identify the optimal analysis. This has been done with respect to a single parameter, namely the ratio of observation to background error variances, to obtain a near optimal Kalman gain. The optimization is also done under the constraint of innovation covariance consistency [14,15]. This optimization could have been done with more than one error covariance parameter, but this has not been attempted here. The theory does suggest, however, that the minimum is unique. Once an optimal analysis is identified, we conduct an evaluation of the analysis error covariance using several different formulas: Desroziers et al. [14], Hollingsworth-Lönnberg [13], and one that we developed in this paper which works in either active or passive observation spaces. As a way to validate the analysis error variance computed by the analysis scheme itself, the so-called perceived analysis error variance [6], we compare it with the values obtained from the different statistical diagnostics of the analysis error variance.

This methodology arises from a need to assess and improve ECCC's surface air quality analyses using our operational air quality model GEM-MACH and real-time surface observations of O3 and PM2.5. Our method applied the theory in a simplified way: first, by considering averaged observation and background error variances and finding an optimal ratio γ = σ_o^2/σ_b^2 using as a constraint the trace of the innovation covariance consistency [15]; second, using a single-parameter correlation model, its correlation length, we used maximum likelihood estimation [11] to obtain near

optimal analyses. Also, we did not attempt to account for the representativeness error of the observations by, for example, filtering out observations that are close to one another. Despite all these limitations, our results show that with near optimal analyses, all estimates of the analysis error variance roughly agree with each other, while disagreeing strongly when the input error statistics are not optimal. This check on estimating the analysis error variance gives us confidence that the method we propose is reliable, and it provides us with an objective method to evaluate different analysis configurations, such as the type of background error correlation model, the spatial distribution of error variances, and possibly the use of observation thinning to circumvent the effects of representativeness errors.

The methodology introduced here for estimating analysis error variances is general and not restricted to the case of surface pollutant analysis. It would be desirable to investigate other areas of application, such as surface analysis in meteorology and oceanography. The method could, in principle, provide guidance for any assimilation system. By considering observation-space subdomains [25], proper scaling, local averaging [26], or other methods discussed in Janjić et al. [12], it may also be possible to extend this methodology to spatially varying error statistics. Based on our verification results in Part I [5], we found that there is a dependence between model values and error variances, which we will investigate further in view of our next operational implementation of the Canadian surface air quality analysis and assimilation.

One strong limitation of the optimum interpolation scheme we are using (i.e., homogeneous isotropic error correlations and uniform error variances), which is also the case for most 3D-Var implementations, is the lack of innovation covariance consistency. Ensemble Kalman filters seem, however, much better in that regard, although they have their own issues with localization and inflation. Experiments with chemical data assimilation using an ensemble Kalman filter do give χ²/N_s values very close to unity after simple adjustments of the observation and model error variances [27]. We thus argue that ensemble methods, such as the ensemble Kalman filter, would produce analysis error variance estimates that are much more consistent between the different diagnostics.

Estimates of analysis uncertainties can also be obtained by resampling techniques, such as the jackknife method and bootstrapping [28]. In bootstrapping with replacement, the distribution of the analysis error is obtained by creating new analyses by replacing and duplicating observations from an existing set of observations [28]. This technique relies on the assumption that each member of the dataset is independent and identically distributed. For surface ozone analyses, where there is persistence to the next day and the statistics are spatially inhomogeneous, the assumption of statistical independence may not be adequate. These resampling estimates of analysis uncertainties could be compared with our analysis error variance estimates to help us identify limitations and areas of improvement.

Supplementary Materials: The following are available online. Figure S1: Analysis error variance for the ozone optimal analysis case PM2.5 iter 1. The left panel shows the analysis error on the model grid, and the right panel the analysis error at the active observation sites. Note that the color bars of the left and right panels are different.
The maximum of the color bar for the left panel corresponds to σ_o^2 + σ_b^2. Figure S2: Distribution (histogram) of the ozone analysis error variance at the active observation locations. The first analysis experiment PM2.5 iter 0 (no optimization) is on the left panel, and the optimal analysis case PM2.5 iter 1 on the right panel.

Acknowledgments: We are grateful to the US/EPA for the use of the AirNow database of surface pollutants, and to all provincial governments and territories of Canada for kindly transmitting their data to the Canadian Meteorological Centre to produce the surface analysis of atmospheric pollutants. We are also thankful to Kerill Semeniuk for proofreading, and to two anonymous reviewers for their comments and help in improving the manuscript.

Author Contributions: This research was conducted as a joint effort by both authors. R.M. contributed to the theoretical development and wrote the paper, and M.D.-J. conducted all the experiment design and execution, did proofreading, and introduced a new diagnostic of the optimal analysis error that was further extended to passive observation space.

Conflicts of Interest: The authors declare no conflict of interest. The founding sponsor, which is the government of Canada, had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

Appendix A. A Geometrical Derivation of the Desroziers et al. Diagnostic

Let u and v be two random variables of a real Hilbert space as defined in Section 2.3. Two properties of Hilbert spaces are the polarization identity

⟨u, v⟩ = (1/4){‖u + v‖² − ‖u − v‖²}    (A1)

and the parallelogram identity [19]

‖u + v‖² + ‖u − v‖² = 2(‖u‖² + ‖v‖²)    (A2)

Combining these two equations, we get:

⟨u, v⟩ = (1/2){‖u + v‖² − ‖u‖² − ‖v‖²}    (A3)

Let u = O − A and v = A − B, then u + v = O − B, and so we have

⟨(O − A), (A − B)⟩ = (1/2){‖O − B‖² − ‖O − A‖² − ‖A − B‖²}    (A4)

If we assume that the analysis is optimal, so that OÂT is a right triangle (see Figure 1), then

‖Â − T‖² + ‖O − Â‖² = ‖O − T‖²    (A5)

and similarly that BÂT is a right triangle,

‖Â − T‖² + ‖Â − B‖² = ‖B − T‖²    (A6)

and substituting these expressions into Equation (A4), we get

⟨(O − Â), (Â − B)⟩ = (1/2){‖O − B‖² − ‖O − T‖² − ‖B − T‖² + 2‖Â − T‖²}    (A7)

If in addition we assume that we have uncorrelated observation-background errors, that is,

‖O − B‖² = ‖O − T‖² + ‖B − T‖²    (A8)

we then get

⟨(O − Â), (Â − B)⟩ = ‖Â − T‖²    (A9)

Note the difference with the result obtained in Appendix A (property 6) of Ménard [15], where it is shown that the necessary and sufficient condition for the perceived analysis error covariance in active observation space to meet the Desroziers et al. diagnostic for the analysis error is to have innovation covariance consistency and uncorrelated observation-background errors. The result obtained here is different; it concerns the optimal analysis error covariance.

Appendix B. Diagnostics of Analysis Error Covariance and Innovation Covariance Consistency

Let us introduce a departure matrix Δ from the innovation covariance consistency as,

E[dd^T](HBH^T + R)^{-1} = I + Δ    (A10)

The HL diagnostic, Equation (18), can be expanded as

E[(O − Â)(O − Â)^T] = E[dd^T] + HBH^T(HBH^T + R)^{-1} E[dd^T] (HBH^T + R)^{-1} HBH^T
  − E[dd^T](HBH^T + R)^{-1} HBH^T − HBH^T(HBH^T + R)^{-1} E[dd^T]
  = R − H(B − BH^T(HBH^T + R)^{-1} HB)H^T + error_HL
  = R − HÂH^T + error_HL    (A11)

where error_HL is given as

error_HL = ΔR − Δ(HBH^T) + HBH^T(HBH^T + R)^{-1} Δ(HBH^T)    (A12)

which includes three error terms. A scalar version of Equation (A12) is given as

error_HL = Δ(σ_b^2/(1 + γ) − σ_b^2 + σ_o^2)    (A13)

The error analysis for the MDJ diagnostic, Equation (19), is

E[(Â − B)(Â − B)^T] = HK E[dd^T] K^T H^T = HBH^T(HBH^T + R)^{-1} HBH^T + error_MDJ    (A14)

where error_MDJ is given as

error_MDJ = HBH^T(HBH^T + R)^{-1} Δ(HBH^T)    (A15)

There is only one error term, and its scalar version is given as,

error_MDJ = Δ σ_b^2/(1 + γ)    (A16)

The error analysis for the Desroziers et al. diagnostic, Equation (20), is

E[(O − Â)(Â − B)^T] = E[dd^T](HBH^T + R)^{-1} HBH^T − HBH^T(HBH^T + R)^{-1} E[dd^T] (HBH^T + R)^{-1} HBH^T
  = HBH^T − HBH^T(HBH^T + R)^{-1} HBH^T + error_D    (A17)

where error_D is given as

error_D = {I − HBH^T(HBH^T + R)^{-1}} Δ(HBH^T) = R(HBH^T + R)^{-1} Δ(HBH^T)    (A18)

Again there is only one error term, similar to that of the MDJ diagnostic, and its scalar version is given as,

error_D = Δ σ_o^2/(1 + γ)    (A19)

Error analyses for the diagnostics using passive observations can also be derived. For the passive MDJ diagnostic, Equation (21), we have, similarly to (A14),

E[(Â − B)_c (Â − B)_c^T] = H_c K E[dd^T] K^T H_c^T = H_c BH^T(HBH^T + R)^{-1} HBH_c^T + error_MDJ_passive    (A20)

with a single error term given as,

error_MDJ_passive = H_c BH^T(HBH^T + R)^{-1} Δ(HBH_c^T)    (A21)

To express a scalar version of this equation we need to account for the background error correlation ρ between the active observation location and the passive observation location; it is thus expressed as,

error_MDJ_passive = Δ ρ² σ_b^2/(1 + γ)    (A22)

Finally, the fundamental diagnostic of cross-validation, Equation (7), does not depend explicitly on the innovation covariance consistency. However, attaining its true minimum by tuning only γ and L_c, as Equation (22) would suggest, does introduce some innovation inconsistency, which all the other optimal diagnostics, Equations (18)-(21), have to account for.

References

1. Ménard, R.; Robichaud, A. The chemistry-forecast system at the Meteorological Service of Canada. In Proceedings of the ECMWF Seminar on Global Earth-System Monitoring, Reading, UK, 5-9 September 2005.
2. Robichaud, A.; Ménard, R. Multi-year objective analyses of warm season ground-level ozone and PM2.5 over North America using real-time observations and Canadian operational air quality models. Atmos. Chem. Phys. 2014, 14. [CrossRef]
3. Robichaud, A.; Ménard, R.; Zaïtseva, Y.; Anselmo, D. Multi-pollutant surface objective analyses and mapping of air quality health index over North America. Air Qual. Atmos. Health 2016, 9. [CrossRef] [PubMed]
4. Moran, M.D.; Ménard, S.; Pavlovic, R.; Anselmo, D.; Antonopoulos, S.; Robichaud, A.; Gravel, S.; Makar, P.A.; Gong, W.; Stroud, C.; et al. Recent advances in Canada's national operational air quality forecasting system. In Proceedings of the 32nd NATO-SPS ITM, Utrecht, The Netherlands, 7-11 May.
5. Ménard, R.; Deshaies-Jacques, M. Evaluation of analysis by cross-validation. Part I: Using verification metrics. Atmosphere 2018, in press.
6. Daley, R. The lagged-innovation covariance: A performance diagnostic for atmospheric data assimilation. Mon. Weather Rev. 1992, 120. [CrossRef]
7. Daley, R. Atmospheric Data Analysis; Cambridge University Press: New York, NY, USA, 1991.
8. Talagrand, O. A posteriori evaluation and verification of analysis and assimilation algorithms. In Proceedings of the Workshop on Diagnosis of Data Assimilation Systems, November 1998; European Centre for Medium-Range Weather Forecasts: Reading, UK, 1999.
9. Todling, R. Notes and Correspondence: A complementary note to "A lag-1 smoother approach to system-error estimation": The intrinsic limitations of residual diagnostics. Q. J. R. Meteorol. Soc. 2015, 141. [CrossRef]
10. Hollingsworth, A.; Lönnberg, P. The statistical structure of short-range forecast errors as determined from radiosonde data. Part I: The wind field. Tellus 1986, 38A. [CrossRef]
11. Ménard, R.; Deshaies-Jacques, M.; Gasset, N. A comparison of correlation-length estimation methods for the objective analysis of surface pollutants at Environment and Climate Change Canada. J. Air Waste Manag. Assoc. 2016, 66. [CrossRef] [PubMed]
12. Janjić, T.; Bormann, N.; Bocquet, M.; Carton, J.A.; Cohn, S.E.; Dance, S.L.; Losa, S.N.; Nichols, N.K.; Potthast, R.; Waller, J.A.; et al. On the representation error in data assimilation. Q. J. R. Meteorol. Soc. [CrossRef]
13. Hollingsworth, A.; Lönnberg, P. The verification of objective analyses: Diagnostics of analysis system performance. Meteorol. Atmos. Phys. 1989, 40. [CrossRef]
14. Desroziers, G.; Berre, L.; Chapnik, B.; Poli, P. Diagnosis of observation-, background-, and analysis-error statistics in observation space. Q. J. R. Meteorol. Soc. 2005, 131. [CrossRef]
15. Ménard, R.
Error covariance estimation methods based on analysis residuals: Theoretical foundation and convergence properties derived from simplified observation networks. Q. J. R. Meteorol. Soc. 2016, 142. [CrossRef]
16. Kailath, T. An innovations approach to least-squares estimation. Part I: Linear filtering in additive white noise. IEEE Trans. Autom. Control 1968, 13.

17. Marseille, G.-J.; Barkmeijer, J.; de Haan, S.; Verkley, W. Assessment and tuning of data assimilation systems using passive observations. Q. J. R. Meteorol. Soc. 2016, 142. [CrossRef]
18. Waller, J.A.; Dance, S.L.; Nichols, N.K. Theoretical insight into diagnosing observation error correlations using observation-minus-background and observation-minus-analysis statistics. Q. J. R. Meteorol. Soc. 2016, 142. [CrossRef]
19. Caines, P.E. Linear Stochastic Systems; John Wiley and Sons: New York, NY, USA, 1988.
20. Cohn, S.E. The principle of energetic consistency in data assimilation. In Data Assimilation; Lahoz, W., Khattatov, B., Ménard, R., Eds.; Springer: Berlin/Heidelberg, Germany.
21. Mitchell, H.L.; Daley, R. Discretization error and signal/error correlation in atmospheric data assimilation: (I). All scales resolved. Tellus 1997, 49A. [CrossRef]
22. Mitchell, H.L.; Daley, R. Discretization error and signal/error correlation in atmospheric data assimilation: (II). The effect of unresolved scales. Tellus 1997, 49A. [CrossRef]
23. Joiner, J.; da Silva, A. Efficient methods to assimilate remotely sensed data based on information content. Q. J. R. Meteorol. Soc. 1998, 124. [CrossRef]
24. Migliorini, S. On the equivalence between radiance and retrieval assimilation. Mon. Weather Rev. 2012, 140. [CrossRef]
25. Chapnik, B.; Desroziers, G.; Rabier, F.; Talagrand, O. Properties and first application of an error-statistics tuning method in variational assimilation. Q. J. R. Meteorol. Soc. 2005, 130. [CrossRef]
26. Ménard, R.; Deshaies-Jacques, M. Error covariance estimation methods based on analysis residuals and its application to air quality surface observation networks. In Air Pollution Modeling and Its Application XXV; Mensink, C., Kallos, G., Eds.; Springer International AG: Cham, Switzerland.
27. Skachko, S.; Errera, Q.; Ménard, R.; Christophe, Y.; Chabrillat, S. Comparison of the ensemble Kalman filter and 4D-Var assimilation methods using a stratospheric tracer transport model. Geosci. Model Dev. 2014, 7. [CrossRef]
28. Efron, B. An Introduction to the Bootstrap; Chapman & Hall: New York, NY, USA, 1993.

© 2018 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
