Correcting the mathematical structure of a hydrological model via Bayesian data assimilation


WATER RESOURCES RESEARCH, VOL. 47, doi:10.1029/2010WR009614, 2011

Correcting the mathematical structure of a hydrological model via Bayesian data assimilation

Nataliya Bulygina 1 and Hoshin Gupta 2

Received 4 June 2010; revised 29 January 2011; accepted 18 February 2011; published 12 May 2011.

[1] The goal of model identification is to improve our understanding of the structure and behavior of a system so the model can be used to make inferences about its input–state–output response. It is conventional to preselect some model form and evaluate its suitability against historical data. If deemed unsuitable, ways must be found to correct the model through some intuitive process. Here, we discuss a Bayesian data assimilation process by which historical observations can be used to diagnose what might be wrong with the presumed mathematical structure of the model and to provide guidance toward fixing the problem. In previous work we showed how, given a suitable conceptual model for the system, the Bayesian estimation of structure (BESt) method can estimate the stochastic form for structural equations of a model that are consistent with historical observations at the spatiotemporal scale of the data, while explicitly estimating model structural contributions to prediction uncertainty. However, a prior assumption regarding the form of the equations (an existing model) is often available. Here, we extend BESt to show how the mathematical form of the prior model equations can be corrected/improved to be more consistent with available data while remaining consistent with the presumed physics of the system. Conditions under which convergence will occur are stated. The potential of the extended BESt approach is demonstrated in the context of basin-scale hydrological modeling by correcting the equations of the HyMod model applied to the Leaf River catchment and thereby improving its representation of system input–state–output response.

Citation: Bulygina, N., and H.
Gupta (2011), Correcting the mathematical structure of a hydrological model via Bayesian data assimilation, Water Resour. Res., 47, doi:10.1029/2010WR009614.

1 Civil and Environmental Engineering Department, Imperial College, London, UK.
2 Department of Hydrology and Water Resources, University of Arizona, Tucson, Arizona, USA.

Copyright 2011 by the American Geophysical Union.

1. Introduction

[2] While numerous catchment-scale models have been published [Crawford and Linsley, 1966; Burnash et al., 1973; Wagener et al., 2004], and much has been written about how to conduct a robust model calibration [e.g., Duan et al., 1994; Boyle et al., 2000], less attention has been given to the issue of proper model evaluation when a comparison with data indicates the model structure to be deficient, and even less to the problem of how to diagnose model structural deficiencies with a view to resolving them [Gupta et al., 2008]. In fact, the concept of model structural error is arguably not even well understood, although Doherty and Welter [2010] present an interesting conceptual and mathematical approach to the subject.

[3] Dealing with a model deficiency can involve different kinds of modifications to (1) model architecture or configuration (the conceptual representation of the system), (2) model equations, (3) model effective (as opposed to observed) inputs, (4) model states, and/or (5) model parameters. Model architecture is defined as a list of all inputs, states, and outputs perceived to be important, together with the causal links between them, but excluding the model equations (which are to be specified separately); that is, two different models can have the same architecture (conceptual model) but use different mathematical equations to relate the inputs, states, and outputs. Examples of work involving modifications to model architecture include the following: Young [2001] infers model configuration directly from data using transfer functions; Perrin et al.
[2003] look for a maximum level of complexity supported by available data; Fenicia et al. [2008] explore progressive adaptations of the model architecture based on catchment characteristics, available data, and expert (subjective) judgment; Clark et al. [2008] analyze and evaluate multiple preselected model configurations; and Neuman [2003] and Ye et al. [2004] prescribe probabilities to a preselected set of appropriate model architectures. Regarding the multiple model approach, composite model predictions can theoretically be shown to be superior to any individual model prediction provided that the ensemble of models is somehow selected to be both independent and spanning the model space [Winter and Nychka, 2010].

[4] When the model architecture is considered to be correct (and therefore not subject to modification), deficiencies in the model mathematical equations can be treated in various ways. Abramowitz et al. [2006] model systematic deviations

between the model simulations and observed data; Kuczera et al. [2006] and Reichert and Mieleitner [2009] treat parameters as stochastic variables with possible time or state dependency; Kennedy and O'Hagan [2001] stochastically estimate the model equations themselves when all model inputs, states, and outputs are observable.

[5] When the model mathematical equations are also considered to be correct, one can make inferences about the joint distribution of the inputs and model parameters [Kavetski et al., 2006; Renard et al., 2010], or one can adjust the model states to compensate for input, initial condition, and model structure errors via data assimilation [Burgers et al., 1998; Moradkhani et al., 2005]. Interestingly, Renard et al. [2010] show that because of the high sensitivity of a hydrological model to its inputs and parameters, reliable and sharp inference requires reliable and sharp prior error models.

[6] Nonetheless, the approaches mentioned above do not go far enough toward addressing the fundamental issue of diagnosing what is wrong with the model and providing guidance toward fixing the problem. Bulygina and Gupta [2009] take steps in this direction by demonstrating a data assimilation methodology that enables identification of the catchment-scale structure of the model equations directly from historical input/output (IO) data (or input–state–output (ISO) data when state observations are available) while remaining consistent with a prior physically based specification of the conceptual structure of the watershed. The Bayesian estimation of structure (BESt) method belongs to a family of data assimilation strategies based in Bayesian inference. It begins with a prior specification of what is known about the system of interest, constructs a likelihood function to extract information embedded in the IO (or ISO) data, and uses Bayes' law to update (portions of) the prior information, thereby constructing a posterior specification of the system.
[7] What distinguishes BESt from other filtering techniques is that it provides updated information related to the form of the model structural equations along with updated stochastic estimates of the model state. It does so by recognizing that the model structural equations can be viewed as the conditional joint density of the system unknowns (outputs) given knowledge of the system knowns (inputs and states). The joint density (of system inputs, states, and outputs) can be compactly represented by means of a nonparametric method (the mixture of Gaussians approach) that greatly reduces subjectivity in the selection of the model equations. An innovative aspect of the method is its recursive propose-and-revise algorithm for model structure estimation, which allows for the system inputs and outputs to be only partially observable (up to some measurement error), and even for some of the states and outputs to be unobserved (e.g., soil moisture and evaporation).

[8] This paper extends our previous work [Bulygina and Gupta, 2009; Bulygina and Gupta, 2010] by developing a connection between BESt and the method of data augmentation [Tanner and Wong, 1987], thereby allowing statements to be made about convergence of the method. This facilitates the use of BESt for detection, diagnosis, and correction of errors associated with the prior assumptions regarding model equations. The consequence is that Bayesian data augmentation can be used to make meaningful inferences about how to improve the mathematical form of an existing conceptual hydrological model.

[9] The paper is organized as follows. Section 2 describes the BESt method in the context of a data augmentation framework, provides convergence conditions, and discusses how an existing conceptual hydrological model can be used to construct the model prior. Section 3 presents a case study in which the method is used to develop an improved basin-scale daily time step model for the Leaf River catchment (Collins, Mississippi).
Finally, section 4 discusses ways in which the approach could be further improved.

2. Method

[10] The BESt approach to inferring the mathematical structure of a model is based in a probabilistic Bayesian data assimilation framework, which allows prior theoretical understanding to be combined with information extracted from input–state–output observations. The method represents our prior assumptions regarding the forms of the model equations, as well as our uncertainty regarding those assumptions, using probability density functions [Bulygina and Gupta, 2009; Kennedy and O'Hagan, 2001]. A key innovation of this paper is a technique to formulate and refine this stochastic structure by connecting BESt to the method of data augmentation [Tanner and Wong, 1987] using a computationally tractable mixture-based density approximation [Muller et al., 1996].

[11] For mathematical consistency, we adopt the notation proposed by Liu and Gupta [2007]. Further, as discussed by Bulygina and Gupta [2009], we treat the specification of a system model as having two sequential stages: a system conceptual structure identification stage [e.g., Clark et al., 2008; Ye et al., 2004; Young, 2001] followed by a system mathematical (equation) structure identification stage. The first stage specifies (by assumption and/or inference) one or more suitable conceptual models for the hydrologic system (its architecture) without specification of the mathematical equations, which can therefore include whole families of watershed models [e.g., Clark et al., 2008]. The second stage determines (by selection or construction) the mathematical equations (and/or rules) that link the input fluxes and state variables to the output fluxes.
Whereas it is common [e.g., Clark et al., 2008] at this stage for the model equations to be selected from some predetermined set, BESt instead provides a method for inferring, by construction, the mathematical form of the model equations, conditional on a conceptual structure having been previously specified. This paper takes a step further and discusses a hybrid approach wherein BESt can also be used to correct (modify) a previously selected set of model equations.

2.1. Estimation of Mathematical Model Structure

[12] Let our hypothesis regarding a hydrological system be represented by some conceptual model having inputs, u, state variables, x, and outputs, y, such that the geometry of the interconnections between these variables is assumed known (i.e., a directed graph can be drawn indicating the architecture of the interdependence of variables). The extended model state is represented by s_t = (u_t, x_t, y_t), and the data set D_1:T includes all observations regarding the system driving forces or inputs (e.g., precipitation, potential evapotranspiration), at least some of the system outputs (e.g., stream discharge, actual evapotranspiration), and possibly some measurements of system state variables (e.g., soil moisture content). Our goal

is to construct an estimate of the mathematical structure (mappings/equations) for the model that is consistent with the available information (prior assumptions and observed data).

[13] To do so, we make a few additional assumptions. First, we assume that the measurement errors in the data have zero mean (this assumption can be relaxed if the form of the measurement error bias is known). Second, and more critically, we assume that the data set D_1:T is sufficiently representative of the important input–state–output dynamics of the system, i.e., that the historical observations of system behavior contain sufficient information regarding the range of system dynamics to allow accurate predictions of future system behavior to be made. In this regard we also assume that the system is ergodic. These assumptions imply the hypothesis that the system will continue to return to states similar to the ones it has previously experienced and that the frequency distribution of such states can be represented by the observed distribution of frequencies.

[14] Under these conditions, and on the basis of each set of observed system variables at each time moment being regarded as an independent draw, the joint probability density function, p(s | D_1:T), can be considered to be representative of the statistical distribution of system behavior over the period of the data and treated as sufficient to describe system behavior during future time periods. For conditions where the data are not sufficient to construct this joint probability density function (jpdf), we must provide some reasonable prior hypothesis regarding the form of p(s | D_1:T) that allows us to make inferences under novel conditions. Under such conditions we must specify with what frequency to sample this alternative density.
[15] Given p(s | D_1:T), the mathematical form of the model equations can be estimated by computing the conditional density function p(y | u, x, D_1:T), giving us a probabilistic equation that maps uncertain knowledge about inputs, u, and system states, x, into uncertain estimates of the outputs, y. If desired, a deterministic form for the model equation can then be derived as the functional form that maps (for example) the expected values of the input and state variables into the expected value of the outputs, E[y | u, x]. For a more complete description, the second and higher order moments (or quantiles) can also be provided.

[16] To summarize, once the joint density function p(s | D_1:T) is computed, the problem of mathematical structure estimation can be solved under any hypothesis regarding the form of the conceptual model. Although other choices might be made (e.g., kernel density estimation [Silverman, 1986]), the BESt approach estimates this density as a weighted sum of a relatively small number (as compared with kernel density methods) of multivariate Gaussian pdfs [Ferguson, 1983; Muller et al., 1996]; for details, see Appendix A. The problem of density function estimation is treated as a problem of mixture model estimation, characterized by an appropriate set of distribution parameters θ (the means, covariance matrices, and mixture weights). The task, therefore, is to compute the probability density function p(θ | D_1:T) of the parameters θ conditioned on the observed data D_1:T.

[17] In the trivial case that all system states are observable at each time step, estimation of the mixture model parameters is straightforward. However, when some of the system variables are not observed (or not directly observable) the problem is more complicated; in catchment modeling, precipitation and discharge are typically observed, while other variables, such as soil moisture at different soil levels and locations and water levels along the stream network, are typically unobserved.
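A useful property of the mixture representation is that the conditional density p(y | u, x, D_1:T) described in paragraph [15] is available in closed form: each Gaussian component is conditioned by the standard formulas, and the mixture weights are rescaled by each component's marginal likelihood of the conditioning values. A minimal numpy sketch (function and variable names are illustrative, not from the paper):

```python
import numpy as np

def gmm_conditional(weights, means, covs, known_idx, unknown_idx, known_val):
    """Condition a Gaussian mixture p(s) on s[known_idx] = known_val,
    returning weights, means, and covariances of the conditional mixture
    over s[unknown_idx] (e.g., outputs y given inputs u and states x)."""
    new_w, new_mu, new_cov = [], [], []
    for w, mu, S in zip(weights, means, covs):
        Skk = S[np.ix_(known_idx, known_idx)]
        Sku = S[np.ix_(known_idx, unknown_idx)]
        Suu = S[np.ix_(unknown_idx, unknown_idx)]
        gain = Sku.T @ np.linalg.inv(Skk)
        d = known_val - mu[known_idx]
        # Conditional moments of this component.
        new_mu.append(mu[unknown_idx] + gain @ d)
        new_cov.append(Suu - gain @ Sku)
        # Reweight by the component's marginal density at the known values.
        norm = np.sqrt((2 * np.pi) ** len(known_idx) * np.linalg.det(Skk))
        new_w.append(w * np.exp(-0.5 * d @ np.linalg.inv(Skk) @ d) / norm)
    new_w = np.array(new_w)
    return new_w / new_w.sum(), new_mu, new_cov

# One-component check: s = (x, y) jointly Gaussian with cov(x, y) = 0.5.
w, mus, _ = gmm_conditional(
    [1.0], [np.zeros(2)], [np.array([[1.0, 0.5], [0.5, 1.0]])],
    known_idx=[0], unknown_idx=[1], known_val=np.array([1.0]))
print(mus[0][0])  # E[y | x = 1] = 0.5
```

The weighted mean of the conditional components is one way to obtain the deterministic model equation E[y | u, x] mentioned above.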
At least two approaches to this problem have been explored in the mathematical literature: (1) the method of Gibbs sampling [Casella and George, 1992; Gelfand and Smith, 1990] and (2) the method of data augmentation [Tanner and Wong, 1987]. Here, we follow the second approach, treating the unknown parameters and unobserved states as two separate and logically grouped multidimensional entities.

[18] The basic idea underlying data augmentation is simple. The observed data, D_1:T, are augmented by the extended model state, s_1:T, also referred to as the latent data. Under the assumption that D_1:T and s_1:T are both known, the analysis is straightforward and the augmented data posterior, p(θ | s_1:T, D_1:T), can be easily computed. However, we actually wish to compute p(θ | D_1:T), which would require that s_1:T be known. The data augmentation approach is to generate multiple estimates of s_1:T from the predictive distribution, p(s_1:T | D_1:T), and to approximate p(θ | D_1:T) as the expectation of p(θ | s_1:T, D_1:T) over the multiple draws on s_1:T (by integrating out the unknowns s_1:T). However, because p(s_1:T | D_1:T) depends, in turn, on p(θ | D_1:T), the mutual dependency leads to an iterative algorithm, which is analytically a method of successive substitution for solving an operator fixed point equation [Ortega and Rheinboldt, 2000].

2.2. The Algorithm

[19] The algorithm is motivated by the following simple representation of the desired posterior density:

p(θ | D_1:T) = ∫_S p(θ | s_1:T, D_1:T) p(s_1:T | D_1:T) ds_1:T.   (1)

The predictive density of s_1:T can, in turn, be related to the desired posterior density by

p(s_1:T | D_1:T) = ∫_Θ p(s_1:T | φ, D_1:T) p(φ | D_1:T) dφ.   (2)

In the above equations, the sample space for the latent data s_1:T is denoted by S and the parameter space for φ is denoted by Θ.

[20] To implement this algorithm we need to sample from two distributions: p(θ | s_1:T, D_1:T) and p(s_1:T | θ, D_1:T).
Substituting (2) into (1) and interchanging the order of integration, we see that p(θ | D_1:T) must satisfy the following integral equation (hereafter the range of integration is omitted, as it is implicitly specified by the differentials ds_1:T and dφ):

g(θ) = ∫ K(θ, φ) g(φ) dφ,   (3)

where g(θ) denotes p(θ | D_1:T) and

K(θ, φ) = ∫ p(θ | s_1:T, D_1:T) p(s_1:T | φ, D_1:T) ds_1:T.

Let T be an operator that transforms the integrable function f into the integrable function Tf by the equation

(Tf)(θ) = ∫ K(θ, φ) f(φ) dφ.   (4)

Then, the method of successive substitution for solving (3) becomes an iterative method for calculation of p(θ | D_1:T) in which we begin with some initial guess g_0(θ) for p(θ | D_1:T) and successively calculate

g_{i+1}(θ) = (T g_i)(θ).   (5)

Later, we list some mild conditions required for g_i to converge to the desired posterior density p(θ | D_1:T) when calculated in this manner.

[21] Because it is rarely possible to calculate the integrals in (5) analytically, we will employ a Monte Carlo scheme. Motivated by equation (1), the iterative scheme proceeds as follows:

[22] Given the current approximation g_i to p(θ | D_1:T):

[23] (a) Generate a sample s_1:T^(1), …, s_1:T^(m) from the current approximation to the predictive density p(s_1:T | D_1:T).

[24] (b) Update the current approximation to p(θ | D_1:T) to be the mixture of conditional densities of θ given the augmented data generated in step (a), so that

g_{i+1}(θ) = (1/m) Σ_{j=1}^{m} p(θ | s_1:T^(j), D_1:T).

To achieve this, we must sample from the distribution p(θ | s_1:T, D_1:T). Using equation (2), we perform step (a) as follows:

[25] (a1) generate φ from g_i(·);

[26] (a2) generate s_1:T from p(s_1:T | φ, D_1:T), where φ is the value obtained in (a1). To solve the mathematical model estimation problem set up at the beginning of this section, each estimate of the latent variable, s_t, is treated as an independent draw from the posterior distribution, p(s_t | φ, D_1:T), so that p(s_1:T | φ, D_1:T) becomes a product of the posterior distributions, ∏_{t=1}^{T} p(s_t | φ, D_1:T).
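The (a1)–(a2)–(b) cycle can be exercised on a toy problem where every conditional is known exactly: a scalar parameter with latent observations, half observed and half missing. The particle ensemble below stands in for g_i; the one-parameter "system" and all names are illustrative only, not the BESt implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy latent-data problem: s_t = theta + noise; only half of s_1:T observed.
theta_true, T = 2.0, 200
s_all = theta_true + rng.normal(0.0, 1.0, T)
obs = np.where(np.arange(T) < T // 2, s_all, np.nan)
seen = ~np.isnan(obs)

m = 500
thetas = rng.normal(0.0, 5.0, m)      # particle representation of a diffuse g_0

for _ in range(30):
    new = np.empty(m)
    for j in range(m):
        phi = thetas[rng.integers(m)]                            # (a1) draw from g_i
        s = np.where(seen, obs, phi + rng.normal(0.0, 1.0, T))   # (a2) impute latent data
        # (b) draw from the augmented-data posterior p(theta | s_1:T)
        # (flat prior, unit observation variance -> Normal(mean(s), 1/T))
        new[j] = rng.normal(s.mean(), 1.0 / np.sqrt(T))
    thetas = new

# The iteration settles on the posterior given only the observed data,
# whose mean is the mean of the observed values.
print(abs(thetas.mean() - obs[seen].mean()) < 0.3)
```

Each pass generates latent data from the current parameter approximation and then refreshes the parameter approximation from the augmented data, exactly the successive-substitution structure of equation (5).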
The conditional density, p(θ | s_1:T, D_1:T), then provides information about the parameters of the mixture of Gaussians used to approximate the joint density function (as discussed above); given the latent data s_1:T, this distribution is independent of the observations D_1:T, i.e., p(θ | s_1:T, D_1:T) = p(θ | s_1:T).

2.3. Convergence Conditions

[27] Here we provide a list of sufficient conditions for the iterative algorithm to converge, along with a statement regarding the convergence rate. The theoretical proof can be found in Tanner and Wong [1987] and Tanner [1991]. Under these conditions, the density function g* that solves equation (3) is unique, and the iterative procedure (5) converges linearly to g* for essentially any starting value (i.e., the deviation in the L1 norm decreases at a geometric rate).

Condition (A)

[28] (1) K(θ, φ) is uniformly bounded and is equicontinuous in θ.

[29] (2) For any θ_0 ∈ Θ there is an open neighborhood U of θ_0 such that K(θ, φ) > 0 for all θ, φ ∈ U.

[30] The second part of this condition says that if θ and φ are close, then it is possible to generate some latent data, s_1:T, from p(s_1:T | φ, D_1:T) such that p(θ | s_1:T, D_1:T) is nonzero, which is reasonable.

Theorem 1

[31] If Condition (A) holds and the starting value g_0 satisfies sup_θ |g_0(θ) − g*(θ)| < ∞, then there exists a constant α, with 0 < α < 1, such that

‖g_{i+1} − g*‖ ≤ α^i ‖g_0 − g*‖.

Remark 1

[32] Theorem 1 requires |g_0(θ) − g*(θ)| to be uniformly bounded, which can be satisfied when, for example, the parameter space Θ is compact (and Condition (A) holds) or g_0 has bounded support.

Remark 2

[33] The convergence rate α in Theorem 1 depends on the initial value g_0. If Θ is compact, the supremum of α over all possible g_0 is still less than 1; that is, we obtain a rate independent of the starting value. If Θ is unbounded, however, α can be arbitrarily close to 1, depending on the starting value.
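Theorem 1's geometric rate is easy to visualize on a discretized version of equation (3): replace K(θ, φ) by a strictly positive column-stochastic matrix (a crude stand-in for the true kernel, chosen so the positivity in Condition (A) holds) and apply successive substitution. The L1 distance to the fixed point then contracts at every step:

```python
import numpy as np

rng = np.random.default_rng(1)

# Discretize the parameter space into n cells; K[i, j] stands in for K(theta_i, phi_j),
# with columns normalized so that the map g -> K @ g sends densities to densities.
n = 50
K = rng.random((n, n)) + 0.1          # strictly positive, as in Condition (A)
K /= K.sum(axis=0, keepdims=True)

# The fixed point g* is the eigenvector of K with eigenvalue 1 (Perron-Frobenius).
vals, vecs = np.linalg.eig(K)
g_star = np.abs(np.real(vecs[:, np.argmax(np.real(vals))]))
g_star /= g_star.sum()

g = np.ones(n) / n                    # arbitrary starting density g_0
errors = []
for _ in range(8):                    # successive substitution: g <- Tg
    g = K @ g
    errors.append(np.abs(g - g_star).sum())   # L1 distance to g*

# Strict positivity makes each step a contraction in total variation.
print(all(e2 < e1 for e1, e2 in zip(errors, errors[1:])))
```

The per-step contraction factor plays the role of α; for a kernel that is only weakly positive, that factor approaches 1 and convergence slows, which foreshadows the practical caveat in paragraph [36].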
[34] If p(θ | s_1:T, D_1:T) is a distribution of the parameters of a mixture of Gaussians approximation to the joint probability density, p(s | D_1:T), then from Muller et al. [1996] the density, p(θ | s_1:T, D_1:T) = p(θ | s_1:T), is a strictly positive function for any latent data pattern, s_1:T. Because of this, it can be shown that K(θ, φ) > 0 (as required by Condition (A)).

[35] To satisfy Theorem 1, we need a good initial guess for the starting density g_0. One way to construct a sensible initial guess is to use an existing (deterministic or stochastic) conceptual model, with an assumed form for the mathematical equations, that has been somehow constrained to be approximately consistent with the available data. Next, we run the model many times in ensemble mode, with the conceptual model parameters sampled from some distribution (representing uncertainty in the parameter values), thereby generating a set of extended state vectors. Since these are fully known, it is simple to construct a suitable prior guess for the joint density function, p(s | D_1:T), and subsequently its corresponding density function, g_0(θ) = p(θ | D_1:T). As mentioned previously, this starting value can be represented as a finite sum of densities g_0(θ) ∝ Σ_{i=1}^{K} p_i(θ), where {θ_i}_{i=1}^{K} are the parameters of the Gaussian mixture model representing our prior; that is, each density p_i(θ) is centered on θ_i and has bounded support (e.g., a uniform distribution or truncated normal distribution). Because this starting value has bounded support (Remark 1), it satisfies Theorem 1. Under these conditions, data augmentation can be thought of as a method of mapping correction that merges prior (theoretical) belief regarding model structure with the empirical information contained in the observed data.
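The recipe in paragraph [35] can be sketched end-to-end with a toy one-bucket stand-in for the conceptual model. The bucket model, its parameter values, and the single-Gaussian fit below are illustrative simplifications (BESt uses a full Gaussian mixture and a real rainfall–runoff model):

```python
import numpy as np

rng = np.random.default_rng(2)

def bucket_model(P, PE, k, c):
    """Toy one-bucket conceptual model (hypothetical stand-in for HyMod):
    storage x, discharge q = k * x, evaporation e = PE * min(x / c, 1)."""
    x, xs, qs, es = 0.5 * c, [], [], []
    for p, pe in zip(P, PE):
        e = pe * min(x / c, 1.0)
        q = k * x
        x = max(x + p - e - q, 0.0)
        xs.append(x); qs.append(q); es.append(e)
    return np.array(xs), np.array(qs), np.array(es)

# Forcing data and a "calibrated" parameter set (made-up values).
T = 300
P = rng.gamma(0.5, 4.0, T)            # daily precipitation
PE = np.full(T, 3.0)                  # potential evapotranspiration
k_opt, c_opt = 0.3, 50.0

# Ensemble runs with parameters sampled uniformly within +/-10% of the optimum.
samples = []
for _ in range(200):
    k = k_opt * rng.uniform(0.9, 1.1)
    c = c_opt * rng.uniform(0.9, 1.1)
    x, q, e = bucket_model(P, PE, k, c)
    # Extended state s_t = (u_t, x_t, y_t); every run contributes T draws.
    samples.append(np.column_stack([P, PE, x, q, e]))
samples = np.vstack(samples)

# Prior estimate of p(s | D): a single moment-matched Gaussian stands in here
# for the Gaussian mixture whose parameters define g_0(theta).
mu, cov = samples.mean(axis=0), np.cov(samples.T)
print(mu.shape, cov.shape)  # (5,) (5, 5)
```

Because every extended state vector in the ensemble is fully known, fitting the prior density is a complete-data problem; the data augmentation iterations then revise it against the partially observed record.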
[36] Even though the convergence conditions appear to be fairly mild, in practice the convergence rate might be prohibitively slow unless some additional conditions are met (as happens, for example, in Monte Carlo sampling [Robert and Casella, 2004]), thereby potentially making the convergence rate highly dependent on the starting value, the observation error structure specifications, and the parameter estimation algorithm (i.e., the mixture of Gaussians). Here, we propose to use the following necessary (but not sufficient) convergence check. Since it is not possible to calculate an analytical form for the distance (in the L1 norm; see Theorem 1) between the

current pdf estimate and the density function that solves equation (3), we use the Kullback–Leibler (KL) divergence measure as a convergence diagnostic (see Appendix C); our case study results are based on a KL distance threshold of 0.05 (see Appendix C for more details).

Figure 1. Conceptual structure of the HyMod daily time step catchment model.

3. A Hydrological Application

[37] The method for mathematical structure estimation/correction outlined above is general and can be applied to a diversity of system identification problems. To illustrate the method, we present a hydrological case study involving lumped catchment-scale modeling of daily precipitation–runoff dynamics. We begin by postulating a simple conceptual model for the catchment and formalize it into a computational model by adopting a commonly used mathematical form for the system equations. We then use noisy historical input and output observations to update/correct the form of the model equations, under reasonable assumptions regarding the probabilistic structure of the measurement errors.

3.1. Data and Prior Model Assumptions

[38] Data for this study are taken from the intensively investigated Leaf River basin (area 1944 km²) located near Collins in southern Mississippi [e.g., Sorooshian et al., 1983; Brazil, 1988; Gupta et al., 1998]. Forty water years of daily data are available, consisting of mean areal precipitation (mm/day), potential evapotranspiration (mm/day), and streamflow (mm/day). In the work of Bulygina and Gupta [2009] we modeled the water balance dynamics of this basin at annual, monthly, and weekly time scales, using a simple conceptual model having one state variable (basin storage). The simple conceptual model hypothesis was found to be suitable at annual and monthly time scales, but failed at weekly and shorter time scales, suggesting the need for a conceptual model having higher state dimension.
[39] On the basis of work by other authors [Boyle et al., 2000; Vrugt et al., 2008; Wagener et al., 2001], we adopt the HyMod daily time step deterministic catchment model having five state variables (Figure 1) as a suitable hypothesis regarding the conceptual structure, and prior assumptions regarding the mathematical structure, for this basin. The model accepts daily precipitation and potential evapotranspiration as inputs and computes estimates of streamflow discharge and actual evapotranspiration as outputs (Appendix D). For purposes of this case study, it is interesting to note that the HyMod equations make the following assumptions: (1) actual ET is linearly proportional to potential ET by a factor indicating the degree of saturation of the upper soil zone, (2) the quick flow component of streamflow is linearly proportional to the storage in a series of three routing reservoirs, and (3) the slow flow component of streamflow (base flow) is linearly proportional to the amount of water stored in the lower soil zone.

[40] To initiate the iterative algorithm, we generate the required mathematical prior g_0(θ) using the following procedure. First, we calibrate the HyMod model to 6 water years ( ) of Leaf River data using the Shuffled Complex Evolution algorithm (SCE-UA) [Duan et al., 1994] to find parameters that minimize the mean squared error of Box–Cox transformed flow (λ = 0.3). This particular period was selected to represent a broad range of hydrological and system response conditions, including dry, moderate, and wet years; a review of the 40 years of available data indicates that the period is reasonably representative of the range of system dynamics characteristic of this catchment. Next, we assume a multivariate uniform distribution centered on this optimal parameter set and having a range extending to ±10% of the optimal parameter value.
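The calibration criterion just described is a mean squared error computed in Box–Cox space. A short sketch (the flow values below are made up for illustration, and the SCE-UA search itself is not reproduced):

```python
import numpy as np

def boxcox(q, lam=0.3):
    """Box-Cox transform of flow with lambda = 0.3, as used in the calibration."""
    return (np.power(q, lam) - 1.0) / lam

def objective(q_sim, q_obs, lam=0.3):
    """Mean squared error in Box-Cox space -- the criterion minimized by
    SCE-UA during calibration (SCE-UA itself is not reproduced here)."""
    return np.mean((boxcox(q_sim, lam) - boxcox(q_obs, lam)) ** 2)

# Illustrative daily discharge values (mm/day), not Leaf River data.
q_obs = np.array([0.1, 0.35, 1.0, 5.0, 18.0])
q_sim = np.array([0.12, 0.30, 1.1, 4.0, 20.0])
print(objective(q_sim, q_obs))
```

Because the transform compresses high flows, minimizing this criterion keeps the fit from being dominated by flood peaks, which matters for a catchment whose record spans dry, moderate, and wet years.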
Further, we assume a 10% zero-mean heteroscedastic Gaussian distribution around the input data (precipitation and potential evapotranspiration) at each time step and a homoscedastic Gaussian distribution around Box–Cox transformed discharge measurements (for more details, see below). We then generate an ensemble of extended state vectors, s_t, at each time step by jointly sampling from the joint parameter–input uncertainty regions specified above and use this ensemble to construct a prior estimate of the density, p(s | D_1:T), and, subsequently, its corresponding density function, g_0(θ) = p(θ | D_1:T). Next, we use these same 6 years of data to update the initial model density estimate using the BESt data augmentation approach. To conduct a meaningful (independent) evaluation of model performance, the subsequent 6 water years ( ) were used.

[41] To implement the data augmentation procedure, information regarding the stochastic form of measurement errors in the precipitation, potential ET, and stream discharge observations must be provided (see Appendix B). Very little information about the actual measurement error distributions for areal precipitation, potential evapotranspiration, or outflow is available. Therefore, for precipitation we follow Vrugt et al. [2008] and Renard et al. [2010] in assuming mutually independent zero-mean Gaussian observation errors and set the uncertainty level (standard deviation) at 10%

[Vrugt et al., 2008]. For potential evapotranspiration we make the same assumption. For discharge we assume the Box–Cox transformed observations (λ = 0.3) to be corrupted by zero-mean, uncorrelated, constant standard deviation (σ = 2) Gaussian measurement noise [Sorooshian and Dracup, 1980; Thiemann et al., 2001]; selecting σ = 2 corresponds to about half the standard deviation of the transformed flows, so that the observed median discharge (0.35 mm/day for WY) lies within [0.1; 0.9] mm/day and the observed 1% exceedance discharge (18 mm/day for WY) lies within [13; 25] mm/day with 66% probability. This large level of streamflow measurement uncertainty is in agreement with other studies [Di Baldassarre and Montanari, 2009; McMillan et al., 2010]. Of course, the resulting estimate and convergence rate could be made dependent on initial guesses regarding model structure, input and output error structures, and their interactions [Renard et al., 2010], but this would require separate considerations not pursued in this work. Under these assumptions and conditions, a Bayesian posterior estimate of the system joint probability density function for the Leaf River basin was derived. The results are discussed below.

3.2. Posterior Estimate of the System Joint PDF

[42] First we compare the forms of the prior (Appendix D) and posterior mappings. Since all of the estimated quantities are stochastic while the prior is deterministic, we base the following comparisons on the expected values of the estimated mappings. The results, illustrated in Figure 2, show the following:

[43] 1. Whereas the prior estimate of effective precipitation P_eff (Figure 2a, light gray surface) is determined using a probability distribution for soil moisture store capacities (PDM model) [Moore, 2007] (Appendix D), the posterior estimate of effective precipitation (dark gray surface) exhibits a threshold-like dependence on soil moisture.
The response is very close to zero when the soil moisture, SM, is low but increases rapidly after some threshold value of soil moisture is exceeded.

[44] 2. Whereas the prior estimate of actual evapotranspiration AE (Figure 2b, light gray surface) is determined by the equation AE = (SM/C_max) PE, indicating that AE is linearly proportional to each of the controlling variables (potential evapotranspiration PE and soil moisture ratio SM/C_max), the posterior estimate of actual ET (dark gray surface) appears to be nonlinearly dependent on soil moisture (compare low and high soil moisture behaviors).

[45] 3. Whereas the prior estimate of outflow QQ from quick flow storage is linearly proportional to the corresponding storage level, this linear relationship remains for the posterior estimate (result not shown). However, the third store exhibits an almost one-to-one relationship between its storage and outflow (based on visual inspection and linear regression slope). This observation suggests a possible conceptual model simplification by removing one of the quick flow stores from the series.

[46] 4. Whereas the prior estimate of outflow QS from slow flow storage (Figure 2c, light gray surface) is determined by the equation QS_t = K_S XS_t, indicating that QS is linearly dependent on storage XS in the slow flow tank, the posterior estimate (dark gray surface) shows a nonlinear dependence on slow flow storage level, with different rates of outflow at smaller and larger storage depths.

3.3. Posterior Estimate of the HyMod Model Equations

[47] Following the inferences made by examining the posterior estimate of the system jpdf (see above), we revised the model structure in two stages while maintaining a similar overall conceptual structure.
First, we removed one of the stores used for quick flow routing (previously three stores) and repeated the procedure for the posterior model estimation (i.e., construction of the prior followed by data augmentation) using only the remaining four state variables. This results in a new posterior system jpdf.

[48] Next, we modified the model structural equations as follows. For each of the functional dependencies of interest, we made extended state particle draws from the new posterior system jpdf. An illustration of this is provided in Figure 3 (draws are indicated by light gray dots). The distribution of these draws in the predictor-predictand space indicates the stochastic functional dependence among these variables, as determined by the data augmentation process. Next, via an empirical trial-and-error process (using subjective judgment guided by commonly used functional forms from the literature [e.g., Moore, 2007; Liang et al., 1994; Young and Beven, 1994]; Appendix D), we selected deterministic equations that characterize the observed functional dependencies (shown in Figure 3b as the deterministic surface passing through the points). Using this process we selected the following posterior forms for the deterministic model equations (AE as a function of SM and PE, P_eff as a function of SM and P, QQ_1 as a function of P_fast and XQ_1, QQ_2 as a function of XQ_2 and QQ_1, and QS as a function of XS).

[49] 1. AE is modeled as a nonlinear function of soil moisture storage SM (Figure 3a); compare with Figure 2a and Appendix D for the prior model:

AE = min[PE, PE · f(SM; α_ae,1, α_ae,2, α_ae,3, α_ae,4)],

where f is a nonlinear function of SM, with estimated parameters α_ae = ( ).

[50] 2. The effective precipitation generation rate depends on a combination of soil moisture level and precipitation rate, so that when the combination is low (dry soil and/or low precipitation) effective precipitation is relatively lower than when the soil moisture/precipitation combination is high (wet soil, medium/high precipitation).
Each of the two components is represented as a PDM model (equations (D1)-(D4) in Appendix D), referred to here as PDM_1 and PDM_2, with parameters C_max,k and b_k, k = 1, 2 (Figure 3b); compare with Figure 2b and Appendix D for the prior model:

P_eff = PDM_1(SM, P),      if P + α_peff,1·SM − α_peff,2 < 0
P_eff = PDM_2(SM, P) + Δ,  if P + α_peff,1·SM − α_peff,2 ≥ 0

Δ = PDM_1(SM, α_peff,2 − α_peff,1·SM) − PDM_2(SM, α_peff,2 − α_peff,1·SM)

Here Δ ensures continuity of the function at the switching boundary, and the estimated parameters are α_peff = ( ), (C_max,1, b_1) = ( ), and (C_max,2, b_2) = (2, ). It should be pointed out that there is a relatively

Figure 2. Prior (light gray) and posterior (dark gray) estimates of the expected value mapping surfaces for HyMod with three quick stores: (a) effective precipitation, (b) actual evapotranspiration, and (c) slow flow.

Figure 3. Mathematical models fitted to posterior system variable draws for HyMod with two quick stores: (a) effective precipitation, (b) actual evapotranspiration, and (c) slow flow. Light gray dots are draws from the posterior joint density function.

Figure 4. HyMod model flow predictions for the evaluation period in (a) raw and (b) Box-Cox transformed space given by the prior deterministic model, posterior stochastic model (95% confidence intervals), and posterior deterministic model.

high uncertainty in the posterior mapping at high soil moisture-precipitation combinations, probably indicating a need for a more complex effective runoff representation (i.e., a higher number of states and fluxes in the conceptual model).

[51] 3. The portion of effective precipitation that goes to quick flow routing (P_fast) is modeled as a simple fraction of P_eff:

P_fast = α_Pfast · P_eff,

with estimated parameter α_Pfast = ( ).

[52] 4. Quick flow QQ_1 depends linearly on quick flow storage XQ_1 and the current-time influx P_eff, and QQ_2 depends linearly on quick flow storage XQ_2 with a contribution from the current-time influx QQ_1:

QQ_1 = α_QQ1,1 · XQ_1 + α_QQ1,2 · P_eff
QQ_2 = α_QQ2,1 · XQ_2 + α_QQ2,2 · QQ_1

with estimated parameters α_QQ1 = ( ), α_QQ2 = ( ).

[53] 5. Slow flow QS is modeled as nonlinearly dependent (with a threshold) on slow flow storage XS (Figure 3c); compare with Figure 2c for the prior model (note that the minor dependence on P_eff has been ignored):

QS = α_QS,2 · XS,                                      if XS ≤ α_QS,1
QS = α_QS,2 · α_QS,1 + α_QS,3 · (XS − α_QS,1)^α_QS,4,  if XS > α_QS,1

with estimated parameters α_QS = ( ).

Table 1. Standard Deviations of the Residual Error Distribution and Reliability (α and ξ) and Sharpness (π) Statistics for Streamflow Estimates Given by the Prior and Posterior HyMod Models^a
[Columns: Prior Deterministic HyMod; Posterior Stochastic HyMod; Posterior Deterministic HyMod. Rows: standard deviation; calibration period α, ξ, π; evaluation period α, ξ, π. π for the posterior stochastic model: 3.4 (calibration period), 3.29 (evaluation period); other entries omitted.]
^a Desirable values for α and ξ are 1 and 0.998, respectively.

[54] Use of these equations gives us a posterior (modified) deterministic version of HyMod. Although the modified model has a larger number of parameters than the prior model, their estimates are based on a larger amount of information than that contained in just the rainfall-discharge time series.
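For concreteness, the revised flux equations can be sketched in code. The parameter values below are purely illustrative (the fitted estimates are not reproduced here), and the simplified `pdm_runoff` contributing-fraction form stands in for the full PDM water balance of equations (D1)-(D4):

```python
def pdm_runoff(sm, p, cmax, b):
    # Simplified PDM contributing-area runoff: the fraction of rainfall that
    # becomes effective precipitation grows with soil moisture (a stand-in
    # for equations (D1)-(D4), not the full water-balance bookkeeping).
    frac = 1.0 - (1.0 - min(sm / cmax, 1.0)) ** b
    return frac * p

# Illustrative (hypothetical) parameters; the fitted values are not shown above.
A_PEFF = (0.5, 2.0)                      # (alpha_peff,1, alpha_peff,2)
PDM1 = dict(cmax=300.0, b=0.5)
PDM2 = dict(cmax=2.0, b=1.5)

def effective_precip(sm, p):
    # Two-regime P_eff with the continuity correction Delta evaluated at
    # the switching boundary P* = alpha_peff,2 - alpha_peff,1 * SM.
    p_star = A_PEFF[1] - A_PEFF[0] * sm
    if p + A_PEFF[0] * sm - A_PEFF[1] < 0:
        return pdm_runoff(sm, p, **PDM1)
    delta = pdm_runoff(sm, p_star, **PDM1) - pdm_runoff(sm, p_star, **PDM2)
    return pdm_runoff(sm, p, **PDM2) + delta

def quick_flows(xq1, xq2, p_eff, a_qq1=(0.4, 0.2), a_qq2=(0.4, 0.3)):
    # Linear quick flow reservoirs with current-time influx contributions.
    qq1 = a_qq1[0] * xq1 + a_qq1[1] * p_eff
    qq2 = a_qq2[0] * xq2 + a_qq2[1] * qq1
    return qq1, qq2

def slow_flow(xs, a_qs=(10.0, 0.05, 0.3, 1.5)):
    # Threshold-nonlinear slow flow: linear below the threshold storage
    # a_qs[0], power-law growth above it; continuous at the threshold.
    a1, a2, a3, a4 = a_qs
    if xs <= a1:
        return a2 * xs
    return a2 * a1 + a3 * (xs - a1) ** a4
```

Both piecewise mappings are continuous at their switching points by construction, which is the role of Δ in the P_eff equation.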
That is, the parameters of the revised model equations were not calibrated in the traditional sense; instead, their values were adjusted so that the equations match the high-probability behavior of the corresponding joint pdfs.

Streamflow Predictions

[55] To demonstrate the model performance improvement achieved via data augmentation and model structure correction, we show hydrograph simulations in both raw and Box-Cox transformed spaces (Figure 4) for a portion of the 1972 WY (from the model evaluation period) that was characterized by a number of hydrological events. Most of the streamflow observations are bracketed by the 95% confidence intervals estimated by the stochastic posterior model. Figure 4 demonstrates model corrections to (1) high flow and (2) flow recession representation. Whereas the prior deterministic model tends to overpredict high and recession flows, the deterministic posterior provides estimates that are closer to the observed streamflow values.

[56] To evaluate the predictive quality of the model, the time series of pdfs of streamflow observations (true streamflow values are not available) was compared to the model-generated time series of pdfs of predictions of the streamflow observations (not true flows) (Appendix E). Note that the pdfs of predictions of the streamflow observations will be wider than the pdfs of predictions of true streamflow produced by a model (because of the observational errors). To produce predictions of streamflow observations, observational noise was added to the model predictions in the Box-Cox transformed flow space (see section 3.1). The standard deviations of observational noise applied to the three models (the prior deterministic, the posterior stochastic, and the posterior deterministic models) were estimated from the flow residuals in Box-Cox transformed flow space for the calibration/structure estimation period ( ) (Table 1).
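A minimal sketch of this residual-based noise estimate, assuming the Box-Cox transform with λ = 0.3 from section 3.1 and a short synthetic observed/simulated flow pair (the real series are not reproduced here):

```python
def boxcox(q, lam=0.3):
    # Box-Cox transform of flow (lambda = 0.3, as in section 3.1)
    return (q ** lam - 1.0) / lam

def inv_boxcox(y, lam=0.3):
    # Inverse transform, mapping transformed values back to flow space
    return (lam * y + 1.0) ** (1.0 / lam)

def residual_std(observed, simulated, lam=0.3):
    # Sample standard deviation of residuals in Box-Cox transformed space,
    # used here to parameterize a model's observational noise.
    res = [boxcox(o, lam) - boxcox(s, lam) for o, s in zip(observed, simulated)]
    mean = sum(res) / len(res)
    var = sum((r - mean) ** 2 for r in res) / (len(res) - 1)
    return var ** 0.5

# Synthetic example (hypothetical flows, mm/day)
obs = [0.3, 1.2, 5.0, 18.0, 0.8]
sim = [0.4, 1.0, 6.0, 15.0, 0.7]
sigma_hat = residual_std(obs, sim)
```

The same `inv_boxcox` mapping is what carries a Gaussian noise band in transformed space back to the asymmetric discharge intervals quoted in section 3.1.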
As should be expected, the standard deviation of observation error for the posterior stochastic model was found to be lower (0.94) than for both the prior deterministic (1.44) and the posterior deterministic (1.5) models. A Kolmogorov-Smirnov test indicated that the hypothesis that the streamflow observation error distribution is Gaussian in Box-Cox transformed space cannot be rejected at the 0.05 significance level for both the stochastic and deterministic posterior models, and at the 0.02 significance level for the prior model.

[57] Further, the reliability and sharpness [Laio and Tamea, 2007; Renard et al., 2010] of the predictions were evaluated as follows. Reliability was assessed graphically using QQ plots (Figure 5) and using two reliability indexes (α and ξ) that quantify information from the plot. Sharpness (π) was calculated as the average relative precision of the predictions of the true (not the observed) values (see Renard et al. [2010] and Appendix E). Table 1 shows results for both the calibration (structure estimation) and evaluation (performance estimation) periods. Note that sharpness requires a prediction pdf to calculate the prediction standard deviations and expected values at each time (Appendix E); hence the measure cannot be computed (and makes no sense) for deterministic model predictions (the standard deviation is 0). The results (Figure 5 and Table 1) indicate that the posterior models provide better reliability scores (smaller areas between the posterior QQ curves and the 1:1 line), but all three models (prior deterministic, posterior stochastic, and posterior deterministic) tend to overpredict the flows (the corresponding QQ curves lie below the 1:1 line). Further, the sharpness values for the stochastic posterior predictions correspond (on average) to 30% relative prediction bounds (derived as the inverse of the prediction sharpness π from Table 1).
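The predictive QQ diagnostic can be sketched as follows. The α index here uses one common formulation (one minus twice the mean absolute departure of the sorted predictive p-values from the uniform quantiles), and both the Gaussian predictive form and the exact scalings of `reliability_alpha` and `sharpness_pi` are assumptions of this sketch rather than the paper's definitions:

```python
import math

def normal_cdf(x, mu, sigma):
    # Gaussian CDF via the error function
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def reliability_alpha(observations, pred_means, pred_stds):
    # Predictive QQ diagnostic: the p-values of the observations under their
    # predictive distributions should be U(0,1) if predictions are reliable.
    # alpha summarizes the departure of the sorted p-values from the 1:1
    # line (1 = perfectly reliable); this scaling is an assumption here.
    p = sorted(normal_cdf(o, m, s)
               for o, m, s in zip(observations, pred_means, pred_stds))
    n = len(p)
    theo = [(i + 1) / (n + 1) for i in range(n)]
    return 1.0 - 2.0 * sum(abs(a - b) for a, b in zip(p, theo)) / n

def sharpness_pi(pred_means, pred_stds):
    # Average relative precision of the predictions (inverse coefficient of
    # variation), a sketch of the sharpness statistic: pi = 3.4 would
    # correspond to roughly 30% relative prediction bounds.
    n = len(pred_means)
    return sum(m / s for m, s in zip(pred_means, pred_stds)) / n
```

Points of the QQ plot itself are simply the sorted p-values plotted against `theo`; a curve below the 1:1 line indicates systematic overprediction, as noted for Figure 5.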
Summary and Discussion

[58] This paper discusses how Bayesian data assimilation can be used to correct the mathematical structure (equations) of a model to be more consistent with the observed data,

Figure 5. QQ plots for the calibration (structure estimation, ) and validation (structure evaluation, ) periods for the deterministic prior, stochastic posterior, and deterministic posterior models.

conditional on the assumption that a given conceptual structure for the system is correct. Correction of the equations is achieved by treating the model mappings as conditional density functions of the model outputs for given values of the model inputs and states, as described by Bulygina and Gupta [2009]. Beginning with a prior estimate of the system joint probability density function (constructed from an existing model of the system), a data augmentation procedure is used to estimate the posterior density function in such a way that the physical and mathematical restrictions imposed both by the conceptual model of the system and by the behavioral information implicitly contained in the observed data are satisfied. The conditions governing convergence are discussed.

[59] The applicability of the approach to model structural diagnosis and correction was illustrated via a case study involving catchment-scale, daily time step modeling of the input-state-output response of the Leaf River basin. On the basis of prior work, the conceptual structure of the HyMod model was assumed to be suitable, and the existing HyMod deterministic model equations were used to construct a prior form for the system joint probability density function. Analysis of the results suggested that, in addition to reducing the number of quick flow stores from three to two, there was a need to modify several of the model equations, including the functional form used to compute precipitation excess, the equation used to compute actual evapotranspiration, and the equations controlling the quick and slow flow components of discharge.
Therefore, a modified set of deterministic model equations was fit to match the behavior indicated by the joint pdfs relating the corresponding groups of predictor-predictand variables, resulting in a revised deterministic form for the mathematical structure of the HyMod model. The revised posterior stochastic model attributes less of the simulated-versus-observed flow discrepancy (the residual error) to errors in the observations than does the prior deterministic model, thereby indicating that more of the flow variability is attributable to the mathematical structure of the model and less to the effective measurement error (in contrast with the observation error modeling approach of Abramowitz et al. [2006]).

[60] The inferred posterior model suggested the need for changes in the form of the underlying deterministic mathematical model and indicated that additional uncertainty should be ascribed to some of the mathematical model components (e.g., higher uncertainty in the estimates of effective rainfall under conditions of simultaneously high soil moisture and rainfall). The latter suggests that a more complex model configuration (having a larger number of states and fluxes related to effective rainfall production) might be necessary to achieve a more accurate representation of the system response. While this issue was not pursued further in this paper, one possible approach might be to combine the proposed Bayesian model correction methodology with approaches that assess multiple system conceptualizations [e.g., Fenicia et al., 2008; Clark et al., 2008].

[61] Both the stochastic and deterministic posterior models provided better calibration and evaluation period performance than the original model, as evaluated by QQ plots and related statistics.
While the revised deterministic model structure can certainly be used for streamflow prediction, improved accuracy (a smaller standard deviation of residuals) can be obtained by using the posterior probabilistic model structure, which provides a full representation of the (posterior) mathematical structure uncertainty (conditional on the conceptual model structure). In general, the major improvements in model behavior are reduced model flashiness and improved consistency during flow recession (accounting for measurement uncertainty).

[62] The results described in this paper, although preliminary, serve to demonstrate the powerful potential of combining the Bayesian estimation of structure approach with data augmentation. We recognize that the current implementation is computationally intensive and will require technical improvements to become more widely applicable and useful. Among the developments and improvements we are continuing to explore are the following: (1) the dependence of the posterior estimate and convergence rate on input/output measurement error structure assumptions and the prior guess regarding model structure, (2) more efficient methods for

sampling from the complex probability densities representative of the analysis and smoothing probability density functions at each time step, and (3) methods for testing the representativeness and sufficiency of the selected data periods, to improve confidence that the derived mathematical structure can be considered reliable for predictions in other periods.

[63] In conclusion, the problem of model structure correction/improvement/updating is one that needs considerable additional attention from the hydrologic community. While considerable progress has been made beyond treating model identification primarily as curve fitting or as parameter estimation in a deterministic sense, much more needs to be done to understand how to diagnostically exploit both prior information regarding a system and the various kinds of information contained within the data, so as to arrive at (progressive) strategies for model improvement [Gupta et al., 2008]. The current state of the art in Bayesian statistics provides one useful approach for exploring such issues but may not necessarily address all aspects of this topic (particularly for higher order, more complex models). Moreover, the potentially strong dependence of Bayesian approaches on the selected prior must not be ignored. In this regard, equally (if not more) important than equation correction is the issue of conceptual model error, which may require other strategies to be brought to bear. As always, we invite dialog on these and related issues of model identification and welcome collaborations and/or suggestions for improvement of the methods presented here.

Appendix A: Distribution Approximation by a Mixture of Multivariate Gaussian Densities

[64] In this appendix we list some key equations for the approximation of a probability density function by a mixture of Gaussian densities (for details see Müller et al. [1996]).
Given observations D_1:T, a probability density function p(s|D_1:T) can be approximated via a mixture of multivariate Gaussian densities as follows:

p(s|D_1:T) = ∫ p(s|θ) p(θ|D_1:T) dθ,   (A1)

where θ = (θ_1, …, θ_T), with θ_i = (μ_i, W_i), i = 1, …, T, are the parameters of a Gaussian mixture with means μ_i and correlation matrices W_i. As discussed in section 2, an iterative data augmentation algorithm can be implemented to approximate p(θ|D_1:T), so that after each iteration step the density estimate is expressed as (1/m) Σ_{j=1}^{m} p(θ|s_1:T^(j)), leading to the following approximation of (A1):

p(s|D_1:T) ≈ (1/m) Σ_{j=1}^{m} ∫ p(s|θ) p(θ|s_1:T^(j)) dθ.   (A2)

[65] The second density in the integrand is the posterior distribution for the mixture parameters,

p(θ|s_1:T) ∝ Π_{i=1}^{T} f(s_i|θ_i) [a G_0(θ_i) + Σ_{j<i} δ_{θ_j}(θ_i)] / (a + i − 1),   (A3)

where f(s_i|θ_i) is the likelihood component for θ_i derived from the Gaussian density of s_i given θ_i, and a and G_0 characterize the Dirichlet process [Antoniak, 1974] used to specify a prior for the mixture parameters. If a is very large, then the number of distinct θ_i in θ could be large (up to the number of data points T); this number decreases as a goes to zero.

[66] The first density in the integrand of (A2) is simply a mixture of Gaussian densities with parameters θ, plus a special summand based on the Dirichlet process prior that accounts for new conditions:

p(s|θ) = [a/(a + T)] ∫ f(s|φ) dG_0(φ) + [1/(a + T)] Σ_{i=1}^{T} f(s|θ_i).   (A4)

In any parameter set θ there will be some T* ≤ T distinct values. Antoniak [1974] gives an estimate for the number of distinct components indicating that T* is typically very small compared to T, so the model essentially implies that the data are drawn from a mixture of a small number of normals.
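The smallness of T* can be illustrated by sampling the partition implied by the Dirichlet process prior via its Chinese restaurant urn representation, which mirrors the a·G_0 term plus the sum of point masses appearing in (A3) and (A4); the values of T and a below are arbitrary choices for this sketch:

```python
import math
import random

def crp_partition(T, a, rng):
    # Chinese restaurant process: item i joins an existing cluster with
    # probability proportional to its size, or starts a new cluster with
    # probability proportional to a (the a*G_0 term in (A3)-(A4)).
    counts = []
    for i in range(T):
        r = rng.random() * (a + i)
        if r < a:
            counts.append(1)            # new component, drawn from G_0
            continue
        acc = a
        for k, c in enumerate(counts):
            acc += c
            if r < acc:
                counts[k] += 1
                break
    return counts

rng = random.Random(0)                  # fixed seed for reproducibility
T, a = 500, 2.0
counts = crp_partition(T, a, rng)
t_star = len(counts)                    # number of distinct components
# Antoniak's result gives E[T*] of roughly a * ln(1 + T/a), far below T
expected_t_star = a * math.log(1.0 + T / a)
```

With these settings, T* typically lands near the approximation of about a·ln(1 + T/a) ≈ 11 distinct components out of T = 500 draws, illustrating why the mixture behaves as one with a small number of normals.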
[67] The integrals in (A2) may be approximated using several parameter draws θ^(j,k), where θ^(j,k) represents the kth parameter draw from the density p(θ|s_1:T^(j)), so that

p(s|D_1:T) ≈ [1/(mp)] Σ_{j=1}^{m} Σ_{k=1}^{p} p(s|θ^(j,k)).   (A5)

When calculating the conditional density of outputs y given some inputs u and states x, equation (A4) becomes

p(y|u, x, θ) = s_0(u, x) p_0(y|u, x) + Σ_{i=1}^{T} s_i(u, x) f_i(y|u, x, θ_i),   (A6)

where p_0 is the conditional density of y given u and x based on the base measure G_0, and f_i is the conditional Gaussian density of y given u and x under the jointly Gaussian f(s|θ_i). The corresponding weights s_i(u, x), i = 0, …, T, are functions of the marginal densities of u and x under the base prior G_0 and the jointly Gaussian f_i, respectively. These weights ensure that the ith component is more highly weighted in predicting y when the value of the corresponding marginal density (p_0 or f_i) is relatively large. Thus, u and x values close to a particular component represented in θ (i = 1, …, T) imply that the conditional density function of that component dominates the predictions; otherwise (u and x far from the components represented in θ), the conditional density based on the base prior G_0 is favored, reflecting the occurrence of new conditions (u and x).

Appendix B: Estimation of the Extended State Probability Density Function

[68] In this section we discuss estimation of the extended state, s_t = (u_t, x_t, y_t), at each time t = 1, …, T, given the conceptual model, the mathematical model parameterization φ, and the observed data D_1:T. Following a Bayesian approach, the extended state estimation problem can be solved via data assimilation [Wikle and Berliner, 2007] by deriving the smoothing probability density function p(s_t|φ, D_1:T) for the extended state. The smoothing density estimates the system state at some past time and is expressed as

p(s_t|φ, D_1:T) = p(s_t|φ, D_1:t) ∫ [p(s_{t+1}|s_t, φ) p(s_{t+1}|φ, D_1:T) / p(s_{t+1}|φ, D_1:t)] ds_{t+1}.   (B1)
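For a toy discrete state space, the recursion (B1) can be implemented directly: a forward filtering pass produces the filtered densities p(s_t|φ, D_1:t) and the one-step predictive densities p(s_t|φ, D_1:t−1), and a backward pass applies (B1) term by term. The two-state transition matrix and likelihood values below are hypothetical:

```python
def forward(prior, trans, liks):
    # Forward pass: filtered p(s_t|D_1:t) and predictive p(s_t|D_1:t-1).
    filtered, predicted = [], []
    pred = prior[:]
    for lik in liks:
        predicted.append(pred)
        w = [p * l for p, l in zip(pred, lik)]
        z = sum(w)
        filt = [x / z for x in w]
        filtered.append(filt)
        pred = [sum(filt[j] * trans[j][k] for j in range(len(prior)))
                for k in range(len(prior))]
    return filtered, predicted

def smooth(filtered, predicted, trans):
    # Backward recursion of (B1), with the integral replaced by a sum:
    # p(s_t|D_1:T) = p(s_t|D_1:t) * sum_k trans[j][k]
    #                * p(s_t+1 = k|D_1:T) / p(s_t+1 = k|D_1:t)
    T = len(filtered)
    smoothed = [None] * T
    smoothed[-1] = filtered[-1][:]      # at t = T the smoother equals the filter
    for t in range(T - 2, -1, -1):
        smoothed[t] = [
            fj * sum(trans[j][k] * smoothed[t + 1][k] / predicted[t + 1][k]
                     for k in range(len(trans)))
            for j, fj in enumerate(filtered[t])
        ]
    return smoothed

# Hypothetical two-state example: persistence-dominated transitions and
# three time steps of made-up observation likelihoods.
trans = [[0.9, 0.1], [0.2, 0.8]]
prior = [0.5, 0.5]
liks = [[0.8, 0.1], [0.7, 0.2], [0.1, 0.9]]
filtered, predicted = forward(prior, trans, liks)
smoothed = smooth(filtered, predicted, trans)
```

Because the late observation strongly favors the second state, the smoothed density at earlier times shifts probability toward that state relative to the purely filtered density, which is exactly the backward flow of information that (B1) formalizes.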


More information

MEASUREMENT UNCERTAINTY AND SUMMARISING MONTE CARLO SAMPLES

MEASUREMENT UNCERTAINTY AND SUMMARISING MONTE CARLO SAMPLES XX IMEKO World Congress Metrology for Green Growth September 9 14, 212, Busan, Republic of Korea MEASUREMENT UNCERTAINTY AND SUMMARISING MONTE CARLO SAMPLES A B Forbes National Physical Laboratory, Teddington,

More information

Linear Regression Models

Linear Regression Models Linear Regression Models Model Description and Model Parameters Modelling is a central theme in these notes. The idea is to develop and continuously improve a library of predictive models for hazards,

More information

Rank Regression with Normal Residuals using the Gibbs Sampler

Rank Regression with Normal Residuals using the Gibbs Sampler Rank Regression with Normal Residuals using the Gibbs Sampler Stephen P Smith email: hucklebird@aol.com, 2018 Abstract Yu (2000) described the use of the Gibbs sampler to estimate regression parameters

More information

Bayesian Modeling of Conditional Distributions

Bayesian Modeling of Conditional Distributions Bayesian Modeling of Conditional Distributions John Geweke University of Iowa Indiana University Department of Economics February 27, 2007 Outline Motivation Model description Methods of inference Earnings

More information

Appendix: Modeling Approach

Appendix: Modeling Approach AFFECTIVE PRIMACY IN INTRAORGANIZATIONAL TASK NETWORKS Appendix: Modeling Approach There is now a significant and developing literature on Bayesian methods in social network analysis. See, for instance,

More information

Monte Carlo Studies. The response in a Monte Carlo study is a random variable.

Monte Carlo Studies. The response in a Monte Carlo study is a random variable. Monte Carlo Studies The response in a Monte Carlo study is a random variable. The response in a Monte Carlo study has a variance that comes from the variance of the stochastic elements in the data-generating

More information

Sequential Importance Sampling for Rare Event Estimation with Computer Experiments

Sequential Importance Sampling for Rare Event Estimation with Computer Experiments Sequential Importance Sampling for Rare Event Estimation with Computer Experiments Brian Williams and Rick Picard LA-UR-12-22467 Statistical Sciences Group, Los Alamos National Laboratory Abstract Importance

More information

9 Multi-Model State Estimation

9 Multi-Model State Estimation Technion Israel Institute of Technology, Department of Electrical Engineering Estimation and Identification in Dynamical Systems (048825) Lecture Notes, Fall 2009, Prof. N. Shimkin 9 Multi-Model State

More information

A Bayesian Nonparametric Approach to Monotone Missing Data in Longitudinal Studies with Informative Missingness

A Bayesian Nonparametric Approach to Monotone Missing Data in Longitudinal Studies with Informative Missingness A Bayesian Nonparametric Approach to Monotone Missing Data in Longitudinal Studies with Informative Missingness A. Linero and M. Daniels UF, UT-Austin SRC 2014, Galveston, TX 1 Background 2 Working model

More information

Lecture 2: From Linear Regression to Kalman Filter and Beyond

Lecture 2: From Linear Regression to Kalman Filter and Beyond Lecture 2: From Linear Regression to Kalman Filter and Beyond Department of Biomedical Engineering and Computational Science Aalto University January 26, 2012 Contents 1 Batch and Recursive Estimation

More information

L09. PARTICLE FILTERING. NA568 Mobile Robotics: Methods & Algorithms

L09. PARTICLE FILTERING. NA568 Mobile Robotics: Methods & Algorithms L09. PARTICLE FILTERING NA568 Mobile Robotics: Methods & Algorithms Particle Filters Different approach to state estimation Instead of parametric description of state (and uncertainty), use a set of state

More information

12 SWAT USER S MANUAL, VERSION 98.1

12 SWAT USER S MANUAL, VERSION 98.1 12 SWAT USER S MANUAL, VERSION 98.1 CANOPY STORAGE. Canopy storage is the water intercepted by vegetative surfaces (the canopy) where it is held and made available for evaporation. When using the curve

More information

Nonparametric Bayesian Methods (Gaussian Processes)

Nonparametric Bayesian Methods (Gaussian Processes) [70240413 Statistical Machine Learning, Spring, 2015] Nonparametric Bayesian Methods (Gaussian Processes) Jun Zhu dcszj@mail.tsinghua.edu.cn http://bigml.cs.tsinghua.edu.cn/~jun State Key Lab of Intelligent

More information

Understanding predictive uncertainty in hydrologic modeling: The challenge of identifying input and structural errors

Understanding predictive uncertainty in hydrologic modeling: The challenge of identifying input and structural errors Click Here for Full Article WATER RESOURCES RESEARCH, VOL. 46,, doi:10.1029/2009wr008328, 2010 Understanding predictive uncertainty in hydrologic modeling: The challenge of identifying input and structural

More information

BAYESIAN PROCESSOR OF OUTPUT: A NEW TECHNIQUE FOR PROBABILISTIC WEATHER FORECASTING. Roman Krzysztofowicz

BAYESIAN PROCESSOR OF OUTPUT: A NEW TECHNIQUE FOR PROBABILISTIC WEATHER FORECASTING. Roman Krzysztofowicz BAYESIAN PROCESSOR OF OUTPUT: A NEW TECHNIQUE FOR PROBABILISTIC WEATHER FORECASTING By Roman Krzysztofowicz Department of Systems Engineering and Department of Statistics University of Virginia P.O. Box

More information

Gaussian Process Approximations of Stochastic Differential Equations

Gaussian Process Approximations of Stochastic Differential Equations Gaussian Process Approximations of Stochastic Differential Equations Cédric Archambeau Dan Cawford Manfred Opper John Shawe-Taylor May, 2006 1 Introduction Some of the most complex models routinely run

More information

On the modelling of extreme droughts

On the modelling of extreme droughts Modelling and Management of Sustainable Basin-scale Water Resource Systems (Proceedings of a Boulder Symposium, July 1995). IAHS Publ. no. 231, 1995. 377 _ On the modelling of extreme droughts HENRIK MADSEN

More information

New Fast Kalman filter method

New Fast Kalman filter method New Fast Kalman filter method Hojat Ghorbanidehno, Hee Sun Lee 1. Introduction Data assimilation methods combine dynamical models of a system with typically noisy observations to obtain estimates of the

More information

TAKEHOME FINAL EXAM e iω e 2iω e iω e 2iω

TAKEHOME FINAL EXAM e iω e 2iω e iω e 2iω ECO 513 Spring 2015 TAKEHOME FINAL EXAM (1) Suppose the univariate stochastic process y is ARMA(2,2) of the following form: y t = 1.6974y t 1.9604y t 2 + ε t 1.6628ε t 1 +.9216ε t 2, (1) where ε is i.i.d.

More information

Approximate Bayesian Computation

Approximate Bayesian Computation Approximate Bayesian Computation Michael Gutmann https://sites.google.com/site/michaelgutmann University of Helsinki and Aalto University 1st December 2015 Content Two parts: 1. The basics of approximate

More information

Stat 542: Item Response Theory Modeling Using The Extended Rank Likelihood

Stat 542: Item Response Theory Modeling Using The Extended Rank Likelihood Stat 542: Item Response Theory Modeling Using The Extended Rank Likelihood Jonathan Gruhl March 18, 2010 1 Introduction Researchers commonly apply item response theory (IRT) models to binary and ordinal

More information

Inferring biological dynamics Iterated filtering (IF)

Inferring biological dynamics Iterated filtering (IF) Inferring biological dynamics 101 3. Iterated filtering (IF) IF originated in 2006 [6]. For plug-and-play likelihood-based inference on POMP models, there are not many alternatives. Directly estimating

More information

Rainfall-runoff modelling using merged rainfall from radar and raingauge measurements

Rainfall-runoff modelling using merged rainfall from radar and raingauge measurements Rainfall-runoff modelling using merged rainfall from radar and raingauge measurements Nergui Nanding, Miguel Angel Rico-Ramirez and Dawei Han Department of Civil Engineering, University of Bristol Queens

More information

Rao-Blackwellized Particle Filter for Multiple Target Tracking

Rao-Blackwellized Particle Filter for Multiple Target Tracking Rao-Blackwellized Particle Filter for Multiple Target Tracking Simo Särkkä, Aki Vehtari, Jouko Lampinen Helsinki University of Technology, Finland Abstract In this article we propose a new Rao-Blackwellized

More information

Models for models. Douglas Nychka Geophysical Statistics Project National Center for Atmospheric Research

Models for models. Douglas Nychka Geophysical Statistics Project National Center for Atmospheric Research Models for models Douglas Nychka Geophysical Statistics Project National Center for Atmospheric Research Outline Statistical models and tools Spatial fields (Wavelets) Climate regimes (Regression and clustering)

More information

Analysis of the Sacramento Soil Moisture Accounting Model Using Variations in Precipitation Input

Analysis of the Sacramento Soil Moisture Accounting Model Using Variations in Precipitation Input Meteorology Senior Theses Undergraduate Theses and Capstone Projects 12-216 Analysis of the Sacramento Soil Moisture Accounting Model Using Variations in Precipitation Input Tyler Morrison Iowa State University,

More information

Variable selection and machine learning methods in causal inference

Variable selection and machine learning methods in causal inference Variable selection and machine learning methods in causal inference Debashis Ghosh Department of Biostatistics and Informatics Colorado School of Public Health Joint work with Yeying Zhu, University of

More information

Efficient Data Assimilation for Spatiotemporal Chaos: a Local Ensemble Transform Kalman Filter

Efficient Data Assimilation for Spatiotemporal Chaos: a Local Ensemble Transform Kalman Filter Efficient Data Assimilation for Spatiotemporal Chaos: a Local Ensemble Transform Kalman Filter arxiv:physics/0511236 v1 28 Nov 2005 Brian R. Hunt Institute for Physical Science and Technology and Department

More information

STA 4273H: Statistical Machine Learning

STA 4273H: Statistical Machine Learning STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! rsalakhu@utstat.toronto.edu! http://www.utstat.utoronto.ca/~rsalakhu/ Sidney Smith Hall, Room 6002 Lecture 3 Linear

More information

Probabilistic Fundamentals in Robotics. DAUIN Politecnico di Torino July 2010

Probabilistic Fundamentals in Robotics. DAUIN Politecnico di Torino July 2010 Probabilistic Fundamentals in Robotics Gaussian Filters Basilio Bona DAUIN Politecnico di Torino July 2010 Course Outline Basic mathematical framework Probabilistic models of mobile robots Mobile robot

More information

Statistics: Learning models from data

Statistics: Learning models from data DS-GA 1002 Lecture notes 5 October 19, 2015 Statistics: Learning models from data Learning models from data that are assumed to be generated probabilistically from a certain unknown distribution is a crucial

More information

Inference and estimation in probabilistic time series models

Inference and estimation in probabilistic time series models 1 Inference and estimation in probabilistic time series models David Barber, A Taylan Cemgil and Silvia Chiappa 11 Time series The term time series refers to data that can be represented as a sequence

More information

Lawrence D. Brown* and Daniel McCarthy*

Lawrence D. Brown* and Daniel McCarthy* Comments on the paper, An adaptive resampling test for detecting the presence of significant predictors by I. W. McKeague and M. Qian Lawrence D. Brown* and Daniel McCarthy* ABSTRACT: This commentary deals

More information

A stochastic approach for assessing the uncertainty of rainfall-runoff simulations

A stochastic approach for assessing the uncertainty of rainfall-runoff simulations WATER RESOURCES RESEARCH, VOL. 40,, doi:10.1029/2003wr002540, 2004 A stochastic approach for assessing the uncertainty of rainfall-runoff simulations Alberto Montanari and Armando Brath Faculty of Engineering,

More information

FAV i R This paper is produced mechanically as part of FAViR. See for more information.

FAV i R This paper is produced mechanically as part of FAViR. See  for more information. Bayesian Claim Severity Part 2 Mixed Exponentials with Trend, Censoring, and Truncation By Benedict Escoto FAV i R This paper is produced mechanically as part of FAViR. See http://www.favir.net for more

More information

Simple closed form formulas for predicting groundwater flow model uncertainty in complex, heterogeneous trending media

Simple closed form formulas for predicting groundwater flow model uncertainty in complex, heterogeneous trending media WATER RESOURCES RESEARCH, VOL. 4,, doi:0.029/2005wr00443, 2005 Simple closed form formulas for predicting groundwater flow model uncertainty in complex, heterogeneous trending media Chuen-Fa Ni and Shu-Guang

More information

Lecture 2: From Linear Regression to Kalman Filter and Beyond

Lecture 2: From Linear Regression to Kalman Filter and Beyond Lecture 2: From Linear Regression to Kalman Filter and Beyond January 18, 2017 Contents 1 Batch and Recursive Estimation 2 Towards Bayesian Filtering 3 Kalman Filter and Bayesian Filtering and Smoothing

More information

PATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 13: SEQUENTIAL DATA

PATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 13: SEQUENTIAL DATA PATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 13: SEQUENTIAL DATA Contents in latter part Linear Dynamical Systems What is different from HMM? Kalman filter Its strength and limitation Particle Filter

More information

Eco517 Fall 2014 C. Sims FINAL EXAM

Eco517 Fall 2014 C. Sims FINAL EXAM Eco517 Fall 2014 C. Sims FINAL EXAM This is a three hour exam. You may refer to books, notes, or computer equipment during the exam. You may not communicate, either electronically or in any other way,

More information

Deep learning / Ian Goodfellow, Yoshua Bengio and Aaron Courville. - Cambridge, MA ; London, Spis treści

Deep learning / Ian Goodfellow, Yoshua Bengio and Aaron Courville. - Cambridge, MA ; London, Spis treści Deep learning / Ian Goodfellow, Yoshua Bengio and Aaron Courville. - Cambridge, MA ; London, 2017 Spis treści Website Acknowledgments Notation xiii xv xix 1 Introduction 1 1.1 Who Should Read This Book?

More information

Analysis of Uncertainty in Deterministic Rainfall Runoff Models

Analysis of Uncertainty in Deterministic Rainfall Runoff Models Analysis of Uncertainty in Deterministic Rainfall Runoff Models Wood, E.F. IIASA Research Report October 1974 Wood, E.F. (1974) Analysis of Uncertainty in Deterministic Rainfall Runoff Models. IIASA Research

More information

MODULE -4 BAYEIAN LEARNING

MODULE -4 BAYEIAN LEARNING MODULE -4 BAYEIAN LEARNING CONTENT Introduction Bayes theorem Bayes theorem and concept learning Maximum likelihood and Least Squared Error Hypothesis Maximum likelihood Hypotheses for predicting probabilities

More information

The relationship between catchment characteristics and the parameters of a conceptual runoff model: a study in the south of Sweden

The relationship between catchment characteristics and the parameters of a conceptual runoff model: a study in the south of Sweden FRIEND: Flow Regimes from International Experimental and Network Data (Proceedings of the Braunschweie _ Conference, October 1993). IAHS Publ. no. 221, 1994. 475 The relationship between catchment characteristics

More information

16 : Approximate Inference: Markov Chain Monte Carlo

16 : Approximate Inference: Markov Chain Monte Carlo 10-708: Probabilistic Graphical Models 10-708, Spring 2017 16 : Approximate Inference: Markov Chain Monte Carlo Lecturer: Eric P. Xing Scribes: Yuan Yang, Chao-Ming Yen 1 Introduction As the target distribution

More information

Efficient simulation of a space-time Neyman-Scott rainfall model

Efficient simulation of a space-time Neyman-Scott rainfall model WATER RESOURCES RESEARCH, VOL. 42,, doi:10.1029/2006wr004986, 2006 Efficient simulation of a space-time Neyman-Scott rainfall model M. Leonard, 1 A. V. Metcalfe, 2 and M. F. Lambert 1 Received 21 February

More information

Adaptation for global application of calibration and downscaling methods of medium range ensemble weather forecasts

Adaptation for global application of calibration and downscaling methods of medium range ensemble weather forecasts Adaptation for global application of calibration and downscaling methods of medium range ensemble weather forecasts Nathalie Voisin Hydrology Group Seminar UW 11/18/2009 Objective Develop a medium range

More information

The Particle Filter. PD Dr. Rudolph Triebel Computer Vision Group. Machine Learning for Computer Vision

The Particle Filter. PD Dr. Rudolph Triebel Computer Vision Group. Machine Learning for Computer Vision The Particle Filter Non-parametric implementation of Bayes filter Represents the belief (posterior) random state samples. by a set of This representation is approximate. Can represent distributions that

More information

Graphical Models for Statistical Inference and Data Assimilation

Graphical Models for Statistical Inference and Data Assimilation Graphical Models for Statistical Inference and Data Assimilation Alexander T. Ihler a Sergey Kirshner a Michael Ghil b,c Andrew W. Robertson d Padhraic Smyth a a Donald Bren School of Information and Computer

More information

σ(a) = a N (x; 0, 1 2 ) dx. σ(a) = Φ(a) =

σ(a) = a N (x; 0, 1 2 ) dx. σ(a) = Φ(a) = Until now we have always worked with likelihoods and prior distributions that were conjugate to each other, allowing the computation of the posterior distribution to be done in closed form. Unfortunately,

More information

THE COMPARISON OF RUNOFF PREDICTION ACCURACY AMONG THE VARIOUS STORAGE FUNCTION MODELS WITH LOSS MECHANISMS

THE COMPARISON OF RUNOFF PREDICTION ACCURACY AMONG THE VARIOUS STORAGE FUNCTION MODELS WITH LOSS MECHANISMS THE COMPARISON OF RUNOFF PREDICTION ACCURACY AMONG THE VARIOUS STORAGE FUNCTION MODELS WITH LOSS MECHANISMS AKIRA KAWAMURA, YOKO MORINAGA, KENJI JINNO Institute of Environmental Systems, Kyushu University,

More information

Appendix 1: UK climate projections

Appendix 1: UK climate projections Appendix 1: UK climate projections The UK Climate Projections 2009 provide the most up-to-date estimates of how the climate may change over the next 100 years. They are an invaluable source of information

More information

The Bias-Variance dilemma of the Monte Carlo. method. Technion - Israel Institute of Technology, Technion City, Haifa 32000, Israel

The Bias-Variance dilemma of the Monte Carlo. method. Technion - Israel Institute of Technology, Technion City, Haifa 32000, Israel The Bias-Variance dilemma of the Monte Carlo method Zlochin Mark 1 and Yoram Baram 1 Technion - Israel Institute of Technology, Technion City, Haifa 32000, Israel fzmark,baramg@cs.technion.ac.il Abstract.

More information

Modeling Hydrologic Chanae

Modeling Hydrologic Chanae Modeling Hydrologic Chanae Statistical Methods Richard H. McCuen Department of Civil and Environmental Engineering University of Maryland m LEWIS PUBLISHERS A CRC Press Company Boca Raton London New York

More information

Integrating Correlated Bayesian Networks Using Maximum Entropy

Integrating Correlated Bayesian Networks Using Maximum Entropy Applied Mathematical Sciences, Vol. 5, 2011, no. 48, 2361-2371 Integrating Correlated Bayesian Networks Using Maximum Entropy Kenneth D. Jarman Pacific Northwest National Laboratory PO Box 999, MSIN K7-90

More information

Tracking and Identification of Multiple targets

Tracking and Identification of Multiple targets Tracking and Identification of Multiple targets Samir Hachour, François Delmotte, Eric Lefèvre, David Mercier Laboratoire de Génie Informatique et d'automatique de l'artois, EA 3926 LGI2A first name.last

More information

Bayesian spectral likelihood for hydrological parameter inference

Bayesian spectral likelihood for hydrological parameter inference WATER RESOURCES RESEARCH, VOL.???, XXXX, DOI:10.1002/, 1 2 Bayesian spectral likelihood for hydrological parameter inference Bettina Schaefli 1 and Dmitri Kavetski 2 D. Kavetski, School of Civil, Environmental

More information

Expectation propagation for signal detection in flat-fading channels

Expectation propagation for signal detection in flat-fading channels Expectation propagation for signal detection in flat-fading channels Yuan Qi MIT Media Lab Cambridge, MA, 02139 USA yuanqi@media.mit.edu Thomas Minka CMU Statistics Department Pittsburgh, PA 15213 USA

More information

Reliability of Acceptance Criteria in Nonlinear Response History Analysis of Tall Buildings

Reliability of Acceptance Criteria in Nonlinear Response History Analysis of Tall Buildings Reliability of Acceptance Criteria in Nonlinear Response History Analysis of Tall Buildings M.M. Talaat, PhD, PE Senior Staff - Simpson Gumpertz & Heger Inc Adjunct Assistant Professor - Cairo University

More information

Fundamentals of Data Assimila1on

Fundamentals of Data Assimila1on 014 GSI Community Tutorial NCAR Foothills Campus, Boulder, CO July 14-16, 014 Fundamentals of Data Assimila1on Milija Zupanski Cooperative Institute for Research in the Atmosphere Colorado State University

More information