From flumes to rivers: Can sediment transport in natural alluvial channels be predicted from observations at the laboratory scale?

WATER RESOURCES RESEARCH, VOL. 45, W08433, doi:10.1029/2008wr007637, 2009

Emrah Dogan (1,2), Shivam Tripathi (1), Dennis A. Lyn (1), and Rao S. Govindaraju (1)

(1) School of Civil Engineering, Purdue University, West Lafayette, Indiana, USA.
(2) Department of Civil Engineering, Sakarya University, Esentepe, Sakarya, Turkey.

Received 3 December 2008; accepted 4 June 2009; published 25 August 2009.

Copyright 2009 by the American Geophysical Union. 0043-1397/09/2008WR007637

[1] Doubt regarding the applicability of laboratory results to alluvial streams has led some to develop sediment transport predictors based solely on field data, and most current sediment transport formulae have been calibrated at least partially on field data. This paper examines the transferability of flume results to the field by exploring the extent to which a unified approach to the prediction of (1) flow regime, (2) depth, and (3) total sediment transport can be developed entirely with laboratory data. Relevance vector machine (RVM)-based probabilistic models were constructed with only laboratory data, and their performances were tested against field data and found to be comparable with or better than currently available methods. Comparison of a laboratory-trained RVM with a field-trained RVM suggests that the prediction performances of the two models for unseen field data are not statistically different given the prediction uncertainty. For transferability, the choice of predictor variables is important, with successful predictors being characterized by similar probability distributions in the laboratory and field data, e.g., as quantified by the Kullback-Leibler divergence.

Citation: Dogan, E., S. Tripathi, D. A. Lyn, and R. S. Govindaraju (2009), From flumes to rivers: Can sediment transport in natural alluvial channels be predicted from observations at the laboratory scale?, Water Resour. Res., 45, W08433, doi:10.1029/2008wr007637.

1. Introduction

[2] Predictions of sediment transport in natural alluvial channels have relied on empirical or semiempirical models, the success of which depends on the availability and quality of comprehensive data sets used in their calibration. High-quality data from many flume experiments are readily available, but correspondingly high-quality field data remain relatively limited and will likely continue to remain so because of the difficulty and expense in making field measurements. It has been suggested in the literature, often tacitly and sometimes explicitly, that empirical models trained or calibrated purely from flume data do not generally perform well under field conditions [Brownlie, 1981a; Molinas and Wu, 2001], with the implication that scale effects may play an important role. A recent trend in statistical models is an exclusive reliance on field data [Nagy et al., 2002; Sinnakaudan et al., 2006]. The specific question of transferability, i.e., whether a laboratory-trained model can be applied to the field, has implications for the more fundamental question of whether different physical processes occur at the field and at the laboratory scale.

[3] Traditional approaches have relied heavily on dimensional analysis, but there is no consensus on an effectively complete set of variables governing sediment transport. Even if a complete set is known, scale effects may arise because the functional dependence on one or more dimensionless groups differs substantially between laboratory and field scales. Consequently, statistical fitting to laboratory data will never be exposed to the full range of functional behavior, and hence such laboratory-trained models will perform poorly when applied to predict field-scale phenomena.
While the choice of relevant dimensionless groups may be the primary determinant of the performance of a prediction, the regression technique used may also play a role. Conventional multivariate regression may be overly restrictive in requiring a prespecification of the functional form, often taken to be simple linear combinations of log-transformed dimensionless groups. More recently developed techniques that are nonlinear and less restrictive, for example, those based on artificial neural networks (ANN), may still encounter problems due to the presence of local minima and overfitting.

[4] Although the option of developing predictors from field data may seem attractive from a practical perspective because it presumably avoids scale effects, field data have notable shortcomings in developing physical models. In addition to their relative sparsity, uncertainties in the measurements not only of the output variables, but also of input variables, are generally significantly larger than in the laboratory. Additional complications arise in that variables in the field tend to be much more highly correlated. Channel width, flow depth, as well as sediment size and gradation, tend to be strongly correlated with channel slope, which may lead to difficulties in applying statistical techniques to obtain physically valid results. Stronger biases may also be present in that most of the field data may occupy only a small part of the parameter space, while other ranges that may be physically possible but not commonly occurring may be poorly represented.

[5] A large number of sediment transport models have been proposed in the literature [ASCE, 1975], most of which have been trained on both laboratory and field data, thereby obscuring the issue of transferability. This study examines whether alluvial stream predictors based solely on laboratory data can perform well under field conditions, and if so, whether factors conducive to transferability can be identified. The first question is addressed by comparing the performance of laboratory-trained models with that of traditional models and with a model trained only from field data. A partial answer to the second question is given based on an analysis of the statistical characteristics of chosen input parameters.

[6] The study makes use of the relevance vector machine (RVM) [Tipping, 2001], a statistical tool with a number of appealing attributes within the present context, namely:

[7] 1. RVM provides a probabilistic framework yielding both model and prediction errors, which not only quantify the inherent uncertainty in predictions, but also separate uncertainties due to noise in the training data from those due to extrapolation during prediction, thus facilitating comparison of models trained on laboratory data and field data.

[8] 2. RVM includes a Bayesian construct that induces regularization through automatic relevance determination, yielding more robust models that are not overly complex.

[9] 3. RVM design involves relatively little subjectivity compared to other common data-driven or machine-learning approaches.

[10] A minimally complete sediment transport model should predict flow regime, stage (or depth), and total sediment transport. Many proposed models, with a few notable exceptions [van Rijn, 1984a, 1984b, 1984c, 2007a, 2007b, 2007c; Brownlie, 1981a], address only one aspect of the sediment transport problem, and as such cannot be used in a fully predictive sense.
Because bed characteristics, flow resistance, and sediment transport rates are closely coupled, prediction of transport requires information regarding the flow depth, which must therefore also be predicted. A unified approach to the sediment transport problem within an RVM framework is thereby proposed. The use of data-driven or machine-learning tools for sediment transport prediction is not new [see, e.g., Bhattacharya et al., 2007; Nagy et al., 2002], but previous work has typically been limited to a single aspect, and above all has not addressed the issue of transferability.

[11] The remainder of the paper is structured as follows: (1) a mathematical formulation of RVM is first outlined, (2) the data sets used in this study are then described, (3) models to predict flow regime, flow depth, and total sediment load are described, and finally (4) results are presented.

2. The Relevance Vector Machine Methodology

[12] Finding relationships between independent variables (predictors) and a dependent variable (predictand) is a classical problem in statistical learning theory. If the predictand is a continuous variable, the learning problem is known as regression, whereas if the predictand is a categorical variable, the problem is one of classification. In the sediment transport literature, model relationships between predictors and predictand are often assumed to be effectively linear, e.g., when log transformed, but such a model assumption has been dictated more by the available regression tools than by any physical grounds. Over the last decade, the development of nonlinear regression models by using artificial neural networks (ANNs) [ASCE Task Committee on Artificial Neural Networks in Hydrology, 2000] has received much attention.
Mathematically, an ANN belongs to a class of universal approximators, i.e., under mild restrictions on its architecture, it can learn, from the training patterns, any nonlinear continuous function to an arbitrary degree of accuracy [Hornik et al., 1989]. This flexibility comes at a cost, however, in that traditional ANNs have several drawbacks, including the possibility of getting trapped in local minima, the subjectivity in the choice of model architecture, and the lack of control over the complexity of ANNs needed to avoid overfitting.

[13] Kernel methods in conjunction with Bayesian inference offer an elegant alternative to ANNs. The relevance vector machine (RVM) belongs to this class of methods and, like an ANN, is a universal approximator, but differs in that Bayesian inference is used to minimize overfitting. An attractive feature of RVMs is a probabilistic interpretation that yields prediction uncertainty. Applications of RVMs to hydrology include groundwater quality modeling [Khalil et al., 2005], modeling of chaotic hydrologic time series [Khalil et al., 2006], and studying the impact of climate change on regional hydrology [Ghosh and Mujumdar, 2008], but there have been few if any applications to sediment transport problems. Readers are referred to Tipping [2001], Schölkopf and Smola [2002], and Bishop [2006] for additional details on RVMs.

2.1. RVM for Regression

[14] Consider a finite training sample of $N$ patterns $\{(\mathbf{x}_i, y_i),\ i = 1, \ldots, N\}$. The $i$th input pattern $\mathbf{x}_i$ denotes a vector of $m$ independent variables (i.e., $\mathbf{x}_i = [x_{i1}, \ldots, x_{im}] \in \Re^m$ and $\mathbf{X} = [\mathbf{x}_1, \ldots, \mathbf{x}_N]^T$), and $y_i \in \Re$ (with $\mathbf{y} = [y_1, \ldots, y_N]^T$) is the corresponding dependent variable.
Further, let the regression relationship be defined as

$$ y_i = f(\mathbf{x}_i; \mathbf{X}, \mathbf{w}) + \varepsilon_i = \sum_{j=0}^{N} w_j K(\mathbf{x}_i, \mathbf{x}_j) + \varepsilon_i \tag{1} $$

where $\mathbf{w} = [w_0, w_1, \ldots, w_N]^T$ is a weight vector, $\varepsilon_i$ is an error term assumed to have a Gaussian distribution with mean zero and variance $\sigma_\varepsilon^2$, and $K(\mathbf{x}_i, \mathbf{x}_j)$ is a kernel function given as

$$ K(\mathbf{x}_i, \mathbf{x}_j) = \begin{cases} 1 & \text{if } j = 0 \\ \exp\left(-\|\mathbf{x}_i - \mathbf{x}_j\|^2 / \sigma_{\mathrm{kernel}}^2\right) & \text{otherwise} \end{cases} \tag{2} $$

where $\sigma_{\mathrm{kernel}}$ is the width of the kernel function. The training of an RVM model involves estimation of the parameters $\mathbf{w}$, $\sigma_\varepsilon^2$, and $\sigma_{\mathrm{kernel}}$. The likelihood of the data set for this model can be written as

$$ p(\mathbf{y} \mid \mathbf{w}, \sigma_\varepsilon^2) = \prod_{i=1}^{N} \frac{1}{\sqrt{2\pi\sigma_\varepsilon^2}} \exp\left\{ -\frac{1}{2\sigma_\varepsilon^2} \left[ y_i - f(\mathbf{x}_i; \mathbf{X}, \mathbf{w}) \right]^2 \right\} \tag{3} $$

In RVMs, learning is achieved by specifying a prior distribution for the model parameters and estimating a posterior distribution for the parameters by using the likelihood function (equation (3)). In this study, an automatic relevance determination (ARD) prior of the following form was assigned to each weight:

$$ p(w_j \mid \alpha_j) = \mathcal{N}\left(0, \frac{1}{\alpha_j}\right) \tag{4} $$

where the $\alpha_j$ are called hyperparameters. The choice of a zero-mean Gaussian prior for the weights expresses a preference for smaller weights, and hence for a smoother model (equation (1)). During the process of learning, some of the $\alpha_j$ may approach infinity, so that the posterior distributions of the corresponding weights $w_j$ tend to delta functions centered at zero, and these weights are thus automatically omitted from equation (1). Only the remaining patterns corresponding to nonzero weights are deemed relevant for function approximation (hence the term relevance vector machine).

[15] Following Tipping [2001], noninformative priors were assigned to the model parameters, namely, $\alpha_j$ and the variance of the noise term, $\sigma_\varepsilon^2$. After assigning priors and defining the likelihood function (equation (3)), the posterior distribution of the model parameters was estimated using Bayes' rule. The posterior distribution for the weight vector turns out to be a Gaussian distribution, $p(\mathbf{w} \mid \mathbf{X}, \mathbf{y}) = \mathcal{N}(\boldsymbol{\mu}_w, \boldsymbol{\Sigma}_w)$, but those for $\alpha_j$ and $\sigma_\varepsilon^2$ do not have closed-form expressions, and were approximated by delta functions at their modes, $\alpha_{j\,\max}$ and $\sigma^2_{\varepsilon\,\max}$, respectively.

[16] The predictive distribution of the dependent variable $y^*$ for a new set of independent variables $\mathbf{x}^*$ is given by

$$ p(y^* \mid \mathbf{x}^*, \mathbf{X}, \mathbf{y}, \alpha_{j\,\max}, \sigma^2_{\varepsilon\,\max}) = \mathcal{N}(\mu_{y^*}, \sigma^2_{y^*}) \tag{5} $$

where the mean ($\mu_{y^*}$) and the variance ($\sigma^2_{y^*}$) of the predictive distribution are given by

$$ \mu_{y^*} = \boldsymbol{\mu}_w^T K(\mathbf{x}^*, \mathbf{X}) \tag{6} $$

$$ \sigma^2_{y^*} = \sigma^2_{\varepsilon\,\max} + K(\mathbf{x}^*, \mathbf{X})^T \boldsymbol{\Sigma}_w K(\mathbf{x}^*, \mathbf{X}) \tag{7} $$

The width of the kernel function, $\sigma_{\mathrm{kernel}}$, was determined following the procedure given by Tripathi and Govindaraju [2007].

2.2. RVM for Classification

[17] In a classification problem, the dependent variable $t_i$ is a binary variable, $t_i \in \{0, 1\}$.
The RVM framework can be extended to the classification problem by applying a sigmoid function to the regression model (equation (1)) as

$$ \varsigma(f) = \frac{1}{1 + e^{-f}} \tag{8} $$

where, to keep the notation uncluttered, $f(\mathbf{x}_i; \mathbf{X}, \mathbf{w})$ is represented by $f$. The likelihood of the data set for this classification model is given by

$$ p(\mathbf{t} \mid \mathbf{w}) = \prod_{i=1}^{N} [\varsigma(f)]^{t_i} [1 - \varsigma(f)]^{1 - t_i} \tag{9} $$

where $\mathbf{t} = [t_1, \ldots, t_N]$. The parameters of the classification model were the same as those for the regression model (equation (1)), except that there was no noise variance term $\sigma_\varepsilon^2$. Model parameters were estimated as before, by first assigning prior distributions and then estimating their posterior distributions using the likelihood function (equation (9)) and applying Bayes' rule. Unlike the regression model, the posterior distribution of $\mathbf{w}$ did not have a closed form, and it was also approximated by a delta function at its mode. The other aspects of the classification model formulation remained the same as for the regression model. For a new input vector, the RVM model makes a probabilistic prediction for the category of the dependent variable. Additional details of the RVM model for classification can be found in the work of Bishop [2006, p. 353].

3. Data Used in the Study

[18] Building on the works of Johnson [1943] and Peterson and Howells [1973], Brownlie [1981b] compiled an extensive set of laboratory and field data. In this study, additional recent data sets were organized in a format compatible with that used by Brownlie and then merged. The assembled database contains 9437 records (5594 laboratory and 3843 field), of which 2443 records (331 laboratory and 2112 field) are in addition to the Brownlie [1981b] compilation. A list of new data sources is given in Appendix A2.

[19] For the current study, the following restrictions were applied to the data sets:

[20] 1. Only data records with width-to-depth ratio, $B/D > 4$, were used in order to avoid any significant sidewall effects in the laboratory data.

[21] 2. The relative roughness ($R/d_{50}$), where $R$ is the hydraulic radius and $d_{50}$ is the median sediment size, was limited to $R/d_{50} > 100$ to avoid effects due to extremely shallow conditions.

[22] 3. Sediment sizes were restricted to 0.062 mm < $d_{50}$ < 2.0 mm, and so the study was limited to sand sizes. The geometric standard deviation of the distribution of sediment particles (gradation, $\sigma_g$) was limited to values less than 5 so as to avoid records that have excessive amounts of gravel or fine material.

[23] 4. Records with sediment concentration, $C$ < 10 ppm, were excluded due to concerns about measurement accuracy at low concentrations. This restriction was enforced only when predicting sediment transport (section 5.1.3).

Temperature was used in the calculation of the kinematic viscosity, $\nu$, of water. For data records with no water temperature data, a value of 10°C was assumed. A geometric standard deviation of 2 was assumed for the data records that did not have this information. Finally, 34 laboratory data points were identified as outliers (see Appendix A3) and were removed from the analysis.

4. Formulation of Models

[24] The flows in the alluvial channels (flumes and rivers) were assumed to be steady, uniform, and two-dimensional with equilibrium noncohesive sediment transport. The flow regime ($\mathcal{R}$), flow depth ($D$), and sediment concentration ($C$) were chosen as the dependent (unknown) variables to be determined from the following independent (measured) variables: (1) channel characteristics, slope $S$; (2) sediment characteristics, $d_{50}$ and specific gravity $s$; (3) flow/fluid characteristics, unit discharge $q$ and the kinematic viscosity of water $\nu$, together with the acceleration due to gravity ($g$).

[25] These variables are generally acknowledged to be relevant, and have been used in many previous models. Additional independent variables can be identified, but, like Bhattacharya et al. [2007], a minimal set was preferred. The effect of channel width, $B$, has been explicitly included by some [e.g., Nagy et al., 2002], but the traditional two-dimensional viewpoint is taken to justify the omission of $B$ as a relevant variable. Further, with the focus on sand sizes, the effect of gradation is considered secondary, and so $\sigma_g$ was not included.

[26] Dimensional analysis was then applied to the remaining variables to yield appropriate dimensionless groups, the choice of which was substantially influenced by previous effective relationships and by a transparent physical interpretation. Because a unified approach was sought, the same set of predictors (inputs) was to be used for all predictands. This contrasts with the approach of Brownlie [1981a], who also developed a complete model, but chose substantially different sets of predictors for the depth and transport predictions. The basic relationships for the flow regime, $\mathcal{R}$, the dimensionless depth, $D/D'$, and the transport concentration, $C$ (in ppm by weight), were chosen as

$$ \mathcal{R} = f_1(q_*, \tau_*, \tau'_*); \qquad \frac{D}{D'} = f_2(q_*, \tau_*, \tau'_*); \qquad C = f_3(q_*, \tau_*, \tau'_*, \tau_{*c}) \tag{10} $$

where $q_* = qS / \sqrt{g(s-1)d_{50}^3}$ is interpreted as a dimensionless stream power, and $\tau_*$ and $\tau'_*$ are forms of the Shields parameter, $\tau_* = RS / [(s-1)d_{50}]$. A primed quantity, such as $D'$ (or $R'$) and $\tau'_*$, indicates a quantity associated with the grain or skin friction.
The critical Shields parameter, $\tau_{*c}$, is associated with incipient sediment motion; a dimensionless sediment diameter, $d_* = \sqrt{g(s-1)d_{50}^3 / \nu^2}$, could have been used instead of $\tau_{*c}$, but $\tau_{*c}$ was chosen as being more directly related to transport. For consistency, $\tau_{*c}$ could have also been included in $f_1$ and $f_2$, but the RVM results suggested that its relevance to $f_1$ and $f_2$ was negligible, and so it was omitted in the regime and depth relationships. The total of four independent variables is a middle course compared with the six in the final ANN model of Nagy et al. [2002] and the three in the machine-learning total load models of Bhattacharya et al. [2007].

[27] In its explicit use of $D/D'$, the depth prediction model of equation (10) recalls the classical models of Einstein and Barbarossa [1952] and Engelund [1966], rather than the more direct modern approaches. Similarly, the related explicit use of $\tau'_*$ is rather uncommon in recently proposed transport formulae, though it is present in van Rijn's [1984a, 1984b] transport parameter, which is a combination of $\tau'_*$ and $\tau_{*c}$. The specification of $\tau'_*$, or equivalently $D'$, has traditionally been based on a plane fixed-bed friction relationship, either in a log-law form (as in the Engelund or van Rijn models) or in a Manning-Strickler form (as in the Einstein-Barbarossa method). For simplicity, the latter has been chosen, with the specific relationship used being

$$ \frac{R'}{d_{50}} = \left[ \frac{1}{6.74} \, \frac{(q/D)}{\sqrt{g S d_{50}}} \right]^{3/2} \tag{11} $$

For wide channels, $D' = R'$, while for narrow channels, $D' = R' / (1 - 2R'/B)$. The critical Shields stress, $\tau_{*c}$, was evaluated according to Sturm [2001], which is based on the experimental results of Yalin and Karahan [1979], for $(Re_*)_c = d_* \sqrt{\tau_{*c}} > 1$, and according to Brownlie [1981a] for $(Re_*)_c < 1$.
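As an illustration of how the predictors of equation (10) follow from the measured variables, the definitions above and equation (11) can be sketched in a few lines of Python. This is only a minimal sketch under stated assumptions: the function name, the default values of `s` and `nu`, the use of a rectangular cross section for the hydraulic radius, and the definition of the grain Shields parameter as the Shields parameter evaluated with R' in place of R are illustrative choices, not details taken from the study. The critical Shields parameter is omitted because its evaluation relies on relationships from the cited references.

```python
import math

G = 9.81  # gravitational acceleration (m/s^2)


def dimensionless_groups(q, D, S, d50, s=2.65, nu=1.0e-6, B=None):
    """Return the dimensionless groups for one record (SI units).

    q: unit discharge (m^2/s), D: flow depth (m), S: slope (-),
    d50: median grain size (m), s: specific gravity (assumed default),
    nu: kinematic viscosity (m^2/s, assumed default),
    B: channel width (m), or None for a wide channel.
    """
    # Hydraulic radius: R = D for a wide channel; otherwise a rectangular
    # section is assumed, consistent with D' = R' / (1 - 2 R'/B).
    R = D if B is None else B * D / (B + 2.0 * D)

    grain_scale = math.sqrt(G * (s - 1.0) * d50**3)
    q_star = q * S / grain_scale              # dimensionless stream power
    tau_star = R * S / ((s - 1.0) * d50)      # Shields parameter

    # Equation (11): Manning-Strickler grain hydraulic radius R'.
    R_grain = d50 * ((q / D) / (6.74 * math.sqrt(G * S * d50))) ** 1.5
    # Assumption: tau'_* is the Shields parameter evaluated with R'.
    tau_star_grain = R_grain * S / ((s - 1.0) * d50)
    # Grain-related depth D' for wide or narrow channels.
    D_grain = R_grain if B is None else R_grain / (1.0 - 2.0 * R_grain / B)

    d_star = grain_scale / nu                 # dimensionless grain diameter
    return {
        "q_star": q_star,
        "tau_star": tau_star,
        "tau_star_grain": tau_star_grain,
        "R_grain": R_grain,
        "D_grain": D_grain,
        "d_star": d_star,
    }
```

For a wide channel the sketch reproduces the identity D/D' = tau_*/tau'_*, which is a convenient sanity check on any implementation of these groups.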
Strictly speaking, $D/D' = \tau_*/\tau'_*$ is not independent of $\tau'_*$ and $\tau_*$, so that either $\tau'_*$ or $\tau_*$ could have been omitted from the specification of $D/D'$ in equation (10), but in the actual regression analysis, the added flexibility of retaining both groups was found advantageous in obtaining a better fit.

[28] It would have been desirable to examine a scaled concentration (analogous to the scaled depth, $D/D'$), but a natural concentration scale could not be found to normalize $C$. For the present purposes of mainly studying model transferability, this will not be of any importance, but it may be of some concern in other contexts.

5. Results and Discussion

5.1. Development and Comparison of Models

[29] The RVM models (based on equation (10)) for predicting flow regime, depth, and sediment concentration, trained exclusively on laboratory data and hence collectively labeled as RVM_L, were compared with some common existing methods in the prediction of field data. It is emphasized that most such existing models were developed using either solely field data or a combination of laboratory data and field data. This section outlines the steps involved in training RVM models and presents comparison results.

5.1.1. Flow Regime Prediction

[30] The variable $\mathcal{R}$ in equation (10) represents flow regime, i.e., a characterization of the alluvial bed, taking a value of one for the lower regime, where the bed is covered with ripples or dunes and therefore offers significant form flow resistance, and zero for the upper regime, where the bed is plane or covered with antidunes, and offers primarily grain and hence comparatively reduced flow resistance. The boundary between the two regimes in the log-transformed space of $q_*$, $\tau_*$, and $\tau'_*$ was obtained by a linear RVM classification algorithm, the output of which is the probability that a data point is in the lower regime given the predictors, $p(\mathcal{R} = 1 \mid q_*, \tau_*, \tau'_*)$.
To simplify the notation, $p(\mathcal{R} = 1 \mid q_*, \tau_*, \tau'_*)$ will be denoted as $p(L)$, and similarly, the probability of a data point being in the upper regime is $p(U) = 1 - p(L)$. Thus $p(L) = 1$ indicates lower regime while $p(L) = 0$ indicates upper regime, and a transition regime was identified for intermediate values of $p(L)$.

[31] As illustrated in the box plot in Figure 1, the median values of $p(L)$ for the lower and upper flow regimes lie at the two extremes of the y axis, demonstrating that the chosen predictors can effectively classify upper and lower regimes. Further, the spread of $p(L)$ for the transition regime suggests that it may not be prudent to make a discrete classification for this regime. An advantage offered by the probabilistic viewpoint of RVM over other existing flow regime prediction methods is that a definite decision regarding flow regime can be avoided, which may be especially useful for data points that are in or close to the transition state. Probabilistic classification by RVM may also be considered more realistic in view of the inevitable ambiguity associated with labeling observed data, particularly under field conditions.
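The probabilistic output described above can be illustrated with a short Python sketch that applies the sigmoid of equation (8) to a linear function of the log-transformed predictors. It is a sketch only: the weights passed in are hypothetical placeholders rather than fitted values from the study, the function names are illustrative, and the natural logarithm is assumed for the log transform.

```python
import math


def sigmoid(f):
    """Numerically stable form of equation (8): 1 / (1 + exp(-f))."""
    if f >= 0.0:
        return 1.0 / (1.0 + math.exp(-f))
    e = math.exp(f)
    return e / (1.0 + e)


def prob_lower_regime(q_star, tau_star, tau_star_grain, weights):
    """Return p(L) for one data point.

    weights = (w0, w1, w2, w3) are HYPOTHETICAL coefficients of a linear
    boundary in log-transformed predictor space (natural log assumed).
    """
    w0, w1, w2, w3 = weights
    f = (w0
         + w1 * math.log(q_star)
         + w2 * math.log(tau_star)
         + w3 * math.log(tau_star_grain))
    return sigmoid(f)


def prob_upper_regime(q_star, tau_star, tau_star_grain, weights):
    """Return p(U) = 1 - p(L)."""
    return 1.0 - prob_lower_regime(q_star, tau_star, tau_star_grain, weights)
```

Keeping the probability itself, rather than collapsing it to a label, preserves the information that a point lies near the regime boundary.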

Figure 1. Illustration of the probability associated with different regimes using box plots. The box extends from the 25th to the 75th percentile, whereas the whiskers extend to the 10th and 90th percentiles. The thick dashed line dividing the box represents the median value. Lower, Transition, PLA, and Upper refer to lower, transition, plane bed with sediment transport, and upper regime, respectively. The quantity $p(L)$ represents the probability of a data point being in the lower regime.

[32] Table 1 presents the comparison of RVM_L in predicting lower and upper regimes with some existing flow regime prediction methods. The values in the second column of Table 1 are the total number of data points in lower or upper flow regimes based on observed bed forms, while values in the subsequent columns represent the number of data points correctly classified by different regime separation methods. Among the existing methods, the van Rijn [1984c] and White et al. [1987] methods are biased toward predicting the lower regime, while Brownlie [1983] performs relatively better in classifying both laboratory and field data. The performance of RVM_L is better than, or at least comparable to, that of any of the other traditional methods on field data, suggesting that an RVM trained solely on flume data can satisfactorily classify flow regimes in field data.

[33] Even though probabilistic classification discourages the use of thresholds, for purposes of comparison with existing methods, the criterion $0.1 < p(L) < 0.9$ was adopted for classifying the transition regime. The threshold values were selected based on the results presented in Figure 1 for laboratory data. The performances of different methods in identifying the transition regime are given in Table 2, where RVM_L is seen to perform reasonably in identifying the transition regime in both laboratory and field data.
All the methods considered for regime classification (Tables 1 and 2) use observed flow depth, but the usefulness of regime classification in the present approach lies in the prediction of flow depth, which is presented in the next subsection.

5.1.2. Flow Depth Prediction

[34] The log-transformed values of $q_*$, $\tau_*$, and $\tau'_*$ formed the inputs, and $D/D'$ the output, of the RVM depth prediction model. The functional relationship between inputs and output was learned separately for the lower and upper regimes on flume data. In the upper regime, owing to the absence of significant form friction, $D/D'$ was found to be close to unity, and therefore the flow depth $D_{up}$ was set to $D'$. The lower regime flow depth $D_{low}$ was computed using a linear RVM whose parameters are given in Appendix A4. The flow depth $D$ was determined according to the RVM regime classification as

$$ D = \begin{cases} D_{low} & \text{lower regime, if } p_l(L) > 0.9 \text{ and } p_u(L) > 0.1 \\ D_{up} = D' & \text{upper regime, if } p_l(L) < 0.9 \text{ and } p_u(L) < 0.1 \\ \dfrac{D_{low}\, p_l(L) + D_{up}\, [1 - p_u(L)]}{p_l(L) + [1 - p_u(L)]} & \text{otherwise (transition)} \end{cases} \tag{12} $$

where $p_l(L)$ and $p_u(L)$ are the probabilities of the data point being in the lower regime based on $D_{low}$ and $D_{up}$, respectively. In the present approach, regime and depth prediction are treated in a coupled manner, with both problems being solved simultaneously.

Table 1. Number of Data Points for Which Flow Regime Is Correctly Predicted by Various Models (a)

    Data        Observed   Bogardi   Brownlie   van Rijn   White et al.   Karim     RVM_L
                           [1959]    [1981a]    [1984c]    [1987]         [1995]    Model
    Data used              F         F and R    F and R    F              F and R   F
    Lab         L(710)     615       640        709        628            648       690
    Lab         U(280)     51        231        93         172            174       269
    Field       L(325)     67        281        306        313            306       322
    Field       U(38)      11        21         1          0              2         28

(a) L and U represent lower and upper regimes, respectively. F or R denotes whether flume data or river data were used in developing the model.
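The depth-selection rule of equation (12) can be written compactly in Python. The following is a minimal sketch: the function name and argument names are illustrative, the thresholds default to the 0.1 and 0.9 values quoted in the text, and the handling of the degenerate case in which both blending weights vanish (which the text does not discuss) is an assumption made here.

```python
def select_depth(D_low, D_up, p_l, p_u, lower_cut=0.9, upper_cut=0.1):
    """Apply equation (12) and return (regime label, flow depth).

    D_low, D_up: depths predicted by the lower- and upper-regime models.
    p_l, p_u: probabilities of the lower regime evaluated with D_low and
    D_up, respectively.
    """
    if p_l > lower_cut and p_u > upper_cut:
        return "lower", D_low
    if p_l < lower_cut and p_u < upper_cut:
        return "upper", D_up

    # Transition: weight D_low by p_l(L) and D_up by 1 - p_u(L).
    w_low = p_l
    w_up = 1.0 - p_u
    total = w_low + w_up
    if total == 0.0:
        # Assumption: the degenerate case is reported rather than guessed.
        raise ValueError("blending weights sum to zero; depth undefined")
    return "transition", (D_low * w_low + D_up * w_up) / total
```

For example, `select_depth(3.0, 1.0, 0.5, 0.5)` gives equal weights to the two candidate depths and returns `("transition", 2.0)`.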

Table 2. Number of Data Points for Which Flow Regime Is Correctly Predicted in Transition Regime (a)

    Data     Observed   Bogardi   Brownlie   van Rijn   White et al.   Karim     RVM_L
                        [1959]    [1981a]    [1984c]    [1987]         [1995]    Model
    Lab      T(103)     46        46         9          47             79        63
    Field    T(31)      21        19         11         0              16        17

(a) For RVM_L, a point is classified to be in the transition regime if $0.1 < p(L) < 0.9$, wherein $p(L)$ is the probability that the data point is in the lower regime.

[35] Tables 3 and 4 present comparisons between the RVM_L model and existing methods for predicting $D$ in laboratory and field channels, respectively. The measures of performance are the following statistics:

[36] 1. Percentage relative bias (R-Bias) in preservation of the mean, expressed as

$$ \text{R-Bias} = \frac{1}{N_p} \sum_{i=1}^{N_p} \frac{D_i - \hat{D}_i}{D_i} \times 100 \tag{13} $$

where $D_i$ and $\hat{D}_i$ are the observed and predicted depths for the $i$th data point, respectively, and $N_p$ is the total number of data points. For an ideal model, R-Bias should be zero; R-Bias > 0 or R-Bias < 0 indicates an underpredicting or overpredicting model, respectively.

[37] 2. Percentage normalized mean square error (N-MSE), defined as the ratio of the mean square error in predicting depth to the variance, $S_{obs}^2$, of the observed depth,

$$ \text{N-MSE} = \frac{\frac{1}{N_p} \sum_{i=1}^{N_p} \left( D_i - \hat{D}_i \right)^2}{S_{obs}^2} \times 100 \tag{14} $$

[38] 3. Number of data points for which the predicted depth lies within 10%, 20%, and 30% of the observed depth, denoted CI10, CI20, and CI30, respectively.

All the existing models used in the comparison, except the Brownlie [1983] method, estimate $D$ iteratively, and a guessed value is required to start the iteration. For existing methods, the observed $D$ was used as the initial guess. For RVM, $D'$ and $3D'$ were used as the initial guesses for estimating $D_{up}$ and $D_{low}$, respectively, these values being based on the average value of $D/D'$ in the laboratory data.

[39] For laboratory data (Table 3), the Brownlie [1983] method outperforms other models in terms of both R-Bias and N-MSE statistics.
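For readers who wish to reproduce the statistics of equations (13) and (14) and the CI counts, a minimal Python sketch is given here. The function name is illustrative, and the use of the sample variance (N - 1 denominator) for $S_{obs}^2$ is an assumption, since the text does not state which variance estimator was used.

```python
import statistics


def depth_metrics(observed, predicted):
    """Return R-Bias (%), N-MSE (%), and the CI10, CI20, CI30 counts."""
    n = len(observed)
    pairs = list(zip(observed, predicted))

    # Equation (13): mean relative error, in percent.
    r_bias = 100.0 * sum((d - dh) / d for d, dh in pairs) / n

    # Equation (14): mean square error over the variance of the
    # observations (sample variance assumed), in percent.
    mse = sum((d - dh) ** 2 for d, dh in pairs) / n
    n_mse = 100.0 * mse / statistics.variance(observed)

    # Counts of predictions within 10%, 20%, and 30% of the observation.
    def within(frac):
        return sum(1 for d, dh in pairs if abs(dh - d) <= frac * d)

    return {
        "R-Bias": r_bias,
        "N-MSE": n_mse,
        "CI10": within(0.10),
        "CI20": within(0.20),
        "CI30": within(0.30),
    }
```

With this sign convention a positive R-Bias corresponds to underprediction, as stated in the definition above.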
The Engelund [1966] model performs fairly well, as does the van Rijn [1984c] model, except for the upper-regime data points, where it tends to overpredict. The RVM_L has higher R-Bias and intermediate N-MSE; its performance, though not outstanding, compares well with the existing methods. For field data (Table 4), the Engelund [1966] model has the smallest R-Bias but also the highest N-MSE, whereas Brownlie [1983] has the smallest N-MSE and also relatively small bias. In comparison with other models, the relatively robust performance of RVM_L on field data is better than on laboratory data, which is rather surprising since RVM_L was trained only on laboratory data.

[40] The comparison presented thus far is not an independent check because the data used for verification were also used in calibrating most of the existing methods. In a more stringent verification, two data sets, one for Malaysian rivers [Sinnakaudan et al., 2006] and the other for the Lower Yellow River [Long and Liang, 1994], were selected. These data sets are independent in that they were not used in the calibration, and also have a fairly long record that facilitates a meaningful comparison. The performance of RVM_L is compared with those of the Brownlie [1983] and van Rijn [1984c] methods in Table 5. RVM_L performed best in predicting $D$ for the Lower Yellow River in terms of R-Bias, N-MSE, and number of points with predicted $D$ within a specified band. All methods were characterized by high values of R-Bias and N-MSE in predicting $D$ for the Malaysian rivers, but the RVM_L had the largest values of CI10, CI20, and CI30.

5.1.3. Prediction of Total Sediment Concentration

[41] The log-transformed values of $q_*$, $\tau_*$, $\tau'_*$, and $\tau_{*c}$ formed the input to the transport model, with $\log C$ as the output. The performance of the linear RVM_L model in estimating $C$ is compared with those of eleven well-known prediction equations in Table 6.
The comparison measures are based on a discrepancy ratio, $DR = C_{pred}/C_{obs}$, where $C_{pred}$ and $C_{obs}$ are the predicted and observed values. Following Brownlie [1981a], $DR$ was assumed to be lognormally distributed, and so its geometric mean (GM) and geometric standard deviation (Gstd) were estimated for each method. The number of data points satisfying $0.5 \le DR \le 2$, denoted as $DR_2$, is also reported. For most existing models applied to the field data, GM < 1, indicative of a tendency to underpredict $C$. From Table 6, the performance of the RVM_L model for predicting $C$ is comparable to existing models, almost all of which had the advantage of being trained to some extent on field data.

Table 3. Performance of Flow Depth Prediction Methods (a)

                                             Laboratory
    Investigator(s)             Data Used    Number   R-Bias   N-MSE   CI10   CI20   CI30
    Engelund [1966]             F and R      1356     2.78     12.75   634    967    1101
    Brownlie [1983]             F and R      1351     0.51     9.94    685    1069   1224
    van Rijn [1984c]            F and R      1360     13.01    24.44   554    850    994
    Karim and Kennedy [1990]    F and R      1359     6.91     16.57   496    882    1113
    Karim [1995]                F and R      1360     0.72     10.95   604    1011   1222
    Karim [1999]                F            1354     9.59     11.12   478    1064   1302
    RVM_L model                 F            1360     14.18    14.83   556    883    1053

(a) Number of points for which the algorithm converged (Number), percentage relative bias (R-Bias), percentage normalized mean square error (N-MSE), and number of data points for which predicted depth was within ±10%, ±20%, and ±30% of the observed depth (CI10, CI20, and CI30) are shown for laboratory data. F or R denotes whether flume data or river data were used in developing the model.

Table 4. Performance of Flow Depth Prediction Methods on Field Data (a)

                                Field
    Investigator(s)             Number   R-Bias   N-MSE   CI10   CI20   CI30
    Engelund [1966]             3337     0.75     15.31   868    1470   1973
    Brownlie [1983]             3306     3.53     6.77    1249   2075   2533
    van Rijn [1984c]            3355     7.17     12.16   1204   1984   2418
    Karim and Kennedy [1990]    3350     2.61     12.69   1211   2072   2639
    Karim [1995]                3355     6.54     9.14    1152   1954   2602
    Karim [1999]                3350     13.62    9.10    871    1961   2834
    RVM_L model                 3355     3.00     10.97   1142   1937   2463

(a) Number of points for which the algorithm converged (Number), percentage relative bias (R-Bias), percentage normalized mean square error (N-MSE), and number of data points for which predicted depth was within ±10%, ±20%, and ±30% of the observed depth (CI10, CI20, and CI30) are shown.

[42] The data sets from Malaysian rivers [Sinnakaudan et al., 2006] and the Lower Yellow River [Long and Liang, 1994] were selected again for independent verification of model performance. The performance of RVM_L is compared with that of the Brownlie [1981a], van Rijn [1984c], and Molinas and Wu [2001] methods in Table 7. All models tend to underpredict $C$ for both the Malaysian rivers and the Lower Yellow River. For the Malaysian rivers, the performance of RVM_L is marginally better than that of the other methods, but for the Lower Yellow River, it is substantially better.

[43] In the results presented thus far, the observed $D$ was used in the estimation of $C$. Table 8 presents the performance of three prediction methods when $D$ was assumed to be unknown and was estimated as part of the prediction process.
For each existing method compared, namely those of Brownlie [1981a] and van Rijn [1984a, 1984b, 1984c], D was estimated using the depth predictor developed by the corresponding investigator. The results so obtained reflect the actual performance that may be expected in predicting C for field applications. The performance degraded for all three methods compared to that obtained using observed D (Table 6), but the relative performance of the methods remained the same, with RVM L having the largest value of DR 2, followed by Brownlie [1981a] and then van Rijn [1984a, 1984b, 1984c].

[44] The above results for predicting flow regime, flow depth, and total sediment concentration support the claim that empirical models trained solely on laboratory data can be developed that perform as well as or, in some specific cases and according to some performance measures, markedly better than existing models trained partially or wholly with field data. This suggests that judicious model design, through choice of input variables, can largely avoid scale effects, and that explanations for any poor predictive performance need to be sought elsewhere. The distinct difference in performance of the methods, including RVM L, applied to the Malaysian and Lower Yellow River data (Table 7) is a case in point.

5.2. Comparison of RVM Trained on Laboratory Data (RVM L) With RVM Trained on River Data (RVM R)

[45] The issue of transferability is further studied by comparing the performance of the same basic model, as specified, e.g., in equation (10), trained separately on laboratory and on field data. The problem of predicting C, rather than D or the flow regime, was selected for further investigation as the most challenging. An RVM model (RVM R) was developed using only field data for training. Predictions from RVM R for field data were obtained using 5-fold cross validation.
The field data were divided into 5 disjoint subsets; the model was trained on all subsets except one, and predictions were made on the subset left out. The procedure was repeated 5 times, each time selecting a different subset for testing, such that each data point in the field data set is part of the test data exactly once.

[46] Table 9 compares the performance of RVM L and RVM R on field data. As expected, RVM R performs on average better than RVM L, but the difference in performance measures is small, raising the question of whether the difference is statistically significant. This was investigated by considering the prediction variance, $\sigma^{2}_{y_*}$, which, in an RVM model, consists firstly of model error and secondly of uncertainty in the estimates of the RVM model parameters (equation (7)). The model error accounts for the precision with which data can be modeled by the RVM, and is measured by the variance in the dependent variable of the training data that cannot be explained by the RVM. Because the laboratory data have smaller uncertainties, the model error of RVM L was less than the model error of RVM R, as illustrated in Figure 2a. The model error remains the same for all the test data points. The second contribution to $\sigma^{2}_{y_*}$ accounts for the uncertainty in the RVM model parameters (w) and varies with each test data point. If the test data point is far from the training data in the space of input variables (i.e., the model is extrapolating), the uncertainty in the model parameters will be high.

Table 5. Comparison of Different Methods in Predicting Flow Depth of Malaysian Rivers [Sinnakaudan et al., 2006] and Lower Yellow River [Long and Liang, 1994]^a

| Measures | Malaysian Rivers: Brownlie [1983] | Malaysian Rivers: van Rijn [1984c] | Malaysian Rivers: RVM L Model | Lower Yellow River: Brownlie [1983] | Lower Yellow River: van Rijn [1984c] | Lower Yellow River: RVM L Model |
| --- | --- | --- | --- | --- | --- | --- |
| Number | 289 | 289 | 289 | 887 | 935 | 935 |
| R-Bias | 35.30 | 33.15 | 27.16 | 35.00 | 42.12 | 13.72 |
| N-MSE | 92.27 | 91.61 | 91.79 | 89.39 | 113.28 | 29.18 |
| CI 10 | 17 | 23 | 97 | 191 | 174 | 230 |
| CI 20 | 82 | 96 | 135 | 344 | 309 | 407 |
| CI 30 | 139 | 142 | 158 | 466 | 417 | 558 |

^a Number of points for which algorithm converged (Number), percentage relative bias (R-Bias), percentage normalized mean square error (N-MSE), and number of data points for which predicted depth was within ±10%, ±20%, and ±30% of the observed depth (CI 10, CI 20, and CI 30) are provided.

Table 6. Performance of Total Sediment Transport Prediction Methods for 1210 Laboratory and 2911 Field Data Points^a

| Investigator(s) | Data Used | Laboratory GM | Laboratory GStd | Laboratory DR 2 | Field GM | Field GStd | Field DR 2 |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Rottner [1959] | F | 1.15 | 2.53 | 767 | 0.21 | 4.45 | 895 |
| Engelund and Hansen [1967] | F | 1.19 | 2.40 | 855 | 0.38 | 5.74 | 1130 |
| Acaroglu [1968] | F and R | 0.50 | 3.72 | 374 | 0.13 | 17.60 | 725 |
| Shen and Hung [1972] | F and R | 0.89 | 2.05 | 949 | 0.21 | 5.63 | 812 |
| Yang [1973] | F and R | 1.10 | 1.96 | 955 | 0.21 | 4.70 | 808 |
| Ackers and White [1973] | F | 1.58 | 1.97 | 726 | 0.57 | 5.50 | 1111 |
| Brownlie [1981a] | F and R | 0.85 | 2.09 | 901 | 0.34 | 3.59 | 1176 |
| van Rijn [1984a, 1984b] | F and R | 1.15 | 2.92 | 672 | 0.37 | 3.32 | 1152 |
| Karim and Kennedy [1990] | F and R | 1.01 | 2.20 | 896 | 0.35 | 4.37 | 1188 |
| Molinas and Wu [2001] | R | 1.87 | 2.65 | 642 | 0.45 | 2.90 | 1142 |
| Yang [2005] | F | 2.16 | 2.55 | 581 | 0.32 | 4.57 | 863 |
| RVM L model | F | 1.00 | 2.08 | 836 | 1.03 | 2.55 | 1709 |

^a Geometric mean (GM) and geometric standard deviation (GStd) of the ratio of predicted to observed concentration (discrepancy ratio) along with number of points with discrepancy ratio less than two (DR 2).

Table 7. Performance of Total Sediment Transport Estimating Methods for Malaysian Rivers (289 Data Points) and Lower Yellow River (935 Points)^a

| Investigator(s) | Malaysian Rivers GM | Malaysian Rivers GStd | Malaysian Rivers DR 2 | Lower Yellow River GM | Lower Yellow River GStd | Lower Yellow River DR 2 |
| --- | --- | --- | --- | --- | --- | --- |
| Brownlie [1981a] | 0.82 | 2.42 | 139 | 0.09 | 2.28 | 9 |
| van Rijn [1984a, 1984b] | 0.30 | 4.91 | 100 | 0.32 | 3.25 | 389 |
| Molinas and Wu [2001] | 0.85 | 2.18 | 130 | 0.21 | 2.23 | 112 |
| RVM L model | 0.76 | 2.71 | 152 | 0.72 | 2.44 | 567 |

^a Geometric mean (GM) and geometric standard deviation (GStd) of the ratio of predicted to observed concentration (discrepancy ratio) along with number of points with discrepancy ratio less than two (DR 2).
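The discrepancy-ratio measures reported in Tables 6 and 7 (GM, GStd, and DR 2) can be illustrated with a short sketch. This is not the authors' code: the function name `discrepancy_stats`, the sample concentrations, and the use of the sample (n - 1) variance of log DR are assumptions made only for illustration; the text specifies only that DR is treated as lognormal and that points with 0.5 ≤ DR ≤ 2 are counted.

```python
import math

def discrepancy_stats(c_pred, c_obs, lo=0.5, hi=2.0):
    """Return (GM, GStd, count) for the discrepancy ratio DR = C_pred / C_obs.

    GM and GStd are the geometric mean and geometric standard deviation,
    computed from the mean and standard deviation of ln(DR).
    Assumption: the sample (n - 1) variance is used; needs at least 2 points.
    """
    ratios = [p / o for p, o in zip(c_pred, c_obs)]
    logs = [math.log(r) for r in ratios]
    n = len(logs)
    mean_log = sum(logs) / n
    var_log = sum((v - mean_log) ** 2 for v in logs) / (n - 1)
    gm = math.exp(mean_log)
    gstd = math.exp(math.sqrt(var_log))
    # Number of points with lo <= DR <= hi (the DR 2 measure for 0.5 and 2).
    within = sum(1 for r in ratios if lo <= r <= hi)
    return gm, gstd, within

# Hypothetical concentrations (not data from the study).
c_obs = [100.0, 200.0, 400.0, 800.0]
c_pred = [50.0, 400.0, 400.0, 4000.0]
gm, gstd, n_within = discrepancy_stats(c_pred, c_obs)
print(f"GM = {gm:.3f}, GStd = {gstd:.3f}, DR 2 = {n_within}")
```

A GM below 1 indicates a tendency to underpredict, and a GStd close to 1 indicates little scatter, which is how the GM and GStd columns of the tables are read.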
Because the laboratory-trained model must extrapolate further when making predictions for field data than a field-trained model, the uncertainty in the model parameters for RVM L was larger than that of RVM R, as shown in Figure 2b. Figure 3 presents the combined prediction uncertainty from RVM L and RVM R for a few randomly selected points in the field data.

[47] The statistical significance of the difference between predictions from RVM L and RVM R was assessed by testing the null hypothesis of no difference at three significance levels (1%, 5%, and 10%). Let $\mu_{Li}$ and $\mu_{Ri}$, and $\sigma_{Li}$ and $\sigma_{Ri}$, denote the mean and the standard error of the prediction of log-transformed concentration at the ith test point by using RVM L and RVM R, respectively. The difference in the mean prediction, $dp_i = \mu_{Li} - \mu_{Ri}$, has standard error $Sp_i = \sqrt{\sigma_{Li}^{2} + \sigma_{Ri}^{2}}$, and the test statistic $z = dp_i/Sp_i$ has an approximate normal distribution [Altman and Bland, 2003]. The null hypothesis that the mean predictions from RVM L and RVM R are equal could not be rejected at the 1%, 5%, and 10% significance levels, leading to the inference that, given the uncertainty in the predictions, the performance differences between RVM L and RVM R are not statistically significant.

5.3. Assessing Transferability of a Brownlie-Type Model

[48] The above analysis of the laboratory-trained and field-trained RVM models led to the conclusion that performance differences were not statistically significant, which motivated a similar analysis of an existing model to determine the effect of individual model elements. A Brownlie [1981a]-type model was selected for this purpose as a widely known modern example that was also convenient for computer modeling.
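The significance test of paragraph [47] can be sketched for a single test point as follows. The function name `prediction_difference_test` and the numerical values are hypothetical, and the two-sided p-value is an assumption; the text gives only the statistic z = dp_i/Sp_i and the three significance levels.

```python
import math
from statistics import NormalDist

def prediction_difference_test(mu_l, sigma_l, mu_r, sigma_r):
    """z-test for the difference between two predictions at one test point.

    mu_l, sigma_l: mean and standard error of the laboratory-trained prediction.
    mu_r, sigma_r: mean and standard error of the field-trained prediction.
    Returns (z, p), where p is a two-sided p-value (an assumption of this sketch).
    """
    dp = mu_l - mu_r                              # difference in mean prediction
    sp = math.sqrt(sigma_l ** 2 + sigma_r ** 2)   # standard error of the difference
    z = dp / sp
    p = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return z, p

# Hypothetical predictions of log-transformed concentration at one field point.
z, p = prediction_difference_test(2.10, 0.40, 2.00, 0.35)
rejected = {alpha: p < alpha for alpha in (0.01, 0.05, 0.10)}
print(f"z = {z:.3f}, p = {p:.3f}, rejected = {rejected}")
```

When the prediction standard errors are large relative to the difference in means, as in this made-up case, the null hypothesis of equal predictions is not rejected at any of the three levels.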
[49] The input variables and the power law structure of the Brownlie [1981a] model were retained, but the model constants (coefficient and exponents) were reevaluated, separately for laboratory and field data, using the data set compiled as part of this study (the constants are given in Appendix A5). The constants for the two models, to be labeled as BRN L and BRN R, are noticeably different. Table 9 presents the performance of BRN L and BRN R in predicting field concentrations, and as expected, BRN R performs better than BRN L. The statistical significance of the difference between the point predictions of BRN L and BRN R was assessed by testing the null hypothesis at three significance levels (1%, 5%, and 10%) with a standard two-sample t test, assuming that the predicted values of log C follow a normal distribution. The null hypothesis of the same mean prediction from BRN L and BRN R was rejected at all three significance levels, suggesting that the point predictions from BRN L and BRN R are indeed statistically different. In contrast to the RVM models, a Brownlie-type model suffers from poor transferability.

[50] The proposed RVM model differs from the Brownlie-type models (BRN L or BRN R) in two important respects, firstly in its choice of input variables, and secondly in the regression methodology (and hence functional form of the model). In order to determine more precisely the reasons for the poor transferability of BRN, linear RVMs were also developed with the same input variables as the BRN models, and again trained separately on laboratory and field data. The performance measures for these models (BRN-RVM L and BRN-RVM R) for predicting field data are also shown in Table 9. The application of the RVM approach with the Brownlie input variables does not enhance the performance of the Brownlie-type models. This points to the choice of input variables rather than the regression methodology as the main explanation for the poor transferability.

Table 8. Performance of Total Sediment Transport Predicting Methods for Field Data (2911 Points) When Both Sediment Concentration and Flow Depth Are Assumed to Be Unknown^a

| Investigator | Field GM | Field GStd | Field DR 2 |
| --- | --- | --- | --- |
| Brownlie [1981a] | 0.37 | 6.44 | 1076 |
| van Rijn [1984a, 1984b, 1984c] | 0.34 | 6.80 | 956 |
| RVM L model | 0.99 | 3.22 | 1435 |

^a Geometric mean (GM) and geometric standard deviation (GStd) of the ratio of predicted to observed concentration along with number of points with discrepancy ratio less than two (DR 2).

Figure 2. Standard error in the estimate of total sediment concentration for field data: (a) uncertainty due to model error and (b) uncertainty in estimate of model parameters. RVM L and RVM R refer to the RVM model trained on laboratory and river data, respectively.

5.4. Input Variables, Transferability, and KL Divergence

[51] If, as hypothesized, poor transferability can be attributed to a poor choice of input variables, the identification of better input variables becomes the prime problem, requiring both physical and statistical insights for resolution. The present work investigates only a single statistical aspect, namely the similarity of the probability density function (pdf) of each input variable in the laboratory and in the field. Considering the input variables to the RVM and Brownlie-type models as random variables, Figures 4 and 5 compare the marginal pdfs of each input variable in the laboratory and in the field. These were estimated using a Gaussian kernel smoother, which is a standard nonparametric method of estimating a density function [Bowman and Azzalini, 1997].
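A Gaussian kernel density estimate of the kind used for these marginal pdfs can be sketched in a few lines. The bandwidth rule (the normal-reference rule of thumb), the function name `gaussian_kde`, and the synthetic samples are assumptions of this sketch; the text does not state how the smoothing bandwidth was chosen.

```python
import math
import random

def gaussian_kde(sample, bandwidth=None):
    """Return a function x -> estimated density, using Gaussian kernels."""
    n = len(sample)
    if bandwidth is None:
        # Assumption: normal-reference rule of thumb, h = 1.06 * s * n^(-1/5).
        mean = sum(sample) / n
        std = math.sqrt(sum((v - mean) ** 2 for v in sample) / (n - 1))
        bandwidth = 1.06 * std * n ** (-0.2)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))

    def pdf(x):
        # Average of Gaussian bumps centred on each observation.
        return norm * sum(math.exp(-0.5 * ((x - v) / bandwidth) ** 2) for v in sample)

    return pdf

# Synthetic stand-ins for one log-transformed input variable (not data from the study).
rng = random.Random(0)
lab_sample = [rng.gauss(0.0, 1.0) for _ in range(200)]
field_sample = [rng.gauss(0.5, 1.2) for _ in range(200)]

lab_pdf = gaussian_kde(lab_sample)
field_pdf = gaussian_kde(field_sample)

# Evaluate both densities on a common grid, as one would before plotting them together.
grid = [-8.0 + 0.05 * i for i in range(321)]
lab_density = [lab_pdf(x) for x in grid]
field_density = [field_pdf(x) for x in grid]
```

Plotting `lab_density` and `field_density` against `grid` would give a comparison analogous to one panel of Figures 4 and 5.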
Unlike the other input variables, $R/d_{50}$ and S, which serve as explicit independent inputs to a Brownlie-type model, have very distinct distributions in the laboratory and field data. Thus, if BRN L is used for field predictions, extrapolation is inevitable in the range of input variables for which laboratory data are not available, which will most likely adversely affect the transferability of a laboratory-trained model. The conditional distributions of the prediction error in log C, given S and $R/d_{50}$, were estimated for the BRN L and BRN R models, and the conditional expected values of the errors are shown in Figure 6. For illustration purposes, the marginal pdfs are plotted again (upper panels in Figure 6). The error in predicting C is larger for the BRN L model in the regions of input variable space where laboratory data are sparse or not available, with the implication that the difference in predictions of BRN L and BRN R can be attributed to extrapolation in those regions.

Figure 3. Prediction of total sediment concentration by RVM L and RVM R for a few randomly selected field data points. The solid squares correspond to the observed concentration. The crosses and the circles refer to the mean prediction by RVM L and RVM R, respectively. The error bars stretching across the crosses and circles denote one standard deviation of the prediction.

Table 9. Performance of Various Models When the Same Input Variables Are Used but the Models Are Trained Solely on Laboratory Data (Subscript L) and Also Trained Solely on Field Data (Subscript R)

| Model | Field GM | Field GStd | Field DR 2 |
| --- | --- | --- | --- |
| RVM L | 1.03 | 2.55 | 1709 |
| RVM R | 1.00 | 2.44 | 1811 |
| BRN L | 0.41 | 3.52 | 1208 |
| BRN R | 1.00 | 3.01 | 1483 |
| BRN-RVM L | 0.42 | 3.54 | 1221 |
| BRN-RVM R | 1.00 | 3.02 | 1477 |

[52] One criterion for a model to achieve better transferability is that the need for extrapolation during prediction should be minimized, as exemplified by the RVM L model. This can be quantified by means of the Kullback-Leibler (KL) divergence [Kullback and Leibler, 1951], which measures the difference between two distributions. Let $p_L(x)$ and $p_R(x)$ be the pdfs of a variable x in laboratory and field data, respectively. The KL divergence between the two distributions is given by

$$\mathrm{KLD}(p_L, p_R) = 0.5\,\{D_{KL}(p_L \mid p_R) + D_{KL}(p_R \mid p_L)\} \qquad (15)$$

where

$$D_{KL}(p_L \mid p_R) = \int p_L(x)\,\ln\frac{p_L(x)}{p_R(x)}\,dx, \quad \text{and} \quad D_{KL}(p_R \mid p_L) = \int p_R(x)\,\ln\frac{p_R(x)}{p_L(x)}\,dx. \qquad (16)$$

The KLD is nonnegative, with a value of zero if, and only if, $p_L(x) = p_R(x)$. Variables having similar distributions in laboratory and field data will have smaller values of KLD. For developing a model with a high degree of transferability from flume to river, input variables with low KLD should be preferred.
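The symmetric KLD of equations (15) and (16) can be approximated numerically once the two pdfs are available as functions. The sketch below assumes trapezoidal integration on a uniform grid and uses two normal densities as stand-ins for the laboratory and field pdfs; the function names, grid, and distribution parameters are illustrative and not taken from the study.

```python
import math

def kl_divergence(p, q, grid):
    """Trapezoidal estimate of the integral of p(x) ln(p(x)/q(x)) on a uniform grid.

    Points where either density underflows to zero are skipped; this is a
    simplification, since the true divergence is infinite where p > 0 and q = 0.
    """
    dx = grid[1] - grid[0]
    vals = []
    for x in grid:
        px, qx = p(x), q(x)
        vals.append(px * math.log(px / qx) if px > 0.0 and qx > 0.0 else 0.0)
    return dx * (sum(vals) - 0.5 * (vals[0] + vals[-1]))

def symmetric_kld(p_l, p_r, grid):
    """Equation (15): average of the two directed divergences of equation (16)."""
    return 0.5 * (kl_divergence(p_l, p_r, grid) + kl_divergence(p_r, p_l, grid))

def normal_pdf(mu, sigma):
    """Return the density function of a normal distribution."""
    c = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    return lambda x: c * math.exp(-0.5 * ((x - mu) / sigma) ** 2)

# Stand-in densities for one input variable (illustrative parameters only).
p_lab = normal_pdf(0.0, 1.0)
p_field = normal_pdf(1.0, 1.5)
grid = [-12.0 + 0.01 * i for i in range(2401)]

kld = symmetric_kld(p_lab, p_field, grid)
print(f"symmetric KLD = {kld:.4f}")
```

In practice the two density functions could be kernel estimates built from the laboratory and field samples of each input variable, and the grid should cover the range where either density is appreciable.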
Table 10 compares the KLD of the input variables in the RVM and Brownlie [1981a] models, and as might be expected from Figures 4 and 5, the KLD values for input variables in the RVM model are substantially smaller than those for Brownlie's model.

[53] The KLD can be estimated for each variable separately, but this ignores the dependence among variables. Input variables having similar mutual dependence in laboratory and field data should be more conducive to better transferability, though one-to-one correspondence is unlikely. To estimate a multivariate KLD that accounts for the dependence among input variables, the marginal distributions $p_L(x)$ and $p_R(x)$ should be replaced by joint distributions, $q_L(\mathbf{x})$ and $q_R(\mathbf{x})$, where $\mathbf{x}$ is a random vector of input variables. Intuitively, KLD in this case is a measure of distance between two joint pdfs in a multidimensional space of input variables. The estimation of joint distributions is, in general, not trivial, but they can be approximated by copulas, a statistical tool for formulating multivariate distributions [Nelsen, 2006; Renard and Lang, 2007]. The multivariate KLD values of the input variables in the RVM and Brownlie-type models obtained using a Gaussian copula are provided in the last column of Table 10. As expected, the multivariate KLD for input variables in the RVM model is smaller than that for the input variables in Brownlie's model.

Figure 4. Marginal probability density functions (pdfs) of input variables in the RVM model for predicting total sediment concentration. The solid lines denote the pdfs of the input variables for the laboratory data, and the dashed lines represent the same for field data.

Figure 5. Marginal probability density functions (pdfs) of input variables in the Brownlie [1981a] model for predicting total sediment concentration. The solid lines denote the pdfs of the input variables for the laboratory data, and the dashed lines represent the same for field data.

Figure 6. Marginal probability density functions (pdfs) of (a) relative roughness ($R/d_{50}$) and (c) slope (S), which serve as inputs to the Brownlie [1981a] model for predicting total sediment concentration. Expected value of error in predicting log-transformed concentration by the BRN L and BRN R models conditioned on the values of input variables (b) $R/d_{50}$ and (d) S, respectively.

5.5. Further Comments

[54] In their development of machine-learning approaches to sediment transport prediction, Bhattacharya et al. [2007] had already emphasized the importance of similar distributions in training and testing data. The difference in goals and hence strategy from the present study should be highlighted. Because of their focus on develop-