From flumes to rivers: Can sediment transport in natural alluvial channels be predicted from observations at the laboratory scale?


WATER RESOURCES RESEARCH, VOL. 45, W08433, doi:10.1029/2008WR007637, 2009

From flumes to rivers: Can sediment transport in natural alluvial channels be predicted from observations at the laboratory scale?

Emrah Dogan,1,2 Shivam Tripathi,1 Dennis A. Lyn,1 and Rao S. Govindaraju1

1 School of Civil Engineering, Purdue University, West Lafayette, Indiana, USA.
2 Department of Civil Engineering, Sakarya University, Esentepe, Sakarya, Turkey.

Received 3 December 2008; accepted 4 June 2009; published 25 August 2009.

[1] Doubt regarding the applicability of laboratory results to alluvial streams has led some to develop sediment transport predictors based solely on field data, and most current sediment transport formulae have typically been calibrated at least partially on field data. This paper examines the transferability of flume results to the field by exploring the extent to which a unified approach to the prediction of (1) flow regime, (2) depth, and (3) total sediment transport can be developed entirely with laboratory data. Relevance vector machine (RVM)-based probabilistic models were constructed with only laboratory data, and their performances were tested against field data and found to be comparable with or better than currently available methods. Comparison of a laboratory-trained RVM with a field-trained RVM suggests that the prediction performances of the two models for unseen field data are not statistically different given the prediction uncertainty. For transferability, the choice of predictor variables is important, with successful predictors being characterized by similar probability distributions in the laboratory and field data, e.g., as quantified by the Kullback-Leibler divergence.

Citation: Dogan, E., S. Tripathi, D. A. Lyn, and R. S. Govindaraju (2009), From flumes to rivers: Can sediment transport in natural alluvial channels be predicted from observations at the laboratory scale?, Water Resour. Res., 45, W08433, doi:10.1029/2008WR007637.

Copyright 2009 by the American Geophysical Union.

1. Introduction

[2] Predictions of sediment transport in natural alluvial channels have relied on empirical or semiempirical models, the success of which depends on the availability and quality of the comprehensive data sets used in their calibration. High-quality data from many flume experiments are readily available, but correspondingly high-quality field data remain relatively limited and will likely continue to remain so because of the difficulty and expense of making field measurements. It has been suggested in the literature, often tacitly and sometimes explicitly, that empirical models trained or calibrated purely on flume data do not generally perform well under field conditions [Brownlie, 1981a; Molinas and Wu, 2001], with the implication that scale effects may play an important role. A recent trend in statistical models is an exclusive reliance on field data [Nagy et al., 2002; Sinnakaudan et al., 2006]. The specific question of transferability, i.e., whether a laboratory-trained model can be applied to the field, has implications for the more fundamental question of whether different physical processes occur at the field and at the laboratory scale.

[3] Traditional approaches have relied heavily on dimensional analysis, but there is no consensus on an effectively complete set of variables governing sediment transport. Even if a complete set is known, scale effects may arise because the functional dependence on one or more dimensionless groups differs substantially between laboratory and field scales. Consequently, statistical fitting to laboratory data will never be exposed to the full range of functional behavior, and hence such laboratory-trained models will perform poorly when applied to predict field-scale phenomena. While the choice of relevant dimensionless groups may be the primary determinant of the performance of a prediction, the regression technique used may also play a role. Conventional multivariate regression may be overly restrictive in requiring a prespecification of the functional form, often taken to be simple linear combinations of log-transformed dimensionless groups. More recently developed techniques that are nonlinear and less restrictive, for example, those based on artificial neural networks (ANNs), may still encounter problems due to the presence of local minima and overfitting.

[4] Although the option of developing predictors from field data may seem attractive from a practical perspective because it presumably avoids scale effects, field data have notable shortcomings for developing physical models. In addition to their relative sparsity, uncertainties in the measurements, not only of the output variables but also of the input variables, are generally significantly larger than in the laboratory. Additional complications arise in that variables in the field tend to be much more highly correlated: channel width, flow depth, and sediment size and gradation tend to be strongly correlated with channel slope, which may lead to difficulties in applying statistical techniques to obtain physically valid results. Stronger biases may also be present in that most of the field data may occupy only a small part of the parameter space, while other ranges that may be physically possible but not commonly occurring may be poorly represented.

[5] A large number of sediment transport models have been proposed in the literature [ASCE, 1975], most of which have been trained on both laboratory and field data, thereby obscuring the issue of transferability. This study examines whether alluvial stream predictors based solely on laboratory data can perform well under field conditions and, if so, whether factors conducive to transferability can be identified. The first question is addressed by comparing the performance of laboratory-trained models with that of traditional models and with a model trained only on field data. A partial answer to the second question is given based on an analysis of the statistical characteristics of the chosen input parameters.

[6] The study makes use of the relevance vector machine (RVM) [Tipping, 2001], a statistical tool with a number of appealing attributes in the present context, namely:

[7] 1. RVM provides a probabilistic framework yielding both model and prediction errors, which not only quantify the inherent uncertainty in predictions but also separate uncertainties due to noise in the training data from those due to extrapolation during prediction, thus facilitating comparison of models trained on laboratory data and field data.

[8] 2. RVM includes a Bayesian construct that induces regularization through automatic relevance determination, yielding more robust models that are not overly complex.

[9] 3. RVM design involves relatively little subjectivity compared with other common data-driven or machine-learning approaches.

[10] A minimally complete sediment transport model should predict flow regime, stage (or depth), and total sediment transport. Many proposed models, with a few notable exceptions [van Rijn, 1984a, 1984b, 1984c, 2007a, 2007b, 2007c; Brownlie, 1981a], address only one aspect of the sediment transport problem, and as such cannot be used in a fully predictive sense. Because bed characteristics, flow resistance, and sediment transport rates are closely coupled, prediction of transport requires information regarding the flow depth, which must therefore also be predicted. A unified approach to the sediment transport problem within an RVM framework is thereby proposed. The use of data-driven or machine-learning tools for sediment transport prediction is not new [see, e.g., Bhattacharya et al., 2007; Nagy et al., 2002], but previous work has typically been limited to a single aspect and, above all, has not addressed the issue of transferability.

[11] The remainder of the paper is structured as follows: (1) a mathematical formulation of RVM is first outlined, (2) the data sets used in this study are then described, (3) models to predict flow regime, flow depth, and total sediment load are described, and finally (4) results are presented.

2. The Relevance Vector Machine Methodology

[12] Finding relationships between independent variables (predictors) and a dependent variable (predictand) is a classical problem in statistical learning theory. If the predictand is a continuous variable, the learning problem is known as regression, whereas if the predictand is a categorical variable, the problem is one of classification. In the sediment transport literature, model relationships between predictors and predictand are often assumed to be effectively linear, e.g., when log transformed, but such a model assumption has been dictated more by the available regression tools than by any physical grounds. Over the last decade, the development of nonlinear regression models using artificial neural networks (ANNs) [ASCE Task Committee on Artificial Neural Networks in Hydrology, 2000] has received much attention. Mathematically, an ANN belongs to a class of universal approximators; i.e., under mild restrictions on its architecture, it can learn from the training patterns any nonlinear continuous function to an arbitrary degree of accuracy [Hornik et al., 1989]. This flexibility comes, however, at a cost, in that traditional ANNs have several drawbacks, including the possibility of becoming trapped in local minima, the subjectivity in the choice of model architecture, and the lack of control over model complexity needed to avoid overfitting.

[13] Kernel methods in conjunction with Bayesian inference offer an elegant alternative to ANNs. The relevance vector machine (RVM) belongs to this class of methods and, like an ANN, is a universal approximator, but differs in that Bayesian inference is used to minimize overfitting. An attractive feature of RVMs is a probabilistic interpretation that yields prediction uncertainty. Applications of RVMs in hydrology include groundwater quality modeling [Khalil et al., 2005], modeling of chaotic hydrologic time series [Khalil et al., 2006], and studying the impact of climate change on regional hydrology [Ghosh and Mujumdar, 2008], but there have been few, if any, applications to sediment transport problems. Readers are referred to Tipping [2001], Schölkopf and Smola [2002], and Bishop [2006] for additional details on RVMs.

2.1. RVM for Regression

[14] Consider a finite training sample of N patterns {(x_i, y_i), i = 1, ..., N}. The ith input pattern x_i denotes a vector of m independent variables (i.e., x_i = [x_i1, ..., x_im] ∈ R^m and X = [x_1, ..., x_N]^T), and y_i ∈ R (y = [y_1, ..., y_N]^T) is the corresponding dependent variable. Further, let the regression relationship be defined as

  y_i = f(x_i; X, w) + e_i = Σ_{j=0}^{N} w_j K(x_i, x_j) + e_i   (1)

where w = [w_0, w_1, ..., w_N]^T is a weight vector, e_i is an error term assumed to have a Gaussian distribution with mean zero and variance σ_e², and K(x_i, x_j) is a kernel function given by

  K(x_i, x_j) = 1 if j = 0;  K(x_i, x_j) = exp(−‖x_i − x_j‖² / s²_kernel) otherwise   (2)

where s_kernel is the width of the kernel function. Training an RVM model involves estimating the parameters w, σ_e², and s_kernel. The likelihood of the data set for this model can be written as

  p(y | w, σ_e²) = Π_{i=1}^{N} (2πσ_e²)^{−1/2} exp{−[y_i − f(x_i; X, w)]² / (2σ_e²)}   (3)

In RVMs, learning is achieved by specifying a prior distribution for the model parameters and estimating their posterior distribution by using the likelihood function (equation (3)).
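To make the estimation procedure concrete, the following minimal sketch implements equations (1)-(3), together with the iterative evidence-maximization updates of Tipping [2001], in Python/NumPy. It is an illustration under stated assumptions, not the authors' code: the kernel width `s_kernel`, the iteration count, and the pruning cap `alpha_cap` are our own choices.

```python
import numpy as np

def design_matrix(X_train, s_kernel):
    """Equations (1)-(2): a bias column (j = 0) plus Gaussian kernels
    K(x_i, x_j) = exp(-||x_i - x_j||^2 / s_kernel^2)."""
    sq = np.sum((X_train[:, None, :] - X_train[None, :, :]) ** 2, axis=2)
    return np.hstack([np.ones((X_train.shape[0], 1)), np.exp(-sq / s_kernel**2)])

def train_rvm(X_train, y, s_kernel=1.0, n_iter=500, alpha_cap=1e9):
    """Sparse Bayesian regression via iterated evidence maximization
    [Tipping, 2001]; weights whose ARD hyperparameter alpha_j diverges
    are effectively pruned (the relevance vector mechanism)."""
    Phi = design_matrix(X_train, s_kernel)
    N, M = Phi.shape
    alpha = np.ones(M)              # one ARD hyperparameter per weight
    beta = 1.0 / np.var(y)          # noise precision, 1/sigma_e^2
    for _ in range(n_iter):
        Sigma = np.linalg.inv(np.diag(alpha) + beta * Phi.T @ Phi)  # posterior covariance
        mu = beta * Sigma @ Phi.T @ y                               # posterior mean of w
        gamma = 1.0 - alpha * np.diag(Sigma)   # "well-determinedness" of each weight
        alpha = np.minimum(gamma / (mu**2 + 1e-12), alpha_cap)
        beta = (N - gamma.sum()) / np.sum((y - Phi @ mu) ** 2)
    return mu, Sigma, beta, alpha < alpha_cap  # mean, covariance, precision, relevance flags
```

For a new input, the predictive mean and variance of equations (6) and (7) (section 2.1 below) then follow as `phi_star @ mu` and `1.0/beta + phi_star @ Sigma @ phi_star`, where `phi_star` is the vector of kernel evaluations of the new point against the training inputs.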

In this study, an automatic relevance determination (ARD) prior of the following form was assigned to each weight:

  p(w_j | α_j) = N(0, α_j⁻¹)   (4)

where the α_j are called hyperparameters. The choice of a zero-mean Gaussian prior for the weights expresses a preference for smaller weights, and hence for a smoother model (equation (1)). During the process of learning, some of the α_j may approach infinity, so that the corresponding weights w_j tend to delta functions centered at zero and are thus automatically omitted from equation (1). Only the remaining patterns, corresponding to nonzero weights, are deemed relevant for function approximation (hence the term relevance vector machine).

[15] Following Tipping [2001], noninformative priors were assigned to the model parameters, namely the α_j and the variance of the noise term, σ_e². After assigning priors and defining the likelihood function (equation (3)), the posterior distribution of the model parameters was estimated using Bayes' rule. The posterior distribution for the weight vector turns out to be Gaussian, p(w | X, y) = N(μ_w, Σ_w), but those for α_j and σ_e² do not have closed-form expressions and were approximated by delta functions at their modes, α_j,max and σ²_e,max, respectively.

[16] The predictive distribution of the dependent variable y* for a new set of independent variables x* is given by

  p(y* | x*, X, y, α_j,max, σ²_e,max) = N(μ_y*, σ²_y*)   (5)

where the mean (μ_y*) and the variance (σ²_y*) of the predictive distribution are given by

  μ_y* = μ_w^T K(x*, X)   (6)

  σ²_y* = σ²_e,max + K(x*, X)^T Σ_w K(x*, X)   (7)

The width of the kernel function, s_kernel, was determined following the procedure given by Tripathi and Govindaraju [2007].

2.2. RVM for Classification

[17] In a classification problem, the dependent variable t_i is binary, t_i ∈ {0, 1}. The RVM framework can be extended to the classification problem by applying a sigmoid function to the regression model (equation (1)),

  Ψ(f) = 1 / (1 + e^(−f))   (8)

where, to keep the notation uncluttered, f(x_i; X, w) is written as f. The likelihood of the data set for this classification model is given by

  p(t | w) = Π_{i=1}^{N} [Ψ(f)]^(t_i) [1 − Ψ(f)]^(1−t_i)   (9)

where t = [t_1, ..., t_N]^T. The parameters of the classification model were the same as those of the regression model (equation (1)), except that there is no noise variance term σ_e². Model parameters were estimated as before, by first assigning prior distributions and then estimating their posterior distributions using the likelihood function (equation (9)) and Bayes' rule. Unlike the regression case, the posterior distribution of w does not have a closed form, and it too was approximated by a delta function at its mode. The other aspects of the classification model formulation remain the same as for the regression model. For a new input vector, the RVM model makes a probabilistic prediction of the category of the dependent variable. Additional details of the RVM model for classification can be found in the work of Bishop [2006, p. 353].

3. Data Used in the Study

[18] Building on the works of Johnson [1943] and Peterson and Howells [1973], Brownlie [1981b] compiled an extensive set of laboratory and field data. In this study, additional recent data sets were organized in a format compatible with that used by Brownlie and then merged. The assembled database contains 9437 records (5594 laboratory and 3843 field), of which 2443 records (331 laboratory and 2112 field) are in addition to the Brownlie [1981b] compilation. A list of new data sources is given in Appendix A2.

[19] For the current study, the following restrictions were applied to the data sets (a sketch of the corresponding filters follows this section):

[20] 1. Only data records with width-to-depth ratio B/D > 4 were used, in order to avoid any significant sidewall effects in the laboratory data.

[21] 2. The relative roughness R/d50, where R is the hydraulic radius and d50 is the median sediment size, was limited to R/d50 > 100 to avoid effects due to extremely shallow conditions.

[22] 3. Sediment sizes were restricted to 0.062 mm < d50 < 2.0 mm, so the study was limited to sand sizes. The geometric standard deviation of the distribution of sediment particles (gradation, σ_g) was limited to values less than 5, so as to avoid records with excessive amounts of gravel or fine material.

[23] 4. Records with sediment concentration C < 10 ppm were excluded because of concerns about measurement accuracy at low concentrations. This restriction was enforced only when predicting sediment transport (section 5.1.3).

Temperature was used in the calculation of the kinematic viscosity, ν, of water; for data records with no water temperature, a value of 10°C was assumed. A gradation of σ_g = 2 was assigned to data records lacking this information. Finally, 34 laboratory data points were identified as outliers (see Appendix A3) and were removed from the analysis.
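As an illustration of how restrictions 1-4 might be applied in practice, the sketch below filters a hypothetical pandas DataFrame. The column names (`B`, `D`, `R`, `d50_mm`, `sigma_g`, `C_ppm`, `temp_C`) are assumptions for this example, not the actual schema of the assembled database.

```python
import pandas as pd

def apply_restrictions(df: pd.DataFrame, for_transport: bool = True) -> pd.DataFrame:
    """Apply the record restrictions of section 3 (hypothetical column names)."""
    df = df.copy()
    # default values for missing auxiliary data, applied before filtering
    df["temp_C"] = df["temp_C"].fillna(10.0)     # assume 10 degrees C
    df["sigma_g"] = df["sigma_g"].fillna(2.0)    # assume gradation of 2
    df = df[df["B"] / df["D"] > 4]                      # 1. avoid sidewall effects
    df = df[df["R"] / (df["d50_mm"] / 1000.0) > 100]    # 2. relative roughness limit
    df = df[df["d50_mm"].between(0.062, 2.0)]           # 3. sand sizes only
    df = df[df["sigma_g"] < 5]                          # 3. limit gradation
    if for_transport:
        df = df[df["C_ppm"] >= 10]                      # 4. drop low concentrations
    return df
```

Here `R` and `D` are assumed to be in meters and `d50_mm` in millimeters, so the median size is converted before forming R/d50.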

4. Formulation of Models

[24] The flows in the alluvial channels (flumes and rivers) were assumed to be steady, uniform, and two-dimensional, with equilibrium noncohesive sediment transport. The flow regime (φ), flow depth (D), and sediment concentration (C) were chosen as the dependent (unknown) variables to be determined from the following independent (measured) variables: (1) channel characteristics, slope S; (2) sediment characteristics, median size d50 and specific gravity s; and (3) flow/fluid characteristics, unit discharge q and the kinematic viscosity of water ν, together with the acceleration due to gravity, g.

[25] These variables are generally acknowledged to be relevant and have been used in many previous models. Additional independent variables could be identified, but, like Bhattacharya et al. [2007], a minimal set was preferred. The effect of channel width, B, has been explicitly included by some [e.g., Nagy et al., 2002], but the traditional two-dimensional viewpoint is taken here to justify the omission of B as a relevant variable. Further, with the focus on sand sizes, the effect of gradation is considered secondary, and so σ_g was not included.

[26] Dimensional analysis was then applied to the remaining variables to yield appropriate dimensionless groups, the choice of which was substantially influenced by previously effective relationships and by a transparent physical interpretation. Because a unified approach was sought, the same set of predictors (inputs) was used for all predictands. This contrasts with the approach of Brownlie [1981a], who also developed a complete model but chose substantially different sets of predictors for the depth and transport predictions. The basic relationships for the flow regime, φ, the dimensionless depth, D/D′, and the transport concentration, C (in ppm by weight), were chosen as

  φ = f_1(q*, τ*, τ′*);   D/D′ = f_2(q*, τ*, τ′*);   C = f_3(q*, τ*, τ′*, τ*c)   (10)

where q* = qS/√(g(s − 1)d50³) is interpreted as a dimensionless stream power, and τ* and τ′* are forms of the Shields parameter, τ* = RS/[(s − 1)d50]. A primed quantity, such as D′ (or R′) and τ′*, indicates a quantity associated with grain or skin friction. The critical Shields parameter, τ*c, is associated with incipient sediment motion; a dimensionless sediment diameter, d* = √(g(s − 1)d50³/ν²), could have been used instead of τ*c, but τ*c was chosen as being more directly related to transport. For consistency, τ*c could also have been included in f_1 and f_2, but the RVM results suggested that its relevance to f_1 and f_2 was negligible, and so it was omitted in the regime and depth relationships. The total of four independent variables is a middle course compared with the six in the final ANN model of Nagy et al. [2002] and the three in the machine-learning total load models of Bhattacharya et al. [2007].

[27] In its explicit use of D/D′, the depth prediction model of equation (10) recalls the classical models of Einstein and Barbarossa [1952] and Engelund [1966], rather than more direct modern approaches. Similarly, the related explicit use of τ′* is rather uncommon in recently proposed transport formulae, though it is present in van Rijn's [1984a, 1984b] transport parameter, which is a combination of τ′* and τ*c. The specification of τ′*, or equivalently D′, has traditionally been based on a plane fixed-bed friction relationship, either in a log-law form (as in the Engelund or van Rijn models) or in a Manning-Strickler form (as in the Einstein-Barbarossa method). For simplicity, the latter was chosen, the specific relationship used being

  R′/d50 = [(q/D) / (6.74 √(g S d50))]^(3/2)   (11)

For wide channels, D′ = R′, while for narrow channels, D′ = R′/(1 − 2R′/B). The critical Shields stress, τ*c, was evaluated according to Sturm [2001], which is based on the experimental results of Yalin and Karahan [1979], for (Re*)c = d* √(τ*c) > 1, and according to Brownlie [1981a] for (Re*)c < 1. Strictly speaking, D/D′ = τ*/τ′* is not independent of τ′* and τ*, so either τ′* or τ* could have been omitted from the specification of D/D′ in equation (10); in the actual regression analysis, however, the added flexibility of retaining both groups was found advantageous in obtaining a better fit.

[28] It would have been desirable to examine a scaled concentration (analogous to the scaled depth, D/D′), but a natural concentration scale could not be found to normalize C. For the present purpose of mainly studying model transferability, this is of no importance, but it may be of some concern in other contexts.

5. Results and Discussion

5.1. Development and Comparison of Models

[29] The RVM models (based on equation (10)) for predicting flow regime, depth, and sediment concentration, trained exclusively on laboratory data and hence collectively labeled RVM_L, were compared with some common existing methods in the prediction of field data. It is emphasized that most such existing models were developed using either solely field data or a combination of laboratory and field data. This section outlines the steps involved in training the RVM models and presents comparison results.

5.1.1. Flow Regime Prediction

[30] The variable φ in equation (10) represents flow regime, i.e., a characterization of the alluvial bed, taking a value of one for the lower regime, where the bed is covered with ripples or dunes and therefore offers significant form flow resistance, and zero for the upper regime, where the bed is plane or covered with antidunes and offers primarily grain, and hence comparatively reduced, flow resistance. The boundary between the two regimes in the log-transformed space of q*, τ*, and τ′* was obtained by a linear RVM classification algorithm, the output of which is the probability of a data point being in the lower regime given the predictors, p(φ = 1 | q*, τ*, τ′*). To simplify the notation, p(φ = 1 | q*, τ*, τ′*) will be denoted p(L), and similarly the probability of a data point being in the upper regime is p(U) = 1 − p(L). Thus p(L) = 1 indicates the lower regime, p(L) = 0 indicates the upper regime, and a transition regime was identified for intermediate values of p(L).

[31] As illustrated by the box plots in Figure 1, the medians of the lower and upper flow regimes are confined to two extreme corners of the y axis, demonstrating that the chosen predictors can effectively classify upper and lower regimes. Further, the spread of p(L) for the transition regime suggests that it may not be prudent to make a discrete classification for this regime. An advantage offered by the probabilistic viewpoint of RVM over other existing flow regime prediction methods is that a definite decision regarding flow regime can be avoided, which may be especially useful for data points that are in or close to the transition state. Probabilistic classification by RVM may also be considered more realistic in view of the inevitable ambiguity associated with labeling observed data, particularly under field conditions.
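For concreteness, the dimensionless groups entering equation (10) can be computed as in the following sketch, here using the observed depth in equation (11). This is our own minimal illustration of the definitions above, not the authors' implementation; in a fully predictive setting, equation (11) would instead be applied within the depth iteration.

```python
import numpy as np

G = 9.81  # gravitational acceleration, m/s^2

def predictors(q, S, d50, s, R, D):
    """Inputs of equation (10) from unit discharge q (m^2/s), slope S,
    median size d50 (m), specific gravity s, hydraulic radius R (m),
    and flow depth D (m)."""
    q_star = q * S / np.sqrt(G * (s - 1.0) * d50**3)  # dimensionless stream power
    tau_star = R * S / ((s - 1.0) * d50)              # Shields parameter
    # grain (skin) friction from the Manning-Strickler form of equation (11)
    R_prime = d50 * ((q / D) / (6.74 * np.sqrt(G * S * d50))) ** 1.5
    tau_prime_star = R_prime * S / ((s - 1.0) * d50)  # grain Shields parameter
    return q_star, tau_star, tau_prime_star
```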

Figure 1. Illustration of the probability associated with different regimes using box plots. The box spans the 25th to 75th percentiles, and the whiskers extend to the 10th and 90th percentiles. The thick dashed line dividing the box represents the median. Lower, Transition, PLA, and Upper refer to the lower, transition, plane bed with sediment transport, and upper regimes, respectively. The quantity p(L) is the probability of a data point being in the lower regime.

[32] Table 1 presents the comparison of RVM_L with some existing flow regime prediction methods in predicting the lower and upper regimes. The values in the second column of Table 1 are the total numbers of data points in the lower or upper flow regimes based on observed bed forms, while values in the subsequent columns represent the numbers of data points correctly classified by the different regime separation methods. Among the existing methods, classification by the van Rijn [1984c] and White et al. [1987] methods is biased toward predicting the lower regime, while Brownlie [1983] performs relatively better in classifying both laboratory and field data. The performance of RVM_L is better than, or at least comparable to, that of any of the other traditional methods on field data, suggesting that an RVM trained solely on flume data can satisfactorily classify flow regimes in field data.

[33] Even though probabilistic classification discourages the use of thresholds, for purposes of comparison with existing methods the criterion 0.1 < p(L) < 0.9 was adopted for classifying the transition regime. The threshold values were selected based on the results presented in Figure 1 for laboratory data. The performances of the different methods in identifying the transition regime are given in Table 2, where RVM_L is seen to perform reasonably well in identifying the transition regime in both laboratory and field data. All the methods considered for regime classification (Tables 1 and 2) use the observed flow depth, but the usefulness of regime classification in the present approach lies in the prediction of flow depth, which is presented in the next subsection.

5.1.2. Flow Depth Prediction

[34] The log-transformed values of q*, τ*, and τ′* formed the inputs, and D/D′ the output, of the RVM depth prediction model. The functional relationship between inputs and output was learned separately for the lower and upper regimes on flume data. In the upper regime, owing to the absence of significant form friction, D/D′ was found to be close to unity, and the flow depth D_up was therefore set to D′. The lower regime flow depth D_low was computed using a linear RVM whose parameters are given in Appendix A4. The flow depth D was determined according to the RVM regime classification as

  D = D_low if p_l(L) > 0.9 and p_u(L) > 0.1 (lower regime);
  D = D_up = D′ if p_l(L) < 0.9 and p_u(L) < 0.1 (upper regime);
  D = [D_low p_l(L) + D_up (1 − p_u(L))] / {p_l(L) + [1 − p_u(L)]} otherwise (transition)   (12)

where p_l(L) and p_u(L) are the probabilities of the data point being in the lower regime based on D_low and D_up, respectively. In the present approach, regime and depth prediction are treated in a coupled manner, with both problems being solved simultaneously (a code sketch of equation (12) follows Table 1).

Table 1. Number of Data Points for Which Flow Regime Is Correctly Predicted by Various Models^a

Observed: Lab, L (710), U (280); Field, L (325), U (38). Methods compared (data used in development): Bogardi [1959] (F), Brownlie [1981a] (F and R), van Rijn [1984c] (F and R), White et al. [1987] (F), Karim [1995] (F and R), RVM_L model (F).

^a L and U represent lower and upper regimes, respectively. F or R denotes whether flume data or river data were used in developing the model.
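The decision rule of equation (12) is compact enough to state directly in code. The sketch below assumes the two regime probabilities and the two candidate depths have already been computed by the classification and regression RVMs.

```python
def combine_depths(p_l, p_u, D_low, D_up):
    """Equation (12): select or blend the lower- and upper-regime depth
    predictions using the regime probabilities p_l(L) and p_u(L)."""
    if p_l > 0.9 and p_u > 0.1:    # confidently lower regime
        return D_low
    if p_l < 0.9 and p_u < 0.1:    # confidently upper regime
        return D_up
    # transition: probability-weighted average of the two candidates
    w_low, w_up = p_l, 1.0 - p_u
    return (D_low * w_low + D_up * w_up) / (w_low + w_up)
```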

Table 2. Number of Data Points for Which Flow Regime Is Correctly Predicted in the Transition Regime^a

Observed: Lab, T (103); Field, T (31). Methods compared: Bogardi [1959], Brownlie [1981a], van Rijn [1984c], White et al. [1987], Karim [1995], RVM_L model.

^a For RVM_L, a point is classified as being in the transition regime if 0.1 < p(L) < 0.9, where p(L) is the probability that the data point is in the lower regime.

[35] Tables 3 and 4 present comparisons between the RVM_L model and existing methods for predicting D in laboratory and field channels, respectively. The measures of performance are the following statistics:

[36] 1. Percentage relative bias (R-Bias) in preservation of the mean, expressed as

  R-Bias = (100/N_p) Σ_{i=1}^{N_p} (D_i − D̂_i)/D_i   (13)

where D_i and D̂_i are the observed and predicted depths for the ith data point, respectively, and N_p is the total number of data points. For an ideal model, R-Bias should be zero; R-Bias > 0 or R-Bias < 0 indicates an underpredicting or overpredicting model.

[37] 2. Percentage normalized mean square error (N-MSE), defined as the ratio of the mean square error in predicting depth to the variance, S²_obs, of the observed depth,

  N-MSE = (100/N_p) Σ_{i=1}^{N_p} (D_i − D̂_i)² / S²_obs   (14)

[38] 3. Number of data points for which the predicted values of depth lie within ±10%, ±20%, and ±30% of the observed depth (CI10, CI20, and CI30, respectively). All the existing models used in the comparison, except the Brownlie [1983] method, estimate D iteratively, and a guessed value is required to start the iteration. For the existing methods, the observed D was used as the initial guess. For RVM, D′ and 3D′ were used as the initial guesses for estimating D_up and D_low, respectively, these values being based on the average value of D/D′ in the laboratory data.

Table 3. Performance of Flow Depth Prediction Methods^a

Investigators (data used): Engelund [1966] (F and R), Brownlie [1983] (F and R), van Rijn [1984c] (F and R), Karim and Kennedy [1990] (F and R), Karim [1995] (F and R), Karim [1999] (F), RVM_L model (F). Statistics reported for laboratory data: Number, R-Bias, N-MSE, CI 10, CI 20, CI 30.

^a Number of points for which the algorithm converged (Number), percentage relative bias (R-Bias), percentage normalized mean square error (N-MSE), and number of data points for which the predicted depth was within ±10%, ±20%, and ±30% of the observed depth (CI 10, CI 20, and CI 30) are shown for laboratory data. F or R denotes whether flume data or river data were used in developing the model.

[39] For laboratory data (Table 3), the Brownlie [1983] method outperforms the other models in terms of both the R-Bias and N-MSE statistics. The Engelund [1966] model performs fairly well, as does the van Rijn [1984c] model, except for the upper-regime data points, where it tends to overpredict. The RVM_L model has a higher R-Bias and an intermediate N-MSE; its performance, though not outstanding, compares well with the existing methods. For field data (Table 4), the Engelund [1966] model has the smallest R-Bias but also the highest N-MSE, whereas Brownlie [1983] has the smallest N-MSE and also a relatively small bias. In comparison with the other models, the performance of RVM_L on field data is relatively robust and better than on laboratory data, which is rather surprising since RVM_L was trained only on laboratory data.

[40] The comparison presented thus far is not an independent check, because the data used for verification were also used in calibrating most of the existing methods. In a more stringent verification, two data sets, one for Malaysian rivers [Sinnakaudan et al., 2006] and the other for the Lower Yellow River [Long and Liang, 1994], were selected. These data sets are independent in that they were not used in the calibration, and they also have fairly long records that facilitate a meaningful comparison. The performance of RVM_L is compared with those of the Brownlie [1983] and van Rijn [1984c] methods in Table 5. RVM_L performed best in predicting D for the Lower Yellow River in terms of R-Bias, N-MSE, and the number of points with predicted D within a specified band. All methods were characterized by high values of R-Bias and N-MSE in predicting D for the Malaysian rivers, but RVM_L had the largest values of CI10, CI20, and CI30.

5.1.3. Prediction of Total Sediment Concentration

[41] The log-transformed values of q*, τ*, τ′*, and τ*c formed the input to the transport model, with log C as the output. The performance of the linear RVM_L model in estimating C is compared with that of eleven well-known prediction equations in Table 6. The comparison measures are based on a discrepancy ratio, DR = C_pred/C_obs, where C_pred and C_obs are the predicted and observed values. Following Brownlie [1981a], DR was assumed to be lognormally distributed, and so its geometric mean (GM) and geometric standard deviation (GStd) were estimated for each method. The number of data points satisfying 0.5 ≤ DR ≤ 2, denoted DR2, is also reported.
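The comparison statistics of equations (13) and (14), the CI bands, and the discrepancy-ratio summaries can be computed as in the sketch below (our own implementation of the stated definitions, with the lognormal treatment of DR following Brownlie [1981a]).

```python
import numpy as np

def depth_metrics(D_obs, D_pred):
    """Equations (13)-(14) plus the CI10/CI20/CI30 counts."""
    D_obs, D_pred = np.asarray(D_obs, float), np.asarray(D_pred, float)
    r_bias = 100.0 * np.mean((D_obs - D_pred) / D_obs)              # eq. (13)
    n_mse = 100.0 * np.mean((D_obs - D_pred) ** 2) / np.var(D_obs)  # eq. (14)
    rel_err = np.abs(D_pred - D_obs) / D_obs
    ci = {p: int(np.sum(rel_err <= p / 100.0)) for p in (10, 20, 30)}
    return r_bias, n_mse, ci

def discrepancy_stats(C_obs, C_pred):
    """GM and GStd of DR = C_pred/C_obs (assumed lognormal), and the
    number of points with 0.5 <= DR <= 2 (DR2)."""
    log_dr = np.log(np.asarray(C_pred, float) / np.asarray(C_obs, float))
    gm = np.exp(np.mean(log_dr))
    gstd = np.exp(np.std(log_dr, ddof=1))
    dr2 = int(np.sum((log_dr >= np.log(0.5)) & (log_dr <= np.log(2.0))))
    return gm, gstd, dr2
```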

For most existing models applied to the field data, GM < 1, indicative of a tendency to underpredict C. From Table 6, the performance of the RVM_L model for predicting C is comparable to that of existing models, almost all of which had the advantage of being trained to some extent on field data.

Table 4. Performance of Flow Depth Prediction Methods on Field Data^a

Investigators: Engelund [1966], Brownlie [1983], van Rijn [1984c], Karim and Kennedy [1990], Karim [1995], Karim [1999], RVM_L model. Statistics reported: Number, R-Bias, N-MSE, CI 10, CI 20, CI 30.

^a Number of points for which the algorithm converged (Number), percentage relative bias (R-Bias), percentage normalized mean square error (N-MSE), and number of data points for which the predicted depth was within ±10%, ±20%, and ±30% of the observed depth (CI 10, CI 20, and CI 30) are shown.

[42] The data sets from the Malaysian rivers [Sinnakaudan et al., 2006] and the Lower Yellow River [Long and Liang, 1994] were selected again for independent verification of model performance. The performance of RVM_L is compared with those of the Brownlie [1981a], van Rijn [1984c], and Molinas and Wu [2001] methods in Table 7. All models tend to underpredict C for both the Malaysian rivers and the Lower Yellow River. For the Malaysian rivers, the performance of RVM_L is marginally better than that of the other methods, but for the Lower Yellow River it is substantially better.

[43] In the results presented thus far, the observed D was used in the estimation of C. Table 8 presents the performance of three prediction methods when D was assumed to be unknown and was estimated as part of the prediction process. For each method compared, namely those of Brownlie [1981a] and van Rijn [1984a, 1984b, 1984c], D was estimated using the depth predictor developed by the corresponding investigator. The results so obtained reflect the actual performance that may be expected in predicting C for field applications. The performance degraded for all three methods compared to that obtained using the observed D (Table 6), but the relative performance of the methods remained the same, with RVM_L having the largest value of DR2, followed by Brownlie [1981a] and van Rijn [1984a, 1984b, 1984c], respectively.

[44] The above results for predicting flow regime, flow depth, and total sediment concentration support the claim that empirical models trained solely on laboratory data can be developed that perform as well as, or even, in some specific cases and according to some performance measures, markedly better than, existing models trained partially or wholly on field data. This suggests that judicious model design, through the choice of input variables, can largely avoid scale effects, and that explanations for any poor predictive performance need to be sought elsewhere. The distinct difference in the performance of the methods, including RVM_L, applied to the Malaysian and Lower Yellow River data (Table 7) is a case in point.

5.2. Comparison of RVM Trained on Laboratory Data (RVM_L) With RVM Trained on River Data (RVM_R)

[45] The issue of transferability is further studied by comparing the performance of the same basic model, as specified, e.g., in equation (10), trained separately on laboratory and on field data. The problem of predicting C, rather than D or φ, was selected for further investigation as the most challenging. An RVM model (RVM_R) was developed using only field data for training. Predictions from RVM_R for field data were obtained using fivefold cross validation (see the sketch following Table 5): the field data were divided into five disjoint subsets, the model was trained on all subsets except one, and predictions were made on the subset left out. The procedure was repeated five times, each time selecting a different subset for testing, so that each data point in the field data set becomes part of the test data exactly once.

[46] Table 9 compares the performance of RVM_L and RVM_R on field data; as expected, RVM_R performs on average better than RVM_L, but the difference in performance measures is small, raising the question of whether the difference is statistically significant. This was investigated by considering the prediction variance, σ²_y*, which, in an RVM model, consists firstly of model error and secondly of uncertainty in the estimates of the RVM model parameters (equation (7)). The model error accounts for the precision with which the data can be modeled by the RVM and is measured by the variance in the dependent variable of the training data that cannot be explained by the RVM. Because the laboratory data have smaller uncertainties, the model error of RVM_L was less than that of RVM_R, as illustrated in Figure 2a. The model error remains the same for all test data points. The second contribution to σ²_y* accounts for the uncertainty in the RVM model parameters (w) and varies with each test data point.

Table 5. Comparison of Different Methods in Predicting Flow Depth of Malaysian Rivers [Sinnakaudan et al., 2006] and Lower Yellow River [Long and Liang, 1994]^a

Measures (Number, R-Bias, N-MSE, CI 10, CI 20, CI 30) are reported for the Brownlie [1983], van Rijn [1984c], and RVM_L models for each river data set.

^a Number of points for which the algorithm converged (Number), percentage relative bias (R-Bias), percentage normalized mean square error (N-MSE), and number of data points for which the predicted depth was within ±10%, ±20%, and ±30% of the observed depth (CI 10, CI 20, and CI 30) are provided.
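The fivefold cross-validation scheme described in paragraph 45 can be sketched as follows; `fit` and `predict` are placeholders for the RVM training and prediction steps and are assumptions of this example.

```python
import numpy as np
from sklearn.model_selection import KFold

def cross_val_predictions(X_field, y_field, fit, predict):
    """Each field record is predicted exactly once by a model trained
    on the remaining four of five disjoint subsets."""
    y_hat = np.empty_like(y_field, dtype=float)
    kf = KFold(n_splits=5, shuffle=True, random_state=0)
    for train_idx, test_idx in kf.split(X_field):
        model = fit(X_field[train_idx], y_field[train_idx])
        y_hat[test_idx] = predict(model, X_field[test_idx])
    return y_hat
```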

If a test data point is far from the training data in the space of input variables (i.e., the model is extrapolating), the uncertainty in the model parameters will be high. Because a laboratory-trained model must extrapolate further when making predictions for field data than a field-trained model, the parameter uncertainty of RVM_L was larger than that of RVM_R, as shown in Figure 2b. Figure 3 presents the combined prediction uncertainty of RVM_L and RVM_R for a few randomly selected points in the field data.

Table 6. Performance of Total Sediment Transport Prediction Methods for 1210 Laboratory and 2911 Field Data Points^a

Investigators (data used): Rottner [1959] (F), Engelund and Hansen [1967] (F), Acaroglu [1968] (F and R), Shen and Hung [1972] (F and R), Yang [1973] (F and R), Ackers and White [1973] (F), Brownlie [1981a] (F and R), van Rijn [1984a, 1984b] (F and R), Karim and Kennedy [1990] (F and R), Molinas and Wu [2001] (R), Yang [2005] (F), RVM_L model (F). GM, GStd, and DR2 are reported for laboratory and field data.

^a Geometric mean (GM) and geometric standard deviation (GStd) of the ratio of predicted to observed concentration (discrepancy ratio), along with the number of points with 0.5 ≤ DR ≤ 2 (DR2).

Table 7. Performance of Total Sediment Transport Estimating Methods for Malaysian Rivers (289 Data Points) and Lower Yellow River (935 Points)^a

Methods: Brownlie [1981a], van Rijn [1984a, 1984b], Molinas and Wu [2001], RVM_L model. GM, GStd, and DR2 are reported for each river data set.

^a Geometric mean (GM) and geometric standard deviation (GStd) of the ratio of predicted to observed concentration (discrepancy ratio), along with the number of points with 0.5 ≤ DR ≤ 2 (DR2).

[47] The statistical significance of the difference between the predictions from RVM_L and RVM_R was assessed by testing the null (no difference) hypothesis at three significance levels (1%, 5%, and 10%). Let μ_Li and μ_Ri, and s_Li and s_Ri, denote the means and the standard errors of the predictions of log-transformed concentration at the ith test point by RVM_L and RVM_R, respectively. The difference in mean prediction, dp_i = μ_Li − μ_Ri, has standard error Sp_i = √(s²_Li + s²_Ri), and the test statistic z = dp_i/Sp_i has an approximately normal distribution [Altman and Bland, 2003]. The null hypothesis that the mean predictions from RVM_L and RVM_R are equal could not be rejected at the 1%, 5%, or 10% significance level, leading to the inference that, given the uncertainty in the predictions, the performance differences between RVM_L and RVM_R are not statistically significant (a sketch of this test is given after Table 8).

5.3. Assessing Transferability of a Brownlie-Type Model

[48] The above analysis of the laboratory-trained and field-trained RVM models led to the conclusion that the performance differences were not statistically significant, and a similar analysis for an existing model was motivated to determine the effect of individual model elements. A Brownlie [1981a]-type model was selected for this purpose as a widely known modern example that was also convenient for computer modeling.

[49] The input variables and the power law structure of the Brownlie [1981a] model were retained, but the model constants (coefficient and exponents) were reevaluated, separately for laboratory and field data, using the data set compiled as part of this study (the constants are given in Appendix A5). The constants for the two models, labeled BRN_L and BRN_R, are noticeably different. Table 9 presents the performance of BRN_L and BRN_R in predicting field concentrations; as expected, BRN_R performs better than BRN_L. The statistical significance of the difference between the point predictions of BRN_L and BRN_R was assessed by testing the null hypothesis at the same three significance levels (1%, 5%, and 10%) with a standard two-sample t test, assuming that the predicted values of log C follow a normal distribution. The null hypothesis of the same mean prediction from BRN_L and BRN_R was rejected at all three significance levels, suggesting that the point predictions from BRN_L and BRN_R are indeed statistically different. In contrast to the RVM models, a Brownlie-type model suffers from poor transferability.

Table 8. Performance of Total Sediment Transport Predicting Methods for Field Data (2911 Points) When Both Sediment Concentration and Flow Depth Are Assumed Unknown^a

Methods: Brownlie [1981a], van Rijn [1984a, 1984b, 1984c], RVM_L model. GM, GStd, and DR2 are reported.

^a Geometric mean (GM) and geometric standard deviation (GStd) of the ratio of predicted to observed concentration, along with the number of points with 0.5 ≤ DR ≤ 2 (DR2).
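The pointwise significance test of paragraph 47 amounts to a two-sided z test on each prediction pair; a minimal sketch, using the normal approximation of Altman and Bland [2003], is given below.

```python
import numpy as np
from scipy.stats import norm

def prediction_z_test(mu_L, se_L, mu_R, se_R):
    """z test for the difference between two models' mean log-C
    predictions, given their standard errors (equation (7))."""
    dp = np.asarray(mu_L) - np.asarray(mu_R)
    sp = np.sqrt(np.asarray(se_L) ** 2 + np.asarray(se_R) ** 2)
    z = dp / sp
    p_value = 2.0 * norm.sf(np.abs(z))  # reject equality if p < chosen level
    return z, p_value
```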

Figure 2. Standard error in the estimate of total sediment concentration for field data: (a) uncertainty due to model error and (b) uncertainty in the estimates of model parameters. RVM_L and RVM_R refer to the RVM models trained on laboratory and river data, respectively.

Figure 3. Prediction of total sediment concentration by RVM_L and RVM_R for a few randomly selected field data points. The solid squares correspond to the observed concentration. The crosses and circles refer to the mean predictions by RVM_L and RVM_R, respectively. The error bars stretching across the crosses and circles denote one standard deviation of the predictive distribution.

[50] The proposed RVM model differs from the Brownlie-type models (BRN_L or BRN_R) in two important respects: firstly, in its choice of input variables, and secondly, in the regression methodology (and hence the functional form of the model). In order to determine more precisely the reasons for the poor transferability of BRN, linear RVMs were also developed with the same input variables as the BRN models, and again trained separately on laboratory and field data. The performance measures for these models (BRN-RVM_L and BRN-RVM_R) in predicting field data are also shown in Table 9. The application of the RVM approach with the Brownlie input variables does not enhance the performance of the Brownlie-type models. This points to the choice of input variables, rather than the regression methodology, as the main explanation for the poor transferability.

5.4. Input Variables, Transferability, and KL Divergence

[51] If, as hypothesized, poor transferability can be attributed to a poor choice of input variables, the identification of better input variables becomes the prime problem, requiring both physical and statistical insights for resolution. The present work investigates only a single statistical aspect, namely the similarity of the probability density function (pdf) of each input variable in the laboratory and in the field. Considering the input variables of the RVM and Brownlie-type models as random variables, Figures 4 and 5 compare the marginal pdfs of each input variable in the laboratory and in the field. These were estimated using a Gaussian kernel smoother, a standard nonparametric method of estimating a density function [Bowman and Azzalini, 1997]. Unlike the other input variables, R/d50 and S, which serve as explicit independent inputs to a Brownlie-type model, have very distinct distributions in the laboratory and field data. Thus, if BRN_L is used for field predictions, extrapolation is inevitable in the ranges of the input variables for which laboratory data are not available, which will most likely adversely affect the transferability of a laboratory-trained model. The conditional distributions of the prediction error in log C, given S and R/d50, were estimated for the BRN_L and BRN_R models, and the conditional expected values of the errors are shown in Figure 6. For illustration, the marginal pdfs are plotted again (upper panels of Figure 6). The error in predicting C is larger for the BRN_L model in the regions of input variable space where laboratory data are sparse or not available, with the implication that the difference in the predictions of BRN_L and BRN_R can be attributed to extrapolation in those regions.
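The marginal pdf comparisons of Figures 4 and 5 rest on a Gaussian kernel smoother; a minimal SciPy version is sketched below, with `x_lab` and `x_field` as hypothetical samples of one (log-transformed) input variable.

```python
import numpy as np
from scipy.stats import gaussian_kde

def lab_vs_field_pdfs(x_lab, x_field, n_grid=200):
    """Gaussian kernel density estimates of one input variable in the
    laboratory and field data, evaluated on a common grid."""
    grid = np.linspace(min(x_lab.min(), x_field.min()),
                       max(x_lab.max(), x_field.max()), n_grid)
    return grid, gaussian_kde(x_lab)(grid), gaussian_kde(x_field)(grid)
```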

Table 9. Performance of Various Models When the Same Input Variables Are Used but the Models Are Trained Solely on Laboratory Data (Subscript L) or Solely on Field Data (Subscript R)

Models: RVM_L, RVM_R, BRN_L, BRN_R, BRN-RVM_L, BRN-RVM_R. GM, GStd, and DR2 on field data are reported.

[52] One criterion for a model to achieve better transferability is that the need for extrapolation during prediction should be minimized, as exemplified by the RVM_L model. This can be quantified by means of the Kullback-Leibler (KL) divergence [Kullback and Leibler, 1951], which measures the difference between two distributions. Let p_L(x) and p_R(x) be the pdfs of a variable x in the laboratory and field data, respectively. The KL divergence between the two distributions is given by

  KLD(p_L, p_R) = 0.5 {D_KL(p_L | p_R) + D_KL(p_R | p_L)}   (15)

where

  D_KL(p_L | p_R) = ∫ p_L(x) ln[p_L(x)/p_R(x)] dx,  D_KL(p_R | p_L) = ∫ p_R(x) ln[p_R(x)/p_L(x)] dx   (16)

The KLD is nonnegative, with a value of zero if, and only if, p_L(x) = p_R(x). Variables having similar distributions in the laboratory and field data will have smaller values of KLD. For developing a model with a high degree of transferability from flume to river, input variables with low KLD should be preferred. Table 10 compares the KLD of the input variables in the RVM and Brownlie [1981a] models; as might be expected from Figures 4 and 5, the KLD values for the input variables in the RVM model are substantially smaller than those for Brownlie's model.

Figure 4. Marginal probability density functions (pdfs) of the input variables in the RVM model for predicting total sediment concentration. The solid lines denote the pdfs of the input variables for the laboratory data, and the dashed lines represent the same for the field data.

[53] The KLD can be estimated for each variable separately, but this ignores the dependence among variables. Input variables having similar mutual dependence in laboratory and field data should be more conducive to better transferability, though a one-to-one correspondence is unlikely. To estimate a multivariate KLD that accounts for the dependence among input variables, the marginal distributions p_L(x) and p_R(x) should be replaced by joint distributions q_L(x) and q_R(x), where x is the random vector of input variables. Intuitively, the KLD in this case is a measure of the distance between two joint pdfs in a multidimensional space of input variables.
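As an illustration, the symmetrized divergence of equations (15) and (16) can be estimated numerically from kernel density estimates of the laboratory and field samples; the grid-based quadrature below is our own simple choice, not necessarily the procedure used for Table 10.

```python
import numpy as np
from scipy.stats import gaussian_kde

def kld_symmetric(x_lab, x_field, n_grid=512, eps=1e-12):
    """Equations (15)-(16): symmetrized Kullback-Leibler divergence between
    KDE-estimated pdfs p_L and p_R of one input variable."""
    x = np.linspace(min(x_lab.min(), x_field.min()),
                    max(x_lab.max(), x_field.max()), n_grid)
    dx = x[1] - x[0]
    p_L = gaussian_kde(x_lab)(x) + eps   # eps guards the logarithms
    p_R = gaussian_kde(x_field)(x) + eps
    d_LR = np.sum(p_L * np.log(p_L / p_R)) * dx  # D_KL(p_L || p_R)
    d_RL = np.sum(p_R * np.log(p_R / p_L)) * dx  # D_KL(p_R || p_L)
    return 0.5 * (d_LR + d_RL)
```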

Figure 5. Marginal probability density functions (pdfs) of the input variables in the Brownlie [1981a] model for predicting total sediment concentration. The solid lines denote the pdfs of the input variables for the laboratory data, and the dashed lines represent the same for the field data.

The estimation of joint distributions is, in general, not trivial, but they can be approximated by copulas, a statistical tool for formulating multivariate distributions [Nelsen, 2006; Renard and Lang, 2007]. The multivariate KLD values of the input variables in the RVM and Brownlie-type models, obtained using a Gaussian copula, are provided in the last column of Table 10. As expected, the multivariate KLD for the input variables in the RVM model is smaller than that for the input variables in Brownlie's model.

Figure 6. Marginal probability density functions (pdfs) of (a) relative roughness (R/d50) and (c) slope (S), which serve as inputs to the Brownlie [1981a] model for predicting total sediment concentration. Expected value of the error in predicting log-transformed concentration by the BRN_L and BRN_R models, conditioned on the values of the input variables, (b) R/d50 and (d) S, respectively.

5.5. Further Comments

[54] In their development of machine-learning approaches to sediment transport prediction, Bhattacharya et al. [2007] had already emphasized the importance of similar distributions in the training and testing data. The difference in goals, and hence strategy, from the present study should be highlighted. Because of their focus on develop-


More information

Support Vector Machine (SVM) and Kernel Methods

Support Vector Machine (SVM) and Kernel Methods Support Vector Machine (SVM) and Kernel Methods CE-717: Machine Learning Sharif University of Technology Fall 2015 Soleymani Outline Margin concept Hard-Margin SVM Soft-Margin SVM Dual Problems of Hard-Margin

More information

COMP 551 Applied Machine Learning Lecture 21: Bayesian optimisation

COMP 551 Applied Machine Learning Lecture 21: Bayesian optimisation COMP 55 Applied Machine Learning Lecture 2: Bayesian optimisation Associate Instructor: (herke.vanhoof@mcgill.ca) Class web page: www.cs.mcgill.ca/~jpineau/comp55 Unless otherwise noted, all material posted

More information

Click Prediction and Preference Ranking of RSS Feeds

Click Prediction and Preference Ranking of RSS Feeds Click Prediction and Preference Ranking of RSS Feeds 1 Introduction December 11, 2009 Steven Wu RSS (Really Simple Syndication) is a family of data formats used to publish frequently updated works. RSS

More information

Feature selection and classifier performance in computer-aided diagnosis: The effect of finite sample size

Feature selection and classifier performance in computer-aided diagnosis: The effect of finite sample size Feature selection and classifier performance in computer-aided diagnosis: The effect of finite sample size Berkman Sahiner, a) Heang-Ping Chan, Nicholas Petrick, Robert F. Wagner, b) and Lubomir Hadjiiski

More information

Machine Learning for OR & FE

Machine Learning for OR & FE Machine Learning for OR & FE Regression II: Regularization and Shrinkage Methods Martin Haugh Department of Industrial Engineering and Operations Research Columbia University Email: martin.b.haugh@gmail.com

More information

Statistics: Learning models from data

Statistics: Learning models from data DS-GA 1002 Lecture notes 5 October 19, 2015 Statistics: Learning models from data Learning models from data that are assumed to be generated probabilistically from a certain unknown distribution is a crucial

More information

Linear & nonlinear classifiers

Linear & nonlinear classifiers Linear & nonlinear classifiers Machine Learning Hamid Beigy Sharif University of Technology Fall 1396 Hamid Beigy (Sharif University of Technology) Linear & nonlinear classifiers Fall 1396 1 / 44 Table

More information

The UCD community has made this article openly available. Please share how this access benefits you. Your story matters!

The UCD community has made this article openly available. Please share how this access benefits you. Your story matters! Provided by the author(s) and University College Dublin Library in accordance with publisher policies., Please cite the published version when available. Title Sediment transport formulae for compound

More information

Choosing the Summary Statistics and the Acceptance Rate in Approximate Bayesian Computation

Choosing the Summary Statistics and the Acceptance Rate in Approximate Bayesian Computation Choosing the Summary Statistics and the Acceptance Rate in Approximate Bayesian Computation COMPSTAT 2010 Revised version; August 13, 2010 Michael G.B. Blum 1 Laboratoire TIMC-IMAG, CNRS, UJF Grenoble

More information

Learning Gaussian Process Models from Uncertain Data

Learning Gaussian Process Models from Uncertain Data Learning Gaussian Process Models from Uncertain Data Patrick Dallaire, Camille Besse, and Brahim Chaib-draa DAMAS Laboratory, Computer Science & Software Engineering Department, Laval University, Canada

More information

18.9 SUPPORT VECTOR MACHINES

18.9 SUPPORT VECTOR MACHINES 744 Chapter 8. Learning from Examples is the fact that each regression problem will be easier to solve, because it involves only the examples with nonzero weight the examples whose kernels overlap the

More information

Support Vector Machine (SVM) and Kernel Methods

Support Vector Machine (SVM) and Kernel Methods Support Vector Machine (SVM) and Kernel Methods CE-717: Machine Learning Sharif University of Technology Fall 2014 Soleymani Outline Margin concept Hard-Margin SVM Soft-Margin SVM Dual Problems of Hard-Margin

More information

Recent Advances in Bayesian Inference Techniques

Recent Advances in Bayesian Inference Techniques Recent Advances in Bayesian Inference Techniques Christopher M. Bishop Microsoft Research, Cambridge, U.K. research.microsoft.com/~cmbishop SIAM Conference on Data Mining, April 2004 Abstract Bayesian

More information

Curve Fitting Re-visited, Bishop1.2.5

Curve Fitting Re-visited, Bishop1.2.5 Curve Fitting Re-visited, Bishop1.2.5 Maximum Likelihood Bishop 1.2.5 Model Likelihood differentiation p(t x, w, β) = Maximum Likelihood N N ( t n y(x n, w), β 1). (1.61) n=1 As we did in the case of the

More information

Analysis of Fast Input Selection: Application in Time Series Prediction

Analysis of Fast Input Selection: Application in Time Series Prediction Analysis of Fast Input Selection: Application in Time Series Prediction Jarkko Tikka, Amaury Lendasse, and Jaakko Hollmén Helsinki University of Technology, Laboratory of Computer and Information Science,

More information

Sparse Linear Models (10/7/13)

Sparse Linear Models (10/7/13) STA56: Probabilistic machine learning Sparse Linear Models (0/7/) Lecturer: Barbara Engelhardt Scribes: Jiaji Huang, Xin Jiang, Albert Oh Sparsity Sparsity has been a hot topic in statistics and machine

More information

Sediment transport and river bed evolution

Sediment transport and river bed evolution 1 Chapter 1 Sediment transport and river bed evolution 1.1 What is the sediment transport? What is the river bed evolution? System of the interaction between flow and river beds Rivers transport a variety

More information

Algorithm-Independent Learning Issues

Algorithm-Independent Learning Issues Algorithm-Independent Learning Issues Selim Aksoy Department of Computer Engineering Bilkent University saksoy@cs.bilkent.edu.tr CS 551, Spring 2007 c 2007, Selim Aksoy Introduction We have seen many learning

More information

Learning features by contrasting natural images with noise

Learning features by contrasting natural images with noise Learning features by contrasting natural images with noise Michael Gutmann 1 and Aapo Hyvärinen 12 1 Dept. of Computer Science and HIIT, University of Helsinki, P.O. Box 68, FIN-00014 University of Helsinki,

More information

Discriminative Direction for Kernel Classifiers

Discriminative Direction for Kernel Classifiers Discriminative Direction for Kernel Classifiers Polina Golland Artificial Intelligence Lab Massachusetts Institute of Technology Cambridge, MA 02139 polina@ai.mit.edu Abstract In many scientific and engineering

More information

INTRODUCTION TO PATTERN

INTRODUCTION TO PATTERN INTRODUCTION TO PATTERN RECOGNITION INSTRUCTOR: WEI DING 1 Pattern Recognition Automatic discovery of regularities in data through the use of computer algorithms With the use of these regularities to take

More information

Machine Learning Linear Classification. Prof. Matteo Matteucci

Machine Learning Linear Classification. Prof. Matteo Matteucci Machine Learning Linear Classification Prof. Matteo Matteucci Recall from the first lecture 2 X R p Regression Y R Continuous Output X R p Y {Ω 0, Ω 1,, Ω K } Classification Discrete Output X R p Y (X)

More information

Pattern Recognition and Machine Learning

Pattern Recognition and Machine Learning Christopher M. Bishop Pattern Recognition and Machine Learning ÖSpri inger Contents Preface Mathematical notation Contents vii xi xiii 1 Introduction 1 1.1 Example: Polynomial Curve Fitting 4 1.2 Probability

More information

MODULE -4 BAYEIAN LEARNING

MODULE -4 BAYEIAN LEARNING MODULE -4 BAYEIAN LEARNING CONTENT Introduction Bayes theorem Bayes theorem and concept learning Maximum likelihood and Least Squared Error Hypothesis Maximum likelihood Hypotheses for predicting probabilities

More information

GWAS V: Gaussian processes

GWAS V: Gaussian processes GWAS V: Gaussian processes Dr. Oliver Stegle Christoh Lippert Prof. Dr. Karsten Borgwardt Max-Planck-Institutes Tübingen, Germany Tübingen Summer 2011 Oliver Stegle GWAS V: Gaussian processes Summer 2011

More information

Linear discriminant functions

Linear discriminant functions Andrea Passerini passerini@disi.unitn.it Machine Learning Discriminative learning Discriminative vs generative Generative learning assumes knowledge of the distribution governing the data Discriminative

More information

COS513: FOUNDATIONS OF PROBABILISTIC MODELS LECTURE 10

COS513: FOUNDATIONS OF PROBABILISTIC MODELS LECTURE 10 COS53: FOUNDATIONS OF PROBABILISTIC MODELS LECTURE 0 MELISSA CARROLL, LINJIE LUO. BIAS-VARIANCE TRADE-OFF (CONTINUED FROM LAST LECTURE) If V = (X n, Y n )} are observed data, the linear regression problem

More information

Learning with Rejection

Learning with Rejection Learning with Rejection Corinna Cortes 1, Giulia DeSalvo 2, and Mehryar Mohri 2,1 1 Google Research, 111 8th Avenue, New York, NY 2 Courant Institute of Mathematical Sciences, 251 Mercer Street, New York,

More information

Instance-based Learning CE-717: Machine Learning Sharif University of Technology. M. Soleymani Fall 2016

Instance-based Learning CE-717: Machine Learning Sharif University of Technology. M. Soleymani Fall 2016 Instance-based Learning CE-717: Machine Learning Sharif University of Technology M. Soleymani Fall 2016 Outline Non-parametric approach Unsupervised: Non-parametric density estimation Parzen Windows Kn-Nearest

More information

CMU-Q Lecture 24:

CMU-Q Lecture 24: CMU-Q 15-381 Lecture 24: Supervised Learning 2 Teacher: Gianni A. Di Caro SUPERVISED LEARNING Hypotheses space Hypothesis function Labeled Given Errors Performance criteria Given a collection of input

More information

Lecture: Gaussian Process Regression. STAT 6474 Instructor: Hongxiao Zhu

Lecture: Gaussian Process Regression. STAT 6474 Instructor: Hongxiao Zhu Lecture: Gaussian Process Regression STAT 6474 Instructor: Hongxiao Zhu Motivation Reference: Marc Deisenroth s tutorial on Robot Learning. 2 Fast Learning for Autonomous Robots with Gaussian Processes

More information

Bayesian Inference: Principles and Practice 3. Sparse Bayesian Models and the Relevance Vector Machine

Bayesian Inference: Principles and Practice 3. Sparse Bayesian Models and the Relevance Vector Machine Bayesian Inference: Principles and Practice 3. Sparse Bayesian Models and the Relevance Vector Machine Mike Tipping Gaussian prior Marginal prior: single α Independent α Cambridge, UK Lecture 3: Overview

More information

Probabilistic Machine Learning. Industrial AI Lab.

Probabilistic Machine Learning. Industrial AI Lab. Probabilistic Machine Learning Industrial AI Lab. Probabilistic Linear Regression Outline Probabilistic Classification Probabilistic Clustering Probabilistic Dimension Reduction 2 Probabilistic Linear

More information

Probabilistic classification CE-717: Machine Learning Sharif University of Technology. M. Soleymani Fall 2016

Probabilistic classification CE-717: Machine Learning Sharif University of Technology. M. Soleymani Fall 2016 Probabilistic classification CE-717: Machine Learning Sharif University of Technology M. Soleymani Fall 2016 Topics Probabilistic approach Bayes decision theory Generative models Gaussian Bayes classifier

More information

The Generalized Likelihood Uncertainty Estimation methodology

The Generalized Likelihood Uncertainty Estimation methodology CHAPTER 4 The Generalized Likelihood Uncertainty Estimation methodology Calibration and uncertainty estimation based upon a statistical framework is aimed at finding an optimal set of models, parameters

More information

Statistical Learning. Philipp Koehn. 10 November 2015

Statistical Learning. Philipp Koehn. 10 November 2015 Statistical Learning Philipp Koehn 10 November 2015 Outline 1 Learning agents Inductive learning Decision tree learning Measuring learning performance Bayesian learning Maximum a posteriori and maximum

More information

Fundamentals to Biostatistics. Prof. Chandan Chakraborty Associate Professor School of Medical Science & Technology IIT Kharagpur

Fundamentals to Biostatistics. Prof. Chandan Chakraborty Associate Professor School of Medical Science & Technology IIT Kharagpur Fundamentals to Biostatistics Prof. Chandan Chakraborty Associate Professor School of Medical Science & Technology IIT Kharagpur Statistics collection, analysis, interpretation of data development of new

More information

Self Adaptive Particle Filter

Self Adaptive Particle Filter Self Adaptive Particle Filter Alvaro Soto Pontificia Universidad Catolica de Chile Department of Computer Science Vicuna Mackenna 4860 (143), Santiago 22, Chile asoto@ing.puc.cl Abstract The particle filter

More information

STA 4273H: Statistical Machine Learning

STA 4273H: Statistical Machine Learning STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Computer Science! Department of Statistical Sciences! rsalakhu@cs.toronto.edu! h0p://www.cs.utoronto.ca/~rsalakhu/ Lecture 7 Approximate

More information

Machine Learning. Bayesian Regression & Classification. Marc Toussaint U Stuttgart

Machine Learning. Bayesian Regression & Classification. Marc Toussaint U Stuttgart Machine Learning Bayesian Regression & Classification learning as inference, Bayesian Kernel Ridge regression & Gaussian Processes, Bayesian Kernel Logistic Regression & GP classification, Bayesian Neural

More information

Non-Bayesian Classifiers Part II: Linear Discriminants and Support Vector Machines

Non-Bayesian Classifiers Part II: Linear Discriminants and Support Vector Machines Non-Bayesian Classifiers Part II: Linear Discriminants and Support Vector Machines Selim Aksoy Department of Computer Engineering Bilkent University saksoy@cs.bilkent.edu.tr CS 551, Fall 2018 CS 551, Fall

More information

Machine Learning Techniques for Computer Vision

Machine Learning Techniques for Computer Vision Machine Learning Techniques for Computer Vision Part 2: Unsupervised Learning Microsoft Research Cambridge x 3 1 0.5 0.2 0 0.5 0.3 0 0.5 1 ECCV 2004, Prague x 2 x 1 Overview of Part 2 Mixture models EM

More information

CSC2541 Lecture 2 Bayesian Occam s Razor and Gaussian Processes

CSC2541 Lecture 2 Bayesian Occam s Razor and Gaussian Processes CSC2541 Lecture 2 Bayesian Occam s Razor and Gaussian Processes Roger Grosse Roger Grosse CSC2541 Lecture 2 Bayesian Occam s Razor and Gaussian Processes 1 / 55 Adminis-Trivia Did everyone get my e-mail

More information

Reducing Multiclass to Binary: A Unifying Approach for Margin Classifiers

Reducing Multiclass to Binary: A Unifying Approach for Margin Classifiers Reducing Multiclass to Binary: A Unifying Approach for Margin Classifiers Erin Allwein, Robert Schapire and Yoram Singer Journal of Machine Learning Research, 1:113-141, 000 CSE 54: Seminar on Learning

More information

STA414/2104. Lecture 11: Gaussian Processes. Department of Statistics

STA414/2104. Lecture 11: Gaussian Processes. Department of Statistics STA414/2104 Lecture 11: Gaussian Processes Department of Statistics www.utstat.utoronto.ca Delivered by Mark Ebden with thanks to Russ Salakhutdinov Outline Gaussian Processes Exam review Course evaluations

More information

Lecture : Probabilistic Machine Learning

Lecture : Probabilistic Machine Learning Lecture : Probabilistic Machine Learning Riashat Islam Reasoning and Learning Lab McGill University September 11, 2018 ML : Many Methods with Many Links Modelling Views of Machine Learning Machine Learning

More information

Short Note: Naive Bayes Classifiers and Permanence of Ratios

Short Note: Naive Bayes Classifiers and Permanence of Ratios Short Note: Naive Bayes Classifiers and Permanence of Ratios Julián M. Ortiz (jmo1@ualberta.ca) Department of Civil & Environmental Engineering University of Alberta Abstract The assumption of permanence

More information

Lecture 2. G. Cowan Lectures on Statistical Data Analysis Lecture 2 page 1

Lecture 2. G. Cowan Lectures on Statistical Data Analysis Lecture 2 page 1 Lecture 2 1 Probability (90 min.) Definition, Bayes theorem, probability densities and their properties, catalogue of pdfs, Monte Carlo 2 Statistical tests (90 min.) general concepts, test statistics,

More information

SUPERVISED LEARNING: INTRODUCTION TO CLASSIFICATION

SUPERVISED LEARNING: INTRODUCTION TO CLASSIFICATION SUPERVISED LEARNING: INTRODUCTION TO CLASSIFICATION 1 Outline Basic terminology Features Training and validation Model selection Error and loss measures Statistical comparison Evaluation measures 2 Terminology

More information

Keywords: Multimode process monitoring, Joint probability, Weighted probabilistic PCA, Coefficient of variation.

Keywords: Multimode process monitoring, Joint probability, Weighted probabilistic PCA, Coefficient of variation. 2016 International Conference on rtificial Intelligence: Techniques and pplications (IT 2016) ISBN: 978-1-60595-389-2 Joint Probability Density and Weighted Probabilistic PC Based on Coefficient of Variation

More information

Gaussian Processes (10/16/13)

Gaussian Processes (10/16/13) STA561: Probabilistic machine learning Gaussian Processes (10/16/13) Lecturer: Barbara Engelhardt Scribes: Changwei Hu, Di Jin, Mengdi Wang 1 Introduction In supervised learning, we observe some inputs

More information

Simple closed form formulas for predicting groundwater flow model uncertainty in complex, heterogeneous trending media

Simple closed form formulas for predicting groundwater flow model uncertainty in complex, heterogeneous trending media WATER RESOURCES RESEARCH, VOL. 4,, doi:0.029/2005wr00443, 2005 Simple closed form formulas for predicting groundwater flow model uncertainty in complex, heterogeneous trending media Chuen-Fa Ni and Shu-Guang

More information

9/12/17. Types of learning. Modeling data. Supervised learning: Classification. Supervised learning: Regression. Unsupervised learning: Clustering

9/12/17. Types of learning. Modeling data. Supervised learning: Classification. Supervised learning: Regression. Unsupervised learning: Clustering Types of learning Modeling data Supervised: we know input and targets Goal is to learn a model that, given input data, accurately predicts target data Unsupervised: we know the input only and want to make

More information

Maximum Likelihood, Logistic Regression, and Stochastic Gradient Training

Maximum Likelihood, Logistic Regression, and Stochastic Gradient Training Maximum Likelihood, Logistic Regression, and Stochastic Gradient Training Charles Elkan elkan@cs.ucsd.edu January 17, 2013 1 Principle of maximum likelihood Consider a family of probability distributions

More information

COMS 4771 Introduction to Machine Learning. Nakul Verma

COMS 4771 Introduction to Machine Learning. Nakul Verma COMS 4771 Introduction to Machine Learning Nakul Verma Announcements HW1 due next lecture Project details are available decide on the group and topic by Thursday Last time Generative vs. Discriminative

More information

Machine Learning Ensemble Learning I Hamid R. Rabiee Jafar Muhammadi, Alireza Ghasemi Spring /

Machine Learning Ensemble Learning I Hamid R. Rabiee Jafar Muhammadi, Alireza Ghasemi Spring / Machine Learning Ensemble Learning I Hamid R. Rabiee Jafar Muhammadi, Alireza Ghasemi Spring 2015 http://ce.sharif.edu/courses/93-94/2/ce717-1 / Agenda Combining Classifiers Empirical view Theoretical

More information

Pattern Recognition and Machine Learning. Bishop Chapter 2: Probability Distributions

Pattern Recognition and Machine Learning. Bishop Chapter 2: Probability Distributions Pattern Recognition and Machine Learning Chapter 2: Probability Distributions Cécile Amblard Alex Kläser Jakob Verbeek October 11, 27 Probability Distributions: General Density Estimation: given a finite

More information

CS-E3210 Machine Learning: Basic Principles

CS-E3210 Machine Learning: Basic Principles CS-E3210 Machine Learning: Basic Principles Lecture 4: Regression II slides by Markus Heinonen Department of Computer Science Aalto University, School of Science Autumn (Period I) 2017 1 / 61 Today s introduction

More information

STA 4273H: Statistical Machine Learning

STA 4273H: Statistical Machine Learning STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! rsalakhu@utstat.toronto.edu! http://www.utstat.utoronto.ca/~rsalakhu/ Sidney Smith Hall, Room 6002 Lecture 3 Linear

More information

That s Hot: Predicting Daily Temperature for Different Locations

That s Hot: Predicting Daily Temperature for Different Locations That s Hot: Predicting Daily Temperature for Different Locations Alborz Bejnood, Max Chang, Edward Zhu Stanford University Computer Science 229: Machine Learning December 14, 2012 1 Abstract. The problem

More information

Learning with Noisy Labels. Kate Niehaus Reading group 11-Feb-2014

Learning with Noisy Labels. Kate Niehaus Reading group 11-Feb-2014 Learning with Noisy Labels Kate Niehaus Reading group 11-Feb-2014 Outline Motivations Generative model approach: Lawrence, N. & Scho lkopf, B. Estimating a Kernel Fisher Discriminant in the Presence of

More information

Bayesian Linear Regression. Sargur Srihari

Bayesian Linear Regression. Sargur Srihari Bayesian Linear Regression Sargur srihari@cedar.buffalo.edu Topics in Bayesian Regression Recall Max Likelihood Linear Regression Parameter Distribution Predictive Distribution Equivalent Kernel 2 Linear

More information

σ(a) = a N (x; 0, 1 2 ) dx. σ(a) = Φ(a) =

σ(a) = a N (x; 0, 1 2 ) dx. σ(a) = Φ(a) = Until now we have always worked with likelihoods and prior distributions that were conjugate to each other, allowing the computation of the posterior distribution to be done in closed form. Unfortunately,

More information

Nonparametric Bayes tensor factorizations for big data

Nonparametric Bayes tensor factorizations for big data Nonparametric Bayes tensor factorizations for big data David Dunson Department of Statistical Science, Duke University Funded from NIH R01-ES017240, R01-ES017436 & DARPA N66001-09-C-2082 Motivation Conditional

More information

3.4 Linear Least-Squares Filter

3.4 Linear Least-Squares Filter X(n) = [x(1), x(2),..., x(n)] T 1 3.4 Linear Least-Squares Filter Two characteristics of linear least-squares filter: 1. The filter is built around a single linear neuron. 2. The cost function is the sum

More information

Lawrence D. Brown* and Daniel McCarthy*

Lawrence D. Brown* and Daniel McCarthy* Comments on the paper, An adaptive resampling test for detecting the presence of significant predictors by I. W. McKeague and M. Qian Lawrence D. Brown* and Daniel McCarthy* ABSTRACT: This commentary deals

More information

On the errors introduced by the naive Bayes independence assumption

On the errors introduced by the naive Bayes independence assumption On the errors introduced by the naive Bayes independence assumption Author Matthijs de Wachter 3671100 Utrecht University Master Thesis Artificial Intelligence Supervisor Dr. Silja Renooij Department of

More information

CSci 8980: Advanced Topics in Graphical Models Gaussian Processes

CSci 8980: Advanced Topics in Graphical Models Gaussian Processes CSci 8980: Advanced Topics in Graphical Models Gaussian Processes Instructor: Arindam Banerjee November 15, 2007 Gaussian Processes Outline Gaussian Processes Outline Parametric Bayesian Regression Gaussian

More information

COMS 4771 Regression. Nakul Verma

COMS 4771 Regression. Nakul Verma COMS 4771 Regression Nakul Verma Last time Support Vector Machines Maximum Margin formulation Constrained Optimization Lagrange Duality Theory Convex Optimization SVM dual and Interpretation How get the

More information

Support Vector Machines

Support Vector Machines Support Vector Machines Le Song Machine Learning I CSE 6740, Fall 2013 Naïve Bayes classifier Still use Bayes decision rule for classification P y x = P x y P y P x But assume p x y = 1 is fully factorized

More information