Performance evaluation of Bagging and Boosting in nonparametric regression


SIMONE BORRA (*) - AGOSTINO DI CIACCIO (**)

Performance evaluation of Bagging and Boosting in nonparametric regression

Contents: 1. Introduction. 2. Bagging and Boosting in Projection Pursuit Regression. 3. Simulation study on real data sets. 4. Conclusions. References. Summary. Riassunto. Key words.

1. Introduction

In recent years new algorithms have been proposed in the literature to improve the accuracy of nonparametric classifiers or the prediction capability of nonparametric regression methods. In this context, very promising empirical results have been obtained with procedures that aggregate simple predictors built by resampling the training sample. Two relevant procedures are Bagging (Breiman, 1996) and Boosting (Freund & Schapire, 1997). Simulation studies on real and artificial data sets showed that, using Bagging or Boosting with Decision Trees (for example, CART) or with Neural Networks, it is possible to obtain an appreciable reduction of the prediction error or the classification error (Drucker & Cortes, 1996; Freund & Schapire, 1996; Quinlan, 1996; Schwenk & Bengio, 1998). These techniques can now be considered a valid alternative to the pruning method commonly used with Decision Trees. With regard to the performance comparison of Bagging and Boosting, some simulation studies, concerning in particular the use of CART, showed that Boosting works better than Bagging (Breiman, 1998a; Bauer & Kohavi, 1998; Dietterich, 1998).

(*) University of Rome Tor Vergata - Dept. Sefemeq
(**) University of Urbino - Faculty of Economics

In this paper we consider the prediction capability of nonparametric regression methods and, in particular, we compare the use of Boosting and Bagging in conjunction with Projection Pursuit Regression, as proposed in Borra & Di Ciaccio (2001). In the present simulation study we use well-known real data sets, often considered in other papers, to allow a comparison with other approaches presented in the literature. We remark that the choice to consider both real and artificial data sets depends on the different information that simulation studies can provide: with artificial data, fairly regular, we can study the influence of the parameters on the prediction capability of a method; with real data, generally more complex and irregular, we can observe the effective prediction capability in empirical studies. The results obtained in this work show that both techniques are efficient also when they are applied with models more complex than the weak learners most often considered in the literature (for example, Decision Trees).

2. Bagging and Boosting in Projection Pursuit Regression

Consider a nonparametric regression problem Y = f(X) + ε, where we suppose that a response variable Y is related to p predictor variables X = {X_1, X_2, ..., X_p} by an unknown nonlinear function f(X), defined in such a way that ε is a noise component with E(ε) = 0. We do not assume f(X) to be the true link function between Y and the predictor variables; rather, we consider f(X) a function useful to reconstruct the values of Y starting from the values of the predictor variables. The objective is not to explain the link existing between the variables, but to obtain a tool useful in a prediction problem. In this context we evaluate the choice of the estimator of the function f(X) in terms of its prediction capability. Many strategies have been explored to improve the prediction capability in the presence of nonlinear associations, for example cross-validation and methods based on data perturbation (Efron & Tibshirani, 1993; Dietterich, 1997). In this paper, in accordance with the methodology used by some authors (for example, Breiman 1998a, 1998b), we compare the prediction capability of some estimators of f(X) on empirically known populations, drawing from them independent random samples.

If we indicate with g(X | t) an approximating function of the unknown function, based on the n observations of a training sample t = {y_i, x_i}_{i=1}^n, then we use as a measure of prediction error the mean squared generalization error:

    PE(g \mid t) = E_{Y,X}\,(Y - g(X \mid t))^2    (1)

where E_{Y,X} indicates the average over all values (y, x) of the population not included in the training sample. The prediction capability of the function g(X | t) can be evaluated by considering the average of the sampling distribution of PE(g | t), indicated by PE(g).

Some recent approaches show that it is possible to build an aggregated predictor able to improve (in terms of prediction capability) the performance obtained by a single predictor. In general, single approximating functions g_1(X | t_1), ..., g_K(X | t_K) are calculated on the basis of a sequence of sub-samples t_1, ..., t_K generated from the training sample t (by resampling, possibly with a system of weights); they are then suitably aggregated to build an aggregated predictor. This approach has produced two relevant procedures: Bagging, including its variants (Breiman 1996, 1998b, 1999), and Boosting (Drucker, 1997; Ridgeway et al., 1999).

We can describe the Bagging procedure by the following steps:
1. From the training sample t = {y_i, x_i}_{i=1}^n, draw with replacement one sub-sample t_k.
2. Calculate the predictor g_k(X | t_k).
3. Repeat Steps 1 and 2 K times and finally calculate the aggregated bagging predictor as:

    g_{bagg}(X \mid t) = \sum_{k=1}^{K} \gamma_k \, g_k(X \mid t_k)    (2)

where γ_k ≥ 0 and \sum_{k=1}^{K} γ_k = 1 (usually γ_k = 1/K); a minimal code sketch of these steps is given below.

The explanation given by Breiman (1996) for the reduction of PE(g) by means of the aggregated bagging predictor considers the following decomposition:

    PE(g) = var(\varepsilon) + E_X\,(f(X) - \bar{g}(X))^2 + E_{X,T}\,(g(X \mid T) - \bar{g}(X))^2    (3)

where var(ε) is the variance of the noise component and ḡ(X) = E_T(g(X | T)) is the average of g(X | T) over all training samples of size n. The second and third terms of (3) are, respectively, the squared bias and the variance of the approximating function.
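As an illustration of the Bagging steps above, the following is a minimal Python sketch, with scikit-learn's DecisionTreeRegressor used as a stand-in for the base learner (the paper estimates PPR with the SMART program, which is not assumed to be available here); the names fit_bagging and bagging_predict are purely illustrative.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor  # stand-in for the PPR base learner

def fit_bagging(X, y, K=30, random_state=0):
    """Steps 1-3: fit K base regressors on bootstrap sub-samples of the training sample.

    X, y are NumPy arrays of shape (n, p) and (n,).
    """
    rng = np.random.RandomState(random_state)
    n = len(y)
    models = []
    for _ in range(K):
        idx = rng.randint(0, n, size=n)                             # Step 1: draw with replacement
        models.append(DecisionTreeRegressor().fit(X[idx], y[idx]))  # Step 2: fit g_k on sub-sample t_k
    return models

def bagging_predict(models, X):
    """Step 3 / equation (2): aggregate with equal weights gamma_k = 1/K."""
    preds = np.column_stack([g_k.predict(X) for g_k in models])
    return preds.mean(axis=1)
```

With γ_k = 1/K the aggregation is a simple average of the K bootstrap predictors, which is the usual choice mentioned in the text.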

The effect produced by Bagging (demonstrated in many simulation studies) is to reduce the variance term considerably while keeping the bias essentially constant. Recently, Friedman & Hall (1999) pointed out a theoretical justification, showing the decomposition of a statistical estimator of a function f(X) into a linear part and higher-order parts. The authors deduced that the effect of Bagging is essentially due to the reduced variability of the nonlinear component, which is replaced by an estimate of its expected value, consequently decreasing the variability of the estimator.

The Boosting algorithm was initially studied for classification problems, and only recently have similar approaches been proposed for regression problems. Differently from Bagging, where all cases of the training sample have the same probability of inclusion in the sub-sample, in the Boosting algorithm the probabilities of inclusion change. In particular, considering the k-th sub-sample, the probability of inclusion for the i-th case increases if the corresponding value y_i was poorly predicted by the approximating functions estimated on the k - 1 previous sub-samples. In this way the algorithm forces the sub-sample to include cases with values that, up to that moment, were poorly predicted. We considered a Boosting algorithm for regression problems proposed by Drucker (1997) (a modification of AdaBoost, proposed by Freund and Schapire for classification problems), which can be summarized as follows:
1. First, assign the same weight w_i = 1 (i = 1, ..., n) to all cases of the training sample.
2. Generate a sub-sample t_k by drawing with replacement n examples from the training sample t, where the probability of inclusion for the i-th case is fixed to p_i = w_i / \sum_{j=1}^{n} w_j.
3. Fit on the sub-sample t_k the approximating function g_k(X | t_k) and, for all cases of the training sample, obtain the predicted value of Y, ŷ_i^k (i = 1, ..., n).
4. Calculate over all cases a normalised quadratic loss function, L_i(g_k) = (ŷ_i^k - y_i)^2 / \sum_j (ŷ_j^k - y_j)^2 for i = 1, ..., n, and finally calculate the average loss L̄(g_k) = \sum_{i=1}^{n} L_i(g_k) p_i.
5. Calculate the coefficient β(g_k) = L̄(g_k) / (1 - L̄(g_k)), which indicates the prediction capability of the approximating function g_k(X | t_k): a small value of β(g_k) indicates a high prediction capability.
6. Update the weights as w_i ← w_i β(g_k)^(1 - L_i(g_k)): in this way, a small loss for the i-th case corresponds to a larger reduction of the weight w_i and consequently to a lower probability p_i of being included in the successive sub-sample, so that poorly predicted cases retain relatively larger weights.
7. Repeat the procedure from Step 2 to Step 6 K times, or stop earlier when the average loss L̄(g_k) is no longer less than 0.5.
8. To obtain the aggregated boosting predictor, use the weighted median:

    g_{boost}(X \mid t) = \inf\Big\{ y \in Y : \sum_{k:\, g_k(X \mid t_k) \le y} \log(1/\beta(g_k)) \ge \tfrac{1}{2} \sum_{k} \log(1/\beta(g_k)) \Big\}    (4)

The aggregated predictor g_boost weighs the K approximating functions, giving higher weights to those with the best fit over all cases of the training sample. In this way, the specialized prediction capability of the K approximating functions on a restricted number of harder cases is balanced against their global prediction capability over all cases of the training sample. In Step 4, Drucker (1997) also proposed two other candidate loss functions, linear and exponential, but in this study we always chose the quadratic loss function. A minimal sketch of this scheme is given below.
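The following Python sketch mirrors Steps 1-8 with a generic base regressor in place of PPR. It is one possible reading of the scheme described above (in particular of the stopping rule and of the normalised quadratic loss), not the authors' implementation, and the names fit_boosting and boosting_predict are illustrative.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor  # stand-in for the PPR base learner

def fit_boosting(X, y, K=50, random_state=0):
    """One reading of Steps 1-7 with the normalised quadratic loss; X, y are NumPy arrays."""
    rng = np.random.RandomState(random_state)
    n = len(y)
    w = np.ones(n)                                       # Step 1: equal initial weights
    models, betas = [], []
    for _ in range(K):
        p = w / w.sum()                                  # Step 2: inclusion probabilities
        idx = rng.choice(n, size=n, replace=True, p=p)
        g_k = DecisionTreeRegressor().fit(X[idx], y[idx])  # Step 3: fit on sub-sample t_k
        y_hat = g_k.predict(X)                           # predictions on the whole training sample
        sq_err = (y_hat - y) ** 2
        L = sq_err / sq_err.sum()                        # Step 4: normalised quadratic loss
        L_bar = np.sum(L * p)                            # average loss
        if L_bar >= 0.5:                                 # Step 7: stop when the loss is no longer < 0.5
            break
        beta = L_bar / (1.0 - L_bar)                     # Step 5
        w = w * beta ** (1.0 - L)                        # Step 6: well-predicted cases lose weight
        models.append(g_k)
        betas.append(beta)
    return models, betas

def boosting_predict(models, betas, X):
    """Step 8 / equation (4): weighted median of the aggregated predictions."""
    preds = np.column_stack([g.predict(X) for g in models])   # shape (n_obs, number of models)
    log_w = np.log(1.0 / np.array(betas))
    half_total = 0.5 * log_w.sum()
    out = np.empty(preds.shape[0])
    for i, row in enumerate(preds):
        order = np.argsort(row)
        cum = np.cumsum(log_w[order])
        j = np.searchsorted(cum, half_total)              # smallest prediction reaching half the weight
        out[i] = row[order[min(j, len(row) - 1)]]
    return out
```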

In the literature, the properties of Bagging and Boosting applied to regression problems have been studied mainly with Regression Trees and K-Nearest Neighbors as approximating functions (Breiman, 1999; Drucker, 1997; Ridgeway et al., 1999). In this work we use as approximating function Projection Pursuit Regression (PPR) (Friedman & Stuetzle, 1981). We define PPR as:

    g_k(X \mid t_k) = \sum_{m=1}^{M} \beta_{m,k} \, h_{m,k}(\alpha_{m,k}^T X \mid t_k)    (5)

where the h_{m,k}(·) are smooth functions of different linear combinations of the predictor variables. Increasing the number M of smooth functions, the model is capable of fitting functions of higher complexity. Diaconis and Shahshahani (1984) demonstrated that, for a large value of M and under very general conditions, every continuous function of X can be approximated by PPR. To estimate the PPR model we used the SMART program (Friedman, 1985); to employ a pruning strategy it is necessary to select the largest number of smooth functions, M_L, to use in the search, as well as the final number of smooth functions, M_F.
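As a purely structural illustration of equation (5), the sketch below evaluates a projection pursuit model with M fixed ridge terms; the directions, coefficients and smooth functions are arbitrary choices made only to show the form of the predictor, and this is not the SMART estimation procedure.

```python
import numpy as np

def ppr_predict(X, alphas, betas, smoothers):
    """Evaluate g(X) = sum_m beta_m * h_m(alpha_m' X), the form of equation (5)."""
    g = np.zeros(X.shape[0])
    for alpha, beta, h in zip(alphas, betas, smoothers):
        z = X @ alpha          # projection of the predictors on the direction alpha_m
        g += beta * h(z)       # ridge term beta_m * h_m(alpha_m' X)
    return g

# Toy usage with M = 2 ridge terms and p = 3 predictors (directions and smoothers are arbitrary).
X = np.random.randn(5, 3)
alphas = [np.array([1.0, 0.0, 0.0]), np.array([0.0, 0.7, 0.7])]
betas = [2.0, -1.0]
smoothers = [np.tanh, lambda z: z ** 2]
print(ppr_predict(X, alphas, betas, smoothers))
```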

In our applications we fixed the number of terms as M_L = M_F + 4. We performed simulation studies regarding the simple predictor PPR, the aggregated Bagging predictor and the aggregated Boosting predictor, in order to estimate the values of PE(g_PPR), PE(g_PPRbagg), PE(g_PPRboost) and the related variances.

3. Simulation study on real data sets

Using artificial data sets originated in Friedman (1991), Borra & Di Ciaccio (2001) showed that Bagging applied to PPR can reduce the prediction error considerably, and they pointed out some factors affecting the performance of Bagging. In particular they showed that:
1. Increasing K, the number of aggregated PPR models calculated on the bootstrapped samples, the average prediction error PE(g_PPRbagg) decreases until it becomes constant.
2. A rise of complexity in the approximating function resulting from an increase of M does not lead, differently from what occurs with simple PPR, to overfitting the training data and consequently to a reduced prediction capability.
3. The effect of Bagging is more evident in the presence of moderate-high noise.
In this work we have considered three well-known real data sets, previously used in other studies and included in the machine learning repository of the University of California, Irvine: the Boston Housing data, with 506 observations, one for each census district of the Boston metropolitan area, the median market value of owner-occupied homes as the response variable and 12 socio-demographic and environmental indicators as predictor variables; the Auto-MPG data, concerning the city-cycle fuel consumption in miles per gallon of 398 cars, with 7 predictor variables related to mechanical characteristics, year and origin of production; the Abalone data, which includes the age of 4177 examples of abalone (the age of an abalone is determined by cutting the shell through the cone, staining it, and counting the number of rings through a microscope) and 8 different measurements of dimension and weight, which are easier to obtain and are used to predict the age. To estimate PE(g_PPR), PE(g_PPRbagg) and PE(g_PPRboost) we applied the simulation procedure illustrated in Fig. 1.

[Fig. 1. Simulation schema for the real data-set Auto-MPG: random split of the empirical data-set into a training sample (80% of units) and a test sample (20% of units); estimation of the methods PPR, PPRbagg and PPRboost; calculation of the MSE on the training sample and of the PE on the test sample for each method.]

In the first step we randomly split the real data set in two parts: the training set (including, in accordance with the training sizes used in the literature, 80% of the cases for the Auto-MPG data-set, 90% for Boston Housing and 75% for Abalone) and the test set (including the remaining cases). From the training data we estimate g_PPR and the aggregated predictors g_PPRbagg and g_PPRboost obtained from the two different resampling techniques. Correspondingly, the values of the mean squared error (MSE) are calculated on the training data set. Then we calculate the prediction error PE on the test set for each method. This schema is repeated 50 times with different random splits of the data-set, and finally we obtain three estimates of the average mean-squared generalization error PE. The simulation procedure is replicated for several values of M; hence the effect of several degrees of complexity of PPR is considered for both Bagging and Boosting. A minimal code sketch of this schema is given below.
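The following Python sketch of the schema in Fig. 1 uses a generic regressor as a stand-in for the PPR, PPRbagg and PPRboost estimators (the study itself estimates PPR with SMART); the default test_size of 0.2 matches the Auto-MPG split, while 0.1 and 0.25 would be used for Boston Housing and Abalone, and the function name simulation_schema is illustrative.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
from sklearn.tree import DecisionTreeRegressor  # stand-in for PPR / PPRbagg / PPRboost

def simulation_schema(X, y, fit_fn, n_repetitions=50, test_size=0.2, seed=0):
    """Repeat the random split, record the training MSE and the test PE, average over repetitions."""
    mse_train, pe_test = [], []
    for r in range(n_repetitions):
        X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=test_size, random_state=seed + r)
        model = fit_fn(X_tr, y_tr)                    # estimate the method on the training sample
        mse_train.append(mean_squared_error(y_tr, model.predict(X_tr)))
        pe_test.append(mean_squared_error(y_te, model.predict(X_te)))
    return np.mean(mse_train), np.mean(pe_test), np.var(pe_test)

# Example with the simple predictor; bagged or boosted estimators would be passed through fit_fn too.
# mse, pe, pe_var = simulation_schema(X, y, lambda X, y: DecisionTreeRegressor().fit(X, y))
```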

In Figures 2 to 5 we show the results of the simulations for the Auto-MPG data-set with M = 4. For each of the 50 repetitions of the simulation schema, the MSE estimated on the training set and the PE estimated on the test set are shown as a function of the number of aggregations K (from 1 to 30 for Bagging, from 1 to 50 for Boosting). We can note that the average PE (line in bold type) decreases as the number of aggregations rises, and a reduction in the sample variability is evident for both Bagging and Boosting, although to a different extent. Moreover, the average PE already becomes stable with 30 aggregations using Bagging, while with Boosting it is necessary to consider a larger number K (50) to obtain a good predictor.

[Fig. 2. Fit (MSE versus the number of iterations) of PPR-bagg on the training sample (data-set Auto MPG, M = 4).]

In Figures 6 and 7 we consider the Boston Housing data-set with M = 4 and we show the behaviour of the average PE and the average MSE as the number of aggregations increases, for the predictors PPRbagg and PPRboost. The horizontal lines on the graphs represent the average PE and the average MSE for the simple PPR. In Tables 1-2 we compare, for the three methods, the estimates of PE and MSE (with K = 30 for Bagging and K = 50 for Boosting) for the Auto-MPG data set and several values of M. In the tables we also report the variance of the values of PE and MSE obtained by the simulations, to compare the variability of the results obtained by the three different methods.
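To reproduce curves of the kind shown in the figures, one can track the prediction error of the aggregated predictor on the test sample as the number of aggregations grows; the sketch below does so for a bagged ensemble, reusing the illustrative fit_bagging function of the earlier sketch (assumed names, not the authors' code).

```python
import numpy as np
from sklearn.metrics import mean_squared_error

def pe_versus_K(models, X_test, y_test):
    """Test-set PE of the bagging aggregate built from the first k models only, for k = 1..K."""
    preds = np.column_stack([g.predict(X_test) for g in models])             # shape (n_test, K)
    running_mean = np.cumsum(preds, axis=1) / np.arange(1, preds.shape[1] + 1)
    return [mean_squared_error(y_test, running_mean[:, k]) for k in range(preds.shape[1])]

# pe_curve = pe_versus_K(fit_bagging(X_train, y_train, K=30), X_test, y_test)
# The curve typically decreases and then flattens as K grows, as described for Figures 2-5.
```

For Boosting, the analogous curve would use the weighted-median aggregation restricted to the first k models.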

[Fig. 3. Prediction error (PE versus the number of iterations) of PPR-bagg on the test sample (data-set Auto MPG, M = 4).]

[Fig. 4. Fit (MSE versus the number of iterations) of PPR-boost on the training sample (data-set Auto MPG, M = 4).]

We can note that for the simple PPR the prediction capability, measured by the value of PE, decreases as the complexity parameter M grows, while for PPRbagg and PPRboost the prediction capability increases, with lower variability. The behaviour of the values of MSE and PE for PPR points out that this predictor tends to overfit the data. Such a disadvantage does not occur, at least for the values of M considered, with PPRbagg and PPRboost, which indeed benefit from the increase in the number of parameters of the model.

[Fig. 5. Prediction error (PE versus the number of iterations) of PPR-boost on the test sample (data-set Auto MPG, M = 4).]

[Fig. 6. Mean values of PE (test sample) and MSE (training sample) for PPR and PPR-bagg as a function of the number of iterations (data-set Boston Housing, M = 4).]

In Tables 3 and 4 we show the results of the same analysis applied to the Boston Housing data-set. Also in this case, increasing the value of M, the fit to the training sample improves for all three predictors, even if PPRboost obtains the best results in terms of MSE and variability. With respect to the prediction capability, PPR seems to reach overfitting for M = 8. Conversely, PPRboost and PPRbagg improve their forecasts, and the latter achieves the best performance.

[Fig. 7. Mean values of PE (test sample) and MSE (training sample) for PPR and PPR-boost as a function of the number of iterations (data-set Boston Housing, M = 4).]

The results of Tables 5 and 6 refer to the Abalone data-set, which differs from the previous two data sets in its greater number of cases. From the tables we can note that, as M grows, the fit of PPR on the training sample improves but without overfitting. Also for this data set, PPRbagg and PPRboost turn out better than PPR in both fitting and prediction capability. From the point of view of the computational complexity of the simulations carried out, for each data-set we applied PPR 6000 times to obtain the estimate of PE for PPRbagg (50 repetitions, K = 30 aggregations, 4 values of M) and 10000 times to obtain the estimate of PE for PPRboost (50 repetitions, K = 50, 4 values of M). Using the SMART program to estimate PPR, we fixed the internal parameters in such a way that the algorithm did not carry out pruning strategies.

[Table 1. Estimation of PE on Auto-MPG for several values of M: for each of the four values of M, the table reports PE and its variance for PPR, PPR-bagg and PPR-boost.]

[Table 2. Estimation of MSE on Auto-MPG for several values of M: for each of the four values of M, the table reports MSE and its variance for PPR, PPR-bagg and PPR-boost.]

[Table 3. Estimation of PE on Boston Housing for several values of M: same layout as Table 1.]

[Table 4. Estimation of MSE on Boston Housing for several values of M: same layout as Table 2.]

[Table 5. Estimation of PE on Abalone for several values of M: same layout as Table 1.]

[Table 6. Estimation of MSE on Abalone for several values of M: same layout as Table 2.]

4. Conclusions

The results obtained by the simulation studies on real data sets confirm the effectiveness of Bagging when applied in conjunction with nonparametric regression methods like PPR. This resampling technique improves both the fit to the training sample and the prediction capability of PPR, avoiding problems connected with overfitting, even in the presence of a large number of parameters in the model. The results obtained bring us to a positive evaluation of the method, even if the number of replications is not sufficient to obtain very accurate estimates.

The comparison between Boosting and Bagging does not show remarkable differences, although the latter method gives, for our data sets, better performance in the prediction phase. The prediction capability of Boosting was never remarkably better than that of Bagging, although its fit to the training sample was usually the best. These results differ from other comparisons obtained using Regression Trees (Drucker, 1997). However, it is necessary to consider that we used the boosting approach of Drucker, while there are also other proposals in the literature for the application of Boosting to regression methods. Therefore we cannot exclude that better results may be obtained considering another proposal. Alternative approaches were proposed, for example, by Ridgeway et al. (1999), difficult to implement from a computational point of view, or by Friedman (1999), based on gradient descent.

With regard to the comparison with other analyses using the same data-sets, it can be noticed that the application of PPR with Bagging proved in many cases more effective than Regression Trees with Bagging (see Breiman 1999 as an example).

In the light of the results obtained, it can be concluded that, both for real data and for simulated data (Borra & Di Ciaccio, 2001), the application of Bagging to Projection Pursuit Regression proved to be effective both in the training phase and in the prediction phase.

REFERENCES

Bauer, E. and Kohavi, R. (1998) An empirical comparison of voting classification algorithms: Bagging, Boosting, and variants, Machine Learning.
Borra, S. and Di Ciaccio, A. (2001) Reduction of prediction error by bagging projection pursuit regression, in Advances in Classification and Data Analysis, Borra S., Rocci R., Vichi M., Schader M. (eds.), Springer-Verlag.
Breiman, L. (1996) Bagging predictors, Machine Learning, 26(2).
Breiman, L. (1998a) Arcing classifiers (with discussion), Annals of Statistics, 26.
Breiman, L. (1998b) Half and half bagging and hard boundary points, Technical Report n. 534, Statistics Department, University of California, Berkeley.
Breiman, L. (1999) Using adaptive bagging to debias regressions, Technical Report n. 547, Statistics Department, University of California, Berkeley.
Diaconis, P. and Shahshahani, M. (1984) On nonlinear functions of linear combinations, SIAM Journal on Scientific and Statistical Computing, 5.
Dietterich, T. G. (1997) Machine learning research: four current directions, AI Magazine, 18(4).
Dietterich, T. G. (1998) An experimental comparison of three methods for constructing ensembles of decision trees: Bagging, Boosting, and Randomization, Machine Learning.
Drucker, H. (1997) Improving regressors using boosting techniques, in Proceedings of the Fourteenth International Conference on Machine Learning, Fisher D.H. Jr. (ed.), Morgan Kaufmann.
Drucker, H. and Cortes, C. (1996) Boosting decision trees, Advances in Neural Information Processing Systems, 8.
Efron, B. and Tibshirani, R.J. (1993) An Introduction to the Bootstrap, Chapman & Hall, New York.
Freund, Y. and Schapire, R. (1996) Experiments with a new boosting algorithm, in Machine Learning: Proceedings of the Thirteenth International Conference.
Freund, Y. and Schapire, R. (1997) A decision-theoretic generalization of on-line learning and an application to boosting, Journal of Computer and System Sciences, 55(1).
Friedman, J.H. (1985) Classification and multiple regression through projection pursuit, Technical Report n. 12, Dept. of Statistics, Stanford University.

Friedman, J.H. (1991) Multivariate adaptive regression splines, The Annals of Statistics, 19.
Friedman, J.H. (1999) Greedy function approximation: a gradient boosting machine, Technical Report, Dept. of Statistics, Stanford University, http://www-stat.stanford.edu/~jhf/.
Friedman, J.H. and Hall, P. (1999) On bagging and nonlinear estimation, http://www-stat.stanford.edu/~jhf/.
Friedman, J.H. and Stuetzle, W. (1981) Projection pursuit regression, Journal of the American Statistical Association, 76.
Quinlan, J. R. (1996) Bagging, boosting, and C4.5, in Proceedings of the Thirteenth National Conference on Artificial Intelligence.
Ridgeway, G., Madigan, D. and Richardson, T. (1999) Boosting methodology for regression problems, in Proceedings of the 7th International Workshop on Artificial Intelligence and Statistics, January 3-6, Florida.
Schwenk, H. and Bengio, Y. (1998) Training methods for adaptive boosting of neural networks for character recognition, in Advances in Neural Information Processing Systems, 10.

Performance evaluation of Bagging and Boosting in nonparametric regression

Summary

In this paper we consider two recently proposed techniques, Bagging and Boosting, devoted to the reduction of prediction error in classification and regression problems. In the analysis of artificial and real data sets they proved to be particularly effective when combined with Classification and Regression Trees and with Neural Networks. In this paper we test and compare these techniques combined with Projection Pursuit Regression. The results obtained in the analysis of simulations on real data sets show the capability of these techniques to reduce the prediction error of Projection Pursuit Regression. In these simulations Bagging performed better than Boosting, contrary to what was observed with other nonparametric methods.

Performance evaluation of Bagging and Boosting in nonparametric regression (Riassunto)

In this work we consider two techniques recently proposed in the literature, Bagging and Boosting, aimed at reducing the prediction error in classification and regression problems. In studies carried out on simulated or real data, these techniques proved to be particularly effective in conjunction with Classification and Regression Trees and with Neural Networks. In this work we test and compare these techniques in conjunction with Projection Pursuit Regression.

The results obtained from simulations on real data show their capability to reduce the prediction error of Projection Pursuit Regression. In the simulations carried out, Bagging turns out to be more effective than Boosting, contrary to what has been observed in conjunction with other nonparametric methods.

Key words: Nonparametric regression; Bagging; Boosting; Projection pursuit regression; Prediction error; Pruning.

[Manuscript received October 2000; final version received March 2001.]
