Combining Estimators to Improve Performance
|
|
- Ethelbert Pierce
- 6 years ago
- Views:
Transcription
1 Combining Estimators to Improve Performance A survey of model bundling techniques -- from boosting and bagging, to Bayesian model averaging -- creating a breakthrough in the practice of Data Mining. John F. Elder IV, Ph.D. Elder Research, Charlottesville, Virginia Greg Ridgeway, Ph.D. University of Washington, Dept. of Statistics Elder & Ridgeway KDD99 T5-1 Combining Estimators
2 Outline Why combine? A motivating example Hidden dangers of model selection Reducing modeling uncertainty through Bayesian Model Averaging Stabilizing predictors through bagging Improving performance through boosting Emerging theory illuminates empirical success Bundling, in general Latest algorithms Closing Examples & Summary 1999 Elder & Ridgeway KDD99 T5-2 Combining Estimators
3 Reasons to combine estimators Decreases variability in the predictions. Accounts for uncertainty in the model class. > Improved accuracy on new data Elder & Ridgeway KDD99 T5-3 Combining Estimators
4 A Motivating Example: Classifying a bat s species from its chirp Goal: Use time-frequency features of echolocation signals to classify bats by species in the field (avoiding capture and physical inspection). U. Illinois biologists gathered data: 98 signals from 19 bats representing 6 species: Southeastern, Grey, Little Brown, Indiana, Pipistrelle, Big-Eared. ~35 data features (dimensions) calculated from signals, such as low frequency at the 3db level, time position of the signal peak, and amplitude ratio of 1st and 2nd harmonics. Turned out to have a nice level of difficulty for comparing methods: overlap in classes, but some separability Elder & Ridgeway KDD99 T5-4 Combining Estimators
5 Sample Projection Var4 t10 bw 1999 Elder & Ridgeway KDD99 T5-5 Combining Estimators
6 % 88 Percent Classified Correctly training time: TX2step (11-12) CART (7) ASPN (15) 8-N.net (14) (tree vars) 17-N.net Visualization & multicorrelation 35-N.net Optimal (of 8) dimensions Near.n. 8-Near.n (13) [Fit](16) prop. Avg. Avg. Trees Polynomial Neural Nearest y ˆ Networks Networks Neighbor 1:seconds hour day ε combinations 2:20 mins 1999 Elder & Ridgeway KDD99 T5-6 Combining Estimators Bundling 4 Old Fit Advisor Perceptrons (16) enabled
7 What is model uncertainty? Suppose we wish to predict y from predictors x. Given a dataset of observations, D, for a new observation with predictors x * we want to derive the predictive distribution of y * given x * and D. P( y x, D) 1999 Elder & Ridgeway KDD99 T5-7 Combining Estimators
8 In practice Although we want to use all the information in D to make the best estimate of y * for an individual with covariates x * P( y x, D) In practice, however, we always use P( y x, M where M is a model constructed from D. ) 1999 Elder & Ridgeway KDD99 T5-8 Combining Estimators
9 Selecting M The process of selecting a model usually involves Model class selection Linear regression, tree regression, neural network Variable selection variable exclusion, transformation, smoothing Parameter estimation We tend to choose the one model that fits the data or performs best as the model Elder & Ridgeway KDD99 T5-9 Combining Estimators
10 What s wrong with that? Two models may equally fit a dataset (with repect to some loss) but have different predictions. Competing interpretable models with equivalent performance offer ambiguious conclusions. Model search dilutes the evidence. Part of the evidence is spent specifying the model Elder & Ridgeway KDD99 T5-10 Combining Estimators
11 Bayesian Model Averaging Goal: Account for model uncertainty Method: Use Bayes Theorem and average the models by their posterior probabilities Properties: Improves predictive performance Theoretically elegant Computationally costly 1999 Elder & Ridgeway KDD99 T5-11 Combining Estimators
12 Averaging the models Consider a set containing the K candidate models M 1,, M K. With a few probability manipulations we can make predictions using all of them. P( y * x *, D) * * = k k k P( y x, M ) P( M D) The probability mass for a particular prediction value of y is a weighted average of the probability mass that each model places on that value of y. The weight is based on the posterior probability of that model given the data Elder & Ridgeway KDD99 T5-12 Combining Estimators
13 1999 Elder & Ridgeway KDD99 T5-13 Combining Estimators Bayes Theorem M k - model D - data P(D M k ) - integrated likelihood of M k P(M k ) - prior model probability = = K l l l k k k M P M D P M P M D P D M P 1 ) ( ) ( ) ( ) ( ) (
14 Challenges The size of the model set may cause exhaustive summation to be impossible. The integrated likelihood of each model is usually complex. Specifying a prior distribution (even a noninformative one) across the space of models is non-trivial. Proposed solutions to these challenges often involve MCMC, BIC approximation, MLE approximation, Occam s window, Occam s razor Elder & Ridgeway KDD99 T5-14 Combining Estimators
15 Performance Survival model: Primary biliary cirrhosis BMA vs. Stepwise regression 2% improvement BMA vs. expert selected model 10% improvement Linear regression: Body fat prediction BMA provides best 90% predictive coverage. Graphical models BMA yields an improvement 1999 Elder & Ridgeway KDD99 T5-15 Combining Estimators
16 BMA References Chris Volinsky s BMA homepage J. Hoeting, D. Madigan, A. Raftery, C. Volinsky (1999). Bayesian Model Averaging: A Practical Tutorial (to appear in Statistical Science), Elder & Ridgeway KDD99 T5-16 Combining Estimators
17 Unstable predictors We can always assume y = f ( x) + ε, where E( ε x) = 0 Assume that we have a way of constructing a predictor, f ˆ ( x), from a dataset D. f D We want to choose the estimator of f that minimizes J, squared loss for example. J ( fˆ, D) = E ˆ, ( y f ( x)) y x 1999 Elder & Ridgeway KDD99 T5-17 Combining Estimators D 2
18 Bias-variance decomposition If we could average over all possible datasets, let the average prediction be f ( x) = E f ˆ ( x) The average prediction error over all datasets that we might see is decomposable D D E D J ( fˆ, D) 2 2 = Eε + E ( f ( x) f ( x)) + E ( fˆ ( x) f ( x)) x = noise + bias + variance x, D D Elder & Ridgeway KDD99 T5-18 Combining Estimators
19 Bias-variance decomposition (cont.) E D J ( fˆ, D) 2 2 = Eε + E ( f ( x) f ( x)) + E ( fˆ ( x) f ( x)) x = noise + bias + variance x, D D 2 The noise cannot be reduced. The squared-bias term might be reducible The variance term is 0 if we use fˆ D ( x) = f ( x) But this requires having an infinite number of datasets 1999 Elder & Ridgeway KDD99 T5-19 Combining Estimators
20 Bagging (Bootstrap Aggregating) Goal: Variance reduction Method: Create bootstrap replicates of the dataset and fit a model to each. Average the predictions of each model. Properties: Stabilizes unstable methods Easy to implement, parallelizable Theory is not fully explained 1999 Elder & Ridgeway KDD99 T5-20 Combining Estimators
21 Bagging algorithm 1. Create K bootstrap replicates of the dataset. 2. Fit a model to each of the replicates. 3. Average (or vote) the predictions of the K models. Bootstrapping simulates the stream of infinite datasets in the bias-variance decomposition Elder & Ridgeway KDD99 T5-21 Combining Estimators
22 Bagging Example x Elder & Ridgeway KDD99 T5-22 Combining Estimators x1
23 CART decision boundary Elder & Ridgeway KDD99 T5-23 Combining Estimators
24 100 bagged trees Elder & Ridgeway KDD99 T5-24 Combining Estimators
25 Bagged tree decision boundary Elder & Ridgeway KDD99 T5-25 Combining Estimators
26 Regression results Squared error loss CART Bagged CART Boston Housing Ozone Friedman #1 Friedman #2 Friedman # Elder & Ridgeway KDD99 T5-26 Combining Estimators
27 Classification results Misclassification rates CART Bagged CART Diabetes Breast Ionosphere Heart Soybean Glass Waveform 1999 Elder & Ridgeway KDD99 T5-27 Combining Estimators
28 Bagging References Leo Breiman s homepage Breiman, L. (1996) Bagging Predictors, Machine Learning, 26:2, Friedman, J. and P. Hall (1999) On Bagging and Nonlinear Estimation Elder & Ridgeway KDD99 T5-28 Combining Estimators
29 Boosting Goal: Improve misclassification rates Method: Sequentially fit models, each more heavily weighting those observations poorly predicted by the previous model Properties: Bias and variance reduction Easy to implement Theory is not fully (but almost) explained 1999 Elder & Ridgeway KDD99 T5-29 Combining Estimators
30 Origin of Boosting Classification problems {y, x} i, i = 1,,n y {0, 1} The task - construct a function, F(x) : x {0, 1} so that F minimizes misclassification error Elder & Ridgeway KDD99 T5-30 Combining Estimators
31 Generic boosting algorithm Equally weight the observations (y,x) i For t in 1,,T Using the weights, fit a classifier f t (x) y Upweight the poorly predicted observations Downweight the well-predicted observations Merge f 1,,f T to form the boosted classifier 1999 Elder & Ridgeway KDD99 T5-31 Combining Estimators
32 Real AdaBoost Schapire & Singer 1998 y i {-1,1}, w i = 1/N For t in 1,,T do 1. Estimate P w (y = 1 x). Pˆ ( y = 1 x) 1 w 2. Set f ( x) = 2 log t Pˆ ( y = 1 x) w 3. w w exp y f ( x ) and renormalize i i ( ) i t i Output the classifier F ( x) = sign f ( x) t 1999 Elder & Ridgeway KDD99 T5-32 Combining Estimators
33 AdaBoost s Performance Freund & Schapire [1996] Leo Breiman - AdaBoost with trees is the best off-the-shelf classifier in the world. Performs well with many base classifiers and in a variety of problem domains. AdaBoost is generally slow to overfit. Boosted naïve Bayes tied for first place in the 1997 KDD Cup. (Elkan [1997]) Boosted naïve Bayes is a scalable, interpretable classifier (Ridgeway, et al [1998]) Elder & Ridgeway KDD99 T5-33 Combining Estimators
34 Boosting Example x Elder & Ridgeway KDD99 T5-34 Combining Estimators x1
35 After one iteration CART splits, larger points have great weight x Elder & Ridgeway KDD99 T5-35 Combining Estimators x1
36 After 3 iterations x Elder & Ridgeway KDD99 T5-36 Combining Estimators x1
37 After 20 iterations x Elder & Ridgeway KDD99 T5-37 Combining Estimators x1
38 Decision boundary after 100 iterations Elder & Ridgeway KDD99 T5-38 Combining Estimators
39 Boosting as optimization Friedman, Hastie, Tibshirani [1998] - AdaBoost is an optimization method for finding a classifier. Let y {-1,1}, F(x) (-, ) J ( ) ( ) F = E e yf ( x) x 1999 Elder & Ridgeway KDD99 T5-39 Combining Estimators
40 Criterion E(e yf(x) ) bounds the misclassification rate. I( yf( x) < 0) < e yf ( x) The minimizer of E(e yf(x) ) coincides with the maximizer of the expected Bernoulli likelihood. E ( ) 2 yf ( x) l( p( x), y) = E log(1 + e ) 1999 Elder & Ridgeway KDD99 T5-40 Combining Estimators
41 1999 Elder & Ridgeway KDD99 T5-41 Combining Estimators Optimization step Select f to minimize J ( ) ( ) ( ) x e E f F J x f x F y ) ( ) ( + = + ) ( 2 1 ) ( 1) ( ) ( ), ( ] 1) ( [ 1 ] 1) ( [ log x yf w w t t t e y x w x y I E x y I E F F + = = = +
42 1999 Elder & Ridgeway KDD99 T5-42 Combining Estimators LogitBoost Friedman, Hastie, Tibshirani [1998] Logistic regression Expected log-likelihood of a regressor, F(x) = ) ( 1 with probability 0 ) ( with probability 1 x p x p y ( ) x e x yf F x F ) log(1 ) ( E ) ( E ) ( + = l ) ( 1 1 ) ( x F e x p + =
43 1999 Elder & Ridgeway KDD99 T5-43 Combining Estimators Newton steps Iterate to optimize expected log-likelihood. 0 ) ( 0 ) ( ) ( 1) ( ) ( ) ( ) ( ) ( 2 2 = = f t f f t f t t f F J f F J x F x F ( ) x e x f x F y E f F J x f x F ) log(1 )) ( ) ( ( ) ( ) ( ) ( = +
44 LogitBoost, continued Newton steps for Bernoulli likelihood F( x) y p( x) F( x) + Ew p( x)(1 p( x)) w( x) = p( x)(1 p( x)) In practice the E w ( x) can be any regressor - trees, smoothers, etc. x Trees are adaptive and work well for high dimensional data Elder & Ridgeway KDD99 T5-44 Combining Estimators
45 Misclassification rates Friedman, Hastie, Tibshirani [1998] CART AdaBoost CART LogitBoost CART Breast Ionosphere Glass Sonar Waveform 1999 Elder & Ridgeway KDD99 T5-45 Combining Estimators
46 Boosting References Rob Schapire s homepage Freund, Y. and R. Schapire (1996). Experiments with a new boosting algorithm, Machine Learning: Proceedings of the 13 th International Conference, Jerry Friedman s homepage Friedman, J., T. Hastie, R. Tibshirani (1998). Additive Logistic Regression: a statistical view of boosting, Technical report, Statistics Department, Stanford University Elder & Ridgeway KDD99 T5-46 Combining Estimators
47 In general, combining ( bundling ) estimators consists of two steps: 1) Constructing varied models, and 2) Combining their estimates Generate component models by varying: Case Weights Data Values Guiding Parameters Variable Subsets Combine estimates using: Estimator Weights Voting Advisor Perceptrons Partitions of Design Space, X 1999 Elder & Ridgeway KDD99 T5-47 Combining Estimators
48 Other Bundling Techniques We ve Examined: Bayesian Model Averaging: sum estimates of possible models, weighted by posterior evidence Bagging (Breiman 96) (bootstrap aggregating) -- bootstrap data (to build trees mostly); take majority vote or average Boosting (Freund & Shapire 96) -- weight error cases by β t = (1-e(t))/e(t), iteratively re-model; average, weighing model t by ln(β t ) Additional Example Techniques: GMDH (Ivakhenko 68) -- multiple layers of quadratic polynomials, using two inputs each, fit by Linear Regression Stacking (Wolpert 92) -- train a 2nd-level (LR) model using leave-1-out estimates of 1st-level (neural net) models ARCing (Breiman 96) (Adaptive Resampling and Combining) -- Bagging with reweighting of error cases; superset of boosting Bumping (Tibshirani 97) -- bootstrap, select single best Crumpling (Anderson & Elder 98) -- average cross-validations Born-Again (Breiman 98) -- invent new X data Elder & Ridgeway KDD99 T5-48 Combining Estimators
49 Group Method of Data Handling Layer 1 (GMDH) a b A 0 + A 0 + A 1 a + A 1 a + A 2 b 2 b + A + A 3 a 2 3 a 2 + A + A 4 ab 4 ab + A + A 5 b 2 5 b 2 z 1 Layer 2 Layer 3 c d B 0 + B 0 + B 1 c + B 1 c + B 2 d 2 d + B + B 3 c 2 3 c 2 + B + B 4 cd 4 cd + B + B 5 d 2 5 d 2 z 2 z 4 y est e f C 0 + C 0 + C 1 e + C 1 e + C 2 f 2 f + C + C 3 e 2 3 e 2 + C + C 4 ef 4 ef + C + C 5 f 2 5 f 2 z 3 z 5 Try all pairs of variables (K choose 2) in quadratic polynomial nodes. Fit coefficients using regression. Keep best M nodes. Train model on one training data set, score on test data set. (Need a third data set for independent confirmation of model.) 1999 Elder & Ridgeway KDD99 T5-49 Combining Estimators
50 Polynomial Networks (ASPN) Z 17 = a -.15b bc abc + 0.5c 3 Layer 0 (Normalizers) Layer 1 a N1 z1 z9 Double 16 k N9 z6 Single 14 f N6 d N4 z4 Triple 17 z16 z14 z17 Layer 2 Multi- Linear 15 z15 Layer 3 Triple 21 z21 Unitizers U2 Y1 h N8 z8 Double 20 z20 Double 19 z19 U7 Y2 e N5 z Elder & Ridgeway KDD99 T5-50 Combining Estimators
51 When does Bundling work? Hypotheses: Breiman (1996): when the prediction method is unstable (significantly different models are constructed) Ali & Pazzani (1996): when there is low noise, lots of irrelevant variables, and good individual predictors which make different errors when models are slightly overfit when models are from different families 1999 Elder & Ridgeway KDD99 T5-51 Combining Estimators
52 Advanced techniques Stochastic gradient boosting Adaptive bagging Example regression and classification results 1999 Elder & Ridgeway KDD99 T5-52 Combining Estimators
53 Stochastic Gradient Boosting Goal: Non-parametric function estimation Method: Cast the problem as optimization and use gradient ascent to obtain predictor Properties: Bias and variance reduction Widely applicable Can make use of existing algorithms Many tuning parameters 1999 Elder & Ridgeway KDD99 T5-53 Combining Estimators
54 Improving boosting Boosting usually has the form F ( t+ 1) Improve by... ( x) ( x) + λe Sub-sampling a fraction of the data at each step when computing the expectation. Robustifying the expectation. F ( t) ( z( y, x) x) Trimming observations with small weights. w 1999 Elder & Ridgeway KDD99 T5-54 Combining Estimators
55 Stochastic gradient boosting offers... Application to likelihood based models (GLM, Cox models) Bias reduction - non-linear fitting Massive datasets - bagging, trimming Variance reduction - bagging Interpretability - additive models High-dimensional regression - trees Robust regression 1999 Elder & Ridgeway KDD99 T5-55 Combining Estimators
56 SGB References Friedman, J. (1999). Greedy function approximation: a gradient boosting machine, Technical report, Dept. of Statistics, Stanford University. Friedman, J. (1999). Stochastic gradient boosting, Technical report, Dept. of Statistics, Stanford University Elder & Ridgeway KDD99 T5-56 Combining Estimators
57 Adaptive Bagging Goal: Bias and variance reduction Method: Sequentially fit bagged models, where each fits the current residuals Properties: Bias and variance reduction No tuning parameters 1999 Elder & Ridgeway KDD99 T5-57 Combining Estimators
58 Adaptive bagging algorithm 1. Fit a bagged regressor to the dataset D. 2. Predict out-of-bag observations. 3. Fit a new bagged regressor to the bias (error) and repeat. For a new observation, sum the predictions from each stage Elder & Ridgeway KDD99 T5-58 Combining Estimators
59 Regression results Squared error loss Bagging Adaptive bagging Abalone Robot arm Peak20 Boston Housing Ozone Servo F #1 F #2 F # Elder & Ridgeway KDD99 T5-59 Combining Estimators
60 Classification results Misclassification rates AdaBoost Adaptive bagging Breast Ionosphere Sonar Heart German credit Votes Liver Diabetes 1999 Elder & Ridgeway KDD99 T5-60 Combining Estimators
61 Relative Performance Examples: 5 Algorithms on 6 Datasets (John Elder, Elder Research & Stephen Lee, U. Idaho, 1997) Neural Network Logistic Regression Linear Vector Quantization Projection Pursuit Regression Decision Tree Diabetes Gaussian Hypothyroid German Credit Waveform Investment 1999 Elder & Ridgeway KDD99 T5-61 Combining Estimators
62 1.00 Essentially every Bundling method improves performance Advisor Perceptron AP weighted average Vote Average Diabetes Gaussian Hypothyroid German Credit Waveform Investment 1999 Elder & Ridgeway KDD99 T5-62 Combining Estimators
63 Application Ex.: Direct Marketing (Elder Research ) Model respondants to direct marketing as binary variable: 0 (no response), 1 (response). Create models using several (here, 5) different algorithms, all employing the same candidate model inputs. Rank-order model responses: Give highest-probability response value a rank of 1, second highest value 2, etc. For bundling, combine model ranks (not estimates) into a new consensus estimate (which is again ranked). Report number of response cases missed (in top portion) Elder & Ridgeway KDD99 T5-63 Combining Estimators
64 80 Marketing Application Performance 75 Bundled Trees NT SNT #Cases Missed Stepwise Regression Polynomial Network Neural Network MARS NS MT PT MS MP ST PS NP MN SPT PNT MPT SMN SMT SPN MNT SMP SPNT SMPT SMNT SMPN MPNT 55 MPN #Models Combined (averaging output rank) 1999 Elder & Ridgeway KDD99 T5-64 Combining Estimators
65 Median (and Mean) Error Reduced with each Stage of Combination M i s s e d NOTE: Fewer misses is better No. Models in combination 1999 Elder & Ridgeway KDD99 T5-65 Combining Estimators
66 Why Bundling works (semi-) Independent Estimators Bayes Rule - weighing evidence Shrinking (ex.: stepwise LR) Smoothing (ex.: decision trees)...and in a multitude of counselors there is safety. Proverbs 24:6b Additive modeling and maximum likelihood (Friedman, Hastie, & Tibshirani 8/20/98) Open research area. Meanwhile, we recommend bundling competing candidate models both within, and between, model families Elder & Ridgeway KDD99 T5-66 Combining Estimators
Ensemble Methods. Charles Sutton Data Mining and Exploration Spring Friday, 27 January 12
Ensemble Methods Charles Sutton Data Mining and Exploration Spring 2012 Bias and Variance Consider a regression problem Y = f(x)+ N(0, 2 ) With an estimate regression function ˆf, e.g., ˆf(x) =w > x Suppose
More informationMachine Learning Ensemble Learning I Hamid R. Rabiee Jafar Muhammadi, Alireza Ghasemi Spring /
Machine Learning Ensemble Learning I Hamid R. Rabiee Jafar Muhammadi, Alireza Ghasemi Spring 2015 http://ce.sharif.edu/courses/93-94/2/ce717-1 / Agenda Combining Classifiers Empirical view Theoretical
More informationECE 5424: Introduction to Machine Learning
ECE 5424: Introduction to Machine Learning Topics: Ensemble Methods: Bagging, Boosting PAC Learning Readings: Murphy 16.4;; Hastie 16 Stefan Lee Virginia Tech Fighting the bias-variance tradeoff Simple
More informationCS7267 MACHINE LEARNING
CS7267 MACHINE LEARNING ENSEMBLE LEARNING Ref: Dr. Ricardo Gutierrez-Osuna at TAMU, and Aarti Singh at CMU Mingon Kang, Ph.D. Computer Science, Kennesaw State University Definition of Ensemble Learning
More informationGeneralized Boosted Models: A guide to the gbm package
Generalized Boosted Models: A guide to the gbm package Greg Ridgeway April 15, 2006 Boosting takes on various forms with different programs using different loss functions, different base models, and different
More informationVoting (Ensemble Methods)
1 2 Voting (Ensemble Methods) Instead of learning a single classifier, learn many weak classifiers that are good at different parts of the data Output class: (Weighted) vote of each classifier Classifiers
More informationVBM683 Machine Learning
VBM683 Machine Learning Pinar Duygulu Slides are adapted from Dhruv Batra Bias is the algorithm's tendency to consistently learn the wrong thing by not taking into account all the information in the data
More informationPDEEC Machine Learning 2016/17
PDEEC Machine Learning 2016/17 Lecture - Model assessment, selection and Ensemble Jaime S. Cardoso jaime.cardoso@inesctec.pt INESC TEC and Faculdade Engenharia, Universidade do Porto Nov. 07, 2017 1 /
More informationMidterm Review CS 6375: Machine Learning. Vibhav Gogate The University of Texas at Dallas
Midterm Review CS 6375: Machine Learning Vibhav Gogate The University of Texas at Dallas Machine Learning Supervised Learning Unsupervised Learning Reinforcement Learning Parametric Y Continuous Non-parametric
More informationAdaBoost. Lecturer: Authors: Center for Machine Perception Czech Technical University, Prague
AdaBoost Lecturer: Jan Šochman Authors: Jan Šochman, Jiří Matas Center for Machine Perception Czech Technical University, Prague http://cmp.felk.cvut.cz Motivation Presentation 2/17 AdaBoost with trees
More informationBoosting. CAP5610: Machine Learning Instructor: Guo-Jun Qi
Boosting CAP5610: Machine Learning Instructor: Guo-Jun Qi Weak classifiers Weak classifiers Decision stump one layer decision tree Naive Bayes A classifier without feature correlations Linear classifier
More informationRANDOMIZING OUTPUTS TO INCREASE PREDICTION ACCURACY
1 RANDOMIZING OUTPUTS TO INCREASE PREDICTION ACCURACY Leo Breiman Statistics Department University of California Berkeley, CA. 94720 leo@stat.berkeley.edu Technical Report 518, May 1, 1998 abstract Bagging
More informationData Mining und Maschinelles Lernen
Data Mining und Maschinelles Lernen Ensemble Methods Bias-Variance Trade-off Basic Idea of Ensembles Bagging Basic Algorithm Bagging with Costs Randomization Random Forests Boosting Stacking Error-Correcting
More informationAlgorithm-Independent Learning Issues
Algorithm-Independent Learning Issues Selim Aksoy Department of Computer Engineering Bilkent University saksoy@cs.bilkent.edu.tr CS 551, Spring 2007 c 2007, Selim Aksoy Introduction We have seen many learning
More informationLearning with multiple models. Boosting.
CS 2750 Machine Learning Lecture 21 Learning with multiple models. Boosting. Milos Hauskrecht milos@cs.pitt.edu 5329 Sennott Square Learning with multiple models: Approach 2 Approach 2: use multiple models
More informationBoosting Methods: Why They Can Be Useful for High-Dimensional Data
New URL: http://www.r-project.org/conferences/dsc-2003/ Proceedings of the 3rd International Workshop on Distributed Statistical Computing (DSC 2003) March 20 22, Vienna, Austria ISSN 1609-395X Kurt Hornik,
More informationChapter 14 Combining Models
Chapter 14 Combining Models T-61.62 Special Course II: Pattern Recognition and Machine Learning Spring 27 Laboratory of Computer and Information Science TKK April 3th 27 Outline Independent Mixing Coefficients
More informationVariance Reduction and Ensemble Methods
Variance Reduction and Ensemble Methods Nicholas Ruozzi University of Texas at Dallas Based on the slides of Vibhav Gogate and David Sontag Last Time PAC learning Bias/variance tradeoff small hypothesis
More informationECE 5984: Introduction to Machine Learning
ECE 5984: Introduction to Machine Learning Topics: Ensemble Methods: Bagging, Boosting Readings: Murphy 16.4; Hastie 16 Dhruv Batra Virginia Tech Administrativia HW3 Due: April 14, 11:55pm You will implement
More informationEnsemble Methods and Random Forests
Ensemble Methods and Random Forests Vaishnavi S May 2017 1 Introduction We have seen various analysis for classification and regression in the course. One of the common methods to reduce the generalization
More informationStochastic Gradient Descent
Stochastic Gradient Descent Machine Learning CSE546 Carlos Guestrin University of Washington October 9, 2013 1 Logistic Regression Logistic function (or Sigmoid): Learn P(Y X) directly Assume a particular
More informationMachine Learning. Ensemble Methods. Manfred Huber
Machine Learning Ensemble Methods Manfred Huber 2015 1 Bias, Variance, Noise Classification errors have different sources Choice of hypothesis space and algorithm Training set Noise in the data The expected
More informationSPECIAL INVITED PAPER
The Annals of Statistics 2000, Vol. 28, No. 2, 337 407 SPECIAL INVITED PAPER ADDITIVE LOGISTIC REGRESSION: A STATISTICAL VIEW OF BOOSTING By Jerome Friedman, 1 Trevor Hastie 2 3 and Robert Tibshirani 2
More informationMachine learning comes from Bayesian decision theory in statistics. There we want to minimize the expected value of the loss function.
Bayesian learning: Machine learning comes from Bayesian decision theory in statistics. There we want to minimize the expected value of the loss function. Let y be the true label and y be the predicted
More informationNeural Networks and Ensemble Methods for Classification
Neural Networks and Ensemble Methods for Classification NEURAL NETWORKS 2 Neural Networks A neural network is a set of connected input/output units (neurons) where each connection has a weight associated
More informationAn Introduction to Statistical and Probabilistic Linear Models
An Introduction to Statistical and Probabilistic Linear Models Maximilian Mozes Proseminar Data Mining Fakultät für Informatik Technische Universität München June 07, 2017 Introduction In statistical learning
More informationFinal Overview. Introduction to ML. Marek Petrik 4/25/2017
Final Overview Introduction to ML Marek Petrik 4/25/2017 This Course: Introduction to Machine Learning Build a foundation for practice and research in ML Basic machine learning concepts: max likelihood,
More informationLossless Online Bayesian Bagging
Lossless Online Bayesian Bagging Herbert K. H. Lee ISDS Duke University Box 90251 Durham, NC 27708 herbie@isds.duke.edu Merlise A. Clyde ISDS Duke University Box 90251 Durham, NC 27708 clyde@isds.duke.edu
More informationPattern Recognition and Machine Learning
Christopher M. Bishop Pattern Recognition and Machine Learning ÖSpri inger Contents Preface Mathematical notation Contents vii xi xiii 1 Introduction 1 1.1 Example: Polynomial Curve Fitting 4 1.2 Probability
More informationMachine Learning Lecture 7
Course Outline Machine Learning Lecture 7 Fundamentals (2 weeks) Bayes Decision Theory Probability Density Estimation Statistical Learning Theory 23.05.2016 Discriminative Approaches (5 weeks) Linear Discriminant
More informationCSE 417T: Introduction to Machine Learning. Final Review. Henry Chai 12/4/18
CSE 417T: Introduction to Machine Learning Final Review Henry Chai 12/4/18 Overfitting Overfitting is fitting the training data more than is warranted Fitting noise rather than signal 2 Estimating! "#$
More informationEnsemble Methods for Machine Learning
Ensemble Methods for Machine Learning COMBINING CLASSIFIERS: ENSEMBLE APPROACHES Common Ensemble classifiers Bagging/Random Forests Bucket of models Stacking Boosting Ensemble classifiers we ve studied
More informationLINEAR MODELS FOR CLASSIFICATION. J. Elder CSE 6390/PSYC 6225 Computational Modeling of Visual Perception
LINEAR MODELS FOR CLASSIFICATION Classification: Problem Statement 2 In regression, we are modeling the relationship between a continuous input variable x and a continuous target variable t. In classification,
More informationMODULE -4 BAYEIAN LEARNING
MODULE -4 BAYEIAN LEARNING CONTENT Introduction Bayes theorem Bayes theorem and concept learning Maximum likelihood and Least Squared Error Hypothesis Maximum likelihood Hypotheses for predicting probabilities
More informationBagging During Markov Chain Monte Carlo for Smoother Predictions
Bagging During Markov Chain Monte Carlo for Smoother Predictions Herbert K. H. Lee University of California, Santa Cruz Abstract: Making good predictions from noisy data is a challenging problem. Methods
More informationA Decision Stump. Decision Trees, cont. Boosting. Machine Learning 10701/15781 Carlos Guestrin Carnegie Mellon University. October 1 st, 2007
Decision Trees, cont. Boosting Machine Learning 10701/15781 Carlos Guestrin Carnegie Mellon University October 1 st, 2007 1 A Decision Stump 2 1 The final tree 3 Basic Decision Tree Building Summarized
More informationday month year documentname/initials 1
ECE471-571 Pattern Recognition Lecture 13 Decision Tree Hairong Qi, Gonzalez Family Professor Electrical Engineering and Computer Science University of Tennessee, Knoxville http://www.eecs.utk.edu/faculty/qi
More informationDecision Trees: Overfitting
Decision Trees: Overfitting Emily Fox University of Washington January 30, 2017 Decision tree recap Loan status: Root 22 18 poor 4 14 Credit? Income? excellent 9 0 3 years 0 4 Fair 9 4 Term? 5 years 9
More informationTDT4173 Machine Learning
TDT4173 Machine Learning Lecture 3 Bagging & Boosting + SVMs Norwegian University of Science and Technology Helge Langseth IT-VEST 310 helgel@idi.ntnu.no 1 TDT4173 Machine Learning Outline 1 Ensemble-methods
More informationSTA 4273H: Statistical Machine Learning
STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! rsalakhu@utstat.toronto.edu! http://www.utstat.utoronto.ca/~rsalakhu/ Sidney Smith Hall, Room 6002 Lecture 3 Linear
More informationLast updated: Oct 22, 2012 LINEAR CLASSIFIERS. J. Elder CSE 4404/5327 Introduction to Machine Learning and Pattern Recognition
Last updated: Oct 22, 2012 LINEAR CLASSIFIERS Problems 2 Please do Problem 8.3 in the textbook. We will discuss this in class. Classification: Problem Statement 3 In regression, we are modeling the relationship
More informationImportance Sampling: An Alternative View of Ensemble Learning. Jerome H. Friedman Bogdan Popescu Stanford University
Importance Sampling: An Alternative View of Ensemble Learning Jerome H. Friedman Bogdan Popescu Stanford University 1 PREDICTIVE LEARNING Given data: {z i } N 1 = {y i, x i } N 1 q(z) y = output or response
More informationSTA414/2104. Lecture 11: Gaussian Processes. Department of Statistics
STA414/2104 Lecture 11: Gaussian Processes Department of Statistics www.utstat.utoronto.ca Delivered by Mark Ebden with thanks to Russ Salakhutdinov Outline Gaussian Processes Exam review Course evaluations
More informationChapter 6. Ensemble Methods
Chapter 6. Ensemble Methods Wei Pan Division of Biostatistics, School of Public Health, University of Minnesota, Minneapolis, MN 55455 Email: weip@biostat.umn.edu PubH 7475/8475 c Wei Pan Introduction
More informationEnsembles of Classifiers.
Ensembles of Classifiers www.biostat.wisc.edu/~dpage/cs760/ 1 Goals for the lecture you should understand the following concepts ensemble bootstrap sample bagging boosting random forests error correcting
More informationBig Data Analytics. Special Topics for Computer Science CSE CSE Feb 24
Big Data Analytics Special Topics for Computer Science CSE 4095-001 CSE 5095-005 Feb 24 Fei Wang Associate Professor Department of Computer Science and Engineering fei_wang@uconn.edu Prediction III Goal
More informationData Mining: Concepts and Techniques. (3 rd ed.) Chapter 8. Chapter 8. Classification: Basic Concepts
Data Mining: Concepts and Techniques (3 rd ed.) Chapter 8 1 Chapter 8. Classification: Basic Concepts Classification: Basic Concepts Decision Tree Induction Bayes Classification Methods Rule-Based Classification
More informationStatistical Machine Learning from Data
Samy Bengio Statistical Machine Learning from Data 1 Statistical Machine Learning from Data Ensembles Samy Bengio IDIAP Research Institute, Martigny, Switzerland, and Ecole Polytechnique Fédérale de Lausanne
More informationECE521 week 3: 23/26 January 2017
ECE521 week 3: 23/26 January 2017 Outline Probabilistic interpretation of linear regression - Maximum likelihood estimation (MLE) - Maximum a posteriori (MAP) estimation Bias-variance trade-off Linear
More informationA Gentle Introduction to Gradient Boosting. Cheng Li College of Computer and Information Science Northeastern University
A Gentle Introduction to Gradient Boosting Cheng Li chengli@ccs.neu.edu College of Computer and Information Science Northeastern University Gradient Boosting a powerful machine learning algorithm it can
More informationBagging. Ryan Tibshirani Data Mining: / April Optional reading: ISL 8.2, ESL 8.7
Bagging Ryan Tibshirani Data Mining: 36-462/36-662 April 23 2013 Optional reading: ISL 8.2, ESL 8.7 1 Reminder: classification trees Our task is to predict the class label y {1,... K} given a feature vector
More informationBayesian Inference: Principles and Practice 3. Sparse Bayesian Models and the Relevance Vector Machine
Bayesian Inference: Principles and Practice 3. Sparse Bayesian Models and the Relevance Vector Machine Mike Tipping Gaussian prior Marginal prior: single α Independent α Cambridge, UK Lecture 3: Overview
More informationMethods and Criteria for Model Selection. CS57300 Data Mining Fall Instructor: Bruno Ribeiro
Methods and Criteria for Model Selection CS57300 Data Mining Fall 2016 Instructor: Bruno Ribeiro Goal } Introduce classifier evaluation criteria } Introduce Bias x Variance duality } Model Assessment }
More informationI D I A P. Online Policy Adaptation for Ensemble Classifiers R E S E A R C H R E P O R T. Samy Bengio b. Christos Dimitrakakis a IDIAP RR 03-69
R E S E A R C H R E P O R T Online Policy Adaptation for Ensemble Classifiers Christos Dimitrakakis a IDIAP RR 03-69 Samy Bengio b I D I A P December 2003 D a l l e M o l l e I n s t i t u t e for Perceptual
More informationChap 1. Overview of Statistical Learning (HTF, , 2.9) Yongdai Kim Seoul National University
Chap 1. Overview of Statistical Learning (HTF, 2.1-2.6, 2.9) Yongdai Kim Seoul National University 0. Learning vs Statistical learning Learning procedure Construct a claim by observing data or using logics
More informationPATTERN CLASSIFICATION
PATTERN CLASSIFICATION Second Edition Richard O. Duda Peter E. Hart David G. Stork A Wiley-lnterscience Publication JOHN WILEY & SONS, INC. New York Chichester Weinheim Brisbane Singapore Toronto CONTENTS
More informationCS 484 Data Mining. Classification 7. Some slides are from Professor Padhraic Smyth at UC Irvine
CS 484 Data Mining Classification 7 Some slides are from Professor Padhraic Smyth at UC Irvine Bayesian Belief networks Conditional independence assumption of Naïve Bayes classifier is too strong. Allows
More informationABC random forest for parameter estimation. Jean-Michel Marin
ABC random forest for parameter estimation Jean-Michel Marin Université de Montpellier Institut Montpelliérain Alexander Grothendieck (IMAG) Institut de Biologie Computationnelle (IBC) Labex Numev! joint
More informationNotes on Discriminant Functions and Optimal Classification
Notes on Discriminant Functions and Optimal Classification Padhraic Smyth, Department of Computer Science University of California, Irvine c 2017 1 Discriminant Functions Consider a classification problem
More informationDiscussion of Least Angle Regression
Discussion of Least Angle Regression David Madigan Rutgers University & Avaya Labs Research Piscataway, NJ 08855 madigan@stat.rutgers.edu Greg Ridgeway RAND Statistics Group Santa Monica, CA 90407-2138
More informationIntroduction. Chapter 1
Chapter 1 Introduction In this book we will be concerned with supervised learning, which is the problem of learning input-output mappings from empirical data (the training dataset). Depending on the characteristics
More informationIntroduction to Gaussian Process
Introduction to Gaussian Process CS 778 Chris Tensmeyer CS 478 INTRODUCTION 1 What Topic? Machine Learning Regression Bayesian ML Bayesian Regression Bayesian Non-parametric Gaussian Process (GP) GP Regression
More informationMachine Learning for OR & FE
Machine Learning for OR & FE Regression II: Regularization and Shrinkage Methods Martin Haugh Department of Industrial Engineering and Operations Research Columbia University Email: martin.b.haugh@gmail.com
More informationThe exam is closed book, closed notes except your one-page (two sides) or two-page (one side) crib sheet.
CS 189 Spring 013 Introduction to Machine Learning Final You have 3 hours for the exam. The exam is closed book, closed notes except your one-page (two sides) or two-page (one side) crib sheet. Please
More informationMachine Learning Linear Classification. Prof. Matteo Matteucci
Machine Learning Linear Classification Prof. Matteo Matteucci Recall from the first lecture 2 X R p Regression Y R Continuous Output X R p Y {Ω 0, Ω 1,, Ω K } Classification Discrete Output X R p Y (X)
More informationGradient Boosting (Continued)
Gradient Boosting (Continued) David Rosenberg New York University April 4, 2016 David Rosenberg (New York University) DS-GA 1003 April 4, 2016 1 / 31 Boosting Fits an Additive Model Boosting Fits an Additive
More informationLeast Absolute Shrinkage is Equivalent to Quadratic Penalization
Least Absolute Shrinkage is Equivalent to Quadratic Penalization Yves Grandvalet Heudiasyc, UMR CNRS 6599, Université de Technologie de Compiègne, BP 20.529, 60205 Compiègne Cedex, France Yves.Grandvalet@hds.utc.fr
More informationLearning Ensembles. 293S T. Yang. UCSB, 2017.
Learning Ensembles 293S T. Yang. UCSB, 2017. Outlines Learning Assembles Random Forest Adaboost Training data: Restaurant example Examples described by attribute values (Boolean, discrete, continuous)
More informationStatistics and learning: Big Data
Statistics and learning: Big Data Learning Decision Trees and an Introduction to Boosting Sébastien Gadat Toulouse School of Economics February 2017 S. Gadat (TSE) SAD 2013 1 / 30 Keywords Decision trees
More informationNonparametric Bayesian Methods (Gaussian Processes)
[70240413 Statistical Machine Learning, Spring, 2015] Nonparametric Bayesian Methods (Gaussian Processes) Jun Zhu dcszj@mail.tsinghua.edu.cn http://bigml.cs.tsinghua.edu.cn/~jun State Key Lab of Intelligent
More information> DEPARTMENT OF MATHEMATICS AND COMPUTER SCIENCE GRAVIS 2016 BASEL. Logistic Regression. Pattern Recognition 2016 Sandro Schönborn University of Basel
Logistic Regression Pattern Recognition 2016 Sandro Schönborn University of Basel Two Worlds: Probabilistic & Algorithmic We have seen two conceptual approaches to classification: data class density estimation
More informationComputer Vision Group Prof. Daniel Cremers. 2. Regression (cont.)
Prof. Daniel Cremers 2. Regression (cont.) Regression with MLE (Rep.) Assume that y is affected by Gaussian noise : t = f(x, w)+ where Thus, we have p(t x, w, )=N (t; f(x, w), 2 ) 2 Maximum A-Posteriori
More informationCSCI-567: Machine Learning (Spring 2019)
CSCI-567: Machine Learning (Spring 2019) Prof. Victor Adamchik U of Southern California Mar. 19, 2019 March 19, 2019 1 / 43 Administration March 19, 2019 2 / 43 Administration TA3 is due this week March
More informationBoosting & Deep Learning
Boosting & Deep Learning Ensemble Learning n So far learning methods that learn a single hypothesis, chosen form a hypothesis space that is used to make predictions n Ensemble learning à select a collection
More information1/sqrt(B) convergence 1/B convergence B
The Error Coding Method and PICTs Gareth James and Trevor Hastie Department of Statistics, Stanford University March 29, 1998 Abstract A new family of plug-in classication techniques has recently been
More informationBagging and the Bayesian Bootstrap
Bagging and the Bayesian Bootstrap erlise A. Clyde and Herbert K. H. Lee Institute of Statistics & Decision Sciences Duke University Durham, NC 27708 Abstract Bagging is a method of obtaining more robust
More informationIntroduction to Machine Learning Midterm Exam Solutions
10-701 Introduction to Machine Learning Midterm Exam Solutions Instructors: Eric Xing, Ziv Bar-Joseph 17 November, 2015 There are 11 questions, for a total of 100 points. This exam is open book, open notes,
More informationA Brief Introduction to Adaboost
A Brief Introduction to Adaboost Hongbo Deng 6 Feb, 2007 Some of the slides are borrowed from Derek Hoiem & Jan ˇSochman. 1 Outline Background Adaboost Algorithm Theory/Interpretations 2 What s So Good
More informationCSC2515 Winter 2015 Introduction to Machine Learning. Lecture 2: Linear regression
CSC2515 Winter 2015 Introduction to Machine Learning Lecture 2: Linear regression All lecture slides will be available as.pdf on the course website: http://www.cs.toronto.edu/~urtasun/courses/csc2515/csc2515_winter15.html
More informationAnnouncements. Proposals graded
Announcements Proposals graded Kevin Jamieson 2018 1 Bayesian Methods Machine Learning CSE546 Kevin Jamieson University of Washington November 1, 2018 2018 Kevin Jamieson 2 MLE Recap - coin flips Data:
More informationMachine Learning & Data Mining
Group M L D Machine Learning M & Data Mining Chapter 7 Decision Trees Xin-Shun Xu @ SDU School of Computer Science and Technology, Shandong University Top 10 Algorithm in DM #1: C4.5 #2: K-Means #3: SVM
More informationClassification CE-717: Machine Learning Sharif University of Technology. M. Soleymani Fall 2012
Classification CE-717: Machine Learning Sharif University of Technology M. Soleymani Fall 2012 Topics Discriminant functions Logistic regression Perceptron Generative models Generative vs. discriminative
More informationOverfitting, Bias / Variance Analysis
Overfitting, Bias / Variance Analysis Professor Ameet Talwalkar Professor Ameet Talwalkar CS260 Machine Learning Algorithms February 8, 207 / 40 Outline Administration 2 Review of last lecture 3 Basic
More informationAdvanced Machine Learning
Advanced Machine Learning Deep Boosting MEHRYAR MOHRI MOHRI@ COURANT INSTITUTE & GOOGLE RESEARCH. Outline Model selection. Deep boosting. theory. algorithm. experiments. page 2 Model Selection Problem:
More informationBoosting. Ryan Tibshirani Data Mining: / April Optional reading: ISL 8.2, ESL , 10.7, 10.13
Boosting Ryan Tibshirani Data Mining: 36-462/36-662 April 25 2013 Optional reading: ISL 8.2, ESL 10.1 10.4, 10.7, 10.13 1 Reminder: classification trees Suppose that we are given training data (x i, y
More informationIntroduction to Machine Learning
Introduction to Machine Learning Thomas G. Dietterich tgd@eecs.oregonstate.edu 1 Outline What is Machine Learning? Introduction to Supervised Learning: Linear Methods Overfitting, Regularization, and the
More informationMidterm Review CS 7301: Advanced Machine Learning. Vibhav Gogate The University of Texas at Dallas
Midterm Review CS 7301: Advanced Machine Learning Vibhav Gogate The University of Texas at Dallas Supervised Learning Issues in supervised learning What makes learning hard Point Estimation: MLE vs Bayesian
More informationIntroduction to Machine Learning Midterm Exam
10-701 Introduction to Machine Learning Midterm Exam Instructors: Eric Xing, Ziv Bar-Joseph 17 November, 2015 There are 11 questions, for a total of 100 points. This exam is open book, open notes, but
More informationHierarchical Boosting and Filter Generation
January 29, 2007 Plan Combining Classifiers Boosting Neural Network Structure of AdaBoost Image processing Hierarchical Boosting Hierarchical Structure Filters Combining Classifiers Combining Classifiers
More informationMIDTERM SOLUTIONS: FALL 2012 CS 6375 INSTRUCTOR: VIBHAV GOGATE
MIDTERM SOLUTIONS: FALL 2012 CS 6375 INSTRUCTOR: VIBHAV GOGATE March 28, 2012 The exam is closed book. You are allowed a double sided one page cheat sheet. Answer the questions in the spaces provided on
More informationEnsemble learning 11/19/13. The wisdom of the crowds. Chapter 11. Ensemble methods. Ensemble methods
The wisdom of the crowds Ensemble learning Sir Francis Galton discovered in the early 1900s that a collection of educated guesses can add up to very accurate predictions! Chapter 11 The paper in which
More informationAnnouncements Kevin Jamieson
Announcements My office hours TODAY 3:30 pm - 4:30 pm CSE 666 Poster Session - Pick one First poster session TODAY 4:30 pm - 7:30 pm CSE Atrium Second poster session December 12 4:30 pm - 7:30 pm CSE Atrium
More informationChris Fraley and Daniel Percival. August 22, 2008, revised May 14, 2010
Model-Averaged l 1 Regularization using Markov Chain Monte Carlo Model Composition Technical Report No. 541 Department of Statistics, University of Washington Chris Fraley and Daniel Percival August 22,
More informationFINAL: CS 6375 (Machine Learning) Fall 2014
FINAL: CS 6375 (Machine Learning) Fall 2014 The exam is closed book. You are allowed a one-page cheat sheet. Answer the questions in the spaces provided on the question sheets. If you run out of room for
More informationENSEMBLES OF DECISION RULES
ENSEMBLES OF DECISION RULES Jerzy BŁASZCZYŃSKI, Krzysztof DEMBCZYŃSKI, Wojciech KOTŁOWSKI, Roman SŁOWIŃSKI, Marcin SZELĄG Abstract. In most approaches to ensemble methods, base classifiers are decision
More informationHoldout and Cross-Validation Methods Overfitting Avoidance
Holdout and Cross-Validation Methods Overfitting Avoidance Decision Trees Reduce error pruning Cost-complexity pruning Neural Networks Early stopping Adjusting Regularizers via Cross-Validation Nearest
More informationLEAST ANGLE REGRESSION 469
LEAST ANGLE REGRESSION 469 Specifically for the Lasso, one alternative strategy for logistic regression is to use a quadratic approximation for the log-likelihood. Consider the Bayesian version of Lasso
More information9/12/17. Types of learning. Modeling data. Supervised learning: Classification. Supervised learning: Regression. Unsupervised learning: Clustering
Types of learning Modeling data Supervised: we know input and targets Goal is to learn a model that, given input data, accurately predicts target data Unsupervised: we know the input only and want to make
More informationFrank C Porter and Ilya Narsky: Statistical Analysis Techniques in Particle Physics Chap. c /9/9 page 331 le-tex
Frank C Porter and Ilya Narsky: Statistical Analysis Techniques in Particle Physics Chap. c15 2013/9/9 page 331 le-tex 331 15 Ensemble Learning The expression ensemble learning refers to a broad class
More informationEnsemble Methods: Jay Hyer
Ensemble Methods: committee-based learning Jay Hyer linkedin.com/in/jayhyer @adatahead Overview Why Ensemble Learning? What is learning? How is ensemble learning different? Boosting Weak and Strong Learners
More information