BAGGING PREDICTORS AND RANDOM FOREST
1 BAGGING PREDICTORS AND RANDOM FOREST DANA KANER M.SC. SEMINAR IN STATISTICS, MAY 2017 BAGGING PREDICTORS / LEO BREIMAN, 1996 RANDOM FORESTS / LEO BREIMAN, 2001 THE ELEMENTS OF STATISTICAL LEARNING (CHAPTERS 8, 9, 15) / HASTIE, TIBSHIRANI, FRIEDMAN
2 TABLE OF CONTENTS Bagging predictors - Introduction. The algorithm. Justification. Examples: classification and regression trees, variable selection. Random forest - Decision trees. The algorithm. And more...
3 BAGGING - INTRODUCTION A method based on Bootstrap sampling for generating multiple versions of a predictor, and using them in order to get an improved predictor. Usually the algorithm works for unstable procedures (trees, neural nets). The evidence, both experimental and theoretical, is that bagging can push a good but unstable procedure a significant step towards optimality. On the other hand, it can slightly degrade the performance of stable procedures.
4 AGGREGATED PREDICTORS Consider a learning set $L$ drawn from distribution $F$, and a procedure that forms a predictor $\hat{\theta} = \varphi(x, L)$ for a new input $x$. Now, imagine we can take $K$ samples of $N$ independent observations from $F$. In order to get a better prediction, we calculate an aggregated predictor. Great. So what exactly is the problem?
5 BAGGING - THE ALGORITHM Usually, we have a single learning set $L$ drawn from distribution $F$, and a procedure that forms a predictor $\hat{\theta} = \varphi(x, L)$ for a new input $x$. We'll take $B$ bootstrap samples: each consists of $N$ i.i.d. observations drawn at random with replacement from $L$ (i.e., from the empirical distribution $\hat{F}$). For each sample $b = 1, \ldots, B$, we form a predictor $\varphi(x, L_b)$. Numerical response: $\varphi_B(x) = \mathrm{avg}_b\, \varphi(x, L_b)$. Categorical response: the majority vote of the $\varphi(x, L_b)$.
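A minimal sketch of this loop in Python, with a scikit-learn tree playing the base procedure $\varphi$ (the data, function names, and parameter choices below are illustrative, not from the paper):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)

def bagged_predict(X_train, y_train, X_test, B=25):
    """phi_B(x): average the predictions of B trees grown on bootstrap samples of L."""
    N = len(X_train)
    preds = np.zeros((B, len(X_test)))
    for b in range(B):
        idx = rng.integers(0, N, size=N)   # N draws with replacement from L
        preds[b] = DecisionTreeRegressor().fit(X_train[idx], y_train[idx]).predict(X_test)
    return preds.mean(axis=0)              # numeric response: the average

# toy data, purely for demonstration
X = rng.normal(size=(200, 5))
y = X[:, 0] + rng.normal(size=200)
print(bagged_predict(X[:150], y[:150], X[150:])[:3])
```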
6 BETWEEN BAGGING AND BOOTSTRAP Sampling from $F$: learning sets $L = \{(x_1, y_1), \ldots, (x_N, y_N)\}$, each giving a single predictor $\hat{\theta} = \varphi(x, L)$. Aggregating over $L$ drawn from $F$: $\varphi_A(x) = E_L[\varphi(x, L)]$ for numeric prediction, or $\mathrm{argmax}_j\, P(\varphi(x, L) = j)$ for classification. Sampling from $\hat{F}$: $B$ bootstrap samples $L_b = \{(x_1^*, y_1^*), \ldots, (x_N^*, y_N^*)\}$, each giving $\hat{\theta}_b = \varphi(x, L_b)$. The bagged estimator: $\varphi_B(x) = \mathrm{avg}_b\, \varphi(x, L_b)$ for numeric prediction, or the majority vote of the $B$ trees for classification.
7 BAGGING JUSTIFICATION - NUMERIC PREDICTION Consider a numeric aggregated predictor, based on replications of $L$ from the same distribution $F$: $\varphi_A(x) = E_L[\varphi(x, L)]$. Given fixed $x, y$, by Jensen's inequality, $E_L[(y - \varphi(x, L))^2] \geq (E_L[y - \varphi(x, L)])^2 = (y - \varphi_A(x))^2$. If we integrate both sides over the joint distribution of $(x, y)$: $E_{x,y,L}[(y - \varphi(x, L))^2] \geq E_{x,y}[(y - \varphi_A(x))^2]$, i.e., the MSE of $\varphi(x, L)$ averaged over samples $L$ is at least $\mathrm{MSE}(\varphi_A)$. Therefore $\varphi_A$ is better than $\varphi$, a predictor based on one sample from $F$.
8 BAGGING JUSTIFICATION - NUMERIC PREDICTION The gap in the inequality is $E_L[\varphi^2(x, L)] - (E_L[\varphi(x, L)])^2 = \mathrm{Var}_L(\varphi(x, L))$. If $E_L[\varphi^2(x, L)] \approx (E_L[\varphi(x, L)])^2$ (i.e., small variance of $\varphi$ over replicates of $L$), aggregation will not help. The more highly variable the $\varphi(x, L)$ are over different replicates of $L$, the more improvement aggregation may produce.
9 BAGGING JUSTIFICATION - NUMERIC PREDICTION Write $\varphi_A(x) := \varphi_A(x, F)$; the bagged predictor is $\varphi_B(x) := \varphi_A(x, \hat{F})$, so, to the extent that $\hat{F}$ approximates $F$, $\varphi_B$ is better than $\varphi$, a predictor based on one sample from $F$. There is a cross-over point between instability and stability at which $\varphi_B$ stops improving on $\varphi(x, L)$ and does worse: On the one hand, if the procedure $\varphi$ is unstable on $F$, it can give improvement through aggregation, going from one to many bootstrap samples. On the other hand, if the procedure $\varphi$ is stable, then $\varphi_B$ will not be as accurate for data drawn from $F$ as $\varphi_A(x, F) \approx \varphi(x, L)$; the loss comes from replacing $F$ with $\hat{F}$.
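A hedged illustration of this cross-over (the data-generating process and all settings are made up for the demo): bagging clearly helps an unstable base learner such as an unpruned tree, while for a stable one such as least squares it changes little or nothing.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(1)

def test_mse(make_model, X, y, Xt, yt, B=50, bagged=True):
    """Test MSE of a single fit, or of the bagged average of B bootstrap fits."""
    if not bagged:
        return np.mean((yt - make_model().fit(X, y).predict(Xt)) ** 2)
    preds = np.zeros(len(Xt))
    for _ in range(B):
        idx = rng.integers(0, len(X), size=len(X))
        preds += make_model().fit(X[idx], y[idx]).predict(Xt)
    return np.mean((yt - preds / B) ** 2)

X = rng.uniform(-2, 2, size=(100, 3)); y = np.sin(3 * X[:, 0]) + rng.normal(0, 0.3, 100)
Xt = rng.uniform(-2, 2, size=(2000, 3)); yt = np.sin(3 * Xt[:, 0])

for make_model, name in [(DecisionTreeRegressor, "tree (unstable)"),
                         (LinearRegression, "linear (stable)")]:
    print(name,
          "single:", round(test_mse(make_model, X, y, Xt, yt, bagged=False), 3),
          "bagged:", round(test_mse(make_model, X, y, Xt, yt), 3))
```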
10 BAGGING JUSTIFICATION - CLASSIFICATION $Q(j \mid x) = P(\varphi(x, L) = j)$: the frequency of class $j$ over replicates of $L$ for the tree predictor $\varphi$. $P(j \mid x)$: the true distribution of $j$ given $x$. The probability that the predictor $\varphi$ classifies $x$ correctly: $P(\text{correct classification} \mid x) = \sum_j Q(j \mid x) P(j \mid x)$. The overall probability of correct classification: $r = \int \big( \sum_j Q(j \mid x) P(j \mid x) \big) P_X(x)\, dx$.
11 BAGGING JUSTIFICATION - CLASSIFICATION $\sum_j Q(j \mid x) P(j \mid x) \leq \max_j P(j \mid x)$, with equality when $Q(j \mid x) = I\{P(j \mid x) = \max_i P(i \mid x)\}$. Theoretical best predictor: $\varphi^*(x) = \mathrm{argmax}_j\, P(j \mid x)$. The highest attainable correct-classification rate: $r^* = \int \max_j P(j \mid x)\, P_X(x)\, dx$.
12 BAGGING JUSTIFICATION - CLASSIFICATION Call $\varphi$ order-correct at input $x$ if $\mathrm{argmax}_j\, Q(j \mid x) = \mathrm{argmax}_j\, P(j \mid x)$: the class $j$ most likely to occur given $x$ is also the class that $\varphi(x, L)$ predicts most often over many replicates of $L$. The aggregated predictor: $\varphi_A(x) = \mathrm{argmax}_j\, Q(j \mid x)$, so $Q_A(j \mid x) = I\{\mathrm{argmax}_i\, Q(i \mid x) = j\}$ and $P(\varphi_A \text{ classifies } x \text{ correctly}) = \sum_j I\{\mathrm{argmax}_i\, Q(i \mid x) = j\}\, P(j \mid x)$. If $\varphi_A$ is order-correct at $x$: $P(\varphi_A \text{ classifies } x \text{ correctly}) = \max_j P(j \mid x)$.
13 BAGGING JUSTIFICATION - CLASSIFICATION Let $C$ be the set of all $x$ at which $\varphi_A$ is order-correct. The correct-classification rate for $\varphi_A$: $r_A = \int_C \max_j P(j \mid x)\, P_X(x)\, dx + \int_{C^c} \sum_j I\{\varphi_A(x) = j\}\, P(j \mid x)\, P_X(x)\, dx$. Reminder: $r^* = \int \max_j P(j \mid x)\, P_X(x)\, dx$. If a predictor is good in the sense that it is order-correct for most inputs $x$, then aggregation can transform it into a nearly optimal predictor.
14 BAGGING - REGRESSION TREES EXAMPLE Data sets are divided into a test set $T$ and a learning set $L$, usually 10% and 90% respectively. A regression tree is constructed from $L$ using 10-fold CV; its squared error on $T$ is $e_S(L, T)$. Then 25 bootstrap samples $L_b$ are drawn from $L$, giving predictors $\{\varphi_1(x, L_1), \ldots, \varphi_{25}(x, L_{25})\}$. For each $(x_j, y_j) \in T$: $\hat{y}_j = \frac{1}{B} \sum_{b=1}^{B} \varphi_b(x_j, L_b)$ and $e_B(L, T) = \frac{1}{|T|} \sum_{j=1}^{|T|} (y_j - \hat{y}_j)^2$. The random division of the data is repeated 100 times and the errors are averaged: $(\bar{e}_S, \bar{e}_B)$.
15 BAGGING - REGRESSION TREES EXAMPLE The results $(\bar{e}_S, \bar{e}_B)$ are shown in a table (omitted in this transcription).
16 BAGGING - CLASSIFICATION TREES EXAMPLE Data sets are divided into a test set $T$ and a learning set $L$ (10% and 90%). A classification tree is constructed from $L$ using 10-fold CV; its misclassification rate on $T$ is $e_S(L, T)$. Then 50 bootstrap samples $L_b$ are drawn from $L$, giving predictors $\{\varphi_1(x, L_1), \ldots, \varphi_{50}(x, L_{50})\}$. For each $(x_j, y_j) \in T$, the estimated class is the one having the plurality in $\{\varphi_1(x_j, L_1), \ldots, \varphi_{50}(x_j, L_{50})\}$; the resulting bagging misclassification rate on $T$ is $e_B(L, T)$. The random division of the data is repeated 100 times and the errors are averaged: $(\bar{e}_S, \bar{e}_B)$.
17 BAGGING - CLASSIFICATION TREES EXAMPLE The results $(\bar{e}_S, \bar{e}_B)$ are shown in a table (omitted in this transcription).
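A rough sketch of this classification experiment, under some stated simplifications: a public scikit-learn dataset stands in for the paper's data sets, the CV-based pruning of the single tree is omitted, and the split is repeated 10 rather than 100 times.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(2)
X, y = load_breast_cancer(return_X_y=True)     # stand-in for the paper's data sets

errs_single, errs_bagged = [], []
for rep in range(10):                          # the paper repeats the split 100 times
    Xl, Xt, yl, yt = train_test_split(X, y, test_size=0.10, random_state=rep)
    tree = DecisionTreeClassifier(random_state=rep).fit(Xl, yl)
    errs_single.append(np.mean(tree.predict(Xt) != yt))
    votes = np.zeros((50, len(yt)), dtype=int)
    for b in range(50):                        # 50 bootstrap trees
        idx = rng.integers(0, len(Xl), size=len(Xl))
        votes[b] = DecisionTreeClassifier(random_state=b).fit(Xl[idx], yl[idx]).predict(Xt)
    plurality = (votes.mean(axis=0) > 0.5).astype(int)   # binary labels: vote = threshold
    errs_bagged.append(np.mean(plurality != yt))

print("e_S:", np.mean(errs_single), " e_B:", np.mean(errs_bagged))
```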
18 BAGGING - FORWARD STEPWISE SELECTION Step $m$: given a predictor based on the variables $x_{(1)}, \ldots, x_{(m-1)}$ chosen so far, form a regression of $y$ on $x_{(1)}, \ldots, x_{(m-1)}, x_{(m)}$ for each candidate $x_{(m)}$ not yet chosen, and select the $x_{(m)}$ that minimizes $\mathrm{RSS}(m)$. The output: a sequence of models, one for each $m$. Subset selection is nearly optimal if there are only a few large non-zero $\{\beta_i\}$.
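A compact sketch of that greedy step in Python (illustrative code, not Breiman's implementation):

```python
import numpy as np

def forward_stepwise(X, y):
    """Return the variable order chosen greedily by residual sum of squares."""
    n, p = X.shape
    chosen, remaining = [], list(range(p))
    while remaining:
        rss = {}
        for j in remaining:
            cols = chosen + [j]
            beta, *_ = np.linalg.lstsq(X[:, cols], y, rcond=None)
            rss[j] = np.sum((y - X[:, cols] @ beta) ** 2)
        best = min(rss, key=rss.get)          # the m-th variable minimizes RSS(m)
        chosen.append(best)
        remaining.remove(best)
    return chosen

rng = np.random.default_rng(3)
X = rng.normal(size=(60, 30))
beta = np.zeros(30); beta[:3] = [5, 4, 3]     # 3 non-zero coefficients
y = X @ beta + rng.normal(size=60)
print(forward_stepwise(X, y)[:5])             # the true variables come first
```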
19 BAGGING - FORWARD STEPWISE SELECTION Simulations with 3, 15, and 27 non-zero $\{\beta_i\}$: $y = \sum_{i=1}^{P=30} \beta_i x_i + \varepsilon$, $\varepsilon \sim N(0, 1)$, $L = \{(x_1, y_1), \ldots, (x_{60}, y_{60})\}$. Run FSS on $L$: predictors $\{\varphi_1(x), \ldots, \varphi_P(x)\}$ with mean squared errors $\{e_1^S, \ldots, e_P^S\}$. Draw 50 bootstrap samples; for each $b$: predictors $\{\varphi_1(x, L_b), \ldots, \varphi_P(x, L_b)\}$, giving bagged predictors $\{\varphi_1^B(x), \ldots, \varphi_P^B(x)\}$ with mean squared errors $\{e_1^B, \ldots, e_P^B\}$. Average over the 250 repetitions: $\{\bar{e}_m^S\}, \{\bar{e}_m^B\}$, $m = 1, \ldots, P = 30$.
20 BAGGING - FORWARD STEPWISE SELECTION e m ҧs B e m ҧ m m m FSS is better for a small number of nonzero coefficients. Bigger error, less stable. Bagging is good for unstable procedures (linear regression with all coefficients is stable)
21 BAGGING AND RANDOM FOREST Consider $B$ bootstrap samples drawn i.i.d. from $F$, and $B$ tree models based on them. Bias(average of predictions) = Bias(a single prediction), so our only hope is to reduce the variance. Assume $\sigma^2$ is the single-tree variance and $\rho$ is the pairwise correlation between trees. Then the variance of the average of the predictions is $\rho \sigma^2 + \frac{1 - \rho}{B} \sigma^2 \xrightarrow{B \to \infty} \rho \sigma^2$; the derivation is sketched below. Random forest: a modification of bagging that builds a large collection of de-correlated trees and then averages them. In other words, the idea is to reduce $\rho$ without increasing $\sigma^2$ or the MSE too much.
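For reference, a short derivation of that variance, assuming the $B$ tree predictions $T_1, \ldots, T_B$ are identically distributed with variance $\sigma^2$ and pairwise correlation $\rho$ (so each covariance is $\rho\sigma^2$):

$$\mathrm{Var}\Big(\frac{1}{B}\sum_{b=1}^{B} T_b\Big) = \frac{1}{B^2}\Big(\sum_{b=1}^{B} \mathrm{Var}(T_b) + \sum_{b \neq b'} \mathrm{Cov}(T_b, T_{b'})\Big) = \frac{B\sigma^2 + B(B-1)\rho\sigma^2}{B^2} = \rho\sigma^2 + \frac{1-\rho}{B}\,\sigma^2 .$$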
22 DECISION TREES - CART - INTRO We define a tree by $\Theta = \{(R_m, c_m)\}_{m=1}^{M}$. The tree prediction: $\hat{f}(x) = \sum_{m=1}^{M} c_m\, I\{x \in R_m\}$. Choosing $c_m$ (given $R_m$): regression, the average of $\{y_i : x_i \in R_m\}$; classification, the majority vote of $\{y_i : x_i \in R_m\}$. Choosing $R = \{R_1, \ldots, R_M\}$: a greedy algorithm that, at each stage, minimizes the selected error over the splitting variable $x_j$ and the split point $s$ of $x_j$.
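A sketch of one greedy step for regression, scanning every variable $x_j$ and split point $s$ (illustrative brute force; real CART implementations sort each column once and update the sums incrementally):

```python
import numpy as np

def best_split(X, y):
    """Minimize the summed squared error around the two daughter-node means."""
    best_err, best_j, best_s = np.inf, None, None
    for j in range(X.shape[1]):
        for s in np.unique(X[:, j])[:-1]:     # candidate split points
            left, right = y[X[:, j] <= s], y[X[:, j] > s]
            err = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
            if err < best_err:
                best_err, best_j, best_s = err, j, s
    return best_j, best_s, best_err

rng = np.random.default_rng(4)
X = rng.uniform(size=(50, 2))
y = (X[:, 0] > 0.5).astype(float) + rng.normal(0, 0.1, 50)
print(best_split(X, y))                       # expect j = 0, s near 0.5
```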
23 REGRESSION TREES SIMULATION (ESL) Simulation of $L$, $|L| = 30$: inputs $x_i \in \mathbb{R}^p$, binary responses $y_i \in \{0, 1\}$ with $P(y_i = 1 \mid x_{i1} \leq 0.5) = 0.2$ and $P(y_i = 1 \mid x_{i1} > 0.5) = 0.8$.
24 RANDOM FOREST - THE ALGORITHM For $b = 1$ to $B$: 1. Draw a bootstrap sample $L_b = \{(x_1^*, y_1^*), \ldots, (x_N^*, y_N^*)\}$ from $L$. 2. Grow a RF tree on $L_b$ by repeating the following at each terminal node until the minimum node size $n_{\min}$ is reached: select $m$ variables at random from the $p$ variables of $x_i$; pick the best variable and split point among the $m$; split the node into two daughter nodes. Output: $B$ tree predictors $\{T_b(x, \Theta_b)\}_{b=1}^{B}$, where $\Theta_b = \{(R_m, c_m)\}_{m=1}^{M}$. Prediction at a new point $x$: regression, $\hat{f}_{rf}^B(x) = \frac{1}{B} \sum_b T_b(x, \Theta_b)$; classification, $\hat{C}_{rf}^B(x) =$ the majority vote of $\{T_b(x, \Theta_b)\}_{b=1}^{B}$.
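A condensed sketch of the algorithm, assuming scikit-learn: passing `max_features=m` makes the tree sample $m$ candidate variables at each split, which matches the growing rule above, and `min_samples_leaf` stands in for the minimum node size $n_{\min}$; the rest of the names and data are illustrative.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(5)

def random_forest(X, y, B=100, m=None, n_min=5):
    """Grow B trees, each on a bootstrap sample, with m random variables per split."""
    m = m or max(1, X.shape[1] // 3)          # default m = p/3 for regression
    forest = []
    for b in range(B):
        idx = rng.integers(0, len(X), size=len(X))    # bootstrap sample L_b
        forest.append(DecisionTreeRegressor(max_features=m, min_samples_leaf=n_min,
                                            random_state=b).fit(X[idx], y[idx]))
    return forest

def rf_predict(forest, X):
    return np.mean([t.predict(X) for t in forest], axis=0)   # f_rf^B(x)

X = rng.normal(size=(300, 9))
y = X[:, 0] ** 2 + X[:, 1] + rng.normal(0, 0.5, 300)
forest = random_forest(X[:250], y[:250])
print("test MSE:", np.mean((y[250:] - rf_predict(forest, X[250:])) ** 2))
```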
25 RANDOM FOREST - REGRESSION TREE $\hat{f}_{rf}^B(x) = \frac{1}{B} \sum_b T_b(x, \Theta_b) \xrightarrow{B \to \infty} \hat{f}_{rf}(x) = E_{\Theta \mid L}[T(x, \Theta(L))]$. $\mathrm{Var}(\hat{f}_{rf}^B(x)) = \rho(x) \sigma^2(x) + \frac{1 - \rho(x)}{B} \sigma^2(x) \xrightarrow{B \to \infty} \mathrm{Var}(\hat{f}_{rf}(x)) = \rho(x) \sigma^2(x)$. Here $\rho(x) = \mathrm{corr}(T(x, \theta_1(L)), T(x, \theta_2(L)))$, where $\theta_1(L), \theta_2(L)$ are representations of two RF trees grown on the randomly sampled $L$. In other words, $\rho(x)$ is the theoretical correlation between trees, induced by repeatedly drawing training samples $L$ from the population, and $\sigma^2(x) = \mathrm{Var}(T(x, \theta(L)))$.
26 RANDOM FOREST - REGRESSION TREE $\mathrm{Var}_{\theta, L}(T(x, \theta(L))) = \mathrm{Var}_L(E_{\theta \mid L}[T(x, \theta(L))]) + E_L[\mathrm{Var}_{\theta \mid L}(T(x, \theta(L)))]$: the total variance decomposes into $\mathrm{Var}_L(\hat{f}_{rf}(x))$ plus the within-$L$ variance of a single tree prediction. The first term is the variance of the forest, which decreases as $m$ decreases, while the second term is averaged away by the ensemble, so an RF ensemble is better than one RF tree. As for the bias: $\mathrm{Bias}(x) = \mu(x) - E_L[\hat{f}_{rf}(x)] = \mu(x) - E_L E_{\theta \mid L}[T(x, \Theta(L))]$. Although for different models the shape and rate of the bias curves may differ, the general trend is that as $m$ decreases, the bias increases.
27 RANDOM FOREST - REGRESSION TREE Usually, the default value for $m$ is $\lfloor p/3 \rfloor$ for regression and $\lfloor \sqrt{p} \rfloor$ for classification.
28 OUT OF BAG SAMPLES It turns out that roughly 37% of the examples in $L$ do not appear in a particular bootstrap training set. OOB samples: for each bootstrap sample, the OOB observations are the $(x_i, y_i)$ that did not appear in that sample. The OOB samples can be used to form estimates of important quantities: the error estimate, variable importance and more (Breiman, 1996b, OOB estimation).
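A quick check of the 37% figure: the chance that a given observation is missed by one bootstrap sample of size $N$ is $(1 - 1/N)^N \to e^{-1}$.

```python
import numpy as np

# P(a given observation is absent from one bootstrap sample of size N)
for N in (10, 100, 1000, 10000):
    print(N, (1 - 1 / N) ** N)
print("limit e^-1 =", np.exp(-1))   # ~ 0.368, i.e. roughly 37%
```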
29 VARIABLE IMPORTANCE RF also uses the OOB samples to construct an alternative way to compute the variable importance of features. Gini importance: the mean gain in the Gini impurity criterion produced by $x_j$ over all trees. OOB permutation VI: when the $b$-th tree is grown, the OOB samples are passed down the tree and the prediction misclassification rate is recorded. Then the values of the $m$-th variable are randomly permuted in the OOB samples, and the rate is computed again. The VI of feature $m$ is the average increase in misclassification rate (over all trees) compared to the out-of-bag misclassification rate.
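A sketch of OOB permutation importance, assuming a small hand-rolled forest where each tree is stored together with its out-of-bag row indices (all names and the toy data are illustrative):

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(6)

def fit_forest(X, y, B=50):
    """Keep each tree together with its out-of-bag row indices."""
    forest = []
    for b in range(B):
        idx = rng.integers(0, len(X), size=len(X))
        oob = np.setdiff1d(np.arange(len(X)), idx)   # rows not drawn into L_b
        tree = DecisionTreeClassifier(max_features="sqrt", random_state=b)
        forest.append((tree.fit(X[idx], y[idx]), oob))
    return forest

def permutation_importance(forest, X, y):
    vi = np.zeros(X.shape[1])
    for tree, oob in forest:
        base = np.mean(tree.predict(X[oob]) != y[oob])   # OOB error of tree b
        for j in range(X.shape[1]):
            Xp = X[oob].copy()
            Xp[:, j] = rng.permutation(Xp[:, j])         # permute variable j
            vi[j] += np.mean(tree.predict(Xp) != y[oob]) - base
    return vi / len(forest)                              # mean error increase

X = rng.normal(size=(400, 5))
y = (X[:, 0] + X[:, 1] > 0).astype(int)
forest = fit_forest(X, y)
print(np.round(permutation_importance(forest, X, y), 3))  # variables 0, 1 dominate
```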
30 VARIABLE IMPORTANCE (SPAM DATA) (figure omitted in this transcription)
More information