Decision Trees: Overfitting
Emily Fox, University of Washington, January 30, 2017

Decision tree recap
[Figure: learned decision tree for predicting loan status (safe vs. risky), with internal nodes Credit?, Term?, and Income? and safe/risky counts shown at each node]
For each leaf node, set the prediction ŷ = majority value of the training examples in that leaf.
Greedy decision tree learning
Step 1: Start with an empty tree
Step 2: Select a feature to split the data
For each split of the tree:
Step 3: If nothing more to do, make predictions
Step 4: Otherwise, go to Step 2 & continue (recurse) on this split
In Step 2, pick the feature split leading to the lowest classification error; Step 3 applies stopping conditions 1 & 2, and Step 4 is the recursion. A sketch of this procedure appears below.

Scoring a loan application
x_i = (Credit = poor, Income = high, Term = 5 years)
[Figure: x_i is routed down the tree (Credit? = poor, then Income? = high, then Term? = 5 years) until it reaches a leaf, whose majority value becomes the prediction ŷ_i]
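The recursion above is short enough to write out. Below is a minimal sketch of greedy tree learning on categorical features, minimizing classification error at each split; the dict-based tree representation and helper names are illustrative, not from the lecture.

```python
from collections import Counter

def classification_error(labels):
    """Fraction of labels that disagree with the majority value."""
    if not labels:
        return 0.0
    majority_count = Counter(labels).most_common(1)[0][1]
    return 1.0 - majority_count / len(labels)

def best_split(data, labels, features):
    """Pick the feature whose split yields the lowest weighted classification error."""
    best_feature, best_error = None, float("inf")
    for f in features:
        error = 0.0
        for value in set(x[f] for x in data):
            subset = [y for x, y in zip(data, labels) if x[f] == value]
            error += len(subset) / len(labels) * classification_error(subset)
        if error < best_error:
            best_feature, best_error = f, error
    return best_feature

def build_tree(data, labels, features):
    # Stopping conditions: all labels agree, or no features left to split on.
    if classification_error(labels) == 0.0 or not features:
        return Counter(labels).most_common(1)[0][0]   # leaf: majority value
    f = best_split(data, labels, features)
    remaining = [g for g in features if g != f]
    tree = {"feature": f, "children": {}}
    for value in set(x[f] for x in data):
        idx = [i for i, x in enumerate(data) if x[f] == value]
        tree["children"][value] = build_tree(
            [data[i] for i in idx], [labels[i] for i in idx], remaining)
    return tree
```

Scoring a loan application, as on the slide above, is then just a walk down tree["children"] until a non-dict leaf label is reached.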
Decision trees vs logistic regression: Example
Logistic regression fits a weighted combination of features:
[Figure/table: features h_0(x) = 1, h_1(x) = x[1], h_2(x) = x[2] with their learned weights (e.g., 1.12 on x[1]), defining a linear decision boundary on a 2-D dataset]
Depth 1: Split on x[1]
[Figure: root split at a threshold on x[1]; the two children (x[1] < threshold, x[1] >= threshold) hold the - and + regions of the data]
Depth 2
[Figure: each depth-1 child is split again, one on x[1] at a new threshold and the other on x[2], refining the partition]
Threshold split caveat
For threshold splits, the same feature can be used multiple times.
[Figure: a tree that splits on x[1] at the root and again on x[1] at a deeper node, alongside a split on x[2]]
Decision boundaries
[Figure: the axis-aligned decision boundaries produced at depth 1, depth 2, and higher depths]
Comparing decision boundaries
[Figure: decision trees of depth 1, 3, and 10 vs. logistic regression with degree 1, 2, and 6 features; both families produce increasingly complex boundaries as capacity grows]

Predicting probabilities with decision trees
[Figure: root split on Credit? with leaf counts: excellent 9 safe / 2 risky, fair 6 safe / 9 risky, poor 3 safe / 1 risky]
For each leaf, predict the fraction of positive examples in that leaf. For the poor leaf: P(y = +1 | x) = 3 / (3 + 1) = 0.75.
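Turning leaf counts into probabilities is a one-liner; the sketch below uses the counts from the slide and is only illustrative.

```python
def leaf_probability(n_safe, n_risky):
    """P(y = +1 | x) for a leaf = fraction of positive (safe) examples in it."""
    return n_safe / (n_safe + n_risky)

print(leaf_probability(3, 1))   # poor-credit leaf from the slide -> 0.75
```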
Depth 1 probabilities
[Figure: depth-1 split on x[1]; each region is shaded by its predicted class probability]
Depth 2 probabilities
[Figure: depth-2 splits on x[1] and x[2]; the probability surface is piecewise constant over finer regions]
Comparison with logistic regression
[Figure: class-probability surfaces for logistic regression with degree-2 features (smooth) vs. a depth-2 decision tree (piecewise constant)]

Overfitting in decision trees
What happens when we increase depth?
Training error decreases with depth.
[Figure: training error vs. tree depth for depth = 1, 2, 3, 5, 10, with the corresponding decision boundaries growing more intricate]

Two approaches to picking simpler trees
1. Early stopping: Stop the learning algorithm before the tree becomes too complex
2. Pruning: Simplify the tree after the learning algorithm terminates
Technique 1: Early stopping
Stopping conditions (recap):
1. All examples have the same target value
2. No more features to split on
Early stopping conditions (see the sketch after this list):
1. Limit tree depth (choose max_depth using a validation set)
2. Do not consider splits that do not cause a sufficient decrease in classification error
3. Do not split an intermediate node which contains too few data points

Challenge with early stopping condition 1
It is hard to know exactly when to stop: training error keeps falling as trees grow more complex, while true error is U-shaped, so the best max_depth sits at the bottom of the U.
[Figure: classification error vs. tree depth, with training error decreasing, true error U-shaped, and max_depth marked at the minimum of the true error]
Also, we might want some branches of the tree to go deeper while others remain shallow.
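All three early stopping conditions map directly onto hyperparameters of common tree implementations. As a sketch, scikit-learn's DecisionTreeClassifier exposes them roughly as follows (the parameter values here are arbitrary examples, and scikit-learn measures the decrease in impurity rather than raw classification error):

```python
from sklearn.tree import DecisionTreeClassifier

tree = DecisionTreeClassifier(
    max_depth=5,                 # condition 1: limit tree depth
    min_impurity_decrease=0.01,  # condition 2: require a sufficient decrease per split
    min_samples_split=20,        # condition 3: don't split nodes with too few points
)
# max_depth would then be tuned on a validation set, as the slide suggests.
```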
Early stopping condition 2: Pros and cons
Pros:
- A reasonable heuristic for early stopping to avoid useless splits
Cons:
- Too short-sighted: we may miss out on good splits that occur right after useless splits
- We saw this with the XOR example

Two approaches to picking simpler trees
1. Early stopping: Stop the learning algorithm before the tree becomes too complex
2. Pruning: Simplify the tree after the learning algorithm terminates; pruning complements early stopping
Pruning: Intuition
Train a complex tree, simplify later.
[Figure: complex tree simplified into a smaller tree]
Pruning motivation
[Figure: classification error vs. tree depth, with training error falling while true error rises for complex trees]
Simplify after the tree is built; don't stop too early.
Scoring trees: Desired total quality format
Want to balance:
i. How well the tree fits the data
ii. Complexity of the tree
Total cost = measure of fit + measure of complexity
- Measure of fit: classification error (a large value means a bad fit to the training data)
- Measure of complexity: a large value means the tree is likely to overfit
A simple measure of the complexity of a tree: L(T) = # of leaf nodes.
[Figure: a small tree splitting on Credit? (excellent / fair / poor) with good/bad leaf labels; L(T) counts its leaves]
Balance simplicity & predictive power
[Figure: a deep tree splitting on Credit?, Term?, and Income? (too complex, risk of overfitting) vs. a single split on Term? (too simple, high classification error)]

Balancing fit and complexity
Total cost: C(T) = Error(T) + λ L(T), where λ ≥ 0 is a tuning parameter.
- If λ = 0: standard decision tree learning; only training error matters.
- If λ = ∞: complexity dominates, so the optimal tree is a single leaf predicting the majority class.
- If λ is in between: balance fit and complexity.
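To make the trade-off concrete, here is a small worked computation under assumed numbers; the errors and leaf counts below are invented for illustration, not taken from the slides.

```python
def total_cost(error, n_leaves, lam):
    """C(T) = Error(T) + lambda * L(T)."""
    return error + lam * n_leaves

lam = 0.3
# Hypothetical complex tree: low training error, many leaves.
print(total_cost(0.10, 6, lam))   # 0.10 + 0.3 * 6 = 1.90
# Hypothetical pruned tree: higher training error, fewer leaves.
print(total_cost(0.25, 2, lam))   # 0.25 + 0.3 * 2 = 0.85 -> pruned tree wins
```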
Tree pruning algorithm
Step 1: Consider a split
[Figure: tree T rooted at Start, splitting on Credit?, with a Term? split (3 years / 5 years) under one branch and an Income? split under another; the lower Term? split is the candidate for pruning]
Step 2: Compute total cost C(T) of the split
[Figure: tree T with the candidate Term? split still in place; with λ = 0.3, tabulate Error(T), #leaves L(T), and total C(T) = Error(T) + λ L(T)]
Step 3: Undo the split, giving T_smaller
[Figure: tree T_smaller, identical to T except the candidate Term? split is replaced by a leaf node; tabulate Error(T_smaller), #leaves, and C(T_smaller) the same way]
Step 4: Prune if total cost is lower: C(T_smaller) ≤ C(T)
[Figure: with λ = 0.3, T_smaller has worse training error but lower overall cost, so the split is replaced by a leaf node. Replace split by leaf node? YES!]
Step 5: Repeat Steps 1-4 for every split
[Figure: the full tree, with each split considered in turn to decide whether it can be pruned]
A sketch of this prune-or-keep decision appears below.
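The per-split decision is easy to express in code. The sketch below assumes the dict-based tree from the earlier greedy-learning sketch and invented helper names; it prunes bottom-up, replacing a split with a majority-value leaf whenever that lowers total cost.

```python
from collections import Counter

def count_leaves(tree):
    if not isinstance(tree, dict):                   # a bare label is a leaf
        return 1
    return sum(count_leaves(c) for c in tree["children"].values())

def predict(tree, x):
    while isinstance(tree, dict):
        tree = tree["children"][x[tree["feature"]]]
    return tree

def prune(tree, data, labels, lam, n_total):
    """Replace a split by a leaf whenever C(T_smaller) <= C(T); bottom-up."""
    if not isinstance(tree, dict):
        return tree
    f = tree["feature"]
    for value in list(tree["children"]):
        idx = [i for i, x in enumerate(data) if x[f] == value]
        tree["children"][value] = prune(
            tree["children"][value],
            [data[i] for i in idx], [labels[i] for i in idx], lam, n_total)
    # Candidate leaf: the majority value at this node.
    leaf = Counter(labels).most_common(1)[0][0]
    cost_leaf = sum(y != leaf for y in labels) / n_total + lam * 1
    cost_tree = (sum(y != predict(tree, x) for x, y in zip(data, labels)) / n_total
                 + lam * count_leaves(tree))
    return leaf if cost_leaf <= cost_tree else tree
```

Called as prune(tree, data, labels, lam=0.3, n_total=len(labels)), this mirrors the λ = 0.3 walkthrough: mistakes are counted as fractions of the full training set, so the local comparison matches C(T) = Error(T) + λ L(T) restricted to the subtree under consideration.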
Summary of overfitting in decision trees
What you can do now:
- Identify when overfitting occurs in decision trees
- Prevent overfitting with early stopping
  - Limit tree depth
  - Do not consider splits that do not reduce classification error
  - Do not split intermediate nodes with too few points
- Prevent overfitting by pruning complex trees
  - Use a total cost formula that balances classification error and tree complexity
  - Use total cost to merge potentially complex trees into simpler ones
Boosting
Emily Fox, University of Washington, January 30, 2017

Simple (weak) classifiers are good!
[Figure: a decision stump: Income > $100K? Yes / No]
Logistic regression with simple features, shallow decision trees, decision stumps: low variance, and learning is fast! But high bias.
Finding a classifier that's just right
[Figure: classification error vs. model complexity; true error is U-shaped, and a weak learner sits on the high-bias side, so we need a stronger learner]
Option 1: add more features or depth
Option 2: boosting

Boosting question
"Can a set of weak learners be combined to create a stronger learner?" (Kearns and Valiant, 1988)
Yes! Boosting (Schapire, 1990)
Amazing impact: a simple approach that is widely used in industry and wins most Kaggle competitions.
Ensemble classifier
A single classifier:
Input: x → [Income > $100K? Yes / No] → Output: ŷ = f(x), either +1 or -1
Ensemble methods: each classifier votes on the prediction
x_i = (Income = $120K, Credit = Bad, Savings = $50K, Market = Good)
[Figure: four decision stumps (Income > $100K?, Credit history?, Savings > $100K?, Market conditions?) voting f_1(x_i) = +1, f_2(x_i) = -1, f_3(x_i) = -1, f_4(x_i) = +1]
Ensemble model: combine the votes with learned coefficients:
F(x_i) = sign(w_1 f_1(x_i) + w_2 f_2(x_i) + w_3 f_3(x_i) + w_4 f_4(x_i))
Prediction with an ensemble
[Figure: the same four stumps with example coefficients w_1, ..., w_4; the weighted votes are summed and the sign of the sum is the prediction]

Ensemble classifier in general
Goal: predict output y (either +1 or -1) from input x.
Learn an ensemble model:
- Classifiers: f_1(x), f_2(x), ..., f_T(x)
- Coefficients: ŵ_1, ŵ_2, ..., ŵ_T
Prediction: ŷ = sign( Σ_{t=1}^{T} ŵ_t f_t(x) )
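The prediction rule is just a weighted vote. A minimal sketch, where the classifier list and the coefficients are placeholders:

```python
import numpy as np

def ensemble_predict(classifiers, coefficients, x):
    """y_hat = sign(sum_t w_t * f_t(x)), with each f_t(x) in {-1, +1}."""
    score = sum(w * f(x) for w, f in zip(coefficients, classifiers))
    return 1 if score >= 0 else -1

# Example with the four stump votes from the slide and made-up weights:
votes = [+1, -1, -1, +1]
weights = [2.0, 1.5, 1.5, 0.5]
score = np.dot(weights, votes)    # 2.0 - 1.5 - 1.5 + 0.5 = -0.5
print(1 if score >= 0 else -1)    # -> -1
```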
Boosting
Training a classifier:
Training data → Learn classifier f(x) → Predict ŷ = sign(f(x))
Learning a decision stump
[Table: 11 loan records with Credit ∈ {A, B, C}, Income values $130K, $80K, $110K, $110K, $90K, $120K, $30K, $60K, $95K, $60K, $98K, and a safe/risky label y]
Split on Income > $100K? and set each side's prediction ŷ to the majority value of the points that land there.

Boosting = Focus learning on "hard" points
Training data → Learn classifier f(x) → Predict ŷ = sign(f(x)) → Evaluate
Boosting: learn where f(x) makes mistakes, and focus the next classifier on the places where f(x) does less well.
Learning on weighted data: more weight on hard or more important points
Weighted dataset:
- Each (x_i, y_i) is weighted by α_i
- More important point = higher weight α_i
Learning: data point i counts as α_i data points; e.g., α_i = 2 means count the point twice.

Learning a decision stump on weighted data
Increase the weight α_i of harder / misclassified points.
[Table: the same 11 loan records, now with a weight column α_i = 0.5, 1.5, 1.2, 0.8, 0.6, 0.7, 3, 2, 0.8, 0.7, 0.9; the stump Income > $100K? is re-learned to minimize weighted error]
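Most learners accept per-point weights directly. As a sketch, scikit-learn's fit methods take a sample_weight argument, which implements exactly this "point i counts as α_i points" semantics; the labels below are made up, since the y column of the slide's table is not recoverable.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

X = np.array([[130], [80], [110], [110], [90], [120], [30], [60], [95], [60], [98]])
y = np.array([+1, -1, +1, +1, -1, +1, -1, -1, -1, +1, +1])   # made-up labels
alpha = np.array([0.5, 1.5, 1.2, 0.8, 0.6, 0.7, 3, 2, 0.8, 0.7, 0.9])

stump = DecisionTreeClassifier(max_depth=1)   # a decision stump
stump.fit(X, y, sample_weight=alpha)          # point i counts as alpha_i points
```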
Learning from weighted data in general
Usually straightforward: data point i counts as α_i data points. E.g., in gradient ascent for logistic regression, the sum over data points becomes a weighted sum, with each point's contribution multiplied by α_i.

Boosting = Greedy learning of ensembles from data
Training data → Learn classifier f_1(x) → Predict ŷ = sign(f_1(x))
Higher weight for points where f_1(x) is wrong → Weighted data → Learn classifier & coefficient ŵ_2, f_2(x) → Predict ŷ = sign(ŵ_1 f_1(x) + ŵ_2 f_2(x))
AdaBoost: learning an ensemble [Freund & Schapire 1999]
Start with the same weight for all points: α_i = 1/N
For t = 1, ..., T:
- Learn f_t(x) with data weights α_i
- Compute coefficient ŵ_t
- Recompute weights α_i
Final model predicts by: ŷ = sign( Σ_{t=1}^{T} ŵ_t f_t(x) )
Computing coefficient ŵ_t
AdaBoost: computing the coefficient ŵ_t of classifier f_t(x).
Is f_t(x) good? Yes → ŵ_t large; No → ŵ_t small.
f_t(x) is good when f_t has low training error. Measuring error on weighted data: just use the weighted # of misclassified points.
Weighted classification error
[Example: weighted data points such as ("Sushi was great", α = 1.2) and ("Food was OK", α = 0.5); hide the label, predict with the learned classifier, then mark each prediction Correct! or Mistake! Correct points contribute their weight to the weight of correct; mistakes contribute theirs to the weight of mistakes]
Total weight of mistakes: Σ_{i: f_t(x_i) ≠ y_i} α_i
Total weight of all points: Σ_i α_i
Weighted error measures the fraction of total weight that is misclassified:
weighted_error(f_t) = (total weight of mistakes) / (total weight of all points)
Best possible value is 0.0.
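A minimal sketch of the weighted error computation; array names and the example values are illustrative:

```python
import numpy as np

def weighted_error(alpha, y_true, y_pred):
    """Fraction of total weight on misclassified points; best value is 0.0."""
    alpha = np.asarray(alpha, dtype=float)
    mistakes = np.asarray(y_true) != np.asarray(y_pred)
    return alpha[mistakes].sum() / alpha.sum()

print(weighted_error([0.5, 1.2, 1.5], [+1, -1, +1], [+1, +1, +1]))  # 1.2 / 3.2 = 0.375
```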
AdaBoost: formula for computing coefficient ŵ_t of classifier f_t(x)
ŵ_t = (1/2) ln( (1 - weighted_error(f_t)) / weighted_error(f_t) ), with weighted_error(f_t) measured on the training data.
Is f_t(x) good? Yes (weighted_error small) → ŵ_t large; No (weighted_error near 0.5) → ŵ_t near 0.

AdaBoost: learning ensemble (recap)
Start with the same weight for all points: α_i = 1/N
For t = 1, ..., T:
- Learn f_t(x) with data weights α_i
- Compute coefficient ŵ_t
- Recompute weights α_i
Final model predicts by: ŷ = sign( Σ_{t=1}^{T} ŵ_t f_t(x) )
Recompute weights α_i
AdaBoost: updating weights α_i based on where classifier f_t(x) makes mistakes.
Did f_t get x_i right? Yes → decrease α_i; No → increase α_i.
AdaBoost: formula for updating weights α_i
α_i ← α_i e^{-ŵ_t}, if f_t(x_i) = y_i   (multiply α_i by e^{-ŵ_t} < 1: correct points are down-weighted)
α_i ← α_i e^{ŵ_t},  if f_t(x_i) ≠ y_i   (multiply α_i by e^{ŵ_t} > 1: mistakes are up-weighted)

AdaBoost: learning ensemble (recap, with the weight update)
Start with the same weight for all points: α_i = 1/N
For t = 1, ..., T:
- Learn f_t(x) with data weights α_i
- Compute coefficient ŵ_t
- Recompute weights α_i using the update above
Final model predicts by: ŷ = sign( Σ_{t=1}^{T} ŵ_t f_t(x) )
AdaBoost: normalizing weights α_i
If x_i is often misclassified, its weight α_i gets very large; if x_i is often correct, α_i gets very small. This can cause numerical instability after many iterations, so normalize the weights to add up to 1 after every iteration: α_i ← α_i / Σ_j α_j.

AdaBoost: learning ensemble (complete)
Start with the same weight for all points: α_i = 1/N
For t = 1, ..., T:
- Learn f_t(x) with data weights α_i
- Compute coefficient ŵ_t
- Recompute weights α_i (multiply by e^{-ŵ_t} if correct, by e^{ŵ_t} if a mistake)
- Normalize weights α_i
Final model predicts by: ŷ = sign( Σ_{t=1}^{T} ŵ_t f_t(x) )
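Putting the whole loop together: a minimal AdaBoost sketch using scikit-learn decision stumps as the weak learners. Function and variable names are my own, not from the lecture; X and y are assumed to be NumPy arrays with y in {-1, +1}.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def adaboost(X, y, T):
    """AdaBoost with decision-stump weak learners; labels y must be in {-1, +1}."""
    N = len(y)
    alpha = np.full(N, 1.0 / N)                      # same weight for all points
    stumps, coefficients = [], []
    for t in range(T):
        f = DecisionTreeClassifier(max_depth=1)
        f.fit(X, y, sample_weight=alpha)             # learn f_t with weights alpha
        pred = f.predict(X)
        err = alpha[pred != y].sum() / alpha.sum()   # weighted classification error
        err = np.clip(err, 1e-10, 1 - 1e-10)         # guard the log against err = 0
        w = 0.5 * np.log((1 - err) / err)            # coefficient w_t
        alpha = alpha * np.exp(-w * y * pred)        # e^{-w} if correct, e^{+w} if not
        alpha = alpha / alpha.sum()                  # normalize weights to sum to 1
        stumps.append(f)
        coefficients.append(w)
    return stumps, coefficients

def adaboost_predict(stumps, coefficients, X):
    score = sum(w * f.predict(X) for w, f in zip(coefficients, stumps))
    return np.sign(score)
```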
AdaBoost example
t = 1: just learn a classifier on the original data
[Figure: original 2-D data and the first learned decision stump f_1(x)]
Updating weights α_i
[Figure: points misclassified by f_1(x) are drawn larger near its boundary; their weights α_i have increased]
t = 2: learn a classifier on the weighted data
[Figure: using the weights α_i chosen in the previous iteration, the second decision stump f_2(x) is learned on the weighted data]
Ensemble becomes a weighted sum of learned classifiers
[Figure: ŵ_1 f_1(x) + ŵ_2 f_2(x): the two stump boundaries combine into a piecewise decision rule]
[Figure: decision boundary of the ensemble classifier after 30 iterations; training_error = 0]
Boosted decision stumps
Start with the same weight for all points: α_i = 1/N
For t = 1, ..., T:
- Learn f_t(x): pick the decision stump with the lowest weighted training error according to α_i
- Compute coefficient ŵ_t
- Recompute weights α_i
- Normalize weights α_i
Final model predicts by: ŷ = sign( Σ_{t=1}^{T} ŵ_t f_t(x) )
Finding the best next decision stump f_t(x)
Consider splitting on each feature:
- Income > $100K?      weighted_error = 0.2
- Credit history?      weighted_error = 0.35
- Savings > $100K?     weighted_error = 0.3
- Market conditions?   weighted_error = 0.4
Pick the stump with the lowest weighted error, f_t = Income > $100K?, and compute its coefficient:
ŵ_t = (1/2) ln( (1 - 0.2) / 0.2 ) = (1/2) ln(4) ≈ 0.69
The boosted decision stumps algorithm then proceeds as above: recompute and normalize the weights α_i, and repeat.
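The selection step in code, as a sketch with the slide's candidate errors:

```python
import numpy as np

candidates = {
    "Income>$100K?": 0.2,
    "Credit history?": 0.35,
    "Savings>$100K?": 0.3,
    "Market conditions?": 0.4,
}
best = min(candidates, key=candidates.get)   # stump with lowest weighted error
err = candidates[best]
w = 0.5 * np.log((1 - err) / err)
print(best, round(w, 2))                     # Income>$100K? 0.69
```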
Updating weights α_i
With ŵ_t = 0.69: multiply α_i by e^{-0.69} ≈ 1/2 if f_t(x_i) = y_i (correct), and by e^{0.69} ≈ 2 if f_t(x_i) ≠ y_i (mistake).
[Table: for each of the 11 loan records, the previous weight α and the new weight, e.g. a correctly classified point with α = 0.5 becomes 0.5 / 2 = 0.25, and a misclassified point with α = 1.5 becomes 2 × 1.5 = 3]

Summary of boosting
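A quick numerical check of the two multipliers, using the coefficient from the previous slide:

```python
import numpy as np

w = 0.5 * np.log((1 - 0.2) / 0.2)   # 0.69 from the previous slide
print(np.exp(-w))   # 0.5: correct points are down-weighted by half
print(np.exp(w))    # 2.0: misclassified points are doubled
```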
Variants of boosting and related algorithms
There are hundreds of variants of boosting; the most important is gradient boosting, which is like AdaBoost but useful beyond basic classification.
There are many other approaches to learning ensembles; the most important is random forests:
- Bagging: pick random subsets of the data
- Learn a tree on each subset
- Average the predictions
Random forests are simpler than boosting and easier to parallelize, but typically have higher error than boosting for the same # of trees (# of iterations T). A usage sketch of both appears below.

Impact of boosting (spoiler alert... HUGE IMPACT)
Amongst the most useful ML methods ever created. Extremely useful in computer vision; the standard approach for face detection, for example. Used by most winners of ML competitions (Kaggle, KDD Cup, ...), and in malware classification, credit fraud detection, ads click-through rate estimation, sales forecasting, ranking web pages for search, Higgs boson detection, ... Most deployed ML systems use model ensembles, with coefficients chosen manually, with boosting, with bagging, or by other methods.
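For completeness, here is a sketch of how both ensemble families are typically invoked in scikit-learn; the dataset and hyperparameter values are placeholders.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier

X, y = make_classification(n_samples=500, random_state=0)

# Gradient boosting: sequential, like AdaBoost but driven by gradients of a loss.
gb = GradientBoostingClassifier(n_estimators=100, max_depth=3).fit(X, y)

# Random forest: bagging of trees on random subsets, trained independently.
rf = RandomForestClassifier(n_estimators=100).fit(X, y)

print(gb.score(X, y), rf.score(X, y))   # training accuracy of each ensemble
```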
What you can do now
- Identify the notion of ensemble classifiers
- Formalize ensembles as weighted combinations of simpler classifiers
- Outline the boosting framework: sequentially learn classifiers on weighted data
- Describe the AdaBoost algorithm:
  - Learn each classifier on weighted data
  - Compute the coefficient of each classifier
  - Recompute the data weights
  - Normalize the weights
- Implement AdaBoost to create an ensemble of decision stumps