Measures of Diversity in Combining Classifiers
1 Measures of Diversity in Combining Classifiers. Part 2: Non-pairwise diversity measures. For fewer cartoons and more formulas: L. I. Kuncheva and C. J. Whitaker, "Measures of Diversity in Classifier Ensembles and Their Relationship with the Ensemble Accuracy", Machine Learning, 51, 181-207, 2003.
2 Random forest: {D(x, θ_k)}, θ_k i.i.d., k = 1, ..., L, L is large. Strength and correlation: D(x) is the class label of x suggested by D. Define the margin function for a random forest to be

mr(x, ω_i) = P_θ(D(x) = ω_i) - max_{t ≠ i} P_θ(D(x) = ω_t),

and the strength of the set of classifiers to be s = E_{x,ω}[mr(x, ω)]. Denote ω_s = argmax_{t ≠ i} P_θ(D(x) = ω_t) and define the raw margin function to be

rmr(x, ω_i, θ) = I(D(x) = ω_i) - I(D(x) = ω_s),

where I(·) is an indicator function.
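As an illustration, here is a minimal NumPy sketch of how mr and s could be estimated once a trained forest has voted; the array names are hypothetical and P_θ is replaced by empirical vote frequencies over the L trees:

```python
import numpy as np

def margin_and_strength(votes, y):
    """Estimate mr(x, omega) and s from an N x L matrix of predicted labels.

    votes[n, k] is the label classifier k assigns to point n; y[n] is the
    true label of point n.
    """
    N, L = votes.shape
    classes = np.unique(np.concatenate([votes.ravel(), y]))
    # freq[n, j]: fraction of the L classifiers voting class j on point n,
    # i.e. the plug-in estimate of P_theta(D(x) = omega_j)
    freq = np.stack([(votes == c).mean(axis=1) for c in classes], axis=1)
    true_idx = np.searchsorted(classes, y)
    p_true = freq[np.arange(N), true_idx]
    wrong = freq.copy()
    wrong[np.arange(N), true_idx] = -np.inf     # mask out the true class
    mr = p_true - wrong.max(axis=1)             # margin per data point
    return mr, mr.mean()                        # strength s = E[mr(x, omega)]
```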
3 The probability of error of the ensemble is bounded as follows:

PE* ≤ ρ (1 - s²) / s²,

where ρ is the mean correlation between rmr(·, ·, θ_i) and rmr(·, ·, θ_k), averaged across all pairs of classifiers, and s is the strength. Although the bound is likely to be loose, it fulfils the same suggestive function for random forests as VC-type bounds do for other types of classifiers.
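A sketch of the corresponding plug-in estimate of the bound, reusing margin_and_strength from the sketch above; it assumes no classifier has a constant raw margin (otherwise the correlations are undefined):

```python
import numpy as np

def rf_error_bound(votes, y):
    """Estimate rho (mean raw-margin correlation) and the bound rho(1-s^2)/s^2."""
    N, L = votes.shape
    mr, s = margin_and_strength(votes, y)
    classes = np.unique(np.concatenate([votes.ravel(), y]))
    freq = np.stack([(votes == c).mean(axis=1) for c in classes], axis=1)
    ti = np.searchsorted(classes, y)
    wrong = freq.copy()
    wrong[np.arange(N), ti] = -np.inf
    w = classes[wrong.argmax(axis=1)]           # omega_s for each point
    # raw margins: rmr[n, k] = I(vote = true label) - I(vote = omega_s)
    rmr = (votes == y[:, None]).astype(float) - (votes == w[:, None])
    corr = np.corrcoef(rmr.T)                   # L x L correlation matrix
    iu = np.triu_indices(L, k=1)
    rho = corr[iu].mean()                       # averaged across all pairs
    return rho * (1 - s**2) / s**2
```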
4 The 2-class case: mr(x, ω_i) = 2 P_θ(D(x) = ω_i) - 1, i = 1, 2. The strength of the set of classifiers is estimated as

s = E_{x,ω}[mr(x, ω)] ≈ (1/N) [ Σ_{true label ω_1} (2 P_θ(D(x) = ω_1) - 1) + Σ_{true label ω_2} (2 P_θ(D(x) = ω_2) - 1) ].

The correlation ρ can be calculated as the averaged pairwise correlation between the oracle outputs. NB: both are just estimates!
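In the 2-class case both estimates collapse to simple operations on the oracle matrix, since the raw margin 2·(oracle output) - 1 is an affine transform of the oracle output; a sketch:

```python
import numpy as np

def two_class_strength_and_rho(oracle):
    """oracle[n, k] = 1 if classifier k is correct on point n, else 0."""
    N, L = oracle.shape
    s = (2 * oracle.mean(axis=1) - 1).mean()   # mean of mr(x) = 2 P(correct) - 1
    corr = np.corrcoef(oracle.T)               # equals the raw-margin correlations
    iu = np.triu_indices(L, k=1)
    return s, corr[iu].mean()
```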
5 An example: banana-shaped data (the gendatb routine from the Matlab toolbox PRtools). Training: N = 600 data points. Testing (a separate set): N = 600 data points. The idea was to avoid using out-of-bag (OOB) estimates, which in any case simulate estimates on an independent testing set of the same size.
6 Simple bagging, L = 50 classifiers. [Plot: the error bound and the testing error.]
7 [Plots: the diversity measure Q, the strength, and the correlation for the bagging example.]
8 [Plots: the same experiment with L = 50, N = 600 and with L = 50 and a different N.]
9 Is strength related to accuracy? [Worked example: true and guessed labels for three classifiers D_1, D_2, D_3 on a small data set, with per-point estimates P_θ(D(x) = ω) (values such as 7/10, 4/10, 6/10, 1/3, 2/3) used to compare accuracy against strength.]
10 [Plot: averaged individual testing error against strength.]
11 Part 2: Non-pairwise diversity measures
0. A note on pairwise diversity (ρ) for random forests
Measures based on a single data point + averaging (entropy, spread, KW variance)
Interrater agreement (kappa for multiple raters)
Measures based on difficulties of the data points
Relationship with accuracy
Open problems
12 Now we look at the whole ensemble of classifiers. Classifier outputs come at different levels:
Oracle (binary): 1 if correct, 0 if wrong.
Continuous-valued (measurement level): a degree of support for each class; the matrix of these outputs across the ensemble is called the decision profile (remember for later).
Class labels (abstract level): a single label such as ω_3.
Ordered list of class labels (rank level).
13 Two groups:
1. Measures based on a single data point (case, instance, example, object, whatever) and subsequently averaged over the whole data set.
2. Measures based on all data points.
For oracle outputs and L = 8 classifiers, are these diverse? [Four example voting patterns are shown, with the verdicts: No-o-o-o-o-o-o! Nope. Yes. Yes.]
14 ENTROPY (oracle outputs). How do we measure how far we are from the desired pattern of L/2 zeros and L/2 ones for N objects?

E = (1/N) Σ_k min{ Σ 0's, Σ 1's }_k / (L - ⌈L/2⌉)

Consider the output 0 or 1 as a random variable with relative frequencies p_0 = (Σ 0's)/L and p_1 = (Σ 1's)/L, respectively. Then the (proper) formula for the entropy of the distribution, averaged across the N data points, will be

H = -(1/N) Σ_k [ p_0 log p_0 + p_1 log p_1 ]_k

[Cunningham & Carney, 2000]
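Both entropies are a few lines over the N x L oracle matrix; a sketch (base-2 logarithm assumed, since the slide does not fix the base):

```python
import numpy as np

def entropy_E(oracle):
    """E: how close each point's votes are to the L/2-0's, L/2-1's pattern."""
    N, L = oracle.shape
    ones = oracle.sum(axis=1)                       # correct votes per point
    return (np.minimum(ones, L - ones) / (L - np.ceil(L / 2))).mean()

def entropy_H(oracle):
    """Shannon entropy of the 0/1 vote distribution, averaged over points."""
    p1 = oracle.mean(axis=1)
    p0 = 1.0 - p1
    with np.errstate(divide="ignore", invalid="ignore"):
        terms = (np.where(p0 > 0, p0 * np.log2(p0), 0.0)
                 + np.where(p1 > 0, p1 * np.log2(p1), 0.0))
    return -terms.mean()
```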
15 ENTROPY (label outputs). Votes of L = 10 classifiers for a single x*, distributed across ω_1, ω_2, ω_3, ω_4. With p_i the proportion of votes for ω_i,

H = -(1/N) Σ_k [ Σ_i p_i log p_i ]_k
16 Breiman's bias-variance decomposition, 1996. Assume that the classifier output for a given x* is a random variable with p.m.f. P(ω_1 | x*, D), ..., P(ω_c | x*, D), and let ω_B denote the Bayes-optimal label for x*. The classification error is

P(error | x*) = 1 - Σ_j P(ω_j | x*) P(ω_j | x*, D)
= [1 - P(ω_B | x*)] + P(ω_B | x*) - Σ_j P(ω_j | x*) P(ω_j | x*, D)
= [1 - P(ω_B | x*)] + Σ_j [P(ω_B | x*) - P(ω_j | x*)] P(ω_j | x*, D)   (since Σ_j P(ω_j | x*, D) = 1)
= P_B(x*)
+ [P(ω_B | x*) - P(ω_s | x*)] P(ω_s | x*, D)   (bias)
+ Σ_{j ≠ s} [P(ω_B | x*) - P(ω_j | x*)] P(ω_j | x*, D)   (spread)

where P_B(x*) = 1 - P(ω_B | x*) is the Bayes error and ω_s is the most probable label according to P(· | x*, D).
17 P_B(x*) + [P(ω_B | x*) - P(ω_s | x*)] P(ω_s | x*, D) (bias) + Σ_{j ≠ s} [P(ω_B | x*) - P(ω_j | x*)] P(ω_j | x*, D) (spread). Is the spread related to diversity? An example with four classes, true distribution P(ω | x) = (0.3, 0.1, 0.2, 0.4) and guessed distribution P(ω | x, D) = (0.2, 0.1, 0.4, 0.3). If we drew a classifier at random from the distribution P_D,

P(error | x) = [1 - 0.4] + [0.4 - 0.2] x 0.4 + [0.1 x 0.2 + 0.3 x 0.1] = 0.73
18 Previously, P(error | x) = [1 - 0.4] + [0.4 - 0.2] x 0.4 + [0.1 x 0.2 + 0.3 x 0.1] = 0.73 for a classifier drawn at random. Now take the majority vote. This means we decide always ω_s for x*, so P(ω_s | x*, D) becomes 1. With the same true and guessed distributions,

P(error | x) = [1 - 0.4] + [0.4 - 0.2] x 1.0 = 0.8

so here the majority vote does worse than a classifier drawn at random: the spread was helping.
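A short check of the two numbers, assuming the distributions reconstructed above:

```python
# P(omega_j | x*) and P(omega_j | x*, D) for omega_1..omega_4
p_true    = [0.3, 0.1, 0.2, 0.4]
p_guessed = [0.2, 0.1, 0.4, 0.3]

# Classifier drawn at random from P_D:
err_random = 1 - sum(t * g for t, g in zip(p_true, p_guessed))
print(round(err_random, 2))                 # 0.73

# Majority vote: always decide omega_s, the argmax of the guessed distribution
pB = max(p_true)                            # P(omega_B | x*) = 0.4
s = max(range(4), key=lambda j: p_guessed[j])
err_vote = (1 - pB) + (pB - p_true[s]) * 1.0
print(round(err_vote, 2))                   # 0.8
```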
19 KW variance (label outputs) [Kohavi & Wolpert, 1996, "Bias plus variance decomposition for zero-one loss functions"]. The c-class case:

P(error | x) = bias²(x) + variance(x) + noise²(x)

bias²(x) = ½ Σ_ω (P_true(ω | x) - P_guessed(ω | x))²
variance(x) = ½ (1 - Σ_ω P_guessed(ω | x)²)
noise²(x) = ½ (1 - Σ_ω P_true(ω | x)²)
20 For the example above (true (0.3, 0.1, 0.2, 0.4), guessed (0.2, 0.1, 0.4, 0.3)):

bias²(x) = ½ Σ_ω (P_true(ω | x) - P_guessed(ω | x))² = ½ [(0.3 - 0.2)² + (0.1 - 0.1)² + (0.2 - 0.4)² + (0.4 - 0.3)²] = 0.03

variance(x) = ½ (1 - Σ_ω P_guessed(ω | x)²) = ½ [1 - ((0.2)² + (0.1)² + (0.4)² + (0.3)²)] = 0.35

noise²(x) = ½ (1 - Σ_ω P_true(ω | x)²) = ½ [1 - ((0.3)² + (0.1)² + (0.2)² + (0.4)²)] = 0.35
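The same distributions reproduce the three Kohavi-Wolpert terms:

```python
p_true    = [0.3, 0.1, 0.2, 0.4]
p_guessed = [0.2, 0.1, 0.4, 0.3]

bias2    = 0.5 * sum((t - g) ** 2 for t, g in zip(p_true, p_guessed))
variance = 0.5 * (1 - sum(g * g for g in p_guessed))
noise2   = 0.5 * (1 - sum(t * t for t in p_true))
print(round(bias2, 2), round(variance, 2), round(noise2, 2))   # 0.03 0.35 0.35
```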
21 KW variance (oracle outputs). Consider again the output 0 or 1 as a random variable with relative frequencies p_0 = (Σ 0's)/L and p_1 = (Σ 1's)/L, respectively. Then the variance is

variance(x) = ½ (1 - (p_0)² - (p_1)²)

Averaging across the whole data set,

KW = 1/(N L²) Σ_k [ (Σ 0's) (Σ 1's) ]_k

Curiously, KW and the averaged pairwise disagreement measure are related through KW = ((L - 1)/(2L)) D_av.
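The identity KW = ((L - 1)/(2L)) D_av is easy to confirm numerically; a sketch on random oracle outputs:

```python
import numpy as np

def kw_variance(oracle):
    """KW = 1/(N L^2) * sum over points of (number of 0's)(number of 1's)."""
    N, L = oracle.shape
    ones = oracle.sum(axis=1)
    return float((ones * (L - ones)).sum()) / (N * L**2)

def avg_disagreement(oracle):
    """D_av: pairwise disagreement averaged over all classifier pairs."""
    N, L = oracle.shape
    pairs = [(oracle[:, i] != oracle[:, j]).mean()
             for i in range(L) for j in range(i + 1, L)]
    return float(np.mean(pairs))

rng = np.random.default_rng(0)
Z = (rng.random((200, 9)) < 0.7).astype(int)    # toy oracle matrix, L = 9
L = Z.shape[1]
assert np.isclose(kw_variance(Z), (L - 1) / (2 * L) * avg_disagreement(Z))
```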
22 Interrater agreement, kappa (oracle outputs):

κ = 1 - (L · KW) / ((L - 1) p (1 - p)) = 1 - Σ_k [ (Σ 0's) (Σ 1's) ]_k / (N L (L - 1) p (1 - p))

where N is the number of data points, L is the number of classifiers, and p is the averaged individual accuracy.
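A direct transcription of the formula (the function name is my own):

```python
import numpy as np

def interrater_kappa(oracle):
    """Kappa for an N x L oracle matrix; equals 1 - L*KW/((L-1) p(1-p))."""
    N, L = oracle.shape
    ones = oracle.sum(axis=1)
    p_bar = oracle.mean()                       # averaged individual accuracy
    num = (ones * (L - ones)).sum() / L
    return 1.0 - num / (N * (L - 1) * p_bar * (1 - p_bar))
```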
23 Measure of difficulty θ [Hansen & Salamon, 1990]. Define a random variable X = the proportion of classifiers which correctly classify a randomly drawn sample x. Let L = 7. [Histogram of X: one bar is the number of points misclassified by all 7 classifiers, another the number of points recognized by 4 of the 7, another the number of points recognized by all 7.]
24 L = 7. [Histograms of X for independent, identical, and negatively dependent classifiers of equal individual accuracy p.]
25 The measure of diversity is θ = Var(X). [θ for the three ensembles above: the identical classifiers give the largest θ, the independent ones an intermediate value, and the negatively dependent ("diverse") ones the smallest.]
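A toy simulation showing how θ separates the three cases; the individual accuracy p = 0.6 and the construction of the "negatively dependent" ensemble are assumptions for illustration, not values from the slides:

```python
import numpy as np

rng = np.random.default_rng(1)
N, L, p = 10_000, 7, 0.6

indep = (rng.random((N, L)) < p).astype(int)            # independent errors
ident = np.repeat((rng.random((N, 1)) < p).astype(int), L, axis=1)
negdep = np.zeros((N, L), dtype=int)                    # crude stand-in: every
negdep[:, : round(L * p)] = 1                           # point gets ~L*p correct

for name, Z in [("independent", indep), ("identical", ident),
                ("negatively dependent", negdep)]:
    X = Z.mean(axis=1)         # proportion of classifiers correct on each point
    print(f"{name}: theta = {X.var():.4f}")
```

Identical classifiers push X to 0 or 1, giving the largest variance; negatively dependent ones concentrate X near p, giving the smallest.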
26 Generalized diversity [Partridge & Krzanowski, 1997]. Define a random variable Y = the proportion of classifiers which misclassify a randomly drawn sample x (Y = 1 - X, with X defined before). [Histogram: the number of points recognized by all 7 at one end, the number misclassified by all 7 at the other.] Denote by p_i the probability that Y = i/L, and by p(k) the probability that k randomly chosen classifiers will fail on a randomly drawn x.
27 p(1) = Σ_i p_i (i/L) (the probability of a single classifier failing)
p(2) = Σ_i p_i i(i - 1)/(L(L - 1)) (the probability that two randomly chosen classifiers will fail together)

Generalized diversity: GD = 1 - p(2)/p(1)

Coincident failure diversity:
CFD = 0, if p_0 = 1;
CFD = (1/(1 - p_0)) Σ_{i=1}^{L} p_i (L - i)/(L - 1), if p_0 < 1.
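Both measures are direct sums over the failure distribution; a sketch (GD is undefined when no classifier ever fails):

```python
import numpy as np

def gd_and_cfd(oracle):
    """GD and CFD from an N x L oracle matrix (1 = correct, 0 = failure)."""
    N, L = oracle.shape
    fails = L - oracle.sum(axis=1)                  # failures per data point
    p = np.bincount(fails, minlength=L + 1) / N     # p_i = P(Y = i/L)
    i = np.arange(L + 1)
    p1 = (p * i / L).sum()                          # one random classifier fails
    p2 = (p * i * (i - 1) / (L * (L - 1))).sum()    # two random ones fail together
    gd = 1.0 - p2 / p1
    if p[0] == 1:                                   # no point is ever misclassified
        cfd = 0.0
    else:
        cfd = (p[1:] * (L - i[1:]) / (L - 1)).sum() / (1 - p[0])
    return gd, cfd
```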
28 Relationship between diversity and accuracy. [Table: correlations between the improvement over the single best classifier and the diversity measures Q, ρ, Dis, DF, κ, θ, GD, CFD, for the combiners MAJ, NB, BKS, WER, MAX, AVR, PRO, DT on the WBC data; the numeric entries are not recoverable here.]
29 Relationship between diversity measures. Pairwise: Q, ρ, Dis, DF. Non-pairwise: E, KW, κ, θ, GD, CFD. DF, GD and CFD stand apart as non-symmetrical: they are based on the classifiers' failures, so swapping the 0's and 1's of the oracle outputs changes their value.
30 Open problems:
How to narrow down the study? (Use a specific methodology for building the ensemble.)
Some theory would not go amiss.
Diversity for label outputs and continuous-valued outputs might lead somewhere. The difficulty comes from the fact that the outputs of the classifiers are vectors, e.g. the supports of D_1, D_2, D_3 for ω_1, ..., ω_4; a natural route is the similarity between these distributions (pairwise).