Learning theory. Ensemble methods. Boosting. Boosting: history


 Sheena Harper
 10 months ago
 Views:
Transcription
1 Learning theory Probability distribution P over X {0, 1}; let (X, Y ) P. We get S := {(x i, y i )} n i=1, an iid sample from P. Ensemble methods Goal: Fix ɛ, δ (0, 1). With probability at least 1 δ (over random choice of S), learn a classifier ˆf : X { 1, 1} with low error rate err( ˆf) = P ( ˆf(X) Y ) ɛ. Basic question: When is this possible? Suppose I even promise you that there is a perfect classifier from a particular function class F. (E.g., F = linear classifiers or F = decision trees.) Default: Empirical Risk Minimization (i.e., pick classifier from F with lowest training error rate), but this might be computationally difficult (e.g., for decision trees). Another question: Is it easier to learn just nontrivial classifiers in F (i.e., better than random guessing)? 1 / 30 2 / 30 Boosting Boosting: Using a learning algorithm that provides rough rulesofthumb to construct a very accurate predictor. Motivation: Easy to construct classification rules that are correct moreoftenthannot (e.g., If % of the characters are dollar signs, then it s spam. ), but seems hard to find a single rule that is almost always correct. Basic idea: Input: training data S For t = 1, 2,..., T : 1. Choose subset of examples S t S (or a distribution over S). 2. Use weak learning algorithm to get classifier: f t := WL(S t ). Return an ensemble classifier based on f 1, f 2,..., f T. Boosting: history 1984 Valiant and Kearns ask whether boosting is theoretically possible (formalized in the PAC learning model) Schapire creates first boosting algorithm, solving the open problem of Valiant and Kearns Freund creates an optimal boosting algorithm (Boostbymajority) Drucker, Schapire, and Simard empirically observe practical limitations of early boosting algorithms. 199 Freund and Schapire create AdaBoost a boosting algorithm with practical advantages over early boosting algorithms. Winner of 2004 ACM Paris Kanellakis Award: For their seminal work and distinguished contributions [...] to the development of the theory and practice of boosting, a general and provably effective method of producing arbitrarily accurate prediction rules by combining weak learning rules ; specifically, for AdaBoost, which can be used to significantly reduce the error of algorithms used in statistical analysis, spam filtering, fraud detection, optical character recognition, and market segmentation, among other applications. 3 / 30 4 / 30
2 AdaBoost Interpretation input Training data {(x i, y i )} n i=1 from X { 1, 1}. 1: initialize D 1 (i) := 1/n for each i = 1, 2,..., n (a probability distribution). 2: for t = 1, 2,..., T do 3: Give D t weighted examples to WL; get back f t : X { 1, 1}. 4: Update weights: z t := n D t (i) y i f t (x i ) [ 1, 1] i=1 α t := 1 2 ln 1 z t 1 z t R (weight of f t ) D t1 (i) := D t (i) exp ( α t y i f t (x i ) ) /Z t for each i = 1, 2,..., n, where Z t > 0 is normalizer that makes D t1 a probability distribution. : end for 6: return Final classifier ˆf(x) := sign α t f t (x). Interpreting z t Suppose (X, Y ) D t. If then z t = P (f(x) = Y ) = 1 2 γ t, n D t (i) y i f(x i ) = 2γ t [ 1, 1]. i=1 z t = 0 random guessing w.r.t. D t. z t > 0 better than random guessing w.r.t. D t. z t < 0 better off using the opposite of f s predictions. (Let sign(z) := 1 if z > 0 and sign(z) := 1 if z 0.) / 30 6 / 30 Interpretation Example: AdaBoost with decision stumps Classifier weights α t = 1 2 ln 1z t 1 z t α t z t Example weights D t1 (i) D t1 (i) D t (i) exp( α t y i f t (x i )). Weak learning algorithm WL: ERM with F = decision stumps on R 2 (i.e., axisaligned threshold functions x sign(vx i t)). Straightforward to handle importance weights in ERM. (Example from Figures 1.1 and 1.2 of Schapire & Freund text.) 7 / 30 8 / 30
3 Example: execution of AdaBoost Example: final classifier from AdaBoost D 1 D 2 D 3 f 1 f 2 f 3 z 1 = 0.40, α 1 = 0.42 z 2 = 0.8, α 2 = 0.6 z 3 = 0.72, α 3 = 0.92 Final classifier ˆf(x) = sign(0.42f 1 (x) 0.6f 2 (x) 0.92f 3 (x)) f 1 f 2 f 3 z 1 = 0.40, α 1 = 0.42 z 2 = 0.8, α 2 = 0.6 z 3 = 0.72, α 3 = 0.92 (Zero training error rate!) 9 / / 30 Empirical results UCI UCI Results Test error rates of C4. and AdaBoost on several classification problems. Each point represents a single classification problem/dataset from UCI repository. Training error rate of final classifier Recall γ t := P (f t (X) = Y ) 1/2 = z t /2 when (X, Y ) D t. C4. C C4. C AdaBooststumps AdaBoostC4. boosting Stumps boosting C4. C4. C4. = popular algorithm for learning decision trees. Training error rate of final classifier from AdaBoost: err( ˆf, {(x i, y i )} n i=1) exp 2 If average γ 2 ( ) := 1 T T γ2 t > 0, then training error rate is exp 2 γ 2 T. AdaBoost = Adaptive Boosting Some γ t could be small, even negative only care about overall average γ 2. What about true error rate? γ 2 t. (Figure 1.3 from Schapire & Freund text.) 11 / / 30
4 Combining classifiers Let F be the function class used by the weak learning algorithm WL. The function class used by AdaBoost is F T := x sign α t f t (x) : f 1, f 2,..., f T F, α 1, α 2,..., α T R i.e., linear combinations of T functions from F. Complexity of F T grows linearly with T. Theoretical guarantee (e.g., when F = decision stumps in R d ): With high probability (over random choice of training sample), err( ˆf) ( ) ( ) exp 2 γ 2 T log d T O. }{{} n }{{} Training error rate Error due to finite sample Theory suggests danger of overfitting when T is very large. Indeed, this does happen sometimes... but often not! A typical run of boosting AdaBoostC4. on letters dataset. Error rate C4. test error AdaBoost test error AdaBoost training error # of rounds T (# nodes across all decision trees in ˆf is > ) Training error rate is zero after just five rounds, but test error rate continues to decrease, even up to 1000 rounds! (Figure 1.7 from Schapire & Freund text) 13 / / 30 Boosting the margin Final classifier from AdaBoost: ( T ) ˆf(x) = sign α tf t (x) T α. t }{{} g(x) [ 1, 1] Call y g(x) [ 1, 1] the margin achieved on example (x, y). New theory [Schapire, Freund, Bartlett, and Lee, 1998]: Larger margins better resistance to overfitting, independent of T. AdaBoost tends to increase margins on training examples. (Similar but not the same as SVM margins.) On letters dataset: T = T = 100 T = 1000 training error rate 0.0% 0.0% 0.0% test error rate 8.4% 3.3% 3.1% % margins % 0.0% 0.0% min. margin Linear classifiers Regard function class F used by weak learning algorithm as feature functions : x φ(x) := ( f(x) : f F ) { 1, 1} F (possibly infinite dimensional!). AdaBoost s final classifier is a linear classifier in { 1, 1} F : ˆf(x) = sign α t f t (x) = sign w f f(x) = sign ( w, φ(x) ) f F where w f := α t 1{f t = f} f F. 1 / / 30
5 Exponential loss AdaBoost is a particular coordinate descent algorithm (similar to but not the same as gradient descent) for 1 min w R F n n exp ( y i w, φ(x i ) ). i=1 exp(x) log(1exp(x)) More on boosting Many variants of boosting: AdaBoost with different loss functions. Boosted decision trees = boosting decision trees. Boosting algorithms for ranking and multiclass. Boosting algorithms that are robust to certain kinds of noise. Boosting for online learning algorithms (very new!).... Many connections between boosting and other subjects: Game theory, online learning Geometry of information (replace 2 2 with relative entropy divergence) Computational complexity / / 30 Application: face detection Face detection Problem: Given an image, locate all of the faces. An application of AdaBoost As a classification problem: Divide up images into patches (at varying scales, e.g., 24 24, 48 48). Learn classifier f : X Y, where Y = {face, not face}. Many other things built on top of face detectors (e.g., face tracking, face recognizers); now in every digital camera and iphoto/picasalike software. Main problem: how to make this very fast. 20 / 30
6 Face detectors via AdaBoost [Viola & Jones, 2001] Face detector architecture by Viola & Jones (2001): major achievement in computer vision; detector actually usable in realtime. Think of each image patch (d dpixel grayscale) as a vector x [0, 1] d2. Used weak learning algorithm that picks linear classifiers f w,t (x) = sign( w, x t), where w has a very particular form: Viola & Jones integral image trick Integral image trick: (0, 0) For every image, precompute (r, c) s(r, c) = sum of pixel values in rectangle from (0, 0) to (r, c) (single pass through image). To compute inner product w, x = average pixel value in black box average pixel value in white box just need to add and subtract a few s(r, c) values. AdaBoost combines many rulesofthumb of this form. Very simple. Extremely fast to evaluate via precomputation. Evaluating rulesofthumb classifiers is extremely fast. 21 / / 30 Viola & Jones cascade architecture Viola & Jones detector: example results Problem: severe class imbalance (most patches don t contain a face). Solution: Train several classifiers (each using AdaBoost), and arrange in a special kind of decision list called a cascade: x f (1) 1 (x) f (2) 1 (x) f (3) (x) Each f (l) is trained (using AdaBoost), adjust threshold (before passing through sign) to minimize false negative rate. Can make f (l) in later stages more complex than in earlier stages, since most examples don t make it to the end. (Cascade) classifier evaluation extremely fast. 23 / / 30
7 Summary Two key points: AdaBoost effectively combines many fasttoevaluate weak classifiers. Cascade structure optimizes speed for common case. Bagging 2 / 30 Bagging Aside: sampling with replacement Question: if n individuals are picked from a population of size n u.a.r. with replacement, what is the probability that a given individual is not picked? Bagging = Bootstrap aggregating (Leo Breiman, 1994). Input: training data {(x i, y i )} n i=1 from X { 1, 1}. For t = 1, 2,..., T : 1. Randomly pick n examples with replacement from training data {(x (t) i, y (t) i )} n i=1 (a bootstrap sample). 2. Run learning algorithm on {(x (t) i, y (t) i )} n i=1 classifier f t. Return a majority vote classifier over f 1, f 2,..., f T. Answer: ( 1 1 ) n n For large n: Implications for bagging: ( lim 1 1 ) n = 1 n n e Each bootstrap sample contains about 63% of the data set. Remaining 37% can be used to estimate error rate of classifier trained on the bootstrap sample. Can average across bootstrap samples to get estimate of bagged classifier s error rate (sort of). 27 / / 30
8 Random Forests Key takeaways Random Forests (Leo Breiman, 2001). Input: training data {(x i, y i )} n i=1 from R d { 1, 1}. For t = 1, 2,..., T : 1. Randomly pick n examples with replacement from training data {(x (t) i, y (t) i )} n i=1 (a bootstrap sample). 2. Run variant of decision tree learning algorithm on {(x (t) i, y (t) i )} n i=1, where each split is chosen by only considering a random subset of d features (rather than all d features) decision tree classifier f t. Return a majority vote classifier over f 1, f 2,..., f T. 1. Theoretical concept of weak and strong learning. 2. AdaBoost algorithm; concept of margins in boosting. 3. Interpreting AdaBoost s final classifier as a linear classifier, and interpreting AdaBoost as a coordinate descent algorithm. 4. Structure of decision lists / cascades.. Concept of bootstrap samples; bagging and random forests. 29 / / 30
Boosting: Foundations and Algorithms. Rob Schapire
Boosting: Foundations and Algorithms Rob Schapire Example: Spam Filtering problem: filter out spam (junk email) gather large collection of examples of spam and nonspam: From: yoav@ucsd.edu Rob, can you
More informationUniversität Potsdam Institut für Informatik Lehrstuhl Maschinelles Lernen. Decision Trees. Tobias Scheffer
Universität Potsdam Institut für Informatik Lehrstuhl Maschinelles Lernen Decision Trees Tobias Scheffer Decision Trees One of many applications: credit risk Employed longer than 3 months Positive credit
More informationBackground. Adaptive Filters and Machine Learning. Bootstrap. Combining models. Boosting and Bagging. Poltayev Rassulzhan
Adaptive Filters and Machine Learning Boosting and Bagging Background Poltayev Rassulzhan rasulzhan@gmail.com Resampling Bootstrap We are using training set and different subsets in order to validate results
More informationEnsemble Methods. Charles Sutton Data Mining and Exploration Spring Friday, 27 January 12
Ensemble Methods Charles Sutton Data Mining and Exploration Spring 2012 Bias and Variance Consider a regression problem Y = f(x)+ N(0, 2 ) With an estimate regression function ˆf, e.g., ˆf(x) =w > x Suppose
More informationMachine Learning Ensemble Learning I Hamid R. Rabiee Jafar Muhammadi, Alireza Ghasemi Spring /
Machine Learning Ensemble Learning I Hamid R. Rabiee Jafar Muhammadi, Alireza Ghasemi Spring 2015 http://ce.sharif.edu/courses/9394/2/ce7171 / Agenda Combining Classifiers Empirical view Theoretical
More informationStatistical Machine Learning from Data
Samy Bengio Statistical Machine Learning from Data 1 Statistical Machine Learning from Data Ensembles Samy Bengio IDIAP Research Institute, Martigny, Switzerland, and Ecole Polytechnique Fédérale de Lausanne
More informationEnsemble learning 11/19/13. The wisdom of the crowds. Chapter 11. Ensemble methods. Ensemble methods
The wisdom of the crowds Ensemble learning Sir Francis Galton discovered in the early 1900s that a collection of educated guesses can add up to very accurate predictions! Chapter 11 The paper in which
More informationData Mining: Concepts and Techniques. (3 rd ed.) Chapter 8. Chapter 8. Classification: Basic Concepts
Data Mining: Concepts and Techniques (3 rd ed.) Chapter 8 1 Chapter 8. Classification: Basic Concepts Classification: Basic Concepts Decision Tree Induction Bayes Classification Methods RuleBased Classification
More informationThe AdaBoost algorithm =1/n for i =1,...,n 1) At the m th iteration we find (any) classifier h(x; ˆθ m ) for which the weighted classification error m
) Set W () i The AdaBoost algorithm =1/n for i =1,...,n 1) At the m th iteration we find (any) classifier h(x; ˆθ m ) for which the weighted classification error m m =.5 1 n W (m 1) i y i h(x i ; 2 ˆθ
More informationCS 231A Section 1: Linear Algebra & Probability Review
CS 231A Section 1: Linear Algebra & Probability Review 1 Topics Support Vector Machines Boosting ViolaJones face detector Linear Algebra Review Notation Operations & Properties Matrix Calculus Probability
More informationCS 231A Section 1: Linear Algebra & Probability Review. Kevin Tang
CS 231A Section 1: Linear Algebra & Probability Review Kevin Tang Kevin Tang Section 11 9/30/2011 Topics Support Vector Machines Boosting Viola Jones face detector Linear Algebra Review Notation Operations
More informationECE 5984: Introduction to Machine Learning
ECE 5984: Introduction to Machine Learning Topics: Ensemble Methods: Bagging, Boosting Readings: Murphy 16.4; Hastie 16 Dhruv Batra Virginia Tech Administrativia HW3 Due: April 14, 11:55pm You will implement
More informationAdaBoost. Lecturer: Authors: Center for Machine Perception Czech Technical University, Prague
AdaBoost Lecturer: Jan Šochman Authors: Jan Šochman, Jiří Matas Center for Machine Perception Czech Technical University, Prague http://cmp.felk.cvut.cz Motivation Presentation 2/17 AdaBoost with trees
More informationAnalysis of the Performance of AdaBoost.M2 for the Simulated DigitRecognitionExample
Analysis of the Performance of AdaBoost.M2 for the Simulated DigitRecognitionExample Günther Eibl and Karl Peter Pfeiffer Institute of Biostatistics, Innsbruck, Austria guenther.eibl@uibk.ac.at Abstract.
More informationWhat makes good ensemble? CS789: Machine Learning and Neural Network. Introduction. More on diversity
What makes good ensemble? CS789: Machine Learning and Neural Network Ensemble methods Jakramate Bootkrajang Department of Computer Science Chiang Mai University 1. A member of the ensemble is accurate.
More informationIntroduction to Machine Learning Lecture 11. Mehryar Mohri Courant Institute and Google Research
Introduction to Machine Learning Lecture 11 Mehryar Mohri Courant Institute and Google Research mohri@cims.nyu.edu Boosting Mehryar Mohri  Introduction to Machine Learning page 2 Boosting Ideas Main idea:
More informationAdvanced Machine Learning
Advanced Machine Learning Deep Boosting MEHRYAR MOHRI MOHRI@ COURANT INSTITUTE & GOOGLE RESEARCH. Outline Model selection. Deep boosting. theory. algorithm. experiments. page 2 Model Selection Problem:
More informationA Decision Stump. Decision Trees, cont. Boosting. Machine Learning 10701/15781 Carlos Guestrin Carnegie Mellon University. October 1 st, 2007
Decision Trees, cont. Boosting Machine Learning 10701/15781 Carlos Guestrin Carnegie Mellon University October 1 st, 2007 1 A Decision Stump 2 1 The final tree 3 Basic Decision Tree Building Summarized
More informationGradient Boosting (Continued)
Gradient Boosting (Continued) David Rosenberg New York University April 4, 2016 David Rosenberg (New York University) DSGA 1003 April 4, 2016 1 / 31 Boosting Fits an Additive Model Boosting Fits an Additive
More informationEnsembles of Classifiers.
Ensembles of Classifiers www.biostat.wisc.edu/~dpage/cs760/ 1 Goals for the lecture you should understand the following concepts ensemble bootstrap sample bagging boosting random forests error correcting
More informationTDT4173 Machine Learning
TDT4173 Machine Learning Lecture 3 Bagging & Boosting + SVMs Norwegian University of Science and Technology Helge Langseth ITVEST 310 helgel@idi.ntnu.no 1 TDT4173 Machine Learning Outline 1 Ensemblemethods
More informationBoos$ng Can we make dumb learners smart?
Boos$ng Can we make dumb learners smart? Aarti Singh Machine Learning 10601 Nov 29, 2011 Slides Courtesy: Carlos Guestrin, Freund & Schapire 1 Why boost weak learners? Goal: Automa'cally categorize type
More informationBoosting. Ryan Tibshirani Data Mining: / April Optional reading: ISL 8.2, ESL , 10.7, 10.13
Boosting Ryan Tibshirani Data Mining: 36462/36662 April 25 2013 Optional reading: ISL 8.2, ESL 10.1 10.4, 10.7, 10.13 1 Reminder: classification trees Suppose that we are given training data (x i, y
More informationBoosting Methods for Regression
Machine Learning, 47, 153 00, 00 c 00 Kluwer Academic Publishers. Manufactured in The Netherlands. Boosting Methods for Regression NIGEL DUFFY nigeduff@cse.ucsc.edu DAVID HELMBOLD dph@cse.ucsc.edu Computer
More informationPart I Week 7 Based in part on slides from textbook, slides of Susan Holmes
Part I Week 7 Based in part on slides from textbook, slides of Susan Holmes Support Vector Machine, Random Forests, Boosting December 2, 2012 1 / 1 2 / 1 Neural networks Artificial Neural networks: Networks
More informationData Mining und Maschinelles Lernen
Data Mining und Maschinelles Lernen Ensemble Methods BiasVariance Tradeoff Basic Idea of Ensembles Bagging Basic Algorithm Bagging with Costs Randomization Random Forests Boosting Stacking ErrorCorrecting
More informationUniversität Potsdam Institut für Informatik Lehrstuhl Maschinelles Lernen. Linear Classifiers. Blaine Nelson, Tobias Scheffer
Universität Potsdam Institut für Informatik Lehrstuhl Linear Classifiers Blaine Nelson, Tobias Scheffer Contents Classification Problem Bayesian Classifier Decision Linear Classifiers, MAP Models Logistic
More informationBagging. Ryan Tibshirani Data Mining: / April Optional reading: ISL 8.2, ESL 8.7
Bagging Ryan Tibshirani Data Mining: 36462/36662 April 23 2013 Optional reading: ISL 8.2, ESL 8.7 1 Reminder: classification trees Our task is to predict the class label y {1,... K} given a feature vector
More informationChapter 18. Decision Trees and Ensemble Learning. Recall: Learning Decision Trees
CSE 473 Chapter 18 Decision Trees and Ensemble Learning Recall: Learning Decision Trees Example: When should I wait for a table at a restaurant? Attributes (features) relevant to Wait? decision: 1. Alternate:
More informationIntroduction and Overview
1 Introduction and Overview How is it that a committee of blockheads can somehow arrive at highly reasoned decisions, despite the weak judgment of the individual members? How can the shaky separate views
More informationthe tree till a class assignment is reached
Decision Trees Decision Tree for Playing Tennis Prediction is done by sending the example down Prediction is done by sending the example down the tree till a class assignment is reached Definitions Internal
More informationMachine Learning Lecture 7
Course Outline Machine Learning Lecture 7 Fundamentals (2 weeks) Bayes Decision Theory Probability Density Estimation Statistical Learning Theory 23.05.2016 Discriminative Approaches (5 weeks) Linear Discriminant
More informationLeast Squares Classification
Least Squares Classification Stephen Boyd EE103 Stanford University November 4, 2017 Outline Classification Least squares classification Multiclass classifiers Classification 2 Classification data fitting
More informationChemometrics and Intelligent Laboratory Systems
Chemometrics and Intelligent Laboratory Systems 100 (2010) 1 11 Contents lists available at ScienceDirect Chemometrics and Intelligent Laboratory Systems journal homepage: www.elsevier.com/locate/chemolab
More informationA Gentle Introduction to Gradient Boosting. Cheng Li College of Computer and Information Science Northeastern University
A Gentle Introduction to Gradient Boosting Cheng Li chengli@ccs.neu.edu College of Computer and Information Science Northeastern University Gradient Boosting a powerful machine learning algorithm it can
More informationFinal Overview. Introduction to ML. Marek Petrik 4/25/2017
Final Overview Introduction to ML Marek Petrik 4/25/2017 This Course: Introduction to Machine Learning Build a foundation for practice and research in ML Basic machine learning concepts: max likelihood,
More informationMachine Learning Lecture 12
Machine Learning Lecture 12 Neural Networks 30.11.2017 Bastian Leibe RWTH Aachen http://www.vision.rwthaachen.de leibe@vision.rwthaachen.de Course Outline Fundamentals Bayes Decision Theory Probability
More informationLinear Classifiers and the Perceptron
Linear Classifiers and the Perceptron William Cohen February 4, 2008 1 Linear classifiers Let s assume that every instance is an ndimensional vector of real numbers x R n, and there are only two possible
More informationScaleInvariance of Support Vector Machines based on the Triangular Kernel. Abstract
ScaleInvariance of Support Vector Machines based on the Triangular Kernel François Fleuret Hichem Sahbi IMEDIA Research Group INRIA Domaine de Voluceau 78150 Le Chesnay, France Abstract This paper focuses
More informationTDT4173 Machine Learning
TDT4173 Machine Learning Lecture 9 Learning Classifiers: Bagging & Boosting Norwegian University of Science and Technology Helge Langseth ITVEST 310 helgel@idi.ntnu.no 1 TDT4173 Machine Learning Outline
More informationInfinite Ensemble Learning with Support Vector Machinery
Infinite Ensemble Learning with Support Vector Machinery HsuanTien Lin and Ling Li Learning Systems Group, California Institute of Technology ECML/PKDD, October 4, 2005 H.T. Lin and L. Li (Learning Systems
More informationGradient Boosting, Continued
Gradient Boosting, Continued David Rosenberg New York University December 26, 2016 David Rosenberg (New York University) DSGA 1003 December 26, 2016 1 / 16 Review: Gradient Boosting Review: Gradient Boosting
More informationMIDTERM SOLUTIONS: FALL 2012 CS 6375 INSTRUCTOR: VIBHAV GOGATE
MIDTERM SOLUTIONS: FALL 2012 CS 6375 INSTRUCTOR: VIBHAV GOGATE March 28, 2012 The exam is closed book. You are allowed a double sided one page cheat sheet. Answer the questions in the spaces provided on
More informationClassification Ensemble That Maximizes the Area Under Receiver Operating Characteristic Curve (AUC)
Classification Ensemble That Maximizes the Area Under Receiver Operating Characteristic Curve (AUC) Eunsik Park 1 and Yc Ivan Chang 2 1 Chonnam National University, Gwangju, Korea 2 Academia Sinica, Taipei,
More informationUNIVERSITY of PENNSYLVANIA CIS 520: Machine Learning Final, Fall 2013
UNIVERSITY of PENNSYLVANIA CIS 520: Machine Learning Final, Fall 2013 Exam policy: This exam allows two onepage, twosided cheat sheets; No other materials. Time: 2 hours. Be sure to write your name and
More information2D Image Processing Face Detection and Recognition
2D Image Processing Face Detection and Recognition Prof. Didier Stricker Kaiserlautern University http://ags.cs.unikl.de/ DFKI Deutsches Forschungszentrum für Künstliche Intelligenz http://av.dfki.de
More informationCS4495/6495 Introduction to Computer Vision. 8CL3 Support Vector Machines
CS4495/6495 Introduction to Computer Vision 8CL3 Support Vector Machines Discriminative classifiers Discriminative classifiers find a division (surface) in feature space that separates the classes Several
More informationIntroduction to Boosting and Joint Boosting
Introduction to Boosting and Learning Systems Group, Caltech 2005/04/26, Presentation in EE150 Boosting and Outline Introduction to Boosting 1 Introduction to Boosting Intuition of Boosting Adaptive Boosting
More informationRandom Classification Noise Defeats All Convex Potential Boosters
Philip M. Long Google Rocco A. Servedio Computer Science Department, Columbia University Abstract A broad class of boosting algorithms can be interpreted as performing coordinatewise gradient descent
More informationLecture 4. 1 Learning NonLinear Classifiers. 2 The Kernel Trick. CS621 Theory Gems September 27, 2012
CS62 Theory Gems September 27, 22 Lecture 4 Lecturer: Aleksander Mądry Scribes: Alhussein Fawzi Learning NonLinear Classifiers In the previous lectures, we have focused on finding linear classifiers,
More informationSupport Vector Machines (SVM) in bioinformatics. Day 1: Introduction to SVM
1 Support Vector Machines (SVM) in bioinformatics Day 1: Introduction to SVM JeanPhilippe Vert Bioinformatics Center, Kyoto University, Japan JeanPhilippe.Vert@mines.org Human Genome Center, University
More information1 Handling of Continuous Attributes in C4.5. Algorithm
.. Spring 2009 CSC 466: Knowledge Discovery from Data Alexander Dekhtyar.. Data Mining: Classification/Supervised Learning Potpourri Contents 1. C4.5. and continuous attributes: incorporating continuous
More informationAdaptive Boosting of Neural Networks for Character Recognition
Adaptive Boosting of Neural Networks for Character Recognition Holger Schwenk Yoshua Bengio Dept. Informatique et Recherche Opérationnelle Université de Montréal, Montreal, Qc H3C3J7, Canada fschwenk,bengioyg@iro.umontreal.ca
More informationBoosting Methods: Why They Can Be Useful for HighDimensional Data
New URL: http://www.rproject.org/conferences/dsc2003/ Proceedings of the 3rd International Workshop on Distributed Statistical Computing (DSC 2003) March 20 22, Vienna, Austria ISSN 1609395X Kurt Hornik,
More informationPCA FACE RECOGNITION
PCA FACE RECOGNITION The slides are from several sources through James Hays (Brown); Srinivasa Narasimhan (CMU); Silvio Savarese (U. of Michigan); Shree Nayar (Columbia) including their own slides. Goal
More informationProbabilistic Random Forests: Predicting Data Point Specific Misclassification Probabilities ; CU CS
University of Colorado, Boulder CU Scholar Computer Science Technical Reports Computer Science Spring 5123 Probabilistic Random Forests: Predicting Data Point Specific Misclassification Probabilities
More informationCross Validation & Ensembling
Cross Validation & Ensembling ShanHung Wu shwu@cs.nthu.edu.tw Department of Computer Science, National Tsing Hua University, Taiwan Machine Learning ShanHung Wu (CS, NTHU) CV & Ensembling Machine Learning
More informationUNIVERSITY of PENNSYLVANIA CIS 520: Machine Learning Final, Fall 2014
UNIVERSITY of PENNSYLVANIA CIS 520: Machine Learning Final, Fall 2014 Exam policy: This exam allows two onepage, twosided cheat sheets (i.e. 4 sides); No other materials. Time: 2 hours. Be sure to write
More informationGeneralized Boosted Models: A guide to the gbm package
Generalized Boosted Models: A guide to the gbm package Greg Ridgeway May 23, 202 Boosting takes on various forms th different programs using different loss functions, different base models, and different
More informationIntroduction to Machine Learning
1, DATA11002 Introduction to Machine Learning Lecturer: Teemu Roos TAs: Ville Hyvönen and Janne Leppäaho Department of Computer Science University of Helsinki (based in part on material by Patrik Hoyer
More informationCOS 429: COMPUTER VISON Face Recognition
COS 429: COMPUTER VISON Face Recognition Intro to recognition PCA and Eigenfaces LDA and Fisherfaces Face detection: Viola & Jones (Optional) generic object models for faces: the Constellation Model Reading:
More informationPredicting the Probability of Correct Classification
Predicting the Probability of Correct Classification Gregory Z. Grudic Department of Computer Science University of Colorado, Boulder grudic@cs.colorado.edu Abstract We propose a formulation for binary
More informationLearning from Examples
Learning from Examples Data fitting Decision trees Cross validation Computational learning theory Linear classifiers Neural networks Nonparametric methods: nearest neighbor Support vector machines Ensemble
More informationTutorial: PART 1. Online Convex Optimization, A Game Theoretic Approach to Learning.
Tutorial: PART 1 Online Convex Optimization, A Game Theoretic Approach to Learning http://www.cs.princeton.edu/~ehazan/tutorial/tutorial.htm Elad Hazan Princeton University Satyen Kale Yahoo Research
More informationMargin Maximizing Loss Functions
Margin Maximizing Loss Functions Saharon Rosset, Ji Zhu and Trevor Hastie Department of Statistics Stanford University Stanford, CA, 94305 saharon, jzhu, hastie@stat.stanford.edu Abstract Margin maximizing
More informationMonte Carlo Theory as an Explanation of Bagging and Boosting
Monte Carlo Theory as an Explanation of Bagging and Boosting Roberto Esposito Università di Torino C.so Svizzera 185, Torino, Italy esposito@di.unito.it Lorenza Saitta Università del Piemonte Orientale
More informationDiversity Regularized Machine
Diversity Regularized Machine Yang Yu and YuFeng Li and ZhiHua Zhou National Key Laboratory for Novel Software Technology Nanjing University, Nanjing 10093, China {yuy,liyf,zhouzh}@lamda.nju.edu.cn Abstract
More informationAn adaptive version of the boost by majority algorithm
An adaptive version of the boost by majority algorithm Yoav Freund AT&T Labs 180 Park Avenue Florham Park NJ 092 USA October 2 2000 Abstract We propose a new boosting algorithm This boosting algorithm
More informationMidterm, Fall 2003
578 Midterm, Fall 2003 YOUR ANDREW USERID IN CAPITAL LETTERS: YOUR NAME: There are 9 questions. The ninth may be more timeconsuming and is worth only three points, so do not attempt 9 unless you are
More informationSupport Vector Machines and Kernel Methods
Support Vector Machines and Kernel Methods Geoff Gordon ggordon@cs.cmu.edu July 10, 2003 Overview Why do people care about SVMs? Classification problems SVMs often produce good results over a wide range
More informationDoes Unlabeled Data Help?
Does Unlabeled Data Help? Worstcase Analysis of the Sample Complexity of Semisupervised Learning. BenDavid, Lu and Pal; COLT, 2008. Presentation by Ashish Rastogi Courant Machine Learning Seminar. Outline
More informationNeural Networks with Applications to Vision and Language. Feedforward Networks. Marco Kuhlmann
Neural Networks with Applications to Vision and Language Feedforward Networks Marco Kuhlmann Feedforward networks Linear separability x 2 x 2 0 1 0 1 0 0 x 1 1 0 x 1 linearly separable not linearly separable
More informationMidterm: CS 6375 Spring 2015 Solutions
Midterm: CS 6375 Spring 2015 Solutions The exam is closed book. You are allowed a onepage cheat sheet. Answer the questions in the spaces provided on the question sheets. If you run out of room for an
More informationModeling Data with Linear Combinations of Basis Functions. Read Chapter 3 in the text by Bishop
Modeling Data with Linear Combinations of Basis Functions Read Chapter 3 in the text by Bishop A Type of Supervised Learning Problem We want to model data (x 1, t 1 ),..., (x N, t N ), where x i is a vector
More informationIntroduction to Machine Learning Midterm Exam Solutions
10701 Introduction to Machine Learning Midterm Exam Solutions Instructors: Eric Xing, Ziv BarJoseph 17 November, 2015 There are 11 questions, for a total of 100 points. This exam is open book, open notes,
More informationMultivariate Analysis Techniques in HEP
Multivariate Analysis Techniques in HEP Jan Therhaag IKTP Institutsseminar, Dresden, January 31 st 2013 Multivariate analysis in a nutshell Neural networks: Defeating the black box Boosted Decision Trees:
More informationCIS 520: Machine Learning Oct 09, Kernel Methods
CIS 520: Machine Learning Oct 09, 207 Kernel Methods Lecturer: Shivani Agarwal Disclaimer: These notes are designed to be a supplement to the lecture They may or may not cover all the material discussed
More informationLecture 14 : Online Learning, Stochastic Gradient Descent, Perceptron
CS446: Machine Learning, Fall 2017 Lecture 14 : Online Learning, Stochastic Gradient Descent, Perceptron Lecturer: Sanmi Koyejo Scribe: Ke Wang, Oct. 24th, 2017 Agenda Recap: SVM and Hinge loss, Representer
More informationRegularization. CSCE 970 Lecture 3: Regularization. Stephen Scott and Vinod Variyam. Introduction. Outline
Other Measures 1 / 52 sscott@cse.unl.edu learning can generally be distilled to an optimization problem Choose a classifier (function, hypothesis) from a set of functions that minimizes an objective function
More informationA Bias Correction for the Minimum Error Rate in Crossvalidation
A Bias Correction for the Minimum Error Rate in Crossvalidation Ryan J. Tibshirani Robert Tibshirani Abstract Tuning parameters in supervised learning problems are often estimated by crossvalidation.
More informationNONLINEAR CLASSIFICATION AND REGRESSION. J. Elder CSE 4404/5327 Introduction to Machine Learning and Pattern Recognition
NONLINEAR CLASSIFICATION AND REGRESSION Nonlinear Classification and Regression: Outline 2 MultiLayer Perceptrons The BackPropagation Learning Algorithm Generalized Linear Models Radial Basis Function
More informationArtificial Neural Networks Examination, June 2005
Artificial Neural Networks Examination, June 2005 Instructions There are SIXTY questions. (The pass mark is 30 out of 60). For each question, please select a maximum of ONE of the given answers (either
More informationFoundations of Machine Learning
Introduction to ML Mehryar Mohri Courant Institute and Google Research mohri@cims.nyu.edu page 1 Logistics Prerequisites: basics in linear algebra, probability, and analysis of algorithms. Workload: about
More informationImproved Boosting Algorithms Using Confidencerated Predictions
Improved Boosting Algorithms Using Confidencerated Predictions schapire@research.att.com AT&T Labs, Shannon Laboratory, 18 Park Avenue, Room A79, Florham Park, NJ 793971 singer@research.att.com AT&T
More informationMidterm Exam Solutions, Spring 2007
171 Midterm Exam Solutions, Spring 7 1. Personal info: Name: Andrew account: Email address:. There should be 16 numbered pages in this exam (including this cover sheet). 3. You can use any material you
More informationClassification: The rest of the story
U NIVERSITY OF ILLINOIS AT URBANACHAMPAIGN CS598 Machine Learning for Signal Processing Classification: The rest of the story 3 October 2017 Today s lecture Important things we haven t covered yet Fisher
More informationClassification and Regression Trees
Classification and Regression Trees Ryan P Adams So far, we have primarily examined linear classifiers and regressors, and considered several different ways to train them When we ve found the linearity
More information15388/688  Practical Data Science: Nonlinear modeling, crossvalidation, regularization, and evaluation
15388/688  Practical Data Science: Nonlinear modeling, crossvalidation, regularization, and evaluation J. Zico Kolter Carnegie Mellon University Fall 2016 1 Outline Example: return to peak demand prediction
More informationFoundations of Machine Learning OnLine Learning. Mehryar Mohri Courant Institute and Google Research
Foundations of Machine Learning OnLine Learning Mehryar Mohri Courant Institute and Google Research mohri@cims.nyu.edu Motivation PAC learning: distribution fixed over time (training and test). IID assumption.
More informationEmpirical Risk Minimization Algorithms
Empirical Risk Minimization Algorithms Tirgul 2 Part I November 2016 Reminder Domain set, X : the set of objects that we wish to label. Label set, Y : the set of possible labels. A prediction rule, h:
More informationIntroduction to Logistic Regression and Support Vector Machine
Introduction to Logistic Regression and Support Vector Machine guest lecturer: MingWei Chang CS 446 Fall, 2009 () / 25 Fall, 2009 / 25 Before we start () 2 / 25 Fall, 2009 2 / 25 Before we start Feel
More informationLecture 2. Judging the Performance of Classifiers. Nitin R. Patel
Lecture 2 Judging the Performance of Classifiers Nitin R. Patel 1 In this note we will examine the question of how to udge the usefulness of a classifier and how to compare different classifiers. Not only
More informationImportance Sampling: An Alternative View of Ensemble Learning. Jerome H. Friedman Bogdan Popescu Stanford University
Importance Sampling: An Alternative View of Ensemble Learning Jerome H. Friedman Bogdan Popescu Stanford University 1 PREDICTIVE LEARNING Given data: {z i } N 1 = {y i, x i } N 1 q(z) y = output or response
More informationConvex envelopes, cardinality constrained optimization and LASSO. An application in supervised learning: support vector machines (SVMs)
ORF 523 Lecture 8 Princeton University Instructor: A.A. Ahmadi Scribe: G. Hall Any typos should be emailed to a a a@princeton.edu. 1 Outline Convexitypreserving operations Convex envelopes, cardinality
More informationInfinite Ensemble Learning with Support Vector Machines
Infinite Ensemble Learning with Support Vector Machines Thesis by HsuanTien Lin In Partial Fulfillment of the Requirements for the Degree of Master of Science California Institute of Technology Pasadena,
More informationSupport vector machines Lecture 4
Support vector machines Lecture 4 David Sontag New York University Slides adapted from Luke Zettlemoyer, Vibhav Gogate, and Carlos Guestrin Q: What does the Perceptron mistake bound tell us? Theorem: The
More informationMachine Learning. VC Dimension and Model Complexity. Eric Xing , Fall 2015
Machine Learning 10701, Fall 2015 VC Dimension and Model Complexity Eric Xing Lecture 16, November 3, 2015 Reading: Chap. 7 T.M book, and outline material Eric Xing @ CMU, 20062015 1 Last time: PAC and
More informationSelf bounding learning algorithms
Self bounding learning algorithms Yoav Freund AT&T Labs 180 Park Avenue Florham Park, NJ 079320971 USA yoav@research.att.com January 17, 2000 Abstract Most of the work which attempts to give bounds on
More informationENSEMBLES OF DECISION RULES
ENSEMBLES OF DECISION RULES Jerzy BŁASZCZYŃSKI, Krzysztof DEMBCZYŃSKI, Wojciech KOTŁOWSKI, Roman SŁOWIŃSKI, Marcin SZELĄG Abstract. In most approaches to ensemble methods, base classifiers are decision
More informationABCBoost: Adaptive Base Class Boost for Multiclass Classification
ABCBoost: Adaptive Base Class Boost for Multiclass Classification Ping Li Department of Statistical Science, Cornell University, Ithaca, NY 14853 USA pingli@cornell.edu Abstract We propose boost (adaptive
More information