Computational and Statistical Learning Theory
1 Computational and Statistical Learning Theory TTIC 31120 Prof. Nati Srebro Lecture 8: Boosting (and Compression Schemes)
2 Boosting the Error
If we have an efficient learning algorithm that, for any distribution realizable by H_n, returns a predictor guaranteed to be slightly better than chance, i.e. with error ε_0 = 1/2 − γ < 1/2 (for some γ > 0), can we construct a new efficient algorithm that, for any distribution realizable by H_n, achieves arbitrarily low error ε?
Posed (as a theoretical question) by Valiant and Kearns, Harvard 1988
Solved by MIT student Robert Schapire, 1990
AdaBoost algorithm by Schapire and Yoav Freund, AT&T 1995
Leslie Valiant, Michael Kearns, Rob Schapire, Yoav Freund
3 AdaBoost
Input: training set S = {(x_1,y_1), (x_2,y_2), ..., (x_m,y_m)}
Weak learner A, which will be applied to distributions D over S
  If thinking of A(S') as accepting a sample S': each (x,y) ∈ S' is set to (x_i,y_i) w.p. D_i (independently and with replacement)
  Can often think of A as operating on a weighted sample, with weights D_i
Output: hypothesis h
Initialize D^1 = (1/m, 1/m, ..., 1/m)
For t = 1, ..., T:
  h_t = A(D^t)
  ε_t = L_{D^t}(h_t) = Σ_i D_i^t · 1[h_t(x_i) ≠ y_i]
  α_t = (1/2) log((1 − ε_t)/ε_t)
  D_i^{t+1} = D_i^t exp(−α_t y_i h_t(x_i)) / Σ_j D_j^t exp(−α_t y_j h_t(x_j))
Output: h_T(x) = sign(Σ_{t=1}^T α_t h_t(x))
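To make the update concrete, here is a minimal sketch in Python of the algorithm above, with a brute-force decision stump as the weak learner A; the names (stump_weak_learner, adaboost) are illustrative, not from the lecture, and labels y are assumed to be in {−1, +1}.

import numpy as np

def stump_weak_learner(X, y, D):
    # Weak learner A(D^t): search for the decision stump with the smallest
    # weighted error under the distribution D over the training points.
    best = None
    for j in range(X.shape[1]):                 # feature to threshold on
        for thr in np.unique(X[:, j]):          # candidate threshold
            for sgn in (+1, -1):                # polarity of the stump
                pred = sgn * np.where(X[:, j] > thr, 1, -1)
                err = D[pred != y].sum()        # eps_t = L_{D^t}(h_t)
                if best is None or err < best[0]:
                    best = (err, j, thr, sgn)
    err, j, thr, sgn = best
    return err, (lambda Z: sgn * np.where(Z[:, j] > thr, 1, -1))

def adaboost(X, y, T):
    m = len(y)
    D = np.full(m, 1.0 / m)                     # D^1 = (1/m, ..., 1/m)
    hs, alphas = [], []
    for t in range(T):
        eps, h = stump_weak_learner(X, y, D)
        eps = np.clip(eps, 1e-12, 1 - 1e-12)    # guard against eps_t = 0
        alpha = 0.5 * np.log((1 - eps) / eps)   # alpha_t
        D = D * np.exp(-alpha * y * h(X))       # up-weight errors, down-weight correct points
        D = D / D.sum()                         # normalize by Z_t
        hs.append(h); alphas.append(alpha)
    return lambda Z: np.sign(sum(a * h(Z) for a, h in zip(alphas, hs)))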
4 AdaBoost: Weight Update
(Algorithm as above; output h_T(x) = sign(Σ_{t=1}^T α_t h_t(x)).)
Increase weight on errors, decrease on correct points:
  D_i^{t+1} ∝ D_i^t · √((1 − ε_t)/ε_t)  if h_t(x_i) ≠ y_i
  D_i^{t+1} ∝ D_i^t · √(ε_t/(1 − ε_t))  if h_t(x_i) = y_i
5 AdaBoost: Weight Update
D_i^{t+1} = D_i^t exp(−α_t y_i h_t(x_i)) / Z_t
  = (1/Z_t) · D_i^t √((1 − ε_t)/ε_t) = D_i^t / (2ε_t)        if h_t(x_i) ≠ y_i
  = (1/Z_t) · D_i^t √(ε_t/(1 − ε_t)) = D_i^t / (2(1 − ε_t))  if h_t(x_i) = y_i
Z_t = Σ_{h_t(x_i)≠y_i} D_i^t √((1 − ε_t)/ε_t) + Σ_{h_t(x_i)=y_i} D_i^t √(ε_t/(1 − ε_t))
    = ε_t √((1 − ε_t)/ε_t) + (1 − ε_t) √(ε_t/(1 − ε_t)) = 2√(ε_t(1 − ε_t))
L_{D^{t+1}}(h_t) = Σ_{h_t(x_i)≠y_i} D_i^{t+1} = Σ_{h_t(x_i)≠y_i} D_i^t / (2ε_t) = ε_t / (2ε_t) = 1/2
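A quick numeric sanity check of this calculation, reusing stump_weak_learner from the sketch above (the synthetic data is illustrative):

import numpy as np
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2)); y = np.sign(X[:, 0] + X[:, 1])

D = np.full(200, 1.0 / 200)
eps, h = stump_weak_learner(X, y, D)
alpha = 0.5 * np.log((1 - eps) / eps)
D2 = D * np.exp(-alpha * y * h(X)); D2 = D2 / D2.sum()
print(D2[h(X) != y].sum())   # L_{D^2}(h_1) = 0.5 exactly (up to float precision)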
6 AdaBoost: Weight Update
(Same algorithm and update as above.) After reweighting, the current hypothesis is exactly uninformative on the new distribution:
  L_{D^{t+1}}(h_t) = 1/2
so the weak learner cannot simply return h_t again at the next round.
7 AdaBoost as Learning a Linear Classifier
Recall: h_T(x) = sign(Σ_{t=1}^T α_t h_t(x))
Let B = all hypotheses output by A ("base class", e.g. decision stumps), and φ(x) = (h(x))_{h∈B}
w[h] = Σ_{t: h_t = h} α_t
h_T ∈ { sign(⟨w, φ(x)⟩) | w ∈ R^B } — the class of halfspaces L_B
Exponential loss: L_S^exp(w) = (1/m) Σ_i ℓ^exp(h_w(x_i); y_i), where h_w(x) = ⟨w, φ(x)⟩ and ℓ^exp(h_w(x), y) = e^{−y⟨w,φ(x)⟩}
Each step of AdaBoost = coordinate descent on L_S^exp(w):
  choose the coordinate h of φ(x) s.t. ∂L_S^exp(w)/∂w[h] is very negative
  update w[h] = arg min L_S^exp(w), keeping w[h'] unchanged for all h' ≠ h
8 Coordinate Descent on L_S^exp(w)
∂L_S^exp(w)/∂w[h] = ∂/∂w[h] (1/m) Σ_i e^{−y_i h_w(x_i)} = −(1/m) Σ_i e^{−y_i h_w(x_i)} y_i h(x_i)
With the combination so far, h_w = Σ_{t=1}^{T−1} α_t h_t, and since ∏_{t=1}^{T−1} e^{−y_i α_t h_t(x_i)} ∝ D_i^T:
  ∂L_S^exp(w)/∂w[h] ∝ −Σ_i D_i^T y_i h(x_i) = 2 L_{D^T}(h) − 1
Minimizing L_{D^T}(h) ⇔ minimizing ∂L_S^exp(w)/∂w[h] (making it very negative)
Updating w[h]: set w^t[h_t] = w^{t−1}[h_t] + α with α = arg min_α L_S^exp(w^t); at the minimum
  0 = ∂/∂α L_S^exp(w^t) = ∂L_S^exp(w^t)/∂w[h_t] ∝ 2 L_{D^{t+1}}(h_t) − 1
⇒ choose α s.t. L_{D^{t+1}}(h_t) = 1/2, which is exactly α_t = (1/2) log((1 − ε_t)/ε_t)
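As a sketch of this equivalence, one can check numerically that the closed-form α_t coincides with an exact line search on the exponential loss; here D and agree are synthetic stand-ins for D^t and the agreements y_i·h_t(x_i), not lecture code:

import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(1)
m = 50
D = rng.dirichlet(np.ones(m))            # some distribution D^t over examples
agree = rng.choice([1, -1], size=m)      # y_i * h_t(x_i) for a hypothetical stump
eps = D[agree == -1].sum()               # weighted error eps_t

closed_form = 0.5 * np.log((1 - eps) / eps)
line_search = minimize_scalar(lambda a: np.sum(D * np.exp(-a * agree))).x
print(closed_form, line_search)          # agree to numerical precision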
9 AdaBoost: Minimizing L_S(h)
Theorem: If ε_t ≤ 1/2 − γ for all t, then L_S^{01}(h_T) ≤ L_S^exp(h_T) ≤ e^{−2γ²T}
Proof: unrolling the update, D_i^{T+1} = (1/m) ∏_{t=1}^T e^{−y_i α_t h_t(x_i)} / Z_t, so
  L_S^exp(h_T) = (1/m) Σ_i e^{−y_i Σ_{t=1}^T α_t h_t(x_i)} = (Σ_i D_i^{T+1}) · ∏_{t=1}^T Z_t = ∏_{t=1}^T Z_t
  ∏_{t=1}^T Z_t = ∏_t 2√(ε_t(1 − ε_t)) ≤ (2√((1/2 − γ)(1/2 + γ)))^T = (1 − 4γ²)^{T/2} ≤ e^{−2γ²T}
If A(·) is a weak learner with (δ_0, ε_0 = 1/2 − γ), and if L_D(h) = 0 for some h ∈ H:
  L_S(h) = 0 ⇒ L_{D^t}(h) = 0 (each D^t is supported on S), so w.p. ≥ 1 − δ_0 each round, ε_t ≤ 1/2 − γ
  hence w.p. ≥ 1 − δ_0·T, L_S(h_T) ≤ e^{−2γ²T}
To get any ε > 0, run AdaBoost for T = log(1/ε) / (2γ²) rounds
Setting ε = 1/(2m): after T = log(2m) / (2γ²) rounds, L_S(h_T) = 0! (an empirical error below 1/m must be exactly 0)
What about L_D(h_T)?
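As a worked instance of the bound, with illustrative numbers: for edge γ = 0.1 and m = 1000 training examples, T = log(2m)/(2γ²) = log(2000)/0.02 ≈ 380 boosting rounds already guarantee L_S(h_T) = 0 (with probability ≥ 1 − 380·δ_0).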
10 Sparse Linear Classifiers
Recall: h_T(x) = sign(Σ_{t=1}^T α_t h_t(x))
Let B = all hypotheses output by A ("base class", e.g. decision stumps)
h_T ∈ { sign(⟨w, φ(x)⟩) | w ∈ R^B, ||w||_0 ≤ T } ⊆ class of halfspaces L_B
11 Sparse Linear Classifiers
Recall: h_T(x) = sign(Σ_{t=1}^T α_t h_t(x))
Let B = all hypotheses output by A ("base class", e.g. decision stumps)
h_T ∈ { sign(⟨w, φ(x)⟩) | w ∈ R^B, ||w||_0 ≤ T } — class of sparse halfspaces L_{B,T}
You already saw: VCdim(L_{B,T}) ≤ O(T log |B|)
Even if B is infinite (e.g. in the case of decision stumps): VCdim(L_{B,T}) ≤ Õ(T · VCdim(B))
Sample complexity: m = O((log m / γ²) · VCdim(B) / ε) = Õ(VCdim(B) / (γ² ε))
But what if the weak learner is improper and VCdim(B) = ∞?
12 Compression Bounds
Focus on the realizable case, and on learning rules s.t. L_S(A(S)) = 0
Suppose A(S) depends only on the first r < m examples: A(x_1,y_1, ..., x_m,y_m) = Ã(x_1,y_1, ..., x_r,y_r)
If L_{S[r+1:m]}(Ã(S[1:r])) = 0, then w.p. ≥ 1 − δ over S ~ D^m:
  L_D(A(S)) ≤ log(1/δ) / (m − r)
In fact, the same holds for any predetermined indices i_1, ..., i_r, if A(S) depends only on (x_{i_1},y_{i_1}), ..., (x_{i_r},y_{i_r})
Now consider A(S) = Ã(S[I(S)]) with I: (X×Y)^m → {1..m}^r. That is, A(S) can be represented using r training points, but we need to choose which ones. Taking a union bound over the m^r choices of indices:
  L_D(A(S)) ≤ (r log m + log(1/δ)) / (m − r)
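As a worked instance of the union-bound version (with illustrative numbers): for r = 4, m = 10000 and δ = 0.01, the bound gives L_D(A(S)) ≤ (4 log 10000 + log 100) / 9996 ≈ (36.8 + 4.6) / 9996 ≈ 0.004.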
13 Compression Schemes
A is r-compressing if A(S) = Ã(S[I(S)]) for some I: (X×Y)^m → {1..m}^r
Axis-aligned rectangles: I(S) = {leftmost positive, rightmost positive, topmost positive, bottommost positive}, so r = 4
Halfspaces in R^d: a bit trickier, but can be done with r = d + 1 (for non-homogeneous halfspaces)
If A is r-compressing and L_S(A(S)) = 0, then for m > 2r, w.p. ≥ 1 − δ over S ~ D^m:
  L_D(A(S)) ≤ 2 (r log m + log(1/δ)) / m
By the VC lower bound: FINDCONS_H is r-compressing ⇒ VCdim(H) ≤ O(r); in fact, VCdim(H) ≤ r
Conjecture (Manfred Warmuth): every H has a VCdim(H)-compressing FINDCONS_H
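A sketch of the rectangle compression scheme in Python (the function names and the synthetic data are illustrative): the compression function I keeps four extreme positive points, and the reconstruction Ã returns the smallest axis-aligned rectangle they span.

import numpy as np

def compress_rectangle(X, y):
    # I(S): indices of the leftmost, rightmost, bottommost, topmost positives (r = 4)
    pos = np.where(y == 1)[0]
    return [pos[np.argmin(X[pos, 0])], pos[np.argmax(X[pos, 0])],
            pos[np.argmin(X[pos, 1])], pos[np.argmax(X[pos, 1])]]

def reconstruct_rectangle(points):
    # A~(S[I(S)]): smallest axis-aligned rectangle containing the kept points
    lo, hi = points.min(axis=0), points.max(axis=0)
    return lambda Z: np.where(np.all((Z >= lo) & (Z <= hi), axis=1), 1, -1)

# If S is realizable by an axis-aligned rectangle, the reconstruction is
# consistent with all of S: it contains every positive (the kept points are
# extreme among positives) and lies inside the true rectangle, so it
# excludes every negative.
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(100, 2))
y = np.where(np.all(np.abs(X) <= 0.5, axis=1), 1, -1)   # true rectangle [-.5,.5]^2
h = reconstruct_rectangle(X[compress_rectangle(X, y)])
assert np.all(h(X) == y)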
14 Back to Boosting
A(S) is an (ε_0 = 1/2 − γ, δ_0) weak learner that uses m_0 samples.
Boost the confidence to get a (1/2 − γ/2, δ) learner that uses
  m_1(δ) = O( m_0 log(1/δ)/log(1/δ_0) + log(1/δ) log log(1/δ_0) / γ² ) samples
Run AdaBoost on m samples for T = log(2m)/(2γ²) iterations, each time using m' = m_1(δ/2T) samples for the weak learner, to get L_S(h_T) = 0
  h_T = Σ_{t=1}^T α_t h_t,  h_t = A(subsample of size m')
(h_1, ..., h_T) has a compression scheme with r = T·m' points
What about the α_t???
15 Partial Compression
Instead of r training points specifying A(S) exactly, suppose they only specify a limited set of hypotheses in which A(S) lies:
  I: (X×Y)^m → {1..m}^r,  F: (X×Y)^r → hypothesis classes, each with VCdim(F(·)) ≤ D,  s.t. A(S) ∈ F(S[I(S)])
Theorem: If A(S) has a compression scheme as above and L_S(A(S)) = 0, then for m ≥ 2(r + D), w.p. ≥ 1 − δ over S ~ D^m:
  L_D(A(S)) ≤ O( (D + r log m + log(2/δ)) / m )
Proof outline: take a union bound, over the choices of indices I(S), of the VC-based uniform convergence bounds, each time using just the points outside I(S).
16 Back to Boosting
A(S) is an (ε_0 = 1/2 − γ, δ_0) weak learner that uses m_0 samples.
Boost the confidence to get a (1/2 − γ/2, δ) learner that uses
  m_1(δ) = O( m_0 log(1/δ)/log(1/δ_0) + log(1/δ) log log(1/δ_0) / γ² ) samples
Run AdaBoost on m samples for T = log(2m)/(2γ²) iterations, each time using m' = m_1(δ/2T) samples for the weak learner, to get L_S(h_T) = 0
  h_T = Σ_{t=1}^T α_t h_t ∈ L(h_1, ..., h_T) = F(I(S))
  L_D(h_T) ≤ O( (T + T·m' log m + log(1/δ)) / m )
Conclusion: for fixed ε_0, δ_0 (hiding factors depending on γ and δ_0):
  m(ε, δ) = O( m_0 log²(m) log(1/δ) / ε ) = O( m_0 log²(1/ε) log(1/δ) / ε )
17 AdaBoost In Practice
Complexity control is in terms of sparsity (#iterations) T
Realizable case (MDL): use the first T s.t. L_S(h_T) = 0
More realistically (SRM): use validation/cross-validation to select T
[Figure: training and test error curves, annotated at T = 5, 100, 1000]
Even after L_S(h_T) = 0, AdaBoost keeps improving the ℓ_1 margin
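A sketch of the validation-based choice of T, using scikit-learn's AdaBoostClassifier and its staged predictions (the dataset and split here are illustrative):

import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, random_state=0)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, random_state=0)

booster = AdaBoostClassifier(n_estimators=500).fit(X_tr, y_tr)
# staged_predict yields h_T's predictions for T = 1, 2, ..., 500
val_err = [np.mean(pred != y_val) for pred in booster.staged_predict(X_val)]
best_T = int(np.argmin(val_err)) + 1     # SRM-style selection of the sparsity level T
print(best_T, val_err[best_T - 1])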
18 Interpretations of AdaBoost
Boosting weak learning to get arbitrarily small error
  Theory is for the realizable case
  Shows efficient weak and strong learning are equivalent
Ensemble method for combining many simpler predictors
  E.g. combining decision stumps or decision trees
  Other ensemble methods: bagging, averaging, gating networks
Method for learning using sparse linear predictors with a large (infinite?) dimensional feature space
  Sparsity controls complexity; the number of iterations controls sparsity
Coordinate-wise optimization of L_S^exp(w)
  We'll get back to this when we talk about real-valued loss
Learning (in high dimensions) with large ℓ_1 margin
  Learning guarantee in terms of the ℓ_1 margin; we'll get back to this when we talk about the ℓ_1 margin
19 Just one more thing
20 Back to Hardness of Agnostic Learning
H = { x ↦ ⟨w, x⟩ > 0 | w ∈ R^n }
H_n^k = { h_1 ∧ h_2 ∧ ... ∧ h_k | h_i ∈ H } (intersections of k halfspaces)
Lemma: if some h ∈ H_n^k has L_D(h) = 0, then some h ∈ H has L_D(h) ≤ 1/2 − 1/(2k²)
Want to show: H is hard to learn agnostically. H_n^k is hard to learn even in the realizable case, and:
H is efficiently agnostically learnable
  ⇒ efficient weak learner for H_n^k, with γ = 1/(2k²)
  ⇒ H_n^k is efficiently learnable (in the realizable case), for e.g. k(n) = n
Conclusion: subject to lattice-based crypto hardness, halfspaces are not efficiently agnostically learnable (even improperly)