Littlestone's Dimension and Online Learnability
Shai Shalev-Shwartz
Toyota Technological Institute at Chicago
The Hebrew University
Talk at UCSD workshop, February 2009
Joint work with Shai Ben-David and David Pal
Shai Shalev-Shwartz (TTI-C) Online Learnability Feb 09

Online Learning
For $t = 1, \ldots, T$:
- Environment presents input $x_t \in X$
- Learner predicts label $\hat{y}_t \in \{0, 1\}$
- Environment reveals true label $y_t \in \{0, 1\}$
- Learner pays 1 if $\hat{y}_t \neq y_t$ and 0 otherwise
Goal: make few mistakes
Online learnability: when can we guarantee to make few mistakes?
PAC learnability: well understood (VC theory)
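
The protocol above can be sketched as a simple loop. This is a minimal illustration, not code from the talk; the function name `run_online_protocol` and the toy "repeat the last label" learner are hypothetical.

```python
from typing import Callable, Iterable, Tuple


def run_online_protocol(
    examples: Iterable[Tuple[int, int]],
    predict: Callable[[int], int],
    update: Callable[[int, int], None],
) -> int:
    """Play the online game and return the learner's number of mistakes."""
    mistakes = 0
    for x_t, y_t in examples:          # environment presents x_t (y_t is still hidden)
        y_hat = predict(x_t)           # learner commits to a label in {0, 1}
        mistakes += int(y_hat != y_t)  # learner pays 1 on a wrong prediction
        update(x_t, y_t)               # true label is revealed to the learner
    return mistakes


# Hypothetical toy learner: always repeats the last label it saw.
state = {"last": 0}
demo_mistakes = run_online_protocol(
    examples=[(0, 1), (1, 1), (2, 0), (3, 0)],
    predict=lambda x: state["last"],
    update=lambda x, y: state.update(last=y),
)
```

On this toy sequence the learner errs exactly when the label changes.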

Outline
Online learnability: can we be almost as good as the best predictor in a reference class $H$?

                    Finite H    Infinite H    Margin-based H
No noise            Halving     L '88
Arbitrary noise     LW '94
Stochastic noise

- Upper and (almost) matching lower bounds
- Seamlessly deriving new algorithms/bounds

Realizable Case (no noise)
Realizable assumption: the environment answers $y_t = h(x_t)$, where $h \in H$ and the hypothesis class $H$ is known to the learner.
Theorem (Littlestone '88): A combinatorial dimension, $\mathrm{Ldim}(H)$, characterizes online learnability:
- Any algorithm might make at least $\mathrm{Ldim}(H)$ mistakes
- There exists an algorithm that makes at most $\mathrm{Ldim}(H)$ mistakes
But only in the realizable case...

Littlestone's dimension: Motivation
[Figure: binary tree over hypotheses $h_1, h_2, \ldots, h_8$]

Littlestone's dimension
Definition: $\mathrm{Ldim}(H)$ is the maximal depth of a full binary tree such that each path is explained by some $h \in H$.
Lemma: Any learner can be forced to make at least $\mathrm{Ldim}(H)$ mistakes.
Proof: An adversarial environment walks down the tree, on each round setting $y_t = \neg \hat{y}_t$ (the opposite of the learner's prediction).
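
For a finite class over a finite domain, the definition can be evaluated by brute force. The sketch below is an illustration only: it assumes hypotheses are stored as label tuples over the domain $\{0, \ldots, n-1\}$, and the `thresholds` class is a hypothetical example, not one taken from the talk. The recursion tries every point as the root of a tree and requires both subtrees to be full.

```python
from functools import lru_cache
from typing import FrozenSet, Tuple

Hypothesis = Tuple[int, ...]  # h[x] is the label h assigns to domain point x


@lru_cache(maxsize=None)
def ldim(H: FrozenSet[Hypothesis]) -> int:
    """Littlestone dimension of a finite class over the domain {0, ..., n-1}.

    Returns -1 for the empty class so that an empty branch never wins a max.
    """
    if not H:
        return -1
    n = len(next(iter(H)))
    best = 0
    for x in range(n):
        H0 = frozenset(h for h in H if h[x] == 0)
        H1 = frozenset(h for h in H if h[x] == 1)
        if H0 and H1:
            # x can serve as the root; both subtrees must be explained by H
            best = max(best, 1 + min(ldim(H0), ldim(H1)))
    return best


def thresholds(n: int) -> FrozenSet[Hypothesis]:
    """Hypothetical example class: h_k(x) = 1 iff x >= k, for k = 0, ..., n."""
    return frozenset(tuple(int(x >= k) for x in range(n)) for k in range(n + 1))
```

The running time is exponential in general, so this is only suitable for tiny classes.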

Standard Optimal Algorithm (SOA)
initialize: $V_1 = H$
for $t = 1, 2, \ldots$
    receive $x_t$
    for $r \in \{0, 1\}$ let $V_t^{(r)} = \{h \in V_t : h(x_t) = r\}$
    predict $\hat{y}_t = \arg\max_r \mathrm{Ldim}(V_t^{(r)})$
    receive true answer $y_t$
    update $V_{t+1} = V_t^{(y_t)}$
Theorem: SOA makes at most $\mathrm{Ldim}(H)$ mistakes.
Proof: Whenever SOA errs we have $\mathrm{Ldim}(V_{t+1}) \le \mathrm{Ldim}(V_t) - 1$.
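
A minimal runnable sketch of SOA for a finite class, under the same assumptions as any brute-force treatment: hypotheses are label tuples over the domain $\{0, \ldots, n-1\}$, `ldim` is an exponential-time helper written for this illustration, and `thresholds` is a hypothetical example class.

```python
from functools import lru_cache
from typing import FrozenSet, Sequence, Tuple

Hypothesis = Tuple[int, ...]  # h[x] is the label h assigns to domain point x


@lru_cache(maxsize=None)
def ldim(H: FrozenSet[Hypothesis]) -> int:
    """Brute-force Littlestone dimension; -1 for the empty class."""
    if not H:
        return -1
    best = 0
    for x in range(len(next(iter(H)))):
        H0 = frozenset(h for h in H if h[x] == 0)
        H1 = frozenset(h for h in H if h[x] == 1)
        if H0 and H1:
            best = max(best, 1 + min(ldim(H0), ldim(H1)))
    return best


def soa_mistakes(H: FrozenSet[Hypothesis], target: Hypothesis, xs: Sequence[int]) -> int:
    """Run SOA on labels produced by `target` and count its mistakes."""
    V = H  # version space V_t
    mistakes = 0
    for x in xs:
        V_r = [frozenset(h for h in V if h[x] == r) for r in (0, 1)]
        y_hat = max((0, 1), key=lambda r: ldim(V_r[r]))  # predict the "richer" side
        y = target[x]                                    # true answer is revealed
        mistakes += int(y_hat != y)
        V = V_r[y]                                       # keep consistent hypotheses
    return mistakes


def thresholds(n: int) -> FrozenSet[Hypothesis]:
    """Hypothetical example class: h_k(x) = 1 iff x >= k, for k = 0, ..., n."""
    return frozenset(tuple(int(x >= k) for x in range(n)) for k in range(n + 1))
```

Ties in the arg max are broken in favor of label 0 here; the mistake bound does not depend on the tie-breaking rule.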

Intermediate Summary
- Littlestone's dimension characterizes online learnability
- Example: $H$ = {all 100-character-long C++ functions}, $\mathrm{Ldim}(H) \le 500$
- Received relatively little attention from researchers, maybe due to:
    - Unrealistic realizable assumption
    - Lack of interesting examples
    - Lack of margin-based theory
Coming next: generalizing to
- Agnostic case (noise is allowed)
- Fat dimension and margin-based bounds
- Linear separators: new algorithms/bounds
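
One way to arrive at the figure of 500 in the C++ example is the general fact that $\mathrm{Ldim}(H) \le \log_2 |H|$ for finite $H$, since a shattered tree of depth $k$ needs a distinct hypothesis for each of its $2^k$ root-to-leaf paths. The alphabet size of 32 characters below is an assumption made only for this illustration; the slide does not state it.

```latex
\mathrm{Ldim}(H) \;\le\; \log_2 |H| \;\le\; \log_2\!\left(32^{100}\right) \;=\; 100 \cdot 5 \;=\; 500 .
```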

Agnostic Online Learning and Regret Analysis
- Make no assumptions on the origin of the labels
- Analyze the regret of not following the best predictor in $H$:
  $\sum_{t=1}^{T} |\hat{y}_t - y_t| - \min_{h \in H} \sum_{t=1}^{T} |h(x_t) - y_t|$
- When can we guarantee low regret?
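
The regret defined above can be computed directly for a finite class once the whole sequence is known. The sketch is illustrative; the function name `regret` and the two-hypothesis demo class are hypothetical.

```python
from typing import Callable, Sequence


def regret(
    predictions: Sequence[int],
    xs: Sequence[int],
    ys: Sequence[int],
    H: Sequence[Callable[[int], int]],
) -> int:
    """Learner's mistakes minus the mistakes of the best hypothesis in H."""
    learner_loss = sum(abs(p - y) for p, y in zip(predictions, ys))
    best_loss = min(sum(abs(h(x) - y) for x, y in zip(xs, ys)) for h in H)
    return learner_loss - best_loss


# Hypothetical demo: H holds the two constant predictors.
demo_H = [lambda x: 0, lambda x: 1]
demo_regret = regret(predictions=[0, 1, 1, 1], xs=[0, 1, 2, 3], ys=[1, 1, 0, 1], H=demo_H)
```

Here the learner makes 2 mistakes and the constant-1 predictor makes 1, so the regret is 1.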

Cover's impossibility result
- $H = \{h(x) = 1,\ h(x) = 0\}$, so $\mathrm{Ldim}(H) = 1$
- Environment will output $y_t = \neg \hat{y}_t$
- Learner makes $T$ mistakes
- Best in $H$ makes at most $T/2$ mistakes
- Regret is at least $T/2$
Corollary: online learning in the non-realizable case is impossible?!?
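
Cover's argument can be simulated against any deterministic learner. In this sketch the adversary calls the learner to see its prediction and then outputs the opposite label; `majority_learner` is a hypothetical example learner chosen only for the demonstration.

```python
from typing import Callable, List, Tuple


def cover_adversary(learner: Callable[[List[int]], int], T: int) -> Tuple[int, int]:
    """Return (learner mistakes, mistakes of the best constant predictor)."""
    history: List[int] = []  # labels revealed so far
    mistakes = 0
    for _ in range(T):
        y_hat = learner(history)  # deterministic, so the adversary can simulate it
        y = 1 - y_hat             # output the opposite label
        history.append(y)
        mistakes += 1             # every round is a mistake by construction
    ones = sum(history)
    # constant-0 errs on every 1, constant-1 errs on every 0
    best_in_H = min(ones, T - ones)
    return mistakes, best_in_H


def majority_learner(history: List[int]) -> int:
    """Hypothetical learner: predict the majority label seen so far."""
    return int(2 * sum(history) > len(history))


demo_mistakes, demo_best = cover_adversary(majority_learner, T=100)
```

Whatever deterministic learner is plugged in, the better of the two constant predictors errs on at most half of the rounds, so the regret is at least $T/2$.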

Randomized Prediction and Expected Regret
- Let's weaken the environment: it should decide on $y_t$ before seeing $\hat{y}_t$
- For a deterministic learner, the environment can simulate the learner, so there is no difference
- For a learner that randomizes its predictions: big difference
- We analyze expected regret:
  $\sum_{t=1}^{T} \mathbb{E}[|\hat{y}_t - y_t|] - \min_{h \in H} \sum_{t=1}^{T} |h(x_t) - y_t|$
- This makes it possible to sidestep Cover's impossibility result
- Online learning in the non-realizable case becomes possible!

Weighted Majority (WM) for learning with $d$ experts
initialize: assign weight $w_i = 1$ to each expert
for $t = 1, 2, \ldots, T$
    each expert predicts $f_i \in \{0, 1\}$
    environment determines $y_t$ without revealing it to the learner
    predict $\hat{y}_t = 1$ with probability proportional to $\sum_{i : f_i = 1} w_i$
    receive label $y_t$
    for each wrong expert: $w_i \leftarrow \eta\, w_i$
Theorem: WM achieves expected regret of at most $\sqrt{\ln(d)\, T}$.
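
A minimal sketch of the WM procedure above. Two things here are assumptions rather than content of the slide: the tuning $\eta = \exp(-\sqrt{8 \ln(d) / T})$ is one standard choice from the exponential-weights literature, and the function returns the exact expected number of mistakes alongside the sampled one so that the regret can be checked without Monte Carlo noise.

```python
import math
import random
from typing import Sequence, Tuple


def weighted_majority(
    expert_preds: Sequence[Sequence[int]],  # expert_preds[t][i] = f_i on round t
    labels: Sequence[int],                  # y_t, fixed before the learner predicts
    eta: float,                             # multiplicative penalty in (0, 1)
    rng: random.Random,
) -> Tuple[float, int]:
    """Return (expected mistakes, mistakes of one sampled run)."""
    d = len(expert_preds[0])
    w = [1.0] * d
    expected, realized = 0.0, 0
    for f, y in zip(expert_preds, labels):
        # probability of predicting 1 = weight share of experts voting 1
        p_one = sum(w_i for w_i, f_i in zip(w, f) if f_i == 1) / sum(w)
        y_hat = 1 if rng.random() < p_one else 0
        realized += int(y_hat != y)
        expected += p_one if y == 0 else 1.0 - p_one
        # shrink the weight of every expert that was wrong on this round
        w = [w_i * eta if f_i != y else w_i for w_i, f_i in zip(w, f)]
    return expected, realized


def tuned_eta(d: int, T: int) -> float:
    """Assumed tuning (not from the slide): eta = exp(-sqrt(8 ln(d) / T))."""
    return math.exp(-math.sqrt(8.0 * math.log(d) / T))
```

With this tuning the standard exponential-weights analysis gives expected regret at most $\sqrt{T \ln(d) / 2}$, which is within the bound stated in the theorem.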

WM and Online Learnability
- The WM regret bound implies that a finite $H$ is learnable with regret $\sqrt{\ln(|H|)\, T}$
- Is this the best we can do? And what if $H$ is infinite?
- Solution: combining WM with SOA
Theorem:
- There exists a learner with expected regret $\sqrt{\mathrm{Ldim}(H)\, T \log(T)}$
- No learner can have expected regret smaller than $\sqrt{\mathrm{Ldim}(H)\, T}$
Therefore: $H$ is agnostic online learnable $\iff \mathrm{Ldim}(H) < \infty$

Proof idea
Expert($i_1, \ldots, i_L$)
initialize: $V_1 = H$
for $t = 1, 2, \ldots$
    receive $x_t$
    for $r \in \{0, 1\}$ let $V_t^{(r)} = \{h \in V_t : h(x_t) = r\}$
    define $\hat{y}_t = \arg\max_r \mathrm{Ldim}(V_t^{(r)})$
    if $t \in \{i_1, \ldots, i_L\}$ flip the prediction: $\hat{y}_t \leftarrow \neg \hat{y}_t$
    update $V_{t+1} = V_t^{(\hat{y}_t)}$
Lemma: If $\mathrm{Ldim}(H) < \infty$, then for any $h \in H$ there exist $i_1, \ldots, i_L$ with $L \le \mathrm{Ldim}(H)$ such that Expert($i_1, \ldots, i_L$) agrees with $h$ on the entire sequence.
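
The lemma can be checked on small finite classes: taking the flip rounds to be exactly the rounds on which SOA would err when the labels come from $h$ yields an expert that reproduces $h$. The sketch below is an illustration under assumed conventions (hypotheses as label tuples over $\{0, \ldots, n-1\}$, a brute-force `ldim`, a hypothetical `thresholds` class); `flips_for` is a helper introduced here, not part of the talk.

```python
from functools import lru_cache
from typing import FrozenSet, List, Sequence, Set, Tuple

Hypothesis = Tuple[int, ...]  # h[x] is the label h assigns to domain point x


@lru_cache(maxsize=None)
def ldim(H: FrozenSet[Hypothesis]) -> int:
    """Brute-force Littlestone dimension; -1 for the empty class."""
    if not H:
        return -1
    best = 0
    for x in range(len(next(iter(H)))):
        H0 = frozenset(h for h in H if h[x] == 0)
        H1 = frozenset(h for h in H if h[x] == 1)
        if H0 and H1:
            best = max(best, 1 + min(ldim(H0), ldim(H1)))
    return best


def split(V: FrozenSet[Hypothesis], x: int) -> List[FrozenSet[Hypothesis]]:
    """Return [V^(0), V^(1)] for the point x."""
    return [frozenset(h for h in V if h[x] == r) for r in (0, 1)]


def expert(H: FrozenSet[Hypothesis], xs: Sequence[int], flips: Set[int]) -> List[int]:
    """Predictions of Expert(i_1, ..., i_L); rounds are numbered from 1."""
    V, out = H, []
    for t, x in enumerate(xs, start=1):
        V_r = split(V, x)
        y_hat = max((0, 1), key=lambda r: ldim(V_r[r]))
        if t in flips:
            y_hat = 1 - y_hat        # flip the SOA prediction on this round
        out.append(y_hat)
        V = V_r[y_hat]               # the expert trusts its own prediction
    return out


def flips_for(H: FrozenSet[Hypothesis], h: Hypothesis, xs: Sequence[int]) -> Set[int]:
    """Rounds on which SOA errs when the labels are produced by h."""
    V, flips = H, set()
    for t, x in enumerate(xs, start=1):
        V_r = split(V, x)
        if max((0, 1), key=lambda r: ldim(V_r[r])) != h[x]:
            flips.add(t)
        V = V_r[h[x]]
    return flips


def thresholds(n: int) -> FrozenSet[Hypothesis]:
    """Hypothetical example class: h_k(x) = 1 iff x >= k, for k = 0, ..., n."""
    return frozenset(tuple(int(x >= k) for x in range(n)) for k in range(n + 1))
```

Because SOA makes at most $\mathrm{Ldim}(H)$ mistakes on a realizable sequence, the flip set found this way never has more than $\mathrm{Ldim}(H)$ elements.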

Agnostic Online Learning: Bounded Stochastic Noise
- The previous theorem holds for any noise
- For stochastic noise: better results
- Assume $y_t = h(x_t) +_2 \nu_t$ (addition modulo 2), where $\mathbb{P}[\nu_t = 1] \le \gamma < \tfrac{1}{2}$
- Then there exists a learner with:
  $\mathbb{E}\left[\sum_{t=1}^{T} |\hat{y}_t - h(x_t)|\right] \le \frac{1}{1 - 2\sqrt{\gamma(1-\gamma)}}\, \mathrm{Ldim}(H) \ln(T)$
- Learner is better than teacher: the learner makes $O(\ln(T))$ mistakes while the teacher makes $\gamma T$ mistakes

Fat Littlestone's dimension
- Consider hypotheses of the form $h : X \to \mathbb{R}$, where the actual prediction is $\mathrm{sign}(h(x))$
- Fat Littlestone's dimension: maximal depth of a tree such that each path is explained by some $h \in H$ with margin $\gamma$
- Importance: we can apply analysis tools for bounding a combinatorial object
Theorem:
- Let $M$ be the expected number of mistakes of the online learner
- Let $M_\gamma(H)$ be the number of margin-mistakes of the optimal $h \in H$
- $M \le M_\gamma(H) + \sqrt{\mathrm{Ldim}_\gamma(H) \ln(T)\, T}$

Fat Littlestone's dimension of linear separators
Linear predictors: $H = \{x \mapsto \langle w, x \rangle : \|w\| \le 1\}$
Lemma: If $X$ is the unit ball of a $\sigma$-regular Banach space $(B, \|\cdot\|)$, then $\mathrm{Ldim}_\gamma(H) \le \sigma / \gamma^2$.
Examples:

X                               H                                                        Ldim_γ(H)
$\{x : \|x\|_2 \le 1\}$         $\{x \mapsto \langle w, x \rangle : \|w\|_2 \le 1\}$     $1/\gamma^2$
$\{x : \|x\|_\infty \le 1\}$    $\{x \mapsto \langle w, x \rangle : \|w\|_1 \le 1\}$     $\log(n)/\gamma^2$

(Surprising) Corollary: Regret with non-convex loss
$M \le M_\gamma(H) + \frac{1}{\gamma} \sqrt{\ln(T)\, T}$
[Figure: loss $\ell(a, y)$ plotted against $a$]
- Freund and Schapire '99: quadratic loss
- Gentile '02: hinge loss
- No result with non-convex loss
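
The corollary appears to follow by substituting the Euclidean example, $\mathrm{Ldim}_\gamma(H) \le 1/\gamma^2$, into the margin-based mistake bound $M \le M_\gamma(H) + \sqrt{\mathrm{Ldim}_\gamma(H) \ln(T)\, T}$. Spelled out as a sketch:

```latex
M \;\le\; M_\gamma(H) + \sqrt{\mathrm{Ldim}_\gamma(H)\,\ln(T)\,T}
  \;\le\; M_\gamma(H) + \sqrt{\frac{1}{\gamma^{2}}\,\ln(T)\,T}
  \;=\; M_\gamma(H) + \frac{1}{\gamma}\sqrt{\ln(T)\,T} .
```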

Summary
                    Online Learning    PAC Learning
Dimension           Ldim(H)            VCdim(H)
Realizable case:    dim T
Agnostic case:      dim T
Low noise:          dim T
Margin:

Some Open Problems
- Ldim and fat-Ldim calculus
- Bridging the $\log(T)$ gap between lower and upper bounds
- Other noise conditions (Tsybakov, Steinwart)
- Multiclass prediction with bandit feedback: Efficient algorithms? Lower bounds?
- Low Ldim, compression scheme, low VCdim
- Low Ldim? Compression scheme? Low VCdim
More informationMachine Learning Theory (CS 6783)
Machine Learning Theory (CS 6783) Tu-Th 1:25 to 2:40 PM Kimball, B-11 Instructor : Karthik Sridharan ABOUT THE COURSE No exams! 5 assignments that count towards your grades (55%) One term project (40%)
More informationA Course in Machine Learning
A Course in Machine Learning Hal Daumé III 10 LEARNING THEORY For nothing ought to be posited without a reason given, unless it is self-evident or known by experience or proved by the authority of Sacred
More informationComputational Learning Theory
Computational Learning Theory Pardis Noorzad Department of Computer Engineering and IT Amirkabir University of Technology Ordibehesht 1390 Introduction For the analysis of data structures and algorithms
More informationOnline Learning and Online Convex Optimization
Online Learning and Online Convex Optimization Nicolò Cesa-Bianchi Università degli Studi di Milano N. Cesa-Bianchi (UNIMI) Online Learning 1 / 49 Summary 1 My beautiful regret 2 A supposedly fun game
More informationNear-Optimal Algorithms for Online Matrix Prediction
JMLR: Workshop and Conference Proceedings vol 23 (2012) 38.1 38.13 25th Annual Conference on Learning Theory Near-Optimal Algorithms for Online Matrix Prediction Elad Hazan Technion - Israel Inst. of Tech.
More informationIndividual Sequence Prediction using Memory-efficient Context Trees
Individual Sequence Prediction using Memory-efficient Context rees Ofer Dekel, Shai Shalev-Shwartz, and Yoram Singer Abstract Context trees are a popular and effective tool for tasks such as compression,
More informationSample and Computationally Efficient Active Learning. Maria-Florina Balcan Carnegie Mellon University
Sample and Computationally Efficient Active Learning Maria-Florina Balcan Carnegie Mellon University Machine Learning is Shaping the World Highly successful discipline with lots of applications. Computational
More informationCS340 Machine learning Lecture 5 Learning theory cont'd. Some slides are borrowed from Stuart Russell and Thorsten Joachims
CS340 Machine learning Lecture 5 Learning theory cont'd Some slides are borrowed from Stuart Russell and Thorsten Joachims Inductive learning Simplest form: learn a function from examples f is the target
More informationICML '97 and AAAI '97 Tutorials
A Short Course in Computational Learning Theory: ICML '97 and AAAI '97 Tutorials Michael Kearns AT&T Laboratories Outline Sample Complexity/Learning Curves: nite classes, Occam's VC dimension Razor, Best
More informationComputational Learning Theory
CS 446 Machine Learning Fall 2016 OCT 11, 2016 Computational Learning Theory Professor: Dan Roth Scribe: Ben Zhou, C. Cervantes 1 PAC Learning We want to develop a theory to relate the probability of successful
More informationDoes Unlabeled Data Help?
Does Unlabeled Data Help? Worst-case Analysis of the Sample Complexity of Semi-supervised Learning. Ben-David, Lu and Pal; COLT, 2008. Presentation by Ashish Rastogi Courant Machine Learning Seminar. Outline
More informationConvex Repeated Games and Fenchel Duality
Convex Repeated Games and Fenchel Duality Shai Shalev-Shwartz 1 and Yoram Singer 1,2 1 School of Computer Sci. & Eng., he Hebrew University, Jerusalem 91904, Israel 2 Google Inc. 1600 Amphitheater Parkway,
More informationOnline Learning with Feedback Graphs
Online Learning with Feedback Graphs Claudio Gentile INRIA and Google NY clagentile@gmailcom NYC March 6th, 2018 1 Content of this lecture Regret analysis of sequential prediction problems lying between
More informationNatural Language Processing. Classification. Features. Some Definitions. Classification. Feature Vectors. Classification I. Dan Klein UC Berkeley
Natural Language Processing Classification Classification I Dan Klein UC Berkeley Classification Automatically make a decision about inputs Example: document category Example: image of digit digit Example:
More informationA New Perspective on an Old Perceptron Algorithm
A New Perspective on an Old Perceptron Algorithm Shai Shalev-Shwartz 1,2 and Yoram Singer 1,2 1 School of Computer Sci & Eng, The Hebrew University, Jerusalem 91904, Israel 2 Google Inc, 1600 Amphitheater
More informationComputational Learning Theory: Shattering and VC Dimensions. Machine Learning. Spring The slides are mainly from Vivek Srikumar
Computational Learning Theory: Shattering and VC Dimensions Machine Learning Spring 2018 The slides are mainly from Vivek Srikumar 1 This lecture: Computational Learning Theory The Theory of Generalization
More information