1 Sample and Computationally Efficient Active Learning Maria-Florina Balcan Carnegie Mellon University
2 Machine Learning is Shaping the World Highly successful discipline with lots of applications. Computational Biology Computer Vision Speech, Language Processing Mostly based on passive supervised learning.
3 Interactive Machine Learning Modern applications: massive amounts of raw data. Only a tiny fraction can be annotated by human experts. Examples: protein sequences, billions of webpages, images. Active Learning: utilize data, minimize expert intervention.
4 Interactive Machine Learning Lots of exciting activity in recent years on understanding the power of active learning, mostly on label efficiency. This talk: margin based active learning, and the power of aggressive localization for label efficient and poly time active learning. Techniques of interest beyond active learning: an adaptive sequence of convex optimization problems yields better noise tolerance than previously known for classic passive learning via poly time algorithms.
5 Structure of the Talk Standard supervised learning. Active learning. Margin based active learning.
6 Supervised Machine Learning
7 Supervised Learning E.g., which emails are spam and which are important (spam vs. not spam). E.g., classify objects as chairs vs. non-chairs.
8 Statistical / PAC learning model. Data source: distribution D on X; an expert / oracle labels examples by a target c* : X → {0,1}. The learning algorithm sees labeled examples (x_1, c*(x_1)), ..., (x_m, c*(x_m)) with x_i drawn i.i.d. from D, does optimization over the sample S, and finds a hypothesis h : X → {0,1} with h ∈ C. Goal: h has small error, err(h) = Pr_{x∼D}[h(x) ≠ c*(x)]. If c* ∈ C: the realizable case; otherwise: agnostic.
9 Two Main Aspects in Classic Machine Learning. Algorithm design: how to optimize? Automatically generate rules that do well on observed data. Running time: poly(d, 1/ε, 1/δ). Generalization guarantees, sample complexity: confidence that a rule is effective on future data. m = O((1/ε)(VCdim(C) log(1/ε) + log(1/δ))). For C = linear separators in R^d: m = O((1/ε)(d log(1/ε) + log(1/δ))).
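A quick numeric illustration of this bound (a sketch added here, not from the talk; the leading constant hidden by the O(·) is set to 1, which is an assumption):

```python
import math

def pac_sample_bound(d, eps, delta, const=1.0):
    """PAC sample-size bound for linear separators in R^d (VCdim = d + 1):
    m = O((1/eps) * (VCdim * log(1/eps) + log(1/delta))).
    const stands in for the unspecified leading constant."""
    return math.ceil(const / eps * ((d + 1) * math.log(1 / eps) + math.log(1 / delta)))

# e.g. d = 100 features, target error eps = 0.01, confidence delta = 0.05
print(pac_sample_bound(d=100, eps=0.01, delta=0.05))  # ~ 47,000 labeled examples
```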
10 Modern ML: New Learning Approaches Modern applications: massive amounts of raw data. Only a tiny fraction can be annotated by human experts (protein sequences, billions of webpages, images).
11 Active Learning [Figure: unlabeled raw data; the expert labeler answers queries (face / not face); the learner outputs a classifier.]
12 Active Learning in Practice Text classification: active SVM (Tong & Koller, ICML 2000); e.g., request the label of the example closest to the current separator. Video segmentation (Fathi-Balcan-Ren-Rehg, BMVC 11).
13 Can adaptive querying help? [CAL 92, Dasgupta 04] Threshold functions on the real line: h_w(x) = 1(x ≥ w), C = {h_w : w ∈ R}. Active algorithm: get N = O(1/ε) unlabeled examples. How can we recover the correct labels of all N points with far fewer than N queries? Do binary search! Just need O(log N) labels; output a classifier consistent with the N inferred labels. Passive supervised learning: Ω(1/ε) labels to find an ε-accurate threshold. Active: only O(log 1/ε) labels. An exponential improvement.
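A minimal sketch of the binary-search learner described above (assuming a realizable labeling oracle; the helper names are illustrative):

```python
import numpy as np

def active_threshold(xs, query_label):
    """Binary-search active learner for 1-D thresholds h_w(x) = 1(x >= w).
    xs: unlabeled sample of N points; query_label(x): oracle returning 0 or 1.
    Uses O(log N) label queries instead of N."""
    xs = np.sort(xs)
    lo, hi = 0, len(xs)          # invariant: labels are 0 on xs[:lo], 1 on xs[hi:]
    while lo < hi:
        mid = (lo + hi) // 2
        if query_label(xs[mid]) == 1:
            hi = mid             # threshold is at or left of xs[mid]
        else:
            lo = mid + 1         # threshold is right of xs[mid]
    # any w between the last 0-labeled and first 1-labeled point is consistent
    return xs[lo] if lo < len(xs) else np.inf

# toy usage: true threshold 0.3, N = 1000 unlabeled points, ~10 label queries
rng = np.random.default_rng(0)
w_hat = active_threshold(rng.uniform(0, 1, 1000), lambda x: int(x >= 0.3))
```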
14 Active Learning, Provable Guarantees Lots of exciting results on sample complexity, e.g., Dasgupta-Kalai-Monteleoni 05, Castro-Nowak 07, Cavallanti-Cesa-Bianchi-Gentile 10, Yan-Chaudhuri-Javidi 16, Dasgupta-Hsu 08, Urner-Wulff-Ben-David 13. Disagreement based algorithms: pick a few points at random from the current region of disagreement (uncertainty), query their labels, and throw out a hypothesis if you are statistically confident it is suboptimal. [Balcan-Beygelzimer-Langford 06, Hanneke 07, Dasgupta-Hsu-Monteleoni 07, Wang 09, Fridman 09, Koltchinskii 10, BHW 08, Beygelzimer-Hsu-Langford-Zhang 10, Hsu 10, Ailon 12, ...]
15 Disagreement Based Active Learning A² Agnostic Active Learner [Balcan-Beygelzimer-Langford, ICML 2006]. Let H_1 = H. For t = 1, ...: pick a few points at random from the current region of disagreement DIS(H_t) and query their labels; throw out a hypothesis if statistically confident it is suboptimal.
16 Disagreement Based Active Learning A² Agnostic Active Learner [Balcan-Beygelzimer-Langford, ICML 2006]: pick a few points at random from the current region of disagreement (uncertainty), query their labels, throw out a hypothesis if statistically confident it is suboptimal. Guarantees for A² [Hanneke 07] in terms of the disagreement coefficient θ_c = sup_{r > η+ε} P(DIS(B(c*, r))) / r. Realizable: m = O(θ_c · VCdim(C) · log(1/ε)). Agnostic: m = O((η²/ε²) · θ_c² · VCdim(C) · log(1/ε)). Linear separators under the uniform distribution: θ_c = Θ(√d).
17 Disagreement Based Active Learning Pick a few points at random from the current region of disagreement (uncertainty), query their labels, throw out a hypothesis if statistically confident it is suboptimal. [Balcan-Beygelzimer-Langford 06, Hanneke 07, Dasgupta-Hsu-Monteleoni 07, Wang 09, Fridman 09, Koltchinskii 10, BHW 08, Beygelzimer-Hsu-Langford-Zhang 10, Hsu 10, Ailon 12, ...] Generic (works for any class) and handles adversarial label noise, but suboptimal in label complexity and computationally prohibitive. A toy instantiation follows below.
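To make the scheme concrete, here is a toy disagreement-based learner for the 1-D threshold class in the realizable case (a sketch assuming the true threshold lies in the candidate grid; sample_x and query_label are hypothetical helpers). For thresholds the region of disagreement is just an interval; for general classes, maintaining DIS(H_t) is exactly what becomes computationally prohibitive:

```python
import numpy as np

def a2_style_thresholds(candidates, sample_x, query_label,
                        rounds=10, labels_per_round=30):
    """Disagreement-based active learning sketch (A2-style, realizable case)
    for 1-D thresholds h_w(x) = 1(x >= w). candidates: grid of candidate
    thresholds; sample_x(): draws unlabeled x ~ D; query_label(x): oracle."""
    version_space = np.sort(np.asarray(candidates, dtype=float))
    for _ in range(rounds):
        if len(version_space) <= 1:
            break
        lo, hi = version_space[0], version_space[-1]  # DIS(H_t) = [lo, hi)
        queried = draws = 0
        while queried < labels_per_round and draws < 100_000:
            draws += 1
            x = sample_x()
            if not (lo <= x < hi):     # surviving hypotheses all agree: infer for free
                continue
            queried += 1
            if query_label(x) == 1:    # true threshold is at or below x
                version_space = version_space[version_space <= x]
            else:                      # true threshold is above x
                version_space = version_space[version_space > x]
            if len(version_space) <= 1:
                break
            lo, hi = version_space[0], version_space[-1]
    return version_space
```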
18 Poly Time, Noise Tolerant/Agnostic, Label Optimal AL Algos.
19 Margin Based Active Learning A margin based algorithm for learning linear separators. [Balcan-Long, COLT 2013] [Awasthi-Balcan-Long, STOC 2014] [Awasthi-Balcan-Haghtalab-Urner, COLT 15] [Awasthi-Balcan-Haghtalab-Zhang, COLT 16] [Awasthi-Balcan-Long, JACM 2017] Realizable case: an exponential improvement, only O(d log 1/ε) labels to find w with error ε when D is log-concave. Agnostic case: a poly-time AL algorithm outputs w with err(w) = O(η + ε), where η = err(best linear separator), using O(d log 1/ε) labels when D is log-concave. Improves on the noise tolerance of the best previous passive algorithms [KKMS 05], [KLS 09] too!
20 Margin Based Active Learning, Realizable Case Draw m_1 unlabeled examples, label them, add them to W(1). Iterate k = 2, ..., s: find a hypothesis w_{k-1} consistent with W(k-1); set W(k) = W(k-1); sample m_k unlabeled examples x satisfying |w_{k-1} · x| ≤ γ_{k-1}; label them and add them to W(k).
21 Margin Based Active Learning, Realizable Case Log-concave distributions: the log of the density function is concave, i.e., f(λx_1 + (1 − λ)x_2) ≥ f(x_1)^λ f(x_2)^{1−λ}. A wide class: the uniform distribution over any convex set, Gaussians, etc. Theorem: for D log-concave in R^d, if γ_k = O(1/2^k), then err(w_s) ≤ ε after s = log(1/ε) rounds, using O(d) labels per round. [Figure: label requests vs. number of unlabeled examples, active vs. passive learning.]
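A minimal numpy sketch of this loop under simplifying assumptions (homogeneous unit-norm separators, rejection sampling of the margin band, a perceptron as the consistency step; constants and helper names are mine, not the papers'):

```python
import numpy as np

def consistent_separator(X, y, passes=1000):
    """Find a homogeneous linear separator consistent with (X, y), y in {-1,+1},
    via the perceptron (terminates on separable data, i.e., the realizable case)."""
    w = np.zeros(X.shape[1])
    for _ in range(passes):
        mistakes = 0
        for xi, yi in zip(X, y):
            if yi * (w @ xi) <= 0:
                w += yi * xi
                mistakes += 1
        if mistakes == 0:
            break
    return w / np.linalg.norm(w)

def margin_based_al(sample_x, query_label, rounds, m_per_round):
    """Margin-based active learning sketch (realizable case): round k queries
    only points in the band |w_{k-1} . x| <= gamma_{k-1}, gamma_{k-1} ~ 1/2^{k-1}."""
    X = np.array([sample_x() for _ in range(m_per_round)])
    y = np.array([query_label(x) for x in X])          # labels in {-1, +1}
    w = consistent_separator(X, y)
    for k in range(2, rounds + 1):
        gamma = 1.0 / 2 ** (k - 1)
        band = []
        while len(band) < m_per_round:                 # rejection-sample the band
            x = sample_x()
            if abs(w @ x) <= gamma:
                band.append(x)
        Xk = np.array(band)
        yk = np.array([query_label(x) for x in Xk])    # labels spent only in the band
        X, y = np.vstack([X, Xk]), np.concatenate([y, yk])
        w = consistent_separator(X, y)
    return w
```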
22 Analysis: Aggressive Localization Inductive invariant: every w consistent with W(k) has err(w) ≤ 1/2^k.
23 Analysis: Aggressive Localization Inductive invariant: every w consistent with W(k) has err(w) ≤ 1/2^k. [Figure: a suboptimal w vs. w_{k-1} and w*, with the band of width γ_{k-1} around w_{k-1}.]
24 Analysis: Aggressive Localization Inductive invariant: every w consistent with W(k) has err(w) ≤ 1/2^k. Decompose the error of such a w: err(w) = Pr[w errs on x, |w_{k-1} · x| > γ_{k-1}] + Pr[w errs on x, |w_{k-1} · x| ≤ γ_{k-1}].
25 Analysis: Aggressive Localization The first (outside-band) term is at most 1/2^{k+1}, since w and w_{k-1} are both close in angle to w*. The second term factors as Pr[w errs on x | |w_{k-1} · x| ≤ γ_{k-1}] · Pr[|w_{k-1} · x| ≤ γ_{k-1}], so it is enough to ensure Pr[w errs on x | |w_{k-1} · x| ≤ γ_{k-1}] ≤ C for a suitable constant C, which needs only m_k = O(d) labels in round k. Key point: localize aggressively, while maintaining correctness.
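Spelling out the inductive step the slide compresses (a sketch; the precise constants come from the log-concavity estimates in the papers): with γ_{k-1} = Θ(1/2^{k-1}), log-concavity gives Pr[|w_{k-1}·x| ≤ γ_{k-1}] = O(γ_{k-1}), so a small enough constant C makes the band term at most 1/2^{k+1}:

```latex
\begin{align*}
\mathrm{err}(w)
  &= \underbrace{\Pr\!\big[w \text{ errs},\; |w_{k-1}\cdot x| > \gamma_{k-1}\big]}_{\le\, 1/2^{k+1}}
   + \underbrace{\Pr\!\big[w \text{ errs} \,\big|\, |w_{k-1}\cdot x| \le \gamma_{k-1}\big]}_{\le\, C}
     \cdot \underbrace{\Pr\!\big[|w_{k-1}\cdot x| \le \gamma_{k-1}\big]}_{O(\gamma_{k-1})} \\
  &\le \frac{1}{2^{k+1}} + \frac{1}{2^{k+1}} \;=\; \frac{1}{2^{k}},
\end{align*}
```

which is exactly the invariant needed for the next round.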
26 The Agnostic Case No linear separator can separate the + and − examples; the best linear separator has error η. The algorithm is still margin based: Draw m_1 unlabeled examples, label them, add them to W. Iterate k = 2, ..., s: find w_k in B(w_{k-1}, r_{k-1}) of small τ_k hinge loss wrt W; clear the working set W; sample m_k unlabeled examples x satisfying |w_k · x| ≤ γ_k; label them and add them to W. End iterate.
27 The Agnostic Case [Figure: data where no linear separator is perfect; the best linear separator has error η.]
28 The Agnostic Case Key idea 1: replace error minimization with τ_k hinge loss minimization, ℓ_{τ_k}(w; x, y) = max(0, 1 − y(w · x)/τ_k), with τ_k ≈ γ_{k−1}. [Figure: the hinge-loss penalty vs. the 0/1 penalty 1(sgn(w · x) ≠ y).]
29 The Agnostic Case Key idea 2: stay close to the current guess: pick w_k in a small ball B(w_{k−1}, r_{k−1}) around w_{k−1}.
30 Margin Based Active Learning, Agnostic Case Draw m_1 unlabeled examples, label them, add them to W. Iterate k = 2, ..., s: find w_k in B(w_{k-1}, r_{k-1}) of small τ_k hinge loss wrt W (localization in concept space); clear the working set; sample m_k unlabeled examples x satisfying |w_k · x| ≤ γ_k, label them, and add them to W (localization in instance space). End iterate. Analysis, key idea: pick τ_k ≈ γ_k; localization and a variance analysis control the gap between the hinge loss and the 0/1 loss (only a constant factor). A sketch of the per-round optimization follows below.
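A minimal sketch of the per-round hinge-loss step (projected subgradient descent is one way to realize it; the papers' actual optimization, step sizes, and parameter names may differ):

```python
import numpy as np

def localized_hinge_step(W_X, W_y, w_prev, r, tau, steps=2000, lr=0.01):
    """One round of the agnostic-case update (sketch): approximately minimize
    the tau-rescaled hinge loss max(0, 1 - y (w . x) / tau) over the current
    working set (W_X, W_y), while staying in the ball B(w_prev, r), via
    projected subgradient descent. Step size and iteration count are
    illustrative choices."""
    w = w_prev.copy()
    for _ in range(steps):
        margins = W_y * (W_X @ w) / tau
        active = margins < 1                 # points with nonzero hinge loss
        # average subgradient of the hinge loss over the working set
        grad = -(W_y[active, None] * W_X[active]).sum(axis=0) / (tau * len(W_y))
        w = w - lr * grad
        # localization in concept space: project back onto B(w_prev, r)
        diff = w - w_prev
        norm = np.linalg.norm(diff)
        if norm > r:
            w = w_prev + diff * (r / norm)
    return w
```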
31 Improves over Passive Learning too!
Passive learning, malicious noise: prior work err(w) = O(η^{2/3} log^{2/3}(d/η)) [KLS 09]; our work err(w) = O(η), information-theoretically optimal.
Passive learning, agnostic noise: prior work err(w) = O(η^{1/3} log^{1/3}(1/η)) [KLS 09]; our work err(w) = O(η).
Passive learning, bounded noise (|P(Y = 1 | x) − P(Y = −1 | x)| bounded away from 0): prior work NA; our work err(w) = η + ε.
Active learning [agnostic/malicious/bounded]: prior work NA; our work err(w) = η + ε with the label complexity above, information-theoretically optimal.
32 Localization: both an algorithmic tool and an analysis tool! Useful for active and passive learning!
33 Localization: both an algorithmic tool and an analysis tool! Useful for active and passive learning! Well known for analyzing sample complexity [Bousquet-Boucheron-Lugosi 04], [BBL 06], [Hanneke 07], ... We show it is also useful for noise tolerant poly time algorithms. Previously observed in practice, e.g., for SVMs [Bottou 07].
34 Further Margin Based Refinements Efficient algorithms: [Hanneke-Kanade-Yang, ALT 15] O(d log 1/ε) label complexity for our poly time agnostic algorithm. [Daniely, COLT 15] achieves error C·η for any constant C in poly time in the agnostic setting [trading off running time with accuracy]. [Awasthi-Balcan-Haghtalab-Urner, COLT 15], [Awasthi-Balcan-Haghtalab-Zhang, COLT 16]: bounded noise, get error η + ε in time poly(d, 1/ε). [Yan-Zhang, NIPS 17]: a modified Perceptron enjoys similar guarantees. Active & differentially private: [Balcan-Feldman, NIPS 13]. General concept spaces: [Zhang-Chaudhuri, NIPS 14]; compute the region to localize in each round by using unlabeled data and solving an LP.
35 Discussion, Open Directions AL: an important paradigm in the age of Big Data, with lots of exciting developments. Disagreement based AL: general sample complexity. Margin based AL: label efficient, noise tolerant, poly time algorithms for learning linear separators; better noise tolerance than was known for passive learning via poly time algorithms [KKMS 05], [KLS 09]. Solve an adaptive sequence of convex optimization problems on smaller and smaller bands around the current guess for the target.
36 Discussion, Open Directions AL: an important paradigm in the age of Big Data, with lots of exciting developments. Disagreement based AL: general sample complexity. Margin based AL: label efficient, noise tolerant, poly time algorithm for learning linear separators. Open directions: more general distributions, other concept spaces; exploit localization insights in other settings (e.g., online convex optimization with adversarial noise).
37 Important direction: richer interactions with the expert. [Figure: goals of better accuracy, fewer queries, and natural interaction with the expert.]