Sample and Computationally Efficient Active Learning. Maria-Florina Balcan Carnegie Mellon University


1 Sample and Computationally Efficient Active Learning Maria-Florina Balcan Carnegie Mellon University

2 Machine Learning is Shaping the World Highly successful discipline with lots of applications: computational biology, computer vision, speech and language processing. Mostly based on passive supervised learning.

3 Interactive Machine Learning Modern applications: massive amounts of raw data (protein sequences, billions of webpages, images). Only a tiny fraction can be annotated by human experts. Active Learning: utilize data, minimize expert intervention.

4 Interactive Machine Learning Lots of exciting activity in recent years on understanding the power of active learning, mostly its label efficiency. This talk: margin based active learning. The power of aggressive localization for label efficient and poly time active learning. Techniques of interest beyond active learning: an adaptive sequence of convex optimization problems gives better noise tolerance than previously known for classic passive learning via poly time algorithms.

5 Structure of the Talk Standard supervised learning. Active learning. Margin based active learning.

6 Supervised Machine Learning

7 Supervised Learning E.g., which emails are spam and which are important. E.g., classify objects as chairs vs non-chairs.

8 Statistical / PAC learning model Data Source: distribution D on X. Expert / Oracle labels examples with the target c*: X → {0,1}. The algorithm sees labeled examples (x_1, c*(x_1)), ..., (x_m, c*(x_m)) with x_i i.i.d. from D, does optimization over the sample S, and finds a hypothesis h: X → {0,1} in C. Goal: h has small error, err(h) = Pr_{x ~ D}[h(x) ≠ c*(x)]. If c* is in C this is the realizable case; else it is the agnostic case.

9 Two Main Aspects in Classic Machine Learning Algorithm Design. How to optimize? Automatically generate rules that do well on observed data. Running time: poly(d, 1/ϵ, 1/δ). Generalization Guarantees, Sample Complexity. Confidence for rule effectiveness on future data: m = O((1/ϵ)(VCdim(C) log(1/ϵ) + log(1/δ))). For C = linear separators in R^d: m = O((1/ϵ)(d log(1/ϵ) + log(1/δ))).

10 Modern ML: New Learning Approaches Modern applications: massive amounts of raw data (protein sequences, billions of webpages, images). Only a tiny fraction can be annotated by human experts.

11 Active Learning Raw data → Expert Labeler (face / not face) → Classifier; only selected examples are sent to the expert.

12 Active Learning in Practice Text classification: active SVM (Tong & Koller, ICML 2000), e.g., request the label of the example closest to the current separator. Video segmentation (Fathi-Balcan-Ren-Rehg, BMVC 11).

13 Can adaptive querying help? [CAL 92, Dasgupta 04] Threshold functions on the real line: h_w(x) = 1(x ≥ w), C = {h_w : w ∈ R}. Active algorithm: get N = O(1/ϵ) unlabeled examples. How can we recover the correct labels with far fewer than N queries? Do binary search! Just need O(log N) labels. Output a classifier consistent with the N inferred labels. Passive supervised: Ω(1/ϵ) labels to find an ϵ-accurate threshold. Active: only O(log 1/ϵ) labels. Exponential improvement.
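To make the binary-search idea concrete, here is a minimal Python sketch (not from the talk): `query_label` stands in for the expert oracle, and the pool size plays the role of N = O(1/ϵ).

```python
import numpy as np

def active_learn_threshold(pool, query_label):
    """Binary-search active learner for thresholds h_w(x) = 1(x >= w).

    pool: unlabeled points drawn i.i.d. from D (size N = O(1/eps));
    query_label(x): expert oracle returning the true label in {0, 1}.
    Uses only O(log N) label queries in the realizable case.
    """
    xs = np.sort(np.asarray(pool))
    lo, hi = 0, len(xs)  # invariant: xs[:lo] are labeled 0, xs[hi:] are labeled 1
    while lo < hi:
        mid = (lo + hi) // 2
        if query_label(xs[mid]) == 1:
            hi = mid          # threshold is at or below xs[mid]
        else:
            lo = mid + 1      # threshold is above xs[mid]
    # any threshold just below xs[lo] is consistent with all N inferred labels
    w_hat = xs[lo] if lo < len(xs) else xs[-1] + 1.0
    return w_hat
```

The returned threshold is consistent with all N inferred labels, so with N = O(1/ϵ) it is ϵ-accurate while only O(log(1/ϵ)) labels were queried.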

14 Active Learning, Provable Guarantees Lots of exciting results on sample complexity. E.g., Dasgupta-Kalai-Monteleoni 05, Castro-Nowak 07, Cavallanti-Cesa-Bianchi-Gentile 10, Yan-Chaudhuri-Javidi 16, Dasgupta-Hsu 08, Urner-Wulff-Ben-David 13. Disagreement based algorithms: pick a few points at random from the current region of disagreement (uncertainty), query their labels, and throw out hypotheses if you are statistically confident they are suboptimal. [Balcan-Beygelzimer-Langford 06, Hanneke 07, Dasgupta-Hsu-Monteleoni 07, Wang 09, Fridman 09, Koltchinskii 10, BHW 08, Beygelzimer-Hsu-Langford-Zhang 10, Hsu 10, Ailon 12, ...]

15 Disagreement Based Active Learning A^2 Agnostic Active Learner [Balcan-Beygelzimer-Langford, ICML 2006] Let H_1 = H. For t = 1, 2, ...: pick a few points at random from the current region of disagreement DIS(H_t) and query their labels; throw out hypotheses if statistically confident they are suboptimal.
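A toy sketch of this loop, assuming a finite list of candidate classifiers as a stand-in for the version space and a fixed `slack` in place of the statistical confidence term; it illustrates the mechanics only, not the actual A^2 algorithm or its confidence bounds.

```python
import random

def disagreement_based_al(hypotheses, unlabeled_pool, query_label,
                          batch_size=20, rounds=10, slack=0.05):
    """Schematic disagreement-based active learner (in the spirit of A^2).

    hypotheses: finite list of classifiers h(x) -> {0, 1};
    query_label(x): expert oracle; slack: placeholder confidence term.
    """
    version_space = list(hypotheses)
    for _ in range(rounds):
        # region of disagreement: points on which surviving classifiers differ
        dis = [x for x in unlabeled_pool
               if len({h(x) for h in version_space}) > 1]
        if not dis:
            break
        sample = random.sample(dis, min(batch_size, len(dis)))
        labeled = [(x, query_label(x)) for x in sample]
        # empirical error of each surviving hypothesis on the labeled batch
        errs = {id(h): sum(h(x) != y for x, y in labeled) / len(labeled)
                for h in version_space}
        best = min(errs.values())
        # keep hypotheses not confidently worse than the best one
        version_space = [h for h in version_space if errs[id(h)] <= best + slack]
    return version_space[0]
```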

16 Disagreement Based Active Learning A^2 Agnostic Active Learner [Balcan-Beygelzimer-Langford, ICML 2006] Pick a few points at random from the current region of disagreement (uncertainty), query their labels, throw out hypotheses if you are statistically confident they are suboptimal. Guarantees for A^2 [Hanneke 07] via the disagreement coefficient θ_{c*} = sup_{r > η + ϵ} P(DIS(B(c*, r))) / r. Realizable: m = O(θ_{c*} VCdim(C) log(1/ϵ)). Agnostic: m = O((η²/ϵ²) θ_{c*}² VCdim(C) log(1/ϵ)). Linear separators, uniform distribution: θ_{c*} = √d.

17 Disagreement Based Active Learning Pick a few points at random from the current region of disagreement (uncertainty), query their labels, throw out hypotheses if you are statistically confident they are suboptimal. [Balcan-Beygelzimer-Langford 06, Hanneke 07, Dasgupta-Hsu-Monteleoni 07, Wang 09, Fridman 09, Koltchinskii 10, BHW 08, Beygelzimer-Hsu-Langford-Zhang 10, Hsu 10, Ailon 12, ...] Generic (any class), handles adversarial label noise. But: suboptimal in label complexity and computationally prohibitive.

18 Poly Time, Noise Tolerant/Agnostic, Label Optimal AL Algos.

19 Margin Based Active Learning Margin based algorithm for learning linear separators. [Balcan-Long, COLT 2013] [Awasthi-Balcan-Long, STOC 2014] [Awasthi-Balcan-Haghtalab-Urner, COLT 15] [Awasthi-Balcan-Haghtalab-Zhang, COLT 16] [Awasthi-Balcan-Long, JACM 2017] Realizable: exponential improvement, only O(d log 1/ϵ) labels to find w of error ϵ, when D is log-concave. Agnostic: poly-time AL algorithm outputs w with err(w) = O(η + ϵ), η = err(best linear separator), using O(d log 1/ϵ) labels when D is log-concave. Improves on the noise tolerance of the previous best passive algorithms [KKMS 05], [KLS 09] too!

20 Margin Based Active-Learning, Realizable Case Draw m_1 unlabeled examples, label them, add them to W(1). Iterate k = 2, ..., s: find a hypothesis w_{k-1} consistent with W(k-1); set W(k) = W(k-1); sample m_k unlabeled examples x satisfying |w_{k-1} · x| ≤ γ_{k-1}; label them and add them to W(k).
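A minimal Python sketch of this loop, under assumed interfaces `sample_unlabeled(n)` and `query_label(x)`; the Perceptron-style `consistent_halfspace` is a simple stand-in for "find a hypothesis consistent with W(k)", not the learner analyzed in the papers.

```python
import numpy as np

def consistent_halfspace(W, passes=100):
    """Perceptron run to (near-)consistency on the labeled set W; labels in {-1, +1}."""
    w = np.zeros(len(W[0][0]))
    for _ in range(passes):
        mistakes = 0
        for x, y in W:
            if y * np.dot(w, x) <= 0:
                w = w + y * np.asarray(x)
                mistakes += 1
        if mistakes == 0:
            break
    return w / (np.linalg.norm(w) + 1e-12)

def margin_based_al_realizable(sample_unlabeled, query_label, rounds, m_per_round):
    """Sketch of margin-based active learning, realizable case.
    The band width gamma_{k-1} shrinks as O(1/2^k), as in the theorem below."""
    W = [(x, query_label(x)) for x in sample_unlabeled(m_per_round)]  # W(1)
    w = consistent_halfspace(W)
    for k in range(2, rounds + 1):
        gamma = 1.0 / (2 ** (k - 1))          # band width gamma_{k-1}
        band = []
        while len(band) < m_per_round:        # rejection-sample inside the band
            x = sample_unlabeled(1)[0]
            if abs(np.dot(w, x)) <= gamma:
                band.append(x)
        W += [(x, query_label(x)) for x in band]   # W(k) = W(k-1) + new labels
        w = consistent_halfspace(W)
    return w
```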

21 Margin Based Active-Learning, Realizable Case Log-concave distributions: the log of the density function is concave, i.e., f(λx_1 + (1-λ)x_2) ≥ f(x_1)^λ f(x_2)^{1-λ}. A wide class: uniform distribution over any convex set, Gaussian, etc. Theorem: D log-concave in R^d. If γ_k = O(1/2^k), then err(w_s) ≤ ϵ after s = log(1/ϵ) rounds using O(d) labels per round. So active learning makes O(d log(1/ϵ)) label requests in total, versus roughly d/ϵ labels for passive learning, while using a comparable number of unlabeled examples.

22 Analysis: Aggressive Localization Induction: all w consistent with W(k) have err(w) ≤ 1/2^k.

23 Analysis: Aggressive Localization Induction: all w consistent with W(k) have err(w) ≤ 1/2^k. (Figure: a suboptimal w, the current guess w_{k-1}, and the target w*, with the band of width γ_{k-1} around w_{k-1}.)

24 Analysis: Aggressive Localization Induction: all w consistent with W(k) have err(w) ≤ 1/2^k. Decompose err(w) = Pr[w errs on x, |w_{k-1} · x| ≤ γ_{k-1}] + Pr[w errs on x, |w_{k-1} · x| > γ_{k-1}]; the goal is to bound each term so that err(w) ≤ 1/2^{k+1}.

25 Analysis: Aggressive Localization Induction: all w consistent with W(k) have err(w) ≤ 1/2^k. err(w) = Pr[w errs on x | |w_{k-1} · x| ≤ γ_{k-1}] · Pr[|w_{k-1} · x| ≤ γ_{k-1}] + Pr[w errs on x, |w_{k-1} · x| > γ_{k-1}]. Enough to ensure Pr[w errs on x | |w_{k-1} · x| ≤ γ_{k-1}] ≤ C for a suitable constant C. Need only m_k = O(d) labels in round k. Key point: localize aggressively, while maintaining correctness.

26 The Agnostic Case No linear separator can separate the + and − examples; the best linear separator has error η. The algorithm is still margin based. Draw m_1 unlabeled examples, label them, add them to W. Iterate k = 2, ..., s: find w_{k-1} in a ball of radius r_{k-1} around the previous iterate, of small τ_{k-1} hinge loss wrt W; clear the working set; sample m_k unlabeled examples x satisfying |w_{k-1} · x| ≤ γ_{k-1}; label them and add them to W. End iterate.

27 The Agnostic Case No linear separator can separate the + and − examples; the best linear separator has error η.

28 The Agnostic Case No linear separator can separate the + and − examples; the best linear separator has error η. Key idea 1: replace 0/1 error minimization with τ_{k-1} hinge loss minimization, a convex surrogate penalty with τ_k comparable to the band width γ_{k-1}.
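For concreteness, one standard form of such a scaled hinge loss follows; this is an illustrative sketch, and the exact normalization used in the papers may differ.

```python
import numpy as np

def tau_hinge_loss(w, X, y, tau):
    """Average tau-scaled hinge loss over a working set: max(0, 1 - y (w . x) / tau).
    X: (n, d) array of examples in the band; y: labels in {-1, +1}.
    Convex in w; for tau comparable to the band width, the talk's analysis shows
    it stays within a constant factor of the 0/1 error inside the band."""
    margins = y * (X @ w) / tau
    return np.mean(np.maximum(0.0, 1.0 - margins))
```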

29 The Agnostic Case No linear separator can separate the + and − examples; the best linear separator has error η. Key idea 2: stay close to the current guess: constrain w_k to a small ball around w_{k-1}.

30 Margin Based Active-Learning, Agnostic Case Draw m_1 unlabeled examples, label them, add them to W. Iterate k = 2, ..., s: find w_{k-1} in a ball of radius r_{k-1} around the previous iterate, of small τ_{k-1} hinge loss wrt W (localization in concept space); clear the working set; sample m_k unlabeled examples x satisfying |w_{k-1} · x| ≤ γ_{k-1} (localization in instance space); label them and add them to W. End iterate. Analysis, key idea: pick τ_k ≈ γ_k. Localization and a variance analysis control the gap between hinge loss and 0/1 loss (only a constant).
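A rough sketch of the per-round optimization, assuming the labeled band W, the previous iterate w_prev, the ball radius r, and the scale tau are given; projected subgradient descent here is only a stand-in for the convex solver, and the parameter names are illustrative rather than the papers' notation.

```python
import numpy as np

def hinge_min_in_ball(W, w_prev, r, tau, steps=500, lr=0.05):
    """One agnostic round (sketch): approximately minimize the tau-scaled hinge
    loss over the labeled band W, constrained to the ball B(w_prev, r),
    via projected subgradient descent."""
    X = np.array([x for x, _ in W])
    y = np.array([lab for _, lab in W], dtype=float)   # labels in {-1, +1}
    w = w_prev.copy()
    for _ in range(steps):
        margins = y * (X @ w) / tau
        active = margins < 1.0                          # points with positive hinge loss
        # subgradient of the mean hinge loss
        grad = -(X[active] * y[active][:, None]).sum(axis=0) / (tau * len(W))
        w = w - lr * grad
        # project back onto the ball around the previous iterate (localization)
        diff = w - w_prev
        norm = np.linalg.norm(diff)
        if norm > r:
            w = w_prev + diff * (r / norm)
    return w
```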

31 Improves over Passive Learning too!
Passive learning, malicious noise: prior work err(w) = O(η^{2/3} log^{2/3}(d/η)) [KLS 09]; our work err(w) = O(η), information-theoretically optimal.
Passive learning, agnostic noise: prior work err(w) = O(η^{1/3} log^{1/3}(1/η)) [KLS 09]; our work err(w) = O(η).
Bounded noise: P(Y = 1|x) bounded away from 1/2.
Active learning [agnostic / malicious / bounded]: no prior guarantees (NA); our work achieves err(w) = η + ϵ, same guarantees as above, information-theoretically optimal.

32 Localization: both an algorithmic and an analysis tool! Useful for active and passive learning!

33 Localization: both an algorithmic and an analysis tool! Useful for active and passive learning! Well known for analyzing sample complexity [Bousquet-Boucheron-Lugosi 04], [BBL 06], [Hanneke 07], ... We show it is also useful for noise tolerant poly time algorithms. Previously observed in practice, e.g., for SVMs [Bottou 07].

34 Further Margin Based Refinements Efficient algorithms: [Hanneke-Kanade-Yang, ALT 15] O(d log 1/ϵ) label complexity for our poly time agnostic algorithm. [Daniely, COLT 15] achieves error C·η for any constant C in poly time in the agnostic setting [trading off running time with accuracy]. [Awasthi-Balcan-Haghtalab-Urner, COLT 15], [Awasthi-Balcan-Haghtalab-Zhang, COLT 16]: bounded noise, get error η + ϵ in time poly(d, 1/ϵ). [Yan-Zhang, NIPS 17]: a modified Perceptron enjoys similar guarantees. Active and differentially private: [Balcan-Feldman, NIPS 13]. General concept spaces: [Zhang-Chaudhuri, NIPS 14], which computes the region to localize in each round by using unlabeled data and solving an LP.

35 Discussion, Open Directions AL: important paradigm in the age of Big Data. Lots of exciting developments. Disagreement based AL: general sample complexity. Margin based AL: label efficient, noise tolerant, poly time algorithm for learning linear separators. Better noise tolerance than known for passive learning via poly time algorithms [KKMS 05] [KLS 09]. Solve an adaptive sequence of convex optimization problems on smaller and smaller bands around the current guess for the target.

36 Discussion, Open Directions AL: important paradigm in the age of Big Data. Lots of exciting developments. Disagreement based AL, general sample complexity. Margin based AL: label efficient, noise tolerant, poly time algo for learning linear separators. Open Directions More general distributions, other concept spaces. Exploit localization insights in other settings (e.g., online convex optimization with adversarial noise).

37 Important direction: richer interactions with the expert. Goals: better accuracy, fewer queries, natural interaction with the expert.
