Foundations For Learning in the Age of Big Data. Maria-Florina Balcan


1 Foundations For Learning in the Age of Big Data Maria-Florina Balcan

2 Modern Machine Learning New applications Explosion of data

3 Classic Paradigm Insufficient Nowadays. Modern applications: massive amounts of raw data, of which only a tiny fraction can be annotated by human experts (protein sequences, billions of webpages, images).

4 Classic Paradigm Insufficient Nowadays Modern applications: massive amounts of data distributed across multiple locations

5 Outline of the Talk. Interactive Learning: noise-tolerant, poly-time active learning algos; implications for passive learning; learning with richer interaction. Distributed Learning: communication as a key resource; communication-efficient algos.

6 Supervised Classification. Data Source: distribution D on X. Expert / Oracle labels examples by the target c* : X → {0,1}. Learning Algorithm receives labeled examples (x_1, c*(x_1)), …, (x_m, c*(x_m)) and outputs a classifier h : X → {0,1}.

7 Statistical / PAC Learning Model. Algo sees (x_1, c*(x_1)), …, (x_m, c*(x_m)), with the x_i drawn i.i.d. from D. Do optimization over the sample S, find a hypothesis h ∈ C. Goal: h has small error over D, where err(h) = Pr_{x∼D}[h(x) ≠ c*(x)]. If c* ∈ C: realizable case; otherwise: agnostic case.
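
A minimal sketch of this protocol for a finite class (not from the slides): draw an i.i.d. labeled sample and return the hypothesis in C with the smallest empirical error. The threshold class and the target below are illustrative assumptions.

```python
import random

def erm_finite_class(hypotheses, sample):
    """Return the hypothesis in C with the smallest empirical error on S."""
    def emp_err(h):
        return sum(h(x) != y for x, y in sample) / len(sample)
    return min(hypotheses, key=emp_err)

# Illustrative setup: thresholds on [0, 1], target c*(x) = 1 iff x >= 0.3.
target = lambda x: int(x >= 0.3)                      # c*, assumed for the sketch
C = [lambda x, t=t: int(x >= t) for t in [i / 100 for i in range(101)]]

S = [(x, target(x)) for x in (random.random() for _ in range(500))]
h = erm_finite_class(C, S)                            # whp err(h) is small
```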

8 Two Main Aspects in Classic Learning Theory Algorithm Design. How to optimize? Automatically generate rules that do well on observed data. E.g., Boosting, SVM, etc. Confidence Bounds, Generalization Guarantees Confidence for rule effectiveness on future data.

9 Sample Complexity Results. Confidence Bounds, Generalization Guarantees: confidence for rule effectiveness on future data. Agnostic case: replace 1/ε with 1/ε².
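
For a finite class C, the classic statements behind these guarantees are, for the realizable and agnostic cases respectively:

```latex
% Realizable case: with probability >= 1 - \delta, every h \in C consistent
% with the sample has err(h) <= \epsilon, provided
m \;\ge\; \frac{1}{\epsilon}\Big(\ln|C| + \ln\frac{1}{\delta}\Big)

% Agnostic case (Hoeffding + union bound): with probability >= 1 - \delta,
% every h \in C has |err(h) - \widehat{err}(h)| <= \epsilon, provided
m \;\ge\; \frac{1}{2\epsilon^{2}}\Big(\ln|C| + \ln\frac{2}{\delta}\Big)
```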

10 Interactive Machine Learning

11 Active Learning. Data Source provides unlabeled examples. Protocol: the Learning Algorithm repeatedly requests the label of an example and the Expert / Oracle returns a label for that example; the algorithm then outputs a classifier. Learner can choose specific examples to be labeled. Goal: use fewer labeled examples; need to pick informative examples to be labeled.

12 Active Learning in Practice. Text classification: active SVM (Tong & Koller, ICML 2000); e.g., request the label of the example closest to the current separator. Video segmentation (Fathi-Balcan-Ren-Rehg, BMVC 11).
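
A minimal sketch of the query rule just mentioned, with illustrative data and a Perceptron-style update standing in for the SVM: repeatedly request the label of the unlabeled point closest to the current separator.

```python
import numpy as np

rng = np.random.default_rng(0)
w_true = np.array([1.0, -0.5])                    # hypothetical target separator
pool = rng.normal(size=(1000, 2))                 # unlabeled pool
oracle = lambda x: np.sign(x @ w_true)            # expert / oracle

w = rng.normal(size=2)                            # current separator
queried = set()
for _ in range(30):                               # 30 label requests
    margins = np.abs(pool @ w)
    margins[list(queried)] = np.inf               # never re-query a point
    i = int(np.argmin(margins))                   # example closest to separator
    queried.add(i)
    x, y = pool[i], oracle(pool[i])
    if y * (x @ w) <= 0:                          # Perceptron-style update
        w += y * x
```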

13 Provable Guarantees, Active Learning. Canonical theoretical example [CAL92, Dasgupta04]: learning a threshold on the line. Active algorithm: sample O(1/ε) unlabeled examples and do binary search. Passive supervised: Ω(1/ε) labels to find an ε-accurate threshold. Active: only O(log 1/ε) labels. Exponential improvement.
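
A minimal sketch of that canonical example under its assumptions (noiseless threshold target, unlabeled pool on [0,1]): binary search over the sorted pool finds the decision boundary with O(log |pool|) label requests.

```python
import random

def active_threshold(pool, oracle):
    """Binary search over a sorted unlabeled pool; returns (estimate, #labels)."""
    xs = sorted(pool)
    lo, hi, queries = 0, len(xs) - 1, 0
    while lo < hi:
        mid = (lo + hi) // 2
        queries += 1
        if oracle(xs[mid]) == 1:      # labeled +: boundary is at or left of mid
            hi = mid
        else:                         # labeled -: boundary is right of mid
            lo = mid + 1
    return xs[lo], queries            # only O(log(1/eps)) labels used

eps, theta = 0.01, 0.37               # theta: hypothetical true threshold
pool = [random.random() for _ in range(int(1 / eps))]
est, q = active_threshold(pool, lambda x: 1 if x >= theta else 0)
```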

14 Disagreement-Based Active Learning. Disagreement-based algos: query points from the current region of disagreement; throw out hypotheses when statistically confident they are suboptimal. First analyzed in [Balcan, Beygelzimer, Langford 06] for the A² algorithm. Lots of subsequent work: [Hanneke07, DasguptaHsuMonteleoni07, Wang09, Friedman09, Koltchinskii10, BHW08, BeygelzimerHsuLangfordZhang10, Hsu10, Ailon12, …]. Generic (any class), adversarial label noise; but suboptimal in label complexity and computationally prohibitive.

15 Poly Time, Noise Tolerant, Label Optimal AL Algos.

16 Margin-Based Active Learning. Margin-based algo for learning linear separators. Realizable: exponential improvement, only O(d log 1/ε) labels to find w of error ε when D is log-concave [Balcan-Long COLT 13]; resolves an open question on the sample complexity of ERM. Agnostic & malicious noise: poly-time AL algo outputs w with err(w) = O(η), η = err(best linear separator) [Awasthi-Balcan-Long 2013]. First poly-time AL algo in noisy scenarios; first for malicious noise [Val85] (features corrupted too). Improves on the noise tolerance of the previously best passive algos [KKMS 05], [KLS 09] too!

17 Margin-Based Active Learning, Realizable Case.
Draw m_1 unlabeled examples, label them, add them to W(1).
Iterate k = 2, …, s:
  find a hypothesis w_{k-1} consistent with W(k-1);
  W(k) = W(k-1);
  sample m_k unlabeled examples x satisfying |w_{k-1} · x| ≤ γ_{k-1};
  label them and add them to W(k).
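
A minimal simulation of this loop, with illustrative stand-ins (Gaussian unlabeled data, a noiseless linear target, a Perceptron pass as the "find a consistent hypothesis" step) and the band width γ halved each round, as in the analysis that follows.

```python
import numpy as np

rng = np.random.default_rng(1)
d = 5
w_star = rng.normal(size=d); w_star /= np.linalg.norm(w_star)  # hypothetical target
oracle = lambda X: np.sign(X @ w_star)

def find_consistent(Wx, Wy, passes=50):
    """Perceptron stand-in for 'find a hypothesis consistent with W(k-1)'."""
    w = np.zeros(d)
    for _ in range(passes):
        for x, y in zip(Wx, Wy):
            if y * (x @ w) <= 0:
                w += y * x
    return w / np.linalg.norm(w)

X1 = rng.normal(size=(200, d))
Wx, Wy = list(X1), list(oracle(X1))               # W(1): m_1 labeled examples
gamma = 1.0
for k in range(2, 10):
    w = find_consistent(Wx, Wy)                   # w_{k-1}
    band = []
    while len(band) < 50:                         # m_k points with |w . x| <= gamma
        x = rng.normal(size=d)
        if abs(x @ w) <= gamma:
            band.append(x)
    Wx += band
    Wy += list(oracle(np.array(band)))            # label them, add to W(k)
    gamma /= 2                                    # shrink the band each round
```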

18 Margin-Based Active Learning, Realizable Case. Theorem: if D is log-concave in R^d, the band widths are set as γ_{k-1} = Θ(1/2^k), and each round uses m_k = O(d + log(k/δ)) labels, then after s = O(log(1/ε)) iterations err(w_s) ≤ ε. Log-concave distributions: log of the density function is concave. Wide class: uniform distribution over any convex set, Gaussian, logistic, etc. Major role in sampling & optimization [LV 07, KKMS 05, KLT 09].

19 Linear Separators, Log-Concave Distributions. Fact 1: the disagreement between two separators scales with the angle between them, Pr_{x∼D}[sign(u · x) ≠ sign(v · x)] = Θ(θ(u, v)). Proof idea: project the region of disagreement onto the 2-dimensional space spanned by u and v; use properties of log-concave distributions in 2 dimensions. Fact 2: the probability mass of a margin band scales with its width, Pr_{x∼D}[|v · x| ≤ γ] = Θ(γ).

20 Linear Separators, Log-Concave Distributions. Fact 3 (informal): if θ(u, v) ≤ α, then the part of the disagreement region between u and v lying outside a margin band of width Θ(α) around v has only a small fraction of the disagreement's probability mass.

21 Margin-Based Active Learning, Realizable Case.
Iterate k = 2, …, s:
  find a hypothesis w_{k-1} consistent with W(k-1);
  W(k) = W(k-1);
  sample m_k unlabeled examples x satisfying |w_{k-1} · x| ≤ γ_{k-1}; label them and add them to W(k).
Proof idea. Induction: all w consistent with W(k) have error ≤ 1/2^k; so, in particular, w_k has error ≤ 1/2^k. For the inductive step, need to push the error of every w consistent with W(k+1) below 1/2^{k+1}.

22 Proof Idea. Under a log-concave distribution, any w consistent with the new labels is within a small angle of w_{k-1}, so by Fact 3 its disagreement with w* outside the band {x : |w_{k-1} · x| ≤ γ_{k-1}} contributes at most half of the target error 1/2^{k+1}; only the error inside the band remains to be controlled.

23 Proof Idea. Under a log-concave distribution, the band {x : |w_{k-1} · x| ≤ γ_{k-1}} has probability mass Θ(γ_{k-1}) = Θ(1/2^k) by Fact 2, so it is enough to ensure a small constant error rate inside the band. Can do with only O(d + log(k/δ)) labels.

24 Margin-Based Analysis [Balcan-Long, COLT13]. Theorem (Active, Realizable): D log-concave in R^d; only O(d log 1/ε) labels to find w with err(w) ≤ ε. Also leads to an optimal bound for ERM passive learning. Theorem (Passive, Realizable): any w consistent with O((d + log(1/δ))/ε) labeled examples satisfies err(w) ≤ ε, with prob. 1-δ. Solves an open question for the uniform distribution [Long 95, 03], [Bshouty 09]. First tight bound for poly-time PAC algos for an infinite class of functions under a general class of distributions [Ehrenfeucht et al., 1989; Blumer et al., 1989].

25 Our Results [Awasthi-Balcan-Long 2013]. Theorem (Active, Agnostic or Malicious): poly(d, 1/ε)-time algo, uses only O(d² log 1/ε) labels to find w with err(w) ≤ c η log(1/η) + ε when D is log-concave in R^d. First poly-time algo for the agnostic case; first analysis for malicious noise [Val85] (features corrupted too). Theorem (Active, Agnostic or Malicious): poly(d, 1/ε)-time algo, uses only O(d² log 1/ε) labels to find w with err(w) ≤ c(η + ε) when D is uniform in R^d. Significantly improves over previously known results for passive learning too! [KKMS 05, KLS 09]

26 Margin-Based Active Learning, Agnostic Case.
Draw m_1 unlabeled examples, label them, add them to W.
Iterate k = 2, …, s:
  find w_k in B(w_{k-1}, r_{k-1}) of small τ_{k-1}-hinge loss wrt W;
  clear the working set W;
  sample m_k unlabeled examples x satisfying |w_k · x| ≤ γ_k; label them and add them to W.
End iterate.

27 Margin-Based Active Learning, Agnostic Case.
Draw m_1 unlabeled examples, label them, add them to W.
Iterate k = 2, …, s:
  find w_k in B(w_{k-1}, r_{k-1}) of small τ_{k-1}-hinge loss wrt W (localization in concept space; see the sketch below);
  clear the working set W;
  sample m_k unlabeled examples x satisfying |w_k · x| ≤ γ_k; label them and add them to W (localization in instance space).
End iterate.
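
A minimal sketch of the inner optimization step, under illustrative assumptions: minimize the τ-hinge loss over the current labeled band sample by projected subgradient descent, projecting back onto B(w_prev, r) after each step; the step size and iteration count are arbitrary choices for the sketch.

```python
import numpy as np

def hinge_in_ball(Wx, Wy, w_prev, r, tau, steps=500, lr=0.05):
    """Approximately minimize the tau-hinge loss
    (1/m) sum_i max(0, 1 - y_i (w . x_i) / tau) over w in B(w_prev, r),
    via projected subgradient descent."""
    w = w_prev.copy()
    m = len(Wy)
    for _ in range(steps):
        active = Wy * (Wx @ w) / tau < 1           # examples with nonzero loss
        grad = -(Wy[active][:, None] * Wx[active]).sum(axis=0) / (tau * m)
        w -= lr * grad
        diff = w - w_prev                          # project back onto B(w_prev, r)
        norm = np.linalg.norm(diff)
        if norm > r:
            w = w_prev + diff * (r / norm)
    return w

# Hypothetical usage inside round k of the loop above:
# w_k = hinge_in_ball(band_x, band_y, w_prev=w_km1, r=0.5, tau=0.1)
```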

28 Analysis: the Agnostic Case. Theorem: D log-concave in R^d. With the band widths γ_k, ball radii r_k, and hinge scales τ_k suitably set (all Θ(1/2^k)), err(w_s) ≤ c η log(1/η) + ε. Key ideas: as before, need the error inside the band to stay a small constant at each round; for w in B(w_{k-1}, r_{k-1}), the τ_{k-1}-hinge loss is a faithful proxy for the 0/1 error inside the band; the influence of noisy points is controlled by setting τ_k = Θ(γ_k); a careful variance analysis bounds the hinge loss over clean examples.

29 Analysis: Malicious Noise. Theorem: D log-concave in R^d. With γ_k, r_k, τ_k set as in the agnostic case, err(w_s) ≤ c η log(1/η) + ε, where now the adversary can corrupt both the label and the feature part of its examples. Key ideas: as before, need the error inside the band to stay a small constant at each round; soft localized outlier removal and a careful variance analysis.

30-31 Improves over Passive Learning too!

  Setting                            Prior Work            Our Work
  Passive, malicious noise           [KKMS 05], [KLS 09]   info-theoretically optimal
  Passive, agnostic noise            [KKMS 05], [KLS 09]   info-theoretically optimal
  Active, agnostic / malicious       NA                    info-theoretically optimal

Slightly better results for the uniform distribution case.

32 Localization: both an algorithmic and an analysis tool! Useful for active and passive learning!

33 Important direction: richer interactions with the expert. Goals: better accuracy, fewer queries, natural interaction.

34 New Types of Interaction [Balcan-Hanneke COLT 12]. Class Conditional Query: the learner shows the expert labeler raw data and asks for an example of a given class. Mistake Query: the learner shows the expert labeler raw data together with the classifier's proposed labels (e.g., dog, cat, penguin) and asks it to point out a mistake, e.g., a wolf among the dogs.

35 Class Conditional & Mistake Queries. Used in practice, e.g., Faces in iPhoto. Lack of theoretical understanding. Realizable (folklore): far fewer queries than label requests. [Balcan-Hanneke, COLT 12]: tight bounds on the number of CCQs needed to learn in the presence of noise (agnostic and bounded noise).

36 Important direction: richer interactions with the expert. Goals: better accuracy, fewer queries, natural interaction.

37 Distributed Learning

38 Distributed Learning Data distributed across multiple locations. E.g., medical data

39 Distributed Learning. Data distributed across multiple locations; each location has a piece of the overall data pie. To learn over the combined D, must communicate. Communication is expensive: President Obama cited Communication-Avoiding Algorithms in the FY 2012 Department of Energy budget request to Congress. Important question: how much communication? Plus: privacy & incentives.

40 Distributed PAC Learning [Balcan-Blum-Fine-Mansour, COLT 2012; Best Paper runner-up]. X instance space, k players. Player i can sample from D_i; samples labeled by c*. Goal: find h that approximates c* w.r.t. D = (1/k)(D_1 + … + D_k), with as little communication as possible. Main results: generic bounds on communication; broadly applicable communication-efficient distributed boosting; tight results for interesting cases (intersection-closed classes, parity functions, linear separators over nice distributions); privacy guarantees.

41 Interesting special case to think about: k = 2 players, where one has the positives and one has the negatives. How much communication is needed, e.g., for linear separators?

42 Active learning algos with good label complexity yield distributed learning algos with good communication complexity. So, for linear separators under a log-concave distribution, only O(d log(1/ε)) examples need be communicated.

43 Generic Results.
Baseline: O((d/ε) log(1/ε)) examples, 1 round of communication. Each player sends O((d/(εk)) log(1/ε)) examples to player 1; player 1 finds a consistent h ∈ C, whp error ≤ ε wrt D.
Distributed boosting: only O(d log(1/ε)) examples of communication.
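
A minimal simulation of the baseline protocol, with illustrative stand-ins (k players whose local distributions D_i are differently shifted Gaussians, a shared linear target, a Perceptron as player 1's consistent learner): each player ships its share of the sample to player 1, who learns centrally.

```python
import numpy as np

rng = np.random.default_rng(2)
d, k = 5, 4
w_star = rng.normal(size=d)                       # hypothetical shared target

def player_sample(i, m):
    """Player i draws m examples from its own D_i, labeled by c*."""
    X = rng.normal(loc=i, size=(m, d))            # D_i: shifted Gaussian (illustrative)
    return X, np.sign(X @ w_star)

m_each = 200                                      # stands in for O((d/(eps k)) log(1/eps))
parts = [player_sample(i, m_each) for i in range(k)]   # 1 round: all send to player 1
X = np.vstack([Xi for Xi, _ in parts])
y = np.concatenate([yi for _, yi in parts])

w = np.zeros(d)                                   # player 1: find consistent h in C
for _ in range(100):
    for xi, yi in zip(X, y):
        if yi * (xi @ w) <= 0:
            w += yi * xi
```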

44 Key Properties of AdaBoost.
Input: S = {(x_1, y_1), …, (x_m, y_m)}.
For t = 1, 2, …, T: construct D_t on {x_1, …, x_m}; run weak algo A on D_t, get h_t.
Output H_final = sgn(Σ_t α_t h_t).
D_1 is uniform on {x_1, …, x_m}; D_{t+1} increases the weight on x_i if h_t is incorrect on x_i and decreases it if h_t is correct:
D_{t+1}(i) = (D_t(i)/Z_t) · e^{-α_t} if y_i = h_t(x_i), and D_{t+1}(i) = (D_t(i)/Z_t) · e^{α_t} if y_i ≠ h_t(x_i).
Key points: D_{t+1}(x_i) depends only on h_1(x_i), …, h_t(x_i) and a normalization factor that can be communicated efficiently. To achieve weak learning it suffices to use O(d) examples.
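
The update above in code: a minimal AdaBoost with decision stumps; the stump learner and toy data are illustrative stand-ins for the weak algo A.

```python
import numpy as np

def adaboost(X, y, weak_learner, T):
    m = len(y)
    D = np.full(m, 1.0 / m)                       # D_1 uniform on the sample
    hs, alphas = [], []
    for _ in range(T):
        h = weak_learner(X, y, D)                 # run weak algo A on D_t
        pred = h(X)
        eps_t = max(D[pred != y].sum(), 1e-10)    # weighted error of h_t
        alpha = 0.5 * np.log((1 - eps_t) / eps_t)
        D *= np.exp(-alpha * y * pred)            # e^{-alpha} if correct, e^{alpha} if not
        D /= D.sum()                              # normalize by Z_t
        hs.append(h); alphas.append(alpha)
    return lambda X: np.sign(sum(a * h(X) for a, h in zip(alphas, hs)))

def stump_learner(X, y, D):
    """Weak learner: best single-feature threshold stump under D (illustrative)."""
    best_err, best_h = np.inf, None
    for j in range(X.shape[1]):
        for thr in np.unique(X[:, j]):
            for s in (1.0, -1.0):
                pred = s * np.where(X[:, j] >= thr, 1.0, -1.0)
                err = D[pred != y].sum()
                if err < best_err:
                    best_err = err
                    best_h = lambda X, j=j, t=thr, s=s: s * np.where(X[:, j] >= t, 1.0, -1.0)
    return best_h

rng = np.random.default_rng(3)
X = rng.normal(size=(200, 2)); y = np.sign(X[:, 0] + X[:, 1])
H = adaboost(X, y, stump_learner, T=10)           # H_final = sgn(sum_t alpha_t h_t)
```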

45 Distributed AdaBoost.
Each player i has a sample S_i from D_i.
For t = 1, 2, …, T:
  each player sends player 1 data to produce the weak hypothesis h_t (for t = 1, O(d/k) examples each);
  player 1 broadcasts h_t to the others;
  player i reweights its own distribution on S_i using h_t and sends the sum of its weights w_{i,t} to player 1;
  player 1 determines the number of samples n_{i,t+1} to request from each player i (samples O(d) times from the multinomial given by w_{i,t}/W_t).
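
A sketch of one round of the communication pattern only, under illustrative assumptions: players report their local weight sums w_{i,t}; the coordinator draws its O(d)-example budget from the multinomial given by w_{i,t}/W_t and requests that many samples from each player.

```python
import numpy as np

rng = np.random.default_rng(4)

def allocate_requests(local_weight_sums, budget):
    """Player 1: given each player's reported weight sum w_{i,t}, decide how many
    examples n_{i,t+1} to request from each player (O(d) draws in total)."""
    sums = np.asarray(local_weight_sums, dtype=float)
    return rng.multinomial(budget, sums / sums.sum())   # multinomial on w_{i,t}/W_t

# Illustrative: 4 players, each holding AdaBoost weights over its local sample S_i.
local_weights = [rng.random(100) for _ in range(4)]
w_sums = [w.sum() for w in local_weights]               # each player sends one number
n = allocate_requests(w_sums, budget=50)
# Player i then draws n[i] examples from S_i proportionally to its local weights
# and sends them to player 1, which trains the next weak hypothesis h_t.
```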

46 Communication: a Fundamental Resource in DL.
Theorem: can learn any class C in O(log(1/ε)) rounds, using O(d) examples plus one hypothesis of communication per round. Key: in AdaBoost, O(log 1/ε) rounds suffice to achieve error ε.
Theorem: in the agnostic case, can learn to error O(OPT) + ε using only O(k log|C| log(1/ε)) examples. Key: distributed implementation of the Robust Halving technique developed for learning with mistake queries [Balcan-Hanneke 12].

47 Distributed Clustering [Balcan-Ehrlich-Liang, NIPS 2013].
k-median: find center points c_1, c_2, …, c_r to minimize Σ_x min_i d(x, c_i).
k-means: find center points c_1, c_2, …, c_r to minimize Σ_x min_i d²(x, c_i).
Key idea: use coresets, short summaries capturing the relevant info w.r.t. all clusterings. [Feldman-Langberg STOC 11] show that in the centralized setting one can construct a coreset whose size is independent of the data size. Naively combining local coresets gives a global coreset, but its size goes up multiplicatively in the number of sites. [Balcan-Ehrlich-Liang, NIPS 2013] give a 2-round procedure whose communication matches the size of a single coreset (plus low-order terms), as opposed to the number of sites times the coreset size.
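
A minimal sketch of the combining step, under illustrative assumptions: if each site sends a weighted coreset (C_i, w_i) of its local data, the union of these weighted sets serves as a summary of the combined data, and the coordinator evaluates clustering costs on it. Constructing the local coresets themselves (e.g., by sensitivity sampling as in [Feldman-Langberg STOC 11]) is not shown.

```python
import numpy as np

def weighted_kmeans_cost(points, weights, centers):
    """Coreset estimate of the k-means cost: sum of weight * squared distance
    to the nearest center."""
    d2 = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return float((weights * d2.min(axis=1)).sum())

rng = np.random.default_rng(5)
# Each site i sends a weighted summary (C_i, w_i) of its local data (illustrative).
local_coresets = [(rng.normal(loc=i, size=(20, 2)), rng.random(20) + 1.0)
                  for i in range(3)]

# Coordinator: the union of the local weighted summaries summarizes the union
# of the local data sets.
P = np.vstack([C for C, _ in local_coresets])
w = np.concatenate([wt for _, wt in local_coresets])
centers = P[rng.choice(len(P), size=2, replace=False)]  # toy candidate centers
cost = weighted_kmeans_cost(P, w, centers)
```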

48 Distributed Clustering [Balcan-Ehrlich-Liang, NIPS 2013]. k-means: find center points c_1, c_2, …, c_k to minimize Σ_x min_i d²(x, c_i).

49 Discussion. Communication as a fundamental resource. Open questions: other learning or optimization tasks; refined trade-offs between communication complexity, computational complexity, and sample complexity; analyzing such issues in the context of transfer learning over large collections of related tasks (e.g., NELL).

