Active Learning from Single and Multiple Annotators

1 Active Learning from Single and Multiple Annotators Kamalika Chaudhuri University of California, San Diego Joint work with Chicheng Zhang

2 Classification Given: (x_i, y_i), where x_i is a vector of features and y_i a discrete label. Find: a prediction rule in a class to predict y from x.

3 Challenge: Acquiring Labeled Data Unlabeled data is cheap Labels are expensive

4-6 Active Learning Given: (x_i, y_i), interactive label queries. Find: a prediction rule to predict y from x using few label queries.

7-11 Why Active Learning Helps? Given: unlabeled data, interactive label queries. Find: a good prediction rule using few label queries.

12 Challenge: Agnostic Active Learning

13 PAC Model: Realizable vs. Agnostic Given: concept class C, samples (x_i, y_i) from D. Find: c in C with low error. Realizable: there is a c* ∈ C such that c*(x) = y for all (x, y) ~ D. Agnostic: no assumptions on D.

14 Agnostic Active Learning Given: concept class C (the best c* in C has error ν), (x_i, y_i) drawn from D, interactive label queries. Find: c in C with error ≤ ν + ε using few label queries, with no assumptions on D.

15-17 Why is Agnostic Active Learning hard? Given: unlabeled data, interactive label queries, no assumptions on the data distribution. Find: a good prediction rule using few label queries. A naive query strategy can be statistically inconsistent: it finds a local minimum!

18 Talk Outline What is Active Learning? Active Learning from Single Annotator [NIPS 14] Active Learning from Multiple Annotators [NIPS 15]

19 Active Learning from a Single Annotator

20-21 Previous Work: Disagreement-based Active Learning 1. Maintain a candidate set V (that contains the best c in C). 2. For unlabeled x, if there exist c1, c2 in V s.t. c1(x) ≠ c2(x), then query the label of x and update V. [CAL94, BBL06, DHM07, H07, BDL09, BHLZ10, ...] Pro: very general. Con: high label requirement.
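
To make the query rule above concrete, here is a minimal sketch of a disagreement-based learner, assuming a finite candidate set V of classifiers; the function and oracle names are illustrative, not from the talk.

```python
# A minimal sketch of the disagreement-based (CAL-style) query rule.
# Assumptions: V is a finite list of classifiers (callables x -> {-1, +1}),
# `unlabeled_stream` is an iterable of points, and `query_label` is a
# hypothetical oracle call that returns the true label of x.

def disagreement_based_learner(V, unlabeled_stream, query_label):
    labeled = []
    for x in unlabeled_stream:
        predictions = {c(x) for c in V}
        if len(predictions) > 1:                 # some c1, c2 in V disagree on x
            y = query_label(x)                   # only then is a label queried
            labeled.append((x, y))
            # Realizable-case update: drop classifiers inconsistent with y.
            # (The agnostic version instead shrinks V via generalization bounds.)
            V = [c for c in V if c(x) == y]
        # If all of V agrees on x, its label is already determined; no query.
    return V, labeled
```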

22 Previous Work: Margin-based Active Learning Geometric algorithm for linear classification w.r.t. uniform and log-concave distributions on the sphere [BZ07, BL13, ABL14]. Pro: low label requirement. Con: not general.

23 Is there a general algorithm with better label complexity?

24 Talk Outline What is Active Learning? Active Learning from Single Annotator [NIPS 14] Previous Work Confidence Based Active Learning

25-26 Confidence-Rated Predictor (CRP) Classifiers that can abstain (output +1, -1, or "Don't Know"). Performance metrics: Error: rate of mistakes. Abstention rate: rate of "Don't Know" predictions.
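
A small helper makes the two metrics concrete. Here mistakes are counted only where the predictor commits to a label, and the rate is measured over all points, which is one natural reading of the slide; the names and the NumPy implementation are mine, not from the talk.

```python
import numpy as np

def crp_metrics(predictions, labels):
    """Error and abstention rate of a confidence-rated predictor.
    predictions take values in {-1, 0, +1}, with 0 meaning "Don't Know";
    labels take values in {-1, +1}."""
    predictions = np.asarray(predictions)
    labels = np.asarray(labels)
    abstained = (predictions == 0)
    # Mistakes are counted only where the predictor commits to a label;
    # the rate is over all points, matching the LP formulation later on.
    error_rate = np.mean((predictions != labels) & ~abstained)
    abstention_rate = np.mean(abstained)
    return error_rate, abstention_rate
```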

27-28 CRP with Guaranteed Error [EW10] Input: concept set V, unlabeled data U, error guarantee η. Goal: assign a label P(z) ∈ {+1, -1, 0} to each z in U. Guarantee: for all c in V, Pr_{z ∈ U}( P(z) ≠ c(z), P(z) ≠ 0 ) ≤ η.

29 Confidence-Based Active Learning Two Key Ideas: 1. A reduction from Active Learning to CRP with guaranteed error! 2. Construction of a CRP with guaranteed error

30 Algorithm Outline 1. Active Learning via CRP 2. How to construct a CRP with guaranteed error

31 Active Learning via CRP Given: unlabeled data U, concept class C, confidence-rated predictor P, target excess error ε. Goal: output c in C with the target excess error using the minimum number of label queries. Problem: Which labels to query?

32 Definition: Confidence Sets for c* Recall: c* is the concept in C with minimum error. Confidence set: a set S of concepts s.t. w.p. ≥ 1 - δ, c* is in S. Similar to confidence intervals.

33 How to construct confidence sets? Use generalization bounds. Known: ∀ c, w.p. ≥ 1 - δ, |err(c) - êrr(c)| ≤ √( VC(C) log(n/δ) / n ). Choose all c with: êrr(c) ≤ min_{c' ∈ C} êrr(c') + √( VC(C) log(n/δ) / n ). (A bit more complicated for active learning.)
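
As a sketch, the construction on this slide can be written directly for a finite set of candidate classifiers; the function name and the exact form of the slack are illustrative choices, not from the talk.

```python
import numpy as np

def confidence_set(classifiers, X, y, vc_dim, delta):
    """Sketch of the confidence-set construction on slide 33 for a finite list
    of candidate classifiers (callables x -> {-1, +1}) and n labeled samples.
    Keeps every c whose empirical error is within the generalization-bound
    slack of the empirical minimizer, so that c* stays in the set w.h.p."""
    n = len(y)
    emp_err = np.array([np.mean([c(xi) != yi for xi, yi in zip(X, y)])
                        for c in classifiers])
    slack = np.sqrt(vc_dim * np.log(n / delta) / n)
    return [c for c, e in zip(classifiers, emp_err) if e <= emp_err.min() + slack]
```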

34-41 Active Learning via CRP Given: unlabeled data U, concept class C, confidence-rated predictor P, target error ε.
Key Idea 1: Run P with error guarantee ε_k/64, and query labels where P abstains.
Key Idea 2: Query just enough labels to get target excess error ε_k/φ_k on the Don't Knows (φ_k = P's abstention probability); a low abstention probability means fewer labels queried.
1. Initialize the confidence set for c*: V_1 = C; let d = VCdim(C).
2. For epoch k = 1, 2, 3, ...:
(a) Target error for epoch k: ε_k = 2^(-k).
(b) Run the CRP P with V = V_k, the unlabeled data, and error guarantee η = ε_k/64.
(c) Whenever P abstains, query enough labels to get excess error ε_k/φ_k on the Don't Knows (φ_k is P's abstention rate).
(d) Update: confidence set V_{k+1} (based on the new labeled samples).
3. Output any c in V_{log(1/ε)}.
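
Below is a schematic rendering of this epoch loop, with every component the talk treats abstractly (the CRP, the labeling oracle, the sample-size rule, and the confidence-set update) passed in as assumed callables; only the constants follow the slides, everything else is a sketch.

```python
import math

def active_learning_via_crp(U, C, crp, query_label, labels_needed,
                            update_confidence_set, eps, delta):
    """Schematic version of the epoch loop on slides 34-41.  `crp(V, U, eta)`
    is assumed to return a predictor z -> {-1, 0, +1} with error guarantee eta
    on U (e.g., the LP-based CRP), `query_label(z)` asks the annotator,
    `labels_needed(target, delta)` returns a sample size for the desired excess
    error, and `update_confidence_set` rebuilds V_k from the labeled data via
    generalization bounds."""
    V = list(C)
    labeled = []
    num_epochs = int(math.ceil(math.log2(1.0 / eps)))
    for k in range(1, num_epochs + 1):
        eps_k = 2.0 ** (-k)                        # target excess error this epoch
        P = crp(V, U, eta=eps_k / 64.0)            # error guarantee eps_k / 64
        dont_knows = [z for z in U if P(z) == 0]   # points where P abstains
        phi_k = len(dont_knows) / max(len(U), 1)   # abstention rate
        if phi_k > 0:
            # Query just enough labels to reach excess error eps_k / phi_k on
            # the Don't Knows (schematic: in practice, sample i.i.d. from them).
            m_k = labels_needed(eps_k / phi_k, delta)
            for z in dont_knows[:m_k]:
                labeled.append((z, query_label(z)))
        V = update_confidence_set(V, labeled, delta)
    return V
```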

42 Algorithm Outline 1. Active Learning via CRP 2. How to construct a CRP with guaranteed error

43 CRP with Guaranteed Error [EW10] Input: concept set V, unlabeled data U, error guarantee η. Goal: assign a label P(z) ∈ {+1, -1, 0} to each z in U. Guarantee: for all c in V, Pr_{z ∈ U}( P(z) ≠ c(z), P(z) ≠ 0 ) ≤ η.

44-49 CRP with Guaranteed Error Given: concept set V, error guarantee η, unlabeled data U.
1. Construct and solve a linear program. For each z_i in U, introduce variables α_i (prob. of predicting +1 on z_i) and β_i (prob. of predicting -1 on z_i):
maximize Σ_i (α_i + β_i) / |U| (the non-abstention rate)
s.t. ∀ c ∈ V: Σ_{i: c(z_i) = 1} β_i + Σ_{i: c(z_i) = -1} α_i ≤ η |U| (error guarantee)
∀ i: α_i + β_i ≤ 1, α_i ≥ 0, β_i ≥ 0 (probability constraints)
2. For each z_i in U, output +1 w.p. α_i, -1 w.p. β_i, and 0 otherwise.
Note: this achieves the optimal abstention rate for the given concept set V.
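
This LP translates almost directly into code. Below is a minimal sketch using scipy.optimize.linprog, assuming V is represented by its prediction matrix on the unlabeled pool U; the variable names α_i, β_i follow the slides, everything else is an illustrative choice.

```python
import numpy as np
from scipy.optimize import linprog

def crp_lp(V_predictions, eta):
    """LP-based confidence-rated predictor from slides 44-49 (a sketch).
    V_predictions: array of shape (m, n); V_predictions[j, i] in {-1, +1} is the
    prediction of the j-th classifier in V on unlabeled point z_i.
    eta: error guarantee.  Returns (alpha, beta): alpha[i] / beta[i] are the
    probabilities of predicting +1 / -1 on z_i; abstain with the rest."""
    V_predictions = np.asarray(V_predictions)
    m, n = V_predictions.shape
    # Variables x = [alpha_1..alpha_n, beta_1..beta_n]; maximizing the
    # non-abstention rate sum_i (alpha_i + beta_i) means minimizing -sum(x).
    objective = -np.ones(2 * n)
    # Error guarantee: for every c in V,
    #   sum_{i: c(z_i) = -1} alpha_i + sum_{i: c(z_i) = +1} beta_i <= eta * n
    A_error = np.hstack([(V_predictions == -1).astype(float),   # alpha coeffs
                         (V_predictions == +1).astype(float)])  # beta coeffs
    b_error = np.full(m, eta * n)
    # Probability constraints: alpha_i + beta_i <= 1 for every i.
    A_prob = np.hstack([np.eye(n), np.eye(n)])
    b_prob = np.ones(n)
    result = linprog(objective,
                     A_ub=np.vstack([A_error, A_prob]),
                     b_ub=np.concatenate([b_error, b_prob]),
                     bounds=[(0.0, 1.0)] * (2 * n))
    alpha, beta = result.x[:n], result.x[n:]
    return alpha, beta
```

Re-solving this LP in each epoch over the current confidence set V_k is what lets the abstention rate adapt to how much the surviving classifiers actually disagree.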

50 Algorithm Outline 1. Active Learning via CRP 2. How to construct a CRP with guaranteed error

51 Talk Outline What is Active Learning? Active Learning from Single Annotator [NIPS 14] Previous Work Confidence Based Active Learning Performance Guarantees

52 Consistency Guarantees Theorem: If P is a CRP with guaranteed error, then our active learning algorithm is consistent. Note: Any CRP with guaranteed error gives consistency.

53-58 Label Complexity: Agnostic Case Parameters: target error ε, VC dimension d, error ν of the best c in C.
Disagreement-based: θ(2ν + ε) · ( d (ν²/ε²) log(1/ε) + d log²(1/ε) ), where θ(2ν + ε) is the rate of change of the disagreement region.
Our algorithm: φ(2ν + ε, ε/256) · ( d (ν²/ε²) log(1/ε) + d log²(1/ε) ), where φ(2ν + ε, ε/256) is the rate of change of the Don't-Know region, which is lower than θ(2ν + ε).
Passive learning: d log(1/ε) / ε².

59 Linear Classification on a Log-concave Distribution on the Sphere Our algorithm (realizable case): d log(1/ε) + log(1/ε) log log(1/ε) (matches margin-based active learning [BL13]). Disagreement-based active learning: d^(3/2) log²(1/ε) (log d + log log(1/ε)).

60 Conclusions Novel reduction from active learning to CRP: exploits the non-zero error guarantee for a better label requirement. Novel active learning algorithm: better asymptotic label complexity than disagreement-based, and more general than margin-based.

61 Talk Outline What is Active Learning? Active Learning from Single Annotator [NIPS 14] Previous Work Confidence Based Active Learning Performance Guarantees Active Learning from Multiple Annotators [NIPS 15]

62 What if we have auxiliary information... as an extra oracle?

63 Oracle and Weak Labeler Oracle: expensive but correct Weak labeler: cheap, sometimes wrong

64-66 The Model Given: (x_i, y_i), interactive label queries to the Oracle O or the Weak Labeler W. Find: a prediction rule to predict y from x using few label queries to O.

67-69 Formal Model Given: concept class C (the best c* in C has error ν w.r.t. O), (x_i, y_i) drawn from D, Oracle O and Weak Labeler W. Find: c in C with error ≤ ν + ε w.r.t. O using the minimum number of label queries to O.

70 Formal Model Implications The weak labeler W may be biased. [Figure: the input space labeled by the oracle O (y_O = ±1) and by the weak labeler W (y_W = ±1); the two labelings differ on part of the space.]

71 Previous Work [UBS12] Explicit assumptions on where W and O differ, provides algorithms in the non-parametric case [MCR14] No explicit assumptions, but applies to online selective linear classification and robust regression This talk: General learning strategy from W and O with no explicit assumptions

72 Talk Outline What is Active Learning? Active Learning from Single Annotator [NIPS 14] Active Learning from Multiple Annotators [NIPS 15] The Model Algorithm

73-74 How to learn in this model? Main ideas: 1. Learn a difference classifier h to predict when O and W differ. 2. Use h with standard active learning to decide whether to query O or W.

75-78 Algorithm Outline
1. Draw x_1, ..., x_m. For each x_i, query O and W. Set y_{i,d} = +1 if y_{i,O} ≠ y_{i,W} (and -1 otherwise).
2. Train a difference classifier h in H on { (x_i, y_{i,d}) }.
3. Run a standard disagreement-based active learning algorithm A. If A queries the label of x: if h(x) = 1, query O, else query W.
Is this statistically consistent?

79-83 Key Observation 1 Directly learning the difference classifier may lead to inconsistent annotation on the target task. [Figure: actual labels (y_O, y_W) vs. the annotation obtained by using the error-minimizing difference classifier h* to route queries ("Query O" where h* predicts a difference, "Query W" elsewhere); the resulting annotation can disagree with y_O.]

84-86 Solution Train a cost-sensitive difference classifier: constrain the false negative (FN) rate to be very low. [Figure: with the FN-constrained classifier h*_FN routing the queries, the resulting annotation agrees with the actual labels y_O.]

87 Algorithm Outline
1. Draw x_1, ..., x_m. For each x_i, query O and W. Set y_{i,d} = +1 if y_{i,O} ≠ y_{i,W} (and -1 otherwise).
2. Train a difference classifier h in H on { (x_i, y_{i,d}) } with false negative (FN) rate ≤ ε.
3. Run a standard disagreement-based active learning algorithm A. If A queries the label of x: if h(x) = 1, query O, else query W.
Theorem: This is statistically consistent.
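
A minimal sketch of steps 2-3 above. Heavy class weighting is used here as a practical stand-in for the hard false-negative-rate constraint in the analysis, and logistic regression is just an illustrative choice of the difference class H; neither is prescribed by the talk.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def train_difference_classifier(X, y_oracle, y_weak, fn_weight=100.0):
    """Sketch of step 2 on slide 87: learn h that predicts where O and W differ.
    A large weight on the "they differ" class discourages false negatives,
    standing in for the hard FN-rate constraint used in the analysis."""
    y_diff = (np.asarray(y_oracle) != np.asarray(y_weak)).astype(int)  # 1 = differ
    h = LogisticRegression(class_weight={0: 1.0, 1: fn_weight})
    return h.fit(X, y_diff)

def route_query(h, x, oracle, weak):
    """Step 3: when the active learner asks for the label of x, route the query."""
    if h.predict(np.asarray(x).reshape(1, -1))[0] == 1:
        return oracle(x)   # predicted disagreement: pay for the expensive oracle O
    return weak(x)         # otherwise trust the cheap weak labeler W
```

In the analysis the FN constraint only needs to hold over the current disagreement region, which is exactly the refinement the next slides make.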

88 What about label complexity? Label complexity = #label queries to O

89 #labels to train the difference classifier: Õ( d'/ε ) (d' = VCdim(H), ε = target excess error). Can we do better?

90-93 Key Observation 2 R = disagreement region of the current confidence set. We only need to learn the difference classifier with FN rate ≤ ε / Pr(R) over R, which needs Õ( d' Pr(R) / ε ) labels. Problem: R keeps changing.

94-97 Full Algorithm H = difference concept class, d' = VCdim(H), θ = disagreement coefficient.
For epochs k = 1, 2, 3, ...:
Target excess error in epoch k: ε_k ≈ 2^(-k).
Draw Õ( d' θ (ν + ε_k) / ε_k ) samples x_1, ..., x_m; for each x_i, query both O and W, and set y_{i,d} = +1 if y_{i,O} ≠ y_{i,W} (and -1 otherwise).
Train a difference classifier h on { (x_i, y_{i,d}) } with FN rate ≤ ε_k / ( θ (ν + ε_k) ) over the current disagreement region.
Run the disagreement-based active learning algorithm A to target excess error ε_k. If A queries the label of x: if h(x) = 1, query O, else query W.
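
A schematic single epoch of this algorithm, with the FN-constrained training routine, the disagreement-region test, and the disagreement-based learner passed in as assumed callables; the FN target mirrors the reconstructed expression on the slide and should be read as a sketch, not as the paper's exact constants.

```python
def weak_strong_epoch(k, nu, theta, X_pool, in_disagreement_region,
                      oracle, weak, train_difference_classifier, run_dbal):
    """Schematic epoch k of the full algorithm above.  All helpers are assumed
    callables: `in_disagreement_region(x)` tests membership in the current
    disagreement region R; `train_difference_classifier(X, y_O, y_W, fn_target)`
    fits an FN-constrained difference classifier over R and returns a predictor
    x -> {-1, +1} with +1 meaning "O and W differ"; `run_dbal` is a
    disagreement-based active learner that accepts a labeling function."""
    eps_k = 2.0 ** (-k)                          # target excess error this epoch
    region = [x for x in X_pool if in_disagreement_region(x)]
    # For clarity the sketch queries every point of R; the algorithm only
    # draws about d' * theta * (nu + eps_k) / eps_k of them.
    y_O = [oracle(x) for x in region]            # label queries charged to O
    y_W = [weak(x) for x in region]              # cheap queries to W
    fn_target = eps_k / (theta * (nu + eps_k))   # FN-rate target over R (schematic)
    h = train_difference_classifier(region, y_O, y_W, fn_target)
    # Route the active learner's label requests through h: predicted
    # disagreement goes to the oracle O, everything else to the weak labeler W.
    def label_fn(x):
        return oracle(x) if h(x) == 1 else weak(x)
    return run_dbal(X_pool, target_excess_error=eps_k, label_fn=label_fn)
```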

98 Label Complexity Total #labels to train the difference classifiers: Õ( d' θ (ν + ε) / ε ). How many labels for the rest of active learning?

99 Label Complexity: Assumptions For any r, t, there is an h in H such that: Pr( h(x) = -1, x ∈ DIS(B(c*, r)), y_O ≠ y_W ) ≤ t (low false negatives over the disagreement region), and Pr( h(x) = 1, x ∈ DIS(B(c*, r)) ) ≤ α(r, t) (few positive predictions). Note: α(r, t) ≤ Pr(DIS(B(c*, r))).

100 Label Complexity
#labels to train the difference classifiers: Õ( d' θ (ν + ε) / ε ).
#labels for the rest of active learning: Õ( α(2ν + ε, O(ε)) / (2ν + ε) · d ν²/ε² ), where α(2ν + ε, O(ε)) ≤ Pr(DIS(B(c*, 2ν + ε))) ≤ θ(2ν + ε) (2ν + ε).
Compare: #labels for disagreement-based active learning: Õ( θ(2ν + ε) · d ν²/ε² ).

101 Conclusion Auxiliary oracle may help if there is a good difference classifier in the difference concept class Open Question: What if there is no gold standard?

102 Thank You!
