ORIE 4741: Learning with Big Messy Data. Generalization


ORIE 4741: Learning with Big Messy Data
Generalization
Professor Udell
Operations Research and Information Engineering
Cornell
September 23

Announcements
- midterm 10/5
- makeup exam 10/2, by arrangement with the instructors
- homework due next Tuesday
- substitute next week: Sumanta Basu on decision trees!
- I won't hold OH next week

Generalization and Overfitting
- the goal of a model is not to predict well on D
- the goal of a model is to predict well on new data
depending on its training set error and test set error, we say the model:

                            low test set error    high test set error
  low training set error    generalizes           overfits
  high training set error   ?!?!                  underfits

Simplest case: generalizing from a mean
exit polling: sample n voters leaving polling places*
- for each voter i, define the Boolean random variable
      z_i = 1 if voter i voted for Clinton, 0 otherwise
- sample mean: ν = (1/n) Σ_{i=1}^n z_i
- true mean: μ = E z_i over the US electorate, Clinton's expected vote share
what does the sample mean ν tell us about the true mean μ?

* suppose no one votes by mail or early or ..., so these are iid samples from the US electorate
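The polling setup above can be sketched numerically. The helper below is hypothetical (not part of the lecture): it draws n iid Boolean votes with true mean μ and returns the sample mean ν.

```python
import random

def poll_sample_mean(mu, n, seed=0):
    """Draw n iid Boolean votes z_i with P(z_i = 1) = mu (the true mean)
    and return the sample mean nu = (1/n) * sum(z_i)."""
    rng = random.Random(seed)
    z = [1 if rng.random() < mu else 0 for _ in range(n)]
    return sum(z) / n

# with a large poll, the sample mean should land close to the true mean
nu = poll_sample_mean(mu=0.48, n=100_000)
```

With n = 100,000 samples, ν typically falls within a fraction of a percent of μ; the next slide quantifies how quickly this happens.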

Hoeffding inequality
Theorem (Hoeffding Inequality). Let z_i ∈ {0, 1}, i = 1, ..., n, be independent Boolean random variables with mean E z_i = μ. Define the sample mean ν = (1/n) Σ_{i=1}^n z_i. Then for any ε > 0,
    P[|ν − μ| > ε] ≤ 2 exp(−2ε²n).
- an example of a concentration inequality
- μ can't be much higher than ν
- μ can't be much lower than ν
- more samples n improve the estimate exponentially quickly
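As a quick numerical check (a sketch, not from the slides), we can estimate the deviation probability P[|ν − μ| > ε] by repeated simulation and compare it to the Hoeffding bound 2 exp(−2ε²n); the empirical frequency should fall below the bound.

```python
import math
import random

def deviation_probability(mu, n, eps, trials=2000, seed=1):
    """Estimate P[|nu - mu| > eps] over many simulated polls of size n."""
    rng = random.Random(seed)
    bad = 0
    for _ in range(trials):
        nu = sum(rng.random() < mu for _ in range(n)) / n
        if abs(nu - mu) > eps:
            bad += 1
    return bad / trials

n, eps = 200, 0.1
empirical = deviation_probability(mu=0.5, n=n, eps=eps)
bound = 2 * math.exp(-2 * eps**2 * n)  # Hoeffding bound, about 0.037 here
```

The bound is loose (the true deviation probability here is well under 1%), but it holds for every n, ε, and μ.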

Back to the learning problem
fix a hypothesis h: X → Y. take
    z_i = 1 if y_i = h(x_i), 0 otherwise
        = 1(y_i = h(x_i))
example: build a model of voting behavior:
- y_i = f(x_i) is 1 if voter i voted for Clinton, 0 otherwise
- h(x_i) is our guess of how the voter will vote, using hypothesis h
- z_i is 1 if we guess correctly for voter i, 0 otherwise
- z_i depends on x_i, y_i, and h

Adding in probability
make our model probabilistic:
- fix a probability distribution P(x, y)
- sample (x_i, y_i) iid from P(x, y)
form the data set D by sampling: for i = 1, ..., n,
- sample (x_i, y_i) ~ P(x, y)
- set D = {(x_1, y_1), ..., (x_n, y_n)}
special case: y = f(x) is deterministic conditioned on x:
    P(y | x) = 1 if y = f(x), 0 otherwise
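A minimal sketch of this sampling procedure (the helper names are hypothetical; labels use the deterministic special case y = f(x)):

```python
import random

def sample_dataset(n, f, sample_x, seed=0):
    """Form D = {(x_1, y_1), ..., (x_n, y_n)} by iid sampling.
    Labels use the deterministic special case y = f(x)."""
    rng = random.Random(seed)
    return [(x, f(x)) for x in (sample_x(rng) for _ in range(n))]

# toy distribution: x uniform on [0, 1]; the voter votes for Clinton iff x > 0.5
f = lambda x: 1 if x > 0.5 else 0
D = sample_dataset(10, f, sample_x=lambda rng: rng.random())
```

In the general (noisy) case, one would sample y from P(y | x) instead of computing it deterministically.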

Hoeffding for the noisy learning problem
- fix a hypothesis h: X → Y
- draw samples (x_i, y_i) iid from P(x, y) to form D = {(x_1, y_1), ..., (x_n, y_n)}
- take z_i = 1(y_i = h(x_i))
- the z_i are iid (since the (x_i, y_i) are iid, and h is fixed)
- E z = E_{(x,y)~P(x,y)} 1(y = h(x))
so we can apply Hoeffding! for any ε > 0,
    P[|(1/n) Σ_{i=1}^n z_i − E z| > ε] ≤ 2 exp(−2ε²n)
Q: what is the probability over?
A: other data sets D = {(x_1, y_1), ..., (x_n, y_n)} drawn iid according to P(x, y)
Q: is (1/n) Σ_{i=1}^n z_i more like training set error or test set error?
A: test set error, since h is independent of D

In-sample and out-of-sample error
some new terminology:
in-sample error:
    E_in(h) = fraction of D where y_i and h(x_i) disagree
            = (1/n) Σ_{i=1}^n 1(y_i ≠ h(x_i))
out-of-sample error:
    E_out(h) = probability that y and h(x) disagree
             = P_{(x,y)~P(x,y)}[y ≠ h(x)]
notice E_out(h) = E[E_in(h)], where the expectation is over data sets D
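The in-sample error definition translates directly into code. The helper below is a sketch (not from the lecture): it counts the fraction of D on which y and h(x) disagree.

```python
def in_sample_error(h, D):
    """E_in(h): fraction of the data set D where y_i and h(x_i) disagree."""
    return sum(1 for x, y in D if h(x) != y) / len(D)

# a fixed threshold hypothesis on a toy data set
D = [(0.2, 0), (0.4, 0), (0.6, 1), (0.9, 0)]
h = lambda x: 1 if x > 0.5 else 0
# h misclassifies only (0.9, 0), so E_in(h) = 1/4
```

E_out(h) cannot be computed this way: it requires the unknown distribution P(x, y), which is exactly why we need concentration bounds to relate the two.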

Hoeffding for the noisy learning problem
- fix a hypothesis h: X → Y
- consider (x_i, y_i) as samples drawn from P(x, y)
- take z_i = 1(y_i = h(x_i))
- the z_i are iid (since the (x_i, y_i) are iid, and h is fixed)
- E z = E_{(x,y)~P(x,y)} 1(y = h(x))
apply Hoeffding: for any ε > 0,
    P[|E_in(h) − E_out(h)| > ε] ≤ 2 exp(−2ε²n)

Does Hoeffding work for our learned model?
two scenarios:
1. Without looking at any data, pick a model h: X → Y to predict who will vote for Clinton. Then sample data D = {(x_1, y_1), ..., (x_n, y_n)}, and set z_i = 1(y_i = h(x_i)).
2. Sample the data D = {(x_1, y_1), ..., (x_n, y_n)}, and use it to develop a model g: X → Y to predict who will vote for Clinton (e.g., using the perceptron). Set z'_i = 1(y_i = g(x_i)).
Is the sample mean (1/n) Σ_{i=1}^n z_i a good estimate for the expected performance E z? Is (1/n) Σ_{i=1}^n z'_i a good estimate for E z'?
Q: Are the z_i's iid? What about the z'_i's?
A: The z_i's are iid. The z'_i's are not independent: they depend on g, which depends on the whole data set D = {(x_1, y_1), ..., (x_n, y_n)}.
Q: Does Hoeffding apply to the first? the second?
A: Hoeffding applies to the first, not to the second. Extreme case for the second scenario: the model memorizes the data.
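The "extreme case" in the answer, a model that memorizes its data, can be sketched as follows (a toy construction, not from the slides): the labels are pure coin flips, so no hypothesis can beat 50% out of sample, yet the memorizer's in-sample error is exactly zero.

```python
import random

def memorizer(D):
    """Scenario 2, taken to the extreme: g memorizes the data set D.
    On points in D it is always right; elsewhere it guesses 0."""
    table = dict(D)
    return lambda x: table.get(x, 0)

rng = random.Random(2)
D = [(i, rng.randint(0, 1)) for i in range(100)]           # random labels
g = memorizer(D)
E_in = sum(g(x) != y for x, y in D) / len(D)               # 0.0 by construction
D_new = [(i + 1000, rng.randint(0, 1)) for i in range(100)]
E_out_est = sum(g(x) != y for x, y in D_new) / len(D_new)  # near 0.5
```

Here E_in(g) = 0 tells us nothing about E_out(g): the gap is as large as it can possibly be.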

Recall validation procedure
how to decide which model to use?
- split data into training set D_train and test set D_test
- pick m different interesting model classes, e.g., different feature maps φ_1, φ_2, ..., φ_m
- fit ("train") models on the training set D_train: get one model h: X → Y for each φ, and set H = {h_1, h_2, ..., h_m}
- compute the error of each model on the test set D_test and choose the lowest:
      g = argmin_{h ∈ H} E_Dtest(h)
Q: Are {z_i = 1(y_i = g(x_i)) : (x_i, y_i) ∈ D_test} independent?
A: No; g was chosen using D_test! Hoeffding does not directly apply: E_Dtest(g) may not accurately estimate E_out(g).

The union bound
recall the union bound: for two random events A and B,
    P(A ∪ B) ≤ P(A) + P(B)
Q: When is this bound tight? i.e., when is P(A ∪ B) = P(A) + P(B)?
A: When A ∩ B = ∅

Rescuing Hoeffding: the union bound
- suppose H is finite, with m hypotheses in it
- the hypothesis g is one of those m hypotheses
- so if (given a data set D) |E_in(g) − E_out(g)| > ε, then for some h ∈ H, |E_in(h) − E_out(h)| > ε
  (g depends on the data set; we might choose different h's for different data sets)
so
    P[|E_in(g) − E_out(g)| > ε] ≤ Σ_{h ∈ H} P[|E_in(h) − E_out(h)| > ε]
                                ≤ Σ_{h ∈ H} 2 exp(−2ε²n)
                                = 2m exp(−2ε²n)
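The bound 2m exp(−2ε²n) is easy to evaluate, and inverting it tells us how many samples certify a given accuracy. The helpers below are a sketch (not from the slides): solving 2m exp(−2ε²n) ≤ δ for n gives n ≥ ln(2m/δ) / (2ε²).

```python
import math

def generalization_bound(m, n, eps):
    """Union-bound Hoeffding: P[|E_in(g) - E_out(g)| > eps] <= 2m exp(-2 eps^2 n)."""
    return 2 * m * math.exp(-2 * eps**2 * n)

def samples_needed(m, eps, delta):
    """Smallest n that drives the bound below delta:
    n >= ln(2m / delta) / (2 eps^2)."""
    return math.ceil(math.log(2 * m / delta) / (2 * eps**2))

# e.g., to certify |E_in - E_out| <= 0.05 with 95% confidence over m = 100 hypotheses
n = samples_needed(m=100, eps=0.05, delta=0.05)  # = 1659 samples
```

Note that n grows only logarithmically in m: even a large (finite) hypothesis class costs relatively few extra samples.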

Hoeffding for learning
we just proved that our learning algorithm generalizes!
Theorem (Generalization bound for learning). Let g be a hypothesis chosen from among m different hypotheses. Then
    P[|E_in(g) − E_out(g)| > ε] ≤ 2m exp(−2ε²n).
Q: do you think this bound is tight?
A: no, it can overcount badly. for random events A and B, if P(A ∩ B) is large, then P(A ∪ B) ≪ P(A) + P(B)
more information: look up the Vapnik–Chervonenkis (VC) dimension, e.g., in Learning from Data, by Abu-Mostafa, Magdon-Ismail, and Lin.

A tradeoff for learning
- we want H to be big, to make E_in small
- we want H to be small, to ensure E_out is close to E_in
what does this tell us about the difficulty of learning complicated functions f?

Generalization for regression
Theorem (Generalization bound for learning). Let g be a hypothesis chosen from among m different hypotheses. Then
    P[|E_in(g) − E_out(g)| > ε] ≤ 2m exp(−2ε²n).
to apply Hoeffding to real-valued outputs:
- pick some small threshold ε′ > 0
- 1((y_i − h(x_i))² ≥ ε′) is 0 if hypothesis h predicts well, 1 if hypothesis h predicts poorly
- define the error of hypothesis h on a data set D as
      E_D(h) = (1/|D|) Σ_{(x,y) ∈ D} 1((y − h(x))² ≥ ε′)
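The thresholded regression error translates directly into code (a sketch; `eps` plays the role of the small threshold on the slide):

```python
def regression_error(h, D, eps):
    """E_D(h) with the thresholded loss 1((y - h(x))^2 >= eps):
    a prediction counts as an error when its squared error is at least eps."""
    return sum((y - h(x)) ** 2 >= eps for x, y in D) / len(D)

# toy: squared errors are 0.0, 0.01, 0.25; with eps = 0.1 only the last counts
D = [(1.0, 2.0), (2.0, 4.1), (3.0, 6.5)]
h = lambda x: 2 * x
```

Since the thresholded loss is again Boolean, the finite-hypothesis bound above applies to it unchanged.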

Recap
- we introduced a probabilistic framework for generating data
- we showed that the in-sample error predicts the out-of-sample error for a single hypothesis
- we showed that the in-sample error predicts the out-of-sample error for a learned hypothesis, when H is finite
- we stopped there, because the math gets much more complicated; but indeed, generalization is possible!
the practical lesson: (especially for complex models) don't learn and estimate your error on the same data set

in, out, train, test
- the in-sample error and training error do not obey the Hoeffding inequality (without using, e.g., a union bound)
- if the test set is only used for testing (not for model selection), the test error does obey the Hoeffding inequality:
      P[|E_test(g) − E_out(g)| > ε] ≤ 2 exp(−2ε² |D_test|)
- so we can use the test error to predict generalization

Overfitting to the test set
if the test set is used for model selection, the test error obeys the Hoeffding inequality with the union bound:
- for each model family, the optimal model trained on the training set is a hypothesis h
- so a finite number of models m gives a finite hypothesis space H
- the hypotheses h ∈ H are independent of the test set D
- let g_D be the hypothesis h ∈ H with lowest error on the test set D
- Hoeffding with the union bound applies!
      P[|E_D(g_D) − E_out(g_D)| > ε] ≤ 2m exp(−2ε² |D|)

References
- Concentration bounds for infinite model classes: see the introduction to VC dimension in Learning from Data by Abu-Mostafa et al.
- Concentration bounds for cross validation:
- Concentration bounds for time series: see papers by Cosma Shalizi


More information

Computational Learning Theory (VC Dimension)

Computational Learning Theory (VC Dimension) Computational Learning Theory (VC Dimension) 1 Difficulty of machine learning problems 2 Capabilities of machine learning algorithms 1 Version Space with associated errors error is the true error, r is

More information

Statistical learning. Chapter 20, Sections 1 4 1

Statistical learning. Chapter 20, Sections 1 4 1 Statistical learning Chapter 20, Sections 1 4 Chapter 20, Sections 1 4 1 Outline Bayesian learning Maximum a posteriori and maximum likelihood learning Bayes net learning ML parameter learning with complete

More information

Part of the slides are adapted from Ziko Kolter

Part of the slides are adapted from Ziko Kolter Part of the slides are adapted from Ziko Kolter OUTLINE 1 Supervised learning: classification........................................................ 2 2 Non-linear regression/classification, overfitting,

More information

Lecture : Probabilistic Machine Learning

Lecture : Probabilistic Machine Learning Lecture : Probabilistic Machine Learning Riashat Islam Reasoning and Learning Lab McGill University September 11, 2018 ML : Many Methods with Many Links Modelling Views of Machine Learning Machine Learning

More information

Lecture Learning infinite hypothesis class via VC-dimension and Rademacher complexity;

Lecture Learning infinite hypothesis class via VC-dimension and Rademacher complexity; CSCI699: Topics in Learning and Game Theory Lecture 2 Lecturer: Ilias Diakonikolas Scribes: Li Han Today we will cover the following 2 topics: 1. Learning infinite hypothesis class via VC-dimension and

More information

Machine Learning. Computational Learning Theory. Le Song. CSE6740/CS7641/ISYE6740, Fall 2012

Machine Learning. Computational Learning Theory. Le Song. CSE6740/CS7641/ISYE6740, Fall 2012 Machine Learning CSE6740/CS7641/ISYE6740, Fall 2012 Computational Learning Theory Le Song Lecture 11, September 20, 2012 Based on Slides from Eric Xing, CMU Reading: Chap. 7 T.M book 1 Complexity of Learning

More information

Lecture 3: Introduction to Complexity Regularization

Lecture 3: Introduction to Complexity Regularization ECE90 Spring 2007 Statistical Learning Theory Instructor: R. Nowak Lecture 3: Introduction to Complexity Regularization We ended the previous lecture with a brief discussion of overfitting. Recall that,

More information

Supervised Machine Learning (Spring 2014) Homework 2, sample solutions

Supervised Machine Learning (Spring 2014) Homework 2, sample solutions 58669 Supervised Machine Learning (Spring 014) Homework, sample solutions Credit for the solutions goes to mainly to Panu Luosto and Joonas Paalasmaa, with some additional contributions by Jyrki Kivinen

More information

Decision Trees. Nicholas Ruozzi University of Texas at Dallas. Based on the slides of Vibhav Gogate and David Sontag

Decision Trees. Nicholas Ruozzi University of Texas at Dallas. Based on the slides of Vibhav Gogate and David Sontag Decision Trees Nicholas Ruozzi University of Texas at Dallas Based on the slides of Vibhav Gogate and David Sontag Supervised Learning Input: labelled training data i.e., data plus desired output Assumption:

More information

Lecture 8. Instructor: Haipeng Luo

Lecture 8. Instructor: Haipeng Luo Lecture 8 Instructor: Haipeng Luo Boosting and AdaBoost In this lecture we discuss the connection between boosting and online learning. Boosting is not only one of the most fundamental theories in machine

More information

1 Active Learning Foundations of Machine Learning and Data Science. Lecturer: Maria-Florina Balcan Lecture 20 & 21: November 16 & 18, 2015

1 Active Learning Foundations of Machine Learning and Data Science. Lecturer: Maria-Florina Balcan Lecture 20 & 21: November 16 & 18, 2015 10-806 Foundations of Machine Learning and Data Science Lecturer: Maria-Florina Balcan Lecture 20 & 21: November 16 & 18, 2015 1 Active Learning Most classic machine learning methods and the formal learning

More information

Does Unlabeled Data Help?

Does Unlabeled Data Help? Does Unlabeled Data Help? Worst-case Analysis of the Sample Complexity of Semi-supervised Learning. Ben-David, Lu and Pal; COLT, 2008. Presentation by Ashish Rastogi Courant Machine Learning Seminar. Outline

More information

18.9 SUPPORT VECTOR MACHINES

18.9 SUPPORT VECTOR MACHINES 744 Chapter 8. Learning from Examples is the fact that each regression problem will be easier to solve, because it involves only the examples with nonzero weight the examples whose kernels overlap the

More information

Logistic Regression. Machine Learning Fall 2018

Logistic Regression. Machine Learning Fall 2018 Logistic Regression Machine Learning Fall 2018 1 Where are e? We have seen the folloing ideas Linear models Learning as loss minimization Bayesian learning criteria (MAP and MLE estimation) The Naïve Bayes

More information

CS340 Machine learning Lecture 5 Learning theory cont'd. Some slides are borrowed from Stuart Russell and Thorsten Joachims

CS340 Machine learning Lecture 5 Learning theory cont'd. Some slides are borrowed from Stuart Russell and Thorsten Joachims CS340 Machine learning Lecture 5 Learning theory cont'd Some slides are borrowed from Stuart Russell and Thorsten Joachims Inductive learning Simplest form: learn a function from examples f is the target

More information

Computational Learning Theory

Computational Learning Theory CS 446 Machine Learning Fall 2016 OCT 11, 2016 Computational Learning Theory Professor: Dan Roth Scribe: Ben Zhou, C. Cervantes 1 PAC Learning We want to develop a theory to relate the probability of successful

More information

Linear Regression. Machine Learning CSE546 Kevin Jamieson University of Washington. Oct 5, Kevin Jamieson 1

Linear Regression. Machine Learning CSE546 Kevin Jamieson University of Washington. Oct 5, Kevin Jamieson 1 Linear Regression Machine Learning CSE546 Kevin Jamieson University of Washington Oct 5, 2017 1 The regression problem Given past sales data on zillow.com, predict: y = House sale price from x = {# sq.

More information

Introduction to Machine Learning

Introduction to Machine Learning Introduction to Machine Learning Thomas G. Dietterich tgd@eecs.oregonstate.edu 1 Outline What is Machine Learning? Introduction to Supervised Learning: Linear Methods Overfitting, Regularization, and the

More information

Introduction to Bayesian Learning. Machine Learning Fall 2018

Introduction to Bayesian Learning. Machine Learning Fall 2018 Introduction to Bayesian Learning Machine Learning Fall 2018 1 What we have seen so far What does it mean to learn? Mistake-driven learning Learning by counting (and bounding) number of mistakes PAC learnability

More information

Computational and Statistical Learning theory

Computational and Statistical Learning theory Computational and Statistical Learning theory Problem set 2 Due: January 31st Email solutions to : karthik at ttic dot edu Notation : Input space : X Label space : Y = {±1} Sample : (x 1, y 1,..., (x n,

More information

CSC 411 Lecture 3: Decision Trees

CSC 411 Lecture 3: Decision Trees CSC 411 Lecture 3: Decision Trees Roger Grosse, Amir-massoud Farahmand, and Juan Carrasquilla University of Toronto UofT CSC 411: 03-Decision Trees 1 / 33 Today Decision Trees Simple but powerful learning

More information

Machine Learning, Fall 2009: Midterm

Machine Learning, Fall 2009: Midterm 10-601 Machine Learning, Fall 009: Midterm Monday, November nd hours 1. Personal info: Name: Andrew account: E-mail address:. You are permitted two pages of notes and a calculator. Please turn off all

More information

Deconstructing Data Science

Deconstructing Data Science econstructing ata Science avid Bamman, UC Berkeley Info 290 Lecture 6: ecision trees & random forests Feb 2, 2016 Linear regression eep learning ecision trees Ordinal regression Probabilistic graphical

More information

Introduction to Algorithms / Algorithms I Lecturer: Michael Dinitz Topic: Intro to Learning Theory Date: 12/8/16

Introduction to Algorithms / Algorithms I Lecturer: Michael Dinitz Topic: Intro to Learning Theory Date: 12/8/16 600.463 Introduction to Algorithms / Algorithms I Lecturer: Michael Dinitz Topic: Intro to Learning Theory Date: 12/8/16 25.1 Introduction Today we re going to talk about machine learning, but from an

More information

Machine Learning Theory (CS 6783)

Machine Learning Theory (CS 6783) Machine Learning Theory (CS 6783) Tu-Th 1:25 to 2:40 PM Kimball, B-11 Instructor : Karthik Sridharan ABOUT THE COURSE No exams! 5 assignments that count towards your grades (55%) One term project (40%)

More information

Introduction to Machine Learning Fall 2017 Note 5. 1 Overview. 2 Metric

Introduction to Machine Learning Fall 2017 Note 5. 1 Overview. 2 Metric CS 189 Introduction to Machine Learning Fall 2017 Note 5 1 Overview Recall from our previous note that for a fixed input x, our measurement Y is a noisy measurement of the true underlying response f x):

More information

CMU-Q Lecture 24:

CMU-Q Lecture 24: CMU-Q 15-381 Lecture 24: Supervised Learning 2 Teacher: Gianni A. Di Caro SUPERVISED LEARNING Hypotheses space Hypothesis function Labeled Given Errors Performance criteria Given a collection of input

More information

Machine Learning

Machine Learning Machine Learning 10-601 Tom M. Mitchell Machine Learning Department Carnegie Mellon University August 30, 2017 Today: Decision trees Overfitting The Big Picture Coming soon Probabilistic learning MLE,

More information

Learning From Data Lecture 2 The Perceptron

Learning From Data Lecture 2 The Perceptron Learning From Data Lecture 2 The Perceptron The Learning Setup A Simple Learning Algorithm: PLA Other Views of Learning Is Learning Feasible: A Puzzle M. Magdon-Ismail CSCI 4100/6100 recap: The Plan 1.

More information

CS340 Machine learning Lecture 4 Learning theory. Some slides are borrowed from Sebastian Thrun and Stuart Russell

CS340 Machine learning Lecture 4 Learning theory. Some slides are borrowed from Sebastian Thrun and Stuart Russell CS340 Machine learning Lecture 4 Learning theory Some slides are borrowed from Sebastian Thrun and Stuart Russell Announcement What: Workshop on applying for NSERC scholarships and for entry to graduate

More information

Generalization Bounds

Generalization Bounds Generalization Bounds Here we consider the problem of learning from binary labels. We assume training data D = x 1, y 1,... x N, y N with y t being one of the two values 1 or 1. We will assume that these

More information

Lecture 20 Random Samples 0/ 13

Lecture 20 Random Samples 0/ 13 0/ 13 One of the most important concepts in statistics is that of a random sample. The definition of a random sample is rather abstract. However it is critical to understand the idea behind the definition,

More information

ORIE 4741: Learning with Big Messy Data. Proximal Gradient Method

ORIE 4741: Learning with Big Messy Data. Proximal Gradient Method ORIE 4741: Learning with Big Messy Data Proximal Gradient Method Professor Udell Operations Research and Information Engineering Cornell November 13, 2017 1 / 31 Announcements Be a TA for CS/ORIE 1380:

More information

Machine Learning

Machine Learning Machine Learning 10-601 Tom M. Mitchell Machine Learning Department Carnegie Mellon University February 4, 2015 Today: Generative discriminative classifiers Linear regression Decomposition of error into

More information

Introduction to Machine Learning (67577) Lecture 5

Introduction to Machine Learning (67577) Lecture 5 Introduction to Machine Learning (67577) Lecture 5 Shai Shalev-Shwartz School of CS and Engineering, The Hebrew University of Jerusalem Nonuniform learning, MDL, SRM, Decision Trees, Nearest Neighbor Shai

More information

Machine Learning. Ensemble Methods. Manfred Huber

Machine Learning. Ensemble Methods. Manfred Huber Machine Learning Ensemble Methods Manfred Huber 2015 1 Bias, Variance, Noise Classification errors have different sources Choice of hypothesis space and algorithm Training set Noise in the data The expected

More information

Ensemble Methods for Machine Learning

Ensemble Methods for Machine Learning Ensemble Methods for Machine Learning COMBINING CLASSIFIERS: ENSEMBLE APPROACHES Common Ensemble classifiers Bagging/Random Forests Bucket of models Stacking Boosting Ensemble classifiers we ve studied

More information

CS6375: Machine Learning Gautam Kunapuli. Decision Trees

CS6375: Machine Learning Gautam Kunapuli. Decision Trees Gautam Kunapuli Example: Restaurant Recommendation Example: Develop a model to recommend restaurants to users depending on their past dining experiences. Here, the features are cost (x ) and the user s

More information

Dan Roth 461C, 3401 Walnut

Dan Roth   461C, 3401 Walnut CIS 519/419 Applied Machine Learning www.seas.upenn.edu/~cis519 Dan Roth danroth@seas.upenn.edu http://www.cis.upenn.edu/~danroth/ 461C, 3401 Walnut Slides were created by Dan Roth (for CIS519/419 at Penn

More information

Introduction to Machine Learning Midterm Exam Solutions

Introduction to Machine Learning Midterm Exam Solutions 10-701 Introduction to Machine Learning Midterm Exam Solutions Instructors: Eric Xing, Ziv Bar-Joseph 17 November, 2015 There are 11 questions, for a total of 100 points. This exam is open book, open notes,

More information

CS 6375 Machine Learning

CS 6375 Machine Learning CS 6375 Machine Learning Nicholas Ruozzi University of Texas at Dallas Slides adapted from David Sontag and Vibhav Gogate Course Info. Instructor: Nicholas Ruozzi Office: ECSS 3.409 Office hours: Tues.

More information

Machine Learning CSE546

Machine Learning CSE546 http://www.cs.washington.edu/education/courses/cse546/17au/ Machine Learning CSE546 Kevin Jamieson University of Washington September 28, 2017 1 You may also like ML uses past data to make personalized

More information

Bayesian Machine Learning

Bayesian Machine Learning Bayesian Machine Learning Andrew Gordon Wilson ORIE 6741 Lecture 2: Bayesian Basics https://people.orie.cornell.edu/andrew/orie6741 Cornell University August 25, 2016 1 / 17 Canonical Machine Learning

More information

Introduction to Machine Learning Midterm Exam

Introduction to Machine Learning Midterm Exam 10-701 Introduction to Machine Learning Midterm Exam Instructors: Eric Xing, Ziv Bar-Joseph 17 November, 2015 There are 11 questions, for a total of 100 points. This exam is open book, open notes, but

More information

Machine Learning (CS 567) Lecture 2

Machine Learning (CS 567) Lecture 2 Machine Learning (CS 567) Lecture 2 Time: T-Th 5:00pm - 6:20pm Location: GFS118 Instructor: Sofus A. Macskassy (macskass@usc.edu) Office: SAL 216 Office hours: by appointment Teaching assistant: Cheol

More information

PAC-learning, VC Dimension and Margin-based Bounds

PAC-learning, VC Dimension and Margin-based Bounds More details: General: http://www.learning-with-kernels.org/ Example of more complex bounds: http://www.research.ibm.com/people/t/tzhang/papers/jmlr02_cover.ps.gz PAC-learning, VC Dimension and Margin-based

More information

CSC321 Lecture 4 The Perceptron Algorithm

CSC321 Lecture 4 The Perceptron Algorithm CSC321 Lecture 4 The Perceptron Algorithm Roger Grosse and Nitish Srivastava January 17, 2017 Roger Grosse and Nitish Srivastava CSC321 Lecture 4 The Perceptron Algorithm January 17, 2017 1 / 1 Recap:

More information

COMP 551 Applied Machine Learning Lecture 3: Linear regression (cont d)

COMP 551 Applied Machine Learning Lecture 3: Linear regression (cont d) COMP 551 Applied Machine Learning Lecture 3: Linear regression (cont d) Instructor: Herke van Hoof (herke.vanhoof@mail.mcgill.ca) Slides mostly by: Class web page: www.cs.mcgill.ca/~hvanho2/comp551 Unless

More information