Introduction to Machine Learning
1 Introduction to Machine Learning (FEATURE ENGINEERING). Machine Learning: Jordan Boyd-Graber, University of Maryland. Slides adapted from Eli Upfal.
2-3 What does it mean to learn something? What are the things that we're learning? What does it mean to be learnable? This provides a framework for reasoning about what we can theoretically learn. Sometimes theoretically learnable things are very difficult; sometimes things that should be hard actually work.
4-7 Example: a Californian has just moved to Colorado. When is it nice outside? The Californian has a perfect thermometer, but natives call 50F (10C) nice. Each temperature is an observation x, the Coloradan concept of nice is c(x), and the Californian wants to learn a hypothesis h(x) close to c(x). Generalization error:

R(h) = Pr_{x∼D}[h(x) ≠ c(x)] = E_{x∼D}[1[h(x) ≠ c(x)]]   (1)

[Notation: 1[x] = 1 iff x is true, 0 otherwise.]
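The indicator form of equation (1) is straightforward to estimate numerically. Below is a minimal Python sketch of a Monte Carlo estimate of R(h) for the temperature example; the concept interval, the hypothesis interval, and the temperature distribution are all invented for illustration rather than taken from the slides.

```python
import random

def c(x):
    """Hypothetical true Coloradan concept of "nice": an interval of temperatures (F)."""
    return 45.0 <= x <= 75.0

def h(x):
    """Hypothetical learned hypothesis: a slightly narrower interval."""
    return 50.0 <= x <= 70.0

def estimate_risk(num_samples=100_000, seed=0):
    """Monte Carlo estimate of R(h) = E_{x~D}[1[h(x) != c(x)]],
    assuming D is uniform over 20F..100F (an assumption, not from the slides)."""
    rng = random.Random(seed)
    mistakes = sum(h(x) != c(x)
                   for x in (rng.uniform(20.0, 100.0) for _ in range(num_samples)))
    return mistakes / num_samples

print(f"Estimated generalization error R(h) ~ {estimate_risk():.3f}")
```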
8-12 Probably Correct. The Californian gets n random examples. The best rule that conforms with the examples is the interval [a,b] spanned by the positive examples. Let [c,d] be the correct (unknown) rule, so that c ≤ a ≤ b ≤ d, and let ε be the probability mass of the gap between the two rules (the regions [c,a] and [b,d]). The probability of being wrong is the probability that the n samples missed [c,a] and [b,d].
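The core of the argument on the next slides is that m independent samples are unlikely to all miss a region of probability mass ε. The rough Python simulation below (the uniform distribution on [0,1] and the placement of the gap region are arbitrary choices for illustration) compares the empirical miss probability with the (1 − ε)^m and e^{−εm} quantities used in the derivation.

```python
import math
import random

def all_miss_probability(m, eps, trials=50_000, seed=0, left=0.25):
    """Empirical probability that none of m uniform samples on [0,1] lands in
    [left, left + eps), a region of probability mass eps (the 'bad event')."""
    rng = random.Random(seed)
    misses = sum(
        all(not (left <= rng.random() < left + eps) for _ in range(m))
        for _ in range(trials)
    )
    return misses / trials

eps = 0.1
for m in (10, 30, 60):
    emp = all_miss_probability(m, eps)
    print(f"m={m:3d}  empirical={emp:.4f}  (1-eps)^m={(1 - eps) ** m:.4f}  "
          f"e^(-eps*m)={math.exp(-eps * m):.4f}")
```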
13-19 PAC-learning definition.

Definition (PAC-learnable). A concept class C is PAC-learnable if there exist an algorithm A and a polynomial function f such that for any ε > 0 and δ > 0, for any distribution D over X and any target concept c ∈ C, and for any sample size m ≥ f(1/ε, 1/δ, n, |c|),

Pr_{S∼D^m}[R(h_S) ≤ ε] ≥ 1 − δ.   (2)

Reading the pieces: S is the sample we learn from; D is the data distribution the sample comes from; h_S is the hypothesis we learn; R(h_S) is its generalization error; ε is our bound on the generalization error (e.g., we want it to be better than 0.1); δ is the probability of learning a hypothesis with error greater than ε (e.g., 0.05).
20-27 Is a Californian learning temperature PAC-learnable? The bad event happens if no training point lands in [c,a] or [b,d]. Because the samples are independent,

Pr[x_1 ∉ [c,a] ∧ ... ∧ x_m ∉ [c,a]] = ∏_{i=1}^{m} Pr[x_i ∉ [c,a]].   (3)

We want the probability of a point landing there (or in [b,d]) to be less than ε; the probability that every sample misses a region of mass ε is

Pr[x_1 ∉ [c,a] ∧ ... ∧ x_m ∉ [c,a]] = (1 − ε)^m ≤ e^{−εm},   (4)

using the inequality 1 + x ≤ e^x. The analysis is symmetric for [c,a] and [b,d], which gives the factor of 2 below; δ corresponds to the probability of a bad hypothesis. We want the generalization error to exceed ε with probability less than δ, so solving for m:

Pr[R(h) ≥ ε] ≤ δ   (5)
2e^{−εm} ≤ δ   (6)
−εm ≤ ln(δ/2)   (7)   (take the log of both sides)
m ≥ (1/ε) ln(2/δ)   (8)   (the direction of the inequality flips when you divide by −ε)
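As a quick sanity check on line (8), here is a small calculation of the sample size that the bound requires; the specific ε and δ values are just illustrative choices.

```python
import math

def interval_sample_size(eps, delta):
    """Sample size from the slide's bound m >= (1/eps) * ln(2/delta)."""
    return math.ceil((1.0 / eps) * math.log(2.0 / delta))

# Error at most 10% with probability at least 95% (illustrative values)
print(interval_sample_size(eps=0.10, delta=0.05))   # 37
```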
28 Consistent Hypotheses, Finite Spaces. It is possible to prove that specific problems are learnable (and we will!). Can we do something more general? Yes, for finite hypothesis spaces H with c ∈ H that are also consistent with the training data.

Theorem (Learning bound for finite, consistent H). Let H be a finite set of functions mapping from X to Y, and let A be an algorithm that for any i.i.d. sample S returns a consistent hypothesis (training error R̂(h) = 0). Then for any ε, δ > 0, the concept is PAC-learnable with sample size

m ≥ (1/ε) (ln|H| + ln(1/δ)).   (9)
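The theorem becomes a one-line calculation once |H| is pinned down. The sketch below plugs in a hypothetical finite class (intervals whose endpoints lie on a grid of 1,000 temperatures) to show that |H| enters only through its logarithm; the grid size, ε, and δ are illustrative assumptions.

```python
import math

def finite_h_sample_size(h_size, eps, delta):
    """m >= (1/eps) * (ln|H| + ln(1/delta)), the consistent finite-H bound in (9)."""
    return math.ceil((math.log(h_size) + math.log(1.0 / delta)) / eps)

# Hypothetical finite class: intervals [lo, hi] with endpoints on a 1,000-point grid.
grid = 1000
h_size = grid * (grid - 1) // 2 + grid   # choose lo < hi, plus single-point intervals
print(h_size, finite_h_sample_size(h_size, eps=0.05, delta=0.01))
```

Doubling the grid resolution roughly quadruples |H| but only adds ln 4 ≈ 1.4 to the ln|H| term, which is why even very large finite hypothesis spaces remain learnable with modest amounts of data.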
29-32 Proof: Setup. We want to bound the probability that some h ∈ H is consistent and has error more than ε:

Pr[∃ h ∈ H : R̂(h) = 0 ∧ R(h) > ε]   (10)
= Pr[(R̂(h_1) = 0 ∧ R(h_1) > ε) ∨ ... ∨ (R̂(h_|H|) = 0 ∧ R(h_|H|) > ε)]
≤ Σ_{h ∈ H} Pr[R̂(h) = 0 ∧ R(h) > ε]   (11)   (union bound)
≤ Σ_{h ∈ H} Pr[R̂(h) = 0 | R(h) > ε]   (12)   (definition of conditional probability)
33-38 Proof: Connection back to interval learning. When the generalization error is greater than ε, we bound the probability of seeing no inconsistent points in training for a single hypothesis h:

Pr[R̂(h) = 0 | R(h) > ε] ≤ (1 − ε)^m.   (13)

But this must hold for all of the hypotheses in H, so

Pr[∃ h ∈ H : R̂(h) = 0 ∧ R(h) > ε] ≤ |H| (1 − ε)^m.   (14)

We set the right-hand side equal to δ and solve for m:

|H| (1 − ε)^m ≤ |H| e^{−mε} = δ
ln δ = ln|H| − mε   (apply the log to both sides)
ln(1/δ) + ln|H| = mε   (move ln|H| to the other side and rewrite −ln δ = ln 1 − ln δ = ln(1/δ))
m = (1/ε) (ln|H| + ln(1/δ))   (divide by ε)
39-42 But what does it all mean?

m ≥ (1/ε) (ln|H| + ln(1/δ))   (15)

Confidence: more certainty (smaller δ) means more training data. Complexity: more complicated hypothesis spaces (larger |H|) need more training data.

Scary Question: what is |H| for logistic regression?
43-44 What's next... In class: examples of PAC learnability. Next time: how to deal with infinite hypothesis spaces.

Takeaway: even though we can't prove anything about logistic regression, it still works. However, using the theory will lead us to a better classification technique: support vector machines.