Computational Learning Theory


09s1: COMP9417 Machine Learning and Data Mining
Computational Learning Theory
May 20, 2009

Acknowledgement: Material derived from slides for the book Machine Learning, Tom M. Mitchell, McGraw-Hill; slides by Andrew W. Moore; the book Data Mining, Ian H. Witten and Eibe Frank, Morgan Kaufmann; the book Pattern Classification, Richard O. Duda, Peter E. Hart, and David G. Stork, copyright (c) 2001 by John Wiley & Sons, Inc.; and the book Elements of Statistical Learning, Trevor Hastie, Robert Tibshirani and Jerome Friedman, (c) 2001, Springer.

Aims

This lecture will enable you to define and reproduce fundamental approaches and results from the theory of machine learning. Following it you should be able to:
- describe a basic theoretical framework for the sample complexity of learning
- define training error and true error
- define Probably Approximately Correct (PAC) learning
- define the Vapnik-Chervonenkis (VC) dimension
- define optimal mistake bounds
- reproduce the basic results in each area

[Recommended reading: Mitchell, Chapter 7]
[Recommended exercises: 7.2, 7.4, 7.5, 7.6, (7.1, 7.3, 7.7)]

Computational Learning Theory

What general laws constrain inductive learning? We seek theory to relate:
- the probability of successful learning
- the number of training examples
- the complexity of the hypothesis space
- the time complexity of the learning algorithm
- the accuracy to which the target concept is approximated
- the manner in which training examples are presented

Computational Learning Theory

Characterise classes of learning algorithms using questions such as:
- Sample complexity: how many training examples are required for the learner to converge (with high probability) to a successful hypothesis?
- Computational complexity: how much computational effort is required for the learner to converge (with high probability) to a successful hypothesis?
- Mistake bounds: how many training examples will the learner misclassify before converging to a successful hypothesis?

And what counts as a successful hypothesis? One identical to the target concept? One that mostly agrees with the target concept? One that does so most of the time?

Prototypical Concept Learning Task

Given:
- Instances X: possible days, each described by the attributes Sky, AirTemp, Humidity, Wind, Water, Forecast
- Target function c: EnjoySport : X → {0, 1}
- Hypotheses H: conjunctions of literals, e.g. ⟨?, Cold, High, ?, ?, ?⟩ (a minimal representation sketch appears at the end of this page)
- Training examples D: positive and negative examples of the target function, ⟨x_1, c(x_1)⟩, ..., ⟨x_m, c(x_m)⟩

Determine:
- a hypothesis h in H such that h(x) = c(x) for all x in D?
- a hypothesis h in H such that h(x) = c(x) for all x in X?

Sample Complexity

How many training examples are sufficient to learn the target concept?

1. If the learner proposes instances, as queries to the teacher: the learner proposes instance x, the teacher provides c(x).
2. If the teacher (who knows c) provides training examples: the teacher provides a sequence of examples of the form ⟨x, c(x)⟩.
3. If some random process (e.g., nature) proposes instances: instance x is generated randomly, the teacher provides c(x).
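
As a concrete illustration of the representation used in this task, the following is a minimal sketch of conjunctive hypotheses over the EnjoySport attributes together with a consistency check against D. The attribute values and helper names are assumptions for illustration, not the full Mitchell dataset.

```python
# Minimal sketch of the conjunctive-hypothesis representation for EnjoySport.
# "?" matches any value; a specific value must match exactly.

ATTRIBUTES = ["Sky", "AirTemp", "Humidity", "Wind", "Water", "Forecast"]

def h_predict(h, x):
    """Return 1 if conjunctive hypothesis h covers instance x, else 0."""
    return int(all(hv == "?" or hv == xv for hv, xv in zip(h, x)))

def consistent(h, D):
    """Check h(x) = c(x) for every labelled example (x, c_x) in D."""
    return all(h_predict(h, x) == c_x for x, c_x in D)

if __name__ == "__main__":
    h = ("Sunny", "?", "?", "Strong", "?", "?")
    D = [(("Sunny", "Warm", "Normal", "Strong", "Warm", "Same"), 1),
         (("Rainy", "Cold", "High", "Strong", "Warm", "Change"), 0)]
    print(consistent(h, D))   # True: h agrees with c on every example in D
```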

Sample Complexity: 1

Learner proposes instance x, teacher provides c(x) (assume c is in the learner's hypothesis space H).

Optimal query strategy: play "20 questions"
- pick instance x such that half of the hypotheses in VS classify x positive, and half classify x negative
- when this is possible, log_2 |H| queries are needed to learn c
- when it is not possible, even more are needed
- the learner conducts experiments (generates queries); see Version Space learning

Sample Complexity: 2

Teacher (who knows c) provides training examples (assume c is in the learner's hypothesis space H).

Optimal teaching strategy: depends on the H used by the learner.

Consider the case H = conjunctions of up to n Boolean literals, e.g. (AirTemp = Warm) ∧ (Wind = Strong), where AirTemp, Wind, ... each have 2 possible values. If there are n possible Boolean attributes in H, then n + 1 examples suffice. Why?

Sample Complexity: 3

Given:
- a set of instances X
- a set of hypotheses H
- a set of possible target concepts C
- training instances generated by a fixed, unknown probability distribution D over X

The learner observes a sequence D of training examples of the form ⟨x, c(x)⟩ for some target concept c in C:
- instances x are drawn from distribution D
- the teacher provides the target value c(x) for each

The learner must output a hypothesis h estimating c; h is evaluated by its performance on subsequent instances drawn according to D.

Note: randomly drawn instances, noise-free classifications.

True Error of a Hypothesis

[Figure: instance space X, showing the regions labelled + and - by target concept c and by hypothesis h; the true error corresponds to the region where c and h disagree.]

True Error of a Hypothesis

Definition: The true error (denoted error_D(h)) of hypothesis h with respect to target concept c and distribution D is the probability that h will misclassify an instance drawn at random according to D:

    error_D(h) ≡ Pr_{x ∈ D} [c(x) ≠ h(x)]

Two Notions of Error

Training error of hypothesis h with respect to target concept c: how often h(x) ≠ c(x) over the training instances.

True error of hypothesis h with respect to c: how often h(x) ≠ c(x) over future random instances.

Our concern: can we bound the true error of h given the training error of h? First consider the case where the training error of h is zero (i.e., h ∈ VS_{H,D}). (A small simulation contrasting the two notions of error appears at the end of this page.)

Exhausting the Version Space

[Figure: hypothesis space H containing the version space VS_{H,D}; each hypothesis is annotated with its training error r and its true error. Hypotheses inside VS_{H,D} have r = 0 but true errors such as .1 and .2, while hypotheses outside have r > 0.]

Definition: The version space VS_{H,D} is said to be ɛ-exhausted with respect to c and D if every hypothesis h in VS_{H,D} has error less than ɛ with respect to c and D:

    (∀ h ∈ VS_{H,D}) error_D(h) < ɛ
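
The sketch below contrasts the two notions of error defined above: it computes the training error of h on a sample and estimates the true error by drawing fresh instances from D. The target concept, hypothesis and distribution are assumptions chosen only for illustration.

```python
# Sketch: training error vs an empirical estimate of true error.
import random

def c(x):            # target concept: x1 AND x2 (illustrative)
    return int(x[0] and x[1])

def h(x):            # hypothesis: x1 alone (deliberately imperfect)
    return int(x[0])

def draw(rng):       # distribution D: four attributes, each 1 with probability 0.5
    return tuple(rng.randint(0, 1) for _ in range(4))

rng = random.Random(0)
D_train = [draw(rng) for _ in range(20)]
train_error = sum(h(x) != c(x) for x in D_train) / len(D_train)

# error_D(h) = Pr_{x in D}[c(x) != h(x)]; estimate it from many fresh draws.
true_error_est = sum(h(x) != c(x) for x in (draw(rng) for _ in range(100000))) / 100000

print(train_error, true_error_est)   # the estimate is close to 0.25 for this h
```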

How many examples will ɛ-exhaust the VS?

Theorem [Haussler, 1988]: If the hypothesis space H is finite, and D is a sequence of m ≥ 1 independent random examples of some target concept c, then for any 0 ≤ ɛ ≤ 1, the probability that the version space with respect to H and D is not ɛ-exhausted (with respect to c) is less than

    |H| e^{-ɛm}

Interesting! This bounds the probability that any consistent learner will output a hypothesis h with error(h) ≥ ɛ.

If we want this probability to be below δ, then

    |H| e^{-ɛm} ≤ δ

    m ≥ (1/ɛ) (ln |H| + ln(1/δ))

Learning Conjunctions of Boolean Literals

How many examples are sufficient to assure with probability at least (1 − δ) that every h in VS_{H,D} satisfies error_D(h) ≤ ɛ?

Use our theorem:

    m ≥ (1/ɛ) (ln |H| + ln(1/δ))

Suppose H contains conjunctions of constraints on up to n Boolean attributes (i.e., n Boolean literals). Then |H| = 3^n, and

    m ≥ (1/ɛ) (ln 3^n + ln(1/δ))

or

    m ≥ (1/ɛ) (n ln 3 + ln(1/δ))
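
The bound above is straightforward to evaluate. The following sketch computes it for a finite hypothesis space and for the special case of conjunctions over n Boolean attributes (|H| = 3^n); the function names are illustrative.

```python
# Sketch: the finite-hypothesis-space sample-complexity bound
#     m >= (1/eps) * (ln|H| + ln(1/delta)).
import math

def sample_bound(size_H, eps, delta):
    """Smallest integer m with m >= (ln|H| + ln(1/delta)) / eps."""
    return math.ceil((math.log(size_H) + math.log(1.0 / delta)) / eps)

def bound_boolean_conjunctions(n, eps, delta):
    """|H| = 3^n: each attribute is required true, required false, or ignored."""
    return math.ceil((n * math.log(3) + math.log(1.0 / delta)) / eps)

if __name__ == "__main__":
    # e.g. n = 10 Boolean attributes, eps = 0.1, delta = 0.05
    print(sample_bound(3 ** 10, eps=0.1, delta=0.05))           # 140
    print(bound_boolean_conjunctions(10, eps=0.1, delta=0.05))  # 140 (same bound)
```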

How About EnjoySport?

    m ≥ (1/ɛ) (ln |H| + ln(1/δ))

If H is as given in EnjoySport then |H| = 973, and

    m ≥ (1/ɛ) (ln 973 + ln(1/δ))

If we want to assure that with probability 95% the VS contains only hypotheses with error_D(h) ≤ .1, then it is sufficient to have m examples, where

    m ≥ (1/.1) (ln 973 + ln(1/.05))
    m ≥ 10 (ln 973 + ln 20)
    m ≥ 10 (6.88 + 3.00)
    m ≥ 98.8

PAC Learning

Consider a class C of possible target concepts defined over a set of instances X of length n, and a learner L using hypothesis space H.

Definition: C is PAC-learnable by L using H if for all c ∈ C, distributions D over X, ɛ such that 0 < ɛ < 1/2, and δ such that 0 < δ < 1/2, learner L will with probability at least (1 − δ) output a hypothesis h ∈ H such that error_D(h) ≤ ɛ, in time that is polynomial in 1/ɛ, 1/δ, n and size(c).

Agnostic Learning

So far we have assumed c ∈ H. The agnostic learning setting drops this assumption.

What do we want then? The hypothesis h that makes the fewest errors on the training data.

What is the sample complexity in this case? It is derived from Hoeffding bounds:

    Pr[error_D(h) > error_train(h) + ɛ] ≤ e^{-2mɛ²}

where error_train(h) denotes the training error of h, giving

    m ≥ (1/(2ɛ²)) (ln |H| + ln(1/δ))
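
The sketch below reproduces the EnjoySport calculation and contrasts the consistent-learner bound with the agnostic (Hoeffding-based) bound; the helper names are illustrative.

```python
# Sketch: consistent-learner bound vs agnostic (Hoeffding) bound.
import math

def m_consistent(size_H, eps, delta):
    return (math.log(size_H) + math.log(1 / delta)) / eps

def m_agnostic(size_H, eps, delta):
    return (math.log(size_H) + math.log(1 / delta)) / (2 * eps ** 2)

if __name__ == "__main__":
    size_H, eps, delta = 973, 0.1, 0.05
    print(round(m_consistent(size_H, eps, delta), 1))  # 98.8, as on the slide
    print(round(m_agnostic(size_H, eps, delta), 1))    # 493.8: dropping the c-in-H assumption costs more
```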

Unbiased Learners

An unbiased concept class C contains all target concepts definable on the instance space X, so |C| = 2^|X|. If X is defined using n Boolean features, then |X| = 2^n and |C| = 2^(2^n).

An unbiased learner has a hypothesis space able to represent all possible target concepts, i.e., H = C. Then

    m ≥ (1/ɛ) (2^n ln 2 + ln(1/δ))

i.e., sample complexity exponential in the number of features. (The sketch at the end of this page evaluates this bound for a few values of n.)

Bias, Variance and Model Complexity

[Figure: prediction error on a test sample and on the training sample as model complexity is varied.]

We can see the behaviour of different models' predictive accuracy on a test sample and on the training sample as the model complexity is varied.

How Complex is a Hypothesis?

How should we measure model complexity? The number of parameters? Description length?
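
As a quick illustration of that exponential sample complexity, the sketch below evaluates the unbiased-learner bound for a few feature counts; the numbers follow directly from the formula above.

```python
# Sketch: the unbiased-learner bound m >= (1/eps)(2^n ln 2 + ln(1/delta)).
import math

def m_unbiased(n, eps=0.1, delta=0.05):
    return ((2 ** n) * math.log(2) + math.log(1 / delta)) / eps

for n in (5, 10, 20):
    print(n, round(m_unbiased(n)))
# n = 5 -> ~252, n = 10 -> ~7128, n = 20 -> ~7.3 million examples
```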

How Complex is a Hypothesis?

[Figure: the solid curve is the function sin(50x) for x ∈ [0, 1]; the blue (solid) and green (hollow) points illustrate how the associated indicator function I(sin(αx) > 0) can shatter (separate) an arbitrarily large number of points by choosing an appropriately high frequency α.]

Here classes are separated based on sin(αx), for frequency α: a single parameter.

Shattering a Set of Instances

Definition: a dichotomy of a set S is a partition of S into two disjoint subsets.

Definition: a set of instances S is shattered by hypothesis space H if and only if for every dichotomy of S there exists some hypothesis in H consistent with this dichotomy.

Three Instances Shattered

[Figure: an instance space X with three instances that are shattered by H; every dichotomy of the three points is realised by some hypothesis.]

The Vapnik-Chervonenkis Dimension

Definition: The Vapnik-Chervonenkis dimension, VC(H), of hypothesis space H defined over instance space X is the size of the largest finite subset of X shattered by H. If arbitrarily large finite sets of X can be shattered by H, then VC(H) ≡ ∞.
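
To make these definitions concrete, the brute-force check below enumerates dichotomies to test shattering and to compute the VC dimension of a toy hypothesis class, 1-D intervals, which is assumed here purely for illustration (its VC dimension is 2).

```python
# Sketch: brute-force shattering test and VC dimension for 1-D intervals
# (an interval [a, b] labels points inside it as 1, outside as 0).
from itertools import combinations, product

def interval_hypotheses(points):
    """Candidate intervals with endpoints drawn from the points (plus extremes)."""
    vals = sorted(set(points))
    cuts = [min(vals) - 1] + vals + [max(vals) + 1]
    return [(a, b) for a in cuts for b in cuts if a <= b]

def shattered(points):
    """True iff every dichotomy of `points` is realised by some interval."""
    hs = interval_hypotheses(points)
    for labelling in product([0, 1], repeat=len(points)):
        if not any(all((a <= x <= b) == bool(y) for x, y in zip(points, labelling))
                   for a, b in hs):
            return False
    return True

def vc_dimension(domain, max_d=4):
    """Largest d such that some d-subset of the (finite) domain is shattered."""
    best = 0
    for d in range(1, max_d + 1):
        if any(shattered(s) for s in combinations(domain, d)):
            best = d
    return best

print(shattered((1.0, 2.0)))                     # True: any 2 points are shattered
print(shattered((1.0, 2.0, 3.0)))                # False: labelling (1, 0, 1) is impossible
print(vc_dimension([1.0, 2.0, 3.0, 4.0, 5.0]))   # 2
```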

VC Dim. of Linear Decision Surfaces

[Figure: panels (a) and (b), illustrating point configurations in the plane that can and cannot be shattered by linear decision surfaces.]

Sample Complexity from VC Dimension

How many randomly drawn examples suffice to ɛ-exhaust VS_{H,D} with probability at least (1 − δ)?

    m ≥ (1/ɛ) (4 log_2(2/δ) + 8 VC(H) log_2(13/ɛ))

(The sketch at the end of this page evaluates this bound.)

Mistake Bounds

So far we have asked how many examples are needed to learn. What about how many mistakes are made before convergence?

Consider a setting similar to PAC learning:
- instances are drawn at random from X according to distribution D
- the learner must classify each instance before receiving the correct classification from the teacher

Can we bound the number of mistakes the learner makes before converging?
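
A small calculator for the VC-based bound just quoted. The value VC(H) = 3 used below (linear separators in the plane) is only an assumption chosen to plug in a concrete number.

```python
# Sketch: the VC-dimension sample-complexity bound
#     m >= (1/eps) * (4*log2(2/delta) + 8*VC(H)*log2(13/eps)).
import math

def m_vc(vc_dim, eps, delta):
    return math.ceil((4 * math.log2(2 / delta)
                      + 8 * vc_dim * math.log2(13 / eps)) / eps)

if __name__ == "__main__":
    print(m_vc(3, eps=0.1, delta=0.05))   # 1899 examples for this setting
```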

Winnow

The mistake bounds below follow Littlestone (1987), who developed Winnow, an online, mistake-driven algorithm similar to the perceptron:
- it learns a linear threshold function
- it is designed to learn in the presence of many irrelevant features
- its number of mistakes grows only logarithmically with the number of irrelevant features

Winnow

    While some instances are misclassified
        For each instance a
            classify a using the current weights
            If the predicted class is incorrect
                If a has class 1
                    For each a_i = 1, set w_i ← α w_i (if a_i = 0, leave w_i unchanged)
                Otherwise
                    For each a_i = 1, set w_i ← w_i / α (if a_i = 0, leave w_i unchanged)

Winnow

- user-supplied threshold θ: the class is 1 if Σ_i w_i a_i > θ
- note the similarity to the perceptron training rule
- Winnow uses multiplicative weight updates, with α > 1
- it will do better than the perceptron when there are many irrelevant attributes: an attribute-efficient learner
- (a runnable sketch of Winnow appears at the end of this page)

Mistake Bounds: Find-S

Consider Find-S when H = conjunctions of Boolean literals.

    Find-S:
        Initialize h to the most specific hypothesis l_1 ∧ ¬l_1 ∧ l_2 ∧ ¬l_2 ∧ ... ∧ l_n ∧ ¬l_n
        For each positive training instance x
            Remove from h any literal that is not satisfied by x
        Output hypothesis h

How many mistakes before converging to the correct h?
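
Below is a minimal, runnable sketch of the Winnow rule described above (promotion by α on false negatives, demotion by α on false positives). The toy data, a two-literal disjunctive target with many irrelevant attributes, is an assumption chosen to illustrate the attribute-efficiency claim.

```python
# Sketch of Winnow: multiplicative updates against a user-supplied threshold theta.
import random

def winnow(examples, n, alpha=2.0, theta=None, max_epochs=50):
    """examples: list of (binary attribute tuple, label in {0, 1})."""
    theta = n / 2 if theta is None else theta
    w = [1.0] * n
    mistakes = 0
    for _ in range(max_epochs):
        clean_pass = True
        for a, y in examples:
            pred = int(sum(wi * ai for wi, ai in zip(w, a)) > theta)
            if pred != y:
                mistakes += 1
                clean_pass = False
                for i in range(n):
                    if a[i] == 1:                       # only active attributes change
                        w[i] = w[i] * alpha if y == 1 else w[i] / alpha
        if clean_pass:                                  # no instance misclassified: done
            break
    return w, mistakes

if __name__ == "__main__":
    rng = random.Random(1)
    n = 20                                              # 2 relevant + 18 irrelevant attributes
    data = []
    for _ in range(100):
        a = tuple(rng.randint(0, 1) for _ in range(n))
        data.append((a, int(a[0] or a[1])))             # target depends on a_1, a_2 only
    w, mistakes = winnow(data, n)
    print(mistakes)     # stays small despite the 18 irrelevant attributes
```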

Mistake Bounds: Find-S

Find-S will converge to an error-free hypothesis in the limit, provided that c ∈ H and the data are noise-free. How many mistakes does it make before converging to the correct h?

- Find-S initially classifies all instances as negative
- it generalizes on positive examples by dropping unsatisfied literals
- since c ∈ H, it will never classify a negative instance as positive (if it did, there would be a hypothesis h' with c ≥_g h' and an instance x with h'(x) = 1 and c(x) ≠ 1, contradicting the definition of the generality ordering ≥_g)
- so the mistake bound comes from the number of positives classified as negative

Mistake Bounds: Find-S

- there are 2n terms in the initial hypothesis
- the first mistake removes half of these terms, leaving n
- each further mistake removes at least one term
- in the worst case, all n remaining terms must be removed; the result would be the most general concept, under which everything is positive
- the worst-case number of mistakes is therefore n + 1, from a worst-case sequence of learning steps that removes only one literal per step (a small simulation of this worst case appears at the end of this page)

Mistake Bounds: Halving Algorithm

Consider the Halving Algorithm:
- learn the concept using the version space Candidate-Elimination algorithm
- classify new instances by majority vote of the version space members

How many mistakes before converging to the correct h? ... in the worst case? ... in the best case?
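
The following is a minimal sketch of Find-S over Boolean literals with mistake counting. The worst-case example sequence (whose target is the most general concept) is constructed here for illustration and reproduces the n + 1 bound.

```python
# Sketch of Find-S over conjunctions of Boolean literals, counting mistakes
# (positives initially classified as negative).

def find_s_mistakes(examples, n):
    """examples: list of (Boolean tuple, label in {0, 1}); only positives change h."""
    # initial hypothesis l1 AND NOT l1 AND ... AND ln AND NOT ln,
    # stored as (index, required value) pairs
    h = {(i, v) for i in range(n) for v in (True, False)}
    mistakes = 0
    for x, y in examples:
        pred = all(x[i] == v for i, v in h)      # h(x): every remaining literal satisfied
        if pred != bool(y):
            mistakes += 1
        if y:                                    # generalise on positive examples only
            h = {(i, v) for i, v in h if x[i] == v}
    return mistakes, h

if __name__ == "__main__":
    n = 4
    # Worst case: the first positive removes half the literals (2n -> n), and each
    # later positive removes exactly one, until h is the most general concept.
    seq = [(tuple([True] * n), 1)]
    for j in range(n):
        x = [True] * n
        x[j] = False
        seq.append((tuple(x), 1))
    mistakes, h = find_s_mistakes(seq, n)
    print(mistakes, h)   # n + 1 = 5 mistakes; the final hypothesis is empty (most general)
```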

Mistake Bounds: Halving Algorithm

The Halving Algorithm learns exactly when the version space contains only one hypothesis, the one corresponding to the target concept:
- at each step, remove all hypotheses whose vote was incorrect
- a mistake occurs when the majority-vote classification is incorrect

What is the mistake bound in converging to the correct h?

How many mistakes in the worst case?
- on every step there is a mistake, because the majority vote is incorrect
- on each mistake, the number of hypotheses is reduced by at least half
- for hypothesis space size |H|, the worst-case mistake bound is log_2 |H|

How many mistakes in the best case?
- on every step there is no mistake, because the majority vote is correct
- incorrect hypotheses are still removed, up to half of the version space
- in the best case, no mistakes are made in converging to the correct h

(A brute-force sketch of the Halving Algorithm appears at the end of this page.)

Optimal Mistake Bounds

Let M_A(C) be the maximum number of mistakes made by algorithm A to learn concepts in C (the maximum over all possible c ∈ C and all possible training sequences):

    M_A(C) ≡ max_{c ∈ C} M_A(c)

Definition: Let C be an arbitrary non-empty concept class. The optimal mistake bound for C, denoted Opt(C), is the minimum over all possible learning algorithms A of M_A(C):

    Opt(C) ≡ min_{A ∈ learning algorithms} M_A(C)

For any such concept class C:

    VC(C) ≤ Opt(C) ≤ M_Halving(C) ≤ log_2(|C|)
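
The sketch below runs the Halving Algorithm by brute force over a small finite hypothesis space; the conjunction class, target and instance stream are illustrative assumptions, and the mistake count stays within the log_2 |H| bound discussed above.

```python
# Sketch of the Halving Algorithm: predict by majority vote of the version
# space, then remove every hypothesis that voted incorrectly.
from itertools import product
import math

n = 3
# A hypothesis fixes each attribute to True, False, or "?" (don't care): |H| = 3^n.
H = list(product([True, False, "?"], repeat=n))

def h_predict(h, x):
    return int(all(hv == "?" or hv == xv for hv, xv in zip(h, x)))

target = (True, "?", False)                   # hidden target concept, c in H
version_space = list(H)
mistakes = 0

for x in product([True, False], repeat=n):    # present every instance once
    votes = [h_predict(h, x) for h in version_space]
    majority = int(2 * sum(votes) >= len(votes))
    y = h_predict(target, x)                  # teacher reveals c(x)
    if majority != y:
        mistakes += 1
    # remove all hypotheses whose vote was incorrect (whether or not a mistake was made)
    version_space = [h for h, v in zip(version_space, votes) if v == y]

print(mistakes, "<=", int(math.log2(len(H))))   # within the log2|H| bound
print(version_space)                            # only the target concept remains
```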

Weighted Majority

A generalisation of the Halving Algorithm:
- it predicts by a weighted vote of a set of prediction algorithms
- it learns by altering the weights of the prediction algorithms
- a prediction algorithm simply predicts the value of the target concept given an instance; the pool may be the elements of the hypothesis space H, or different learning algorithms
- an inconsistency between a prediction algorithm and a training example reduces the weight of that prediction algorithm
- the number of mistakes of the ensemble is bounded in terms of the number of mistakes made by the best prediction algorithm

Notation: a_i is the i-th prediction algorithm, w_i is the weight associated with a_i.

Weighted Majority

    For all i, initialize w_i ← 1
    For each training example ⟨x, c(x)⟩
        Initialize q_0 and q_1 to 0
        For each prediction algorithm a_i
            If a_i(x) = 0 then q_0 ← q_0 + w_i
            If a_i(x) = 1 then q_1 ← q_1 + w_i
        If q_1 > q_0 then predict c(x) = 1
        If q_0 > q_1 then predict c(x) = 0
        If q_0 = q_1 then predict 0 or 1 at random for c(x)
        For each prediction algorithm a_i
            If a_i(x) ≠ c(x) then w_i ← β w_i

Weighted Majority

The Weighted Majority algorithm begins by assigning a weight of 1 to each prediction algorithm. On a misclassification by a prediction algorithm, its weight is reduced by multiplying it by a constant β, 0 ≤ β < 1. The algorithm is equivalent to the Halving Algorithm when β = 0; otherwise it merely down-weights the contribution of algorithms that make errors. (A runnable sketch appears after the summary.)

Key result: the number of mistakes made by the Weighted Majority algorithm will never be greater than a constant factor times the number of mistakes made by the best member of the pool, plus a term that grows only logarithmically in the number of prediction algorithms in the pool.

Summary

- Algorithm-independent learning, analysed by computational complexity methods
- An alternative viewpoint to statistics: complementary, with different goals
- Many different frameworks, no clear winner
- Many results are over-conservative from a practical viewpoint, e.g. worst-case rather than average-case analyses
- Has led to advances in practically useful algorithms, e.g. PAC → boosting, VC dimension → SVMs
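
To make the pseudocode above concrete, here is a runnable sketch of Weighted Majority. The pool of prediction algorithms (one simple rule per attribute plus its negation) and the data stream are illustrative assumptions.

```python
# Runnable sketch of the Weighted Majority procedure given above.
import random

def weighted_majority(predictors, stream, beta=0.5, seed=0):
    """predictors: list of functions x -> {0, 1}; stream: iterable of (x, c_x) pairs."""
    rng = random.Random(seed)
    w = [1.0] * len(predictors)
    ensemble_mistakes = 0
    for x, c_x in stream:
        q = [0.0, 0.0]
        preds = [p(x) for p in predictors]
        for wi, pi in zip(w, preds):
            q[pi] += wi                              # weighted vote for class 0 or 1
        guess = rng.choice([0, 1]) if q[0] == q[1] else int(q[1] > q[0])
        if guess != c_x:
            ensemble_mistakes += 1
        for i, pi in enumerate(preds):
            if pi != c_x:                            # down-weight inconsistent predictors
                w[i] *= beta
    return w, ensemble_mistakes

if __name__ == "__main__":
    rng = random.Random(2)
    n = 5
    predictors = ([lambda x, i=i: x[i] for i in range(n)]
                  + [lambda x, i=i: 1 - x[i] for i in range(n)])
    stream = []
    for _ in range(200):
        x = tuple(rng.randint(0, 1) for _ in range(n))
        stream.append((x, x[2]))                     # the target happens to equal attribute 3
    w, m = weighted_majority(predictors, stream)
    print(m, [round(v, 3) for v in w])   # few ensemble mistakes; the perfect predictor keeps weight 1
```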
