A Tutorial on Computational Learning Theory Presented at Genetic Programming 1997 Stanford University, July 1997

1 Vasant Honavar, Artificial Intelligence Research Laboratory, Department of Computer Science, Iowa State University, Ames, Iowa

2 What are learning systems? Systems that improve their performance on one or more tasks with experience in their environment. Examples: pattern recognizers, adaptive control systems, adaptive intelligent agents, etc.

3 Computational Models of Learning. Model of the Learner: computational capabilities, sensors, effectors, knowledge representation, inference mechanisms, prior knowledge, etc. Model of the Environment: tasks to be learned, information sources (teacher, queries, experiments), performance measures. Key questions: Can a learner with a certain structure learn a specified task in a particular environment? Can the learner do so efficiently? If so, how? If not, why not?

4 Computational Models of Learning: theories of learning (what are they good for?); the mistake bound model; the Maximum Likelihood model; the PAC (Probably Approximately Correct) model; learning from simple examples; concluding remarks.

5 Theories of Learning: What are they good for? To make explicit relevant aspects of the learner and the environment; to identify easy and hard learning problems (and the precise conditions under which they are easy or hard); to guide the design of learning systems; to shed light on natural learning systems; to help analyze the performance of learning systems.

6 Mistake Bound Model. Example: Given an arbitrary, noise-free sequence of labeled examples $\langle X_1, C(X_1)\rangle, \langle X_2, C(X_2)\rangle, \ldots$ of an unknown binary conjunctive concept $C$ over $\{0,1\}^N$, the learner's task is to predict $C(X)$ for a given $X$. Theorem: Exact online learning of conjunctive concepts can be accomplished with at most $N+1$ prediction mistakes.

7 Mistake Bound Model. Algorithm: Initialize $L = \{X_1, \neg X_1, \ldots, X_N, \neg X_N\}$. Predict according to whether the instance satisfies the conjunction of the literals in $L$. Whenever a mistake is made on a positive example, drop the offending literals (those the example falsifies) from $L$. E.g., $\langle 0111, 1\rangle$ will result in $L = \{\neg X_1, X_2, X_3, X_4\}$; then $\langle 1110, 1\rangle$ will yield $L = \{X_2, X_3\}$.
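A minimal Python sketch of the elimination algorithm above, assuming literals are encoded as pairs `(i, v)` (meaning $X_{i+1}$ when `v == 1` and $\neg X_{i+1}$ when `v == 0`) and variables are 0-indexed in code. The function names are illustrative, not taken from the slides. It replays the slide's two examples and adds a helper that feeds every instance of $\{0,1\}^N$ to the learner.

```python
from itertools import product

# A literal is a pair (i, v): (i, 1) stands for X_{i+1}, (i, 0) for NOT X_{i+1}.

def initial_hypothesis(n):
    """All 2N literals X_1, NOT X_1, ..., X_N, NOT X_N."""
    return {(i, v) for i in range(n) for v in (0, 1)}

def predict(literals, x):
    """Evaluate the conjunction of `literals` on instance x (a tuple of 0/1)."""
    return int(all(x[i] == v for i, v in literals))

def update(literals, x, label):
    """Process one labeled example; return (new literal set, mistake flag)."""
    mistake = predict(literals, x) != label
    if mistake and label == 1:
        # Drop every literal that this positive example falsifies.
        literals = {(i, v) for i, v in literals if x[i] == v}
    # When the target is a conjunction, mistakes on negative examples cannot
    # occur (the next slide argues why), so they trigger no update here.
    return literals, mistake

def run_online(target, n):
    """Feed every instance of {0,1}^n labeled by `target`; return (final L, mistakes)."""
    literals, mistakes = initial_hypothesis(n), 0
    for x in product((0, 1), repeat=n):
        literals, m = update(literals, x, predict(target, x))
        mistakes += m
    return literals, mistakes

# The two examples from the slide.
L = initial_hypothesis(4)
L, _ = update(L, (0, 1, 1, 1), 1)
print(sorted(L))  # [(0, 0), (1, 1), (2, 1), (3, 1)], i.e. {NOT X1, X2, X3, X4}
L, _ = update(L, (1, 1, 1, 0), 1)
print(sorted(L))  # [(1, 1), (2, 1)], i.e. {X2, X3}
```

Representing $L$ as a set keeps each update a single filtering pass, so the work per example is $O(N)$.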

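8 Mistake Bound Model. Proof of Theorem 1: No literal in $C$ is ever eliminated from $L$, so the hypothesis is never more general than $C$; it never errs on a negative example, and every mistake is made on a positive example. Each mistake eliminates at least one literal from $L$. The first mistake eliminates $N$ of the $2N$ literals (one of $X_i$, $\neg X_i$ for each $i$), leaving at most $N$ literals, so at most $N$ further mistakes are possible. Hence conjunctive concepts can be learned with at most $N+1$ mistakes. Conclusion: Conjunctive concepts are easy to learn in the mistake bound model.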

9 Optimal Mistake Bound Learning Algorithms. Definition: An optimal mistake bound $\mathrm{mbound}(C)$ for a concept class $C$ is the lowest possible mistake bound in the worst case (considering all concepts in $C$, and all possible sequences of examples). Definition: An optimal learning algorithm for a concept class $C$ (in the mistake bound framework) is one that is guaranteed to exactly learn any concept in $C$, using any noise-free example sequence, with at most $O(\mathrm{mbound}(C))$ mistakes. Theorem: $\mathrm{mbound}(C) \le \lg |C|$.

10 The Halving Algorithm. Definition: The version space is $V_i = \{c \in C : c \text{ is consistent with the first } i \text{ examples}\}$. Definition: The halving algorithm predicts according to the majority of concepts in the current version space; after each example, the concepts that disagree with its label are eliminated from the version space. A mistake means the majority was wrong, so each mistake eliminates at least half of the version space, which gives at most $\lg |C|$ mistakes. Fine print: The halving algorithm may not be efficiently implementable.
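A small Python sketch of the halving algorithm, assuming the concept class is small enough to enumerate explicitly: here all $3^3 = 27$ conjunctions over 3 variables, each written as a tuple whose entries are `1`, `0`, or `None` (variable absent). The explicit list is exactly what makes the algorithm impractical for large classes, which is the fine print above. Names and the tie-breaking rule (ties predict 0) are assumptions of this sketch.

```python
import math
from itertools import product

def all_conjunctions(n):
    """Every conjunction over n variables: each variable is required to be 1,
    required to be 0, or absent (None). There are 3**n of them."""
    return list(product((None, 0, 1), repeat=n))

def evaluate(concept, x):
    return int(all(v is None or x[i] == v for i, v in enumerate(concept)))

def halving(concepts, examples):
    """Halving algorithm over an explicitly enumerated version space.
    Returns (number of mistakes, final version space)."""
    version_space = list(concepts)
    mistakes = 0
    for x, label in examples:
        votes_for_1 = sum(evaluate(c, x) for c in version_space)
        prediction = int(2 * votes_for_1 > len(version_space))  # ties predict 0
        mistakes += prediction != label
        # Eliminate every concept that disagrees with the revealed label.
        version_space = [c for c in version_space if evaluate(c, x) == label]
    return mistakes, version_space

concepts = all_conjunctions(3)
target = (1, None, 0)  # hypothetical target: X1 AND NOT X3
examples = [(x, evaluate(target, x)) for x in product((0, 1), repeat=3)]
mistakes, remaining = halving(concepts, examples)
print(mistakes <= math.log2(len(concepts)), remaining)  # True [(1, None, 0)]
```

Because a mistake means the wrong side held at least half of the votes, each mistake at least halves the version space, so the run can make at most $\lfloor \lg 27 \rfloor = 4$ mistakes.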

11 The Halving Algorithm. The halving algorithm can be practical if there is a way to compactly represent and efficiently manipulate the version space. Question: Are there any efficiently implementable optimal mistake bound learning algorithms? Answer: Yes. Littlestone's algorithm (Winnow) learns monotone disjunctions of at most $k$ of $n$ literals, using the hypothesis class of threshold functions, with $O(k \lg n)$ mistakes.
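A Python sketch of a Winnow-style learner for monotone disjunctions. The specific variant is an assumption of this sketch rather than the slide's exact statement: threshold $n$, all weights starting at 1, doubling the active weights on a false negative, halving them on a false positive. For this variant a standard potential argument gives at most $2 + 3k(1 + \lg n)$ mistakes on any example sequence. The target disjunction and the random instances are hypothetical.

```python
import math
import random

def winnow(examples, n):
    """Winnow-style learner for monotone disjunctions over n boolean variables.
    Weights start at 1; predict 1 iff the weighted sum of active variables >= n.
    False negative: double active weights. False positive: halve them."""
    weights = [1.0] * n
    mistakes = 0
    for x, label in examples:
        total = sum(w for w, xi in zip(weights, x) if xi)
        prediction = int(total >= n)
        if prediction != label:
            mistakes += 1
            factor = 2.0 if label == 1 else 0.5
            weights = [w * factor if xi else w for w, xi in zip(weights, x)]
    return weights, mistakes

# Hypothetical target: X4 OR X11 OR X51 over n = 64 variables (k = 3),
# with sparse random instances.
n, relevant = 64, (3, 10, 50)
rng = random.Random(1)
examples = []
for _ in range(2000):
    x = [int(rng.random() < 0.05) for _ in range(n)]
    examples.append((x, int(any(x[i] for i in relevant))))

weights, mistakes = winnow(examples, n)
bound = 2 + 3 * len(relevant) * (1 + math.log2(n))
print(mistakes <= bound)  # True
```

The hypothesis is a single weight vector of length $n$, so each prediction and update takes $O(n)$ time. This is what makes the algorithm efficient where the halving algorithm is not.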

12 Bounding the Prediction Error. The mistake bound model bounds the number of mistakes that the learner will ever make before exactly learning a concept, but not the prediction error after having seen a certain number of examples. The mistake bound model assumes that the examples are chosen arbitrarily: in the worst case, by a smart, adversarial teacher. It might often be satisfactory to assume randomly drawn examples instead.

13 Probably Approximately Correct Learning [Figure: diagram relating the Instance Distribution, Samples, the Oracle, the Concept, Examples, and the Learner.]

14 Probably Approximately Correct Learning. Consider: an instance space $X$; a concept space $C = \{c : X \to \{0,1\}\}$; a hypothesis space $H = \{h : X \to \{0,1\}\}$; an unknown, arbitrary, not necessarily computable, stationary probability distribution $D$ over the instance space $X$.

15 PAC Learning. The oracle samples the instance space according to $D$ and provides labeled examples of an unknown concept $C$ to the learner. The learner is tested on samples drawn from the instance space according to the same probability distribution $D$. The learner's task is to output a hypothesis $h$ from $H$ that closely approximates the unknown concept $C$, based on the examples it has encountered.

16 PAC Learning. In the PAC setting, exact learning (zero-error approximation) cannot be guaranteed. In the PAC setting, even approximate learning (with bounded non-zero error) cannot be guaranteed 100% of the time. Definition: The error of a hypothesis $h$ with respect to a target concept $C$ and an instance distribution $D$ is given by $\mathrm{error}_D(h) = \Pr_{X \sim D}[C(X) \neq h(X)]$.
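A short Python sketch of the error definition, assuming a finite instance space so that $D$ can be written as a dictionary from instances to probabilities. `true_error` computes $\Pr_{X \sim D}[C(X) \neq h(X)]$ exactly. `example_oracle` is a hypothetical stand-in for the Example oracle: it draws $X$ according to $D$ and labels it with the target. The target and hypothesis are made up for illustration.

```python
import random
from fractions import Fraction
from itertools import product

def true_error(h, c, dist):
    """error_D(h) = Pr_{X~D}[c(X) != h(X)] for a finite D given as {instance: probability}."""
    return sum((p for x, p in dist.items() if c(x) != h(x)), Fraction(0))

def example_oracle(c, dist, rng):
    """Draw one instance X according to D and return the labeled example (X, c(X))."""
    instances = list(dist)
    x = rng.choices(instances, weights=[float(dist[i]) for i in instances], k=1)[0]
    return x, c(x)

def empirical_error(h, examples):
    """Fraction of labeled examples on which h disagrees with the label."""
    return sum(h(x) != y for x, y in examples) / len(examples)

# Hypothetical setup: uniform D over {0,1}^3, target X1 AND X2, hypothesis X1.
def target(x):
    return int(x[0] == 1 and x[1] == 1)

def hypothesis(x):
    return x[0]

uniform = {x: Fraction(1, 8) for x in product((0, 1), repeat=3)}
print(true_error(hypothesis, target, uniform))  # 1/4

rng = random.Random(0)
test_set = [example_oracle(target, uniform, rng) for _ in range(10_000)]
print(empirical_error(hypothesis, test_set))  # close to 0.25 for a large test set
```

The hypothesis $X_1$ errs exactly on instances with $X_1 = 1, X_2 = 0$, which carry probability $2/8$ under the uniform distribution.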

17 PAC Learning. Definition: A concept class $C$ is said to be PAC-learnable using a hypothesis class $H$ if there exists a learning algorithm $L$ such that for all concepts in $C$, for all instance distributions $D$ on an instance space $X$, and for all $\epsilon, \delta$ with $0 < \epsilon, \delta < 1$, $L$, when given access to the Example oracle, produces, with probability at least $1 - \delta$, a hypothesis $h$ from $H$ with error no more than $\epsilon$ (Valiant, 1984).

18 Efficient PAC Learning. Definition: $C$ is said to be efficiently PAC-learnable if $L$ runs in time that is polynomial in $N$ (size of the instance representation), $\mathrm{size}(c)$ (size of the concept representation), $1/\epsilon$, and $1/\delta$. Remark: Lower error or higher confidence requires more examples. Remark: In order for a concept class to be efficiently PAC-learnable, it should be PAC-learnable using a random sample of size polynomial in the relevant parameters.

19 Sample Complexity of PAC Learning. Definition: A consistent learner is one that returns some hypothesis $h$ from the hypothesis class $H$ that is consistent with a random sequence of $m$ examples. Remark: A consistent learner is a MAP learner (one that returns a hypothesis that is most likely given the training data) if all hypotheses are a priori equally likely. Theorem: A consistent learner is guaranteed to be PAC if the number of samples satisfies $m > \frac{1}{\epsilon} \ln \frac{|H|}{\delta}$.
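A small Python calculator for the bound in the theorem, offered as a sketch. `consistent_learner_sample_size` returns the smallest integer $m$ strictly greater than $\frac{1}{\epsilon} \ln \frac{|H|}{\delta}$. `failure_bound` evaluates the union-bound quantity $|H|(1-\epsilon)^m$ used in the proof. The example takes $|H| = 3^N$ for conjunctions over $N$ variables, matching the $N \ln 3$ term for conjunctive concepts. The values of $N$, $\epsilon$, and $\delta$ are hypothetical.

```python
import math

def consistent_learner_sample_size(h_size, epsilon, delta):
    """Smallest integer m with m > (1/epsilon) * ln(|H| / delta)."""
    bound = (math.log(h_size) + math.log(1 / delta)) / epsilon
    return math.floor(bound) + 1

def failure_bound(h_size, epsilon, m):
    """Union bound |H| (1 - epsilon)^m on the chance that some hypothesis with
    error > epsilon is consistent with all m examples."""
    return h_size * (1 - epsilon) ** m

# Conjunctions over N boolean variables: each variable appears positively,
# negatively, or not at all, so |H| is taken to be 3**N.
N, epsilon, delta = 10, 0.1, 0.05
m = consistent_learner_sample_size(3 ** N, epsilon, delta)
print(m)  # 140
print(failure_bound(3 ** N, epsilon, m) < delta)  # True
```

Note that $\ln|H|$ grows only linearly in $N$ here, which is why the sample complexity of conjunctions stays polynomial even though $|H|$ is exponential.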

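20 Sample Complexity of PAC Learning. Proof: Consider a hypothesis $h$ that is not a PAC approximation of an unknown concept $C$, i.e., whose error exceeds $\epsilon$. Then the probability that $h$ is right on a random instance is at most $1 - \epsilon$, so the probability of $h$ being consistent with $m$ independently drawn random examples is at most $(1 - \epsilon)^m$. For PAC learning, we want to make sure that the probability of $L$ returning such a bad hypothesis is small. By the union bound over the at most $|H|$ bad hypotheses, it suffices that $|H|(1 - \epsilon)^m < \delta$. Since $1 - \epsilon \le e^{-\epsilon}$, this holds whenever $|H| e^{-\epsilon m} < \delta$, that is, whenever $m > \frac{1}{\epsilon} \ln \frac{|H|}{\delta}$.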

21 PAC-Easy and PAC-Hard Concept Classes. Conjunctive concepts are easy to learn: use the same algorithm as the one used in the mistake bound framework. Sample complexity: $O\left(\frac{1}{\epsilon}\left(N \ln 3 + \ln \frac{1}{\delta}\right)\right)$. Time complexity is polynomial in the relevant parameters of interest. Remark: Polynomial sample complexity is necessary but not sufficient for efficient PAC learning.

22 PAC-Easy and PAC-Hard Concept Classes. Theorem: The 3-term DNF concept class (disjunctions of at most 3 conjunctions) is not efficiently PAC-learnable using the same hypothesis class (although it has polynomial sample complexity) unless RP = NP. Proof: By polynomial-time reduction of graph 3-colorability (a well-known NP-complete problem) to the problem of deciding whether a given set of labeled examples is consistent with some 3-term DNF formula.

23 Transforming Hard Problems to Easy Ones. Theorem: 3-term DNF concepts are efficiently PAC-learnable using the 3-CNF (conjunctive normal form with at most 3 literals per clause) hypothesis class. Proof: 3-term DNF $\subseteq$ 3-CNF, since by distributivity $T_1 \vee T_2 \vee T_3 = \bigwedge_{u \in T_1, v \in T_2, w \in T_3} (u \vee v \vee w)$. Transform each example over $N$ boolean variables into a corresponding example over $O(N^3)$ variables (one for each possible clause in a 3-CNF formula). The problem reduces to learning a conjunctive concept over the transformed instance space.
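A Python sketch that checks the two steps of the proof on a small case. First, distributing $T_1 \vee T_2 \vee T_3$ into clauses $(u \vee v \vee w)$ preserves the function. Second, after mapping each instance to the truth values of all 3-literal clauses, the same function becomes a plain conjunction of selected features. The literal encoding `(i, v)`, the clause enumeration (ordered triples, repeats allowed, $(2N)^3$ features), and the particular 3-term DNF are assumptions made for illustration.

```python
from itertools import product

# A literal is (i, v): X_{i+1} if v == 1, NOT X_{i+1} if v == 0.

def term_true(term, x):
    return all(x[i] == v for i, v in term)

def clause_true(clause, x):
    return any(x[i] == v for i, v in clause)

def dnf3_to_cnf3(t1, t2, t3):
    """Distribute T1 OR T2 OR T3 into clauses (u OR v OR w), u in T1, v in T2, w in T3."""
    return [(u, v, w) for u in t1 for v in t2 for w in t3]

def all_clauses(n):
    """One new feature per 3-literal clause (literals may repeat): (2n)**3 = O(n**3)."""
    literals = [(i, v) for i in range(n) for v in (0, 1)]
    return list(product(literals, repeat=3))

def transform(x, clauses):
    """Map x in {0,1}^n to the vector of truth values of every clause."""
    return tuple(int(clause_true(c, x)) for c in clauses)

# Hypothetical 3-term DNF over N = 4: (X1 AND NOT X2) OR X3 OR (X2 AND NOT X4).
t1, t2, t3 = [(0, 1), (1, 0)], [(2, 1)], [(1, 1), (3, 0)]
cnf = dnf3_to_cnf3(t1, t2, t3)
clauses = all_clauses(4)
selected = [clauses.index(c) for c in cnf]  # the target is a conjunction of these features

for x in product((0, 1), repeat=4):
    dnf_value = term_true(t1, x) or term_true(t2, x) or term_true(t3, x)
    z = transform(x, clauses)
    assert dnf_value == all(clause_true(c, x) for c in cnf) == all(z[j] for j in selected)
print(len(cnf), len(clauses))  # 4 512
```

In the transformed space, the elimination algorithm for conjunctions can be run unchanged over the 512 clause features.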

24 Transforming Hard Problems to Easy Ones. Theorem: For any fixed $k \ge 2$, $k$-term DNF is efficiently PAC-learnable using the $k$-CNF hypothesis class. Remark: In this case, enlarging the search space by using a hypothesis class that is larger than strictly necessary actually makes the problem easy! Remark: No, we have not proved that P=NP. Summary: Conjunctive: easy; $k$-term DNF: hard; $k$-CNF: easy; CNF: hard.

25 Inductive Bias: Occam's Razor. Occam's razor: Keep it simple, stupid! An Occam learning algorithm returns a simple or succinct hypothesis that is consistent with the training data. Definition: Let $\alpha \ge 0$ and $0 \le \beta < 1$ be constants. A learning algorithm $L$ is said to be an $(\alpha, \beta)$-Occam algorithm for a concept class $C$ using a hypothesis class $H$ if $L$, given $m$ random examples of an unknown concept $c \in C$, outputs a hypothesis $h \in H$ such that $h$ is consistent with the examples and $\mathrm{size}(h) \le (N \, \mathrm{size}(c))^{\alpha} m^{\beta}$.

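26 Sample Complexity of an Occam Algorithm. Theorem: An Occam algorithm is guaranteed to be PAC if the number of samples is $m = O\left(\frac{1}{\epsilon} \lg \frac{1}{\delta} + \left(\frac{(N \, \mathrm{size}(c))^{\alpha}}{\epsilon}\right)^{\frac{1}{1-\beta}}\right)$. Proof: omitted.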

27 Occam Algorithm Is PAC for k-Decision Lists. Theorem: For any fixed $k$, the concept class of $k$-decision lists (nested if-then-else statements where each if condition is a conjunction of at most $k$ of the $N$ literals and their negations) is efficiently PAC-learnable using the same hypothesis class. Remark: $k$-decision lists constitute the most expressive boolean concept class over the boolean instance space $\{0,1\}^N$ that is known to be efficiently PAC-learnable.
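The slide does not spell out the learner. A common choice is the greedy covering procedure usually attributed to Rivest, sketched below in Python as an assumption: repeatedly pick a conjunction of at most $k$ literals whose remaining covered examples all carry one label, append that rule, and discard the covered examples. The empty term `()` is always true and plays the role of the final default. The target list and the XOR check are hypothetical examples.

```python
from itertools import combinations, product

# A literal is (i, v), satisfied when x[i] == v; a term is a tuple of literals.

def candidate_terms(n, k):
    """All conjunctions of at most k literals over n variables (no variable twice);
    the empty term () is always true and acts as the final default rule."""
    literals = [(i, v) for i in range(n) for v in (0, 1)]
    terms = [()]
    for size in range(1, k + 1):
        for combo in combinations(literals, size):
            if len({i for i, _ in combo}) == size:
                terms.append(combo)
    return terms

def satisfies(x, term):
    return all(x[i] == v for i, v in term)

def learn_decision_list(examples, n, k):
    """Greedy covering: repeatedly pick a term whose covered remaining examples all
    share one label, emit (term, label), and discard those examples.
    Returns None if no consistent k-decision list exists."""
    remaining = list(examples)
    rules = []
    terms = candidate_terms(n, k)
    while remaining:
        for term in terms:
            labels = {y for x, y in remaining if satisfies(x, term)}
            if len(labels) == 1:
                rules.append((term, labels.pop()))
                remaining = [(x, y) for x, y in remaining if not satisfies(x, term)]
                break
        else:
            return None
    return rules

def predict(rules, x, default=0):
    for term, label in rules:
        if satisfies(x, term):
            return label
    return default

# Hypothetical target 2-decision list over N = 4:
# if X1 and not X2 then 1, else if X3 then 0, else 1.
def target(x):
    if x[0] == 1 and x[1] == 0:
        return 1
    if x[2] == 1:
        return 0
    return 1

examples = [(x, target(x)) for x in product((0, 1), repeat=4)]
rules = learn_decision_list(examples, n=4, k=2)
print(all(predict(rules, x) == y for x, y in examples))  # True
```

When the data come from some $k$-decision list, a usable term always exists: the first target rule that still covers a remaining example covers only examples with its own label. So the greedy loop never gets stuck. There are $O(N^k)$ candidate terms, so each round is polynomial for fixed $k$.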

28 PAC Learning of Infinite Concept Classes. Sample complexity results can be derived for concepts defined over $\mathbb{R}^N$. Remark: The cardinality of concept and hypothesis classes can now be infinite (e.g., in the case of threshold functions over $\mathbb{R}^N$). Solution: Instead of the cardinality of the concept class, use the Vapnik-Chervonenkis dimension (VC dimension) of the concept class to compute sample complexity.

29 VC Dimension and Sample Complexity. Definition: A set $S$ of instances is shattered by a hypothesis class $H$ if and only if for every dichotomy of $S$, there exists a hypothesis in $H$ that is consistent with the dichotomy. Definition: The VC dimension $V(H)$ of a hypothesis class $H$ defined over an instance space $X$ is the cardinality of the largest subset of $X$ that is shattered by $H$. If arbitrarily large finite subsets of $X$ can be shattered by $H$, then $V(H) = \infty$.
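A brute-force Python check of the shattering definition. Testing half-planes exactly would need a linear-programming separability test, so this sketch uses a simpler stand-in class as an assumption: closed intervals $[a, b]$ on the real line, whose VC dimension is 2. It suffices to try endpoints below all points, between consecutive points, and above all points. Every labeling an interval can produce is then realized by one of these finitely many candidates.

```python
from itertools import product

def interval_hypotheses(points):
    """Finite family of closed intervals [a, b] that realizes every labeling
    of `points` achievable by any interval: endpoints are taken below all
    points, between consecutive points, and above all points."""
    xs = sorted(set(points))
    cuts = [xs[0] - 1] + [(u + v) / 2 for u, v in zip(xs, xs[1:])] + [xs[-1] + 1]
    return [(a, b) for a in cuts for b in cuts if a <= b]

def shattered_by_intervals(points):
    """True iff every dichotomy of `points` is realized by some interval."""
    realized = {
        tuple(int(a <= x <= b) for x in points)
        for a, b in interval_hypotheses(points)
    }
    return all(d in realized for d in product((0, 1), repeat=len(points)))

print(shattered_by_intervals([1.0, 2.0]))       # True
print(shattered_by_intervals([1.0, 2.0, 3.0]))  # False: (1, 0, 1) is not realizable
```

The same loop over all $2^{|S|}$ dichotomies applies to any hypothesis class. Only the step that enumerates realizable labelings is specific to intervals.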

30 VC Dimension and Sample Complexity. Example: Let the instance space $X$ be the 2-dimensional Euclidean space. Let the hypothesis space $H$ be the set of half-planes bounded by lines (1-dimensional hyperplanes) in the 2-dimensional Euclidean space. Then $V(H) = 3$: a set of 3 points can be shattered as long as the points are not collinear, but no set of 4 points can be shattered.

31 VC Dimension and Sample Complexity. Theorem: The number $m$ of random examples needed for PAC learning of a concept class $C$ of VC dimension $V(C) = d$ is given by $m = O\left(\frac{1}{\epsilon} \lg \frac{1}{\delta} + \frac{d}{\epsilon} \lg \frac{1}{\epsilon}\right)$. Corollary: Acyclic, layered multi-layer networks of $s$ threshold logic units, each with $r$ inputs, have VC dimension at most $2(r+1) s \lg(e s)$.
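The theorem hides its constants in $O(\cdot)$. The Python sketch below uses one explicit sufficient form as an assumption: the version given in Mitchell's *Machine Learning*, ch. 7, $m \ge \frac{1}{\epsilon}\left(4 \lg \frac{2}{\delta} + 8 d \lg \frac{13}{\epsilon}\right)$. It also evaluates the corollary's network bound, reading $\lg$ as log base 2. The parameter values are hypothetical.

```python
import math

def vc_sample_size(d, epsilon, delta):
    """Sufficient sample size for a consistent learner over a hypothesis class of
    VC dimension d, in the explicit form given in Mitchell's Machine Learning, ch. 7:
    m >= (1/epsilon) * (4 lg(2/delta) + 8 d lg(13/epsilon))."""
    return math.ceil((4 * math.log2(2 / delta) + 8 * d * math.log2(13 / epsilon)) / epsilon)

def threshold_network_vc_bound(s, r):
    """Upper bound 2 (r + 1) s lg(e s) on the VC dimension of a layered network
    of s threshold units with r inputs each, reading lg as log base 2."""
    return 2 * (r + 1) * s * math.log2(math.e * s)

# Half-planes in the plane have VC dimension 3 (previous slide).
print(vc_sample_size(3, 0.1, 0.05))  # 1899
print(round(threshold_network_vc_bound(10, 5), 1))
```

The bound depends on the class only through $d$, so it applies to infinite classes such as half-planes, where the $\ln |H|$ bound is undefined.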

32 Using a Weak learner for PAC Learning PAC learning requires learning under all distributions, for all choices of error and confidence parameters. Suppose we are given a weak learning algorithm for concept class C that works for a fixed error and/or a fixed confidence. Can we use it for PAC learning of C? YES! (Kearns & Vazirani, 94; Natarajan, 92)

33 Learning from Simple Examples Question: Can we relax the requirement of learning under all probability distributions over the instance space (including extremely pathological distributions) by limiting the class of distributions to a useful subset of all possible distributions? What are the implications of doing so on the learnability of concept classes that are PAC-hard? What probability distributions are natural?

34 Learning from Simple Examples Intuition: Suppose mother nature is kind to us: Simple instances are more likely to be made available to the learner. Question: How can we formalize this intuitive notion? Answer: Kolmogorov complexity offers a natural measure of descriptional complexity of an instance

35 Kolmogorov Complexity. Definition: The Kolmogorov complexity of an object $\gamma$ relative to a universal Turing machine $M$ is the length (measured in number of bits) of the shortest program which, when executed on $M$, prints out $\gamma$ and halts: $K(\gamma) = \min_{\pi} \{ l(\pi) : M(\pi) = \gamma \}$. Remark: Simple objects (e.g., a string of all zeros) have low Kolmogorov complexity.
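$K(\gamma)$ is not computable. However, any fixed compressor gives an upper bound, up to an additive constant accounting for the decompressor. The Python sketch below uses `zlib` as that stand-in compressor, which is an assumption of this sketch and not part of the definition. It illustrates the remark that a string of all zeros is simple.

```python
import random
import zlib

def compressed_bits(data: bytes) -> int:
    """Bits in a zlib encoding of `data`. Because zlib decompression is a fixed
    program, this upper-bounds K(data) up to an additive constant; it is only a
    proxy, since K itself is not computable."""
    return 8 * len(zlib.compress(data, 9))

zeros = bytes(1000)  # 1000 zero bytes: a "simple" object
rng = random.Random(0)
noise = bytes(rng.getrandbits(8) for _ in range(1000))  # pseudorandom bytes

print(compressed_bits(zeros) < compressed_bits(noise))  # True
```

The pseudorandom bytes actually have low Kolmogorov complexity, since a short seeded program prints them. The compressor cannot detect this, which is exactly why such a proxy yields only an upper bound on $K$.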

36 Kolmogorov Complexity. Definition: The conditional Kolmogorov complexity $K(\gamma \mid \lambda)$ of $\gamma$ given $\lambda$ is the length of the shortest program $\pi$ for a universal Turing machine $M$ which, given $\lambda$, outputs $\gamma$. Remark: $K(\gamma \mid \lambda) \le K(\gamma) + O(1)$. Remark: Kolmogorov complexity is machine-independent (modulo an additive constant).

37 Universal Distribution. Definition: The universal probability distribution over an instance space $X$ is defined by $D_U(X) = \eta \, 2^{-K(X)}$ for every instance $X$, where $\eta$ is a normalization constant. Definition: A distribution $D$ is simple if it is multiplicatively dominated by the universal distribution, that is, there exists a constant $\sigma$ such that $\sigma D_U(X) \ge D(X)$ for all $X$. Remark: All computable distributions (including Gaussian, Poisson, etc. with finite-precision parameters) are simple.

38 PAC Learning Under Simple Distributions. Theorem: A concept class $C$ defined over a discrete instance space is polynomially PAC-learnable under the universal distribution iff it is polynomially PAC-learnable under each simple distribution, provided that, during the learning phase, the samples are drawn according to the universal distribution (Li & Vitanyi, 91). Remarks: This raises the possibility of learning under all simple distributions by sampling examples according to the universal distribution. But the universal distribution is not computable. Is nature characterized by the universal distribution? Can we approximate the universal distribution?

39 Learning from Simple Examples. Suppose a knowledgeable teacher provides simple examples (i.e., examples with low Kolmogorov complexity conditioned on the teacher's knowledge of the concept to be learned). More precisely, $D_r(X) = \eta_r \, 2^{-K(X \mid r)}$, where $r$ is a suitable representation of the unknown concept and $\eta_r$ is a normalization constant. Definition: Let $S_S$ be a set of simple examples, that is, $\forall X \in S_S: K(X \mid r) \le \mu \lg \mathrm{sizeof}(r)$.

40 Learning from Simple Examples. Definition (informal): A representative sample $S_R$ is one that contains all the information necessary for identifying an unknown concept. Example: To learn a finite state machine, representative examples provide information about all the state transitions. Theorem: If there exists a representative set of simple examples for each concept in a concept class $C$, then $C$ is PAC-learnable under distribution $D_r$ (Denis et al., 96).

41 Learning from Simple Examples. Theorem: The class of DFA whose canonical representations have at most $Q$ states is polynomially exactly learnable when examples are provided from a sample drawn according to $D_r$ and $Q$ is known (Parekh & Honavar, 97). Theorem: The class of DFA is probably approximately learnable under $D_r$ (Parekh & Honavar, 97). Remark: These are encouraging results in light of the strong evidence against efficient PAC learnability of DFA (Kearns and Vazirani, 1994).

42 Concluding Remarks. PAC-easy learning problems lend themselves to a variety of efficient algorithms. PAC-hard learning problems can often be made PAC-easy through appropriate instance transformation and choice of hypothesis space. Occam's razor often helps. Weak learning algorithms can often be used for strong learning. Learning under restricted classes of instance distributions (e.g., the universal distribution) offers new possibilities.

43 Bibliography
Honavar, V.
Kearns, M.J. & Vazirani, U.V. An Introduction to Computational Learning Theory. Cambridge, MA: MIT Press, 1994.
Langley, P. Elements of Machine Learning. Palo Alto, CA: Morgan Kaufmann.
Li, M. & Vitanyi, P. Kolmogorov Complexity and its Applications. New York: Springer-Verlag.
Mitchell, T. Machine Learning. New York: McGraw Hill.
Natarajan, B.K. Machine Learning: A Theoretical Approach. Palo Alto, CA: Morgan Kaufmann, 1992.

Lecture 5: Efficient PAC Learning. 1 Consistent Learning: a Bound on Sample Complexity

Lecture 5: Efficient PAC Learning. 1 Consistent Learning: a Bound on Sample Complexity Universität zu Lübeck Institut für Theoretische Informatik Lecture notes on Knowledge-Based and Learning Systems by Maciej Liśkiewicz Lecture 5: Efficient PAC Learning 1 Consistent Learning: a Bound on

More information

Lecture 25 of 42. PAC Learning, VC Dimension, and Mistake Bounds

Lecture 25 of 42. PAC Learning, VC Dimension, and Mistake Bounds Lecture 25 of 42 PAC Learning, VC Dimension, and Mistake Bounds Thursday, 15 March 2007 William H. Hsu, KSU http://www.kddresearch.org/courses/spring2007/cis732 Readings: Sections,,

More information

Computational Learning Theory

Computational Learning Theory 0. Computational Learning Theory Based on Machine Learning, T. Mitchell, McGRAW Hill, 1997, ch. 7 Acknowledgement: The present slides are an adaptation of slides drawn by T. Mitchell 1. Main Questions

More information

Computational Learning Theory. Definitions

Computational Learning Theory. Definitions Computational Learning Theory Computational learning theory is interested in theoretical analyses of the following issues. What is needed to learn effectively? Sample complexity. How many examples? Computational

More information

CS 6375: Machine Learning Computational Learning Theory

CS 6375: Machine Learning Computational Learning Theory CS 6375: Machine Learning Computational Learning Theory Vibhav Gogate The University of Texas at Dallas Many slides borrowed from Ray Mooney 1 Learning Theory Theoretical characterizations of Difficulty

More information

Computational Learning Theory (COLT)

Computational Learning Theory (COLT) Computational Learning Theory (COLT) Goals: Theoretical characterization of 1 Difficulty of machine learning problems Under what conditions is learning possible and impossible? 2 Capabilities of machine

More information

Computational Learning Theory

Computational Learning Theory 1 Computational Learning Theory 2 Computational learning theory Introduction Is it possible to identify classes of learning problems that are inherently easy or difficult? Can we characterize the number

More information

Computational Learning Theory (VC Dimension)

Computational Learning Theory (VC Dimension) Computational Learning Theory (VC Dimension) 1 Difficulty of machine learning problems 2 Capabilities of machine learning algorithms 1 Version Space with associated errors error is the true error, r is

More information

Computational Learning Theory

Computational Learning Theory Computational Learning Theory Sinh Hoa Nguyen, Hung Son Nguyen Polish-Japanese Institute of Information Technology Institute of Mathematics, Warsaw University February 14, 2006 inh Hoa Nguyen, Hung Son

More information

Computational Learning Theory

Computational Learning Theory CS 446 Machine Learning Fall 2016 OCT 11, 2016 Computational Learning Theory Professor: Dan Roth Scribe: Ben Zhou, C. Cervantes 1 PAC Learning We want to develop a theory to relate the probability of successful

More information

Online Learning, Mistake Bounds, Perceptron Algorithm

Online Learning, Mistake Bounds, Perceptron Algorithm Online Learning, Mistake Bounds, Perceptron Algorithm 1 Online Learning So far the focus of the course has been on batch learning, where algorithms are presented with a sample of training data, from which

More information

Computational Learning Theory

Computational Learning Theory 09s1: COMP9417 Machine Learning and Data Mining Computational Learning Theory May 20, 2009 Acknowledgement: Material derived from slides for the book Machine Learning, Tom M. Mitchell, McGraw-Hill, 1997

More information

CS340 Machine learning Lecture 4 Learning theory. Some slides are borrowed from Sebastian Thrun and Stuart Russell

CS340 Machine learning Lecture 4 Learning theory. Some slides are borrowed from Sebastian Thrun and Stuart Russell CS340 Machine learning Lecture 4 Learning theory Some slides are borrowed from Sebastian Thrun and Stuart Russell Announcement What: Workshop on applying for NSERC scholarships and for entry to graduate

More information

Introduction to Machine Learning

Introduction to Machine Learning Introduction to Machine Learning PAC Learning and VC Dimension Varun Chandola Computer Science & Engineering State University of New York at Buffalo Buffalo, NY, USA chandola@buffalo.edu Chandola@UB CSE

More information

Machine Learning

Machine Learning Machine Learning 10-601 Tom M. Mitchell Machine Learning Department Carnegie Mellon University October 11, 2012 Today: Computational Learning Theory Probably Approximately Coorrect (PAC) learning theorem

More information

Computational Learning Theory

Computational Learning Theory Computational Learning Theory Slides by and Nathalie Japkowicz (Reading: R&N AIMA 3 rd ed., Chapter 18.5) Computational Learning Theory Inductive learning: given the training set, a learning algorithm

More information

Computational Learning Theory

Computational Learning Theory Computational Learning Theory Pardis Noorzad Department of Computer Engineering and IT Amirkabir University of Technology Ordibehesht 1390 Introduction For the analysis of data structures and algorithms

More information

Machine Learning

Machine Learning Machine Learning 10-601 Tom M. Mitchell Machine Learning Department Carnegie Mellon University October 11, 2012 Today: Computational Learning Theory Probably Approximately Coorrect (PAC) learning theorem

More information

Computational Learning Theory. CS 486/686: Introduction to Artificial Intelligence Fall 2013

Computational Learning Theory. CS 486/686: Introduction to Artificial Intelligence Fall 2013 Computational Learning Theory CS 486/686: Introduction to Artificial Intelligence Fall 2013 1 Overview Introduction to Computational Learning Theory PAC Learning Theory Thanks to T Mitchell 2 Introduction

More information

Web-Mining Agents Computational Learning Theory

Web-Mining Agents Computational Learning Theory Web-Mining Agents Computational Learning Theory Prof. Dr. Ralf Möller Dr. Özgür Özcep Universität zu Lübeck Institut für Informationssysteme Tanya Braun (Exercise Lab) Computational Learning Theory (Adapted)

More information

Introduction to Computational Learning Theory

Introduction to Computational Learning Theory Introduction to Computational Learning Theory The classification problem Consistent Hypothesis Model Probably Approximately Correct (PAC) Learning c Hung Q. Ngo (SUNY at Buffalo) CSE 694 A Fun Course 1

More information

Computational Learning Theory. CS534 - Machine Learning

Computational Learning Theory. CS534 - Machine Learning Computational Learning Theory CS534 Machine Learning Introduction Computational learning theory Provides a theoretical analysis of learning Shows when a learning algorithm can be expected to succeed Shows

More information

Computational Learning Theory - Hilary Term : Introduction to the PAC Learning Framework

Computational Learning Theory - Hilary Term : Introduction to the PAC Learning Framework Computational Learning Theory - Hilary Term 2018 1 : Introduction to the PAC Learning Framework Lecturer: Varun Kanade 1 What is computational learning theory? Machine learning techniques lie at the heart

More information

[read Chapter 2] [suggested exercises 2.2, 2.3, 2.4, 2.6] General-to-specific ordering over hypotheses

[read Chapter 2] [suggested exercises 2.2, 2.3, 2.4, 2.6] General-to-specific ordering over hypotheses 1 CONCEPT LEARNING AND THE GENERAL-TO-SPECIFIC ORDERING [read Chapter 2] [suggested exercises 2.2, 2.3, 2.4, 2.6] Learning from examples General-to-specific ordering over hypotheses Version spaces and

More information

Computational learning theory. PAC learning. VC dimension.

Computational learning theory. PAC learning. VC dimension. Computational learning theory. PAC learning. VC dimension. Petr Pošík Czech Technical University in Prague Faculty of Electrical Engineering Dept. of Cybernetics COLT 2 Concept...........................................................................................................

More information

A Necessary Condition for Learning from Positive Examples

A Necessary Condition for Learning from Positive Examples Machine Learning, 5, 101-113 (1990) 1990 Kluwer Academic Publishers. Manufactured in The Netherlands. A Necessary Condition for Learning from Positive Examples HAIM SHVAYTSER* (HAIM%SARNOFF@PRINCETON.EDU)

More information

Foundations of Machine Learning and Data Science. Lecturer: Avrim Blum Lecture 9: October 7, 2015

Foundations of Machine Learning and Data Science. Lecturer: Avrim Blum Lecture 9: October 7, 2015 10-806 Foundations of Machine Learning and Data Science Lecturer: Avrim Blum Lecture 9: October 7, 2015 1 Computational Hardness of Learning Today we will talk about some computational hardness results

More information

NP Completeness and Approximation Algorithms

NP Completeness and Approximation Algorithms Winter School on Optimization Techniques December 15-20, 2016 Organized by ACMU, ISI and IEEE CEDA NP Completeness and Approximation Algorithms Susmita Sur-Kolay Advanced Computing and Microelectronic

More information

Machine Learning. VC Dimension and Model Complexity. Eric Xing , Fall 2015

Machine Learning. VC Dimension and Model Complexity. Eric Xing , Fall 2015 Machine Learning 10-701, Fall 2015 VC Dimension and Model Complexity Eric Xing Lecture 16, November 3, 2015 Reading: Chap. 7 T.M book, and outline material Eric Xing @ CMU, 2006-2015 1 Last time: PAC and

More information

Hypothesis Testing and Computational Learning Theory. EECS 349 Machine Learning With slides from Bryan Pardo, Tom Mitchell

Hypothesis Testing and Computational Learning Theory. EECS 349 Machine Learning With slides from Bryan Pardo, Tom Mitchell Hypothesis Testing and Computational Learning Theory EECS 349 Machine Learning With slides from Bryan Pardo, Tom Mitchell Overview Hypothesis Testing: How do we know our learners are good? What does performance

More information

Computational Learning Theory

Computational Learning Theory Computational Learning Theory [read Chapter 7] [Suggested exercises: 7.1, 7.2, 7.5, 7.8] Computational learning theory Setting 1: learner poses queries to teacher Setting 2: teacher chooses examples Setting

More information

Summer School on Introduction to Algorithms and Optimization Techniques July 4-12, 2017 Organized by ACMU, ISI and IEEE CEDA.

Summer School on Introduction to Algorithms and Optimization Techniques July 4-12, 2017 Organized by ACMU, ISI and IEEE CEDA. Summer School on Introduction to Algorithms and Optimization Techniques July 4-12, 2017 Organized by ACMU, ISI and IEEE CEDA NP Completeness Susmita Sur-Kolay Advanced Computing and Microelectronics Unit

More information

Can PAC Learning Algorithms Tolerate. Random Attribute Noise? Sally A. Goldman. Department of Computer Science. Washington University

Can PAC Learning Algorithms Tolerate. Random Attribute Noise? Sally A. Goldman. Department of Computer Science. Washington University Can PAC Learning Algorithms Tolerate Random Attribute Noise? Sally A. Goldman Department of Computer Science Washington University St. Louis, Missouri 63130 Robert H. Sloan y Dept. of Electrical Engineering

More information

VC Dimension Review. The purpose of this document is to review VC dimension and PAC learning for infinite hypothesis spaces.

VC Dimension Review. The purpose of this document is to review VC dimension and PAC learning for infinite hypothesis spaces. VC Dimension Review The purpose of this document is to review VC dimension and PAC learning for infinite hypothesis spaces. Previously, in discussing PAC learning, we were trying to answer questions about

More information

Limitations of Efficient Reducibility to the Kolmogorov Random Strings

Limitations of Efficient Reducibility to the Kolmogorov Random Strings Limitations of Efficient Reducibility to the Kolmogorov Random Strings John M. HITCHCOCK 1 Department of Computer Science, University of Wyoming Abstract. We show the following results for polynomial-time

More information

Complexity Theory VU , SS The Polynomial Hierarchy. Reinhard Pichler

Complexity Theory VU , SS The Polynomial Hierarchy. Reinhard Pichler Complexity Theory Complexity Theory VU 181.142, SS 2018 6. The Polynomial Hierarchy Reinhard Pichler Institut für Informationssysteme Arbeitsbereich DBAI Technische Universität Wien 15 May, 2018 Reinhard

More information

Outline. Complexity Theory EXACT TSP. The Class DP. Definition. Problem EXACT TSP. Complexity of EXACT TSP. Proposition VU 181.

Outline. Complexity Theory EXACT TSP. The Class DP. Definition. Problem EXACT TSP. Complexity of EXACT TSP. Proposition VU 181. Complexity Theory Complexity Theory Outline Complexity Theory VU 181.142, SS 2018 6. The Polynomial Hierarchy Reinhard Pichler Institut für Informationssysteme Arbeitsbereich DBAI Technische Universität

More information

PAC Learning. prof. dr Arno Siebes. Algorithmic Data Analysis Group Department of Information and Computing Sciences Universiteit Utrecht
