Computational Learning Theory (COLT)


Goals: a theoretical characterization of
1. the difficulty of machine learning problems: under what conditions is learning possible and impossible?
2. the capabilities of machine learning algorithms: under what conditions is a particular learning algorithm assured of learning successfully?

Some easy-to-compute functions are not learnable. Example from cryptography: an encryption function E_k, where k specifies the key. Even if the values of E_k are known for polynomially many dynamically chosen inputs, it is computationally infeasible to deduce an algorithm for E_k, or even an approximation to it.

What general laws govern machine (and non-machine) learners?
1. Identify classes of learning problems as inherently difficult or easy, independent of the learner.
2. Determine the number of training examples necessary or sufficient to learn; how is it affected if the learner can pose queries?
3. Characterize the number of mistakes a learner will make before learning.
4. Characterize the inherent computational complexity of classes of learning problems.

Goal of COLT: inductively learn a target function, given only training examples of the target function and a space of candidate hypotheses.
- Sample complexity: how many training examples are needed to converge with high probability to a successful hypothesis?
- Computational complexity: how much computational effort is needed for the learner to converge with high probability to a successful hypothesis?
- Mistake bound: how many training examples will the learner misclassify before converging to a successful hypothesis?

Two frameworks for analyzing learning algorithms:
1. The Probably Approximately Correct (PAC) framework: identify classes of hypotheses that can or cannot be learned from a polynomial number of training samples, both for finite hypothesis spaces and for infinite ones; define a natural measure of complexity for hypothesis spaces (the VC dimension) that allows bounding the number of training examples required for inductive learning.
2. The mistake-bound framework: the number of training errors made by a learner before it determines the correct hypothesis.

Probably learning an approximately correct hypothesis:
- the problem setting of Probably Approximately Correct (PAC) learning
- the sample complexity of learning Boolean-valued concepts from noise-free training data

Problem setting of the PAC learning model:
- X is the set of all possible instances over which the target function may be defined, e.g., the set of all people, each person described by a set of attributes such as age (young, old) and height (short, tall).
- A target concept c in C corresponds to some subset of X, i.e., c: X -> {0, 1}, e.g., people who are skiers: c(x) = 1 if x is a positive example, c(x) = 0 if x is a negative example.

Distribution in the PAC learning model:
- Instances are generated at random from X according to some probability distribution D.
- The learner L considers some set H of possible hypotheses when attempting to learn the target concept.
- After observing a sequence of training examples of the target concept c, L must output some hypothesis h from H, which is an estimate of c.

Error of a hypothesis: the true error of a hypothesis h with respect to target concept c and distribution D is the probability that h will misclassify an instance drawn at random according to D:

error_D(h) = Pr_{x ∼ D}[ c(x) ≠ h(x) ]
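As an illustration of this definition, here is a minimal sketch (not from the slides) that estimates error_D(h) by Monte Carlo sampling. The attributes, the distribution D, the target concept c ("skier"), and the hypothesis h below are all hypothetical choices made only for illustration.

```python
# A minimal sketch: estimating error_D(h) = Pr_{x ~ D}[c(x) != h(x)]
# by drawing samples from an assumed distribution D.
import random

def draw_instance():
    """Draw one instance x from a hypothetical distribution D over X."""
    return {"age": random.choice(["young", "old"]),
            "height": random.choice(["short", "tall"])}

def c(x):
    """Hypothetical target concept: 'is a skier'."""
    return 1 if x["age"] == "young" else 0

def h(x):
    """Hypothetical learned hypothesis (deliberately imperfect)."""
    return 1 if x["age"] == "young" and x["height"] == "tall" else 0

def estimate_true_error(num_samples=100_000):
    """Monte Carlo estimate of the true error of h with respect to c and D."""
    disagreements = sum(c(x) != h(x)
                        for x in (draw_instance() for _ in range(num_samples)))
    return disagreements / num_samples

if __name__ == "__main__":
    print("estimated error_D(h):", estimate_true_error())  # about 0.25 here
```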

Error of hypothesis h. [Figure: instance space X with positive and negative instances, showing the regions where c and h disagree.] h has nonzero true error with respect to c even though the two agree on all the training samples.

Probably Approximately Correct (PAC) learnability: characterize the concepts learnable from a reasonable number of randomly drawn training examples and a reasonable amount of computation. A stronger characterization is futile: the number of training examples needed to learn a hypothesis h for which error_D(h) = 0 cannot be determined, because
- unless there are training examples for every instance in X, there may be multiple hypotheses consistent with the training examples, and
- the training examples are picked at random and may therefore be misleading.

PAC-learnable: consider a concept class C defined over a set of instances X of length n, and a learner L using hypothesis space H. C is PAC-learnable by L using H if, for all c ∈ C, all distributions D over X, all ε such that 0 < ε < 1/2, and all δ such that 0 < δ < 1/2, learner L will with probability at least (1 − δ) output a hypothesis h ∈ H such that error_D(h) < ε, in time that is polynomial in 1/ε, 1/δ, n, and size(c).
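Stated compactly, the definition above can be written as follows (a restatement, not additional material; the probability is over the random draw of the training examples):

```latex
\[
\forall c \in C,\ \forall D \text{ over } X,\
\forall \varepsilon, \delta \in (0, \tfrac{1}{2}):\qquad
\Pr\bigl[\, \mathrm{error}_D(h_L) < \varepsilon \,\bigr] \;\ge\; 1 - \delta,
\]
where $h_L \in H$ is the hypothesis output by $L$, and $L$ runs in time
polynomial in $1/\varepsilon$, $1/\delta$, $n$, and $\mathrm{size}(c)$.
```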

PAC learnability requirements on the learner L:
- With arbitrarily high probability (1 − δ), output a hypothesis having arbitrarily low error ε.
- Do so efficiently: in time that grows at most polynomially with 1/ε and 1/δ, which define the strength of our demands on the output hypothesis, and with n and size(c), which define the inherent complexity of the underlying instance space X and concept class C.
- n is the size of instances in X; if instances are conjunctions of k Boolean variables, n = k.
- size(c) is the encoding length of c in C, e.g., the number of Boolean features actually used to describe c.

Computational resources vs. number of training samples required: the two are closely related. If L requires some minimum processing time per example, then for C to be PAC-learnable by L, L must learn from a polynomial number of training examples. A typical way to show that some class C of target concepts is PAC-learnable is therefore first to show that each target concept in C can be learned from a polynomial number of training examples, and then to show that the processing time per example is also polynomially bounded.

Sample complexity for finite hypothesis spaces. Sample complexity: the number of training examples required, and how it grows with problem size. We derive a bound on the number of training samples needed by consistent learners, i.e., learners that perfectly fit the training data.

Version space: contains all plausible versions of the target concept. [Figure: hypotheses h inside the hypothesis space H.]

Version space: a hypothesis h is consistent with training examples D iff h(x) = c(x) for each example ⟨x, c(x)⟩ in D. The version space with respect to hypothesis space H and training examples D is the subset of hypotheses from H consistent with all the training examples in D.

[Figure: the version space VS_{H,D} shown as a region inside the hypothesis space H.]

Formally,

VS_{H,D} = { h ∈ H | (∀ ⟨x, c(x)⟩ ∈ D) h(x) = c(x) }
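A minimal sketch of this definition in code, over a toy hypothesis space of conjunctions on two Boolean attributes; the attribute encoding and training data are hypothetical.

```python
# A minimal sketch: computing VS_{H,D} = { h in H | h(x) = c(x) for all <x, c(x)> in D }
# for a toy hypothesis space of conjunctions over two Boolean attributes.
from itertools import product

# Each hypothesis is a tuple of per-attribute constraints:
#   1 -> attribute must be true, 0 -> must be false, None -> "don't care".
def h_predicts(hyp, x):
    return int(all(constraint is None or constraint == xi
                   for constraint, xi in zip(hyp, x)))

# Hypothesis space H: all conjunctive constraints over 2 Boolean attributes (3^2 = 9).
H = list(product([0, 1, None], repeat=2))

# Hypothetical training data D: pairs <x, c(x)>.
D = [((1, 1), 1),
     ((1, 0), 1),
     ((0, 1), 0)]

# Version space: hypotheses consistent with every training example.
VS = [h for h in H if all(h_predicts(h, x) == label for x, label in D)]
print(VS)  # [(1, None)], i.e. "first attribute must be true"
```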

Version space with associated errors: error denotes the true error, r the training error. [Figure: hypotheses in H labeled with their true error and training error, e.g., error = .3, r = .1; the version space VS_{H,D} contains the hypotheses with r = 0, here with true errors .2 and .1.]

Exhausting the version space: true error less than ε. The version space is ε-exhausted with respect to c and D if every hypothesis in it has true error less than ε:

(∀ h ∈ VS_{H,D}) error_D(h) < ε

[Figure: the same version space as above; for ε = .21, the hypotheses in VS_{H,D} have true errors .2 and .1, so the version space is ε-exhausted.]

Upper bound on the probability of not being ε-exhausted. Theorem: if the hypothesis space H is finite and D is a sequence of m ≥ 1 independent randomly drawn examples of some target concept c, then for any 0 ≤ ε ≤ 1, the probability that the version space VS_{H,D} is not ε-exhausted (with respect to c) is less than or equal to

|H| e^{−εm}

This bounds the probability that m training samples will fail to eliminate all "bad" hypotheses.
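A sketch of the standard argument behind this theorem: bound the probability that any single hypothesis with true error at least ε survives all m examples, then apply the union bound over H.

```latex
% Sketch of the standard argument behind the |H| e^{-\varepsilon m} bound.
% Call h "bad" if error_D(h) \ge \varepsilon.
\begin{align*}
\Pr[\text{a fixed bad } h \text{ is consistent with one random example}] &\le 1 - \varepsilon \\
\Pr[\text{it is consistent with all } m \text{ examples}] &\le (1 - \varepsilon)^m \\
\Pr[\text{some bad } h \in H \text{ survives all } m \text{ examples}]
  &\le |H| (1 - \varepsilon)^m \;\le\; |H| e^{-\varepsilon m},
\end{align*}
% using the union bound and the inequality (1 - \varepsilon) \le e^{-\varepsilon}.
```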

Number of training samples required: we want the probability of failure to be below some desired level δ,

|H| e^{−εm} ≤ δ

Rearranging,

m ≥ (1/ε) (ln|H| + ln(1/δ))

This provides a general bound on the number of training samples sufficient for any consistent learner to learn any target concept in H, for any desired values of δ and ε.
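A minimal numerical sketch of evaluating this bound; the hypothesis space, ε, and δ below are hypothetical choices.

```python
# A minimal sketch: evaluating the sample-complexity bound
#   m >= (1/epsilon) * (ln|H| + ln(1/delta))
# for a consistent learner over a finite hypothesis space.
import math

def sample_bound(ln_H, epsilon, delta):
    """Smallest integer m satisfying m >= (1/epsilon)(ln|H| + ln(1/delta))."""
    return math.ceil((ln_H + math.log(1.0 / delta)) / epsilon)

# Hypothetical example: conjunctions over n = 10 Boolean variables, so |H| = 3^10,
# with epsilon = 0.1 and delta = 0.05.
n = 10
m = sample_bound(ln_H=n * math.log(3), epsilon=0.1, delta=0.05)
print(m)  # 140, since (10*ln 3 + ln 20) / 0.1 is about 139.8
```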

Generalization to non-zero training error (agnostic learning):

m ≥ (1/(2ε²)) (ln|H| + ln(1/δ))

Here m grows as the square of 1/ε, rather than linearly.
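For reference, a sketch of the standard route to this bound, using the Hoeffding bound for a single hypothesis and a union bound over H:

```latex
% Sketch: where the agnostic bound comes from (Hoeffding + union bound).
% error_train(h) denotes h's observed error on the m training examples.
\begin{align*}
\Pr\bigl[\, \mathrm{error}_D(h) > \mathrm{error}_{train}(h) + \varepsilon \,\bigr]
  &\le e^{-2 m \varepsilon^2}
  && \text{(Hoeffding bound, one fixed } h\text{)} \\
\Pr\bigl[\, \exists h \in H:\ \mathrm{error}_D(h) > \mathrm{error}_{train}(h) + \varepsilon \,\bigr]
  &\le |H|\, e^{-2 m \varepsilon^2}
  && \text{(union bound)}
\end{align*}
% Setting |H| e^{-2 m \varepsilon^2} \le \delta and solving for m gives
% m \ge (1 / (2 \varepsilon^2)) (\ln|H| + \ln(1/\delta)).
```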

Conjunctions of Boolean literals are PAC-learnable. Sample complexity:

m ≥ (1/ε) (n ln 3 + ln(1/δ))
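One consistent learner for this class is the standard Find-S-style procedure: start with the conjunction of all 2n literals and drop every literal contradicted by a positive example. A minimal sketch, with hypothetical training data:

```python
# A minimal sketch: learning a conjunction of Boolean literals.
# Start with the conjunction of all 2n literals (x_i and not-x_i for every i)
# and delete every literal that a positive example contradicts.
# Negative examples are ignored by this particular learner.
def learn_conjunction(examples, n):
    """examples: list of (x, label) with x a tuple of n Boolean values (0/1)."""
    # Literal (i, v) means "x_i must equal v"; start with both polarities for each i.
    literals = {(i, v) for i in range(n) for v in (0, 1)}
    for x, label in examples:
        if label == 1:
            literals -= {(i, 1 - x[i]) for i in range(n)}  # drop contradicted literals
    return literals

def predict(literals, x):
    return int(all(x[i] == v for i, v in literals))

# Hypothetical target: x_0 AND (NOT x_2), over n = 3 variables.
D = [((1, 0, 0), 1), ((1, 1, 0), 1), ((0, 1, 0), 0), ((1, 1, 1), 0)]
h = learn_conjunction(D, n=3)
print(sorted(h))                      # [(0, 1), (2, 0)], i.e. x_0 AND NOT x_2
print([predict(h, x) for x, _ in D])  # [1, 1, 0, 0], consistent with D
```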

k-term DNF and CNF concepts:

m ≥ (1/ε) (nk ln 3 + ln(1/δ))
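This bound follows by plugging the size of the hypothesis space into the general bound for consistent learners above: each of the k terms is a conjunction over n Boolean variables, so |H| ≤ (3^n)^k = 3^{nk}.

```latex
\[
m \;\ge\; \frac{1}{\varepsilon}\Bigl(\ln 3^{nk} + \ln\tfrac{1}{\delta}\Bigr)
      \;=\; \frac{1}{\varepsilon}\Bigl(nk\,\ln 3 + \ln\tfrac{1}{\delta}\Bigr)
\]
```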