MACHINE LEARNING. Vapnik-Chervonenkis (VC) Dimension. Alessandro Moschitti


MACHINE LEARNING Vapnik-Chervonenkis (VC) Dimension Alessandro Moschitti Department of Information Engineering and Computer Science University of Trento Email: moschitti@disi.unitn.it

Computational Learning Theory. The approach used for rectangular hypotheses (the medium-built people concept) is just one simple case: no general rule has been derived from it. Is there any way to determine whether a function class is PAC learnable and to derive the right bound? The answer is yes, and it is based on the Vapnik-Chervonenkis dimension (VC-dimension, [Vapnik 95]).

VC-Dimension definition (1). Def. 1 (set shattering): a subset S of instances of a set X is shattered by a collection of functions F if, for every S' ⊆ S, there is a function f ∈ F such that: f(x) = 1 if x ∈ S', and f(x) = 0 if x ∈ S \ S'.

VC-Dimension definition (2). Def. 2: the VC-dimension of a function set F, VCdim(F), is the cardinality of the largest set of instances that can be shattered by F. Observation: the type of functions used to shatter the data determines the VC-dimension.
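A quick illustration of Def. 1 and Def. 2 (my own example, not from the slides): take F to be the closed intervals [a, b] on the real line, with f(x) = 1 iff x ∈ [a, b]. Any 2 distinct points can be shattered (all four labellings are produced by a suitable interval), but no 3 points x1 < x2 < x3 can, since the labelling +, −, + would require an interval containing x1 and x3 but not x2. Hence VCdim(intervals) = 2.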

VC-Dim of linear functions (hyperplanes). In the plane (hyperplane = line): VC(hyperplanes) is at least 3; VC(hyperplanes) < 4, since there is no set of 4 points that can be shattered by a line, so VC(H) = 3. In general, for a k-dimensional space, VC(H) = k + 1. NB: when looking for a set to shatter, it is useless to select a degenerate configuration (e.g., collinear points), since such a set cannot be shattered.
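To see concretely why 3 points in general position can be shattered by a line while a 4-point configuration cannot, here is a minimal sketch (mine, not part of the lecture) that tests linear separability of every labelling with a feasibility linear program. It assumes numpy and scipy are available, and the point configurations are my own choice.

import itertools
import numpy as np
from scipy.optimize import linprog

def linearly_separable(points, labels):
    """Feasibility LP: find (w, b) with y_i * (w . x_i + b) >= 1 for every point."""
    X = np.asarray(points, dtype=float)
    y = np.asarray(labels, dtype=float)
    # Variables are (w1, w2, b); constraints rewritten as -y_i * (w . x_i + b) <= -1.
    A_ub = -y[:, None] * np.hstack([X, np.ones((len(X), 1))])
    b_ub = -np.ones(len(X))
    res = linprog(c=np.zeros(3), A_ub=A_ub, b_ub=b_ub, bounds=[(None, None)] * 3)
    return res.status == 0          # status 0 = a feasible (w, b) was found

def shattered_by_lines(points):
    return all(linearly_separable(points, labels)
               for labels in itertools.product([-1, 1], repeat=len(points)))

three_points = [(0, 0), (1, 0), (0, 1)]           # general position
four_points  = [(0, 0), (1, 0), (0, 1), (1, 1)]   # a square (one specific 4-point set;
                                                  # the slide's claim is that no 4-point set works)
print(shattered_by_lines(three_points))  # True  -> VCdim(lines) >= 3
print(shattered_by_lines(four_points))   # False -> e.g. the XOR labelling is not separable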

Upper Bound on Sample Complexity
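The body of this slide is not included in the transcription. For reference, a standard textbook form of the upper bound (from Blumer, Ehrenfeucht, Haussler and Warmuth, listed in the references below) states that any consistent learner using a hypothesis space H with VCdim(H) = d is PAC provided
m ≥ (1/ε) (4 log2(2/δ) + 8 d log2(13/ε)).
The original slide may state it with different constants.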

Lower Bound on Sample Complexity
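This slide body is also missing from the transcription. A standard statement of the lower bound (Ehrenfeucht, Haussler, Kearns and Valiant, listed in the references below), in the asymptotic form used on the following slides, is
m = Ω((d + ln(1/δ)) / ε),
i.e., any PAC learner for a class of VC-dimension d needs at least on the order of (d + ln(1/δ))/ε examples.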

Bound on the Classification error using VC-dimension
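The formula on this slide is not in the transcription. The commonly cited Vapnik bound of this kind [Vapnik 95] states that, with probability at least 1 − δ over a sample of m examples, every hypothesis h from a class of VC-dimension d satisfies
error(h) ≤ error_emp(h) + sqrt( (d (ln(2m/d) + 1) + ln(4/δ)) / m ),
where error_emp(h) is the empirical (training) error. The slide's exact statement may differ.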

Example: rectangles for learning the medium-built person concept have VC-dim ≥ 4. We must find a 4-point set that can be shattered, i.e., labelled in all possible ways. Given such 4 points, we assign them the {+, −} labels in all possible ways; for each labelling there must exist a rectangle which produces that assignment, i.e., that classification.

Example (cont'd). Our classifier labels points inside the rectangle as positive and points outside as negative. Given 4 points in general position, we have the following label assignments:
a) all points are +: use a rectangle that includes all of them;
b) all points are −: use an empty rectangle;
c) 3 points are − and 1 is +: use a small rectangle centred on the + point;

Example (cont'd).
d) 3 points are + and one is −: we can always find a rectangle which includes the three + points and excludes the − point;
e) 2 points are + and 2 are −: we can define a rectangle which includes the two + points and excludes the two − points.
To show d) and e) we should check all possibilities (see the sketch below).

For example, to prove case e): given 4 points in general position (the accompanying figure is not included in this transcription), one checks that a suitable rectangle exists for every way of choosing the two + points.
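A small sketch (mine, not from the slides) that checks all 16 labellings of a concrete 4-point configuration. It relies on the observation that any axis-aligned rectangle containing all + points also contains their bounding box, so a labelling is realizable iff no − point falls inside the bounding box of the + points; the diamond-shaped configuration is my own choice of 4 points in general position.

import itertools

def rectangle_realizable(points, labels):
    """True iff some axis-aligned rectangle contains exactly the points labelled 1."""
    pos = [p for p, l in zip(points, labels) if l == 1]
    neg = [p for p, l in zip(points, labels) if l == 0]
    if not pos:                       # the all-negative case: use an empty rectangle
        return True
    xmin, xmax = min(x for x, _ in pos), max(x for x, _ in pos)
    ymin, ymax = min(y for _, y in pos), max(y for _, y in pos)
    # The bounding box of the + points is the smallest candidate rectangle.
    return not any(xmin <= x <= xmax and ymin <= y <= ymax for x, y in neg)

diamond = [(0, 1), (0, -1), (1, 0), (-1, 0)]
print(all(rectangle_realizable(diamond, labels)
          for labels in itertools.product([0, 1], repeat=4)))   # True: shattered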

VC-dim cannot be 5. For any 5-point set, consider the axis-aligned rectangle whose sides pass through the outermost points (leftmost, rightmost, topmost, bottommost). If we assign the + label to those extreme points and the − label to the remaining, internal point, no rectangle can reproduce such an assignment: any rectangle containing the extreme points contains their bounding box, and hence also the internal point.
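Using the same rectangle_realizable helper from the sketch above (again my own illustration, not the slide's), one can check a concrete instance of this 5-point argument: add the centre of the diamond and label the four outer points + and the centre −; the centre lies inside the bounding box of the + points, so no rectangle reproduces the labelling.

print(rectangle_realizable([(0, 1), (0, -1), (1, 0), (-1, 0), (0, 0)],
                           [1, 1, 1, 1, 0]))   # False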

Applying the general lower bound to rectangles (VCdim = 4): m = O((4 + ln(1/δ))/ε)

Bound Comparison (lower bound).
m > (4/ε) ln(4/δ) (ad hoc bound)
m = O((1/ε) (ln(1/δ) + 4)) (lower bound based on the VC-dim)
Does the ad hoc bound satisfy the general bound? We need:
(4/ε) ln(4/δ) > (1/ε) (ln(1/δ) + 4)
Dividing both sides by 4/ε:
ln(4/δ) > ln(1/δ)/4 + 1
ln(1/δ) + ln(4) > ln(1/δ)/4 + 1
ln(4) > (1/4 − 1) ln(1/δ) + 1 = −(3/4) ln(1/δ) + 1
Since δ ≤ 1 implies ln(1/δ) ≥ 0, the term −(3/4) ln(1/δ) is non-positive, so it suffices that
ln(4) > 1 = ln(e),
which holds. Hence the ad hoc bound is larger than, and thus consistent with, the general lower bound.
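As a quick numeric sanity check (my own addition, not in the slides), with for example ε = 0.1 and δ = 0.05 the ad hoc bound indeed dominates:

from math import log
eps, delta = 0.1, 0.05
print((4 / eps) * log(4 / delta))         # ad hoc bound:          ~175.3
print((1 / eps) * (log(1 / delta) + 4))   # VC-based lower bound:  ~70.0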

References. VC-dimension:
MY SLIDES: http://disi.unitn.it/moschitti/teaching.html
MY BOOK: Automatic Text Categorization: From Information Retrieval to Support Vector Learning, Roberto Basili and Alessandro Moschitti

References.
A Tutorial on Support Vector Machines for Pattern Recognition (downloadable from the web)
The Vapnik-Chervonenkis Dimension and the Learning Capability of Neural Nets (downloadable from the web)
Computational Learning Theory (Sally A. Goldman, Washington University, St. Louis, Missouri; downloadable from the web)
An Introduction to Support Vector Machines (and Other Kernel-Based Learning Methods), N. Cristianini and J. Shawe-Taylor, Cambridge University Press (also available for purchase online)

Other Web References.
On the Sample Complexity of PAC Learning Half-Spaces Against the Uniform Distribution, Philip M. Long
A General Lower Bound on the Number of Examples Needed for Learning, Andrzej Ehrenfeucht, David Haussler, Michael Kearns and Leslie Valiant
Bounds on the Number of Examples Needed for Learning Functions, Hans Ulrich Simon
Learnability and the Vapnik-Chervonenkis Dimension, Anselm Blumer, Andrzej Ehrenfeucht, David Haussler and Manfred K. Warmuth
A Preliminary PAC Analysis of Theory Revision, Raymond J. Mooney
The Upper Bounds of Sample Complexity, http://mathsci.kaist.ac.kr/~nipl/am621/lecturenotes.html

Proposed Exercises. Try to formulate the medium-built people concept with squares instead of rectangles, and apply the content of the PAC learning lecture to this new class of functions. Could you derive a better ad hoc bound than the one we evaluated in class? (Assume that the concept to learn is a square and not a rectangle.)

Proposed Exercises.
Evaluate the VC-dimension (in the plane) of: squares, circles, equilateral triangles.
Sketch the proof that VC < k, but do not spend too much time formalizing the proof.
Compare the lower bound on the sample complexity for squares (computed via the VC dimension) with your ad hoc bound derived for the medium-built people concept (as we did in class for rectangles).