Narrowing confidence interval width of PAC learning risk function by algorithmic inference


Bruno Apolloni, Dario Malchiodi
Dip. di Scienze dell'Informazione, Università degli Studi di Milano
Via Comelico 39/41, 20135 Milano, Italy

Abstract

We narrow the width of the confidence interval introduced by Vapnik and Chervonenkis for the risk function in PAC learning boolean functions through non-consistent hypotheses. To obtain this improvement for a large class of learning algorithms we introduce both a theoretical framework for statistical inference of functions and a concept class complexity index, the detail, that is dual to the Vapnik-Chervonenkis dimension. The detail of a class and the maximum number of mislabelled points add up linearly to constitute the learning problem complexity. The dependency of the sample complexity on this index is very similar to its dependency on the VC dimension. We formally prove that the former leads to confidence intervals for the risk function that are definitely narrower than those obtained from the latter.

1 Introduction

A suitable way of revisiting PAC learning is to assume that probabilities are random variables per se. Right from the start, the object of our inference is a string of data (possibly of infinite length) that we partition into a prefix we assume to be known at present (and therefore call a sample) and a suffix of unknown future data we call a population (see Figure 1).

All these data share the feature of being independent observations of the same phenomenon. Therefore, without loss of generality, we assume these data to be the output of some function $g$ having as input a set of independent random variables $U$ uniformly distributed in the unit interval, effectively the most essential source of randomness (such a $g$ always exists by the probability integral transformation theorem [5]). By default, capital letters (such as $U$, $X$) will denote random variables and small letters ($u$, $x$) their corresponding realizations; the sets the realizations belong to will be denoted by capital gothic letters ($\mathfrak{U}$, $\mathfrak{X}$). We will refer to $M = (U, g)$ as a sampling mechanism and to $g$ as an explaining function. This function is precisely the object of our inference.

[Figure 1: Sample and population of random bits.]

Let us consider, for instance, the sampling mechanism $M = (U, g_p)$, where $U$ is the above uniform random variable and $g_p(u) = 1$ if $u \leq p$ and $0$ otherwise; it describes the sample and population of a Bernoulli random variable of mean $p$ as in Figure 1. As can be seen from Figure 2, for a given sequence of $U$'s we obtain different binary strings depending on the height $p$ of the threshold line. Thus it is easy to derive the following implication chain

  $(K_{\tilde p} \geq k) \Leftarrow (P \leq \tilde p) \Leftarrow (K_{\tilde p} \geq k + 1)$   (1)

(where $k$ is the number of 1s observed in the sample, and $K_{\tilde p}$ denotes the random variable counting the number of 1s in the sample if the threshold in the explaining function switches to $\tilde p$ for the same realizations of $U$) and the consequent bound on the probability

  $\mathrm{P}(K_{\tilde p} \geq k) \geq \mathrm{P}(P \leq \tilde p) \geq \mathrm{P}(K_{\tilde p} \geq k + 1)$   (2)

which characterizes the cumulative distribution function (c.d.f.) $F_P$ of the parameter $P$.
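As a concrete illustration of the bound (2), the following short script (a sketch of ours, not part of the original paper; the sample size, observed count and candidate values of $\tilde p$ are made up for the example) evaluates the two binomial tails that sandwich $F_P$.

```python
# Sketch: numeric evaluation of the bounds (2) on the c.d.f. F_P of the
# Bernoulli parameter P, given k ones observed in a sample of size m.
# The figures used here (m, k, candidate p values) are illustrative only.
from scipy.stats import binom

def fp_bounds(k, m, p_tilde):
    """Return (lower, upper) bounds on F_P(p_tilde) from equation (2):
    P(K >= k+1) <= F_P(p_tilde) <= P(K >= k), with K ~ Binomial(m, p_tilde)."""
    upper = binom.sf(k - 1, m, p_tilde)   # P(K >= k)
    lower = binom.sf(k, m, p_tilde)       # P(K >= k+1)
    return lower, upper

m, k = 20, 7                      # hypothetical sample size and observed number of ones
for p in (0.2, 0.35, 0.5, 0.65):  # candidate values of the parameter
    lo, up = fp_bounds(k, m, p)
    print(f"p~={p:.2f}  {lo:.4f} <= F_P <= {up:.4f}")
```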

In our statistical framework, indeed, the above height $P$ is a random variable in $[0, 1]$ representing the asymptotic ($M \to \infty$ in Figure 2) frequency of 1s in the populations that are compatible, as a function of the $U$ suffix of the sample (of size $m$ in the figure), with the number $k$ of actually observed 1s. Equation (2) comes straight from marginalizing the joint distribution of the $U$'s with respect to the population when we deal with the sample statistic $K_{\tilde p}$, and vice versa when we deal with the population parameter $P$. Note the asymmetry in the implications. It derives from the fact that: (i) raising the threshold parameter in $g_p$ cannot decrease the number of 1s in the observed sample, but (ii) we can recognize that such a raising occurred only if we really see a number of ones in the sample greater than $k$.

[Figure 2: Generating a Bernoullian sample. Horizontal axis: index of the $U$ realizations; vertical axis: both $U$ (lines) and $X$ (bullets) values. The threshold line at height $p$ realizes a mapping from $U$ to $X$ through $g_p$.]

We will refer to every expression similar to (1) as a twisting argument, since it allows us to exchange events on parameters with events on statistics. Twisting sample properties with population properties is our approach to statistical inference, which we call algorithmic inference. The peculiarity of PAC learning boolean functions is that we need more than one sampled point to recognize that the probability measure of the error domain is less than a given $\varepsilon$ (the right-hand implication in (1)). In Section 2 we will denote these points as sentry points of a concept and will derive bounds on the distribution of the above measure. These bounds are based on the supremum of the cardinality of the sentry points over the set of symmetric differences between concepts and candidate hypotheses, which we call detail, and on an analogous limit on the number of points misclassified by the hypotheses. In Section 3 we draw the consequent confidence intervals for the error probability and compare them with the analogous commonly used intervals proposed by Vapnik and Chervonenkis [6]. A set of graphs shows the great gain in interval width achieved by our approach, especially when the sample size is modest.

2 Algorithmic inference of Boolean functions

In the typical framework of PAC learning theory the parameter to be investigated is the probability that the inferred function will compute erroneously on the next inputs (i.e., will not explain new sampled points). In greater detail, the general form of the sample is

  $z_m = \{(x_i, b_i),\ i = 1, \ldots, m\}$

where the $b_i$ are boolean variables. If we assume that for every $M$ and every $(x_1, \ldots, x_M)$ a concept $c$ exists in a Boolean class, call it $\mathsf{C}$, such that $b_i = c(x_i)$ for $i = 1, \ldots, M$, then we are interested in the measure of the symmetric difference $c \,\triangle\, h$ between a hypothesis $h \in \mathsf{C}$ computed from $z_m$ by a function $\mathcal{A}$ (that we call a learning algorithm) and any such $c$ (note that in our approach, for a given sample $z_m$, we consider the possible suffixes covered by $c$); see Figure 3.

[Figure 3: Circle describing the sample and possible circles describing the population. Small rhombi and circles: sampled points; line-filled region: symmetric difference.]

The peculiarity of this inference problem is that some degrees of freedom of our sample are burned by the property whose confidence interval we are looking for. Namely, if we denote by the random variable $U_{c \triangle h}$ the measure of the above symmetric difference, the twisting argument reads (with a caveat on its left part):

  $(T_\varepsilon \geq t_{c \triangle h} + 1) \Leftarrow (U_{c \triangle h} \leq \varepsilon) \Leftarrow (T_\varepsilon \geq t_{c \triangle h} + D_{c \triangle h})$   (3)

where $t_{c \triangle h}$ is the number of actual sample points falling in $c \,\triangle\, h$ (the empirical risk in Vapnik's notation [6]), $T_\varepsilon$ is the analogous statistic for an enlargement of $c \,\triangle\, h$ of measure $\varepsilon$, and $D_{c \triangle h}$ is a new complexity measure directly referred to $c \,\triangle\, h$. The threshold $t_{c \triangle h} + 1$ in the left inequality is due to the fact that $h$ is in its turn a function of a sample specification $z_m$, so that, if $\mathcal{A}$ is such that the symmetric difference grows with the set of included sample points and vice versa (the mentioned caveat), $U_{c \triangle h} \leq \varepsilon$ implies that any enlargement region containing $c \,\triangle\, h$ must violate consistency on at least one more of the sampled points at the basis of $h$'s computation. The quantity $D_{c \triangle h}$ is an upper bound on the number of sample points sufficient to witness that, according to a new hypothesis, a region containing $c \,\triangle\, h$ has been generated after an increase of $U_{c \triangle h}$. These points, which we figure as a concept's sentinels, are formally described as follows.

Definition 2.1. For a concept class $\mathsf{C}$ on a space $\mathfrak{X}$ (the set of specifications of $X$), a sentry function is a total function $S: \mathsf{C} \cup \{\emptyset\} \to 2^{\mathfrak{X}}$ satisfying the following conditions:

(i) $S(c) \cap c = \emptyset$ for all $c \in \mathsf{C}$ (sentinels are outside the sentinelled concept).

(ii) Having introduced the sets $c^+ = c \cup S(c)$ and $\mathrm{up}(c) = \{c' \in \mathsf{C} : c' \supset c \text{ and } c' \not\supseteq c^+\}$, if $c_2 \in \mathrm{up}(c_1)$ then $c_2 \cap S(c_1) \neq \emptyset$ (sentinels are inside the invading concept).

(iii) No $S' \neq S$ exists satisfying (i) and (ii) and having the property that $S'(c) \subseteq S(c)$ for every $c \in \mathsf{C}$ (we look for a minimal set of sentinels).

(iv) Whenever $c_1$ and $c_2$ are such that $c_1 \subseteq c_2 \cup S(c_2)$ and $c_2 \cap S(c_1) \neq \emptyset$, the restriction of $S$ to $\{c_1\} \cup \mathrm{up}(c_1) \setminus \{c_2\}$ is a sentry function on this set (sentinels are honest watchers).

$S(c)$ is the frontier of $c$ upon $S$, and its elements are called sentry points. The quantity $D_c = \sup_S \#S(c)$, where the supremum runs over the sentry functions of the class, is called the detail of $c$. For $c, h \in \mathsf{C}$, denoting by $c \,\triangle\, h$ the set $(c \cup h) \setminus (c \cap h)$, the detail $D_{\mathsf{C}}$ of the class $\mathsf{C}$ is the quantity $\sup_{c, h \in \mathsf{C}} D_{c \triangle h}$. A given concept class might admit more than one sentry function.

Condition (iv) prevents us from building sentry functions which are unnatural, where some frontier points of a concept $c_1$ have the sole role of artificially increasing the elements of $c_1 \cup S(c_1)$ in order to prevent it from being included in another $c_2$. The mentioned condition states that this role can be considered only a side effect of points which are primarily involved in sentinelling some formula. Extension of the domain of $S$ in order to include $\emptyset$ is necessary to state some key properties, such as Fact 2.1 below.

Example 2.1. Let us consider a class $\mathsf{C}$ of boolean formulas on $\{0, 1\}^2$ in terms of the propositional variables $x_1$ and $x_2$. After giving label $+$ to the points inside concepts and $-$ to those outside them, we represent the related supports on the four points 00, 01, 10, 11. By inspection of points (i)-(iv) of Definition 2.1, a possible outer sentry function for $\mathsf{C}$ is: $S(c_1) = \{11\}$, $S(c_2) = \{01, 10\}$, $S(c_3) = \{01\}$, $S(c_4) = \{10\}$, $S(c_5) = \emptyset$. The reader can realize that $S(c_1) = \{00, 01\}$ is unfeasible according to condition (iv). Indeed the point 00 has the sole task of removing $c_2$ and $c_3$ from $\mathrm{up}(c_1)$, but it is useless for sentinelling, in conjunction with 01, the concepts belonging to $\mathrm{up}(c_1) \setminus \{c_2, c_3\}$.

The detail is a parameter difficult to compute, except in the case of some Boolean classes usually referred to in the literature: for instance, it is 1 if the elements of $\mathsf{C}$ are oriented half-lines on $\mathbb{R}$, 2 if they are segments or circles on $\mathbb{R}^2$, and $k$ if they are convex polygons on $\mathbb{R}^2$ having exactly $k$ edges. The interested reader can find detailed examples in [1] and [2].
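For small finite classes the sentry machinery can be explored by brute force. The sketch below is our own illustration, not taken from the paper: the encoding of concepts as subsets of $\{0,1\}^2$ is hypothetical, and it adopts the simplified reading of conditions (i)-(ii) that every concept strictly containing $c$ must capture at least one sentry point of $c$, ignoring the minimality and honesty conditions (iii)-(iv).

```python
# Sketch (ours, not the paper's algorithm): brute-force search for minimal
# sentry sets in a small finite concept class, under the simplified reading
# that every concept strictly containing c must contain a sentry point of c.
from itertools import combinations

X = ['00', '01', '10', '11']                       # the space {0,1}^2
# A hypothetical concept class, each concept given as a set of points of X.
C = [set(), {'00'}, {'00', '01'}, {'00', '10'}, {'00', '01', '10', '11'}]

def minimal_sentry_set(c, concepts, space):
    """Smallest set of points outside c meeting every concept that strictly contains c."""
    invaders = [c2 for c2 in concepts if c < c2]   # proper supersets of c
    outside = [x for x in space if x not in c]
    for size in range(len(outside) + 1):
        for cand in combinations(outside, size):
            if all(set(cand) & c2 for c2 in invaders):
                return set(cand)
    return set(outside)                            # fallback, not reached for this class

sizes = [len(minimal_sentry_set(c, C, X)) for c in C]
print("per-concept sentry set sizes:", sizes)
print("estimated detail of the class:", max(sizes))
```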

However, although semantically different from the Vapnik-Chervonenkis dimension, this complexity index is related to the latter by the following:

Fact 2.1. [1] Let us denote by $\mathrm{D_{VC}}(\mathsf{C})$ the Vapnik-Chervonenkis dimension [4] of a concept class $\mathsf{C}$; then $D_{\mathsf{C}}$ and $\mathrm{D_{VC}}(\mathsf{C})$ bound each other through linear functions.

Substituting the class detail $D_{\mathsf{C}}$ for the new complexity index in (3), we find bounds on the sample complexity for a wide class of learning algorithms according to the following theorem.

Theorem 2.1. [3] For a given probability space $(\mathfrak{X}, \mathcal{F}, \mathrm{P})$, where $\mathcal{F}$ is a $\sigma$-algebra on $\mathfrak{X}$ and $\mathrm{P}$ is a possibly unknown probability measure defined over $\mathcal{F}$, assume we are given:

1. a concept class $\mathsf{C}$ on $\mathfrak{X}$ with detail $D_{\mathsf{C}}$;
2. a sample $z_m$ drawn from the fixed space and labelled according to a $c \in \mathsf{C}$ labelling an infinite suffix of it;
3. a function $\mathcal{A}(z_m)$ misclassifying at least $t_0$ and at most $t_1 \in \mathbb{N}$ points, of total probability not greater than $\eta \in [0, 1)$.

Let us denote $h = \mathcal{A}(z_m)$ and by $U_{c \triangle h}$ the random variable representing the probability measure of $c \,\triangle\, h$ for any $c \in \mathsf{C}$ labelling $z_m$ as in 2. Then for each $\varepsilon \in (\eta, 1)$

  $I_\varepsilon(1 + t_0, m - t_0) \;\geq\; \mathrm{P}(U_{c \triangle h} \leq \varepsilon) \;\geq\; I_\varepsilon(D_{\mathsf{C}} + t_1, m - (D_{\mathsf{C}} + t_1) + 1)$   (4)

where $I_x(a, b) = \int_0^x u^{a-1}(1-u)^{b-1}\,\mathrm{d}u \big/ \int_0^1 u^{a-1}(1-u)^{b-1}\,\mathrm{d}u$ is the incomplete Beta function.
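The two bounds in (4) are tails of Beta distributions, so they can be evaluated directly with any statistics library. The sketch below is ours; the values of $m$, $t_0$, $t_1$ and of the detail $D$ are illustrative, not taken from the paper.

```python
# Sketch: numeric evaluation of the bounds (4) on the c.d.f. of U_{c(triangle)h}.
# m, t0, t1 and the detail D below are illustrative values only.
from scipy.stats import beta

def risk_cdf_bounds(eps, m, t0, t1, D):
    """Bracket on P(U <= eps) from (4):
    I_eps(D+t1, m-(D+t1)+1) <= P(U <= eps) <= I_eps(1+t0, m-t0)."""
    upper = beta.cdf(eps, 1 + t0, m - t0)              # I_eps(1+t0, m-t0)
    lower = beta.cdf(eps, D + t1, m - (D + t1) + 1)    # I_eps(D+t1, m-(D+t1)+1)
    return lower, upper

m, t0, t1, D = 100, 3, 5, 4
for eps in (0.05, 0.10, 0.20):
    lo, up = risk_cdf_bounds(eps, m, t0, t1, D)
    print(f"eps={eps:.2f}: {lo:.4f} <= P(U <= eps) <= {up:.4f}")
```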

3 Confidence intervals for the learning error

Inequalities (4) fix a lower and an upper bound for the cumulative distribution function of $U_{c \triangle h}$, allowing us to state the following result.

Theorem 3.1. For a probability space $(\mathfrak{X}, \mathcal{F}, \mathrm{P})$, a concept class $\mathsf{C}$, a learning algorithm $\mathcal{A}$ and parameters $m$, $\eta$, $t_0$ and $t_1$ as in Theorem 2.1, and for any $\delta \in (0, 1)$, the event

  $l \leq U_{c \triangle h} \leq \bar l$   (5)

has probability greater than $1 - \delta$, where $l$ is the $\delta/2$ quantile of the Beta distribution with parameters $1 + t_0$ and $m - t_0$, and $\bar l$ is the analogous $1 - \delta/2$ quantile for parameters $D_{\mathsf{C}} + t_1$ and $m - (D_{\mathsf{C}} + t_1) + 1$.

Proof. Starting from (4) and taking $I_{\bar l}(D_{\mathsf{C}} + t_1, m - (D_{\mathsf{C}} + t_1) + 1) - I_{l}(t_0 + 1, m - t_0)$ as a lower bound to $F_{U_{c \triangle h}}(\bar l) - F_{U_{c \triangle h}}(l) \leq \mathrm{P}(l \leq U_{c \triangle h} \leq \bar l)$, we can obtain the interval $(l, \bar l)$ by dividing the probability measure outside it into two equal parts, that is, by solving the equation system

  $I_{\bar l}(D_{\mathsf{C}} + t_1, m - (D_{\mathsf{C}} + t_1) + 1) = 1 - \delta/2, \quad I_{l}(t_0 + 1, m - t_0) = \delta/2$   (6)

with respect to $l$ and $\bar l$. Hence the claim follows.

In true statistical inference notation, the statement $\mathrm{P}(l \leq U_{c \triangle h} \leq \bar l) \geq 1 - \delta$ means that $l$ and $\bar l$ are the extremes of a confidence interval at level $\delta$ for the learning error $U_{c \triangle h}$. When $m$ grows, since the numerical solutions of (6) become difficult to handle, the Binomial distribution underlying them can be approximated with a Gaussian law, following the De Moivre-Laplace theorem [7]. The commonly used confidence interval for the same probability comes from the following theorem.

Theorem 3.2. [6] Let $\mathsf{C} = \{c_\alpha,\ \alpha \in \Lambda\}$ be a Boolean concept class of bounded Vapnik-Chervonenkis dimension $\mathrm{D_{VC}}$, and let $\nu(\alpha)$ be the frequency of errors computed from the sample for a concept $c_\alpha \in \mathsf{C}$. Then, for $m \geq \mathrm{D_{VC}}$ and simultaneously for all the concepts in $\mathsf{C}$, the event

  $\nu(\alpha) - 2\sqrt{\frac{\mathrm{D_{VC}}\left(\log\frac{2m}{\mathrm{D_{VC}}} + 1\right) - \log\frac{\delta}{4}}{m}} \;\leq\; R(\alpha) \;\leq\; \nu(\alpha) + 2\sqrt{\frac{\mathrm{D_{VC}}\left(\log\frac{2m}{\mathrm{D_{VC}}} + 1\right) - \log\frac{\delta}{4}}{m}}$   (7)

has probability $1 - \delta$, where $R(\alpha)$ denotes the actual risk of $c_\alpha$.
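To make the comparison concrete, the sketch below (our own; the values of $m$, $t$, $D_{\mathsf{C}}$, $\mathrm{D_{VC}}$ and $\delta$ are illustrative) computes the interval of Theorem 3.1 by solving (6) with Beta quantiles and the Vapnik-Chervonenkis interval as reconstructed in (7) above, clipping the latter to $[0, 1]$.

```python
# Sketch: the two (1 - delta) confidence intervals compared in Section 3.
# All numeric inputs are illustrative; formula (7) follows the reconstruction above.
from math import log, sqrt
from scipy.stats import beta

def algorithmic_inference_interval(m, t0, t1, D, delta):
    """Solve system (6): l and l_bar are Beta quantiles (Theorem 3.1)."""
    l = beta.ppf(delta / 2, 1 + t0, m - t0)
    l_bar = beta.ppf(1 - delta / 2, D + t1, m - (D + t1) + 1)
    return l, l_bar

def vapnik_interval(m, t, d_vc, delta):
    """Interval (7) around the empirical frequency nu = t/m, clipped to [0, 1]."""
    nu = t / m
    eps = 2 * sqrt((d_vc * (log(2 * m / d_vc) + 1) - log(delta / 4)) / m)
    return max(0.0, nu - eps), min(1.0, nu + eps)

m, t, D, d_vc, delta = 1000, 20, 4, 4, 0.1   # t0 = t1 = t and D_VC = D_C, as in the figures
print("algorithmic inference interval:", algorithmic_inference_interval(m, t, t, D, delta))
print("Vapnik-Chervonenkis interval  :", vapnik_interval(m, t, d_vc, delta))
```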

Note that Theorem 2.1 can be issued for the number of mislabelled points and the frontier cardinality of the single learning task (essentially maintaining the same proof), as Theorem 3.1 does for the former quantity alone. For simplicity's sake, in the following numerical example we refer to upper bounds (or values constant with concepts and hypotheses) on both complexity indices and on the empirical error. Figure 4 compares our confidence intervals with the ones obtained by computing formula (7) for a set of values of the number of mislabelled points and of the complexity indices. Following the previous remark, we compute the former quantity in the case of Vapnik's formula as $\nu m$, where $\nu$ is the empirical risk (here constant with $\alpha$). In the same spirit we assume $\mathrm{D_{VC}}(\mathsf{C}) = D_{\mathsf{C}}$ and $t_0 = t_1 = t$ (with $t$ treated as a continuous variable). For samples of 100, 1000 and 1000000 elements respectively, the three graphs show the limits of the 0.9 confidence intervals drawn using both Vapnik's (external surfaces) and our (internal surfaces) bounds.

[Figure 4: Comparison between bilateral 0.9 confidence intervals for the actual risk. x-axis: number of misclassified points; y-axis: Vapnik-Chervonenkis dimension and class detail; z-axis: confidence interval limits. Light surfaces: Vapnik-Chervonenkis confidence intervals; dark surfaces: our confidence intervals. (a) Sample size m = 100, (b) sample size m = 1000, (c) sample size m = 1000000.]

Moreover, to appreciate the differences even better, in Figure 5 we draw a section at concept complexity 4 as a function of the number of misclassified points. We used dark gray lines for plotting the bounds from (6) and light gray lines for those from their Gaussian approximation. Note that these different bounds are distinguishable only in the first figure.

[Figure 5: Same comparison as in Figure 4 for class complexity 4, with panels (a)-(c) as in Figure 4. x-axis: number of misclassified points; y-axis: confidence interval limits. Black lines: Vapnik-Chervonenkis confidence intervals; dark gray lines: our confidence intervals; light gray lines: our confidence intervals obtained using the Gaussian approximation.]

The figures show that: (i) our confidence intervals are always more accurate than Vapnik's; this benefit accounts for a narrowing of one order of magnitude at the smallest sample size, while it tends to disappear when the sample size increases; and (ii) our confidence intervals are consistent, that is, they are always contained in $[0, 1]$.
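The Gaussian approximation used for the light gray curves can be realized, for instance, by replacing each Beta tail in (6) with the De Moivre-Laplace normal approximation of the underlying Binomial and solving for the interval ends numerically. The sketch below is our own rendering of this idea, not the paper's code, and the numeric inputs are illustrative.

```python
# Sketch (ours): Gaussian (De Moivre-Laplace) approximation of system (6).
# Each condition I_p(a, b) = q is read as P(Bin(m, p) >= a) = q; the Binomial
# is replaced by a Normal with matching mean and variance; brentq finds p.
from math import sqrt
from scipy.stats import norm
from scipy.optimize import brentq

def gaussian_quantile(target, threshold, m):
    """Solve P(N(m*p, m*p*(1-p)) >= threshold) = target for p in (0, 1)."""
    def f(p):
        mu, sd = m * p, sqrt(m * p * (1 - p))
        return norm.sf(threshold, mu, sd) - target
    return brentq(f, 1e-9, 1 - 1e-9)

def approx_interval(m, t0, t1, D, delta):
    l = gaussian_quantile(delta / 2, t0 + 1, m)          # approximates I_l(t0+1, m-t0) = delta/2
    l_bar = gaussian_quantile(1 - delta / 2, D + t1, m)  # approximates I_lbar(D+t1, ...) = 1-delta/2
    return l, l_bar

print(approx_interval(m=1000, t0=20, t1=20, D=4, delta=0.1))
```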

References

[1] Apolloni, B., and Chiaravalli, S. PAC learning of concept classes through the boundaries of their items. Theoretical Computer Science 172 (1997), 91-120.

[2] Apolloni, B., and Malchiodi, D. Gaining degrees of freedom in subsymbolic learning. Theoretical Computer Science 255 (2001).

[3] Apolloni, B., Malchiodi, D., Orovas, C., and Palmas, G. From synapses to rules. Cognitive Systems Research. In press.

[4] Blumer, A., Ehrenfeucht, A., Haussler, D., and Warmuth, M. Learnability and the Vapnik-Chervonenkis dimension. Journal of the ACM 36 (1989), 929-965.

[5] Rohatgi, V. K. An Introduction to Probability Theory and Mathematical Statistics. Wiley Series in Probability and Mathematical Statistics. John Wiley & Sons, New York, 1976.

[6] Vapnik, V. Estimation of Dependences Based on Empirical Data. Springer, New York, 1982.

[7] Wilks, S. S. Mathematical Statistics. Wiley Publications in Statistics. John Wiley, New York, 1965.
