Learnability and the Doubling Dimension


Yi Li, Genome Institute of Singapore
Philip M. Long, Google

Abstract

Given a set $F$ of classifiers and a probability distribution $D$ over their domain, one can define a metric by taking the distance between a pair of classifiers to be the probability that they classify a random item differently. We prove bounds on the sample complexity of PAC learning in terms of the doubling dimension of this metric. These bounds imply known bounds on the sample complexity of learning halfspaces with respect to the uniform distribution that are optimal up to a constant factor. We prove a bound that holds for any algorithm that outputs a classifier with zero error whenever this is possible; this bound is in terms of the maximum of the doubling dimension and the VC-dimension of $F$, and strengthens the best known bound in terms of the VC-dimension alone. We show that there is no bound on the doubling dimension in terms of the VC-dimension of $F$ (in contrast with the metric dimension).

1 Introduction

A set $F$ of classifiers and a probability distribution $D$ over their domain induce a metric in which the distance between classifiers is the probability that they disagree on how to classify a random object. (Let us call this metric $d_D$.) Properties of metrics like this have long been used for analyzing the generalization ability of learning algorithms [11, 32]. This paper is about bounds on the number of examples required for PAC learning in terms of the doubling dimension [4] of this metric space. The doubling dimension of a metric space is the least $d$ such that any ball can be covered by $2^d$ balls of half its radius. The doubling dimension has been frequently used lately in the analysis of algorithms [13, 20, 21, 17, 29, 14, 7, 22, 28, 6].

In the PAC-learning model, an algorithm is given examples $(x_1, f(x_1)), \ldots, (x_m, f(x_m))$ of the behavior of an arbitrary member $f$ of a known class $F$. The items $x_1, \ldots, x_m$ are chosen independently at random according to $D$. The algorithm must, with probability at least $1 - \delta$ (w.r.t. the random choice of $x_1, \ldots, x_m$), output a classifier whose distance from $f$ is at most $\epsilon$.

We show that if $(F, d_D)$ has doubling dimension $d$, then $F$ can be PAC-learned with respect to $D$ using
$$O\left(\frac{d + \log\frac{1}{\delta}}{\epsilon}\right) \qquad (1)$$
examples. If in addition the VC-dimension of $F$ is at most $d$, we show that any algorithm that outputs a classifier with zero training error whenever this is possible PAC-learns $F$ w.r.t. $D$ using
$$O\left(\frac{1}{\epsilon}\left(d\sqrt{\log\tfrac{1}{\epsilon}} + \log\tfrac{1}{\delta}\right)\right) \qquad (2)$$
examples.

We show that if $H_n$ consists of halfspaces through the origin, and $U_n$ is the uniform distribution over the unit ball in $\mathbb{R}^n$, then the doubling dimension of $(H_n, d_{U_n})$ is $O(n)$. Thus (1) generalizes the known bound of $O\left(\frac{n + \log\frac{1}{\delta}}{\epsilon}\right)$ for learning halfspaces with respect to the uniform distribution [25], matching a known lower bound for this problem [23] up to a constant factor. Both upper bounds improve on the $O\left(\frac{1}{\epsilon}\left(n\log\frac{1}{\epsilon} + \log\frac{1}{\delta}\right)\right)$ bound that follows from the traditional analysis; (2) is the first such improvement for a polynomial-time algorithm.

Some previous analyses of the sample complexity of learning have made use of the fact that the metric dimension [18] is at most the VC-dimension [11, 15]. Since using the doubling dimension can sometimes lead to a better bound, a natural question is whether there is also a bound on the doubling dimension in terms of the VC-dimension. We show that this is not the case: it is possible to pack $\left(\frac{1}{\alpha}\right)^{\Omega(d)}$ classifiers in a set of VC-dimension $d$ so that the distance between every pair is in the interval $[\alpha, 2\alpha]$. Our analysis was inspired by some previous work in computational geometry [19], but is simpler.

Combining our upper bound analysis with established techniques (see [33, 3, 8, 31, 30]), one can perform similar analyses for the more general case in which no classifier in $F$ has zero error. We have begun with the PAC model because it is a clean setting in which to illustrate the power of the doubling dimension for analyzing learning algorithms. The doubling dimension appears most useful when the best achievable error rate (the Bayes error) is of the same order as the inverse of the number of training examples (or smaller).

Bounding the doubling dimension is useful for analyzing the sample complexity of learning because it limits the richness of a subclass of $F$ near the classifier to be learned. For other analyses that exploit bounds on such local richness, please see [31, 30, 5, 25, 26, 34]. It could be that stronger results could be obtained by marrying the techniques of this paper with those. In any case, it appears that the doubling dimension is an intuitive yet powerful way to bound the local complexity of a collection of classifiers.

2 Preliminaries

2.1 Learning

For some domain $X$, an example consists of a member of $X$ and its classification in $\{0, 1\}$. A classifier is a mapping from $X$ to $\{0, 1\}$. A training set is a finite collection of examples. A learning algorithm takes as input a training set and outputs a classifier.

Suppose $D$ is a probability distribution over $X$. Then define
$$d_D(f, g) = \Pr_{x \sim D}(f(x) \neq g(x)).$$
A learning algorithm $A$ PAC learns $F$ w.r.t. $D$ with accuracy $\epsilon$ and confidence $\delta$ from $m$ examples if, for any $f \in F$, when domain elements $x_1, \ldots, x_m$ are drawn independently at random according to $D$ and $(x_1, f(x_1)), \ldots, (x_m, f(x_m))$ is passed to $A$, which outputs $h$, then
$$\Pr(d_D(h, f) > \epsilon) \leq \delta.$$
If $F$ is a set of classifiers, a learning algorithm is a consistent hypothesis finder for $F$ if it outputs an element of $F$ that correctly classifies all of the training data whenever it is possible to do so.
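As a concrete illustration of the metric $d_D$ (this sketch is ours, not part of the original paper), the following Python snippet estimates the disagreement distance between two halfspaces through the origin under the uniform distribution on the unit ball by Monte Carlo sampling; all function and variable names here are our own choices.

import numpy as np

def sample_unit_ball(n_dims, n_samples, rng):
    # Uniform draws from the unit ball: normalize Gaussian vectors, then rescale by U^(1/n).
    g = rng.standard_normal((n_samples, n_dims))
    g /= np.linalg.norm(g, axis=1, keepdims=True)
    return g * rng.uniform(size=(n_samples, 1)) ** (1.0 / n_dims)

def halfspace(w):
    # The classifier x -> 1 if w . x >= 0, else 0 (a halfspace through the origin).
    return lambda X: (X @ w >= 0).astype(int)

def disagreement(f, g, X):
    # Empirical estimate of d_D(f, g): the fraction of sampled points labeled differently.
    return float(np.mean(f(X) != g(X)))

rng = np.random.default_rng(0)
X = sample_unit_ball(n_dims=5, n_samples=200000, rng=rng)
f = halfspace(np.array([1.0, 0.0, 0.0, 0.0, 0.0]))
g = halfspace(np.array([0.9, 0.1, 0.0, 0.0, 0.0]))
print(disagreement(f, g, X))  # close to (angle between the normals) / pi, here about 0.035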

2.2 Metrics

Suppose $(Z, \rho)$ is a metric space. An $\alpha$-cover for $Z$ is a set $T \subseteq Z$ such that every element of $Z$ has a counterpart in $T$ that is at a distance at most $\alpha$ (with respect to $\rho$). An $\alpha$-packing for $Z$ is a set $T \subseteq Z$ such that every pair of distinct elements of $T$ is at a distance greater than $\alpha$ (again, with respect to $\rho$). The $\alpha$-ball centered at $z \in Z$ consists of all $t \in Z$ for which $\rho(z, t) \leq \alpha$. Denote the size of the smallest $\alpha$-cover of a metric space $Z$ by $N(\alpha, Z)$, and the size of the largest $\alpha$-packing by $M(\alpha, Z)$ (we omit the second argument when the space is clear from context).

Lemma 1 ([18]) For any metric space $(Z, \rho)$ and any $\alpha > 0$,
$$M(2\alpha, Z) \leq N(\alpha, Z) \leq M(\alpha, Z).$$

The doubling dimension of $(Z, \rho)$ is the least $d$ such that, for all radii $\alpha > 0$, any $\alpha$-ball in $Z$ can be covered by at most $2^d$ balls of radius $\alpha/2$. That is, for any $\alpha > 0$ and any $z \in Z$, there is a $C \subseteq Z$ such that $|C| \leq 2^d$ and
$$\{t \in Z : \rho(z, t) \leq \alpha\} \subseteq \bigcup_{c \in C}\{t \in Z : \rho(c, t) \leq \alpha/2\}.$$
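For intuition, here is a small worked example (ours, not part of the paper) checking the definition on familiar spaces: the real line has doubling dimension 1, and Euclidean $\mathbb{R}^n$ has doubling dimension $O(n)$.

% The a-ball around z in (R, |x - y|) is the interval [z - a, z + a], and two
% balls of radius a/2 always cover it, so the doubling dimension of the line is 1:
[z - a, z + a] \;\subseteq\; B\!\left(z - \tfrac{a}{2}, \tfrac{a}{2}\right) \cup B\!\left(z + \tfrac{a}{2}, \tfrac{a}{2}\right).
% In Euclidean R^n, take a maximal (a/2)-packing P of the ball B(z, a); by maximality
% it is also an (a/2)-cover of B(z, a). The balls of radius a/4 around the points of P
% are pairwise disjoint and contained in B(z, a + a/4), so comparing volumes gives
|P| \cdot \left(\tfrac{a}{4}\right)^n \;\leq\; \left(\tfrac{5a}{4}\right)^n
\quad\Longrightarrow\quad |P| \leq 5^n,
% hence at most 5^n balls of half the radius suffice, and the doubling dimension of
% Euclidean R^n is at most n \log_2 5 = O(n).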

2.3 Probability

For a function $\phi$ and a probability distribution $D$, let $\mathbf{E}_{x \sim D}(\phi(x))$ be the expectation of $\phi$ w.r.t. $D$. We will shorten this to $\mathbf{E}_D(\phi)$, and if $u = (u_1, \ldots, u_m) \in X^m$, then $\mathbf{E}_u(\phi)$ will denote $\frac{1}{m}\sum_{i=1}^m \phi(u_i)$. We will use $\Pr_{x \sim D}$, $\Pr_D$, and $\Pr_u$ similarly.

3 The strongest upper bound

Theorem 2 Suppose $d$ is the doubling dimension of $(F, d_D)$. There is an algorithm that PAC-learns $F$ from $O\left(\frac{d + \log\frac{1}{\delta}}{\epsilon}\right)$ examples.

The key lemma limits the extent to which points that are separated from one another can crowd around some point in a metric space with limited doubling dimension.

Lemma 3 (see [13]) Suppose $(Z, \rho)$ is a metric space with doubling dimension $d$ and $z \in Z$. For $r > 0$, let $B(z, r)$ consist of the elements $u \in Z$ such that $\rho(u, z) \leq r$ (that is, the $r$-ball centered at $z$). Then, for any $\alpha \in (0, r]$,
$$M(\alpha, B(z, r)) \leq \left(\frac{4r}{\alpha}\right)^d.$$
(In other words, any $\alpha$-packing can have at most $(4r/\alpha)^d$ elements within distance $r$ of $z$.)

Proof: Since $Z$ has doubling dimension $d$, the set $B(z, r)$ can be covered by $2^d$ balls of radius $r/2$. Each of these can be covered by $2^d$ balls of radius $r/4$, and so on. Thus, $B(z, r)$ can be covered by $2^{d\lceil \log_2(2r/\alpha)\rceil} \leq (4r/\alpha)^d$ balls of radius at most $\alpha/2$. Applying Lemma 1 completes the proof.

Now we are ready to prove Theorem 2. The proof is an application of the peeling technique [1] (see [30]).

Proof of Theorem 2: Construct an $\epsilon/4$-packing $N$ of $F$ greedily, by repeatedly adding an element of $F$ to $N$ for as long as this is possible. This packing is also an $\epsilon/4$-cover, since otherwise we could add another member to $N$.

Consider the algorithm $A$ that outputs the element of $N$ with minimum error on the training set. Whatever the target $f$, some element of $N$ has error at most $\epsilon/4$. Applying Chernoff bounds, $O\left(\frac{\log\frac{1}{\delta}}{\epsilon}\right)$ examples are sufficient that, with probability at least $1 - \delta/2$, this classifier is incorrect on at most a fraction $\epsilon/2$ of the training data. Thus, the training error of the hypothesis output by $A$ is at most $\epsilon/2$ with probability at least $1 - \delta/2$.

Choose an arbitrary target function $f$, and let $S$ be the random training set resulting from drawing $m$ examples according to $D$ and classifying them using $f$. Define $\widehat{\mathrm{err}}_S(g)$ to be the fraction of examples in $S$ on which $g$ and $f$ disagree. We have
$$\Pr\left(\exists g \in N:\ d_D(g, f) > \epsilon \text{ and } \widehat{\mathrm{err}}_S(g) \leq \epsilon/2\right) \leq \sum_{i=0}^{\lceil \log_2\frac{1}{\epsilon}\rceil} \Pr\left(\exists g \in N:\ 2^i\epsilon < d_D(g, f) \leq 2^{i+1}\epsilon \text{ and } \widehat{\mathrm{err}}_S(g) \leq \epsilon/2\right) \leq \sum_{i=0}^{\lceil \log_2\frac{1}{\epsilon}\rceil} \left(2^{i+5}\right)^d e^{-c 2^i \epsilon m}$$
by Lemma 3 and the standard Chernoff bound, where $c > 0$ is an absolute constant: the elements of $N$ within distance $2^{i+1}\epsilon$ of $f$ form an $\epsilon/4$-packing of $B(f, 2^{i+1}\epsilon)$, so there are at most $\left(4 \cdot 2^{i+1}\epsilon/(\epsilon/4)\right)^d = (2^{i+5})^d$ of them, and each has error greater than $2^i\epsilon$, so its training error is at most $\epsilon/2$ with probability at most $e^{-c2^i\epsilon m}$. Each of the following steps is a straightforward manipulation:
$$\sum_{i=0}^{\lceil\log_2\frac{1}{\epsilon}\rceil}\left(2^{i+5}\right)^d e^{-c2^i\epsilon m} \leq \sum_{i=0}^{\infty} e^{(i+5)d - c2^i\epsilon m} \leq \sum_{i=0}^{\infty}\frac{\delta}{2}\,e^{-(i+1)} \leq \frac{\delta}{2},$$
where the middle inequality holds as soon as $c\epsilon m \geq 6d + 2 + \ln\frac{2}{\delta}$ (using $2^i \geq i + 1$). Since $m = O\left(\frac{d + \log\frac{1}{\delta}}{\epsilon}\right)$ is sufficient both for this and for the requirement of the previous paragraph, with probability at least $1 - \delta$ the element of $N$ output by $A$ has distance at most $\epsilon$ from $f$; this completes the proof.
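To make the algorithm in the proof of Theorem 2 concrete, here is an illustrative sketch (ours, not the paper's) for the case of a finite class given as an explicit list: it greedily builds an $\epsilon/4$-packing, which is automatically an $\epsilon/4$-cover, and outputs the packing element with the fewest mistakes on the training set. The distance oracle dist is assumed to compute (or estimate) $d_D$; classifiers are assumed to be Python callables and the sample a list of (x, y) pairs.

def greedy_packing(classifiers, dist, radius):
    # Greedily keep classifiers that are more than `radius` away from everything kept so far.
    # A maximal packing built this way is also a `radius`-cover of the class.
    packing = []
    for f in classifiers:
        if all(dist(f, g) > radius for g in packing):
            packing.append(f)
    return packing

def training_error(h, sample):
    # Fraction of labeled examples (x, y) that h gets wrong.
    return sum(1 for x, y in sample if h(x) != y) / len(sample)

def learn_via_packing(classifiers, dist, epsilon, sample):
    # The learner from the proof of Theorem 2: restrict attention to an epsilon/4-packing
    # and return the element with minimum empirical error.
    net = greedy_packing(classifiers, dist, epsilon / 4.0)
    return min(net, key=lambda h: training_error(h, sample))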

4 A bound for consistent hypothesis finders

In this section we analyze algorithms that work by finding hypotheses with zero training error. This is one way to achieve computational efficiency, as is the case when $F$ consists of halfspaces. This analysis will use the notion of VC-dimension.

Definition 4 The VC-dimension of a set $G$ of $\{0,1\}$-valued functions with a common domain $X$ is the size of the largest set $\{x_1, \ldots, x_k\} \subseteq X$ of domain elements such that $\{(g(x_1), \ldots, g(x_k)) : g \in G\} = \{0,1\}^k$.

The following lemma generalizes the Chernoff bound to hold uniformly over a class of random variables; it concentrates on a simplified consequence of the Chernoff bound that is useful when bounding the probability that an empirical estimate is much larger than the true expectation.

Lemma 5 (see [12, 24]) Suppose $G$ is a set of $\{0,1\}$-valued functions with a common domain $X$. Let $d$ be the VC-dimension of $G$, and let $D$ be a probability distribution over $X$. Choose $\alpha > 0$ and $K \geq 2$. Then if
$$m \geq c \cdot \frac{d\log\frac{1}{\alpha} + \log\frac{1}{\delta}}{\alpha K \log K},$$
where $c$ is an absolute constant, then
$$\Pr_{u \sim D^m}\left(\exists g \in G:\ \Pr_D(g(x) = 1) \leq \alpha \text{ but } \Pr_u(g(x) = 1) \geq K\alpha\right) \leq \delta.$$

Now we are ready for the main analysis of this section.

Theorem 6 Suppose the doubling dimension of $(F, d_D)$ and the VC-dimension of $F$ are both at most $d$. Any consistent hypothesis finder for $F$ PAC learns $F$ from
$$O\left(\frac{1}{\epsilon}\left(d\sqrt{\log\tfrac{1}{\epsilon}} + \log\tfrac{1}{\delta}\right)\right)$$
examples.

Proof: Assume without loss of generality that $\epsilon \leq 1/100$. Let $\alpha = \epsilon \exp\left(-\sqrt{\ln\frac{1}{\epsilon}}\right)$; since $\epsilon \leq 1/100$, we have $\alpha \leq \epsilon/4$.

Choose a target function $f \in F$. For each $g \in F$, define $g' : X \to \{0,1\}$ by $g'(x) = 1$ if $g(x) \neq f(x)$ and $g'(x) = 0$ otherwise, and let $F' = \{g' : g \in F\}$. Since $g'(x) \neq h'(x)$ exactly when $g(x) \neq h(x)$, the doubling dimension of $(F', d_D)$ is the same as the doubling dimension of $(F, d_D)$; the VC-dimension of $F'$ is also known to be the same as the VC-dimension of $F$ (see [32]). Note that $d_D(g, f) = \mathbf{E}_D(g')$, and that $g$ correctly classifies all of the training data exactly when $\mathbf{E}_u(g') = 0$, where $u = (x_1, \ldots, x_m)$ is the sequence of training items.

Construct an $\alpha$-packing $N$ of $F'$ greedily, by repeatedly adding an element of $F'$ to $N$ for as long as this is possible. This packing is also an $\alpha$-cover. For each $g' \in F'$, let $\pi(g')$ be a nearest neighbor of $g'$ in $N$, so that $d_D(g', \pi(g')) \leq \alpha$. Since $\alpha \leq \epsilon/4$, the triangle inequality yields
$$\mathbf{E}_D(g') > \epsilon \implies \mathbf{E}_D(\pi(g')) \geq \mathbf{E}_D(g') - \alpha > 3\epsilon/4, \qquad (3)$$
and, since $\pi(g')(x) \leq g'(x) + |g'(x) - \pi(g')(x)|$ for every $x$,
$$\mathbf{E}_u(g') = 0 \implies \mathbf{E}_u(\pi(g')) \leq \mathbf{E}_u(|g' - \pi(g')|). \qquad (4)$$
Combining (3) and (4), if some $g \in F$ with $d_D(g, f) > \epsilon$ correctly classifies all of the training data, then at least one of the following two events occurs:
$$\exists \hat{g} \in N:\ \mathbf{E}_D(\hat{g}) > 3\epsilon/4 \text{ and } \mathbf{E}_u(\hat{g}) \leq \epsilon/4, \qquad (5)$$
$$\exists g \in F:\ \mathbf{E}_D(|g' - \pi(g')|) \leq \alpha \text{ but } \mathbf{E}_u(|g' - \pi(g')|) > \epsilon/4. \qquad (6)$$

The probability of (5) is bounded by peeling, as in the proof of Theorem 2: by Lemma 3, at most $(2^{i+3}\epsilon/\alpha)^d$ elements of $N$ have $\mathbf{E}_D(\hat{g}) = d_D(\hat{g}, f')$ between $2^i(3\epsilon/4)$ and $2^{i+1}(3\epsilon/4)$, and by the standard Chernoff bound each such $\hat{g}$ satisfies $\mathbf{E}_u(\hat{g}) \leq \epsilon/4$ with probability at most $e^{-c2^i\epsilon m}$, where $c > 0$ is an absolute constant. Computing a geometric sum exactly as in the proof of Theorem 2, we get that
$$m \geq c_1 \cdot \frac{d\log\frac{\epsilon}{\alpha} + d + \log\frac{1}{\delta}}{\epsilon}$$
suffices for the probability of (5) to be at most $\delta/2$, for an absolute constant $c_1 > 0$; since $\log\frac{\epsilon}{\alpha} = \Theta\left(\sqrt{\log\frac{1}{\epsilon}}\right)$, it suffices that $m \geq c_2 \cdot \frac{d\sqrt{\log\frac{1}{\epsilon}} + \log\frac{1}{\delta}}{\epsilon}$.

The functions $|g' - \pi(g')|$, for $g \in F$, form a class of $\{0,1\}$-valued functions with a common domain whose VC-dimension is $O(d)$ (each is the indicator of the symmetric difference of two sets of the form $\{x : g'(x) = 1\}$ with $g' \in F'$, and standard composition bounds apply; see [32, 3]). Since $\mathbf{E}_D(|g' - \pi(g')|) = d_D(g', \pi(g')) \leq \alpha$ for every $g \in F$, we may apply Lemma 5 to this class with $K = \frac{\epsilon}{4\alpha} = \frac{1}{4}\exp\left(\sqrt{\ln\frac{1}{\epsilon}}\right) \geq 2$: there is an absolute constant $c_3 > 0$ such that
$$m \geq c_3 \cdot \frac{d\log\frac{1}{\alpha} + \log\frac{1}{\delta}}{\alpha K \log K} = c_3 \cdot \frac{d\log\frac{1}{\alpha} + \log\frac{1}{\delta}}{(\epsilon/4)\log\frac{\epsilon}{4\alpha}}$$
suffices for the probability of (6) to be at most $\delta/2$. Substituting the value of $\alpha$, we have $\log\frac{1}{\alpha} = O\left(\log\frac{1}{\epsilon}\right)$ and $\log\frac{\epsilon}{4\alpha} = \Omega\left(\sqrt{\log\frac{1}{\epsilon}}\right)$, so it is sufficient that
$$m \geq c_4 \cdot \frac{d\sqrt{\log\frac{1}{\epsilon}} + \log\frac{1}{\delta}}{\epsilon}$$
for an absolute constant $c_4 > 0$. Putting this together with the bound on the probability of (5), and noting that a consistent hypothesis finder outputs some $g \in F$ with $\mathbf{E}_u(g') = 0$ (such a $g$ exists, since the target $f$ itself is consistent), completes the proof.
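Section 4's remark that consistent hypothesis finders can be computationally efficient is concrete for halfspaces: a halfspace through the origin consistent with the sample can be found by linear programming. The sketch below is ours, not the paper's; it uses scipy and assumes the labeled sample is strictly linearly separable through the origin, which holds with probability 1 when the labels come from a target halfspace under a continuous distribution such as the uniform distribution on the ball.

import numpy as np
from scipy.optimize import linprog

def consistent_halfspace(X, y):
    # Find w with y_i * (w . x_i) >= 1 for all i (any strictly separating w can be
    # rescaled to satisfy this), i.e., a halfspace through the origin that is
    # consistent with the sample. Labels y are assumed to be +1 / -1.
    n = X.shape[1]
    A_ub = -(y[:, None] * X)          # -y_i (x_i . w) <= -1  <=>  y_i (w . x_i) >= 1
    b_ub = -np.ones(X.shape[0])
    res = linprog(c=np.zeros(n), A_ub=A_ub, b_ub=b_ub,
                  bounds=[(None, None)] * n, method="highs")
    if not res.success:
        raise ValueError("no consistent halfspace through the origin found")
    return res.x

# Tiny usage example with a planted target w*.
rng = np.random.default_rng(1)
X = rng.standard_normal((200, 5))
w_star = rng.standard_normal(5)
y = np.sign(X @ w_star)
w_hat = consistent_halfspace(X, y)
print(np.all(np.sign(X @ w_hat) == y))  # True: zero training error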

5 Halfspaces and the uniform distribution

Proposition 7 If $U_n$ is the uniform distribution over the unit ball in $\mathbb{R}^n$, and $H_n$ is the set of halfspaces that go through the origin, then the doubling dimension of $(H_n, d_{U_n})$ is $O(n)$.

Proof: Choose $f \in H_n$ and $\alpha > 0$. We will show that the ball of radius $\alpha$ centered at $f$ can be covered by $2^{O(n)}$ balls of radius $\alpha/2$. Suppose $U_{H_n}$ is the probability distribution over $H_n$ obtained by choosing a normal vector $w$ uniformly from the unit ball, and outputting $\{x : w \cdot x \geq 0\}$. The argument will be a volume argument using $U_{H_n}$.

It is known (see Lemma 4 of [25]) that
$$\Pr_{g \sim U_{H_n}}\left(d_{U_n}(f, g) \leq \alpha\right) \geq (c_1 \alpha)^n,$$
where $c_1 > 0$ is an absolute constant independent of $\alpha$ and $n$. Furthermore,
$$\Pr_{g \sim U_{H_n}}\left(d_{U_n}(f, g) \leq \alpha\right) \leq (c_2 \alpha)^n,$$
where $c_2 > 0$ is another absolute constant.

Suppose we arbitrarily choose $g_1, \ldots, g_N \in H_n$ that are at a distance at most $\alpha$ from $f$, but more than $\alpha/2$ far from one another. By the triangle inequality, the $\alpha/4$-balls centered at $g_1, \ldots, g_N$ are disjoint. Thus, the probability that a random element of $H_n$ (drawn according to $U_{H_n}$) lands in a ball of radius $\alpha/4$ centered at one of $g_1, \ldots, g_N$ is at least $N(c_1\alpha/4)^n$. On the other hand, since each $g_i$ has distance at most $\alpha$ from $f$, any element of an $\alpha/4$-ball centered at one of them is at most $\alpha + \alpha/4 \leq 2\alpha$ far from $f$. Thus, the union of the $\alpha/4$-balls centered at $g_1, \ldots, g_N$ is contained in the $2\alpha$-ball centered at $f$. Thus $N(c_1\alpha/4)^n \leq (2c_2\alpha)^n$, which implies $N \leq (8c_2/c_1)^n = 2^{O(n)}$. In particular, a maximal set of elements of $H_n$ within distance $\alpha$ of $f$ that are pairwise more than $\alpha/2$ apart has at most $2^{O(n)}$ elements; since such a maximal set is an $\alpha/2$-cover of the $\alpha$-ball centered at $f$, this ball can be covered by $2^{O(n)}$ balls of radius $\alpha/2$, completing the proof.
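A rough numerical probe of Proposition 7 (ours, not from the paper): greedily build an $(\alpha/2)$-packing of the $\alpha$-ball around a fixed halfspace and observe that its size is essentially independent of $\alpha$, as a doubling-dimension bound of $O(n)$ predicts. The helper halfspace_distance uses the fact that, for halfspaces through the origin and a rotationally invariant distribution such as $U_n$, $d_D$ equals the angle between the normal vectors divided by $\pi$; the local proposal scheme and all names are our own choices.

import numpy as np

def halfspace_distance(w1, w2):
    # d_{U_n} between halfspaces through the origin = angle between normals / pi.
    c = np.dot(w1, w2) / (np.linalg.norm(w1) * np.linalg.norm(w2))
    return np.arccos(np.clip(c, -1.0, 1.0)) / np.pi

def packing_in_ball(center_w, alpha, n_dims, n_candidates, rng):
    # Greedily build an (alpha/2)-packing of the alpha-ball around the halfspace
    # with normal center_w, using randomly perturbed normals as candidates.
    packing = []
    for _ in range(n_candidates):
        w = center_w + rng.standard_normal(n_dims) * alpha  # rough local proposal
        if halfspace_distance(center_w, w) > alpha:
            continue
        if all(halfspace_distance(w, v) > alpha / 2 for v in packing):
            packing.append(w)
    return packing

rng = np.random.default_rng(0)
n_dims = 4
center = np.zeros(n_dims)
center[0] = 1.0
for alpha in (0.2, 0.1, 0.05):
    size = len(packing_in_ball(center, alpha, n_dims, n_candidates=20000, rng=rng))
    print(alpha, size)  # roughly the same size for each alpha, consistent with 2^{O(n)}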

6 Separation

Theorem 8 For all $\alpha \in (0, 1/2]$ and all positive integers $d$, there is a set $F$ of classifiers and a probability distribution $D$ over their common domain with the following properties: the VC-dimension of $F$ is at most $d$; $|F| \geq \frac{1}{4}\left(\frac{1}{2e\alpha}\right)^{d/2}$; and, for every pair of distinct $f, g \in F$, $\alpha \leq d_D(f, g) \leq 2\alpha$.

The proof uses the probabilistic method. We begin with the following lemma.

Lemma 9 Choose positive integers $k \leq n$. Suppose $S$ is chosen uniformly at random from among the subsets of $\{1, \ldots, n\}$ of size $k$. Then, for any $T \subseteq \{1, \ldots, n\}$ and any $\beta > 1$,
$$\Pr\left(|S \cap T| \geq \beta \cdot \frac{k|T|}{n}\right) \leq \left(\frac{e}{\beta}\right)^{\beta k|T|/n}.$$
Proof: in Appendix A.

Now we are ready for the proof of Theorem 8, which uses the deletion technique (see [2]).

Proof (of Theorem 8): Set the domain to be $\{1, \ldots, n\}$, where $n = d/\alpha$ (assume for simplicity that this is an integer), and let $D$ be the uniform distribution over $\{1, \ldots, n\}$. Let $N = \lfloor (2e\alpha)^{-d/2} \rfloor$. (If $N < 2$, the theorem holds trivially by taking $F$ to consist of a single classifier, so assume $N \geq 2$.) Suppose $f_1, \ldots, f_N$ are chosen independently, uniformly at random from among the classifiers that evaluate to $1$ on exactly $d$ elements of $\{1, \ldots, n\}$; for a classifier $f$, write $S(f)$ for the set of elements on which it evaluates to $1$.

For any distinct $i, j$, suppose $S(f_i)$ is fixed, and think of the members of $S(f_j)$ as being chosen one at a time. The probability that any given element of $S(f_j)$ is also in $S(f_i)$ is $d/n$. Applying the linearity of expectation, and averaging over the different possibilities for $S(f_i)$, we get
$$\mathbf{E}\left(|S(f_i) \cap S(f_j)|\right) = \frac{d^2}{n} = \alpha d.$$
Applying Lemma 9 with $\beta = \frac{1}{2\alpha}$,
$$\Pr\left(|S(f_i) \cap S(f_j)| \geq \frac{d}{2}\right) \leq (2e\alpha)^{d/2}.$$
Thus, the expected number of pairs $i < j$ such that $|S(f_i) \cap S(f_j)| \geq d/2$ is at most
$$\binom{N}{2}(2e\alpha)^{d/2} \leq \frac{N}{2} \cdot N(2e\alpha)^{d/2} \leq \frac{N}{2}.$$
This implies that there exist $f_1, \ldots, f_N$ such that the number of pairs $i < j$ with $|S(f_i) \cap S(f_j)| \geq d/2$ is at most $N/2$. If we delete one element from each such pair, and form $F$ from what remains, then each pair of distinct elements $f, g$ of $F$ satisfies
$$|S(f) \cap S(g)| < \frac{d}{2}. \qquad (7)$$
Since $D$ is the uniform distribution over $\{1, \ldots, n\}$ and $d_D(f, g) = \frac{|S(f) \triangle S(g)|}{n} = \frac{2d - 2|S(f) \cap S(g)|}{n}$, (7) implies $d_D(f, g) > \frac{d}{n} = \alpha$; also $d_D(f, g) \leq \frac{2d}{n} = 2\alpha$ always. The number of elements of $F$ is at least $N - N/2 = N/2 \geq \frac{1}{4}\left(\frac{1}{2e\alpha}\right)^{d/2}$. Since each $f \in F$ has $|S(f)| = d$, no function in $F$ evaluates to $1$ on each element of any set of $d + 1$ elements of $\{1, \ldots, n\}$. Thus, the VC-dimension of $F$ is at most $d$.

Theorem 8 implies that there is no bound on the doubling dimension of $(F, d_D)$ in terms of the VC-dimension of $F$. For any constraint on the VC-dimension, a set satisfying the constraint can have arbitrarily large doubling dimension by setting the value of $\alpha$ in Theorem 8 arbitrarily small.
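A small simulation of the construction in the proof of Theorem 8 (ours, not part of the paper): draw random classifiers that are 1 on exactly d points of a domain of size d/alpha, delete one classifier from every pair whose supports overlap in d/2 or more points, and check that the surviving pairwise distances under the uniform distribution all land in [alpha, 2*alpha]. All names and parameter choices below are ours.

import itertools
import random

def theorem8_construction(d, alpha, num_draws, seed=0):
    # Domain {0, ..., n-1} with n = d / alpha; each classifier is identified with the
    # d-element subset of the domain on which it outputs 1.
    rng = random.Random(seed)
    n = int(round(d / alpha))
    supports = [frozenset(rng.sample(range(n), d)) for _ in range(num_draws)]
    # Deletion step: drop one member of every pair with a large overlap.
    doomed = set()
    for i, j in itertools.combinations(range(num_draws), 2):
        if i not in doomed and j not in doomed and len(supports[i] & supports[j]) >= d / 2:
            doomed.add(j)
    survivors = [s for i, s in enumerate(supports) if i not in doomed]
    return n, survivors

d, alpha = 10, 0.05
n, F = theorem8_construction(d, alpha, num_draws=200)
dists = [len(a ^ b) / n for a, b in itertools.combinations(F, 2)]
print(len(F), min(dists), max(dists))  # all pairwise distances lie in [alpha, 2*alpha]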

Acknowledgement

We thank Gábor Lugosi and Tong Zhang for their help.

References

[1] K. Alexander. Rates of growth for weighted empirical processes. In Proc. of Berkeley Conference in Honor of Jerzy Neyman and Jack Kiefer, volume 2.
[2] N. Alon, J. H. Spencer, and P. Erdős. The Probabilistic Method. Wiley.
[3] M. Anthony and P. L. Bartlett. Neural Network Learning: Theoretical Foundations. Cambridge University Press.
[4] P. Assouad. Plongements lipschitziens dans R^n. Bull. Soc. Math. France, 111(4).
[5] P. L. Bartlett, O. Bousquet, and S. Mendelson. Local Rademacher complexities. Annals of Statistics, 33(4).
[6] A. Beygelzimer, S. Kakade, and J. Langford. Cover trees for nearest neighbor. ICML.
[7] H. T. H. Chan, A. Gupta, B. M. Maggs, and S. Zhou. On hierarchical routing in doubling metrics. SODA.
[8] L. Devroye, L. Györfi, and G. Lugosi. A Probabilistic Theory of Pattern Recognition. Springer.
[9] D. Dubhashi and D. Ranjan. Balls and bins: a study in negative dependence. Random Structures & Algorithms, 13(2):99-124.
[10] D. Dubhashi, V. Priebe, and D. Ranjan. Negative dependence through the FKG inequality. Technical Report RS-96-27, BRICS.
[11] R. M. Dudley. Central limit theorems for empirical measures. Annals of Probability, 6(6).
[12] R. M. Dudley. A course on empirical processes. Lecture Notes in Mathematics, 1097:2-142.
[13] A. Gupta, R. Krauthgamer, and J. R. Lee. Bounded geometries, fractals, and low-distortion embeddings. FOCS.
[14] S. Har-Peled and M. Mendel. Fast construction of nets in low dimensional metrics, and their applications. SICOMP, 35(5), 2006.
[15] D. Haussler. Sphere packing numbers for subsets of the Boolean n-cube with bounded Vapnik-Chervonenkis dimension. Journal of Combinatorial Theory, Series A, 69(2).
[16] K. Joag-Dev and F. Proschan. Negative association of random variables, with applications. The Annals of Statistics, 11(1).
[17] J. Kleinberg, A. Slivkins, and T. Wexler. Triangulation and embedding using small sets of beacons. FOCS.
[18] A. N. Kolmogorov and V. M. Tihomirov. epsilon-entropy and epsilon-capacity of sets in functional spaces. American Mathematical Society Translations (Ser. 2), 17.
[19] J. Komlós, J. Pach, and G. Woeginger. Almost tight bounds on epsilon-nets. Discrete and Computational Geometry, 7.
[20] R. Krauthgamer and J. R. Lee. The black-box complexity of nearest neighbor search. ICALP.
[21] R. Krauthgamer and J. R. Lee. Navigating nets: simple algorithms for proximity search. SODA.
[22] F. Kuhn, T. Moscibroda, and R. Wattenhofer. On the locality of bounded growth. PODC.
[23] P. M. Long. On the sample complexity of PAC learning halfspaces against the uniform distribution. IEEE Transactions on Neural Networks, 6(6).
[24] P. M. Long. Using the pseudo-dimension to analyze approximation algorithms for integer programming. Proceedings of the Seventh International Workshop on Algorithms and Data Structures.
[25] P. M. Long. An upper bound on the sample complexity of PAC learning halfspaces with respect to the uniform distribution. Information Processing Letters, 87(5).
[26] S. Mendelson. Estimating the performance of kernel classes. Journal of Machine Learning Research, 4.
[27] R. Motwani and P. Raghavan. Randomized Algorithms. Cambridge University Press.
[28] A. Slivkins. Distance estimation and object location via rings of neighbors. PODC.
[29] K. Talwar. Bypassing the embedding: approximation schemes and compact representations for low dimensional metrics. STOC.
[30] S. van de Geer. Empirical Processes in M-Estimation. Cambridge Series in Statistical and Probabilistic Methods.
[31] A. van der Vaart and J. A. Wellner. Weak Convergence and Empirical Processes: With Applications to Statistics. Springer.
[32] V. N. Vapnik. Estimation of Dependences Based on Empirical Data. Springer-Verlag.
[33] V. N. Vapnik. Statistical Learning Theory. Wiley, New York.
[34] T. Zhang. Information theoretical upper and lower bounds for statistical estimation. IEEE Transactions on Information Theory, to appear.

A Proof of Lemma 9

Definition 10 ([16]) A collection $X_1, \ldots, X_n$ of random variables is negatively associated if, for every disjoint pair $I, J \subseteq \{1, \ldots, n\}$ of index sets, and for every pair $f : \mathbb{R}^{|I|} \to \mathbb{R}$ and $g : \mathbb{R}^{|J|} \to \mathbb{R}$ of non-decreasing functions, we have
$$\mathbf{E}\left(f(X_i, i \in I)\, g(X_j, j \in J)\right) \leq \mathbf{E}\left(f(X_i, i \in I)\right) \mathbf{E}\left(g(X_j, j \in J)\right).$$

Lemma 11 ([10]) If $S$ is chosen uniformly at random from among the subsets of $\{1, \ldots, n\}$ with exactly $k$ elements, and $X_j = 1$ if $j \in S$ and $X_j = 0$ otherwise, then $X_1, \ldots, X_n$ are negatively associated.

Lemma 12 ([9]) Collections $X_1, \ldots, X_n$ of negatively associated random variables satisfy Chernoff bounds: for any $\lambda \geq 0$,
$$\mathbf{E}\left(\exp\left(\lambda \sum_{i=1}^n X_i\right)\right) \leq \prod_{i=1}^n \mathbf{E}\left(\exp(\lambda X_i)\right).$$

Proof of Lemma 9: Let $X_j \in \{0, 1\}$ indicate whether $j \in S$. By Lemma 11, $X_1, \ldots, X_n$ are negatively associated. We have $|S \cap T| = \sum_{j \in T} X_j$. Combining Lemma 12 with a standard Chernoff-Hoeffding bound (see Theorem 4.1 of [27]) completes the proof.
