On A-distance and Relative A-distance
ADAPTIVE COMMUNICATIONS AND SIGNAL PROCESSING LABORATORY
CORNELL UNIVERSITY, ITHACA, NY

Ting He and Lang Tong

Technical Report No. ACSP-TR, August 2004
I. INTRODUCTION

We give a method to measure the distance between two probability distributions and, based on this distance measure, we bound the probability that the distance between the empirical distribution and the actual distribution exceeds a certain level. The direct implication of our result is that, for a large sample size, one can replace the actual probability with its corresponding empirical probability with arbitrarily small error. The proofs of the theorems are based on the Vapnik-Chervonenkis theory [6] and on Anthony and Shawe-Taylor's extension of that theory [4].

II. DISTANCE MEASURE

A-distance. Fix a measure space and let $\mathcal{A}$ be a collection of measurable sets. Let $P_1$ and $P_2$ be probability distributions over this space. The A-distance between $P_1$ and $P_2$ is defined as

$$d_{\mathcal{A}}(P_1, P_2) = \sup_{A \in \mathcal{A}} |P_1(A) - P_2(A)|.$$

For finite sample sets $S_1$ and $S_2$, $d_{\mathcal{A}}(S_1, S_2)$ is defined similarly by replacing $P_i(A)$ with $S_i(A) = |S_i \cap A| / |S_i|$.

The following notion of relative A-distance offers a way to take the relative magnitude of a change into account.

Relative A-distance. Let $P_1, P_2$ be two probability distributions over the same measure space, let $\mathcal{A}$ denote a family of measurable subsets of that space, and let $A$ be a set in $\mathcal{A}$. The relative A-distance between $P_1$ and $P_2$ is defined as

$$\phi_{\mathcal{A}}(P_1, P_2) = \sup_{A \in \mathcal{A}} \frac{|P_1(A) - P_2(A)|}{\sqrt{\dfrac{P_1(A) + P_2(A)}{2}}}.$$

For empirical distances, simply replace $P_i(A)$ with the empirical measure $S_i(A) = |S_i \cap A| / |S_i|$.

It is easy to see that the A-distance is a metric. For the proof that the relative A-distance is a metric, see [5].

III. VC BOUNDS

The following theorems are derived from [6] and [4]. For both distance notions, they guarantee the rate at which the empirical distance converges to the underlying distance.
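As an illustration of the two empirical distances defined in Section II, the following minimal sketch computes them for a small finite family of intervals on the real line. The interval family, the sample values, and the function names are all assumptions made for this example and do not come from the report; sets that carry no mass in either sample are skipped, which is a convention chosen here to avoid a 0/0 ratio.

```python
from math import sqrt

def empirical_measure(sample, interval):
    """S(A) = |S ∩ A| / |S| for a closed interval A = [lo, hi]."""
    lo, hi = interval
    return sum(lo <= x <= hi for x in sample) / len(sample)

def a_distance(s1, s2, family):
    """Empirical A-distance: largest absolute gap in mass over the family."""
    return max(
        abs(empirical_measure(s1, a) - empirical_measure(s2, a)) for a in family
    )

def relative_a_distance(s1, s2, family):
    """Empirical relative A-distance: each gap is scaled by sqrt of the average mass."""
    best = 0.0
    for a in family:
        p, q = empirical_measure(s1, a), empirical_measure(s2, a)
        if p + q > 0:  # assumption: a set with no mass in either sample contributes 0
            best = max(best, abs(p - q) / sqrt((p + q) / 2))
    return best

# Hypothetical samples and a hypothetical three-set family.
s1 = [0.1, 0.2, 0.3, 0.4]
s2 = [0.1, 0.6, 0.7, 0.8]
family = [(0.0, 0.5), (0.5, 1.0), (0.0, 0.15)]

print(a_distance(s1, s2, family))
print(relative_a_distance(s1, s2, family))
```

In this toy example the intervals [0, 0.5] and [0.5, 1] show the same absolute gap, but the relative distance is attained on [0.5, 1], where the average mass is smaller. This is the sense in which the relative version accounts for the magnitude of a change.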
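For A-distance, we have the following theorems.

Theorem 3.1 (Vapnik-Chervonenkis Inequality): Let $P$ be a probability distribution over a domain $X$ and let $S$ be a collection of $n$ i.i.d. samples drawn from $P$. Then for a family $\mathcal{A}$ of subsets of $X$ and a constant $\epsilon \in (0, 1)$,

$$P^n\Big\{\sup_{A \in \mathcal{A}} |S(A) - P(A)| > \epsilon\Big\} \le 4\,\Pi_{\mathcal{A}}(2n)\, e^{-n\epsilon^2/8}.$$

Here $\Pi_{\mathcal{A}}(n)$ is the shatter coefficient [3]. If $\mathcal{A}$ has a finite VC-dimension $d$, then by Sauer's Lemma, $\Pi_{\mathcal{A}}(n) \le (n + 1)^d$ for all $n$.

Using Theorem 3.1, it is easy to derive the following corollary.

Corollary 3.2: Let $P_1, P_2$ be any probability distributions over some domain $X$, let $\mathcal{A}$ be a family of subsets of $X$, and let $\epsilon \in (0, 1)$. If $S_1, S_2$ are collections of $n$ i.i.d. samples drawn from $P_1, P_2$ respectively, then

$$P^n\Big[\exists A \in \mathcal{A} : \big|\,|P_1(A) - P_2(A)| - |S_1(A) - S_2(A)|\,\big| \ge \epsilon\Big] < 8\,\Pi_{\mathcal{A}}(2n)\, e^{-n\epsilon^2/32},$$

where $P^n$ in the above inequality is the probability over the pairs of samples $(S_1, S_2)$ induced by the sample-generating distributions $(P_1, P_2)$.

Proof: Simple algebra yields the result.

$$
\begin{aligned}
&\Pr\Big\{\exists A \in \mathcal{A} : \big|\,|P_1(A) - P_2(A)| - |S_1(A) - S_2(A)|\,\big| \ge \epsilon\Big\} \\
&\quad\le \Pr\Big\{\sup_{A \in \mathcal{A}} |P_1(A) - P_2(A) - S_1(A) + S_2(A)| \ge \epsilon\Big\} && (1)\\
&\quad\le \Pr\Big\{\sup_{A \in \mathcal{A}} \big(|P_1(A) - S_1(A)| + |P_2(A) - S_2(A)|\big) \ge \epsilon\Big\} && (2)\\
&\quad\le \Pr\Big\{\Big\{\sup_{A \in \mathcal{A}} |P_1(A) - S_1(A)| \ge \tfrac{\epsilon}{2}\Big\} \cup \Big\{\sup_{A \in \mathcal{A}} |P_2(A) - S_2(A)| \ge \tfrac{\epsilon}{2}\Big\}\Big\} && (3)\\
&\quad\le \Pr\Big\{\sup_{A \in \mathcal{A}} |P_1(A) - S_1(A)| \ge \tfrac{\epsilon}{2}\Big\} + \Pr\Big\{\sup_{A \in \mathcal{A}} |P_2(A) - S_2(A)| \ge \tfrac{\epsilon}{2}\Big\} && (4)\\
&\quad\le 8\,\Pi_{\mathcal{A}}(2n)\, e^{-n\epsilon^2/32}, && (5)
\end{aligned}
$$

where the last inequality comes from Theorem 3.1.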
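To get a feel for the sample sizes implied by Corollary 3.2, the sketch below evaluates the bound $8\,\Pi_{\mathcal{A}}(2n)\,e^{-n\epsilon^2/32}$ after replacing the shatter coefficient by the Sauer-type bound $(2n + 1)^d$. Combining the two in this way, the function names, and the parameter values are assumptions made for illustration only; the report itself gives no numerical examples.

```python
from math import exp, log

def corollary_3_2_bound(n, eps, d):
    """Evaluate 8 * (2n + 1)^d * exp(-n * eps^2 / 32), capped at 1.

    (2n + 1)^d stands in for the shatter coefficient at 2n (Sauer's Lemma,
    VC-dimension d). The computation is done in the log domain so that a
    large d does not overflow.
    """
    log_bound = log(8) + d * log(2 * n + 1) - n * eps**2 / 32
    return 1.0 if log_bound >= 0 else exp(log_bound)

def smallest_n(eps, delta, d):
    """Smallest n whose bound is at most delta, for 0 < delta < 1.

    The log of the bound is concave in n and exceeds log(delta) at n = 1,
    so once the bound drops to delta it stays there; doubling followed by
    bisection therefore finds the threshold.
    """
    hi = 1
    while corollary_3_2_bound(hi, eps, d) > delta:
        hi *= 2
    lo = hi // 2  # bound at lo is still above delta
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if corollary_3_2_bound(mid, eps, d) > delta:
            lo = mid
        else:
            hi = mid
    return hi

# Hypothetical setting: intervals-like family with d = 2, eps = 0.1, delta = 0.05.
n_needed = smallest_n(eps=0.1, delta=0.05, d=2)
print(n_needed, corollary_3_2_bound(n_needed, 0.1, 2))
```

The bound is vacuous (equal to 1 after capping) for small $n$ and only becomes informative once the exponential term outweighs the polynomial growth of the shatter coefficient.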
We thus have ways to bound, from both sides, the probability that the empirical A-distance deviates from the true A-distance.

The theoretical guarantee can be improved by considering the relative A-distance. We can get results similar to Theorem 3.1 and Corollary 3.2 for the metric $\phi_{\mathcal{A}}(P_1, P_2)$. We start with the following result of Anthony and Shawe-Taylor [4].

Lemma 3.3: Let $\mathcal{A}$ be a family of subsets of the domain $X$ and let $P$ be any probability distribution over $X$. If $S_1$ and $S_2$ are two collections of $n$ samples each, drawn i.i.d. from $P$, then

$$P^n\big(\phi_{\mathcal{A}}(S_1, S_2) > \epsilon\big) \le 2\,\Pi_{\mathcal{A}}(2n)\, e^{-n\epsilon^2/4}$$

(where $P^n$ is the probability that $P$ induces over the choice of samples).

In [4], Anthony and Shawe-Taylor proved that

$$\Pr\Bigg\{\sup_{A \in \mathcal{A}} \frac{S_1(A) - S_2(A)}{\sqrt{\dfrac{S_1(A) + S_2(A)}{2}}} > \epsilon\Bigg\} \le \Pi_{\mathcal{A}}(2n)\, e^{-n\epsilon^2/4}.$$

By symmetry of $S_1(A)$ and $S_2(A)$, the result in Lemma 3.3 holds.

Theorem 3.4: Let $\mathcal{A}$ be a family of subsets of the domain $X$, let $P$ be any probability distribution over $X$, and let $S$ be a set of $n$ samples, each drawn i.i.d. from $P$. Then

$$P^n\big(\phi_{\mathcal{A}}(S, P) > \epsilon\big) \le 8\,\Pi_{\mathcal{A}}(2n)\, e^{-n\epsilon^2/4}$$

(where $P^n$ is the $n$-th power of $P$, that is, the probability that $P$ induces over the choice of samples).

The proof of this theorem is similar to the proof in [4].

Proof: Define

$$Q = \Bigg\{S \in X^n : \exists A \in \mathcal{A} \text{ s.t. } \frac{P(A) - S(A)}{\sqrt{\dfrac{P(A) + S(A)}{2}}} > \epsilon\Bigg\},$$

$$R = \Bigg\{SS' \in X^{2n} : \exists A \in \mathcal{A} \text{ s.t. } \frac{S'(A) - S(A)}{\sqrt{\dfrac{S(A) + S'(A)}{2}}} > \epsilon\Bigg\},$$

where $S, S'$ are two sets of $n$ samples, drawn i.i.d. from $P$. Then we claim that $\Pr(Q) \le 4\Pr(R)$ for $n > 4/\epsilon^2$. This is true because of the following.

Suppose $S \in Q$, so there is $C \in \mathcal{A}$ s.t.

$$\frac{P(C) - S(C)}{\sqrt{\dfrac{P(C) + S(C)}{2}}} > \epsilon.$$

Hence

$$S(C) < P(C) + \frac{\epsilon^2}{4} - \epsilon\sqrt{\frac{\epsilon^2}{16} + P(C)}.$$

Noting that $S(C) \ge 0$, some simple calculation shows that $P(C) > \epsilon^2/2$.
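The "simple calculation" in the last step can be spelled out as follows. This is a reconstruction under the definitions used in the proof, with the shorthand $p = P(C)$ and $s = S(C)$ introduced here for readability; it is not taken verbatim from the report.

```latex
% Shorthand (introduced for this sketch): p = P(C), s = S(C), with 0 <= s.
% S in Q means p - s > \epsilon \sqrt{(p+s)/2} >= 0, so in particular s < p.
\begin{align*}
p - s > \epsilon\sqrt{\tfrac{p+s}{2}}
  &\;\Longrightarrow\; (p-s)^2 > \tfrac{\epsilon^2}{2}(p+s) \\
  &\;\Longleftrightarrow\; s^2 - \Big(2p + \tfrac{\epsilon^2}{2}\Big)s + p^2 - \tfrac{\epsilon^2}{2}p > 0 .
\end{align*}
% The quadratic in s has roots
\[
s_{\pm} = p + \frac{\epsilon^2}{4} \pm \epsilon\sqrt{\frac{\epsilon^2}{16} + p},
\]
% and since s < p < s_+, the inequality forces s < s_-, which is the bound on S(C).
% A nonnegative s below s_- exists only if s_- > 0:
\begin{align*}
p + \frac{\epsilon^2}{4} > \epsilon\sqrt{\frac{\epsilon^2}{16} + p}
  &\;\Longleftrightarrow\; p^2 + \frac{\epsilon^2}{2}p + \frac{\epsilon^4}{16} > \frac{\epsilon^4}{16} + \epsilon^2 p \\
  &\;\Longleftrightarrow\; p\Big(p - \frac{\epsilon^2}{2}\Big) > 0
  \;\Longleftrightarrow\; p > \frac{\epsilon^2}{2},
\end{align*}
% where the last equivalence uses p > s >= 0.
```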
If we draw another set of $n$ samples $S'$, each drawn i.i.d. from $P$, and define

$$F = \frac{S'(C) - S(C)}{\sqrt{\dfrac{S'(C) + S(C)}{2}}},$$

we have $F > \epsilon$ if $S'(C) > P(C)$. This is because the function

$$f(x, y) = \frac{x - y}{\sqrt{(x + y)/2}}$$

is monotone increasing w.r.t. $x$ and monotone decreasing w.r.t. $y$ on $x, y \in (0, 1)$ (taking derivatives easily verifies this). So $\inf F$ is achieved when

$$S(C) = P(C) + \frac{\epsilon^2}{4} - \epsilon\sqrt{\frac{\epsilon^2}{16} + P(C)} \quad\text{and}\quad S'(C) = P(C).$$

Plugging in yields the value $\epsilon$, and the strict inequality follows from the strict inequalities about $S(C)$ and $S'(C)$.

The random variable $nS'(C)$ has binomial distribution $B(n, P(C))$. For $n > 4/\epsilon^2 > 2/P(C)$, we have $nS'(C) > nP(C)$ with probability at least $1/4$ ([4]). Therefore, for $n > 4/\epsilon^2$, we have $\Pr(Q) \le 4\Pr(R)$.

In [4], it is proved that

$$\Pr(R) \le \Pi_{\mathcal{A}}(2n)\, e^{-n\epsilon^2/4}.$$

Thus

$$P^n\Bigg\{\sup_{A \in \mathcal{A}} \frac{P(A) - S(A)}{\sqrt{\dfrac{P(A) + S(A)}{2}}} > \epsilon\Bigg\} \le 4\,\Pi_{\mathcal{A}}(2n)\, e^{-n\epsilon^2/4}.$$

Note that this inequality is trivially satisfied if $n \le 4/\epsilon^2$, since its right-hand side then exceeds 1. By symmetry we have

$$P^n\big(\phi_{\mathcal{A}}(S, P) > \epsilon\big) \le 8\,\Pi_{\mathcal{A}}(2n)\, e^{-n\epsilon^2/4}.$$

Similar to Corollary 3.2, we have the following corollary of Theorem 3.4, which bounds the probability that the empirical relative A-distance deviates from the true relative A-distance.

Corollary 3.5: Let $P_1, P_2$ be any probability distributions over some domain $X$, let $\mathcal{A}$ be a family of subsets of $X$, and let $\epsilon \in (0, 1)$. If $S_1, S_2$ are two collections of $n$ samples each, drawn i.i.d. from $P_1, P_2$ respectively, then

$$P^n\Big[\,\big|\phi_{\mathcal{A}}(P_1, P_2) - \phi_{\mathcal{A}}(S_1, S_2)\big| > \epsilon\Big] \le 16\,\Pi_{\mathcal{A}}(2n)\, e^{-n\epsilon^2/16},$$

where $P^n$ in the above inequality is the probability over the pairs of samples $(S_1, S_2)$ induced by the sample-generating distributions $(P_1, P_2)$.
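Proof: Because $\phi_{\mathcal{A}}(\cdot, \cdot)$ is a metric on [0, 1] ([5]), we have

$$\phi_{\mathcal{A}}(P_1, P_2) \le \phi_{\mathcal{A}}(P_1, S_1) + \phi_{\mathcal{A}}(S_1, S_2) + \phi_{\mathcal{A}}(S_2, P_2)$$

and, by the same argument with the roles of the samples and the distributions exchanged,

$$\phi_{\mathcal{A}}(S_1, S_2) \le \phi_{\mathcal{A}}(S_1, P_1) + \phi_{\mathcal{A}}(P_1, P_2) + \phi_{\mathcal{A}}(P_2, S_2).$$

Therefore,

$$\big|\phi_{\mathcal{A}}(P_1, P_2) - \phi_{\mathcal{A}}(S_1, S_2)\big| \le \phi_{\mathcal{A}}(P_1, S_1) + \phi_{\mathcal{A}}(S_2, P_2),$$

and

$$
\begin{aligned}
\Pr\Big\{\big|\phi_{\mathcal{A}}(P_1, P_2) - \phi_{\mathcal{A}}(S_1, S_2)\big| > \epsilon\Big\}
&\le \Pr\Big\{\phi_{\mathcal{A}}(P_1, S_1) + \phi_{\mathcal{A}}(P_2, S_2) > \epsilon\Big\} && (6)\\
&\le \Pr\Big\{\phi_{\mathcal{A}}(P_1, S_1) > \tfrac{\epsilon}{2}\Big\} + \Pr\Big\{\phi_{\mathcal{A}}(P_2, S_2) > \tfrac{\epsilon}{2}\Big\} && (7)\\
&\le 16\,\Pi_{\mathcal{A}}(2n)\, e^{-n\epsilon^2/16}, && (8)
\end{aligned}
$$

where the last inequality comes from Theorem 3.4.

REFERENCES

[1] B. Brodsky and B. Darkovsky, Non-Parametric Methods in Change-Point Problems, Kluwer Academic, The Netherlands.
[2] J. Shao, Mathematical Statistics, Springer.
[3] L. Györfi, Principles of Nonparametric Learning, Springer Wien New York, 2002.
[4] M. Anthony and J. Shawe-Taylor, "A result of Vapnik with applications," Discrete Applied Mathematics, vol. 47.
[5] S. Ben-David, J. Gehrke and D. Kifer, "Detecting Change in Data Streams," in Proc. 2004 VLDB Conference, Toronto, Canada, 2004.
[6] V. N. Vapnik and A. Ya. Chervonenkis, "On the uniform convergence of relative frequency of events to their probabilities," Theory of Probability and its Applications, vol. 16, pp. 264-280, 1971.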