E0 370 Statistical Learning Theory Lecture 5 (Aug 25, 2011)
Covering Numbers, Pseudo-Dimension, and Fat-Shattering Dimension
Lecturer: Shivani Agarwal   Scribe: Shivani Agarwal

1 Introduction

So far we have seen how to obtain high confidence bounds on the generalization error er^{0-1}_D[h_S] of a binary classifier h_S learned by an algorithm from a function class H ⊆ {−1, +1}^X of limited capacity, using the ideas of uniform convergence. We saw the use of the growth function Π_H(m) to measure the capacity of the class H, as well as the VC-dimension VCdim(H), which provides a one-number summary of the capacity of H.

In this lecture we will consider the problem of regression, or learning of real-valued functions. Here we are given a training sample S = ((x_1, y_1), ..., (x_m, y_m)) ∈ (X × Y)^m for some Y ⊆ R, assumed now to be drawn from D^m for some distribution D on X × Y, and the goal is to learn from this sample a real-valued function f_S : X → R that has low generalization error er^l_D[f_S] w.r.t. some appropriate loss function l : Y × R → [0, ∞), such as the squared loss l_sq given by l_sq(y, ŷ) = (ŷ − y)². As before, we will be interested in obtaining high confidence bounds on the generalization error of the learned function, er^l_D[f_S]. Again, we will consider learning algorithms that learn f_S from a function class F ⊆ R^X of limited capacity, and obtain generalization error bounds for such algorithms via a uniform convergence result that upper bounds the probability

    P_{S ∼ D^m}( sup_{f ∈ F} ( er^l_D[f] − er^l_S[f] ) ≥ ε ).

However, in order to derive such a result, we will need a different notion of capacity for a class of real-valued functions F. In particular, we will use the covering numbers of F, which will play a role analogous to that played by the growth function in the case of binary-valued function classes.

2 Covering Numbers

We start by considering covering numbers of subsets of a general metric space. We will then specialize this to subsets of Euclidean space, and use this to define covering numbers for a real-valued function class.
2.1 Covering Numbers in a General Metric Space

Let (A, d) be a metric space.¹ Let W ⊆ A and let ε > 0. A set C ⊆ W is said to be a (proper²) ε-cover of W w.r.t. d if for every w ∈ W, there exists c ∈ C such that d(w, c) < ε. In other words, C ⊆ W is an ε-cover of W w.r.t. d if the union of open d-balls³ of radius ε centered at points in C contains W:

    W ⊆ ∪_{c ∈ C} B_d(c, ε).

If W has a finite ε-cover w.r.t. d, then we define the ε-covering number of W w.r.t. d to be the cardinality of the smallest ε-cover of W:

    N(ε, W, d) = min{ |C| : C is an ε-cover of W w.r.t. d }.

¹ Recall that a metric space (A, d) consists of a set A together with a metric d : A × A → [0, ∞) that satisfies the following for all x, y, z ∈ A: (1) d(x, y) = 0 ⟺ x = y; (2) d(x, y) = d(y, x); and (3) d(x, z) ≤ d(x, y) + d(y, z).
² Sometimes it is also convenient to consider improper covers C ⊆ A which need not be contained in W.
³ Recall that the open d-ball of radius ε centered at x ∈ A is defined as B_d(x, ε) = { y ∈ A : d(x, y) < ε }.
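To make the definition concrete, here is a small numerical sketch (our own illustration; `greedy_cover` is a hypothetical helper, not from the notes). A greedy pass builds a proper ε-cover of a finite set W; its size upper-bounds the covering number N(ε, W, d), though it need not find the minimum cover.

```python
def greedy_cover(W, d, eps):
    """Greedily build a proper eps-cover C of W: scan the points and add
    any point not already within eps of an existing center. Every point
    ends up within eps of some center, so C is a valid eps-cover, and
    len(C) upper-bounds N(eps, W, d)."""
    C = []
    for w in W:
        if all(d(w, c) >= eps for c in C):  # w is not yet covered
            C.append(w)
    return C

# Example: W = {0, 0.1, ..., 0.9} on the real line with d(x, y) = |x - y|.
W = [i / 10 for i in range(10)]
d = lambda x, y: abs(x - y)
C = greedy_cover(W, d, 0.25)
print(len(C))  # → 4, so N(0.25, W, d) <= 4
```

Each center covers an open interval of radius 0.25, so the greedy pass picks roughly every third grid point here.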
If W does not have a finite ε-cover w.r.t. d, we take N(ε, W, d) = ∞. Thus the covering numbers of W can be viewed as measuring the extent of W in (A, d) at granularity or "scale" ε.

2.2 Covering Numbers in Euclidean Space

Consider now A = R^n. We can define a number of different metrics on R^n, including in particular the following:

    d_1(x, x′) = (1/n) Σ_{i=1}^n |x_i − x′_i|
    d_2(x, x′) = ( (1/n) Σ_{i=1}^n (x_i − x′_i)² )^{1/2}
    d_∞(x, x′) = max_i |x_i − x′_i|.

Accordingly, for any W ⊆ R^n, we can define the corresponding covering numbers N(ε, W, d_p) for p = 1, 2, ∞. It is easy to see that d_1(x, x′) ≤ d_2(x, x′) (by Jensen's inequality) and that d_2(x, x′) ≤ d_∞(x, x′), from which it follows that the corresponding covering numbers satisfy the relation

    N(ε, W, d_1) ≤ N(ε, W, d_2) ≤ N(ε, W, d_∞).

2.3 Uniform Covering Numbers for a Real-Valued Function Class

Now let F be a class of real-valued functions on X, and let x = (x_1, ..., x_m) ∈ X^m. Then F|_x = { (f(x_1), ..., f(x_m)) : f ∈ F } ⊆ R^m. For any ε > 0 and m ∈ N, the uniform d_p covering numbers of F (for p = 1, 2, ∞) are defined as

    N_p(ε, F, m) = max_{x ∈ X^m} N(ε, F|_x, d_p)

if N(ε, F|_x, d_p) is finite for all x ∈ X^m, and N_p(ε, F, m) = ∞ otherwise.

This should be compared with the definition of the growth function for a class of binary-valued functions H, which also involved a maximum over x ∈ X^m: in that case, H|_x was finite, and the maximum was over the cardinality of H|_x; here, F|_x may in general be infinite, and the maximum is over the extent of F|_x in R^m at scale ε, as measured using the metric d_p. In particular, for H ⊆ {−1, +1}^X, we have that for any ε < 1, N(ε, H|_x, d_∞) = |H|_x|, and therefore N_∞(ε, H, m) = Π_H(m). Thus the uniform covering numbers can be viewed as generalizing the notion of growth function to classes of real-valued functions.

Note that the term "uniform" here refers to the maximum over all x ∈ X^m of the covering numbers of F|_x in R^m, and is unrelated to the use of the term "uniform" in uniform convergence, which refers to the supremum over functions f ∈ F. Unless otherwise stated, in what follows we will refer to the uniform covering numbers of a function class F as simply the covering numbers of F.
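The pointwise chain d_1 ≤ d_2 ≤ d_∞ behind the covering-number relation can be checked numerically; a minimal sketch (the function names are ours, and the 1/n normalization is exactly what makes Jensen's inequality apply):

```python
import random

def d1(x, xp):
    """Normalized L1 metric: (1/n) * sum |x_i - x'_i|."""
    return sum(abs(a - b) for a, b in zip(x, xp)) / len(x)

def d2(x, xp):
    """Normalized L2 metric: sqrt((1/n) * sum (x_i - x'_i)^2)."""
    return (sum((a - b) ** 2 for a, b in zip(x, xp)) / len(x)) ** 0.5

def dinf(x, xp):
    """L-infinity metric: max |x_i - x'_i|."""
    return max(abs(a - b) for a, b in zip(x, xp))

random.seed(0)
for _ in range(1000):
    x  = [random.uniform(-1, 1) for _ in range(5)]
    xp = [random.uniform(-1, 1) for _ in range(5)]
    assert d1(x, xp) <= d2(x, xp) + 1e-12
    assert d2(x, xp) <= dinf(x, xp) + 1e-12
print("d1 <= d2 <= dinf holds on 1000 random pairs")
```

Since a d_∞-ball is contained in the corresponding d_2-ball, which is contained in the d_1-ball, any d_∞-cover is also a d_2-cover and a d_1-cover, giving the stated ordering of covering numbers.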
3 Uniform Convergence in a Real-Valued Function Class F

We can assume that functions in F take values in some set Ŷ ⊆ R, so that F ⊆ Ŷ^X. We will require the loss function l to be bounded, i.e. we will assume there exists B > 0 such that 0 ≤ l(y, ŷ) ≤ B for all y ∈ Y, ŷ ∈ Ŷ. We will find it useful to define, for any function class F ⊆ Ŷ^X and loss l : Y × Ŷ → [0, B], the loss function class l_F ⊆ [0, B]^{X × Y} given by

    l_F = { l_f : X × Y → [0, B] | l_f(x, y) = l(y, f(x)) for some f ∈ F }.

We will first prove a uniform convergence result for general losses l as above in terms of the d_1 covering numbers of the loss function class l_F, and will then show that for many losses l, including the squared loss when Y and Ŷ are bounded, the d_1 covering numbers of l_F can further be bounded in terms of the d_1 covering numbers of F.
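As a toy illustration of the quantity such a result controls, the following sketch (our own example, not from the notes) estimates the uniform deviation sup_{f ∈ F} ( er^l_D[f] − er^l_S[f] ) for a small hypothetical class of constant predictors under the squared loss, and shows it shrinking as the sample size m grows:

```python
import random

def sup_deviation(m, rng, cs):
    """Draw a sample S of size m from a toy D (x ~ U[-1,1], y = x) and
    return sup over the constant predictors f_c, c in cs, of
    er_D[f_c] - er_S[f_c] under squared loss.  For f_c(x) = c we have
    er_D[f_c] = c^2 + 1/3, since E[y] = 0 and E[y^2] = 1/3."""
    ys = [rng.uniform(-1, 1) for _ in range(m)]
    ybar = sum(ys) / m
    y2bar = sum(y * y for y in ys) / m
    # er_D - er_S = (c^2 + 1/3) - (c^2 - 2*c*ybar + y2bar)
    #             = 2*c*ybar + 1/3 - y2bar
    return max(2 * c * ybar + 1 / 3 - y2bar for c in cs)

rng = random.Random(0)
cs = [i / 10 for i in range(-10, 11)]   # 21 constant predictors, F = {f_c}
avg = {m: sum(sup_deviation(m, rng, cs) for _ in range(300)) / 300
       for m in (50, 3200)}
print(avg)  # the average uniform deviation shrinks roughly like 1/sqrt(m)
```

For this finite class a union bound already suffices; covering numbers are what replace the union bound when F is infinite.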
Theorem 3.1. Let Y, Ŷ ⊆ R.¹ Let F ⊆ Ŷ^X, and let l : Y × Ŷ → [0, B]. Let D be any distribution on X × Y. For any ε > 0,

    P_{S ∼ D^m}( sup_{f ∈ F} ( er^l_D[f] − er^l_S[f] ) ≥ ε ) ≤ 2 N_1(ε/8, l_F, 2m) e^{−mε²/(32B²)}.

Proof. The proof uses similar techniques as in the proof of uniform convergence for the 0-1 loss in the binary case that we saw in Lecture 3, and has the same broad steps. The key difference is in the reduction to a finite class (Step 3).

Step 1: Symmetrization. Following the same steps as in Lecture 3, we can show that for m ≥ 8B²/ε²,

    P_{S ∼ D^m}( sup_{f ∈ F} ( er^l_D[f] − er^l_S[f] ) ≥ ε ) ≤ 2 P_{(S, S̃) ∼ D^m × D^m}( sup_{f ∈ F} ( er^l_{S̃}[f] − er^l_S[f] ) ≥ ε/2 ).

Step 2: Swapping permutations. Again using the same argument as in Lecture 3 (with Γ denoting, as there, the set of swapping permutations of the 2m points, and σ drawn uniformly from Γ), we can show that

    P_{(S, S̃) ∼ D^m × D^m}( sup_{f ∈ F} ( er^l_{S̃}[f] − er^l_S[f] ) ≥ ε/2 ) ≤ sup_{S, S̃ ∈ (X × Y)^m} P_{σ ∼ Γ}( sup_{f ∈ F} ( er^l_{σ(S̃)}[f] − er^l_{σ(S)}[f] ) ≥ ε/2 ).

Step 3: Reduction to a finite class. Fix any S, S̃ ∈ (X × Y)^m, and for simplicity, define (x_{m+i}, y_{m+i}) = (x̃_i, ỹ_i) for all i ∈ [m]. Now consider l_F|_{(S, S̃)} ⊆ [0, B]^{2m}. Let G ⊆ F be such that l_G|_{(S, S̃)} is an ε/8-cover of l_F|_{(S, S̃)} w.r.t. d_1. Clearly, we can take |G| = N(ε/8, l_F|_{(S, S̃)}, d_1) ≤ N_1(ε/8, l_F, 2m); since l_F maps to a bounded interval, this is a finite number.

Now consider any σ ∈ Γ. We claim that if there exists f ∈ F such that

    er^l_{σ(S̃)}[f] − er^l_{σ(S)}[f] ≥ ε/2,    (*)

then there exists g ∈ G such that

    er^l_{σ(S̃)}[g] − er^l_{σ(S)}[g] ≥ ε/4.    (**)

To see this, let f ∈ F be such that (*) holds. Take any g ∈ G for which

    (1/(2m)) Σ_{i=1}^{2m} | l_f(x_i, y_i) − l_g(x_i, y_i) | < ε/8.

Such a g exists since l_G|_{(S, S̃)} is an ε/8-cover of l_F|_{(S, S̃)} w.r.t. d_1. We will show that g satisfies (**). In particular, we have

    er^l_{σ(S̃)}[g] − er^l_{σ(S)}[g]
      = ( er^l_{σ(S̃)}[f] − er^l_{σ(S)}[f] ) − ( er^l_{σ(S̃)}[f] − er^l_{σ(S̃)}[g] ) + ( er^l_{σ(S)}[f] − er^l_{σ(S)}[g] )
      ≥ ε/2 − | er^l_{σ(S̃)}[f] − er^l_{σ(S̃)}[g] | − | er^l_{σ(S)}[f] − er^l_{σ(S)}[g] |
      ≥ ε/2 − (1/m) Σ_{i=m+1}^{2m} | l_f(x_{σ(i)}, y_{σ(i)}) − l_g(x_{σ(i)}, y_{σ(i)}) | − (1/m) Σ_{i=1}^{m} | l_f(x_{σ(i)}, y_{σ(i)}) − l_g(x_{σ(i)}, y_{σ(i)}) |
      = ε/2 − (1/m) Σ_{i=1}^{2m} | l_f(x_i, y_i) − l_g(x_i, y_i) |   (since σ merely permutes the 2m indices)
      > ε/2 − 2(ε/8) = ε/4.

¹ Y can be thought of as the space of true labels, and Ŷ the space of predicted labels.
The claim follows. Thus we have

    P_{σ ∼ Γ}( sup_{f ∈ F} ( er^l_{σ(S̃)}[f] − er^l_{σ(S)}[f] ) ≥ ε/2 )
      ≤ P_{σ ∼ Γ}( max_{g ∈ G} ( er^l_{σ(S̃)}[g] − er^l_{σ(S)}[g] ) ≥ ε/4 )
      ≤ N_1(ε/8, l_F, 2m) · max_{g ∈ G} P_{σ ∼ Γ}( er^l_{σ(S̃)}[g] − er^l_{σ(S)}[g] ≥ ε/4 )   (by the union bound).

Step 4: Hoeffding's inequality. As in Lecture 3, Hoeffding's inequality can now be used to show that for any g ∈ G,

    P_{σ ∼ Γ}( er^l_{σ(S̃)}[g] − er^l_{σ(S)}[g] ≥ ε/4 )
      = P_{r ∼ Uniform({±1}^m)}( (1/m) Σ_{i=1}^m r_i ( l_g(x̃_i, ỹ_i) − l_g(x_i, y_i) ) ≥ ε/4 )
      ≤ e^{−mε²/(32B²)}.

Putting everything together yields the desired result for m ≥ 8B²/ε²; for m < 8B²/ε², the result holds trivially, since the right hand side is then greater than 1. ∎

The above result yields a high-confidence bound on the generalization error of a function learned from F in terms of covering numbers of l_F. For well-behaved loss functions l, these can be further bounded in terms of covering numbers of F:

Lemma 3.2. Let Y, Ŷ ⊆ R. Let F ⊆ Ŷ^X, and let l : Y × Ŷ → [0, B]. If l is Lipschitz in its second argument with Lipschitz constant L > 0, i.e.

    | l(y, ŷ) − l(y, ŷ′) | ≤ L | ŷ − ŷ′ |   for all y ∈ Y, ŷ, ŷ′ ∈ Ŷ,

then for any ε > 0 and m ∈ N,

    N_1(ε, l_F, m) ≤ N_1(ε/L, F, m).

Proof. Let S = ((x_1, y_1), ..., (x_m, y_m)) ∈ (X × Y)^m, and let f, g ∈ F. Then

    (1/m) Σ_{i=1}^m | l_f(x_i, y_i) − l_g(x_i, y_i) | = (1/m) Σ_{i=1}^m | l(y_i, f(x_i)) − l(y_i, g(x_i)) |
      ≤ L · (1/m) Σ_{i=1}^m | f(x_i) − g(x_i) |.

Thus any (ε/L)-cover for F|_x w.r.t. d_1 yields an ε-cover for l_F|_S w.r.t. d_1, which implies the result. ∎

This yields the following corollary:

Corollary 3.3. Let Y, Ŷ ⊆ R, F ⊆ Ŷ^X, and l : Y × Ŷ → [0, B] such that l is Lipschitz in its second argument with Lipschitz constant L > 0. Let D be any distribution on X × Y. Then for any ε > 0:

    P_{S ∼ D^m}( sup_{f ∈ F} ( er^l_D[f] − er^l_S[f] ) ≥ ε ) ≤ 2 N_1(ε/(8L), F, 2m) e^{−mε²/(32B²)}.

As an example, consider the squared loss l_sq(y, ŷ) = (ŷ − y)². It can be shown that if Y, Ŷ are bounded, then l_sq is bounded and Lipschitz. In particular, for Y = Ŷ = [−1, 1], we have 0 ≤ l_sq(y, ŷ) ≤ 4 for all y ∈ Y, ŷ ∈ Ŷ,
and l_sq is Lipschitz with Lipschitz constant L = 4:

    | l_sq(y, ŷ) − l_sq(y, ŷ′) | = | (ŷ − y)² − (ŷ′ − y)² |
      = | ŷ² − ŷ′² − 2y(ŷ − ŷ′) |
      = | ŷ + ŷ′ − 2y | · | ŷ − ŷ′ |
      ≤ 4 | ŷ − ŷ′ |   since y, ŷ, ŷ′ ∈ [−1, 1].

Thus when both labels and predictions are in [−1, 1], one gets for the squared loss:

Corollary 3.4. Let Y = Ŷ = [−1, 1], F ⊆ Ŷ^X, and l_sq : Y × Ŷ → [0, 4] be given by l_sq(y, ŷ) = (ŷ − y)². Let D be any distribution on X × Y. Then for any ε > 0:

    P_{S ∼ D^m}( sup_{f ∈ F} ( er^sq_D[f] − er^sq_S[f] ) ≥ ε ) ≤ 2 N_1(ε/32, F, 2m) e^{−mε²/512}.

Exercise. Show that for Y = Ŷ = [−1, 1], the absolute loss given by l_abs(y, ŷ) = | ŷ − y | for all y ∈ Y, ŷ ∈ Ŷ is bounded and is Lipschitz in its second argument with Lipschitz constant L = 1.

4 Pseudo-Dimension and Fat-Shattering Dimension

Just as the growth function Π_H(m) needed to be sub-exponential in m for the uniform convergence result in the binary classification case to be meaningful, the covering numbers N_1(ε/8, l_F, 2m) (or N_1(ε/(8L), F, 2m)) need to be sub-exponential in m for the above result to be meaningful. In the binary case, we saw that if the VC-dimension of H is finite, then the growth function of H grows polynomially in m. Analogous results can be shown to hold for the covering numbers. We provide some basic definitions and results here; further details can be found for example in [1].

Definition (Pseudo-dimension). Let F ⊆ R^X and let x = (x_1, ..., x_m) ∈ X^m. We say x is pseudo-shattered by F if there exists r = (r_1, ..., r_m) ∈ R^m such that for every b = (b_1, ..., b_m) ∈ {−1, +1}^m, there exists f_b ∈ F such that sign( f_b(x_i) − r_i ) = b_i for all i ∈ [m]. The pseudo-dimension of F is the cardinality of the largest set of points in X that can be pseudo-shattered by F:

    Pdim(F) = max{ m ∈ N : ∃ x ∈ X^m such that x is pseudo-shattered by F }.

If F pseudo-shatters arbitrarily large sets of points in X, we say Pdim(F) = ∞.

Fact. If F is a vector space of real-valued functions, then Pdim(F) = dim(F). For example, for the class of all affine functions over R^n given by F = { f : R^n → R | f(x) = w · x + b for some w ∈ R^n, b ∈ R }, we have Pdim(F) = n + 1.

Clearly, if F ⊆ F′, then Pdim(F) ≤ Pdim(F′).

Definition (Fat-shattering dimension). Let F ⊆ R^X and let x = (x_1, ..., x_m) ∈ X^m. Let γ > 0.
We say x is γ-shattered by F if there exists r = (r_1, ..., r_m) ∈ R^m such that for every b = (b_1, ..., b_m) ∈ {−1, +1}^m, there exists f_b ∈ F such that b_i ( f_b(x_i) − r_i ) ≥ γ for all i ∈ [m]. The γ-dimension of F, or the fat-shattering dimension of F at scale γ, is the cardinality of the largest set of points in X that can be γ-shattered by F:

    fat_F(γ) = max{ m ∈ N : ∃ x ∈ X^m such that x is γ-shattered by F }.

If F γ-shatters arbitrarily large sets of points in X, we say fat_F(γ) = ∞.

Clearly, fat_F(γ) ≤ Pdim(F) for all γ > 0. The fat-shattering dimension is often called a scale-sensitive dimension since it depends on the scale γ. Both quantities, when finite, can be used to bound the covering numbers of a function class F whose functions take values in a bounded range:
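These definitions can be checked by brute force on tiny examples. The sketch below is our own construction (`realizes_all_patterns` is a hypothetical helper): it tests whether a finite grid of affine functions on R realizes every sign pattern on a point set for one particular witness r. By the fact above, the pseudo-dimension of affine functions on R is 2, and no affine function can realize an alternating sign pattern on three collinear points with witness r = 0.

```python
from itertools import product

def realizes_all_patterns(fs, xs, rs):
    """Return True if the function list fs realizes every sign pattern
    b in {-1,+1}^m on the points xs w.r.t. the witness thresholds rs.
    (Pseudo-shattering asks whether SOME witness r works; this checks one.)"""
    patterns = {tuple(1 if f(x) > r else -1 for x, r in zip(xs, rs)) for f in fs}
    return all(b in patterns for b in product((-1, 1), repeat=len(xs)))

# A finite grid of affine functions f(x) = w*x + b on R.
grid = [i / 2 for i in range(-4, 5)]          # w, b in {-2, -1.5, ..., 2}
affine = [lambda x, w=w, b=b: w * x + b for w in grid for b in grid]

print(realizes_all_patterns(affine, [0.0, 1.0], [0.0, 0.0]))
# → True: two points are pseudo-shattered (consistent with Pdim = n + 1 = 2)
print(realizes_all_patterns(affine, [0.0, 1.0, 2.0], [0.0, 0.0, 0.0]))
# → False: no affine f has f(0) > 0, f(1) <= 0, f(2) > 0
```

The fat-shattering analogue would replace the sign test `f(x) > r` with the margin condition `b * (f(x) - r) >= gamma`, so each pattern must be realized with a margin of at least γ around the witness.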
Theorem 4.1. Let F ⊆ [a, b]^X for some a ≤ b. Let 0 < ε ≤ b − a, and let fat_F(ε/8) = d < ∞. Then for m ≥ d,

    ln N_1(ε, F, m) = O( d log²( m / (εd) ) ).

Theorem 4.2. Let F ⊆ [a, b]^X for some a ≤ b. Let Pdim(F) = d < ∞. Then for all 0 < ε ≤ b − a and m ≥ d,

    N_1(ε, F, m) = O( ( (b − a)/ε )^d ).

5 Next Lecture

In the next lecture, we will return to binary classification, but will focus on learning functions of the form h(x) = sign(f(x)) for some real-valued function f, and will see how the quantities considered in this lecture can be useful in such situations.

References

[1] Martin Anthony and Peter L. Bartlett. Neural Network Learning: Theoretical Foundations. Cambridge University Press, 1999.
More informationCurious Bounds for Floor Function Sums
1 47 6 11 Journal of Integer Sequences, Vol. 1 (018), Article 18.1.8 Curious Bounds for Floor Function Sus Thotsaporn Thanatipanonda and Elaine Wong 1 Science Division Mahidol University International
More informationLecture 21 Principle of Inclusion and Exclusion
Lecture 21 Principle of Inclusion and Exclusion Holden Lee and Yoni Miller 5/6/11 1 Introduction and first exaples We start off with an exaple Exaple 11: At Sunnydale High School there are 28 students
More informationlecture 36: Linear Multistep Mehods: Zero Stability
95 lecture 36: Linear Multistep Mehods: Zero Stability 5.6 Linear ultistep ethods: zero stability Does consistency iply convergence for linear ultistep ethods? This is always the case for one-step ethods,
More informationarxiv: v1 [math.nt] 14 Sep 2014
ROTATION REMAINDERS P. JAMESON GRABER, WASHINGTON AND LEE UNIVERSITY 08 arxiv:1409.411v1 [ath.nt] 14 Sep 014 Abstract. We study properties of an array of nubers, called the triangle, in which each row
More informationFixed-to-Variable Length Distribution Matching
Fixed-to-Variable Length Distribution Matching Rana Ali Ajad and Georg Böcherer Institute for Counications Engineering Technische Universität München, Gerany Eail: raa2463@gail.co,georg.boecherer@tu.de
More informationTight Bounds for Maximal Identifiability of Failure Nodes in Boolean Network Tomography
Tight Bounds for axial Identifiability of Failure Nodes in Boolean Network Toography Nicola Galesi Sapienza Università di Roa nicola.galesi@uniroa1.it Fariba Ranjbar Sapienza Università di Roa fariba.ranjbar@uniroa1.it
More informationMetric spaces and metrizability
1 Motivation Metric spaces and metrizability By this point in the course, this section should not need much in the way of motivation. From the very beginning, we have talked about R n usual and how relatively
More informationNew Bounds for Learning Intervals with Implications for Semi-Supervised Learning
JMLR: Workshop and Conference Proceedings vol (1) 1 15 New Bounds for Learning Intervals with Iplications for Sei-Supervised Learning David P. Helbold dph@soe.ucsc.edu Departent of Coputer Science, University
More informationEnergy-Efficient Threshold Circuits Computing Mod Functions
Energy-Efficient Threshold Circuits Coputing Mod Functions Akira Suzuki Kei Uchizawa Xiao Zhou Graduate School of Inforation Sciences, Tohoku University Araaki-aza Aoba 6-6-05, Aoba-ku, Sendai, 980-8579,
More informationA Theoretical Framework for Deep Transfer Learning
A Theoretical Fraewor for Deep Transfer Learning Toer Galanti The School of Coputer Science Tel Aviv University toer22g@gail.co Lior Wolf The School of Coputer Science Tel Aviv University wolf@cs.tau.ac.il
More informationA := A i : {A i } S. is an algebra. The same object is obtained when the union in required to be disjoint.
59 6. ABSTRACT MEASURE THEORY Having developed the Lebesgue integral with respect to the general easures, we now have a general concept with few specific exaples to actually test it on. Indeed, so far
More informationFeature Extraction Techniques
Feature Extraction Techniques Unsupervised Learning II Feature Extraction Unsupervised ethods can also be used to find features which can be useful for categorization. There are unsupervised ethods that
More informationNonmonotonic Networks. a. IRST, I Povo (Trento) Italy, b. Univ. of Trento, Physics Dept., I Povo (Trento) Italy
Storage Capacity and Dynaics of Nononotonic Networks Bruno Crespi a and Ignazio Lazzizzera b a. IRST, I-38050 Povo (Trento) Italy, b. Univ. of Trento, Physics Dept., I-38050 Povo (Trento) Italy INFN Gruppo
More informationMachine Learning. VC Dimension and Model Complexity. Eric Xing , Fall 2015
Machine Learning 10-701, Fall 2015 VC Dimension and Model Complexity Eric Xing Lecture 16, November 3, 2015 Reading: Chap. 7 T.M book, and outline material Eric Xing @ CMU, 2006-2015 1 Last time: PAC and
More informationThe isomorphism problem of Hausdorff measures and Hölder restrictions of functions
The isoorphis proble of Hausdorff easures and Hölder restrictions of functions Doctoral thesis András Máthé PhD School of Matheatics Pure Matheatics Progra School Leader: Prof. Miklós Laczkovich Progra
More informationApproximation by Piecewise Constants on Convex Partitions
Aroxiation by Piecewise Constants on Convex Partitions Oleg Davydov Noveber 4, 2011 Abstract We show that the saturation order of iecewise constant aroxiation in L nor on convex artitions with N cells
More informationLecture 4: Completion of a Metric Space
15 Lecture 4: Completion of a Metric Space Closure vs. Completeness. Recall the statement of Lemma??(b): A subspace M of a metric space X is closed if and only if every convergent sequence {x n } X satisfying
More informatione-companion ONLY AVAILABLE IN ELECTRONIC FORM
OPERATIONS RESEARCH doi 10.1287/opre.1070.0427ec pp. ec1 ec5 e-copanion ONLY AVAILABLE IN ELECTRONIC FORM infors 07 INFORMS Electronic Copanion A Learning Approach for Interactive Marketing to a Custoer
More informationInfinitely Many Trees Have Non-Sperner Subtree Poset
Order (2007 24:133 138 DOI 10.1007/s11083-007-9064-2 Infinitely Many Trees Have Non-Sperner Subtree Poset Andrew Vince Hua Wang Received: 3 April 2007 / Accepted: 25 August 2007 / Published online: 2 October
More informationProbabilistic Machine Learning
Probabilistic Machine Learning by Prof. Seungchul Lee isystes Design Lab http://isystes.unist.ac.kr/ UNIST Table of Contents I.. Probabilistic Linear Regression I... Maxiu Likelihood Solution II... Maxiu-a-Posteriori
More informationMetric Entropy of Convex Hulls
Metric Entropy of Convex Hulls Fuchang Gao University of Idaho Abstract Let T be a precopact subset of a Hilbert space. The etric entropy of the convex hull of T is estiated in ters of the etric entropy
More informationMachine Learning Theory (CS 6783)
Machine Learning Theory (CS 6783) Tu-Th 1:25 to 2:40 PM Kimball, B-11 Instructor : Karthik Sridharan ABOUT THE COURSE No exams! 5 assignments that count towards your grades (55%) One term project (40%)
More informationLeast Squares Regression
E0 70 Machine Learning Lecture 4 Jan 7, 03) Least Squares Regression Lecturer: Shivani Agarwal Disclaimer: These notes are a brief summary of the topics covered in the lecture. They are not a substitute
More information