VC Dimension and Sauer's Lemma


CMSC (Spring 2008) Learning Theory
Lecture: VC Dimension and Sauer's Lemma
Instructors: Sham Kakade and Ambuj Tewari

1 Rademacher Averages and Growth Function

Theorem 1.1. Let $F$ be a class of $\pm 1$-valued functions. Then we have,

$$R_m(F) \le \sqrt{\frac{2 \ln \Pi_F(m)}{m}}.$$

Proof. We have,

$$
\begin{aligned}
R_m(F) &= E\left[ E\left[ \sup_{a \in F_{|X_1^m}} \frac{1}{m} \sum_{i=1}^{m} \epsilon_i a_i \,\Big|\, X_1^m \right] \right] \\
&\le E\left[ \frac{\sqrt{m}\,\sqrt{2 \ln |F_{|X_1^m}|}}{m} \right] \\
&\le E\left[ \frac{\sqrt{m}\,\sqrt{2 \ln \Pi_F(m)}}{m} \right] = \sqrt{\frac{2 \ln \Pi_F(m)}{m}}.
\end{aligned}
$$

Here $F_{|X_1^m}$ denotes the set of vectors $(f(X_1), \ldots, f(X_m))$ with $f \in F$. Since $f(X_i) \in \{\pm 1\}$, any $a \in F_{|X_1^m}$ has $\|a\| = \sqrt{m}$. The first inequality above therefore follows from Massart's finite class lemma. The second inequality follows from the definition of the growth function $\Pi_F(m)$.

Note that plugging in the trivial bound $\Pi_F(m) \le 2^m$ does not give us any interesting bound. This is quite reasonable, since this bound would hold for any function class no matter how complicated it is. To measure the complexity of $F$, let us look at the first natural number $m$ such that $\Pi_F(m)$ falls below $2^m$. This brings us to the definition of the Vapnik-Chervonenkis dimension.

2 Vapnik-Chervonenkis Dimension

The Vapnik-Chervonenkis dimension (or simply the VC-dimension) of a function class $F \subseteq \{\pm 1\}^{\mathcal{X}}$ is defined as

$$\mathrm{VCdim}(F) := \max\{\, m > 0 \mid \Pi_F(m) = 2^m \,\}.$$

An equivalent definition is that $\mathrm{VCdim}(F)$ is the size of the largest set shattered by $F$. A set $\{x_1, \ldots, x_m\}$ is said to be shattered by $F$ if for any labelling $b = (b_1, \ldots, b_m) \in \{\pm 1\}^m$, there is a function $f \in F$ such that $(f(x_1), \ldots, f(x_m)) = (b_1, \ldots, b_m)$.

Note that a function $f \in \{\pm 1\}^{\mathcal{X}}$ can be identified with the subset of $\mathcal{X}$ on which it is equal to $+1$. So, we often talk about the VC-dimension of a collection of subsets of $\mathcal{X}$. The table below gives the VC-dimensions for a few examples. The proofs of the claims in the first three rows of the table are left as exercises. Here we prove only the last claim: the VC-dimension of halfspaces in $\mathbb{R}^d$ is $d + 1$.
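The definition of shattering can be checked by brute force when both the class and the domain are finite. The sketch below is only an illustration under that assumption: the helper names (`restrictions`, `is_shattered`, `vc_dimension`) and the example class of intervals on a five-point domain are hypothetical and do not come from the lecture. Note that the value computed is the VC-dimension of the class restricted to the chosen finite domain, which in general only lower-bounds the VC-dimension over the full domain.

```python
from itertools import combinations


def restrictions(functions, points):
    """Distinct label vectors (f(x_1), ..., f(x_m)) produced by the class on `points`."""
    return {tuple(f(x) for x in points) for f in functions}


def is_shattered(functions, points):
    """True if all 2^m labellings of the m points are realized by the class."""
    return len(restrictions(functions, points)) == 2 ** len(points)


def vc_dimension(functions, domain):
    """Size of the largest shattered subset of a finite domain (0 if none)."""
    best = 0
    for m in range(1, len(domain) + 1):
        # Subsets of shattered sets are shattered, so stop at the first size that fails.
        if not any(is_shattered(functions, s) for s in combinations(domain, m)):
            break
        best = m
    return best


# Hypothetical class: +/-1 indicators of closed intervals [a, b] on the real line.
domain = [0, 1, 2, 3, 4]
endpoints = [-0.5, 0.5, 1.5, 2.5, 3.5, 4.5]
intervals = [
    (lambda x, a=a, b=b: 1 if a <= x <= b else -1)
    for a in endpoints
    for b in endpoints
    if a <= b
]

print(len(restrictions(intervals, domain)))  # labellings realized on the 5 points
print(vc_dimension(intervals, domain))       # largest shattered subset of the domain
```

On this domain the intervals realize 16 of the 32 possible labellings, and the largest shattered subset has size 2: no interval can label three ordered points $(+1, -1, +1)$.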

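$\mathcal{X}$        $F$                                  $\mathrm{VCdim}(F)$
$\mathbb{R}^2$       convex polygons                      $\infty$
$\mathbb{R}^2$       axis-aligned rectangles              $4$
$\mathbb{R}^2$       convex polygons with $d$ vertices    $2d + 1$
$\mathbb{R}^d$       halfspaces                           $d + 1$

Theorem 2.1. Let $\mathcal{X} = \mathbb{R}^d$. Define the set of $\pm 1$-valued functions associated with halfspaces,

$$F = \{\, x \mapsto \mathrm{sgn}(w \cdot x - \theta) \mid w \in \mathbb{R}^d,\ \theta \in \mathbb{R} \,\}.$$

Then, $\mathrm{VCdim}(F) = d + 1$.

Proof. We have to prove two inequalities:

$$\mathrm{VCdim}(F) \ge d + 1, \tag{1}$$
$$\mathrm{VCdim}(F) \le d + 1. \tag{2}$$

To prove the first inequality, we need to exhibit a particular set of size $d + 1$ that is shattered by $F$. Proving the second inequality is a bit more tricky: we need to show that for every set of size $d + 2$, there is a labelling that cannot be realized using halfspaces.

Let us first prove (1). Consider the set $X = \{0, e_1, \ldots, e_d\}$, which consists of the origin along with the vectors in the standard basis of $\mathbb{R}^d$. Given a labelling $b_0, \ldots, b_d$ of these points, set

$$\theta = -b_0, \qquad w_i = \theta + b_i, \quad i \in [d].$$

With these definitions, it immediately follows that $w \cdot 0 - \theta = b_0$ and, for all $i \in [d]$, $w \cdot e_i - \theta = b_i$. Thus, $X$ is shattered by $F$. Since $|X| = d + 1$, we have proved (1).

Before we prove (2), we need the following result from convex geometry.

Radon's Lemma. Let $X \subset \mathbb{R}^d$ be a set of size $d + 2$. Then there exist two disjoint subsets $X_1, X_2$ of $X$ such that $\mathrm{conv}(X_1) \cap \mathrm{conv}(X_2) \ne \emptyset$. Here $\mathrm{conv}(X)$ denotes the convex hull of $X$.

Proof. Let $X = \{x_1, \ldots, x_{d+2}\}$. Consider the following system of $d + 1$ equations in the variables $\lambda_1, \ldots, \lambda_{d+2}$,

$$
\begin{pmatrix} 1 & 1 & \cdots & 1 \\ x_1 & x_2 & \cdots & x_{d+2} \end{pmatrix}
\begin{pmatrix} \lambda_1 \\ \lambda_2 \\ \vdots \\ \lambda_{d+2} \end{pmatrix} = 0. \tag{3}
$$

Since there are more variables than equations, there is a non-trivial solution $\lambda \ne 0$. Define the sets of indices,

$$P = \{\, i \mid \lambda_i > 0 \,\}, \qquad N = \{\, j \mid \lambda_j < 0 \,\}.$$

Since $\lambda \ne 0$ and the first equation in (3) forces $\sum_i \lambda_i = 0$, both $P$ and $N$ are non-empty and

$$\sum_{i \in P} \lambda_i = \sum_{j \in N} (-\lambda_j) \ne 0.$$

Moreover, since $\lambda$ satisfies $\sum_{i=1}^{d+2} \lambda_i x_i = 0$, we have

$$\sum_{i \in P} \lambda_i x_i = \sum_{j \in N} (-\lambda_j) x_j.$$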

Defining $X_1 = \{x_i \in X \mid i \in P\}$ and $X_2 = \{x_i \in X \mid i \in N\}$, we see that the point

$$\frac{\sum_{i \in P} \lambda_i x_i}{\sum_{i \in P} \lambda_i} = \frac{\sum_{j \in N} (-\lambda_j) x_j}{\sum_{j \in N} (-\lambda_j)}$$

lies both in $\mathrm{conv}(X_1)$ and in $\mathrm{conv}(X_2)$.

Given Radon's lemma, the proof of (2) is quite easy. We have to show that given a set $X \subset \mathbb{R}^d$ of size $d + 2$, there is a labelling that cannot be realized using halfspaces. Obtain disjoint subsets $X_1, X_2$ of $X$ whose existence is guaranteed by Radon's lemma. Now consider a labelling in which all the points in $X_1$ are labelled $+1$ and those in $X_2$ are labelled $-1$. We claim that such a labelling cannot be realized using a halfspace. Suppose there is such a halfspace $H$. Note that if a halfspace assigns a particular label to a set of points, then every point in their convex hull is also assigned the same label. Thus every point in $\mathrm{conv}(X_1)$ is labelled $+1$ by $H$ while every point in $\mathrm{conv}(X_2)$ is labelled $-1$. But $\mathrm{conv}(X_1) \cap \mathrm{conv}(X_2) \ne \emptyset$, giving us a contradiction.

We often work with $\pm 1$-valued functions obtained by thresholding real valued functions at 0. If these real valued functions come from a finite dimensional vector space, the next result gives an upper bound on the VC dimension.

Theorem 2.2. Let $G$ be a finite dimensional vector space of functions on $\mathbb{R}^d$. Define,

$$F = \{\, x \mapsto \mathrm{sgn}(g(x)) \mid g \in G \,\}.$$

If the dimension of $G$ is $k$ then $\mathrm{VCdim}(F) \le k$.

Proof. Fix an arbitrary set of $k + 1$ points $x_1, \ldots, x_{k+1}$. We show that this set cannot be shattered by $F$. Consider the linear transformation $T : G \to \mathbb{R}^{k+1}$ defined as

$$T(g) = (g(x_1), \ldots, g(x_{k+1})).$$

The dimension of the image of $G$ under $T$ is at most $k$. Thus, there exists a non-zero vector $\lambda \in \mathbb{R}^{k+1}$ that is orthogonal to it. That is, for all $g \in G$,

$$\sum_{i=1}^{k+1} \lambda_i g(x_i) = 0. \tag{4}$$

At least one of the sets,

$$P := \{\, i \mid \lambda_i > 0 \,\}, \qquad N := \{\, j \mid \lambda_j < 0 \,\},$$

is non-empty. Without loss of generality assume it is $P$. Consider a labelling of $x_1, \ldots, x_{k+1}$ that assigns the label $+1$ to all $x_i$ such that $i \in P$ and $-1$ to the rest. If this labelling is realized by a function in $F$ then there exists $g_0 \in G$ such that

$$\sum_{i \in P} \lambda_i g_0(x_i) > 0, \qquad \sum_{i \in N} \lambda_i g_0(x_i) \ge 0.$$

But this contradicts (4). Therefore $x_1, \ldots, x_{k+1}$ cannot be shattered by $F$.
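Both halves of the halfspace argument are constructive, so they can be replayed on small inputs. The sketch below is an illustration only; the function names and the example point set are hypothetical. It first checks that taking $\theta = -b_0$ and $w_i = \theta + b_i$ realizes every labelling of $\{0, e_1, \ldots, e_d\}$. It then computes a Radon partition of $d + 2$ points by solving the linear system from the proof of Radon's lemma with exact rational arithmetic. Plain Gaussian elimination is used here as one convenient way to obtain a non-trivial solution; the proof itself only needs existence.

```python
from fractions import Fraction
from itertools import product


def sgn(t):
    return 1 if t > 0 else -1


def basis_set_is_shattered(d):
    """Check that theta = -b_0, w_i = theta + b_i realizes every labelling of {0, e_1..e_d}."""
    points = [[0] * d] + [[int(k == i) for k in range(d)] for i in range(d)]
    for b in product([-1, 1], repeat=d + 1):
        theta = -b[0]
        w = [theta + b_i for b_i in b[1:]]
        labels = tuple(sgn(sum(wk * xk for wk, xk in zip(w, x)) - theta) for x in points)
        if labels != b:
            return False
    return True


def null_vector(rows):
    """Non-zero solution of A @ lam = 0 for a matrix with more columns than rows."""
    A = [[Fraction(v) for v in row] for row in rows]
    n_rows, n_cols = len(A), len(A[0])
    pivots, r = [], 0
    for c in range(n_cols):
        pivot = next((i for i in range(r, n_rows) if A[i][c] != 0), None)
        if pivot is None:
            continue
        A[r], A[pivot] = A[pivot], A[r]
        A[r] = [v / A[r][c] for v in A[r]]
        for i in range(n_rows):
            if i != r and A[i][c] != 0:
                factor = A[i][c]
                A[i] = [vi - factor * vr for vi, vr in zip(A[i], A[r])]
        pivots.append(c)
        r += 1
        if r == n_rows:
            break
    # Set one free variable to 1 and read the pivot variables off the reduced rows.
    free = next(c for c in range(n_cols) if c not in pivots)
    lam = [Fraction(0)] * n_cols
    lam[free] = Fraction(1)
    for row, c in zip(A, pivots):
        lam[c] = -row[free]
    return lam


def radon_partition(points):
    """Return (P, N, point_from_P, point_from_N) for d + 2 points in R^d."""
    d = len(points[0])
    assert len(points) == d + 2, "Radon's lemma needs d + 2 points in R^d"
    # First row of ones, then one row per coordinate.
    rows = [[1] * (d + 2)] + [[p[k] for p in points] for k in range(d)]
    lam = null_vector(rows)
    P = [i for i, v in enumerate(lam) if v > 0]
    N = [j for j, v in enumerate(lam) if v < 0]

    def combo(indices, sign):
        total = sum(sign * lam[i] for i in indices)
        return tuple(
            sum(sign * lam[i] * points[i][k] for i in indices) / total for k in range(d)
        )

    return P, N, combo(P, 1), combo(N, -1)


square = [(0, 0), (1, 0), (0, 1), (1, 1)]  # hypothetical example in R^2
P, N, point_P, point_N = radon_partition(square)
print(P, N, point_P, point_N)
```

For the four corners of the unit square, the two index sets are the two diagonals and the common point is the centre $(1/2, 1/2)$. Labelling one diagonal $+1$ and the other $-1$ is exactly the labelling that no halfspace can realize.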

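3 Growth Function and VC Dimension

Suppose $\mathrm{VCdim}(F) = d$. Then for all $m \le d$, $\Pi_F(m) = 2^m$. The lemma below, due to Sauer, implies that for $m > d$, $\Pi_F(m) = O(m^d)$, a polynomial rate of growth. This result is remarkable, for it implies that the growth function exhibits just two kinds of behavior. If $\mathrm{VCdim}(F) = \infty$ then $\Pi_F$ grows exponentially with $m$. On the other hand, if $\mathrm{VCdim}(F) = d < \infty$ then the growth function is $O(m^d)$.

Sauer's Lemma. Let $F$ be such that $\mathrm{VCdim}(F) \le d$. Then, we have

$$\Pi_F(m) \le \sum_{i=0}^{d} \binom{m}{i}.$$

Proof. We prove this by induction on $m + d$. For $m = d = 1$, the above inequality holds as both sides are equal to 2. Assume that it holds for $m - 1$ and $d$ and for $m - 1$ and $d - 1$. We will prove it for $m$ and $d$. Define the function,

$$h(m, d) := \sum_{i=0}^{d} \binom{m}{i},$$

so that our induction hypothesis is: for $F$ with $\mathrm{VCdim}(F) \le d$, $\Pi_F(m) \le h(m, d)$. Since

$$\binom{m}{i} = \binom{m-1}{i} + \binom{m-1}{i-1},$$

it is easy to verify that $h$ satisfies the recurrence

$$h(m, d) = h(m - 1, d) + h(m - 1, d - 1).$$

Fix a class $F$ with $\mathrm{VCdim}(F) = d$ and a set $X_1 = \{x_1, \ldots, x_m\} \subseteq \mathcal{X}$. Let $X_2 = \{x_2, \ldots, x_m\}$ and define the function classes,

$$
\begin{aligned}
F_1 &:= F_{|X_1}, \\
F_2 &:= F_{|X_2}, \\
F_3 &:= \{\, f_{|X_2} \mid f \in F \ \&\ \exists f' \in F \text{ s.t. } \forall x \in X_2,\ f'(x) = f(x) \ \&\ f'(x_1) = -f(x_1) \,\}.
\end{aligned}
$$

Note that $\mathrm{VCdim}(F_1) \le \mathrm{VCdim}(F) \le d$ and we wish to bound $|F_1|$. By the definitions above, we have

$$|F_1| = |F_2| + |F_3|,$$

because every function in $F_2$ extends to $x_1$ in either one or two ways, and $F_3$ collects exactly those with two extensions. It is easy to see that $\mathrm{VCdim}(F_2) \le d$. Also, $\mathrm{VCdim}(F_3) \le d - 1$, because if $F_3$ shatters a set, we can always add $x_1$ to it to get a set that is shattered by $F_1$. By the induction hypothesis, $|F_2| \le h(m - 1, d)$ and $|F_3| \le h(m - 1, d - 1)$. Thus, we have

$$|F_{|x_1^m}| = |F_1| \le h(m - 1, d) + h(m - 1, d - 1) = h(m, d).$$

Since $x_1, \ldots, x_m$ were arbitrary, we have

$$\Pi_F(m) = \sup_{x_1^m \in \mathcal{X}^m} |F_{|x_1^m}| \le h(m, d),$$

and the induction step is complete.

Corollary 3.1. Let $F$ be such that $\mathrm{VCdim}(F) \le d$. Then, we have, for $m \ge d$,

$$\Pi_F(m) \le \left( \frac{me}{d} \right)^d.$$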

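Proof. Since $m \ge d$, we have

$$
\begin{aligned}
\sum_{i=0}^{d} \binom{m}{i}
&\le \left( \frac{m}{d} \right)^d \sum_{i=0}^{d} \binom{m}{i} \left( \frac{d}{m} \right)^i \\
&\le \left( \frac{m}{d} \right)^d \sum_{i=0}^{m} \binom{m}{i} \left( \frac{d}{m} \right)^i \\
&= \left( \frac{m}{d} \right)^d \left( 1 + \frac{d}{m} \right)^m \\
&\le \left( \frac{m}{d} \right)^d e^d.
\end{aligned}
$$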
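The quantities in Sauer's lemma and the corollary above are easy to tabulate, which makes the two growth regimes concrete. The sketch below is an illustration under stated assumptions: the function names are hypothetical, and the class of intervals on the real line (VC-dimension 2) is chosen as an example where Sauer's bound is attained exactly. The last function plugs the bound $(me/d)^d$ into the growth-function bound on the Rademacher average from Section 1, which gives $\sqrt{2 d \ln(em/d) / m}$ for $m \ge d$.

```python
from math import comb, e, log, sqrt


def h(m, d):
    """Sauer's bound: sum_{i=0}^{d} C(m, i)."""
    return sum(comb(m, i) for i in range(d + 1))


def interval_labellings(m):
    """Labellings of m ordered points on a line realized by intervals."""
    labellings = {(-1,) * m}  # the empty interval
    for i in range(m):
        for j in range(i, m):
            labellings.add(tuple(1 if i <= k <= j else -1 for k in range(m)))
    return labellings


def rademacher_bound(m, d):
    """sqrt(2 d ln(e m / d) / m): the Section 1 bound with Pi_F(m) <= (e m / d)^d."""
    return sqrt(2 * d * log(e * m / d) / m)


for m in range(1, 9):
    for d in range(1, 6):
        # Recurrence used in the induction step.
        assert h(m, d) == h(m - 1, d) + h(m - 1, d - 1)
        if m >= d:
            # Polynomial bound from the corollary.
            assert h(m, d) <= (e * m / d) ** d
    # Intervals have VC-dimension 2 and meet Sauer's bound with equality.
    assert len(interval_labellings(m)) == h(m, 2)

print(rademacher_bound(100, 5), rademacher_bound(10_000, 5))
```

Unlike the trivial bound $\Pi_F(m) \le 2^m$, which makes the right-hand side of the Section 1 bound a constant, the polynomial bound makes it decay roughly like $\sqrt{d \ln m / m}$.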
