Computational and Statistical Learning Theory
Assignment 4
Due: March 2nd
Email solutions to: karthik at ttic dot edu

Notations/Definitions

Recall the definition of the sample-based Rademacher complexity:
\[
\hat{R}_S(F) := \mathbb{E}_{\epsilon \sim \{\pm 1\}^m}\left[\sup_{f \in F} \frac{1}{m}\sum_{i=1}^{m} \epsilon_i f(x_i)\right]
\]

Definition 1. Given a sample $S = \{x_1, \ldots, x_m\}$ and any $\alpha > 0$, a set $V \subset \mathbb{R}^m$ is said to be an $\alpha$-cover (in $\ell_p$) of function class $F$ on sample $S$ if
\[
\forall f \in F,\ \exists v \in V \ \text{s.t.}\ \left(\frac{1}{m}\sum_{i=1}^{m} |f(x_i) - v_i|^p\right)^{1/p} \le \alpha
\]
Specifically, for $p = \infty$, $\left(\frac{1}{m}\sum_{i=1}^{m}|f(x_i) - v_i|^p\right)^{1/p}$ is replaced by $\max_{i \in [m]} |f(x_i) - v_i|$. Also define
\[
N_p(F, \alpha, S) := \min\left\{|V| : V \text{ is an } \alpha\text{-cover of } F \text{ in } \ell_p \text{ on sample } S\right\}
\]
and
\[
N_p(F, \alpha, m) := \sup_{x_1, \ldots, x_m} N_p(F, \alpha, \{x_1, \ldots, x_m\})
\]

Definition 2. A function class $F$ is said to $\alpha$-shatter a sample $S = \{x_1, \ldots, x_m\}$ if there exists a sequence of thresholds $s_1, \ldots, s_m \in \mathbb{R}$ such that
\[
\forall \epsilon \in \{\pm 1\}^m,\ \exists f \in F \ \text{s.t.}\ \forall i \in [m],\ \epsilon_i \left(f(x_i) - s_i\right) \ge \alpha/2
\]
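As a quick numerical companion to the definition above, the sample-based Rademacher complexity of a small finite class can be estimated by averaging the supremum over random sign draws. A minimal sketch (the two-function toy class below is purely illustrative, not part of the assignment):

```python
import numpy as np

def empirical_rademacher(fvals, n_draws=2000, seed=0):
    """Monte Carlo estimate of R_S(F) = E_eps[ sup_f (1/m) sum_i eps_i f(x_i) ].

    fvals: (n_functions, m) array with fvals[j, i] = f_j(x_i) on the sample S.
    """
    rng = np.random.default_rng(seed)
    _, m = fvals.shape
    total = 0.0
    for _ in range(n_draws):
        eps = rng.choice([-1.0, 1.0], size=m)  # uniform Rademacher signs
        total += np.max(fvals @ eps) / m       # sup over the (finite) class
    return total / n_draws

# Toy class of two [-1,1]-valued functions on a sample of size m = 4
fvals = np.array([[1.0,  1.0, 1.0,  1.0],   # constant function
                  [1.0, -1.0, 1.0, -1.0]])  # sign-alternating function
print(empirical_rademacher(fvals))
```

For this toy class the expectation can be computed in closed form (it equals $\mathbb{E}|A - B|/2$ for the two correlated averages $A, B$), so the estimate can be sanity-checked by hand.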
Problems

1. VC Lemma for Real-valued Function Classes: We shall prove that for any function class $F$ (assume functions in $F$ are bounded by 1) and scale $\alpha > 0$, the $\ell_\infty$ covering number at scale $\alpha$ can be bounded using the fat-shattering dimension at that scale, by proving a statement analogous to the VC lemma. We shall proceed by first extending the statement to finite (specifically $\{0, \ldots, k\}$) valued function classes and then using this to prove the final bound of
\[
N_\infty(F, \alpha, n) \le \left(\frac{2en}{\alpha}\right)^{\mathrm{fat}_\alpha(F)}
\]

a) Let $F_k \subseteq \{0, \ldots, k\}^X$ be a function class with $\mathrm{fat}_2(F_k) = d$. Show that
\[
N_\infty(F_k, 1/2, n) \le \sum_{i=0}^{d} \binom{n}{i} k^i
\]
Show the above statement using induction on $n + d$ (very similar to the first problem on Assignment 2). Hint: In the Assignment 2 problem, where we used $H_+|_S$ and $H_-|_S$, use instead, for each $i \in \{0, \ldots, k\}$, $F_i = \{f \in F : f(x_1) = i\}$ (note that this is a simple multilabel extension, and for $k = 1$, $F_0$, $F_1$ are identical to $H_+$, $H_-$). Use the notion of 2-shattering instead of shattering for the VC case, and use the $1/2$-cover instead of the growth function.

b) Using the idea of $\alpha$-discretizing the output of function class $F$, we shall conclude the required statement. Do the following:

i. Create a $\{0, \ldots, k\}$-valued class $G$ where $k$ is of order $1/\alpha$. Show that covering $G$ at scale $1/2$ implies we can cover $F$ at scale $\alpha$, and hence conclude that we can bound $N_\infty(F, \alpha, n)$ in terms of the covering number at scale $1/2$ for $G$.

ii. Show that $\mathrm{fat}_2(G) \le \mathrm{fat}_\alpha(F)$. Combine with the bound on $N_\infty(G, 1/2, n)$ from the previous sub-problem and conclude that
\[
N_\infty(F, \alpha, n) \le \left(\frac{2en}{\alpha}\right)^{\mathrm{fat}_\alpha(F)}
\]

2. Dudley vs. Pollard's Bounds: In class we saw that Rademacher complexity can be bounded in terms of covering numbers using Pollard's bound, the Dudley integral bound, and a slightly modified version of the Dudley
integral bound, as follows:
\[
\hat{R}_S(F) \le \inf_{\alpha \ge 0}\left\{\alpha + \sqrt{\frac{2\log N_\infty(F, \alpha, m)}{m}}\right\}, \qquad
\hat{R}_S(F) \le \inf_{\alpha \ge 0}\left\{\alpha + \sqrt{\frac{2\log N_2(F, \alpha, m)}{m}}\right\}
\tag{Pollard}
\]
\[
\hat{R}_S(F) \le \inf_{\alpha \ge 0}\left\{4\alpha + \frac{12}{\sqrt{m}}\int_{\alpha}^{1} \sqrt{\log N_2(F, \tau, m)}\, d\tau\right\}
\tag{Refined Dudley}
\]
In this problem, using some examples, we shall compare these bounds.

a) Class with finite VC subgraph-dimension: Assume that the VC subgraph-dimension of function class $F$ is bounded by $D$. In this case the result in problem 1 can be used to bound the covering number of $F$ in terms of $D$. Use this bound on the covering number and compare Pollard's bound with the refined Dudley integral bound by writing down the bounds implied by each one.

b) Linear class with bounded norm: Linear classes in high-dimensional spaces are probably among the most important and most used function classes in machine learning. Consider the specific example where $X = \{x : \|x\|_2 \le 1\}$ and
\[
F = \{x \mapsto w \cdot x : \|w\|_2 \le 1\}
\]
In class we saw that for any $\epsilon > 0$, $\mathrm{fat}_\epsilon(F) \le \frac{4}{\epsilon^2}$. Using this with the result in problem 1, we have that:
\[
N_2(F, \epsilon, n) \le N_\infty(F, \epsilon, n) \le \left(\frac{2en}{\epsilon}\right)^{4/\epsilon^2}
\]
Use the above bound on the covering number and write down the bound on Rademacher complexity implied by Pollard's bound. Write down the bound on Rademacher complexity implied by the refined version of the Dudley integral bound.

3. Data Dependent Bound: Recall the Rademacher complexity bound we proved in class for function classes $F$ bounded by 1: for any $\delta > 0$, with probability at least $1 - \delta$,
\[
\sup_{f \in F}\left(\mathbb{E}[f(x)] - \hat{\mathbb{E}}_S[f(x)]\right) \le 2\,\mathbb{E}_{S \sim D^m}\left[\hat{R}_S(F)\right] + \sqrt{\frac{2\log(1/\delta)}{m}}
\]
Note that we do not know the distribution $D$. One way we used the above bound was by providing upper bounds on $\hat{R}_S(F)$ valid for any sample of size $m$, and using these instead of $\mathbb{E}_{S \sim D^m}[\hat{R}_S(F)]$. But ideally we would like to get tight bounds when the distribution we are faced with is nicer. The aim of this problem is to do this. Prove that, for any $\delta > 0$, with probability at least $1 - \delta$ over the draw of the sample $S \sim D^m$,
\[
\sup_{f \in F}\left(\mathbb{E}[f(x)] - \hat{\mathbb{E}}_S[f(x)]\right) \le 2\hat{R}_S(F) + K\sqrt{\frac{\log(2/\delta)}{m}}
\]
(provide an explicit value for the constant $K$ above). Notice that in the above bound the expected Rademacher complexity is replaced by the sample-based one, which can be calculated from the training sample. Hint: Use McDiarmid's inequality on the expected Rademacher complexity.

4. Learnability and Fat-shattering Dimension: Recall the setting of the stochastic optimization problem, where the objective function is a mapping $r : H \times Z \to \mathbb{R}$. A sample $S = \{z_1, \ldots, z_m\}$ drawn i.i.d. from an unknown distribution $D$ is provided to the learner, and the aim of the learner is to output $\hat{h} \in H$, based on the sample, that has low expected objective $\mathbb{E}_z[r(\hat{h}, z)]$.

a) Consider the stochastic optimization problem with $r$ bounded by $a$, i.e. $|r(h, z)| < a < \infty$ for all $h \in H$ and $z \in Z$. If the function class $F := \{z \mapsto r(h, z) : h \in H\}$ has finite $\mathrm{fat}_\alpha$ for all $\alpha > 0$, then show that the problem is learnable.

b) Conclude that for a supervised learning problem with a bounded hypothesis class $H$ (i.e. $\forall x \in X$, $|h(x)| < a$), and loss $\phi : \hat{Y} \times Y \to \mathbb{R}$ that is $L$-Lipschitz (in its first argument), if $H$ has finite $\mathrm{fat}_\alpha$ for all $\alpha > 0$, then the problem is learnable.

c) Show a stochastic optimization problem that is learnable even though it has infinite $\mathrm{fat}_\alpha$ for all $\alpha \le 1$ (or any other constant of your choice). Explicitly write down the hypothesis class and the learning rule which learns the class, argue that the problem is learnable, and explain why the fat-shattering dimension is infinite. Hint: You can make the successful learning rule even be ERM.

d) Prove that for a supervised learning problem with the absolute loss $\phi(\hat{y}, y) = |\hat{y} - y|$, if $\mathrm{fat}_\alpha$ is infinite for some $\alpha > 0$, then the problem is not learnable. Hint: As with the binary case, for every $m$, construct a distribution which is concentrated on a set of points that can be fat-shattered.

Challenge Problems

1. We saw that for any distribution $D$, the expected Rademacher complexity provides an upper bound on the maximum deviation between mean and empirical average, uniformly over the function class; specifically, we saw that
\[
\mathbb{E}_{S \sim D^m}\left[\sup_{f \in F}\left(\mathbb{E}[f(x)] - \hat{\mathbb{E}}_S[f(x)]\right)\right] \le 2\,\mathbb{E}_{S \sim D^m}\left[\hat{R}_S(F)\right]
\]
Prove the (almost) converse, that
\[
\frac{1}{2}\,\mathbb{E}_{S \sim D^m}\left[\hat{R}_S(F)\right] \le \mathbb{E}_{S \sim D^m}\left[\sup_{f \in F}\left(\mathbb{E}[f(x)] - \hat{\mathbb{E}}_S[f(x)]\right)\right]
\]
This basically establishes that the Rademacher complexity tightly bounds the uniform maximal deviation for every distribution.

2. The worst-case Rademacher complexity is defined as
\[
R_m(F) = \sup_{S = \{x_1, \ldots, x_m\}} \hat{R}_S(F)
\]
(i.e. the supremum over samples of size $m$).

a) Prove that for any function class $F$ and any $\tau > R_m(F)$, we have that
\[
\mathrm{fat}_\tau(F) \le \frac{4 m\, R_m(F)^2}{\tau^2}
\]
Hint: First start by proving the statement for a larger sample of size $n = \mathrm{fat}_\tau \left\lfloor m / \mathrm{fat}_\tau \right\rfloor$, by taking $\mathrm{fat}_\tau$ samples and repeating them the appropriate number of times. You will need to start with a shattered set, and you will need to use Khintchine's inequality, which states that for any $n$,
\[
\mathbb{E}_{\epsilon \sim \mathrm{Unif}\{\pm 1\}^n}\left[\left|\sum_{i=1}^{n} \epsilon_i\right|\right] \ge \sqrt{\frac{n}{2}}
\]

b) Combine the above with the refined version of the Dudley integral bound to prove that
\[
\inf_{\alpha \ge 0}\left\{4\alpha + \frac{12}{\sqrt{m}}\int_{\alpha}^{1} \sqrt{\log N_2(F, \tau, m)}\, d\tau\right\} \le R_m(F) \cdot O\!\left(\log^{3/2} m\right)
\]
This shows that the refined Dudley integral bound is tight to within log factors of the Rademacher complexity. Thus we have established that, in the worst case, all of the complexity measures for a function class (Rademacher complexity, covering numbers, and the fat-shattering dimension) tightly govern the rate of uniform maximal deviation for the function class, all to within log factors.

3. Bounded Difference Inequality, Stability and Generalization: Recall that a function $G : X^m \to \mathbb{R}$ is said to satisfy the bounded difference inequality if for all $i \in [m]$ and all $x_1, \ldots, x_m, x'_i \in X$,
\[
\left|G(x_1, \ldots, x_i, \ldots, x_m) - G(x_1, \ldots, x_{i-1}, x'_i, x_{i+1}, \ldots, x_m)\right| \le c_i
\]
for some $c_i \ge 0$. In this case McDiarmid's inequality gives us that for any $\delta > 0$, with probability at least $1 - \delta$,
\[
G(x_1, \ldots, x_m) \le \mathbb{E}\left[G(x_1, \ldots, x_m)\right] + \sqrt{\frac{\left(\sum_{i=1}^{m} c_i^2\right) \log(1/\delta)}{2}}
\]
The bounded difference property turns out to be quite useful for analyzing learning algorithms directly (instead of looking at the uniform deviation over a function class).
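For intuition, the empirical Rademacher complexity itself satisfies the bounded difference property: for a class of functions bounded by 1, replacing one sample point changes $\hat{R}_S(F)$ by at most $2/m$. A small numerical check of this, with a threshold class and sample chosen purely for illustration (the exact computation enumerates all $2^m$ sign vectors, so $m$ is kept tiny):

```python
import numpy as np
from itertools import product

def rademacher_exact(fvals):
    """Exact R_S(F) for a finite class on a small sample, computed by
    enumerating all 2^m sign vectors.  fvals[j, i] = f_j(x_i)."""
    m = fvals.shape[1]
    total = 0.0
    for eps in product([-1.0, 1.0], repeat=m):
        total += np.max(fvals @ np.array(eps)) / m  # sup over the class
    return total / 2 ** m

def threshold_values(xs, thetas):
    # f_theta(x) = +1 if x >= theta, else -1: a {-1,+1}-valued class
    return np.where(xs[None, :] >= thetas[:, None], 1.0, -1.0)

m = 8
xs = np.linspace(0.0, 1.0, m)          # illustrative sample of size 8
thetas = np.linspace(-0.1, 1.1, 13)    # arbitrary grid of thresholds
r1 = rademacher_exact(threshold_values(xs, thetas))
xs2 = xs.copy()
xs2[3] = 0.99                          # replace one sample point
r2 = rademacher_exact(threshold_values(xs2, thetas))
print(r1, r2, 2.0 / m)                 # |r1 - r2| is at most 2/m
```

The bound holds because, for every fixed sign vector, changing one point moves each term $\epsilon_i f(x_i)/m$ by at most $2/m$, and taking the supremum and the expectation preserves that bound.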
A proper learning algorithm $A : \bigcup_m X^m \to F$ is said to be uniformly $\beta$-stable if for all $i \in [m]$ and any $x_1, \ldots, x_m, x' \in X$,
\[
\sup_{x}\left|A(x_1, \ldots, x_i, \ldots, x_m)(x) - A(x_1, \ldots, x_{i-1}, x', x_{i+1}, \ldots, x_m)(x)\right| \le \beta
\]
Assuming the functions in $F$ are bounded by 1, we shall prove that the learning algorithm generalizes well (expected loss is close to the empirical loss of the algorithm). Specifically, we shall prove that for any $\delta > 0$, with probability at least $1 - \delta$,
\[
R(A(S)) \le \hat{R}(A(S)) + \beta + \left(2\beta + \frac{2}{m}\right)\sqrt{\frac{m \log(1/\delta)}{2}}
\]
where $R(A(S)) = \mathbb{E}_x\left[A(S)(x)\right]$ and $\hat{R}(A(S)) = \frac{1}{m}\sum_{i=1}^{m} A(S)(x_i)$.

a) First show that $\mathbb{E}_S\left[R(A(S)) - \hat{R}(A(S))\right] \le \beta$. Hint: Use renaming of variables to first show that for any $i \in [m]$,
\[
\mathbb{E}_S\left[R(A(S))\right] = \mathbb{E}_{S, x'}\left[A(x_1, \ldots, x_{i-1}, x', x_{i+1}, \ldots, x_m)(x_i)\right]
\]

b) Show that the function $G(S) = R(A(S)) - \hat{R}(A(S))$ satisfies the bounded difference property with $c_i \le 2\beta + \frac{2}{m}$. Conclude the required statement using McDiarmid's inequality.

c) Consider the stochastic convex optimization problem where the sample is $z = (x, y)$, where $y$ is real-valued, the $x$'s are from the unit ball in some Hilbert space, and the hypotheses are weight vectors $w$ from the same Hilbert space, with objective
\[
r(w, (x, y)) = |\langle w, x \rangle - y| + \lambda \|w\|^2
\]
Show that the ERM algorithm is stable for this problem, and thus provide a bound for this algorithm.

4. $\ell_1$ Neural Network: A $k$-layer 1-norm neural network is given by the function class $F_k$, which is in turn defined recursively as follows:
\[
F_1 = \left\{ x \mapsto \sum_{j=1}^{d} w_j x_j \;:\; \|w\|_1 \le B \right\}
\]
and further, for each $2 \le i \le k$,
\[
F_i = \left\{ x \mapsto \sum_{j=1}^{d_i} w_j\, \sigma(f_j(x)) \;:\; \forall j \in [d_i],\ f_j \in F_{i-1},\ \|w\|_1 \le B \right\}
\]
where $d_i$ is the number of nodes in the $i$-th layer of the network. The function $\sigma : \mathbb{R} \to [-1, 1]$ is called the squash function and is generally a smooth, monotonically non-decreasing function (a typical example is the tanh function). Assume that the input space is $X = [0, 1]^d$ and that $\sigma$ is $L$-Lipschitz. Prove that
\[
\hat{R}_S(F_k) \le (2B)^k L^{k-1} \sqrt{\frac{2 \log d}{m}}
\]
Notice that the $d_i$'s do not appear in the above bound, indicating that the number of nodes in the intermediate layers does not affect the upper bound on the Rademacher complexity. Hint: Prove a bound on the Rademacher complexity of $F_i$ recursively in terms of the Rademacher complexity of $F_{i-1}$.
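For intuition on the base case of the recursion, note that $F_1$ is an $\ell_1$-bounded linear class: by duality, $\sup_{\|w\|_1 \le B} \langle w, v \rangle = B\|v\|_\infty$, so $\hat{R}_S(F_1) = \frac{B}{m}\,\mathbb{E}_\epsilon \left\|\sum_i \epsilon_i x_i\right\|_\infty$, which a Massart-style finite-class argument bounds by roughly $B\sqrt{2\log(2d)/m}$. A Monte Carlo sketch of this base case (the data, dimensions, and the $\log(2d)$ form of the bound here are illustrative assumptions, not the assignment's exact statement):

```python
import numpy as np

def l1_linear_rademacher(X, B=1.0, n_draws=3000, seed=0):
    """Monte Carlo estimate of R_S(F1) for F1 = {x -> <w,x> : ||w||_1 <= B}.

    By l1/l-infinity duality, sup_{||w||_1 <= B} <w, v> = B * ||v||_inf, so
    R_S(F1) = (B/m) * E_eps || sum_i eps_i x_i ||_inf.
    """
    rng = np.random.default_rng(seed)
    m = X.shape[0]
    vals = [np.max(np.abs(rng.choice([-1.0, 1.0], size=m) @ X))
            for _ in range(n_draws)]
    return B * np.mean(vals) / m

m, d = 500, 50
X = np.random.default_rng(2).uniform(0.0, 1.0, size=(m, d))  # inputs in [0,1]^d
est = l1_linear_rademacher(X)
bound = np.sqrt(2.0 * np.log(2 * d) / m)  # Massart-style bound, B = 1
print(est, bound)  # the estimate should sit below the bound
```

Note that the estimate depends on the input dimension $d$ only logarithmically, which is the same phenomenon the final bound exhibits for the widths $d_i$ of the intermediate layers.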