Deriving the Dual. Prof. Bennett Math of Data Science 1/13/06
1 Deriving the Dual — Prof. Bennett, Math of Data Science, 1/13/06
2 Outline
Nitty Gritty for SVM
Review Ridge Regression
LS-SVM = KRR Dual Derivation
Bias Issue
Summary
3 Nitty Gritty
We need the dual of:
min_{w,b,z}  (1/2)||w||² + C Σ_{i=1}^ℓ z_i
s.t.  y_i(x_i·w + b) - 1 + z_i ≥ 0,  z_i ≥ 0,  i = 1, ..., ℓ
4 Wolfe Dual Problem with Inequalities
Primal:
min_r f(r)  s.t.  g_i(r) ≥ 0,  i = 1, ..., m
f: Rⁿ → R differentiable and convex; each g_i: Rⁿ → R differentiable and concave.
Dual:
max_{r,u}  L(r,u) = f(r) - Σ_{i=1}^m u_i g_i(r)
s.t.  ∇_r L(r,u) = ∇f(r) - Σ_{i=1}^m u_i ∇g_i(r) = 0
      u_i ≥ 0,  i = 1, ..., m
5 Primal Lagrangian Function
Primal:
min_{w,b,z}  (1/2)||w||² + C Σ_{i=1}^ℓ z_i
s.t.  y_i(x_i·w + b) - 1 + z_i ≥ 0,  z_i ≥ 0,  i = 1, ..., ℓ
Lagrangian:
L(w,z,b,α,β) = (1/2)||w||² + C Σ_i z_i + Σ_i α_i (1 - y_i(x_i·w + b) - z_i) - Σ_i β_i z_i
∇_w L = w - Σ_i α_i y_i x_i = 0
∂L/∂z_i = C - α_i - β_i = 0,  i = 1, ..., ℓ
∂L/∂b = -Σ_i α_i y_i = 0
6 Wolfe Dual
max_{w,b,z,α,β}  (1/2)||w||² + C Σ_i z_i + Σ_i α_i (1 - y_i(x_i·w + b) - z_i) - Σ_i β_i z_i
s.t.  w - Σ_i α_i y_i x_i = 0
      C - α_i - β_i = 0,  i = 1, ..., ℓ
      Σ_i α_i y_i = 0
      α_i ≥ 0,  β_i ≥ 0
Eliminate β: β_i = C - α_i ≥ 0, so 0 ≤ α_i ≤ C.
7 Wolfe Dual
max_{w,b,α}  (1/2)||w||² + Σ_i α_i (1 - y_i(x_i·w + b))
s.t.  w - Σ_i α_i y_i x_i = 0
      0 ≤ α_i ≤ C,  i = 1, ..., ℓ
      Σ_i α_i y_i = 0
Use the gradient condition in b to simplify the objective.
8 Wolfe Dual
max_{w,α}  (1/2)||w||² - Σ_i α_i y_i (x_i·w) + Σ_i α_i
s.t.  w = Σ_i α_i y_i x_i
      Σ_i α_i y_i = 0
      0 ≤ α_i ≤ C,  i = 1, ..., ℓ
Eliminate w.
9 Wolfe Dual
max_α  (1/2) Σ_{i,j} α_i α_j y_i y_j x_i·x_j - Σ_{i,j} α_i α_j y_i y_j x_i·x_j + Σ_i α_i
s.t.  Σ_i α_i y_i = 0
      0 ≤ α_i ≤ C,  i = 1, ..., ℓ
Simplify the inner products.
10 Final Wolfe Dual
max_α  -(1/2) Σ_{i,j} α_i α_j y_i y_j x_i·x_j + Σ_i α_i
s.t.  Σ_i α_i y_i = 0
      0 ≤ α_i ≤ C,  i = 1, ..., ℓ
Usually converted to a minimization.
11 Ridge Regression Review
Use the least-norm solution for fixed λ > 0.
Regularized problem:
min_w  L_λ(w,S) = λ||w||² + ||y - Xw||²
Optimality condition:
∇_w L_λ(w,S) = 2λw - 2X'y + 2X'Xw = 0
(X'X + λI) w = X'y
Requires O(n³) operations.
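As a concrete check of the optimality condition above, the n×n primal system (X'X + λI)w = X'y can be solved directly with NumPy. A minimal sketch; the data here is synthetic, purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((20, 3))   # ell = 20 points, n = 3 features
y = rng.standard_normal(20)
lam = 0.1

# Solve the n x n optimality condition (X'X + lambda*I) w = X'y
w = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

# The gradient 2*lambda*w - 2*X'y + 2*X'Xw should vanish at the solution
grad = 2 * lam * w - 2 * X.T @ y + 2 * X.T @ (X @ w)
print(np.allclose(grad, 0))  # True
```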
12 Dual Representation
The inverse always exists for any λ > 0:
w = (X'X + λI)⁻¹ X'y
Alternative representation:
(X'X + λI) w = X'y
w = λ⁻¹(X'y - X'Xw) = λ⁻¹ X'(y - Xw) = X'α,  where α = λ⁻¹(y - Xw)
λα = y - Xw = y - XX'α
(XX' + λI) α = y
α = (G + λI)⁻¹ y,  where G = XX'
Solving this ℓ×ℓ system is O(ℓ³).
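The derivation above claims the primal and dual routes agree: (X'X + λI)⁻¹X'y = X'(XX' + λI)⁻¹y. This identity can be checked numerically on synthetic data (a sketch, not part of the slides):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((15, 4))   # ell = 15 points, n = 4 features
y = rng.standard_normal(15)
lam = 0.5

# Primal solution: n x n system
w_primal = np.linalg.solve(X.T @ X + lam * np.eye(4), X.T @ y)

# Dual solution: ell x ell system with Gram matrix G = XX'
G = X @ X.T
alpha = np.linalg.solve(G + lam * np.eye(15), y)
w_dual = X.T @ alpha               # w = X' alpha

print(np.allclose(w_primal, w_dual))  # True
```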
13 Dual Ridge Regression
To predict a new point x:
g(x) = ⟨w, x⟩ = Σ_{i=1}^ℓ α_i ⟨x_i, x⟩ = y'(G + λI)⁻¹ z,  where z_i = ⟨x_i, x⟩
Note: we need only compute G, the Gram matrix:
G = XX',  G_ij = ⟨x_i, x_j⟩
Ridge regression requires only inner products between data points.
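Since the prediction g(x) = y'(G + λI)⁻¹z uses only inner products, it can be computed without ever forming w. A minimal sketch on toy data, checking it against the explicit primal weight vector:

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.standard_normal((10, 3))
y = rng.standard_normal(10)
lam = 1.0
x_new = rng.standard_normal(3)

G = X @ X.T                        # Gram matrix, G_ij = <x_i, x_j>
z = X @ x_new                      # z_i = <x_i, x_new>
g_dual = y @ np.linalg.solve(G + lam * np.eye(10), z)

# Same prediction via the explicit primal weight vector
w = np.linalg.solve(X.T @ X + lam * np.eye(3), X.T @ y)
g_primal = w @ x_new

print(np.isclose(g_dual, g_primal))  # True
```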
14 Linear Regression in Feature Space
Key idea: map the data to a higher-dimensional space (feature space) and perform linear regression in the embedded space.
Embedding map:
φ: x ∈ Rⁿ → F ⊆ R^N,  N >> n
15 Kernel Function
A kernel is a function K such that
K(x, u) = ⟨φ(x), φ(u)⟩_F
where φ is a mapping from input space to feature space F.
There are many possible kernels. The simplest is the linear kernel: K(x, u) = ⟨x, u⟩.
16 Ridge Regression in Feature Space
To predict a new point:
g(φ(x)) = ⟨w, φ(x)⟩ = Σ_{i=1}^ℓ α_i ⟨φ(x_i), φ(x)⟩ = y'(G + λI)⁻¹ z,  where z_i = ⟨φ(x_i), φ(x)⟩
To compute the Gram matrix:
G = φ(X) φ(X)',  G_ij = ⟨φ(x_i), φ(x_j)⟩ = K(x_i, x_j)
Use the kernel to compute the inner products.
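To see this slide in action, here is a sketch with a hand-built feature map φ(x) = (x, x²) (element-wise square) and its induced kernel K(x, u) = ⟨x, u⟩ + ⟨x², u²⟩. The map and kernel are illustrative choices of mine, not from the slides; the point is that kernel ridge regression matches explicit ridge regression on φ(X):

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.standard_normal((12, 2))
y = rng.standard_normal(12)
lam = 0.3

def phi(X):
    # explicit embedding phi(x) = (x, x elementwise-squared)
    return np.hstack([X, X ** 2])

def kernel(A, B):
    # induced kernel K(x, u) = <phi(x), phi(u)>
    return A @ B.T + (A ** 2) @ (B ** 2).T

# Kernel route: alpha = (G + lambda*I)^{-1} y, with G_ij = K(x_i, x_j)
G = kernel(X, X)
alpha = np.linalg.solve(G + lam * np.eye(12), y)

x_new = rng.standard_normal((1, 2))
g_kernel = kernel(x_new, X) @ alpha          # sum_i alpha_i K(x_i, x_new)

# Explicit route: ridge regression on the embedded data phi(X)
P = phi(X)
w = np.linalg.solve(P.T @ P + lam * np.eye(P.shape[1]), P.T @ y)
g_explicit = phi(x_new) @ w

print(np.allclose(g_kernel, g_explicit))  # True
```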
17 Alternative Dual Derivation
Original mathematical model:
min_w  f(w) = λ||w||² + ||y - Xw||²
Equivalent mathematical model:
min_{w,z}  f(w,z) = (1/2)||w||² + (1/(2λ)) Σ_{i=1}^ℓ z_i²
s.t.  y_i - x_i·w = z_i,  i = 1, ..., ℓ
Construct the dual using Wolfe duality.
18 Lagrangian Function
Consider the problem:
min_r f(r)  s.t.  h_i(r) = 0,  i = 1, ..., m
with f: Rⁿ → R differentiable and each h_i: Rⁿ → R differentiable.
The Lagrangian function is:
L(r,u) = f(r) + Σ_{i=1}^m u_i h_i(r)
19 Wolfe Dual Problem
Primal:
min_r f(r)  s.t.  h_i(r) = 0,  i = 1, ..., m
f: Rⁿ → R differentiable and convex; each h_i: Rⁿ → R differentiable.
Dual:
max_{r,u}  L(r,u) = f(r) + Σ_{i=1}^m u_i h_i(r)
s.t.  ∇_r L(r,u) = ∇f(r) + Σ_{i=1}^m u_i ∇h_i(r) = 0
20 Lagrangian Function
Primal:
min_{w,z}  f(w,z) = (1/2)||w||² + (1/(2λ)) Σ_{i=1}^ℓ z_i²
s.t.  y_i - x_i·w = z_i,  i = 1, ..., ℓ
Lagrangian:
L(w,z,α) = (1/2)||w||² + (1/(2λ)) Σ_i z_i² + Σ_i α_i (y_i - x_i·w - z_i)
∇_w L = w - Σ_i α_i x_i = 0
∂L/∂z_i = z_i/λ - α_i = 0
21 Wolfe Dual Problem
Construct the Wolfe dual:
max_{w,z,α}  L(w,z,α) = (1/2)||w||² + (1/(2λ)) Σ_i z_i² + Σ_i α_i (y_i - x_i·w - z_i)
s.t.  ∇_w L = w - Σ_i α_i x_i = 0
      ∇_z L = z/λ - α = 0
Simplify by eliminating z = λα.
22 Simplified Problem
Get rid of z:
max_{w,α}  (1/2)||w||² + (λ/2) Σ_i α_i² + Σ_i α_i y_i - Σ_i α_i (x_i·w) - λ Σ_i α_i²
         = (1/2)||w||² - (λ/2) Σ_i α_i² - Σ_i α_i (x_i·w) + Σ_i α_i y_i
s.t.  ∇_w L = w - Σ_i α_i x_i = 0
Simplify by eliminating w = X'α.
23 Simplified Problem
Get rid of w:
max_α  (1/2) Σ_{i,j} α_i α_j ⟨x_i, x_j⟩ - Σ_{i,j} α_i α_j ⟨x_i, x_j⟩ - (λ/2) Σ_i α_i² + Σ_i α_i y_i
     = -(1/2) Σ_{i,j} α_i α_j ⟨x_i, x_j⟩ - (λ/2) Σ_i α_i² + Σ_i α_i y_i
α unconstrained.
24 Optimal Solution
The problem in matrix notation, with G = XX' (converted to a minimization):
min_α  f(α) = (1/2) α'Gα + (λ/2) α'α - y'α
The solution satisfies:
∇f(α) = Gα + λα - y = 0
α = (G + λI)⁻¹ y
25 What about Bias?
If we limit the regression function to f(x) = w·x, the solution must pass through the origin.
Many models may require a bias or constant term: f(x) = w·x + b.
26 Eliminate Bias
One way to eliminate the bias is to center the response, i.e. make the response have mean 0.
mean:  μ_y = (1/ℓ) Σ_{i=1}^ℓ y_i
standard deviation:  σ_y = sqrt( (1/ℓ) Σ_{i=1}^ℓ (y_i - μ_y)² )
27 Center y
centered:  ŷ_i = y_i - μ_y
y now has sample mean 0.
It is frequently good to also give y standard length:
normalized:  ŷ_i = (y_i - μ_y)/σ_y
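The centering and normalization above, and the inversion used later on the worksheet slide, can be sketched in a few lines (the data vector is an arbitrary example of mine):

```python
import numpy as np

y = np.array([2.0, 4.0, 9.0, 5.0])

mu_y = y.mean()                     # sample mean
sigma_y = y.std()                   # standard deviation (1/ell convention)
y_norm = (y - mu_y) / sigma_y       # centered and scaled response

print(np.isclose(y_norm.mean(), 0.0))   # True: sample mean is now 0
print(np.isclose(y_norm.std(), 1.0))    # True: unit standard deviation

# Invert the transformation to recover the original response
y_back = sigma_y * y_norm + mu_y
print(np.allclose(y_back, y))           # True
```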
28 Centering X May Be a Good Idea
Mean of X:  μ = (1/ℓ) Σ_{i=1}^ℓ x_i = (1/ℓ) X'e,  where e is the vector of ones.
Center each point:  x̂_i = x_i - μ
Centered data:  X̂ = X - eμ' = X - (1/ℓ)ee'X = (I - (1/ℓ)ee') X
29 Scaling X May Be a Good Idea
Centered data:  X̂ = (I - (1/ℓ)ee') X
Compute the standard deviation of each column:
σ_j² = (1/ℓ) Σ_i (X_ij - μ_j)²,  i.e. the diagonal of the covariance (1/ℓ) X̂'X̂
Scale the columns/variables:  X̃ = X̂ diag(σ)⁻¹
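The centering formula X̂ = (I - (1/ℓ)ee')X and the column scaling can be sketched as follows (the small data matrix is an arbitrary example of mine):

```python
import numpy as np

X = np.array([[1.0, 2.0],
              [3.0, 6.0],
              [5.0, 4.0]])
ell = X.shape[0]
e = np.ones((ell, 1))

# Centered data: X_hat = (I - (1/ell) e e') X
X_hat = (np.eye(ell) - e @ e.T / ell) @ X
print(np.allclose(X_hat.mean(axis=0), 0.0))   # True: columns now have mean 0

# Scale each column by its standard deviation (1/ell convention)
sigma = np.sqrt((X_hat ** 2).mean(axis=0))
X_scaled = X_hat / sigma
print(np.allclose(X_scaled.std(axis=0), 1.0)) # True
```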
30 You Try
Consider a data matrix X with 3 points in 4 dimensions. Compute the centered X̂ by hand and with the following formula, then scale:
X̂ = (I - (1/ℓ)ee') X
31 Center φ(X) in Feature Space
We cannot center φ(X) directly in feature space. Instead, center G = XX':
Ĝ = X̂X̂' = (I - (1/ℓ)ee') X X' (I - (1/ℓ)ee')' = (I - (1/ℓ)ee') G (I - (1/ℓ)ee')'
This works in feature space too, with the Gram matrix replaced by the kernel matrix:
G = φ(X) φ(X)' = K
K̂ = (I - (1/ℓ)ee') K (I - (1/ℓ)ee')'
32 Centering the Kernel
K̂ = (I - (1/ℓ)ee') K (I - (1/ℓ)ee')'
Practical computation:
Let μ' = (1/ℓ) e'K         (column averages of K)
Let K̃ = K - eμ'            (subtract the column averages)
Let c = (1/ℓ) K̃e           (row averages of K̃)
Let K̂ = K̃ - ce'            (subtract the row averages)
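The four-step practical computation above can be sketched and checked against the direct formula K̂ = (I - ee'/ℓ) K (I - ee'/ℓ)'; for a linear kernel the result must equal the Gram matrix of the centered data:

```python
import numpy as np

rng = np.random.default_rng(4)
X = rng.standard_normal((6, 3))
K = X @ X.T                          # linear kernel, purely for illustration
ell = K.shape[0]
e = np.ones((ell, 1))

# Practical computation, step by step
mu = e.T @ K / ell                   # column averages of K (a row vector)
K1 = K - e @ mu                      # subtract the column averages
c = K1 @ e / ell                     # row averages of K1 (a column vector)
K_hat = K1 - c @ e.T                 # subtract the row averages

# Direct formula with the centering matrix C = I - (1/ell) e e'
C = np.eye(ell) - e @ e.T / ell
print(np.allclose(K_hat, C @ K @ C.T))      # True

# For a linear kernel this equals the Gram matrix of centered data
X_hat = C @ X
print(np.allclose(K_hat, X_hat @ X_hat.T))  # True
```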
33 Original Way: Ridge Regression in Feature Space
Original (uncentered):
α = (G + λI)⁻¹ y,  g(φ(x)) = Σ_{i=1}^ℓ α_i K(x_i, x)
Predicted normalized y:
α̂ = (Ĝ + λI)⁻¹ ŷ,  ĝ(φ(x)) = Σ_i α̂_i K̂(x_i, x)
Predicted original y:
g(φ(x)) = σ_y Σ_i α̂_i K̂(x_i, x) + μ_y
34 Worksheet
Normalized y:  ŷ_i = (y_i - μ_y)/σ_y
Invert to get the unnormalized y:  y_i = σ_y ŷ_i + μ_y
35 Centering Test Data
ĝ(φ(x)) = σ_y Σ_i α̂_i K̂(x_i, x) + μ_y,  with  α̂ = (Ĝ + λI)⁻¹ ŷ
Center the test data just like the training data:
K̂_tr = (K_tr - e μ_tr')(I - (1/ℓ)ee'),  where  μ_tr' = (1/ℓ) e'K_tr
K̂_tst = (K_tst - e μ_tr')(I - (1/ℓ)ee')
The prediction for the test data becomes:
ĝ(φ(X_tst)) = σ_y K̂_tst α̂ + μ_y e
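The test-kernel centering above can be checked on a linear kernel, where centering the kernels with the training averages must match centering the raw features with the training mean (a sketch; variable names are mine):

```python
import numpy as np

rng = np.random.default_rng(5)
Xtr = rng.standard_normal((8, 3))    # training points
Xtst = rng.standard_normal((4, 3))   # test points
ell = Xtr.shape[0]
e = np.ones((ell, 1))
e_tst = np.ones((Xtst.shape[0], 1))

K_tr = Xtr @ Xtr.T                   # linear kernels for illustration
K_tst = Xtst @ Xtr.T
mu_tr = e.T @ K_tr / ell             # training average, reused for the test data

C = np.eye(ell) - e @ e.T / ell
K_tr_hat = (K_tr - e @ mu_tr) @ C
K_tst_hat = (K_tst - e_tst @ mu_tr) @ C

# Centering the features directly with the *training* mean gives the same kernels
m = Xtr.mean(axis=0)
print(np.allclose(K_tr_hat, (Xtr - m) @ (Xtr - m).T))    # True
print(np.allclose(K_tst_hat, (Xtst - m) @ (Xtr - m).T))  # True
```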
36 Alternate Approach
Directly add a bias to the model:  y = Xw + be,  where b is the bias.
The optimization problem becomes:
min_{w,z,b}  f(w,z,b) = (1/2)||w||² + (1/(2λ)) Σ_{i=1}^ℓ z_i²
s.t.  y_i - (x_i·w + b) = z_i,  i = 1, ..., ℓ
37 Lagrangian Function
Consider the problem:
min_r f(r)  s.t.  h_i(r) = 0,  i = 1, ..., m
with f: Rⁿ → R differentiable and each h_i: Rⁿ → R differentiable.
The Lagrangian function is:
L(r,u) = f(r) + Σ_{i=1}^m u_i h_i(r)
38 Lagrangian Function
Primal:
min_{w,z,b}  f(w,z,b) = (1/2)||w||² + (1/(2λ)) Σ_{i=1}^ℓ z_i²
s.t.  y_i - (x_i·w + b) = z_i,  i = 1, ..., ℓ
Lagrangian:
L(w,z,b,α) = (1/2)||w||² + (1/(2λ)) Σ_i z_i² + Σ_i α_i (y_i - x_i·w - b - z_i)
∇_w L = w - Σ_i α_i x_i = 0
∂L/∂z_i = z_i/λ - α_i = 0
∂L/∂b = -Σ_i α_i = 0
39 Wolfe Dual Problem
max_{w,z,b,α}  (1/2)||w||² + (1/(2λ)) Σ_i z_i² + Σ_i α_i (y_i - x_i·w - b - z_i)
s.t.  ∇_w L = w - Σ_i α_i x_i = 0
      ∇_z L = z/λ - α = 0
      ∂L/∂b = -Σ_i α_i = 0
Simplify by eliminating z = λα and using e'α = 0.
40 Simplified Problem
Get rid of z:
max_{w,α}  (1/2)||w||² + (λ/2) Σ_i α_i² + Σ_i α_i y_i - Σ_i α_i (x_i·w) - b Σ_i α_i - λ Σ_i α_i²
         = (1/2)||w||² - (λ/2) Σ_i α_i² - Σ_i α_i (x_i·w) + Σ_i α_i y_i
(the b term vanishes because Σ_i α_i = 0)
s.t.  w - Σ_i α_i x_i = 0,  Σ_i α_i = 0
Simplify by eliminating w = X'α.
41 Simplified Problem
Get rid of w:
max_α  (1/2) Σ_{i,j} α_i α_j ⟨x_i, x_j⟩ - Σ_{i,j} α_i α_j ⟨x_i, x_j⟩ - (λ/2) Σ_i α_i² + Σ_i α_i y_i
     = -(1/2) Σ_{i,j} α_i α_j ⟨x_i, x_j⟩ - (λ/2) Σ_i α_i² + Σ_i α_i y_i
s.t.  Σ_i α_i = 0
42 New Problem to Be Solved
The problem in matrix notation, with G = XX' (as a minimization):
min_α  f(α) = (1/2) α'Gα + (λ/2) α'α - y'α
s.t.  e'α = 0
This is a constrained optimization problem. The solution is still a system of equations, but not as simple.
43 Kernel Ridge Regression
The centered algorithm just requires centering the kernel and solving one equation. A bias can also be added directly.
+ Lots of fast equation solvers.
+ Theory supports generalization.
- Requires the full training kernel to compute α.
- Requires the full training kernel to predict future points.
More informationQuantum Mechanics I - Session 4
Quantum Mechancs I - Sesson 4 Aprl 3, 05 Contents Operators Change of Bass 4 3 Egenvectors and Egenvalues 5 3. Denton....................................... 5 3. Rotaton n D....................................
More informationSupport Vector Machines for Classification and Regression
ISIS Technca Report Support Vector Machnes for Cassfcaton and Regresson Steve Gunn 0 November 997 Contents Introducton 3 2 Support Vector Cassfcaton 4 2. The Optma Separatng Hyperpane...5 2.. Lneary Separabe
More informationBias Term b in SVMs Again
Proceedngs of 2 th Euroean Symosum on Artfca Neura Networks,. 44-448, ESANN 2004, Bruges, Begum, 2004 Bas Term b n SVMs Agan Te Mng Huang, Vosav Kecman Schoo of Engneerng, The Unversty of Auckand, Auckand,
More informationSystems of Equations (SUR, GMM, and 3SLS)
Lecture otes on Advanced Econometrcs Takash Yamano Fall Semester 4 Lecture 4: Sstems of Equatons (SUR, MM, and 3SLS) Seemngl Unrelated Regresson (SUR) Model Consder a set of lnear equatons: $ + ɛ $ + ɛ
More informationKernel Methods and SVMs
Statstcal Machne Learnng Notes 7 Instructor: Justn Domke Kernel Methods and SVMs Contents 1 Introducton 2 2 Kernel Rdge Regresson 2 3 The Kernel Trck 5 4 Support Vector Machnes 7 5 Examples 1 6 Kernel
More informationExample: Suppose we want to build a classifier that recognizes WebPages of graduate students.
Exampe: Suppose we want to bud a cassfer that recognzes WebPages of graduate students. How can we fnd tranng data? We can browse the web and coect a sampe of WebPages of graduate students of varous unverstes.
More informationSVMs: Duality and Kernel Trick. SVMs as quadratic programs
11/17/9 SVMs: Dualt and Kernel rck Machne Learnng - 161 Geoff Gordon MroslavDudík [[[partl ased on sldes of Zv-Bar Joseph] http://.cs.cmu.edu/~ggordon/161/ Novemer 18 9 SVMs as quadratc programs o optmzaton
More informationSupport Vector Machines
CS 2750: Machne Learnng Support Vector Machnes Prof. Adrana Kovashka Unversty of Pttsburgh February 17, 2016 Announcement Homework 2 deadlne s now 2/29 We ll have covered everythng you need today or at
More information