
Hybridized Heredity In Support Vector Machine

Timothy Idowu, Youngmin Park
University of Wisconsin-Madison
idowu@stat.wisc.edu, youngmin@stat.wisc.edu
May 2015

Abstract

In high-dimensional classification problems it becomes necessary to invoke some form of sparsity. In this report we aim to improve the use of the heredity principle in the support vector machine framework. The heredity principle is a hierarchical structure imposed on the predictor variables, which can make classifiers more interpretable. The heredity principles introduced by [Wu, S., Zou, H. and Yuan, M. (2008)] are extended to a more general form. The methods here are originally based on the structured variable selection (SVS) proposed by [Yuan, M., Joseph, R. and Zou, H. (2007)]. The solutions obtained conform with the specified heredity requirements in the presence of sparsity.

Introduction

In machine learning, the support vector machine (SVM) is a widely used supervised learning method for binary classification. Let $x$ denote a vector of covariates, and let the class labels $y$ take values in $\{-1, 1\}$. Given a training data set $\{(x_i, y_i)\}$, $i = 1, 2, \ldots, n$, the support vector machine solves the following penalized hinge loss problem:

$$(\hat\beta, \hat\beta_0) = \arg\min_{\beta, \beta_0} \sum_{i=1}^{n} \left[1 - y_i\left(x_i^T\beta + \beta_0\right)\right]_+ + \lambda\|\beta\|_2^2, \tag{1}$$

where the subscript "+" denotes the positive part, $[u]_+ = \max(u, 0)$. The resulting classifier is $\mathrm{Sign}(\hat\beta_0 + x^T\hat\beta)$.

To make the SVM more interpretable and to reduce its classification error (the proportion of cases the SVM misclassifies), we can impose some structural adjustment. One possible adjustment is sparsity, which in high-dimensional settings avoids the inclusion of irrelevant predictors. [Bradley, P. and Mangasarian, O. (1998)] proposed replacing the $l_2$ penalty in (1) with an $l_1$ penalty:

$$(\hat\beta, \hat\beta_0) = \arg\min_{\beta, \beta_0} \sum_{i=1}^{n} \left[1 - y_i\left(x_i^T\beta + \beta_0\right)\right]_+ + \lambda\|\beta\|_1. \tag{2}$$

Methods

The aforementioned methods, however, do not take into account any relationship between the predictors. For example, consider a quadratic classifier with predictors $z_1, z_2, \ldots, z_q$:

$$\beta_1 z_1 + \cdots + \beta_q z_q + \beta_{11} z_1^2 + \beta_{12} z_1 z_2 + \cdots + \beta_{q,q-1} z_q z_{q-1} + \cdots + \beta_{qq} z_q^2. \tag{3}$$

For such a classifier one might want to enforce a heredity principle. Two well-established heredity principles are strong heredity and weak heredity.
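As a concrete reading of (1) and (2), the following minimal Python sketch evaluates the penalized hinge objective for a given $(\beta, \beta_0)$ and applies the sign rule. It only evaluates the objective and does not solve the minimization; the function names `hinge_objective` and `svm_classify`, the toy data, and the convention of sending a score of exactly 0 to $+1$ are illustrative assumptions.

```python
from typing import Sequence


def hinge_objective(X: Sequence[Sequence[float]], y: Sequence[int],
                    beta: Sequence[float], beta0: float,
                    lam: float, penalty: str = "l2") -> float:
    """Objective of (1) when penalty="l2", or of (2) when penalty="l1"."""
    loss = 0.0
    for xi, yi in zip(X, y):
        margin = yi * (sum(a * b for a, b in zip(xi, beta)) + beta0)
        loss += max(0.0, 1.0 - margin)       # [1 - y_i(x_i^T beta + beta_0)]_+
    if penalty == "l2":
        pen = sum(b * b for b in beta)       # ||beta||_2^2
    elif penalty == "l1":
        pen = sum(abs(b) for b in beta)      # ||beta||_1
    else:
        raise ValueError("penalty must be 'l2' or 'l1'")
    return loss + lam * pen


def svm_classify(x: Sequence[float], beta: Sequence[float], beta0: float) -> int:
    """Sign(beta_0 + x^T beta); a score of exactly 0 is sent to +1 (arbitrary tie rule)."""
    score = beta0 + sum(a * b for a, b in zip(x, beta))
    return 1 if score >= 0 else -1


X = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
y = [1, 1, -1]
print(hinge_objective(X, y, [2.0, 0.5], 0.0, lam=0.5, penalty="l2"))  # -> 2.625
print(hinge_objective(X, y, [2.0, 0.5], 0.0, lam=0.5, penalty="l1"))  # -> 1.75
```

Switching `penalty` from `"l2"` to `"l1"` changes only the penalty term, which is exactly the difference between (1) and (2); the hinge part, which only the second observation contributes to here, is identical.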

Strong heredity: for a two-factor interaction, say $z_i z_j$, to be active, both of its parent effects, $z_i$ and $z_j$, should be active.

Weak heredity: for a two-factor interaction, say $z_i z_j$, to be active, at least one of its parent effects, $z_i$ or $z_j$, should be active.

It is likewise consistent to require $z_i$ to be active for $z_i^2$ to be active. The methods discussed in this report impose sparsity and heredity simultaneously. The constraints are specified so that the problems remain linear programs.

Generalized Garrote and Heredity Principles

[Breiman, L. (1995)] introduced the nonnegative garrote for variable selection in linear regression. It can also be applied in the SVM framework. Having obtained the coefficients $\hat\beta$ from the $l_2$ SVM, we can scale each predictor $x_j$ by a parameter $\theta_j$ and then solve the following optimization problem:

$$\min_{\theta, \beta_0} \sum_{i=1}^{n} \left[1 - y_i\left(\sum_{j=1}^{p} x_{ij}\hat\beta_j\theta_j + \beta_0\right)\right]_+ \quad \text{subject to} \quad \sum_{j=1}^{p}\theta_j \le M, \quad \theta_j \ge 0 \ \forall j, \tag{4}$$

where $M$ is the shrinkage parameter. The classifier then takes the form $\mathrm{Sign}\left(\hat\beta_0 + \sum_{j=1}^{p} x_j\hat\beta_j\hat\theta_j\right)$. With an appropriately chosen $M$, some $\theta_j$'s are reduced to 0, which performs variable selection. The garrote can be adapted to particular requirements by adding appropriate constraints.

Strong Heredity (SHSVM)

Let the predictor set be of size $p$. The hierarchical structure of the predictors can be expressed by the sets $\{D_j : j = 1, \ldots, p\}$, where $D_j$ consists of the parent effects of the $j$th predictor. For example, if the $(q+1)$th predictor is $x_{q+1} = z_1 z_2$, with parent effects $x_1 = z_1$ and $x_2 = z_2$, then $D_{q+1} = \{1, 2\}$. To implement the strong heredity principle, the garrote is adjusted as follows:

$$\min_{\theta, \beta_0} \sum_{i=1}^{n} \left[1 - y_i\left(\sum_{j=1}^{p} x_{ij}\hat\beta_j\theta_j + \beta_0\right)\right]_+ + \lambda\sum_{j=1}^{p}\theta_j \tag{5}$$

subject to $\theta_j \le \theta_r$ for all $r \in D_j$ and all $j$, and $\theta_j \ge 0$ for all $j$. The additional linear inequality constraints on the scaling parameters also help with the sparsity of the coefficients: if any parent has $\theta_r = 0$, then $\theta_j$ is forced to 0.

Weak Heredity (WHSVM)

To implement the weak heredity principle, we solve

$$\min_{\theta, \beta_0} \sum_{i=1}^{n} \left[1 - y_i\left(\sum_{j=1}^{p} x_{ij}\hat\beta_j\theta_j + \beta_0\right)\right]_+ + \lambda\sum_{j=1}^{p}\theta_j \tag{6}$$

subject to $\theta_j \le \sum_{r \in D_j}\theta_r$ for all $j$, and $\theta_j \ge 0$ for all $j$.
Under these constraints, at least one parent must be active for an interaction to be active, so the resulting model satisfies the weak heredity principle.
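Since the hinge term is piecewise linear, problems (5) and (6) become linear programs once each $[\cdot]_+$ term is replaced by a slack variable $\xi_i \ge 0$ with $\xi_i \ge 1 - y_i(\sum_j x_{ij}\hat\beta_j\theta_j + \beta_0)$. This is the standard LP reformulation of the hinge loss, not necessarily the authors' exact implementation. The Python sketch below assembles that LP for the strong or weak rule. The function name `heredity_garrote_lp`, the 0-based indexing and the variable ordering are hypothetical choices; predictors with an empty parent set are left without a heredity row, an assumption needed so that the weak rule does not force main effects to zero.

```python
from typing import Dict, List, Optional, Sequence, Set, Tuple

Bound = Tuple[Optional[float], Optional[float]]


def heredity_garrote_lp(
    X: Sequence[Sequence[float]],
    y: Sequence[int],
    beta_hat: Sequence[float],
    parents: Dict[int, Set[int]],
    lam: float,
    rule: str = "strong",
) -> Tuple[List[float], List[List[float]], List[float], List[Bound]]:
    """Write (5) (rule="strong") or (6) (rule="weak") as: min c@v s.t. A_ub@v <= b_ub.

    Variable layout v = [theta_1..theta_p, beta_0, xi_1..xi_n] (0-based indices),
    where the slack xi_i >= 0 replaces the hinge term [1 - y_i(...)]_+.
    """
    n, p = len(X), len(beta_hat)
    nvar = p + 1 + n
    # Objective: sum_i xi_i + lam * sum_j theta_j; beta_0 has zero cost.
    c = [float(lam)] * p + [0.0] + [1.0] * n

    A_ub: List[List[float]] = []
    b_ub: List[float] = []
    # Hinge rows: -y_i * (sum_j x_ij beta_hat_j theta_j + beta_0) - xi_i <= -1.
    for i in range(n):
        row = [0.0] * nvar
        for j in range(p):
            row[j] = -y[i] * X[i][j] * beta_hat[j]
        row[p] = float(-y[i])
        row[p + 1 + i] = -1.0
        A_ub.append(row)
        b_ub.append(-1.0)

    # Heredity rows. Predictors with an empty parent set (main effects) get none;
    # otherwise the weak row would force theta_j <= 0.
    for j, D in parents.items():
        if not D:
            continue
        if rule == "strong":              # theta_j - theta_r <= 0 for each r in D_j
            for r in sorted(D):
                row = [0.0] * nvar
                row[j], row[r] = 1.0, -1.0
                A_ub.append(row)
                b_ub.append(0.0)
        elif rule == "weak":              # theta_j - sum_{r in D_j} theta_r <= 0
            row = [0.0] * nvar
            row[j] = 1.0
            for r in D:
                row[r] -= 1.0
            A_ub.append(row)
            b_ub.append(0.0)
        else:
            raise ValueError("rule must be 'strong' or 'weak'")

    # theta_j >= 0, beta_0 unrestricted, xi_i >= 0.
    bounds: List[Bound] = [(0.0, None)] * p + [(None, None)] + [(0.0, None)] * n
    return c, A_ub, b_ub, bounds
```

The returned lists match the positional arguments `c`, `A_ub`, `b_ub` and the `bounds` keyword of `scipy.optimize.linprog`, so a call such as `linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds)` would solve the problem if SciPy is available; the first $p$ entries of the solution would then be the $\hat\theta_j$. The strong rule adds one row per parent, while the weak rule adds one row per constrained predictor.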

Hybridized Heredity (HHSVM)

We further generalize the heredity principle with the following problem:

$$\min_{\theta, \beta_0} \sum_{i=1}^{n} \left[1 - y_i\left(\sum_{j=1}^{p} x_{ij}\hat\beta_j\theta_j + \beta_0\right)\right]_+ + \lambda\sum_{j=1}^{p}\theta_j \tag{7}$$

subject to $\theta_j \le \frac{1}{|D_j|}\sum_{r \in D_j}\theta_r$ for all $j$, and $\theta_j \ge 0$ for all $j$, where $|D_j|$ is the number of elements of $D_j$. Note the following:

If strong heredity is satisfied, then hybridized heredity must be satisfied.
If hybridized heredity is satisfied, then weak heredity must be satisfied.

Both implications hold because, for nonnegative $\theta_r$, $\min_{r \in D_j}\theta_r \le \frac{1}{|D_j|}\sum_{r \in D_j}\theta_r \le \sum_{r \in D_j}\theta_r$.

Numerical Results

Data

We use numerical examples to compare our HHSVM with the WHSVM and SHSVM. For each simulation example, we generated a dataset of size 10,000 for testing. For training, we randomly generated further datasets of sizes $n = 50$, 100, and 200. In each example, all classifiers were fitted on a training sample and their classification errors (under 0-1 loss) were computed on the test sample. This process was repeated 30 times and the means of the errors are reported.

Structure

The generated explanatory variables $z_1, \ldots, z_7$ are standard normal, where the correlation between $z_r$ and $z_j$ is $\rho^{|r-j|}$, with $\rho = 0$ or $0.5$. The class labels are generated from a logistic regression model. The set of predictors for fitting the SVMs is $\{z_j, z_r z_j, z_j^2\}$, with $r, j = 1, \ldots, 7$. Let $\theta_j$ and $\theta_{jj}$ be the scaling parameters for $z_j$ and $z_j^2$ respectively, while $\theta_{rj}$ corresponds to $z_r z_j$ for $r \ne j$.

To fit the SHSVM, the linear constraints in (5) take the form $\theta_{rj} \le \theta_r$ and $\theta_{rj} \le \theta_j$, $r \ne j$, where $r, j = 1, \ldots, 7$.
To fit the WHSVM, the linear constraints in (6) take the form $\theta_{rj} \le \theta_r + \theta_j$, $r \ne j$, where $r, j = 1, \ldots, 7$.
To fit the HHSVM, the linear constraints in (7) take the form $\theta_{rj} \le (\theta_r + \theta_j)/2$, $r \ne j$, where $r, j = 1, \ldots, 7$.

Example with Strong Heredity Satisfied

$$\log\left(\frac{\Pr(y = 1 \mid z_1, \ldots, z_7)}{\Pr(y = -1 \mid z_1, \ldots, z_7)}\right) = 2z_1 + 4z_3 + 3z_1 z_2 + 1.$$

The simulation results are summarized in Table 1. From Table 1 we see that the SHSVM significantly outperforms the other methods, while the HHSVM outperforms the WHSVM.
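The simulation design can be sketched in Python as follows, as an illustration rather than the authors' code. `predictor_parents` builds the 35 predictors $\{z_j, z_j^2, z_r z_j\}$ for $q = 7$ together with their parent sets, `satisfies` checks the SHSVM, HHSVM and WHSVM constraint forms listed above for a given set of scaling parameters, and `simulate_design` draws standard normal covariates through an AR(1) recursion, which yields $\mathrm{corr}(z_r, z_j) = \rho^{|r-j|}$, then samples labels from a logistic model with a user-supplied log-odds function. All three names are hypothetical. Giving $z_j^2$ the single parent $z_j$ is an assumption based on the earlier remark about squared terms, since the constraint forms listed above cover only $\theta_{rj}$.

```python
import itertools
import math
import random
from typing import Callable, Dict, List, Tuple

Key = Tuple[int, ...]  # (j,) = z_j, (j, j) = z_j^2, (r, j) with r < j = z_r z_j


def predictor_parents(q: int = 7) -> Dict[Key, Tuple[int, ...]]:
    """Parent sets for the predictor set {z_j, z_j^2, z_r z_j}, j = 1..q."""
    parents: Dict[Key, Tuple[int, ...]] = {}
    for j in range(1, q + 1):
        parents[(j,)] = ()        # main effect: no parents
        parents[(j, j)] = (j,)    # square: parent z_j (assumption)
    for r, j in itertools.combinations(range(1, q + 1), 2):
        parents[(r, j)] = (r, j)  # interaction z_r z_j
    return parents


def satisfies(theta: Dict[Key, float], parents: Dict[Key, Tuple[int, ...]],
              rule: str) -> bool:
    """True if theta >= 0 and every predictor meets the chosen heredity bound.

    Missing keys in theta are treated as 0 (inactive predictors).
    """
    for key, D in parents.items():
        t = theta.get(key, 0.0)
        if t < 0:
            return False
        if not D:
            continue
        vals = [theta.get((r,), 0.0) for r in D]
        if rule == "strong":
            bound = min(vals)               # theta_rj <= theta_r and theta_rj <= theta_j
        elif rule == "hybrid":
            bound = sum(vals) / len(vals)   # theta_rj <= (theta_r + theta_j) / 2
        elif rule == "weak":
            bound = sum(vals)               # theta_rj <= theta_r + theta_j
        else:
            raise ValueError("rule must be 'strong', 'hybrid' or 'weak'")
        if t > bound:
            return False
    return True


def simulate_design(n: int, rho: float, logit: Callable[[List[float]], float],
                    q: int = 7, seed: int = 0) -> Tuple[List[List[float]], List[int]]:
    """Draw z ~ N(0, 1) with corr(z_r, z_j) = rho**|r - j| and y in {-1, 1} from a logistic model."""
    rng = random.Random(seed)
    scale = math.sqrt(1.0 - rho * rho)
    Z: List[List[float]] = []
    Y: List[int] = []
    for _ in range(n):
        z = [rng.gauss(0.0, 1.0)]
        for _ in range(q - 1):              # AR(1) step keeps each variance at 1
            z.append(rho * z[-1] + scale * rng.gauss(0.0, 1.0))
        prob = 1.0 / (1.0 + math.exp(-logit(z)))  # Pr(y = 1 | z)
        Y.append(1 if rng.random() < prob else -1)
        Z.append(z)
    return Z, Y


theta = {(1,): 1.0, (2,): 0.0, (1, 2): 0.4}  # z_1 active, z_2 inactive
print([satisfies(theta, predictor_parents(), r) for r in ("strong", "hybrid", "weak")])
# -> [False, True, True]
```

With $z_1$ active and $z_2$ inactive, $\theta_{12} = 0.4$ violates the strong constraint but meets the hybridized bound $(1 + 0)/2$ and the weak bound $1 + 0$; raising it to $0.8$ would violate the hybridized constraint while still satisfying the weak one.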

Example with Weak Heredity Satisfied

$$\log\left(\frac{\Pr(y = 1 \mid z_1, \ldots, z_7)}{\Pr(y = -1 \mid z_1, \ldots, z_7)}\right) = 3.5z_1 + 3z_1 z_2 + 2.5z_1 z_3 + 2z_1 z_4 + 1.5z_1 z_5 + z_1 z_6 + 1.$$

The simulation results are summarized in Table 2. From Table 2 we see that the WHSVM significantly outperforms the other methods, while the HHSVM outperforms the SHSVM.

Table 1: Classification error from the strong heredity example. Methods compared for $\rho = 0$ and $\rho = 0.5$: SHSVM, HHSVM, WHSVM, $l_2$ SVM.

Table 2: Classification error from the weak heredity example. Methods compared for $\rho = 0$ and $\rho = 0.5$: WHSVM, HHSVM, SHSVM, $l_2$ SVM.

Discussion

In the cases where weak heredity is satisfied, HHSVM outperforms SHSVM. In the cases where strong heredity is satisfied, HHSVM outperforms WHSVM. We recommend applying HHSVM when the heredity structure is not known or not well specified.

References

[Bradley, P. and Mangasarian, O. (1998)] Bradley, P. and Mangasarian, O. (1998). Feature selection via concave minimization and support vector machines. In J. Shavlik (ed.), ICML '98. Morgan Kaufmann.
[Breiman, L. (1995)] Breiman, L. (1995). Better subset regression using the nonnegative garrote. Technometrics, 37(4).

[Yuan, M., Joseph, R. and Zou, H. (2007)] Yuan, M., Joseph, R. and Zou, H. (2007). Structured Variable Selection and Estimation. Technical Report, School of Industrial and Systems Engineering, Georgia Institute of Technology.
[Wu, S., Zou, H. and Yuan, M. (2008)] Wu, S., Zou, H. and Yuan, M. (2008). Structured Variable Selection in Support Vector Machines. Electronic Journal of Statistics, Vol. 2 (2008).
